
Estimation and Inference in Over-identified Structural Factor-augmented VAR Models∗

Xu Han†

This Version: November 2015

Abstract

During the past decade, factor-augmented VAR (FAVAR) models have been widely used for structural analysis in the literature, where the effects of structural shocks are often estimated under just-identifying restrictions. However, as the number of restrictions in the FAVAR setup can be large due to its high dimensionality, the structural shocks are over-identified. This paper develops a new estimator for the impulse response functions with a fixed number of over-identifying restrictions. The proposed identification scheme nests the conventional just-identified recursive scheme as a special case. We establish the asymptotic distributions of the new estimator and develop test statistics for the over-identifying restrictions. Simulation results show that adding a few more over-identifying restrictions can lead to a substantial improvement in estimation accuracy. We estimate the effects of the monetary policy shock based on a U.S. macroeconomic data set. The result shows that our over-identified scheme can help to improve statistical significance and eliminate incorrect restrictions that lead to spurious impulse responses.

Key words: High-dimensional Factor Models, Identification and Estimation, Structural Impulse Responses

JEL Classification: C33, C13, E32

∗I would like to thank Xu Cheng, Atsushi Inoue, Arthur Lewbel, Bo Li, and Joris Pinkse for their insightful suggestions, and participants at the 2015 Tsinghua International Conference on Econometrics for helpful comments.

†Department of Economics and Finance, City University of Hong Kong, Hong Kong SAR. E-mail:[email protected]


1 Introduction

In their seminal work, Bernanke et al. (2005) develop the factor-augmented VAR (FAVAR) model based on conventional VAR analysis and factor models (Sargent and Sims, 1977; Stock and Watson, 2002). They introduce a few common factors extracted from high-dimensional data into a VAR process. In contrast to a low-dimensional VAR model, FAVAR utilizes more information in high-dimensional data to identify the space spanned by the underlying structural shocks without the loss of parsimony. Due to such merits, the empirical literature related to FAVAR has rapidly grown in recent years (e.g., Stock and Watson, 2005; Boivin et al., 2009; Gilchrist et al., 2009; Bianchi et al., 2009; Forni and Gambetti, 2010; Koop and Korobilis, 2014; Ritschl and Sarferaz, 2014, among others).

As the structural shocks in FAVAR are not observed, additional conditions must be imposed for identification purposes. Most identification methods in conventional structural VAR models can also be applied to FAVAR. However, the numbers of restrictions used in FAVARs and structural VARs can be quite different. In practice, just-identified schemes are commonly used in structural VAR due to its limited number of variables and computational simplicity. In contrast, the number of restrictions in FAVAR can be quite large due to the high-dimensional nature of factor models. For example, Stock and Watson (2005) impose the assumption that the monetary policy shock does not contemporaneously affect 64 slow-moving variables, such as IP indexes, employment, sales, wages, and CPIs. Gilchrist et al. (2009) similarly impose dozens of short-run zero restrictions to identify the credit market shocks. As the number of factors is often set equal to a small number compared with the cross-section dimension in a factor model, imposing all such restrictions leads to an over-identified system.

Using over-identifying restrictions in FAVAR is not only a natural result of the high dimensionality of the model. It also has at least two benefits from a theoretical perspective. First, it is well known that adding valid and informative restrictions leads to more efficient estimators. Second, over-identifying restrictions can be tested to let practitioners know whether the imposed conditions are satisfied for the given data. In the low-dimensional structural VAR literature, over-identified models are often estimated by maximum likelihood or Bayesian methods (e.g., Leeper et al., 1996; Sims and Zha, 1998; Canova and Pérez Forero, 2012; Kociecki et al., 2012, among others). In general, such estimation procedures do not have closed-form solutions, and the computational cost can be very high, especially in a high-dimensional setup such as FAVAR. While Bai and Wang's (2015) recent study develops a Bayesian method under the dynamic factor model setup that allows for the possibility of over-identification, the literature related to frequentist estimation of over-identified structural FAVAR models remains quite scarce.

This paper contributes to the literature by developing a new frequentist method for estimation and inference in over-identified structural FAVAR models. We focus on the identification method using contemporaneous zero restrictions, which have broad applications in the literature.¹ The zero restrictions, which are usually guided by economic theory, are imposed on variables that are not affected by the target structural shock within the same period. The just-identified recursive scheme studied by Forni and Gambetti (2010) and Bai et al. (2015) can be viewed as a special case of the identification scheme proposed in this paper. We prove that the new estimator for the impulse response function is consistent and establish its asymptotic distribution for the purpose of statistical inference.

The method developed in this paper has several advantages. First, the validity of the over-identifying restrictions can be tested. Based on Kleibergen and Paap's (2006) rank test, we propose a statistic that can jointly test two rank conditions used to identify the target structural shock. The new test can detect wrong identification conditions and thus avoid inconsistent estimators for the impulse response functions (IRFs). Second, since the conventional just-identified recursive scheme is nested as a special case under our framework, our estimator tends to be more efficient. Simulation results show that adding a few over-identifying restrictions can decrease the mean squared error substantially. Third, the new estimator under over-identifying restrictions has a closed-form solution and is easy to compute.

Some recent work has studied identification theory in factor models or FAVAR. Under the fundamentalness assumption, Forni et al. (2009) show that the identification of structural shocks in FAVAR reduces to the choice of the rotation matrix that links the reduced-form errors and structural shocks. Bai and Ng (2013) and Bai and Li (2012) develop several sets of conditions under which a factor model can be just-identified. Under similar just-identified schemes, Bai et al. (2015) develop estimators and excellent inferential theory for estimated IRFs in a FAVAR setup. However, it seems quite nontrivial to extend their estimators to allow for over-identifying restrictions. We also differ from Bai et al. (2015) in that our model distinguishes static and dynamic factors. Bai and Wang (2014) propose three sets of equivalent rank conditions under which the factors can be identified in static and dynamic factor models. While their theory allows for over-identification, this paper substantially

¹As one of the most widely used methods, contemporaneous zero restrictions have been applied in FAVAR to identify various shocks, including (but not limited to): the monetary policy shock (Bernanke et al., 2005; Stock and Watson, 2005; Boivin et al., 2009; Forni and Gambetti, 2010; Eickmeier et al., 2015), the credit market shocks (Gilchrist et al., 2009), the uncertainty shocks (Caggiano et al., 2014), the international shocks (Mumtaz and Surico, 2009), the banking shocks (Buch et al., 2014), and the news shocks (Forni et al., 2014).


differs from their work: we focus on the estimation and inference under an over-identified scheme, and they focus mainly on the conditions to ensure identification. Stock and Watson (2005) and Han (2015) consider an over-identified FAVAR where the number of restrictions is proportional to the cross-sectional dimension. Han's (2015) test for over-identifying restrictions is a test for the number of fast-moving shocks in Stock and Watson (2005). In contrast, the estimator in this paper is computed under a fixed number of restrictions, which decrease the possibility of misspecification yet still preserve the advantages of over-identified schemes. Thus, the estimation procedure, inferential theory, and asymptotics of the test statistics are all quite different from those presented by Stock and Watson (2005) and Han (2015). Gafarov's (2014) recent study also considers a FAVAR with many identifying restrictions, but the author mainly focuses on the possibility of point identification under sign restrictions.

In an empirical application, we estimate the effects of the monetary policy shock on the main U.S. macroeconomic variables via our estimators. The results show that the identification conditions rejected by our tests yield spurious IRFs with unexpected signs. In contrast, the identification conditions accepted by our tests lead to IRFs consistent with economic theory. In addition, the over-identified scheme improves the average statistical significance of IRFs compared with the conventional just-identified scheme.

The rest of the paper is organized as follows. In Section 2, we introduce our new estimator for the IRFs and discuss the identification conditions. Section 3 presents our assumptions, provides the main theorems about the asymptotics of our estimators, and illustrates how to implement our inferential theory in practice. We develop test statistics to determine the validity of the over-identifying restrictions. Section 4 investigates the finite-sample properties of the estimators and statistics via Monte Carlo experiments. Section 5 uses an empirical application to show the gains from using the proposed estimators under over-identifying restrictions. Section 6 concludes the paper.

Notations: Let PZ ≡ Z(Z′Z)^{−1}Z′ and MZ ≡ I − PZ for any matrix Z. Z⁺ denotes the Moore–Penrose inverse of Z. Kab is the commutation matrix such that Kab vec(Z) = vec(Z′) for any a × b matrix Z. Following the matrix algebra convention, we use the abbreviation Ka to denote Kaa. The Euclidean norm of a matrix Z is denoted as ∥Z∥ = [trace(Z′Z)]^{1/2}. We use →p and →d to denote convergence in probability and in distribution, respectively.
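As a quick numerical check of these conventions, the sketch below builds the commutation matrix Kab and the projection matrices PZ and MZ for small illustrative matrices (the variable names and dimensions are ours, chosen only for the example):

```python
import numpy as np

def commutation(a, b):
    """Build the ab x ab commutation matrix K_ab with K_ab vec(Z) = vec(Z')."""
    K = np.zeros((a * b, a * b))
    for i in range(a):
        for j in range(b):
            # vec stacks columns: Z[i, j] sits at position j*a + i in vec(Z)
            # and at position i*b + j in vec(Z').
            K[i * b + j, j * a + i] = 1.0
    return K

a, b = 3, 2
Z = np.arange(a * b, dtype=float).reshape(a, b)
K = commutation(a, b)
vecZ = Z.reshape(-1, order="F")         # vec(Z): column stacking
vecZt = Z.T.reshape(-1, order="F")      # vec(Z')
assert np.allclose(K @ vecZ, vecZt)

# Projection matrices: P_Z = Z (Z'Z)^{-1} Z' and M_Z = I - P_Z.
X = np.random.default_rng(0).normal(size=(10, 3))
P = X @ np.linalg.solve(X.T @ X, X.T)
M = np.eye(10) - P
assert np.allclose(M @ X, 0.0, atol=1e-10)   # M_Z annihilates the columns of Z
```

Since K is a permutation matrix, it is orthogonal, so KK′ = I as expected for a commutation matrix.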


2 The Identification of Structural Shocks in FAVAR Models

2.1 The Model Setup

We consider the following factor model for t = 1, ..., T:

Xt = ΛFt + et,  (2.1)

Ft = ∑_{j=1}^p Φj Ft−j + Gηt,  (2.2)

ηt = Aζt,  (2.3)

where Xt = [X1t, ..., XNt]′ is an N-dimensional vector, Ft is an r-dimensional unobserved static factor, Λ is an N × r factor loading matrix, et = [e1t, ..., eNt]′ is an N-dimensional idiosyncratic error term, G is an r × q matrix of rank q, ηt is the q-dimensional reduced-form shock, ζt is the q-dimensional structural shock, E(ζtζt′) = Iq, and A is a q × q nonsingular matrix. We set q ≤ r, so that our setup allows for dynamic factors. By assuming the stationarity of Ft, we have

(Ir − Φ1L − ... − ΦpL^p)^{−1} = Ir + ∑_{s=1}^∞ Ψs L^s,  (2.4)

where Ψs is the coefficient matrix in the vector moving average representation of (2.2).

Let ℱt = [F′t−1, ..., F′t−p]′ and Φ = [Φ1, ..., Φp]. Plugging (2.2) and (2.3) into (2.1), we obtain

Xt = Πℱt + Θηt + et  (2.5)
   = Πℱt + Γζt + et,  (2.6)

where Π = ΛΦ, Θ = ΛG, and Γ = ΘA. The matrix representation of (2.6) is given by

X = ℱΠ′ + ηΘ′ + e = ℱΠ′ + ζΓ′ + e,  (2.7)

where X = [Xp+1, ..., XT]′, ℱ = [ℱp+1, ..., ℱT]′, η = [ηp+1, ..., ηT]′, ζ = [ζp+1, ..., ζT]′, and e = [ep+1, ..., eT]′. We use λi, θi, and γi to denote the transpose of the i-th row of Λ, Θ, and Γ, respectively. In particular, we have

Λ = [λ1, ..., λN]′,  Θ = [θ1, ..., θN]′,  Γ = [γ1, ..., γN]′.  (2.8)

For the i-th cross-section, the factor model (2.1) can be represented as

Xi = Fλi + ei,  (2.9)

where Xi = [Xi1, ..., XiT]′, F = [F1, ..., FT]′, and ei = [ei1, ..., eiT]′.

Let F̂t denote the standard principal component (PC) estimator for Ft, normalized so that T^{−1} ∑_{t=1}^T F̂tF̂t′ = Ir. The OLS estimators for λi and Λ are given by

λ̂i = T^{−1} ∑_{t=1}^T F̂tXit,  Λ̂ = T^{−1} ∑_{t=1}^T XtF̂t′.  (2.10)

Let ℱ̂t = [F̂′t−1, ..., F̂′t−p]′, ℱ̂ = [ℱ̂p+1, ..., ℱ̂T]′, and

X̂ = Mℱ̂ X.  (2.11)

Since X̂ is an estimate for ζΓ′ + e, the reduced-form shocks can be estimated by PC. Set the estimated reduced-form shocks, denoted as η̂, equal to √(T − p) times the eigenvectors corresponding to the q largest eigenvalues of the (T − p) × (T − p) matrix X̂X̂′. Note that ℱ̂ and η̂ are orthogonal by design. Thus, the OLS estimator for G in (2.2) can be computed as

Ĝ = (T − p)^{−1} ∑_{t=p+1}^T F̂tη̂t′,  (2.12)

where we use η̂′η̂/(T − p) = Iq. Furthermore, as Θ = ΛG, the estimators for θi and Θ can be computed as

θ̂i = Ĝ′λ̂i,  Θ̂ = Λ̂Ĝ.  (2.13)

The estimator for Φ is given by

Φ̂ = (∑_{t=p+1}^T F̂tℱ̂t′)(ℱ̂′ℱ̂)^{−1}.  (2.14)

Given Φ̂, we can compute Ψ̂s for s = 1, 2, ... by inverting the lag polynomial defined in (2.4).
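The estimation steps (2.10)–(2.14) are all standard eigenvector and least-squares computations. The sketch below runs them on simulated data; the data-generating process, dimensions, and variable names are purely illustrative and are not the paper's design:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, r, q, p = 50, 200, 3, 2, 1             # illustrative dimensions

# --- simulate a small FAVAR-type DGP (illustrative only) ---
Phi = 0.5 * np.eye(r)                        # factor VAR(1) coefficient
G = rng.normal(size=(r, q))
Lam = rng.normal(size=(N, r))
F = np.zeros((T, r))
for t in range(1, T):
    F[t] = F[t - 1] @ Phi.T + G @ rng.normal(size=q)
X = F @ Lam.T + 0.5 * rng.normal(size=(T, N))

# --- PC estimator of the factors, normalized so Fhat'Fhat/T = I_r ---
w, V = np.linalg.eigh(X @ X.T)               # eigh: ascending eigenvalues
Fhat = np.sqrt(T) * V[:, -r:][:, ::-1]       # sqrt(T) x top-r eigenvectors
Lamhat = X.T @ Fhat / T                      # OLS loadings, eq. (2.10)

# --- stack lagged factors and project them out, eq. (2.11) ---
Fcal = np.hstack([Fhat[p - j - 1:T - j - 1] for j in range(p)])  # (T-p) x rp
Xs = X[p:]
M = np.eye(T - p) - Fcal @ np.linalg.solve(Fcal.T @ Fcal, Fcal.T)
Xtil = M @ Xs

# --- reduced-form shocks by PC on the residual matrix ---
w2, V2 = np.linalg.eigh(Xtil @ Xtil.T)
eta = np.sqrt(T - p) * V2[:, -q:][:, ::-1]   # eta'eta/(T-p) = I_q
Ghat = Fhat[p:].T @ eta / (T - p)            # eq. (2.12)
Thetahat = Lamhat @ Ghat                     # eq. (2.13)
Phihat = (Fhat[p:].T @ Fcal) @ np.linalg.inv(Fcal.T @ Fcal)  # eq. (2.14)

# --- invert the lag polynomial (2.4): Psi_s = sum_j Phi_j Psi_{s-j} ---
Psi = [np.eye(r)]
for s in range(1, 5):
    Psi.append(sum(Phihat[:, j * r:(j + 1) * r] @ Psi[s - 1 - j]
                   for j in range(min(s, p))))
irf2 = Lamhat @ Psi[2] @ Ghat                # N x q responses at horizon 2 (up to rotation)
```

Note that, as the text emphasizes, eta and Ghat here only estimate the reduced-form objects up to a rotation; pinning down a structural column of A is the subject of Section 2.2.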


2.2 Identification and Estimation

Suppose we are interested in the effects of the m-th structural shock. Given the preceding model setup, the IRF of Xt with respect to the m-th shock ζm,t−s is given by

∂Xt/∂ζm,t−s = ΛΨsGAm,  (2.15)

where Am denotes the m-th column of A for m = 1, ..., q. For s = 0, we set Ψs = Ir, so (2.15) becomes

∂Xt/∂ζmt = ΛGAm = ΘAm.  (2.16)

The estimators for Λ, Ψs, and G are defined in Section 2.1. We need only an appropriate estimator for Am to compute the IRF in (2.15). It is well known that η̂ can estimate its unobserved counterpart only up to some rotation. To identify Am, additional restrictions must be imposed.

Assumption ID1 – (T − p)^{−1}ζ′ζ = Iq.

Assumption ID1 requires that the sample mean of ζtζt′ is equal to its population mean E(ζtζt′) = Iq. This assumption is commonly used in the structural FAVAR literature, such as in the identifying restrictions IC3 and IC5 of Bai and Li (2012) and PC2 of Bai and Ng (2013). Furthermore, recall that the principal component estimator η̂ satisfies (T − p)^{−1}η̂′η̂ = Iq. Since Assumption ID1 implies that (T − p)^{−1}ζ′ζ = Iq, it follows from Eq. (2.3) that the estimator Â must be orthonormal.

We discuss the identification and estimation of Am as follows. We begin with the case where m = q. We impose the short-run restrictions that ζqt does not affect X1t, ..., Xkt contemporaneously for some k ≥ q − 1. This implies that Γ has the following structure:

Γ = [Γ1′, Γ2′]′, where Γ2 is (N − k) × q and the k × q block Γ1 has a zero last column:

Γ1 = [ γ11  ⋯  γ1,q−1  0 ]
     [  ⋮   ⋯    ⋮     ⋮ ]
     [ γk1  ⋯  γk,q−1  0 ].  (2.17)

The structure of Γ1 implies that

Θ1Aq = 0k×1,  (2.18)

where Θ1 denotes the first k rows of Θ. Let Θ̂1 and Λ̂1 denote the first k rows of Θ̂ and Λ̂, respectively. Thus,

Θ̂1 = Λ̂1Ĝ.  (2.19)

We make the following assumption on the rank of Γ1 to ensure the identification of Aq.

Assumption ID2 – rank(Γ1) = q − 1 and the zero restrictions are satisfied for Γ1 defined in Eq. (2.17).

Assumption ID2 and Eq. (2.18) lead to a natural estimator for Aq, i.e.,

Âq = arg min_{Aq∈Rq} Aq′Θ̂1′Θ̂1Aq  s.t. Aq′Aq = 1.  (2.20)

It is straightforward that Âq is the eigenvector associated with the smallest eigenvalue of Θ̂1′Θ̂1. The following procedure summarizes how to compute the IRFs with respect to ζqt.

Algorithm 1:
(1) Compute Âq as the eigenvector associated with the smallest eigenvalue of Θ̂1′Θ̂1, where Θ̂1 denotes the first k rows of Θ̂ defined in (2.13).
(2) The contemporaneous effects of ζqt on Xt can be estimated as Θ̂Âq, and the effects of ζq,t−s on Xt can be computed as Λ̂Ψ̂sĜÂq.
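Step (1) of Algorithm 1 is a smallest-eigenvalue computation. The minimal sketch below constructs a synthetic Θ1 satisfying Θ1Aq = 0 for a known unit vector and checks that the eigenvector recovers it; all numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
q, k = 3, 5                                  # k >= q - 1 zero restrictions

# Illustrative Theta1 whose rows are orthogonal to a known unit vector A_q:
Aq_true = np.array([1.0, 2.0, 2.0]) / 3.0    # exact unit norm
B = rng.normal(size=(k, q))
Theta1 = B - np.outer(B @ Aq_true, Aq_true)  # enforce Theta1 @ A_q = 0

# Solve (2.20): eigenvector of Theta1'Theta1 for the smallest eigenvalue
w, V = np.linalg.eigh(Theta1.T @ Theta1)     # eigh returns ascending eigenvalues
Aq_hat = V[:, 0]
if Aq_hat @ Aq_true < 0:                     # sign is not identified (cf. Remark 1)
    Aq_hat = -Aq_hat
assert np.allclose(Aq_hat, Aq_true, atol=1e-8)
```

Because the smallest eigenvalue is zero and simple here, the minimizer of (2.20) is unique up to sign, which is exactly the sign indeterminacy the paper resolves with an economic sign restriction.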

Next, consider the case where we are interested in the effects of the m-th shock ζmt for 1 ≤ m < q. We assume that Γ1 has the following structure:

Γ = [Γ1′, Γ2′]′, where Γ2 is (N − k) × q and the k × q block Γ1 takes the form

Γ1 = [ γ11     ⋯  γ1,m−1    0        0_{1×(q−m)} ]
     [  ⋮      ⋯    ⋮       ⋮          ⋮         ]
     [ γℓ1     ⋯  γℓ,m−1    0        0_{1×(q−m)} ]
     [ γℓ+1,1  ⋯  γℓ+1,m−1  γℓ+1,m   0_{1×(q−m)} ]
     [  ⋮      ⋯    ⋮       ⋮          ⋮         ]
     [ γk1     ⋯  γk,m−1    γk,m     0_{1×(q−m)} ],  (2.21)

where k ≥ q − 1, k > ℓ ≥ m − 1, and the m-th column of Γ1 has at least one nonzero element. The short-run restrictions in (2.21) have three implications: (1) the last q − m shocks do not affect X1t, ..., Xkt contemporaneously; (2) the m-th shock does not affect X1t, ..., Xℓt contemporaneously; and (3) the m-th shock could affect Xℓ+1,t, ..., Xkt contemporaneously. We make the following assumption to ensure the identification of Am.

Assumption ID3 – Let Γ11 denote the first ℓ rows of Γ1 in Eq. (2.21). rank(Γ1) = m, rank(Γ11) = m − 1, and the zero restrictions are satisfied for Γ1 defined in Eq. (2.21). Furthermore, Γ1′Γ1 has distinct nonzero eigenvalues.

The assumption that Γ1′Γ1 has distinct nonzero eigenvalues ensures the differentiability of the eigenvector functions. This condition is used in Section 3 in the derivation of the asymptotic distribution of our estimators.

Given Eq. (2.21) and Assumption ID3, we propose the following procedure to estimate the IRFs with respect to ζmt.

Algorithm 2:
(1) Let Â1:m denote the eigenvectors associated with the first m eigenvalues of Θ̂1′Θ̂1 in descending order, where Θ̂1 denotes the first k rows of Θ̂ defined in (2.13).
(2) Set Â∗m equal to the eigenvector associated with the smallest eigenvalue of Â′1:mΘ̂11′Θ̂11Â1:m, where Θ̂11 denotes the first ℓ rows of Θ̂1. (Note that Â∗m is m × 1.) Hence, Â1:mÂ∗m is an estimator for Am.²
(3) The contemporaneous IRFs with respect to ζmt are computed as Θ̂Â1:mÂ∗m, and the IRFs with respect to ζm,t−s are computed as Λ̂Ψ̂sĜÂ1:mÂ∗m.
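The two eigenvector steps of Algorithm 2 can be sketched numerically. Below we build a Γ1 with the zero pattern in (2.21), rotate it by a random orthonormal A to obtain Θ1 = Γ1A′, and verify that the procedure recovers Am up to sign; the dimensions and the construction are our illustration, not the paper's data:

```python
import numpy as np

rng = np.random.default_rng(2)
q, m, ell, k = 4, 2, 3, 6                   # k >= q-1, k > ell >= m-1

# Gamma1 with the zero pattern in (2.21): last q-m columns zero,
# m-th column zero in the first ell rows.
Gamma1 = np.zeros((k, q))
Gamma1[:, :m - 1] = rng.normal(size=(k, m - 1))
Gamma1[ell:, m - 1] = rng.normal(size=k - ell)
A, _ = np.linalg.qr(rng.normal(size=(q, q)))   # random orthonormal rotation
Theta1 = Gamma1 @ A.T                       # Gamma = Theta A  =>  Theta1 = Gamma1 A'

# Step (1): eigenvectors of the m largest eigenvalues of Theta1'Theta1
w, V = np.linalg.eigh(Theta1.T @ Theta1)
A1m = V[:, -m:]                             # spans [A_1, ..., A_m]

# Step (2): smallest-eigenvalue eigenvector of A1m' Theta11' Theta11 A1m
Theta11 = Theta1[:ell]
w2, V2 = np.linalg.eigh(A1m.T @ Theta11.T @ Theta11 @ A1m)
Am_hat = A1m @ V2[:, 0]                     # step (2) estimator of A_m, up to sign

Am_true = A[:, m - 1]
if Am_hat @ Am_true < 0:
    Am_hat = -Am_hat
assert np.allclose(Am_hat, Am_true, atol=1e-6)
```

The first step isolates the space spanned by A1, ..., Am (rank m of Γ1), and the second step picks out the direction annihilated by the first ℓ rows (rank m − 1 of Γ11), mirroring the two rank conditions in Assumption ID3.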

Remark 1: The identification of Aq in (2.17) or Am in (2.21) is up to a sign change. To figure out the sign of Am, we can use economic theory to impose a sign restriction on a nonzero element of the m-th column of Γ. In the rest of the paper, we assume that the sign of Am has been pinned down unless otherwise stated.

Remark 2: In the case of m = q, we need at least q restrictions to identify Aq. The last column of Γ1 provides k restrictions, and the constraint Aq′Aq = 1 in Eq. (2.20) provides an additional restriction. If k ≥ q, then Aq is over-identified, and the identifying restrictions γ1q = ... = γkq = 0 can be tested by examining whether rank(Γ1) = q − 1. If k = q − 1, then Aq is just-identified and rank(Γ1) is at most q − 1, so the identifying restrictions are always satisfied.

In the case of 1 ≤ m < q, note that Â1:m is an estimator for the space spanned by A1, ..., Am, the identification of which requires the first m columns of Γ1 to be of full column rank. When k = m, the smallest q − m eigenvalues of Θ1′Θ1 are always zero, and the space spanned by A1, ..., Am is just-identified. When k > m, rank(Γ1) may be greater than m if the last q − m columns of Γ1 are not zero, so the identifying restrictions can be tested by implementing a rank test on Θ1. Once Θ̂11Â1:m is obtained, the estimation of Am is similar to that of Aq in the case of m = q. If ℓ ≥ m, then Am is computed under over-identifying restrictions, which can be tested by checking the rank of Θ11. The tests for identifying restrictions are discussed in Section 3.4.

²When m = 1, note that Â∗m = 1 and Â1:m = Â1.

Remark 3 (relation to the just-identified recursive scheme): The conventional identification scheme usually imposes a recursive structure in the first q rows of Γ. To be specific, the upper q × q block of Γ is a lower triangular matrix of full rank, i.e.,

Γ = [ γ11  0    ⋯  0   ]
    [ γ21  γ22  ⋯  0   ]
    [  ⋮    ⋮   ⋱  ⋮   ]
    [ γq1  γq2  ⋯  γqq ]
    [  ⋮    ⋮   ⋯  ⋮   ]
    [ γN1  γN2  ⋯  γNq ].  (2.22)

This identification scheme is considered by Bai and Li (2012), Bai et al. (2015), Bai and Ng (2013), and Forni and Gambetti (2010), among others. In addition to the q(q + 1)/2 restrictions given by ID1, Eq. (2.22) provides q(q − 1)/2 restrictions, so the matrix A is just-identified. Given Θ̂, the rotation matrix A can be computed by a QR decomposition of the transpose of the first q rows of Θ̂. See, for example, PC2 in Bai and Ng (2013).

To see the link between (2.22) and our new identification scheme, let us start with the last column of A. Note that γ1q = ... = γq−1,q = 0 provides q − 1 restrictions, so the first q − 1 rows of Γ in (2.22) correspond to Γ1 in (2.17). Since k = q − 1 and the first q − 1 rows of Γ are of rank q − 1, Aq is just-identified. Next, we consider Aq−1, i.e., m = q − 1 in our notation. Since Aq is just-identified, the space spanned by A1, ..., Aq−1 is also just-identified. The first ℓ rows of Γ in (2.22) correspond to Γ11 in (2.21). The (q − 1)-th column of Γ has q − 2 zero entries. Thus, ℓ = q − 2 and rank(Γ11) = q − 2 = m − 1. Similar to Aq, the estimation of Aq−1 (defined in step 2 of Algorithm 2) is under just-identified restrictions. Thus, Aq−1 (estimated by Â1:q−1Â∗q−1) is just-identified. The remaining columns of A are also just-identified and can be estimated by implementing Algorithm 2 repeatedly. Hence, our new identification scheme nests the commonly used just-identified recursive scheme as a special case.
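For completeness, the QR computation mentioned in Remark 3 can be sketched as follows; Θ̂ is replaced here by a random placeholder matrix, so the exercise only checks the algebra, not any estimation step:

```python
import numpy as np

rng = np.random.default_rng(3)
q, N = 3, 8
Theta = rng.normal(size=(N, q))             # placeholder for an estimated Theta

# QR of the transpose of the first q rows: Theta_q' = QR with R upper
# triangular, so A = Q is orthonormal and Theta_q A = R' is lower
# triangular, matching the recursive structure in (2.22).
Q, R = np.linalg.qr(Theta[:q].T)
A = Q
Gamma = Theta @ A
assert np.allclose(np.triu(Gamma[:q], k=1), 0.0, atol=1e-10)  # lower triangular block
assert np.allclose(A.T @ A, np.eye(q), atol=1e-10)            # A orthonormal
```

Note that, exactly as Remark 4 warns, this construction always delivers a lower triangular block regardless of whether the recursive restrictions are actually valid.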

Remark 4: In contrast to the conventional scheme in (2.22), both (2.17) and (2.21) allow over-identification and have the following advantages. First, the number of restrictions in a high-dimensional FAVAR can be large. For example, the monetary policy shock is often assumed to have no contemporaneous effects on slow-moving variables such as industrial production, unemployment, and CPIs, all of which consist of numerous sub-aggregate series in a typical data set for factor models. Using every valid restriction leads to over-identification and is expected to yield more efficient IRFs than a just-identified scheme. Second, just-identifying restrictions always yield an estimator with the lower triangular structure in (2.22), even if the restrictions are wrong. In contrast, because the validity of over-identifying restrictions can be tested, incorrect restrictions can be avoided in our estimation of IRFs, and the results are more reliable than those obtained under untestable just-identifying restrictions.

Remark 5: The identification schemes in this paper use a fixed number of restrictions. In contrast, the number of over-identifying restrictions in studies by Stock and Watson (2005) and Han (2015) is diverging and equal to NS times the number of fast-moving shocks, where NS is proportional to N and denotes the number of slow-moving variables. In practice, imposing a diverging number of restrictions increases the possibility of misspecification, so even the least restrictive setup (with only one fast-moving shock) may still involve incorrect restrictions due to a large NS, and no candidate setup can pass the specification test. Thus, this paper considers a fixed number of restrictions, which decrease the possibility of misspecification yet still preserve the advantages of over-identified schemes.

3 Assumptions and Asymptotic Theory

3.1 Assumptions

We make the following assumptions.

Assumption 1 – There exists a positive constant M < ∞ such that:
(a) E∥Ft∥⁴ < M, T^{−1} ∑_{t=1}^T FtFt′ →p ΣF, and T^{−1} ∑_{t=p+1}^T ℱtℱt′ →p Σℱ for some positive definite matrices ΣF and Σℱ.
(b) E(ζtζt′) = Iq, E∥ζt∥⁴ < M, E(ζsζt′) = 0 for any s ≠ t, and (T − p)^{−1} ∑_{t=p+1}^T ζtζt′ →p Iq.
(c) E∥T^{−1/2} ∑_{t=p+1}^T ζtℱt′∥² < M.

Assumption 2 – There exists a positive constant M < ∞ such that:
(a) E∥λi∥⁴ ≤ M, and ∥Λ′Λ/N − ΣΛ∥ →p 0 for some r × r positive definite matrix ΣΛ.
(b) rank(G) = q, ∥G∥ ≤ M, and ∥Φ∥ ≤ M. A is orthonormal.
(c) All of the roots of |Ir − Φ1L − ... − ΦpL^p| = 0 are outside the unit circle.
(d) Both ΣFΣΛ and G′ΣΛG have distinct eigenvalues.

Assumption 3 – There exists a positive constant M < ∞ such that for all N and T:
(a) E(eit) = 0, E(eit²) = σi², and E|eit|⁸ ≤ M for all i and t.
(b) E(es′et/N) = E(N^{−1} ∑_{i=1}^N eiseit) = γN(s, t), |γN(s, s)| ≤ M for all s, and T^{−1} ∑_{s=1}^T ∑_{t=1}^T |γN(s, t)| ≤ M.
(c) E(eiteh,t−j) = τih,t,j with |τih,t,j| ≤ |τih| for some τih and for all t and j = 0, ..., p. In addition, N^{−1} ∑_{i=1}^N ∑_{h=1}^N |τih| ≤ M.
(d) E(eitehs) = τih,ts, and (NT)^{−1} ∑_{i=1}^N ∑_{h=1}^N ∑_{s=1}^T ∑_{t=1}^T |τih,ts| ≤ M.
(e) For every (t, s), E|N^{−1/2} ∑_{i=1}^N [eiseit − E(eiseit)]|⁴ ≤ M.

Assumption 4 – The variables λi, ζt, and eit form three mutually independent groups.

Assumption 5 – There exists a positive constant M < ∞ such that for all N and T, and for every t ≤ T and i ≤ N:
(a) ∑_{s=1}^T |γN(s, t)| ≤ M.
(b) ∑_{h=1}^N |τih| ≤ M.

Assumption 6 – There exists a positive constant M < ∞ such that for all N and T:
(a) For each t, E∥(NT)^{−1/2} ∑_{s=1}^T ∑_{i=1}^N Fs[eiseit − E(eiseit)]∥² ≤ M and E∥(NT)^{−1/2} ∑_{s=p+1}^T ∑_{i=1}^N ζs[eiseit − E(eiseit)]∥² ≤ M.
(b) E∥(NT)^{−1/2} ∑_{t=1}^T ∑_{i=1}^N Ftλi′eit∥ ≤ M.
(c) E∥(NT)^{−1/2} ∑_{t=p+1}^T ∑_{i=1}^N λiei,t−jFt′∥² ≤ M and E∥(NT)^{−1/2} ∑_{t=p+1}^T ∑_{i=1}^N λiei,t−jζt′∥² ≤ M for j = 0, 1, ..., p.
(d) For each t, E∥N^{−1/2} ∑_{i=1}^N λieit∥² ≤ M.
(e) For each i, E∥T^{−1/2} ∑_{t=1}^T Fteit∥⁴ ≤ M and E∥T^{−1/2} ∑_{t=p+1}^T ζteit∥⁴ ≤ M. For each i and for j = 1, ..., p, E∥T^{−1/2} ∑_{t=p+1}^T Ft−j′eit∥⁴ ≤ M.

Assumptions 1–6 are either from or slight modifications of Assumptions A–G of Bai (2003). Assumption 1(a) regulates the moments of the static factors. The positive definiteness of Σℱ is the same as Assumption A10 of Amengual and Watson (2007). Assumption 1(b) imposes conditions on the structural shocks, which are serially uncorrelated and have an identity covariance matrix. Assumption 1(c) is not restrictive, as it is commonly assumed that ζt and lags of Ft are uncorrelated in VAR models. Assumption 2(a) follows from Assumption B of Bai (2003). Furthermore, for any nonsingular matrix A, there exists a nonsingular matrix DA such that GA = G̃Ã with Ã = DAA being orthonormal and G̃ = GDA^{−1}. Thus, setting A to orthonormal does not result in a loss of generality. Assumption 2(d) is similar to Assumption G of Bai (2003). It ensures the existence of the probability limits of the rotation matrices HF and Hη defined in (3.1) and (3.2). Assumptions 3 and 5 allow the idiosyncratic errors to be correlated in both the time and cross-section dimensions. Assumption 4 is similar to Assumption D of Bai and Ng (2004). Assumption 6 is not stringent because all of the sums in this assumption involve zero-mean random variables. It is close to Assumption F of Bai (2003) and Assumption 6 of Han (2015).
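The normalization argument behind Assumption 2(b) can be verified numerically. The sketch below uses one concrete choice, DA = (AA′)^{−1/2}, so that Ã = DAA is orthonormal while the product GA is unchanged; the text only asserts that some such DA exists, and this particular construction is our illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
r, q = 4, 3
G = rng.normal(size=(r, q))
A = rng.normal(size=(q, q)) + 2 * np.eye(q)   # a nonsingular, non-orthonormal A

# D_A = (AA')^{-1/2} via the spectral decomposition of AA'
w, V = np.linalg.eigh(A @ A.T)
D = V @ np.diag(w ** -0.5) @ V.T
A_tilde = D @ A                               # orthonormal rotation
G_tilde = G @ np.linalg.inv(D)                # rescaled loadings on the shocks

assert np.allclose(A_tilde @ A_tilde.T, np.eye(q))   # A_tilde is orthonormal
assert np.allclose(G_tilde @ A_tilde, G @ A)         # GA is unchanged
```

Since only the product GA = G̃Ã enters the reduced form, the data cannot distinguish (G, A) from (G̃, Ã), which is why orthonormality of A costs no generality.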

It is well known that the principal components consistently estimate the original factors up to a rotation. Let X0 ≡ [X1, ..., XT]′ be the full data matrix. We define the following rotation matrices for Ft and ηt:

HF = (Λ′Λ/N)(F′F̂/T)VX^{−1},  (3.1)

Hη = (Θ′Θ/N)(η′η̂/(T − p))V̂^{−1},  (3.2)

where F = [F1, ..., FT]′, F̂ = [F̂1, ..., F̂T]′, VX is the r × r diagonal matrix consisting of the first r largest eigenvalues of X0X0′/(NT) in decreasing order, and V̂ is the q × q diagonal matrix consisting of the first q eigenvalues of X̂X̂′/[N(T − p)] in descending order. By Lemma A3 and Proposition 1 of Bai (2003), the rotation matrices have well-defined probability limits. Thus, we define

H⁰F = plim_{N,T→∞} HF,  H⁰η = plim_{N,T→∞} Hη.  (3.3)

Lemma A1 of Bai (2003) shows that T^{−1} ∑_{t=1}^T ∥F̂t − HF′Ft∥² = Op(δNT^{−2}), where

δNT = min(√N, √T).  (3.4)

We can analogously define Hℱ such that ℱ̂t is a consistent estimate for Hℱ′ℱt. Recall that ℱt = [F′t−1, ..., F′t−p]′ and ℱ̂t = [F̂′t−1, ..., F̂′t−p]′. Let

Hℱ ≡ Ip ⊗ HF,  (3.5)

so that ℱ̂t is a consistent estimator for Hℱ′ℱt. The probability limit of Hℱ is defined as

H⁰ℱ = Ip ⊗ H⁰F.  (3.6)

3.2 Asymptotic Distributions of the IRFs

This subsection presents the asymptotic distributions of the IRFs estimated by Algorithms 1 and 2. We derive the asymptotic representations of Λ̂, Ψ̂s, Ĝ, Θ̂, Âq, Â1:m, and Â∗m, and then combine these results to obtain the asymptotic distributions of the IRFs. Let

e(1) = [e(1)1, ..., e(1)T]′ and e(1)t = [e1t, ..., ekt]′.  (3.7)

Thus, e(1)t denotes the idiosyncratic errors for the first k variables used for the identification conditions in (2.17) and (2.21). The following assumption is required to derive the asymptotic distributions of the proposed estimators.

Assumption 7 (Central Limit Theorem) – For any i = 1, ..., N,

T^{−1/2} [vec(e(1)′F)′, (F′ei)′, vec(ℱ′η)′]′ →d N(0_{(kr+r+rpq)×1}, Ωi).  (3.8)

Assumption 7 is simply the central limit theorem and can be confirmed under primitive assumptions. See, for example, White's (1984) Theorem 5.15 or Brockwell and Davis's (1991) Theorem 7.1.2.

Proposition 1 – Under Assumptions 1–7, if √T/N → 0 as N, T → ∞, then

√T vec(Λ̂1 − Λ1HF^{−1′}) = (HF′ ⊗ Ik) T^{−1/2} vec(e(1)′F) + op(1) →d N(0_{kr×1}, (HF′ ⊗ Ik)Ω(1)(HF ⊗ Ik)),  (3.9)

where Ω(1) denotes the rk × rk upper-left block of Ωi, i.e., the asymptotic variance of T^{−1/2} vec(e(1)′F).

Proposition 2
(a) Under Assumptions 1–6 and ID1, Ĝ − HF′GHη = Op(δNT^{−2}).
(b) Under Assumptions 1–7 and ID1, if √T/N → 0 as N, T → ∞, then

√T vec(Θ̂1 − Θ1Hη^{−1′}) = (G′HF ⊗ Ik) T^{−1/2} vec(e(1)′F) + op(1) →d N(0_{kq×1}, SΩ(1)S′),  (3.10)

where S = Hη′G′HFHF′ ⊗ Ik.

Proposition 3 – Under Assumptions 1–7, if √T/N → 0 as N, T → ∞, then for any i = 1, ..., N,

√T [vec(Λ̂1 − Λ1HF^{−1′})′, (λ̂i − HF^{−1}λi)′, vec(Ψ̂s′ − HF^{−1}Ψs′HF)′]′
  = W T^{−1/2} [vec(e(1)′F)′, (F′ei)′, vec(ℱ′η)′]′ + op(1) →d N(0_{(kr+r+r²)×1}, WΩiW′),  (3.11)

where

W = [ HF′ ⊗ Ik    0_{kr×r}   0_{kr×rpq}                    ]
    [ 0_{r×kr}    HF′        0_{r×rpq}                     ]
    [ 0_{r²×kr}   0_{r²×r}   Rs[HF′G ⊗ (ΣℱHℱ)^{−1}]       ]

and Rs = ∑_{j=1}^s (HF^{−1}Ψj−1HF ⊗ [HF^{−1}Ψ′s−jHF, HF^{−1}Ψ′s−j−1HF, ..., HF^{−1}Ψ′s−j−p+1HF]) with Ψ0 = Ir and Ψs = 0_{r×r} for s < 0.

Proposition 3 directly implies the asymptotic distribution of [vec(Λ̂1 − Λ1HF^{−1′})′, (λ̂i − HF^{−1}λi)′]′, which is summarized by the following corollary.

Corollary 1 – Under the assumptions of Proposition 3,

√T [vec(Λ̂1 − Λ1HF^{−1′})′, (λ̂i − HF^{−1}λi)′]′ →d N(0_{(kr+r)×1}, W1ΩiW1′),  (3.12)

where

W1 = [ HF′ ⊗ Ik   0_{kr×r}   0_{kr×rpq} ]
     [ 0_{r×kr}   HF′        0_{r×rpq}  ].

Propositions 1–3 provide the asymptotic distributions of Λ̂₁, λ̂_i, Θ̂₁, Ĝ, and Ψ̂_s. Proposition 2(a) shows that √T(Ĝ − H_F′GH_η) has a degenerate limiting distribution if √T/N → 0, so that H_F′GH_η can be replaced with Ĝ as if G were observed when √T is much smaller than N. The distribution in Proposition 2(b) is useful for obtaining the asymptotic representations of Â_q and Â_(1:m)Â_m, both of which are continuous functions of Θ̂₁. The results in Proposition 3 and Corollary 1 are applied to obtain the asymptotic distributions of the IRFs over time. The following theorem presents the result under the identification scheme specified by (2.17).

Theorem 1 – Under Assumptions ID1, ID2, and 1–7, and the condition that √T/N → 0 as N and T → ∞,

(a) Â_q is a consistent estimator of H_η′A_q and

√T(Â_q − H_η′A_q) = B₁√T vec(Θ̂₁ − Θ₁H_η^(−1)′) + o_p(1).

For the first k variables,

√T(Θ̂₁Â_q − Θ₁A_q) = B₂√T vec(Θ̂₁ − Θ₁H_η^(−1)′) + o_p(1) →d N(0_(k×1), B₂SΩ^(1)S′B₂′), (3.13)

where B₁ = A_q′H_η ⊗ [(−H_η⁻¹Θ₁′Θ₁H_η^(−1)′)⁺H_η⁻¹Θ₁′] with rank(B₁) = q − 1, B₂ = Θ₁H_η^(−1)′B₁ + (A_q′H_η) ⊗ I_k with rank(B₂) = k − q + 1, and S is defined in Proposition 2(b).

(b) For i = k + 1, ..., N,

√T(θ̂_i′Â_q − θ_i′A_q) = √T B₃^(i) [ vec(Λ̂₁ − Λ₁H_F^(−1)′) ; λ̂_i − H_F⁻¹λ_i ] + o_p(1) →d N(0, B₃^(i)W₁Ω_iW₁′B₃^(i)′), (3.14)

where C₁ = [I_kr ⋮ 0_(kr×r)], C₂ = [0_(r×kr) ⋮ I_r], and B₃^(i) = θ_i′H_η^(−1)′B₁(H_η′G′H_F ⊗ I_k)C₁ + A_q′G′H_F C₂.

(c) For the IRFs of X_it with respect to ζ_(q,t−s) (s ≥ 1),

√T(λ̂_i′Ψ̂_sĜÂ_q − λ_i′Ψ_sGA_q) = √T B₄^(i) [ vec(Λ̂₁ − Λ₁H_F^(−1)′) ; λ̂_i − H_F⁻¹λ_i ; vec(Ψ̂_s′ − H_F⁻¹Ψ_s′H_F) ] + o_p(1) →d N(0, B₄^(i)WΩ_iW′B₄^(i)′), (3.15)

where C₃ = [I_rk ⋮ 0_(rk×(r+1)r)], C₄ = [0_(r×rk) ⋮ I_r ⋮ 0_(r×r²)], C₅ = [0_(r²×(rk+r)) ⋮ I_r²], and B₄^(i) = λ_i′Ψ_sGH_η B₁(H_η′G′H_F ⊗ I_k)C₃ + A_q′G′Ψ_s′H_F C₄ + (λ_i′H_F^(−1)′ ⊗ A_q′G′H_F)C₅.

Next, we derive the asymptotic results for the IRFs computed in Algorithm 2. Let Â_(1:m) denote the eigenvectors associated with the first m eigenvalues of Θ̂₁′Θ̂₁, and let Â*_m be the eigenvector associated with the smallest eigenvalue of Â_(1:m)′Θ̂₁₁′Θ̂₁₁Â_(1:m), where Θ̂₁₁ denotes the first ℓ rows of Θ̂₁. Let

C₀ = [I_ℓ ⋮ 0_(ℓ×(k−ℓ))], (3.16)

so that Θ̂₁₁ = C₀Θ̂₁. Under Assumption ID3, rank(Θ₁) = m and rank(Θ₁₁) = m − 1. Hence, A_(1:m) spans the space of [A₁, ..., A_m], and the smallest eigenvalue of A_(1:m)′Θ₁₁′Θ₁₁A_(1:m) is zero, which implies that A*_m′A_(1:m)′Θ₁₁′Θ₁₁A_(1:m)A*_m = 0. If the sign of A*_m is appropriately selected, it follows that

A_m = A_(1:m)A*_m. (3.17)

Let α̂_j and α_j denote the j-th eigenvalues of Θ̂₁′Θ̂₁ and Θ₁′Θ₁, respectively (j ≤ m), and let A_j denote the j-th column of A_(1:m). Define

B₅ = (A_(1:m)′H_η ⊗ I_k) + (I_m ⊗ Θ₁H_η^(−1)′)Q[(K_q + I_q²)(I_q ⊗ H_η⁻¹Θ₁′)],
B₆ = A*_m′ ⊗ [(−A_(1:m)′Θ₁₁′Θ₁₁A_(1:m))⁺A_(1:m)′Θ₁₁′], (3.18)

where Q = [Q₁; ...; Q_m] with Q_j = plim_(N,T→∞) A_j′ ⊗ (α_jI_q − H_η⁻¹Θ₁′Θ₁H_η^(−1)′)⁺ for j = 1, ..., m.
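The construction Â_m = Â_(1:m)Â*_m in (3.17) can be sketched numerically as follows. This is our reading of Algorithm 2's final step under the ID3 rank structure; the function name and the toy Θ₁ are illustrative, not the paper's code.

```python
import numpy as np

def estimate_Am(Theta1_hat, ell, m):
    """Sketch of Algorithm 2's last step: A_m = A_{1:m} A*_m (names illustrative)."""
    w, V = np.linalg.eigh(Theta1_hat.T @ Theta1_hat)   # ascending eigenvalues
    A_1m = V[:, ::-1][:, :m]                           # eigenvectors of the m largest
    Theta11 = Theta1_hat[:ell, :]
    w2, V2 = np.linalg.eigh(A_1m.T @ Theta11.T @ Theta11 @ A_1m)
    A_star = V2[:, 0]                                  # smallest-eigenvalue eigenvector
    return A_1m @ A_star                               # q-vector: direction of shock m

# Toy Theta1 satisfying ID3: rank(Theta1) = m and rank(Theta11) = m - 1.
rng = np.random.default_rng(1)
k, ell, q, m = 6, 4, 5, 4
Theta1 = np.zeros((k, q))
Theta1[:ell, :m - 1] = rng.standard_normal((ell, m - 1))
Theta1[ell:, :m] = rng.standard_normal((k - ell, m))
Theta11 = Theta1[:ell, :]
am = estimate_Am(Theta1, ell, m)
# The first ell variables then have zero contemporaneous response to the m-th shock:
print(np.allclose(Theta11 @ am, 0))                    # True
```

Under the exact rank structure the smallest eigenvalue is zero, so the zero restrictions on the first ℓ variables hold exactly, mirroring the identity A*_m′A_(1:m)′Θ₁₁′Θ₁₁A_(1:m)A*_m = 0.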

Theorem 2 – Under Assumptions ID1, ID3, and 1–7, and the condition that √T/N → 0 as N and T → ∞,

(a) Â_m consistently estimates A*_m. The IRFs of the first k variables of X_t with respect to ζ_mt (m < q) have the following asymptotic distribution:

√T(Θ̂₁Â_(1:m)Â_m − Θ₁A_m) = B₇√T vec(Θ̂₁ − Θ₁H_η^(−1)′) + o_p(1) →d N(0_(k×1), B₇SΩ^(1)S′B₇′), (3.19)

where B₇ = [(A*_m′ ⊗ I_k) + Θ₁A_(1:m)B₆(I_m ⊗ C₀)]B₅ with C₀ defined in (3.16).

(b) Â_(1:m)Â_m consistently estimates H_η′A_m and

√T vec(Â_(1:m)Â_m − H_η′A_m) = B₈√T vec(Θ̂₁ − Θ₁H_η^(−1)′) + o_p(1),

where B₈ = (A*_m′ ⊗ I_q)Q[(K_q + I_q²)(I_q ⊗ H_η⁻¹Θ₁′)] + H_η′A_(1:m)B₆(I_m ⊗ C₀)B₅. For i = k + 1, ..., N, the IRFs of the i-th variable of X_t with respect to ζ_mt (m < q) have the following asymptotic representation:

√T(θ̂_i′Â_(1:m)Â_m − θ_i′A_m) = B₉^(i)√T [ vec(Λ̂₁ − Λ₁H_F^(−1)′) ; λ̂_i − H_F⁻¹λ_i ] + o_p(1) →d N(0, B₉^(i)W₁Ω_iW₁′B₉^(i)′), (3.20)

where B₉^(i) = θ_i′H_η^(−1)′B₈(H_η′G′H_F ⊗ I_k)C₁ + A_m′G′H_F C₂, with C₁ and C₂ defined in Theorem 1(b).


(c) For the IRFs of X_it with respect to ζ_(m,t−s) (m < q, s ≥ 1),

√T(λ̂_i′Ψ̂_sĜÂ_(1:m)Â_m − λ_i′Ψ_sGA_m) = B₁₀^(i)√T [ vec(Λ̂₁ − Λ₁H_F^(−1)′) ; λ̂_i − H_F⁻¹λ_i ; vec(Ψ̂_s′ − H_F⁻¹Ψ_s′H_F) ] + o_p(1) →d N(0, B₁₀^(i)WΩ_iW′B₁₀^(i)′), (3.21)

where B₁₀^(i) = λ_i′Ψ_sGH_ηB₈(H_η′G′H_F ⊗ I_k)C₃ + A_m′G′Ψ_s′H_F C₄ + (λ_i′H_F^(−1)′ ⊗ A_m′G′H_F)C₅, with C₃, C₄, and C₅ defined in Theorem 1(c).

Note that the asymptotic variance of √T(Θ̂₁Â_q − Θ₁A_q) in Theorem 1(a) is not of full rank because q − 1 of the k zero restrictions are used for the identification of A_q. When k = q − 1, A_q is just-identified, so Θ̂₁Â_q and its asymptotic variance are exactly equal to zero. The asymptotic variance of √T(Θ̂₁Â_(1:m)Â_m − Θ₁A_m) in Theorem 2(a) is also not of full rank. More specifically, the first ℓ variables provide m − 1 restrictions for the identification of A*_m. Hence, the rank of the variance of √T(Θ̂₁₁Â_(1:m)Â_m − Θ₁₁A_m) (i.e., of the contemporaneous IRFs of the first ℓ variables) is no greater than ℓ − m + 1. This result is omitted from Theorem 2 to save space, but it is included in the proof of Theorem 2(a).

Theorems 1 and 2 establish the consistency and asymptotic normality of the dynamic IRFs computed by Algorithms 1 and 2. Note that our estimators consistently estimate the IRFs without any rotation, although Â_q and Â_(1:m)Â_m are consistent estimators of H_η′A_q and H_η′A_m, respectively. One does not need H_η or H_F to estimate the asymptotic variances of the estimated IRFs. Hence, Theorems 1 and 2 are directly applicable for frequentist inference in empirical analysis. The next subsection provides practical guidance for implementing Theorems 1 and 2.
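As a concrete illustration of the identification step that feeds into this inference, the following sketch computes Â_q from an estimated Θ̂₁ as the eigenvector associated with the smallest eigenvalue of Θ̂₁′Θ̂₁ — our reading of Algorithm 1's construction, consistent with the estimator B̂₁ in Section 3.3. The toy Θ₁ below is simulated, not from the paper.

```python
import numpy as np

# Toy Theta1 with k = 6 zero restrictions in its last column, so rank(Theta1) = q - 1
# and the direction A_q is (over-)identified; all values are illustrative.
rng = np.random.default_rng(2)
k, q = 6, 5
Theta1 = np.column_stack([rng.standard_normal((k, q - 1)), np.zeros(k)])

w, V = np.linalg.eigh(Theta1.T @ Theta1)   # ascending eigenvalues
A_q_hat = V[:, 0]                          # eigenvector of the smallest eigenvalue
irf0 = Theta1 @ A_q_hat                    # contemporaneous IRFs of the first k variables
print(np.allclose(irf0, 0))                # True: the zero restrictions hold exactly here
```

In a finite sample Θ̂₁ is noisy, so Θ̂₁Â_q is close to but not exactly zero; Theorem 1(a) gives its (rank-deficient) asymptotic variance.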

3.3 Implementation Guidance for the Inference of Estimated IRFs

To conduct statistical inference on the estimated IRFs, we propose the following estimators for the asymptotic variances in Theorems 1 and 2. The idiosyncratic errors can be consistently estimated as

ê_t = X_t − Λ̂F̂_t. (3.22)

Let ê_t^(1) and ê_it denote the first k elements and the i-th element of ê_t, respectively. Let

Σ̂^(1) = (T − r)⁻¹ Σ_(t=1)^T ê_t^(1)ê_t^(1)′. (3.23)

Since F and e are independent by Assumption 4, we propose estimating SΩ^(1)S′ by

(Ĝ′Ĝ) ⊗ Σ̂^(1). (3.24)

For the inference on the IRFs in Theorems 1(b), 1(c), 2(b), and 2(c), we estimate WΩ_iW′ by

(T − r)⁻¹ Σ_(t=p+1)^T ξ̂_t^(i)ξ̂_t^(i)′, (3.25)

where

ξ̂_t^(i) = [ vec(ê_t^(1)F̂_t′) ; F̂_t ê_it ; R̂_s(Ĝ ⊗ (𝐅̂′𝐅̂/(T − p))⁻¹)vec(𝐅̂_(t−1)η̂_t′) ]

and R̂_s = Σ_(j=1)^s (Ψ̂_(j−1) ⊗ [Ψ̂_(s−j)′, Ψ̂_(s−j−1)′, ..., Ψ̂_(s−j−p+1)′]) with Ψ̂₀ = I_r and Ψ̂_s = 0_(r×r) for s < 0. The consistency of Σ̂^(1) and the estimator in (3.25) holds if e_t is serially uncorrelated.³ When e_t is serially correlated, HAC estimators of the asymptotic variances can be readily constructed following Bai (2003) and Han and Inoue (2014).⁴
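The estimators (3.22)–(3.24) amount to a few matrix operations. The sketch below is illustrative: the function name, argument names, and toy shapes are assumptions, not the paper's code.

```python
import numpy as np

def variance_pieces(X, F_hat, Lam_hat, G_hat, k, r):
    """Sketch of (3.22)-(3.24). Shapes: X is T x N, F_hat is T x r,
    Lam_hat is N x r, G_hat is r x q; all names are illustrative."""
    T = X.shape[0]
    e_hat = X - F_hat @ Lam_hat.T                        # (3.22): residuals
    Sigma1 = e_hat[:, :k].T @ e_hat[:, :k] / (T - r)     # (3.23)
    SOS = np.kron(G_hat.T @ G_hat, Sigma1)               # (3.24): kq x kq estimate
    return e_hat, Sigma1, SOS

# Toy shapes only; no claim about the paper's data.
T, N, r, q, k = 50, 20, 4, 3, 6
rng = np.random.default_rng(3)
e_hat, Sigma1, SOS = variance_pieces(rng.standard_normal((T, N)),
                                     rng.standard_normal((T, r)),
                                     rng.standard_normal((N, r)),
                                     rng.standard_normal((r, q)), k, r)
```

The Kronecker form in (3.24) exploits the independence of F and e: the estimated variance of vec(Θ̂₁ − Θ₁H_η^(−1)′) factors into a G-part and a Σ^(1)-part.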

The constant matrices Q, B_s for s = 1, 2, 5, 6, 7, 8, and B_s^(i) for s = 3, 4, 9, 10 can be readily estimated by replacing the unknown parameters with their finite-sample analogs. Based on (3.3), Propositions 1–3, and Theorems 1–2, we know that Θ̂₁, θ̂_i, λ̂_i, Ψ̂_s, Ĝ, Â_q, Â_(1:m), and Â_m consistently estimate Θ₁H_η^(−1)′, H_η⁻¹θ_i, H_F⁻¹λ_i, H_F′Ψ_sH_F^(−1)′, H_F′GH_η, H_η′A_q, H_η′A_(1:m), and A*_m, respectively. Furthermore, H_η is orthonormal by Lemma 4 in the appendix. Hence, we propose the following estimators for the constant matrices used in Theorems 1 and 2:

Q̂ = [Q̂₁′, ..., Q̂_m′]′ with Q̂_j = Â_j′ ⊗ (α̂_jI_q − Θ̂₁′Θ̂₁)⁺;
B̂₁ = Â_q′ ⊗ [(−Θ̂₁′Θ̂₁)⁺Θ̂₁′], B̂₂ = Θ̂₁B̂₁ + Â_q′ ⊗ I_k, B̂₃^(i) = θ̂_i′B̂₁(Ĝ′ ⊗ I_k)C₁ + Â_q′Ĝ′C₂;
B̂₄^(i) = λ̂_i′Ψ̂_sĜB̂₁(Ĝ′ ⊗ I_k)C₃ + Â_q′Ĝ′Ψ̂_s′C₄ + (λ̂_i′ ⊗ Â_q′Ĝ′)C₅;
B̂₅ = (Â_(1:m)′ ⊗ I_k) + (I_m ⊗ Θ̂₁)Q̂[(K_q + I_q²)(I_q ⊗ Θ̂₁′)];
B̂₆ = Â_m′ ⊗ [(−Â_(1:m)′Θ̂₁₁′Θ̂₁₁Â_(1:m))⁺Â_(1:m)′Θ̂₁₁′];
B̂₇ = [(Â_m′ ⊗ I_k) + Θ̂₁Â_(1:m)B̂₆(I_m ⊗ C₀)]B̂₅;
B̂₈ = (Â_m′ ⊗ I_q)Q̂[(K_q + I_q²)(I_q ⊗ Θ̂₁′)] + Â_(1:m)B̂₆(I_m ⊗ C₀)B̂₅;
B̂₉^(i) = θ̂_i′B̂₈(Ĝ′ ⊗ I_k)C₁ + Â_m′Â_(1:m)′Ĝ′C₂;
B̂₁₀^(i) = λ̂_i′Ψ̂_sĜB̂₈(Ĝ′ ⊗ I_k)C₃ + Â_m′Â_(1:m)′Ĝ′Ψ̂_s′C₄ + (λ̂_i′ ⊗ Â_m′Â_(1:m)′Ĝ′)C₅.

³The proof of the consistency of the variance estimators in (3.24) and (3.25) mainly follows Lemma B3 of Han (2015).
⁴The consistency of HAC estimators involving estimated factors is proved by, for example, Theorem 2 of Han and Inoue (2014).

3.4 Testing the Over-identifying Restrictions

One major advantage of our estimators is that they allow us to test the validity of the over-identifying restrictions on the factor loading matrix. To test the over-identifying restrictions on Γ₁ in (2.17) when k > q − 1, we can check the rank of the estimator Θ̂₁. Consider the following hypotheses:

H₀^(d): rank(Θ₁) = d versus H₁^(d): rank(Θ₁) > d. (3.26)

Given the asymptotic distribution of Θ̂₁ (see Proposition 2), we can implement Kleibergen and Paap's test (2006, hereafter KP test). Define

Ξ̂ = Σ̂^(1)(−1/2)Θ̂₁(Ĝ′Ĝ)^(−1/2). (3.27)

Via the singular value decomposition, we decompose Ξ̂ as

Ξ̂ = ÛD̂V̂′ = [ Û₁₁ , Û₁₂ ; Û₂₁ , Û₂₂ ]·[ D̂₁ , 0 ; 0 , D̂₂ ]·[ V̂₁₁′ , V̂₂₁′ ; V̂₁₂′ , V̂₂₂′ ], (3.28)

where Û′Û = I_k, V̂′V̂ = I_q, and D̂ is a k × q matrix with the singular values of Ξ̂ on its main diagonal and zeros elsewhere; Û₁₁, D̂₁, and V̂₁₁ are d × d matrices; Û₁₂ and Û₂₁′ are d × (k − d) matrices; V̂₁₂ and V̂₂₁′ are d × (q − d) matrices; Û₂₂ is (k − d) × (k − d); V̂₂₂ is (q − d) × (q − d); and D̂₂ is (k − d) × (q − d). Define

M̂_(d,1) = [ Û₁₂ ; Û₂₂ ]Û₂₂⁻¹(Û₂₂Û₂₂′)^(1/2),
M̂_(d,2) = (V̂₂₂V̂₂₂′)^(1/2)(V̂₂₂′)⁻¹[ V̂₁₂′ ⋮ V̂₂₂′ ],
ρ̂_d = (M̂_(d,2) ⊗ M̂_(d,1)′)vec(Ξ̂). (3.29)

Given the results in Proposition 2, we obtain the following theorem.

Theorem 3 – Under H₀^(d), Assumptions 1–7, ID1, and the condition that √T/N → 0 as N and T → ∞, it follows that

√T ρ̂_d →d N(0, I_((q−d)(k−d))),  ŵ_d = T ρ̂_d′ρ̂_d →d χ²_((k−d)(q−d)). (3.30)

Hence, if ID2 holds, then d = q − 1, and under the null hypothesis H₀^(q−1), (3.30) implies

ŵ_(q−1) →d χ²_(k−q+1). (3.31)
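The statistic ŵ_d in (3.29)–(3.30) can be sketched directly from a singular value decomposition. This follows our reading of the Kleibergen and Paap (2006) construction; the helper names are hypothetical, and the returned statistic is to be compared with a χ²_((k−d)(q−d)) critical value.

```python
import numpy as np

def sym_sqrt(A):
    """Symmetric square root of a positive semidefinite matrix."""
    w, V = np.linalg.eigh(A)
    return V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T

def kp_rank_stat(Xi, d, T):
    """KP-type statistic w_d of (3.29)-(3.30) for H0: rank = d, given the
    scaled k x q matrix Xi from (3.27). Returns (w_d, degrees of freedom)."""
    k, q = Xi.shape
    U, s, Vt = np.linalg.svd(Xi)
    V = Vt.T
    U22, V22 = U[d:, d:], V[d:, d:]
    M1 = U[:, d:] @ np.linalg.inv(U22) @ sym_sqrt(U22 @ U22.T)     # k x (k-d)
    M2 = sym_sqrt(V22 @ V22.T) @ np.linalg.inv(V22.T) @ Vt[d:, :]  # (q-d) x q
    rho = (M1.T @ Xi @ M2.T).ravel(order="F")   # rho_d = (M2 kron M1') vec(Xi)
    return T * (rho @ rho), (k - d) * (q - d)

# For a matrix of exact rank 2 the statistic is numerically zero under H0: d = 2.
rng = np.random.default_rng(4)
Xi = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 5))
wd, dof = kp_rank_stat(Xi, d=2, T=200)
print(wd < 1e-8, dof)   # True 12
```

The identity (M₂ ⊗ M₁′)vec(Ξ̂) = vec(M₁′Ξ̂M₂′) is used to avoid forming the Kronecker product explicitly.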

Remark 6: An alternative way to test H₀^(q−1) is to use the asymptotic distribution of √T(Θ̂₁Â_q − Θ₁A_q). Based on Theorem 1(a), it follows that

T Â_q′Θ̂₁′[B̂₂((Ĝ′Ĝ) ⊗ Σ̂^(1))B̂₂′]⁺Θ̂₁Â_q →d χ²_(k−q+1). (3.32)

The statistic in (3.32) is asymptotically equivalent to ŵ_(q−1) under H₀^(q−1), but it cannot be applied to test H₀^(d) when d < q − 1. Thus, we consider only the more general test ŵ_d in the rest of the paper.

Next, we consider testing the over-identifying restrictions in (2.21). Recall that ID3 imposes two rank conditions: rank(Γ₁) = m and rank(Γ₁₁) = m − 1. We could test the ranks of Θ₁ and Θ₁₁ separately using the KP test defined in (3.30). However, this method would not control the size for jointly testing two rank conditions. Thus, we propose a joint test of the following hypotheses:

H₀^(d)′: rank(Θ₁) = d and rank(Θ₁₁) = d − 1 versus H₁^(d)′: rank(Θ₁) > d or rank(Θ₁₁) > d − 1. (3.33)

Note that rank(Θ₁₁A_(1:m)) = d − 1 under H₀^(d)′, so our test statistic checks the rank of Θ̂₁₁Â_(1:m) in addition to that of Θ̂₁. Define

Υ̂ = Σ̂₁₁^(1)(−1/2)Θ̂₁₁Â_(1:m), (3.34)

where Σ̂₁₁^(1) denotes the upper-left ℓ × ℓ block of Σ̂^(1), corresponding to the errors of the first ℓ variables. Via the singular value decomposition, we decompose Υ̂ as

Υ̂ = Û^(ℓ)D̂^(ℓ)V̂^(ℓ)′ = [ Û₁₁^(ℓ) , Û₁₂^(ℓ) ; Û₂₁^(ℓ) , Û₂₂^(ℓ) ]·[ D̂₁^(ℓ) , 0 ; 0 , D̂₂^(ℓ) ]·[ V̂₁₁^(ℓ)′ , V̂₂₁^(ℓ)′ ; V̂₁₂^(ℓ)′ , V̂₂₂^(ℓ)′ ],

where Û^(ℓ)′Û^(ℓ) = I_ℓ, V̂^(ℓ)′V̂^(ℓ) = I_m, and D̂^(ℓ) is an ℓ × m matrix with the singular values of Υ̂ on its main diagonal and zeros elsewhere; Û₁₁^(ℓ), D̂₁^(ℓ), and V̂₁₁^(ℓ) are (d − 1) × (d − 1) matrices; Û₁₂^(ℓ) and Û₂₁^(ℓ)′ are (d − 1) × (ℓ − d + 1) matrices; V̂₁₂^(ℓ) and V̂₂₁^(ℓ)′ are (d − 1) × (m − d + 1) matrices; Û₂₂^(ℓ) is (ℓ − d + 1) × (ℓ − d + 1); V̂₂₂^(ℓ) is (m − d + 1) × (m − d + 1); and D̂₂^(ℓ) is (ℓ − d + 1) × (m − d + 1). We define

M̂^(ℓ)_(d,1) = [ Û₁₂^(ℓ) ; Û₂₂^(ℓ) ](Û₂₂^(ℓ))⁻¹(Û₂₂^(ℓ)Û₂₂^(ℓ)′)^(1/2),
M̂^(ℓ)_(d,2) = (V̂₂₂^(ℓ)V̂₂₂^(ℓ)′)^(1/2)(V̂₂₂^(ℓ)′)⁻¹[ V̂₁₂^(ℓ)′ ⋮ V̂₂₂^(ℓ)′ ],
τ̂_d = (M̂^(ℓ)_(d,2) ⊗ M̂^(ℓ)_(d,1)′)vec(Υ̂),
B̂_d = [ (M̂_(d,2) ⊗ M̂_(d,1)′)((Ĝ′Ĝ)^(−1/2) ⊗ Σ̂^(1)(−1/2)) ; (M̂^(ℓ)_(d,2) ⊗ M̂^(ℓ)_(d,1)′Σ̂₁₁^(1)(−1/2)C₀)B̂₅ ]. (3.35)

The joint test statistic is then computed as

ŵ_d^joint = T[ρ̂_d′, τ̂_d′]·(B̂_d((Ĝ′Ĝ) ⊗ Σ̂^(1))B̂_d′)⁻¹·[ρ̂_d′, τ̂_d′]′. (3.36)

The term τ̂_d is constructed in the same manner as ρ̂_d and is designed to test the rank of Θ̂₁₁Â_(1:m). The joint statistic essentially combines the two tests of rank(Θ₁) = d and rank(Θ₁₁A_(1:m)) = d − 1, given that both of their distributions depend on that of Θ̂₁. The matrix B̂_d is the bridge connecting the distributions of (ρ̂_d′, τ̂_d′)′ and vec(Θ̂₁). To establish the theoretical result for the joint test, we make the following assumption.

Assumption 8 – The matrix B_d(SΩ^(1)S′)B_d′ is asymptotically nonsingular.

This assumption is similar to Assumption 2 of Kleibergen and Paap (2006) and ensures the invertibility of the covariance matrix. Theorem 4 presents the asymptotic distribution of ŵ_d^joint.

Theorem 4 – Under H₀^(d)′, Assumptions ID1 and 1–8, if √T/N → 0 and Γ₁′Γ₁ has distinct nonzero eigenvalues, then

ŵ_d^joint →d χ²_((k−d)(q−d)+(ℓ−d+1)). (3.37)

Thus, under the identification assumption ID3, the joint statistic satisfies

ŵ_m^joint →d χ²_((k−m)(q−m)+(ℓ−m+1)). (3.38)

In practice, the true value of m is unknown. We propose the following testing procedure to determine m.

Algorithm 3: Start with a value 0 ≤ m̃ ≤ q − 1 and test the null hypothesis H₀^(m̃)′: rank(Θ₁) = m̃ and rank(Θ₁₁) = m̃ − 1 against the alternative hypothesis H₁^(m̃)′: rank(Θ₁) > m̃ or rank(Θ₁₁) > m̃ − 1. If the null is not rejected at level α, set m̂ = m̃. If the null is rejected at level α, test H₀^(m̃+1)′ against H₁^(m̃+1)′. Repeat the procedure until some H₀^(j)′ is not rejected at level α, and set m̂ = j.

Algorithm 3 selects the true m with probability approaching 1 − α. If the null hypothesis with m̃ = q − 1 is rejected, the identification assumption is incorrect, and one should consider changing the variables used for identification.
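Algorithm 3 is a simple sequential loop. The sketch below assumes a hypothetical callback pvalue_fn(m) returning the p-value of the joint test of H₀^(m)′ (e.g., computed from ŵ_m^joint and its χ² limit); it is illustrative, not the author's code.

```python
def select_m(pvalue_fn, q, alpha=0.05, start=0):
    """Sketch of Algorithm 3: test m = start, start+1, ..., q-1 in turn and stop
    at the first non-rejection. Returns the selected m_hat, or None if even
    m = q - 1 is rejected (the identification assumption is then suspect)."""
    for m in range(start, q):
        if pvalue_fn(m) >= alpha:   # fail to reject H0^{(m)'} -> accept this m
            return m
    return None

# Example: the joint test "rejects" for m < 3 and does not reject for m >= 3.
m_hat = select_m(lambda m: 0.01 if m < 3 else 0.40, q=5)
print(m_hat)   # 3
```

Returning None mirrors the paper's advice: if every null up to m̃ = q − 1 is rejected, the identifying variables should be reconsidered.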

Remark 7: Consider the case k = ℓ + 1 in (2.21). Assume that the true data generating process satisfies all of the zero restrictions except those in the last q − m columns of the (ℓ + 1)-th row, i.e.,

Γ₁ = [ γ₁₁ ⋯ γ₁,m−1 , 0 , 0_(1×(q−m)) ;
       ⋮ ;
       γ_ℓ1 ⋯ γ_ℓ,m−1 , 0 , 0_(1×(q−m)) ;
       γ_ℓ+1,1 ⋯ γ_ℓ+1,m−1 , γ_ℓ+1,m , ∗_(1×(q−m)) ]_(k×q), (3.39)

where the asterisks denote nonzero entries. In this scenario, the rank conditions in ID3 continue to hold, but the zero restrictions in ID3 do not. The incorrect restrictions denoted by the asterisks in (3.39) cannot be tested, and the estimated IRFs are in general inconsistent. Recall that the just-identified scheme in (2.22) always has k = ℓ + 1 (see Remark 3). This again demonstrates the advantage of using over-identifying restrictions. In practice, we recommend using k > ℓ + 1 whenever possible to avoid the untestable restrictions (denoted by the asterisks) in (3.39).

4 Simulations

This section investigates the finite-sample performance of the proposed estimators. We consider the following data generating processes (DGPs). Similar to Bai et al. (2015), we specify a VAR process for the dynamic factors:

f_t = φI_q·f_(t−1) + ζ_t, (4.1)

where the structural shocks ζ_t ∼ N(0_(q×1), I_q) and φ = 0.7. The static factors are generated as F_t = [f_t′, f_(1,t−1), ..., f_(r−q,t−1)]′ with r ≤ 2q for t = 1, ..., T, so some lags of f_t are included as static factors. The idiosyncratic errors are generated as e_it ∼ i.i.d. N(0, r) for i = 1, ..., N and t = 1, ..., T. Eq. (4.1) implies that

F_t = Φ₁F_(t−1) + GAζ_t,

where Φ₁ = [ φI_q , 0_(q×(r−q)) ; J , 0_((r−q)×(r−q)) ] with J = [I_(r−q), 0_((r−q)×(2q−r))] selecting the first r − q elements of f_(t−1), and GA = [ I_q ; 0_((r−q)×q) ]. We set r = 7 and q = 5. The number of replications is 5,000.

Recall that Γ = ΛGA. By the design of GA, Γ equals the first q columns of Λ. The elements of the last r − q columns of Λ are drawn from i.i.d. N(0, 1). The first q columns of Λ (i.e., Γ) are generated according to the following structure:

Γ = [ Γ₁ ; Γ₂ ],

Γ₁ = [ ∗ ⋯ ∗ , 0 , 0_(1×(q−m)) ; ⋮ ; ∗ ⋯ ∗ , 0 , 0_(1×(q−m)) ; ∗ ⋯ ∗ , ι , 0_(1×(q−m)) ; ⋮ ; ∗ ⋯ ∗ , ∗ , 0_(1×(q−m)) ]_(k×q),  Γ₂ = [ ∗ ⋯ ∗ , ι ; ∗ ⋯ ∗ , ∗ ; ⋮ ; ∗ ⋯ ∗ , ∗ ]_((N−k)×q), (4.2)

where the asterisks are nonzero entries drawn from i.i.d. N(0, 1) and ι is a fixed constant whose sign is assumed known in order to pin down the signs of A_q and A_(1:m)A_m. In Γ₁, the first ℓ rows are zero in columns m through q, and rows ℓ + 1 through k are zero in columns m + 1 through q, with the (ℓ + 1, m) entry equal to ι; the first row of Γ₂ has ι in its last column. The observables are then generated as

X_t = ΛF_t + e_t = [Γ ⋮ Λ_(r−q+1:r)]F_t + e_t, (4.3)

where Λ_(r−q+1:r) denotes the last r − q columns of Λ. We set m ∈ {3, 4}, (k, ℓ) ∈ {(5, 4), (6, 4), (6, 5), (7, 4), (7, 5)}, and ι = 1.5 in the simulations. For m = 4, all of the (k, ℓ) combinations considered in our simulations lead to over-identification of A_q and A_m. For m = 3, A_m is over-identified, but A_q is not identified.
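The DGP in (4.1)–(4.3) can be sketched as follows; this is our reading of the design, and simulate_dgp with its default arguments is illustrative, not the author's simulation code.

```python
import numpy as np

def simulate_dgp(N=125, T=250, r=7, q=5, k=6, ell=4, m=4, phi=0.7, iota=1.5, seed=0):
    """Our reading of the DGP in (4.1)-(4.3). Returns X (T x N), F (T x r), Lambda."""
    rng = np.random.default_rng(seed)
    f = np.zeros((T + 1, q))
    for t in range(1, T + 1):                       # (4.1): f_t = phi f_{t-1} + zeta_t
        f[t] = phi * f[t - 1] + rng.standard_normal(q)
    F = np.column_stack([f[1:], f[:-1, : r - q]])   # F_t = [f_t', f_{1,t-1},...,f_{r-q,t-1}]'
    Gamma = rng.standard_normal((N, q))             # zero pattern of (4.2):
    Gamma[:ell, m - 1:] = 0.0                       #   rows 1..ell: cols m,...,q zero
    Gamma[ell:k, m:] = 0.0                          #   rows ell+1..k: cols m+1,...,q zero
    Gamma[ell, m - 1] = iota                        #   iota pins down the sign of A_m
    Gamma[k, q - 1] = iota                          #   iota in Gamma_2 pins down A_q
    Lam = np.column_stack([Gamma, rng.standard_normal((N, r - q))])
    e = np.sqrt(r) * rng.standard_normal((T, N))    # e_it ~ N(0, r)
    X = F @ Lam.T + e                               # (4.3)
    return X, F, Lam

X, F, Lam = simulate_dgp()
```

With the default arguments this produces one draw of the (k, ℓ) = (6, 4), m = 4 design.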

We first investigate the estimation accuracy under just-identifying and over-identifying restrictions. For q = 5 and m = 4, we would need (k, ℓ) = (4, 3) to achieve just-identification of A_q and A_m by Remark 2. Under our DGP with m = 4, it is always possible to use a subset of the valid restrictions so that A_q and A_m are just-identified.⁵ Table 1 reports the ratio of the root mean square errors (RMSEs) of Θ̂Â_q and Θ̂Â_(1:m)Â_m under all of the available restrictions in Γ₁ to the RMSEs of Θ̂Â_q and Θ̂Â_(1:m)Â_m under a subset of just-identifying restrictions. All of the ratios in Table 1 are well below one, indicating that adding a few over-identifying restrictions can substantially improve the accuracy of the estimators (i.e., of the contemporaneous IRFs). For instance, using k = 5 (i.e., a degree of over-identification of one) decreases the RMSE of Θ̂Â_q by more than 30% relative to the just-identified scheme for N = 125 and T = 250. The efficiency gain increases as N and T increase. In addition, Θ̂Â_q becomes more accurate as k increases, and Θ̂Â_(1:m)Â_m becomes more accurate as either k or ℓ increases. Thus, Table 1 shows that the proposed estimators based on over-identifying restrictions can significantly improve estimation accuracy relative to the conventional estimators under the just-identified scheme.

Tables 2 and 3 focus on the finite-sample performance of the asymptotic distributions established in Theorems 1 and 2. We report the frequencies with which the t-statistics fall outside the interval [−1.96, 1.96]. Table 2 summarizes the results for the IRFs with respect to the q-th and m-th shocks when m = 4, and Table 3 summarizes the results for the IRFs with respect to the m-th shock when m = 3. We consider the horizon s = 5 for the dynamic IRFs. In general, the asymptotic approximation works well in finite samples, and the rejection frequencies approach 5% as N and T increase. In addition, the numbers in Table 2 (m = 4) are closer to 5% than those in Table 3. This is consistent with the following intuition: a smaller m means more zero eigenvalues of Θ₁′Θ₁, but the eigenvalues of its finite-sample analog Θ̂₁′Θ̂₁ are never exactly zero, so a smaller m introduces larger estimation errors (relative to the prediction of the asymptotic distributions) in small samples. However, as N and T increase, all of the rejection frequencies in Table 3 improve and converge to the nominal level.

We also study the finite-sample performance of the ŵ_d and ŵ_d^joint tests proposed in Section 3.4. We use ŵ_m and ŵ_m^joint to test the null hypothesis that rank(Θ₁) = m and the joint null hypothesis that rank(Θ₁) = m and rank(Θ₁₁) = m − 1, respectively. The nominal size is 5%. It is not surprising that the magnitude of the size distortion is positively correlated with the degrees of freedom of the asymptotic χ² distribution. For m = 3 and (k, ℓ) = (7, 5), the joint test statistic is asymptotically χ²₁₁, and its rejection frequency is 10.6% when (N, T) = (125, 250) and 7.5% when (N, T) = (250, 500). Thus, the size distortion vanishes as N and T increase. In general, the empirical size of the proposed tests is acceptable for the sample sizes considered in the simulations.

⁵For example, we use only the restrictions in the 3rd–6th rows of Γ₁ for the case (k, ℓ) = (7, 5).

In the last experiment, we investigate the power of the ŵ_m and ŵ_m^joint tests. The data are generated as before, except that all of the zero entries of Γ₁ in (4.2) are replaced with random draws from i.i.d. N(0, β²) with β ∈ {0.1, ..., 1}. Under this revised DGP, the matrix Γ₁ has full rank q. We compare the power of ŵ_m and ŵ_m^joint with the power of infeasible rank tests that assume both X_t − Λ_(r−q+1:r)[f_(1,t−1), ..., f_(r−q,t−1)]′ and f_t are observed.⁶ For the null hypothesis rank(Θ₁) = m, we use the KP test as the infeasible analog. For the joint null hypothesis, we construct a joint test in a manner similar to the feasible ŵ_d^joint. We consider the size-adjusted power of the feasible and infeasible tests. Figure 1 shows the size-adjusted power when (N, T) = (125, 500) and (k, ℓ) = (6, 4). The solid lines with circles denote the size-adjusted power of ŵ_d and ŵ_d^joint. The solid lines with asterisks denote the size-adjusted power of the infeasible KP test and joint rank test. The upper-left (lower-left) panel shows the power of ŵ_d and the KP test against the null hypothesis that rank(Θ₁) = 4 (rank(Θ₁) = 3). The upper-right (lower-right) panel shows the power of ŵ_d^joint and the infeasible joint test against the joint null hypothesis that rank(Θ₁) = 4 and rank(Θ₁₁) = 3 (rank(Θ₁) = 3 and rank(Θ₁₁) = 2). The size-adjusted power of ŵ_d and ŵ_d^joint is very close to that of their infeasible analogs, which confirms that the proposed tests are powerful against false null hypotheses.⁷ The power increases with the value of β and with the number of restrictions tested. We also ran simulations for other (k, ℓ) combinations and sample sizes; the results exhibit very similar patterns to those in Figure 1.

5 Empirical Application

This section presents an empirical application of the proposed method. The data set is an updated version of that used by Stock and Watson (2005). It consists of monthly observations on 124 U.S. macroeconomic time series from 1960:1 through 2010:12. The series are transformed so that they are approximately stationary. The transformations are similar to those in Stock and Watson (2005), Bernanke et al. (2005), and Forni and Gambetti (2010). See the supplemental appendix for the full list of variables and transformation codes.

⁶Note that X_t − Λ_(r−q+1:r)[f_(1,t−1), ..., f_(r−q,t−1)]′ = Γf_t + e_t by our DGP.
⁷The power of the proposed tests could be even higher than that of the infeasible ones if no size adjustment were conducted.

5.1 Model Specification

We use Bai and Ng's (2002) information criteria and detect 12 static factors. Hence, we set r = 12, which appears large enough to recover the space spanned by the static factors. We also try r = 10, and the results are qualitatively the same. The BIC suggests that F_t follows a VAR(1) process, so we set p = 1 in our analysis. Following Amengual and Watson (2007), we apply Bai and Ng's (2002) information criteria to X and detect five dynamic factors. Thus, we set q = 5 in our benchmark model.
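A standard implementation of one of Bai and Ng's (2002) criteria (IC_p2) for selecting the number of static factors is sketched below; this is a generic textbook version, not the code used in the paper.

```python
import numpy as np

def bai_ng_icp2(X, rmax):
    """Generic sketch of Bai and Ng's (2002) IC_p2 criterion. X is T x N;
    returns the minimizing number of factors r in 1..rmax."""
    T, N = X.shape
    _, eigvec = np.linalg.eigh(X @ X.T / (T * N))
    best_r, best_ic = 1, np.inf
    for r in range(1, rmax + 1):
        Fh = np.sqrt(T) * eigvec[:, -r:]       # PC factors, F'F/T = I_r
        Lh = X.T @ Fh / T                      # OLS loadings
        V = ((X - Fh @ Lh.T) ** 2).mean()      # average squared residual
        ic = np.log(V) + r * (N + T) / (N * T) * np.log(min(N, T))
        if ic < best_ic:
            best_r, best_ic = r, ic
    return best_r

# Toy check on simulated data with three strong factors.
rng = np.random.default_rng(5)
T, N, r0 = 200, 100, 3
X = rng.standard_normal((T, r0)) @ rng.standard_normal((r0, N)) \
    + 0.1 * rng.standard_normal((T, N))
print(bai_ng_icp2(X, rmax=8))   # 3
```

The criterion trades off fit (log V) against a penalty that grows in r; with strong factors and weak noise it recovers the true factor number.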

We consider the identification of the monetary policy shock. In a conventional structural VAR analysis, it is common to assume that slow-moving variables such as real output and price levels are not affected by the monetary policy shock contemporaneously. In addition, some other fast-moving shocks do not affect the monetary policy instruments within the same month; that is, the central bank is not aware of the contemporaneous information about these fast shocks when it makes decisions related to monetary policy. Therefore, it is reasonable to consider a setup where the factor loading Γ follows a structure similar to (2.21). For Γ₁₁, we include the following four variables in our benchmark setup: the IP index, the CPI, the PCE deflator, and the commodity price index. For Γ₁, we add a few more variables: the federal funds rate (FFR), M1, M2, and total reserves, all of which are closely related to the monetary policy instruments adopted by Christiano, Eichenbaum, and Evans (1998). Hence, the benchmark model has k = 8 and ℓ = 4.

Before estimating the IRFs, we implement the tests proposed in Section 3.4 to check the model specification. Based on Algorithm 3, we find the following results: the p-value for the null hypothesis rank(Θ₁) = 3 and rank(Θ₁₁) = 2 is 2 × 10⁻¹⁰, and the p-value for the null hypothesis rank(Θ₁) = 4 and rank(Θ₁₁) = 3 is 0.276. This implies m̂ = 4 for our benchmark specification. We also implement the ŵ_d test to double-check the ranks of Θ₁ and Θ₁₁ separately. The p-values are reported in Table 5 and are consistent with the joint test results.

5.2 The Effects of a Monetary Policy Shock

First, we compare the IRFs based on two different setups for m: (1) m = 4, the benchmark setup selected by our tests; and (2) m = 3, the setup rejected by our over-identification tests. Figure 2 presents the cumulative IRFs of various macroeconomic variables after a contractionary monetary policy shock. It is remarkable that the results based on m = 3 (marked by dashed curves) contradict economic theory. For m = 3, real output and employment undergo substantial increases after the shock. The price puzzle is also evident: all of the price indexes (CPI, PCE deflator, and PPI) remain above their pre-shock levels even three years after the shock. Many other variables also exhibit the wrong sign in their IRFs, such as consumer expectations, inventories, unemployment, and capacity utilization.

In contrast, the IRFs under m = 4 are consistent with what is expected after a contractionary monetary policy shock. Real output, consumption, employment, orders, M2, consumer expectations, and all of the price indexes start to decline after the shock. No price puzzle appears. In addition, both the JPY/USD exchange rate and the S&P 500 respond very quickly after the shock, undergoing a significant shift and then remaining at a constant level. Hence, there is no delayed overshooting puzzle for the exchange rate (Eichenbaum and Evans, 1995), which is consistent with the findings of Forni and Gambetti (2010). In summary, the comparison in Figure 2 shows the advantage of using testable over-identifying restrictions to uncover the effects of a structural shock. The setup selected by our tests (m = 4) yields far more reasonable results than the rejected setup (m = 3). With a conventional just-identified scheme, we would be unable to test the validity of the restrictions and could obtain very misleading estimates of the IRFs.

Next, we compare the inferential results for the IRFs under over- and just-identifying restrictions. A subset of the restrictions in our benchmark specification can be used so that the model is just-identified. Under the just-identified scheme, the variables in Γ₁₁ are the IP index, the CPI, and the commodity price index, and Γ₁ includes one additional variable: the FFR. The contemporaneous IRFs of the variables in Γ₁₁ are exactly zero. The IRFs under the just-identified scheme are similar to those shown in Figure 2 for m = 4 and are thus not reported. Instead, we focus on whether the over-identified scheme can improve the inferential results for the IRFs.

Overall, for the variables considered in Figure 2, our benchmark (over-identified) setup yields 434, 382, and 282 t-ratios that are significant at the 10%, 5%, and 1% levels, respectively, whereas the just-identified setup yields 384, 351, and 261.⁸ Moreover, Table 6 reports the p-values of the t-ratios of the estimated IRFs under the just- and over-identified schemes. The major patterns of the t-ratios are similar across the two schemes, but adding over-identifying restrictions does improve statistical significance. For example, although employment does not respond significantly to the monetary policy shock under the just-identified scheme, it responds significantly at the 10% level 18 months after the shock under our over-identified scheme. In addition, the contemporaneous response of the JPY/USD exchange rate has a p-value of 1.2% under the over-identified scheme versus 10.5% under the just-identified scheme.

⁸The total number of comparable t-ratios is 20 × 37 − 3 = 737 for 20 variables over s = 0, 1, ..., 36, with 3 unavailable t-ratios excluded under the just-identified scheme.

6 Conclusion

This paper develops new estimators for the impulse response functions in structural FAVAR models under over-identifying restrictions. Compared with the untestable just-identified schemes commonly used in the literature, our framework allows practitioners to test the validity of the identifying restrictions. We establish the asymptotic distributions of the new estimators and develop test statistics for the over-identifying restrictions. A simulation study confirms that the estimated impulse response functions tend to be more accurate under an over-identified scheme than under a just-identified scheme. An empirical application with U.S. macroeconomic data shows that our over-identified scheme can help to improve statistical significance and to eliminate incorrect restrictions that lead to spurious impulse responses.


Appendix

A Lemmas

The following lemmas are useful in the proofs of the main theoretical results in the paper. To save space, the proofs of these lemmas are provided in the supplemental appendix.

Lemma 1: Under Assumptions 1–4,
(a) T⁻¹Σ_(t=1)^T ‖F̂_t − H_F′F_t‖² = O_p(δ_NT⁻²) and T⁻¹Σ_(t=p+1)^T ‖𝐅̂_t − 𝐇_F′𝐅_t‖² = O_p(δ_NT⁻²);
(b) V̂_X →p V, where V is the diagonal matrix consisting of the eigenvalues of Σ_FΣ_Λ;
(c) H_F and 𝐇_F are O_p(1) and nonsingular as N, T → ∞.

Lemma 2: Under Assumptions 1–6,
(a) T⁻¹Σ_(t=1)^T (F̂_t − H_F′F_t)[F_t′, e_it, η_t′] = O_p(δ_NT⁻²) for any given i = 1, ..., N;
(b) T⁻¹(𝐅̂ − 𝐅𝐇_F)′[𝐅 ⋮ η] = O_p(δ_NT⁻²).

Lemma 3: Under Assumptions 1–6, T⁻¹Σ_(t=p+1)^T ‖η̂_t − H_η′η_t‖² = O_p(δ_NT⁻²).

Lemma 4: Under Assumptions 1–6,
(a) T⁻¹(η̂ − ηH_η)′η = O_p(δ_NT⁻²);
(b) Ĥ_η →p H_η, where H_η is orthonormal.

B Proofs for Propositions

Proof of Proposition 1: Consider the first k variables of X_t. Let Λ₁ denote the first k rows of Λ and X_t^(1) = [X_1t, ..., X_kt]′. Recall that e_t^(1) = [e_1t, ..., e_kt]′ and e^(1) = [e_1^(1), ..., e_T^(1)]′. Hence,

X_t^(1) = Λ₁F_t + e_t^(1). (B.1)

Since T⁻¹Σ_(t=1)^T F̂_tF̂_t′ = I_r, the OLS estimator of Λ₁ is

Λ̂₁ = T⁻¹Σ_(t=1)^T X_t^(1)F̂_t′ = T⁻¹Σ_(t=1)^T (Λ₁F_t + e_t^(1))F̂_t′. (B.2)

Note that

T⁻¹Σ_(t=1)^T Λ₁F_tF̂_t′ = Λ₁H_F^(−1)′·T⁻¹Σ_(t=1)^T F̂_tF̂_t′ + Λ₁H_F^(−1)′·T⁻¹Σ_(t=1)^T (H_F′F_t − F̂_t)F̂_t′ = Λ₁H_F^(−1)′ + O_p(δ_NT⁻²),

where the O_p(δ_NT⁻²) term follows from Lemma B3 of Bai (2003). Now we can rewrite (B.2) as

Λ̂₁ = Λ₁H_F^(−1)′ + T⁻¹Σ_(t=1)^T e_t^(1)F_t′H_F + T⁻¹Σ_(t=1)^T e_t^(1)(F̂_t − H_F′F_t)′ + O_p(δ_NT⁻²) = Λ₁H_F^(−1)′ + T⁻¹e^(1)′FH_F + O_p(δ_NT⁻²), (B.3)

where T⁻¹Σ_(t=1)^T e_t^(1)(F̂_t − H_F′F_t)′ = O_p(δ_NT⁻²) by Lemma 2(a). Hence, using the condition √T/N → 0, we obtain

√T vec(Λ̂₁ − Λ₁H_F^(−1)′) = (1/√T)(H_F′ ⊗ I_k)vec(e^(1)′F) + o_p(1). (B.4)

The asymptotic normality then follows directly from Assumption 7 and (3.3). Q.E.D.

Proof of Proposition 2: (a) Note that

Ĝ = (T − p)⁻¹Σ_(t=p+1)^T F̂_tη̂_t′
  = (T − p)⁻¹Σ_(t=p+1)^T H_F′F_tη̂_t′ + (T − p)⁻¹Σ_(t=p+1)^T (F̂_t − H_F′F_t)(η̂_t − H_η′η_t)′ + (T − p)⁻¹Σ_(t=p+1)^T (F̂_t − H_F′F_t)η_t′H_η
  = (T − p)⁻¹Σ_(t=p+1)^T (H_F′Φ𝐇_F^(−1)′𝐇_F′𝐅_(t−1)η̂_t′ + H_F′Gη_tη̂_t′) + O_p(δ_NT⁻²)
  = (T − p)⁻¹Σ_(t=p+1)^T (H_F′Φ𝐇_F^(−1)′𝐇_F′𝐅_(t−1)η̂_t′ + H_F′Gη_tη_t′H_η) + O_p(δ_NT⁻²),

where the last two terms in the second line are O_p(δ_NT⁻²) by Lemmas 1(a), 2(a), and 3 and the Cauchy–Schwarz inequality, and the fourth line follows from Lemma 4(a). Recall the identification restriction ζ̂′ζ̂/(T − p) = I_q in Assumption ID1; then η̂′η̂/(T − p) = I_q since Â is orthonormal. By the fact that 𝐅̂′η̂ = 0, we have

Ĝ − H_F′GH_η = (T − p)⁻¹H_F′Φ𝐇_F^(−1)′(𝐅𝐇_F − 𝐅̂)′η̂ + O_p(δ_NT⁻²)
             = H_F′Φ𝐇_F^(−1)′[(𝐅𝐇_F − 𝐅̂)′(η̂ − ηH_η) + (𝐅𝐇_F − 𝐅̂)′ηH_η]/(T − p) + O_p(δ_NT⁻²)
             = O_p(δ_NT⁻²), (B.5)

by the Cauchy–Schwarz inequality and Lemmas 1(a), 3, and 2(b).

(b) Consider the distribution of Θ̂₁. Note that Lemmas 3 and 4(a) imply

[η̂′η̂ − H_η′η′ηH_η]/(T − p) = [(η̂ − ηH_η)′(η̂ − ηH_η) + (η̂ − ηH_η)′ηH_η + H_η′η′(η̂ − ηH_η)]/(T − p) = O_p(δ_NT⁻²).

Thus, we have

(η̂′η̂ − H_η′η′ηH_η)/(T − p) = I_q − H_η′η′ηH_η/(T − p) = O_p(δ_NT⁻²). (B.6)

By Assumptions ID1 and 2(b), we know that η′η/(T − p) = I_q, and thus (B.6) implies that I_q = H_η′H_η + O_p(δ_NT⁻²), which means

H_η′ = H_η⁻¹ + O_p(δ_NT⁻²). (B.7)

Now we can derive the asymptotic distribution of √T vec(Θ̂₁ − Θ₁H_η^(−1)′). Since Θ̂₁ = Λ̂₁Ĝ, we obtain

√T vec(Θ̂₁ − Θ₁H_η^(−1)′) = √T vec(Λ̂₁Ĝ − Λ₁H_F^(−1)′H_F′GH_η) + O_p(√T δ_NT⁻²)
                          = (H_η′G′H_F ⊗ I_k)√T vec(Λ̂₁ − Λ₁H_F^(−1)′) + O_p(√T δ_NT⁻²), (B.8)

where the first line uses (B.7) and the second line uses Proposition 2(a). The desired result is obtained by combining (B.4) and (B.8). Q.E.D.

Proof of Proposition 3:
First, consider the distributions of $\hat\Phi$ and $\hat\Psi_s$. Note that Lemma 2 implies the following:

$$T^{-1}\sum_{t=p+1}^{T}(\hat F_t - H_F'F_t)F_t' = T^{-1}\sum_{t=p+1}^{T}(\hat F_t - H_F'F_t)(F_{t-1}'\Phi' + \eta_t'G') = O_p(\delta_{NT}^{-2}), \quad (B.9)$$
$$T^{-1}\sum_{t=p+1}^{T}\Phi F_{t-1}(\hat F_t - H_F'F_t)' = T^{-1}\sum_{t=p+1}^{T}(F_t - G\eta_t)(\hat F_t - H_F'F_t)' = O_p(\delta_{NT}^{-2}). \quad (B.10)$$

Hence, we have

$$T^{-1}\sum_{t=p+1}^{T}\hat F_t(\hat F_t - H_F'F_t)' = T^{-1}\sum_{t=p+1}^{T}(\hat F_t - H_F'F_t)(\hat F_t - H_F'F_t)' + T^{-1}\sum_{t=p+1}^{T}H_F'F_t(\hat F_t - H_F'F_t)' = O_p(\delta_{NT}^{-2}), \quad (B.11)$$

where the 1st term is $O_p(\delta_{NT}^{-2})$ by the Cauchy–Schwarz inequality and Lemma 1(a), and the 2nd term is $O_p(\delta_{NT}^{-2})$ by (B.10).

The model (2.2) can be rewritten as

$$F_t'H_F = \sum_{j=1}^{p}F_{t-j}'H_FH_F^{-1}\Phi_j'H_F + \eta_t'G'H_F = F_{t-1}'H_FH_F^{-1}\Phi'H_F + \eta_t'G'H_F. \quad (B.12)$$

The OLS estimator of $\Phi$ is given by

$$\hat\Phi' = \left(\frac{\hat F'\hat F}{T-p}\right)^{-1}\frac{\sum_{t=p+1}^{T}\hat F_{t-1}\hat F_t'}{T-p} = \left(\frac{\hat F'\hat F}{T-p}\right)^{-1}\frac{\sum_{t=p+1}^{T}\big[H_F'F_{t-1}F_t'H_F + (\hat F_{t-1} - H_F'F_{t-1})F_t'H_F + \hat F_{t-1}(\hat F_t - H_F'F_t)'\big]}{T-p}, \quad (B.13)$$

where the last two terms in the square brackets are $O_p(\delta_{NT}^{-2})$ by (B.9) and (B.11). Hence,

$$\hat\Phi' = \left(\frac{H_F'F'FH_F}{T-p}\right)^{-1}\frac{\sum_{t=p+1}^{T}H_F'F_{t-1}F_t'H_F}{T-p} + O_p(\delta_{NT}^{-2}), \quad (B.14)$$

where we use the fact that

$$\frac{\hat F'\hat F}{T-p} - \frac{H_F'F'FH_F}{T-p} = \frac{(\hat F - FH_F)'(\hat F - FH_F)}{T-p} + \frac{H_F'F'(\hat F - FH_F)}{T-p} + \frac{(\hat F - FH_F)'FH_F}{T-p} = O_p(\delta_{NT}^{-2}) \quad (B.15)$$

by Lemmas 1(a) and 2(b). Plugging (B.12) into (B.14) yields

$$\hat\Phi' = \left(\frac{H_F'F'FH_F}{T-p}\right)^{-1}\frac{\sum_{t=p+1}^{T}H_F'F_{t-1}\big(F_{t-1}'H_FH_F^{-1}\Phi'H_F + \eta_t'G'H_F\big)}{T-p} + O_p(\delta_{NT}^{-2}),$$
$$\sqrt T\,\mathrm{vec}(\hat\Phi' - H_F^{-1}\Phi'H_F) = \left[H_F'G\otimes\left(\frac{H_F'F'FH_F}{T-p}\right)^{-1}H_F'\right]\frac{\sqrt T\sum_{t=p+1}^{T}\mathrm{vec}(F_{t-1}\eta_t')}{T-p} + O_p\left(\frac{\sqrt T}{\delta_{NT}^2}\right). \quad (B.16)$$

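To make the OLS step in (B.13) concrete, here is a minimal numerical sketch: regressing estimated factors on their own lags recovers the VAR coefficient matrix. The dimensions, the VAR(1) simplification, and the simulated data below are illustrative assumptions, not the paper's design.

```python
import numpy as np

# Simulate r-dimensional factors from a VAR(1): F_t = Phi F_{t-1} + eta_t
rng = np.random.default_rng(0)
r, T = 3, 500
Phi = np.diag([0.6, 0.4, 0.2])
F = np.zeros((T, r))
for t in range(1, T):
    F[t] = F[t - 1] @ Phi.T + rng.standard_normal(r)

# OLS as in (B.13): Phi_hat' = (X'X)^{-1} X'Y with X the lagged factors
X, Y = F[:-1], F[1:]
Phi_hat = np.linalg.solve(X.T @ X, X.T @ Y).T
assert np.max(np.abs(Phi_hat - Phi)) < 0.2  # OLS is consistent for Phi
```

The same regression is run on the estimated factors $\hat F_t$ in the paper; the rotation matrix $H_F$ then appears in the probability limit, as (B.14) shows.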

Combining (2.2) and (2.4) gives

$$F_t'H_F = \eta_t'G'H_F + \sum_{s=1}^{\infty}\eta_{t-s}'G'H_FH_F^{-1}\Psi_s'H_F.$$

(B.16) implies that $\hat\Phi_j'$ is a consistent estimator for $H_F^{-1}\Phi_j'H_F$, so $\hat\Psi_s'$ is a consistent estimator for $H_F^{-1}\Psi_s'H_F$ in the VMA representation. By (11.7.1) to (11.7.5) of Hamilton (1994), we have

$$\sqrt T\,\mathrm{vec}(\hat\Psi_s' - H_F^{-1}\Psi_s'H_F) = R_s\sqrt T\,\mathrm{vec}(\hat\Phi' - H_F^{-1}\Phi'H_F) + o_p(1), \quad (B.17)$$

where $R_s = \sum_{j=1}^{s}\big(H_F^{-1}\Psi_{j-1}H_F\otimes[H_F^{-1}\Psi_{s-j}'H_F,\,H_F^{-1}\Psi_{s-j-1}'H_F,\ldots,H_F^{-1}\Psi_{s-j-p+1}'H_F]\big)$, with $\Psi_0 = I_r$ and $\Psi_s = 0_{r\times r}$ for $s < 0$.
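The VAR-to-VMA mapping behind (B.17) can be sketched with the standard recursion $\Psi_s = \sum_{j=1}^{\min(s,p)}\Phi_j\Psi_{s-j}$, $\Psi_0 = I_r$ (cf. Hamilton 1994, Ch. 11). The dimensions here are hypothetical; this is not code from the paper.

```python
import numpy as np

def vma_coefficients(Phi_list, horizon):
    """VMA matrices Psi_0, ..., Psi_horizon from VAR(p) coefficients
    Phi_1, ..., Phi_p via Psi_s = sum_{j=1}^{min(s,p)} Phi_j Psi_{s-j}."""
    r, p = Phi_list[0].shape[0], len(Phi_list)
    Psi = [np.eye(r)]
    for s in range(1, horizon + 1):
        acc = np.zeros((r, r))
        for j in range(1, min(s, p) + 1):
            acc += Phi_list[j - 1] @ Psi[s - j]
        Psi.append(acc)
    return Psi

# VAR(1) check: Psi_s = Phi^s
Phi = np.array([[0.5, 0.1], [0.0, 0.3]])
Psi = vma_coefficients([Phi], 3)
assert np.allclose(Psi[2], Phi @ Phi)
assert np.allclose(Psi[3], Phi @ Phi @ Phi)
```

Since each $\hat\Psi_s$ is a smooth function of $\hat\Phi$, the delta method delivers the Jacobian $R_s$ in (B.17).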

Finally, for the OLS estimator $\hat\lambda_i$, it follows that

$$\hat\lambda_i - H_F^{-1}\lambda_i = \frac{1}{T}H_F'F'e_i + O_p(\delta_{NT}^{-2}). \quad (B.18)$$

Combining the results in (B.4), (B.18), and (B.17) gives

$$\sqrt T\begin{bmatrix}\mathrm{vec}(\hat\Lambda_1 - \Lambda_1H_F^{-1\prime})\\ \hat\lambda_i - H_F^{-1}\lambda_i\\ \mathrm{vec}(\hat\Psi_s' - H_F^{-1}\Psi_s'H_F)\end{bmatrix} = \frac{1}{\sqrt T}\,W\begin{bmatrix}\mathrm{vec}\big(e^{(1)\prime}F\big)\\ F'e_i\\ \mathrm{vec}(F'\eta)\end{bmatrix} + o_p(1).$$

This completes the proof of Proposition 3. Q.E.D.

C Proofs for Theorems

Proof of Theorem 1:
(a) First, we derive the asymptotics of $\hat A_q$. Recall that $\hat\alpha_q$ and $\alpha_q$ denote the smallest eigenvalues of $\hat\Theta_1'\hat\Theta_1$ and $\Theta_1'\Theta_1$, respectively, so we have

$$\hat\Theta_1'\hat\Theta_1\hat A_q = \hat\alpha_q\hat A_q.$$

Assumption ID2 implies $\alpha_q = 0$. Since $\Theta_1A_q = 0_{k\times1}$ by (2.17) and $A_q'A_q = 1$, we know that $A_q$ is the eigenvector associated with the smallest eigenvalue of $\Theta_1'\Theta_1$, i.e.,

$$\Theta_1'\Theta_1A_q = A_q\alpha_q, \qquad H_\eta^{-1}\Theta_1'\Theta_1H_\eta^{-1\prime}H_\eta'A_q = H_\eta^{-1}A_q\alpha_q. \quad (C.1)$$

Combining (B.7) and (C.1), we obtain

$$H_\eta^{-1}\Theta_1'\Theta_1H_\eta^{-1\prime}H_\eta'A_q = H_\eta'A_q\alpha_q + O_p(\delta_{NT}^{-2}). \quad (C.2)$$

Let $A_q^*$ denote the eigenvector associated with the smallest eigenvalue of $H_\eta^{-1}\Theta_1'\Theta_1H_\eta^{-1\prime}$. Since $\alpha_q = 0$ by Assumption ID2, we have

$$H_\eta^{-1}\Theta_1'\Theta_1H_\eta^{-1\prime}A_q^* = 0, \qquad H_\eta^{-1}\Theta_1'\Theta_1H_\eta^{-1\prime}H_\eta'A_q = O_p(\delta_{NT}^{-2}). \quad (C.3)$$

Since $H_\eta^{-1}\Theta_1'\Theta_1H_\eta^{-1\prime}$ has only one zero eigenvalue, (C.3) implies $A_q^* \pm H_\eta'A_q = O_p(\delta_{NT}^{-2})$. By imposing an appropriate sign restriction on $A_q^*$, we obtain

$$A_q^* = H_\eta'A_q + O_p(\delta_{NT}^{-2}). \quad (C.4)$$

Since $\alpha_q = 0$ is unique, the eigenvector associated with the zero eigenvalue of $H_\eta^{-1}\Theta_1'\Theta_1H_\eta^{-1\prime}$ is a continuously differentiable function in the neighborhood of $\Theta_1H_\eta^{-1\prime}$. Also, Proposition 2 shows that $\hat\Theta_1$ consistently estimates $\Theta_1H_\eta^{-1\prime}$, so $\hat A_q$ consistently estimates $A_q^*$ by the continuous mapping theorem. By Theorem 7 of Magnus and Neudecker (1999), we know that

$$dA_q = (-H_\eta^{-1}\Theta_1'\Theta_1H_\eta^{-1\prime})^+(d\Theta_1'\,\Theta_1 + \Theta_1'\,d\Theta_1)A_q^* = \big[A_q^{*\prime}\otimes(-H_\eta^{-1}\Theta_1'\Theta_1H_\eta^{-1\prime})^+\big]\big[(\Theta_1'\otimes I_q)\mathrm{vec}(d\Theta_1') + (I_q\otimes\Theta_1')\mathrm{vec}(d\Theta_1)\big]$$
$$= \big[A_q^{*\prime}\otimes(-H_\eta^{-1}\Theta_1'\Theta_1H_\eta^{-1\prime})^+\big]\big[(\Theta_1'\otimes I_q)K_{kq} + (I_q\otimes\Theta_1')\big]\mathrm{vec}(d\Theta_1) = \hat B_1\mathrm{vec}(d\Theta_1). \quad (C.5)$$

Since $\hat\Theta_1 - \Theta_1H_\eta^{-1\prime}\to_p 0$ and $\Theta_1H_\eta^{-1\prime}A_q^* = 0_{k\times1}$ by the definition of $A_q^*$, it follows that $A_q^{*\prime}\hat\Theta_1'\to_p 0_{1\times k}$. Thus, we have

$$\hat B_1 \to_p B_1 \equiv A_q'H_\eta\otimes\big[(-H_\eta^{-1}\Theta_1'\Theta_1H_\eta^{-1\prime})^+H_\eta^{-1}\Theta_1'\big], \quad \text{with }\mathrm{rank}(B_1) = q-1,$$

because $\mathrm{rank}(\Theta_1) = q-1$ by Assumption ID2. By the Delta method and the asymptotic distribution in Proposition 1, we obtain

$$\sqrt T(\hat A_q - A_q^*) = B_1\sqrt T\,\mathrm{vec}(\hat\Theta_1 - \Theta_1H_\eta^{-1\prime}) + o_p(1) \to_d N(0_{q\times1},\,B_1S\Omega^{(1)}S'B_1'). \quad (C.6)$$

Under the condition that $\sqrt T/N\to0$, (C.4) and (C.6) imply that

$$\sqrt T(\hat A_q - H_\eta'A_q) = B_1\sqrt T\,\mathrm{vec}(\hat\Theta_1 - \Theta_1H_\eta^{-1\prime}) + o_p(1), \quad (C.7)$$

which has the same asymptotic distribution as (C.6).

Next, we obtain the contemporaneous IRFs of the first k variables:

$$\sqrt T(\hat\Theta_1\hat A_q - \Theta_1A_q) = \sqrt T\,\hat\Theta_1(\hat A_q - H_\eta'A_q) + \sqrt T(\hat\Theta_1 - \Theta_1H_\eta^{-1\prime})H_\eta'A_q$$
$$= \big[\Theta_1H_\eta^{-1\prime}B_1 + (A_q'H_\eta)\otimes I_k\big]\sqrt T\,\mathrm{vec}(\hat\Theta_1 - \Theta_1H_\eta^{-1\prime}) + o_p(1) = B_2\sqrt T\,\mathrm{vec}(\hat\Theta_1 - \Theta_1H_\eta^{-1\prime}) + o_p(1)$$
$$\to_d N(0_{k\times1},\,B_2S\Omega^{(1)}S'B_2'),$$

where

$$B_2 = \Theta_1H_\eta^{-1\prime}B_1 + (A_q'H_\eta)\otimes I_k = A_q'H_\eta\otimes\big[I_k - \Theta_1H_\eta^{-1\prime}(H_\eta^{-1}\Theta_1'\Theta_1H_\eta^{-1\prime})^+H_\eta^{-1}\Theta_1'\big]. \quad (C.8)$$

Note that $\Theta_1H_\eta^{-1\prime}(H_\eta^{-1}\Theta_1'\Theta_1H_\eta^{-1\prime})^+H_\eta^{-1}\Theta_1'$ is symmetric and idempotent, and its rank is equal to $\mathrm{rank}(\Theta_1) = q-1$. Hence,

$$\mathrm{rank}(B_2) = 1\times\mathrm{trace}\big[I_k - \Theta_1H_\eta^{-1\prime}(H_\eta^{-1}\Theta_1'\Theta_1H_\eta^{-1\prime})^+H_\eta^{-1}\Theta_1'\big] = k - \mathrm{rank}(\Theta_1) = k - q + 1,$$

where we use the fact that the rank and the trace of a symmetric and idempotent matrix are equal.
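Numerically, $\hat A_q$ in part (a) is simply the eigenvector of $\hat\Theta_1'\hat\Theta_1$ associated with its smallest eigenvalue, and the rank of $B_2$ equals the trace of the annihilator matrix in (C.8). A small sketch under a simulated rank-$(q-1)$ matrix (the dimensions and the simulated $\Theta_1$ are illustrative assumptions, not the paper's data):

```python
import numpy as np

rng = np.random.default_rng(0)
k, q = 6, 4
# Simulated Theta1 (k x q) of rank q - 1, mimicking Assumption ID2
Theta1 = rng.standard_normal((k, q - 1)) @ rng.standard_normal((q - 1, q))

# A_q: eigenvector of Theta1'Theta1 for its smallest eigenvalue
eigvals, eigvecs = np.linalg.eigh(Theta1.T @ Theta1)  # ascending order
A_q = eigvecs[:, 0]
assert np.allclose(Theta1 @ A_q, 0, atol=1e-5)        # Theta1 A_q = 0_{k x 1}

# The projection in (C.8) is symmetric and idempotent with rank q - 1,
# so trace(I_k - P) = k - q + 1 = rank(B2)
P = Theta1 @ np.linalg.pinv(Theta1.T @ Theta1) @ Theta1.T
assert round(np.trace(np.eye(k) - P)) == k - q + 1
```

The sign restriction in (C.4) corresponds to fixing the sign of one entry of the computed eigenvector, since eigenvectors are only identified up to sign.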

(b) To obtain the asymptotic representation of $\hat\theta_i'\hat A_q$ for $i = k+1,\ldots,N$, note that

$$\mathrm{vec}(\hat\Lambda_1 - \Lambda_1H_F^{-1\prime}) = [I_{kr}\;\vdots\;0_{kr\times r}]\begin{bmatrix}\mathrm{vec}(\hat\Lambda_1 - \Lambda_1H_F^{-1\prime})\\ \hat\lambda_i - H_F^{-1}\lambda_i\end{bmatrix} + O_p(\delta_{NT}^{-2}),$$
$$\hat\lambda_i - H_F^{-1}\lambda_i = [0_{r\times kr}\;\vdots\;I_r]\begin{bmatrix}\mathrm{vec}(\hat\Lambda_1 - \Lambda_1H_F^{-1\prime})\\ \hat\lambda_i - H_F^{-1}\lambda_i\end{bmatrix} + O_p(\delta_{NT}^{-2}).$$

Let $C_1 = [I_{kr}\;\vdots\;0_{kr\times r}]$ and $C_2 = [0_{r\times kr}\;\vdots\;I_r]$. Then we have

$$\sqrt T(\hat\theta_i'\hat A_q - \theta_i'A_q) = \sqrt T\,\hat\theta_i'(\hat A_q - H_\eta'A_q) + \sqrt T(\hat\theta_i' - \theta_i'H_\eta^{-1\prime})H_\eta'A_q$$
$$= \sqrt T\,\theta_i'B_1\mathrm{vec}(\hat\Theta_1 - \Theta_1H_\eta^{-1\prime}) + \sqrt T\,A_q'H_\eta(\hat\theta_i - H_\eta^{-1}\theta_i) + o_p(1)$$
$$= \big[\theta_i'B_1(H_\eta'G'H_F\otimes I_k)C_1 + A_q'G'H_FC_2\big]\sqrt T\begin{bmatrix}\mathrm{vec}(\hat\Lambda_1 - \Lambda_1H_F^{-1\prime})\\ \hat\lambda_i - H_F^{-1}\lambda_i\end{bmatrix} + o_p(1) \to_d N\big(0,\,B_3^{(i)}W_1\Omega_iW_1'B_3^{(i)\prime}\big),$$

where the third line uses (B.8), and $B_3^{(i)} = \theta_i'H_\eta^{-1\prime}B_1(H_\eta'G'H_F\otimes I_k)C_1 + A_q'G'H_FC_2$.

(c) Next, consider the IRFs of $X_{it}$ with respect to $\zeta_{q,t-s}$ ($s\ge1$). By the condition $\sqrt T/N\to0$ and the results in (B.5), (B.7), and Proposition 2, we have

$$\sqrt T\big(\hat\lambda_i'\hat\Psi_s\hat G\hat A_q - \lambda_i'H_F^{-1\prime}H_F'\Psi_sH_F^{-1\prime}H_F'GH_\eta H_\eta^{-1}A_q\big)$$
$$= \sqrt T\,\hat\lambda_i'\hat\Psi_s\hat G(\hat A_q - H_\eta'A_q) + \sqrt T(\hat\lambda_i - H_F^{-1}\lambda_i)'\Psi_sH_F'GA_q + \sqrt T\,\lambda_i'H_F^{-1\prime}(\hat\Psi_s - H_F'\Psi_sH_F^{-1\prime})H_F'GA_q + o_p(1)$$
$$= \sqrt T\,\lambda_i'\Psi_sGB_1\mathrm{vec}(\hat\Theta_1 - \Theta_1H_\eta^{-1\prime}) + \sqrt T\,A_q'G'H_F\Psi_s'(\hat\lambda_i - H_F^{-1}\lambda_i) + (\lambda_i'H_F^{-1\prime}\otimes A_q'G'H_F)\sqrt T\,\mathrm{vec}(\hat\Psi_s' - H_F^{-1}\Psi_s'H_F) + o_p(1)$$
$$= \big[\lambda_i'\Psi_sGB_1(H_\eta'G'H_F\otimes I_k)C_3 + A_q'G'H_F\Psi_s'C_4 + (\lambda_i'H_F^{-1\prime}\otimes A_q'G'H_F)C_5\big]\sqrt T\begin{bmatrix}\mathrm{vec}(\hat\Lambda_1 - \Lambda_1H_F^{-1\prime})\\ \hat\lambda_i - H_F^{-1}\lambda_i\\ \mathrm{vec}(\hat\Psi_s' - H_F^{-1}\Psi_s'H_F)\end{bmatrix} + o_p(1),$$

where $C_3 = [I_{rk}\;\vdots\;0_{rk\times(r+1)r}]$, $C_4 = [0_{r\times rk}\;\vdots\;I_r\;\vdots\;0_{r\times r^2}]$, and $C_5 = [0_{r^2\times(rk+r)}\;\vdots\;I_{r^2}]$, and the second-to-last line follows from (C.7). Hence,

$$\sqrt T(\hat\lambda_i'\hat\Psi_s\hat G\hat A_q - \lambda_i'\Psi_sGA_q) \to_d N\big(0,\,B_4^{(i)}W\Omega_iW'B_4^{(i)\prime}\big),$$

where $B_4^{(i)} = \lambda_i'\Psi_sGH_\eta B_1(H_\eta'G'H_F\otimes I_k)C_3 + A_q'G'\Psi_s'H_FC_4 + (\lambda_i'H_F^{-1\prime}\otimes A_q'G'H_F)C_5$. Q.E.D.

Proof of Theorem 2:
(a) Recall that $\hat A_{1:m}$ are the eigenvectors associated with the first m eigenvalues of $\hat\Theta_1'\hat\Theta_1$ in descending order, and that $C_0 = [I_\ell\;\vdots\;0_{\ell\times(k-\ell)}]$, so $\hat\Theta_{11} = C_0\hat\Theta_1$. By the definition of $\hat A_m$,

$$\hat A_{1:m}'\hat\Theta_1'C_0'C_0\hat\Theta_1\hat A_{1:m}\hat A_m = \hat A_m\hat\alpha_m, \quad (C.9)$$

where $\hat\alpha_m$ is the smallest eigenvalue of $\hat A_{1:m}'\hat\Theta_{11}'\hat\Theta_{11}\hat A_{1:m}$. To find the asymptotic distribution of $\hat A_m$, we first investigate the asymptotic representation of $\hat\Theta_{11}\hat A_{1:m}$. By the design of $\hat A_{1:m}$, we have

$$\hat\Theta_1'\hat\Theta_1\hat A_{1:m} = \hat A_{1:m}\hat Z,$$

where $\hat Z$ is a diagonal matrix consisting of the first m eigenvalues of $\hat\Theta_1'\hat\Theta_1$ in descending order. Similar to (C.1), let $Z$ be the diagonal matrix consisting of the first m eigenvalues of $\Theta_1'\Theta_1$ and let $A_{1:m}$ denote the first m columns of $A$, so we have

$$H_\eta^{-1}\Theta_1'\Theta_1H_\eta^{-1\prime}H_\eta'A_{1:m} = H_\eta^{-1}A_{1:m}Z,$$
$$H_\eta^{-1}\Theta_1'\Theta_1H_\eta^{-1\prime}H_\eta'A_{1:m} = H_\eta'A_{1:m}Z + O_p(\delta_{NT}^{-2}), \quad (C.10)$$

where the last line of (C.10) uses (B.7). By Assumption ID3, $\Theta_1'\Theta_1 = A\Gamma_1'\Gamma_1A'$ has distinct eigenvalues, so $H_\eta^{-1}\Theta_1'\Theta_1H_\eta^{-1\prime}$ also has distinct nonzero eigenvalues because $H_\eta$ is asymptotically orthonormal by Lemma 4(b). Hence, the first m eigenvectors of $H_\eta^{-1}\Theta_1'\Theta_1H_\eta^{-1\prime}$ are continuously differentiable functions. Recall that $A_{1:m}^*$ denotes the eigenvectors associated with the first m eigenvalues of $H_\eta^{-1}\Theta_1'\Theta_1H_\eta^{-1\prime}$, so (C.10) implies that

$$A_{1:m}^* = H_\eta'A_{1:m} + O_p(\delta_{NT}^{-2}), \quad (C.11)$$

if the signs of the columns of $A_{1:m}^*$ are properly adjusted. Let $A_j^*$ and $A_j$ denote the j-th columns of $A_{1:m}^*$ and $A_{1:m}$, respectively ($1\le j\le m$); let $\hat\alpha_j$ and $\alpha_j$ denote the j-th diagonal elements of $\hat Z$ and $Z$, respectively. Also, recall that $\hat\Theta_1$ consistently estimates $\Theta_1H_\eta^{-1\prime}$, so $\hat A_{1:m}$ consistently estimates $A_{1:m}^*$ and $H_\eta'A_{1:m}$ (up to a sign). By Theorem 7 of Magnus and Neudecker (1999), we have

$$dA_j = (\alpha_jI_q - H_\eta^{-1}\Theta_1'\Theta_1H_\eta^{-1\prime})^+(d\Theta_1'\,\Theta_1 + \Theta_1'\,d\Theta_1)A_j^*$$
$$= \big[A_j^{*\prime}\otimes(\alpha_jI_q - H_\eta^{-1}\Theta_1'\Theta_1H_\eta^{-1\prime})^+\big]\big[(\Theta_1'\otimes I_q)\mathrm{vec}(d\Theta_1') + (I_q\otimes\Theta_1')\mathrm{vec}(d\Theta_1)\big]$$
$$= \big[A_j^{*\prime}\otimes(\alpha_jI_q - H_\eta^{-1}\Theta_1'\Theta_1H_\eta^{-1\prime})^+\big]\big[(\Theta_1'\otimes I_q)K_{kq} + (I_q\otimes\Theta_1')\big]\mathrm{vec}(d\Theta_1)$$
$$= \big[A_j^{*\prime}\otimes(\alpha_jI_q - H_\eta^{-1}\Theta_1'\Theta_1H_\eta^{-1\prime})^+\big]\big[(K_{qq} + I_{q^2})(I_q\otimes\Theta_1')\big]\mathrm{vec}(d\Theta_1)$$

for $j = 1,\ldots,m$. Let $\hat Q_j = A_j^{*\prime}\otimes(\alpha_jI_q - H_\eta^{-1}\Theta_1'\Theta_1H_\eta^{-1\prime})^+$. Hence,

$$d\,\mathrm{vec}(A_{1:m}) = \hat Q\big[(K_{qq} + I_{q^2})(I_q\otimes\Theta_1')\big]\mathrm{vec}(d\Theta_1), \quad (C.12)$$

where $\hat Q = [\hat Q_1',\ldots,\hat Q_m']'$. For the first k variables, by (C.12) and (C.11) we have

$$\sqrt T\,\mathrm{vec}(\hat\Theta_1\hat A_{1:m} - \Theta_1H_\eta^{-1\prime}H_\eta'A_{1:m}) = \mathrm{vec}\big[\sqrt T(\hat\Theta_1 - \Theta_1H_\eta^{-1\prime})\hat A_{1:m} + \sqrt T\,\Theta_1H_\eta^{-1\prime}(\hat A_{1:m} - H_\eta'A_{1:m})\big]$$
$$= \sqrt T(\hat A_{1:m}'\otimes I_k)\mathrm{vec}(\hat\Theta_1 - \Theta_1H_\eta^{-1\prime}) + (I_m\otimes\Theta_1H_\eta^{-1\prime})\sqrt T\,\mathrm{vec}(\hat A_{1:m} - H_\eta'A_{1:m})$$
$$= \hat B_5\sqrt T\,\mathrm{vec}(\hat\Theta_1 - \Theta_1H_\eta^{-1\prime}) + o_p(1), \quad (C.13)$$

where $\hat B_5 = (\hat A_{1:m}'\otimes I_k) + (I_m\otimes\Theta_1H_\eta^{-1\prime})\hat Q\big[(K_{qq} + I_{q^2})(I_q\otimes\hat\Theta_1')\big]$. The probability limit of $\hat B_5$ is given by

$$B_5 = (A_{1:m}'H_\eta\otimes I_k) + (I_m\otimes\Theta_1H_\eta^{-1\prime})Q\big[(K_{qq} + I_{q^2})(I_q\otimes H_\eta^{-1}\Theta_1')\big],$$

where $Q = [Q_1',\ldots,Q_m']'$ with $Q_j = \mathrm{plim}(\hat A_j')\otimes(\alpha_jI_q - H_\eta^{-1}\Theta_1'\Theta_1H_\eta^{-1\prime})^+$ for $j = 1,\ldots,m$.

For the first ℓ variables, we have

$$\sqrt T\,\mathrm{vec}(C_0\hat\Theta_1\hat A_{1:m} - C_0\Theta_1H_\eta^{-1\prime}H_\eta'A_{1:m}) = (I_m\otimes C_0)B_5\sqrt T\,\mathrm{vec}(\hat\Theta_1 - \Theta_1H_\eta^{-1\prime}) + o_p(1). \quad (C.14)$$

Next, we consider the asymptotic properties of $\hat A_m$. Recall that $\alpha_m^*$ is the smallest eigenvalue of $A_{1:m}'\Theta_1'C_0'C_0\Theta_1A_{1:m}$ and $A_m^*$ is the corresponding eigenvector, so

$$A_{1:m}'\Theta_{11}'\Theta_{11}A_{1:m}A_m^* = A_m^*\alpha_m^*. \quad (C.15)$$

Let

$$\tilde\Theta_{11} = \hat\Theta_{11}\hat A_{1:m}. \quad (C.16)$$

Since $\hat\Theta_{11}$ is a consistent estimator for $\Theta_{11}H_\eta^{-1\prime}$ and $\hat A_{1:m}$ is a consistent estimator for $H_\eta'A_{1:m}$, it follows that $\tilde\Theta_{11} - \Theta_{11}A_{1:m}\to_p 0$ and

$$\tilde\Theta_{11}'\tilde\Theta_{11} - A_{1:m}'\Theta_{11}'\Theta_{11}A_{1:m} \to_p 0. \quad (C.17)$$

By the identification condition in Assumption ID3, $\alpha_m^* = 0$ is the unique zero eigenvalue of $A_{1:m}'\Theta_{11}'\Theta_{11}A_{1:m}$, so $\hat A_m - A_m^*\to_p 0$. Similar to (C.5), we have

$$dA_m = \big[A_m^{*\prime}\otimes(-A_{1:m}'\Theta_{11}'\Theta_{11}A_{1:m})^+\big]\mathrm{vec}\big(d\tilde\Theta_{11}'\,\tilde\Theta_{11} + \tilde\Theta_{11}'\,d\tilde\Theta_{11}\big)$$
$$= \big[A_m^{*\prime}\otimes(-A_{1:m}'\Theta_{11}'\Theta_{11}A_{1:m})^+\big]\big[(\tilde\Theta_{11}'\otimes I_m)K_{\ell m} + (I_m\otimes\tilde\Theta_{11}')\big]\mathrm{vec}(d\tilde\Theta_{11}) = \hat B_6\mathrm{vec}(d\tilde\Theta_{11}),$$

where $\hat B_6 = \big[A_m^{*\prime}\otimes(-A_{1:m}'\Theta_{11}'\Theta_{11}A_{1:m})^+\big]\big[(\tilde\Theta_{11}'\otimes I_m)K_{\ell m} + (I_m\otimes\tilde\Theta_{11}')\big]$. Since $\Theta_{11}A_{1:m}A_m^* = 0_{\ell\times1}$ by the definition of $A_m^*$ in (C.15) and $\tilde\Theta_{11} - \Theta_{11}A_{1:m}\to_p 0$, it follows that $\tilde\Theta_{11}A_m^*\to_p 0_{\ell\times1}$. Thus, we have

$$\hat B_6 \to_p B_6 \equiv \big[A_m^{*\prime}\otimes(-A_{1:m}'\Theta_{11}'\Theta_{11}A_{1:m})^+A_{1:m}'\Theta_{11}'\big], \quad \text{with }\mathrm{rank}(B_6) = m-1,$$

because $\mathrm{rank}(\Theta_{11}A_{1:m}) = m-1$ by ID3. By the Delta method, we have

$$\sqrt T(\hat A_m - A_m^*) = \sqrt T\,B_6\mathrm{vec}(\tilde\Theta_{11} - \Theta_{11}A_{1:m}) + o_p(1). \quad (C.18)$$

Next, we can derive the following asymptotic representation for the contemporaneous IRFs of the first k variables in $X_t$ with respect to $\zeta_{mt}$:

$$\sqrt T(\hat\Theta_1\hat A_{1:m}\hat A_m - \Theta_1A_{1:m}A_m^*) = \sqrt T(\hat\Theta_1\hat A_{1:m} - \Theta_1A_{1:m})\hat A_m + \Theta_1A_{1:m}\sqrt T(\hat A_m - A_m^*)$$
$$= \sqrt T(A_m^{*\prime}\otimes I_k)\mathrm{vec}(\hat\Theta_1\hat A_{1:m} - \Theta_1A_{1:m}) + \sqrt T\,\Theta_1A_{1:m}B_6\mathrm{vec}(\tilde\Theta_{11} - \Theta_{11}A_{1:m}) + o_p(1)$$
$$= \big[(A_m^{*\prime}\otimes I_k) + \Theta_1A_{1:m}B_6(I_m\otimes C_0)\big]B_5\sqrt T\,\mathrm{vec}(\hat\Theta_1 - \Theta_1H_\eta^{-1\prime}) + o_p(1),$$

where we use the results in (C.18) and (C.14). By the distribution of $\sqrt T\,\mathrm{vec}(\hat\Theta_1 - \Theta_1H_\eta^{-1\prime})$ in Proposition 1 and (3.17), we obtain

$$\sqrt T(\hat\Theta_1\hat A_{1:m}\hat A_m - \Theta_1A_m) \to_d N(0_{k\times1},\,B_7S\Omega^{(1)}S'B_7'),$$

where $B_7 = \big[(A_m^{*\prime}\otimes I_k) + \Theta_1A_{1:m}B_6(I_m\otimes C_0)\big]B_5$.

For the IRFs of the first ℓ variables in $X_t$ with respect to $\zeta_{mt}$,

$$\sqrt T(\hat\Theta_{11}\hat A_{1:m}\hat A_m - \Theta_{11}A_{1:m}A_m^*) = \sqrt T(\hat\Theta_{11}\hat A_{1:m} - \Theta_{11}A_{1:m})\hat A_m + \Theta_{11}A_{1:m}\sqrt T(\hat A_m - A_m^*) = B_{7\ell}\sqrt T\,\mathrm{vec}(\hat\Theta_1 - \Theta_1H_\eta^{-1\prime}) + o_p(1),$$

where $B_{7\ell} = \big[(A_m^{*\prime}\otimes I_\ell) + \Theta_{11}A_{1:m}B_6\big](I_m\otimes C_0)B_5$. Similar to (C.8),

$$(A_m^{*\prime}\otimes I_\ell) + \Theta_{11}A_{1:m}B_6 = A_m^{*\prime}\otimes\big[I_\ell - \Theta_{11}A_{1:m}(A_{1:m}'\Theta_{11}'\Theta_{11}A_{1:m})^+A_{1:m}'\Theta_{11}'\big],$$

which is of rank $\ell - m + 1$.

(b) Now, consider the contemporaneous IRFs of the i-th variable for $i = \ell+1,\ldots,N$. Note that

$$\sqrt T\,\mathrm{vec}(\hat A_{1:m}\hat A_m - H_\eta'A_{1:m}A_m^*) = \mathrm{vec}\big[\sqrt T(\hat A_{1:m} - H_\eta'A_{1:m})\hat A_m + H_\eta'A_{1:m}\sqrt T(\hat A_m - A_m^*)\big]$$
$$= (A_m^{*\prime}\otimes I_q)\sqrt T\,\mathrm{vec}(\hat A_{1:m} - H_\eta'A_{1:m}) + H_\eta'A_{1:m}\sqrt T(\hat A_m - A_m^*)$$
$$= B_8\sqrt T\,\mathrm{vec}(\hat\Theta_1 - \Theta_1H_\eta^{-1\prime}) + o_p(1), \quad (C.19)$$

where we use (C.12), (C.14), and (C.18), and $B_8 = (A_m^{*\prime}\otimes I_q)Q\big[(K_{qq} + I_{q^2})(I_q\otimes H_\eta^{-1}\Theta_1')\big] + H_\eta'A_{1:m}B_6(I_m\otimes C_0)B_5$.

For $\hat\theta_i'\hat A_{1:m}\hat A_m$ with $i = k+1,\ldots,N$, by (B.8) and (C.19) we can obtain

$$\sqrt T(\hat\theta_i'\hat A_{1:m}\hat A_m - \theta_i'A_{1:m}A_m^*) = \hat\theta_i'\sqrt T(\hat A_{1:m}\hat A_m - H_\eta'A_{1:m}A_m^*) + \sqrt T(\hat\theta_i - H_\eta^{-1}\theta_i)'H_\eta'A_{1:m}A_m^*$$
$$= \theta_i'B_8\sqrt T\,\mathrm{vec}(\hat\Theta_1 - \Theta_1H_\eta^{-1\prime}) + A_m^{*\prime}A_{1:m}'H_\eta G'\sqrt T(\hat\lambda_i - H_F^{-1}\lambda_i) + o_p(1)$$
$$= B_9^{(i)}\sqrt T\begin{bmatrix}\mathrm{vec}(\hat\Lambda_1 - \Lambda_1H_F^{-1\prime})\\ \hat\lambda_i - H_F^{-1}\lambda_i\end{bmatrix} + o_p(1),$$

where $B_9^{(i)} = \theta_i'H_\eta^{-1\prime}B_8(H_\eta'G'H_F\otimes I_k)C_1 + A_m'G'H_FC_2$.

(c) For the IRFs of $X_{it}$ with respect to $\zeta_{m,t-s}$ ($s\ge1$), we have

$$\sqrt T\big(\hat\lambda_i'\hat\Psi_s\hat G\hat A_{1:m}\hat A_m - \lambda_i'H_F^{-1\prime}H_F'\Psi_sH_F^{-1\prime}H_F'GH_\eta H_\eta^{-1}A_{1:m}A_m^*\big)$$
$$= \sqrt T\,\hat\lambda_i'\hat\Psi_s\hat G(\hat A_{1:m}\hat A_m - H_\eta'A_{1:m}A_m^*) + \sqrt T\,\hat\lambda_i'(\hat\Psi_s - H_F'\Psi_sH_F^{-1\prime})H_F'GA_{1:m}A_m^* + \sqrt T(\hat\lambda_i - H_F^{-1}\lambda_i)'H_F'\Psi_sGA_{1:m}A_m^* + o_p(1)$$
$$= \sqrt T\,\lambda_i'\Psi_sGB_8\mathrm{vec}(\hat\Theta_1 - \Theta_1H_\eta^{-1\prime}) + A_m^{*\prime}A_{1:m}'G'\Psi_s'H_F\sqrt T(\hat\lambda_i - H_F^{-1}\lambda_i) + \sqrt T(\lambda_i'\otimes A_m^{*\prime}A_{1:m}'G'H_F)\mathrm{vec}(\hat\Psi_s' - H_F^{-1}\Psi_s'H_F) + o_p(1)$$
$$= B_{10}^{(i)}\sqrt T\begin{bmatrix}\mathrm{vec}(\hat\Lambda_1 - \Lambda_1H_F^{-1\prime})\\ \hat\lambda_i - H_F^{-1}\lambda_i\\ \mathrm{vec}(\hat\Psi_s' - H_F^{-1}\Psi_s'H_F)\end{bmatrix} + o_p(1),$$

where the 2nd equality uses (C.19), the last line uses (B.8), and $B_{10}^{(i)} = \lambda_i'\Psi_sGH_\eta B_8(H_\eta'G'H_F\otimes I_k)C_3 + A_m^{*\prime}A_{1:m}'G'\Psi_s'H_FC_4 + (\lambda_i'H_F^{-1\prime}\otimes A_m^{*\prime}A_{1:m}'G'H_F)C_5$. Q.E.D.

Proof of Theorem 3:
The proof mainly follows that of Theorem 1 of Kleibergen and Paap (2006). Let $\Xi = \Sigma(1)^{-1/2}\Theta_1H_\eta^{-1\prime}(G'G)^{-1/2}$, so we have

$$\sqrt T\,\mathrm{vec}(\hat\Xi - \Xi) = \big[(G'G)^{-1/2}\otimes\Sigma(1)^{-1/2}\big]\sqrt T\,\mathrm{vec}(\hat\Theta_1 - \Theta_1H_\eta^{-1\prime}) \to_d N(0_{kq\times1},\,I_{kq}). \quad (C.20)$$

Note that $\hat M_{d,2}\hat M_{d,2}'$ and $\hat M_{d,1}'\hat M_{d,1}$ are identity matrices, so we have

$$\sqrt T\hat\rho_d = \underbrace{\sqrt T(\hat M_{d,2}\otimes\hat M_{d,1}')\mathrm{vec}(\hat\Xi - \Xi)}_{\to_d\,N(0_{kq\times1},\,I_{kq})} + \sqrt T\,\mathrm{vec}(\hat M_{d,1}'\Xi\hat M_{d,2}'). \quad (C.21)$$

We next show that $\sqrt T\,\mathrm{vec}(\hat M_{d,1}'\Xi\hat M_{d,2}')\to_p 0$. By the decomposition suggested by Kleibergen and Paap (2006), it follows that $\Xi = K_{d,1}K_{d,2}$ under $H_0^{(d)}$, where $K_{d,1}$ is $k\times d$, $K_{d,2}$ is $d\times q$, $\hat M_{d,1}'K_{d,1}\to_p 0$, and $K_{d,2}\hat M_{d,2}'\to_p 0$. Kleibergen and Paap show that $\hat M_{d,1}$ and $\hat M_{d,2}$ converge to their probability limits at the same rate as $\hat\Theta_1$. Hence, we have $\hat M_{d,1}'K_{d,1} = O_p(T^{-1/2}) + O_p(\delta_{NT}^{-2})$ and $K_{d,2}\hat M_{d,2}' = O_p(T^{-1/2}) + O_p(\delta_{NT}^{-2})$ by (B.4). Under the condition that $\sqrt T/N\to0$ as $N,T\to\infty$, we have $\sqrt T\,\mathrm{vec}(\hat M_{d,1}'\Xi\hat M_{d,2}')\to_p 0$. Q.E.D.
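The construction behind $\hat\rho_d$ can be illustrated with a stripped-down SVD computation: extract the component of a matrix beyond its first d singular directions, which is exactly zero when the rank-d null holds. This is a simplified sketch of the Kleibergen–Paap ingredients, not their full studentized statistic.

```python
import numpy as np

def residual_svd_component(Xi, d):
    """Return vec(M1' Xi M2'), where M1 and M2 span the singular
    directions of Xi beyond the first d -- the vector whose distance
    from zero a KP-type rank-d test evaluates (simplified sketch)."""
    U, s, Vt = np.linalg.svd(Xi)
    M1 = U[:, d:]            # k x (k - d)
    M2 = Vt[d:, :]           # (q - d) x q
    return (M1.T @ Xi @ M2.T).ravel()

rng = np.random.default_rng(1)
Xi = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))  # rank 2
assert np.allclose(residual_svd_component(Xi, 2), 0, atol=1e-8)  # rank-2 null holds
assert np.linalg.norm(residual_svd_component(Xi, 1)) > 1e-3      # rank-1 null fails
```

In the paper, $\Xi$ is additionally pre- and post-scaled by $\Sigma(1)^{-1/2}$ and $(G'G)^{-1/2}$ so that the resulting vector is asymptotically standard normal.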

Proof of Theorem 4:
Recall that Theorem 3 has shown that $\sqrt T\hat\rho_d$ is asymptotically normal if $\mathrm{rank}(\Theta_1) = d$. The term $\hat\tau_d$ is designed to test the rank of $\Theta_{11}A_{1:m}$, and we only need to find its asymptotic distribution. Note that

$$\sqrt T\hat\tau_d = \sqrt T\big(\hat M_{d,2}^{(\ell)}\otimes\hat M_{d,1}^{(\ell)\prime}\big)\mathrm{vec}\big[\Sigma(1)_{11}^{-1/2}\big(\hat\Theta_{11}\hat A_{1:m} - \Theta_{11}A_{1:m}\big)\big] + \sqrt T\,\mathrm{vec}\big(\hat M_{d,1}^{(\ell)\prime}\Sigma(1)_{11}^{-1/2}\Theta_{11}A_{1:m}\hat M_{d,2}^{(\ell)\prime}\big), \quad (C.22)$$

where the second term in (C.22) is negligible for the same reason as the second term in (C.21). By (C.14), the first term in (C.22) can be reduced to

$$\sqrt T\big(\hat M_{d,2}^{(\ell)}\otimes\hat M_{d,1}^{(\ell)\prime}\big)\mathrm{vec}\big[\Sigma(1)_{11}^{-1/2}\big(\hat\Theta_{11}\hat A_{1:m} - \Theta_{11}A_{1:m}\big)\big] = \sqrt T\big(\hat M_{d,2}^{(\ell)}\otimes\hat M_{d,1}^{(\ell)\prime}\Sigma(1)_{11}^{-1/2}\big)(I_m\otimes C_0)B_5\,\mathrm{vec}(\hat\Theta_1 - \Theta_1H_\eta^{-1\prime}) + o_p(1). \quad (C.23)$$

This gives the second row of the matrix $B_d$. Combining with the result for $\hat\rho_d$, we obtain

$$\sqrt T\begin{bmatrix}\hat\rho_d\\ \hat\tau_d\end{bmatrix} = B_d\sqrt T\,\mathrm{vec}(\hat\Theta_1 - \Theta_1H_\eta^{-1\prime}) + o_p(1). \quad (C.24)$$

Under Assumption 8, it follows that $w_d^{joint}\to_d\chi^2_{(k-d)(q-d)+(\ell-d+1)}$. Q.E.D.

References

[1] Amengual, D. and M. Watson, 2007. Consistent Estimation of the Number of Dynamic Factors in a Large N and T Panel, Journal of Business and Economic Statistics, 25(1), 91-96.

[2] Bai, J., 2003. Inferential Theory for Factor Models of Large Dimensions, Econometrica, 71, 135-172.

[3] Bai, J. and K.P. Li, 2012. Statistical Analysis of Factor Models of High Dimension, Annals of Statistics, 40(1), 436-465.

[4] Bai, J., K.P. Li, and L. Lu, 2015. Estimation and Inference of FAVAR Models, Journal of Business and Economic Statistics, forthcoming.

[5] Bai, J. and S. Ng, 2002. Determining the Number of Factors in Approximate Factor Models, Econometrica, 70, 191-221.

[6] Bai, J. and S. Ng, 2004. A PANIC Attack on Unit Roots and Cointegration, Econometrica, 72(4), 1127-1177.

[7] Bai, J. and S. Ng, 2013. Principal Components Estimation and Identification of Static Factors, Journal of Econometrics, 176, 18-29.

[8] Bai, J. and P. Wang, 2014. Identification Theory for High Dimensional Static and Dynamic Factor Models, Journal of Econometrics, 178(2), 794-804.

[9] Bai, J. and P. Wang, 2015. Identification and Bayesian Estimation of Dynamic Factor Models, Journal of Business and Economic Statistics, 33(2), 221-240.

[10] Bernanke, B.S., J. Boivin, and P. Eliasz, 2005. Measuring the Effects of Monetary Policy: A Factor-augmented Vector Autoregressive (FAVAR) Approach, Quarterly Journal of Economics, 120, 387-422.

[11] Bianchi, F., H. Mumtaz, and P. Surico, 2009. The Great Moderation of the Term Structure of U.K. Interest Rates, Journal of Monetary Economics, 56, 856-871.

[12] Boivin, J., M. Giannoni, and I. Mihov, 2009. Sticky Prices and Monetary Policy: Evidence from Disaggregated US Data, American Economic Review, 99(1), 350-384.

[13] Brockwell, P. and R. Davis, 1991. Time Series: Theory and Methods (2nd ed.), Berlin: Springer-Verlag.

[14] Buch, C.M., S. Eickmeier, and E. Prieto, 2014. Macroeconomic Factors and Microlevel Bank Behavior, Journal of Money, Credit and Banking, 46(4), 715-751.

[15] Canova, F. and F.J. Pérez Forero, 2012. Estimating Overidentified, Non-recursive, Time Varying Coefficients Structural VARs, Barcelona GSE Working Paper Series, 637.

[16] Caggiano, G., E. Castelnuovo, and N. Groshenny, 2014. Uncertainty Shocks and Unemployment Dynamics in U.S. Recessions, Journal of Monetary Economics, 67, 78-92.

[17] Christiano, L., M. Eichenbaum, and C. Evans, 1998. Monetary Policy Shocks: What Have We Learned and to What End? NBER Working Paper, 6400.

[18] Eichenbaum, M. and C. Evans, 1995. Some Empirical Evidence on the Effects of Shocks to Monetary Policy on Exchange Rates, Quarterly Journal of Economics, 110, 975-1010.

[19] Eickmeier, S., W. Lemke, and M. Marcellino, 2015. Classical Time Varying Factor-Augmented Vector Autoregressive Models: Estimation, Forecasting and Structural Analysis, Journal of the Royal Statistical Society: Series A, 178(3), 493-533.

[20] Forni, M., D. Giannone, M. Lippi, and L. Reichlin, 2009. Opening the Black Box: Structural Factor Models with Large Cross Sections, Econometric Theory, 25, 1319-1347.

[21] Forni, M. and L. Gambetti, 2010. The Dynamic Effects of Monetary Policy: A Structural Factor Model Approach, Journal of Monetary Economics, 57(2), 203-216.

[22] Forni, M., L. Gambetti, and L. Sala, 2014. No News in Business Cycles, The Economic Journal, 124 (December), 1168-1191.

[23] Gafarov, B., 2014. Identification in Dynamic Models Using Sign Restrictions, Manuscript, Pennsylvania State University.

[24] Gilchrist, S., V. Yankov, and E. Zakrajšek, 2009. Credit Market Shocks and Economic Fluctuations: Evidence from Corporate Bond and Stock Markets, Journal of Monetary Economics, 56, 471-493.

[25] Hamilton, J., 1994. Time Series Analysis, Princeton University Press.

[26] Han, X., 2015. Tests for Overidentifying Restrictions in Factor-Augmented VAR Models, Journal of Econometrics, 184(2), 394-419.

[27] Han, X. and A. Inoue, 2014. Tests for Parameter Instability in Dynamic Factor Models, Econometric Theory, forthcoming.

[28] Kleibergen, F. and R. Paap, 2006. Generalized Reduced Rank Tests Using the Singular Value Decomposition, Journal of Econometrics, 133(1), 97-126.

[29] Kociecki, A., M. Rubaszek, and M. Ca' Zorzi, 2012. Bayesian Analysis of Recursive SVAR Models with Overidentified Restrictions, European Central Bank Working Paper Series, No. 1492.

[30] Koop, G. and D. Korobilis, 2014. A New Index of Financial Conditions, European Economic Review, 71, 101-116.

[31] Leeper, E., C. Sims, and T. Zha, 1996. What Does Monetary Policy Do? Brookings Papers on Economic Activity, 2.

[32] Magnus, J.R. and H. Neudecker, 1999. Matrix Differential Calculus with Applications in Statistics and Econometrics, John Wiley & Sons.

[33] Mumtaz, H. and P. Surico, 2009. The Transmission of International Shocks: A Factor-Augmented VAR Approach, Journal of Money, Credit and Banking, 41(S1), 71-100.

[34] Ritschl, A. and S. Sarferaz, 2014. Currency versus Banking in the Financial Crisis of 1931, International Economic Review, 55(2), 349-373.

[35] Sargent, T.J. and C.A. Sims, 1977. Business Cycle Modelling without Pretending to Have Too Much A Priori Economic Theory, in: Sims et al., eds., New Methods in Business Cycle Research, Federal Reserve Bank of Minneapolis, Minneapolis.

[36] Sims, C. and T. Zha, 1998. Bayesian Methods for Dynamic Multivariate Models, International Economic Review, 39(4), 949-968.

[37] Stock, J. and M. Watson, 2002. Forecasting Using Principal Components from a Large Number of Predictors, Journal of the American Statistical Association, 97, 1167-1179.

[38] Stock, J. and M. Watson, 2005. Implications of Dynamic Factor Models for VAR Analysis, NBER Working Paper No. 11467.

[39] White, H., 1984. Asymptotic Theory for Econometricians, New York: Academic Press.

[39] White, H. 1984. Asymptotic Theory for Econometricians, New York: Academic.

45

Table 1: RMSE ratios between estimators under over- and just-identifying restrictions for m = 4

$\hat\Theta\hat A_q$
                       (k, ℓ)
N, T        (5, 4)   (6, 4)   (6, 5)   (7, 4)   (7, 5)
125, 250    0.668    0.472    0.517    0.389    0.403
125, 500    0.620    0.403    0.425    0.313    0.321
250, 250    0.671    0.491    0.498    0.383    0.393
250, 500    0.602    0.402    0.419    0.305    0.317

$\hat\Theta\hat A_{1:m}\hat A_m$
                       (k, ℓ)
N, T        (5, 4)   (6, 4)   (6, 5)   (7, 4)   (7, 5)
125, 250    0.700    0.691    0.570    0.684    0.545
125, 500    0.645    0.632    0.464    0.616    0.432
250, 250    0.711    0.704    0.548    0.697    0.533
250, 500    0.640    0.632    0.463    0.627    0.440

Notes: The entries are the ratios between the RMSEs of estimators under all of the available over-identifying restrictions and the RMSEs of estimators under just-identifying restrictions. $\hat\Theta\hat A_q$ is an estimator for the q-th column of Γ, i.e., the contemporaneous IRFs with respect to the q-th structural shock. $\hat\Theta\hat A_{1:m}\hat A_m$ is an estimator for the m-th column of Γ, i.e., the contemporaneous IRFs with respect to the m-th structural shock.


Table 2: Size properties of the estimated IRFs when m = 4

                   $\hat\Theta_1\hat A_q$, (k, ℓ)                  $\hat\Theta_1\hat A_{1:m}\hat A_m$, (k, ℓ)
N, T        (5,4)  (6,4)  (6,5)  (7,4)  (7,5)    (5,4)  (6,4)  (6,5)  (7,4)  (7,5)
125, 250    0.045  0.058  0.057  0.063  0.062    0.064  0.066  0.067  0.071  0.070
125, 500    0.057  0.065  0.065  0.068  0.068    0.070  0.072  0.072  0.075  0.075
250, 250    0.038  0.050  0.047  0.052  0.052    0.048  0.050  0.053  0.054  0.055
250, 500    0.049  0.055  0.055  0.055  0.055    0.051  0.054  0.053  0.057  0.054

                   $\hat\theta_i'\hat A_q$, (k, ℓ)                 $\hat\theta_i'\hat A_{1:m}\hat A_m$, (k, ℓ)
N, T        (5,4)  (6,4)  (6,5)  (7,4)  (7,5)    (5,4)  (6,4)  (6,5)  (7,4)  (7,5)
125, 250    0.092  0.082  0.087  0.081  0.085    0.083  0.083  0.081  0.085  0.081
125, 500    0.090  0.090  0.088  0.082  0.084    0.076  0.075  0.079  0.079  0.079
250, 250    0.081  0.068  0.074  0.063  0.068    0.065  0.067  0.069  0.067  0.072
250, 500    0.077  0.066  0.073  0.064  0.073    0.071  0.070  0.067  0.072  0.069

                   $\hat\lambda_i'\hat\Psi_s\hat G\hat A_q$, (k, ℓ)        $\hat\lambda_i'\hat\Psi_s\hat G\hat A_{1:m}\hat A_m$, (k, ℓ)
N, T        (5,4)  (6,4)  (6,5)  (7,4)  (7,5)    (5,4)  (6,4)  (6,5)  (7,4)  (7,5)
125, 250    0.063  0.064  0.057  0.060  0.058    0.061  0.060  0.059  0.061  0.059
125, 500    0.064  0.061  0.061  0.062  0.062    0.063  0.063  0.062  0.065  0.064
250, 250    0.056  0.054  0.055  0.053  0.053    0.049  0.052  0.052  0.053  0.051
250, 500    0.065  0.061  0.059  0.061  0.060    0.062  0.062  0.060  0.065  0.063

Notes: The nominal size is 5%. $\hat\Theta_1\hat A_q$ and $\hat\Theta_1\hat A_{1:m}\hat A_m$ are the contemporaneous IRFs of the first k variables with respect to the q-th shock and m-th shock, respectively. $\hat\theta_i'\hat A_q$ and $\hat\theta_i'\hat A_{1:m}\hat A_m$ are the contemporaneous IRFs of the i-th variable (i > k) with respect to the q-th shock and m-th shock, respectively. $\hat\lambda_i'\hat\Psi_s\hat G\hat A_q$ and $\hat\lambda_i'\hat\Psi_s\hat G\hat A_{1:m}\hat A_m$ are the dynamic IRFs of the i-th variable with respect to the q-th and m-th shock, respectively. The horizon s is set equal to 5.


Table 3: Size properties of the estimated IRFs when m = 3

$\hat\Theta_1\hat A_{1:m}\hat A_m$
                       (k, ℓ)
N, T        (5, 4)   (6, 4)   (6, 5)   (7, 4)   (7, 5)
125, 250    0.077    0.073    0.078    0.072    0.075
125, 500    0.083    0.080    0.082    0.081    0.081
250, 250    0.067    0.062    0.062    0.062    0.060
250, 500    0.063    0.062    0.063    0.061    0.061

$\hat\theta_i'\hat A_{1:m}\hat A_m$
                       (k, ℓ)
N, T        (5, 4)   (6, 4)   (6, 5)   (7, 4)   (7, 5)
125, 250    0.110    0.101    0.106    0.094    0.099
125, 500    0.117    0.099    0.107    0.099    0.103
250, 250    0.086    0.085    0.094    0.082    0.085
250, 500    0.089    0.079    0.083    0.069    0.074

$\hat\lambda_i'\hat\Psi_s\hat G\hat A_{1:m}\hat A_m$
                       (k, ℓ)
N, T        (5, 4)   (6, 4)   (6, 5)   (7, 4)   (7, 5)
125, 250    0.062    0.059    0.064    0.058    0.061
125, 500    0.067    0.065    0.065    0.065    0.064
250, 250    0.056    0.055    0.056    0.054    0.055
250, 500    0.057    0.055    0.057    0.055    0.057

Notes: The nominal size is 5%. $\hat\Theta_1\hat A_{1:m}\hat A_m$ is the contemporaneous IRF of the first k variables with respect to the m-th shock. $\hat\theta_i'\hat A_{1:m}\hat A_m$ is the contemporaneous IRF of the i-th variable (i > k) with respect to the m-th shock. $\hat\lambda_i'\hat\Psi_s\hat G\hat A_{1:m}\hat A_m$ is the dynamic IRF of the i-th variable with respect to the m-th shock. The horizon s is set equal to 5.


Table 4: Size properties of the tests for over-identifying restrictions specified in (2.17) and (2.21)

                   $w_m$, (k, ℓ)                           $w_m^{joint}$, (k, ℓ)
m = 4
N, T        (5,4)  (6,4)  (6,5)  (7,4)  (7,5)    (5,4)  (6,4)  (6,5)  (7,4)  (7,5)
125, 250    0.045  0.060  0.059  0.073  0.073    0.052  0.060  0.070  0.072  0.079
125, 500    0.055  0.068  0.066  0.076  0.080    0.062  0.069  0.075  0.079  0.086
250, 250    0.038  0.050  0.047  0.056  0.055    0.039  0.048  0.053  0.054  0.057
250, 500    0.046  0.054  0.055  0.057  0.060    0.051  0.052  0.059  0.057  0.063

m = 3
N, T        (5,4)  (6,4)  (6,5)  (7,4)  (7,5)    (5,4)  (6,4)  (6,5)  (7,4)  (7,5)
125, 250    0.072  0.088  0.084  0.097  0.099    0.083  0.093  0.100  0.103  0.106
125, 500    0.081  0.089  0.091  0.104  0.100    0.089  0.095  0.103  0.110  0.110
250, 250    0.049  0.059  0.062  0.072  0.069    0.060  0.065  0.069  0.070  0.074
250, 500    0.058  0.063  0.068  0.071  0.073    0.061  0.066  0.071  0.073  0.075

Notes: The nominal size is 5%. $w_m$ tests the null hypothesis $H_0^{(m)}$: rank(Θ1) = m, and $w_m^{joint}$ tests the joint null hypothesis $H_0^{(m)\prime}$: rank(Θ1) = m and rank(Θ11) = m − 1.

Table 5: p-values of tests for over-identifying restrictions

Joint $H_0^{(d)\prime}$   p-value      rank(Θ11) = d   p-value      rank(Θ1) = d   p-value
d = 2                2 × 10⁻¹⁰    d = 2           6 × 10⁻¹¹    d = 3          0.001
d = 3                0.276        d = 3           0.188        d = 4          0.502


Table 6: p-values of IRFs: over-identification versus just identification

p-values of IRFs under just-identifying restrictions
mths after shock     0      1      2      3      6      9      12     18     24     30     36
FFR                0.002  0.019  0.067  0.119  0.265  0.398  0.528  0.762  0.938  0.942  0.863
IP                  NA    0.265  0.604  0.896  0.641  0.454  0.371  0.339  0.365  0.411  0.467
CPI                 NA    0.535  0.242  0.091  0.015  0.013  0.015  0.014  0.007  0.002  0.001
Consumption        0.091  0.005  0.003  0.002  0.002  0.002  0.002  0.003  0.004  0.006  0.006
Cap. Utilization   0.661  0.420  0.771  0.932  0.499  0.367  0.335  0.376  0.474  0.612  0.776
Unemployment       0.100  0.278  0.445  0.570  0.837  0.992  0.880  0.766  0.732  0.732  0.744
Employment         0.559  0.658  0.775  0.915  0.767  0.567  0.438  0.302  0.243  0.214  0.197
PCE Deflator       0.339  0.071  0.021  0.004  0.000  0.000  0.001  0.001  0.001  0.000  0.000
Earning            0.475  0.183  0.044  0.011  0.001  0.000  0.000  0.000  0.000  0.000  0.000
Housing Starts     0.981  0.219  0.058  0.024  0.007  0.005  0.004  0.004  0.005  0.005  0.006
Orders             0.922  0.466  0.294  0.160  0.040  0.019  0.014  0.013  0.018  0.029  0.047
Inventories        0.575  0.617  0.270  0.210  0.466  0.759  0.989  0.647  0.488  0.415  0.379
M2                 0.752  0.263  0.069  0.026  0.004  0.002  0.001  0.000  0.000  0.000  0.000
Reserves           0.097  0.700  0.581  0.269  0.070  0.058  0.065  0.090  0.115  0.128  0.132
Consumer Credits   0.016  0.035  0.094  0.227  0.883  0.631  0.384  0.198  0.138  0.111  0.096
SP500              0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
Ex Rate Yen        0.105  0.072  0.107  0.118  0.148  0.185  0.226  0.310  0.387  0.450  0.498
PPI                0.015  0.057  0.060  0.040  0.018  0.016  0.018  0.015  0.009  0.004  0.001
Commodity Price     NA    0.088  0.129  0.128  0.110  0.104  0.108  0.130  0.159  0.199  0.250
Cons. Expectation  0.004  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.004  0.015  0.031

p-values of IRFs under over-identifying restrictions (the benchmark specification)
mths after shock     0      1      2      3      6      9      12     18     24     30     36
FFR                0.006  0.061  0.166  0.270  0.518  0.710  0.872  0.881  0.719  0.617  0.555
IP                 0.255  0.554  0.961  0.755  0.371  0.224  0.154  0.109  0.109  0.128  0.155
CPI                0.255  0.914  0.542  0.243  0.050  0.040  0.042  0.035  0.019  0.007  0.002
Consumption        0.048  0.002  0.001  0.001  0.000  0.000  0.000  0.000  0.000  0.001  0.001
Cap. Utilization   0.971  0.788  0.862  0.610  0.278  0.171  0.130  0.122  0.160  0.241  0.358
Unemployment       0.475  0.781  0.987  0.874  0.627  0.486  0.395  0.304  0.275  0.272  0.280
Employment         0.841  0.770  0.677  0.577  0.371  0.246  0.167  0.086  0.054  0.039  0.032
PCE Deflator       0.255  0.079  0.028  0.006  0.000  0.001  0.001  0.003  0.002  0.000  0.000
Earning            0.981  0.497  0.185  0.065  0.006  0.002  0.001  0.000  0.000  0.000  0.000
Housing Starts     0.487  0.040  0.004  0.001  0.000  0.000  0.000  0.000  0.000  0.000  0.000
Orders             0.553  0.327  0.200  0.109  0.024  0.008  0.004  0.002  0.003  0.008  0.018
Inventories        0.397  0.974  0.576  0.546  0.894  0.800  0.574  0.308  0.193  0.141  0.117
M2                 0.706  0.321  0.099  0.043  0.011  0.006  0.004  0.001  0.000  0.000  0.000
Reserves           0.100  0.831  0.498  0.223  0.074  0.083  0.110  0.163  0.196  0.205  0.201
Consumer Credits   0.160  0.206  0.406  0.686  0.555  0.210  0.085  0.022  0.010  0.006  0.005
SP500              0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
Ex Rate Yen        0.012  0.017  0.032  0.038  0.057  0.090  0.133  0.235  0.333  0.413  0.473
PPI                0.082  0.196  0.170  0.112  0.051  0.045  0.046  0.039  0.023  0.010  0.003
Commodity Price    0.255  0.233  0.184  0.150  0.094  0.076  0.070  0.074  0.091  0.126  0.178
Cons. Expectation  0.002  0.000  0.000  0.000  0.000  0.000  0.000  0.001  0.006  0.020  0.041


Figure 1: Size-adjusted power of $w_d$ and $w_d^{joint}$

[Four panels of power curves over β ∈ [0, 1]: "Power against the null hypothesis rank(Θ1) = 4" and "Power against the null hypothesis rank(Θ1) = 3" ($w_d$ test vs. infeasible KP test); "Power against the joint null hypothesis rank(Θ1) = 4 and rank(Θ11) = 3" and "Power against the joint null hypothesis rank(Θ1) = 3 and rank(Θ11) = 2" ($w_d^{joint}$ test vs. infeasible joint test).]

Notes: The solid lines with circles denote the size-adjusted power of $w_d$ and $w_d^{joint}$. The solid lines with asterisks denote the size-adjusted power of the infeasible KP test and joint rank test. β specifies the strength of the violation of the null hypothesis. The sample size is (N, T) = (125, 500). The upper-left (lower-left) panel shows the power of $w_d$ and the KP test against the null hypothesis that rank(Θ1) = 4 (rank(Θ1) = 3); the upper-right (lower-right) panel shows the power of $w_d^{joint}$ and the infeasible joint test against the joint null hypothesis that rank(Θ1) = 4 and rank(Θ11) = 3 (rank(Θ1) = 3 and rank(Θ11) = 2).


Figure 2: Cumulative IRFs after a contractionary monetary policy shock, m = 4 versus m = 3

[Twenty panels of cumulative IRFs over a 36-month horizon, one per variable: FFR, IP, CPI, Consumption, Capacity Utilization, Unemployment, Employment, PCE Deflator, Earning, Housing Starts, Orders, Inventories, M2, Reserves, Consumer Credits, SP500, Ex Rate Yen, PPI, Commodity Price, Consumer Expectation. Legend: IRFs with m = 4; IRFs with m = 3; 90% confidence bands for IRFs with m = 4.]

Notes: The solid curves are the cumulative IRFs after a contractionary monetary policy shock under the benchmark setup m = 4, which passes our specification tests. The dashed curves are the cumulative IRFs under m = 3, which does not pass our specification tests. The dotted lines are the 90% confidence bands for the cumulative IRFs with m = 4.
