
Econometrica, Vol. 71, No. 1 (January, 2003), 135-171

INFERENTIAL THEORY FOR FACTOR MODELS OF LARGE DIMENSIONS

BY JUSHAN BAI¹

This paper develops an inferential theory for factor models of large dimensions. The principal components estimator is considered because it is easy to compute and is asymptotically equivalent to the maximum likelihood estimator (if normality is assumed). We derive the rate of convergence and the limiting distributions of the estimated factors, factor loadings, and common components. The theory is developed within the framework of large cross sections (N) and a large time dimension (T), to which classical factor analysis does not apply.

We show that the estimated common components are asymptotically normal with a convergence rate equal to the minimum of the square roots of N and T. The estimated factors and their loadings are generally normal, although not always so. The convergence rate of the estimated factors and factor loadings can be faster than that of the estimated common components. These results are obtained under general conditions that allow for correlations and heteroskedasticities in both dimensions. Stronger results are obtained when the idiosyncratic errors are serially uncorrelated and homoskedastic. A necessary and sufficient condition for consistency is derived for large N but fixed T.

KEYWORDS: Approximate factor models, principal components, common components, large model analysis, large data sets, data-rich environment.

1. INTRODUCTION

ECONOMISTS NOW HAVE the luxury of working with very large data sets. For example, the Penn World Tables contain thirty variables for more than one hundred countries covering the postwar years. The World Bank has data for about two hundred countries over forty years. State and sectoral level data are also widely available. Such a data-rich environment is due in part to technological advances in data collection, and in part to the inevitable accumulation of information over time. A useful method for summarizing information in large data sets is factor analysis. More importantly, many economic problems are characterized by factor models; e.g., the Arbitrage Pricing Theory of Ross (1976). While there is a well developed inferential theory for factor models of small dimensions (classical), the inferential theory for factor models of large dimensions is absent. The purpose of this paper is to partially fill this void.

¹ I am grateful to three anonymous referees and a co-editor for their constructive comments, which led to a significant improvement in the presentation. I am also grateful to James Stock for his encouragement in pursuing this research and to Serena Ng and Richard Tresch for their valuable comments. In addition, this paper has benefited from comments received from the NBER Summer Meetings, 2001, and the Econometric Society Summer Meetings, Maryland, 2001. Partial results were also reported at the NBER/NSF Forecasting Seminar in July, 2000 and at the ASSA Winter Meetings in New Orleans, January, 2001. Partial financial support from the NSF (Grant #SBR-9896329) is acknowledged.

A factor model has the following representation:

(1)   $X_{it} = \lambda_i' F_t + e_{it},$

where $X_{it}$ is the observed datum for the $i$th cross section at time $t$ ($i = 1, \dots, N$; $t = 1, \dots, T$); $F_t$ is an $(r \times 1)$ vector of common factors; $\lambda_i$ is an $(r \times 1)$ vector of factor loadings; and $e_{it}$ is the idiosyncratic component of $X_{it}$. The right-hand-side variables are not observable.² Readers are referred to Wansbeek and Meijer (2000) for an econometric perspective on factor models. Classical factor analysis assumes a fixed N, while T is allowed to increase. In this paper, we develop an inferential theory for factor models of large dimensions, allowing both N and T to increase.

² It is noted that N also represents the number of variables and T represents the number of observations, an interpretation used by classical factor analysis.

1.1. Examples of Factor Models of Large Dimensions

(i) Asset Pricing Models. A fundamental assumption of the Arbitrage Pricing Theory (APT) of Ross (1976) is that asset returns follow a factor structure. In this case, Xit represents asset i's return in period t; Ft is a vector of factor returns; and eit is the idiosyncratic return. This theory leaves the number of factors unspecified, though fixed.

(ii) Disaggregate Business Cycle Analysis. Cyclical variations in a country's economy could be driven by global or country-specific shocks, as analyzed by Gregory and Head (1999). Similarly, variations at the industry level could be driven by country-wide or industry-specific shocks, as analyzed by Forni and Reichlin (1998). Factor models allow for the identification of common and specific shocks, where $X_{it}$ is the output of country (industry) i in period t; $F_t$ is the common shock at t; and $\lambda_i$ is the exposure of country (industry) i to the common shocks. This factor approach of analyzing business cycles is gaining prominence.

(iii) Monitoring, Forecasting, and Diffusion Indices. The state of the economy is often described by means of a small number of variables. In recent years, Forni, Hallin, Lippi, and Reichlin (2000b), Reichlin (2000), and Stock and Watson (1998), among others, stressed that the factor model provides an effective way of monitoring economic activity. This is because business cycles are defined as co-movements of economic variables, and common factors provide a natural representation of these co-movements. Stock and Watson (1998) further demonstrated that the estimated common factors (diffusion indices) can be used to improve forecasting accuracy.

(iv) Consumer Theory. Suppose $X_{ih}$ represents the budget share of good i for household h. The rank of a demand system is the smallest integer r such that $X_{ih} = \lambda_{i1} G_1(e_h) + \cdots + \lambda_{ir} G_r(e_h)$, where $e_h$ is household h's total expenditure, and the $G_j(\cdot)$ are unknown functions. By defining the r factors as $F_h = [G_1(e_h), \dots, G_r(e_h)]'$, the rank of the demand system is equal to the number of factors. An important result due to Gorman (1981) is that demand systems consistent with standard axioms of consumer theory cannot have a rank of more than three, meaning that the resource allocation of utility-maximizing households should be described by at most three factors. This theory is further generalized by Lewbel (1991).

Applications of large-dimensional factor models are rapidly increasing. Bernanke and Boivin (2000) showed that using the estimated factors makes it possible to incorporate large data sets into the study of the Fed's monetary policy. Favero and Marcellino (2001) examined a related problem for European countries. Cristadoro, Forni, Reichlin, and Veronese (2001) estimated a core inflation index from a large number of inflation indicators. As is well known, factor models have been important in finance and are used for performance evaluations and risk measurement; see, e.g., Campbell, Lo, and MacKinlay (1997, Chapters 5 and 6). More recently, Tong (2000) studied the profitability of momentum trading strategies using factor models.

In all the above and in future applications, it is of interest to know when the estimated factors can be treated as known. That is, under what conditions is the estimation error negligible? This question arises whenever the estimated factors are used as regressors and the significance of the factors is tested, as for example in Bernanke and Boivin (2000), and also in the diffusion index forecasting of Stock and Watson (1998). If the estimation error is not negligible, then the distributional properties of the estimated factors are needed. In addition, it is always useful to construct confidence intervals for the estimates, especially for applications in which the estimates represent economic indices. This paper offers answers to these and related theoretical questions.

1.2. Limitations of Classical Factor Analysis

Let $X_t = (X_{1t}, X_{2t}, \dots, X_{Nt})'$ and $e_t = (e_{1t}, e_{2t}, \dots, e_{Nt})'$ be $N \times 1$ vectors, and let $\Sigma = \operatorname{cov}(X_t)$ be the covariance matrix of $X_t$. Classical factor analysis assumes that N is fixed and much smaller than T, and further that the $e_{it}$ are independent and identically distributed (iid) over time and are also independent across i. Thus the variance-covariance matrix of $e_t$, $\Omega = E(e_t e_t')$, is a diagonal matrix. The factors $F_t$ are also assumed to be iid and are independent of $e_{it}$. Although not essential, normality of $e_{it}$ is often assumed and maximum likelihood estimation is used in estimation. In addition, inferential theory is based on the basic assumption that the sample covariance matrix

$\hat\Sigma = \frac{1}{T}\sum_{t=1}^{T}(X_t - \bar X)(X_t - \bar X)'$

is root-T consistent for $\Sigma$ and asymptotically normal; see Anderson (1984) and Lawley and Maxwell (1971).


These assumptions are restrictive for economic problems. First, the number of cross sections (N) is often larger than the number of time periods (T) in economic data sets. Potentially important information is lost when a small number of variables is chosen (to meet the small-N requirement). Second, the assumption that $\sqrt{T}(\hat\Sigma - \Sigma)$ is asymptotically normal may be appropriate under a fixed N, but it is no longer appropriate when N also tends to infinity. For example, the rank of $\hat\Sigma$ does not exceed $\min\{N, T\}$, whereas the rank of $\Sigma$ can always be N. Third, the iid assumption and the diagonality of the idiosyncratic covariance matrix $\Omega$, which rules out cross-section correlation, are too strong for economic time series data. Fourth, maximum likelihood estimation is not feasible for large-dimensional factor models because the number of parameters to be estimated is large. Fifth, classical factor analysis can consistently estimate the factor loadings ($\lambda_i$'s) but not the common factors ($F_t$'s). In economics, it is often the common factors (representing the factor returns, common shocks, diffusion indices, etc.) that are of direct interest.

1.3. Recent Developments and the Main Results of this Paper

There is a growing literature that recognizes the limitations of classical factor analysis and proposes new methodologies. Chamberlain and Rothschild (1983) introduced the notion of an "approximate factor model" to allow for a nondiagonal covariance matrix. Furthermore, Chamberlain and Rothschild showed that the principal components method is equivalent to factor analysis (or maximum likelihood under normality of the $e_{it}$) when N increases to infinity. But they assumed a known $N \times N$ population covariance matrix. Connor and Korajczyk (1986, 1988, 1993) studied the case of an unknown covariance matrix and suggested that when N is much larger than T, the factor model can be estimated by applying the principal components method to the $T \times T$ covariance matrix. Forni and Lippi (1997) and Forni and Reichlin (1998) considered large-dimensional dynamic factor models and suggested different methods for estimation. Forni, Hallin, Lippi, and Reichlin (2000a) formulated the dynamic principal components method by extending the analysis of Brillinger (1981).

Some preliminary estimation theory of large factor models has been obtained in the literature. Connor and Korajczyk (1986) proved consistency for the estimated factors with T fixed. For inference that requires large T, they used a sequential limit argument (N goes to infinity first and then T goes to infinity).³ Stock and Watson (1999) studied the uniform consistency of estimated factors and derived some rates of convergence for large N and large T. The rate of convergence was also studied by Bai and Ng (2002). Forni et al. (2000a, c) established consistency and some rates of convergence for the estimated common components ($\lambda_i' F_t$) for dynamic factor models.

3 Connor and Korajczyk (1986) recognized the importance of simultaneous limit theory. They stated that "Ideally we would like to allow N and T to grow simultaneously (possibly with their ratio approaching some limit). We know of no straightforward technique for solving this problem and leave it for future endeavors." Studying simultaneous limits is only a recent endeavor.


However, inferential theory is not well understood for large-dimensional factor models. For example, limiting distributions are not available in the literature. In addition, the rates of convergence derived thus far are not the ones that would deliver a (nondegenerate) convergence in distribution. In this paper, we derive the rate of convergence and the limiting distributions for the estimated factors, factor loadings, and common components, estimated by the principal components method. Furthermore, the results are derived under more general assumptions than classical factor analysis. In addition to large N and large T, we allow for serial and cross-section dependence for the idiosyncratic errors; we also allow for heteroskedasticity in both dimensions. Under classical factor models, with a fixed N, one can consistently estimate factor loadings but not the factors; see Anderson (1984). In contrast, we demonstrate that both the factors and factor loadings can be consistently estimated (up to a normalization) for large-dimensional factor models.

We also consider the case of large N but fixed T. We show that to estimate the factors consistently, a necessary condition is asymptotic orthogonality and asymptotic homoskedasticity (defined below). In contrast, under the framework of large N and large T, we establish consistency in the presence of serial correlation and heteroskedasticity. That is, the necessary condition under fixed T is no longer necessary when both N and T are large.

The rest of the paper is organized as follows. Section 2 sets up the model. Section 3 provides the asymptotic theory for the estimated factors, factor loadings, and common components. Section 4 provides additional results in the absence of serial correlation and heteroskedasticity. The case of fixed T is also studied. Section 5 derives consistent estimators for the covariance matrices occurring in the limiting distributions. Section 6 reports the simulation results. Concluding remarks are provided in Section 7. All proofs are given in the appendices.

2. ESTIMATION AND ASSUMPTIONS

Recall that a factor model is represented by

(2)   $X_{it} = \lambda_i' F_t + e_{it} = C_{it} + e_{it},$

where $C_{it} = \lambda_i' F_t$ is the common component, and all other variables were introduced previously. When N is small, the model can be cast under the state space setup and be estimated by maximizing the Gaussian likelihood via the Kalman filter. As N increases, the state space and the number of parameters to be estimated increase very quickly, rendering the estimation problem challenging, if not impossible. But factor models can also be estimated by the method of principal components. As shown by Chamberlain and Rothschild (1983), the principal components estimator converges to the maximum likelihood estimator when N increases (though they did not consider sampling variations). Yet the former is much easier to compute. Thus this paper focuses on the properties of the principal components estimator.


Equation (2) can be written as an N-dimensional time series with T observations:

(3)   $X_t = \Lambda F_t + e_t$   $(t = 1, 2, \dots, T),$

where $X_t = (X_{1t}, X_{2t}, \dots, X_{Nt})'$, $\Lambda = (\lambda_1, \lambda_2, \dots, \lambda_N)'$, and $e_t = (e_{1t}, e_{2t}, \dots, e_{Nt})'$. Alternatively, we can rewrite (2) as a T-dimensional system with N observations:

$X_i = F\lambda_i + e_i$   $(i = 1, 2, \dots, N),$

where $X_i = (X_{i1}, X_{i2}, \dots, X_{iT})'$, $F = (F_1, F_2, \dots, F_T)'$, and $e_i = (e_{i1}, e_{i2}, \dots, e_{iT})'$. We will also use the matrix notation

$X = F\Lambda' + e,$

where $X = (X_1, X_2, \dots, X_N)$ is a $T \times N$ matrix of observed data and $e = (e_1, e_2, \dots, e_N)$ is a $T \times N$ matrix of idiosyncratic errors. The matrices $\Lambda$ ($N \times r$) and $F$ ($T \times r$) are both unknown.

Our objective is to derive the large-sample properties of the estimated factors and their loadings when both N and T are large. The method of principal components minimizes

$V(r) = \min_{\Lambda, F}\ (NT)^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}(X_{it} - \lambda_i' F_t)^2.$

Concentrating out $\Lambda$ and using the normalization that $F'F/T = I_r$ (an $r \times r$ identity matrix), the problem is identical to maximizing $\operatorname{tr}(F'(XX')F)$. The estimated factor matrix, denoted by $\tilde F$, is $\sqrt{T}$ times the eigenvectors corresponding to the $r$ largest eigenvalues of the $T \times T$ matrix $XX'$, and $\tilde\Lambda' = (\tilde F'\tilde F)^{-1}\tilde F'X = \tilde F'X/T$ is the corresponding matrix of factor loadings. The common component matrix $F^0\Lambda^{0\prime}$ is estimated by $\tilde F\tilde\Lambda'$.
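The estimator just described is straightforward to code. The following sketch (not from the paper; a minimal illustration with simulated Gaussian data and hypothetical names) computes $\tilde F$ as $\sqrt{T}$ times the eigenvectors of $XX'$ associated with its r largest eigenvalues, $\tilde\Lambda = X'\tilde F/T$, and the estimated common components $\tilde F\tilde\Lambda'$.

```python
import numpy as np

def pc_estimate(X, r):
    """Principal components estimator: F_tilde is sqrt(T) times the eigenvectors
    of X X' for the r largest eigenvalues; Lambda_tilde = X' F_tilde / T."""
    T, N = X.shape
    eigval, eigvec = np.linalg.eigh(X @ X.T)          # T x T symmetric matrix
    order = np.argsort(eigval)[::-1][:r]              # indices of r largest eigenvalues
    F_tilde = np.sqrt(T) * eigvec[:, order]           # T x r, satisfies F'F/T = I_r
    Lam_tilde = X.T @ F_tilde / T                     # N x r loadings
    C_tilde = F_tilde @ Lam_tilde.T                   # T x N common components
    return F_tilde, Lam_tilde, C_tilde

# Illustration with simulated data (all names and values here are hypothetical).
rng = np.random.default_rng(0)
T, N, r = 100, 50, 2
F0 = rng.standard_normal((T, r))
Lam0 = rng.standard_normal((N, r))
X = F0 @ Lam0.T + rng.standard_normal((T, N))
F_tilde, Lam_tilde, C_tilde = pc_estimate(X, r)
print(np.allclose(F_tilde.T @ F_tilde / T, np.eye(r)))   # normalization check: True
```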

Since both N and T are allowed to grow, there is a need to explain the way in which limits are taken. There are three limit concepts: sequential, pathwise, and simultaneous. A sequential limit takes one of the indices to infinity first and then the other index. Let $g(N, T)$ be a quantity of interest. A sequential limit with N growing first to infinity is written as $\lim_{T\to\infty}\lim_{N\to\infty} g(N, T)$. This limit may not be the same as that of T growing first to infinity. A pathwise limit restricts (N, T) to increase along a particular trajectory of (N, T). This can be written as $\lim_{v\to\infty} g(N_v, T_v)$, where $(N_v, T_v)$ describes a given trajectory as v varies. If v is chosen to be N, then a pathwise limit is written as $\lim_{N\to\infty} g(N, T(N))$. A simultaneous limit allows (N, T) to increase along all possible paths such that $\lim_{v\to\infty}\min\{N_v, T_v\} = \infty$. This limit is denoted by $\lim_{N,T\to\infty} g(N, T)$. The present paper mainly considers simultaneous limit theory. Note that the existence of a simultaneous limit implies the existence of sequential and pathwise limits (and the limits are the same), but the converse is not true. When $g(N, T)$ is a function of N (or T) alone, its limit with respect to N (or T) is automatically a simultaneous limit. Due to the nature of our problem, this paper also considers simultaneous limits with restrictions. For example, we may need to consider the limit of $g(N, T)$ for N and T satisfying $\sqrt{N}/T \to 0$; that is, $\lim_{v\to\infty} g(N_v, T_v)$ over all paths such that $\lim_{v\to\infty}\min\{N_v, T_v\} = \infty$ and $\lim_{v\to\infty}\sqrt{N_v}/T_v = 0$.

Let $\|A\| = [\operatorname{tr}(A'A)]^{1/2}$ denote the norm of matrix A. Throughout, we let $F_t^0$ be the $r \times 1$ vector of true factors and $\lambda_i^0$ be the true loadings, with $F^0$ and $\Lambda^0$ being the corresponding matrices. The following assumptions are used in Bai and Ng (2002) to estimate the number of factors consistently:

ASSUMPTION A-Factors: $E\|F_t^0\|^4 \le M < \infty$ and $T^{-1}\sum_{t=1}^{T} F_t^0 F_t^{0\prime} \xrightarrow{p} \Sigma_F$ for some $r \times r$ positive definite matrix $\Sigma_F$.

ASSUMPTION B-Factor Loadings: $\|\lambda_i\| \le \bar\lambda < \infty$, and $\|\Lambda^{0\prime}\Lambda^0/N - \Sigma_\Lambda\| \to 0$ for some $r \times r$ positive definite matrix $\Sigma_\Lambda$.

ASSUMPTION C-Time and Cross-Section Dependence and Heteroskedasticity: There exists a positive constant $M < \infty$ such that for all N and T:

1. $E(e_{it}) = 0$, $E|e_{it}|^8 \le M$.
2. $E(e_s'e_t/N) = E(N^{-1}\sum_{i=1}^{N} e_{is}e_{it}) = \gamma_N(s, t)$, $|\gamma_N(s, s)| \le M$ for all s, and $T^{-1}\sum_{s=1}^{T}\sum_{t=1}^{T}|\gamma_N(s, t)| \le M$.
3. $E(e_{it}e_{jt}) = \tau_{ij,t}$ with $|\tau_{ij,t}| \le |\tau_{ij}|$ for some $\tau_{ij}$ and for all t. In addition, $N^{-1}\sum_{i=1}^{N}\sum_{j=1}^{N}|\tau_{ij}| \le M$.
4. $E(e_{it}e_{js}) = \tau_{ij,ts}$ and $(NT)^{-1}\sum_{i=1}^{N}\sum_{j=1}^{N}\sum_{t=1}^{T}\sum_{s=1}^{T}|\tau_{ij,ts}| \le M$.
5. For every (t, s), $E\left|N^{-1/2}\sum_{i=1}^{N}[e_{is}e_{it} - E(e_{is}e_{it})]\right|^4 \le M$.

ASSUMPTION D-Weak Dependence between Factors and Idiosyncratic Errors:

$E\left(\frac{1}{N}\sum_{i=1}^{N}\left\|\frac{1}{\sqrt{T}}\sum_{t=1}^{T} F_t^0 e_{it}\right\|^2\right) \le M.$

Assumption A is more general than that of classical factor analysis, in which the factors $F_t$ are i.i.d. Here we allow $F_t$ to be dynamic such that $A(L)F_t = \varepsilon_t$. However, we do not allow the dynamics to enter into $X_{it}$ directly, so that the relationship between $X_{it}$ and $F_t$ is still static. For more general dynamic factor models, readers are referred to Forni et al. (2000a). Assumption B ensures that each factor has a nontrivial contribution to the variance of $X_t$. We only consider nonrandom factor loadings for simplicity. Our results still hold when the $\lambda_i$'s are random, provided they are independent of the factors and idiosyncratic errors, and $E\|\lambda_i\|^4 \le M$. Assumption C allows for limited time series and cross-section dependence in the idiosyncratic components. Heteroskedasticities in both the time and cross-section dimensions are also allowed. Under stationarity in the time dimension, $\gamma_N(s, t) = \gamma_N(s - t)$, though the condition is not necessary. Given Assumption C1, the remaining assumptions in C are satisfied if the $e_{it}$ are independent for all i and t. Correlation in the idiosyncratic components allows the model to have an approximate factor structure. It is more general than a strict factor model, which assumes $e_{it}$ is uncorrelated across i, the framework on which the APT theory of Ross (1976) was based. Thus, the results to be developed also apply to strict factor models. When the factors and idiosyncratic errors are independent (a standard assumption for conventional factor models), Assumption D is implied by Assumptions A and C. Independence is not required for D to be true. For example, if $e_{it} = \varepsilon_{it}\|F_t^0\|$ with $\varepsilon_{it}$ being independent of $F_t^0$ and $\varepsilon_{it}$ satisfying Assumption C, then Assumption D holds.⁴

⁴ While these assumptions take a similar format as in classical factor analysis, in that assumptions are made on the unobservable quantities such as $F_t$ and $e_t$ (data-generating-process assumptions), it would be better to make assumptions in terms of the observables $X_{it}$. Assumptions characterized by observable variables may have the advantage of, at least in principle, having the assumptions verified by the data. We plan to pursue this in future research.

Chamberlain and Rothschild (1983) defined an approximate factor model as having bounded eigenvalues for the $N \times N$ covariance matrix $\Omega = E(e_t e_t')$. If $e_t$ is stationary with $E(e_{it}e_{jt}) = \tau_{ij}$, then from matrix theory, the largest eigenvalue of $\Omega$ is bounded by $\max_i \sum_{j=1}^{N}|\tau_{ij}|$. Thus if we assume $\sum_{j=1}^{N}|\tau_{ij}| \le M$ for all i and all N, which implies Assumption C3, then (3) will be an approximate factor model in the sense of Chamberlain and Rothschild. Since we also allow for nonstationarity (e.g., heteroskedasticity in the time dimension), our model is more general than approximate factor models.

Throughout this paper, the number of factors (r) is assumed fixed as N and T grow. Many economic problems do suggest this kind of factor structure. For example, the APT theory of Ross (1976) assumes an unknown but fixed r. The Capital Asset Pricing Model of Sharpe (1964) and Lintner (1965) implies one factor in the presence of a risk-free asset and two factors otherwise, irrespective of the number of assets (N). The rank theory of consumer demand systems of Gorman (1981) and Lewbel (1991) implies no more than three factors if consumers are utility maximizing, regardless of the number of consumption goods.

In general, larger data sets may contain more factors. But this may not have any practical consequence, since large data sets allow us to estimate more factors. Suppose one is interested in testing the APT theory for Asian and U.S. financial markets. Because of market segmentation and cultural and political differences, the factors explaining asset returns in Asian markets are different from those in the U.S. markets. Thus, the number of factors is larger if data from the two markets are combined. The total number of factors, although increased, is still fixed (at a new level), regardless of the number of Asian and U.S. assets, if APT theory prevails in both markets. Alternatively, two separate tests of APT can be conducted, one for Asian markets and one for the U.S.; each individual data set is large enough to perform such a test and the number of factors is fixed in each market. The point is that such an increase in r has no practical consequence, and it is covered by the theory. Mathematically, a portion of the cross-section units may have zero factor loadings, which is permitted by the model's assumptions.

It should be pointed out that r can be a strictly increasing function of N (or T). This case is not covered by our theory and is left for future research.

3. ASYMPTOTIC THEORY

Assumptions A-D are sufficient for consistently estimating the number of factors (r) as well as the factors themselves and their loadings. By analyzing the statistical properties of V(k) as a function of k, Bai and Ng (2002) showed that the number of factors (r) can be estimated consistently by minimizing the following criterion:

$IC(k) = \log(V(k)) + k\left(\frac{N+T}{NT}\right)\log\left(\frac{NT}{N+T}\right).$

That is, for a given $\bar k > r$, let $\hat k = \operatorname{argmin}_{0 \le k \le \bar k} IC(k)$. Then $P(\hat k = r) \to 1$ as $T, N \to \infty$. This consistency result does not impose any restriction between N and T, except $\min\{N, T\} \to \infty$. Bai and Ng also showed that the AIC (with penalty 2/T) and the BIC (with penalty $\log T/T$) do not yield consistent estimators. In this paper, we shall assume a known r and focus on the limiting distributions of the estimated factors and factor loadings. Their asymptotic distributions are not affected when the number of factors is unknown and is estimated.⁵ Additional assumptions are needed to derive their limiting distributions:
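As a rough illustration of this selection rule (not the authors' code; the simulated data and the upper bound kmax are assumptions of the sketch), the snippet below computes V(k) by principal components for each candidate k and minimizes the IC(k) criterion displayed above.

```python
import numpy as np

def ic_select(X, kmax):
    """Estimate the number of factors by minimizing
    IC(k) = log(V(k)) + k * ((N+T)/(N*T)) * log(N*T/(N+T))."""
    T, N = X.shape
    penalty = (N + T) / (N * T) * np.log(N * T / (N + T))
    eigval, eigvec = np.linalg.eigh(X @ X.T)
    idx = np.argsort(eigval)[::-1]
    ic = []
    for k in range(1, kmax + 1):
        F = np.sqrt(T) * eigvec[:, idx[:k]]        # T x k estimated factors
        Lam = X.T @ F / T                          # N x k loadings
        resid = X - F @ Lam.T
        V_k = (resid ** 2).sum() / (N * T)         # V(k): average squared residual
        ic.append(np.log(V_k) + k * penalty)
    return int(np.argmin(ic)) + 1                  # candidates start at k = 1 here

# Example: with X generated from an r-factor model, ic_select(X, kmax=8)
# should return r with probability approaching one as N and T grow.
```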

ASSUMPTION E-Weak Dependence: There exists $M < \infty$ such that for all T and N, and for every $t \le T$ and every $i \le N$:

1. $\sum_{s=1}^{T}|\gamma_N(s, t)| \le M$.
2. $\sum_{k=1}^{N}|\tau_{ki}| \le M$.

This assumption strengthens C2 and C3, respectively, and is still reasonable. For example, in the case of independence over time, $\gamma_N(s, t) = 0$ for $s \ne t$. Then Assumption E1 is equivalent to $(1/N)\sum_{i=1}^{N}E(e_{it}^2) \le M$ for all t and N, which is implied by C1. Under cross-section independence, E2 is equivalent to $E(e_{it}^2) \le M$. Thus under time series and cross-section independence, E1 and E2 are equivalent and are implied by C1.

⁵ This follows because $P(\tilde F_t \le x) = P(\tilde F_t \le x, \hat k = r) + P(\tilde F_t \le x, \hat k \ne r)$. But $P(\tilde F_t \le x, \hat k \ne r) \le P(\hat k \ne r) = o(1)$. Thus

$P(\tilde F_t \le x) = P(\tilde F_t \le x, \hat k = r) + o(1) = P(\tilde F_t \le x \mid \hat k = r)P(\hat k = r) + o(1) = P(\tilde F_t \le x \mid \hat k = r) + o(1)$

because $P(\hat k = r) \to 1$. In summary, $P(\tilde F_t \le x) = P(\tilde F_t \le x \mid \hat k = r) + o(1)$.


ASSUMPTION F-Moments and Central Limit Theorem: There exists an $M < \infty$ such that for all N and T:

1. for each t, $E\left\|\frac{1}{\sqrt{NT}}\sum_{s=1}^{T}\sum_{k=1}^{N} F_s^0[e_{ks}e_{kt} - E(e_{ks}e_{kt})]\right\|^2 \le M$;
2. the $r \times r$ matrix satisfies $E\left\|\frac{1}{\sqrt{NT}}\sum_{t=1}^{T}\sum_{k=1}^{N} F_t^0\lambda_k^{0\prime}e_{kt}\right\|^2 \le M$;
3. for each t, as $N \to \infty$, $\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\lambda_i^0 e_{it} \xrightarrow{d} N(0, \Gamma_t)$, where $\Gamma_t = \lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{N}\lambda_i^0\lambda_j^{0\prime}E(e_{it}e_{jt})$;
4. for each i, as $T \to \infty$, $\frac{1}{\sqrt{T}}\sum_{t=1}^{T} F_t^0 e_{it} \xrightarrow{d} N(0, \Phi_i)$, where $\Phi_i = \operatorname{plim}_{T\to\infty}\frac{1}{T}\sum_{s=1}^{T}\sum_{t=1}^{T}E[F_t^0 F_s^{0\prime}e_{is}e_{it}]$.

Assumption F is not stringent because the sums in F1 and F2 involve zero-mean random variables. The last two assumptions are simply central limit theorems, which are satisfied by various mixing processes.

ASSUMPTION G: The eigenvalues of the $r \times r$ matrix $(\Sigma_\Lambda\Sigma_F)$ are distinct.

The matrices $\Sigma_F$ and $\Sigma_\Lambda$ are defined in Assumptions A and B. Assumption G guarantees a unique limit for $(\tilde F'F^0/T)$, which appears in the limiting distributions. Otherwise, its limit can only be determined up to orthogonal transformations. A similar assumption is made in classical factor analysis; see Anderson (1963). Note that this assumption is not needed for determining the number of factors. For example, in Bai and Ng (2002), the number of factors is determined based on the sum of squared residuals V(r), which depends on the projection matrix $P_{\tilde F}$. Projection matrices are invariant to orthogonal transformations. Also, Assumption G is not required for studying the limiting distributions of the estimated common components. The reason is that the common components are identifiable. In the following analysis, we will use the fact that for positive definite matrices A and B, the eigenvalues of AB, BA, and $A^{1/2}BA^{1/2}$ are the same.
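The matrix fact cited at the end of this paragraph is easy to check numerically; the snippet below is only an illustrative verification for randomly drawn positive definite matrices, not part of the paper's argument.

```python
import numpy as np

rng = np.random.default_rng(1)
r = 4
A = rng.standard_normal((r, r)); A = A @ A.T + r * np.eye(r)   # positive definite
B = rng.standard_normal((r, r)); B = B @ B.T + r * np.eye(r)
w, U = np.linalg.eigh(A)
A_half = U @ np.diag(np.sqrt(w)) @ U.T                         # symmetric square root of A
e_AB  = np.sort(np.linalg.eigvals(A @ B).real)
e_BA  = np.sort(np.linalg.eigvals(B @ A).real)
e_sym = np.sort(np.linalg.eigvalsh(A_half @ B @ A_half))
print(np.allclose(e_AB, e_BA), np.allclose(e_AB, e_sym))       # True True
```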

PROPOSITION 1: Under Assumptions A-D and G,

$\operatorname{plim}_{N,T\to\infty}\frac{\tilde F'F^0}{T} = Q.$


The matrix Q is invertible and is given by $Q = V^{1/2}\Upsilon'\Sigma_\Lambda^{-1/2}$, where $V = \operatorname{diag}(v_1, v_2, \dots, v_r)$, $v_1 > v_2 > \cdots > v_r > 0$ are the eigenvalues of $\Sigma_\Lambda^{1/2}\Sigma_F\Sigma_\Lambda^{1/2}$, and $\Upsilon$ is the corresponding eigenvector matrix such that $\Upsilon'\Upsilon = I_r$.

The proof is provided in Appendix A. Under Assumptions A-G, we shall establish asymptotic normality for the principal components estimators. Asymptotic theory for the principal components estimator exists only in the classical framework. For example, Anderson (1963) showed asymptotic normality of the estimated principal components for large T and fixed N. Classical factor analysis always starts with the basic assumption that there exists a root-T consistent and asymptotically normal estimator for the underlying $N \times N$ covariance matrix of $X_t$ (assuming N is fixed). The framework for classical factor analysis does not extend to the situations considered in this paper. This is because consistent estimation of the covariance matrix of $X_t$ is not a well-defined problem when N and T increase simultaneously to infinity. Thus, our analysis is necessarily different from the classical approach.

3.1. Limiting Distribution of Estimated Factors

As noted earlier, $F^0$ and $\Lambda^0$ are not separately identifiable. However, they can be estimated up to an invertible $r \times r$ matrix transformation. As shown in the Appendix, for the principal components estimator $\tilde F$, there exists an invertible matrix H (whose dependence on N, T will be suppressed for notational simplicity) such that $\tilde F$ is an estimator of $F^0H$ and $\tilde\Lambda$ is an estimator of $\Lambda^0(H')^{-1}$. In addition, $\tilde F\tilde\Lambda'$ is an estimator of $F^0\Lambda^{0\prime}$, the common components. It is clear that the common components are identifiable. Furthermore, knowing $F^0H$ is as good as knowing $F^0$ for many purposes. For example, in regression analysis, using $F^0$ as the regressor will give the same predicted value as using $F^0H$ as the regressor. Because $F^0$ and $F^0H$ span the same space, testing the significance of $F^0$ in a regression model containing $F^0$ as regressors is the same as testing the significance of $F^0H$. For the same reason, the portfolio-evaluation measurements of Connor and Korajczyk (1986) give valid results whether $F^0$ or $F^0H$ is used.
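The invariance claims in this paragraph follow because $F^0$ and $F^0H$ span the same column space when H is invertible. The snippet below is an illustrative numerical check (simulated data, hypothetical names only), confirming that least squares fitted values are identical whether $F^0$ or $F^0H$ is used as the regressor.

```python
import numpy as np

rng = np.random.default_rng(2)
T, r = 200, 3
F0 = rng.standard_normal((T, r))
H = rng.standard_normal((r, r)) + 3 * np.eye(r)      # an (almost surely) invertible transformation
y = F0 @ np.array([1.0, -2.0, 0.5]) + rng.standard_normal(T)

def fitted(Z, y):
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return Z @ beta

print(np.allclose(fitted(F0, y), fitted(F0 @ H, y)))  # True: identical predicted values
```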

THEOREM 1: Under Assumptions A-G, as $N, T \to \infty$, we have:
(i) if $\sqrt{N}/T \to 0$, then for each t,

$\sqrt{N}(\tilde F_t - H'F_t^0) = V_{NT}^{-1}\left(\frac{\tilde F'F^0}{T}\right)\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\lambda_i^0 e_{it} + o_p(1) \xrightarrow{d} N\left(0,\ V^{-1}Q\Gamma_t Q'V^{-1}\right),$

where $V_{NT}$ is a diagonal matrix consisting of the first r eigenvalues of $(1/(NT))XX'$ in decreasing order, V and Q are defined in Proposition 1, and $\Gamma_t$ is defined in F3;
(ii) if $\liminf \sqrt{N}/T \ge \tau > 0$, then

$T(\tilde F_t - H'F_t^0) = O_p(1).$


The dominant case is part (i); that is, asymptotic normality generally holds. Applied researchers should feel comfortable using the normal approximation for the estimated factors. Part (ii) is useful for theoretical purposes when a convergence rate is needed. The theorem says that the convergence rate is $\min\{\sqrt{N}, T\}$. When the factor loadings $\lambda_i^0$ $(i = 1, 2, \dots, N)$ are all known, $F_t^0$ can be estimated by the cross-section least squares method and the rate of convergence is $\sqrt{N}$. The rate of convergence $\min\{\sqrt{N}, T\}$ reflects the fact that the factor loadings are unknown and are estimated. Under stronger assumptions, however, the root-N convergence rate is still achievable (see Section 4). Asymptotic normality in Theorem 1 is achieved by the central limit theorem as $N \to \infty$. Thus large N is required for this theorem.

Additional comments:

1. Although restrictions between N and T are needed, the theorem is not a sequential limit result but a simultaneous one. In addition, the theorem holds not only for a particular relationship between N and T, but also for many combinations of N and T. The restriction $\sqrt{N}/T \to 0$ is not strong. Thus asymptotic normality is the more prevalent situation for empirical applications. The result permits simultaneous inference for large N and large T. For example, the portfolio-performance measurement of Connor and Korajczyk (1986) can be obtained without recourse to a sequential limit argument. See footnote 3.

2. The rate of convergence implied by this theorem is useful in regression analysis or in a forecasting equation involving estimated regressors such as

$y_{t+1} = \alpha'F_t^0 + \beta'W_t + u_{t+1}$   $(t = 1, 2, \dots, T),$

where $y_t$ and $W_t$ are observable, but $F_t^0$ is not. However, $F_t^0$ can be replaced by $\tilde F_t$. A crude calculation shows that the estimation error in $\tilde F_t$ can be ignored as long as $\tilde F_t = H'F_t^0 + o_p(T^{-1/2})$ with H having full rank. This will be true if $T/N \to 0$ by Theorem 1 because $\tilde F_t - H'F_t^0 = O_p(1/\min\{\sqrt{N}, T\})$. A more careful but elementary calculation shows that the estimation error is negligible if $\tilde F'(\tilde F - F^0H)/T = o_p(T^{-1/2})$. Lemma B.3 in the Appendix shows that $\tilde F'(\tilde F - F^0H)/T = O_p(\delta_{NT}^{-2})$, where $\delta_{NT} = \min\{\sqrt{N}, \sqrt{T}\}$. This implies that the estimation error in $\tilde F$ is negligible if $\sqrt{T}/N \to 0$. Thus for large N, $F_t^0$ can be treated as known. Stock and Watson (1998, 1999) considered such a framework of forecasting. Given the rate of convergence for $\tilde F_t$, it is easy to show that the forecast $\hat y_{T+1|T} = \hat\alpha'\tilde F_T + \hat\beta'W_T$ is $\sqrt{T}$-consistent for the conditional mean $E(y_{T+1}\mid F_T, W_T)$, assuming the conditional mean of $u_{T+1}$ is zero. Furthermore, in constructing confidence intervals for the forecast, the estimation effect of $\tilde F_T$ can be ignored provided that N is large. If $\sqrt{T}/N \to 0$ does not hold, then the limiting distribution of the forecast will also depend on the limiting distribution of $\tilde F_T$; the confidence interval must reflect this estimation effect. Theorem 1 allows us to account for this effect when constructing confidence intervals for the forecast.
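To connect this remark to practice, the following sketch (an illustration under simulated data, not the Stock and Watson implementation) builds a simple diffusion-index forecast: factors are extracted from the panel X by principal components and the forecasting equation is then estimated by least squares with $\tilde F_t$ in place of $F_t^0$ (the observable regressor $W_t$ is omitted for brevity).

```python
import numpy as np

def pc_factors(X, r):
    """sqrt(T) times the eigenvectors of XX' for the r largest eigenvalues."""
    T = X.shape[0]
    w, v = np.linalg.eigh(X @ X.T)
    return np.sqrt(T) * v[:, np.argsort(w)[::-1][:r]]

rng = np.random.default_rng(3)
T, N, r = 120, 200, 1
F0 = rng.standard_normal((T, r))
X = F0 @ rng.standard_normal((N, r)).T + rng.standard_normal((T, N))
y = 0.8 * F0[:, 0] + 0.3 * rng.standard_normal(T)        # target driven by the factor

F_hat = pc_factors(X, r)
# Regress y_{t+1} on a constant and the estimated factor F_hat_t
Z = np.column_stack([np.ones(T - 1), F_hat[:-1, :]])
coef, *_ = np.linalg.lstsq(Z, y[1:], rcond=None)
forecast = np.array([1.0, *F_hat[-1, :]]) @ coef          # y_hat_{T+1|T}
print(forecast)
```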


3. The covariance matrix of the limiting distribution depends on the correlation structure in the cross-section dimension. If $e_{it}$ is independent across i, then

(4)   $\Gamma_t = \lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N} E(e_{it}^2)\,\lambda_i^0\lambda_i^{0\prime}.$

If, in addition, $E(e_{it}^2) = \sigma_t^2$ for all i, we have $\Gamma_t = \sigma_t^2\Sigma_\Lambda$. In Section 5, we discuss consistent estimation of $Q\Gamma_t Q'$.

Next, we present a uniform consistency result for the estimated factors.

PROPOSITION 2: Under Assumptions A-E,

$\max_{1\le t\le T}\left\|\tilde F_t - H'F_t^0\right\| = O_p(T^{-1/2}) + O_p\left((T/N)^{1/2}\right).$

This proposition gives an upper bound on the maximum deviation of the estimated factors from the true ones (up to a transformation). The bound is not the sharpest possible because the proof essentially uses the argument that $\max_t\|\tilde F_t - H'F_t^0\|^2 \le \sum_{t=1}^{T}\|\tilde F_t - H'F_t^0\|^2$. Note that if $\liminf N/T^2 \ge c > 0$, then the maximum deviation is $O_p(T^{-1/2})$, which is a very strong result. In addition, if it is assumed that $\max_{1\le s\le T}\|F_s^0\| = O_p(\alpha_T)$ (e.g., when $F_t$ is strictly stationary and its moment generating function exists, $\alpha_T = \log T$), then it can be shown that, for $\delta_{NT} = \min\{\sqrt{N}, \sqrt{T}\}$,

$\max_{1\le t\le T}\left\|\tilde F_t - H'F_t^0\right\| = O_p(T^{-1/2}\delta_{NT}^{-1}) + O_p(\alpha_T T^{-1}) + O_p\left((T/N)^{1/2}\right).$

3.2. Limiting Distribution of Estimated Factor Loadings

The previous section shows that $\tilde F$ is an estimate of $F^0H$. Now we show that $\tilde\Lambda$ is an estimate of $\Lambda^0(H')^{-1}$. That is, $\tilde\lambda_i$ is an estimate of $H^{-1}\lambda_i^0$ for every i. The estimated loadings are used in Lehmann and Modest (1988) to construct various portfolios.

THEOREM 2: Under Assumptions A-G, as $N, T \to \infty$:
(i) if $\sqrt{T}/N \to 0$, then for each i,

$\sqrt{T}(\tilde\lambda_i - H^{-1}\lambda_i^0) = V_{NT}^{-1}\left(\frac{\tilde F'F^0}{T}\right)\left(\frac{\Lambda^{0\prime}\Lambda^0}{N}\right)\frac{1}{\sqrt{T}}\sum_{t=1}^{T} F_t^0 e_{it} + o_p(1) \xrightarrow{d} N\left(0,\ (Q')^{-1}\Phi_i Q^{-1}\right),$

where $V_{NT}$ is defined in Theorem 1, Q is given in Proposition 1, and $\Phi_i$ in F4;
(ii) if $\liminf \sqrt{T}/N \ge \tau > 0$, then

$N(\tilde\lambda_i - H^{-1}\lambda_i^0) = O_p(1).$


The dominant case is part (i), asymptotic normality. Part (ii) is of theoretical interest when a convergence rate is needed. This rate is $\min\{\sqrt{T}, N\}$. When the factors $F_t^0$ $(t = 1, 2, \dots, T)$ are all observable, $\lambda_i^0$ can be estimated by a time series regression with the ith cross-section unit, and the rate of convergence is $\sqrt{T}$. The new rate $\min\{\sqrt{T}, N\}$ is due to the fact that the $F_t^0$'s are not observable and are estimated.

3.3. Limiting Distribution of Estimated Common Components

The limit theory of the estimated common components can be derived from the previous two theorems. Note that $C_{it}^0 = \lambda_i^{0\prime}F_t^0$ and $\tilde C_{it} = \tilde\lambda_i'\tilde F_t$.

THEOREM 3: Under Assumptions A-F, as $N, T \to \infty$, we have, for each i and t,

$\left(\frac{1}{N}V_{it} + \frac{1}{T}W_{it}\right)^{-1/2}(\tilde C_{it} - C_{it}^0) \xrightarrow{d} N(0, 1),$

where $V_{it} = \lambda_i^{0\prime}\Sigma_\Lambda^{-1}\Gamma_t\Sigma_\Lambda^{-1}\lambda_i^0$, $W_{it} = F_t^{0\prime}\Sigma_F^{-1}\Phi_i\Sigma_F^{-1}F_t^0$, and $\Sigma_\Lambda$, $\Gamma_t$, $\Sigma_F$, and $\Phi_i$ are all defined earlier. Both $V_{it}$ and $W_{it}$ can be replaced by their consistent estimators.

A remarkable feature of Theorem 3 is that no restriction on the relationship between N and T is required. The estimated common components are always asymptotically normal. The convergence rate is $\delta_{NT} = \min\{\sqrt{N}, \sqrt{T}\}$. To see this, Theorem 3 can be rewritten as

(5)   $\frac{\delta_{NT}(\tilde C_{it} - C_{it}^0)}{\left(\frac{\delta_{NT}^2}{N}V_{it} + \frac{\delta_{NT}^2}{T}W_{it}\right)^{1/2}} \xrightarrow{d} N(0, 1).$

The denominator is bounded both above and below. Thus the rate of convergence is $\min\{\sqrt{N}, \sqrt{T}\}$, which is the best rate possible. When $F^0$ is observable, the best rate for $\tilde\lambda_i$ is $\sqrt{T}$. When $\Lambda^0$ is observable, the best rate for $\tilde F_t$ is $\sqrt{N}$. It follows that when both are estimated, the best rate for $\tilde\lambda_i'\tilde F_t$ is the minimum of $\sqrt{N}$ and $\sqrt{T}$. Theorem 3 has two special cases: (a) if $N/T \to 0$, then $\sqrt{N}(\tilde C_{it} - C_{it}^0) \xrightarrow{d} N(0, V_{it})$; (b) if $T/N \to 0$, then $\sqrt{T}(\tilde C_{it} - C_{it}^0) \xrightarrow{d} N(0, W_{it})$. But Theorem 3 does not require a limit for T/N or N/T.

4. STATIONARY IDIOSYNCRATIC ERRORS

In the previous section, the rate of convergence for $\tilde F_t$ is shown to be $\min\{\sqrt{N}, T\}$. If T is fixed, this implies that $\tilde F_t$ is not consistent. The result seems to be in conflict with that of Connor and Korajczyk (1986), who showed that the estimator $\tilde F_t$ is consistent under fixed T. This inquiry leads us to the discovery of a necessary and sufficient condition for consistency under fixed T. Connor and Korajczyk imposed the following assumption:

(6)   $\frac{1}{N}\sum_{i=1}^{N} e_{it}e_{is} \xrightarrow{p} 0,\ \ t \ne s, \qquad\text{and}\qquad \frac{1}{N}\sum_{i=1}^{N} e_{it}^2 \xrightarrow{p} \sigma^2 \ \text{ for all } t, \ \text{as } N \to \infty.$

We shall call the first condition asymptotic orthogonality and the second condition asymptotic homoskedasticity ($\sigma^2$ not depending on t). They established consistency under assumption (6). This assumption appears to be reasonable for asset returns and is commonly used in the finance literature, e.g., Campbell, Lo, and MacKinlay (1997). For many economic variables, however, one of the conditions could be easily violated. We show that assumption (6) is also necessary under fixed T.

THEOREM 4: Assume Assumptions A-G hold. Under a fixed T, a necessary and sufficient condition for consistency is asymptotic orthogonality and asymptotic homoskedasticity.

The implication is that, for fixed T, consistent estimation is not possible in the presence of serial correlation and heteroskedasticity. In contrast, under large T, we can still obtain consistent estimation. This result highlights the importance of the large-dimensional framework.

Next, we show that our previous results can also be strengthened under homoskedasticity and no serial correlation.

ASSUMPTION H: $E(e_{it}e_{is}) = 0$ if $t \ne s$, $E(e_{it}^2) = \sigma_i^2$, and $E(e_{it}e_{jt}) = \tau_{ij}$, for all t, i, and j.

Let $\bar\sigma_N^2 = (1/N)\sum_{i=1}^{N}\sigma_i^2$, which is a bounded sequence by Assumption C2. Let $V_{NT}$ be the diagonal matrix consisting of the first r largest eigenvalues of the matrix $(1/(TN))XX'$. Lemma A.3 (Appendix A) shows $V_{NT} \xrightarrow{p} V$, a positive definite matrix. Define $D_{NT} = V_{NT}(V_{NT} - \bar\sigma_N^2 I_r)^{-1}$; then $D_{NT} \xrightarrow{p} I_r$, as T and N go to infinity. Define $\tilde H = HD_{NT}$.

THEOREM 5: Under Assumptions A-H, as $T, N \to \infty$, we have

$\sqrt{N}(\tilde F_t - \tilde H'F_t^0) \xrightarrow{d} N\left(0,\ V^{-1}Q\Gamma Q'V^{-1}\right),$

where $\Gamma = \operatorname{plim}(\Lambda^{0\prime}\Omega\Lambda^0/N)$ and $\Omega = E(e_t e_t') = (\tau_{ij})$.

Note that cross-section correlation and cross-section heteroskedasticity are still allowed. Thus the result is for approximate factor models. This theorem does not require any restriction on the relationship between N and T except that they both go to infinity. The rate of convergence ($\sqrt{N}$) holds even for fixed T, but the limiting distribution is different.


If cross-section independence and cross-section homoskedasticity are assumed, then Theorem 2 part (i) also holds without any restriction on the relationship between N and T. However, cross-section homoskedasticity is unduly restrictive. Assumption H does not improve the result of Theorem 3, which already offers the best rate of convergence.

5. ESTIMATING COVARIANCE MATRICES

In this section, we derive consistent estimators of the asymptotic variance-covariance matrices that appear in Theorems 1-3.

(a) Covariance matrix of estimated factors. This covariance matrix depends on the cross-section correlation of the idiosyncratic errors. Because the order of cross-sectional correlation is unknown, a HAC-type estimator (see Newey and West (1987)) is not feasible. Thus we will assume cross-section independence for $e_{it}$ $(i = 1, 2, \dots, N)$. The asymptotic covariance of $\tilde F_t$ is given by $\Pi_t = V^{-1}Q\Gamma_t Q'V^{-1}$, where $\Gamma_t$ is defined in (4). That is,

$\Pi_t = \operatorname{plim}\ V_{NT}^{-1}\left(\frac{\tilde F'F^0}{T}\right)\left(\frac{1}{N}\sum_{i=1}^{N} E(e_{it}^2)\,\lambda_i^0\lambda_i^{0\prime}\right)\left(\frac{F^{0\prime}\tilde F}{T}\right)V_{NT}^{-1}.$

This matrix involves the product $F^0\Lambda^{0\prime}$, which can be replaced by its estimate $\tilde F\tilde\Lambda'$. A consistent estimator of the covariance matrix is then given by

(7)   $\tilde\Pi_t = V_{NT}^{-1}\left(\frac{\tilde F'\tilde F}{T}\right)\left(\frac{1}{N}\sum_{i=1}^{N}\tilde e_{it}^2\,\tilde\lambda_i\tilde\lambda_i'\right)\left(\frac{\tilde F'\tilde F}{T}\right)V_{NT}^{-1},$

where $\tilde e_{it} = X_{it} - \tilde\lambda_i'\tilde F_t$. Note that $\tilde F'\tilde F/T = I$.

(b) Covariance matrix of estimated factor loadings. The asymptotic covariance matrix of $\tilde\lambda_i$ is given by (see Theorem 2)

$\Theta_i = (Q')^{-1}\Phi_i Q^{-1}.$

Let $\tilde\Theta_i$ be the HAC estimator of Newey and West (1987), constructed with the series $\{\tilde F_t\tilde e_{it}\}$ $(t = 1, 2, \dots, T)$. That is,

$\tilde\Theta_i = \tilde D_{0,i} + \sum_{v=1}^{q}\left(1 - \frac{v}{q+1}\right)\left(\tilde D_{v,i} + \tilde D_{v,i}'\right),$

where $\tilde D_{v,i} = \frac{1}{T}\sum_{t=v+1}^{T}\tilde F_t\tilde e_{it}\,\tilde e_{i,t-v}\tilde F_{t-v}'$ and q goes to infinity as T goes to infinity with $q/T^{1/4} \to 0$. One can also use other HAC estimators such as Andrews' (1991) data-dependent method with a quadratic spectral kernel. While a HAC estimator based on $F_t^0 e_{it}$ (the true factors and true idiosyncratic errors) estimates $\Phi_i$, a HAC estimator based on $\tilde F_t\tilde e_{it}$ is directly estimating $\Theta_i$ (because $\tilde F_t$ estimates $H'F_t^0$). The consistency of $\tilde\Theta_i$ is proved in the Appendix.
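A minimal sketch of the Newey-West construction just described, applied to the series $\{\tilde F_t\tilde e_{it}\}$, is given below. It is an illustration of the displayed formula only; the truncation lag q and the example inputs are assumptions of the sketch, not values from the paper.

```python
import numpy as np

def newey_west_theta(F_tilde, e_i, q):
    """HAC estimate built from Z_t = F_tilde_t * e_it (a T x r series):
    Theta = D_0 + sum_{v=1}^{q} (1 - v/(q+1)) (D_v + D_v'),
    with D_v = (1/T) sum_{t=v+1}^{T} Z_t Z_{t-v}'."""
    Z = F_tilde * e_i[:, None]                 # T x r
    T = Z.shape[0]
    theta = Z.T @ Z / T                        # D_0
    for v in range(1, q + 1):
        D_v = Z[v:].T @ Z[:-v] / T
        theta += (1.0 - v / (q + 1.0)) * (D_v + D_v.T)
    return theta

# Example usage (hypothetical inputs): q chosen so that q / T**0.25 stays small.
# theta_i = newey_west_theta(F_tilde, X[:, i] - F_tilde @ Lam_tilde[i], q=4)
```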


(c) Covariance matrix of estimated common components. Let

$\tilde V_{it} = \tilde\lambda_i'\left(\frac{\tilde\Lambda'\tilde\Lambda}{N}\right)^{-1}\left(\frac{1}{N}\sum_{j=1}^{N}\tilde e_{jt}^2\,\tilde\lambda_j\tilde\lambda_j'\right)\left(\frac{\tilde\Lambda'\tilde\Lambda}{N}\right)^{-1}\tilde\lambda_i,$

$\tilde W_{it} = \tilde F_t'\left(\frac{\tilde F'\tilde F}{T}\right)^{-1}\tilde\Theta_i\left(\frac{\tilde F'\tilde F}{T}\right)^{-1}\tilde F_t$

be the estimators of $V_{it}$ and $W_{it}$, respectively.

THEOREM 6: Assume Assumptions A-G and cross-sectional independence. As $T, N \to \infty$, $\tilde\Pi_t$, $\tilde\Theta_i$, $\tilde V_{it}$, and $\tilde W_{it}$ are consistent for $\Pi_t$, $\Theta_i$, $V_{it}$, and $W_{it}$, respectively.

One major point of this theorem is that all limiting covariances are easily estimable.
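To illustrate how the estimators of this section combine into a standard error for the common component, the sketch below computes $\tilde V_{it}$, $\tilde W_{it}$, and a 95% interval for $\tilde C_{it}$. It assumes cross-sectional independence as in part (a) and, for brevity, uses only the zero-lag term of the HAC estimator for $\tilde\Theta_i$ (appropriate when the errors are serially uncorrelated); these simplifications are assumptions of the sketch, not of the paper.

```python
import numpy as np

def common_component_ci(X, F_tilde, Lam_tilde, i, t, level=1.96):
    """Standard error and CI for C_tilde_{it}:
    V_it = lam_i'(Lam'Lam/N)^{-1} Gamma_t (Lam'Lam/N)^{-1} lam_i,
    W_it = F_t' Theta_i F_t   (using F'F/T = I and a zero-lag HAC for Theta_i)."""
    T, N = X.shape
    e = X - F_tilde @ Lam_tilde.T                                   # T x N residuals
    Le = Lam_tilde * e[t][:, None]                                  # N x r
    Gamma_t = Le.T @ Le / N                                         # (1/N) sum e_jt^2 lam_j lam_j'
    SigL_inv = np.linalg.inv(Lam_tilde.T @ Lam_tilde / N)
    V_it = Lam_tilde[i] @ SigL_inv @ Gamma_t @ SigL_inv @ Lam_tilde[i]
    Z = F_tilde * e[:, i][:, None]                                  # T x r series F_t e_it
    Theta_i = Z.T @ Z / T                                           # zero-lag HAC term only
    W_it = F_tilde[t] @ Theta_i @ F_tilde[t]
    C_it = F_tilde[t] @ Lam_tilde[i]
    se = np.sqrt(V_it / N + W_it / T)
    return C_it - level * se, C_it + level * se
```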

6. MONTE CARLO SIMULATIONS

We use simulations to assess the adequacy of the asymptotic results in approximating the finite sample distributions of $\tilde F_t$, $\tilde\lambda_i$, and $\tilde C_{it}$. To conserve space, we only report the results for $\tilde F_t$ and $\tilde C_{it}$ because $\tilde\lambda_i$ shares similar properties to $\tilde F_t$. We consider combinations of T = 50, 100 and N = 25, 50, 100, 1000. Data are generated according to $X_{it} = \lambda_i^0 F_t^0 + e_{it}$, where $\lambda_i^0$, $F_t^0$, and $e_{it}$ are i.i.d. N(0, 1) for all i and t. The number of factors, r, is one. This gives the data matrix X ($T \times N$). The estimated factor $\tilde F$ is $\sqrt{T}$ times the eigenvector corresponding to the largest eigenvalue of $XX'$. Given $\tilde F$, we have $\tilde\Lambda = X'\tilde F/T$ and $\tilde C = \tilde F\tilde\Lambda'$. The reported results are based on 2000 repetitions.
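The design just described is easy to replicate. The sketch below (illustrative only; it uses fewer repetitions than the 2000 in the paper) records the average correlation between the estimated and true factors, the quantity reported in Table I below.

```python
import numpy as np

def one_rep(T, N, rng):
    F0 = rng.standard_normal(T)
    lam0 = rng.standard_normal(N)
    X = np.outer(F0, lam0) + rng.standard_normal((T, N))
    w, v = np.linalg.eigh(X @ X.T)
    F_tilde = np.sqrt(T) * v[:, np.argmax(w)]              # r = 1
    return abs(np.corrcoef(F_tilde, F0)[0, 1])             # |corr|: the sign of F_tilde is arbitrary

rng = np.random.default_rng(4)
reps = 200                                                  # 2000 in the paper; fewer here for speed
for T in (50, 100):
    for N in (25, 50, 100, 1000):
        rho = np.mean([one_rep(T, N, rng) for _ in range(reps)])
        print(f"T={T:4d} N={N:5d} average correlation {rho:.4f}")
```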

To demonstrate that $\tilde F_t$ is estimating a transformation of $F_t^0$ $(t = 1, 2, \dots, T)$, we compute the correlation coefficient between $\{\tilde F_t\}_{t=1}^{T}$ and $\{F_t^0\}_{t=1}^{T}$. Let $\rho(\ell, N, T)$ be the correlation coefficient for the $\ell$th repetition with given (N, T). Table I reports this coefficient averaged over L = 2000 repetitions; that is, $(1/L)\sum_{\ell=1}^{L}\rho(\ell, N, T)$.

TABLE I
AVERAGE CORRELATION COEFFICIENTS BETWEEN $\{\tilde F_t\}_{t=1}^{T}$ AND $\{F_t^0\}_{t=1}^{T}$

            N = 25    N = 50    N = 100   N = 1000
  T = 50    0.9777    0.9892    0.9947    0.9995
  T = 100   0.9785    0.9896    0.9948    0.9995

The correlation coefficient can be considered as a measure of consistency for all t. It is clear from the table that the estimation precision increases as N grows. With N = 1000, the estimated factors can be effectively treated as the true ones.

We next consider asymptotic distributions. The main asymptotic results are

(8)   $\sqrt{N}(\tilde F_t - H'F_t^0) \xrightarrow{d} N(0, \Pi_t),$

where $\Pi_t = V^{-1}Q\Gamma_t Q'V^{-1}$, and

$\left(\frac{1}{N}V_{it} + \frac{1}{T}W_{it}\right)^{-1/2}(\tilde C_{it} - C_{it}^0) \xrightarrow{d} N(0, 1),$

where $V_{it}$ and $W_{it}$ are defined in Theorem 3. We define standardized estimates as

$\tilde f_t = \tilde\Pi_t^{-1/2}\sqrt{N}\,(\tilde F_t - H'F_t^0) \qquad\text{and}\qquad \tilde c_{it} = \left(\frac{1}{N}\tilde V_{it} + \frac{1}{T}\tilde W_{it}\right)^{-1/2}(\tilde C_{it} - C_{it}^0),$

where $\tilde\Pi_t$, $\tilde V_{it}$, and $\tilde W_{it}$ are the estimated asymptotic variances given in Section 5. That is, the standardization is based on the theoretical mean and theoretical variance rather than the sample mean and sample variance from Monte Carlo repetitions. The standardized estimates should be approximately N(0, 1) if the asymptotic theory is adequate.

We next compute the sample mean and sample standard deviation (std) from Monte Carlo repetitions of the standardized estimates. For example, for $\tilde f_t$, the sample mean is defined as $\bar f_t = (1/L)\sum_{\ell=1}^{L}\tilde f_t(\ell)$ and the sample standard deviation is the square root of $(1/L)\sum_{\ell=1}^{L}(\tilde f_t(\ell) - \bar f_t)^2$, where $\ell$ refers to the $\ell$th repetition. The sample mean and sample standard deviation for $\tilde c_{it}$ are similarly defined. We only report the statistics for t = [T/2] and i = [N/2], where [a] is the largest integer smaller than or equal to a; other values of (t, i) give similar results and thus are not reported.

The first four rows of Table II are for $\tilde f_t$ and the last four rows are for $\tilde c_{it}$. In general, the sample means are close to zero and the standard deviations are close to 1. Larger T generates better results than smaller T. If we use the two-sided normal critical value (1.96) to test the hypothesis that $\tilde f_t$ has a zero mean with known variance of 1, then the null hypothesis is rejected (marginally) for just two cases: (T, N) = (50, 1000) and (T, N) = (100, 50). But if the estimated standard deviation is used, the two-sided t statistic does not reject the zero-mean hypothesis in any case. As for the common components, $\tilde c_{it}$, all cases except N = 25 strongly point to zero mean and unit variance. This is consistent with the asymptotic theory.

TABLE II
SAMPLE MEAN AND STANDARD DEVIATION OF $\tilde f_t$ AND $\tilde c_{it}$

                              N = 25    N = 50    N = 100   N = 1000
  $\tilde f_t$    T = 50  mean   0.0235   -0.0189    0.0021   -0.0447
                  T = 100 mean   0.0231    0.0454   -0.0196    0.0186
  $\tilde f_t$    T = 50  std    1.2942    1.2062    1.1469    1.2524
                  T = 100 std    1.2521    1.1369    1.0831    1.0726
  $\tilde c_{it}$ T = 50  mean  -0.0455   -0.0080   -0.0029   -0.0036
                  T = 100 mean   0.0252    0.0315    0.0052    0.0347
  $\tilde c_{it}$ T = 50  std    1.4079    1.1560    1.0932    1.0671
                  T = 100 std    1.1875    1.0690    1.0529    1.0402


We also present graphically the above standardized estimates $\tilde f_t$ and $\tilde c_{it}$. Figure 1 displays the histograms of $\tilde f_t$ for T = 50 and Figure 2 for T = 100. The histogram is scaled to be a density function; the sum of the bar heights times the bar lengths is equal to 1. Scaling makes for a meaningful comparison with the standard normal density. The latter is overlaid on the histogram. Figure 3 and Figure 4 show the histograms of $\tilde c_{it}$ for T = 50 and T = 100, respectively, with the density of N(0, 1) again overlaid.

It appears that the asymptotic theory provides a very good approximation to the finite sample distributions. For a given T, the larger is N, the better is the approximation. In addition, $\tilde F_t$ does appear to be $\sqrt{N}$ consistent because the histograms are for the estimates $\sqrt{N}(\tilde F_t - H'F_t^0)$ (divided by the estimated standard deviation), and they stay within the same range as N grows from 25 to 1000. Similarly, the convergence rate for $\tilde C_{it}$ is $\min\{\sqrt{N}, \sqrt{T}\}$ for the same reason. In general, the limited Monte Carlo simulations lend support to the theory.

FIGURE 1.- Histogram of estimated factors (T = 50). These histograms are for the standardized estimates $\tilde f_t$ (i.e., $\sqrt{N}(\tilde F_t - H'F_t^0)$ divided by the estimated asymptotic standard deviation), for N = 25, 50, 100, 1000. The standard normal density function is superimposed on the histograms.


FIGURE 2.- Histogram of estimated factors (T = 100). These histograms are for the standardized estimates $\tilde f_t$ (i.e., $\sqrt{N}(\tilde F_t - H'F_t^0)$ divided by the estimated asymptotic standard deviation), for N = 25, 50, 100, 1000. The standard normal density function is superimposed on the histograms.

Finally, we construct confidence intervals for the true factor process. This is useful because the factors represent economic indices in various empirical applications. A graphical method is used to illustrate the results. For this purpose, no Monte Carlo repetition is necessary; only one draw (a single data set X ($T \times N$)) is needed. The case of r = 1 is considered for simplicity. A small T (T = 20) is used to avoid a crowded graphical display (for large T, the resulting graphs are difficult to read because of too many points). The values for N are 25, 50, 100, 1000, respectively. Since $\tilde F_t$ is an estimate of $H'F_t^0$, the 95% confidence interval for $H'F_t^0$, according to (8), is

$\left(\tilde F_t - 1.96\,\tilde\Pi_t^{1/2}N^{-1/2},\ \tilde F_t + 1.96\,\tilde\Pi_t^{1/2}N^{-1/2}\right) \qquad (t = 1, 2, \dots, T).$

FIGURE 3.- Histogram of estimated common components (T = 50). These histograms are for the standardized estimates $\tilde c_{it}$ (i.e., $\min\{\sqrt{N}, \sqrt{T}\}(\tilde C_{it} - C_{it}^0)$ divided by the estimated asymptotic standard deviation), for N = 25, 50, 100, 1000. The standard normal density function is superimposed on the histograms.

Because $F^0$ is known in a simulated environment, it is better to transform the confidence intervals to those of $F_t^0$ (rather than $H'F_t^0$) and see if the resulting confidence intervals contain the true $F_t^0$. We can easily rotate $\tilde F$ toward $F^0$ by the regression⁶ $F^0 = \tilde F\beta + \text{error}$. Let $\hat\beta$ be the least squares estimate of $\beta$. Then the 95% confidence interval for $F_t^0$ is

$(L_t, U_t) = \left(\hat\beta\,\tilde F_t - 1.96\,\hat\beta\,\tilde\Pi_t^{1/2}N^{-1/2},\ \ \hat\beta\,\tilde F_t + 1.96\,\hat\beta\,\tilde\Pi_t^{1/2}N^{-1/2}\right) \qquad (t = 1, 2, \dots, T).$
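The construction of $(L_t, U_t)$ can be summarized in a few lines of code. The sketch below is illustrative only: it simulates one data set with r = 1, computes $\tilde\Pi_t$ with the cross-sectional-independence estimator of Section 5, rotates $\tilde F$ toward the simulated $F^0$ by least squares, and forms the intervals (the absolute value of $\hat\beta$ is used so that $L_t \le U_t$).

```python
import numpy as np

rng = np.random.default_rng(5)
T, N = 20, 100
F0 = rng.standard_normal(T)
lam0 = rng.standard_normal(N)
X = np.outer(F0, lam0) + rng.standard_normal((T, N))

# Principal components estimates (r = 1)
w, v = np.linalg.eigh(X @ X.T)
order = np.argmax(w)
F_tilde = np.sqrt(T) * v[:, order]
V_NT = w[order] / (T * N)                              # largest eigenvalue of XX'/(TN)
lam_tilde = X.T @ F_tilde / T
e = X - np.outer(F_tilde, lam_tilde)

# Pi_tilde_t from Section 5(a) (scalar since r = 1, using F'F/T = 1), then rotate toward F0
Pi_t = np.array([np.mean(e[t] ** 2 * lam_tilde ** 2) for t in range(T)]) / V_NT ** 2
beta = (F_tilde @ F0) / (F_tilde @ F_tilde)            # least squares of F0 on F_tilde
half_width = 1.96 * abs(beta) * np.sqrt(Pi_t) / np.sqrt(N)
L, U = beta * F_tilde - half_width, beta * F_tilde + half_width
print(np.mean((L <= F0) & (F0 <= U)))                  # coverage of the true factor path
```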

FIGURE 4.- Histogram of estimated common components (T = 100). These histograms are for the standardized estimates $\tilde c_{it}$ (i.e., $\min\{\sqrt{N}, \sqrt{T}\}(\tilde C_{it} - C_{it}^0)$ divided by the estimated asymptotic standard deviation), for N = 25, 50, 100, 1000. The standard normal density function is superimposed on the histograms.

Figure 5 displays the confidence intervals $(L_t, U_t)$ and the true factors $F_t^0$ $(t = 1, 2, \dots, T)$. The middle curve is $F_t^0$. It is clear that the larger is N, the narrower are the confidence intervals. In most cases, the confidence intervals contain the true value of $F_t^0$. For N = 1000, the confidence intervals collapse to the true values. This implies that there exists a transformation of $\tilde F$ such that it is almost identical to $F^0$ when N is large.

FIGURE 5.- Confidence intervals for $F_t^0$ $(t = 1, \dots, T)$. These are the 95% confidence intervals (dashed lines) for the true factor process $F_t^0$ $(t = 1, 2, \dots, 20)$ when the number of cross sections (N) varies from 25 to 1000. The middle curve (solid line) represents the true factor process.

⁶ In classical factor analysis, rotating the estimated factors is an important part of the analysis. This particular rotation (regression) is not feasible in practice because $F^0$ is not observable. The idea here is to show that $\tilde F$ is an estimate of a transformation of $F^0$ and that the confidence interval before the rotation is for the transformed variable $F^0H$.

7. CONCLUDING REMARKS

This paper studies factor models under a nonstandard setting: large cross sections and a large time dimension. Such large-dimensional factor models have received increasing attention in the recent economic literature. This paper considers estimating the model by the principal components method, which is feasible and straightforward to implement. We derive some inferential theory concerning the estimators, including rates of convergence and limiting distribu- tions. In contrast to classical factor analysis, we are able to estimate consistently both the factors and their loadings, not just the latter. In addition, our results are obtained under very general conditions that allow for cross-sectional and serial

This content downloaded from 111.68.96.209 on Thu, 10 Sep 2015 06:43:00 UTCAll use subject to JSTOR Terms and Conditions

Page 24: inferential theoy.pdf

FACTOR MODELS OF LARGE DIMENSIONS 157

4 4

0- 0~~~~~~~~~~~~~

-3 -3. 0 5 10 15 20 0 5 10 15 20

2 -2 -2

I' 1

0 5 10 15 20 0 5 10 15 20 T=20, N=100 T=20,N N1000

FIGURE 5.- Confidence intervals for $F_t^0$ $(t = 1, \ldots, T)$. These are the 95% confidence intervals (dashed lines) for the true factor process $F_t^0$ $(t = 1, 2, \ldots, 20)$ when the number of cross sections ($N$) varies from 25 to 1000. The middle curve (solid line) represents the true factor process.

We also identify a necessary and sufficient condition for consistency under fixed T.

Many issues remain to be investigated. The first is the quality of the approximation that the asymptotic distributions provide to the finite sample distributions. Although our small Monte Carlo simulations show adequate approximation, a thorough investigation is still needed to assess the asymptotic theory. In particular, it is useful to document under what conditions the asymptotic theory would fail and what the characteristics of the finite sample distributions are. The task is then to develop an alternative asymptotic theory that can better capture the known properties of the finite sample distributions. This approach to asymptotic analysis was used by Bekker (1994) to derive a new asymptotic theory for instrumental-variable estimators, where the number of instruments increases with the sample size; also see Hahn and Kuersteiner (2000). This framework can be useful for factor analysis because the data matrix has two growing dimensions and some of our asymptotic analysis restricts the way in which N and T can grow. In addition,


Bekker's approach might be useful when the number of factors (r) also increases with N and T. Asymptotic theory for an increasing r is not examined in this paper and remains to be studied.

Another fruitful area of research would be empirical applications of the theoretical results derived in this paper. Our results show that it is not necessary to divide a large sample into small subsamples in order to conform to a fixed T requirement, as is done in the existing literature. A large time dimension delivers consistent estimates even under heteroskedasticity and serial correlation. In contrast, consistent estimation is not guaranteed under fixed T. Many existing applications that employed classical factor models can be reexamined using new data sets with large dimensions. It would also be interesting to examine common cycles and co-movements in the world economy along the lines of Forni et al. (2000b) and Gregory and Head (1999).

Recently, Granger (2001) and others called for large-model analysis to be at the forefront of the econometrics research agenda. We believe that large-model analysis will become increasingly important. Data sets will naturally expand as data collection, storage, and dissemination become more efficient and less costly. Also, the increasing interconnectedness of the world economy means that variables across different countries may have tighter linkages than ever before. These facts, in combination with modern computing power, make large-model analysis more pertinent and feasible. We hope that the research of this paper will encourage further developments in the analysis of large models.

Department of Economics, New York University, 269 Mercer St., 7th Floor, New York, NY 10003, U.S.A.; [email protected]

Manuscript received February, 2001; final revision received May, 2002.

APPENDIX A: PROOF OF THEOREM 1

As defined in the main text, let $V_{NT}$ be the $r\times r$ diagonal matrix of the first $r$ largest eigenvalues of $(1/TN)XX'$ in decreasing order. By the definition of eigenvectors and eigenvalues, we have $(1/TN)XX'\tilde F = \tilde FV_{NT}$, or $(1/TN)XX'\tilde FV_{NT}^{-1} = \tilde F$. Let $H = (\Lambda^{0\prime}\Lambda^0/N)(F^{0\prime}\tilde F/T)V_{NT}^{-1}$, an $r\times r$ matrix, and let $\delta_{NT} = \min\{\sqrt N,\sqrt T\}$. Assumptions A and B together with $\tilde F'\tilde F/T = I$ and Lemma A.3 below imply that $\|H\| = O_p(1)$. Theorem 1 is based on the following identity (also see Bai and Ng (2002)):

$$(A.1)\qquad \tilde F_t - H'F_t^0 = V_{NT}^{-1}\Bigl(\frac{1}{T}\sum_{s=1}^T\tilde F_s\gamma_N(s,t)+\frac{1}{T}\sum_{s=1}^T\tilde F_s\zeta_{st}+\frac{1}{T}\sum_{s=1}^T\tilde F_s\eta_{st}+\frac{1}{T}\sum_{s=1}^T\tilde F_s\xi_{st}\Bigr),$$

where

$$(A.2)\qquad \zeta_{st} = \frac{e_s'e_t}{N}-\gamma_N(s,t),\qquad \eta_{st} = F_s^{0\prime}\Lambda^{0\prime}e_t/N,\qquad \xi_{st} = F_t^{0\prime}\Lambda^{0\prime}e_s/N.$$
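As a numerical companion to these definitions (not part of the proof), the sketch below simulates a small panel under an assumed i.i.d. normal design of my own, computes $\tilde F$, $V_{NT}$, and $H$ exactly as defined above, and reports $T^{-1}\sum_t\|\tilde F_t - H'F_t^0\|^2$, which Lemma A.1 below asserts is $O_p(\delta_{NT}^{-2})$.

```python
import numpy as np

def avg_sq_dev(T, N, r=2, seed=0):
    """Average of ||F_tilde_t - H'F0_t||^2 for one simulated panel (assumed i.i.d. normal DGP)."""
    rng = np.random.default_rng(seed)
    F0 = rng.standard_normal((T, r))
    L0 = rng.standard_normal((N, r))
    X = F0 @ L0.T + rng.standard_normal((T, N))

    # Principal components estimator and the r largest eigenvalues of XX'/(TN)
    eigval, eigvec = np.linalg.eigh(X @ X.T / (T * N))
    order = np.argsort(eigval)[::-1][:r]
    V_NT = np.diag(eigval[order])
    F_tilde = np.sqrt(T) * eigvec[:, order]

    # H = (L0'L0/N)(F0'F_tilde/T) V_NT^{-1}, as defined above
    H = (L0.T @ L0 / N) @ (F0.T @ F_tilde / T) @ np.linalg.inv(V_NT)
    dev = F_tilde - F0 @ H               # row t is (F_tilde_t - H'F0_t)'
    return np.mean(np.sum(dev**2, axis=1))

for T, N in [(50, 50), (100, 100), (200, 400), (400, 800)]:
    print(T, N, round(avg_sq_dev(T, N), 4))
```

The reported averages should shrink roughly in proportion to $1/\min\{N,T\}$ as the two dimensions grow, which is the content of Lemma A.1.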

To analyze each term above, we need the following lemma.

This content downloaded from 111.68.96.209 on Thu, 10 Sep 2015 06:43:00 UTCAll use subject to JSTOR Terms and Conditions

Page 26: inferential theoy.pdf

FACTOR MODELS OF LARGE DIMENSIONS 159

LEMMA A.1: Under Assumptions A-D,

$$\delta_{NT}^2\Bigl(\frac{1}{T}\sum_{t=1}^T\bigl\|\tilde F_t - H'F_t^0\bigr\|^2\Bigr) = O_p(1).$$

PROOF: See Theorem 1 of Bai and Ng (2002). Note that they used $\|V_{NT}\tilde F_t - V_{NT}H'F_t^0\|^2$ as the summand. Because $V_{NT}$ converges to a positive definite matrix (see Lemma A.3 below), it follows that $\|V_{NT}^{-1}\| = O_p(1)$ and the lemma is implied by Theorem 1 of Bai and Ng. Q.E.D.

LEMMA A.2: Under Assumptions A-F, we have

(a) $T^{-1}\sum_{s=1}^T\tilde F_s\gamma_N(s,t) = O_p\bigl(\frac{1}{\sqrt T\,\delta_{NT}}\bigr)$;

(b) $T^{-1}\sum_{s=1}^T\tilde F_s\zeta_{st} = O_p\bigl(\frac{1}{\sqrt N\,\delta_{NT}}\bigr)$;

(c) $T^{-1}\sum_{s=1}^T\tilde F_s\eta_{st} = O_p\bigl(\frac{1}{\sqrt N}\bigr)$;

(d) $T^{-1}\sum_{s=1}^T\tilde F_s\xi_{st} = O_p\bigl(\frac{1}{\sqrt N\,\delta_{NT}}\bigr)$.

PROOF: Consider part (a). By adding and subtracting terms,

$$T^{-1}\sum_{s=1}^T\tilde F_s\gamma_N(s,t) = T^{-1}\sum_{s=1}^T(\tilde F_s-H'F_s^0+H'F_s^0)\gamma_N(s,t) = T^{-1}\sum_{s=1}^T(\tilde F_s-H'F_s^0)\gamma_N(s,t)+H'T^{-1}\sum_{s=1}^T F_s^0\gamma_N(s,t).$$

Now $(1/T)\sum_{s=1}^T F_s^0\gamma_N(s,t) = O_p(1/T)$ since

$$\sum_{s=1}^T E\bigl\|F_s^0\gamma_N(s,t)\bigr\| \le \Bigl(\max_s E\|F_s^0\|\Bigr)\sum_{s=1}^T|\gamma_N(s,t)| \le M^{1+1/4}$$

by Assumptions A and E1. Consider the first term:

$$\Bigl\|T^{-1}\sum_{s=1}^T(\tilde F_s-H'F_s^0)\gamma_N(s,t)\Bigr\| \le \Bigl(T^{-1}\sum_{s=1}^T\|\tilde F_s-H'F_s^0\|^2\Bigr)^{1/2}\Bigl(T^{-1}\sum_{s=1}^T|\gamma_N(s,t)|^2\Bigr)^{1/2},$$

which is

$$O_p\Bigl(\frac{1}{\delta_{NT}}\Bigr)\cdot O\Bigl(\frac{1}{\sqrt T}\Bigr) = O_p\Bigl(\frac{1}{\sqrt T\,\delta_{NT}}\Bigr)$$

by Lemma A.1 and Assumption E1. Consider part (b):

$$T^{-1}\sum_{s=1}^T\tilde F_s\zeta_{st} = T^{-1}\sum_{s=1}^T(\tilde F_s-H'F_s^0)\zeta_{st}+H'T^{-1}\sum_{s=1}^T F_s^0\zeta_{st}.$$

For the first term,

$$\Bigl\|T^{-1}\sum_{s=1}^T(\tilde F_s-H'F_s^0)\zeta_{st}\Bigr\| \le \Bigl(T^{-1}\sum_{s=1}^T\|\tilde F_s-H'F_s^0\|^2\Bigr)^{1/2}\Bigl(T^{-1}\sum_{s=1}^T\zeta_{st}^2\Bigr)^{1/2}.$$

Furthermore,

$$T^{-1}\sum_{s=1}^T\zeta_{st}^2 = T^{-1}\sum_{s=1}^T\Bigl[\frac{e_s'e_t}{N}-\gamma_N(s,t)\Bigr]^2 = T^{-1}\sum_{s=1}^T\Bigl[N^{-1}\sum_{i=1}^N\bigl(e_{is}e_{it}-E(e_{is}e_{it})\bigr)\Bigr]^2 = O_p\Bigl(\frac{1}{N}\Bigr).$$


Thus the first term is

$$O_p\Bigl(\frac{1}{\delta_{NT}}\Bigr)\,O_p\Bigl(\frac{1}{\sqrt N}\Bigr).$$

Next,

$$T^{-1}\sum_{s=1}^T F_s^0\zeta_{st} = \frac{1}{\sqrt{TN}}\Bigl[T^{-1/2}\sum_{s=1}^T F_s^0\,\frac{1}{\sqrt N}\sum_{i=1}^N\bigl(e_{is}e_{it}-E(e_{is}e_{it})\bigr)\Bigr] = O_p\Bigl(\frac{1}{\sqrt{TN}}\Bigr)$$

by Assumption F1. Thus

$$T^{-1}\sum_{s=1}^T\tilde F_s\zeta_{st} = O_p\Bigl(\frac{1}{\delta_{NT}}\Bigr)O_p\Bigl(\frac{1}{\sqrt N}\Bigr)+O_p\Bigl(\frac{1}{\sqrt{TN}}\Bigr) = O_p\Bigl(\frac{1}{\sqrt N\,\delta_{NT}}\Bigr).$$

Consider part (c):

$$T^{-1}\sum_{s=1}^T\tilde F_s\eta_{st} = T^{-1}\sum_{s=1}^T(\tilde F_s-H'F_s^0)\eta_{st}+H'T^{-1}\sum_{s=1}^T F_s^0\eta_{st}.$$

Note that $T^{-1}\sum_{s=1}^T F_s^0\eta_{st} = \bigl(T^{-1}\sum_{s=1}^T F_s^0F_s^{0\prime}\bigr)\frac{1}{N}\sum_{k=1}^N\lambda_k^0e_{kt} = O_p\bigl(\frac{1}{\sqrt N}\bigr)$. The first term satisfies

$$\Bigl\|T^{-1}\sum_{s=1}^T(\tilde F_s-H'F_s^0)\eta_{st}\Bigr\| \le \Bigl(T^{-1}\sum_{s=1}^T\|\tilde F_s-H'F_s^0\|^2\Bigr)^{1/2}\Bigl(T^{-1}\sum_{s=1}^T\eta_{st}^2\Bigr)^{1/2}.$$

The first expression is $O_p(1/\delta_{NT})$ by Lemma A.1. For the second expression,

$$T^{-1}\sum_{s=1}^T\eta_{st}^2 = T^{-1}\sum_{s=1}^T\bigl(F_s^{0\prime}\Lambda^{0\prime}e_t/N\bigr)^2 \le \frac{1}{N}\,\bigl\|\Lambda^{0\prime}e_t/\sqrt N\bigr\|^2\,\frac{1}{T}\sum_{s=1}^T\|F_s^0\|^2 = O_p\Bigl(\frac{1}{N}\Bigr),$$

since $(1/T)\sum_{s=1}^T\|F_s^0\|^2 = O_p(1)$ and $\|\Lambda^{0\prime}e_t/\sqrt N\|^2 = O_p(1)$. Thus, (c) is $O_p(1/\sqrt N)$. Finally, for part (d),

$$T^{-1}\sum_{s=1}^T\tilde F_s\xi_{st} = T^{-1}\sum_{s=1}^T\tilde F_sF_t^{0\prime}\Lambda^{0\prime}e_s/N = \Bigl(T^{-1}\sum_{s=1}^T(\tilde F_s-H'F_s^0)\,e_s'\Lambda^0/N\Bigr)F_t^0+\Bigl(T^{-1}\sum_{s=1}^T H'F_s^0e_s'\Lambda^0/N\Bigr)F_t^0.$$

Consider the first term:

$$\Bigl\|\Bigl(T^{-1}\sum_{s=1}^T(\tilde F_s-H'F_s^0)e_s'\Lambda^0/N\Bigr)F_t^0\Bigr\| \le \frac{1}{\sqrt N}\Bigl(T^{-1}\sum_{s=1}^T\|\tilde F_s-H'F_s^0\|^2\Bigr)^{1/2}\Bigl(T^{-1}\sum_{s=1}^T\bigl\|e_s'\Lambda^0/\sqrt N\bigr\|^2\Bigr)^{1/2}\|F_t^0\| = O_p\Bigl(\frac{1}{\delta_{NT}}\Bigr)O_p(1)O_p\Bigl(\frac{1}{\sqrt N}\Bigr) = O_p\Bigl(\frac{1}{\sqrt N\,\delta_{NT}}\Bigr),$$

following arguments analogous to those of part (c). For the second term of (d),

$$\frac{1}{NT}\sum_{s=1}^T F_s^0e_s'\Lambda^0 = \frac{1}{\sqrt{NT}}\Bigl(\frac{1}{\sqrt{TN}}\sum_{s=1}^T\sum_{k=1}^N F_s^0\lambda_k^{0\prime}e_{ks}\Bigr) = O_p\Bigl(\frac{1}{\sqrt{NT}}\Bigr)$$

by Assumption F2. Thus $\frac{1}{NT}\sum_{s=1}^T H'F_s^0e_s'\Lambda^0F_t^0 = O_p(1/\sqrt{NT})$ and (d) is $O_p(1/(\sqrt N\,\delta_{NT}))$. The proof of Lemma A.2 is complete. Q.E.D.


LEMMA A.3: Assume Assumptions A-D hold. As $T, N\to\infty$:

(i) $T^{-1}\tilde F'\bigl(\frac{1}{TN}XX'\bigr)\tilde F = V_{NT}\xrightarrow{p}V$;

(ii) $\bigl(\frac{\tilde F'F^0}{T}\bigr)\bigl(\frac{\Lambda^{0\prime}\Lambda^0}{N}\bigr)\bigl(\frac{F^{0\prime}\tilde F}{T}\bigr)\xrightarrow{p}V$,

where $V$ is the diagonal matrix consisting of the eigenvalues of $\Sigma_\Lambda\Sigma_F$.

PROOF: From $(1/TN)XX'\tilde F = \tilde FV_{NT}$ and $\tilde F'\tilde F/T = I$, we have $T^{-1}\tilde F'(1/TN)XX'\tilde F = V_{NT}$. The rest is implicitly proved by Stock and Watson (1999). The details are omitted. Q.E.D.

Because $V$ is positive definite, the lemma says that $F^{0\prime}\tilde F/T$ is of full rank for all large $T$ and $N$ and thus is invertible. We also note that Lemma A.3(ii) shows that a quadratic form of $F^{0\prime}\tilde F/T$ has a limit. But this does not guarantee that $F^{0\prime}\tilde F/T$ itself has a limit unless Assumption G is made. In what follows, an eigenvector matrix of $W$ refers to the matrix whose columns are the eigenvectors of $W$ with unit length, the $i$th column corresponding to the $i$th largest eigenvalue.

PROOF OF PROPOSITION 1: Multiply the identity $(1/TN)XX'\tilde F = \tilde FV_{NT}$ on both sides by $T^{-1}(\Lambda^{0\prime}\Lambda^0/N)^{1/2}F^{0\prime}$ to obtain

$$\Bigl(\frac{\Lambda^{0\prime}\Lambda^0}{N}\Bigr)^{1/2}T^{-1}F^{0\prime}\Bigl(\frac{XX'}{TN}\Bigr)\tilde F = \Bigl(\frac{\Lambda^{0\prime}\Lambda^0}{N}\Bigr)^{1/2}\Bigl(\frac{F^{0\prime}\tilde F}{T}\Bigr)V_{NT}.$$

Expanding $XX'$ with $X = F^0\Lambda^{0\prime}+e$, we can rewrite the above as

$$(A.3)\qquad \Bigl(\frac{\Lambda^{0\prime}\Lambda^0}{N}\Bigr)^{1/2}\Bigl(\frac{F^{0\prime}F^0}{T}\Bigr)\Bigl(\frac{\Lambda^{0\prime}\Lambda^0}{N}\Bigr)\Bigl(\frac{F^{0\prime}\tilde F}{T}\Bigr)+d_{NT} = \Bigl(\frac{\Lambda^{0\prime}\Lambda^0}{N}\Bigr)^{1/2}\Bigl(\frac{F^{0\prime}\tilde F}{T}\Bigr)V_{NT},$$

where

$$d_{NT} = \Bigl(\frac{\Lambda^{0\prime}\Lambda^0}{N}\Bigr)^{1/2}\Bigl[\Bigl(\frac{F^{0\prime}F^0}{T}\Bigr)\frac{\Lambda^{0\prime}e'\tilde F}{TN}+\frac{F^{0\prime}e\Lambda^0}{TN}\,\frac{F^{0\prime}\tilde F}{T}+\frac{F^{0\prime}ee'\tilde F}{T^2N}\Bigr] = o_p(1).$$

The $o_p(1)$ is implied by Lemma A.2. Let

$$B_{NT} = \Bigl(\frac{\Lambda^{0\prime}\Lambda^0}{N}\Bigr)^{1/2}\Bigl(\frac{F^{0\prime}F^0}{T}\Bigr)\Bigl(\frac{\Lambda^{0\prime}\Lambda^0}{N}\Bigr)^{1/2}$$

and

$$(A.4)\qquad R_{NT} = \Bigl(\frac{\Lambda^{0\prime}\Lambda^0}{N}\Bigr)^{1/2}\Bigl(\frac{F^{0\prime}\tilde F}{T}\Bigr);$$

then we can rewrite (A.3) as

$$[B_{NT}+d_{NT}R_{NT}^{-1}]R_{NT} = R_{NT}V_{NT}.$$

Thus each column of $R_{NT}$, though not of length 1, is an eigenvector of the matrix $[B_{NT}+d_{NT}R_{NT}^{-1}]$. Let $V^*_{NT}$ be the diagonal matrix consisting of the diagonal elements of $R'_{NT}R_{NT}$. Denote $\Upsilon_{NT} = R_{NT}V^{*-1/2}_{NT}$, so that each column of $\Upsilon_{NT}$ has unit length, and we have

$$[B_{NT}+d_{NT}R_{NT}^{-1}]\Upsilon_{NT} = \Upsilon_{NT}V_{NT}.$$

Thus $\Upsilon_{NT}$ is the eigenvector matrix of $[B_{NT}+d_{NT}R_{NT}^{-1}]$. Note that $B_{NT}+d_{NT}R_{NT}^{-1}$ converges to $B = \Sigma_\Lambda^{1/2}\Sigma_F\Sigma_\Lambda^{1/2}$ by Assumptions A and B and $d_{NT} = o_p(1)$. Because the eigenvalues of $B$ are distinct by Assumption G, the eigenvalues of $B_{NT}+d_{NT}R_{NT}^{-1}$ will also be distinct for large $N$ and large $T$ by the


continuity of eigenvalues. This implies that the eigenvector matrix of $B_{NT}+d_{NT}R_{NT}^{-1}$ is unique except that each column can be replaced by the negative of itself. In addition, the $k$th column of $R_{NT}$ (see (A.4)) depends on $\tilde F$ only through the $k$th column of $\tilde F$ $(k = 1, 2, \ldots, r)$. Thus the sign of each column in $R_{NT}$, and hence in $\Upsilon_{NT} = R_{NT}V^{*-1/2}_{NT}$, is implicitly determined by the sign of each column in $\tilde F$. Thus, given the column signs of $\tilde F$, $\Upsilon_{NT}$ is uniquely determined. By eigenvector perturbation theory (which requires the distinctness of the eigenvalues; see Franklin (1968)), there exists a unique eigenvector matrix $\Upsilon$ of $B = \Sigma_\Lambda^{1/2}\Sigma_F\Sigma_\Lambda^{1/2}$ such that $\|\Upsilon_{NT}-\Upsilon\| = o_p(1)$. From

$$\frac{F^{0\prime}\tilde F}{T} = \Bigl(\frac{\Lambda^{0\prime}\Lambda^0}{N}\Bigr)^{-1/2}\Upsilon_{NT}V^{*1/2}_{NT},$$

we have

$$\frac{F^{0\prime}\tilde F}{T}\xrightarrow{p}\Sigma_\Lambda^{-1/2}\Upsilon V^{1/2}$$

by Assumption B and by $V^*_{NT}\xrightarrow{p}V$ in view of Lemma A.3(ii). Q.E.D.

PROOF OF THEOREM 1: Case 1: $\sqrt N/T\to 0$. By (A.1) and Lemma A.2, we have

$$(A.5)\qquad \tilde F_t - H'F_t^0 = O_p\Bigl(\frac{1}{\sqrt T\,\delta_{NT}}\Bigr)+O_p\Bigl(\frac{1}{\sqrt N\,\delta_{NT}}\Bigr)+O_p\Bigl(\frac{1}{\sqrt N}\Bigr)+O_p\Bigl(\frac{1}{\sqrt N\,\delta_{NT}}\Bigr).$$

The limiting distribution is determined by the third term on the right-hand side of (A.1) because it is the dominant term. Using the definition of $\eta_{st}$,

$$(A.6)\qquad \sqrt N(\tilde F_t - H'F_t^0) = V_{NT}^{-1}\Bigl(T^{-1}\sum_{s=1}^T\tilde F_sF_s^{0\prime}\Bigr)\frac{1}{\sqrt N}\sum_{i=1}^N\lambda_i^0e_{it}+o_p(1).$$

Now $\frac{1}{\sqrt N}\sum_{i=1}^N\lambda_i^0e_{it}\xrightarrow{d}N(0,\Gamma_t)$ by Assumption F3. Together with Proposition 1 and Lemma A.3, we have $\sqrt N(\tilde F_t - H'F_t^0)\xrightarrow{d}N(0,\,V^{-1}Q\Gamma_tQ'V^{-1})$, as stated.

Case 2: If $\liminf\sqrt N/T\ge\tau>0$, then the first and the third terms of (A.1) are the dominant terms. We have

$$T(\tilde F_t - H'F_t^0) = O_p(1)+O_p(T/\sqrt N) = O_p(1)$$

in view of $\limsup T/\sqrt N\le 1/\tau<\infty$. Q.E.D.

PROOF OF PROPOSITION 2: We consider each term on the right-hand side of (A.1). For the first term,

$$(A.7)\qquad \max_t\Bigl\|T^{-1}\sum_{s=1}^T\tilde F_s\gamma_N(s,t)\Bigr\| \le T^{-1/2}\Bigl(T^{-1}\sum_{s=1}^T\|\tilde F_s\|^2\Bigr)^{1/2}\max_t\Bigl(\sum_{s=1}^T\gamma_N(s,t)^2\Bigr)^{1/2}.$$

The above is $O_p(T^{-1/2})$, which follows from $T^{-1}\sum_{s=1}^T\|\tilde F_s\|^2 = O_p(1)$ and $\sum_{s=1}^T\gamma_N(s,t)^2\le M_1$ for some $M_1<\infty$ uniformly in $t$. The remaining three terms of (A.1) are each $O_p((T/N)^{1/2})$ uniformly in $t$. To see this, let $v_t = T^{-1}\sum_{s=1}^T\tilde F_s\zeta_{st}$. It suffices to show that $\max_t\|v_t\|^2 = O_p(T/N)$. But Bai and Ng (2002) proved that $\sum_{t=1}^T\|v_t\|^2 = O_p(T/N)$ (they used the notation $b_t$ instead of $\|v_t\|^2$). Bai and Ng also obtained the same result for the third and the fourth terms. Q.E.D.


APPENDIX B: PROOF OF THEOREM 2

To prove Theorem 2, we need some preliminary results.

LEMMA B.1: Under Assumptions A-F, we have

$$T^{-1}(\tilde F-F^0H)'e_i = O_p\Bigl(\frac{1}{\delta_{NT}^2}\Bigr).$$

PROOF: From the identity (A.1), we have

$$T^{-1}\sum_{t=1}^T(\tilde F_t-H'F_t^0)e_{it} = V_{NT}^{-1}\Bigl[T^{-2}\sum_{t=1}^T\sum_{s=1}^T\tilde F_s\gamma_N(s,t)e_{it}+T^{-2}\sum_{t=1}^T\sum_{s=1}^T\tilde F_s\zeta_{st}e_{it}+T^{-2}\sum_{t=1}^T\sum_{s=1}^T\tilde F_s\eta_{st}e_{it}+T^{-2}\sum_{t=1}^T\sum_{s=1}^T\tilde F_s\xi_{st}e_{it}\Bigr]$$

$$= V_{NT}^{-1}\,(I+II+III+IV).$$

We begin with I, which can be rewritten as

$$I = T^{-2}\sum_{t=1}^T\sum_{s=1}^T(\tilde F_s-H'F_s^0)\gamma_N(s,t)e_{it}+T^{-2}H'\sum_{t=1}^T\sum_{s=1}^T F_s^0\gamma_N(s,t)e_{it}.$$

The first term is bounded by

$$T^{-1/2}\Bigl(T^{-1}\sum_{s=1}^T\|\tilde F_s-H'F_s^0\|^2\Bigr)^{1/2}\Bigl(T^{-1}\sum_{s=1}^T\sum_{t=1}^T|\gamma_N(s,t)|^2\,T^{-1}\sum_{t=1}^T e_{it}^2\Bigr)^{1/2} = \frac{1}{\sqrt T\,\delta_{NT}}\,O_p(1),$$

where the $O_p(1)$ follows from $Ee_{it}^2\le M$ and $T^{-1}\sum_{s=1}^T\sum_{t=1}^T|\gamma_N(s,t)|^2\le M$ by Lemma 1(i) of Bai and Ng (2002). The expected value of the second term of I is bounded by (ignoring $H$)

$$T^{-2}\sum_{t=1}^T\sum_{s=1}^T|\gamma_N(s,t)|\bigl(E\|F_s^0\|^2\bigr)^{1/2}\bigl(Ee_{it}^2\bigr)^{1/2}\le MT^{-1}\Bigl(T^{-1}\sum_{t=1}^T\sum_{s=1}^T|\gamma_N(s,t)|\Bigr) = O(T^{-1})$$

by Assumption C2. Thus I is $O_p(T^{-1/2}\delta_{NT}^{-1})$. For II, we rewrite it as

$$II = T^{-2}\sum_{t=1}^T\sum_{s=1}^T(\tilde F_s-H'F_s^0)\zeta_{st}e_{it}+T^{-2}H'\sum_{t=1}^T\sum_{s=1}^T F_s^0\zeta_{st}e_{it}.$$

The second term is $O_p(1/\sqrt{NT})$ by Assumption F1. To see this, the second term can be written as $\frac{1}{\sqrt{NT}}\bigl(\frac{1}{T}\sum_{t=1}^T z_te_{it}\bigr)$ with $z_t = \frac{1}{\sqrt{NT}}\sum_{s=1}^T\sum_{k=1}^N F_s^0\bigl[e_{ks}e_{kt}-E(e_{ks}e_{kt})\bigr]$. By F1, $E\|z_t\|^2\le M$. Thus $E\|z_te_{it}\|\le\bigl(E\|z_t\|^2Ee_{it}^2\bigr)^{1/2}\le M$. This implies $(1/T)\sum_{t=1}^T z_te_{it} = O_p(1)$. For the first term, we have

$$\Bigl\|T^{-2}\sum_{t=1}^T\sum_{s=1}^T(\tilde F_s-H'F_s^0)\zeta_{st}e_{it}\Bigr\| \le \Bigl(T^{-1}\sum_{s=1}^T\|\tilde F_s-H'F_s^0\|^2\Bigr)^{1/2}\Bigl(T^{-1}\sum_{s=1}^T\Bigl(T^{-1}\sum_{t=1}^T\zeta_{st}e_{it}\Bigr)^2\Bigr)^{1/2}.$$

But

$$T^{-1}\sum_{s=1}^T\Bigl(T^{-1}\sum_{t=1}^T\zeta_{st}e_{it}\Bigr)^2 = T^{-1}\sum_{s=1}^T\Bigl(T^{-1}\sum_{t=1}^T\frac{1}{N}\sum_{k=1}^N\bigl[e_{ks}e_{kt}-E(e_{ks}e_{kt})\bigr]e_{it}\Bigr)^2 = O_p\Bigl(\frac{1}{N}\Bigr).$$


So the first term is

$$O_p\Bigl(\frac{1}{\delta_{NT}}\Bigr)\cdot O_p\Bigl(\frac{1}{\sqrt N}\Bigr) = O_p\Bigl(\frac{1}{\sqrt N\,\delta_{NT}}\Bigr).$$

Thus $II = O_p\bigl(1/(\sqrt N\,\delta_{NT})\bigr)$.

For III, we rewrite it as

$$III = T^{-2}\sum_{t=1}^T\sum_{s=1}^T(\tilde F_s-H'F_s^0)\eta_{st}e_{it}+T^{-2}\sum_{t=1}^T\sum_{s=1}^T H'F_s^0\eta_{st}e_{it}.$$

The first term is bounded by

$$\Bigl(T^{-1}\sum_{s=1}^T\|\tilde F_s-H'F_s^0\|^2\Bigr)^{1/2}\Bigl(T^{-1}\sum_{s=1}^T\Bigl(T^{-1}\sum_{t=1}^T\eta_{st}e_{it}\Bigr)^2\Bigr)^{1/2} = O_p\Bigl(\frac{1}{\delta_{NT}}\Bigr)O_p\Bigl(\frac{1}{\sqrt N}\Bigr)$$

because

$$T^{-1}\sum_{t=1}^T\eta_{st}e_{it} = F_s^{0\prime}\,\frac{1}{\sqrt N}\Bigl(\frac{1}{T}\sum_{t=1}^T\frac{1}{\sqrt N}\sum_{k=1}^N\lambda_k^0e_{kt}e_{it}\Bigr),$$

which is $O_p(N^{-1/2})$. The second term of III can be written as

$$(B.1)\qquad H'\Bigl(\frac{1}{T}\sum_{s=1}^T F_s^0F_s^{0\prime}\Bigr)\Bigl(\frac{1}{TN}\sum_{t=1}^T\sum_{k=1}^N\lambda_k^0e_{kt}e_{it}\Bigr),$$

which is $O_p((NT)^{-1/2})$ if cross-sectional independence holds for the $e$'s. Under weak cross-sectional dependence as in Assumption E2, the above is $O_p((NT)^{-1/2})+O_p(N^{-1})$. This follows from writing $e_{kt}e_{it} = [e_{kt}e_{it}-\tau_{ki,t}]+\tau_{ki,t}$, where $\tau_{ki,t} = E(e_{kt}e_{it})$. We have $(1/TN)\sum_{t=1}^T\sum_{k=1}^N|\tau_{ki,t}|\le(1/N)\sum_{k=1}^N\tau_{ki} = O(N^{-1})$ by E2, where $|\tau_{ki,t}|\le\tau_{ki}$. In summary, III is $O_p(1/(\sqrt N\,\delta_{NT}))+O_p(N^{-1})$. The proof of IV is similar to that of III. Thus

$$I+II+III+IV = O_p\Bigl(\frac{1}{\sqrt T\,\delta_{NT}}\Bigr)+O_p\Bigl(\frac{1}{\sqrt N\,\delta_{NT}}\Bigr)+O_p\Bigl(\frac{1}{N}\Bigr) = O_p\Bigl(\frac{1}{\delta_{NT}^2}\Bigr).\qquad\text{Q.E.D.}$$

LEMMA B.2: Under Assumptions A-F, the $r\times r$ matrix $T^{-1}(\tilde F-F^0H)'F^0 = O_p(\delta_{NT}^{-2})$.

PROOF: Using the identity (A.1), we have

$$T^{-1}\sum_{t=1}^T(\tilde F_t-H'F_t^0)F_t^{0\prime} = V_{NT}^{-1}\Bigl[T^{-2}\sum_{t=1}^T\sum_{s=1}^T\tilde F_s\gamma_N(s,t)F_t^{0\prime}+T^{-2}\sum_{t=1}^T\sum_{s=1}^T\tilde F_s\zeta_{st}F_t^{0\prime}+T^{-2}\sum_{t=1}^T\sum_{s=1}^T\tilde F_s\eta_{st}F_t^{0\prime}+T^{-2}\sum_{t=1}^T\sum_{s=1}^T\tilde F_s\xi_{st}F_t^{0\prime}\Bigr]$$

$$= V_{NT}^{-1}\,(I+II+III+IV).$$

Term I is $O_p(T^{-1/2}\delta_{NT}^{-1})$. The proof is the same as that for I of Lemma B.1. Next,

$$II = T^{-2}\sum_{t=1}^T\sum_{s=1}^T(\tilde F_s-H'F_s^0)F_t^{0\prime}\zeta_{st}+T^{-2}\sum_{t=1}^T\sum_{s=1}^T H'F_s^0F_t^{0\prime}\zeta_{st}.$$


The first term is $O_p(1/(\sqrt N\,\delta_{NT}))$ following arguments analogous to those of Lemma B.1. The second term is $O_p(1/\sqrt{NT})$ by Assumption F1 and the Cauchy-Schwarz inequality. Thus $II = O_p(1/(\sqrt N\,\delta_{NT}))$. Next, note that

$$III = T^{-2}\sum_{t=1}^T\sum_{s=1}^T(\tilde F_s-H'F_s^0)F_t^{0\prime}\eta_{st}+T^{-2}\sum_{t=1}^T\sum_{s=1}^T H'F_s^0F_t^{0\prime}\eta_{st}.$$

Now

$$T^{-2}\sum_{t=1}^T\sum_{s=1}^T H'F_s^0F_t^{0\prime}\eta_{st} = H'\Bigl(\frac{1}{T}\sum_{s=1}^T F_s^0F_s^{0\prime}\Bigr)\frac{1}{TN}\sum_{t=1}^T\Lambda^{0\prime}e_tF_t^{0\prime} = O_p(1)\,O_p\Bigl(\frac{1}{\sqrt{NT}}\Bigr)$$

by Assumption F2. Consider

$$\Bigl\|T^{-2}\sum_{t=1}^T\sum_{s=1}^T(\tilde F_s-H'F_s^0)F_t^{0\prime}\eta_{st}\Bigr\| \le \Bigl(\frac{1}{T}\sum_{s=1}^T\|\tilde F_s-H'F_s^0\|^2\Bigr)^{1/2}\Bigl(\frac{1}{T}\sum_{s=1}^T\Bigl\|\frac{1}{T}\sum_{t=1}^T\eta_{st}F_t^{0\prime}\Bigr\|^2\Bigr)^{1/2}.$$

The second factor can be rewritten as

$$\Bigl(\frac{1}{T}\sum_{s=1}^T\Bigl\|\frac{1}{\sqrt{NT}}F_s^{0\prime}\Bigl(\frac{1}{\sqrt{NT}}\sum_{t=1}^T\sum_{k=1}^N\lambda_k^0e_{kt}F_t^{0\prime}\Bigr)\Bigr\|^2\Bigr)^{1/2} \le \frac{1}{\sqrt{NT}}\Bigl(\frac{1}{T}\sum_{s=1}^T\|F_s^0\|^2\Bigr)^{1/2}\Bigl\|\frac{1}{\sqrt{NT}}\sum_{t=1}^T\sum_{k=1}^N\lambda_k^0e_{kt}F_t^{0\prime}\Bigr\|,$$

which is $O_p(1/\sqrt{NT})$ by F2. Therefore,

$$T^{-2}\sum_{t=1}^T\sum_{s=1}^T(\tilde F_s-H'F_s^0)F_t^{0\prime}\eta_{st} = O_p\Bigl(\frac{1}{\delta_{NT}}\Bigr)O_p\Bigl(\frac{1}{\sqrt{NT}}\Bigr).$$

Thus,

$$III = O_p\Bigl(\frac{1}{\delta_{NT}}\Bigr)O_p\Bigl(\frac{1}{\sqrt{NT}}\Bigr)+O_p\Bigl(\frac{1}{\sqrt{NT}}\Bigr) = O_p\Bigl(\frac{1}{\sqrt{NT}}\Bigr).$$

The proof for IV is similar to that of III. Thus

$$I+II+III+IV = O_p\Bigl(\frac{1}{\sqrt T\,\delta_{NT}}\Bigr)+O_p\Bigl(\frac{1}{\sqrt N\,\delta_{NT}}\Bigr)+O_p\Bigl(\frac{1}{\sqrt{NT}}\Bigr) = O_p\Bigl(\frac{1}{\delta_{NT}^2}\Bigr).\qquad\text{Q.E.D.}$$

LEMMA B.3: Under Assumptions A-F, the $r\times r$ matrix $T^{-1}(\tilde F-F^0H)'\tilde F = O_p(\delta_{NT}^{-2})$.

PROOF:

$$T^{-1}(\tilde F-F^0H)'\tilde F = T^{-1}(\tilde F-F^0H)'F^0H+T^{-1}(\tilde F-F^0H)'(\tilde F-F^0H).$$

The lemma follows from Lemma B.2 and Lemma A.1. Q.E.D.

PROOF OF THEOREM 2: From $\tilde\Lambda = X'\tilde F/T$ and $X = F^0\Lambda^{0\prime}+e$, we have $\tilde\lambda_i = T^{-1}\tilde F'F^0\lambda_i^0+T^{-1}\tilde F'e_i$. Writing $F^0 = F^0-\tilde FH^{-1}+\tilde FH^{-1}$ and using $T^{-1}\tilde F'\tilde F = I$, we obtain

$$\tilde\lambda_i = H^{-1}\lambda_i^0 + T^{-1}H'F^{0\prime}e_i + T^{-1}\tilde F'(F^0-\tilde FH^{-1})\lambda_i^0 + T^{-1}(\tilde F-F^0H)'e_i.$$

Each of the last two terms is $O_p(\delta_{NT}^{-2})$ by Lemmas B.3 and B.1. Thus

$$(B.2)\qquad \tilde\lambda_i - H^{-1}\lambda_i^0 = H'\frac{1}{T}\sum_{s=1}^T F_s^0e_{is} + O_p\Bigl(\frac{1}{\delta_{NT}^2}\Bigr).$$


Case 1: $\sqrt T/N\to 0$. Then $\sqrt T/\delta_{NT}^2\to 0$, and thus

$$\sqrt T(\tilde\lambda_i - H^{-1}\lambda_i^0) = H'\frac{1}{\sqrt T}\sum_{s=1}^T F_s^0e_{is}+o_p(1).$$

But $H'\xrightarrow{p}V^{-1}Q\Sigma_\Lambda = Q'^{-1}$. The equality follows from the definition of $Q$; see Proposition 1. Together with Assumption F4, the desired limiting distribution is obtained.

Case 2: $\liminf\sqrt T/N\ge c$ with $c>0$. The first term on the right-hand side of (B.2) is $O_p(T^{-1/2})$, and the second term is $O_p(N^{-1})$ in view of $N\ll T$, so that $\delta_{NT}^2 = N$. Thus

$$N(\tilde\lambda_i - H^{-1}\lambda_i^0) = O_p(N/\sqrt T)+O_p(1) = O_p(1)$$

because $\limsup N/\sqrt T\le 1/c<\infty$. Q.E.D.
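The dual relation for the loadings in (B.2), $\tilde\lambda_i\approx H^{-1}\lambda_i^0$, can be checked numerically in the same way as for the factors; the sketch below (again an assumed i.i.d. normal design of my own) reports the average of $\|\tilde\lambda_i - H^{-1}\lambda_i^0\|^2$ over $i$, which should shrink as both dimensions grow, consistent with the $\min\{\sqrt T, N\}$ rate in Theorem 2.

```python
import numpy as np

def loading_dev(T, N, r=2, seed=0):
    """Average of ||lambda_tilde_i - H^{-1} lambda0_i||^2 under an assumed i.i.d. normal DGP."""
    rng = np.random.default_rng(seed)
    F0 = rng.standard_normal((T, r))
    L0 = rng.standard_normal((N, r))
    X = F0 @ L0.T + rng.standard_normal((T, N))

    eigval, eigvec = np.linalg.eigh(X @ X.T / (T * N))
    order = np.argsort(eigval)[::-1][:r]
    V_NT = np.diag(eigval[order])
    F_tilde = np.sqrt(T) * eigvec[:, order]
    L_tilde = X.T @ F_tilde / T                  # lambda_tilde_i is row i

    H = (L0.T @ L0 / N) @ (F0.T @ F_tilde / T) @ np.linalg.inv(V_NT)
    dev = L_tilde - L0 @ np.linalg.inv(H).T      # row i is (lambda_tilde_i - H^{-1} lambda0_i)'
    return np.mean(np.sum(dev**2, axis=1))

for T, N in [(50, 50), (100, 100), (400, 200), (800, 400)]:
    print(T, N, round(loading_dev(T, N), 5))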

APPENDIX C: PROOF OF THEOREM 3

From $C_{it}^0 = F_t^{0\prime}\lambda_i^0$ and $\tilde C_{it} = \tilde F_t'\tilde\lambda_i$, we have

$$\tilde C_{it} - C_{it}^0 = (\tilde F_t - H'F_t^0)'H^{-1}\lambda_i^0 + \tilde F_t'(\tilde\lambda_i - H^{-1}\lambda_i^0).$$

By adding and subtracting, the second term can be written as

$$F_t^{0\prime}H(\tilde\lambda_i - H^{-1}\lambda_i^0) + (\tilde F_t - H'F_t^0)'(\tilde\lambda_i - H^{-1}\lambda_i^0) = F_t^{0\prime}H(\tilde\lambda_i - H^{-1}\lambda_i^0) + O_p(1/\delta_{NT}^2).$$

Thus,

$$\tilde C_{it} - C_{it}^0 = \lambda_i^{0\prime}H'^{-1}(\tilde F_t - H'F_t^0) + F_t^{0\prime}H(\tilde\lambda_i - H^{-1}\lambda_i^0) + O_p(1/\delta_{NT}^2).$$

From (A.5) and (A.6),

$$\delta_{NT}(\tilde F_t - H'F_t^0) = \frac{\delta_{NT}}{\sqrt N}\,V_{NT}^{-1}\Bigl(\frac{\tilde F'F^0}{T}\Bigr)\frac{1}{\sqrt N}\sum_{k=1}^N\lambda_k^0e_{kt}+O_p\Bigl(\frac{1}{\delta_{NT}}\Bigr).$$

From (B.2),

$$\delta_{NT}(\tilde\lambda_i - H^{-1}\lambda_i^0) = \frac{\delta_{NT}}{\sqrt T}\,H'\frac{1}{\sqrt T}\sum_{s=1}^T F_s^0e_{is}+O_p\Bigl(\frac{1}{\delta_{NT}}\Bigr).$$

Combining the preceding three equations, we obtain

$$\delta_{NT}(\tilde C_{it}-C_{it}^0) = \frac{\delta_{NT}}{\sqrt N}\,\lambda_i^{0\prime}H'^{-1}V_{NT}^{-1}\Bigl(\frac{\tilde F'F^0}{T}\Bigr)\frac{1}{\sqrt N}\sum_{k=1}^N\lambda_k^0e_{kt} + \frac{\delta_{NT}}{\sqrt T}\,F_t^{0\prime}HH'\frac{1}{\sqrt T}\sum_{s=1}^T F_s^0e_{is} + O_p\Bigl(\frac{1}{\delta_{NT}}\Bigr).$$

By the definition of $H$, we have $H'^{-1}V_{NT}^{-1}(\tilde F'F^0/T) = (\Lambda^{0\prime}\Lambda^0/N)^{-1}$. In addition, $HH' = (F^{0\prime}F^0/T)^{-1}+O_p(1/\delta_{NT}^2)$; this follows because $\tilde F'\tilde F/T = I$ together with Lemmas A.1 and B.2 implies $H'(F^{0\prime}F^0/T)H = \tilde F'\tilde F/T+O_p(\delta_{NT}^{-2}) = I+O_p(\delta_{NT}^{-2})$. Thus

$$\delta_{NT}(\tilde C_{it}-C_{it}^0) = \frac{\delta_{NT}}{\sqrt N}\,\lambda_i^{0\prime}\Bigl(\frac{\Lambda^{0\prime}\Lambda^0}{N}\Bigr)^{-1}\frac{1}{\sqrt N}\sum_{k=1}^N\lambda_k^0e_{kt} + \frac{\delta_{NT}}{\sqrt T}\,F_t^{0\prime}\Bigl(\frac{F^{0\prime}F^0}{T}\Bigr)^{-1}\frac{1}{\sqrt T}\sum_{s=1}^T F_s^0e_{is} + O_p\Bigl(\frac{1}{\delta_{NT}}\Bigr).$$


Let

$$\xi_{NT} = \lambda_i^{0\prime}\Bigl(\frac{\Lambda^{0\prime}\Lambda^0}{N}\Bigr)^{-1}\frac{1}{\sqrt N}\sum_{k=1}^N\lambda_k^0e_{kt}.$$

Then $\xi_{NT}\xrightarrow{d}N(0,V_{it})$ by Assumptions B and F3, where $V_{it}$ is defined in Theorem 3. Let

$$\nu_{NT} = F_t^{0\prime}\Bigl(\frac{F^{0\prime}F^0}{T}\Bigr)^{-1}\frac{1}{\sqrt T}\sum_{s=1}^T F_s^0e_{is}.$$

Then $\nu_{NT}\xrightarrow{d}N(0,W_{it})$ by Assumptions A and F4, where $W_{it}$ is defined in Theorem 3. In addition, $\xi_{NT}$ and $\nu_{NT}$ are asymptotically independent because the former is a sum of cross-sectional random variables and the latter is a sum over a particular time series (the $i$th). Asymptotic independence holds under our general assumptions, which allow weak cross-sectional and time-series dependence for $e_{it}$. This implies that $(\xi_{NT},\nu_{NT})$ converges jointly to a bivariate normal distribution. Let $a_{NT} = \delta_{NT}/\sqrt N$ and $b_{NT} = \delta_{NT}/\sqrt T$; then

$$(C.1)\qquad \delta_{NT}(\tilde C_{it}-C_{it}^0) = a_{NT}\xi_{NT}+b_{NT}\nu_{NT}+O_p(1/\delta_{NT}).$$

The sequences $a_{NT}$ and $b_{NT}$ are bounded and nonrandom. If they converge to some constants, then asymptotic normality for $\tilde C_{it}$ follows immediately from Slutsky's Theorem. However, $a_{NT}$ and $b_{NT}$ are not restricted to be convergent sequences, so an extra argument is required to establish asymptotic normality. We shall use an almost sure representation theorem (see Pollard (1984, page 71)). Because $(\xi_{NT},\nu_{NT})\xrightarrow{d}(\xi,\nu)$, the almost sure representation theorem asserts that there exist random vectors $(\xi^*_{NT},\nu^*_{NT})$ and $(\xi^*,\nu^*)$ with the same distributions as $(\xi_{NT},\nu_{NT})$ and $(\xi,\nu)$ such that $(\xi^*_{NT},\nu^*_{NT})\to(\xi^*,\nu^*)$ almost surely. Now

$$a_{NT}\xi^*_{NT}+b_{NT}\nu^*_{NT} = a_{NT}\xi^*+b_{NT}\nu^*+a_{NT}(\xi^*_{NT}-\xi^*)+b_{NT}(\nu^*_{NT}-\nu^*).$$

The last two terms are each $o_p(1)$ because of the almost sure convergence. Because $\xi^*$ and $\nu^*$ are independent normal random variables with variances $V_{it}$ and $W_{it}$, we have $a_{NT}\xi^*+b_{NT}\nu^*\sim N(0,\,a_{NT}^2V_{it}+b_{NT}^2W_{it})$. That is, $(a_{NT}^2V_{it}+b_{NT}^2W_{it})^{-1/2}(a_{NT}\xi^*+b_{NT}\nu^*)\sim N(0,1)$. This implies that

$$\frac{a_{NT}\xi^*_{NT}+b_{NT}\nu^*_{NT}}{(a_{NT}^2V_{it}+b_{NT}^2W_{it})^{1/2}}\xrightarrow{d}N(0,1).$$

The above holds with $(\xi^*_{NT},\nu^*_{NT})$ replaced by $(\xi_{NT},\nu_{NT})$ because they have the same distribution. In view of (C.1), this implies that

$$\frac{\delta_{NT}(\tilde C_{it}-C_{it}^0)}{(a_{NT}^2V_{it}+b_{NT}^2W_{it})^{1/2}}\xrightarrow{d}N(0,1).$$

The above is (5), which is equivalent to Theorem 3. Q.E.D.

APPENDIX D: PROOF OF THEOREMS 4 AND 5

To prove Theorem 4, we need the following lemma. In the following, $I_k$ denotes the $k\times k$ identity matrix.

LEMMA D.1: Let $A$ be a $T\times r$ matrix ($T>r$) with $\mathrm{rank}(A) = r$, and let $\Omega$ be a $T\times T$ positive semidefinite matrix. If for every $A$ there exist $r$ eigenvectors of $AA'+\Omega$, denoted by $F$ ($T\times r$), such that $F = AC$ for some $r\times r$ invertible matrix $C$, then $\Omega = cI_T$ for some $c>0$.


Note that if $\Omega = cI_T$ for some $c$, then the $r$ eigenvectors corresponding to the first $r$ largest eigenvalues of $AA'+\Omega$ are of the form $AC$. This is because $AA'$ and $AA'+cI_T$ have the same set of eigenvectors, and the first $r$ eigenvectors of $AA'$ are of the form $AC$. Thus $\Omega = cI_T$ is a necessary and sufficient condition for $AC$ to be the eigenvectors of $AA'+\Omega$ for every $A$.

PROOF: Consider the case of $r = 1$. Let $\eta_i$ be a $T\times 1$ vector with the $i$th element equal to 1 and 0 elsewhere. For example, $\eta_1 = (1,0,\ldots,0)'$. Let $A = \eta_1$. The lemma's assumption implies that $\eta_1$ is an eigenvector of $AA'+\Omega$. That is, for some scalar $a$ (it can be shown that $a>0$),

$$(\eta_1\eta_1'+\Omega)\eta_1 = \eta_1 a.$$

The above implies $\eta_1+\Omega_1 = \eta_1 a$, where $\Omega_1$ is the first column of $\Omega$. This in turn implies that all elements of $\Omega_1$, with the possible exception of the first element, are zero. Applying the same reasoning with $A = \eta_i$ $(i = 1, 2, \ldots, T)$, we conclude that $\Omega$ is a diagonal matrix, $\Omega = \mathrm{diag}(c_1, c_2, \ldots, c_T)$.

Next we argue that the constants $c_i$ must all be the same. Let $A = \eta_1+\eta_2 = (1,1,0,\ldots,0)'$. The lemma's assumption implies that $F = (1/\sqrt2,\,1/\sqrt2,\,0,\ldots,0)'$ is an eigenvector of $AA'+\Omega$. It is easy to verify that $(AA'+\Omega)F = Fd$ for some scalar $d$ implies $c_1 = c_2$. Similar reasoning shows that all the $c_i$ are the same. This proves the lemma for $r = 1$. The proof for general $r$ is omitted to conserve space, but is available from the author (the proof was also given in an earlier version of this paper). Q.E.D.

PROOF OF THEOREM 4: In this proof, all limits are taken as $N\to\infty$. Let $\Omega = \mathrm{plim}_{N\to\infty}\,N^{-1}ee'$. That is, the $(t,s)$th entry of $\Omega$ is the limit of $(1/N)\sum_{i=1}^N e_{it}e_{is}$. From

$$(TN)^{-1}XX' = T^{-1}F^0(\Lambda^{0\prime}\Lambda^0/N)F^{0\prime}+T^{-1}F^0(\Lambda^{0\prime}e'/N)+T^{-1}(e\Lambda^0/N)F^{0\prime}+T^{-1}(ee'/N),$$

we have $(1/TN)XX'\xrightarrow{p}B$ with $B = (1/T)F^0\Sigma_\Lambda F^{0\prime}+(1/T)\Omega$, because the two middle terms converge to zero. We shall argue that consistency of $\tilde F$ for some transformation of $F^0$ implies $\Omega = \sigma^2I_T$; that is, equation (6) holds. Let $\mu_1\ge\mu_2\ge\cdots\ge\mu_T$ be the eigenvalues of $B$, with $\mu_r>\mu_{r+1}$. Thus, it is assumed that the first $r$ eigenvalues are well separated from the remaining ones. Without this assumption, it can be shown that consistency is not possible. Let $F$ be the $T\times r$ matrix of eigenvectors corresponding to the $r$ largest eigenvalues of $B$. Because $(1/TN)XX'\xrightarrow{p}B$, it follows that $\|P_{\tilde F}-P_F\|\xrightarrow{p}0$, where $P_{\tilde F} = \tilde F(\tilde F'\tilde F)^{-1}\tilde F'$ and $P_F = FF'$. This follows from the continuity property of an invariant subspace when the associated eigenvalues are separated from the rest of the eigenvalues; see, e.g., Bhatia (1997). If $\tilde F$ is consistent for some transformation of $F^0$, that is, $\|\tilde F-F^0D\|\xrightarrow{p}0$ with $D$ an $r\times r$ invertible matrix, then $\|P_{\tilde F}-P_{F^0}\|\xrightarrow{p}0$, where $P_{F^0} = F^0(F^{0\prime}F^0)^{-1}F^{0\prime}$. Since the limit of $P_{\tilde F}$ is unique, we have $P_F = P_{F^0}$. This implies that $F = F^0C$ for some $r\times r$ invertible matrix $C$. Since consistency requires this to be true for every $F^0$, not just for a particular $F^0$, we see that there exist $r$ eigenvectors of $B$ of the form $F^0C$ for every $F^0$. Applying Lemma D.1 with $A = T^{-1/2}F^0\Sigma_\Lambda^{1/2}$ and with $T^{-1}\Omega$ in the role of the lemma's $\Omega$, we obtain $\Omega = cI_T$ for some $c$, which implies condition (6). Q.E.D.

PROOF OF THEOREM 5: By the definition of $\tilde F$ and $V_{NT}$, we have $(1/NT)XX'\tilde F = \tilde FV_{NT}$. Since $W$ and $W+cI$ have the same set of eigenvectors for an arbitrary matrix $W$, we have

$$\Bigl(\frac{1}{NT}XX'-T^{-1}\sigma_N^2I_T\Bigr)\tilde F = \tilde F\bigl(V_{NT}-T^{-1}\sigma_N^2I_r\bigr).$$

Right-multiplying both sides by $J_{NT} = (V_{NT}-T^{-1}\sigma_N^2I_r)^{-1}$, we obtain

$$\Bigl(\frac{1}{NT}XX'-T^{-1}\sigma_N^2I_T\Bigr)\tilde FJ_{NT} = \tilde F.$$

Expanding $XX'$ and noting that $(1/TN)ee'-T^{-1}\sigma_N^2I_T = (1/TN)[ee'-E(ee')]$, we have

$$(D.1)\qquad \tilde F_t - H^{*\prime}F_t^0 = J_{NT}T^{-1}\sum_{s=1}^T\tilde F_s\zeta_{st}+J_{NT}T^{-1}\sum_{s=1}^T\tilde F_s\eta_{st}+J_{NT}T^{-1}\sum_{s=1}^T\tilde F_s\xi_{st},$$


where $H^* = HV_{NT}J_{NT}$. Equation (D.1) is similar to (A.1), but the first term on the right-hand side of (A.1) disappears and $V_{NT}^{-1}$ is replaced by $J_{NT}$. The middle term of (D.1) is $O_p(N^{-1/2})$ and each of the other two terms is $O_p(N^{-1/2}\delta_{NT}^{-1})$ by Lemma A.2. The rest of the proof is the same as that of Theorem 1, except that no restriction between $N$ and $T$ is needed. In addition,

$$\Gamma_t = \Gamma = \lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^N\sum_{j=1}^N\lambda_i^0\lambda_j^{0\prime}E(e_{it}e_{jt}).\qquad\text{Q.E.D.}$$

APPENDIX E: PROOF OF THEOREM 6

First we show that $\hat\Pi_t$ is consistent for $\Pi_t$. A detailed proof involves four steps:

(i) $\frac{1}{N}\sum_{i=1}^N\tilde e_{it}^2\tilde\lambda_i\tilde\lambda_i' - \frac{1}{N}\sum_{i=1}^N e_{it}^2\tilde\lambda_i\tilde\lambda_i' = o_p(1)$;

(ii) $\frac{1}{N}\sum_{i=1}^N e_{it}^2\tilde\lambda_i\tilde\lambda_i' - H^{-1}\Bigl(\frac{1}{N}\sum_{i=1}^N e_{it}^2\lambda_i^0\lambda_i^{0\prime}\Bigr)H^{-1\prime} = o_p(1)$;

(iii) $\frac{1}{N}\sum_{i=1}^N e_{it}^2\lambda_i^0\lambda_i^{0\prime} - \Gamma_t = o_p(1)$;

(iv) $H^{-1}\xrightarrow{p}Q$.

The first two steps imply that $\tilde e_{it}^2$ can be replaced by $e_{it}^2$ and that $\tilde\lambda_i$ can be replaced by $H^{-1}\lambda_i^0$. A rigorous proof of (i) and (ii) can be given, but the details are omitted here (a proof is available from the author). Heuristically, (i) and (ii) follow from $\tilde e_{it} = e_{it}+O_p(\delta_{NT}^{-1})$ and $\tilde\lambda_i = H^{-1}\lambda_i^0+O_p(\delta_{NT}^{-1})$, which are consequences of Theorem 3 and Theorem 2, respectively. The result in (iii) is a special case of White (1980), and (iv) is proved below. Combining these results together with (4), we obtain

$$\hat\Pi_t = V_{NT}^{-1}\Bigl(\frac{1}{N}\sum_{i=1}^N\tilde e_{it}^2\tilde\lambda_i\tilde\lambda_i'\Bigr)V_{NT}^{-1}\xrightarrow{p}V^{-1}Q\Gamma_tQ'V^{-1} = \Pi_t.$$

Next, we prove the consistency of $\hat\Phi_i$. Because $\tilde F_t$ is estimating $H'F_t^0$, the HAC estimator $\hat\Phi_i$ based on $\{\tilde F_t\tilde e_{it}\}$ $(t = 1, 2, \ldots, T)$ is estimating $H_0'\Phi_iH_0$, where $H_0$ is the limit of $H$. The consistency of $\hat\Phi_i$ for $H_0'\Phi_iH_0$ can be proved using the argument of Newey and West (1987). Now

$$H' = V_{NT}^{-1}\Bigl(\frac{\tilde F'F^0}{T}\Bigr)\Bigl(\frac{\Lambda^{0\prime}\Lambda^0}{N}\Bigr)\xrightarrow{p}V^{-1}Q\Sigma_\Lambda.$$

The latter matrix is equal to $Q'^{-1}$ (see Proposition 1). Thus $\hat\Phi_i$ is consistent for $Q'^{-1}\Phi_iQ^{-1}$.

Next, we argue that $\hat V_{it}$ is consistent for $V_{it}$. First note that cross-sectional independence is assumed because $V_{it}$ involves $\Gamma_t$. We also note that $V_{it}$ is simply the limit of

$$\lambda_i^{0\prime}\Bigl(\frac{\Lambda^{0\prime}\Lambda^0}{N}\Bigr)^{-1}\Bigl(\frac{1}{N}\sum_{k=1}^N e_{kt}^2\lambda_k^0\lambda_k^{0\prime}\Bigr)\Bigl(\frac{\Lambda^{0\prime}\Lambda^0}{N}\Bigr)^{-1}\lambda_i^0.$$

The above expression is scale free in the sense that an identical value is obtained when $\lambda_j^0$ is replaced by $A\lambda_j^0$ (for all $j$), where $A$ is an $r\times r$ invertible matrix. The consistency of $\hat V_{it}$ now follows from the consistency of $\tilde\lambda_i$ for $H^{-1}\lambda_i^0$.

Finally, we argue the consistency of $\hat W_{it} = \tilde F_t'\hat\Phi_i\tilde F_t$ for $W_{it}$. The consistency of $\hat\Phi_i$ for $Q'^{-1}\Phi_iQ^{-1}$ is already proved. Now, $\tilde F_t$ is estimating $H'F_t^0$, and $H\xrightarrow{p}Q^{-1}$. Thus $\hat W_{it}$ is consistent for $F_t^{0\prime}Q^{-1}Q'^{-1}\Phi_iQ^{-1}Q'^{-1}F_t^0 = W_{it}$ because $Q^{-1}Q'^{-1} = \Sigma_F^{-1}$ (see Proposition 1). This completes the proof of Theorem 6. Q.E.D.
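To make the plug-in construction concrete, the sketch below computes $\hat V_{it}$, $\hat W_{it}$, and the interval $\tilde C_{it}\pm 1.96(\hat V_{it}/N+\hat W_{it}/T)^{1/2}$ implied by Theorem 3, under simplifying assumptions of my own: cross-sectional and serial independence of $e_{it}$, so that $\hat\Gamma_t = N^{-1}\sum_i\tilde e_{it}^2\tilde\lambda_i\tilde\lambda_i'$ and $\hat\Phi_i = T^{-1}\sum_t\tilde e_{it}^2\tilde F_t\tilde F_t'$ replace the HAC construction used in the theorem. It is a sketch of the idea, not the paper's general estimator.

```python
import numpy as np

rng = np.random.default_rng(1)
T, N, r = 100, 100, 2

# Hypothetical two-factor DGP with i.i.d. N(0,1) errors
F0 = rng.standard_normal((T, r))
L0 = rng.standard_normal((N, r))
X = F0 @ L0.T + rng.standard_normal((T, N))

# Principal components estimates
eigval, eigvec = np.linalg.eigh(X @ X.T / (T * N))
order = np.argsort(eigval)[::-1][:r]
F_til = np.sqrt(T) * eigvec[:, order]        # T x r, with F_til'F_til/T = I
Lam = X.T @ F_til / T                        # N x r
C_hat = F_til @ Lam.T                        # estimated common components
e_hat = X - C_hat

i, t = 0, 0                                  # a particular (i, t) pair
SigL = Lam.T @ Lam / N                       # plays the role of Lambda'Lambda/N

# Gamma_hat_t and Phi_hat_i under (assumed) cross-sectional and serial independence
Gamma_t = ((e_hat[t] ** 2)[:, None, None] * (Lam[:, :, None] * Lam[:, None, :])).mean(axis=0)
Phi_i = ((e_hat[:, i] ** 2)[:, None, None] * (F_til[:, :, None] * F_til[:, None, :])).mean(axis=0)

V_it = Lam[i] @ np.linalg.inv(SigL) @ Gamma_t @ np.linalg.inv(SigL) @ Lam[i]
W_it = F_til[t] @ Phi_i @ F_til[t]           # no (F'F/T)^{-1} factors needed: F_til'F_til/T = I

half = 1.96 * np.sqrt(V_it / N + W_it / T)
print(f"C_hat = {C_hat[t, i]:.3f},  95% CI = [{C_hat[t, i] - half:.3f}, {C_hat[t, i] + half:.3f}],"
      f"  true C = {F0[t] @ L0[i]:.3f}")
```

The same standardized quantity, $\min\{\sqrt N,\sqrt T\}(\tilde C_{it}-C_{it}^0)$ divided by $(a_{NT}^2\hat V_{it}+b_{NT}^2\hat W_{it})^{1/2}$, is what underlies the histograms in Figures 3 and 4.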


REFERENCES

ANDERSON, T. W. (1963): "Asymptotic Theory for Principal Component Analysis," Annals of Mathematical Statistics, 34, 122-148.
ANDERSON, T. W. (1984): An Introduction to Multivariate Statistical Analysis. New York: Wiley.
ANDREWS, D. W. K. (1991): "Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation," Econometrica, 59, 817-858.
BAI, J., AND S. NG (2002): "Determining the Number of Factors in Approximate Factor Models," Econometrica, 70, 191-221.
BEKKER, P. A. (1994): "Alternative Approximations to the Distributions of Instrumental Variable Estimators," Econometrica, 62, 657-681.
BERNANKE, B., AND J. BOIVIN (2000): "Monetary Policy in a Data Rich Environment," Unpublished Manuscript, Princeton University.
BHATIA, R. (1997): Matrix Analysis. New York: Springer.
BRILLINGER, D. R. (1981): Time Series Data Analysis and Theory. New York: Holt, Rinehart and Winston Inc.
CAMPBELL, C. J., A. W. LO, AND A. C. MACKINLAY (1997): The Econometrics of Financial Markets. New Jersey: Princeton University Press.
CHAMBERLAIN, G., AND M. ROTHSCHILD (1983): "Arbitrage, Factor Structure and Mean-Variance Analysis in Large Asset Markets," Econometrica, 51, 1305-1324.
CONNOR, G., AND R. KORAJCZYK (1986): "Performance Measurement with the Arbitrage Pricing Theory: A New Framework for Analysis," Journal of Financial Economics, 15, 373-394.
CONNOR, G., AND R. KORAJCZYK (1988): "Risk and Return in an Equilibrium APT: Application to a New Test Methodology," Journal of Financial Economics, 21, 255-289.
CONNOR, G., AND R. KORAJCZYK (1993): "A Test for the Number of Factors in an Approximate Factor Model," Journal of Finance, 48, 1263-1291.
CRISTADORO, R., M. FORNI, L. REICHLIN, AND G. VERONESE (2001): "A Core Inflation Index for the Euro Area," Unpublished Manuscript, CEPR.
FAVERO, C. A., AND M. MARCELLINO (2001): "Large Datasets, Small Models and Monetary Policy in Europe," Unpublished Manuscript, IEP, Bocconi University, and CEPR.
FORNI, M., AND M. LIPPI (1997): Aggregation and the Microfoundations of Dynamic Macroeconomics. Oxford, U.K.: Oxford University Press.
FORNI, M., AND L. REICHLIN (1998): "Let's Get Real: A Factor-Analytic Approach to Disaggregated Business Cycle Dynamics," Review of Economic Studies, 65, 453-473.
FORNI, M., M. HALLIN, M. LIPPI, AND L. REICHLIN (2000a): "The Generalized Dynamic Factor Model: Identification and Estimation," Review of Economics and Statistics, 82, 540-554.
FORNI, M., M. HALLIN, M. LIPPI, AND L. REICHLIN (2000b): "Reference Cycles: The NBER Methodology Revisited," CEPR Discussion Paper 2400.
FORNI, M., M. HALLIN, M. LIPPI, AND L. REICHLIN (2000c): "The Generalized Dynamic Factor Model: Consistency and Rates," Unpublished Manuscript, CEPR.
FRANKLIN, J. E. (1968): Matrix Theory. New Jersey: Prentice-Hall Inc.
GORMAN, W. M. (1981): "Some Engel Curves," in Essays in the Theory and Measurement of Consumer Behavior in Honor of Sir Richard Stone, ed. by A. Deaton. New York: Cambridge University Press.
GRANGER, C. W. J. (2001): "Macroeconometrics - Past and Future," Journal of Econometrics, 100, 17-19.
GREGORY, A., AND A. HEAD (1999): "Common and Country-Specific Fluctuations in Productivity, Investment, and the Current Account," Journal of Monetary Economics, 44, 423-452.
HAHN, J., AND G. KUERSTEINER (2000): "Asymptotically Unbiased Inference for a Dynamic Panel Model with Fixed Effects When Both n and T Are Large," Unpublished Manuscript, Department of Economics, MIT.
LAWLEY, D. N., AND A. E. MAXWELL (1971): Factor Analysis as a Statistical Method. London: Butterworth.
LEHMANN, B. N., AND D. MODEST (1988): "The Empirical Foundations of the Arbitrage Pricing Theory," Journal of Financial Economics, 21, 213-254.


LEWBEL, A. (1991): "The Rank of Demand Systems: Theory and Nonparametric Estimation," Econometrica, 59, 711-730.
LINTNER, J. (1965): "The Valuation of Risky Assets and the Selection of Risky Investments in Stock Portfolios and Capital Budgets," Review of Economics and Statistics, 47, 13-37.
NEWEY, W. K., AND K. D. WEST (1987): "A Simple Positive Semi-Definite Heteroskedasticity and Autocorrelation Consistent Covariance Matrix," Econometrica, 55, 703-708.
POLLARD, D. (1984): Convergence of Stochastic Processes. New York: Springer-Verlag.
REICHLIN, L. (2000): "Extracting Business Cycle Indexes from Large Data Sets: Aggregation, Estimation, and Identification," Paper prepared for the 2000 World Congress.
ROSS, S. (1976): "The Arbitrage Theory of Capital Asset Pricing," Journal of Finance, 13, 341-360.
SHARPE, W. (1964): "Capital Asset Prices: A Theory of Market Equilibrium under Conditions of Risk," Journal of Finance, 19, 425-442.
STOCK, J. H., AND M. WATSON (1989): "New Indexes of Coincident and Leading Economic Indicators," in NBER Macroeconomics Annual 1989, ed. by O. J. Blanchard and S. Fischer. Cambridge: MIT Press.
STOCK, J. H., AND M. WATSON (1998): "Diffusion Indexes," NBER Working Paper 6702.
STOCK, J. H., AND M. WATSON (1999): "Forecasting Inflation," Journal of Monetary Economics, 44, 293-335.
TONG, Y. (2000): "When Are Momentum Profits Due to Common Factor Dynamics?" Unpublished Manuscript, Department of Finance, Boston College.
WANSBEEK, T., AND E. MEIJER (2000): Measurement Error and Latent Variables in Econometrics. Amsterdam: North-Holland.
WHITE, H. (1980): "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity," Econometrica, 48, 817-838.
