
Page 1: Factor Analysis of High Dimensional Time Series

Factor Analysis of High-Dimensional Time Series

Chris Heaton

A thesis submitted in fulfilment

of the requirements for the degree of

Doctor of Philosophy

November 30, 2007


Acknowledgements

I gratefully acknowledge the significant contribution made to the work presented in this thesis by my supervisor Professor Victor Solo.

The results on identification presented in Section 2.2 were accepted for publication before the preparation of this thesis, and appeared as Heaton, C. and Solo, V. (2004) "Identification of causal models of stationary time series", The Econometrics Journal 7, pp. 618-627. The editor and referees provided us with many useful comments. This work was also presented at the 57th European Meeting of the Econometric Society in 2002. Early drafts of some of the material in Chapter 3 were presented at the 2003 North American Winter Meeting of the Econometric Society, the 9th International Conference on Computing in Economics and Finance, and the 2006 North American Summer Meeting of the Econometric Society. The many useful comments and encouragement of participants at these sessions are gratefully acknowledged.


Abstract

This thesis presents the results of research into the use of factor models for stationary economic time series. Two basic scenarios are considered. The first is a situation where a large number of observations are available on a relatively small number of variables, and a dynamic factor model is specified. It is shown that a dynamic factor model may be derived as a representation of a VARMA model of reduced spectral rank observed subject to measurement error. In some cases the resulting factor model corresponds to a minimal state-space representation of the VARMA plus noise model. Identification is discussed and proved for a fairly general class of dynamic factor model, and a frequency domain estimation procedure is proposed which has the advantage of generalising easily to models with rich dynamic structures. The second scenario is one where both the number of variables and the number of observations jointly diverge to infinity. The principal components estimator is considered in this case, and consistency is proved under assumptions which allow for much more error cross-correlation than the previously published theorems. Ancillary results include finite sample/variables bounds linking population principal components to population factors, and consistency results for principal components in a dual limit framework under a `gap' condition on the eigenvalues. A new factor model, named the Grouped Variable Approximate Factor Model, is introduced. This factor model allows for arbitrarily strong correlation between some of the errors, provided that the variables corresponding to the strongly correlated errors may be arranged into groups. An approximate instrumental variables estimator is proposed for the model and consistency is proved.


Contents

1 Introduction 1

1.1 Classical Factor Analysis . . . 4
1.1.1 Introduction . . . 4
1.1.2 Identification . . . 5
1.1.3 Estimation . . . 7
1.1.4 Errors in Variables . . . 10

1.2 Static Principal Component Analysis . . . 11
1.2.1 Introduction . . . 11
1.2.2 Identification . . . 13
1.2.3 Classical Asymptotics . . . 14
1.2.4 Random Matrix Theory . . . 15
1.2.5 Principal component regression . . . 19
1.2.6 Independent Component Analysis . . . 22
1.2.7 Canonical Correlation Analysis and Reduced Rank Regression . . . 23
1.3 Time series factor analysis . . . 24

1.3.1 Models Based on a 2-sided Filter . . . 26
1.3.2 Models Based on a 1-sided Filter . . . 27
1.3.3 Dynamic Errors in Variables . . . 34

1.4 Time Series Principal Component Analysis . . . 35
1.4.1 Models Based on a 2-sided Filter . . . 36
1.4.2 Models Based on a 1-sided Filter . . . 36

1.5 Factor Analysis and Principal Component Analysis of High-Dimensional Vectors . . . 39
1.5.1 Population Results . . . 40
1.5.2 Models Based on a 2-sided Filter . . . 42
1.5.3 Models Based on a 1-sided Filter . . . 43
1.5.4 The Choice of Factor Order . . . 47
1.5.5 Applications . . . 48
1.5.6 Using Factors for GMM Estimation . . . 50


1.6 Evaluation and Contributions . . . 51
1.6.1 Evaluation of the Literature . . . 51
1.6.2 Contributions Made in this Thesis . . . 57

2 Dynamic Factor Analysis with a Finite Number of Variables 64

2.1 Dynamic factor models in macroeconomics . . . 66
2.2 Identification . . . 74
2.3 Estimation . . . 83
2.4 A Comparison of the Time Domain and Frequency Domain Algorithms . . . 89
2.5 An Empirical Example . . . 91
2.6 Concluding Comments . . . 99

3 Principal Components Estimation of Large-Scale Factor Models 111

3.1 Theory . . . 117
3.1.1 Population Principal Components and Population Factors . . . 122
3.1.2 N and the Noise to Signal Ratio . . . 128
3.1.3 Sample Principal Components and Population Principal Components . . . 130
3.1.4 Sample Principal Components and Population Factors . . . 133
3.2 Measuring the noise-to-signal ratio . . . 138
3.3 The noise-to-signal ratio for a US macroeconomic data set . . . 142
3.4 Summary and concluding comments . . . 145

4 The Grouped Variable Approximate Factor Model 178

4.1 The grouped variable approximate factor model . . . 184
4.2 The approximate instrumental variables estimator . . . 186
4.2.1 Estimating Bi . . . 188
4.2.2 Estimating δ = (β′ α′)′ . . . 189
4.2.3 Estimating Σf and Ψ . . . 190
4.2.4 Estimating ft . . . 191
4.2.5 Estimation with approximate factors . . . 192
4.3 Some Dual-Limit Theory . . . 194
4.4 An experimental application to US macroeconomic data . . . 200
4.5 Concluding Comments . . . 206

5 Conclusions 230

5.1 The motivation for the research . . . 230
5.2 The findings of the research . . . 231

5.2.1 Dynamic factor analysis . . . . . . . . . . . . . . . . . . 231


5.2.2 Approximate factor models . . . 233
5.2.3 The grouped variable approximate factor model . . . 236
5.3 Future research . . . 237
5.3.1 Dynamic factor analysis . . . 237
5.3.2 Approximate factor models . . . 238
5.3.3 The grouped variable approximate factor model . . . 238


List of Figures

2.1 Monthly industrial production growth for G7 countries . . . 93
2.2 Estimated ARMA(1,1) spectra of the errors . . . 97
2.3 Estimated ARMA(1,1) spectra of factor . . . 98
2.4 Estimated ARMA(1,1) factor . . . 99

3.1 Eigenvalues of Stock and Watson's data . . . . . . . . . . . . . . 143

3.2 (1/(N−k)) Σ_{j=k+1}^{N} λj/λk for Stock and Watson's data . . . 144

4.1 Single factor estimated using approximate instrumental variables method and principal components method . . . 202

4.2 Difference between factor estimated by approximate instrumental variables and factor estimated by principal components . . . 203


List of Tables

2.1 Estimation by Time Domain and Frequency Domain Scoring Algorithms . . . 91

2.2 SBCs and log-likelihoods (q = the number of lags of factors) . . . 95
2.3 Estimates of factor loadings and error variances . . . 96
2.4 Estimates of ARMA parameters of the factor and errors . . . 96

3.1 Empirical and Theoretical Distributions of the Test Statistic (k=2, T=100, 5000 simulations) . . . 142

3.2 Test results for Stock and Watson data . . . . . . . . . . . . . . 145

4.1 Approximate Instruments . . . 193
4.2 Forecast MSEs for PC and AIV forecasts for IP . . . 204
4.3 Forecast MSEs for PC and AIV forecasts for PUNEW . . . 204
4.4 Group 1 variables . . . 223
4.5 Group 2 variables . . . 224
4.6 Group 3 variables . . . 225
4.7 Group 4 variables . . . 225
4.8 Group 5 variables . . . 226
4.9 Group 6 variables . . . 226
4.10 Group 7 variables . . . 227
4.11 Group 8 variables . . . 227
4.12 Group 9 variables . . . 227
4.13 Group 10 variables . . . 228
4.14 Group 11 variables . . . 228
4.15 Group 12 variables . . . 229
4.16 Group 13 variables . . . 229
4.17 Group 14 variables . . . 229


Chapter 1

Introduction

Recent years have seen rapid growth in the availability of economic data. Economists in the industrialised countries now have easy access to data on many hundreds of variables that provide information about the state of the economy. Coinciding with the growth in available data has been unprecedented improvement in access to sophisticated econometric techniques. Software packages such as Eviews, Oxmetrics, JMulti, Microfit, and many others, greatly simplify the application of modern econometric methods. Modelling tasks which might otherwise require many hours of work by a specialist econometrician can now be completed in seconds by general economists with basic econometric training and a familiarity with their software environment.

Unfortunately, modern econometrics presents the time series analyst with a Hobson's choice of techniques. While a wide range of methods exist for analysing independently sampled random vectors, the majority of the commonly used techniques for analysing time series economic data are variations on a basic dynamic regression, or vector autoregression, approach. While an impressive range of problems have been investigated and resolved in this framework, its limitation is that, given the length of time series usually available in economics, it is unable to deal with more than a handful of variables in a single model. Large scale structural macroeconometric models, while once popular, are far less commonly employed following the criticisms of Sims (1980). Panel data methods, while capable of handling large numbers of variables, are only applicable in cases where panel structure exists. Consequently, for macroeconomists who believe that the wide range of time series data that are now available contain useful information that is not spanned by any small subset of variables, contemporary econometrics does not have much to offer.

A challenge for econometricians, then, is to devise formal modelling strategies that are suited to time series data sets which are large in the sense that the number of variables is greater than that which could reasonably be analysed using traditional regression-related techniques. Of particular interest are techniques which work in cases for which the number of variables is of the same order of magnitude as, and possibly greater than, the number of observations available, since this describes many of the data sets that are of interest. The fact that economists who design macroeconomic policy pay close attention to such a wide range of published data [1] implies a belief that econometric techniques which utilise large numbers of variables might reveal information that cannot be gleaned from small econometric models. At the very least, such a research program may be justified by a `suck it and see' argument: if we don't develop techniques that can deal with large time series data sets, then we will never know whether these data sets are useful in economics.

[1] See, for example, the introduction to Bernanke and Boivin (2003).


One approach to modelling data sets which are too large for traditional econometric techniques is to specify a factor model [2]. While factor analysis is a commonly employed technique in the analysis of IID vectors, in the past less interest has been shown in estimating factor models for time series. However, there is now a rapidly growing literature of applications of factor analysis techniques to economic time series. Examples of applications include the construction of economic indicators [3], business cycle analysis [4], forecasting [5], the analysis of monetary policy [6], unemployment [7], stock market returns [8], interest rates [9], real estate market efficiency [10] and lending risk [11]. Importantly, there exists recent theoretical work in which factor estimators are considered in a framework in which both the number of observations and the number of variables jointly go to infinity [12].

This thesis presents several original contributions to the field of time series factor analysis. These include some extensions of established approaches, some new approaches, and new theory. The remainder of this chapter provides an introduction to classical factor analysis and principal component analysis and related methods, and to the more recent work in time series factor analysis and principal components. An overview of the relevant literature is presented, and the original contributions made in the thesis are briefly described.

[2] Other approaches, which are not considered in this thesis, include combining forecasts, Bayesian model averaging and empirical Bayes methods. Stock and Watson (2006) provide a good survey of alternative approaches.
[3] Altissimo et al. (2006).
[4] Gregory et al. (1997).
[5] Stock and Watson (2002b).
[6] Bernanke and Boivin (2003).
[7] Heaton and Oslington (2002).
[8] Ludvigson and Ng (2007).
[9] Lippi and Thornton (2004).
[10] Guntermann and Norrbin (1991).
[11] Melvin and Schlagenhauf (1986).
[12] Stock and Watson (2002a), Bai and Ng (2002), Bai (2003), Forni et al. (2000), Forni et al. (2004) and Forni et al. (2005).

1.1 Classical Factor Analysis

1.1.1 Introduction

Factor analysis has its origins in early attempts by psychologists to measure intelligence. Spearman (1904) gathered data on schoolchildren's performance in a range of tests and proposed that the positive correlation between the test scores for randomly sampled children was due to a single factor, which he referred to as general intelligence, and which he estimated from the data. Later researchers proposed multiple types of intelligence, leading to multiple factor models. While the psychological theories that inspired the development of factor analysis have been superseded, factor analysis as a general multivariate statistical technique is now commonly used in a range of empirical disciplines.

Consider observations on an N × 1 random vector xt for t = 1, .., T. The factor model assumes that xt is a linear function of a k × 1 vector of random factors ft and an N × 1 vector of random errors εt, so that

    xt = B ft + εt                                        (1.1)

where B is an N × k matrix of non-random coefficients referred to as the factor loading matrix, and k ≪ N. In the classical setting N is assumed to be fixed. Following the literature, it will be assumed that E(ft) = 0 and E(εt) = 0. If xt for t = 1, .., T is a set of IID observations and k > 1 then equation (1.1) is the classical multiple factor model of Thurstone (1947).
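As an illustration, the data-generating process in equation (1.1) and its implied covariance structure can be simulated in a few lines. This is a sketch in Python with NumPy; the dimensions and parameter values are arbitrary choices for the example, not anything from the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)
N, k, T = 6, 2, 50_000           # illustrative sizes: k << N

B = rng.normal(size=(N, k))      # factor loading matrix (non-random)
psi = rng.uniform(0.5, 1.5, N)   # diagonal of the error covariance Psi

f = rng.normal(size=(T, k))                   # factors, E(f_t) = 0, Sigma_ff = I_k
eps = rng.normal(size=(T, N)) * np.sqrt(psi)  # errors, E(eps_t) = 0
x = f @ B.T + eps                             # x_t = B f_t + eps_t

# The implied covariance of x_t is B Sigma_ff B' + Psi; the sample
# covariance should be close to it when T is large.
sigma_implied = B @ B.T + np.diag(psi)
sigma_sample = x.T @ x / T
print(np.abs(sigma_sample - sigma_implied).max())  # small for large T
```

The `sigma_implied` line is exactly the covariance decomposition that drives the identification discussion below.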

1.1.2 Identification

Denote Ψ = E(εt εt′) and Σff = E(ft ft′). The covariance matrix of the observable vector xt is then

    Σxx = E(xt xt′) = B Σff B′ + Ψ                        (1.2)

From a consideration of the first two moments, the parameters of the factor model are unidentified in the absence of restrictions. Firstly, note that the factor structure implies that the covariance matrix of xt may be written as the sum of a rank k matrix and a full-rank symmetric positive-definite matrix. If we define B∗ = √α B and Ψ∗ = (1 − α) B Σff B′ + Ψ, where 0 < α < 1, then Σxx = B Σff B′ + Ψ = B∗ Σff B∗′ + Ψ∗, where Ψ∗ is full-rank, symmetric and positive-definite. Thus, in the general case, Ψ is unidentified. In classical factor analysis, this problem is circumvented by assuming that Ψ is diagonal. Anderson and Rubin (1956) prove that Ψ is identified under this restriction [13].

[13] In addition to the diagonality of Ψ, some other assumptions are made to prove the uniqueness of Ψ.

Even with Ψ restricted to be diagonal, B is not uniquely determined. If we now let B∗ = BM and f∗t = M⁻¹ ft, where M is any non-singular matrix, then Σxx = B Σff B′ + Ψ = B∗ Σf∗f∗ B∗′ + Ψ. Therefore, B and ft are identified up to a non-singular transformation only. It is often assumed that Σff = Ik. In this case, M must be an orthogonal matrix and B is then identified up to an orthogonal transformation. In order to achieve identification up to a sign change, restrictions must be placed on B that are equivalent to choosing a particular orthogonal value for M. Anderson and Rubin (1956) and Reiersøl (1950) discuss a number of restrictions that are sufficient to achieve identification up to a sign change. Perhaps the most familiar of these, for an economist, is that the observable variables can be ordered such that there exists a k × k submatrix of B that is lower triangular, i.e. it is possible to order the elements of xt such that

    B = [ B11   0    ···   0   ]
        [ B21  B22   ···   0   ]
        [ B31  B32   ⋱     0   ]
        [  ⋮    ⋮          ⋮   ]
        [ BN1  BN2   ···  BNk  ]

In the applied literature identification is often achieved by the analyst simply choosing a particular value of M that provides a set of factors and loadings that are regarded as being easily interpretable. A popular method is the varimax rotation [14], which chooses M to maximise the variance of the factor loadings. This produces a large number of small factor loadings and a small number of large factor loadings. Another popular approach has been simply to plot factor loadings and choose an identification scheme which corresponds to a plausible interpretation [15]. An alternative approach to identification which has been utilised in the signal processing literature [16] is to impose the restriction that the factors are statistically independent (rather than the weaker condition of uncorrelatedness that has been imposed above).

[14] Kaiser (1958).
[15] See, for example, Gorsuch (1983).
[16] Attias (1999).
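The rotation problem described above is easy to verify numerically: replacing B by B∗ = BM for an orthogonal M leaves the implied covariance matrix, and hence the first two moments, unchanged. A minimal sketch, where the particular B, Ψ and M are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
N, k = 8, 3

B = rng.normal(size=(N, k))
psi = rng.uniform(0.5, 1.5, N)
sigma_xx = B @ B.T + np.diag(psi)      # Sigma_ff = I_k assumed

# Any orthogonal M gives an observationally equivalent loading matrix B* = BM,
# since B* B*' = B M M' B' = B B'.
M, _ = np.linalg.qr(rng.normal(size=(k, k)))   # random orthogonal matrix
B_star = B @ M
sigma_star = B_star @ B_star.T + np.diag(psi)

print(np.allclose(sigma_xx, sigma_star))   # True: first two moments identical
```

This is the indeterminacy that varimax and the triangular-submatrix restriction each resolve by pinning down one particular M.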


The apparent arbitrariness of many of the identification schemes that have been employed in the applied behavioural sciences literature has earned factor analysis something of a controversial reputation. However, two points should be borne in mind. Firstly, the statistical theory of the classical factor model, including the identification issue, is sound and well-understood. The charge that it has been badly applied in some cases should not be interpreted as evidence that the general approach is flawed. Secondly, there exist many interesting applications (e.g. forecasting, variance decomposition, index construction, etc.) for which only the space spanned by the factors needs to be estimated. In such applications, the results are independent of any `rotation' applied to the factors.

1.1.3 Estimation

Maximisation of the Gaussian likelihood by a (quasi-)Newton algorithm requires identifying restrictions to be imposed on the model in order to ensure a non-singular information matrix. In principle, any condition sufficient for identification could be employed, but the conditional Fletcher-Powell procedure proposed by Jöreskog (1967) [17] produces a neat algorithm which is easy to code. Once the model has been estimated under a particular restriction, the parameters of any equivalent model may be computed by applying the appropriate transformation to the maximum likelihood estimates.

Rubin and Thayer (1982) propose that maximum likelihood estimation of the factor model be carried out using the EM algorithm of Dempster et al. (1977). This approach has the advantage of being extremely simple to code.

[17] Jöreskog (1967) restricts B′Ψ⁻¹B to be a diagonal matrix.


While convergence of the EM algorithm generally requires many more iterations than a (quasi-)Newton algorithm, each iteration can be computed relatively quickly, particularly when the number of parameters is large. An interesting feature of the EM algorithm in this context is that it estimates the factor model without any explicit identifying restriction being imposed on the parameters.
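A minimal sketch of an EM iteration for model (1.1) with Σff = Ik and diagonal Ψ is given below. It follows the general Rubin and Thayer (1982) idea of alternating an expectation over the unobserved factors with regression-type updates, but the implementation details (starting values, number of iterations) are illustrative assumptions, not their published algorithm:

```python
import numpy as np

def fa_em(x, k, n_iter=200):
    """EM for x_t = B f_t + eps_t with f_t ~ N(0, I_k), eps_t ~ N(0, Psi) diagonal.

    No identifying restriction is imposed: the likelihood is flat over
    orthogonal rotations B -> BM, yet the iteration still converges.
    """
    T, N = x.shape
    S = x.T @ x / T                      # sample covariance S_xx
    B = np.linalg.eigh(S)[1][:, -k:]     # crude starting values (assumption)
    psi = np.diag(S).copy()
    loglik = []
    for _ in range(n_iter):
        # E-step: posterior moments of f_t given x_t
        iPsiB = B / psi[:, None]                      # Psi^{-1} B
        G = np.linalg.inv(np.eye(k) + B.T @ iPsiB)    # Cov(f_t | x_t)
        delta = G @ iPsiB.T                           # E(f_t | x_t) = delta x_t
        Exf = S @ delta.T                             # (1/T) sum x_t E(f_t|x_t)'
        Eff = G + delta @ S @ delta.T                 # (1/T) sum E(f_t f_t'|x_t)
        # M-step: regression-type updates of B and Psi
        B = Exf @ np.linalg.inv(Eff)
        psi = np.diag(S - B @ Exf.T).copy()
        # Gaussian log-likelihood (up to a constant) to monitor monotone ascent
        sigma = B @ B.T + np.diag(psi)
        _, logdet = np.linalg.slogdet(sigma)
        loglik.append(-0.5 * T * (logdet + np.trace(np.linalg.solve(sigma, S))))
    return B, psi, loglik

rng = np.random.default_rng(2)
T, N, k = 2_000, 8, 2
B0 = rng.normal(size=(N, k))
psi0 = rng.uniform(0.5, 1.5, N)
x = rng.normal(size=(T, k)) @ B0.T + rng.normal(size=(T, N)) * np.sqrt(psi0)

B_hat, psi_hat, ll = fa_em(x, k)
print(ll[0], ll[-1])   # the monitored likelihood never decreases
```

Note that `B_hat` is only determined up to a rotation, but the fitted covariance `B_hat @ B_hat.T + diag(psi_hat)` is not, which is why the likelihood can be monitored without imposing an identifying restriction.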

A practical problem which often occurs in the maximum likelihood estimation of the factor model is that the algorithm will converge to a solution for which one of the diagonal elements of Ψ is zero. These solutions are referred to in the literature as `improper' or `Heywood' solutions [18]. Lawley and Maxwell (1971) and Jöreskog (1967) propose an algorithm for model estimation in the presence of improper solutions, which involves respecifying the factor model once an improper solution has been detected.

Anderson and Rubin (1956) prove asymptotic Gaussianity of the maximum likelihood estimator in cases in which the parameters of the model are identified and √T (Sxx − Ω) is asymptotically Gaussian, where Sxx = (1/T) Σ_{t=1}^{T} xt xt′. Gill (1977) proves consistency under more general conditions which include cases in which a parameter lies on the boundary of the coefficient space.

A number of alternatives to likelihood estimation have been proposed. Jöreskog and Goldberger (1972) propose a GLS procedure that may be implemented using a Newton-Raphson algorithm. Ihara and Kano (1986) develop a method of moments estimator for individual elements of Ψ. In a paper that is particularly relevant for Chapter 4 of this thesis, Madansky (1964) notes that the factor model implies an errors-in-variables structure. Equation 1.1 may be partitioned into three blocks:

    [ x0t ]   [ B0 ]         [ ε0t ]
    [ x1t ] = [ B1 ]  ft  +  [ ε1t ]       t = 1, .., T        (1.3)
    [ x2t ]   [ B2 ]         [ ε2t ]

where x0t is a k × 1 vector and it is assumed that B0 is non-singular and that Ψ = E(εt εt′) is diagonal. Exploiting the fact that the factor loadings and factors are identified up to a non-singular transformation only, the model may be written equivalently as

    [ x0t ]   [ Ik  ]          [ ε0t ]
    [ x1t ] = [ B∗1 ]  f∗t  +  [ ε1t ]     t = 1, .., T        (1.4)
    [ x2t ]   [ B∗2 ]          [ ε2t ]

where B∗j = Bj B0⁻¹ for j = 1, 2 and f∗t = B0 ft. The equations

    xjt = B∗j f∗t + εjt
    f∗t = x0t − ε0t

then form an errors-in-variables model. Madansky (1964) proposed an instrumental variables (IV) estimator based on this transformation of the factor model. Hägglund (1982) extended this to a two-stage least squares (2SLS) estimator.

[18] See Lawley and Maxwell (1971) or Jöreskog (1967).


An important feature of the estimators of Ihara and Kano (1986), Madansky (1964) and Hägglund (1982) is that they are non-iterative and may be computed in cases in which the number of variables is larger than the number of observations. From a computational perspective they are attractive alternatives to likelihood methods, particularly in cases for which improper solutions cause a problem and in cases for which the number of variables is large. Jennrich (1986) proposes a Gauss-Newton algorithm which achieves efficiency in a single step from consistent starting values. He suggests that it be used in conjunction with the procedures of Ihara and Kano (1986) and Hägglund (1982) to produce an efficient two-step estimator.
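The errors-in-variables logic behind these non-iterative estimators can be illustrated in the single-factor case (k = 1): x0t plays the role of a mismeasured factor, and because Ψ is diagonal, any other variable is a valid instrument for it, so each loading B∗j is estimable as a ratio of cross-moments. The simulation below is a sketch of this idea only, not the estimators as published:

```python
import numpy as np

rng = np.random.default_rng(3)
T, N = 100_000, 5
b = np.array([1.0, 0.8, -1.2, 0.5, 2.0])   # normalised loadings, b[0] = 1

f = rng.normal(size=T) * 1.5                # single factor f*_t
eps = rng.normal(size=(T, N)) * rng.uniform(0.6, 1.2, N)
x = np.outer(f, b) + eps                    # x_jt = B*_j f*_t + eps_jt

# x_0t = f*_t + eps_0t is the 'mismeasured factor'. OLS of x_jt on x_0t would
# be biased by eps_0t; instead instrument x_0t with another variable x_mt,
# which is correlated with f*_t but, since Psi is diagonal, uncorrelated with
# eps_0t and eps_jt.
def iv_loading(j, m):
    return (x[:, m] @ x[:, j]) / (x[:, m] @ x[:, 0])

b_hat = [iv_loading(j, m=1 if j != 1 else 2) for j in range(1, N)]
print(np.round(b_hat, 2), "vs true", b[1:])
```

Nothing here is iterative, and the same ratios remain computable when N exceeds T, which is the feature stressed in the text.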

1.1.4 Errors in Variables

Consider the linear model

    yt = β′ x∗t + εt

where yt is a scalar and x∗t is an N × 1 vector. Suppose that x∗t is observed subject to measurement error, so that

    xt = x∗t + ηt

is the observable variable, where ηt is the random measurement error. A simple count of the moment conditions and parameters reveals that this model is unidentified when all random variables are assumed to be Gaussian. Two main approaches have been proposed to resolve this problem. One approach is to use instrumental variables. Early discussions of this approach include


Geary (1942) and Reiersøl (1945). Carter and Fuller (1980) derive maximum likelihood estimators using instruments. The other approach to estimation is to assume non-Gaussianity and to use higher order moments to resolve the identification problem. Reiersøl (1941) proposed estimators constructed from third order moments. Geary (1942) discussed estimation using cumulants of any order higher than two. Pal (1980) used third moments to construct an estimator. Cragg (1997) and Dagenais and Dagenais (1997) derive instrumental variables estimators using third and fourth moments. Erickson and Whited (2002) construct a GMM estimator using higher order moments.
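The two features of this model that matter in what follows, the attenuation bias of least squares and the consistency of instrumental variables, are easily reproduced in a simulation (a sketch; the scalar design and parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
T, beta = 200_000, 2.0

z = rng.normal(size=T)              # instrument: correlated with x*, not with errors
x_star = z + rng.normal(size=T)     # latent regressor, var(x*) = 2
x = x_star + rng.normal(size=T)     # observed with measurement error, var(eta) = 1
y = beta * x_star + rng.normal(size=T)

# OLS is attenuated: plim = beta * var(x*) / (var(x*) + var(eta)) = 2 * 2/3 = 4/3.
b_ols = (x @ y) / (x @ x)
# IV is consistent: plim (z'y) / (z'x) = beta * cov(z, x*) / cov(z, x*) = beta.
b_iv = (z @ y) / (z @ x)
print(round(b_ols, 2), round(b_iv, 2))
```

The same moment-counting point made in the text shows up here: without the instrument z (or non-Gaussianity), nothing in the observable second moments separates β from the attenuated value.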

1.2 Static Principal Component Analysis

1.2.1 Introduction

The principal components method may be viewed as a linear transformation of multivariate data to a new orthogonal coordinate system such that the direction of the greatest variation of the data is given by the first axis, the direction of the second greatest variation of the data is given by the second axis, and so on. The method is due originally to Pearson (1901) and was developed by Hotelling (1933).

Consider observations on an N × 1 random vector xt for t = 1, .., T. Let R1 be the set of N × 1 vectors that have been normalised so that r1′r1 = 1, ∀ r1 ∈ R1. Now consider the problem of choosing the value of r1 which maximises the sample variance of r1′xt. Denote

    r∗1 = argmax_{r1 ∈ R1} (1/T) Σ_{t=1}^{T} r1′ xt xt′ r1

Standard calculus arguments establish that r∗1 = q1, where q1 is the normalised eigenvector corresponding to the largest eigenvalue of the sample covariance matrix Sxx = (1/T) Σ_{t=1}^{T} xt xt′. The constructed variable s1t = λ1^(−1/2) r∗1′ xt = λ1^(−1/2) q1′ xt, where λ1 is the largest eigenvalue of the sample covariance matrix, is referred to as the first sample principal component of xt. Now define R2 to be the set of N × 1 vectors for which r2′r2 = 1 and r2′r∗1 = 0, ∀ r2 ∈ R2. Consider the problem of choosing a value for r2 so that the sample variance of r2′xt is maximised. Denoting

    r∗2 = argmax_{r2 ∈ R2} (1/T) Σ_{t=1}^{T} r2′ xt xt′ r2

it is easy to show that r∗2 = q2, where q2 is the normalised eigenvector corresponding to the second largest eigenvalue, λ2. The constructed variable s2t = λ2^(−1/2) r∗2′ xt = λ2^(−1/2) q2′ xt is referred to as the second sample principal component of xt. In a similar fashion a complete set of N orthonormal sample principal components may be defined.

An alternative, but equivalent, way to define the first k sample principal components of T observations on an N × 1 vector xt is to define the matrix X = (x1, · · · , xT)′ and to choose values for the T × k matrix Sk and the N × k matrix B to minimise

    (1/T) ‖X − Sk B′‖²_F

such that (1/T) Sk′Sk = Ik. As is well known from regression analysis, the optimal value of B is given by the OLS estimator B̂ = (1/T) X′Sk. Substituting this into the objective function, the remaining problem is to choose a value for Sk to maximise

    (1/T) tr(Sk′ X X′ Sk)

such that (1/T) Sk′Sk = Ik. Therefore, the optimal value of Sk is Ŝk, where Ŝk is the matrix containing the eigenvectors corresponding to the first k eigenvalues of (1/T) X X′.

It may be noted that the eigenvalues, eigenvectors and principal components from the two definitions above are given simply by the singular value decomposition

    (1/√T) X = S Λ^(1/2) Q′

where S is a T × N matrix for which the jth column is the jth principal component.

In applied work in many disciplines it is often found that a large proportion of the variance of xt is accounted for by the first few principal components. Since working with the first few principal components might entail a significant reduction in dimension, in some applications analysts may prefer this to working with the original data.
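The equivalence of the eigendecomposition and singular value decomposition definitions above can be checked directly (a sketch in NumPy; the data are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
T, N = 400, 6
X = rng.normal(size=(T, N)) @ rng.normal(size=(N, N))   # observations in rows

Sxx = X.T @ X / T                       # sample covariance (zero mean assumed)
lam, Q = np.linalg.eigh(Sxx)            # eigh returns ascending order
lam, Q = lam[::-1], Q[:, ::-1]          # sort eigenvalues/eigenvectors descending

# SVD of X/sqrt(T): the squared singular values are the eigenvalues of Sxx,
# and the right singular vectors are its eigenvectors.
U, s, Vt = np.linalg.svd(X / np.sqrt(T), full_matrices=False)
print(np.allclose(s**2, lam))           # True

# The first sample principal component, s_1t = lam_1^(-1/2) q_1' x_t, agrees
# with the first column of U up to sign and the 1/sqrt(T) scaling in the text.
pc1 = X @ Q[:, 0] / np.sqrt(lam[0])
print(np.allclose(np.abs(pc1 / np.sqrt(T)), np.abs(U[:, 0])))   # True
```

The sign ambiguity in the last comparison is exactly the sign-change indeterminacy noted for the factor model earlier in the chapter.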

1.2.2 Identification

While often viewed simply as a technique for reducing sample dimension, it is possible to interpret sample principal components as estimates of analogous population quantities. Let Ω = E(Sxx), and let λj and qj be the jth eigenvalue of Ω and its corresponding normalised eigenvector. Consider the problem of choosing non-random orthonormal vectors rj for j = 1, .., N to maximise the population variance of the constructed variables rj′xt for j = 1, .., N. The same calculus arguments that were used above show that this problem is solved by choosing rj so that the constructed variables are equal to the population principal components, defined as sjt = λj^(−1/2) qj′ xt for j = 1, .., N.

It should be noted that, as defined above, population principal components are identified up to an orthogonal transformation only. That is to say, the optimisation problem described above, which is solved by the population principal components st = (s1t ... skt)′, is also solved by M st, where M is any k × k orthogonal matrix. Consequently, the interpretation of principal components as representing some underlying economic structure is problematic.

1.2.3 Classical Asymptotics

For the principal components model, if there are no repeated eigenvalues then λj and qj are continuous functions of Ω [19]. Therefore, under conditions sufficient for √T (Sxx − Ω) to be asymptotically Gaussian as T → ∞, √T (ŝt − M st) is asymptotically Gaussian as T → ∞, where ŝt denotes the vector of sample principal components and N is a fixed constant. Anderson (1963) derives the asymptotic distribution of the eigenvalues and eigenvectors for the more general case which allows for arbitrary multiplicity of the eigenvalues in a setting where N is fixed and T → ∞.

[19] See Magnus and Neudecker (1991).


1.2.4 Random Matrix Theory

In applications for which the number of variables is of the same order of magnitude as the number of observations, the `fixed-N' asymptotics of Subsection 1.2.3 may be inappropriate. A significant body of work, known as Random Matrix Theory, exists which examines the distribution of the eigenvalues of sequences of covariance matrices of T observations on N variables as T and N jointly approach infinity at the same rate. Random Matrix Theory has its origins in theoretical physics. In quantum mechanics the discrete energy levels of atomic nuclei may be found by computing the eigenvalues of a Schrödinger operator. For light atoms, solution procedures are well-known, but for heavy atoms with large numbers of energy levels, the required analysis becomes unfeasibly complicated. Physicists often circumvent this problem by replacing the Schrödinger operator with a Hermitian random matrix and conducting analyses of energy levels by considering the distribution of eigenvalues. The most famous result in this field is Wigner's semicircle law.20 Wigner considered an N × N Hermitian matrix with IID real random diagonal elements, and off-diagonal elements that are IID complex random variables with a common variance σ^2, and derived the asymptotic distribution of the eigenvalues. Marčenko and Pastur (1967) developed a similar theory for the eigenvalues of the covariance matrix of serially and cross-sectionally uncorrelated Gaussian random vectors. Assume that xt, t = 1, ..., T is an IID N(0, σ^2 IN) sequence of N × 1 random vectors, and denote the covariance matrix Sxx = (1/T)∑_{t=1}^{T} xtx′t. Let

20 Wigner (1955) and Wigner (1958).


λj be the jth eigenvalue of Sxx. The spectral distribution of Sxx is defined as

F^{Sxx}(m) = #{j : λj ≤ m} / N

where # denotes the number of elements in the set indicated. That is, the spectral distribution at the point m gives the proportion of sample eigenvalues that are not larger than m. Marčenko and Pastur (1967) showed that as N/T −→ c < ∞, F^{Sxx}(m) −→ F(m), where F has density

f(m) = √((a+ − m)(m − a−)) / (2πmcσ^2)  for a− ≤ m ≤ a+,   and f(m) = 0 elsewhere,

with a− = σ^2(1 − √c)^2 and a+ = σ^2(1 + √c)^2. Consequently, the sample eigenvalues

are more spread out than the population eigenvalues. A considerable amount of theoretical work has followed. A complete review of this literature is not necessary here, but it is noted that an implication of much of this work is that sample eigenvalues may not be consistent estimators of population eigenvalues in a setting in which N and T grow jointly. For example, Geman (1980) considers the covariance matrix formed from a T × N matrix of IID N(0, 1) random variables, in a setting in which (T, N) −→ (∞,∞) and N/T −→ c ∈ (0, 1], and shows that

λ1 −→ (1 + √c)^2 almost surely.
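A small simulation illustrates both the Marčenko–Pastur spreading and Geman's limit for the largest eigenvalue. The sketch below is illustrative Python; the dimensions are arbitrary choices with c = 1/4:

```python
import numpy as np

rng = np.random.default_rng(1)
T, N = 2000, 500                        # c = N/T = 0.25
X = rng.standard_normal((T, N))         # IID N(0,1): population covariance I_N
lam = np.linalg.eigvalsh(X.T @ X / T)   # sample eigenvalues, ascending

c = N / T
a_minus, a_plus = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2

# Every population eigenvalue equals 1, yet the sample eigenvalues spread
# out over (approximately) the Marcenko-Pastur support [a-, a+]
print(lam.min(), lam.max())   # roughly 0.25 and 2.25
print(a_minus, a_plus)        # exactly 0.25 and 2.25
```

Despite every population eigenvalue being 1, the largest sample eigenvalue sits near (1 + √c)^2 and the smallest near (1 − √c)^2.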


Johnstone (2001) derives the asymptotic distribution of the largest sample eigenvalue. He shows that

(λ1 − µTN)/σTN −→ W1 ∼ F1 in distribution,

where µTN = (√(T − 1) + √N)^2, σTN = (√(T − 1) + √N)(1/√(T − 1) + 1/√N)^{1/3}, and the distribution function is given by

F1(s) = exp{ −(1/2) ∫_s^∞ [q(x) + (x − s)q^2(x)] dx },   s ∈ R,

where q solves the Painlevé II differential equation

q′′(x) = xq(x) + 2q^3(x)
q(x) ∼ Ai(x) as x → ∞

where Ai(x) denotes the Airy function. Johnstone (2001) also proposes a

spiked" model in which a finite number of the population eigenvalues have relatively large values. Baik and Silverstein (2006) derive almost sure limits for the sample eigenvalues in a spiked model in which all but a finite number of the population eigenvalues are equal to one. They find that if all of the population eigenvalues lie in the interval [1 − √c, 1 + √c], then the Marčenko and Pastur (1967) result holds, despite the presence of the large population eigenvalues. If some of the population eigenvalues lie outside [1 − √c, 1 + √c], then the same number of sample eigenvalues will lie outside the support of the Marčenko–Pastur density ([(1 − √c)^2, (1 + √c)^2]), and the rest will conform to


the Marčenko–Pastur limit.

Ledoit and Wolf (2002) use results from the Random Matrix Theory literature to consider the behaviour of test statistics for hypotheses about covariance matrices. They find that an existing test for sphericity is robust to high dimensionality, and develop a modified statistic for testing that the covariance is equal to a specified matrix. There also exists some work in which Random Matrix Theory is used to consider the behaviour of sample covariance matrices of variables generated by factor models in a setting in which (T, N) → (∞,∞) jointly. Kapetanios (2005) considers a static factor model in which the eigenvalues of B′B grow at a rate of N and the eigenvalues of the error covariance are bounded. If kmax > k then λ̂i − λ̂kmax+1 will diverge as N −→ ∞ for i = 1, ..., k and remain bounded for i = k + 1, ..., kmax. A test of the null hypothesis H0 : k = k0 against H1 : k > k0 may then be based on the test statistic λ̂k0+1 − λ̂kmax+1. Kapetanios (2005) proposes a subsampling technique to approximate the distribution of this test statistic. He proves that this procedure consistently estimates the distribution of the test statistic, and that a sequence of tests using this approach consistently estimates the true factor order in a setting in which N/T −→ c < ∞. Onatski (2007) proposes an alternative test statistic and derives its distribution. He assumes that the errors are Gaussian and temporally independent, that the eigenvalues of the error covariance are bounded, that the eigenvalues of B′B grow at a rate faster than N^{2/3}, and that N/T remains in a compact subset of (0,∞) as (T, N) −→ (∞,∞).

Onatski (2006a) considers a factor model in which the eigenvalues of B′B are bounded above and the elements of the error vector εt are IID N(0, σ^2). Since the proportion of the total variance that is accounted for by the factors


declines as N grows, he refers to the factors as being relatively weak. Using arguments from Random Matrix Theory, he shows that for sequences for which (T, N) → (∞,∞) and N/T −→ c < ∞, the principal components are inconsistent estimators of the factors but are asymptotically Gaussian. Onatski (2006b) considers a static factor model with errors that are either IID across time or IID across the cross-section (but not both). He assumes that the eigenvalues of B′B are growing, but does not require the growth to be as fast as N. The eigenvalues of the error covariance are assumed to be bounded. He uses ideas from Random Matrix Theory to derive an estimate of an upper bound for the eigenvalues of the error covariance matrix. The number of factors is then estimated by counting the number of eigenvalues of the observable covariance that are above this bound. He proves consistency of this estimator in a setting in which N −→ ∞ and T −→ ∞ simultaneously at the same rate.

From the perspective of economic analysis, a weakness of Random Matrix

Theory, in its current form, is that it does not apply to serially dependent data.

An extension to a correlated time series setting would be a major advance

that would lead to the development of many new techniques in econometrics.

For the time being however, the applicability of Random Matrix Theory to

economic problems is somewhat limited.

1.2.5 Principal component regression

One possible application of principal components techniques which is at least superficially attractive is the reduction of dimension in regression analysis. This has been proposed both for reducing a large set of regressors to a size


manageable by standard regression techniques,21 and for relatively small reductions in dimension to eliminate multicollinearity problems.22 It is often argued that principal components corresponding to small eigenvalues may be omitted from the analysis since they account for only a small proportion of the total variation of the regressors.23 However, this is not a sound argument. The principal components of a set of predictor variables xt are the components of xt which have the maximum possible variance. What matters for regression analysis, however, is not the variance of the right-hand-side variables, but their correlation with the dependent variable yt. It is easy to construct examples in which all of the correlation between xt and yt is accounted for by the principal component of xt corresponding to the smallest eigenvalue. Therefore, in the absence of an argument which explains why the correlation of xt with yt would be due to the components which also explain most of the variance of xt, a regression technique based on excluding principal components that correspond to relatively small eigenvalues is suspect. In order to introduce ideas which will be developed in much greater depth and generality in Chapters 3 and 4, two such arguments will now be introduced.

Firstly, consider the case in which the explanatory variables are determined by a vector of k unobservable factors ft and a vector of N errors εt, i.e.

xt = Bft + εt (1.5)

where B is an N × k matrix of unknown non-random coefficients. It is assumed

21 see, for example, Pidot (1969).
22 see, for example, Mittelhammer and Baritelle (1977), Cheng and Iglarsh (1976), McCallum (1970).
23 Mittelhammer and Baritelle (1977) and Pidot (1969) both present this argument.


that yt is correlated with xt purely due to its correlation with the factors, i.e.

yt = β′ft + ηt (1.6)

where β is a k × 1 vector of regression coefficients and ηt is a regression error term that is uncorrelated with ft. For simplicity, it is assumed that E(εtε′t) = IN and E(ftf′t) = Ik. Let Λk be a k × k diagonal matrix containing the first k eigenvalues of Ω = E(xtx′t) in descending order and Qk be an N × k matrix containing the corresponding eigenvectors as columns. The eigenvectors of BB′ are then Qk and the eigenvalues are Λk − Ik. Therefore, there exists a k × k orthogonal matrix M such that B = Qk(Λk − Ik)^{1/2}M. It follows that the first k principal components of xt may be written as

sjt = √((λj − 1)/λj) fjt + λj^{−1/2}q′jεt,   j = 1, .., k

Therefore, in this particular case, the first k principal components are noisy scaled measures of the first k factors. Consequently it is the first k principal components that account for the correlation between xt and yt. OLS regression of the principal components on yt produces an inconsistent estimate of β due to the measurement error introduced by using the principal components as proxies for the factors. However, standard errors-in-variables arguments may be used to construct a method of moments estimator of β for given values of λj, j = 1, .., k. Consistent estimation of the eigenvalues then permits consistent estimation of β.
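The method of moments correction just described can be illustrated with a simulation. The sketch below is illustrative Python: the loading matrix, sample sizes and noise levels are arbitrary choices, selected so that B′B is diagonal and the rotation M reduces to sign changes. The OLS coefficients on the components are rescaled by √(λj/(λj − 1)):

```python
import numpy as np

rng = np.random.default_rng(2)
T, N, k = 50000, 20, 2
beta = np.array([1.0, -0.5])

# Loadings chosen so that B'B = diag(2.5, 1.6); eigenvalues of Omega: 3.5, 2.6
B = np.zeros((N, k))
B[:10, 0], B[10:, 1] = 0.5, 0.4

F = rng.standard_normal((T, k))                 # E(f f') = I_k
X = F @ B.T + rng.standard_normal((T, N))       # E(eps eps') = I_N
y = F @ beta + 0.3 * rng.standard_normal(T)

# First k unit-variance sample principal components
lam, Q = np.linalg.eigh(X.T @ X / T)
lam, Q = lam[::-1][:k], Q[:, ::-1][:, :k]
S = X @ Q / np.sqrt(lam)

# OLS on the components is attenuated by sqrt((lam - 1)/lam);
# rescaling gives the method of moments estimate (up to sign changes here)
gamma = np.linalg.lstsq(S, y, rcond=None)[0]
beta_mm = gamma * np.sqrt(lam / (lam - 1.0))

print(np.abs(beta_mm))   # close to |beta| = (1.0, 0.5)
```

The uncorrected coefficients gamma are visibly attenuated; after rescaling, the estimate matches β up to the orthogonal (here sign) indeterminacy.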

Now consider a slightly different example. For equations (1.5) and (1.6)


assume once again that E(ftf′t) = Ik. This time, however, the error variance is assumed to be E(εtε′t) = σ^2 IN where σ^2 is a scalar. The first k principal components of xt are now

sjt = √((λj − σ^2)/λj) fjt + λj^{−1/2}q′jεt,   j = 1, .., k

As above, it is clear that the correlation between xt and yt is accounted for by the first k principal components of xt, but that an OLS regression of the principal components of xt on yt will have an errors-in-variables bias. Note however that var(√((λj − σ^2)/λj) fjt) = (λj − σ^2)/λj and var(λj^{−1/2}q′jεt) = σ^2/λj. Consequently, if σ^2/λj is sufficiently small, the errors-in-variables bias will be negligible. In such cases, OLS regression of the first k principal components of xt on yt is a reasonable approach to take.
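A quick numerical check of the attenuation factor √((λj − σ^2)/λj) shows how the bias becomes negligible as σ^2/λj shrinks (the eigenvalues below are arbitrary illustrative values):

```python
import numpy as np

# Attenuation factor sqrt((lam - sigma2)/lam) of the OLS coefficient when
# the jth principal component is used as a proxy for the jth factor
sigma2 = 1.0
for lam in [2.0, 10.0, 100.0]:
    print(lam, np.sqrt((lam - sigma2) / lam))
```

For λj = 100 the factor is about 0.995, so the errors-in-variables bias of the OLS coefficient is under one per cent.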

1.2.6 Independent Component Analysis

An extension of principal component analysis, which resolves the identification issue, is to assume that the components are statistically independent, rather than merely uncorrelated. The independent component analysis (ICA) model is

xt = Ast

where xt is an N × 1 observable vector, st is an m × 1 vector of unobservable unit-variance independent signals, and A is an N × m non-random mixing matrix. Comon (1994) shows that st is identified up to a sign matrix24 if, with the

24 that is, each series of elements of st is identified up to a sign change.


possible exception of one component, all the components are non-Gaussian, N > m, and A is of full column rank. Cardoso (1998) and Hyvärinen et al. (2001) provide good overviews of ICA and the many procedures that have been proposed for estimation of the ICA model. Recently, Chen and Bickel (2006) have proposed an asymptotically efficient semi-parametric estimator.

Applications of the ICA model have included the processing of magnetoencephalographic and electroencephalographic data,25,26 functional magnetic resonance imaging,27 and telecommunications.28 However, little work has been done in economics using ICA. This is most likely due to the absence of additive noise in the ICA model, which renders it unsuitable for many economic applications.

25 Vigário et al. (1998).
26 Flexer et al. (2005).
27 McKeown et al. (1998).
28 Ristaniemi and Joutsensalo (1999).

1.2.7 Canonical Correlation Analysis and Reduced Rank Regression

Canonical correlation analysis was introduced by Hotelling (1936). Consider two random vectors xt and yt of dimensions N × 1 and m × 1 where m ≤ N. Define the weighted sums ut = β′xt and vt = α′yt, where α and β are vectors of conformable dimension. The first canonical vectors are derived by finding the values of α and β which maximise the correlation between ut and vt subject to the arbitrary normalisations α′α = 1 and β′β = 1. The second canonical vectors are derived by similarly choosing vectors to maximise the correlation, subject to the additional restriction that the second canonical


vectors are orthogonal to the first. A complete set of m canonical vectors may be defined this way. Some calculus shows that the solution to this maximisation problem is computed from the singular value decomposition of Σyy^{−1/2}Σ′xyΣxx^{−1/2}, where Σxx = E(xtx′t), Σyy = E(yty′t) and Σxy = E(xty′t).
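The computation just described may be sketched as follows (illustrative Python; the data generating process is an arbitrary choice, and sample moments stand in for the population matrices). The canonical correlations are the singular values of Σyy^{−1/2}Σ′xyΣxx^{−1/2}:

```python
import numpy as np

rng = np.random.default_rng(3)
T, N, m = 20000, 4, 3
X = rng.standard_normal((T, N))
Y = rng.standard_normal((T, m)) + 0.5 * X[:, :m]   # build in cross-correlation

Sxx, Syy, Sxy = X.T @ X / T, Y.T @ Y / T, X.T @ Y / T

def inv_sqrt(A):
    """Symmetric inverse square root via the eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return V @ np.diag(w ** -0.5) @ V.T

# Canonical correlations: singular values of Syy^{-1/2} Sxy' Sxx^{-1/2}
K = inv_sqrt(Syy) @ Sxy.T @ inv_sqrt(Sxx)
U, sv, Vt = np.linalg.svd(K)
print(sv)                                  # descending, all in [0, 1]

# First canonical variates u_t = beta'x_t, v_t = alpha'y_t
alpha = inv_sqrt(Syy) @ U[:, 0]
beta = inv_sqrt(Sxx) @ Vt[0]
u, v = X @ beta, Y @ alpha
print(np.corrcoef(u, v)[0, 1])             # matches sv[0] up to mean-centring
```

The sample correlation of the first pair of canonical variates reproduces the largest singular value, as the theory requires.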

Consider the regression model

yt = ABxt + εt

where A is an m × r matrix of coefficients and B is an r × N matrix of coefficients. If r < m then the regression coefficient AB is of rank r < m and the model is referred to as a reduced rank regression. It may be shown (see Reinsel and Velu (1998)) that the Gaussian maximum likelihood estimators of A and B are equivalent to the weighting matrices from a canonical correlation analysis of yt and xt.

1.3 Time series factor analysis

The classical factor model was explicitly derived in an IID framework and most of the problems to which it has been applied have been defined in this setting. However, it should be remembered that the asymptotic Gaussianity proof for the maximum likelihood estimator provided by Anderson and Rubin (1956) requires only identification of the parameters and that √T(Sxx − Ω) is asymptotically Gaussian; which does not necessarily rule out serial correlation of ft or εt. Similar conditions are used to prove consistency and/or asymptotic Gaussianity for the other factor model estimators discussed in Section 1.1, and


for the principal components estimator discussed in Section 1.2. Furthermore, the factor model

xt = ∑_{j=0}^{q} Bjft−j + εt

may be written as

xt = B∗f∗t + εt

where B∗ = (B′0 · · · B′q)′ and f∗t = (f′t f′t−1 · · · f′t−q)′. Therefore, a model with q lags of k factors may be rewritten as a model with (q + 1)k factors and no lags.

Consequently, the traditional techniques of factor analysis and principal component analysis are also applicable in some time series settings, in particular under the commonly assumed conditions of covariance stationarity and weak dependence. Nonetheless, it is often of interest to explicitly specify and estimate the dynamic structure of a factor-driven time series process. In this thesis, the adjective `dynamic' will be used to indicate a factor model which has an explicitly written lag structure. The adjective `static' will indicate a model in which only contemporaneously-dated variables are explicitly included, some or all of which may have interesting time series structure. Importantly, the term `static' is not intended to indicate that a variable is IID.
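The stacking argument used above, which rewrites a model with lagged factors as a static model in the stacked factor vector, can be verified numerically (illustrative Python; the dimensions and coefficient matrices are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
T, N, k, q = 300, 6, 2, 2
B = [rng.standard_normal((N, k)) for _ in range(q + 1)]   # B_0, ..., B_q
f = rng.standard_normal((T, k))
eps = rng.standard_normal((T, N))

# Dynamic form: x_t = sum_j B_j f_{t-j} + eps_t   (for t >= q)
x_dyn = np.array([sum(B[j] @ f[t - j] for j in range(q + 1)) + eps[t]
                  for t in range(q, T)])

# Static form: x_t = B* f*_t + eps_t with stacked loadings and factors
B_star = np.hstack(B)                                     # N x (q+1)k
f_star = np.array([np.concatenate([f[t - j] for j in range(q + 1)])
                   for t in range(q, T)])
x_stat = f_star @ B_star.T + eps[q:]

print(np.allclose(x_dyn, x_stat))   # True: the two forms are identical
```

The stacked representation contains exactly the same information; only the bookkeeping of the factor vector changes.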

Two distinct approaches have been taken to the specification of dynamic factor models. The earliest models were based on a two-sided filter, such that the observable variables are related to past and future values of the factors. The alternative approach which was subsequently taken is to specify the observable variables as functions of the current and past values of the factors only.


1.3.1 Models Based on a 2-sided Filter

Geweke (1977), Sargent and Sims (1977) and Geweke and Singleton (1981) considered the dynamic factor model

xt = ∑_{j=−∞}^{∞} Bjft−j + εt

for t = 1, .., T. The factor and the error terms are assumed to be zero-mean, mutually independent and covariance stationary. Taking the Fourier transform of the autocovariance function yields the spectral density matrix of xt

Sx(ω) = B(ω)Sf(ω)B(ω)^H + Sε(ω)

where |ω| ≤ π, Sf(ω) is the spectral density matrix of ft, Sε(ω) is the spectral density matrix of εt, B(ω) is the Fourier transform of the coefficient sequence Bj, and B(ω)^H is the complex conjugate transpose of B(ω). Geweke (1977) assumed that the factor is scalar, unit variance and serially uncorrelated so that Sf(ω) = 1. He proposes dividing the periodogram ordinates into frequency bands and fitting a model to each band using maximum likelihood methods assuming complex Gaussianity. The algorithm that he uses is similar to that employed by Jöreskog (1967) for the static model. Sargent and Sims (1977) propose the same approach allowing for multiple factors. Geweke and Singleton (1981) provide an identification theorem for the multiple factor model based on zero-restrictions similar to that discussed for the static model in Subsection 1.1.2, and present an identification theorem which allows for correlated factors. They also discuss maximum likelihood estimation in the correlated factor case.


Applications of this approach include business cycle modelling,29 a model of interest rates,30 and an analysis of sectoral unemployment.31 A disadvantage of this approach is the fact that the frequency bands must be specified and that the spectrum must be assumed to be flat within each frequency band. The more bands that are used, the more likely this assumption is to be approximately true, but the fewer the available periodogram ordinates for model estimation. A further disadvantage is that the observable variables are determined by a two-sided filter of the factor vector. Consequently, Geweke's model is not well-suited to forecasting.

1.3.2 Models Based on a 1-sided Filter

The alternative to specifying a model based on a two-sided filter is to allow the observable variables to be related to the current and past values of the factors only. Engle and Watson (1981) propose the following autoregressive one-factor model

xt = Bft + Γzt + εt
ft = αft−1 + δzt + ηt

where xt is the N × 1 observable vector, ft is the scalar factor and zt is a vector of exogenous or lagged dependent variables. They propose that the model be treated as a state space model for the purposes of estimation. For the model written above, the factor is treated as the state variable and the equations are

29 Geweke and Singleton (1981) and Sargent and Sims (1977).
30 Singleton (1980).
31 Heaton and Oslington (2002).


the measurement and state equations respectively. While the above specification is for a scalar AR(1) factor with only a contemporaneous relationship between the factors and the observable variables, models with a one-sided filter of multiple higher order autoregressive factors can be considered by appropriately stacking the lags of the multiple factors to create the state vector, and restricting the system matrices appropriately. If appropriate, autoregressive errors may be included in the state vector.32

Relatively little attention has been paid to identification of the one-sided dynamic factor model. Engle et al. (1985) briefly consider the possibility that their 2-factor model is not identified, but point out that lack of identifiability does not cause problems with the EM algorithm that they employ, and do not consider the matter further. Camba-Mendez et al. (2001) prove identification for a k-factor model of the form xt = Bft + εt, under the following conditions

1. ft = C(L)^{−1}ηt where C(L) is a diagonal k × k finite-order lag operator;

2. ηt and εt are mutually and serially uncorrelated and conditionally homoscedastic;

3. the covariance matrix of εt is diagonal and the covariance matrix of ηt is the identity;

4. the elements of B are such that Bii = 1 for i = 1, .., k.

It should be noted that these conditions are also sufficient for identifiability for the static model and are similar to those employed by Geweke and Singleton (1981) for the two-sided dynamic factor model.

32 see, for example, Watson and Engle (1983), Watson and Kraft (1984), and Stock and Watson (1990).


Two approaches exist for the estimation of dynamic factor models written

in a state space form.

Likelihood Approaches

Engle and Watson (1981) propose that the likelihood of the state space form be computed using the Kalman filter and the model estimated using a scoring algorithm, with numerical differentiation employed to more rapidly compute the gradient and information matrix. While this is a neat approach, in practice the computational load can be quite high, particularly for models with large numbers of variables and for models with an extensive lag structure. A good set of starting values can be a valuable asset.
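A minimal sketch of this computation, the prediction error decomposition of the Gaussian likelihood evaluated with the Kalman filter for a one-factor model of the form (1.7)–(1.8), is given below. This is illustrative Python: the function, parameter values and simulated data are assumptions for the example, and in practice such a function would be handed to a scoring or quasi-Newton optimiser.

```python
import numpy as np

def kalman_loglik(X, B, A, Sig_eps, Sig_eta):
    """Gaussian log-likelihood of x_t = B f_t + eps_t, f_t = A f_{t-1} + eta_t,
    computed by the prediction error decomposition."""
    T, N = X.shape
    k = A.shape[0]
    f = np.zeros(k)                              # predicted state f_{t|t-1}
    # stationary prior variance: vec(P) = (I - A kron A)^{-1} vec(Sig_eta)
    P = np.linalg.solve(np.eye(k * k) - np.kron(A, A),
                        Sig_eta.ravel()).reshape(k, k)
    ll = 0.0
    for t in range(T):
        e = X[t] - B @ f                         # innovation
        F = B @ P @ B.T + Sig_eps                # innovation variance
        Fi = np.linalg.inv(F)
        ll -= 0.5 * (N * np.log(2 * np.pi)
                     + np.linalg.slogdet(F)[1] + e @ Fi @ e)
        K = P @ B.T @ Fi                         # Kalman gain
        f = A @ (f + K @ e)                      # update, then one-step predict
        P = A @ (P - K @ B @ P) @ A.T + Sig_eta
    return ll

# Simulate a small one-factor model and evaluate the likelihood
rng = np.random.default_rng(5)
T = 240
A = np.array([[0.8]])
B = np.array([[1.0], [0.7], [-0.5], [0.3]])
f = np.zeros((T, 1))
for t in range(1, T):
    f[t] = A @ f[t - 1] + rng.standard_normal(1)
X = f @ B.T + rng.standard_normal((T, 4))
print(kalman_loglik(X, B, A, np.eye(4), np.eye(1)))
```

Each likelihood evaluation costs one pass of the filter, which is why the scoring algorithm's repeated gradient and information matrix computations become expensive for large models.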

Shumway and Stoffer (1982) and Watson and Engle (1983) independently proposed the EM algorithm to estimate the above model. Each iteration of the EM algorithm requires a run of the Kalman filter plus a smoothing algorithm. However, the rest of the iteration is a least squares computation, so it is not necessary for the information matrix to be computed and inverted for each iteration. Practical experience suggests that the EM algorithm is far more robust to poor starting values than the scoring algorithm. However, convergence of the EM algorithm can be quite slow near the solution. A sensible strategy, which is often employed in applied work, is to estimate the model using the EM algorithm with an extremely coarse convergence criterion and to then use the solutions as starting values for a scoring algorithm. Using this strategy, models with over one hundred parameters can sometimes be estimated without too much difficulty, provided that the dynamic structure of the model is


kept simple.33 However, the EM approach is not well-suited to the estimation of models with autoregressive errors, since these are generally included in the state vector, resulting in a `noiseless' measurement equation. Furthermore, for models with multiple lags of multiple factors, and with autoregressive errors, the computational load of the scoring algorithm can be heavy, even for models of only a few variables. This can be a strong disincentive against the use of dynamic factor models, particularly when vector autoregressions can be estimated so easily. Applications of the likelihood approach include the construction of coincident and leading indicators34 and analyses of wages,35 productivity36 and aggregate demand.37 Giannone et al. (2006) advocate the use of dynamic factor models for business cycle analysis.

Subspace Algorithms

Since likelihood methods may involve a heavy computational burden, alternative methods of estimation should be considered. One approach which is particularly attractive is to represent the model in forward innovations state-space form and to employ a subspace algorithm. Since subspace algorithms are not well-known in economics, they will be briefly described. Much more detail is available from Kapetanios and Marcellino (2004), Bauer (1998), or from the extensive literature in engineering in which subspace algorithms were developed.

33 see Lebow (1993) for an example of a model with 145 variables estimated using the EM algorithm.
34 Stock and Watson (1990).
35 Engle and Watson (1981) and Watson and Engle (1983).
36 Lebow (1993).
37 Watson and Kraft (1984).


Consider the dynamic factor model

xt = Bft + εt (1.7)
ft = Aft−1 + ηt (1.8)

where xt is the N × 1 observable vector and ft is the k × 1 factor. Viewing this as a state space model, it may be rewritten in forward innovations form38 as

xt = Bft + Cut
ft = Aft−1 + Dut−1

Define the infinite-dimensional vectors

x^f_t = (x′t x′t+1 x′t+2 · · ·)′,   x^p_t = (x′t−1 x′t−2 x′t−3 · · ·)′

Note that x^f_t contains future values of xt, and x^p_t contains past values. Some elementary matrix algebra shows that it is possible to write

x^f_t = B1ft + ηt
ft = A1x^p_t

38 see Hannan and Deistler (1986).


where ηt = ((D1ut)′ (D1ut+1)′ (D1ut+2)′ · · ·)′, and A1, B1 and D1 are functions of the system matrices A, B, C and D. Substituting the second equation into the first yields a regression equation linking the past and future values of xt

x^f_t = Γx^p_t + ηt

where Γ = B1A1.

In practice, the infinite dimensional vectors x^f_t and x^p_t cannot be constructed and so they are replaced with their truncated analogues x^f_{s,t} = (x′t x′t+1 · · · x′t+s)′ and x^p_{q,t} = (x′t−1 x′t−2 · · · x′t−q)′, where s and q are chosen values. Ordinary least squares is then used to estimate Γ. An estimate of A1 is then computed from a singular value decomposition of Γ̂.39 The factors are then estimated as f̂t = Â1x^p_{q,t} and the parameter matrices of the original factor model are estimated by ordinary least squares with f̂t used in place of ft. Consistency40 requires Ns > k, and q to grow more slowly than T^{1/3} but faster than (ln T)^δ, where δ is a parameter that depends on the largest eigenvalue of A. However, the rate of convergence may be slow in practice. In fact Deistler et al. (1995) only prove the existence of a sequence of non-singular uniformly bounded matrices MT such that ∥B − MT B̂MT^{−1}∥ −→ 0, ∥C − MT Ĉ∥ −→ 0 and ∥A − MT Â∥ −→ 0 almost surely. They also find that ((log log T / T)^{1/2}(log T)^α)^{−1}∥Γ̂ − Γ∥ −→ 0 almost surely. Given the sample sizes typically available in economics, rates of convergence

39 Often a weighted value of Γ̂ is used. See Larimore (1983).
40 Deistler et al. (1995).


such as this are of some concern.
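A bare-bones version of the subspace procedure described above may be sketched as follows. This is illustrative Python under simplifying assumptions: a scalar factor, arbitrary truncation lags, and a plain (unweighted) SVD of Γ̂ rather than the weighted versions used in practice.

```python
import numpy as np

rng = np.random.default_rng(6)
T, N, k = 5000, 4, 1
s, q = 2, 4                                 # future / past truncation lags

# Simulate x_t = B f_t + eps_t with a scalar AR(1) factor
A_true, B_true = 0.9, np.array([1.0, 0.8, 0.6, 0.4])
f = np.zeros(T)
for t in range(1, T):
    f[t] = A_true * f[t - 1] + rng.standard_normal()
X = np.outer(f, B_true) + 0.3 * rng.standard_normal((T, N))

# Stack truncated futures x^f_{s,t} and pasts x^p_{q,t}
rows = range(q, T - s)
Xf = np.array([X[t:t + s + 1].ravel() for t in rows])     # (x_t', ..., x_{t+s}')
Xp = np.array([X[t - q:t][::-1].ravel() for t in rows])   # (x_{t-1}', ..., x_{t-q}')

# OLS of futures on pasts, then a rank-k SVD of the coefficient matrix
Gamma = np.linalg.lstsq(Xp, Xf, rcond=None)[0].T
U, sv, Vt = np.linalg.svd(Gamma)
A1 = Vt[:k]                                 # estimate of A1, up to scale
f_hat = Xp @ A1.T                           # factor estimates

# With f_hat in place of f, B and A are re-estimated by least squares
B_hat = np.linalg.lstsq(f_hat, X[q:T - s], rcond=None)[0].T
A_hat = np.linalg.lstsq(f_hat[:-1], f_hat[1:], rcond=None)[0]
print(A_hat[0, 0])                          # close to A_true = 0.9
```

Note that f̂t is identified only up to scale and sign, so only scale-invariant quantities such as the autoregressive coefficient are directly comparable to the true values.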

It should also be noted that the dynamic factor model given by equations

(1.7) and (1.8) has quite simple dynamics, which might be considered too

simple in many applications. Lagged factors may be incorporated into equation

(1.7) quite easily by stacking the factor vector. However, it is less clear how the

accompanying restrictions on the system matrices can then be incorporated

in the estimation procedure. Similarly, it is not immediately obvious how

autoregressive dynamics for the error vector can be estimated using a subspace

approach. Consequently, while subspace methods show considerable promise

as an estimation approach for dynamic factor models, it is not yet clear how

models with rich dynamics may be handled.

Alternative Specifications for the Factor Processes

Subsequent research has proposed estimation algorithms for dynamic factor models with different factor specifications. In particular, Kim (1994) and Kim and Yoo (1995) propose an approximate maximum likelihood procedure for the estimation of a factor model with a factor which follows the Markov-switching process of Hamilton (1989). Chauvet (1998) uses a modification of their method. Kim and Nelson (1998) use Gibbs sampling to estimate a factor model with regime-switching factors. Diebold and Nerlove (1989) propose a Factor ARCH model in which the observable vector is related to a factor which follows an ARCH process, and propose an approximate likelihood estimation procedure based on the Kalman filter. Dungey et al. (2000) estimate a model in which the factor is autoregressive with GARCH disturbances. They employ the indirect estimation procedure of Gourieroux et al. (1993).


1.3.3 Dynamic Errors in Variables

The linear dynamic errors in variables model is usually written as

K(L)ut = 0
xt = ut + εt

where ut and εt are zero-mean, stationary N × 1 vector processes which are often assumed to be mutually uncorrelated, and K(L) is an (N − k) × N full-rank polynomial matrix. xt is the only variable which is observable. The model may be partitioned in the following way

(K1(L)  −K2(L)) (u′1t  u′2t)′ = 0

(x′1t  x′2t)′ = (u′1t  u′2t)′ + (ε′1t  ε′2t)′ (1.9)

where x1t, u1t and ε1t are k × 1 vectors, x2t, u2t and ε2t are (N − k) × 1 vectors, K1(L) is (N − k) × k and K2(L) is (N − k) × (N − k) and of full rank. We may then write

x2t = K2(L)^{−1}K1(L)u1t + ε2t
x1t = u1t + ε1t

and it becomes clear that the model is a dynamic generalisation of the static errors-in-variables model reviewed in Subsection 1.1.4.

Identification of the parameters of K(L) is a non-trivial issue which has


been the subject of some interest in the literature. Results have been established for many special cases. Deistler and Anderson (1989) consider the single-input-single-output case, the three-variable case, and cases where the number of inputs is equal to the number of outputs, and prove several results. They also consider using higher-order cumulant spectra for identification. Nowak (1992) discusses several subclasses of dynamic errors-in-variables model which are identifiable from their second order moments. Nowak (1993) assumes that K(L) is a rational transfer function and uses a partial fraction expansion representation to prove local identifiability. Bloch (1989) considers the case where K(L) is a two-sided filter and shows that the model may be written as a dynamic factor model. This representation may be seen by stacking the variables from the partitioning in Equation (1.9) to yield

xt = (I′k  (K2(L)^{−1}K1(L))′)′ u1t + εt

He uses this representation to investigate identifiability and proposes estimation using a maximum likelihood approach in the frequency domain similar to the method for non-causal factor models proposed by Geweke (1977).

1.4 Time Series Principal Component Analysis

Principal components techniques may also be adapted to suit a dynamic framework. Like dynamic factor analysis, dynamic principal component analysis has been approached in two ways: through the use of a two-sided filter, and through the use of a one-sided filter.


1.4.1 Models Based on a 2-sided Filter

Brillinger (1975) proposes the estimation of principal components in the frequency domain, based on a two-sided filter. He defines dynamic principal

components as in the static case, but with the spectral density matrix used

in place of the covariance matrix. In the time domain, the relationship be-

tween the dynamic principal components and the observable variables may be

written as

xt = ∑_{j=−∞}^{∞} Aj st−j

where xt is a N×1 vector, Aj is a N×N matrix of coefficients, and st is a N×1 vector

of serially correlated variables that are mutually uncorrelated at all leads and

lags. Sample estimates of the dynamic principal components are constructed

by using a smoothing technique to consistently estimate the spectral density

matrix at a set of frequencies, and then computing the principal components

of the estimated spectral density matrix at each frequency.
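The procedure can be sketched in a few lines. The following is purely illustrative (the function and parameter names, such as `dynamic_pca` and `bandwidth`, are our own, and a simple flat smoothing window is assumed rather than any particular kernel):

```python
import numpy as np

def dynamic_pca(x, n_freqs=8, bandwidth=5):
    """Brillinger-style dynamic PCA: eigendecompose a smoothed
    periodogram estimate of the spectral density at a grid of
    frequencies. x: T x N array of demeaned observations."""
    T, N = x.shape
    d = np.fft.fft(x, axis=0)                       # DFT of each series, T x N
    # periodogram matrices I(w_j) = d_j d_j* / (2 pi T), one N x N matrix per j
    I = np.einsum('ji,jk->jik', d, d.conj()) / (2 * np.pi * T)
    results = {}
    for j in np.linspace(1, T // 2, n_freqs, dtype=int):
        lo, hi = max(j - bandwidth, 0), min(j + bandwidth + 1, T)
        S_hat = I[lo:hi].mean(axis=0)               # smoothed spectral density (Hermitian)
        eigval, eigvec = np.linalg.eigh(S_hat)      # PCA of S_hat at this frequency
        results[2 * np.pi * j / T] = (eigval[::-1], eigvec[:, ::-1])
    return results
```

The eigenvectors at each frequency define the (frequency-domain) principal component filters; an inverse transform of these weights would yield the two-sided time-domain filter Aj.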

Brillinger (1975) also proposes that canonical correlation analysis of pairs

of time series vectors be carried out in the frequency domain, based on a

two-sided filter.

1.4.2 Models Based on a 1-sided Filter

Kariya (1993) proposes a multivariate time series variance component (MTV)

model

xt = Ast


where xt is a N × 1 vector, A is a N × N matrix of coefficients for which

A′A = IN , and st is a N × 1 vector of serially correlated variables that are

mutually uncorrelated at all leads and lags. It should be noted that, unlike

the dynamic principal component representation of Brillinger (1975), the MTV

model is not general. Taniguchi et al. (2006) propose a test statistic for the

hypothesis that an observable time series was generated by a MTV model.

Hosseini et al. (2003) construct a generalisation of the ICA model to a

setting in which the elements of the source vector st are mutually independent

but serially correlated. They propose a maximum likelihood procedure for the

estimation of the unmixing matrix A−1.

Box and Tiao (1977) consider a canonical analysis for a vector autoregres-

sion

yt = Φ(L)yt−1 + εt

where yt is a N×1 vector. Consider a linear combination of the elements of yt,

wt = r′yt, where r is an arbitrary N × 1 vector for which r′r = 1. The variance

of wt is given by

σ2w = r′ΣΦy r + r′Σε r

where ΣΦy is the covariance matrix of Φ(L)yt−1 and Σε is the covariance matrix of εt. The predictability of the series may then be measured by the signal-to-noise ratio

τ = (r′ΣΦy r) / (r′Σε r)

Some calculus shows that the most predictable linear combination of yt is constructed by setting r equal to the eigenvector corresponding to the largest eigenvalue of Σε−1ΣΦy. Box and Tiao (1977) consider the case where Φ(L) has roots close to the unit circle. They show that in this case, the canonical variables may be divided into two groups: some which follow stationary autoregressions, and some which are approaching non-stationarity. They suggest that the second of these groups could serve as useful composite indicators of the overall dynamic growth of yt.
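The eigen-calculation can be illustrated with a short sketch, assuming a VAR(1) fitted by least squares (the function and variable names here are our own, not Box and Tiao's):

```python
import numpy as np

def box_tiao_weights(y):
    """Most-predictable linear combinations of a VAR(1) process, in the
    spirit of Box and Tiao (1977): eigen-decompose inv(Sigma_eps) @ Sigma_pred.
    y: T x N array. Returns predictability ratios tau (descending) and weights."""
    Y, X = y[1:], y[:-1]
    Phi, *_ = np.linalg.lstsq(X, Y, rcond=None)   # VAR(1) coefficient matrix
    pred = X @ Phi                                # predictable part Phi(L) y_{t-1}
    eps = Y - pred                                # residuals
    Sigma_pred = np.cov(pred, rowvar=False)
    Sigma_eps = np.cov(eps, rowvar=False)
    tau, R = np.linalg.eig(np.linalg.solve(Sigma_eps, Sigma_pred))
    order = np.argsort(tau.real)[::-1]
    return tau.real[order], R.real[:, order]
```

A large tau indicates a nearly non-stationary, highly predictable direction; a tau near zero indicates a combination close to white noise.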

Consider the dynamic reduced rank regression model

yt = A(L)B(L)xt + εt (1.10)

In the case where A(L) and B(L) are two-sided polynomials, the analysis of the

model corresponds to the dynamic canonical correlation analysis of Brillinger

(1975). Velu et al. (1986) consider the case where A(L) and B(L) are one-sided polynomials. When xt = yt−1, Equation (1.10) is a reduced rank vector

autoregression. Velu et al. (1986) consider two special cases of this model. If

deg (A(L)) = 0 then the model corresponds to the canonical analysis of Box

and Tiao (1977). If instead deg (B(L)) = 0, then the model becomes the index

model of Sims (1981). Velu et al. (1986) discuss estimation and develop some

asymptotic theory for these models. By far the most common use of reduced

rank regression and canonical correlation analysis in economics is the analysis

of cointegrated systems (see Johansen (1988) and Reinsel and Ahn (1992)).


1.5 Factor Analysis and Principal Component Analysis of High-Dimensional Vectors

In addition to dealing with low dimensional vectors which might alternatively

be analysed using more traditional techniques, the factor analysis techniques

outlined in the previous sections may be applied to larger vectors. However, in

cases in which the number of variables is of a similar magnitude to, or possibly

even larger than the number of observations, three problems are encountered.

Firstly, the assumption that the error terms are not correlated with each other

in a factor model will generally become harder to believe as the number of

variables increases. Secondly, the computational work required to estimate

the factor model often becomes prohibitive. Thirdly, and most importantly,

the asymptotic arguments that are used to justify the factor estimators outlined in the previous sections assume that the number of variables is fixed and the number of observations goes to infinity. In a setting in which the number of

variables is of a similar order of magnitude to the number of observations, such

an approach may provide a poor approximation to the actual behaviour of the

estimators. Quah and Sargent (1992) propose that the EM algorithm might

be used in a large-N context. They argue (p.10) that ..increasing the cross-

section dimension N can only help to estimate more precisely (the conditional

expectations of) the (unobserved part of the) state and its cross moments.

However, their argument is based on intuition and is not entirely convincing.

Given the findings in the random matrix theory literature, where sample eigenvalues do not consistently estimate population eigenvalues in a setting in which

(N, T ) → (∞,∞) jointly, insistence on proof, rather than informal arguments,


would seem wise.

In recent years, there has been a great deal of interest in estimating factor

models of high-dimensional time series using principal component methods.

The computational attraction of this approach is clear, since eigenvalues and eigenvectors can be computed easily and quickly even for very large matrices.

Furthermore, there exists a growing body of formal theory which shows that

sample principal component quantities can consistently estimate their anal-

ogous population factor quantities, under certain conditions, in a setting in

which (N, T ) → (∞,∞) jointly. Recently, there has been rapid growth in the

number of applications of these techniques, although many of these papers are

yet to be published. In this section, the literature on principal component estimation of infinite dimensional factor models is reviewed.

1.5.1 Population Results

The earliest work on infinite dimensional factor analysis considered the theoretical problem of using population principal components as estimators of

population factors. While this work does not produce an estimator which can

be implemented in practice, it provides a lot of insight into the relationship

between principal components and factors and establishes some of the ground-

work necessary for a consideration of the more practically relevant issue of

using sample principal components to estimate population factors. Chamber-

lain and Rothschild (1983) considered a generalisation of the arbitrage pricing

model of Ross (1976). Whereas Ross had assumed that asset returns followed a

strict factor model[41] in which the errors are mutually uncorrelated, Chamberlain and Rothschild (1983) considered an `approximate' factor model in which

[41] The term `strict' factor model was introduced by Chamberlain and Rothschild (1983).

the errors are allowed to be weakly correlated in the sense that the largest

eigenvalue of the error covariance matrix is bounded. They show that, in a

framework in which N −→ ∞, principal components and factor loadings are

equivalent. While they don't explicitly consider the problem of estimation,

Chamberlain and Rothschild (1983) suggest that an implication of their results is that financial analysis might be undertaken by a consideration of the principal components of the covariance matrices of returns, rather than requiring the consideration of a factor model. Bentler and Kano (1990) consider a

static single factor model

xt = Bft + εt

They assume that Ψii ≤ σ2 < ∞, where Ψii is the ith diagonal element of Ψ = E(εtε′t), and that B′B −→ ∞ as N −→ ∞. They prove that under these conditions the correlation coefficient between the factor and the first population principal component of xt converges to 1, and the principal component loading vector converges to the factor loading vector.

Schneeweiss and Mathes (1995) consider a k-factor static model and analyse the sum of the canonical correlation coefficients between the population factors and the population principal components. They show that this sum approaches k as σ2/dk −→ 0, where σ2 is the largest eigenvalue of Ψ and dk is the smallest eigenvalue of B′B. They also prove similar results for the factor loadings and the principal component loadings. Under similar conditions, Schneeweiss (1997) proves that ‖BD−1/2 − QfL‖F −→ 0 and ‖ft − Lsft‖ −→ 0, where D is a diagonal k×k matrix containing the ordered eigenvalues of B′B,


Qf is the N×k matrix containing the eigenvectors of Ω = E(xtx′t) corresponding to the first k eigenvalues, sft is a vector containing the first k principal components of xt, L is a k × k sign matrix[42] and ‖.‖F denotes the Frobenius norm. Like Chamberlain and Rothschild (1983), Schneeweiss and Mathes (1995), and Schneeweiss (1997) consider population quantities only. However, their work provides considerable insight into the conditions under which principal component quantities will be similar to corresponding factor quantities.

In the 75 years that principal component analysis and factor analysis have

coexisted, their papers are the first to provide a detailed account of the relationships between the two concepts. An understanding of this relationship

provides the foundation upon which theories relating sample principal compo-

nents to population factors may be constructed.

1.5.2 Models Based on a 2-sided Filter

Forni et al. (2000) considered the dynamic factor model

xt = B(L)ft + εt

where ft is white noise, εt is zero mean, stationary and orthogonal to ft at

all leads and lags. Forni et al. (2000) state (p.541) that B(L) is a one-sided

square-summable lag polynomial; however, their proposed estimator (p.546) is based on a two-sided filtering. Transforming the model to the frequency

domain, they assume that the diagonal elements of the spectral density matrix

of xt are bounded, that the eigenvalues of the spectral density of the common

[42] i.e. the diagonal elements of L are all ±1.


component B(L)ft diverge as N −→ ∞, and the eigenvalues of the spectral

density of εt are uniformly bounded. As such, the model may be considered to

be a dynamic generalisation of the approximate factor model of Chamberlain

and Rothschild (1983). Using dynamic principal component techniques to

estimate the factors and factor loadings, they prove that the sample estimate

of the common component of the model converges in probability to the true

common component as (N, T) −→ (∞,∞). Forni et al. (2004) show that the rate of convergence is min(√N, √T). For the dynamic principal component estimator of the factors, they find a rate of convergence of √(T/N). The best rate that can be achieved is √N, but this requires that T grow at a rate at least as fast as N2. They have no result for cases in which T grows slower than N. This is not a particularly encouraging result for analysts who wish to use dynamic principal component techniques to estimate factors, rather than just the common component of the model, and there is a role for future research to investigate whether faster rates of convergence may be established for the factor estimator.

1.5.3 Models Based on a 1-sided Filter

In parallel to Forni, Hallin, Lippi and Reichlin's work on the dynamic factor

model, interesting results have been derived in a dual limit setting for static

principal component estimates of static factor quantities. Stock and Watson

(2002a) considered a static factor model for a N × 1 vector of time series xt,

and considered the problem of forecasting a scalar time series variable yt, h time periods ahead, i.e.

xt = Bft + εt

yt+h = β′ft + γ′zt + ηt

where zt is a vector of predetermined variables (which may include lags of

the dependent variable) and β and γ are vectors of regression coefficients.

Stock and Watson (2002a) prove that, under certain conditions as (T,N) −→

(∞,∞), the sample principal component vector at each period in time con-

verges in probability to the population factor, up to a sign matrix, and that the OLS estimator of the regression coefficients in the forecasting equation, computed by substituting the sample principal components for the unobservable population factors, converges in probability to the population parameters, up

to a sign matrix. They also show that the sample forecasts computed using

these sample quantities converge in probability to the corresponding infeasible

forecasts computed using the unknown population quantities.
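A minimal sketch of this diffusion-index approach follows (illustrative only; the names `pc_factors` and `diffusion_index_forecast` are ours, and the factors are normalised so that F′F/T = Ik):

```python
import numpy as np

def pc_factors(X, k):
    """Estimate k static factors by principal components.
    X: T x N data matrix. Returns T x k factor estimates."""
    X = (X - X.mean(0)) / X.std(0)              # standardise each series
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return np.sqrt(X.shape[0]) * U[:, :k]        # normalisation: F'F/T = I_k

def diffusion_index_forecast(y, X, k, h=1):
    """Regress y_{t+h} on estimated factors and lagged y, then forecast."""
    F = pc_factors(X, k)
    T = len(y)
    Z = np.column_stack([F[:T-h], y[:T-h], np.ones(T-h)])  # regressors dated t
    delta, *_ = np.linalg.lstsq(Z, y[h:], rcond=None)       # OLS on y_{t+h}
    z_T = np.concatenate([F[-1], [y[-1], 1.0]])
    return z_T @ delta
```

In simulated data with a genuine factor structure, the span of the estimated factors is close to that of the true factors, which is the property the consistency theorems formalise.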

Under slightly different conditions, Bai and Ng (2002) prove that min(N, T)‖sft − HN,T ft‖2 = Op(1) for each t, where sft is a vector containing the first k sample principal components of xt and HN,T is a sequence of non-singular matrices. Bai (2003) shows that under similar conditions, as (T,N) −→ (∞,∞), if √N/T −→ 0 then √N(sft − HN,T ft) converges to a Gaussian distribution, and that if √N/T ≥ τ > 0 then T(sft − HN,T ft) = Op(1).

When √T/N −→ 0 he shows that √T(√λi qi − H−1N,T bi) converges to a Gaussian distribution, where λi is the ith eigenvalue of SXX = (1/TN)X′X, qi is the corresponding eigenvector, and bi is the ith row of B. If √T/N ≥ τ > 0 then N(√λi qi − H−1N,T bi) = Op(1). Bai (2003) also proves asymptotic Gaussianity


for the estimator of the common component and provides a uniform bound on the factors of max1≤t≤T ‖sft − HN,T ft‖2 = Op[max(T−1/2, √T/N)]. Denoting δ = (β′ γ′)′, Bai and Ng (2006) construct an OLS estimator δ̂ using the sample principal components in place of the unobservable population factors, and prove that, as (T, N) −→ (∞,∞), if √T/N −→ 0 then √T(δ̂ − δ) converges to a Gaussian distribution, and that if √N/T −→ 0 then (ŷT+h − yT+h)/√var(ŷT+h) −→d N(0, 1), where ŷT+h is a forecast of yT+h computed using the OLS estimates and the principal components estimator of the factor.

Kapetanios and Marcellino (2004) have suggested that subspace algorithms could be used to estimate factor models with large dimensions. They provide a modification to standard subspace algorithms which allows the estimator to be computed when N is large relative to T. Currently, their asymptotic theory restricts N to grow at a rate less than T1/3, and so the rationale for using subspace algorithms in cases where N is of the same order of magnitude as T is not yet established. Given the computational ease with which subspace estimators may be computed for models small or large, an extension of this theory which relaxed the restriction on the growth rate of N would be particularly interesting, since it would mean that a single approach to estimation could be used irrespective of the size of the factor model, making questions of whether N is `large enough' redundant.

Bernanke et al. (2005) have proposed a factor-augmented vector autore-

gression (FAVAR) for which

(ft′, yt′)′ = Φ(L) (ft−1′, yt−1′)′ + ϕt


where yt is now permitted to be a vector of observable variables. They assume

that a large vector of observable variables xt has factor structure, estimate the

factors using the principal components of xt, and then estimate the FAVAR

equation using the first k principal components in place of the unobservable

factors. Stock and Watson (2005) consider a similar model and provide a

detailed discussion of identification schemes for conducting a structural FAVAR

analysis.

Forni et al. (2005) show how the two-sided dynamic principal components

estimator of the one-sided dynamic factor model of Forni et al. (2000) may

be extended to create a one-sided estimator of the common component and

forecasts based on a one-sided lter. For the dynamic factor model

xt = B(L)ft + εt

the linear combination of the elements of xt which is closest to the space

spanned by the factors may be found by choosing the vector α1 to maximise

var(α′1B(L)ft) such that var(α′1εt) = 1. Denote this vector α∗1. A second vector α∗2 may be defined by maximising var(α′2B(L)ft) such that var(α′2εt) = 1 and α′2α1 = 0. In this way, k orthogonal components α∗1, ..., α∗k may be defined which are the closest orthogonal factors to the common factor space.

Forni et al. (2005) show that these vectors are the generalised eigenvectors

corresponding to the first k generalised eigenvalues of the covariance matrix of

B(L)ft and the covariance matrix of εt. They propose that these covariances

be estimated from the inverse Fourier transforms of the dynamic principal component estimators of the common and idiosyncratic components of the dynamic factor model. They argue that this estimator is superior to the static

principal component estimator of Stock and Watson (2002a), since it incorporates information about the dynamic structure of the model in the estimation

technique. Simulation results provided by D'Agostino and Giannone (2006)

support this claim.
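The generalized eigenproblem at the core of this estimator can be sketched as follows. The sketch assumes the covariance of the common component (here called `Gamma_chi`) and of the idiosyncratic component (`Gamma_eps`) have already been estimated, for example by inverse Fourier transform of the dynamic principal component spectral estimates; the names are our own:

```python
import numpy as np

def generalized_pc_weights(Gamma_chi, Gamma_eps, k):
    """Solve the generalized eigenproblem Gamma_chi a = mu Gamma_eps a and
    return the k leading eigenvectors, normalised so that a' Gamma_eps a = 1
    (i.e. unit idiosyncratic variance, var(a' eps_t) = 1)."""
    mu, A = np.linalg.eig(np.linalg.solve(Gamma_eps, Gamma_chi))
    order = np.argsort(mu.real)[::-1][:k]
    A = A.real[:, order]
    # rescale each eigenvector to impose the normalisation var(a' eps_t) = 1
    scale = np.sqrt(np.einsum('ij,jk,ki->i', A.T, Gamma_eps, A))
    return mu.real[order], A / scale
```

The linear combinations A′xt then give the generalized principal components used by Forni et al. (2005) to construct the one-sided estimator.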

1.5.4 The Choice of Factor Order

Bai and Ng (2002) consider the estimation of the number of factors in a static

factor model. They derive modifications of well-known model selection procedures, such as the Schwarz-Bayes criterion, and prove their consistency as

(T,N) −→ (∞,∞). As explained at the start of Section 1.3, a k-factor dynamic model with q lags of the factor may be written as a kq-factor static

model. Therefore, in a dynamic setting, the Bai and Ng (2002) procedure will

estimate kq, rather than k. Amengual and Watson (2007) devise an estimator

for k by rewriting the dynamic factor model in a static form with k factors

and then applying the Bai and Ng (2002) procedure to the transformed model.

They prove consistency for this procedure. Bai and Ng (2007a) take a different approach to the estimation of the number of factors. They note that in the static specification of the model, the spectral density matrix of the kq factors

will have a rank of k. They estimate k by specifying a VAR model for the

factors and estimating the rank of the covariance matrix of the VAR errors.
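The IC_p2 variant of the Bai and Ng (2002) criteria, for example, selects k to minimise ln V(k) + k((N+T)/NT) ln(min(N,T)), where V(k) is the average squared residual from a k-factor principal components fit. A minimal sketch (our own illustrative implementation, not Bai and Ng's code):

```python
import numpy as np

def bai_ng_ic(X, kmax):
    """Estimate the number of factors with the IC_p2 criterion of
    Bai and Ng (2002). X: T x N (standardised) data matrix."""
    T, N = X.shape
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    penalty = (N + T) / (N * T) * np.log(min(N, T))
    ic = []
    for k in range(1, kmax + 1):
        # V(k): average squared residual after removing k principal components,
        # i.e. the discarded squared singular values divided by NT
        V_k = (s[k:] ** 2).sum() / (N * T)
        ic.append(np.log(V_k) + k * penalty)
    return int(np.argmin(ic)) + 1
```

The penalty term vanishes as (T, N) diverge but does so slowly enough to rule out overfitting, which is the source of the consistency result.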

As discussed in Section 1.2.4, Kapetanios (2005) and Onatski (2007) have

proposed test statistics for the hypothesis that the factor order is equal to a

predetermined number. The reader is referred back to that section for a brief


discussion of these procedures.

1.5.5 Applications

There now exist many applications of the techniques described above. Factor models have been estimated using large macroeconomic data sets from Australia[43], Austria[44], Belgium[45], Brazil[46], Canada[47], the European Area[48], France[49], Germany[50], Italy[51], the Netherlands[52], New Zealand[53], Spain[54], the United Kingdom[55] and the United States[56]. The number of variables in the analyses has ranged from as few as 29[57] to almost 1000[58]. A range of practical issues has been considered. Bernanke and Boivin (2003) estimate policy

reaction functions for the Federal Reserve that include factors estimated by

the principal components of a large macroeconomic data set. Bernanke et al.

(2005) construct a factor-augmented vector autoregression (FAVAR) model

and find that the inclusion of factors from a large macro data set helps to

explain the monetary policy "price puzzle" described by Sims (1992). Favero

et al. (2005) use principal component techniques to estimate factors from large

[43] Gillitzer et al. (2005) and Gillitzer and Kearns (2007).
[44] Schneider and Spitzer (2004).
[45] Nieuwenhuyze (2006).
[46] Ferreira et al. (2005).
[47] Brisson et al. (2003).
[48] Forni et al. (2003).
[49] Bandt et al. (2007).
[50] Schumacher (2005).
[51] Favero et al. (2004).
[52] den Reijer (2005).
[53] Giannone and Matheson (2006) and Matheson (2006).
[54] Camacho and Sancho (2003).
[55] Artis et al. (2005).
[56] Stock and Watson (2002b) and Gavin and Kliesen (2006).
[57] Gillitzer et al. (2005).
[58] e.g. Altissimo et al. (2001).


data sets for the United States and the Euro area. They include the factors

as regressors in structural VAR models which they use to evaluate the effects of monetary policy. Sala (2003) uses a large dynamic factor model to study the transmission of monetary shocks in the Euro area. Mansour (2003) uses

a dynamic factor model to estimate a world business cycle from GDP growth

data of 113 countries. Helbling and Bayoumi (2003) use a factor model to

estimate common business cycle components for the G7 countries.

A popular application of large-scale factor techniques is to estimate the

factors from a broad collection of macroeconomic variables and to interpret

them as coincident economic indicators. Perhaps the best known of these are

the Chicago Fed National Activity Index[59] (CFNAI), which is the first sample principal component of 85 monthly indicators of economic activity, and EuroCOIN[60], developed by Altissimo et al. (2001) using dynamic factor techniques,

which is the cyclical component extracted from the common factor obtained

from a large set of European macroeconomic data. Other examples are Gillitzer

et al. (2005) who construct a coincident indicator for Australia, and Nieuwen-

huyze (2006) who constructs a business cycle indicator for Belgium. Recently,

Altissimo et al. (2006) have proposed a new version of EuroCOIN. Cristadoro

et al. (2005) use large-scale dynamic factor techniques to construct a measure

of core inflation for the Euro area. Giannone and Matheson (2006) construct

a similar measure for New Zealand. Kapetanios (2004) uses subspace methods

to construct an estimate of core inflation for the United Kingdom.

The most stringent test to which large-scale factor analysis techniques have

[59] http://www.chicagofed.org/economic_research_and_data/cfnai.cfm.
[60] http://www.cepr.org.uk/data/eurocoin/.


been subjected is forecasting. If estimated factors contain useful information

that is not spanned by any small subset of economic variables, then it might be

expected that the inclusion of estimated factors in forecasting models would

improve forecasting performance. The last few years have seen a consider-

able number of studies which perform forecasting simulations using historical

macroeconomic data to compare the performance of large-scale factor models to

benchmark models such as scalar autoregressions and small vector autoregres-

sions. Stock and Watson (2006) and Breitung and Eickmeier (2005) provide

good surveys which cover some of this literature. Eickmeier and Ziegler (2006)

conduct a meta-analysis of 46 different studies which assess the forecasting performance of different large-scale factor analysis procedures using various

data sets.

1.5.6 Using Factors for GMM Estimation

An interesting application of large-scale factor techniques, which is currently

under development, is to use estimated factors as instruments in a Generalised

Method of Moments (GMM) regression. Consider the regression model

yt = β′xt + εt

where xt is a vector of m observable variables, β is a m×1 vector of regression coefficients, and εt is a scalar regression error term for which E(xtεt) ≠ 0.

Suppose that a vector of N instruments zt is available. Under assumptions

similar to those used by Bai (2003), Bai and Ng (2007b) prove that, if xt and

zt are driven by a set of k common factors, then a GMM estimator of β, in which the first k principal components of zt are used as instruments, is √T-consistent and asymptotically Gaussian if √T/N −→ 0. Furthermore, they show that the k-factor GMM estimator is more efficient than a GMM estimator constructed using any subset of k of the elements of zt. Kapetanios and Marcellino (2006) consider some more general relationships between the regressor,

the observable instruments and the factors, which allow for elements of zt to

be weak instruments. They prove several asymptotic results which support

the use of GMM estimation with the first k principal components of zt used

as instruments. Favero et al. (2005) and Beyer et al. (2005) are examples of

applications of this technique to macroeconomic data.
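Operationally, the estimator reduces to two-stage least squares with the leading principal components of zt as instruments. The following sketch is illustrative only (the name `pc_iv` is ours, and a simple homoskedastic 2SLS weighting is assumed rather than an optimal GMM weight matrix):

```python
import numpy as np

def pc_iv(y, x, z, k):
    """2SLS estimate of beta in y = x'beta + eps, using the first k
    principal components of the instrument matrix z as instruments.
    y: T vector, x: T x m endogenous regressors, z: T x N instruments."""
    z = (z - z.mean(0)) / z.std(0)
    U, s, Vt = np.linalg.svd(z, full_matrices=False)
    W = U[:, :k]                                      # T x k principal components
    x_hat = W @ np.linalg.lstsq(W, x, rcond=None)[0]  # first stage: project x on W
    beta, *_ = np.linalg.lstsq(x_hat, y, rcond=None)  # second stage
    return beta
```

When the endogenous regressor and the instruments share a common factor, the principal component instrument removes the endogeneity bias that plagues OLS.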

1.6 Evaluation and Contributions

1.6.1 Evaluation of the Literature

Time Series Factor Analysis

As Giannone et al. (2006) point out, business cycle models, with variables

measured with error, imply dynamic factor structure. This, and the fact that

dynamic factor models were developed in the late 1970s and early 1980s, make

it remarkable that so little work has been done using dynamic factor models

since this time. However, to those with experience estimating dynamic factor

models, this is perhaps not much of a surprise. Simple models with a single AR(1) factor, white noise errors, and no lagged factors are relatively easy

to estimate. The state space representation is the same as the factor model,

and the scoring algorithm usually converges quite rapidly. In cases where it


doesn't, it is reasonably straightforward to write an EM algorithm, which will

usually converge to a coarse convergence criterion fairly quickly. However, the

dynamic structure of such a model is restrictive and is likely to be unsatisfac-

tory in many applications. Generalising the dynamic structure of the model

complicates the estimation. Multiple factors may be accommodated by writing state-space models for each factor, and then stacking the state vectors and system matrices to create a state space representation for the entire factor model.

Lags of factors may be incorporated by stacking lags of the factors into the

state vector. Autoregressive errors may be accommodated by writing state space

representations for each error and then stacking the state vectors for the errors

with the state vector for the factors. However, the state space representation

of the factor model then has a `noise-free' measurement equation, which does

not lend itself to estimation by the EM algorithm. Furthermore, the scoring algorithm can be quite slow in such cases, with the high state dimension and relatively large number of parameters greatly increasing the computation required by the Kalman filter. Also, practical experience suggests that convergence often requires a large number of iterations, and may not occur at all.

Good starting values are essential. In principle, the dynamic factor model is

an attractive alternative to the vector autoregression for empirical macroeco-

nomics. However, given the relative ease with which vector autoregressions

may be estimated, it is not surprising that dynamic factor analysis has had

such a limited impact on the applied economics literature.

Most of the dynamic factor models that have been estimated in the literature have autoregressive factors. There also exist some applications with factors which follow the Markov-switching process of Hamilton (1989). However, the obvious generalisation to factors which follow ARMA processes has not yet been pursued. Since an infinite impulse response function may be approximated to arbitrary accuracy by a rational transfer function, this is a worthwhile generalisation. An obvious difficulty here is that the construction of a state space representation of the factor model will be further complicated by the addition of ARMA dynamics for the factors and errors, and so estimation is likely to be difficult.

Another issue that requires further investigation is identification of dynamic factor models. It is well known that in classical static factor analysis, the factors are identified only up to an orthogonal transformation. Therefore, linear sums of a set of factors can produce an alternative factor representation which is equally valid. In the case of dynamic factor models, this is complicated by the presence of lags. Geweke and Singleton (1981) and Camba-Mendez et al. (2001) have shown that restrictions on the factor loadings, similar to those used in the case of static factor analysis, are sufficient to identify the factors in the two-sided and one-sided factor models respectively. Therefore, identification is no more of a problem in the dynamic factor model than it is in the static model. However, the question of whether the extra structure implied by the dynamics helps to identify the parameters remains unexplored. If dynamic factor models are to be used more widely as models of economic processes, then the issue of identification warrants further attention.


Factor Analysis and Principal Component Analysis of High-Dimensional Vectors

Given the relative computational ease with which principal components may be computed, it is not surprising that there has been so much recent work done with principal components estimators of high-dimensional factor models. Considering the sheer volume of these applications to macroeconomic data, one might hope that an empirical consensus had emerged concerning the types of economies for which the techniques work well, the types of variables that they can successfully forecast, the number of variables and observations required for good performance, etc. Unfortunately, no such consensus is yet obvious. Indeed, it is not yet entirely clear that large-scale factor models necessarily produce superior forecasts to standard forecasting approaches. Some studies[61] find large improvements in forecasting performance from the use of factors, with mean squared forecasting errors reduced by over 40% compared to scalar autoregressions. Others[62] find little evidence of factor-based forecasts providing a significant benefit over benchmark models.

Of particular interest in this thesis is the wide range in the number of

variables used to estimate factor models in the literature. Based on a naïve

reading of the theoretical literature, one might expect that studies that estimate the factors from the largest number of variables available would tend to return the best results. However, the empirical evidence does not support this proposition. Boivin and Ng (2006) find that 40 carefully chosen variables can yield better results than 147 variables when forecasting 8 measures

[61] e.g. Stock and Watson (2002b), Brisson et al. (2003), Schneider and Spitzer (2004) and Camacho and Sancho (2003).
[62] e.g. Angelini et al. (2001), Giacomini and White (2003), Eklund and Karlsson (2007), Schumacher (2005), and Banerjee and Marcellino (2006).

of economic activity and inflation for the US. They also perform Monte Carlo simulations with different degrees of error cross-correlation and demonstrate that increasing the number of variables in the factor model might worsen forecasting performance. They suggest the use of weighting schemes to improve

the performance of the principal components estimator. Inklaar et al. (2003)

consider the construction of a coincident indicator for the Euro area and nd

that a factor model estimated using 38 carefully chosen macroeconomic vari-

ables produces an indicator that is at least as good as that produced by esti-

mating a factor model estimated using their entire database of 246 variables.

Schneider and Spitzer (2004) consider forecasting Austrian GDP using a dy-

namic factor model estimated by dynamic principal components. They nd

that models that include only 5 to 11 variables perform signicantly better

than a model with 143 variables. den Reijer (2005) considers using a dynamic

factor model of 370 variables to forecast Dutch GDP, but nds that models of

147 and 223 carefully chosen variables perform better. These results are not

easily understood from the existing theory of factor model estimation which,

with one exception, shows consistency as (T,N) −→ (∞,∞). Onatski (2006a)

proves inconsistency in a case with weak factors and temporally independent

Gaussian errors. Furthermore, in this case the factors expain a negligible pro-

portion of the total variance and it is not clear that factor-based forecasts

would necessrily perform well, even if the true factors were known. However,

it is a little dicult to believe that some industrialised economies have a strong

factor structure and others don't. Also, it is not clear how the common obser-

vation that the estimator may be worse when N is large, can be explained in

55

Page 64: Factor Analysis of High Dimensional Time Series

Onatski's framework.

An alternative explanation for the mixed empirical performance of the principal components estimator of factor forecasts concerns the behaviour of the error covariance matrix as N grows. The published dual-limit consistency proofs for principal components estimation of large-scale factor models place restrictions on the cross-correlation structure of the error covariance matrix. In particular, for the dynamic factor techniques introduced by Forni et al. (2000), the dynamic eigenvalues of the spectral density of the errors are assumed to be uniformly bounded. This assumption is the dynamic analogue of the `approximate factor' restriction introduced by Chamberlain and Rothschild (1983). Stock and Watson (2002a) and Bai and Ng (2002) assume that the mean absolute row sum of the error covariance matrix is bounded. Bai (2003) and Bai and Ng (2006) assume that the absolute row sums are uniformly bounded. A bound on the maximum absolute row sum is also a bound on the maximum eigenvalue, so these assumptions imply an approximate factor structure. While clearly less restrictive than the traditional `strict factor model' assumption of a diagonal error covariance, these assumptions should not be taken for granted. In spatio-temporal applications of the factor model, it might be reasonable to assume that the correlation between errors decays as the geographical distance between variables increases, so that the absolute row sums of the error covariance matrix remain bounded as the number of variables grows. In general however, there is no reason why the `approximate factor' restriction should necessarily be expected to hold. One possible description of the data, which might be relevant in many applications, is that the variables belong to a set of natural groups. For example, the groups could correspond to geographical boundaries (e.g. different countries in the Euro area), or to functional categories (e.g. real, nominal and financial variables). It might be reasonable to assume that pairs of errors corresponding to different groups are weakly correlated in the sense of uniform boundedness of the absolute row sums of the covariances as the number of variables grows, but that pairs of errors from the same group are strongly correlated. If the number of natural groups is finite and relatively small, then the number of variables might be increased only by increasing the number of variables used from each natural group. In such a situation, the absolute row sums of the covariance matrix are unlikely to be uniformly bounded by a constant and could in fact grow at any rate up to N. The existing theory for principal components estimation of large-scale factor models does not cover such cases. In fact, with the exception of a brief consideration of a single-factor model with identical factor loadings by Boivin and Ng (2006), the implications of stronger cross-correlation in the errors have not been given explicit consideration in the theoretical literature.
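The claim above that a bound on the maximum absolute row sum is also a bound on the maximum eigenvalue is just the statement that, for a symmetric matrix, the spectral radius is dominated by the induced ∞-norm. A quick numerical check, using an arbitrary covariance matrix rather than any data set discussed here:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 30))
Psi = A @ A.T / 30          # an arbitrary symmetric positive semi-definite matrix

lam_max = np.linalg.eigvalsh(Psi).max()          # largest eigenvalue
max_abs_row_sum = np.abs(Psi).sum(axis=1).max()  # induced infinity-norm

print(lam_max <= max_abs_row_sum)  # True: the eigenvalue is bounded by the row sum
```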

1.6.2 Contributions Made in this Thesis

Chapter 2

Chapter 2 considers dynamic factor analysis with a one-sided factor filter in a setting in which the number of variables N is assumed to be fixed. Three contributions are made.

(i) The dynamic factor model with mutually uncorrelated autoregressive factors is derived as a particular realisation of a VARMA model with reduced spectral rank observed subject to noise. As Giannone et al. (2006) have pointed out, many theoretical macroeconomic models suggest that the number of structural shocks that drive macroeconomic fluctuations is less than the number of observable variables, and macroeconomic variables are usually observed subject to measurement error. Consequently, the reduced spectral rank VARMA plus noise model is an attractive specification for macroeconomic analysis. In Section 2.1, it is shown that the dynamic factor model with mutually uncorrelated autoregressive factors corresponds to a minimal dimension state space representation of the reduced spectral rank VARMA plus noise model in cases in which the autoregressive polynomials of the factors do not have any common polynomial factors. In cases where common polynomial factors exist, the dynamic factor model does not correspond to a minimal state dimension representation (Proposition 1).

(ii) In Section 2.2 the issue of identification is considered for a fairly general class of weakly stationary dynamic factor model with uncorrelated factors. For the model

xt = β(L)ft + εt

in cases where the removal of any row of β(L) leaves rows from which it is possible to construct two k × k matrices of full rank and another matrix with at least one row, it is shown that the error spectrum is identified (Theorem 2.2.1) and that the number of dynamic factors is identified (Theorem 2.2.2). It is also shown that, under these conditions, zero-restrictions similar to those used to identify the static factor model are also identifying for the dynamic factor model (Theorem 2.2.3). Of most interest is Theorem 2.2.4 which shows that, under the above rank conditions, if β(L) is irreducible and the spectra of the factors are linearly independent, then β(L), the factor spectra, and the error spectra are identified up to sign changes, reordering, and rescaling of the factors. Consequently, zero-restrictions are not necessary for identification in many forms of dynamic factor model, including those with autoregressive factors.

(iii) In Section 2.3, a frequency domain approach is proposed for the estimation of dynamic factor models. A simulation exercise (in Section 2.4) suggests that this method has some computational advantage over the state space scoring algorithm which is usually used for dynamic factor model estimation. However, the main attraction of the frequency domain approach is the relative ease with which a general algorithm can be coded. The existing time domain algorithms for the estimation of dynamic factor models require the construction of a state space representation of the model. For factor models with few lags, this is trivial. However, for more complicated lag structures, and particularly for ARMA dynamics, this task becomes more complex, and the construction of a general algorithm which can handle any specification of model orders is complicated. As shown in Section 2.3, in the frequency domain a general expression for the covariance matrix can be written (Equation (2.5)) which makes the evaluation of the likelihood relatively easy to code. As an illustration, this approach is used to estimate a dynamic factor model with an ARMA(1,1) factor and ARMA(1,1) errors using data on industrial production growth in the G7 countries (Section 2.5).

Chapter 3

In Chapter 3, a theoretical investigation into the asymptotic behaviour of the principal components estimator is presented. The asymptotic results that have previously been published in the literature assume that the mean of the row sums of the absolute value of the covariance matrix of the errors is bounded (the approximate factor assumption). It is argued in Chapter 3 that this assumption will often be violated. For many of the applications in the literature, the variables are chosen from a relatively small number of categories. For example, large factor models will often have a large number of price indexes, a large number of interest rate variables, a large number of measures of industrial output, etc. It is easy to believe that the similarity of many of the variables that belong to the same category is such that many of the error terms corresponding to those variables will have non-negligible correlation. If the `large-N' condition is achieved by increasing the number of variables in each category, instead of increasing the number of categories, then it is likely that the absolute row sums of the error covariance matrix will grow without bound, a situation that is not covered by the published theory.

The main result in this chapter (Theorem 3.1.4) is that the principal components estimator is consistent under conditions where the absolute row sums of the error covariance matrix grow without bound. Therefore consistency holds for a class of model which is more general than the approximate factor model that has been investigated in the literature. However, the rate of convergence that is achieved by the estimator is slower, the faster is the rate of growth of error cross-correlation. Consequently, it is possible for the performance of the principal components estimator to be poor even in applications with a very large number of variables, which may explain the patchy performance record of the principal components estimator in forecasting applications. The proof of this result makes use of a number of preliminary results, which are interesting in their own right.

Theorem 3.1.3 proves the consistency of sample eigenvalues (scaled by 1/N) for population eigenvalues in a framework in which (N, T) → (∞, ∞) jointly. The key assumption in this theorem is the so-called `gap' condition, which requires that the absolute difference between each of the first k eigenvalues and any other eigenvalue grows at a rate of strictly N.

Theorem 3.1.1 presents a set of finite-sample/variables bounds linking population principal components to population factors. By avoiding sampling issues and asymptotic arguments, these bounds give a clear view of the conditions under which population factors and population principal components are likely to be `close'. In particular, they suggest that what matters for principal components to estimate factors well is not the number of variables per se, but rather the magnitude of the noise-to-signal ratio, which is defined as ρ = σ2/λk, where σ2 is the largest eigenvalue of the error covariance matrix Ψ, and λk is the kth eigenvalue of Ω = E( (1/T) X′X ). When the noise-to-signal ratio is small, population principal component and population factor quantities will be similar. Estimation of a lower bound on the noise-to-signal ratio is considered in Section 3.2, and some empirical work is conducted which suggests that the noise-to-signal ratio of the US macroeconomic data set used by Stock and Watson (2002b) is not particularly small (Section 3.3).
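For a concrete sense of the quantities involved, ρ can be computed directly once population matrices are postulated. The one-factor covariance structure below is invented purely for illustration (it is not the Stock and Watson data set):

```python
import numpy as np

rng = np.random.default_rng(2)
N, k = 40, 1

# Postulated population covariance: Omega = B B' + Psi with one factor.
B = rng.standard_normal((N, k))
Psi = np.diag(rng.uniform(0.5, 1.5, N))      # strict factor model errors
Omega = B @ B.T + Psi

sigma2 = np.linalg.eigvalsh(Psi).max()       # largest eigenvalue of Psi
lam_k = np.sort(np.linalg.eigvalsh(Omega))[::-1][k - 1]  # k-th largest of Omega

rho = sigma2 / lam_k
print(rho)   # a small rho means population principal components track the factors
```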


Chapter 4

In Chapter 4 a new factor model, named the grouped variable approximate factor model, is proposed. The grouped variable approximate factor model is motivated by the idea that large sets of economic variables will often have a group structure such that most of the cross-correlation between the errors in a factor model occurs between variables that belong to the same group. For example, in the data appendix of Stock and Watson (2002b), the 215 variables used in their model are listed under headings such as "Real output and income", "Employment and hours", "Stock prices", etc. Many of the variables listed under these headings are very similar to each other. For example, under "Real output and income" are listed variables such as Industrial production: total index; Industrial production: products, total; Industrial production: final products; Industrial production: consumer goods; Industrial production: durable consumer goods; Industrial production: nondurable consumer goods; Industrial production: business equipment; Industrial production: intermediate products; Industrial production: materials; and so on. Under "Employment and hours" are listed variables such as Employees on nonagricultural payrolls: goods producing; Employees on nonagricultural payrolls: contract construction; Employees on nonagricultural payrolls: manufacturing; Employees on nonagricultural payrolls: durable goods; Employees on nonagricultural payrolls: nondurable goods; Employees on nonagricultural payrolls: service producing; Employees on nonagricultural payrolls: wholesale and retail trade; and so on. It is suggested in Chapter 4 that, if xt is constructed by entering these variables in the order given by the variables listed under their headings, then most of the error cross-correlation in the factor model will exist in blocks which lie on the diagonal of the error covariance matrix and which correspond to the groups identified by these headings.

The grouped variable approximate factor model formalises this idea by assuming that the error covariance of the factor model has a block structure, where the blocks correspond to the variable groups. The off-diagonal blocks are subject to a weak correlation restriction: specifically, the largest of the singular values of the off-diagonal blocks must grow at a rate strictly less than N−1/2. No restriction is placed on the correlation structure of the blocks that lie on the diagonal.
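To make the block structure concrete, the sketch below (group sizes and correlation values are invented for illustration) builds such a grouped error covariance and shows the largest absolute row sum growing in proportion to the group size when N is increased group by group:

```python
import numpy as np

def grouped_error_cov(n_groups, group_size, within=0.8, between=0.0):
    """Block covariance: strong correlation within groups, weak across."""
    N = n_groups * group_size
    Sigma = np.full((N, N), between)
    for g in range(n_groups):
        i = slice(g * group_size, (g + 1) * group_size)
        Sigma[i, i] = within
    np.fill_diagonal(Sigma, 1.0)
    return Sigma

for m in (5, 20, 80):            # grow N by enlarging each group
    Sigma = grouped_error_cov(n_groups=4, group_size=m)
    row_sum = np.abs(Sigma).sum(axis=1).max()
    print(Sigma.shape[0], round(row_sum, 1))   # row sums grow roughly like group size
```

With a fixed, small number of groups, the maximum absolute row sum is about 1 + 0.8(m − 1) for group size m, so it is unbounded in N, exactly the situation the approximate factor assumptions rule out.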

In Section 4.2 an approximate instrumental variables estimator is proposed for the grouped variable factor model. This estimator is simple to compute, requiring only matrix multiplication and the inversion of a k × k matrix, where k is the number of factors. In Section 4.3 consistency is proved for the approximate instrumental variables estimator in a framework in which (N, T) → (∞, ∞) jointly (Theorem 4.3.1). A brief empirical experiment which compares the approximate instrumental variables estimator to the principal components estimator is presented in Section 4.4.


Chapter 2

Dynamic Factor Analysis with a

Finite Number of Variables

This chapter considers the dynamic factor model

xt = β(L)ft + εt

where xt is an N × 1 vector of observable variables, β(L) is an N × k finite-order one-sided polynomial in which L is the backshift operator, ft is a k × 1 vector of unobservable mutually uncorrelated, but serially correlated, factors and εt is an N × 1 vector of unobservable mutually uncorrelated disturbances, which may also be serially correlated. It is assumed that N is fixed at a value that is small relative to the number of observations T and that the model of the factor process, and the model of the error process, are of a known parametric form.
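A minimal simulation of a model of this kind, with AR(1) factors, AR(1) errors and invented coefficients (β(L) here has a single lag), looks as follows:

```python
import numpy as np

rng = np.random.default_rng(3)
T, N, k = 500, 8, 2

# Illustrative AR(1) coefficients (not taken from the thesis).
a = np.array([0.7, -0.4])        # one AR(1) coefficient per factor
phi = 0.3                        # AR(1) coefficient shared by the errors

# Mutually uncorrelated autoregressive factors.
f = np.zeros((T, k))
eta = rng.standard_normal((T, k))
for t in range(1, T):
    f[t] = a * f[t - 1] + eta[t]

# Mutually uncorrelated, serially correlated errors.
eps = np.zeros((T, N))
u = rng.standard_normal((T, N))
for t in range(1, T):
    eps[t] = phi * eps[t - 1] + u[t]

# One-sided factor filter beta(L) = B0 + B1 L.
B0 = rng.standard_normal((N, k))
B1 = rng.standard_normal((N, k))
x = np.zeros((T, N))
x[1:] = f[1:] @ B0.T + f[:-1] @ B1.T + eps[1:]
```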

Most commonly, the factor process is specified to be a vector of mutually uncorrelated autoregressions. Engle and Watson (1981) proposed a model with a single autoregressive factor which they write in state space form and estimate by a scoring algorithm. Watson and Engle (1983) and Shumway and Stoffer (1982) independently proposed an EM algorithm for estimating the autoregressive factor model. Applications of this dynamic factor model have included the construction of coincident and leading indicators1 and analyses of wages2, productivity3 and aggregate demand4. Following Altug (1989) and Sargent (1989), Giannone et al. (2006) advocate the use of dynamic factor models for business cycle analysis.

Other research has proposed estimation algorithms for dynamic factor models with different factor specifications. In particular, Kim (1994), Kim and Yoo (1995), Chauvet (1998), Kim and Nelson (1998) and Harris and Martin (1998) have proposed models in which the factor follows the Markov-switching process of Hamilton (1989), and have applied them to the modelling of business cycles, and Dungey et al. (2000) have estimated a model of bond yields in which the factor is autoregressive with GARCH disturbances.

This chapter makes three contributions to the dynamic factor analysis literature. Firstly, in Section 2.1 the dynamic factor model with mutually uncorrelated autoregressive factors is derived as a particular realisation of a VARMA model with reduced spectral rank and additive noise. It is shown that in some cases, this dynamic factor model corresponds to a minimum dimension state space representation of the VARMA plus noise model. Since business cycle models generally have fewer stochastic shocks than observable variables and, since macroeconomic variables are measured with noise, it is argued that the dynamic factor model is useful as a general model for empirical macroeconomics and should be viewed as an attractive alternative to the almost universally used vector autoregression (VAR)5. Secondly, in Section 2.2 the identification issue is considered for a fairly general class of dynamic factor model which includes autoregressive factors, ARMA factors, Markov-switching factors, and many other specifications that may be useful. Of particular interest is the finding that under reasonably general conditions, dynamic factor models are identified without the need for strong restrictions of the type necessary in static factor analysis. Thirdly, in Section 2.3 a frequency domain approach to the estimation of dynamic factor models is proposed. A simulation study (presented in Section 2.4) shows that, for models with simple dynamics, the frequency domain approach has some computational advantages over the traditional state space approach to estimation. However, its main attraction is the ease with which it generalises to models with more complicated dynamics, in particular models with ARMA factors and ARMA errors. This is in contrast to the traditional approach where the construction of the state space representation becomes more complicated as the dynamic structure of the model becomes richer. A brief empirical example is presented in Section 2.5.

1 Stock and Watson (1990).
2 Engle and Watson (1981) and Watson and Engle (1983).
3 Lebow (1993).
4 Watson and Kraft (1984).
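Equation (2.5) itself belongs to Section 2.3 and is not reproduced in this overview. The sketch below shows only the generic shape of a frequency-domain (Whittle-type) Gaussian likelihood, with an invented one-factor model spectrum; the function names and the model are illustrative assumptions, not the thesis's estimator. The point it illustrates is the one made above: richer dynamics only require editing the model-spectrum function, not rebuilding a state space representation.

```python
import numpy as np

def model_spectrum(omega, beta, a, sig_eta2, sig_eps2):
    """Spectrum of x_t = beta f_t + eps_t with an AR(1) factor f (illustrative)."""
    sf = sig_eta2 / np.abs(1.0 - a * np.exp(-1j * omega)) ** 2 / (2 * np.pi)
    return np.outer(beta, beta) * sf + np.eye(len(beta)) * sig_eps2 / (2 * np.pi)

def whittle_neglog(x, beta, a, sig_eta2, sig_eps2):
    """Whittle approximation: sum over Fourier frequencies of
       log det S(w) + tr(S(w)^{-1} I(w)), with I(w) the periodogram."""
    T, N = x.shape
    dft = np.fft.fft(x, axis=0) / np.sqrt(2 * np.pi * T)
    ll = 0.0
    for j in range(1, T // 2):
        w = 2 * np.pi * j / T
        I = np.outer(dft[j], dft[j].conj())
        S = model_spectrum(w, beta, a, sig_eta2, sig_eps2)
        ll += np.log(np.linalg.det(S)).real + np.trace(np.linalg.solve(S, I)).real
    return ll
```

Swapping in ARMA(1,1) factor or error spectra only changes `model_spectrum`; the likelihood loop is untouched, which is what makes a general algorithm easy to code in the frequency domain.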

2.1 Dynamic factor models in macroeconomics

Giannone et al. (2006) consider the estimation of business cycle models. Following Altug (1989) and Sargent (1989), they note that when observable variables are subject to measurement error, business cycle models imply that vectors of observable variables have dynamic factor structure. Using this observation as a starting point, in this section a rationale is presented for using dynamic factor models with mutually uncorrelated autoregressive factors as a general class of model for empirical macroeconomics. The argument presented is more general than those provided by the above authors and pays particular attention to the definition of the factors, and to issues of generality, uniqueness and parsimony.

5 See Giannone et al. (2006) for a similar argument.

In a world characterised by measurement error, it is reasonable to assume that an N × 1 vector of observed economic variables xt has two components

xt = ξt + εt (2.1)

where εt is an N × 1 vector of measurement errors, which are assumed to be mutually uncorrelated at all leads and lags, and ξt is an N × 1 vector of `measurement-error-free' variables for which a theoretical economic model exists. ξt is assumed to be uncorrelated with εt at all leads and lags. It is assumed that the spectral densities of ξt and εt are uniformly bounded. The rank of the spectral density matrix of ξt is denoted k. In general we could have k = N but, as pointed out by Giannone et al. (2006), equilibrium business cycle models usually have fewer stochastic shock variables than observable variables. Consequently, our main interest is in cases where k < N. A fairly general model

would have (with L the lag or backshift operator)

ξt = T(L)ηt

where T(L) is an N × k matrix of rational transfer functions and ηt is a k × 1 vector of white noises. It is assumed that ηt has a covariance matrix of Ik. In business cycle models, the underlying shocks are often considered to be variables such as demand shocks, monetary policy shocks, technology shocks, etc. The (i, j)th element of T(L) may be written as

Tij(L) = bij(L)/aij(L)

where aij(L) and bij(L) are coprime polynomial operators in L. In the control theory literature it is often assumed that rational transfer functions are strictly proper, so that deg(bij(L)) < deg(aij(L)). This assumption will be employed later in this section for a discussion of minimal dimensionality; however, for the discussion of identification and estimation in subsequent sections, no restrictions need be placed on the degrees of the polynomials other than that the numerator polynomial and the denominator polynomial must both be of finite degree. Since an infinite impulse response function may be approximated to arbitrary accuracy by a ratio of finite degree polynomials, this assumption does not sacrifice generality. The complete model may be written as

xt = T(L)ηt + εt (2.2)

Interest centres on the estimation of T(L) since it largely determines the responses of observable variables to impulses in the underlying shock variables. In cases where measurement error does not exist, these impulse responses may be estimated in a VAR framework, provided that T(L) satisfies the `fundamentalness condition' that there exists a k × N matrix polynomial S(L) such that S(L)T(L) = Ik (see Hansen and Sargent (1990) for a discussion). As pointed out by Giannone et al. (2006) however, in the presence of measurement error, the identification of the impulse responses in a VAR framework is problematic. Giannone et al. (2006) propose instead that the impulse responses be estimated in a dynamic factor analysis framework and they conduct Monte Carlo simulations which demonstrate the superiority of this approach. What follows is a more detailed rationale of the dynamic factor model than that given by Giannone et al. (2006). In particular, care is taken to precisely define the factors, and issues of uniqueness, generality and parsimony are considered.

Since the first component of Equation (2.2) is a VARMA model with a reduced spectral rank, it is known that even in the absence of the measurement error vector εt, the parameters of T(L) are not uniquely identified6. In VARMA modelling, the first step in dealing with identifiability is to find a minimal dimensional state space model. This minimal dimension is called the McMillan degree7 and it has a number of other characterisations. In this case however, the VARMA system is observed subject to noise, and so the identification issue is non-standard. In this section, the identification issue is handled by choosing a form of the `VARMA plus noise' model that corresponds to a dynamic factor model. In the next section, it will be shown that, under certain conditions, the dynamic factor representation is unique.

6 See, for example, Lütkepohl (1991).
7 See Solo (1986).

Consider a single column j of T(L), with elements Tij(L) = bij(L)/aij(L) for i = 1, ..., N. Let dj(L) be the lowest common multiple of aij(L) for i = 1, ..., N. The elements of column j may then be written as

Tij(L) = cij(L)/dj(L)

where cij(L) and dj(L), which is monic, are coprime polynomial operators.

Therefore, the first term on the right hand side of Equation (2.1) may be written as

ξt = ∑_{j=1}^{k} (cj(L)/dj(L)) ηjt = ( c1(L)/d1(L)  · · ·  ck(L)/dk(L) ) ηt = ( c1(L)  · · ·  ck(L) ) diag( 1/d1(L), ..., 1/dk(L) ) ηt

where cj(L) = ( c1j(L) · · · cNj(L) )′. Denoting

A(L) = diag( d1(L), ..., dk(L) )

and letting β(L) be the N × k polynomial matrix with (i, j)th element cij(L), we may define the factor vector

ft = A(L)−1ηt (2.3)


and write Equation (2.1) as

xt = β(L)ft + εt (2.4)

where ft is a k × 1 vector of uncorrelated scalar autoregressions. Thus, the

`VARMA plus noise' model has a dynamic factor representation.

Continuing, we now assemble a state space model. To do this we write ξt in terms of its k components

ξt = ∑_{j=1}^{k} ξjt,   ξjt = ( c1j(L) · · · cNj(L) )′ dj(L)−1 ηjt

Consider the single input multiple output (SIMO) transfer function given by a single element of this sum, ( c1j(L) · · · cNj(L) )′ dj(L)−1. The construction of a minimal state space representation of this transfer function is straightforward and well-known8, and is as follows.

8 See, for example, Kailath (1980) or Barnett (1980).


ξjt = [ cj11  cj12  · · ·  cj1m
        cj21  cj22  · · ·  cj2m
         ...               ...
        cjN1  cjN2  · · ·  cjNm ] νjt

νjt = [ −dj1  −dj2  · · ·  −djm
          1     0   · · ·    0
                 . . .
          0     0     1      0 ] νj,t−1 + ( 1  0  · · ·  0 )′ δjt

where δjt is a scalar white noise and νjt is an mj × 1 state vector, with mj = deg(dj(L)). The minimal state space model for ξt is then constructed by stacking the models for ξjt for j = 1, ..., k. The state dimension of the factor model is then m1 + · · · + mk. Now we need to see under what conditions this is the minimal state dimension, i.e. the McMillan degree.
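The companion-form construction above can be checked numerically for a single SIMO block: the impulse response generated by the state space pair must coincide with the series expansion of cj(L)/dj(L). The coefficients below are invented for illustration.

```python
import numpy as np

# d_j(L) = 1 + d1 L + d2 L^2  (monic, m_j = 2); numerator column c_j(L).
d = np.array([0.5, -0.3])            # (d1, d2)
C = np.array([[1.0, 0.2],            # N x m_j matrix of numerator coefficients
              [0.4, -0.7],
              [0.0, 1.1]])
m = len(d)

# Companion transition matrix: first row -d, ones on the subdiagonal.
A = np.zeros((m, m))
A[0] = -d
A[1:, :-1] = np.eye(m - 1)
b = np.zeros(m)
b[0] = 1.0

# Impulse response of xi_t = C nu_t, nu_t = A nu_{t-1} + b delta_t.
H = 12
nu = np.zeros(m)
ss = []
for t in range(H):
    nu = A @ nu + (b if t == 0 else 0.0)
    ss.append(C @ nu)
ss = np.array(ss)

# Direct expansion: psi_t are the coefficients of 1/d_j(L); then convolve
# with the numerator polynomials c_j(L).
psi = np.zeros(H)
for t in range(H):
    psi[t] = (1.0 if t == 0 else 0.0) - sum(d[i] * psi[t - 1 - i]
                                            for i in range(m) if t - 1 - i >= 0)
direct = np.array([sum(C[:, i] * (psi[t - i] if t - i >= 0 else 0.0)
                       for i in range(m)) for t in range(H)])
print(np.allclose(ss, direct))   # True
```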

To determine the McMillan degree of T(L), an easy approach is to construct Gilbert's minimal state space representation9 (see Kailath (1980) or Barnett (1980)). Gilbert's representation is constructed by taking a partial fraction expansion of T(L)

T(L) = ∑_{j=1}^{k} Rj / (1 − λjL)

where the N × k matrices Rj, j = 1, ..., k are of rank ϱj respectively. A singular value decomposition of Rj gives the rank factorisation

Rj = Cj Bj

where Cj is N × ϱj and Bj is ϱj × k.

9 It should be noted that Gilbert's representation assumes that the roots of the denominator polynomials are distinct. However, since the set of models for which these polynomials have repeated roots has measure zero, this is not a matter of great practical concern.

The Gilbert state space model is then

ξt = ( C1  C2  · · ·  Ck ) ωt

ωt = [ λ1Iϱ1    0    · · ·    0
         0    λ2Iϱ2  · · ·    0
                  . . .
         0      0    · · ·  λkIϱk ] ωt−1 + ( B1′  B2′  · · ·  Bk′ )′ ϕt

where ωt is a (ϱ1 + · · · + ϱk) × 1 state vector and ϕt is a k × 1 error term. Since the Gilbert form is known to be of minimal dimension, the McMillan degree of ξt = T(L)ηt is ϱ1 + · · · + ϱk. The key point here is that, if the denominator polynomials dj(L), j = 1, ..., k used to construct A(L) in the factor model have no common polynomial factors, then the Rj matrices in the Gilbert representation will have full rank. Consequently, ϱ1 + · · · + ϱk = m1 + · · · + mk. If the denominator polynomials do have common polynomial factors, then the Rj matrices in the Gilbert representation will have reduced rank, resulting in ϱ1 + · · · + ϱk < m1 + · · · + mk. This yields the following proposition.

Proposition 1. The dynamic factor model given by Equations (2.3) and (2.4)

creates a minimal dimension state space representation of T (L) if and only

if the polynomials in the diagonal matrix A(L) have no common polynomial

factors.
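As a numerical cross-check on the degree calculations above, the McMillan degree can also be read off as the rank of a block Hankel matrix built from the impulse response coefficients of T(L). The sketch below uses an invented 3 × 2 T(L) whose columns each have a single, distinct pole, so that m1 + m2 = 2; it does not attempt to reproduce the reduced-rank case of the proposition.

```python
import numpy as np

# T(L) with columns c_j / (1 - lam_j L): m_1 = m_2 = 1, distinct poles.
lam = np.array([0.6, -0.3])
c = np.array([[1.0, 0.5],
              [0.2, -1.0],
              [-0.4, 0.8]])          # N x k, N = 3, k = 2

# Markov parameters h_t = [c_1 lam_1^t, c_2 lam_2^t] for t >= 1.
H_len = 10
h = [c * lam ** t for t in range(1, 2 * H_len + 1)]

# Block Hankel matrix of the Markov parameters: its rank is the McMillan degree.
Hank = np.block([[h[i + j] for j in range(H_len)] for i in range(H_len)])
print(np.linalg.matrix_rank(Hank, tol=1e-8))   # 2 = m_1 + m_2
```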


2.2 Identification

In the previous section, a dynamic factor model with mutually uncorrelated autoregressive factors was derived as a particular realisation of a VARMA model of reduced spectral rank observed subject to measurement error. It was shown that this realisation corresponds to a minimal dimension state space representation in some cases. However, the issue of uniqueness was not considered. In the case of the classical static factor analysis model, it is well-known10 that the factors and factor loadings are identified only up to an orthogonal transformation. Identification of the factors requires the imposition of restrictions on the factor loading matrix such that the only orthogonal transformation for which the restrictions are invariant is the identity. Camba-Mendez et al. (2001) show that similar restrictions are also identifying in the case of a factor model with mutually uncorrelated autoregressive factors and with no lagged factors affecting the observed vector. Whether such restrictions are identifying in a more general setting where the observed variable is related to a finite distributed lag of factors is an open question. Of further interest is whether restrictions such as these are needed at all. The identification problem exists in classical static factor analysis because it is possible to construct mutually and serially uncorrelated weighted sums of mutually and serially uncorrelated factors. Therefore, given any valid vector of factors, a different, but equally valid, factor vector can be constructed by an orthogonal transformation. This is not the case if the factors are mutually uncorrelated autoregressions. Weighted sums of autoregressions are not autoregressions. Consequently, it might be hoped that the dynamic structure of the factors in the dynamic factor model eliminates at least some of the orthogonal transformations of the factors that would be permissible in the static factor case.

10 See, for example, Lawley and Maxwell (1971).

In this section, an investigation of the identification issue for dynamic factor models is presented. The results are derived for a fairly general class of dynamic factor model which includes, but is not restricted to, the models with mutually uncorrelated autoregressive factors derived in the previous section. The class of models considered is those with factors which are uncorrelated with the errors, and for which the spectral density matrices of the factors and the errors are diagonal. Consequently, in addition to being of interest to macroeconomists wishing to estimate models of reduced spectral rank that are subject to measurement error, the results in this section are relevant for the estimation of multiple factor models with Markov-switching factors, GARCH factors, etc., provided that the factors are mutually uncorrelated.

Consider the dynamic factor model

M : x_t = β(L)f_t + ε_t

where x_t is an N × 1 vector of observable variables, β(L) is an N × k finite-order one-sided polynomial in which L is the backshift operator, f_t is a k × 1 vector of unobservable factors and ε_t is an N × 1 vector of unobservable disturbances. The following assumptions define the class of factor model under consideration.

Assumptions 1.

1.1 β(L) is a one-sided finite matrix polynomial operator.


1.2 The spectrum of f_t is diagonal and uniformly bounded. The spectrum of ε_t is diagonal and uniformly bounded.

1.3 E(f_t ε′_{t−j}) = 0, j ∈ Z.

The spectrum of the observable variables may be written as

S^x_ω = β_ω S^f_ω β^H_ω + S^ε_ω

where H denotes the complex conjugate transpose and S^f_ω and S^ε_ω are the spectra of f_t and ε_t respectively, which again are diagonal. The subscript ω denotes the frequency. The identification theorems that follow are based on a consideration of the first two moments only. Accordingly, the following definition is made.

Definition 2.2.1. We define two factor models M and M* to be observationally equivalent if the spectral density matrices of the observable vectors x_t and x*_t, S^x_ω and S^{x*}_ω, are equal for all frequencies ω, where −π ≤ ω ≤ π.

In order to prove identification, the following assumption, which is the dynamic analogue of a standard assumption in static factor analysis, is made.

Assumptions 2.

2.4 If any row of β(L) is deleted, from the remaining rows it is possible to construct two k × k full-rank polynomial matrices and a polynomial matrix with (N − 2k − 1) ≥ 0 rows.

The theorems and proofs below make use of some properties of polynomial

matrices which are not often discussed in the economics literature. These terms


are defined in Appendix 1. The first result, a dynamic extension of a static result in Anderson and Rubin (1956), gives conditions under which the error process spectrum is identified. The proofs of all theorems appear in Appendix 2.

Theorem 2.2.1. For the set of models M, under Assumptions 1.1, 1.2, 1.3 and 2.4, the disturbance spectrum S^ε_ω is identified.

Note that the theorem requires that N ≥ 2k + 1, providing a lower bound on the number of observable variables if the identification results are to apply. However, it is not sufficient simply to have a large number of variables relative to the number of factors: there are also restrictions on the linear dependence of the filtering that need to be satisfied. For example, if the data set consisted of time series observations on a panel of firms or individuals with identical characteristics, then it may be the case that they all react to changes in the common factors in the same way, in which case the rank of β(L) may be insufficient for the theorem to apply, even if there are a large number of variables. Therefore, loosely speaking, Theorem 2.2.1 says that it is insufficient to have a large number of variables; they must also be sufficiently diverse.

The next result shows that the number of factors is uniquely determined under the conditions of Theorem 2.2.1.

Theorem 2.2.2. For the set of models M, under Assumptions 1.1, 1.2, 1.3 and 2.4, the dimension of the factor vector (k) is identified.

With the disturbance spectrum and factor dimension identified, all that remains is to determine conditions under which β_ω and S^f_ω are uniquely determined by β_ω S^f_ω β^H_ω. The following lemma, which is an extension of a static result by Reiersøl (1950), provides a useful representation of the set of observationally equivalent models, which will subsequently be used.

Lemma 1. Under Assumptions 1.1, 1.2, 1.3 and 2.4, the set of observationally equivalent dynamic factor models has the spectral representation

S^x_ω = β*_ω S^{f*}_ω β*^H_ω + S^ε_ω

where β*_ω = β_ω M_ω^{−1}, S^{f*}_ω = M_ω S^f_ω M^H_ω, and M_ω is a k × k non-singular polynomial operator in e^{−iω}. Furthermore, if β(L) is irreducible, then M_ω is unimodular.

We now show that the factor spectrum and the filter β(L) are identified under a particular pattern of zero-restrictions on the factor loading matrix β(L).

Theorem 2.2.3. If

a) Assumptions 1.1, 1.2 and 1.3 hold.

b) β(L) is irreducible.

c) Following the deletion of any row of β(L), from the remaining rows it is possible to construct a k × k lower-triangular polynomial matrix, a k × k full-rank polynomial matrix and a polynomial matrix with (N − 2k − 1) ≥ 0 rows,

then β(L) is identified and the factor spectrum S^f_ω is identified up to a rescaling of the factors, and a sign change on each factor.


Theorem 2.2.3 generalizes to the time series context the well-known result that an appropriate pattern of zero-restrictions identifies a static factor model. Thus, it tells us that identification is no more of a problem in the time-series setting than it is in the static case. It states that a k-factor model is identified if k − j variables are independent of j of the factors for j = 1, ..., k − 1. While useful, the factor-exclusion assumptions that the theorem requires are strong and may not be satisfied in many applications. In Theorem 2.2.4 it is shown that, under fairly general conditions, the results of Theorem 2.2.3 hold without zero-restrictions.

Theorem 2.2.4. If

a) Assumptions 1.1, 1.2, 1.3 and 2.4 hold.

b) β(L) is irreducible.

c) The factor spectra are linearly independent functions, i.e. λ diag(S^f_ω) = 0 ∀ ω ∈ [0, π] ⇒ λ = 0 for a 1 × k vector λ,

then β(L) is identified and the factor spectrum S^f_ω is identified up to a reordering of the factors, a rescaling of the factors, and sign changes of the factors.

Since the class of finite order autoregressions is not closed under addition, the spectra of autoregressive factors are linearly independent provided that they are all different.[11] Accordingly, the factor filter matrix β(L), the factor spectra, and the disturbance spectra of autoregressive multiple factor models are identified under the rank and irreducibility assumptions of Theorem 2.2.4.

[11] That is to say, we exclude cases such as f_{1t} = f_{2t}.


Since autoregressions are identified from their unconditional second moments, all the parameters of the model are identified. Zero restrictions, or the unit restrictions of Camba-Mendez et al. (2001), are redundant in this case, provided that the irreducibility and rank assumptions on β(L) are satisfied. This is a remarkable result, since it implies that the factor estimates from such models can be interpreted far more readily than is the case for static factor estimates. For the model

y_t = T(L)η_t   where E(η_t η′_t) = I_k

it is well-known that the impulse response functions are identified only up to an orthogonal transformation. Consequently, for example, the estimation of impulse response functions in a structural VAR model requires restrictions to be imposed on the model. This rotational indeterminacy is eliminated in the construction of the factor model by writing T(L) as the product of a polynomial matrix and a diagonal matrix containing the lowest common denominators of the columns of T(L). The only class of orthogonal transformations which preserves this diagonal matrix, and maintains an identity covariance for η_t, is defined by the set of permutation matrices. Consequently, this particular form of the VARMA model is identified subject to the rank conditions of Theorem 2.2.4. It should be noted, however, that the dynamic factor model is only one particular realisation of the VARMA plus noise model. If the object of the analysis is to conduct an impulse response analysis for the VARMA plus noise model

x_t = T(L)η_t + ε_t


then dynamic factor techniques provide a convenient way to estimate the parameters of the model. However, in this form of the model, the impulse response function that relates the structural shocks η_t to the observable variables x_t is still subject to a rotational indeterminacy.

The theorems in this section have wider applicability than models with autoregressive factors. For the case of a model with ARMA factors, things are not as clear-cut as for autoregressive factors. Since ARMA processes contemporaneously aggregate to ARMA processes (see e.g. Lütkepohl (1991)), it is possible to construct multiple ARMA-factor models for which Assumption c) of Theorem 2.2.4 does not hold. However, such models are somewhat contrived, in the sense that the factors are able to cancel each other out to some extent. If we are prepared to assume that no such cancellation is possible, so that the factor spectra are linearly independent, and that the polynomial numerators and denominators in any of the ARMA processes in the model are coprime, then all the parameters in the model are identified under the rank and irreducibility assumptions on β(L).

For models with factors which follow unit root processes, such as the Fernández-Macho (1997) model, the observable variable y_t must be differenced in order to satisfy Assumption 1.2. The Fernández-Macho (1997) model will then be equivalent to a static model, and zero restrictions may be used for identification when there is more than one factor in the model. For a generalisation of the model with factors which follow random walks with autoregressive shocks, the differenced model will be similar to the autoregressive-factor model, and the parameters of the model will be identified under the rank and irreducibility conditions of Theorem 2.2.4. Indeed, the results may be applied to any factor model of an integrated vector x_t which may be written as ∆^d x_t = β(L)f_t + ε_t, where f_t and ε_t satisfy Assumptions 1.2 and 1.3, and β(L) satisfies Assumptions 1.1 and 2.4.

Poskitt and Chung (1996) have shown that an r-state scalar Markov-switching model with a non-singular transition matrix generates the autocovariance function of an ARMA(r − 1, r − 1) process. It follows that an autoregressive filter of a 2-state Markov-switching variable plus uncorrelated white noise must have the spectrum of an ARMA process. Consequently, the above comments about the identifiability of ARMA factors also apply to multiple factor versions of the Markov-switching model of Chauvet (1998). Similarly, Karlsen (1990) shows that a process which switches between r AR(1) processes has the autocovariance of an ARMA process, indicating that a similar result holds for multiple factor generalisations of the model used by Chauvet et al. (2002). Zhang and Stine (1999) similarly find ARMA structure for a more general Markov-switching vector autoregression model. Thus, we can state that for a fairly general class of Markov-switching factor model it is possible to write down examples for which Assumption c) of Theorem 2.2.4 does not hold, but that, with the exception of contrived cases where polynomial terms cancel out, the models are identified under the rank and irreducibility assumptions of Theorem 2.2.4.

The results for factor-GARCH type models are less pleasing. Since GARCH variables are unconditionally homoscedastic and serially uncorrelated, the results mirror those for static factor models: Theorem 2.2.4 does not apply, and we need to impose zero restrictions to guarantee identification of the factor spectra, disturbance spectra and factor filter matrix. Of course, in the


1-factor case, Assumption c) of Theorem 2.2.4 is satisfied and zero restrictions are unnecessary. However, even in this case, only the spectra of the factor and disturbances are identified. Since GARCH processes are not identified by unconditional second order moments, the parameters driving these processes are not identified by our theorems. For the AR/GARCH factor model of Dungey et al. (2000), Assumption c) of Theorem 2.2.4 is satisfied, and the factor filter and the AR parameters of the factors and disturbances are identified. However, the GARCH parameters are still not identified by our theorems, since they consider only unconditional second order moments.

2.3 Estimation

In Section 2.1, the dynamic factor model with uncorrelated autoregressive factors was derived as a particular realisation of a 'VARMA plus noise' model in which the VARMA component has reduced spectral rank. This model was motivated by business cycle models in which the variables of interest are driven by a relatively small number of mutually and serially uncorrelated structural shocks, and are assumed to be observed subject to measurement error. Engle and Watson (1981) proposed that the dynamic factor model with autoregressive factors be estimated using a scoring algorithm with the model written in state space form. Shumway and Stoffer (1982) and Watson and Engle (1983) independently proposed that the EM algorithm be used to estimate the model in state space form. While these algorithms work reasonably well for small models with simple dynamics, their application to models with extensive lag structures can be more troublesome, with the algorithms generally taking a


long time to converge, and often failing to converge. These issues can usually be resolved with some skilled human intervention, but the fact remains that dynamic factor models with rich lag structures are difficult to estimate. Furthermore, in the derivation of the model in Section 2.1, the lag polynomials of the autoregressive factors are constructed as the least common multiples of the denominator polynomials in each column of the transfer function in the original VARMA plus noise model. Consequently, the orders of the autoregressive factors could be high in practice, making estimation difficult. Since a high-order polynomial may be approximated by a ratio of polynomials of much lower order, a practical solution to this problem is to replace the high-order autoregressive factors with low-order ARMA processes. Furthermore, the state space construction leading to Proposition 1 does not preclude common factors in the numerator polynomials. If this occurs, then the representation of the common polynomial factors as moving average components of the model factors can lead to a substantial reduction in the parameter count. This motivates a dynamic factor model with ARMA factors.

In this section, a frequency domain approach to the estimation of dynamic factor models with ARMA factors is proposed. In the derivation of the dynamic factor model in Section 2.1, the errors were considered to be measurement errors. As such, one might expect them to be serially uncorrelated. However, in the interests of generality, the approach described below is able to estimate dynamic factor models with errors which follow ARMA processes.

The published estimation approaches taken to factor models based on one-sided filtering have all been based on a state space representation of the model. In the case of a single factor model with an AR(1) factor and no lagged factors,


the state space representation coincides with the factor model. Higher orders can be accommodated by expanding the system matrices, and multiple factors can be represented by stacking state vectors. Serially correlated errors can be incorporated by their inclusion in the state vector. By this stage, however, the state space representation of the model is becoming somewhat complicated. The addition of ARMA dynamics increases the complexity further. An alternative approach, which generalises easily from a simple model to one with complex dynamics, is to carry out estimation in the frequency domain.

In the time domain, the model with ARMA factors and ARMA errors may be written as

x_t = β(L)f_t + ε_t
f_t = A(L)^{−1}B(L)η_t
ε_t = H(L)^{−1}G(L)δ_t

where A(L), B(L), H(L) and G(L) are diagonal polynomial matrices of degrees m, d, n and v respectively. The degree of β(L) is denoted q. x_t is an N × 1 vector and f_t is a k × 1 vector. The elements of η_t and δ_t are all mutually and serially uncorrelated at all leads and lags; it is assumed that E(η_t η′_t) = I_k, and we denote R = E(δ_t δ′_t), where R is an N × N matrix.
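As a concrete special case, the model is easy to simulate. The sketch below generates a single AR(1) factor loaded contemporaneously onto N series with white-noise errors (so k = 1, q = 0, m = 1 and d = n = v = 0); all dimensions and parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
T, N = 500, 5
a = 0.8                                   # factor AR(1) coefficient
beta = rng.normal(size=N)                 # contemporaneous loadings
sig = rng.uniform(0.5, 1.0, size=N)       # error standard deviations

# f_t = a f_{t-1} + eta_t with E(eta_t^2) = 1
f = np.zeros(T)
eta = rng.normal(size=T)
for t in range(1, T):
    f[t] = a * f[t - 1] + eta[t]

# x_t = beta f_t + eps_t : a T x N panel driven by one common factor
x = np.outer(f, beta) + rng.normal(size=(T, N)) * sig
```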

The discrete Fourier transform of x_t is defined as

x_ω = Σ_{t=1}^{T} x_t e^{−(2πi/T)ωt}

for harmonic frequencies ω. Taking Fourier transforms, the above dynamic factor model may be written in the frequency domain as, approximately,[12]

x_ω = β(e^{−(2πi/T)ω}) f_ω + ε_ω
f_ω = A(e^{−(2πi/T)ω})^{−1} B(e^{−(2πi/T)ω}) η_ω
ε_ω = H(e^{−(2πi/T)ω})^{−1} G(e^{−(2πi/T)ω}) δ_ω

where f_ω, ε_ω and δ_ω are the Fourier transforms of f_t, ε_t and δ_t respectively.

Since E(f_t) = 0 and E(δ_t) = 0, it follows from the linearity of the discrete Fourier transform that E(x_ω) = 0. Also because of the linearity of the discrete Fourier transform, it follows that the covariance matrices of δ_ω, η_ω, ε_ω, f_ω, and x_ω are the discrete Fourier transforms of the autocovariance matrix sequences of δ_t, η_t, ε_t, f_t, and x_t respectively. Therefore, the covariance matrices of x_ω, f_ω and ε_ω are respectively

S^x_ω = β(e^{−(2πi/T)ω}) S^f_ω β(e^{−(2πi/T)ω})^H + S^ε_ω

S^f_ω = A(e^{−(2πi/T)ω})^{−1} A(e^{(2πi/T)ω})^{−1} B(e^{−(2πi/T)ω}) B(e^{(2πi/T)ω})

S^ε_ω = H(e^{−(2πi/T)ω})^{−1} H(e^{(2πi/T)ω})^{−1} G(e^{−(2πi/T)ω}) G(e^{(2πi/T)ω}) R     (2.5)

where H denotes the complex conjugate transpose. The complex Gaussian likelihood[13] is

L = Σ_ω L_ω   where   L_ω = −(1/2) ln|S^x_ω| − (1/2) tr(x_ω x^H_ω (S^x_ω)^{−1})

[12] Because of end-point issues, these equations do not hold exactly.
[13] Whittle (1961).


where the sum is over harmonic frequencies.
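Once the Fourier ordinates and a routine for S^x_ω are in hand, the likelihood above is straightforward to evaluate. A minimal sketch (a hypothetical helper, not code from the thesis; the 1/√T scaling of the DFT is an assumption, chosen so that E(x_ω x^H_ω) ≈ S^x_ω under the spectrum convention used here):

```python
import numpy as np

def whittle_loglik(x, Sx):
    """Complex Gaussian (Whittle) log-likelihood.

    x  : T x N zero-mean data matrix
    Sx : function returning the N x N model spectrum at harmonic frequency w
    """
    T, _ = x.shape
    xw = np.fft.fft(x, axis=0) / np.sqrt(T)   # scaled Fourier ordinates
    ll = 0.0
    for w in range(1, T // 2):                # harmonic frequencies
        S = Sx(w)
        pw = np.outer(xw[w], xw[w].conj())    # periodogram term x_w x_w^H
        ll -= 0.5 * np.log(np.linalg.det(S).real)
        ll -= 0.5 * np.trace(pw @ np.linalg.inv(S)).real
    return ll
```

For a static one-factor model with a white-noise factor and white-noise errors, for example, Sx can simply return the constant matrix ββ′ + Ψ at every frequency; for the dynamic model it would be built frequency by frequency from Equation (2.5).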

Some calculus provides the gradient vector and information matrix:

∂L_ω/∂θ′ = (1/2) vec[(S^x_ω)^{−1} (x_ω x^H_ω − S^x_ω) (S^x_ω)^{−1}]^H (∂vec S^x_ω/∂θ′)

E[∂²L_ω/∂θ′∂θ] = −(1/2) [∂vec S^x_ω/∂θ′]^H ((S̄^x_ω)^{−1} ⊗ (S^x_ω)^{−1}) [∂vec S^x_ω/∂θ]

where θ is a vector containing the model parameters and S̄^x_ω is the complex conjugate of S^x_ω.
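Given routines that sum these two expressions over the harmonic frequencies, a Fisher-scoring iteration then takes the generic form below. This is a schematic sketch: `score` and `info` are placeholder names for numerical implementations of the gradient and of the Fisher information (the negative of the expected Hessian above), not thesis code:

```python
import numpy as np

def fisher_scoring(theta0, score, info, tol=1e-8, max_iter=200):
    """Iterate theta <- theta + info(theta)^{-1} score(theta) until the step is tiny."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(info(theta), score(theta))
        theta = theta + step
        if np.linalg.norm(step) < tol:
            return theta
    raise RuntimeError("scoring algorithm failed to converge")
```

On a quadratic toy objective L(θ) = −(1/2)(θ − c)′A(θ − c), for which the score is A(c − θ) and the information is A, the update reaches c in a single step.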

Estimation of the dynamic factor model in the time domain requires the construction of a state space representation. For models with few lags, this is elementary. For more complicated lag structures, however, it is less simple, and the derivation of an algorithm to construct a minimal-dimension state space representation for a general model with ARMA factors and errors of arbitrary orders is a non-trivial task. In the frequency domain, by contrast, a general procedure for computing the likelihood is relatively easy to implement.

Define the following coefficient matrices:

- A is a k × (m + 1) matrix of factor AR coefficients, with the first column a vector of ones;
- B is a k × (d + 1) matrix of factor MA coefficients, with the first column a vector of ones;
- H is an N × (n + 1) matrix of error AR coefficients, with the first column a vector of ones;
- G is an N × (v + 1) matrix of error MA coefficients, with the first column a vector of ones;
- R is an N × N diagonal matrix containing the variances of δ_t;
- β = (β_1 · · · β_k) is an N × k(q + 1) matrix of factor filter coefficients, where β_i is an N × (q + 1) matrix containing the coefficients linking the q lags of the ith factor to x_t.

Also define

γ^p_ω = (1, e^{−(2πi/T)ω}, e^{−(2πi/T)2ω}, ..., e^{−(2πi/T)pω})′   and   ϕ^p_ω = γ^p_ω γ^{pH}_ω,

and let Θ_ω be the k(q + 1) × k block-diagonal matrix with a copy of γ^q_ω in each diagonal block. We then have β(e^{−(2πi/T)ω}) = βΘ_ω, and the covariance of the observable Fourier ordinates may be written as

S^x_ω = βΘ_ω S^f_ω Θ^H_ω β′ + S^ε_ω

where

S^f_ω = (Bϕ^d_ω B′ ⊙ I_k)(Aϕ^m_ω A′ ⊙ I_k)^{−1}

S^ε_ω = (Gϕ^v_ω G′ ⊙ I_N)(Hϕ^n_ω H′ ⊙ I_N)^{−1} R

and ⊙ denotes the Hadamard product. This is a general expression which is correct for all possible model orders and has the advantage of being relatively easy to code. The value of the covariance at each frequency may be used to compute the likelihood. The derivatives in the expressions for the gradient vector and the information matrix can be evaluated numerically and a scoring algorithm implemented to maximise the likelihood.
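As an illustration of how directly the Hadamard-product expression translates into code, the sketch below builds S^x_ω from the coefficient matrices defined in this section (numpy; written from the formulas above, so the layout conventions are as assumed there, and this is not the thesis's own code):

```python
import numpy as np

def gamma(p, w, T):
    """gamma^p_w = (1, z, z^2, ..., z^p)' with z = exp(-2*pi*i*w/T)."""
    return np.exp(-2j * np.pi * w / T) ** np.arange(p + 1)

def phi(p, w, T):
    """phi^p_w = gamma^p_w gamma^{pH}_w."""
    g = gamma(p, w, T)
    return np.outer(g, g.conj())

def Sx_omega(w, T, beta, A, B, H, G, R):
    """S^x_w = beta Theta_w S^f_w Theta_w^H beta' + S^eps_w, via Hadamard products."""
    k = A.shape[0]
    q = beta.shape[1] // k - 1
    # Hadamard products with the identity keep only the diagonals
    Sf = np.diag(np.diag(B @ phi(B.shape[1] - 1, w, T) @ B.T).real /
                 np.diag(A @ phi(A.shape[1] - 1, w, T) @ A.T).real)
    Se = np.diag(np.diag(G @ phi(G.shape[1] - 1, w, T) @ G.T).real /
                 np.diag(H @ phi(H.shape[1] - 1, w, T) @ H.T).real *
                 np.diag(R))
    # Theta_w: block-diagonal with k copies of gamma^q_w
    Theta = np.kron(np.eye(k), gamma(q, w, T).reshape(-1, 1))
    return beta @ Theta @ Sf @ Theta.conj().T @ beta.T + Se
```

For an AR(1) factor observed in white noise (k = N = 1, q = 0, A = (1, −a)), the returned matrix reduces to 1/|1 − a e^{−2πiω/T}|² + R, the familiar AR(1)-plus-noise spectrum.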

2.4 A Comparison of the Time Domain and Frequency Domain Algorithms

In this section the relative computational efficiency of the time domain and frequency domain approaches to estimation is investigated. Attention is restricted to models with only autoregressive dynamics, since these are relatively easy to estimate in the time domain. The figures reported in Table 2.1 are the results of a simulation exercise in which 600 models were estimated. Six different sets of model dimensions were chosen, and 100 random data sets were generated for models with each set of dimensions. For each of the 100 models the true parameter values were chosen randomly. All autoregressive parameters (including variances) were sampled from uniform [0,1] distributions, and the elements of the filter matrix β(L) were sampled from N(0,1) distributions. For each set of randomly chosen parameter values a single random data set was generated. A model with autoregressive factors was fitted using the time domain state space scoring algorithm proposed by Engle and Watson (1981) and the frequency domain algorithm proposed in Section 2.3. The number of iterations and the execution time (in seconds) were recorded for each algorithm. All starting values were set equal to 0.5. If an algorithm failed to converge after 200 iterations, it was stopped and recorded as failing to converge. It should be noted that the models in Table 2.1 all have low polynomial degrees. This was done as a matter of practicality: dynamic factor models with complex dynamics are difficult to estimate and often require a degree of human intervention in order to get the algorithm to converge; consequently, they do not lend themselves to simulation studies. Table 2.1 contains the average number of iterations to convergence, the average execution time in seconds, and the failure rate, all rounded to the nearest whole number. Maximization was carried out using the fminunc procedure from the Optimization Toolbox in Matlab and was run using Matlab 6.5 on a Pentium M 1.6GHz notebook with 1GB of RAM. All the default options of the toolbox were used. It should be borne in mind that different software packages use different optimization algorithms and may produce different results. Accordingly, the results reported here are intended to provide an indication of the computational performance which might be expected, rather than a claim of what will necessarily occur for any particular data set using any software on any computer.

As a general observation, the new frequency domain algorithm appears to converge in roughly the same number of iterations as the traditional state space algorithm in most situations. The exceptions are when there are lags of factors, in which case the frequency domain approach took fewer iterations, and the multiple-factor case, in which it took more. In all cases, though, the frequency domain approach takes less time to converge than the state space algorithm. On average, the increase in computational speed was 385%. Given this faster computation, the relative ease with which it can be programmed, and the fact that it generalizes easily to the ARMA-factor, ARMA-disturbance case, the frequency domain approach to estimation is an attractive alternative


to the traditional state space scoring algorithm.

Table 2.1: Estimation by Time Domain and Frequency Domain Scoring Algorithms

                                   Iterations   Execution Time   Failure Rate
N=5,  q=0, k=1, m=1, n=0, T=100
  Time Domain                          11              9              5%
  Frequency Domain                     13              3              6%
N=5,  q=0, k=1, m=1, n=0, T=1000
  Time Domain                          11             85              0%
  Frequency Domain                     11             26              0%
N=7,  q=0, k=3, m=1, n=0, T=100
  Time Domain                          28            108              3%
  Frequency Domain                     42             28             15%
N=5,  q=0, k=1, m=1, n=1, T=100
  Time Domain                          12             10              2%
  Frequency Domain                     12              3              0%
N=20, q=0, k=1, m=1, n=0, T=100
  Time Domain                          11            136              0%
  Frequency Domain                     16             35              0%
N=5,  q=2, k=1, m=1, n=0, T=100
  Time Domain                          23             46              3%
  Frequency Domain                     16              8              2%

N = number of observable variables; q = number of lags of factors; k = number of factors; m = AR order of factors; n = AR order of disturbances; T = number of observations.

2.5 An Empirical Example

In this section, a brief empirical example of a dynamic factor model with

an ARMA factor and ARMA errors is presented. The variables measure the


monthly growth rates in industrial production in the G7 countries. The data are from May 1969 to March 2007 and are seasonally adjusted. They are taken from the OECD Main Economic Indicators database. Plots of the data are presented in Figure 2.1. The objective in this section is to fit a model with ARMA factors and ARMA errors as an illustration of the technique discussed in the previous two sections.

Discussions of order selection are conspicuously rare in the applied dynamic factor analysis literature.[14] Presumably the reason for this is the heavy computational cost involved in estimating dynamic factor models, particularly for models with rich lag structures. The factor filter matrix β(L) contains N × k different polynomial filters, each of which could have a different degree. Each factor and each error has a numerator polynomial and a denominator polynomial with degrees which need to be chosen. Even for quite small models with modestly set maximum possible polynomial degrees, the number of combinations of polynomial degrees is extremely large; so much so that an extensive search which involved estimating all possible models would be infeasible, even if each model could be estimated very quickly. In practice it is only computationally feasible to estimate a small number of models.[15] Given this, and bearing in mind that the following is intended to be an illustration of a technique, rather

[14] Camba-Mendez et al. (2001) fix the order of the factor filter β(L) at zero, assume that the error process is white noise, and use the Schwarz-Bayes criterion to choose the autoregressive order between values of 1 and 4. They state that this was done "for reasons of computational feasibility." They also consider the forecasting performance of models with between one and four factors. More commonly, however, the model orders are simply fixed and there is no discussion of their choice, e.g. Lebow (1993).

[15] As an indication of a rough order of magnitude, it might be feasible to estimate a few dozen models in a day, provided that the data set did not have any 'problematic' features and that an experienced analyst was on hand to manage the process, choose starting values, etc.


Figure 2.1: Monthly industrial production growth for G7 countries

[Figure: seven time-series panels (Canada, France, Germany, Italy, Japan, UK, US) plotting monthly IP growth against time, 1970 to 2005.]


than a serious piece of economic analysis, the following restrictions will be placed on the model orders. Firstly, all stochastic processes will be modelled as ARMA(1,1). The ARMA(1,1) model has a good record of performance as a model for observable stationary economic variables. While it cannot produce oscillatory behaviour, this is unlikely to be a disadvantage in this application, since all the variables are seasonally adjusted.[16] Only single factor models were considered. With seven variables, the identification theorems in Section 2.2 allow for the estimation of up to three factors; however, in this application we are seeking a single variable which summarises the joint dynamic behaviour of industrial production across the G7 countries. It will also be assumed that all the polynomials in the factor filter matrix β(L) are of the same degree q. The degree q is chosen by minimising the Schwarz-Bayes Criterion over model orders from zero to six.

As is the case with the traditional time domain state space scoring algorithm, the performance of the frequency domain algorithm of Section 2.3 is quite sensitive to the choice of starting values. Particularly for models with rich lag structures, the use of arbitrary starting values often results in the algorithm failing to converge. Consequently, the following method was used. Firstly, a model with white noise errors, an AR(1) factor and no lagged factors was estimated using a starting value of 0.5 for all parameters. The estimates from this procedure were then used as starting values for a model with the same structure but with an ARMA(1,1) factor. The estimates of this model

[16] The exercise was also executed assuming that all the errors and the factor follow AR(2) processes. The results were not markedly dissimilar to those of the ARMA(1,1) model and, since models with autoregressive dynamics have appeared in the literature, the results are not reported.


were used as starting values to estimate a model with the same structure but with AR(1) errors. These estimates were then used as starting values to estimate a model with an ARMA(1,1) factor and ARMA(1,1) errors. This is the first model for which the Schwarz-Bayes Criterion is recorded. Lagged factors are then successively added to the model by using the estimates of the previous model as starting values for the parameters that remain in the model, and zeros as starting values for the newly introduced parameters. In this way, all seven models under consideration were estimated without any instances of the algorithm failing to converge, or taking an undue number of iterations to converge. The total computational time was 22 minutes using the fminunc procedure from the Optimization Toolbox in Matlab 6.5 on a Quad-Core Pentium computer with 4GB of RAM.

The values of the Schwarz-Bayes Criterion (SBC) and the likelihood for factor filter degrees (q) from zero to six are presented in Table 2.2. Surprisingly, the SBC-minimisation procedure chooses a degree of zero. It is possible that this is due to the restriction that all the polynomials in β(L) should have the same order. The consequence of this restriction is that each time q is increased by 1, seven extra parameters are added to the model.

Table 2.2: SBCs and log-likelihoods (q = the number of lags of factors)

q      0        1        2        3        4        5        6
SBC    9.9998   10.0677  10.1236  10.1751  10.2143  10.2666  10.3354
L     -2183.1  -2177.2  -2168.5  -2158.8  -2146.3  -2136.7  -2131.0
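As a cross-check, the SBC values in Table 2.2 can be reproduced to within rounding from the reported log-likelihoods. The normalisation and the parameter count below are inferred rather than stated in the text: a per-observation criterion SBC = (−2L + p ln T)/T, with T = 455 monthly observations (May 1969 to March 2007) and p = 30 + 7q free parameters (seven loadings per lag of the factor, two ARMA parameters for the factor, and two ARMA parameters plus one variance for each of the seven errors):

```python
import math

def sbc(loglik, p, T):
    """Assumed normalisation: per-observation Schwarz-Bayes criterion."""
    return (-2.0 * loglik + p * math.log(T)) / T

T = 455                              # May 1969 - March 2007, monthly
reported = {0: (-2183.1, 9.9998),    # q: (log-likelihood, SBC) from Table 2.2
            1: (-2177.2, 10.0677),
            2: (-2168.5, 10.1236)}
for q, (L, bic) in reported.items():
    p = 30 + 7 * q                   # inferred free-parameter count
    assert abs(sbc(L, p, T) - bic) < 0.005
```

The agreement to roughly three decimal places across the reported orders suggests the inferred conventions are the ones used.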

The estimates of the factor loadings and the variances of the shocks of the error processes (i.e. R as defined in Equation 2.3) are presented in Table


2.3. Standard errors are in brackets. For each country, the factor coefficient is significantly different from zero at standard significance levels.

Table 2.3: Estimates of factor loadings and error variances

       Canada     US         Japan      France     Germany    Italy      UK
β      0.2794     0.2595     0.3378     0.2475     0.228      0.2962     0.176
       (0.0532)   (0.0483)   (0.0627)   (0.0457)   (0.046)    (0.0601)   (0.0433)
R      0.9687     0.349      1.3149     1.1373     1.8631     4.0346     1.8647
       (0.0698)   (0.0294)   (0.0967)   (0.0829)   (0.1285)   (0.2759)   (0.1257)

The estimates of the parameters of the ARMA processes used to model the factors and the errors are presented in Table 2.4, with standard errors in brackets. Interestingly, the MA parameter for the factor is not significantly different from zero at the 5% significance level, suggesting that an AR factor might have been adequate in this case. For the error terms, the results are mixed. For France and Germany, the MA coefficients are not statistically significantly different from zero, but the AR coefficients are; for Japan it is the MA coefficient only which is significant, and for Canada neither the MA nor the AR coefficients are significantly different from zero. For the US, the UK and Italy, both coefficients are significant.

Table 2.4: Estimates of ARMA parameters of the factor and errors

       f_t       ε_t^Can   ε_t^US    ε_t^Jap   ε_t^Fra   ε_t^Ger   ε_t^It    ε_t^UK
AR     0.8632    0.066     0.4505    0.0618    0.4835    0.4627    0.7165    0.5658
       (0.043)   (0.2038)  (0.1536)  (0.128)   (0.0891)  (0.0922)  (0.0882)  (0.1437)
MA     0.2572    -0.1747   0.642     -0.3108   -0.0971   -0.0764   0.3013    0.3015
       (0.1875)  (0.2111)  (0.1795)  (0.1407)  (0.0885)  (0.0864)  (0.0693)  (0.1264)

Figure 2.2 shows the estimated ARMA(1,1) spectra of the errors. With the exception of the US, the variance is concentrated in the higher frequencies.

The estimated ARMA(1,1) spectrum of the factor is shown in Figure 2.3.

Figure 2.2: Estimated ARMA(1,1) spectra of the errors

[Seven panels — Canada, France, Germany, Italy, Japan, UK, US — each plotting amplitude against frequency (× π).]

In contrast to most of the errors, the variance of the factor appears to be concentrated in the lower frequencies.
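The curves summarised above are ARMA(1,1) spectral densities. A small sketch evaluating the standard ARMA(1,1) spectrum formula, using the factor's point estimates from Table 2.4 (AR 0.8632, MA 0.2572); the unit shock variance and the 1/2π normalisation are assumptions made for illustration only.

```python
import numpy as np

def arma11_spectrum(phi, theta, sigma2, freqs):
    """Spectral density of (1 - phi L) x_t = (1 + theta L) e_t, var(e_t) = sigma2."""
    z = np.exp(-1j * freqs)
    return sigma2 * np.abs(1 + theta * z) ** 2 / (2 * np.pi * np.abs(1 - phi * z) ** 2)

freqs = np.linspace(0, np.pi, 200)
# factor estimates from Table 2.4; unit shock variance assumed for illustration
s_factor = arma11_spectrum(0.8632, 0.2572, 1.0, freqs)
# with both coefficients positive, the spectrum peaks at frequency zero and
# falls monotonically towards pi: low-frequency concentration, as in Figure 2.3
```

With a positive AR coefficient this large, the spectrum at frequency zero is roughly two orders of magnitude above its value at π, which matches the shape reported for the factor.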


Figure 2.3: Estimated ARMA(1,1) spectrum of the factor

[Single panel plotting amplitude against frequency (× π).]

Using a standard property of the complex Gaussian distribution, the expected value of the Fourier ordinates of the factor is given by

    E(f_ω|y_ω) = S^f_ω (βΘ_ω)^H (S^y_ω)^{−1} y_ω

Taking the inverse discrete Fourier transform of the series generated by evaluating this expression at all harmonic frequencies provides an estimate of the factor, which is presented in Figure 2.4. The relatively high content of low frequency fluctuation is apparent in this plot.
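The smoother in the formula above can be sketched for a simulated one-factor model with static loadings, so that the transfer function Θ_ω reduces to the identity, and with the factor and error spectra treated as known. All parameter values below are illustrative (the AR coefficient 0.86 only loosely echoes Table 2.4), not the thesis's estimates.

```python
import numpy as np

rng = np.random.default_rng(1)
T, N = 256, 7
beta = rng.uniform(0.2, 0.4, size=(N, 1))       # static loadings (Theta_w = I assumed)

# simulate an AR(1) factor and white-noise errors
f = np.zeros(T)
for t in range(1, T):
    f[t] = 0.86 * f[t - 1] + rng.standard_normal()
x = beta * f + 0.5 * rng.standard_normal((N, T))

w = 2 * np.pi * np.arange(T) / T                # harmonic frequencies
Sf = 1.0 / np.abs(1 - 0.86 * np.exp(-1j * w)) ** 2  # factor spectrum, treated as known
Se = 0.25 * np.ones(N)                          # white-noise error spectra

xw = np.fft.fft(x, axis=1)                      # Fourier ordinates of the data
fw = np.empty(T, dtype=complex)
for j in range(T):
    Sy = Sf[j] * (beta @ beta.T) + np.diag(Se)  # spectrum of x at frequency w_j
    # E(f_w | x_w) = S^f_w beta^H (S^y_w)^{-1} x_w
    fw[j] = Sf[j] * (beta.T.conj() @ np.linalg.solve(Sy, xw[:, j]))[0]
fhat = np.fft.ifft(fw).real                     # inverse DFT gives the factor estimate
```

Because the filter is applied frequency by frequency, only one k × 1 smoothing weight per harmonic frequency is needed, which is what makes the frequency domain approach easy to generalise.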


Figure 2.4: Estimated ARMA(1,1) factor

[Time series plot of the estimated factor, 1970–2005.]

2.6 Concluding Comments

This chapter has made three contributions to the literature on dynamic factor analysis.

(i) The dynamic factor model was derived as a realisation of a `VARMA plus noise' model where the VARMA component has reduced spectral rank. It was shown that mutually uncorrelated autoregressive factors may be derived from the lowest common denominators of the scalar transfer functions in each column of the transfer function matrix. It was also shown that, in cases where the factor polynomials have no common polynomial factors, the dynamic factor model corresponds to a minimal dimension state space representation of the VARMA plus noise model (Proposition 1). Since business cycle models often have fewer structural shocks than observable variables, and macroeconomic variables are measured with noise, the dynamic factor model with uncorrelated autoregressive factors is a good choice of model for business cycle analysis and should be viewed as a competitor to the almost universally used VAR.

(ii) The identification issue was considered for a fairly general class of weakly stationary dynamic factor model with uncorrelated factors. In cases where the removal of any row of β(L) leaves rows from which it is possible to construct two k × k matrices of full rank and another matrix with at least one row, it was shown that the error spectrum is identified (Theorem 2.2.1) and that the number of dynamic factors is identified (Theorem 2.2.2). It was also shown that, under these conditions, zero-restrictions similar to those used to identify the static factor model are also identifying for the dynamic factor model (Theorem 2.2.3). Of most interest is Theorem 2.2.4, which shows that, under the above rank conditions, if β(L) is irreducible and the spectra of the factors are linearly independent, then β(L), the factor spectra, and the error spectra are identified up to sign changes, reordering, and rescaling of the factors. Consequently, zero-restrictions are not necessary for identification in many forms of dynamic factor model, including those with autoregressive factors.

(iii) A frequency domain approach was proposed for the estimation of dynamic factor models. A simulation exercise suggested that this method has some computational advantage over the state space scoring algorithms which are usually used for dynamic factor model estimation. However, the main attraction of the frequency domain approach is the relative ease with which a general algorithm can be coded. The existing time domain algorithms for the estimation of dynamic factor models require the construction of a state space representation of the model. For factor models with few lags, this is trivial. However, for more complicated lag structures, and particularly for ARMA dynamics, this task becomes more complex, and the construction of a general algorithm which can handle any specification of model orders is complicated. As shown in Section 2.3, in the frequency domain a general expression for the covariance matrix can be written (Equation (2.5)) which makes the evaluation of the likelihood relatively easy to code.

There exists scope for future research in this area. Since the dynamic factor model with mutually uncorrelated autoregressive factors is not always a minimal dimension representation of the `VARMA plus noise' model, it is of interest to derive a realisation which is. Perhaps the most natural way to do this is to choose a representation of the VARMA component of the model which is known to be of minimal dimension. If the representation of the VARMA model is identified in the absence of additive noise, then Theorems 2.2.1 and 2.2.2 could easily be extended to provide identification of the VARMA plus noise model. This is possible because the proofs of Theorems 2.2.1 and 2.2.2 do not require the existence of factor structure, but rather are based on the ranks of submatrices of the spectral density matrix of the common component β(L)f_t. Consequently, these theorems could be rewritten to apply directly to any representation of the VARMA plus noise model. A greater challenge is offered by the estimation of the VARMA plus noise model. Even in the absence of additive noise, VARMA estimation is a non-trivial problem. The addition of the noise is unlikely to make things easier.

The identification theorems presented in Section 2.2 provide identification of the factor spectra. In order for the parameters of the factor processes to be identified, it must be possible for the factor parameters to be uniquely determined from the second order moments of the factors. One particularly interesting case where this is not possible is when the factors follow GARCH processes. In this case, because the factors are serially uncorrelated, their spectra are not linearly independent, and so Theorem 2.2.4 does not apply. Theorem 2.2.3 provides identification for these models, but requires zero-restrictions to be imposed, and these may be difficult to justify in practice. It would be of interest to try to derive a theorem similar to Theorem 2.2.4 that applies to models with GARCH factors. A possible approach would be to focus on conditional second moments, rather than unconditional second moments. Given the importance of multiple factor models in empirical finance, this would be a worthwhile research project.

The simulations reported in Section 2.4 suggest that the frequency domain approach to dynamic factor model estimation that was proposed in Section 2.3 has some computational advantage over the traditional state space methods. However, it should be stressed that, as is the case with the traditional state space methods, using the frequency domain method to estimate models with complex lag structures is not a pleasant task. As the number of parameters grows, the speed with which each iteration is computed slows down markedly. Practical experience suggests that the number of iterations is also often larger for models with rich dynamic structures. Furthermore, with large models, the algorithm often fails to converge and skilled human intervention is often needed to achieve convergence. An important task for future research, then, is to introduce quicker, more robust estimation methods. Currently, subspace methods look like the most promising approach to explore. It has been suggested in this chapter that the dynamic factor model is an attractive model for empirical macroeconomics, which compares favourably with the commonly used VAR model. However, until better estimation techniques are derived for dynamic factor models, it is unlikely that they will see wide use.

Appendix 1 Definitions

• A scalar polynomial is said to be monic if and only if its leading term has a value of unity.

• A matrix polynomial is said to be unimodular if and only if its inverse is also a polynomial matrix. It follows that a polynomial matrix is unimodular if and only if its determinant is a non-zero constant.

• If P(L) is a N × k rational transfer function matrix (i.e. a matrix of rational transfer functions), and if P(L) = P̄(L)D_R(L)^{−1} where P̄(L) and D_R(L) are N × k and k × k polynomial matrices respectively, then D_R(L) is a right divisor of P(L). If P(L) = D_L(L)^{−1}P̄(L) where D_L(L) is a k × k polynomial matrix, then D_L(L) is a left divisor of P(L).

• Let P and Q be a p × k and a q × k matrix polynomial respectively. If P = P̄W, Q = Q̄W, where P̄, Q̄ and W are matrix polynomials, then W is a right divisor of P, Q. If P = WP̄, Q = WQ̄, where P̄, Q̄ and W are matrix polynomials, then W is a left divisor of P, Q.

• Two matrix polynomials P and Q of dimensions N × k and q × k respectively, are said to be (left or right) coprime if and only if they have only unimodular common (left or right) divisors.

• A rational transfer function matrix P(L) is said to be proper if lim_{L→∞} P(L) < ∞, and strictly proper if lim_{L→∞} P(L) = 0.

• A polynomial matrix of full column rank is said to be irreducible if and only if its rows are right coprime (Kailath (1980)).
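A tiny numeric illustration of the unimodularity definition, with polynomials in L stored as coefficient arrays in ascending powers. The example matrix M(L) = [[1, L], [0, 1]] is mine, not from the text: its determinant is the non-zero constant 1, and its inverse [[1, −L], [0, 1]] is again a polynomial matrix.

```python
import numpy as np

# a polynomial in L is a coefficient array in ascending powers of L
def pmul(a, b):
    return np.convolve(a, b)

def padd(a, b):
    n = max(len(a), len(b))
    return np.pad(a, (0, n - len(a))) + np.pad(b, (0, n - len(b)))

# example (assumed for illustration): M(L) = [[1, L], [0, 1]]
M = [[np.array([1.0]), np.array([0.0, 1.0])],
     [np.array([0.0]), np.array([1.0])]]
# det M(L) = 1*1 - L*0 = 1, a non-zero constant, so M(L) is unimodular
det = padd(pmul(M[0][0], M[1][1]), -pmul(M[0][1], M[1][0]))
# its inverse [[1, -L], [0, 1]] is again a polynomial matrix; check the
# (1,2) entry of M(L) @ Minv(L), which should be the zero polynomial
Minv = [[np.array([1.0]), np.array([0.0, -1.0])],
        [np.array([0.0]), np.array([1.0])]]
prod01 = padd(pmul(M[0][0], Minv[0][1]), pmul(M[0][1], Minv[1][1]))
```

The same coefficient-array arithmetic extends to checking divisors and coprimeness of small matrix polynomials by hand.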

Appendix 2 Proofs of Theorems

Proof of Theorem 2.2.1: Consider the spectral representation of two dynamic factor models M and M*. Without loss of generality, assume that k* ≤ k. Under the stated conditions, for any j = 1, ..., k we can order the variables such that

    β_ω = [β_1; β_j; β_2; β_3]   and   β*_ω = [β*_1; β*_j; β*_2; β*_3]

(stacked block rows) for each value of ω, where β_1 and β_2 are full-rank k × k matrices, β_j is a 1 × k vector, β*_1 and β*_2 are k × k* matrices and β*_j is a 1 × k* vector. The subscript ω is suppressed in order to simplify the notation.

For each value of ω we may write

    β_ω S^f_ω β^H_ω = [ β_1 S^f β_1^H   β_1 S^f β_j^H   β_1 S^f β_2^H   β_1 S^f β_3^H
                        β_j S^f β_1^H   β_j S^f β_j^H   β_j S^f β_2^H   β_j S^f β_3^H
                        β_2 S^f β_1^H   β_2 S^f β_j^H   β_2 S^f β_2^H   β_2 S^f β_3^H
                        β_3 S^f β_1^H   β_3 S^f β_j^H   β_3 S^f β_2^H   β_3 S^f β_3^H ]

where ω is again suppressed, and we may write β*_ω S^f*_ω β*^H_ω in a similar fashion. Consider the (k + 1) × (k + 1) submatrix of β_ω S^f_ω β^H_ω occupying block rows 1 and 2 and block columns 2 and 3,

    V = [ β_1 S^f β_j^H   β_1 S^f β_2^H
          β_j S^f β_j^H   β_j S^f β_2^H ]

and the corresponding (k + 1) × (k + 1) submatrix of β*_ω S^f*_ω β*^H_ω,

    V* = [ β*_1 S^f* β*_j^H   β*_1 S^f* β*_2^H
           β*_j S^f* β*_j^H   β*_j S^f* β*_2^H ]

Since β_ω S^f_ω β^H_ω is of rank k, V, being (k + 1) × (k + 1), must be singular. Hence,

    |V| = (−1)^k β_j S^f β_j^H |β_1 S^f β_2^H| + f(β, S^f) = 0        (2.6)

where f(·, ·) is a bounded, real-valued function of the elements of its matrix arguments.

If M and M* are observationally equivalent then S^x_ω = S^x*_ω. Since S^ε_ω and S^ε*_ω are diagonal, it follows that the off-diagonal elements of β*_ω S^f*_ω β*^H_ω are equal to the corresponding elements of β_ω S^f_ω β^H_ω. Thus, β*_1 S^f* β*_j^H = β_1 S^f β_j^H, β*_1 S^f* β*_2^H = β_1 S^f β_2^H and β*_j S^f* β*_2^H = β_j S^f β_2^H. It follows that

    V* = [ β_1 S^f β_j^H      β_1 S^f β_2^H
           β*_j S^f* β*_j^H   β_j S^f β_2^H ]

Similarly, since V* is a (k + 1) × (k + 1) submatrix of a matrix of rank k* ≤ k, we have

    |V*| = (−1)^k β*_j S^f* β*_j^H |β_1 S^f β_2^H| + f(β, S^f) = 0    (2.7)

Since β_1, S^f, and β_2 are of full rank, |β_1 S^f β_2^H| ≠ 0. Thus, equations (2.6) and (2.7) yield β*_j S^f* β*_j^H = β_j S^f β_j^H.

Similarly, under Assumption 2.4, it may be shown that the other diagonal elements of β*_ω S^f*_ω β*^H_ω are equal to the corresponding elements of β_ω S^f_ω β^H_ω for all ω. Thus, S^ε_ω = S^ε*_ω for all ω.

Proof of Theorem 2.2.2: Consider two observationally equivalent factor models M and M*. Without loss of generality, assume that k* < k. From Theorem 2.2.1 we have β_ω S^f_ω β^H_ω = β*_ω S^f*_ω β*^H_ω. Let β_1ω be a k × k full-rank sub-matrix of β_ω and β*_1ω the k × k* matrix of corresponding rows from β*_ω. Then we have β_1ω S^f_ω β_1ω^H = β*_1ω S^f*_ω β*_1ω^H. Since β_1ω is of full rank, |β*_1ω S^f*_ω β*_1ω^H| = |β_1ω S^f_ω β_1ω^H| > 0 ⇒ k* ≥ k, a contradiction.

Proof of Lemma 1: From Theorems 2.2.1 and 2.2.2, S^ε_ω and k are identified. Therefore, if M and M* are observationally equivalent, it must be true that

    β_ω S^f_ω β^H_ω = β*_ω S^f*_ω β*^H_ω

Under the assumptions we may partition β_ω into a full-rank k × k matrix β_1ω and a (N − k) × k matrix β_2ω, and partition β*_ω similarly. Thus, we have

    [β_1ω; β_2ω] S^f_ω (β_1ω^H  β_2ω^H) = [β*_1ω; β*_2ω] S^f*_ω (β*_1ω^H  β*_2ω^H)

a set of four matrix equations, two of which are

    β_1ω S^f_ω β_1ω^H = β*_1ω S^f*_ω β*_1ω^H                          (2.8)
    β_2ω S^f_ω β_1ω^H = β*_2ω S^f*_ω β*_1ω^H                          (2.9)

From (2.8) we can write

    S^f*_ω = M_ω S^f_ω M_ω^H                                          (2.10)

where

    M_ω = β*_1ω^{−1} β_1ω                                             (2.11)

Since

    β*_1ω = β_1ω M_ω^{−1}                                             (2.12)

and

    S^f_ω = M_ω^{−1} S^f*_ω M_ω^{−H}

from (2.9) we have

    β_2ω M_ω^{−1} S^f*_ω M_ω^{−H} β_1ω^H = β*_2ω S^f*_ω M_ω^{−H} β_1ω^H

so

    β*_2ω = β_2ω M_ω^{−1}                                             (2.13)

Stacking (2.12) and (2.13) gives

    β*_ω = β_ω M_ω^{−1}                                               (2.14)

If β and β* are irreducible, since β_1 and β*_1 are constructed from rows of β and β* respectively, it follows that β and β_1 are right coprime and that β* and β*_1 are right coprime. Thus, from the Simple Bezout Identity (Kailath (1980), section 6.3) there exist polynomial matrices X_1ω, X_2ω, X_3ω, and X_4ω such that

    X_1ω β_ω + X_2ω β_1ω = I                                          (2.15)
    X_3ω β*_ω + X_4ω β*_1ω = I                                        (2.16)

Substituting (2.14) and (2.11) into (2.15) and (2.16) yields M_ω^{−1} = X_1ω β*_ω + X_2ω β*_1ω and M_ω = X_3ω β_ω + X_4ω β_1ω. Since X_1ω, X_2ω, X_3ω, X_4ω, β_ω, β_1ω, β*_ω, and β*_1ω are polynomial matrices, M_ω and M_ω^{−1} are polynomial matrices. It follows that M_ω is unimodular.

Proof of Theorem 2.2.3: A permutation matrix is defined as a square matrix for which each column and each row has exactly one element with a value of unity; all other elements are zero.

From Theorem 2.2.2 and Lemma 1 we need only consider k-factor models of the form

    S^x_ω = β*_ω S^f*_ω β*^H_ω + S^ε_ω

where β*_ω = β_ω M_ω^{−1}, S^f*_ω = M_ω S^f_ω M_ω^H, and M_ω is a k × k unimodular operator.

Define β_1ω and β*_1ω as k × k lower triangular matrices constructed from rows of β_ω and β*_ω respectively. We may then write M_ω = β*_1ω^{−1} β_1ω. Since β*_1ω is lower triangular, so is β*_1ω^{−1}. It follows that M_ω is also lower triangular. Now consider S^f*_ω = M_ω S^f_ω M_ω^H. Since M_ω is lower triangular and S^f_ω and S^f*_ω diagonal, we have M_ω^{−1} S^f*_ω = S^f_ω M_ω^H, where the left-hand side is lower triangular and the right-hand side upper triangular. It follows that M_ω is diagonal. Since, from Lemma 1, M_ω is also unimodular, it must be a constant. Thus, S^f*_ω is a rescaling of S^f_ω.

Proof of Theorem 2.2.4: From Lemma 1,

    ∑_{j=0}^{max_{i,j} q*_{ij}} β*_j e^{−ijω} = ∑_{l=0}^{s} ∑_{j=0}^{max_{i,j} q_{ij}} β_j M_l e^{−i(j+l)ω}

for finite, non-negative s. Matching terms, it is clear that the above equality holds if and only if max_{i,j} q*_{ij} ≥ max_{i,j} q_{ij}. The unimodularity of M_ω under the assumed conditions allows us to similarly argue that max_{i,j} q_{ij} ≥ max_{i,j} q*_{ij}. Therefore max_{i,j} q*_{ij} = max_{i,j} q_{ij}, and we may match the terms in the above equation and solve to find that M_ω = M, a real constant. The factor spectra of the set of observationally equivalent models are therefore S^f*_ω = M S^f_ω M′. For this equality to hold, it is required that (M_i ⊙ M_j) diag(S^f_ω) = 0 for i ≠ j, where M_l is the lth row of M and ⊙ denotes the Hadamard product. Therefore, under Assumption 1.2, (M_i ⊙ M_j) = 0 for i ≠ j. Thus, MM′ is diagonal. This allows us to write M = M_d^{1/2} M_o, where M_d is a k × k non-singular diagonal matrix and M_o is a k × k orthogonal matrix. We then have M_o S^f_ω M_o^H = M_d^{−1} S^f*_ω. The diagonality of the right-hand side implies that M_d^{−1} S^f*_ω is the matrix of eigenvalues of S^f_ω. Since S^f_ω is diagonal, M_d = I_k and S^f*_ω is the spectrum of a permutation of the factor vector f_t.


Chapter 3

Principal Components Estimation of Large-Scale Factor Models

The dynamic factor model described in Chapter 2 is sufficiently parsimonious to be used in cases in which the number of variables is somewhat larger than what might normally be used to estimate a vector autoregression. However, this does not mean that it is necessarily suitable no matter how large the number of variables. In cases in which the number of variables is of the same order of magnitude as the number of observations, three problems exist. Firstly, the computational cost of the estimation procedure in Chapter 2 becomes prohibitive. With the exception of models with extremely crude dynamics, maximum likelihood estimation of dynamic factor models becomes computationally infeasible quite quickly as the number of variables in the model grows. Secondly, the assumption that the spectrum of the error process is diagonal becomes harder to believe the larger the number of variables in the model. The magnitude of the bias caused by the presence of cross-correlation between the errors is apparently unknown. While it might be expected that this bias would be small in cases where there was only a small amount of error cross-correlation, there is no good general reason to expect the amount of correlation to remain small as the number of variables grows. The third problem is that the standard asymptotic theory for likelihood estimation assumes that the number of variables is fixed while the number of observations goes to infinity. A more appropriate asymptotic model in the case in which the number of variables is of the same order of magnitude as the number of observations is one in which the number of variables and the number of observations diverge jointly.

In recent years there has been much interest in estimating the parameters of large-scale factor models using principal components techniques. An obvious advantage of this approach is the relative ease with which the eigenvalues and eigenvectors of large symmetric matrices can be computed. In particular, when computing the first k sample principal components of a sequence of T N × 1 random vectors, it is not required that T > N. There now exists a substantial body of asymptotic theory that considers the behaviour of principal component based estimators as (T, N) −→ (∞,∞). Consider the static factor model

    x_t = Bf_t + ε_t
    y_{t+h} = β′f_t + γ′z_t + η_t

where x_t is a N × 1 vector of observable variables, y_t is a scalar observable variable, z_t is a m × 1 vector of predetermined variables (which may include lags of the dependent variable), f_t is a k × 1 vector of unobservable factors, ε_t is a N × 1 vector of unobservable errors, η_t is a scalar unobservable error, B is a N × k matrix of non-random factor loadings and β and γ are vectors of regression coefficients. Let ŝ_ft be a vector containing the sample principal components corresponding to the first k sample eigenvalues of S_xx = (1/T) ∑_{t=1}^T x_t x_t′, let δ̂ = (β̂′ γ̂′)′ be the OLS estimator of δ = (β′ γ′)′ computed using the sample principal components in place of the unobservable factors, and let ŷ_{T+h} = δ̂′(ŝ′_fT z′_T)′ be an estimator of the infeasible population forecast δ′(f′_T z′_T)′. Stock and Watson (2002a) prove that, under certain conditions, as (T, N) −→ (∞,∞), ŝ_ft →p Lf_t, δ̂ →p diag(L, I_m)δ and ŷ_{T+h} →p δ′(f′_T z′_T)′, where L is a k × k sign matrix¹. Under slightly different conditions, Bai and Ng (2002) prove that, for a fixed value of t, min(N, T)‖ŝ_ft − H_{N,T} f_t‖² = O_p(1), where H_{N,T} is a sequence of non-singular matrices. Bai (2003) shows that, under similar conditions, as (T, N) −→ (∞,∞), if √N/T −→ 0 then √N(ŝ_ft − H_{N,T} f_t) converges to a Gaussian distribution, and that if √N/T ≥ τ > 0 then T(ŝ_ft − H_{N,T} f_t) = O_p(1), where the value of t is fixed. When √T/N −→ 0 he shows that √T(λ̂_i^{1/2} q̂_i − H_{NT}^{−1} b_i) converges to a Gaussian distribution, where λ̂_i is the ith eigenvalue of S_XX = (1/TN)X′X, q̂_i is the corresponding eigenvector, and b_i is the ith row of B. If √T/N ≥ τ > 0 then N(λ̂_i^{1/2} q̂_i − H_{NT}^{−1} b_i) = O_p(1). Bai (2003) also proves asymptotic Gaussianity for the estimator of the common component and provides a uniform bound on the factors of max_{1≤t≤T}‖ŝ_ft − H_{N,T} f_t‖² = O_p[max(T^{−1/2}, √T/N)]. Bai and Ng (2006) also prove that, as (T, N) −→ (∞,∞), if √T/N −→ 0, √T(δ̂ − δ) converges to a Gaussian distribution, and that if √N/T −→ 0, (ŷ_{T+h} − y_{T+h})/√(var(ŷ_{T+h})) →d N(0, 1), where ŷ_{T+h} is a forecast of y_{T+h} computed using the OLS estimates and the principal components estimator of the factor.

¹That is, a k × k diagonal matrix with each diagonal element either 1 or −1.
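The static principal components estimator discussed above can be sketched in a few lines: simulate x_t = Bf_t + ε_t, take the eigenvectors of S_xx belonging to the k largest eigenvalues, and recover the factor up to the sign and scale indeterminacy embodied in L and H_{N,T}. The dimensions and parameter values below are illustrative, not drawn from any of the cited studies.

```python
import numpy as np

rng = np.random.default_rng(7)
T, N, k = 500, 100, 1                            # illustrative dimensions
B = rng.uniform(0.5, 1.5, size=(N, k))           # factor loadings
f = rng.standard_normal((T, k))                  # factors with E(f f') = I_k
X = f @ B.T + rng.standard_normal((T, N))        # x_t = B f_t + e_t (strict model)

Sxx = X.T @ X / T                                # sample covariance (zero means)
eigval, eigvec = np.linalg.eigh(Sxx)             # eigh returns ascending order
Qk = eigvec[:, ::-1][:, :k]                      # eigenvectors of k largest eigenvalues
s_f = X @ Qk                                     # sample principal components
# the estimate recovers f_t only up to sign and scale (the role of L, H_{N,T})
corr = np.corrcoef(s_f[:, 0], f[:, 0])[0, 1]
```

Note that only the N × N matrix S_xx is eigendecomposed, so T > N is never required, which is the computational advantage the text emphasises.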

Forni et al. (2000) and Forni et al. (2004) consider the use of dynamic principal components techniques to estimate the dynamic factor model

    x_t = B(L)f_t + ε_t

They prove consistency as (T, N) −→ (∞,∞). However, since this chapter does not consider the dynamic model, further details of their findings will not be provided here.

An important feature of the results outlined above is that they apply to the `approximate' factor model, which allows for a degree of cross-correlation between the errors. Specifically, if Ψ = E(ε_t ε_t′) and Ψ_ij is the i, jth element of Ψ, then Stock and Watson (2002a) and Bai and Ng (2002) assume that there exists a finite bound M such that (1/N) ∑_{i=1}^N ∑_{j=1}^N |Ψ_ij| < M. Bai (2003) and Bai and Ng (2006) make the slightly stronger assumption that ∑_{j=1}^N |Ψ_ij| < M for all i, i.e. the rows of the error covariance matrix are uniformly bounded. Forni et al. (2000) and Forni et al. (2004) assume that the dynamic eigenvalues of the error spectrum are uniformly bounded, which is implied by a similar absolute summability condition on the rows of the error spectral density matrix. Clearly these are weaker assumptions than the diagonal error covariance that is assumed for the `strict' factor model. However, it is possible that the `approximate' factor model could still be too restrictive for the type of macroeconomic analysis that has appeared in the literature. It is not hard to believe that increasing the number of variables in a factor model would cause the absolute row sums of the error covariance to increase without a fixed bound, rendering the existing theory inapplicable.

Empirical support for principal components estimation of macroeconomic factor models is encouraging, but not uniformly so. Some studies, e.g. Stock and Watson (2002b), Brisson et al. (2003), Schneider and Spitzer (2004) and Camacho and Sancho (2003), find large improvements in macroeconomic forecasting performance from the use of factors, with mean squared forecasting errors reduced by over 40% compared to scalar autoregressions in some cases. Others, e.g. Angelini et al. (2001), Giacomini and White (2003), Eklund and Karlsson (2007), Schumacher (2005) and Banerjee and Marcellino (2006), find little benefit from the use of principal component estimates of factors in forecasting models. Interestingly, there also exists some evidence that it is possible for the number of variables in the factor model to be too large. Boivin and Ng (2006) find that 40 carefully chosen variables can yield better results than 147 variables when forecasting 8 measures of economic activity and inflation for the US. Inklaar et al. (2003) consider the construction of a coincident indicator for the Euro area and find that a factor model estimated using 38 carefully chosen macroeconomic variables produces an indicator that is at least as good as that produced by a factor model estimated using their entire database of 246 variables. Schneider and Spitzer (2004) consider forecasting Austrian GDP using a dynamic factor model estimated by dynamic principal components. They find that models that include only 5 to 11 variables perform significantly better than a model with 143 variables. den Reijer (2005) considers using a dynamic factor model of 370 variables to forecast Dutch GDP, but finds that models of 147 and 223 carefully chosen variables perform better.

Boivin and Ng (2006) also perform Monte Carlo simulations with different degrees of error cross-correlation and with varying factor loadings, and demonstrate that increasing the number of variables in the factor model can worsen the performance of forecasts and factor estimators in some cases in which the degree of error cross-correlation grows with N. Despite the importance of these simulation results, to my knowledge, there is yet to appear any theory that deals with cases in which the absolute row sums of the error covariance matrix grow without a fixed bound as the number of variables increases. Furthermore, it is not clear how applied researchers might measure the degree of error cross-correlation in order to make a judgement about the suitability of the principal components estimator in any particular application. Given the patchy empirical performance of large scale factor models, this is an important area for theoretical research.

Section 3.1 of this chapter presents some new theory for principal component estimators of static factor quantities in a setting in which (T, N) −→ (∞,∞). Consistency is proved under conditions which allow the absolute row sums of the error covariance matrix to grow at a rate of O(N^{1−α}) where 0 < α ≤ 1. However, the faster the growth in error cross-correlation, the slower is the rate of convergence of the estimators. Furthermore, it is shown that what really matters for estimation is not the number of variables per se, but rather the magnitude of the noise-to-signal ratio, which is measured as σ²/λ_k, where σ² is the largest eigenvalue of the error covariance matrix Ψ, and λ_k is the kth eigenvalue of Ω = E((1/T)X′X). When the noise-to-signal ratio is small, population principal component and population factor quantities will be similar. While the noise-to-signal ratio is not generally identified, it is possible to construct lower bounds for it which are. In Section 3.2 a hypothesis test for the magnitude of the noise-to-signal ratio is proposed. This test is developed in a framework in which N is fixed and T −→ ∞, rather than one in which (N, T) −→ (∞,∞) jointly. However, simulations show that it performs reasonably well in some cases in which N is large compared to the number of observations. In Section 3.3 this test is used to consider whether the factor model estimated by Stock and Watson (2002b) has a small noise-to-signal ratio.

3.1 Theory

The approach taken in this chapter is to consider the decomposition

    ŝ_ft − Lf_t = (ŝ_ft − L_1 s_ft) + (L_1 s_ft − Lf_t)              (3.1)

where ŝ_ft is a vector containing the first k sample principal components of the observable vector x_t, s_ft is the vector containing the corresponding population principal components, f_t is the k × 1 factor vector, and L and L_1 are k × k sign matrices. That is, the difference between sample principal components and population factors will be considered by separate consideration of the difference between sample principal components and population principal components and the difference between population principal components and population factors. This section presents four sets of theoretical results which are relevant to this decomposition.

(i) In Subsection 3.1.1 the work of Schneeweiss (1997) is extended to provide a set of bounds on the `distance' between population principal component quantities and their analogous factor quantities (Theorem 3.1.1). In particular, an upper bound is placed on E‖s_ft − Lf_t‖²_F (Theorem 3.1.1(c)). The key parameter in these bounds is the noise-to-signal ratio (ρ), which is measured as

    ρ = σ²/λ_k

where σ² is the largest eigenvalue of the error covariance matrix Ψ, and λ_k is the kth eigenvalue of Ω = E((1/T)X′X). When the noise-to-signal ratio is small, population principal component and population factor quantities will be similar. It is well-known that for a matrix A, ‖A‖₂² ≤ ‖A‖₁‖A‖_∞. Since Ψ, the covariance matrix of the errors, is symmetric, the following Lemma is true.

Lemma 2. σ² ≤ max_i ∑_{j=1}^N |Ψ_ij|

This lemma will be used subsequently to provide a link between the noise-to-signal ratio and bounding conditions on the row sums of the absolute value of the error covariance similar to those which have been employed in the literature.

(ii) In Subsection 3.1.2 some conditions are presented under which the noise-to-signal ratio shrinks as N grows (Theorem 3.1.2).

(iii) In Subsection 3.1.3 the relationship between sample principal components and population principal components is considered. Under conditions whereby the distance between each of the first k eigenvalues and all other eigenvalues grows at a rate of N, it is shown (Theorem 3.1.3) that sample principal component quantities are √T-consistent estimators of population principal component quantities in a framework in which (N, T) −→ (∞,∞) jointly.

(iv) In Subsection 3.1.4 the results in subsections 3.1.1, 3.1.2 and 3.1.3 are used with Equation (3.1) to develop results (Theorem 3.1.4) which give conditions under which sample principal component quantities are consistent estimators of population factor quantities.
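The bound in Lemma 2 and the noise-to-signal ratio ρ = σ²/λ_k can be checked numerically. The matrices below are arbitrary illustrations, chosen only so that Ψ is symmetric positive definite with some cross-correlation.

```python
import numpy as np

rng = np.random.default_rng(3)
N, k = 50, 2
B = rng.standard_normal((N, k))              # factor loadings

# an arbitrary symmetric positive definite error covariance
A = 0.2 * rng.standard_normal((N, N))
Psi = A @ A.T + np.eye(N)

Omega = B @ B.T + Psi                        # covariance of x_t, using E(f f') = I_k
sigma2 = np.linalg.eigvalsh(Psi).max()       # largest eigenvalue of Psi
lam_k = np.sort(np.linalg.eigvalsh(Omega))[::-1][k - 1]  # k-th eigenvalue of Omega
rho = sigma2 / lam_k                         # noise-to-signal ratio

# Lemma 2: sigma2 is bounded by the largest absolute row sum of Psi
row_bound = np.abs(Psi).sum(axis=1).max()
```

This is why the absolute-row-sum conditions used in the literature control σ², and hence ρ, whenever λ_k grows with N.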

The most commonly used notation is defined below. Less frequently used notation will be defined when it is used.

The factor model, and the factor regression equation, are written as

x_t = Bf_t + ε_t   (3.2)
y_t = β′f_t + η_t   (3.3)

where x_t is an N×1 vector of observed variables and y_t is a scalar observed variable. ε_t is an N×1 error vector, and η_t is a scalar error term. f_t is a k×1 factor vector, B is an N×k matrix of factor loadings, and β is a k×1 regression vector. All random variables are assumed to have expected values of zero. Without loss of generality, it is assumed that E(f_t f_t′) = I_k. The covariance matrix of x_t is denoted Ω and the covariance matrix of ε_t is denoted Ψ. Therefore,

Ω = BB′ + Ψ

Λ is an N×N diagonal matrix containing the eigenvalues of Ω in descending order, λ_1, λ_2, ..., λ_N. Q is the N×N matrix whose columns are the eigenvectors corresponding to the diagonal elements of Λ. Λ is partitioned so that the top left k×k block Λ_f contains the first k eigenvalues. The remaining N−k eigenvalues are contained in the lower right block Λ_⊥. Q is similarly partitioned into the N×k matrix Q_f and the N×(N−k) matrix Q_⊥. Therefore

Λ = [Λ_f 0; 0 Λ_⊥]   and   Q = [Q_f Q_⊥]

The population principal component vector s_t is partitioned into the first k principal components s_ft and the remaining N−k principal components s_⊥t. Therefore we have

s_t = (s_ft′ s_⊥t′)′ = Λ^{−1/2} Q′ x_t

An OLS estimate of the regression coefficient in Equation (3.3) computed using the population factors is

β_f = (1/T) Σ_{t=1}^T f_t y_t

The regression coefficient computed using the population principal components in place of the factors is

β_s = (1/T) Σ_{t=1}^T s_ft y_t

The population forecast for period T+h computed at time T using the factors is defined as

y^f_{T+h} = β_f′ f_{T+h}

The forecast for period T+h computed at time T using the population principal components instead of the factors is defined as

y^s_{T+h} = β_s′ s_{f,T+h}

The k×k diagonal matrix containing the non-zero eigenvalues of BB′ in descending order, d_1, d_2, ..., d_k, is denoted D. The N×k matrix containing as columns the corresponding eigenvectors is denoted U. Therefore

BB′ = UDU′

The largest eigenvalue of the error covariance matrix Ψ is denoted σ². The noise-to-signal ratio (ρ) is defined as

ρ = σ²/λ_k = maxeig(Ψ)/λ_k

The sample covariance matrix of x_t is denoted

S_xx = (1/T) Σ_{t=1}^T x_t x_t′

Sample quantities that are derived from S_xx are denoted by the same notation used for their population counterparts, but with a `hat' to indicate that they are sample estimates. Therefore, the eigenvalues and eigenvectors of S_xx are given by

Λ̂ = [Λ̂_f 0; 0 Λ̂_⊥]   and   Q̂ = [Q̂_f Q̂_⊥]

the sample principal components are given by

ŝ_t = (ŝ_ft′ ŝ_⊥t′)′ = Λ̂^{−1/2} Q̂′ x_t

the sample principal component regression estimator is

β̂_s = (1/T) Σ_{t=1}^T ŝ_ft y_t

and the sample forecast is

ŷ^s_{T+h} = β̂_s′ ŝ_{f,T+h}
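The sample quantities above map directly onto a few lines of linear algebra. A minimal sketch (assuming numpy; the simulated model and its parameter values are illustrative, not from the thesis):

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, k = 500, 20, 2

# Illustrative k-factor model: x_t = B f_t + eps_t, y_t = beta' f_t + eta_t
B = rng.normal(size=(N, k))
F = rng.normal(size=(T, k))                  # rows are f_t', with E(f_t f_t') = I_k
beta = np.array([1.0, -0.5])
X = F @ B.T + 0.1 * rng.normal(size=(T, N))
y = F @ beta + 0.1 * rng.normal(size=T)

# Sample covariance S_xx = (1/T) sum_t x_t x_t'
Sxx = X.T @ X / T

# Eigendecomposition; numpy returns ascending eigenvalues, so reverse to descending
lam_hat, Q_hat = np.linalg.eigh(Sxx)
lam_hat, Q_hat = lam_hat[::-1], Q_hat[:, ::-1]

# First k sample principal components: s_ft = Lambda_f^{-1/2} Q_f' x_t (one row per t)
S_f = X @ Q_hat[:, :k] / np.sqrt(lam_hat[:k])

# beta_s = (1/T) sum_t s_ft y_t -- valid as OLS since (1/T) sum_t s_ft s_ft' = I_k
beta_s_hat = S_f.T @ y / T

# Sample forecast for the last period, treated here as T+h
y_forecast = S_f[-1] @ beta_s_hat
```

The division by √λ̂ normalises the components so that their sample covariance is exactly I_k, which is why the simple average β̂_s coincides with OLS on the sample principal components.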

3.1.1 Population Principal Components and Population Factors

In this subsection the relationship between population principal component quantities and population factor quantities is considered. It answers a fundamental question which has not received direct attention in the econometric literature on large factor models; that is: under what conditions are factors and principal components similar? While the assumptions made by Bai and Ng (2002), Bai (2003), Bai and Ng (2006) and Stock and Watson (2002a) constitute an answer to this question, their theoretical setup creates the need to deal with sampling issues in a dual asymptotic framework, which somewhat obscures the fundamental issue at hand. A consideration of population quantities alone simplifies the problem and allows us to provide a concise and explicit answer to this question.

Bearing in mind that principal component analysis and factor analysis are widely used techniques which have co-existed for around 75 years, it is surprising that this question has received so little attention in the literature. Nonetheless, some researchers have considered this issue in the past. In the context of a generalisation of the results on arbitrage pricing and factor structure of Ross (1976), Chamberlain and Rothschild (1983) show that for an approximate factor model, the eigenvectors of the population covariance matrix of x_t are asymptotically equivalent to the population factor loadings as N gets large. Bentler and Kano (1990) consider a single-factor model and show that as N → ∞ the correlation between the first population principal component and the population factor converges to one, and the principal component loading vector converges to the factor loading vector. In a significant paper, Schneeweiss and Mathes (1995) consider a k-factor static model and analyse the sum of the canonical correlation coefficients between the population factors and the population principal components. They show that this sum approaches k as σ²/d_k → 0, where σ² is the largest eigenvalue of Ψ and d_k is the smallest eigenvalue of B′B. They also prove similar results for the factor loadings and the principal component loadings. Under similar conditions, Schneeweiss (1997) proves that ‖BD^{−1/2} − Q_f L‖_F → 0 and E‖s_ft − Lf_t‖_F → 0, where D is a diagonal k×k matrix containing the ordered eigenvalues of B′B, Q_f is the N×k matrix containing the eigenvectors of Ω = E(x_t x_t′) corresponding to the first k eigenvalues, s_ft is a vector containing the first k principal components of x_t, L is a k×k sign matrix² and ‖·‖_F denotes the Frobenius norm. The work of Schneeweiss (1997) is significant since, in the long history of the principal component analysis and factor analysis literatures, it is the first paper to provide a detailed account of the `distance' between principal components and factors. Since it deals only with population quantities it has nothing explicit to say about sampling issues. However, it provides substantial insight into the structure of the relationship between principal components and factors which is subsequently used in this chapter to develop a sampling theory.

² i.e. the diagonal elements of L are all ±1.

The remainder of this subsection presents some new theory linking population principal component quantities to their analogous population factor quantities. In contrast to the asymptotic theorems of Schneeweiss (1997), the results produced in this subsection are bounds on the distances between principal component and factor quantities that hold for any number of variables.

Consider the factor model given by Equations (3.2) and (3.3).

Let

r² = ‖β‖² / (‖β‖² + σ²_η)

where σ²_η = E(η_t²). Note that r² is the proportion of the variance of y_t that is explained by the factors. Therefore, it may be interpreted as the population analogue of the R² statistic from regression analysis. Denote

δ = Σ_{j=1}^∞ |E(y_t y_{t−j})/E(y_t²)| + sup_i Σ_{j=1}^∞ |E(y_t s_{i,t−j})| / √(E(y_t²)E(s_it²))

where s_it is the ith principal component measured at time t. This quantity appears in one of the bounds that is subsequently derived. Also denote

c = max_{1≤i≤k, 1≤j≤N, i≠j} λ_i/|λ_j − λ_i|

for k > 1, and c = 0 for k = 1. Note that c provides a measure of the relative closeness of adjacent eigenvalues. We define the forecast deviation as e_{T+h} = y^s_{T+h} − y_{T+h} = β_s′ s_{f,T+h} − y_{T+h}.

Theorem 3.1.1 presents a set of bounds for the differences between principal component quantities and their corresponding factor model quantities. In each case the bound is a function of the noise-to-signal ratio ρ = σ²/λ_k. Consequently, provided that the noise-to-signal ratio is sufficiently small, an analysis of population factor quantities may be undertaken by considering the analogous principal component quantities. All proofs are in the appendix.

Theorem 3.1.1. For the factor model described above,

(a) 1 − ρ ≤ d_i/λ_i ≤ 1 for i = 1, .., k.

(b) if k = 1 or c ≤ (1/2ρ)√((1−ρ)/(k−1)), then there exists a sign matrix L such that ‖Q_f − UL‖²_F ≤ k(ρ + 4c²ρ²(k−1)).

(c) if k = 1 or c ≤ (1−ρ)/(2ρ√((k−1)(1−ρ))), then there exists a sign matrix L such that E‖s_ft − Lf_t‖²_F ≤ k(2ρ + ρ²(4c²(k−1)(1−ρ) − 1)).

(d) if k = 1 or c ≤ (1−ρ)/(2ρ√((k−1)(1−ρ))), then there exists a sign matrix L such that E‖β_s − Lβ_f‖²_F ≤ σ²_y k(2ρ + ρ²(4c²(k−1)(1−ρ) − 1)).

(e) if f_t, ε_t and η_t are Gaussian and γ < ∞, then E|e_{T+h}|/√(β′β) ≤ √(k(ρ² + 2kδ/(r²T))) + √(kρ(1 + 2γ/(r²T))).
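Part (a) of Theorem 3.1.1 is a deterministic consequence of eigenvalue perturbation (Weyl-type) inequalities, so it can be verified exactly on any synthetic model. A minimal sketch (assuming numpy; the loadings and error variances below are illustrative, not from the thesis):

```python
import numpy as np

rng = np.random.default_rng(2)
N, k = 40, 3

# Illustrative model: strong loadings, small idiosyncratic variances
B = rng.normal(size=(N, k)) * 3.0
Psi = np.diag(rng.uniform(0.1, 0.5, size=N))
Omega = B @ B.T + Psi

d = np.sort(np.linalg.eigvalsh(B @ B.T))[::-1][:k]   # d_1 >= ... >= d_k
lam = np.sort(np.linalg.eigvalsh(Omega))[::-1]       # lambda_1 >= lambda_2 >= ...
rho = np.linalg.eigvalsh(Psi)[-1] / lam[k - 1]       # noise-to-signal ratio

ratios = d / lam[:k]
# Theorem 3.1.1(a): 1 - rho <= d_i / lambda_i <= 1 for i = 1, .., k
assert np.all(ratios <= 1 + 1e-9) and np.all(ratios >= 1 - rho - 1e-9)
```

The upper bound follows because Ω = BB′ + Ψ dominates BB′ (so λ_i ≥ d_i), and the lower bound because λ_i ≤ d_i + σ².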

The following asymptotic results follow trivially from Theorem 3.1.1:

Corollary 1.

(a) d_i/λ_i → 1 as ρ → 0;

(b) If c < c̄ < ∞ then there exists a sign matrix L such that ‖Q_f − UL‖_F → 0 as ρ → 0;

(c) If c < c̄ < ∞ then there exists a sign matrix L such that ‖s_ft − Lf_t‖_F →_p 0 as ρ → 0;

(d) If c < c̄ < ∞ then there exists a sign matrix L such that ‖Q_f Λ_f^{1/2} − BL‖_F → 0 as ρ → 0;

(e) If c < c̄ < ∞ then there exists a sign matrix L such that ‖β_s − Lβ_f‖_F →_p 0 as ρ → 0;

(f) If δ < δ̄ < ∞ then e_{T+h} →_p 0 as (1/T, ρ) → (0, 0).

Corollaries 1(a), 1(b) and 1(c) were previously proved by Schneeweiss (1997). Corollaries 1(d), 1(e) and 1(f) are new. Importantly, Theorem 3.1.1 is new and provides rates of convergence which are necessary for subsequent theorems.

Note that, in order to be non-trivial, Theorems 3.1.1(b), 3.1.1(c) and 3.1.1(d) require the first k eigenvalues of Ω to be distinct so that c is bounded. The distance between the relevant quantities in these theorems depends on the closeness of adjacent eigenvalues and on the noise-to-signal ratio. Theorem 3.1.1(e) assumes Gaussianity. It is quite likely that this assumption could be replaced by the assumption of an upper bound on sums of fourth moments; however, Gaussianity produces a result which is more easily interpretable. In any case, in the asymptotic arguments that follow, the assumption of Gaussianity will not be needed. In order for the principal component forecast and the theoretically optimal forecast to be close, we need the noise-to-signal ratio to be fairly small and the sample size to be reasonably large. Precisely how large the sample size needs to be will depend on the magnitude of the autocovariances of the forecast variable and the proportion of the variance of the forecast variable that is determined by the factors.

It should be noted that the computation of the noise-to-signal ratio requires knowledge of the eigenvalues of the error covariance matrix. In cases in which this information is unavailable, it may be useful to have a lower bound for ρ. It is shown in the proof of Theorem 3.1.1(a) that d_j + σ² ≥ λ_j ∀j = 1, .., N. Since d_{k+1} = 0, λ_{k+1} ≤ σ². We therefore have that λ_{k+1}/λ_k ≤ ρ, where λ_j is the jth eigenvalue of the covariance matrix of x_t. This expression makes it clear that in order for the noise-to-signal ratio to be small, implying that population principal components are close to factors, there must exist a large relative gap between the kth and (k+1)th eigenvalues of the covariance matrix of the observable variables. This is an interesting observation, since practitioners of principal component analysis will often plot sample eigenvalues, look for a large gap between two adjacent eigenvalues, and include in their subsequent analysis only those principal components that correspond to the set of larger eigenvalues. As the above analysis shows, these are the principal components which are likely to be the most strongly related to the factors in a factor representation of the variables. Consequently, in addition to accounting for a large proportion of the variance, these principal components are also likely to explain a large proportion of the correlation between the observable variables.

3.1.2 N and the Noise-to-Signal Ratio

The theory in Subsection 3.1.1 links the difference between population principal component and population factor quantities to the noise-to-signal ratio. This begs the question: under what conditions might the noise-to-signal ratio be expected to be small? Bai (2003) and Bai and Ng (2006) assume that (1/N)B′B → Σ_B > 0 as N → ∞. In the case where all factors are `strong' in the sense of Onatski (2006a), this implies that all k eigenvalues of B′B, and consequently the first k eigenvalues of Ω, λ_1, ..., λ_k, grow at a rate of N. Bai (2003) and Bai and Ng (2006) also assume that Σ_{j=1}^N |Ψ_ij| < M < ∞ for all i. Since Ψ is symmetric, it follows from Lemma 2 that σ² ≤ M. Since λ_k grows at a rate of N and σ² has a fixed upper bound, under these restrictions ρ = σ²/λ_k = O(N^{−1}). However, it is clear that these restrictions on the absolute row sums are stronger than is necessary for ρ → 0 as N → ∞, and some interesting cases do not satisfy these restrictions.

One particular case of interest is where the eigenvalues of B′B grow at a rate strictly less than N, so that the factors are `weak' in the sense that, as N → ∞, the proportion of tr(Ω) that is explained by the factors converges to zero. In such cases, provided that the rate of growth of the kth eigenvalue of B′B is greater than the rate of growth of the largest eigenvalue of Ψ, ρ will shrink as N → ∞. Consequently, population principal components and population factors will be close. If techniques for estimating principal components could be developed for this case, then the estimated principal components could be used as factor estimates. At present, however, it is not known how this can be done.

Another interesting case, which is the one explored in detail in this chapter, is when the eigenvalues of B′B grow at a rate of N and the largest eigenvalue of Ψ also grows. This is the case which is relevant when the `approximate' factor restriction does not hold, so that the error cross-correlation grows without bound as N → ∞. It is easy to see that, provided that the largest eigenvalue grows at a rate strictly less than N, the noise-to-signal ratio will shrink as N gets large. For subsequent use, this is now stated as a theorem.

Theorem 3.1.2. For the factor model described above, if

1. 0 < d_L < d_j/N < d_U < ∞ for j = 1, .., k, where d_j = eig_j(B′B) and k is a fixed scalar;

2. σ² = O(N^{1−α}), where 0 < α ≤ 1 and σ² = maxeig(Ψ);

then ρ = σ²/λ_k = O(N^{−α}). Furthermore, ρ̄ = σ²/d_k = O(N^{−α}).

Notice that, from Lemma 2, Assumption 2 is satisfied whenever Σ_{j=1}^N |Ψ_ij| = O(N^{1−α}). Therefore, with the eigenvalues of B′B growing at a rate of N, population principal component quantities can consistently estimate their population factor counterparts as N → ∞ even with the absolute row sums of the error covariance matrix diverging. Theorem 3.1.2 also makes a claim about the magnitude of an alternative noise-to-signal measure ρ̄ = σ²/d_k. This result will be used in the proof of a subsequent theorem.
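Theorem 3.1.2 can be illustrated numerically. The sketch below (assuming numpy; the design is illustrative, not from the thesis) uses a single strong factor together with equicorrelated error blocks of size ≈ √N, so that σ² = O(N^{1/2}), i.e. α = 1/2, even though the absolute row sums of Ψ diverge:

```python
import numpy as np

def noise_to_signal(N):
    B = np.ones((N, 1))                      # one strong factor: B'B = N
    m = int(np.sqrt(N))                      # error blocks of size ~ sqrt(N)
    Psi = np.zeros((N, N))
    for g in range(N // m):
        # Equicorrelated block: unit variances, correlation 0.5 within the block
        Psi[g*m:(g+1)*m, g*m:(g+1)*m] = 0.5 * np.ones((m, m)) + 0.5 * np.eye(m)
    Omega = B @ B.T + Psi
    lam_k = np.linalg.eigvalsh(Omega)[-1]    # k = 1, so lambda_k is the top eigenvalue
    sigma2 = np.linalg.eigvalsh(Psi)[-1]     # largest eigenvalue of Psi ~ 0.5*sqrt(N)
    return sigma2 / lam_k

# rho shrinks roughly like N^{-1/2} even though the row sums of |Psi| diverge
rhos = [noise_to_signal(N) for N in (25, 100, 400)]
assert rhos[0] > rhos[1] > rhos[2]
```

Here the row sums of |Ψ| grow like √N, which violates the uniform row-sum bounds of the earlier literature but still satisfies Assumption 2 of Theorem 3.1.2.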

With a connection now established between population principal component quantities and their population factor counterparts, what is required next is some theory linking sample principal components to population principal components in a setting in which (N, T) → (∞, ∞).

3.1.3 Sample Principal Components and Population Principal Components

This subsection presents some new consistency results for the sample eigenvalues, sample principal components, estimates of coefficients from sample principal component models, and forecasts conditional on sample principal components, constructed from a sequence of T N×1 random vectors, in a setting in which (T, N) → (∞, ∞) and a `gap' assumption is satisfied such that the distance between each of the first k eigenvalues and any other eigenvalue grows at a rate of at least N. This gap assumption is satisfied by the factor models considered in the following subsection. The setting here is different to the case covered by the classical asymptotic analysis of Anderson (1963), since N is assumed to be growing with T rather than remaining fixed. It is also different from the Random Matrix Theory framework, since the gap assumption forces the first k eigenvalues to grow at a rate of N. The formal statement of the assumptions is as follows.

Assumptions 3 (Theorem 3.1.3).

3.1 E(x_t) = 0 for t = 1, .., T;

3.2 E(x_t x_t′) = Ω for t = 1, .., T, and (1/N)tr(Ω) = O(1);

3.3 sup_t sup_N max_{1≤i≤N, 1≤j≤N} Σ_{r=0}^∞ |cov(x_it x_jt, x_{i,t−r} x_{j,t−r})| < γ < ∞;

3.4 Gap Assumption: ∃∆ > 0 such that ∆N < min_{1≤i≤k, 1≤j≤N, i≠j} |λ_j − λ_i|;

3.5 E(y_t) = 0; E(y_t²) = σ²_y.

Assumptions 3.1, 3.2 and 3.3 are fairly standard assumptions for time series and are made to ensure that S_xx − Ω →_p 0 on an element-by-element basis. Consider Assumption 3.3. In the Gaussian case

cov(x_it x_jt, x_{i,t−r} x_{j,t−r}) = E(x_it x_{i,t−r})E(x_jt x_{j,t−r}) + E(x_it x_{j,t−r})E(x_jt x_{i,t−r})

Suppose that x_t = w z_t, where w is an N×1 vector of ones and z_t is a scalar stationary Gaussian AR(1). Then

sup_t sup_N max_{1≤i≤N, 1≤j≤N} Σ_{r=0}^∞ |cov(x_it x_jt, x_{i,t−r} x_{j,t−r})| = 2σ⁴_z/(1 − θ²)

where σ²_z is the variance of z_t and θ is the autoregressive parameter, so Assumption 3.3 is satisfied. Assumption 3.5 is unremarkable. Assumption 3.4, the `gap' condition, is worthy of more detailed comment. It requires that each of the first k eigenvalues of the covariance matrix diverges from each of the last N−k eigenvalues at a rate of at least N. Since Assumption 3.2 limits the rate of growth of the sum of all the eigenvalues to N, this means that, at most, a finite number of eigenvalues can grow with N. The rest must be bounded. Note that this does not imply that λ_{k+1} is bounded. In fact, λ_{k+1} could grow at a rate as high as N without violating the assumptions made above. As a simple example to illustrate this point, suppose that λ_k = ℓ_k N and λ_{k+1} = ℓ_{k+1} N, where ℓ_k − ℓ_{k+1} > ℓ̄ > 0. Then ℓ̄N < |λ_k − λ_{k+1}|, so Assumption 3.4 is satisfied.
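The AR(1) calculation for Assumption 3.3 above can be checked by simulation. A rough sketch (assuming numpy; the parameter values are illustrative), estimating Σ_r |cov(z_t², z_{t−r}²)| over the first 30 lags and comparing it with 2σ⁴_z/(1−θ²):

```python
import numpy as np

rng = np.random.default_rng(3)
theta, sigma2_z = 0.6, 2.0                  # illustrative AR parameter and var(z_t)
T = 500_000

# Simulate a stationary Gaussian AR(1) with stationary variance sigma2_z
e = rng.normal(scale=np.sqrt(sigma2_z * (1 - theta**2)), size=T)
z = np.empty(T)
z[0] = rng.normal(scale=np.sqrt(sigma2_z))
for t in range(1, T):
    z[t] = theta * z[t-1] + e[t]

# Estimate sum_{r>=0} |cov(z_t^2, z_{t-r}^2)| over the first 30 lags
z2 = z**2 - np.mean(z**2)
total = sum(abs(np.mean(z2[r:] * z2[:T-r])) for r in range(30))

theoretical = 2 * sigma2_z**2 / (1 - theta**2)   # 2 * sigma_z^4 / (1 - theta^2)
assert abs(total - theoretical) < 0.15 * theoretical
```

Each lag contributes 2σ⁴_z θ^{2r} in the Gaussian case, so the geometric series sums to the stated bound; the truncation at 30 lags is negligible for |θ| = 0.6.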

Similarly, the first k eigenvalues can all grow at a rate of N and also diverge from each other at a rate of N. What is required is that only a finite number of eigenvalues grow, and that those eigenvalues be distinct. In Subsection 3.1.1, Corollaries 1(a) to (e) require that c = max_{1≤i≤k, 1≤j≤N, i≠j} λ_i/|λ_j − λ_i| < c̄ < ∞. It is easy to show that this condition implies Assumption 3.4 whenever the first k eigenvalues grow at a rate of N.

In the Appendix, the following results are proved.

Theorem 3.1.3.

(a) Under Assumptions 3.1, 3.2 and 3.3, max_{1≤j≤N} (1/N)|λ̂_j − λ_j| = O_p(T^{−1/2}).

(b) Under Assumptions 3.1 to 3.4, there exists a k×k sign matrix L such that ‖ŝ_ft − Ls_ft‖₂ = O_p(T^{−1/2}).

(c) Under Assumptions 3.1 to 3.5, there exists a k×k sign matrix L such that ‖β̂_s − Lβ_s‖₂ = O_p(T^{−1/2}).

(d) Under Assumptions 3.1 to 3.5, |ŷ^s_{T+h} − y^s_{T+h}| = O_p(T^{−1/2}).

Therefore, the scaled sample eigenvalues, the first k sample principal components, regression coefficients computed using the first k sample principal components, and forecasts computed using those regression coefficients, are all √T-consistent estimators of their population counterparts under the stated assumptions. Importantly, Theorem 3.1.3 does not require N to be fixed or small relative to T. Rather, it provides consistency results which hold for sequences in which T and N grow simultaneously, without placing any restrictions on the relationship between their growth rates.
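Theorem 3.1.3(a) can be illustrated by simulation: with the population covariance known, the maximum scaled eigenvalue error shrinks as T grows. A minimal sketch (assuming numpy; the model and sample sizes are illustrative, not from the thesis):

```python
import numpy as np

rng = np.random.default_rng(4)
N, k = 20, 2

# Known population covariance: two strong factors plus unit idiosyncratic noise
B = rng.normal(size=(N, k)) * 2.0
Omega = B @ B.T + np.eye(N)
lam = np.sort(np.linalg.eigvalsh(Omega))[::-1]
C = np.linalg.cholesky(Omega)

def max_scaled_eig_error(T):
    X = rng.normal(size=(T, N)) @ C.T        # iid draws x_t ~ N(0, Omega)
    lam_hat = np.sort(np.linalg.eigvalsh(X.T @ X / T))[::-1]
    return np.max(np.abs(lam_hat - lam)) / N

# The maximum scaled eigenvalue error behaves like O_p(T^{-1/2})
err_small_T, err_big_T = max_scaled_eig_error(200), max_scaled_eig_error(100_000)
assert err_big_T < err_small_T
```

With a 500-fold increase in T, the error should fall by roughly a factor of √500 ≈ 22, so the comparison above is comfortably in the expected direction.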


3.1.4 Sample Principal Components and Population Factors

The previous subsections give conditions under which population factors are close to population principal components, and conditions under which population principal components are close to sample principal components. In this subsection, these ideas are combined to produce theorems linking sample principal component quantities to population factor quantities. Specifically, conditions are presented under which sample principal component quantities are consistent estimators of analogous population factor quantities.

For the factor model given by Equations (3.2) and (3.3) the following assumptions are made.

Assumptions 4 (Theorem 3.1.4).

4.1 (a) 0 < d_L < d_j/N < d_U < ∞ for j = 1, .., k, where d_j = eig_j(B′B) and k is a fixed scalar;

(b) if k > 1 then c = max_{1≤i≤k, 1≤j≤N, i≠j} d_i/|d_j − d_i| < c̄ < ∞.

4.2 (a) (1/N)tr(Ψ) = O(1), where Ψ = E(ε_t ε_t′);

(b) σ² = O(N^{1−α}), where 0 < α ≤ 1 and σ² is the largest eigenvalue of Ψ.

4.3 (a) E(f_t) = 0, E(ε_t) = 0, E(f_t f_t′) = I_k, E(f_t ε_t′) = 0 for t = 1, .., T;

(b) E(η_t) = 0, E(η_t²) = σ²_η, E(f_t η_t) = 0 for t = 1, .., T;

4.4 sup_t sup_Ñ max_{1≤i≤Ñ, 1≤j≤Ñ} Σ_{r=0}^∞ |cov(v_it v_jt, v_{i,t−r} v_{j,t−r})| < γ < ∞, where v_t = (f_t′ ε_t′ η_t′)′ and Ñ = N + k + 1.

4.5 E(y_t) = 0; E(y_t²) = σ²_y.

Assumption 4.1(b) places a bound on the smallest possible gap between the first k eigenvalues of B′B. In Theorem 3.1.1 a similar expression for the eigenvalues of Ω appears in the bounds that link population principal component quantities to population factor quantities. It is shown in the Appendix that similar bounds may be derived in terms of the expression that appears in Assumption 4.1(b). Assumption 4.2(b) controls the growth rate of the largest eigenvalue of the error covariance matrix. In combination with Assumption 4.1(a), this ensures that a modified noise-to-signal ratio goes to zero as N gets large. Consequently, the differences between population principal component quantities and their population factor counterparts converge to zero as N gets large. Assumptions 4.1(a) and 4.1(b) ensure that the `gap' condition is met. In conjunction with the moment conditions in Assumptions 4.3(a), 4.3(b), 4.4 and 4.5, this ensures that sample principal component quantities converge to population principal component quantities as T gets large. The fact that these convergence results occur as (N, T) → (∞, ∞) simultaneously is stated in the following theorem.

Theorem 3.1.4. Under Assumptions 4.1 to 4.4,

(a) max_{1≤j≤k} |(1/N)λ̂_j − (1/N)d_j| = O_p[max(T^{−1/2}, N^{−α})];

(b) ‖ŝ_ft − Lf_t‖₂ = O_p[max(T^{−1/2}, N^{−α/2})];

(c) ‖β̂_s − Lβ_f‖₂ = O_p[max(T^{−1/2}, N^{−α/2})];

(d) If Assumption 4.5 also holds, |ŷ^s_{T+h} − y^f_{T+h}| = O_p[max(T^{−1/2}, N^{−α/2})].

As noted earlier in this chapter, consistency proofs for the principal components estimator of a factor model already exist. The contribution of Theorem 3.1.4 is threefold.

(i) Stock and Watson (2002a), Bai and Ng (2002), Bai (2003) and Bai and Ng (2006) all make assumptions equivalent to (1/N)B′B → Σ_B, where Σ_B is a non-random k×k matrix. No such limit is assumed in Theorem 3.1.4. Rather than requiring that N is large enough for (1/N)B′B to be sufficiently close to some limiting value, all that is required is that the eigenvalues of (1/N)B′B are distinct and lie between uniform upper and lower bounds. Therefore, the `largeness' of N is not important for this part of the theory.

(ii) Theorem 3.1.4 allows for much more cross-correlation between the error terms than is allowed for in the previously published theory. Stock and Watson (2002a) and Bai and Ng (2002) assume that there exists a finite bound M such that (1/N) Σ_{i=1}^N Σ_{j=1}^N |Ψ_ij| < M, where Ψ = E(ε_t ε_t′) and Ψ_ij is the i,jth element of Ψ. Bai (2003) and Bai and Ng (2006) make the slightly stronger assumption that Σ_{j=1}^N |Ψ_ij| < M for all i, i.e. the rows of the error covariance matrix are uniformly bounded. While these assumptions are more general than the diagonal error covariance matrix that is assumed for the classical `strict' factor model, they might still be too restrictive for the type of macroeconomic applications which appear in the literature. Indeed, it is entirely plausible that, as the number of variables is increased, the sums of the absolute values of the rows of the error covariance matrix increase without a fixed bound. In most applications in the literature, the variables are chosen from a relatively small number of categories (e.g. Real output and income, Housing starts and sales, Interest rates, Price indexes). One can imagine a factor model being constructed by choosing a single variable from each category. A sequence of factor models with increasing N could then be constructed by successively adding another variable from each of the categories, with the number of categories held fixed. One might suppose that the errors corresponding to variables from different categories are largely uncorrelated. However, some of the variables within a category will be very similar to other variables within the same category (e.g. in Stock and Watson (2002b), the "Price indexes" category includes the producer price index for finished goods and the producer price index for finished consumer goods as two separate series), and consequently it might be expected that their errors could be correlated. Therefore, as the number of variables from each category is increased, the sums of the absolute values of the coefficients across each row of the error covariance matrix will grow at a rate of anything up to and including N. Such cases are not covered by the theory of Stock and Watson (2002a), Bai and Ng (2002), Bai (2003) and Bai and Ng (2006). Boivin and Ng (2006) consider a similar situation but, with the exception of a brief informal consideration of a single-factor model for which the factor loadings are identical, their analysis is by Monte Carlo simulation. Theorem 3.1.4 is the first general theory for principal components estimation of large factor models which applies under these conditions. What it shows is that the principal components approach is consistent for a more general class of model than the approximate factor model. Consistency holds provided that the absolute row sums of the error covariance matrix grow at a rate strictly less than N. However, the faster the growth of the absolute row sums, the slower the rate of convergence of the estimator. Consequently, in applied work, it is not sufficient simply to have a very large number of variables. The correlation properties of the errors of the sequence of models are critical to the performance of the estimator as N grows. This provides a possible explanation for the lack of empirical evidence for very large factor models having superior performance to smaller models.

(iii) The proof of Theorem 3.1.4 makes it clear that what really matters for the quality of the estimator is not the number of variables in the model per se, but rather the smallness of the noise-to-signal ratio. This simple statistic provides an appropriate measure of the degree of correlatedness and variance of the error terms in the model. Rather than concerning themselves with finding large numbers of variables to include in their models, practitioners should concentrate their attention on the relative magnitudes of the eigenvalues of the covariance matrix. As in the methodology used in traditional `small-N' principal components analysis, economists wishing to estimate large factor models using principal component methods should be wary of proceeding unless they are satisfied that a large gap exists between the magnitudes of two groups of eigenvalues.

3.2 Measuring the noise-to-signal ratio

Given the above theory, measurement of the noise-to-signal ratio is a concern of some practical importance. Since Ψ is not identified, the eigenvalue σ², and accordingly the noise-to-signal ratio ρ, are not identified. Thus, direct estimation of the noise-to-signal ratio is not possible. However, it is possible to consistently estimate a lower bound on the noise-to-signal ratio. Let Φ = σ²I_N − Ψ. Then Φ + Ω = BB′ + σ²I_N. Note that the eigenvalues of Φ are of the form σ² minus an eigenvalue of Ψ, so eig_j(Φ) ≥ 0 ∀j = 1, .., N, where eig_j(·) denotes the jth ordered eigenvalue of its matrix argument. Thus, Φ is positive semi-definite. It follows from Magnus and Neudecker (1991, p.208, Theorem 9) that eig_j(BB′ + σ²I_N) ≥ eig_j(Ω), i.e. d_j + σ² ≥ λ_j ∀j = 1, .., N. Since d_{k+1} = 0, λ_{k+1} ≤ σ². We therefore have that

λ_{k+1}/λ_k ≤ ρ

This expression makes it clear that in order for the noise-to-signal ratio to be small, implying that the k-principal component forecast is close to the theoretically ideal forecast, there must exist a large relative gap between the kth and (k+1)th eigenvalues of the covariance matrix of the predictor variables. This links asymptotic principal component techniques to the traditional principal component literature, where analysts will often rank the eigenvalues and search for a point at which the difference between successive eigenvalues is large. Theorem 3.1.3(a) may be used to show that this ratio of population eigenvalues may be consistently estimated by the corresponding ratio of sample eigenvalues.
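The lower bound λ_{k+1}/λ_k ≤ ρ is an exact algebraic consequence of the eigenvalue inequality above, so it holds for any model, including one with heavily correlated errors. A minimal sketch (assuming numpy; the model below is illustrative, not from the thesis):

```python
import numpy as np

rng = np.random.default_rng(5)
N, k = 30, 2

# Illustrative model with correlated errors (Psi need not be diagonal)
B = rng.normal(size=(N, k)) * 2.0
A = rng.normal(size=(N, N)) * 0.3
Psi = A @ A.T
Omega = B @ B.T + Psi

lam = np.sort(np.linalg.eigvalsh(Omega))[::-1]
rho = np.linalg.eigvalsh(Psi)[-1] / lam[k - 1]
ratio = lam[k] / lam[k - 1]                  # lambda_{k+1} / lambda_k

# The observable eigenvalue ratio is a lower bound for rho
assert ratio <= rho + 1e-9
```

In practice λ would be replaced by the eigenvalues of S_xx, and Theorem 3.1.3(a) justifies the plug-in estimate of the ratio.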

In this section, a statistic is constructed for testing the hypothesis that the noise-to-signal ratio is small in magnitude. Ultimately, what is needed is a test statistic with an asymptotic distribution which is established in a framework in which (N, T) → (∞, ∞) jointly. However, it is not yet clear how such a statistic may be constructed. What is presented below is a testing framework in which N is fixed and T → ∞, with a brief investigation of the robustness of the test to large values of N. While not providing the result that is really required, this approach provides a candidate test statistic which may be the subject of a more thorough investigation at a later time. In any case, the test procedure appears to work reasonably well in a setting in which N is larger than T.

An obvious approach is to consider the distribution of

√T (λ̂_{k+1}/λ̂_k − λ_{k+1}/λ_k)

However, despite the fact that the ratio λ_{k+1}/λ_k is consistently estimated by its sample counterpart, Monte Carlo simulations suggest that the distribution of the above statistic is highly sensitive to the magnitude of N. Consequently, an alternative approach, based on a slightly different bounding argument, is presented below.

It was shown previously that λ_j ≤ σ² + d_j for j = 1, .., N. Since d_j = 0 for j ≥ k+1, it follows that (1/(N−k)) Σ_{j=k+1}^N λ_j ≤ λ_{k+1} ≤ σ². It follows that

υ = (1/(N−k)) Σ_{j=k+1}^N λ_j / λ_k ≤ ρ

We therefore construct a test of H₀: (1/(N−k)) Σ_{j=k+1}^N λ_j/λ_k = υ_α against H₁: (1/(N−k)) Σ_{j=k+1}^N λ_j/λ_k > υ_α. Note that rejection of the null implies ρ > υ_α.

Consider the statistic θ̂ = Σ_{j=k+1}^N λ̂_j − υ_α(N−k)λ̂_k. What is really required is the asymptotic distribution of θ̂ in a framework in which (N, T) → (∞, ∞) jointly. For serially independent variables, it is possible that something could be derived using ideas from Random Matrix Theory. However, in the correlated time series setting, this problem remains unresolved. For this reason, as an interim measure, the distribution of the statistic will be derived in a setting in which N is fixed and T → ∞.

Under the assumption that

c = max_{i≠j} λ_j/|λ_i − λ_j| < c̄ < ∞

the eigenvalues λ̂_j are continuous functions of S_xx (see Magnus and Neudecker (1991)). Assuming conditions sufficient for √T(S_xx − Ω) to be asymptotically Gaussian, it follows that √T(λ̂_j − λ_j) is asymptotically Gaussian for j = 1, .., N, where N is fixed and T → ∞. Consequently, θ̂ is also asymptotically Gaussian. Furthermore, Lawley (1956) provides the following expressions for the first two moments of the eigenvalues.

E(λj) = λj +λj

T

N∑i=1

λi

λj − λi

+ O(T−2)

var(λj) =2λ2

j

T

(1− 1

T

N∑i=1

(λi

λj − λi

)2)

+ O(T−3)

cov(λiλj) =2λj

T 2

(λiλj

λj − λi

)2

+ O(T−3)

140

Page 149: Factor Analysis of High Dimensional Time Series

Given the finite upper bound on c, it follows that
\[ \sqrt{T}\,E(\hat\theta) = O(T^{-\frac12}) \]
and
\[ \mathrm{var}(\sqrt{T}\hat\theta) = T\left(\sum_{j=k+1}^{N}\sum_{i=k+1}^{N}\mathrm{cov}(\hat\lambda_i,\hat\lambda_j) + \upsilon_\alpha^2(N-k)^2\,\mathrm{var}(\hat\lambda_k)\right) = 2\upsilon_\alpha^2(N-k)^2\lambda_k^2 + 2\sum_{j=k+1}^{N}\lambda_j^2 + O(T^{-1}). \]
Dividing √T θ̂ by an estimate of its standard deviation yields the test statistic
\[ \phi = \frac{\sqrt{\frac{T}{2}}\left(\frac{1}{N-k}\sum_{j=k+1}^{N}\frac{\hat\lambda_j}{\hat\lambda_k} - \upsilon_\alpha\right)}{\sqrt{\upsilon_\alpha^2 + \frac{1}{(N-k)^2}\sum_{j=k+1}^{N}\frac{\hat\lambda_j^2}{\hat\lambda_k^2}}} \xrightarrow{d} N(0,1) \]
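As a concrete illustration of the mechanics above, the statistic φ can be computed directly from the ordered sample eigenvalues. This is a minimal sketch, not code from the thesis; the function name `phi_stat` and its interface are my own.

```python
import numpy as np

def phi_stat(eigvals, k, T, upsilon_alpha):
    # Sketch of the test statistic phi: eigvals are sample covariance
    # eigenvalues, k is the number of factors under the null, and
    # upsilon_alpha is the hypothesised value of
    # (1/(N-k)) * sum_{j>k} lambda_j / lambda_k.
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    N = lam.size
    ratio = lam[k:].sum() / ((N - k) * lam[k - 1])
    denom = np.sqrt(upsilon_alpha ** 2
                    + (lam[k:] ** 2).sum() / ((N - k) ** 2 * lam[k - 1] ** 2))
    return np.sqrt(T / 2.0) * (ratio - upsilon_alpha) / denom
```

Under the null, the statistic is compared against upper-tail standard normal critical values.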

The table below shows the results of a small Monte Carlo simulation of this test. We initially set the number of observations to 100, the number of variables to 5 and the number of factors to 2. We then raise the number of variables to 50, then to 200. We choose the first 3 population eigenvalues to be 100, 75 and 5. The remaining eigenvalues decay linearly to 0.001. Thus, λ_{k+1}/λ_k = 0.0667. We conduct 5000 simulations of the test statistic for each model.

Table 3.1: Empirical and Theoretical Distributions of the Test Statistic (k=2, T=100, 5000 simulations)

  α         N = 5     N = 50    N = 200
  10.00%    0.1282    0.1264    0.0566
  5.00%     0.0696    0.0728    0.0286
  1.00%     0.0178    0.0256    0.0076

The elements in the table are the proportions of the empirical probability mass that lie above the critical value corresponding to α. Thus, for example, the 5% critical value for a standard Gaussian distribution is 1.645, and the table shows that, for a model with 5 observable variables, 0.0696 of the empirical probability mass lies above 1.645.

While in no way being a substitute for a more thorough investigation, the

data in Table 3.1 suggest that the test statistic is able to perform reasonably

well in some cases where N is large relative to T .
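The Monte Carlo design just described can be sketched as follows. This is a hedged reconstruction, not the thesis's original simulation code: it assumes a diagonal population covariance (without loss of generality for an eigenvalue-based statistic) and evaluates φ at the true value of the scaled ratio, i.e. under the null.

```python
import numpy as np

def simulate_rejection_rate(N, T, k=2, n_sim=500, alpha_crit=1.645, seed=0):
    # Population eigenvalues 100, 75, 5, then linear decay to 0.001,
    # mirroring the design of Table 3.1 (details are assumptions).
    rng = np.random.default_rng(seed)
    lam = np.empty(N)
    lam[:3] = [100.0, 75.0, 5.0]
    if N > 3:
        lam[3:] = np.linspace(5.0, 0.001, N - 2)[1:]
    ups = lam[k:].sum() / ((N - k) * lam[k - 1])  # true scaled ratio (null value)
    rejections = 0
    for _ in range(n_sim):
        X = rng.standard_normal((T, N)) * np.sqrt(lam)  # diagonal covariance
        ev = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False, bias=True)))[::-1]
        ratio = ev[k:].sum() / ((N - k) * ev[k - 1])
        denom = np.sqrt(ups ** 2 + (ev[k:] ** 2).sum()
                        / ((N - k) ** 2 * ev[k - 1] ** 2))
        phi = np.sqrt(T / 2.0) * (ratio - ups) / denom
        rejections += phi > alpha_crit
    return rejections / n_sim
```

The returned frequency can be compared with the nominal size, as in Table 3.1.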

3.3 The noise-to-signal ratio for a US macroeconomic data set

Stock and Watson (2002b) have collected a large data set of variables describing the US macroeconomy which they employ in a forecasting experiment using a factor model. The interested reader is directed to their paper for a description of the data. The data set used here was downloaded from Professor Watson's website. We follow Stock and Watson in taking logs and/or differences or double-differences for some variables. Following appropriate transformations, the balanced panel contains 149 variables measured monthly from March 1959 to December 1998. These variables are rescaled to a zero mean and unit variance.
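The rescaling step can be written in a couple of lines; a minimal sketch (`standardise` is an illustrative name, not from the thesis):

```python
import numpy as np

def standardise(panel):
    # Rescale each column (series) of a T-by-N panel to zero mean and
    # unit variance, as described for the 149-variable balanced panel.
    panel = np.asarray(panel, dtype=float)
    return (panel - panel.mean(axis=0)) / panel.std(axis=0)
```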

The plots below show the eigenvalues of the Stock and Watson data, and the ratios (1/(N−k)) Σ_{j=k+1}^N λ̂_j/λ̂_k for k = 1, .., N − 1.

Figure 3.1: Eigenvalues of Stock and Watson's data
[Figure 3.1: line plot of the ordered sample eigenvalues; vertical axis from 0 to 25, horizontal axis from 0 to 140]

Note that the first few sample eigenvalues drop sharply but the plot levels out after that. With the exception of the first couple of values, none of the ratios in Figure 3.2 are particularly small.

Consider the theoretical problem of producing a forecast for a scalar variable y_t using a regression with known population factors, y_t = β′f_t + ε_t. Theorem 3.1.1(e) gives an upper bound on the scaled expected forecast error for a regression on population factors. As T → ∞, this bound becomes
\[ \frac{E|e_{T+h}|}{\sqrt{\beta'\beta}} \le \sqrt{k\rho}\,(\sqrt{\rho} + 1) \]
where e_{T+h} is the forecast error and, since E(f_tf_t′) = I_k, √(β′β) is the standard deviation of the `signal' component of the regression. Applying Markov's Lemma yields a bound on the probability that the forecast error is larger than the standard deviation of the signal:
\[ P\left(\frac{|e_{T+h}|}{\sqrt{\beta'\beta}} > 1\right) \le \sqrt{k\rho}\,(\sqrt{\rho} + 1). \]

Figure 3.2: (1/(N−k)) Σ_{j=k+1}^N λ̂_j/λ̂_k for Stock and Watson's data
[Figure 3.2: line plot of the ratios; vertical axis from 0 to 0.6, horizontal axis from 0 to 140]

If we can choose a desired numerical bound for the above probability, which we denote as α, then we may solve the equation α = √(kρ)(√ρ + 1) to find a corresponding bound for ρ, which we denote ρ_α. Since υ = (1/(N−k)) Σ_{j=k+1}^N λ_j/λ_k ≤ ρ, we may test H₀: υ = ρ_α against H₁: υ > ρ_α. Note that rejection of the null implies ρ > ρ_α. Table 3.2 presents the results of the hypothesis test derived above conducted for factor models of orders 1 to 6. We choose values for ρ_α corresponding to probability bounds of 5%, 10%, 25% and 50%.
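Since the left hand side of α = √(kρ)(√ρ + 1) is zero at ρ = 0 and increasing in ρ, the bound ρ_α can be recovered numerically by bisection. The helper name `rho_alpha` is illustrative:

```python
import math

def rho_alpha(alpha, k, lo=0.0, hi=10.0, tol=1e-12):
    # Solve alpha = sqrt(k * rho) * (sqrt(rho) + 1) for rho by bisection;
    # the left side is monotone increasing in rho on [0, inf).
    f = lambda r: math.sqrt(k * r) * (math.sqrt(r) + 1.0) - alpha
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```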

Table 3.2: Test results for Stock and Watson data

  probability                    k
  bound         1        2        3        4        5        6
  0.05        72.92    88.15    92.20    94.71    96.35    98.58
  0.10        38.82    75.54    88.07    92.02    94.57    96.56
  0.25        -2.74    19.16    49.82    63.27    74.46    79.33
  0.50       -11.38    -4.37     8.70    17.10    27.53    32.31

Note that for a 1-factor model we cannot reject probability bounds of 0.25 and 0.5, and for a 2-factor model we cannot reject a probability bound of 0.5, using significance levels of 5 per cent. For all other factor models and probability bounds we can strongly reject the hypothesis about the probability bound. Overall, these results do not provide strong support for the proposition that the US macroeconomic data set used by Stock and Watson (2002b) satisfies the condition of having a small noise-to-signal ratio, although the preliminary nature of these results must be stressed.

3.4 Summary and concluding comments

It has been argued in this chapter that the approximate factor model that has been investigated in the theoretical literature imposes restrictions on the cross-correlation of the errors that are unlikely to be satisfied in the types of applications of the factor model which have appeared in the empirical literature. Some new theory was presented which proves consistency under assumptions which allow for much greater error cross-correlation. However, the rates of convergence that are achieved depend on the rate of growth of the error cross-correlation. Consequently, it is possible for models with a very large cross-sectional dimension to perform poorly.


An important conclusion of the theoretical approach taken in this chapter is that it suggests that what matters for the quality of the principal component estimator is not the number of variables in the model per se, but rather the noise-to-signal ratio of the model. Rather than concerning themselves with collecting data on every available variable, so that N may be made as large as possible, practitioners should be giving thought to the likely noise-to-signal ratio. Clearly, what is required is a trade-off between having a large number of variables and having a low amount of error cross-correlation. Unfortunately, the noise-to-signal ratio is not generally identified, and so bounding arguments must be employed to investigate its magnitude.

Clearly, there exists plenty of scope for the work in this chapter to be extended. First order convergence is interesting, but some results on second order convergence analogous to those of Bai (2003) and Bai and Ng (2006) would also be useful. Of particular interest would be an investigation of the asymptotic distribution of the sample eigenvalues of the covariance matrix as (N, T) → (∞,∞), since knowledge of this distribution may lead to a better testing methodology for the noise-to-signal ratio than what has been proposed here. While the `fixed-N' test that is derived in Section 3.2 is a vast improvement on simply hoping that the noise-to-signal ratio is small, it is clear that what is really needed is a test statistic which converges to a known distribution as (N, T) → (∞,∞).

Another issue worthy of investigation is the estimation of principal components in a framework in which (N, T) → (∞,∞) and the first k eigenvalues grow at a rate slower than N. The work in Subsection 3.1.1 shows that population principal components are consistent estimators of population factors as the noise-to-signal ratio gets small. If the eigenvalues of the error covariance matrix are assumed to be bounded, then all that is required of the eigenvalues of B′B is that they grow with N; growth at a rate exactly equal to N is not required. Consequently, these theorems cover certain cases where the factors are `weak' in the sense that they account for a declining proportion of the total variance of x_t as N grows. What is needed is a theory for the estimation of principal components in such cases. Theorem 3.1.3 does not cover this case since it requires the first k eigenvalues to grow at a rate of strictly N in order for the `gap' condition to be satisfied. The current versions of Random Matrix Theory also do not cover this case since they assume the eigenvalues to be bounded as N grows.

Finally, the work presented in this chapter applies to the static model only.

An interesting extension would be to develop analogous results for the dynamic

factor model of Forni et al. (2000).

Appendices: Proofs of Theorems

These appendices contain proofs of the four theorems stated in this chapter. Theorems 3.1.1 and 3.1.2 concern the properties of population quantities and are proved in Appendix A. Theorems 3.1.3 and 3.1.4 describe the properties of sample quantities and are proved in Appendix B. In each appendix the proofs of the theorems are given, then lemmas used to prove the theorems are stated, and finally the lemmas are proved.


Appendix A Proofs of Theorems 3.1.1 and 3.1.2

Proofs of Theorems

Proof of Theorem 3.1.1(a): Let χ = σ²I_N − Ψ. Then χ + Ω = BB′ + σ²I_N. Note that eig_j(χ) = σ² − σ_j², where eig_j(·) denotes the jth ordered eigenvalue of its matrix argument, so eig_j(χ) ≥ 0 for all j = 1, .., N. Thus χ is positive semi-definite. It follows from Magnus and Neudecker (1991, p.208, Theorem 9) that eig_j(BB′ + σ²I_N) ≥ eig_j(Ω), i.e. d_j + σ² ≥ λ_j for all j = 1, .., N. It also follows from Magnus and Neudecker (1991, p.208, Theorem 9) that λ_i ≥ d_i for all i = 1, ..., k. The result follows.
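The eigenvalue sandwich d_j ≤ λ_j ≤ d_j + σ² established here is easy to check numerically. The following sketch is my own construction (not from the thesis): it draws a random loading matrix B and a diagonal Ψ whose entries are below σ².

```python
import numpy as np

rng = np.random.default_rng(1)
N, k, sigma2 = 8, 2, 0.5
B = rng.standard_normal((N, k))
Psi = np.diag(rng.uniform(0.0, sigma2, N))   # eigenvalues of Psi <= sigma^2
Omega = B @ B.T + Psi
lam = np.sort(np.linalg.eigvalsh(Omega))[::-1]
d = np.zeros(N)
d[:k] = np.sort(np.linalg.eigvalsh(B.T @ B))[::-1]  # nonzero eigenvalues of BB'
assert np.all(lam <= d + sigma2 + 1e-10)     # d_j + sigma^2 >= lambda_j
assert np.all(lam[:k] >= d[:k] - 1e-10)      # lambda_i >= d_i for i <= k
```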

Proof of Theorem 3.1.1(b):
\[ \|Q_f - UL\|_F^2 = \mathrm{tr}\left[(Q_f' - LU')(Q_f - UL)\right] = 2\,\mathrm{tr}(I - LQ_f'U) \qquad (3.4) \]
From Lemma 10, 1 − ρ − Σ_{j≠i} (q_i′u_j)² ≤ (q_i′u_i)², where q_i is the ith column of Q_f and u_i is the ith column of U. If k = 1 then c = 0 and the result holds from Equation (3.4) with sign(L_ii) = sign(q_i′u_i). If k > 1 then, using Lemma 9, 1 − ρ − 4c²ρ²(k − 1) ≤ (q_i′u_i)². With c ≤ (1/(2ρ))√((1−ρ)/(k−1)) the left hand side is non-negative and
\[ \sqrt{1 - \rho - 4c^2\rho^2(k-1)} \le |q_i'u_i| \qquad (3.5) \]
If we choose L so that sign(L_ii) = sign(q_i′u_i) then from equations (3.4) and (3.5) we get ‖Q_f − UL‖²_F ≤ k − k√((1−ρ) − 4c²ρ²(k−1)). Multiplying this by
\[ \frac{1 + \sqrt{(1-\rho) - 4c^2\rho^2(k-1)}}{1 + \sqrt{(1-\rho) - 4c^2\rho^2(k-1)}} \]
yields the result.

Proof of Theorem 3.1.1(c):
\[ E\|s_t - Lf_t\|_F^2 = \mathrm{tr}\left[\left(\Lambda_f^{-\frac12}Q_f'UD^{\frac12} - L\right)\left(D^{\frac12}U'Q_f\Lambda_f^{-\frac12} - L\right) + \Lambda_f^{-\frac12}Q_f'\Psi Q_f\Lambda_f^{-\frac12}\right] = 2\,\mathrm{tr}\left(I - L\Lambda_f^{-\frac12}Q_f'UD^{\frac12}\right) \]
As in the proof of Theorem 3.1.1(b), 1 − ρ − 4c²ρ²(k−1) ≤ (q_i′u_i)². From Theorem 3.1.1(a), 1 − ρ ≤ d_i/λ_i, so
\[ (1-\rho)^2 - 4c^2\rho^2(k-1)(1-\rho) \le \frac{d_i}{\lambda_i}(q_i'u_i)^2 \]
If c ≤ (1−ρ)/(2ρ√((k−1)(1−ρ))) then the left hand side is non-negative and √((1−ρ)² − 4c²ρ²(k−1)(1−ρ)) ≤ √(d_i/λ_i) |q_i′u_i|. If we choose L so that sign(L_ii) = sign(q_i′u_i), we get E‖s_t − Lf_t‖²_F ≤ k − k√((1−ρ)² − 4c²ρ²(k−1)(1−ρ)). Multiplying this by
\[ \frac{1 + \sqrt{(1-\rho)^2 - 4c^2\rho^2(k-1)(1-\rho)}}{1 + \sqrt{(1-\rho)^2 - 4c^2\rho^2(k-1)(1-\rho)}} \]
yields the result.

Proof of Theorem 3.1.1(d): Using the triangle inequality,
\[ \|\beta_s - L\beta_f\|_F = \left\|\frac{1}{T}\sum_{t=1}^{T}s_{ft}y_t - \frac{1}{T}\sum_{t=1}^{T}Lf_ty_t\right\|_F \le \frac{1}{T}\sum_{t=1}^{T}\|y_t\|_F\,\|s_{ft} - Lf_t\|_F \]
Therefore, by the Cauchy-Schwarz inequality,
\[ \|\beta_s - L\beta_f\|_F \le \frac{1}{T}\sum_{t=1}^{T}\sqrt{E\left(\|y_t\|_F^2\right)E\left(\|s_{ft} - Lf_t\|_F^2\right)} = \sqrt{\sigma_y^2 k\left(2\rho + \rho^2\left(4c^2(k-1)(1-\rho) - 1\right)\right)} \]
by Theorem 3.1.1(c).

Proof of Theorem 3.1.1(e): Defining S_xy = (1/T) Σ_{t=1}^T x_t y_t, the forecast deviation is
\[ e_{T+h} = \beta_s's_{fT+h} - \beta'f_{T+h} = \left(\Lambda_f^{-\frac12}Q_f'S_{xy}\right)'\Lambda_f^{-\frac12}Q_f'x_{T+h} - \beta'f_{T+h} = S_{xy}'Q_f\Lambda_f^{-1}Q_f'\left(Bf_{T+h} + \varepsilon_{T+h}\right) - \beta'f_{T+h} = e_a + e_b \]
where e_a = (S_xy′Q_fΛ_f^{−1}Q_f′B − β′)f_{T+h} and e_b = S_xy′Q_fΛ_f^{−1}Q_f′ε_{T+h}.

−1f

∥∥F

∥∥Q′fεT+h

∥∥Fand

(E |eb|)2 6 E∥∥S ′xyQfΛ

−1f

∥∥2

FE∥∥Q′

fεT+h

∥∥2

F(3.6)

We have that
\[ E\left\|Q_f'\varepsilon_{T+h}\right\|_F^2 = \mathrm{tr}(Q_f'\Psi Q_f) \le \sigma^2 k \qquad (3.7) \]
Also, letting ω_i be a vector of zeros with a 1 in the ith element, using Lemma 5,
\[ E\left\|S_{xy}'Q_f\Lambda_f^{-1}\right\|_F^2 = E\left(S_{xy}'Q_f\Lambda_f^{-2}Q_f'S_{xy}\right) = \sum_{i=1}^{k}E\left[\left(\omega_i'\Lambda_f^{-1}Q_f'S_{xy}\right)^2\right] = \sum_{i=1}^{k}\left[\mathrm{var}\left(\omega_i'\Lambda_f^{-1}Q_f'S_{xy}\right) + \left(E\left(\omega_i'\Lambda_f^{-1}Q_f'S_{xy}\right)\right)^2\right] = \sum_{i=1}^{k}\left[\mathrm{var}\left(\omega_i'\Lambda_f^{-1}Q_f'S_{xy}\right) + \left(\omega_i'\Lambda_f^{-1}Q_f'B\beta\right)^2\right] \qquad (3.8) \]

Using Lemma 8 and Lemma 6,
\[ \sum_{i=1}^{k}\left(\omega_i'\Lambda_f^{-1}Q_f'B\beta\right)^2 = \beta'B'Q_f\Lambda_f^{-2}Q_f'B\beta = \beta'D^{\frac12}U'Q_f\Lambda_f^{-2}Q_f'UD^{\frac12}\beta = \beta'D^{\frac12}R\Lambda_f^{-2}R'D^{\frac12}\beta \le \beta'\beta\,\mathrm{maxeig}\left(\Lambda_f^{-1}R'DR\Lambda_f^{-1}\right) \le \beta'\beta\,\mathrm{maxeig}\left(\Lambda_f^{-1}\Lambda_f\Lambda_f^{-1}\right) \le \beta'\beta\lambda_k^{-1} \qquad (3.9) \]
where R = U′Q_f. Also, from Lemma 4,
\[ \sum_{i=1}^{k}\mathrm{var}\left(\omega_i'\Lambda_f^{-1}Q_f'S_{xy}\right) \le \Upsilon_1 + \Upsilon_2 + \Upsilon_3 \]
where
\[ \Upsilon_1 = \frac{2}{T}\sum_{i=1}^{k}\omega_i'\Lambda_f^{-1}Q_f'\Omega Q_f\Lambda_f^{-1}\omega_i\,\sigma_y^{(0)2} \]
\[ \Upsilon_2 = \frac{2}{T}\sum_{i=1}^{k}\sum_{j=1}^{T-1}\left|\omega_i'\Lambda_f^{-1}Q_f'E(x_tx_{t-j}')Q_f\Lambda_f^{-1}\omega_i\,\sigma_y^{(j)2}\right| \]
\[ \Upsilon_3 = \frac{2}{T}\sum_{i=1}^{k}\sum_{j=1}^{T-1}\left|\omega_i'\Lambda_f^{-1}Q_f'E(x_ty_{t-j})E(y_tx_{t-j}')Q_f\Lambda_f^{-1}\omega_i\right| \]
where σ_y^{(j)2} = E(y_t y_{t−j}).

We have that
\[ \Upsilon_1 = \frac{2}{T}\sigma_y^{(0)2}\,\mathrm{tr}\left(\Lambda_f^{-1}\Lambda_f\Lambda_f^{-1}\right) \le \frac{2}{T}\sigma_y^{(0)2}\lambda_k^{-1} \]
Also Υ₂ = (2/T) Σ_{i=1}^k Σ_{j=1}^{T−1} |ω_i′Λ_f^{−1/2}E(s_ts_{t−j}′)Λ_f^{−1/2}ω_i σ_y^{(j)2}|, where s_t = Λ_f^{−1/2}Q_f′x_t is the principal component vector of x_t, so
\[ \Upsilon_2 = \frac{2}{T}\sum_{i=1}^{k}\sum_{j=1}^{T-1}\left|\frac{E(s_{it}s_{it-j})}{\lambda_i}\sigma_y^{(j)2}\right| \le \frac{2}{T}\sum_{i=1}^{k}\sum_{j=1}^{T-1}\frac{1}{\lambda_i}\left|E(s_{it}s_{it-j})\right|\left|\sigma_y^{(j)2}\right| \le \frac{2}{T}\lambda_k^{-1}\sum_{j=1}^{T-1}\left|\sigma_y^{(j)2}\right| \]

For the third term,
\[ \Upsilon_3 = \frac{2}{T}\sum_{i=1}^{k}\sum_{j=1}^{T-1}\frac{1}{\lambda_i}\left|E(s_{it}y_{t-j})E(y_ts_{it-j})\right| \le \frac{2}{T}\sum_{i=1}^{k}\sum_{j=1}^{T-1}\frac{1}{\lambda_i}\left|E(s_{it}y_{t-j})\right|\left|E(y_ts_{it-j})\right| \le \frac{2\sigma_y^{(0)2}}{T}\sum_{i=1}^{k}\sum_{j=1}^{T-1}\frac{1}{\lambda_i}\left|E(y_ts_{it-j})\right| \le \frac{2\sigma_y^{(0)2}}{T}\lambda_k^{-1}\sup_i\sum_{j=1}^{T-1}\left|E(y_ts_{it-j})\right| \]

so
\[ \sum_{i=1}^{k}\mathrm{var}\left(\omega_i'\Lambda_f^{-1}Q_f'S_{xy}\right) \le \frac{2}{T\lambda_k}\left(\sum_{j=0}^{T-1}\left|\sigma_y^{(j)2}\right| + \sigma_y^{(0)}\sup_i\sum_{j=1}^{T-1}\left|E(y_ts_{i,t-j})\right|\right) = \frac{2}{T\lambda_k}\sigma_y^{(0)2}\delta \qquad (3.10) \]
where
\[ \delta = \sum_{j=1}^{T-1}\left|\frac{E(y_ty_{t-j})}{E(y_t^2)}\right| + \sup_i\sum_{j=1}^{T-1}\left|\frac{E(y_ts_{i,t-j})}{\sqrt{E(y_t^2)E(s_{it}^2)}}\right|. \]
Equations (3.8), (3.9), and (3.10) yield
\[ E\left\|S_{xy}'Q_f\Lambda_f^{-1}\right\|^2 \le \lambda_k^{-1}\left(\frac{2\sigma_y^{(0)2}\gamma}{T} + \beta'\beta\right) \]
which, when combined with equations (3.6) and (3.7), yields
\[ \frac{(E|e_b|)^2}{\|\beta\|_F^2} \le \frac{\sigma^2 k}{\lambda_k}\left(\frac{2}{T}\frac{\sigma_y^{(0)2}\gamma}{\|\beta\|_F^2} + 1\right). \]

Now consider e_a. By the Cauchy-Schwarz inequality we have
\[ (E|e_a|)^2 \le E\left\|S_{xy}'Q_f\Lambda_f^{-1}Q_f'B - \beta'\right\|^2 E\|f_{T+h}\|^2 = k\,E\left\|S_{xy}'Q_f\Lambda_f^{-1}Q_f'B - \beta'\right\|^2 \qquad (3.11) \]
Now from Lemma 5,
\[ E\left\|S_{xy}'Q_f\Lambda_f^{-1}Q_f'B - \beta'\right\|^2 = \sum_{i=1}^{k}E\left[\omega_i'\left(B'Q_f\Lambda_f^{-1}Q_f'S_{xy} - \beta\right)\right]^2 = \sum_{i=1}^{k}\mathrm{var}\left[\omega_i'\left(B'Q_f\Lambda_f^{-1}Q_f'S_{xy} - \beta\right)\right] + \sum_{i=1}^{k}\left(E\left[\omega_i'\left(B'Q_f\Lambda_f^{-1}Q_f'S_{xy} - \beta\right)\right]\right)^2 \qquad (3.12) \]

but from Lemma 4, Σ_{i=1}^k var[ω_i′(B′Q_fΛ_f^{−1}Q_f′S_xy − β)] ≤ Δ₁ + Δ₂ + Δ₃ where
\[ \Delta_1 = \frac{2}{T}\sum_{i=1}^{k}\omega_i'B'Q_f\Lambda_f^{-1}Q_f'\Omega Q_f\Lambda_f^{-1}Q_f'B\omega_i\,\sigma_y^{(0)2} \]
\[ \Delta_2 = \frac{2}{T}\sum_{i=1}^{k}\sum_{j=1}^{T-1}\left|\omega_i'B'Q_f\Lambda_f^{-1}Q_f'E(x_tx_{t-j}')Q_f\Lambda_f^{-1}Q_f'B\omega_i\,\sigma_y^{(j)2}\right| \]
\[ \Delta_3 = \frac{2}{T}\sum_{i=1}^{k}\sum_{j=1}^{T-1}\left|\omega_i'B'Q_f\Lambda_f^{-1}Q_f'E(x_ty_{t-j})E(y_tx_{t-j}')Q_f\Lambda_f^{-1}Q_f'B\omega_i\right| \]

∆1 =2σ

(0)2y

Ttr(B′QfΛ

−1f Q′

fB)

62σ

(0)2y

Ttr(B′Ω−1B

)6

2kσ(0)2y

T

153

Page 162: Factor Analysis of High Dimensional Time Series

Also,
\[ \Delta_2 = \frac{2}{T}\sum_{i=1}^{k}\sum_{j=1}^{T-1}\left|\omega_i'B'Q_f\Lambda_f^{-\frac12}E(s_ts_{t-j}')\Lambda_f^{-\frac12}Q_f'B\omega_i\,\sigma_y^{(j)2}\right| \]
Let v_i = ω_i′B′Q_fΛ_f^{−1/2} = (0 ... 0 ν_i 0 ... 0). Then
\[ \Delta_2 = \frac{2}{T}\sum_{i=1}^{k}\sum_{j=1}^{T-1}\left|v_iE(s_ts_{t-j}')v_i'\,\sigma_y^{(j)2}\right| = \frac{2}{T}\sum_{i=1}^{k}\sum_{j=1}^{T-1}\left|\nu_i^2E(s_{it}s_{it-j})\sigma_y^{(j)2}\right| \le \frac{2}{T}\sum_{i=1}^{k}\sum_{j=1}^{T-1}\nu_i^2\left|\sigma_y^{(j)2}\right| \le \frac{2}{T}\sum_{i=1}^{k}\omega_i'B'Q_f\Lambda_f^{-1}Q_f'B\omega_i\sum_{j=1}^{T-1}\left|\sigma_y^{(j)2}\right| \le \frac{2}{T}\mathrm{tr}\left(B'\Omega^{-1}B\right)\sum_{j=1}^{T-1}\left|\sigma_y^{(j)2}\right| \le \frac{2k}{T}\sum_{j=1}^{T-1}\left|\sigma_y^{(j)2}\right| \]

For the third term,
\[ \Delta_3 = \frac{2}{T}\sum_{i=1}^{k}\sum_{j=1}^{T-1}\left|\omega_i'B'Q_f\Lambda_f^{-\frac12}E(s_ty_{t-j})E(y_ts_{t-j}')\Lambda_f^{-\frac12}Q_f'B\omega_i\right| = \frac{2}{T}\sum_{i=1}^{k}\omega_i'B'Q_f\Lambda_f^{-1}Q_f'B\omega_i\sum_{j=1}^{T-1}\left|E(s_ty_{t-j})E(y_ts_{t-j}')\right| \le \frac{2}{T}\mathrm{tr}\left(B'\Omega^{-1}B\right)\sigma_y^{(0)}\sup_i\sum_{j=1}^{T-1}\left|E(y_ts_{t-j}')\right| \le \frac{2k\sigma_y^{(0)}}{T}\sup_i\sum_{j=1}^{T-1}\left|E(y_ts_{t-j}')\right| \]

So combining the three terms yields
\[ \sum_{i=1}^{k}\mathrm{var}\left[\omega_i'\left(B'Q_f\Lambda_f^{-1}Q_f'S_{xy} - \beta\right)\right] \le \frac{2k}{T}\sum_{j=1}^{T-1}\left|\sigma_y^{(j)2}\right| + \frac{2k\sigma_y^{(0)}}{T}\sup_i\sum_{j=1}^{T-1}\left|E(y_ts_{t-j}')\right| = \frac{2k}{T}\sigma_y^{(0)2}\gamma \qquad (3.13) \]

Also, E[ω_i′(B′Q_fΛ_f^{−1}Q_f′S_xy − β)] = ω_i′(M − I)β where M = D^{1/2}RΛ_f^{−1}R′D^{1/2} and R = U′Q_f, so Σ_{i=1}^k (E[ω_i′(B′Q_fΛ_f^{−1}Q_f′S_xy − β)])² = β′(M − I)²β. However M² ≤ B′Q_fΛ_f^{−1}Q_f′ΩQ_fΛ_f^{−1}Q_f′B = M, so (M − I)² ≤ (I − M) ⟹ M ≤ I_k. Therefore, using Lemma 7,
\[ \sum_{i=1}^{k}\left(E\left[\omega_i'\left(B'Q_f\Lambda_f^{-1}Q_f'S_{xy} - \beta\right)\right]\right)^2 \le \beta'(M-I)^2\beta \le \beta'\beta\left[\mathrm{maxeig}(I-M)\right]^2 = \beta'\beta\left[\mathrm{maxeig}\left(\Lambda_f^{-\frac12}Q_f'\Psi Q_f\Lambda_f^{-\frac12}\right)\right]^2 \le \beta'\beta\left(\frac{\sigma^2}{\lambda_k}\right)^2 \qquad (3.14) \]

Combining equations (3.11) to (3.14) yields
\[ \frac{(E|e_a|)^2}{\|\beta\|_F^2} \le k\left(\left(\frac{\sigma^2}{\lambda_k}\right)^2 + \frac{2k}{T}\frac{\sigma_y^{(0)2}\gamma}{\|\beta\|_F^2}\right) \]
Noting that ρ = σ²/λ_k and 1/r² = σ²_y/‖β‖²_F, combining the above results yields the result of the theorem.

Proof of Theorem 3.1.2: From Magnus and Neudecker (1991, p.208, Theorem 9), λ_k ≥ d_k ≥ Nd_L, so ρ = σ²/λ_k ≤ σ²/d_k ≤ σ²/(Nd_L) = O(N^{−α}).

Lemmas Used in Theorems

Lemma 3. If ω ∼ N(0, Γ) and α and β are vectors of conformable dimension, then E(α′ω)²(β′ω)² = α′Γα β′Γβ + 2(α′Γβ)². This is a standard property of Gaussian distributions. See, e.g., Johnson and Kotz (1972).

Corollary 2. var(α′ω β′ω) = α′Γα β′Γβ + (α′Γβ)². The proof is elementary.

Lemma 4. If z_t = (w_t′, u_t′)′ is Gaussian and
\[ E(z_tz_{t-j}') = \Gamma^{(j)} = \begin{pmatrix} \Gamma_w^{(j)} & \Gamma_{wu}^{(j)} \\ \Gamma_{uw}^{(j)} & \Gamma_u^{(j)} \end{pmatrix}, \]
then, using Lemma 3 and the Cauchy-Schwarz inequality,
\[ \mathrm{var}(a'S_{wu}b) = \mathrm{var}\left(\frac{1}{T}\sum_{t=1}^{T}a'w_t\,b'u_t\right) \le \frac{2}{T}\left(a'\Gamma_w^{(0)}a\,b'\Gamma_u^{(0)}b + \sum_{j=1}^{T-1}a'\Gamma_w^{(j)}a\,b'\Gamma_u^{(j)}b + \sum_{j=1}^{T-1}a'\Gamma_{wu}^{(j)}b\,a'\Gamma_{wu}^{(-j)}b\right) \]
where a and b are vectors of conformable dimension.

Corollary 3. E(α′u)²(β′v)² = α′Γ_u α β′Γ_v β + 2(α′Γ_uv β)² ≤ 3α′Γ_u α β′Γ_v β. The proof is elementary.

Lemma 5. If Z is a random vector, e_i is a k × 1 vector of zeros with a 1 in position i, and M is a k × k constant matrix, then
\[ E(Z'M'MZ) = E\left(Z'M'\sum_{i=1}^{k}e_ie_i'MZ\right) = \sum_{i=1}^{k}E\left[(Z'M'e_i)(e_i'MZ)\right] = \sum_{i=1}^{k}E(e_i'MZ)^2 \]

Lemma 6. Λ_f = Q_f′(UDU′ + Ψ)Q_f

Lemma 7. If M = D^{1/2}RΛ_f^{−1}R′D^{1/2}, where R = U′Q_f, then the eigenvalues of I − M are equal to the eigenvalues of Λ_f^{−1/2}Q_f′ΨQ_fΛ_f^{−1/2}.

Lemma 8. The eigenvalues of D^{1/2}RΛ_f^{−2}R′D^{1/2} are equal to the eigenvalues of Λ_f^{−1}R′DRΛ_f^{−1}, where R = U′Q_f.

Lemma 9. |q_i′u_j| ≤ 2cρ for i ≠ j, where q_i is the ith column of Q_f, u_j is the jth column of U, and
\[ c = \max_{\substack{1\le i\le k,\ 1\le j\le N\\ i\neq j}}\frac{\lambda_i}{|\lambda_j - \lambda_i|} \]

Lemma 10. 1 − ρ ≤ Σ_{j=1}^k (q_i′u_j)²
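Lemma 3's fourth-moment identity can be checked by simulation. The following is a quick sanity sketch of my own (the covariance matrix, test vectors, seed and tolerance are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000
G = np.array([[1.0, 0.3], [0.3, 2.0]])           # Γ
L = np.linalg.cholesky(G)
w = rng.standard_normal((n, 2)) @ L.T             # ω ~ N(0, Γ)
a = np.array([1.0, -0.5])
b = np.array([0.7, 0.2])
lhs = np.mean((w @ a) ** 2 * (w @ b) ** 2)        # E(α'ω)²(β'ω)²
rhs = (a @ G @ a) * (b @ G @ b) + 2.0 * (a @ G @ b) ** 2
assert abs(lhs - rhs) / rhs < 0.1                 # Monte Carlo agreement
```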

Proofs of Lemmas

Proof of Lemma 6: ΩQ_f = Q_fΛ_f ⟹ (UDU′ + Ψ)Q_f = Q_fΛ_f. Premultiplying by Q_f′ gives the result.

Proof of Lemma 7: Using Lemma 6, the eigenvalues of I − M are the solutions of
\[ 0 = |\lambda I - (I - M)| = |(\lambda - 1)I + M| = \left|(\lambda-1)I + D^{\frac12}R\Lambda_f^{-1}R'D^{\frac12}\right| = \left|(\lambda-1)I + R\Lambda_f^{-1}R'D\right| = \left|(\lambda-1)I + I - Q_f'\Psi Q_f\Lambda_f^{-1}\right| = \left|\lambda I - Q_f'\Psi Q_f\Lambda_f^{-1}\right| = \left|\lambda I - \Lambda_f^{-\frac12}Q_f'\Psi Q_f\Lambda_f^{-\frac12}\right|. \]

Proof of Lemma 8: The eigenvalues of D^{1/2}RΛ_f^{−2}R′D^{1/2} are the solutions for λ of
\[ 0 = \left|\lambda I - D^{\frac12}R\Lambda_f^{-2}R'D^{\frac12}\right| = \left|\lambda I - R\Lambda_f^{-2}R'D\right| = \left|\lambda I - \Lambda_f^{-1}R'DR\Lambda_f^{-1}\right| \]

Proof of Lemma 9: Q_fΛ_fQ_f′ + Q_⊥Λ_⊥Q_⊥′ = UDU′ + Ψ. Premultiplying by Q_f′, postmultiplying by UΛ_f^{−1}, and subtracting Q_f′U yields
\[ \Lambda_fQ_f'U\Lambda_f^{-1} - Q_f'U = Q_f'\Psi U\Lambda_f^{-1} - Q_f'U\left(I - D\Lambda_f^{-1}\right) \qquad (3.15) \]
We now consider each of the right hand side terms. Let e_i be a vector of zeros with a 1 in the ith element only. We have
\[ \left(e_i'Q_f'\Psi U\Lambda_f^{-1}e_j\right)^2 \le \mathrm{tr}\left(U'\Psi Q_fe_ie_i'Q_f'\Psi U\Lambda_f^{-2}\right) \le \frac{1}{\lambda_k^2}\mathrm{tr}\left(e_i'Q_f'\Psi UU'\Psi Q_fe_i\right) \le \frac{1}{\lambda_k^2}e_i'Q_f'\Psi^2Q_fe_i \le \frac{\sigma^4}{\lambda_k^2} \]
\[ \therefore\quad \left|e_i'Q_f'\Psi U\Lambda_f^{-1}e_j\right| \le \rho \qquad (3.16) \]
For the other right hand side term we have
\[ \left|e_i'Q_f'U\left(I - D\Lambda_f^{-1}\right)e_j\right| = \left|q_i'u_j\left(1 - \frac{d_j}{\lambda_j}\right)\right| = \left|q_i'u_j\right|\left(1 - \frac{d_j}{\lambda_j}\right) \]
since λ_j ≥ d_j. But 1 − d_j/λ_j ≤ ρ from Theorem 3.1.1(a), so |q_i′u_j|(1 − d_j/λ_j) ≤ ρ|q_i′u_j|, and |q_i′u_j| ≤ 1 by the Cauchy-Schwarz inequality, so
\[ \left|e_i'Q_f'U\left(I - D\Lambda_f^{-1}\right)e_j\right| \le \rho \qquad (3.17) \]
Combining equations (3.15), (3.16), and (3.17),
\[ \left|e_i'\left(\Lambda_fQ_f'U\Lambda_f^{-1} - Q_f'U\right)e_j\right| \le \left|e_i'Q_f'\Psi U\Lambda_f^{-1}e_j\right| + \left|e_i'Q_f'U\left(I - D\Lambda_f^{-1}\right)e_j\right| \le 2\rho \]
i.e. |(λ_i/λ_j − 1)q_i′u_j| ≤ 2ρ, which implies |q_i′u_j| ≤ 2cρ for i ≠ j.

Proof of Lemma 10: Q_fΛ_fQ_f′ + Q_⊥Λ_⊥Q_⊥′ = UDU′ + Ψ. Premultiply by Q_f′UU′, postmultiply by Q_f, and substitute R = U′Q_f to get
\[ R'R\Lambda_f = R'DR + R'U'\Psi Q_f \qquad (3.18) \]
Also
\[ \Lambda_f = Q_f'\Omega Q_f \implies \Lambda_f = Q_f'\left(UDU' + \Psi\right)Q_f \implies \Lambda_f = R'DR + Q_f'\Psi Q_f \qquad (3.19) \]
Subtract Equation (3.18) from Equation (3.19) and postmultiply by Λ_f^{−1} to get
\[ R'R - I = R'U'\Psi Q_f\Lambda_f^{-1} - Q_f'\Psi Q_f\Lambda_f^{-1} = Q_f'UU'\Psi Q_f\Lambda_f^{-1} - Q_f'\Psi Q_f\Lambda_f^{-1} = Q_f'\left(UU' - I\right)\Psi Q_f\Lambda_f^{-1} = -Q_f'U_\perp U_\perp'\Psi Q_f\Lambda_f^{-1} \]
so e_i′(I − R′R)e_i = e_i′(Q_f′U_⊥U_⊥′ΨQ_fΛ_f^{−1})e_i ≤ σ²/λ_k = ρ, i.e. 1 − ρ ≤ Σ_{j=1}^k (q_i′u_j)².
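Lemma 10 is a deterministic statement about population quantities, so it can be verified numerically for any admissible (B, Ψ). A sketch of my own, assuming Ψ diagonal with entries below σ²:

```python
import numpy as np

rng = np.random.default_rng(3)
N, k, sigma2 = 10, 3, 0.8
B = rng.standard_normal((N, k)) * 5.0
Psi = np.diag(rng.uniform(0.0, sigma2, N))    # eigenvalues of Psi <= sigma^2
Omega = B @ B.T + Psi
lam, Q = np.linalg.eigh(Omega)
order = np.argsort(lam)[::-1]
lam, Q = lam[order], Q[:, order]
U = np.linalg.svd(B)[0][:, :k]                # eigenvectors of BB'
rho = sigma2 / lam[k - 1]
for i in range(k):
    # Lemma 10: 1 - rho <= sum_j (q_i' u_j)^2
    assert 1.0 - rho <= ((Q[:, i] @ U) ** 2).sum() + 1e-10
```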

Appendix B Proofs of Theorems 3.1.3 and 3.1.4

Proofs of Theorems

Proof of Theorem 3.1.3(a):
\[ \max_{1\le j\le N}\frac{1}{N^2}\left(\hat\lambda_j - \lambda_j\right)^2 \le \frac{1}{N^2}\sum_{j=1}^{N}\left(\hat\lambda_j - \lambda_j\right)^2 \le \frac{1}{N^2}\left\|S_{xx} - \Omega\right\|_F^2 = O_p\left(T^{-1}\right) \]
from Lemmas 11 and 15.

Proof of Theorem 3.1.3(b):
\[ \hat s_{ft} - Ls_{ft} = \hat\Lambda_f^{-\frac12}\hat Q_f'x_t - L\Lambda_f^{-\frac12}Q_f'x_t = \hat\Lambda_f^{-\frac12}\hat Q_f'\left(Q_fQ_f' + Q_\perp Q_\perp'\right)x_t - L\Lambda_f^{-\frac12}Q_f'x_t \]
Adding and subtracting Λ̂_f^{−1/2}LQ_f′x_t,
\[ \hat s_{ft} - Ls_{ft} = \hat\Lambda_f^{-\frac12}\hat Q_f'Q_fQ_f'x_t - \hat\Lambda_f^{-\frac12}LQ_f'x_t + \hat\Lambda_f^{-\frac12}LQ_f'x_t - \Lambda_f^{-\frac12}LQ_f'x_t + \hat\Lambda_f^{-\frac12}\hat Q_f'Q_\perp Q_\perp'x_t = \hat\Lambda_f^{-\frac12}\left(\hat Q_f'Q_f - L\right)Q_f'x_t + \left(\hat\Lambda_f^{-\frac12} - \Lambda_f^{-\frac12}\right)LQ_f'x_t + \hat\Lambda_f^{-\frac12}\hat Q_f'Q_\perp Q_\perp'x_t \]
Therefore
\[ \left\|\hat s_{ft} - Ls_{ft}\right\|_2 \le \sqrt{N}\hat\lambda_k^{-\frac12}\left\|\hat Q_f'Q_f - L\right\|_2\left\|\frac{1}{\sqrt N}Q_f'x_t\right\|_2 + \sqrt{N}\left\|\hat\Lambda_f^{-\frac12} - \Lambda_f^{-\frac12}\right\|_2\left\|\frac{1}{\sqrt N}Q_f'x_t\right\|_2 + \sqrt{N}\hat\lambda_k^{-\frac12}\left\|\hat Q_f'Q_\perp\right\|_2\left\|\frac{1}{\sqrt N}Q_\perp'x_t\right\|_2 \qquad (3.20) \]

The following bounds apply to the right hand side terms in Equation (3.20).

• Since λ_k > 0, it follows from Theorem 3.1.3(a) that √N λ̂_k^{−1/2} = √N λ_k^{−1/2} + O_p(T^{−1/2}) = O(1).

• √N ‖Λ̂_f^{−1/2} − Λ_f^{−1/2}‖₂ = max_{1≤j≤k} |√N λ̂_j^{−1/2} − √N λ_j^{−1/2}| = O_p(T^{−1/2}) from the above bound.

• ‖Q̂_f′Q_⊥‖₂ = O_p(T^{−1/2}) from Lemmas 14 and 15.

Also,
\[ \left\|\hat Q_f'Q_f - L\right\|_2 = \left\|\hat R_f - L\right\|_2 \le \left\|\hat R_f - L\right\|_F = \sqrt{\sum_{i=1}^{k}\sum_{j=1}^{k}\left(\hat R_{ij} - L_{ij}\right)^2} \le \sqrt{k\max_{1\le i\le k}\sum_{j=1}^{k}\left(\hat R_{ij} - L_{ij}\right)^2} \le \sqrt{k}\max_{1\le i\le k}\sum_{j=1}^{k}\left|\hat R_{ij} - L_{ij}\right| = \sqrt{k}\max_{1\le i\le k}\left(\sum_{j\neq i}\left|\hat R_{ij}\right| + \left|\hat R_{ii} - L_{ii}\right|\right) \]
But for i ≠ j, R̂²_ij = O_p(T^{−1}) from Lemmas 12 and 15 and the Markov inequality; therefore |R̂_ij| = O_p(T^{−1/2}) for i ≠ j. Also, from Lemmas 13 and 15 and the Markov inequality, 1 − R̂²_ii = O_p(T^{−1/2}). Therefore there exists L_ii ∈ {−1, +1} such that |R̂_ii − L_ii| = O_p(T^{−1/2}). Consequently ‖Q̂_f′Q_f − L‖₂ = O_p(T^{−1/2}).

Also,
\[ E\left\|\frac{1}{\sqrt N}Q_\perp'x_t\right\|_2 \le E\left\|\frac{1}{\sqrt N}Q_\perp'x_t\right\|_F = E\sqrt{\left\|\frac{1}{\sqrt N}Q_\perp'x_t\right\|_F^2} \le \sqrt{E\left\|\frac{1}{\sqrt N}Q_\perp'x_t\right\|_F^2} \le \sqrt{\frac{1}{N}\mathrm{tr}\left(Q_\perp'\Omega Q_\perp\right)} \le \sqrt{\frac{1}{N}\mathrm{tr}(\Omega) - \frac{1}{N}\mathrm{tr}\left(Q_f'\Omega Q_f\right)} \le \sqrt{\frac{1}{N}\mathrm{tr}(\Omega)} = O(1) \]
from Assumption 3.2. Therefore ‖(1/√N)Q_⊥′x_t‖₂ = O_p(1).

Similarly,
\[ E\left\|\frac{1}{\sqrt N}Q_f'x_t\right\|_2 \le \sqrt{E\left\|\frac{1}{\sqrt N}Q_f'x_t\right\|_F^2} \le \sqrt{\frac{1}{N}\mathrm{tr}\left(Q_f'\Omega Q_f\right)} = \sqrt{\frac{1}{N}\sum_{j=1}^{k}\lambda_j} = O(1) \]
from Assumption 3.2. Therefore ‖(1/√N)Q_f′x_t‖₂ = O_p(1).

The above bounds and Equation (3.20) prove the theorem.

Proof of Theorem 3.1.3(c):
\[ \left\|\hat\beta_s - L\beta_s\right\|_2 \le \left\|\frac{1}{T}\sum_{t=1}^{T}\left(\hat s_{ft} - Ls_{ft}\right)y_t\right\|_2 \le \frac{1}{T}\sum_{t=1}^{T}\left\|\left(\hat s_{ft} - Ls_{ft}\right)y_t\right\|_2 \]
As in the proof of Theorem 3.1.3(b),
\[ \hat s_{ft} - Ls_{ft} = \hat\Lambda_f^{-\frac12}\left(\hat Q_f'Q_f - L\right)Q_f'x_t + \left(\hat\Lambda_f^{-\frac12} - \Lambda_f^{-\frac12}\right)LQ_f'x_t + \hat\Lambda_f^{-\frac12}\hat Q_f'Q_\perp Q_\perp'x_t \]
Therefore
\[ \frac{1}{T}\sum_{t=1}^{T}\left\|\left(\hat s_{ft} - Ls_{ft}\right)y_t\right\|_2 \le \sqrt{N}\hat\lambda_k^{-\frac12}\left\|\hat Q_f'Q_f - L\right\|_2\frac{1}{T}\sum_{t=1}^{T}\left\|\frac{1}{\sqrt N}Q_f'x_ty_t\right\|_2 + \sqrt{N}\left\|\hat\Lambda_f^{-\frac12} - \Lambda_f^{-\frac12}\right\|_2\frac{1}{T}\sum_{t=1}^{T}\left\|\frac{1}{\sqrt N}Q_f'x_ty_t\right\|_2 + \sqrt{N}\hat\lambda_k^{-\frac12}\left\|\hat Q_f'Q_\perp\right\|_2\frac{1}{T}\sum_{t=1}^{T}\left\|\frac{1}{\sqrt N}Q_\perp'x_ty_t\right\|_2 \qquad (3.21) \]

As shown in the proof of Theorem 3.1.3(b), the following bounds apply:

• √N λ̂_k^{−1/2} = √N λ_k^{−1/2} + O_p(T^{−1/2}) = O(1), since λ_k > 0;

• √N ‖Λ̂_f^{−1/2} − Λ_f^{−1/2}‖₂ = O_p(T^{−1/2});

• ‖Q̂_f′Q_⊥‖₂ = O_p(T^{−1/2}) from Lemmas 14 and 15;

• ‖Q̂_f′Q_f − L‖₂ = O_p(T^{−1/2}) from Lemmas 12, 13 and 15 and the Markov inequality.

Also,
\[ E\left(\frac{1}{T}\sum_{t=1}^{T}\left\|\frac{1}{\sqrt N}Q_\perp'x_ty_t\right\|_2\right) = \frac{1}{T}\sum_{t=1}^{T}E\left(\left\|\frac{1}{\sqrt N}Q_\perp'x_t\right\|_2|y_t|\right) \le \frac{1}{T}\sum_{t=1}^{T}\sqrt{E\left\|\frac{1}{\sqrt N}Q_\perp'x_t\right\|_F^2 E\left(y_t^2\right)} \le \frac{1}{T}\sum_{t=1}^{T}\sqrt{\frac{1}{N}\mathrm{tr}\left(Q_\perp'\Omega Q_\perp\right)\sigma_y^2} \le \frac{1}{T}\sum_{t=1}^{T}\sqrt{\frac{1}{N}\mathrm{tr}\left(\Omega - Q_f'\Omega Q_f\right)\sigma_y^2} \le \frac{1}{T}\sum_{t=1}^{T}\sqrt{\frac{1}{N}\mathrm{tr}(\Omega)\sigma_y^2} = O(1) \]
from assumptions 3.2 and 3.5. Therefore (1/T) Σ_{t=1}^T ‖(1/√N)Q_⊥′x_ty_t‖₂ = O_p(1).

Similarly,
\[ E\left(\frac{1}{T}\sum_{t=1}^{T}\left\|\frac{1}{\sqrt N}Q_f'x_ty_t\right\|_2\right) \le \frac{1}{T}\sum_{t=1}^{T}\sqrt{\frac{1}{N}\mathrm{tr}\left(Q_f'\Omega Q_f\right)\sigma_y^2} = \frac{1}{T}\sum_{t=1}^{T}\sqrt{\frac{1}{N}\sum_{j=1}^{k}\lambda_j\,\sigma_y^2} = O(1) \]
from assumptions 3.2 and 3.5, so (1/T) Σ_{t=1}^T ‖(1/√N)Q_f′x_ty_t‖₂ = O_p(1).

The above bounds and Equation (3.21) prove the theorem.

Proof of Theorem 3.1.3(d):
\[ \hat y_{sT+h} - y_{sT+h} = \hat\beta_s'\hat s_{fT+h} - \beta_s's_{fT+h} = \hat\beta_s'\left(\hat s_{fT+h} - Ls_{fT+h}\right) + \left(\hat\beta_s'L - \beta_s'\right)s_{fT+h} \]
so, since L is orthogonal,
\[ \left|\hat y_{sT+h} - y_{sT+h}\right| \le \left\|\hat\beta_s\right\|_2\left\|\hat s_{fT+h} - Ls_{fT+h}\right\|_2 + \left\|\hat\beta_s - L\beta_s\right\|_2\left\|s_{fT+h}\right\|_2 \]
The following bounds hold:

• ‖ŝ_{fT+h} − Ls_{fT+h}‖₂ = O_p(T^{−1/2}) from Theorem 3.1.3(b).

• ‖β̂_s − Lβ_s‖₂ = O_p(T^{−1/2}) from Theorem 3.1.3(c).

• E‖s_{fT+h}‖²_F = tr(Λ_f^{−1/2}Q_f′ΩQ_fΛ_f^{−1/2}) = k, so ‖s_{fT+h}‖₂ = O_p(1).

• β̂_s = Lβ_s + O_p(T^{−1/2}) from Theorem 3.1.3(c), so ‖β̂_s‖₂ = O_p(1).

and the result follows.

Proof of Theorem 3.1.4: The proof of Theorem 3.1.4 is based on the following inequalities:

(a) |(1/N)λ̂_j − (1/N)d_j| ≤ |(1/N)λ̂_j − (1/N)λ_j| + |(1/N)λ_j − (1/N)d_j|

(b) ‖ŝ_{ft} − Lf_t‖₂ ≤ ‖ŝ_{ft} − L₁s_{ft}‖₂ + ‖L₁s_{ft} − Lf_t‖₂

(c) ‖β̂_s − Lβ_f‖₂ ≤ ‖β̂_s − L₁β_s‖₂ + ‖L₁β_s − Lβ_f‖₂

(d) |ŷ_{sT+h} − y_{fT+h}| ≤ |ŷ_{sT+h} − y_{sT+h}| + |y_{sT+h} − y_{fT+h}|

where L₁ is a sign matrix. Theorem 3.1.1 gives bounds linking population principal component quantities to population factor quantities, which are relevant for the second term in each of the above inequalities. However, these bounds are written in terms of the noise-to-signal ratio ρ = σ²/λ_k, whereas the assumptions of Theorem 3.1.4 are written in terms of the eigenvalues of B′B (d_j) rather than the eigenvalues of Ω (λ_j). In order to utilise the results of Theorem 3.1.1, the bounds are re-derived in terms of the modified noise-to-signal ratio ρ̄ = σ²/d_k. These results are given as Lemmas 18, 21 and 22. Under assumptions 4.1(a) and 4.2(b), from Theorem 3.1.2 we have ρ̄ = O(N^{−α}). Therefore, under Assumption 3.4, Lemma 21 yields ‖L₁s_{ft} − Lf_t‖₂ = O_p(N^{−α/2}) and Lemma 22 yields ‖β_s − Lβ_f‖₂ = O_p(N^{−α/2}). Since y_{sT+h} − y_{fT+h} = β_s′s_{fT+h} − β_f′f_{T+h}, these two results yield y_{sT+h} − y_{fT+h} = O_p(N^{−α/2}).

Lemma 18 states that 1 − ρ̄ ≤ d_j/λ_j ≤ 1. It follows that
\[ \left|\frac{1}{N}\lambda_j - \frac{1}{N}d_j\right| \le \frac{\lambda_j}{N}\bar\rho = O\left(N^{-\alpha}\right) \]
since ρ̄ = O(N^{−α}) from Theorem 3.1.2 and, as shown in the proof of Theorem 3.1.1(a), under assumptions 4.1(a) and 4.2(b), d_j ≤ λ_j ≤ d_j + σ², which implies that λ_j/N = O(1) under Assumption 4.1(a). Thus, we have bounds for the second terms on the right hand sides of inequalities (a) to (d).

Bounds for the first terms on the right hand sides of inequalities (a) to (d) are provided by Theorem 3.1.3. The following points show that the assumptions of Theorem 3.1.3 are satisfied by the assumptions of Theorem 3.1.4.

• Assumption 3.1 of Theorem 3.1.3 is satisfied by Assumption 4.3(a) of Theorem 3.1.4.

• Assumption 3.2 of Theorem 3.1.3 is satisfied by assumptions 4.1(a), 4.2(a), 4.2(b) and 4.3(a) of Theorem 3.1.4.

• Assumption 3.3 of Theorem 3.1.3 is satisfied by Assumption 4.4 of Theorem 3.1.4, using Lemma 17.

• Assumption 3.4 of Theorem 3.1.3 is satisfied by assumptions 4.1(a), 4.1(b) and 4.2(b) of Theorem 3.1.4, using Lemma 16.

• Assumption 3.5 of Theorem 3.1.3 is the same as Assumption 4.5 of Theorem 3.1.4.

Thus, under the assumptions of Theorem 3.1.4, the results of Theorem 3.1.3 hold and the first terms on the right hand sides of inequalities (a) to (d) are all O_p(T^{−1/2}).

Lemmas Used in Theorems

Lemma 11. Σ_{j=1}^N (λ̂_j − λ_j)² ≤ ‖S_xx − Ω‖²_F

Lemma 12. Under Assumption 3.4,
\[ \sum_{j=1}^{k}\sum_{\substack{i=1\\ i\neq j}}^{N}\hat R_{ij}^2 \le \frac{4}{\Delta^2N^2}\left\|S_{xx} - \Omega\right\|_F^2 \]

Lemma 13. Under Assumption 3.4,
\[ 1 - \hat R_{jj}^2 \le \frac{4}{\Delta^2N^2}\left\|S_{xx} - \Omega\right\|_F^2 \]

Lemma 14. Under Assumption 3.4,
\[ \left\|\hat Q_\perp'Q_f\right\|_2^2 \le \frac{4k}{\Delta^2N^2}\left\|S_{xx} - \Omega\right\|_F^2 \]

Lemma 15. Under assumptions 3.1, 3.2 and 3.3,
\[ E\left\|S_{xx} - \Omega\right\|_F^2 \le \frac{2\gamma N^2}{T} \]

Lemma 16. Under assumptions 4.1(a), 4.1(b) and 4.2(b), there exist θ, N₀ > 0 such that N > N₀ ⟹ θN < |λ_j − λ_i| for i = 1, ..., k, j = 1, ..., N and i ≠ j.

Lemma 17. Under Assumption 4.4,
\[ \sup_t\sup_N\max_{\substack{1\le i\le N\\ 1\le j\le N}}\sum_{r=0}^{\infty}\left|\mathrm{cov}\left(x_{it}x_{jt},\,x_{it-r}x_{jt-r}\right)\right| < \gamma < \infty \]

Lemma 18. 1 − ρ̄ ≤ d_i/λ_i ≤ 1 for i = 1, .., k.

Lemma 19. |q_i′u_j| ≤ 2cρ̄ for i ≠ j, where q_i is the ith column of Q_f and u_j is the jth column of U.

Lemma 20. 1 − ρ̄ ≤ Σ_{j=1}^k (q_i′u_j)²

Lemma 21. If k = 1 or c ≤ (1−ρ̄)/(2ρ̄√((k−1)(1−ρ̄))), then there exists a sign matrix L such that E‖s_t − Lf_t‖²_F ≤ k(2ρ̄ + ρ̄²(4c²(k−1)(1−ρ̄) − 1)).

Lemma 22. If k = 1 or c ≤ (1−ρ̄)/(2ρ̄√((k−1)(1−ρ̄))), then there exists a sign matrix L such that E‖β_s − Lβ_f‖_F ≤ √(σ²_y k(2ρ̄ + ρ̄²(4c²(k−1)(1−ρ̄) − 1))).

Proofs of Lemmas

Proof of Lemma 11:
\[ \sum_{j=1}^{N}\left(\hat\lambda_j - \lambda_j\right)^2 = \sum_{j=1}^{N}\hat\lambda_j^2 + \sum_{j=1}^{N}\lambda_j^2 - 2\sum_{j=1}^{N}\hat\lambda_j\lambda_j \]
Since S_xx and Ω are positive definite and symmetric, it follows from Marcus (1956) that tr(S_xxΩ) ≤ Σ_{j=1}^N λ̂_jλ_j. Therefore
\[ \sum_{j=1}^{N}\left(\hat\lambda_j - \lambda_j\right)^2 \le \mathrm{tr}\left(S_{xx}S_{xx}\right) + \mathrm{tr}\left(\Omega\Omega\right) - 2\,\mathrm{tr}\left(S_{xx}\Omega\right) = \left\|S_{xx} - \Omega\right\|_F^2 \]

Proof of Lemma 12: Let q̂_j denote the jth column of Q̂ (the eigenvectors of S_xx), q_i the ith column of Q, and R̂_ij = q̂_j′q_i. Then
\[ \hat q_j'\left(S_{xx} - \Omega\right)q_i = \hat q_j'S_{xx}q_i - \hat q_j'\Omega q_i = \hat q_j'\hat Q\hat\Lambda\hat Q'q_i - \hat q_j'Q\Lambda Q'q_i \]
Note that q̂_j′Q̂Λ̂ = (0 ... λ̂_j ... 0) and ΛQ′q_i = (0 ... λ_i ... 0)′. It follows that q̂_j′Q̂Λ̂Q̂′q_i = λ̂_j q̂_j′q_i = λ̂_jR̂_ij and q̂_j′QΛQ′q_i = λ_i q̂_j′q_i = λ_iR̂_ij. Therefore the above equation may be written as
\[ \hat q_j'\left(S_{xx} - \Omega\right)q_i = \left(\hat\lambda_j - \lambda_i\right)\hat R_{ij} \qquad (3.22) \]
We may also write
\[ \left(\lambda_j - \lambda_i\right)\hat R_{ij} - \left(\lambda_j - \hat\lambda_j\right)\hat R_{ij} = \left(\hat\lambda_j - \lambda_i\right)\hat R_{ij} \qquad (3.23) \]
so from equations (3.22) and (3.23),
\[ \left(\lambda_j - \lambda_i\right)\hat R_{ij} = \left(\lambda_j - \hat\lambda_j\right)\hat R_{ij} + \hat q_j'\left(S_{xx} - \Omega\right)q_i \]
Therefore
\[ \left(\lambda_j - \lambda_i\right)^2\hat R_{ij}^2 \le 2\left(\lambda_j - \hat\lambda_j\right)^2\hat R_{ij}^2 + 2\left(\hat q_j'\left(S_{xx} - \Omega\right)q_i\right)^2 \]
Since Σ_{i=1}^N R̂²_ij = Σ_{i=1}^N q̂_j′q_iq_i′q̂_j = 1, summing over i yields
\[ \sum_{i=1}^{N}\left(\lambda_j - \lambda_i\right)^2\hat R_{ij}^2 \le 2\left(\lambda_j - \hat\lambda_j\right)^2 + 2\hat q_j'\left(S_{xx} - \Omega\right)\left(S_{xx} - \Omega\right)\hat q_j \]
Summing over j yields
\[ \sum_{j=1}^{N}\sum_{i=1}^{N}\left(\lambda_j - \lambda_i\right)^2\hat R_{ij}^2 \le 2\sum_{j=1}^{N}\left(\lambda_j - \hat\lambda_j\right)^2 + 2\,\mathrm{tr}\left(\left(S_{xx} - \Omega\right)\left(S_{xx} - \Omega\right)\right) \le 4\left\|S_{xx} - \Omega\right\|_F^2 \]
from Lemma 11. Under Assumption 3.4, N²Δ² < (λ_j − λ_i)² for j = 1, ..., k, i = 1, ..., N and i ≠ j. Therefore
\[ N^2\Delta^2\sum_{j=1}^{k}\sum_{\substack{i=1\\ i\neq j}}^{N}\hat R_{ij}^2 \le \sum_{j=1}^{k}\sum_{\substack{i=1\\ i\neq j}}^{N}\left(\lambda_j - \lambda_i\right)^2\hat R_{ij}^2 \le \sum_{j=1}^{N}\sum_{i=1}^{N}\left(\lambda_j - \lambda_i\right)^2\hat R_{ij}^2 \le 4\left\|S_{xx} - \Omega\right\|_F^2 \]
which yields
\[ \sum_{j=1}^{k}\sum_{\substack{i=1\\ i\neq j}}^{N}\hat R_{ij}^2 \le \frac{4}{N^2\Delta^2}\left\|S_{xx} - \Omega\right\|_F^2 \]

Proof of Lemma 13:
$$\sum_{i=1}^{N}\hat R_{ij}^2 = \sum_{i=1}^{N}\hat q_j'q_iq_i'\hat q_j = 1$$


Also
$$\sum_{i=1}^{N}\hat R_{ij}^2 = \sum_{\substack{i=1\\ i\neq j}}^{N}\hat R_{ij}^2 + \hat R_{jj}^2$$
Therefore
$$1 - \hat R_{jj}^2 = \sum_{\substack{i=1\\ i\neq j}}^{N}\hat R_{ij}^2 \le \sum_{j=1}^{k}\sum_{\substack{i=1\\ i\neq j}}^{N}\hat R_{ij}^2 \le \frac{4}{N^2\Delta^2}\,\|S_{xx}-\Omega\|_F^2$$
from Lemma 12.

Proof of Lemma 14: $\hat Q_f\hat Q_f' + \hat Q_\perp\hat Q_\perp' = I$ so, denoting $\hat R_f = Q_f'\hat Q_f$,
$$\left\|\hat Q_\perp'Q_f\right\|_2^2 \le \left\|\hat Q_\perp'Q_f\right\|_F^2 = \operatorname{tr}\left(Q_f'\hat Q_\perp\hat Q_\perp'Q_f\right) = k - \operatorname{tr}\left(\hat R_f'\hat R_f\right) = k - \sum_{i=1}^{k}\sum_{j=1}^{k}\hat R_{ij}^2 \le k - \sum_{j=1}^{k}\hat R_{jj}^2$$
$$= \sum_{j=1}^{k}\left(1 - \hat R_{jj}^2\right) \le \frac{4k}{N^2\Delta^2}\,\|S_{xx}-\Omega\|_F^2$$
from Lemma 13.

Proof of Lemma 15: $E\,\|S_{xx}-\Omega\|_F^2 = E\operatorname{tr}\big(\tilde\Omega\tilde\Omega\big) = \sum_{i=1}^{N}\sum_{j=1}^{N}E\left(\big[\tilde\Omega\big]_{ij}^2\right)$, where $\big[\tilde\Omega\big]_{ij}$ is the $i,j$th element of $\tilde\Omega = S_{xx} - \Omega$. Denoting the $i,j$th element of $\Omega$ as


$\sigma_{ij}$, note that for all $i = 1,\dots,N$ and $j = 1,\dots,N$,
$$E\left(\big[\tilde\Omega\big]_{ij}^2\right) = E\left(\frac{1}{T}\sum_{t=1}^{T}x_{it}x_{jt} - \sigma_{ij}\right)^2 = \frac{1}{T^2}\operatorname{var}\left(\sum_{t=1}^{T}x_{it}x_{jt}\right) = \frac{1}{T^2}\sum_{t=1}^{T}\sum_{s=1}^{T}\operatorname{cov}\left(x_{it}x_{jt},\, x_{is}x_{js}\right)$$
$$\le \frac{2}{T^2}\sum_{t=1}^{T}\sum_{r=0}^{t-1}\left|\operatorname{cov}\left(x_{it}x_{jt},\, x_{it-r}x_{jt-r}\right)\right| \le \frac{2}{T}\sup_t\sum_{r=0}^{\infty}\left|\operatorname{cov}\left(x_{it}x_{jt},\, x_{it-r}x_{jt-r}\right)\right|$$
$$\le \frac{2}{T}\sup_t\sup_N\max_{\substack{1\le i\le N\\ 1\le j\le N}}\sum_{r=0}^{\infty}\left|\operatorname{cov}\left(x_{it}x_{jt},\, x_{it-r}x_{jt-r}\right)\right| \le \frac{2\gamma}{T}$$
Therefore
$$E\,\|S_{xx}-\Omega\|_F^2 \le \sum_{i=1}^{N}\sum_{j=1}^{N}\frac{2\gamma}{T} = \frac{2N^2\gamma}{T}$$

Proof of Lemma 16: Consider the case where $i < j$ and $j \le k$, so that $\lambda_i > \lambda_j$ and $d_i > d_j$. From the proof of Theorem 3.1.1(a) we have $\lambda_i \ge d_i$ and $\lambda_j \le d_j + \sigma^2$. It follows that
$$d_i - d_j - \sigma^2 \le \lambda_i - \lambda_j \tag{3.24}$$
From Assumptions 4.1(a) and 4.1(b),
$$\frac{Nd_L}{c} < |d_i - d_j| \tag{3.25}$$


Also, from Assumption 4.2(b), $\exists M < \infty$ such that
$$\sigma^2 < MN^{1-\alpha} \tag{3.26}$$
Combining equations (3.24), (3.25) and (3.26) yields
$$\frac{Nd_L}{c} - MN^{1-\alpha} < \lambda_i - \lambda_j$$
Define $N_0 = \left(\frac{cM}{d_L}\right)^{\frac{1}{\alpha}}$. Then
$$N > N_0 \implies \exists\,\theta > 0 \text{ such that } \theta N < \frac{Nd_L}{c} - MN^{1-\alpha} < \lambda_i - \lambda_j$$
proving the lemma for cases in which $i < j$ and $j \le k$.

For cases where $i \le k$ and $k+1 \le j \le N$, define $d_j \equiv 0$ for $j = k+1,\dots,N$, and the above argument still applies with $c$ set equal to 1.

For cases where $i \le k$ and $j < i$, the above argument holds with the indices $i$ and $j$ interchanged.

Proof of Lemma 17:
$$x_{it}x_{jt} = \sum_{p=1}^{k}\sum_{q=1}^{k}[B]_{ip}[B]_{jq}f_{pt}f_{qt} + \sum_{p=1}^{k}[B]_{ip}f_{pt}\varepsilon_{jt} + \sum_{q=1}^{k}[B]_{jq}f_{qt}\varepsilon_{it} + \varepsilon_{it}\varepsilon_{jt}$$
Using the fact that for random variables $a_1,\dots,a_m$ and $b_1,\dots,b_n$, $\operatorname{cov}\left(\sum_{i=1}^{m}a_i,\, \sum_{j=1}^{n}b_j\right) = \sum_{i=1}^{m}\sum_{j=1}^{n}\operatorname{cov}(a_i,b_j)$, it is straightforward, but tedious, to show that under Assumption 4.4 there exists a constant $\gamma$ such that
$$\sup_t\sup_N\max_{\substack{1\le i\le N\\ 1\le j\le N}}\sum_{r=0}^{\infty}\left|\operatorname{cov}\left(x_{it}x_{jt},\, x_{it-r}x_{jt-r}\right)\right| < \gamma < \infty$$


Proof of Lemma 18: As shown in the proof of Theorem 3.1.1(a), $\lambda_i \ge d_i$ for all $i = 1,\dots,k$. Therefore $\rho = \frac{\sigma^2}{d_k} \ge \frac{\sigma^2}{\lambda_k} = \tilde\rho \implies 1 - \rho \le 1 - \tilde\rho$. The result then follows from Theorem 3.1.1(a).

Proof of Lemma 19: $Q_f\Lambda_fQ_f' + Q_\perp\Lambda_\perp Q_\perp' = UDU' + \Psi$. Premultiplying by $D^{-1}Q_f'$, postmultiplying by $U$, and subtracting $Q_f'U$ yields
$$D^{-1}Q_f'UD - Q_f'U = \left(D^{-1}\Lambda_f - I_k\right)Q_f'U - D^{-1}Q_f'\Psi U \tag{3.27}$$
We now consider each of the right hand side terms. Let $e_i$ be a vector of zeros with a 1 in the $i$th element only. We have
$$\left(e_i'D^{-1}Q_f'\Psi Ue_j\right)^2 \le \operatorname{tr}\left(U'\Psi Q_fD^{-1}e_ie_i'D^{-1}Q_f'\Psi U\right) \le \frac{1}{d_k^2}\,e_i'Q_f'\Psi UU'\Psi Q_fe_i \le \frac{1}{d_k^2}\,e_i'Q_f'\Psi^2Q_fe_i \le \frac{\sigma^4}{d_k^2}$$
$$\therefore\quad \left|e_i'D^{-1}Q_f'\Psi Ue_j\right| \le \rho \tag{3.28}$$
For the other right hand side term we have $\left|e_i'\left(D^{-1}\Lambda_f - I_k\right)Q_f'Ue_j\right| = \left|\left(\frac{\lambda_i}{d_i} - 1\right)q_i'u_j\right| = |q_i'u_j|\left|\frac{\lambda_i}{d_i} - 1\right|$. From the proof of Theorem 3.1.1(a) we have $\lambda_i \le d_i + \sigma^2$ for $i = 1,\dots,k$. Dividing by $d_i$ yields $\frac{\lambda_i}{d_i} \le 1 + \frac{\sigma^2}{d_i} \le 1 + \rho$. Also $|q_i'u_j| \le 1$ by the Cauchy–Schwarz inequality, so
$$\left|e_i'\left(D^{-1}\Lambda_f - I_k\right)Q_f'Ue_j\right| \le \rho \tag{3.29}$$


Combining equations (3.27), (3.28) and (3.29),
$$\left|e_i'\left(D^{-1}Q_f'UD - Q_f'U\right)e_j\right| = \left|e_i'\left(D^{-1}\Lambda_f - I_k\right)Q_f'Ue_j - e_i'D^{-1}Q_f'\Psi Ue_j\right|$$
$$\le \left|e_i'\left(D^{-1}\Lambda_f - I_k\right)Q_f'Ue_j\right| + \left|e_i'D^{-1}Q_f'\Psi Ue_j\right| \le 2\rho$$
i.e. $\left|\left(\frac{d_j}{d_i} - 1\right)q_i'u_j\right| \le 2\rho \implies |q_i'u_j| \le 2c\rho$ for $i \neq j$.

Proof of Lemma 20: As shown in the proof of Lemma 18, $1 - \rho \le 1 - \tilde\rho$, so the result follows from Lemma 10.

Proof of Lemma 21:
$$E\,\|s_{ft} - Lf_t\|_F^2 = \operatorname{tr}\left[\left(\Lambda_f^{-\frac{1}{2}}Q_f'UD^{\frac{1}{2}} - L\right)\left(D^{\frac{1}{2}}U'Q_f\Lambda_f^{-\frac{1}{2}} - L\right) + \Lambda_f^{-\frac{1}{2}}Q_f'\Psi Q_f\Lambda_f^{-\frac{1}{2}}\right]$$
$$= 2\operatorname{tr}\left(I - L\Lambda_f^{-\frac{1}{2}}Q_f'UD^{\frac{1}{2}}\right) = 2\sum_{i=1}^{k}\left(1 - L_{ii}\left(\frac{d_i}{\lambda_i}\right)^{\frac{1}{2}}q_i'u_i\right)$$
Consider the terms $\left(\frac{d_i}{\lambda_i}\right)^{\frac{1}{2}}q_i'u_i$, $i = 1,\dots,k$. From Lemma 18, $1-\rho \le \frac{d_i}{\lambda_i}$. If $k = 1$, then from Lemma 20, $1-\rho \le (q_1'u_1)^2$. Combining these two results yields $1-\rho \le \left(\frac{d_1}{\lambda_1}\right)^{\frac{1}{2}}|q_1'u_1|$, which produces the required result.

If $k > 1$, then from Lemma 20, $1-\rho \le \sum_{j\neq i}^{k}(q_i'u_j)^2 + (q_i'u_i)^2$, and from Lemma 19, $(q_i'u_j)^2 \le 4c^2\rho^2$ when $i \neq j$. Combining these two results with Lemma 18 yields
$$(1-\rho)^2 - 4c^2\rho^2(k-1)(1-\rho) \le \frac{d_i}{\lambda_i}\left(q_i'u_i\right)^2$$
If $c \le \frac{1-\rho}{2\rho\sqrt{(k-1)(1-\rho)}}$, then the left hand side is non-negative and
$$\sqrt{(1-\rho)^2 - 4c^2\rho^2(k-1)(1-\rho)} \le \sqrt{\frac{d_i}{\lambda_i}}\,\left|q_i'u_i\right|$$


If we choose $L$ so that $\operatorname{sign}(L_{ii}) = \operatorname{sign}(q_i'u_i)$, we get
$$E\,\|s_{ft} - Lf_t\|_F^2 \le k - k\sqrt{(1-\rho)^2 - 4c^2\rho^2(k-1)(1-\rho)}$$
Multiplying this by $\frac{1+\sqrt{(1-\rho)^2 - 4c^2\rho^2(k-1)(1-\rho)}}{1+\sqrt{(1-\rho)^2 - 4c^2\rho^2(k-1)(1-\rho)}}$ yields the result.

Proof of Lemma 22: Using the triangle inequality,
$$\left\|\hat\beta_s - L\hat\beta_f\right\|_F = \left\|\frac{1}{T}\sum_{t=1}^{T}s_{ft}y_t - \frac{1}{T}\sum_{t=1}^{T}Lf_ty_t\right\|_F \le \frac{1}{T}\sum_{t=1}^{T}\|y_t\|_F\,\|s_{ft} - Lf_t\|_F$$
Therefore, by the Cauchy–Schwarz inequality,
$$E\left\|\hat\beta_s - L\hat\beta_f\right\|_F \le \frac{1}{T}\sum_{t=1}^{T}\sqrt{E\left(\|y_t\|_F^2\right)E\left(\|s_{ft} - Lf_t\|_F^2\right)} = \sqrt{\sigma_y^2\,k\left(2\rho + \rho^2\left(4c^2(k-1)(1-\rho) - 1\right)\right)}$$
by Lemma 21.


Chapter 4

The Grouped Variable Approximate Factor Model

Since the publication of the theoretical papers on approximate factor models by

Forni et al. (2000), Forni et al. (2004), Stock and Watson (2002a), Bai and Ng

(2002), Bai (2003) and Bai and Ng (2006) there have been many applications

of the principal component factor estimator to large macroeconomic datasets.

However, there has been relatively little discussion about the appropriateness

of the assumptions employed in the relevant theorems for the variables used.

It is well-known that the theorems in these papers apply to the `approximate'

factor model which allows for a degree of cross-sectional correlation between the

error terms. Specifically, cross-correlation is permitted provided that the sums of the absolute values of the rows of the error correlation matrix are uniformly bounded¹. It was argued in Chapter 3 of this thesis that this assumption might be too restrictive for the macroeconomic applications of the principal component factor estimator that have appeared in the literature. It is easy to imagine that, as the number of variables in the model is increased, many of the new variables that are added to the model will have errors which are correlated with the errors of variables already in the model, so that the absolute row sums of the covariance matrix grow without a fixed bound as $N \to \infty$. Consider an `economy' consisting of $N$ variables. Assume that each of these variables belongs to a single group and that there exist $m$ groups of size $N_j$, so that $N = \sum_{j=1}^{m}N_j$, where $m$ is a fixed, finite scalar. These groups could correspond to geographical or industrial sectors; they could be groups based on the type of economic activity being measured (e.g. "stock prices", "housing starts and sales", "average hourly earnings", etc., as used in the appendix of Stock and Watson (2002b)); or they might simply be broad functional groupings such as `real variables', `price variables', and `financial variables'. It is assumed that a factor structure exists for all $N$ variables in the economy. Since pairs of variables that belong to the same group tend to be quite similar, it is reasonable to expect that there would be stronger cross-correlation between the errors of these variables than there would be between pairs of errors that correspond to variables from two different groups. The group structure of the variables implies the existence of an ordering such that a block structure for the error covariance matrix exists, and the above argument suggests that the diagonal blocks are likely to exhibit stronger cross-correlation than the off-diagonal blocks.

¹In fact Stock and Watson (2002a) and Bai and Ng (2002) make the slightly weaker assumption that the mean of the absolute row sums is bounded, and Forni et al. (2000) and Forni et al. (2004) place a uniform upper bound on the largest dynamic eigenvalue of the spectrum of the error process, which is a slightly stronger assumption.

As an illustration, assume that the off-diagonal blocks of the error covariance matrix, which represent error correlation between different groups,


are subject to the same weak correlation assumption that is employed by Bai (2003) and Bai and Ng (2006) for the entire covariance matrix. That is, the row sums of the absolute value of the off-diagonal blocks of the covariance matrix have a fixed uniform upper bound of $M$. In contrast, the diagonal blocks of the error covariance matrix, which represent error correlation between variables that belong to the same group, have absolute row sums that are $O(N^{1-\alpha})$ where $0 < \alpha < 1$, so that error cross-correlation within groups grows as $N$ grows. Suppose that $N$ is increased by increasing the number of variables in each group at a rate of $N$. The absolute row sums of the entire error covariance matrix will then grow at a rate of $N^{1-\alpha}$. Consequently, the theorems of Forni et al. (2000), Forni et al. (2004), Stock and Watson (2002a), Bai and Ng (2002), Bai (2003) and Bai and Ng (2006) do not apply, but consistency of the principal components estimator is proved by Theorem 3.1.4 in Chapter 3 of this thesis. However, the rate of convergence is $\min\big(T^{\frac{1}{2}}, N^{\frac{\alpha}{2}}\big)$. If the rate of growth in the error correlation within groups is high ($\alpha$ is low), then this rate of convergence might be quite slow. Boivin and Ng (2006) have conducted Monte Carlo simulations for the principal component estimator of a factor model similar in some ways to that described above, and found the performance of the estimator to be poor.
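The growth argument above can be made concrete with a small numerical sketch (a toy construction, not taken from the thesis: three equal groups, equicorrelated errors within each group with coefficient $N_j^{-\alpha}$, zero correlation across groups), under which the maximum absolute row sum of the error covariance grows like $N^{1-\alpha}$:

```python
import numpy as np

def grouped_error_cov(N, m=3, alpha=0.5):
    """Toy block-diagonal error covariance with m equal groups.

    Within a group the errors are equicorrelated with coefficient
    r = Nj**(-alpha), so each absolute row sum is 1 + (Nj - 1)*r,
    which grows like Nj**(1 - alpha); across groups correlation is zero.
    """
    Nj = N // m
    r = Nj ** (-alpha)
    block = np.full((Nj, Nj), r) + (1.0 - r) * np.eye(Nj)
    return np.kron(np.eye(m), block)

for N in (30, 300, 1200):
    Psi = grouped_error_cov(N)
    print(N, round(np.abs(Psi).sum(axis=1).max(), 2))
```

With $\alpha = 0.5$ the maximum absolute row sum grows roughly like $\sqrt{N}$, so the standard uniform-bound assumption fails even though the off-diagonal blocks are identically zero.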

The work presented in this chapter is motivated by the proposition that

the poor empirical performance of the principal components estimator in some

applications might be due to relatively strong error cross-correlation between

variables that belong to the same group. A new factor model, named the

grouped variable approximate factor model, is proposed. This model is a

formalisation of the model that is loosely described above. It places a weak


correlation restriction on the off-diagonal blocks of the error covariance matrix, but permits arbitrarily strong correlation between the errors of variables that belong to the same group. An approximate instrumental variables estimator is proposed for the model. This estimator is computationally straightforward even for very large models, since it is non-iterative and requires only the computation of matrix products and the inversion of $k\times k$ matrices, where $k$ is the number of factors in the model. It is not required that $T > N$. Consistency is proved for this estimator, and rates of convergence are derived. The key result in the chapter is that the rates of convergence depend on the rate of growth of cross-correlation in the off-diagonal blocks of the error covariance matrix only. The degree of error cross-correlation between variables within the same group does not affect the first-order properties of the estimator. Consequently, provided that practitioners are able to identify groups of variables which are likely to contain the strongest error cross-correlation, the approximate instrumental variables estimator should provide rates of convergence superior to those available from the principal components estimator.

Since the entire error covariance matrix of the standard approximate factor

model satisfies the restrictions of the grouped variable model, the grouped

variable approximate factor model is a generalisation of the approximate factor

model, and the techniques and results presented in this chapter also apply to

the extension of the approximate factor model presented in Chapter 3, to

the approximate factor model considered by Stock and Watson (2002a), Bai

and Ng (2002), Bai (2003) and Bai and Ng (2006), and to the classical static

factor model. Consequently, the literature on the approximate factor model

is of relevance. However, since this literature is reviewed in Chapters 1 and


3, it will not be reviewed again here. The construction of the approximate

instrumental variables estimator exploits an errors-in-variables representation

of the static factor model that was first proposed by Madansky (1964) and was

utilised by Hägglund (1982) to construct an instrumental variables estimator

for the classical `strict' static factor model. The results in this chapter might

be considered as an extension of this work to a dual-limit time series setting

with a grouped variable approximate factor structure.

Another part of the (as yet unpublished) literature that is relevant is the

recent work on GMM estimation of regression models in which there exist

a large number of instruments which have an approximate factor structure.

Consider the regression model

$$y_t = \theta'x_t + \eta_t$$
where $x_t$ is a vector of $n$ observable variables, $\theta$ is an $n\times 1$ vector of regression coefficients, and $\eta_t$ is a scalar regression error term for which $E(x_t\eta_t) \neq 0$.

Suppose that a vector of $N$ instruments $z_t$ is available. Under assumptions similar to those used by Bai (2003), Bai and Ng (2007b) prove that, if $x_t$ and $z_t$ are driven by a set of $k$ common factors, then a GMM estimator of $\theta$, in which the first $k$ principal components of $z_t$ are used as instruments, is $\sqrt{T}$-consistent and asymptotically Gaussian if $\frac{\sqrt{T}}{N} \to 0$. Furthermore, they show that the $k$-factor GMM estimator is more efficient than a GMM estimator constructed using any subset of $k$ of the elements of $z_t$. Kapetanios and Marcellino (2006) consider some more general relationships between the regressor, the observable instruments and the factors, which allow for elements


of zt to be weak instruments. They prove several asymptotic results which

support the use of GMM estimation with the first k principal components of zt

used as instruments. Favero et al. (2005) and Beyer et al. (2005) are examples

of applications of this technique to macroeconomic data. The approximate

instrumental variables estimator proposed in this chapter and the theorems

that are proved, may be applied directly to regression models of this type.

Consequently, this chapter contains an alternative estimation procedure to

that proposed by Bai and Ng (2007b) and Kapetanios and Marcellino (2006),

which is consistent under the more general grouped variable approximate factor

model restriction.

The remainder of this chapter presents the work on the grouped variable

approximate factor model. In Section 4.1, the grouped variable approximate

factor model is described and some notation is established. In Section 4.2,

the approximate instrumental variables estimator is presented. For the sake

of clarity, the estimators are initially derived under the assumption that the

error covariance of the model is block diagonal. In this case, the estimator is

simply a standard instrumental variables estimator and it is relatively easy to

understand the rationale of the estimators and the choices of instruments. In

fact, the assumption that the off-diagonal blocks of the error covariance matrix are zero is not necessary. All that is required is that they satisfy a weak correlation assumption similar to that applied to the entire error covariance matrix in Chapter 3. Since, under this condition, the moment conditions used to derive the instrumental variables estimator hold only in an approximate sense as

N gets large, the instruments are referred to as approximate instruments, and

the estimator is referred to as an approximate instrumental variables estimator.


Section 4.3 presents some consistency results for the approximate instrumental

variables estimator. In particular, it is shown that the estimator is consistent

in a framework in which $(T,N) \to (\infty,\infty)$ jointly. Rates of convergence depend on the rate of growth of cross-correlation in the off-diagonal blocks of the error covariance matrix (i.e. on growth in cross-correlation between the errors of variables belonging to different groups), but do not depend on the rate of growth of error cross-correlation between variables that belong to the same block. Consequently, useful rates of convergence may be achieved in cases in which strong cross-correlation exists between errors, provided that sufficient a priori information exists to allow for the arrangement of variables into

groups with relatively strongly cross-correlated errors. Section 4.4 contains

an application of the approximate instrumental variables estimator to a US

macroeconomic data set, and Section 4.5 contains some concluding comments.

The proofs of all theorems are in Appendix 1.

4.1 The grouped variable approximate factor model

It is assumed that an $N\times 1$ vector of observable variables $x_t$ may be represented by a static factor model
$$x_t = Bf_t + \varepsilon_t \qquad t = 1,\dots,T \tag{4.1}$$
where $f_t$ is a $k\times 1$ vector of unobservable factors, $\varepsilon_t$ is an $N\times 1$ vector of unobservable errors which are assumed to be uncorrelated with the factors, and $B$ is an $N\times k$ matrix of non-random coefficients referred to as the factor


loadings. It is assumed that the observable variables may be ordered so that $x_t$ may be partitioned into three subvectors of size $N_a$, $N_2$ and $N_3$, such that $N = N_a + N_2 + N_3$, denoted $x_{at}$, $x_{2t}$ and $x_{3t}$. The covariance matrix of the errors, $\Psi = E(\varepsilon_t\varepsilon_t')$, may be partitioned into three corresponding blocks:
$$\Psi = \begin{pmatrix} \Psi_{aa} & \Psi_{a2} & \Psi_{a3} \\ \Psi_{2a} & \Psi_{22} & \Psi_{23} \\ \Psi_{3a} & \Psi_{32} & \Psi_{33} \end{pmatrix} \tag{4.2}$$

where $\Psi_{ji}$ for $i, j = a, 2, 3$ is an $N_j\times N_i$ matrix. In the approximate factor model, the errors are assumed to be weakly correlated in the sense that the row sums of the absolute value of the error covariance $\Psi$ are uniformly bounded. In contrast, in the grouped variable approximate factor model, a weak correlation assumption will be applied to the off-diagonal blocks of $\Psi$ only. Specifically, in Section 4.3 an upper bound will be placed on the growth rate of the maximum of the largest singular values of the off-diagonal blocks of $\Psi$. The blocks on the diagonal, corresponding to variables belonging to the same group, are permitted to display arbitrarily strong correlation. Of course, if weak correlation applies to the entire covariance matrix, then it also applies to the off-diagonal blocks, so the formulation above is a generalisation of the approximate factor model. As in the approximate factor model, it will be assumed that the eigenvalues of $BB'$ grow at a rate of $N$.

The extension to higher numbers of groups is trivial. Furthermore, groups may be aggregated, since any two sets of variables which constitute groups as defined above may be combined to form a superset of variables which also satisfies the above definition of a group. However, the approximate instrumental variables estimator, which is described in the next section, requires that there exist at least three blocks.

In this chapter, we also consider the estimation of the parameters in a

factor-augmented regression equation

$$y_t = \beta'f_t + \alpha'w_t + \varepsilon_{yt} \tag{4.3}$$
where $y_t$ is a scalar random variable, $\varepsilon_{yt}$ is a scalar error term, and $w_t$ is an $m\times 1$ vector of exogenous variables which may include lags of $y_t$.

4.2 The approximate instrumental variables estimator

The grouped variable approximate factor model may be written as
$$\begin{pmatrix} x_{at} \\ x_{2t} \\ x_{3t} \end{pmatrix} = \begin{pmatrix} B_a \\ B_2 \\ B_3 \end{pmatrix} f_t + \begin{pmatrix} \varepsilon_{at} \\ \varepsilon_{2t} \\ \varepsilon_{3t} \end{pmatrix} \qquad t = 1,\dots,T$$
Following ideas established by Madansky (1964) and developed by Hägglund (1982) in the context of a classical `exact' static factor model, the grouped variable approximate factor model will now be interpreted as an errors-in-variables model. To this end, $x_{at}$ is partitioned into a $k\times 1$ vector $x_{0t}$ and an $(N_a - k)\times 1$ vector $x_{1t}$, and the vectors $\varepsilon_{0t}$ and $\varepsilon_{1t}$ and matrices $B_0$ and $B_1$ are


created similarly. The model may then be written as
$$\begin{pmatrix} x_{0t} \\ x_{1t} \\ x_{2t} \\ x_{3t} \end{pmatrix} = \begin{pmatrix} B_0 \\ B_1 \\ B_2 \\ B_3 \end{pmatrix} f_t + \begin{pmatrix} \varepsilon_{0t} \\ \varepsilon_{1t} \\ \varepsilon_{2t} \\ \varepsilon_{3t} \end{pmatrix} \qquad t = 1,\dots,T \tag{4.4}$$
In what follows, the $k\times 1$ vector $x_{0t}$ will be used as a proxy for the unobservable $k\times 1$ vector of factors. The key feature of the partitioning is that $\varepsilon_{0t}$ is only weakly correlated with $\varepsilon_{2t}$ and $\varepsilon_{3t}$. This fact will be exploited to construct an approximate instrumental variables estimator.

In what follows, it will be assumed that $\operatorname{rank}(B_0) = k$. Since $B$ and $f_t$ are identified only up to a non-singular transformation, it is without loss of generality that we can replace $B$ by $BB_0^{-1}$ and $f_t$ by $B_0f_t$

and write the model as

$$\begin{pmatrix} x_{0t} \\ x_{1t} \\ x_{2t} \\ x_{3t} \end{pmatrix} = \begin{pmatrix} I_k \\ B_1 \\ B_2 \\ B_3 \end{pmatrix} f_t + \begin{pmatrix} \varepsilon_{0t} \\ \varepsilon_{1t} \\ \varepsilon_{2t} \\ \varepsilon_{3t} \end{pmatrix} \qquad t = 1,\dots,T \tag{4.5}$$


where $x_{jt}$ and $\varepsilon_{jt}$ are $N_j\times 1$ vectors, $B_j$ is $N_j\times k$, and $N_0 = k$. Using obvious definitions, we will write the model in this form as
$$x_t = Bf_t + \varepsilon_t$$
The partitioning in Equation (4.5) suggests an errors-in-variables interpretation of the factor model. We may write
$$x_{it} = B_if_t + \varepsilon_{it}, \quad i = 1, 2, 3; \qquad x_{0t} = f_t + \varepsilon_{0t} \tag{4.6}$$
In order to establish ideas, it will be assumed initially that the off-diagonal blocks of (4.2) are known to all be zero. It will subsequently be shown that this assumption is not necessary.

4.2.1 Estimating Bi

Suppose, for example, that we wished to estimate $B_2$. From Equation (4.6) we may write
$$x_{2t} - B_2x_{0t} = \varepsilon_{2t} - B_2\varepsilon_{0t}$$
$z_{2t} = x_{3t}$ will be used as a vector of instruments for estimating $B_2$. Postmultiplying the above equation by $z_{2t}'$ yields
$$x_{2t}z_{2t}' - B_2x_{0t}z_{2t}' = \varepsilon_{2t}f_t'B_3' - B_2\varepsilon_{0t}f_t'B_3' - B_2\varepsilon_{0t}\varepsilon_{3t}' + \varepsilon_{2t}\varepsilon_{3t}' \tag{4.7}$$


Bearing in mind that the errors are assumed to be uncorrelated with the factors and that, for the sake of clarity, we are assuming for the time being that the error covariance (4.2) is block diagonal, taking expectations of Equation (4.7) yields the moment condition
$$\Omega_2 - \Omega_{20}B_2' = 0$$
where $\Omega_2 = E(z_{2t}x_{2t}')$ and $\Omega_{20} = E(z_{2t}x_{0t}')$. Replacing $\Omega_2$ and $\Omega_{20}$ by their sample estimators yields an instrumental variables estimator for $B_2$:
$$\hat B_2' = \left(S_{20}'S_{20}\right)^{-1}S_{20}'S_2$$
where $S_2 = \frac{1}{T}\sum_{t=1}^{T}z_{2t}x_{2t}'$ and $S_{20} = \frac{1}{T}\sum_{t=1}^{T}z_{2t}x_{0t}'$. Similar arguments may be used to construct instrumental variables estimators for $B_1$ and $B_3$, using $z_{1t} = (x_{2t}'\;x_{3t}')'$ and $z_{3t} = x_{2t}$ as instrument vectors, and the (temporary) assumption that the error covariance (4.2) is block diagonal.
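The instrumental variables estimator for $B_2$ above can be sketched in a few lines. The simulation below is illustrative only (sizes, noise level and loadings are invented), uses the orientation with $B_0 = I_k$, block-diagonal errors, and the instrument $z_{2t} = x_{3t}$:

```python
import numpy as np

rng = np.random.default_rng(0)
T, k, N2, N3 = 5000, 2, 8, 8       # illustrative sizes only
B2 = rng.normal(size=(N2, k))      # true loadings, orientation with B0 = I_k
B3 = rng.normal(size=(N3, k))

f  = rng.normal(size=(T, k))                         # factors (Sigma_f = I)
x0 = f + 0.3 * rng.normal(size=(T, k))               # proxy block: x0t = ft + e0t
x2 = f @ B2.T + 0.3 * rng.normal(size=(T, N2))
x3 = f @ B3.T + 0.3 * rng.normal(size=(T, N3))       # instrument block: z2t = x3t

S2  = x3.T @ x2 / T                                  # S2  = (1/T) sum z2t x2t'
S20 = x3.T @ x0 / T                                  # S20 = (1/T) sum z2t x0t'
B2_hat = np.linalg.solve(S20.T @ S20, S20.T @ S2).T  # B2hat' = (S20'S20)^(-1) S20'S2

print(np.abs(B2_hat - B2).max())
```

With $T$ moderately large, $\hat B_2$ recovers $B_2$ up to sampling error; the same recipe with $z_{1t} = (x_{2t}'\;x_{3t}')'$ and $z_{3t} = x_{2t}$ gives $\hat B_1$ and $\hat B_3$.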

4.2.2 Estimating δ = (β′ α′)′

The above approach may be extended to estimate the regression parameters in Equation (4.3). Let $\delta = (\beta'\;\alpha')'$ and $x_{wt} = (x_{0t}'\;w_t')'$. Let $z_{wt}$ be a vector of valid instruments. The line of argument used above then yields the moment condition
$$\Omega_w - \Omega_{w0}\delta = 0$$


where $\Omega_w = E(z_{wt}y_t)$ and $\Omega_{w0} = E(z_{wt}x_{wt}')$. Replacing $\Omega_w$ and $\Omega_{w0}$ with their sample estimators yields an instrumental variables estimator for $\delta$:
$$\hat\delta = \left(S_{w0}'S_{w0}\right)^{-1}S_{w0}'S_w$$
where $S_w = \frac{1}{T}\sum_{t=1}^{T}z_{wt}y_t$ and $S_{w0} = \frac{1}{T}\sum_{t=1}^{T}z_{wt}x_{wt}'$. The appropriate choice of elements for the instrument vector depends on which group $y_t$ belongs to. If $y_t$ belongs to the same group of variables as $x_{2t}$, then $z_{wt} = x_{3t}$. If $y_t$ belongs to the same group of variables as $x_{3t}$, then $z_{wt} = x_{2t}$.
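A matching sketch for the regression estimator (again with invented values; for brevity there are no exogenous regressors, so $\delta = \beta$ and $x_{wt} = x_{0t}$, and $y_t$ is treated as a group-2 variable, so $z_{wt} = x_{3t}$):

```python
import numpy as np

rng = np.random.default_rng(1)
T, k, N3 = 5000, 2, 8
beta = np.array([1.0, -0.5])       # true delta (no w_t in this sketch)
B3 = rng.normal(size=(N3, k))

f  = rng.normal(size=(T, k))
x0 = f + 0.3 * rng.normal(size=(T, k))               # xwt = x0t here
x3 = f @ B3.T + 0.3 * rng.normal(size=(T, N3))       # zwt = x3t (y in group 2)
y  = f @ beta + 0.3 * rng.normal(size=T)

Sw  = x3.T @ y / T                                   # Sw  = (1/T) sum zwt yt
Sw0 = x3.T @ x0 / T                                  # Sw0 = (1/T) sum zwt xwt'
delta_hat = np.linalg.solve(Sw0.T @ Sw0, Sw0.T @ Sw)

print(delta_hat)
```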

4.2.3 Estimating Σf and Ψ

To estimate $\Sigma_f$, construct the instrument vector $z_{ft} = (x_{2t}'\;x_{3t}')'$. Recall that
$$x_{0t} = f_t + \varepsilon_{0t}$$
Postmultiplying by $z_{ft}'$ and taking expectations (noting that the errors are assumed to be uncorrelated with the factors and, for the time being, we are assuming the error covariance (4.2) to be block diagonal) yields the moment condition
$$\Omega_{f0} = B_f\Sigma_f$$
where $\Omega_{f0} = E(z_{ft}x_{0t}')$ and $B_f = (B_2'\;B_3')'$. Replacing $\Omega_{f0}$ with its sample analogue, and $B_f$ by the estimator $\hat B_f = (\hat B_2'\;\hat B_3')'$ described above, yields an estimator of $\Sigma_f$:
$$\tilde\Sigma_f = \big(\hat B_f'\hat B_f\big)^{-1}\hat B_f'S_{f0}$$


where $S_{f0} = \frac{1}{T}\sum_{t=1}^{T}z_{ft}x_{0t}'$. It should be noted that this estimate is not constrained to be symmetric. A preferable estimator is therefore
$$\hat\Sigma_f = \tfrac{1}{2}\tilde\Sigma_f + \tfrac{1}{2}\tilde\Sigma_f'$$
Given consistent estimates of $\Sigma_f$ and $B$, the error covariance $\Psi$ may be consistently estimated by
$$\hat\Psi = S_{xx} - \hat B\hat\Sigma_f\hat B'$$
where $S_{xx} = \frac{1}{T}\sum_{t=1}^{T}x_tx_t'$.

Once the model parameters have been estimated, transformations may be applied to the estimated factor loadings to achieve any particular orientation that might be of interest. For example, it might be deemed worthwhile to transform the estimated model to have orthonormal factors so that direct comparison can be made with the usual principal component estimator of the factor model.
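The $\Sigma_f$ and $\Psi$ estimators above can be sketched as follows (invented values; for brevity the true loadings stand in for the fitted loadings of Section 4.2.1):

```python
import numpy as np

rng = np.random.default_rng(2)
T, k, N1, N2, N3 = 5000, 2, 6, 8, 8
B = np.vstack([np.eye(k),                        # B0 = I_k in this orientation
               rng.normal(size=(N1, k)),
               rng.normal(size=(N2, k)),
               rng.normal(size=(N3, k))])
Sigma_f = np.array([[1.0, 0.3], [0.3, 1.0]])     # true factor covariance

f = rng.multivariate_normal(np.zeros(k), Sigma_f, size=T)
x = f @ B.T + 0.3 * rng.normal(size=(T, k + N1 + N2 + N3))

x0 = x[:, :k]
zf = x[:, k + N1:]                               # zft = (x2t' x3t')'
Bf = B[k + N1:, :]                               # stand-in for the fitted (B2' B3')'

Sf0 = zf.T @ x0 / T                              # Sf0 = (1/T) sum zft x0t'
Sigma_tilde = np.linalg.solve(Bf.T @ Bf, Bf.T @ Sf0)  # not symmetric in general
Sigma_hat = (Sigma_tilde + Sigma_tilde.T) / 2         # symmetrised estimator

Sxx = x.T @ x / T
Psi_hat = Sxx - B @ Sigma_hat @ B.T              # plug-in estimate of Psi
print(Sigma_hat)
```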

4.2.4 Estimating ft

Now consider the estimation of the factor vector at a particular point in time. If the true factor loadings were known, then an unbiased estimator of the factor² is given by
$$f_t^* = (B'B)^{-1}B'x_t \tag{4.8}$$

²corresponding to the rotation given by Equation (4.5)


The covariance matrix of this estimate is
$$\operatorname{cov}(f_t^*) = (B'B)^{-1}B'\Psi B(B'B)^{-1} \tag{4.9}$$
Note that $\|\operatorname{cov}(f_t^*)\|_2 = \left\|(B'B)^{-1}B'\Psi B(B'B)^{-1}\right\|_2 \le \frac{d_1\sigma^2}{d_k^2}$, where $d_j$ is the $j$th eigenvalue of $B'B$ and $\sigma^2$ is the largest eigenvalue of the entire error covariance matrix $\Psi$. Theorem 3.1.2 in Chapter 3 presents conditions under which the right hand side of this inequality goes to zero as $N \to \infty$. Under such conditions Equation (4.8) is a consistent estimator of the factor. However, the block structure of the error covariance matrix in the grouped variable factor model suggests that this convergence to zero might occur quite slowly, so that the value of the covariance given by Equation (4.9) would be of interest in applications. Consistent estimators of the `population' factor estimator and its covariance (equations (4.8) and (4.9)) are constructed by replacing the population parameters with their sample estimators, to yield
$$\hat f_t = \big(\hat B'\hat B\big)^{-1}\hat B'x_t \tag{4.10}$$
and
$$\hat C = \operatorname{est.cov}(f_t^*) = \big(\hat B'\hat B\big)^{-1}\hat B'\hat\Psi\hat B\big(\hat B'\hat B\big)^{-1} \tag{4.11}$$
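In a small simulation (invented values, diagonal $\Psi$, true parameters used in place of estimates), the sample covariance of $f_t^* - f_t$ reproduces the matrix in Equation (4.9):

```python
import numpy as np

rng = np.random.default_rng(3)
T, k, N = 2000, 2, 60
B = rng.normal(size=(N, k))
Psi = 0.3 ** 2 * np.eye(N)           # a simple diagonal error covariance

f = rng.normal(size=(T, k))
x = f @ B.T + 0.3 * rng.normal(size=(T, N))

BtB = B.T @ B
f_star = np.linalg.solve(BtB, B.T @ x.T).T                     # f*_t, eq. (4.8)
C = np.linalg.solve(BtB, B.T @ Psi @ B) @ np.linalg.inv(BtB)   # cov(f*_t), eq. (4.9)

print(np.cov((f_star - f).T, bias=True))
print(C)
```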

4.2.5 Estimation with approximate factors

The instrumental variables estimators described above are based on moment

conditions that were derived from the assumption that the error covariance matrix (4.2) is block diagonal. It turns out that this assumption is not necessary


for consistency. As will be claimed more precisely in Section 4.3 (and proved in Appendix 1), in a framework in which $(N, T) \to (\infty,\infty)$ jointly, consistency holds if the off-diagonal blocks of the error covariance satisfy a weak correlation assumption. Since the moment conditions do not hold exactly in such a framework, the instruments will be referred to as approximate instruments, and the estimators listed above will be referred to as approximate instrumental variables estimators. To summarise, the approximate instruments for each parameter matrix are listed in Table 4.1, and the approximate instrumental variables estimators are listed below.

Table 4.1: Approximate Instruments

Parameter    Approximate Instrument
$B_1$        $z_{1t} = (x_{2t}'\;x_{3t}')'$
$B_2$        $z_{2t} = x_{3t}$
$B_3$        $z_{3t} = x_{2t}$
$\delta$     $z_{wt} = x_{2t}$ if $y_t$ belongs to group 3; $z_{wt} = x_{3t}$ if $y_t$ belongs to group 2
$\Sigma_f$   $z_{ft} = (x_{2t}'\;x_{3t}')'$

The approximate instrumental variables estimators are
$$\hat B_i' = \left(S_{i0}'S_{i0}\right)^{-1}S_{i0}'S_i$$
for $i = 1,\dots,3$, where $S_i = \frac{1}{T}\sum_{t=1}^{T}z_{it}x_{it}'$ and $S_{i0} = \frac{1}{T}\sum_{t=1}^{T}z_{it}x_{0t}'$;
$$\hat\delta = \left(S_{w0}'S_{w0}\right)^{-1}S_{w0}'S_w$$
where $S_w = \frac{1}{T}\sum_{t=1}^{T}z_{wt}y_t$ and $S_{w0} = \frac{1}{T}\sum_{t=1}^{T}z_{wt}x_{wt}'$;


$$\hat\Sigma_f = \tfrac{1}{2}\tilde\Sigma_f + \tfrac{1}{2}\tilde\Sigma_f'$$
where
$$\tilde\Sigma_f = \big(\hat B_f'\hat B_f\big)^{-1}\hat B_f'S_{f0}$$
and $S_{f0} = \frac{1}{T}\sum_{t=1}^{T}z_{ft}x_{0t}'$ and $\hat B_f = (\hat B_2'\;\hat B_3')'$;
$$\hat\Psi = S_{xx} - \hat B\hat\Sigma_f\hat B'$$
where $S_{xx} = \frac{1}{T}\sum_{t=1}^{T}x_tx_t'$ and $\hat B = (I_k\;\hat B_1'\;\hat B_2'\;\hat B_3')'$;
$$\hat f_t = \big(\hat B'\hat B\big)^{-1}\hat B'x_t$$
$$\hat C = \operatorname{est.cov}(f_t^*) = \big(\hat B'\hat B\big)^{-1}\hat B'\hat\Psi\hat B\big(\hat B'\hat B\big)^{-1}$$

4.3 Some Dual-Limit Theory

In the case where the off-diagonal blocks of the error covariance matrix are zero and the number of variables $N$ is fixed, standard arguments may be used to prove consistency of the estimators defined in Section 4.2. Furthermore, in this case more efficient GMM estimators may easily be derived, and the well-known testing procedures of the GMM literature applied to the model. In a setting in which the off-diagonal blocks are non-zero, and $N$ and $T$ approach infinity jointly, consistency is less straightforward to establish.


Let
$$v_t = \left(f_t'\;\;\varepsilon_t'\;\;w_t'\;\;y_t\right)'$$
The following assumptions are made.

Assumptions 5.

5.1 $E(v_t) = 0$ for $t = 1,\dots,T$.

5.2 Denote $\Sigma_v = \frac{1}{T}E\left(\sum_{t=1}^{T}v_tv_t'\right)$. Then $\Sigma_v$ is a full-rank matrix and $\max_{1\le i\le\tilde N,\,1\le j\le\tilde N}\left|[\Sigma_v]_{ij}\right| < c < \infty$, where $\tilde N = N + k + m + 1$.

5.3 $\sup_t\sup_N\max_{1\le i\le\tilde N,\,1\le j\le\tilde N}\sum_{r=0}^{\infty}\left|\operatorname{cov}\left(v_{it}v_{jt},\, v_{it-r}v_{jt-r}\right)\right| < \gamma < \infty$, where $\tilde N = N + k + m + 1$.

5.4 $\varphi = \sup_N\max_{0\le i\le 3,\,0\le j\le 3,\,i\neq j}\|\Psi_{ij}\|_2 = O\big(N^{\frac{1-\alpha}{2}}\big)$, $\alpha > 0$.

5.5 Denoting $d_{ji} = \operatorname{eig}_i\left(B_j'B_j\right)$, $0 < d_{\min} \le \frac{d_{ji}}{N_j} \le d_{\max} < \infty$, where $N_j$ is the number of rows in $B_j$ and $i = 1,\dots,k$.

5.6 $0 < d_{\min} < \operatorname{eig}_i\left(B_0'B_0\right) < d_{\max} < \infty$, $i = 1,\dots,k$.

5.7 $E(f_t\varepsilon_t') = 0$, $E(f_t\varepsilon_{yt}) = 0$.

5.8 $\Omega_{w0} = E\left(\frac{1}{T}\sum_{t=1}^{T}z_{wt}x_{wt}'\right)$ is of full column rank, where $x_{wt} = (x_{0t}'\;w_t')'$ and $z_{wt}$ is the vector of instruments used to estimate $\delta$. Also, $\delta = O(1)$.

5.9 $0 < m_{\min} \le \frac{N_j}{N} \le m_{\max} < 1$, $j = 1, 2, 3$.

Assumptions 5.1, 5.2 and 5.3 are made to ensure that sample second moments converge in probability to their corresponding population second moments. Assumption 5.4 places the weak correlation restriction on the off-diagonal blocks of the error covariance matrix. It allows the maximum of the largest of the singular values of the off-diagonal blocks of the error covariance to grow at a rate strictly less than $N^{\frac{1}{2}}$. Since $\|\Psi_{ij}\|_2 \le \sqrt{\|\Psi_{ij}\|_1\|\Psi_{ij}\|_\infty}$, this assumption could be satisfied by making the stronger assumption that $\|\Psi_{ij}\|_1\|\Psi_{ij}\|_\infty = O(N^{1-\alpha})$, that is, that the product of the maximum absolute row sum and the maximum absolute column sum of the off-diagonal blocks of the error covariance matrix grows at a rate strictly less than $N$. Assumptions 5.5 and 5.6 require that all $k$ eigenvalues of the common component grow at a rate of $N$. If they grew slower than this, then the proportion of the variance of $x_t$ accounted for by the factors would go to zero as $N \to \infty$; any faster, and it would go to one. Assumption 5.7 requires that the errors are uncorrelated with the factors. It is likely that this assumption could be relaxed to one of weak correlation, but this isn't attempted here. Assumption 5.8 is required in order for the moment condition used to derive the estimator for $\delta$ to have a unique, uniformly bounded solution. Lastly, Assumption 5.9 requires that $N$ is increased by increasing the number of variables in all the groups at the same rate.
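The bound $\|\Psi_{ij}\|_2 \le \sqrt{\|\Psi_{ij}\|_1\|\Psi_{ij}\|_\infty}$ invoked above is a standard matrix-norm inequality, and can be spot-checked numerically on an arbitrary (made-up) off-diagonal block:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(40, 25))          # stand-in for an off-diagonal block Psi_ij

spec = np.linalg.norm(A, 2)            # ||A||_2 : largest singular value
col  = np.abs(A).sum(axis=0).max()     # ||A||_1 : maximum absolute column sum
row  = np.abs(A).sum(axis=1).max()     # ||A||_inf : maximum absolute row sum

# ||A||_2 <= sqrt(||A||_1 ||A||_inf) holds for any real matrix
print(spec, np.sqrt(col * row))
```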

Under these assumptions, the following theorems hold. Proofs are presented in Appendix 1.

Theorem 4.3.1 (Consistency of AIV Regression Estimator). Under Assumptions 5.1 to 5.9,
$$\big\|\hat\delta - \delta\big\|_2 = O_p\left[\max\left(T^{-\frac{1}{2}},\, N^{-\frac{\alpha}{2}}\right)\right]$$

Theorem 4.3.2 (Consistency of the AIV Factor Loading Estimator). Under

assumptions 5.1 to 5.7 and 5.9, max16j6Ni

∥∥∥Bi(j) −Bi(j)

∥∥∥2

= Op

[max

(T− 1

2 , N−α2

)]196

Page 205: Factor Analysis of High Dimensional Time Series

where Bi(j) is the jth row of Bi and Bi(j) is the corresponding row of Bi.

Theorem 4.3.3 (Consistency of the AIV Estimate of the Covariance Ma-

trix of the Factors). Under assumptions 5.1 to 5.7 and 5.9∥∥∥Σf − Σf

∥∥∥2

=

Op

[max

(T− 1

2 , N−α2

)].

Theorem 4.3.4 (Consistency of the AIV Factor Estimator). Under assump-

tions 5.1 to 5.7 and 5.9,∥∥∥ft − f ∗t

∥∥∥2

= Op

[max

(T− 1

2 , N−α2

)].

Theorem 4.3.5 (Consistency of the Sample Covariance Estimator for the

Population Covariance Estimator). Under assumptions 5.1 to 5.7 and 5.9,∥∥∥C − cov(f ∗t )∥∥∥

2= Op

[max

(T− 1

2 , N−α2

)].

As is the case for the principal components estimator in Chapter 3, the rate of convergence of the estimator depends on the growth of $N$ and on the rate at which cross-correlation grows as $N$ grows (which is determined by the parameter $\alpha$). However, for the approximate instrumental variables estimator, it is only the rate of growth of cross-correlation in the off-diagonal blocks which matters. The rate of convergence is not affected by the growth of cross-correlation in the diagonal blocks. As long as the variables with the highest concentration of correlated errors are appropriately arranged into groups, significant error cross-correlation which would result in poor rates of convergence for the principal components estimator does not affect the performance of the approximate instrumental variables estimator.
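To make the estimation idea concrete, the block below sketches the instrumental-variables moment estimator of one group's factor loadings, $\hat{B}_1' = (S_{10}'S_{10})^{-1}S_{10}'S_1$, using the variables in the other groups as instruments. It is a minimal simulation sketch only; the sample size, group sizes, and noise levels are hypothetical choices, not values from this chapter.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 2000                          # sample size (hypothetical)
N1, N2, N3 = 30, 30, 30           # group sizes (hypothetical)
f = rng.standard_normal((T, 1))   # single factor, Sigma_f = 1
B1 = rng.uniform(0.5, 1.5, (N1, 1))
B2 = rng.uniform(0.5, 1.5, (N2, 1))
B3 = rng.uniform(0.5, 1.5, (N3, 1))
x0 = f + 0.3 * rng.standard_normal((T, 1))          # proxy: x0_t = f_t + e0_t
x1 = f @ B1.T + 0.5 * rng.standard_normal((T, N1))  # group 1 variables
x2 = f @ B2.T + 0.5 * rng.standard_normal((T, N2))  # group 2 variables
x3 = f @ B3.T + 0.5 * rng.standard_normal((T, N3))  # group 3 variables

# AIV estimate of B1: the instrument vector z_t collects the variables in
# the other groups, whose errors are uncorrelated with those of group 1.
z = np.hstack([x2, x3])
S10 = z.T @ x0 / T                # (1/T) * sum_t z_t x0_t'
S1 = z.T @ x1 / T                 # (1/T) * sum_t z_t x1_t'
B1_hat = np.linalg.solve(S10.T @ S10, S10.T @ S1).T  # B1_hat' = (S10'S10)^{-1} S10'S1
```

In population, $S_{10} \approx B_z\Sigma_f$ and $S_1 \approx B_z\Sigma_fB_1'$, so the moment estimator recovers $B_1$ even though the proxy $x_{0t}$ is an error-contaminated measurement of $f_t$.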

Care should be taken in interpreting Theorem 4.3.4. For the approximate factor model in Chapter 3 it was shown that the sample principal components converge in probability to the population factors. A different approach is taken for the grouped variable approximate factor model in this chapter. Firstly, an estimator is written (Equation (4.8)) which assumes that the population factor loading matrix is known. The covariance matrix of this estimator is then given in terms of population parameters (Equation (4.9)). Sample estimators of these quantities are constructed by replacing the unknown population coefficients with sample estimates of those coefficients (Equations (4.10) and (4.11)). Explicitly, it is the population estimator of the factor $f_t^*$ (and its covariance) that is being estimated by $\hat{f}_t$, rather than the factor $f_t$ itself. As was explained in Section 4.2,

$$\left\|\operatorname{cov}(f_t^*)\right\|_2 = \left\|(B'B)^{-1}B'\Psi B(B'B)^{-1}\right\|_2 \leqslant \frac{d_1\sigma^2}{d_k^2}$$

where $d_j$ is the $j$th eigenvalue of $B'B$ and $\sigma^2$ is the largest eigenvalue of the entire error covariance matrix $\Psi$. Theorem 3.1.2 in Chapter 3 presents conditions under which the right hand side of this inequality goes to zero as $N \to \infty$. Consequently, we might expect the approximate instrumental variables estimator of the factors to be consistent under fairly general conditions of error cross-correlation. However, the situations in which the approximate instrumental variables estimator is of particular interest are precisely those in which this convergence is likely to be slow. For this reason, it will generally be of interest to estimate both the factor vector and the covariance matrix of the factor estimator.

Finally, Bai and Ng (2007b) and Kapetanios and Marcellino (2006) have considered the problem of estimating a regression equation

$$y_t = \theta'x_t + \eta_t$$

where $x_t$ is a vector of $n$ observable variables, $\theta$ is a $n \times 1$ vector of regression coefficients, and $\eta_t$ is a scalar regression error term for which $E(x_t\eta_t) \neq 0$. They assume that there exists a $N \times 1$ vector of instruments $z_t$ which has a $k$-factor structure

$$z_t = Bf_t + \varepsilon_t$$

They consider estimating the factors using principal components of $z_t$, and using these estimated factors as instruments in a GMM estimator for $\theta$. They prove consistency and asymptotic Gaussianity as $(N, T) \to (\infty, \infty)$, under assumptions similar to those made in Bai (2003) and Bai and Ng (2006). Equations (4.3) and (4.6) provide us with $y_t = \beta'f_t + \alpha'w_t + \varepsilon_{yt}$ and $x_{0t} = f_t + \varepsilon_{0t}$. Combining these equations yields

$$y_t = \beta'x_{0t} + \alpha'w_t + \delta_{yt} \qquad (4.12)$$

where $\delta_{yt} = \varepsilon_{yt} - \beta'\varepsilon_{0t}$. Note that $E(\delta_{yt}x_{0t}) \neq 0$. Furthermore, $x_{0t}$ has a factor structure and there exists a large set of instrumental variables $z_{wt}$ that have a similar factor structure. Consequently, Equation (4.12), which is the same as that considered by Bai and Ng (2007b) and Kapetanios and Marcellino (2006)³, may be estimated by the approximate instrumental variables estimator, and the theorems of this section apply.

³Note that Kapetanios and Marcellino (2006) considers three possible relationships between the regressor and the factors. It is the second of these relationships that is being considered here. This is also the relationship that is considered by Bai and Ng (2007b).
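The factor-as-instrument idea can be illustrated with a minimal simulation sketch (all parameter values are hypothetical; the first principal component of the instrument panel stands in for the estimated factor, and a simple just-identified IV ratio stands in for the GMM estimator):

```python
import numpy as np

rng = np.random.default_rng(2)
T, N = 1000, 50
f = rng.standard_normal(T)                      # single common factor
B = rng.uniform(0.5, 1.5, N)
z = np.outer(f, B) + 0.5 * rng.standard_normal((T, N))  # instruments: z_t = B f_t + e_t
e0 = rng.standard_normal(T)
x = f + 0.3 * e0                                 # regressor, contaminated by e0
eta = 0.5 * e0 + 0.1 * rng.standard_normal(T)    # E(x_t eta_t) != 0, so OLS is biased
y = 1.5 * x + eta                                # true coefficient is 1.5

# Estimated factor: first left singular vector of the demeaned instrument panel.
fhat = np.linalg.svd(z - z.mean(0), full_matrices=False)[0][:, 0]

theta_iv = (fhat @ y) / (fhat @ x)   # IV estimator using the estimated factor
theta_ols = (x @ y) / (x @ x)        # OLS, inconsistent here
```

Because the estimated factor is (asymptotically) uncorrelated with $\eta_t$ but correlated with $x_t$, the IV estimate centres on the true coefficient while OLS does not; the sign indeterminacy of the principal component cancels in the ratio.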


4.4 An experimental application to US macroeconomic data

As an experiment and illustration, the approximate instrumental variables estimator will now be used to estimate a grouped variable approximate factor model for a US macroeconomic data set, and a comparison made to the principal components estimator of the approximate factor model. The data are the same as those used by Stock and Watson (2002b) and were downloaded from Professor Watson's web site. The reader is referred to their paper for a more complete discussion of the data and for an extensive simulation which compares the out-of-sample forecasting performance of large factor models to a range of more standard forecasting models.

In the appendix of Stock and Watson (2002b) the variables used in their analysis are listed under fourteen different headings. In the following analysis it will be assumed that these headings define a set of groups. Variables are included in the following analysis only if data exist from 1959:01 to 1998:12. Following Stock and Watson (2002b), variables were excluded if they had any observations lying more than 10 times the interquartile range from the median. This gave a set of 150 variables, 149 of which will be used as predictors.

The list of variables in each group is given in Appendix 2. Also listed are codes indicating any transformation that was applied to the variable. The transformations used are those used by Stock and Watson (2002b). The codes are: 1 = no transformation, 2 = first difference, 4 = logarithm, 5 = first difference of logarithms, 6 = second difference of logarithms.

At the current stage of theoretical development, we have little guidance on the best choice of variables to use as proxies for the factors. Arbitrarily, the following variables were used:

1. Personal consumption expenditure (chained) - total durables (GMCDQ),

2. Personal consumption expenditure (chained) - nondurables (GMCNQ),

3. Personal consumption expenditure (chained) - services (GMCSQ),

4. Personal consumption expenditure (chained) - new cars (GMCANQ).

Grouped variable approximate factor models with one to four factors were estimated by using these variables, in order, as factor proxies.

Denoting the factor loading matrix corresponding to group $j$ as $B_j$, for $j = 2, \dots, 14$, $B_j$ may be estimated using as instruments the variables in all groups except Group a and Group $j$. $B_1$ may be estimated using as instruments all variables except those in Group a. The approximate instrumental variables estimator for $B$ is then constructed by stacking $B_0 = I_k$ with the estimates of $B_1, \dots, B_{14}$. $\Sigma_f$ is estimated using as instruments all variables except those in Group a. The other parameters, and the factors, are estimated exactly as for the three-group model detailed in Section 4.2.

The first task was to estimate a one-factor model using the principal components method and the approximate instrumental variables estimator. Since the one-factor model is identified, this allows a simple comparison to be made between the output of the two estimators. Figure 4.1 plots the single factor estimated using the two methods using all of the 150 variables listed above.
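For reference, the principal components side of the comparison can be sketched as below (a minimal illustration; the normalisation $F'F/T = I_k$ is one common convention, and the factor's sign and scale are not identified):

```python
import numpy as np

def pc_factor(X, k=1):
    """First k principal components of a T x N panel X, as factor estimates.

    Columns are demeaned; factors are scaled so that F'F / T = I_k.
    """
    Xc = X - X.mean(axis=0)
    U = np.linalg.svd(Xc, full_matrices=False)[0]  # left singular vectors
    return U[:, :k] * np.sqrt(Xc.shape[0])
```

With a strong common component the estimated factor is very highly correlated (up to sign) with the true factor, which is what makes the comparison in Figure 4.1 meaningful.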


Figure 4.1: Single factor estimated using approximate instrumental variables method and principal components method

[Two time-series panels spanning 1960-1995: "Approximate Instrumental Variables Method" and "Principal Components Method".]

Apart from the scaling, it takes a sharp eye to spot the differences between the two factor estimates. However, Figure 4.2, which shows the difference between the two standardised series, shows that they are not identical. The sample correlation coefficient for the two series is 0.97. Importantly, both estimates have the appearance that we would expect of a macroeconomic factor. In particular, the downturns in the mid-1970s, early 1980s and early 1990s are apparent.

The second task undertaken was to conduct an out-of-sample forecasting simulation using the approximate instrumental variables estimator and the principal components estimator in order to make a direct comparison of their forecasting performance. Two variables were forecast in this simulation: industrial production (IP) and CPI inflation (PUNEW). For each of these variables, factor models were estimated using the two estimation procedures, with all of the 149 other variables used as predictors. No lags were included in any forecasting model and, in each case, the factor order was pre-determined. This was done primarily because we do not yet have a model selection procedure for the approximate instrumental variables estimator. Stock and Watson (2002b) found that "...good forecasts can be made with only one or two factors...".


Figure 4.2: Difference between factor estimated by approximate instrumental variables and factor estimated by principal components

[Single time-series panel spanning 1960-1995.]

They also found little benefit from lagged variables when forecasting IP, but considerable benefit when forecasting PUNEW. As part of their simulation exercise they considered forecasts generated from models with fixed factor orders from 1 to 4. The same fixed orders are considered here.

All variables used to estimate the factors were transformed as indicated in the above list of variables. Left hand side variables were also adjusted to have zero means. The models used are as follows:

$$\frac{1200}{h}\ln\left(\frac{IP_{t+h}}{IP_t}\right) = \beta_I'f_t + \eta_{It}$$

and

$$\frac{1200}{h}\ln\left(\frac{PUNEW_{t+h}}{PUNEW_t}\right) - 1200\ln\left(\frac{PUNEW_t}{PUNEW_{t-1}}\right) = \beta_P'f_t + \eta_{Pt}$$

Forecast horizons considered are 6 months, 12 months, 18 months and 24 months. The first 6-month-ahead forecast was computed by estimating the above models using data from 1959:04 to 1968:07, and then using the parameter estimates and the observations from 1968:08 to compute forecasts of the dependent variables in 1969:02. An extra observation was then added to the data set, the models were re-estimated using data up to 1968:08, and a forecast was computed for 1969:03 using data from 1968:09. Continuing this procedure produced a series of 359 6-month-ahead forecasts for each of the variables. In a similar fashion, 359 12-, 18-, and 24-month-ahead forecasts were also generated. The forecasts were then compared to the actual values which occurred, and mean squared forecast errors were calculated. These are displayed in Tables 4.2 and 4.3.
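The expanding-window design just described can be sketched as follows (a simplified illustration in Python: dates are abstracted to integer indices, the factor estimates are taken as given, and the direct $h$-step regression is re-fitted at each origin):

```python
import numpy as np

def rolling_direct_forecasts(y, F, h, first_train_end, n_forecasts):
    """Expanding-window direct h-step forecast MSE for target y given factors F.

    At each origin `end`, y[t+h] is regressed on F[t] using only data dated
    up to `end`; the fitted coefficients and F[end] give the forecast of
    y[end + h]. Returns the mean squared forecast error.
    """
    preds, actuals = [], []
    for j in range(n_forecasts):
        end = first_train_end + j            # last observation used in estimation
        X = F[: end - h + 1]                 # regressors dated t = 0, ..., end - h
        Y = y[h : end + 1]                   # targets dated t + h <= end
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        preds.append(F[end] @ beta)          # forecast of y[end + h]
        actuals.append(y[end + h])
    err = np.array(actuals) - np.array(preds)
    return np.mean(err ** 2)
```

The key property mirrored here is that no observation dated after the forecast origin enters either the regression or (in the full exercise) the factor estimation, so the MSEs are genuinely out of sample.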

Table 4.2: Forecast MSEs for PC and AIV forecasts for IP

          k=1             k=2             k=3             k=4
  h    PC     AIV     PC     AIV     PC     AIV     PC     AIV
  6   0.7559 0.7095  0.5705 0.7081  0.5615 0.7316  0.5703 0.7382
 12   0.9337 0.9047  0.548  0.8788  0.5261 0.9103  0.5306 0.8771
 18   1.0123 1.0079  0.5718 0.9577  0.525  0.9851  0.5085 0.9111
 24   1.0227 1.0276  0.6164 0.9683  0.5197 0.9675  0.4859 0.8928

Table 4.3: Forecast MSEs for PC and AIV forecasts for PUNEW

          k=1             k=2             k=3             k=4
  h    PC     AIV     PC     AIV     PC     AIV     PC     AIV
  6   1.1978 1.1877  1.1998 1.1863  1.2002 1.1849  1.1371 1.1751
 12   1.1545 1.1367  1.1524 1.1328  1.1171 1.1401  1.0331 1.1403
 18   1.1937 1.1654  1.1793 1.1602  1.1192 1.1666  1.0302 1.1723
 24   1.2138 1.175   1.1704 1.1716  1.1002 1.1699  1.0239 1.186


For the one-factor model, the approximate instrumental variables estimator generally produces a slightly lower mean squared forecast error for both variables at the shorter forecast horizons. Interestingly, however, while increasing the number of factors leads to a significant improvement in the performance of the principal components procedure in forecasting IP, it makes little difference to the performance of the approximate instrumental variables forecast. For the PUNEW forecast, increasing the factor order does not make a large difference to the MSE of either procedure. It is not easy to explain why this might be happening, given the current state of the relevant theory. One possibility is that only the first factor is useful in forecasting PUNEW, and this is being estimated reasonably well by both procedures. In contrast, IP might require two factors to produce an efficient forecast and, for some reason, the approximate instrumental variables procedure is doing a poor job of estimating multiple factors. One possible cause of this might be the choice of proxies. In this particular case, all the proxies were (arbitrarily) chosen to be consumption variables. It might be the case that all the consumption variables respond to changes in the factors in a similar way, so that the matrix $B_0$ is poorly conditioned, resulting in the chosen variables being poor proxies when used together, but satisfactory when used individually. Whatever the cause, this application illustrates the need for further theoretical research on the behaviour of the approximate instrumental variables estimator. Of particular interest is the distribution of the estimator and how it is affected by the choice of proxies and instruments.


4.5 Concluding Comments

The theoretical work in this chapter provides a new method for estimating large factor models. Since the rate of convergence of the estimator does not depend on the degree of correlation in the diagonal blocks of the error covariance matrix, it is of particular interest in applications in which there is a high degree of error cross-correlation between some variables, since the results presented in Chapter 3 suggest that the principal components estimator is likely to perform poorly in such situations. While a small empirical application was carried out, what is really needed is an extensive set of empirical applications which compare the approximate instrumental variables estimator to the usual principal components estimator. It would be of particular interest to see whether the approximate instrumental variables estimator provides better performance in cases where the principal components estimator has been shown to perform relatively poorly. Results such as this would indicate that the poor performance of the principal components estimator in those cases is due to excessive error cross-correlation. Of course, it would also be of interest to compare the two approaches to estimation in cases where the principal components estimator works well.

Much theoretical work remains to be done. The theorems in this chapter prove first order convergence only. Second order convergence results would also be of interest. This would allow for a comparison of the asymptotic variance with that derived for the principal components estimator of the approximate factor model by Bai (2003) and Bai and Ng (2006), which would permit a more thorough theoretical comparison between the two procedures. When constructing the approximate instrumental variables estimator there exist multiple choices for the proxy variables $x_{0t}$. As yet, it is not clear what the best choice is. The derivation of the asymptotic variance might provide some guidance in this choice. A procedure for choosing the factor order would also be useful. This might follow from distribution theory, or it might be based on modifications to traditional model selection procedures, as done by Bai and Ng (2002) for the approximate factor model. Finally, in this chapter it is assumed that the group structure of the variables is known, so that valid approximate instruments may be used for estimation. In some applications, it might not be clear what the group structure is. Since the instrumental variables are chosen based on a set of approximate moment conditions which are overidentifying, the development of tests of the validity of the overidentifying approximate conditions, similar to the well-known J-test in the GMM literature, would be a useful line of research.

Appendix 1 Proofs

This appendix contains proofs of all the theorems stated in this chapter. Firstly, the proofs of the theorems are given. Then the lemmas used to prove the theorems are stated. Finally, the lemmas are proved.

Proofs of Theorems

Proof of Theorem 4.3.1. Define $\Omega_{w0} = E(z_{wt}x_{wt}')$, $\Omega_w = E(z_{wt}y_t)$, $\Psi_w = E(\varepsilon_{wt}\varepsilon_{yt})$, and $\Psi_{w0} = E(\varepsilon_{wt}\varepsilon_{0t}')$, where $\varepsilon_{wt}$ is the error vector for the vector of instruments $z_{wt}$. We have $y_t = \beta'f_t + \alpha'w_t + \varepsilon_{yt}$ and $f_t = x_{0t} - \varepsilon_{0t}$, so we may write $y_t = \gamma'x_{wt} + \varepsilon_{yt} - \beta'\varepsilon_{0t}$ where $\gamma = (\beta' \; \alpha')'$ and $x_{wt} = (x_{0t}' \; w_t')'$. Post-multiplying by $z_{wt}$, taking the expected value, and solving yields an expression for the population parameter

$$\delta = \left(\Omega_{w0}'\Omega_{w0}\right)^{-1}\Omega_{w0}'\Omega_w - \left(\Omega_{w0}'\Omega_{w0}\right)^{-1}\Omega_{w0}'\left(\Psi_w - \Psi_{w0}\beta\right) \qquad (4.13)$$

where the non-singularity of $\Omega_{w0}'\Omega_{w0}$ is a consequence of Assumption 5.8. The sample estimator is

$$\hat{\delta} = \left(S_{w0}'S_{w0}\right)^{-1}S_{w0}'S_w$$

where $S_w = \frac{1}{T}\sum_{t=1}^{T} z_{wt}y_t$ and $S_{w0} = \frac{1}{T}\sum_{t=1}^{T} z_{wt}x_{wt}'$. From Lemma 27, $\frac{1}{N_w}S_{w0}'S_{w0} = \frac{1}{N_w}\Omega_{w0}'\Omega_{w0} + O_p\left(T^{-1/2}\right)$ and $\frac{1}{N_w}S_{w0}'S_w = \frac{1}{N_w}\Omega_{w0}'\Omega_w + O_p\left(T^{-1/2}\right)$, where $N_w$ is the number of elements in $z_{wt}$. Also $\left(\frac{1}{N_w}\Omega_{w0}'\Omega_{w0}\right)^{-1} = O(1)$ as a consequence of Assumption 5.8. It follows that

$$\hat{\delta} - \delta = \left(\Omega_{w0}'\Omega_{w0}\right)^{-1}\Omega_{w0}'\left(\Psi_w - \Psi_{w0}\beta\right) + O_p\left(T^{-1/2}\right)$$

Since

$$\left\|\left(\Omega_{w0}'\Omega_{w0}\right)^{-1}\Omega_{w0}'\left(\Psi_w - \Psi_{w0}\beta\right)\right\|_2 \leqslant \left\|\left(\frac{1}{N_w}\Omega_{w0}'\Omega_{w0}\right)^{-1}\frac{1}{\sqrt{N_w}}\Omega_{w0}'\right\|_2 \left\|\frac{1}{\sqrt{N_w}}\left(\Psi_w - \Psi_{w0}\beta\right)\right\|_2 = O\left(N^{-\alpha/2}\right) \qquad (4.14)$$

from Lemmas 24 and 25 and Assumption 5.9, it follows that

$$\left\|\hat{\delta} - \delta\right\|_2 = O\left(N^{-\alpha/2}\right) + O_p\left(T^{-1/2}\right)$$

Proof of Theorem 4.3.2. The proof of Theorem 4.3.2 follows a similar argument to the proof of Theorem 4.3.1. Define $\Omega_{i0} = E(z_{it}x_{0t}')$, $\Omega_i = E(z_{it}x_{it}')$, $\Psi_{zi} = E(\varepsilon_{zit}\varepsilon_{it}')$, and $\Psi_{zi0} = E(\varepsilon_{zit}\varepsilon_{0t}')$, where $\varepsilon_{zit}$ is the error vector for the vector of instruments $z_{it}$ used to estimate $B_i$. We have $x_{it} = B_if_t + \varepsilon_{it}$ and $f_t = x_{0t} - \varepsilon_{0t}$, so we may write $x_{it} = B_ix_{0t} + \varepsilon_{it} - B_i\varepsilon_{0t}$. Post-multiplying by $z_{it}'$, taking the expected value, and solving yields an expression for the population parameter

$$B_i' = \left(\Omega_{i0}'\Omega_{i0}\right)^{-1}\Omega_{i0}'\Omega_i - \left(\Omega_{i0}'\Omega_{i0}\right)^{-1}\Omega_{i0}'\left(\Psi_{zi} - \Psi_{zi0}B_i'\right) \qquad (4.15)$$

where the non-singularity of $\Omega_{i0}'\Omega_{i0}$ is a consequence of Lemma 24. The sample estimator is

$$\hat{B}_i' = \left(S_{i0}'S_{i0}\right)^{-1}S_{i0}'S_i$$

where $S_i = \frac{1}{T}\sum_{t=1}^{T} z_{it}x_{it}'$ and $S_{i0} = \frac{1}{T}\sum_{t=1}^{T} z_{it}x_{0t}'$. From Lemma 27, $\left\|\frac{1}{N_{zi}}S_{i0}'S_{i0} - \frac{1}{N_{zi}}\Omega_{i0}'\Omega_{i0}\right\|_2 = O_p\left(T^{-1/2}\right)$ and $\left\|\frac{1}{N_{zi}}S_{i0}'S_i - \frac{1}{N_{zi}}\Omega_{i0}'\Omega_i\right\|_2 = O_p\left(T^{-1/2}\right)$, where $N_{zi}$ is the number of elements in $z_{it}$. Also $\left(\frac{1}{N_{zi}}\Omega_{i0}'\Omega_{i0}\right)^{-1} = O(1)$ from Lemma 24. It follows that

$$\max_{1\leqslant j\leqslant N_i}\left\|\hat{B}_{i(j)} - B_{i(j)}\right\|_2 = \max_{1\leqslant j\leqslant N_i}\left\|\left(\Omega_{i0}'\Omega_{i0}\right)^{-1}\Omega_{i0}'\left(\Psi_{zi(j)} - \Psi_{zi0}B_{i(j)}'\right)\right\|_2 + O_p\left(T^{-1/2}\right)$$

Since

$$\max_{1\leqslant j\leqslant N_i}\left\|\left(\Omega_{i0}'\Omega_{i0}\right)^{-1}\Omega_{i0}'\left(\Psi_{zi(j)} - \Psi_{zi0}B_{i(j)}'\right)\right\|_2 \leqslant \left\|\left(\frac{1}{N_{zi}}\Omega_{i0}'\Omega_{i0}\right)^{-1}\frac{1}{\sqrt{N_{zi}}}\Omega_{i0}'\right\|_2 \times \max_{1\leqslant j\leqslant N_i}\left\|\frac{1}{\sqrt{N_{zi}}}\left(\Psi_{zi(j)} - \Psi_{zi0}B_{i(j)}'\right)\right\|_2 = O\left(N_{zi}^{-\alpha/2}\right) \qquad (4.16)$$

from Lemmas 24 and 25, it follows from Assumption 5.9 that

$$\max_{1\leqslant j\leqslant N_i}\left\|\hat{B}_{i(j)} - B_{i(j)}\right\|_2 = O\left(N^{-\alpha/2}\right) + O_p\left(T^{-1/2}\right)$$

Proof of Theorem 4.3.3. Define $\Omega_{f0} = E(z_{ft}x_{0t}')$ and $\Psi_{f0} = E(\varepsilon_{ft}\varepsilon_{0t}')$, where $\varepsilon_{ft} = (\varepsilon_{2t}' \; \varepsilon_{3t}')'$. Also define $\tilde{B}_f = \hat{B}_f - B_f$ and $\tilde{\Omega}_{f0} = S_{f0} - \Omega_{f0}$.

We have $x_{0t} = f_t + \varepsilon_{0t}$ and $z_{ft} = B_ff_t + \varepsilon_{ft}$, where $z_{ft} = (x_{2t}' \; x_{3t}')'$ and $B_f = (B_2' \; B_3')'$. Taking the expected value of $z_{ft}x_{0t}'$ and solving for $\Sigma_f = E(f_tf_t')$ yields

$$\Sigma_f = \left(B_f'B_f\right)^{-1}B_f'\Omega_{f0} - \left(B_f'B_f\right)^{-1}B_f'\Psi_{f0}$$

The second term on the right hand side of the parameter equation is bounded by

$$\left\|\left(\frac{1}{N_f}B_f'B_f\right)^{-1}\frac{1}{N_f}B_f'\Psi_{f0}\right\|_2 \leqslant \left\|\left(\frac{1}{N_f}B_f'B_f\right)^{-1}\frac{1}{\sqrt{N_f}}B_f'\right\|_2\left\|\frac{1}{\sqrt{N_f}}\Psi_{f0}\right\|_2 = O\left(N_f^{-1}\right)$$

since $\sqrt{\operatorname{maxeig}\left[\left(\frac{1}{N_f}B_f'B_f\right)^{-1}\right]} = \frac{1}{\sqrt{d_{\min}}}$ and $\left\|\frac{1}{\sqrt{N_f}}\Psi_{f0}\right\|_2 = O\left(N_f^{-1}\right)$, where $N_f = N_2 + N_3$. Therefore we have

$$\Sigma_f = \left(B_f'B_f\right)^{-1}B_f'\Omega_{f0} + O\left(N_f^{-1}\right)$$

The sample estimator is

$$\hat{\Sigma}_f = \left(\hat{B}_f'\hat{B}_f\right)^{-1}\hat{B}_f'S_{f0}$$

From Lemma 31 and Assumption 5.5, $\left(\frac{1}{N_f}\hat{B}_f'\hat{B}_f\right)^{-1} = \left(\frac{1}{N_f}B_f'B_f\right)^{-1} + O_p\left[\max\left(T^{-1/2}, N_f^{-\alpha/2}\right)\right]$, so to prove the theorem we need to show that $\frac{1}{N_f}S_{f0}'\hat{B}_f = \frac{1}{N_f}\Omega_{f0}'B_f + O_p\left[\max\left(T^{-1/2}, N_f^{-\alpha/2}\right)\right]$. We write $\hat{B}_f = B_f + \tilde{B}_f$ and $S_{f0} = \Omega_{f0} + \tilde{\Omega}_{f0}$. Then

$$\frac{1}{N_f}S_{f0}'\hat{B}_f = \frac{1}{N_f}\Omega_{f0}'B_f + \frac{1}{N_f}\Omega_{f0}'\tilde{B}_f + \frac{1}{N_f}\tilde{\Omega}_{f0}'B_f + \frac{1}{N_f}\tilde{\Omega}_{f0}'\tilde{B}_f \qquad (4.17)$$

Bounds will now be given for the terms on the right hand side of Equation (4.17).

• $\left\|\frac{1}{N_f}\Omega_{f0}'\tilde{B}_f\right\|_2 \leqslant \left\|\frac{1}{\sqrt{N_f}}\Omega_{f0}\right\|_2\left\|\frac{1}{\sqrt{N_f}}\tilde{B}_f\right\|_2 = O_p\left[\max\left(T^{-1/2}, N_f^{-\alpha/2}\right)\right]$ from Lemma 30, Assumption 5.5 and Assumption 5.2.

• $\left\|\frac{1}{N_f}\tilde{\Omega}_{f0}'B_f\right\|_2 \leqslant \left\|\frac{1}{\sqrt{N_f}}\tilde{\Omega}_{f0}\right\|_2\left\|\frac{1}{\sqrt{N_f}}B_f\right\|_2$. It follows from Lemma 26 and Markov's Inequality that $\left\|\frac{1}{\sqrt{N_f}}\tilde{\Omega}_{f0}\right\|_2^2 \leqslant \left\|\frac{1}{\sqrt{N_f}}\tilde{\Omega}_{f0}\right\|_F^2 = O_p\left(T^{-1}\right)$. Also $\left\|\frac{1}{\sqrt{N_f}}B_f\right\|_2^2 = \operatorname{maxeig}\left(\frac{1}{N_f}B_f'B_f\right) = d_{\max}$. It follows that $\left\|\frac{1}{N_f}\tilde{\Omega}_{f0}'B_f\right\|_2 = O_p\left(T^{-1/2}\right)$.

• $\left\|\frac{1}{N_f}\tilde{\Omega}_{f0}'\tilde{B}_f\right\|_2 \leqslant \left\|\frac{1}{\sqrt{N_f}}\tilde{\Omega}_{f0}\right\|_2\left\|\frac{1}{\sqrt{N_f}}\tilde{B}_f\right\|_2 = O_p\left[T^{-1/2}\max\left(T^{-1/2}, N_f^{-\alpha/2}\right)\right]$ from the arguments presented in the above two points.

From these three results and Equation (4.17), $\frac{1}{N_f}S_{f0}'\hat{B}_f = \frac{1}{N_f}\Omega_{f0}'B_f + O_p\left[\max\left(T^{-1/2}, N_f^{-\alpha/2}\right)\right]$, and the required result follows from Assumption 5.9.

Proof of Theorem 4.3.4. The population factor estimator is $f_t^* = (B'B)^{-1}B'x_t$ and the sample factor estimator is $\hat{f}_t = (\hat{B}'\hat{B})^{-1}\hat{B}'x_t$, where $\hat{B} = (I_k \; \hat{B}_1' \; \hat{B}_2' \; \hat{B}_3')'$. It follows that $\left\|\hat{f}_t - f_t^*\right\|_2 = O_p\left[\max\left(T^{-1/2}, N^{-\alpha/2}\right)\right]$ if $\left\|\frac{1}{N}\hat{B}'\hat{B} - \frac{1}{N}B'B\right\|_2 = O_p\left[\max\left(T^{-1/2}, N^{-\alpha/2}\right)\right]$ and $\left\|\frac{1}{N}\hat{B}'x_t - \frac{1}{N}B'x_t\right\|_2 = O_p\left[\max\left(T^{-1/2}, N^{-\alpha/2}\right)\right]$.

Since $B'B = I_k + \sum_{i=1}^{3}B_i'B_i$, the first of these conditions is proved by Lemma 31. To prove the second, note that $S_i = \frac{1}{T}\sum_{t=1}^{T} z_{it}x_{it}'$ and $x_{it} = B_ix_{0t} + \delta_{it}$ where $\delta_{it} = \varepsilon_{it} - B_i\varepsilon_{0t}$. Therefore $S_i = S_{i0}B_i' + S_{i\delta}$ where $S_{i\delta} = \frac{1}{T}\sum_{t=1}^{T} z_{it}\delta_{it}'$. Consequently

$$\hat{B}_i'x_{it} - B_i'x_{it} = \left(S_{i0}'S_{i0}\right)^{-1}S_{i0}'S_{i\delta}x_{it}$$

We have

$$\left\|\frac{1}{N_i}\hat{B}_i'x_{it} - \frac{1}{N_i}B_i'x_{it}\right\|_2 \leqslant \left\|\left(\frac{1}{N_{zi}}S_{i0}'S_{i0}\right)^{-1}\frac{1}{\sqrt{N_{zi}}}S_{i0}'\right\|_2\left\|\frac{1}{\sqrt{N_iN_{zi}}}S_{i\delta}\right\|_2\left\|\frac{1}{\sqrt{N_i}}x_{it}\right\|_2 \qquad (4.18)$$

where $N_{zi}$ is the number of elements in the vector $z_{it}$. The following bounds apply:

• $\left\|\left(\frac{1}{N_{zi}}S_{i0}'S_{i0}\right)^{-1}\frac{1}{\sqrt{N_{zi}}}S_{i0}'\right\|_2 = O_p(1)$ from Lemma 28.

• $\left\|\frac{1}{\sqrt{N_iN_{zi}}}S_{i\delta}\right\|_2 = O_p\left[\max\left(T^{-1/2}, N^{-\alpha/2}\right)\right]$ from Lemma 29.

• $\left\|\frac{1}{\sqrt{N_i}}x_{it}\right\|_2 = O(1)$ under Assumptions 5.1, 5.2 and 5.5.

It follows that $\left\|\frac{1}{N_i}\hat{B}_i'x_{it} - \frac{1}{N_i}B_i'x_{it}\right\|_2 = O_p\left[\max\left(T^{-1/2}, N^{-\alpha/2}\right)\right]$. The result then follows from the fact that

$$\left\|\frac{1}{N}\hat{B}'x_t - \frac{1}{N}B'x_t\right\|_2 \leqslant \sum_{i=1}^{3}\left\|\frac{1}{N}\hat{B}_i'x_{it} - \frac{1}{N}B_i'x_{it}\right\|_2 \leqslant \sum_{i=1}^{3}\left\|\frac{1}{N_i}\hat{B}_i'x_{it} - \frac{1}{N_i}B_i'x_{it}\right\|_2$$

Proof of Theorem 4.3.5. From Lemma 31 and Assumption 5.5, $\left\|\left(\frac{1}{N}\hat{B}'\hat{B}\right)^{-1} - \left(\frac{1}{N}B'B\right)^{-1}\right\|_2 = O_p\left[\max\left(T^{-1/2}, N^{-\alpha/2}\right)\right]$, so to prove the theorem we need to show that $\left\|\frac{1}{N^2}\hat{B}'\hat{\Psi}\hat{B} - \frac{1}{N^2}B'\Psi B\right\|_2 = O_p\left[\max\left(T^{-1/2}, N^{-\alpha/2}\right)\right]$.

$$\frac{1}{N^2}\hat{B}'\hat{\Psi}\hat{B} = \frac{1}{N^2}\hat{B}'\left(S_{xx} - \hat{B}\hat{\Sigma}_f\hat{B}'\right)\hat{B} = \frac{1}{N^2}\hat{B}'S_{xx}\hat{B} - \frac{1}{N^2}\hat{B}'\hat{B}\hat{\Sigma}_f\hat{B}'\hat{B} = \frac{1}{N^2}\hat{B}'\Omega\hat{B} + \frac{1}{N^2}\hat{B}'\tilde{\Omega}\hat{B} - \frac{1}{N^2}\hat{B}'\hat{B}\hat{\Sigma}_f\hat{B}'\hat{B} \qquad (4.19)$$

where $\tilde{\Omega} = S_{xx} - \Omega$ and $\Omega = E(S_{xx})$. We now consider each of the three terms on the right hand side of Equation (4.19).

• For the first right hand side term in Equation (4.19), $\frac{1}{N^2}\hat{B}'\Omega\hat{B} - \frac{1}{N^2}B'\Omega B = \frac{1}{N^2}\tilde{B}'\Omega B + \frac{1}{N^2}B'\Omega\tilde{B} + \frac{1}{N^2}\tilde{B}'\Omega\tilde{B}$, where $\tilde{B} = \hat{B} - B$. The terms on the right hand side of this expression may be bounded as follows. Firstly, $\left\|\frac{1}{N^2}\tilde{B}'\Omega B\right\|_2 \leqslant \left\|\frac{1}{\sqrt{N}}\tilde{B}'\right\|_2\left\|\frac{1}{N}\Omega\right\|_2\left\|\frac{1}{\sqrt{N}}B\right\|_2$, where:

  – $\left\|\frac{1}{\sqrt{N}}B'\right\|_2 = O(1)$ under Assumption 5.5;

  – $\left\|\frac{1}{N}\Omega\right\|_2^2 = \frac{1}{N^2}\operatorname{maxeig}\left(\Omega^2\right) = \frac{1}{N^2}\operatorname{maxeig}(\Omega)^2 = O(1)$ under Assumptions 5.2 and 5.5;

  – since $\tilde{B}' = (0_k \; \tilde{B}_1' \; \tilde{B}_2' \; \tilde{B}_3')$, it follows that $\left\|\frac{1}{\sqrt{N}}\tilde{B}\right\|_2 \leqslant \sum_{i=1}^{3}\left\|\frac{1}{\sqrt{N}}\tilde{B}_i\right\|_2 = O_p\left[\max\left(T^{-1/2}, N^{-\alpha/2}\right)\right]$ from Lemma 30.

Secondly, $\left\|\frac{1}{N^2}\tilde{B}'\Omega\tilde{B}\right\|_2 \leqslant \left\|\frac{1}{\sqrt{N}}\tilde{B}\right\|_2^2\left\|\frac{1}{N}\Omega\right\|_2 = O_p\left[\max\left(T^{-1}, N^{-\alpha}\right)\right]$ from the above arguments. Combining results, we have

$$\left\|\frac{1}{N^2}\hat{B}'\Omega\hat{B} - \frac{1}{N^2}B'\Omega B\right\|_2 = O_p\left[\max\left(T^{-1/2}, N^{-\alpha/2}\right)\right]$$

• For the second right hand side term in Equation (4.19),

$$\frac{1}{N^2}\hat{B}'\tilde{\Omega}\hat{B} = \frac{1}{N^2}\left(B' + \tilde{B}'\right)\tilde{\Omega}\left(B + \tilde{B}\right) = \frac{1}{N^2}B'\tilde{\Omega}B + \frac{1}{N^2}B'\tilde{\Omega}\tilde{B} + \frac{1}{N^2}\tilde{B}'\tilde{\Omega}B + \frac{1}{N^2}\tilde{B}'\tilde{\Omega}\tilde{B}$$

Bounds are now written for each of these terms.

  – $\left\|\frac{1}{N^2}B'\tilde{\Omega}B\right\|_2 \leqslant \frac{1}{\sqrt{N}}\left\|\frac{1}{\sqrt{N}}B\right\|_2^2\left\|\frac{1}{\sqrt{N}}\tilde{\Omega}\right\|_2 \leqslant \frac{d_{\max}}{\sqrt{N}}\left\|\frac{1}{\sqrt{N}}\tilde{\Omega}\right\|_2 \leqslant \frac{d_{\max}}{\sqrt{N}}\left\|\frac{1}{\sqrt{N}}\tilde{\Omega}\right\|_F = O_p\left(N^{-1/2}T^{-1/2}\right)$ from Lemma 26 under Assumptions 5.1 to 5.3.

  – $\left\|\frac{1}{N^2}\tilde{B}'\tilde{\Omega}B\right\|_2 \leqslant \left\|\frac{1}{\sqrt{N}}B'\frac{1}{N}\tilde{\Omega}\frac{1}{\sqrt{N}}\tilde{B}\right\|_2 \leqslant \left\|\frac{1}{\sqrt{N}}B\right\|_2\left\|\frac{1}{N}\tilde{\Omega}\right\|_2\left\|\frac{1}{\sqrt{N}}\tilde{B}\right\|_2$. $\left\|\frac{1}{\sqrt{N}}B\right\|_2 \leqslant \sqrt{d_{\max}}$ under Assumption 5.5 and $\left\|\frac{1}{\sqrt{N}}\tilde{B}\right\|_2 = O_p\left[\max\left(T^{-1/2}, N^{-\alpha/2}\right)\right]$ from Lemma 30. $\left\|\frac{1}{N}\tilde{\Omega}\right\|_2 = O_p\left(T^{-1/2}\right)$ from Lemma 26. Therefore $\left\|\frac{1}{N^2}\tilde{B}'\tilde{\Omega}B\right\|_2 = O_p\left[T^{-1/2}\max\left(T^{-1/2}, N^{-\alpha/2}\right)\right]$.

  – $\left\|\frac{1}{N^2}\tilde{B}'\tilde{\Omega}\tilde{B}\right\|_2 \leqslant \left\|\frac{1}{\sqrt{N}}\tilde{B}\right\|_2^2\left\|\frac{1}{N}\tilde{\Omega}\right\|_2 = O_p\left[T^{-1/2}\max\left(T^{-1}, N^{-\alpha}\right)\right]$ from Lemma 26 and Lemma 30.

Combining results, we have $\left\|\frac{1}{N^2}\hat{B}'\tilde{\Omega}\hat{B}\right\|_2 = O_p\left[\max\left(T^{-1/2}, N^{-\alpha/2}\right)\right]$.

• For the third right hand side term in Equation (4.19), $\left\|\frac{1}{N^2}\hat{B}'\hat{B}\hat{\Sigma}_f\hat{B}'\hat{B} - \frac{1}{N^2}B'B\Sigma_fB'B\right\|_2 = O_p\left[\max\left(T^{-1/2}, N^{-\alpha/2}\right)\right]$, since $\left\|\frac{1}{N}\hat{B}'\hat{B} - \frac{1}{N}B'B\right\|_2 = O_p\left[\max\left(T^{-1/2}, N^{-\alpha/2}\right)\right]$ from Lemma 31, and $\left\|\hat{\Sigma}_f - \Sigma_f\right\|_2 = O_p\left[\max\left(T^{-1/2}, N^{-\alpha/2}\right)\right]$ from Theorem 4.3.3.

Combining these three results with Equation (4.19), we have

$$\left\|\frac{1}{N^2}\hat{B}'\hat{\Psi}\hat{B} - \frac{1}{N^2}B'\Omega B + \frac{1}{N^2}B'B\Sigma_fB'B\right\|_2 = O_p\left[\max\left(T^{-1/2}, N^{-\alpha/2}\right)\right]$$

$$\left\|\frac{1}{N^2}\hat{B}'\hat{\Psi}\hat{B} - \frac{1}{N^2}B'\left(\Omega - B\Sigma_fB'\right)B\right\|_2 = O_p\left[\max\left(T^{-1/2}, N^{-\alpha/2}\right)\right]$$

$$\left\|\frac{1}{N^2}\hat{B}'\hat{\Psi}\hat{B} - \frac{1}{N^2}B'\Psi B\right\|_2 = O_p\left[\max\left(T^{-1/2}, N^{-\alpha/2}\right)\right]$$

Lemmas Used in the Proofs of the Theorems

Lemma 23. Let $G$ and $H$ be positive definite symmetric $J \times J$ matrices with eigenvalues $g_1 \geqslant g_2 \geqslant \dots \geqslant g_J$ and $h_1 \geqslant h_2 \geqslant \dots \geqslant h_J$. Then

$$\sum_{i=1}^{J}\left(g_i - h_i\right)^2 \leqslant \left\|G - H\right\|_F^2$$

Lemma 24. Under Assumptions 5.1 to 5.7, $\operatorname{tr}\left[\left(\frac{1}{N_{zi}}\Omega_{i0}'\Omega_{i0}\right)^{-1}\right] = O(1)$, where $\Omega_{i0} = E(z_{it}x_{0t}')$.

Lemma 25. Under Assumptions 5.2 and 5.4, $\left\|\frac{1}{\sqrt{N_w}}\left(\Psi_w - \Psi_{w0}\beta\right)\right\|_2 = O\left(N_w^{-\alpha/2}\right)$ and $\max_{1\leqslant j\leqslant N_i}\left\|\frac{1}{\sqrt{N_{zi}}}\left(\Psi_{zi(j)} - \Psi_{zi0}B_{i(j)}'\right)\right\|_2 = O\left(N_{zi}^{-\alpha/2}\right)$, where $B_{i(j)}$ is the $j$th row of $B_i$, $\Psi_w = E(\varepsilon_{wt}\varepsilon_{yt}')$, $\Psi_{w0} = E(\varepsilon_{wt}\varepsilon_{0t}')$, $\Psi_{zi} = E(\varepsilon_{zit}\varepsilon_{it}')$, $\Psi_{zi0} = E(\varepsilon_{zit}\varepsilon_{0t}')$ and $\Psi_{zi(j)}$ is the $j$th column of $\Psi_{zi}$. $\varepsilon_{zit}$ is the $N_{zi} \times 1$ error vector from the factor model of the instrument vector used to estimate $B_i$, and $\varepsilon_{wt}$ is the $N_w \times 1$ error vector from the factor model of the instrument vector $z_{wt}$. $N_i$ is the number of rows in $B_i$.

Lemma 26. Define $u_t = (x_t' \; w_t' \; y_t)'$. Let $u_{pt}$ be a $N_p \times 1$ vector containing a subset of the elements of $u_t$, and let $u_{qt}$ be a $N_q \times 1$ vector defined similarly. Define $S_{pq} = \frac{1}{T}\sum_{t=1}^{T}u_{pt}u_{qt}'$, $\Omega_{pq} = E(S_{pq})$ and $\tilde{\Omega}_{pq} = S_{pq} - \Omega_{pq}$. Then, under Assumptions 5.1, 5.2 and 5.3,

$$E\left\|\tilde{\Omega}_{pq}\right\|_F^2 \leqslant \frac{N_pN_q\gamma}{T}$$

where $0 < \gamma < \infty$ and $\gamma$ is a uniform bound applying to all vectors $u_{pt}$ and $u_{qt}$ as defined above.

Lemma 27. Define $u_t = (x_t' \; w_t' \; y_t)'$. Let $u_{pt}$ be a $N_p \times 1$ vector containing a subset of the elements of $u_t$. Also let $u_{qt}$ be a $N_q \times 1$ vector, and $u_{rt}$ be a $N_r \times 1$ vector, defined similarly. Define $S_{pq} = \frac{1}{T}\sum_{t=1}^{T}u_{pt}u_{qt}'$, $S_{pr} = \frac{1}{T}\sum_{t=1}^{T}u_{pt}u_{rt}'$, $\Omega_{pq} = E(S_{pq})$ and $\Omega_{pr} = E(S_{pr})$. Then, under Assumptions 5.1, 5.2 and 5.3,

$$\left\|S_{pq}'S_{pr} - \Omega_{pq}'\Omega_{pr}\right\|_2 = O_p\left(T^{-1/2}N_pN_q^{1/2}N_r^{1/2}\right)$$

where the bound applies uniformly to all matrices $S_{pq}$ and $S_{pr}$ as defined above.

Lemma 28. Under Assumptions 5.1 to 5.7, $\left\|\left(\frac{1}{N_{zi}}S_{i0}'S_{i0}\right)^{-1}\frac{1}{\sqrt{N_{zi}}}S_{i0}'\right\|_2^2 = O_p(1)$.

Lemma 29. Let $\delta_{it} = x_{it} - B_ix_{0t}$ and $S_{i\delta} = \frac{1}{T}\sum_{t=1}^{T}z_{it}\delta_{it}'$. Under Assumptions 5.1, 5.2, 5.3, 5.5 and 5.9, $\left\|\frac{1}{N}S_{i\delta}\right\|_2 = O_p\left[\max\left(T^{-1/2}, N^{-\alpha/2}\right)\right]$.

Lemma 30. Denote $\tilde{B}_i = \hat{B}_i - B_i$. Under Assumptions 5.1 to 5.7 and 5.9, $\left\|\frac{1}{\sqrt{N_i}}\tilde{B}_i\right\|_2 = O_p\left[\max\left(T^{-1/2}, N^{-\alpha/2}\right)\right]$, where $N_i$ is the number of rows in $B_i$.

Lemma 31. Under Assumptions 5.1 to 5.7 and 5.9, $\left\|\frac{1}{N_f}\hat{B}_f'\hat{B}_f - \frac{1}{N_f}B_f'B_f\right\|_2 = O_p\left[\max\left(T^{-1/2}, N^{-\alpha/2}\right)\right]$.

Proofs of Lemmas

Proof of Lemma 23.

$$\sum_{i=1}^{J}\left(g_i - h_i\right)^2 = \sum_{i=1}^{J}g_i^2 + \sum_{i=1}^{J}h_i^2 - 2\sum_{i=1}^{J}g_ih_i = \operatorname{tr}(G'G) + \operatorname{tr}(H'H) - 2\sum_{i=1}^{J}g_ih_i$$

but $\operatorname{tr}(GH) \leqslant \sum_{i=1}^{J}g_ih_i$ from Marcus (1956), so

$$\sum_{i=1}^{J}\left(g_i - h_i\right)^2 \leqslant \operatorname{tr}(G'G) + \operatorname{tr}(H'H) - 2\operatorname{tr}(GH) = \left\|G - H\right\|_F^2$$
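Lemma 23 is a symmetric-matrix instance of a Hoffman-Wielandt-type eigenvalue perturbation bound, and it is easy to check numerically; the block below does so on randomly generated positive definite matrices (an illustrative verification only, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(1)
for _ in range(100):
    J = 5
    A = rng.standard_normal((J, J))
    G = A @ A.T + np.eye(J)                    # positive definite symmetric
    C = rng.standard_normal((J, J))
    H = C @ C.T + np.eye(J)                    # positive definite symmetric
    g = np.sort(np.linalg.eigvalsh(G))[::-1]   # g_1 >= ... >= g_J
    h = np.sort(np.linalg.eigvalsh(H))[::-1]   # h_1 >= ... >= h_J
    lhs = np.sum((g - h) ** 2)
    rhs = np.linalg.norm(G - H, "fro") ** 2
    assert lhs <= rhs + 1e-8                   # the inequality of Lemma 23
```

Note that the sorted pairing of the eigenvalues is essential: pairing the eigenvalues in any other order can make the left hand side larger than the right.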

Proof of Lemma 24. $\Omega_{i0} = B_{zi}\Sigma_f + \Psi_{zi0}$, where $\Psi_{zi0} = E(\varepsilon_{zit}\varepsilon_{0t}')$ and $\varepsilon_{zit}$ is the error vector corresponding to the instrument vector $z_{it}$. Therefore

$$\left\|\frac{1}{N_{zi}}\Omega_{i0}'\Omega_{i0} - \frac{1}{N_{zi}}\Sigma_fB_{zi}'B_{zi}\Sigma_f\right\|_2 = \left\|\frac{1}{N_{zi}}\Sigma_fB_{zi}'\Psi_{zi0} + \frac{1}{N_{zi}}\Psi_{zi0}'B_{zi}\Sigma_f + \frac{1}{N_{zi}}\Psi_{zi0}'\Psi_{zi0}\right\|_2$$

where $N_{zi}$ is the number of elements in $z_{it}$. However,

$$\left\|\frac{1}{N_{zi}}\Sigma_fB_{zi}'\Psi_{zi0} + \frac{1}{N_{zi}}\Psi_{zi0}'B_{zi}\Sigma_f + \frac{1}{N_{zi}}\Psi_{zi0}'\Psi_{zi0}\right\|_2 \leqslant 2\left\|\Sigma_f\right\|_2\left\|\frac{1}{\sqrt{N_{zi}}}B_{zi}\right\|_2\left\|\frac{1}{\sqrt{N_{zi}}}\Psi_{zi0}\right\|_2 + \left\|\frac{1}{\sqrt{N_{zi}}}\Psi_{zi0}\right\|_2^2 = O\left(N_{zi}^{-\alpha/2}\right)$$

from Assumptions 5.4, 5.5 and 5.6. Therefore

$$\frac{1}{N_{zi}}\Omega_{i0}'\Omega_{i0} = \frac{1}{N_{zi}}\Sigma_fB_{zi}'B_{zi}\Sigma_f + O\left(N_{zi}^{-\alpha/2}\right)$$

Under Assumptions 5.5 and 5.6, $\left(\frac{1}{N_{zi}}\Sigma_fB_{zi}'B_{zi}\Sigma_f\right)^{-1} = O(1)$. It follows that

$$\left(\frac{1}{N_{zi}}\Omega_{i0}'\Omega_{i0}\right)^{-1} = \left(\frac{1}{N_{zi}}\Sigma_fB_{zi}'B_{zi}\Sigma_f\right)^{-1} + O\left(N_{zi}^{-\alpha/2}\right) = O(1)$$

Proof of Lemma 25.

$$\left\|\frac{1}{\sqrt{N_w}}\left(\Psi_w - \Psi_{w0}\beta\right)\right\|_2 \leqslant \left\|\frac{1}{\sqrt{N_w}}\Psi_w\right\|_2 + \left\|\frac{1}{\sqrt{N_w}}\Psi_{w0}\right\|_2\left\|\beta\right\|_2 = O\left(N_w^{-\alpha/2}\right)$$

under Assumption 5.4. Similarly,

$$\max_{1\leqslant j\leqslant N_i}\left\|\frac{1}{\sqrt{N_{zi}}}\left(\Psi_{zi(j)} - \Psi_{zi0}B_{i(j)}'\right)\right\|_2 \leqslant \left\|\frac{1}{\sqrt{N_{zi}}}\Psi_{zi}\right\|_2 + \left\|\frac{1}{\sqrt{N_{zi}}}\Psi_{zi0}\right\|_2\max_{1\leqslant j\leqslant N_i}\left\|B_{i(j)}\right\|_2 = O\left(N_{zi}^{-\alpha/2}\right)$$

under Assumption 5.4 and since $\max_{1\leqslant j\leqslant N_i}\left\|B_{i(j)}\right\|_2 = O(1)$.

Proof of Lemma 26. Using the fact that for random variables $a_1, \dots, a_m$ and $b_1, \dots, b_n$, $\operatorname{cov}\left(\sum_{i=1}^{m}a_i, \sum_{j=1}^{n}b_j\right) = \sum_{i=1}^{m}\sum_{j=1}^{n}\operatorname{cov}(a_i, b_j)$, it is straightforward, but tedious, to show that under Assumption 5.3 there exists a constant $\gamma$ such that

$$\sup_t\sup_N\max_{1\leqslant i,j\leqslant N_u}\sum_{r=0}^{\infty}\left|\operatorname{cov}\left(u_{it}u_{jt},\, u_{i,t-r}u_{j,t-r}\right)\right| < \frac{\gamma}{2} < \infty \qquad (4.20)$$

where $N_u = N + m + 1$. Then

$$E\left(\left[\tilde{\Omega}_{pq}\right]_{ij}^2\right) = E\left[\frac{1}{T}\sum_{t=1}^{T}\left(u_{it}u_{jt} - \sigma_{ij}\right)\right]^2 = \frac{1}{T^2}\operatorname{var}\left(\sum_{t=1}^{T}u_{it}u_{jt}\right) = \frac{1}{T^2}\sum_{t=1}^{T}\sum_{r=1}^{T}\operatorname{cov}\left(u_{it}u_{jt},\, u_{i,t-r}u_{j,t-r}\right)$$

$$\leqslant \frac{2}{T^2}\sum_{t=1}^{T}\sum_{r=0}^{t}\left|\operatorname{cov}\left(u_{it}u_{jt},\, u_{i,t-r}u_{j,t-r}\right)\right| \leqslant \frac{2}{T}\sup_t\sum_{r=0}^{\infty}\left|\operatorname{cov}\left(u_{it}u_{jt},\, u_{i,t-r}u_{j,t-r}\right)\right| \leqslant \frac{2}{T}\sup_t\sup_N\max_{1\leqslant i,j\leqslant N_u}\sum_{r=0}^{\infty}\left|\operatorname{cov}\left(u_{it}u_{jt},\, u_{i,t-r}u_{j,t-r}\right)\right| \leqslant \frac{\gamma}{T}$$

Therefore

$$E\left\|\tilde{\Omega}_{pq}\right\|_F^2 = \sum_{i=1}^{N_p}\sum_{j=1}^{N_q}E\left(\left[\tilde{\Omega}_{pq}\right]_{ij}^2\right) \leqslant \frac{N_pN_q\gamma}{T}$$

Proof of Lemma 27. Define \(\tilde{\Omega}_{pq} = S_{pq} - \Omega_{pq}\) and \(\tilde{\Omega}_{pr} = S_{pr} - \Omega_{pr}\). Then \(E(\tilde{\Omega}_{pq}) = 0\) and \(E(\tilde{\Omega}_{pr}) = 0\). Define \(\theta_{pq} = \sup_{i,j} \left| [\Omega_{pq}]_{ij} \right|\) and \(\theta_{pr} = \sup_{i,j} \left| [\Omega_{pr}]_{ij} \right|\). The existence of the suprema is given by Assumption 5.2.

We have
\[
\left\| S_{pq}' S_{pr} - \Omega_{pq}' \Omega_{pr} \right\|_2
= \left\| \Omega_{pq}' \tilde{\Omega}_{pr} + \tilde{\Omega}_{pq}' \Omega_{pr} + \tilde{\Omega}_{pq}' \tilde{\Omega}_{pr} \right\|_2
\le \left\| \Omega_{pq}' \right\|_2 \left\| \tilde{\Omega}_{pr} \right\|_2 + \left\| \Omega_{pr}' \right\|_2 \left\| \tilde{\Omega}_{pq} \right\|_2 + \left\| \tilde{\Omega}_{pr}' \right\|_2 \left\| \tilde{\Omega}_{pq} \right\|_2.
\]
The terms on the right hand side of this inequality satisfy the following bounds:

• \(\left\| \tilde{\Omega}_{pq} \right\|_2 \le \left\| \tilde{\Omega}_{pq} \right\|_F \le \sqrt{\frac{N_p N_q \gamma}{T}}\) from Lemma 26.

• \(\left\| \tilde{\Omega}_{pr} \right\|_2 \le \left\| \tilde{\Omega}_{pr} \right\|_F \le \sqrt{\frac{N_p N_r \gamma}{T}}\) from Lemma 26.

• \(\left\| \Omega_{pq} \right\|_2 \le \left\| \Omega_{pq} \right\|_F \le \sqrt{N_p N_q}\, \theta_{pq}\).

• \(\left\| \Omega_{pr} \right\|_2 \le \left\| \Omega_{pr} \right\|_F \le \sqrt{N_p N_r}\, \theta_{pr}\).

Combining these bounds proves the lemma.
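The proof repeatedly bounds a spectral norm by a Frobenius norm. This holds because the spectral norm is the largest singular value while the squared Frobenius norm sums all squared singular values; a quick illustrative check (not from the thesis):

```python
import numpy as np

# ||A||_2 <= ||A||_F: the spectral norm is the largest singular value,
# the squared Frobenius norm is the sum of ALL squared singular values.
rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))
s = np.linalg.svd(A, compute_uv=False)
spec = np.linalg.norm(A, 2)        # equals s[0]
frob = np.linalg.norm(A, 'fro')    # equals sqrt(sum(s**2))
assert np.isclose(spec, s[0])
assert np.isclose(frob ** 2, np.sum(s ** 2))
assert spec <= frob
```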

Proof of Lemma 28.
\[
\left\| \left( \frac{1}{N_{zi}} S_{i0}' S_{i0} \right)^{-1} \frac{1}{\sqrt{N_{zi}}} S_{i0}' \right\|_2^2
= \operatorname{maxeig}\left( \left( \frac{1}{N_{zi}} S_{i0}' S_{i0} \right)^{-1} \right)
= \left( \operatorname{mineig}\left( \frac{1}{N_{zi}} S_{i0}' S_{i0} \right) \right)^{-1}.
\]
From Lemma 27,
\[
\frac{1}{N_{zi}} S_{i0}' S_{i0} = \frac{1}{N_{zi}} \Omega_{i0}' \Omega_{i0} + O_p\left( T^{-1/2} \right),
\]
so from Lemma 23,
\[
\operatorname{mineig}\left( \frac{1}{N_{zi}} S_{i0}' S_{i0} \right) = \operatorname{mineig}\left( \frac{1}{N_{zi}} \Omega_{i0}' \Omega_{i0} \right) + O_p\left( T^{-1/2} \right).
\]
It follows from Lemma 24 that
\[
\left( \operatorname{mineig}\left( \frac{1}{N_{zi}} \Omega_{i0}' \Omega_{i0} \right) \right)^{-1}
= \operatorname{maxeig}\left( \left( \frac{1}{N_{zi}} \Omega_{i0}' \Omega_{i0} \right)^{-1} \right)
\le \operatorname{tr}\left( \left( \frac{1}{N_{zi}} \Omega_{i0}' \Omega_{i0} \right)^{-1} \right) = O(1).
\]
Therefore
\[
\left\| \left( \frac{1}{N_{zi}} S_{i0}' S_{i0} \right)^{-1} \frac{1}{\sqrt{N_{zi}}} S_{i0}' \right\|_2^2
= \left( \operatorname{mineig}\left( \frac{1}{N_{zi}} S_{i0}' S_{i0} \right) \right)^{-1}
= \left( \operatorname{mineig}\left( \frac{1}{N_{zi}} \Omega_{i0}' \Omega_{i0} \right) + O_p\left( T^{-1/2} \right) \right)^{-1} = O_p(1).
\]

Proof of Lemma 29.
\[
S_{i\delta} = \frac{1}{T} \sum_{t=1}^T z_{it} x_{it}' - \left( \frac{1}{T} \sum_{t=1}^T z_{it} x_{0t}' \right) B_i'
= S_i - S_{i0} B_i'
= \tilde{\Omega}_i - \tilde{\Omega}_{i0} B_i' + \Omega_i - \Omega_{i0} B_i'.
\]
However, the moment condition that is satisfied by the model is
\[
\Omega_i - \Omega_{i0} B_i' = \Psi_{iz}' - \Psi_{0iz}' B_i'
\]
where \(\Psi_{iz} = E(\varepsilon_{it} \varepsilon_{zit}')\), \(\Psi_{0iz} = E(\varepsilon_{0t} \varepsilon_{zit}')\), and \(\varepsilon_{zit}\) is the \(N_{zi} \times 1\) error vector from the factor structure of the instrument vector \(z_{it}\). Therefore
\[
\frac{1}{N} S_{i\delta} = \frac{1}{N} \Psi_{iz}' - \frac{1}{N} \Psi_{0iz}' B_i' + \frac{1}{N} \tilde{\Omega}_i - \frac{1}{N} \tilde{\Omega}_{i0} B_i'
\]
and consequently
\[
\left\| \frac{1}{N} S_{i\delta} \right\|_2
\le \left\| \frac{1}{N} \Psi_{iz} \right\|_2 + \left\| \frac{1}{\sqrt{N}} B_i \right\|_2 \left\| \frac{1}{\sqrt{N}} \Psi_{0iz} \right\|_2 + \left\| \frac{1}{N} \tilde{\Omega}_i \right\|_2 + \left\| \frac{1}{\sqrt{N}} B_i \right\|_2 \left\| \frac{1}{\sqrt{N}} \tilde{\Omega}_{i0} \right\|_2.
\]
Bounds are now given for each of the terms on the right hand side of this inequality.

• \(\left\| \frac{1}{N} \Psi_{iz} \right\|_2 = O(N^{-\alpha/2})\) by Assumption 5.4 and Assumption 5.9.

• \(\left\| \frac{1}{\sqrt{N}} \Psi_{0iz} \right\|_2 = O(N^{-\alpha/2})\) by Assumption 5.4 and Assumption 5.9.

• \(\left\| \frac{1}{\sqrt{N}} B_i \right\|_2 = O(1)\) by Assumption 5.5 and Assumption 5.9.

• \(\left\| \frac{1}{N} \tilde{\Omega}_i \right\|_2 \le \left\| \frac{1}{N} \tilde{\Omega}_i \right\|_F = O_p\left( T^{-1/2} \right)\) from Lemma 26 and Assumption 5.9.

• \(\left\| \frac{1}{\sqrt{N}} \tilde{\Omega}_{i0} \right\|_2 \le \left\| \frac{1}{\sqrt{N}} \tilde{\Omega}_{i0} \right\|_F = O_p\left( T^{-1/2} \right)\) from Lemma 26 and Assumption 5.9.

Therefore \(\left\| \frac{1}{N} S_{i\delta} \right\|_2 = O(N^{-\alpha/2}) + O_p\left( T^{-1/2} \right)\).

Proof of Lemma 30. Note that \(S_i = S_{i0} B_i' + S_{i\delta}\), where \(S_{i\delta} = \frac{1}{T} \sum_{t=1}^T z_{it} \delta_{it}'\). Since \(\hat{B}_i' = \left( S_{i0}' S_{i0} \right)^{-1} S_{i0}' S_i\), we have
\[
\tilde{B}_i' = \hat{B}_i' - B_i' = \left( S_{i0}' S_{i0} \right)^{-1} S_{i0}' S_{i\delta}.
\]
Therefore
\[
\left\| \frac{1}{\sqrt{N_i}} \tilde{B}_i \right\|_2
\le \left\| \left( \frac{1}{N_{zi}} S_{i0}' S_{i0} \right)^{-1} \frac{1}{\sqrt{N_{zi}}} S_{i0}' \right\|_2 \left\| \frac{1}{\sqrt{N_i N_{zi}}} S_{i\delta} \right\|_2
= O_p\left[ \max\left( T^{-1/2}, N^{-\alpha/2} \right) \right]
\]
from Lemmas 28 and 29 and Assumption 5.9.
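The least-squares step behind this proof can be sketched on simulated data: with \(S_{i0} = Z'X_0/T\) and \(S_i = Z'X/T\), the estimate \(\hat{B}_i' = (S_{i0}'S_{i0})^{-1}S_{i0}'S_i\) should be close to \(B_i\) when the errors are independent of the instruments. All dimensions and distributions below are hypothetical, chosen only to illustrate the algebra; this is not the thesis's full estimator.

```python
import numpy as np

# Illustrative sketch of B_hat' = (S_i0' S_i0)^{-1} S_i0' S_i (assumed setup).
rng = np.random.default_rng(2)
T, k, N_z, N_i = 20000, 2, 5, 8
X0 = rng.standard_normal((T, k))                      # factor/proxy vector x_0t
C = rng.standard_normal((N_z, k))
Z = X0 @ C.T + 0.5 * rng.standard_normal((T, N_z))    # instruments z_it
B = rng.standard_normal((N_i, k))                     # true loadings B_i
X = X0 @ B.T + 0.5 * rng.standard_normal((T, N_i))    # observed variables x_it

S_i0 = Z.T @ X0 / T                                   # N_z x k
S_i = Z.T @ X / T                                     # N_z x N_i
B_hat = np.linalg.solve(S_i0.T @ S_i0, S_i0.T @ S_i).T

assert B_hat.shape == B.shape
assert np.max(np.abs(B_hat - B)) < 0.1                # close for large T
```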

Proof of Lemma 31. Since \(\tilde{B}_i = \hat{B}_i - B_i\), we have
\[
\frac{1}{N_i} \hat{B}_i' \hat{B}_i - \frac{1}{N_i} B_i' B_i
= \frac{1}{N_i} \tilde{B}_i' B_i + \frac{1}{N_i} B_i' \tilde{B}_i + \frac{1}{N_i} \tilde{B}_i' \tilde{B}_i.
\]
Therefore
\[
\left\| \frac{1}{N_i} \hat{B}_i' \hat{B}_i - \frac{1}{N_i} B_i' B_i \right\|_2
\le 2 \left\| \frac{1}{\sqrt{N_i}} \tilde{B}_i \right\|_2 \left\| \frac{1}{\sqrt{N_i}} B_i \right\|_2 + \left\| \frac{1}{\sqrt{N_i}} \tilde{B}_i \right\|_2^2
= O_p\left[ \max\left( T^{-1/2}, N^{-\alpha/2} \right) \right]
\]
from Lemma 30, Assumption 5.5 and Assumption 5.9.
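The expansion used in this proof is a simple matrix identity and is easy to verify numerically; the dimensions below are illustrative only:

```python
import numpy as np

# With B_hat = B + B_tilde:
#   B_hat' B_hat - B' B = B_tilde' B + B' B_tilde + B_tilde' B_tilde.
rng = np.random.default_rng(3)
N_i, k = 7, 2
B = rng.standard_normal((N_i, k))
B_tilde = 0.01 * rng.standard_normal((N_i, k))   # estimation error
B_hat = B + B_tilde

lhs = B_hat.T @ B_hat - B.T @ B
rhs = B_tilde.T @ B + B.T @ B_tilde + B_tilde.T @ B_tilde
assert np.allclose(lhs, rhs)
```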

Appendix 2 Data

Table 4.4: Group 1 variables

GMCQ 5 Personal consumption expend (chained)-total (bil 92$, saar)
GMCDQ 5 Personal consumption expend (chained)-total durables (bil 92$, saar)
GMCNQ 5 Personal consumption expend (chained)-nondurables (bil 92$, saar)
GMCSQ 5 Personal consumption expend (chained)-services (bil 92$, saar)
GMCANQ 5 Personal consumption expend (chained)-new cars (bil 92$, saar)


Table 4.5: Group 2 variables

IP 5 Industrial production: total index (1992=100, sa)
IPP 5 Industrial production: products, total (1992=100, sa)
IPF 5 Industrial production: final products (1992=100, sa)
IPC 5 Industrial production: consumer goods (1992=100, sa)
IPCD 5 Industrial production: durable consumer goods (1992=100, sa)
IPCN 5 Industrial production: nondurable consumer goods (1992=100, sa)
IPE 5 Industrial production: business equipment (1992=100, sa)
IPI 5 Industrial production: intermediate products (1992=100, sa)
IPM 5 Industrial production: materials (1992=100, sa)
IPMD 5 Industrial production: durable goods materials (1992=100, sa)
IPMND 5 Industrial production: nondurable goods materials (1992=100, sa)
IPMFG 5 Industrial production: manufacturing (1992=100, sa)
IPD 5 Industrial production: durable manufacturing (1992=100, sa)
IPN 5 Industrial production: nondurable manufacturing (1992=100, sa)
IPMIN 5 Industrial production: mining (1992=100, sa)
IPUT 5 Industrial production: utilities (1992=100, sa)
IPXMCA 1 Capacity utilization rate: manufacturing, total (% of capacity, sa) (frb)
PMI 5 Purchasing managers' index (sa)
PMP 5 NAPM production index (percent)
GMPYQ 5 Personal income (chained) (series #52) (bil 92$, saar)
GMYXPQ 5 Personal income less transfer payments (chained) (#51) (bil 92$, saar)


Table 4.6: Group 3 variables

LHEL 5 Index of help wanted advertising in newspapers (1967=100, sa)
LHELX 4 Employment: ratio; help wanted ads: no. unemployed clf
LHEM 5 Civilian labor force: employed, total (thousands, sa)
LHNAG 5 Civilian labor force: employed, nonagricultural industries (thsnd, sa)
LHUR 1 Unemployment rate: all workers, 16 years and over (%, sa)
LHU680 1 Unemployed by duration: average (mean) duration in weeks (sa)
LHU5 1 Unemployed by duration: persons unemployed less than 5 wks (thousands, sa)
LHU14 1 Unemployed by duration: persons unemployed 5 to 14 wks (thousands, sa)
LHU15 1 Unemployed by duration: persons unemployed 15 wks (thousands, sa)
LHU26 1 Unemployed by duration: persons unemployed 15 to 26 weeks (thousands, sa)
LPNAG 5 Employees on nonagricultural payrolls: total (thousands, sa)
LP 5 Employees on nonagricultural payrolls: total, private (thousands, sa)
LPGD 5 Employees on nonagricultural payrolls: goods producing (thousands, sa)
LPCC 5 Employees on nonagricultural payrolls: contract construction (thousands, sa)
LPEM 5 Employees on nonagricultural payrolls: manufacturing (thousands, sa)
LPED 5 Employees on nonagricultural payrolls: durable goods (thousands, sa)
LPEN 5 Employees on nonagricultural payrolls: nondurable goods (thousands, sa)
LPSP 5 Employees on nonagricultural payrolls: service producing (thousands, sa)
LPT 5 Employees on nonagricultural payrolls: wholesale and retail trade (thousands, sa)
LPFR 5 Employees on nonagricultural payrolls: fin., ins. and real estate (thsnds, sa)
LPS 5 Employees on nonagricultural payrolls: services (thousands, sa)
LPGOV 5 Employees on nonagricultural payrolls: government (thousands, sa)
LPHRM 1 Average weekly hours of production workers: manufacturing (sa)
LPMOSA 1 Average weekly hours of production workers: manufacturing, overtime hours (sa)
PMEMP 1 NAPM employment index (%)

Table 4.7: Group 4 variables

MSMTQ 5 Manufacturing & trade: total (millions of chained 1992 dollars) (sa)
MSMQ 5 Manufacturing & trade: manufacturing, total (millions of chained 1992 dollars) (sa)
MSDQ 5 Manufacturing & trade: durable goods (millions of chained 1992 dollars) (sa)
MSNQ 5 Manufacturing & trade: nondurable goods (millions of chained 1992 dollars) (sa)
WTQ 5 Merchant wholesalers: total (millions of chained 1992 dollars) (sa)
WTDQ 5 Merchant wholesalers: durable goods (millions of chained 1992 dollars) (sa)
WTNQ 5 Merchant wholesalers: nondurable goods (millions of chained 1992 dollars) (sa)
RTQ 5 Retail trade: total (millions of chained 1992 dollars) (sa)
RTNQ 5 Retail trade: nondurable goods (millions of 1992 dollars) (sa)


Table 4.8: Group 5 variables

HSFR 4 Housing starts: nonfarm (1947-58); total farm and nonfarm (1959-) (thousands, sa)
HSNE 4 Housing starts: northeast (thousands) sa
HSMW 4 Housing starts: midwest (thousands) sa
HSSOU 4 Housing starts: south (thousands) sa
HSWST 4 Housing starts: west (thousands) sa
HSBR 4 Housing authorized: total new private housing units (thousands, saar)
HMOB 4 Mobile homes: manufacturers' shipments (thousands of units, saar)

Table 4.9: Group 6 variables

IVMTQ 5 Manufacturing and trade inventories: total (millions of chained 1992) (sa)
IVMFGQ 5 Inventories, business, mfg (millions of chained 1992) (sa)
IVMFDQ 5 Inventories, business, durables (millions of chained 1992) (sa)
IVMFNQ 5 Inventories, business, nondurables (millions of chained 1992) (sa)
IVWRQ 5 Manufacturing & trade inventories: merchant wholesalers (m$ 1992 chained) (sa)
IVRRQ 5 Manufacturing & trade inventories: retail trade (millions of chained 1992 dollars) (sa)
IVSRQ 2 Ratio for manufacturing and trade: inventory/sales (chained 1992 dollars, sa)
IVSRMQ 2 Ratio for manufacturing and trade: man. inventory/sales (chained m$ 1992, sa)
IVSRWQ 2 Ratio for manufacturing and trade: wholesaler inventory/sales (chained 1992 dollars, sa)
IVSRRQ 2 Ratio for manufacturing and trade: retail trade inventory/sales (chained 1992 dollars, sa)
PMNV 1 NAPM inventories index (percent)


Table 4.10: Group 7 variables

PMNO 1 NAPM new orders index (percent)
PMDEL 1 NAPM vendor deliveries index (percent)
MOCMQ 5 New orders (net)-consumer goods and materials, 1992 dollars (bci)
MDOQ 5 New orders, durable goods industries, 1992 (bci)
MSONDQ 5 New orders, nondefense capital goods, in 1992 dollars (bci)
MO 5 Manufacturing new orders: all manufacturing industries, total (mil$, sa)
MOWU 5 Manufacturing new orders: manufacturing industries with unfilled orders (mil$, sa)
MDO 5 Manufacturing new orders: durable goods industries, total (mil$, sa)
MDUWU 5 Manufacturing new orders: durable goods industries with unfilled orders (mil$, sa)
MNO 5 Manufacturing new orders: nondurable goods industries, total (mil$, sa)
MNOU 5 Manufacturing new orders: nondurable goods ind. with unfilled orders (m$, sa)
MU 5 Manufacturing unfilled orders: all manufacturing industries (m$, sa)
MDU 5 Manufacturing unfilled orders: durable goods industries, total (mil$, sa)
MNU 5 Manufacturing unfilled orders: nondurable goods industries, total (mil$, sa)
MPCON 5 Contracts and orders for plant and equipment (bil$, sa)
MPCONQ 5 Contracts and orders for plant and equipment in 1992 dollars (bci)

Table 4.11: Group 8 variables

FSNCOM 5 NYSE common stock price index: composite (12/31/65 = 50)
FSPCOM 5 S&P's common stock price index: composite (1941-43 = 10)
FSPIN 5 S&P's common stock price index: industrials (1941-43 = 10)
FSPCAP 5 S&P's common stock price index: capital goods (1941-43 = 10)
FSPUT 5 S&P's common stock price index: utilities (1941-43 = 10)
FSDXP 1 S&P's common stock price index: dividend yield (% per annum)
FSPXE 1 S&P's common stock price index: price earnings ratio (%, nsa)

Table 4.12: Group 9 variables

EXRUS 5 United States effective exchange rate (merm) (index no.)
EXRGER 5 Foreign exchange rate: Germany (deutsche mark per US$)
EXRSW 5 Foreign exchange rate: Switzerland (swiss franc per US$)
EXRJAN 5 Foreign exchange rate: Japan (yen per US$)
EXRCAN 5 Foreign exchange rate: Canada (Canadian $ per US$)


Table 4.13: Group 10 variables

FYGT1 2 Interest rate: US treasury const maturities, sec mkt, 1-yr. (% per ann, nsa)
FYGT5 2 Interest rate: US treasury const maturities, sec mkt, 5-yr. (% per ann, nsa)
FYGT10 2 Interest rate: US treasury const maturities, sec mkt, 10-yr. (% per ann, nsa)
FYAAAC 2 Bond yield: Moody's AAA corporate (% per annum)
FYBAAC 2 Bond yield: Moody's BAA corporate (% per annum)
FYFHA 2 Secondary market yield on FHA mortgages (% per annum)
SFYCP90 1 Spread 90 day commercial paper minus federal funds
SFYGM3 1 Spread 3mo treasury bills minus federal funds
SFYGM6 1 Spread 6mo treasury bills minus federal funds
SFYGT1 1 Spread FYGT1 minus federal funds
SFYGT5 1 Spread FYGT5 minus federal funds
SFYGT10 1 Spread FYGT10 minus federal funds
SFYAAAC 1 Spread FYAAAC minus federal funds
SFYBAAC 1 Spread FYBAAC minus federal funds
SFYFHA 1 Spread FYFHA minus federal funds

Table 4.14: Group 11 variables

FM1 6 Money stock: M1 (bil$, sa)
FM2 6 Money stock: M2 (bil$, sa)
FM3 6 Money stock: M3 (bil$, sa)
FM2DQ 5 Money supply-M2 in (1992$) (bci)
FMFBA 6 Monetary base, adj for reserve requirement changes (mil$, sa)
FMRRA 6 Depository institution reserves: total, adj for reserve requirement changes (mil$, sa)
FMRNBC 6 Depository institution reserves: nonborrow+ext cr, adj res req changes (mil$, sa)


Table 4.15: Group 12 variables

PMCP 6 NAPM commodity price index
PWFSA 6 Producer price index: finished goods (82=100, sa)
PWFCSA 6 Producer price index: finished consumer goods (82=100, sa)
PSM99Q 6 Index of sensitive materials prices (1990=100) (bci-99a)
PUNEW 6 CPI-U: all items (82-84=100, sa)
PU83 6 CPI-U: apparel and upkeep (82-84=100, sa)
PU84 6 CPI-U: transportation (82-84=100, sa)
PU85 6 CPI-U: medical care (82-84=100, sa)
PUC 6 CPI-U: commodities (82-84=100, sa)
PUCD 6 CPI-U: durables (82-84=100, sa)
PUS 6 CPI-U: services (82-84=100, sa)
PUXF 6 CPI-U: all items less food (82-84=100, sa)
PUXHS 6 CPI-U: all items less shelter (82-84=100, sa)
PUXM 6 CPI-U: all items less medical care (82-84=100, sa)
PUXX 6 CPI-U: all items less food and energy (82-84=100, sa)
GMDC 6 PCE, impl pr defl. PCE (1987=100)
GMDCD 6 PCE, impl pr defl. PCE; durables (1987=100)
GMDCN 6 PCE, impl pr defl. PCE; nondurables (1987=100)
GMDCS 6 PCE, impl pr defl. PCE; services (1987=100)

Table 4.16: Group 13 variables

LEHCC 6 Average hourly earnings of construction workers: construction ($, sa)
LEHM 6 Average hourly earnings of production workers: manufacturing ($, sa)

Table 4.17: Group 14 variables

HHSNTN 1 U. Michigan index of consumer expectations (bcd-83)


Chapter 5

Conclusions

5.1 The motivation for the research

For the most part, economics is an observational science. Researchers in experimental disciplines often generate their own data. In contrast, economists usually have to make do with what is available. The standard techniques for analysing macroeconomic time series are mostly variations on the basic vector autoregression model. Most of these techniques are not suitable for modelling more than a handful of variables at a time. Consequently, in macroeconomics, `gathering more data' usually means waiting for the passage of time. Since the period of the business cycle is probably a few years, it can be a long wait. For the industrialised economies, data on hundreds of economic variables are now regularly collected and published by the statistical agencies. In contrast to the restrictions imposed by contemporary econometric methods, policy macroeconomists often informally analyse a wide range of available data in order to make judgements about the state of the economy. This practice implies a belief that the broad range of economic variables, most of which must be omitted from formal analyses, contain useful information about the state of the economy. If this is true, then the development of formal econometric techniques which are capable of modelling more variables than is feasible with vector autoregressions should be a fruitful area of research.

The research presented in this thesis focuses on the development of techniques for estimating factor models of economic time series. Factor models are attractive in economics since the notion that a wide range of economic variables are affected by a small number of possibly unobservable factors is usually uncontroversial, particularly in fields such as business cycle theory and finance. Furthermore, the fact that factor analysis has been such a successful tool in the analysis of independently sampled multivariate data encourages the belief that factor models of time series will prove to be a useful empirical tool for applied economists.

5.2 The findings of the research

A number of theoretical and methodological contributions are made in this

thesis.

5.2.1 Dynamic factor analysis

In Chapter 2 the dynamic factor model with mutually uncorrelated autoregressive factors is derived as a particular realisation of a VARMA model of reduced spectral rank observed subject to noise. It is shown (Proposition 1) that this representation corresponds to a minimal dimension state space representation of the VARMA plus noise model in cases where the lowest common denominator polynomials of each of the columns of the VARMA filter have no common polynomial factors. When common polynomial factors exist, the dynamic factor model is not equivalent to a minimal dimension state space representation.

Identification is also considered for a fairly general class of dynamic factor model. Theorem 2.2.1 shows that the error spectrum of the dynamic factor model is identified under some rank assumptions on the factor filter matrix β(L). Theorem 2.2.2 shows that the number of factors is identified under the same condition. These theorems are written for the dynamic factor model; however, the proofs do not rely on the existence of factor structure. Rather, the essential requirement is that the observable variable is the sum of a component with spectral rank k and a component with a diagonal spectrum. The proof is based on the rank of submatrices of this first component so, provided that the appropriate rank conditions hold, the result applies generally. Theorem 2.2.3 shows that β(L) and the factor spectra are identified, up to rescaling and sign changes of factors, if it is possible to order the variables in x_t such that the first k rows of β(L) form a lower triangular polynomial matrix. This generalises a well-known condition for identification of static factor models to a dynamic setting. Finally, Theorem 2.2.4 shows that under fairly general conditions, zero restrictions of this type are not necessary for identification. The key assumption here is that the factor spectra are linearly independent functions. This assumption is satisfied by factors which follow autoregressive processes (provided that no pair of factors follows the same autoregressive process). It is also true for ARMA factors provided certain `no cancelling out' conditions are satisfied. Since it holds for ARMA factors, it also holds for models which have factors which follow the Markov-switching process of Hamilton (1989).
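The linear-independence condition on factor spectra can be made concrete for AR(1) factors. The sketch below (my own illustration, with arbitrary coefficient values) evaluates AR(1) spectra on a frequency grid: distinct coefficients give linearly independent spectra, while a shared coefficient gives proportional ones.

```python
import numpy as np

# AR(1) spectral density: f(w) = sigma^2 / (2*pi*|1 - phi*exp(-iw)|^2).
w = np.linspace(0.0, np.pi, 200)

def ar1_spectrum(phi, sigma2=1.0):
    return sigma2 / (2 * np.pi * np.abs(1 - phi * np.exp(-1j * w)) ** 2)

# Three distinct AR(1) coefficients: spectra are linearly independent.
F = np.stack([ar1_spectrum(phi) for phi in (0.3, 0.6, 0.9)])
assert np.linalg.matrix_rank(F) == 3

# Same coefficient, different innovation variances: proportional spectra.
G = np.stack([ar1_spectrum(0.5, 1.0), ar1_spectrum(0.5, 2.0)])
assert np.linalg.matrix_rank(G) == 1
```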

In Section 2.3, a frequency domain approach was proposed for the estimation of dynamic factor models. A simulation exercise (in Section 2.4) suggests that this method has some computational advantage over the state space scoring algorithm which is usually used for dynamic factor model estimation. Large models with rich dynamic structures remain difficult to estimate, as they are with the traditional state space scoring algorithm. However, the main attraction of the frequency domain approach is the relative ease with which a general algorithm can be coded. The existing time domain algorithms for the estimation of dynamic factor models require the construction of a state space representation of the model. For factor models with few lags, this is trivial. However, for more complicated lag structures, and particularly for ARMA dynamics, this task becomes more complex, and the construction of a general algorithm which can handle any specification of model orders is complicated. As was shown in Section 2.3, in the frequency domain a general expression for the covariance matrix can be written (Equation (2.5)) which makes the evaluation of the likelihood relatively easy to code.
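The ease of assembling a model-implied spectrum can be sketched for a one-factor model with an AR(1) factor and white-noise errors. This is a deliberately simplified stand-in for the role of Equation (2.5), with hypothetical parameter values, not the thesis's exact formula.

```python
import numpy as np

# For x_t = b f_t + e_t with AR(1) factor f_t = phi f_{t-1} + u_t and
# white-noise errors: F_x(w) = b b' f_f(w) + diag(psi)/(2*pi).
N, T, phi, sigma_u2 = 4, 64, 0.7, 1.0
b = np.array([1.0, 0.8, -0.5, 0.3])             # factor loadings (assumed)
psi = np.array([0.4, 0.3, 0.5, 0.2])            # error variances (assumed)
freqs = 2 * np.pi * np.arange(T // 2 + 1) / T   # Fourier frequencies

def spectrum(w):
    f_f = sigma_u2 / (2 * np.pi * np.abs(1 - phi * np.exp(-1j * w)) ** 2)
    return np.outer(b, b) * f_f + np.diag(psi) / (2 * np.pi)

F = [spectrum(w) for w in freqs]
assert len(F) == 33
assert all(np.allclose(Fw, Fw.conj().T) for Fw in F)        # Hermitian
assert all(np.linalg.eigvalsh(Fw).min() > 0 for Fw in F)    # positive definite
```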

5.2.2 Approximate factor models

Most of the recent research in time series factor models has been concerned with the use of principal component methods to estimate factor models of economic time series in a setting in which the number of variables in the model and the number of observations go to infinity jointly. An important feature of this literature is that it does not assume that the errors of the model have a diagonal covariance matrix, but rather assumes an `approximate' factor structure in which the row sums of the absolute values of the error covariance matrix are subject to a fixed bound as the number of variables increases. It has been argued (e.g. by Bai (2003)) that this is a much more realistic assumption in applications with a large number of variables. It was argued in Chapter 3 of this thesis that the `approximate' factor model might still be too restrictive for many economic applications. In most of the applications of this technique that have appeared in the literature, the variables in the model belong to a relatively small set of groups. Since variables within a group tend to be quite similar (e.g. they might be price indexes for different classes of consumer goods), it is possible that a non-trivial amount of cross-correlation exists between the errors of variables that belong to the same group. Applications that have particularly large numbers of variables tend to still have a small number of groups. Therefore, it might be argued that the number of variables is being increased by increasing the number of variables in a finite set of groups. In such cases, it is quite possible that the absolute row sums of the error covariance would grow without bound as the number of variables grows, violating the assumptions of the approximate factor literature.
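The concern is easy to make concrete: if within-group error correlation stays fixed while group sizes grow, the absolute row sums of the error covariance grow linearly in N. A sketch with hypothetical numbers (four groups, within-group correlation 0.3):

```python
import numpy as np

# Block-diagonal error covariance: G groups of size N/G, within-group
# correlation rho, none across groups. Each absolute row sum equals
# 1 + rho*(group size - 1), so it grows linearly in N with G fixed.
def max_abs_row_sum(N, G=4, rho=0.3):
    size = N // G
    block = rho * np.ones((size, size)) + (1 - rho) * np.eye(size)
    Psi = np.kron(np.eye(G), block)
    return np.max(np.sum(np.abs(Psi), axis=1))

r100, r200 = max_abs_row_sum(100), max_abs_row_sum(200)
assert np.isclose(r100, 1 + 0.3 * 24)   # group size 25
assert np.isclose(r200, 1 + 0.3 * 49)   # group size 50
```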

In Chapter 3 it is shown that the principal components estimator is still consistent for the factor model when the absolute row sums of the error covariance matrix are unbounded as N grows (Theorem 3.1.4). However, the rates of convergence achieved depend on the rate of growth of cross-correlation in the error covariance matrix. Consequently, it is possible that the principal components estimator could perform poorly in some situations. It is also shown that sample principal components are consistent estimators of population principal components in a setting in which (N, T) → (∞, ∞), provided that a `gap' condition is satisfied whereby the first k eigenvalues of the covariance matrix of x_t diverge from all other eigenvalues at a rate of N (Theorem 3.1.3). Consequently, even in cases where the cross-correlation of the errors is growing rapidly, the sample principal components may still be good estimates of the population principal components. Theorem 3.1.1 presents a set of finite-sample/variables bounds linking population principal components to population factors. By avoiding sampling issues and asymptotic arguments, these bounds give a clear view of the conditions under which population factors and population principal components are likely to be `close'. In particular, they suggest that what matters for principal components to estimate factors well is not the number of variables per se, but rather the magnitude of the noise-to-signal ratio, which is defined as ρ = σ²/λ_k, where σ² is the largest eigenvalue of the error covariance matrix Ψ, and λ_k is the kth eigenvalue of Ω = E((1/T)X'X). When the noise-to-signal ratio is small, population principal component and population factor quantities will be similar. In Section 3.2 a `fixed-N' hypothesis test for the magnitude of the noise-to-signal ratio is proposed. While the asymptotic framework in which this test is developed is not entirely satisfactory, it represents a first attempt to make inferences about the noise-to-signal ratio in large factor models, and provides some ideas which may form the basis of future research.
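The noise-to-signal ratio ρ = σ²/λ_k is straightforward to compute for a given population model. The sketch below uses illustrative numbers (Σ_f = I, diagonal Ψ, Gaussian loadings; none of these come from the thesis) and shows ρ shrinking as variables with pervasive loadings are added.

```python
import numpy as np

# rho = sigma^2 / lambda_k: sigma^2 is the largest eigenvalue of the error
# covariance Psi, lambda_k the k-th largest eigenvalue of Omega = B B' + Psi.
rng = np.random.default_rng(4)
k = 2

def noise_to_signal(N):
    B = rng.standard_normal((N, k))           # pervasive loadings (assumed)
    Psi = np.diag(rng.uniform(0.5, 1.5, N))   # diagonal error covariance
    Omega = B @ B.T + Psi                     # Sigma_f = I for the sketch
    lam = np.sort(np.linalg.eigvalsh(Omega))[::-1]
    return np.max(np.diag(Psi)) / lam[k - 1]

r50, r400 = noise_to_signal(50), noise_to_signal(400)
assert r400 < r50    # adding variables shrinks the noise-to-signal ratio
assert r400 < 0.05
```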


5.2.3 The grouped variable approximate factor model

In Chapter 4 a new factor model, named the grouped variable approximate factor model, was proposed. In the grouped variable approximate factor model the error covariance has a block structure, where the blocks correspond to the variable groups. The off-diagonal blocks are subject to a weak correlation restriction: specifically, the largest of the singular values of the off-diagonal blocks must grow at a rate strictly less than N^{1/2}. No restriction is placed on the correlation structure of the blocks that lie on the diagonal. In Section 4.2 an approximate instrumental variables estimator is proposed for the grouped variable factor model. This estimator is simple to compute, requiring only matrix multiplication and the inversion of a k × k matrix, where k is the number of factors. In Section 4.3 consistency is proved for the approximate instrumental variables estimator in a framework in which (N, T) → (∞, ∞) jointly (Theorem 4.3.1). Importantly, the rates of convergence do not depend on the correlation in the diagonal blocks (i.e. the correlation between the errors of variables that belong to the same group). What matters is the rate of growth of correlation in the off-diagonal blocks. Consequently, if it is possible to arrange the variables into groups such that most of the error cross-correlation occurs between variables in the same group, then the approximate instrumental variables estimator will provide better rates of convergence than the principal components estimator.
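The asymmetry between diagonal and off-diagonal blocks can be sketched with a two-group error covariance built from hypothetical numbers (strong equicorrelation within groups, weak noise between them):

```python
import numpy as np

# Diagonal blocks unrestricted; off-diagonal blocks must have small largest
# singular value. Two groups of n variables each (illustrative values).
rng = np.random.default_rng(5)
n = 50
D = 0.3 * np.ones((n, n)) + 0.7 * np.eye(n)   # strong within-group correlation
O = 0.01 * rng.standard_normal((n, n))        # weak between-group correlation
Psi = np.block([[D, O], [O.T, D]])            # full error covariance

sv_diag = np.linalg.norm(D, 2)   # largest singular value of a diagonal block
sv_off = np.linalg.norm(O, 2)    # largest singular value of the off-diagonal block
assert sv_diag > 10              # within-group correlation may be very strong
assert sv_off < 1.0              # only the off-diagonal blocks are restricted
```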


5.3 Future research

5.3.1 Dynamic factor analysis

Since there exist conditions under which the dynamic factor model with mutually uncorrelated autoregressive factors does not correspond to a minimal dimension state space representation of the reduced spectral rank VARMA plus noise model, a useful task for future research is to devise models which do. Minimal dimension state space theory would clearly be an important component of the analysis of any such model. Modifications of Theorems 2.2.1 and 2.2.2 could be used to identify the spectra of the errors and the spectral rank of the VARMA component of the process. Estimation, however, is likely to be a challenge.

Bloch (1989) provides a relationship between the dynamic errors-in-variables model and the dynamic factor model. It would be interesting to see whether, using this relationship, the findings in Chapter 2 could be used to gain any new insights into the dynamic errors-in-variables model.

Perhaps the most pressing need in the field of dynamic factor analysis is for computationally efficient estimation algorithms. Dynamic factor models with rich lag structures may be written in state space form by stacking factors and their lags into the state vector, and augmenting it by the vector of state variables from the error processes. Unfortunately, this usually results in a noise-free measurement equation, which does not lend itself to the EM algorithm. While the scoring algorithm will work with such models, it tends to be very slow and often fails to converge. Subspace algorithms offer some hope here. However, in their current form it is not clear how the restrictions implied by the stacking of the state vector can be reflected in the estimation procedure. Resolving this issue would be an important contribution to the field of dynamic factor analysis.

5.3.2 Approximate factor models

The theorems about the principal components estimator presented in Chapter 3 deal with first order convergence only. Second order results which gave convergence in distribution would also be useful. In particular, an investigation of the asymptotic distribution of the eigenvalues would be useful, since it may lead to the development of hypothesis test procedures for the noise-to-signal ratio. One obvious line of inquiry here would be to develop results in the framework of Random Matrix Theory. However, currently Random Matrix Theory applies only to serially independent vectors, and so is not applicable to problems in time series econometrics. Consequently, it is unlikely that research in this direction will be straightforward.
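A flavour of the Random Matrix Theory results referred to here: for iid data with identity population covariance, the sample covariance eigenvalues do not concentrate at 1 when N/T is non-negligible, but spread over roughly the Marchenko-Pastur support. A seeded, illustrative simulation (my own, not from the thesis):

```python
import numpy as np

# iid N(0,1) data: population covariance is I, yet the sample covariance
# eigenvalues spread over roughly [(1-sqrt(N/T))^2, (1+sqrt(N/T))^2].
rng = np.random.default_rng(6)
N, T = 200, 800                   # N/T = 0.25, support approx [0.25, 2.25]
X = rng.standard_normal((T, N))   # serially independent rows
eig = np.linalg.eigvalsh(X.T @ X / T)

assert 2.0 < eig.max() < 2.5      # far above the population eigenvalue 1
assert eig.min() < 0.5
```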

5.3.3 The grouped variable approximate factor model

Chapter 4 presents theorems which describe first order convergence only. Results on convergence in distribution would also be useful. These might allow the construction of test procedures for the number of factors in the model, and may also provide some guidance about the appropriate choice of proxies and factors in the construction of the approximate instrumental variables estimator.

Another contribution that could be made by future research is empirical. The applied literature which uses the principal components technique provides mixed results. In some cases, the principal components estimator of the factor model is shown to perform well in forecasting exercises. In other cases, the use of the estimated factors is of little benefit compared with standard univariate forecasting methods. It would be of interest to estimate the lower bound on the noise-to-signal ratio in each of these cases to see whether it is higher in cases where the principal components approach performs relatively poorly. It would also be of interest to re-estimate many of the models that appear in the literature using the approximate instrumental variables estimator. Unfortunately, while it is now possible to download from authors' websites working papers detailing the estimation of factor models for many of the industrialised economies, the data sets used in these studies are not so easily available. Constructing large data sets is time-consuming work, even when all the data are publicly available. While perhaps lacking the challenge of theoretical and empirical research, a great contribution could be made to this field by the electronic publication of a collection of large macroeconomic data sets, as used in the published literature on large factor models, with consistent formatting and with the usual transformations, so that the existing empirical work may be easily replicated and extended.


Bibliography

Altissimo, F., Bassanetti, A., Cristadoro, R., Forni, M., Hallin, M., Lippi,

M., Reichlin, L. and Veronese, G. (2001), `Eurocoin: A real time coincident

indicator of the euro area business cycle', CEPR Discussion Papers: 3108 .

Altissimo, F., Cristadoro, R., Forni, M., Lippi, M. and Veronese, G. (2006),

`New eurocoin: Tracking economic growth in real time', C.E.P.R. Discussion

Papers, CEPR Discussion Papers: 5633 .

Altug, S. (1989), `Time-to-build and aggregate uctuations: Some new evi-

dence', International Economic Review 30, 889920.

Amengual, D. and Watson, M. W. (2007), `Consistent estimation of the num-

ber of dynamic factors in a large n and t panel', Journal of Business and

Economic Statistics 25(1), 9196.

Anderson, T. W. (1963), `Asymptotic theory for principal component analysis',

Annals of mathematical statistics 34, 122148.

Anderson, T. W. and Rubin, H. (1956), `Statistical inference in factor analy-

sis', Third Berkeley Symposium on Mathematical Statistics and Probability

5, 111150.

240

Page 249: Factor Analysis of High Dimensional Time Series

Angelini, E., Henry, J. and Mestre, R. (2001), `Diffusion index-based inflation forecasts for the euro area', Working Paper Series: European Central Bank (061).

Artis, M. J., Banerjee, A. and Marcellino, M. (2005), `Factor forecasts for the UK', Journal of Forecasting 24(4), 279–298.

Attias, H. (1999), `Independent factor analysis', Neural Computation 11(4), 803–851.

Bai, J. (2003), `Inferential theory for factor models of large dimensions', Econometrica 71(1), 135–171.

Bai, J. and Ng, S. (2002), `Determining the number of factors in approximate factor models', Econometrica 70(1), 191–221.

Bai, J. and Ng, S. (2006), `Confidence intervals for diffusion index forecasts and inference for factor-augmented regressions', Econometrica 74(4), 1133–1150.

Bai, J. and Ng, S. (2007a), `Determining the number of primitive shocks in factor models', Journal of Business and Economic Statistics 25(1), 52–60.

Bai, J. and Ng, S. (2007b), `Instrumental variable estimation in a data rich environment', mimeo.

Baik, J. and Silverstein, J. W. (2006), `Eigenvalues of large sample covariance matrices of spiked population models', J. Multivar. Anal. 97(6), 1382–1408.

Bandt, O. D., Michaux, E., Bruneau, C. and Flageollet, A. (2007), `Forecasting inflation using economic indicators: the case of France', Journal of Forecasting 26(1), 1–22.


Banerjee, A. and Marcellino, M. (2006), `Are there any reliable leading indicators for US inflation and GDP growth?', International Journal of Forecasting 22(1), 137–151.

Barnett, S. (1980), Matrices in Control Theory, Van Nostrand Reinhold Company, New York.

Bauer, D. (1998), Some asymptotic theory for the estimation of linear systems using maximum likelihood methods or subspace algorithms, PhD thesis, TU Wien.

Bentler, P. M. and Kano, Y. (1990), `On the equivalence of factors and components', Multivariate Behavioral Research 25(1), 67–74.

Bernanke, B. S. and Boivin, J. (2003), `Monetary policy in a data-rich environment', Journal of Monetary Economics 50, 525–546.

Bernanke, B. S., Boivin, J. and Eliasz, P. (2005), `Measuring the effects of monetary policy: A factor-augmented vector autoregressive (FAVAR) approach', Quarterly Journal of Economics 120, 387–422.

Beyer, A., Farmer, R. E. A., Henry, J. and Marcellino, M. (2005), Factor analysis in a New-Keynesian model, Working Paper Series 510, European Central Bank.

Bloch, A. M. (1989), `Identification and estimation of dynamic errors-in-variables models', Journal of Econometrics 41, 145–158.

Boivin, J. and Ng, S. (2006), `Are more data always better for factor analysis?', Journal of Econometrics 132, 169–194.


Box, G. E. P. and Tiao, G. C. (1977), `A canonical analysis of multiple time series', Biometrika 64, 355–365.

Breitung, J. and Eickmeier, S. (2005), Dynamic factor models, Discussion Paper Series 1: Economic Studies 2005,38, Deutsche Bundesbank, Research Centre.

Brillinger, D. (1975), Time Series: Data Analysis and Theory, Holt, Rinehart and Winston.

Brisson, M., Campbell, B. and Galbraith, J. W. (2003), `Forecasting some low-predictability time series using diffusion indices', Journal of Forecasting 22, 515–531.

Camacho, M. and Sancho, I. (2003), `Spanish diffusion indexes', Spanish Economic Review 5, 173–203.

Camba-Mendez, G., Kapetanios, G., Smith, R. J. and Weale, M. R. (2001), `An automatic leading indicator of economic activity: Forecasting GDP growth for European countries', The Econometrics Journal 4(1), S56–S90.

Cardoso, J.-F. (1998), `Blind signal separation: statistical principles', Proceedings of the IEEE 9, 2009–2025.

Carter, R. L. and Fuller, W. A. (1980), `Instrumental variable estimation of the simple errors-in-variables model', Journal of the American Statistical Association 75, 687–692.

Chamberlain, G. and Rothschild, M. (1983), `Arbitrage, factor structure, and mean-variance analysis on large asset markets', Econometrica 51(5), 1281–1304.

Chauvet, M. (1998), `An econometric characterization of business cycle dynamics with factor structure and regime switching', International Economic Review 39(4), 969–996.

Chauvet, M., Juhn, C. and Potter, S. (2002), `Markov switching in disaggregate unemployment rates', Empirical Economics 27, 205–232.

Chen, A. and Bickel, P. J. (2006), `Efficient independent component analysis', Annals of Statistics 34, 2825–2855.

Cheng, D. C. and Iglarsh, H. J. (1976), `Principal component estimators in regression analysis', The Review of Economics and Statistics 58, 229–234.

Comon, P. (1994), `Independent component analysis, a new concept?', Signal Processing 36, 287–314.

Cragg, J. G. (1997), `Using higher moments to estimate the simple errors-in-variables model', RAND Journal of Economics 28, S71–91.

Cristadoro, R., Forni, M., Reichlin, L. and Veronese, G. (2005), `A core inflation indicator for the euro area', Journal of Money, Credit and Banking 37, 539–560.

Dagenais, M. G. and Dagenais, D. L. (1997), `Higher moment estimators for linear regression models with errors in the variables', Journal of Econometrics 76, 193.


D'Agostino, A. and Giannone, D. (2006), Comparing alternative predictors based on large-panel factor models, Working Paper Series 680, European Central Bank.

Deistler, M. and Anderson, B. D. O. (1989), `Linear dynamic errors-in-variables models: Some structure theory', Journal of Econometrics 41, 39–63.

Deistler, M., Peternell, K. and Scherrer, W. (1995), `Consistency and relative efficiency of subspace methods', Automatica 31, 1865–1875.

Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977), `Maximum likelihood from incomplete data via the EM algorithm', Journal of the Royal Statistical Society. Series B (Methodological) 39, 1–38.

den Reijer, A. (2005), Forecasting Dutch GDP using large scale factor models, DNB Working Papers 028, Netherlands Central Bank, Research Department.

Diebold, F. X. and Nerlove, M. (1989), `The dynamics of exchange rate volatility: A multivariate latent factor ARCH model', Journal of Applied Econometrics 4(1), 1–21.

Dungey, M., Martin, V. L. and Pagan, A. R. (2000), `A multivariate latent factor decomposition of international bond yield spreads', Journal of Applied Econometrics 15, 697–715.

Eickmeier, S. and Ziegler, C. (2006), How good are dynamic factor models at forecasting output and inflation? A meta-analytic approach, Discussion Paper Series 1: Economic Studies 2006,42, Deutsche Bundesbank, Research Centre.

Eklund, J. and Karlsson, S. (2007), An embarrassment of riches: Forecasting using large panels, Working Papers 2007:1, Örebro University, Department of Business, Economics, Statistics and Informatics.

Engle, R. F., Lilien, D. M. and Watson, M. (1985), `A DYMIMIC model of housing price determination', Journal of Econometrics 28, 307–326.

Engle, R. F. and Watson, M. W. (1981), `A one-factor multivariate time series model of metropolitan wage rates', Journal of the American Statistical Association 76(376), 774–781.

Erickson, T. and Whited, T. M. (2002), `Two-step GMM estimation of the errors-in-variables model using high order moments', Econometric Theory 18, 776–799.

Favero, C. A., Marcellino, M. and Neglia, F. (2005), `Principal components at work: The empirical analysis of monetary policy with large data sets', Journal of Applied Econometrics 20, 602–620.

Favero, C. A., Ricchi, O. and Tegami, C. (2004), `Forecasting Italian inflation with large datasets and many models', IGIER Working Paper No. 269.

Fernández-Macho, F. J. (1997), `A dynamic factor model for economic time series', Kybernetika 33(6), 583–606.

Ferreira, R. T., Bierens, H. and Castelar, I. (2005), `Forecasting quarterly Brazilian GDP growth rate with linear and nonlinear diffusion index models', Economia 6(3), 261–292.

Flexer, A., Bauer, H., Prip, J. and Dorner, G. (2005), `Using ICA for removal of ocular artifacts in EEG recorded from blind subjects', Neural Networks 18(7), 998–1005.

Forni, M., Hallin, M., Lippi, M. and Reichlin, L. (2000), `The generalized dynamic-factor model: Identification and estimation', Review of Economics and Statistics 82(4), 540–554.

Forni, M., Hallin, M., Lippi, M. and Reichlin, L. (2003), `Do financial variables help forecasting inflation and real activity in the euro area?', Journal of Monetary Economics 50, 1243–1255.

Forni, M., Hallin, M., Lippi, M. and Reichlin, L. (2004), `The generalized dynamic factor model: consistency and rates', Journal of Econometrics 119, 231–255.

Forni, M., Hallin, M., Lippi, M. and Reichlin, L. (2005), `The generalized dynamic factor model: One-sided estimation and forecasting', Journal of the American Statistical Association 100, 830–840.

Gavin, W. T. and Kliesen, K. L. (2006), `Forecasting inflation and output: comparing data-rich models with simple rules', Federal Reserve Bank of St. Louis, Working Paper 54A.

Geary, R. (1942), `Inherent relations between random variables', Proceedings of the Royal Irish Academy 47, 6.

Geman, S. (1980), `A limit theorem for the norm of random matrices', Annals of Probability 8, 252–261.

Geweke, J. F. (1977), The dynamic factor analysis of economic time-series models, in D. J. Aigner and A. S. Goldberger, eds, `Latent Variables in Socioeconomic Models', North Holland, Amsterdam.

Geweke, J. F. and Singleton, K. J. (1981), `Maximum likelihood 'confirmatory' factor analysis of economic time series', International Economic Review 22(1), 37–54.

Giacomini, R. and White, H. (2003), `Tests of conditional predictive ability', University of California, San Diego.

Giannone, D. and Matheson, T. (2006), A new core inflation indicator for New Zealand, Reserve Bank of New Zealand Discussion Paper Series DP2006/02, Reserve Bank of New Zealand.

Giannone, D., Reichlin, L. and Sala, L. (2006), `VARs, common factors and the empirical validation of equilibrium business cycle models', Journal of Econometrics 132, 257–279.

Gill, R. D. (1977), Consistency of maximum likelihood estimators of the factor analysis model, when the observations are not multivariate normally distributed, in J. B. Barra, F. Brodeau, G. Romier and B. Van Cutsem, eds, `Recent Developments in Statistics', North-Holland, Amsterdam.

Gillitzer, C. and Kearns, J. (2007), Forecasting with factors: The accuracy of timeliness, RBA Research Discussion Papers rdp2007-03, Reserve Bank of Australia.

Gillitzer, C., Kearns, J. and Richards, A. (2005), The Australian business cycle: A coincident indicator approach, RBA Research Discussion Papers rdp2005-07, Reserve Bank of Australia.

Gorsuch, R. L. (1983), Factor Analysis, Hillsdale, N. J.: L. Erlbaum Associates.

Gourieroux, C., Renault, E. and Touzi, N. (1993), `Indirect inference', Journal of Applied Econometrics 8, S85–S118.

Gregory, A., Head, A. and Raynauld, J. (1997), `Measuring world business cycles', International Economic Review 38(3), 677–701.

Guntermann, K. L. and Norrbin, S. C. (1991), `Empirical tests of real estate market efficiency', Journal of Real Estate Finance and Economics 4(3), 297–313.

Hamilton, J. D. (1989), `A new approach to the economic analysis of nonstationary time series and the business cycle', Econometrica 57(2), 357–384.

Hannan, E. J. and Deistler, M. (1986), The Statistical Theory of Linear Systems, Wiley, New York.

Hansen, L. and Sargent, T. (1990), Two difficulties in interpreting vector autoregressions, in L. Hansen and T. Sargent, eds, `Rational Expectations Econometrics', Westview Press: London.

Harris, D. and Martin, V. L. (1998), `Indirect estimation of dynamic factor models of the business cycle', mimeo.

Heaton, C. and Oslington, P. (2002), `The contribution of structural shocks to Australian unemployment', Economic Record 78(243), 433–442.

Helbling, T. and Bayoumi, T. (2003), `Are they all in the same boat? The 2000-2001 growth slowdown and the G-7 business cycle linkages', IMF Working Paper No. 03/46.

Hägglund, G. (1982), `Factor analysis by instrumental variables methods', Psychometrika 47(2), 209–222.

Hosseini, S., Jutten, C. and Pham, D. T. (2003), `Markovian source separation', IEEE Transactions on Signal Processing 51, 3009–3019.

Hotelling, H. (1933), `Analysis of a complex of statistical variables into principal components', Journal of Educational Psychology 24, 417–441, 498–520.

Hotelling, H. (1936), `Relations between two sets of variates', Biometrika 28, 312–377.

Hyvärinen, A., Karhunen, J. and Oja, E. (2001), Independent Component Analysis, Wiley, New York.

Ihara, M. and Kano, Y. (1986), `A new estimator of the uniqueness in factor analysis', Psychometrika 51(4), 563–566.

Inklaar, R. J., Jacobs, J. and Romp, W. (2003), `Business cycle indexes: does a heap of data help?', University of Groningen.

Jennrich, R. I. (1986), `A Gauss-Newton algorithm for exploratory factor analysis', Psychometrika 51(2), 277–284.

Johansen, S. (1988), `Statistical analysis of cointegration vectors', Journal of Economic Dynamics and Control 12(2/3), 231–254.

Johnson, N. L. and Kotz, S. (1972), Distributions in Statistics: Continuous Multivariate Distributions, Wiley, New York.

Johnstone, I. M. (2001), `On the distribution of the largest eigenvalue in principal components analysis', Annals of Statistics 29, 295–327.

Jöreskog, K. G. (1967), `Some contributions to maximum likelihood factor analysis', Psychometrika 32, 443–482.

Jöreskog, K. and Goldberger, A. (1972), `Factor analysis by generalized least squares', Psychometrika 37(3), 243–260.

Kailath, T. (1980), Linear Systems, Prentice-Hall, New Jersey.

Kaiser, H. F. (1958), `The varimax criterion for analytic rotation in factor analysis', Psychometrika 23, 187–200.

Kapetanios, G. (2004), `A note on modelling core inflation for the UK using a new dynamic factor estimation method and a large disaggregated price index dataset', Economics Letters 85, 63–69.

Kapetanios, G. (2005), `A testing procedure for determining the number of factors in approximate factor models with large datasets', Working Paper No. 551, Queen Mary.

Kapetanios, G. and Marcellino, M. (2004), `A parametric estimation method for dynamic factor models of large dimensions', Working Paper 489, Queen Mary, University of London.

Kapetanios, G. and Marcellino, M. (2006), `Factor-GMM estimation with large sets of possibly weak instruments', Working Paper No. 577, Queen Mary, University of London.

Kariya, T. (1993), Quantitative Methods for Portfolio Analysis: MTV Model Approach, Theory and Decision Library, Kluwer Academic Publishers, Dordrecht.

Karlsen, H. A. (1990), `Doubly stochastic vector AR(1) processes', Dept. of Mathematics, University of Bergen, Norway.

Kim, C. J. (1994), `Dynamic linear models with Markov-switching', Journal of Econometrics 60(1-2), 1–22.

Kim, C. J. and Nelson, C. R. (1998), `Business cycle turning points, a new coincident index, and tests of duration dependence based on a dynamic factor model with regime switching', Review of Economics and Statistics 80(2), 188–201.

Kim, M. J. and Yoo, J. S. (1995), `New index of coincident indicators: A multivariate Markov switching factor model approach', Journal of Monetary Economics 36(3), 607–630.

Larimore, W. (1983), System identification, reduced-order filtering and modeling via canonical variate analysis, in `Proc. 1983 American Control Conference'.

Lawley, D. (1956), `Tests of significance for the latent roots of covariance and correlation matrices', Biometrika 43, 128–136.

Lawley, D. N. and Maxwell, A. E. (1971), Factor Analysis as a Statistical Method, 2nd edn, Butterworths.

Lebow, D. E. (1993), `The covariability of productivity shocks across industries', Journal of Macroeconomics 15, 483–510.

Ledoit, O. and Wolf, M. (2002), `Some hypothesis tests for the covariance matrix when the dimension is large compared to the sample size', Annals of Statistics 30, 1081–1102.

Lippi, M. and Thornton, D. L. (2004), `A dynamic factor analysis of the response of U.S. interest rates to news', Federal Reserve Bank of St Louis Working Paper.

Ludvigson, S. C. and Ng, S. (2007), `The empirical risk-return relation: A factor analysis approach', Journal of Financial Economics 83(1), 171–222.

Lütkepohl, H. (1991), Introduction to Multiple Time Series Analysis, Springer-Verlag: Berlin.

Madansky, A. (1964), `Instrumental variables in factor analysis', Psychometrika 29(2), 105–113.

Magnus, J. R. and Neudecker, H. (1991), Matrix Differential Calculus with Applications in Statistics and Econometrics, Wiley, Chichester.

Mansour, J. M. (2003), `Do national business cycles have an international origin?', Empirical Economics 28, 223–247.

Marcus, M. (1956), `An eigenvalue inequality for the product of normal matrices', American Mathematical Monthly 63, 173–174.

Marčenko, V. A. and Pastur, L. A. (1967), `Distribution of eigenvalues for some sets of random matrices', Mathematics of the USSR Sbornik 72, 457–483.

Matheson, T. D. (2006), `Factor model forecasts for New Zealand', International Journal of Central Banking 2(2), 169–237.

McCallum, B. T. (1970), `Artificial orthogonalization in regression analysis', The Review of Economics and Statistics 52, 110–113.

McKeown, M., Makeig, S., Brown, S., Jung, T.-P., Kindermann, S., Bell, A., Iragui, V. and Sejnowski, T. (1998), `Blind separation of functional magnetic resonance imaging (fMRI) data', Human Brain Mapping 6, 368–372.

Melvin, M. and Schlagenhauf, D. (1986), `Risk in international lending: A dynamic factor analysis applied to France and Mexico', Journal of International Money and Finance 5, S31–48.

Mittelhammer, R. C. and Baritelle, J. L. (1977), `On two strategies for choosing principal components in regression analysis', American Journal of Agricultural Economics 59, 336–343.

Nieuwenhuyze, C. V. (2006), A generalised dynamic factor model for the Belgian economy - useful business cycle indicators and GDP growth forecasts, Research series 200603-2, National Bank of Belgium.

Nowak, E. (1992), `Identifiability in multivariate dynamic linear errors-in-variables models', Journal of the American Statistical Association 87, 714–723.

Nowak, E. (1993), `The identification of multivariate linear dynamic errors-in-variables models', Journal of Econometrics 59, 213–227.

Onatski, A. (2006a), `Asymptotic distribution of the principal components estimator of large factor models when the factors are relatively weak', mimeo.

Onatski, A. (2006b), `Determining the number of factors from empirical distribution of eigenvalues', mimeo.

Onatski, A. (2007), `A formal statistical test for the number of factors in the approximate factor models', mimeo.

Pal, M. (1980), `Consistent moment estimators of regression coefficients in the presence of errors in variables', Journal of Econometrics 14, 349–364.

Pearson, K. (1901), `On lines and planes of closest fit to systems of points in space', Philosophical Magazine 2, 559–572.

Pidot, G. B. (1969), `A principal components analysis of the determinants of local government fiscal patterns', The Review of Economics and Statistics 51, 176–188.

Poskitt, D. S. and Chung, S. H. (1996), `Markov chain models, time series analysis and extreme value theory', Advances in Applied Probability 28, 405–425.

Quah, D. and Sargent, T. J. (1992), A dynamic index model for large cross sections, Discussion Paper / Institute for Empirical Macroeconomics 77, Federal Reserve Bank of Minneapolis. Available at http://ideas.repec.org/p/p/fedmem/77.html.

Reiersøl, O. (1941), `Confluence analysis by means of lag moments and other methods of confluence analysis', Econometrica 9, 1–24.

Reiersøl, O. (1945), `Confluence analysis by means of instrumental sets of variables', Arkiv för Matematik, Astronomi och Fysik 32A, 4.

Reiersøl, O. (1950), `On the identifiability of parameters in Thurstone's multiple factor analysis', Psychometrika 15, 121–149.

Reinsel, G. and Ahn, S. (1992), `Vector autoregressive models with unit roots and reduced rank structure: estimation, likelihood ratio test, and forecasting', Journal of Time Series Analysis 13, 352–375.

Reinsel, G. C. and Velu, R. P. (1998), Multivariate Reduced-rank Regression: Theory and Applications, Springer, New York.

Ristaniemi, T. and Joutsensalo, J. (1999), On the performance of blind source separation in CDMA downlink, in `Proc. Int. Workshop on Independent Component Analysis and Signal Separation (ICA'99), Aussois, France'.

Ross, S. A. (1976), `The arbitrage theory of capital asset pricing', Journal of Economic Theory 13, 341–360.

Rubin, D. and Thayer, D. (1982), `EM algorithms for factor analysis', Psychometrika 47, 69–76.

Sala, L. (2003), `Monetary transmission in the euro area: A factor model approach', mimeo.

Sargent, T. J. (1989), `Two models of measurements and the investment accelerator', Journal of Political Economy 97, 251–287.

Sargent, T. J. and Sims, C. A. (1977), Business cycle modeling without pretending to have too much a priori economic theory, in C. A. Sims, ed., `New Methods in Business Cycle Research', Minneapolis: Federal Reserve Bank of Minneapolis, pp. 45–109.

Schneeweiss, H. (1997), `Factors and principal components in the near spherical case', Multivariate Behavioral Research 32(4), 375–401.

Schneeweiss, H. and Mathes, H. (1995), `Factor analysis and principal components', Journal of Multivariate Analysis 55, 105–124.

Schneider, M. and Spitzer, M. (2004), Forecasting Austrian GDP using the generalized dynamic factor model, Working Papers 89, Oesterreichische Nationalbank (Austrian Central Bank).

Schumacher, C. (2005), `Forecasting German GDP using alternative factor models based on large datasets', Deutsche Bundesbank, Research Centre.

Shumway, R. H. and Stoffer, D. S. (1982), `An approach to time series smoothing and forecasting using the EM algorithm', Journal of Time Series Analysis 3, 253–264.

Sims, C. (1981), An autoregressive index model for the U.S., 1948-1975, in J. Kmenta and J. Ramsey, eds, `Large-Scale Macroeconometric Models', Amsterdam: North Holland, pp. 283–327.

Sims, C. (1992), `Interpreting the macroeconomic time series facts: The effects of monetary policy', European Economic Review 36, 975–1000.

Sims, C. A. (1980), `Macroeconomics and reality', Econometrica 48(1), 1–48.

Singleton, K. (1980), `A latent time series model of the cyclical behavior of interest rates', International Economic Review 21, 559–575.

Solo, V. (1986), Topics in advanced time series analysis, Vol. 1215 of Lecture Notes in Math., Springer, Berlin, pp. 165–328.

Spearman, C. (1904), `General intelligence objectively determined and measured', American Journal of Psychology 15, 201–293.

Stock, J. H. and Watson, M. W. (1990), New indexes of coincident and leading economic indicators, NBER Reprints 1380, National Bureau of Economic Research, Inc.

Stock, J. H. and Watson, M. W. (2002a), `Forecasting using principal components from a large number of predictors', Journal of the American Statistical Association 97(460), 1167–1179.

Stock, J. H. and Watson, M. W. (2002b), `Macroeconomic forecasting using diffusion indexes', Journal of Business and Economic Statistics 20(2), 147–162.

Stock, J. H. and Watson, M. W. (2005), `Implications of dynamic factor models for VAR analysis', mimeo.

Stock, J. and Watson, M. (2006), Forecasting with many predictors, in G. Elliott, C. W. J. Granger and A. Timmermann, eds, `Handbook of Economic Forecasting'.

Taniguchi, M., Maeda, K. and Puri, M. (2006), `Statistical analysis of a class of factor time series models', Journal of Statistical Planning and Inference 136, 2367–2380.

Thurstone, L. L. (1947), Multiple Factor Analysis, University of Chicago Press: Chicago.

Velu, R., Reinsel, G. and Wichern, D. (1986), `Reduced rank models for multivariate time series', Biometrika 73, 105–118.

Vigário, R., Jousmäki, V., Hämäläinen, M., Hari, R. and Oja, E. (1998), Independent component analysis for identification of artifacts in magnetoencephalographic recordings, in `NIPS '97: Proceedings of the 1997 conference on Advances in neural information processing systems 10', MIT Press, Cambridge, MA, USA, pp. 229–235.

Watson, M. W. and Engle, R. F. (1983), `Alternative algorithms for the estimation of dynamic factor, MIMIC and varying coefficient regression models', Journal of Econometrics 23, 385–400.

Watson, M. W. and Kraft, D. F. (1984), `Testing the interpretation of indices in a macroeconomic index model', Journal of Monetary Economics 13, 165–181.

Whittle, P. (1961), `Gaussian estimation in stationary time series', Bulletin de l'Institut International de Statistique 39, 105–130.

Wigner, E. (1955), `Characteristic vectors of bordered matrices with infinite dimensions', Annals of Mathematics 62, 548–564.

Wigner, E. (1958), `On the distribution of the roots of certain symmetric matrices', Annals of Mathematics 67, 325–328.

Zhang, J. and Stine, R. A. (1999), Autocovariance structure of Markov regime models and model selection, Technical report, Department of Statistics, The Wharton School of the University of Pennsylvania.
