Faculty of Business and Law Centre for Financial Econometrics (Department of Finance)
Financial Econometrics Series
SWP 2015/16
CCE Estimation of Factor-Augmented Regression Models with More Factors than
Observables
H. Karabiyik, J-P. Urbain and J. Westerlund
The working papers are a series of manuscripts in their draft form. Please do not quote without obtaining the author's consent. The views expressed in this paper are those of the author and not necessarily endorsed by the School or IBISWorld Pty Ltd.
CCE ESTIMATION OF FACTOR-AUGMENTED REGRESSION
MODELS WITH MORE FACTORS THAN OBSERVABLES
Hande Karabiyik, Lund University
Jean-Pierre Urbain, Maastricht University
Joakim Westerlund∗, Lund University
and Centre for Economics and Financial Econometrics Research, Deakin University
October 16, 2015
Abstract
This paper considers estimation of factor-augmented panel data regression models.
One of the most popular approaches towards this end is the common correlated effects
(CCE) estimator of Pesaran (Estimation and inference in large heterogeneous panels with
a multifactor error structure. Econometrica 74, 967–1012, 2006). For the pooled version of
this estimator to be consistent, either the number of observables must be larger than
the number of unobserved common factors, or the factor loadings must be distributed
independently of each other. This is a problem in the typical application involving only
a small number of regressors and/or correlated loadings. The current paper proposes
a simple extension to the CCE procedure by which both requirements can be relaxed.
The CCE approach is based on taking the cross-section average of the observables as an
estimator of the common factors. The idea put forth in the current paper is to consider
not only the average but also other cross-section combinations. Asymptotic properties
of the resulting combination-augmented CCE (C3E) estimator are provided and verified
in small samples using Monte Carlo simulations.
JEL Classification: C12; C13; C33.
Keywords: Factor-augmented panel regressions; common factor models; principal components; cross-sectional averages; cross-sectional dependence.
∗Corresponding author: Department of Economics, Lund University, Box 7082, 220 07 Lund, Sweden. Telephone: +46 46 222 8997. Fax: +46 46 222 4613. E-mail address: [email protected].
1 Introduction
Consider the scalar and m × 1 vector of observable panel data variables yi,t and xi,t, where i = 1, ..., N and t = 1, ..., T index the cross-sectional and time series dimensions, respectively. The data generating process (DGP) of the T × 1 vector yi = (yi,1, ..., yi,T)′ is similar to the DGP of Pesaran (2006), and is given by
yi = Xiβi + ei, (1)
ei = Fλi + εi, (2)
βi = β + ξi, (3)
where Xi = (xi,1, ..., xi,T)′ is T × m, βi is an m × 1 vector of slope coefficients, F = (f1, ..., fT)′ is a T × r matrix of common factors with λi being the associated r × 1 vector of factor loadings, εi = (εi,1, ..., εi,T)′ is a T × 1 vector of errors that are largely idiosyncratic, and ξi is an m × 1
vector of errors. If the model includes unit-specific fixed effects, then yi, Xi, ei, F and εi are
simply the correspondingly (time) demeaned variables.
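To fix ideas, the DGP in (1)–(4) is straightforward to simulate. The sketch below is ours, not the paper's: the Gaussian errors, the particular dimensions, and the nonzero-mean loadings (chosen so that cross-section averages retain factor information) are illustrative assumptions.

```python
import numpy as np

# Illustrative simulation of the DGP in (1)-(4); all distributional
# choices below are assumptions of this sketch, not of the paper.
rng = np.random.default_rng(0)
N, T, m, r = 50, 100, 2, 2          # units, periods, regressors, factors
beta = np.array([1.0, -0.5])        # common slope (homogeneous case, xi_i = 0)

F = rng.standard_normal((T, r))     # common factors f_t
Y = np.empty((N, T))
X = np.empty((N, T, m))
for i in range(N):
    lam_i = np.array([1.0, 1.0]) + 0.5 * rng.standard_normal(r)   # lambda_i in (2)
    Lam_i = np.eye(m, r) + 0.5 * rng.standard_normal((m, r))      # Lambda_i in (4)
    X[i] = F @ Lam_i.T + rng.standard_normal((T, m))              # eq. (4)
    Y[i] = X[i] @ beta + F @ lam_i + rng.standard_normal(T)       # eqs. (1)-(2)
```

Note that the correlation between Xi and F here comes entirely through the common loadings in (4), which is exactly the case in which plain pooled LS on (1) breaks down.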
The above model is the prototypical pooled panel regression with a factor error structure,
in which εi is independent of Xi. If F is also independent of Xi, then (1) is nothing but a static
panel data regression with exogenous regressors, which can be estimated consistently using
least squares (LS). If, however, Xi is correlated with F, then consistency will be lost. To allow
for this possibility, we follow Pesaran (2006) and assume that
Xi = FΛ′i + ηi, (4)
where Λi is a m× r loading matrix and ηi = (ηi,1, ..., ηi,T)′ is a T ×m matrix of idiosyncratic
errors. By combining (1)–(4),
Wi = FCi + Ui, (5)

where Wi = (wi,1, ..., wi,T)′ is T × (m + 1), wi,t = (yi,t, x′i,t)′ is (m + 1) × 1, Ci = (Λ′iβi + λi, Λ′i) is r × (m + 1), and Ui = (ui,1, ..., ui,T)′ = (ηiβi + εi, ηi) is T × (m + 1). Thus, (1)–(4)
can be rewritten equivalently as a static factor model for Wi, which is convenient because it
means that the common component of the data can be estimated using existing methods for
such models (see Chudik and Pesaran, 2013b, for a recent survey).1 In this paper, however,
¹ In Section 5 of the present paper we present some Monte Carlo results that enable comparison with the principal components-based estimator of Bai (2009), which is arguably the closest competitor of the CCE approach.
we focus on the CCE approach of Pesaran (2006), which has become very popular in the empirical literature with a large number of applications. The approach has also attracted much interest in the econometric literature, where it has been shown to work under very general conditions, including models with weak factors, dynamic models and even models with non-stationary data (see, for example, Chudik et al., 2011; Chudik and Pesaran, 2013a; Kapetanios et al., 2011; Pesaran et al., 2013; Reese and Westerlund, 2015a; Reese and Westerlund, 2015b).
As is well known from the classical common factor literature, F and Ci are not separately
identifiable, suggesting that the best that one can hope for is consistent estimation of the
space spanned by F. The idea of Pesaran (2006) is to make use of the cross-section variation
to estimate this space. A natural way to accomplish this is to take the cross-section average, giving w̄t = C̄′ft + ūt, where C̄, w̄t and ūt are the cross-section averages of Ci, wi,t and ui,t, respectively. Hence, since ūt →p 0(m+1)×1 as N → ∞, where →p signifies convergence in probability, we have that w̄t = C̄′ft + ūt →p C̄′ft. This suggests using w̄t as an estimator of C̄′ft, a strategy that would seem to require

rk C̄′ = r ≤ m + 1, (6)

where rk A denotes the rank of any matrix A. Hence, the number of observables must be at least as large as the number of factors. The idea behind the CCE approach is then to estimate β from a pooled LS regression of yi,t onto xi,t and w̄t, leading to the pooled CCE (CCEP) estimator.²
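Computationally, the CCEP recipe is just two steps: form the cross-section averages w̄t = (ȳt, x̄′t)′ and run pooled LS with these averages partialled out. The sketch below is our own minimal implementation (numpy only; the N × T and N × T × m data layout is a convention of this sketch, not of the paper).

```python
import numpy as np

def ccep(Y, X):
    """Pooled CCE: LS of y_it on x_it augmented with the cross-section
    averages w_bar_t = (y_bar_t, x_bar_t')' (Pesaran, 2006).
    Y is N x T, X is N x T x m."""
    N, T, m = X.shape
    W_bar = np.column_stack([Y.mean(axis=0), X.mean(axis=0)])  # T x (m+1)
    Q, _ = np.linalg.qr(W_bar)                                 # orthonormal basis
    proj = lambda A: A - Q @ (Q.T @ A)                         # A -> M_W A
    den = np.zeros((m, m))
    num = np.zeros(m)
    for i in range(N):
        Xi = proj(X[i])                                        # M_W X_i
        den += Xi.T @ Xi
        num += Xi.T @ proj(Y[i])                               # X_i' M_W y_i
    return np.linalg.solve(den, num)
```

On data simulated from (1)–(4) with r ≤ m + 1 and full-rank average loadings, this recovers β up to sampling error.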
Interestingly, as Pesaran (2006) points out, the condition in (6) is actually not necessary when using the CCEP estimator. However, as has been shown by Westerlund and Urbain (2013), and as we explain in detail in Section 3 of the current paper, relaxing (6) requires imposing additional restrictive independence conditions on λi and Λi, which, if false, may well render the CCEP estimator inconsistent. Hence, even if (6) can in principle be relaxed, in most situations of practical relevance this is not necessarily so. Also, even if the more restrictive assumptions are satisfied, the rate of consistency of the CCEP estimator when r > m + 1 is lower than when r ≤ m + 1.

² Another possibility is to estimate βi from a time series LS regression of yi,t onto xi,t and w̄t. This is the individual-specific CCE estimator, which can be averaged across the cross-section to obtain the mean group CCE (CCEMG) estimator. However, for reasons to be explained in Section 3, in this paper we focus on the CCEP estimator, although we also discuss the results for the other CCE estimators.
The discussion in the last paragraph suggests that it is important to have m + 1 ≥ r. The question therefore arises as to how likely this is in practice. The number of regressors, m, is usually a small number that is given by economic theory (and/or previous empirical evidence). Economic theory is, on the other hand, not very informative regarding the number of factors, r (see, for example, Eberhardt et al., 2013). Therefore, the theoretically implied value of m typically has little or nothing to do with r. This is important because within CCE choosing m also means restricting r, and in many applications there is little or no reason to believe that this number should be less than or equal to m + 1. In view of this and the potential problems involved when m + 1 < r, the restriction in (6) cannot be taken as given but should be tested on a case-by-case basis. In practice, however, this aspect is almost always ignored.
In the current paper we take this shortcoming as our starting point. The purpose is to provide a simple modification of the original CCE approach that allows (but does not require) r > m + 1. Hence, the purpose here is not really to propose an alternative estimator, but to show that original CCE belongs to a much broader class of estimators, which is henceforth referred to as combination-augmented CCE (C3E). The idea behind C3E is to consider not only the equal-weighted cross-section average, but also other combinations of w1,t, ..., wN,t. In particular, by considering k such combinations we can allow for

k(m + 1) ≥ m + 1

common factors. In addition to the larger number of factors that can be allowed, the new approach also enables one to consider the selection of m and r separately, which is again not possible within the original CCE framework. In our study of the asymptotic properties of the pooled C3E estimator we focus on the conventional homogeneous slope case in which β1 = ... = βN = β, although we also consider the case when this restriction is not met. The analysis is conducted under the assumption that N, T → ∞ with T/N → τ < ∞, which is less restrictive than the T/N → 0 condition of Pesaran (2006). Some Monte Carlo results are also provided, suggesting that the asymptotic properties are borne out well in small samples.
The remainder of the paper is organized as follows. Section 2 gives the assumptions, which are used in Section 3 to derive the asymptotic distribution of the pooled C3E estimator. When T/N → τ < ∞ the estimator is biased. As a response to this, we propose using bias correction, a procedure that is shown to be quite effective. As a solution to the practical problem of how to pick the appropriate combinations, an information criterion (IC)-based selection rule is proposed in Section 4. Section 5 focuses on the finite-sample accuracy of the theory provided in Sections 3 and 4. Section 6 concludes. All proofs are provided in the Appendix.
A word on notation. tr A and ‖A‖ = √(tr(A′A)) denote the trace and the Frobenius (Euclidean) norm, respectively, of the matrix A. Also, MA = IT − A(A′A)−1A′ for any T-rowed matrix A. M < ∞ denotes a generic positive number. Finally, →d signifies convergence in distribution.
2 Assumptions
The restrictions placed on εi, ηi, F, βi, λi, and Λi are given in Assumption 1, which is a so-called "high-level" assumption (see Bai and Ng, 2002; Bai, 2003, 2009, for similar assumptions). The advantage of making such high-level assumptions is that the results cover a wide range of DGPs. The disadvantage is that the assumptions can be difficult to interpret. Let τε,ij,ts = E(εi,tεj,s) and τη,ij,ts = E(ηi,tη′j,s).
Assumption 1.

(i) E(εi,t) = 0, E|εi,t|8 < M, τε,ii,tt = σ2ε,i > 0, |τε,ii,ts| ≤ M, T−1 ∑_{s=1}^{T} ∑_{t=1}^{T} |τε,ii,ts| ≤ M, |τε,ij,tt| ≤ |τε,ij| for some τε,ij with N−1 ∑_{i=1}^{N} ∑_{j=1}^{N} |τε,ij| ≤ M, (NT)−1 ∑_{i=1}^{N} ∑_{j=1}^{N} ∑_{t=1}^{T} ∑_{s=1}^{T} |τε,ij,ts| ≤ M, and E[(N−1/2 ∑_{i=1}^{N} [εi,sεi,t − τε,ii,st])4] ≤ M.

(ii) E(ηi,t) = 0m×1, E(‖ηi,t‖8) ≤ M, τη,ii,tt = Ση,i is positive definite, N−1 ∑_{i=1}^{N} Ση,i → Ση as N → ∞, where Ση is positive definite, ‖Ση,i‖ ≤ M, T−1 ∑_{s=1}^{T} ∑_{t=1}^{T} ‖τη,ii,ts‖ ≤ M, ‖τη,ij,tt‖ ≤ |τη,ij| for some τη,ij with N−1 ∑_{i=1}^{N} ∑_{j=1}^{N} |τη,ij| ≤ M, (NT)−1 ∑_{i=1}^{N} ∑_{j=1}^{N} ∑_{t=1}^{T} ∑_{s=1}^{T} ‖τη,ij,ts‖ ≤ M, and E(‖N−1/2 ∑_{i=1}^{N} [ηi,tη′i,s − τη,ii,ts]‖4) ≤ M.

(iii) T−1/2 ∑_{t=1}^{T} ηi,tεi,t →d N(0m×1, Σηε,i) as T → ∞ and (NT)−1/2 ∑_{i=1}^{N} η′iεi →d N(0m×1, Σηε) as N, T → ∞, where Σηε,i and Σηε = limN→∞ N−1 ∑_{i=1}^{N} Σηε,i are m × m positive definite matrices.

(iv) T−1 ∑_{t=1}^{T} ftf′t →p E(ftf′t) = Σf as T → ∞, where Σf is positive definite, and E(‖ft‖4) ≤ M.

(v) N−1/2 ∑_{i=1}^{N} ξi →d N(0m×1, Σξ) as N → ∞ with Σξ positive definite, and ‖Σξ‖ ≤ M.

(vi) λi and Λi are either random such that E(‖λi‖) ≤ M and E(‖Λi‖) ≤ M, or non-random such that ‖λi‖ ≤ M and ‖Λi‖ ≤ M. In both cases, N−1 ∑_{i=1}^{N} λiλ′i →p Σλ and N−1 ∑_{i=1}^{N} ΛiΛ′i →p ΣΛ, where Σλ and ΣΛ are positive definite.

(vii) (εi,t, η′i,t)′, fs, ξj and (λl, Λl) are mutually independent for all i, j, l, t and s.
Remark 1. Assumption 1 is less restrictive than Assumptions 1–4 in Pesaran (2006), under which CCE was originally proposed. Note in particular that while Pesaran (2006) only allows for serial correlation, Assumption 1 (i) and (ii) allow for both serial and cross-sectional correlation in the idiosyncratic errors. In this sense, Assumption 1 (i) and (ii) are similar to Assumption C of Bai and Ng (2002) (see also Bai, 2003, 2009). The main difference is that we do not allow for heteroskedasticity across time. The assumptions placed on βi are also more general than those considered by Pesaran (2006, Assumption 4). Specifically, while Pesaran (2006) assumes that ξi is independent and identically distributed (iid) with mean zero and constant covariance matrix, Assumption 1 (v) only requires that a suitable central limit theorem applies. Similarly, while in Pesaran (2006, Assumption 3) λi and Λi are assumed to be iid and also independent of each other, under Assumption 1 (vi) λi and Λi can be either random in a general way or non-random. This means that λi and Λi can be correlated both across i and with each other. As in Pesaran (2006), the loadings are assumed not to go to zero, which means that the cross-section dependence is of the strong form. However, in analogy with Chudik et al. (2011), some of the factors can also be weak without affecting the results.
For each of the m + 1 columns in Wi, we consider k cross-section combinations, as given by the T × k(m + 1) matrix N−1 ∑_{i=1}^{N} WiZi, where Zi = (Im+1 ⊗ z′i) is (m + 1) × k(m + 1) and zi = (z1i, ..., zki)′ is a k × 1 vector of combinations. The combinations can be deterministic and/or stochastic, provided that Assumption 2 is satisfied. Here and throughout this paper,

H̄ = N−1 ∑_{i=1}^{N} Z′iC′i,

a k(m + 1) × r matrix.
Assumption 2.

(i) rk H̄ = r for all N < ∞ and H̄ →p H as N → ∞, where rk H = r and ‖H‖ < ∞.

(ii) Zi is either deterministic such that ‖Zi‖ ≤ M, or stochastic such that E(‖Zi‖2) ≤ M.
(iii) Let φt = N−1/2 ∑_{i=1}^{N} Z′iui,t and φi,t = N−1/2 ∑_{j≠i}^{N} Z′juj,t. It is assumed that E(‖φt‖2) ≤ M, T−1 ∑_{t=1}^{T} E(φtφ′t) → ΣZu = limN→∞ N−1 ∑_{i=1}^{N} Z′iΣu,iZi as N, T → ∞ with

Σu,i = E(ui,tu′i,t) = [ β′iΣη,iβi + σ2ε,i    β′iΣη,i
                        Ση,iβi              Ση,i ],

‖ΣZu‖ ≤ M, E(‖N−1/2T−1/2 ∑_{i=1}^{N} ∑_{t=1}^{T} Z′iui,tφ′i,t‖2) ≤ M, and E(‖T−1/2 ∑_{t=1}^{T} Z′iui,tφ′i,t‖2) ≤ M.
Remark 2. Note that if k = 1 and zi = 1, then N−1 ∑_{i=1}^{N} WiZi = N−1 ∑_{i=1}^{N} (Wi ⊗ z′i) = W̄, and so we are back in the cross-section average-only original CCE approach of Pesaran (2006). In this case, Assumption 2 is the same as in Pesaran (2006), in the sense that (i) boils down to (6), (ii) is trivially satisfied, and (iii) is implied by Assumption 1. Pesaran (2006) does
point out that the equal-weighted average is not the only way to combine the data. However, while recognizing that the weights do not have to be equal, he still considers just one combination (weighted average) per observable. The contribution of the present paper is the consideration of multiple combinations, which is important because it relaxes the m + 1 ≥ r requirement in (6). This makes it necessary to be specific about the combinations that can be permitted. Interestingly, zi can be thought of as acting as an instrument for Ci. Assumption 2 is therefore analogous to the well-known orthogonality and validity conditions in the instrumental variables (IV) literature (see Bai and Ng, 2010, for a panel IV approach based on similar assumptions). Specifically, while strict independence/orthogonality is not necessary, ui,t and Zi can be at most weakly correlated. We also require that the combinations in zi are valid in the sense that rk H̄ = r, and that certain moments exist. As in classical IV, the assumptions are placed on unobservables, which makes them harder to test than if they had been placed on observables. In Section 4 we elaborate on this. Specifically, an IC-based procedure is proposed that selects only the valid combinations.
3 C3E estimation and inference
In Section 3.1, we study the asymptotic properties of the pooled C3E estimator in the case
when β1, ..., βN are all equal, and in the case when they are unrestricted. The estimation of
the various covariance matrices that appear in Section 3.1 is discussed in Section 3.2, where
we also consider briefly the properties of the individual-specific C3E estimator, which are important to ensure consistent covariance matrix estimation in the heterogeneous slope case. The results reported in Section 3.1 show that the pooled C3E estimator is generally biased. This finding leads quite naturally to the consideration of a bias-corrected estimator, the properties of which are studied in Section 3.3.
3.1 The pooled C3E estimator
As already mentioned, since F and Ci are not separately identifiable, F can only be estimated up to a matrix rotation. The proposed estimator F̂ of FH is given by

F̂ = N−1 ∑_{i=1}^{N} WiZi = N−1 ∑_{i=1}^{N} (Wi ⊗ z′i), (7)

whose dimension is T × k(m + 1). The resulting pooled estimator of β is given by

β̂C3E = ( ∑_{i=1}^{N} X′iMF̂Xi )−1 ∑_{i=1}^{N} X′iMF̂yi. (8)

The CCEP estimator, henceforth denoted β̂CCEP, is simply β̂C3E with F̂ = W̄.
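The pooled C3E estimator in (7)–(8) differs from CCEP only in the construction of the factor proxy. The sketch below is our own implementation; the N × k weight matrix Z, with row i holding z′i, is a convention of the sketch, not of the paper.

```python
import numpy as np

def c3e(Y, X, Z):
    """Pooled C3E, eqs. (7)-(8): F_hat = N^{-1} sum_i (W_i kron z_i'),
    then pooled LS of M_Fhat y on M_Fhat X.
    Y: N x T, X: N x T x m, Z: N x k (row i holds the weights z_i)."""
    N, T, m = X.shape
    W = np.concatenate([Y[:, :, None], X], axis=2)       # N x T x (m+1)
    # T x k(m+1) proxy: k weighted cross-section combinations per observable
    F_hat = np.einsum('itj,ik->tjk', W, Z).reshape(T, -1) / N
    Q, _ = np.linalg.qr(F_hat)
    proj = lambda A: A - Q @ (Q.T @ A)                   # A -> M_Fhat A
    den = np.zeros((m, m))
    num = np.zeros(m)
    for i in range(N):
        Xi = proj(X[i])
        den += Xi.T @ Xi
        num += Xi.T @ proj(Y[i])
    return np.linalg.solve(den, num)
```

With Z a single column of ones (k = 1) this reduces to CCEP; with additional valid columns it accommodates r > m + 1, up to r ≤ k(m + 1), provided Assumption 2 holds for the chosen weights.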
Remark 3. The pooled C3E estimator considered here is based on "within" pooling, whereby the data are summed over the cross-section before taking the ratio. Another approach is to use "between" pooling, in which case the ratio is taken prior to summing over the cross-section. Pesaran (2006) considers both types of pooling. However, since in his Monte Carlo study within pooling generally leads to the best-performing estimator, in this paper we only consider this type. Also, as mentioned above, as a by-product of the need for consistent covariance estimation in the heterogeneous slope case, in Section 3.2 we also consider the individual-specific C3E estimator. This estimator can be averaged, leading to a between (or "group mean") type C3E estimator.
Theorem 1. Suppose that ξ1 = ... = ξN = 0m×1 and k(m + 1) = r. Under Assumptions 1 and 2, as N, T → ∞ with T/N → τ ≤ M,

√NT(β̂C3E − β) →d N(0m×1, Σ−1η ΣηεΣ−1η ) + Σ−1η √τ B,

where

B = B1 − B2 − B3,

B1 = limN→∞ N−1 ∑_{i=1}^{N} ΛiH−1ΣZu(H′)−1λi,

B2 = limN→∞ N−1 ∑_{i=1}^{N} Ση,i(β, Im)Zi(H′)−1λi,

B3 = limN→∞ N−1 ∑_{i=1}^{N} σ2ε,iΛiH−1Z′i(1, 0m)′.
Theorem 1 is concerned with the conventional homogeneous slope case, and is the C3E counterpart of Theorem 4 of Pesaran (2006), which requires that r = 1. Theorem 1 only requires that k(m + 1) = r and is therefore more general in this regard. Another difference when compared to Theorem 4 in Pesaran (2006), which supposes that T/N → 0, is that Theorem 1 only requires that T/N → τ ≤ M, making it more relevant for applied work. Moreover, by relaxing the T/N → 0 requirement, Theorem 1 also reveals the presence of an asymptotic bias that is not present in Theorem 4 of Pesaran (2006).

Analogous to the bulk of the existing literature on factor-augmented regressions, Theorem 1 supposes that r is known and that β1 = ... = βN = β (see, for example, Bai, 2009; Goncalves and Perron, 2014; Greenaway-McGrevy et al., 2012). The former assumption is without loss of generality in the sense that if r is unknown, the IC-based approach of Section 4 can be used to obtain a consistent estimate. The effect of a violation of the common slope assumption is studied in Theorem 2.
Theorem 2. Suppose that k(m + 1) = r. Under Assumptions 1 and 2, as N, T → ∞,

√N(β̂C3E − β) →d N(0m×1, Σ−1η RΣ−1η ),

where

R = limN→∞ N−1 ∑_{i=1}^{N} Ση,iΣξΣη,i.
Theorem 2 is the C3E counterpart of Theorem 3 of Pesaran (2006). It shows that the variance of the estimator emanates from the heterogeneity of the slopes, as measured by Σξ. This result is analogous to that of Pesaran (2006) for the CCEP estimator. However, the asymptotic variance of this estimator has an additional term that depends on the heterogeneity of the factor loadings and that is there because the rank condition in (6) is not assumed to be met. The C3E estimator does not depend on whether (6) is satisfied, which is also the reason why the asymptotic distribution given in Theorem 2 does not depend on the factor loadings. In order to illustrate this point, suppose first that m + 1 = r. Since in this case w̄t = H′ft + op(1), where H = C̄ is of full rank and hence invertible, we have N−1/2T−1 ∑_{i=1}^{N} X′iMW̄Fλi = N−1/2T−1 ∑_{i=1}^{N} X′iMFHFλi + op(1) = N−1/2T−1 ∑_{i=1}^{N} X′iMFFλi + op(1) = op(1), since span(FH) = span(F) when H is invertible and MFF = 0 (see Pesaran, 2006, equation (40)). Hence,
√N(β̂CCEP − β) = ( (NT)−1 ∑_{i=1}^{N} X′iMW̄Xi )−1 N−1/2T−1 ∑_{i=1}^{N} X′iMW̄(Xiξi + Fλi + εi)

             = ( (NT)−1 ∑_{i=1}^{N} X′iMW̄Xi )−1 N−1/2T−1 ∑_{i=1}^{N} X′iMW̄Xiξi + op(1), (9)
which converges to the same asymptotic distribution given in Theorem 2, provided that ξi is "nicely behaved" in the sense that Assumption 1 (v) is met. If, on the other hand, m + 1 < r, then N−1/2T−1 ∑_{i=1}^{N} X′iMW̄Fλi will not be negligible (see Pesaran, 2006, equation (38)), and so we obtain
√N(β̂CCEP − β) = ( (NT)−1 ∑_{i=1}^{N} X′iMW̄Xi )−1 N−1/2T−1 ∑_{i=1}^{N} X′iMW̄(Xiξi + Fλi) + op(1), (10)
which will not converge in distribution unless λi is also nicely behaved. As pointed out by Westerlund and Urbain (2013), one requirement here is that λi and Λi are mutually independent, which seems like a rather restrictive assumption. For example, when regressing investments on savings, as is commonly done in the literature on the so-called "Feldstein–Horioka puzzle", a common shock that increases savings is going to push interest rates down and investments up, suggesting that λi and Λi should be negatively correlated. Thus, while the requirement that m + 1 ≥ r can be relaxed also within the original CCE framework, this does not come free of charge.
It is important to note that in the above example the rate of consistency of β̂CCEP is given by √N and not by √NT. One may think that this relatively low rate of consistency is due to the heterogeneity of βi, and that imposing β1 = ... = βN = β would prevent this from happening, regardless of whether m + 1 ≥ r or m + 1 < r.³ However, this is not the case. The reason is easily appreciated by simply imposing ξ1 = ... = ξN = 0m×1 and using (NT)−1/2 ∑_{i=1}^{N} X′iMW̄εi = Op(1), from which it follows that

(NT)−1/2 ∑_{i=1}^{N} X′iMW̄(Xiξi + Fλi + εi) = (NT)−1/2 ∑_{i=1}^{N} X′iMW̄Fλi + Op(1). (11)

³ It is not clear from Pesaran (2006) whether one can have β1 = ... = βN = β, while at the same time permitting m + 1 < r.
If m + 1 ≥ r, then (NT)−1/2 ∑_{i=1}^{N} X′iMW̄Fλi = op(1), and so we obtain √NT(β̂CCEP − β) = Op(1). Hence, provided that m + 1 ≥ r, imposing β1 = ... = βN = β restores √NT-consistency. If, on the other hand, m + 1 < r, then T−1X′iMW̄F = Op(1), and therefore

√NT(β̂CCEP − β) = √T ( (NT)−1 ∑_{i=1}^{N} X′iMW̄Xi )−1 N−1/2 ∑_{i=1}^{N} T−1X′iMW̄Fλi + Op(1), (12)

whose order is determined by the order of the first term on the right, which in turn depends on λi and Λi. If λi is iid and independent of Λi, then the first term is Op(√T), whereas if λi is non-iid and/or correlated with Λi, then the same term is Op(√NT). Thus, the rate of consistency is √N, at best, and if λi is non-iid and/or correlated with Λi, then β̂CCEP is even inconsistent. The proposed C3E estimator in the homogeneous slope case is not only very simple, but also √NT-consistent regardless of the specification of λi and Λi, provided that Assumptions 1 and 2 are satisfied.
3.2 Covariance matrix estimation
In this section we derive consistent estimators of the covariance matrices that appear in Theorems 1 and 2. We begin by considering Σ−1η ΣηεΣ−1η , which according to Theorem 1 is the appropriate covariance matrix to consider when β1, ..., βN are all equal. Let ε̂i = (ε̂i,1, ..., ε̂i,T)′ = MF̂(yi − Xiβ̂C3E) and η̂i = (η̂i,1, ..., η̂i,T)′ = MF̂Xi. A natural consistent estimator of Ση,i is given by

Σ̂η,i = T−1 ∑_{t=1}^{T} η̂i,tη̂′i,t, (13)

from which we obtain

Σ̂η = N−1 ∑_{i=1}^{N} Σ̂η,i. (14)

For Σηε, we follow Pesaran (2006), who recommends using a heteroskedasticity and autocorrelation consistent (HAC) estimator in the spirit of Newey and West (1987). The particular estimator considered here is given by

Σ̂ηε = N−1 ∑_{i=1}^{N} Σ̂ηε,i, (15)

where

Σ̂ηε,i = Σ̂ηε,i(0) + ∑_{j=1}^{p} (1 − j/(p + 1)) [Σ̂ηε,i(j) + Σ̂ηε,i(j)′], (16)

Σ̂ηε,i(j) = T−1 ∑_{t=j+1}^{T} ε̂i,tε̂i,t−jη̂i,tη̂′i,t−j, (17)

with p being the window size. The appropriate covariance matrix estimator to use in the homogeneous slope case is therefore given by Σ̂−1η Σ̂ηεΣ̂−1η .
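Since Σ̂ηε,i(j) in (17) is the j-th sample autocovariance of the products vi,t = ε̂i,tη̂i,t, the estimator in (16) is a standard Bartlett-kernel (Newey–West type) construction. The sketch below, for a single unit i, is our own implementation; the residual series are taken as given.

```python
import numpy as np

def sigma_eta_eps_i(eps_hat, eta_hat, p):
    """Bartlett-kernel HAC estimator of Sigma_eta_eps,i, eqs. (16)-(17).
    eps_hat: length-T vector of residuals eps_hat_{i,t};
    eta_hat: T x m matrix of residuals eta_hat_{i,t};
    p: window (truncation lag) size."""
    T, m = eta_hat.shape
    # v_t = eps_hat_t * eta_hat_t, so Sigma_hat(j) = T^{-1} sum_t v_t v_{t-j}'
    v = eps_hat[:, None] * eta_hat                   # T x m
    def gamma(j):
        return v[j:].T @ v[:T - j] / T               # m x m sample autocovariance
    S = gamma(0)
    for j in range(1, p + 1):
        w = 1.0 - j / (p + 1.0)                      # Bartlett weight
        G = gamma(j)
        S += w * (G + G.T)
    return S
```

By construction the result is symmetric, and for p growing slowly with T it is the usual consistent long-run variance estimator of the sequence vi,t.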
If, as in Theorem 2, β1, ..., βN are not all the same, the above covariance estimator is no longer consistent. Specifically, while Σ̂η is still consistent, because of the reduced rate of consistency of β̂C3E, Σ̂ηε is inconsistent for R. Recognizing this problem, Pesaran (2006) proposes a nonparametric method that makes use of the individual-specific CCE estimator. The appropriate C3E analog of this estimator is given by

R̂ = N−1 ∑_{i=1}^{N} R̂i, (18)

where

R̂i = Σ̂η,i ( β̂C3E,i − N−1 ∑_{j=1}^{N} β̂C3E,j )( β̂C3E,i − N−1 ∑_{j=1}^{N} β̂C3E,j )′ Σ̂η,i, (19)

β̂C3E,i = (X′iMF̂Xi)−1X′iMF̂yi. (20)
The consistency of this estimator follows from the consistency of the individual-specific C3E estimator, β̂C3E,i.

Theorem 3. Suppose that k(m + 1) = r. Under Assumptions 1 and 2, as N, T → ∞ with √T/N → 0,

√T(β̂C3E,i − βi) →d N(0m×1, Σ−1η,i Σηε,iΣ−1η,i ).

Theorem 3 is the C3E counterpart of Theorem 1 of Pesaran (2006). It is important to note that unlike this other theorem, provided that k(m + 1) = r, Theorem 3 does not require that (6) is satisfied. The fact that β̂C3E,i is consistent means that the appropriate covariance matrix estimator to consider in the heterogeneous slope case is given by Σ̂−1η R̂Σ̂−1η .
3.3 Bias-adjustment

Theorem 1 shows that, while consistent, the pooled C3E estimator has an asymptotically biased distribution when T/N → τ > 0, leading to misleading inference. As pointed out by Bai (2009), an obvious solution to this problem is to use bias correction. Let us therefore define the following bias-adjusted version of β̂C3E:

β̂BAC3E = β̂C3E − N−1Σ̂−1η B̂, (21)

where B̂ = B̂1 − B̂2 − B̂3 with

B̂1 = N−1 ∑_{i=1}^{N} Λ̂iΣ̂Zuλ̂i, (22)

B̂2 = N−1 ∑_{i=1}^{N} Σ̂η,i(β̂C3E, Im)Ziλ̂i, (23)

B̂3 = N−1 ∑_{i=1}^{N} σ̂2ε,iΛ̂iZ′i(1, 0m)′. (24)

Here Σ̂η and Σ̂η,i are as in Section 3.2, while

σ̂2ε,i = T−1 ∑_{t=1}^{T} ε̂2i,t, (25)

where ε̂i,t is again as in Section 3.2. Also, letting ûi = MF̂Wi, we have

Σ̂Zu = N−1 ∑_{i=1}^{N} Z′iΣ̂u,iZi, (26)

Σ̂u,i = T−1 ∑_{t=1}^{T} ûi,tû′i,t. (27)

The estimators λ̂i and Λ̂i of (H′)−1λi and ΛiH−1, respectively, are obtained by simply picking the appropriate elements of Ĉi = (F̂′F̂)−1F̂′Wi, a k(m + 1) × (m + 1) matrix.
Corollary 1. Under the conditions of Theorem 1,

√NT(β̂BAC3E − β) = √NT(β̂C3E − β) − √τ Σ−1η B + op(1).

According to Corollary 1, √NT(β̂BAC3E − β) is asymptotically equivalent to √NT(β̂C3E − β) − √τ Σ−1η B, whose asymptotic distribution is easily inferred from Theorem 1. Indeed,

√NT(β̂BAC3E − β) →d N(0m×1, Σ−1η ΣηεΣ−1η ) (28)

as N, T → ∞ with T/N → τ ≤ M. The bias correction is therefore asymptotically successful. Moreover, the correction does not contribute to the limiting variance.
4 Selecting the combinations
A problem in applications is how to construct the combination vector, zi. This problem can be seen as comprising two parts: (i) finding candidate combinations, and (ii) selecting among the candidates.
4.1 Finding combination candidates
While zi is not required to be uncorrelated with ui,t, the correlation is not irrelevant, as the rate of consistency of F̂ is increased when zi and ui,t are uncorrelated. The combinations in zi are therefore ideally chosen to be uncorrelated with ui,t. They should also be highly correlated with Ci. Specifically, since Ci = (Λ′iβi + λi, Λ′i), for zi to be highly correlated with Ci, one should choose combinations that are believed to be highly correlated with the factor loadings.
An obvious approach to finding combinations is to exploit natural candidates in the particular application being considered. For example, in macroeconomics, usual common factor suspects include trade of goods and services, technology spillovers, and worldwide supply shocks, such as oil price shocks (see, for example, Dees et al., 2007).
The task of finding combinations that are correlated with the loadings is therefore tantamount to finding variables that measure the extent to which countries are affected by these usual suspects. Mastromarco et al. (2015) and Eberhardt and Teal (2011) argue that spillover effects of globalization and business cycles, and common political, economic and spatial stimuli, are likely to make production correlated across countries. As examples of variables that measure the effect of these common factors they mention openness, trade agreements, physical capital shares in aggregate income, human capital, growth determinants, initial per capita income, institutional environment, qualitative features of governance, geographical features, adoption of efficiency-enhancing technology, and natural resource constraints. If the analysis is made at the firm level, the extent to which a firm's production function is affected by common factors is likely to depend on, for example, the size of the firm, financial constraints, and the technology adopted (see Eberhardt and Teal, 2011; Chudik and Straub, 2011). In the spillover literature, absorptive capacity is known to be an important determinant of the effect of knowledge, which can in turn be measured using, for example, openness, trade flows, human capital, and various development indices (see, for example, Fracasso and Marzetti, 2014; Fracasso and Marzetti, 2015). Baxter and Kouparitsas (2004) and Imbs (2004) study the determinants of business cycle comovements. They conclude that trade is the most important determinant of cross-country business cycle linkages. Trade is an important determinant of cross-country linkages also in financial markets (see, for example, Forbes and Chinn, 2004; Dees et al., 2007), although when modelling returns, it is standard practice to use asset-specific characteristics ("fundamentals") like industry classification, market capitalization, and style classification as observable loadings, or "betas" (Rosenberg, 1974).
As the above discussion illustrates, in many applications there are natural combination
candidates that can go into zi. Deterministic combinations are particularly simple to come
by. Specifically, as Chudik et al. (2011) show, the cross-section average can be quite effective
in mopping up cross-section dependence. A vector of ones is therefore a good starting point.
Tutz and Binder (2007) consider the problem of boosting ridge regression, separating between "must have" candidates and other variables. The cross-section average can therefore be thought of as a "must have" combination. This special treatment of the cross-section average highlights the role of C3E as an extension of, rather than an alternative to, original CCE. Other readily available combination candidates are preliminary consistent estimates of (the space spanned by) Ci. The only requirement is that the rate of consistency is at least √N, which is weak enough to enable estimation by principal components (see Bai, 2003).
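For instance, a preliminary principal components step on the outcome variable alone can supply stochastic combination candidates. The sketch below is illustrative only: applying PC to y by itself, and treating the number of components k as given, are choices of the example rather than prescriptions of the paper.

```python
import numpy as np

def pc_loading_candidates(Y, k):
    """Preliminary loading estimates by principal components (in the spirit
    of Bai, 2003), usable as stochastic combination candidates z_i.
    Y is N x T; returns an N x k matrix of candidate weights."""
    N, T = Y.shape
    # loadings = sqrt(N) times the leading eigenvectors of Y Y' / (N T)
    evals, evecs = np.linalg.eigh(Y @ Y.T / (N * T))
    top = np.argsort(evals)[::-1][:k]
    return np.sqrt(N) * evecs[:, top]
```

The columns estimate the loading space only up to rotation and sign, which is harmless here: C3E only needs the combinations to be (highly) correlated with the loadings.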
4.2 An IC-based selection procedure
An advantage of using deterministic combinations and/or preliminary loading estimates is that they are (asymptotically) uncorrelated with $u_{i,t}$. (This advantage of deterministic instruments has been pointed out before by Phillips and Hansen (1990) in the context of IV estimation of cointegrated time series regressions.) However, in practice there is no guarantee that the combinations are valid, and if some of the combinations are stochastic there is also likely to be uncertainty regarding the correlation with $u_{i,t}$. In this section, we propose a selection criterion for the combination vector, $z_i$. In so doing, it is convenient, albeit not necessary (see the discussion that follows Corollary 2 below), to assume that the valid combinations, henceforth denoted $z_{0,i}$, are ordered first (see, for example, Zheng and Loh, 1995; Zheng and Loh, 1997; Donald and Newey, 2001, for similar assumptions), and that their number is given by $k_0$. In fact, analogous to IV selection, it is useful to treat also the combinations within $z_{0,i}$ as ordered, but then according to their correlation with $C_i$; the first combination in $z_{0,i}$ has the highest correlation. The invalid, or "nuisance", combinations, henceforth denoted $z_{1,i}$, are ordered last, implying that $z_i$ can be partitioned as $z_i = (z_{0,i}', z_{1,i}')'$.
Assumption 3 summarizes the restrictions imposed on this vector. Here and throughout the rest of this section, $Z_{p,i} = (I_{m+1} \otimes z_{p,i}')$, $H_p = N^{-1}\sum_{i=1}^N Z_{p,i}'C_i'$, $\phi_{p,t} = N^{-1}\sum_{i=1}^N Z_{p,i}'u_{i,t}$ and $\phi_{p,i,t} = N^{-1}\sum_{j\neq i}^N Z_{p,j}'u_{j,t}$, where $p \in \{0, 1\}$.
Assumption 3.
(i) z0,i is such that Assumption 2 is satisfied with r ≤ k0(m + 1), and Zi and H replaced
by Z0,i and H0, respectively.
(ii) $z_{1,i}$ violates Assumption 2 in such a way that $E(\|N^{-1/2}\phi_{1,t}\|^2) \leq M$, $E(\|T^{-1}\sum_{t=1}^T u_{i,t}u_{i,t}'Z_{1,i}\|^2) \leq M$, $E(\|N^{-1}T^{-1}\sum_{i=1}^N\sum_{t=1}^T Z_{1,i}'u_{i,t}u_{i,t}'Z_{1,i}\|^2) \leq M$, $E(\|N^{-3/2}T^{-1/2}\sum_{i=1}^N\sum_{t=1}^T u_{i,t}\phi_{1,i,t}'\|^2) \leq M$, and $E(\|N^{-3/2}T^{-1/2}\sum_{i=1}^N\sum_{t=1}^T Z_i'u_{i,t}\phi_{1,i,t}'\|^2) \leq M$.
Remark 4. According to Assumption 3, while the valid combinations in $z_{0,i}$ satisfy Assumption 2 (which means that they are at most weakly correlated with $u_{i,t}$), the nuisance combinations in $z_{1,i}$ do not. The types of violation that can be permitted are characterized by Assumption 3 (ii), which requires that $z_{1,i}$ is at most strongly correlated with $u_{i,t}$.
The IC considered in the present paper can be seen as a multivariate version of the IC of Bai and Ng (2002), and is given by
$$\mathrm{IC}(s) = \ln \det V(\hat{F}^s) + s \cdot g, \quad (29)$$
where $V(A) = (NT)^{-1}\sum_{i=1}^N W_i'M_A W_i$ for any $T$-rowed matrix $A$, $\hat{F}^s$ is $\hat{F}$ based on $s$ combinations, and $g$ is a penalty term. The associated IC estimator $\hat{r}$ of $r$ is given simply by
$$\hat{r} = \arg\min_{s=0,\dots,s_{\max}} \mathrm{IC}(s), \quad (30)$$
where $s_{\max} \geq r$.
Proposition 1. Under Assumptions 1 and 3, if $g \to 0$ and $\min\{N, \sqrt{T}\} \cdot g \to \infty$, as $N, T \to \infty$,
$$P(\hat{r} = r) \to 1.$$
Define $\hat{k}_0 = \hat{r}/(m+1)$. Since $\hat{r}$ is consistent for $r$, $\hat{k}_0$ is consistent for $r/(m+1)$, which may or may not be equal to $k_0$. Indeed, since an additional combination increases the dimension of $z_i$ by $(m+1)$ and not by one, $\hat{k}_0$ is consistent for $k_0$ only if $r$ is a scalar multiple of $(m+1)$ (see Smeekes, 2015, for a detailed discussion in the context of subpanel selection). If $r$ is not a scalar multiple of $(m+1)$, then $\hat{k}_0$ estimates the minimal number of combinations required to approximate the underlying factor structure. Hence, while strictly speaking we only require $r \leq k_0(m+1)$, for ease of interpretation it is convenient to think of $r$ as being equal to $k_0(m+1)$.
In order to appreciate the implications of Proposition 1 it is convenient to treat $\hat\beta$ as a function of $\hat{r}$ (or $\hat{k}_0$). Let us therefore write $\hat\beta_{C3E}^{\hat{r}}$ for $\hat\beta_{C3E}$. Clearly,
$$P[\sqrt{NT}(\hat\beta_{C3E}^{\hat{r}} - \beta) \leq \delta] = P[\sqrt{NT}(\hat\beta_{C3E}^{\hat{r}} - \beta) \leq \delta \,|\, \hat{r} = r]P(\hat{r} = r) + P[\sqrt{NT}(\hat\beta_{C3E}^{\hat{r}} - \beta) \leq \delta \,|\, \hat{r} \neq r]P(\hat{r} \neq r),$$
where $\delta > 0$. Because $P(\hat{r} = r) \to 1$ and $P(\hat{r} \neq r) \to 0$ by Proposition 1, while the first term on the right-hand side converges to $P[\sqrt{NT}(\hat\beta_{C3E}^{\hat{r}} - \beta) \leq \delta \,|\, \hat{r} = r] = P[\sqrt{NT}(\hat\beta_{C3E}^{r} - \beta) \leq \delta]$, the second term converges to zero. It follows that
$$|P[\sqrt{NT}(\hat\beta_{C3E}^{\hat{r}} - \beta) \leq \delta] - P[\sqrt{NT}(\hat\beta_{C3E}^{r} - \beta) \leq \delta]| \to 0, \quad (31)$$
implying that Theorem 1 is unaffected by the estimation of $r$.
Interestingly, if all the instruments under consideration are valid, the requirement on the rate of expansion of the penalty can be relaxed, from $\min\{N, \sqrt{T}\} \cdot g \to \infty$ to $\min\{\sqrt{N}, \sqrt{T}\}^2 \cdot g \to \infty$.
Corollary 2. Suppose that $z_i = z_{0,i}$. Under Assumptions 1 and 3, if $g \to 0$ and $\min\{\sqrt{N}, \sqrt{T}\}^2 \cdot g \to \infty$, as $N, T \to \infty$,
$$P(\hat{r} = r) \to 1.$$
Bai and Ng (2002) propose several ICs that are appropriate in the context of principal components estimation of common factor models. The Corollary 2 requirement that $g \to 0$ and $\min\{\sqrt{N}, \sqrt{T}\}^2 \cdot g \to \infty$ is the same as in their paper. The Proposition 1 requirement, on the other hand, is, as already mentioned, stronger. Note in particular that if $T/N \to \tau \leq M$, then Corollary 2 requires that $T \cdot g \to \infty$, which is obviously implied by the Proposition 1 requirement that $\sqrt{T} \cdot g \to \infty$. The stricter condition in Proposition 1 is due to the presence of the invalid combination candidates, and implies that such candidates will not be selected by the procedure. As usual, the penalty $g$ is not unique and has to be set by the researcher. Let $C$ be either $\min\{\sqrt{N}, \sqrt{T}\}^2$ or $\min\{N, \sqrt{T}\}$. Bai and Ng (2002) set $g = O(C^{-1}\ln(C)) \geq 0$, such that $g \to 0$ and $C \cdot g = O(\ln(C)) \to \infty$. Hence, if $C = \min\{\sqrt{N}, \sqrt{T}\}^2$, then $g \to 0$ but $\min\{N, \sqrt{T}\} \cdot g = O(\min\{N, \sqrt{T}\} \cdot C^{-1}\ln(C))$ need not go to infinity. Hence, under the conditions of Proposition 1 the penalty implied by Corollary 2 is too small. In essence, to be able to root out those combinations that are correlated with $u_{i,t}$ the penalty has to be higher. According to Proposition 1, $C = \min\{N, \sqrt{T}\}$ is enough. In this paper we therefore set
$$g = (m+1)\frac{\ln(\min\{N, \sqrt{T}\})}{\min\{N, \sqrt{T}\}}, \quad (32)$$
where the term $(m+1)$ is there to account for the dimension of $V(\hat{F}^s)$.
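To fix ideas, the selection rule in (29)–(32) can be sketched in a few lines. This is an illustrative implementation under our own naming (`ic_select`, `W_list`, `F_candidates`), not the authors' code:

```python
import numpy as np

def ic_select(W_list, F_candidates, m, s_max):
    """IC-based choice of the number of combinations, eqs. (29)-(30).

    W_list       : list of N arrays, each T x (m + 1), the observables per unit.
    F_candidates : F_candidates[s] is the T x s(m+1) factor proxy built from the
                   first s (ordered) combinations; entry 0 is None (no factors).
    Returns the s minimising ln det V(F^s) + s * g, with the penalty in eq. (32).
    """
    N = len(W_list)
    T = W_list[0].shape[0]
    C = min(N, np.sqrt(T))
    g = (m + 1) * np.log(C) / C                 # penalty, eq. (32)

    def V(F):                                   # V(F) = (NT)^-1 sum_i W_i' M_F W_i
        if F is None:
            M = np.eye(T)
        else:
            M = np.eye(T) - F @ np.linalg.pinv(F.T @ F) @ F.T
        return sum(Wi.T @ M @ Wi for Wi in W_list) / (N * T)

    ics = [np.log(np.linalg.det(V(F_candidates[s]))) + s * g
           for s in range(s_max + 1)]
    return int(np.argmin(ics))
```

With a strong common factor, adding the first valid combination reduces $\ln\det V$ by far more than the penalty $g$, so the criterion picks it up.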
Remark 5. As already alluded to in Section 2, the presence of correlation between $Z_i$ and $u_{i,t}$ affects the rate of consistency of $\hat{F}$. For $\sqrt{NT}(\hat\beta_{C3E} - \beta)$ to have its stated asymptotic distribution, it is essential that $\hat{F}$ is $\sqrt{N}$-consistent, which will only be the case if $Z_i$ and $u_{i,t}$ are at most weakly correlated. However, the IC considered here only requires that $T^{-1}\|(\hat{F} - FH')'(\hat{F} - FH')\| = o_p(1)$, which does not require that $Z_i$ and $u_{i,t}$ are at most weakly correlated (see Bai and Ng, 2002, page 198, for a similar discussion in the case of principal components estimation).
As alluded to in the above, it is not necessary for the candidates to be pre-ordered. If there is no natural ordering, then one possibility is to simply use an all-subset grid search, which is feasible in applications where $k$ is a relatively small number. If $k$ is larger, then we recommend following, for example, Zheng and Loh (1995) and Zheng and Loh (1997), and ordering the candidates according to an estimate of their correlation with $C_i$. This can be done by taking $\bar{W}_i = T^{-1}\sum_{t=1}^T w_{i,t}$ as an estimator for the space spanned by $C_i$. The logic behind this approach is that $\bar{W}_i = T^{-1}\sum_{t=1}^T w_{i,t} = C_i'\bar{f} + \bar{u}_i = C_i'\bar{f} + o_p(1)$. This gives $m+1$ correlations for each combination in $z_i$, which can be combined by taking, for example, the average. This is the approach used in the Monte Carlo experiments of Section 5.
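The ordering step just described can be sketched as follows; the function name and interface are our own illustration, assuming the time averages per unit have already been computed:

```python
import numpy as np

def order_candidates(W_bar, Z):
    """Order combination candidates by cross-section correlation with W_bar.

    W_bar : (N, m + 1) array of time averages T^{-1} sum_t w_{i,t} per unit.
    Z     : (N, k) array of candidate combination weights z_i.
    Returns column indices of Z sorted by the average absolute cross-section
    correlation with the m + 1 columns of W_bar, highest first.
    """
    # Correlation matrix of [W_bar, Z] across units; rowvar=False treats
    # columns as variables and rows (units) as observations
    corr = np.corrcoef(np.hstack([W_bar, Z]), rowvar=False)
    mplus1 = W_bar.shape[1]
    avg_abs = np.abs(corr[:mplus1, mplus1:]).mean(axis=0)  # average the m+1 correlations
    return np.argsort(-avg_abs)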
5 Monte Carlo results
In this section we evaluate the small-sample properties of the C3E estimator. The DGP we use for this purpose can be seen as a restricted version of (1)–(4), and sets $m = 1$ and $(f_t', \eta_{i,t}, \varepsilon_{i,t})' \sim N(0_{(r+2)\times 1}, I_{r+2})$. The difference between the experiments considered lies in how we generate $\lambda_i$ and $\Lambda_i$. Six experiments, denoted by E1–E6, are considered. In E1–E2 and E6, the condition in (6) is satisfied, whereas in E3–E5, the condition is violated. In E3, $\lambda_i$ and $\Lambda_i$ are iid and independent of each other, as required in original CCE, whereas in E4 and E5, $\lambda_i$ and $\Lambda_i$ are non-iid. Exactly how $y_{i,t}$, $x_{i,t}$, $\lambda_i$, $\Lambda_i$ and $z_i$ are generated is described in Table A. For each experiment, 20 $(N, T)$ pairs are considered. In the first 16, $N, T \in \{30, 50, 100, 200\}$, whereas in the last four, $N = \lfloor T^{4/3} \rfloor$. The motivation behind the last four pairs is to assess the performance when $T/N \to 0$.
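A minimal sketch of a DGP of this kind may be useful. Since Table A is not reproduced here, the loading design below is hypothetical (iid $N(1, 1)$ draws); the factor and error draws follow the restriction stated above, and the model equations follow the representation $W_i = FC_i + U_i$ of the appendix:

```python
import numpy as np

def simulate_dgp(N, T, beta=1.0, r=2, seed=0):
    """Simulate a restricted DGP with m = 1, as a hypothetical illustration.

    (f_t', eta_it, eps_it)' ~ N(0, I_{r+2}) as in Section 5; lambda_i and
    Lambda_i are drawn iid N(1, 1) purely for illustration (Table A differs).
    Returns (Y, X, F), each with T rows.
    """
    rng = np.random.default_rng(seed)
    F = rng.standard_normal((T, r))             # factors f_t
    lam = rng.normal(1.0, 1.0, (N, r))          # lambda_i (hypothetical design)
    Lam = rng.normal(1.0, 1.0, (N, r))          # Lambda_i (hypothetical, m = 1)
    eta = rng.standard_normal((T, N))
    eps = rng.standard_normal((T, N))
    X = F @ Lam.T + eta                         # x_it = Lambda_i' f_t + eta_it
    Y = X * beta + F @ lam.T + eps              # y_it = x_it beta + lambda_i' f_t + eps_it
    return Y, X, F
```

Correlation between the rows of `lam` and `Lam` (absent here) is what distinguishes experiments such as E4 from E3.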
The performance of C3E is compared with that of the naive LS estimator that ignores the cross-section dependence altogether, the principal components (PC) estimator of Bai (2009) and the original CCEP estimator of Pesaran (2006). Three versions of the pooled C3E estimator are considered, which differ only in the choice of combinations. For each experiment, there is a maximum of six combinations to choose from. Specifically, while $z_{1i} = 1$, $z_{2i}$, $z_{3i}$, $z_{4i}$, $z_{5i}$ and $z_{6i}$ are drawn from $N(0.5, 1)$, $N(-0.4, 1)$, $N(0.2, 1)$, $N(0.5, 1)$ and $N(0.1, 1)$, respectively (see Table A). The first estimator, denoted "C3E1", is based on taking only those $k = r/2$ combinations that are most correlated with $C_i$ in the DGP. Thus, if $r = 2$, then C3E1 is based on taking the single most correlated combination. Note that this estimator is infeasible in the sense that it presumes knowledge of both $r$ and the correlation of the combinations with $C_i$. The second estimator, denoted "C3E2", uses the IC discussed in Section 4, which is applied after first ordering the combinations according to their cross-section correlation with $\bar{W}_i$, as suggested in Section 4. The third and final estimator, denoted "C3E3", is the same as C3E2 except that $z_{1i} = 1$ is always included as a "must have" combination, following the recommendation of Section 4.
Two sets of results are reported, both of which are based on making 5,000 draws from the DGPs described in Table A. The first set includes the bias of each estimator, and the size of a nominal 5% level t-test. These results are reported in Tables E1–E6, which are conveniently labelled according to the particular experiment to which they refer. The second set of results contains the frequency counts for the selected number of combinations used by C3E2 and C3E3. These results are reported in Table B. The conclusions that are drawn from all seven tables may be summarized as follows.
E1. The aim of this experiment is to compare the performance of original CCE and C3E when all the conditions required for both methods are met. Under these conditions both CCE and C3E should perform equally well, which is also reflected in Table E1. The relatively poor performance of PC is also partly expected, given the findings of Westerlund and Urbain (2015). We also see that the performance of C3E2 and C3E3 is very similar to that of C3E1, which means that the selection of the candidates is not detrimental for performance. This is confirmed by Table B, which shows that the IC procedure does a good job in selecting the number of candidates. In fact, in the case of C3E2 the correct selection frequency is one in all cases considered. C3E3 tends to include too many combinations, which is only natural given the requirement to always include a vector of ones. However, we also see that this tendency to overselect decreases with increasing sample sizes.
E2. In this experiment, the loadings are generated independently of the combinations. Both loadings and combinations still have non-zero means, though, which means that Assumption 2 is satisfied. However, since for most of the instruments the elements of $H$ are now smaller (in absolute value) than in E1, the performance under E2 is still expected to be worse than under E1, and this is also what we see when looking across Tables E1 and E2.
E3. This experiment is conducted to compare the performance of CCEP and C3E when (6) is not satisfied. Specifically, since in this case
$$E(C_i) = \begin{bmatrix} 1.3 & -0.4 \\ 2.6 & -0.8 \end{bmatrix},$$
we have $\operatorname{rk} E(C_i) = 1 < m + 1 = 2$. However, since the factor loadings are independent, as explained in Section 3.1, the CCEP estimator is still expected to work. The results reported in Table E3 reveal that while decreasing in $N$, the bias of the CCEP estimator is roughly constant in $T$, as are the size distortions. Performance is still acceptable, though, which in view of the independence of the loadings is in accordance with our expectations. However, the best performance is generally obtained by using C3E, which reflects its relatively high rate of consistency in this case.
E4. In this experiment,
$$E(C_i) = \begin{bmatrix} 1 & 1 \\ -0.25 & -0.25 \end{bmatrix},$$
and therefore (6) is violated. However, in contrast to E3, now $\lambda_i$ and $\Lambda_i$ are correlated. As expected, this makes CCEP break down. However, since the combinations are still correlated with the loadings, C3E continues to perform well, as does the IC-based selection procedure.
E5. In this experiment, (6) is again violated. However, this time the violation is due to the presence of too many factors; $m + 1 = 2 < r = 4$. As expected, CCE breaks down. Interestingly, the effect of this break-down is even more pronounced than in E4. Both bias and size distortion now increase with the sample size, making the LS problems seem relatively mild in comparison. By contrast, C3E continues to do well in terms of bias and size accuracy. One difference is that the tendency of C3E3 to select too many combinations is now even more pronounced than before. However, this does not seem to have too much of an effect on the overall performance of this estimator.
E6. The aim of this experiment is to evaluate the performance of the bias-adjustment procedure proposed in Section 3.3. The results reported in Table E6 suggest that bias-adjustment leads to a considerable improvement for all estimators considered, including CCEP, although C3E tends to perform best.
6 Conclusion
This paper considers the problem of consistent estimation of a factor-augmented panel regression model in which the number of factors, $r$, is potentially larger than the number of observables, $m + 1$. The estimator that we propose can be viewed as an extension of the CCEP estimator of Pesaran (2006), which is based on using the cross-section averages of the observables as proxies for the latent factors. While CCEP does allow $r > m + 1$, it does so at a cost. In particular, it is required that the factor loadings are independently distributed, which in most cases of practical relevance is likely to be violated. But even if the assumption is in fact satisfied, violations of $m + 1 \geq r$ are still costly. This is particularly true in the homogeneous slope case, in which a violation causes a reduction in the rate of consistency, from the usual $\sqrt{NT}$-rate to $\sqrt{N}$. In this paper we take this feature of CCE as our starting point. The purpose is to provide a simple extension that preserves $\sqrt{NT}$-consistency without for that matter requiring independent loadings.
The idea behind the proposed C3E approach is to use not only the cross-section average but also other (cross-section) combinations of the observables. By taking $k \geq 1$ such combinations we can allow $k(m + 1) \geq m + 1$ common factors without for that matter requiring independent loadings. In the analysis of the properties of the resulting pooled C3E estimator we focus on the standard assumption of a common slope coefficient, although we also consider the case when the slopes have a random distribution across the cross-section. We show that the estimator is $\sqrt{NT}$-consistent and asymptotically normal under the condition that $T/N \to \tau < \infty$. This condition is more general than the $T/N \to 0$ condition of Pesaran (2006), and its relaxation is shown to have important consequences. In particular, it is shown that the estimator is biased whenever $\tau > 0$. As a response to this, a bias-adjusted C3E estimator is proposed, which is shown to support asymptotically normal and bias-free inference under $T/N \to \tau < \infty$. This is true if the combinations are known. If there is uncertainty over which combinations to use, an IC can be used to select the appropriate combinations.
The small-sample performance of the C3E estimator is examined through a series of Monte Carlo experiments. The results suggest that whenever the assumptions of Pesaran (2006) are satisfied, the performance of the CCE and C3E estimators is comparable. If, however, the assumptions are not met, then the C3E estimator continues to work well, while the CCE estimator breaks down. We also find that the proposed bias-adjustment and IC-based combination selection procedures seem to work well, leading to estimators with good small-sample properties.
References
Amengual, D. and M. W. Watson (2006). Consistent estimation of the number of dynamic
factors in a large N and T panel, detailed appendix. Technical report, Mimeo, May.
Amengual, D. and M. W. Watson (2007). Consistent estimation of the number of dynamic
factors in a large N and T panel. Journal of Business & Economic Statistics 25, 91–96.
Bai, J. (2003). Inferential theory for factor models of large dimensions. Econometrica 71, 135–
171.
Bai, J. (2009). Panel data models with interactive fixed effects. Econometrica 77, 1229–1279.
Bai, J. and S. Ng (2002). Determining the number of factors in approximate factor models.
Econometrica 70, 191–221.
Bai, J. and S. Ng (2006). Determining the number of factors in approximate factor models,
errata. Technical report, Mimeo, May.
Bai, J. and S. Ng (2010). Instrumental variable estimation in a data rich environment. Econo-
metric Theory 26, 1577–1606.
Baxter, M. and M. A. Kouparitsas (2004). Determinants of business cycle comovement: A
robust analysis. Technical report, Working Paper W10725, National Bureau of Economic
Research.
Chudik, A., H. M. Pesaran, and E. Tosetti (2011). Weak and strong cross-section dependence
and estimation of large panels. Econometrics Journal 14, C45–C90.
Chudik, A. and M. H. Pesaran (2013a). Common correlated effects estimation of heteroge-
neous dynamic panel data models with weakly exogenous regressors. Technical report,
CESifo Working Paper.
Chudik, A. and M. H. Pesaran (2013b). Large panel data models with cross-sectional depen-
dence: a survey. Technical report, CESifo Working Paper.
Chudik, A. and R. Straub (2011). Size, openness, and macroeconomic interdependence. Tech-
nical report, Globalization and Monetary Policy Institute Working Paper 103.
Dees, S., F. di Mauro, H. M. Pesaran, and V. L. Smith (2007). Exploring the international linkages of the euro area: A global VAR analysis. Journal of Applied Econometrics 22, 1–38.
Donald, S. G. and W. K. Newey (2001). Choosing the number of instruments. Econometrica 69,
1161–1191.
Eberhardt, M., C. Helmers, and H. Strauss (2013). Do spillovers matter when estimating
private returns to R&D? Review of Economics and Statistics 95, 436–448.
Eberhardt, M. and F. Teal (2011). Econometrics for grumblers: A new look at the literature
on cross-country growth empirics. Journal of Economic Surveys 25, 109–155.
Forbes, K. J. and M. D. Chinn (2004). A decomposition of global linkages in financial markets
over time. The Review of Economics and Statistics 86, 705–722.
Fracasso, A. and G. V. Marzetti (2014). International R&D spillovers, absorptive capacity and
relative backwardness: A panel smooth transition regression model. International Economic
Journal 28, 137–160.
Fracasso, A. and G. V. Marzetti (2015). International trade and R&D spillovers. Journal of
International Economics 96, 138–149.
Goncalves, S. and B. Perron (2014). Bootstrapping factor-augmented regression models. Jour-
nal of Econometrics 182, 156–173.
Greenaway-McGrevy, R., C. Han, and D. Sul (2012). Asymptotic distribution of factor aug-
mented estimators for panel regression. Journal of Econometrics 169, 48–53.
Imbs, J. (2004). Trade, finance, specialization and synchronization. The Review of Economics
and Statistics 84, 723–734.
Kapetanios, G., M. H. Pesaran, and T. Yamagata (2011). Panels with nonstationary multifac-
tor error structures. Journal of Econometrics 160, 326–348.
Mastromarco, C., L. Serlenga, and Y. Shin (2015). Modelling technical efficiency in cross sec-
tionally dependent stochastic frontier panels. Journal of Applied Econometrics, forthcoming.
Newey, W. K. and K. D. West (1987). Hypothesis testing with efficient method of moments
estimation. International Economic Review, 777–787.
Paulsen, J. (1984). Order determination of multivariate autoregressive time series with unit
roots. Journal of Time Series Analysis 5, 115–127.
Pesaran, M. H. (2006). Estimation and inference in large heterogeneous panels with a multifactor error structure. Econometrica 74, 967–1012.
Pesaran, M. H., L. Vanessa Smith, and T. Yamagata (2013). Panel unit root tests in the pres-
ence of a multifactor error structure. Journal of Econometrics 175, 94–115.
Phillips, P. C. B. and B. Hansen (1990). Statistical inference in instrumental variables regres-
sion with I(1) variables. Review of Economic Studies 57, 99–125.
Reese, S. and J. Westerlund (2015a). Estimation of factor-augmented panel regressions with
weakly influential factors. Econometric Reviews, forthcoming.
Reese, S. and J. Westerlund (2015b). PANICCA – PANIC on cross-section averages. Journal of Applied Econometrics, forthcoming.
Rosenberg, B. (1974). Extra-market components of covariance in security returns. Journal of
Financial and Quantitative Analysis 9, 263–274.
Smeekes, S. (2015). Bootstrap sequential tests to determine the order of integration of indi-
vidual units in a time series panel. Journal of Time Series Analysis 36, 398–415.
Stock, J. and M. W. Watson (1998). Diffusion indexes. Technical report, Working Paper 6702,
National Bureau of Economic Research.
Tutz, G. and H. Binder (2007). Boosting ridge regression. Computational Statistics and Data
Analysis 51, 6044–6059.
Westerlund, J. and J.-P. Urbain (2013). On the estimation and inference in factor-augmented
panel regressions with correlated loadings. Economics Letters 119(3), 247–250.
Westerlund, J. and J.-P. Urbain (2015). Cross-sectional averages versus principal components.
Journal of Econometrics 185, 372–377.
Zheng, X. and W.-Y. Loh (1995). Consistent variable selection in linear models. Journal of the
American Statistical Association 90, 151–156.
Zheng, X. and W.-Y. Loh (1997). A consistent variable selection criterion for linear models
with high-dimensional covariates. Statistica Sinica 7, 311–325.
Appendix: Proofs
We start with some notation. The model for $w_{i,t} = (y_{i,t}, x_{i,t}')'$ can be written in matrix notation as
$$W_i = FC_i + U_i, \quad (A1)$$
where $W_i = (w_{i,1}, \dots, w_{i,T})'$ is $T \times (m+1)$, $F = (f_1, \dots, f_T)'$ is $T \times r$, $C_i = (\Lambda_i'\beta + \lambda_i, \Lambda_i')$ is $r \times (m+1)$ and $U_i = (u_{i,1}, \dots, u_{i,T})' = (\eta_i\beta + \varepsilon_i, \eta_i)$ is $T \times (m+1)$. Alternatively, the model for $w_{i,t}$ can be written as the following $N$-dimensional system:
$$w_t = Cf_t + u_t, \quad (A2)$$
where $w_t = (w_{1,t}', \dots, w_{N,t}')'$ and $u_t = (u_{1,t}', \dots, u_{N,t}')'$ are $N(m+1) \times 1$, and $C = (C_1, \dots, C_N)'$ is $N(m+1) \times r$. The matrix notation
$$W = FC' + U \quad (A3)$$
will also be used, where $W = (W_1, \dots, W_N)$ and $U = (U_1, \dots, U_N)$ are $T \times N(m+1)$. In what follows the representations in (A1)–(A3) will be used interchangeably.
Many of the results can be expressed in terms of $(\hat{F} - FH')$. Let us therefore define
$$D = \hat{F} - FH' = \frac{1}{N}\sum_{i=1}^N U_iZ_i, \quad (A4)$$
whose dimension is given by $T \times (m+1)k$. It is further convenient to write $D = (d_1, \dots, d_T)'$, where
$$d_t = \hat{f}_t - Hf_t = \frac{1}{N}\sum_{i=1}^N Z_i'u_{i,t} \quad (A5)$$
is $(m+1)k \times 1$.
Before we come to the proof of Theorem 1 we state some useful lemmas.

Lemma A.1. Under Assumption 2,
$$\frac{1}{T}\sum_{t=1}^T\|d_t\|^2 = O_p(N^{-1}).$$

Proof of Lemma A.1.
The proof of Lemma A.1 is a simple consequence of the fact that $\|N^{-1/2}\sum_{i=1}^N Z_i'u_{i,t}\| = \|\phi_t\| = O_p(1)$, by Assumption 2 (iii), as seen by using (A5) and writing
$$\frac{1}{T}\sum_{t=1}^T\|d_t\|^2 \leq \frac{1}{NT}\sum_{t=1}^T\left\|\frac{1}{\sqrt{N}}\sum_{i=1}^N Z_i'u_{i,t}\right\|^2 = O_p(N^{-1}),$$
where the triangle inequality is used to obtain the first inequality. $\square$
Lemma A.2. Under Assumptions 1 and 2,
$$\|\sqrt{N}T^{-1/2}F'D\| = O_p(1).$$

Proof of Lemma A.2.
Since, by using (A5),
$$\sqrt{N}T^{-1/2}F'D = \frac{1}{\sqrt{NT}}\sum_{i=1}^N\sum_{t=1}^T f_tu_{i,t}'Z_i = \frac{1}{\sqrt{T}}\sum_{t=1}^T f_t\frac{1}{\sqrt{N}}\sum_{i=1}^N u_{i,t}'Z_i = \frac{1}{\sqrt{T}}\sum_{t=1}^T f_t\phi_t', \quad (A6)$$
the proof is an immediate consequence of Assumption 1 (vii) and Assumption 2 (iii). $\square$
Lemma A.3. Under the conditions of Lemma A.1 and as $N, T \to \infty$,
$$NT^{-1}D'D = \Sigma_{Zu} + o_p(1).$$

Proof of Lemma A.3.
By substituting (A5),
$$NT^{-1}D'D = \frac{N}{T}\sum_{t=1}^T d_td_t' = \frac{1}{NT}\sum_{t=1}^T\sum_{i=1}^N\sum_{j=1}^N Z_i'u_{i,t}u_{j,t}'Z_j = \frac{1}{NT}\sum_{t=1}^T\sum_{i=1}^N Z_i'u_{i,t}u_{i,t}'Z_i + \frac{1}{NT}\sum_{t=1}^T\sum_{i=1}^N\sum_{j\neq i}^N Z_i'u_{i,t}u_{j,t}'Z_j = \Sigma_{Zu} + O_p(T^{-1/2}), \quad (A7)$$
where the last equality follows from Assumption 2 (iii). $\square$
Lemma A.4. Under Assumptions 1 and 2 and $n = r$, as $N, T \to \infty$ with $T/N \to \tau > 0$,
$$\frac{1}{\sqrt{NT}}\sum_{i=1}^N \eta_i'D(H')^{-1}\lambda_i = \sqrt{\tau}\,\frac{1}{N}\sum_{i=1}^N \Sigma_{\eta,i}(\beta, I_m)Z_i(H')^{-1}\lambda_i + o_p(1).$$

Proof of Lemma A.4.
By using (A5), we write
$$\frac{1}{T}\sum_{i=1}^N \eta_i'D(H')^{-1}\lambda_i = \frac{1}{T}\sum_{i=1}^N\sum_{t=1}^T \eta_{i,t}d_t'(H')^{-1}\lambda_i = \frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T\sum_{j=1}^N \eta_{i,t}u_{j,t}'Z_j(H')^{-1}\lambda_i$$
$$= \frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T \eta_{i,t}u_{i,t}'Z_i(H')^{-1}\lambda_i + \frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T\sum_{j\neq i}^N \eta_{i,t}u_{j,t}'Z_j(H')^{-1}\lambda_i = \frac{1}{N}\sum_{i=1}^N \Sigma_{\eta,i}(\beta, I_m)Z_i(H')^{-1}\lambda_i + O_p(T^{-1/2}), \quad (A8)$$
where the last equality is obtained by using Assumption 2 (iii) and the fact that $E(\eta_{i,t}u_{i,t}') = E[\eta_{i,t}(\varepsilon_{i,t} + \eta_{i,t}'\beta, \eta_{i,t}')] = (\Sigma_{\eta,i}\beta, \Sigma_{\eta,i}) = \Sigma_{\eta,i}(\beta, I_m)$, which is implied by Assumption 1. The result in the lemma is obtained by multiplying both sides by $\sqrt{T}/\sqrt{N}$. $\square$
Lemma A.5. Under the conditions of Lemma A.4,
$$\frac{1}{\sqrt{NT}}\sum_{i=1}^N \varepsilon_i'D(H^{-1})'\Lambda_i' = \sqrt{\tau}\,\frac{1}{N}\sum_{i=1}^N \sigma_{\varepsilon,i}^2(1, 0_{1\times m})Z_i(H^{-1})'\Lambda_i' + o_p(1).$$

Proof of Lemma A.5.
By using (A5), we write
$$\frac{1}{T}\sum_{i=1}^N \varepsilon_i'D(H^{-1})'\Lambda_i' = \frac{1}{NT}\sum_{i=1}^N\sum_{j=1}^N\sum_{t=1}^T \varepsilon_{i,t}u_{j,t}'Z_j(H^{-1})'\Lambda_i'$$
$$= \frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T \varepsilon_{i,t}u_{i,t}'Z_i(H^{-1})'\Lambda_i' + \frac{1}{NT}\sum_{i=1}^N\sum_{j\neq i}^N\sum_{t=1}^T \varepsilon_{i,t}u_{j,t}'Z_j(H^{-1})'\Lambda_i' = \frac{1}{N}\sum_{i=1}^N \sigma_{\varepsilon,i}^2(1, 0_{1\times m})Z_i(H^{-1})'\Lambda_i' + O_p(T^{-1/2}), \quad (A9)$$
where the last equality is implied by Assumption 2 (iii) and the fact that $E(\varepsilon_{i,t}u_{i,t}') = E[\varepsilon_{i,t}(\varepsilon_{i,t} + \eta_{i,t}'\beta, \eta_{i,t}')] = (\sigma_{\varepsilon,i}^2, 0_{1\times m})$, which is implied by Assumption 1. Then, multiplying both sides by $\sqrt{T}/\sqrt{N}$ yields the required result. $\square$
Proof of Theorem 1.
Since $\operatorname{rk} H = r$ and $n = r$, $H$ is $r \times r$ and nonsingular. The equation for $y_i$ can therefore be written as
$$y_i = X_i\beta + \hat{F}(H')^{-1}\lambda_i - D(H')^{-1}\lambda_i + \varepsilon_i, \quad (A10)$$
where $D = \hat{F} - FH'$ is as defined in the introduction of this appendix. The C3E estimator of $\beta$ is given by
$$\hat\beta_{C3E} = \left(\sum_{i=1}^N X_i'M_{\hat{F}}X_i\right)^{-1}\sum_{i=1}^N X_i'M_{\hat{F}}y_i.$$
By substituting for $y_i$ using (A10), we obtain the following expression for $\sqrt{NT}(\hat\beta_{C3E} - \beta)$:
$$\sqrt{NT}(\hat\beta_{C3E} - \beta) = \left(\frac{1}{NT}\sum_{i=1}^N X_i'M_{\hat{F}}X_i\right)^{-1}\frac{1}{\sqrt{NT}}\sum_{i=1}^N X_i'M_{\hat{F}}(\varepsilon_i - D(H')^{-1}\lambda_i). \quad (A11)$$
We begin by considering the second term in the numerator. Clearly, $M_{\hat{F}}D(H')^{-1} = M_{\hat{F}}(\hat{F} - FH')(H')^{-1} = -M_{\hat{F}}F$, and therefore
$$-\frac{1}{\sqrt{NT}}\sum_{i=1}^N X_i'M_{\hat{F}}D(H')^{-1}\lambda_i = \frac{1}{\sqrt{NT}}\sum_{i=1}^N X_i'M_{\hat{F}}F\lambda_i = \frac{1}{\sqrt{NT}}\sum_{i=1}^N \Lambda_iF'M_{\hat{F}}F\lambda_i + \frac{1}{\sqrt{NT}}\sum_{i=1}^N \eta_i'M_{\hat{F}}F\lambda_i = K_1 + K_2. \quad (A12)$$
Consider $K_1$. From
$$HF'M_{\hat{F}}FH' = D'M_{\hat{F}}D = D'M_{FH'}D - D'(M_{FH'} - M_{\hat{F}})D,$$
we obtain
$$K_1 = \frac{1}{\sqrt{NT}}\sum_{i=1}^N \Lambda_iF'M_{\hat{F}}F\lambda_i = \frac{1}{\sqrt{NT}}\sum_{i=1}^N \Lambda_iH^{-1}HF'M_{\hat{F}}FH'(H')^{-1}\lambda_i$$
$$= \frac{1}{\sqrt{NT}}\sum_{i=1}^N \Lambda_iH^{-1}D'M_{FH'}D(H')^{-1}\lambda_i - \frac{1}{\sqrt{NT}}\sum_{i=1}^N \Lambda_iH^{-1}D'(M_{FH'} - M_{\hat{F}})D(H')^{-1}\lambda_i = K_{11} - K_{12}. \quad (A13)$$
Consider $K_{12}$. From the definitions of $M_{FH'}$ and $M_{\hat{F}}$,
$$M_{FH'} - M_{\hat{F}} = D(\hat{F}'\hat{F})^{-1}D' + D(\hat{F}'\hat{F})^{-1}HF' + FH'(\hat{F}'\hat{F})^{-1}D' + FH'[(\hat{F}'\hat{F})^{-1} - (HF'FH')^{-1}]HF',$$
which implies
$$D'(M_{FH'} - M_{\hat{F}})D = D'D(\hat{F}'\hat{F})^{-1}D'D + D'D(\hat{F}'\hat{F})^{-1}HF'D + D'FH'(\hat{F}'\hat{F})^{-1}D'D + D'FH'[(\hat{F}'\hat{F})^{-1} - (HF'FH')^{-1}]HF'D. \quad (A14)$$
Consider the fourth term. Since $(HF'FH')^{-1} = (H')^{-1}(F'F)^{-1}H^{-1}$, we have
$$(\hat{F}'\hat{F})^{-1} - (HF'FH')^{-1} = (\hat{F}'\hat{F})^{-1}(HF'FH' - \hat{F}'\hat{F})(H')^{-1}(F'F)^{-1}H^{-1} = -(\hat{F}'\hat{F})^{-1}(D'FH' + \hat{F}'D)(H')^{-1}(F'F)^{-1}H^{-1}.$$
By Assumption 2 (i), and Lemmas A.2 and A.3, using the triangle inequality and the submultiplicative property of norms,
$$\|\sqrt{N}T^{-1/2}D'\hat{F}\| \leq \sqrt{T}N^{-1/2}\|NT^{-1}D'D\| + \|\sqrt{N}T^{-1/2}D'F\|\,\|H\| = O_p(\sqrt{T}N^{-1/2}) + O_p(1), \quad (A15)$$
which, together with Assumption 1 (iv), gives
$$T\|(\hat{F}'\hat{F})^{-1} - (HF'FH')^{-1}\| \leq \|(T^{-1}\hat{F}'\hat{F})^{-1}\|\,T^{-1}\|D'FH' + \hat{F}'D\|\,\|(H')^{-1}\|\,\|(T^{-1}F'F)^{-1}\|\,\|H^{-1}\| = O_p(N^{-1}) + O_p((NT)^{-1/2}). \quad (A16)$$
These results imply, via Lemmas A.2 and A.3 and Assumption 1 (i),
$$\|T^{-1}D'(M_{FH'} - M_{\hat{F}})D\| \leq \|T^{-1}D'D\|^2\,\|(T^{-1}\hat{F}'\hat{F})^{-1}\| + 2\|H\|\,\|T^{-1}D'D\|\,\|(T^{-1}\hat{F}'\hat{F})^{-1}\|\,\|T^{-1}F'D\| + \|T^{-1}D'F\|^2\|H\|^2\,T\|(\hat{F}'\hat{F})^{-1} - (HF'FH')^{-1}\|$$
$$= O_p(N^{-2}) + O_p(N^{-1})O_p((NT)^{-1/2}) + [O_p(N^{-1}) + O_p((NT)^{-1/2})]O_p((NT)^{-1}) = O_p(N^{-2}) + O_p(N^{-3/2}T^{-1/2}). \quad (A17)$$
Hence, by Assumption 1 (vi), the submultiplicative property of norms and the triangle inequality, we have
$$\|K_{12}\| = \left\|\frac{1}{\sqrt{NT}}\sum_{i=1}^N \Lambda_iH^{-1}D'(M_{FH'} - M_{\hat{F}})D(H')^{-1}\lambda_i\right\| \leq \sqrt{NT}\,\|H^{-1}\|^2\,\|T^{-1}D'(M_{FH'} - M_{\hat{F}})D\|\,\frac{1}{N}\sum_{i=1}^N\|\Lambda_i\|\,\|\lambda_i\| = O_p(\sqrt{T}N^{-3/2}) + O_p(N^{-1}). \quad (A18)$$
Consider $K_{11}$. Since $M_{FH'} = M_F$, we have $D'M_{FH'}D = D'M_FD = D'D - D'F(F'F)^{-1}F'D$, where
$$\left\|\frac{1}{\sqrt{NT}}\sum_{i=1}^N \Lambda_iH^{-1}D'F(F'F)^{-1}F'D(H')^{-1}\lambda_i\right\| \leq (NT)^{-1/2}\|H^{-1}\|^2\,\|\sqrt{N}T^{-1/2}D'F\|^2\,\|(T^{-1}F'F)^{-1}\|\,\frac{1}{N}\sum_{i=1}^N\|\Lambda_i\|\,\|\lambda_i\| = O_p((NT)^{-1/2}),$$
which is obtained by making use of Assumptions 1 (i), (iv), (vi) and Lemma A.2. It follows that
$$K_{11} = \frac{\sqrt{T}}{\sqrt{N}}\,\frac{1}{N}\sum_{i=1}^N \Lambda_iH^{-1}NT^{-1}D'D(H')^{-1}\lambda_i + O_p(\sqrt{T}N^{-3/2}) + O_p(N^{-1}) + O_p((NT)^{-1/2}),$$
and so, by application of Lemma A.3 and using Assumption 1 (vi), as $T/N \to \tau$ with $N, T \to \infty$,
$$K_1 = K_{11} - K_{12} = \sqrt{\tau}B_1 + o_p(1), \quad (A19)$$
where $B_1 = \lim_{N\to\infty} N^{-1}\sum_{i=1}^N \Lambda_iH^{-1}\Sigma_{Zu}(H')^{-1}\lambda_i$.
Next, consider $K_2$. By using $M_{FH'}F\lambda_i = M_FF\lambda_i = 0_{T\times 1}$, and the previously obtained expression to substitute for $(M_{FH'} - M_{\hat{F}})$, we arrive at
$$K_2 = \frac{1}{\sqrt{NT}}\sum_{i=1}^N \eta_i'M_{\hat{F}}F\lambda_i = -\frac{1}{\sqrt{NT}}\sum_{i=1}^N \eta_i'(M_{FH'} - M_{\hat{F}})F\lambda_i$$
$$= -\frac{1}{\sqrt{NT}}\sum_{i=1}^N \eta_i'D(\hat{F}'\hat{F})^{-1}D'F\lambda_i - \frac{1}{\sqrt{NT}}\sum_{i=1}^N \eta_i'D(\hat{F}'\hat{F})^{-1}HF'F\lambda_i - \frac{1}{\sqrt{NT}}\sum_{i=1}^N \eta_i'FH'(\hat{F}'\hat{F})^{-1}D'F\lambda_i - \frac{1}{\sqrt{NT}}\sum_{i=1}^N \eta_i'FH'[(\hat{F}'\hat{F})^{-1} - (HF'FH')^{-1}]HF'F\lambda_i$$
$$= -K_{21} - \dots - K_{24}.$$
Since $d_t'(\hat{F}'\hat{F})^{-1}d_s$, $f_t'H'(\hat{F}'\hat{F})^{-1}D'F\lambda_i$ and $f_s'\lambda_i$ are just scalars, the orders of $K_{21}$ and $K_{23}$ can be inferred as follows:
$$\|K_{21}\| = \left\|\frac{1}{\sqrt{NT}}\sum_{i=1}^N \eta_i'D(\hat{F}'\hat{F})^{-1}D'F\lambda_i\right\| = \left\|\frac{1}{\sqrt{NT}}\sum_{i=1}^N\sum_{t=1}^T \eta_{i,t}d_t'(\hat{F}'\hat{F})^{-1}\sum_{s=1}^T d_sf_s'\lambda_i\right\| = \left\|\frac{1}{\sqrt{NT}}\sum_{t=1}^T\sum_{s=1}^T d_t'(\hat{F}'\hat{F})^{-1}d_s\sum_{i=1}^N \eta_{i,t}f_s'\lambda_i\right\|$$
$$\leq \sqrt{T}\left(\frac{1}{T^2}\sum_{t=1}^T\sum_{s=1}^T \|d_t'(T^{-1}\hat{F}'\hat{F})^{-1}d_s\|^2\right)^{1/2}\left(\frac{1}{T^2}\sum_{t=1}^T\sum_{s=1}^T\left\|\frac{1}{\sqrt{N}}\sum_{i=1}^N \eta_{i,t}f_s'\lambda_i\right\|^2\right)^{1/2}$$
$$\leq \sqrt{T}\,\|(T^{-1}\hat{F}'\hat{F})^{-1}\|\,\frac{1}{T}\sum_{t=1}^T\|d_t\|^2\left(\frac{1}{T^2}\sum_{t=1}^T\sum_{s=1}^T\left\|\frac{1}{\sqrt{N}}\sum_{i=1}^N \eta_{i,t}\lambda_i'\right\|^2\|f_s\|^2\right)^{1/2} = O_p(\sqrt{T}N^{-1}),$$
where we make use of Lemma A.1 and Assumption 1 to obtain the result, and
$$\|K_{23}\| = \left\|\frac{1}{\sqrt{NT}}\sum_{i=1}^N \eta_i'FH'(\hat{F}'\hat{F})^{-1}D'F\lambda_i\right\| = \left\|\frac{1}{\sqrt{NT}}\sum_{i=1}^N\sum_{t=1}^T \eta_{i,t}f_t'H'(\hat{F}'\hat{F})^{-1}D'F\lambda_i\right\| = \left\|\frac{1}{\sqrt{NT}}\sum_{i=1}^N\sum_{t=1}^T \eta_{i,t}\lambda_i'F'D(\hat{F}'\hat{F})^{-1}Hf_t\right\|$$
$$\leq \frac{1}{\sqrt{N}}\left(\frac{1}{T}\sum_{t=1}^T\left\|\frac{1}{\sqrt{N}}\sum_{i=1}^N \eta_{i,t}\lambda_i'\right\|^2\right)^{1/2}\|(T^{-1}\hat{F}'\hat{F})^{-1}\|\,\|\sqrt{N}T^{-1/2}F'D\|\,\|H\|\left(\frac{1}{T}\sum_{t=1}^T\|f_t\|^2\right)^{1/2} = O_p(N^{-1/2}),$$
where the result makes use of Lemma A.2 and Assumption 1. Similarly, since $f_t'H'[(\hat{F}'\hat{F})^{-1} - (HF'FH')^{-1}]HF'F\lambda_i$ is a scalar,
$$\|K_{24}\| = \left\|\frac{1}{\sqrt{NT}}\sum_{i=1}^N \eta_i'FH'\,T[(\hat{F}'\hat{F})^{-1} - (HF'FH')^{-1}]\,T^{-1}HF'F\lambda_i\right\| = \left\|\frac{1}{\sqrt{NT}}\sum_{i=1}^N\sum_{t=1}^T \eta_{i,t}\lambda_i'F'FH'[(\hat{F}'\hat{F})^{-1} - (HF'FH')^{-1}]Hf_t\right\|$$
$$\leq \sqrt{T}\left(\frac{1}{T}\sum_{t=1}^T\left\|\frac{1}{\sqrt{N}}\sum_{i=1}^N \eta_{i,t}\lambda_i'\right\|^2\right)^{1/2}\|T^{-1}F'F\|\,\|H\|^2\,T\|(\hat{F}'\hat{F})^{-1} - (HF'FH')^{-1}\|\left(\frac{1}{T}\sum_{t=1}^T\|f_t\|^2\right)^{1/2} = O_p(\sqrt{T}N^{-1}) + O_p(N^{-1/2}),$$
by (A16), Assumption 1 and Assumption 2 (i). $K_{22}$ can be expanded as follows, by adding and subtracting $\frac{1}{\sqrt{NT}}\sum_{i=1}^N \eta_i'D(HF'FH')^{-1}HF'F\lambda_i$:
$$K_{22} = \frac{1}{\sqrt{NT}}\sum_{i=1}^N \eta_i'D(\hat{F}'\hat{F})^{-1}HF'F\lambda_i = \frac{1}{\sqrt{NT}}\sum_{i=1}^N \eta_i'D(H')^{-1}\lambda_i + \sqrt{NT}\,\frac{1}{N}\sum_{i=1}^N T^{-1}\eta_i'D[(T^{-1}\hat{F}'\hat{F})^{-1} - (T^{-1}HF'FH')^{-1}]T^{-1}HF'F\lambda_i,$$
where the norm of the last term on the right is
$$\left\|\frac{1}{N}\sum_{i=1}^N T^{-1}\eta_i'D[(T^{-1}\hat{F}'\hat{F})^{-1} - (T^{-1}HF'FH')^{-1}]T^{-1}HF'F\lambda_i\right\| = \left\|\frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T \eta_{i,t}\lambda_i'\,T^{-1}F'FH'[(T^{-1}\hat{F}'\hat{F})^{-1} - (T^{-1}HF'FH')^{-1}]d_t\right\|$$
$$\leq \left(\frac{1}{T}\sum_{t=1}^T\left\|\frac{1}{\sqrt{N}}\sum_{i=1}^N \eta_{i,t}\lambda_i'\right\|^2\right)^{1/2}\|T^{-1}F'F\|\,\|H\|\,\|(T^{-1}\hat{F}'\hat{F})^{-1} - (T^{-1}HF'FH')^{-1}\|\left(\frac{1}{T}\sum_{t=1}^T\|d_t\|^2\right)^{1/2}$$
$$= [O_p(N^{-1}) + O_p((NT)^{-1/2})]O_p(N^{-1/2}) = O_p(N^{-3/2}) + O_p(T^{-1/2}N^{-1}),$$
by Lemma A.1, (A16) and Assumption 1. The order of the second term in $K_{22}$ is $\sqrt{NT}$ times this, which is $O_p(\sqrt{T}N^{-1}) + O_p(N^{-1/2})$. The first term of $K_{22}$ is
$$\frac{1}{\sqrt{NT}}\sum_{i=1}^N \eta_i'D(H')^{-1}\lambda_i = \sqrt{\tau}\,\frac{1}{N}\sum_{i=1}^N \Sigma_{\eta,i}(\beta, I_m)Z_i(H')^{-1}\lambda_i + o_p(1),$$
by Lemma A.4. Hence, letting $B_2 = \lim_{N\to\infty} N^{-1}\sum_{i=1}^N \Sigma_{\eta,i}(\beta, I_m)Z_i(H')^{-1}\lambda_i$, we have
$$K_{22} = \sqrt{\tau}B_2 + o_p(1). \quad (A20)$$
The above results imply that, for the second term in the numerator,
$$-\frac{1}{\sqrt{NT}}\sum_{i=1}^N X_i'M_{\hat{F}}D(H')^{-1}\lambda_i = K_1 + K_2 = \sqrt{\tau}(B_1 - B_2) + o_p(1). \quad (A21)$$
Next, consider $\frac{1}{\sqrt{NT}}\sum_{i=1}^N X_i'M_{\hat{F}}\varepsilon_i$, the first term in the numerator of $\sqrt{NT}(\hat\beta_{C3E} - \beta)$. Clearly,
$$\frac{1}{\sqrt{NT}}\sum_{i=1}^N X_i'M_{\hat{F}}\varepsilon_i = \frac{1}{\sqrt{NT}}\sum_{i=1}^N X_i'M_{FH'}\varepsilon_i - \frac{1}{\sqrt{NT}}\sum_{i=1}^N X_i'(M_{FH'} - M_{\hat{F}})\varepsilon_i, \quad (A22)$$
where
$$\frac{1}{\sqrt{NT}}\sum_{i=1}^N X_i'(M_{FH'} - M_{\hat{F}})\varepsilon_i = \frac{1}{\sqrt{NT}}\sum_{i=1}^N X_i'D(\hat{F}'\hat{F})^{-1}D'\varepsilon_i + \frac{1}{\sqrt{NT}}\sum_{i=1}^N X_i'D(\hat{F}'\hat{F})^{-1}HF'\varepsilon_i + \frac{1}{\sqrt{NT}}\sum_{i=1}^N X_i'FH'(\hat{F}'\hat{F})^{-1}D'\varepsilon_i + \frac{1}{\sqrt{NT}}\sum_{i=1}^N X_i'FH'[(\hat{F}'\hat{F})^{-1} - (HF'FH')^{-1}]HF'\varepsilon_i$$
$$= L_1 + \dots + L_4. \quad (A23)$$
The orders of $L_1, \dots, L_4$ can be obtained by using the same steps as when analyzing $K_2$. For $L_1$, we use the fact that $x_{i,t} = \Lambda_if_t + \eta_{i,t}$, giving
$$\left\|\frac{1}{\sqrt{N}}\sum_{i=1}^N x_{i,t}\varepsilon_{i,s}\right\| \leq \left\|\frac{1}{\sqrt{N}}\sum_{i=1}^N \Lambda_i\varepsilon_{i,s}\right\|\|f_t\| + \left\|\frac{1}{\sqrt{N}}\sum_{i=1}^N \eta_{i,t}\varepsilon_{i,s}\right\| = O_p(1),$$
which, in view of Lemma A.1, implies
$$\|L_1\| = \left\|\frac{1}{\sqrt{NT}}\sum_{i=1}^N X_i'D(\hat{F}'\hat{F})^{-1}D'\varepsilon_i\right\| = \left\|\frac{1}{\sqrt{NT}}\sum_{t=1}^T\sum_{s=1}^T d_t'(\hat{F}'\hat{F})^{-1}d_s\sum_{i=1}^N x_{i,t}\varepsilon_{i,s}\right\|$$
$$\leq \sqrt{T}\,\frac{1}{T}\sum_{t=1}^T\|d_t\|^2\,\|(T^{-1}\hat{F}'\hat{F})^{-1}\|\left(\frac{1}{T^2}\sum_{t=1}^T\sum_{s=1}^T\left\|\frac{1}{\sqrt{N}}\sum_{i=1}^N x_{i,t}\varepsilon_{i,s}\right\|^2\right)^{1/2} = O_p(\sqrt{T}N^{-1}), \quad (A24)$$
by Assumption 1. We can similarly show that $\|T^{-1}X_i'F\| = O_p(1)$, leading to the following result for $\|L_4\|$:
$$\|L_4\| = \left\|\frac{1}{\sqrt{NT}}\sum_{i=1}^N X_i'FH'[(\hat{F}'\hat{F})^{-1} - (HF'FH')^{-1}]HF'\varepsilon_i\right\|$$
$$\leq \sqrt{N}\left(\frac{1}{N}\sum_{i=1}^N\|T^{-1}X_i'F\|^2\right)^{1/2}\|H\|^2\,T\|(\hat{F}'\hat{F})^{-1} - (HF'FH')^{-1}\|\left(\frac{1}{N}\sum_{i=1}^N\|T^{-1/2}F'\varepsilon_i\|^2\right)^{1/2} = O_p(N^{-1/2}) + O_p(T^{-1/2}), \quad (A25)$$
by Assumptions 1, 2 (i) and (A16). Consider $L_2$. Adding and subtracting $\frac{1}{\sqrt{NT}}\sum_{i=1}^N X_i'D(HF'FH')^{-1}HF'\varepsilon_i$ gives
$$L_2 = \frac{1}{\sqrt{NT}}\sum_{i=1}^N X_i'D(\hat{F}'\hat{F})^{-1}HF'\varepsilon_i = \frac{1}{\sqrt{NT}}\sum_{i=1}^N X_i'D(H')^{-1}(F'F)^{-1}F'\varepsilon_i + \frac{1}{\sqrt{NT}}\sum_{i=1}^N X_i'D[(\hat{F}'\hat{F})^{-1} - (HF'FH')^{-1}]HF'\varepsilon_i = L_{21} + L_{22},$$
where
$$\|L_{22}\| = \left\|\frac{1}{\sqrt{NT}}\sum_{i=1}^N X_i'D[(\hat{F}'\hat{F})^{-1} - (HF'FH')^{-1}]HF'\varepsilon_i\right\| = \left\|\frac{1}{\sqrt{NT}}\sum_{i=1}^N\sum_{t=1}^T\sum_{s=1}^T x_{i,t}d_t'[(\hat{F}'\hat{F})^{-1} - (HF'FH')^{-1}]Hf_s\varepsilon_{i,s}\right\|$$
$$\leq \sqrt{T}\,\|H\|\left(\frac{1}{T}\sum_{t=1}^T\|d_t\|^2\right)^{1/2}T\|(\hat{F}'\hat{F})^{-1} - (HF'FH')^{-1}\|\left(\frac{1}{T}\sum_{s=1}^T\|f_s\|^2\right)^{1/2}\left(\frac{1}{T^2}\sum_{t=1}^T\sum_{s=1}^T\left\|\frac{1}{\sqrt{N}}\sum_{i=1}^N \varepsilon_{i,s}x_{i,t}'\right\|^2\right)^{1/2}$$
$$= \sqrt{T}\,O_p(N^{-1/2})[O_p(N^{-1}) + O_p((NT)^{-1/2})] = O_p(\sqrt{T}N^{-3/2}) + O_p(N^{-1}),$$
by Assumption 1, Lemma A.1 and (A16). Also, from $X_i = F\Lambda_i' + \eta_i$,
$$L_{21} = \frac{1}{\sqrt{NT}}\sum_{i=1}^N X_i'D(H')^{-1}(F'F)^{-1}F'\varepsilon_i = \frac{1}{\sqrt{NT}}\sum_{i=1}^N \Lambda_iF'D(H')^{-1}(F'F)^{-1}F'\varepsilon_i + \frac{1}{\sqrt{NT}}\sum_{i=1}^N \eta_i'D(H')^{-1}(F'F)^{-1}F'\varepsilon_i.$$
By Assumption 2 (iii),
$$T^{-1}\eta_i'D = \frac{1}{T}\sum_{t=1}^T \eta_{i,t}d_t' = \frac{1}{NT}\sum_{t=1}^T\sum_{j=1}^N \eta_{i,t}u_{j,t}'Z_j = \frac{1}{NT}\sum_{t=1}^T \eta_{i,t}u_{i,t}'Z_i + \frac{1}{NT}\sum_{t=1}^T\sum_{j\neq i}^N \eta_{i,t}u_{j,t}'Z_j = O_p(N^{-1}) + O_p((NT)^{-1/2}),$$
from which, together with Assumption 1 and Assumption 2 (i), we deduce that
$$\left\|\frac{1}{\sqrt{NT}}\sum_{i=1}^N \eta_i'D(H')^{-1}(F'F)^{-1}F'\varepsilon_i\right\| \leq \sqrt{N}\left(\frac{1}{N}\sum_{i=1}^N\|T^{-1}\eta_i'D\|^2\right)^{1/2}\left(\frac{1}{N}\sum_{i=1}^N\|T^{-1/2}F'\varepsilon_i\|^2\right)^{1/2}\|H^{-1}\|\,\|(T^{-1}F'F)^{-1}\|$$
$$= \sqrt{N}[O_p(N^{-1}) + O_p((NT)^{-1/2})] = O_p(N^{-1/2}) + O_p(T^{-1/2}),$$
and by further use of Lemma A.2 and Assumption 1,
$$\left\|\frac{1}{\sqrt{NT}}\sum_{i=1}^N \Lambda_iF'D(H')^{-1}(F'F)^{-1}F'\varepsilon_i\right\| \leq \frac{1}{\sqrt{T}}\,\frac{1}{N}\sum_{i=1}^N\|\Lambda_i\|\,\|\sqrt{N}T^{-1/2}F'D\|\,\|H^{-1}\|\,\|(T^{-1}F'F)^{-1}\|\,\|T^{-1/2}F'\varepsilon_i\| = O_p(T^{-1/2}).$$
Consequently, by the triangle inequality,
$$\|L_{21}\| \leq \left\|\frac{1}{\sqrt{NT}}\sum_{i=1}^N \Lambda_iF'D(H')^{-1}(F'F)^{-1}F'\varepsilon_i\right\| + \left\|\frac{1}{\sqrt{NT}}\sum_{i=1}^N \eta_i'D(H')^{-1}(F'F)^{-1}F'\varepsilon_i\right\| = O_p(N^{-1/2}) + O_p(T^{-1/2}),$$
leading to the following result for $\|L_2\|$:
$$\|L_2\| \leq \|L_{21}\| + \|L_{22}\| = O_p(N^{-1/2}) + O_p(T^{-1/2}) + O_p(\sqrt{T}N^{-3/2}), \quad (A26)$$
which is $o_p(1)$ as $N, T \to \infty$, if we assume that $\sqrt{T}N^{-3/2} = o(1)$.
Consider $L_3$. We begin by adding and subtracting:
$$
L_3 = \frac{1}{\sqrt{NT}} \sum_{i=1}^N X_i' F H' (\hat F' \hat F)^{-1} D' \varepsilon_i = \frac{1}{\sqrt{NT}} \sum_{i=1}^N X_i' F (F' F)^{-1} H^{-1} D' \varepsilon_i + \frac{1}{\sqrt{NT}} \sum_{i=1}^N X_i' F H' [(\hat F' \hat F)^{-1} - (H F' F H')^{-1}] D' \varepsilon_i,
$$
where, in analogy to $\|L_{22}\|$,
$$
\begin{aligned}
\left\| \frac{1}{\sqrt{NT}} \sum_{i=1}^N X_i' F H' [(\hat F' \hat F)^{-1} - (H F' F H')^{-1}] D' \varepsilon_i \right\|
&= \left\| \frac{1}{\sqrt{NT}} \sum_{i=1}^N \sum_{t=1}^T \sum_{s=1}^T x_{i,t} f_t' H' [(\hat F' \hat F)^{-1} - (H F' F H')^{-1}] d_s \varepsilon_{i,s} \right\| \\
&\le \sqrt{T} \left( \frac{1}{T} \sum_{t=1}^T \|f_t\|^2 \right)^{1/2} \|H\| \, T \|(\hat F' \hat F)^{-1} - (H F' F H')^{-1}\| \left( \frac{1}{T} \sum_{s=1}^T \|d_s\|^2 \right)^{1/2} \left( \frac{1}{T^2} \sum_{t=1}^T \sum_{s=1}^T \left\| \frac{1}{\sqrt N} \sum_{i=1}^N x_{i,t} \varepsilon_{i,s} \right\|^2 \right)^{1/2} \\
&= \sqrt{T} [O_p(N^{-1}) + O_p((NT)^{-1/2})] O_p(N^{-1/2}) = O_p(\sqrt{T} N^{-3/2}) + O_p(N^{-1}),
\end{aligned}
$$
by Assumption 1, the result in (A16), and Lemma A.1. Consider the first term of $L_3$. By substituting for $X_i$ and then using Lemma A.5, it can be written as
$$
\begin{aligned}
\frac{1}{\sqrt{NT}} \sum_{i=1}^N X_i' F (F' F)^{-1} H^{-1} D' \varepsilon_i
&= \frac{1}{\sqrt{NT}} \sum_{i=1}^N \Lambda_i F' F (F' F)^{-1} H^{-1} D' \varepsilon_i + \sqrt{N} \frac{1}{N} \sum_{i=1}^N T^{-1/2} \eta_i' F (T^{-1} F' F)^{-1} H^{-1} T^{-1} D' \varepsilon_i \\
&= \frac{1}{\sqrt{NT}} \sum_{i=1}^N \Lambda_i H^{-1} D' \varepsilon_i + \sqrt{N} [O_p(N^{-1}) + O_p((NT)^{-1/2})] \\
&= \sqrt{\tau} \frac{1}{N} \sum_{i=1}^N \sigma_{\varepsilon,i}^2 \Lambda_i H^{-1} Z_i' (1, 0_m)' + o_p(1),
\end{aligned}
$$
where we have made use of the fact that $T^{-1} D' \varepsilon_i$ is of the same order as $T^{-1} \eta_i' D$. Note how Lemma A.5 supposes that $T/N \to \tau$, under which $\sqrt{T} N^{-3/2} = o(1)$. Hence, letting $B_3 = \lim_{N \to \infty} N^{-1} \sum_{i=1}^N \sigma_{\varepsilon,i}^2 \Lambda_i H^{-1} Z_i' (1, 0_m)'$, we obtain
$$
L_3 = \sqrt{\tau} B_3 + o_p(1). \tag{A27}
$$
The results for $L_1, \ldots, L_4$ give
$$
\frac{1}{\sqrt{NT}} \sum_{i=1}^N X_i' (M_{FH'} - M_{\hat F}) \varepsilon_i = L_1 + \ldots + L_4 = \sqrt{\tau} B_3 + o_p(1), \tag{A28}
$$
provided that $\sqrt{T} N^{-1/2} \to \sqrt{\tau}$. The implication is that
$$
\frac{1}{\sqrt{NT}} \sum_{i=1}^N X_i' M_{\hat F} \varepsilon_i = \frac{1}{\sqrt{NT}} \sum_{i=1}^N X_i' M_{FH'} \varepsilon_i - \sqrt{\tau} B_3 + o_p(1). \tag{A29}
$$
Let us consider the first term on the right-hand side of the above equation. The variance of $(NT)^{-1/2} \sum_{i=1}^N \eta_i' F H' (H F' F H')^{-1} H F' \varepsilon_i$ is $O_p(T^{-1})$; hence,
$$
\left\| \frac{1}{\sqrt{NT}} \sum_{i=1}^N \eta_i' F H' (H F' F H')^{-1} H F' \varepsilon_i \right\| = O_p(T^{-1/2}). \tag{A30}
$$
This result, together with the fact that $M_{FH'} X_i = M_{FH'} \eta_i$, implies
$$
\begin{aligned}
\frac{1}{\sqrt{NT}} \sum_{i=1}^N X_i' M_{FH'} \varepsilon_i &= \frac{1}{\sqrt{NT}} \sum_{i=1}^N \eta_i' M_{FH'} \varepsilon_i = \frac{1}{\sqrt{NT}} \sum_{i=1}^N \eta_i' \varepsilon_i - \frac{1}{\sqrt{NT}} \sum_{i=1}^N \eta_i' F H' (H F' F H')^{-1} H F' \varepsilon_i \\
&= \frac{1}{\sqrt{NT}} \sum_{i=1}^N \eta_i' \varepsilon_i + O_p(T^{-1/2}),
\end{aligned} \tag{A31}
$$
where, by Assumption 1, $(NT)^{-1/2} \sum_{i=1}^N \eta_i' \varepsilon_i \to_d N(0_{m \times 1}, \Sigma_{\eta\varepsilon})$ as $N, T \to \infty$. Thus, provided that $T/N \to \tau$,
$$
\frac{1}{\sqrt{NT}} \sum_{i=1}^N X_i' M_{\hat F} \varepsilon_i = \frac{1}{\sqrt{NT}} \sum_{i=1}^N \eta_i' \varepsilon_i - \sqrt{\tau} B_3 + o_p(1) \to_d N(0_{m \times 1}, \Sigma_{\eta\varepsilon}) - \sqrt{\tau} B_3. \tag{A32}
$$
Let $B = B_1 - B_2 - B_3$. The above results suggest the following limit for the numerator of $\sqrt{NT}(\hat\beta_{C3E} - \beta)$:
$$
\frac{1}{\sqrt{NT}} \sum_{i=1}^N (X_i' M_{\hat F} \varepsilon_i - X_i' M_{\hat F} D (H')^{-1} \lambda_i) = \frac{1}{\sqrt{NT}} \sum_{i=1}^N \eta_i' \varepsilon_i + \sqrt{\tau} B + o_p(1) \to_d N(0_{m \times 1}, \Sigma_{\eta\varepsilon}) + \sqrt{\tau} B, \tag{A33}
$$
which holds as $N, T \to \infty$ with $T/N \to \tau$.
Next, consider the denominator of $\sqrt{NT}(\hat\beta_{C3E} - \beta)$, which we expand as
$$
\frac{1}{NT} \sum_{i=1}^N X_i' M_{\hat F} X_i = \frac{1}{NT} \sum_{i=1}^N X_i' M_{FH'} X_i - \frac{1}{NT} \sum_{i=1}^N X_i' (M_{FH'} - M_{\hat F}) X_i, \tag{A34}
$$
where
$$
\|T^{-1} X_i' (M_{FH'} - M_{\hat F}) X_i\| \le \|T^{-1} X_i' D\|^2 \|(T^{-1} \hat F' \hat F)^{-1}\| + 2 \|H\| \|T^{-1} X_i' D\| \|T^{-1} X_i' F\| \|(T^{-1} \hat F' \hat F)^{-1}\| + \|T^{-1} X_i' F\|^2 \|H\|^2 T \|(\hat F' \hat F)^{-1} - (H F' F H')^{-1}\|.
$$
Clearly, $\|T^{-1} X_i' F\| = O_p(1)$, and by using the facts that $\|T^{-1} \eta_i' D\| = O_p(N^{-1}) + O_p((NT)^{-1/2})$ and $\|T^{-1} F' D\| = O_p((NT)^{-1/2})$, we can further show that
$$
\|T^{-1} X_i' D\| \le \|\Lambda_i\| \|T^{-1} F' D\| + \|T^{-1} \eta_i' D\| = O_p(N^{-1}) + O_p((NT)^{-1/2}).
$$
This implies
$$
\|T^{-1} X_i' (M_{FH'} - M_{\hat F}) X_i\| = O_p(N^{-1}) + O_p((NT)^{-1/2}), \tag{A35}
$$
and so we get
$$
\left\| \frac{1}{NT} \sum_{i=1}^N X_i' (M_{FH'} - M_{\hat F}) X_i \right\| \le \frac{1}{N} \sum_{i=1}^N \|T^{-1} X_i' (M_{FH'} - M_{\hat F}) X_i\| = O_p(N^{-1}) + O_p((NT)^{-1/2}). \tag{A36}
$$
By using this and
$$
T^{-1} X_i' M_{FH'} X_i = T^{-1} \eta_i' M_{FH'} \eta_i = T^{-1} \eta_i' \eta_i - T^{-1} \cdot T^{-1/2} \eta_i' F (T^{-1} F' F)^{-1} T^{-1/2} F' \eta_i = T^{-1} \eta_i' \eta_i + O_p(T^{-1}), \tag{A37}
$$
we obtain
$$
\frac{1}{NT} \sum_{i=1}^N X_i' M_{\hat F} X_i = \frac{1}{NT} \sum_{i=1}^N X_i' M_{FH'} X_i + o_p(1) = \frac{1}{NT} \sum_{i=1}^N \eta_i' \eta_i + o_p(1) = \Sigma_\eta + o_p(1). \tag{A38}
$$
By combining all of the results, as $N, T \to \infty$ with $T/N \to \tau$,
$$
\begin{aligned}
\sqrt{NT} (\hat\beta_{C3E} - \beta) &= \left( \frac{1}{NT} \sum_{i=1}^N X_i' M_{\hat F} X_i \right)^{-1} \frac{1}{\sqrt{NT}} \sum_{i=1}^N (X_i' M_{\hat F} \varepsilon_i - X_i' M_{\hat F} D (H')^{-1} \lambda_i) \\
&= \Sigma_\eta^{-1} \left( \frac{1}{\sqrt{NT}} \sum_{i=1}^N \eta_i' \varepsilon_i + \sqrt{\tau} B \right) + o_p(1) \to_d N(0_{m \times 1}, \Sigma_\eta^{-1} \Sigma_{\eta\varepsilon} \Sigma_\eta^{-1}) + \Sigma_\eta^{-1} \sqrt{\tau} B.
\end{aligned}
$$
This completes the proof. ∎
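The pooled estimator whose limit has just been derived can be illustrated numerically. The sketch below is a minimal simulation under assumed names, not the authors' code: it generates a one-factor panel, proxies the factor space by the cross-section averages of the observables (the unit-weight combination $z_i = 1$), and computes $\hat\beta = (\sum_i X_i' M_{\hat F} X_i)^{-1} \sum_i X_i' M_{\hat F} y_i$.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, beta = 200, 200, -2.0

f = rng.normal(size=T)                   # one common factor
lam = rng.normal(1.0, 1.0, size=N)       # loadings in y
Lam = rng.normal(1.0, 1.0, size=N)       # loadings in x (m = 1 regressor)
x = f[:, None] * Lam[None, :] + rng.normal(size=(T, N))
y = beta * x + f[:, None] * lam[None, :] + rng.normal(size=(T, N))

# Factor proxy: cross-section averages of the observables w_it = (y_it, x_it)',
# i.e. the unit-weight combination z_i = 1 in the paper's notation.
Fhat = np.column_stack([y.mean(axis=1), x.mean(axis=1)])   # T x (m+1)
M = np.eye(T) - Fhat @ np.linalg.solve(Fhat.T @ Fhat, Fhat.T)

# Pooled estimator: ratio of pooled cross-products after projecting out Fhat.
num = sum(x[:, i] @ M @ y[:, i] for i in range(N))
den = sum(x[:, i] @ M @ x[:, i] for i in range(N))
beta_hat = num / den
print(round(beta_hat, 2))
```

With non-zero-mean loadings the averages span the factor space, and the estimate is close to the true value of −2 at this sample size.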
Proof of Theorem 2.

When $\beta_i = \beta + \xi_i$, $\sqrt{N}(\hat\beta_{C3E} - \beta)$ can be written as
$$
\sqrt{N} (\hat\beta_{C3E} - \beta) = T^{-1/2} \left( \frac{1}{NT} \sum_{i=1}^N X_i' M_{\hat F} X_i \right)^{-1} \frac{1}{\sqrt{NT}} \sum_{i=1}^N X_i' M_{\hat F} (\varepsilon_i - D (H')^{-1} \lambda_i) + \left( \frac{1}{NT} \sum_{i=1}^N X_i' M_{\hat F} X_i \right)^{-1} \frac{1}{\sqrt{NT}} \sum_{i=1}^N X_i' M_{\hat F} X_i \xi_i.
$$
From the proof of Theorem 1, we know that the first term is $O_p(T^{-1/2})$. We therefore focus on the second term. Clearly,
$$
\frac{1}{\sqrt{NT}} \sum_{i=1}^N X_i' M_{\hat F} X_i \xi_i = \frac{1}{\sqrt{NT}} \sum_{i=1}^N X_i' M_{FH'} X_i \xi_i + \frac{1}{\sqrt{NT}} \sum_{i=1}^N X_i' (M_{\hat F} - M_{FH'}) X_i \xi_i. \tag{A39}
$$
From (A36),
$$
\left\| \frac{1}{\sqrt{NT}} \sum_{i=1}^N X_i' (M_{\hat F} - M_{FH'}) X_i \xi_i \right\| \le \sqrt{N} \frac{1}{N} \sum_{i=1}^N \|T^{-1} X_i' (M_{\hat F} - M_{FH'}) X_i\| \|\xi_i\| = \sqrt{N} [O_p(N^{-1}) + O_p((NT)^{-1/2})] = o_p(1),
$$
and by use of $\|T^{-1} \eta_i' F (F' F)^{-1} F' \eta_i\| \le T^{-1} \|T^{-1/2} \eta_i' F\| \|(T^{-1} F' F)^{-1}\| \|T^{-1/2} F' \eta_i\| = O_p(T^{-1})$, we can further show that
$$
\begin{aligned}
\frac{1}{\sqrt{NT}} \sum_{i=1}^N X_i' M_{FH'} X_i \xi_i &= \frac{1}{\sqrt{NT}} \sum_{i=1}^N \eta_i' M_F \eta_i \xi_i = \frac{1}{\sqrt{NT}} \sum_{i=1}^N \eta_i' \eta_i \xi_i - \sqrt{N} \frac{1}{N} \sum_{i=1}^N T^{-1} \eta_i' F (F' F)^{-1} F' \eta_i \xi_i \\
&= \frac{1}{\sqrt{NT}} \sum_{i=1}^N \eta_i' \eta_i \xi_i + \sqrt{N} O_p(T^{-1}) = \frac{1}{\sqrt N} \sum_{i=1}^N \Sigma_{\eta,i} \xi_i + o_p(1),
\end{aligned}
$$
where the last result requires $\sqrt{N} T^{-1} = o(1)$, which is implied by $T/N \to \tau$. In view of (A38) this yields
$$
\sqrt{N} (\hat\beta_{C3E} - \beta) = \Sigma_\eta^{-1} \frac{1}{\sqrt N} \sum_{i=1}^N \Sigma_{\eta,i} \xi_i + o_p(1) \to_d N(0_{m \times 1}, \Sigma_\eta^{-1} R \Sigma_\eta^{-1}),
$$
where $R = \lim_{N \to \infty} N^{-1} \sum_{i=1}^N \Sigma_{\eta,i} \Sigma_\xi \Sigma_{\eta,i}$. This completes the proof. ∎
Proof of Theorem 3.

Consider (20). By using (A10), we rewrite it as
$$
\sqrt{T} (\hat\beta_{C3E,i} - \beta) = (T^{-1} X_i' M_{\hat F} X_i)^{-1} T^{-1/2} X_i' M_{\hat F} \varepsilon_i - (T^{-1} X_i' M_{\hat F} X_i)^{-1} T^{-1/2} X_i' M_{\hat F} D (H')^{-1} \lambda_i. \tag{A40}
$$
We begin by considering the numerator of the second term. Proceeding as in the proof of Theorem 1, we can write
$$
\begin{aligned}
T^{-1/2} X_i' M_{\hat F} D (H')^{-1} \lambda_i &= T^{-1/2} \eta_i' M_{FH'} F \lambda_i - T^{-1/2} \eta_i' (M_{FH'} - M_{\hat F}) F \lambda_i \\
&\quad - T^{-1/2} \Lambda_i H^{-1} D' M_{FH'} D (H')^{-1} \lambda_i + T^{-1/2} \Lambda_i H^{-1} D' (M_{FH'} - M_{\hat F}) D (H')^{-1} \lambda_i.
\end{aligned}
$$
The first term is zero, since $M_{FH'} = M_F$ and $M_F F = 0$. By (A17), the fourth term is of order $O_p(\sqrt{T} N^{-2}) + O_p(N^{-3/2})$. For the third term we use Lemma A.3, giving
$$
\|T^{-1/2} \Lambda_i H^{-1} D' M_{FH'} D (H')^{-1} \lambda_i\| \le \|T^{-1/2} \Lambda_i H^{-1} D' D (H')^{-1} \lambda_i\| \le \sqrt{T} N^{-1} \|\Lambda_i\| \|H^{-1}\| \|N T^{-1} D' D\| \|(H')^{-1}\| \|\lambda_i\| = O_p(\sqrt{T} N^{-1}).
$$
The second term can be written as
$$
\begin{aligned}
T^{-1/2} \eta_i' (M_{FH'} - M_{\hat F}) F \lambda_i &= T^{-1/2} \eta_i' D (\hat F' \hat F)^{-1} D' F \lambda_i + T^{-1/2} \eta_i' D (\hat F' \hat F)^{-1} H F' F \lambda_i \\
&\quad + T^{-1/2} \eta_i' F H' (\hat F' \hat F)^{-1} D' F \lambda_i + T^{-1/2} \eta_i' F H' [(\hat F' \hat F)^{-1} - (H F' F H')^{-1}] H F' F \lambda_i.
\end{aligned}
$$
Consider $\|T^{-1} \eta_i' D\|$. Clearly,
$$
\|T^{-1} \eta_i' D\| = \left\| \frac{1}{NT} \sum_{j=1}^N \sum_{t=1}^T \eta_{i,t} u_{j,t}' Z_j \right\| \le \left\| \frac{1}{NT} \sum_{t=1}^T \eta_{i,t} u_{i,t}' Z_i \right\| + \left\| \frac{1}{NT} \sum_{j \ne i}^N \sum_{t=1}^T \eta_{i,t} u_{j,t}' Z_j \right\|,
$$
where, by using the same arguments as in the proof of Lemma A.4, the first term is $O_p(N^{-1})$ and the second is $O_p(T^{-1/2} N^{-1/2})$. Moreover, $\|T^{-1/2} \eta_i' F\|$ is clearly $O_p(1)$ by Assumption 1 (vii). Making use of these results, (A16) and Lemma A.2, we obtain
$$
\|T^{-1/2} \eta_i' (M_{FH'} - M_{\hat F}) F \lambda_i\| = O_p(\sqrt{T} N^{-1}) + O_p((NT)^{-1/2}).
$$
Let us now consider the numerator in the first term of (A40), which can be written as follows:
$$
\begin{aligned}
T^{-1/2} X_i' M_{\hat F} \varepsilon_i &= T^{-1/2} \eta_i' M_{FH'} \varepsilon_i - T^{-1/2} X_i' D (\hat F' \hat F)^{-1} D' \varepsilon_i - T^{-1/2} X_i' D (\hat F' \hat F)^{-1} H F' \varepsilon_i \\
&\quad - T^{-1/2} X_i' F H' (\hat F' \hat F)^{-1} D' \varepsilon_i - T^{-1/2} X_i' F H' [(\hat F' \hat F)^{-1} - (H F' F H')^{-1}] H F' \varepsilon_i.
\end{aligned}
$$
Here,
$$
\|T^{-1} \varepsilon_i' D\| = \left\| \frac{1}{NT} \sum_{j=1}^N \sum_{t=1}^T \varepsilon_{i,t} u_{j,t}' Z_j \right\| \le \left\| \frac{1}{NT} \sum_{t=1}^T \varepsilon_{i,t} u_{i,t}' Z_i \right\| + \left\| \frac{1}{NT} \sum_{j \ne i}^N \sum_{t=1}^T \varepsilon_{i,t} u_{j,t}' Z_j \right\| = O_p(N^{-1}) + O_p((NT)^{-1/2}),
$$
and
$$
\|T^{-1} X_i' D\| = \left\| \frac{1}{NT} \sum_{j=1}^N \sum_{t=1}^T x_{i,t} u_{j,t}' Z_j \right\| \le \left\| \frac{1}{NT} \sum_{t=1}^T x_{i,t} u_{i,t}' Z_i \right\| + \left\| \frac{1}{NT} \sum_{j \ne i}^N \sum_{t=1}^T x_{i,t} u_{j,t}' Z_j \right\| = O_p(N^{-1}) + O_p((NT)^{-1/2}).
$$
In view of this, (A16) and the fact that $\|T^{-1/2} F' \varepsilon_i\| = O_p(1)$, we obtain
$$
T^{-1/2} X_i' M_{\hat F} \varepsilon_i = T^{-1/2} \eta_i' M_{FH'} \varepsilon_i + O_p(\sqrt{T} N^{-1}) + O_p(N^{-1/2}).
$$
Note also how
$$
T^{-1/2} \eta_i' M_{FH'} \varepsilon_i = T^{-1/2} \eta_i' M_F \varepsilon_i = T^{-1/2} \eta_i' \varepsilon_i - T^{-1/2} \cdot T^{-1/2} \eta_i' F (T^{-1} F' F)^{-1} T^{-1/2} F' \varepsilon_i = T^{-1/2} \eta_i' \varepsilon_i + O_p(T^{-1/2}).
$$
By combining all the results obtained so far, we obtain
$$
T^{-1/2} X_i' M_{\hat F} [\varepsilon_i - D (H')^{-1} \lambda_i] = T^{-1/2} \eta_i' \varepsilon_i + O_p(\sqrt{T} N^{-1}) + O_p(N^{-1/2}) + O_p(T^{-1/2}).
$$
It remains to consider the denominator of the estimator. By (A35) and (A37),
$$
T^{-1} X_i' M_{\hat F} X_i = T^{-1} \eta_i' \eta_i + O_p(T^{-1}) + O_p(N^{-1}) + O_p((NT)^{-1/2}).
$$
This implies
$$
\sqrt{T} (\hat\beta_{C3E,i} - \beta) = (T^{-1} \eta_i' \eta_i)^{-1} T^{-1/2} \eta_i' \varepsilon_i + O_p(\sqrt{T} N^{-1}) + O_p(N^{-1/2}) + O_p(T^{-1/2}).
$$
The required result now follows from Assumptions 1 (i) and (ii), provided that $N, T \to \infty$ with $\sqrt{T}/N \to 0$. ∎
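The individual-specific estimator in (A40) can likewise be sketched numerically. This is a minimal illustration under assumed names, not the authors' code: each unit's slope is estimated from its own time series after projecting out the cross-section-average factor proxies.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, beta = 100, 500, -2.0

f = rng.normal(size=T)                   # one common factor
lam = rng.normal(1.0, 1.0, size=N)       # loadings in y
Lam = rng.normal(1.0, 1.0, size=N)       # loadings in x
x = f[:, None] * Lam[None, :] + rng.normal(size=(T, N))
y = beta * x + f[:, None] * lam[None, :] + rng.normal(size=(T, N))

Fhat = np.column_stack([y.mean(axis=1), x.mean(axis=1)])   # factor proxies
M = np.eye(T) - Fhat @ np.linalg.solve(Fhat.T @ Fhat, Fhat.T)

# Unit-by-unit estimator: beta_hat_i = (x_i' M x_i)^{-1} x_i' M y_i
beta_i_hat = np.array([(x[:, i] @ M @ y[:, i]) / (x[:, i] @ M @ x[:, i])
                       for i in range(N)])
print(round(beta_i_hat.mean(), 2))
```

Each unit-level estimate converges at the slower $\sqrt{T}$ rate, as the theorem indicates, so a long time dimension is used here.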
Proof of Corollary 1.

Write
$$
\begin{aligned}
\sqrt{NT} (\hat\beta_{BAC3E} - \beta) &= \sqrt{NT} (\hat\beta_{C3E} - \beta) - \sqrt{T} N^{-1/2} \hat\Sigma_\eta^{-1} \hat B \\
&= \sqrt{NT} (\hat\beta_{C3E} - \beta) - \sqrt{T} N^{-1/2} \Sigma_\eta^{-1} B - \sqrt{T} N^{-1/2} \hat\Sigma_\eta^{-1} (\hat B - B) - \sqrt{T} N^{-1/2} (\hat\Sigma_\eta^{-1} - \Sigma_\eta^{-1}) B.
\end{aligned} \tag{A41}
$$
Consider $\sqrt{T} N^{-1/2} \hat\Sigma_\eta^{-1} (\hat B - B)$. We begin by showing that $\|\hat C_i - (H')^{-1} C_i\| = o_p(1)$, which implies that $\hat\lambda_i$ and $\hat\Lambda_i$ in $\hat B$ are consistent. We have
$$
T^{-1} \hat F' W_i = T^{-1} \hat F' F C_i + T^{-1} \hat F' U_i = T^{-1} H F' F C_i + T^{-1} D' F C_i + T^{-1} \hat F' U_i = T^{-1} H F' F C_i + T^{-1} H F' U_i + T^{-1} D' F C_i + T^{-1} D' U_i.
$$
Clearly, $\|T^{-1} F' U_i\| = O_p(T^{-1/2})$, and by Lemma A.2, $\|T^{-1} D' F\| = O_p((NT)^{-1/2})$. Moreover, from the proof of Theorem 1, $\|T^{-1} D' U_i\|$ and $T \|(\hat F' \hat F)^{-1} - (H F' F H')^{-1}\|$ are both $O_p(N^{-1}) + O_p((NT)^{-1/2})$. It follows that
$$
\begin{aligned}
\hat C_i &= (T^{-1} \hat F' \hat F)^{-1} T^{-1} \hat F' W_i \\
&= (T^{-1} H F' F H')^{-1} (T^{-1} H F' F C_i + T^{-1} H F' U_i + T^{-1} D' F C_i + T^{-1} D' U_i) + O_p(N^{-1}) + O_p((NT)^{-1/2}) \\
&= (T^{-1} H F' F H')^{-1} T^{-1} H F' F C_i + O_p(N^{-1}) + O_p(T^{-1/2}) \\
&= (H')^{-1} C_i + O_p(N^{-1}) + O_p(T^{-1/2}).
\end{aligned} \tag{A42}
$$
Moreover, $(\hat\Sigma_\eta - \Sigma_\eta)$, $(\hat\Sigma_{\eta,i} - \Sigma_{\eta,i})$ and $(\hat\sigma_{\varepsilon,i}^2 - \sigma_{\varepsilon,i}^2)$ are all $O_p(T^{-1/2})$ (details are available upon request). This implies
$$
\|\hat B - B\| = O_p(T^{-1/2}) + O_p(N^{-1}),
$$
and therefore, with $\|\hat\Sigma_\eta^{-1}\| = O_p(1)$,
$$
\|\sqrt{T} N^{-1/2} \hat\Sigma_\eta^{-1} (\hat B - B)\| \le \sqrt{T} N^{-1/2} \|\hat\Sigma_\eta^{-1}\| \|\hat B - B\| = O_p(N^{-1/2}) + O_p(\sqrt{T} N^{-3/2}), \tag{A43}
$$
which is $o_p(1)$ under our assumption that $\sqrt{T} N^{-1} = o(1)$. Similarly, since $\|B\| = O_p(1)$ and, by a Taylor expansion, $\|\hat\Sigma_\eta^{-1} - \Sigma_\eta^{-1}\| = O_p(T^{-1/2})$,
$$
\|\sqrt{T} N^{-1/2} (\hat\Sigma_\eta^{-1} - \Sigma_\eta^{-1}) B\| \le \sqrt{T} N^{-1/2} \|\hat\Sigma_\eta^{-1} - \Sigma_\eta^{-1}\| \|B\| = O_p(N^{-1/2}). \tag{A44}
$$
Together with Theorem 1, these results imply
$$
\sqrt{NT} (\hat\beta_{BAC3E} - \beta) = \sqrt{NT} (\hat\beta_{C3E} - \beta) - \sqrt{\tau} \Sigma_\eta^{-1} B + o_p(1)
$$
as $N, T \to \infty$ with $\sqrt{T} N^{-1} \to 0$ and $\sqrt{N} T^{-1} \to 0$. Finally, note how $T/N \to \tau$ implies both $\sqrt{T} N^{-1} \to 0$ and $\sqrt{N} T^{-1} \to 0$. The Theorem 1 requirement of $T/N \to \tau$ is therefore enough also for this proof. ∎
Proof of Proposition 1.

Consider the $k_s \times 1$ vector $z_{s,i}$ of combination candidates. Let us use $\hat F^s$, $H^s$ and $Z_i^s$ to denote $\hat F$, $H$ and $Z_i$, respectively, based on estimating $s = (m+1) k_s$ factors. By using $\ln a - \ln b = \ln(a/b)$, $1/\det A = \det(A^{-1})$ and $(\det A)(\det B) = \det(AB)$, we can show that
$$
IC(s) - IC(r) = \ln \det[V(\hat F^s) V(\hat F^r)^{-1}] + (s - r) \cdot g = \ln \det(I_{m+1} + [V(\hat F^s) - V(\hat F^r)] V(\hat F^r)^{-1}) + (s - r) \cdot g. \tag{A45}
$$
We will consider two cases: $s \le r$ and $s > r$. We start with the case $s \le r$. Note that in this case all the elements of $Z_i^s$ satisfy Assumption 2. In order to emphasize this, we use $Z_{0,i}^s$ for the combination matrix in this case. Consider $V(\hat F^s) - V(\hat F^r)$, which we write as
$$
V(\hat F^s) - V(\hat F^r) = [V(\hat F^s) - V(F(H^s)')] - [V(\hat F^r) - V(F(H^r)')] + [V(F(H^s)') - V(F(H^r)')]. \tag{A46}
$$
Since $s \le r$, we have
$$
\begin{aligned}
M_{F(H^s)'} - M_{\hat F^s} &= D^s ((\hat F^s)' \hat F^s)^{-1} (D^s)' + D^s ((\hat F^s)' \hat F^s)^{-1} H^s F' + F (H^s)' ((\hat F^s)' \hat F^s)^{-1} (D^s)' \\
&\quad + F (H^s)' [((\hat F^s)' \hat F^s)^{-1} - (H^s F' F (H^s)')^{-1}] H^s F',
\end{aligned}
$$
where $D^s = \hat F^s - F(H^s)'$, suggesting that
$$
\begin{aligned}
V(\hat F^s) - V(F(H^s)') &= \frac{1}{NT} \sum_{i=1}^N W_i' (M_{F(H^s)'} - M_{\hat F^s}) W_i \\
&= \frac{1}{NT} \sum_{i=1}^N W_i' D^s ((\hat F^s)' \hat F^s)^{-1} (D^s)' W_i + \frac{1}{NT} \sum_{i=1}^N W_i' D^s ((\hat F^s)' \hat F^s)^{-1} H^s F' W_i \\
&\quad + \frac{1}{NT} \sum_{i=1}^N W_i' F (H^s)' ((\hat F^s)' \hat F^s)^{-1} (D^s)' W_i + \frac{1}{NT} \sum_{i=1}^N W_i' F (H^s)' [((\hat F^s)' \hat F^s)^{-1} - (H^s F' F (H^s)')^{-1}] H^s F' W_i.
\end{aligned}
$$
From $W_i = F C_i + U_i$ and $N^{-1} \sum_{i=1}^N C_i Z_{0,i}^s = (H^s)'$,
$$
\hat F^s = \frac{1}{N} \sum_{i=1}^N W_i Z_{0,i}^s = \frac{1}{N} \sum_{i=1}^N F C_i Z_{0,i}^s + \frac{1}{N} \sum_{i=1}^N U_i Z_{0,i}^s = F (H^s)' + \frac{1}{N} \sum_{i=1}^N U_i Z_{0,i}^s,
$$
or
$$
\hat f_t = H^s f_t + \frac{1}{N} \sum_{i=1}^N (Z_{0,i}^s)' u_{i,t}.
$$
By using this, Assumption 2 (iii) and the fact that $T^{-1} F' U_i = T^{-1} \sum_{t=1}^T f_t u_{i,t}' = O_p(T^{-1/2})$,
we obtain
$$
\begin{aligned}
T^{-1} W_i' D^s &= \frac{1}{NT} \sum_{t=1}^T \sum_{j=1}^N w_{i,t} u_{j,t}' Z_{0,j}^s = \frac{1}{NT} \sum_{t=1}^T \sum_{j=1}^N C_i' f_t u_{j,t}' Z_{0,j}^s + \frac{1}{NT} \sum_{t=1}^T \sum_{j=1}^N u_{i,t} u_{j,t}' Z_{0,j}^s \\
&= \frac{1}{NT} \sum_{t=1}^T \sum_{j=1}^N C_i' f_t u_{j,t}' Z_{0,j}^s + \frac{1}{NT} \sum_{t=1}^T u_{i,t} u_{i,t}' Z_{0,i}^s + \frac{1}{NT} \sum_{t=1}^T \sum_{j \ne i}^N u_{i,t} u_{j,t}' Z_{0,j}^s \\
&= O_p((NT)^{-1/2}) + O_p(N^{-1}).
\end{aligned}
$$
By repeated use of the same argument,
$$
T^{-1} F' W_i = \frac{1}{T} \sum_{t=1}^T f_t w_{i,t}' = \frac{1}{T} \sum_{t=1}^T f_t f_t' C_i + \frac{1}{T} \sum_{t=1}^T f_t u_{i,t}' = O_p(1) + O_p(T^{-1/2}),
$$
and
$$
\begin{aligned}
T^{-1} (\hat F^s)' \hat F^s &= T^{-1} \left( F (H^s)' + \frac{1}{N} \sum_{i=1}^N U_i Z_{0,i}^s \right)' \left( F (H^s)' + \frac{1}{N} \sum_{i=1}^N U_i Z_{0,i}^s \right) \\
&= T^{-1} H^s F' F (H^s)' + \frac{1}{NT} \sum_{i=1}^N H^s F' U_i Z_{0,i}^s + \frac{1}{NT} \sum_{i=1}^N (Z_{0,i}^s)' U_i' F (H^s)' + \frac{1}{N^2 T} \sum_{i=1}^N \sum_{j=1}^N (Z_{0,i}^s)' U_i' U_j Z_{0,j}^s \\
&= T^{-1} H^s F' F (H^s)' + O_p((NT)^{-1/2}) + O_p(N^{-1}).
\end{aligned}
$$
Note also that in the case considered here $\mathrm{rk}\, H^s = \min\{s, r\} = s$, which implies that the $s \times s$ matrix $T^{-1} H^s F' F (H^s)'$ is positive definite. Therefore,
$$
T [((\hat F^s)' \hat F^s)^{-1} - (H^s F' F (H^s)')^{-1}] = (T^{-1} (\hat F^s)' \hat F^s)^{-1} (T^{-1} H^s F' F (H^s)' - T^{-1} (\hat F^s)' \hat F^s) (T^{-1} H^s F' F (H^s)')^{-1} = O_p((NT)^{-1/2}) + O_p(N^{-1}).
$$
Hence, by putting everything together, we can show that
$$
V(\hat F^s) - V(F(H^s)') = O_p((NT)^{-1/2}) + O_p(N^{-1}), \tag{A47}
$$
which holds for all $s \le r$, including $s = r$. This implies
$$
V(\hat F^s) - V(\hat F^r) = [V(\hat F^s) - V(F(H^s)')] - [V(\hat F^r) - V(F(H^r)')] + [V(F(H^s)') - V(F(H^r)')] = [V(F(H^s)') - V(F(H^r)')] + O_p((NT)^{-1/2}) + O_p(N^{-1}). \tag{A48}
$$
By writing $M_A = I_T - P_A$ for any $A$, the remaining term in the above expression for $V(\hat F^s) - V(\hat F^r)$ becomes
$$
V(F(H^s)') - V(F(H^r)') = \frac{1}{NT} \sum_{i=1}^N W_i' (P_{F(H^r)'} - P_{F(H^s)'}) W_i,
$$
which is zero if $s = r$. If $s < r$, then $P_{F(H^r)'} = P_F$. Thus, since $P_F - P_{F(H^s)'}$ is positive semi-definite, the quadratic form $T^{-1} W_i' (P_{F(H^r)'} - P_{F(H^s)'}) W_i = T^{-1} W_i' (P_F - P_{F(H^s)'}) W_i$ is positive semi-definite too. Also, $T^{-1} W_i' (P_F - P_{F(H^s)'}) W_i = 0_{m+1}$ is equivalent to $\mathrm{tr}\, [T^{-1} W_i' (P_F - P_{F(H^s)'}) W_i] = 0$, which under Assumption 1 (iv) and (vi) can be shown to be violated asymptotically using the same arguments as in Bai and Ng (2002, Proof of Lemma 3), and Stock and Watson (1998, Proof of Theorem 2). Therefore, $V(F(H^s)') - V(F(H^r)')$ converges to a positive definite matrix, as does $V(\hat F^r)$. Suppose that $A$ is positive definite and $B$ is positive semi-definite. Then $\det(A + B) \ge \det A$, with equality if and only if $B = 0$. Making use of this result, we find that
$$
\ln \det(I_{m+1} + [V(\hat F^s) - V(\hat F^r)] V(\hat F^r)^{-1}) \to c > \ln \det I_{m+1} = 0 \tag{A49}
$$
for all $s < r$. Hence, since $g = o(1)$,
$$
IC(s) - IC(r) = \ln \det(I_{m+1} + [V(\hat F^s) - V(\hat F^r)] V(\hat F^r)^{-1}) + (s - r) \cdot g \to c > 0, \tag{A50}
$$
which in turn implies
$$
P[IC(s) - IC(r) < 0] \to 0 \tag{A51}
$$
for all $s < r$.
Consider next the case $s > r$, such that $\mathrm{rk}\, H^s = r$ ($H^s$ has full column rank). In this case, we should allow for the possibility that $z_{s,i}$ includes some or indeed all of the elements of $z_{1,i}$. Denote by $A^-$ the Moore–Penrose generalized inverse of any matrix $A$. From $(H^s)^- = ((H^s)' H^s)^{-1} (H^s)'$, we have that $(H^s)^- H^s = I_r$. We can therefore write $W_i = F (H^s)' (H^{s-})' C_i + U_i = \hat F^s (H^{s-})' C_i + E_i$, where $E_i = U_i - D^s (H^{s-})' C_i$. In this notation,
$$
V(F) = \frac{1}{NT} \sum_{i=1}^N W_i' M_F W_i = \frac{1}{NT} \sum_{i=1}^N U_i' M_F U_i,
$$
$$
\begin{aligned}
V(\hat F^s) &= \frac{1}{NT} \sum_{i=1}^N W_i' M_{\hat F^s} W_i = \frac{1}{NT} \sum_{i=1}^N E_i' M_{\hat F^s} E_i \\
&= \frac{1}{NT} \sum_{i=1}^N U_i' M_{\hat F^s} U_i - \frac{1}{NT} \sum_{i=1}^N C_i' H^{s-} (D^s)' M_{\hat F^s} U_i - \frac{1}{NT} \sum_{i=1}^N U_i' M_{\hat F^s} D^s (H^{s-})' C_i \\
&\quad + \frac{1}{NT} \sum_{i=1}^N C_i' H^{s-} (D^s)' M_{\hat F^s} D^s (H^{s-})' C_i.
\end{aligned}
$$
To evaluate the orders of $T^{-1} U_i' D^s$ and $T^{-1} (D^s)' D^s$, we need to acknowledge that $Z_i^s$ might include invalid candidates that belong to $z_{1,i}$. If $s > k_0$, then the elements after the $k_0$-th element satisfy Assumption 3 (ii) instead of Assumption 3 (i). To deal with this, we partition $T^{-1} U_i' D^s$ into two parts, such that $T^{-1} U_i' D^s = ((T^{-1} U_i' D^s)_1, (T^{-1} U_i' D^s)_2)$, where $(T^{-1} U_i' D^s)_1$ has a dimension of $(m+1) \times k_0$ and $(T^{-1} U_i' D^s)_2$ has a dimension of $(m+1) \times (s - k_0)$. We have
$$
(T^{-1} U_i' D^s)_1 = \frac{1}{NT} \sum_{t=1}^T \sum_{j=1}^N u_{i,t} u_{j,t}' Z_{0,j}^{k_0} = \frac{1}{NT} \sum_{t=1}^T u_{i,t} u_{i,t}' Z_{0,i}^{k_0} + \frac{1}{NT} \sum_{t=1}^T \sum_{j \ne i}^N u_{i,t} u_{j,t}' Z_{0,j}^{k_0} = O_p(N^{-1}) + O_p((NT)^{-1/2}),
$$
$$
(T^{-1} U_i' D^s)_2 = \frac{1}{NT} \sum_{t=1}^T \sum_{j=1}^N u_{i,t} u_{j,t}' Z_{1,j}^{s-k_0} = \frac{1}{NT} \sum_{t=1}^T u_{i,t} u_{i,t}' Z_{1,i}^{s-k_0} + \frac{1}{NT} \sum_{t=1}^T \sum_{j \ne i}^N u_{i,t} u_{j,t}' Z_{1,j}^{s-k_0} = O_p(N^{-1}) + O_p(T^{-1/2}).
$$
The last two results imply that $T^{-1} U_i' D^s = O_p(N^{-1}) + O_p(T^{-1/2})$. Similarly, we need to partition $T^{-1} (D^s)' D^s$ into four blocks:
$$
T^{-1} (D^s)' D^s = \begin{bmatrix} (T^{-1} (D^s)' D^s)_{11} & (T^{-1} (D^s)' D^s)_{12} \\ (T^{-1} (D^s)' D^s)_{21} & (T^{-1} (D^s)' D^s)_{22} \end{bmatrix},
$$
where the upper left-hand block is $k_0 \times k_0$, the upper right-hand block is $k_0 \times (s - k_0)$, the lower left-hand block is $(s - k_0) \times k_0$, and the lower right-hand block is $(s - k_0) \times (s - k_0)$. We have
$$
(T^{-1} (D^s)' D^s)_{11} = \frac{1}{N^2 T} \sum_{t=1}^T \sum_{i=1}^N \sum_{j=1}^N (Z_{0,i}^{k_0})' u_{i,t} u_{j,t}' Z_{0,j}^{k_0} = \frac{1}{N^2 T} \sum_{i=1}^N \sum_{t=1}^T (Z_{0,i}^{k_0})' u_{i,t} u_{i,t}' Z_{0,i}^{k_0} + \frac{1}{N^2 T} \sum_{t=1}^T \sum_{i=1}^N \sum_{j \ne i}^N (Z_{0,i}^{k_0})' u_{i,t} u_{j,t}' Z_{0,j}^{k_0} = O_p(N^{-1}),
$$
$$
(T^{-1} (D^s)' D^s)_{12} = \frac{1}{N^2 T} \sum_{t=1}^T \sum_{i=1}^N \sum_{j=1}^N (Z_{0,i}^{k_0})' u_{i,t} u_{j,t}' Z_{1,j}^{s-k_0} = \frac{1}{N^2 T} \sum_{i=1}^N \sum_{t=1}^T (Z_{0,i}^{k_0})' u_{i,t} u_{i,t}' Z_{1,i}^{s-k_0} + \frac{1}{N^2 T} \sum_{t=1}^T \sum_{i=1}^N \sum_{j \ne i}^N (Z_{0,i}^{k_0})' u_{i,t} u_{j,t}' Z_{1,j}^{s-k_0} = O_p(N^{-1}) + O_p((NT)^{-1/2}),
$$
$$
(T^{-1} (D^s)' D^s)_{22} = \frac{1}{N^2 T} \sum_{t=1}^T \sum_{i=1}^N \sum_{j=1}^N (Z_{1,i}^{s-k_0})' u_{i,t} u_{j,t}' Z_{1,j}^{s-k_0} = \frac{1}{N^2 T} \sum_{i=1}^N \sum_{t=1}^T (Z_{1,i}^{s-k_0})' u_{i,t} u_{i,t}' Z_{1,i}^{s-k_0} + \frac{1}{N^2 T} \sum_{t=1}^T \sum_{i=1}^N \sum_{j \ne i}^N (Z_{1,i}^{s-k_0})' u_{i,t} u_{j,t}' Z_{1,j}^{s-k_0} = O_p(N^{-1}) + O_p(T^{-1/2}),
$$
which imply that $T^{-1} (D^s)' D^s = O_p(N^{-1}) + O_p(T^{-1/2})$. Then, via Pythagoras' theorem, we have
$$
\|T^{-1} C_i' H^{s-} (D^s)' M_{\hat F^s} U_i\| \le \|T^{-1} C_i' H^{s-} (D^s)' U_i\| \le \|C_i\| \|H^{s-}\| \|T^{-1} (D^s)' U_i\| = O_p(N^{-1}) + O_p(T^{-1/2}),
$$
$$
\|T^{-1} C_i' H^{s-} (D^s)' M_{\hat F^s} D^s (H^{s-})' C_i\| \le \|T^{-1} C_i' H^{s-} (D^s)' D^s (H^{s-})' C_i\| \le \|C_i\|^2 \|H^{s-}\|^2 \|T^{-1} (D^s)' D^s\| = O_p(N^{-1}) + O_p(T^{-1/2}).
$$
It follows that
$$
V(\hat F^s) = \frac{1}{NT} \sum_{i=1}^N U_i' M_{\hat F^s} U_i + O_p(N^{-1}) + O_p(T^{-1/2}),
$$
and so we obtain
$$
V(\hat F^s) - V(F) = \frac{1}{NT} \sum_{i=1}^N U_i' (P_F - P_{\hat F^s}) U_i + O_p(T^{-1/2}) + O_p(N^{-1}).
$$
We also have
$$
\|T^{-1} U_i' P_F U_i\| \le \|T^{-1} U_i' F\|^2 \|(T^{-1} F' F)^{-1}\| = O_p(T^{-1}),
$$
and by further use of $\mathrm{tr}(AB) = \mathrm{tr}(BA)$, $\mathrm{tr}(AB) \le (\mathrm{tr}\, A)(\mathrm{tr}\, B)$ and $\mathrm{tr}(AB) \le \sum_{j=1}^r \lambda_j^e(A) \lambda_j^e(B)$, where $A$ and $B$ are normal matrices and $\lambda_j^e(A)$ is the $j$-th eigenvalue of $A$, and noting that the idempotency of $P_{\hat F^s}$ implies that $\lambda_j^e(P_{\hat F^s}) = 1$ for $j = 1, \ldots, s$, we obtain
$$
\left\| \frac{1}{NT} \sum_{i=1}^N U_i' P_{\hat F^s} U_i \right\|^2 = \mathrm{tr} \left( \frac{1}{NT} \sum_{i=1}^N U_i' P_{\hat F^s} U_i \right)^2 \le \left[ \sum_{j=1}^s \lambda_j^e \left( \frac{1}{NT} \sum_{i=1}^N U_i U_i' \right) \lambda_j^e(P_{\hat F^s}) \right]^2 \le [\lambda_{\max}^e((NT)^{-1} U U') s]^2,
$$
where $U = (U_1, \ldots, U_N)$ is $T \times N(m+1)$. Now, $\lambda_{\max}^e((NT)^{-1} U U')$ has the same form as in Bai and Ng (2006), Amengual and Watson (2006) and Amengual and Watson (2007), who show that it is $O_p(N^{-1}) + O_p(T^{-1})$. We therefore obtain
$$
V(\hat F^s) - V(F) = O_p(T^{-1/2}) + O_p(N^{-1}), \tag{A52}
$$
and so
$$
V(\hat F^s) - V(\hat F^r) = [V(\hat F^s) - V(F)] - [V(\hat F^r) - V(F)] = O_p(T^{-1/2}) + O_p(N^{-1}), \tag{A53}
$$
which leads to the following:
$$
\ln \det(I_{m+1} + [V(\hat F^s) - V(\hat F^r)] V(\hat F^r)^{-1}) = O_p(T^{-1/2}) + O_p(N^{-1}) \tag{A54}
$$
(see, for example, Paulsen, 1984, page 119). Let $C = \min\{N, \sqrt{T}\}$. Making use of the previously obtained expression for $IC(s) - IC(r)$, we obtain
$$
g^{-1} [IC(s) - IC(r)] = g^{-1} \ln \det(I_{m+1} + [V(\hat F^s) - V(\hat F^r)] V(\hat F^r)^{-1}) + (s - r).
$$
By using this, $g > 0$ and (A54), we can show that
$$
P[IC(s) - IC(r) < 0] = P[(C \cdot g)^{-1} C \cdot \ln \det(I_{m+1} + [V(\hat F^s) - V(\hat F^r)] V(\hat F^r)^{-1}) + (s - r) < 0] \to 0, \tag{A55}
$$
where the last result follows from noting that $C \cdot g \to \infty$ by assumption, $C \cdot \ln \det(I_{m+1} + [V(\hat F^s) - V(\hat F^r)] V(\hat F^r)^{-1}) = O_p(1)$ and $(s - r) > 0$ for all $s > r$. This last result, together with (A51), implies that
$$
P[IC(s) - IC(r) < 0 \,|\, s \ne r] \to 0, \tag{A56}
$$
which is equivalent to saying that
$$
P(\hat r = r) \to 1,
$$
as was to be shown. ∎
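The selection rule whose consistency is established above can be sketched numerically. The code below is a minimal illustration under assumed names, not the authors' code: with $m + 1 = 2$ observables per unit, a single valid combination already yields $s = 2 = r$ estimated factors, so the criterion should pick $\hat k_s = 1$; the penalty $g = \ln(C)/C$ is an illustrative choice satisfying $g \to 0$ and $C \cdot g \to \infty$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, m, r = 100, 100, 1, 2              # r = 2 true factors

F = rng.normal(size=(T, r))
# Loadings of w_it = (y_it, x_it)' with a full-rank mean, so combinations work.
C = np.array([[1.0, 0.5], [-0.5, 1.0]]) + 0.5 * rng.normal(size=(N, 2, 2))
W = np.einsum('tr,inr->tin', F, C) + rng.normal(size=(T, N, m + 1))

# Candidate combinations: a constant plus two random (valid) candidates.
z = np.column_stack([np.ones(N), rng.normal(0.5, 1.0, size=(N, 2))])

def V(Fhat):
    M = np.eye(T) - Fhat @ np.linalg.pinv(Fhat.T @ Fhat) @ Fhat.T
    return sum(W[:, i].T @ M @ W[:, i] for i in range(N)) / (N * T)

Cmin = min(N, np.sqrt(T))
g = np.log(Cmin) / Cmin                  # g -> 0, C * g -> infinity (illustrative)

def IC(ks):
    # Fhat^s: stack the cross-section combinations N^{-1} sum_i z_{j,i} w_it
    Fhat = np.concatenate([(z[:, j][None, :, None] * W).mean(axis=1)
                           for j in range(ks)], axis=1)
    s = (m + 1) * ks
    sign, logdet = np.linalg.slogdet(V(Fhat))
    return logdet + s * g

ks_hat = min(range(1, 4), key=IC)
print(ks_hat)
```

Adding further combinations lowers the criterion's determinant only by a term that vanishes at the rates in (A53)-(A54), while the penalty grows, so the smallest sufficient number of combinations is selected.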
Proof of Corollary 2.

The proof of Corollary 2 follows by simple manipulations of that of Proposition 1. Note in particular that the proof for the case $s \le r$ is exactly the same as in the proof of Proposition 1. When $s > r$, under the condition that $Z_i^s = Z_{0,i}^s$, the orders of $T^{-1} U_i' D^s$ and $T^{-1} (D^s)' D^s$ become equal to the orders of $(T^{-1} U_i' D^s)_1$ and $(T^{-1} (D^s)' D^s)_{11}$, respectively:
$$
T^{-1} U_i' D^s = \frac{1}{NT} \sum_{t=1}^T u_{i,t} u_{i,t}' Z_{0,i}^s + \frac{1}{NT} \sum_{t=1}^T \sum_{j \ne i}^N u_{i,t} u_{j,t}' Z_{0,j}^s = O_p(N^{-1}) + O_p((NT)^{-1/2}),
$$
$$
T^{-1} (D^s)' D^s = \frac{1}{N^2 T} \sum_{i=1}^N \sum_{t=1}^T (Z_{0,i}^s)' u_{i,t} u_{i,t}' Z_{0,i}^s + \frac{1}{N^2 T} \sum_{t=1}^T \sum_{i=1}^N \sum_{j \ne i}^N (Z_{0,i}^s)' u_{i,t} u_{j,t}' Z_{0,j}^s = O_p(N^{-1}).
$$
This last result, together with the result $\|T^{-1} U_i' P_F U_i\| = O_p(T^{-1})$, implies
$$
V(\hat F^s) - V(\hat F^r) = O_p((NT)^{-1/2}) + O_p(N^{-1}) + O_p(T^{-1}).
$$
The order of $\ln \det(I_{m+1} + [V(\hat F^s) - V(\hat F^r)] V(\hat F^r)^{-1})$ is the same. Therefore, by letting $C = \min\{\sqrt{N}, \sqrt{T}\}$ and using the same trick as in the proof of Proposition 1, we can show that
$$
P[IC(s) - IC(r) < 0] = P[(C^2 \cdot g)^{-1} C^2 \cdot \ln \det(I_{m+1} + [V(\hat F^s) - V(\hat F^r)] V(\hat F^r)^{-1}) + (s - r) < 0], \tag{A57}
$$
which is $o(1)$ because $C^2 \cdot g \to \infty$, $C^2 \cdot \ln \det(I_{m+1} + [V(\hat F^s) - V(\hat F^r)] V(\hat F^r)^{-1}) = O_p(1)$ and $(s - r) > 0$ for all $s > r$. Hence, provided that the rate of shrinking of $g$ is slow enough, the consistency of $\hat r$ is unaffected by the correlation between $u_{i,t}$ and $Z_i^s$. ∎
Table A: Description of the experiments.

E1 (r = 2):
  y_{i,t} = β_i x_{i,t} + λ_{1i} f_{1t} + λ_{2i} f_{2t} + ε_{i,t},  x_{i,t} = Λ_{1i} f_{1t} + Λ_{2i} f_{2t} + η_{i,t}
  λ_{1i} = 4 z_{2i} + τ_{λ1i},  λ_{2i} = −2 z_{2i} + τ_{λ2i},  Λ_{1i} = 2 z_{2i} + τ_{Λ1i},  Λ_{2i} = z_{2i} + τ_{Λ2i}

E2 (r = 2): same y and x equations as E1;
  λ_{1i} ∼ N(1, 1),  λ_{2i} ∼ N(2, 1),  Λ_{1i} ∼ N(1, 1),  Λ_{2i} ∼ N(−1, 1)

E3 (r = 2): same y and x equations as E1;
  λ_{1i} = z_{2i} + τ_{λ1i},  λ_{2i} = 2 z_{2i} + τ_{λ2i},  Λ_{1i} = z_{3i} + τ_{Λ1i},  Λ_{2i} = 2 z_{3i} + τ_{Λ2i}

E4 (r = 2): same y and x equations as E1;
  λ_{1i} = 2 z_{2i} + τ_{λ1i},  λ_{2i} = 0.5 z_{2i} + τ_{λ2i},  Λ_{1i} = 2 z_{2i} + τ_{Λ1i},  Λ_{2i} = 0.5 z_{2i} + τ_{Λ2i}

E5 (r = 4):
  y_{i,t} = β_i x_{i,t} + λ_{1i} f_{1t} + λ_{2i} f_{2t} + λ_{3i} f_{3t} + λ_{4i} f_{4t} + ε_{i,t}
  x_{i,t} = Λ_{1i} f_{1t} + Λ_{2i} f_{2t} + Λ_{3i} f_{3t} + Λ_{4i} f_{4t} + η_{i,t}
  λ_{1i} = 4 z_{2i} + 2 z_{3i} + 0.2 z_{4i} + τ_{λ1i},  λ_{2i} = −2 z_{2i} + z_{3i} + τ_{λ2i}
  λ_{3i} = −z_{2i} + 2 z_{3i} + 0.1 z_{4i} + τ_{λ3i},  λ_{4i} = −2 z_{2i} − z_{3i} + τ_{λ4i}
  Λ_{1i} = 2 z_{2i} + z_{3i} + 0.2 z_{4i} + τ_{Λ1i},  Λ_{2i} = z_{2i} + 0.5 z_{3i} + τ_{Λ2i}
  Λ_{3i} = 2 z_{2i} + 0.5 z_{3i} + τ_{Λ3i},  Λ_{4i} = z_{2i} + z_{3i} + 0.2 z_{4i} + τ_{Λ4i}

E6 (r = 2):
  y_{i,t} = β x_{i,t} + λ_{1i} f_{1t} + λ_{2i} f_{2t} + ε_{i,t},  x_{i,t} = Λ_{1i} f_{1t} + Λ_{2i} f_{2t} + η_{i,t}
  λ_{1i} = z_{1i} + 4 z_{2i} + τ_{λ1i},  λ_{2i} = z_{1i} − 2 z_{2i} + z_{3i} + τ_{λ2i}
  Λ_{1i} = z_{1i} + 2 z_{2i} + z_{3i} + τ_{Λ1i},  Λ_{2i} = z_{1i} + z_{2i} + τ_{Λ2i}

Notes: The following specifications are kept constant across the experiments: β_i ∼ N(−2, 0.25), β = −2, (f_{1,t}, f_{2,t}, f_{3,t}, f_{4,t}, η_{i,t}, ε_{i,t}) ∼ N(0_{6×1}, I_6) and (τ_{λ1i}, τ_{λ2i}, τ_{Λ1i}, τ_{Λ2i})′ ∼ N(0_{4×1}, 0.25 · I_4). The combinations are z_{1i} = 1, z_{2i} ∼ N(0.5, 1), z_{3i} ∼ N(−0.4, 1), z_{4i} ∼ N(0.2, 1), z_{5i} ∼ N(0.5, 1) and z_{6i} ∼ N(0.1, 1).
Table E1: All conditions of CCE and C3E are satisfied.

    N     T |                Bias × 100                 |                  Size
            |    LS     PC    CCE   C3E1   C3E2   C3E3  |    LS     PC    CCE   C3E1   C3E2   C3E3
   30    30 | 94.79  10.57   1.63   0.06   0.06   0.06  |  99.5   36.7   14.8    6.5    6.5    7.3
   50    30 | 95.59   6.32   0.33   0.01   0.01   0.05  |  99.8   26.5    9.9    5.8    5.8    6.9
  100    30 | 95.56   3.10   0.01  −0.01  −0.01   0.01  |  99.9   14.9    6.7    5.2    5.2    6.2
  200    30 | 95.89   1.57   0.01   0.00   0.00   0.01  | 100.0   10.3    5.6    5.1    5.1    5.6
   30    50 | 94.99   9.97   1.38   0.01   0.01   0.05  |  99.9   38.6   14.9    6.5    6.5    7.7
   50    50 | 95.62   5.96   0.26  −0.03  −0.03   0.04  | 100.0   26.4   10.2    6.5    6.5    6.9
  100    50 | 95.89   2.88  −0.09  −0.11  −0.11  −0.10  | 100.0   14.8    6.7    5.5    5.5    6.3
  200    50 | 96.32   1.52   0.03   0.04   0.04   0.03  | 100.0   11.6    5.6    5.0    5.0    5.6
   30   100 | 95.76   9.64   0.87   0.00   0.00   0.05  | 100.0   40.1   13.5    6.5    6.5    7.3
   50   100 | 96.06   5.75   0.15  −0.05  −0.05  −0.01  | 100.0   27.9    9.8    6.1    6.1    6.8
  100   100 | 96.00   2.90   0.06   0.02   0.02   0.04  | 100.0   17.5    6.9    5.9    5.9    6.5
  200   100 | 96.51   1.41  −0.02  −0.03  −0.03  −0.02  | 100.0   10.8    5.2    5.0    5.0    5.2
   30   200 | 95.78   9.56   1.39   0.03   0.03   0.07  | 100.0   42.3   14.9    6.6    6.6    7.6
   50   200 | 95.92   5.70   0.16  −0.01  −0.01   0.04  | 100.0   30.2    9.3    5.8    5.8    6.3
  100   200 | 96.41   2.82  −0.01  −0.02  −0.02   0.00  | 100.0   17.3    5.8    5.4    5.4    5.6
  200   200 | 96.58   1.47   0.05   0.05   0.05   0.05  | 100.0   11.9    5.6    5.5    5.5    5.6
   93    30 | 95.84   3.44   0.12   0.07   0.07   0.10  |  99.8   17.3    7.6    6.1    6.1    6.8
  184    50 | 96.02   1.59  −0.02  −0.03  −0.03  −0.02  | 100.0   11.3    6.3    5.5    5.5    6.3
  464   100 | 96.42   0.59  −0.03  −0.03  −0.03  −0.03  | 100.0    7.6    5.3    5.2    5.2    5.3
 1169   200 | 96.50   0.25   0.00   0.00   0.00   0.00  | 100.0    5.6    4.6    4.6    4.6    4.6

Notes: "LS", "PC" and "CCE" refer to the LS, principal components and CCE estimators, respectively. "C3E1" refers to the C3E estimator based on the "true" combinations, and "C3E2" refers to the C3E estimator based on IC-selected combinations. "C3E3" is C3E with a vector of ones as a must-have combination.
Table E2: The combinations are uncorrelated with the loadings.

    N     T |                Bias × 100                 |                  Size
            |    LS     PC    CCE   C3E1   C3E2   C3E3  |    LS     PC    CCE   C3E1   C3E2   C3E3
   30    30 | −19.36   7.46   0.05  −0.93  −0.27   0.05 |  39.0   25.8    6.8   16.1    9.0    6.8
   50    30 | −19.94   4.40   0.00  −0.06  −0.23   0.00 |  54.5   17.4    6.0   12.9    7.6    6.0
  100    30 | −19.92   2.12  −0.01   0.00  −0.09  −0.01 |  69.6   10.0    5.3    8.1    6.6    5.3
  200    30 | −19.66   1.06   0.00   0.00   0.00   0.00 |  80.5    7.6    5.2    5.9    5.8    5.2
   30    50 | −20.07   7.33   0.00  −0.96  −0.28   0.00 |  40.1   29.6    6.4   15.7    8.3    6.4
   50    50 | −20.09   4.29  −0.02   0.00  −0.22  −0.02 |  54.9   19.4    6.3   12.7    8.0    6.3
  100    50 | −19.92   2.01  −0.10  −0.06  −0.12  −0.10 |  72.6   10.8    5.4    8.3    7.1    5.4
  200    50 | −19.75   1.09   0.04   0.06   0.05   0.04 |  84.9    8.6    5.2    5.8    5.7    5.2
   30   100 | −19.92   7.27   0.01  −0.84  −0.30   0.01 |  40.1   32.2    6.5   15.4    8.9    6.5
   50   100 | −20.11   4.21  −0.05  −0.17  −0.24  −0.05 |  55.5   20.2    6.2   12.7    8.0    6.2
  100   100 | −20.08   2.11   0.01   0.06   0.00   0.01 |  77.9   12.7    5.9    8.5    7.4    5.9
  200   100 | −19.97   1.01  −0.03  −0.02  −0.02  −0.03 |  91.9    8.5    5.0    5.5    5.5    5.0
   30   200 | −19.57   7.30   0.05  −0.70  −0.25   0.05 |  38.1   34.9    6.7   17.9    9.5    6.7
   50   200 | −20.05   4.25  −0.01  −0.27  −0.25  −0.01 |  55.9   23.0    5.5   12.8    7.6    5.5
  100   200 | −20.00   2.06  −0.02  −0.04  −0.07  −0.02 |  80.4   12.9    5.1    7.5    6.3    5.1
  200   200 | −19.92   1.09   0.05   0.06   0.05   0.05 |  95.3    9.6    5.3    6.2    6.1    5.3
   93    30 | −19.52   2.37   0.06   0.08   0.02   0.06 |  67.2   11.7    6.0    9.3    7.6    6.0
  184    50 | −19.75   1.11  −0.04  −0.02  −0.03  −0.04 |  83.8    8.9    5.5    6.5    6.4    5.5
  464   100 | −19.93   0.42  −0.03  −0.03  −0.03  −0.03 |  97.6    6.7    5.3    5.1    5.1    5.3
 1169   200 | −20.03   0.18   0.00   0.00   0.00   0.00 | 100.0    5.1    4.7    4.7    4.7    4.7

Notes: See Table E1 for an explanation.
Table E3: Condition (6) is not satisfied but loadings are independent.

    N     T |                Bias × 100                 |                  Size
            |    LS     PC    CCE   C3E1   C3E2   C3E3  |    LS     PC    CCE   C3E1   C3E2   C3E3
   30    30 | −13.21   5.57   0.15   0.10   0.05   0.05 |  14.4   17.8    7.0    6.7    7.9    8.3
   50    30 | −13.61   3.24   0.06   0.09   0.05   0.06 |  20.2   12.6    6.1    5.9    6.0    6.5
  100    30 | −13.66   1.56  −0.02  −0.03   0.00   0.01 |  32.3    8.0    5.6    5.3    5.2    5.4
  200    30 | −13.43   0.78   0.04   0.03   0.02   0.03 |  55.1    6.7    5.1    5.1    5.1    5.0
   30    50 | −13.79   5.28   0.07   0.09   0.08   0.05 |  15.6   18.8    6.6    6.3    6.6    7.1
   50    50 | −13.69   3.08   0.00  −0.01   0.01  −0.01 |  20.1   13.2    5.8    5.8    6.1    6.1
  100    50 | −13.74   1.42  −0.09  −0.08  −0.08  −0.09 |  32.6    8.3    5.5    5.6    5.6    5.4
  200    50 | −13.47   0.79   0.06   0.06   0.07   0.05 |  55.0    6.8    5.7    5.3    5.3    5.5
   30   100 | −13.64   5.15   0.08   0.08   0.05   0.06 |  15.2   19.7    7.0    6.1    6.6    7.2
   50   100 | −13.43   2.98  −0.08  −0.10  −0.10  −0.08 |  19.8   12.8    6.6    6.2    6.2    6.6
  100   100 | −13.48   1.50   0.00  −0.01  −0.01   0.00 |  31.7    9.7    6.2    6.0    5.9    6.3
  200   100 | −13.66   0.71  −0.05  −0.05  −0.05  −0.05 |  56.2    6.7    5.2    4.9    4.9    5.0
   30   200 | −13.58   5.14   0.09   0.08   0.07   0.07 |  14.4   21.2    6.9    6.9    7.6    8.3
   50   200 | −13.20   3.00   0.06   0.02   0.01   0.02 |  18.3   14.3    6.0    5.8    5.9    6.2
  100   200 | −13.85   1.45  −0.02  −0.01  −0.01  −0.01 |  32.7    9.1    5.9    5.6    5.8    5.9
  200   200 | −13.79   0.78   0.03   0.03   0.04   0.03 |  56.6    7.9    5.5    5.7    5.8    5.6
   93    30 | −13.32   1.77   0.14   0.13   0.11   0.11 |  29.9    9.0    6.2    6.3    6.6    6.6
  184    50 | −13.69   0.79  −0.01   0.01   0.00  −0.02 |  53.1    7.2    5.3    5.4    5.5    5.2
  464   100 | −13.70   0.29  −0.01   0.00   0.00  −0.01 |  89.1    6.1    5.4    5.2    5.1    5.4
 1169   200 | −13.69   0.13   0.01   0.01   0.01   0.00 |  99.9    4.8    4.7    4.8    4.7    4.8

Notes: See Table E1 for an explanation.
Table E4: rk C < m = k + 1 and loadings are not independent.

    N     T |                Bias × 100                 |                  Size
            |    LS     PC    CCE   C3E1   C3E2   C3E3  |    LS     PC    CCE   C3E1   C3E2   C3E3
   30    30 | 76.45   9.00   9.52   0.07   0.05   0.88  | 100.0   31.1   28.7    6.6    7.5    9.6
   50    30 | 76.78   5.31   5.21   0.02   0.01   1.30  | 100.0   21.6   20.5    5.9    6.0    8.8
  100    30 | 76.98   2.60   2.27  −0.03  −0.02   1.17  | 100.0   12.2   11.5    5.1    5.4    8.3
  200    30 | 77.11   1.31   1.11   0.01   0.01   0.72  | 100.0    9.1    7.6    5.2    5.3    6.7
   30    50 | 76.83   8.66   9.41   0.00   0.00   1.52  | 100.0   33.9   29.9    7.0    7.1   10.1
   50    50 | 77.08   5.09   5.23  −0.01   0.00   1.85  | 100.0   22.2   22.5    6.8    6.7   11.2
  100    50 | 77.19   2.43   2.25  −0.08  −0.06   1.39  | 100.0   13.0   12.2    5.5    5.4    9.0
  200    50 | 77.42   1.30   1.08   0.01   0.01   0.83  | 100.0   10.2    8.5    5.5    5.6    7.6
   30   100 | 77.04   8.42   9.04   0.05   0.03   1.60  | 100.0   36.3   29.7    6.5    6.9   10.5
   50   100 | 77.29   4.96   5.02  −0.07  −0.08   1.70  | 100.0   24.1   21.6    6.0    6.0    9.9
  100   100 | 77.40   2.49   2.28   0.00   0.00   1.46  | 100.0   15.4   13.7    5.9    6.0   10.3
  200   100 | 77.64   1.21   1.02  −0.05  −0.05   0.78  | 100.0    9.7    8.4    5.0    5.0    7.4
   30   200 | 77.07   8.38   9.44   0.02   0.04   1.04  | 100.0   39.0   31.1    6.6    7.0    9.5
   50   200 | 77.34   4.96   5.25   0.02   0.01   1.45  | 100.0   26.4   23.2    5.7    5.7    9.6
  100   200 | 77.59   2.44   2.24  −0.01  −0.01   1.26  | 100.0   15.2   12.9    5.4    5.4    8.9
  200   200 | 77.85   1.27   1.16   0.09   0.08   0.82  | 100.0   11.1    9.4    6.1    6.1    8.3
   93    30 | 77.03   2.89   2.60   0.10   0.08   1.34  | 100.0   14.1   13.2    5.8    5.9    9.3
  184    50 | 77.40   1.34   1.16  −0.01  −0.02   0.89  | 100.0   10.1    8.6    5.2    5.1    7.9
  464   100 | 77.68   0.50   0.46   0.00   0.00   0.36  | 100.0    7.4    6.4    4.9    4.8    5.8
 1169   200 | 77.88   0.21   0.18  −0.01  −0.01   0.14  | 100.0    5.3    5.2    4.9    4.9    5.2

Notes: See Table E1 for an explanation.
Table E5: rk C = m < k + 1 and loadings are not independent.

    N     T |                 Bias × 100                 |                  Size
            |    LS      PC     CCE   C3E1   C3E2   C3E3 |    LS     PC    CCE   C3E1   C3E2   C3E3
   30    30 | 23.17   17.30  −11.38   0.13   0.19   0.17 |  48.4   74.8   77.6    7.0    8.5    9.3
   50    30 | 22.79   10.27  −21.85   0.08   0.08   0.06 |  58.1   56.2   86.7    5.9    6.2    6.2
  100    30 | 23.47    5.06  −37.03   0.01   0.02   0.00 |  70.8   33.1   94.7    5.2    5.2    5.6
  200    30 | 23.38    2.50  −44.21   0.00   0.00  −0.11 |  79.0   18.6   98.4    5.1    5.1    5.6
   30    50 | 22.99   16.40  −11.52   0.00   0.02   0.03 |  46.4   80.2   79.4    7.4    8.3    8.5
   50    50 | 23.79    9.68  −23.26  −0.01  −0.02   0.00 |  57.9   60.7   87.9    6.5    6.5    7.1
  100    50 | 23.71    4.80  −37.51  −0.02  −0.02  −0.04 |  70.9   36.0   95.1    6.1    6.1    6.3
  200    50 | 23.65    2.40  −45.20   0.00   0.00  −0.19 |  80.8   20.1   98.6    5.4    5.4    6.1
   30   100 | 23.41   15.64  −11.96  −0.27  −0.27  −0.25 |  44.9   82.6   80.1    6.9    7.7    8.1
   50   100 | 23.85    9.40  −22.62  −0.01  −0.01   0.01 |  59.1   65.7   89.3    6.3    6.2    6.5
  100   100 | 24.14    4.64  −37.08  −0.02  −0.02  −0.04 |  75.4   39.1   95.4    5.8    5.8    5.9
  200   100 | 23.64    2.31  −44.66  −0.01  −0.01  −0.20 |  85.5   21.9   98.6    5.7    5.7    6.2
   30   200 | 23.58   15.53  −10.71  −0.10  −0.09  −0.09 |  44.0   85.6   79.4    8.1    8.8    9.3
   50   200 | 23.94    9.23  −24.11  −0.07  −0.07  −0.07 |  59.9   67.6   88.5    7.2    7.2    7.9
  100   200 | 23.81    4.49  −37.55  −0.12  −0.12  −0.12 |  80.5   38.9   95.3    5.3    5.3    5.7
  200   200 | 24.16    2.31  −45.45   0.03   0.03  −0.09 |  91.9   24.4   99.0    5.7    5.7    6.1
   93    30 | 23.48    5.45  −35.43   0.01   0.01   0.01 |  68.6   35.4   93.4    5.9    5.9    6.2
  184    50 | 23.88    2.54  −44.31  −0.08  −0.08  −0.26 |  80.1   21.2   98.5    5.6    5.6    6.2
  464   100 | 24.09    0.99  −49.21  −0.01  −0.01  −0.20 |  92.1   12.7   99.9    5.9    5.9    7.2
 1169   200 | 24.23    0.39  −51.35   0.00   0.00  −0.05 |  98.7    8.2  100.0    5.6    5.6    6.0

Notes: See Table E1 for an explanation.
Table E6: Bias and bias-adjustment in the homogeneous slope case.

                                   Bias × 100
    N     T |   CCE  BACCE   C3E1  BAC3E1   C3E2  BAC3E2   C3E3  BAC3E3
   30    30 |  0.22  −1.10   0.71    0.10   0.71    0.10  −0.11    0.00
   50    30 | −0.25  −0.19   0.40    0.03   0.40    0.03  −0.21   −0.05
  100    30 | −0.14   0.00   0.22    0.03   0.22    0.03  −0.14    0.01
  200    30 | −0.06   0.03   0.13    0.04   0.13    0.04  −0.06    0.03
   30    50 |  0.13  −1.12   0.63    0.01   0.63    0.01  −0.25   −0.11
   50    50 | −0.26  −0.23   0.37    0.00   0.37    0.00  −0.27   −0.09
  100    50 | −0.19  −0.05   0.17   −0.01   0.17   −0.01  −0.19   −0.05
  200    50 | −0.11  −0.03   0.08   −0.02   0.08   −0.02  −0.11   −0.03
   30   100 |  0.08  −1.26   0.61    0.02   0.61    0.02  −0.26   −0.12
   50   100 | −0.28  −0.27   0.35   −0.02   0.35   −0.02  −0.31   −0.12
  100   100 | −0.18  −0.04   0.19    0.00   0.19    0.00  −0.18   −0.04
  200   100 | −0.09  −0.01   0.09    0.00   0.09    0.00  −0.09   −0.01
   30   200 |  0.13  −1.02   0.64    0.04   0.64    0.04  −0.18   −0.08
   50   200 | −0.25  −0.18   0.41    0.04   0.41    0.04  −0.21   −0.04
  100   200 | −0.18  −0.04   0.18   −0.01   0.18   −0.01  −0.19   −0.04
  200   200 | −0.09   0.00   0.10    0.00   0.10    0.00  −0.09    0.00
   93    30 | −0.16  −0.01   0.22    0.02   0.22    0.02  −0.16   −0.01
  184    50 | −0.12  −0.02   0.08   −0.02   0.08   −0.02  −0.12   −0.02
  464   100 | −0.04   0.00   0.04    0.00   0.04    0.00  −0.04    0.00
 1169   200 | −0.02   0.00   0.01    0.00   0.01    0.00  −0.02    0.00

Notes: "BACCE", "BAC3E1", "BAC3E2" and "BAC3E3" refer to the bias-adjusted versions of the CCE, C3E1, C3E2 and C3E3 estimators, respectively. See Table E1 for an explanation of the rest.
Table B: Frequency count of the required number of combinations.

            |     E1      |     E2      |     E3      |     E4      |     E5
    N     T | C3E2  C3E3  | C3E2  C3E3  | C3E2  C3E3  | C3E2  C3E3  | C3E2  C3E3
   30    30 | 1.00  0.26  | 0.41  1.00  | 0.57  0.23  | 0.67  0.24  | 0.07  0.00
   50    30 | 1.00  0.49  | 0.62  1.00  | 0.68  0.43  | 0.74  0.43  | 0.61  0.00
  100    30 | 1.00  0.89  | 0.89  1.00  | 0.75  0.62  | 0.77  0.63  | 1.00  0.01
  200    30 | 1.00  1.00  | 1.00  1.00  | 0.76  0.71  | 0.77  0.71  | 1.00  0.16
   30    50 | 1.00  0.34  | 0.50  1.00  | 0.74  0.40  | 0.83  0.39  | 0.22  0.00
   50    50 | 1.00  0.59  | 0.69  1.00  | 0.80  0.58  | 0.86  0.57  | 0.92  0.00
  100    50 | 1.00  0.94  | 0.92  1.00  | 0.83  0.75  | 0.86  0.74  | 1.00  0.03
  200    50 | 1.00  1.00  | 1.00  1.00  | 0.86  0.82  | 0.86  0.81  | 1.00  0.30
   30   100 | 1.00  0.33  | 0.52  1.00  | 0.74  0.39  | 0.83  0.39  | 0.20  0.00
   50   100 | 1.00  0.58  | 0.69  1.00  | 0.81  0.57  | 0.87  0.55  | 0.95  0.00
  100   100 | 1.00  0.94  | 0.93  1.00  | 0.85  0.75  | 0.87  0.74  | 1.00  0.02
  200   100 | 1.00  1.00  | 1.00  1.00  | 0.86  0.83  | 0.88  0.83  | 1.00  0.30
   30   200 | 1.00  0.25  | 0.45  1.00  | 0.65  0.28  | 0.78  0.27  | 0.05  0.00
   50   200 | 1.00  0.50  | 0.65  1.00  | 0.75  0.47  | 0.81  0.47  | 0.88  0.00
  100   200 | 1.00  0.92  | 0.90  1.00  | 0.80  0.68  | 0.83  0.67  | 1.00  0.01
  200   200 | 1.00  1.00  | 1.00  1.00  | 0.83  0.76  | 0.83  0.75  | 1.00  0.19
   93    30 | 1.00  0.86  | 0.87  1.00  | 0.74  0.63  | 0.77  0.62  | 1.00  0.01
  184    50 | 1.00  1.00  | 0.99  1.00  | 0.86  0.81  | 0.87  0.82  | 1.00  0.26
  464   100 | 1.00  1.00  | 1.00  1.00  | 0.87  0.86  | 0.88  0.85  | 1.00  0.87
 1169   200 | 1.00  1.00  | 1.00  1.00  | 0.84  0.83  | 0.84  0.82  | 1.00  1.00

Notes: "E1"-"E5" refer to the experiments described in Table A. The numbers in the table are the fraction of times that the selected number of combinations was equal to the required number. See Table E1 for an explanation of the rest.