Faculty of Business and Law Centre for Financial Econometrics (Department of Finance)
Financial Econometrics Series
SWP 2015/16
CCE Estimation of Factor-Augmented Regression Models with More Factors than
Observables
H. Karabiyik, J-P. Urbain and J. Westerlund
The working papers are a series of manuscripts in their draft form. Please do not quote without obtaining the author's consent. The views expressed in this paper are those of the author and not necessarily endorsed by the School or IBISWorld Pty Ltd.
CCE ESTIMATION OF FACTOR-AUGMENTED REGRESSION
MODELS WITH MORE FACTORS THAN OBSERVABLES
Hande Karabiyik, Lund University
Jean-Pierre Urbain, Maastricht University
Joakim Westerlund∗, Lund University
and Centre for Economics and Financial Econometrics Research, Deakin University
October 16, 2015
Abstract
This paper considers estimation of factor-augmented panel data regression models.
One of the most popular approaches towards this end is the common correlated effects
(CCE) estimator of Pesaran (Estimation and inference in large heterogeneous panels with
a multifactor error structure. Econometrica 74, 967–1012, 2006). For the pooled version of
this estimator to be consistent, either the number of observables must be larger than
the number of unobserved common factors, or the factor loadings must be distributed
independently of each other. This is a problem in the typical application involving only
a small number of regressors and/or correlated loadings. The current paper proposes
a simple extension to the CCE procedure by which both requirements can be relaxed.
The CCE approach is based on taking the cross-section average of the observables as an
estimator of the common factors. The idea put forth in the current paper is to consider
not only the average but also other cross-section combinations. Asymptotic properties
of the resulting combination-augmented CCE (C3E) estimator are provided and verified
in small samples using Monte Carlo simulations.
JEL Classification: C12; C13; C33.
Keywords: Factor-augmented panel regressions; common factor models; principal components; cross-sectional averages; cross-sectional dependence.
∗Corresponding author: Department of Economics, Lund University, Box 7082, 220 07 Lund, Sweden. Telephone: +46 46 222 8997. Fax: +46 46 222 4613. E-mail address: [email protected].
1 Introduction
Consider the scalar and m × 1 vector of observable panel data variables yi,t and xi,t, where i = 1, ..., N and t = 1, ..., T index the cross-sectional and time series dimensions, respectively. The data generating process (DGP) of the T × 1 vector yi = (yi,1, ..., yi,T)′ is similar to the DGP of Pesaran (2006), and is given by
yi = Xiβi + ei, (1)
ei = Fλi + εi, (2)
βi = β + ξi, (3)
where Xi = (xi,1, ..., xi,T)′ is T × m, βi is an m × 1 vector of slope coefficients, F = (f1, ..., fT)′ is a T × r matrix of common factors with λi being the associated r × 1 vector of factor loadings, εi = (εi,1, ..., εi,T)′ is a T × 1 vector of errors that are largely idiosyncratic, and ξi is an m × 1
vector of errors. If the model includes unit-specific fixed effects, then yi, Xi, ei, F and εi are
simply the correspondingly (time) demeaned variables.
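To fix ideas, the DGP in (1)–(4) is straightforward to simulate. The sketch below is ours, not the paper's: the Gaussian errors, the particular dimensions, and the nonzero-mean loadings (chosen so that cross-section averages retain factor information) are illustrative assumptions.

```python
import numpy as np

# Illustrative simulation of the DGP in (1)-(4); all distributional
# choices below are assumptions of this sketch, not of the paper.
rng = np.random.default_rng(0)
N, T, m, r = 50, 100, 2, 2          # units, periods, regressors, factors
beta = np.array([1.0, -0.5])        # common slope (homogeneous case, xi_i = 0)

F = rng.standard_normal((T, r))     # common factors f_t
Y = np.empty((N, T))
X = np.empty((N, T, m))
for i in range(N):
    lam_i = np.array([1.0, 1.0]) + 0.5 * rng.standard_normal(r)   # lambda_i in (2)
    Lam_i = np.eye(m, r) + 0.5 * rng.standard_normal((m, r))      # Lambda_i in (4)
    X[i] = F @ Lam_i.T + rng.standard_normal((T, m))              # eq. (4)
    Y[i] = X[i] @ beta + F @ lam_i + rng.standard_normal(T)       # eqs. (1)-(2)
```

Note that the correlation between Xi and F here comes entirely through the common loadings in (4), which is exactly the case in which plain pooled LS on (1) breaks down.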
The above model is the prototypical pooled panel regression with a factor error structure,
in which εi is independent of Xi. If F is also independent of Xi, then (1) is nothing but a static
panel data regression with exogenous regressors, which can be estimated consistently using
least squares (LS). If, however, Xi is correlated with F, then consistency will be lost. To allow
for this possibility, we follow Pesaran (2006) and assume that
Xi = FΛ′i + ηi, (4)
where Λi is a m× r loading matrix and ηi = (ηi,1, ..., ηi,T)′ is a T ×m matrix of idiosyncratic
errors. By combining (1)–(4),
Wi = FCi + Ui, (5)

where Wi = (wi,1, ..., wi,T)′ is T × (m + 1), wi,t = (yi,t, x′i,t)′ is (m + 1) × 1, Ci = (Λ′iβi + λi, Λ′i) is r × (m + 1), and Ui = (ui,1, ..., ui,T)′ = (ηiβi + εi, ηi) is T × (m + 1). Thus, (1)–(4)
can be rewritten equivalently as a static factor model for Wi, which is convenient because it
means that the common component of the data can be estimated using existing methods for
such models (see Chudik and Pesaran, 2013b, for a recent survey).1 In this paper, however,
¹ In Section 5 of the present paper we present some Monte Carlo results that enable comparison with the principal components-based estimator of Bai (2009), which is arguably the closest competitor of the CCE approach.
we focus on the CCE approach of Pesaran (2006), which has become very popular in the empirical literature with a large number of applications. The approach has also attracted much interest in the econometric literature, where it has been shown to work under very general conditions, including models with weak factors, dynamic models and even models with non-stationary data (see, for example, Chudik et al., 2011; Chudik and Pesaran, 2013a; Kapetanios et al., 2011; Pesaran et al., 2013; Reese and Westerlund, 2015a; Reese and Westerlund, 2015b).
As is well known from the classical common factor literature, F and Ci are not separately
identifiable, suggesting that the best that one can hope for is consistent estimation of the
space spanned by F. The idea of Pesaran (2006) is to make use of the cross-section variation
to estimate this space. A natural way to accomplish this is to take the cross-section average, giving w̄t = C̄′ft + ūt, where C̄, w̄t and ūt are the cross-section averages of Ci, wi,t and ui,t, respectively. Hence, since ūt →p 0(m+1)×1 as N → ∞, where →p signifies convergence in probability, we have that w̄t = C̄′ft + ūt →p C̄′ft. This suggests using w̄t as an estimator of C̄′ft, a strategy that would seem to require

rk C̄′ = r ≤ m + 1, (6)

where rk A denotes the rank of any matrix A. Hence, the number of observables must be at least as large as the number of factors. The idea behind the CCE approach is then to estimate β from a pooled LS regression of yi,t onto xi,t and w̄t, leading to the pooled CCE (CCEP) estimator.²
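Computationally, the CCEP recipe is just two steps: form the cross-section averages w̄t = (ȳt, x̄′t)′ and run pooled LS with these averages partialled out. The sketch below is our own minimal implementation (numpy only; the N × T and N × T × m data layout is a convention of this sketch, not of the paper).

```python
import numpy as np

def ccep(Y, X):
    """Pooled CCE: LS of y_it on x_it augmented with the cross-section
    averages w_bar_t = (y_bar_t, x_bar_t')' (Pesaran, 2006).
    Y is N x T, X is N x T x m."""
    N, T, m = X.shape
    W_bar = np.column_stack([Y.mean(axis=0), X.mean(axis=0)])  # T x (m+1)
    Q, _ = np.linalg.qr(W_bar)                                 # orthonormal basis
    proj = lambda A: A - Q @ (Q.T @ A)                         # A -> M_W A
    den = np.zeros((m, m))
    num = np.zeros(m)
    for i in range(N):
        Xi = proj(X[i])                                        # M_W X_i
        den += Xi.T @ Xi
        num += Xi.T @ proj(Y[i])                               # X_i' M_W y_i
    return np.linalg.solve(den, num)
```

On data simulated from (1)–(4) with r ≤ m + 1 and full-rank average loadings, this recovers β up to sampling error.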
Interestingly, as Pesaran (2006) points out, the condition in (6) is actually not necessary when using the CCEP estimator. However, as has been shown by Westerlund and Urbain (2013), and as we explain in detail in Section 3 of the current paper, relaxing (6) requires imposing additional restrictive independence conditions on λi and Λi, which, if false, may well render the CCEP estimator inconsistent. Hence, even if (6) can in principle be relaxed, in most situations of practical relevance this is not necessarily so. Also, even if the more restrictive assumptions are satisfied, the rate of consistency of the CCEP estimator when r > m + 1 is lower than when r ≤ m + 1.

² Another possibility is to estimate βi from a time series LS regression of yi,t onto xi,t and w̄t. This is the individual-specific CCE estimator, which can be averaged across the cross-section to obtain the mean group CCE (CCEMG) estimator. However, for reasons to be explained in Section 3, in this paper we focus on the CCEP estimator, although we also discuss the results for the other CCE estimators.
The discussion in the last paragraph suggests that it is important to have m + 1 ≥ r. The question therefore arises as to how likely this is in practice. The number of regressors, m, is usually a small number that is given by economic theory (and/or previous empirical evidence). Economic theory is, on the other hand, not very informative regarding the number of factors, r (see, for example, Eberhardt et al., 2013). Therefore, the theoretically implied value of m typically has little or nothing to do with r. This is important because within CCE choosing m also means restricting r, and in many applications there is little or no reason to believe that this number should be less than or equal to m + 1. In view of this and the potential problems involved when m + 1 < r, the restriction in (6) cannot be taken as given but should be tested on a case-by-case basis. In practice, however, this aspect is almost always ignored.
In the current paper we take this shortcoming as our starting point. The purpose is to provide a simple modification of the original CCE approach that allows (but does not require) r > m + 1. Hence, the purpose here is not really to propose an alternative estimator, but to show that original CCE belongs to a much broader class of estimators, which is henceforth referred to as combination-augmented CCE (C3E). The idea behind C3E is to consider not only the equal-weighted cross-section average, but also other combinations of w1,t, ..., wN,t. In particular, by considering k such combinations we can allow for

k(m + 1) ≥ m + 1

common factors. In addition to the larger number of factors that can be allowed, the new approach also enables one to consider the selection of m and r separately, which is again not possible within the original CCE framework. In our study of the asymptotic properties of the pooled C3E estimator we focus on the conventional homogeneous slope case in which β1 = ... = βN = β, although we also consider the case when this restriction is not met. The analysis is conducted under the assumption that N, T → ∞ with T/N → τ < ∞, which is less restrictive than the T/N → 0 condition of Pesaran (2006). Some Monte Carlo results are also provided, suggesting that the asymptotic properties are borne out well in small samples.
The remainder of the paper is organized as follows. Section 2 gives the assumptions, which are used in Section 3 to derive the asymptotic distribution of the pooled C3E estimator. When T/N → τ < ∞ the estimator is biased. As a response to this, we propose using bias correction, a procedure that is shown to be quite effective. As a solution to the practical problem of how to pick the appropriate combinations, an information criterion (IC)-based selection rule is proposed in Section 4. Section 5 focuses on the finite-sample accuracy of the theory provided in Sections 3 and 4. Section 6 concludes. All proofs are provided in the Appendix.
A word on notation. tr A and ‖A‖ = √(tr(A′A)) denote the trace and the Frobenius (Euclidean) norm, respectively, of the matrix A. Also, MA = IT − A(A′A)−1A′ for any T-rowed matrix A. M < ∞ denotes a generic positive number. Finally, →d signifies convergence in distribution.
2 Assumptions
The restrictions placed on εi, ηi, F, βi, λi, and Λi are given in Assumption 1, which is a so-called "high-level" assumption (see Bai and Ng, 2002; Bai, 2003, 2009, for similar assumptions). The advantage of making such high-level assumptions is that the results cover a wide range of DGPs. The disadvantage is that the assumptions can be difficult to interpret. Let τε,ij,ts = E(εi,tεj,s) and τη,ij,ts = E(ηi,tη′j,s).
Assumption 1.

(i) E(εi,t) = 0, E|εi,t|8 < M, τε,ii,tt = σ2ε,i > 0, |τε,ii,ts| ≤ M, T−1 ∑_{s=1}^{T} ∑_{t=1}^{T} |τε,ii,ts| ≤ M, |τε,ij,tt| ≤ |τε,ij| for some τε,ij with N−1 ∑_{i=1}^{N} ∑_{j=1}^{N} |τε,ij| ≤ M, (NT)−1 ∑_{i=1}^{N} ∑_{j=1}^{N} ∑_{t=1}^{T} ∑_{s=1}^{T} |τε,ij,ts| ≤ M, and E[(N−1/2 ∑_{i=1}^{N} [εi,sεi,t − τε,ii,st])4] ≤ M.

(ii) E(ηi,t) = 0m×1, E(‖ηi,t‖8) ≤ M, τη,ii,tt = Ση,i is positive definite, N−1 ∑_{i=1}^{N} Ση,i → Ση as N → ∞, where Ση is positive definite, ‖Ση,i‖ ≤ M, T−1 ∑_{s=1}^{T} ∑_{t=1}^{T} ‖τη,ii,ts‖ ≤ M, ‖τη,ij,tt‖ ≤ |τη,ij| for some τη,ij with N−1 ∑_{i=1}^{N} ∑_{j=1}^{N} |τη,ij| ≤ M, (NT)−1 ∑_{i=1}^{N} ∑_{j=1}^{N} ∑_{t=1}^{T} ∑_{s=1}^{T} ‖τη,ij,ts‖ ≤ M, and E(‖N−1/2 ∑_{i=1}^{N} [ηi,tη′i,s − τη,ii,ts]‖4) ≤ M.

(iii) T−1/2 ∑_{t=1}^{T} ηi,tεi,t →d N(0m×1, Σηε,i) as T → ∞ and (NT)−1/2 ∑_{i=1}^{N} η′iεi →d N(0m×1, Σηε) as N, T → ∞, where Σηε,i and Σηε = limN→∞ N−1 ∑_{i=1}^{N} Σηε,i are m × m positive definite matrices.

(iv) T−1 ∑_{t=1}^{T} ftf′t →p E(ftf′t) = Σf as T → ∞, where Σf is positive definite, and E(‖ft‖4) ≤ M.

(v) N−1/2 ∑_{i=1}^{N} ξi →d N(0m×1, Σξ) as N → ∞ with Σξ positive definite, and ‖Σξ‖ ≤ M.

(vi) λi and Λi are either random such that E(‖λi‖) ≤ M and E(‖Λi‖) ≤ M, or non-random such that ‖λi‖ ≤ M and ‖Λi‖ ≤ M. In both cases, N−1 ∑_{i=1}^{N} λiλ′i →p Σλ and N−1 ∑_{i=1}^{N} ΛiΛ′i →p ΣΛ, where Σλ and ΣΛ are positive definite.

(vii) (εi,t, η′i,t)′, fs, ξj and (λl, Λl) are mutually independent for all i, j, l, t and s.
Remark 1. Assumption 1 is less restrictive than Assumptions 1–4 in Pesaran (2006), under which CCE was originally proposed. Note in particular that while Pesaran (2006) only allows for serial correlation, Assumption 1 (i) and (ii) allow for both serial and cross-sectional correlation in the idiosyncratic errors. In this sense, Assumption 1 (i) and (ii) are similar to Assumption C of Bai and Ng (2002) (see also Bai, 2003, 2009). The main difference is that we do not allow for heteroskedasticity across time. The assumptions placed on βi are also more general than those considered by Pesaran (2006, Assumption 4). Specifically, while Pesaran (2006) assumes that ξi is independent and identically distributed (iid) with mean zero and constant covariance matrix, Assumption 1 (v) only requires that a suitable central limit theorem applies. Similarly, while in Pesaran (2006, Assumption 3) λi and Λi are assumed to be iid and also independent of each other, under Assumption 1 (vi) λi and Λi can be either random in a general way or non-random. This means that λi and Λi can be correlated both across i and with each other. As in Pesaran (2006), the loadings are assumed not to go to zero, which means that the cross-section dependence is of the strong form. However, in analogy with Chudik et al. (2011), some of the factors can also be weak without affecting the results.
For each of the m + 1 columns in Wi, we consider k cross-section combinations, as given by the T × k(m + 1) matrix N−1 ∑_{i=1}^{N} WiZi, where Zi = (Im+1 ⊗ z′i) is (m + 1) × k(m + 1) and zi = (z1i, ..., zki)′ is a k × 1 vector of combinations. The combinations can be deterministic and/or stochastic, provided that Assumption 2 is satisfied. Here and throughout this paper,

H̄ = N−1 ∑_{i=1}^{N} Z′iC′i,

a k(m + 1) × r matrix.
Assumption 2.

(i) rk H̄ = r for all N < ∞ and H̄ →p H as N → ∞, where rk H = r and ‖H‖ < ∞.

(ii) Zi is either deterministic such that ‖Zi‖ ≤ M, or stochastic such that E(‖Zi‖2) ≤ M.
(iii) Let φt = N−1/2 ∑_{i=1}^{N} Z′iui,t and φi,t = N−1/2 ∑_{j≠i}^{N} Z′juj,t. It is assumed that E(‖φt‖2) ≤ M, T−1 ∑_{t=1}^{T} E(φtφ′t) → ΣZu = limN→∞ N−1 ∑_{i=1}^{N} Z′iΣu,iZi as N, T → ∞ with

Σu,i = E(ui,tu′i,t) = [ β′iΣη,iβi + σ2ε,i    β′iΣη,i
                        Ση,iβi              Ση,i ],

‖ΣZu‖ ≤ M, E(‖N−1/2T−1/2 ∑_{i=1}^{N} ∑_{t=1}^{T} Z′iui,tφ′i,t‖2) ≤ M, and E(‖T−1/2 ∑_{t=1}^{T} Z′iui,tφ′i,t‖2) ≤ M.
Remark 2. Note that if k = 1 and zi = 1, then N−1 ∑_{i=1}^{N} WiZi = N−1 ∑_{i=1}^{N} (Wi ⊗ z′i) = W̄, and so we are back in the cross-section average-only original CCE approach of Pesaran (2006). In this case, Assumption 2 is the same as in Pesaran (2006), in the sense that (i) boils down to (6), (ii) is trivially satisfied, and (iii) is implied by Assumption 1. Pesaran (2006) does
point out that the equal-weighted average is not the only way to combine the data. However, while recognizing that the weights do not have to be equal, he still considers just one combination (weighted average) per observable. The contribution of the present paper is the consideration of multiple combinations, which is important because it relaxes the m + 1 ≥ r requirement in (6). This makes it necessary to be specific about the combinations that can be permitted. Interestingly, zi can be thought of as acting as an instrument for Ci. Assumption 2 is therefore analogous to the well-known orthogonality and validity conditions in the instrumental variables (IV) literature (see Bai and Ng, 2010, for a panel IV approach based on similar assumptions). Specifically, while strict independence/orthogonality is not necessary, ui,t and Zi can be at most weakly correlated. We also require that the combinations in zi are valid in the sense that rk H̄ = r, and that certain moments exist. As in classical IV, the assumptions are placed on unobservables, which makes them harder to test than if they had been placed on observables. In Section 4 we elaborate on this. Specifically, an IC-based procedure is proposed that selects only the valid combinations.
3 C3E estimation and inference
In Section 3.1, we study the asymptotic properties of the pooled C3E estimator in the case
when β1, ..., βN are all equal, and in the case when they are unrestricted. The estimation of
the various covariance matrices that appear in Section 3.1 is discussed in Section 3.2, where
we also consider briefly the properties of the individual-specific C3E estimator, which are important to ensure consistent covariance matrix estimation in the heterogeneous slope case. The results reported in Section 3.1 show that the pooled C3E estimator is generally biased. This finding leads quite naturally to the consideration of a bias-corrected estimator, the properties of which are studied in Section 3.3.
3.1 The pooled C3E estimator
As already mentioned, since F and Ci are not separately identifiable, F can only be estimated up to a matrix rotation. The proposed estimator F̂ of FH is given by

F̂ = N−1 ∑_{i=1}^{N} WiZi = N−1 ∑_{i=1}^{N} (Wi ⊗ z′i), (7)

whose dimension is T × k(m + 1). The resulting pooled estimator of β is given by

β̂C3E = ( ∑_{i=1}^{N} X′iMF̂Xi )−1 ∑_{i=1}^{N} X′iMF̂yi. (8)

The CCEP estimator, henceforth denoted β̂CCEP, is simply β̂C3E with F̂ = W̄.
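The pooled C3E estimator in (7)–(8) differs from CCEP only in the construction of the factor proxy. The sketch below is our own implementation; the N × k weight matrix Z, with row i holding z′i, is a convention of the sketch, not of the paper.

```python
import numpy as np

def c3e(Y, X, Z):
    """Pooled C3E, eqs. (7)-(8): F_hat = N^{-1} sum_i (W_i kron z_i'),
    then pooled LS of M_Fhat y on M_Fhat X.
    Y: N x T, X: N x T x m, Z: N x k (row i holds the weights z_i)."""
    N, T, m = X.shape
    W = np.concatenate([Y[:, :, None], X], axis=2)       # N x T x (m+1)
    # T x k(m+1) proxy: k weighted cross-section combinations per observable
    F_hat = np.einsum('itj,ik->tjk', W, Z).reshape(T, -1) / N
    Q, _ = np.linalg.qr(F_hat)
    proj = lambda A: A - Q @ (Q.T @ A)                   # A -> M_Fhat A
    den = np.zeros((m, m))
    num = np.zeros(m)
    for i in range(N):
        Xi = proj(X[i])
        den += Xi.T @ Xi
        num += Xi.T @ proj(Y[i])
    return np.linalg.solve(den, num)
```

With Z a single column of ones (k = 1) this reduces to CCEP; with additional valid columns it accommodates r > m + 1, up to r ≤ k(m + 1), provided Assumption 2 holds for the chosen weights.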
Remark 3. The pooled C3E estimator considered here is based on "within" pooling, whereby the data are summed over the cross-section before taking the ratio. Another approach is to use "between" pooling, in which case the ratio is taken prior to summing over the cross-section. Pesaran (2006) considers both types of pooling. However, since in his Monte Carlo study within pooling generally leads to the best-performing estimator, in this paper we only consider this type. Also, as mentioned above, as a by-product of the need for consistent covariance estimation in the heterogeneous slope case, in Section 3.2 we also consider the individual-specific C3E estimator. This estimator can be averaged, leading to a between (or "group mean") type C3E estimator.
Theorem 1. Suppose that ξ1 = ... = ξN = 0m×1 and k(m + 1) = r. Under Assumptions 1 and 2, as N, T → ∞ with T/N → τ ≤ M,

√NT(β̂C3E − β) →d N(0m×1, Σ−1η ΣηεΣ−1η ) + Σ−1η √τ B,

where

B = B1 − B2 − B3,

B1 = limN→∞ N−1 ∑_{i=1}^{N} ΛiH−1ΣZu(H′)−1λi,

B2 = limN→∞ N−1 ∑_{i=1}^{N} Ση,i(β, Im)Zi(H′)−1λi,

B3 = limN→∞ N−1 ∑_{i=1}^{N} σ2ε,iΛiH−1Z′i(1, 0m)′.
Theorem 1 is concerned with the conventional homogeneous slope case, and is the C3E counterpart of Theorem 4 of Pesaran (2006), which requires that r = 1. Theorem 1 only requires that k(m + 1) = r and is therefore more general in this regard. Another difference when compared to Theorem 4 in Pesaran (2006), which supposes that T/N → 0, is that Theorem 1 only requires that T/N → τ ≤ M, making it more relevant for applied work. Moreover, by relaxing the T/N → 0 requirement, Theorem 1 also reveals the presence of an asymptotic bias that is not present in Theorem 4 of Pesaran (2006).

Analogous to the bulk of the existing literature on factor-augmented regressions, Theorem 1 supposes that r is known and that β1 = ... = βN = β (see, for example, Bai, 2009; Goncalves and Perron, 2014; Greenaway-McGrevy et al., 2012). The former assumption is without loss of generality in the sense that if r is unknown, the IC-based approach of Section 4 can be used to obtain a consistent estimate. The effect of a violation of the common slope assumption is studied in Theorem 2.
Theorem 2. Suppose that k(m + 1) = r. Under Assumptions 1 and 2, as N, T → ∞,

√N(β̂C3E − β) →d N(0m×1, Σ−1η RΣ−1η ),

where

R = limN→∞ N−1 ∑_{i=1}^{N} Ση,iΣξΣη,i.
Theorem 2 is the C3E counterpart of Theorem 3 of Pesaran (2006). It shows that the variance of the estimator emanates from the heterogeneity of the slopes, as measured by Σξ. This result is analogous to that of Pesaran (2006) for the CCEP estimator. However, the asymptotic variance of this estimator has an additional term that depends on the heterogeneity of the factor loadings and that is there because the rank condition in (6) is not assumed to be met. The C3E estimator does not depend on whether (6) is satisfied, which is also the reason why the asymptotic distribution given in Theorem 2 does not depend on the factor loadings. In order to illustrate this point, suppose first that m + 1 = r. Since in this case w̄t = H′ft + op(1), where H = C̄ is of full rank and hence invertible, we have N−1/2T−1 ∑_{i=1}^{N} X′iMW̄Fλi = N−1/2T−1 ∑_{i=1}^{N} X′iMFHFλi + op(1) = N−1/2T−1 ∑_{i=1}^{N} X′iMFFλi + op(1) = op(1), since span(FH) = span(F) when H is invertible and MFF = 0 (see Pesaran, 2006, equation (40)). Hence,
√N(β̂CCEP − β) = ( (NT)−1 ∑_{i=1}^{N} X′iMW̄Xi )−1 N−1/2T−1 ∑_{i=1}^{N} X′iMW̄(Xiξi + Fλi + εi)

             = ( (NT)−1 ∑_{i=1}^{N} X′iMW̄Xi )−1 N−1/2T−1 ∑_{i=1}^{N} X′iMW̄Xiξi + op(1), (9)
which converges to the same asymptotic distribution given in Theorem 2, provided that ξi is "nicely behaved" in the sense that Assumption 1 (v) is met. If, on the other hand, m + 1 < r, then N−1/2T−1 ∑_{i=1}^{N} X′iMW̄Fλi will not be negligible (see Pesaran, 2006, equation (38)), and so we obtain
√N(β̂CCEP − β) = ( (NT)−1 ∑_{i=1}^{N} X′iMW̄Xi )−1 N−1/2T−1 ∑_{i=1}^{N} X′iMW̄(Xiξi + Fλi) + op(1), (10)
which will not converge in distribution unless λi is also nicely behaved. As pointed out by Westerlund and Urbain (2013), one requirement here is that λi and Λi are mutually independent, which seems like a rather restrictive assumption. For example, when regressing investments on savings, as is commonly done in the literature on the so-called "Feldstein–Horioka puzzle", a common shock that increases savings is going to push interest rates down and investments up, suggesting that λi and Λi should be negatively correlated. Thus, while the requirement that m + 1 ≥ r can be relaxed also within the original CCE framework, this does not come free of charge.
It is important to note that in the above example the rate of consistency of β̂CCEP is given by √N and not by √NT. One may think that this relatively low rate of consistency is due to the heterogeneity of βi, and that imposing β1 = ... = βN = β would prevent this from happening, regardless of whether m + 1 ≥ r or m + 1 < r.³ However, this is not the case. The reason is easily appreciated by simply imposing ξ1 = ... = ξN = 0m×1 and using (NT)−1/2 ∑_{i=1}^{N} X′iMW̄εi = Op(1), from which it follows that

(NT)−1/2 ∑_{i=1}^{N} X′iMW̄(Xiξi + Fλi + εi) = (NT)−1/2 ∑_{i=1}^{N} X′iMW̄Fλi + Op(1). (11)

³ It is not clear from Pesaran (2006) whether one can have β1 = ... = βN = β, while at the same time permitting m + 1 < r.
If m + 1 ≥ r, then (NT)−1/2 ∑_{i=1}^{N} X′iMW̄Fλi = op(1), and so we obtain √NT(β̂CCEP − β) = Op(1). Hence, provided that m + 1 ≥ r, imposing β1 = ... = βN = β restores √NT-consistency. If, on the other hand, m + 1 < r, then T−1X′iMW̄F = Op(1), and therefore

√NT(β̂CCEP − β) = √T ( (NT)−1 ∑_{i=1}^{N} X′iMW̄Xi )−1 N−1/2 ∑_{i=1}^{N} T−1X′iMW̄Fλi + Op(1), (12)

whose order is determined by the order of the first term on the right, which in turn depends on λi and Λi. If λi is iid and independent of Λi, then the first term is Op(√T), whereas if λi is non-iid and/or correlated with Λi, then the same term is Op(√NT). Thus, the rate of consistency is √N, at best, and if λi is non-iid and/or correlated with Λi, then β̂CCEP is even inconsistent. The proposed C3E estimator in the homogeneous slope case is not only very simple, but also √NT-consistent regardless of the specification of λi and Λi, provided that Assumptions 1 and 2 are satisfied.
3.2 Covariance matrix estimation
In this section we derive consistent estimators of the covariance matrices that appear in Theorems 1 and 2. We begin by considering Σ−1η ΣηεΣ−1η , which according to Theorem 1 is the appropriate covariance matrix to consider when β1, ..., βN are all equal. Let ε̂i = (ε̂i,1, ..., ε̂i,T)′ = MF̂(yi − Xiβ̂C3E) and η̂i = (η̂i,1, ..., η̂i,T)′ = MF̂Xi. A natural consistent estimator of Ση,i is given by

Σ̂η,i = T−1 ∑_{t=1}^{T} η̂i,tη̂′i,t, (13)

from which we obtain

Σ̂η = N−1 ∑_{i=1}^{N} Σ̂η,i. (14)

For Σηε, we follow Pesaran (2006), who recommends using a heteroskedasticity and autocorrelation consistent (HAC) estimator in the spirit of Newey and West (1987). The particular estimator considered here is given by

Σ̂ηε = N−1 ∑_{i=1}^{N} Σ̂ηε,i, (15)

where

Σ̂ηε,i = Σ̂ηε,i(0) + ∑_{j=1}^{p} (1 − j/(p + 1)) [Σ̂ηε,i(j) + Σ̂ηε,i(j)′], (16)

Σ̂ηε,i(j) = T−1 ∑_{t=j+1}^{T} ε̂i,tε̂i,t−jη̂i,tη̂′i,t−j, (17)

with p being the window size. The appropriate covariance matrix estimator to use in the homogeneous slope case is therefore given by Σ̂−1η Σ̂ηεΣ̂−1η .
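Since Σ̂ηε,i(j) in (17) is the j-th sample autocovariance of the products vi,t = ε̂i,tη̂i,t, the estimator in (16) is a standard Bartlett-kernel (Newey–West type) construction. The sketch below, for a single unit i, is our own implementation; the residual series are taken as given.

```python
import numpy as np

def sigma_eta_eps_i(eps_hat, eta_hat, p):
    """Bartlett-kernel HAC estimator of Sigma_eta_eps,i, eqs. (16)-(17).
    eps_hat: length-T vector of residuals eps_hat_{i,t};
    eta_hat: T x m matrix of residuals eta_hat_{i,t};
    p: window (truncation lag) size."""
    T, m = eta_hat.shape
    # v_t = eps_hat_t * eta_hat_t, so Sigma_hat(j) = T^{-1} sum_t v_t v_{t-j}'
    v = eps_hat[:, None] * eta_hat                   # T x m
    def gamma(j):
        return v[j:].T @ v[:T - j] / T               # m x m sample autocovariance
    S = gamma(0)
    for j in range(1, p + 1):
        w = 1.0 - j / (p + 1.0)                      # Bartlett weight
        G = gamma(j)
        S += w * (G + G.T)
    return S
```

By construction the result is symmetric, and for p growing slowly with T it is the usual consistent long-run variance estimator of the sequence vi,t.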
If, as in Theorem 2, β1, ..., βN are not all the same, the above covariance estimator is no longer consistent. Specifically, while Σ̂η is still consistent, because of the reduced rate of consistency of β̂C3E, Σ̂ηε is inconsistent for R. Recognizing this problem, Pesaran (2006) proposes a nonparametric method that makes use of the individual-specific CCE estimator. The appropriate C3E analog of this estimator is given by

R̂ = N−1 ∑_{i=1}^{N} R̂i, (18)

where

R̂i = Σ̂η,i ( β̂C3E,i − N−1 ∑_{j=1}^{N} β̂C3E,j )( β̂C3E,i − N−1 ∑_{j=1}^{N} β̂C3E,j )′ Σ̂η,i, (19)

β̂C3E,i = (X′iMF̂Xi)−1X′iMF̂yi. (20)
The consistency of this estimator follows from the consistency of the individual-specific C3E estimator, β̂C3E,i.

Theorem 3. Suppose that k(m + 1) = r. Under Assumptions 1 and 2, as N, T → ∞ with √T/N → 0,

√T(β̂C3E,i − βi) →d N(0m×1, Σ−1η,i Σηε,iΣ−1η,i ).

Theorem 3 is the C3E counterpart of Theorem 1 of Pesaran (2006). It is important to note that unlike this other theorem, provided that k(m + 1) = r, Theorem 3 does not require that (6) is satisfied. The fact that β̂C3E,i is consistent means that the appropriate covariance matrix estimator to consider in the heterogeneous slope case is given by Σ̂−1η R̂Σ̂−1η .
3.3 Bias-adjustment

Theorem 1 shows that, while consistent, the pooled C3E estimator has an asymptotically biased distribution when T/N → τ > 0, leading to misleading inference. As pointed out by Bai (2009), an obvious solution to this problem is to use bias correction. Let us therefore define the following bias-adjusted version of β̂C3E:

β̂BAC3E = β̂C3E − N−1Σ̂−1η B̂, (21)

where B̂ = B̂1 − B̂2 − B̂3 with

B̂1 = N−1 ∑_{i=1}^{N} Λ̂iΣ̂Zuλ̂i, (22)

B̂2 = N−1 ∑_{i=1}^{N} Σ̂η,i(β̂C3E, Im)Ziλ̂i, (23)

B̂3 = N−1 ∑_{i=1}^{N} σ̂2ε,iΛ̂iZ′i(1, 0m)′. (24)

Here Σ̂η and Σ̂η,i are as in Section 3.2, while

σ̂2ε,i = T−1 ∑_{t=1}^{T} ε̂2i,t, (25)

where ε̂i,t is again as in Section 3.2. Also, letting ûi = MF̂Wi, we have

Σ̂Zu = N−1 ∑_{i=1}^{N} Z′iΣ̂u,iZi, (26)

Σ̂u,i = T−1 ∑_{t=1}^{T} ûi,tû′i,t. (27)

The estimators λ̂i and Λ̂i of (H′)−1λi and ΛiH−1, respectively, are obtained by simply picking the appropriate elements of Ĉi = (F̂′F̂)−1F̂′Wi, a k(m + 1) × (m + 1) matrix.
Corollary 1. Under the conditions of Theorem 1,

√NT(β̂BAC3E − β) = √NT(β̂C3E − β) − √τ Σ−1η B + op(1).

According to Corollary 1, √NT(β̂BAC3E − β) is asymptotically equivalent to √NT(β̂C3E − β) − √τ Σ−1η B, whose asymptotic distribution is easily inferred from Theorem 1. Indeed,

√NT(β̂BAC3E − β) →d N(0m×1, Σ−1η ΣηεΣ−1η ) (28)

as N, T → ∞ with T/N → τ ≤ M. The bias correction is therefore asymptotically successful. Moreover, the correction does not contribute to the limiting variance.
4 Selecting the combinations
A problem in applications is how to construct the combination vector, zi. This problem can be seen as comprising two parts: (i) finding candidate combinations, and (ii) selecting among the candidates.
4.1 Finding combination candidates
While zi is not required to be uncorrelated with ui,t, the correlation is not irrelevant, as the rate of consistency of F̂ is increased when zi and ui,t are uncorrelated. The combinations in zi are therefore ideally chosen to be uncorrelated with ui,t. They should also be highly correlated with Ci. Specifically, since Ci = (Λ′iβi + λi, Λ′i), for zi to be highly correlated with Ci, one should choose combinations that are believed to be highly correlated with the factor loadings.
An obvious approach to finding combinations is to exploit natural candidates in the particular application being considered. For example, in macroeconomics, usual common factor suspects include trade of goods and services, technology spillovers, and worldwide supply shocks, such as oil price shocks (see, for example, Dees et al., 2007).
The task of finding combinations that are correlated with the loadings is therefore tantamount to finding variables that measure the extent to which countries are affected by these usual suspects. Mastromarco et al. (2015) and Eberhardt and Teal (2011) argue that spillover effects of globalization and business cycles, and common political, economic and spatial stimuli, are likely to make production correlated across countries. As examples of variables that measure the effect of these common factors they mention openness, trade agreements, physical capital shares in aggregate income, human capital, growth determinants, initial per capita income, institutional environment, qualitative features of governance, geographical features, adoption of efficiency-enhancing technology, and natural resource constraints. If the analysis is made at the firm level, the extent to which a firm's production function is affected by common factors is likely to depend on, for example, the size of the firm, financial constraints, and the technology adopted (see Eberhardt and Teal, 2011; Chudik and Straub, 2011). In the spillover literature, absorptive capacity is known to be an important determinant of the effect of knowledge, which can in turn be measured using, for example, openness, trade flows, human capital, and various development indices (see, for example, Fracasso and Marzetti, 2014; Fracasso and Marzetti, 2015). Baxter and Kouparitsas (2004) and Imbs (2004) study the determinants of business cycle comovements. They conclude that trade is the most important determinant of cross-country business cycle linkages. Trade is an important determinant of cross-country linkages also in financial markets (see, for example, Forbes and Chinn, 2004; Dees et al., 2007), although when modelling returns, it is standard practice to use asset-specific characteristics ("fundamentals") like industry classification, market capitalization, and style classification as observable loadings, or "betas" (Rosenberg, 1974).
As the above discussion illustrates, in many applications there are natural combination
candidates that can go into zi. Deterministic combinations are particularly simple to come
by. Specifically, as Chudik et al. (2011) show, the cross-section average can be quite effective
in mopping up cross-section dependence. A vector of ones is therefore a good starting point.
Tutz and Binder (2007) consider the problem of boosting ridge regression, separating between "must have" candidates and other variables. The cross-section average can therefore be thought of as a "must have" combination. This special treatment of the cross-section average highlights the role of C3E as an extension of, rather than an alternative to, original CCE. Other readily available combination candidates are preliminary consistent estimates of (the space spanned by) Ci. The only requirement is that the rate of consistency is at least √N, which is weak enough to enable estimation by principal components (see Bai, 2003).
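For instance, a preliminary principal components step on the outcome variable alone can supply stochastic combination candidates. The sketch below is illustrative only: applying PC to y by itself, and treating the number of components k as given, are choices of the example rather than prescriptions of the paper.

```python
import numpy as np

def pc_loading_candidates(Y, k):
    """Preliminary loading estimates by principal components (in the spirit
    of Bai, 2003), usable as stochastic combination candidates z_i.
    Y is N x T; returns an N x k matrix of candidate weights."""
    N, T = Y.shape
    # loadings = sqrt(N) times the leading eigenvectors of Y Y' / (N T)
    evals, evecs = np.linalg.eigh(Y @ Y.T / (N * T))
    top = np.argsort(evals)[::-1][:k]
    return np.sqrt(N) * evecs[:, top]
```

The columns estimate the loading space only up to rotation and sign, which is harmless here: C3E only needs the combinations to be (highly) correlated with the loadings.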
4.2 An IC-based selection procedure
An advantage of using deterministic combinations and/or preliminary loading estimates is that they are (asymptotically) uncorrelated with $u_{i,t}$. (This advantage of deterministic instruments has been pointed out before by Phillips and Hansen (1990) in the context of IV estimation of cointegrated time series regressions.) However, in practice there is no guarantee that the combinations are valid, and if some of the combinations are stochastic there is also likely to be uncertainty regarding the correlation with $u_{i,t}$. In this section, we propose a selection criterion for the combination vector, $z_i$. In so doing, it is convenient, albeit not necessary (see the discussion that follows Corollary 2 below), to assume that the valid combinations, henceforth denoted $z_{0,i}$, are ordered first (see, for example, Zheng and Loh, 1995; Zheng and Loh, 1997; Donald and Newey, 2001, for similar assumptions), and that their number is given by $k_0$. In fact, analogous to IV selection, it is useful to treat also the combinations within $z_{0,i}$ as ordered, but then according to their correlation with $C_i$; the first combination in $z_{0,i}$ has the highest correlation. The invalid, or "nuisance", combinations, henceforth denoted $z_{1,i}$, are ordered last, implying that $z_i$ can be partitioned as $z_i = (z_{0,i}', z_{1,i}')'$.
Assumption 3 summarizes the restrictions imposed on this vector. Here and throughout the rest of this section, $Z_{p,i} = (I_{m+1} \otimes z_{p,i}')$, $H_p = N^{-1}\sum_{i=1}^N Z_{p,i}'C_i'$, $\phi_{p,t} = N^{-1}\sum_{i=1}^N Z_{p,i}'u_{i,t}$ and $\phi_{p,i,t} = N^{-1}\sum_{j\neq i}^N Z_{p,j}'u_{j,t}$, where $p \in \{0, 1\}$.
Assumption 3.
(i) z0,i is such that Assumption 2 is satisfied with r ≤ k0(m + 1), and Zi and H replaced
by Z0,i and H0, respectively.
(ii) $z_{1,i}$ violates Assumption 2 in such a way that $E(\|N^{-1/2}\phi_{1,t}\|^2) \leq M$, $E(\|T^{-1}\sum_{t=1}^T u_{i,t}u_{i,t}'Z_{1,i}\|^2) \leq M$, $E(\|N^{-1}T^{-1}\sum_{i=1}^N\sum_{t=1}^T Z_{1,i}'u_{i,t}u_{i,t}'Z_{1,i}\|^2) \leq M$, $E(\|N^{-3/2}T^{-1/2}\sum_{i=1}^N\sum_{t=1}^T u_{i,t}\phi_{1,i,t}'\|^2) \leq M$, and $E(\|N^{-3/2}T^{-1/2}\sum_{i=1}^N\sum_{t=1}^T Z_i'u_{i,t}\phi_{1,i,t}'\|^2) \leq M$.
Remark 4. According to Assumption 3, while the valid combinations in $z_{0,i}$ satisfy Assumption 2 (which means that they are at most weakly correlated with $u_{i,t}$), the nuisance combinations in $z_{1,i}$ do not. The types of violation that can be permitted are characterized by Assumption 3 (ii), which requires that $z_{1,i}$ is at most strongly correlated with $u_{i,t}$.
The IC considered in the present paper can be seen as a multivariate version of the IC of Bai and Ng (2002), and is given by
$$\mathrm{IC}(s) = \ln \det V(\hat{F}^s) + s \cdot g, \quad (29)$$
where $V(A) = (NT)^{-1}\sum_{i=1}^N W_i'M_A W_i$ for any $T$-rowed matrix $A$, $\hat{F}^s$ is $\hat{F}$ based on $s$ combinations, and $g$ is a penalty term. The associated IC estimator $\hat{r}$ of $r$ is given simply by
$$\hat{r} = \arg\min_{s=0,\dots,s_{\max}} \mathrm{IC}(s), \quad (30)$$
where $s_{\max} \geq r$.
Proposition 1. Under Assumptions 1 and 3, if $g \to 0$ and $\min\{N, \sqrt{T}\} \cdot g \to \infty$, as $N, T \to \infty$,
$$P(\hat{r} = r) \to 1.$$
Define $\hat{k}_0 = \hat{r}/(m+1)$. Since $\hat{r}$ is consistent for $r$, $\hat{k}_0$ is consistent for $r/(m+1)$, which may or may not be equal to $k_0$. Indeed, since an additional combination increases the dimension of $z_i$ by $(m+1)$ and not by one, $\hat{k}_0$ is consistent for $k_0$ only if $r$ is a scalar multiple of $(m+1)$ (see Smeekes, 2015, for a detailed discussion in the context of subpanel selection). If $r$ is not a scalar multiple of $(m+1)$, then $\hat{k}_0$ estimates the minimal number of combinations required to approximate the underlying factor structure. Hence, while strictly speaking we only require $r \leq k_0(m+1)$, for ease of interpretation it is convenient to think of $r$ as being equal to $k_0(m+1)$.
In order to appreciate the implications of Proposition 1 it is convenient to treat $\hat\beta$ as a function of $\hat{r}$ (or $\hat{k}_0$). Let us therefore write $\hat\beta_{C3E}^{\hat{r}}$ for $\hat\beta_{C3E}$. Clearly,
$$P[\sqrt{NT}(\hat\beta_{C3E}^{\hat{r}} - \beta) \leq \delta] = P[\sqrt{NT}(\hat\beta_{C3E}^{\hat{r}} - \beta) \leq \delta \,|\, \hat{r} = r]P(\hat{r} = r) + P[\sqrt{NT}(\hat\beta_{C3E}^{\hat{r}} - \beta) \leq \delta \,|\, \hat{r} \neq r]P(\hat{r} \neq r),$$
where $\delta > 0$. Because $P(\hat{r} = r) \to 1$ and $P(\hat{r} \neq r) \to 0$ by Proposition 1, while the first term on the right-hand side converges to $P[\sqrt{NT}(\hat\beta_{C3E}^{\hat{r}} - \beta) \leq \delta \,|\, \hat{r} = r] = P[\sqrt{NT}(\hat\beta_{C3E}^{r} - \beta) \leq \delta]$, the second term converges to zero. It follows that
$$|P[\sqrt{NT}(\hat\beta_{C3E}^{\hat{r}} - \beta) \leq \delta] - P[\sqrt{NT}(\hat\beta_{C3E}^{r} - \beta) \leq \delta]| \to 0, \quad (31)$$
implying that Theorem 1 is unaffected by the estimation of $r$.
Interestingly, if all the instruments under consideration are valid, the requirement on the rate of expansion of the penalty can be relaxed, from $\min\{N, \sqrt{T}\} \cdot g \to \infty$ to $\min\{\sqrt{N}, \sqrt{T}\}^2 \cdot g \to \infty$.
Corollary 2. Suppose that $z_i = z_{0,i}$. Under Assumptions 1 and 3, if $g \to 0$ and $\min\{\sqrt{N}, \sqrt{T}\}^2 \cdot g \to \infty$, as $N, T \to \infty$,
$$P(\hat{r} = r) \to 1.$$
Bai and Ng (2002) propose several ICs that are appropriate in the context of principal components estimation of common factor models. The Corollary 2 requirement that $g \to 0$ and $\min\{\sqrt{N}, \sqrt{T}\}^2 \cdot g \to \infty$ is the same as in their paper. The Proposition 1 requirement, on the other hand, is, as already mentioned, stronger. Note in particular that if $T/N \to \tau \leq M$, then Corollary 2 requires that $T \cdot g \to \infty$, which is obviously implied by the Proposition 1 requirement that $\sqrt{T} \cdot g \to \infty$. The stricter condition in Proposition 1 is due to the presence of the invalid combination candidates, and implies that such candidates will not be selected by the procedure. As usual, the penalty $g$ is not unique and has to be set by the researcher. Let $C$ be either $\min\{\sqrt{N}, \sqrt{T}\}^2$ or $\min\{N, \sqrt{T}\}$. Bai and Ng (2002) set $g = O(C^{-1}\ln(C)) \geq 0$, such that $g \to 0$ and $C \cdot g = O(\ln(C)) \to \infty$. Hence, if $C = \min\{\sqrt{N}, \sqrt{T}\}^2$, then $g \to 0$ but $\min\{N, \sqrt{T}\} \cdot g = O(\min\{N, \sqrt{T}\} \cdot C^{-1}\ln(C))$ need not go to infinity. Hence, under the conditions of Proposition 1 the penalty implied by Corollary 2 is too small. In essence, to be able to root out those combinations that are correlated with $u_{i,t}$ the penalty has to be higher. According to Proposition 1, $C = \min\{N, \sqrt{T}\}$ is enough. In this paper we therefore set
$$g = (m+1)\frac{\ln(\min\{N, \sqrt{T}\})}{\min\{N, \sqrt{T}\}}, \quad (32)$$
where the term $(m+1)$ is there to account for the dimension of $V(\hat{F}^s)$.
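To fix ideas, the selection rule in (29)–(32) can be sketched in a few lines. This is an illustrative implementation under our own naming (`ic_select`, `W_list`, `F_candidates`), not the authors' code:

```python
import numpy as np

def ic_select(W_list, F_candidates, m, s_max):
    """IC-based choice of the number of combinations, eqs. (29)-(30).

    W_list       : list of N arrays, each T x (m + 1), the observables per unit.
    F_candidates : F_candidates[s] is the T x s(m+1) factor proxy built from the
                   first s (ordered) combinations; entry 0 is None (no factors).
    Returns the s minimising ln det V(F^s) + s * g, with the penalty in eq. (32).
    """
    N = len(W_list)
    T = W_list[0].shape[0]
    C = min(N, np.sqrt(T))
    g = (m + 1) * np.log(C) / C                 # penalty, eq. (32)

    def V(F):                                   # V(F) = (NT)^-1 sum_i W_i' M_F W_i
        if F is None:
            M = np.eye(T)
        else:
            M = np.eye(T) - F @ np.linalg.pinv(F.T @ F) @ F.T
        return sum(Wi.T @ M @ Wi for Wi in W_list) / (N * T)

    ics = [np.log(np.linalg.det(V(F_candidates[s]))) + s * g
           for s in range(s_max + 1)]
    return int(np.argmin(ics))
```

With a strong common factor, adding the first valid combination reduces $\ln\det V$ by far more than the penalty $g$, so the criterion picks it up.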
Remark 5. As already alluded to in Section 2, the presence of correlation between $Z_i$ and $u_{i,t}$ affects the rate of consistency of $\hat{F}$. For $\sqrt{NT}(\hat\beta_{C3E} - \beta)$ to have its stated asymptotic distribution, it is essential that $\hat{F}$ is $\sqrt{N}$-consistent, which will only be the case if $Z_i$ and $u_{i,t}$ are at most weakly correlated. However, the IC considered here only requires that $T^{-1}\|(\hat{F} - FH')'(\hat{F} - FH')\| = o_p(1)$, which does not require that $Z_i$ and $u_{i,t}$ are at most weakly correlated (see Bai and Ng, 2002, page 198, for a similar discussion in the case of principal components estimation).
As alluded to in the above, it is not necessary for the candidates to be pre-ordered. If there is no natural ordering, then one possibility is to simply use an all-subset grid search, which is feasible in applications where $k$ is a relatively small number. If $k$ is larger, then we recommend following, for example, Zheng and Loh (1995) and Zheng and Loh (1997), and ordering the candidates according to an estimate of their correlation with $C_i$. This can be done by taking $\bar{W}_i = T^{-1}\sum_{t=1}^T w_{i,t}$ as an estimator for the space spanned by $C_i$. The logic behind this approach is that $\bar{W}_i = T^{-1}\sum_{t=1}^T w_{i,t} = C_i'\bar{f} + \bar{u}_i = C_i'\bar{f} + o_p(1)$. This gives $m+1$ correlations for each combination in $z_i$, which can be combined by taking, for example, the average. This is the approach used in the Monte Carlo experiments of Section 5.
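The ordering step just described can be sketched as follows; the function name and interface are our own illustration, assuming the time averages per unit have already been computed:

```python
import numpy as np

def order_candidates(W_bar, Z):
    """Order combination candidates by cross-section correlation with W_bar.

    W_bar : (N, m + 1) array of time averages T^{-1} sum_t w_{i,t} per unit.
    Z     : (N, k) array of candidate combination weights z_i.
    Returns column indices of Z sorted by the average absolute cross-section
    correlation with the m + 1 columns of W_bar, highest first.
    """
    # Correlation matrix of [W_bar, Z] across units; rowvar=False treats
    # columns as variables and rows (units) as observations
    corr = np.corrcoef(np.hstack([W_bar, Z]), rowvar=False)
    mplus1 = W_bar.shape[1]
    avg_abs = np.abs(corr[:mplus1, mplus1:]).mean(axis=0)  # average the m+1 correlations
    return np.argsort(-avg_abs)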
5 Monte Carlo results
In this section we evaluate the small-sample properties of the C3E estimator. The DGP we use for this purpose can be seen as a restricted version of (1)–(4), and sets $m = 1$ and $(f_t', \eta_{i,t}, \varepsilon_{i,t})' \sim N(0_{(r+2)\times 1}, I_{r+2})$. The difference between the experiments considered lies in how we generate $\lambda_i$ and $\Lambda_i$. Six experiments, denoted by E1–E6, are considered. In E1–E2 and E6, the condition in (6) is satisfied, whereas in E3–E5, the condition is violated. In E3, $\lambda_i$ and $\Lambda_i$ are iid and independent of each other, as required in original CCE, whereas in E4 and E5, $\lambda_i$ and $\Lambda_i$ are non-iid. Exactly how $y_{i,t}$, $x_{i,t}$, $\lambda_i$, $\Lambda_i$ and $z_i$ are generated is described in Table A. For each experiment, 20 $(N, T)$ pairs are considered. In the first 16, $N, T \in \{30, 50, 100, 200\}$, whereas in the last four, $N = \lfloor T^{4/3} \rfloor$. The motivation behind the last four pairs is to assess the performance when $T/N \to 0$.
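A minimal sketch of a DGP of this kind may be useful. Since Table A is not reproduced here, the loading design below is hypothetical (iid $N(1, 1)$ draws); the factor and error draws follow the restriction stated above, and the model equations follow the representation $W_i = FC_i + U_i$ of the appendix:

```python
import numpy as np

def simulate_dgp(N, T, beta=1.0, r=2, seed=0):
    """Simulate a restricted DGP with m = 1, as a hypothetical illustration.

    (f_t', eta_it, eps_it)' ~ N(0, I_{r+2}) as in Section 5; lambda_i and
    Lambda_i are drawn iid N(1, 1) purely for illustration (Table A differs).
    Returns (Y, X, F), each with T rows.
    """
    rng = np.random.default_rng(seed)
    F = rng.standard_normal((T, r))             # factors f_t
    lam = rng.normal(1.0, 1.0, (N, r))          # lambda_i (hypothetical design)
    Lam = rng.normal(1.0, 1.0, (N, r))          # Lambda_i (hypothetical, m = 1)
    eta = rng.standard_normal((T, N))
    eps = rng.standard_normal((T, N))
    X = F @ Lam.T + eta                         # x_it = Lambda_i' f_t + eta_it
    Y = X * beta + F @ lam.T + eps              # y_it = x_it beta + lambda_i' f_t + eps_it
    return Y, X, F
```

Correlation between the rows of `lam` and `Lam` (absent here) is what distinguishes experiments such as E4 from E3.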
The performance of C3E is compared with that of the naive LS estimator that ignores the cross-section dependence altogether, the principal components (PC) estimator of Bai (2009) and the original CCEP estimator of Pesaran (2006). Three versions of the pooled C3E estimator are considered, which differ only in the choice of combinations. For each experiment, there is a maximum of six combinations to choose from. Specifically, while $z_{1i} = 1$, $z_{2i}$, $z_{3i}$, $z_{4i}$, $z_{5i}$ and $z_{6i}$ are drawn from $N(0.5, 1)$, $N(-0.4, 1)$, $N(0.2, 1)$, $N(0.5, 1)$ and $N(0.1, 1)$, respectively (see Table A). The first estimator, denoted "C3E1", is based on taking only those $k = r/2$ combinations that are most correlated with $C_i$ in the DGP. Thus, if $r = 2$, then C3E1 is based on taking the single most correlated combination. Note that this estimator is infeasible in the sense that it presumes knowledge of both $r$ and the correlation of the combinations with $C_i$. The second estimator, denoted "C3E2", uses the IC discussed in Section 4, which is applied after first ordering the combinations according to their cross-section correlation with $\bar{W}_i$, as suggested in Section 4. The third and final estimator, denoted "C3E3", is the same as C3E2 except that $z_{1i} = 1$ is always included as a "must have" combination, following the recommendation of Section 4.
Two sets of results are reported, both of which are based on making 5,000 draws from the DGPs described in Table A. The first set includes the bias of each estimator, and the size of a nominal 5% level t-test. These results are reported in Tables E1–E6, which are conveniently labelled according to the particular experiment to which they refer. The second set of results contains the frequency counts for the selected number of combinations used by C3E2 and C3E3. These results are reported in Table B. The conclusions that are drawn from all seven tables may be summarized as follows.
E1. The aim of this experiment is to compare the performance of original CCE and C3E when all the conditions required for both methods are met. Under these conditions both CCE and C3E should perform equally well, which is also reflected in Table E1. The relatively poor performance of PC is also partly expected, given the findings of Westerlund and Urbain (2015). We also see that the performance of C3E2 and C3E3 is very similar to that of C3E1, which means that the selection of the candidates is not detrimental for performance. This is confirmed by Table B, which shows that the IC procedure does a good job in selecting the number of candidates. In fact, in the case of C3E2 the correct selection frequency is one in all cases considered. C3E3 tends to include too many combinations, which is only natural given the requirement to always include a vector of ones. However, we also see that this tendency to overselect decreases with increasing sample sizes.
E2. In this experiment, the loadings are generated independently of the combinations. Both loadings and combinations still have non-zero means, though, which means that Assumption 2 is satisfied. However, since for most of the instruments the elements of $H$ are now smaller (in absolute value) than in E1, the performance under E2 is still expected to be worse than under E1, and this is also what we see when looking across Tables E1 and E2.
E3. This experiment is conducted to compare the performance of CCEP and C3E when (6) is not satisfied. Specifically, since in this case
$$E(C_i) = \begin{bmatrix} 1.3 & -0.4 \\ 2.6 & -0.8 \end{bmatrix},$$
we have $\operatorname{rk} E(C_i) = 1 < m + 1 = 2$. However, since the factor loadings are independent, as explained in Section 3.1, the CCEP estimator is still expected to work. The results reported in Table E3 reveal that while decreasing in $N$, the bias of the CCEP estimator is roughly constant in $T$, as are the size distortions. Performance is still acceptable, though, which in view of the independence of the loadings is in accordance with our expectations. However, the best performance is generally obtained by using C3E, which reflects its relatively high rate of consistency in this case.
E4. In this experiment,
$$E(C_i) = \begin{bmatrix} 1 & 1 \\ -0.25 & -0.25 \end{bmatrix},$$
and therefore (6) is violated. However, in contrast to E3, now $\lambda_i$ and $\Lambda_i$ are correlated. As expected, this makes CCEP break down. However, since the combinations are still correlated with the loadings, C3E continues to perform well, as does the IC-based selection procedure.
E5. In this experiment, (6) is again violated. However, this time the violation is due to the presence of too many factors; $m + 1 = 2 < r = 4$. As expected, CCE breaks down. Interestingly, the effect of this break-down is even more pronounced than in E4. Both bias and size distortion now increase with the sample size, making the LS problems seem relatively mild in comparison. By contrast, C3E continues to do well in terms of bias and size accuracy. One difference is that the tendency of C3E3 to select too many combinations is now even more pronounced than before. However, this does not seem to have too much of an effect on the overall performance of this estimator.
E6. The aim of this experiment is to evaluate the performance of the bias-adjustment procedure proposed in Section 3.3. The results reported in Table E6 suggest that bias-adjustment leads to a considerable improvement for all estimators considered, including CCEP, although C3E tends to perform best.
6 Conclusion
This paper considers the problem of consistent estimation of a factor-augmented panel regression model in which the number of factors, $r$, is potentially larger than the number of observables, $m + 1$. The estimator that we propose can be viewed as an extension of the CCEP estimator of Pesaran (2006), which is based on using the cross-section averages of the observables as proxies for the latent factors. While CCEP does allow $r > m + 1$, it does so at a cost. In particular, it is required that the factor loadings are independently distributed, which in most cases of practical relevance is likely to be violated. But even if the assumption is in fact satisfied, violations of $m + 1 \geq r$ are still costly. This is particularly true in the homogeneous slope case, in which a violation causes a reduction in the rate of consistency, from the usual $\sqrt{NT}$-rate to $\sqrt{N}$. In this paper we take this feature of CCE as our starting point. The purpose is to provide a simple extension that preserves $\sqrt{NT}$-consistency without for that matter requiring independent loadings.
The idea behind the proposed C3E approach is to use not only the cross-section average but also other (cross-section) combinations of the observables. By taking $k \geq 1$ such combinations we can allow $k(m + 1) \geq m + 1$ common factors without for that matter requiring independent loadings. In the analysis of the properties of the resulting pooled C3E estimator we focus on the standard assumption of a common slope coefficient, although we also consider the case when the slopes have a random distribution across the cross-section. We show that the estimator is $\sqrt{NT}$-consistent and asymptotically normal under the condition that $T/N \to \tau < \infty$. This condition is more general than the $T/N \to 0$ condition of Pesaran (2006), and its relaxation is shown to have important consequences. In particular, it is shown that the estimator is biased whenever $\tau > 0$. As a response to this, a bias-adjusted C3E estimator is proposed, which is shown to support asymptotically normal and bias-free inference under $T/N \to \tau < \infty$. This is true if the combinations are known. If there is uncertainty over which combinations to use, an IC can be used to select the appropriate combinations.
The small-sample performance of the C3E estimator is examined through a series of Monte Carlo experiments. The results suggest that whenever the assumptions of Pesaran (2006) are satisfied, the performance of the CCE and C3E estimators is comparable. If, however, the assumptions are not met, then the C3E estimator continues to work well, while the CCE estimator breaks down. We also find that the proposed bias-adjustment and IC-based combination selection procedures seem to work well, leading to estimators with good small-sample properties.
References
Amengual, D. and M. W. Watson (2006). Consistent estimation of the number of dynamic
factors in a large N and T panel, detailed appendix. Technical report, Mimeo, May.
Amengual, D. and M. W. Watson (2007). Consistent estimation of the number of dynamic
factors in a large N and T panel. Journal of Business & Economic Statistics 25, 91–96.
Bai, J. (2003). Inferential theory for factor models of large dimensions. Econometrica 71, 135–
171.
Bai, J. (2009). Panel data models with interactive fixed effects. Econometrica 77, 1229–1279.
Bai, J. and S. Ng (2002). Determining the number of factors in approximate factor models.
Econometrica 70, 191–221.
Bai, J. and S. Ng (2006). Determining the number of factors in approximate factor models,
errata. Technical report, Mimeo, May.
Bai, J. and S. Ng (2010). Instrumental variable estimation in a data rich environment. Econo-
metric Theory 26, 1577–1606.
Baxter, M. and M. A. Kouparitsas (2004). Determinants of business cycle comovement: A
robust analysis. Technical report, Working Paper W10725, National Bureau of Economic
Research.
Chudik, A., H. M. Pesaran, and E. Tosetti (2011). Weak and strong cross-section dependence
and estimation of large panels. Econometrics Journal 14, C45–C90.
Chudik, A. and M. H. Pesaran (2013a). Common correlated effects estimation of heteroge-
neous dynamic panel data models with weakly exogenous regressors. Technical report,
CESifo Working Paper.
Chudik, A. and M. H. Pesaran (2013b). Large panel data models with cross-sectional depen-
dence: a survey. Technical report, CESifo Working Paper.
Chudik, A. and R. Straub (2011). Size, openness, and macroeconomic interdependence. Tech-
nical report, Globalization and Monetary Policy Institute Working Paper 103.
Dees, S., F. di Mauro, H. M. Pesaran, and V. L. Smith (2007). Exploring the international linkages of the euro area: A global VAR analysis. Journal of Applied Econometrics 22, 1–38.
Donald, S. G. and W. K. Newey (2001). Choosing the number of instruments. Econometrica 69,
1161–1191.
Eberhardt, M., C. Helmers, and H. Strauss (2013). Do spillovers matter when estimating
private returns to R&D? Review of Economics and Statistics 95, 436–448.
Eberhardt, M. and F. Teal (2011). Econometrics for grumblers: A new look at the literature
on cross-country growth empirics. Journal of Economic Surveys 25, 109–155.
Forbes, K. J. and M. D. Chinn (2004). A decomposition of global linkages in financial markets
over time. The Review of Economics and Statistics 86, 705–722.
Fracasso, A. and G. V. Marzetti (2014). International R&D spillovers, absorptive capacity and
relative backwardness: A panel smooth transition regression model. International Economic
Journal 28, 137–160.
Fracasso, A. and G. V. Marzetti (2015). International trade and R&D spillovers. Journal of
International Economics 96, 138–149.
Goncalves, S. and B. Perron (2014). Bootstrapping factor-augmented regression models. Jour-
nal of Econometrics 182, 156–173.
Greenaway-McGrevy, R., C. Han, and D. Sul (2012). Asymptotic distribution of factor aug-
mented estimators for panel regression. Journal of Econometrics 169, 48–53.
Imbs, J. (2004). Trade, finance, specialization and synchronization. The Review of Economics
and Statistics 84, 723–734.
Kapetanios, G., M. H. Pesaran, and T. Yamagata (2011). Panels with nonstationary multifac-
tor error structures. Journal of Econometrics 160, 326–348.
Mastromarco, C., L. Serlenga, and Y. Shin (2015). Modelling technical efficiency in cross sec-
tionally dependent stochastic frontier panels. Journal of Applied Econometrics, forthcoming.
Newey, W. K. and K. D. West (1987). Hypothesis testing with efficient method of moments
estimation. International Economic Review, 777–787.
Paulsen, J. (1984). Order determination of multivariate autoregressive time series with unit
roots. Journal of Time Series Analysis 5, 115–127.
Pesaran, M. H. (2006). Estimation and inference in large heterogeneous panels with a multifactor error structure. Econometrica 74, 967–1012.
Pesaran, M. H., L. Vanessa Smith, and T. Yamagata (2013). Panel unit root tests in the pres-
ence of a multifactor error structure. Journal of Econometrics 175, 94–115.
Phillips, P. C. B. and B. Hansen (1990). Statistical inference in instrumental variables regres-
sion with I(1) variables. Review of Economic Studies 57, 99–125.
Reese, S. and J. Westerlund (2015a). Estimation of factor-augmented panel regressions with
weakly influential factors. Econometric Reviews, forthcoming.
Reese, S. and J. Westerlund (2015b). PANICCA – PANIC on cross-section averages. Journal of Applied Econometrics, forthcoming.
Rosenberg, B. (1974). Extra-market components of covariance in security returns. Journal of
Financial and Quantitative Analysis 9, 263–274.
Smeekes, S. (2015). Bootstrap sequential tests to determine the order of integration of indi-
vidual units in a time series panel. Journal of Time Series Analysis 36, 398–415.
Stock, J. and M. W. Watson (1998). Diffusion indexes. Technical report, Working Paper 6702,
National Bureau of Economic Research.
Tutz, G. and H. Binder (2007). Boosting ridge regression. Computational Statistics and Data
Analysis 51, 6044–6059.
Westerlund, J. and J.-P. Urbain (2013). On the estimation and inference in factor-augmented
panel regressions with correlated loadings. Economics Letters 119(3), 247–250.
Westerlund, J. and J.-P. Urbain (2015). Cross-sectional averages versus principal components.
Journal of Econometrics 185, 372–377.
Zheng, X. and W.-Y. Loh (1995). Consistent variable selection in linear models. Journal of the
American Statistical Association 90, 151–156.
Zheng, X. and W.-Y. Loh (1997). A consistent variable selection criterion for linear models
with high-dimensional covariates. Statistica Sinica 7, 311–325.
Appendix: Proofs
We start with some notation. The model for $w_{i,t} = (y_{i,t}, x_{i,t}')'$ can be written in matrix notation as
$$W_i = FC_i + U_i, \quad (A1)$$
where $W_i = (w_{i,1}, \dots, w_{i,T})'$ is $T \times (m+1)$, $F = (f_1, \dots, f_T)'$ is $T \times r$, $C_i = (\Lambda_i'\beta + \lambda_i, \Lambda_i')$ is $r \times (m+1)$ and $U_i = (u_{i,1}, \dots, u_{i,T})' = (\eta_i\beta + \varepsilon_i, \eta_i)$ is $T \times (m+1)$. Alternatively, the model for $w_{i,t}$ can be written as the following $N$-dimensional system:
$$w_t = Cf_t + u_t, \quad (A2)$$
where $w_t = (w_{1,t}', \dots, w_{N,t}')'$ and $u_t = (u_{1,t}', \dots, u_{N,t}')'$ are $N(m+1) \times 1$, and $C = (C_1, \dots, C_N)'$ is $N(m+1) \times r$. The matrix notation
$$W = FC' + U \quad (A3)$$
will also be used, where $W = (W_1, \dots, W_N)$ and $U = (U_1, \dots, U_N)$ are $T \times N(m+1)$. In what follows the representations in (A1)–(A3) will be used interchangeably.
Many of the results can be expressed in terms of $(\hat{F} - FH')$. Let us therefore define
$$D = \hat{F} - FH' = \frac{1}{N}\sum_{i=1}^N U_iZ_i, \quad (A4)$$
whose dimension is given by $T \times (m+1)k$. It is further convenient to write $D = (d_1, \dots, d_T)'$, where
$$d_t = \hat{f}_t - Hf_t = \frac{1}{N}\sum_{i=1}^N Z_i'u_{i,t} \quad (A5)$$
is $(m+1)k \times 1$.
Before we come to the proof of Theorem 1 we state some useful lemmas.

Lemma A.1. Under Assumption 2,
$$\frac{1}{T}\sum_{t=1}^T\|d_t\|^2 = O_p(N^{-1}).$$

Proof of Lemma A.1.
The proof of Lemma A.1 is a simple consequence of the fact that $\|N^{-1/2}\sum_{i=1}^N Z_i'u_{i,t}\| = \|\phi_t\| = O_p(1)$, by Assumption 2 (iii), as seen by using (A5) and writing
$$\frac{1}{T}\sum_{t=1}^T\|d_t\|^2 \leq \frac{1}{NT}\sum_{t=1}^T\left\|\frac{1}{\sqrt{N}}\sum_{i=1}^N Z_i'u_{i,t}\right\|^2 = O_p(N^{-1}),$$
where the triangle inequality is used to obtain the first inequality. $\square$
Lemma A.2. Under Assumptions 1 and 2,
$$\|\sqrt{N}T^{-1/2}F'D\| = O_p(1).$$

Proof of Lemma A.2.
Since, by using (A5),
$$\sqrt{N}T^{-1/2}F'D = \frac{1}{\sqrt{NT}}\sum_{i=1}^N\sum_{t=1}^T f_tu_{i,t}'Z_i = \frac{1}{\sqrt{T}}\sum_{t=1}^T f_t\frac{1}{\sqrt{N}}\sum_{i=1}^N u_{i,t}'Z_i = \frac{1}{\sqrt{T}}\sum_{t=1}^T f_t\phi_t', \quad (A6)$$
the proof is an immediate consequence of Assumption 1 (vii) and Assumption 2 (iii). $\square$
Lemma A.3. Under the conditions of Lemma A.1 and as $N, T \to \infty$,
$$NT^{-1}D'D = \Sigma_{Zu} + o_p(1).$$

Proof of Lemma A.3.
By substituting (A5),
$$NT^{-1}D'D = \frac{N}{T}\sum_{t=1}^T d_td_t' = \frac{1}{NT}\sum_{t=1}^T\sum_{i=1}^N\sum_{j=1}^N Z_i'u_{i,t}u_{j,t}'Z_j = \frac{1}{NT}\sum_{t=1}^T\sum_{i=1}^N Z_i'u_{i,t}u_{i,t}'Z_i + \frac{1}{NT}\sum_{t=1}^T\sum_{i=1}^N\sum_{j\neq i}^N Z_i'u_{i,t}u_{j,t}'Z_j = \Sigma_{Zu} + O_p(T^{-1/2}), \quad (A7)$$
where the last equality follows from Assumption 2 (iii). $\square$
Lemma A.4. Under Assumptions 1 and 2 and $n = r$, as $N, T \to \infty$ with $T/N \to \tau > 0$,
$$\frac{1}{\sqrt{NT}}\sum_{i=1}^N \eta_i'D(H')^{-1}\lambda_i = \sqrt{\tau}\,\frac{1}{N}\sum_{i=1}^N \Sigma_{\eta,i}(\beta, I_m)Z_i(H')^{-1}\lambda_i + o_p(1).$$

Proof of Lemma A.4.
By using (A5), we write
$$\frac{1}{T}\sum_{i=1}^N \eta_i'D(H')^{-1}\lambda_i = \frac{1}{T}\sum_{i=1}^N\sum_{t=1}^T \eta_{i,t}d_t'(H')^{-1}\lambda_i = \frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T\sum_{j=1}^N \eta_{i,t}u_{j,t}'Z_j(H')^{-1}\lambda_i$$
$$= \frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T \eta_{i,t}u_{i,t}'Z_i(H')^{-1}\lambda_i + \frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T\sum_{j\neq i}^N \eta_{i,t}u_{j,t}'Z_j(H')^{-1}\lambda_i = \frac{1}{N}\sum_{i=1}^N \Sigma_{\eta,i}(\beta, I_m)Z_i(H')^{-1}\lambda_i + O_p(T^{-1/2}), \quad (A8)$$
where the last equality is obtained by using Assumption 2 (iii) and the fact that $E(\eta_{i,t}u_{i,t}') = E[\eta_{i,t}(\varepsilon_{i,t} + \eta_{i,t}'\beta, \eta_{i,t}')] = (\Sigma_{\eta,i}\beta, \Sigma_{\eta,i}) = \Sigma_{\eta,i}(\beta, I_m)$, which is implied by Assumption 1. The result in the lemma is obtained by multiplying both sides by $\sqrt{T}/\sqrt{N}$. $\square$
Lemma A.5. Under the conditions of Lemma A.4,
$$\frac{1}{\sqrt{NT}}\sum_{i=1}^N \varepsilon_i'D(H^{-1})'\Lambda_i' = \sqrt{\tau}\,\frac{1}{N}\sum_{i=1}^N \sigma_{\varepsilon,i}^2(1, 0_{1\times m})Z_i(H^{-1})'\Lambda_i' + o_p(1).$$

Proof of Lemma A.5.
By using (A5), we write
$$\frac{1}{T}\sum_{i=1}^N \varepsilon_i'D(H^{-1})'\Lambda_i' = \frac{1}{NT}\sum_{i=1}^N\sum_{j=1}^N\sum_{t=1}^T \varepsilon_{i,t}u_{j,t}'Z_j(H^{-1})'\Lambda_i'$$
$$= \frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T \varepsilon_{i,t}u_{i,t}'Z_i(H^{-1})'\Lambda_i' + \frac{1}{NT}\sum_{i=1}^N\sum_{j\neq i}^N\sum_{t=1}^T \varepsilon_{i,t}u_{j,t}'Z_j(H^{-1})'\Lambda_i' = \frac{1}{N}\sum_{i=1}^N \sigma_{\varepsilon,i}^2(1, 0_{1\times m})Z_i(H^{-1})'\Lambda_i' + O_p(T^{-1/2}), \quad (A9)$$
where the last equality is implied by Assumption 2 (iii) and the fact that $E(\varepsilon_{i,t}u_{i,t}') = E[\varepsilon_{i,t}(\varepsilon_{i,t} + \eta_{i,t}'\beta, \eta_{i,t}')] = (\sigma_{\varepsilon,i}^2, 0_{1\times m})$, which is implied by Assumption 1. Then, multiplying both sides by $\sqrt{T}/\sqrt{N}$ yields the required result. $\square$
Proof of Theorem 1.
Since $\operatorname{rk} H = r$ and $n = r$, $H$ is $r \times r$ and nonsingular. The equation for $y_i$ can therefore be written as
$$y_i = X_i\beta + \hat{F}(H')^{-1}\lambda_i - D(H')^{-1}\lambda_i + \varepsilon_i, \quad (A10)$$
where $D = \hat{F} - FH'$ is as defined in the introduction of this appendix. The C3E estimator of $\beta$ is given by
$$\hat\beta_{C3E} = \left(\sum_{i=1}^N X_i'M_{\hat{F}}X_i\right)^{-1}\sum_{i=1}^N X_i'M_{\hat{F}}y_i.$$
By substituting for $y_i$ using (A10), we obtain the following expression for $\sqrt{NT}(\hat\beta_{C3E} - \beta)$:
$$\sqrt{NT}(\hat\beta_{C3E} - \beta) = \left(\frac{1}{NT}\sum_{i=1}^N X_i'M_{\hat{F}}X_i\right)^{-1}\frac{1}{\sqrt{NT}}\sum_{i=1}^N X_i'M_{\hat{F}}(\varepsilon_i - D(H')^{-1}\lambda_i). \quad (A11)$$
We begin by considering the second term in the numerator. Clearly, $M_{\hat{F}}D(H')^{-1} = M_{\hat{F}}(\hat{F} - FH')(H')^{-1} = -M_{\hat{F}}F$, and therefore
$$-\frac{1}{\sqrt{NT}}\sum_{i=1}^N X_i'M_{\hat{F}}D(H')^{-1}\lambda_i = \frac{1}{\sqrt{NT}}\sum_{i=1}^N X_i'M_{\hat{F}}F\lambda_i = \frac{1}{\sqrt{NT}}\sum_{i=1}^N \Lambda_iF'M_{\hat{F}}F\lambda_i + \frac{1}{\sqrt{NT}}\sum_{i=1}^N \eta_i'M_{\hat{F}}F\lambda_i = K_1 + K_2. \quad (A12)$$
Consider $K_1$. From
$$HF'M_{\hat{F}}FH' = D'M_{\hat{F}}D = D'M_{FH'}D - D'(M_{FH'} - M_{\hat{F}})D,$$
we obtain
$$K_1 = \frac{1}{\sqrt{NT}}\sum_{i=1}^N \Lambda_iF'M_{\hat{F}}F\lambda_i = \frac{1}{\sqrt{NT}}\sum_{i=1}^N \Lambda_iH^{-1}HF'M_{\hat{F}}FH'(H')^{-1}\lambda_i$$
$$= \frac{1}{\sqrt{NT}}\sum_{i=1}^N \Lambda_iH^{-1}D'M_{FH'}D(H')^{-1}\lambda_i - \frac{1}{\sqrt{NT}}\sum_{i=1}^N \Lambda_iH^{-1}D'(M_{FH'} - M_{\hat{F}})D(H')^{-1}\lambda_i = K_{11} - K_{12}. \quad (A13)$$
Consider $K_{12}$. From the definitions of $M_{FH'}$ and $M_{\hat{F}}$,
$$M_{FH'} - M_{\hat{F}} = D(\hat{F}'\hat{F})^{-1}D' + D(\hat{F}'\hat{F})^{-1}HF' + FH'(\hat{F}'\hat{F})^{-1}D' + FH'[(\hat{F}'\hat{F})^{-1} - (HF'FH')^{-1}]HF',$$
which implies
$$D'(M_{FH'} - M_{\hat{F}})D = D'D(\hat{F}'\hat{F})^{-1}D'D + D'D(\hat{F}'\hat{F})^{-1}HF'D + D'FH'(\hat{F}'\hat{F})^{-1}D'D + D'FH'[(\hat{F}'\hat{F})^{-1} - (HF'FH')^{-1}]HF'D. \quad (A14)$$
Consider the fourth term. Since $(HF'FH')^{-1} = (H')^{-1}(F'F)^{-1}H^{-1}$, we have
$$(\hat{F}'\hat{F})^{-1} - (HF'FH')^{-1} = (\hat{F}'\hat{F})^{-1}(HF'FH' - \hat{F}'\hat{F})(H')^{-1}(F'F)^{-1}H^{-1} = -(\hat{F}'\hat{F})^{-1}(D'FH' + \hat{F}'D)(H')^{-1}(F'F)^{-1}H^{-1}.$$
By Assumption 2 (i), and Lemmas A.2 and A.3, using the triangle inequality and the submultiplicative property of norms,
$$\|\sqrt{N}T^{-1/2}D'\hat{F}\| \leq \sqrt{T}N^{-1/2}\|NT^{-1}D'D\| + \|\sqrt{N}T^{-1/2}D'F\|\,\|H\| = O_p(\sqrt{T}N^{-1/2}) + O_p(1), \quad (A15)$$
which, together with Assumption 1 (iv), gives
$$T\|(\hat{F}'\hat{F})^{-1} - (HF'FH')^{-1}\| \leq \|(T^{-1}\hat{F}'\hat{F})^{-1}\|\,T^{-1}\|D'FH' + \hat{F}'D\|\,\|(H')^{-1}\|\,\|(T^{-1}F'F)^{-1}\|\,\|H^{-1}\| = O_p(N^{-1}) + O_p((NT)^{-1/2}). \quad (A16)$$
These results imply, via Lemmas A.2 and A.3 and Assumption 1 (i),
$$\|T^{-1}D'(M_{FH'} - M_{\hat{F}})D\| \leq \|T^{-1}D'D\|^2\,\|(T^{-1}\hat{F}'\hat{F})^{-1}\| + 2\|H\|\,\|T^{-1}D'D\|\,\|(T^{-1}\hat{F}'\hat{F})^{-1}\|\,\|T^{-1}F'D\| + \|T^{-1}D'F\|^2\|H\|^2\,T\|(\hat{F}'\hat{F})^{-1} - (HF'FH')^{-1}\|$$
$$= O_p(N^{-2}) + O_p(N^{-1})O_p((NT)^{-1/2}) + [O_p(N^{-1}) + O_p((NT)^{-1/2})]O_p((NT)^{-1}) = O_p(N^{-2}) + O_p(N^{-3/2}T^{-1/2}). \quad (A17)$$
Hence, by Assumption 1 (vi), the submultiplicative property of norms and the triangle inequality, we have
$$\|K_{12}\| = \left\|\frac{1}{\sqrt{NT}}\sum_{i=1}^N \Lambda_iH^{-1}D'(M_{FH'} - M_{\hat{F}})D(H')^{-1}\lambda_i\right\| \leq \sqrt{NT}\,\|H^{-1}\|^2\,\|T^{-1}D'(M_{FH'} - M_{\hat{F}})D\|\,\frac{1}{N}\sum_{i=1}^N\|\Lambda_i\|\,\|\lambda_i\| = O_p(\sqrt{T}N^{-3/2}) + O_p(N^{-1}). \quad (A18)$$
Consider $K_{11}$. Since $M_{FH'} = M_F$, we have $D'M_{FH'}D = D'M_FD = D'D - D'F(F'F)^{-1}F'D$, where
$$\left\|\frac{1}{\sqrt{NT}}\sum_{i=1}^N \Lambda_iH^{-1}D'F(F'F)^{-1}F'D(H')^{-1}\lambda_i\right\| \leq (NT)^{-1/2}\|H^{-1}\|^2\,\|\sqrt{N}T^{-1/2}D'F\|^2\,\|(T^{-1}F'F)^{-1}\|\,\frac{1}{N}\sum_{i=1}^N\|\Lambda_i\|\,\|\lambda_i\| = O_p((NT)^{-1/2}),$$
which is obtained by making use of Assumptions 1 (i), (iv), (vi) and Lemma A.2. It follows that
$$K_{11} = \frac{\sqrt{T}}{\sqrt{N}}\,\frac{1}{N}\sum_{i=1}^N \Lambda_iH^{-1}NT^{-1}D'D(H')^{-1}\lambda_i + O_p(\sqrt{T}N^{-3/2}) + O_p(N^{-1}) + O_p((NT)^{-1/2}),$$
and so, by application of Lemma A.3 and using Assumption 1 (vi), as $T/N \to \tau$ with $N, T \to \infty$,
$$K_1 = K_{11} - K_{12} = \sqrt{\tau}B_1 + o_p(1), \quad (A19)$$
where $B_1 = \lim_{N\to\infty} N^{-1}\sum_{i=1}^N \Lambda_iH^{-1}\Sigma_{Zu}(H')^{-1}\lambda_i$.
Next, consider $K_2$. By using $M_{FH'}F\lambda_i = M_FF\lambda_i = 0_{T\times 1}$, and the previously obtained expression to substitute for $(M_{FH'} - M_{\hat{F}})$, we arrive at
$$K_2 = \frac{1}{\sqrt{NT}}\sum_{i=1}^N \eta_i'M_{\hat{F}}F\lambda_i = -\frac{1}{\sqrt{NT}}\sum_{i=1}^N \eta_i'(M_{FH'} - M_{\hat{F}})F\lambda_i$$
$$= -\frac{1}{\sqrt{NT}}\sum_{i=1}^N \eta_i'D(\hat{F}'\hat{F})^{-1}D'F\lambda_i - \frac{1}{\sqrt{NT}}\sum_{i=1}^N \eta_i'D(\hat{F}'\hat{F})^{-1}HF'F\lambda_i - \frac{1}{\sqrt{NT}}\sum_{i=1}^N \eta_i'FH'(\hat{F}'\hat{F})^{-1}D'F\lambda_i - \frac{1}{\sqrt{NT}}\sum_{i=1}^N \eta_i'FH'[(\hat{F}'\hat{F})^{-1} - (HF'FH')^{-1}]HF'F\lambda_i$$
$$= -K_{21} - \dots - K_{24}.$$
Since $d_t'(\hat{F}'\hat{F})^{-1}d_s$, $f_t'H'(\hat{F}'\hat{F})^{-1}D'F\lambda_i$ and $f_s'\lambda_i$ are just scalars, the orders of $K_{21}$ and $K_{23}$ can be inferred as follows:
$$\|K_{21}\| = \left\|\frac{1}{\sqrt{NT}}\sum_{i=1}^N \eta_i'D(\hat{F}'\hat{F})^{-1}D'F\lambda_i\right\| = \left\|\frac{1}{\sqrt{NT}}\sum_{i=1}^N\sum_{t=1}^T \eta_{i,t}d_t'(\hat{F}'\hat{F})^{-1}\sum_{s=1}^T d_sf_s'\lambda_i\right\| = \left\|\frac{1}{\sqrt{NT}}\sum_{t=1}^T\sum_{s=1}^T d_t'(\hat{F}'\hat{F})^{-1}d_s\sum_{i=1}^N \eta_{i,t}f_s'\lambda_i\right\|$$
$$\leq \sqrt{T}\left(\frac{1}{T^2}\sum_{t=1}^T\sum_{s=1}^T \|d_t'(T^{-1}\hat{F}'\hat{F})^{-1}d_s\|^2\right)^{1/2}\left(\frac{1}{T^2}\sum_{t=1}^T\sum_{s=1}^T\left\|\frac{1}{\sqrt{N}}\sum_{i=1}^N \eta_{i,t}f_s'\lambda_i\right\|^2\right)^{1/2}$$
$$\leq \sqrt{T}\,\|(T^{-1}\hat{F}'\hat{F})^{-1}\|\,\frac{1}{T}\sum_{t=1}^T\|d_t\|^2\left(\frac{1}{T^2}\sum_{t=1}^T\sum_{s=1}^T\left\|\frac{1}{\sqrt{N}}\sum_{i=1}^N \eta_{i,t}\lambda_i'\right\|^2\|f_s\|^2\right)^{1/2} = O_p(\sqrt{T}N^{-1}),$$
where we make use of Lemma A.1 and Assumption 1 to obtain the result, and
$$\|K_{23}\| = \left\|\frac{1}{\sqrt{NT}}\sum_{i=1}^N \eta_i'FH'(\hat{F}'\hat{F})^{-1}D'F\lambda_i\right\| = \left\|\frac{1}{\sqrt{NT}}\sum_{i=1}^N\sum_{t=1}^T \eta_{i,t}f_t'H'(\hat{F}'\hat{F})^{-1}D'F\lambda_i\right\| = \left\|\frac{1}{\sqrt{NT}}\sum_{i=1}^N\sum_{t=1}^T \eta_{i,t}\lambda_i'F'D(\hat{F}'\hat{F})^{-1}Hf_t\right\|$$
$$\leq \frac{1}{\sqrt{N}}\left(\frac{1}{T}\sum_{t=1}^T\left\|\frac{1}{\sqrt{N}}\sum_{i=1}^N \eta_{i,t}\lambda_i'\right\|^2\right)^{1/2}\|(T^{-1}\hat{F}'\hat{F})^{-1}\|\,\|\sqrt{N}T^{-1/2}F'D\|\,\|H\|\left(\frac{1}{T}\sum_{t=1}^T\|f_t\|^2\right)^{1/2} = O_p(N^{-1/2}),$$
where the result makes use of Lemma A.2 and Assumption 1. Similarly, since $f_t'H'[(\hat{F}'\hat{F})^{-1} - (HF'FH')^{-1}]HF'F\lambda_i$ is a scalar,
$$\|K_{24}\| = \left\|\frac{1}{\sqrt{NT}}\sum_{i=1}^N \eta_i'FH'\,T[(\hat{F}'\hat{F})^{-1} - (HF'FH')^{-1}]\,T^{-1}HF'F\lambda_i\right\| = \left\|\frac{1}{\sqrt{NT}}\sum_{i=1}^N\sum_{t=1}^T \eta_{i,t}\lambda_i'F'FH'[(\hat{F}'\hat{F})^{-1} - (HF'FH')^{-1}]Hf_t\right\|$$
$$\leq \sqrt{T}\left(\frac{1}{T}\sum_{t=1}^T\left\|\frac{1}{\sqrt{N}}\sum_{i=1}^N \eta_{i,t}\lambda_i'\right\|^2\right)^{1/2}\|T^{-1}F'F\|\,\|H\|^2\,T\|(\hat{F}'\hat{F})^{-1} - (HF'FH')^{-1}\|\left(\frac{1}{T}\sum_{t=1}^T\|f_t\|^2\right)^{1/2} = O_p(\sqrt{T}N^{-1}) + O_p(N^{-1/2}),$$
by (A16), Assumption 1 and Assumption 2 (i). $K_{22}$ can be expanded as follows, by adding and subtracting $\frac{1}{\sqrt{NT}}\sum_{i=1}^N \eta_i'D(HF'FH')^{-1}HF'F\lambda_i$:
$$K_{22} = \frac{1}{\sqrt{NT}}\sum_{i=1}^N \eta_i'D(\hat{F}'\hat{F})^{-1}HF'F\lambda_i = \frac{1}{\sqrt{NT}}\sum_{i=1}^N \eta_i'D(H')^{-1}\lambda_i + \sqrt{NT}\,\frac{1}{N}\sum_{i=1}^N T^{-1}\eta_i'D[(T^{-1}\hat{F}'\hat{F})^{-1} - (T^{-1}HF'FH')^{-1}]T^{-1}HF'F\lambda_i,$$
where the norm of the last term on the right is
$$\left\|\frac{1}{N}\sum_{i=1}^N T^{-1}\eta_i'D[(T^{-1}\hat{F}'\hat{F})^{-1} - (T^{-1}HF'FH')^{-1}]T^{-1}HF'F\lambda_i\right\| = \left\|\frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T \eta_{i,t}\lambda_i'\,T^{-1}F'FH'[(T^{-1}\hat{F}'\hat{F})^{-1} - (T^{-1}HF'FH')^{-1}]d_t\right\|$$
$$\leq \left(\frac{1}{T}\sum_{t=1}^T\left\|\frac{1}{\sqrt{N}}\sum_{i=1}^N \eta_{i,t}\lambda_i'\right\|^2\right)^{1/2}\|T^{-1}F'F\|\,\|H\|\,\|(T^{-1}\hat{F}'\hat{F})^{-1} - (T^{-1}HF'FH')^{-1}\|\left(\frac{1}{T}\sum_{t=1}^T\|d_t\|^2\right)^{1/2}$$
$$= [O_p(N^{-1}) + O_p((NT)^{-1/2})]O_p(N^{-1/2}) = O_p(N^{-3/2}) + O_p(T^{-1/2}N^{-1}),$$
by Lemma A.1, (A16) and Assumption 1. The order of the second term in $K_{22}$ is $\sqrt{NT}$ times this, which is $O_p(\sqrt{T}N^{-1}) + O_p(N^{-1/2})$. The first term of $K_{22}$ is
$$\frac{1}{\sqrt{NT}}\sum_{i=1}^N \eta_i'D(H')^{-1}\lambda_i = \sqrt{\tau}\,\frac{1}{N}\sum_{i=1}^N \Sigma_{\eta,i}(\beta, I_m)Z_i(H')^{-1}\lambda_i + o_p(1),$$
by Lemma A.4. Hence, letting $B_2 = \lim_{N\to\infty} N^{-1}\sum_{i=1}^N \Sigma_{\eta,i}(\beta, I_m)Z_i(H')^{-1}\lambda_i$, we have
$$K_{22} = \sqrt{\tau}B_2 + o_p(1). \quad (A20)$$
The above results imply that, for the second term in the numerator,
$$-\frac{1}{\sqrt{NT}}\sum_{i=1}^N X_i'M_{\hat{F}}D(H')^{-1}\lambda_i = K_1 + K_2 = \sqrt{\tau}(B_1 - B_2) + o_p(1). \quad (A21)$$
Next, consider $\frac{1}{\sqrt{NT}}\sum_{i=1}^N X_i'M_{\hat{F}}\varepsilon_i$, the first term in the numerator of $\sqrt{NT}(\hat\beta_{C3E} - \beta)$. Clearly,
$$\frac{1}{\sqrt{NT}}\sum_{i=1}^N X_i'M_{\hat{F}}\varepsilon_i = \frac{1}{\sqrt{NT}}\sum_{i=1}^N X_i'M_{FH'}\varepsilon_i - \frac{1}{\sqrt{NT}}\sum_{i=1}^N X_i'(M_{FH'} - M_{\hat{F}})\varepsilon_i, \quad (A22)$$
where
$$\frac{1}{\sqrt{NT}}\sum_{i=1}^N X_i'(M_{FH'} - M_{\hat{F}})\varepsilon_i = \frac{1}{\sqrt{NT}}\sum_{i=1}^N X_i'D(\hat{F}'\hat{F})^{-1}D'\varepsilon_i + \frac{1}{\sqrt{NT}}\sum_{i=1}^N X_i'D(\hat{F}'\hat{F})^{-1}HF'\varepsilon_i + \frac{1}{\sqrt{NT}}\sum_{i=1}^N X_i'FH'(\hat{F}'\hat{F})^{-1}D'\varepsilon_i + \frac{1}{\sqrt{NT}}\sum_{i=1}^N X_i'FH'[(\hat{F}'\hat{F})^{-1} - (HF'FH')^{-1}]HF'\varepsilon_i$$
$$= L_1 + \dots + L_4. \quad (A23)$$
The orders of $L_1, \dots, L_4$ can be obtained by using the same steps as when analyzing $K_2$. For $L_1$, we use the fact that $x_{i,t} = \Lambda_if_t + \eta_{i,t}$, giving
$$\left\|\frac{1}{\sqrt{N}}\sum_{i=1}^N x_{i,t}\varepsilon_{i,s}\right\| \leq \left\|\frac{1}{\sqrt{N}}\sum_{i=1}^N \Lambda_i\varepsilon_{i,s}\right\|\|f_t\| + \left\|\frac{1}{\sqrt{N}}\sum_{i=1}^N \eta_{i,t}\varepsilon_{i,s}\right\| = O_p(1),$$
which, in view of Lemma A.1, implies
$$\|L_1\| = \left\|\frac{1}{\sqrt{NT}}\sum_{i=1}^N X_i'D(\hat{F}'\hat{F})^{-1}D'\varepsilon_i\right\| = \left\|\frac{1}{\sqrt{NT}}\sum_{t=1}^T\sum_{s=1}^T d_t'(\hat{F}'\hat{F})^{-1}d_s\sum_{i=1}^N x_{i,t}\varepsilon_{i,s}\right\|$$
$$\leq \sqrt{T}\,\frac{1}{T}\sum_{t=1}^T\|d_t\|^2\,\|(T^{-1}\hat{F}'\hat{F})^{-1}\|\left(\frac{1}{T^2}\sum_{t=1}^T\sum_{s=1}^T\left\|\frac{1}{\sqrt{N}}\sum_{i=1}^N x_{i,t}\varepsilon_{i,s}\right\|^2\right)^{1/2} = O_p(\sqrt{T}N^{-1}), \quad (A24)$$
by Assumption 1. We can similarly show that $\|T^{-1}X_i'F\| = O_p(1)$, leading to the following result for $\|L_4\|$:
$$\|L_4\| = \left\|\frac{1}{\sqrt{NT}}\sum_{i=1}^N X_i'FH'[(\hat{F}'\hat{F})^{-1} - (HF'FH')^{-1}]HF'\varepsilon_i\right\|$$
$$\leq \sqrt{N}\left(\frac{1}{N}\sum_{i=1}^N\|T^{-1}X_i'F\|^2\right)^{1/2}\|H\|^2\,T\|(\hat{F}'\hat{F})^{-1} - (HF'FH')^{-1}\|\left(\frac{1}{N}\sum_{i=1}^N\|T^{-1/2}F'\varepsilon_i\|^2\right)^{1/2} = O_p(N^{-1/2}) + O_p(T^{-1/2}), \quad (A25)$$
by Assumptions 1, 2 (i) and (A16). Consider $L_2$. Adding and subtracting $\frac{1}{\sqrt{NT}}\sum_{i=1}^N X_i'D(HF'FH')^{-1}HF'\varepsilon_i$ gives
$$L_2 = \frac{1}{\sqrt{NT}}\sum_{i=1}^N X_i'D(\hat{F}'\hat{F})^{-1}HF'\varepsilon_i = \frac{1}{\sqrt{NT}}\sum_{i=1}^N X_i'D(H')^{-1}(F'F)^{-1}F'\varepsilon_i + \frac{1}{\sqrt{NT}}\sum_{i=1}^N X_i'D[(\hat{F}'\hat{F})^{-1} - (HF'FH')^{-1}]HF'\varepsilon_i = L_{21} + L_{22},$$
where
$$\|L_{22}\| = \left\|\frac{1}{\sqrt{NT}}\sum_{i=1}^N X_i'D[(\hat{F}'\hat{F})^{-1} - (HF'FH')^{-1}]HF'\varepsilon_i\right\| = \left\|\frac{1}{\sqrt{NT}}\sum_{i=1}^N\sum_{t=1}^T\sum_{s=1}^T x_{i,t}d_t'[(\hat{F}'\hat{F})^{-1} - (HF'FH')^{-1}]Hf_s\varepsilon_{i,s}\right\|$$
$$\leq \sqrt{T}\,\|H\|\left(\frac{1}{T}\sum_{t=1}^T\|d_t\|^2\right)^{1/2}T\|(\hat{F}'\hat{F})^{-1} - (HF'FH')^{-1}\|\left(\frac{1}{T}\sum_{s=1}^T\|f_s\|^2\right)^{1/2}\left(\frac{1}{T^2}\sum_{t=1}^T\sum_{s=1}^T\left\|\frac{1}{\sqrt{N}}\sum_{i=1}^N \varepsilon_{i,s}x_{i,t}'\right\|^2\right)^{1/2}$$
$$= \sqrt{T}\,O_p(N^{-1/2})[O_p(N^{-1}) + O_p((NT)^{-1/2})] = O_p(\sqrt{T}N^{-3/2}) + O_p(N^{-1}),$$
by Assumption 1, Lemma A.1 and (A16). Also, from $X_i = F\Lambda_i' + \eta_i$,
$$L_{21} = \frac{1}{\sqrt{NT}}\sum_{i=1}^N X_i'D(H')^{-1}(F'F)^{-1}F'\varepsilon_i = \frac{1}{\sqrt{NT}}\sum_{i=1}^N \Lambda_iF'D(H')^{-1}(F'F)^{-1}F'\varepsilon_i + \frac{1}{\sqrt{NT}}\sum_{i=1}^N \eta_i'D(H')^{-1}(F'F)^{-1}F'\varepsilon_i.$$
By Assumption 2 (iii),
$$T^{-1}\eta_i'D = \frac{1}{T}\sum_{t=1}^T \eta_{i,t}d_t' = \frac{1}{NT}\sum_{t=1}^T\sum_{j=1}^N \eta_{i,t}u_{j,t}'Z_j = \frac{1}{NT}\sum_{t=1}^T \eta_{i,t}u_{i,t}'Z_i + \frac{1}{NT}\sum_{t=1}^T\sum_{j\neq i}^N \eta_{i,t}u_{j,t}'Z_j = O_p(N^{-1}) + O_p((NT)^{-1/2}),$$
from which, together with Assumption 1 and Assumption 2 (i), we deduce that
$$\left\|\frac{1}{\sqrt{NT}}\sum_{i=1}^N \eta_i'D(H')^{-1}(F'F)^{-1}F'\varepsilon_i\right\| \leq \sqrt{N}\left(\frac{1}{N}\sum_{i=1}^N\|T^{-1}\eta_i'D\|^2\right)^{1/2}\left(\frac{1}{N}\sum_{i=1}^N\|T^{-1/2}F'\varepsilon_i\|^2\right)^{1/2}\|H^{-1}\|\,\|(T^{-1}F'F)^{-1}\|$$
$$= \sqrt{N}[O_p(N^{-1}) + O_p((NT)^{-1/2})] = O_p(N^{-1/2}) + O_p(T^{-1/2}),$$
and by further use of Lemma A.2 and Assumption 1,
$$\left\|\frac{1}{\sqrt{NT}}\sum_{i=1}^N \Lambda_iF'D(H')^{-1}(F'F)^{-1}F'\varepsilon_i\right\| \leq \frac{1}{\sqrt{T}}\,\frac{1}{N}\sum_{i=1}^N\|\Lambda_i\|\,\|\sqrt{N}T^{-1/2}F'D\|\,\|H^{-1}\|\,\|(T^{-1}F'F)^{-1}\|\,\|T^{-1/2}F'\varepsilon_i\| = O_p(T^{-1/2}).$$
Consequently, by the triangle inequality,
$$\|L_{21}\| \leq \left\|\frac{1}{\sqrt{NT}}\sum_{i=1}^N \Lambda_iF'D(H')^{-1}(F'F)^{-1}F'\varepsilon_i\right\| + \left\|\frac{1}{\sqrt{NT}}\sum_{i=1}^N \eta_i'D(H')^{-1}(F'F)^{-1}F'\varepsilon_i\right\| = O_p(N^{-1/2}) + O_p(T^{-1/2}),$$
leading to the following result for $\|L_2\|$:
$$\|L_2\| \leq \|L_{21}\| + \|L_{22}\| = O_p(N^{-1/2}) + O_p(T^{-1/2}) + O_p(\sqrt{T}N^{-3/2}), \quad (A26)$$
which is $o_p(1)$ as $N, T \to \infty$, if we assume that $\sqrt{T}N^{-3/2} = o(1)$.
Consider $L_3$. We begin by adding and subtracting:
$$
L_3 = \frac{1}{\sqrt{NT}} \sum_{i=1}^N X_i' F H' (\hat F' \hat F)^{-1} D' \varepsilon_i = \frac{1}{\sqrt{NT}} \sum_{i=1}^N X_i' F (F' F)^{-1} H^{-1} D' \varepsilon_i + \frac{1}{\sqrt{NT}} \sum_{i=1}^N X_i' F H' [(\hat F' \hat F)^{-1} - (H F' F H')^{-1}] D' \varepsilon_i,
$$
where, in analogy to $\|L_{22}\|$,
$$
\begin{aligned}
\left\| \frac{1}{\sqrt{NT}} \sum_{i=1}^N X_i' F H' [(\hat F' \hat F)^{-1} - (H F' F H')^{-1}] D' \varepsilon_i \right\|
&= \left\| \frac{1}{\sqrt{NT}} \sum_{i=1}^N \sum_{t=1}^T \sum_{s=1}^T x_{i,t} f_t' H' [(\hat F' \hat F)^{-1} - (H F' F H')^{-1}] d_s \varepsilon_{i,s} \right\| \\
&\le \sqrt{T} \left( \frac{1}{T} \sum_{t=1}^T \|f_t\|^2 \right)^{1/2} \|H\| \, T \|(\hat F' \hat F)^{-1} - (H F' F H')^{-1}\| \left( \frac{1}{T} \sum_{s=1}^T \|d_s\|^2 \right)^{1/2} \left( \frac{1}{T^2} \sum_{t=1}^T \sum_{s=1}^T \left\| \frac{1}{\sqrt N} \sum_{i=1}^N x_{i,t} \varepsilon_{i,s} \right\|^2 \right)^{1/2} \\
&= \sqrt{T} [O_p(N^{-1}) + O_p((NT)^{-1/2})] O_p(N^{-1/2}) = O_p(\sqrt{T} N^{-3/2}) + O_p(N^{-1}),
\end{aligned}
$$
by Assumption 1, the result in (A16), and Lemma A.1. Consider the first term of $L_3$. By substituting for $X_i$ and then using Lemma A.5, it can be written as
$$
\begin{aligned}
\frac{1}{\sqrt{NT}} \sum_{i=1}^N X_i' F (F' F)^{-1} H^{-1} D' \varepsilon_i
&= \frac{1}{\sqrt{NT}} \sum_{i=1}^N \Lambda_i F' F (F' F)^{-1} H^{-1} D' \varepsilon_i + \sqrt{N} \frac{1}{N} \sum_{i=1}^N T^{-1/2} \eta_i' F (T^{-1} F' F)^{-1} H^{-1} T^{-1} D' \varepsilon_i \\
&= \frac{1}{\sqrt{NT}} \sum_{i=1}^N \Lambda_i H^{-1} D' \varepsilon_i + \sqrt{N} [O_p(N^{-1}) + O_p((NT)^{-1/2})] \\
&= \sqrt{\tau} \frac{1}{N} \sum_{i=1}^N \sigma_{\varepsilon,i}^2 \Lambda_i H^{-1} Z_i' (1, 0_m)' + o_p(1),
\end{aligned}
$$
where we have made use of the fact that $T^{-1} D' \varepsilon_i$ is of the same order as $T^{-1} \eta_i' D$. Note how Lemma A.5 supposes that $T/N \to \tau$, under which $\sqrt{T} N^{-3/2} = o(1)$. Hence, letting $B_3 = \lim_{N \to \infty} N^{-1} \sum_{i=1}^N \sigma_{\varepsilon,i}^2 \Lambda_i H^{-1} Z_i' (1, 0_m)'$, we obtain
$$
L_3 = \sqrt{\tau} B_3 + o_p(1). \tag{A27}
$$
The results for $L_1, \ldots, L_4$ give
$$
\frac{1}{\sqrt{NT}} \sum_{i=1}^N X_i' (M_{FH'} - M_{\hat F}) \varepsilon_i = L_1 + \ldots + L_4 = \sqrt{\tau} B_3 + o_p(1), \tag{A28}
$$
provided that $\sqrt{T} N^{-1/2} \to \sqrt{\tau}$. The implication is that
$$
\frac{1}{\sqrt{NT}} \sum_{i=1}^N X_i' M_{\hat F} \varepsilon_i = \frac{1}{\sqrt{NT}} \sum_{i=1}^N X_i' M_{FH'} \varepsilon_i - \sqrt{\tau} B_3 + o_p(1). \tag{A29}
$$
Let us consider the first term on the right-hand side of the above equation. The variance of $(NT)^{-1/2} \sum_{i=1}^N \eta_i' F H' (H F' F H')^{-1} H F' \varepsilon_i$ is $O_p(T^{-1})$; hence,
$$
\left\| \frac{1}{\sqrt{NT}} \sum_{i=1}^N \eta_i' F H' (H F' F H')^{-1} H F' \varepsilon_i \right\| = O_p(T^{-1/2}). \tag{A30}
$$
This result, together with the fact that $M_{FH'} X_i = M_{FH'} \eta_i$, implies
$$
\begin{aligned}
\frac{1}{\sqrt{NT}} \sum_{i=1}^N X_i' M_{FH'} \varepsilon_i &= \frac{1}{\sqrt{NT}} \sum_{i=1}^N \eta_i' M_{FH'} \varepsilon_i = \frac{1}{\sqrt{NT}} \sum_{i=1}^N \eta_i' \varepsilon_i - \frac{1}{\sqrt{NT}} \sum_{i=1}^N \eta_i' F H' (H F' F H')^{-1} H F' \varepsilon_i \\
&= \frac{1}{\sqrt{NT}} \sum_{i=1}^N \eta_i' \varepsilon_i + O_p(T^{-1/2}),
\end{aligned} \tag{A31}
$$
where, by Assumption 1, $(NT)^{-1/2} \sum_{i=1}^N \eta_i' \varepsilon_i \to_d N(0_{m \times 1}, \Sigma_{\eta\varepsilon})$ as $N, T \to \infty$. Thus, provided that $T/N \to \tau$,
$$
\frac{1}{\sqrt{NT}} \sum_{i=1}^N X_i' M_{\hat F} \varepsilon_i = \frac{1}{\sqrt{NT}} \sum_{i=1}^N \eta_i' \varepsilon_i - \sqrt{\tau} B_3 + o_p(1) \to_d N(0_{m \times 1}, \Sigma_{\eta\varepsilon}) - \sqrt{\tau} B_3. \tag{A32}
$$
Let $B = B_1 - B_2 - B_3$. The above results suggest the following limit for the numerator of $\sqrt{NT}(\hat\beta_{C3E} - \beta)$:
$$
\frac{1}{\sqrt{NT}} \sum_{i=1}^N (X_i' M_{\hat F} \varepsilon_i - X_i' M_{\hat F} D (H')^{-1} \lambda_i) = \frac{1}{\sqrt{NT}} \sum_{i=1}^N \eta_i' \varepsilon_i + \sqrt{\tau} B + o_p(1) \to_d N(0_{m \times 1}, \Sigma_{\eta\varepsilon}) + \sqrt{\tau} B, \tag{A33}
$$
which holds as $N, T \to \infty$ with $T/N \to \tau$.
Next, consider the denominator of $\sqrt{NT}(\hat\beta_{C3E} - \beta)$, which we expand as
$$
\frac{1}{NT} \sum_{i=1}^N X_i' M_{\hat F} X_i = \frac{1}{NT} \sum_{i=1}^N X_i' M_{FH'} X_i - \frac{1}{NT} \sum_{i=1}^N X_i' (M_{FH'} - M_{\hat F}) X_i, \tag{A34}
$$
where
$$
\|T^{-1} X_i' (M_{FH'} - M_{\hat F}) X_i\| \le \|T^{-1} X_i' D\|^2 \|(T^{-1} \hat F' \hat F)^{-1}\| + 2 \|H\| \|T^{-1} X_i' D\| \|T^{-1} X_i' F\| \|(T^{-1} \hat F' \hat F)^{-1}\| + \|T^{-1} X_i' F\|^2 \|H\|^2 T \|(\hat F' \hat F)^{-1} - (H F' F H')^{-1}\|.
$$
Clearly, $\|T^{-1} X_i' F\| = O_p(1)$, and by using the facts that $\|T^{-1} \eta_i' D\| = O_p(N^{-1}) + O_p((NT)^{-1/2})$ and $\|T^{-1} F' D\| = O_p((NT)^{-1/2})$, we can further show that
$$
\|T^{-1} X_i' D\| \le \|\Lambda_i\| \|T^{-1} F' D\| + \|T^{-1} \eta_i' D\| = O_p(N^{-1}) + O_p((NT)^{-1/2}).
$$
This implies
$$
\|T^{-1} X_i' (M_{FH'} - M_{\hat F}) X_i\| = O_p(N^{-1}) + O_p((NT)^{-1/2}), \tag{A35}
$$
and so we get
$$
\left\| \frac{1}{NT} \sum_{i=1}^N X_i' (M_{FH'} - M_{\hat F}) X_i \right\| \le \frac{1}{N} \sum_{i=1}^N \|T^{-1} X_i' (M_{FH'} - M_{\hat F}) X_i\| = O_p(N^{-1}) + O_p((NT)^{-1/2}). \tag{A36}
$$
By using this and
$$
T^{-1} X_i' M_{FH'} X_i = T^{-1} \eta_i' M_{FH'} \eta_i = T^{-1} \eta_i' \eta_i - T^{-1} \cdot T^{-1/2} \eta_i' F (T^{-1} F' F)^{-1} T^{-1/2} F' \eta_i = T^{-1} \eta_i' \eta_i + O_p(T^{-1}), \tag{A37}
$$
we obtain
$$
\frac{1}{NT} \sum_{i=1}^N X_i' M_{\hat F} X_i = \frac{1}{NT} \sum_{i=1}^N X_i' M_{FH'} X_i + o_p(1) = \frac{1}{NT} \sum_{i=1}^N \eta_i' \eta_i + o_p(1) = \Sigma_\eta + o_p(1). \tag{A38}
$$
By combining all of the results, as $N, T \to \infty$ with $T/N \to \tau$,
$$
\begin{aligned}
\sqrt{NT} (\hat\beta_{C3E} - \beta) &= \left( \frac{1}{NT} \sum_{i=1}^N X_i' M_{\hat F} X_i \right)^{-1} \frac{1}{\sqrt{NT}} \sum_{i=1}^N (X_i' M_{\hat F} \varepsilon_i - X_i' M_{\hat F} D (H')^{-1} \lambda_i) \\
&= \Sigma_\eta^{-1} \left( \frac{1}{\sqrt{NT}} \sum_{i=1}^N \eta_i' \varepsilon_i + \sqrt{\tau} B \right) + o_p(1) \to_d N(0_{m \times 1}, \Sigma_\eta^{-1} \Sigma_{\eta\varepsilon} \Sigma_\eta^{-1}) + \Sigma_\eta^{-1} \sqrt{\tau} B.
\end{aligned}
$$
This completes the proof. ∎
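The pooled estimator whose limit has just been derived can be illustrated numerically. The sketch below is a minimal simulation under assumed names, not the authors' code: it generates a one-factor panel, proxies the factor space by the cross-section averages of the observables (the unit-weight combination $z_i = 1$), and computes $\hat\beta = (\sum_i X_i' M_{\hat F} X_i)^{-1} \sum_i X_i' M_{\hat F} y_i$.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, beta = 200, 200, -2.0

f = rng.normal(size=T)                   # one common factor
lam = rng.normal(1.0, 1.0, size=N)       # loadings in y
Lam = rng.normal(1.0, 1.0, size=N)       # loadings in x (m = 1 regressor)
x = f[:, None] * Lam[None, :] + rng.normal(size=(T, N))
y = beta * x + f[:, None] * lam[None, :] + rng.normal(size=(T, N))

# Factor proxy: cross-section averages of the observables w_it = (y_it, x_it)',
# i.e. the unit-weight combination z_i = 1 in the paper's notation.
Fhat = np.column_stack([y.mean(axis=1), x.mean(axis=1)])   # T x (m+1)
M = np.eye(T) - Fhat @ np.linalg.solve(Fhat.T @ Fhat, Fhat.T)

# Pooled estimator: ratio of pooled cross-products after projecting out Fhat.
num = sum(x[:, i] @ M @ y[:, i] for i in range(N))
den = sum(x[:, i] @ M @ x[:, i] for i in range(N))
beta_hat = num / den
print(round(beta_hat, 2))
```

With non-zero-mean loadings the averages span the factor space, and the estimate is close to the true value of −2 at this sample size.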
Proof of Theorem 2.

When $\beta_i = \beta + \xi_i$, $\sqrt{N}(\hat\beta_{C3E} - \beta)$ can be written as
$$
\sqrt{N} (\hat\beta_{C3E} - \beta) = T^{-1/2} \left( \frac{1}{NT} \sum_{i=1}^N X_i' M_{\hat F} X_i \right)^{-1} \frac{1}{\sqrt{NT}} \sum_{i=1}^N X_i' M_{\hat F} (\varepsilon_i - D (H')^{-1} \lambda_i) + \left( \frac{1}{NT} \sum_{i=1}^N X_i' M_{\hat F} X_i \right)^{-1} \frac{1}{\sqrt{NT}} \sum_{i=1}^N X_i' M_{\hat F} X_i \xi_i.
$$
From the proof of Theorem 1, we know that the first term is $O_p(T^{-1/2})$. We therefore focus on the second term. Clearly,
$$
\frac{1}{\sqrt{NT}} \sum_{i=1}^N X_i' M_{\hat F} X_i \xi_i = \frac{1}{\sqrt{NT}} \sum_{i=1}^N X_i' M_{FH'} X_i \xi_i + \frac{1}{\sqrt{NT}} \sum_{i=1}^N X_i' (M_{\hat F} - M_{FH'}) X_i \xi_i. \tag{A39}
$$
From (A36),
$$
\left\| \frac{1}{\sqrt{NT}} \sum_{i=1}^N X_i' (M_{\hat F} - M_{FH'}) X_i \xi_i \right\| \le \sqrt{N} \frac{1}{N} \sum_{i=1}^N \|T^{-1} X_i' (M_{\hat F} - M_{FH'}) X_i\| \|\xi_i\| = \sqrt{N} [O_p(N^{-1}) + O_p((NT)^{-1/2})] = o_p(1),
$$
and by use of $\|T^{-1} \eta_i' F (F' F)^{-1} F' \eta_i\| \le T^{-1} \|T^{-1/2} \eta_i' F\| \|(T^{-1} F' F)^{-1}\| \|T^{-1/2} F' \eta_i\| = O_p(T^{-1})$, we can further show that
$$
\begin{aligned}
\frac{1}{\sqrt{NT}} \sum_{i=1}^N X_i' M_{FH'} X_i \xi_i &= \frac{1}{\sqrt{NT}} \sum_{i=1}^N \eta_i' M_F \eta_i \xi_i = \frac{1}{\sqrt{NT}} \sum_{i=1}^N \eta_i' \eta_i \xi_i - \sqrt{N} \frac{1}{N} \sum_{i=1}^N T^{-1} \eta_i' F (F' F)^{-1} F' \eta_i \xi_i \\
&= \frac{1}{\sqrt{NT}} \sum_{i=1}^N \eta_i' \eta_i \xi_i + \sqrt{N} O_p(T^{-1}) = \frac{1}{\sqrt N} \sum_{i=1}^N \Sigma_{\eta,i} \xi_i + o_p(1),
\end{aligned}
$$
where the last result requires $\sqrt{N} T^{-1} = o(1)$, which is implied by $T/N \to \tau$. In view of (A38) this yields
$$
\sqrt{N} (\hat\beta_{C3E} - \beta) = \Sigma_\eta^{-1} \frac{1}{\sqrt N} \sum_{i=1}^N \Sigma_{\eta,i} \xi_i + o_p(1) \to_d N(0_{m \times 1}, \Sigma_\eta^{-1} R \Sigma_\eta^{-1}),
$$
where $R = \lim_{N \to \infty} N^{-1} \sum_{i=1}^N \Sigma_{\eta,i} \Sigma_\xi \Sigma_{\eta,i}$. This completes the proof. ∎
Proof of Theorem 3.

Consider (20). By using (A10), we rewrite it as
$$
\sqrt{T} (\hat\beta_{C3E,i} - \beta) = (T^{-1} X_i' M_{\hat F} X_i)^{-1} T^{-1/2} X_i' M_{\hat F} \varepsilon_i - (T^{-1} X_i' M_{\hat F} X_i)^{-1} T^{-1/2} X_i' M_{\hat F} D (H')^{-1} \lambda_i. \tag{A40}
$$
We begin by considering the numerator of the second term. Proceeding as in the proof of Theorem 1, we can write
$$
\begin{aligned}
T^{-1/2} X_i' M_{\hat F} D (H')^{-1} \lambda_i &= T^{-1/2} \eta_i' M_{FH'} F \lambda_i - T^{-1/2} \eta_i' (M_{FH'} - M_{\hat F}) F \lambda_i \\
&\quad - T^{-1/2} \Lambda_i H^{-1} D' M_{FH'} D (H')^{-1} \lambda_i + T^{-1/2} \Lambda_i H^{-1} D' (M_{FH'} - M_{\hat F}) D (H')^{-1} \lambda_i.
\end{aligned}
$$
The first term is zero, since $M_{FH'} = M_F$ and $M_F F = 0$. By (A17), the fourth term is of order $O_p(\sqrt{T} N^{-2}) + O_p(N^{-3/2})$. For the third term we use Lemma A.3, giving
$$
\|T^{-1/2} \Lambda_i H^{-1} D' M_{FH'} D (H')^{-1} \lambda_i\| \le \|T^{-1/2} \Lambda_i H^{-1} D' D (H')^{-1} \lambda_i\| \le \sqrt{T} N^{-1} \|\Lambda_i\| \|H^{-1}\| \|N T^{-1} D' D\| \|(H')^{-1}\| \|\lambda_i\| = O_p(\sqrt{T} N^{-1}).
$$
The second term can be written as
$$
\begin{aligned}
T^{-1/2} \eta_i' (M_{FH'} - M_{\hat F}) F \lambda_i &= T^{-1/2} \eta_i' D (\hat F' \hat F)^{-1} D' F \lambda_i + T^{-1/2} \eta_i' D (\hat F' \hat F)^{-1} H F' F \lambda_i \\
&\quad + T^{-1/2} \eta_i' F H' (\hat F' \hat F)^{-1} D' F \lambda_i + T^{-1/2} \eta_i' F H' [(\hat F' \hat F)^{-1} - (H F' F H')^{-1}] H F' F \lambda_i.
\end{aligned}
$$
Consider $\|T^{-1} \eta_i' D\|$. Clearly,
$$
\|T^{-1} \eta_i' D\| = \left\| \frac{1}{NT} \sum_{j=1}^N \sum_{t=1}^T \eta_{i,t} u_{j,t}' Z_j \right\| \le \left\| \frac{1}{NT} \sum_{t=1}^T \eta_{i,t} u_{i,t}' Z_i \right\| + \left\| \frac{1}{NT} \sum_{j \ne i}^N \sum_{t=1}^T \eta_{i,t} u_{j,t}' Z_j \right\|,
$$
where, by using the same arguments as in the proof of Lemma A.4, the first term is $O_p(N^{-1})$ and the second is $O_p(T^{-1/2} N^{-1/2})$. Moreover, $\|T^{-1/2} \eta_i' F\|$ is clearly $O_p(1)$ by Assumption 1 (vii). Making use of these results, (A16) and Lemma A.2, we obtain
$$
\|T^{-1/2} \eta_i' (M_{FH'} - M_{\hat F}) F \lambda_i\| = O_p(\sqrt{T} N^{-1}) + O_p((NT)^{-1/2}).
$$
Let us now consider the numerator in the first term of (A40), which can be written as follows:
$$
\begin{aligned}
T^{-1/2} X_i' M_{\hat F} \varepsilon_i &= T^{-1/2} \eta_i' M_{FH'} \varepsilon_i - T^{-1/2} X_i' D (\hat F' \hat F)^{-1} D' \varepsilon_i - T^{-1/2} X_i' D (\hat F' \hat F)^{-1} H F' \varepsilon_i \\
&\quad - T^{-1/2} X_i' F H' (\hat F' \hat F)^{-1} D' \varepsilon_i - T^{-1/2} X_i' F H' [(\hat F' \hat F)^{-1} - (H F' F H')^{-1}] H F' \varepsilon_i.
\end{aligned}
$$
Here,
$$
\|T^{-1} \varepsilon_i' D\| = \left\| \frac{1}{NT} \sum_{j=1}^N \sum_{t=1}^T \varepsilon_{i,t} u_{j,t}' Z_j \right\| \le \left\| \frac{1}{NT} \sum_{t=1}^T \varepsilon_{i,t} u_{i,t}' Z_i \right\| + \left\| \frac{1}{NT} \sum_{j \ne i}^N \sum_{t=1}^T \varepsilon_{i,t} u_{j,t}' Z_j \right\| = O_p(N^{-1}) + O_p((NT)^{-1/2}),
$$
and
$$
\|T^{-1} X_i' D\| = \left\| \frac{1}{NT} \sum_{j=1}^N \sum_{t=1}^T x_{i,t} u_{j,t}' Z_j \right\| \le \left\| \frac{1}{NT} \sum_{t=1}^T x_{i,t} u_{i,t}' Z_i \right\| + \left\| \frac{1}{NT} \sum_{j \ne i}^N \sum_{t=1}^T x_{i,t} u_{j,t}' Z_j \right\| = O_p(N^{-1}) + O_p((NT)^{-1/2}).
$$
In view of this, (A16) and the fact that $\|T^{-1/2} F' \varepsilon_i\| = O_p(1)$, we obtain
$$
T^{-1/2} X_i' M_{\hat F} \varepsilon_i = T^{-1/2} \eta_i' M_{FH'} \varepsilon_i + O_p(\sqrt{T} N^{-1}) + O_p(N^{-1/2}).
$$
Note also how
$$
T^{-1/2} \eta_i' M_{FH'} \varepsilon_i = T^{-1/2} \eta_i' M_F \varepsilon_i = T^{-1/2} \eta_i' \varepsilon_i - T^{-1/2} \cdot T^{-1/2} \eta_i' F (T^{-1} F' F)^{-1} T^{-1/2} F' \varepsilon_i = T^{-1/2} \eta_i' \varepsilon_i + O_p(T^{-1/2}).
$$
By combining all the results obtained so far, we obtain
$$
T^{-1/2} X_i' M_{\hat F} [\varepsilon_i - D (H')^{-1} \lambda_i] = T^{-1/2} \eta_i' \varepsilon_i + O_p(\sqrt{T} N^{-1}) + O_p(N^{-1/2}) + O_p(T^{-1/2}).
$$
It remains to consider the denominator of the estimator. By (A35) and (A37),
$$
T^{-1} X_i' M_{\hat F} X_i = T^{-1} \eta_i' \eta_i + O_p(T^{-1}) + O_p(N^{-1}) + O_p((NT)^{-1/2}).
$$
This implies
$$
\sqrt{T} (\hat\beta_{C3E,i} - \beta) = (T^{-1} \eta_i' \eta_i)^{-1} T^{-1/2} \eta_i' \varepsilon_i + O_p(\sqrt{T} N^{-1}) + O_p(N^{-1/2}) + O_p(T^{-1/2}).
$$
The required result now follows from Assumptions 1 (i) and (ii), provided that $N, T \to \infty$ with $\sqrt{T}/N \to 0$. ∎
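The individual-specific estimator in (A40) can likewise be sketched numerically. This is a minimal illustration under assumed names, not the authors' code: each unit's slope is estimated from its own time series after projecting out the cross-section-average factor proxies.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, beta = 100, 500, -2.0

f = rng.normal(size=T)                   # one common factor
lam = rng.normal(1.0, 1.0, size=N)       # loadings in y
Lam = rng.normal(1.0, 1.0, size=N)       # loadings in x
x = f[:, None] * Lam[None, :] + rng.normal(size=(T, N))
y = beta * x + f[:, None] * lam[None, :] + rng.normal(size=(T, N))

Fhat = np.column_stack([y.mean(axis=1), x.mean(axis=1)])   # factor proxies
M = np.eye(T) - Fhat @ np.linalg.solve(Fhat.T @ Fhat, Fhat.T)

# Unit-by-unit estimator: beta_hat_i = (x_i' M x_i)^{-1} x_i' M y_i
beta_i_hat = np.array([(x[:, i] @ M @ y[:, i]) / (x[:, i] @ M @ x[:, i])
                       for i in range(N)])
print(round(beta_i_hat.mean(), 2))
```

Each unit-level estimate converges at the slower $\sqrt{T}$ rate, as the theorem indicates, so a long time dimension is used here.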
Proof of Corollary 1.

Write
$$
\begin{aligned}
\sqrt{NT} (\hat\beta_{BAC3E} - \beta) &= \sqrt{NT} (\hat\beta_{C3E} - \beta) - \sqrt{T} N^{-1/2} \hat\Sigma_\eta^{-1} \hat B \\
&= \sqrt{NT} (\hat\beta_{C3E} - \beta) - \sqrt{T} N^{-1/2} \Sigma_\eta^{-1} B - \sqrt{T} N^{-1/2} \hat\Sigma_\eta^{-1} (\hat B - B) - \sqrt{T} N^{-1/2} (\hat\Sigma_\eta^{-1} - \Sigma_\eta^{-1}) B.
\end{aligned} \tag{A41}
$$
Consider $\sqrt{T} N^{-1/2} \hat\Sigma_\eta^{-1} (\hat B - B)$. We begin by showing that $\|\hat C_i - (H')^{-1} C_i\| = o_p(1)$, which implies that $\hat\lambda_i$ and $\hat\Lambda_i$ in $\hat B$ are consistent. We have
$$
T^{-1} \hat F' W_i = T^{-1} \hat F' F C_i + T^{-1} \hat F' U_i = T^{-1} H F' F C_i + T^{-1} D' F C_i + T^{-1} \hat F' U_i = T^{-1} H F' F C_i + T^{-1} H F' U_i + T^{-1} D' F C_i + T^{-1} D' U_i.
$$
Clearly, $\|T^{-1} F' U_i\| = O_p(T^{-1/2})$, and by Lemma A.2, $\|T^{-1} D' F\| = O_p((NT)^{-1/2})$. Moreover, from the proof of Theorem 1, $\|T^{-1} D' U_i\|$ and $T \|(\hat F' \hat F)^{-1} - (H F' F H')^{-1}\|$ are both $O_p(N^{-1}) + O_p((NT)^{-1/2})$. It follows that
$$
\begin{aligned}
\hat C_i &= (T^{-1} \hat F' \hat F)^{-1} T^{-1} \hat F' W_i \\
&= (T^{-1} H F' F H')^{-1} (T^{-1} H F' F C_i + T^{-1} H F' U_i + T^{-1} D' F C_i + T^{-1} D' U_i) + O_p(N^{-1}) + O_p((NT)^{-1/2}) \\
&= (T^{-1} H F' F H')^{-1} T^{-1} H F' F C_i + O_p(N^{-1}) + O_p(T^{-1/2}) \\
&= (H')^{-1} C_i + O_p(N^{-1}) + O_p(T^{-1/2}).
\end{aligned} \tag{A42}
$$
Moreover, $(\hat\Sigma_\eta - \Sigma_\eta)$, $(\hat\Sigma_{\eta,i} - \Sigma_{\eta,i})$ and $(\hat\sigma_{\varepsilon,i}^2 - \sigma_{\varepsilon,i}^2)$ are all $O_p(T^{-1/2})$ (details are available upon request). This implies
$$
\|\hat B - B\| = O_p(T^{-1/2}) + O_p(N^{-1}),
$$
and therefore, with $\|\hat\Sigma_\eta^{-1}\| = O_p(1)$,
$$
\|\sqrt{T} N^{-1/2} \hat\Sigma_\eta^{-1} (\hat B - B)\| \le \sqrt{T} N^{-1/2} \|\hat\Sigma_\eta^{-1}\| \|\hat B - B\| = O_p(N^{-1/2}) + O_p(\sqrt{T} N^{-3/2}), \tag{A43}
$$
which is $o_p(1)$ under our assumption that $\sqrt{T} N^{-1} = o(1)$. Similarly, since $\|B\| = O_p(1)$ and, by a Taylor expansion, $\|\hat\Sigma_\eta^{-1} - \Sigma_\eta^{-1}\| = O_p(T^{-1/2})$,
$$
\|\sqrt{T} N^{-1/2} (\hat\Sigma_\eta^{-1} - \Sigma_\eta^{-1}) B\| \le \sqrt{T} N^{-1/2} \|\hat\Sigma_\eta^{-1} - \Sigma_\eta^{-1}\| \|B\| = O_p(N^{-1/2}). \tag{A44}
$$
Together with Theorem 1, these results imply
$$
\sqrt{NT} (\hat\beta_{BAC3E} - \beta) = \sqrt{NT} (\hat\beta_{C3E} - \beta) - \sqrt{\tau} \Sigma_\eta^{-1} B + o_p(1)
$$
as $N, T \to \infty$ with $\sqrt{T} N^{-1} \to 0$ and $\sqrt{N} T^{-1} \to 0$. Finally, note how $T/N \to \tau$ implies both $\sqrt{T} N^{-1} \to 0$ and $\sqrt{N} T^{-1} \to 0$. The Theorem 1 requirement of $T/N \to \tau$ is therefore enough also for this proof. ∎
Proof of Proposition 1.

Consider the $k_s \times 1$ vector $z_{s,i}$ of combination candidates. Let us use $\hat F^s$, $H^s$ and $Z_i^s$ to denote $\hat F$, $H$ and $Z_i$, respectively, based on estimating $s = (m+1) k_s$ factors. By using $\ln a - \ln b = \ln(a/b)$, $1/\det A = \det(A^{-1})$ and $(\det A)(\det B) = \det(AB)$, we can show that
$$
IC(s) - IC(r) = \ln \det[V(\hat F^s) V(\hat F^r)^{-1}] + (s - r) \cdot g = \ln \det(I_{m+1} + [V(\hat F^s) - V(\hat F^r)] V(\hat F^r)^{-1}) + (s - r) \cdot g. \tag{A45}
$$
We will consider two cases: $s \le r$ and $s > r$. We start with the case $s \le r$. Note that in this case all the elements of $Z_i^s$ satisfy Assumption 2. In order to emphasize this, we use $Z_{0,i}^s$ for the combination matrix in this case. Consider $V(\hat F^s) - V(\hat F^r)$, which we write as
$$
V(\hat F^s) - V(\hat F^r) = [V(\hat F^s) - V(F(H^s)')] - [V(\hat F^r) - V(F(H^r)')] + [V(F(H^s)') - V(F(H^r)')]. \tag{A46}
$$
Since $s \le r$, we have
$$
\begin{aligned}
M_{F(H^s)'} - M_{\hat F^s} &= D^s ((\hat F^s)' \hat F^s)^{-1} (D^s)' + D^s ((\hat F^s)' \hat F^s)^{-1} H^s F' + F (H^s)' ((\hat F^s)' \hat F^s)^{-1} (D^s)' \\
&\quad + F (H^s)' [((\hat F^s)' \hat F^s)^{-1} - (H^s F' F (H^s)')^{-1}] H^s F',
\end{aligned}
$$
where $D^s = \hat F^s - F(H^s)'$, suggesting that
$$
\begin{aligned}
V(\hat F^s) - V(F(H^s)') &= \frac{1}{NT} \sum_{i=1}^N W_i' (M_{F(H^s)'} - M_{\hat F^s}) W_i \\
&= \frac{1}{NT} \sum_{i=1}^N W_i' D^s ((\hat F^s)' \hat F^s)^{-1} (D^s)' W_i + \frac{1}{NT} \sum_{i=1}^N W_i' D^s ((\hat F^s)' \hat F^s)^{-1} H^s F' W_i \\
&\quad + \frac{1}{NT} \sum_{i=1}^N W_i' F (H^s)' ((\hat F^s)' \hat F^s)^{-1} (D^s)' W_i + \frac{1}{NT} \sum_{i=1}^N W_i' F (H^s)' [((\hat F^s)' \hat F^s)^{-1} - (H^s F' F (H^s)')^{-1}] H^s F' W_i.
\end{aligned}
$$
From $W_i = F C_i + U_i$ and $N^{-1} \sum_{i=1}^N C_i Z_{0,i}^s = (H^s)'$,
$$
\hat F^s = \frac{1}{N} \sum_{i=1}^N W_i Z_{0,i}^s = \frac{1}{N} \sum_{i=1}^N F C_i Z_{0,i}^s + \frac{1}{N} \sum_{i=1}^N U_i Z_{0,i}^s = F (H^s)' + \frac{1}{N} \sum_{i=1}^N U_i Z_{0,i}^s,
$$
or
$$
\hat f_t = H^s f_t + \frac{1}{N} \sum_{i=1}^N (Z_{0,i}^s)' u_{i,t}.
$$
By using this, Assumption 2 (iii) and the fact that $T^{-1} F' U_i = T^{-1} \sum_{t=1}^T f_t u_{i,t}' = O_p(T^{-1/2})$,
we obtain
$$
\begin{aligned}
T^{-1} W_i' D^s &= \frac{1}{NT} \sum_{t=1}^T \sum_{j=1}^N w_{i,t} u_{j,t}' Z_{0,j}^s = \frac{1}{NT} \sum_{t=1}^T \sum_{j=1}^N C_i' f_t u_{j,t}' Z_{0,j}^s + \frac{1}{NT} \sum_{t=1}^T \sum_{j=1}^N u_{i,t} u_{j,t}' Z_{0,j}^s \\
&= \frac{1}{NT} \sum_{t=1}^T \sum_{j=1}^N C_i' f_t u_{j,t}' Z_{0,j}^s + \frac{1}{NT} \sum_{t=1}^T u_{i,t} u_{i,t}' Z_{0,i}^s + \frac{1}{NT} \sum_{t=1}^T \sum_{j \ne i}^N u_{i,t} u_{j,t}' Z_{0,j}^s \\
&= O_p((NT)^{-1/2}) + O_p(N^{-1}).
\end{aligned}
$$
By repeated use of the same argument,
$$
T^{-1} F' W_i = \frac{1}{T} \sum_{t=1}^T f_t w_{i,t}' = \frac{1}{T} \sum_{t=1}^T f_t f_t' C_i + \frac{1}{T} \sum_{t=1}^T f_t u_{i,t}' = O_p(1) + O_p(T^{-1/2}),
$$
and
$$
\begin{aligned}
T^{-1} (\hat F^s)' \hat F^s &= T^{-1} \left( F (H^s)' + \frac{1}{N} \sum_{i=1}^N U_i Z_{0,i}^s \right)' \left( F (H^s)' + \frac{1}{N} \sum_{i=1}^N U_i Z_{0,i}^s \right) \\
&= T^{-1} H^s F' F (H^s)' + \frac{1}{NT} \sum_{i=1}^N H^s F' U_i Z_{0,i}^s + \frac{1}{NT} \sum_{i=1}^N (Z_{0,i}^s)' U_i' F (H^s)' + \frac{1}{N^2 T} \sum_{i=1}^N \sum_{j=1}^N (Z_{0,i}^s)' U_i' U_j Z_{0,j}^s \\
&= T^{-1} H^s F' F (H^s)' + O_p((NT)^{-1/2}) + O_p(N^{-1}).
\end{aligned}
$$
Note also that in the case considered here $\mathrm{rk}\, H^s = \min\{s, r\} = s$, which implies that the $s \times s$ matrix $T^{-1} H^s F' F (H^s)'$ is positive definite. Therefore,
$$
T [((\hat F^s)' \hat F^s)^{-1} - (H^s F' F (H^s)')^{-1}] = (T^{-1} (\hat F^s)' \hat F^s)^{-1} (T^{-1} H^s F' F (H^s)' - T^{-1} (\hat F^s)' \hat F^s) (T^{-1} H^s F' F (H^s)')^{-1} = O_p((NT)^{-1/2}) + O_p(N^{-1}).
$$
Hence, by putting everything together, we can show that
$$
V(\hat F^s) - V(F(H^s)') = O_p((NT)^{-1/2}) + O_p(N^{-1}), \tag{A47}
$$
which holds for all $s \le r$, including $s = r$. This implies
$$
V(\hat F^s) - V(\hat F^r) = [V(\hat F^s) - V(F(H^s)')] - [V(\hat F^r) - V(F(H^r)')] + [V(F(H^s)') - V(F(H^r)')] = [V(F(H^s)') - V(F(H^r)')] + O_p((NT)^{-1/2}) + O_p(N^{-1}). \tag{A48}
$$
By writing $M_A = I_T - P_A$ for any $A$, the remaining term in the above expression for $V(\hat F^s) - V(\hat F^r)$ becomes
$$
V(F(H^s)') - V(F(H^r)') = \frac{1}{NT} \sum_{i=1}^N W_i' (P_{F(H^r)'} - P_{F(H^s)'}) W_i,
$$
which is zero if $s = r$. If $s < r$, then $P_{F(H^r)'} = P_F$. Thus, since $P_F - P_{F(H^s)'}$ is positive semi-definite, the quadratic form $T^{-1} W_i' (P_{F(H^r)'} - P_{F(H^s)'}) W_i = T^{-1} W_i' (P_F - P_{F(H^s)'}) W_i$ is positive semi-definite too. Also, $T^{-1} W_i' (P_F - P_{F(H^s)'}) W_i = 0_{m+1}$ is equivalent to $\mathrm{tr}\, [T^{-1} W_i' (P_F - P_{F(H^s)'}) W_i] = 0$, which under Assumption 1 (iv) and (vi) can be shown to be violated asymptotically using the same arguments as in Bai and Ng (2002, Proof of Lemma 3), and Stock and Watson (1998, Proof of Theorem 2). Therefore, $V(F(H^s)') - V(F(H^r)')$ converges to a positive definite matrix, as does $V(\hat F^r)$. Suppose that $A$ is positive definite and $B$ is positive semi-definite. Then $\det(A + B) \ge \det A$, with equality if and only if $B = 0$. Making use of this result, we find that
$$
\ln \det(I_{m+1} + [V(\hat F^s) - V(\hat F^r)] V(\hat F^r)^{-1}) \to c > \ln \det I_{m+1} = 0 \tag{A49}
$$
for all $s < r$. Hence, since $g = o(1)$,
$$
IC(s) - IC(r) = \ln \det(I_{m+1} + [V(\hat F^s) - V(\hat F^r)] V(\hat F^r)^{-1}) + (s - r) \cdot g \to c > 0, \tag{A50}
$$
which in turn implies
$$
P[IC(s) - IC(r) < 0] \to 0 \tag{A51}
$$
for all $s < r$.
Consider next the case $s > r$, such that $\mathrm{rk}\, H^s = r$ ($H^s$ has full column rank). In this case, we should allow for the possibility that $z_{s,i}$ includes some or indeed all of the elements of $z_{1,i}$. Denote by $A^-$ the Moore–Penrose generalized inverse of any matrix $A$. From $(H^s)^- = ((H^s)' H^s)^{-1} (H^s)'$, we have that $(H^s)^- H^s = I_r$. We can therefore write $W_i = F (H^s)' (H^{s-})' C_i + U_i = \hat F^s (H^{s-})' C_i + E_i$, where $E_i = U_i - D^s (H^{s-})' C_i$. In this notation,
$$
V(F) = \frac{1}{NT} \sum_{i=1}^N W_i' M_F W_i = \frac{1}{NT} \sum_{i=1}^N U_i' M_F U_i,
$$
$$
\begin{aligned}
V(\hat F^s) &= \frac{1}{NT} \sum_{i=1}^N W_i' M_{\hat F^s} W_i = \frac{1}{NT} \sum_{i=1}^N E_i' M_{\hat F^s} E_i \\
&= \frac{1}{NT} \sum_{i=1}^N U_i' M_{\hat F^s} U_i - \frac{1}{NT} \sum_{i=1}^N C_i' H^{s-} (D^s)' M_{\hat F^s} U_i - \frac{1}{NT} \sum_{i=1}^N U_i' M_{\hat F^s} D^s (H^{s-})' C_i \\
&\quad + \frac{1}{NT} \sum_{i=1}^N C_i' H^{s-} (D^s)' M_{\hat F^s} D^s (H^{s-})' C_i.
\end{aligned}
$$
To evaluate the orders of $T^{-1} U_i' D^s$ and $T^{-1} (D^s)' D^s$, we need to acknowledge that $Z_i^s$ might include invalid candidates that belong to $z_{1,i}$. If $s > k_0$, then the elements after the $k_0$-th element satisfy Assumption 3 (ii) instead of Assumption 3 (i). To deal with this, we partition $T^{-1} U_i' D^s$ into two parts, such that $T^{-1} U_i' D^s = ((T^{-1} U_i' D^s)_1, (T^{-1} U_i' D^s)_2)$, where $(T^{-1} U_i' D^s)_1$ has a dimension of $(m+1) \times k_0$ and $(T^{-1} U_i' D^s)_2$ has a dimension of $(m+1) \times (s - k_0)$. We have
$$
(T^{-1} U_i' D^s)_1 = \frac{1}{NT} \sum_{t=1}^T \sum_{j=1}^N u_{i,t} u_{j,t}' Z_{0,j}^{k_0} = \frac{1}{NT} \sum_{t=1}^T u_{i,t} u_{i,t}' Z_{0,i}^{k_0} + \frac{1}{NT} \sum_{t=1}^T \sum_{j \ne i}^N u_{i,t} u_{j,t}' Z_{0,j}^{k_0} = O_p(N^{-1}) + O_p((NT)^{-1/2}),
$$
$$
(T^{-1} U_i' D^s)_2 = \frac{1}{NT} \sum_{t=1}^T \sum_{j=1}^N u_{i,t} u_{j,t}' Z_{1,j}^{s-k_0} = \frac{1}{NT} \sum_{t=1}^T u_{i,t} u_{i,t}' Z_{1,i}^{s-k_0} + \frac{1}{NT} \sum_{t=1}^T \sum_{j \ne i}^N u_{i,t} u_{j,t}' Z_{1,j}^{s-k_0} = O_p(N^{-1}) + O_p(T^{-1/2}).
$$
The last two results imply that $T^{-1} U_i' D^s = O_p(N^{-1}) + O_p(T^{-1/2})$. Similarly, we need to partition $T^{-1} (D^s)' D^s$ into four blocks:
$$
T^{-1} (D^s)' D^s = \begin{bmatrix} (T^{-1} (D^s)' D^s)_{11} & (T^{-1} (D^s)' D^s)_{12} \\ (T^{-1} (D^s)' D^s)_{21} & (T^{-1} (D^s)' D^s)_{22} \end{bmatrix},
$$
where the upper left-hand block is $k_0 \times k_0$, the upper right-hand block is $k_0 \times (s - k_0)$, the lower left-hand block is $(s - k_0) \times k_0$, and the lower right-hand block is $(s - k_0) \times (s - k_0)$. We have
$$
(T^{-1} (D^s)' D^s)_{11} = \frac{1}{N^2 T} \sum_{t=1}^T \sum_{i=1}^N \sum_{j=1}^N (Z_{0,i}^{k_0})' u_{i,t} u_{j,t}' Z_{0,j}^{k_0} = \frac{1}{N^2 T} \sum_{i=1}^N \sum_{t=1}^T (Z_{0,i}^{k_0})' u_{i,t} u_{i,t}' Z_{0,i}^{k_0} + \frac{1}{N^2 T} \sum_{t=1}^T \sum_{i=1}^N \sum_{j \ne i}^N (Z_{0,i}^{k_0})' u_{i,t} u_{j,t}' Z_{0,j}^{k_0} = O_p(N^{-1}),
$$
$$
(T^{-1} (D^s)' D^s)_{12} = \frac{1}{N^2 T} \sum_{t=1}^T \sum_{i=1}^N \sum_{j=1}^N (Z_{0,i}^{k_0})' u_{i,t} u_{j,t}' Z_{1,j}^{s-k_0} = \frac{1}{N^2 T} \sum_{i=1}^N \sum_{t=1}^T (Z_{0,i}^{k_0})' u_{i,t} u_{i,t}' Z_{1,i}^{s-k_0} + \frac{1}{N^2 T} \sum_{t=1}^T \sum_{i=1}^N \sum_{j \ne i}^N (Z_{0,i}^{k_0})' u_{i,t} u_{j,t}' Z_{1,j}^{s-k_0} = O_p(N^{-1}) + O_p((NT)^{-1/2}),
$$
$$
(T^{-1} (D^s)' D^s)_{22} = \frac{1}{N^2 T} \sum_{t=1}^T \sum_{i=1}^N \sum_{j=1}^N (Z_{1,i}^{s-k_0})' u_{i,t} u_{j,t}' Z_{1,j}^{s-k_0} = \frac{1}{N^2 T} \sum_{i=1}^N \sum_{t=1}^T (Z_{1,i}^{s-k_0})' u_{i,t} u_{i,t}' Z_{1,i}^{s-k_0} + \frac{1}{N^2 T} \sum_{t=1}^T \sum_{i=1}^N \sum_{j \ne i}^N (Z_{1,i}^{s-k_0})' u_{i,t} u_{j,t}' Z_{1,j}^{s-k_0} = O_p(N^{-1}) + O_p(T^{-1/2}),
$$
which imply that $T^{-1} (D^s)' D^s = O_p(N^{-1}) + O_p(T^{-1/2})$. Then, via Pythagoras' theorem, we have
$$
\|T^{-1} C_i' H^{s-} (D^s)' M_{\hat F^s} U_i\| \le \|T^{-1} C_i' H^{s-} (D^s)' U_i\| \le \|C_i\| \|H^{s-}\| \|T^{-1} (D^s)' U_i\| = O_p(N^{-1}) + O_p(T^{-1/2}),
$$
$$
\|T^{-1} C_i' H^{s-} (D^s)' M_{\hat F^s} D^s (H^{s-})' C_i\| \le \|T^{-1} C_i' H^{s-} (D^s)' D^s (H^{s-})' C_i\| \le \|C_i\|^2 \|H^{s-}\|^2 \|T^{-1} (D^s)' D^s\| = O_p(N^{-1}) + O_p(T^{-1/2}).
$$
It follows that
$$
V(\hat F^s) = \frac{1}{NT} \sum_{i=1}^N U_i' M_{\hat F^s} U_i + O_p(N^{-1}) + O_p(T^{-1/2}),
$$
and so we obtain
$$
V(\hat F^s) - V(F) = \frac{1}{NT} \sum_{i=1}^N U_i' (P_F - P_{\hat F^s}) U_i + O_p(T^{-1/2}) + O_p(N^{-1}).
$$
We also have
$$
\|T^{-1} U_i' P_F U_i\| \le \|T^{-1} U_i' F\|^2 \|(T^{-1} F' F)^{-1}\| = O_p(T^{-1}),
$$
and by further use of $\mathrm{tr}(AB) = \mathrm{tr}(BA)$, $\mathrm{tr}(AB) \le (\mathrm{tr}\, A)(\mathrm{tr}\, B)$ and $\mathrm{tr}(AB) \le \sum_{j=1}^r \lambda_j^e(A) \lambda_j^e(B)$, where $A$ and $B$ are normal matrices and $\lambda_j^e(A)$ is the $j$-th eigenvalue of $A$, and noting that the idempotency of $P_{\hat F^s}$ implies that $\lambda_j^e(P_{\hat F^s}) = 1$ for $j = 1, \ldots, s$, we obtain
$$
\left\| \frac{1}{NT} \sum_{i=1}^N U_i' P_{\hat F^s} U_i \right\|^2 = \mathrm{tr} \left( \frac{1}{NT} \sum_{i=1}^N U_i' P_{\hat F^s} U_i \right)^2 \le \left[ \sum_{j=1}^s \lambda_j^e \left( \frac{1}{NT} \sum_{i=1}^N U_i U_i' \right) \lambda_j^e(P_{\hat F^s}) \right]^2 \le [\lambda_{\max}^e((NT)^{-1} U U') s]^2,
$$
where $U = (U_1, \ldots, U_N)$ is $T \times N(m+1)$. Now, $\lambda_{\max}^e((NT)^{-1} U U')$ has the same form as in Bai and Ng (2006), Amengual and Watson (2006) and Amengual and Watson (2007), who show that it is $O_p(N^{-1}) + O_p(T^{-1})$. We therefore obtain
$$
V(\hat F^s) - V(F) = O_p(T^{-1/2}) + O_p(N^{-1}), \tag{A52}
$$
and so
$$
V(\hat F^s) - V(\hat F^r) = [V(\hat F^s) - V(F)] - [V(\hat F^r) - V(F)] = O_p(T^{-1/2}) + O_p(N^{-1}), \tag{A53}
$$
which leads to the following:
$$
\ln \det(I_{m+1} + [V(\hat F^s) - V(\hat F^r)] V(\hat F^r)^{-1}) = O_p(T^{-1/2}) + O_p(N^{-1}) \tag{A54}
$$
(see, for example, Paulsen, 1984, page 119). Let $C = \min\{N, \sqrt{T}\}$. Making use of the previously obtained expression for $IC(s) - IC(r)$, we obtain
$$
g^{-1} [IC(s) - IC(r)] = g^{-1} \ln \det(I_{m+1} + [V(\hat F^s) - V(\hat F^r)] V(\hat F^r)^{-1}) + (s - r).
$$
By using this, $g > 0$ and (A54), we can show that
$$
P[IC(s) - IC(r) < 0] = P[(C \cdot g)^{-1} C \cdot \ln \det(I_{m+1} + [V(\hat F^s) - V(\hat F^r)] V(\hat F^r)^{-1}) + (s - r) < 0] \to 0, \tag{A55}
$$
where the last result follows from noting that $C \cdot g \to \infty$ by assumption, $C \cdot \ln \det(I_{m+1} + [V(\hat F^s) - V(\hat F^r)] V(\hat F^r)^{-1}) = O_p(1)$ and $(s - r) > 0$ for all $s > r$. This last result, together with (A51), implies that
$$
P[IC(s) - IC(r) < 0 \,|\, s \ne r] \to 0, \tag{A56}
$$
which is equivalent to saying that
$$
P(\hat r = r) \to 1,
$$
as was to be shown. ∎
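The selection rule whose consistency is established above can be sketched numerically. The code below is a minimal illustration under assumed names, not the authors' code: with $m + 1 = 2$ observables per unit, a single valid combination already yields $s = 2 = r$ estimated factors, so the criterion should pick $\hat k_s = 1$; the penalty $g = \ln(C)/C$ is an illustrative choice satisfying $g \to 0$ and $C \cdot g \to \infty$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, m, r = 100, 100, 1, 2              # r = 2 true factors

F = rng.normal(size=(T, r))
# Loadings of w_it = (y_it, x_it)' with a full-rank mean, so combinations work.
C = np.array([[1.0, 0.5], [-0.5, 1.0]]) + 0.5 * rng.normal(size=(N, 2, 2))
W = np.einsum('tr,inr->tin', F, C) + rng.normal(size=(T, N, m + 1))

# Candidate combinations: a constant plus two random (valid) candidates.
z = np.column_stack([np.ones(N), rng.normal(0.5, 1.0, size=(N, 2))])

def V(Fhat):
    M = np.eye(T) - Fhat @ np.linalg.pinv(Fhat.T @ Fhat) @ Fhat.T
    return sum(W[:, i].T @ M @ W[:, i] for i in range(N)) / (N * T)

Cmin = min(N, np.sqrt(T))
g = np.log(Cmin) / Cmin                  # g -> 0, C * g -> infinity (illustrative)

def IC(ks):
    # Fhat^s: stack the cross-section combinations N^{-1} sum_i z_{j,i} w_it
    Fhat = np.concatenate([(z[:, j][None, :, None] * W).mean(axis=1)
                           for j in range(ks)], axis=1)
    s = (m + 1) * ks
    sign, logdet = np.linalg.slogdet(V(Fhat))
    return logdet + s * g

ks_hat = min(range(1, 4), key=IC)
print(ks_hat)
```

Adding further combinations lowers the criterion's determinant only by a term that vanishes at the rates in (A53)-(A54), while the penalty grows, so the smallest sufficient number of combinations is selected.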
Proof of Corollary 2.

The proof of Corollary 2 follows by simple manipulations of that of Proposition 1. Note in particular that the proof for the case $s \le r$ is exactly the same as in the proof of Proposition 1. When $s > r$, under the condition that $Z_i^s = Z_{0,i}^s$, the orders of $T^{-1} U_i' D^s$ and $T^{-1} (D^s)' D^s$ become equal to the orders of $(T^{-1} U_i' D^s)_1$ and $(T^{-1} (D^s)' D^s)_{11}$, respectively:
$$
T^{-1} U_i' D^s = \frac{1}{NT} \sum_{t=1}^T u_{i,t} u_{i,t}' Z_{0,i}^s + \frac{1}{NT} \sum_{t=1}^T \sum_{j \ne i}^N u_{i,t} u_{j,t}' Z_{0,j}^s = O_p(N^{-1}) + O_p((NT)^{-1/2}),
$$
$$
T^{-1} (D^s)' D^s = \frac{1}{N^2 T} \sum_{i=1}^N \sum_{t=1}^T (Z_{0,i}^s)' u_{i,t} u_{i,t}' Z_{0,i}^s + \frac{1}{N^2 T} \sum_{t=1}^T \sum_{i=1}^N \sum_{j \ne i}^N (Z_{0,i}^s)' u_{i,t} u_{j,t}' Z_{0,j}^s = O_p(N^{-1}).
$$
This last result, together with the result $\|T^{-1} U_i' P_F U_i\| = O_p(T^{-1})$, implies
$$
V(\hat F^s) - V(\hat F^r) = O_p((NT)^{-1/2}) + O_p(N^{-1}) + O_p(T^{-1}).
$$
The order of $\ln \det(I_{m+1} + [V(\hat F^s) - V(\hat F^r)] V(\hat F^r)^{-1})$ is the same. Therefore, by letting $C = \min\{\sqrt{N}, \sqrt{T}\}$ and using the same trick as in the proof of Proposition 1, we can show that
$$
P[IC(s) - IC(r) < 0] = P[(C^2 \cdot g)^{-1} C^2 \cdot \ln \det(I_{m+1} + [V(\hat F^s) - V(\hat F^r)] V(\hat F^r)^{-1}) + (s - r) < 0], \tag{A57}
$$
which is $o(1)$ because $C^2 \cdot g \to \infty$, $C^2 \cdot \ln \det(I_{m+1} + [V(\hat F^s) - V(\hat F^r)] V(\hat F^r)^{-1}) = O_p(1)$ and $(s - r) > 0$ for all $s > r$. Hence, provided that the rate of shrinking of $g$ is slow enough, the consistency of $\hat r$ is unaffected by the correlation between $u_{i,t}$ and $Z_i^s$. ∎
Table A: Description of the experiments.

E1 (r = 2):
  y_{i,t} = β_i x_{i,t} + λ_{1i} f_{1t} + λ_{2i} f_{2t} + ε_{i,t},  x_{i,t} = Λ_{1i} f_{1t} + Λ_{2i} f_{2t} + η_{i,t}
  λ_{1i} = 4 z_{2i} + τ_{λ1i},  λ_{2i} = −2 z_{2i} + τ_{λ2i},  Λ_{1i} = 2 z_{2i} + τ_{Λ1i},  Λ_{2i} = z_{2i} + τ_{Λ2i}

E2 (r = 2): same y and x equations as E1;
  λ_{1i} ∼ N(1, 1),  λ_{2i} ∼ N(2, 1),  Λ_{1i} ∼ N(1, 1),  Λ_{2i} ∼ N(−1, 1)

E3 (r = 2): same y and x equations as E1;
  λ_{1i} = z_{2i} + τ_{λ1i},  λ_{2i} = 2 z_{2i} + τ_{λ2i},  Λ_{1i} = z_{3i} + τ_{Λ1i},  Λ_{2i} = 2 z_{3i} + τ_{Λ2i}

E4 (r = 2): same y and x equations as E1;
  λ_{1i} = 2 z_{2i} + τ_{λ1i},  λ_{2i} = 0.5 z_{2i} + τ_{λ2i},  Λ_{1i} = 2 z_{2i} + τ_{Λ1i},  Λ_{2i} = 0.5 z_{2i} + τ_{Λ2i}

E5 (r = 4):
  y_{i,t} = β_i x_{i,t} + λ_{1i} f_{1t} + λ_{2i} f_{2t} + λ_{3i} f_{3t} + λ_{4i} f_{4t} + ε_{i,t}
  x_{i,t} = Λ_{1i} f_{1t} + Λ_{2i} f_{2t} + Λ_{3i} f_{3t} + Λ_{4i} f_{4t} + η_{i,t}
  λ_{1i} = 4 z_{2i} + 2 z_{3i} + 0.2 z_{4i} + τ_{λ1i},  λ_{2i} = −2 z_{2i} + z_{3i} + τ_{λ2i}
  λ_{3i} = −z_{2i} + 2 z_{3i} + 0.1 z_{4i} + τ_{λ3i},  λ_{4i} = −2 z_{2i} − z_{3i} + τ_{λ4i}
  Λ_{1i} = 2 z_{2i} + z_{3i} + 0.2 z_{4i} + τ_{Λ1i},  Λ_{2i} = z_{2i} + 0.5 z_{3i} + τ_{Λ2i}
  Λ_{3i} = 2 z_{2i} + 0.5 z_{3i} + τ_{Λ3i},  Λ_{4i} = z_{2i} + z_{3i} + 0.2 z_{4i} + τ_{Λ4i}

E6 (r = 2):
  y_{i,t} = β x_{i,t} + λ_{1i} f_{1t} + λ_{2i} f_{2t} + ε_{i,t},  x_{i,t} = Λ_{1i} f_{1t} + Λ_{2i} f_{2t} + η_{i,t}
  λ_{1i} = z_{1i} + 4 z_{2i} + τ_{λ1i},  λ_{2i} = z_{1i} − 2 z_{2i} + z_{3i} + τ_{λ2i}
  Λ_{1i} = z_{1i} + 2 z_{2i} + z_{3i} + τ_{Λ1i},  Λ_{2i} = z_{1i} + z_{2i} + τ_{Λ2i}

Notes: The following specifications are kept constant across the experiments: β_i ∼ N(−2, 0.25), β = −2, (f_{1,t}, f_{2,t}, f_{3,t}, f_{4,t}, η_{i,t}, ε_{i,t}) ∼ N(0_{6×1}, I_6) and (τ_{λ1i}, τ_{λ2i}, τ_{Λ1i}, τ_{Λ2i})′ ∼ N(0_{4×1}, 0.25 · I_4). The combinations are z_{1i} = 1, z_{2i} ∼ N(0.5, 1), z_{3i} ∼ N(−0.4, 1), z_{4i} ∼ N(0.2, 1), z_{5i} ∼ N(0.5, 1) and z_{6i} ∼ N(0.1, 1).
Table E1: All conditions of CCE and C3E are satisfied.

    N     T |                Bias × 100                 |                  Size
            |    LS     PC    CCE   C3E1   C3E2   C3E3  |    LS     PC    CCE   C3E1   C3E2   C3E3
   30    30 | 94.79  10.57   1.63   0.06   0.06   0.06  |  99.5   36.7   14.8    6.5    6.5    7.3
   50    30 | 95.59   6.32   0.33   0.01   0.01   0.05  |  99.8   26.5    9.9    5.8    5.8    6.9
  100    30 | 95.56   3.10   0.01  −0.01  −0.01   0.01  |  99.9   14.9    6.7    5.2    5.2    6.2
  200    30 | 95.89   1.57   0.01   0.00   0.00   0.01  | 100.0   10.3    5.6    5.1    5.1    5.6
   30    50 | 94.99   9.97   1.38   0.01   0.01   0.05  |  99.9   38.6   14.9    6.5    6.5    7.7
   50    50 | 95.62   5.96   0.26  −0.03  −0.03   0.04  | 100.0   26.4   10.2    6.5    6.5    6.9
  100    50 | 95.89   2.88  −0.09  −0.11  −0.11  −0.10  | 100.0   14.8    6.7    5.5    5.5    6.3
  200    50 | 96.32   1.52   0.03   0.04   0.04   0.03  | 100.0   11.6    5.6    5.0    5.0    5.6
   30   100 | 95.76   9.64   0.87   0.00   0.00   0.05  | 100.0   40.1   13.5    6.5    6.5    7.3
   50   100 | 96.06   5.75   0.15  −0.05  −0.05  −0.01  | 100.0   27.9    9.8    6.1    6.1    6.8
  100   100 | 96.00   2.90   0.06   0.02   0.02   0.04  | 100.0   17.5    6.9    5.9    5.9    6.5
  200   100 | 96.51   1.41  −0.02  −0.03  −0.03  −0.02  | 100.0   10.8    5.2    5.0    5.0    5.2
   30   200 | 95.78   9.56   1.39   0.03   0.03   0.07  | 100.0   42.3   14.9    6.6    6.6    7.6
   50   200 | 95.92   5.70   0.16  −0.01  −0.01   0.04  | 100.0   30.2    9.3    5.8    5.8    6.3
  100   200 | 96.41   2.82  −0.01  −0.02  −0.02   0.00  | 100.0   17.3    5.8    5.4    5.4    5.6
  200   200 | 96.58   1.47   0.05   0.05   0.05   0.05  | 100.0   11.9    5.6    5.5    5.5    5.6
   93    30 | 95.84   3.44   0.12   0.07   0.07   0.10  |  99.8   17.3    7.6    6.1    6.1    6.8
  184    50 | 96.02   1.59  −0.02  −0.03  −0.03  −0.02  | 100.0   11.3    6.3    5.5    5.5    6.3
  464   100 | 96.42   0.59  −0.03  −0.03  −0.03  −0.03  | 100.0    7.6    5.3    5.2    5.2    5.3
 1169   200 | 96.50   0.25   0.00   0.00   0.00   0.00  | 100.0    5.6    4.6    4.6    4.6    4.6

Notes: "LS", "PC" and "CCE" refer to the LS, principal components and CCE estimators, respectively. "C3E1" refers to the C3E estimator based on the "true" combinations, and "C3E2" refers to the C3E estimator based on IC-selected combinations. "C3E3" is C3E with a vector of ones as a must-have combination.
Table E2: The combinations are uncorrelated with the loadings.

    N     T |                Bias × 100                 |                  Size
            |    LS     PC    CCE   C3E1   C3E2   C3E3  |    LS     PC    CCE   C3E1   C3E2   C3E3
   30    30 | −19.36   7.46   0.05  −0.93  −0.27   0.05 |  39.0   25.8    6.8   16.1    9.0    6.8
   50    30 | −19.94   4.40   0.00  −0.06  −0.23   0.00 |  54.5   17.4    6.0   12.9    7.6    6.0
  100    30 | −19.92   2.12  −0.01   0.00  −0.09  −0.01 |  69.6   10.0    5.3    8.1    6.6    5.3
  200    30 | −19.66   1.06   0.00   0.00   0.00   0.00 |  80.5    7.6    5.2    5.9    5.8    5.2
   30    50 | −20.07   7.33   0.00  −0.96  −0.28   0.00 |  40.1   29.6    6.4   15.7    8.3    6.4
   50    50 | −20.09   4.29  −0.02   0.00  −0.22  −0.02 |  54.9   19.4    6.3   12.7    8.0    6.3
  100    50 | −19.92   2.01  −0.10  −0.06  −0.12  −0.10 |  72.6   10.8    5.4    8.3    7.1    5.4
  200    50 | −19.75   1.09   0.04   0.06   0.05   0.04 |  84.9    8.6    5.2    5.8    5.7    5.2
   30   100 | −19.92   7.27   0.01  −0.84  −0.30   0.01 |  40.1   32.2    6.5   15.4    8.9    6.5
   50   100 | −20.11   4.21  −0.05  −0.17  −0.24  −0.05 |  55.5   20.2    6.2   12.7    8.0    6.2
  100   100 | −20.08   2.11   0.01   0.06   0.00   0.01 |  77.9   12.7    5.9    8.5    7.4    5.9
  200   100 | −19.97   1.01  −0.03  −0.02  −0.02  −0.03 |  91.9    8.5    5.0    5.5    5.5    5.0
   30   200 | −19.57   7.30   0.05  −0.70  −0.25   0.05 |  38.1   34.9    6.7   17.9    9.5    6.7
   50   200 | −20.05   4.25  −0.01  −0.27  −0.25  −0.01 |  55.9   23.0    5.5   12.8    7.6    5.5
  100   200 | −20.00   2.06  −0.02  −0.04  −0.07  −0.02 |  80.4   12.9    5.1    7.5    6.3    5.1
  200   200 | −19.92   1.09   0.05   0.06   0.05   0.05 |  95.3    9.6    5.3    6.2    6.1    5.3
   93    30 | −19.52   2.37   0.06   0.08   0.02   0.06 |  67.2   11.7    6.0    9.3    7.6    6.0
  184    50 | −19.75   1.11  −0.04  −0.02  −0.03  −0.04 |  83.8    8.9    5.5    6.5    6.4    5.5
  464   100 | −19.93   0.42  −0.03  −0.03  −0.03  −0.03 |  97.6    6.7    5.3    5.1    5.1    5.3
 1169   200 | −20.03   0.18   0.00   0.00   0.00   0.00 | 100.0    5.1    4.7    4.7    4.7    4.7

Notes: See Table E1 for an explanation.
Table E3: Condition (6) is not satisfied but loadings are independent.

    N     T |                Bias × 100                 |                  Size
            |    LS     PC    CCE   C3E1   C3E2   C3E3  |    LS     PC    CCE   C3E1   C3E2   C3E3
   30    30 | −13.21   5.57   0.15   0.10   0.05   0.05 |  14.4   17.8    7.0    6.7    7.9    8.3
   50    30 | −13.61   3.24   0.06   0.09   0.05   0.06 |  20.2   12.6    6.1    5.9    6.0    6.5
  100    30 | −13.66   1.56  −0.02  −0.03   0.00   0.01 |  32.3    8.0    5.6    5.3    5.2    5.4
  200    30 | −13.43   0.78   0.04   0.03   0.02   0.03 |  55.1    6.7    5.1    5.1    5.1    5.0
   30    50 | −13.79   5.28   0.07   0.09   0.08   0.05 |  15.6   18.8    6.6    6.3    6.6    7.1
   50    50 | −13.69   3.08   0.00  −0.01   0.01  −0.01 |  20.1   13.2    5.8    5.8    6.1    6.1
  100    50 | −13.74   1.42  −0.09  −0.08  −0.08  −0.09 |  32.6    8.3    5.5    5.6    5.6    5.4
  200    50 | −13.47   0.79   0.06   0.06   0.07   0.05 |  55.0    6.8    5.7    5.3    5.3    5.5
   30   100 | −13.64   5.15   0.08   0.08   0.05   0.06 |  15.2   19.7    7.0    6.1    6.6    7.2
   50   100 | −13.43   2.98  −0.08  −0.10  −0.10  −0.08 |  19.8   12.8    6.6    6.2    6.2    6.6
  100   100 | −13.48   1.50   0.00  −0.01  −0.01   0.00 |  31.7    9.7    6.2    6.0    5.9    6.3
  200   100 | −13.66   0.71  −0.05  −0.05  −0.05  −0.05 |  56.2    6.7    5.2    4.9    4.9    5.0
   30   200 | −13.58   5.14   0.09   0.08   0.07   0.07 |  14.4   21.2    6.9    6.9    7.6    8.3
   50   200 | −13.20   3.00   0.06   0.02   0.01   0.02 |  18.3   14.3    6.0    5.8    5.9    6.2
  100   200 | −13.85   1.45  −0.02  −0.01  −0.01  −0.01 |  32.7    9.1    5.9    5.6    5.8    5.9
  200   200 | −13.79   0.78   0.03   0.03   0.04   0.03 |  56.6    7.9    5.5    5.7    5.8    5.6
   93    30 | −13.32   1.77   0.14   0.13   0.11   0.11 |  29.9    9.0    6.2    6.3    6.6    6.6
  184    50 | −13.69   0.79  −0.01   0.01   0.00  −0.02 |  53.1    7.2    5.3    5.4    5.5    5.2
  464   100 | −13.70   0.29  −0.01   0.00   0.00  −0.01 |  89.1    6.1    5.4    5.2    5.1    5.4
 1169   200 | −13.69   0.13   0.01   0.01   0.01   0.00 |  99.9    4.8    4.7    4.8    4.7    4.8

Notes: See Table E1 for an explanation.
Table E4: rk C < m = k + 1 and loadings are not independent.

    N     T |                Bias × 100                 |                  Size
            |    LS     PC    CCE   C3E1   C3E2   C3E3  |    LS     PC    CCE   C3E1   C3E2   C3E3
   30    30 | 76.45   9.00   9.52   0.07   0.05   0.88  | 100.0   31.1   28.7    6.6    7.5    9.6
   50    30 | 76.78   5.31   5.21   0.02   0.01   1.30  | 100.0   21.6   20.5    5.9    6.0    8.8
  100    30 | 76.98   2.60   2.27  −0.03  −0.02   1.17  | 100.0   12.2   11.5    5.1    5.4    8.3
  200    30 | 77.11   1.31   1.11   0.01   0.01   0.72  | 100.0    9.1    7.6    5.2    5.3    6.7
   30    50 | 76.83   8.66   9.41   0.00   0.00   1.52  | 100.0   33.9   29.9    7.0    7.1   10.1
   50    50 | 77.08   5.09   5.23  −0.01   0.00   1.85  | 100.0   22.2   22.5    6.8    6.7   11.2
  100    50 | 77.19   2.43   2.25  −0.08  −0.06   1.39  | 100.0   13.0   12.2    5.5    5.4    9.0
  200    50 | 77.42   1.30   1.08   0.01   0.01   0.83  | 100.0   10.2    8.5    5.5    5.6    7.6
   30   100 | 77.04   8.42   9.04   0.05   0.03   1.60  | 100.0   36.3   29.7    6.5    6.9   10.5
   50   100 | 77.29   4.96   5.02  −0.07  −0.08   1.70  | 100.0   24.1   21.6    6.0    6.0    9.9
  100   100 | 77.40   2.49   2.28   0.00   0.00   1.46  | 100.0   15.4   13.7    5.9    6.0   10.3
  200   100 | 77.64   1.21   1.02  −0.05  −0.05   0.78  | 100.0    9.7    8.4    5.0    5.0    7.4
   30   200 | 77.07   8.38   9.44   0.02   0.04   1.04  | 100.0   39.0   31.1    6.6    7.0    9.5
   50   200 | 77.34   4.96   5.25   0.02   0.01   1.45  | 100.0   26.4   23.2    5.7    5.7    9.6
  100   200 | 77.59   2.44   2.24  −0.01  −0.01   1.26  | 100.0   15.2   12.9    5.4    5.4    8.9
  200   200 | 77.85   1.27   1.16   0.09   0.08   0.82  | 100.0   11.1    9.4    6.1    6.1    8.3
   93    30 | 77.03   2.89   2.60   0.10   0.08   1.34  | 100.0   14.1   13.2    5.8    5.9    9.3
  184    50 | 77.40   1.34   1.16  −0.01  −0.02   0.89  | 100.0   10.1    8.6    5.2    5.1    7.9
  464   100 | 77.68   0.50   0.46   0.00   0.00   0.36  | 100.0    7.4    6.4    4.9    4.8    5.8
 1169   200 | 77.88   0.21   0.18  −0.01  −0.01   0.14  | 100.0    5.3    5.2    4.9    4.9    5.2

Notes: See Table E1 for an explanation.
Table E5: rk C = m < k + 1 and loadings are not independent.

    N     T |                 Bias × 100                 |                  Size
            |    LS      PC     CCE   C3E1   C3E2   C3E3 |    LS     PC    CCE   C3E1   C3E2   C3E3
   30    30 | 23.17   17.30  −11.38   0.13   0.19   0.17 |  48.4   74.8   77.6    7.0    8.5    9.3
   50    30 | 22.79   10.27  −21.85   0.08   0.08   0.06 |  58.1   56.2   86.7    5.9    6.2    6.2
  100    30 | 23.47    5.06  −37.03   0.01   0.02   0.00 |  70.8   33.1   94.7    5.2    5.2    5.6
  200    30 | 23.38    2.50  −44.21   0.00   0.00  −0.11 |  79.0   18.6   98.4    5.1    5.1    5.6
   30    50 | 22.99   16.40  −11.52   0.00   0.02   0.03 |  46.4   80.2   79.4    7.4    8.3    8.5
   50    50 | 23.79    9.68  −23.26  −0.01  −0.02   0.00 |  57.9   60.7   87.9    6.5    6.5    7.1
  100    50 | 23.71    4.80  −37.51  −0.02  −0.02  −0.04 |  70.9   36.0   95.1    6.1    6.1    6.3
  200    50 | 23.65    2.40  −45.20   0.00   0.00  −0.19 |  80.8   20.1   98.6    5.4    5.4    6.1
   30   100 | 23.41   15.64  −11.96  −0.27  −0.27  −0.25 |  44.9   82.6   80.1    6.9    7.7    8.1
   50   100 | 23.85    9.40  −22.62  −0.01  −0.01   0.01 |  59.1   65.7   89.3    6.3    6.2    6.5
  100   100 | 24.14    4.64  −37.08  −0.02  −0.02  −0.04 |  75.4   39.1   95.4    5.8    5.8    5.9
  200   100 | 23.64    2.31  −44.66  −0.01  −0.01  −0.20 |  85.5   21.9   98.6    5.7    5.7    6.2
   30   200 | 23.58   15.53  −10.71  −0.10  −0.09  −0.09 |  44.0   85.6   79.4    8.1    8.8    9.3
   50   200 | 23.94    9.23  −24.11  −0.07  −0.07  −0.07 |  59.9   67.6   88.5    7.2    7.2    7.9
  100   200 | 23.81    4.49  −37.55  −0.12  −0.12  −0.12 |  80.5   38.9   95.3    5.3    5.3    5.7
  200   200 | 24.16    2.31  −45.45   0.03   0.03  −0.09 |  91.9   24.4   99.0    5.7    5.7    6.1
   93    30 | 23.48    5.45  −35.43   0.01   0.01   0.01 |  68.6   35.4   93.4    5.9    5.9    6.2
  184    50 | 23.88    2.54  −44.31  −0.08  −0.08  −0.26 |  80.1   21.2   98.5    5.6    5.6    6.2
  464   100 | 24.09    0.99  −49.21  −0.01  −0.01  −0.20 |  92.1   12.7   99.9    5.9    5.9    7.2
 1169   200 | 24.23    0.39  −51.35   0.00   0.00  −0.05 |  98.7    8.2  100.0    5.6    5.6    6.0

Notes: See Table E1 for an explanation.
Table E6: Bias and bias-adjustment in the homogeneous slope case.

                                   Bias × 100
    N     T |   CCE  BACCE   C3E1  BAC3E1   C3E2  BAC3E2   C3E3  BAC3E3
   30    30 |  0.22  −1.10   0.71    0.10   0.71    0.10  −0.11    0.00
   50    30 | −0.25  −0.19   0.40    0.03   0.40    0.03  −0.21   −0.05
  100    30 | −0.14   0.00   0.22    0.03   0.22    0.03  −0.14    0.01
  200    30 | −0.06   0.03   0.13    0.04   0.13    0.04  −0.06    0.03
   30    50 |  0.13  −1.12   0.63    0.01   0.63    0.01  −0.25   −0.11
   50    50 | −0.26  −0.23   0.37    0.00   0.37    0.00  −0.27   −0.09
  100    50 | −0.19  −0.05   0.17   −0.01   0.17   −0.01  −0.19   −0.05
  200    50 | −0.11  −0.03   0.08   −0.02   0.08   −0.02  −0.11   −0.03
   30   100 |  0.08  −1.26   0.61    0.02   0.61    0.02  −0.26   −0.12
   50   100 | −0.28  −0.27   0.35   −0.02   0.35   −0.02  −0.31   −0.12
  100   100 | −0.18  −0.04   0.19    0.00   0.19    0.00  −0.18   −0.04
  200   100 | −0.09  −0.01   0.09    0.00   0.09    0.00  −0.09   −0.01
   30   200 |  0.13  −1.02   0.64    0.04   0.64    0.04  −0.18   −0.08
   50   200 | −0.25  −0.18   0.41    0.04   0.41    0.04  −0.21   −0.04
  100   200 | −0.18  −0.04   0.18   −0.01   0.18   −0.01  −0.19   −0.04
  200   200 | −0.09   0.00   0.10    0.00   0.10    0.00  −0.09    0.00
   93    30 | −0.16  −0.01   0.22    0.02   0.22    0.02  −0.16   −0.01
  184    50 | −0.12  −0.02   0.08   −0.02   0.08   −0.02  −0.12   −0.02
  464   100 | −0.04   0.00   0.04    0.00   0.04    0.00  −0.04    0.00
 1169   200 | −0.02   0.00   0.01    0.00   0.01    0.00  −0.02    0.00

Notes: "BACCE", "BAC3E1", "BAC3E2" and "BAC3E3" refer to the bias-adjusted versions of the CCE, C3E1, C3E2 and C3E3 estimators, respectively. See Table E1 for an explanation of the rest.
Table B: Frequency count of the required number of combinations.

            |     E1      |     E2      |     E3      |     E4      |     E5
    N     T | C3E2  C3E3  | C3E2  C3E3  | C3E2  C3E3  | C3E2  C3E3  | C3E2  C3E3
   30    30 | 1.00  0.26  | 0.41  1.00  | 0.57  0.23  | 0.67  0.24  | 0.07  0.00
   50    30 | 1.00  0.49  | 0.62  1.00  | 0.68  0.43  | 0.74  0.43  | 0.61  0.00
  100    30 | 1.00  0.89  | 0.89  1.00  | 0.75  0.62  | 0.77  0.63  | 1.00  0.01
  200    30 | 1.00  1.00  | 1.00  1.00  | 0.76  0.71  | 0.77  0.71  | 1.00  0.16
   30    50 | 1.00  0.34  | 0.50  1.00  | 0.74  0.40  | 0.83  0.39  | 0.22  0.00
   50    50 | 1.00  0.59  | 0.69  1.00  | 0.80  0.58  | 0.86  0.57  | 0.92  0.00
  100    50 | 1.00  0.94  | 0.92  1.00  | 0.83  0.75  | 0.86  0.74  | 1.00  0.03
  200    50 | 1.00  1.00  | 1.00  1.00  | 0.86  0.82  | 0.86  0.81  | 1.00  0.30
   30   100 | 1.00  0.33  | 0.52  1.00  | 0.74  0.39  | 0.83  0.39  | 0.20  0.00
   50   100 | 1.00  0.58  | 0.69  1.00  | 0.81  0.57  | 0.87  0.55  | 0.95  0.00
  100   100 | 1.00  0.94  | 0.93  1.00  | 0.85  0.75  | 0.87  0.74  | 1.00  0.02
  200   100 | 1.00  1.00  | 1.00  1.00  | 0.86  0.83  | 0.88  0.83  | 1.00  0.30
   30   200 | 1.00  0.25  | 0.45  1.00  | 0.65  0.28  | 0.78  0.27  | 0.05  0.00
   50   200 | 1.00  0.50  | 0.65  1.00  | 0.75  0.47  | 0.81  0.47  | 0.88  0.00
  100   200 | 1.00  0.92  | 0.90  1.00  | 0.80  0.68  | 0.83  0.67  | 1.00  0.01
  200   200 | 1.00  1.00  | 1.00  1.00  | 0.83  0.76  | 0.83  0.75  | 1.00  0.19
   93    30 | 1.00  0.86  | 0.87  1.00  | 0.74  0.63  | 0.77  0.62  | 1.00  0.01
  184    50 | 1.00  1.00  | 0.99  1.00  | 0.86  0.81  | 0.87  0.82  | 1.00  0.26
  464   100 | 1.00  1.00  | 1.00  1.00  | 0.87  0.86  | 0.88  0.85  | 1.00  0.87
 1169   200 | 1.00  1.00  | 1.00  1.00  | 0.84  0.83  | 0.84  0.82  | 1.00  1.00

Notes: "E1"-"E5" refer to the experiments described in Table A. The numbers in the table are the fraction of times that the selected number of combinations was equal to the required number. See Table E1 for an explanation of the rest.