
GMM ESTIMATION FOR DYNAMIC PANELS WITH FIXED EFFECTS AND STRONG INSTRUMENTS AT UNITY

BY

Chirok Han and Peter C.B. Phillips

COWLES FOUNDATION PAPER NO. 1290

COWLES FOUNDATION FOR RESEARCH IN ECONOMICS YALE UNIVERSITY

Box 208281 New Haven, Connecticut 06520-8281

2010

http://cowles.econ.yale.edu/


Econometric Theory, 26, 2010, 119–151. doi:10.1017/S026646660909063X

GMM ESTIMATION FOR DYNAMIC PANELS WITH FIXED EFFECTS AND STRONG INSTRUMENTS AT UNITY

CHIROK HAN
Korea University

AND

PETER C.B. PHILLIPS
Yale University, University of Auckland, University of York, and Singapore Management University

This paper develops new estimation and inference procedures for dynamic panel data models with fixed effects and incidental trends. A simple consistent GMM estimation method is proposed that avoids the weak moment condition problem that is known to affect conventional GMM estimation when the autoregressive coefficient (ρ) is near unity. In both panel and time series cases, the estimator has standard Gaussian asymptotics for all values of ρ ∈ (−1,1] irrespective of how the composite cross-section and time series sample sizes pass to infinity. Simulations reveal that the estimator has little bias even in very small samples. The approach is applied to panel unit root testing.

1. INTRODUCTION

In simple dynamic panel models, it is well known that the usual fixed effects estimator is inconsistent when the time span is small (Nickell, 1981), as is the ordinary least squares (OLS) estimator based on first differences. In such cases, the instrumental variable (IV) estimator (Anderson and Hsiao, 1981) and generalized method of moments (GMM) estimator (Arellano and Bond, 1991) are both widely used. However, as noted by Blundell and Bond (1998), these estimators both suffer from a weak instrument problem when the dynamic panel autoregressive coefficient (ρ) approaches unity. When ρ = 1, the moment conditions are completely irrelevant for the true parameter ρ, and the nature of the behavior of the estimator depends on T. When T is small, the estimators are asymptotically random, and when T is large the unweighted GMM estimator may be inconsistent and the efficient two-step estimator (including the two-stage least squares estimator) may behave in a nonstandard manner. Some special cases of such situations are studied in Staiger and Stock (1997) and Stock and Wright (2000), among others, and Han and Phillips (2006), the latter in a general context that includes some panel cases.

Our thanks go to two referees for helpful comments on the original version. Phillips acknowledges partial support from a Kelly Fellowship and the NSF under Grant Nos. SES 04-142254 and SES 06-47086. Address correspondence to Peter C.B. Phillips, Cowles Foundation, Yale University, P.O. Box 208281, New Haven, CT 06520, USA; e-mail: [email protected]; and Chirok Han, Department of Economics, Korea University, Anam-dong, Seongbuk-gu, Korea; e-mail: [email protected].

© Cambridge University Press, 2009 0266-4666/10 $15.00

Methods to avoid these problems were developed in Arellano and Bover (1995), Blundell and Bond (1998), and more recently in Hsiao, Pesaran, and Tahmiscioglu (2002). Arellano and Bover (1995) and Blundell and Bond (1998) propose a system GMM procedure that uses moment conditions based on the level equations together with the usual Arellano and Bond type orthogonality conditions. Hsiao et al. (2002), on the other hand, consider direct maximum likelihood estimation based on the differenced data under assumed normality for the idiosyncratic errors. Both approaches yield consistent estimators for all ρ values, but there are remaining issues that have yet to be determined in regard to the limit distribution when ρ is unity and T is large.

In a recent paper dealing with the time series case, Phillips and Han (2008) introduced a differencing-based estimator in an AR(1) model for which asymptotic Gaussian-based inference is valid for all values of ρ ∈ (−1,1]. The present paper applies those ideas to dynamic panel data models, where we show that significant advantages occur. In panels, the estimator again has a standard Gaussian limit for all ρ values including unity, it has virtually no bias, and it completely avoids the usual weak instrument problem for ρ in the vicinity of unity.

As discussed later, this panel estimator makes use of moment conditions that are strong for all values of ρ ∈ (−1,1] under the assumption that the errors are white noise over time. (The white noise condition is stronger than that on which the usual IV/GMM approaches by Anderson and Hsiao, 1981, or Arellano and Bond, 1991, are based.) Under this condition, the proposed estimator is consistent, supports asymptotically valid Gaussian inference even with highly persistent panel data, and is free of initial conditions on levels. These advantages stem from the following properties: (i) The limit distribution is continuous as the autoregressive coefficient passes through unity; (ii) the rate of convergence is the same for stationary and nonstationary panels; and (iii) differencing transformations essentially eliminate dependence on level initial conditions.

Furthermore, there are no restrictions on the number of the cross-sectional units (n) and the time span (T) other than the simple requirement that nT → ∞. Thus, neither large T nor large n is required for the limit theory to hold. Gaussian asymptotics apply irrespective of how the composite sample size nT → ∞, including both fixed T and fixed n cases, as well as any diagonal path and relative rate of divergence for these sample dimensions. This robust feature of the asymptotics is unique to our approach and differs substantially from the existing literature, including recent contributions by Hahn and Kuersteiner (2002), Alvarez and Arellano (2003), and Moon, Perron, and Phillips (2005), who analyze various cases with large n and large T. Apart from the fact that the asymptotic variance of our proposed estimator can be better estimated by different methods when n is large and T is small (because the variance evolves with T), no other modification or consideration is required in the implementation of our approach, so it is well suited to practical implementation. This wide applicability does come at a cost in efficiency for the fixed effects model and a loss of power for the incidental trends model compared with existing methods. For example, the convergence rate $\sqrt{nT}$ is slower than the $\sqrt{n}\,T$ rate obtained in Levin, Lin, and Chu (2002) for a bias-corrected OLS estimator.

In what follows, Section 2 considers the model and estimator for a simple dynamic model with fixed effects, where the basic idea of our transformation is explained. Section 3 deals with a dynamic panel model where exogenous variables are present, and Section 4 studies the case with incidental trends. Section 5 applies the new approach to panel unit root testing. Section 6 contains some concluding remarks. Proofs are in Appendix A and additional formulas are given in Appendix B. Throughout the paper we define $0^0 = 1$ and use $T_j$ to denote max(T − j, 0). We assume that data are observed for t = 0,1,...,T.

2. SIMPLE DYNAMIC PANELS

2.1. A New Estimator and Limit Theory

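We consider the simple dynamic panel model

$$y_{it} = \alpha_i + u_{it}, \quad u_{it} = \rho u_{it-1} + \varepsilon_{it}, \quad \rho \in (-1,1],$$

implying

$$y_{it} = (1-\rho)\alpha_i + \rho y_{it-1} + \varepsilon_{it}, \tag{1}$$

where $\alpha_i$ are unobservable individual effects and $\varepsilon_{it} \sim$ i.i.d. $(0,\sigma^2)$ with finite fourth moments.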

This model differs slightly in its components form from the usual dynamic panel model $y_{it} = \alpha_i + \rho y_{it-1} + \varepsilon_{it}$ in that the individual effects disappear when ρ = 1. This formulation is made only to guarantee continuity in the model formulation and the asymptotics at ρ = 1. When |ρ| < 1 the two models are not distinguishable. Without this alternative parametrization, the data generating process has a discontinuity at ρ = 1, at which point the individual effects become individual trends.

As is well known, the OLS estimator based on the "within" transformation yields an inconsistent estimator because the transformed regressor and the corresponding error are correlated; see Nickell (1981), among others. This bias is also not corrected by first differencing

$$\Delta y_{it} = \rho\,\Delta y_{it-1} + \Delta\varepsilon_{it}, \tag{2}$$

because the transformation induces a correlation between $\Delta y_{it-1}$ and $\Delta\varepsilon_{it}$. Instead, following Phillips and Han (2008), we transform (2) further into the form

$$2\Delta y_{it} + \Delta y_{it-1} = \rho\,\Delta y_{it-1} + \eta_{it}, \quad \eta_{it} = 2\Delta y_{it} + (1-\rho)\Delta y_{it-1}. \tag{3}$$


Then, this formulation produces the following key moment conditions.¹

LEMMA 1. If $E\varepsilon_{it}^2 = \sigma^2$ for all t and $E\varepsilon_{is}\varepsilon_{it} = 0$ for all $s \neq t$, then

$$Eg_{it}(\rho) = E\,\Delta y_{it-1}\left[2\Delta y_{it} + (1-\rho)\Delta y_{it-1}\right] = 0, \quad t = 2,\dots,T \tag{4}$$

for every ρ ∈ (−1,1].
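The moment condition in Lemma 1 can be checked numerically. The following is a minimal Monte Carlo sketch, not part of the original paper: the function names `simulate_panel` and `mean_moment`, the normal errors, the normal individual effects, and the choice of starting values are all assumptions made for illustration. It averages $g_{it}(\rho)$ over a large simulated panel and compares the value at the true ρ with the value at a wrong candidate.

```python
import numpy as np

def simulate_panel(n, T, rho, sigma=1.0, seed=0):
    """Simulate y_it = alpha_i + u_it, u_it = rho*u_{it-1} + eps_it, t = 0..T.

    Returns an (n, T+1) array of levels. For |rho| < 1 the start-up value
    u_{i,-1} is drawn from the stationary distribution; for rho = 1 it is set
    to zero (an illustrative choice; differencing removes it anyway).
    """
    rng = np.random.default_rng(seed)
    alpha = rng.normal(size=n)                      # hypothetical effects
    if abs(rho) < 1:
        u_prev = rng.normal(scale=sigma / np.sqrt(1 - rho**2), size=n)
    else:
        u_prev = np.zeros(n)
    y = np.empty((n, T + 1))
    for t in range(T + 1):
        u_prev = rho * u_prev + rng.normal(scale=sigma, size=n)
        y[:, t] = alpha + u_prev
    return y

def mean_moment(y, r):
    """Sample average of g_it(r) = dy_{it-1} * [2*dy_it + (1 - r)*dy_{it-1}]."""
    dy = np.diff(y, axis=1)                 # columns hold dy_i1, ..., dy_iT
    lag, cur = dy[:, :-1], dy[:, 1:]        # dy_{it-1} and dy_it for t = 2..T
    return np.mean(lag * (2 * cur + (1 - r) * lag))

if __name__ == "__main__":
    for rho in (0.0, 0.5, 1.0):
        y = simulate_panel(100_000, 3, rho, seed=42)
        print(rho, mean_moment(y, rho), mean_moment(y, rho - 0.5))
```

At the true ρ the average should be close to zero, while at a wrong candidate r it should be close to $(\rho - r)E(\Delta y_{it-1})^2$, which is the sense in which the condition remains informative at ρ = 1.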

It is worth noting that the moment conditions in (4) are based on the pattern of the covariogram of $u_{it} = y_{it} - \alpha_i$ implied by the AR(1) assumption. In particular, when $u_{it}$ is stationary, the key conditions are $Eu_{it}^2 = \sigma^2/(1-\rho^2)$ and $Eu_{it}u_{it+1} = \rho Eu_{it}^2$. More precisely, we have

$$g_{it}(\rho) = \Delta y_{it-1}\left[2\Delta\varepsilon_{it} + (1+\rho)\Delta y_{it-1}\right] = -2y_{it-2}\Delta\varepsilon_{it} + 2y_{it-1}\varepsilon_{it} - 2y_{it-1}\varepsilon_{it-1} + (1+\rho)(\Delta y_{it-1})^2.$$

When |ρ| < 1 and $Eu_{it}\alpha_i = 0$, the first two terms have zero mean under Ahn and Schmidt's (1995) "standard assumptions." But in order for the sum of the remaining two terms to have zero mean, it is required that $E(\Delta u_{it})^2 = 2\sigma^2/(1+\rho)$, which is implied by $Eu_{it}^2 = \sigma^2/(1-\rho^2)$ and $Eu_{it}u_{it-1} = \rho Eu_{it}^2$. These neither imply nor are implied by the assumptions of Ahn and Schmidt (1995).

The $T_1$ (i.e., T − 1) moment conditions in (4) are strong for all ρ ∈ (−1,1] in the sense that the expected derivatives of the moment functions $g_{it}(\rho)$ differ from zero for all ρ as long as $\Delta y_{it-1}$ has enough variation across i. This is easily verified by the calculation $E\,\partial g_{it}(\rho)/\partial\rho = -E(\Delta y_{it-1})^2$.

There are many ways to make use of these $T_1$ moment conditions. The simplest is to use pooled least squares estimation of (3), which leads to

$$\hat\rho_{ols} = \frac{\sum_{i=1}^{n}\sum_{t=2}^{T}\Delta y_{it-1}\,(2\Delta y_{it} + \Delta y_{it-1})}{\sum_{i=1}^{n}\sum_{t=2}^{T}(\Delta y_{it-1})^2},$$

which we call the first difference least squares (FDLS) estimator. This estimator has the following limit distribution.

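THEOREM 1. For each T, $\sqrt{nT_1}\,(\hat\rho_{ols} - \rho) \Rightarrow N(0, V_{ols,T})$ as n → ∞ for all ρ ∈ (−1,1], where

$$V_{ols,T} = \frac{E\,T_1^{-1}\left(\sum_{t=2}^{T}\Delta y_{it-1}\eta_{it}\right)^2}{\left[E\,T_1^{-1}\sum_{t=2}^{T}(\Delta y_{it-1})^2\right]^2}.$$

As T → ∞, $V_{ols,T} \to 2(1+\rho)$, and furthermore, $\sqrt{nT_1}\,(\hat\rho_{ols} - \rho) \Rightarrow N(0, 2(1+\rho))$.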

The Gaussian limit theory is valid for any n/T ratio as long as $nT_1 \to \infty$, including finite T values for which the variance $V_{ols,T}$ evolves with T. Most remarkably, the joint limit as both n and T pass to infinity is identical to the limit where T → ∞ individually, or the sequential limit as T → ∞ and then n → ∞ or the sequential limit as n → ∞ and then T → ∞. As a result, the limit theory is remarkably robust to different sample size constellations of (n,T), and simulations support the resulting intuition that testing based on Theorem 2 should show little size distortion.

We remark that the fourth-moment condition $E\varepsilon_{it}^4 < \infty$ is required for the limit theory to hold. For small T, the variance $V_{ols,T}$ can be expressed directly in terms of the parameters, using the general formula given in (B.7) in Appendix B. For example, if $\varepsilon_{it} \sim N(0,\sigma^2)$ or more weakly $E\varepsilon_{it}^4 = 3\sigma^4$ and if T = 2, then we have $V_{ols,2} = (1+\rho)(3-\rho)$. For T > 2 (and fixed) the variances $V_{ols,T}$ are plotted in Figure 1. The expression is quite complicated for general ρ, and is unlikely to be practically useful because $V_{ols,T}$ depends on the nuisance fourth moment of $\varepsilon_{it}$, except for ρ = 1 (see Corollary 1 below), and because $V_{ols,T}$ can be readily estimated by just replacing the expectation operators with averaging over i and the error process $\eta_{it}$ with the residuals $\hat\eta_{it}$ from the regression of (3). More specifically, when n is large, $V_{ols,T}$ is estimated by

$$\hat V_{ols,T} = \frac{(nT_1)^{-1}\sum_{i=1}^{n}\left(\sum_{t=2}^{T}\Delta y_{it-1}\hat\eta_{it}\right)^2}{\left[(nT_1)^{-1}\sum_{i=1}^{n}\sum_{t=2}^{T}(\Delta y_{it-1})^2\right]^2}, \tag{5}$$

where $\hat\eta_{it} = 2\Delta y_{it} + \Delta y_{it-1} - \hat\rho_{ols}\Delta y_{it-1}$. The corresponding standard error for $\hat\rho_{ols}$ is

$$se(\hat\rho_{ols}) = \left[\sum_{i=1}^{n}\left(\sum_{t=2}^{T}\Delta y_{it-1}\hat\eta_{it}\right)^2\right]^{1/2} \Bigg/ \sum_{i=1}^{n}\sum_{t=2}^{T}(\Delta y_{it-1})^2. \tag{6}$$

FIGURE 1. $V_{ols,T}$ for various T with normal errors. As T → ∞, $V_{ols,T}$ approaches 2(1+ρ). The convergence is fast when ρ > 0.
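The FDLS estimator and the standard error in (6) translate directly into a few array operations. The sketch below is an illustration rather than the authors' code: the function name `fdls` and the balanced-panel layout (an array of levels with one row per unit and columns t = 0,...,T) are assumptions.

```python
import numpy as np

def fdls(y):
    """FDLS estimate and its standard error, following the pooled OLS
    formula for rho_hat and equation (6).

    y : (n, T+1) array of levels y_it for t = 0..T, balanced panel, T >= 2.
    """
    dy = np.diff(y, axis=1)                  # dy_i1, ..., dy_iT
    lag, cur = dy[:, :-1], dy[:, 1:]         # dy_{it-1}, dy_it for t = 2..T
    denom = np.sum(lag**2)
    rho_hat = np.sum(lag * (2 * cur + lag)) / denom
    eta_hat = 2 * cur + (1 - rho_hat) * lag  # residuals from regression (3)
    # Sum over t within each unit first, then square and sum over units.
    se = np.sqrt(np.sum(np.sum(lag * eta_hat, axis=1) ** 2)) / denom
    return rho_hat, se

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Illustrative unit-root panel: random walks plus an individual effect.
    y = rng.normal(size=(500, 1)) + np.cumsum(rng.normal(size=(500, 6)), axis=1)
    rho_hat, se = fdls(y)
    print(rho_hat, se, (rho_hat - 1) / se)   # estimate, s.e., t-ratio for rho = 1
```

The inner sum over t is taken before squaring, which matches the clustering by unit in (5) and (6) and allows the moment functions to be correlated over t.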


As T → ∞, both $V_{ols,T}$ and $\hat V_{ols,T}$ converge to 2(1+ρ), and the N(0, 2(1+ρ)) limit holds if T → ∞ whether or not n → ∞. Technically speaking, the joint limit as both n → ∞ and T → ∞ coincides with the sequential limit as n → ∞ followed by T → ∞ or the sequential limit as T → ∞ followed by n → ∞. In this case, both (5) and $2(1+\hat\rho_{ols})$ are consistent for $V_{ols,T}$, so either formula can be used. This undiscriminating feature is a characteristic of the new approach.

If n is small and T is large, then performance of (5) may be poor, as it relies on the law of large numbers across cross-sectional units. But in this case the $2(1+\hat\rho_{ols})$ formula approximates the actual variances quite well. This good performance of the asymptotic theory has been confirmed in earlier simulations reported in Phillips and Han (2008) for the time series case where n = 1.

When ρ = 1, the differenced data ($\Delta y_{it}$) are i.i.d. over both cross-sectional and time-series dimensions, resulting in the same Gaussian limit holding as nT → ∞ (more precisely, $nT_1 \to \infty$) irrespective of the n/T ratio, as given in the following result:

COROLLARY 1. If ρ = 1, then $(nT_1)^{1/2}(\hat\rho_{ols} - 1) \Rightarrow N(0,4)$ as $nT_1 \to \infty$.

This specific asymptotic distribution can also be derived from Bond et al.(2005).

Simulation results are given in Table 1 for T = 2 and T = 24. The exact form of $V_{ols,T}$ is provided in (B.7) in Appendix B. The simulated test size based on t-ratios with the standard errors obtained by (6) are given in the "size" columns. The results generally reflect the asymptotic theory well, including the consistency of both the estimator and the standard error. When n is relatively small, the standard errors according to (6) slightly underestimate the true variance. The asymptotic variance formula 2(1+ρ) for $\sqrt{nT_1}\,(\hat\rho_{ols} - \rho)$ from Theorem 1, which is appropriate when T → ∞, obviously does not perform as well when T is as small as it is in this experiment, although the formula is surprisingly good when ρ is close to unity. Also, when T ≥ 3, the formula 2(1+ρ) is rather close to the true variance for reasonably large ρ values (e.g., ρ ≥ 0.5), at least if the errors are normally distributed. For error distributions with thicker tails, one might expect that larger values of T might be needed to get a good correspondence with the asymptotic formula based on T → ∞.
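A quick way to see why the large-T formula is accurate near unity even when T = 2 is to compare it with the closed form $V_{ols,2} = (1+\rho)(3-\rho)$ stated earlier for errors with $E\varepsilon_{it}^4 = 3\sigma^4$. The difference between the two is $(1+\rho)(3-\rho) - 2(1+\rho) = 1 - \rho^2$, which vanishes at ρ = 1. The helper names below are hypothetical and the snippet is only a sketch of that comparison.

```python
def v_ols_T2(rho):
    """Fixed-T variance for T = 2 when E eps^4 = 3 sigma^4: (1 + rho)(3 - rho)."""
    return (1 + rho) * (3 - rho)

def v_ols_limit(rho):
    """Large-T variance from Theorem 1: 2(1 + rho)."""
    return 2 * (1 + rho)

if __name__ == "__main__":
    for rho in (-0.9, -0.5, 0.0, 0.5, 0.9, 1.0):
        gap = v_ols_T2(rho) - v_ols_limit(rho)   # equals 1 - rho**2
        print(f"rho={rho:5.1f}  T=2: {v_ols_T2(rho):.2f}  "
              f"T->inf: {v_ols_limit(rho):.2f}  gap: {gap:.2f}")
```

The T = 2 values produced this way agree with the V entries listed in the T = 2 panel of Table 1.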

Table 2 presents a brief comparison of FDLS, first-differenced GMM (DiffGMM), system GMM (SysGMM), and first-differenced MLE (DiffMLE) for $y_{it} = \alpha_i + u_{it}$ and $u_{it} = \rho u_{it-1} + \varepsilon_{it}$. We observe that the small sample properties of system GMM depend on the variance of $\alpha_i$ as noted by Hayakawa (2007). Both first-differenced MLE and FDLS are consistent, and obviously the MLE is more efficient than FDLS. (This efficiency gap becomes larger as T increases. For example, when N = 50 and T = 10 with ρ = 0.5, the variance of the FDLS estimator is approximately 2.5 times as large as that of the MLE, according to unreported simulations.) It is worth noting that the comparison of GMM estimates and FDLS should be interpreted with caution because they are based on different sets of moment conditions.


TABLE 1. FDLS for $\varepsilon_{it} \sim N(0,1)$. Simulations conducted using Gauss with 10,000 iterations. The limit variance $V_{ols,T}$ (denoted by V in this table) is calculated by (B.7) in Appendix B. The sizes of test based on the t-ratios using (6) are listed in the "size" columns. The column "nT₁var" reports $nT_1$ times the simulated variance.

T = 2

| n | ρ = 0 (V = 3): mean | nT₁var | size | ρ = −0.5 (V = 1.75): mean | nT₁var | size | ρ = −0.9 (V = 0.39): mean | nT₁var | size |
|---|---|---|---|---|---|---|---|---|---|
| 50 | 0.000 | 3.189 | 0.077 | −0.499 | 1.857 | 0.076 | −0.900 | 0.401 | 0.071 |
| 100 | 0.002 | 3.081 | 0.063 | −0.500 | 1.822 | 0.063 | −0.900 | 0.387 | 0.058 |
| 200 | 0.000 | 3.031 | 0.054 | −0.500 | 1.734 | 0.054 | −0.900 | 0.392 | 0.051 |
| 400 | 0.000 | 3.032 | 0.053 | −0.499 | 1.779 | 0.054 | −0.900 | 0.387 | 0.055 |

| n | ρ = 0.5 (V = 3.75): mean | nT₁var | size | ρ = 0.9 (V = 3.99): mean | nT₁var | size | ρ = 1 (V = 4): mean | nT₁var | size |
|---|---|---|---|---|---|---|---|---|---|
| 50 | 0.498 | 3.968 | 0.075 | 0.901 | 4.049 | 0.067 | 1.001 | 4.066 | 0.068 |
| 100 | 0.500 | 3.808 | 0.064 | 0.900 | 4.041 | 0.061 | 1.001 | 3.980 | 0.058 |
| 200 | 0.501 | 3.823 | 0.057 | 0.899 | 3.962 | 0.055 | 1.000 | 4.047 | 0.057 |
| 400 | 0.500 | 3.846 | 0.054 | 0.900 | 3.930 | 0.052 | 1.001 | 4.098 | 0.058 |

T = 24

| n | ρ = 0 (V = 2.043): mean | nT₁var | size | ρ = −0.5 (V = 1.074): mean | nT₁var | size | ρ = −0.9 (V = 0.313): mean | nT₁var | size |
|---|---|---|---|---|---|---|---|---|---|
| 50 | 0.001 | 2.057 | 0.061 | −0.498 | 1.091 | 0.062 | −0.899 | 0.329 | 0.069 |
| 100 | 0.000 | 2.077 | 0.059 | −0.499 | 1.095 | 0.056 | −0.899 | 0.321 | 0.060 |
| 200 | 0.000 | 2.039 | 0.052 | −0.500 | 1.053 | 0.049 | −0.900 | 0.325 | 0.057 |
| 400 | 0.000 | 2.035 | 0.050 | −0.500 | 1.079 | 0.051 | −0.900 | 0.324 | 0.050 |

| n | ρ = 0.5 (V = 2.987): mean | nT₁var | size | ρ = 0.9 (V = 3.758): mean | nT₁var | size | ρ = 1 (V = 4): mean | nT₁var | size |
|---|---|---|---|---|---|---|---|---|---|
| 50 | 0.500 | 2.986 | 0.059 | 0.900 | 3.735 | 0.059 | 1.000 | 3.988 | 0.059 |
| 100 | 0.500 | 3.036 | 0.054 | 0.900 | 3.723 | 0.053 | 1.000 | 4.039 | 0.057 |
| 200 | 0.500 | 3.036 | 0.056 | 0.900 | 3.685 | 0.050 | 1.000 | 3.841 | 0.045 |
| 400 | 0.500 | 3.073 | 0.054 | 0.900 | 3.817 | 0.053 | 1.000 | 4.071 | 0.055 |

As the moment functions are correlated over t, optimal GMM, which we call first difference GMM (FDGMM), is possibly a more efficient alternative to pooled OLS. The efficiency gain, however, looks marginal in our case. When ρ = 1, it can be shown that OLS equals the optimal GMM. For other ρ values, the variance ratio $V_{gmm}/V_{ols}$ (where $V_{gmm}$ and $V_{ols}$ are the variances of FDGMM and FDOLS, respectively) is evaluated in Figure 2 in the case of standard normal errors, with ρ = −0.5, 0, 0.5, 0.9, 1 and T = 2,3,...,100. The lowest variance ratio is approximately 0.99, which is obtained at ρ = 0.5 and T = 5, indicating that the efficiency gain of optimal GMM over OLS is marginal. But note that this simulation result applies only to normally distributed errors. From additional experiments (not reported here) it was found that the efficiency of GMM over OLS is responsive to kurtosis, but for reasonable degrees of kurtosis, the efficiency gain of FDGMM remains marginal. For example, when $\mathrm{var}((\varepsilon_{it}/\sigma)^2) = 7$, the minimal $V_{gmm}/V_{ols}$ ratio is approximately 0.98.

TABLE 2. Comparison of various estimates (10,000 iterations). DGP: $y_{it} = \alpha_i + u_{it}$, $u_{it} = \rho u_{it-1} + \varepsilon_{it}$, $\varepsilon_{it} \sim N(0,1)$, $\alpha_i \sim N(1,\sigma_\alpha^2)$; $\mathrm{sd}(u_{i,-1}) = (1-\rho^2)^{-1/2}\,\mathbf{1}\{|\rho| < 1\} + 25\cdot\mathbf{1}\{\rho = 1\}$, N = 200, T = 3.

$\sigma_\alpha = 1$, $\sigma_\varepsilon = 1$

| ρ | DiffGMM | SysGMM | DiffMLE | FDLS |
|---|---|---|---|---|
| 0.5 | 0.487 (0.155) | 0.497 (0.090) | 0.499 (0.079) | 0.501 (0.089) |
| 0.7 | 0.679 (0.188) | 0.687 (0.093) | 0.698 (0.082) | 0.700 (0.093) |
| 0.9 | 0.848 (0.294) | 0.875 (0.107) | 0.896 (0.083) | 0.899 (0.097) |
| 1.0 | 0.000 (0.940) | 0.945 (0.138) | 0.996 (0.082) | 0.998 (0.100) |

$\sigma_\alpha = 5$, $\sigma_\varepsilon = 1$

| ρ | DiffGMM | SysGMM | DiffMLE | FDLS |
|---|---|---|---|---|
| 0.5 | 0.448 (0.298) | 0.507 (0.151) | 0.499 (0.079) | 0.501 (0.089) |
| 0.7 | 0.572 (0.453) | 0.668 (0.167) | 0.698 (0.082) | 0.700 (0.093) |
| 0.9 | 0.579 (0.707) | 0.816 (0.200) | 0.896 (0.083) | 0.899 (0.097) |
| 1.0 | −0.004 (0.931) | 0.884 (0.237) | 0.996 (0.082) | 0.998 (0.100) |

Because the performance of the feasible two-step GMM estimator may deteriorate due to inaccurate estimation of the covariance matrix, the two-step efficient GMM may yield a poorer estimator than OLS when the efficiency gain of the infeasible optimal GMM is marginal. When $\varepsilon_{it}$ is normally distributed, this is likely to be the case. According to simulations not reported here, the two-step efficient GMM (using OLS as the first step estimator) looks less efficient than OLS for a wide range of ρ and T values up to quite a large n. So we generally recommend FDLS over FDGMM for practical use.

FIGURE 2. Variance ratio $V_{gmm}/V_{ols}$ for normal errors for T = 2,3,...,100. The minimum efficiency of FDLS relative to FDGMM is approximately 0.99, with the low point being attained at ρ = 0.5 and T = 4. The efficiency gain of FDGMM over FDLS is marginal.

It is interesting to view FDLS in the context of method of moments and compare it with other consistent estimators. For the model $y_{it} = \alpha_i + u_{it}$, $u_{it} = \rho u_{it-1} + \varepsilon_{it}$, under the further assumption that $\alpha_i$ is uncorrelated with $\varepsilon_{it}$ (and $u_{i,-1}$), we get the moments

$$Ey_{it}y_{is} = E\alpha_i^2 + \sigma^2\rho^{|t-s|}/(1-\rho^2), \quad |\rho| < 1,$$

$$Ey_{it}y_{is} = E\alpha_i^2 + Eu_{i,-1}^2 + \sigma^2(s \wedge t + 1), \quad \rho = 1,$$

for all t,s = 0,1,...,T, which in turn provide (T+1)(T+2)/2 distinct moments. Next, rewrite the moments in terms of $(y_{i0}, \Delta y_i')' = (y_{i0}, \Delta y_{i1}, \dots, \Delta y_{iT})'$ as

$$Ey_{i0}^2 = E\alpha_i^2 + \rho^2 Eu_{i,-1}^2 + \sigma^2,$$

$$Ey_{i0}\Delta y_{it} = -\sigma^2\rho^{t-1}/(1+\rho)\cdot\mathbf{1}\{\rho<1\}, \quad t \ge 1,$$

$$E(\Delta y_{it})^2 = 2\sigma^2/(1+\rho), \quad\text{and}$$

$$E\Delta y_{it}\Delta y_{is} = -\sigma^2\rho^{|t-s|-1}(1-\rho)/(1+\rho), \quad t \neq s,$$

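or in matrix form as

$$E\begin{bmatrix} y_{i0} \\ \Delta y_i \end{bmatrix}\begin{bmatrix} y_{i0} \\ \Delta y_i \end{bmatrix}' = \frac{\sigma^2}{1+\rho}
\begin{bmatrix}
\xi_\rho & -\mathbf{1}\{\rho<1\} & -\rho\,\mathbf{1}\{\rho<1\} & \cdots & -\rho^{T_1}\mathbf{1}\{\rho<1\} \\
-\mathbf{1}\{\rho<1\} & 2 & -(1-\rho) & \cdots & -\rho^{T_2}(1-\rho) \\
-\rho\,\mathbf{1}\{\rho<1\} & -(1-\rho) & 2 & \cdots & -\rho^{T_3}(1-\rho) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
-\rho^{T_1}\mathbf{1}\{\rho<1\} & -\rho^{T_2}(1-\rho) & -\rho^{T_3}(1-\rho) & \cdots & 2
\end{bmatrix}, \tag{7}$$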


where $\xi_\rho = (E\alpha_i^2 + \rho^2 Eu_{i,-1}^2 + \sigma^2)(1+\rho)/\sigma^2$. (Note that the moments $Ey_{i0}^2$ and $E\Delta y_i y_{i0}$ depend on the condition that the $\alpha_i$ are uncorrelated with $u_{i,-1}$ and $\varepsilon_{it}$ but the moments $E\Delta y_i\Delta y_i'$ do not.) Rewriting the moments $Ey_iy_i'$ as (7) does not waste any information because we can recover the original moments $Ey_iy_i'$ by a linear transformation. Now, among these (T+1)(T+2)/2 distinct moments, $Ey_{i0}^2$ contains the nuisance parameters $E\alpha_i^2$ and $Eu_{i,-1}^2$ and, in fact, only $Ey_{i0}^2$ does so. Thus this element does not contribute to the estimation of ρ and can be safely ignored. Now, in what follows, we will show that each conventional (consistent) estimator can be derived as a (generalized) method of moments estimator using a subset of the above moment conditions.

Let us first consider the conventional IV/GMM estimators such as those of Anderson and Hsiao (1981) and Arellano and Bond (1991). The moment conditions

$$Ef_{i1t}(\rho) = Ey_{i0}(\Delta y_{it} - \rho\,\Delta y_{it-1}) = 0, \quad t \ge 2, \tag{8}$$

are obtained by combining any two consecutive elements of the first column (except for $Ey_{i0}^2$). Furthermore, any lower off-diagonal element of $E\Delta y_i\Delta y_i'$ and the element below it provides the moment condition

$$Eh_{ist}(\rho) = E\Delta y_{is}(\Delta y_{it} - \rho\,\Delta y_{it-1}) = 0, \quad s < t-1, \tag{9}$$

for some s and t . These moment conditions are strong when ρ < 1, so theIV/GMM estimators are consistent. But if ρ 1, then the moment conditionsare weakly identifying, and in case ρ = 1, each moment condition in (8) and (9)fails to identify the true parameter because then E fi1t (ρ) ≡ 0 and Ehist(ρ) ≡ 0.When ρ = 1, if T is small and fixed, then the limit distribution of the GMMestimator is nondegenerate due to the lack of identification (see, e.g., Phillips,1989; Staiger and Stock, 1997). But if T is large, then the unweighted GMM us-ing these moment conditions converges to zero (under some regularity conditionsfor ui,−1 = yi,−1 − αi ), because the unweighted criterion function satisfies theconvergence

q−1n

⎧⎨⎩

T

∑t=2

[n−1/2

n

∑i=1

fi1t (ρ)

]2

+ ∑s<t−1

[n−1/2

n

∑i=1

hist (ρ)

]2⎫⎬⎭→p (1+ρ2)σ 4,

(if the initial condition that (nT )−1/2 ∑ni=1 ui,−1 →p 0 holds under other regularity

conditions) due to the accumulating signal variability and despite the fact thateach moment condition fails to identify any ρ value, and this limit is minimizedat zero (see Han and Phillips, 2006, for a detailed study of such situations), whereqn = (T − 1)+ (T − 1)(T − 2)/2 is the total number of moment conditions. SoIV/GMM estimation based on (8) and (9) can hardly be used successfully when ρmay take values near unity. The behavior of the two-step efficient GMM estimatorin this case has not been determined.
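The identification failure at unity can be spelled out in one line. The following worked step is added for illustration and writes r for an arbitrary candidate value of the parameter (a symbol not used in the original). When ρ = 1 we have $\Delta y_{it} = \varepsilon_{it}$, so under the white noise assumption, and with $y_{i0}$ uncorrelated with later errors,

$$Ef_{i1t}(r) = Ey_{i0}\varepsilon_{it} - r\,Ey_{i0}\varepsilon_{it-1} = 0 \quad (t \ge 2), \qquad Eh_{ist}(r) = E\varepsilon_{is}\varepsilon_{it} - r\,E\varepsilon_{is}\varepsilon_{it-1} = 0 \quad (s < t-1),$$

for every r, so (8) and (9) carry no information about the parameter. By contrast, the moment function in (4) evaluated at r gives

$$Eg_{it}(r) = E\varepsilon_{it-1}\left[2\varepsilon_{it} + (1-r)\varepsilon_{it-1}\right] = (1-r)\sigma^2,$$

which vanishes only at r = 1.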


Unlike this IV/GMM estimator, which uses the off-diagonal elements of (7), our approach works from the moment conditions on the diagonal elements of $E\Delta y_i\Delta y_i'$. Each diagonal element of $E\Delta y_i\Delta y_i'$ in (7) and the element right below it construct a moment condition in (4), i.e., $(1-\rho)E(\Delta y_{it-1})^2 + 2E\Delta y_{it-1}\Delta y_{it} = 0$, or equivalently, $E\Delta y_{it-1}[2\Delta y_{it} + (1-\rho)\Delta y_{it-1}] = 0$. It is also interesting that more moment conditions can be obtained by combining the diagonal elements and their left elements, viz.,

$$Eg^*_{it}(\rho) = E\Delta y_{it}\left[2\Delta y_{it-1} + (1-\rho)\Delta y_{it}\right] = 0, \quad t = 2,\dots,T. \tag{10}$$

These moment conditions are symmetric to (4) and are obtained by swapping the roles of $\Delta y_{it}$ and $\Delta y_{it-1}$. We may consider GMM estimation or OLS estimation using these additional moment conditions together with those in (4). Simulations illustrate that the efficiency gain over $\hat\rho_{ols}$ by considering (10) is considerable when T = 2, especially for negative ρ, but the contribution of these additional moment conditions diminishes with T. Furthermore, when ρ = 1, $g_{it}(\rho)$ of (4) and $g^*_{it}(\rho)$ of (10) are asymptotically identical when evaluated at the true parameter, so the moment conditions are singular and the traditional feasible optimal GMM procedure does not provide standard asymptotics should we use both (4) and (10). In that case, procedures based on analytically calculated optimal weights (so the weights are a function of ρ) could avoid the singularity, but this sort of general analytic weighting scheme cannot be implemented because the optimal weighting matrix depends on the nuisance parameter $E\varepsilon_{it}^4$. When $\varepsilon_{it} \sim N(0,\sigma^2)$, however, the variances of $g_{it}(\rho)$ and $g^*_{it}(\rho)$ are almost equal except for $\rho \approx -1$ according to simulations, implying that the unweighted sum $g_{it}(\rho) + g^*_{it}(\rho)$ is an (almost) optimal transformation of $g_{it}(\rho)$ and $g^*_{it}(\rho)$. In particular, when $\sum_{t=2}^{T} g_{it}(\rho)$ is used (as in the derivation of $\hat\rho_{ols}$) instead of each $g_{it}(\rho)$ separately, the unweighted sum $\sum_{t=2}^{T}[g_{it}(\rho) + g^*_{it}(\rho)]$ is an almost optimal exactly identifying moment condition with normal errors. This leads to a natural OLS regression of the pooled dependent variable $(2\Delta y_{it} + \Delta y_{it-1},\, 2\Delta y_{it-1} + \Delta y_{it})'$ on the correspondingly pooled independent variable $(\Delta y_{it-1}, \Delta y_{it})'$. Let us denote this estimator by $\hat\rho^{**}_{ols}$. According to simulations, when ρ = 0 and T = 2, the variance ratio $\mathrm{var}(\hat\rho^{**}_{ols})/\mathrm{var}(\hat\rho_{ols})$ is about 0.75, meaning that by additionally using the moment condition that $Eg^*_{i2}(\rho) = 0$, we can reduce the variance by 25%. For other values of ρ and T, Table 3 reports the ratio of the variance of the OLS estimator based on both $g_{it}(\rho)$ and $g^*_{it}(\rho)$ to the variance of the OLS estimator based on $g_{it}(\rho)$ alone. Note again that this result applies only to normal errors, and when the tails of the error distribution are thicker (e.g., the $t_5$ distribution), $g^*_{it}(\rho)$ will have larger variance than $g_{it}(\rho)$, and as a result, the estimator $\hat\rho^{**}_{ols}$ may be even less efficient than $\hat\rho_{ols}$ because of the failure of optimal weighting.

In summary, the loss of efficiency from using OLS rather than GMM is marginal, and the gain from adding $g^*_{it}(\rho)$ is not big enough compared with the possible risk of singularity (in case of GMM) and efficiency loss (in case of pooled OLS). In view of these many considerations, the original FDLS method (which yields $\hat\rho_{ols}$) is again recommended for practical use.
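For readers who want to experiment with the alternative estimator discussed above, the pooled regression that stacks the moment conditions (4) and (10) reduces to a single ratio of sums. The sketch below is illustrative only; the function name `fdls_both` and the array layout (levels for t = 0,...,T, one row per unit) are assumptions.

```python
import numpy as np

def fdls_both(y):
    """Pooled OLS (no intercept) of the stacked dependent variable
    (2*dy_it + dy_{it-1}, 2*dy_{it-1} + dy_it) on the stacked regressor
    (dy_{it-1}, dy_it), i.e. the estimator built from both (4) and (10).

    y : (n, T+1) array of levels y_it for t = 0..T, balanced panel, T >= 2.
    """
    dy = np.diff(y, axis=1)
    lag, cur = dy[:, :-1], dy[:, 1:]          # dy_{it-1}, dy_it for t = 2..T
    num = np.sum(lag * (2 * cur + lag)) + np.sum(cur * (2 * lag + cur))
    den = np.sum(lag**2) + np.sum(cur**2)
    return num / den

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Illustrative unit-root panel with T = 2 (three observed levels per unit).
    y = np.cumsum(rng.normal(size=(1000, 3)), axis=1)
    print(fdls_both(y))
```

As the surrounding text cautions, the equal weighting of the two sets of moments is close to optimal only under normal errors, so this variant is best treated as a comparison tool rather than a default.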


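TABLE 3. Efficiency gain by adding $g^*_{it}(\rho)$. Variance ratios $\mathrm{var}(\hat\rho^{**}_{ols})/\mathrm{var}(\hat\rho_{ols})$ are reported.

| ρ \ T | 2 | 3 | 4 | 5 | 7 | 10 | 20 |
|---|---|---|---|---|---|---|---|
| −0.5 | 0.44 | 0.47 | 0.56 | 0.62 | 0.73 | 0.79 | 0.90 |
| 0.0 | 0.75 | 0.78 | 0.86 | 0.88 | 0.92 | 0.95 | 0.98 |
| 0.5 | 0.93 | 0.95 | 0.97 | 0.98 | 0.99 | 0.99 | 1.00 |
| 0.9 | 0.99 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 1.0 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |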

2.2. The Explosive Case

When ρ > 1, the differences $\Delta y_{it}$ continue to manifest explosive behavior and all the good properties of FDLS do not hold. The estimator is inconsistent and the limit depends on the n/T ratio and the initial status $u_{i,-1}$. The following lemma is indicative of what happens in this case.

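LEMMA 2. If ρ > 1, then

$$E\,\frac{1}{T_1}\sum_{t=2}^{T}(\Delta y_{it-1})^2 = (\rho+1)^{-1}\left[2 + \nu(\rho,T)\right]\sigma^2, \quad\text{and}$$

$$E\,\frac{1}{T_1}\sum_{t=2}^{T}\Delta y_{it-1}\eta_{it} = \nu(\rho,T)\,\sigma^2,$$

where

$$\nu(\rho,T) = \left[\frac{\rho^2(\rho^{2T_1}-1)\,\delta_\rho}{T_1(\rho+1)}\right], \quad \delta_\rho = (\rho^2-1)Eu_{i,-1}^2/\sigma^2 + 1.$$

When T is fixed, this lemma implies, by the law of large numbers, that

$$\operatorname*{plim}_{n\to\infty}\hat\rho_{ols} = \rho + \frac{(\rho+1)\,\nu(\rho,T)}{2 + \nu(\rho,T)}.$$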

Thus, in the case where ρ is close to unity and $Eu_{i,-1}^2$ is small so that ν(ρ,T) is negligible, the inconsistency of $\hat\rho_{ols}$ is also small. For a given ρ, the bias of $\hat\rho_{ols}$ is bigger when T is large than when T is small. If ρ is even closer to unity and such that $\sqrt{n}\,\nu(\rho,T) \to 0$, then the bias of $\sqrt{nT_1}\,(\hat\rho_{ols} - \rho)$ is negligible, leading to a Gaussian limit whose variance changes continuously as ρ deviates slightly from unity into the explosive area. This observation implies that the limit distribution of Theorem 1 is continuous as ρ passes through unity to very mildly explosive cases.
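The size of the fixed-T inconsistency implied by Lemma 2 is easy to tabulate. The sketch below evaluates ν(ρ,T) and the probability limit of the FDLS estimator; the function names, the argument `u_init_var` (standing for $Eu_{i,-1}^2$), and the default values are assumptions for illustration.

```python
def nu(rho, T, u_init_var=0.0, sigma2=1.0):
    """nu(rho, T) from Lemma 2, with T_1 = T - 1 and
    delta_rho = (rho^2 - 1) * E u_{i,-1}^2 / sigma^2 + 1."""
    T1 = T - 1
    delta = (rho**2 - 1) * u_init_var / sigma2 + 1
    return rho**2 * (rho ** (2 * T1) - 1) * delta / (T1 * (rho + 1))

def fdls_plim(rho, T, u_init_var=0.0, sigma2=1.0):
    """Fixed-T probability limit of the FDLS estimator as n -> infinity:
    rho + (rho + 1) * nu / (2 + nu)."""
    v = nu(rho, T, u_init_var, sigma2)
    return rho + (rho + 1) * v / (2 + v)

if __name__ == "__main__":
    for rho in (1.0, 1.001, 1.01, 1.05):
        for T in (3, 10, 20):
            print(rho, T, fdls_plim(rho, T) - rho)   # asymptotic bias
```

Evaluating the formula at ρ = 1 gives ν = 0 and hence no bias, which is consistent with the continuity at unity described above, and for a given ρ > 1 the bias grows with T.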


The case with large T and fixed n can be analyzed by Theorems 2 and 3 of Phillips and Han (2008), who consider the time series case. Again, the asymptotics are continuous as ρ passes through unity under the initial condition that $(\rho_T - 1)u_{i,-1}^2 \to_p 0$ where $\rho = \rho_T \downarrow 1$.

If both n and T are large, the N(0, 2(1+ρ)) asymptotics are still continuous as ρ passes through the boundary of unity into the explosive area, though the boundary of ρ for the continuous asymptotics is then correspondingly narrower on the explosive side of unity.

2.3. Heterogeneity and Cross-Section Dependence

The remainder of this section considers issues of cross-sectional heteroskedasticity, heterogeneity, and cross-section dependence.

If the error variance $E\varepsilon_{it}^2$ is different across i, then Theorem 1 still applies as long as the Lindeberg condition holds, with the only modification to $V_{ols,T}$ being the more general expression

$$\lim_{n\to\infty}\frac{(nT_1)^{-1}\sum_{i=1}^{n}E\left(\sum_{t=2}^{T}\Delta y_{it-1}\eta_{it}\right)^2}{\left[(nT_1)^{-1}\sum_{i=1}^{n}E\sum_{t=2}^{T}(\Delta y_{it-1})^2\right]^2}.$$

Computation of standard errors by (6) remains valid.

If the AR coefficient changes across i so that the model involves $u_{it} = \rho_i u_{it-1} + \varepsilon_{it}$, then we have the limit

$$\hat\rho_{ols} \to_p \operatorname{plim}_{n\to\infty}\sum_{i=1}^{n} w_i\rho_i$$

if the limit on the right-hand side exists, where

$$w_i = \frac{\sum_{t=2}^{T}(\Delta y_{it-1})^2}{\sum_{i=1}^{n}\sum_{t=2}^{T}(\Delta y_{it-1})^2}.$$

Noting that $E(\Delta y_{it-1})^2 = 2\sigma_i^2/(1+\rho_i)$, we see that an individual unit with smaller $\rho_i$ is given a bigger weight if there is no heterogeneity in the error variances $\sigma_i^2$.
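To see how the weighting works, the population counterpart of the weights can be computed from $E(\Delta y_{it-1})^2 = 2\sigma_i^2/(1+\rho_i)$. This is only an illustrative sketch: the function name and the three example values of $\rho_i$ are assumptions, and the sample weights $w_i$ are replaced by their expectations.

```python
import numpy as np


def population_weights(rho, sigma2):
    """Weights proportional to E(dy_{it-1})^2 = 2 * sigma_i^2 / (1 + rho_i)."""
    rho = np.asarray(rho, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    second_moments = 2.0 * sigma2 / (1.0 + rho)
    return second_moments / second_moments.sum()


# Three hypothetical units with a common error variance.
rho = np.array([0.2, 0.5, 0.9])
w = population_weights(rho, np.ones(3))
pseudo_true = float(w @ rho)  # weighted average that the pooled estimator targets

if __name__ == "__main__":
    print("weights:", np.round(w, 4))
    print("weighted average:", round(pseudo_true, 4), "simple average:", round(rho.mean(), 4))
```

With equal variances the weights decrease in $\rho_i$, so the weighted average lies below the simple average of the $\rho_i$.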

To allow for cross-section dependence, let $\varepsilon_{it} = \sum_{k=1}^{K}\lambda_{ik}f_{kt} + v_{it}$, where the $f_{kt}$ are common shocks i.i.d. over t and independent of other shocks, $\lambda_{ik}$ are the factor loadings, and the $v_{it}$ are i.i.d. Since the compounded error $\varepsilon_{it}$ is still white noise, the moment conditions (4) still hold. However, consistency does not hold unless T → ∞ or K → ∞ (where K is the number of common factors). The case K = 1 is exemplary. Here $(\Delta y_{it-1})^2$ involves $(\lambda_{i1}f_{1t})^2$, and the randomness in the double sum

$$(nT_1)^{-1}\sum_{i=1}^{n}\sum_{t=2}^{T}(\lambda_{i1}f_{1t})^2 = n^{-1}\sum_{i=1}^{n}\lambda_{i1}^2\cdot T_1^{-1}\sum_{t=2}^{T}f_{1t}^2$$


persists unless T is large (cf. Phillips and Sul, 2007). When T is small, as is the case in typical microeconometric work, K may be considered large and each factor loading is assumed to be sufficiently small to ensure convergence. (Then the variation of the cross-section dependence term $\sum_{k=1}^{K}\lambda_{ik}f_{kt}$ is correspondingly controlled.) As long as the common factors $f_{kt}$ are white noise over time, the moment conditions (4) hold, and under further regularity to ensure convergence, the FDLS estimator would be consistent. See Phillips and Sul (2007) for details on panel models with these characteristics.

3. DYNAMIC PANELS WITH EXOGENOUS VARIABLES

This section considers the model with exogenous variables $y_{it} = \alpha_i + \beta' x_{it} + u_{it}$, where $u_{it} = \rho u_{it-1} + \varepsilon_{it}$ for ρ ∈ (−1,1], which may be transformed to

$$y_{it} = (1-\rho)\alpha_i + \beta'(x_{it}-\rho x_{it-1}) + \rho y_{it-1} + \varepsilon_{it}. \tag{11}$$

Let $z_{it} = y_{it} - \beta' x_{it}$. Then the model (11) is written as $z_{it} = (1-\rho)\alpha_i + \rho z_{it-1} + \varepsilon_{it}$, which is the same as (1) with $y_{it}$ replaced with $z_{it}$. By applying the same transformation, we get

$$2\Delta z_{it} + \Delta z_{it-1} = \rho\,\Delta z_{it-1} + \eta_{it}, \qquad \eta_{it} = 2\Delta\varepsilon_{it} + (1+\rho)\Delta z_{it-1}.$$

For all ρ, $\Delta z_{it-1}$ and $\eta_{it}$ are uncorrelated and these moment conditions are strong for all ρ, as before. Next, if we allow $\alpha_i$ to be arbitrarily correlated with $x_{it}$, then we can apply the within transformation to (11), giving

$$\tilde y_{it} - \rho\,\tilde y_{it-1} = (\tilde x_{it} - \rho\,\tilde x_{it-1})'\beta + \tilde\varepsilon_{it},$$

where $\tilde y_{it} = y_{it} - T^{-1}\sum_{s=1}^{T}y_{is}$, $\tilde y_{it-1} = y_{it-1} - T^{-1}\sum_{s=1}^{T}y_{is-1}$, and so on. From the fact that the within-group estimator is efficient when $\alpha_i$ is allowed to be correlated with $x_{it}$ in the usual linear panel data model (see Im, Ahn, Schmidt, and Wooldridge, 1999), we propose the following exactly identifying moment conditions:

$$E\sum_{t=2}^{T}\Delta z_{it-1}\left[(2\Delta z_{it} + \Delta z_{it-1}) - \rho\,\Delta z_{it-1}\right] = 0, \tag{12}$$

$$E\sum_{t=1}^{T}(\tilde x_{it} - \rho\,\tilde x_{it-1})\left[(\tilde y_{it} - \rho\,\tilde y_{it-1}) - (\tilde x_{it} - \rho\,\tilde x_{it-1})'\beta\right] = 0, \tag{13}$$

where $\Delta z_{it} = \Delta y_{it} - \Delta x_{it}'\beta$. Note that when there is no exogenous variable, the first moment condition (12) leads to the FDLS estimator of Section 2. We can estimate ρ and β by the method of moments. In practice, after (12) and (13) are rewritten in terms of the sample moment conditions, the parameters can be estimated by estimating ρ and β iteratively between (12) and (13) if the procedure converges.
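The iteration between (12) and (13) can be sketched as follows for a scalar regressor. This is an illustrative implementation under assumptions that are not in the text: the simulated design (a regressor correlated with the fixed effect, a stationary start for $u_{it}$), the starting value ρ = 0, the stopping rule, and all function names are hypothetical.

```python
import numpy as np


def simulate_panel(n, T, rho, beta, rng):
    """y_it = alpha_i + beta * x_it + u_it with AR(1) errors, t = 0, ..., T."""
    alpha = rng.normal(size=(n, 1))
    x = alpha + rng.normal(size=(n, T + 1))      # regressor correlated with alpha_i
    eps = rng.normal(size=(n, T + 1))
    u = np.empty((n, T + 1))
    u[:, 0] = eps[:, 0] / np.sqrt(1.0 - rho**2)  # stationary start, needs |rho| < 1
    for t in range(1, T + 1):
        u[:, t] = rho * u[:, t - 1] + eps[:, t]
    return alpha + beta * x + u, x


def fdls(z):
    """Sample version of (12): pooled LS of 2*dz_t + dz_{t-1} on dz_{t-1}."""
    dz = np.diff(z, axis=1)
    lag, cur = dz[:, :-1], dz[:, 1:]
    return float(np.sum(lag * (2.0 * cur + lag)) / np.sum(lag**2))


def within_beta(y, x, rho):
    """Sample version of (13): within-group LS on quasi-differenced data."""
    ys = y[:, 1:] - rho * y[:, :-1]
    xs = x[:, 1:] - rho * x[:, :-1]
    ys = ys - ys.mean(axis=1, keepdims=True)
    xs = xs - xs.mean(axis=1, keepdims=True)
    return float(np.sum(xs * ys) / np.sum(xs**2))


def iterate(y, x, tol=1e-10, max_iter=200):
    """Alternate between (12) and (13) until the estimates stop changing."""
    rho, beta = 0.0, within_beta(y, x, 0.0)
    for _ in range(max_iter):
        rho_new = fdls(y - beta * x)
        beta_new = within_beta(y, x, rho_new)
        done = abs(rho_new - rho) + abs(beta_new - beta) < tol
        rho, beta = rho_new, beta_new
        if done:
            break
    return rho, beta


if __name__ == "__main__":
    y, x = simulate_panel(2000, 8, 0.5, 1.0, np.random.default_rng(0))
    print(iterate(y, x))
```

The sketch does not guarantee convergence; as the text notes, the iterative scheme is usable only if the procedure converges.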


For the asymptotic variance, we can use the usual “$(D'\Omega^{-1}D)^{-1}$” formula, where D contains the expected scores of (12) and (13) and Ω is the variance matrix of those moment conditions, both evaluated at the true parameter. Conveniently, D is block diagonal, and if $\varepsilon_{it}$ has zero third moment, then so is the Ω matrix, separating the estimation of ρ from the estimation of β. (See Appendix B for further details.) As a result, we can treat the final estimator $\hat\beta$ as the true β parameter in computing the $\Delta z_{it}$'s and then estimate ρ and compute its standard error; similarly, we can treat the final $\hat\rho$ as the true ρ parameter and compute the standard errors of $\hat\beta$ using the usual within-group estimation technique after transforming $x_{it}$ and $y_{it}$ to $x_{it} - \hat\rho x_{it-1}$ and $y_{it} - \hat\rho y_{it-1}$, respectively.

4. INCIDENTAL TRENDS

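When the model includes incidental trends so that $y_{it} = \alpha_i + \gamma_i t + u_{it}$, $u_{it} = \rho u_{it-1} + \varepsilon_{it}$, it may be written in the form

$$y_{it} = (1-\rho)\alpha_i + \rho\gamma_i + (1-\rho)\gamma_i t + \rho y_{it-1} + \varepsilon_{it}. \tag{14}$$

Double differencing eliminates the combined fixed effects, giving

$$\Delta^2 y_{it} = \rho\,\Delta^2 y_{it-1} + \Delta^2\varepsilon_{it}, \tag{15}$$

which implies that

$$\Delta^2 y_{it} = \sum_{j=0}^{\infty}\rho^j\Delta^2\varepsilon_{it-j} = \varepsilon_{it} - (2-\rho)\varepsilon_{it-1} + (1-\rho)^2\sum_{j=0}^{\infty}\rho^j\varepsilon_{it-j-2}.$$

When ρ = 1, we have $\Delta^2 y_{it} = \Delta\varepsilon_{it}$, so $E(\Delta^2 y_{it-1})^2 = 2\sigma^2$ and $E\,\Delta^2 y_{it-1}\Delta^2\varepsilon_{it} = -3\sigma^2$. When |ρ| < 1, we have

$$E(\Delta^2 y_{it-1})^2 = \left[1 + (2-\rho)^2 + \frac{(1-\rho)^4}{1-\rho^2}\right]\sigma^2 = \frac{2(3-\rho)\sigma^2}{1+\rho}$$

and

$$E\,\Delta^2 y_{it-1}\Delta^2\varepsilon_{it} = -(4-\rho)\sigma^2,$$

so these formulas cover the case of ρ = 1. Thus,

$$E\,\Delta^2 y_{it-1}\eta_{it} = 0, \qquad \eta_{it} = 2\Delta^2\varepsilon_{it} + \frac{(1+\rho)(4-\rho)}{3-\rho}\,\Delta^2 y_{it-1}.$$

This corresponds to transforming (15) to the model in differences

$$2\Delta^2 y_{it} + \Delta^2 y_{it-1} = \theta\,\Delta^2 y_{it-1} + \eta_{it}, \qquad \theta = -\frac{(1-\rho)^2}{3-\rho}. \tag{16}$$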


Correspondingly, the double difference least squares (DDLS) estimator $\hat\theta$ is

$$\hat\theta = \frac{\sum_{i=1}^{n}\sum_{t=3}^{T}\Delta^2 y_{it-1}(2\Delta^2 y_{it} + \Delta^2 y_{it-1})}{\sum_{i=1}^{n}\sum_{t=3}^{T}(\Delta^2 y_{it-1})^2}.$$

Note that θ ∈ (−1,0] for all ρ ∈ (−1,1] and θ = 0 if ρ = 1. When point estimation of ρ is of interest, we can simply run pooled OLS on (16) to get $\hat\theta$ and censor at 0 and −1, and then recover $\hat\rho$ from $\hat\theta$ by $\hat\rho = \tfrac{1}{2}[2 + \hat\theta - (\hat\theta^2 - 8\hat\theta)^{1/2}]$. Alternatively, we may also consider method of moments estimation based on (16) using the moment condition

$$E\sum_{t=3}^{T}\Delta^2 y_{it-1}\left(2\Delta^2 y_{it} + \left[1 + \frac{(1-\rho)^2}{3-\rho}\right]\Delta^2 y_{it-1}\right) = 0.$$

Some caution is needed here because the parameter $\theta = -(1-\rho)^2/(3-\rho)$ can never exceed zero for all ρ ∈ (−1,1], and therefore the sample moment function may not attain zero at any parameter value.
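The mapping between ρ and θ, together with the censoring step, can be sketched in a few lines. The function names are hypothetical; the formulas are those stated with (16) and in the recovery rule for $\hat\rho$.

```python
import numpy as np


def theta_from_rho(rho):
    """theta = -(1 - rho)^2 / (3 - rho), as in (16)."""
    return -(1.0 - rho) ** 2 / (3.0 - rho)


def rho_from_theta(theta_hat):
    """Censor theta_hat to [-1, 0], then invert the map on rho in [-1, 1]."""
    theta = min(0.0, max(-1.0, theta_hat))
    return 0.5 * (2.0 + theta - np.sqrt(theta**2 - 8.0 * theta))


if __name__ == "__main__":
    for rho in (-1.0, 0.0, 0.5, 0.9, 1.0):
        theta = theta_from_rho(rho)
        print(f"rho={rho:5.2f} -> theta={theta:8.5f} -> rho={rho_from_theta(theta):5.2f}")
```

A positive $\hat\theta$ is mapped to $\hat\rho = 1$ and a value below −1 to $\hat\rho = -1$, which is the effect of the censoring described above.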

The asymptotic distribution of the DDLS estimator for θ is given as follows.

THEOREM 2. For each T, $\sqrt{nT_2}(\hat\theta - \theta) \Rightarrow N(0, W_{ols,T})$ for some positive $W_{ols,T}$. Furthermore, $W_{ols,T} \to W_{ols}$ for some $W_{ols} > 0$ as T → ∞. Whether or not n → ∞, $\sqrt{nT_2}(\hat\theta - \theta) \Rightarrow N(0, W_{ols})$.

The variance $W_{ols,T}$ is presented in Appendix B for the special cases of ρ = 0 and ρ = 1 and is not given in general form because of the complexity. An expression for $W_{ols}$ is given in Theorem 4 of Phillips and Han (2008).

For testing, we can rely on this asymptotic distribution. Just as for the case without incidental trends, the limit theory is continuous and joint as n → ∞ or T → ∞ or both pass to infinity with $W_{ols,T}$ evolving with T. However, as is apparent from the relation $\theta = -(1-\rho)^2/(3-\rho)$, an $O(n^{-1/4}T^{-1/4})$ neighborhood of ρ = 1 corresponds to an $O(n^{-1/2}T^{-1/2})$ neighborhood of θ = 0, and for ρ = 1, it is easily seen that the rate of convergence of $\hat\rho$ is at the slower $n^{1/4}T_2^{1/4}$ rate, while that of $\hat\theta$ is $n^{1/2}T_2^{1/2}$. For ρ < 1, the rates of convergence of $\hat\rho$ and $\hat\theta$ are both $n^{1/2}T_2^{1/2}$. Hence, there is a deficiency in the convergence rate for $\hat\rho$ around unity.

When T is small, $W_{ols,T}$ depends on $E\varepsilon_{it}^4$, and the algebraic form of $W_{ols,T}$ for small T is too complicated to be of interest. But when n is large, as in typical microeconometric projects, the standard error of $\hat\theta$ is easily calculated by

$$se(\hat\theta) = \frac{\sqrt{\sum_{i=1}^{n}\left(\sum_{t=3}^{T}\Delta^2 y_{it-1}\hat\eta_{it}\right)^2}}{\sum_{i=1}^{n}\sum_{t=3}^{T}(\Delta^2 y_{it-1})^2}, \tag{17}$$

with $\hat\eta_{it}$ denoting the residuals from the regression of (16). On the other hand, in macroeconometrics, where T is often moderately large, then $\sqrt{nT_2}(\hat\theta - \theta) \Rightarrow$


$N(0, \lim_{T\to\infty}W_{ols,T})$, where an expression for $\lim_{T\to\infty}W_{ols,T}$ is given in Theorem 4 of Phillips and Han (2008). As presented in Theorems A.1 and A.2 in Appendix A, the convergence is uniform in the sense that the limit of the variance is the variance of the limit distribution as T → ∞. Note that when ρ < 1, the asymptotic variance of $\hat\rho$ (indirectly obtained from $\hat\theta$) may be obtained using the delta method.

As a special case, if ρ = 1, then

$$\sqrt{nT_2}\,\hat\theta \Rightarrow N\!\left(0,\ 2 + \frac{\kappa_4}{2T_2}\right) \tag{18}$$

for all T as n → ∞, where $\kappa_4 = \mathrm{var}(\varepsilon_{it}^2)/\sigma^4$ (this variance is calculated in Appendix B), and the limit distribution as T → ∞ is simply N(0,2) whether n is large or small.
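The DDLS estimator and the standard error (17) are simple to compute from a panel stored as an array with one row per unit. The sketch below is illustrative only: the simulated design (Gaussian shocks, ρ = 1, Gaussian intercepts and trend slopes) and the function names are assumptions. For Gaussian shocks $\kappa_4 = 2$, so (18) suggests a standard error near $\sqrt{(2 + 1/T_2)/(nT_2)}$.

```python
import numpy as np


def ddls(y):
    """DDLS estimate of theta in (16) and the standard error (17).

    y has shape (n, T + 1): observations t = 0, ..., T for each unit.
    """
    d2 = np.diff(y, n=2, axis=1)               # double differences
    lag, cur = d2[:, :-1], d2[:, 1:]           # terms for t = 3, ..., T
    denom = np.sum(lag**2)
    theta_hat = float(np.sum(lag * (2.0 * cur + lag)) / denom)
    resid = 2.0 * cur + lag - theta_hat * lag  # residuals from regression (16)
    se = float(np.sqrt(np.sum(np.sum(lag * resid, axis=1) ** 2)) / denom)
    return theta_hat, se


def simulate_unit_root_with_trends(n, T, rng):
    """Unit root panel (rho = 1) with incidental intercepts and trends."""
    alpha = rng.normal(size=(n, 1))
    gamma = rng.normal(size=(n, 1))
    t = np.arange(T + 1)
    u = np.cumsum(rng.normal(size=(n, T + 1)), axis=1)
    return alpha + gamma * t + u, u


if __name__ == "__main__":
    n, T = 500, 10
    y, _ = simulate_unit_root_with_trends(n, T, np.random.default_rng(1))
    theta_hat, se = ddls(y)
    T2 = T - 2
    print("theta_hat:", theta_hat, "se (17):", se)
    print("se implied by (18) with kappa_4 = 2:", np.sqrt((2.0 + 1.0 / T2) / (n * T2)))
```

Because double differencing removes $\alpha_i + \gamma_i t$, the same estimate is obtained whether `ddls` is applied to the observed panel or to the latent errors.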

5. PANEL UNIT ROOT TESTING

5.1. Fixed Effects Model

The inferential apparatus and its limit theory may be applied directly to panel unit root testing. Consider the fixed effects panel $y_{it} = (1-\rho_i)\alpha_i + \rho_i y_{it-1} + \varepsilon_{it}$ where the $\varepsilon_{it}$ are i.i.d. The unit root null hypothesis is that $\rho_i = 1$ for all i. The OLS estimator $\hat\rho_{ols}$ and its limit theory in Theorem 1 and Corollary 1 can form the basis of a statistical test. More precisely, the test statistic is $\tau_0 := (nT_1)^{1/2}(\hat\rho_{ols} - 1)/2$. This test statistic is derived under the assumption of cross-sectional homoskedasticity. If the variances $\sigma_i^2 := E\varepsilon_{it}^2$ differ across i and $\rho_i \equiv 1$, then the standard deviation of the limit distribution of $\hat\rho_{ols}$ is approximated by

$$\frac{2\left(\sum_{i=1}^{n}\sigma_i^4\right)^{1/2}}{T_1^{1/2}\sum_{i=1}^{n}\sigma_i^2} = \frac{2}{(nT_1)^{1/2}}\times\frac{\left(n^{-1}\sum_{i=1}^{n}\sigma_i^4\right)^{1/2}}{n^{-1}\sum_{i=1}^{n}\sigma_i^2}, \tag{19}$$

which is larger than the simple standard error form $2/(nT_1)^{1/2}$ obtained for homoskedastic data. Under heteroskedasticity, the $\tau_0$ statistic can be computed by using formula (6) for the standard error, or by replacing $\sigma_i^2$ in (19) with the estimate $\hat\sigma_i^2 = T^{-1}\sum_{t=1}^{T}(\Delta y_{it})^2$. Under the null hypothesis, $\tau_0 \Rightarrow N(0,1)$, and under the alternative hypothesis that $\rho_i < 1$ for some i (where the number of such i is a nonnegligible fraction of n), $\operatorname{plim}\hat\rho_{ols} < 1$, and as a result, $\tau_0 \to_p -\infty$ as nT → ∞. So, the test is consistent for any passage of nT → ∞.
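A minimal sketch of the homoskedastic $\tau_0$ statistic and of the scale factor in (19) follows. The simulated design and all function names are assumptions made for illustration; the panel is stored with one row per unit and columns t = 0, ..., T, so that the sums over t = 2, ..., T have $T_1 = T - 1$ terms.

```python
import numpy as np


def tau0(y):
    """FDLS estimate of rho and tau_0 = sqrt(n * T_1) * (rho_hat - 1) / 2."""
    dy = np.diff(y, axis=1)
    lag, cur = dy[:, :-1], dy[:, 1:]
    rho_hat = float(np.sum(lag * (2.0 * cur + lag)) / np.sum(lag**2))
    n, T1 = lag.shape
    return rho_hat, float(np.sqrt(n * T1) * (rho_hat - 1.0) / 2.0)


def hetero_scale(sigma2):
    """Second factor in (19): sqrt(mean(sigma_i^4)) / mean(sigma_i^2), at least 1."""
    sigma2 = np.asarray(sigma2, dtype=float)
    return float(np.sqrt(np.mean(sigma2**2)) / np.mean(sigma2))


def simulate_ar1_panel(n, T, rho, rng):
    """y_it = alpha_i + u_it with homoskedastic AR(1) errors, t = 0, ..., T."""
    alpha = rng.normal(size=(n, 1))
    eps = rng.normal(size=(n, T + 1))
    u = np.empty((n, T + 1))
    # Stationary start when |rho| < 1; otherwise start from the first shock.
    u[:, 0] = eps[:, 0] / np.sqrt(1.0 - rho**2) if abs(rho) < 1 else eps[:, 0]
    for t in range(1, T + 1):
        u[:, t] = rho * u[:, t - 1] + eps[:, t]
    return alpha + u


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    print("null:       ", tau0(simulate_ar1_panel(500, 6, 1.0, rng)))
    print("alternative:", tau0(simulate_ar1_panel(500, 6, 0.5, rng)))
    print("scale factor for variances (0.25, 1, 2.25):", hetero_scale([0.25, 1.0, 2.25]))
```

The null is rejected when $\tau_0$ falls below the left-tail standard normal critical value; under heteroskedasticity the statistic would be rescaled as described around (19).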

Unlike Levin and Lin (1992) or Im, Pesaran, and Shin (1997), this test does not require any restriction on the path to infinity such as T → ∞ and n/T → 0. Only the composite divergence nT → ∞ is required, and there is virtually no size distortion when T is small. On the other hand, while the point optimal test for a unit root has local power in a neighborhood of unity that shrinks at the rate $n^{-1/2}T^{-1}$ (see Moon and Perron, 2004; Moon, Perron, and Phillips, 2006b), the $\tau_0$ test has only trivial asymptotic power in $n^{-1/2}T^{-1}$ neighborhoods and nontrivial local asymptotic power in $n^{-1/2}T^{-1/2}$ neighborhoods of unity. So in


this model at least, there is an infinite power deficiency in neighborhoods of order $n^{-1/2}T^{-1}$.

This deficiency is partly due to the fact that $\hat\rho_{ols}$ depends only on differenced data, which thereby reduces the signal relative to that of point optimal and other tests that make direct use of the nonstationary regressor. But that is not the only reason, and, interestingly, a common point optimal test based on the differenced data may attain the optimal rate of $n^{1/2}T$. To illustrate this possibility, consider the common point likelihood ratio test for the case where the $\varepsilon_{it}$ are i.i.d. $N(0,\sigma^2)$ and $\sigma^2$ is known. The test for $H_0: n^{1/2}T(1-\rho) = 0$ against the alternative $H_1: n^{1/2}T(1-\rho) = c$ using the differenced data $\Delta y_{it}$ is based on the likelihood ratio

$$U_{nT}(c) = 2\left[\log L\!\left(1 - n^{-1/2}T^{-1}c\right) - \log L(1)\right], \tag{20}$$

where log L(·) is calculated based on the joint distribution of $\Delta y_i = (\Delta y_{i1},\ldots,\Delta y_{iT})'$, i.e.,

$$\log L(\rho) = -\frac{nT}{2}\log(2\pi) - \frac{nT}{2}\log\sigma^2 - \frac{n}{2}\log|\Omega_T(\rho)| - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\Delta y_i'\,\Omega_T(\rho)^{-1}\Delta y_i,$$

with

$$\Omega_T(\rho) = (1+\rho)^{-1}\,\mathrm{Toeplitz}\left\{2,\ -(1-\rho),\ -\rho(1-\rho),\ \ldots,\ -\rho^{T_2}(1-\rho)\right\}. \tag{21}$$

In the above, the notation Toeplitz$\{a_0,\ldots,a_{m-1}\}$ denotes the m × m symmetric Toeplitz matrix whose (i, j) element is $a_{|i-j|}$. As shown in the Appendix, when $n^{1/2}T(1-\rho) = c$ (i.e., under the alternative hypothesis), we have

$$U_{nT}(c) \to_d N(c^2/8,\ c^2/2). \tag{22}$$
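The Toeplitz matrix in (21) can be built and checked numerically. The sketch below constructs it from its first row and compares it, for |ρ| < 1, with the covariance matrix obtained by first-differencing a stationary AR(1) process with unit error variance. The function names are hypothetical, and the comparison is a consistency check under the stationarity assumption rather than part of the original derivation.

```python
import numpy as np


def omega(rho, T):
    """T x T matrix of (21): (1+rho)^{-1} Toeplitz{2, -(1-rho), ..., -rho^{T-2}(1-rho)}."""
    first_row = np.empty(T)
    first_row[0] = 2.0
    first_row[1:] = -(1.0 - rho) * rho ** np.arange(T - 1)
    first_row /= 1.0 + rho
    idx = np.arange(T)
    return first_row[np.abs(idx[:, None] - idx[None, :])]


def differenced_ar1_cov(rho, T):
    """Covariance of (dy_1, ..., dy_T) for a stationary AR(1) with sigma^2 = 1."""
    idx = np.arange(T + 1)
    levels = rho ** np.abs(idx[:, None] - idx[None, :]) / (1.0 - rho**2)
    D = np.diff(np.eye(T + 1), axis=0)  # first-difference operator, shape (T, T + 1)
    return D @ levels @ D.T


if __name__ == "__main__":
    print(np.round(omega(0.5, 4), 4))
    print(np.allclose(omega(0.5, 4), differenced_ar1_cov(0.5, 4)))
```

At ρ = 1 the off-diagonal entries vanish and the matrix reduces to the identity, matching $\Delta y_{it} = \varepsilon_{it}$.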

So if the null hypothesis of $n^{1/2}T(1-\rho) = 0$ is tested against the alternative that $n^{1/2}T(1-\rho) = c$ such that the null hypothesis is rejected for $U_{nT}(c) \ge z_\alpha c/\sqrt{2}$ where $\Phi(-z_\alpha) = \alpha$, and if the c happens to equal the true c, then the local power of this test is

$$\Phi\!\left(\frac{c}{4\sqrt{2}} - z_\alpha\right).$$

For all c > 0, this local power function resides below the power envelope $\Phi(c/\sqrt{2} - z_\alpha)$ based on the level data for large T obtained by Moon, Perron, and Phillips (2006b). Nonetheless, it is remarkable that the optimal rate can be attained by using differenced data.
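The two local power functions just quoted can be compared numerically with the standard library alone. The function names and the default level α = 0.05 are assumptions made for illustration.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def local_power_differenced(c, alpha=0.05):
    """Phi(c / (4 * sqrt(2)) - z_alpha): local power of the test on differenced data."""
    z = _STD_NORMAL.inv_cdf(1.0 - alpha)
    return _STD_NORMAL.cdf(c / (4.0 * 2.0**0.5) - z)


def power_envelope_levels(c, alpha=0.05):
    """Phi(c / sqrt(2) - z_alpha): large-T power envelope based on level data."""
    z = _STD_NORMAL.inv_cdf(1.0 - alpha)
    return _STD_NORMAL.cdf(c / 2.0**0.5 - z)


if __name__ == "__main__":
    for c in (0.0, 1.0, 2.0, 5.0, 10.0):
        print(f"c={c:5.1f}  differenced={local_power_differenced(c):.3f}  "
              f"envelope={power_envelope_levels(c):.3f}")
```

Both functions start at the nominal level at c = 0, and the differenced-data curve stays below the envelope for every c > 0.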

As discussed so far, the deficiency of testing using $\hat\rho_{ols}$ comes partly from differencing and partly from inefficient use of the moment conditions. This fact may at first seem to contradict our earlier observation that $\hat\rho_{ols}$ is almost as good as the (infeasible) optimal GMM estimator based on the differenced data (e.g.,


Figure 2). Nonetheless, it seems that maximum likelihood estimation on the differenced model combines the moment conditions so cleverly that at ρ = 1 (at which point the levels-based MLE is superconsistent), the otherwise useless moment conditions contribute to estimating ρ = 1. (See the last part of Section 2.3.) A full analysis and comparison of panel MLE in levels and differences that explores this issue will be useful and interesting, and deserves a separate research paper.

To return to testing based on the FDLS estimator, the deficiency in power based on this procedure may be interpreted as a cost arising from the simplicity of the $\tau_0$ test, the uniform convergence rate of the estimator, and its robustness to the asymptotic expansion path for (n,T). Table 4 reports the simulated size and power of the $\tau_0$ test, in comparison to Im et al.’s (1997) test, for the data generating process $y_{it} = (1-\rho_i)\alpha_i + \rho_i y_{it-1} + \sigma_i\varepsilon_{it}$ with alternative parameter settings, where $\alpha_i$ and $\varepsilon_{it}$ are standard normal and $\sigma_i \sim U(0.5,1.5)$. To simulate power, the cases $\rho_i \equiv 0.9$ and $\rho_i \sim U(0.9,1)$ are considered. Panel length is chosen to be T = 6 and T = 25, choices that roughly illustrate size and power for small and moderate T. (T = 6 is the smallest value covered by Table 1 of Im et al. (2003).) Note that the $\tau_0$ test does not require bias adjustment. It is also remarkable that the $\tau_0$ test seems to have better power than the IPS test when T is small. But with larger T (T = 25 in the simulation), the IPS test has better power, which is related to the $O(\sqrt{nT})$ convergence rate of the FDLS estimator.

For panels with small T, Bond et al. (2005) report size and power for various tests such as OLS in levels, Breitung and Meyer’s (1994) test, Harris and Tzavalis’s (1999) test based on LSDV, a test based on Blundell and Bond’s (1998) system GMM, a test using Hsiao et al.’s (2002) MLE in differences, the test based on FDLS (labeled “FD” in their paper), and others. See their paper for more on the comparison. Note that FDLS, system GMM, and MLE in differences are consistent both under the null and the alternative hypotheses, while others are based on estimators consistent only under the null hypothesis.

5.2. Incidental Trends Model

Next consider the case where incidental trends are present, as laid out in Section 4. Let $\hat\theta$ be the pooled OLS estimator from the regression of (16). Noting that ρ = 1 corresponds to θ = 0, we can base the panel unit root test on the statistic $\tau_1 := \hat\theta/se(\hat\theta) \Rightarrow N(0,1)$, where $se(\hat\theta)$ is given in (17) when n is large or $se(\hat\theta) = \sqrt{2/(nT_2)}$ when T is large. (Again note that (17) is robust to the presence of cross-sectional heteroskedasticity.) The null hypothesis $H_0: \rho_i = 1$ for all i is rejected if $\tau_1$ is less than the left-tailed critical value from the standard normal distribution. When some $\rho_i$ are smaller than unity (again with the number of such i comparable to n), or equivalently when some $\theta_i$ are smaller than 0, $\hat\theta$ converges in probability to the limit of $\sum_{i=1}^{n}w_i\theta_i$, where $w_i = \sum_{t=3}^{T}(\Delta^2 y_{it-1})^2/\sum_{i=1}^{n}\sum_{t=3}^{T}(\Delta^2 y_{it-1})^2$. As long as a nonnegligible portion of individual units i have $\rho_i < 1$, this limit is strictly negative, and hence the test


TABLE 4. Simulated size and power of unit root tests with incidental intercepts case. DGP: $y_{it} = \alpha_i + u_{it}$, $u_{it} = \rho_i u_{it-1} + \sigma_i\varepsilon_{it}$, $\alpha_i, \varepsilon_{it} \sim$ i.i.d. N(0,1), $\sigma_i \sim$ i.i.d. U(0.5,1.5). Significance level: 5%

T = 6

          ρi ≡ 1          ρi ≡ 0.9        ρi ∼ U(0.9,1)
  n      HP     IPS      HP     IPS      HP     IPS
  50    6.09   5.83    20.13  10.76    12.13   7.95
 100    6.00   5.47    28.37  14.77    13.37   9.20
 200    5.30   5.56    42.88  22.05    18.49  11.52
 400    5.54   5.97    66.30  35.02    26.77  15.63

T = 25

          ρi ≡ 1          ρi ≡ 0.9        ρi ∼ U(0.9,1)
  n      HP     IPS      HP     IPS      HP     IPS
  50    6.17   5.14    49.56  84.26    22.66  33.61
 100    5.89   5.34    72.64  98.63    30.01  54.62
 200    5.05   5.31    93.00  99.99    47.09  81.03
 400    4.91   5.70    99.73  100.0    71.05  97.83

statistic $\tau_1$ diverges to −∞ in probability. Thus, the test $\tau_1$ is consistent regardless of the existence of incidental trends.

Local power of the test is relatively weak, which can be explained by the definition of θ in (16). Since $\hat\theta$ has an $(nT_2)^{1/2}$ rate of convergence, the test should have some local power when θ is in an $O(n^{-1/2}T_2^{-1/2})$ neighborhood of zero. But that is the case when ρ is in an $O(n^{-1/4}T_2^{-1/4})$ neighborhood of unity, which shrinks at a far slower rate than the optimal $n^{-1/4}T^{-1}$ rate attained by a point optimal test when the incidental trends are extracted from the panel data as described in Moon, Perron, and Phillips (2006b).
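The correspondence between the two neighborhoods can be made explicit with a short expansion of the map in (16). This is only a sketch: the constant c > 0 and the shorthand $N = nT_2$ are notation introduced here for illustration.

```latex
\rho = 1 - c\,N^{-1/4}
\quad\Longrightarrow\quad
\theta = -\frac{(1-\rho)^2}{3-\rho}
       = -\frac{c^2 N^{-1/2}}{2 + c\,N^{-1/4}}
       = -\frac{c^2}{2}\,N^{-1/2}\,\bigl(1 + o(1)\bigr),
\qquad N = nT_2 \to \infty .
```

A departure of order $N^{-1/4}$ in ρ therefore moves θ by only order $N^{-1/2}$, which is exactly the scale at which $\hat\theta$ can detect it.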

Interestingly there may exist a unit root test based on the double-differenced data that has local power in a neighborhood of unity shrinking at the $n^{-1/2}T_1^{-1/2}$ rate when $T_1/\sqrt{n} \to 0$ and at a correspondingly faster rate when T grows faster. (An earlier version of this paper, Han and Phillips, 2007, contains an analysis regarding this point.) Compared to large-T point optimal tests (such as Ploberger and Phillips, 2002) which have nontrivial power under the $O(n^{-1/4}T^{-1})$ local alternatives, this $\sqrt{nT}$ rate would imply the possible existence of more efficient estimators using double-differenced data for panel data with large N and small T. This interesting issue presents a major challenge for future research.


6. CONCLUSION

This paper develops a simple GMM estimator for dynamic panel data models, which is largely free from bias as the AR coefficient approaches unity, and which yields standard Gaussian asymptotics for all values of ρ and without any discontinuity at unity. The limit theory is also robust in the sense that it performs well under all possible passages to infinity, including n → ∞, T → ∞, and all diagonal paths. The method also extends in a straightforward manner to cases with exogenous variables, cross-section dependence, and incidental trends.

The approach leads to standard Gaussian panel unit root tests. These tests do not suffer from size distortion regardless of the n/T ratio. Illustration of power properties of some infeasible likelihood ratio tests indicates that the optimal convergence rate can perhaps be achieved using (double) differenced data while at the same time preserving standard Gaussian limits. Such tests can be expected to be particularly useful when n is large and T is small or moderate, and to outperform existing point optimal tests and exceed the usual power envelope for such sample size configurations. Extension of the present line of research in this direction is a major challenge and is left for future work.

NOTE

1. These moment conditions are also found in Kruiniger (2007). Our approach is different from his in that we use only globally strong (strong at all ρ values) moment conditions and combine them to form a least squares estimator, which yields simple asymptotics. Bond, Nauges, and Windmeijer (2005) consider this least squares estimator for micro-panel unit root testing.

REFERENCES

Ahn, S.C. & P. Schmidt (1995) Efficient estimation of models for dynamic panel data. Journal of Econometrics 68, 5–27.

Alvarez, J. & M. Arellano (2003) The time series and cross-section asymptotics of dynamic panel data estimators. Econometrica 71(4), 1121–1159.

Anderson, T.W. & C. Hsiao (1981) Estimation of dynamic models with error components. Journal of the American Statistical Association 76, 598–606.

Arellano, M. & S. Bond (1991) Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations. Review of Economic Studies 58, 277–297.

Arellano, M. & O. Bover (1995) Another look at the instrumental variable estimation of error-component models. Journal of Econometrics 68, 29–51.

Blundell, R. & S. Bond (1998) Initial conditions and moment restrictions in dynamic panel data models. Journal of Econometrics 87, 115–143.

Bond, S., C. Nauges, & F. Windmeijer (2005) Unit Roots: Identification and Testing in Micro Panels. IFS Working paper CWP07/05.

Hahn, J. & G. Kuersteiner (2002) Asymptotically unbiased inference for a dynamic panel model with fixed effects when both N and T are large. Econometrica 70(4), 1639–1657.

Han, C. (2007) Determinants of covariance matrices of differenced AR(1) processes. Econometric Theory 23, 1248–1253.

Han, C. & P.C.B. Phillips (2006) GMM with many moment conditions. Econometrica 74, 147–192.

Han, C. & P.C.B. Phillips (2007) GMM Estimation for Dynamic Panels with Fixed Effects and Strong Instruments at Unity. Mimeo, University of Auckland.


Hayakawa, K. (2007) Small sample bias properties of the system GMM estimator in dynamic panel data models. Economics Letters 95, 32–78.

Hsiao, C., M.H. Pesaran, & A.K. Tahmiscioglu (2002) Maximum likelihood estimation of fixed effects dynamic panel data models covering short time periods. Journal of Econometrics 109, 107–150.

Im, K.S., S.C. Ahn, P. Schmidt, & J. Wooldridge (1999) Efficient estimation of panel data models with strictly exogenous explanatory variables: A Monte Carlo study. Journal of Econometrics 93(1), 177–201.

Im, K.S., M.H. Pesaran, & Y. Shin (1997) Testing for Unit Roots in Heterogeneous Panels. Working paper, University of Cambridge.

Kallenberg, O. (2002) Foundations of Modern Probability, 2nd ed. Springer.

Kruiniger, H. (2007) An efficient linear GMM estimator for the covariance stationary AR(1)/unit root model for panel data. Econometric Theory 23, 519–535.

Levin, A. & C.F. Lin (1992) Unit Root Test in Panel Data: Asymptotic and Finite Sample Properties. Discussion paper #92-93, University of California at San Diego.

Levin, A., C.F. Lin, & C.S.J. Chu (2002) Unit root tests in panel data: Asymptotic and finite-sample properties. Journal of Econometrics 108, 1–24.

Moon, H.R. & B. Perron (2004) Asymptotic Local Power of Pooled t-Ratio Tests for Unit Roots in Panels with Fixed Effects. Mimeo, University of Southern California.

Moon, H.R., B. Perron, & P.C.B. Phillips (2006a) On the Breitung test for panel unit roots and local asymptotic power. Econometric Theory 22, 1179–1190.

Moon, H.R., B. Perron, & P.C.B. Phillips (2006b) Incidental Trends and the Power of Panel Unit Root Tests. Working paper.

Nickell, S. (1981) Biases in dynamic models with fixed effects. Econometrica 49, 1417–1426.

Phillips, P.C.B. (1989) Partially identified econometric models. Econometric Theory 5, 181–240.

Phillips, P.C.B. & C. Han (2008) Gaussian inference in AR(1) time series with or without unit root. Econometric Theory 24, 631–650.

Phillips, P.C.B. & H.R. Moon (1999) Linear regression limit theory for nonstationary panel data. Econometrica 67(5), 1057–1111.

Phillips, P.C.B. & V. Solo (1992) Asymptotics for linear processes. Annals of Statistics 20(2), 971–1001.

Phillips, P.C.B. & D. Sul (2007) Bias in dynamic panel estimation with fixed effects, incidental trends and cross section dependence. Journal of Econometrics 137, 162–188.

Ploberger, W. & P.C.B. Phillips (2002) Optimal Testing for Unit Roots in Panel Data. Mimeo, Yale University.

Staiger, D. & J.H. Stock (1997) Instrumental variables regression with weak instruments. Econometrica 65(3), 557–586.

Stock, J.H. & J.H. Wright (2000) GMM with weak identification. Econometrica 68(5), 1055–1096.

APPENDIX A: Proofs

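The assumed model in the fixed effects case is $y_{it} = \alpha_i + u_{it}$, $u_{it} = \rho u_{it-1} + \varepsilon_{it}$, where $\varepsilon_{it} \sim$ i.i.d. $(0,\sigma^2)$. Obviously we can express, for all ρ ∈ (−1,1],

$$\Delta y_{it-1} = \sum_{j=0}^{\infty}\rho^j\Delta\varepsilon_{it-1-j} = \varepsilon_{it-1} - (1-\rho)\sum_{j=1}^{\infty}\rho^{j-1}\varepsilon_{it-j-1}, \tag{A.1}$$

and

$$\eta_{it} = 2\Delta\varepsilon_{it} + (1+\rho)\Delta y_{it-1} = 2\varepsilon_{it} - (1-\rho)\varepsilon_{it-1} - (1-\rho^2)\sum_{j=1}^{\infty}\rho^{j-1}\varepsilon_{it-j-1}, \tag{A.2}$$

where $0\cdot\pm\infty = 0$.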


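Proof of Lemma 1. If ρ = 1, then $\Delta y_{it} = \varepsilon_{it}$, and $\eta_{it} = 2\varepsilon_{it}$. So we have $E\,\Delta y_{it-1}\eta_{it} = 2E\varepsilon_{it-1}\varepsilon_{it} = 0$. If |ρ| < 1, then $E\,\Delta y_{it-1}\eta_{it} = -(1-\rho)\sigma^2 + (1-\rho)(1-\rho^2)(1-\rho^2)^{-1}\sigma^2 = 0$. ∎

Next, we present some lemmas that are useful in proving the theorems (Theorem 1 in particular). Let $T_m = \max(T-m, 0)$. Let $X_{it} = \sum_{j=0}^{\infty}c_j\varepsilon_{it-j}$ and $Y_{it} = \sum_{j=0}^{\infty}d_j\varepsilon_{it-j}$ where $\varepsilon_{it} \sim$ i.i.d. $(0,\sigma^2)$. We will frequently assume that

$$\sum_{s=0}^{\infty}(|c_s| + |d_s|) < \infty \quad\text{and} \tag{A.3}$$

$$\sum_{s=0}^{\infty}s\,(c_s^2 + d_s^2) < \infty. \tag{A.4}$$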

THEOREM A.1 If $E\varepsilon_{it}^2 < \infty$, then under (A.4),

$$\frac{1}{nT_m}\sum_{i=1}^{n}\sum_{t=m+1}^{T}X_{it}^2 \to_p \sigma^2\sum_{j=0}^{\infty}c_j^2$$

for any small m (e.g., 2 or 3), as $nT_m \to \infty$.

Of course, the part of condition (A.4) related to $d_s$ is not relevant in this case.

Proof. If T is fixed and n → ∞, then by Khinchine’s law of large numbers,

$$\frac{1}{T_m}\sum_{t=m+1}^{T}\left[\frac{1}{n}\sum_{i=1}^{n}X_{it}^2\right] \to_p \frac{1}{T_m}\sum_{t=m+1}^{T}E X_{it}^2 = E X_{it}^2 = \sigma^2\sum_{j=0}^{\infty}c_j^2,$$

where the first equality on the right hand side holds because of stationarity. If n is fixed and T → ∞, then the result follows from the convergence $T_m^{-1}\sum_{t=m+1}^{T}X_{it}^2 \to_p \sigma^2\sum_{j=0}^{\infty}c_j^2$ for each i under the stated conditions (Thm. 3.7 of Phillips and Solo, 1992) and the cross-sectional i.i.d. assumption. If both n and T increase to infinity, by Theorem 1 of Phillips and Moon (1999), it suffices to show that (a) $\limsup_{n,T} n^{-1}\sum_{i=1}^{n}E Z_{iT}1\{Z_{iT} > n\delta\} = 0$ for all δ > 0 where $Z_{iT} = T_m^{-1}\sum_{t=m+1}^{T}X_{it}^2$ (a notation that is used only in this proof), because all other conditions in Phillips and Moon’s (1999) theorem are obviously satisfied. Because the $Z_{iT}$ are i.i.d. across i, condition (a) is equivalent to $\limsup_{n,T} E Z_{1T}1\{Z_{1T} > n\delta\} = 0$ for all δ > 0, which is implied by the uniform integrability of $Z_{1T}$ (over T). Since $Z_{1T} \to_p \sigma^2\sum_{j=0}^{\infty}c_j^2$, the uniform integrability of $Z_{1T}$ (over T) is equivalent to the convergence $E Z_{1T} \to \sigma^2\sum_{j=0}^{\infty}c_j^2$, which is obviously true because $E Z_{1T} = \sigma^2\sum_{j=0}^{\infty}c_j^2$ for all T. ∎

Next, we establish a panel CLT for the sample covariance of $X_{it}$ and $Y_{it}$. We assume that $X_{it}$ and $Y_{it}$ are uncorrelated by imposing the condition

$$\sum_{j=0}^{\infty}c_j d_j = 0. \tag{A.5}$$

THEOREM A.2 Let $\Psi_{j,r} = c_j d_{j+r} + c_{j+r}d_j$. If $E\varepsilon_{it}^4 < \infty$, then under (A.3), (A.4), and (A.5), for any fixed T and small m (e.g., 2 or 3),

$$U_{nT} := (nT_m)^{-1/2}\sum_{i=1}^{n}\sum_{t=m+1}^{T}X_{it}Y_{it} \Rightarrow N(0, V_T) \tag{A.6}$$


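as n → ∞, where

$$V_T = A_T\,\mathrm{var}(\varepsilon_{it}^2) + B_T\,\sigma^4,$$

with

$$A_T = \sum_{j=0}^{\infty}c_j^2 d_j^2 + \frac{2}{T_m}\sum_{t=m+2}^{T}\sum_{r=1}^{t-m-1}\sum_{j=0}^{\infty}c_j d_j c_{j+r}d_{j+r}, \tag{A.7}$$

$$B_T = \sum_{j=0}^{\infty}\sum_{r=1}^{\infty}\Psi_{j,r}^2 + \frac{2}{T_m}\sum_{t=m+2}^{T}\sum_{k=1}^{t-m-1}\sum_{j=0}^{\infty}\sum_{r=1}^{\infty}\Psi_{j,r}\Psi_{j+k,r}. \tag{A.8}$$

Furthermore,

$$V_T \to \sigma^4 B = \sigma^4\sum_{r=1}^{\infty}\left(\sum_{j=0}^{\infty}\Psi_{j,r}\right)^2 \tag{A.9}$$

as T → ∞. Whether or not n → ∞, $U_{nT} \Rightarrow N(0, \sigma^4 B)$ as T → ∞.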
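The limit (A.9) can be evaluated numerically for the FDLS case by taking $X_{it} = \Delta y_{it-1}$ and $Y_{it} = \eta_{it}$ with the coefficients read off from (A.1) and (A.2). The sketch below truncates the infinite sums at a hypothetical lag `J` and, as an assumption about how the pieces fit together, divides $\sigma^4 B$ by $(E X_{it}^2)^2$ to obtain the variance of the normalized estimator, which can then be compared with the $N(0, 2(1+\rho))$ asymptotics quoted in Section 2.2.

```python
import numpy as np


def fdls_ma_coefficients(rho, J=400):
    """Coefficients on eps_{it-j}, j = 0..J-1, of X = dy_{it-1} (A.1) and Y = eta_it (A.2)."""
    c = np.zeros(J)
    d = np.zeros(J)
    powers = rho ** np.arange(J - 2)   # rho^(j-1) for j = 1, 2, ...
    c[1] = 1.0
    c[2:] = -(1.0 - rho) * powers
    d[0] = 2.0
    d[1] = -(1.0 - rho)
    d[2:] = -(1.0 - rho**2) * powers
    return c, d


def limit_variance(rho, J=400):
    """Return (sum_j c_j d_j, B / (sum_j c_j^2)^2) using sums truncated at lag J."""
    c, d = fdls_ma_coefficients(rho, J)
    orthogonality = float(c @ d)       # condition (A.5), should be close to zero
    B = 0.0
    for r in range(1, J):
        # sum over j of c_j d_{j+r} + c_{j+r} d_j, squared, as in (A.9)
        B += float(c[:-r] @ d[r:] + c[r:] @ d[:-r]) ** 2
    ex2 = float(c @ c)                 # E X^2 / sigma^2
    return orthogonality, B / ex2**2


if __name__ == "__main__":
    for rho in (-0.5, 0.0, 0.5, 0.9, 1.0):
        orth, v = limit_variance(rho)
        print(f"rho={rho:5.2f}  sum c_j d_j={orth: .2e}  variance={v:.6f}  2(1+rho)={2 * (1 + rho):.6f}")
```

In this truncated calculation the computed variance agrees with 2(1+ρ) to numerical precision, which is a useful sanity check on the coefficient sequences rather than a substitute for the proof.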

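Proof. When T is small, (A.6) follows from the central limit law for i.i.d. variates because fourth moments are finite. The variance $V_T$ is computed as follows. Since

$$X_{it}Y_{it} = \sum_{j=0}^{\infty}c_j d_j\,\varepsilon_{it-j}^2 + \sum_{j=0}^{\infty}\sum_{r=1}^{\infty}\Psi_{j,r}\,\varepsilon_{it-j}\varepsilon_{it-j-r}, \tag{A.10}$$

we have

$$C_0 = \mathrm{var}(X_{it}Y_{it}) = \left(\sum_{j=0}^{\infty}c_j^2 d_j^2\right)\mathrm{var}(\varepsilon_{it}^2) + \left(\sum_{j=0}^{\infty}\sum_{r=1}^{\infty}\Psi_{j,r}^2\right)\sigma^4. \tag{A.11}$$

Also

$$X_{it-k}Y_{it-k} = \sum_{j=0}^{\infty}c_j d_j\,\varepsilon_{it-k-j}^2 + \sum_{j=0}^{\infty}\sum_{r=1}^{\infty}\Psi_{j,r}\,\varepsilon_{it-k-j}\varepsilon_{it-k-j-r}, \tag{A.12}$$

and from (A.10) and (A.12), $C_k := \mathrm{cov}(X_{it}Y_{it}, X_{it-k}Y_{it-k})$ is

$$C_k = \left(\sum_{j=0}^{\infty}c_j d_j c_{j+k}d_{j+k}\right)\mathrm{var}(\varepsilon_{it}^2) + \left(\sum_{j=0}^{\infty}\sum_{r=1}^{\infty}\Psi_{j,r}\Psi_{j+k,r}\right)\sigma^4.$$

Now

$$T_m^{-1}\,\mathrm{var}\left(\sum_{t=m+1}^{T}X_{it}Y_{it}\right) = C_0 + \frac{2}{T_m}\sum_{t=m+2}^{T}\sum_{k=1}^{t-m-1}C_k = A_T\,\mathrm{var}(\varepsilon_{it}^2) + B_T\,\sigma^4, \tag{A.13}$$

as stated. Lemmas A.3 and A.4 below, respectively, show that $A_T \to 0$ and $B_T \to B$ under (A.3) and (A.4), and thus the convergence (A.9) is obvious from (A.13).

Now we prove the central limit theory as T → ∞. If n is fixed, then the result follows from Theorem 6 of Phillips and Han (2008), implied by (A.4) and the finiteness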


of the second moments. For the case where both n and T increase, we will apply the Lindeberg central limit theorem to the row-wise independent array $\{n^{-1/2}W_{Ti}\}$, where $W_{Ti} = T_m^{-1/2}\sum_{t=m+1}^{T}X_{it}Y_{it}$ (notation that is used only in this proof). The Lindeberg condition is

$$\frac{1}{n}\sum_{i=1}^{n}E\,W_{Ti}^2\,1\{W_{Ti}^2 > n\epsilon\} \to 0, \quad\text{for all } \epsilon > 0 \tag{A.14}$$

(e.g., Kallenberg, 2002, Thm. 5.12). Because the random variables are i.i.d. across i, (A.14) reduces to $E\,W_{T1}^2 1\{W_{T1}^2 > n\epsilon\} \to 0$ for all ε > 0 (note that T depends on n), which is again implied by

$$\sup_T E\,W_{T1}^2\,1\{W_{T1}^2 > n\epsilon\} \to 0 \quad\text{for all } \epsilon > 0.$$

This last condition holds if $W_{T1}^2$ (as a sequence indexed by T) is uniformly integrable over T. When a positive random variable converges in distribution, uniform integrability is equivalent to convergence of the means (Kallenberg, 2002, Lem. 4.11). In the case of $W_{T1}^2$, we have $W_{T1} \to_d W_1 \sim N(0, \sigma^4 B)$ by Theorem 6 of Phillips and Han (2008), and by the continuous mapping theorem, $W_{T1}^2 \to_d W_1^2$. But by the first part of the theorem,

$$E\,W_{T1}^2 = A_T\,\mathrm{var}(\varepsilon_{it}^2) + \sigma^4 B_T \to \sigma^4 B = E\,W_1^2,$$

and the joint limit theory follows straightforwardly. ∎

The next two lemmas, respectively, show that $A_T \to 0$ and $B_T \to B$ as T → ∞, as indicated above.

LEMMA A.3. Under (A.4) and (A.5), limT →∞ AT = 0.

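Proof. Let $f_j = c_j d_j$ (notation used only in this proof). For any sequence $\{a_r\}$,

$$\sum_{t=m+2}^{T} \sum_{r=1}^{t-m-1} a_r = \sum_{s=1}^{T_m - 1} (T_m - s)\, a_s, \tag{A.15}$$

and thus we have

$$\begin{aligned}
A_T &= \sum_{j=0}^{\infty} f_j^2 + \frac{2}{T_m} \sum_{j=0}^{\infty} \sum_{s=1}^{T_m - 1} (T_m - s)\, f_j f_{j+s} \\
&= \left[ \sum_{j=0}^{\infty} f_j^2 + 2 \sum_{j=0}^{\infty} \sum_{s=1}^{T_m - 1} f_j f_{j+s} \right] - 2 \left[ \frac{1}{T_m} \sum_{j=0}^{\infty} \sum_{s=1}^{T_m - 1} s\, f_j f_{j+s} \right] \\
&= \left[ \sum_{j=0}^{\infty} f_j^2 + 2 \sum_{j=0}^{\infty} \sum_{s=1}^{\infty} f_j f_{j+s} \right] - 2 \sum_{j=0}^{\infty} \sum_{s=T_m}^{\infty} f_j f_{j+s} - 2 \left[ \frac{1}{T_m} \sum_{j=0}^{\infty} \sum_{s=1}^{T_m - 1} s\, f_j f_{j+s} \right] \\
&= A_{1T} - 2A_{2T} - 2A_{3T}, \text{ say.}
\end{aligned}$$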

Here $A_{1T} = \left(\sum_{j=0}^{\infty} f_j\right)^2 = \left(\sum_{j=0}^{\infty} c_j d_j\right)^2 = 0$ because of (A.5), and therefore it suffices to show that $A_{2T} \to 0$ and $A_{3T} \to 0$ as $T \to \infty$. First,

$$|A_{2T}| \le \sum_{j=0}^{\infty} \sum_{s=T_m}^{\infty} |f_j f_{j+s}| \le \sum_{j=0}^{\infty} |f_j| \sum_{s=T_m}^{\infty} |f_s| \to 0,$$

because

$$\sum_{j=0}^{\infty} |f_j| = \sum_{j=0}^{\infty} |c_j d_j| \le \left( \sum_{j=0}^{\infty} c_j^2 \right)^{1/2} \left( \sum_{j=0}^{\infty} d_j^2 \right)^{1/2} < \infty, \tag{A.16}$$

by (A.4). Next,

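$$|A_{3T}| \le T_m^{-1} \sum_{j=0}^{\infty} \sum_{s=1}^{T_m} s\, |f_j f_{j+s}| = T_m^{-1} \sum_{j=0}^{\infty} |f_j| \sum_{s=1}^{T_m} s\, |f_{j+s}| \le T_m^{-1} \left( \sum_{j=0}^{\infty} |f_j| \right) \sum_{s=0}^{\infty} s\, |f_s|.$$

But $\sum_{j=0}^{\infty} |f_j| < \infty$ by (A.16), and $\sum_{s=0}^{\infty} s |f_s| \le \left(\sum_{s=0}^{\infty} s c_s^2\right)^{1/2} \left(\sum_{s=0}^{\infty} s d_s^2\right)^{1/2} < \infty$ by (A.4). So $A_{3T} = O(T_m^{-1}) \to 0$. ∎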
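The counting identity (A.15) used in this proof can be confirmed numerically. The following is a minimal sketch, assuming $T_m = T - m$ (the convention suggested by the $T_1$, $T_2$ notation in this appendix); the function names are illustrative and do not come from the paper.

```python
import random

def nested_sum(a, T, m):
    """Left side of (A.15): sum over t = m+2..T and r = 1..t-m-1 of a[r]."""
    return sum(a[r] for t in range(m + 2, T + 1) for r in range(1, t - m))

def weighted_sum(a, T, m):
    """Right side of (A.15): sum over s = 1..T_m-1 of (T_m - s) * a[s].

    T_m = T - m is an assumption about the paper's notation.
    """
    Tm = T - m
    return sum((Tm - s) * a[s] for s in range(1, Tm))

def check_a15(T, m, seed=0):
    """Compare both sides for a random integer sequence (exact arithmetic)."""
    rng = random.Random(seed)
    a = [rng.randint(-9, 9) for _ in range(T + 1)]
    return nested_sum(a, T, m) == weighted_sum(a, T, m)
```

The identity holds because each $a_s$ is counted once for every $t$ with $t - m - 1 \ge s$, and there are $T_m - s$ such values of $t$.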

LEMMA A.4. Under (A.3) and (A.4), $\lim_{T \to \infty} B_T = B = \sum_{r=1}^{\infty} \left( \sum_{j=0}^{\infty} \Psi_{j,r} \right)^2 < \infty$.

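Proof. The finiteness of $B$ is proved in Theorem 6 of Phillips and Han (2008). To establish convergence, note that

$$B = \sum_{r=1}^{\infty} \left( \sum_{j=0}^{\infty} \Psi_{j,r} \right)^2 = \sum_{r=1}^{\infty} \sum_{j=0}^{\infty} \Psi_{j,r}^2 + 2 \sum_{r=1}^{\infty} \sum_{j=0}^{\infty} \sum_{k=1}^{\infty} \Psi_{j,r} \Psi_{j+k,r},$$

so we have

$$B_T - B = -2 \left( \sum_{k=1}^{\infty} \sum_{j=0}^{\infty} \sum_{r=1}^{\infty} \Psi_{j,r} \Psi_{j+k,r} - T_m^{-1} \sum_{t=m+2}^{T} \sum_{k=1}^{t-m-1} \sum_{j=0}^{\infty} \sum_{r=1}^{\infty} \Psi_{j,r} \Psi_{j+k,r} \right) = -2 D_T,$$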

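say. Using (A.15) again, we have

$$\begin{aligned}
D_T &= \sum_{k=1}^{\infty} \sum_{j=0}^{\infty} \sum_{r=1}^{\infty} \Psi_{j,r} \Psi_{j+k,r} - \sum_{k=1}^{T_m - 1} \sum_{j=0}^{\infty} \sum_{r=1}^{\infty} \Psi_{j,r} \Psi_{j+k,r} + \frac{1}{T_m} \sum_{k=1}^{T_m - 1} k \sum_{j=0}^{\infty} \sum_{r=1}^{\infty} \Psi_{j,r} \Psi_{j+k,r} \\
&= \sum_{k=T_m}^{\infty} \sum_{j=0}^{\infty} \sum_{r=1}^{\infty} \Psi_{j,r} \Psi_{j+k,r} + \frac{1}{T_m} \sum_{k=1}^{T_m - 1} \sum_{j=0}^{\infty} \sum_{r=1}^{\infty} k\, \Psi_{j,r} \Psi_{j+k,r} = D_{1T} + D_{2T}, \text{ say.}
\end{aligned}$$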

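As shown below, we have

$$\sum_{k=1}^{\infty} \sum_{j=0}^{\infty} \sum_{r=1}^{\infty} \left| \Psi_{j,r} \Psi_{j+k,r} \right| < \infty \quad \text{and} \quad \sum_{k=1}^{\infty} \sum_{j=0}^{\infty} \sum_{r=1}^{\infty} k \left| \Psi_{j,r} \Psi_{j+k,r} \right| < \infty, \tag{A.17}$$

which imply that $D_{1T} \to 0$ and $D_{2T} = O(T_m^{-1}) \to 0$.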

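Now we prove (A.17). For the first part of (A.17), note that $\sum_{k=1}^{\infty} |\Psi_{j+k,r}| \le \sum_{k=1}^{\infty} |\Psi_{k,r}| \le \sum_{j=0}^{\infty} |\Psi_{j,r}|$, so

$$\sum_{k=1}^{\infty} \sum_{j=0}^{\infty} \sum_{r=1}^{\infty} \left| \Psi_{j,r} \Psi_{j+k,r} \right| \le \sum_{r=1}^{\infty} \left( \sum_{j=0}^{\infty} |\Psi_{j,r}| \right)^2. \tag{A.18}$$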


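Now let $a_j = |c_j| + |d_j|$, implying that $|\Psi_{j,r}| \le |c_j d_{j+r}| + |c_{j+r} d_j| \le a_j a_{j+r}$. Then (A.3) and (A.4) imply that

$$\sum_{s=0}^{\infty} a_s < \infty, \quad \sum_{s=0}^{\infty} a_s^2 < \infty, \quad \sum_{s=0}^{\infty} s\, a_s^2 < \infty, \tag{A.19}$$

where the second and third results hold because $a_j^2 \le 2(c_j^2 + d_j^2)$. Now, the right-hand side of (A.18) is bounded by

$$\sum_{r=1}^{\infty} \left( \sum_{j=0}^{\infty} a_j a_{j+r} \right)^2 \le \sum_{r=1}^{\infty} \left( \sum_{j=0}^{\infty} a_j^2 \right) \left( \sum_{j=0}^{\infty} a_{j+r}^2 \right) = \left( \sum_{j=0}^{\infty} a_j^2 \right) \left( \sum_{s=1}^{\infty} s\, a_s^2 \right) < \infty,$$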

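by (A.19). So the first part of (A.17) is proved.

For the second part of (A.17),

$$\begin{aligned}
\sum_{k=1}^{\infty} k\, |\Psi_{j+k,r}| &\le \sum_{k=1}^{\infty} k\, a_{j+k} a_{j+k+r} \le \left( \sum_{k=1}^{\infty} k\, a_{j+k}^2 \right)^{1/2} \left( \sum_{k=1}^{\infty} k\, a_{j+k+r}^2 \right)^{1/2} \\
&\le \sum_{k=1}^{\infty} k\, a_{j+k}^2 \le \sum_{k=1}^{\infty} (k + j)\, a_{k+j}^2 \le \sum_{k=1}^{\infty} k\, a_k^2,
\end{aligned}$$

and therefore the second part of (A.17) is

$$\sum_{k=1}^{\infty} \sum_{j=0}^{\infty} \sum_{r=1}^{\infty} k\, |\Psi_{j,r} \Psi_{j+k,r}| \le \sum_{j=0}^{\infty} \sum_{r=1}^{\infty} |\Psi_{j,r}| \left( \sum_{k=1}^{\infty} k\, a_k^2 \right) \le \left( \sum_{j=0}^{\infty} a_j \right)^2 \left( \sum_{k=1}^{\infty} k\, a_k^2 \right) < \infty$$

by (A.19). ∎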

We apply Theorems A.1 and A.2 to the components of the FDLS estimator $\hat{\rho}_{ols}$.

LEMMA A.5. $(nT_1)^{-1} \sum_{i=1}^{n} \sum_{t=2}^{T} (\Delta y_{it-1})^2 \to_p 2\sigma^2/(1+\rho)$ as $nT_1 \to \infty$.

Proof. Because of (A.1), we invoke Theorem A.1 with $c_0 = 0$, $c_1 = 1$, and $c_j = -(1-\rho)\rho^{j-2}$ for $j \ge 2$. The calculation of $\sum_{j=0}^{\infty} c_j^2$ for the limit is then obvious. ∎
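The "obvious" calculation is $\sum_j c_j^2 = 1 + (1-\rho)^2/(1-\rho^2) = 2/(1+\rho)$, so that $\sigma^2 \sum_j c_j^2 = 2\sigma^2/(1+\rho)$. A minimal numerical sketch of this step follows; the function name and truncation length are illustrative choices, not part of the paper.

```python
def sum_c_squared(rho, n_terms=2000):
    """Truncated sum of c_j^2 with c_0 = 0, c_1 = 1, c_j = -(1 - rho) * rho**(j - 2), j >= 2."""
    total = 1.0  # contribution of c_1; c_0 contributes nothing
    for j in range(2, n_terms):
        total += ((1.0 - rho) * rho ** (j - 2)) ** 2
    return total

def closed_form(rho):
    """Limit stated in Lemma A.5, divided by sigma^2."""
    return 2.0 / (1.0 + rho)
```

At $\rho = 1$ only $c_1$ is nonzero and both expressions equal 1, which matches the unit root case where $\Delta y_{it-1} = \varepsilon_{it-1}$.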

For the central limit theorem, we have the following lemma.

LEMMA A.6. If $E\varepsilon_{it}^4 < \infty$, then as $n \to \infty$, $(nT_1)^{-1/2} \sum_{i=1}^{n} \sum_{t=2}^{T} \Delta y_{it-1} \eta_{it}$ converges to a normal distribution with variance $8\sigma^4/(1+\rho)$, and, furthermore, $(nT_1)^{-1/2} \sum_{i=1}^{n} \sum_{t=2}^{T} \Delta y_{it-1} \eta_{it} \Rightarrow N(0, 8\sigma^4/(1+\rho))$ whether $n \to \infty$ or not.

Proof. Because the coefficients of the lag polynomials for $\Delta y_{it-1}$ and $\eta_{it}$ decay exponentially, conditions (A.3) and (A.4) are satisfied. Condition (A.5) holds by Lemma 1. The results follow from Theorem A.2. The variances for finite $T$ and infinite $T$ are calculated in Appendix B. ∎

Proof of Theorem 1. This follows from Lemmas A.5 and A.6. ∎

Proof of Lemma 2. Recall that the model is $y_{it} = \alpha_i + u_{it}$ with $u_{it} = \rho u_{it-1} + \varepsilon_{it}$. Let the sequence be initialized at $u_{i,-1}$. Then

$$u_{it} = \rho^{t+1} u_{i,-1} + \sum_{j=0}^{t} \rho^j \varepsilon_{it-j},$$

and therefore

$$\Delta y_{it-1} = \Delta u_{it-1} = \rho^{t-1}(\rho - 1) u_{i,-1} + \varepsilon_{it-1} + (\rho - 1) \sum_{j=0}^{t-2} \rho^j \varepsilon_{it-j-2},$$

and

$$\begin{aligned}
E(\Delta y_{it-1})^2 &= (\rho - 1)^2 \rho^{2(t-1)} E u_{i,-1}^2 + \left[ 1 + (\rho - 1)^2 \left( 1 + \rho^2 + \cdots + \rho^{2(t-2)} \right) \right] \sigma^2 \\
&= (\rho - 1)^2 \rho^{2(t-1)} E u_{i,-1}^2 + (\rho + 1)^{-1} \left[ 2 + (\rho - 1) \rho^{2(t-1)} \right] \sigma^2 \\
&= 2\sigma^2/(\rho + 1) + \left( \frac{\rho - 1}{\rho + 1} \right) \rho^{2(t-1)} \sigma^2 \delta_\rho,
\end{aligned} \tag{A.20}$$

where $\delta_\rho = (\rho^2 - 1) E u_{i,-1}^2 / \sigma^2 + 1$. Next, because $\eta_{it} = 2\Delta\varepsilon_{it} + (1+\rho)\Delta y_{it-1}$, we have

$$E \Delta y_{it-1} \eta_{it} = (\rho - 1) \rho^{2(t-1)} \sigma^2 \delta_\rho. \tag{A.21}$$

The results follow from (A.20) and (A.21). ∎
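The closed forms (A.20) and (A.21) can be cross-checked by computing the two moments exactly from the recursion $u_{is} = \rho u_{is-1} + \varepsilon_{is}$. The sketch below is illustrative only: it assumes that $u_{i,-1}$ has mean zero and is uncorrelated with the $\varepsilon_{is}$, uses $\eta_{it} = 2\Delta y_{it} + (1-\rho)\Delta y_{it-1}$, and the function names and default values are hypothetical.

```python
import numpy as np

def exact_moments(rho, t, sigma2=1.0, var_u0=3.0):
    """E(dy_{t-1}^2) and E(dy_{t-1} * eta_t) from loadings on independent shocks (t >= 1)."""
    size = t + 2                        # shocks: u_{-1}, eps_0, ..., eps_t
    variances = np.full(size, sigma2)
    variances[0] = var_u0               # E u_{-1}^2, assumed uncorrelated with the eps
    basis = np.eye(size)
    u = {-1: basis[0]}                  # loading vector of each u_s on the shocks
    for s in range(t + 1):
        u[s] = rho * u[s - 1] + basis[s + 1]
    dy_lag = u[t - 1] - u[t - 2]        # Delta y_{t-1} = Delta u_{t-1}
    dy_cur = u[t] - u[t - 1]            # Delta y_t
    eta = 2.0 * dy_cur + (1.0 - rho) * dy_lag
    return float(variances @ dy_lag**2), float(variances @ (dy_lag * eta))

def closed_forms(rho, t, sigma2=1.0, var_u0=3.0):
    """Right-hand sides of (A.20) and (A.21)."""
    delta = (rho**2 - 1.0) * var_u0 / sigma2 + 1.0
    a20 = 2.0 * sigma2 / (rho + 1.0) \
        + (rho - 1.0) / (rho + 1.0) * rho ** (2 * (t - 1)) * sigma2 * delta
    a21 = (rho - 1.0) * rho ** (2 * (t - 1)) * sigma2 * delta
    return a20, a21
```

Under stationary initialization, $E u_{i,-1}^2 = \sigma^2/(1-\rho^2)$ gives $\delta_\rho = 0$, so the second term of (A.20) and the whole of (A.21) vanish for every $t$.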

Proof of Theorem 2. Because the moving average coefficients of the double differenced process decay exponentially as lag order increases, the conditions for Theorems A.1 and A.2 are all satisfied. The result follows from Theorems A.1 and A.2. ∎

Next we show that the moment conditions (12) and (13) make the expected first derivative matrix ($D$) block diagonal, and if $E\varepsilon_{it}^3 = 0$ then the variance-covariance matrix ($\Omega$) is block diagonal too when evaluated at the true parameter.

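Proof that $D$ and $\Omega$ Are Block Diagonal. For convenience, we write (12) and (13) as

$$E A_{it}(\rho, \beta) = E \sum_{t=2}^{T} \Delta z_{it-1} \left[ 2\Delta z_{it} + (1-\rho) \Delta z_{it-1} \right] = 0,$$

$$E B_{it}(\rho, \beta) = E \sum_{t=1}^{T} (\tilde{x}_{it} - \rho \tilde{x}_{it-1}) \left[ (\tilde{y}_{it} - \rho \tilde{y}_{it-1}) - (\tilde{x}_{it} - \rho \tilde{x}_{it-1})' \beta \right] = 0,$$

where $\Delta z_{it} = \Delta y_{it} - \Delta x_{it}' \beta$.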

For the $D$ matrix, we need to show that $E\, \partial A_{it}/\partial\beta = 0$ and $E\, \partial B_{it}/\partial\rho = 0$ when evaluated at the true parameter. Because $\Delta z_{it} = \Delta u_{it}$ and $\tilde{y}_{it} = (\tilde{x}_{it} - \rho \tilde{x}_{it-1})' \beta + \rho \tilde{y}_{it-1} + \varepsilon_{it}$ when evaluated at the true parameter, we have

$$E \frac{\partial A_{it}}{\partial \beta} = -E \sum_{t=2}^{T} \left[ \Delta x_{it-1} \eta_{it} + \Delta u_{it-1} \left\{ 2\Delta x_{it} + (1-\rho) \Delta x_{it-1} \right\} \right] = 0,$$

$$E \frac{\partial B_{it}}{\partial \rho} = -E \sum_{t=1}^{T} \left[ \tilde{x}_{it-1} \varepsilon_{it} + (\tilde{x}_{it} - \rho \tilde{x}_{it-1}) \tilde{u}_{it-1} \right] = 0,$$

where $\eta_{it} = 2\Delta u_{it} + (1-\rho)\Delta u_{it-1}$ and $\tilde{u}_{it-1} = u_{it-1} - T^{-1} \sum_{s=1}^{T} u_{is-1} = \tilde{y}_{it-1} - \tilde{x}_{it-1}' \beta$. So the $D$ matrix is block diagonal. Next, for the $\Omega$ matrix, at the true parameter,

$$E B_{it} A_{it} = E \left[ \sum_{t=1}^{T} (\tilde{x}_{it} - \rho \tilde{x}_{it-1}) \varepsilon_{it} \right] \left[ \sum_{t=2}^{T} \Delta u_{it-1} \eta_{it} \right] = 0,$$

when $E\varepsilon_{it} = 0$, $E\varepsilon_{it}^3 = 0$, and $\varepsilon_{it}$ are i.i.d. So the $\Omega$ matrix is also block diagonal under the zero third-moment assumption.

Now let us obtain the limit behavior of $U_{nT}(c)$ of (20) under the local to unity setting $\rho = \rho_{nT} = 1 - n^{-1/2} T^{-1} c$ as $n \to \infty$ (whether $T$ is fixed or $T \to \infty$).

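Proof of (22). We omit the $nT$ subscript from $\rho_{nT}$ for notational brevity. Clearly,

$$U_{nT}(c) = -n \log\left( |\Omega_T(\rho)| / |\Omega_T(1)| \right) - \frac{1}{\sigma^2} \sum_{i=1}^{n} (\Delta y_i)' \left[ \Omega_T(\rho)^{-1} - \Omega_T(1)^{-1} \right] (\Delta y_i),$$

where $\Omega_T(\cdot)$ is defined in (21). Under the alternative hypothesis that $n^{1/2} T (1-\rho) = c$,

$$E U_{nT}(c) = n \left( \operatorname{tr} \Omega_T(\rho) - T - \log |\Omega_T(\rho)| \right) = n \left( \frac{(1-\rho) T}{1+\rho} - \log \left[ 1 + \frac{(1-\rho) T}{1+\rho} \right] \right),$$

where $|\Omega_T(\rho)| = 1 + T(1-\rho)/(1+\rho)$ (see Han, 2007, Thm. 1). Because $x - \log(1+x) = x^2/2 - o(x^2)$ when $x$ is close to zero, we have

$$E U_{nT}(c) = \frac{n T^2 (1-\rho)^2}{2(1+\rho)^2} + o\left( \frac{n T^2 (1-\rho)^2}{(1+\rho)^2} \right) = \frac{c^2}{2(1+\rho)^2} + o(1) \to \frac{c^2}{8}.$$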

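The variance of $U_{nT}(c)$ is calculated from the fact that $\operatorname{var}(z' A z) = 2 \operatorname{tr}(A' A)$ if $z \sim N(0, I)$ and $A$ is symmetric. In our case,

$$\sigma^{-2} \Delta y_i' \left[ \Omega_T(\rho)^{-1} - \Omega_T(1)^{-1} \right] \Delta y_i = z_i' \left[ I_T - \Omega_T(\rho) \right] z_i = z_i' Q_T z_i, \text{ say,}$$

where $z_i = \sigma^{-1} \Omega_T(\rho)^{-1/2} \Delta y_i \sim N(0, I)$ and

$$Q_T = \operatorname{Toeplitz}\{-1, 1, \rho, \ldots, \rho^{T_2}\} (1-\rho)/(1+\rho),$$

so that

$$\begin{aligned}
\operatorname{var}\left( U_{nT}(c) \right) &= 2n \operatorname{tr}(Q_T' Q_T) = \frac{2n(1-\rho)^2}{(1+\rho)^2} \left[ T + 2 \sum_{k=1}^{T_1} T_k \rho^{2(k-1)} \right] \\
&= \frac{2n T^2 (1-\rho)^2}{(1+\rho)^2} \left[ \frac{1}{T} + \frac{2}{T^2} \sum_{k=1}^{T_1} T_k - \frac{2}{T^2} \sum_{k=1}^{T_1} T_k \left( 1 - \rho^{2(k-1)} \right) \right] \\
&= \frac{2c^2}{(1+\rho)^2} \left[ 1 - \frac{2}{T^2} \sum_{k=1}^{T_1} T_k \left( 1 - \rho^{2(k-1)} \right) \right] = \frac{2c^2}{(1+\rho)^2} \left[ 1 - o(1) \right] \to \frac{c^2}{2}
\end{aligned}$$

as $n \to \infty$. Above, the last $o(1)$ order follows from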

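$$\begin{aligned}
\frac{1}{T^2} \sum_{k=1}^{T_1} T_k \left[ 1 - \rho^{2(k-1)} \right] &= \frac{1}{T^2} \sum_{k=1}^{T_2} T_{k+1} \left( 1 - \rho^{2k} \right) = \frac{1}{T^2} \sum_{k=1}^{T_2} T_{k+1} (1-\rho) \left( \sum_{j=0}^{k-1} \rho^j \right) (1 + \rho^k) \\
&\le \frac{2}{T^2} \sum_{k=1}^{T_2} T_{k+1}\, k\, (1-\rho) \le \frac{2c}{n^{1/2} T^2} \sum_{k=1}^{T_2} k \to 0
\end{aligned}$$

as $n \to \infty$, whether $T$ is fixed or $T \to \infty$.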


So far, we have obtained that $E U_{nT}(c) \to c^2/8$ and $\operatorname{var}(U_{nT}(c)) \to c^2/2$. In the derivation of the variance, we can also verify that it is uniformly bounded, which is sufficient to invoke the Lindeberg CLT for $U_{nT}(c)$. Thus,

$$U_{nT}(c) \to_d N(c^2/8,\, c^2/2)$$

as claimed. ∎
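The two limits can be illustrated by evaluating the finite-sample mean and variance expressions from this proof at a large $n$. This is a minimal sketch, assuming $T_k = T - k$ and the local parameterization $1 - \rho = c/(n^{1/2} T)$; the function name is hypothetical.

```python
import math

def local_moments(c, n, T):
    """Finite-sample mean and variance of U_nT(c) from the displayed formulas.

    Assumes T_k = T - k and rho = 1 - c / (sqrt(n) * T).
    """
    one_minus_rho = c / (math.sqrt(n) * T)
    rho = 1.0 - one_minus_rho
    x = one_minus_rho * T / (1.0 + rho)
    mean = n * (x - math.log1p(x))                      # n * (x - log(1 + x))
    weighted = sum((T - k) * rho ** (2 * (k - 1)) for k in range(1, T))
    var = 2.0 * n * (one_minus_rho / (1.0 + rho)) ** 2 * (T + 2.0 * weighted)
    return mean, var
```

For example, `local_moments(2.0, 10**8, 5)` should return values close to $c^2/8 = 0.5$ and $c^2/2 = 2$, with a gap that shrinks as $n$ grows.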

APPENDIX B: Asymptotic Variance Formulas

The Variance of FDLS. We calculate the variance of the FDLS estimator $\hat{\rho}_{ols}$ in terms of the parameters using the expressions in Theorem A.2. Let $\rho_1 = 1 - \rho$ and $\rho_2 = 1 - \rho^2$. The lag polynomial coefficients of $\Delta y_{t-1}$ and $\eta_t$ are tabulated in Table B1, and the corresponding $\Psi_{j,r}$ terms are tabulated in Table B2.

TABLE B1. Coefficients of lag polynomials for $\Delta y_{it-1}$ and $\eta_{it}$

$$\begin{array}{l|cccccc}
 & 0 & 1 & 2 & 3 & 4 & j \\ \hline
\Delta y_{t-1} & 0 & 1 & -\rho_1 & -\rho\rho_1 & -\rho^2\rho_1 & -\rho^{j-2}\rho_1 \\
\eta_t & 2 & -\rho_1 & -\rho_2 & -\rho\rho_2 & -\rho^2\rho_2 & -\rho^{j-2}\rho_2
\end{array}$$

Note: $\rho_1 = 1 - \rho$, $\rho_2 = 1 - \rho^2$.

TABLE B2. The $\Psi_{j,r}$ terms

$$\begin{array}{c|cccccc}
r \backslash j & 0 & 1 & 2 & 3 & 4 & 5 \\ \hline
1 & 2 & -2\rho\rho_1 & 2\rho\rho_1\rho_2 & 2\rho^3\rho_1\rho_2 & 2\rho^5\rho_1\rho_2 & \cdots \\
2 & -2\rho_1 & -2\rho^2\rho_1 & 2\rho^2\rho_1\rho_2 & 2\rho^4\rho_1\rho_2 & 2\rho^6\rho_1\rho_2 & \cdots \\
3 & -2\rho\rho_1 & -2\rho^3\rho_1 & 2\rho^3\rho_1\rho_2 & 2\rho^5\rho_1\rho_2 & 2\rho^7\rho_1\rho_2 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots &
\end{array}$$

So

$$\sum_{j=0}^{\infty} c_j^2 d_j^2 = \rho_1^2 + \rho_1^2 \rho_2^2 (1 + \rho^4 + \cdots) = 2(1-\rho)^2/(1+\rho^2), \tag{B.1}$$

$$\begin{aligned}
\sum_{j=0}^{\infty} c_j d_j c_{j+k} d_{j+k} &= -\rho^{2(k-1)} \left[ \rho_1^2 \rho_2 - \rho^2 \rho_1^2 \rho_2^2 (1 + \rho^4 + \cdots) \right] \\
&= -\rho^{2(k-1)} (1-\rho)^2 (1-\rho^2)/(1+\rho^2), \quad k \ge 1,
\end{aligned} \tag{B.2}$$

where $0^0 = 1$. Also,

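$$\begin{aligned}
\sum_{j=0}^{\infty} \sum_{r=1}^{\infty} \Psi_{j,r}^2 &= \left[ 4 + 4\rho_1^2 (1 + \rho^2 + \cdots) \right] + 4\rho^2 \rho_1^2 (1 + \rho^2 + \cdots) \\
&\quad + 4\rho^2 \rho_1^2 \rho_2^2 (1 + \rho^2 + \cdots)(1 + \rho^4 + \cdots) \\
&= 4 \left[ 1 + \frac{(1+\rho^2)(1-\rho)}{1+\rho} + \frac{\rho^2 (1-\rho)^2}{1+\rho^2} \right],
\end{aligned} \tag{B.3}$$

and (after some algebra)

$$\begin{aligned}
\sum_{j=0}^{\infty} \sum_{r=1}^{\infty} \Psi_{j,r} \Psi_{j+k,r} &= -4\rho(1-\rho)/(1+\rho)\, \{k = 1\} + 4\rho^{2k-3} (1-\rho)^2\, \{k > 1\} \\
&\quad - 4\rho^{2k} (1-\rho)^2/(1+\rho^2), \quad k \ge 1.
\end{aligned} \tag{B.4}$$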

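The calculation of $A_T$ and $B_T$ of Theorem A.2 (with $m = 1$) is tedious. We will use

$$\sum_{t=3}^{T} \sum_{k=1}^{t-2} \rho^{2(k-1)} = \frac{T_2}{1-\rho^2} - \left( \frac{\rho^2}{1-\rho^2} \right) \frac{1 - \rho^{2T_2}}{1-\rho^2}$$

and

$$\sum_{t=3}^{T} \sum_{k=2}^{t-2} \rho^{2(k-2)} = \sum_{t=4}^{T} \sum_{k=2}^{t-2} \rho^{2(k-2)} = \frac{T_3}{1-\rho^2} - \left( \frac{\rho^2}{1-\rho^2} \right) \frac{1 - \rho^{2T_3}}{1-\rho^2}.$$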
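Both double-sum identities are geometric series in disguise and can be checked by brute force. The sketch below assumes $T_2 = T - 2$ and $T_3 = T - 3$, requires $T \ge 3$ and $|\rho| \ne 1$, and uses illustrative function names.

```python
def double_sum(rho, T, k_start):
    """Sum over t = 3..T and k = k_start..t-2 of rho**(2 * (k - k_start))."""
    return sum(rho ** (2 * (k - k_start))
               for t in range(3, T + 1)
               for k in range(k_start, t - 1))

def closed_form(rho, Tq):
    """Tq / (1 - rho^2) - (rho^2 / (1 - rho^2)) * (1 - rho^(2 Tq)) / (1 - rho^2)."""
    r2 = rho * rho
    return Tq / (1.0 - r2) - (r2 / (1.0 - r2)) * (1.0 - rho ** (2 * Tq)) / (1.0 - r2)
```

With `k_start=1` the brute-force sum should match `closed_form(rho, T - 2)`, and with `k_start=2` it should match `closed_form(rho, T - 3)`.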

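From the definition of $A_T$, (B.1) and (B.2), we have

$$\begin{aligned}
A_T &= \frac{2(1-\rho)^2}{1+\rho^2} - \frac{2}{T_1} \left[ \frac{T_2}{1-\rho^2} - \left( \frac{\rho^2}{1-\rho^2} \right) \frac{1 - \rho^{2T_2}}{1-\rho^2} \right] \frac{(1-\rho)^2 (1-\rho^2)}{1+\rho^2} \\
&= \frac{2}{T_1} \left[ \frac{(1-\rho)^2}{1+\rho^2} + \frac{\rho^2 (1-\rho)(1 - \rho^{2T_2})}{(1+\rho)(1+\rho^2)} \right],
\end{aligned} \tag{B.5}$$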

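and

$$\begin{aligned}
B_T &= 4 \left[ 1 + \frac{(1+\rho^2)(1-\rho)}{1+\rho} + \frac{\rho^2 (1-\rho)^2}{1+\rho^2} \right] \\
&\quad + \frac{8}{T_1} \left\{ -\frac{T_2\, \rho (1-\rho)}{1+\rho} \{T \ge 3\} + \rho \left[ \frac{T_3 (1-\rho)}{1+\rho} - \frac{\rho^2 (1 - \rho^{2T_3})}{(1+\rho)^2} \right] \right. \\
&\qquad\qquad \left. - \left[ \frac{T_2 (1-\rho)}{1+\rho} - \frac{\rho^2 (1 - \rho^{2T_2})}{(1+\rho)^2} \right] \frac{\rho^2}{1+\rho^2} \right\},
\end{aligned}$$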


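which equals

$$\begin{aligned}
B_T = 4 \Bigg[ & 1 + \frac{(1+\rho^2)(1-\rho)}{1+\rho} + \frac{\rho^2 (1-\rho)^2}{1+\rho^2} - \left( \frac{2}{T_1} \right) \frac{\rho (1-\rho)}{1+\rho} \{T \ge 3\} \\
& - 2 \left( \frac{T_2}{T_1} \right) \frac{\rho^2 (1-\rho)}{(1+\rho)(1+\rho^2)} + \left( \frac{2}{T_1} \right) \frac{\rho^4 (1 - \rho^{2T_2})}{(1+\rho)^2 (1+\rho^2)} - \left( \frac{2}{T_1} \right) \frac{\rho^3 (1 - \rho^{2T_3})}{(1+\rho)^2} \Bigg],
\end{aligned} \tag{B.6}$$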

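where $A_T$ and $B_T$ are defined in Theorem A.2 and $m = 1$ is used.

The limit variance (as $n \to \infty$) of $(nT_1)^{1/2} (\hat{\rho}_{ols} - \rho)$ is now obtained by multiplying $A_T \operatorname{var}(\varepsilon_{it}^2) + B_T \sigma^4$ by $(1+\rho)^2 / 4\sigma^4$, viz.,

$$\begin{aligned}
V_{ols,T} &= \frac{1}{T_1} \left[ \frac{(1-\rho^2)^2}{1+\rho^2} + \frac{\rho^2 (1-\rho^2)(1 - \rho^{2T_2})}{1+\rho^2} \right] (1/2) \operatorname{var}(\varepsilon_{it}^2/\sigma^2) \\
&\quad + (1+\rho)^2 + (1+\rho^2)(1-\rho^2) + \frac{\rho^2 (1-\rho^2)^2}{1+\rho^2} - \left( \frac{2}{T_1} \right) \rho (1-\rho^2) \{T \ge 3\} \\
&\quad - 2 \left( \frac{T_2}{T_1} \right) \frac{\rho^2 (1-\rho^2)}{1+\rho^2} + \left( \frac{2}{T_1} \right) \frac{\rho^4 (1 - \rho^{2T_2})}{1+\rho^2} - \left( \frac{2}{T_1} \right) \rho^3 (1 - \rho^{2T_3}).
\end{aligned} \tag{B.7}$$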

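As a special case, if $T = 2$ and $\varepsilon_{it} \sim N(0, \sigma^2)$, then $\operatorname{var}(\varepsilon_{it}^2/\sigma^2) = 2$ and $V_{ols,T}$ simplifies to

$$V_{ols,T} = (3-\rho)(1+\rho), \quad T = 2, \quad \varepsilon_{it} \sim N(0, \sigma^2). \tag{B.8}$$

It is also easily verified that $V_{ols,T} \to 2(1+\rho)$ as $T \to \infty$.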
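The two benchmark values, $(3-\rho)(1+\rho)$ at $T = 2$ and $2(1+\rho)$ as $T \to \infty$, can be reproduced directly from the series that define $A_T$ and $B_T$ rather than from the closed forms. The sketch below is illustrative and rests on labeled assumptions: $T_m = T - m$ with $m = 1$, $\sigma^2 = 1$, the coefficients of Table B1, $\Psi_{j,r} = c_j d_{j+r} + c_{j+r} d_j$ (the form implied by the bound used in the proof of Lemma A.4), and truncation of the infinite sums at `n_terms` lags. All names are hypothetical.

```python
import numpy as np

def ma_coefficients(rho, n_terms):
    """Table B1: c_j for Delta y_{t-1} and d_j for eta_t, j = 0..n_terms-1."""
    j = np.arange(n_terms)
    c = np.zeros(n_terms)
    d = np.zeros(n_terms)
    c[1] = 1.0
    c[2:] = -(1.0 - rho) * rho ** (j[2:] - 2)
    d[0] = 2.0
    d[1] = -(1.0 - rho)
    d[2:] = -(1.0 - rho**2) * rho ** (j[2:] - 2)
    return c, d

def v_ols(rho, T, var_eps2=2.0, n_terms=120):
    """(1 + rho)^2 / 4 * (A_T * var(eps^2) + B_T) with sigma^2 = 1 and m = 1."""
    Tm = T - 1
    kmax = min(Tm - 1, n_terms)               # higher lags are of order rho**(2k)
    c, d = ma_coefficients(rho, 3 * n_terms + 2)
    f = c * d
    J = np.arange(n_terms)[:, None]           # j = 0..n_terms-1
    R = np.arange(1, n_terms + 1)[None, :]    # r = 1..n_terms

    def psi(shift):
        # Assumed definition: Psi_{j,r} = c_j d_{j+r} + c_{j+r} d_j, evaluated at j + shift.
        return c[J + shift] * d[J + shift + R] + c[J + shift + R] * d[J + shift]

    psi0 = psi(0)
    A = np.sum(f**2)
    B = np.sum(psi0**2)
    for k in range(1, kmax + 1):
        w = 2.0 * (Tm - k) / Tm               # weight from identity (A.15)
        A += w * np.sum(f[:-k] * f[k:])
        B += w * np.sum(psi0 * psi(k))
    return (1.0 + rho) ** 2 / 4.0 * (A * var_eps2 + B)
```

The default `var_eps2=2.0` corresponds to normal errors. For $T = 2$ no cross-lag terms enter, so the value is determined by (B.1) and (B.3) alone.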

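The Variance of DDLS when $\rho = 1$. Next, we calculate the variance of the DDLS (double-differencing least squares) estimator $\hat{\theta}$ for $\rho = 1$. According to Phillips and Han (2008), $\Delta^2 y_{it-1} = \sum_{j=0}^{\infty} c_j \varepsilon_{it-j}$ and $\eta_{it} = \sum_{j=0}^{\infty} d_j \varepsilon_{it-j}$, where

$$c_0 = 0, \quad c_1 = 1, \quad c_2 = -(2-\rho), \quad c_k = \rho^{k-3} (1-\rho)^2, \quad k \ge 3, \tag{B.9}$$

$$d_0 = 2, \quad d_1 = -4 + \phi c_1, \quad d_2 = 2 + \phi c_2, \quad d_k = \phi c_k, \quad k \ge 3, \tag{B.10}$$

with $\phi = (4-\rho)(1+\rho)/(3-\rho)$ and $0^0 = 1$. When $\rho = 1$, we have

$$c_0 = 0, \quad c_1 = 1, \quad c_2 = -1, \quad c_k = 0, \quad k \ge 3,$$

$$d_0 = 2, \quad d_1 = -1, \quad d_2 = -1, \quad d_k = 0, \quad k \ge 3.$$

So

$$\sum_{j=0}^{\infty} c_j^2 d_j^2 = 2 \quad \text{and} \quad \sum_{t=m+2}^{T} \sum_{r=1}^{t-m-1} \sum_{j=0}^{\infty} c_j d_j c_{j+r} d_{j+r} = -T_{m+1},$$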


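and because

$$\Psi_{0,1} = 2, \quad \Psi_{0,2} = -2, \quad \text{and } \Psi_{j,r} = 0 \text{ for all other } j \text{ and } r,$$

we have

$$\sum_{j=0}^{\infty} \sum_{r=1}^{\infty} \Psi_{j,r}^2 = 8, \qquad \sum_{t=m+2}^{T} \sum_{k=1}^{t-m-1} \sum_{j=0}^{\infty} \sum_{r=1}^{\infty} \Psi_{j,r} \Psi_{j+k,r} = 0.$$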

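For the DDLS estimator $\hat{\theta}$, we use $m = 2$, and thus, by Theorem A.2, when $\rho = 1$ and $T > 2$,

$$(nT_2)^{-1/2} \sum_{i=1}^{n} \sum_{t=3}^{T} \Delta^2 y_{it-1} \eta_{it} \to_d N\left( 0,\ 2 \operatorname{var}(\varepsilon_{it}^2)/T_2 + 8\sigma^4 \right),$$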

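and because $\operatorname{plim}_{nT \to \infty} (nT_2)^{-1} \sum_{i=1}^{n} \sum_{t=3}^{T} (\Delta^2 y_{it-1})^2 = 2\sigma^2$ by Theorem A.1, we have $\theta = 0$ and, as $nT \to \infty$,

$$(nT_2)^{1/2}\, \hat{\theta} \to_d N\left( 0,\ \operatorname{var}(\varepsilon_{it}^2/\sigma^2)/2T_2 + 2 \right).$$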
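The unit root quantities in this subsection can be recomputed from (B.9) and (B.10). This is a minimal sketch under stated assumptions: $\sigma^2 = 1$, $T_k = T - k$, $\Psi_{j,r} = c_j d_{j+r} + c_{j+r} d_j$ (an assumed form consistent with the values $\Psi_{0,1} = 2$ and $\Psi_{0,2} = -2$ quoted above), and the cross-lag sums $-T_{m+1}$ and $0$ taken from the text. Function names are illustrative.

```python
def ddls_ma_coefficients(rho, n_terms=8):
    """c_j and d_j from (B.9) and (B.10), truncated at n_terms lags."""
    phi = (4.0 - rho) * (1.0 + rho) / (3.0 - rho)
    c = [0.0, 1.0, -(2.0 - rho)]
    c += [rho ** (k - 3) * (1.0 - rho) ** 2 for k in range(3, n_terms)]
    d = [2.0, -4.0 + phi * c[1], 2.0 + phi * c[2]]
    d += [phi * ck for ck in c[3:]]
    return c, d

def unit_root_variance(T, var_eps2=2.0):
    """Pieces of the DDLS limit variance at rho = 1 (sigma^2 = 1, m = 2, T > 2)."""
    c, d = ddls_ma_coefficients(1.0)
    n = len(c)
    sum_f2 = sum((cj * dj) ** 2 for cj, dj in zip(c, d))
    # Assumed definition of Psi_{j,r}; see the lead-in.
    sum_psi2 = sum((c[j] * d[j + r] + c[j + r] * d[j]) ** 2
                   for j in range(n) for r in range(1, n - j))
    T2, T3 = T - 2, T - 3
    A_T = sum_f2 + (2.0 / T2) * (-T3)      # cross-lag sum -T_{m+1} from the text
    B_T = sum_psi2                          # cross-lag sum is zero at rho = 1
    denom = 2.0                             # sigma^2 * sum of c_j^2 = 2 at rho = 1
    limit_var = (A_T * var_eps2 + B_T) / denom**2
    return sum_f2, sum_psi2, limit_var
```

With normal errors (`var_eps2=2.0`) the third returned value should equal $1/T_2 + 2$, in line with the final display.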