Bias Corrected Instrumental Variables Estimation
for Dynamic Panel Models with Fixed Effects
Jinyong Hahn (Brown University), Jerry Hausman (MIT), Guido Kuersteiner (MIT)
August, 2002
Acknowledgment: We are grateful to Yu Hu for exceptional research assistance. Helpful
comments by Guido Imbens, Whitney Newey, James Powell, and participants of workshops at
Arizona State University, Berkeley, Harvard/MIT, Maryland, Triangle, UCL, UCLA, UCSD
are appreciated. Kuersteiner gratefully acknowledges financial support from NSF grant SES-
0095132.
Abstract
This paper proposes a new instrumental variables estimator for a dynamic panel model with fixed
effects with good bias and mean squared error properties even when identification of the model
becomes weak near the unit circle. We compare our new estimator based on long differences to a
bias corrected estimator based on second order expansions of the bias. Simulation experiments
show that this bias correction performs well if the model does not have a root near unity but
breaks down near the unit circle. We adopt a weak instrument asymptotic approximation to
study the behavior of various estimators near the unit circle. We show that an estimator based
on long differencing the model is much less biased than conventional implementations of the
GMM estimator for the dynamic panel model. We also show that under the weak instrument
approximation such conventional estimators are dominated in terms of mean squared error by an
estimator with far fewer moment conditions. The long difference estimator mimics the infeasible
optimal procedure through its reliance on a small set of moment conditions.
Keywords: dynamic panel, bias correction, second order, unit root, weak instrument
JEL: C13, C23, C51
1 Introduction
We are concerned with estimation of the dynamic panel model with fixed effects. Under large
n, fixed T asymptotics it is well known from Nickell (1981) that the standard maximum likeli-
hood estimator suffers from an incidental parameter problem leading to inconsistency. In order
to avoid this problem the literature has focused on instrumental variables estimation (GMM)
applied to first differences. Examples include Anderson and Hsiao (1982), Holtz-Eakin, Newey,
and Rosen (1988), and Arellano and Bond (1991). Ahn and Schmidt (1995), Hahn (1997), and
Blundell and Bond (1998) considered further moment restrictions. Comparisons of information
contents of varieties of moment restrictions made by Ahn and Schmidt (1995) and Hahn (1999)
suggest that, unless stationarity of the initial level yi0 is somehow exploited as in Blundell and
Bond (1998), the orthogonality of lagged levels with first differences provides the largest source
of information.
Unfortunately, the standard GMM estimator obtained after first differencing has been found
to suffer from substantial finite sample biases. See Alonso-Borrego and Arellano (1999). Mo-
tivated by this problem, modifications of likelihood based estimators emerged in the literature.
See Kiviet (1995), Lancaster (1997), Hahn and Kuersteiner (2002). The likelihood based esti-
mators do reduce finite sample bias compared to the standard maximum likelihood estimator,
but the remaining bias is still substantial for T relatively small.
In this paper, we attempt to eliminate the finite sample bias of the standard GMM estimator
obtained after first differencing. We view the standard GMM estimator as a minimum distance
estimator that combines T − 1 instrumental variable estimators (2SLS) applied to first differences. This view has been adopted by Chamberlain (1984) and Griliches and Hausman (1986). It has
been noted for quite a while that IV estimators can be quite biased in finite sample. See Nagar
(1959), Mariano and Sawa (1972), Rothenberg (1983), Bekker (1994), Donald and Newey (2001)
and Kuersteiner (2000). If the ingredients of the minimum distance estimator are all biased,
it is natural to expect such bias in the resultant minimum distance estimator, or equivalently,
GMM.
We consider a second order approach to the bias of the GMM estimator using the formula
contained in Hahn and Hausman (2002). We find that the standard GMM estimator suffers
from significant bias. The bias arises from two primary sources: the correlation of the structural
equation error with the reduced form error and the low explanatory power of the instruments.
We attempt to solve these problems by using the “long difference technique” of Griliches and
Hausman (1986). Griliches and Hausman noted that bias is reduced when long differences are
used in the errors-in-variables problem, and a similar result holds here for the second order bias.
Long differencing also increases the explanatory power of the instruments, which further reduces
the finite sample bias and also decreases the MSE of the estimator. To increase the explanatory
power of the instruments further, we use estimated residuals as additional instruments, a
technique introduced in the simultaneous equations model by Hausman,
Newey, and Taylor (1987) and used in the dynamic panel data context by Ahn and Schmidt
(1995). Monte Carlo results demonstrate that the long difference estimator performs quite well,
even for high positive values of the lagged variable coefficient where previous estimators are
badly biased.
However, the second order bias calculations do not predict well the performance of the esti-
mator for these high values of the coefficient. Simulation evidence shows that our approximations
do not work well near the unit circle where the model suffers from a near non-identification prob-
lem. In order to analyze the bias of standard GMM procedures under these circumstances we
consider a local to non-identification asymptotic approximation. We use the weak instrument
approximation of Staiger and Stock (1997) and Stock and Wright (2000). It is based on letting
the correlation between instruments and regressors decrease at a prescribed rate as the sample
size increases.
In the case of the dynamic panel model it is the autoregressive parameter that determines
the quality of lagged observations used as instruments. Here we let the autoregressive parameter
tend to unity as the number of cross-sectional observations n tends to infinity. The goal of this
alternative asymptotic approximation is to keep the quality of the instruments constant as the
sample size increases such that the limiting distribution captures features of the finite sample
distribution of the estimators. Despite the autoregressive parameter tending to one with the
sample size increasing our approximation is quite different from the near unit root literature.
The number of time periods is held constant in our case and non-stationarity issues of importance
in the time series literature do not play any role in our analysis.
Our limiting distribution for the GMM estimator shows that only moment conditions in-
volving initial conditions are asymptotically relevant. We define a class of estimators based on
linear combinations of asymptotically relevant moment conditions and show that a bias minimal
estimator within this class can approximately be based on taking long differences of the dynamic
panel model. In general, it turns out that under near non-identification asymptotics the optimal
procedures of Alvarez and Arellano (1998), Arellano and Bond (1991), Ahn and Schmidt (1995,
1997) are suboptimal from a bias point of view, and inference should optimally be based on a set
smaller than the full set of moment conditions. We show that a bias minimal estimator can be
obtained by using a particular linear combination of the original moment conditions. As far as
bias minimization under the alternative asymptotics is concerned this optimality result shows
that only a single moment condition should optimally be used. We also analyze the form of
the optimal estimator in terms of minimal mean squared error (MSE) under weak identification
asymptotics. First order asymptotically efficient procedures are again found to be suboptimal.
The optimal number of moment conditions minimizing the MSE under alternative asymptotics
can be up to a factor of 10 smaller than the set of first order optimal instruments.
Optimal procedures under weak instrument asymptotics are difficult to implement because
they depend on unobservable nuisance parameters that cannot be readily estimated. We show
that the long difference estimator we propose in this paper is a good heuristic approximation to
the optimal procedures. It captures the essence of our theoretical analysis which shows that the
selection of a few highly significant instruments is called for to minimize bias and MSE.
2 Review of the Bias of GMM Estimator
Consider the usual dynamic panel model with fixed effects. We assume that other exogenous
variables have been partialled out if they are present. Our variable yit may therefore be a residual
of a projection onto exogenous variables. Such removal of exogenous variables does not affect
our higher order analysis as long as it can be done $\sqrt{n}$-consistently. Our model then is
$$y_{it} = \alpha_i + \beta y_{i,t-1} + \varepsilon_{it}, \qquad i = 1,\ldots,n;\ t = 1,\ldots,T \tag{1}$$
We will impose somewhat strong conditions to simplify higher order analysis:
Condition 1 $\varepsilon_{it} \overset{i.i.d.}{\sim} N(0,\sigma^2)$ over $i$ and $t$.

Condition 2 $y_{i0}\,|\,\alpha_i \sim N\!\left(\frac{\alpha_i}{1-\beta},\ \frac{\sigma^2}{1-\beta^2}\right)$ and $\alpha_i \sim N(0,\sigma^2_\alpha)$.
Condition (2) is a stationarity condition that is imposed for analytical convenience. Anderson
and Hsiao (1982) show that estimators exploiting such initial conditions can be very sensitive
to misspecification. For this reason we do not exploit the initial condition to construct our
estimator.
It has been common in the literature to consider the case where n is large and T is small.
The usual GMM estimator is based on the first difference form of the model $y_{it} - y_{i,t-1} = \beta(y_{i,t-1} - y_{i,t-2}) + (\varepsilon_{it} - \varepsilon_{i,t-1})$, where the instruments are based on the orthogonality $E[y_{is}(\varepsilon_{it} - \varepsilon_{i,t-1})] = 0$, $s = 0,\ldots,t-2$. Instead, we consider a version of the GMM estimator developed by Arellano
and Bover (1995), which simplifies the characterization of the “weight matrix” in GMM esti-
mation. We define the innovation $u_{it} \equiv \alpha_i + \varepsilon_{it}$. Arellano and Bover (1995) eliminate the fixed effect $\alpha_i$ in (1) by applying Helmert's transformation
$$u^*_{it} \equiv \sqrt{\frac{T-t}{T-t+1}}\left[u_{it} - \frac{1}{T-t}\left(u_{i,t+1} + \cdots + u_{iT}\right)\right], \qquad t = 1,\ldots,T-1,$$
instead of first differencing.¹ The transformation produces
$$y^*_{it} = \beta x^*_{it} + \varepsilon^*_{it}, \qquad t = 1,\ldots,T-1,$$
where $x^*_{it} \equiv y^*_{i,t-1}$. Let $z_{it} \equiv (y_{i0},\ldots,y_{i,t-1})'$. Our moment restriction is summarized by $E[z_{it}\varepsilon^*_{it}] = 0$, $t = 1,\ldots,T-1$. It can be shown that, with the homoskedasticity assumption on $\varepsilon_{it}$, the optimal "weight matrix" is proportional to a block-diagonal matrix, with typical diagonal block equal to $E[z_{it}z_{it}']$. Therefore, the optimal GMM estimator is equal to
$$\hat b_{GMM} \equiv \frac{\sum_{t=1}^{T-1}x^{*\prime}_tP_ty^*_t}{\sum_{t=1}^{T-1}x^{*\prime}_tP_tx^*_t} = \frac{\sum_{t=1}^{T-1}x^{*\prime}_tP_tx^*_t\cdot\hat b_{2SLS,t}}{\sum_{t=1}^{T-1}x^{*\prime}_tP_tx^*_t} \tag{2}$$
where $\hat b_{2SLS,t}$ denotes the 2SLS of $y^*_t$ on $x^*_t$:
$$\hat b_{2SLS,t} \equiv \frac{x^{*\prime}_tP_ty^*_t}{x^{*\prime}_tP_tx^*_t}, \qquad t = 1,\ldots,T-1,$$
and $x^*_t \equiv (x^*_{1t},\ldots,x^*_{nt})'$, $y^*_t \equiv (y^*_{1t},\ldots,y^*_{nt})'$, $Z_t \equiv (z_{1t},\ldots,z_{nt})'$, and $P_t \equiv Z_t(Z_t'Z_t)^{-1}Z_t'$.

¹ Arellano and Bover (1995) note that the efficiency of the resultant GMM estimator is not affected by whether Helmert's transformation or first differencing is used.
Therefore, the GMM estimator $\hat b_{GMM}$ may be understood as a linear combination of the 2SLS estimators $\hat b_{2SLS,1},\ldots,\hat b_{2SLS,T-1}$. It has long been known that 2SLS may be subject to substantial finite sample bias. See Nagar (1959), Rothenberg (1983), Bekker (1994), Donald and Newey (2001), Hahn and Hausman (2002) and Kuersteiner (2000) for related discussion. It is therefore natural to conjecture that a linear combination of the 2SLS estimators may be subject to quite substantial finite sample bias as well. This is indeed the case. In the first column of Table 1, we summarize the finite sample bias of $\hat b_{GMM}$ as a fraction of $\beta$, approximated by 5,000 Monte Carlo runs.
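To make the structure of (2) concrete, the following sketch computes $\hat b_{GMM}$ as the weighted combination of the per-period 2SLS estimators. This is our own illustrative Python (the paper supplies no code), assuming list containers holding the Helmert-transformed data $y^*_t$, $x^*_t$ and the instrument matrices $Z_t$:

    import numpy as np

    def b_2sls(y_star, x_star, Z):
        # per-period 2SLS of y*_t on x*_t with n x t instrument matrix Z_t
        P = Z @ np.linalg.solve(Z.T @ Z, Z.T)      # P_t = Z_t (Z_t'Z_t)^{-1} Z_t'
        return (x_star @ P @ y_star) / (x_star @ P @ x_star)

    def b_gmm(y_stars, x_stars, Z_list):
        # equation (2): combine the b_2sls,t with weights x*_t' P_t x*_t
        num = den = 0.0
        for y, x, Z in zip(y_stars, x_stars, Z_list):
            P = Z @ np.linalg.solve(Z.T @ Z, Z.T)
            num += x @ P @ y                       # x*_t' P_t y*_t
            den += x @ P @ x                       # x*_t' P_t x*_t
        return num / den

Written this way, it is immediate that $\hat b_{GMM}$ inherits whatever finite sample bias the individual $\hat b_{2SLS,t}$ carry.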
3 Bias Correction
In the previous section, we considered the bias of the GMM estimator as a result of the biases
of the 2SLS estimators. In this section, we attempt to eliminate the bias using two different approaches: (i) by adopting the second order Taylor type approximation as in Nagar (1959) and Rothenberg (1983); and (ii) by adopting a "long difference" perspective.
We first examine the second order method. Many econometric estimators, say $\hat\theta$ for $\theta$, based on a sample of size $n$ allow for an expansion of the form
$$\sqrt n\left(\hat\theta - \theta\right) = \theta^{(1)} + \frac{1}{\sqrt n}\theta^{(2)} + o_p\!\left(\frac{1}{\sqrt n}\right) \tag{3}$$
where both $\theta^{(1)}$ and $\theta^{(2)}$ are of order $O_p(1)$. Typically $\theta^{(1)}$ has mean zero and converges in distribution to a normal distribution. Ignoring the $o_p(n^{-1/2})$ term and taking an expectation, we obtain that the "approximate mean" of $\sqrt n(\hat\theta-\theta)$ equals $\frac{1}{\sqrt n}E\theta^{(2)}$. We can therefore understand $\frac1n E\theta^{(2)}$ as the "second order bias" of $\hat\theta$. Based on the mapping between $E\theta^{(2)}$ and primitive parameters, we can $\sqrt n$-consistently estimate $E\theta^{(2)}$ by $\widehat{E\theta^{(2)}}$, say. The bias corrected estimator $\hat\theta - \frac1n\widehat{E\theta^{(2)}}$ has zero second order bias. It is not a priori clear whether the second order bias $\frac1n E\theta^{(2)}$ approximately equals the actual bias $E[\hat\theta - \theta]$. It is also not a priori clear whether the bias corrected estimator $\hat\theta - \frac1n\widehat{E\theta^{(2)}}$ indeed removes much of the bias.
For the dynamic panel model of interest, we addressed these questions by Monte Carlo approximation. For that purpose, we first derive the second order bias of $\hat b_{GMM}$:

Theorem 1 Under Conditions 1-2, the second order bias of $\hat b_{GMM}$ is equal to
$$\frac{B_1 + B_2 + B_3}{n} + o\!\left(\frac1n\right), \tag{4}$$
where
$$B_1 \equiv \Upsilon_1^{-1}\sum_{t=1}^{T-1}\operatorname{trace}\left(\left(\Gamma^{zz}_t\right)^{-1}\Gamma^{\varepsilon xzz}_{t,t}\right),$$
$$B_2 \equiv -2\Upsilon_1^{-2}\sum_{t=1}^{T-1}\sum_{s=1}^{T-1}\Gamma^{zx\prime}_t\left(\Gamma^{zz}_t\right)^{-1}\Gamma^{\varepsilon xzz}_{t,s}\left(\Gamma^{zz}_s\right)^{-1}\Gamma^{zx}_s,$$
$$B_3 \equiv \Upsilon_1^{-2}\sum_{t=1}^{T-1}\sum_{s=t}^{T-1}\Gamma^{zx\prime}_t\left(\Gamma^{zz}_t\right)^{-1}B_{3,1}(t,s)\left(\Gamma^{zz}_s\right)^{-1}\Gamma^{zx}_s,$$
where $\Gamma^{zz}_t \equiv E[z_{it}z_{it}']$, $\Gamma^{zx}_t \equiv E[z_{it}x^*_{it}]$, $\Gamma^{\varepsilon xzz}_{t,s} \equiv E[\varepsilon^*_{it}x^*_{is}z_{it}z_{is}']$, $B_{3,1}(t,s) \equiv E\!\left[\varepsilon^*_{it}z_{it}\Gamma^{zx\prime}_s\left(\Gamma^{zz}_s\right)^{-1}z_{is}z_{is}'\right]$ and $\Upsilon_1 \equiv \sum_{t=1}^{T-1}\Gamma^{zx\prime}_t\left(\Gamma^{zz}_t\right)^{-1}\Gamma^{zx}_t$.

Proof. Available at http://econ-www.mit.edu/faculty/jhausman/papers.htm.
In Table 1, we compare the actual performance of $\hat b_{GMM}$ with the prediction of its bias based on Theorem 1. Table 1 tabulates the actual bias of the estimator approximated by 5,000 Monte Carlo runs, and compares it with the second order bias based on formula (4). It is clear that the second order theory does a reasonably good job except when $\beta$ is close to the unit circle and $n$ is small.
Theorem 1 suggests a natural way of eliminating the bias. Suppose that $\hat B_1, \hat B_2, \hat B_3$ are $\sqrt n$-consistent estimators of $B_1, B_2, B_3$. Then it is easy to see that
$$\hat b_{BC} \equiv \hat b_{GMM} - \frac1n\left(\hat B_1 + \hat B_2 + \hat B_3\right) \tag{5}$$
is first order equivalent to $\hat b_{GMM}$, and has second order bias equal to zero. Define $\hat\Gamma^{zz}_t = n^{-1}\sum_{i=1}^n z_{it}z_{it}'$, $\hat\Gamma^{zx}_t = n^{-1}\sum_{i=1}^n z_{it}x^*_{it}$, $\hat\Gamma^{\varepsilon xzz}_{t,s} = n^{-1}\sum_{i=1}^n e^*_{it}x^*_{is}z_{it}z_{is}'$ and $\hat B_{3,1}(t,s) = n^{-1}\sum_{i=1}^n e^*_{it}z_{it}\hat\Gamma^{zx\prime}_s\left(\hat\Gamma^{zz}_s\right)^{-1}z_{is}z_{is}'$, where $e^*_{it} \equiv y^*_{it} - x^*_{it}\hat b_{GMM}$. Let $\hat B_1$, $\hat B_2$ and $\hat B_3$ be defined by replacing $\Gamma^{zz}_t$, $\Gamma^{zx}_t$, $\Gamma^{\varepsilon xzz}_{t,s}$ and $B_{3,1}(t,s)$ with $\hat\Gamma^{zz}_t$, $\hat\Gamma^{zx}_t$, $\hat\Gamma^{\varepsilon xzz}_{t,s}$ and $\hat B_{3,1}(t,s)$ in $B_1$, $B_2$ and $B_3$.

Second order asymptotic theory predicts that $\hat b_{BC}$ should be approximately free of bias. We examined whether this prediction is reasonably accurate in finite samples by 5,000 Monte Carlo runs. Table 1 summarizes the properties of $\hat b_{BC}$. We have seen in Table 1 that the second order theory is reasonably accurate unless $\beta$ is close to one. It is therefore sensible to conjecture that $\hat b_{BC}$ has reasonable finite sample bias properties as long as $\beta$ is not too close to one. We verify this conjecture in Table 1.
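To illustrate how the plug-in correction (5) is assembled from sample moments, the sketch below follows the definitions above. It is our own schematic code (all variable names are ours), not an implementation supplied by the authors:

    import numpy as np

    def bias_correct(b_gmm, y_star, x_star, z):
        # y_star, x_star: lists over t = 1,...,T-1 of length-n arrays;
        # z: list of n x t matrices with rows z_it = (y_i0,...,y_i,t-1)
        Tm1, n = len(x_star), len(x_star[0])
        e = [y_star[t] - x_star[t] * b_gmm for t in range(Tm1)]     # e*_it
        Gzz = [zt.T @ zt / n for zt in z]                           # Gamma^zz_t
        Gzx = [z[t].T @ x_star[t] / n for t in range(Tm1)]          # Gamma^zx_t
        iGzz = [np.linalg.inv(G) for G in Gzz]
        def Gexzz(t, s):                                            # Gamma^{eps x zz}_{t,s}
            w = e[t] * x_star[s]
            return (z[t] * w[:, None]).T @ z[s] / n
        def B31(t, s):                                              # B_{3,1}(t,s)
            w = e[t] * (z[s] @ (iGzz[s] @ Gzx[s]))
            return (z[t] * w[:, None]).T @ z[s] / n
        Ups = sum(Gzx[t] @ iGzz[t] @ Gzx[t] for t in range(Tm1))
        B1 = sum(np.trace(iGzz[t] @ Gexzz(t, t)) for t in range(Tm1)) / Ups
        B2 = -2 * sum(Gzx[t] @ iGzz[t] @ Gexzz(t, s) @ iGzz[s] @ Gzx[s]
                      for t in range(Tm1) for s in range(Tm1)) / Ups**2
        B3 = sum(Gzx[t] @ iGzz[t] @ B31(t, s) @ iGzz[s] @ Gzx[s]
                 for t in range(Tm1) for s in range(t, Tm1)) / Ups**2
        return b_gmm - (B1 + B2 + B3) / n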
We do not expect other methods of bias correction to be much superior to $\hat b_{BC}$. Hahn, Kuersteiner, and Newey (2002) recently showed that higher order mean squared errors of bias corrected maximum likelihood estimators are invariant to the method of bias correction. In particular, they considered a third order expansion of various bias corrected estimators, and showed that the bootstrap bias corrected MLE, jackknife bias corrected MLE, and analytically bias corrected MLE all have the same higher order MSE. Their result crucially depends on the fact that the MLE is first order efficient. Because $\hat b_{GMM}$ is first order efficient, we expect their result to carry over, and the MSE of the bootstrap bias corrected estimator, say, is not expected to be substantially different from that of the analytically bias corrected estimator $\hat b_{BC}$.

However, the second order asymptotics "fails" to be a good approximation around $\beta \approx 1$.
This phenomenon can be explained by the "weak instrument" problem. See Staiger and Stock (1997). Blundell and Bond (1998) argued that the weak instrument problem can be alleviated by assuming stationarity of the initial observation $y_{i0}$. The stationarity assumption turns out to be a predominant source of information around $\beta \approx 1$, as noted by Hahn (1999). The stationarity condition may or may not be appropriate for particular applications, and substantial finite sample biases due to inconsistency will result under violation of stationarity.²

² Under stationarity, we would have $y_{i0} \sim N\!\left(\frac{\alpha_i}{1-\beta},\ \frac{\sigma^2_\varepsilon}{1-\beta^2}\right)$. We analyze departures from the stationary initial condition by assuming that instead $y_{i0} \sim N\!\left(\frac{\alpha_i}{1-\beta}F,\ \frac{\sigma^2_\varepsilon}{1-\beta^2}\right)$. Because the asymptotic bias depends on the weight matrix, we consider two versions of GMM estimators. The first uses an identity matrix as the initial weight matrix; its asymptotic bias is summarized in the upper half of Table 2. The second uses Blundell and Bond's recommendation; its asymptotic bias is summarized in the lower half of Table 2. The results show that the bias of Blundell and Bond's estimator can be substantial under misspecification.
We therefore turn to another method to overcome the weak instrument problem around the unit circle while avoiding the stationarity assumption. We demonstrate that some of the difficulties of inference around the unit circle can be alleviated by taking a long difference. To be specific, we focus on a single equation based on the long difference
$$y_{iT} - y_{i1} = \beta\left(y_{i,T-1} - y_{i0}\right) + \left(\varepsilon_{iT} - \varepsilon_{i1}\right) \tag{6}$$
It is easy to see that the initial observation $y_{i0}$ serves as a valid instrument. Using similar intuition as in Hausman and Taylor (1983) or Ahn and Schmidt (1995), we can see that the residuals $y_{i,T-1} - \beta y_{i,T-2}, \ldots, y_{i2} - \beta y_{i1}$ are valid instruments as well. We call this estimator the long difference estimator. Such additional instruments require a preliminary consistent estimator. An obvious choice is Arellano and Bover's (1995) estimator $\hat b_{GMM}$. We call the resulting estimator $\hat b_{LD-AB}$: it is 2SLS applied to (6) using $\left(y_{i0},\ y_{i,T-1} - \hat b_{GMM}\cdot y_{i,T-2},\ \ldots,\ y_{i2} - \hat b_{GMM}\cdot y_{i1}\right)$ as instruments. Such instruments were previously used in Hausman, Newey and Taylor (1987). Using residuals constructed from $\hat b_{LD-AB}$ as instruments, we can come up with another long difference estimator. Call it $\hat b_{LD1}$. We can iterate further, obtaining $\hat b_{LD2}, \hat b_{LD3}, \ldots$. Below, we provide an intuitive motivation for the long difference estimator. Higher order properties of the long difference estimator will be compared with those of other popular estimators in the next section.
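The iteration just described is simple to code. The sketch below is our own hypothetical implementation (the paper reports none): given the level data and a preliminary consistent estimate such as $\hat b_{GMM}$, it produces $\hat b_{LD-AB}$, $\hat b_{LD1}$, $\hat b_{LD2}$, and so on:

    import numpy as np

    def two_sls(y, x, Z):
        # 2SLS of y on the scalar regressor x with instrument matrix Z
        P = Z @ np.linalg.solve(Z.T @ Z, Z.T)
        return (x @ P @ y) / (x @ P @ x)

    def long_difference(y, b_init, n_iter=3):
        # y: n x (T+1) array of levels y_i0,...,y_iT; b_init: e.g. b_GMM
        T = y.shape[1] - 1
        dy = y[:, T] - y[:, 1]                      # y_iT - y_i1
        dx = y[:, T - 1] - y[:, 0]                  # y_i,T-1 - y_i0
        b = b_init
        for _ in range(n_iter):                     # b_LD-AB, b_LD1, b_LD2, ...
            resid = [y[:, t] - b * y[:, t - 1] for t in range(2, T)]
            Z = np.column_stack([y[:, 0]] + resid)  # y_i0 plus estimated residuals
            b = two_sls(dy, dx, Z)
        return b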
It is known that the bias of 2SLS (GMM) depends on four factors: the "explained" variance of the first stage reduced form equation, the covariance between the stochastic disturbance of the structural equation and that of the reduced form equation, the number of instruments, and the sample size:
$$E\left[\hat\beta_{2SLS} - \beta\right] \approx \frac1n\cdot\frac{(\text{number of instruments})\times(\text{"covariance"})}{\text{"explained" variance of the first stage reduced form equation}}$$
Similarly, the Donald-Newey (DN) (2001) MSE formula depends on the same four factors. As explained in Donald and Newey (2001), the formula is, strictly speaking, valid only when the number of instruments tends to infinity. It provides, however, a good approximation to the finite
sample behavior of GMM estimators as far as the impact of additional instruments on bias is
concerned. Kuersteiner (2000) shows that the same expression holds in a time series context if
appropriate adjustments are made for the covariance term. In order to apply the above formula
we make use of our representation of GMM as a linear combination of individual time period
GMM. We compute the bias for a single time period GMM estimator conditional on lagged
instruments. We now consider first differences (FD) and long differences (LD) to see why LD
does so much better in our Monte-Carlo experiments.
Assume that $T = 4$. The first difference setup considers the following equation among others:
$$y_4 - y_3 = \beta(y_3 - y_2) + \varepsilon_4 - \varepsilon_3 \tag{7}$$
For the RHS variable it uses the instrument equation:
$$y_3 - y_2 = (\beta - 1)y_2 + \alpha + \varepsilon_3$$
Now calculate the $R^2$ for equation (7) using the Ahn-Schmidt (AS) moments under "ideal conditions" where $\beta$ is known, in the sense that the nonlinear restrictions become linear restrictions: we would then use $(y_2, y_1, y_0)$ as instruments. Assuming stationarity for simplicity, but not using it as additional moment information, we can write $y_0 = \alpha/(1-\beta) + \xi_0$, where $\xi_0 \sim \left(0,\ \sigma^2_\varepsilon/(1-\beta^2)\right)$. It can be shown that the covariance between the structural equation error and the first stage error³ is $-\sigma^2_\varepsilon$, and the "explained variance" in the first stage is equal to $\sigma^2_\varepsilon(1-\beta)/(\beta+1)$. Therefore, the ratio that determines the bias of 2SLS is equal to $-(1+\beta)/(1-\beta)$, which is equal to $-19$ for $\beta = .9$. For $n = 100$, this implies a percentage bias of $-105.56$.

³ This covariance is conditional on the set of instruments.
We now turn to the LD setup:
$$y_4 - y_1 = \beta(y_3 - y_0) + \varepsilon_4 - \varepsilon_1$$
It can be shown that the covariance between the first stage and second stage errors is $-\beta^2\sigma^2_\varepsilon$, and the "explained variance" in the first stage is given by
$$-\sigma^2_\varepsilon\,\frac{\left(2\beta^6 - 4\beta^4 - 2\beta^5 + 4\beta^2 + 4\beta - 2\beta^3 + 6\right)\sigma^2 + \beta^6 - \beta^4 + 2 - 2\beta^3}{\left(\beta^2 - 2\beta - 3\right)\sigma^2 - 1 + \beta^2},$$
where $\sigma^2 = \sigma^2_\alpha/\sigma^2_\varepsilon$. Therefore, the ratio that determines the bias is equal to
$$-.37408 + \frac{2.5703\times 10^{-4}}{\sigma^2 + 4.8306\times 10^{-2}}$$
for $\beta = .9$. Note that the maximum value that this ratio can take in absolute terms is $.37$, which is much smaller than the $19$ obtained for the Ahn and Schmidt estimator. We therefore conclude that the long difference increases the $R^2$ but decreases the covariance, thus alleviating the "weak instruments" problem. Further, the number of instruments is smaller in the long difference specification, so we should expect even smaller bias. Thus, all of the factors in the HH (Hahn-Hausman) equation, except sample size, cause the LD estimator to have smaller bias. As we have shown, the difference in bias can be substantial.
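A quick computation reproduces the magnitudes just quoted; the LD expression is the displayed ratio above, so the snippet is a numerical check of ours rather than code from the paper:

    # bias ratios at beta = .9: FD uses -(1+beta)/(1-beta); LD uses the
    # displayed expression with sigma2 = sigma_alpha^2 / sigma_eps^2
    beta = 0.9
    fd_ratio = -(1 + beta) / (1 - beta)              # -19
    for sigma2 in (0.0, 1.0, 10.0):
        ld_ratio = -0.37408 + 2.5703e-4 / (sigma2 + 4.8306e-2)
        print(sigma2, fd_ratio, round(ld_ratio, 5))  # |LD ratio| stays near .37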
4 Higher Order Bias and MSE of Long Difference Estimator
In this section, we examine the third order expansion of the long difference estimator and compare it with two other estimators discussed in the literature. Note that unlike the approximate bias calculations of the previous section, the expansions considered here are valid for any fixed number of instruments. Exact formulas tend to be very complicated and are not reported here.⁴

⁴ Details are available at http://econ-www.mit.edu/faculty/jhausman/papers.htm.
Many econometric estimators, say $\hat\theta$ for $\theta$, based on a sample of size $n$ allow for an expansion of the form
$$\sqrt n\left(\hat\theta - \theta\right) = \theta^{(1)} + \frac{1}{\sqrt n}\theta^{(2)} + \frac1n\theta^{(3)} + o_p\!\left(\frac1n\right) \tag{8}$$
where $\theta^{(1)}$, $\theta^{(2)}$, and $\theta^{(3)}$ are of order $O_p(1)$. Typically $\theta^{(1)}$ has mean zero and converges in distribution to a normal distribution. Also, $\theta^{(3)}$ usually has mean equal to zero. Ignoring the $o_p(n^{-1})$ term and taking an expectation, we obtain the "approximate mean" of $\sqrt n(\hat\theta-\theta)$ equal to $\frac{1}{\sqrt n}E\theta^{(2)}$. We can therefore understand $\frac1n E\theta^{(2)}$ as the "second order bias" of $\hat\theta$. Ignoring the $o_p(n^{-1})$ term and taking an expectation of $\left(\sqrt n(\hat\theta-\theta)\right)^2$, we obtain the approximate MSE of $\sqrt n(\hat\theta-\theta)$ equal to
$$E\left[\left(\theta^{(1)}\right)^2\right] + \frac1n E\left[\left(\theta^{(2)}\right)^2\right] + \frac{2}{\sqrt n}E\left[\theta^{(1)}\theta^{(2)}\right] + \frac2n E\left[\theta^{(1)}\theta^{(3)}\right] + o\!\left(\frac1n\right).$$
Ignoring the $o(n^{-1})$ term, we may understand
$$\frac1n E\left[\left(\theta^{(1)}\right)^2\right] + \frac1{n^2}E\left[\left(\theta^{(2)}\right)^2\right] + \frac{2}{n\sqrt n}E\left[\theta^{(1)}\theta^{(2)}\right] + \frac{2}{n^2}E\left[\theta^{(1)}\theta^{(3)}\right] \tag{9}$$
as the approximate MSE of $\hat\theta$.

We first examine the higher order bias and MSE for $\hat b_{AS}$, the estimator proposed by Ahn and
Schmidt (1995), and $\hat b_{BB}$, the estimator proposed by Blundell and Bond (1998). Both are GMM estimators, and the third order expansion of the usual two step estimator is critically dependent on the initial weight matrices. It can be shown that the third order expansion of the "four step" GMM estimator is invariant to the initial weight matrix.⁵ Rilstone, Srivastava, and Ullah (1996) obtained higher order bias and MSE formulas for method of moments estimators. Their expressions are difficult to adapt to our situation and we have developed our own expansions. We computed (9) analytically for $T = 5$, and summarized it along with the second order bias in Table 3.⁶ Table 3 also contains the higher order properties of LD estimators. Given the two step IV nature of the LD estimators, this required a separate expansion for the IV estimator along with a third order expansion⁷ of $\hat b_{GMM}$. Unfortunately, analytic calculation of (9) for LD estimators was infeasible. We used a stochastic approximation to each term in (9) instead.⁸

We can see that the theoretical higher order MSEs of $\hat b_{LD2}$ and $\hat b_{LD3}$ are comparable to that of $\hat b_{AS}$. We can also see that the theoretical bias properties of $\hat b_{LD2}$ and $\hat b_{LD3}$ are moderately superior to those of $\hat b_{AS}$. In Table 4, we compare the theoretical higher order properties of selected estimators with their actual properties. Interestingly, we find that the actual properties of long difference estimators are much better than those predicted by higher order theory, especially around the unit circle. This suggests that Monte Carlo comparison should be preferred to any higher order expansion for ranking the various estimators. In Table 5, we examine the Monte Carlo properties of those estimators. We find that LD does better than the other estimators for large $\beta$ and not significantly worse for moderate $\beta$.

⁵ The details are available at http://econ-www.mit.edu/faculty/jhausman/papers.htm. On a related note, Newey and Smith (2001) point out that the second order expansion of the "three" step GMM estimator is invariant to the initial weight matrix.
⁶ The last column of Table 3 contains properties of $\hat b_{LDGMM}$. This is the four step GMM estimator based on the moment restrictions
$$E[y_{i0}\cdot(y_{iT} - y_{i1} - \beta(y_{i,T-1} - y_{i0}))] = 0$$
$$E[(y_{i,T-1} - \beta y_{i,T-2})\cdot(y_{iT} - y_{i1} - \beta(y_{i,T-1} - y_{i0}))] = 0$$
$$\vdots$$
$$E[(y_{i2} - \beta y_{i1})\cdot(y_{iT} - y_{i1} - \beta(y_{i,T-1} - y_{i0}))] = 0$$
Interestingly, the finite sample properties of $\hat b_{LDGMM}$ seem poorer than those of the iterated 2SLS versions of the long difference estimators.
⁷ All derivations referred to in this paragraph are available at http://econ-www.mit.edu/faculty/jhausman/papers.htm.
⁸ We perform Monte Carlo integration by generating i.i.d. sequences of $\left(\theta^{(1)}_j, \theta^{(2)}_j, \theta^{(3)}_j\right)$, $j = 1,\ldots,J$, and approximating $E\left[\left(\theta^{(1)}\right)^2\right]$ by $\frac1J\sum_{j=1}^J\left(\theta^{(1)}_j\right)^2$, for example. Our choice of $J$ was 5,000.
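Footnote 8's stochastic approximation amounts to replacing each expectation in (9) by a sample average over $J$ draws of the expansion terms. A minimal sketch of ours, with placeholder draws standing in for the actual $\left(\theta^{(1)}_j, \theta^{(2)}_j, \theta^{(3)}_j\right)$ produced by the estimator's expansion:

    import numpy as np

    def approx_mse(t1, t2, t3, n):
        # Monte Carlo version of (9) from J i.i.d. draws of the expansion terms
        return (np.mean(t1**2) / n + np.mean(t2**2) / n**2
                + 2 * np.mean(t1 * t2) / n**1.5
                + 2 * np.mean(t1 * t3) / n**2)

    rng = np.random.default_rng(0)
    t1, t2, t3 = rng.standard_normal((3, 5000))   # placeholders only; J = 5,000
    print(approx_mse(t1, t2, t3, n=100))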
5 Near Unit Root Approximation
Our Monte Carlo simulation results summarized in Table 1 indicate that the previously discussed
approximations and the bias corrections that are based on them do not work well near the unit
circle. This result occurs because the identification of the model becomes “weak” near the unit
circle. See Blundell and Bond (1998), who related the problem to the analysis by Staiger and
Stock (1997). In this Section, we formally adopt approximations local to the points in the
parameter space that are not identified. To be specific, we consider model (1) for T fixed and
n→∞ when also βn tends to unity. The type of asymptotic sequences that we consider, where
βn → 1 at the same time as n → ∞, may be interpreted in the spirit of Bekker (1994). We want
to keep certain aspects of the model, in our case the quality of the instruments, constant as the
sample size increases. The purpose of alternative asymptotic sequences is to obtain asymptotic
approximations that better reflect finite sample properties, in our case poor identification, than
standard asymptotics with fixed parameters would deliver.
A check that these alternative asymptotics provide better approximations to the finite sample
distribution near the unit circle can be obtained from comparing Tables 6 and 7 with actual
Monte Carlo results for Ahn and Schmidt and Arellano and Bond estimators for large values
of β. The predicted bias and MSE match the Monte Carlo results much better than the predictions from the Taylor series
expansions discussed earlier.
The fact that βn → 1 with n → ∞ is not meant to be a reflection of some realistic data
generating process but is an analytical device. The asymptotics that we propose are also quite
different from the near unit root literature in a pure time series context. Here, T is fixed and the
degree of temporal dependence modelled by βn is not directly relevant for our analysis. What
matters are the implications for identification as discussed before.
We analyze the bias of the associated weak instrument limit distribution. We analyze the
class of GMM estimators that exploit Ahn and Schmidt’s (1997) complete set of moment con-
ditions and show that a strict subset of the full set of moment restrictions should be used in
estimation in order to minimize bias. We show that the long difference estimator is a good
approximation to the bias minimal procedure.
Following Ahn and Schmidt we exploit the moment conditions
$$E[u_iu_i'] = \sigma^2_\varepsilon I + \sigma^2_\alpha\mathbf{1}\mathbf{1}'$$
$$E[u_iy_{i0}] = \sigma_{\alpha y_0}\mathbf{1}$$
with $\mathbf{1} = [1,\ldots,1]'$ a vector of dimension $T$ and $u_i = [u_{i1},\ldots,u_{iT}]'$. The moment conditions can be written more compactly as
$$b = \begin{bmatrix}\operatorname{vech}E[u_iu_i']\\ E[u_iy_{i0}]\end{bmatrix} = \sigma^2_\varepsilon\begin{bmatrix}\operatorname{vech}I\\ 0\end{bmatrix} + \sigma^2_\alpha\begin{bmatrix}\operatorname{vech}\mathbf{1}\mathbf{1}'\\ 0\end{bmatrix} + \sigma_{\alpha y_0}\begin{bmatrix}0\\ \mathbf{1}\end{bmatrix} \tag{10}$$
where the redundant moment conditions have been eliminated by use of the vech operator, which extracts the upper diagonal elements of a symmetric matrix. Representation (10) makes it clear that the vector $b \in \mathbb{R}^{T(T+1)/2+T}$ is contained in a 3-dimensional subspace, which is another way of stating that there are $G = T(T+1)/2 + T - 3$ restrictions imposed on $b$. This statement is equivalent to Ahn and Schmidt's (1997) analysis of the number of moment conditions.
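The dimension count behind (10) is easy to verify numerically. The sketch below, our own check, stacks the three directions on the right-hand side of (10) for $T = 4$ and confirms that $b$ lies in a 3-dimensional subspace, leaving $G = T(T+1)/2 + T - 3$ restrictions:

    import numpy as np

    T = 4
    iu = np.triu_indices(T)
    vech = lambda M: M[iu]                         # upper-triangular elements, as in (10)
    one = np.ones((T, 1))
    b_basis = np.column_stack([
        np.concatenate([vech(np.eye(T)), np.zeros(T)]),            # sigma_eps^2
        np.concatenate([vech(one @ one.T), np.zeros(T)]),          # sigma_alpha^2
        np.concatenate([np.zeros(T * (T + 1) // 2), np.ones(T)]),  # sigma_alpha_y0
    ])
    print(np.linalg.matrix_rank(b_basis))          # 3
    print(T * (T + 1) // 2 + T - 3)                # G = 11 for T = 4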
GMM estimators are obtained from the moment conditions by eliminating the unknown parameters $\sigma^2_\varepsilon$, $\sigma^2_\alpha$ and $\sigma_{\alpha y_0}$. The set of all GMM estimators leading to consistent estimates of $\beta$ can therefore be described by a $(T(T+1)/2+T)\times G$ matrix $A$ which contains all the vectors spanning the orthogonal complement of $b$. This matrix $A$ satisfies
$$b'A = 0.$$
For our purposes it will be convenient to choose $A$ such that
$$b'A = \left[E[u_{it}\Delta u_{is}],\ E[u_{iT}\Delta u_{ij}],\ E[\bar u_i\Delta u_{ik}],\ E[\Delta u_i'y_{i0}]\right],$$
$$s = 2,\ldots,T;\ t = 1,\ldots,s-2;\ j = 2,\ldots,T-1;\ k = 2,\ldots,T,$$
where $\Delta u_i = [u_{i2} - u_{i1},\ldots,u_{iT} - u_{i,T-1}]'$ and $\bar u_i = T^{-1}\sum_{t=1}^T u_{it}$. It becomes transparent that any other representation of the moment conditions can be obtained by applying a corresponding nonsingular linear operator $C$ to the matrix $A$. It can be checked that there exists a nonsingular matrix $C$ such that $b'AC = 0$ is identical to the moment conditions (4a)-(4c) in Ahn and Schmidt (1997).
We investigate the properties of (infeasible) GMM estimators based on
$$E[u_{it}\Delta u_{is}(\beta)] = 0,\quad E[u_{iT}\Delta u_{ij}(\beta)] = 0,\quad E[\bar u_i\Delta u_{ik}(\beta)] = 0,\quad E[y_{i0}\Delta u_{it}(\beta)] = 0$$
obtained by setting $\Delta u_{it}(\beta) \equiv \Delta y_{it} - \beta\Delta y_{i,t-1}$. Here, we assume that the instruments $u_{it}$ are observable. We partition the moment conditions into two groups, where the second group only contains instruments involving $y_{i0}$. In particular, let $g_{i1}(\beta)$ denote a column vector consisting of $u_{it}\Delta u_{is}(\beta)$, $u_{iT}\Delta u_{ij}(\beta)$, $\bar u_i\Delta u_{ik}(\beta)$. Also let $g_{i2}(\beta) \equiv [y_{i0}\Delta u_i(\beta)]$. It will turn out that only these instruments are asymptotically relevant under near unit root asymptotics. Finally, let $g_n(\beta) \equiv n^{-3/2}\sum_{i=1}^n\left[g_{i1}(\beta)',\ g_{i2}(\beta)'\right]'$ with the optimal weight matrix $\Omega_n \equiv E[g_i(\beta_n)g_i(\beta_n)']$.

The infeasible GMM estimator for a possibly transformed set of moment conditions $C'g_n(\beta)$ then solves
$$\hat\beta_{2SLS} = \arg\min_\beta\ g_n(\beta)'C\left(C'\Omega_nC\right)^+C'g_n(\beta) \tag{11}$$
where $C$ is a $G\times r$ matrix for $1 \le r \le G$ such that $C'C = I_r$ and $\operatorname{rank}\left(C(C'\Omega C)^+C'\right) \ge 1$. We use $(C'\Omega C)^+$ to denote the Moore-Penrose inverse. We thus allow the use of a singular weight matrix. Choosing $r$ less than $G$ allows us to exclude certain moment conditions. Let $f_{i,1} \equiv -\partial g_{i1}(\beta)/\partial\beta$, $f_{i,2} \equiv -\partial g_{i2}(\beta)/\partial\beta$, and $f_n \equiv n^{-3/2}\sum_{i=1}^n\left[f_{i,1}',\ f_{i,2}'\right]'$. The infeasible 2SLS estimator can be written as
$$\hat\beta_{2SLS} - \beta_n = \left(f_n'C\left(C'\Omega_nC\right)^+C'f_n\right)^{-1}f_n'C\left(C'\Omega_nC\right)^+C'g_n(\beta_n). \tag{12}$$
We now analyze the behavior of $\hat\beta_{2SLS} - \beta_n$ under local to unity asymptotics. We make the following additional assumptions.⁹

Condition 3 Let $y_{it} = \alpha_i + \beta_ny_{i,t-1} + \varepsilon_{it}$ with $\varepsilon_{it} \sim N(0,\sigma^2_\varepsilon)$, $\alpha_i \sim N(0,\sigma^2_\alpha)$ and $y_{i0} \sim N\!\left(\frac{\alpha_i}{1-\beta_n},\ \frac{\sigma^2_\varepsilon}{1-\beta_n^2}\right)$, where $\beta_n = \exp(-c/n)$ for some $c > 0$.

The critical assumption in Condition (3) is that $\beta_n = \exp(-c/n) = 1 - c/n + o(n^{-1})$. The parameter $\beta_n$ controls the degree of correlation between regressors and instruments. For the case of GMM estimators based on first differences this correlation amounts to $E[\Delta y_ty_{t-2}] = O(\beta_n - 1) = O(n^{-1})$. Our specification thus implies an asymptotic lack of correlation more stringent than the one used by Staiger and Stock (1997). In Hahn and Kuersteiner (2002) it is argued that the faster $n^{-1}$-rate corresponds to a situation of near non-identification. Numerical and analytical calculations¹⁰ in the context of our dynamic panel model show that $n^{-1/2}$-rate asymptotics do not capture the type of finite sample biases that we find important for the behavior of GMM estimators and that the specification $\beta_n = \exp(-c/n)$ delivers more accurate predictions.

⁹ Kruiniger (2000) considers similar local-to-unity asymptotics.
¹⁰ Available at http://econ-www.mit.edu/faculty/jhausman/papers.htm.

Under the generating mechanism described in the previous definition the following Lemma can be established.
Lemma 1 Assume $\beta_n = \exp(-c/n)$ for some $c > 0$. For $T$ fixed and as $n\to\infty$,
$$n^{-3/2}\sum_{i=1}^n f_{i,1} \xrightarrow{p} 0, \qquad n^{-3/2}\sum_{i=1}^n g_{i,1}(\beta_0) \xrightarrow{p} 0$$
and
$$n^{-3/2}\sum_{i=1}^n\left[f_{i,2}',\ g_{i,2}'(\beta_0)\right]' \xrightarrow{d} \left[\xi_x',\ \xi_y'\right]'$$
where $\left[\xi_x',\xi_y'\right]' \sim N(0,\Sigma)$ with
$$\Sigma = \begin{pmatrix}\Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22}\end{pmatrix}$$
and $\Sigma_{11} = \delta I$, $\Sigma_{12} = \delta M_1'$, $\Sigma_{22} = \delta M_2$, where
$$\delta = \frac{\sigma^2_\alpha\sigma^2_\varepsilon}{c^2},$$
$$M_1 = \begin{pmatrix}-1 & 1 & & 0\\ & \ddots & \ddots & \\ & & \ddots & 1\\ 0 & & & -1\end{pmatrix}, \qquad M_2 = \begin{pmatrix}2 & -1 & & 0\\ -1 & \ddots & \ddots & \\ & \ddots & \ddots & -1\\ 0 & & -1 & 2\end{pmatrix}$$
and $\Sigma_{12} = \Sigma_{21}'$. We also have
$$\frac{1}{n^2}\Omega_n = \begin{pmatrix}0 & 0\\ 0 & \Sigma_{22}\end{pmatrix} + o(1).$$
Proof. See Appendix A.
It can be shown that the limiting normal distribution of Lemma (1) is singular. An alternative representation of the limiting distribution can be given by eliminating redundant dimensions from $\left[\xi_x',\ \xi_y'\right]'$. Let $\xi_y = \left[\xi_{y0}',\ \xi_{y1}\right]'$ where $\xi_{y1}$ is the last element of $\xi_y$, and let $\xi_u = \left[\xi_x',\ \xi_{y1}\right]'$. Then $\xi_y = H\xi_u$, where $H$ is a $(T-1)\times T$ matrix defined as
$$H = \begin{pmatrix}-1 & 1 & & & 0\\ & \ddots & \ddots & & \vdots\\ 0 & & -1 & 1 & 0\\ 0 & \cdots & & 0 & 1\end{pmatrix}.$$
It can be checked easily that the covariance matrix of the vector $\left[\xi_x',\ \xi_u'H'\right]'$ is $\Sigma$, as required. Using Lemma (1), the limiting distribution of $\hat\beta_{2SLS} - \beta_n$ is stated in the next corollary. For this purpose we define the augmented vectors $\xi_x^{\#} = [0,\ldots,0,\xi_x']'$ and $\xi_y^{\#} = [0,\ldots,0,\xi_y']'$ and partition $C = [C_0', C_1']'$ such that $C'\xi_x^{\#} = C_1'\xi_x$. Let $r_1$ denote the rank of $C_1$.

Corollary 1 Let $\hat\beta_{2SLS} - \beta_n$ be given by (12). If Condition 3 is satisfied then
$$\hat\beta_{2SLS} - 1 \xrightarrow{d} \frac{\xi_x'C_1\left(C_1'\Sigma_{22}C_1\right)^+C_1'H\xi_u}{\xi_x'C_1\left(C_1'\Sigma_{22}C_1\right)^+C_1'\xi_x} = X(C,\Sigma_{22}) \tag{13}$$
Unlike the limiting distribution for the standard weak instrument problem, X (C,Σ22), as
defined in (13), is based on normal vectors that have zero mean. This degeneracy is generated
by the presence of the fixed effect in the initial condition, scaled up appropriately to satisfy the
stationarity requirement for the process yit. Inspection of the proof shows that the usual concen-
tration parameter appearing in the limit distribution is dominated by a stochastic component
related to the fixed effect.
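The limit (13) is straightforward to simulate. The sketch below is our own construction: it uses the representation $\xi_y = M_1\xi_x + e_{T-1}\eta$ with $\eta \sim N(0,\delta)$ independent of $\xi_x \sim N(0,\delta I)$, which reproduces $\Sigma_{11} = \delta I$, $\Sigma_{12} = \delta M_1'$, $\Sigma_{22} = \delta M_2$ and $E[\xi_y|\xi_x] = M_1\xi_x$, and evaluates $X(C,\Sigma_{22})$ for the long difference choice of $C_1$:

    import numpy as np

    def limit_bias_sim(T, delta, C1, reps=100_000, seed=0):
        # draws from X(C, Sigma22) in (13); the ratio is invariant to delta
        rng = np.random.default_rng(seed)
        M1 = -np.eye(T - 1) + np.diag(np.ones(T - 2), 1)   # -1 diagonal, 1 above
        M2 = (2 * np.eye(T - 1) - np.diag(np.ones(T - 2), 1)
              - np.diag(np.ones(T - 2), -1))
        W = C1 @ np.linalg.pinv(C1.T @ (delta * M2) @ C1) @ C1.T
        draws = np.empty(reps)
        for r in range(reps):
            xi_x = np.sqrt(delta) * rng.standard_normal(T - 1)
            xi_y = M1 @ xi_x
            xi_y[-1] += np.sqrt(delta) * rng.standard_normal()  # the xi_y1 part
            draws[r] = (xi_x @ W @ xi_y) / (xi_x @ W @ xi_x)
        return draws

    T = 5
    ld_draws = limit_bias_sim(T, delta=1.0, C1=np.ones((T - 1, 1)))
    print(ld_draws.mean())        # simulated bias of the long difference limit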
Based on Corollary 1, we define the following class of 2SLS estimators for the dynamic panel
model. The class contains estimators that only use a nonredundant set of moment conditions
involving the initial conditions yi0.
Definition 1 Let $\hat\beta^*_{2SLS}$ be defined as $\hat\beta^*_{2SLS} \equiv \arg\min_\beta\ g_{2,n}(\beta)'C_1\left(C_1'\tilde\Omega C_1\right)^{-1}C_1'g_{2,n}(\beta)$, where $g_{2,n}(\beta) \equiv n^{-3/2}\sum_{i=1}^n g_{i2}(\beta)$, $\tilde\Omega$ is a symmetric positive definite $(T-1)\times(T-1)$ matrix of constants and $C_1$ is a $(T-1)\times r_1$ matrix of full column rank $r_1 \le T-1$ such that $C_1'C_1 = I$.
Next we turn to the analysis of the asymptotic bias of the estimator $\hat\beta^*_{2SLS}$ of the dynamic panel model. Since the limit only depends on zero mean normal random vectors we can apply the results of Smith (1993).

Theorem 2 Let $X^*(C_1,\tilde\Omega)$ be the limiting distribution of $\hat\beta^*_{2SLS} - 1$ in Definition 1 under Condition 3. Let $\bar D \equiv (D + D')/2$, where $D = WM_1$ and $W = C_1\left(C_1'\tilde\Omega C_1\right)^{-1}C_1'$. Let $\Gamma$ be the matrix of eigenvectors of $W$ with corresponding eigenvalues $\lambda_i$. Assume $\lambda_1 \ge \ldots \ge \lambda_{T-1}$. Let $\Lambda = \operatorname{diag}(\lambda_1,\ldots,\lambda_{T-1})$ and let $\Lambda_1$ be the upper left block of $\Lambda$ containing all the non-zero eigenvalues. Partition $\Gamma = [\Gamma_1,\Gamma_2]$ where $\Gamma_1$ are the eigenvectors corresponding to the nonzero eigenvalues of $W$ such that $\Gamma_1\Lambda_1\Gamma_1' = W$ and $\Gamma_1\Gamma_1' + \Gamma_2\Gamma_2' = I$. Then
$$E\left[X\left(C_1,\tilde\Omega\right)\right] = \mathcal{M}_1\left(\lambda^{-1},\Gamma_1'\bar D\Gamma_1,\Lambda_1,r\right) \equiv \lambda^{-1}\sum_{k=0}^\infty\frac{(1)_k\left(\tfrac12\right)_{1+k}}{\left(\tfrac r2\right)_{1+k}k!}\,C^{1,k}_{1+k}\left(\Gamma_1'\bar D\Gamma_1,\ I_r - \lambda^{-1}\Lambda_1\right)$$
where $r = \operatorname{rank}(W)$, $(a)_b$ is the Pochhammer symbol $\Gamma(a+b)/\Gamma(a)$, $C^{1,k}_{p+k}(\cdot,\cdot)$ is a top order invariant polynomial defined by Davis (1980), and $\lambda$ is the largest eigenvalue of $W$. The mean $E\left[X\left(C_1,\tilde\Omega\right)\right]$ exists for $r \ge 1$.
Proof. See Appendix A.
The Theorem shows that the bias of $\hat\beta^*_{2SLS}$ depends both on the choice of $C_1$ and on the weight matrix $\tilde\Omega$. Note for example that $E[X(C_1,I_{T-1})] = \operatorname{tr}\bar D/r_1$.

The problem of minimizing the bias by choosing optimal matrices $C_1$ and $\tilde\Omega$ does not seem to lead to an analytical solution, but could in principle be carried out numerically for a given number of time periods $T$. For our purpose we are not interested in such an exact minimum. Instead we analyze optimal choices of $C_1$ minimizing bias for a given weight matrix. Two choices of $\tilde\Omega$ are most relevant in practice, namely $\tilde\Omega = I_{T-1}$ and $\tilde\Omega = \Sigma_{22}$. We are able to obtain an analytical solution for the bias minimal estimator. We use this analytical optimum as a benchmark against which we judge existing estimators that have been proposed in the literature.
Theorem 3 Let $X(C_1,\tilde\Omega)$ be as defined in Definition 1. Let $\bar D = (D + D')/2$ where $D = C_1\left(C_1'\tilde\Omega C_1\right)^{-1}C_1'M_1$. Let $\Gamma_1$ be as defined in Theorem 2. Then
$$\min_{\substack{C_1\ \mathrm{s.t.}\ C_1'C_1 = I_{r_1}\\ r_1 = 1,\ldots,T-1}}\left|E[X(C_1,I_{T-1})]\right| = \min_{\substack{C_1\ \mathrm{s.t.}\ C_1'C_1 = I_{r_1}\\ r_1 = 1,\ldots,T-1}}\left|E[X(C_1,\Sigma_{22})]\right|.$$
Moreover,
$$E[X(C_1,I_{T-1})] = \operatorname{tr}\bar D/r_1.$$
Let $C_1^* = \arg\min_{C_1}|E[X(C_1,I_{T-1})]|$ subject to $C_1'C_1 = I_{r_1}$, $r_1 = 1,\ldots,T-1$. Then $C_1^* = \rho_i$, where $\rho_i$ is the eigenvector corresponding to the smallest eigenvalue $l_i$ of $M_2$. Thus, $\min_{C_1}\left|\operatorname{tr}\bar D/r_1\right| = \min l_i/2$. As $T\to\infty$ the smallest eigenvalue of $M_2$, $\min l_i$, tends to 0.
Theorem 3 shows that the estimator that minimizes the bias is based on only a single moment condition, which is a linear combination of the moment conditions involving $y_{i0}$ as instrument, where the weights are the elements of the eigenvector $\rho_i$ corresponding to the smallest eigenvalue of $M_2$.
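Since $M_2$ is the standard tridiagonal matrix with 2 on the diagonal and $-1$ off it, the bias minimal moment condition is easy to compute. A short sketch of ours finds $\rho_i$, the minimal bias $\min l_i/2$, and checks the bound $\min l_i \le 2/(T-1)$ established in the Appendix:

    import numpy as np

    def optimal_C1(T):
        M2 = (2 * np.eye(T - 1) - np.diag(np.ones(T - 2), 1)
              - np.diag(np.ones(T - 2), -1))
        eigval, eigvec = np.linalg.eigh(M2)        # eigenvalues in ascending order
        return eigval[0], eigvec[:, 0]             # min l_i and its eigenvector rho_i

    for T in (5, 10, 20):
        l_min, rho = optimal_C1(T)
        print(T, l_min / 2, 2 / (T - 1))           # minimal bias vs. the bound

The minimal bias $\min l_i/2 = 1 - \cos(\pi/T)$ shrinks as $T$ grows, consistent with the statement of Theorem 3.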
Optimal estimators based on C∗1 are unintuitive to implement. Moreover, optimality only
holds near the unit circle. For these reasons we do not recommend direct implementation of
these optimal procedures. We instead use them as benchmarks against which we measure the
performance of more easily implementable estimators.
We turn now to an analysis of the bias and mean squared error of commonly used estimators under near unit root asymptotics. Let $\mathbf{1}_k' = [1,\ldots,1]$ be a $k$-vector of ones and note that for the Arellano and Bond estimator
$$C_1^{AB} = \begin{pmatrix}0 & \cdots & 0 & \mathbf{1}_{T-1}'\\ \vdots & & \mathbf{1}_2' & 0\\ 0 & ⋰ & & \vdots\\ 1 & 0 & \cdots & 0\end{pmatrix}$$
is a $(T-1)\times((T-1)T/2)$ matrix of rank $T-1$. Correspondingly, for the Ahn and Schmidt estimator, $C_1^{AS} = \left[C_1^{AB};\ \mathbf{0}_{(T-1)\times(2T-3)}\right]$, a $(T-1)\times((T+1)T/2+T-3)$ matrix of rank $T-1$. The long difference estimator takes the form
$$C_1^{LD} = \left[\mathbf{1}_{T-1};\ \mathbf{0}_{(T-1)\times(T-2)}\right]$$
where now $W^{LD} = C_1^{LD}\left(C_1^{LD\prime}\Sigma_{22}C_1^{LD}\right)^+C_1^{LD\prime}$ is singular of rank 1. The next result gives explicit formulas for the bias of the asymptotic limiting distribution of the Ahn and Schmidt, Arellano and Bond and long difference estimators.
Corollary 2 Let $X\left(C_1^{AS},\Sigma_{22}\right)$, $X\left(C_1^{AB},\Sigma_{22}\right)$ and $X\left(C_1^{LD},\Sigma_{22}\right)$ be the limiting distributions of the Ahn-Schmidt, Arellano-Bond and long difference estimators under Condition 3. Let $\bar D^a \equiv \left(D^a + D^{a\prime}\right)/2$, where $D^a = W^aM_1$ and $W^a = C_1^a\left(C_1^{a\prime}\Sigma_{22}C_1^a\right)^+C_1^{a\prime}$ for $a = $ "AS", "AB", "LD". Let $\lambda_a$ be the largest eigenvalue of $W^a$. Then for $a = $ "AS", "AB",
$$E\left[X\left(C_1^a,\Sigma_{22}\right)\right] = \mathcal{M}_1\left(\lambda_a^{-1},\ \bar D^a,\ W^a,\ T-1\right).$$
Let $\Gamma$ be the matrix of eigenvectors of $W^{LD}$ with $\Gamma = [\Gamma_1,\Gamma_2]$, where $\Gamma_1$ is the eigenvector corresponding to the nonzero eigenvalue of $W^{LD}$ such that $\Gamma_1\Lambda_1\Gamma_1' = W^{LD}$ and $\Gamma_1\Gamma_1' + \Gamma_2\Gamma_2' = I$. Define $z_1 = L^{-1}\Gamma_1'\xi_x$ and $z_2 = L^{-1}\Gamma_2'\xi_x$. Then, for $a = $ "LD",
$$E\left[X\left(C_1^a,\Sigma_{22}\right)\right] = \mathcal{M}_1\left(\lambda_a^{-1},\ \Gamma_1'\bar D^a\Gamma_1,\ \Lambda_1,\ 1\right).$$
Numerical procedures to evaluate the top order invariant polynomials appearing in the ex-
pressions for the bias are discussed in Smith (1993). Table 6 reports numerical values for the
biases of the Arellano and Bond and Ahn and Schmidt estimators under near unit root as-
ymptotics and compares them with biases for the long difference estimator as well as the bias
minimal estimator. Although the long difference estimator does not achieve the same bias reduction as the fully optimal estimator, it has significantly less bias than the more commonly used
implementations of the GMM estimator.
We now turn to the analysis of the mean squared error implied by the asymptotic distribution under Condition (3). Since $\hat\beta^*_{2SLS} - 1 \xrightarrow{d} X^*(C_1,\Sigma_{22})$ we take $E\left[X^*(C_1,\Sigma_{22})^2\right]$ as a measure of the MSE of $\hat\beta^*_{2SLS}$. Unlike in the case of bias, it is not feasible to find analytical optima for the problem of minimizing $E\left[X^*(C_1,\Sigma_{22})^2\right]$ with respect to $C_1$.

Nevertheless, for a given value of $T$ and a particular point in the parameter space, such optimization can be carried out numerically by Monte Carlo integration techniques and subsequent numerical optimization. We carry out such a numerical exercise to establish that conventional estimators which are optimal under standard first order asymptotic theory fail to be admissible under weak identification circumstances. In our numerical examples we calibrate the asymptotic distribution to the case where $\beta = .95$ and $n = 100$. This implies a value of $c = 100(-\log .95) \approx 5.13$. In Table 7 we compare $E\left[X(C_1^*,\Sigma_{22})^2\right]$ to $E\left[X\left(C_1^{AS},\Sigma_{22}\right)^2\right]$ for different values of $T$. Here
$$C_1^* = \arg\min_{\substack{C_1\ \mathrm{s.t.}\ C_1'C_1 = I_{r_1}\\ r_1 = 3,\ldots,T-1}}E\left[X(C_1,\Sigma_{22})^2\right].$$
Note that the minimal dimension of $C_1$ is 3 to guarantee the existence of the second moment. The results of Table 7 show that the MSE of the Ahn and Schmidt estimator is clearly dominated by the optimal procedure. It also shows that the optimal number of moment conditions increases slowly with $T$ and is much lower than the number of moments used by Ahn and Schmidt. In fact, the optimal procedure uses only about 1/10 of the available moments.
These results show that the optimality arguments under standard asymptotic approxima-
tions break down when the model is weakly identified. While it is optimal from an asymptotic
variance and thus MSE point of view under standard asymptotics to use as many instruments as
possible, this argument is invalid under the near non-identification asymptotics we use here. The
GMM estimator that minimizes the asymptotic MSE under these circumstances uses a subset of
moment conditions that is much smaller than the set of all available moment conditions. Some
intuition can be gained from our analytical results regarding bias minimization. In Theorem (3) it is shown that the bias minimal procedure in fact only uses a single moment condition. When focusing on the MSE, this bias reduction needs to be balanced against efficiency considerations.
The resulting optimum lies somewhere between estimators that use all and estimators that use
very little information.
However, we cannot compare the MSE of the Ahn and Schmidt and Arellano and Bond esti-
mators to our long difference estimator analytically because under the near unit root asymptotic
limiting distribution the long difference estimator has unbounded second moments. At the same
time the Monte Carlo results are consistent with the analytic result that the optimal estimator
does not use all the moment conditions. Indeed, the optimal estimator uses only a small fraction
of the moment conditions. The long difference estimator does well in terms of MSE because of
its small amount of bias and its use of a limited set of moment conditions.
Implementation of a procedure that is optimal in terms of its MSE under non-standard
asymptotics is not recommended because such a procedure depends on the unknown parameter
β through δ. Instead, our Monte Carlo results show that the long difference estimator is a
reasonable, very easily implementable approximation to such an optimal estimator.
6 Conclusion
We have considered the dynamic panel data model where it has been recognized that the usual
IV (GMM) estimators have substantial bias for a large positive β, which commonly occurs in
applied research. We consider two techniques to solve the bias problem. First we propose
a second-order bias corrected IV estimator. However, this estimator does not solve the bias
problem for large β, as is demonstrated by Monte-Carlo experiments. Our second approach is
an IV estimator that uses a reduced set of instruments, in particular “long differences”. This
estimator leads to a significant reduction in bias as predicted by a “weak instrument” analysis.
Further, the long difference estimator does well in MSE comparisons with existing estimators.
We then conduct a “local to unity” analysis to analyze why the long difference estimator
does significantly better than traditional second order theory predicts. We find that near the
unit circle the optimal estimator in terms of bias uses only one moment restriction, similar to
the long difference estimator. We also find that the optimal estimator in terms of MSE uses
a much more restricted set of moment restrictions, typically about 1/10 of the available restrictions.
Thus, our analysis demonstrates that the usual first order asymptotic advice of using all of the
moment restrictions does not provide proper guidance in the dynamic panel data model for large
β. Instead, a restricted set of moment conditions leads to a better estimator. The long difference
estimator, which uses a restricted set of moment conditions, provides such an improved estimator
as our Monte-Carlo results demonstrate.
A Technical Appendix
Proof of Lemma 1. Let $\xi_{i0} \equiv y_{i0} - \alpha_i/(1-\beta_n)$ such that $\xi_{i0} \sim N\!\left(0,\ \sigma^2_\varepsilon/(1-\beta_n^2)\right)$. Note that
$$\Delta y_{it} = \beta_n^{t-1}(\beta_n - 1)\xi_{i0} + \varepsilon_{it} + (\beta_n - 1)\sum_{s=1}^{t-1}\beta_n^{s-1}\varepsilon_{i,t-s}$$
such that
$$E[|u_{it}\Delta y_{i,s-1}|] \le \sqrt{E[u_{it}^2]}\sqrt{E\left[(\Delta y_{i,s-1})^2\right]} = \sqrt{\sigma^2_\varepsilon + \sigma^2_\alpha}\sqrt{E\left(\beta_n^{s-2}(\beta_n-1)\xi_{i0} + \varepsilon_{i,s-1} + (\beta_n-1)\sum_{r=1}^{s-2}\beta_n^{r-1}\varepsilon_{i,s-1-r}\right)^2}$$
$$= \sqrt{\sigma^2_\varepsilon + \sigma^2_\alpha}\sqrt{\beta_n^{2(s-2)}\frac{\sigma^2_\varepsilon(\beta_n-1)^2}{1-\beta_n^2} + \sigma^2_\varepsilon + (\beta_n-1)^2\sigma^2_\varepsilon\sum_{r=1}^{s-2}\beta_n^{2(r-1)}} = O(1).$$
By independence of $u_{it}\Delta y_{i,s-1}$ across $i$, it therefore follows that $n^{-3/2}\sum_{i=1}^n u_{it}\Delta y_{i,s-1} = o_p(1)$. By the same reasoning, we obtain $n^{-3/2}\sum_{i=1}^n u_{iT}\Delta y_{i,j-1} = o_p(1)$ and $n^{-3/2}\sum_{i=1}^n \bar u_i\Delta y_{i,k-1} = o_p(1)$. We therefore obtain $n^{-3/2}\sum_{i=1}^n f_{i,1} = o_p(1)$. We can similarly obtain $n^{-3/2}\sum_{i=1}^n g_{i,1} = o_p(1)$.

Next we consider $n^{-3/2}\sum_{i=1}^n f_{i,2}$ and $n^{-3/2}\sum_{i=1}^n g_{i,2}$. Note that
$$E[\Delta y_{it}y_{i0}] = E\left[y_{i0}\left(\beta_n^{t-1}(\beta_n-1)\xi_{i0} + \varepsilon_{it} + (\beta_n-1)\sum_{r=1}^{t-1}\beta_n^{r-1}\varepsilon_{i,t-r}\right)\right] = \beta_n^{t-1}\sigma^2_\varepsilon\frac{\beta_n-1}{1-\beta_n^2} = \sigma^2_\varepsilon\frac{-c/n}{2c/n} + o(1) = -\frac{\sigma^2_\varepsilon}{2} + o(1)$$
and
$$E\left[(\Delta y_{it}y_{i0})^2\right] = E\left[y_{i0}^2\left(\beta_n^{t-1}(\beta_n-1)\xi_{i0} + \varepsilon_{it} + (\beta_n-1)\sum_{r=1}^{t-1}\beta_n^{r-1}\varepsilon_{i,t-r}\right)^2\right]$$
$$= \beta_n^{2(t-1)}\frac{(\beta_n-1)^2\sigma^2_\varepsilon}{1-\beta_n^2}\frac{\sigma^2_\alpha}{(1-\beta_n)^2} + 3\beta_n^{2(t-1)}\frac{\sigma^4_\varepsilon(\beta_n-1)^2}{\left(1-\beta_n^2\right)^2} + \left(\sigma^2_\varepsilon + (\beta_n-1)^2\sigma^2_\varepsilon\sum_{r=1}^{t-2}\beta_n^{2(r-1)}\right)\left(\frac{\sigma^2_\alpha}{(1-\beta_n)^2} + \frac{\sigma^2_\varepsilon}{1-\beta_n^2}\right) = \frac{\sigma^2_\varepsilon\sigma^2_\alpha}{c^2}n^2 + O(n),$$
such that $\operatorname{Var}\left(n^{-3/2}\sum_{i=1}^n\Delta y_{it}y_{i0}\right) = O(1)$. For $n^{-3/2}\sum_{i=1}^n g_{i,2}(\beta_0)$ we have from the moment conditions that $E[g_{i,2}(\beta_0)] = 0$ and
$$\operatorname{Var}(\Delta u_i(\beta_0)y_{i0}) = 2\sigma^2_\varepsilon\sigma^2_\alpha(1-\beta_n)^{-2} + O(n) = \frac{2\sigma^2_\varepsilon\sigma^2_\alpha}{c^2}n^2 + O(n).$$
The joint limiting distribution of $n^{-3/2}\sum_{i=1}^n\left[f_{i,2}' - E[f_{i,2}'],\ g_{i,2}(\beta_0)'\right]'$ can now be obtained from a triangular array CLT. By previous arguments
$$E\left[f_{i,2}',\ g_{i,2}(\beta_0)'\right] = \left[\mu'\ \ 0\ \cdots\ 0\right]$$
with $\mu = -\left(\sigma^2_\varepsilon/2\right)\iota + O(n^{-1})$, where $\iota$ is the $(T-1)$-dimensional vector with all elements equal to 1. Then
$$E\left[\left(f_{i,2}' - E[f_{i,2}'],\ g_{i,2}(\beta_0)'\right)'\left(f_{i,2}' - E[f_{i,2}'],\ g_{i,2}(\beta_0)'\right)\right] = \Sigma_n$$
where
$$\Sigma_n = \begin{pmatrix}\Sigma_{11,n} & \Sigma_{12,n}\\ \Sigma_{21,n} & \Sigma_{22,n}\end{pmatrix}.$$
By previous calculations we have found the diagonal elements of $\Sigma_{11,n}$ and $\Sigma_{22,n}$ to be $\frac{\sigma^2_\varepsilon\sigma^2_\alpha}{c^2}n^2$ and $\frac{2\sigma^2_\varepsilon\sigma^2_\alpha}{c^2}n^2$. The off-diagonal elements of $\Sigma_{11,n}$ are found to be
$$E\left[\Delta y_{it}\Delta y_{is}y_{i0}^2\right] = E\left[y_{i0}^2\left(\beta_n^{s-1}(\beta_n-1)\xi_{i0} + \varepsilon_{is} + (\beta_n-1)\sum_{r=1}^{s-1}\beta_n^{r-1}\varepsilon_{i,s-r}\right)\left(\beta_n^{t-1}(\beta_n-1)\xi_{i0} + \varepsilon_{it} + (\beta_n-1)\sum_{r=1}^{t-1}\beta_n^{r-1}\varepsilon_{i,t-r}\right)\right]$$
$$= \beta_n^{t-1}\beta_n^{s-1}\frac{(\beta_n-1)^2}{1-\beta_n^2}\left(\frac{\sigma^2_\varepsilon\sigma^2_\alpha}{(1-\beta_n)^2} + 3\frac{\sigma^4_\varepsilon}{1-\beta_n^2}\right) + O(1) = \frac{\sigma^2_\varepsilon\sigma^2_\alpha}{2c}n + O(1),$$
which is of lower order of magnitude, while $n^{-1}\left(E[\Delta y_{it}y_{i0}]\right)^2 = O(1)$. Thus $n^{-2}\Sigma_{11,n} \to \operatorname{diag}\left(\frac{\sigma^2_\varepsilon\sigma^2_\alpha}{c^2},\ldots,\frac{\sigma^2_\varepsilon\sigma^2_\alpha}{c^2}\right)$. The off-diagonal elements of $\Sigma_{22,n}$ are obtained from
$$E\left[\Delta u_{it}\Delta u_{is}y_{i0}^2\right] = \begin{cases}-\sigma^2_\varepsilon\sigma^2_\alpha(1-\beta_n)^{-2} + O(n) & t = s+1 \text{ or } t = s-1\\ 0 & \text{otherwise.}\end{cases}$$
For $\Sigma_{12,n}$, we consider
$$E\left[\Delta y_{it}\Delta u_{is}y_{i0}^2\right] = \begin{cases}\frac{\sigma^2_\varepsilon\sigma^2_\alpha}{c^2}n^2 + O(n) & \text{if } t = s\\ -\frac{\sigma^2_\varepsilon\sigma^2_\alpha}{c^2}n^2 + O(n) & \text{if } t = s-1\\ 0 & \text{otherwise.}\end{cases}$$
It then follows that for $\ell \in \mathbb{R}^{2(T-1)}$ such that $\ell'\ell = 1$, $n^{-1/2}\sum_{i=1}^n\ell'\Sigma_n^{-1/2}\left[f_{i,2}' - Ef_{i,2}',\ g_{i,2}(\beta_0)'\right]' \xrightarrow{d} N(0,1)$ by the Lindeberg-Feller CLT for triangular arrays. It then follows from a straightforward application of the Cramer-Wold theorem and the continuous mapping theorem that $n^{-3/2}\sum_{i=1}^n\left[f_{i,2}',\ g_{i,2}(\beta_0)'\right]' \xrightarrow{d} \left[\xi_x',\ \xi_y'\right]'$ where $\left[\xi_x',\xi_y'\right]' \sim N(0,\Sigma)$. Note that $n^{-3/2}\sum_{i=1}^n\ell'E[f_{i,2}] = O(n^{-1/2})$ and thus does not affect the limit distribution.
Finally note that $E\left[g_{i,1}g_{i,1}'\right] = O(1)$. Also note that
$$\operatorname{Var}(\Delta u_i(\beta_0)y_{i0}) = 2\sigma^2_\varepsilon\sigma^2_\alpha(1-\beta_n)^{-2} + O(n) = \frac{2\sigma^2_\varepsilon\sigma^2_\alpha}{c^2}n^2 + O(n).$$
The off-diagonal elements of $E\left[g_{i,2}g_{i,2}'\right]$ are obtained from
$$E\left[\Delta u_{it}\Delta u_{is}y_{i0}^2\right] = \begin{cases}-\sigma^2_\varepsilon\sigma^2_\alpha(1-\beta_n)^{-2} + O(n) & t = s+1 \text{ or } t = s-1\\ 0 & \text{otherwise.}\end{cases}$$
It therefore follows that
$$\frac{1}{n^2}E\left[g_ig_i'\right] = \begin{pmatrix}0 & 0\\ 0 & \Sigma_{22}\end{pmatrix} + o(1).$$
Proof of Theorem 2. Use the transformation $z = L^{-1}\xi_x$ with $\Sigma_{11} = LL'$. Note that we can take $L = \sqrt\delta I$. Define $W = C_1\left(C_1'\tilde\Omega C_1\right)^{-1}C_1'$. Then $X\left(C_1,\tilde\Omega\right) = z'L'WH\xi_u/z'L'WLz$. Let $F$ be such that $E[H\xi_u|\xi_x] = F\xi_x$; then $F = \Sigma_{21}\Sigma_{11}^{-1} = M_1$, where the second equality is based on $\Sigma_{11}^{-1} = \delta^{-1}I$. Using $E[H\xi_u|\xi_x] = F\xi_x = FLz$ and a conditioning argument it then follows that
$$E\left[X\left(C_1,\tilde\Omega\right)\right] = E\left[\frac{z'L'WFLz}{z'L'WLz}\right] = E\left[\frac{z'\bar Dz}{z'Wz}\right].$$
Note that $z'Dz = z'D'z = z'\bar Dz$ where $\bar D = \frac12(D + D')$ is symmetric. Define $z_1 = \Gamma_1'z$ and $z_2 = \Gamma_2'z$. It now follows that
$$E\left[\frac{z'\bar Dz}{z'Wz}\right] = E\left[\frac{z_1'\Gamma_1'\bar D\Gamma_1z_1 + z_1'\Gamma_1'\bar D\Gamma_2z_2}{z_1'\Lambda_1z_1}\right] = E\left[\frac{z_1'\Gamma_1'\bar D\Gamma_1z_1}{z_1'\Lambda_1z_1}\right]$$
where the first equality follows from $\Gamma_2'W = 0$ and the second equality follows from independence between $z_1$ and $z_2$. The result then follows from Smith (1993, Eq. 2.4, p. 273).
Proof of Theorem 3. We first analyze $E[X(C_1,\Sigma_{22})]$. Note that in this case $W = C_1\left(C_1'\Sigma_{22}C_1\right)^{-1}C_1' = \delta^{-1}C_1\left(C_1'M_2C_1\right)^{-1}C_1'$ such that
$$E[X(C_1,\Sigma_{22})] = E\left[\frac{z_1'\Gamma_1'\bar D\Gamma_1z_1}{z_1'\Lambda_1z_1}\right]$$
and $\delta\operatorname{tr}\bar D = (\delta/2)\operatorname{tr}W\left(M_1' + M_1\right) = -r_1/2$ since $M_1' + M_1 = -M_2$, where $r_1$ is the rank of $C_1$. Then it follows from Smith (1993, p. 273 and Appendix A) that
$$E\left[\frac{z_1'\Gamma_1'\bar D\Gamma_1z_1}{z_1'\Lambda_1z_1}\right] = \lambda^{-1}\sum_{k=0}^\infty\frac{(1)_k\left(\tfrac12\right)_{1+k}}{\left(\tfrac{r_1}2\right)_{1+k}k!}\,C^{1,k}_{1+k}\left(\Gamma_1'\bar D\Gamma_1,\ I_{r_1} - \lambda^{-1}\Lambda_1\right), \tag{14}$$
where for any two real symmetric matrices $Y_1$, $Y_2$,
$$C^{1,k}_{1+k}(Y_1,Y_2) = \frac{k!}{2\left(\tfrac12\right)_{k+1}}\sum_{i=0}^k\frac{\left(\tfrac12\right)_{k-i}}{(k-i)!}\operatorname{tr}\!\left(Y_1Y_2^i\right)C_{k-i}(Y_2)$$
and
$$C_k(Y_2) = \frac{k!}{\left(\tfrac12\right)_k}d_k(Y_2), \qquad d_k(Y_2) = k^{-1}\sum_{j=0}^{k-1}\frac12\operatorname{tr}\!\left(Y_2^{k-j}\right)d_j(Y_2), \qquad d_0(Y_2) = 1.$$
Since all elements $\lambda^{-1}\lambda_i$ of $\lambda^{-1}\Lambda_1$ satisfy $0 < \lambda^{-1}\lambda_i \le 1$, it follows that $I_{r_1} - \lambda^{-1}\Lambda_1$ is positive semidefinite, implying that $C_k\left(I_{r_1} - \lambda^{-1}\Lambda_1\right) \ge 0$, which holds with equality only if all eigenvalues $\lambda_i$ are the same. Also note that, if $Y_2$, $Y_3$ are diagonal matrices and $Y_1$ is any conformable matrix, then $\operatorname{tr}Y_3Y_1Y_2^i = \operatorname{tr}Y_1Y_3Y_2^i$. Therefore,
$$\operatorname{tr}\Gamma_1'\bar D\Gamma_1\left(I_{r_1} - \lambda^{-1}\Lambda_1\right)^i = \frac12\operatorname{tr}\Gamma_1'\left(WM_1 + M_1'W\right)\Gamma_1\left(I_{r_1} - \lambda^{-1}\Lambda_1\right)^i = \frac12\operatorname{tr}\left(\Lambda_1\Gamma_1'M_1\Gamma_1 + \Gamma_1'M_1'\Gamma_1\Lambda_1\right)\left(I_{r_1} - \lambda^{-1}\Lambda_1\right)^i$$
$$= \frac12\operatorname{tr}\Gamma_1'\left(M_1 + M_1'\right)\Gamma_1\Lambda_1\left(I_{r_1} - \lambda^{-1}\Lambda_1\right)^i = -\frac12\operatorname{tr}\Gamma_1'M_2\Gamma_1\Lambda_1\left(I_{r_1} - \lambda^{-1}\Lambda_1\right)^i.$$
Next, note that for two positive semi-definite matrices $Y_1$ and $Y_2$ it follows that $\operatorname{tr}Y_1Y_2 = \operatorname{tr}Y_2^{1/2}Y_1Y_2^{1/2} \ge 0$ because $Y_2^{1/2}Y_1Y_2^{1/2}$ is well defined and positive semi-definite by positive semi-definiteness of $Y_1$. Next note that $\Lambda_1\left(I_{r_1} - \lambda^{-1}\Lambda_1\right)^i$ is positive semi-definite by previous arguments. Also, $\Gamma_1'M_2\Gamma_1$ is positive definite. Therefore, every $\operatorname{tr}\left(\Gamma_1'\bar D\Gamma_1\left(I_{r_1} - \lambda^{-1}\Lambda_1\right)^i\right)C_{k-i}\left(I_{r_1} - \lambda^{-1}\Lambda_1\right)$ has the same sign, and so does every $C^{1,k}_{1+k}\left(\Gamma_1'\bar D\Gamma_1,\ I_{r_1} - \lambda^{-1}\Lambda_1\right)$. Therefore, we have
$$\left|E[X(C_1,\Sigma_{22})]\right| \ge \left|\lambda^{-1}\sum_{k=0}^0\frac{(1)_k\left(\tfrac12\right)_{1+k}}{\left(\tfrac{r_1}2\right)_{1+k}k!}\,C^{1,k}_{1+k}\left(\Gamma_1'\bar D\Gamma_1,\ I_{r_1} - \lambda^{-1}\Lambda_1\right)\right|.$$
For $k = 0$, we have
$$C^{1,0}_1\left(\Gamma_1'\bar D\Gamma_1,\ I_{r_1} - \lambda^{-1}\Lambda_1\right) = \operatorname{tr}\Gamma_1'\bar D\Gamma_1 \ge -\delta^{-1}r_1/2. \tag{15a}$$
For the last inequality let $\lambda_i^{\bar D}$ be the eigenvalues of $\bar D$. By Magnus and Neudecker (1988, Theorem 13, p. 211), $\operatorname{tr}\Gamma_1'\bar D\Gamma_1 \ge \sum_{i=1}^{r_1}\lambda_i^{\bar D}$. Note that $\bar D$ is of reduced rank $r_1$ such that $\operatorname{tr}\bar D = \sum_{i=1}^{r_1}\lambda_i^{\bar D} =$
$-\delta^{-1}r_1/2$. Also, $(1)_0\left(\tfrac12\right)_1/\left(\tfrac{r_1}2\right)_1 = \frac1{r_1}$. This shows that $|E[X(C_1,\Sigma_{22})]| \ge \frac{\lambda^{-1}}{2\delta}$ for all $C_1$ such that $C_1'C_1 = I_{r_1}$ and all $r_1 \in \{1,2,\ldots,T-1\}$. Then
$$\min_{\substack{C_1\ \mathrm{s.t.}\ C_1'C_1 = I_{r_1}\\ r_1\in\{1,2,\ldots,T-1\}}}\left|E[X(C_1,\Sigma_{22})]\right| \ge \min_{\substack{C_1\ \mathrm{s.t.}\ C_1'C_1 = I_{r_1}\\ r_1\in\{1,2,\ldots,T-1\}}}\frac{\lambda^{-1}}{2\delta} \ge \frac{\min l_j}{2}, \tag{16}$$
where $\min l_j$ is the smallest eigenvalue of $M_2$. Since $\lambda$ is the largest eigenvalue of $W$ and $W$ has the same nonzero eigenvalues as $\delta^{-1}\left(C_1'M_2C_1\right)^{-1}$ by Zhang (1999, Theorem 2.8, p. 51), it follows that $\lambda$ is the largest eigenvalue of $\delta^{-1}\left(C_1'M_2C_1\right)^{-1}$. The last inequality in (16) follows from Magnus and Neudecker (1988, Theorem 10, p. 209) as well as the fact that $\lambda$ is $\delta^{-1}$ times the largest eigenvalue of $\left(C_1'M_2C_1\right)^{-1}$. Now let $r_1 = 1$ and $C_1 = \rho_i$, where $\rho_i$ is the eigenvector corresponding to $\min l_j$. Note that for this case $W = \delta^{-1}\rho_i\rho_i'/\min l_j$ such that $\Gamma_1 = \rho_i$ and $\lambda = \delta^{-1}(\min l_j)^{-1}$. Then $I_{r_1} - \lambda^{-1}\Lambda_1 = 1 - 1 = 0$ and $\left|\Gamma_1'\bar D\Gamma_1\right| = 1/(2\delta)$ such that $|E[X(C_1,\Sigma_{22})]| = \lambda^{-1}\left|\operatorname{tr}\Gamma_1'\bar D\Gamma_1\right| = \delta\min l_j\cdot\frac1{2\delta} = \min l_j/2$. Inequality (16) therefore holds with equality.
Next consider $E[X(C_1,I_{T-1})]$. Note that now $W = C_1\left(C_1'C_1\right)^{-1}C_1' = C_1C_1'$ since $C_1'C_1 = I_{r_1}$, such that $W = \Gamma_1\Lambda_1\Gamma_1'$ with $\Lambda_1 = I_{r_1}$ and $\Gamma_1 = C_1$. Then $\lambda = 1$ and $I_{r_1} - \lambda\Lambda_1 = 0$ such that (14) reduces to $E[X(C_1,I_{T-1})] = \frac1{r_1}C^{1,0}_1\left(\Gamma_1'\bar D\Gamma_1,\ 0\right) = \operatorname{tr}\Gamma_1'\bar D\Gamma_1/r_1$. But then $\bar D = 2^{-1}\left(C_1C_1'M_1 + M_1'C_1C_1'\right)$ such that $\operatorname{tr}\Gamma_1'\bar D\Gamma_1/r_1 = \operatorname{tr}C_1'\bar DC_1/r_1 = -2^{-1}\operatorname{tr}C_1'M_2C_1/r_1 = \operatorname{tr}\bar D/r_1$. We analyze
$$\min_{\substack{C_1\ \mathrm{s.t.}\ C_1'C_1 = I_{r_1}\\ r_1\in\{1,2,\ldots,T-1\}}}\left|\frac{-2^{-1}\operatorname{tr}C_1'M_2C_1}{r_1}\right|.$$
It can be checked easily that $M_2$ is positive definite and symmetric. We can therefore minimize $\operatorname{tr}\left(C_1'M_2C_1\right)$. It is now useful to choose an orthogonal matrix $R$ with $j$-th column $\rho_j$ such that $R'R = RR' = I$ and $M_2 = RLR'$, where $L$ is the diagonal matrix of eigenvalues of $M_2 = \sum_{j=1}^{T-1}l_j\rho_j\rho_j'$. Then it follows that $\operatorname{tr}\left(C_1'M_2C_1\right) = \sum_{j=1}^{T-1}l_j\rho_j'C_1C_1'\rho_j$. Next note that all the eigenvalues of $C_1C_1'$ are either zero or one such that $0 \le \rho_j'C_1C_1'\rho_j \le 1$. The minimum of $\operatorname{tr}\left(C_1'M_2C_1\right)$ is then found by choosing $r_1 = 1$ and $C_1$ such that $C_1'\rho_j = 0$ except for the eigenvector $\rho_i$ corresponding to $\min l_j$. To show that $\operatorname{tr}\left(\bar D/r_1\right)$ is also minimized for $r_1 = 1$ and $C_1 = \rho_i$, where $\left|\operatorname{tr}\left(\bar D/r_1\right)\right| = \min l_j/2$, consider augmenting $C_1$ by a column vector $x$ such that $x'x = 1$ and $\rho_i'x = 0$. Then $C_1'C_1 = I_2$, $r_1 = 2$ and $\operatorname{tr}C_1'M_2C_1 = l_i + \sum_{j\ne i}l_j\left(\rho_j'x\right)^2$. By Parseval's equality, $\sum_{j\ne i}\left(\rho_j'x\right)^2 = 1$. Since $l_j \ge l_i$ we can bound $\operatorname{tr}\left(C_1'M_2C_1\right) \ge 2l_i$, but then $\operatorname{tr}\left(C_1'M_2C_1/2\right) \ge l_i$. This argument can be repeated for more than one orthogonal addition $x$. It now follows that $E[X(C_1,I_{T-1})] = \operatorname{tr}\left(\bar D/r_1\right)$ is minimized in absolute value for $r_1 = 1$ and $C_1 = \rho_i$, where $\rho_i$ is the eigenvector corresponding to the smallest eigenvalue.

Next note that from $x'x = 1$ such that $\min l_i \le x'M_2x \le \max l_i$ it follows that
$$\min l_i \le \mathbf{1}'M_2\mathbf{1}/(\mathbf{1}'\mathbf{1}) = 2(T-1)^{-1}$$
for $\mathbf{1} = [1,\ldots,1]'$, which shows that the smallest eigenvalue is bounded by a monotonically decreasing function of the number of moment conditions.
Proof of Corollary 2. For $a = $ "AS", "AB" the matrix $W^a$ is of full rank $T-1$. Thus, for $L = \sqrt\delta I_{T-1}$, $z = L^{-1}\xi_x \sim N(0, I_{T-1})$ and
$$E\left[X\left(C_1^a,\Sigma_{22}\right)\right] = E\left[\frac{z'L'W^aHE[\xi_u|\xi_x]}{z'L'W^aLz}\right] = E\left[\frac{z'L'W^a\delta^{-1}\Sigma_{21}Lz}{z'L'W^aLz}\right]$$
and the result follows from Smith (1993, Eq. 2.4, p. 273). For $a = $ "LD" note that
$$\frac{\xi_x'W^aH\xi_u}{\xi_x'W^a\xi_x} = \frac{z_1'\Gamma_1'W^aH\xi_u}{z_1'\Lambda_1z_1}$$
by the same arguments as in the proof of Theorem (2). The remainder of the proof is the same as in that theorem.
References

[1] Ahn, S., and P. Schmidt, 1995, “Efficient Estimation of Models for Dynamic Panel Data”, Journal of Econometrics 68, pp. 5-28.

[2] Ahn, S., and P. Schmidt, 1997, “Efficient Estimation of Dynamic Panel Data Models: Alternative Assumptions and Simplified Estimation”, Journal of Econometrics 76, pp. 309-321.

[3] Alonso-Borrego, C. and M. Arellano, 1999, “Symmetrically Normalized Instrumental-Variable Estimation Using Panel Data”, Journal of Business and Economic Statistics 17, pp. 36-49.

[4] Alvarez, J. and M. Arellano, 1998, “The Time Series and Cross-Section Asymptotics of Dynamic Panel Data Estimators”, unpublished manuscript.

[5] Andrews, D.W.K., 1992, “Generic Uniform Convergence”, Econometric Theory 8, pp. 241-257.

[6] Andrews, D.W.K., 1994, “Empirical Process Methods in Econometrics”, in R.F. Engle and D.L. McFadden, eds., Handbook of Econometrics, Vol. 4. Amsterdam: North-Holland.

[7] Anderson, T.W. and C. Hsiao, 1982, “Formulation and Estimation of Dynamic Models Using Panel Data”, Journal of Econometrics 18, pp. 47-82.

[8] Arellano, M. and S.R. Bond, 1991, “Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations”, Review of Economic Studies 58, pp. 277-297.

[9] Arellano, M. and O. Bover, 1995, “Another Look at the Instrumental Variable Estimation of Error-Component Models”, Journal of Econometrics 68, pp. 29-51.

[10] Bekker, P., 1994, “Alternative Approximations to the Distributions of Instrumental Variable Estimators”, Econometrica 62, pp. 657-681.

[11] Blundell, R. and S. Bond, 1998, “Initial Conditions and Moment Restrictions in Dynamic Panel Data Models”, Journal of Econometrics 87, pp. 115-143.

[12] Chamberlain, G., 1984, “Panel Data”, in Griliches, Z. and M.D. Intriligator, eds., Handbook of Econometrics, Vol. 2. Amsterdam: North-Holland.

[13] Chao, J. and N. Swanson, 2000, “An Analysis of the Bias and MSE of the IV Estimator under Weak Identification with Application to Bias Correction”, unpublished manuscript.

[14] Choi, I. and P.C.B. Phillips, 1992, “Asymptotic and Finite Sample Distribution Theory for IV Estimators and Tests in Partially Identified Structural Equations”, Journal of Econometrics 51, pp. 113-150.

[15] Davis, A.W., 1980, “Invariant Polynomials with Two Matrix Arguments, Extending the Zonal Polynomials”, in P.R. Krishnaiah, ed., Multivariate Analysis V, pp. 287-299. Amsterdam: North-Holland.

[16] Donald, S. and W. Newey, 2001, “Choosing the Number of Instruments”, Econometrica 69, pp. 1161-1191.

[17] Dufour, J.-M., 1997, “Some Impossibility Theorems in Econometrics with Applications to Structural and Dynamic Models”, Econometrica 65, pp. 1365-1388.

[18] Griliches, Z. and J. Hausman, 1986, “Errors in Variables in Panel Data”, Journal of Econometrics 31, pp. 93-118.

[19] Hahn, J., 1997, “Efficient Estimation of Panel Data Models with Sequential Moment Restrictions”, Journal of Econometrics 79, pp. 1-21.

[20] Hahn, J., 1999, “How Informative Is the Initial Condition in the Dynamic Panel Model with Fixed Effects?”, Journal of Econometrics 93, pp. 309-326.

[21] Hahn, J. and J. Hausman, 2002, “A New Specification Test for the Validity of Instrumental Variables”, Econometrica 70, pp. 163-189.

[22] Hahn, J. and G. Kuersteiner, 2002, “Asymptotically Unbiased Inference for a Dynamic Panel Model with Fixed Effects When Both n and T are Large”, Econometrica 70, pp. 1639-1657.

[23] Hahn, J. and G. Kuersteiner, 2002, “Discontinuities of Weak Instrument Limiting Distributions”, Economics Letters 75, pp. 325-331.

[24] Hahn, J., G. Kuersteiner, and W.K. Newey, 2002, “Higher Order Properties of Bootstrap and Jackknife Bias Corrected Maximum Likelihood Estimators”, unpublished manuscript.

[25] Hall, P. and J. Horowitz, 1996, “Bootstrap Critical Values for Tests Based on Generalized-Method-of-Moments Estimators”, Econometrica 64, pp. 891-916.

[26] Hausman, J.A., and W.E. Taylor, 1983, “Identification in Linear Simultaneous Equations Models with Covariance Restrictions: An Instrumental Variables Interpretation”, Econometrica 51, pp. 1527-1550.

[27] Hausman, J.A., W.K. Newey, and W.E. Taylor, 1987, “Efficient Estimation and Identification of Simultaneous Equation Models with Covariance Restrictions”, Econometrica 55, pp. 849-874.

[28] Holtz-Eakin, D., W.K. Newey, and H. Rosen, 1988, “Estimating Vector Autoregressions with Panel Data”, Econometrica 56, pp. 1371-1395.

[29] Kiviet, J.F., 1995, “On Bias, Inconsistency, and Efficiency of Various Estimators in Dynamic Panel Data Models”, Journal of Econometrics 68, pp. 53-78.

[30] Kruiniger, H., “GMM Estimation of Dynamic Panel Data Models with Persistent Data”, unpublished manuscript.

[31] Kuersteiner, G.M., 2000, “Mean Squared Error Reduction for GMM Estimators of Linear Time Series Models”, unpublished manuscript.

[32] Lancaster, T., 1997, “Orthogonal Parameters and Panel Data”, unpublished manuscript.

[33] Magnus, J., and H. Neudecker, 1988, Matrix Differential Calculus with Applications in Statistics and Econometrics. John Wiley & Sons.

[34] Mariano, R.S. and T. Sawa, 1972, “The Exact Finite-Sample Distribution of the Limited-Information Maximum Likelihood Estimator in the Case of Two Included Endogenous Variables”, Journal of the American Statistical Association 67, pp. 159-163.

[35] Moon, H.R. and P.C.B. Phillips, 2000, “GMM Estimation of Autoregressive Roots Near Unity with Panel Data”, unpublished manuscript.

[36] Nagar, A.L., 1959, “The Bias and Moment Matrix of the General k-Class Estimators of the Parameters in Simultaneous Equations”, Econometrica 27, pp. 575-595.

[37] Nelson, C., R. Startz, and E. Zivot, 1998, “Valid Confidence Intervals and Inference in the Presence of Weak Instruments”, International Economic Review 39, pp. 1119-1146.

[38] Newey, W.K. and R.J. Smith, 2000, “Asymptotic Equivalence of Empirical Likelihood and the Continuous Updating Estimator”, unpublished manuscript.

[39] Nickell, S., 1981, “Biases in Dynamic Models with Fixed Effects”, Econometrica 49, pp. 1417-1426.

[40] Pakes, A. and D. Pollard, 1989, “Simulation and the Asymptotics of Optimization Estimators”, Econometrica 57, pp. 1027-1057.

[41] Phillips, P.C.B., 1989, “Partially Identified Econometric Models”, Econometric Theory 5, pp. 181-240.

[42] Rilstone, P., V.K. Srivastava, and A. Ullah, 1996, “The Second-Order Bias and Mean Squared Error of Nonlinear Estimators”, Journal of Econometrics 75, pp. 369-395.

[43] Richardson, D.J., 1968, “The Exact Distribution of a Structural Coefficient Estimator”, Journal of the American Statistical Association 63, pp. 1214-1226.

[44] Rothenberg, T.J., 1983, “Asymptotic Properties of Some Estimators in Structural Models”, in Studies in Econometrics, Time Series, and Multivariate Statistics.

[45] Rothenberg, T.J., 1984, “Approximating the Distributions of Econometric Estimators and Test Statistics”, in Griliches, Z. and M.D. Intriligator, eds., Handbook of Econometrics, Vol. 2. Amsterdam: North-Holland.

[46] Smith, M.D., 1993, “Expectations of Ratios of Quadratic Forms in Normal Variables: Evaluating Some Top-Order Invariant Polynomials”, Australian Journal of Statistics 35, pp. 271-282.

[47] Staiger, D. and J.H. Stock, 1997, “Instrumental Variables Regressions with Weak Instruments”, Econometrica 65, pp. 557-586.

[48] Wang, J. and E. Zivot, 1998, “Inference on Structural Parameters in Instrumental Variables Regression with Weak Instruments”, Econometrica 66, pp. 1389-1404.

[49] Windmeijer, F., 2000, “GMM Estimation of Linear Dynamic Panel Data Models Using Nonlinear Moment Conditions”, unpublished manuscript.

[50] Zhang, F., 1999, Matrix Theory. Springer-Verlag.
Table 1: Finite Sample Properties of b̂_GMM and b̂_BC

                   %bias(b̂_GMM)                %bias(b̂_BC)   RMSE(b̂_GMM)   RMSE(b̂_BC)
 T    n     β    Actual   Second-Order Theory      Actual        Actual         Actual
 5   100   0.1   -14.96        -17.71                0.25          0.08           0.08
10   100   0.1   -14.06        -15.78               -0.77          0.05           0.05
 5   500   0.1    -3.68         -3.54               -0.38          0.04           0.04
10   500   0.1    -3.15         -3.16               -0.16          0.02           0.02
 5   100   0.3    -8.86        -10.60               -0.47          0.10           0.10
10   100   0.3    -7.06         -8.13               -0.66          0.05           0.05
 5   500   0.3    -2.03         -2.12               -0.16          0.04           0.04
10   500   0.3    -1.58         -1.63               -0.10          0.02           0.02
 5   100   0.5   -10.05        -12.09               -1.14          0.13           0.13
10   100   0.5    -6.76         -8.00               -0.93          0.06           0.06
 5   500   0.5    -2.25         -2.42               -0.15          0.06           0.06
10   500   0.5    -1.53         -1.60               -0.11          0.03           0.03
 5   100   0.8   -27.65        -37.81              -11.33          0.32           0.34
10   100   0.8   -13.45        -18.98               -4.55          0.14           0.11
 5   500   0.8    -6.98         -7.56               -0.72          0.13           0.13
10   500   0.8    -3.48         -3.80               -0.37          0.05           0.04
 5   100   0.9   -50.22       -118.64              -42.10          0.55           0.78
10   100   0.9   -24.27        -52.66              -15.82          0.25           0.23
 5   500   0.9   -20.50        -23.73               -6.23          0.28           0.30
10   500   0.9    -8.74        -10.53               -2.02          0.10           0.08
Table 2: Inconsistency of b̂_BB under Misspecification

Weight Matrix I
 β_F \ β     0.3     0.6     0.8     0.9
   0.3      0.000   0.071   0.074   0.035
   0.6      0.267   0.000   0.116   0.054
   0.8      0.253   0.205   0.000   0.088
   0.9      0.220   0.181   0.129   0.000

Weight Matrix II
 β_F \ β     0.3     0.6     0.8     0.9
   0.3      0.000   0.077   0.149   0.084
   0.6      0.267   0.000   0.268   0.149
   0.8      0.258   0.208   0.000   0.126
   0.9      0.235   0.197   0.137   0.000
Table 3: Higher Order Theoretical Properties of Various Estimators

 β    n    T   b̂_GMM   b̂_LD-AB   b̂_LD1   b̂_LD2   b̂_LD3   b̂_LD4   b̂_AS    b̂_BB   b̂_LDGMM

Bias Predicted by Second Order Theory
0.1  100   5   -0.017    0.000    0.001   0.001   0.001   0.001   -0.001   0.005    0.003
0.3  100   5   -0.029    0.000    0.003   0.005   0.005   0.005   -0.001   0.008    0.005
0.5  100   5   -0.054   -0.007    0.003   0.009   0.013   0.014   -0.002   0.012    0.012
0.7  100   5   -0.135   -0.040   -0.007   0.014   0.030   0.043    0.002   0.014    0.046
0.9  100   5   -1.019   -0.449   -0.119   0.081   0.214   0.317    0.340  -0.078    1.055

RMSE Predicted by Higher Order Theory
0.1  100   5    0.081    0.099    0.102   0.102   0.102   0.102    0.069   0.070    0.091
0.3  100   5    0.100    0.097    0.109   0.112   0.113   0.113    0.077   0.079    0.100
0.5  100   5    0.133    0.097    0.120   0.134   0.142   0.147    0.094   0.093    0.124
0.7  100   5    0.220    0.121    0.141   0.173   0.212   0.250    0.158   0.123    0.242
0.9  100   5    0.965    0.609    1.181   1.449   1.561   1.633    2.099   0.422    3.558
Table 4: Properties of Selected Estimators: Higher Order Theory and Practice

 β     n    T   b̂_GMM   b̂_LD-AB   b̂_LD1   b̂_LD2   b̂_LD3   b̂_LD4

Bias Predicted by Higher Order Theory
0.10  100   5   -0.017    0.000    0.001   0.001   0.001   0.001
0.30  100   5   -0.029    0.000    0.003   0.005   0.005   0.005
0.50  100   5   -0.054   -0.007    0.003   0.009   0.013   0.014
0.70  100   5   -0.135   -0.040   -0.007   0.014   0.030   0.043
0.80  100   5   -0.281   -0.105   -0.029   0.015   0.051   0.083
0.90  100   5   -1.019   -0.449   -0.119   0.081   0.214   0.317
0.95  100   5   -3.849   -1.800   -0.300   0.793   1.591   2.179

Actual Bias
0.10  100   5   -0.015    0.001    0.002   0.002   0.002   0.002
0.30  100   5   -0.027    0.001    0.004   0.006   0.007   0.008
0.50  100   5   -0.050   -0.005    0.005   0.012   0.019   0.024
0.70  100   5   -0.119   -0.030   -0.003   0.019   0.039   0.049
0.80  100   5   -0.221   -0.068   -0.024   0.005   0.036   0.044
0.90  100   5   -0.452   -0.142   -0.077  -0.037   0.001   0.010
0.95  100   5   -0.598   -0.185   -0.112  -0.069  -0.030  -0.010

RMSE Predicted by Higher Order Theory
0.10  100   5    0.081    0.099    0.102   0.102   0.102   0.102
0.30  100   5    0.100    0.097    0.109   0.112   0.113   0.113
0.50  100   5    0.133    0.097    0.120   0.134   0.142   0.147
0.70  100   5    0.220    0.121    0.141   0.173   0.212   0.250
0.80  100   5    0.354    0.189    0.228   0.261   0.318   0.399
0.90  100   5    0.965    0.609    1.181   1.449   1.561   1.633
0.95  100   5    3.200    2.557    6.296   8.863  10.544  11.600

Actual RMSE
0.10  100   5    0.081    0.099    0.102   0.102   0.102   0.102
0.30  100   5    0.099    0.097    0.109   0.113   0.116   0.118
0.50  100   5    0.131    0.097    0.120   0.137   0.158   0.172
0.70  100   5    0.211    0.116    0.141   0.183   0.234   0.261
0.80  100   5    0.318    0.144    0.161   0.199   0.265   0.298
0.90  100   5    0.552    0.188    0.170   0.195   0.255   0.285
0.95  100   5    0.690    0.215    0.173   0.183   0.231   0.278
Table 5: Monte Carlo Comparison of Estimators

Estimator          b̂_BB   b̂_LDGMM   b̂_AS   b̂_LD-AB   b̂_LD1   b̂_LD2   b̂_LD3

β = 0.1, T = 5, n = 100
Mean               0.105    0.101    0.098    0.103    0.100    0.102    0.100
RMSE               0.072    0.092    0.070    0.101    0.106    0.104    0.103
Median             0.104    0.101    0.098    0.102    0.101    0.102    0.099
1st percentile    -0.054   -0.113   -0.057   -0.127   -0.144   -0.145   -0.136
25th percentile    0.057    0.040    0.050    0.033    0.031    0.034    0.032
75th percentile    0.152    0.160    0.143    0.169    0.169    0.169    0.165
99th percentile    0.276    0.318    0.268    0.344    0.352    0.352    0.349

β = 0.5, T = 5, n = 100
Mean               0.508    0.526    0.504    0.507    0.512    0.519    0.520
RMSE               0.092    0.150    0.113    0.102    0.123    0.146    0.170
Median             0.507    0.503    0.491    0.504    0.503    0.503    0.497
1st percentile     0.296    0.273    0.295    0.282    0.255    0.250    0.245
25th percentile    0.448    0.429    0.430    0.437    0.429    0.421    0.418
75th percentile    0.569    0.588    0.559    0.573    0.588    0.598    0.592
99th percentile    0.725    1.033    0.922    0.759    0.826    0.953    1.073

β = 0.8, T = 5, n = 100
Mean               0.753    0.816    0.766    0.797    0.819    0.835    0.854
RMSE               0.158    0.182    0.184    0.214    0.254    0.274    0.294
Median             0.767    0.786    0.736    0.751    0.775    0.783    0.783
1st percentile     0.331    0.493    0.405    0.513    0.369    0.423    0.436
25th percentile    0.662    0.682    0.641    0.672    0.685    0.685    0.682
75th percentile    0.861    0.936    0.871    0.856    0.892    0.911    0.938
99th percentile    1.054    1.280    1.208    1.602    1.763    1.849    1.855

β = 0.9, T = 5, n = 100
Mean               0.664    0.826    0.709    0.816    0.853    0.880    0.902
RMSE               0.388    0.221    0.317    0.249    0.240    0.234    0.264
Median             0.693    0.798    0.697    0.762    0.814    0.838    0.844
1st percentile    -0.221    0.380    0.077    0.467    0.384    0.427    0.495
25th percentile    0.499    0.689    0.558    0.691    0.734    0.747    0.750
75th percentile    0.882    0.954    0.854    0.861    0.911    0.961    0.981
99th percentile    1.273    1.349    1.295    1.752    1.764    1.722    1.879
Table 6: Bias of Near Unit Root Approximations
T AS/AB LD Optimal
5 -0.6632 -0.2500 -0.0955
6 -0.5903 -0.2000 -0.0670
7 -0.5297 -0.1667 -0.0495
8 -0.4781 -0.1429 -0.0381
9 -0.4336 -0.1250 -0.0302
10 -0.3948 -0.1111 -0.0245
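One arithmetic regularity in Table 6 is worth flagging: to the precision reported, the LD column coincides with −1/(T−1). The one-liner below merely reproduces the printed entries; it is an observation about the table, not a formula taken from the derivations:

```python
# Reproduce the LD column of Table 6 (observed pattern -1/(T-1), T = 5,...,10).
print([round(-1 / (T - 1), 4) for T in range(5, 11)])
# [-0.25, -0.2, -0.1667, -0.1429, -0.125, -0.1111]
```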
Table 7: MSE of Near Unit Root Approximations

          Optimal                  AS
 T     MSE     # Moments     MSE     # Moments
 5    0.6866       3        0.7464      17
 6    0.4246       3        0.5680      24
 7    0.2889       3        0.4488      32
 8    0.2031       4        0.3646      41
 9    0.1524       4        0.3056      51
10    0.1093       4        0.2663      62