Bias Corrected Instrumental Variables Estimation
for Dynamic Panel Models with Fixed Effects
Jinyong Hahn (Brown University), Jerry Hausman (MIT), Guido Kuersteiner (MIT)
August, 2002
Acknowledgment: We are grateful to Yu Hu for exceptional research assistance. Helpful
comments by Guido Imbens, Whitney Newey, James Powell, and participants of workshops at
Arizona State University, Berkeley, Harvard/MIT, Maryland, Triangle, UCL, UCLA, UCSD
are appreciated. Kuersteiner gratefully acknowledges financial support from NSF grant SES-
0095132.
Abstract
This paper proposes a new instrumental variables estimator for a dynamic panel model with fixed
effects with good bias and mean squared error properties even when identification of the model
becomes weak near the unit circle. We compare our new estimator based on long differences to a
bias corrected estimator based on second order expansions of the bias. Simulation experiments
show that this bias correction performs well if the model does not have a root near unity but
breaks down near the unit circle. We adopt a weak instrument asymptotic approximation to
study the behavior of various estimators near the unit circle. We show that an estimator based
on long differencing the model is much less biased than conventional implementations of the
GMM estimator for the dynamic panel model. We also show that under the weak instrument
approximation such conventional estimators are dominated in terms of mean squared error by an
estimator with far fewer moment conditions. The long difference estimator mimics the infeasible
optimal procedure through its reliance on a small set of moment conditions.
Keywords: dynamic panel, bias correction, second order, unit root, weak instrument
JEL: C13, C23, C51
1 Introduction
We are concerned with estimation of the dynamic panel model with fixed effects. Under large
n, fixed T asymptotics it is well known from Nickell (1981) that the standard maximum likeli-
hood estimator suffers from an incidental parameter problem leading to inconsistency. In order
to avoid this problem the literature has focused on instrumental variables estimation (GMM)
applied to first differences. Examples include Anderson and Hsiao (1982), Holtz-Eakin, Newey,
and Rosen (1988), and Arellano and Bond (1991). Ahn and Schmidt (1995), Hahn (1997), and
Blundell and Bond (1998) considered further moment restrictions. Comparisons of information
contents of varieties of moment restrictions made by Ahn and Schmidt (1995) and Hahn (1999)
suggest that, unless stationarity of the initial level yi0 is somehow exploited as in Blundell and
Bond (1998), the orthogonality of lagged levels with first differences provides the largest source
of information.
Unfortunately, the standard GMM estimator obtained after first differencing has been found
to suffer from substantial finite sample biases. See Alonso-Borrego and Arellano (1999). Mo-
tivated by this problem, modifications of likelihood based estimators emerged in the literature.
See Kiviet (1995), Lancaster (1997), Hahn and Kuersteiner (2002). The likelihood based esti-
mators do reduce finite sample bias compared to the standard maximum likelihood estimator,
but the remaining bias is still substantial for T relatively small.
In this paper, we attempt to eliminate the finite sample bias of the standard GMM estimator
obtained after first differencing. We view the standard GMM estimator as a minimum distance
estimator that combines T − 1 instrumental variable estimators (2SLS) applied to first differences. This view has been adopted by Chamberlain (1984) and Griliches and Hausman (1986). It has
been noted for quite a while that IV estimators can be quite biased in finite sample. See Nagar
(1959), Mariano and Sawa (1972), Rothenberg (1983), Bekker (1994), Donald and Newey (2001)
and Kuersteiner (2000). If the ingredients of the minimum distance estimator are all biased,
it is natural to expect such bias in the resultant minimum distance estimator, or equivalently,
GMM.
We consider a second order approach to the bias of the GMM estimator using the formula
contained in Hahn and Hausman (2002). We find that the standard GMM estimator suffers
from significant bias. The bias arises from two primary sources: the correlation of the structural
equation error with the reduced form error and the low explanatory power of the instruments.
We attempt to solve these problems by using the “long difference technique” of Griliches and
Hausman (1986). Griliches and Hausman noted that bias is reduced when long differences are
used in the errors-in-variables problem, and a similar result holds here for the second order bias.
Long differencing also increases the explanatory power of the instruments, which further reduces
the finite sample bias and also decreases the MSE of the estimator. To increase the explanatory
power of the instruments further, we use estimated residuals as additional instruments, a
technique introduced in the simultaneous equations model by Hausman,
Newey, and Taylor (1987) and used in the dynamic panel data context by Ahn and Schmidt
(1995). Monte Carlo results demonstrate that the long difference estimator performs quite well,
even for high positive values of the lagged variable coefficient where previous estimators are
badly biased.
However, the second order bias calculations do not predict well the performance of the esti-
mator for these high values of the coefficient. Simulation evidence shows that our approximations
do not work well near the unit circle where the model suffers from a near non-identification prob-
lem. In order to analyze the bias of standard GMM procedures under these circumstances we
consider a local to non-identification asymptotic approximation. We use the weak instrument
approximation of Staiger and Stock (1997) and Stock and Wright (2000). It is based on letting
the correlation between instruments and regressors decrease at a prescribed rate as the sample
size increases.
In the case of the dynamic panel model it is the autoregressive parameter that determines
the quality of lagged observations used as instruments. Here we let the autoregressive parameter
tend to unity as the number of cross-sectional observations n tends to infinity. The goal of this
alternative asymptotic approximation is to keep the quality of the instruments constant as the
sample size increases such that the limiting distribution captures features of the finite sample
distribution of the estimators. Despite the autoregressive parameter tending to one with the
sample size increasing our approximation is quite different from the near unit root literature.
The number of time periods is held constant in our case and non-stationarity issues of importance
in the time series literature do not play any role in our analysis.
Our limiting distribution for the GMM estimator shows that only moment conditions in-
volving initial conditions are asymptotically relevant. We define a class of estimators based on
linear combinations of asymptotically relevant moment conditions and show that a bias minimal
estimator within this class can approximately be based on taking long differences of the dynamic
panel model. In general, it turns out that under near non-identification asymptotics the optimal
procedures of Alvarez and Arellano (1998), Arellano and Bond (1991), Ahn and Schmidt (1995,
1997) are suboptimal from a bias point of view, and inference should optimally be based on a set
smaller than the full set of moment conditions. We show that a bias minimal estimator can be
obtained by using a particular linear combination of the original moment conditions. As far as
bias minimization under the alternative asymptotics is concerned this optimality result shows
that only a single moment condition should optimally be used. We also analyze the form of
the optimal estimator in terms of minimal mean squared error (MSE) under weak identification
asymptotics. First order asymptotically efficient procedures are again found to be suboptimal.
The optimal number of moment conditions minimizing the MSE under alternative asymptotics
can be up to a factor of 10 smaller than the set of first order optimal instruments.
Optimal procedures under weak instrument asymptotics are difficult to implement because
they depend on unobservable nuisance parameters that cannot be readily estimated. We show
that the long difference estimator we propose in this paper is a good heuristic approximation to
the optimal procedures. It captures the essence of our theoretical analysis which shows that the
selection of a few highly significant instruments is called for to minimize bias and MSE.
2 Review of the Bias of GMM Estimator
Consider the usual dynamic panel model with fixed effects. We assume that other exogenous
variables have been partialled out if they are present. Our variable yit may therefore be a residual
of a projection onto exogenous variables. Such removal of exogenous variables does not affect
our higher order analysis as long as it can be done $\sqrt{n}$-consistently. Our model then is
$$y_{it} = \alpha_i + \beta y_{i,t-1} + \varepsilon_{it}, \qquad i = 1,\ldots,n;\ t = 1,\ldots,T \tag{1}$$
We will impose somewhat strong conditions to simplify higher order analysis:
Condition 1 $\varepsilon_{it} \overset{i.i.d.}{\sim} N(0,\sigma^2)$ over $i$ and $t$.

Condition 2 $y_{i0}\,|\,\alpha_i \sim N\!\left(\frac{\alpha_i}{1-\beta},\ \frac{\sigma^2}{1-\beta^2}\right)$ and $\alpha_i \sim N(0,\sigma^2_\alpha)$.
Condition (2) is a stationarity condition that is imposed for analytical convenience. Anderson
and Hsiao (1982) show that estimators exploiting such initial conditions can be very sensitive
to misspecification. For this reason we do not exploit the initial condition to construct our
estimator.
It has been common in the literature to consider the case where n is large and T is small.
The usual GMM estimator is based on the first difference form of the model $y_{it} - y_{i,t-1} = \beta(y_{i,t-1} - y_{i,t-2}) + (\varepsilon_{it} - \varepsilon_{i,t-1})$, where the instruments are based on the orthogonality $E[y_{is}(\varepsilon_{it} - \varepsilon_{i,t-1})] = 0$, $s = 0,\ldots,t-2$. Instead, we consider a version of the GMM estimator developed by Arellano
and Bover (1995), which simplifies the characterization of the “weight matrix” in GMM esti-
mation. We define the innovation $u_{it} \equiv \alpha_i + \varepsilon_{it}$. Arellano and Bover (1995) eliminate the fixed effect $\alpha_i$ in (1) by applying Helmert's transformation
$$u^*_{it} \equiv \sqrt{\frac{T-t}{T-t+1}}\left[u_{it} - \frac{1}{T-t}\left(u_{i,t+1} + \cdots + u_{iT}\right)\right], \qquad t = 1,\ldots,T-1,$$
instead of first differencing.¹ The transformation produces
$$y^*_{it} = \beta x^*_{it} + \varepsilon^*_{it}, \qquad t = 1,\ldots,T-1,$$
where $x^*_{it} \equiv y^*_{i,t-1}$. Let $z_{it} \equiv (y_{i0},\ldots,y_{i,t-1})'$. Our moment restriction is summarized by $E[z_{it}\varepsilon^*_{it}] = 0$, $t = 1,\ldots,T-1$. It can be shown that, with the homoskedasticity assumption on $\varepsilon_{it}$, the optimal "weight matrix" is proportional to a block-diagonal matrix, with typical diagonal block equal to $E[z_{it}z_{it}']$. Therefore, the optimal GMM estimator is equal to
$$\hat b_{GMM} \equiv \frac{\sum_{t=1}^{T-1}x^{*\prime}_tP_ty^*_t}{\sum_{t=1}^{T-1}x^{*\prime}_tP_tx^*_t} = \frac{\sum_{t=1}^{T-1}x^{*\prime}_tP_tx^*_t\cdot\hat b_{2SLS,t}}{\sum_{t=1}^{T-1}x^{*\prime}_tP_tx^*_t} \tag{2}$$
where $\hat b_{2SLS,t}$ denotes the 2SLS of $y^*_t$ on $x^*_t$:
$$\hat b_{2SLS,t} \equiv \frac{x^{*\prime}_tP_ty^*_t}{x^{*\prime}_tP_tx^*_t}, \qquad t = 1,\ldots,T-1,$$
and $x^*_t \equiv (x^*_{1t},\ldots,x^*_{nt})'$, $y^*_t \equiv (y^*_{1t},\ldots,y^*_{nt})'$, $Z_t \equiv (z_{1t},\ldots,z_{nt})'$, and $P_t \equiv Z_t(Z_t'Z_t)^{-1}Z_t'$.

¹ Arellano and Bover (1995) note that the efficiency of the resultant GMM estimator is not affected by whether Helmert's transformation or first differencing is used.
Therefore, the GMM estimator $\hat b_{GMM}$ may be understood as a linear combination of the 2SLS estimators $\hat b_{2SLS,1},\ldots,\hat b_{2SLS,T-1}$. It has long been known that 2SLS may be subject to substantial finite sample bias. See Nagar (1959), Rothenberg (1983), Bekker (1994), Donald and Newey (2001), Hahn and Hausman (2002) and Kuersteiner (2000) for related discussion. It is therefore natural to conjecture that a linear combination of the 2SLS estimators may be subject to quite substantial finite sample bias as well. This is indeed the case. In the first column of Table 1, we summarize the finite sample bias of $\hat b_{GMM}$ as a fraction of $\beta$, approximated by 5,000 Monte Carlo runs.
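To make the structure of (2) concrete, the following sketch computes $\hat b_{GMM}$ as the weighted combination of the per-period 2SLS estimators. This is our own illustrative Python (the paper supplies no code), assuming list containers holding the Helmert-transformed data $y^*_t$, $x^*_t$ and the instrument matrices $Z_t$:

    import numpy as np

    def b_2sls(y_star, x_star, Z):
        # per-period 2SLS of y*_t on x*_t with n x t instrument matrix Z_t
        P = Z @ np.linalg.solve(Z.T @ Z, Z.T)      # P_t = Z_t (Z_t'Z_t)^{-1} Z_t'
        return (x_star @ P @ y_star) / (x_star @ P @ x_star)

    def b_gmm(y_stars, x_stars, Z_list):
        # equation (2): combine the b_2sls,t with weights x*_t' P_t x*_t
        num = den = 0.0
        for y, x, Z in zip(y_stars, x_stars, Z_list):
            P = Z @ np.linalg.solve(Z.T @ Z, Z.T)
            num += x @ P @ y                       # x*_t' P_t y*_t
            den += x @ P @ x                       # x*_t' P_t x*_t
        return num / den

Written this way, it is immediate that $\hat b_{GMM}$ inherits whatever finite sample bias the individual $\hat b_{2SLS,t}$ carry.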
3 Bias Correction
In the previous section, we considered the bias of the GMM estimator as a result of the biases
of the 2SLS estimators. In this section, we attempt to eliminate the bias using two different approaches: (i) by adopting the second order Taylor type approximation as in Nagar (1959) and Rothenberg (1983); and (ii) by adopting a "long difference" perspective.
We first examine the second order method. Many econometric estimators, say $\hat\theta$ for $\theta$, based on a sample of size $n$ allow for an expansion of the form
$$\sqrt n\left(\hat\theta - \theta\right) = \theta^{(1)} + \frac{1}{\sqrt n}\theta^{(2)} + o_p\!\left(\frac{1}{\sqrt n}\right) \tag{3}$$
where both $\theta^{(1)}$ and $\theta^{(2)}$ are of order $O_p(1)$. Typically $\theta^{(1)}$ has mean zero and converges in distribution to a normal distribution. Ignoring the $o_p(n^{-1/2})$ term and taking an expectation, we obtain that the "approximate mean" of $\sqrt n(\hat\theta-\theta)$ equals $\frac{1}{\sqrt n}E\theta^{(2)}$. We can therefore understand $\frac1n E\theta^{(2)}$ as the "second order bias" of $\hat\theta$. Based on the mapping between $E\theta^{(2)}$ and primitive parameters, we can $\sqrt n$-consistently estimate $E\theta^{(2)}$ by $\widehat{E\theta^{(2)}}$, say. The bias corrected estimator $\hat\theta - \frac1n\widehat{E\theta^{(2)}}$ has zero second order bias. It is not a priori clear whether the second order bias $\frac1n E\theta^{(2)}$ approximately equals the actual bias $E[\hat\theta - \theta]$. It is also not a priori clear whether the bias corrected estimator $\hat\theta - \frac1n\widehat{E\theta^{(2)}}$ indeed removes much of the bias.
For the dynamic panel model of interest, we addressed these questions by Monte Carlo approximation. For that purpose, we first derive the second order bias of $\hat b_{GMM}$:

Theorem 1 Under Conditions 1-2, the second order bias of $\hat b_{GMM}$ is equal to
$$\frac{B_1 + B_2 + B_3}{n} + o\!\left(\frac1n\right), \tag{4}$$
where
$$B_1 \equiv \Upsilon_1^{-1}\sum_{t=1}^{T-1}\operatorname{trace}\left(\left(\Gamma^{zz}_t\right)^{-1}\Gamma^{\varepsilon xzz}_{t,t}\right),$$
$$B_2 \equiv -2\Upsilon_1^{-2}\sum_{t=1}^{T-1}\sum_{s=1}^{T-1}\Gamma^{zx\prime}_t\left(\Gamma^{zz}_t\right)^{-1}\Gamma^{\varepsilon xzz}_{t,s}\left(\Gamma^{zz}_s\right)^{-1}\Gamma^{zx}_s,$$
$$B_3 \equiv \Upsilon_1^{-2}\sum_{t=1}^{T-1}\sum_{s=t}^{T-1}\Gamma^{zx\prime}_t\left(\Gamma^{zz}_t\right)^{-1}B_{3,1}(t,s)\left(\Gamma^{zz}_s\right)^{-1}\Gamma^{zx}_s,$$
where $\Gamma^{zz}_t \equiv E[z_{it}z_{it}']$, $\Gamma^{zx}_t \equiv E[z_{it}x^*_{it}]$, $\Gamma^{\varepsilon xzz}_{t,s} \equiv E[\varepsilon^*_{it}x^*_{is}z_{it}z_{is}']$, $B_{3,1}(t,s) \equiv E\!\left[\varepsilon^*_{it}z_{it}\Gamma^{zx\prime}_s\left(\Gamma^{zz}_s\right)^{-1}z_{is}z_{is}'\right]$ and $\Upsilon_1 \equiv \sum_{t=1}^{T-1}\Gamma^{zx\prime}_t\left(\Gamma^{zz}_t\right)^{-1}\Gamma^{zx}_t$.

Proof. Available at http://econ-www.mit.edu/faculty/jhausman/papers.htm.
In Table 1, we compare the actual performance of $\hat b_{GMM}$ with the prediction of its bias based on Theorem 1. Table 1 tabulates the actual bias of the estimator approximated by 5,000 Monte Carlo runs, and compares it with the second order bias based on formula (4). It is clear that the second order theory does a reasonably good job except when $\beta$ is close to the unit circle and $n$ is small.
Theorem 1 suggests a natural way of eliminating the bias. Suppose that $\hat B_1, \hat B_2, \hat B_3$ are $\sqrt n$-consistent estimators of $B_1, B_2, B_3$. Then it is easy to see that
$$\hat b_{BC} \equiv \hat b_{GMM} - \frac1n\left(\hat B_1 + \hat B_2 + \hat B_3\right) \tag{5}$$
is first order equivalent to $\hat b_{GMM}$, and has second order bias equal to zero. Define $\hat\Gamma^{zz}_t = n^{-1}\sum_{i=1}^n z_{it}z_{it}'$, $\hat\Gamma^{zx}_t = n^{-1}\sum_{i=1}^n z_{it}x^*_{it}$, $\hat\Gamma^{\varepsilon xzz}_{t,s} = n^{-1}\sum_{i=1}^n e^*_{it}x^*_{is}z_{it}z_{is}'$ and $\hat B_{3,1}(t,s) = n^{-1}\sum_{i=1}^n e^*_{it}z_{it}\hat\Gamma^{zx\prime}_s\left(\hat\Gamma^{zz}_s\right)^{-1}z_{is}z_{is}'$, where $e^*_{it} \equiv y^*_{it} - x^*_{it}\hat b_{GMM}$. Let $\hat B_1$, $\hat B_2$ and $\hat B_3$ be defined by replacing $\Gamma^{zz}_t$, $\Gamma^{zx}_t$, $\Gamma^{\varepsilon xzz}_{t,s}$ and $B_{3,1}(t,s)$ with $\hat\Gamma^{zz}_t$, $\hat\Gamma^{zx}_t$, $\hat\Gamma^{\varepsilon xzz}_{t,s}$ and $\hat B_{3,1}(t,s)$ in $B_1$, $B_2$ and $B_3$.

Second order asymptotic theory predicts that $\hat b_{BC}$ should be approximately free of bias. We examined whether this prediction is reasonably accurate in finite samples by 5,000 Monte Carlo runs. Table 1 summarizes the properties of $\hat b_{BC}$. We have seen in Table 1 that the second order theory is reasonably accurate unless $\beta$ is close to one. It is therefore sensible to conjecture that $\hat b_{BC}$ has reasonable finite sample bias properties as long as $\beta$ is not too close to one. We verify this conjecture in Table 1.
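To illustrate how the plug-in correction (5) is assembled from sample moments, the sketch below follows the definitions above. It is our own schematic code (all variable names are ours), not an implementation supplied by the authors:

    import numpy as np

    def bias_correct(b_gmm, y_star, x_star, z):
        # y_star, x_star: lists over t = 1,...,T-1 of length-n arrays;
        # z: list of n x t matrices with rows z_it = (y_i0,...,y_i,t-1)
        Tm1, n = len(x_star), len(x_star[0])
        e = [y_star[t] - x_star[t] * b_gmm for t in range(Tm1)]     # e*_it
        Gzz = [zt.T @ zt / n for zt in z]                           # Gamma^zz_t
        Gzx = [z[t].T @ x_star[t] / n for t in range(Tm1)]          # Gamma^zx_t
        iGzz = [np.linalg.inv(G) for G in Gzz]
        def Gexzz(t, s):                                            # Gamma^{eps x zz}_{t,s}
            w = e[t] * x_star[s]
            return (z[t] * w[:, None]).T @ z[s] / n
        def B31(t, s):                                              # B_{3,1}(t,s)
            w = e[t] * (z[s] @ (iGzz[s] @ Gzx[s]))
            return (z[t] * w[:, None]).T @ z[s] / n
        Ups = sum(Gzx[t] @ iGzz[t] @ Gzx[t] for t in range(Tm1))
        B1 = sum(np.trace(iGzz[t] @ Gexzz(t, t)) for t in range(Tm1)) / Ups
        B2 = -2 * sum(Gzx[t] @ iGzz[t] @ Gexzz(t, s) @ iGzz[s] @ Gzx[s]
                      for t in range(Tm1) for s in range(Tm1)) / Ups**2
        B3 = sum(Gzx[t] @ iGzz[t] @ B31(t, s) @ iGzz[s] @ Gzx[s]
                 for t in range(Tm1) for s in range(t, Tm1)) / Ups**2
        return b_gmm - (B1 + B2 + B3) / n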
We do not expect other methods of bias correction to be much superior to $\hat b_{BC}$. Hahn, Kuersteiner, and Newey (2002) recently showed that higher order mean squared errors of bias corrected maximum likelihood estimators are invariant to the method of bias correction. In particular, they considered a third order expansion of various bias corrected estimators, and showed that the bootstrap bias corrected MLE, jackknife bias corrected MLE, and analytically bias corrected MLE all have the same higher order MSE. Their result crucially depends on the fact that the MLE is first order efficient. Because $\hat b_{GMM}$ is first order efficient, we expect their result to carry over, and the MSE of the bootstrap bias corrected estimator, say, is not expected to be substantially different from that of the analytically bias corrected estimator $\hat b_{BC}$.

However, the second order asymptotics "fails" to be a good approximation around $\beta \approx 1$.
This phenomenon can be explained by the "weak instrument" problem. See Staiger and Stock (1997). Blundell and Bond (1998) argued that the weak instrument problem can be alleviated by assuming stationarity of the initial observation $y_{i0}$. The stationarity assumption turns out to be a predominant source of information around $\beta \approx 1$, as noted by Hahn (1999). The stationarity condition may or may not be appropriate for particular applications, and substantial finite sample biases due to inconsistency will result under violation of stationarity.²

² Under stationarity, we would have $y_{i0} \sim N\!\left(\frac{\alpha_i}{1-\beta},\ \frac{\sigma^2_\varepsilon}{1-\beta^2}\right)$. We analyze departures from the stationary initial condition by assuming that instead $y_{i0} \sim N\!\left(\frac{\alpha_i}{1-\beta}F,\ \frac{\sigma^2_\varepsilon}{1-\beta^2}\right)$. Because the asymptotic bias depends on the weight matrix, we consider two versions of GMM estimators. The first uses an identity matrix as the initial weight matrix; its asymptotic bias is summarized in the upper half of Table 2. The second uses Blundell and Bond's recommendation; its asymptotic bias is summarized in the lower half of Table 2. The results show that the bias of Blundell and Bond's estimator can be substantial under misspecification.
We therefore turn to another method to overcome the weak instrument problem around the unit circle while avoiding the stationarity assumption. We demonstrate that some of the difficulties of inference around the unit circle can be alleviated by taking a long difference. To be specific, we focus on a single equation based on the long difference
$$y_{iT} - y_{i1} = \beta\left(y_{i,T-1} - y_{i0}\right) + \left(\varepsilon_{iT} - \varepsilon_{i1}\right) \tag{6}$$
It is easy to see that the initial observation $y_{i0}$ serves as a valid instrument. Using similar intuition as in Hausman and Taylor (1983) or Ahn and Schmidt (1995), we can see that the residuals $y_{i,T-1} - \beta y_{i,T-2}, \ldots, y_{i2} - \beta y_{i1}$ are valid instruments as well. We call this estimator the long difference estimator. Such additional instruments require a preliminary consistent estimator. An obvious choice is Arellano and Bover's (1995) estimator $\hat b_{GMM}$. We call the resulting estimator $\hat b_{LD-AB}$: it is 2SLS applied to (6) using $\left(y_{i0},\ y_{i,T-1} - \hat b_{GMM}\cdot y_{i,T-2},\ \ldots,\ y_{i2} - \hat b_{GMM}\cdot y_{i1}\right)$ as instruments. Such instruments were previously used in Hausman, Newey and Taylor (1987). Using residuals constructed from $\hat b_{LD-AB}$ as instruments, we can come up with another long difference estimator. Call it $\hat b_{LD1}$. We can iterate further, obtaining $\hat b_{LD2}, \hat b_{LD3}, \ldots$. Below, we provide an intuitive motivation for the long difference estimator. Higher order properties of the long difference estimator will be compared with those of other popular estimators in the next section.
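The iteration just described is simple to code. The sketch below is our own hypothetical implementation (the paper reports none): given the level data and a preliminary consistent estimate such as $\hat b_{GMM}$, it produces $\hat b_{LD-AB}$, $\hat b_{LD1}$, $\hat b_{LD2}$, and so on:

    import numpy as np

    def two_sls(y, x, Z):
        # 2SLS of y on the scalar regressor x with instrument matrix Z
        P = Z @ np.linalg.solve(Z.T @ Z, Z.T)
        return (x @ P @ y) / (x @ P @ x)

    def long_difference(y, b_init, n_iter=3):
        # y: n x (T+1) array of levels y_i0,...,y_iT; b_init: e.g. b_GMM
        T = y.shape[1] - 1
        dy = y[:, T] - y[:, 1]                      # y_iT - y_i1
        dx = y[:, T - 1] - y[:, 0]                  # y_i,T-1 - y_i0
        b = b_init
        for _ in range(n_iter):                     # b_LD-AB, b_LD1, b_LD2, ...
            resid = [y[:, t] - b * y[:, t - 1] for t in range(2, T)]
            Z = np.column_stack([y[:, 0]] + resid)  # y_i0 plus estimated residuals
            b = two_sls(dy, dx, Z)
        return b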
It is known that the bias of 2SLS (GMM) depends on four factors: the "explained" variance of the first stage reduced form equation, the covariance between the stochastic disturbance of the structural equation and that of the reduced form equation, the number of instruments, and the sample size:
$$E\left[\hat\beta_{2SLS} - \beta\right] \approx \frac1n\cdot\frac{(\text{number of instruments})\times(\text{"covariance"})}{\text{"explained" variance of the first stage reduced form equation}}$$
Similarly, the Donald-Newey (DN) (2001) MSE formula depends on the same four factors. As explained in Donald and Newey (2001), the formula is, strictly speaking, valid only when the number of instruments tends to infinity. It provides, however, a good approximation to the finite
sample behavior of GMM estimators as far as the impact of additional instruments on bias is
concerned. Kuersteiner (2000) shows that the same expression holds in a time series context if
appropriate adjustments are made for the covariance term. In order to apply the above formula
we make use of our representation of GMM as a linear combination of individual time period
GMM. We compute the bias for a single time period GMM estimator conditional on lagged
instruments. We now consider first differences (FD) and long differences (LD) to see why LD
does so much better in our Monte-Carlo experiments.
Assume that $T = 4$. The first difference setup considers the following equation among others:
$$y_4 - y_3 = \beta(y_3 - y_2) + \varepsilon_4 - \varepsilon_3 \tag{7}$$
For the RHS variable it uses the instrument equation:
$$y_3 - y_2 = (\beta - 1)y_2 + \alpha + \varepsilon_3$$
Now calculate the $R^2$ for equation (7) using the Ahn-Schmidt (AS) moments under "ideal conditions" where $\beta$ is known, in the sense that the nonlinear restrictions become linear restrictions: we would then use $(y_2, y_1, y_0)$ as instruments. Assuming stationarity for simplicity, but not using it as additional moment information, we can write $y_0 = \alpha/(1-\beta) + \xi_0$, where $\xi_0 \sim \left(0,\ \sigma^2_\varepsilon/(1-\beta^2)\right)$. It can be shown that the covariance between the structural equation error and the first stage error³ is $-\sigma^2_\varepsilon$, and the "explained variance" in the first stage is equal to $\sigma^2_\varepsilon(1-\beta)/(\beta+1)$. Therefore, the ratio that determines the bias of 2SLS is equal to $-(1+\beta)/(1-\beta)$, which is equal to $-19$ for $\beta = .9$. For $n = 100$, this implies a percentage bias of $-105.56$.

³ This covariance is conditional on the set of instruments.
We now turn to the LD setup:
$$y_4 - y_1 = \beta(y_3 - y_0) + \varepsilon_4 - \varepsilon_1$$
It can be shown that the covariance between the first stage and second stage errors is $-\beta^2\sigma^2_\varepsilon$, and the "explained variance" in the first stage is given by
$$-\sigma^2_\varepsilon\,\frac{\left(2\beta^6 - 4\beta^4 - 2\beta^5 + 4\beta^2 + 4\beta - 2\beta^3 + 6\right)\sigma^2 + \beta^6 - \beta^4 + 2 - 2\beta^3}{\left(\beta^2 - 2\beta - 3\right)\sigma^2 - 1 + \beta^2},$$
where $\sigma^2 = \sigma^2_\alpha/\sigma^2_\varepsilon$. Therefore, the ratio that determines the bias is equal to
$$-.37408 + \frac{2.5703\times 10^{-4}}{\sigma^2 + 4.8306\times 10^{-2}}$$
for $\beta = .9$. Note that the maximum value that this ratio can take in absolute terms is $.37$, which is much smaller than the $19$ obtained for the Ahn and Schmidt estimator. We therefore conclude that the long difference increases the $R^2$ but decreases the covariance, thus alleviating the "weak instruments" problem. Further, the number of instruments is smaller in the long difference specification, so we should expect even smaller bias. Thus, all of the factors in the HH (Hahn-Hausman) equation, except sample size, cause the LD estimator to have smaller bias. As we have shown, the difference in bias can be substantial.
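A quick computation reproduces the magnitudes just quoted; the LD expression is the displayed ratio above, so the snippet is a numerical check of ours rather than code from the paper:

    # bias ratios at beta = .9: FD uses -(1+beta)/(1-beta); LD uses the
    # displayed expression with sigma2 = sigma_alpha^2 / sigma_eps^2
    beta = 0.9
    fd_ratio = -(1 + beta) / (1 - beta)              # -19
    for sigma2 in (0.0, 1.0, 10.0):
        ld_ratio = -0.37408 + 2.5703e-4 / (sigma2 + 4.8306e-2)
        print(sigma2, fd_ratio, round(ld_ratio, 5))  # |LD ratio| stays near .37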
4 Higher Order Bias and MSE of Long Difference Estimator
In this section, we examine the third order expansion of the long difference estimator and compare it with two other estimators discussed in the literature. Note that unlike the approximate bias calculations of the previous section, the expansions considered here are valid for any fixed number of instruments. Exact formulas tend to be very complicated and are not reported here.⁴

⁴ Details are available at http://econ-www.mit.edu/faculty/jhausman/papers.htm.
Many econometric estimators, say $\hat\theta$ for $\theta$, based on a sample of size $n$ allow for an expansion of the form
$$\sqrt n\left(\hat\theta - \theta\right) = \theta^{(1)} + \frac{1}{\sqrt n}\theta^{(2)} + \frac1n\theta^{(3)} + o_p\!\left(\frac1n\right) \tag{8}$$
where $\theta^{(1)}$, $\theta^{(2)}$, and $\theta^{(3)}$ are of order $O_p(1)$. Typically $\theta^{(1)}$ has mean zero and converges in distribution to a normal distribution. Also, $\theta^{(3)}$ usually has mean equal to zero. Ignoring the $o_p(n^{-1})$ term and taking an expectation, we obtain the "approximate mean" of $\sqrt n(\hat\theta-\theta)$ equal to $\frac{1}{\sqrt n}E\theta^{(2)}$. We can therefore understand $\frac1n E\theta^{(2)}$ as the "second order bias" of $\hat\theta$. Ignoring the $o_p(n^{-1})$ term and taking an expectation of $\left(\sqrt n(\hat\theta-\theta)\right)^2$, we obtain the approximate MSE of $\sqrt n(\hat\theta-\theta)$ equal to
$$E\left[\left(\theta^{(1)}\right)^2\right] + \frac1n E\left[\left(\theta^{(2)}\right)^2\right] + \frac{2}{\sqrt n}E\left[\theta^{(1)}\theta^{(2)}\right] + \frac2n E\left[\theta^{(1)}\theta^{(3)}\right] + o\!\left(\frac1n\right).$$
Ignoring the $o(n^{-1})$ term, we may understand
$$\frac1n E\left[\left(\theta^{(1)}\right)^2\right] + \frac1{n^2}E\left[\left(\theta^{(2)}\right)^2\right] + \frac{2}{n\sqrt n}E\left[\theta^{(1)}\theta^{(2)}\right] + \frac{2}{n^2}E\left[\theta^{(1)}\theta^{(3)}\right] \tag{9}$$
as the approximate MSE of $\hat\theta$.

We first examine the higher order bias and MSE for $\hat b_{AS}$, the estimator proposed by Ahn and
Schmidt (1995), and $\hat b_{BB}$, the estimator proposed by Blundell and Bond (1998). Both are GMM estimators, and the third order expansion of the usual two step estimator is critically dependent on the initial weight matrices. It can be shown that the third order expansion of the "four step" GMM estimator is invariant to the initial weight matrix.⁵ Rilstone, Srivastava, and Ullah (1996) obtained higher order bias and MSE formulas for method of moments estimators. Their expressions are difficult to adapt to our situation and we have developed our own expansions. We computed (9) analytically for $T = 5$, and summarized it along with the second order bias in Table 3.⁶ Table 3 also contains the higher order properties of LD estimators. Given the two step IV nature of the LD estimators, this required a separate expansion for the IV estimator along with a third order expansion⁷ of $\hat b_{GMM}$. Unfortunately, analytic calculation of (9) for LD estimators was infeasible. We used a stochastic approximation to each term in (9) instead.⁸

We can see that the theoretical higher order MSEs of $\hat b_{LD2}$ and $\hat b_{LD3}$ are comparable to that of $\hat b_{AS}$. We can also see that the theoretical bias properties of $\hat b_{LD2}$ and $\hat b_{LD3}$ are moderately superior to those of $\hat b_{AS}$. In Table 4, we compare the theoretical higher order properties of selected estimators with their actual properties. Interestingly, we find that the actual properties of long difference estimators are much better than those predicted by higher order theory, especially around the unit circle. This suggests that Monte Carlo comparison should be preferred to any higher order expansion for ranking the various estimators. In Table 5, we examine the Monte Carlo properties of those estimators. We find that LD does better than the other estimators for large $\beta$ and not significantly worse for moderate $\beta$.

⁵ The details are available at http://econ-www.mit.edu/faculty/jhausman/papers.htm. On a related note, Newey and Smith (2001) point out that the second order expansion of the "three" step GMM estimator is invariant to the initial weight matrix.
⁶ The last column of Table 3 contains properties of $\hat b_{LDGMM}$. This is the four step GMM estimator based on the moment restrictions
$$E[y_{i0}\cdot(y_{iT} - y_{i1} - \beta(y_{i,T-1} - y_{i0}))] = 0$$
$$E[(y_{i,T-1} - \beta y_{i,T-2})\cdot(y_{iT} - y_{i1} - \beta(y_{i,T-1} - y_{i0}))] = 0$$
$$\vdots$$
$$E[(y_{i2} - \beta y_{i1})\cdot(y_{iT} - y_{i1} - \beta(y_{i,T-1} - y_{i0}))] = 0$$
Interestingly, the finite sample properties of $\hat b_{LDGMM}$ seem poorer than those of the iterated 2SLS versions of the long difference estimators.
⁷ All derivations referred to in this paragraph are available at http://econ-www.mit.edu/faculty/jhausman/papers.htm.
⁸ We perform Monte Carlo integration by generating i.i.d. sequences of $\left(\theta^{(1)}_j, \theta^{(2)}_j, \theta^{(3)}_j\right)$, $j = 1,\ldots,J$, and approximating $E\left[\left(\theta^{(1)}\right)^2\right]$ by $\frac1J\sum_{j=1}^J\left(\theta^{(1)}_j\right)^2$, for example. Our choice of $J$ was 5,000.
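Footnote 8's stochastic approximation amounts to replacing each expectation in (9) by a sample average over $J$ draws of the expansion terms. A minimal sketch of ours, with placeholder draws standing in for the actual $\left(\theta^{(1)}_j, \theta^{(2)}_j, \theta^{(3)}_j\right)$ produced by the estimator's expansion:

    import numpy as np

    def approx_mse(t1, t2, t3, n):
        # Monte Carlo version of (9) from J i.i.d. draws of the expansion terms
        return (np.mean(t1**2) / n + np.mean(t2**2) / n**2
                + 2 * np.mean(t1 * t2) / n**1.5
                + 2 * np.mean(t1 * t3) / n**2)

    rng = np.random.default_rng(0)
    t1, t2, t3 = rng.standard_normal((3, 5000))   # placeholders only; J = 5,000
    print(approx_mse(t1, t2, t3, n=100))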
5 Near Unit Root Approximation
Our Monte Carlo simulation results summarized in Table 1 indicate that the previously discussed
approximations and the bias corrections that are based on them do not work well near the unit
circle. This result occurs because the identification of the model becomes “weak” near the unit
circle. See Blundell and Bond (1998), who related the problem to the analysis by Staiger and
Stock (1997). In this Section, we formally adopt approximations local to the points in the
parameter space that are not identified. To be specific, we consider model (1) for T fixed and
n→∞ when also βn tends to unity. The type of asymptotic sequences that we consider, where
βn → 1 at the same time as n → ∞, may be interpreted in the spirit of Bekker (1994). We want
to keep certain aspects of the model, in our case the quality of the instruments, constant as the
sample size increases. The purpose of alternative asymptotic sequences is to obtain asymptotic
approximations that better reflect finite sample properties, in our case poor identification, than
standard asymptotics with fixed parameters would deliver.
A check that these alternative asymptotics provide better approximations to the finite sample
distribution near the unit circle can be obtained from comparing Tables 6 and 7 with actual
Monte Carlo results for Ahn and Schmidt and Arellano and Bond estimators for large values
of β. The predicted bias and MSE match the Monte Carlo results much better than the predictions from the Taylor series
expansions discussed earlier.
The fact that βn → 1 with n → ∞ is not meant to be a reflection of some realistic data
generating process but is an analytical device. The asymptotics that we propose are also quite
different from the near unit root literature in a pure time series context. Here, T is fixed and the
degree of temporal dependence modelled by βn is not directly relevant for our analysis. What
matters are the implications for identification as discussed before.
We analyze the bias of the associated weak instrument limit distribution. We analyze the
class of GMM estimators that exploit Ahn and Schmidt’s (1997) complete set of moment con-
ditions and show that a strict subset of the full set of moment restrictions should be used in
estimation in order to minimize bias. We show that the long difference estimator is a good
approximation to the bias minimal procedure.
Following Ahn and Schmidt we exploit the moment conditions
$$E[u_iu_i'] = \sigma^2_\varepsilon I + \sigma^2_\alpha\mathbf{1}\mathbf{1}'$$
$$E[u_iy_{i0}] = \sigma_{\alpha y_0}\mathbf{1}$$
with $\mathbf{1} = [1,\ldots,1]'$ a vector of dimension $T$ and $u_i = [u_{i1},\ldots,u_{iT}]'$. The moment conditions can be written more compactly as
$$b = \begin{bmatrix}\operatorname{vech}E[u_iu_i']\\ E[u_iy_{i0}]\end{bmatrix} = \sigma^2_\varepsilon\begin{bmatrix}\operatorname{vech}I\\ 0\end{bmatrix} + \sigma^2_\alpha\begin{bmatrix}\operatorname{vech}\mathbf{1}\mathbf{1}'\\ 0\end{bmatrix} + \sigma_{\alpha y_0}\begin{bmatrix}0\\ \mathbf{1}\end{bmatrix} \tag{10}$$
where the redundant moment conditions have been eliminated by use of the vech operator, which extracts the upper diagonal elements of a symmetric matrix. Representation (10) makes it clear that the vector $b \in \mathbb{R}^{T(T+1)/2+T}$ is contained in a 3-dimensional subspace, which is another way of stating that there are $G = T(T+1)/2 + T - 3$ restrictions imposed on $b$. This statement is equivalent to Ahn and Schmidt's (1997) analysis of the number of moment conditions.
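The dimension count behind (10) is easy to verify numerically. The sketch below, our own check, stacks the three directions on the right-hand side of (10) for $T = 4$ and confirms that $b$ lies in a 3-dimensional subspace, leaving $G = T(T+1)/2 + T - 3$ restrictions:

    import numpy as np

    T = 4
    iu = np.triu_indices(T)
    vech = lambda M: M[iu]                         # upper-triangular elements, as in (10)
    one = np.ones((T, 1))
    b_basis = np.column_stack([
        np.concatenate([vech(np.eye(T)), np.zeros(T)]),            # sigma_eps^2
        np.concatenate([vech(one @ one.T), np.zeros(T)]),          # sigma_alpha^2
        np.concatenate([np.zeros(T * (T + 1) // 2), np.ones(T)]),  # sigma_alpha_y0
    ])
    print(np.linalg.matrix_rank(b_basis))          # 3
    print(T * (T + 1) // 2 + T - 3)                # G = 11 for T = 4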
GMM estimators are obtained from the moment conditions by eliminating the unknown parameters $\sigma^2_\varepsilon$, $\sigma^2_\alpha$ and $\sigma_{\alpha y_0}$. The set of all GMM estimators leading to consistent estimates of $\beta$ can therefore be described by a $(T(T+1)/2+T)\times G$ matrix $A$ which contains all the vectors spanning the orthogonal complement of $b$. This matrix $A$ satisfies
$$b'A = 0.$$
For our purposes it will be convenient to choose $A$ such that
$$b'A = \left[E[u_{it}\Delta u_{is}],\ E[u_{iT}\Delta u_{ij}],\ E[\bar u_i\Delta u_{ik}],\ E[\Delta u_i'y_{i0}]\right],$$
$$s = 2,\ldots,T;\ t = 1,\ldots,s-2;\ j = 2,\ldots,T-1;\ k = 2,\ldots,T,$$
where $\Delta u_i = [u_{i2} - u_{i1},\ldots,u_{iT} - u_{i,T-1}]'$ and $\bar u_i = T^{-1}\sum_{t=1}^T u_{it}$. It becomes transparent that any other representation of the moment conditions can be obtained by applying a corresponding nonsingular linear operator $C$ to the matrix $A$. It can be checked that there exists a nonsingular matrix $C$ such that $b'AC = 0$ is identical to the moment conditions (4a)-(4c) in Ahn and Schmidt (1997).
We investigate the properties of (infeasible) GMM estimators based on
$$E[u_{it}\Delta u_{is}(\beta)] = 0,\quad E[u_{iT}\Delta u_{ij}(\beta)] = 0,\quad E[\bar u_i\Delta u_{ik}(\beta)] = 0,\quad E[y_{i0}\Delta u_{it}(\beta)] = 0$$
obtained by setting $\Delta u_{it}(\beta) \equiv \Delta y_{it} - \beta\Delta y_{i,t-1}$. Here, we assume that the instruments $u_{it}$ are observable. We partition the moment conditions into two groups, where the second group only contains instruments involving $y_{i0}$. In particular, let $g_{i1}(\beta)$ denote a column vector consisting of $u_{it}\Delta u_{is}(\beta)$, $u_{iT}\Delta u_{ij}(\beta)$, $\bar u_i\Delta u_{ik}(\beta)$. Also let $g_{i2}(\beta) \equiv [y_{i0}\Delta u_i(\beta)]$. It will turn out that only these instruments are asymptotically relevant under near unit root asymptotics. Finally, let $g_n(\beta) \equiv n^{-3/2}\sum_{i=1}^n\left[g_{i1}(\beta)',\ g_{i2}(\beta)'\right]'$ with the optimal weight matrix $\Omega_n \equiv E[g_i(\beta_n)g_i(\beta_n)']$.

The infeasible GMM estimator for a possibly transformed set of moment conditions $C'g_n(\beta)$ then solves
$$\hat\beta_{2SLS} = \arg\min_\beta\ g_n(\beta)'C\left(C'\Omega_nC\right)^+C'g_n(\beta) \tag{11}$$
where $C$ is a $G\times r$ matrix for $1 \le r \le G$ such that $C'C = I_r$ and $\operatorname{rank}\left(C(C'\Omega C)^+C'\right) \ge 1$. We use $(C'\Omega C)^+$ to denote the Moore-Penrose inverse. We thus allow the use of a singular weight matrix. Choosing $r$ less than $G$ allows us to exclude certain moment conditions. Let $f_{i,1} \equiv -\partial g_{i1}(\beta)/\partial\beta$, $f_{i,2} \equiv -\partial g_{i2}(\beta)/\partial\beta$, and $f_n \equiv n^{-3/2}\sum_{i=1}^n\left[f_{i,1}',\ f_{i,2}'\right]'$. The infeasible 2SLS estimator can be written as
$$\hat\beta_{2SLS} - \beta_n = \left(f_n'C\left(C'\Omega_nC\right)^+C'f_n\right)^{-1}f_n'C\left(C'\Omega_nC\right)^+C'g_n(\beta_n). \tag{12}$$
We now analyze the behavior of $\hat\beta_{2SLS} - \beta_n$ under local to unity asymptotics. We make the following additional assumptions.⁹

Condition 3 Let $y_{it} = \alpha_i + \beta_ny_{i,t-1} + \varepsilon_{it}$ with $\varepsilon_{it} \sim N(0,\sigma^2_\varepsilon)$, $\alpha_i \sim N(0,\sigma^2_\alpha)$ and $y_{i0} \sim N\!\left(\frac{\alpha_i}{1-\beta_n},\ \frac{\sigma^2_\varepsilon}{1-\beta_n^2}\right)$, where $\beta_n = \exp(-c/n)$ for some $c > 0$.

The critical assumption in Condition (3) is that $\beta_n = \exp(-c/n) = 1 - c/n + o(n^{-1})$. The parameter $\beta_n$ controls the degree of correlation between regressors and instruments. For the case of GMM estimators based on first differences this correlation amounts to $E[\Delta y_ty_{t-2}] = O(\beta_n - 1) = O(n^{-1})$. Our specification thus implies an asymptotic lack of correlation more stringent than the one used by Staiger and Stock (1997). In Hahn and Kuersteiner (2002) it is argued that the faster $n^{-1}$-rate corresponds to a situation of near non-identification. Numerical and analytical calculations¹⁰ in the context of our dynamic panel model show that $n^{-1/2}$-rate asymptotics do not capture the type of finite sample biases that we find important for the behavior of GMM estimators and that the specification $\beta_n = \exp(-c/n)$ delivers more accurate predictions.

⁹ Kruiniger (2000) considers similar local-to-unity asymptotics.
¹⁰ Available at http://econ-www.mit.edu/faculty/jhausman/papers.htm.

Under the generating mechanism described in the previous definition the following Lemma can be established.
Lemma 1 Assume $\beta_n = \exp(-c/n)$ for some $c > 0$. For $T$ fixed and as $n\to\infty$,
$$n^{-3/2}\sum_{i=1}^n f_{i,1} \xrightarrow{p} 0, \qquad n^{-3/2}\sum_{i=1}^n g_{i,1}(\beta_0) \xrightarrow{p} 0$$
and
$$n^{-3/2}\sum_{i=1}^n\left[f_{i,2}',\ g_{i,2}'(\beta_0)\right]' \xrightarrow{d} \left[\xi_x',\ \xi_y'\right]'$$
where $\left[\xi_x',\xi_y'\right]' \sim N(0,\Sigma)$ with
$$\Sigma = \begin{pmatrix}\Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22}\end{pmatrix}$$
and $\Sigma_{11} = \delta I$, $\Sigma_{12} = \delta M_1'$, $\Sigma_{22} = \delta M_2$, where
$$\delta = \frac{\sigma^2_\alpha\sigma^2_\varepsilon}{c^2},$$
$$M_1 = \begin{pmatrix}-1 & 1 & & 0\\ & \ddots & \ddots & \\ & & \ddots & 1\\ 0 & & & -1\end{pmatrix}, \qquad M_2 = \begin{pmatrix}2 & -1 & & 0\\ -1 & \ddots & \ddots & \\ & \ddots & \ddots & -1\\ 0 & & -1 & 2\end{pmatrix}$$
and $\Sigma_{12} = \Sigma_{21}'$. We also have
$$\frac{1}{n^2}\Omega_n = \begin{pmatrix}0 & 0\\ 0 & \Sigma_{22}\end{pmatrix} + o(1).$$
Proof. See Appendix A.
It can be shown that the limiting normal distribution of Lemma (1) is singular. An alternative representation of the limiting distribution can be given by eliminating redundant dimensions from $\left[\xi_x',\ \xi_y'\right]'$. Let $\xi_y = \left[\xi_{y0}',\ \xi_{y1}\right]'$ where $\xi_{y1}$ is the last element of $\xi_y$, and let $\xi_u = \left[\xi_x',\ \xi_{y1}\right]'$. Then $\xi_y = H\xi_u$, where $H$ is a $(T-1)\times T$ matrix defined as
$$H = \begin{pmatrix}-1 & 1 & & & 0\\ & \ddots & \ddots & & \vdots\\ 0 & & -1 & 1 & 0\\ 0 & \cdots & & 0 & 1\end{pmatrix}.$$
It can be checked easily that the covariance matrix of the vector $\left[\xi_x',\ \xi_u'H'\right]'$ is $\Sigma$, as required. Using Lemma (1), the limiting distribution of $\hat\beta_{2SLS} - \beta_n$ is stated in the next corollary. For this purpose we define the augmented vectors $\xi_x^{\#} = [0,\ldots,0,\xi_x']'$ and $\xi_y^{\#} = [0,\ldots,0,\xi_y']'$ and partition $C = [C_0', C_1']'$ such that $C'\xi_x^{\#} = C_1'\xi_x$. Let $r_1$ denote the rank of $C_1$.

Corollary 1 Let $\hat\beta_{2SLS} - \beta_n$ be given by (12). If Condition 3 is satisfied then
$$\hat\beta_{2SLS} - 1 \xrightarrow{d} \frac{\xi_x'C_1\left(C_1'\Sigma_{22}C_1\right)^+C_1'H\xi_u}{\xi_x'C_1\left(C_1'\Sigma_{22}C_1\right)^+C_1'\xi_x} = X(C,\Sigma_{22}) \tag{13}$$
Unlike the limiting distribution for the standard weak instrument problem, X (C,Σ22), as
defined in (13), is based on normal vectors that have zero mean. This degeneracy is generated
by the presence of the fixed effect in the initial condition, scaled up appropriately to satisfy the
stationarity requirement for the process yit. Inspection of the proof shows that the usual concen-
tration parameter appearing in the limit distribution is dominated by a stochastic component
related to the fixed effect.
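The limit (13) is straightforward to simulate. The sketch below is our own construction: it uses the representation $\xi_y = M_1\xi_x + e_{T-1}\eta$ with $\eta \sim N(0,\delta)$ independent of $\xi_x \sim N(0,\delta I)$, which reproduces $\Sigma_{11} = \delta I$, $\Sigma_{12} = \delta M_1'$, $\Sigma_{22} = \delta M_2$ and $E[\xi_y|\xi_x] = M_1\xi_x$, and evaluates $X(C,\Sigma_{22})$ for the long difference choice of $C_1$:

    import numpy as np

    def limit_bias_sim(T, delta, C1, reps=100_000, seed=0):
        # draws from X(C, Sigma22) in (13); the ratio is invariant to delta
        rng = np.random.default_rng(seed)
        M1 = -np.eye(T - 1) + np.diag(np.ones(T - 2), 1)   # -1 diagonal, 1 above
        M2 = (2 * np.eye(T - 1) - np.diag(np.ones(T - 2), 1)
              - np.diag(np.ones(T - 2), -1))
        W = C1 @ np.linalg.pinv(C1.T @ (delta * M2) @ C1) @ C1.T
        draws = np.empty(reps)
        for r in range(reps):
            xi_x = np.sqrt(delta) * rng.standard_normal(T - 1)
            xi_y = M1 @ xi_x
            xi_y[-1] += np.sqrt(delta) * rng.standard_normal()  # the xi_y1 part
            draws[r] = (xi_x @ W @ xi_y) / (xi_x @ W @ xi_x)
        return draws

    T = 5
    ld_draws = limit_bias_sim(T, delta=1.0, C1=np.ones((T - 1, 1)))
    print(ld_draws.mean())        # simulated bias of the long difference limit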
Based on Corollary 1, we define the following class of 2SLS estimators for the dynamic panel
model. The class contains estimators that only use a nonredundant set of moment conditions
involving the initial conditions yi0.
Definition 1 Let $\hat\beta^*_{2SLS}$ be defined as $\hat\beta^*_{2SLS} \equiv \arg\min_\beta\ g_{2,n}(\beta)'C_1\left(C_1'\tilde\Omega C_1\right)^{-1}C_1'g_{2,n}(\beta)$, where $g_{2,n}(\beta) \equiv n^{-3/2}\sum_{i=1}^n g_{i2}(\beta)$, $\tilde\Omega$ is a symmetric positive definite $(T-1)\times(T-1)$ matrix of constants and $C_1$ is a $(T-1)\times r_1$ matrix of full column rank $r_1 \le T-1$ such that $C_1'C_1 = I$.
Next we turn to the analysis of the asymptotic bias of the estimator $\hat\beta^*_{2SLS}$ of the dynamic panel model. Since the limit only depends on zero mean normal random vectors we can apply the results of Smith (1993).

Theorem 2 Let $X^*(C_1,\tilde\Omega)$ be the limiting distribution of $\hat\beta^*_{2SLS} - 1$ in Definition 1 under Condition 3. Let $\bar D \equiv (D + D')/2$, where $D = WM_1$ and $W = C_1\left(C_1'\tilde\Omega C_1\right)^{-1}C_1'$. Let $\Gamma$ be the matrix of eigenvectors of $W$ with corresponding eigenvalues $\lambda_i$. Assume $\lambda_1 \ge \ldots \ge \lambda_{T-1}$. Let $\Lambda = \operatorname{diag}(\lambda_1,\ldots,\lambda_{T-1})$ and let $\Lambda_1$ be the upper left block of $\Lambda$ containing all the non-zero eigenvalues. Partition $\Gamma = [\Gamma_1,\Gamma_2]$ where $\Gamma_1$ are the eigenvectors corresponding to the nonzero eigenvalues of $W$ such that $\Gamma_1\Lambda_1\Gamma_1' = W$ and $\Gamma_1\Gamma_1' + \Gamma_2\Gamma_2' = I$. Then
$$E\left[X\left(C_1,\tilde\Omega\right)\right] = \mathcal{M}_1\left(\lambda^{-1},\Gamma_1'\bar D\Gamma_1,\Lambda_1,r\right) \equiv \lambda^{-1}\sum_{k=0}^\infty\frac{(1)_k\left(\tfrac12\right)_{1+k}}{\left(\tfrac r2\right)_{1+k}k!}\,C^{1,k}_{1+k}\left(\Gamma_1'\bar D\Gamma_1,\ I_r - \lambda^{-1}\Lambda_1\right)$$
where $r = \operatorname{rank}(W)$, $(a)_b$ is the Pochhammer symbol $\Gamma(a+b)/\Gamma(a)$, $C^{1,k}_{p+k}(\cdot,\cdot)$ is a top order invariant polynomial defined by Davis (1980), and $\lambda$ is the largest eigenvalue of $W$. The mean $E\left[X\left(C_1,\tilde\Omega\right)\right]$ exists for $r \ge 1$.
Proof. See Appendix A.
The Theorem shows that the bias of $\hat\beta^*_{2SLS}$ depends both on the choice of $C_1$ and on the weight matrix $\tilde\Omega$. Note for example that $E[X(C_1,I_{T-1})] = \operatorname{tr}\bar D/r_1$.

The problem of minimizing the bias by choosing optimal matrices $C_1$ and $\tilde\Omega$ does not seem to lead to an analytical solution, but could in principle be carried out numerically for a given number of time periods $T$. For our purpose we are not interested in such an exact minimum. Instead we analyze optimal choices of $C_1$ minimizing bias for a given weight matrix. Two choices of $\tilde\Omega$ are most relevant in practice, namely $\tilde\Omega = I_{T-1}$ and $\tilde\Omega = \Sigma_{22}$. We are able to obtain an analytical solution for the bias minimal estimator. We use this analytical optimum as a benchmark against which we judge existing estimators that have been proposed in the literature.
Theorem 3 Let $X(C_1,\tilde\Omega)$ be as defined in Definition 1. Let $\bar D = (D + D')/2$ where $D = C_1\left(C_1'\tilde\Omega C_1\right)^{-1}C_1'M_1$. Let $\Gamma_1$ be as defined in Theorem 2. Then
$$\min_{\substack{C_1\ \mathrm{s.t.}\ C_1'C_1 = I_{r_1}\\ r_1 = 1,\ldots,T-1}}\left|E[X(C_1,I_{T-1})]\right| = \min_{\substack{C_1\ \mathrm{s.t.}\ C_1'C_1 = I_{r_1}\\ r_1 = 1,\ldots,T-1}}\left|E[X(C_1,\Sigma_{22})]\right|.$$
Moreover,
$$E[X(C_1,I_{T-1})] = \operatorname{tr}\bar D/r_1.$$
Let $C_1^* = \arg\min_{C_1}|E[X(C_1,I_{T-1})]|$ subject to $C_1'C_1 = I_{r_1}$, $r_1 = 1,\ldots,T-1$. Then $C_1^* = \rho_i$, where $\rho_i$ is the eigenvector corresponding to the smallest eigenvalue $l_i$ of $M_2$. Thus, $\min_{C_1}\left|\operatorname{tr}\bar D/r_1\right| = \min l_i/2$. As $T\to\infty$ the smallest eigenvalue of $M_2$, $\min l_i$, tends to 0.
Theorem 3 shows that the estimator that minimizes the bias is based on only a single moment condition, which is a linear combination of the moment conditions involving $y_{i0}$ as instrument, where the weights are the elements of the eigenvector $\rho_i$ corresponding to the smallest eigenvalue of $M_2$.
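Since $M_2$ is the standard tridiagonal matrix with 2 on the diagonal and $-1$ off it, the bias minimal moment condition is easy to compute. A short sketch of ours finds $\rho_i$, the minimal bias $\min l_i/2$, and checks the bound $\min l_i \le 2/(T-1)$ established in the Appendix:

    import numpy as np

    def optimal_C1(T):
        M2 = (2 * np.eye(T - 1) - np.diag(np.ones(T - 2), 1)
              - np.diag(np.ones(T - 2), -1))
        eigval, eigvec = np.linalg.eigh(M2)        # eigenvalues in ascending order
        return eigval[0], eigvec[:, 0]             # min l_i and its eigenvector rho_i

    for T in (5, 10, 20):
        l_min, rho = optimal_C1(T)
        print(T, l_min / 2, 2 / (T - 1))           # minimal bias vs. the bound

The minimal bias $\min l_i/2 = 1 - \cos(\pi/T)$ shrinks as $T$ grows, consistent with the statement of Theorem 3.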
Optimal estimators based on C∗1 are unintuitive to implement. Moreover, optimality only
holds near the unit circle. For these reasons we do not recommend direct implementation of
these optimal procedures. We instead use them as benchmarks against which we measure the
performance of more easily implementable estimators.
We turn now to an analysis of the bias and mean squared error of commonly used estimators under near unit root asymptotics. Let $\mathbf{1}_k' = [1,\ldots,1]$ be a $k$-vector of ones and note that for the Arellano and Bond estimator
$$C_1^{AB} = \begin{pmatrix}0 & \cdots & 0 & \mathbf{1}_{T-1}'\\ \vdots & & \mathbf{1}_2' & 0\\ 0 & ⋰ & & \vdots\\ 1 & 0 & \cdots & 0\end{pmatrix}$$
is a $(T-1)\times((T-1)T/2)$ matrix of rank $T-1$. Correspondingly, for the Ahn and Schmidt estimator, $C_1^{AS} = \left[C_1^{AB};\ \mathbf{0}_{(T-1)\times(2T-3)}\right]$, a $(T-1)\times((T+1)T/2+T-3)$ matrix of rank $T-1$. The long difference estimator takes the form
$$C_1^{LD} = \left[\mathbf{1}_{T-1};\ \mathbf{0}_{(T-1)\times(T-2)}\right]$$
where now $W^{LD} = C_1^{LD}\left(C_1^{LD\prime}\Sigma_{22}C_1^{LD}\right)^+C_1^{LD\prime}$ is singular of rank 1. The next result gives explicit formulas for the bias of the asymptotic limiting distribution of the Ahn and Schmidt, Arellano and Bond and long difference estimators.
Corollary 2 Let $X\left(C_1^{AS},\Sigma_{22}\right)$, $X\left(C_1^{AB},\Sigma_{22}\right)$ and $X\left(C_1^{LD},\Sigma_{22}\right)$ be the limiting distributions of the Ahn-Schmidt, Arellano-Bond and long difference estimators under Condition 3. Let $\bar D^a \equiv \left(D^a + D^{a\prime}\right)/2$, where $D^a = W^aM_1$ and $W^a = C_1^a\left(C_1^{a\prime}\Sigma_{22}C_1^a\right)^+C_1^{a\prime}$ for $a = $ "AS", "AB", "LD". Let $\lambda_a$ be the largest eigenvalue of $W^a$. Then for $a = $ "AS", "AB",
$$E\left[X\left(C_1^a,\Sigma_{22}\right)\right] = \mathcal{M}_1\left(\lambda_a^{-1},\ \bar D^a,\ W^a,\ T-1\right).$$
Let $\Gamma$ be the matrix of eigenvectors of $W^{LD}$ with $\Gamma = [\Gamma_1,\Gamma_2]$, where $\Gamma_1$ is the eigenvector corresponding to the nonzero eigenvalue of $W^{LD}$ such that $\Gamma_1\Lambda_1\Gamma_1' = W^{LD}$ and $\Gamma_1\Gamma_1' + \Gamma_2\Gamma_2' = I$. Define $z_1 = L^{-1}\Gamma_1'\xi_x$ and $z_2 = L^{-1}\Gamma_2'\xi_x$. Then, for $a = $ "LD",
$$E\left[X\left(C_1^a,\Sigma_{22}\right)\right] = \mathcal{M}_1\left(\lambda_a^{-1},\ \Gamma_1'\bar D^a\Gamma_1,\ \Lambda_1,\ 1\right).$$
Numerical procedures to evaluate the top order invariant polynomials appearing in the ex-
pressions for the bias are discussed in Smith (1993). Table 6 reports numerical values for the
biases of the Arellano and Bond and Ahn and Schmidt estimators under near unit root as-
ymptotics and compares them with biases for the long difference estimator as well as the bias
minimal estimator. Although the long difference estimator does not achieve the same bias reduction as the fully optimal estimator, it has significantly less bias than the more commonly used
implementations of the GMM estimator.
We now turn to the analysis of the mean squared error implied by the asymptotic distribution under Condition (3). Since $\hat\beta^*_{2SLS} - 1 \xrightarrow{d} X^*(C_1,\Sigma_{22})$ we take $E\left[X^*(C_1,\Sigma_{22})^2\right]$ as a measure of the MSE of $\hat\beta^*_{2SLS}$. Unlike in the case of bias, it is not feasible to find analytical optima for the problem of minimizing $E\left[X^*(C_1,\Sigma_{22})^2\right]$ with respect to $C_1$.

Nevertheless, for a given value of $T$ and a particular point in the parameter space, such optimization can be carried out numerically by Monte Carlo integration techniques and subsequent numerical optimization. We carry out such a numerical exercise to establish that conventional estimators which are optimal under standard first order asymptotic theory fail to be admissible under weak identification circumstances. In our numerical examples we calibrate the asymptotic distribution to the case where $\beta = .95$ and $n = 100$. This implies a value of $c = 100(-\log .95) \approx 5.13$. In Table 7 we compare $E\left[X(C_1^*,\Sigma_{22})^2\right]$ to $E\left[X\left(C_1^{AS},\Sigma_{22}\right)^2\right]$ for different values of $T$. Here
$$C_1^* = \arg\min_{\substack{C_1\ \mathrm{s.t.}\ C_1'C_1 = I_{r_1}\\ r_1 = 3,\ldots,T-1}}E\left[X(C_1,\Sigma_{22})^2\right].$$
Note that the minimal dimension of $C_1$ is 3 to guarantee the existence of the second moment. The results of Table 7 show that the MSE of the Ahn and Schmidt estimator is clearly dominated by the optimal procedure. It also shows that the optimal number of moment conditions increases slowly with $T$ and is much lower than the number of moments used by Ahn and Schmidt. In fact, the optimal procedure uses only about 1/10 of the available moments.
These results show that the optimality arguments under standard asymptotic approxima-
tions break down when the model is weakly identified. While it is optimal from an asymptotic
variance and thus MSE point of view under standard asymptotics to use as many instruments as
possible, this argument is invalid under the near non-identification asymptotics we use here. The
GMM estimator that minimizes the asymptotic MSE under these circumstances uses a subset of
moment conditions that is much smaller than the set of all available moment conditions. Some
intuition can be gained from our analytical results regarding bias minimization. In Theorem (3) it is shown that the bias minimal procedure in fact only uses a single moment condition. When focusing on the MSE, this bias reduction needs to be balanced against efficiency considerations.
The resulting optimum lies somewhere between estimators that use all and estimators that use
very little information.
However, we cannot compare the MSE of the Ahn and Schmidt and Arellano and Bond esti-
mators to our long difference estimator analytically because under the near unit root asymptotic
limiting distribution the long difference estimator has unbounded second moments. At the same
time the Monte Carlo results are consistent with the analytic result that the optimal estimator
does not use all the moment conditions. Indeed, the optimal estimator uses only a small fraction
of the moment conditions. The long difference estimator does well in terms of MSE because of
its small amount of bias and its use of a limited set of moment conditions.
Implementation of a procedure that is optimal in terms of its MSE under non-standard
asymptotics is not recommended because such a procedure depends on the unknown parameter
β through δ. Instead, our Monte Carlo results show that the long difference estimator is a
reasonable, very easily implementable approximation to such an optimal estimator.
6 Conclusion
We have considered the dynamic panel data model where it has been recognized that the usual
IV (GMM) estimators have substantial bias for a large positive β, which commonly occurs in
applied research. We consider two techniques to solve the bias problem. First we propose
a second-order bias corrected IV estimator. However, this estimator does not solve the bias
problem for large β, as is demonstrated by Monte-Carlo experiments. Our second approach is
an IV estimator that uses a reduced set of instruments, in particular “long differences”. This
estimator leads to a significant reduction in bias as predicted by a “weak instrument” analysis.
Further, the long difference estimator does well in MSE comparisons with existing estimators.
We then conduct a “local to unity” analysis to analyze why the long difference estimator
does significantly better than traditional second order theory predicts. We find that near the
unit circle the optimal estimator in terms of bias uses only one moment restriction, similar to
the long difference estimator. We also find that the optimal estimator in terms of MSE uses
a much more restricted set of moment restrictions, typically about 1/10 of the available restrictions.
Thus, our analysis demonstrates that the usual first order asymptotic advice of using all of the
moment restrictions does not provide proper guidance in the dynamic panel data model for large
β. Instead, a restricted set of moment conditions leads to a better estimator. The long difference
estimator, which uses a restricted set of moment conditions, provides such an improved estimator
as our Monte-Carlo results demonstrate.
A Technical Appendix
Proof of Lemma 1. Let $\xi_{i0} \equiv y_{i0} - \alpha_i/(1-\beta_n)$ such that $\xi_{i0} \sim N\!\left(0,\ \sigma^2_\varepsilon/(1-\beta_n^2)\right)$. Note that
$$\Delta y_{it} = \beta_n^{t-1}(\beta_n - 1)\xi_{i0} + \varepsilon_{it} + (\beta_n - 1)\sum_{s=1}^{t-1}\beta_n^{s-1}\varepsilon_{i,t-s}$$
such that
$$E[|u_{it}\Delta y_{i,s-1}|] \le \sqrt{E[u_{it}^2]}\sqrt{E\left[(\Delta y_{i,s-1})^2\right]} = \sqrt{\sigma^2_\varepsilon + \sigma^2_\alpha}\sqrt{E\left(\beta_n^{s-2}(\beta_n-1)\xi_{i0} + \varepsilon_{i,s-1} + (\beta_n-1)\sum_{r=1}^{s-2}\beta_n^{r-1}\varepsilon_{i,s-1-r}\right)^2}$$
$$= \sqrt{\sigma^2_\varepsilon + \sigma^2_\alpha}\sqrt{\beta_n^{2(s-2)}\frac{\sigma^2_\varepsilon(\beta_n-1)^2}{1-\beta_n^2} + \sigma^2_\varepsilon + (\beta_n-1)^2\sigma^2_\varepsilon\sum_{r=1}^{s-2}\beta_n^{2(r-1)}} = O(1).$$
By independence of $u_{it}\Delta y_{i,s-1}$ across $i$, it therefore follows that $n^{-3/2}\sum_{i=1}^n u_{it}\Delta y_{i,s-1} = o_p(1)$. By the same reasoning, we obtain $n^{-3/2}\sum_{i=1}^n u_{iT}\Delta y_{i,j-1} = o_p(1)$ and $n^{-3/2}\sum_{i=1}^n \bar u_i\Delta y_{i,k-1} = o_p(1)$. We therefore obtain $n^{-3/2}\sum_{i=1}^n f_{i,1} = o_p(1)$. We can similarly obtain $n^{-3/2}\sum_{i=1}^n g_{i,1} = o_p(1)$.

Next we consider $n^{-3/2}\sum_{i=1}^n f_{i,2}$ and $n^{-3/2}\sum_{i=1}^n g_{i,2}$. Note that
$$E[\Delta y_{it}y_{i0}] = E\left[y_{i0}\left(\beta_n^{t-1}(\beta_n-1)\xi_{i0} + \varepsilon_{it} + (\beta_n-1)\sum_{r=1}^{t-1}\beta_n^{r-1}\varepsilon_{i,t-r}\right)\right] = \beta_n^{t-1}\sigma^2_\varepsilon\frac{\beta_n-1}{1-\beta_n^2} = \sigma^2_\varepsilon\frac{-c/n}{2c/n} + o(1) = -\frac{\sigma^2_\varepsilon}{2} + o(1)$$
and
$$E\left[(\Delta y_{it}y_{i0})^2\right] = E\left[y_{i0}^2\left(\beta_n^{t-1}(\beta_n-1)\xi_{i0} + \varepsilon_{it} + (\beta_n-1)\sum_{r=1}^{t-1}\beta_n^{r-1}\varepsilon_{i,t-r}\right)^2\right]$$
$$= \beta_n^{2(t-1)}\frac{(\beta_n-1)^2\sigma^2_\varepsilon}{1-\beta_n^2}\frac{\sigma^2_\alpha}{(1-\beta_n)^2} + 3\beta_n^{2(t-1)}\frac{\sigma^4_\varepsilon(\beta_n-1)^2}{\left(1-\beta_n^2\right)^2} + \left(\sigma^2_\varepsilon + (\beta_n-1)^2\sigma^2_\varepsilon\sum_{r=1}^{t-2}\beta_n^{2(r-1)}\right)\left(\frac{\sigma^2_\alpha}{(1-\beta_n)^2} + \frac{\sigma^2_\varepsilon}{1-\beta_n^2}\right) = \frac{\sigma^2_\varepsilon\sigma^2_\alpha}{c^2}n^2 + O(n),$$
such that $\operatorname{Var}\left(n^{-3/2}\sum_{i=1}^n\Delta y_{it}y_{i0}\right) = O(1)$. For $n^{-3/2}\sum_{i=1}^n g_{i,2}(\beta_0)$ we have from the moment conditions that $E[g_{i,2}(\beta_0)] = 0$ and
$$\operatorname{Var}(\Delta u_i(\beta_0)y_{i0}) = 2\sigma^2_\varepsilon\sigma^2_\alpha(1-\beta_n)^{-2} + O(n) = \frac{2\sigma^2_\varepsilon\sigma^2_\alpha}{c^2}n^2 + O(n).$$
The joint limiting distribution of $n^{-3/2}\sum_{i=1}^n\left[f_{i,2}' - E[f_{i,2}'],\ g_{i,2}(\beta_0)'\right]'$ can now be obtained from a triangular array CLT. By previous arguments
$$E\left[f_{i,2}',\ g_{i,2}(\beta_0)'\right] = \left[\mu'\ \ 0\ \cdots\ 0\right]$$
with $\mu = -\left(\sigma^2_\varepsilon/2\right)\iota + O(n^{-1})$, where $\iota$ is the $(T-1)$-dimensional vector with all elements equal to 1. Then
$$E\left[\left(f_{i,2}' - E[f_{i,2}'],\ g_{i,2}(\beta_0)'\right)'\left(f_{i,2}' - E[f_{i,2}'],\ g_{i,2}(\beta_0)'\right)\right] = \Sigma_n$$
where
$$\Sigma_n = \begin{pmatrix}\Sigma_{11,n} & \Sigma_{12,n}\\ \Sigma_{21,n} & \Sigma_{22,n}\end{pmatrix}.$$
By previous calculations we have found the diagonal elements of $\Sigma_{11,n}$ and $\Sigma_{22,n}$ to be $\frac{\sigma^2_\varepsilon\sigma^2_\alpha}{c^2}n^2$ and $\frac{2\sigma^2_\varepsilon\sigma^2_\alpha}{c^2}n^2$. The off-diagonal elements of $\Sigma_{11,n}$ are found to be
$$E\left[\Delta y_{it}\Delta y_{is}y_{i0}^2\right] = E\left[y_{i0}^2\left(\beta_n^{s-1}(\beta_n-1)\xi_{i0} + \varepsilon_{is} + (\beta_n-1)\sum_{r=1}^{s-1}\beta_n^{r-1}\varepsilon_{i,s-r}\right)\left(\beta_n^{t-1}(\beta_n-1)\xi_{i0} + \varepsilon_{it} + (\beta_n-1)\sum_{r=1}^{t-1}\beta_n^{r-1}\varepsilon_{i,t-r}\right)\right]$$
$$= \beta_n^{t-1}\beta_n^{s-1}\frac{(\beta_n-1)^2}{1-\beta_n^2}\left(\frac{\sigma^2_\varepsilon\sigma^2_\alpha}{(1-\beta_n)^2} + 3\frac{\sigma^4_\varepsilon}{1-\beta_n^2}\right) + O(1) = \frac{\sigma^2_\varepsilon\sigma^2_\alpha}{2c}n + O(1),$$
which is of lower order of magnitude, while $n^{-1}\left(E[\Delta y_{it}y_{i0}]\right)^2 = O(1)$. Thus $n^{-2}\Sigma_{11,n} \to \operatorname{diag}\left(\frac{\sigma^2_\varepsilon\sigma^2_\alpha}{c^2},\ldots,\frac{\sigma^2_\varepsilon\sigma^2_\alpha}{c^2}\right)$. The off-diagonal elements of $\Sigma_{22,n}$ are obtained from
$$E\left[\Delta u_{it}\Delta u_{is}y_{i0}^2\right] = \begin{cases}-\sigma^2_\varepsilon\sigma^2_\alpha(1-\beta_n)^{-2} + O(n) & t = s+1 \text{ or } t = s-1\\ 0 & \text{otherwise.}\end{cases}$$
For $\Sigma_{12,n}$, we consider
$$E\left[\Delta y_{it}\Delta u_{is}y_{i0}^2\right] = \begin{cases}\frac{\sigma^2_\varepsilon\sigma^2_\alpha}{c^2}n^2 + O(n) & \text{if } t = s\\ -\frac{\sigma^2_\varepsilon\sigma^2_\alpha}{c^2}n^2 + O(n) & \text{if } t = s-1\\ 0 & \text{otherwise.}\end{cases}$$
It then follows that for $\ell \in \mathbb{R}^{2(T-1)}$ such that $\ell'\ell = 1$, $n^{-1/2}\sum_{i=1}^n\ell'\Sigma_n^{-1/2}\left[f_{i,2}' - Ef_{i,2}',\ g_{i,2}(\beta_0)'\right]' \xrightarrow{d} N(0,1)$ by the Lindeberg-Feller CLT for triangular arrays. It then follows from a straightforward application of the Cramer-Wold theorem and the continuous mapping theorem that $n^{-3/2}\sum_{i=1}^n\left[f_{i,2}',\ g_{i,2}(\beta_0)'\right]' \xrightarrow{d} \left[\xi_x',\ \xi_y'\right]'$ where $\left[\xi_x',\xi_y'\right]' \sim N(0,\Sigma)$. Note that $n^{-3/2}\sum_{i=1}^n\ell'E[f_{i,2}] = O(n^{-1/2})$ and thus does not affect the limit distribution.
Finally note that $E\left[g_{i,1}g_{i,1}'\right] = O(1)$. Also note that
$$\operatorname{Var}(\Delta u_i(\beta_0)y_{i0}) = 2\sigma^2_\varepsilon\sigma^2_\alpha(1-\beta_n)^{-2} + O(n) = \frac{2\sigma^2_\varepsilon\sigma^2_\alpha}{c^2}n^2 + O(n).$$
The off-diagonal elements of $E\left[g_{i,2}g_{i,2}'\right]$ are obtained from
$$E\left[\Delta u_{it}\Delta u_{is}y_{i0}^2\right] = \begin{cases}-\sigma^2_\varepsilon\sigma^2_\alpha(1-\beta_n)^{-2} + O(n) & t = s+1 \text{ or } t = s-1\\ 0 & \text{otherwise.}\end{cases}$$
It therefore follows that
$$\frac{1}{n^2}E\left[g_ig_i'\right] = \begin{pmatrix}0 & 0\\ 0 & \Sigma_{22}\end{pmatrix} + o(1).$$
Proof of Theorem 2. Use the transformation $z = L^{-1}\xi_x$ with $\Sigma_{11} = LL'$. Note that we can take $L = \sqrt\delta I$. Define $W = C_1\left(C_1'\tilde\Omega C_1\right)^{-1}C_1'$. Then $X\left(C_1,\tilde\Omega\right) = z'L'WH\xi_u/z'L'WLz$. Let $F$ be such that $E[H\xi_u|\xi_x] = F\xi_x$; then $F = \Sigma_{21}\Sigma_{11}^{-1} = M_1$, where the second equality is based on $\Sigma_{11}^{-1} = \delta^{-1}I$. Using $E[H\xi_u|\xi_x] = F\xi_x = FLz$ and a conditioning argument it then follows that
$$E\left[X\left(C_1,\tilde\Omega\right)\right] = E\left[\frac{z'L'WFLz}{z'L'WLz}\right] = E\left[\frac{z'\bar Dz}{z'Wz}\right].$$
Note that $z'Dz = z'D'z = z'\bar Dz$ where $\bar D = \frac12(D + D')$ is symmetric. Define $z_1 = \Gamma_1'z$ and $z_2 = \Gamma_2'z$. It now follows that
$$E\left[\frac{z'\bar Dz}{z'Wz}\right] = E\left[\frac{z_1'\Gamma_1'\bar D\Gamma_1z_1 + z_1'\Gamma_1'\bar D\Gamma_2z_2}{z_1'\Lambda_1z_1}\right] = E\left[\frac{z_1'\Gamma_1'\bar D\Gamma_1z_1}{z_1'\Lambda_1z_1}\right]$$
where the first equality follows from $\Gamma_2'W = 0$ and the second equality follows from independence between $z_1$ and $z_2$. The result then follows from Smith (1993, Eq. 2.4, p. 273).
Proof of Theorem 3. We first analyze $E[X(C_1,\Sigma_{22})]$. Note that in this case $W = C_1\left(C_1'\Sigma_{22}C_1\right)^{-1}C_1' = \delta^{-1}C_1\left(C_1'M_2C_1\right)^{-1}C_1'$ such that
$$E[X(C_1,\Sigma_{22})] = E\left[\frac{z_1'\Gamma_1'\bar D\Gamma_1z_1}{z_1'\Lambda_1z_1}\right]$$
and $\delta\operatorname{tr}\bar D = (\delta/2)\operatorname{tr}W\left(M_1' + M_1\right) = -r_1/2$ since $M_1' + M_1 = -M_2$, where $r_1$ is the rank of $C_1$. Then it follows from Smith (1993, p. 273 and Appendix A) that
$$E\left[\frac{z_1'\Gamma_1'\bar D\Gamma_1z_1}{z_1'\Lambda_1z_1}\right] = \lambda^{-1}\sum_{k=0}^\infty\frac{(1)_k\left(\tfrac12\right)_{1+k}}{\left(\tfrac{r_1}2\right)_{1+k}k!}\,C^{1,k}_{1+k}\left(\Gamma_1'\bar D\Gamma_1,\ I_{r_1} - \lambda^{-1}\Lambda_1\right), \tag{14}$$
where for any two real symmetric matrices $Y_1$, $Y_2$,
$$C^{1,k}_{1+k}(Y_1,Y_2) = \frac{k!}{2\left(\tfrac12\right)_{k+1}}\sum_{i=0}^k\frac{\left(\tfrac12\right)_{k-i}}{(k-i)!}\operatorname{tr}\!\left(Y_1Y_2^i\right)C_{k-i}(Y_2)$$
and
$$C_k(Y_2) = \frac{k!}{\left(\tfrac12\right)_k}d_k(Y_2), \qquad d_k(Y_2) = k^{-1}\sum_{j=0}^{k-1}\frac12\operatorname{tr}\!\left(Y_2^{k-j}\right)d_j(Y_2), \qquad d_0(Y_2) = 1.$$
Since all elements $\lambda^{-1}\lambda_i$ of $\lambda^{-1}\Lambda_1$ satisfy $0 < \lambda^{-1}\lambda_i \le 1$, it follows that $I_{r_1} - \lambda^{-1}\Lambda_1$ is positive semidefinite, implying that $C_k\left(I_{r_1} - \lambda^{-1}\Lambda_1\right) \ge 0$, which holds with equality only if all eigenvalues $\lambda_i$ are the same. Also note that, if $Y_2$, $Y_3$ are diagonal matrices and $Y_1$ is any conformable matrix, then $\operatorname{tr}Y_3Y_1Y_2^i = \operatorname{tr}Y_1Y_3Y_2^i$. Therefore,
$$\operatorname{tr}\Gamma_1'\bar D\Gamma_1\left(I_{r_1} - \lambda^{-1}\Lambda_1\right)^i = \frac12\operatorname{tr}\Gamma_1'\left(WM_1 + M_1'W\right)\Gamma_1\left(I_{r_1} - \lambda^{-1}\Lambda_1\right)^i = \frac12\operatorname{tr}\left(\Lambda_1\Gamma_1'M_1\Gamma_1 + \Gamma_1'M_1'\Gamma_1\Lambda_1\right)\left(I_{r_1} - \lambda^{-1}\Lambda_1\right)^i$$
$$= \frac12\operatorname{tr}\Gamma_1'\left(M_1 + M_1'\right)\Gamma_1\Lambda_1\left(I_{r_1} - \lambda^{-1}\Lambda_1\right)^i = -\frac12\operatorname{tr}\Gamma_1'M_2\Gamma_1\Lambda_1\left(I_{r_1} - \lambda^{-1}\Lambda_1\right)^i.$$
Next, note that for two positive semi-definite matrices $Y_1$ and $Y_2$ it follows that $\operatorname{tr}Y_1Y_2 = \operatorname{tr}Y_2^{1/2}Y_1Y_2^{1/2} \ge 0$ because $Y_2^{1/2}Y_1Y_2^{1/2}$ is well defined and positive semi-definite by positive semi-definiteness of $Y_1$. Next note that $\Lambda_1\left(I_{r_1} - \lambda^{-1}\Lambda_1\right)^i$ is positive semi-definite by previous arguments. Also, $\Gamma_1'M_2\Gamma_1$ is positive definite. Therefore, every $\operatorname{tr}\left(\Gamma_1'\bar D\Gamma_1\left(I_{r_1} - \lambda^{-1}\Lambda_1\right)^i\right)C_{k-i}\left(I_{r_1} - \lambda^{-1}\Lambda_1\right)$ has the same sign, and so does every $C^{1,k}_{1+k}\left(\Gamma_1'\bar D\Gamma_1,\ I_{r_1} - \lambda^{-1}\Lambda_1\right)$. Therefore, we have
$$\left|E[X(C_1,\Sigma_{22})]\right| \ge \left|\lambda^{-1}\sum_{k=0}^0\frac{(1)_k\left(\tfrac12\right)_{1+k}}{\left(\tfrac{r_1}2\right)_{1+k}k!}\,C^{1,k}_{1+k}\left(\Gamma_1'\bar D\Gamma_1,\ I_{r_1} - \lambda^{-1}\Lambda_1\right)\right|.$$
For $k = 0$, we have
$$C^{1,0}_1\left(\Gamma_1'\bar D\Gamma_1,\ I_{r_1} - \lambda^{-1}\Lambda_1\right) = \operatorname{tr}\Gamma_1'\bar D\Gamma_1 \ge -\delta^{-1}r_1/2. \tag{15a}$$
For the last inequality let $\lambda_i^{\bar D}$ be the eigenvalues of $\bar D$. By Magnus and Neudecker (1988, Theorem 13, p. 211), $\operatorname{tr}\Gamma_1'\bar D\Gamma_1 \ge \sum_{i=1}^{r_1}\lambda_i^{\bar D}$. Note that $\bar D$ is of reduced rank $r_1$ such that $\operatorname{tr}\bar D = \sum_{i=1}^{r_1}\lambda_i^{\bar D} =$
$-\delta^{-1}r_1/2$. Also, $(1)_0\left(\tfrac12\right)_1/\left(\tfrac{r_1}2\right)_1 = \frac1{r_1}$. This shows that $|E[X(C_1,\Sigma_{22})]| \ge \frac{\lambda^{-1}}{2\delta}$ for all $C_1$ such that $C_1'C_1 = I_{r_1}$ and all $r_1 \in \{1,2,\ldots,T-1\}$. Then
$$\min_{\substack{C_1\ \mathrm{s.t.}\ C_1'C_1 = I_{r_1}\\ r_1\in\{1,2,\ldots,T-1\}}}\left|E[X(C_1,\Sigma_{22})]\right| \ge \min_{\substack{C_1\ \mathrm{s.t.}\ C_1'C_1 = I_{r_1}\\ r_1\in\{1,2,\ldots,T-1\}}}\frac{\lambda^{-1}}{2\delta} \ge \frac{\min l_j}{2}, \tag{16}$$
where $\min l_j$ is the smallest eigenvalue of $M_2$. Since $\lambda$ is the largest eigenvalue of $W$ and $W$ has the same nonzero eigenvalues as $\delta^{-1}\left(C_1'M_2C_1\right)^{-1}$ by Zhang (1999, Theorem 2.8, p. 51), it follows that $\lambda$ is the largest eigenvalue of $\delta^{-1}\left(C_1'M_2C_1\right)^{-1}$. The last inequality in (16) follows from Magnus and Neudecker (1988, Theorem 10, p. 209) as well as the fact that $\lambda$ is $\delta^{-1}$ times the largest eigenvalue of $\left(C_1'M_2C_1\right)^{-1}$. Now let $r_1 = 1$ and $C_1 = \rho_i$, where $\rho_i$ is the eigenvector corresponding to $\min l_j$. Note that for this case $W = \delta^{-1}\rho_i\rho_i'/\min l_j$ such that $\Gamma_1 = \rho_i$ and $\lambda = \delta^{-1}(\min l_j)^{-1}$. Then $I_{r_1} - \lambda^{-1}\Lambda_1 = 1 - 1 = 0$ and $\left|\Gamma_1'\bar D\Gamma_1\right| = 1/(2\delta)$ such that $|E[X(C_1,\Sigma_{22})]| = \lambda^{-1}\left|\operatorname{tr}\Gamma_1'\bar D\Gamma_1\right| = \delta\min l_j\cdot\frac1{2\delta} = \min l_j/2$. Inequality (16) therefore holds with equality.
Next consider $E[X(C_1,I_{T-1})]$. Note that now $W = C_1\left(C_1'C_1\right)^{-1}C_1' = C_1C_1'$ since $C_1'C_1 = I_{r_1}$, such that $W = \Gamma_1\Lambda_1\Gamma_1'$ with $\Lambda_1 = I_{r_1}$ and $\Gamma_1 = C_1$. Then $\lambda = 1$ and $I_{r_1} - \lambda\Lambda_1 = 0$ such that (14) reduces to $E[X(C_1,I_{T-1})] = \frac1{r_1}C^{1,0}_1\left(\Gamma_1'\bar D\Gamma_1,\ 0\right) = \operatorname{tr}\Gamma_1'\bar D\Gamma_1/r_1$. But then $\bar D = 2^{-1}\left(C_1C_1'M_1 + M_1'C_1C_1'\right)$ such that $\operatorname{tr}\Gamma_1'\bar D\Gamma_1/r_1 = \operatorname{tr}C_1'\bar DC_1/r_1 = -2^{-1}\operatorname{tr}C_1'M_2C_1/r_1 = \operatorname{tr}\bar D/r_1$. We analyze
$$\min_{\substack{C_1\ \mathrm{s.t.}\ C_1'C_1 = I_{r_1}\\ r_1\in\{1,2,\ldots,T-1\}}}\left|\frac{-2^{-1}\operatorname{tr}C_1'M_2C_1}{r_1}\right|.$$
It can be checked easily that $M_2$ is positive definite and symmetric. We can therefore minimize $\operatorname{tr}\left(C_1'M_2C_1\right)$. It is now useful to choose an orthogonal matrix $R$ with $j$-th column $\rho_j$ such that $R'R = RR' = I$ and $M_2 = RLR'$, where $L$ is the diagonal matrix of eigenvalues of $M_2 = \sum_{j=1}^{T-1}l_j\rho_j\rho_j'$. Then it follows that $\operatorname{tr}\left(C_1'M_2C_1\right) = \sum_{j=1}^{T-1}l_j\rho_j'C_1C_1'\rho_j$. Next note that all the eigenvalues of $C_1C_1'$ are either zero or one such that $0 \le \rho_j'C_1C_1'\rho_j \le 1$. The minimum of $\operatorname{tr}\left(C_1'M_2C_1\right)$ is then found by choosing $r_1 = 1$ and $C_1$ such that $C_1'\rho_j = 0$ except for the eigenvector $\rho_i$ corresponding to $\min l_j$. To show that $\operatorname{tr}\left(\bar D/r_1\right)$ is also minimized for $r_1 = 1$ and $C_1 = \rho_i$, where $\left|\operatorname{tr}\left(\bar D/r_1\right)\right| = \min l_j/2$, consider augmenting $C_1$ by a column vector $x$ such that $x'x = 1$ and $\rho_i'x = 0$. Then $C_1'C_1 = I_2$, $r_1 = 2$ and $\operatorname{tr}C_1'M_2C_1 = l_i + \sum_{j\ne i}l_j\left(\rho_j'x\right)^2$. By Parseval's equality, $\sum_{j\ne i}\left(\rho_j'x\right)^2 = 1$. Since $l_j \ge l_i$ we can bound $\operatorname{tr}\left(C_1'M_2C_1\right) \ge 2l_i$, but then $\operatorname{tr}\left(C_1'M_2C_1/2\right) \ge l_i$. This argument can be repeated for more than one orthogonal addition $x$. It now follows that $E[X(C_1,I_{T-1})] = \operatorname{tr}\left(\bar D/r_1\right)$ is minimized in absolute value for $r_1 = 1$ and $C_1 = \rho_i$, where $\rho_i$ is the eigenvector corresponding to the smallest eigenvalue.

Next note that from $x'x = 1$ such that $\min l_i \le x'M_2x \le \max l_i$ it follows that
$$\min l_i \le \mathbf{1}'M_2\mathbf{1}/(\mathbf{1}'\mathbf{1}) = 2(T-1)^{-1}$$
for $\mathbf{1} = [1,\ldots,1]'$, which shows that the smallest eigenvalue is bounded by a monotonically decreasing function of the number of moment conditions.
Proof of Corollary 2. For $a = $ "AS", "AB" the matrix $W^a$ is of full rank $T-1$. Thus, for $L = \sqrt\delta I_{T-1}$, $z = L^{-1}\xi_x \sim N(0, I_{T-1})$ and
$$E\left[X\left(C_1^a,\Sigma_{22}\right)\right] = E\left[\frac{z'L'W^aHE[\xi_u|\xi_x]}{z'L'W^aLz}\right] = E\left[\frac{z'L'W^a\delta^{-1}\Sigma_{21}Lz}{z'L'W^aLz}\right]$$
and the result follows from Smith (1993, Eq. 2.4, p. 273). For $a = $ "LD" note that
$$\frac{\xi_x'W^aH\xi_u}{\xi_x'W^a\xi_x} = \frac{z_1'\Gamma_1'W^aH\xi_u}{z_1'\Lambda_1z_1}$$
by the same arguments as in the proof of Theorem (2). The remainder of the proof is the same as in that theorem.
References

[1] Ahn, S., and P. Schmidt, 1995, “Efficient Estimation of Models for Dynamic Panel Data”, Journal of Econometrics 68, pp. 5-28.

[2] Ahn, S., and P. Schmidt, 1997, “Efficient Estimation of Dynamic Panel Data Models: Alternative Assumptions and Simplified Estimation”, Journal of Econometrics 76, pp. 309-321.

[3] Alonso-Borrego, C. and M. Arellano, 1999, “Symmetrically Normalized Instrumental-Variable Estimation Using Panel Data”, Journal of Business and Economic Statistics 17, pp. 36-49.

[4] Alvarez, J. and M. Arellano, 1998, “The Time Series and Cross-Section Asymptotics of Dynamic Panel Data Estimators”, unpublished manuscript.

[5] Andrews, D.W.K., 1992, “Generic Uniform Convergence”, Econometric Theory 8, pp. 241-257.

[6] Andrews, D.W.K., 1994, “Empirical Process Methods in Econometrics”, in R.F. Engle and D.L. McFadden, eds., Handbook of Econometrics, Vol. 4. Amsterdam: North-Holland.

[7] Anderson, T.W. and C. Hsiao, 1982, “Formulation and Estimation of Dynamic Models Using Panel Data”, Journal of Econometrics 18, pp. 47-82.

[8] Arellano, M. and S.R. Bond, 1991, “Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations”, Review of Economic Studies 58, pp. 277-297.

[9] Arellano, M. and O. Bover, 1995, “Another Look at the Instrumental Variable Estimation of Error-Component Models”, Journal of Econometrics 68, pp. 29-51.

[10] Bekker, P., 1994, “Alternative Approximations to the Distributions of Instrumental Variable Estimators”, Econometrica 62, pp. 657-681.

[11] Blundell, R. and S. Bond, 1998, “Initial Conditions and Moment Restrictions in Dynamic Panel Data Models”, Journal of Econometrics 87, pp. 115-143.

[12] Chamberlain, G., 1984, “Panel Data”, in Griliches, Z. and M.D. Intriligator, eds., Handbook of Econometrics, Vol. 2. Amsterdam: North-Holland.

[13] Chao, J. and N. Swanson, 2000, “An Analysis of the Bias and MSE of the IV Estimator under Weak Identification with Application to Bias Correction”, unpublished manuscript.

[14] Choi, I. and P.C.B. Phillips, 1992, “Asymptotic and Finite Sample Distribution Theory for IV Estimators and Tests in Partially Identified Structural Equations”, Journal of Econometrics 51, pp. 113-150.

[15] Davis, A.W., 1980, “Invariant Polynomials with Two Matrix Arguments, Extending the Zonal Polynomials”, in P.R. Krishnaiah, ed., Multivariate Analysis V, pp. 287-299. Amsterdam: North-Holland.

[16] Donald, S. and W. Newey, 2001, “Choosing the Number of Instruments”, Econometrica 69, pp. 1161-1191.

[17] Dufour, J.-M., 1997, “Some Impossibility Theorems in Econometrics with Applications to Structural and Dynamic Models”, Econometrica 65, pp. 1365-1388.

[18] Griliches, Z. and J. Hausman, 1986, “Errors in Variables in Panel Data”, Journal of Econometrics 31, pp. 93-118.

[19] Hahn, J., 1997, “Efficient Estimation of Panel Data Models with Sequential Moment Restrictions”, Journal of Econometrics 79, pp. 1-21.

[20] Hahn, J., 1999, “How Informative Is the Initial Condition in the Dynamic Panel Model with Fixed Effects?”, Journal of Econometrics 93, pp. 309-326.

[21] Hahn, J. and J. Hausman, 2002, “A New Specification Test for the Validity of Instrumental Variables”, Econometrica 70, pp. 163-189.

[22] Hahn, J. and G. Kuersteiner, 2002, “Asymptotically Unbiased Inference for a Dynamic Panel Model with Fixed Effects When Both n and T are Large”, Econometrica 70, pp. 1639-1657.

[23] Hahn, J. and G. Kuersteiner, 2002, “Discontinuities of Weak Instrument Limiting Distributions”, Economics Letters 75, pp. 325-331.

[24] Hahn, J., G. Kuersteiner, and W.K. Newey, 2002, “Higher Order Properties of Bootstrap and Jackknife Bias Corrected Maximum Likelihood Estimators”, unpublished manuscript.

[25] Hall, P. and J. Horowitz, 1996, “Bootstrap Critical Values for Tests Based on Generalized-Method-of-Moments Estimators”, Econometrica 64, pp. 891-916.

[26] Hausman, J.A., and W.E. Taylor, 1983, “Identification in Linear Simultaneous Equations Models with Covariance Restrictions: An Instrumental Variables Interpretation”, Econometrica 51, pp. 1527-1550.

[27] Hausman, J.A., W.K. Newey, and W.E. Taylor, 1987, “Efficient Estimation and Identification of Simultaneous Equation Models with Covariance Restrictions”, Econometrica 55, pp. 849-874.

[28] Holtz-Eakin, D., W.K. Newey, and H. Rosen, 1988, “Estimating Vector Autoregressions with Panel Data”, Econometrica 56, pp. 1371-1395.

[29] Kiviet, J.F., 1995, “On Bias, Inconsistency, and Efficiency of Various Estimators in Dynamic Panel Data Models”, Journal of Econometrics 68, pp. 53-78.

[30] Kruiniger, H., “GMM Estimation of Dynamic Panel Data Models with Persistent Data”, unpublished manuscript.

[31] Kuersteiner, G.M., 2000, “Mean Squared Error Reduction for GMM Estimators of Linear Time Series Models”, unpublished manuscript.

[32] Lancaster, T., 1997, “Orthogonal Parameters and Panel Data”, unpublished manuscript.

[33] Magnus, J., and H. Neudecker, 1988, Matrix Differential Calculus with Applications in Statistics and Econometrics. John Wiley & Sons.

[34] Mariano, R.S. and T. Sawa, 1972, “The Exact Finite-Sample Distribution of the Limited-Information Maximum Likelihood Estimator in the Case of Two Included Endogenous Variables”, Journal of the American Statistical Association 67, pp. 159-163.

[35] Moon, H.R. and P.C.B. Phillips, 2000, “GMM Estimation of Autoregressive Roots Near Unity with Panel Data”, unpublished manuscript.

[36] Nagar, A.L., 1959, “The Bias and Moment Matrix of the General k-Class Estimators of the Parameters in Simultaneous Equations”, Econometrica 27, pp. 575-595.

[37] Nelson, C., R. Startz, and E. Zivot, 1998, “Valid Confidence Intervals and Inference in the Presence of Weak Instruments”, International Economic Review 39, pp. 1119-1146.

[38] Newey, W.K. and R.J. Smith, 2000, “Asymptotic Equivalence of Empirical Likelihood and the Continuous Updating Estimator”, unpublished manuscript.

[39] Nickell, S., 1981, “Biases in Dynamic Models with Fixed Effects”, Econometrica 49, pp. 1417-1426.

[40] Pakes, A. and D. Pollard, 1989, “Simulation and the Asymptotics of Optimization Estimators”, Econometrica 57, pp. 1027-1057.

[41] Phillips, P.C.B., 1989, “Partially Identified Econometric Models”, Econometric Theory 5, pp. 181-240.

[42] Rilstone, P., V.K. Srivastava, and A. Ullah, 1996, “The Second-Order Bias and Mean Squared Error of Nonlinear Estimators”, Journal of Econometrics 75, pp. 369-395.

[43] Richardson, D.J., 1968, “The Exact Distribution of a Structural Coefficient Estimator”, Journal of the American Statistical Association 63, pp. 1214-1226.

[44] Rothenberg, T.J., 1983, “Asymptotic Properties of Some Estimators in Structural Models”, in Studies in Econometrics, Time Series, and Multivariate Statistics.

[45] Rothenberg, T.J., 1984, “Approximating the Distributions of Econometric Estimators and Test Statistics”, in Griliches, Z. and M.D. Intriligator, eds., Handbook of Econometrics, Vol. 2. Amsterdam: North-Holland.

[46] Smith, M.D., 1993, “Expectations of Ratios of Quadratic Forms in Normal Variables: Evaluating Some Top-Order Invariant Polynomials”, Australian Journal of Statistics 35, pp. 271-282.

[47] Staiger, D. and J.H. Stock, 1997, “Instrumental Variables Regressions with Weak Instruments”, Econometrica 65, pp. 557-586.

[48] Wang, J. and E. Zivot, 1998, “Inference on Structural Parameters in Instrumental Variables Regression with Weak Instruments”, Econometrica 66, pp. 1389-1404.

[49] Windmeijer, F., 2000, “GMM Estimation of Linear Dynamic Panel Data Models Using Nonlinear Moment Conditions”, unpublished manuscript.

[50] Zhang, F., 1999, Matrix Theory. Springer-Verlag.
Table 1: Finite Sample Properties of b̂_GMM and b̂_BC

                   %bias(b̂_GMM)                %bias(b̂_BC)   RMSE(b̂_GMM)   RMSE(b̂_BC)
 T    n     β    Actual   Second-Order Theory      Actual        Actual         Actual
 5   100   0.1   -14.96        -17.71                0.25          0.08           0.08
10   100   0.1   -14.06        -15.78               -0.77          0.05           0.05
 5   500   0.1    -3.68         -3.54               -0.38          0.04           0.04
10   500   0.1    -3.15         -3.16               -0.16          0.02           0.02
 5   100   0.3    -8.86        -10.60               -0.47          0.10           0.10
10   100   0.3    -7.06         -8.13               -0.66          0.05           0.05
 5   500   0.3    -2.03         -2.12               -0.16          0.04           0.04
10   500   0.3    -1.58         -1.63               -0.10          0.02           0.02
 5   100   0.5   -10.05        -12.09               -1.14          0.13           0.13
10   100   0.5    -6.76         -8.00               -0.93          0.06           0.06
 5   500   0.5    -2.25         -2.42               -0.15          0.06           0.06
10   500   0.5    -1.53         -1.60               -0.11          0.03           0.03
 5   100   0.8   -27.65        -37.81              -11.33          0.32           0.34
10   100   0.8   -13.45        -18.98               -4.55          0.14           0.11
 5   500   0.8    -6.98         -7.56               -0.72          0.13           0.13
10   500   0.8    -3.48         -3.80               -0.37          0.05           0.04
 5   100   0.9   -50.22       -118.64              -42.10          0.55           0.78
10   100   0.9   -24.27        -52.66              -15.82          0.25           0.23
 5   500   0.9   -20.50        -23.73               -6.23          0.28           0.30
10   500   0.9    -8.74        -10.53               -2.02          0.10           0.08
Table 2: Inconsistency of b̂_BB under Misspecification

Weight Matrix I
 β_F \ β     0.3     0.6     0.8     0.9
   0.3      0.000   0.071   0.074   0.035
   0.6      0.267   0.000   0.116   0.054
   0.8      0.253   0.205   0.000   0.088
   0.9      0.220   0.181   0.129   0.000

Weight Matrix II
 β_F \ β     0.3     0.6     0.8     0.9
   0.3      0.000   0.077   0.149   0.084
   0.6      0.267   0.000   0.268   0.149
   0.8      0.258   0.208   0.000   0.126
   0.9      0.235   0.197   0.137   0.000
Table 3: Higher Order Theoretical Properties of Various Estimators

 β    n    T   b̂_GMM   b̂_LD-AB   b̂_LD1   b̂_LD2   b̂_LD3   b̂_LD4   b̂_AS    b̂_BB   b̂_LDGMM

Bias Predicted by Second Order Theory
0.1  100   5   -0.017    0.000    0.001   0.001   0.001   0.001   -0.001   0.005    0.003
0.3  100   5   -0.029    0.000    0.003   0.005   0.005   0.005   -0.001   0.008    0.005
0.5  100   5   -0.054   -0.007    0.003   0.009   0.013   0.014   -0.002   0.012    0.012
0.7  100   5   -0.135   -0.040   -0.007   0.014   0.030   0.043    0.002   0.014    0.046
0.9  100   5   -1.019   -0.449   -0.119   0.081   0.214   0.317    0.340  -0.078    1.055

RMSE Predicted by Higher Order Theory
0.1  100   5    0.081    0.099    0.102   0.102   0.102   0.102    0.069   0.070    0.091
0.3  100   5    0.100    0.097    0.109   0.112   0.113   0.113    0.077   0.079    0.100
0.5  100   5    0.133    0.097    0.120   0.134   0.142   0.147    0.094   0.093    0.124
0.7  100   5    0.220    0.121    0.141   0.173   0.212   0.250    0.158   0.123    0.242
0.9  100   5    0.965    0.609    1.181   1.449   1.561   1.633    2.099   0.422    3.558
Table 4: Properties of Selected Estimators: Higher Order Theory and Practice

 β     n    T   b̂_GMM   b̂_LD-AB   b̂_LD1   b̂_LD2   b̂_LD3   b̂_LD4

Bias Predicted by Higher Order Theory
0.10  100   5   -0.017    0.000    0.001   0.001   0.001   0.001
0.30  100   5   -0.029    0.000    0.003   0.005   0.005   0.005
0.50  100   5   -0.054   -0.007    0.003   0.009   0.013   0.014
0.70  100   5   -0.135   -0.040   -0.007   0.014   0.030   0.043
0.80  100   5   -0.281   -0.105   -0.029   0.015   0.051   0.083
0.90  100   5   -1.019   -0.449   -0.119   0.081   0.214   0.317
0.95  100   5   -3.849   -1.800   -0.300   0.793   1.591   2.179

Actual Bias
0.10  100   5   -0.015    0.001    0.002   0.002   0.002   0.002
0.30  100   5   -0.027    0.001    0.004   0.006   0.007   0.008
0.50  100   5   -0.050   -0.005    0.005   0.012   0.019   0.024
0.70  100   5   -0.119   -0.030   -0.003   0.019   0.039   0.049
0.80  100   5   -0.221   -0.068   -0.024   0.005   0.036   0.044
0.90  100   5   -0.452   -0.142   -0.077  -0.037   0.001   0.010
0.95  100   5   -0.598   -0.185   -0.112  -0.069  -0.030  -0.010

RMSE Predicted by Higher Order Theory
0.10  100   5    0.081    0.099    0.102   0.102   0.102   0.102
0.30  100   5    0.100    0.097    0.109   0.112   0.113   0.113
0.50  100   5    0.133    0.097    0.120   0.134   0.142   0.147
0.70  100   5    0.220    0.121    0.141   0.173   0.212   0.250
0.80  100   5    0.354    0.189    0.228   0.261   0.318   0.399
0.90  100   5    0.965    0.609    1.181   1.449   1.561   1.633
0.95  100   5    3.200    2.557    6.296   8.863  10.544  11.600

Actual RMSE
0.10  100   5    0.081    0.099    0.102   0.102   0.102   0.102
0.30  100   5    0.099    0.097    0.109   0.113   0.116   0.118
0.50  100   5    0.131    0.097    0.120   0.137   0.158   0.172
0.70  100   5    0.211    0.116    0.141   0.183   0.234   0.261
0.80  100   5    0.318    0.144    0.161   0.199   0.265   0.298
0.90  100   5    0.552    0.188    0.170   0.195   0.255   0.285
0.95  100   5    0.690    0.215    0.173   0.183   0.231   0.278
Table 5: Monte Carlo Comparison of Estimators

Estimator          b̂_BB   b̂_LDGMM   b̂_AS   b̂_LD-AB   b̂_LD1   b̂_LD2   b̂_LD3

β = 0.1, T = 5, n = 100
Mean               0.105    0.101    0.098    0.103    0.100    0.102    0.100
RMSE               0.072    0.092    0.070    0.101    0.106    0.104    0.103
Median             0.104    0.101    0.098    0.102    0.101    0.102    0.099
1st percentile    -0.054   -0.113   -0.057   -0.127   -0.144   -0.145   -0.136
25th percentile    0.057    0.040    0.050    0.033    0.031    0.034    0.032
75th percentile    0.152    0.160    0.143    0.169    0.169    0.169    0.165
99th percentile    0.276    0.318    0.268    0.344    0.352    0.352    0.349

β = 0.5, T = 5, n = 100
Mean               0.508    0.526    0.504    0.507    0.512    0.519    0.520
RMSE               0.092    0.150    0.113    0.102    0.123    0.146    0.170
Median             0.507    0.503    0.491    0.504    0.503    0.503    0.497
1st percentile     0.296    0.273    0.295    0.282    0.255    0.250    0.245
25th percentile    0.448    0.429    0.430    0.437    0.429    0.421    0.418
75th percentile    0.569    0.588    0.559    0.573    0.588    0.598    0.592
99th percentile    0.725    1.033    0.922    0.759    0.826    0.953    1.073

β = 0.8, T = 5, n = 100
Mean               0.753    0.816    0.766    0.797    0.819    0.835    0.854
RMSE               0.158    0.182    0.184    0.214    0.254    0.274    0.294
Median             0.767    0.786    0.736    0.751    0.775    0.783    0.783
1st percentile     0.331    0.493    0.405    0.513    0.369    0.423    0.436
25th percentile    0.662    0.682    0.641    0.672    0.685    0.685    0.682
75th percentile    0.861    0.936    0.871    0.856    0.892    0.911    0.938
99th percentile    1.054    1.280    1.208    1.602    1.763    1.849    1.855

β = 0.9, T = 5, n = 100
Mean               0.664    0.826    0.709    0.816    0.853    0.880    0.902
RMSE               0.388    0.221    0.317    0.249    0.240    0.234    0.264
Median             0.693    0.798    0.697    0.762    0.814    0.838    0.844
1st percentile    -0.221    0.380    0.077    0.467    0.384    0.427    0.495
25th percentile    0.499    0.689    0.558    0.691    0.734    0.747    0.750
75th percentile    0.882    0.954    0.854    0.861    0.911    0.961    0.981
99th percentile    1.273    1.349    1.295    1.752    1.764    1.722    1.879
Table 6: Bias of Near Unit Root Approximations
T AS/AB LD Optimal
5 -0.6632 -0.2500 -0.0955
6 -0.5903 -0.2000 -0.0670
7 -0.5297 -0.1667 -0.0495
8 -0.4781 -0.1429 -0.0381
9 -0.4336 -0.1250 -0.0302
10 -0.3948 -0.1111 -0.0245
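One arithmetic regularity in Table 6 is worth flagging: to the precision reported, the LD column coincides with −1/(T−1). The one-liner below merely reproduces the printed entries; it is an observation about the table, not a formula taken from the derivations:

```python
# Reproduce the LD column of Table 6 (observed pattern -1/(T-1), T = 5,...,10).
print([round(-1 / (T - 1), 4) for T in range(5, 11)])
# [-0.25, -0.2, -0.1667, -0.1429, -0.125, -0.1111]
```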
Table 7: MSE of Near Unit Root Approximations

          Optimal                  AS
 T     MSE     # Moments     MSE     # Moments
 5    0.6866       3        0.7464      17
 6    0.4246       3        0.5680      24
 7    0.2889       3        0.4488      32
 8    0.2031       4        0.3646      41
 9    0.1524       4        0.3056      51
10    0.1093       4        0.2663      62