
Topics in Applied Econometrics

The Econometric Analysis of Panel Data

Michael Pfaffermayr

WS 2012/2013

Contents

1 Lecture 1: The Advantages of Panel Data 2

2 Lecture 1: One-way Error Component Model 6
2.1 Review of Some Important Concepts for Panel Models 6
2.2 One-way Model with Fixed Effects 11
2.3 One-way Model with Random Effects 18
2.4 Testing in Panel Models 26

3 Lecture 2: GMM-Estimation and IVs 41
3.1 GMM-Estimation 41
3.2 Estimation in Case of Just Identified Parameters 50
3.3 The Generalized Method of Moments Estimator 53
3.4 Testing Hypotheses in the GMM Framework 67

4 Lecture 3: Weak Instruments and IV-Estimation in Static Panel Models 70
4.1 Weak Instruments 70
4.2 IV-Estimation in Static Panel Models 89
4.3 The Mundlak Model and the Hausman-Taylor Estimator 106

5 Lecture 4: Dynamic Linear Panel Models 117

6 Lectures 5 and 6: Limited Dependent Variables and Panel Data 146
6.1 Preliminary Remarks on Maximum Likelihood Estimation 146
6.2 The Fixed Effects Logit and Random Effects Probit Panel Model 158
6.3 Sample Selection in Panel Models 180
6.4 Panel Models for Count Data 202

7 Lectures 7 and 8: Spatial Panel Models 209
7.1 The Spatial Dependence of Random Variables and Spatial Econometric Models 209
7.2 Spatial Cross-Section Models (SEM-, SAR- and Durbin-Models) 221
7.3 The Moran I Test for Spatially Correlated Disturbances 254
7.4 GM-Estimation of Spatial Random-Effects Models 257
7.5 QMLE-Estimation of Spatial Fixed-Effects Models 269

8 Appendix Lab Sessions 271
8.1 Maximum Likelihood Estimation with Stata 271

1

Literature

Arellano, Manuel (2003), Panel Data Econometrics, Oxford University Press, Oxford.

Baltagi, Badi (2005), Econometric Analysis of Panel Data, Wiley, New York, 3rd edition.

Cameron, A. Colin and Pravin K. Trivedi (2005), Microeconometrics: Methods and Applications, Cambridge University Press, Cambridge.

Greene, William H. (2003), Econometric Analysis, 5th edition, Prentice Hall, New Jersey.

Hsiao, Cheng (2003), Analysis of Panel Data, Cambridge University Press, Cambridge.

Wooldridge, Jeffrey M. (2010), Econometric Analysis of Cross Section and Panel Data, 2nd edition, MIT Press, Cambridge MA.

2

1 Lecture 1: The Advantages of Panel Data

Balanced panels comprise $N$ units (persons, firms, regions, countries, etc.) that are repeatedly observed ($T$ times). The data are arranged with unit dummies $d_1, \ldots, d_N$ and time dummies $\lambda_1, \ldots, \lambda_T$:

y       x1        x2        d1  ... dN    l1  ... lT
y_11    x1,11     x2,11     1   ... 0     1   ... 0
y_12    x1,12     x2,12     1   ... 0     0   ... 0
...
y_1T    x1,1T     x2,1T     1   ... 0     0   ... 1
...
y_N1    x1,N1     x2,N1     0   ... 1     1   ... 0
y_N2    x1,N2     x2,N2     0   ... 1     0   ... 0
...
y_NT    x1,NT     x2,NT     0   ... 1     0   ... 1

3

Advantages of panel data (Baltagi, 2005; Hsiao, 2003):

More observations, i.e., more efficient estimators and better inference.

Better estimation of complex behavioral models (program evaluation).

Some economic hypotheses cannot be tested in a cross-section.

Accounting for heterogeneity and measurement errors; (fixed) unit and time effects reduce the missing-variable bias. Specifically, one can control for unobserved impacts that affect all units symmetrically (time effects) or are invariant within units over time (unit effects).

Estimation of dynamic models with short time spans to analyze adjustment processes at the micro level.

4

Hsiao (2003), Fig. 1.6

5

6

2 Lecture 1: One-way Error Component Model

2.1 Review of Some Important Concepts for Panel Models

Baltagi (2005), chapter 2, Hsiao (2003), chapter 3.

Model:

$y_{it} = \alpha + x_{it}'\beta + u_{it}, \quad i = 1, \ldots, N, \; t = 1, \ldots, T.$

$x_{it}$ is $(K \times 1)$, $\beta$ is $(K \times 1)$; $y_{it}$ and $u_{it}$ are scalars.

$i$ indexes units (persons or firms), $t$ indexes time.

The panel is balanced: for each unit $i$ there are $T$ observations available. Estimators for unbalanced panels are similar, but the notation is very complicated.

7

One-way error structure:

$u_{it} = \mu_i + \nu_{it}$

$\mu_i$ is either a fixed or a random unit effect.

$\nu_{it} \sim iid(0, \sigma_\nu^2)$

$x_{it}$ is exogenous and thus independent of $\nu_{it}$: $E[\nu_{it}|x_{it}] = 0.$

One may include fixed time effects if the time dimension of the panel is small.

8

In matrix notation:

$y = \alpha\iota_{NT} + X\beta + u = Z\delta + u$

$y = [y_{11}, y_{12}, \ldots, y_{1T}, y_{21}, y_{22}, \ldots, y_{2T}, \ldots, y_{N1}, y_{N2}, \ldots, y_{NT}]', \quad (NT \times 1)$

Note: the ordering of the data is all-important; $t$ runs fast and $i$ slow here (this order is reversed in spatial panels).

$X$ is $(NT \times K)$ and does not include the constant; $\iota_{NT}$ is the $(NT \times 1)$ vector of ones; $Z = [\iota_{NT}, X]$; $u = Z_\mu\mu + \nu$ with $u = [u_{11}, u_{12}, \ldots, u_{1T}, u_{21}, u_{22}, \ldots, u_{NT}]'$; $\mu$ is the $(N \times 1)$ vector of unit effects; $Z_\mu = I_N \otimes \iota_T$ is the $(NT \times N)$ selector matrix (a dummy for each unit).

9

$Z_\mu Z_\mu' = (I_N \otimes \iota_T)(I_N \otimes \iota_T') = I_N \otimes \iota_T\iota_T' = I_N \otimes J_T$

$Z_\mu' Z_\mu = (I_N \otimes \iota_T')(I_N \otimes \iota_T) = I_N \otimes \iota_T'\iota_T = I_N \otimes T = T\,I_N$

$(Z_\mu' Z_\mu)^{-1} = \tfrac{1}{T} I_N$

$P = Z_\mu(Z_\mu'Z_\mu)^{-1}Z_\mu' = I_N \otimes \bar{J}_T, \quad \bar{J}_T = \tfrac{1}{T}J_T.$

$P$ is a projection matrix that generates unit-specific means, i.e.,

$Py = [\bar{y}_{1\cdot}, \ldots, \bar{y}_{1\cdot}, \ldots, \bar{y}_{N\cdot}, \ldots, \bar{y}_{N\cdot}]'$ (each mean repeated $T$ times), with $\bar{y}_{i\cdot} = \tfrac{1}{T}\sum_{t=1}^T y_{it}.$

$Q = I_{NT} - P$ transforms the observations into deviations from group means, i.e.,

$Qy = [y_{11} - \bar{y}_{1\cdot}, \ldots, y_{1T} - \bar{y}_{1\cdot}, \ldots, y_{N1} - \bar{y}_{N\cdot}, \ldots, y_{NT} - \bar{y}_{N\cdot}]'$

10

Properties of $P$ and $Q$:

1. $P$ and $Q$ are symmetric and idempotent: $P' = P$, $P^2 = P$, $Q' = Q$, $Q^2 = Q$.

2. $rank(P) = N$, $rank(Q) = N(T-1)$ [the rank of an idempotent matrix is equal to its trace].

3. $P$ and $Q$ are orthogonal: $PQ = PI_{NT} - P^2 = 0$.

4. $P + Q = I_{NT}$.

11
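These properties are easy to verify numerically. A minimal NumPy sketch (hypothetical dimensions $N = 3$, $T = 4$) builds $P$ and $Q$ from their Kronecker-product definitions and checks properties 1-4:

```python
import numpy as np

N, T = 3, 4                                   # hypothetical panel dimensions
P = np.kron(np.eye(N), np.ones((T, T)) / T)   # P = I_N (x) J_T/T: unit means
Q = np.eye(N * T) - P                         # Q = I_NT - P: within deviations

# 1. symmetric and idempotent
assert np.allclose(P, P.T) and np.allclose(P @ P, P)
assert np.allclose(Q, Q.T) and np.allclose(Q @ Q, Q)
# 2. rank equals trace for an idempotent matrix
assert round(np.trace(P)) == N and round(np.trace(Q)) == N * (T - 1)
# 3. orthogonality and 4. complementarity
assert np.allclose(P @ Q, 0) and np.allclose(P + Q, np.eye(N * T))

# P y stacks the unit-specific means, each repeated T times
y = np.arange(N * T, dtype=float)             # ordered unit by unit, t fast
unit_means = P @ y
```

With $y = (0, 1, \ldots, 11)'$, the unit means are $1.5$, $5.5$ and $9.5$, each repeated $T = 4$ times.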

2.2 One-way Model with Fixed Effects

$\mu_i$ are fixed parameters that have to be estimated. This involves the incidental parameter problem: for fixed $T$ and $N \to \infty$ the number of parameters increases without bound, so that it is impossible to estimate $\mu_i$ consistently.

The assumption of fixed effects seems plausible if inference refers to a specific group of $N$ firms, countries or regions, so that inference is conditional on the considered group.

Model:

$y = Z\delta + Z_\mu\mu + \nu$

12

In principle, this model can be estimated by OLS with $Z_\mu$ as a dummy-variable matrix (LSDV). For large $N$ there will be numerical problems, since one has to invert a $(K+N) \times (K+N)$ matrix.

Within estimator, OLS on the transformed model:

$Qy = QZ\delta + QZ_\mu\mu + Q\nu = QZ\delta + Q\nu$

The unit effects are eliminated:

$QZ_\mu = Z_\mu - Z_\mu(Z_\mu'Z_\mu)^{-1}Z_\mu'Z_\mu = Z_\mu - Z_\mu = 0,$

so that

$\tilde{\beta} = (X'QX)^{-1}X'Qy = (\tilde{X}'\tilde{X})^{-1}\tilde{X}'\tilde{y}$

$Var(\tilde{\beta}) = \sigma_\nu^2(X'QX)^{-1} = \sigma_\nu^2(\tilde{X}'\tilde{X})^{-1}$

13
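The within estimator amounts to demeaning each variable within units and running OLS on the transformed data. A minimal NumPy sketch on simulated data (all dimensions and parameter values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, K = 50, 6, 2                        # hypothetical dimensions
beta = np.array([1.0, -2.0])              # hypothetical true coefficients
mu = rng.normal(size=N)                   # fixed unit effects
X = rng.normal(size=(N, T, K))
y = X @ beta + mu[:, None] + 0.1 * rng.normal(size=(N, T))

# within transformation (apply Q): subtract unit-specific means
Xw = (X - X.mean(axis=1, keepdims=True)).reshape(N * T, K)
yw = (y - y.mean(axis=1, keepdims=True)).ravel()

beta_w = np.linalg.solve(Xw.T @ Xw, Xw.T @ yw)    # (X'QX)^{-1} X'Qy
resid = yw - Xw @ beta_w
sigma2 = resid @ resid / (N * T - N - K)          # correct dof: NT - N - K
var_w = sigma2 * np.linalg.inv(Xw.T @ Xw)
```

Note that the unit effects `mu` never enter the demeaned regression; they are swept out exactly as in $QZ_\mu = 0$ above.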

Remarks:

1. $\tilde{\beta}$ can also be derived using the Frisch-Waugh-Lovell theorem (Davidson, MacKinnon, 1993, p. 19) [$P$ is the projection matrix on $Z_\mu$; $Q$ projects on its orthogonal complement]. Projecting all variables in the model with $Q$ then yields the within estimator.

2. GLS using the generalized inverse of $Q$ (which is $Q$ itself) also yields $\tilde{\beta}$.

14

The estimation of the remaining parameters:

$\tilde{\alpha} = \bar{y}_{\cdot\cdot} - \bar{x}_{\cdot\cdot}'\tilde{\beta}, \qquad \tilde{\mu}_i = \bar{y}_{i\cdot} - \tilde{\alpha} - \bar{x}_{i\cdot}'\tilde{\beta} \quad [\text{normalization } \sum_{i=1}^N \mu_i = 0]$

Bias of OLS:

If our model is the true DGP, OLS on the model $y = Z\delta + \nu$ yields biased estimates (missing-variable bias).

Consistency:

If $T$ is fixed and $N \to \infty$, only the estimator for $\beta$ is consistent.

15

Testing for fixed effects: $H_0: \mu_1 = 0, \ldots, \mu_N = 0.$

$F_0 = \dfrac{(RRSS - URSS)/(N-1)}{URSS/(NT - N - K)} \sim F_{N-1,\,NT-N-K}.$

For large $N$ one can use the residuals of the within model to calculate URSS.

Warning: out-of-the-box econometric software for OLS on the within-transformed model uses $NT - K$ degrees of freedom, while $NT - N - K$ is the correct number.

16

Robust fixed-effects estimation:

Arellano (1987) shows that the robust asymptotic variance-covariance matrix of $\tilde{\beta}$ can be estimated by

$var(\tilde{\beta}) = (\tilde{X}'\tilde{X})^{-1} \Big( \sum_{i=1}^N \underbrace{\tilde{X}_i'}_{K \times T} \underbrace{\tilde{u}_i\tilde{u}_i'}_{T \times T} \underbrace{\tilde{X}_i}_{T \times K} \Big) (\tilde{X}'\tilde{X})^{-1},$

where $\tilde{X}_i$ is the $(T \times K)$ matrix of within-transformed observations referring to unit $i$ and $\tilde{u}_i$ is the $(T \times 1)$ vector of residuals of the within-transformed model. In Stata we use the option robust or vce(robust).

17

Assume that the observations are not independent, but that they can be divided into $M$ independent groups $G_1, G_2, \ldots, G_M$. The cluster-robust estimator of the variance is

$var(\tilde{\beta}) = (\tilde{X}'\tilde{X})^{-1} \Big( \sum_{j=1}^M \tilde{X}_{G_j}'\tilde{u}_{G_j}\tilde{u}_{G_j}'\tilde{X}_{G_j} \Big) (\tilde{X}'\tilde{X})^{-1}.$

If each group refers to a unit, we obtain the robust variance estimator of Arellano (1987). Note that the cluster variable has to be time-invariant and nested in the unit effects.

18
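The Arellano (1987) sandwich above can be sketched in NumPy as follows (simulated data with hypothetical dimensions; the errors are serially correlated within units so that the robust and the classical variance differ):

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, K = 40, 5, 2                            # hypothetical dimensions
X = rng.normal(size=(N, T, K))
u = rng.normal(size=(N, T)) + rng.normal(size=(N, 1))  # within-unit correlation
y = X @ np.array([0.5, 1.0]) + u

Xw = X - X.mean(axis=1, keepdims=True)        # within transformation
yw = y - y.mean(axis=1, keepdims=True)
Xf = Xw.reshape(N * T, K)
b = np.linalg.solve(Xf.T @ Xf, Xf.T @ yw.ravel())
e = yw - Xw @ b                               # (N, T) within residuals

bread = np.linalg.inv(Xf.T @ Xf)
meat = sum(Xw[i].T @ np.outer(e[i], e[i]) @ Xw[i] for i in range(N))
V = bread @ meat @ bread                      # Arellano (1987) sandwich
se_robust = np.sqrt(np.diag(V))
```

With clusters other than units, the sum over $i$ above is simply replaced by a sum over the groups $G_j$.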

2.3 One-way Model with Random Effects

$\mu_i$ is a random unit effect and we assume $\mu_i \sim IID(0, \sigma_\mu^2)$ and, most importantly, $E[\mu_i|x_{it}] = 0$ and $E[\mu_i\nu_{it}] = 0.$

This is a plausible DGP if we have a random sample of $N$ individuals and we want estimates that are representative for the whole population.

Variance-covariance matrix of $u = Z_\mu\mu + \nu$:

$\Omega = E[uu'] = Z_\mu E[\mu\mu']Z_\mu' + E[\nu\nu'] = \sigma_\mu^2(I_N \otimes J_T) + \sigma_\nu^2(I_N \otimes I_T).$

19

Homoskedastic variances and equi-correlation within units: the variance-covariance matrix of $u$ is denoted by $\Omega$ and is block-diagonal with

$Cov[u_{it}, u_{js}] = \begin{cases} \sigma_\mu^2 + \sigma_\nu^2 & \text{if } i = j \text{ and } t = s \\ \sigma_\mu^2 & \text{if } i = j \text{ and } t \neq s \\ 0 & \text{if } i \neq j \end{cases}$

For the inversion of $\Omega$ one can use the Wansbeek-Kapteyn (1982, 1983) trick. Let $E_T = I_T - \bar{J}_T$:

$\Omega = \sigma_\mu^2(I_N \otimes J_T) + \sigma_\nu^2(I_N \otimes I_T)$
$= T\sigma_\mu^2(I_N \otimes \bar{J}_T) + \sigma_\nu^2(I_N \otimes \bar{J}_T) + \sigma_\nu^2(I_N \otimes E_T)$
$= \sigma_1^2(I_N \otimes \bar{J}_T) + \sigma_\nu^2(I_N \otimes E_T)$
$= \sigma_1^2 P + \sigma_\nu^2 Q,$

$\sigma_1^2 = T\sigma_\mu^2 + \sigma_\nu^2$

20

It is easy to show that

$\Omega^{-1} = \frac{1}{\sigma_1^2}P + \frac{1}{\sigma_\nu^2}Q, \qquad \Omega^{-\frac{1}{2}} = \frac{1}{\sigma_1}P + \frac{1}{\sigma_\nu}Q.$

[Proof: $\Omega^{-1}\Omega = \left(\frac{1}{\sigma_1^2}P + \frac{1}{\sigma_\nu^2}Q\right)(\sigma_1^2 P + \sigma_\nu^2 Q) = PP + \frac{\sigma_1^2}{\sigma_\nu^2}QP + \frac{\sigma_\nu^2}{\sigma_1^2}PQ + QQ = P + Q = I_{NT}$]

Fuller, Battese (1973, 1974), GLS transformation and OLS:

$y^* = \sigma_\nu\Omega^{-\frac{1}{2}}y = \sigma_\nu\Omega^{-\frac{1}{2}}Z\delta + \sigma_\nu\Omega^{-\frac{1}{2}}u$

$y^* = \left(\frac{\sigma_\nu}{\sigma_1}P + Q\right)y = \left(I - \left(1 - \frac{\sigma_\nu}{\sigma_1}\right)P\right)y$

with typical element $y^*_{it} = y_{it} - \theta\bar{y}_{i\cdot}$, where $\theta = 1 - \frac{\sigma_\nu}{\sigma_1} = 1 - \frac{\sigma_\nu}{(T\sigma_\mu^2 + \sigma_\nu^2)^{\frac{1}{2}}}.$

21
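The scalar quasi-demeaning form $y^*_{it} = y_{it} - \theta\bar{y}_{i\cdot}$ and the matrix form $\sigma_\nu\Omega^{-1/2}y$ are the same transformation, as a short NumPy check confirms (the variance components and panel dimensions below are hypothetical):

```python
import numpy as np

sigma_nu2, sigma_mu2, N, T = 1.0, 2.0, 3, 5      # hypothetical components
sigma1 = np.sqrt(T * sigma_mu2 + sigma_nu2)      # sigma_1
theta = 1.0 - np.sqrt(sigma_nu2) / sigma1        # theta = 1 - sigma_nu/sigma_1

rng = np.random.default_rng(2)
y = rng.normal(size=(N, T))

# scalar form: quasi-demeaning, y*_it = y_it - theta * ybar_i.
y_star = y - theta * y.mean(axis=1, keepdims=True)

# matrix form: y* = sigma_nu * Omega^{-1/2} y = (sigma_nu/sigma_1 P + Q) y
P = np.kron(np.eye(N), np.ones((T, T)) / T)
Q = np.eye(N * T) - P
y_star_mat = (np.sqrt(sigma_nu2) / sigma1 * P + Q) @ y.ravel()

assert np.allclose(y_star.ravel(), y_star_mat)
```

Note the two limiting cases: $\theta = 0$ gives pooled OLS and $\theta = 1$ gives the within transformation.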

Feasible GLS: $\sigma_1$ and $\sigma_\nu$ are estimated using consistently estimated residuals, possibly in an iterative procedure.

Best quadratic unbiased (BQU) estimators:

$Pu \sim (0, \sigma_1^2 P), \qquad Qu \sim (0, \sigma_\nu^2 Q)$

$\hat{\sigma}_1^2 = \frac{u'Pu}{tr(P)} = \frac{T\sum_{i=1}^N \bar{u}_{i\cdot}^2}{N}$

$\hat{\sigma}_\nu^2 = \frac{u'Qu}{tr(Q)} = \frac{\sum_{t=1}^T\sum_{i=1}^N (u_{it} - \bar{u}_{i\cdot})^2}{N(T-1)}$

The vector $u = y - Z\delta$ is not known, so we use a consistent estimator $\hat{u} = y - Z\hat{\delta}$.

22

Wallace, Hussain (1969): OLS residuals (OLS is unbiased and consistent, but inefficient).

Amemiya (1971): LSDV residuals are used to derive an estimator $\tilde{\delta}$, and $\tilde{u} = y - Z\tilde{\delta}$ is used to calculate $\hat{\sigma}_1$ and $\hat{\sigma}_\nu$.

Swamy, Arora (1972):

1. Within estimator: $\hat{\hat{\sigma}}_\nu^2 = (y'Qy - y'QX(X'QX)^{-1}X'Qy)/(N(T-1) - K)$

2. Between estimator: OLS on $T^{\frac{1}{2}}\bar{y}_{i\cdot} = T^{\frac{1}{2}}\alpha + T^{\frac{1}{2}}\bar{x}_{i\cdot}'\beta + T^{\frac{1}{2}}\bar{u}_{i\cdot}$, with $Var(T^{1/2}\bar{u}_{i\cdot}) = \sigma_1^2$:

$\hat{\hat{\sigma}}_1^2 = (y'Py - y'PZ(Z'PZ)^{-1}Z'Py)/(N - K - 1)$

23

The random-effects estimator is a weighted average of the within and the between estimator (Baltagi, 2005, p. 17).

Under the random-effects assumption it holds that

$\operatorname{plim}_{N\to\infty}\hat{\beta}_{GLS} = \operatorname{plim}_{N\to\infty}\tilde{\beta}_{Within} = \operatorname{plim}_{N\to\infty}\hat{\beta}_{Between} = \beta.$

However, $\hat{\beta}_{GLS}$ is the most efficient of these estimators.

Defining $W_{XX} = X'QX$, $W_{Xy} = X'Qy$, $B_{XX} = X'(P - \bar{J}_{NT})X$ and $B_{Xy} = X'(P - \bar{J}_{NT})y$, we can write

$\tilde{\beta}_{Within} = W_{XX}^{-1}W_{Xy}, \qquad \hat{\beta}_{Between} = B_{XX}^{-1}B_{Xy}$

24

Define the scalar $\phi^2 = \frac{\sigma_\nu^2}{\sigma_1^2} = \frac{\sigma_\nu^2}{T\sigma_\mu^2 + \sigma_\nu^2}$:

$\hat{\beta}_{GLS} = \left(W_{XX} + \phi^2 B_{XX}\right)^{-1}\left(W_{Xy} + \phi^2 B_{Xy}\right)$
$= \left(W_{XX} + \phi^2 B_{XX}\right)^{-1}W_{XX}\tilde{\beta}_{Within} + \left(W_{XX} + \phi^2 B_{XX}\right)^{-1}\phi^2 B_{XX}\hat{\beta}_{Between}$

$Var(\hat{\beta}_{GLS}) = \sigma_\nu^2\left(W_{XX} + \phi^2 B_{XX}\right)^{-1}$

25

Remember: $\phi^2 = \frac{\sigma_\nu^2}{\sigma_1^2} = \frac{\sigma_\nu^2}{T\sigma_\mu^2 + \sigma_\nu^2}$.

(i) $\sigma_\mu^2 = 0 \;\Rightarrow\; \sigma_1^2 = \sigma_\nu^2$ and $\phi = 1$: $\hat{\beta}_{GLS} = \hat{\beta}_{OLS}$ (same weight on within and between variation).

(ii) $T \to \infty \;\Rightarrow\; \phi^2 \to 0$: $\hat{\beta}_{GLS} \to \tilde{\beta}_{Within}$

(iii) If $W_{XX}$ is "large" relative to $B_{XX}$, $\hat{\beta}_{GLS}$ tends to be similar to $\tilde{\beta}_{Within}$.

26

2.4 Testing in Panel Models

Breusch-Pagan test (random-effects model):

$H_0: \sigma_\mu^2 = 0$ vs. $H_1: \sigma_\mu^2 > 0$

$LM_1 = \frac{NT}{2(T-1)}\left(\frac{\hat{u}'(I_N \otimes J_T)\hat{u}}{\hat{u}'\hat{u}} - 1\right)^2 \overset{a}{\sim} \chi^2(1)$ under $H_0$,

where $\hat{u}$ denotes the vector of OLS residuals.

27
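The $LM_1$ statistic needs nothing beyond the pooled-OLS residuals. A minimal NumPy sketch on simulated residuals (the dimensions and the strength of the unit effect are hypothetical):

```python
import numpy as np

def breusch_pagan_lm(u, N, T):
    """LM_1 from pooled-OLS residuals u, stacked unit by unit (t fast)."""
    U = u.reshape(N, T)
    quad = (U.sum(axis=1) ** 2).sum()        # u'(I_N (x) J_T)u
    return N * T / (2 * (T - 1)) * (quad / (u @ u) - 1.0) ** 2

rng = np.random.default_rng(3)
N, T = 100, 5
lm_null = breusch_pagan_lm(rng.normal(size=N * T), N, T)       # no mu_i
u_re = (rng.normal(size=(N, T)) + 2 * rng.normal(size=(N, 1))).ravel()
lm_alt = breusch_pagan_lm(u_re, N, T)        # strong random unit effects
```

Under $H_0$ the statistic behaves like a $\chi^2(1)$ draw; with sizable $\sigma_\mu^2$ it becomes very large.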

Hausman test:

$E[\mu_i|x_{it}] = 0$ is the critical assumption of the random-effects model. If $E[\mu_i|x_{it}] \neq 0$, $\hat{\beta}_{GLS}$ is biased and inconsistent.

Hausman (1978): Under $H_0: E[\mu_i|x_{it}] = 0$, both $\hat{\beta}_{GLS}$ and $\tilde{\beta}_{Within}$ are consistent, but $\hat{\beta}_{GLS}$ is asymptotically efficient. Under $H_1$, $\tilde{\beta}_{Within}$ is consistent, but $\hat{\beta}_{GLS}$ is not. This forms the basis of the famous Hausman testing principle.

Define $\hat{q} = \hat{\beta}_{GLS} - \tilde{\beta}_{Within}$:

$\hat{\beta}_{GLS} - \beta = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}u$ and $\tilde{\beta}_{Within} - \beta = (X'QX)^{-1}X'Qu.$

Under $H_0$ it holds that $\operatorname{plim}_{N\to\infty}(\hat{q}) = 0$ and

28

$Cov(\hat{\beta}_{GLS}, \hat{q}) = Cov(\hat{\beta}_{GLS}, \hat{\beta}_{GLS} - \tilde{\beta}_{Within})$

$= E\left[(\hat{\beta}_{GLS} - \beta)\left((\hat{\beta}_{GLS} - \beta) - (\tilde{\beta}_{Within} - \beta)\right)'\right]$

$= E\left[(\hat{\beta}_{GLS} - \beta)(\hat{\beta}_{GLS} - \beta)'\right] - E\left[(\hat{\beta}_{GLS} - \beta)(\tilde{\beta}_{Within} - \beta)'\right]$

$= Var(\hat{\beta}_{GLS}) - Cov(\hat{\beta}_{GLS}, \tilde{\beta}_{Within})$

$= (X'\Omega^{-1}X)^{-1} - (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}E[uu']QX(X'QX)^{-1}$

$= (X'\Omega^{-1}X)^{-1} - (X'\Omega^{-1}X)^{-1}X'QX(X'QX)^{-1} = 0,$

because $E[uu'] = \Omega$.

29

From $\tilde{\beta}_{Within} = \hat{\beta}_{GLS} - \hat{q}$ and $Cov(\hat{\beta}_{GLS}, \hat{q}) = 0$ we have

$Var(\tilde{\beta}_{Within}) = Var(\hat{\beta}_{GLS}) + Var(\hat{q})$, or

$Var(\hat{q}) = Var(\tilde{\beta}_{Within}) - Var(\hat{\beta}_{GLS}) = \sigma_\nu^2(X'QX)^{-1} - (X'\Omega^{-1}X)^{-1}$

Hausman test statistic:

$\hat{q}'\left[\widehat{Var}(\hat{q})\right]^{-1}\hat{q} \overset{a}{\sim} \chi^2(K)$ under $H_0$.

Thereby we replace $\Omega^{-1}$ by a consistent estimator $\hat{\Omega}^{-1}$.

30
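Given the two coefficient vectors and their estimated covariance matrices, the Hausman statistic is a single quadratic form. A minimal NumPy sketch with hypothetical estimates for $K = 2$ (the numbers are illustrative only, not from any real data set):

```python
import numpy as np

def hausman_statistic(b_within, b_gls, V_within, V_gls):
    """q' [Var(q)]^{-1} q with Var(q) = Var(b_within) - Var(b_gls);
    chi^2(K) under H0."""
    q = b_gls - b_within
    return q @ np.linalg.solve(V_within - V_gls, q)

# hypothetical estimates for K = 2 (illustration only)
b_w = np.array([1.02, -1.98])
b_g = np.array([1.00, -2.00])
V_w = np.array([[0.010, 0.001], [0.001, 0.012]])
V_g = np.array([[0.006, 0.000], [0.000, 0.007]])
H = hausman_statistic(b_w, b_g, V_w, V_g)   # compare with chi^2(2) quantiles
```

In finite samples $\widehat{Var}(\hat{q})$ may fail to be positive definite, which is why software sometimes reports a negative Hausman statistic.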

Baltagi-Wu LBI test:

This is an LM test and the panel analog to the Durbin-Watson test for time-series models. It is a locally best invariant (LBI) test. It is assumed that the remainder error may follow an AR(1) process, $\nu_{it} = \rho\nu_{i,t-1} + \varepsilon_{it}$, where $\varepsilon_{it}$ is $iid(0, \sigma_\varepsilon^2)$, so that $H_0: \rho = 0$. The LBI test statistic is the sum of four terms and is applicable in unbalanced panels as well. In Stata it can be calculated using xtregar, lbi.

Unfortunately, this test statistic has a very complicated asymptotic distribution and no tabulated p-values.

31

$d^* = d_1 + d_2 + d_3 + d_4$

with common denominator $D = \sum_{i=1}^N\sum_{j=1}^{n_i}\tilde{z}_{i,t_{i,j}}^2$ and

$d_1 = \frac{1}{D}\sum_{i=1}^N\sum_{j=2}^{n_i}\left[\tilde{z}_{i,t_{i,j}} - \tilde{z}_{i,t_{i,j-1}}\,I(t_{i,j} - t_{i,j-1} = 1)\right]^2$

$d_2 = \frac{1}{D}\sum_{i=1}^N\sum_{j=1}^{n_i-1}\tilde{z}_{i,t_{i,j}}^2\,I(t_{i,j+1} - t_{i,j} = 1)$

$d_3 = \frac{1}{D}\sum_{i=1}^N\tilde{z}_{i,t_{i,1}}^2, \qquad d_4 = \frac{1}{D}\sum_{i=1}^N\tilde{z}_{i,t_{i,n_i}}^2,$

where unit $i$ is observed at times $t_{i,j}$ for $j = 1, \ldots, n_i$ with $1 \leq t_{i,1} < \ldots < t_{i,n_i} = T_i$ and $n_i > K$ for $i = 1, 2, \ldots, N$.

32

Wooldridge's test for autocorrelated residuals:

Under the null of no serial correlation, the residuals from the regression of the first-differenced variables should have an autocorrelation of $-0.5$. Note, under $H_0$:

$Corr(\Delta\nu_{it}, \Delta\nu_{i,t-1}) = \frac{E[(\nu_{it} - \nu_{i,t-1})(\nu_{i,t-1} - \nu_{i,t-2})]}{\left[Var(\nu_{it} - \nu_{i,t-1})\,Var(\nu_{i,t-1} - \nu_{i,t-2})\right]^{0.5}} = \frac{-\sigma_\nu^2}{2\sigma_\nu^2} = -0.5.$

Wooldridge (2002) proposes to regress the residuals from the first-differenced regression on their lag and to test whether the coefficient on the lagged residuals equals $-0.5$. To account for the within-panel correlation in that regression, the VCE is adjusted for clustering at the panel level. Since cluster() implies robust, this test is also robust to conditional heteroskedasticity. See Drukker (2003) and Wooldridge (2002) for further details.

In Stata the ado-file xtserial performs this test.

33

Panel Moran I test (Mutl and Pfaffermayr, 2010):

We now order the observations first by time and then by units, and assume the panel is balanced.

Suppose that $\hat{\beta}_N$ is an initial root-$N$-consistent estimator of $\beta$ (the within estimator) and denote by $\tilde{\hat{u}}_N = \tilde{y}_N - \tilde{X}_N\hat{\beta}_N$ the residuals of the within-transformed model. Furthermore, define the $(N \times N)$ normalized spatial weights matrix $W_N$ with elements, e.g., $w_{ij} = \frac{1/d_{ij}}{\sum_{j=1}^N 1/d_{ij}}$ and $d_{ii} = 0$, where $d_{ij}$ denotes the distance between units $i$ and $j$.

Under $H_0$ there is no spatial autocorrelation of the disturbances ($\rho = 0$), while under $H_1$ one assumes $\nu_t = \rho W_N\nu_t + \varepsilon_t.$

The panel Moran I test statistic is given as

$\hat{I}_N = \frac{\sum_{t=1}^T \tilde{\hat{u}}_{t,N}'W_N\tilde{\hat{u}}_{t,N}}{\hat{\sigma}_{\nu,N}^2\left[(T-1)\,tr\!\left((W_N + W_N')W_N\right)\right]^{1/2}}$

and is asymptotically distributed as $N(0, 1)$. See also Baltagi, Song and Koh (2007) for LM tests for spatial correlation of the disturbances in random-effects models.

35

Pesaran's (2004) CD test:

$CD = \left(\frac{2T}{N(N-1)}\right)^{\frac{1}{2}}\sum_{i=1}^{N-1}\sum_{j=i+1}^N \hat{\rho}_{ij}, \qquad \hat{\rho}_{ij} = \frac{\sum_{t=1}^T \hat{\varepsilon}_{it}\hat{\varepsilon}_{jt}}{\left(\sum_{t=1}^T \hat{\varepsilon}_{it}^2\right)^{\frac{1}{2}}\left(\sum_{t=1}^T \hat{\varepsilon}_{jt}^2\right)^{\frac{1}{2}}}$

Under the null hypothesis of no cross-sectional dependence, for $N \to \infty$ and $T$ sufficiently large, $CD \overset{d}{\to} z$, where $z \sim N(0, 1)$.

The CD statistic has mean exactly zero for fixed values of $T$ and $N$ under a wide range of panel-data models, including homogeneous/heterogeneous, dynamic and nonstationary models.

As pointed out in Pesaran (2004), the CD test need not be consistent for alternatives of interest.

In Stata this test is available in the ado-file xtcsd, pesaran.

36
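The CD statistic can be computed directly from an $(N \times T)$ array of residuals. A minimal NumPy sketch (simulated residuals; the common-factor design used to generate cross-sectional dependence is hypothetical):

```python
import numpy as np

def pesaran_cd(E):
    """CD from an (N, T) residual array; approx. N(0, 1) under the null."""
    N, T = E.shape
    norms = np.sqrt((E ** 2).sum(axis=1))
    R = (E @ E.T) / np.outer(norms, norms)   # uncentered rho_ij as on the slide
    iu = np.triu_indices(N, k=1)             # pairs with i < j
    return np.sqrt(2 * T / (N * (N - 1))) * R[iu].sum()

rng = np.random.default_rng(4)
cd_indep = pesaran_cd(rng.normal(size=(30, 50)))            # ~ N(0, 1)
common = rng.normal(size=(1, 50))                           # common factor
cd_dep = pesaran_cd(rng.normal(size=(30, 50)) + common)     # large positive
```

Note the $\hat{\rho}_{ij}$ here are uncentered, as in the formula above; for residuals that are close to mean zero this is essentially the pairwise correlation.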

Example 2.1 Difference-in-Difference Estimation (Autor, 2003). We observe $N$ units in $T$ periods. Some of the units are subject to an exogenous treatment (policy intervention) that is in effect from $\bar{t}_i$ onwards. The simplest design is $T = 2$ and $\bar{t}_i = 2$.

Econometric model:

$y_{it}^s = \mu_i + \lambda_t + \delta D_{it}^s + \nu_{it}, \quad i = 1, \ldots, N, \; t = 1, \ldots, T, \; s = 0, 1,$

where $s = 1$ if unit $i$ is subject to the treatment and $s = 0$ otherwise. The dummy $D_{it}^s$ takes the value 1 if $s = 1$ and $t \geq \bar{t}_i$.

The treatment effect is identified by doubly differencing (e.g., at $t_1 \geq \bar{t}_i$ and $t_0 < \bar{t}_i$):

$\Delta^1 = E[y_{it_1}^1 - y_{it_0}^1] = \lambda_{t_1} + \delta - \lambda_{t_0}$

$\Delta^0 = E[y_{it_1}^0 - y_{it_0}^0] = \lambda_{t_1} - \lambda_{t_0}$

$\Delta^1 - \Delta^0 = \delta$

37

Note that it is possible to include control variables $x_{it}$ with parameter vector $\beta$.

Autor (2003) considers an even more general DiD model to estimate the impact of changes in the "unjust dismissal doctrine" in US states on "Temporary Help Services (THS)", i.e., the outsourcing of workers:

$y_{it}^s = x_{it}'\beta + \mu_i + \lambda_t + \delta_{p2}D_{i,t+2}^s + \delta_{p1}D_{i,t+1}^s + \delta D_{it}^s + \delta_{m1}D_{i,t-1}^s + \delta_{m2}D_{i,t-2}^s + \delta_{m3}D_{i,t-3}^s + \delta_4 G_{it}^s + \nu_{it}$

$G_{it}^s$ takes the value 1 if $t \geq \bar{t}_i + 4$ and captures the persistent long-run effect after period 3.

38
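The simplest $T = 2$ design above can be sketched in NumPy; the double difference of group means recovers $\delta$ (all numbers below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(5)
N, delta = 200, 1.5                     # hypothetical effect size
treated = np.arange(N) < N // 2         # first half receives the treatment
mu = rng.normal(size=N)                 # unit effects
lam = np.array([0.0, 0.7])              # time effects, T = 2
y = mu[:, None] + lam[None, :] + 0.1 * rng.normal(size=(N, 2))
y[treated, 1] += delta                  # treatment switches on at t = 2

# double difference: Delta^1 - Delta^0 = delta
dy = y[:, 1] - y[:, 0]
did = dy[treated].mean() - dy[~treated].mean()
```

First-differencing removes $\mu_i$, and subtracting the control-group change removes $\lambda_{t_1} - \lambda_{t_0}$, exactly as in the identification argument above.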

39

40

Further references:

Autor, D. (2003), Outsourcing at Will: The Contribution of Unjust Dismissal Doctrine to the Growth of Employment Outsourcing, Journal of Labor Economics 21(1), 1-42.

Baltagi, B. and P.X. Wu (1999), Unequally Spaced Panel Data Regressions with AR(1) Disturbances, Econometric Theory 15(6), 814-823.

De Hoyos, R.E. and V. Sarafidis (2006), Testing for Cross-Sectional Dependence in Panel-Data Models, The Stata Journal 6(4), 482-496.

Mutl, J. and M. Pfaffermayr (2010), A Note on the Cliff and Ord Test for Spatial Correlation in Panel Models, Economics Letters 108(2), 225-228.

Pesaran, M.H. (2004), General Diagnostic Tests for Cross-Section Dependence in Panels, University of Cambridge, Faculty of Economics, Cambridge Working Papers in Economics No. 0435.

41

3 Lecture 2: GMM-Estimation and IVs

3.1 GMM-Estimation

The basic idea of GMM estimation is to derive parameter estimates by matching the empirical moments of the sample to their theoretical counterparts in the population. Naturally, this leads to m-estimators.

Definition 3.1 (Moment conditions): A sample described by the random variables $\{Y_i, x_i\}$ with $n$ observations is used to estimate the $(K \times 1)$ parameter vector $\theta \in \Theta$. Let $\theta_0$ be the true parameter vector and let $m(Y_i, x_i, \theta)$ be a $(J \times 1)$ vector-valued function, where $\mu_{2k} = E[Y_i^{2k}]$ exists for some $k > 1$. The moment conditions are given by:

$E[m_j(Y_i, x_i, \theta_0)] = 0, \quad j = 1, \ldots, J$

42

Example 3.1 (Estimating the sample moments): Consider a sample of iid$(\mu, \sigma^2)$ random variables $\{Y_i\}$ with $n$ observations. The $k$-th uncentered empirical moment is defined as $\bar{m}_k = \frac{1}{n}\sum_{i=1}^n Y_i^k.$

The theoretical moment conditions for the mean and variance are given by:

$E[m_k(Y_i)] = E[Y_i^k] - \mu_k = 0.$

The empirical counterparts are given by

$\bar{m}_k - \hat{\mu}_k = \frac{1}{n}\sum_{i=1}^n Y_i^k - \hat{\mu}_k = 0$

43

Note:

$E[\bar{m}_k] = \frac{1}{n}\sum_{i=1}^n E[Y_i^k] = \mu_k$

$Var[\bar{m}_k] = Var\left[\frac{1}{n}\sum_{i=1}^n Y_i^k\right] = \frac{1}{n^2}\sum_{i=1}^n\left(E[Y_i^{2k}] - \left(E[Y_i^k]\right)^2\right) = \frac{1}{n}(\mu_{2k} - \mu_k^2).$

We know from Khinchine's weak law of large numbers (see Greene, 2008) that $\operatorname{plim}_{n\to\infty}\bar{m}_k = \mu_k$, and from the Lindeberg-Levy central limit theorem (see Greene, 2008) that $n^{\frac{1}{2}}(\bar{m}_k - \mu_k) \overset{d}{\to} N(0, \mu_{2k} - \mu_k^2).$

44

Specifically, consider the first and second moments:

$E[m_1(Y_i)] - \mu_1 = 0$

$E[m_2(Y_i)] - \mu_2 - \mu_1^2 = 0$

$\bar{m}_1 = \frac{1}{n}\sum_{i=1}^n Y_i = \hat{\mu}_1, \qquad \bar{m}_2 = \frac{1}{n}\sum_{i=1}^n Y_i^2 = \hat{\mu}_2 + \hat{\mu}_1^2$

$\hat{\mu}_1 = \bar{m}_1, \qquad \hat{\mu}_2 = \bar{m}_2 - \bar{m}_1^2$

Note: $\hat{\mu}_2 = \frac{1}{n}\sum_{i=1}^n Y_i^2 - \left(\frac{1}{n}\sum_{i=1}^n Y_i\right)^2 = \frac{1}{n}\sum_{i=1}^n\left(Y_i - \bar{Y}\right)^2.$ Equating the theoretical moments to their empirical counterparts allows one to solve for $\hat{\mu}_1$ and $\hat{\mu}_2$. This implies that the arithmetic mean and its estimated variance are method-of-moments estimators.

45
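The mean/variance case can be sketched in a few lines of NumPy (a hypothetical iid normal sample with $\mu = 3$, $\sigma^2 = 4$):

```python
import numpy as np

rng = np.random.default_rng(6)
Y = rng.normal(loc=3.0, scale=2.0, size=100_000)   # hypothetical iid sample

m1 = Y.mean()              # first uncentered empirical moment
m2 = (Y ** 2).mean()       # second uncentered empirical moment
mu1_hat = m1               # mean: mu_1 = m_1
mu2_hat = m2 - m1 ** 2     # variance: mu_2 = m_2 - m_1^2

assert np.isclose(mu2_hat, Y.var())   # equals the uncorrected sample variance
```

The last line confirms the slide's point: the method-of-moments variance estimator is exactly the (uncorrected, $1/n$) sample variance.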

For iid samples we can compute $J$ empirical moments $\bar{m}_1, \ldots, \bar{m}_J$, whose probability limits are known functions of the parameters $(\theta_1, \ldots, \theta_K)$. We equate the empirical moments to these functions. Inversion then allows us to express the estimated parameters as functions of the empirical moments:

$\bar{m}_1 - \gamma_1(\theta_1, \ldots, \theta_K) = 0$

$\bar{m}_2 - \gamma_2(\theta_1, \ldots, \theta_K) = 0$

$\ldots$

$\bar{m}_K - \gamma_K(\theta_1, \ldots, \theta_K) = 0.$

Solving gives $\hat{\theta}_k = \phi_k(\bar{m}_1, \ldots, \bar{m}_K), \; k = 1, \ldots, K.$

46

The empirical moments are consistent by virtue of the law of large numbers. They are asymptotically normal by a proper central limit theorem. The derived parameter estimates inherit consistency via the Slutsky theorem and asymptotic normality by virtue of the delta method.

47

Example 3.2 (Linear regression): Let $y_i = x_i'\beta + u_i$, $X = (x_1, \ldots, x_n)'$, $x_i$ $(K \times 1)$, $\beta$ $(K \times 1)$ and $u_i \sim iid(0, \sigma^2)$.

Moment conditions: $E[\varepsilon_i|x_i] = 0$, so that $E[y_i|x_i] = x_i'\beta$.

Law of iterated expectations: $E[x_i\varepsilon_i] = E_x\left[E_\varepsilon[x_i\varepsilon_i|x_i]\right] = E_x\left[x_iE_\varepsilon[\varepsilon_i|x_i]\right] = E_x[x_i \cdot 0] = 0.$

$E[x_i\varepsilon_i] = E[x_i(y_i - x_i'\beta)] = 0$ form moment conditions with $\theta_0 = \beta$.

Empirical moments: $m_j(y_i, x_i, \beta) = x_{ij}(y_i - x_i'\beta), \; j = 1, \ldots, K.$

The sample analog is: $\bar{m}(y_i, x_i, \hat{\beta}) = \frac{1}{n}\sum_{i=1}^n m(y_i, x_i, \hat{\beta}) = \frac{1}{n}\sum_{i=1}^n x_i(y_i - x_i'\hat{\beta}) = 0.$

48

Remark:

1. Since $J = K$ we obtain a system of $K$ equations ($x_i$ is a $(K \times 1)$ vector) in $K$ unknowns (the components of $\hat{\beta}$). Under further assumptions a unique solution exists, i.e., $\beta$ is exactly identified. If there are fewer equations than parameters, $\beta$ remains unidentified. If there are more, $\beta$ is overidentified.

2. We do not need distributional assumptions for $u_i$. This makes GMM estimators attractive.

49

Example 3.3 (Instrumental variables): $y_i = x_i'\beta + u_i$, $u_i \sim iid(0, \sigma^2)$, but $E[u_i|x_i] \neq 0.$

There are $J$ instrumental variables, collected for observation $i$ in the $(J \times 1)$ vector $z_i$, with $E[z_iu_i] = 0.$

Moment conditions: $E[z_iu_i] = E[z_i(y_i - x_i'\beta)] = 0.$

Empirical moments: $m(y_i, x_i, z_i, \beta) = z_i(y_i - x_i'\beta).$

There are $J$ equations and $K$ parameters. $\beta$ is identified if $J \geq K$.

50

3.2 Estimation in Case of Just Identified Parameters

Assume that we have $J = K$ moment conditions for the $K$ parameters ($\theta$ is just identified): $E[m(y_i, x_i, \theta_0)] = 0.$

We use $\bar{m}(\theta) = \frac{1}{n}\sum_{i=1}^n m(y_i, x_i, \theta)$ instead of $E[m(y_i, x_i, \theta)]$ to obtain a method-of-moments estimator.

Since we have $J = K$ moment conditions, we can solve this system for $\theta$ (under the assumption that a unique solution exists).

51

Example 3.4 (Linear regression continued):

Empirical moments: $\bar{m}(y_i, x_i, \hat{\beta}) = \frac{1}{n}\sum_{i=1}^n x_i\hat{u}_i = \frac{1}{n}\sum_{i=1}^n x_i(y_i - x_i'\hat{\beta}) = 0.$

Solving yields: $\hat{\beta} = \left(\sum_{i=1}^n x_ix_i'\right)^{-1}\sum_{i=1}^n x_iy_i$, or $\hat{\beta} = (X'X)^{-1}X'y.$ For $x_i = 1$ we have $\hat{\mu} = \left(\sum_{i=1}^n 1\right)^{-1}\sum_{i=1}^n y_i = \frac{1}{n}\sum_{i=1}^n y_i = \bar{y}.$

Example 3.5 (Instrumental variables continued): Assume $J = K$ instruments (just-identified case).

Empirical moments: $\bar{m}(y_i, x_i, z_i, \hat{\beta}) = \frac{1}{n}\sum_{i=1}^n z_i\hat{u}_i = \frac{1}{n}\sum_{i=1}^n z_i(y_i - x_i'\hat{\beta}) = 0.$

Solving yields: $\hat{\beta} = \left(\sum_{i=1}^n z_ix_i'\right)^{-1}\sum_{i=1}^n z_iy_i$, or $\hat{\beta} = (Z'X)^{-1}Z'y.$

52
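A minimal NumPy sketch of the just-identified IV estimator $(Z'X)^{-1}Z'y$ on simulated data (the design and parameter values are hypothetical), contrasted with the biased OLS estimate:

```python
import numpy as np

rng = np.random.default_rng(7)
n, beta0 = 20_000, 2.0                   # hypothetical design
z = rng.normal(size=n)                   # one instrument (J = K = 1)
v = rng.normal(size=n)
x = z + v                                # endogenous regressor
u = 0.8 * v + rng.normal(size=n)         # corr(u, v) != 0 biases OLS
y = beta0 * x + u

b_ols = (x @ y) / (x @ x)                # plim = beta0 + 0.8/2 = 2.4
b_iv = (z @ y) / (z @ x)                 # (Z'X)^{-1} Z'y, consistent
```

OLS converges to $\beta_0 + Cov(x, u)/Var(x) = 2.4$ in this design, while the IV estimate is centered on $\beta_0 = 2$.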

Example 3.6 (Maximum likelihood):

Log-likelihood of the sample: $\frac{1}{n}\sum_{i=1}^n \ln f(y_i, x_i, \theta)$

Moment conditions: $E[m(y_i, x_i, \theta)] = E\left[\left.\frac{\partial \ln f(y_i, x_i, \theta)}{\partial\theta}\right|_{\theta=\theta_0}\right] = 0$

Empirical moments: $\frac{1}{n}\sum_{i=1}^n m(y_i, x_i, \theta) = \frac{1}{n}\sum_{i=1}^n \left.\frac{\partial \ln f(y_i, x_i, \theta)}{\partial\theta}\right|_{\theta=\hat{\theta}} = 0.$

53

3.3 The Generalized Method of Moments Estimator

We consider the overidentified case with $J > K$ moment conditions:

$E[m_j(y_i, x_i, z_i, \theta)] = E[m_{ij}(\theta)] = 0, \quad j = 1, \ldots, J.$

The corresponding empirical moments are given by

$\bar{m}_{j,n}(\theta) = \frac{1}{n}\sum_{i=1}^n m_j(y_i, x_i, z_i, \theta) = \frac{1}{n}\sum_{i=1}^n m_{ij}(\theta),$

which gives a system of $J$ equations in $K$ unknown parameters,

$\bar{m}_n(\theta) = \frac{1}{n}\sum_{i=1}^n m_i(\theta).$

54

Minimize the criterion function (a weighted sum of squares):

$q_n(\theta) = \bar{m}_n(\theta)'W_n\bar{m}_n(\theta),$

where $W_n$ denotes a positive-definite $(J \times J)$ weighting matrix that may depend on the data but does not depend on $\theta$. We assume $W_n \overset{p}{\to} W_0$, where $W_0$ is a positive-definite matrix.

Similar to the logic of generalized least squares, the weights are best chosen inversely proportional to the variances of the moments (Hansen, 1982): $W_0 = \left[Asy.Var\!\left(n^{\frac{1}{2}}\bar{m}_n\right)\right]^{-1} = \Phi^{-1}.$

55

In fact, if $\operatorname{plim}_{n\to\infty}\bar{m}_n(\theta) = 0$, one can show that the resulting estimator $\hat{\theta}$ is consistent (minimum distance estimator, see Greene, 2008, section 15.2 and Assumption 15.1, p. 448).

The empirical moments are assumed to be continuous and continuously differentiable, so that $\bar{G}_n(\theta_0) = \frac{\partial\bar{m}_n(\theta_0)}{\partial\theta_0'} = \frac{1}{n}\sum_{i=1}^n \frac{\partial m_i(\theta_0)}{\partial\theta_0'}$ converges in probability: $\operatorname{plim}_{n\to\infty}\bar{G}_n(\theta_0) = \bar{G}.$

We impose an identification assumption: for any $n \geq K$, if $\theta_1$ and $\theta_2$ are two different parameter vectors, there exist data sets such that $\bar{m}_n(\theta_1) \neq \bar{m}_n(\theta_2)$. Formally, identification is defined to imply that the probability limit of the GMM criterion function is uniquely minimized at the true $\theta_0$.

56

There are three implications of the identification assumption:

1. Order condition: the number of moments must be at least as large as the number of parameters: $J \geq K$.

2. Rank condition: the $(J \times K)$ matrix $\bar{G}_n(\theta_0)$ has rank $K$.

3. Uniqueness: together with the continuity assumption, the identification assumption implies that the parameter vector satisfying the population moment condition is unique. Let $\operatorname{plim}_{n\to\infty}\bar{m}_n(\theta_0) = 0$; if $\theta_1$ also satisfies this condition, then $\theta_1 = \theta_0$.

Lastly, one has to show (or to assume) that the empirical moments obey a central limit theorem, assuming that the moments have an asymptotic variance, say $\Phi$: $n^{\frac{1}{2}}\bar{m}_n(\theta_0) \overset{d}{\to} N(0, \Phi).$

57

Theorem 3.1 (Asymptotic distribution of the GMM estimator): Under the preceding assumptions and using $W_0 = \Phi^{-1}$,

$\hat{\theta}_{GMM} \overset{p}{\to} \theta_0$

$n^{\frac{1}{2}}\left(\hat{\theta}_{GMM} - \theta_0\right) \overset{d}{\to} N(0, nV_{GMM}),$

where $V_{GMM} = n^{-1}(\bar{G}'\Phi^{-1}\bar{G})^{-1}.$

Sketch of the proof: The proof is similar to that for the ML estimator. Following Greene (2008, p. 450) we assume that $q_n(\theta)$ converges to $q_0(\theta)$ for all points of the parameter space. Moreover, $\operatorname{plim}_{n\to\infty}q_n(\theta_0) = q_0(\theta_0) = 0$, since $\operatorname{plim}_{n\to\infty}\bar{m}_n(\theta_0) = 0$. Since $W_0$ is positive definite, we have $0 \leq q_n(\hat{\theta}_{GMM}) \leq q_n(\theta_0)$.

Note that $\hat{\theta}_{GMM}$ actually minimizes $q_n(\theta)$ in finite samples, and $q_n(\hat{\theta}_{GMM}) \overset{p}{\to} 0$ since $q_n(\theta_0) \overset{p}{\to} 0$. Consistency then follows from $\bar{m}_n(\hat{\theta}_{GMM}) - \bar{m}_n(\theta_0) \overset{p}{\to} 0$ and the identification condition (Greene, 2008, p. 450).

58

Asymptotic normality is established in two steps. First,

$\left.\frac{\partial q_n(\theta)}{\partial\theta}\right|_{\theta=\hat{\theta}_{GMM}} = 2\bar{G}_n(\hat{\theta}_{GMM})'W_n\bar{m}_n(\hat{\theta}_{GMM}) = 0 \quad (1)$

Second, using the mean value theorem:

$\bar{m}_n(\hat{\theta}_{GMM}) = \bar{m}_n(\theta_0) + \bar{G}_n(\bar{\theta})\left(\hat{\theta}_{GMM} - \theta_0\right) \quad (2)$

where $\bar{\theta}$ is a point in between $\hat{\theta}_{GMM}$ and $\theta_0$. Inserting (2) into (1) yields

$\bar{G}_n(\hat{\theta}_{GMM})'W_n\bar{m}_n(\theta_0) + \bar{G}_n(\hat{\theta}_{GMM})'W_n\bar{G}_n(\bar{\theta})\left(\hat{\theta}_{GMM} - \theta_0\right) = 0$

or, assuming that $\bar{G}_n(\hat{\theta}_{GMM})'W_n\bar{G}_n(\bar{\theta})$ is invertible,

$n^{\frac{1}{2}}\left(\hat{\theta}_{GMM} - \theta_0\right) = -\left[\bar{G}_n(\hat{\theta}_{GMM})'W_n\bar{G}_n(\bar{\theta})\right]^{-1}\bar{G}_n(\hat{\theta}_{GMM})'W_n\,n^{\frac{1}{2}}\bar{m}_n(\theta_0).$

59

Both $\hat{\theta}_{GMM}$ and $\bar{\theta}$ converge to $\theta_0$, since $\hat{\theta}_{GMM}$ is consistent. Due to the continuity of $\bar{G}_n$ (see above) we have $\bar{G}_n(\hat{\theta}_{GMM}) \overset{p}{\to} \bar{G}(\theta_0)$ and $\bar{G}_n(\bar{\theta}) \overset{p}{\to} \bar{G}(\theta_0).$

Then the limiting distribution of $n^{\frac{1}{2}}(\hat{\theta}_{GMM} - \theta_0)$ must be the same as that of $-\left[\bar{G}(\theta_0)'W_0\bar{G}(\theta_0)\right]^{-1}\bar{G}(\theta_0)'W_0\,n^{\frac{1}{2}}\bar{m}_n(\theta_0).$ Now $n^{\frac{1}{2}}\bar{m}_n(\theta_0)$ converges to a normal distribution, $N(0, \Phi)$ (as assumed in Greene, 2008), and $\left[\bar{G}(\theta_0)'W_0\bar{G}(\theta_0)\right]^{-1}\bar{G}(\theta_0)'W_0$ to a matrix of constants, so that

$nV_{GMM} = \left[\bar{G}(\theta_0)'W_0\bar{G}(\theta_0)\right]^{-1}\bar{G}(\theta_0)'W_0\Phi W_0\bar{G}(\theta_0)\left[\bar{G}(\theta_0)'W_0\bar{G}(\theta_0)\right]^{-1}$

Under optimal weighting, $W_0 = \Phi^{-1}$, this reduces to

$nV_{GMM,optimal} = \left[\bar{G}(\theta_0)'\Phi^{-1}\bar{G}(\theta_0)\right]^{-1}.$

If we use $W_0 = I$, one obtains

$\tilde{V}_{GMM} = \frac{1}{n}\left[\bar{G}(\theta_0)'\bar{G}(\theta_0)\right]^{-1}\bar{G}(\theta_0)'\Phi\bar{G}(\theta_0)\left[\bar{G}(\theta_0)'\bar{G}(\theta_0)\right]^{-1}.$

Example 3.7 (Instrumental variables continued): Let $J > K$ and remember the moment conditions $E[z_iu_i] = 0$:

$\bar{m}_n(\beta) = \frac{1}{n}\sum_{i=1}^n z_i(y_i - x_i'\beta) = n^{-1}Z'(y - X\beta).$

$q_n(\beta) = n^{-1}(Z'y - Z'X\beta)'W_n\,n^{-1}(Z'y - Z'X\beta)$
$= n^{-2}\left(y'ZW_nZ'y - \beta'X'ZW_nZ'y - y'ZW_nZ'X\beta + \beta'X'ZW_nZ'X\beta\right)$

To obtain the efficient GMM estimator we set $W_n = n(Z'Z)^{-1}$ [note $Var(n^{\frac{1}{2}}\bar{m}_n(\beta)) = \sigma^2\frac{1}{n}Z'Z$], with $n(Z'Z)^{-1} \to W_0$. Using $P_Z = n^{-1}ZW_nZ' = Z(Z'Z)^{-1}Z'$ yields

$q_n(\beta) = n^{-1}\left(y'P_Zy - \beta'X'P_Zy - y'P_ZX\beta + \beta'X'P_ZX\beta\right)$

$\frac{\partial q_n(\beta)}{\partial\beta} = -n^{-1}2X'P_Zy + n^{-1}2X'P_ZX\hat{\beta} = 0 \;\Rightarrow\;$

$\hat{\beta} = \left(X'P_ZX\right)^{-1}X'P_Zy$ is the IV or GMM estimator.

62
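A minimal NumPy sketch of this estimator in an overidentified design ($J = 3$, $K = 1$; all values hypothetical). $X'P_ZX$ and $X'P_Zy$ are computed via linear solves rather than by forming the $(n \times n)$ matrix $P_Z$:

```python
import numpy as np

rng = np.random.default_rng(8)
n, beta0 = 20_000, 1.5                        # hypothetical design
Z = rng.normal(size=(n, 3))                   # J = 3 instruments, K = 1
v = rng.normal(size=n)
x = Z @ np.array([1.0, 0.5, 0.5]) + v         # endogenous regressor
u = 0.6 * v + rng.normal(size=n)
y = beta0 * x + u
X = x[:, None]

# W_n = n (Z'Z)^{-1}; compute X'P_Z X and X'P_Z y without forming P_Z
ZtX, Zty, ZtZ = Z.T @ X, Z.T @ y, Z.T @ Z
A = ZtX.T @ np.linalg.solve(ZtZ, ZtX)         # X'P_Z X
c = ZtX.T @ np.linalg.solve(ZtZ, Zty)         # X'P_Z y
b_2sls = np.linalg.solve(A, c)                # (X'P_Z X)^{-1} X'P_Z y
```

Avoiding the explicit $P_Z$ matters in practice: $P_Z$ is $(n \times n)$, while the solves only involve $(J \times J)$ and $(K \times K)$ systems.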

For applications we need consistent estimates of $\bar{G}(\theta_0)$ and $\Phi$ (see Newey and West, 1994):

$\hat{G} = \frac{1}{n}\sum_{i=1}^n \left.\frac{\partial m_i(\theta)}{\partial\theta'}\right|_{\theta=\hat{\theta}}$

Under independence one can estimate $\Phi$ by:

$\hat{S} = \frac{1}{n}\sum_{i=1}^n m_i(\hat{\theta})m_i(\hat{\theta})'.$

If one uses $W_n$ instead of $\Phi^{-1}$:

$\widehat{V}_{GMM} = \frac{1}{n}\left(\hat{G}'W_n\hat{G}\right)^{-1}\hat{G}'W_n\hat{S}W_n\hat{G}\left(\hat{G}'W_n\hat{G}\right)^{-1}$

63

Two-step GMM estimator: we need a consistent estimate of $\Phi$.

First step: use an arbitrary $W_n$, e.g., $W_n = I_n$. $\Phi$ is estimated consistently by $\hat{S}$, since this is based on consistently estimated first-step residuals.

Second step: using $\hat{S}^{-1}$ as the weighting matrix yields the asymptotically efficient GMM estimator $\hat{\theta}$.

Monte Carlo simulations show that the estimated variance of the two-step GMM estimator is biased downwards in small samples (see Windmeijer, 2005, who also provides a correction factor).

64

Example 3.8 (Instruments continued): $y_i = x_i'\beta_0 + u_i$, $x_i$ $(K \times 1)$, $X = (x_1, \ldots, x_n)'$, $\beta_0$ $(K \times 1)$ and $u_i \sim (0, \sigma^2)$. There are $J > K$ population moment conditions: $E[z_iu_i] = 0.$

$m(y_i, x_i, z_i, \beta) = z_i(y_i - x_i'\beta) = z_i\left(u_i - x_i'(\beta - \beta_0)\right)$

$\bar{m}_n(\beta) = \frac{1}{n}\sum_{i=1}^n z_i\left(u_i - x_i'(\beta - \beta_0)\right) = \frac{1}{n}\sum_{i=1}^n z_iu_i - \frac{1}{n}\sum_{i=1}^n z_ix_i'(\beta - \beta_0) \overset{p}{\to} -\Sigma_{zx}(\beta - \beta_0),$

since $\frac{1}{n}\sum_{i=1}^n z_iu_i \overset{p}{\to} 0$ if the population moment conditions hold and $\frac{1}{n}\sum_{i=1}^n z_ix_i' \overset{p}{\to} E[z_ix_i'] = \Sigma_{zx}.$

$E[m(y_i, x_i, z_i, \beta)] = 0 \Leftrightarrow \beta = \beta_0$, i.e., $\beta_0$ is identified if $\Sigma_{zx}$ has rank $K$ and the moments are not redundant.

Minimize $q_n(\beta) = \bar{m}_n(\beta)'W_n\bar{m}_n(\beta)$ with $W_n \overset{p}{\to} W_0.$

First-order condition: $2\bar{G}_n(\hat{\beta})'W_n\bar{m}_n(\hat{\beta}) = 0.$

65

$\bar{G}_n(\beta) = \frac{\partial\bar{m}_n(\beta)}{\partial\beta'} = -\frac{1}{n}\sum_{i=1}^n z_ix_i' \overset{p}{\to} -\Sigma_{zx}.$

$Var\left[n^{1/2}\bar{m}_n(\beta)\right] = n\,Var\left[\frac{1}{n}\sum_{i=1}^n z_iu_i\right] = \frac{1}{n}\sum_{i=1}^n E[u_i^2z_iz_i'] = \sigma^2\Sigma_{zz} = \Phi,$

with $\Sigma_{zz} = E[z_iz_i'].$

$n^{1/2}(\hat{\beta} - \beta_0) \overset{d}{\to} N\left(0, \sigma^2\left(\Sigma_{zx}'W_0\Sigma_{zx}\right)^{-1}\left(\Sigma_{zx}'W_0\Sigma_{zz}W_0\Sigma_{zx}\right)\left(\Sigma_{zx}'W_0\Sigma_{zx}\right)^{-1}\right)$

Choose $W_n = \hat{\Sigma}_{zz}^{-1} = \left(\frac{1}{n}\sum_{i=1}^n z_iz_i'\right)^{-1} \overset{p}{\to} W_0 = \Sigma_{zz}^{-1}$, so that

$n^{1/2}(\hat{\beta} - \beta_0) \overset{d}{\to} N\left(0, \sigma^2\left(\Sigma_{zx}'\Sigma_{zz}^{-1}\Sigma_{zx}\right)^{-1}\right).$

If $z_i$ and $x_i$ are non-stochastic: $\Sigma_{zz} = \lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^n z_iz_i'$ and $\Sigma_{zx} = \lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^n z_ix_i'.$

66

Note $\hat{S} = \frac{1}{n}\sum_{i=1}^n m_i(\hat{\beta})m_i(\hat{\beta})' = \frac{1}{n}\sum_{i=1}^n u_i^2z_iz_i'.$ Under homoskedasticity, $\left(\frac{1}{n}\sum_{i=1}^n u_i^2\right)\left(\frac{1}{n}\sum_{i=1}^n z_iz_i'\right)$ is a consistent estimator of $\Phi$.

If we have more general assumptions on the disturbances, we need a more general central limit theorem. Under heteroskedasticity the Lindeberg-Feller CLT suffices.

67

3.4 Testing Hypotheses in the GMM Framework

In exactly identified cases $q_n(\theta) = \bar{m}_n(\theta)'W_n\bar{m}_n(\theta) = 0$, since we can solve $\bar{m}_n(\theta) = 0$. Under overidentification, $\bar{m}_n(\theta) = 0$ forms a restriction that can be tested. If the assumed population moment conditions do not hold in the first place, at least some of the sample moment conditions will be systematically violated. By construction,

$nq_n(\hat{\theta}) = \left(n^{1/2}\bar{m}_n(\hat{\theta})\right)'\left[Est.Asy.Var\!\left(n^{1/2}\bar{m}_n(\hat{\theta})\right)\right]^{-1}\left(n^{1/2}\bar{m}_n(\hat{\theta})\right)$

is a valid Wald statistic for $J - K$ restrictions (at $J = K$ we have $nq_n(\hat{\theta}) = 0$). Therefore, under $H_0$ we have $nq_n(\hat{\theta}) \overset{d}{\to} \chi^2(J - K).$

This is the Sargan test (homoskedasticity) or the Hansen J test (heteroskedasticity), and it is a specification test.

68
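The J statistic follows directly from the estimated moments and a consistent estimate of their variance. A minimal NumPy sketch with $J = 3$ valid instruments and $K = 1$ (the design is hypothetical; the heteroskedasticity-robust $\hat{S}$ is used as the variance estimate):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 10_000
Z = rng.normal(size=(n, 3))                   # J = 3 valid instruments, K = 1
v = rng.normal(size=n)
x = Z[:, 0] + Z[:, 1] + v
u = 0.5 * v + rng.normal(size=n)
y = 2.0 * x + u
X = x[:, None]

# efficient GMM / 2SLS point estimate
ZtX, Zty, ZtZ = Z.T @ X, Z.T @ y, Z.T @ Z
b = np.linalg.solve(ZtX.T @ np.linalg.solve(ZtZ, ZtX),
                    ZtX.T @ np.linalg.solve(ZtZ, Zty))
e = y - X @ b
m = Z.T @ e / n                               # sample moments at b_hat
S = (Z * e[:, None]).T @ (Z * e[:, None]) / n # S_hat = (1/n) sum u_i^2 z_i z_i'
J_stat = n * m @ np.linalg.solve(S, m)        # chi^2(J - K) = chi^2(2) under H0
```

Because all three instruments are valid here, the statistic should fall in the typical range of a $\chi^2(2)$ variable; an invalid instrument would inflate it systematically.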

For further tests see Greene (2008, p. 453f).

Testing a subset of $j$ of the $J$ moment conditions (C or J test, Newey and West, 1987; apologies for the inconsistent notation):

$nq_n(\hat{\theta}_R) - nq_n(\hat{\theta}) \overset{d}{\to} \chi^2(j),$

where $\hat{\theta}_R$ denotes the restricted estimator that is based on the $J - j$ moment restrictions.

There exist natural counterparts to the LR, Wald and LM tests in the GMM framework. Let $c(\theta)$ be an $(R \times 1)$ vector of the restrictions under test and $C(\hat{\theta}) = \left.\frac{\partial c(\theta)}{\partial\theta'}\right|_{\theta=\hat{\theta}}$ the $(R \times K)$ matrix of its derivatives. The Wald test statistic is the same as that derived in the maximum likelihood case:

$W = c(\hat{\theta})'\left[C(\hat{\theta})\,est.asy.Var(\hat{\theta})\,C(\hat{\theta})'\right]^{-1}c(\hat{\theta}) \overset{a}{\sim} \chi^2(R).$

69

Further references:

Windmeijer, Frank (2005), A Finite Sample Correction for the Variance of Linear Efficient Two-Step GMM Estimators, Journal of Econometrics 126(1), 25-51.

70

4 Lecture 3: Weak Instruments and IV-Estimation in Static Panel Models

4.1 Weak Instruments

In practice, IV-estimation is a major challenge. Denote the $(n \times J)$ matrix of instruments by $Z$.

Exogeneity assumption: $\operatorname{plim}_{n\to\infty} n^{-1}Z'\varepsilon = 0.$

Instrument relevance: $\operatorname{plim}_{n\to\infty} n^{-1}Z'X = Q_{ZX}$ is a finite and non-zero $(J \times K)$ matrix with rank $K$.

Working definition of weak identification:

- $\beta$ is weakly identified if the distributions of GMM or IV estimators are not well approximated by the asymptotic normal because of limited information in the data.
- Departures from standard asymptotics are what matter in practice.
- The source of the failures is limited information, not (for example) heavy-tailed distributions, near-unit roots, unmodeled breaks, etc.
- We focus on large samples; the source of the failure is not small-sample problems in a conventional sense. In fact, most available tools for weak instruments have large-sample justifications.
- We assume instrument exogeneity: weak identification is about instrument relevance, not instrument exogeneity.

IV-regression model in a cross-section - a simple example:

$y = Y\beta + u$
$Y = Z\Pi + v$
$\operatorname{corr}(u_i, v_i) = \rho$

$Y$ is an $(n \times 1)$ vector, $Z$ is $(n \times 1)$. Additional exogenous regressors are excluded for simplicity.

IV-estimator with one endogenous variable and one instrument (just identified):

$\hat{\beta}_{IV} = (Z'Y)^{-1}Z'y = \frac{Z'y}{Z'Y} = \frac{Z'(Y\beta + u)}{Z'Y} = \beta + \frac{Z'u}{Z'Y}$

If the instrument is irrelevant, $\Pi = 0$ and $Y = v$:

$\hat{\beta}_{IV} - \beta_0 = \frac{Z'u}{Z'v} = \frac{n^{-1/2}\sum_{i=1}^{n} Z_i u_i}{n^{-1/2}\sum_{i=1}^{n} Z_i v_i} \xrightarrow{d} \frac{z_u}{z_v}$

with

$(z_u, z_v)' \sim N\left(0, \begin{pmatrix}\sigma_u^2 & \sigma_{uv} \\ \sigma_{uv} & \sigma_v^2\end{pmatrix}\right)$

Remark: $\hat{\beta}_{IV}$ is not consistent, since its variance converges to a constant rather than to zero. The limit distribution of $\hat{\beta}_{IV}$ is Cauchy-like.
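A quick Monte Carlo makes this failure tangible. The sketch below (Python/numpy; all parameter values are illustrative assumptions) draws an irrelevant instrument ($\Pi = 0$) repeatedly and shows that the just-identified IV estimator is centered at $\beta_0 + \delta$, the plim of OLS, rather than at $\beta_0$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 500, 2000
beta0, delta = 1.0, 0.8            # sigma_uv = 0.8, sigma_v^2 = 1 -> delta = 0.8

b_iv = np.empty(reps)
for r in range(reps):
    z = rng.normal(size=n)                 # instrument, but Pi = 0
    v = rng.normal(size=n)
    u = delta * v + np.sqrt(1 - delta**2) * rng.normal(size=n)
    Y = v                                  # first stage carries no signal
    y = beta0 * Y + u
    b_iv[r] = (z @ y) / (z @ Y)            # just-identified IV

print(round(np.median(b_iv), 2))           # near beta0 + delta, not beta0
```

The median of the simulated estimates lands near 1.8 (= $\beta_0 + \delta$) rather than the true value 1.0, and the sampling distribution has the heavy tails discussed above.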

Write $z_u = \delta z_v + \eta$ with $\delta = \sigma_{uv}/\sigma_v^2$, so that $\frac{z_u}{z_v} = \delta + \frac{\eta}{z_v}$ and $\frac{\eta}{z_v}\Big|z_v \sim N\left(0, \frac{\sigma_\eta^2}{z_v^2}\right)$. The distribution of $\hat{\beta}_{IV} - \beta_0$ is a mixture of normals:

$\hat{\beta}_{IV} - \beta_0 - \delta \xrightarrow{d} \int N\!\left(0, \sigma_\eta^2/z_v^2\right) f_{z_v}(z_v)\, dz_v,$

with heavy tails.

$\hat{\beta}_{IV}$ is centered at $\beta_0 + \delta$, which is the plim of OLS, and IV-estimation does not work at all:

$\operatorname{plim}_{n\to\infty}\hat{\beta}_{OLS} - \beta_0 = \operatorname{plim}_{n\to\infty}\frac{n^{-1}v'u}{n^{-1}v'v} = \frac{\sigma_{uv}}{\sigma_v^2} = \delta.$

To measure instrument relevance we define the concentration parameter ($J \geq 1$):

$\mu^2 = \Pi'Z'Z\Pi/\sigma_v^2.$

Consider the F-test for $H_0: \Pi = 0$ and define the infeasible counterpart $\tilde{F}$ with $\sigma_v^2$ known. In general, $J\tilde{F}$ is distributed as non-central $\chi^2$ with $J$ degrees of freedom and non-centrality parameter $\mu^2$.

One can show: $E[\tilde{F}] = \mu^2/J + 1$. In large samples: $E[F] \approx \mu^2/J + 1$.

$F - 1$ can be thought of as an estimator of $\mu^2/J$.
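The approximation $E[F] \approx \mu^2/J + 1$ can be checked numerically. A minimal Python sketch (the simulated first stage and all settings are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
n, J, reps = 200, 2, 3000
Z = rng.normal(size=(n, J))      # instruments held fixed across replications
Pi = np.array([0.1, 0.1])
mu2 = Pi @ Z.T @ Z @ Pi          # concentration parameter (sigma_v = 1)

F = np.empty(reps)
for r in range(reps):
    v = rng.normal(size=n)
    Y = Z @ Pi + v
    b = np.linalg.solve(Z.T @ Z, Z.T @ Y)                # first-stage OLS
    e = Y - Z @ b
    F[r] = (b @ Z.T @ Z @ b / J) / (e @ e / (n - J))     # F for H0: Pi = 0

print(round(mu2 / J + 1, 2), round(F.mean(), 2))         # should be close
```

Averaging $F$ over many replications reproduces $\mu^2/J + 1$ up to simulation noise, which motivates reading $F - 1$ as an estimator of $\mu^2/J$.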

Suppose that $J = 1$, $Z$ is fixed, and $u$ and $v$ are normally distributed. The sample size enters $\hat{\beta}_{IV}$ only through the concentration parameter $\mu^2 = \Pi'Z'Z\Pi/\sigma_v^2$ (Rothenberg, 1984):

$\hat{\beta}_{IV} = \frac{Y'P_Z y}{Y'P_Z Y} = \frac{(\Pi'Z' + v')\,P_Z\big(\underbrace{(Z\Pi + v)}_{Y}\beta + u\big)}{\Pi'Z'P_Z Z\Pi + 2\Pi'Z'v + v'P_Z v}$

$= \beta + \frac{\Pi'Z'P_Z u + v'P_Z u}{\Pi'Z'P_Z Z\Pi + 2\Pi'Z'v + v'P_Z v} = \beta + \frac{\Pi'Z'u + v'P_Z u}{\mu^2\sigma_v^2 + 2\Pi'Z'v + v'P_Z v}$

Remember $\mu^2 = \Pi'Z'P_Z Z\Pi/\sigma_v^2$ and $Z'P_Z = Z'Z(Z'Z)^{-1}Z' = Z'$, so that

$\hat{\beta}_{IV} - \beta = \frac{\Pi'Z'u + v'P_Z u}{\mu^2\sigma_v^2 + 2\Pi'Z'v + v'P_Z v}.$

Define

$z_u = \frac{\Pi'Z'u}{\sigma_u(\Pi'Z'Z\Pi)^{1/2}}$ ...standard normal
$z_v = \frac{\Pi'Z'v}{\sigma_v(\Pi'Z'Z\Pi)^{1/2}}$ ...standard normal
$S_{uv} = \frac{v'P_Z u}{\sigma_u\sigma_v}$ ...quadratic form with respect to $P_Z$
$S_{vv} = \frac{v'P_Z v}{\sigma_v^2}$ ...quadratic form with respect to $P_Z$

Dividing numerator and denominator by $\mu^2\sigma_v^2$ then yields

$\mu\left(\hat{\beta}_{IV} - \beta\right) = \frac{\sigma_u}{\sigma_v}\cdot\frac{z_u + S_{uv}/\mu}{1 + 2z_v/\mu + S_{vv}/\mu^2}.$

$z_u, z_v, S_{uv}, S_{vv}$ do not depend on sample size, so sample size enters the distribution of $\hat{\beta}_{IV}$ only through $\mu$.

- $\mu^2$ plays the role usually played by $n$.
- As $\mu^2 \to \infty$, the usual asymptotic approximation obtains:
  $\mu(\hat{\beta}_{IV} - \beta) \xrightarrow{d} \frac{\sigma_u}{\sigma_v}z_u, \qquad \frac{\sigma_u}{\sigma_v}z_u \sim N(0, \sigma_u^2/\sigma_v^2).$
  Hence, for the normal approximation to be accurate the concentration parameter $\mu^2$ must be large.
- For small values of $\mu^2$ the distribution of $\mu(\hat{\beta}_{IV} - \beta)$ is non-standard, however.

How important are these deviations from normality quantitatively? Nelson and Startz (1990a,b) plot the distribution of the 2SLS t-statistic:

[Figure: dark line = irrelevant instruments; dashed light line = strong instruments; intermediate cases = weak instruments.]

Weak instrument asymptotics:

Let the concentration parameter tend to a constant as the sample size increases, i.e., model $F$ as not increasing with the sample size. This is accomplished by setting $\Pi = C/\sqrt{n}$. This is the Pitman drift for obtaining the local power function of the first-stage $F$.

Under this assumption, $F$ converges in distribution to $\chi^2\left(J, \mu_J^2\right)/J$ with non-centrality parameter $\mu_J^2$ (the limit of the concentration parameter), so $F = O_p(1)$.

Detection of weak instruments:

Define instruments to be weak if $\mu^2/J$ is small, and compare $F$ to a proper cut-off that ensures $\mu^2/J$ is larger than the cut-off, so that the bias remains small. The first-stage F-test for $\Pi = 0$ is inadequate. Instead, use $F > 10$ as a rule of thumb (Staiger and Stock, 1997).

Stock and Yogo (2005): Let $\mu^2_{10\%\text{-bias}}$ be the value of $\mu^2$ such that for $\mu^2 < \mu^2_{10\%\text{-bias}}$ the maximum bias of 2SLS is no more than 10% of the bias of OLS. If $F > \ell_{10}$, conclude that the instruments are strong; otherwise conclude that they are weak.

$\ell_{10}$ is chosen so that under the $H_0$ of weak instruments $P(F > \ell_{10};\ \mu^2 = \mu^2_{10\%\text{-bias}}) = 0.05$.

Cragg and Donald (1993) W-statistic for multiple included regressors:

$g_{\min} = \min \operatorname{eval}(G_T), \qquad G_T = \hat{\Sigma}_{VV}^{-1/2\prime}\, Y'P_Z Y\, \hat{\Sigma}_{VV}^{-1/2}/J$

$G_T$ is essentially a matrix analog of the first-stage F statistic, $\operatorname{eval}$ means eigenvalue, and $\hat{\Sigma}_{VV} = Y'(I - P_Z)Y/(n - J)$.

Under weak-instrument asymptotics, one can show that $E[G_T] \to \mu^2/J + 1$. Stock and Yogo (2005, Table 5.1) give critical values.

This test requires an assumption of iid errors. This is potentially a serious problem: if the test statistic is large simply because the disturbances are not iid, one commits a Type I error and incorrectly concludes that the model is adequately identified.

If in ivreg2 the robust, cluster or bw option is specified, the reported weak-instruments test statistic is a Wald F statistic based on the Kleibergen-Paap (2006) rk statistic as the robust analog of the Cragg-Donald statistic.

The authors of ivreg2 (Baum, Schaffer and Stillman, 2003, 2007) suggest that "when using the rk statistic to test for weak identification, users either apply with caution the critical values compiled by Stock and Yogo (2005) for the iid case, or refer to the older 'rule of thumb' of Staiger and Stock (1997) that the F-statistic should be at least 10 for weak identification not to be considered a problem."

Hahn and Hausman (2003) test the null of strong instruments. Unfortunately, this test is not consistent against weak instruments, as the power of a 5%-test depends on parameters and is typically around 15-20%.

Shea's (1997) partial $R^2$ is problematic. What needs to be large is the concentration parameter: an $R^2$ of 0.1 is small if $n = 50$, but large if $n = 5000$.

Anderson (1951) canonical correlations test: Denote the minimum eigenvalue of the canonical correlations as CCEV. The smallest canonical correlation between the $L$ endogenous regressors and the $J$ excluded instruments (after partialling out the $K$ exogenous regressors) is $\sqrt{CCEV}$, and the Anderson LM test statistic is $n \cdot CCEV$, i.e., $n$ times the square of the smallest canonical correlation.

Schaffer (2010) provides the ado-file xtivreg2 with tests for weak instruments for the fixed effects and first-differenced panel models.

Example 4.1 Angrist and Krueger (QJE, 1991) and Bound, Jaeger and Baker (JASA, 1995): returns to schooling in the US (US Census, 1980) for men born 1930-1939.

Angrist and Krueger argue that the quarter of birth is a good instrument for schooling (EDUC), as birth in the last quarter implies that schooling is a year shorter. These quarter-of-birth dummies are interacted with dummies for year of birth and state of birth. Sample size is 329,509.

4.2 IV-Estimation in Static Panel Models

Consider the first equation of a structural panel model:

$y_1 = Z_1\delta_1 + u_1$
$Z_1 = [Y_1, X_1]$
$\delta_1 = [\gamma_1', \beta_1']'$
$u_1 = Z_\mu\mu_1 + \nu_1$

$Y_1$ is the $(NT \times G_1)$ matrix of endogenous variables (here $G_1 = 1$); $X_1$ is the $(NT \times K_1)$ matrix of exogenous variables.

This equation is identified if the number of variables excluded from the equation, $X_2$ ($NT \times K_2$), is at least as large as the number of endogenous ones: $K_2 \geq G_1$.

One-way random effects error structure:

$E\left[\begin{pmatrix}\mu_1\\\nu_1\end{pmatrix}\left(\mu_1', \nu_1'\right)\right] = \begin{pmatrix}\sigma_{\mu_{11}}^2 I_N & 0\\ 0 & \sigma_{\nu_{11}}^2 I_{NT}\end{pmatrix}$

and

$E[u_1u_1'] = \Omega_{11} = \sigma_{\nu_{11}}^2 I_{NT} + \sigma_{\mu_{11}}^2(I_N \otimes J_T)$

Within 2SLS: the within transformation using $Q$ yields:

$Qy_1 = QZ_1\delta_1 + Qu_1 = QZ_1\delta_1 + Q\nu_1$
$\tilde{y}_1 = \tilde{Z}_1\delta_1 + \tilde{\nu}_1$

2SLS uses within-transformed instruments based on $X = [X_1, X_2]$ with $\tilde{X} = QX$ and $P_{\tilde{X}} = \tilde{X}(\tilde{X}'\tilde{X})^{-1}\tilde{X}'$. This gives the within-2SLS estimator:

$\tilde{\delta}_{1,W2SLS} = \left(\tilde{Z}_1'P_{\tilde{X}}\tilde{Z}_1\right)^{-1}\tilde{Z}_1'P_{\tilde{X}}\tilde{y}_1$
$\operatorname{Var}(\tilde{\delta}_{1,W2SLS}) = \sigma_{\nu_{11}}^2\left(\tilde{Z}_1'P_{\tilde{X}}\tilde{Z}_1\right)^{-1}$

Remark: In Stata one can use xtivreg with option fe, or the ado-file xtivreg2. It is essential that the instruments are transformed by the same $Q$ used to transform the model.
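A minimal numerical sketch of within-2SLS (Python/numpy; the DGP, with one endogenous regressor and one excluded instrument, is an illustrative assumption): both the model variables and the instrument are transformed by the same $Q$ before 2SLS is applied.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 300, 5
n = N * T
ids = np.repeat(np.arange(N), T)

def Q(a):                                   # within transform: subtract unit means
    return a - (np.bincount(ids, weights=a) / T)[ids]

# DGP (illustrative): endogenous regressor y2, excluded exogenous instrument x2
mu = rng.normal(size=N)[ids]                # unit effects
x2 = rng.normal(size=n)                     # excluded instrument
nu = rng.normal(size=n)                     # idiosyncratic error
y2 = 1.0 * x2 + 0.5 * mu + 0.5 * nu + rng.normal(size=n)   # endogenous via mu, nu
y1 = 2.0 * y2 + mu + nu                     # structural equation, delta = 2

# within-2SLS: transform model AND instrument with the same Q
zt, xt, yt = Q(y2), Q(x2), Q(y1)
z_hat = xt * (xt @ zt) / (xt @ xt)          # P_Xtilde applied to Z-tilde
delta = (z_hat @ yt) / (z_hat @ zt)
print(round(delta, 2))
```

The estimate recovers the structural coefficient 2 even though OLS on the untransformed data would be biased by both the unit effects and the endogeneity through $\nu$.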

Between 2SLS: Transformation of the model using $P$ yields:

$Py_1 = PZ_1\delta_1 + Pu_1$
$\bar{y}_1 = \bar{Z}_1\delta_1 + \bar{u}_1$

2SLS using instruments $\bar{X} = PX$ and $P_{\bar{X}} = \bar{X}(\bar{X}'\bar{X})^{-1}\bar{X}'$ results in the between-2SLS estimator:

$\bar{\delta}_{1,B2SLS} = \left(\bar{Z}_1'P_{\bar{X}}\bar{Z}_1\right)^{-1}\bar{Z}_1'P_{\bar{X}}\bar{y}_1$
$\operatorname{Var}(\bar{\delta}_{1,B2SLS}) = \sigma_{1_{11}}^2\left(\bar{Z}_1'P_{\bar{X}}\bar{Z}_1\right)^{-1}$
$\sigma_{1_{11}}^2 = T\sigma_{\mu_{11}}^2 + \sigma_{\nu_{11}}^2$

In Stata one can use xtivreg with option be, or the ado-file xtivreg2.

Error component two-stage least squares (EC2SLS): This estimator stacks the within- and the between-transformed observations:

$\begin{pmatrix}\tilde{X}'\tilde{y}_1\\ \bar{X}'\bar{y}_1\end{pmatrix} = \begin{pmatrix}\tilde{X}'\tilde{Z}_1\\ \bar{X}'\bar{Z}_1\end{pmatrix}\delta_1 + \begin{pmatrix}\tilde{X}'\tilde{u}_1\\ \bar{X}'\bar{u}_1\end{pmatrix}$

$E\begin{pmatrix}\tilde{X}'\tilde{u}_1\\ \bar{X}'\bar{u}_1\end{pmatrix} = 0$

$E\left[\begin{pmatrix}\tilde{X}'\tilde{u}_1\\ \bar{X}'\bar{u}_1\end{pmatrix}\left(\tilde{u}_1'\tilde{X},\ \bar{u}_1'\bar{X}\right)\right] = \begin{pmatrix}\tilde{X}'\tilde{u}_1\tilde{u}_1'\tilde{X} & \tilde{X}'\tilde{u}_1\bar{u}_1'\bar{X}\\ \bar{X}'\bar{u}_1\tilde{u}_1'\tilde{X} & \bar{X}'\bar{u}_1\bar{u}_1'\bar{X}\end{pmatrix} = \begin{pmatrix}\sigma_{\nu_{11}}^2\tilde{X}'\tilde{X} & 0\\ 0 & \sigma_{1_{11}}^2\bar{X}'\bar{X}\end{pmatrix}$

Note: $E(\bar{u}_1\tilde{u}_1') = PE(u_1u_1')Q = P(\sigma_{1_{11}}^2 P + \sigma_{\nu_{11}}^2 Q)Q = 0$ since $PQ = 0$.

Applying GLS to the stacked model (Baltagi, 1981):

$\hat{\delta}_{1,EC2SLS} = \left[\left(\tilde{Z}_1'\tilde{X},\ \bar{Z}_1'\bar{X}\right)\begin{pmatrix}\left(\sigma_{\nu_{11}}^2\tilde{X}'\tilde{X}\right)^{-1} & 0\\ 0 & \left(\sigma_{1_{11}}^2\bar{X}'\bar{X}\right)^{-1}\end{pmatrix}\begin{pmatrix}\tilde{X}'\tilde{Z}_1\\ \bar{X}'\bar{Z}_1\end{pmatrix}\right]^{-1}$
$\qquad\times\left(\tilde{Z}_1'\tilde{X},\ \bar{Z}_1'\bar{X}\right)\begin{pmatrix}\left(\sigma_{\nu_{11}}^2\tilde{X}'\tilde{X}\right)^{-1} & 0\\ 0 & \left(\sigma_{1_{11}}^2\bar{X}'\bar{X}\right)^{-1}\end{pmatrix}\begin{pmatrix}\tilde{X}'\tilde{y}_1\\ \bar{X}'\bar{y}_1\end{pmatrix}$

$= \left[\sigma_{\nu_{11}}^{-2}\tilde{Z}_1'P_{\tilde{X}}\tilde{Z}_1 + \sigma_{1_{11}}^{-2}\bar{Z}_1'P_{\bar{X}}\bar{Z}_1\right]^{-1}\left(\sigma_{\nu_{11}}^{-2}\tilde{Z}_1'P_{\tilde{X}}\tilde{y}_1 + \sigma_{1_{11}}^{-2}\bar{Z}_1'P_{\bar{X}}\bar{y}_1\right).$

Remark:

$\operatorname{Var}(\hat{\delta}_{1,EC2SLS}) = \left[\sigma_{\nu_{11}}^{-2}\tilde{Z}_1'P_{\tilde{X}}\tilde{Z}_1 + \sigma_{1_{11}}^{-2}\bar{Z}_1'P_{\bar{X}}\bar{Z}_1\right]^{-1}$

This estimator is a weighted average of $\tilde{\delta}_{1,W2SLS}$ and $\bar{\delta}_{1,B2SLS}$, i.e.

$\hat{\delta}_{1,EC2SLS} = W_1\tilde{\delta}_{1,W2SLS} + W_2\bar{\delta}_{1,B2SLS}$
$W_1 = \left[\sigma_{\nu_{11}}^{-2}\tilde{Z}_1'P_{\tilde{X}}\tilde{Z}_1 + \sigma_{1_{11}}^{-2}\bar{Z}_1'P_{\bar{X}}\bar{Z}_1\right]^{-1}\left(\sigma_{\nu_{11}}^{-2}\tilde{Z}_1'P_{\tilde{X}}\tilde{Z}_1\right)$
$W_2 = \left[\sigma_{\nu_{11}}^{-2}\tilde{Z}_1'P_{\tilde{X}}\tilde{Z}_1 + \sigma_{1_{11}}^{-2}\bar{Z}_1'P_{\bar{X}}\bar{Z}_1\right]^{-1}\left(\sigma_{1_{11}}^{-2}\bar{Z}_1'P_{\bar{X}}\bar{Z}_1\right)$

Consistent estimates of $\sigma_{\nu_{11}}^2$ and $\sigma_{1_{11}}^2$ can be obtained from the within- and between-residuals:

$\hat{\sigma}_{\nu_{11}}^2 = \left(\tilde{y}_1 - \tilde{Z}_1\tilde{\delta}_{1,W2SLS}\right)'Q\left(\tilde{y}_1 - \tilde{Z}_1\tilde{\delta}_{1,W2SLS}\right)/N(T-1)$
$\hat{\sigma}_{1_{11}}^2 = \left(\bar{y}_1 - \bar{Z}_1\bar{\delta}_{1,B2SLS}\right)'P\left(\bar{y}_1 - \bar{Z}_1\bar{\delta}_{1,B2SLS}\right)/N$

Feasible EC2SLS is based on the estimates $\hat{\sigma}_{\nu_{11}}^2$ and $\hat{\sigma}_{1_{11}}^2$.

Check: $\hat{\sigma}_{\mu_{11}}^2 = (\hat{\sigma}_{1_{11}}^2 - \hat{\sigma}_{\nu_{11}}^2)/T > 0$?

Balestra and Varadharajan-Krishnakumar (1987), Cornwell, Schmidt and Wyhowski (1992), White (1986): The optimal set of instruments is based on

$X^* = \Omega_{11}^{-1/2}X = \left[\sigma_{\nu_{11}}^{-1}Q + \sigma_{1_{11}}^{-1}P\right]X = \sigma_{\nu_{11}}^{-1}\tilde{X} + \sigma_{1_{11}}^{-1}\bar{X}.$

Denote $Z_1^* = \Omega_{11}^{-1/2}Z_1$ and $y_1^* = \Omega_{11}^{-1/2}y_1$; then

$\hat{\delta}_{1,G2SLS} = \left(Z_1^{*\prime}P_{X^*}Z_1^*\right)^{-1}Z_1^{*\prime}P_{X^*}y_1^*$

Defining the set of instruments as $A = [\tilde{X}, \bar{X}]$ and applying 2SLS to the GLS-transformed model $\Omega_{11}^{-1/2}y_1 = \Omega_{11}^{-1/2}Z_1\delta_1 + \Omega_{11}^{-1/2}u_1$ yields Baltagi's EC2SLS, since $P_A = P_{\tilde{X}} + P_{\bar{X}}$ and

$\left(P_{\tilde{X}} + P_{\bar{X}}\right)\Omega_{11}^{-1/2}Z_1 = \left(P_{\tilde{X}} + P_{\bar{X}}\right)\left(\sigma_{\nu_{11}}^{-1}Q + \sigma_{1_{11}}^{-1}P\right)Z_1 = \sigma_{\nu_{11}}^{-1}P_{\tilde{X}}\tilde{Z}_1 + \sigma_{1_{11}}^{-1}P_{\bar{X}}\bar{Z}_1$

Remember $Z_1^* = \Omega_{11}^{-1/2}Z_1$ and $y_1^* = \Omega_{11}^{-1/2}y_1$:

$Z_1^{*\prime}P_A Z_1^* = \sigma_{\nu_{11}}^{-2}\tilde{Z}_1'P_{\tilde{X}}\tilde{Z}_1 + \sigma_{1_{11}}^{-2}\bar{Z}_1'P_{\bar{X}}\bar{Z}_1$
$Z_1^{*\prime}P_A y_1^* = \sigma_{\nu_{11}}^{-2}\tilde{Z}_1'P_{\tilde{X}}\tilde{y}_1 + \sigma_{1_{11}}^{-2}\bar{Z}_1'P_{\bar{X}}\bar{y}_1$

$\hat{\delta}_{1,EC2SLS}$ is numerically identical to $\hat{\delta}_{1,G2SLS} = \left(Z_1^{*\prime}P_A Z_1^*\right)^{-1}Z_1^{*\prime}P_A y_1^*$ with instruments $A = [\tilde{X}, \bar{X}]$.

Implementing EC2SLS:

1) Estimate $\tilde{\delta}_{1,W2SLS}$ and $\bar{\delta}_{1,B2SLS}$ using the instruments $\tilde{X}$ and $\bar{X}$.

2) Calculate $\hat{\sigma}_{\nu_{11}}^2$ and $\hat{\sigma}_{1_{11}}^2$ from the within- and between-residuals.

3) Run 2SLS on the GLS-transformed ($\hat{\sigma}_{\nu_{11}}\hat{\Omega}_{11}^{-1/2}$) variables using the instruments $[\tilde{X}, \bar{X}]$, or use "xtivreg ...., re ec2sls" in Stata.

Example 4.2 Badi H. Baltagi, "Estimating an Economic Model of Crime Using Panel Data from North Carolina", Journal of Applied Econometrics, Vol. 21, No. 4, 2006, pp. 543-547. This is a replication of the article Cornwell, C. and W. N. Trumbull, "Estimating the Economic Model of Crime with Panel Data", Review of Economics and Statistics 76, 1994, pp. 360-366. It uses a panel of 90 US counties over 7 years, 1981-1987, which comprises 630 observations.

All variables are in logs.
Dependent variable: log crimes committed per person
Endogenous variables: log probability of arrest, log police per capita
Instruments: log tax revenue per capita, log offense mix (face-to-face/other)

4.3 The Mundlak Model and the Hausman-Taylor Estimator

Mundlak (1978) proposes an alternative formulation of the one-way random effects model that includes time-invariant unit averages of the explanatory variables:

$y_{it} = \alpha + x_{it}'\beta + \mu_i + \nu_{it}, \quad i = 1, \dots, N,\ t = 1, \dots, T$
$\mu_i = \bar{x}_{i.}'\pi + \varepsilon_i \quad \text{with } \varepsilon_i \sim i.i.d.(0, \sigma_\varepsilon^2),$

i.e., Mundlak assumes that the unit effects depend on all the unit averages of the explanatory variables ($\bar{x}_{i.}$). In addition, he assumes independence of $\varepsilon_i$ and $\nu_{it}$.

Clearly, under $\pi = 0$ the unit effects $\mu_i$ and the explanatory variables are uncorrelated and the random effects assumption holds.

In vector form:

$\mu = \frac{1}{T}Z_\mu'X\pi + \varepsilon$

$y = X\beta + Z_\mu\mu + \nu = X\beta + Z_\mu Z_\mu'X\pi/T + Z_\mu\varepsilon + \nu = X\beta + PX\pi + Z_\mu\varepsilon + \nu,$

since

$P = Z_\mu(Z_\mu'Z_\mu)^{-1}Z_\mu' = \frac{1}{T}Z_\mu Z_\mu' = I_N \otimes \bar{J}_T$

$E\left[(Z_\mu\varepsilon + \nu)(Z_\mu\varepsilon + \nu)'\right] = \sigma_\varepsilon^2(I_N \otimes J_T) + \sigma_\nu^2 I_{NT}$

Using the formulas for the partitioned inverse, GLS on the Mundlak model yields

$\hat{\beta}_{GLS} = \tilde{\beta}_{within} = (X'QX)^{-1}X'Qy$
$\hat{\pi}_{GLS} = \bar{\beta}_{between} - \tilde{\beta}_{within} = (X'PX)^{-1}X'Py - (X'QX)^{-1}X'Qy$
$\operatorname{Var}(\hat{\pi}_{GLS}) = (T\sigma_\varepsilon^2 + \sigma_\nu^2)(X'PX)^{-1} + \sigma_\nu^2(X'QX)^{-1}.$

One can easily show that the test $\pi = 0$ vs. $\pi \neq 0$ is equivalent to the Hausman test of the random vs. fixed effects model. A Wald test of $\pi = 0$ can be made robust with respect to heteroskedasticity.
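The equivalence of the Mundlak regression's coefficient on $x$ with the within estimator can be verified numerically. The sketch below uses OLS on $[1, x, \bar{x}_{i.}]$ (the slides state the result for GLS; by the Frisch-Waugh-Lovell logic the coefficient on $x$ coincides with the within estimator in either case). All simulated values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
N, T = 200, 4
ids = np.repeat(np.arange(N), T)
x = rng.normal(size=N * T) + rng.normal(size=N)[ids]
xbar = (np.bincount(ids, weights=x) / T)[ids]        # unit averages of x (= Px)
mu = 1.5 * xbar + 0.5 * rng.normal(size=N)[ids]      # unit effect correlated with x
y = 0.5 + 2.0 * x + mu + rng.normal(size=N * T)

# OLS on the Mundlak regression y ~ 1 + x + xbar
X = np.column_stack([np.ones(N * T), x, xbar])
b_mundlak = np.linalg.lstsq(X, y, rcond=None)[0]

# within (fixed effects) estimator of beta
xd = x - xbar
yd = y - (np.bincount(ids, weights=y) / T)[ids]
b_within = (xd @ yd) / (xd @ xd)

print(round(b_mundlak[1], 6), round(b_within, 6))    # identical up to rounding
```

Adding the unit means soaks up exactly the between variation, so the coefficient on $x$ is driven by within variation only, even though the unit effects are correlated with $x$.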

Hausman and Taylor (1981): Only a subset of the explanatory variables is correlated with the unit effects. More importantly, this estimator allows time-invariant variables to be included, as in the random effects model:

$y_{it} = \alpha + x_{it}'\beta + z_i'\gamma + \mu_i + \nu_{it}, \quad i = 1, \dots, N,\ t = 1, \dots, T.$

$X = [X_1, X_2]$:
$X_1$, $(NT \times K_1)$: doubly exogenous variables (uncorrelated with both $\mu$ and $\nu$)
$X_2$, $(NT \times K_2)$: singly exogenous variables (correlated with $\mu$ but not with $\nu$)

$Z = [Z_1, Z_2]$:
$Z_1$, $(NT \times G_1)$: doubly exogenous time-invariant variables (uncorrelated with both $\mu$ and $\nu$)
$Z_2$, $(NT \times G_2)$: singly exogenous time-invariant variables (correlated with $\mu$ but not with $\nu$).

Hausman and Taylor (1981) propose the following GLS-IV estimator (in Stata: xthtaylor):

1. The fixed effects estimator is consistent and provides consistent residuals $\hat{d}_{it} = y_{it} - x_{it}'\tilde{\beta}_{within}$. Averaging them for each unit over time yields the $(T \times 1)$ vectors $\hat{d}_i = \bar{y}_{i.} - \bar{x}_{i.}'\tilde{\beta}_{within}$, which can be stacked in an $(NT \times 1)$ vector $\hat{d}$.

2. 2SLS using the instrument set of doubly exogenous variables $A = [X_1, Z_1]$ and projection matrix $P_A = A(A'A)^{-1}A'$ (i.e., we must have $K_1 \geq G_2$). This yields a consistent estimator of $\gamma$:

$\hat{\gamma}_{2SLS} = (Z'P_A Z)^{-1}Z'P_A\hat{d}.$

3. Estimation of the variances:

$\tilde{\sigma}_\nu^2 = \tilde{y}'\left(I_{NT} - P_{\tilde{X}}\right)\tilde{y}/N(T-1)$ ...within residuals
$\tilde{\sigma}_1^2 = \frac{1}{N}\left(y - X\tilde{\beta}_{within} - Z\hat{\gamma}_{2SLS}\right)'P\left(y - X\tilde{\beta}_{within} - Z\hat{\gamma}_{2SLS}\right)$

4. IV-GLS with the GLS transformation $\tilde{\sigma}_\nu\tilde{\Omega}^{-1/2}$ using the instruments $A_{HT} = [QX_1, QX_2, PX_1, Z_1]$.

If $K_1 < G_2$ the model is not identified: $\hat{\beta}_{HT} = \tilde{\beta}_{within}$, while $\hat{\gamma}_{HT}$ remains unidentified.

If $K_1 = G_2$ the model is just identified: $\hat{\beta}_{HT} = \tilde{\beta}_{within}$ and $\hat{\gamma}_{HT} = \hat{\gamma}_{2SLS}$.

If $K_1 > G_2$ the model is overidentified: $\hat{\beta}_{HT}$ is more efficient than the within estimator $\tilde{\beta}_{within}$.

Testing the overidentification restrictions with $K_1 - G_2 > 0$:

$\hat{m} = \left(\hat{\beta}_{HT} - \tilde{\beta}_{within}\right)'\left[\operatorname{est.Var}(\tilde{\beta}_{within}) - \operatorname{est.Var}(\hat{\beta}_{HT})\right]^{-1}\left(\hat{\beta}_{HT} - \tilde{\beta}_{within}\right)$

Under $H_0$, $\hat{m} \xrightarrow{d} \chi^2(l)$ with $l = \min(K_1 - G_2, NT - K)$.

Amemiya and MaCurdy (1986) propose the set of instruments $A_{AM} = [QX_1, QX_2, X_1^*, Z_1]$ with

$X_1^* = (X_1^0 \otimes \iota_T)$ and $X_1^0 = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1T}\\ \vdots & \vdots & & \vdots\\ x_{N1} & x_{N2} & \cdots & x_{NT}\end{bmatrix}.$

Here, we have stronger exogeneity assumptions. The instruments are valid if $\operatorname{plim}_{N\to\infty}\left(\frac{1}{N}\sum_{i=1}^{N} x_{1it}\mu_i\right) = 0$, $t = 1, \dots, T$, while Hausman and Taylor (1981) only require $\operatorname{plim}_{N\to\infty}\left(\frac{1}{N}\sum_{i=1}^{N}\bar{x}_{1i.}\mu_i\right) = 0$.

Breusch, Mizon and Schmidt (1989) suggest an even more efficient set of instruments: $A_{BMS} = [QX_1, QX_2, (QX_1)^*, (QX_2)^*, Z_1]$.

Example 4.3 The Panel Study of Income Dynamics, taken from Cornwell

and Rupert (1988, Journal of Applied Econometrics 3, pp. 149-155). 595

individuals over 7 years, 1976-1982.


5 Lecture 4: Dynamic Linear Panel Models

Dynamic partial adjustment models have been widely used in time series econometrics, but they are also popular in panel econometrics (e.g., company investment, labor demand, growth in GDP per capita). The dynamic panel model includes the lagged endogenous variable on the right-hand side:

$y_{it} = \gamma y_{i,t-1} + x_{it}'\beta + u_{it}$
$u_{it} = \mu_i + \nu_{it}$
$\mu_i \sim i.i.d.(0, \sigma_\mu^2)$
$\nu_{it} \sim i.i.d.(0, \sigma_\nu^2)$

Problem 1: Both $y_{it}$ and $y_{i,t-1}$ depend on the unit effects $\mu_i$. Hence OLS and GLS yield biased and inconsistent results, since $y_{i,t-1}$ is endogenous.

Problem 2: The within transformation eliminates the unit effects. However, $y_{i,t-1} - \frac{1}{T-1}\sum_{t=2}^{T} y_{i,t-1} = y_{i,t-1} - \bar{y}_{i.,-1}$ is correlated with $\nu_{it} - \bar{\nu}_{i.}$ by construction, even if $\nu_{it}$ is not serially correlated.

Note that $\nu_{it}, \nu_{i,t-1}, \dots$ are included in $\bar{\nu}_{i.}$, and in short panels their weight in $\bar{\nu}_{i.}$ cannot be neglected. Obviously, $\nu_{i,t-1}$ is correlated with $y_{i,t-1}$ and thus with $\bar{y}_{i.,-1}$; but $\nu_{it}$ is also correlated with $\bar{y}_{i.,-1}$, because $\bar{y}_{i.,-1}$ contains $y_{it}$.

Nickell (1981) and Ridder and Wansbeek (1990): The bias of the within estimator is of order $1/T$ and only disappears as $T \to \infty$.

Anderson and Hsiao (1981) propose to transform the panel model into first differences and to use instruments for $y_{i,t-1} - y_{i,t-2}$:

$y_{it} - y_{i,t-1} = \gamma\left(y_{i,t-1} - y_{i,t-2}\right) + \left(x_{it}' - x_{i,t-1}'\right)\beta + \nu_{it} - \nu_{i,t-1}$

$y_{i,t-2} - y_{i,t-3}$ or $y_{i,t-2}$ are valid instruments: they are correlated with $y_{i,t-1} - y_{i,t-2}$, but not with $\nu_{it} - \nu_{i,t-1}$, provided $\nu_{it}$ is not serially correlated.

Arellano (1989): IV estimators with instrument $y_{i,t-2} - y_{i,t-3}$ have singularity points and thus can exhibit large variance; $y_{i,t-2}$ is the better instrument.
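A compact simulation of the Anderson-Hsiao estimator with the level instrument $y_{i,t-2}$ (Python sketch; the AR(1) data-generating process and parameter values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
N, T, gamma = 500, 6, 0.5

# simulate an AR(1) panel with unit effects
mu = rng.normal(size=N)
y = np.zeros((N, T))
y[:, 0] = mu + rng.normal(size=N)
for t in range(1, T):
    y[:, t] = gamma * y[:, t - 1] + mu + rng.normal(size=N)

# first-differenced equations, instrumented with the level y_{i,t-2}
dy  = (y[:, 2:] - y[:, 1:-1]).ravel()     # dependent: Delta y_t
dy1 = (y[:, 1:-1] - y[:, :-2]).ravel()    # regressor: Delta y_{t-1}
z   = y[:, :-2].ravel()                   # instrument: y_{t-2}

g_iv = (z @ dy) / (z @ dy1)               # just-identified IV in differences
print(round(g_iv, 2))
```

Differencing removes $\mu_i$, and $y_{i,t-2}$ is uncorrelated with $\nu_{it} - \nu_{i,t-1}$ because $\nu$ is serially uncorrelated, so the estimate recovers $\gamma$, whereas OLS in levels or the within estimator would be biased.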

Arellano-Bond (1991) GMM estimator: We start with a simple model without exogenous variables $x_{it}$:

$y_{it} = \gamma y_{i,t-1} + u_{it}$
$u_{it} = \mu_i + \nu_{it}$
$\mu_i \sim i.i.d.(0, \sigma_\mu^2)$
$\nu_{it} \sim i.i.d.(0, \sigma_\nu^2)$

or, rewritten in first differences for $t > 2$:

$y_{it} - y_{i,t-1} = \gamma\left(y_{i,t-1} - y_{i,t-2}\right) + \nu_{it} - \nu_{i,t-1}$
$\Delta y = \gamma\Delta y_{-1} + \Delta\nu$

Note: $\varepsilon_{it} = \nu_{it} - \nu_{i,t-1}$ is an MA(1) process with a unit root.

Valid instruments:

$t = 3$: $y_{i1}$ is the only valid instrument:
$y_{i3} - y_{i2} = \gamma(y_{i2} - y_{i1}) + \nu_{i3} - \nu_{i2}$

$t = 4$: $y_{i1}$ and $y_{i2}$ are valid instruments:
$y_{i4} - y_{i3} = \gamma(y_{i3} - y_{i2}) + \nu_{i4} - \nu_{i3}$

The error term of the differenced model: Let $\Delta\nu_{it} = \nu_{it} - \nu_{i,t-1}$.

$E[\Delta\nu_i\Delta\nu_i'] = E\left[\begin{pmatrix}\nu_{i3} - \nu_{i2}\\ \nu_{i4} - \nu_{i3}\\ \vdots\\ \nu_{iT} - \nu_{i,T-1}\end{pmatrix}\left(\nu_{i3} - \nu_{i2},\ \nu_{i4} - \nu_{i3},\ \dots,\ \nu_{iT} - \nu_{i,T-1}\right)\right]$

$= \sigma_\nu^2\begin{bmatrix}2 & -1 & 0 & \cdots & 0\\ -1 & 2 & -1 & \cdots & 0\\ \vdots & & \ddots & & \vdots\\ 0 & \cdots & -1 & 2 & -1\\ 0 & \cdots & 0 & -1 & 2\end{bmatrix} = \sigma_\nu^2 G,$

$E[\Delta\nu\Delta\nu'] = \sigma_\nu^2(I_N \otimes G)$

Matrix of instruments for unit $i$ ($t = 3, \dots, T$):

$W_i = \begin{bmatrix}[y_{i1}] & 0 & \cdots & 0\\ 0 & [y_{i1}, y_{i2}] & \cdots & 0\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \cdots & [y_{i1}, y_{i2}, \dots, y_{i,T-2}]\end{bmatrix}$

of dimension $(T-2) \times (1 + 2 + \dots + (T-2))$, stacked as $W = [W_1', W_2', \dots, W_N']'$.

Moment conditions:

$E[W_i'\Delta\nu_i] = 0$ or $E[W'\Delta\nu] = 0$

Note: Roodman (2009a) proposes to collapse the matrix of instruments (see the option collapse in xtabond2) to avoid the problem of too many instruments:

$W_i = \begin{bmatrix}[y_{i1}] & 0 & \cdots & 0\\ 0 & [y_{i1}, y_{i2}] & \cdots & 0\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \cdots & [y_{i1}, \dots, y_{i,T-2}]\end{bmatrix} \;\rightarrow\; W_i = \begin{bmatrix}y_{i1} & 0 & 0 & \cdots & 0\\ y_{i2} & y_{i1} & 0 & \cdots & 0\\ y_{i3} & y_{i2} & y_{i1} & \cdots & 0\\ \vdots & \vdots & \vdots & & \vdots\end{bmatrix}$

Arellano and Bond (1991) preliminary GMM one-step estimator: Let $A$ be the weighting matrix for the moments, so that the GMM estimator minimizes

$q_n(\gamma) = \Delta\nu'WA^{-1}W'\Delta\nu = (\Delta y - \gamma\Delta y_{-1})'WA^{-1}W'(\Delta y - \gamma\Delta y_{-1}),$

$\hat{\gamma}_1 = \left[\Delta y_{-1}'WA^{-1}W'\Delta y_{-1}\right]^{-1}\Delta y_{-1}'WA^{-1}W'\Delta y.$

$\operatorname{Var}[\hat{\gamma}_1] = \left[\Delta y_{-1}'WA^{-1}W'\Delta y_{-1}\right]^{-1}\Delta y_{-1}'WA^{-1}W'\,E[\Delta\nu\Delta\nu']\,WA^{-1}W'\Delta y_{-1}\left[\Delta y_{-1}'WA^{-1}W'\Delta y_{-1}\right]^{-1}$

Note: Since $E[\Delta\nu\Delta\nu'] = \sigma_\nu^2(I_N \otimes G)$, we can set $A = W'(I_N \otimes G)W = \sum_{i=1}^{N}W_i'GW_i$ and obtain

$\hat{\gamma}_1 = \left[\Delta y_{-1}'W\left(W'(I_N \otimes G)W\right)^{-1}W'\Delta y_{-1}\right]^{-1}\Delta y_{-1}'W\left(W'(I_N \otimes G)W\right)^{-1}W'\Delta y,$

which is the one-step Arellano-Bond GMM estimator.
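The one-step estimator can be sketched directly from the formulas above (Python/numpy; the simulated AR(1) panel and all settings are illustrative assumptions). $W_i$ is built block-diagonally and $A = \sum_i W_i'GW_i$:

```python
import numpy as np

rng = np.random.default_rng(6)
N, T, gamma = 500, 6, 0.5

# simulate an AR(1) panel with unit effects
mu = rng.normal(size=N)
y = np.zeros((N, T))
y[:, 0] = mu + rng.normal(size=N)
for t in range(1, T):
    y[:, t] = gamma * y[:, t - 1] + mu + rng.normal(size=N)

Tm2 = T - 2                                   # differenced equations t = 3..T
G = 2 * np.eye(Tm2) - np.eye(Tm2, k=1) - np.eye(Tm2, k=-1)
p = Tm2 * (Tm2 + 1) // 2                      # total number of instruments

A = np.zeros((p, p))                          # A = sum_i W_i' G W_i
Wdy = np.zeros(p)                             # sum_i W_i' Delta y_i
Wdy1 = np.zeros(p)                            # sum_i W_i' Delta y_{i,-1}
for i in range(N):
    Wi = np.zeros((Tm2, p))                   # block-diagonal instrument matrix
    col = 0
    for t in range(Tm2):                      # row t uses [y_i1, ..., y_{i,t+1}]
        Wi[t, col:col + t + 1] = y[i, :t + 1]
        col += t + 1
    dyi = y[i, 2:] - y[i, 1:-1]
    dy1i = y[i, 1:-1] - y[i, :-2]
    A += Wi.T @ G @ Wi
    Wdy += Wi.T @ dyi
    Wdy1 += Wi.T @ dy1i

# one-step estimator with the scalar regressor Delta y_{-1}
g_ab1 = (Wdy1 @ np.linalg.solve(A, Wdy)) / (Wdy1 @ np.linalg.solve(A, Wdy1))
print(round(g_ab1, 2))
```

With $T = 6$ this uses $p = 10$ moment conditions; the estimate lands close to the true $\gamma$, and replacing $A$ by $\hat{V}_N$ built from the one-step residuals would give the two-step estimator discussed next.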

Inserting the optimal weights simplifies the variance of $\hat{\gamma}_1$:

$\operatorname{Var}[\hat{\gamma}_1] = \left[\Delta y_{-1}'WA^{-1}W'\Delta y_{-1}\right]^{-1}\Delta y_{-1}'W\underbrace{\left(W'(I_N \otimes G)W\right)}_{A}{}^{-1}\underbrace{W'\sigma_\nu^2(I_N \otimes G)W}_{W'E[\Delta\nu\Delta\nu']W}A^{-1}W'\Delta y_{-1}\left[\Delta y_{-1}'WA^{-1}W'\Delta y_{-1}\right]^{-1}$

$= \sigma_\nu^2\left[\Delta y_{-1}'W\left(W'(I_N \otimes G)W\right)^{-1}W'\Delta y_{-1}\right]^{-1}.$

One can use a less restrictive assumption and base the weighting matrix on

$\hat{V}_N = \sum_{i=1}^{N}W_i'\widehat{\Delta\nu}_i\widehat{\Delta\nu}_i'W_i$

instead of $W'(I_N \otimes G)W$.

Since the one-step estimator is consistent, we can plug in the estimated residuals of the preliminary one-step estimator for $\Delta\nu_i$. This yields the two-step Arellano-Bond estimator:

$\hat{\gamma}_2 = \left[\Delta y_{-1}'W\hat{V}_N^{-1}W'\Delta y_{-1}\right]^{-1}\Delta y_{-1}'W\hat{V}_N^{-1}W'\Delta y$
$\operatorname{est.asy.Var}\left(\hat{\gamma}_2\right) = \left[\Delta y_{-1}'W\hat{V}_N^{-1}W'\Delta y_{-1}\right]^{-1}$

If $\nu_{it} \sim i.i.d.(0, \sigma_\nu^2)$, $\hat{\gamma}_1$ and $\hat{\gamma}_2$ are asymptotically equivalent. Both are asymptotically normal for $N \to \infty$ and fixed $T$.

Adding strictly exogenous variables:

$\Delta y = \gamma\Delta y_{-1} + \Delta X\beta + \Delta\nu,$

where $X$ is the matrix of the $K$ strictly exogenous variables. Then the stacked matrix of instruments is

$W = \begin{bmatrix}W_1, \Delta X_1\\ W_2, \Delta X_2\\ \vdots\\ W_N, \Delta X_N\end{bmatrix}$

See Roodman (2009b) on xtabond2: the option gmm generates $W_i$ and iv adds $\Delta X$.

If $x_{it}$ is predetermined, i.e., $E[x_{it}\nu_{is}] \neq 0$ for $t > s$, we can use the instruments $[x_{i1}', x_{i2}', \dots, x_{i,t-1}']$ for the equation at time $t$, so that $W_i$ is given by

$\begin{bmatrix}[y_{i1}, x_{i1}', x_{i2}'] & 0 & \cdots & 0\\ 0 & [y_{i1}, y_{i2}, x_{i1}', x_{i2}', x_{i3}'] & \cdots & 0\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \cdots & [y_{i1}, \dots, y_{i,T-2}, x_{i1}', \dots, x_{i,T-1}']\end{bmatrix},$

i.e., you put the predetermined variables in the gmm option of xtabond2.

For the instruments to be valid it is important that there is no second-order serial correlation in $\Delta\nu_{it}$ (so that $y_{i,t-2}$ is not correlated with $\Delta\nu_{it}$). Arellano and Bond (1991) propose a test for this hypothesis.

Sargan overidentification test (see Roodman, 2009b):

$S = \frac{1}{\hat{\sigma}_\nu^2}\Delta\hat{\nu}'W\left[\sum_{i=1}^{N}W_i'GW_i\right]^{-1}W'\Delta\hat{\nu} \sim \chi^2(p - K - 1)$

$J = \Delta\hat{\nu}'W\left[\sum_{i=1}^{N}W_i'\Delta\hat{\nu}_i\Delta\hat{\nu}_i'W_i\right]^{-1}W'\Delta\hat{\nu} \sim \chi^2(p - K - 1),$

where $p$ is the number of instruments (columns of $W$).

In case of heteroskedasticity one uses Hansen's J test.

In small samples the two-step GMM estimator exhibits overly optimistic t-tests, as the standard errors of the estimated parameters are heavily underestimated (as demonstrated by Monte Carlo simulations). Windmeijer (2005) shows that the reason is that the weighting matrix is itself estimated and depends on the estimated parameters.

Based on a Taylor-series expansion, he proposes a small-sample correction of the two-step GMM estimator's variance that provides a more accurate approximation in finite samples when the moment conditions are linear.

In Stata this correction is applied whenever the robust option is invoked when calculating the two-step GMM estimator.

Blundell and Bond (1998) show that one can use additional moment conditions to increase the efficiency of the GMM estimator under an additional "mild" stationarity restriction on the initial values (see Baltagi, 2008). Again we consider the simple model without additional explanatory variables:

$E\left[\left(y_{it} - \frac{\mu_i}{1-\gamma}\right)\mu_i\right] = 0 \;\Rightarrow\; E[\Delta y_{i,t-1}\mu_i] = 0, \quad t = 2, \dots, T.$

Note that imposing $E\left[\left(y_{i1} - \frac{\mu_i}{1-\gamma}\right)\mu_i\right] = 0$ on the initial values implies that this condition holds for all $t$. This adds $T - 2$ additional moment conditions in levels:

$E[\Delta y_{i,t-1}(\mu_i + \nu_{it})] = E[\Delta y_{i,t-1}(y_{it} - \gamma y_{i,t-1})] = 0.$

Together with the moment conditions of Arellano and Bond (1991) we get the system GMM estimator, which stacks both types of moment conditions (see also Arellano and Bover, 1995). A big advantage of this estimator is that the model can include time-invariant variables.

Roodman (2009b): "... But the new assumption is not trivial; it is akin to one of stationarity. The Blundell-Bond approach instruments $y_{i,t-1}$ with $\Delta y_{i,t-1}$, which ... contains the fixed effect $\mu_i$ - yet we assume that the levels equation error, $u_{it}$, contains $\mu_i$ too, which makes the proposition that the instrument is orthogonal to the error, that $E(\Delta y_{i,t-1}u_{it}) = 0$, counterintuitive."

Note $\Delta y_{it} = (\gamma - 1)y_{i,t-1} + x_{it}'\beta + u_{it}$ with $\gamma < 1$, and

$E\Big[\big(\underbrace{(\gamma - 1)y_{i,t-1}}_{-\mu_i\ \text{in expectation}} + x_{it}'\beta + \mu_i + \varepsilon_{it}\big)\left(\mu_i + \nu_{it}\right)\Big] = 0$

"The assumption can hold, but only if the data-generating process is such that the fixed effect and the autoregressive process governed by $\gamma$, ..., offset each other in expectation across the whole panel, much like investment and depreciation in a Solow growth model steady state." In short: one has to assume that initial values are random deviations from the steady state.

Kiviet (1995) and Bun and Kiviet (2006) derive a bias-corrected LSDV estimator, which uses the AH or AB estimators as starting values. The bias correction is based on higher-order asymptotic expansions. The proposed estimators are asymptotically normally distributed.

Bruno (2005) provides a Stata ado-file xtlsdvc that implements this estimator. This command is also able to cope with unbalanced panels that exhibit randomly missing data.

Example 5.1 Arellano, Bond (1991): A dynamic panel estimator of labor

demand at the �rm level.

[Figure: four scatter plots of firm-level employment n in the Arellano-Bond data: "The first ten observations" (n vs. year, 1976-1984), "n vs lag n" (n vs. L.n), "diff n vs lag n" (D.n vs. L.n), and "First vs. last obs. of n".]

Further references:

IV-Estimation:

Baum, C. F., M. E. Schaffer, and S. Stillman (2003), Instrumental Variables and GMM: Estimation and Testing, Stata Journal 3(1), pp. 1-31.

Baum, C. F., M. E. Schaffer, and S. Stillman (2007), ivreg2: Stata Module for Extended Instrumental Variables/2SLS, GMM and AC/HAC, LIML, and k-class Regression, Boston College Department of Economics, Statistical Software Components S425401. Downloadable from http://ideas.repec.org/c/boc/bocode/s425401.html.

Kleibergen, F., and R. Paap (2006), Generalized Reduced Rank Tests Using the Singular Value Decomposition, Journal of Econometrics 127(1), pp. 97-126.

Schaffer, M. E. (2010), xtivreg2: Stata Module to Perform Extended IV/2SLS, GMM and AC/HAC, LIML and k-class Regression for Panel Data Models, http://ideas.repec.org/c/boc/bocode/s456501.html.

Stock, J. H., J. H. Wright, and M. Yogo (2002), A Survey of Weak Instruments and Weak Identification in Generalized Method of Moments, Journal of Business and Economic Statistics 20(4), pp. 518-529.

Stock, J. H., and M. Yogo (2005), Testing for Weak Instruments in Linear IV Regression, in Andrews, D. W. K. and J. H. Stock (eds.), Identification and Inference for Econometric Models: Essays in Honor of Thomas Rothenberg, Cambridge: Cambridge University Press, pp. 80-108.

Dynamic Panel Models:

Arellano, M., and S. Bond (1991), Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations, Review of Economic Studies 58(2), pp. 277-297.

Blundell, R., and S. Bond (1998), Initial Conditions and Moment Restrictions in Dynamic Panel Data Models, Journal of Econometrics 87(1), pp. 115-143.

Bruno, G. S. F. (2005), Estimation and Inference in Dynamic Unbalanced Panel-Data Models with a Small Number of Individuals, The Stata Journal 5(4), pp. 473-500.

Bun, M. J. G., and J. F. Kiviet (2006), The Effects of Dynamic Feedbacks on LS and MM Estimator Accuracy in Panel Data Models, Journal of Econometrics 132(2), pp. 409-444.

Kiviet, J. F. (1995), On Bias, Inconsistency, and Efficiency of Various Estimators in Dynamic Panel Data Models, Journal of Econometrics 68(1), pp. 53-78.

Roodman, D. M. (2009a), A Note on the Theme of Too Many Instruments, Oxford Bulletin of Economics and Statistics 71(1), pp. 135-158.

Roodman, D. M. (2009b), How to Do xtabond2: An Introduction to "Difference" and "System" GMM in Stata, The Stata Journal 9(1), pp. 86-136.

Windmeijer, F. (2005), A Finite Sample Correction for the Variance of Linear Efficient Two-Step GMM Estimators, Journal of Econometrics 126(1), pp. 25-51.