
Instrument endogeneity, weak identification, and

inference in IV regressions

Firmin Doko Tchatoka∗

The University of Adelaide

August 18, 2014

∗ School of Economics, The University of Adelaide, 10 Pulteney Street, Adelaide SA 5005, Tel:+6188313 5540, Fax:+618 8223 1460, e-mail: [email protected]

ABSTRACT

We study the possibility of making exact inference in structural models where: (a) instrumental variables (IVs) may be arbitrarily weak, collinear, and invalid; (b) the errors may have non-Gaussian distributions (possibly heavy-tailed and heteroskedastic); and (c) the reduced-form specification may be arbitrarily heterogeneous, nonlinear, unspecified, and incomplete (missing instruments). We provide the necessary and sufficient conditions under which such models are identifiable, despite instrument invalidity. Under these conditions, Wald-type tests and confidence sets (CSs) based on k-class type estimators may apply. However, these conditions rule out models in which IVs are weak and are, moreover, difficult to check in practice. To alleviate these drawbacks, we develop identification-robust procedures to test and build CSs for model coefficients. CSs for individual components of the structural and instrument endogeneity parameters are obtained by projection. Tests of exclusion restrictions and instrument selection are covered as instances of the class of proposed procedures.

Key words: Instrument endogeneity; weak instruments; identification-robust inference;

finite-sample; non-Gaussian errors; projection method; exact Monte Carlo tests.

JEL classification: C12; C13; C36.


1. Introduction

This paper contributes to the literature on weak instruments by developing exact tests and confidence sets (CSs) in IV regressions where: (i) instrumental variables may be arbitrarily weak, collinear, and invalid; (ii) the errors may have non-Gaussian distributions (possibly heavy-tailed and heteroskedastic); and (iii) the reduced-form specification may be heterogeneous, nonlinear, unspecified, and incomplete (omitted instruments).

IV methods usually require the availability of exogenous instruments, at least as many as the number of coefficients to be estimated, whereas the validity of those instruments is not testable. In the last two decades, the so-called “weak instruments” problem has received considerable attention in econometrics. Research on this topic is widespread and most of the studies have imposed the exclusion restrictions.1 Several studies of weak instruments have recently questioned the validity of the strict exogeneity assumption.2 For example, Murray (2006) states: “in most IV applications, the instruments often arrive with a dark cloud of invalidity hanging overhead and researchers usually do not know whether their correlations with the error are exactly zero.” He suggests avoiding invalid instruments in IV procedures. However, as it is difficult to test the validity of all candidate instruments, it might seem that if we want to avoid invalid instruments, there is little hope in trying to use IV methods. Bound et al. (1995, Section 3) provide evidence on how a slight violation of instrument exogeneity can cause severe bias in IV estimates, especially when identification is weak. Hausman and Hahn (2005) show that the IV estimator can have a substantial bias, even in large samples, when the instruments are only slightly correlated with the error. Doko Tchatoka and Dufour (2008) and Guggenberger (2011) show that the Anderson and Rubin (1949) (AR) and Kleibergen (2002) (K) tests are highly sensitive to instrument invalidity.

1 For example, see Phillips (1989), Nelson and Startz (1990a, 1990b), Choi and Phillips (1992), Bekker (1994), Hall, Rudebusch and Wilcox (1996), Dufour (1997, 2003, 2009), Staiger and Stock (1997), Wang and Zivot (1998), Stock and Wright (2000), Donald and Newey (2001), Dufour and Jasiak (2001), Kleibergen (2002, 2004, 2005), Moreira (2003), Stock, Wright and Yogo (2002), Hall and Peixe (2003), Stock and Yogo (2005), Dufour and Taamouti (2005, 2007), Swanson and Chao (2005), Andrews and Stock (2007a, 2007b), Guggenberger and Smith (2005), Andrews, Moreira and Stock (2006), Dufour and Hsiao (2008), Hansen, Hausman and Newey (2008), Moreira, Porter and Suarez (2009), Chaudhuri and Zivot (2010), Dufour, Khalaf and Beaulieu (2010), Guggenberger (2011), Guggenberger, Kleibergen, Mavroeidis and Chen (2012), Dufour, Khalaf and Kichian (2013), Mikusheva (2010, 2013), Doko Tchatoka and Dufour (2014), and Doko Tchatoka (2014).

2 See Bound, Jaeger and Baker (1995), Brock and Durlauf (2001), Imbens (2003), Hausman and Hahn (2005), Murray (2006), Kiviet and Niemczyk (2007, 2012), Doko Tchatoka and Dufour (2008), Kraay (2008), Ashley (2009), Bazzi and Clemens (2009), Hahn, Ham and Moon (2010), Guggenberger (2011), and Berkowitz, Caner and Fang (2008, 2012).

In this paper, we stress the fact that valid tests and CSs can be obtained in IV regressions in which the exclusion restrictions are violated. Several studies have adopted the same position and we wish to make progress in this direction. Imbens (2003) shows that bounds on the average treatment effect in program evaluation can be recovered via a sensitivity analysis of the correlations between treatment and unobserved components of the outcomes. Ashley (2009) shows how the discrepancy between OLS and IV estimates can be used to estimate the degree of bias under any given assumption about the degree to which IVs violate the exclusion restrictions. Kiviet and Niemczyk (2007, 2012) show that the realizations of the IV estimator based on strong but invalid instruments seem much closer to the true parameter values than those obtained from valid but weak instruments. Doko Tchatoka (2013) shows that bootstrapping improves the size of Durbin-Wu-Hausman tests of exogeneity when IVs are invalid. Imbens et al. (2011) show that the Donald and Newey (2001) bias-corrected estimator and the Phillips and Hale (1977) jackknife IV estimator can be consistent and asymptotically normal even when the exclusion restrictions are violated. Their framework, however, rules out weak-instrument issues. Berkowitz, Caner and Fang (2012) show that re-sampling the Anderson and Rubin (1949) AR-statistic yields a test that has correct level asymptotically, under local-to-zero instrument endogeneity.3 However, their method is valid only in large samples and is overly conservative.

By contrast, we develop a finite-sample procedure for testing and building CSs in IV regressions where: IVs can be arbitrarily weak, collinear, and violate the exclusion restrictions; the errors may have non-Gaussian distributions (possibly heavy-tailed and heteroskedastic); and the reduced-form specification may be arbitrarily heterogeneous, nonlinear, unspecified, or incomplete. To be more specific, we consider a model of the form

$$y_1 = y_2\beta + X_1\gamma_1 + u, \qquad u = X_2\gamma_2 + e,$$

where $y_1$ is an observed dependent variable, $y_2$ is an observed (possibly) endogenous regressor, $X_1$ is a matrix of exogenous variables, $X_2$ is a matrix of instruments which may be rank-deficient and violate the exclusion restrictions if $\gamma_2 \neq 0$, and $e$ is an error term. We call $\gamma_2$ “instrument endogeneity” because it determines which variables in $X_2$ are valid instruments and which are not.

3 The parameter that controls instrument endogeneity goes to zero [at rate $n^{-1/2}$] when the sample size $n$ increases.

We observe that a procedure similar to that of Anderson and Rubin (1949) can be used to develop identification-robust tests and CSs on $\theta = (\beta, \gamma_2')'$. So, identification-robust CSs for each component of $\beta$ and $\gamma_2$ can be derived through the projection method.4 When the error $e$ follows a Gaussian distribution and is independent of $X$, we show that the standard Fisher-type critical values are applicable. But for a wide class of parametric non-Gaussian errors (possibly heavy-tailed and heteroskedastic), we supply exact Monte Carlo test5 critical values. We provide the analytical forms of the proposed CSs for $\theta$ and scalar linear transformations of $\theta$, and characterize the necessary and sufficient conditions under which they are bounded. Tests of exclusion restrictions and instrument selection are covered as instances of the class of proposed procedures, including inexactly identified models.

The remainder of this paper is organized as follows. Section 2 formulates the model and related assumptions. Section 3 studies structural parameters identification with invalid instruments. Section 4 develops finite-sample tests and CSs with correct level, even in the presence of non-Gaussian errors. Section 5 deals with the Monte Carlo experiment, while Section 6 presents the empirical application. Conclusions are drawn in Section 7 and proofs are presented in the Appendix.

Throughout this paper, $I_q$ stands for the identity matrix of order $q$. For any $n \times m$ matrix $A$, $P_A = A(A'A)^{+}A'$ is the projection matrix on the space spanned by the columns of $A$, and $M_A = I_n - P_A$, where $B^{+}$ refers to the Moore-Penrose inverse of the matrix $B$. The notation $\mathrm{rank}(A)$ is the rank of the matrix $A$, while $\|A\| = [\mathrm{tr}(A'A)]^{1/2}$ denotes the usual Euclidean or Frobenius norm for $A$. $B > 0$ for a square matrix $B$ means that $B$ is positive definite (p.d.). The symbol “$\overset{d}{\sim}$” signifies equivalence in distribution. The orthogonal group of $p \times p$ matrices is denoted by $O(p) = \{H \in M(p,p) : H'H = I_p\}$, where $M(p,p)$ is the set of all square matrices of order $p$. Finally, for any $n \times m$ matrix $\Omega$, $\mathcal{K}er(\Omega) = \{\omega \in \mathbb{R}^m : \Omega\omega = 0\}$ is the null space (kernel) of $\Omega$, and $\mathcal{I}m(\Omega) = \{x \in \mathbb{R}^n : x = \Omega\omega \text{ for some } \omega \in \mathbb{R}^m\}$ is the column space of $\Omega$.

4 See Dufour and Jasiak (2001), Dufour and Taamouti (2005), and Doko Tchatoka and Dufour (2014).

5 See Dufour (2006).
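As a rough numerical illustration of the notation introduced above, the following Python sketch builds $P_A$ and $M_A$ with the Moore-Penrose inverse for a rank-deficient matrix $A$. It assumes NumPy is available; the function names `projection` and `annihilator`, the simulated matrix, and the cutoff `rcond=1e-10` are illustrative choices and not part of the paper.

```python
import numpy as np

def projection(A):
    """P_A = A (A'A)^+ A', the orthogonal projector onto the column space of A."""
    A = np.asarray(A, dtype=float)
    # rcond is an assumed numerical cutoff below which singular values count as zero.
    return A @ np.linalg.pinv(A.T @ A, rcond=1e-10) @ A.T

def annihilator(A):
    """M_A = I_n - P_A, the projector onto the orthogonal complement of A."""
    A = np.asarray(A, dtype=float)
    return np.eye(A.shape[0]) - projection(A)

# Hypothetical 8 x 3 matrix whose third column is a linear combination of the others.
rng = np.random.default_rng(0)
B = rng.normal(size=(8, 2))
A = np.column_stack([B, B[:, 0] + B[:, 1]])

P, M = projection(A), annihilator(A)
rank_A = int(np.linalg.matrix_rank(A))      # rank(A)
frobenius = np.sqrt(np.trace(A.T @ A))      # ||A|| = [tr(A'A)]^{1/2}
```

Because the generalized inverse is used, the projector remains well defined even though $A'A$ is singular here; its trace equals the rank of $A$.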


2. Model and assumptions

We consider a standard linear IV regression with one endogenous right-hand-side (rhs) variable, $k_1$ exogenous variables, and $k_2$ IVs. The sample size is $n$. The model consists of a structural equation and a reduced-form equation:

$$y_1 = y_2\beta + X_1\gamma_1 + u, \tag{2.1}$$

$$y_2 = X_1\pi_1 + X_2\pi_2 + v_2, \tag{2.2}$$

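where $y_1, y_2 \in \mathbb{R}^n$, $X_1 \in \mathbb{R}^{n\times k_1}$, and $X_2 \in \mathbb{R}^{n\times k_2}$ ($k_2 \ge 1$) are observed variables; $u, v_2 \in \mathbb{R}^n$ are unobserved errors; $\beta \in \mathbb{R}$, $\gamma_1 \in \mathbb{R}^{k_1}$, $\pi_2 \in \mathbb{R}^{k_2}$, and $\pi_1 \in \mathbb{R}^{k_1}$ are unknown fixed parameters. Let $Y = [y_1 : y_2] = [Y_1, \ldots, Y_n]' \in \mathbb{R}^{n\times(G+1)}$ (with $G = 1$ in the present setup) and $X = [X_1 : X_2] = [X_{\bullet 1}, \ldots, X_{\bullet n}]' \in \mathbb{R}^{n\times k}$ ($k = k_1 + k_2$) denote the matrices of endogenous variables and instruments, respectively. We define $Y_t \in \mathbb{R}^2$ and $X_{\bullet t} \in \mathbb{R}^k$ as the $t$th rows of $Y$ and $X$, written as column vectors, and similarly for other random matrices. We make the following assumptions on the model variables.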

Assumption A For some fixed vector $\gamma_2$ in $\mathbb{R}^{k_2}$, we have:

$$u = X_2\gamma_2 + e, \quad \text{where } e \in \mathbb{R}^n \text{ is an error term.} \tag{2.3}$$

Assumption A implies that $X_2$ violates the usual exclusion restrictions if $\gamma_2 \neq 0$. If $X_{2t}$ ($t = 1, 2, \ldots, n$) have the same finite second moments and $e$ is uncorrelated with $X_2$, then $\mathrm{cov}(X_{2t}, u_t) \neq 0$ whenever $\gamma_2 \notin \mathcal{K}er(\mathrm{Cov}(X_2))$, where $\mathrm{Cov}(X_2) = E[(X_{2t} - \mu_{X_2})(X_{2t} - \mu_{X_2})']$ is the covariance6 matrix of $X_{2t}$ and $\mu_{X_2} = E(X_{2t})$. Therefore, some variables in $X_2$ do not constitute valid instruments. Because of this property, we call $\gamma_2$ “instrument endogeneity.” The usual tests of exclusion restrictions, such as those of Sargan (1958), Basmann (1960), and Hansen (1982), typically test the null hypothesis that $\gamma_2 = 0$ in (2.3); see Staiger and Stock (1997) and Hahn et al. (2010). Under Assumption A, the conditional mean and variance of $u_t$, given $X_{2t}$, depend on $X_{2t}$ if $\gamma_2 \neq 0$ (conditional structural heteroskedasticity). Staiger and Stock (1997) and Guggenberger (2011) made a similar assumption with $\gamma_2 = \gamma_2^0/\sqrt{n}$ for some fixed vector $\gamma_2^0 \in \mathbb{R}^{k_2}$ (local-to-zero instrument endogeneity).7 Doko Tchatoka and Dufour (2008) show that the Anderson and Rubin (1949) (AR)-test and the Kleibergen (2002) (K)-test are highly size distorted under Assumption A. Imbens et al. (2011) show that the Donald and Newey (2001) bias-corrected estimator and the Phillips and Hale (1977) jackknife-instrumental-variables estimator may still be consistent and asymptotically normal under Assumption A. But their framework assumes strong instruments (i.e., $\pi_2 \neq 0$), thus ruling out issues associated with weak instruments.

6 See Anderson (1971, Section 2.3) and Muirhead (2005, Section 1.2) for a similar definition and notation.

Assumption B $\mathrm{rank}(X_1) = k_1$ and $\mathrm{rank}(X_2) = \nu_2 \le k_2$ for some integer $\nu_2 > 0$.

Assumption B imposes full-column rank on the matrix of exogenous variables $X_1$, but allows $X_2$ to have any arbitrary rank $\nu_2 > 0$. For example, some linear combinations of the columns of $X_2$ may be collinear or close to being so. Dufour and Taamouti (2007) also consider a similar setup. Note that there is no impediment to extending the full-column rank assumption on $X_1$ to any arbitrary rank $\nu_1 \ge 0$. Under Assumption B, we may also have $0 < \mathrm{rank}(X) = \nu \le k$. In the remainder of this paper, $W = M_{X_1}X_2$, where $M_{X_1} = I_n - X_1(X_1'X_1)^{-1}X_1'$, denotes the residuals of the regression of $X_2$ on the columns of $X_1$.

Assumption C (i) $\{(e_t, v_{2t}', X_{\bullet t}')' : 1 \le t \le n\}$ are i.i.d. across $t \le n$ and $n$; (ii) $E[(e_t, v_{2t}) \mid X_{\bullet t}] = 0$ $\forall\, t = 1, \ldots, n$; and (iii) $E(W_tW_t') = \Omega_W$ $\forall\, t = 1, \ldots, n$.

Assumptions C-(i) and (ii) are widely used in the IV literature; see Staiger and Stock (1997), Stock and Wright (2000), Kleibergen (2002, 2005), Andrews et al. (2006), and Guggenberger et al. (2012). Part (i) states that the errors and IVs are random and i.i.d. across $t \le n$ and $n$, while (ii) is the usual conditional zero mean assumption on the errors. Assumption C-(iii) requires the existence of the same second moments for each row of $W$. Note that $\Omega_W$ may not be positive definite and can be singular. In particular, this is the case when $X_2$ is rank-deficient. No assumption on the existence of second or higher moments for the errors $(e, v_2)$ is needed.

Now, consider the linear map defined by

$$\mathbb{R}^n \longrightarrow \mathbb{R}^n, \qquad e \mapsto \varepsilon_\sigma(e) = \sigma(X)e, \tag{2.4}$$

where $\sigma(X)$ is possibly a random function of $X$ such that the event $\sigma(X) \neq 0 \mid X$ has probability 1 a.s., and $e$ is the error term defined in (2.3). Note that $\sigma(X)$ need not be constant and the distribution of $\varepsilon_\sigma(e)$ may arbitrarily depend on $X$.

7 Also, see Berkowitz, Caner and Fang (2008, 2012).

For the purpose of developing finite-sample theory, we make the following assumption.

Assumption D There is $\sigma_0(X)$ such that $\varepsilon_{\sigma_0}(e)$ satisfies (2.4) and $\varepsilon_{\sigma_0}(e) \overset{d}{\sim} \varepsilon$, where, given $X = x$, $\varepsilon$ has a completely specified distribution $P_\varepsilon(x)$.

Assumption D states that the conditional distribution, given $X$, of the error in the regression of $u$ on $X_2$ only depends on $X$ and a (typically unknown) possibly random scale factor $\sigma_0(X)$. This assumption holds whenever $e$ is independent of $X$ with a distribution of the form $e \overset{d}{\sim} \varepsilon/\sigma_0$, where $\varepsilon$ has a specified distribution and $\sigma_0$ is an unknown positive constant. In this context, the standard Gaussian model is obtained by taking

$$\varepsilon \sim N(0, I_n). \tag{2.5}$$

But non-Gaussian distributions which may be heteroskedastic and lack moments (such as the Cauchy or Student $t$ distributions) are covered.

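Under Assumption A, we can write model (2.1)-(2.2) as:

$$y_1 = y_2\beta + X_1\gamma_1 + X_2\gamma_2 + e, \tag{2.6}$$

$$y_2 = X_1\pi_1 + X_2\pi_2 + v_2, \tag{2.7}$$

where $E[(e : v_2) \mid X] = 0$ by Assumption C. Let $\theta = (\beta, \gamma_2')'$ and $\delta = (\theta', \gamma_1', \pi_1', \pi_2')' \in \Theta \subseteq \mathbb{R}\times\mathbb{R}^{k_2}\times\mathbb{R}^{k_1}\times\mathbb{R}^{k_1}\times\mathbb{R}^{k_2}$, where $\Theta$ is the parameter space. The statistical model associated with (2.6)-(2.7) is defined as $(\mathcal{Y}\times\mathcal{X}, P_\delta, \delta \in \Theta)$, where $Y$ and $X$ are drawn from $\mathcal{Y}$ and $\mathcal{X}$, respectively. For any random variable $Z$ (possibly a function of $\delta$), $P_Z(x;\delta)$ denotes the distribution of $Z$ conditional on $X = x$ and we write $Z|_{X=x} \sim P_Z(x;\delta)$.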

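We consider the problem of testing

$$H_{\theta_0}: \theta = \theta_0 \quad \text{vs.} \quad H_{\theta_1}: \theta \neq \theta_0, \quad \text{for some fixed } \theta_0 = (\beta_0, \gamma_2^{0\prime})'. \tag{2.8}$$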


Our main focus is on finite samples, and we are concerned with developing similar tests for $H_{\theta_0}$ and confidence sets for $\beta$ and $\gamma_2$ when some instruments in $X_2$ may be arbitrarily weak, invalid, or collinear. But before proceeding, it will be illuminating to first study the identification of $\beta$ in the presence of possibly invalid instruments.

3. Identification of β

We study the identification of the structural coefficient ($\beta$) when some instruments in $X_2$ may be invalid or collinear. If the exclusion restrictions ($\gamma_2 = 0$) are satisfied, $X$ has full-column rank with probability 1, and $[e, v_2]$ has mean zero, then the weak IV literature documents that the necessary and sufficient condition for the identification of $\beta$ is $\pi_2 \neq 0$;8 see Stock et al. (2002), Dufour (2003), Andrews and Stock (2007a), Dufour and Hsiao (2008), and Mikusheva (2013). Here, we investigate the identification of $\beta$ when $\gamma_2$ is left unrestricted (possibly invalid instruments) and $X_2$ may contain redundant columns ($\nu_2 < k_2$) or be close to doing so.

First, we can write the reduced form for $Y = [y_1 : y_2]$ as:

$$Y = X_1\xi_1 + X_2\xi_2 + V \quad \text{with } V = [v_1 : v_2], \tag{3.1}$$

where $v_1 = v_1(\beta) = v_2\beta + e$, $\xi_1 = (\xi_{11} : \pi_1) = (\gamma_1 + \pi_1\beta : \pi_1)$, and $\xi_2 = (\xi_{21} : \pi_2) = (\gamma_2 + \pi_2\beta : \pi_2)$. Suppose first that $\mathrm{rank}(X_2) = \nu_2 = k_2$ and $E([e : v_2] \mid X) = 0$. Hence, the least squares estimators of the coefficients on $X_j$ ($j = 1, 2$) in each regression of (3.1) are unique. On expressing the coefficients on $X_2$ in (3.1) as $\xi_2 = (\xi_{21} : \pi_2) = (\gamma_2 : 0) + \pi_2 a'$, where $a = (\beta, 1)'$, it can be seen that $\xi_{21}$ is proportional to $\beta$ (with factor $\pi_2$) if $\gamma_2 = 0$. Since $\xi_2$ is identifiable, $\beta$ is identifiable whenever $\pi_2 \neq 0$ if $\gamma_2 = 0$. However, if $\gamma_2$ is left unrestricted (possibly $\gamma_2 \neq 0$), $\xi_{21} = \gamma_2 + \pi_2\beta$ does not necessarily have a solution for $\beta$ even if $\pi_2 \neq 0$. To be more specific, $\xi_{21} = \gamma_2 + \pi_2\beta$ has a solution with respect to $\beta$ if, and only if, $(\xi_{21} - \gamma_2) \in \mathcal{I}m(\pi_2)$, where $\mathcal{I}m(\pi_2)$ is the column space of $\pi_2$; see Magnus and Neudecker (1999, Ch. 2, Section 9, Theorems 11-12). Even if a solution $\beta$ exists, it generally depends on the unknown value $\gamma_2$. The condition under which a solution $\beta$ (when it exists) does not depend on $\gamma_2$ is that $\gamma_2 \in \mathcal{K}er(\pi_2)$, where $\mathcal{K}er(\pi_2)$ is the null space of $\pi_2$.

8 Note that this condition is replaced by the full-column rank assumption on $\pi_2$ if $G > 1$ (i.e., there is more than one endogenous regressor in $y_2$).

We can generalize the above argument to cases in which $X_2$ is rank-deficient ($\nu_2 < k_2$) or close to being so. One difficulty here is that $\xi_{21}$ and $\pi_2$ are not uniquely determined from the regression (3.1); see Magnus and Neudecker (1999, Ch. 13, Section 6, Eqs. (1)-(2)). However, the conditional means $E(y_2|X)$ and $E(y_1|X)$ are still estimable despite the fact that $X_2$ does not have full-column rank; see Magnus and Neudecker (1999, Ch. 13, Theorem 15). This implies that the errors $v_{1t}$ ($t = 1, \ldots, n$) of the reduced-form equation for $y_1$ in (3.1) are identifiable, despite the multiplicity of least squares estimators.9 So, $\beta$ may be identifiable through the orthogonality between $v_{1t}$ and $X_{\bullet t}$. We can prove the following proposition on the identification of $\beta$ when $\gamma_2$ is left unrestricted and $X_2$ may be rank-deficient.

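Proposition 3.1 Suppose that (2.1)-(2.2) and Assumptions A - C are satisfied. Then:

$$\beta \text{ is identifiable} \iff \pi_2 \notin \mathcal{K}er(\Omega_W) \text{ and } \gamma_2 \in \mathcal{K}er(\pi_2'\Omega_W), \tag{3.2}$$

where $\mathcal{K}er(\Omega_W)$ and $\mathcal{K}er(\pi_2'\Omega_W)$ denote the null spaces of $\Omega_W$ and $\pi_2'\Omega_W$, respectively.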

Remark 3.2 (i) The identification condition in Proposition 3.1 can be stated as $\pi_2'\Omega_W \neq 0$ and $\pi_2'\Omega_W\gamma_2 = 0$, and is easy to interpret. First, $\pi_2'\Omega_W \neq 0$ means that the instruments in $X_2$ are strong. Second, observe that $W_t'\pi_2$ can be viewed as the indirect effect of $X_{2t}$ on $y_{1t}$ when the effect of the exogenous variables $X_{1t}$ has been eliminated, and $W_t'\gamma_2$ is its direct effect on $y_{1t}$. So, the condition $\pi_2'\Omega_W\gamma_2 = E(\pi_2'W_tW_t'\gamma_2) = 0$ means that both effects are uncorrelated [similar to Imbens et al. (2011)].

(ii) If $\gamma_2 = 0$ (strict exogeneity) and $\nu_2 = k_2$ ($X_2$ has full-column rank), the identification condition of Proposition 3.1 becomes $\pi_2 \neq 0$. So, Proposition 3.1 generalizes the usual necessary and sufficient condition for identification in the previous weak IV literature.

(iii) Proposition 3.1 also generalizes the condition under which the two-stage least squares estimator is consistent in Doko Tchatoka and Dufour (2008, Eqs. (4.8)-(4.9)), and those under which the Donald and Newey (2001) bias-corrected estimator and the Phillips and Hale (1977) jackknife-instrumental-variables estimator are consistent and asymptotically normal in Imbens et al. (2011). Both Doko Tchatoka and Dufour (2008) and Imbens et al. (2011) assume that $X_2$ has full-column rank $k_2$. Here, we allow $X_2$ to have any arbitrary rank. In addition, Imbens et al. (2011) analyze the setup in which $X_2$ is strong (i.e., $\pi_2 \neq 0$ in our framework), meaning that weakly identified models are ruled out of their scope. Here, we also allow for any arbitrary value of $\pi_2$.

(iv) Under the conditions of Proposition 3.1, the usual $F$-type or Wald-type statistics based on $k$-class estimators10 could be used to assess $H_{\theta_0}$ and build CSs for $\beta$, despite instrument endogeneity. However, these identification conditions, albeit interesting, rule out models where identification is not very strong, and they are in addition difficult to implement in practice (because the condition $\gamma_2 \in \mathcal{K}er(\pi_2'\Omega_W)$ cannot be verified empirically, as $\gamma_2$ is not consistently estimable under instrument invalidity).

9 Under Assumption C-(ii), we can write $v_{1t} = y_{1t} - E(y_{1t}|X_{\bullet t})$ for all $t = 1, \ldots, n$. From Magnus and Neudecker (1999, Ch. 13, Theorem 15), $E(y_{1t}|X_{\bullet t})$ is identifiable even when $\nu_2 < k_2$, hence $v_{1t}$ is also identifiable.

Clearly, although Proposition 3.1 is interesting because it shows that the usual procedures, such as $F$- or Wald-type tests, may yield valid inference when IVs are invalid, it cannot be implemented in empirical applications. In the remainder of this paper, we focus on developing tests for $H_{\theta_0}$ and building CSs for $\theta$ and scalar linear transformations of $\theta$.
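The identification condition of Proposition 3.1 can be illustrated numerically when the population quantities are treated as known, which, as noted above, is not the case in practice. The Python sketch below is only a toy check under that assumption; the function name `beta_identified`, the tolerance `tol`, and all numerical values are hypothetical.

```python
import numpy as np

def beta_identified(pi2, gamma2, omega_w, tol=1e-10):
    """Check pi2' Omega_W != 0 and pi2' Omega_W gamma2 = 0 for known population values."""
    pi2 = np.asarray(pi2, dtype=float)
    gamma2 = np.asarray(gamma2, dtype=float)
    omega_w = np.asarray(omega_w, dtype=float)
    relevance = pi2 @ omega_w                       # row vector pi2' Omega_W
    strong = np.linalg.norm(relevance) > tol        # pi2' Omega_W != 0
    orthogonal = abs(relevance @ gamma2) <= tol     # pi2' Omega_W gamma2 = 0
    return bool(strong and orthogonal)

omega_w = np.eye(2)                 # assumed second-moment matrix of W_t
pi2 = np.array([1.0, 0.0])          # only the first instrument is relevant

# Valid instruments: gamma2 = 0.
case_valid = beta_identified(pi2, [0.0, 0.0], omega_w)
# Invalid second instrument, but direct and indirect effects are orthogonal.
case_invalid_ok = beta_identified(pi2, [0.0, 0.5], omega_w)
# Invalid first instrument: the orthogonality condition fails.
case_invalid_bad = beta_identified(pi2, [0.5, 0.0], omega_w)
# Irrelevant instruments: pi2 = 0.
case_irrelevant = beta_identified([0.0, 0.0], [0.0, 0.0], omega_w)
```

In this toy setting, the second case shows that $\beta$ can remain identifiable with $\gamma_2 \neq 0$, while the third shows how an invalid and relevant instrument breaks the condition.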

4. Exact inference

In this section, we develop a finite-sample procedure for assessing $H_{\theta_0}$ and building CSs on $\theta$. First, we propose a test for $H_{\theta_0}$ that is similar despite possible instrument endogeneity and rank deficiency. Second, we use the test inversion method to obtain joint CSs with level $1-\alpha$ for $\theta$, where $0 < \alpha < 1$. Finally, we apply projection techniques11 to get identification-robust CSs with level $1-\alpha$ (at least) for scalar linear transformations of $\theta$. The marginal CSs for the structural coefficient ($\beta$) and each component of instrument endogeneity ($\gamma_2$) are deduced as special cases of the proposed projection method.

10 For example, the Donald and Newey (2001) bias-corrected estimator or the Phillips and Hale (1977) jackknife-instrumental-variables estimator.

11 See Dufour and Jasiak (2001), Dufour and Taamouti (2005), Dufour and Taamouti (2007), and Doko Tchatoka and Dufour (2014).


4.1. Similar test for $H_{\theta_0}$

We propose a generalization of the Anderson and Rubin (1949) approach for assessing $H_{\theta_0}$. We note that alternative procedures, such as the Kleibergen (2002, K) and Moreira (2003, CLR) tests, could be exploited for that purpose.12 However, no finite-sample distributional theory is available for these methods, especially with heteroskedastic non-Gaussian errors. Further, these are not robust to missing instruments.13

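The Anderson and Rubin (1949) approach to testing $H_{\theta_0}$ is to consider the transformed reduced-form equation for $y_1$:

$$y_1 - y_2\beta_0 - X_2\gamma_2^0 = X_1\xi_{11}^0 + X_2\xi_{21}^0 + v_1^0, \tag{4.1}$$

where $\xi_{11}^0 = \pi_1(\beta - \beta_0) + \gamma_1$, $\xi_{21}^0 = \pi_2(\beta - \beta_0) + \gamma_2 - \gamma_2^0$, and $v_1^0 \equiv v_1^0(\beta) = v_2(\beta - \beta_0) + e$. Since $\xi_{21}^0 = 0$ when $\beta = \beta_0$ and $\gamma_2 = \gamma_2^0$, we can assess $H_{\theta_0}$ by considering the $F$-statistic of the null hypothesis $\xi_{21}^0 = 0$ in (4.1). Let $\Omega = \frac{1}{n-\nu}\bar{Y}'M_X\bar{Y}$, where $\bar{Y} = [Y : X_2]$, and define

$$S^+ = [(W'W)^+]^{1/2}\,W'\bar{Y}b_0\,(b_0'\Omega b_0)^{-1/2} \quad \text{with } b_0 = (1, -\theta_0')'. \tag{4.2}$$

The generalization of the AR-statistic for assessing $H_{\theta_0}$ is given by:

$$\Psi_{AR}(S^+;\theta_0) = S^{+\prime}S^+/(\nu - \nu_1). \tag{4.3}$$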
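To make the construction in (4.1)-(4.3) concrete, here is a minimal Python sketch under stated assumptions. It uses the identity $S^{+\prime}S^+ = r'P_Wr/(b_0'\Omega b_0)$ with $r = \bar{Y}b_0 = y_1 - y_2\beta_0 - X_2\gamma_2^0$ and $b_0'\Omega b_0 = r'M_Xr/(n-\nu)$. The function name `ar_statistic`, the simulated design, the parameter values, and the cutoff `rcond=1e-10` are hypothetical and chosen only for illustration.

```python
import numpy as np

def _proj(A):
    """Orthogonal projector A (A'A)^+ A' (rcond is an assumed numerical cutoff)."""
    return A @ np.linalg.pinv(A.T @ A, rcond=1e-10) @ A.T

def ar_statistic(y1, y2, X1, X2, beta0, gamma20):
    """Generalized AR statistic S+'S+ / (nu - nu1) for theta0 = (beta0, gamma20')'."""
    n = y1.shape[0]
    X = np.column_stack([X1, X2])
    nu1 = np.linalg.matrix_rank(X1)
    nu = np.linalg.matrix_rank(X)
    W = (np.eye(n) - _proj(X1)) @ X2                  # residuals of X2 on X1
    r = y1 - y2 * beta0 - X2 @ gamma20                # Ybar b0 with b0 = (1, -theta0')'
    num = r @ _proj(W) @ r / (nu - nu1)               # r' P_W r / (nu - nu1)
    den = r @ (np.eye(n) - _proj(X)) @ r / (n - nu)   # b0' Omega b0
    return num / den

# Simulated data; every number below is an arbitrary illustrative choice.
rng = np.random.default_rng(1)
n = 60
X1 = np.ones((n, 1))
Z = rng.normal(size=(n, 2))
X2 = np.column_stack([Z, Z[:, 0] - Z[:, 1]])          # rank-deficient instruments
beta, gamma1, gamma2 = 0.5, np.array([1.0]), np.array([0.3, 0.0, -0.2])
e = rng.standard_t(df=3, size=n)                      # heavy-tailed structural error

def simulate_y(pi2, scale=1.0):
    """Draw (y1, y2) from (2.6)-(2.7) for a given pi2 and error scale."""
    y2 = X2 @ pi2 + rng.normal(size=n)
    y1 = y2 * beta + X1 @ gamma1 + X2 @ gamma2 + scale * e
    return y1, y2

y1_strong, y2_strong = simulate_y(np.array([1.0, 1.0, 0.0]))
y1_weak, y2_weak = simulate_y(np.zeros(3))            # irrelevant instruments
stat_strong = ar_statistic(y1_strong, y2_strong, X1, X2, beta, gamma2)
stat_weak = ar_statistic(y1_weak, y2_weak, X1, X2, beta, gamma2)
```

Evaluated at the true $\theta$, the statistic depends on the data only through $e$, so in this sketch the two designs (strong and irrelevant instruments) give the same value up to rounding, and rescaling $e$ leaves it unchanged.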

The corresponding test rejects $H_{\theta_0}$ at level $\alpha$ ($0 < \alpha < 1$) when

$$\Psi_{AR}(S^+;\theta_0) > \kappa_{\Psi,\alpha}(S^+;\theta_0), \tag{4.4}$$

where $\kappa_{\Psi,\alpha}(S^+;\theta_0)$ is the $1-\alpha$ quantile of $\Psi_{AR}(S^+;\theta_0)$ and the critical value function is defined as $\kappa_{\Psi,\alpha}(S^+;\theta_0) = \inf\{\tau \in \mathbb{R} : P_{\theta_0}(\Psi_{AR}(S^+;\theta_0) > \tau) \le \alpha\}$. If the distribution of $\Psi_{AR}(S^+;\theta_0)$, conditional14 on $X = x$, is absolutely continuous with respect to the Lebesgue measure, we obtain

$$P_{\theta_0}[\Psi_{AR}(S^+;\theta_0) > \kappa_{\Psi,\alpha}(S^+;\theta_0)] = \alpha, \tag{4.5}$$

so that the test based on the critical value $\kappa_{\Psi,\alpha}(S^+;\theta_0)$ is exact. To implement this test, the critical values $\kappa_{\Psi,\alpha}(S^+;\theta_0)$ need to be computed from the observed data, especially with non-Gaussian errors. This will be done using numerical simulations. Let

$$S^+_\omega = [(W'W)^+]^{1/2}\,W'\omega \left(\frac{\omega'M_X\omega}{n-\nu}\right)^{-1/2} \quad \text{for all } \omega \in \{e, \varepsilon_\sigma\}, \tag{4.6}$$

where $e$ and $\varepsilon_\sigma$ are the error terms satisfying (2.3) and (2.4), respectively. Let $P^0_{S^+}(x;\theta_0)$ and $P_{S^+_\omega}(x,\omega)$, $\omega \in \{e, \varepsilon_\sigma\}$, denote the distributions of $S^+|_{X=x}$ and $S^+_\omega|_{X=x}$, respectively, under $H_{\theta_0}$. We note that $P_{S^+_\omega}(x,\omega)$ does not directly depend on the specific value $\theta_0$ tested because the statistic $S^+_\omega$ does not directly involve $\theta$. We can now state Lemma 4.1 on the behavior of $S^+$ and $S^+_\omega$, $\omega \in \{e, \varepsilon_\sigma\}$, under $H_{\theta_0}$.

12 For example, Andrews et al. (2006) show that the CLR-test is nearly uniformly most powerful (UMP) among invariant similar tests that are asymptotically efficient, and recommend the use of this test in empirical practice. Guggenberger et al. (2012) show that the plug-in Anderson and Rubin (1949) (AR) and Kleibergen (2002) (K) subset statistics yield more powerful tests than their projection-based counterparts.

13 See Dufour and Taamouti (2007), Dufour et al. (2013), and Doko Tchatoka (2014).

14 Observe that for a given $b_0$, $S^+$ only depends on the data $(Y, X) \in \mathcal{Y}\times\mathcal{X}$. So, if the distribution of $Y$, given $X$, is absolutely continuous with respect to the Lebesgue measure, then the distribution of $\Psi_{AR}(S^+;\theta_0)$ is also absolutely continuous with respect to the Lebesgue measure.

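Lemma 4.1 Suppose that Assumptions A - B and $H_{\theta_0}$ are satisfied. Then, conditional on $X = x$, we have:

(a) $P^0_{S^+}(x;\theta_0) = P_{S^+_e}(x,e)$;

(b) $P_{S^+_e}(x,e)$ is invariant to the transformation (2.4) $\iff$ $P_{S^+_e}(x,e) = P_{S^+_{\varepsilon_\sigma}}(x,\varepsilon_\sigma)$ $\forall\, \varepsilon_\sigma$ satisfying (2.4). If further Assumption D holds, we have $P_{S^+_e}(x,e) \equiv P_{S^+_\varepsilon}(x,\varepsilon)$, where $\varepsilon \sim P_\varepsilon(x)$ and $P_\varepsilon(x)$ is completely specified.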

Remark 4.2 (i) Lemma 4.1-(a) shows that the distribution of $S^+$, under $H_{\theta_0}$, only depends on $X$ and the error of the regression (2.3). So, the reduced-form errors $v_2$ play no role and, therefore, can be heteroskedastic in any arbitrary way. From (4.3), it is also clear that the null distribution of $\Psi_{AR}(S^+;\theta_0)$ depends only on $X$ and the distribution of $e$.

(ii) Lemma 4.1-(b) shows that the conditional distribution of $S^+$ under $H_{\theta_0}$, given $X = x$, is invariant to any linear transformation satisfying (2.4). In particular, the conditional distribution of $S^+$ under $H_{\theta_0}$, given $X = x$, only depends on the distribution of $\varepsilon$ under Assumption D. Therefore, the distribution of $\Psi_{AR}(S^+;\theta_0)|_{X=x}$ under $H_{\theta_0}$ only depends on the distribution of $\varepsilon$.


(iii) If $\varepsilon$ is normally distributed15 and is independent of $X$, then it is straightforward to show that $\Psi_{AR}(S^+;\theta_0) \sim F(\nu-\nu_1, n-\nu)$ for all values of $\pi_2$. So, $H_{\theta_0}$ can be assessed by using the critical values of an $F$-distribution with $(\nu-\nu_1, n-\nu)$ degrees of freedom. However, if (2.5) does not hold (non-Gaussian error) or if $\varepsilon$ is not independent of $X$, the null distribution of $\Psi_{AR}(S^+;\theta_0)|_{X=x}$ is nonstandard. Nevertheless, it does not involve any nuisance parameter. So, we can proceed as follows16 to compute the $1-\alpha$ critical value of $\Psi_{AR}(S^+;\theta_0)$ under $H_{\theta_0}$: (1) choose $\alpha_1$ and $N$ so that $\alpha = \frac{[\alpha_1 N]+1}{N+1}$, where $[z]$ is the smallest integer greater than $z$; (2) for a given $\theta_0$, compute the test statistic $\Psi^{(0)}_{AR}(S^+;\theta_0)$ based on the observed data; (3) generate $N$ i.i.d. error vectors $\varepsilon^{(j)} = [\varepsilon^{(j)}_1, \ldots, \varepsilon^{(j)}_n]'$, $j = 1, \ldots, N$, according to the specified distribution $P_\varepsilon(x)$ and compute the corresponding statistics $\Psi^{(j)}_{AR}$, $j = 1, \ldots, N$, following (4.3); note that the null distribution of $\Psi_{AR}(S^+;\theta_0)$ does not depend on the specific value $\theta_0$ tested, so there is no need to make it depend on $\theta_0$; (4) compute the empirical distribution function based on $\Psi^{(j)}_{AR}$, $j = 1, \ldots, N$,

$$P_\Psi(z;N) \equiv P_\Psi(z) = \frac{\sum_{j=1}^{N} 1[\Psi^{(j)}_{AR} \le z]}{N+1}, \tag{4.7}$$

where $1[C] = 1$ if condition $C$ holds, and $1[C] = 0$ otherwise; (5) reject $H_{\theta_0}$ at level $\alpha$ when $\Psi^{(0)}_{AR}(S^+;\theta_0) \ge \kappa_{MC}(\varepsilon;\alpha) = P_\Psi^{-1}(1-\alpha_1)$, where $P_\Psi^{-1}(q) = \inf\{z : P_\Psi(z) \ge q\}$ is the generalized inverse of $P_\Psi(\cdot)$. We can now prove Theorem 4.3 on the validity of the AR-test, where $F_\alpha(n-\nu, \nu-\nu_1)$ denotes the $1-\alpha$ quantile of $F(\nu-\nu_1, n-\nu)$.

Theorem 4.3 Suppose that Assumptions A - B and D are satisfied. Then, the test that rejects $H_{\theta_0}$ when $\Psi_{AR}(S^+;\theta_0) > c_\Psi(\varepsilon;\alpha)$ is similar with significance level $\alpha$ for all values of $\pi_2$ (instrument quality), where $c_\Psi(\varepsilon;\alpha) = F_\alpha(n-\nu, \nu-\nu_1)$ if (2.5) holds and $X$ is independent of $\varepsilon$, and $c_\Psi(\varepsilon;\alpha) = \kappa_{MC}(\varepsilon;\alpha)$ otherwise.

Remark 4.4 (i) Theorem 4.3 shows that the critical values computed as in Remark 4.2-(iii) yield a test with correct level in finite samples, even when the model is weakly identified ($\pi_2 = 0$ or close to being so) and the errors are non-Gaussian. So, the proposed test is robust to weak IVs and non-Gaussian errors (even in small samples), despite possible instrument invalidity ($\gamma_2^0 \neq 0$).

(ii) Since the null distribution of $\Psi_{AR}(S^+;\theta_0)$ does not depend on any of the variables and parameters in (2.2), Theorem 4.3 holds even when the reduced form for $y_2$ is given by

$$y_2 = m(X_1, X_2, X_3, v_2, \pi_1^*, \pi_2^*, \pi_3^*), \tag{4.8}$$

where $\pi_1^*$, $\pi_2^*$, and $\pi_3^*$ are vectors of unknown reduced-form coefficients, $m(\cdot)$ is an arbitrary unspecified (possibly) nonlinear function, and $X_3 \in \mathbb{R}^{n\times k_3}$ is a matrix of instruments that may have been omitted from (2.2). Because of the latter properties, the proposed procedure is robust to nonlinear and incomplete reduced forms [similar to Dufour and Taamouti (2007) and Dufour et al. (2013)]. More interestingly, $y_{21}, \ldots, y_{2n}$ may be arbitrarily heterogeneous and the reduced-form disturbances $v_{21}, \ldots, v_{2n}$ may not follow a Gaussian distribution or may also be arbitrarily heteroskedastic. So, the proposed procedure is also robust to heterogeneity in the reduced forms.

15 That is, if equation (2.5) is satisfied.

16 We cover the case in which $P_\varepsilon(x)$ is continuous, so that the null distribution of $\Psi_{AR}(S^+;\theta_0)|_{X=x}$ is also continuous. If $P_\varepsilon(x)$ is not continuous, the Monte Carlo test algorithm can easily be adapted by using a “tie-breaking” method, as in Dufour (2006).
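The Monte Carlo test procedure of Remark 4.2-(iii) can be sketched in Python as follows. This is a sketch under stated assumptions rather than the paper's exact algorithm: instead of the critical-value rule in steps (4)-(5), it uses the Monte Carlo p-value form of Dufour (2006), $(1 + \#\{j : \Psi^{(j)}_{AR} \ge \Psi^{(0)}_{AR}\})/(N+1)$, and rejects when that p-value is at most $\alpha$ with $\alpha(N+1)$ an integer. The function names `mc_pvalue` and `draw_cauchy`, the Cauchy choice for $P_\varepsilon$, and the simulated design are hypothetical.

```python
import numpy as np

def _proj(A):
    """Orthogonal projector A (A'A)^+ A' (rcond is an assumed numerical cutoff)."""
    return A @ np.linalg.pinv(A.T @ A, rcond=1e-10) @ A.T

def mc_pvalue(stat_obs, X1, X2, draw_eps, N=99, seed=0):
    """Monte Carlo p-value (1 + #{simulated >= observed}) / (N + 1)."""
    rng = np.random.default_rng(seed)
    n = X1.shape[0]
    X = np.column_stack([X1, X2])
    df1 = np.linalg.matrix_rank(X) - np.linalg.matrix_rank(X1)   # nu - nu1
    df2 = n - np.linalg.matrix_rank(X)                           # n - nu
    P_W = _proj((np.eye(n) - _proj(X1)) @ X2)
    M_X = np.eye(n) - _proj(X)
    sims = np.empty(N)
    for j in range(N):
        eps = draw_eps(rng, n)          # one draw from the specified P_eps
        # Null statistic built from S+_eps in (4.6): free of theta0 and pi2.
        sims[j] = (eps @ P_W @ eps / df1) / (eps @ M_X @ eps / df2)
    return (1 + np.sum(sims >= stat_obs)) / (N + 1)

def draw_cauchy(rng, n):
    """Hypothetical choice of P_eps: i.i.d. standard Cauchy errors."""
    return rng.standard_cauchy(n)

# Arbitrary illustrative design.
rng = np.random.default_rng(2)
n = 40
X1 = np.ones((n, 1))
X2 = rng.normal(size=(n, 3))

p_small_stat = mc_pvalue(0.0, X1, X2, draw_cauchy)     # smallest possible statistic
p_large_stat = mc_pvalue(np.inf, X1, X2, draw_cauchy)  # unboundedly large statistic
alpha = 0.05
reject_large = p_large_stat <= alpha
```

With $N = 99$ the smallest attainable p-value is $1/100$, so a test at $\alpha = 0.05$ can reject; larger $N$ gives a finer grid of attainable levels without affecting exactness.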

We now examine the finite-sample power of the proposed test. To do this, we consider the following linear transformation [similar to (2.4)] on the error $v_1^0$ of the regression (4.1):

$$\mathbb{R}^n \longrightarrow \mathbb{R}^n, \qquad v_1^0 \mapsto \varepsilon_{\sigma_\beta} = \sigma_\beta(X)v_1^0, \tag{4.9}$$

where $\sigma_\beta(X)$ is (possibly) a random function of $X$ and $\beta$ such that $P_\delta[\sigma_\beta(X) \neq 0 \mid X = x] = 1$. In addition, we also make the following assumption.

Assumption E There exists $\sigma_\beta(X)$ satisfying (4.9) such that $\varepsilon_{\sigma_\beta}|_{X=x} \overset{d}{\sim} v$, where the distribution $P_v(x)$ of $v$, given $X = x$, is completely specified.

Assumption E is similar to Assumption D. It states that the distribution of the reduced-form disturbance $v_1^0$ only depends on $X$ and a typically unknown (possibly) random scale factor $\sigma_\beta(X)$, which is also (possibly) a function of both $X$ and the structural coefficient $\beta$. Again, a Gaussian distribution for $v_1^0$ is obtained by choosing $P_v(x) = N(0, I_n)$. But non-Gaussian distributions, including heavy-tailed distributions which may lack moments, are covered. In general, Assumptions D and E do not entail each other, except when $\beta = \beta_0$ or the conditional distribution of $(e, v_2)$, given $X = x$, is Gaussian with finite second moments. Let

$$S_v = [(W'W)^+]^{1/2}\,W'v, \qquad \sigma_v = \left(\frac{v'M_Xv}{n-\nu}\right)^{1/2}, \qquad \text{and} \qquad \mu_{\pi_2\theta} = \mu_{\pi_2}C_\theta, \tag{4.10}$$

where $C_\theta = (\theta - \theta_0)\sigma_\beta(X)$, $\mu_{\pi_2} = [(W'W)^+]^{1/2}\,W'W[\pi_2 : I_{k_2}]$, and $v$ is the error in Assumption E. Lemma 4.5 characterizes the distribution of $S^+$ and $\Psi_{AR}(S^+;\theta_0)$ under $H_{\theta_1}$.

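Lemma 4.5 Suppose that Assumptions A - B and E are satisfied. If further $\theta \neq \theta_0$, then we have:

$$S^+ \overset{d}{\sim} \sigma_v^{-1}(S_v + \mu_{\pi_2\theta}) \quad \text{and} \quad \Psi_{AR}(S^+;\theta_0) \overset{d}{\sim} (\nu-\nu_1)^{-1}\sigma_v^{-2}(S_v + \mu_{\pi_2\theta})'(S_v + \mu_{\pi_2\theta}).$$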

Remark 4.6 (i) Lemma 4.5 states that the distribution of $\Psi_{AR}(S^+;\theta_0)$, under $H_{\theta_1}$, only depends on the distributions of $S_v$ and $\sigma_v$, as well as the factor $\mu_{\pi_2\theta}$. Since, given $X = x$, the distributions of $S_v$ and $\sigma_v$ only depend on that of $v$, it is clear that the conditional distribution of $\Psi_{AR}(S^+;\theta_0)$ under $H_{\theta_1}$, given $X = x$, only depends on $\mu_{\pi_2\theta}$ and the distribution of $v$. Therefore, the power function, $\eta_{AR}(\cdot)$, of the corresponding AR-test that rejects $H_{\theta_0}$ when $\Psi_{AR}(S^+;\theta_0) > c_\Psi(\varepsilon;\alpha)$ is entirely determined by the distribution of $v$ and the factor $\mu_{\pi_2\theta}$, i.e.

$$P_{\theta \in H_{\theta_1}}\big[\Psi_{AR}(S^+;\theta_0) > c_\Psi(\varepsilon;\alpha)\big] = \eta_{AR}(v, \mu_{\pi_2\theta};\alpha). \tag{4.11}$$

Under Assumption E, $v \sim P_v(x)$ and $P_v(x)$ does not depend on $\theta$. So, $\mu_{\pi_2\theta}$ is the only factor that determines test power.

(ii) If $v \sim P_v(x) \equiv N(0, I_n)$ and $X$ is independent of $v$, then $\Psi_{AR}(S^+;\theta_0)|_{X=x} \sim F_{\tau^2_{x,\theta}}(\nu-\nu_1, n-\nu)$ for all values of $\pi_2$, where $\tau^2_{x,\theta} = \sigma_\beta^2\,\|\mu_{\pi_2\theta}\|^2$ is the non-centrality parameter [similar to Revankar and Hartley (1972)]. Therefore, the exact power of the test in (4.11) can be computed from the sample using a noncentral $F$-distribution with $(\nu-\nu_1, n-\nu)$ degrees of freedom and non-centrality parameter $\tau^2_{x,\theta}$ for $\theta$ and $\pi_2$ fixed. If $P_v(x)$ is not a normal distribution or $X$ depends on $v$, the distribution of $\Psi_{AR}(S^+;\theta_0)|_{X=x}$, under $H_{\theta_1}$, is nonstandard but it can be simulated for $\theta$ and $\pi_2$ fixed. So, the exact power of the test can also be simulated for $\theta$ and $\pi_2$ fixed, by using the Monte Carlo test method described in Remark 4.2-(iii). We can now state the following necessary and sufficient condition under which the proposed AR-test exhibits power in finite samples.

14

Theorem 4.7 Suppose that Assumptions A - B and E are satisfied. Then, the test that rejects $H_{\theta_0}$ when $\Psi_{AR}(S_+;\theta_0) > c_\Psi(\varepsilon;\alpha)$ exhibits power for all values of $\pi_2$, if, and only if, $\xi^0_{21} \notin \mathrm{Ker}\big([(W'W)^{+}]^{1/2} W'W\big)$, where $\xi^0_{21} = [\pi_2 : I_{k_2}](\theta - \theta_0)$.

Theorem 4.7 follows directly from Lemma 4.5, which shows that $\mu_{\pi_2\theta}$ is the only factor that determines the power of the proposed AR-test, i.e., power exists if, and only if, $\mu_{\pi_2\theta} = \sigma_\beta(X)\,[(W'W)^{+}]^{1/2} W'W \xi^0_{21} \neq 0$, or equivalently, $\xi^0_{21} \notin \mathrm{Ker}\big([(W'W)^{+}]^{1/2} W'W\big)$, since $P_\delta[\sigma_\beta(X) \neq 0 \mid X = x] = 1$ from (4.9). As seen from the expression of $\xi^0_{21}$, power may still exist even when $\pi_2 = 0$ (irrelevant instruments), provided $\gamma_2 - \gamma_2^0 \neq 0$. However, the test has low power if both $\pi_2$ and $\gamma_2 - \gamma_2^0$ are zero or close to being so. We now focus on building CSs for $\theta_0$ and scalar linear transformations of $\theta_0$.
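The power condition of Theorem 4.7 can be checked numerically. The sketch below is only an illustration under stated assumptions: a single endogenous regressor (so that $\theta = (\beta, \gamma_2')'$ and $\pi_2$ is a $k_2$-vector), a scale factor `sigma_beta` fixed at one, and hypothetical helper names (`sqrt_pinv`, `mu_pi2_theta`, `has_power`) that do not come from the paper.

```python
import numpy as np

def sqrt_pinv(M, rtol=1e-10):
    """Symmetric square root of the Moore-Penrose inverse of a symmetric p.s.d. matrix."""
    eigval, eigvec = np.linalg.eigh(M)
    keep = eigval > rtol * eigval.max()          # drop (numerically) zero eigenvalues
    inv_sqrt = np.zeros_like(eigval)
    inv_sqrt[keep] = 1.0 / np.sqrt(eigval[keep])
    return (eigvec * inv_sqrt) @ eigvec.T

def mu_pi2_theta(W, pi2, theta, theta0, sigma_beta=1.0):
    """sigma_beta * [(W'W)^+]^{1/2} W'W [pi2 : I_k2] (theta - theta0)."""
    k2 = W.shape[1]
    WtW = W.T @ W
    xi = np.column_stack([pi2, np.eye(k2)]) @ (theta - theta0)   # xi_21^0
    return sigma_beta * sqrt_pinv(WtW) @ (WtW @ xi)

def has_power(W, pi2, theta, theta0, sigma_beta=1.0, tol=1e-8):
    """True when the factor driving power is nonzero (xi_21^0 outside the kernel)."""
    return bool(np.linalg.norm(mu_pi2_theta(W, pi2, theta, theta0, sigma_beta)) > tol)

rng = np.random.default_rng(0)
W = rng.standard_normal((50, 3))
theta0 = np.array([1.0, 0.0, 0.0, 0.0])
pi2 = np.zeros(3)                                 # irrelevant instruments
print(has_power(W, pi2, np.array([2.0, 0.5, 0.5, 0.5]), theta0))
print(has_power(W, pi2, np.array([2.0, 0.0, 0.0, 0.0]), theta0))
```

With $\pi_2 = 0$, the first call departs from the null in the $\gamma_2$ direction and the second only in the $\beta$ direction, which mirrors the discussion of irrelevant instruments above.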

4.2. Exact confidence sets

In this section, we develop a methodology to build CSs for $\theta_0$ and linear combinations of the elements of $\theta_0$. When $\theta_0$ is unknown, $\Psi^{(0)}_{AR}(S_+;\theta_0)$ is also unknown and the test procedure described in Remark 4.2-(iii) is not directly implementable. We stress that exact CSs can be obtained for model parameters by using test inversion techniques. In Section 4.2.1, we describe how to build joint CSs for $\theta_0$, while in Section 4.2.2, we deal with scalar linear transformations $w'\theta_0$, for some $w \neq 0$.

4.2.1. Joint confidence sets for θ

In Theorem 4.3, we show that the test that rejects $H_{\theta_0}$ when $\Psi_{AR}(S_+;\theta_0) > c_\Psi(\varepsilon;\alpha)$ is similar with significance level $\alpha$ for any identification strength $\pi_2$. So, we can invert $\Psi_{AR}(S_+;\theta_0)$ to obtain a joint CS with level $1-\alpha$ for $\theta_0$. More precisely, the generalized Anderson-Rubin-type CS for $\theta_0$ is given by:

$$C_\theta(\alpha) = \{\theta_0 : \Psi_{AR}(S_+;\theta_0) \le c_\Psi(\varepsilon;\alpha)\} = \{\theta_0 : Q(\theta_0) \le 0\}, \tag{4.12}$$

where $Q(\theta_0) = \theta_0' A \theta_0 + b'\theta_0 + c$ is a quadratic-linear form in $\theta_0$ such that $A = [y_2 : X_2]' H [y_2 : X_2]$, $b = -2[y_2 : X_2]' H y$, $c = y' H y$, and $H = M_{X_1} - \big[1 + c_\Psi(\varepsilon;\alpha)\big(\tfrac{\nu-\nu_1}{n-\nu}\big)\big] M_X$. Depending on the values of $A$, $b$, and $c$, the quadric surface $Q(\theta_0) = 0$ may take different forms: ellipsoid, paraboloid, hyperboloid, and cone. So, the confidence set $C_\theta(\alpha)$ may be unbounded; see Dufour and Taamouti (2005, Theorem 4.1). In particular, $C_\theta(\alpha)$ is unbounded when $A$ is not positive semi-definite. We will now focus on building CSs for $w'\theta_0$.
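As a rough sketch of how the quadric in (4.12) could be assembled from data, the following code builds $A$, $b$, and $c$ and evaluates membership in $C_\theta(\alpha)$. It assumes that $\nu$ and $\nu_1$ are the ranks of $X = [X_1 : X_2]$ and $X_1$, and it treats the critical value `crit` (standing in for $c_\Psi(\varepsilon;\alpha)$) as an input supplied by the user; the function names are hypothetical.

```python
import numpy as np

def annihilator(X):
    """M_X = I - X (X'X)^+ X'."""
    n = X.shape[0]
    return np.eye(n) - X @ np.linalg.pinv(X.T @ X) @ X.T

def joint_cs_quadric(y, y2, X1, X2, crit):
    """Return (A, b, c) such that the joint CS is {theta0 : theta0'A theta0 + b'theta0 + c <= 0}."""
    n = y.shape[0]
    X = np.column_stack([X1, X2])
    nu = np.linalg.matrix_rank(X)      # assumption: nu = rank(X)
    nu1 = np.linalg.matrix_rank(X1)    # assumption: nu1 = rank(X1)
    H = annihilator(X1) - (1.0 + crit * (nu - nu1) / (n - nu)) * annihilator(X)
    Y = np.column_stack([y2, X2])      # [y2 : X2]
    A = Y.T @ H @ Y
    b = -2.0 * Y.T @ H @ y
    c = float(y @ H @ y)
    return A, b, c

def in_joint_cs(theta0, A, b, c):
    """Check Q(theta0) <= 0."""
    return float(theta0 @ A @ theta0 + b @ theta0 + c) <= 0.0

def may_be_unbounded(A, tol=1e-10):
    """The CS is unbounded when A is not positive semi-definite."""
    return bool(np.linalg.eigvalsh(A).min() < -tol)
```

Since $Q(\theta_0)$ equals $u'Hu$ with $u = y - [y_2 : X_2]\theta_0$, the quadratic form can be cross-checked against a direct evaluation on the residuals.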

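4.2.2. Projection-based confidence sets for w′θ0

We use projection techniques¹⁷ to obtain CSs for the scalar linear transformation $w'\theta_0$. Let $h(\theta)$ be an arbitrary function of $\theta$, and $C_\theta(\alpha)$ be the joint CS for $\theta_0$ in (4.12). Since the event $\theta \in C_\theta(\alpha)$ entails $h(\theta) \in h[C_\theta(\alpha)]$, the set $h[C_\theta(\alpha)] = \{h(\theta) : \theta \in C_\theta(\alpha)\}$ is a confidence set with level (at least)¹⁸ $1-\alpha$ for $h(\theta)$. Conceptually, the confidence set with level (at least) $1-\alpha$ for $h(\theta_0) = w'\theta_0$, obtained by projecting $C_\theta(\alpha)$, is defined as:

$$C_{w'\theta}(\alpha) = h[C_\theta(\alpha)] = \{\zeta_0 : \zeta_0 = w'\theta_0 \text{ for some } \theta_0 \in C_\theta(\alpha)\} = \{\zeta_0 : \zeta_0 = w'\theta_0 \text{ s.t. } Q(\theta_0) \le 0\}. \tag{4.13}$$

Without any loss of generality, partition $w$ as $w = (w_1, w_2')'$, where $w_1 \neq 0$ is a scalar and $w_2$ is a $k_2 \times 1$ vector (possibly zero). Let

$$R = \begin{pmatrix} w' \\ R_2 \end{pmatrix} = \begin{pmatrix} w_1 & w_2' \\ 0 & I_{G+k_2-1} \end{pmatrix}$$

and define

$$\bar A = (R^{-1})' A R^{-1} = \begin{pmatrix} \bar a_{11} & \bar A_{21}' \\ \bar A_{21} & \bar A_{22} \end{pmatrix}, \qquad \bar b = (R^{-1})' b = \begin{pmatrix} \bar b_1 \\ \bar b_2 \end{pmatrix}, \tag{4.14}$$

where $A$ and $b$ are given in (4.12). Also, consider the spectral decomposition of $\bar A_{22}$ given by:

$$\bar A_{22} = P_2 \Lambda_2 P_2', \qquad \Lambda_2 = \mathrm{diag}(\lambda_1, \ldots, \lambda_{k_2}), \tag{4.15}$$

where $P_2 = [P_{21} : P_{22}]$ with $P_{21} : k_2 \times p_2$ and $P_{22} : k_2 \times (k_2 - p_2)$, and $\lambda_j$ are the eigenvalues of $\bar A_{22}$ with $\lambda_j \neq 0$ if $1 \le j \le p_2$, $\lambda_j = 0$ if $j > p_2$; and $p_2 = \mathrm{rank}(\bar A_{22})$. We can now state Theorem 4.8 on the analytic form of $C_{w'\theta}(\alpha)$ in (4.13).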

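Theorem 4.8 Suppose that (2.1) - (2.2), Assumptions A - B, and D are satisfied. Then, we have:

$$C_{w'\theta}(\alpha) = \begin{cases} \{\zeta_0 : \tilde a_1 \zeta_0^2 + \tilde b_1 \zeta_0 + \tilde c_1 \le 0\} \cup S_1 & \text{if } \bar A_{22} \neq 0 \text{ is p.s.d.}, \\ \{\zeta_0 : \tilde a_1 \zeta_0^2 + \tilde b_1 \zeta_0 + c \le 0\} \cup \{\zeta_0 : 2\bar A_{21}\zeta_0 + \bar b_2 \neq 0\} & \text{if } \bar A_{22} = 0, \\ \mathbb{R} & \text{otherwise;} \end{cases}$$

where $\tilde a_1 = \bar a_{11} - \bar A_{21}' \bar A_{22}^{+} \bar A_{21}$, $\tilde b_1 = \bar b_1 - \bar A_{21}' \bar A_{22}^{+} \bar b_2$, $\tilde c_1 = c - \tfrac{1}{4}\bar b_2' \bar A_{22}^{+} \bar b_2$, $S_1 = \emptyset$ if $\mathrm{rank}(\bar A_{22}) = k_2$, and $S_1 = \{\zeta_0 : P_{22}'(2\bar A_{21}\zeta_0 + \bar b_2) \neq 0\}$ if $1 \le \mathrm{rank}(\bar A_{22}) < k_2$.

¹⁷ See Dufour and Jasiak (2001) and Dufour and Taamouti (2005, 2007).

¹⁸ Observe that $P[h(\theta) \in h[C_\theta(\alpha)]] \ge P[\theta \in C_\theta(\alpha)] \ge 1-\alpha$, so that $h[C_\theta(\alpha)]$ has level at least $1-\alpha$.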

Remark 4.9 (i) First, we observe that Theorem 4.8 is similar to Theorem 4.1 in Dufour and Taamouti (2007), so we only give the guidelines of the proof in the appendix.

(ii) The theorem provides the analytical form of the CSs for any linear combination of the elements of $\theta_0$, but we find it useful to discuss the following two interesting applications in detail: (1) the CS for the structural coefficient $\beta_0$, and (2) instrument selection.

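1. CS for the structural coefficient β0

The CS for $\beta_0$ is obtained from Theorem 4.8 by choosing $w_1 = 1$ and $w_2 = 0$ in (4.14). In this case, we have $\bar a_{11} = y_2' H y_2$, $\bar A_{21} = W'y_2$, $\bar A_{22} = W'W$, $\bar b_1 = -2y_2' H y_1$, and $\bar b_2 = -2W'y_1$, where $H$ is given in (4.12) and $W = M_{X_1}X_2$. So, the CS for $\beta_0$ with level (at least) $1-\alpha$ is explicitly given by:

$$C_\beta(\alpha) = \begin{cases} \{\beta_0 : \tilde a_1\beta_0^2 + \tilde b_1\beta_0 + \tilde c_1 \le 0\} \cup S_1, & \text{if } W'W \neq 0 \text{ is p.s.d.}, \\ \{\beta_0 : \tilde a_1\beta_0^2 + \tilde b_1\beta_0 + c \le 0\} \cup S_2, & \text{if } W'W = 0, \\ \mathbb{R} & \text{if } W'W \text{ is not p.s.d.}, \end{cases} \tag{4.16}$$

where $\tilde a_1 = y_2'(H - P_W)y_2$, $\tilde b_1 = -2y_2'(H - P_W)y_1$, $\tilde c_1 = y_1'(H - P_W)y_1$, $P_W = W(W'W)^{+}W'$, $S_2 = \{\beta_0 : W'y_2\beta_0 - W'y_1 \neq 0\}$, and $S_1 = \emptyset$ if $\mathrm{rank}(W'W) = k_2$ and $S_1 = \{\beta_0 : P_{22}'(W'y_2\beta_0 - W'y_1) \neq 0\}$ if $1 \le \mathrm{rank}(W'W) < k_2$. So, the analytical form of $C_\beta(\alpha)$ in (4.16) can be given explicitly by examining the eigenvalues of the instrument matrix $W'W$. For example, if all eigenvalues of $W'W$ are positive, $C_\beta(\alpha)$ takes the form of the quadratic inequality, i.e., $C_\beta(\alpha) = \{\beta_0 : \tilde a_1\beta_0^2 + \tilde b_1\beta_0 + \tilde c_1 \le 0\}$.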
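The full-rank case of Theorem 4.8 can be sketched in a few lines. The code below is an illustration only: it projects the quadric $\theta_0'A\theta_0 + b'\theta_0 + c \le 0$ onto a single coordinate `j` (which amounts to $w = (1, 0')'$ after reordering), and it covers only the case where the complementary block is positive definite, so that $S_1 = \emptyset$. The function name is hypothetical. The numerical inputs are the joint CS coefficients reported in the first panel of Table 1.

```python
import numpy as np

def projected_quadratic(A, b, c, j=0):
    """Coefficients (a1, b1, c1) of the projected quadratic for coordinate j.

    Only the case where the complementary block A22 is positive definite
    is handled here (assumption of this sketch)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    rest = [i for i in range(len(b)) if i != j]
    a11, A21, A22 = A[j, j], A[rest, j], A[np.ix_(rest, rest)]
    if np.linalg.eigvalsh(A22).min() <= 0:
        raise ValueError("A22 is not positive definite; case not covered by this sketch")
    A22_inv = np.linalg.inv(A22)
    a1 = a11 - A21 @ A22_inv @ A21
    b1 = b[j] - A21 @ A22_inv @ b[rest]
    c1 = c - 0.25 * b[rest] @ A22_inv @ b[rest]
    return float(a1), float(b1), float(c1)

# Joint CS coefficients from the first panel of Table 1
A = [[41.437, 77.262, 167.026],
     [77.262, 697.101, 2.275],
     [167.026, 2.275, 500.517]]
b = [-21.193, -64.910, -52.519]
c = 1.886

a1, b1, c1 = projected_quadratic(A, b, c, j=0)   # project onto beta
print(round(a1, 3), round(b1, 3), round(c1, 3))
```

Up to rounding of the published inputs, the three coefficients agree with the projected quadratic for $\beta_0$ reported in Table 1; a negative leading coefficient with a negative discriminant means the inequality holds everywhere, so the projected set is the whole real line.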

2. Instrument selection

A second interesting application of Theorem 4.8 is instrument selection. Let $\gamma_2 = (\gamma_{21}, \ldots, \gamma_{2k_2}) \equiv (\gamma^0_{2p})_{1 \le p \le k_2}$. Since $\gamma^0_{2p} = 0$ entails that the variable $X_{2p}$ constitutes a valid instrument, the CS for $\gamma^0_{2p}$ provides a test of the validity of $X_{2p}$ for all $p = 1, \ldots, k_2$. Specifically, we select $X_{2p}$ as a valid IV if the CS of its coefficient ($\gamma^0_{2p}$) in the structural equation contains zero, i.e., $0 \in C_{\gamma^0_{2p}}(\alpha)$. If $0 \notin C_{\gamma^0_{2p}}(\alpha)$, $X_{2p}$ does not constitute a valid instrument.

We stress that instrument selection may still be meaningful, even though (4.16) provides a valid CS for the structural coefficient of interest $\beta$. For example, in empirical applications where not all instruments are weak, providing a procedure to select those that are valid may yield a consistent point estimate of $\beta$ that is relevant for policy analysis. We now show how to obtain $C_{\gamma^0_{2p}}(\alpha)$ from Theorem 4.8, for all $p = 1, \ldots, k_2$.

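(1) For each $p = 1, \ldots, k_2$, rearrange the parameters and data as follows:

$$\theta(p) = (\gamma^0_{2p}, \theta^{*}(p)')', \qquad \theta^{*}(p) = (\beta, \gamma_2^{*}(p)')', \qquad \gamma_2^{*}(p) = \gamma_2 \backslash \gamma^0_{2p}, \tag{4.17}$$

$$X_2^{(p)} = [y_2 : X_{2(p)}], \qquad X_{2(p)} = X_2 \backslash X_{2p}, \qquad W = M_{X_1}X_2 = [W_1, \ldots, W_p, \ldots, W_{k_2}], \tag{4.18}$$

where, by convention, we consider that $\gamma_2^{*}(p)$ is simply not present in (4.17) when $k_2 = 1$.

(2) Compute the quantities $a^{(p)}_{11} = W_p'W_p$, $A^{(p)}_{21} = X_2^{(p)\prime} W_p$, $A^{(p)}_{22} = X_2^{(p)\prime} H X_2^{(p)}$, $b^{(p)}_1 = -2W_p'y_1$, $b^{(p)}_2 = -2X_2^{(p)\prime} H y_1$, and $c^{(p)} = y_1'Hy_1$, as well as $\tilde a_{1p} = W_2' P_{HX_2^{(p)}} W_p$, $\tilde b_{1p} = -2W_p'(I_n - P_{HX_2^{(p)}})y_1$, and $\tilde c_{1p} = y_1'(H - P_{HX_2^{(p)}})y_1$, where $P_{HX_2^{(p)}} = HX_2^{(p)}\big(X_2^{(p)\prime} H X_2^{(p)}\big)^{+} X_2^{(p)\prime} H$.

(3) The CS $C_{\gamma^0_{2p}}(\alpha)$, $p = 1, \ldots, k_2$, is obtained by choosing $w \equiv w(p) = (1, 0')'$ and replacing $\bar a_{11}$, $\bar A_{21}$, $\bar A_{22}$, $\bar b_1$, $\bar b_2$, and $c$ by $a^{(p)}_{11}$, $A^{(p)}_{21}$, $A^{(p)}_{22}$, $b^{(p)}_1$, $b^{(p)}_2$, and $c^{(p)}$ in (4.16), respectively, i.e.:

$$C_{\gamma^0_{2p}}(\alpha) = \begin{cases} \{\gamma^0_{2p} : \tilde a_{1p}(\gamma^0_{2p})^2 + \tilde b_{1p}\gamma^0_{2p} + \tilde c_{1p} \le 0\} \cup S_{1p}, & \text{if } A^{(p)}_{22} \neq 0 \text{ is p.s.d.}, \\ \{\gamma^0_{2p} : \tilde a_{1p}(\gamma^0_{2p})^2 + \tilde b_{1p}\gamma^0_{2p} + \tilde c_{1p} \le 0\} \cup S_{2p}, & \text{if } A^{(p)}_{22} = 0, \\ \mathbb{R} & \text{if } A^{(p)}_{22} \text{ is not p.s.d.}, \end{cases} \tag{4.19}$$

where $S_{2p} = \{\gamma^0_{2p} : X_2^{(p)\prime} W_p\gamma^0_{2p} - X_2^{(p)\prime} H y_1 \neq 0\}$, $S_{1p} = \emptyset$ if $\mathrm{rank}(X_2^{(p)\prime} H X_2^{(p)}) = k_2$ and $S_{1p} = \{\gamma^0_{2p} : P_{22}'(X_2^{(p)\prime} W_p\gamma^0_{2p} - X_2^{(p)\prime} H y_1) \neq 0\}$ if $1 \le \mathrm{rank}(X_2^{(p)\prime} H X_2^{(p)}) < k_2$. Again, if $X_2^{(p)\prime} H X_2^{(p)}$ is positive definite, then $C_{\gamma^0_{2p}}(\alpha)$ takes the form of the quadratic inequality, i.e., $C_{\gamma^0_{2p}}(\alpha) = \{\gamma^0_{2p} : \tilde a_{1p}(\gamma^0_{2p})^2 + \tilde b_{1p}\gamma^0_{2p} + \tilde c_{1p} \le 0\}$.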
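The selection rule itself is simple once the marginal CSs are available. A minimal sketch follows, assuming each CS is represented as a list of closed intervals (with infinite endpoints allowed for unbounded sets); the function names and this representation are illustrative choices, and the intervals used in the example are the ones reported in the first panel of Table 1.

```python
import math

def contains_zero(intervals):
    """True if zero belongs to a CS given as a list of (lo, hi) closed intervals."""
    return any(lo <= 0.0 <= hi for lo, hi in intervals)

def select_valid_instruments(cs_by_instrument):
    """Keep the instruments whose CS for gamma_2p contains zero."""
    return [name for name, cs in cs_by_instrument.items() if contains_zero(cs)]

# Marginal CSs taken from the first panel of Table 1
cs_by_instrument = {
    "proximity to 2-year": [(0.01, 0.066)],
    "proximity to 4-year": [(-0.004, 0.023)],
}
selected = select_valid_instruments(cs_by_instrument)
print(selected)

# An unbounded CS (the whole real line) always contains zero
print(contains_zero([(-math.inf, math.inf)]))
```

Because the projected CSs have level at least $1-\alpha$, retaining an instrument here means only that its validity is not rejected.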

We will now illustrate our theory through a Monte Carlo experiment.


5. Simulation experiment

We use simulation to examine the performance of the proposed AR-test. The DGP¹⁹ is

$$y_{1t} = y_{2t}\beta + u_t, \qquad u_t = X_{2t}'\gamma_2 + e_t, \tag{5.1}$$

$$y_{2t} = m(X_{2t}, X_{3t}, v_{2t}; \pi_2, \delta), \qquad t = 1, \ldots, n, \tag{5.2}$$

where the reduced-form model for $y_{2t}$ uses two alternative specifications: (1) $m(X_{2t}, X_{3t}, v_{2t}; \pi_2, \delta) = X_{2t}'\pi_2 + X_{3t}'\delta + v_{2t}$, and (2) $m(X_{2t}, X_{3t}, v_{2t}; \pi_2, \delta) = \exp(X_{2t}'\pi_2 + X_{3t}'\delta) + v_{2t}$. The first specification is the usual linear model, while the second is nonlinear. $X_3$ is an $n \times 5$ matrix of instruments that belong to the true DGP, but are omitted in the inference (missing instruments). So, $\delta$ measures the degree of instrument omission in this setup. If $\delta = 0$, then no instrument is omitted, while $\delta \neq 0$ means relevant instruments are excluded. In this experiment, we set $\delta = \lambda\delta_0$, where $\delta_0$ is a $5 \times 1$ vector of ones and $\lambda$ varies in $\{0, 0.01, 0.1, 1\}$. For example, $\lambda = 0$ is a design of no instrument exclusion, $\lambda = 0.01$ is a design of weak instrument exclusion, $\lambda = 0.1$ is a design of moderately weak instrument exclusion, and $\lambda = 1$ is a design of strong instrument exclusion. $X_2$ contains $k_2 = 5$ instruments that violate the exclusion restrictions if $\gamma_2 \neq 0$. Each column of $X_2$ and $X_3$ is generated i.i.d. normal with identity covariance matrix. The reduced-form coefficient vector $\pi_2$ is chosen as $\pi_2 = \big(\frac{\mu^2}{n\|X_2\pi_0\|}\big)^{1/2}\pi_0$, where $\pi_0$ is a $5 \times 1$ vector of ones and $\mu^2$ is the concentration parameter which describes the strength of $X_2$. We vary $\mu^2$ in $\{0, 13, 1000\}$, where $\mu^2 = 0$ is a complete non-identification or irrelevant IVs setup, $\mu^2 = 13$ is a design of weak instruments, and $\mu^2 = 1000$ is for strong identification (strong instruments).²⁰ We set $\beta - \beta_0 = \beta^{*}$ and $\gamma_2 - \gamma_2^0 = \tau^{*}\gamma_2^0$, where $\beta_0 = 1$, $\gamma_2^0$ is a $5 \times 1$ vector of ones (so the IVs are invalid), and $\beta^{*}$ and $\tau^{*}$ vary in $\mathbb{R}$. In this setup, testing the null hypothesis $H_{\theta_0}$ is equivalent to testing whether $\beta^{*} = \tau^{*} = 0$. So, $\beta^{*} = \tau^{*} = 0$ in the graphs indicates the empirical size, while the values $\beta^{*} \neq 0$ and $\tau^{*} \neq 0$ indicate the empirical power of the test. To shorten the exposition, we only present the empirical power in the direction of $\tau^{*} = \beta^{*}/3$, but the results do not change qualitatively with alternative directions.
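The design in (5.1)-(5.2) can be sketched as follows. This is an illustrative simulation under stated assumptions, not the code used for the experiment: the function name and its arguments are hypothetical, the formula for $\pi_2$ is transcribed as written above, and when no errors are passed in, independent standard normal draws are used as a placeholder for the error distributions of the experiment.

```python
import numpy as np

def simulate_dgp(n=50, k2=5, mu2=1000.0, lam=0.0, beta=1.0, gamma2=None,
                 nonlinear=False, e=None, v2=None, seed=0):
    """Draw (y1, y2, X2, X3) from the design (5.1)-(5.2)."""
    rng = np.random.default_rng(seed)
    X2 = rng.standard_normal((n, k2))   # included (possibly invalid) instruments
    X3 = rng.standard_normal((n, 5))    # instruments omitted from the inference
    gamma2 = np.ones(k2) if gamma2 is None else np.asarray(gamma2, dtype=float)
    # Placeholder errors (assumption): independent N(0, 1) unless supplied
    e = rng.standard_normal(n) if e is None else np.asarray(e, dtype=float)
    v2 = rng.standard_normal(n) if v2 is None else np.asarray(v2, dtype=float)

    pi0 = np.ones(k2)
    pi2 = np.sqrt(mu2 / (n * np.linalg.norm(X2 @ pi0))) * pi0
    delta = lam * np.ones(5)            # delta = lambda * delta_0

    index = X2 @ pi2 + X3 @ delta
    y2 = (np.exp(index) if nonlinear else index) + v2   # reduced form (5.2)
    u = X2 @ gamma2 + e                                  # structural error in (5.1)
    y1 = y2 * beta + u
    return y1, y2, X2, X3

# Irrelevant instruments, no omitted instruments, linear reduced form
y1, y2, X2, X3 = simulate_dgp(mu2=0.0, lam=0.0)
print(y1.shape, y2.shape, X2.shape, X3.shape)
```

Setting `gamma2` to a vector of zeros gives valid instruments, while the default vector of ones corresponds to invalid ones.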

¹⁹ Note that there is no exogenous variable $X_1$ in (5.1)-(5.2), but the results do not change qualitatively if such exogenous variables are included.

²⁰ See Hansen et al. (2008) and Guggenberger (2010) for a similar parametrization.

We also consider two alternative specifications for the joint distribution of the errors $[e, v_2]$. In the first one, $(e_t, v_{2t})' \sim N\big[0, \sigma^2(X_2)\Sigma_\rho\big]$ for all $t = 1, \ldots, n$ (conditional Gaussian errors), where $\sigma^2(X_2) = \exp\big(\varpi\,\|k_2^{-1/2}X_2\|\big)$, $\Sigma_\rho = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$, $\rho$ varies in $\{0.2, 0.5, 0.9\}$, and $\varpi \in \{0, 1, -1\}$. In the second one, $(e_t, v_{2t})$ follow a multivariate $t(3)$ distribution with the same covariance matrix as in the first specification. In both cases, Assumptions D and E are satisfied. If $\varpi = 0$, the errors are homoskedastic, but they are heteroskedastic if $\varpi \in \{1, -1\}$. We use the exact Monte Carlo test critical values in all cases.
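The two error specifications can be sketched as below. Several choices here are assumptions made for illustration rather than details confirmed by the text: the norm in $\sigma^2(X_2)$ is taken row by row (one scale per observation $t$), the $t(3)$ draws use $\Sigma_\rho$ as the scale matrix, and the function name `draw_errors` is hypothetical.

```python
import numpy as np

def draw_errors(X2, rho=0.5, varpi=0.0, dist="normal", seed=0):
    """Draw (e, v2) with scale matrix sigma^2(X2) * Sigma_rho.

    dist = "normal": conditionally Gaussian errors.
    dist = "t3":     multivariate t(3) errors built as normal / sqrt(chi2(3) / 3).
    """
    rng = np.random.default_rng(seed)
    n, k2 = X2.shape
    Sigma = np.array([[1.0, rho], [rho, 1.0]])
    L = np.linalg.cholesky(Sigma)
    z = rng.standard_normal((n, 2)) @ L.T          # rows have covariance Sigma_rho
    if dist == "t3":
        chi2 = rng.chisquare(3, size=n)
        z = z / np.sqrt(chi2 / 3.0)[:, None]
    # sigma^2(X2_t) = exp(varpi * ||k2^{-1/2} X2_t||), so the standard deviation
    # is its square root (row-wise norm is an assumption of this sketch)
    sd = np.exp(0.5 * varpi * np.linalg.norm(X2 / np.sqrt(k2), axis=1))
    ev = z * sd[:, None]
    return ev[:, 0], ev[:, 1]

rng = np.random.default_rng(1)
X2 = rng.standard_normal((50, 5))
e, v2 = draw_errors(X2, rho=0.9, varpi=1.0, dist="t3")
print(e.shape, v2.shape)
```

With `varpi=0.0` the scale factor is identically one, which corresponds to the homoskedastic case.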

Figures 1 - 3 present the results. Figure 1 is about Gaussian heteroskedastic errors, while Figures 2 and 3 deal with homoskedastic and heteroskedastic $t(3)$-errors, respectively. In all figures, the power curves are drawn for each strength of the omitted instruments $X_3$ ($\lambda \in \{0, 0.01, 0.1, 1\}$). In each figure, the sub-figures (a) and (c) represent the cases in which all instruments in $X_2$ are irrelevant ($\mu^2 = 0$), whereas (b) and (d) describe strong instruments ($\mu^2 = 1000$).²¹ Meanwhile, the sub-figures (a) and (b) represent a linear specification of the reduced form for $y_2$, while those in (c) and (d) are the nonlinear specification. The sample size is set at $n = 50$, the nominal level at 5%, and the rejection frequencies are computed using $N = 10{,}000$ pseudo-samples.
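For reference, the decision rule of a Monte Carlo test can be sketched as follows. This is the standard Monte Carlo p-value (rank of the observed statistic among simulated ones), written with hypothetical function names; it assumes the simulated statistics are drawn under the null from a continuous distribution, and the test is exact when $\alpha(N_{MC}+1)$ is an integer, where $N_{MC}$ denotes the number of simulated statistics (a symbol introduced here for illustration).

```python
def mc_pvalue(stat_obs, stat_sim):
    """Monte Carlo p-value: (1 + #{simulated >= observed}) / (N_MC + 1)."""
    n_rep = len(stat_sim)
    n_ge = sum(1 for s in stat_sim if s >= stat_obs)
    return (n_ge + 1) / (n_rep + 1)

def mc_reject(stat_obs, stat_sim, alpha=0.05):
    """Reject the null when the Monte Carlo p-value does not exceed alpha."""
    return mc_pvalue(stat_obs, stat_sim) <= alpha

def rejection_frequency(decisions):
    """Share of pseudo-samples in which the null was rejected."""
    return sum(bool(d) for d in decisions) / len(decisions)

print(mc_pvalue(3.0, [1.0, 2.0, 4.0]))
```

In a size or power study, `mc_reject` would be applied to each pseudo-sample and `rejection_frequency` to the resulting list of decisions.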

First, we observe that in all cases (including heteroskedastic errors, nonlinear reduced form, and missing instruments) the rejection frequencies under $H_{\theta_0}$ are very close to the nominal 5% level (see $\beta^{*} = 0$ in all graphs). So, the proposed tests are robust to weak identification, heteroskedastic and possibly non-Gaussian errors, as well as nonlinearity and instrument exclusion in the reduced-form specification, thus confirming our theoretical findings in Section 4. Second, we note that all tests have good power in all cases considered despite the relatively small sample size ($n = 50$). In particular, the exclusion of relevant instruments in the inference does not substantially affect the power of the tests when identification is strong ($\mu^2 = 1000$), as shown by the power curves for different values of $\lambda$ in each sub-figure (b) and (d). However, instrument exclusion has a slight effect on test power in the absence of identification ($\mu^2 = 0$), as shown by the power curves for different values of $\lambda$ in each sub-figure (a) and (c). In addition, note that the proposed test has good power even with $t(3)$-type heteroskedastic errors and a nonlinear reduced form with missing instruments, thus confirming our theoretical conclusions in Section 4.1.

²¹ The case in which $\mu^2 = 13$ (weak instruments) is omitted to shorten the exposition.

Figure 1. Power of AR-test with heteroskedastic errors (σ = 0.1), n = 50

[Four-panel figure not reproduced. Vertical axis: rejection frequencies; horizontal axis: β∗. Legend: no IV exclusion (λ = 0); IV exclusion (λ = 0.01); IV exclusion (λ = 0.1).]

(a) Normal errors: µ2 = 0, m(X2, X3; Π) = X2Π2 + X3 ∗ λδ0

(b) Normal errors: µ2 = 10^3, m(X2, X3; Π) = X2Π2 + X3 ∗ λδ0

(c) Normal errors: µ2 = 0, m(X2, X3; Π) = exp(X2Π2 + X3 ∗ λδ0)

(d) Normal errors: µ2 = 10^3, m(X2, X3; Π) = exp(X2Π2 + X3 ∗ λδ0)

Figure 2. Power of exact Monte Carlo AR-test with homoskedastic errors (σ = 0), n = 50

[Four-panel figure not reproduced. Vertical axis: rejection frequencies; horizontal axis: β∗. Recoverable legend entry: IV exclusion (λ = 0.1).]

(a) t(3)-errors: µ2 = 0, m(X2, X3; Π) = X2Π2 + X3 ∗ λδ0

(b) t(3)-errors: µ2 = 10^3, m(X2, X3; Π) = X2Π2 + X3 ∗ λδ0

(c) t(3)-errors: µ2 = 0, m(X2, X3; Π) = exp(X2Π2 + X3 ∗ λδ0)

(d) t(3)-errors: µ2 = 1000, m(X2, X3; Π) = exp(X2Π2 + X3 ∗ λδ0)

Figure 3. Power of exact Monte Carlo AR-test with heteroskedastic errors (σ = 0.1), n = 50

[Four-panel figure not reproduced. Vertical axis: rejection frequencies; horizontal axis: β∗. Recoverable legend entry: IV exclusion (λ = 0.1).]

(a) t(3)-errors: µ2 = 0, m(X2, X3; Π) = X2Π2 + X3 ∗ λδ0

(b) t(3)-errors: µ2 = 10^3, m(X2, X3; Π) = X2Π2 + X3 ∗ λδ0

(c) t(3)-errors: µ2 = 0, m(X2, X3; Π) = exp(X2Π2 + X3 ∗ λδ0)

(d) t(3)-errors: µ2 = 1000, m(X2, X3; Π) = exp(X2Π2 + X3 ∗ λδ0)

6. Empirical application

We apply the proposed methods to the Card (1995) model of the returns to education and earnings. The version of this model after controlling for possible instrument invalidity is:

$$\log(\textit{wage}) = \beta\,\textit{educ} + X_1'\gamma_1 + X_2'\gamma_2 + e, \tag{6.1}$$

$$\textit{educ} = X_1'\pi_1 + X_2'\pi_2 + v_2, \tag{6.2}$$

where wage is earnings, educ is the length of education (schooling), and $X_1 = [1, \textit{exper}, \textit{exper}^2, \textit{race}, \textit{smsa66}, \textit{south66}, \textit{IQ}]$ consists of a constant, experience variables, and indicator variables for race, residence in a metropolitan area, residence in the south of the United States, and IQ score. The instrument matrix $X_2$ consists of the proximity-to-college indicators for educational attainment; these are proximity to 2- and 4-year colleges. Hence, we have $\gamma_2 = (\gamma_{21}, \gamma_{22})' \in \mathbb{R}^2$. The original specification in Card (1995) imposes the exclusion restrictions, i.e., $\gamma_2 = 0$. In recent years, several studies have raised concerns about the validity of the proximity to 2- and 4-year college indicators as instruments for educ; for example, see Slichter (2013, Section 5). Here, we allow $\gamma_2 \neq 0$ and we use the method proposed in this paper to build a joint confidence set for $\theta = (\beta, \gamma_{21}, \gamma_{22})'$ and marginal confidence sets for $\beta$, $\gamma_{21}$, and $\gamma_{22}$. Moreover, Kleibergen (2004, Table 2, p. 421) shows that the proximity-to-college indicator instruments are not very strong. So, it is important to use statistical procedures that are robust to both weak and invalid instruments for inference in model (6.1)-(6.2).

The data analyzed are from the National Longitudinal Survey of Young Men (from 1966 to 1981). We use the cross-sectional 1976 subsample, which contains 3010 observations. The variables contained in the data set are: two variables indicating the proximity to college, the length of education, log wages, experience, age, and racial, metropolitan, family, and regional indicators.

If we impose the exclusion restrictions ($\gamma_2 = 0$), the identification-robust confidence set with level 95% for the returns to education ($\beta$) that results from inverting $AR(\beta_0)$ is given by²² $C_\beta(\alpha) = \{\beta_0 : 41.437\beta_0^2 - 21.193\beta_0 + 1.886 \le 0\} = [11.47\%, 39.67\%]$ when IQ score is used in $X_1$ and $C_\beta(\alpha) = \{\beta_0 : 24.239\beta_0^2 - 11.449\beta_0 + 0.978 \le 0\} = [11.20\%, 36.04\%]$ when IQ score is not part of $X_1$, where the explicit forms of the confidence intervals are obtained by projection. If the exclusion restrictions imposed were satisfied, the fact that $C_\beta(\alpha)$ is very wide should indicate identification problems. But this conclusion may go too far if $\gamma_2 \neq 0$, because the results are not valid, as extensively discussed in this paper. We now focus on the case where $\gamma_2$ is left unrestricted.

²² The results reported are based on the critical values of the $F$-distribution, but the results are similar when the asymptotic $\chi^2$ critical values are used.
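The intervals above follow from solving the two quadratic inequalities. The short sketch below, with a hypothetical helper `quadratic_cs`, returns the solution set of $a x^2 + b x + c \le 0$ as a list of intervals and reproduces the reported bounds from the published coefficients.

```python
import math

def quadratic_cs(a, b, c):
    """Describe {x : a*x**2 + b*x + c <= 0} as a list of (lo, hi) intervals."""
    inf = math.inf
    if a == 0:
        if b == 0:
            return [(-inf, inf)] if c <= 0 else []
        root = -c / b
        return [(-inf, root)] if b > 0 else [(root, inf)]
    disc = b * b - 4 * a * c
    if disc < 0:
        # No real root: the parabola never changes sign
        return [] if a > 0 else [(-inf, inf)]
    r1 = (-b - math.sqrt(disc)) / (2 * a)
    r2 = (-b + math.sqrt(disc)) / (2 * a)
    lo, hi = min(r1, r2), max(r1, r2)
    # Convex parabola: bounded interval; concave parabola: union of two rays
    return [(lo, hi)] if a > 0 else [(-inf, lo), (hi, inf)]

for coefs in [(41.437, -21.193, 1.886), (24.239, -11.449, 0.978)]:
    (lo, hi), = quadratic_cs(*coefs)
    print(f"[{100 * lo:.2f}%, {100 * hi:.2f}%]")
```

The same helper covers the unbounded cases discussed in Section 4.2: a negative leading coefficient with a negative discriminant yields the whole real line.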

Table 1 reports the joint confidence set with level 95% for $\theta = (\beta, \gamma_{21}, \gamma_{22})'$ based on inverting $AR(\beta_0, \gamma_2^0)$ and the marginal confidence sets for each parameter, obtained by projection, with or without the IQ score variable.²³ We observe that the results are similar with or without the IQ score variable. First, the confidence interval for the returns to education is unbounded in both cases, thus indicating that $\beta$ is not identifiable after instrument endogeneity is controlled for. Meanwhile, the confidence intervals for both instrument endogeneity parameters ($\gamma_{21}$ and $\gamma_{22}$) are bounded and not very wide. Second, the confidence interval for the coefficient on the proximity to 2-year college indicator instrument ($\gamma_{21}$) does not include zero, with or without IQ score. This suggests that this instrument violates the exclusion restrictions (invalid instrument). So, imposing $\gamma_{21} = 0$, as is usually done in most applications, may be problematic since even a slight correlation between the instruments and errors can be detrimental to statistical inference; see Doko Tchatoka and Dufour (2008) and Guggenberger (2011), among others. However, the projection method fails to reject that the proximity to 4-year college indicator instrument satisfies the exclusion restrictions. But the lower bound of the confidence interval for $\gamma_{22}$ is close to zero without the IQ score included in $X_1$, and its upper bound is close to zero when the IQ score is controlled for.

Overall, this application suggests that the proximity-to-college indicator instruments are not strictly exogenous. It is therefore important to use statistical procedures that are robust to instrument endogeneity when conducting inference in this model.

²³ Note that by definition, the marginal confidence sets have level at least 95%.

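Table 1. Identification-robust confidence intervals in Card model of education and earnings with γ2 unrestricted

X1 does not include IQ score

Joint CS (level 95%):

$$C_\theta(\alpha) = \{\theta_0 : \theta_0' A \theta_0 - (21.193,\ 64.910,\ 52.519)\,\theta_0 + 1.886 \le 0\},$$

$$A = \begin{pmatrix} 41.437 & 77.262 & 167.026 \\ 77.262 & 697.101 & 2.275 \\ 167.026 & 2.275 & 500.517 \end{pmatrix}, \qquad \theta_0 = (\beta_0, \gamma^0_{21}, \gamma^0_{22})'$$

Projected CS (level ≥ 95%):

education: $C_\beta(\alpha) = \{\beta_0 : -22.697\beta_0^2 + 3.431\beta_0 - 0.992 \le 0\} = \mathbb{R}$

proximity to 2-year: $C_{\gamma_{21}}(\alpha) = \{\gamma^0_{21} : 1.106(\gamma^0_{21})^2 - 0.084\gamma^0_{21} + 0.001 \le 0\} = [0.01,\ 0.066]$

proximity to 4-year: $C_{\gamma_{22}}(\alpha) = \{\gamma^0_{22} : 2.434(\gamma^0_{22})^2 - 0.047\gamma^0_{22} - 0.0002 \le 0\} = [-0.004,\ 0.023]$

X1 includes IQ score

Joint CS (level 95%):

$$C_\theta(\alpha) = \{\theta_0 : \theta_0' A \theta_0 - (11.449,\ 54.325,\ 24.421)\,\theta_0 + 0.978 \le 0\},$$

$$A = \begin{pmatrix} 24.239 & 63.343 & 109.195 \\ 63.343 & 473.430 & 28.928 \\ 109.195 & 28.928 & 344.378 \end{pmatrix}$$

Projected CS (level ≥ 95%):

education: $C_\beta(\alpha) = \{\beta_0 : -16.615\beta_0^2 + 2.306\beta_0 - 0.905 \le 0\} = \mathbb{R}$

proximity to 2-year: $C_{\gamma_{21}}(\alpha) = \{\gamma^0_{21} : 7.536(\gamma^0_{21})^2 - 0.716\gamma^0_{21} + 0.009 \le 0\} = [0.014,\ 0.081]$

proximity to 4-year: $C_{\gamma_{22}}(\alpha) = \{\gamma^0_{22} : 1.305(\gamma^0_{22})^2 + 0.036\gamma^0_{22} - 0.0004 \le 0\} = [-0.035,\ 0.008]$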

7. Conclusion

In this paper, we studied the possibility of building tests and confidence sets in IV regressions where instrumental variables can be arbitrarily weak, collinear, and violate the exclusion restrictions. We showed that a procedure similar to the Anderson and Rubin (1949) (AR) approach can be used to provide identification-robust tests and CSs for the structural and instrument endogeneity parameters. Then, we used the projection method to obtain identification-robust CSs for scalar linear combinations of the elements of these parameters. CSs for the structural coefficient β and tests of exclusion restrictions (instrument selection) are derived as special cases of the proposed projection method.

We present a Monte Carlo experiment that confirms our theoretical findings. The proposed methods are illustrated through the well-known model of the returns to education and earnings [see Card (1995)]. The results clearly indicate that the proximity-to-college instruments used in most applications involving this model are not strictly exogenous.

References

Anderson, T. W., 1971. The Statistical Analysis of Time Series. Wiley, New York.

Anderson, T. W., Rubin, H., 1949. Estimation of the parameters of a single equation in a

complete system of stochastic equations. Annals of Mathematical Statistics 20, 46–

63.

Andrews, D. W. K., Moreira, M. J., Stock, J. H., 2006. Optimal two-sided invariant similar tests for instrumental variables regression. Econometrica 74(3), 715–752.

Andrews, D. W. K., Stock, J. H., 2007a. Inference with weak instruments. In: R. Blundell,

W. Newey, T. Pearson, eds, Advances in Economics and Econometrics, Theory and

Applications, 9th Congress of the Econometric Society Vol.3. Cambridge University

Press, Cambridge, U.K., chapter 6.

Andrews, D. W. K., Stock, J. H., 2007b. Testing with many weak instruments. Journal of Econometrics 138, 24–46.

Ashley, R., 2009. Assessing the credibility of instrumental variables inference with imperfect instruments via sensitivity analysis. Journal of Applied Econometrics 24(2), 325–337.

Basmann, R. L., 1960. On the asymptotic distributions of generalized classical linear esti-

mators. Econometrica 28, 97–107.

Bazzi, S., Clemens, A. M., 2009. Blunt instruments: A cautionary note on establishing the causes of economic growth. Technical report, Center for Global Development No. 171.

Bekker, P., 1994. Alternative approximations to the distributions of instrumental variable

estimators. Econometrica 62, 657–681.

Berkowitz, D., Caner, M., Fang, Y., 2008. Are nearly exogenous instruments reliable? Economics Letters 101, 20–23.

Berkowitz, D., Caner, M., Fang, Y., 2012. The validity of instruments revisited. Journal of


Bhattacharya, R. N., Ghosh, J., 1978. On the validity of the formal Edgeworth expansion.

The Annals of Statistics 6, 434–451.

Bhattacharya, R. N., Rao, R., 1976. Normal approximation and asymptotic expansions. In: R. Bhattacharya, R. Rao, eds, Normal Approximation and Asymptotic Expansions. Wiley Series in Probability and Mathematical Analysis, New York.

Bound, J., Jaeger, D. A., Baker, R. M., 1995. Problems with instrumental variables estima-

tion when the correlation between the instruments and the endogenous explanatory

variable is weak. Journal of the American Statistical Association 90, 443–450.

Brock, W., Durlauf, S., 2001. Growth empirics and reality. The World Bank Economic Review 15(2), 229–272.

Card, D., 1995. Using geographic variation in college proximity to estimate the return to schooling. In: L. N. Christofides, E. K. Grant, R. Swidinsky, eds, Aspects of Labour Market Behaviour: Essays in Honour of John Vanderkamp. University of Toronto Press, Toronto, Canada.

Chaudhuri, S., Zivot, E., 2010. A new method of projection based inference in GMM with

weakly identified nuisance parameters. Technical report, Department of Economics,

New York University N.Y.

Choi, I., Phillips, P. C. B., 1992. Asymptotic and finite sample distribution theory for IV es-

timators and tests in partially identified structural equations. Journal of Econometrics

51, 113–150.

Doko Tchatoka, F., 2013. Specification tests with weak and invalid instruments. Technical

report, School of Economics and Finance, University of Tasmania Hobart, Australia.

Doko Tchatoka, F., 2014. Subset hypotheses testing and instrument exclusion in the linear

IV regression. Econometric Theory forthcoming.

Doko Tchatoka, F., Dufour, J.-M., 2008. Instrument endogeneity and identification-robust tests: some analytical results. Journal of Statistical Planning and Inference 138(9), 2649–2661.

Doko Tchatoka, F., Dufour, J.-M., 2014. Identification-robust inference for endogeneity parameters in linear structural models. The Econometrics Journal 17, 165–187.

Donald, S. G., Newey, W. K., 2001. Choosing the number of instruments. Econometrica

69, 1161–1191.

Dufour, J.-M., 1997. Some impossibility theorems in econometrics, with applications to

structural and dynamic models. Econometrica 65, 1365–1389.

Dufour, J.-M., 2003. Identification, weak instruments and statistical inference in economet-

rics. Canadian Journal of Economics 36(4), 767–808.

Dufour, J.-M., 2006. Monte Carlo tests with nuisance parameters: A general approach to finite-sample inference and nonstandard asymptotics in econometrics. Journal of

Dufour, J.-M., 2009. Comments on “Weak instrument robust tests in GMM and the new Keynesian Phillips curve” by F. Kleibergen and S. Mavroeidis. Journal of Business and Economic Statistics 27, 318–321.

Dufour, J.-M., Hsiao, C., 2008. Identification. In: L. E. Blume, S. N. Durlauf, eds, The New Palgrave Dictionary of Economics, second edn. Palgrave Macmillan, Basingstoke, Hampshire, England. forthcoming.

Dufour, J.-M., Jasiak, J., 2001. Finite sample limited information inference methods for

structural equations and models with generated regressors. International Economic

Review 42, 815–843.

Dufour, J.-M., Khalaf, L., Beaulieu, M.-C., 2010. Multivariate residual-based finite-sample

tests for serial dependence and GARCH with applications to asset pricing models.

Journal of Applied Econometrics 25, 263–285.

Dufour, J.-M., Khalaf, L., Kichian, M., 2013. Identification-robust analysis of DSGE and structural macroeconomic models. Journal of Monetary Economics 60, 340–350.

Dufour, J.-M., Taamouti, M., 2005. Projection-based statistical inference in linear structural

models with possibly weak instruments. Econometrica 73(4), 1351–1365.

Dufour, J.-M., Taamouti, M., 2007. Further results on projection-based inference in IV regressions with weak, collinear or missing instruments. Journal of Econometrics 139(1), 133–153.

Guggenberger, P., 2010. The impact of a Hausman pretest on the size of the hypothesis tests. Econometric Theory 156, 337–343.

Guggenberger, P., 2011. On the asymptotic size distortion of tests when instruments locally

violate the exogeneity assumption. Econometric Theory forthcoming.

Guggenberger, P., Kleibergen, F., Mavroeidis, S., Chen, L., 2012. On the asymptotic sizes of subset Anderson-Rubin and Lagrange multiplier tests in linear instrumental variables regression. Econometrica 80(6), 2649–2666.

Guggenberger, P., Smith, R., 2005. Generalized empirical likelihood estimators and tests under partial, weak and strong identification. Econometric Theory 21, 667–709.

Hahn, J., Ham, J., Moon, H. R., 2010. The Hausman test and weak instruments. Journal of


Hall, A. R., Peixe, F. P. M., 2003. A consistent method for the selection of relevant instruments. Econometric Reviews 2(3), 269–287.

Hall, A. R., Rudebusch, G. D., Wilcox, D. W., 1996. Judging instrument relevance in instrumental variables estimation. International Economic Review 37, 283–298.

Hansen, C., Hausman, J., Newey, W., 2008. Estimation with many instrumental variables.

Journal of Business and Economic Statistics 26(4), 398–422.

Hansen, L. P., 1982. Large sample properties of generalized method of moments estimators. Econometrica 50, 1029–1054.

Hausman, J., Hahn, J., 2005. Estimation with valid and invalid instruments. Annales d’Économie et de Statistique 79–80, 25–57.

Imbens, G. W., 2003. Sensitivity to exogeneity assumptions in program evaluation. American Economic Review 93(2), 126–132.

Imbens, G. W., Kolesár, M., Chetty, R., Friedman, J., Glaeser, E., 2011. Inference and identification with many invalid instruments. Technical report, Department of Economics, Harvard University Boston, MA.

Kiviet, J. F., Niemczyk, J., 2007. The asymptotic and finite-sample distributions of OLS

and simple IV in simultaneous equations. Computational Statistics and Data Analysis

51, 3296–3318.

Kiviet, J. F., Niemczyk, J., 2012. Comparing the asymptotic and empirical (un)conditional distributions of OLS and IV in a linear static simultaneous equation. Computational Statistics and Data Analysis 56, 3567–3586.

Kleibergen, F., 2002. Pivotal statistics for testing structural parameters in instrumental variables regression. Econometrica 70(5), 1781–1803.

Kleibergen, F., 2004. Testing subsets of structural coefficients in the IV regression model.

Review of Economics and Statistics 86, 418–423.

Kleibergen, F., 2005. Testing parameters in GMM without assuming that they are identified.

Econometrica 73, 1103–1124.

Kraay, A., 2008. Instrumental variable regressions with honestly uncertain exclusion restrictions. Technical report, World Bank Washington, DC.

Magnus, J. R., Neudecker, H., 1999. Matrix Differential Calculus with Applications in Statistics and Econometrics, Revised Edition. John Wiley & Sons, New York.

Mikusheva, A., 2010. Robust confidence sets in the presence of weak instruments. Journal

of Econometrics 157, 236–247.

Mikusheva, A., 2013. Survey on statistical inferences in weakly-identified instrumental

variable models. Applied Econometrics 29(1), 117–131.

Moreira, M. J., 2003. A conditional likelihood ratio test for structural models. Econometrica

71(4), 1027–1048.

Moreira, M. J., Porter, J., Suarez, G., 2009. Bootstrap validity for the score test when

instruments may be weak. Journal of Econometrics 149, 52–64.

Muirhead, R. J., 2005. Aspects of Multivariate Statistical Theory. John Wiley & Sons, Inc.,

Hoboken, New Jersey.

Murray, P. M., 2006. Avoiding invalid instruments and coping with weak instruments. The

Journal of Economic Perspectives 20(4), 111–132.

Nelson, C., Startz, R., 1990a. The distribution of the instrumental variable estimator and its

t-ratio when the instrument is a poor one. The Journal of Business 63, 125–140.

Nelson, C., Startz, R., 1990b. Some further results on the exact small sample properties of the

instrumental variable estimator. Econometrica 58, 967–976.


Phillips, G., Hale, C., 1977. The bias of instrumental variable estimators of simultaneous

equation systems. International Economic Review 18(1), 219–228.

Phillips, P. C. B., 1989. Partially identified econometric models. Econometric Theory

5, 181–240.

Revankar, N. S., Hartley, M. J., 1972. An independence test and conditional unbiased pre-

dictions in the context of simultaneous equation systems. Econometrica 40(5), 913–

915.

Sargan, J., 1958. The estimation of economic relationships using instrumental variables.

Econometrica 26(3), 393–415.

Slichter, D., 2013. Testing instrument validity and identification with invalid instruments.

Technical report, Department of Economics, University of Rochester, Rochester, USA.

Staiger, D., Stock, J. H., 1997. Instrumental variables regression with weak instruments.

Econometrica 65(3), 557–586.

Stock, J. H., Wright, J. H., 2000. GMM with weak identification. Econometrica 68, 1055–

1096.

Stock, J. H., Wright, J. H., Yogo, M., 2002. A survey of weak instruments and weak identification in generalized method of moments. Journal of Business and Economic

Statistics 20(4), 518–529.

Stock, J. H., Yogo, M., 2005. Testing for weak instruments in linear IV regression. In: D. W.

Andrews, J. H. Stock, eds, Identification and Inference for Econometric Models: Es-

says in Honor of Thomas Rothenberg. Cambridge University Press, Cambridge, U.K.,

chapter 6, pp. 80–108.

Swanson, N. R., Chao, J. C., 2005. Notes and comments: Consistent estimation with a

large number of weak instruments. Econometrica 73, 1673–1692.

Wang, J., Zivot, E., 1998. Inference on structural parameters in instrumental variables

regression with weak instruments. Econometrica 66(6), 1389–1404.


APPENDIX

A. Proofs

PROOF OF PROPOSITION 3.1 Suppose that (2.1)-(2.2) and Assumptions A-C hold. Then, we have $E(W_t v_{1t}) = 0$, where $v_{1t} = y_t - X_{1t}'\xi_{11} - X_{2t}'\xi_{21}$ by (3.1). So, we have

\[
E[W_t(y_t - X_{1t}'\xi_{11} - X_{2t}'\xi_{21})] = E(W_t y_t) - E(W_t W_t')\xi_{21} = 0 \tag{A.1}
\]

\[
\Leftrightarrow\ \sigma_{Wy} - \Omega_W\gamma_2 - \Omega_W\pi_2\beta = 0 \quad \text{because } \xi_{21} = \gamma_2 + \pi_2\beta \text{ from (3.1)}, \tag{A.2}
\]

where $\sigma_{Wy} = E(W_t y_t)$ and $\Omega_W = E(W_t W_t')$. We want to solve (A.2) with respect to $\beta$. To do this, we find it useful to distinguish two cases: (a) $\pi_2 \neq 0$, and (b) $\pi_2 = 0$.

(a) Suppose first that $\pi_2 \neq 0$. Then, pre-multiplying both sides of (A.2) by $\pi_2'$ does not change the solution with respect to $\beta$ (if a solution exists). So, the system

\[
\pi_2'\sigma_{Wy} - \pi_2'\Omega_W\gamma_2 = \pi_2'\Omega_W\pi_2\beta \tag{A.3}
\]

and (A.2) are equivalent. Since $\gamma_2$ is left unrestricted, a unique solution with respect to $\beta$ in (A.3), which does not depend on $\gamma_2$, exists if, and only if, $\pi_2'\Omega_W\gamma_2 = 0$ and $\pi_2'\Omega_W\pi_2 \neq 0$, i.e., if, and only if, $\gamma_2 \in \ker(\pi_2'\Omega_W)$ and $\pi_2 \notin \ker(\Omega_W)$. In this case we have $\beta = (\pi_2'\Omega_W\pi_2)^{-1}\pi_2'\sigma_{Wy}$, which is identifiable under Assumptions A-C because both $\pi_2'\sigma_{Wy}$ and $\pi_2'\Omega_W\pi_2$ can be estimated from the conditional means of $y_1$ and $y_2$, given $X$, in the regressions in (3.1), even if $\mathrm{rank}(X_2) < k_2$; see Magnus and Neudecker (1999, Ch. 13, Theorem 15) and the discussion in the last paragraph above Proposition 3.1.

(b) Suppose now that $\pi_2 = 0$. Then, (A.2) has multiple solutions or does not have any solution for $\beta$, including when $\gamma_2 \in \ker(\pi_2'\Omega_W)$. So, $\beta$ cannot be identified.

Proposition 3.1 follows straightforwardly by putting (a) and (b) together.
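The identification argument in case (a) can be checked numerically at the population level. The sketch below is only an illustration under stated assumptions: a single endogenous regressor (so $\beta$ is a scalar), two instruments, and made-up values for $\Omega_W$, $\pi_2$ and $\beta$. None of these numbers come from the paper. It builds $\sigma_{Wy}$ from (A.2) and applies the formula from (A.3), once with $\gamma_2$ in $\ker(\pi_2'\Omega_W)$ and once with a generic $\gamma_2$.

```python
# Population check of the identification formula. Assumptions (not from the
# paper): scalar beta, two instruments, illustrative numerical values.

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

omega_w = [[2.0, 0.5], [0.5, 1.0]]   # Omega_W = E(W_t W_t'), symmetric positive definite
pi2 = [1.0, -1.0]                    # first-stage coefficients, pi2 != 0
beta = 0.7                           # structural coefficient to recover

# Omega_W is symmetric, so Omega_W pi2 is the transpose of the row pi2' Omega_W.
row = matvec(omega_w, pi2)
gamma2_valid = [row[1], -row[0]]     # orthogonal to row, hence in ker(pi2' Omega_W)
gamma2_bad = [0.3, 0.0]              # generic direction outside the kernel

def beta_from_moments(gamma2):
    # sigma_Wy implied by (A.2): sigma_Wy = Omega_W gamma2 + Omega_W pi2 beta
    og = matvec(omega_w, gamma2)
    op = matvec(omega_w, pi2)
    sigma_wy = [og[i] + op[i] * beta for i in range(2)]
    # Formula from (A.3): (pi2' Omega_W pi2)^{-1} pi2' sigma_Wy
    return dot(pi2, sigma_wy) / dot(pi2, op)

beta_valid = beta_from_moments(gamma2_valid)
beta_bad = beta_from_moments(gamma2_bad)
print(beta_valid, beta_bad)
```

With $\gamma_2$ in the kernel the formula returns the true $\beta$. With the generic $\gamma_2$ it returns $\beta$ plus the term $(\pi_2'\Omega_W\pi_2)^{-1}\pi_2'\Omega_W\gamma_2$, which is why the kernel condition is needed.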

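PROOF OF LEMMA 4.1 (a) Suppose that $H_{\theta_0}$ holds. From (4.1), we have $Y b_0 = e$ and $b_0'\Omega b_0 = e'M_X e/(n-\nu)$, so that

\[
S^{+} = [(W'W)^{+}]^{1/2}\, W'e \left(\frac{e'M_X e}{n-\nu}\right)^{-1/2} = S^{+}_{e} \quad \text{from (4.6)}. \tag{A.4}
\]

It is clear from (A.4) that the distribution of $S^{+}$ under $H_{\theta_0}$ is identical to that of $S^{+}_{e}$, i.e., $P^{0}_{S^{+}}(x;\theta_0) = P_{S^{+}_{e}}(x,e)$ given $X = x$, as stated. We now prove the invariance of $P_{S^{+}_{e}}(x,e)$.

(b) For any $\sigma(X)$ satisfying (2.4), we have

\[
W'e \left(\frac{e'M_X e}{n-\nu}\right)^{-1/2}
= W'\sigma(X)e \left(\frac{\sigma(X)e'M_X\sigma(X)e}{n-\nu}\right)^{-1/2}
= W'\varepsilon_{\sigma}(e) \left(\frac{\varepsilon_{\sigma}(e)'M_X\varepsilon_{\sigma}(e)}{n-\nu}\right)^{-1/2}.
\]

So, we have

\[
\begin{aligned}
S^{+}_{e} &= [(W'W)^{+}]^{1/2}\, W'e \left(\frac{e'M_X e}{n-\nu}\right)^{-1/2} \\
&= [(W'W)^{+}]^{1/2}\, W'\varepsilon_{\sigma}(e) \left(\frac{\varepsilon_{\sigma}(e)'M_X\varepsilon_{\sigma}(e)}{n-\nu}\right)^{-1/2} \\
&= S^{+}_{\varepsilon_{\sigma}} \ \Leftrightarrow\ P_{S^{+}_{e}}(x,e) = P_{S^{+}_{\varepsilon_{\sigma}}}(x,e) \ \text{given } X = x.
\end{aligned} \tag{A.5}
\]

So, $P_{S^{+}_{e}}(x,e)$ is invariant to the transformation (2.4). If further Assumption D holds, we have $\varepsilon_{\sigma} \overset{d}{\sim} \varepsilon$, where given $X = x$, the distribution of $\varepsilon$, $P_{\varepsilon}(x)$, is completely specified. Therefore, given $X = x$, we have $P_{S^{+}_{e}}(x,e) \equiv P_{S^{+}_{\varepsilon}}(x,\varepsilon)$, where $\varepsilon \sim P_{\varepsilon}(x)$ and $P_{\varepsilon}(x)$ is completely specified.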
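The invariance in part (b) comes down to the statistic being homogeneous of degree zero in $e$. A minimal numerical sketch follows, under assumptions that are not taken from the paper: the transformation (2.4) is treated as a positive scalar rescaling, $X$ has two columns (a constant and one instrument), and $W$ is taken to be the instrument residualized on the constant, so that $W$ is a single column and $[(W'W)^{+}]^{1/2}$ reduces to $1/\sqrt{W'W}$.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def resid(v, x):
    """Residual of v after projecting on the single column x."""
    coef = dot(v, x) / dot(x, x)
    return [a - coef * b for a, b in zip(v, x)]

rng = random.Random(0)
n, nu = 50, 2                                  # nu = number of columns of X
x1 = [1.0] * n                                 # included exogenous regressor (constant)
x2 = [rng.gauss(0.0, 1.0) for _ in range(n)]   # single instrument
w = resid(x2, x1)                              # hypothetical W = M_{X1} X2

def s_plus(e):
    """[(W'W)^+]^{1/2} W'e (e'M_X e / (n - nu))^{-1/2} for a one-column W."""
    m_x_e = resid(resid(e, x1), w)             # M_X e, valid because w is orthogonal to x1
    scale = math.sqrt(dot(m_x_e, m_x_e) / (n - nu))
    return dot(w, e) / math.sqrt(dot(w, w)) / scale

e = [rng.gauss(0.0, 1.0) for _ in range(n)]
sigma = 3.7                                    # any positive scale factor
s_e = s_plus(e)
s_scaled = s_plus([sigma * a for a in e])
print(s_e, s_scaled)
```

The two printed values agree up to rounding, because the factor `sigma` cancels between the numerator $W'e$ and the scale estimate in the denominator.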

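PROOF OF THEOREM 4.3 Suppose that Assumptions A-B and D are satisfied. If further $H_{\theta_0}$ holds, it follows from the proof of Lemma 4.1 and (4.3) that

\[
\Psi_{AR}(S^{+};\theta_0) \overset{d}{\sim} \frac{n-\nu}{\nu-\nu_1}\,\frac{\varepsilon'P_W\varepsilon}{\varepsilon'M_X\varepsilon}. \tag{A.6}
\]

So, the conditional distribution of $\Psi_{AR}(S^{+};\theta_0)$ under $H_{\theta_0}$, given $X = x$, only depends on the distribution of $\varepsilon$ and is therefore pivotal. We shall now distinguish the following two cases: (a) assumption (2.5) (Gaussian errors) holds and $\varepsilon$ is independent of $X$, (b) assumption (2.5) does not hold or $\varepsilon$ is not independent of $X$.

(a) Suppose that (2.5) holds and $\varepsilon$ is independent of $X$. Then, it is straightforward to show

\[
\varepsilon'M_X\varepsilon \sim \chi^2(n-\nu) \quad \text{and} \quad \varepsilon'P_W\varepsilon \sim \chi^2(\nu-\nu_1), \tag{A.7}
\]

where $M_X P_W = 0$. So, $\varepsilon'M_X\varepsilon$ and $\varepsilon'P_W\varepsilon$ are independent, and $\Psi_{AR}(S^{+};\theta_0) \sim F(\nu-\nu_1, n-\nu)$ from (A.7). This means that the test that rejects $H_{\theta_0}$ when $\Psi_{AR}(S^{+};\theta_0) > F_{\alpha}(n-\nu, \nu-\nu_1)$ is similar with significance level $\alpha$ for all values of $\pi_2$, if (2.5) holds and $X$ is independent of $\varepsilon$, where $F_{\alpha}(n-\nu, \nu-\nu_1)$ is the $1-\alpha$ quantile of an $F$ distribution with $\nu-\nu_1$ and $n-\nu$ degrees of freedom.

(b) Suppose now that (2.5) does not hold or $\varepsilon$ is not independent of $X$. It is straightforward to see that

\[
P\big[\Psi^{(0)}_{AR}(S^{+};\theta_0) \ge c_{\Psi}(\varepsilon;\alpha)\big] = P\big[p_N[\Psi^{(0)}_{AR}(S^{+};\theta_0)] \le \alpha\big] = \alpha \tag{A.8}
\]

under $H_{\theta_0}$, so that we have a test with level $\alpha$.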
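Case (b) can be illustrated with a small simulation of a Monte Carlo test built on the pivotal ratio in (A.6). The sketch below rests on several assumptions that are not taken from the paper: the p-value is the usual Monte Carlo one, $(1 + \#\{\text{simulated} \ge \text{observed}\})/(N+1)$; the errors are i.i.d. standard Cauchy, as one example of a fully specified heavy-tailed distribution; and the design has $\nu_1 = 1$, $\nu = 2$ with a single-column $W$. The "observed" statistic is itself drawn under the null, so the experiment only checks the level.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def resid(v, x):
    """Residual of v after projecting on the single column x."""
    coef = dot(v, x) / dot(x, x)
    return [a - coef * b for a, b in zip(v, x)]

def psi_ar(eps, x1, w, nu, nu1):
    """(n - nu)/(nu - nu1) * eps'P_W eps / eps'M_X eps for one-column x1 and w."""
    n = len(eps)
    m_x_eps = resid(resid(eps, x1), w)         # M_X eps, valid because w is orthogonal to x1
    num = dot(w, eps) ** 2 / dot(w, w)         # eps'P_W eps
    den = dot(m_x_eps, m_x_eps)                # eps'M_X eps
    return (n - nu) / (nu - nu1) * num / den

def mc_pvalue(observed, simulated):
    """Monte Carlo p-value: (1 + #{simulated >= observed}) / (N + 1)."""
    return (1 + sum(s >= observed for s in simulated)) / (len(simulated) + 1)

def cauchy_errors(rng, n):
    # Heavy-tailed, non-Gaussian errors with a fully specified distribution.
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]

rng = random.Random(12345)
n, nu, nu1 = 20, 2, 1
x1 = [1.0] * n
w = resid([rng.gauss(0.0, 1.0) for _ in range(n)], x1)   # hypothetical W = M_{X1} X2

alpha, n_sim, n_rep = 0.05, 19, 2000           # alpha * (n_sim + 1) = 1 is an integer
rejections = 0
for _ in range(n_rep):
    observed = psi_ar(cauchy_errors(rng, n), x1, w, nu, nu1)   # drawn under the null
    simulated = [psi_ar(cauchy_errors(rng, n), x1, w, nu, nu1) for _ in range(n_sim)]
    rejections += mc_pvalue(observed, simulated) <= alpha
rejection_rate = rejections / n_rep
print(rejection_rate)
```

Because the observed and simulated statistics are exchangeable under the null and $\alpha(N+1)$ is an integer, the rejection frequency should be close to $\alpha = 0.05$ even though the $F$ approximation of case (a) does not apply to Cauchy errors.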

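PROOF OF THEOREM 4.8 The proof is similar to that of Theorem 4.1 in Dufour and Taamouti (2007). Therefore, we only present the outline.

First, we can write the quadric form $Q(\theta)$ in (4.12) as:

\[
Q(\theta) = \theta'A\theta + b'\theta + c \equiv \bar{Q}(\bar{\theta}) = \bar{\theta}'\bar{A}\bar{\theta} + \bar{b}'\bar{\theta} + c, \tag{A.9}
\]

\[
\bar{A} = R^{-1\prime}AR^{-1} = \begin{bmatrix} \bar{a}_{11} & \bar{A}_{21}' \\ \bar{A}_{21} & \bar{A}_{22} \end{bmatrix}, \qquad
\bar{b} = R^{-1\prime}b = \begin{bmatrix} \bar{b}_1 \\ \bar{b}_2 \end{bmatrix}, \tag{A.10}
\]

where we have $\bar{\theta} = (\zeta_0, \theta_2')' = (w'\theta, \theta_2')'$, and

\[
R = \begin{bmatrix} w' \\ R_2 \end{bmatrix} = \begin{bmatrix} w_1 & w_2' \\ 0 & I_{G+k_2-1} \end{bmatrix}
\]

is a square matrix of order $G + k_2$ ($w_1 \neq 0$ by assumption). So, we can write $C_{w'\theta}(\alpha)$ as $C_{w'\theta}(\alpha) \equiv C_{\zeta_0}(\alpha) = \{\zeta_0 : \bar{\theta} = (\zeta_0, \theta_2')' \text{ satisfies } \bar{Q}(\bar{\theta}) \le 0\}$. Moreover, we can also explicitly write $\bar{Q}(\bar{\theta})$ as

\[
\bar{Q}(\bar{\theta}) = \bar{a}_{11}\zeta_0^2 + \bar{b}_1\zeta_0 + c + \theta_2'\bar{A}_{22}\theta_2 + [2\bar{A}_{21}\zeta_0 + \bar{b}_2]'\theta_2 . \tag{A.11}
\]

From (A.11), it is easy to see that $C_{w'\theta}(\alpha) = \{\zeta_0 : \min_{\theta_2} \bar{Q}(\zeta_0, \theta_2) \le 0\}$. To solve this minimization problem explicitly, we distinguish three cases: (a) $\bar{A}_{22} \ge 0$ and $\bar{A}_{22} \neq 0$, (b) $\bar{A}_{22} = 0$, and (c) $\bar{A}_{22}$ is not positive semidefinite.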

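(a) Suppose first that $\bar{A}_{22} \ge 0$. The first and second derivatives of $\bar{Q}(\zeta_0, \theta_2)$ with respect to $\theta_2$ are:

\[
\frac{\partial \bar{Q}(\zeta_0, \theta_2)}{\partial \theta_2} = 2\bar{A}_{22}\theta_2 + 2\bar{A}_{21}\zeta_0 + \bar{b}_2 = 0 \ \Leftrightarrow\ \bar{A}_{22}\theta_2 = -\bar{A}_{21}\zeta_0 - \tfrac{1}{2}\bar{b}_2 \tag{A.12}
\]

\[
\frac{\partial^2 \bar{Q}(\zeta_0, \theta_2)}{\partial \theta_2\, \partial \theta_2'} = 2\bar{A}_{22}. \tag{A.13}
\]

We distinguish the following two cases: (a1) $\mathrm{rank}(\bar{A}_{22}) = p_2 = G + k_2 - 1$ and (a2) $0 < \mathrm{rank}(\bar{A}_{22}) = p_2 < G + k_2 - 1$.

(a1) If $\mathrm{rank}(\bar{A}_{22}) = p_2 = G + k_2 - 1$ (i.e., if $\bar{A}_{22} > 0$), then (A.12)-(A.13) entail that $\theta_2 = \theta_2^{*} = -\bar{A}_{22}^{-1}\bar{A}_{21}\zeta_0 - \tfrac{1}{2}\bar{A}_{22}^{-1}\bar{b}_2$, and $\partial^2 \bar{Q}(\zeta_0, \theta_2)/\partial \theta_2\, \partial \theta_2' > 0$. So, $\theta_2^{*}$ is the unique minimum and by replacing $\theta_2$ by $\theta_2^{*}$ in the expression of $\bar{Q}(\zeta_0, \theta_2)$, we get:

\[
\bar{Q}(\zeta_0, \theta_2^{*}) \equiv \tilde{Q}(\zeta_0) = \tilde{a}_1\zeta_0^2 + \tilde{b}_1\zeta_0 + \tilde{c}_1 \tag{A.14}
\]

where $\tilde{a}_1 = \bar{a}_{11} - \bar{A}_{21}'\bar{A}_{22}^{-1}\bar{A}_{21}$, $\tilde{b}_1 = \bar{b}_1 - \bar{A}_{21}'\bar{A}_{22}^{-1}\bar{b}_2$, and $\tilde{c}_1 = c - \tfrac{1}{4}\bar{b}_2'\bar{A}_{22}^{-1}\bar{b}_2$. On noting that $\bar{A}_{22}^{-1} = \bar{A}_{22}^{+}$, (4.12) holds with $S_1 = \emptyset$.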

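(a2) If $0 < \mathrm{rank}(\bar{A}_{22}) = p_2 < G + k_2 - 1$, we can write $\bar{Q}(\zeta_0, \theta_2)$ as:

\[
\begin{aligned}
\bar{Q}(\zeta_0, \theta_2) &= \bar{a}_{11}\zeta_0^2 + \bar{b}_1\zeta_0 + c + \tilde{\theta}_2' D_2 \tilde{\theta}_2 + [2\bar{A}_{21}\zeta_0 + \bar{b}_2]'\theta_2 \\
&= \bar{a}_{11}\zeta_0^2 + \bar{b}_1\zeta_0 + c + \tilde{\theta}_{2*}' D_{2*} \tilde{\theta}_{2*} + [2\bar{A}_{21*}\zeta_0 + \bar{b}_{2*}]'\tilde{\theta}_{2*} + [P_{22}'(2\bar{A}_{21}\zeta_0 + \bar{b}_2)]'\tilde{\theta}_{22},
\end{aligned} \tag{A.15}
\]

where $\tilde{\theta}_{2*} = P_{21}'\theta_2$, $\tilde{\theta}_{22} = P_{22}'\theta_2$, and $D_{2*} > 0$. If $P_{22}'(2\bar{A}_{21}\zeta_0 + \bar{b}_2) = 0$, we can show as in (a1) that $C_{w'\theta}(\alpha) = \{\zeta_0 : \tilde{a}_1\zeta_0^2 + \tilde{b}_1\zeta_0 + \tilde{c}_1 \le 0\}$, where $\tilde{a}_1 = \bar{a}_{11} - \bar{A}_{21*}' D_{2*}^{-1} \bar{A}_{21*}$, $\tilde{b}_1 = \bar{b}_1 - \bar{A}_{21*}' D_{2*}^{-1} \bar{b}_{2*}$, and $\tilde{c}_1 = c - \tfrac{1}{4}\bar{b}_{2*}' D_{2*}^{-1} \bar{b}_{2*}$. Since $\bar{A}_{22} = P_2' D_2 P_2$, it is straightforward to see that $\bar{A}_{22}^{+} = P_{21}' D_{2*}^{-1} P_{21}$. Further, we also have $\bar{A}_{21*}' D_{2*}^{-1} \bar{b}_{2*} = \bar{A}_{21}' \bar{A}_{22}^{+} \bar{b}_2$, and $\bar{b}_{2*}' D_{2*}^{-1} \bar{b}_{2*} = \bar{b}_2' \bar{A}_{22}^{+} \bar{b}_2$, so that (4.12) holds with $S_1 = \emptyset$. Now, if $P_{22}'(2\bar{A}_{21}\zeta_0 + \bar{b}_2) \neq 0$, we can proceed as above to show that (4.12) holds with $S_1 = \{\zeta_0 : P_{22}'(2\bar{A}_{21}\zeta_0 + \bar{b}_2) \neq 0\}$.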

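(b) Suppose that $\bar{A}_{22} = 0$. The result follows immediately from (A.11) and the last step of the proof of (a2).

(c) Suppose now that $\bar{A}_{22}$ is not positive semidefinite. Hence, we can find $\theta_2$ such that $\theta_2'\bar{A}_{22}\theta_2 = \eta < 0$. So, for any arbitrary scalar $\tau$, we have

\[
\bar{Q}(\zeta_0, \tau\theta_2) = \bar{a}_{11}\zeta_0^2 + \bar{b}_1\zeta_0 + c + \eta\tau^2 + \tau[2\bar{A}_{21}\zeta_0 + \bar{b}_2]'\theta_2 . \tag{A.16}
\]

Because $\eta\tau^2 + \tau[2\bar{A}_{21}\zeta_0 + \bar{b}_2]'\theta_2$ is a quadratic polynomial in $\tau$ with leading coefficient $\eta < 0$, we can choose $\tau$ sufficiently large to have $\bar{Q}(\zeta_0, \tau\theta_2) < 0$, irrespective of the value of $\zeta_0$. So, $C_{\zeta_0}(\alpha) = \mathbb{R}$, as stated.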
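The case analysis above can be made concrete with a small sketch. It assumes, purely for illustration, that $\theta_2$ is a scalar (so the lower-right block of the transformed matrix is a single number) and uses made-up coefficients. The function names `concentrate`, `quadratic_set` and `q_bar` are hypothetical helpers, not part of the paper. The first part reproduces the concentration step of case (a1) and solves the resulting quadratic inequality in $\zeta_0$. The second part illustrates case (c) by doubling $\tau$ until the quadric becomes negative at a far-away value of $\zeta_0$.

```python
import math

def concentrate(a11, a21, a22, b1, b2, c):
    """Coefficients of min over theta2 of Q(zeta0, theta2), scalar theta2, a22 > 0."""
    a1 = a11 - a21 * a21 / a22
    b1_c = b1 - a21 * b2 / a22
    c1 = c - 0.25 * b2 * b2 / a22
    return a1, b1_c, c1

def quadratic_set(a1, b1, c1):
    """Intervals where a1*z**2 + b1*z + c1 <= 0, as a list of (lo, hi) pairs."""
    inf = math.inf
    if a1 == 0.0:
        if b1 == 0.0:
            return [(-inf, inf)] if c1 <= 0.0 else []
        root = -c1 / b1
        return [(-inf, root)] if b1 > 0.0 else [(root, inf)]
    disc = b1 * b1 - 4.0 * a1 * c1
    if disc < 0.0:
        return [] if a1 > 0.0 else [(-inf, inf)]
    half = math.sqrt(disc)
    lo, hi = sorted(((-b1 - half) / (2.0 * a1), (-b1 + half) / (2.0 * a1)))
    return [(lo, hi)] if a1 > 0.0 else [(-inf, lo), (hi, inf)]

def q_bar(zeta0, theta2, a11, a21, a22, b1, b2, c):
    """Q(zeta0, theta2) as in (A.11), written for a scalar theta2."""
    return (a11 * zeta0 ** 2 + b1 * zeta0 + c
            + a22 * theta2 ** 2 + (2.0 * a21 * zeta0 + b2) * theta2)

# Case (a1): a22 > 0, so the projection set is a bounded interval here.
coef = dict(a11=2.0, a21=1.0, a22=1.0, b1=-2.0, b2=2.0, c=-3.0)
cs = quadratic_set(*concentrate(**coef))

# Case (c): a22 < 0, so Q can be pushed below zero for any zeta0.
coef_c = dict(coef, a22=-1.0)
zeta0, tau = 100.0, 1.0
while q_bar(zeta0, tau, **coef_c) >= 0.0:
    tau *= 2.0
print(cs, tau)
```

In the first part the concentrated quadratic is $\zeta_0^2 - 4\zeta_0 - 4 \le 0$, so the set is the interval $[2 - 2\sqrt{2},\, 2 + 2\sqrt{2}]$. In the second part the loop always terminates, whatever $\zeta_0$ is chosen, which is the numerical counterpart of the conclusion that the confidence set is the whole real line.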
