Instrument endogeneity, weak identification, and
inference in IV regressions
Firmin Doko Tchatoka∗
The University of Adelaide
August 18, 2014
∗ School of Economics, The University of Adelaide, 10 Pulteney Street, Adelaide SA 5005, Tel:+6188313 5540, Fax:+618 8223 1460, e-mail: [email protected]
ABSTRACT
We study the possibility of making exact inference in structural models where: (a) instrumental variables (IVs) may be arbitrarily weak, collinear, and invalid; (b) the errors may have non-Gaussian distributions (possibly heavy-tailed and heteroskedastic); and (c) the reduced-form specification may be arbitrarily heterogeneous, nonlinear, unspecified, and incomplete (missing instruments). We provide the necessary and sufficient conditions under which such models are identifiable, despite instrument invalidity. Under these conditions, Wald-type tests and confidence sets (CSs) based on k-class type estimators may apply. However, these conditions rule out models in which IVs are weak and are, moreover, difficult to check in practice. To alleviate these drawbacks, we develop identification-robust procedures to test and build CSs for model coefficients. CSs for individual components of the structural and instrument endogeneity parameters are obtained by projection. Tests of exclusion restrictions and instrument selection are covered as instances of the class of proposed procedures.
Key words: Instrument endogeneity; weak instruments; identification-robust inference;
finite-sample; non-Gaussian errors; projection method; exact Monte Carlo tests.
JEL classification: C12; C13; C36.
1. Introduction
This paper contributes to the literature on weak instruments by developing exact tests and confidence sets (CSs) in IV regressions where: (i) instrumental variables may be arbitrarily weak, collinear, and invalid; (ii) the errors may have non-Gaussian distributions (possibly heavy-tailed and heteroskedastic); and (iii) the reduced-form specification may be heterogeneous, nonlinear, unspecified, and incomplete (omitted instruments).
IV methods usually require the availability of exogenous instruments, at least as many as the number of coefficients to be estimated, whereas the validity of those instruments is not testable. In the last two decades, the so-called “weak instruments” problem has received considerable attention in econometrics. Research on this topic is widespread, and most studies have imposed the exclusion restrictions.1 Several studies of weak instruments have recently questioned the validity of the strict exogeneity assumption.2 For example, Murray (2006) states: “in most IV applications, the instruments often arrive with a dark cloud of invalidity hanging overhead and researchers usually do not know whether their correlations with the error are exactly zero.” He suggests avoiding invalid instruments in IV procedures. However, as it is difficult to test the validity of all candidate instruments, it might seem that if we want to avoid invalid instruments, there is little hope in trying to use IV methods. Bound et al. (1995, Section 3) provide evidence on how a slight violation of instrument exogeneity can cause severe bias in IV estimates, especially when identification is weak. Hausman and Hahn (2005) show that, even in large samples, the IV estimator can have a substantial bias when the instruments are only slightly correlated with the error. Doko Tchatoka and Dufour (2008) and Guggenberger (2011) show that
1For example, see Phillips (1989), Nelson and Startz (1990a, 1990b), Choi and Phillips (1992), Bekker (1994), Hall, Rudebusch and Wilcox (1996), Dufour (1997, 2003, 2009), Staiger and Stock (1997), Wang and Zivot (1998), Stock and Wright (2000), Donald and Newey (2001), Dufour and Jasiak (2001), Kleibergen (2002, 2004, 2005), Moreira (2003), Stock, Wright and Yogo (2002), Hall and Peixe (2003), Stock and Yogo (2005), Dufour and Taamouti (2005, 2007), Swanson and Chao (2005), Andrews and Stock (2007a, 2007b), Guggenberger and Smith (2005), Andrews, Moreira and Stock (2006), Dufour and Hsiao (2008), Hansen, Hausman and Newey (2008), Moreira, Porter and Suarez (2009), Chaudhuri and Zivot (2010), Dufour, Khalaf and Beaulieu (2010), Guggenberger (2011), Guggenberger, Kleibergen, Mavroeidis and Chen (2012), Dufour, Khalaf and Kichian (2013), Mikusheva (2010, 2013), Doko Tchatoka and Dufour (2014), and Doko Tchatoka (2014).
2See Bound, Jaeger and Baker (1995), Brock and Durlauf (2001), Imbens (2003), Hausman and Hahn (2005), Murray (2006), Kiviet and Niemczyk (2007, 2012), Doko Tchatoka and Dufour (2008), Kraay (2008), Ashley (2009), Bazzi and Clemens (2009), Hahn, Ham and Moon (2010), Guggenberger (2011), and Berkowitz, Caner and Fang (2008, 2012).
Anderson and Rubin (1949) (AR) and Kleibergen (2002) (K) tests are highly sensitive to
instrument invalidity.
In this paper, we stress the fact that valid tests and CSs can be obtained in IV regressions in which the exclusion restrictions are violated. Several studies have adopted the same position and we wish to make progress in this direction. Imbens (2003) shows that bounds on average treatment effect in program evaluation can be recovered via a sensitivity analysis of the correlations between treatment and unobserved components of the outcomes. Ashley (2009) shows how the discrepancy between OLS and IV estimates can be used to estimate the degree of bias under any given assumption about the degree to which IVs violate the exclusion restrictions. Kiviet and Niemczyk (2007, 2012) show that the realizations of the IV estimator based on strong but invalid instruments seem much closer to the true parameter values than those obtained from valid but weak instruments. Doko Tchatoka (2013) shows that bootstrapping improves the size of Durbin-Wu-Hausman tests of exogeneity when IVs are invalid. Imbens et al. (2011) show that the Donald and Newey (2001) bias-corrected estimator and the Phillips and Hale (1977) jackknife IV estimator can be consistent and asymptotically normal even when the exclusion restrictions are violated. Their framework, however, rules out weak-instrument issues. Berkowitz, Caner and Fang (2012) show that re-sampling the Anderson and Rubin (1949) AR-statistic yields a test that has correct level asymptotically, under local-to-zero instrument endogeneity.3 However, their method is valid only in large samples and is overly conservative.
By contrast, we develop a finite-sample procedure for testing and building CSs in IV regressions where: IVs can be arbitrarily weak, collinear, and violate the exclusion restrictions; the errors may have non-Gaussian distributions (possibly heavy-tailed and heteroskedastic); and the reduced-form specification may be arbitrarily heterogeneous, nonlinear, unspecified, or incomplete. To be more specific, we consider a model of the form
$$y_1 = y_2\beta + X_1\gamma_1 + u, \qquad u = X_2\gamma_2 + e,$$
where $y_1$ is an observed dependent variable, $y_2$ is an observed (possibly) endogenous regressor, $X_1$ is a matrix of exogenous variables, $X_2$ is a matrix of instruments which may be
3The parameter that controls instrument endogeneity goes to zero [at rate $n^{-1/2}$] when the sample size $n$ increases.
rank-deficient and violate the exclusion restrictions if $\gamma_2 \neq 0$, and $e$ is an error term. We call $\gamma_2$ “instrument endogeneity” because it determines which variables in $X_2$ are valid instruments and which are not.
We observe that a procedure similar to that of Anderson and Rubin (1949) can be used to develop identification-robust tests and CSs on $\theta = (\beta, \gamma_2')'$. So, identification-robust CSs for each component of $\beta$ and $\gamma_2$ can be derived through the projection method.4 When the error $e$ follows a Gaussian distribution and is independent of $X$, we show that the standard Fisher-type critical values are applicable. But for a wide class of parametric non-Gaussian errors (possibly heavy-tailed and heteroskedastic), we supply exact Monte Carlo test5 critical values. We provide the analytical forms of the proposed CSs for $\theta$ and scalar linear transformations of $\theta$, and characterize the necessary and sufficient conditions under which they are bounded. Tests of exclusion restrictions and instrument selection are covered as instances of the class of proposed procedures, including in exactly identified models.
The remainder of this paper is organized as follows. Section 2 formulates the model and related assumptions. Section 3 studies the identification of structural parameters with invalid instruments. Section 4 develops finite-sample tests and CSs with correct level, even in the presence of non-Gaussian errors. Section 5 deals with the Monte Carlo experiment, while Section 6 presents the empirical application. Conclusions are drawn in Section 7 and proofs are presented in the Appendix.
Throughout this paper, $I_q$ stands for the identity matrix of order $q$. For any $n \times m$ matrix $A$, $P_A = A(A'A)^{+}A'$ is the projection matrix on the space spanned by $A$, and $M_A = I_n - P_A$, where $B^{+}$ refers to the Moore-Penrose inverse of the matrix $B$. The notation $\mathrm{rank}(A)$ is the rank of the matrix $A$, while $\|A\| = [\mathrm{tr}(A'A)]^{1/2}$ denotes the usual Euclidean or Frobenius norm for $A$. $B > 0$ for a square matrix $B$ means that $B$ is positive definite (p.d.). The symbol “$\overset{d}{\sim}$” signifies equivalence in distribution. The orthogonal group of $p \times p$ matrices is denoted by $O(p) = \{H \in M(p,p) : H'H = I_p\}$, where $M(p,p)$ is the set of all square matrices of order $p$. Finally, for any $n \times m$ matrix $\Omega$, $\mathcal{K}er(\Omega) = \{\omega \in \mathbb{R}^m : \Omega\omega = 0\}$ is the null set (kernel) of $\Omega$, and $\mathcal{I}m(\Omega) = \{x \in \mathbb{R}^n : x = \Omega\omega \text{ for some } \omega \in \mathbb{R}^m\}$ is the column space of $\Omega$.
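As an illustration of this notation, the projector built from the Moore-Penrose inverse remains well defined when the columns of $A$ are collinear. The following is a minimal NumPy sketch, not code from the paper; the function names, the simulated matrix, and the cutoff `rcond=1e-10` are assumptions made for the example.

```python
import numpy as np

def projection(A, rcond=1e-10):
    """Orthogonal projector P_A = A (A'A)^+ A' (Moore-Penrose inverse)."""
    A = np.asarray(A, dtype=float)
    return A @ np.linalg.pinv(A.T @ A, rcond=rcond) @ A.T

def annihilator(A):
    """Residual maker M_A = I_n - P_A."""
    return np.eye(A.shape[0]) - projection(A)

# Hypothetical 8 x 3 matrix whose third column is a linear combination
# of the first two, so A'A is singular and an ordinary inverse fails.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 2))
A = np.column_stack([x, x[:, 0] + x[:, 1]])

P, M = projection(A), annihilator(A)
rank_A = np.linalg.matrix_rank(A, tol=1e-8)  # absolute tolerance (assumed)
frobenius = np.sqrt(np.trace(A.T @ A))       # ||A|| = [tr(A'A)]^(1/2)
```

Here the trace of `P` equals the rank of `A` rather than its number of columns, which is the property used later when the instrument matrix is rank-deficient.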
4See Dufour and Jasiak (2001), Dufour and Taamouti (2005), and Doko Tchatoka and Dufour (2014).
5See Dufour (2006).
2. Model and assumptions
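We consider a standard linear IV regression with one endogenous right-hand-side (rhs) variable, $k_1$ exogenous variables, and $k_2$ IVs. The sample size is $n$. The model consists of a structural equation and a reduced-form equation:
$$y_1 = y_2\beta + X_1\gamma_1 + u, \tag{2.1}$$
$$y_2 = X_1\pi_1 + X_2\pi_2 + v_2, \tag{2.2}$$
where $y_1, y_2 \in \mathbb{R}^n$, $X_1 \in \mathbb{R}^{n \times k_1}$, and $X_2 \in \mathbb{R}^{n \times k_2}$ ($k_2 \geq 1$) are observed variables; $u, v_2 \in \mathbb{R}^n$ are unobserved errors; $\beta \in \mathbb{R}$, $\gamma_1 \in \mathbb{R}^{k_1}$, $\pi_2 \in \mathbb{R}^{k_2}$, and $\pi_1 \in \mathbb{R}^{k_1}$ are unknown fixed parameters. Let $Y = [y_1 : y_2] = [Y_1, \ldots, Y_n]' \in \mathbb{R}^{n \times (G+1)}$ (with $G = 1$ endogenous regressor here) and $X = [X_1 : X_2] = [X_{\bullet 1}, \ldots, X_{\bullet n}]' \in \mathbb{R}^{n \times k}$ ($k = k_1 + k_2$) denote the matrices of endogenous variables and instruments, respectively. We define $Y_t \in \mathbb{R}^2$ and $X_{\bullet t} \in \mathbb{R}^k$ as the $t$th rows of $Y$ and $X$, written as column vectors, and similarly for other random matrices. We make the following assumptions on the model variables.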
Assumption A For some fixed vector $\gamma_2$ in $\mathbb{R}^{k_2}$, we have:
$$u = X_2\gamma_2 + e, \quad \text{where } e \in \mathbb{R}^n \text{ is an error term.} \tag{2.3}$$
Assumption A implies that $X_2$ violates the usual exclusion restrictions if $\gamma_2 \neq 0$. If $X_{2t}$ ($t = 1, 2, \ldots, n$) have the same finite second moments and $e$ is uncorrelated with $X_2$, then $\mathrm{cov}(X_{2t}, u_t) \neq 0$ whenever $\gamma_2 \notin \mathcal{K}er(\mathrm{Cov}(X_2))$, where $\mathrm{Cov}(X_2) = E[(X_{2t} - \mu_{X_2})(X_{2t} - \mu_{X_2})']$ is the covariance6 matrix of $X_{2t}$ and $\mu_{X_2} = E(X_{2t})$. Therefore, some variables in $X_2$ do not constitute valid instruments. Because of this property, we call $\gamma_2$ “instrument endogeneity.” The usual tests of exclusion restrictions, such as those of Sargan (1958), Basmann (1960), and Hansen (1982), typically test the null hypothesis that $\gamma_2 = 0$ in (2.3); see Staiger and Stock (1997) and Hahn et al. (2010). Under Assumption A, the conditional mean and variance of $u_t$, given $X_{2t}$, depend on $X_{2t}$ if $\gamma_2 \neq 0$ (conditional structural heteroskedasticity). Staiger and Stock (1997) and Guggenberger (2011) made a similar assumption with $\gamma_2 = \gamma_2^0/\sqrt{n}$ for some fixed vector $\gamma_2^0 \in \mathbb{R}^{k_2}$ (local-to-zero instrument endogeneity).7 Doko Tchatoka and Dufour (2008) show that the Anderson and Rubin (1949) (AR)-test and the Kleibergen (2002) (K)-test are highly size distorted under Assumption A. Imbens et al. (2011) show that the Donald and Newey (2001) bias-corrected estimator and the Phillips and Hale (1977) jackknife-instrumental-variables estimator may still be consistent and asymptotically normal under Assumption A. But their framework assumes strong instruments (i.e., $\pi_2 \neq 0$), thus ruling out issues associated with weak instruments.
6See Anderson (1971, Section 2.3) and Muirhead (2005, Section 1.2) for a similar definition and notation.
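To make the consequence of Assumption A concrete, the sketch below simulates a stripped-down version of (2.1)-(2.3) with a single instrument and no $X_1$. All numerical values (sample size, $\beta$, $\gamma_2$, $\pi_2$, error correlation) are hypothetical choices for illustration, not values from the paper. With $X_2$ uncorrelated with $e$, the simple IV estimator converges to $\beta + \gamma_2/\pi_2$ rather than $\beta$.

```python
import numpy as np

rng = np.random.default_rng(42)
n, beta, gamma2, pi2 = 50_000, 1.0, 0.5, 1.0   # hypothetical values

X2 = rng.normal(size=n)              # one instrument with unit variance
e = rng.normal(size=n)
v2 = 0.5 * e + rng.normal(size=n)    # correlated with e, so y2 is endogenous
u = X2 * gamma2 + e                  # Assumption A: u = X2*gamma2 + e
y2 = X2 * pi2 + v2                   # reduced form (2.2) without X1
y1 = y2 * beta + u                   # structural equation (2.1) without X1

# The instrument is correlated with the structural error when gamma2 != 0.
cov_X2_u = np.cov(X2, u)[0, 1]       # population value: gamma2 * Var(X2)

# Simple IV estimator; its probability limit is beta + gamma2 / pi2.
beta_iv = (X2 @ y1) / (X2 @ y2)
plim_iv = beta + gamma2 / pi2
```

In this design the bias does not vanish as $n$ grows, which is why procedures that treat $\gamma_2$ as part of the tested parameter are of interest.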
Assumption B $\mathrm{rank}(X_1) = k_1$ and $\mathrm{rank}(X_2) = \nu_2 \leq k_2$ for some integer $\nu_2 > 0$.
Assumption B imposes full-column rank on the matrix of exogenous variables $X_1$, but allows $X_2$ to have any arbitrary rank $\nu_2 > 0$. For example, some linear combinations of the columns of $X_2$ may be collinear or close to being so. Dufour and Taamouti (2007) also consider a similar setup. Note that there is no impediment to relaxing the full-column rank assumption on $X_1$ to any arbitrary rank $\nu_1 \geq 0$. Under Assumption B, we may also have $0 < \mathrm{rank}(X) = \nu \leq k$. In the remainder of this paper, $W = M_{X_1}X_2$, where $M_{X_1} = I_n - X_1(X_1'X_1)^{-1}X_1'$, denotes the residuals of the regression of $X_2$ on the columns of $X_1$.
Assumption C (i) $\{(e_t, v_{2t}', X_{\bullet t}')' : t = 1, \ldots, n\}$ are i.i.d. across $t \leq n$ and $n$; (ii) $E[(e_t, v_{2t}) \mid X_{\bullet t}] = 0$ $\forall\, t = 1, \ldots, n$; and (iii) $E(W_tW_t') = \Omega_W$ $\forall\, t = 1, \ldots, n$.
Assumptions C-(i) and (ii) are widely used in the IV literature; see Staiger and Stock (1997), Stock and Wright (2000), Kleibergen (2002, 2005), Andrews et al. (2006), and Guggenberger et al. (2012). (i) states that the errors and IVs are random and i.i.d. across $t \leq n$ and $n$, while (ii) is the usual conditional zero mean assumption on the errors. Assumption C-(iii) requires that each row of $W$ has the same second-moment matrix. Note that $\Omega_W$ may not be positive definite and can be singular. In particular, this is the case when $X_2$ is rank-deficient. No assumption on the existence of second or higher moments of the errors $(e, v_2)$ is needed.
Now, consider the linear map defined by
$$\mathbb{R}^n \longrightarrow \mathbb{R}^n, \qquad e \mapsto \varepsilon_\sigma(e) = \sigma(X)e, \tag{2.4}$$
where $\sigma(X)$ is possibly a random function of $X$ such that the event $\{\sigma(X) \neq 0 \mid X\}$ has probability 1 a.s., and $e$ is the error term defined in (2.3). Note that $\sigma(X)$ need not be constant and the distribution of $\varepsilon_\sigma(e)$ may arbitrarily depend on $X$.
7Also, see Berkowitz, Caner and Fang (2008, 2012).
For the purpose of developing finite-sample theory, we make the following assumption.
Assumption D There is $\sigma_0(X)$ such that $\varepsilon_{\sigma_0}(e)$ satisfies (2.4) and $\varepsilon_{\sigma_0}(e) \overset{d}{\sim} \varepsilon$, where given $X = x$, $\varepsilon$ has a completely specified distribution $P_\varepsilon(x)$.
Assumption D states that the conditional distribution, given $X$, of the error in the regression of $u$ on $X_2$ only depends on $X$ and a (typically unknown) possibly random scale factor $\sigma_0(X)$. This assumption holds whenever $e$ is independent of $X$ with a distribution of the form $e \overset{d}{\sim} \varepsilon/\sigma_0$, where $\varepsilon$ has a specified distribution and $\sigma_0$ is an unknown positive constant. In this context, the standard Gaussian model is obtained by taking
$$\varepsilon \sim N(0, I_n). \tag{2.5}$$
But non-Gaussian distributions which may be heteroskedastic and lack moments (such as the Cauchy or Student $t$ distributions) are covered.
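Under Assumption A, we can write model (2.1)-(2.2) as:
$$y_1 = y_2\beta + X_1\gamma_1 + X_2\gamma_2 + e, \tag{2.6}$$
$$y_2 = X_1\pi_1 + X_2\pi_2 + v_2, \tag{2.7}$$
where $E[(e : v_2) \mid X] = 0$ by Assumption C. Let $\theta = (\beta, \gamma_2')'$ and $\delta = (\theta', \gamma_1', \pi_1', \pi_2')' \in \Theta \subseteq \mathbb{R} \times \mathbb{R}^{k_2} \times \mathbb{R}^{k_1} \times \mathbb{R}^{k_1} \times \mathbb{R}^{k_2}$, where $\Theta$ is the parameter space. The statistical model associated with (2.6)-(2.7) is defined as $(\mathcal{Y} \times \mathcal{X}, P_\delta, \delta \in \Theta)$, where $Y$ and $X$ are drawn from $\mathcal{Y}$ and $\mathcal{X}$, respectively. For any random variable $Z$ (possibly a function of $\delta$), $P_Z(x;\delta)$ denotes the distribution of $Z$ conditional on $X = x$, and we write $Z|_{X=x} \sim P_Z(x;\delta)$.
We consider the problem of testing
$$H_{\theta_0}: \theta = \theta_0 \quad \text{vs.} \quad H_{\theta_1}: \theta \neq \theta_0, \quad \text{for some fixed } \theta_0 = (\beta_0, \gamma_2^{0\prime})'. \tag{2.8}$$
Our main focus is on finite-sample inference, and we are concerned with developing similar tests for $H_{\theta_0}$ and confidence sets for $\beta$ and $\gamma_2$ when some instruments in $X_2$ may be arbitrarily weak, invalid, or collinear. But before proceeding, it is illuminating to first study the identification of $\beta$ in the presence of possibly invalid instruments.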
3. Identification of β
We study the identification of the structural coefficient ($\beta$) when some instruments in $X_2$ may be invalid or collinear. If the exclusion restrictions ($\gamma_2 = 0$) are satisfied, $X$ has full-column rank with probability 1, and $[e, v_2]$ has mean zero, then the weak IV literature documents that the necessary and sufficient condition for the identification of $\beta$ is $\pi_2 \neq 0$;8 see Stock et al. (2002), Dufour (2003), Andrews and Stock (2007a), Dufour and Hsiao (2008), and Mikusheva (2013). Here, we investigate the identification of $\beta$ when $\gamma_2$ is left unrestricted (possibly invalid instruments) and $X_2$ may contain redundant columns ($\nu_2 < k_2$) or be close to doing so.
First, we can write the reduced form for $Y = [y_1 : y_2]$ as:
$$Y = X_1\xi_1 + X_2\xi_2 + V \quad \text{with } V = [v_1 : v_2], \tag{3.1}$$
where $v_1 = v_1(\beta) = v_2\beta + e$, $\xi_1 = (\xi_{11} : \pi_1) = (\gamma_1 + \pi_1\beta : \pi_1)$, and $\xi_2 = (\xi_{21} : \pi_2) = (\gamma_2 + \pi_2\beta : \pi_2)$. Suppose first that $\mathrm{rank}(X_2) = \nu_2 = k_2$ and $E([e : v_2] \mid X) = 0$. Then, the least squares estimators of the coefficients on $X_j$ ($j = 1, 2$) in each regression of (3.1) are unique. On expressing the coefficients on $X_2$ in (3.1) as $\xi_2 = (\xi_{21} : \pi_2) = (\gamma_2 : 0) + \pi_2 a'$, where $a = (\beta, 1)'$, it can be seen that $\xi_{21}$ is proportional to $\beta$ (with factor $\pi_2$) if $\gamma_2 = 0$. Since $\xi_2$ is identifiable, $\beta$ is identifiable whenever $\pi_2 \neq 0$ if $\gamma_2 = 0$. However, if $\gamma_2$ is left unrestricted (possibly $\gamma_2 \neq 0$), $\xi_{21} = \gamma_2 + \pi_2\beta$ does not necessarily have a solution for $\beta$ even if $\pi_2 \neq 0$. To be more specific, $\xi_{21} = \gamma_2 + \pi_2\beta$ has a solution with respect to $\beta$ if, and only if, $(\xi_{21} - \gamma_2) \in \mathcal{I}m(\pi_2)$, where $\mathcal{I}m(\pi_2)$ is the column space of $\pi_2$; see Magnus and Neudecker (1999, Ch. 2, Section 9, Theorems 11-12). Even if a solution $\beta$ exists, it generally depends on the unknown value $\gamma_2$. The condition under which a solution $\beta$ (when it exists) does not depend on $\gamma_2$ is that $\gamma_2 \in \mathcal{K}er(\pi_2)$, where $\mathcal{K}er(\pi_2)$ is the null space of $\pi_2$.
8Note that this condition is replaced by the full-column rank assumption on $\pi_2$ if $G > 1$ (i.e., there is more than one endogenous regressor in $y_2$).
We can generalize the above argument to cases in which $X_2$ is rank-deficient ($\nu_2 < k_2$) or close to being so. One difficulty here is that $\xi_{21}$ and $\pi_2$ are not uniquely determined from the regression (3.1); see Magnus and Neudecker (1999, Ch. 13, Section 6, Eqs. (1)-(2)). However, the conditional means $E(y_2|X)$ and $E(y_1|X)$ are still estimable despite the fact that $X_2$ does not have full-column rank; see Magnus and Neudecker (1999, Ch. 13, Theorem 15). This implies that the errors $v_{1t}$ ($t = 1, \ldots, n$) of the reduced-form equation for $y_1$ in (3.1) are identifiable, despite the multiplicity of least squares estimators.9 So, $\beta$ may be identifiable through the orthogonality between $v_{1t}$ and $X_{\bullet t}$. We can prove the following proposition on the identification of $\beta$ when $\gamma_2$ is left unrestricted and $X_2$ may be rank-deficient.
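Proposition 3.1 Suppose that (2.1)-(2.2) and Assumptions A - C are satisfied. Then:
$$\beta \text{ is identifiable} \iff \pi_2 \notin \mathcal{K}er(\Omega_W) \text{ and } \gamma_2 \in \mathcal{K}er(\pi_2'\Omega_W), \tag{3.2}$$
where $\mathcal{K}er(\Omega_W)$ and $\mathcal{K}er(\pi_2'\Omega_W)$ denote the null sets of $\Omega_W$ and $\pi_2'\Omega_W$, respectively.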
Remark 3.2 (i) The identification condition in Proposition 3.1 can be stated as $\pi_2'\Omega_W \neq 0$ and $\pi_2'\Omega_W\gamma_2 = 0$, and is easy to interpret. First, $\pi_2'\Omega_W \neq 0$ means that the instruments in $X_2$ are strong. Second, observe that $W_t'\pi_2$ can be viewed as the indirect effect of $X_{2t}$ on $y_{1t}$ when the effect of the exogenous variables $X_{1t}$ has been eliminated, and $W_t'\gamma_2$ is its direct effect on $y_{1t}$. So, the condition $\pi_2'\Omega_W\gamma_2 = E(\pi_2'W_tW_t'\gamma_2) = 0$ means that both effects are uncorrelated [similar to Imbens et al. (2011)].
(ii) If $\gamma_2 = 0$ (strict exogeneity) and $\nu_2 = k_2$ ($X_2$ has full-column rank), the identification condition of Proposition 3.1 becomes $\pi_2 \neq 0$. So, Proposition 3.1 generalizes the usual necessary and sufficient condition for identification in the previous weak IV literature.
(iii) Proposition 3.1 also generalizes the condition under which the two-stage least squares estimator is consistent in Doko Tchatoka and Dufour (2008, Eqs. (4.8)-(4.9)), and
9Under Assumption C-(ii), we can write $v_{1t} = y_{1t} - E(y_{1t} \mid X_{\bullet t})$ for all $t = 1, \ldots, n$. From Magnus and Neudecker (1999, Ch. 13, Theorem 15), $E(y_{1t} \mid X_{\bullet t})$ is identifiable even when $\nu_2 < k_2$, hence $v_{1t}$ is also identifiable.
those under which the Donald and Newey (2001) bias-corrected estimator and the Phillips and Hale (1977) jackknife-instrumental-variables estimator are consistent and asymptotically normal in Imbens et al. (2011). Both Doko Tchatoka and Dufour (2008) and Imbens et al. (2011) assume that $X_2$ has full-column rank $k_2$. Here, we allow $X_2$ to have any arbitrary rank. In addition, Imbens et al. (2011) analyze the setup in which $X_2$ is strong (i.e., $\pi_2 \neq 0$ in our framework), meaning that weakly identified models are ruled out of their scope. Here, we also allow for any arbitrary value of $\pi_2$.
(iv) Under the conditions of Proposition 3.1, the usual $F$-type or Wald-type statistics based on k-class estimators10 could be used to assess $H_{\theta_0}$ and build CSs for $\beta$, despite instrument endogeneity. However, these identification conditions, albeit interesting, rule out models in which identification is not very strong, and they are in addition difficult to implement in practice (because the condition $\gamma_2 \in \mathcal{K}er(\pi_2'\Omega_W)$ cannot be verified empirically, as $\gamma_2$ is not consistently estimable under instrument invalidity).
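The condition in Proposition 3.1 can be checked numerically when the population quantities are known, for instance in a simulation design. The sketch below is an illustration under that assumption; the function names and the tolerance are hypothetical. The second helper uses one way to see why the condition suffices: premultiplying $\xi_{21} = \gamma_2 + \pi_2\beta$ by $\pi_2'\Omega_W$ removes $\gamma_2$ and leaves $\pi_2'\Omega_W\xi_{21} = (\pi_2'\Omega_W\pi_2)\beta$, with $\pi_2'\Omega_W\pi_2 > 0$ because $\Omega_W$ is positive semidefinite and $\Omega_W\pi_2 \neq 0$.

```python
import numpy as np

def beta_identified(pi2, gamma2, omega_w, tol=1e-10):
    """Check pi2 not in Ker(Omega_W) and gamma2 in Ker(pi2' Omega_W)."""
    pi2 = np.asarray(pi2, dtype=float)
    gamma2 = np.asarray(gamma2, dtype=float)
    a = np.asarray(omega_w, dtype=float) @ pi2   # Omega_W pi2 (Omega_W symmetric)
    relevant = np.linalg.norm(a) > tol           # pi2' Omega_W != 0
    orthogonal = abs(a @ gamma2) <= tol          # pi2' Omega_W gamma2 = 0
    return bool(relevant and orthogonal)

def beta_from_reduced_form(xi21, pi2, omega_w):
    """Recover beta from xi21 = gamma2 + pi2*beta when the condition holds."""
    a = np.asarray(omega_w, dtype=float) @ np.asarray(pi2, dtype=float)
    return float(a @ np.asarray(xi21, dtype=float)) / float(a @ np.asarray(pi2, dtype=float))

omega_w = np.eye(2)                       # hypothetical second-moment matrix
pi2 = np.array([1.0, 0.0])                # only the first instrument is relevant
gamma2_ok = np.array([0.0, 0.7])          # invalid, but orthogonal to Omega_W pi2
gamma2_bad = np.array([0.3, 0.0])         # invalid along the relevant direction

ok = beta_identified(pi2, gamma2_ok, omega_w)
bad = beta_identified(pi2, gamma2_bad, omega_w)
weak = beta_identified(np.zeros(2), gamma2_ok, omega_w)
beta_hat = beta_from_reduced_form(gamma2_ok + pi2 * 2.0, pi2, omega_w)
```

In empirical work neither $\gamma_2$ nor $\Omega_W$ is known, so this check is only a device for understanding the condition, in line with the caveat in part (iv) of the remark.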
Clearly, Proposition 3.1, albeit interesting because it shows that the usual procedures, such as $F$- or Wald-type tests, may yield valid inference when IVs are invalid, cannot be implemented in empirical applications. In the remainder of this paper, we focus on developing tests for $H_{\theta_0}$ and building CSs for $\theta$ and scalar linear transformations of $\theta$.
4. Exact inference
In this section, we develop a finite-sample procedure for assessing $H_{\theta_0}$ and building CSs on $\theta$. First, we propose a test for $H_{\theta_0}$ that is similar despite possible instrument endogeneity and rank deficiency. Second, we use the test inversion method to obtain joint CSs with level $1-\alpha$ for $\theta$, where $0 < \alpha < 1$. Finally, we apply the projection techniques11 to get identification-robust CSs with level $1-\alpha$ (at least) for scalar linear transformations of $\theta$. The marginal CSs for the structural coefficient ($\beta$) and each component of instrument endogeneity ($\gamma_2$) are deduced as special cases of the proposed projection method.
10For example, the Donald and Newey (2001) bias-corrected estimator or the Phillips and Hale (1977) jackknife-instrumental-variables estimator.
11See Dufour and Jasiak (2001), Dufour and Taamouti (2005), Dufour and Taamouti (2007), and Doko Tchatoka and Dufour (2014).
4.1. Similar test for Hθ0
We propose a generalization of the Anderson and Rubin (1949) approach for assessing $H_{\theta_0}$. We note that alternative procedures, such as the Kleibergen (2002, K) and Moreira (2003, CLR) tests, could be exploited for that purpose.12 However, no finite-sample distributional theory is available for these methods, especially with heteroskedastic non-Gaussian errors. Further, these are not robust to missing instruments.13
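The Anderson and Rubin (1949) approach to test $H_{\theta_0}$ is to consider the transformed reduced-form equation for $y_1$:
$$y_1 - y_2\beta_0 - X_2\gamma_2^0 = X_1\xi_{11}^0 + X_2\xi_{21}^0 + v_1^0, \tag{4.1}$$
where $\xi_{11}^0 = \pi_1(\beta - \beta_0) + \gamma_1$, $\xi_{21}^0 = \pi_2(\beta - \beta_0) + \gamma_2 - \gamma_2^0$, and $v_1^0 \equiv v_1^0(\beta) = v_2(\beta - \beta_0) + e$. Since $\xi_{21}^0 = 0$ when $\beta = \beta_0$ and $\gamma_2 = \gamma_2^0$, we can assess $H_{\theta_0}$ by considering the $F$-statistic of the null hypothesis $\xi_{21}^0 = 0$ in (4.1). Let $\Omega = \frac{1}{n-\nu}\bar{Y}'M_X\bar{Y}$, where $\bar{Y} = [Y : X_2]$, and define
$$S^{+} = [(W'W)^{+}]^{1/2}\,W'\bar{Y}b_0\,(b_0'\Omega b_0)^{-1/2} \quad \text{with } b_0 = (1, -\theta_0')'. \tag{4.2}$$
The generalization of the AR-statistic for assessing $H_{\theta_0}$ is given by:
$$\Psi_{AR}(S^{+};\theta_0) = S^{+\prime}S^{+}/(\nu - \nu_1). \tag{4.3}$$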
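The statistic in (4.3) can be computed without forming the matrix square root, because $S^{+\prime}S^{+} = u_0'P_Wu_0/(b_0'\Omega b_0)$ with $u_0 = y_1 - y_2\beta_0 - X_2\gamma_2^0$ and $b_0'\Omega b_0 = u_0'M_Xu_0/(n-\nu)$. The NumPy sketch below relies on that identity; the function names, the rank tolerance, and the simulated design (including a deliberately collinear instrument) are assumptions for illustration rather than the paper's code.

```python
import numpy as np

def _proj(A, rcond=1e-10):
    """P_A = A (A'A)^+ A', valid even when A is rank-deficient."""
    return A @ np.linalg.pinv(A.T @ A, rcond=rcond) @ A.T

def ar_statistic(y1, y2, X1, X2, beta0, gamma20, tol=1e-8):
    """Generalized AR statistic for H0: (beta, gamma2) = (beta0, gamma20)."""
    n = y1.shape[0]
    X = np.column_stack([X1, X2])
    nu = np.linalg.matrix_rank(X, tol=tol)     # rank(X)
    nu1 = np.linalg.matrix_rank(X1, tol=tol)   # rank(X1)
    M_X = np.eye(n) - _proj(X)
    W = (np.eye(n) - _proj(X1)) @ X2           # W = M_{X1} X2
    u0 = y1 - y2 * beta0 - X2 @ gamma20        # transformed dependent variable
    num = u0 @ _proj(W) @ u0 / (nu - nu1)
    den = u0 @ M_X @ u0 / (n - nu)             # equals b0' Omega b0
    return num / den, int(nu - nu1), int(n - nu)

# Hypothetical design: the third instrument is a linear combination of the others.
rng = np.random.default_rng(1)
n = 40
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
z = rng.normal(size=(n, 2))
X2 = np.column_stack([z, z[:, 0] - z[:, 1]])
beta, gamma1, gamma2 = 1.0, np.array([1.0, -1.0]), np.array([0.5, 0.0, 0.0])
e = rng.normal(size=n)
v2 = 0.8 * e + rng.normal(size=n)
y2 = X1 @ np.array([0.5, 0.5]) + X2 @ np.array([0.1, 0.0, 0.0]) + v2
y1 = y2 * beta + X1 @ gamma1 + X2 @ gamma2 + e

stat, df1, df2 = ar_statistic(y1, y2, X1, X2, beta, gamma2)
```

The returned pair `(df1, df2)` corresponds to $(\nu - \nu_1, n - \nu)$, so collinear instruments reduce the numerator degrees of freedom instead of breaking the computation.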
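The corresponding test rejects $H_{\theta_0}$ at level $\alpha$ ($0 < \alpha < 1$) when
$$\Psi_{AR}(S^{+};\theta_0) > \kappa_{\Psi,\alpha}(S^{+};\theta_0), \tag{4.4}$$
where $\kappa_{\Psi,\alpha}(S^{+};\theta_0)$ is the $1-\alpha$ quantile of $\Psi_{AR}(S^{+};\theta_0)$ and the critical value function is defined as $\kappa_{\Psi,\alpha}(S^{+};\theta_0) = \inf\{\tau \in \mathbb{R} : P_{\theta_0}(\Psi_{AR}(S^{+};\theta_0) > \tau) \leq \alpha\}$. If the distribution of $\Psi_{AR}(S^{+};\theta_0)$, conditional14 on $X = x$, is absolutely continuous with respect to the Lebesgue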
12For example, Andrews et al. (2006) show that the CLR-test is nearly uniformly most powerful (UMP) among invariant similar tests that are asymptotically efficient, and recommend the use of this test in empirical practice. Guggenberger et al. (2012) show that the plug-in Anderson and Rubin (1949) (AR) and Kleibergen (2002) (K) subset statistics yield more powerful tests than their projection-based counterparts.
13See Dufour and Taamouti (2007), Dufour et al. (2013), and Doko Tchatoka (2014).
14Observe that for a given $b_0$, $S^{+}$ only depends on the data $(Y, X) \in \mathcal{Y} \times \mathcal{X}$. So, if the distribution of $Y$, given $X$, is absolutely continuous with respect to the Lebesgue measure, then the distribution of $\Psi_{AR}(S^{+};\theta_0)$ is also absolutely continuous with respect to the Lebesgue measure.
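measure, we obtain
$$P_{\theta_0}[\Psi_{AR}(S^{+};\theta_0) > \kappa_{\Psi,\alpha}(S^{+};\theta_0)] = \alpha, \tag{4.5}$$
so that the test based on the critical value $\kappa_{\Psi,\alpha}(S^{+};\theta_0)$ is exact. To implement this test, the critical values $\kappa_{\Psi,\alpha}(S^{+};\theta_0)$ need to be computed from the observed data, especially with non-Gaussian errors. This will be done using numerical simulations. Let
$$S^{+}_{\omega} = [(W'W)^{+}]^{1/2}\,W'\omega\left(\frac{\omega'M_X\omega}{n-\nu}\right)^{-1/2} \quad \text{for all } \omega \in \{e, \varepsilon_\sigma\}, \tag{4.6}$$
where $e$ and $\varepsilon_\sigma$ are the error terms satisfying (2.3) and (2.4), respectively. Let $P^{0}_{S^{+}}(x;\theta_0)$ and $P_{S^{+}_{\omega}}(x,\omega)$, $\omega \in \{e, \varepsilon_\sigma\}$, denote the distributions of $S^{+}|_{X=x}$ and $S^{+}_{\omega}|_{X=x}$, respectively, under $H_{\theta_0}$. We note that $P_{S^{+}_{\omega}}(x,\omega)$ does not directly depend on the specific value $\theta_0$ tested because the statistic $S^{+}_{\omega}$ does not directly involve $\theta$. We can now state Lemma 4.1 on the behavior of $S^{+}$ and $S^{+}_{\omega}$, $\omega \in \{e, \varepsilon_\sigma\}$, under $H_{\theta_0}$.
Lemma 4.1 Suppose that Assumptions A - B and $H_{\theta_0}$ are satisfied. Then, conditional on $X = x$, we have:
(a) $P^{0}_{S^{+}}(x;\theta_0) = P_{S^{+}_{e}}(x,e)$;
(b) $P_{S^{+}_{e}}(x,e)$ is invariant to the transformation (2.4) $\Leftrightarrow$ $P_{S^{+}_{e}}(x,e) = P_{S^{+}_{\varepsilon_\sigma}}(x,\varepsilon_\sigma)$ $\forall\, \varepsilon_\sigma$ satisfying (2.4). If further Assumption D holds, we have $P_{S^{+}_{e}}(x,e) \equiv P_{S^{+}_{\varepsilon}}(x,\varepsilon)$, where $\varepsilon \sim P_\varepsilon(x)$ and $P_\varepsilon(x)$ is completely specified.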
Remark 4.2 (i) Lemma 4.1-(a) shows that the distribution of $S^{+}$, under $H_{\theta_0}$, only depends on $X$ and the error of the regression (2.3). So, the reduced-form errors $v_2$ play no role and can therefore be heteroskedastic in any arbitrary way. From (4.3), it is also clear that the null distribution of $\Psi_{AR}(S^{+};\theta_0)$ depends only on $X$ and the distribution of $e$.
(ii) Lemma 4.1-(b) shows that the conditional distribution of $S^{+}$ under $H_{\theta_0}$, given $X = x$, is invariant to any linear transformation satisfying (2.4). In particular, the conditional distribution of $S^{+}$ under $H_{\theta_0}$, given $X = x$, only depends on the distribution of $\varepsilon$ under Assumption D. Therefore, the distribution of $\Psi_{AR}(S^{+};\theta_0)|_{X=x}$ under $H_{\theta_0}$ only depends on the distribution of $\varepsilon$.
(iii) If $\varepsilon$ is normally distributed15 and is independent of $X$, then it is straightforward to show that $\Psi_{AR}(S^{+};\theta_0) \sim F(\nu - \nu_1, n - \nu)$ for all values of $\pi_2$. So, $H_{\theta_0}$ can be assessed by using the critical values of an $F$-distribution with $(\nu - \nu_1, n - \nu)$ degrees of freedom. However, if (2.5) does not hold (non-Gaussian errors) or if $\varepsilon$ is not independent of $X$, the null distribution of $\Psi_{AR}(S^{+};\theta_0)|_{X=x}$ is nonstandard. Nevertheless, it does not involve any nuisance parameter. So, we can proceed as follows16 to compute the $1-\alpha$ critical value of $\Psi_{AR}(S^{+};\theta_0)$ under $H_{\theta_0}$:
(1) choose $\alpha_1$ and $N$ so that $\alpha = \frac{[\alpha_1 N] + 1}{N + 1}$, where $[z]$ is the smallest integer greater than $z$;
(2) for a given $\theta_0$, compute the test statistic $\Psi^{(0)}_{AR}(S^{+};\theta_0)$ based on the observed data;
(3) generate $N$ i.i.d. error vectors $\varepsilon^{(j)} = [\varepsilon^{(j)}_1, \ldots, \varepsilon^{(j)}_n]'$, $j = 1, \ldots, N$, according to the specified distribution $P_\varepsilon(x)$ and compute the corresponding statistic $\Psi^{(j)}_{AR}$, $j = 1, \ldots, N$, following (4.3); note that the null distribution of $\Psi_{AR}(S^{+};\theta_0)$ does not depend on the specific value $\theta_0$ tested, so there is no need to make it depend on $\theta_0$;
(4) compute the empirical distribution function based on $\Psi^{(j)}_{AR}$, $j = 1, \ldots, N$,
$$P_\Psi(z;N) \equiv P_\Psi(z) = \frac{\sum_{j=1}^{N} 1[\Psi^{(j)}_{AR} \leq z]}{N + 1}, \tag{4.7}$$
where $1[C] = 1$ if condition $C$ holds, and $1[C] = 0$ otherwise;
(5) reject $H_{\theta_0}$ at level $\alpha$ when $\Psi^{(0)}_{AR}(S^{+};\theta_0) \geq \kappa_{MC}(\varepsilon;\alpha) = P^{-1}_{\Psi}(1 - \alpha_1)$, where $P^{-1}_{\Psi}(q) = \inf\{z : P_\Psi(z) \geq q\}$ is the generalized inverse of $P_\Psi(\cdot)$.
We can now prove Theorem 4.3 on the validity of the AR-test, where $F_\alpha(\nu - \nu_1, n - \nu)$ denotes the $1-\alpha$ quantile of $F(\nu - \nu_1, n - \nu)$.
Theorem 4.3 Suppose that Assumptions A - B and D are satisfied. Then, the test that rejects $H_{\theta_0}$ when $\Psi_{AR}(S^{+};\theta_0) > c_\Psi(\varepsilon;\alpha)$ is similar with significance level $\alpha$ for all values of $\pi_2$ (instrument quality), where $c_\Psi(\varepsilon;\alpha) = F_\alpha(\nu - \nu_1, n - \nu)$ if (2.5) holds and $X$ is independent of $\varepsilon$, and $c_\Psi(\varepsilon;\alpha) = \kappa_{MC}(\varepsilon;\alpha)$ otherwise.
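The simulation steps of Remark 4.2-(iii) can be sketched as follows. Under $H_{\theta_0}$ the statistic reduces to $\frac{\varepsilon'P_W\varepsilon/(\nu-\nu_1)}{\varepsilon'M_X\varepsilon/(n-\nu)}$, which depends only on $X$ and the error distribution, so it can be simulated by drawing errors from the specified $P_\varepsilon$. This is a minimal sketch under stated assumptions: the helper names, the Cauchy error choice, and the design are hypothetical, and the decision rule is written in the common Monte Carlo p-value form, $(1 + \#\{\Psi^{(j)} \geq \Psi^{(0)}\})/(N+1) \leq \alpha$, rather than through the critical value of (4.7).

```python
import numpy as np

def proj(A, rcond=1e-10):
    """P_A = A (A'A)^+ A' via the Moore-Penrose inverse."""
    return A @ np.linalg.pinv(A.T @ A, rcond=rcond) @ A.T

def null_draws(X1, X2, draw_errors, N, rng):
    """Simulate N draws of the AR-type ratio under H0, conditional on X."""
    n = X1.shape[0]
    M_X = np.eye(n) - proj(np.column_stack([X1, X2]))
    P_W = proj((np.eye(n) - proj(X1)) @ X2)
    df1 = int(round(np.trace(P_W)))      # nu - nu1 = rank(W)
    df2 = int(round(np.trace(M_X)))      # n - nu
    draws = np.empty(N)
    for j in range(N):
        eps = draw_errors(rng, n)        # one error vector from P_eps
        draws[j] = (eps @ P_W @ eps / df1) / (eps @ M_X @ eps / df2)
    return draws

def mc_pvalue(stat_obs, draws):
    """Monte Carlo p-value: (1 + #{draws >= observed}) / (N + 1)."""
    draws = np.asarray(draws, dtype=float)
    return (1.0 + np.sum(draws >= stat_obs)) / (draws.size + 1.0)

rng = np.random.default_rng(7)
n, N, alpha = 30, 99, 0.05               # alpha * (N + 1) is an integer
X1 = np.ones((n, 1))
X2 = rng.normal(size=(n, 2))

def cauchy(rng, n):
    """Heavy-tailed errors without moments (assumed P_eps for the example)."""
    return rng.standard_cauchy(n)

draws = null_draws(X1, X2, cauchy, N, rng)
# Stand-in for the observed statistic: one further draw generated under H0.
stat_obs = null_draws(X1, X2, cauchy, 1, rng)[0]
p_value = mc_pvalue(stat_obs, draws)
reject = p_value <= alpha
```

In an application, `stat_obs` would be the statistic (4.3) computed from the data at the tested $\theta_0$, and the same simulated draws can be reused for every $\theta_0$ because the null distribution does not depend on the value tested.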
Remark 4.4 (i) Theorem 4.3 shows that the critical values computed as in Remark 4.2-(iii) yield a test with correct level in finite samples, even when the model is weakly identified ($\pi_2 = 0$ or is close to being so) and the errors are non-Gaussian. So, the proposed test is
15That is, if equation (2.5) is satisfied.
16We cover the case in which $P_\varepsilon(x)$ is continuous, so that the null distribution of $\Psi_{AR}(S^{+};\theta_0)|_{X=x}$ is also continuous. If $P_\varepsilon(x)$ is not continuous, the Monte Carlo test algorithm can easily be adapted by using a “tie-breaking” method, as in Dufour (2006).
robust to weak IVs and non-Gaussian errors (even in small samples), despite possible instrument invalidity ($\gamma_2^0 \neq 0$).
(ii) Since the null distribution of $\Psi_{AR}(S^{+};\theta_0)$ does not depend on any of the variables and parameters in (2.2), Theorem 4.3 holds even when the reduced form for $y_2$ is given by
$$y_2 = m(X_1, X_2, X_3, v_2, \pi_1^{*}, \pi_2^{*}, \pi_3^{*}), \tag{4.8}$$
where $\pi_1^{*}$, $\pi_2^{*}$, and $\pi_3^{*}$ are vectors of unknown reduced-form coefficients, $m(\cdot)$ is an arbitrary unspecified (possibly) nonlinear function, and $X_3 \in \mathbb{R}^{n \times k_3}$ is a matrix of instruments that may have been omitted from (2.2). Because of the latter properties, the proposed procedure is robust to nonlinear and incomplete reduced forms [similar to Dufour and Taamouti (2007) and Dufour et al. (2013)]. More interestingly, $y_{21}, \ldots, y_{2n}$ may be arbitrarily heterogeneous and the reduced-form disturbances $v_{21}, \ldots, v_{2n}$ may not follow a Gaussian distribution or may also be arbitrarily heteroskedastic. So, the proposed procedure is also robust to heterogeneity in the reduced forms.
We now examine the finite-sample power of the proposed test. To do this, we consider the following linear transformation [similar to (2.4)] of the error $v_1^0$ of the regression (4.1):
$$\mathbb{R}^n \longrightarrow \mathbb{R}^n, \qquad v_1^0 \mapsto \varepsilon_{\sigma_\beta} = \sigma_\beta(X)v_1^0, \tag{4.9}$$
where $\sigma_\beta(X)$ is (possibly) a random function of $X$ and $\beta$ such that $P_\delta[\sigma_\beta(X) \neq 0 \mid X = x] = 1$. In addition, we also make the following assumption.
Assumption E There exists $\sigma_\beta(X)$ satisfying (4.9) such that $\varepsilon_{\sigma_\beta}|_{X=x} \overset{d}{\sim} v$, where the distribution $P_v(x)$ of $v$, given $X = x$, is completely specified.
Assumption E is similar to Assumption D. It states that the distribution of the reduced-form disturbance $v_1^0$ only depends on $X$ and a typically unknown (possibly) random scale factor $\sigma_\beta(X)$, which is also (possibly) a function of both $X$ and the structural coefficient $\beta$. Again, a Gaussian distribution for $v_1^0$ is obtained by choosing $P_v(x) = N(0, I_n)$. But non-Gaussian distributions, including heavy-tailed distributions which may lack moments, are covered. In general, Assumptions D and E do not entail each other, except when $\beta = \beta_0$ or
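the conditional distribution of $(e, v_2)$, given $X = x$, is Gaussian with finite second moments. Let
$$S_v = [(W'W)^{+}]^{1/2}\,W'v, \qquad \sigma_v = \left(\frac{v'M_Xv}{n-\nu}\right)^{1/2}, \qquad \text{and} \qquad \mu^{\theta}_{\pi_2} = \mu_{\pi_2}C_\theta, \tag{4.10}$$
where $C_\theta = (\theta - \theta_0)\sigma_\beta(X)$, $\mu_{\pi_2} = [(W'W)^{+}]^{1/2}\,W'W[\pi_2 : I_{k_2}]$, and $v$ is the error in Assumption E. Lemma 4.5 characterizes the distribution of $S^{+}$ and $\Psi_{AR}(S^{+};\theta_0)$ under $H_{\theta_1}$.
Lemma 4.5 Suppose that Assumptions A - B and E are satisfied. If further $\theta \neq \theta_0$, then we have:
$$S^{+} \overset{d}{\sim} \sigma_v^{-1}(S_v + \mu^{\theta}_{\pi_2}) \qquad \text{and} \qquad \Psi_{AR}(S^{+};\theta_0) \overset{d}{\sim} (\nu - \nu_1)^{-1}\sigma_v^{-2}(S_v + \mu^{\theta}_{\pi_2})'(S_v + \mu^{\theta}_{\pi_2}).$$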
Remark 4.6 (i) Lemma 4.5 states that the distribution of $\Psi_{AR}(S^{+};\theta_0)$, under $H_{\theta_1}$, only depends on the distributions of $S_v$ and $\sigma_v$, as well as the factor $\mu^{\theta}_{\pi_2}$. Since, given $X = x$, the distributions of $S_v$ and $\sigma_v$ only depend on that of $v$, it is clear that the conditional distribution of $\Psi_{AR}(S^{+};\theta_0)$ under $H_{\theta_1}$, given $X = x$, only depends on $\mu^{\theta}_{\pi_2}$ and the distribution of $v$. Therefore, the power function, $\eta_{AR}(\cdot)$, of the corresponding AR-test that rejects $H_{\theta_0}$ when $\Psi_{AR}(S^{+};\theta_0) > c_\Psi(\varepsilon;\alpha)$, is entirely determined by the distribution of $v$ and the factor $\mu^{\theta}_{\pi_2}$, i.e.
$$P_{\theta \in H_{\theta_1}}\left[\Psi_{AR}(S^{+};\theta_0) > c_\Psi(\varepsilon;\alpha)\right] = \eta_{AR}(v, \mu^{\theta}_{\pi_2};\alpha). \tag{4.11}$$
Under Assumption E, $v \sim P_v(x)$ and $P_v(x)$ does not depend on $\theta$. So, $\mu^{\theta}_{\pi_2}$ is the only factor that determines test power.
(ii) If $v \sim P_v(x) \equiv N(0, I_n)$ and $X$ is independent of $v$, then $\Psi_{AR}(S^{+};\theta_0)|_{X=x} \sim F_{\tau^2_{x,\theta}}(\nu - \nu_1, n - \nu)$ for all values of $\pi_2$, where $\tau^2_{x,\theta} = \sigma_\beta^2\|\mu^{\theta}_{\pi_2}\|^2$ is the non-centrality parameter [similar to Revankar and Hartley (1972)]. Therefore, the exact power of the test in (4.11) can be computed from the sample using a noncentral $F$-distribution with $(\nu - \nu_1, n - \nu)$ degrees of freedom and non-centrality parameter $\tau^2_{x,\theta}$ for $\theta$ and $\pi_2$ fixed. If $P_v(x)$ is not a normal distribution or $X$ depends on $v$, the distribution of $\Psi_{AR}(S^{+};\theta_0)|_{X=x}$, under $H_{\theta_1}$, is nonstandard but it can be simulated for $\theta$ and $\pi_2$ fixed. So, the exact power of the test can also be simulated for $\theta$ and $\pi_2$ fixed, by using the Monte Carlo test method described in Remark 4.2-(iii). We can now state the following necessary and sufficient condition under which the proposed AR-test exhibits power in finite samples.
14
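The power calculation in Remark 4.6 (ii) can be illustrated by simulation. The following is a minimal sketch, assuming Gaussian errors so that the statistic is noncentral $F$ with degrees of freedom `d1` $= \nu-\nu_1$ and `d2` $= n-\nu$ and non-centrality `tau2`. The function name, the default number of draws, and the use of a simulated critical value are illustrative choices, not part of the paper.

```python
import random

def simulate_ar_power(d1, d2, tau2, alpha=0.05, n_draws=5000, seed=0):
    """Monte Carlo estimate of P[F > c] for a noncentral F(d1, d2; tau2) statistic."""
    rng = random.Random(seed)

    def chi2(df):
        # Central chi-square(df) drawn as Gamma(shape=df/2, scale=2); zero if df == 0.
        return rng.gammavariate(df / 2.0, 2.0) if df > 0 else 0.0

    def f_draw(noncentrality):
        # Noncentral chi-square(d1, nc) = (Z + sqrt(nc))^2 + chi-square(d1 - 1).
        z = rng.gauss(0.0, 1.0) + noncentrality ** 0.5
        numerator = (z * z + chi2(d1 - 1)) / d1
        denominator = chi2(d2) / d2
        return numerator / denominator

    # Critical value: empirical (1 - alpha) quantile of draws under the null (tau2 = 0).
    null_draws = sorted(f_draw(0.0) for _ in range(n_draws))
    critical = null_draws[int((1.0 - alpha) * n_draws) - 1]
    # Power: rejection frequency of independent draws under the alternative.
    alt_draws = [f_draw(tau2) for _ in range(n_draws)]
    return sum(d > critical for d in alt_draws) / n_draws
```

Calling `simulate_ar_power(6, 44, 0.0)` should return a value near the nominal level, and the estimate should increase with `tau2`. The degrees of freedom used here are hypothetical.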
Theorem 4.7 Suppose that Assumptions A - B and E are satisfied. Then, the test that rejects $H_{\theta_0}$ when $\Psi_{AR}(S^{+};\theta_0) > c_\Psi(\varepsilon;\alpha)$ exhibits power for all values of $\pi_2$ if, and only if, $\xi^0_{21} \notin \mathrm{Ker}\left([(W'W)^{+}]^{1/2} W'W\right)$, where $\xi^0_{21} = [\pi_2 : I_{k_2}](\theta - \theta_0)$.
Theorem 4.7 follows directly from Lemma 4.5, which shows that $\mu_{\pi_2}^{\theta}$ is the only factor that determines the power of the proposed AR-test, i.e., power exists if, and only if, $\mu_{\pi_2}^{\theta} = \sigma_\beta(X)[(W'W)^{+}]^{1/2} W'W \xi^0_{21} \neq 0$, or equivalently, $\xi^0_{21} \notin \mathrm{Ker}\left([(W'W)^{+}]^{1/2} W'W\right)$, since $P_\delta[\sigma_\beta(X) \neq 0 \mid X = x] = 1$ from (4.9). As seen from the expression of $\xi^0_{21}$, power may still exist even when $\pi_2 = 0$ (irrelevant instruments), provided $\gamma_2 - \gamma_2^0 \neq 0$. However, the test has low power if both $\pi_2$ and $\gamma_2 - \gamma_2^0$ are zero or close to being so. We now focus on building CSs for $\theta_0$ and scalar linear transformations of $\theta_0$.
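The kernel condition of Theorem 4.7 can be checked numerically. The sketch below assumes a scalar structural coefficient ($G = 1$), so that $\xi^0_{21} = \pi_2(\beta - \beta_0) + (\gamma_2 - \gamma_2^0)$, and uses the fact that $[(W'W)^{+}]^{1/2} W'W\xi = 0$ holds exactly when $W\xi = 0$. The function name, the tolerance, and the example matrices are hypothetical.

```python
def has_power_direction(W, pi2, beta_gap, gamma_gap, tol=1e-10):
    """Return True if W @ xi != 0, where xi = pi2 * beta_gap + gamma_gap.

    W is a list of rows (n x k2); pi2 and gamma_gap are length-k2 lists;
    beta_gap is the scalar beta - beta_0 (assumes G = 1).
    """
    xi = [p * beta_gap + g for p, g in zip(pi2, gamma_gap)]
    w_xi = [sum(w_ij * x_j for w_ij, x_j in zip(row, xi)) for row in W]
    return sum(v * v for v in w_xi) ** 0.5 > tol


# Irrelevant instruments (pi2 = 0) but a violated exclusion restriction: power exists.
W_full = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(has_power_direction(W_full, [0.0, 0.0], 2.0, [0.5, 0.0]))

# Collinear instruments with xi in the kernel of W: no power in this direction.
W_collinear = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
print(has_power_direction(W_collinear, [0.0, 0.0], 2.0, [1.0, -1.0]))
```

The first call illustrates the point made in the text that power may exist with irrelevant instruments; the second shows a direction that collinearity makes undetectable.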
4.2. Exact confidence sets
In this section, we develop a methodology to build CSs for $\theta_0$ and linear combinations of the elements of $\theta_0$. When $\theta_0$ is unknown, $\Psi^{(0)}_{AR}(S^{+};\theta_0)$ is also unknown and the test procedure described in Remark 4.2-(iii) is not directly implementable. We stress that exact CSs can nevertheless be obtained for model parameters by using test inversion techniques. In Section 4.2.1, we describe how to build joint CSs for $\theta_0$, while in Section 4.2.2, we deal with scalar linear transformations $w'\theta_0$, for some $w \neq 0$.
4.2.1. Joint confidence sets for θ
In Theorem 4.3, we show that the test that rejects $H_{\theta_0}$ when $\Psi_{AR}(S^{+};\theta_0) > c_\Psi(\varepsilon;\alpha)$ is similar with significance level $\alpha$ for any identification strength $\pi_2$. So, we can invert $\Psi_{AR}(S^{+};\theta_0)$ to obtain a joint CS with level $1-\alpha$ for $\theta_0$. More precisely, the generalized Anderson-Rubin-type CS for $\theta_0$ is given by:
$$C_\theta(\alpha) = \{\theta_0 : \Psi_{AR}(S^{+};\theta_0) \le c_\Psi(\varepsilon;\alpha)\} = \{\theta_0 : Q(\theta_0) \le 0\}, \tag{4.12}$$
where $Q(\theta_0) = \theta_0' A \theta_0 + b'\theta_0 + c$ is a quadratic-linear form in $\theta_0$ such that $A = [y_2 : X_2]' H [y_2 : X_2]$, $b = -2[y_2 : X_2]' H y$, $c = y' H y$, and $H = M_{X_1} - \left[1 + c_\Psi(\varepsilon;\alpha)\left(\frac{\nu-\nu_1}{n-\nu}\right)\right] M_X$. Depending on the values of $A$, $b$, and $c$, the quadric surface $Q(\theta_0) = 0$ may take different forms: ellipsoid, paraboloid, hyperboloid, and cone. So, the confidence set $C_\theta(\alpha)$ may be unbounded; see Dufour and Taamouti (2005, Theorem 4.1). In particular, $C_\theta(\alpha)$ is unbounded when $A$ is not positive semi-definite. We will now focus on building CSs for $w'\theta_0$.
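Membership in the joint confidence set (4.12) reduces to evaluating the sign of $Q(\theta_0)$. The sketch below is an illustration only: the function names are hypothetical, and the numerical values of $A$, $b$, and $c$ are those reported in the first panel of Table 1 of the empirical application.

```python
def q_form(theta, A, b, c):
    """Evaluate Q(theta) = theta' A theta + b' theta + c."""
    k = len(theta)
    quad = sum(theta[i] * A[i][j] * theta[j] for i in range(k) for j in range(k))
    lin = sum(b_i * t_i for b_i, t_i in zip(b, theta))
    return quad + lin + c


def in_joint_cs(theta, A, b, c):
    """theta belongs to the joint CS exactly when Q(theta) <= 0."""
    return q_form(theta, A, b, c) <= 0.0


# Coefficients taken from the first panel of Table 1.
A = [[41.437, 77.262, 167.026],
     [77.262, 697.101, 2.275],
     [167.026, 2.275, 500.517]]
b = [-21.193, -64.910, -52.519]
c = 1.886

print(in_joint_cs((0.0, 0.0, 0.0), A, b, c))     # Q = c > 0, so theta = 0 is excluded
print(in_joint_cs((0.1, 0.04, 0.01), A, b, c))   # Q is negative at this point
```

Evaluating $Q$ pointwise like this is enough to trace out the set on a grid; the analytic projection results below avoid the grid altogether.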
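4.2.2. Projection-based confidence sets for $w'\theta_0$
We use projection techniques^17 to obtain CSs for scalar linear transformations $w'\theta_0$. Let $h(\theta)$ be an arbitrary function of $\theta$, and $C_\theta(\alpha)$ be the joint CS for $\theta_0$ in (4.12). Since the event $\theta \in C_\theta(\alpha)$ entails $h(\theta) \in h[C_\theta(\alpha)]$, the set $h[C_\theta(\alpha)] = \{h(\theta) : \theta \in C_\theta(\alpha)\}$ is a confidence set with level (at least)^18 $1-\alpha$ for $h(\theta)$. The confidence set with level (at least) $1-\alpha$ for $h(\theta_0) = w'\theta_0$, obtained by projecting $C_\theta(\alpha)$, is defined as:
$$C_{w'\theta}(\alpha) = h[C_\theta(\alpha)] = \{\zeta_0 : \zeta_0 = w'\theta_0 \text{ for some } \theta_0 \in C_\theta(\alpha)\} = \{\zeta_0 : \zeta_0 = w'\theta_0 \text{ s.t. } Q(\theta_0) \le 0\}. \tag{4.13}$$
Without loss of generality, partition $w$ as $w = (w_1, w_2')'$, where $w_1 \neq 0$ is a scalar and $w_2$ is a $k_2 \times 1$ vector (possibly zero). Let
$$R = \begin{bmatrix} w' \\ R_2 \end{bmatrix} = \begin{bmatrix} w_1 & w_2' \\ 0 & I_{G+k_2-1} \end{bmatrix}$$
and define
$$\bar A = R^{-1\prime} A R^{-1} = \begin{bmatrix} \bar a_{11} & \bar A_{21}' \\ \bar A_{21} & \bar A_{22} \end{bmatrix}, \quad \bar b = R^{-1\prime} b = \begin{bmatrix} \bar b_1 \\ \bar b_2 \end{bmatrix}, \tag{4.14}$$
where $A$ and $b$ are given in (4.12). Also, consider the spectral decomposition of $\bar A_{22}$ given by:
$$\bar A_{22} = P_2 \Lambda_2 P_2', \quad \Lambda_2 = \mathrm{diag}(\lambda_1, \ldots, \lambda_{k_2}), \tag{4.15}$$
where $P_2 = [P_{21} : P_{22}]$ with $P_{21} : k_2 \times p_2$, $P_{22} : k_2 \times (k_2 - p_2)$, and $\lambda_j$ are the eigenvalues of $\bar A_{22}$ with $\lambda_j \neq 0$ if $1 \le j \le p_2$, $\lambda_j = 0$ if $j > p_2$; and $p_2 = \mathrm{rank}(\bar A_{22})$. We can now state Theorem 4.8 on the analytic form of $C_{w'\theta}(\alpha)$ in (4.13).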
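Theorem 4.8 Suppose that (2.1) - (2.2), Assumptions A - B, and D are satisfied. Then, we have:
$$C_{w'\theta}(\alpha) = \begin{cases} \{\zeta_0 : \tilde a_1 \zeta_0^2 + \tilde b_1 \zeta_0 + \tilde c_1 \le 0\} \cup S_1 & \text{if } \bar A_{22} \neq 0 \text{ is p.s.d.}, \\ \{\zeta_0 : \tilde a_1 \zeta_0^2 + \tilde b_1 \zeta_0 + c \le 0\} \cup \{\zeta_0 : 2\bar A_{21}\zeta_0 + \bar b_2 \neq 0\} & \text{if } \bar A_{22} = 0, \\ \mathbb{R} & \text{otherwise;} \end{cases}$$
where $\tilde a_1 = \bar a_{11} - \bar A_{21}' \bar A_{22}^{+} \bar A_{21}$, $\tilde b_1 = \bar b_1 - \bar A_{21}' \bar A_{22}^{+} \bar b_2$, $\tilde c_1 = c - \frac{1}{4}\bar b_2' \bar A_{22}^{+} \bar b_2$, $S_1 = \emptyset$ if $\mathrm{rank}(\bar A_{22}) = k_2$, and $S_1 = \{\zeta_0 : P_{22}'(2\bar A_{21}\zeta_0 + \bar b_2) \neq 0\}$ if $1 \le \mathrm{rank}(\bar A_{22}) < k_2$.
17 See Dufour and Jasiak (2001) and Dufour and Taamouti (2005, 2007).
18 Observe that $P[h(\theta) \in h[C_\theta(\alpha)]] \ge P[\theta \in C_\theta(\alpha)] \ge 1-\alpha$, so that $h[C_\theta(\alpha)]$ has level at least $1-\alpha$.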
Remark 4.9 (i) First, we observe that Theorem 4.8 is similar to Theorem 4.1 in Dufour and Taamouti (2007), so we only give an outline of the proof in the appendix.
(ii) The theorem provides the analytical form of the CSs for any linear combination of the elements of $\theta_0$, but we find it useful to discuss the following two applications in detail: (1) the CS for the structural coefficient $\beta_0$, and (2) instrument selection.
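1. CS for the structural coefficient $\beta_0$
The CS for $\beta_0$ is obtained from Theorem 4.8 by choosing $w_1 = 1$ and $w_2 = 0$ in (4.14). In this case, we have $\bar a_{11} = y_2' H y_2$, $\bar A_{21} = W'y_2$, $\bar A_{22} = W'W$, $\bar b_1 = -2y_2' H y_1$, and $\bar b_2 = -2W'y_1$, where $H$ is given in (4.12) and $W = M_{X_1} X_2$. So, the CS for $\beta_0$ with level (at least) $1-\alpha$ is explicitly given by:
$$C_\beta(\alpha) = \begin{cases} \{\beta_0 : \tilde a_1 \beta_0^2 + \tilde b_1 \beta_0 + \tilde c_1 \le 0\} \cup S_1 & \text{if } W'W \neq 0 \text{ is p.s.d.}, \\ \{\beta_0 : \tilde a_1 \beta_0^2 + \tilde b_1 \beta_0 + c \le 0\} \cup S_2 & \text{if } W'W = 0, \\ \mathbb{R} & \text{if } W'W \text{ is not p.s.d.,} \end{cases} \tag{4.16}$$
where $\tilde a_1 = y_2'(H - P_W)y_2$, $\tilde b_1 = -2y_2'(H - P_W)y_1$, $\tilde c_1 = y_1'(H - P_W)y_1$, $P_W = W(W'W)^{+}W'$, $S_2 = \{\beta_0 : W'y_2\beta_0 - W'y_1 \neq 0\}$, and $S_1 = \emptyset$ if $\mathrm{rank}(W'W) = k_2$ and $S_1 = \{\beta_0 : P_{22}'(W'y_2\beta_0 - W'y_1) \neq 0\}$ if $1 \le \mathrm{rank}(W'W) < k_2$. So, the analytical form of $C_\beta(\alpha)$ in (4.16) can be determined by examining the eigenvalues of the instrument matrix $W'W$. For example, if all eigenvalues of $W'W$ are positive, $C_\beta(\alpha)$ takes the form of the quadratic inequality, i.e., $C_\beta(\alpha) = \{\beta_0 : \tilde a_1 \beta_0^2 + \tilde b_1 \beta_0 + \tilde c_1 \le 0\}$.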
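2. Instrument selection
A second interesting application of Theorem 4.8 is instrument selection. Let $\gamma_2 = (\gamma_{21}, \ldots, \gamma_{2k_2}) \equiv (\gamma^0_{2p})_{1 \le p \le k_2}$. Since $\gamma^0_{2p} = 0$ entails that the variable $X_{2p}$ constitutes a valid instrument, the CS for $\gamma^0_{2p}$ provides a test of the validity of $X_{2p}$ for all $p = 1, \ldots, k_2$. Specifically, we select $X_{2p}$ as a valid IV if the CS of its coefficient ($\gamma^0_{2p}$) in the structural equation contains zero, i.e., $0 \in C_{\gamma^0_{2p}}(\alpha)$. If $0 \notin C_{\gamma^0_{2p}}(\alpha)$, $X_{2p}$ does not constitute a valid instrument.
We stress that instrument selection may still be meaningful, even though (4.16) provides a valid CS for the structural coefficient of interest $\beta$. For example, in empirical applications where not all instruments are weak, a procedure that selects the valid ones may yield a consistent point estimate of $\beta$ that is relevant for policy analysis. We now show how to obtain $C_{\gamma^0_{2p}}(\alpha)$ from Theorem 4.8, for all $p = 1, \ldots, k_2$.
(1) For each $p = 1, \ldots, k_2$, rearrange the parameters and data as follows:
$$\theta_{(p)} = (\gamma^0_{2p}, \theta^{*\prime}_{(p)})', \quad \theta^{*}_{(p)} = (\beta, \gamma^{*\prime}_{2(p)})', \quad \gamma^{*}_{2(p)} = \gamma_2 \backslash \gamma^0_{2p}, \tag{4.17}$$
$$X^{(p)}_2 = [y_2 : X_{2(p)}], \quad X_{2(p)} = X_2 \backslash X_{2p}, \quad W = M_{X_1}X_2 = [W_1, \ldots, W_p, \ldots, W_{k_2}], \tag{4.18}$$
where, by convention, $\gamma^{*}_{2(p)}$ is simply not present in (4.17) when $k_2 = 1$.
(2) Compute the quantities $\bar a^{(p)}_{11} = W_p'W_p$, $\bar A^{(p)}_{21} = X^{(p)\prime}_2 W_p$, $\bar A^{(p)}_{22} = X^{(p)\prime}_2 H X^{(p)}_2$, $\bar b^{(p)}_1 = -2W_p'y_1$, $\bar b^{(p)}_2 = -2X^{(p)\prime}_2 H y_1$, and $c^{(p)} = y_1' H y_1$, as well as $\tilde a_{1p} = W_2' P_{HX^{(p)}_2} W_p$, $\tilde b_{1p} = -2W_p'(I_n - P_{HX^{(p)}_2})y_1$, and $\tilde c_{1p} = y_1'(H - P_{HX^{(p)}_2})y_1$, where $P_{HX^{(p)}_2} = HX^{(p)}_2 (X^{(p)\prime}_2 H X^{(p)}_2)^{+} X^{(p)\prime}_2 H$.
(3) The CS $C_{\gamma^0_{2p}}(\alpha)$, $p = 1, \ldots, k_2$, is obtained by choosing $w \equiv w^{(p)} = (1, 0')'$ and replacing $\bar a_{11}$, $\bar A_{21}$, $\bar A_{22}$, $\bar b_1$, $\bar b_2$, and $c$ by $\bar a^{(p)}_{11}$, $\bar A^{(p)}_{21}$, $\bar A^{(p)}_{22}$, $\bar b^{(p)}_1$, $\bar b^{(p)}_2$, and $c^{(p)}$ in (4.16), respectively, i.e.:
$$C_{\gamma^0_{2p}}(\alpha) = \begin{cases} \{\gamma^0_{2p} : \tilde a_{1p}(\gamma^0_{2p})^2 + \tilde b_{1p}\gamma^0_{2p} + \tilde c_{1p} \le 0\} \cup S_{1p} & \text{if } \bar A^{(p)}_{22} \neq 0 \text{ is p.s.d.}, \\ \{\gamma^0_{2p} : \tilde a_{1p}(\gamma^0_{2p})^2 + \tilde b_{1p}\gamma^0_{2p} + \tilde c_{1p} \le 0\} \cup S_{2p} & \text{if } \bar A^{(p)}_{22} = 0, \\ \mathbb{R} & \text{if } \bar A^{(p)}_{22} \text{ is not p.s.d.,} \end{cases} \tag{4.19}$$
where $S_{2p} = \{\gamma^0_{2p} : X^{(p)\prime}_2 W_p \gamma^0_{2p} - X^{(p)\prime}_2 H y_1 \neq 0\}$, $S_{1p} = \emptyset$ if $\mathrm{rank}(X^{(p)\prime}_2 H X^{(p)}_2) = k_2$ and $S_{1p} = \{\gamma^0_{2p} : P_{22}'(X^{(p)\prime}_2 W_p \gamma^0_{2p} - X^{(p)\prime}_2 H y_1) \neq 0\}$ if $1 \le \mathrm{rank}(X^{(p)\prime}_2 H X^{(p)}_2) < k_2$. Again, if $X^{(p)\prime}_2 H X^{(p)}_2$ is positive definite, then $C_{\gamma^0_{2p}}(\alpha)$ takes the form of the quadratic inequality, i.e., $C_{\gamma^0_{2p}}(\alpha) = \{\gamma^0_{2p} : \tilde a_{1p}(\gamma^0_{2p})^2 + \tilde b_{1p}\gamma^0_{2p} + \tilde c_{1p} \le 0\}$.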
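The selection rule above can be sketched in a few lines. The sketch assumes the pure quadratic case of (4.19), in which the sets $S_{1p}$ and $S_{2p}$ are empty, so that zero belongs to the CS exactly when the constant term is non-positive. The function name is hypothetical, and the coefficients are those reported for the two proximity indicators in the first panel of Table 1.

```python
def select_valid_instruments(quadratics):
    """Keep instruments whose quadratic CS {g : a*g^2 + b*g + c <= 0} contains zero.

    quadratics maps an instrument name to its coefficients (a, b, c).
    At g = 0 the quadratic equals c, so zero is in the set exactly when c <= 0.
    """
    return [name for name, (a, b, c) in quadratics.items() if c <= 0.0]


# Coefficients from the first panel of Table 1.
table1 = {
    "proximity to 2-year": (1.106, -0.084, 0.001),
    "proximity to 4-year": (2.434, -0.047, -0.0002),
}
print(select_valid_instruments(table1))
```

With these inputs only the 4-year proximity indicator is retained, which matches the intervals reported in Table 1.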
We will now illustrate our theory through a Monte Carlo experiment.
5. Simulation experiment
We use simulation to examine the performance of the proposed AR-test. The DGP^19 is
$$y_{1t} = y_{2t}\beta + u_t, \quad u_t = X_{2t}'\gamma_2 + e_t, \tag{5.1}$$
$$y_{2t} = m(X_{2t}, X_{3t}, v_{2t}; \pi_2, \delta), \quad t = 1, \ldots, n, \tag{5.2}$$
where the reduced-form model for $y_{2t}$ uses two alternative specifications: (1) $m(X_{2t}, X_{3t}, v_{2t}; \pi_2, \delta) = X_{2t}'\pi_2 + X_{3t}'\delta + v_{2t}$, and (2) $m(X_{2t}, X_{3t}, v_{2t}; \pi_2, \delta) = \exp(X_{2t}'\pi_2 + X_{3t}'\delta) + v_{2t}$. The first specification is the usual linear model, while the second is nonlinear. $X_3$ is an $n \times 5$ matrix of instruments that belong to the true DGP but are omitted in the inference (missing instruments). So, $\delta$ measures the degree of instrument omission in this setup: $\delta = 0$ means that no instrument is omitted, while $\delta \neq 0$ means that relevant instruments are excluded. In this experiment, we set $\delta = \lambda\delta_0$, where $\delta_0$ is a $5 \times 1$ vector of ones and $\lambda$ varies in $\{0, 0.01, 0.1, 1\}$. Here, $\lambda = 0$ is a design with no instrument exclusion, $\lambda = 0.01$ is a design with weak instrument exclusion, $\lambda = 0.1$ is a design with moderately weak instrument exclusion, and $\lambda = 1$ is a design with strong instrument exclusion. $X_2$ contains $k_2 = 5$ instruments that violate the exclusion restrictions if $\gamma_2 \neq 0$. Each column of $X_2$ and $X_3$ is generated i.i.d. normal with identity covariance matrix. The reduced-form coefficient vector $\pi_2$ is chosen as $\pi_2 = \left(\frac{\mu^2}{n\|X_2\pi_0\|}\right)^{1/2}\pi_0$, where $\pi_0$ is a $5 \times 1$ vector of ones and $\mu^2$ is the concentration parameter, which describes the strength of $X_2$. We vary $\mu^2$ in $\{0, 13, 1000\}$, where $\mu^2 = 0$ is a complete non-identification (irrelevant IVs) setup, $\mu^2 = 13$ is a design with weak instruments, and $\mu^2 = 1000$ corresponds to strong identification (strong instruments).^20 We set $\beta - \beta_0 = \beta^{*}$ and $\gamma_2 - \gamma_2^0 = \tau^{*}\gamma_2^0$, where $\beta_0 = 1$, $\gamma_2^0$ is a $5 \times 1$ vector of ones (so the IVs are invalid), and $\beta^{*}$ and $\tau^{*}$ vary in $\mathbb{R}$. In this setup, testing the null hypothesis $H_{\theta_0}$ is equivalent to testing whether $\beta^{*} = \tau^{*} = 0$. So, $\beta^{*} = \tau^{*} = 0$ in the graphs indicates the empirical size, while the values $\beta^{*} \neq 0$ and $\tau^{*} \neq 0$ indicate the empirical power of the test. To shorten the exposition, we only present the empirical power in the direction $\tau^{*} = \beta^{*}/3$, but the results do not change qualitatively with alternative directions.
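The design in (5.1)-(5.2) can be sketched as follows. This is a minimal illustration rather than the code used for the experiment: it assumes zero-mean regressors and homoskedastic bivariate Gaussian errors with unit variances and correlation `rho`, and the function and argument names are hypothetical.

```python
import math
import random

def simulate_dgp(n=50, mu2=1000.0, lam=0.0, beta_star=0.0, tau_star=0.0,
                 rho=0.5, nonlinear=False, seed=0):
    """Draw (y1, y2, X2) from (5.1)-(5.2) with k2 = 5 and Gaussian errors."""
    rng = random.Random(seed)
    k = 5
    X2 = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n)]
    X3 = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n)]
    # pi2 = (mu2 / (n * ||X2 pi0||))^(1/2) * pi0 with pi0 a vector of ones,
    # so X2 pi0 is the vector of row sums of X2.
    norm = math.sqrt(sum(sum(row) ** 2 for row in X2))
    pi2 = [math.sqrt(mu2 / (n * norm))] * k
    delta = [lam] * k                  # delta = lambda * delta0
    beta = 1.0 + beta_star             # beta0 = 1
    gamma2 = [1.0 + tau_star] * k      # gamma2 = gamma2_0 + tau* * gamma2_0
    y1, y2 = [], []
    for t in range(n):
        e = rng.gauss(0.0, 1.0)
        # v2 has unit variance and correlation rho with e (assumed error design).
        v2 = rho * e + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        index = (sum(x * p for x, p in zip(X2[t], pi2))
                 + sum(x * d for x, d in zip(X3[t], delta)))
        y2_t = (math.exp(index) if nonlinear else index) + v2
        u_t = sum(x * g for x, g in zip(X2[t], gamma2)) + e
        y1.append(y2_t * beta + u_t)
        y2.append(y2_t)
    return y1, y2, X2
```

The omitted instruments `X3` enter the data but are not returned, mirroring the fact that they are unavailable for inference. Setting `beta_star = tau_star = 0` generates data under the null hypothesis.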
We also consider two alternative specifications for the joint distribution of the errors $[e, v_2]$. In the first one, $(e_t, v_{2t})' \sim N[0, \sigma^2(X_2)\Sigma_\rho]$ for all $t = 1, \ldots, n$ (conditional Gaussian errors), where
$$\sigma^2(X_2) = \exp\left(\varpi\|k_2^{-1/2}X_2\|\right), \quad \Sigma_\rho = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix},$$
$\rho$ varies in $\{0.2, 0.5, 0.9\}$, and $\varpi \in \{0, 1, -1\}$. In the second one, $(e_t, v_{2t})$ follow a multivariate $t(3)$ distribution with the same covariance matrix as in the first specification. In both cases, Assumptions D and E are satisfied. If $\varpi = 0$, the errors are homoskedastic, but they are heteroskedastic if $\varpi \in \{1, -1\}$. We use the exact Monte Carlo test critical values in all cases.
19 Note that there is no exogenous variable $X_1$ in (5.1)-(5.2), but the results do not change qualitatively if such exogenous variables are included.
20 See Hansen et al. (2008) and Guggenberger (2010) for a similar parametrization.
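For readers unfamiliar with Monte Carlo tests, the decision rule can be sketched as below. This is a generic illustration of a Monte Carlo p-value for a statistic that can be simulated under the null, not a transcription of the procedure in the paper's remark; the function names and the choice of 99 replications are assumptions.

```python
import random

def mc_pvalue(observed, simulated):
    """Monte Carlo p-value: (1 + #{simulated >= observed}) / (N + 1)."""
    n_ge = sum(s >= observed for s in simulated)
    return (n_ge + 1) / (len(simulated) + 1)


def mc_test(observed, draw_null_stat, n_rep=99, alpha=0.05, seed=0):
    """Reject when the Monte Carlo p-value is at most alpha.

    draw_null_stat(rng) must return one draw of the statistic under the null.
    """
    rng = random.Random(seed)
    simulated = [draw_null_stat(rng) for _ in range(n_rep)]
    return mc_pvalue(observed, simulated) <= alpha


print(mc_pvalue(10.0, [1.0, 2.0, 3.0]))   # (0 + 1) / (3 + 1) = 0.25
```

Choosing `n_rep` so that `alpha * (n_rep + 1)` is an integer (for example 99 with a 5% level) is the usual way to make the rule attain the nominal level exactly when the null distribution is continuous.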
Figures 1 - 3 present the results. Figure 1 is about Gaussian heteroskedastic errors, while Figures 2 and 3 deal with homoskedastic and heteroskedastic $t(3)$-errors, respectively. In all figures, the power curves are drawn for each strength of the omitted instruments $X_3$ ($\lambda \in \{0, 0.01, 0.1, 1\}$). In each figure, sub-figures (a) and (c) represent the cases in which all instruments in $X_2$ are irrelevant ($\mu^2 = 0$), whereas (b) and (d) describe strong instruments ($\mu^2 = 1000$).^21 Meanwhile, sub-figures (a) and (b) use the linear specification of the reduced form for $y_2$, while (c) and (d) use the nonlinear specification. The sample size is set at $n = 50$, the nominal level at 5%, and the rejection frequencies are computed using $N = 10,000$ pseudo-samples.
First, we observe that in all cases, including heteroskedastic errors, a nonlinear reduced form, and missing instruments, the rejection frequencies under $H_{\theta_0}$ are very close to the nominal 5% level (see $\beta^{*} = 0$ in all graphs). So, the proposed tests are robust to weak identification, heteroskedastic and possibly non-Gaussian errors, as well as nonlinearity and instrument exclusion in the reduced-form specification, thus confirming our theoretical findings in Section 4. Second, we note that all tests have good power in all cases considered despite the relatively small sample size ($n = 50$). In particular, the exclusion of relevant instruments in the inference does not substantially affect the power of the tests when identification is strong ($\mu^2 = 1000$), as shown by the power curves for different values of $\lambda$ in sub-figures (b) and (d). However, instrument exclusion has a slight effect on test power in the absence of identification ($\mu^2 = 0$), as shown by the power curves for different values of $\lambda$ in sub-figures (a) and (c). In addition, note that the proposed tests have good power even with $t(3)$-type heteroskedastic errors and a nonlinear reduced form with missing instruments, thus confirming our theoretical conclusions in Section 4.1.
21 The case in which $\mu^2 = 13$ (weak instruments) is omitted to shorten the exposition.
Figure 1. Power of AR-test with heteroskedastic errors (σ = 0.1), n = 50
[Four panels. Vertical axis: rejection frequencies (%); horizontal axis: β∗ from −3 to 5. Curves: no IV exclusion (λ = 0); IV exclusion (λ = 0.01); IV exclusion (λ = 0.1).]
(a) Normal errors: µ² = 0, m(X2, X3; Π) = X2Π2 + X3λδ0
(b) Normal errors: µ² = 10³, m(X2, X3; Π) = X2Π2 + X3λδ0
(c) Normal errors: µ² = 0, m(X2, X3; Π) = exp(X2Π2 + X3λδ0)
(d) Normal errors: µ² = 10³, m(X2, X3; Π) = exp(X2Π2 + X3λδ0)
Figure 2. Power of exact Monte Carlo AR-test with homoskedastic errors (σ = 0), n = 50
[Four panels. Vertical axis: rejection frequencies (%); horizontal axis: β∗ from −3 to 5. Curves: no IV exclusion (λ = 0); IV exclusion (λ = 0.01); IV exclusion (λ = 0.1).]
(a) t(3)-errors: µ² = 0, m(X2, X3; Π) = X2Π2 + X3λδ0
(b) t(3)-errors: µ² = 10³, m(X2, X3; Π) = X2Π2 + X3λδ0
(c) t(3)-errors: µ² = 0, m(X2, X3; Π) = exp(X2Π2 + X3λδ0)
(d) t(3)-errors: µ² = 1000, m(X2, X3; Π) = exp(X2Π2 + X3λδ0)
Figure 3. Power of exact Monte Carlo AR-test with heteroskedastic errors (σ = 0.1), n = 50
[Four panels. Vertical axis: rejection frequencies (%); horizontal axis: β∗ from −3 to 5. Curves: no IV exclusion (λ = 0); IV exclusion (λ = 0.01); IV exclusion (λ = 0.1).]
(a) t(3)-errors: µ² = 0, m(X2, X3; Π) = X2Π2 + X3λδ0
(b) t(3)-errors: µ² = 10³, m(X2, X3; Π) = X2Π2 + X3λδ0
(c) t(3)-errors: µ² = 0, m(X2, X3; Π) = exp(X2Π2 + X3λδ0)
(d) t(3)-errors: µ² = 1000, m(X2, X3; Π) = exp(X2Π2 + X3λδ0)
6. Empirical application
We apply the proposed methods to the Card (1995) model of the returns to education and earnings. The version of this model that allows for possible instrument invalidity is:
$$\log(\text{wage}) = \beta\,\text{educ} + X_1'\gamma_1 + X_2'\gamma_2 + e, \tag{6.1}$$
$$\text{educ} = X_1'\pi_1 + X_2'\pi_2 + v_2, \tag{6.2}$$
where wage is earnings, educ is the length of education (schooling), and $X_1 = [1, \text{exper}, \text{exper}^2, \text{race}, \text{smsa66}, \text{south66}, \text{IQ}]$ consists of a constant, experience variables, indicator variables for race, residence in a metropolitan area, and residence in the south of the United States, and the IQ score. The instrument matrix $X_2$ consists of the proximity-to-college indicators for educational attainment, namely proximity to a 2-year and to a 4-year college. Hence, we have $\gamma_2 = (\gamma_{21}, \gamma_{22})' \in \mathbb{R}^2$. The original specification in Card (1995) imposes the exclusion restrictions, i.e., $\gamma_2 = 0$. In recent years, several studies have raised concerns about the validity of the proximity to 2- and 4-year college indicators as instruments for educ; for example, see Slichter (2013, Section 5). Here, we allow $\gamma_2 \neq 0$ and we use the method proposed in this paper to build a joint confidence set for $\theta = (\beta, \gamma_{21}, \gamma_{22})'$ and marginal confidence sets for $\beta$, $\gamma_{21}$, and $\gamma_{22}$. Moreover, Kleibergen (2004, Table 2, p. 421) shows that the proximity-to-college indicator instruments are not very strong. So, it is important to use statistical procedures that are robust to both weak and invalid instruments for inference in model (6.1)-(6.2).
The data analyzed are from the National Longitudinal Survey of Young Men (from 1966 to 1981). We use the cross-sectional 1976 subsample, which contains 3010 observations. The variables contained in the data set are: two variables indicating the proximity to college, the length of education, log wages, experience, age, and racial, metropolitan, family, and regional indicators.
If we impose the exclusion restrictions ($\gamma_2 = 0$), the identification-robust confidence set with level 95% for the returns to education ($\beta$) that results from inverting $AR(\beta_0)$ is given by^22 $C_\beta(\alpha) = \{\beta_0 : 41.437\beta_0^2 - 21.193\beta_0 + 1.886 \le 0\} = [11.47\%, 39.67\%]$ when IQ score is used in $X_1$, and $C_\beta(\alpha) = \{\beta_0 : 24.239\beta_0^2 - 11.449\beta_0 + 0.978 \le 0\} = [11.20\%, 36.04\%]$ when IQ score is not part of $X_1$, where the explicit forms of the confidence intervals are obtained by projection. If the exclusion restrictions imposed were satisfied, the fact that $C_\beta(\alpha)$ is very wide would indicate identification problems. But this conclusion may go too far if $\gamma_2 \neq 0$, because the results are then not valid, as extensively discussed in this paper. We now focus on the case where $\gamma_2$ is left unrestricted.
22 The results reported are based on the critical values of the $F$-distribution, but the results are similar when the asymptotic $\chi^2$ critical values are used.
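The intervals above follow from solving the quadratic inequalities. The following sketch shows the computation; the function name and the labels it returns are hypothetical, and the degenerate case $a = 0$ is deliberately left out.

```python
import math

def quadratic_cs(a, b, c):
    """Solve a*x^2 + b*x + c <= 0 for x.

    Returns ("interval", lo, hi) for [lo, hi], ("empty",), ("real_line",),
    or ("two_rays", lo, hi) for (-inf, lo] U [hi, +inf).
    """
    if a == 0:
        raise ValueError("a must be nonzero in this sketch")
    disc = b * b - 4.0 * a * c
    if disc < 0:
        # No real roots: the sign of the quadratic is the sign of a everywhere.
        return ("empty",) if a > 0 else ("real_line",)
    root = math.sqrt(disc)
    lo, hi = sorted(((-b - root) / (2.0 * a), (-b + root) / (2.0 * a)))
    return ("interval", lo, hi) if a > 0 else ("two_rays", lo, hi)


print(quadratic_cs(41.437, -21.193, 1.886))   # roots near 0.1147 and 0.3967
print(quadratic_cs(24.239, -11.449, 0.978))   # roots near 0.1120 and 0.3604
```

A negative leading coefficient is what produces unbounded sets: with no real roots the inequality holds everywhere, and with real roots it holds outside the interval between them.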
Table 1 reports the joint confidence set with level 95% for $\theta = (\beta, \gamma_{21}, \gamma_{22})'$ based on inverting $AR(\beta_0, \gamma_2^0)$ and the marginal confidence sets for each parameter, obtained by projection, with or without the IQ score variable.^23 We observe that the results are similar with or without the IQ score variable. First, the confidence interval for the returns to education is unbounded in both cases, thus indicating that $\beta$ is not identifiable after instrument endogeneity is controlled for. Meanwhile, the confidence intervals for both instrument endogeneity parameters ($\gamma_{21}$ and $\gamma_{22}$) are bounded and not very wide. Second, the confidence interval for the coefficient on the proximity to 2-year college indicator instrument ($\gamma_{21}$) does not include zero, with or without IQ score. This suggests that this instrument violates the exclusion restrictions (invalid instrument). So, imposing $\gamma_{21} = 0$, as usually done in most applications, may be problematic since even a slight correlation between the instruments and errors can be detrimental to statistical inference; see Doko Tchatoka and Dufour (2008) and Guggenberger (2011), among others. However, the projection method fails to reject that the proximity to 4-year college indicator instrument satisfies the exclusion restrictions. But the lower bound of the confidence interval for $\gamma_{22}$ is close to zero when the IQ score is not included in $X_1$, and its upper bound is close to zero when the IQ score is controlled for.
Overall, this application suggests that the proximity-to-college indicator instruments are not strictly exogenous. It is therefore important to use statistical procedures that are robust to instrument endogeneity when conducting inference in this model.
23 Note that, by definition, the marginal confidence sets have level at least 95%.
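Table 1. Identification-robust confidence intervals in the Card model of education and earnings with $\gamma_2$ unrestricted

$X_1$ does not include IQ score
Joint CS (level 95%): $C_\theta(\alpha) = \{\theta_0 : \theta_0' A \theta_0 - (21.193 \;\; 64.910 \;\; 52.519)\,\theta_0 + 1.886 \le 0\}$, with
$$A = \begin{bmatrix} 41.437 & 77.262 & 167.026 \\ 77.262 & 697.101 & 2.275 \\ 167.026 & 2.275 & 500.517 \end{bmatrix}, \quad \theta_0 = (\beta_0, \gamma^0_{21}, \gamma^0_{22})'$$
Projected CS (level ≥ 95%):
education: $C_\beta(\alpha) = \{\beta_0 : -22.697\beta_0^2 + 3.431\beta_0 - 0.992 \le 0\} = \mathbb{R}$
proximity to 2-year: $C_{\gamma_{21}}(\alpha) = \{\gamma^0_{21} : 1.106(\gamma^0_{21})^2 - 0.084\gamma^0_{21} + 0.001 \le 0\} = [0.01, 0.066]$
proximity to 4-year: $C_{\gamma_{22}}(\alpha) = \{\gamma^0_{22} : 2.434(\gamma^0_{22})^2 - 0.047\gamma^0_{22} - 0.0002 \le 0\} = [-0.004, 0.023]$

$X_1$ includes IQ score
Joint CS (level 95%): $C_\theta(\alpha) = \{\theta_0 : \theta_0' A \theta_0 - (11.449 \;\; 54.325 \;\; 24.421)\,\theta_0 + 0.978 \le 0\}$, with
$$A = \begin{bmatrix} 24.239 & 63.343 & 109.195 \\ 63.343 & 473.430 & 28.928 \\ 109.195 & 28.928 & 344.378 \end{bmatrix}$$
Projected CS (level ≥ 95%):
education: $C_\beta(\alpha) = \{\beta_0 : -16.615\beta_0^2 + 2.306\beta_0 - 0.905 \le 0\} = \mathbb{R}$
proximity to 2-year: $C_{\gamma_{21}}(\alpha) = \{\gamma^0_{21} : 7.536(\gamma^0_{21})^2 - 0.716\gamma^0_{21} + 0.009 \le 0\} = [0.014, 0.081]$
proximity to 4-year: $C_{\gamma_{22}}(\alpha) = \{\gamma^0_{22} : 1.305(\gamma^0_{22})^2 + 0.036\gamma^0_{22} - 0.0004 \le 0\} = [-0.035, 0.008]$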
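The projected quadratic for education in the first panel of Table 1 can be recovered from the joint CS with the formulas of Theorem 4.8 for $w = (1, 0')'$. The sketch below is limited to that case and assumes a $3 \times 3$ matrix $A$ whose lower-right $2 \times 2$ block is invertible, so that the generalized inverse is the ordinary inverse; the function name is hypothetical.

```python
def project_first_coordinate(A, b, c):
    """Coefficients (a1, b1, c1) of the projected quadratic for the first
    coordinate: a1 = a11 - A21' inv(A22) A21, b1 = b_1 - A21' inv(A22) b_2,
    c1 = c - (1/4) b_2' inv(A22) b_2, for a 3x3 A with invertible 2x2 block A22."""
    a11 = A[0][0]
    A21 = [A[1][0], A[2][0]]
    p, q, r, s = A[1][1], A[1][2], A[2][1], A[2][2]
    det = p * s - q * r
    inv = [[s / det, -q / det], [-r / det, p / det]]   # inverse of the 2x2 block

    def bilinear(u, v):
        # u' inv(A22) v
        return sum(u[i] * inv[i][j] * v[j] for i in range(2) for j in range(2))

    b_1, b_2 = b[0], [b[1], b[2]]
    a1 = a11 - bilinear(A21, A21)
    b1 = b_1 - bilinear(A21, b_2)
    c1 = c - 0.25 * bilinear(b_2, b_2)
    return a1, b1, c1


A = [[41.437, 77.262, 167.026],
     [77.262, 697.101, 2.275],
     [167.026, 2.275, 500.517]]
b = [-21.193, -64.910, -52.519]
a1, b1, c1 = project_first_coordinate(A, b, 1.886)
print(round(a1, 3), round(b1, 3), round(c1, 3))
# The leading coefficient is negative and the discriminant is negative,
# so the inequality holds for every beta_0: the projected CS is the real line.
print(a1 < 0 and b1 * b1 - 4.0 * a1 * c1 < 0)
```

Up to the rounding of the reported entries, the output matches the education row of the first panel.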
7. Conclusion
In this paper, we studied the possibility of building tests and confidence sets in IV regressions where instrumental variables can be arbitrarily weak, collinear, and violate the exclusion restrictions. We showed that a procedure similar to the Anderson and Rubin (1949) (AR) approach can be used to provide identification-robust tests and CSs for the structural and instrument endogeneity parameters. Then, we used the projection method to obtain identification-robust CSs for scalar linear combinations of the elements of these parameters. CSs for the structural coefficient β and tests of exclusion restrictions (instrument selection) are derived as special cases of the proposed projection method.
We presented a Monte Carlo experiment that confirms our theoretical findings. The proposed methods were illustrated through the well-known model of the returns to education and earnings [see Card (1995)]. The results indicate that the proximity-to-college instruments used in most applications involving this model are not strictly exogenous.
References
Anderson, T. W., 1971. The Statistical Analysis of Time Series. Wiley, New York.
Anderson, T. W., Rubin, H., 1949. Estimation of the parameters of a single equation in a
complete system of stochastic equations. Annals of Mathematical Statistics 20, 46–
63.
Andrews, D. W. K., Moreira, M. J., Stock, J. H., 2006. Optimal two-sided invariant similar
tests for instrumental variables regression. Econometrica 74(3), 715–752.
Andrews, D. W. K., Stock, J. H., 2007a. Inference with weak instruments. In: R. Blundell,
W. Newey, T. Pearson, eds, Advances in Economics and Econometrics, Theory and
Applications, 9th Congress of the Econometric Society Vol.3. Cambridge University
Press, Cambridge, U.K., chapter 6.
Andrews, D. W. K., Stock, J. H., 2007b. Testing with many weak instruments. Journal of
Econometrics 138, 24–46.
Ashley, R., 2009. Assessing the credibility of instrumental variables inference with imperfect instruments via sensitivity analysis. Journal of Applied Econometrics 24(2), 325–337.
Basmann, R. L., 1960. On the asymptotic distributions of generalized classical linear esti-
mators. Econometrica 28, 97–107.
Bazzi, S., Clemens, A. M., 2009. Blunt instruments: A cautionary note on establishing the causes of economic growth. Technical report, Center for Global Development No. 171.
Bekker, P., 1994. Alternative approximations to the distributions of instrumental variable
estimators. Econometrica 62, 657–681.
Berkowitz, D., Caner, M. , Fang, Y. , 2008. Are nearly exogenous instruments reliable?.
Economics Letters 101, 20–23.
Berkowitz, D., Caner, M., Fang, Y., 2012. The validity of instruments revisited. Journal of
Econometrics 166, 255–266.
Bhattacharya, R. N., Ghosh, J., 1978. On the validity of the formal Edgeworth expansion.
The Annals of Statistics 6, 434–451.
Bhattacharya, R. N., Rao, R., 1976. Normal approximation and asymptotic expansions. In:
R. Bhattacharya, R. Rao, eds, Normal Approximation and Asymptotic Expansions.
Wiley Series in Probability and Mathematical Analysis, New York.
Bound, J., Jaeger, D. A., Baker, R. M., 1995. Problems with instrumental variables estima-
tion when the correlation between the instruments and the endogenous explanatory
variable is weak. Journal of the American Statistical Association 90, 443–450.
Brock, W., Durlauf, S. , 2001. Growth empirics and reality. The World Bank Economic
Review 15(2), 229–272.
Card, D., 1995. Using geographic variation in college proximity to estimate the return to schooling. In: L. N. Christofides, E. K. Grant, R. Swidinsky, eds, Aspects of Labour Market Behaviour: Essays in Honour of John Vanderkamp. University of Toronto Press, Toronto, Canada.
Chaudhuri, S., Zivot, E., 2010. A new method of projection based inference in GMM with
weakly identified nuisance parameters. Technical report, Department of Economics,
New York University N.Y.
Choi, I., Phillips, P. C. B., 1992. Asymptotic and finite sample distribution theory for IV es-
timators and tests in partially identified structural equations. Journal of Econometrics
51, 113–150.
Doko Tchatoka, F., 2013. Specification tests with weak and invalid instruments. Technical
report, School of Economics and Finance, University of Tasmania Hobart, Australia.
Doko Tchatoka, F., 2014. Subset hypotheses testing and instrument exclusion in the linear
IV regression. Econometric Theory forthcoming.
Doko Tchatoka, F. , Dufour, J.-M. , 2008. Instrument endogeneity and identification-
robust tests: some analytical results. Journal of Statistical Planning and Inference
138(9), 2649–2661.
Doko Tchatoka, F. , Dufour, J.-M., 2014. Identification-robust inference for endogeneity
parameters in linear structural models. The Econometrics Journal 17, 165–187.
Donald, S. G., Newey, W. K., 2001. Choosing the number of instruments. Econometrica
69, 1161–1191.
Dufour, J.-M., 1997. Some impossibility theorems in econometrics, with applications to
structural and dynamic models. Econometrica 65, 1365–1389.
Dufour, J.-M., 2003. Identification, weak instruments and statistical inference in economet-
rics. Canadian Journal of Economics 36(4), 767–808.
Dufour, J.-M. , 2006. Monte Carlo tests with nuisance parameters: A general approach
to finite-sample inference and nonstandard asymptotics in econometrics. Journal of
Econometrics 138, 2649–2661.
Dufour, J.-M., 2009. Comments on “Weak instrument robust tests in GMM and the New Keynesian Phillips curve” by F. Kleibergen and S. Mavroeidis. Journal of Business and Economic Statistics 27, 318–321.
Dufour, J.-M., Hsiao, C., 2008. Identification. In: L. E. Blume, S. N. Durlauf, eds, The New
Palgrave Dictionary of Economics second edn . Palgrave Macmillan, Basingstoke,
Hampshire, England. forthcoming.
Dufour, J.-M., Jasiak, J., 2001. Finite sample limited information inference methods for
structural equations and models with generated regressors. International Economic
Review 42, 815–843.
Dufour, J.-M., Khalaf, L., Beaulieu, M.-C., 2010. Multivariate residual-based finite-sample
tests for serial dependence and GARCH with applications to asset pricing models.
Journal of Applied Econometrics 25, 263–285.
Dufour, J.-M., Khalaf, L., Kichian, M., 2013. Identification-robust analysis of DSGE and
structural macroeconomic models. Journal of Monetary Economics 60, 340–350.
Dufour, J.-M., Taamouti, M., 2005. Projection-based statistical inference in linear structural
models with possibly weak instruments. Econometrica 73(4), 1351–1365.
Dufour, J.-M. , Taamouti, M. , 2007. Further results on projection-based inference in IV
regressions with weak, collinear or missing instruments. Journal of Econometrics
139(1), 133–153.
Guggenberger, P. , 2010. The impact of a Hausman pretest on the size of the hypothesis
tests. Econometric Theory 156, 337–343.
Guggenberger, P., 2011. On the asymptotic size distortion of tests when instruments locally
violate the exogeneity assumption. Econometric Theory forthcoming.
Guggenberger, P., Kleibergen, F., Mavroeidis, S., Chen, L., 2012. On the asymptotic sizes of subset Anderson-Rubin and Lagrange multiplier tests in linear instrumental variables regression. Econometrica 80(6), 2649–2666.
Guggenberger, P., Smith, R., 2005. Generalized empirical likelihood estimators and tests
under partial, weak and strong identification. Econometric Theory 21, 667–709.
Hahn, J., Ham, J., Moon, H. R., 2010. The Hausman test and weak instruments. Journal of
Econometrics 160, 289–299.
Hall, A. R., Peixe, F. P. M., 2003. A consistent method for the selection of relevant instruments. Econometric Reviews 2(3), 269–287.
Hall, A. R., Rudebusch, G. D. , Wilcox, D. W. , 1996. Judging instrument relevance in
instrumental variables estimation. International Economic Review 37, 283–298.
Hansen, C., Hausman, J., Newey, W., 2008. Estimation with many instrumental variables.
Journal of Business and Economic Statistics 26(4), 398–422.
Hansen, L. P., 1982. Large sample properties of generalized method of moments estimators.
Econometrica 50, 1029–1054.
Hausman, J. , Hahn, J. , 2005. Estimation with valid and invalid instruments. Annales
d’Économie et de Statistique 79–80, 25–57.
Imbens, G. W., 2003. Sensitivity to exogeneity assumptions in program evaluation. American Economic Review 93(2), 126–132.
Imbens, G. W., Kolesár, M., Chetty, R., Friedman, J., Glaeser, E., 2011. Inference and identification with many invalid instruments. Technical report, Department of Economics, Harvard University, Boston, MA.
Kiviet, J. F., Niemczyk, J., 2007. The asymptotic and finite-sample distributions of OLS
and simple IV in simultaneous equations. Computational Statistics and Data Analysis
51, 3296–3318.
Kiviet, J. F., Niemczyk, J., 2012. Comparing the asymptotic and empirical (un)conditional
distributions of OLS and IV in a linear static simultaneous equation. Computational
Statistics and Data Analysis 56, 3567–3586.
Kleibergen, F. , 2002. Pivotal statistics for testing structural parameters in instrumental
variables regression. Econometrica 70(5), 1781–1803.
Kleibergen, F., 2004. Testing subsets of structural coefficients in the IV regression model.
Review of Economics and Statistics 86, 418–423.
Kleibergen, F., 2005. Testing parameters in GMM without assuming that they are identified.
Econometrica 73, 1103–1124.
Kraay, A., 2008. Instrumental variable regressions with honestly uncertain exclusion re-
strictions. Technical report, World Bank Washington, DC.
Magnus, J. R., Neudecker, H., 1999. Matrix Differential Calculus with Applications in
Statistics and Econometrics, Revised Edition. John Wiley & Sons, New York.
Mikusheva, A., 2010. Robust confidence sets in the presence of weak instruments. Journal
of Econometrics 157, 236–247.
Mikusheva, A., 2013. Survey on statistical inferences in weakly-identified instrumental
variable models. Applied Econometrics 29(1), 117–131.
Moreira, M. J., 2003. A conditional likelihood ratio test for structural models. Econometrica
71(4), 1027–1048.
Moreira, M. J., Porter, J., Suarez, G., 2009. Bootstrap validity for the score test when
instruments may be weak. Journal of Econometrics 149, 52–64.
Muirhead, R. J., 2005. Aspects of Multivariate Statistical Theory. John Wiley & Sons, Inc.,
Hoboken, New Jersey.
Murray, P. M., 2006. Avoiding invalid instruments and coping with weak instruments. The
Journal of Economic Perspectives 20(4), 111–132.
Nelson, C., Startz, R., 1990a. The distribution of the instrumental variable estimator and its
t-ratio when the instrument is a poor one. The Journal of Business 63, 125–140.
Nelson, C., Startz, R., 1990b. Some further results on the exact small sample properties of the
instrumental variable estimator. Econometrica 58, 967–976.
Phillips, G., Hale, C., 1977. The bias of instrumental variable estimators of simultaneous
equation systems. International Economic Review 18(1), 219–228.
Phillips, P. C. B., 1989. Partially identified econometric models. Econometric Theory
5, 181–240.
Revankar, N. S., Hartley, M. J., 1972. An independence test and conditional unbiased pre-
dictions in the context of simultaneous equation systems. Econometrica 40(5), 913–
915.
Sargan, J., 1958. The estimation of economic relationships using instrumental variables.
Econometrica 26(3), 393–415.
Slichter, D., 2013. Testing instrument validity and identification with invalid instruments.
Technical report, Department of Economics, University of Rochester, Rochester, USA.
Staiger, D., Stock, J. H., 1997. Instrumental variables regression with weak instruments.
Econometrica 65(3), 557–586.
Stock, J. H., Wright, J. H., 2000. GMM with weak identification. Econometrica 68, 1055–
1096.
Stock, J. H., Wright, J. H., Yogo, M., 2002. A survey of weak instruments and weak
identification in generalized method of moments. Journal of Business and Economic
Statistics 20(4), 518–529.
Stock, J. H., Yogo, M., 2005. Testing for weak instruments in linear IV regression. In: D. W.
Andrews, J. H. Stock, eds, Identification and Inference for Econometric Models: Es-
says in Honor of Thomas Rothenberg. Cambridge University Press, Cambridge, U.K.,
chapter 6, pp. 80–108.
Swanson, N. R., Chao, J. C., 2005. Notes and comments: Consistent estimation with a
large number of weak instruments. Econometrica 73, 1673–1692.
Wang, J., Zivot, E., 1998. Inference on structural parameters in instrumental variables
regression with weak instruments. Econometrica 66(6), 1389–1404.
APPENDIX
A. Proofs
PROOF OF PROPOSITION 3.1 Suppose that (2.1)-(2.2) and Assumptions A - C hold.
Then, we have $E(W_t v_{1t}) = 0$, where $v_{1t} = y_t - X_{1t}'\xi_{11} - X_{2t}'\xi_{21}$ by (3.1). So, we have

$$E[W_t(y_t - X_{1t}'\xi_{11} - X_{2t}'\xi_{21})] = E(W_t y_t) - E(W_t W_t')\xi_{21} = 0 \tag{A.1}$$

$$\Leftrightarrow \quad \sigma_{Wy} - \Omega_W\gamma_2 - \Omega_W\pi_2\beta = 0 \quad \text{because } \xi_{21} = \gamma_2 + \pi_2\beta \text{ from (3.1)}, \tag{A.2}$$

where $\sigma_{Wy} = E(W_t y_t)$ and $\Omega_W = E(W_t W_t')$. We want to solve (A.2) with respect to $\beta$. To
do this, we find it useful to distinguish two cases: (a) $\pi_2 \neq 0$, and (b) $\pi_2 = 0$.
(a) Suppose first that $\pi_2 \neq 0$. Then, pre-multiplying both sides of (A.2) by $\pi_2'$ does not
change the solution with respect to $\beta$ (if a solution exists). So, the system

$$\pi_2'\sigma_{Wy} - \pi_2'\Omega_W\gamma_2 = \pi_2'\Omega_W\pi_2\,\beta \tag{A.3}$$

and (A.2) are equivalent. Since $\gamma_2$ is left unrestricted, a unique solution with respect
to $\beta$ in (A.3), which does not depend on $\gamma_2$, exists if, and only if, $\pi_2'\Omega_W\gamma_2 = 0$ and
$\pi_2'\Omega_W\pi_2 \neq 0$, i.e., if, and only if, $\gamma_2 \in \ker(\pi_2'\Omega_W)$ and $\pi_2 \notin \ker(\Omega_W)$. In this case we have
$\beta = (\pi_2'\Omega_W\pi_2)^{-1}\pi_2'\sigma_{Wy}$, which is identifiable under Assumptions A - C because both $\pi_2'\sigma_{Wy}$
and $\pi_2'\Omega_W\pi_2$ can be estimated from the conditional means of $y_1$ and $y_2$, given $X$, in the
regressions in (3.1), even if $\mathrm{rank}(X_2) < k_2$; see Magnus and Neudecker (1999, Ch. 13,
Theorem 15) and the discussion in the last paragraph above Proposition 3.1.
(b) Suppose now that $\pi_2 = 0$. Then, (A.2) has multiple solutions or does not have any
solution for $\beta$, including when $\gamma_2 \in \ker(\pi_2'\Omega_W)$. So, $\beta$ cannot be identified.
Proposition 3.1 follows straightforwardly by putting (a) and (b) together.
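The identification condition in case (a) can be checked numerically. The following is a minimal sketch, assuming a scalar $\beta$ and a two-dimensional $W_t$; the numerical values of `omega_w`, `pi2`, `gamma2_ok`, and `gamma2_bad`, as well as the helper names, are hypothetical and chosen only for illustration. It builds $\sigma_{Wy}$ from (A.2) and then applies $\beta = (\pi_2'\Omega_W\pi_2)^{-1}\pi_2'\sigma_{Wy}$: the true $\beta$ is recovered when $\gamma_2 \in \ker(\pi_2'\Omega_W)$, and is off by $\pi_2'\Omega_W\gamma_2/(\pi_2'\Omega_W\pi_2)$ otherwise.

```python
def dot(u, v):
    """Inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def matvec(m, v):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [dot(row, v) for row in m]


def sigma_wy_from_model(pi2, gamma2, omega_w, beta):
    """sigma_Wy = Omega_W gamma2 + Omega_W pi2 beta, i.e. (A.2) rearranged."""
    og = matvec(omega_w, gamma2)
    op = matvec(omega_w, pi2)
    return [g + p * beta for g, p in zip(og, op)]


def beta_from_moments(pi2, omega_w, sigma_wy):
    """beta = (pi2' Omega_W pi2)^{-1} pi2' sigma_Wy for scalar beta."""
    denom = dot(pi2, matvec(omega_w, pi2))
    if denom == 0:
        raise ValueError("pi2' Omega_W pi2 = 0: beta is not identified")
    return dot(pi2, sigma_wy) / denom


# Hypothetical illustrative values (not taken from the paper).
omega_w = [[2.0, 0.0], [0.0, 1.0]]
pi2 = [1.0, 1.0]
beta_true = 0.5

# pi2' Omega_W = [2, 1], so gamma2 = (1, -2) lies in its kernel.
gamma2_ok = [1.0, -2.0]
# gamma2 = (1, 0) does not: pi2' Omega_W gamma2 = 2.
gamma2_bad = [1.0, 0.0]

beta_ok = beta_from_moments(
    pi2, omega_w, sigma_wy_from_model(pi2, gamma2_ok, omega_w, beta_true))
beta_bad = beta_from_moments(
    pi2, omega_w, sigma_wy_from_model(pi2, gamma2_bad, omega_w, beta_true))

print(beta_ok)   # equals beta_true when the kernel condition holds
print(beta_bad)  # shifted by pi2' Omega_W gamma2 / (pi2' Omega_W pi2)
```

The second call illustrates why the kernel condition cannot be dropped: the same moment formula returns a value that mixes $\beta$ with the instrument endogeneity parameter.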
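PROOF OF LEMMA 4.1 (a) Suppose that $H_{\theta_0}$ holds. From (4.1), we have $Yb_0 = e$ and
$b_0'\Omega b_0 = e'M_Xe/(n-\nu)$, so that

$$S^+ = [(W'W)^+]^{1/2}\,W'e\left(\frac{e'M_Xe}{n-\nu}\right)^{-1/2} = S^+_e \quad \text{from (4.6)}. \tag{A.4}$$

It is clear from (A.4) that the distribution of $S^+$ under $H_{\theta_0}$ is identical to that of $S^+_e$, i.e.,
$P^0_{S^+}(x;\theta_0) = P_{S^+_e}(x,e)$ given $X = x$, as stated. We now prove the invariance of $P_{S^+_e}(x,e)$.
(b) For any $\sigma(X)$ satisfying (2.4), we have

$$W'e\left(\frac{e'M_Xe}{n-\nu}\right)^{-1/2} = W'\sigma(X)e\left(\frac{\sigma(X)e'M_X\sigma(X)e}{n-\nu}\right)^{-1/2} = W'\varepsilon_\sigma(e)\left(\frac{\varepsilon_\sigma(e)'M_X\varepsilon_\sigma(e)}{n-\nu}\right)^{-1/2}.$$

So, we have

$$S^+_e = [(W'W)^+]^{1/2}\,W'e\left(\frac{e'M_Xe}{n-\nu}\right)^{-1/2} = [(W'W)^+]^{1/2}\,W'\varepsilon_\sigma(e)\left(\frac{\varepsilon_\sigma(e)'M_X\varepsilon_\sigma(e)}{n-\nu}\right)^{-1/2} = S^+_{\varepsilon_\sigma}$$

$$\Leftrightarrow \quad P_{S^+_e}(x,e) = P_{S^+_{\varepsilon_\sigma}}(x,e) \ \text{ given } X = x. \tag{A.5}$$

So, $P_{S^+_e}(x,e)$ is invariant to the transformation (2.4). If further Assumption D holds, we
have $\varepsilon_\sigma \overset{d}{\sim} \varepsilon$, where given $X = x$, the distribution of $\varepsilon$, $P_\varepsilon(x)$, is completely specified.
Therefore, given $X = x$, we have $P_{S^+_e}(x,e) \equiv P_{S^+_\varepsilon}(x,\varepsilon)$, where $\varepsilon \sim P_\varepsilon(x)$ and $P_\varepsilon(x)$ is completely
specified.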
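The scale invariance used in part (b) is easy to verify numerically. The sketch below is a simplified illustration rather than the paper's general setting: it assumes a single column $x$ that plays the role of both $W$ and $X$ (so $\nu = 1$ and $(W'W)^+ = 1/(x'x)$), and the function name `s_plus` and the simulated data are hypothetical. Rescaling $e$ by any positive constant leaves the statistic unchanged because numerator and denominator scale by the same factor.

```python
import math
import random


def dot(u, v):
    """Inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def s_plus(x, e):
    """Studentized statistic (x'x)^{-1/2} x'e / sqrt(e'M_x e / (n - 1)).

    Assumes W = X = x is a single column, so nu = 1 (a simplification).
    """
    n = len(x)
    xx = dot(x, x)
    xe = dot(x, e)
    rss = dot(e, e) - xe * xe / xx  # e' M_x e with M_x = I - x (x'x)^{-1} x'
    return (xe / math.sqrt(xx)) / math.sqrt(rss / (n - 1))


rng = random.Random(12345)
x = [rng.gauss(0.0, 1.0) for _ in range(30)]
e = [rng.gauss(0.0, 1.0) for _ in range(30)]

s_original = s_plus(x, e)
s_rescaled = s_plus(x, [7.5 * ei for ei in e])  # e multiplied by a positive scale

print(s_original, s_rescaled)
```

Both printed values agree up to floating-point rounding, which is the numerical counterpart of the equality chain in (A.5) for a constant scale factor.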
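PROOF OF THEOREM 4.3 Suppose that Assumptions A - B and D are satisfied. If further
$H_{\theta_0}$ holds, it follows from the proof of Lemma 4.1 and (4.3) that

$$\Psi_{AR}(S^+;\theta_0) \overset{d}{\sim} \frac{n-\nu}{\nu-\nu_1}\,\frac{\varepsilon'P_W\varepsilon}{\varepsilon'M_X\varepsilon}. \tag{A.6}$$

So, the conditional distribution of $\Psi_{AR}(S^+;\theta_0)$ under $H_{\theta_0}$, given $X = x$, depends only on the
distribution of $\varepsilon$ and is therefore pivotal. We shall now distinguish the following two cases: (a)
assumption (2.5) (Gaussian errors) holds and $\varepsilon$ is independent of $X$, (b) assumption (2.5) does
not hold or $\varepsilon$ is not independent of $X$.
(a) Suppose that (2.5) holds and $\varepsilon$ is independent of $X$. Then, it is straightforward to show that

$$\varepsilon'M_X\varepsilon \sim \chi^2(n-\nu) \quad \text{and} \quad \varepsilon'P_W\varepsilon \sim \chi^2(\nu-\nu_1), \tag{A.7}$$

where $M_XP_W = 0$. So, $\varepsilon'M_X\varepsilon$ and $\varepsilon'P_W\varepsilon$ are independent, and $\Psi_{AR}(S^+;\theta_0) \sim F(\nu-\nu_1, n-\nu)$
from (A.7). This means that the test that rejects $H_{\theta_0}$ when $\Psi_{AR}(S^+;\theta_0) > F_\alpha(\nu-\nu_1, n-\nu)$
is similar with significance level $\alpha$ for all values of $\pi_2$, if (2.5) holds and $X$ is
independent of $\varepsilon$, where $F_\alpha(\nu-\nu_1, n-\nu)$ is the $1-\alpha$ quantile of an $F$ distribution with
$\nu-\nu_1$ and $n-\nu$ degrees of freedom.
(b) Suppose now that (2.5) does not hold or $\varepsilon$ is not independent of $X$. It is straightforward
to see that

$$P\big[\Psi^{(0)}_{AR}(S^+;\theta_0) \geq c_\Psi(\varepsilon;\alpha)\big] = P\big[p_N[\Psi^{(0)}_{AR}(S^+;\theta_0)] \leq \alpha\big] = \alpha \tag{A.8}$$

under $H_{\theta_0}$, so that we have a test with level $\alpha$.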
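Case (b) relies on a Monte Carlo p-value computed from draws of the pivotal statistic. The sketch below shows one common construction, (number of simulated values at least as large as the observed one, plus one) divided by $N+1$; the paper's own definition of $p_N$ is given in Section 4 and may differ in details such as the treatment of ties. The function names, the observed value 3.2, and the toy simulator (a squared standard normal draw standing in for the statistic in (A.6)) are assumptions made purely for illustration.

```python
import random


def mc_pvalue(observed, simulated):
    """Monte Carlo p-value: (#{simulated >= observed} + 1) / (N + 1)."""
    n_rep = len(simulated)
    exceed = sum(1 for s in simulated if s >= observed)
    return (exceed + 1) / (n_rep + 1)


def mc_test(observed, simulate_stat, n_rep=99, alpha=0.05, seed=0):
    """Return (p-value, reject?) using n_rep draws from the null distribution.

    simulate_stat(rng) must return one draw of the test statistic under the
    null; here it is a user-supplied placeholder.
    """
    rng = random.Random(seed)
    simulated = [simulate_stat(rng) for _ in range(n_rep)]
    p_value = mc_pvalue(observed, simulated)
    return p_value, p_value <= alpha


# Toy usage: the null statistic is a squared standard normal (hypothetical).
p_value, reject = mc_test(3.2, lambda rng: rng.gauss(0.0, 1.0) ** 2)
print(p_value, reject)
```

With a continuous statistic and $\alpha(N+1)$ an integer (for example $N = 99$ and $\alpha = 0.05$), a p-value of this form yields a test whose level is exactly $\alpha$, which is the property stated in (A.8).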
PROOF OF THEOREM 4.8
The proof is similar to that of Theorem 4.1 in Dufour and Taamouti (2007). Therefore,
we only present an outline.
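First, we can write the quadratic form $Q(\theta)$ in (4.12) as:

$$Q(\theta) = \theta'A\theta + b'\theta + c \equiv \bar{Q}(\bar\theta) = \bar\theta'\bar{A}\bar\theta + \bar{b}'\bar\theta + c, \tag{A.9}$$

$$\bar{A} = (R^{-1})'AR^{-1} = \begin{bmatrix} \bar{a}_{11} & \bar{A}_{21}' \\ \bar{A}_{21} & \bar{A}_{22} \end{bmatrix}, \qquad \bar{b} = (R^{-1})'b = \begin{bmatrix} \bar{b}_1 \\ \bar{b}_2 \end{bmatrix}, \tag{A.10}$$

where we have $\bar\theta = R\theta = (\zeta_0, \bar\theta_2')' = (w'\theta, \theta_2')'$, and

$$R = \begin{bmatrix} w' \\ R_2 \end{bmatrix} = \begin{bmatrix} w_1 & w_2' \\ 0 & I_{G+k_2-1} \end{bmatrix}$$

is a square matrix of order $G+k_2$ ($w_1 \neq 0$ by assumption). So, we can write $C_{w'\theta}(\alpha)$ as
$C_{w'\theta}(\alpha) \equiv C_{\zeta_0}(\alpha) = \{\zeta_0 : \bar\theta = (\zeta_0, \bar\theta_2')' \text{ satisfies } \bar{Q}(\bar\theta) \leq 0\}$. Moreover, we can also
explicitly write $\bar{Q}(\bar\theta)$ as

$$\bar{Q}(\bar\theta) = \bar{a}_{11}\zeta_0^2 + \bar{b}_1\zeta_0 + c + \bar\theta_2'\bar{A}_{22}\bar\theta_2 + [2\bar{A}_{21}\zeta_0 + \bar{b}_2]'\bar\theta_2. \tag{A.11}$$

From (A.11), it is easy to see that $C_{w'\theta}(\alpha) = \{\zeta_0 : \min_{\bar\theta_2} \bar{Q}(\zeta_0, \bar\theta_2) \leq 0\}$. To solve this
minimization problem explicitly, we distinguish three cases: (a) $\bar{A}_{22} \geq 0$ and $\bar{A}_{22} \neq 0$, (b) $\bar{A}_{22} = 0$,
and (c) $\bar{A}_{22}$ is not positive semidefinite.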
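(a) Suppose first that $\bar{A}_{22} \geq 0$. The first-order condition and the second derivative of $\bar{Q}(\zeta_0, \bar\theta_2)$ with respect
to $\bar\theta_2$ are:

$$\frac{\partial \bar{Q}(\zeta_0, \bar\theta_2)}{\partial \bar\theta_2} = 2\bar{A}_{22}\bar\theta_2 + 2\bar{A}_{21}\zeta_0 + \bar{b}_2 = 0 \ \Leftrightarrow\ \bar{A}_{22}\bar\theta_2 = -\bar{A}_{21}\zeta_0 - \tfrac{1}{2}\bar{b}_2, \tag{A.12}$$

$$\frac{\partial^2 \bar{Q}(\zeta_0, \bar\theta_2)}{\partial \bar\theta_2\,\partial \bar\theta_2'} = 2\bar{A}_{22}. \tag{A.13}$$

We distinguish the following two cases: (a1) $\mathrm{rank}(\bar{A}_{22}) = p_2 = G+k_2-1$ and (a2)
$0 < \mathrm{rank}(\bar{A}_{22}) = p_2 < G+k_2-1$.
(a1) If $\mathrm{rank}(\bar{A}_{22}) = p_2 = G+k_2-1$ (i.e. if $\bar{A}_{22} > 0$), then (A.12)-(A.13) entail that
$\bar\theta_2 = \bar\theta_2^* = -\bar{A}_{22}^{-1}\bar{A}_{21}\zeta_0 - \tfrac{1}{2}\bar{A}_{22}^{-1}\bar{b}_2$, and $\partial^2 \bar{Q}(\zeta_0, \bar\theta_2)/\partial \bar\theta_2\,\partial \bar\theta_2' > 0$. So, $\bar\theta_2^*$ is the unique minimizer
and by replacing $\bar\theta_2$ by $\bar\theta_2^*$ in the expression of $\bar{Q}(\zeta_0, \bar\theta_2)$, we get:

$$\bar{Q}(\zeta_0, \bar\theta_2^*) \equiv \tilde{Q}(\zeta_0) = \tilde{a}_1\zeta_0^2 + \tilde{b}_1\zeta_0 + \tilde{c}_1, \tag{A.14}$$

where $\tilde{a}_1 = \bar{a}_{11} - \bar{A}_{21}'\bar{A}_{22}^{-1}\bar{A}_{21}$, $\tilde{b}_1 = \bar{b}_1 - \bar{A}_{21}'\bar{A}_{22}^{-1}\bar{b}_2$, and $\tilde{c}_1 = c - \tfrac{1}{4}\bar{b}_2'\bar{A}_{22}^{-1}\bar{b}_2$. On noting
that $\bar{A}_{22}^{-1} = \bar{A}_{22}^+$, (4.12) holds with $S_1 = \emptyset$.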
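(a2) If $0 < \mathrm{rank}(\bar{A}_{22}) = p_2 < G+k_2-1$, we can write $\bar{Q}(\zeta_0, \bar\theta_2)$ as:

$$\bar{Q}(\zeta_0, \bar\theta_2) = \bar{a}_{11}\zeta_0^2 + \bar{b}_1\zeta_0 + c + \tilde\theta_2'D_2\tilde\theta_2 + [2\bar{A}_{21}\zeta_0 + \bar{b}_2]'\bar\theta_2 \tag{A.15}$$

$$= \bar{a}_{11}\zeta_0^2 + \bar{b}_1\zeta_0 + c + \tilde\theta_{2*}'D_{2*}\tilde\theta_{2*} + [2\bar{A}_{21*}\zeta_0 + \bar{b}_{2*}]'\tilde\theta_{2*} + [P_{22}'(2\bar{A}_{21}\zeta_0 + \bar{b}_2)]'\tilde\theta_{22},$$

where $\tilde\theta_{2*} = P_{21}'\bar\theta_2$, $\tilde\theta_{22} = P_{22}'\bar\theta_2$, and $D_{2*} > 0$. If $P_{22}'(2\bar{A}_{21}\zeta_0 + \bar{b}_2) = 0$, we can
show as in (a1) that $C_{w'\theta}(\alpha) = \{\zeta_0 : \tilde{a}_1\zeta_0^2 + \tilde{b}_1\zeta_0 + \tilde{c}_1 \leq 0\}$, where $\tilde{a}_1 = \bar{a}_{11} - \bar{A}_{21*}'D_{2*}^{-1}\bar{A}_{21*}$,
$\tilde{b}_1 = \bar{b}_1 - \bar{A}_{21*}'D_{2*}^{-1}\bar{b}_{2*}$, and $\tilde{c}_1 = c - \tfrac{1}{4}\bar{b}_{2*}'D_{2*}^{-1}\bar{b}_{2*}$. Since $\bar{A}_{22} = P_2D_2P_2'$,
it is straightforward to see that $\bar{A}_{22}^+ = P_{21}D_{2*}^{-1}P_{21}'$. Further, we also have $\bar{A}_{21*}'D_{2*}^{-1}\bar{b}_{2*} = \bar{A}_{21}'\bar{A}_{22}^+\bar{b}_2$,
and $\bar{b}_{2*}'D_{2*}^{-1}\bar{b}_{2*} = \bar{b}_2'\bar{A}_{22}^+\bar{b}_2$ so that (4.12) holds with $S_1 = \emptyset$. Now, if
$P_{22}'(2\bar{A}_{21}\zeta_0 + \bar{b}_2) \neq 0$, we can proceed as above to show that (4.12) holds with
$S_1 = \{\zeta_0 : P_{22}'(2\bar{A}_{21}\zeta_0 + \bar{b}_2) \neq 0\}$.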
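(b) Suppose that $\bar{A}_{22} = 0$. The result follows immediately from (A.11) and the last step
of the proof of (a2).
(c) Suppose now that $\bar{A}_{22}$ is not positive semidefinite. Hence, we can find $\bar\theta_2$ such that
$\bar\theta_2'\bar{A}_{22}\bar\theta_2 = \eta < 0$. So, for any arbitrary scalar $\tau$, we have

$$\bar{Q}(\zeta_0, \tau\bar\theta_2) = \bar{a}_{11}\zeta_0^2 + \bar{b}_1\zeta_0 + c + \eta\tau^2 + \tau[2\bar{A}_{21}\zeta_0 + \bar{b}_2]'\bar\theta_2. \tag{A.16}$$

Because $\eta\tau^2 + \tau[2\bar{A}_{21}\zeta_0 + \bar{b}_2]'\bar\theta_2$ is a quadratic polynomial in $\tau$ with leading coefficient $\eta < 0$, we
can choose $\tau$ sufficiently large to have $\bar{Q}(\zeta_0, \tau\bar\theta_2) < 0$, irrespective of the value of
$\zeta_0$. So, $C_{\zeta_0}(\alpha) = \mathbb{R}$, as stated.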
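The argument in case (c) can be made concrete with a short sketch. Along a fixed direction with negative curvature $\eta$, the quadratic in (A.16) is a downward-opening parabola in $\tau$, so any $\tau$ beyond its larger root makes it negative. The function name `tau_making_q_negative` and the numerical values below are hypothetical; `lin` stands for the coefficient of $\tau$ and `const` for the terms that do not involve $\tau$.

```python
import math


def tau_making_q_negative(eta, lin, const):
    """Return a tau with eta*tau^2 + lin*tau + const < 0, assuming eta < 0."""
    if eta >= 0:
        raise ValueError("eta must be negative in case (c)")
    disc = lin * lin - 4.0 * eta * const
    if disc < 0:
        # No real root: the parabola is negative everywhere, so tau = 0 works.
        return 0.0
    # With eta < 0, this expression is the larger of the two real roots.
    larger_root = (-lin - math.sqrt(disc)) / (2.0 * eta)
    return larger_root + 1.0


def q_along_direction(tau, eta, lin, const):
    """Value of the quadratic in (A.16) as a function of tau."""
    return eta * tau * tau + lin * tau + const


# Hypothetical example: eta = -1, zero linear term, and const = zeta_0^2.
eta = -1.0
results = []
for zeta0 in (-50.0, -1.0, 0.0, 2.0, 10.0, 1000.0):
    const = zeta0 ** 2
    tau = tau_making_q_negative(eta, 0.0, const)
    results.append((zeta0, tau, q_along_direction(tau, eta, 0.0, const)))

for row in results:
    print(row)
```

Every candidate value of $\zeta_0$ admits a $\tau$ that drives the quadratic below zero, so no value can be excluded from the projection-based set, in line with the conclusion that the set is the whole real line.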