Classical Inference with ML and GMM Estimates

with Various Rates of Convergence

Lung-fei Lee∗

June 2005

Department of Economics, Ohio State University, Columbus, OH 43210

Abstract

This paper considers classical hypothesis testing in the maximum likelihood (ML) and generalized method of moments (GMM) frameworks, where components of unconstrained (and constrained) estimates of a model may have various rates of convergence and their limiting distributions are asymptotically normally distributed. Sufficient conditions are established under which the likelihood ratio, efficient score, C(α), and Wald-type statistics for the testing of general equality constraints can be asymptotically χ² and are asymptotically equivalent under both the null and a sequence of local alternatives. Similarly, results for the analogous difference test, gradient test, C(α)-type gradient test, and Wald test in the GMM estimation framework are established.

1 Introduction

In this paper, we consider classical hypothesis testing in the maximum likelihood (ML) and generalized method of moments (GMM) frameworks, where components of unconstrained (and constrained) estimates of a model may have various rates of convergence in distribution and their limiting distributions are asymptotically normally distributed.

We consider the classical hypothesis testing of general (linear or nonlinear) equality constraints on parameters of an econometric model. In the ML framework, the classical testing procedures include the likelihood ratio (LR) test, the Lagrange Multiplier (LM) (efficient score) test, Neyman's C(α) test, the Wald (W) test, and the minimum distance (MD) test. For the GMM approach, the testing procedures corresponding to the LR, LM, and C(α) tests are, respectively, the difference test, the gradient test (Newey and West 1987; Ruud 2000), and the C(α)-type gradient test (Lee 2005).

∗ I appreciate financial support from the NSF under grant no. 0519204 for my research.

It is well known that in both the ML and GMM frameworks, these classical test statistics are asymptotically equivalent under both the null hypothesis and an appropriate local alternative hypothesis, when all parameter estimates in the model have the same (usually, √n) rate of convergence and are asymptotically normally distributed.¹ When various parameter estimates may have different rates of convergence, the situation becomes complicated. In this paper, we investigate the asymptotic properties of the various classical test statistics when the ML estimates (MLE) or GMM estimates (GMME) of components of the parameter vector may have different rates of convergence and their properly normalized estimates are asymptotically normally distributed. Under some circumstances, we show that the familiar asymptotic properties of the classical test statistics and their asymptotic equivalence results will still be valid.

In Section 2, we shall set up the situation in both the ML and GMM estimation frameworks, where components of the unconstrained MLE and GMME may have different rates of convergence. We shall focus on the case where their asymptotic distributions are asymptotically normal. Section 3 considers general equality constraints and asymptotic properties of the constrained MLE and GMME. The subsequent sections consider the various classical hypothesis statistics and their asymptotic properties. Section 4 specifies the local alternative hypothesis under consideration. The MD test approach provides a framework which connects the various classical testing statistics in Section 5. Section 6 considers the Wald test. The LR-type tests are in Section 7. The score-type tests are considered in Section 8. Conclusions are drawn in Section 9. All the proofs of the propositions are collected in Appendix A. Appendix B provides a result on the overidentification test in the GMM framework. Appendix C provides an example to illustrate the importance of a critical assumption (Assumption G or R) in establishing our asymptotic normal theory.

¹The exceptional case that has not been considered in the literature is the C(α)-type gradient test statistic, which has only been recently formulated in Lee (2005).

2 MLE and GMME with Various Rates of Convergence

2.1 ML Estimation and the Asymptotic Distribution of the MLE

Let Ln(β) be the likelihood function of the parameter vector β in the parameter space S, which is a convex and compact subset of the p-dimensional Euclidean space. The β0 denotes the true parameter vector of β, which lies in the interior of S. The likelihood function is twice continuously differentiable with respect to β.

The β can be estimated by the maximization of ln Ln(β) on the parameter space S. Let βn be the unconstrained MLE of β. We shall assume that the consistency of βn has been established.² For the purpose of this paper, our subsequent analysis will concentrate on the issue of asymptotic distributions of estimators and the associated inference statistics.

Assumption ML-C. The βn is a consistent estimate of β0.

The asymptotic distribution of βn follows, by the mean value theorem,³ from

βn − β0 = − [ ∂²ln Ln(β*1,n)/∂β1∂β′ ; · · · ; ∂²ln Ln(β*p,n)/∂βp∂β′ ]⁻¹ ∂ln Ln(β0)/∂β,   (1)

where the bracketed matrix stacks the p rows ∂²ln Ln(β*l,n)/∂βl∂β′, βl is the lth component of β, and the β*l,n's lie between βn and β0. To simplify notation, we shall denote B*n = (β*1,n, · · · , β*p,n) and write ∂²ln Ln(B*n)/∂β∂β′ in place of the matrix of second-order derivatives in the above expression (1).

²For this circumstance, the conventional analysis of establishing the uniform convergence of (1/n) ln Ln(β) to a well-defined limiting objective function (see, e.g., Amemiya (1985)) will not be useful, because the limiting objective function will be flat around some components of β. For the spatial econometric model in Lee (2004a), the analysis is applicable to some concentrated likelihood function. In Park and Phillips (2000), for the analysis of nonstationary binary choice models, they adopted the approach initiated in Wu (1981).

³The mean value theorem is applied to each component of ∂ln Ln(β)/∂β. The mean value theorem is applicable to scalar-valued functions but not to vector-valued functions. This distinction is less relevant in conventional asymptotic analysis but, for the analysis in this paper, our assumptions need to take this specific feature into account.

Assumption ML-D. Suppose there exists a sequence of invertible p × p matrices Γn such that

1) −Γn′⁻¹ ∂²ln Ln(B*n)/∂β∂β′ Γn⁻¹ →p Ω;

2) Γn′⁻¹ ∂ln Ln(β0)/∂β →d N(0, Ω),

where Ω is a p × p positive definite matrix, for any consistent estimates β*j,n, j = 1, · · · , p, in B*n of β0.

Proposition 2.1 Under Assumptions ML-C and ML-D, the MLE βn has the asymptotic distribution

Γn(βn − β0) = Ω⁻¹ Γn′⁻¹ ∂ln Ln(β0)/∂β + op(1) →d N(0, Σ),   (2)

where Σ = Ω⁻¹.

If Γn is a diagonal matrix, the diagonal elements of Γn will represent the rates of convergence for components of the MLE vector βn. The rates might not be the same for all the components. The asymptotic variance of the MLE βn is Γn⁻¹ΣΓn′⁻¹ = (Γn′ΩΓn)⁻¹. From 1) of Assumption ML-D, this asymptotic variance can be estimated by the inverse of the estimated information matrix, (−∂²ln Ln(βn)/∂β∂β′)⁻¹.
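As a concrete numerical illustration (a standard textbook example, not from this paper), consider the Gaussian trend regression y_t = a + b·t + e_t: the OLS/ML intercept estimate converges at the √n rate, while the slope on the deterministic trend converges at the faster n^{3/2} rate, so the rate matrix is Γn = diag(√n, n^{3/2}). A minimal sketch:

```python
import numpy as np

# Hypothetical illustration (not from the paper): in y_t = a + b*t + e_t,
# the intercept estimate converges at rate sqrt(n) while the trend slope
# converges at rate n^{3/2}, so Gamma_n = diag(sqrt(n), n^{3/2}).
rng = np.random.default_rng(4)
a0, b0 = 1.0, 0.5
slope_errors = []
for n in (100, 10_000):
    t = np.arange(1, n + 1, dtype=float)
    y = a0 + b0 * t + rng.normal(size=n)
    X = np.column_stack([np.ones(n), t])
    a_hat, b_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    slope_errors.append(abs(b_hat - b0))

# the slope error shrinks much faster than the 1/sqrt(n) parametric rate
print(slope_errors)
```

Rescaling each component by its own rate, as Γn does in Proposition 2.1, is what restores a nondegenerate joint normal limit.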

2.2 GMM Estimation and the Asymptotic Distribution of the GMME

We start with the framework with k moment equations

E(fn(β0)) = 0,

where fn : S → R^k with k ≥ p are continuously differentiable mappings. The following basic assumptions are considered:

Assumption GMM-D1. There exists a sequence of invertible k × k matrices Λn such that

Λn fn(β0) →d N(0, V),

where V is a k × k positive definite variance matrix.

In this case, a possible generalized method of moments estimation may be formulated as

min_{β∈S} fn′(β) Λn′ Vn⁻¹ Λn fn(β),   (3)

where Vn is a consistent estimate of V. The minimized objective function takes into account the possible different rates of convergence of the moments fn(β0) at the true parameter vector β0. If all the moments have the same rate of convergence, this reduces to the conventional GMM objective function. We shall assume that the GMME βn is consistent to begin with.⁴
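As a minimal numerical sketch of the scaled objective (3) (an invented example, not from the paper): for x ~ N(β0, 1), take the two moments E(x) − β and E(x²) − (β² + 1). Here both moments happen to share the √n rate, and Vn is replaced by the identity purely for simplicity:

```python
import numpy as np

# Invented example of the scaled GMM objective (3): two moments for a
# N(beta0, 1) sample, f_n(b) = (mean(x) - b, mean(x^2) - (b^2 + 1)),
# with rate matrix Lambda_n = sqrt(n) I and V_n set to I for simplicity.
rng = np.random.default_rng(0)
n = 10_000
beta0 = 2.0
x = rng.normal(beta0, 1.0, size=n)
m1, m2 = x.mean(), (x ** 2).mean()

Lam = np.sqrt(n) * np.eye(2)     # Lambda_n
Vn_inv = np.eye(2)               # placeholder for a consistent V_n^{-1}

def objective(b):
    f = np.array([m1 - b, m2 - (b ** 2 + 1.0)])
    g = Lam @ f
    return g @ Vn_inv @ g

grid = np.linspace(1.5, 2.5, 2001)
beta_hat = grid[np.argmin([objective(b) for b in grid])]
print(round(beta_hat, 1))
```

When moments converge at different rates, the point of (3) is that Λn rescales each moment by its own rate before the quadratic form is built, so no single moment dominates the objective asymptotically.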

Assumption GMM-C. The GMME βn is a consistent estimate of β0.

In order to derive appropriate rates of convergence of the GMM estimates, the following assumption is useful. The notation ∂fn(B)/∂β′ shall denote the matrix of partial derivatives of fn with its components evaluated at possibly different values of β, with B ∈ S^k.

Assumption GMM-D2. There exists a sequence of invertible p × p matrices Γn such that

Λn ∂fn(B)/∂β′ = Fn(B) Γn,

for some k × p stochastic matrix Fn(B) on S^k, which converges in probability to a nonstochastic matrix F(B) uniformly on S^k, where F(β0, · · · , β0) has the full rank p.

Note that the matrix Γn in Assumption GMM-D2 may not be the same as the one in Assumption ML-D in a single model. However, we use the same notation for some unified assumptions in the subsequent Section 3 for both ML and GMM estimation.

⁴Contrary to the conventional case, the limiting objective function will be a stochastic function instead of a nonstochastic function. Examples of consistency analyses for related situations are in Moon and Schorfheide (2002) and Lee (2004b).

When B = (β, · · · , β), one may simply denote Fn(β) and F(β) for Fn(B) and F(B), respectively. Furthermore, F(β0, · · · , β0) will be denoted by F0 for simplicity. Under these assumptions, the asymptotic normal distribution of βn can be derived.

Proposition 2.2 Under Assumptions GMM-C, GMM-D1, and GMM-D2, the GMME βn has the limiting distribution

Γn(βn − β0) = −(F0′V⁻¹F0)⁻¹ F0′V⁻¹ Λn fn(β0) + op(1) →d N(0, Σ),   (4)

where Σ = (F0′V⁻¹F0)⁻¹.

If Γn is a diagonal matrix, then its diagonal elements represent the various rates of convergence for the components of βn. The asymptotic variance of the GMME βn is Γn⁻¹ΣΓn′⁻¹ = (Γn′F0′V⁻¹F0Γn)⁻¹, which can be estimated by (∂fn′(βn)/∂β Λn′ Vn⁻¹ Λn ∂fn(βn)/∂β′)⁻¹.

3 Constrained ML and GMM Estimates

Consider the general equality constraints in the form

R(β) = 0,   (5)

where R : R^p → R^{p−q} forms a set of (p − q) functionally independent constraints with ∂R(β0)/∂β′ having the full rank (p − q). Such constraints may equivalently be represented in the alternative form

β = g(δ),   (6)

where δ ∈ R^q with q ≤ p is the vector of free parameters, and ∂g(δ0)/∂δ′ has the full rank q. Explicitly, suppose that R(β) = 0 as in (5). Let β = (β1*′, β2*′)′ be a partition with β1* ∈ R^{p−q} and β2* ∈ R^q. The β2* can be regarded as the free parameters. Given β2*, β1* can be solved from R(β1*, β2*) = 0 as β1* = g1(β2*). Therefore, the constraints R(β) = 0 can be rewritten as β = g(δ), where δ = β2* and g(δ) = (g1′(β2*), β2*′)′. Conversely, suppose that (6) is satisfied. Decompose (6) into β1* = g1(δ) and β2* = g2(δ), where β = (β1*′, β2*′)′ with β1* ∈ R^{p−q} and β2* ∈ R^q, and g2 is invertible. The corresponding constraint in (5) is β1* − g1(g2⁻¹(β2*)) = 0 with R(β) = β1* − g1(g2⁻¹(β2*)).

For the constraints in (5), we shall consider the situation in the following assumption, where ∂R(B)/∂β′ refers to the matrix of partial derivatives of R with respect to β′ with its (p − q) components evaluated at possibly different values of β.

Assumption R. There exists a (p − q) × (p − q) invertible matrix Cn(B) and a (p − q) × p matrix An(B) with B ∈ S^{p−q} such that

∂R(B)/∂β′ Γn⁻¹ = Cn(B) An(B),   (7)

where An(B) converges to a nonstochastic finite matrix A(B) uniformly in B in a neighborhood of (β0, · · · , β0) in S^{p−q}, and A0 = A(β0, · · · , β0) has full row rank (p − q).

For the constraints in (6), we consider the following situation. Similarly, ∂g(∆)/∂δ′ denotes the matrix of partial derivatives of g with respect to δ′ with its components evaluated at possibly different values of δ.

Assumption G. There exists a p × q matrix Gn(∆) and a q × q invertible matrix Dn(∆) such that

Γn ∂g(∆)/∂δ′ = Gn(∆) Dn(∆),   (8)

where Gn(∆) converges uniformly in ∆ to G(∆) in a neighborhood of (δ0, · · · , δ0) ∈ (R^q)^p, and G0 = G(δ0, · · · , δ0) has full column rank q.

In order to derive tractable asymptotic distributions of the constrained estimates, these assumptions are useful. Appendix C provides an illustrative example on the asymptotic properties of the MD estimator (MDE) when Assumption G is not satisfied. That example illustrates that a general asymptotic theory might not be feasible if Assumption G does not hold.

In these assumptions, we have paid special attention to each component of the vector-valued functions R(β) and g(δ) because the linear expansion based on the mean value theorem is applicable only to scalar-valued functions. These assumptions are essentially related to each other if the arguments in the various components are the same. Because R(β) = 0 with β = g(δ) for all δ, it follows that

∂R(g(δ))/∂β′ ∂g(δ)/∂δ′ = ∂R(g(δ))/∂β′ Γn⁻¹ Γn ∂g(δ)/∂δ′ = 0.

Hence the columns of Γn′⁻¹ ∂R′(g(δ))/∂β and Γn ∂g(δ)/∂δ′ are perpendicular, and the columns of [Γn′⁻¹ ∂R′(g(δ))/∂β, Γn ∂g(δ)/∂δ′] span the p-dimensional Euclidean space R^p. Suppose (7) holds for some An(β) such that ∂R(β)/∂β′ Γn⁻¹ = Cn(β)An(β). The An′(β) will span the same column subspace as that of Γn′⁻¹ ∂R′(g(δ))/∂β in R^p. The matrix Gn(δ) can be chosen such that its columns are perpendicular to the columns of An′(β) and span the orthogonal complement of the column space of An′(β). The Gn(δ) can be taken as the orthonormal submatrix corresponding to the eigenvectors of (Ip − An′(β)[An(β)An′(β)]⁻¹An(β)) with the nonzero (unit) eigenvalues. Because Gn(δ) and Γn ∂g(δ)/∂δ′ span the same column space in R^p, there must exist an invertible transformation Dn(δ) such that Γn ∂g(δ)/∂δ′ = Gn(δ)Dn(δ) as in (8) of Assumption G. Similarly, (8) implies (7) when the components are evaluated at the same argument.

For simplicity, when B = (β, · · · , β), An(B) will be denoted by An(β) and Cn(B) by Cn(β). Similarly, when ∆ = (δ, · · · , δ), Gn(∆) and Dn(∆) will be represented by Gn(δ) and Dn(δ), respectively. Note that the rows of A0 and the columns of G0 are always perpendicular to each other, as in the following lemma.

Lemma 3.1 Under Assumptions R and G, An(β)Gn(δ) = 0, where β = g(δ), for all δ. This implies, in particular, A0G0 = 0 and the identity

P^{1/2} A0′ (A0 P A0′)⁻¹ A0 P^{1/2} = Ip − P^{−1/2} G0 (G0′ P⁻¹ G0)⁻¹ G0′ P^{−1/2}

holds for any p × p positive definite matrix P.
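The identity can be spot-checked numerically on a random example (invented dimensions, not from the paper): with A0G0 = 0 and the stated rank conditions, the two sides are complementary orthogonal projections in R^p, since (P^{1/2}A0′)′(P^{−1/2}G0) = A0G0 = 0 and the ranks (p − q) and q sum to p.

```python
import numpy as np

# Numerical spot-check of the identity in Lemma 3.1 (invented p = 5, q = 2):
# with A0 G0 = 0, A0 of full row rank p - q, and G0 of full column rank q,
# the two sides are complementary orthogonal projections and so coincide.
rng = np.random.default_rng(1)
p, q = 5, 2
A0 = rng.normal(size=(p - q, p))
G0 = np.linalg.svd(A0)[2][p - q:].T      # null-space basis: A0 @ G0 = 0

B = rng.normal(size=(p, p))
P = B @ B.T + p * np.eye(p)              # arbitrary positive definite P
w, U = np.linalg.eigh(P)
P_half = U @ np.diag(np.sqrt(w)) @ U.T   # symmetric square root P^{1/2}
P_mhalf = np.linalg.inv(P_half)

inv = np.linalg.inv
lhs = P_half @ A0.T @ inv(A0 @ P @ A0.T) @ A0 @ P_half
rhs = np.eye(p) - P_mhalf @ G0 @ inv(G0.T @ inv(P) @ G0) @ G0.T @ P_mhalf
print(np.allclose(lhs, rhs))  # prints True
```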

The identity is useful for the equivalent expressions of some test statistics and their limiting distributions in subsequent sections. As shall be shown in subsequent sections, while Γn provides the proper rates for the unconstrained and constrained estimators of β0, if Dn = Dn(∆) does not depend on δ, Dn will provide the rates matrix for the (constrained) estimator of δ0. If Cn = Cn(B) does not depend on β, Cn⁻¹ provides the rates matrix of R(βn) with the unconstrained estimate βn. If Dn(∆) does depend on δ, the following situation may be of consideration.

Assumption GL. Dn(∆) = D2n(∆)D1n, where D2n(∆) is invertible and D2n⁻¹(∆) converges to a matrix S(∆) uniformly in ∆ in a neighborhood of ∆0 = (δ0, · · · , δ0) at which S(∆) is continuous.

In the event that Dn(∆) does not depend on δ to begin with, Assumption GL will be redundant, as D2n(∆) shall be an identity matrix. Note that we have neither restricted Dn(∆) nor D1n to be diagonal matrices. The implications of such cases have been illustrated in some examples of constraints in Lee (2004b). Finally, we note that, for the asymptotic properties of the various test statistics, Assumption GL will not be needed. Assumption GL is relevant only for the asymptotic distribution and the rate of convergence of estimators of δ0. Assumption GL allows the possibility that the resulting (constrained) estimator of δ0 may have a degenerate distribution after proper rate normalization.

3.1 Examples

Here we provide a simple example in a GMM estimation framework, where Assumptions GMM-D1 and GMM-D2 hold, and also an example in the ML framework. Assumption R and/or Assumption G will also be valid for some restrictions of interest. Other relatively complicated examples can be found in Lee (2004b).

3.1.1 A model of social interactions with rational expectations

The illustrative example is a model of social interactions with rational expectations as in Manski (1993) and Brock and Durlauf (2001). The social interactions model under consideration is

y_{ri} = λ (1/m_r) Σ_{j=1}^{m_r} E(y_{rj}|J_r) + x_{ri,1} α1 + (1/m_r) Σ_{j=1}^{m_r} x_{rj,2} α2 + u_r + ε_{ri},   (9)

with i = 1, · · · , m_r and r = 1, · · · , R in a group setting, where r refers to the rth group and R is the total number of groups in the sample, while i refers to

the ith individual in a group and m_r is the total number of members in the rth group. The u_r represents the group-specific unobservable variable. The J_r denotes the information set of group r, which includes all exogenous variables x_{ri,1}, x_{ri,2} for all i = 1, · · · , m_r and r = 1, · · · , R. The disturbances ε_{ri} are i.i.d. (0, σ²) for all r and i. In this model, expected outcomes of the group may influence the outcome of each individual member in the group. The expected outcomes shall be determined as equilibrium outcomes of the equation. The parameter λ captures this possible effect of the expected group outcome on the individual's behavior. This has been termed an endogenous effect in Manski (1993). The variables (1/m_r) Σ_{j=1}^{m_r} x_{rj,2} may capture interaction effects on an individual's behavior through observed characteristics of his/her group, which is termed an exogenous interaction effect or contextual effect. For the identification of the parameters, Manski has noted that x_{ri,1} shall contain relevant exogenous variables not included in x_{ri,2}.

For the estimation of this model, one may consider the GMM estimation framework. The structural equation (9) implies that

(1/m_r) Σ_{i=1}^{m_r} E(y_{ri}|J_r) = (1/(1 − λ)) (x̄_{r,1} α1 + x̄_{r,2} α2 + u_r).

Hence, the structural equation can be rewritten as

y_{ri} = (x_{ri,1} − x̄_{r,1}) α1 + x̄_{r,1} α1/(1 − λ) + x̄_{r,2} α2/(1 − λ) + u_r + ε_{ri},

where x̄_{r,l} = (1/m_r) Σ_{i=1}^{m_r} x_{ri,l} for l = 1, 2. The structural equation can be conveniently decomposed into the within-group and between-group equations:

y_{ri} − ȳ_r = (x_{ri,1} − x̄_{r,1}) α1 + (ε_{ri} − ε̄_r),  i = 1, · · · , m_r; r = 1, · · · , R,   (10)

and

ȳ_r = x̄_{r,1} α1/(1 − λ) + x̄_{r,2} α2/(1 − λ) + u_r + ε̄_r,  r = 1, · · · , R,   (11)

where ε̄_r = (1/m_r) Σ_{i=1}^{m_r} ε_{ri}.
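The within-group equation (10) can be sketched in a small simulation (invented numbers, with λ and α2 set to zero purely to keep the data-generating process short): group-demeaning removes u_r and all group-level terms, so α1 is estimable by least squares on the demeaned data alone.

```python
import numpy as np

# Simulated sketch (invented numbers) of the within-group equation (10):
# demeaning by group removes u_r and the group-level regressors, leaving
# alpha_1 estimable by least squares on the demeaned data.
rng = np.random.default_rng(2)
R_groups, m = 500, 20
alpha1 = 1.5
grp = np.repeat(np.arange(R_groups), m)      # group index of each observation
x1 = rng.normal(size=R_groups * m)
u = rng.normal(size=R_groups)[grp]           # group unobservable u_r
eps = rng.normal(size=R_groups * m)
y = x1 * alpha1 + u + eps                    # lambda and alpha_2 set to 0 here

def within(v):                               # v minus its group mean
    return v - (np.bincount(grp, v) / m)[grp]

yw, xw = within(y), within(x1)
alpha1_hat = (xw @ yw) / (xw @ xw)
print(round(alpha1_hat, 1))
```

The between-group equation (11), by contrast, retains u_r, which motivates the second set of moment conditions below.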

Under the specification of a random component model where u_r is uncorrelated with x̄_{r,1} and x̄_{r,2}, the moment conditions of this model can be E[(x_{ri,1} − x̄_{r,1})′(ε_{ri} − ε̄_r)] = 0 and E[x̄_r′(u_r + ε̄_r)] = 0, where x̄_r consists of all distinctive variables in x̄_{r,1} and x̄_{r,2}. These moment conditions can be used for the GMM estimation. Let β = (α1′, α2′, λ)′. Suppose that x_{r,1} is of dimension k1 and x̄_r is of dimension k. The empirical (k1 + k)-dimensional vector-valued moment function is

fn(β) = [ (1/n) Σ_{r=1}^R Σ_{i=1}^{m_r} (x_{ri,1} − x̄_{r,1})′(y_{ri} − x_{ri,1}α1) ; (1/R) Σ_{r=1}^R x̄_r′(ȳ_r − x̄_{r,1}α1/(1 − λ) − x̄_{r,2}α2/(1 − λ)) ].

For this set of moments, take Λn = [ √n I_{k1}, 0 ; 0, √R I_k ]. One can see that, in general,

Λn fn(β0) = [ (1/√n) Σ_{r=1}^R Σ_{i=1}^{m_r} (x_{ri,1} − x̄_{r,1})′ε_{ri} ; (1/√R) Σ_{r=1}^R x̄_r′(u_r + ε̄_r) ] →d N(0, V),

which satisfies Assumption GMM-D1. The gradient matrix of fn(β) with respect to β is

∂fn(β)/∂β′ = (∂fn(β)/∂α1′, ∂fn(β)/∂α2′, ∂fn(β)/∂λ) = [ An, 0, 0 ; Bn, Cn, Dn ],

where

An = −(1/n) Σ_{r=1}^R Σ_{i=1}^{m_r} (x_{ri,1} − x̄_{r,1})′(x_{ri,1} − x̄_{r,1}),  Bn = −(1/((1 − λ)R)) Σ_{r=1}^R x̄_r′x̄_{r,1},

and

Cn = −(1/((1 − λ)R)) Σ_{r=1}^R x̄_r′x̄_{r,2},  Dn = −(1/((1 − λ)²R)) Σ_{r=1}^R x̄_r′(x̄_{r,1}α1 + x̄_{r,2}α2).

It follows that Λn ∂fn(β)/∂β′ = Fn(β)Γn, where

Fn(β) = [ An, 0, 0 ; √(R/n) Bn, Cn, Dn ],  Γn = [ √n I_{k1}, 0, 0 ; 0, √R I_{k2}, 0 ; 0, 0, √R ].

Under the assumption that R/n converges either to a finite constant or to 0 as n → ∞, the limiting matrix F(β) of Fn(β) can have full column rank when x̄_{r,1} has at least a distinct relevant variable not included in x̄_{r,2}. Thus, Assumption GMM-D2 can be satisfied.

The hypotheses that are of interest may be tests on whether the interaction effects are significant or not. We may consider three cases: 1) λ = 0, 2) α2 = 0, and 3) both λ = 0 and α2 = 0.

1) H0 : λ = 0. For this case, R(β) = (0, 0, 1)β and, hence, ∂R(β)/∂β′ = (0, 0, 1). It follows that ∂R(β)/∂β′ Γn⁻¹ = Cn An(β) with Cn = 1/√R and An(β) = (0, 0, 1), which has full row rank 1. Thus, Assumption R is satisfied. Alternatively, consider β = g(δ) where δ = (α1′, α2′)′ and g(δ) = (α1′, α2′, 0)′. As

∂g(δ)/∂δ′ = [ I_{k1}, 0 ; 0, I_{k2} ; 0, 0 ],

Γn ∂g(δ)/∂δ′ = Gn(δ)Dn with Gn(δ) = [ I_{k1}, 0 ; 0, I_{k2} ; 0, 0 ] and Dn = [ √n I_{k1}, 0 ; 0, √R I_{k2} ]. Thus, Assumption G is valid.

2) H0 : α2 = 0. This case has R(β) = (0, I_{k2}, 0)β. As ∂R(β)/∂β′ Γn⁻¹ = Cn An(β) with Cn = (1/√R) I_{k2} and An(β) = (0, I_{k2}, 0), which has full row rank, Assumption R holds. Alternatively, g(δ) = (α1′, 0, λ)′ with δ = (α1′, λ)′. As

∂g(δ)/∂δ′ = [ I_{k1}, 0 ; 0, 0 ; 0, 1 ],

Γn ∂g(δ)/∂δ′ = Gn(δ)Dn with Gn(δ) = [ I_{k1}, 0 ; 0, 0 ; 0, 1 ], which has full column rank, and Dn = [ √n I_{k1}, 0 ; 0, √R ]. Thus, Assumption G is valid.

3) H0 : λ = 0, α2 = 0. This case corresponds to R(β) = [ 0, I_{k2}, 0 ; 0, 0, 1 ]β. Equivalently, g(δ) = (α1′, 0, 0)′ where δ = α1. Assumption R is satisfied with Cn = [ (1/√R) I_{k2}, 0 ; 0, 1/√R ] and An(β) = [ 0, I_{k2}, 0 ; 0, 0, 1 ]. Assumption G is satisfied with Gn(δ) = [ I_{k1} ; 0 ; 0 ] and Dn = √n I_{k1}.

3.1.2 Mixed Estimation

Theil and Goldberger (1961) considered the pooling of sample information and stochastic restrictions in a mixed estimation framework. The mixed estimation issue may be extended into a general nonlinear restriction framework, where the possibly different degrees of information in the sample and in the prior stochastic restrictions are represented by different rates of convergence.

Let ln Ln1(β1) be the log likelihood function which presents the sample information about β1. Suppose that this log likelihood function satisfies the standard regularity conditions of conventional likelihood theory; in particular, with V1 = −plim_{n→∞} (1/n) ∂²ln Ln1(β10)/∂β1∂β1′,

−(1/n) ∂²ln Ln1(B1n)/∂β1∂β1′ →p V1  and  (1/√n) ∂ln Ln1(β10)/∂β1 →d N(0, V1).

Suppose that the stochastic prior information (restrictions) on β is β̂2 = h(β1) + ε2, where β̂2 is an estimate of β20 = h(β10) such that γn2(β̂2 − β20) →d N(0, V2). The β2 is a (p − q)-dimensional vector and β1 is a q-dimensional vector. The ∂h(β10)/∂β1′ is a (p − q) × q matrix with full rank. Without loss of generality in the asymptotic analysis, ε2 may be assumed to be N(0, (1/γn2²)V2). The sample information and the prior information are, as usual, assumed to be independent. Thus, for this case, Assumption ML-D is satisfied for the unrestricted parameter vector β = (β1′, β2′)′ with

Γn = [ γn1 I_q, 0 ; 0, γn2 I_{p−q} ],  γn1 = √n,  and  Ω = [ V1, 0 ; 0, V2⁻¹ ].

For the constrained estimation, one has β = g(δ), where δ = β1 and g(δ) = (δ′, h′(δ))′. The sample and the prior information will be mixed together for the estimation of δ. The interesting question is whether the prior information will be of any importance when one has a large sample. The answer to this question in our setting will depend on the relative ratio of the rates γn1 and γn2. The rate of the constrained (mixed) estimator of δ is also of interest. For this model, it can be shown that Assumptions G and GL will be satisfied. The (mixed) estimate of δ will have the rate of D1n. The resulting rate matrix D1n will depend on the ratio of γn1 and γn2.

(1) γn1/γn2 → ∞. In this case, Γn ∂g(∆)/∂δ′ = Gn(∆)D1n, where D1n = γn1 I_q and

Gn(∆) = [ I_q ; (γn2/γn1) ∂h(∆)/∂δ′ ].

The limiting matrix G0 = [ I_q ; 0 ], which has full rank q. This corresponds to the case where the prior information is relatively much weaker than that of the sample, so the mixed estimate δn has the γn1-rate of convergence, i.e., the √n-rate.

(2) γn1/γn2 → c, where c > 0 is a finite constant. In this case, D1n = γn1 I_q,

Gn(∆) = [ I_q ; (γn2/γn1) ∂h(∆)/∂δ′ ],  and  G0 = [ I_q ; (1/c) ∂h(δ0)/∂δ′ ],

which has full rank q. As the rates are the same, δn has the same rate. Both the prior information and the sample are useful for the constrained estimation.

(3) γn1/γn2 → 0. In this case, the prior information is relatively stronger than the sample information.

(i) Case 1: (p − q) ≥ q. In this case, ∂h(δ)/∂δ′ has rank q. Take D1n = γn2 I_q,

Gn(∆) = [ (γn1/γn2) I_q ; ∂h(∆)/∂δ′ ],  and  G0 = [ 0 ; ∂h(δ0)/∂δ′ ],

which has rank q.

(ii) Case 2: (p − q) < q. Let m = q − (p − q). As ∂h(δ)/∂δ′ has only rank (p − q), the G0 in the preceding Case 1 is not relevant. The search for D1n and Gn(∆) is relatively complicated. Take D1n = γn1 I_q. Then

Γn ∂g(∆)/∂δ′ D1n⁻¹ = [ I_q ; (γn2/γn1) ∂h(∆)/∂δ′ ],

which does not converge. Decompose ∂h(∆)/∂δ′ = (H1(∆), H2(∆)), where H2(∆) is an invertible (p − q) × (p − q) matrix and H1(∆) is a (p − q) × m matrix. Consider the following matrix, which is invertible:

D2n⁻¹(∆) = [ 0, I_m ; (γn1/γn2) H2⁻¹(∆), −H2⁻¹(∆)H1(∆) ].

It follows that (γn2/γn1) ∂h(∆)/∂δ′ D2n⁻¹(∆) = (I_{p−q}, 0). The relevant Gn matrix can be taken as

Gn(∆) = [ I_q ; (γn2/γn1) ∂h(∆)/∂δ′ ] D2n⁻¹(∆) = [ 0, I_m ; (γn1/γn2) H2⁻¹(∆), −H2⁻¹(∆)H1(∆) ; I_{p−q}, 0 ].

The limiting matrix is

G0 = [ 0, I_m ; 0, −H2⁻¹(δ0)H1(δ0) ; I_{p−q}, 0 ],

which has full rank q. Thus, Assumptions G and GL are satisfied. The rate for the constrained estimator δn will be γn1.
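Case (1) can be illustrated with a toy precision-weighting calculation (invented rates, not from the paper): mixing a sample estimate with rate γn1 = √n and a prior estimate with rate γn2 = n^{1/4}, the precision weight on the sample estimate tends to one, so the prior becomes asymptotically negligible and the mixed estimate keeps the √n rate.

```python
import numpy as np

# Toy illustration (invented rates) of case (1), gamma_n1/gamma_n2 -> infinity:
# combining a sample estimate (rate sqrt(n)) and a prior estimate (rate n**0.25)
# by precision weighting; the weight on the sample estimate tends to one.
rng = np.random.default_rng(3)
delta0 = 1.0
weights = []
for n in (10 ** 4, 10 ** 8):
    g1, g2 = np.sqrt(n), n ** 0.25
    d1 = delta0 + rng.normal() / g1       # sample-based estimate of delta0
    d2 = delta0 + rng.normal() / g2       # weaker prior estimate
    w1 = g1 ** 2 / (g1 ** 2 + g2 ** 2)    # precision weight on d1
    mixed = w1 * d1 + (1 - w1) * d2
    weights.append(w1)
print(weights[0] < weights[1])  # prints True: the sample weight grows with n
```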

3.2 Constrained ML Estimation

Consider the constraints in the form β = g(δ), where β ∈ R^p and δ ∈ R^q with q ≤ p. Let Lcn(δ) = Ln(g(δ)) be the likelihood function of the constrained parameter vector δ. Let δ0 denote the corresponding true parameter vector of δ.

The δ can be estimated by the maximization of ln Lcn(δ) on the parameter space of δ. Let δn be the MLE of δ. We shall assume that the consistency of δn has been established.⁵

⁵In a linear time series model, Nagaraj and Fuller (1991) establish the consistency of the constrained estimator via the consistency of the unconstrained estimator. In general, one may expect that arguments establishing the consistency of the unconstrained estimator might be carried over for the consistency of the constrained estimator.

Assumption ML-C′. The δn is a consistent estimate of δ0.

The following proposition provides the asymptotic distribution of the constrained MLE δn.

Proposition 3.1 Under Assumptions ML-C′, ML-D, G and GL,

D1n(δn − δ0) = S0 [G0′ΩG0]⁻¹ G0′ Γn′⁻¹ ∂ln Ln(β0)/∂β + op(1) →d N(0, S0(G0′ΩG0)⁻¹S0′),

where S0 = S(∆0).

Note that if S0 does not have full rank, some components of D1n(δn − δ0) may be asymptotically linearly dependent and, hence, the limiting distribution of D1n(δn − δ0) can be degenerate.

The following proposition provides the asymptotic distribution of the constrained MLE βcn of β0.

Proposition 3.2 Under Assumptions ML-C′, ML-D and G,

Γn(βcn − β0) = G0 [G0′ΩG0]⁻¹ G0′ Γn′⁻¹ ∂ln Ln(β0)/∂β + op(1) →d N(0, G0(G0′ΩG0)⁻¹G0′).

The constrained MLE βcn of β0 is asymptotically efficient relative to the unconstrained MLE βn.

The constrained MLE βcn has the same rate matrix Γn as the unconstrained MLE βn but is asymptotically efficient relative to the unconstrained MLE under the null hypothesis. The asymptotic variance of βcn is Γn⁻¹G0(G0′ΩG0)⁻¹G0′Γn′⁻¹, which can be estimated by

∂g(δn)/∂δ′ ( ∂g′(δn)/∂δ (−∂²ln Ln(βcn)/∂β∂β′) ∂g(δn)/∂δ′ )⁻¹ ∂g′(δn)/∂δ,   (12)

which is also familiar in the regular case. The asymptotic variance estimate in (12) provides robust estimates of asymptotic variances of constrained ML estimates for both the regular case as well as the irregular case under consideration.
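The efficiency claim in Proposition 3.2 can be spot-checked on a random example (invented dimensions, not from the paper): the difference Ω⁻¹ − G0(G0′ΩG0)⁻¹G0′ between the unconstrained and constrained asymptotic variances is positive semidefinite, since Ω^{1/2}G0(G0′ΩG0)⁻¹G0′Ω^{1/2} is an orthogonal projection and hence bounded above by the identity.

```python
import numpy as np

# Random-matrix check of the efficiency claim in Proposition 3.2: the
# difference Omega^{-1} - G0 (G0' Omega G0)^{-1} G0' is positive
# semidefinite, so imposing a valid constraint never increases variance.
rng = np.random.default_rng(5)
p, q = 4, 2
B = rng.normal(size=(p, p))
Omega = B @ B.T + np.eye(p)       # positive definite information matrix
G0 = rng.normal(size=(p, q))      # full column rank (almost surely)
V_unc = np.linalg.inv(Omega)
V_con = G0 @ np.linalg.inv(G0.T @ Omega @ G0) @ G0.T
min_eig = np.linalg.eigvalsh(V_unc - V_con).min()
print(min_eig > -1e-10)  # prints True: all eigenvalues nonnegative (up to rounding)
```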

3.3 Constrained GMM Estimation

The constrained GMM estimation of δ0 is

min_δ fn′(g(δ)) Λn′ Vn⁻¹ Λn fn(g(δ)).

Let δn be the constrained GMME of δ0. The corresponding constrained GMME of β0 is βcn = g(δn).

Assumption GMM-C′. The constrained GMME δn is a consistent estimate of δ0.

Proposition 3.3 Under Assumptions GMM-D1, GMM-D2, GMM-C′, and G, the constrained GMME βcn has the limiting distribution

Γn(βcn − β0) →d N(0, G0(G0′F0′V⁻¹F0G0)⁻¹G0′),

and is asymptotically efficient relative to the unconstrained GMME βn. If, in addition, Assumption GL is satisfied, then

D1n(δn − δ0) = −S0(G0′F0′V⁻¹F0G0)⁻¹G0′F0′V⁻¹ Λn fn(β0) + op(1) →d N(0, S0(G0′F0′V⁻¹F0G0)⁻¹S0′).

From the limiting distribution of the constrained GMME δn, the rates of convergence for the components of δn are in D1n. The constrained GMME βcn may have the same rates Γn as the unconstrained GMME βn, but can be asymptotically efficient relative to the unconstrained one.

The asymptotic variance of the constrained GMME βcn is

Γn⁻¹G0(G0′F0′V⁻¹F0G0)⁻¹G0′Γn′⁻¹ = ∂g(δ0)/∂δ′ (Dn′G0′F0′V⁻¹F0G0Dn)⁻¹ ∂g′(δ0)/∂δ

under Assumption G, which can be estimated by

∂g(δn)/∂δ′ ( ∂g′(δn)/∂δ ∂fn′(g(δn))/∂β Λn′Vn⁻¹Λn ∂fn(g(δn))/∂β′ ∂g(δn)/∂δ′ )⁻¹ ∂g′(δn)/∂δ

under Assumption GMM-D2.

4 Hypothesis Testing Under the Null and Local Alternative Hypotheses

In the subsequent sections, we shall consider hypothesis tests, which include the MD, W, LR, and LM tests under the ML framework. For the GMM, the tests corresponding to the LR and LM tests shall be the difference (D) test and the gradient (G) test. The null hypothesis is H0 : R(β0) = 0 or, equivalently, β0 = g(δ0). We shall also investigate the asymptotic properties of the various test statistics under the local alternative

H1 : βn0 = β0 + Γn⁻¹∆,   (13)

for some constant vector ∆, where Γn is the same rates matrix as in Assumptions ML-D, GMM-D2, R, and G. Under H1, while R(β0) = 0, R(βn0) may not be zero. Corresponding to β0, there exists a unique δ0 such that β0 = g(δ0), but βn0 may not be in the image of g(δ) for any δ.

Under this local alternative, condition 2) in Assumption ML-D shall be modified with the sequence of true parameter vectors βn0:

Assumption ML-D′. Under H1 : βn0 = β0 + Γn⁻¹∆, where ∆ is a constant vector,

1) −Γn′⁻¹ ∂²ln Ln(B*n)/∂β∂β′ Γn⁻¹ →p Ω;

2) Γn′⁻¹ ∂ln Ln(βn0)/∂β →d N(0, Ω),

where Ω is a positive definite matrix, for any consistent estimates β*j,n, j = 1, · · · , p, of β0.

Similarly, Assumption GMM-D1 shall be replaced by

Assumption GMM-D1′. Under H1 : βn0 = β0 + Γn⁻¹∆, there exists a sequence of invertible k × k matrices Λn such that

Λn fn(βn0) →d N(0, V),

where V is a k × k positive definite variance matrix.

The concept of contiguity of probability measures is useful for establishing convergence in probability of statistics under $H_1$ in (13) when the convergence of such statistics under $H_0$ is known. Suppose that $P_n$ and $Q_n$ are two sequences of probability measures. The sequence $Q_n$ is said to be contiguous to $P_n$ if, for any sequence of random variables (or events) $T_n$ for which $T_n \to 0$ in $P_n$-probability, $T_n \to 0$ in $Q_n$-probability (Le Cam, 1960; Hájek and Šidák, 1967). Contiguity can be established by Le Cam's first lemma on the log likelihood ratio of the density functions corresponding to $P_n$ and $Q_n$ (Le Cam, 1960; Bickel et al., 1993, Appendix A.9). For our purpose, the sequence $P_n$ corresponds to the distributions of the model under $H_0$ with the true parameter $\beta_0$, and $Q_n$ corresponds to the distributions under the sequence of local alternatives in (13).

Proposition 4.1 Let $T_n$ be a statistic. Under Assumption ML-D, if $T_n \xrightarrow{p} 0$ under $H_0$, then $T_n \xrightarrow{p} 0$ under $H_1$.

The contiguity properties of the model under $H_0$ and $H_1$ are useful for establishing the asymptotic distributions of the unconstrained and constrained estimators under $H_1$. The following proposition provides the asymptotic distributions of both the unconstrained and constrained MLE's under the sequence of local alternatives.

Proposition 4.2 Suppose Assumptions ML-C′, ML-D′ and G hold. Under the local alternative $H_1$, the unconstrained MLE $\beta_n$ has the asymptotic distribution
$$\Gamma_n(\beta_n - \beta_0) = \Omega^{-1}\Gamma_n'^{-1}\frac{\partial\ln L_n(\beta_{n0})}{\partial\beta} + \Delta + o_p(1) \xrightarrow{d} N(\Delta, \Omega^{-1});$$
and the constrained MLE $\beta_{cn}$ has the asymptotic distribution
$$\Gamma_n(\beta_{cn} - \beta_0) = G_0[G_0'\Omega G_0]^{-1}G_0'\Big[\Gamma_n'^{-1}\frac{\partial\ln L_n(\beta_{n0})}{\partial\beta} + \Omega\Delta\Big] + o_P(1) \xrightarrow{d} N\big(G_0(G_0'\Omega G_0)^{-1}G_0'\Omega\Delta,\ G_0(G_0'\Omega G_0)^{-1}G_0'\big).$$
Furthermore, under the additional Assumption GL,
$$D_{1n}(\delta_n - \delta_0) = S_0[G_0'\Omega G_0]^{-1}G_0'\Big[\Gamma_n'^{-1}\frac{\partial\ln L_n(\beta_{n0})}{\partial\beta} + \Omega\Delta\Big] + o_P(1) \xrightarrow{d} N\big(S_0(G_0'\Omega G_0)^{-1}G_0'\Omega\Delta,\ S_0(G_0'\Omega G_0)^{-1}S_0'\big).$$

For the GMM estimation, in place of features of the underlying likelihood function of a model, we shall assume directly that the contiguity property holds under $H_1$ with $H_0$.

Assumption CT. The distributions under the sequence of local alternatives $H_1: \beta_{n0} = \beta_0 + \Gamma_n^{-1}\Delta$ are contiguous to the distributions under $H_0$.

The following proposition summarizes the asymptotic distributions of the unconstrained and constrained GMM estimators under the sequence of local alternatives.

Proposition 4.3 Suppose that Assumptions GMM-D1′, GMM-D2, GMM-C′, and G hold. Under the local alternative $H_1$ and Assumption CT, the unconstrained GMM estimate $\beta_n$ has the asymptotic distribution
$$\Gamma_n(\beta_n - \beta_0) = -(F_0'V^{-1}F_0)^{-1}F_0'V^{-1}\Lambda_n f_n(\beta_{n0}) + \Delta + o_p(1) \xrightarrow{d} N\big(\Delta,\ (F_0'V^{-1}F_0)^{-1}\big),$$
and the constrained GMM estimate $\beta_{cn}$ has
$$\Gamma_n(\beta_{cn} - \beta_0) \xrightarrow{d} N\big(G_0(G_0'F_0'V^{-1}F_0G_0)^{-1}G_0'F_0'V^{-1}F_0\Delta,\ G_0(G_0'F_0'V^{-1}F_0G_0)^{-1}G_0'\big).$$
Under the additional Assumption GL,
$$D_{1n}(\delta_n - \delta_0) = -S_0(G_0'F_0'V^{-1}F_0G_0)^{-1}G_0'F_0'V^{-1}\big[\Lambda_n f_n(\beta_{n0}) - F_0\Delta\big] + o_P(1) \xrightarrow{d} N\big(S_0(G_0'F_0'V^{-1}F_0G_0)^{-1}G_0'F_0'V^{-1}F_0\Delta,\ S_0(G_0'F_0'V^{-1}F_0G_0)^{-1}S_0'\big).$$

For the unconstrained MLE, the limiting variance is $\Sigma = \Omega^{-1}$, and the limiting variance of the unconstrained GMME is $\Sigma = (F_0'V^{-1}F_0)^{-1}$, under both the null and local alternative hypotheses. The next section shall first explore the MD test. Subsequently, we can show that the classical test statistics can be asymptotically equivalent to the MD test under both the null hypothesis and the local alternative hypothesis. It is interesting to point out that, for all those statistics, $\frac{\partial f_n'(\beta)}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n\frac{\partial f_n(\beta)}{\partial\beta'}$ of the GMM approach plays a role similar to that of $\big(-\frac{\partial^2\ln L_n(\beta)}{\partial\beta\partial\beta'}\big)$ of the ML approach.

5 The Minimum Distance Test

Suppose that an unconstrained estimator of $\beta_0$ is $\beta_n$ with
$$\Gamma_n(\beta_n - \beta_0) \xrightarrow{d} N(0, \Sigma), \qquad (14)$$
where the limiting variance matrix $\Sigma$ is positive definite. If $\Gamma_n$ is a diagonal matrix, its diagonal elements consist of the various rates of convergence of the components of the estimate $\beta_n$.

With the unconstrained estimate $\beta_n$, a constrained estimator under the constraints $R(\beta) = 0$ can be derived by minimizing a weighted distance subject to the constraints:
$$\min_\beta\ [\Gamma_n(\beta_n - \beta)]'\Sigma_n^{-1}[\Gamma_n(\beta_n - \beta)] \quad \text{s.t.}\ R(\beta) = 0, \qquad (15)$$
where $\Sigma_n$ is a consistent estimate of $\Sigma$. Equivalently, in terms of the constraints in the form $\beta = g(\delta)$, the MD estimation is
$$\min_\delta\ [\Gamma_n(\beta_n - g(\delta))]'\Sigma_n^{-1}[\Gamma_n(\beta_n - g(\delta))]. \qquad (16)$$
In the ML estimation under our setting, a consistent estimate of $\Sigma^{-1} = \Omega$ is $\big(-\Gamma_n'^{-1}\frac{\partial^2\ln L_n(\beta_n)}{\partial\beta\partial\beta'}\Gamma_n^{-1}\big)$. A version of the MD estimation in (15) can thus be based on the distance
$$(\beta_n - \beta)'\Big(-\frac{\partial^2\ln L_n(\beta_n)}{\partial\beta\partial\beta'}\Big)(\beta_n - \beta).$$

For the GMM estimation under our setting, the unconstrained estimator $\beta_n$ has the limiting distribution $\Gamma_n(\beta_n - \beta_0) \xrightarrow{d} N(0, (F_0'V^{-1}F_0)^{-1})$. The MD approach can be $\min_\beta [\Gamma_n(\beta_n - \beta)]'F_n'V_n^{-1}F_n[\Gamma_n(\beta_n - \beta)]$, where $F_n = F_n(\beta_n, \cdots, \beta_n)$ is a consistent estimate of $F_0$. Alternatively, the distance function can be
$$(\beta_n - \beta)'\Big(\frac{\partial f_n'(\beta_n)}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n\frac{\partial f_n(\beta_n)}{\partial\beta'}\Big)(\beta_n - \beta).$$
These two formulations are identical because, under the situation in Assumption GMM-D2, $\frac{\partial f_n'(\beta_n)}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n\frac{\partial f_n(\beta_n)}{\partial\beta'} = \Gamma_n'F_n'(\beta_n)V_n^{-1}F_n(\beta_n)\Gamma_n$.

Let $\beta_{cm,n}$ be the constrained MDE of $\beta_0$ under $R(\beta_0) = 0$, and let $\delta_m$ be the MDE of $\delta_0$. From (15) and (16), $\beta_{cm,n} = g(\delta_m)$ when the same weighting matrix $\Sigma_n^{-1}$ is used in both (15) and (16). The consistency of $\beta_{cm,n}$ can be established with the arguments in Moon and Schorfheide (2002) and Lee (2004b). We would like to show that the minimized distance function in (15) can be a useful test statistic for the constraints.

Proposition 5.1 Suppose that Assumption R or Assumption G is satisfied. Under the null hypothesis $H_0$, when $\Gamma_n(\beta_n - \beta_0) \xrightarrow{d} N(0,\Sigma)$, the MDE $\beta_{cm,n}$ has the asymptotic distribution
$$\Gamma_n(\beta_{cm,n} - \beta_0) = [I_p - \Sigma A_0'(A_0\Sigma A_0')^{-1}A_0]\,\Gamma_n(\beta_n - \beta_0) + o_p(1) = G_0(G_0'\Sigma^{-1}G_0)^{-1}G_0'\Sigma^{-1}\,\Gamma_n(\beta_n - \beta_0) + o_p(1) \xrightarrow{d} N(0, \Sigma_{cm}),$$
where $\Sigma_{cm} = \Sigma - \Sigma A_0'(A_0\Sigma A_0')^{-1}A_0\Sigma = G_0(G_0'\Sigma^{-1}G_0)^{-1}G_0'$. In addition, under Assumption GL,
$$D_{1n}(\delta_{cm,n} - \delta_0) = S_0(G_0'\Sigma^{-1}G_0)^{-1}G_0'\Sigma^{-1}\Gamma_n(\beta_n - \beta_0) + o_p(1) \xrightarrow{d} N\big(0,\ S_0(G_0'\Sigma^{-1}G_0)^{-1}S_0'\big).$$

Under the sequence of local alternatives $H_1$ and Assumption CT, when $\Gamma_n(\beta_n - \beta_0) \xrightarrow{d} N(\Delta, \Sigma)$,
$$\Gamma_n(\beta_{cm,n} - \beta_0) \xrightarrow{d} N(\mu, \Sigma_{cm}),$$
where $\mu = G_0(G_0'\Sigma^{-1}G_0)^{-1}G_0'\Sigma^{-1}\Delta = (I_p - \Sigma A_0'(A_0\Sigma A_0')^{-1}A_0)\Delta$. Furthermore, under Assumption GL,
$$D_{1n}(\delta_{cm,n} - \delta_0) \xrightarrow{d} N\big(S_0(G_0'\Sigma^{-1}G_0)^{-1}G_0'\Sigma^{-1}\Delta,\ S_0(G_0'\Sigma^{-1}G_0)^{-1}S_0'\big).$$

The MDE $\beta_{cm,n}$ is asymptotically efficient relative to the unconstrained estimate $\beta_n$ because $\Sigma_{cm}$ is smaller than $\Sigma$ by the generalized Schwartz inequality. Assumption G plays a crucial role in the asymptotic distribution of $\delta_{cm,n}$ and, hence, in the asymptotic distribution of the constrained estimator $\beta_{cm,n}$. Appendix C provides an illustrative example of the asymptotic properties of the MDE when Assumption G is not satisfied. That example illustrates that the possible rates of convergence would be rather complicated and the asymptotic distributions might not be normal.

Proposition 5.2 Suppose that Assumption R or Assumption G is satisfied. Then
$$[\Gamma_n(\beta_n - \beta_{cm,n})]'\Sigma_n^{-1}[\Gamma_n(\beta_n - \beta_{cm,n})] = [\Gamma_n(\beta_n - \beta_0)]'A_0'(A_0\Sigma A_0')^{-1}A_0[\Gamma_n(\beta_n - \beta_0)] + o_p(1). \qquad (17)$$
Under the null hypothesis $H_0$, when $\Gamma_n(\beta_n - \beta_0) \xrightarrow{d} N(0,\Sigma)$,
$$[\Gamma_n(\beta_n - \beta_{cm,n})]'\Sigma_n^{-1}[\Gamma_n(\beta_n - \beta_{cm,n})] \xrightarrow{d} \chi^2_{(p-q)},$$
where $\chi^2_{(p-q)}$ is the (central) $\chi^2$ random variable with $(p-q)$ degrees of freedom. Under the local alternative hypothesis $H_1$ and Assumption CT, when $\Gamma_n(\beta_n - \beta_0) \xrightarrow{d} N(\Delta, \Sigma)$,
$$[\Gamma_n(\beta_n - \beta_{cm,n})]'\Sigma_n^{-1}[\Gamma_n(\beta_n - \beta_{cm,n})] \xrightarrow{d} \chi^2_{(p-q)}(\eta), \qquad (18)$$
which is a noncentral $\chi^2$ random variable with $(p-q)$ degrees of freedom and noncentrality parameter $\eta$, where
$$\eta = \Delta'A_0'(A_0\Sigma A_0')^{-1}A_0\Delta = \Delta'\big(\Sigma^{-1} - \Sigma^{-1}G_0(G_0'\Sigma^{-1}G_0)^{-1}G_0'\Sigma^{-1}\big)\Delta. \qquad (19)$$

The MD test provides the reference with which all the classical asymptotic tests in subsequent sections can be compared. The asymptotic equivalence of those classical asymptotic tests under both $H_0$ and $H_1$ can then be demonstrated.

6 The Wald Test

The Wald test can be constructed with $R(\beta_n)$, where $\beta_n$ is either the MLE or the GMME. By the mean value theorem and Assumption R,
$$R(\beta_n) = \frac{\partial R(B_n^*)}{\partial\beta'}(\beta_n - \beta_0) = \frac{\partial R(B_n^*)}{\partial\beta'}\Gamma_n^{-1}\cdot\Gamma_n(\beta_n - \beta_0) = C_n(B_n^*)A_n(B_n^*)\cdot\Gamma_n(\beta_n - \beta_0).$$
As $\Gamma_n(\beta_n - \beta_0) \xrightarrow{d} N(0,\Sigma)$ under $H_0$, it follows that
$$C_n^{-1}(B_n^*)R(\beta_n) = A_0\Gamma_n(\beta_n - \beta_0) + o_p(1) \xrightarrow{d} N(0, A_0\Sigma A_0').$$
The matrix $C_n^{-1}(B_n^*)$ apparently represents the rates matrix of $R(\beta_n)$. The Wald test statistic $W_n$ in its general form can be
$$W_n = R'(\beta_n)\Big[\frac{\partial R(\beta_n)}{\partial\beta'}\Gamma_n^{-1}\Sigma_n\Gamma_n'^{-1}\frac{\partial R'(\beta_n)}{\partial\beta}\Big]^{-1}R(\beta_n). \qquad (20)$$

For the MLE, the limiting variance is $\Sigma = \Omega^{-1}$. Under the setting in Assumption ML-D, $\Gamma_n'^{-1}\big(-\frac{\partial^2\ln L_n(\beta_n)}{\partial\beta\partial\beta'}\big)\Gamma_n^{-1}$ estimates $\Omega$. An alternative form of the Wald test with the MLE $\beta_n$ is the following familiar one:
$$R'(\beta_n)\Big[\frac{\partial R(\beta_n)}{\partial\beta'}\Big(-\frac{\partial^2\ln L_n(\beta_n)}{\partial\beta\partial\beta'}\Big)^{-1}\frac{\partial R'(\beta_n)}{\partial\beta}\Big]^{-1}R(\beta_n).$$

In the GMM framework, the limiting variance of the GMME $\beta_n$ is $\Sigma = (F_0'V^{-1}F_0)^{-1}$. The Wald test statistic can be
$$R'(\beta_n)\Big[\frac{\partial R(\beta_n)}{\partial\beta'}\Big(\frac{\partial f_n'(\beta_n)}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n\frac{\partial f_n(\beta_n)}{\partial\beta'}\Big)^{-1}\frac{\partial R'(\beta_n)}{\partial\beta}\Big]^{-1}R(\beta_n).$$
This follows because
$$R'(\beta_n)\Big[\frac{\partial R(\beta_n)}{\partial\beta'}\Gamma_n^{-1}\big(F_n'(\beta_n)V_n^{-1}F_n(\beta_n)\big)^{-1}\Gamma_n'^{-1}\frac{\partial R'(\beta_n)}{\partial\beta}\Big]^{-1}R(\beta_n) = R'(\beta_n)\Big[\frac{\partial R(\beta_n)}{\partial\beta'}\big(\Gamma_n'F_n'(\beta_n)V_n^{-1}F_n(\beta_n)\Gamma_n\big)^{-1}\frac{\partial R'(\beta_n)}{\partial\beta}\Big]^{-1}R(\beta_n) = R'(\beta_n)\Big[\frac{\partial R(\beta_n)}{\partial\beta'}\Big(\frac{\partial f_n'(\beta_n)}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n\frac{\partial f_n(\beta_n)}{\partial\beta'}\Big)^{-1}\frac{\partial R'(\beta_n)}{\partial\beta}\Big]^{-1}R(\beta_n)$$
under Assumption GMM-D2.
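A minimal numerical sketch of the Wald statistic in the form of (20) (a hypothetical linear regression of my own, not from the paper; with a scalar rate the $\Gamma_n$ factors in (20) cancel into the estimated variance, and the nonlinear restriction $R(\beta) = \beta_1\beta_2 - 1 = 0$ is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
X = rng.normal(size=(n, 2))
beta0 = np.array([1.0, 1.0])                 # satisfies R(beta) = b1*b2 - 1 = 0
y = X @ beta0 + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
V_b = (np.sum((y - X @ b) ** 2) / n) * np.linalg.inv(X.T @ X)   # estimated Var(b)

# Nonlinear restriction and its Jacobian dR/dbeta' evaluated at the estimate
Rb = np.array([b[0] * b[1] - 1.0])
J = np.array([[b[1], b[0]]])

# Wald statistic in the spirit of (20); asymptotically chi2(1) under H0
W = float(Rb @ np.linalg.solve(J @ V_b @ J.T, Rb))
print(round(W, 4))
```

The Jacobian `J` plays the role of $\partial R(\beta_n)/\partial\beta'$ in Assumption R; with various rates one would instead carry the diagonal $\Gamma_n$ explicitly as in (20).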


Proposition 6.1 Suppose that the setting under Assumption R holds. The Wald test statistic for testing the hypothesis $R(\beta) = 0$ satisfies, under the null hypothesis $H_0$,
$$W_n = [\Gamma_n(\beta_n - \beta_0)]'A_0'(A_0\Sigma A_0')^{-1}A_0[\Gamma_n(\beta_n - \beta_0)] + o_p(1) \xrightarrow{d} \chi^2_{(p-q)}. \qquad (21)$$
Under the local alternative hypothesis $H_1$ and Assumption CT, $W_n \xrightarrow{d} \chi^2_{(p-q)}(\eta)$, where the noncentrality parameter $\eta$ is given in (19). Under both $H_0$ and $H_1$, the Wald test statistic is asymptotically equivalent to the MD test statistic.

7 The Likelihood Ratio Type Tests

7.1 The Maximum Likelihood Ratio Test

The following proposition gives the asymptotic distribution of the LR statistic.

Proposition 7.1 Suppose that Assumption G holds, together with Assumptions ML-C and ML-D under $H_0$, and Assumptions ML-C′ and ML-D′ under $H_1$. Then, under $H_0$,
$$2[\ln L_n(\beta_n) - \ln L_n(\beta_{cn})] \xrightarrow{d} \chi^2_{(p-q)},$$
and, under the local alternative hypothesis $H_1$,
$$2[\ln L_n(\beta_n) - \ln L_n(\beta_{cn})] \xrightarrow{d} \chi^2_{(p-q)}(\eta),$$
where the noncentrality parameter $\eta$ is given in (19).

From (32) in the proof and (17), we can see that the LR statistic is asymptotically equivalent to the MD test statistic under both the null hypothesis $H_0$ and the local alternative hypothesis $H_1$.
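For a concrete case of the LR statistic (a hypothetical Gaussian regression example of my own, not from the paper: the design, the constraint $\beta_1 = \beta_2$, and the concentrated-variance likelihood are all assumptions), the statistic reduces to $n\log(\mathrm{SSR}_c/\mathrm{SSR}_u)$:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
X = rng.normal(size=(n, 2))
y = X @ np.array([1.0, 1.0]) + rng.normal(size=n)   # H0: b1 = b2 holds in truth

# Unconstrained OLS (= Gaussian MLE for the slopes)
b = np.linalg.solve(X.T @ X, X.T @ y)
ssr_u = float(np.sum((y - X @ b) ** 2))

# Constrained MLE under b1 = b2: regress y on the single regressor x1 + x2
z = X.sum(axis=1)
bc = float(z @ y) / float(z @ z)
ssr_c = float(np.sum((y - bc * z) ** 2))

# Gaussian LR statistic with the error variance concentrated out:
# 2[ln Ln(bn) - ln Ln(bcn)] = n * log(ssr_c / ssr_u); chi2(1) under H0
LR = float(n * np.log(ssr_c / ssr_u))
print(round(LR, 4))
```

Since the constrained parameter space is a subset of the unconstrained one, $\mathrm{SSR}_c \ge \mathrm{SSR}_u$ and the statistic is nonnegative by construction.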

7.2 Difference Test

In the GMM framework, the difference test is analogous to the LR test in the likelihood framework. It is based on the difference of the minimized objective functions with and without the constraints:
$$D = f_n'(g(\delta_n))\Lambda_n'V_n^{-1}\Lambda_n f_n(g(\delta_n)) - f_n'(\beta_n)\Lambda_n'V_n^{-1}\Lambda_n f_n(\beta_n),$$
where $V_n$ is a consistent estimate of $V$.

Proposition 7.2 Suppose that Assumption G holds, together with Assumptions GMM-C, GMM-D1 and GMM-D2 under $H_0$, and Assumptions GMM-C′, GMM-D1′, GMM-D2, and CT under $H_1$. Then the difference test $D$ is asymptotically equivalent to the MD test under both the null and local alternative hypotheses. Under $H_0$, it is asymptotically $\chi^2_{(p-q)}$, and under $H_1$ it is asymptotically $\chi^2_{(p-q)}(\eta)$.

8 The Score Type Tests

8.1 The LM (Efficient Score) Test and Neyman’s C(α) Test

The LM statistic is
$$\frac{\partial\ln L_n(\beta_{cn})}{\partial\beta'}\Big(-\frac{\partial^2\ln L_n(\beta_{cn})}{\partial\beta\partial\beta'}\Big)^{-1}\frac{\partial\ln L_n(\beta_{cn})}{\partial\beta}.$$
The following proposition provides the asymptotic distribution of the LM statistic under the null hypothesis.

Proposition 8.1 Suppose that Assumption G holds, together with Assumptions ML-C and ML-D under $H_0$, and Assumptions ML-C′ and ML-D′ under $H_1$. The LM statistic
$$\frac{\partial\ln L_n(\beta_{cn})}{\partial\beta'}\Big(-\frac{\partial^2\ln L_n(\beta_{cn})}{\partial\beta\partial\beta'}\Big)^{-1}\frac{\partial\ln L_n(\beta_{cn})}{\partial\beta}$$
is asymptotically equivalent to the LR test statistic under both $H_0$ and the local alternative $H_1$.
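A minimal sketch of the LM statistic in a hypothetical Gaussian regression (all design choices here — the regressors, the constraint $\beta_1 = \beta_2$, and the plug-in variance — are illustrative assumptions, not the paper's setting); note that the score and the information matrix are evaluated at the restricted estimate only:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000
X = rng.normal(size=(n, 2))
y = X @ np.array([1.0, 1.0]) + rng.normal(size=n)   # H0: b1 = b2 holds in truth

# Restricted MLE under b1 = b2 (regress y on x1 + x2) and restricted residuals
z = X.sum(axis=1)
bc = float(z @ y) / float(z @ z)
e = y - bc * z
s2 = float(e @ e) / n

# Score of the Gaussian log likelihood and the estimated information,
# both evaluated at the restricted estimate
score = X.T @ e / s2
info = X.T @ X / s2

# LM statistic: score' info^{-1} score; chi2(1) under H0
LM = float(score @ np.linalg.solve(info, score))
print(round(LM, 4))
```

No unconstrained estimation is needed, which is the practical appeal of the score form relative to the Wald and LR statistics.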

The LM statistic is evaluated at the restricted MLE. Neyman (1959) generalizes the efficient score test to a test which is invariant to restricted consistent estimates, namely, the C(α) test. The C(α) statistic may have a computational advantage relative to the score test when the restricted MLE is difficult to compute but appropriate consistent estimates are available. Neyman's original C(α) statistic is formulated for the case where the restrictions impose known values on a subset of the parameters. Smith (1987) and Dagenais and Dufour (1991) discuss the general version in terms of general explicit constraints $R(\beta) = 0$. With the identity in Lemma 3.1, one can formulate the C(α) statistic in terms of the explicit constraints $\beta = g(\delta)$. In its generalized form for testing the explicit constraints, the C(α) test is
$$C_\alpha = \frac{\partial\ln L_n(\beta_{cn})}{\partial\beta'}\Big(-\frac{\partial^2\ln L_n(\beta_{cn})}{\partial\beta\partial\beta'}\Big)^{-1}\frac{\partial\ln L_n(\beta_{cn})}{\partial\beta} - \frac{\partial\ln L_n(\beta_{cn})}{\partial\delta'}\Big(-\frac{\partial g'(\delta_n)}{\partial\delta}\frac{\partial^2\ln L_n(\beta_{cn})}{\partial\beta\partial\beta'}\frac{\partial g(\delta_n)}{\partial\delta'}\Big)^{-1}\frac{\partial\ln L_n(\beta_{cn})}{\partial\delta},$$
where $\beta_{cn} = g(\delta_n)$ and $\delta_n$ is $D_n$-consistent, where $D_n = D_n(\Delta)$ in Assumption G is assumed to be independent of $\Delta$. The following proposition shows that $C_\alpha$ is asymptotically equivalent to the minimum distance statistic under both $H_0$ and $H_1$.

Proposition 8.2 Suppose that Assumptions ML-C and ML-D hold under $H_0$, and Assumptions ML-C′ and ML-D′ hold under $H_1$. Furthermore, Assumption G holds with $D_n = D_n(\Delta)$, which does not depend on the parameters $\delta$. Then the $C_\alpha$ test is asymptotically equivalent to the minimum distance test under both the null and local alternative hypotheses. Under $H_0$, it is asymptotically $\chi^2_{(p-q)}$, and under $H_1$ it is asymptotically $\chi^2_{(p-q)}(\eta)$.

8.2 Gradient Test and C(α)-type Gradient Test

The derivative of the GMM objective function in (3) evaluated at the restricted estimator $\beta_{cn}$ is two times $\frac{\partial f_n'(\beta_{cn})}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n f_n(\beta_{cn})$. This suggests using the inverse of $\frac{\partial f_n'(\beta_{cn})}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n\frac{\partial f_n(\beta_{cn})}{\partial\beta'}$ as the weighting matrix. Thus, the gradient test statistic can be formulated as
$$G = f_n'(\beta_{cn})\Lambda_n'V_n^{-1}\Lambda_n\frac{\partial f_n(\beta_{cn})}{\partial\beta'}\Big[\frac{\partial f_n'(\beta_{cn})}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n\frac{\partial f_n(\beta_{cn})}{\partial\beta'}\Big]^{-1}\frac{\partial f_n'(\beta_{cn})}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n f_n(\beta_{cn}).$$

Proposition 8.3 Suppose that Assumption G holds, together with Assumptions GMM-C, GMM-D1 and GMM-D2 under $H_0$, and Assumptions GMM-C′, GMM-D1′, GMM-D2, and CT under $H_1$. Then the gradient test $G$ is asymptotically equivalent to the difference test $D$ under both the null and local alternative hypotheses. Under $H_0$, it is asymptotically $\chi^2_{(p-q)}$, and under $H_1$ it is asymptotically $\chi^2_{(p-q)}(\eta)$.

In the likelihood framework, a C(α) statistic is invariant to consistent estimates and generalizes the score test statistic. In the GMM framework, a C(α)-type gradient test can also be formulated (Lee, 2005). With various rates of convergence, this statistic shall be
$$C = f_n'(\beta_{cn})\Lambda_n'V_n^{-1}\Lambda_n\frac{\partial f_n(\beta_{cn})}{\partial\beta'}\Big[\frac{\partial f_n'(\beta_{cn})}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n\frac{\partial f_n(\beta_{cn})}{\partial\beta'}\Big]^{-1}\frac{\partial f_n'(\beta_{cn})}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n f_n(\beta_{cn}) - f_n'(\beta_{cn})\Lambda_n'V_n^{-1}\Lambda_n\frac{\partial f_n(\beta_{cn})}{\partial\delta'}\Big[\frac{\partial f_n'(\beta_{cn})}{\partial\delta}\Lambda_n'V_n^{-1}\Lambda_n\frac{\partial f_n(\beta_{cn})}{\partial\delta'}\Big]^{-1}\frac{\partial f_n'(\beta_{cn})}{\partial\delta}\Lambda_n'V_n^{-1}\Lambda_n f_n(\beta_{cn}),$$
where $\beta_{cn} = g(\delta_n)$ and $\delta_n$ is $D_n$-consistent, where $D_n = D_n(\Delta)$ in Assumption G is assumed to be independent of $\Delta$.

Proposition 8.4 Suppose that Assumptions GMM-C, GMM-D1 and GMM-D2 hold under $H_0$, and Assumptions GMM-C′, GMM-D1′, GMM-D2, and CT hold under $H_1$. Furthermore, Assumption G holds with $D_n = D_n(\Delta)$, which does not depend on the parameters $\delta$. Then the C(α)-type gradient test $C$ is asymptotically equivalent to the minimum distance test under both the null and local alternative hypotheses. Under $H_0$, it is asymptotically $\chi^2_{(p-q)}$, and under $H_1$ it is asymptotically $\chi^2_{(p-q)}(\eta)$.

9 Conclusion

This paper has considered the classical asymptotic test statistics, namely, the likelihood ratio, efficient score, Neyman's C(α), and Wald-type statistics, for the testing of general (linear or nonlinear) equality constraints on parameters in a model where the MLE's of various parameters may have different rates of convergence. We have established a set of general sufficient conditions such that these test statistics are asymptotically $\chi^2$. Indeed, we show that, under these sufficient conditions, these classical test statistics are all asymptotically equivalent under both the null hypothesis and a sequence of local alternative hypotheses. These test statistics are also asymptotically equivalent to a properly defined MD test statistic (Lee, 2004b).

In addition to the test statistics in the likelihood framework, we have extended the analogous difference test, gradient test and Wald test to the GMM estimation framework (Newey and West, 1987), where the GMM estimates of various parameters in the model may have different rates of convergence. An additional C(α)-type gradient statistic is also considered. Under a set of sufficient conditions, these test statistics are shown to be asymptotically equivalent under both the null hypothesis and a sequence of local alternative hypotheses.

A Appendix: Proofs

Proof of Proposition 2.1 It follows from (1) that
$$\Gamma_n(\beta_n - \beta_0) = -\Big(\Gamma_n'^{-1}\frac{\partial^2\ln L_n(B_n^*)}{\partial\beta\partial\beta'}\Gamma_n^{-1}\Big)^{-1}\Gamma_n'^{-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta} = \Omega^{-1}\Gamma_n'^{-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta} + o_P(1) \xrightarrow{d} N(0, \Sigma). \qquad (22)$$
Q.E.D.

Proof of Proposition 2.2 The first order condition of the GMM estimation is $\frac{\partial f_n'(\beta_n)}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n f_n(\beta_n) = 0$. By the mean value theorem,
$$f_n(\beta_n) = f_n(\beta_0) + \frac{\partial f_n(B_n^*)}{\partial\beta'}(\beta_n - \beta_0).$$
It follows from the first order condition that
$$\beta_n - \beta_0 = -\Big(\frac{\partial f_n'(\beta_n)}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n\frac{\partial f_n(B_n^*)}{\partial\beta'}\Big)^{-1}\frac{\partial f_n'(\beta_n)}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n f_n(\beta_0) = -[\Gamma_n'F_n'(\beta_n)V_n^{-1}F_n(B_n^*)\Gamma_n]^{-1}\Gamma_n'F_n'(\beta_n)V_n^{-1}\Lambda_n f_n(\beta_0) = -\Gamma_n^{-1}[F_n'(\beta_n)V_n^{-1}F_n(B_n^*)]^{-1}F_n'(\beta_n)V_n^{-1}\Lambda_n f_n(\beta_0).$$
Therefore, the asymptotic distribution of $\beta_n$ follows from
$$\Gamma_n(\beta_n - \beta_0) = -(F_0'V^{-1}F_0)^{-1}F_0'V^{-1}\Lambda_n f_n(\beta_0) + o_P(1) \xrightarrow{d} N(0, \Sigma).$$
Q.E.D.

Proof of Lemma 3.1 Because $R(\beta) = 0$ when $\beta = g(\delta)$, $R(g(\delta)) = 0$ for all $\delta$. It follows that, by Assumptions R and G,
$$\frac{\partial R(\beta)}{\partial\beta'}\frac{\partial g(\delta)}{\partial\delta'} = 0 \ \Leftrightarrow\ \frac{\partial R(\beta)}{\partial\beta'}\Gamma_n^{-1}\Gamma_n\frac{\partial g(\delta)}{\partial\delta'} = 0 \ \Leftrightarrow\ C_n(\beta)A_n(\beta)G_n(\delta)D_n(\delta) = 0 \ \Leftrightarrow\ A_n(\beta)G_n(\delta) = 0,$$
for $\beta = g(\delta)$. In the limit, as $n \to \infty$, $A_0G_0 = 0$.

Because the columns of $\Sigma^{\frac12}A_0'$ are perpendicular to those of $\Sigma^{-\frac12}G_0$, and $A_0$ has full row rank and $G_0$ has full column rank, the columns of $(\Sigma^{\frac12}A_0', \Sigma^{-\frac12}G_0)$ span the full $p$-dimensional Euclidean space $R^p$. Therefore, any $y \in R^p$ can be written as $y = y_1 + y_2$, where $y_1$ lies in the space spanned by the columns of $\Sigma^{\frac12}A_0'$ and $y_2$ lies in the space spanned by the columns of $\Sigma^{-\frac12}G_0$. As $\Sigma^{\frac12}A_0'(A_0\Sigma A_0')^{-1}A_0\Sigma^{\frac12}y = y_1$ and
$$\big(I_p - \Sigma^{-\frac12}G_0(G_0'\Sigma^{-1}G_0)^{-1}G_0'\Sigma^{-\frac12}\big)y = \big(I_p - \Sigma^{-\frac12}G_0(G_0'\Sigma^{-1}G_0)^{-1}G_0'\Sigma^{-\frac12}\big)y_1 = y_1,$$
because $G_0'\Sigma^{-\frac12}y_1 = 0$, the two mappings are identical. Q.E.D.

Proof of Proposition 3.1 The linear expansion of $\frac{\partial\ln L_n(g(\delta_n))}{\partial\beta}$ at $\delta_0$ may be done in two steps. By the mean value theorem, in the first step,
$$\frac{\partial\ln L_n(g(\delta_n))}{\partial\beta} = \frac{\partial\ln L_n(\beta_0)}{\partial\beta} + \frac{\partial^2\ln L_n(B_n^*)}{\partial\beta\partial\beta'}(g(\delta_n) - \beta_0),$$
and, in the second step,
$$g(\delta_n) = g(\delta_0) + \frac{\partial g(\Delta_n^*)}{\partial\delta'}(\delta_n - \delta_0).$$
Together, one has
$$\frac{\partial\ln L_n(g(\delta_n))}{\partial\beta} = \frac{\partial\ln L_n(g(\delta_0))}{\partial\beta} + \frac{\partial^2\ln L_n(B_n^*)}{\partial\beta\partial\beta'}\frac{\partial g(\Delta_n^*)}{\partial\delta'}(\delta_n - \delta_0).$$
Because $\frac{\partial\ln L_{cn}(\delta)}{\partial\delta} = \frac{\partial g'(\delta)}{\partial\delta}\frac{\partial\ln L_n(\beta)}{\partial\beta}$ and $\frac{\partial\ln L_{cn}(\delta_n)}{\partial\delta} = 0$, it follows that $\frac{\partial g'(\delta_n)}{\partial\delta}\frac{\partial\ln L_n(g(\delta_n))}{\partial\beta} = 0$. Therefore,
$$\delta_n - \delta_0 = -\Big[\frac{\partial g'(\delta_n)}{\partial\delta}\frac{\partial^2\ln L_n(B_n^*)}{\partial\beta\partial\beta'}\frac{\partial g(\Delta_n^*)}{\partial\delta'}\Big]^{-1}\frac{\partial g'(\delta_n)}{\partial\delta}\frac{\partial\ln L_n(\beta_0)}{\partial\beta} = -\Big[D_n'(\delta_n)G_n'(\delta_n)\Gamma_n'^{-1}\frac{\partial^2\ln L_n(B_n^*)}{\partial\beta\partial\beta'}\Gamma_n^{-1}G_n(\Delta_n^*)D_n(\Delta_n^*)\Big]^{-1}D_n'(\delta_n)G_n'(\delta_n)\Gamma_n'^{-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta} = -D_n^{-1}(\Delta_n^*)\Big[G_n'(\delta_n)\Gamma_n'^{-1}\frac{\partial^2\ln L_n(B_n^*)}{\partial\beta\partial\beta'}\Gamma_n^{-1}G_n(\Delta_n^*)\Big]^{-1}G_n'(\delta_n)\Gamma_n'^{-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta}, \qquad (23)$$
under the setting in Assumption G. As $\delta_n \xrightarrow{p} \delta_0$ implies $\delta_{j,n}^* \xrightarrow{p} \delta_0$ for $j = 1,\cdots,p$, it follows that
$$D_n(\Delta_n^*)(\delta_n - \delta_0) = \Big[G_0'\Gamma_n'^{-1}\Big(-\frac{\partial^2\ln L_n(\beta_0)}{\partial\beta\partial\beta'}\Big)\Gamma_n^{-1}G_0\Big]^{-1}G_0'\Gamma_n'^{-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta} + o_P(1) = (G_0'\Omega G_0)^{-1}G_0'\Gamma_n'^{-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta} + o_P(1).$$
Under Assumption GL, it follows that
$$D_{1n}(\delta_n - \delta_0) = S_0\Big[G_0'\Gamma_n'^{-1}\Big(-\frac{\partial^2\ln L_n(\beta_0)}{\partial\beta\partial\beta'}\Big)\Gamma_n^{-1}G_0\Big]^{-1}G_0'\Gamma_n'^{-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta} + o_P(1) = S_0(G_0'\Omega G_0)^{-1}G_0'\Gamma_n'^{-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta} + o_P(1).$$
Q.E.D.

Proof of Proposition 3.2 The constrained estimator of $\beta_0$ is $\beta_{cn} = g(\delta_n)$. By the mean value theorem,
$$\beta_{cn} - \beta_0 = g(\delta_n) - g(\delta_0) = \frac{\partial g(\Delta_n^*)}{\partial\delta'}D_n^{-1}(\Delta_n^*)\cdot D_n(\Delta_n^*)(\delta_n - \delta_0) = \Gamma_n^{-1}G_n(\Delta_n^*)\Big\{\Big[G_0'\Gamma_n'^{-1}\Big(-\frac{\partial^2\ln L_n(\beta_0)}{\partial\beta\partial\beta'}\Big)\Gamma_n^{-1}G_0\Big]^{-1}G_0'\Gamma_n'^{-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta} + o_P(1)\Big\},$$
because $\Delta_n^*$ is the same one as in the preceding Proposition 3.1. Hence,
$$\Gamma_n(\beta_{cn} - \beta_0) = G_0[G_0'\Omega G_0]^{-1}G_0'\Gamma_n'^{-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta} + o_P(1). \qquad (24)$$
As both the constrained and unconstrained MLE's have the same rates matrix $\Gamma_n$, their efficiency can be compared through their limiting variance matrices. The generalized Schwartz inequality implies that $\Omega^{-1} \ge G_0(G_0'\Omega G_0)^{-1}G_0'$. Hence the constrained MLE $\beta_{cn}$ is asymptotically efficient relative to the unconstrained MLE $\beta_n$. Q.E.D.

Proof of Proposition 3.3 The first order condition for the constrained model is
$$\frac{\partial g'(\delta_n)}{\partial\delta}\frac{\partial f_n'(g(\delta_n))}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n f_n(g(\delta_n)) = 0. \qquad (25)$$
The linearization of $f_n(g(\delta_n))$ at $\delta_0$ is best performed in two steps. First, linearize at $\beta_0$:
$$f_n(g(\delta_n)) = f_n(\beta_0) + \frac{\partial f_n(B_n^*)}{\partial\beta'}(g(\delta_n) - \beta_0).$$
In the second step, linearize $g(\delta_n)$ at $\delta_0$ as $g(\delta_n) = g(\delta_0) + \frac{\partial g(\Delta_n^*)}{\partial\delta'}(\delta_n - \delta_0)$. Combining these, because $\beta_0 = g(\delta_0)$, one has
$$f_n(g(\delta_n)) = f_n(\beta_0) + \frac{\partial f_n(B_n^*)}{\partial\beta'}\frac{\partial g(\Delta_n^*)}{\partial\delta'}(\delta_n - \delta_0). \qquad (26)$$
By substituting the expansion (26) into (25), it follows that
$$\delta_n - \delta_0 = -\Big[\frac{\partial g'(\delta_n)}{\partial\delta}\frac{\partial f_n'(g(\delta_n))}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n\frac{\partial f_n(B_n^*)}{\partial\beta'}\frac{\partial g(\Delta_n^*)}{\partial\delta'}\Big]^{-1}\frac{\partial g'(\delta_n)}{\partial\delta}\frac{\partial f_n'(g(\delta_n))}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n f_n(\beta_0) = -\Big[\frac{\partial g'(\delta_n)}{\partial\delta}\Gamma_n'F_n'(g(\delta_n))V_n^{-1}F_n(B_n^*)\Gamma_n\frac{\partial g(\Delta_n^*)}{\partial\delta'}\Big]^{-1}\frac{\partial g'(\delta_n)}{\partial\delta}\Gamma_n'F_n'(g(\delta_n))V_n^{-1}\Lambda_n f_n(\beta_0) = -D_n^{-1}(\Delta_n^*)\big[G_n'(\delta_n)F_n'(g(\delta_n))V_n^{-1}F_n(B_n^*)G_n(\Delta_n^*)\big]^{-1}G_n'(\delta_n)F_n'(g(\delta_n))V_n^{-1}\Lambda_n f_n(\beta_0),$$
which implies, in turn, that
$$D_n(\Delta_n^*)(\delta_n - \delta_0) = -(G_0'F_0'V^{-1}F_0G_0)^{-1}G_0'F_0'V^{-1}\Lambda_n f_n(\beta_0) + o_P(1). \qquad (27)$$
Under Assumption GL,
$$D_{1n}(\delta_n - \delta_0) = -S_0(G_0'F_0'V^{-1}F_0G_0)^{-1}G_0'F_0'V^{-1}\Lambda_n f_n(\beta_0) + o_P(1). \qquad (28)$$
Therefore, $D_{1n}(\delta_n - \delta_0) \xrightarrow{d} N\big(0, S_0(G_0'F_0'V^{-1}F_0G_0)^{-1}S_0'\big)$.

By the delta method, the limiting distribution of the constrained GMM estimator $\beta_{cn}$ follows from
$$\Gamma_n(\beta_{cn} - \beta_0) = \Gamma_n(g(\delta_n) - g(\delta_0)) = \Gamma_n\frac{\partial g(\Delta_n^*)}{\partial\delta'}(\delta_n - \delta_0) = G_0D_n(\Delta_n^*)(\delta_n - \delta_0) + o_P(1) \xrightarrow{d} N\big(0, G_0(G_0'F_0'V^{-1}F_0G_0)^{-1}G_0'\big).$$
By the generalized Schwartz inequality, $(F_0'V^{-1}F_0)^{-1} \ge G_0(G_0'F_0'V^{-1}F_0G_0)^{-1}G_0'$, and, hence, $\beta_{cn}$ is asymptotically efficient relative to $\beta_n$. Q.E.D.

Proof of Proposition 4.1 Under Assumption ML-D, by the mean value theorem,
$$\ln L_n(\beta_{n0}) - \ln L_n(\beta_0) = \frac{\partial\ln L_n(\beta_0)}{\partial\beta'}(\beta_{n0} - \beta_0) + \frac12(\beta_{n0} - \beta_0)'\frac{\partial^2\ln L_n(\bar\beta_n)}{\partial\beta\partial\beta'}(\beta_{n0} - \beta_0) = \Big(\Gamma_n'^{-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta}\Big)'\Gamma_n(\beta_{n0} - \beta_0) + \frac12[\Gamma_n(\beta_{n0} - \beta_0)]'\cdot\Gamma_n'^{-1}\frac{\partial^2\ln L_n(\bar\beta_n)}{\partial\beta\partial\beta'}\Gamma_n^{-1}\cdot\Gamma_n(\beta_{n0} - \beta_0) = \Big(\Gamma_n'^{-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta}\Big)'\Delta + \frac12\Delta'\Gamma_n'^{-1}\frac{\partial^2\ln L_n(\bar\beta_n)}{\partial\beta\partial\beta'}\Gamma_n^{-1}\Delta \xrightarrow{d} N\Big(-\frac12\Delta'\Omega\Delta,\ \Delta'\Omega\Delta\Big)$$
under $H_0$, where $\bar\beta_n$ lies between $\beta_0$ and $\beta_{n0}$. The result follows from Le Cam's first lemma. Q.E.D.

Proof of Proposition 4.2 The mean value theorem implies that
$$\Gamma_n'^{-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta} = \Gamma_n'^{-1}\frac{\partial\ln L_n(\beta_{n0} - \Gamma_n^{-1}\Delta)}{\partial\beta} = \Gamma_n'^{-1}\frac{\partial\ln L_n(\beta_{n0})}{\partial\beta} - \Gamma_n'^{-1}\frac{\partial^2\ln L_n(B_n^*)}{\partial\beta\partial\beta'}\Gamma_n^{-1}\Delta = \Gamma_n'^{-1}\frac{\partial\ln L_n(\beta_{n0})}{\partial\beta} + \Omega\Delta + o_p(1) \xrightarrow{d} N(\Omega\Delta, \Omega).$$
From (22) in the proof of Proposition 2.1, the difference between $\Gamma_n(\beta_n - \beta_0)$ and $\Omega^{-1}\Gamma_n'^{-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta}$ converges in probability to zero under $H_0$. By contiguity in Proposition 4.1, this difference also converges to zero under the sequence of local alternatives in (13). Hence, under $H_1$,
$$\Gamma_n(\beta_n - \beta_0) = \Omega^{-1}\Gamma_n'^{-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta} + o_p(1) = \Omega^{-1}\Big(\Gamma_n'^{-1}\frac{\partial\ln L_n(\beta_{n0})}{\partial\beta} + \Omega\Delta\Big) + o_p(1) = \Omega^{-1}\Gamma_n'^{-1}\frac{\partial\ln L_n(\beta_{n0})}{\partial\beta} + \Delta + o_p(1) \xrightarrow{d} N(\Delta, \Omega^{-1}),$$
under Assumption ML-D′.

Similarly, under $H_1$, from Proposition 3.1 and by contiguity,
$$D_{1n}(\delta_n - \delta_0) = S_0(G_0'\Omega G_0)^{-1}G_0'\Gamma_n'^{-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta} + o_p(1) = S_0(G_0'\Omega G_0)^{-1}G_0'\Big[\Gamma_n'^{-1}\frac{\partial\ln L_n(\beta_{n0})}{\partial\beta} + \Omega\Delta\Big] + o_p(1) \xrightarrow{d} N\big(S_0(G_0'\Omega G_0)^{-1}G_0'\Omega\Delta,\ S_0(G_0'\Omega G_0)^{-1}S_0'\big),$$
and, from Proposition 3.2,
$$\Gamma_n(\beta_{cn} - \beta_0) = G_0(G_0'\Omega G_0)^{-1}G_0'\Gamma_n'^{-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta} + o_p(1) = G_0(G_0'\Omega G_0)^{-1}G_0'\Big(\Gamma_n'^{-1}\frac{\partial\ln L_n(\beta_{n0})}{\partial\beta} + \Omega\Delta\Big) + o_p(1) \xrightarrow{d} N\big(G_0(G_0'\Omega G_0)^{-1}G_0'\Omega\Delta,\ G_0(G_0'\Omega G_0)^{-1}G_0'\big).$$
Q.E.D.

Proof of Proposition 4.3 By the mean value theorem and Assumptions GMM-D1′ and GMM-D2,
$$\Lambda_n f_n(\beta_0) = \Lambda_n f_n(\beta_{n0} - \Gamma_n^{-1}\Delta) = \Lambda_n f_n(\beta_{n0}) - \Lambda_n\frac{\partial f_n(B_n^*)}{\partial\beta'}\Gamma_n^{-1}\Delta = \Lambda_n f_n(\beta_{n0}) - F_n(B_n^*)\Delta = \Lambda_n f_n(\beta_{n0}) - F_0\Delta + o_p(1).$$
From Proposition 2.2 and by contiguity,
$$\Gamma_n(\beta_n - \beta_0) = -(F_0'V^{-1}F_0)^{-1}F_0'V^{-1}\Lambda_n f_n(\beta_0) + o_p(1) = -(F_0'V^{-1}F_0)^{-1}F_0'V^{-1}\big(\Lambda_n f_n(\beta_{n0}) - F_0\Delta\big) + o_p(1) = -(F_0'V^{-1}F_0)^{-1}F_0'V^{-1}\Lambda_n f_n(\beta_{n0}) + \Delta + o_p(1) \xrightarrow{d} N\big(\Delta, (F_0'V^{-1}F_0)^{-1}\big)$$
under $H_1$. The result for the unconstrained estimate follows.

For the constrained estimators, the asymptotic expansion (27) in the proof of Proposition 3.3 remains valid under $H_1$, i.e.,
$$D_n(\Delta_n^*)(\delta_n - \delta_0) = -(G_0'F_0'V^{-1}F_0G_0)^{-1}G_0'F_0'V^{-1}\Lambda_n f_n(\beta_0) + o_P(1),$$
and
$$\Gamma_n(\beta_{cn} - \beta_0) = G_0D_n(\Delta_n^*)(\delta_n - \delta_0) + o_p(1),$$
under the sequence of local alternatives in (13). Similar arguments then give the results for the constrained estimators under $H_1$. Q.E.D.

Proof of Proposition 5.1 First, we shall derive the asymptotic distribution of the MDE $\beta_{cm,n}$ under $H_0$. The Lagrangian function of (15) is
$$L(\beta, \lambda) = \frac12[\Gamma_n(\beta_n - \beta)]'\Sigma_n^{-1}[\Gamma_n(\beta_n - \beta)] + \lambda'R(\beta),$$
where $\lambda$ is the $(p-q)$-dimensional vector of Lagrangian multipliers. The first order conditions are
$$-\Gamma_n'\Sigma_n^{-1}\Gamma_n(\beta_n - \beta_{cm,n}) + \frac{\partial R'(\beta_{cm,n})}{\partial\beta}\lambda_{cm,n} = 0, \qquad R(\beta_{cm,n}) = 0. \qquad (29)$$
By the mean value theorem, $R(\beta_{cm,n}) = \frac{\partial R(B_n^*)}{\partial\beta'}(\beta_{cm,n} - \beta_0)$ because $R(\beta_0) = 0$. It follows that the constrained estimates satisfy the equations
$$\begin{pmatrix}\Gamma_n'\Sigma_n^{-1}\Gamma_n & \frac{\partial R'(\beta_{cm,n})}{\partial\beta} \\ \frac{\partial R(B_n^*)}{\partial\beta'} & 0\end{pmatrix}\begin{pmatrix}\beta_{cm,n} - \beta_0 \\ \lambda_{cm,n}\end{pmatrix} = \begin{pmatrix}\Gamma_n'\Sigma_n^{-1}\Gamma_n(\beta_n - \beta_0) \\ 0\end{pmatrix},$$
which implies that
$$\beta_{cm,n} - \beta_0 = P_n\Gamma_n'\Sigma_n^{-1}\Gamma_n(\beta_n - \beta_0),$$
where
$$P_n = (\Gamma_n'\Sigma_n^{-1}\Gamma_n)^{-1} - (\Gamma_n'\Sigma_n^{-1}\Gamma_n)^{-1}\frac{\partial R'(\beta_{cm,n})}{\partial\beta}\Big[\frac{\partial R(B_n^*)}{\partial\beta'}(\Gamma_n'\Sigma_n^{-1}\Gamma_n)^{-1}\frac{\partial R'(\beta_{cm,n})}{\partial\beta}\Big]^{-1}\frac{\partial R(B_n^*)}{\partial\beta'}(\Gamma_n'\Sigma_n^{-1}\Gamma_n)^{-1}. \qquad (30)$$
Under Assumption R,
$$P_n\Gamma_n' = \Gamma_n^{-1}\big\{\Sigma_n - \Sigma_nA_n'(\beta_{cm,n})[A_n(B_n^*)\Sigma_nA_n'(\beta_{cm,n})]^{-1}A_n(B_n^*)\Sigma_n\big\}.$$
Therefore,
$$\Gamma_n(\beta_{cm,n} - \beta_0) = [I_p - \Sigma A_0'(A_0\Sigma A_0')^{-1}A_0]\Gamma_n(\beta_n - \beta_0) + o_P(1) \xrightarrow{d} N\big(0,\ \Sigma - \Sigma A_0'(A_0\Sigma A_0')^{-1}A_0\Sigma\big).$$

Let $\delta_n$ be the MDE of $\delta_0$ derived from
$$\min_\delta\ [\Gamma_n(\beta_n - g(\delta))]'\Sigma_n^{-1}[\Gamma_n(\beta_n - g(\delta))].$$
The corresponding MDE of $\beta_0$ is $\beta_{cm,n} = g(\delta_n)$. By the mean value theorem, there exists $\Delta_n^*$ such that
$$\beta_{cm,n} = g(\delta_n) = g(\delta_0) + \frac{\partial g(\Delta_n^*)}{\partial\delta'}(\delta_n - \delta_0).$$
From the first order condition $\frac{\partial g'(\delta_n)}{\partial\delta}\Gamma_n'\Sigma_n^{-1}\Gamma_n(\beta_n - g(\delta_n)) = 0$, it follows that
$$\frac{\partial g'(\delta_n)}{\partial\delta}\Gamma_n'\Sigma_n^{-1}\Gamma_n\Big[\beta_n - g(\delta_0) - \frac{\partial g(\Delta_n^*)}{\partial\delta'}(\delta_n - \delta_0)\Big] = 0.$$
Under Assumption G, this implies under $H_0$ that
$$\delta_n - \delta_0 = \Big[\frac{\partial g'(\delta_n)}{\partial\delta}\Gamma_n'\Sigma_n^{-1}\Gamma_n\frac{\partial g(\Delta_n^*)}{\partial\delta'}\Big]^{-1}\frac{\partial g'(\delta_n)}{\partial\delta}\Gamma_n'\Sigma_n^{-1}\Gamma_n(\beta_n - \beta_0) = [D_n'(\delta_n)G_n'(\delta_n)\Sigma_n^{-1}G_n(\Delta_n^*)D_n(\Delta_n^*)]^{-1}D_n'(\delta_n)G_n'(\delta_n)\Sigma_n^{-1}\Gamma_n(\beta_n - \beta_0).$$
Therefore,
$$D_n(\Delta_n^*)(\delta_n - \delta_0) = (G_0'\Sigma^{-1}G_0)^{-1}G_0'\Sigma^{-1}\Gamma_n(\beta_n - \beta_0) + o_P(1) \xrightarrow{d} N\big(0, (G_0'\Sigma^{-1}G_0)^{-1}\big),$$
under $H_0$. Under the situation in Assumption GL, it follows that
$$D_{1n}(\delta_n - \delta_0) = S_0(G_0'\Sigma^{-1}G_0)^{-1}G_0'\Sigma^{-1}\Gamma_n(\beta_n - \beta_0) + o_P(1) \xrightarrow{d} N\big(0, S_0(G_0'\Sigma^{-1}G_0)^{-1}S_0'\big).$$
Furthermore,
$$\Gamma_n(\beta_{cm,n} - \beta_0) = \Gamma_n\frac{\partial g(\Delta_n^*)}{\partial\delta'}(\delta_n - \delta_0) = G_0D_n(\Delta_n^*)(\delta_n - \delta_0) + o_p(1).$$
Hence,
$$\Gamma_n(\beta_{cm,n} - \beta_0) = G_0(G_0'\Sigma^{-1}G_0)^{-1}G_0'\Sigma^{-1}\Gamma_n(\beta_n - \beta_0) + o_p(1) \xrightarrow{d} N\big(0, G_0(G_0'\Sigma^{-1}G_0)^{-1}G_0'\big),$$
under $H_0$. Under $H_1$, the results follow by contiguity and the property $\Gamma_n(\beta_n - \beta_0) \xrightarrow{d} N(\Delta, \Sigma)$.

Note that the identity in Lemma 3.1,
$$\Sigma^{\frac12}A_0'(A_0\Sigma A_0')^{-1}A_0\Sigma^{\frac12} = I_p - \Sigma^{-\frac12}G_0(G_0'\Sigma^{-1}G_0)^{-1}G_0'\Sigma^{-\frac12},$$
implies that
$$I_p - \Sigma A_0'(A_0\Sigma A_0')^{-1}A_0 = G_0(G_0'\Sigma^{-1}G_0)^{-1}G_0'\Sigma^{-1}$$
and, by transposition,
$$I_p - A_0'(A_0\Sigma A_0')^{-1}A_0\Sigma = \Sigma^{-1}G_0(G_0'\Sigma^{-1}G_0)^{-1}G_0'.$$
These justify the common values of $\Sigma_{cm}$ and $\mu$. Q.E.D.

Proof of Proposition 5.2 From Proposition 5.1,
$$\Gamma_n(\beta_n - \beta_{cm,n}) = \Sigma A_0'(A_0\Sigma A_0')^{-1}A_0\Gamma_n(\beta_n - \beta_0) + o_p(1)$$
under both $H_0$ and $H_1$. Hence, the minimized distance is
$$[\Gamma_n(\beta_n - \beta_{cm,n})]'\Sigma_n^{-1}[\Gamma_n(\beta_n - \beta_{cm,n})] = [\Gamma_n(\beta_n - \beta_0)]'A_0'(A_0\Sigma A_0')^{-1}A_0[\Gamma_n(\beta_n - \beta_0)] + o_P(1) = u_n'\Sigma^{1/2}A_0'(A_0\Sigma A_0')^{-1}A_0\Sigma^{1/2}u_n + o_P(1),$$
where $u_n = \Sigma^{-1/2}\Gamma_n(\beta_n - \beta_0)$, under both $H_0$ and $H_1$. Under $H_0$, $u_n \xrightarrow{d} N(0, I_p)$; therefore
$$[\Gamma_n(\beta_n - \beta_{cm,n})]'\Sigma_n^{-1}[\Gamma_n(\beta_n - \beta_{cm,n})] \xrightarrow{d} \chi^2_{(p-q)}.$$
On the other hand, under $H_1$, $u_n \xrightarrow{d} N(\Sigma^{-\frac12}\Delta, I_p)$ and
$$[\Gamma_n(\beta_n - \beta_{cm,n})]'\Sigma_n^{-1}[\Gamma_n(\beta_n - \beta_{cm,n})] \xrightarrow{d} \chi^2_{(p-q)}\big(\Delta'A_0'(A_0\Sigma A_0')^{-1}A_0\Delta\big).$$
Q.E.D.

Proof of Proposition 6.1 By the mean value theorem,
$$R(\beta_n) = \frac{\partial R(B_n^*)}{\partial\beta'}(\beta_n - \beta_0) = \frac{\partial R(B_n^*)}{\partial\beta'}\Gamma_n^{-1}\cdot\Gamma_n(\beta_n - \beta_0) = C_n(B_n^*)A_n(B_n^*)\cdot\Gamma_n(\beta_n - \beta_0), \qquad (31)$$
under the constraints $R(\beta_0) = 0$. It follows that
$$C_n^{-1}(B_n^*)R(\beta_n) = A_n(B_n^*)\Gamma_n(\beta_n - \beta_0) = A_0\Gamma_n(\beta_n - \beta_0) + o_P(1).$$
Therefore, the Wald test statistic satisfies
$$W_n = \big(C_n^{-1}(B_n^*)R(\beta_n)\big)'(A_0\Sigma A_0')^{-1}\big(C_n^{-1}(B_n^*)R(\beta_n)\big) = [\Gamma_n(\beta_n - \beta_0)]'A_0'(A_0\Sigma A_0')^{-1}A_0\Gamma_n(\beta_n - \beta_0) + o_P(1) = [\Gamma_n(\beta_n - \beta_{cm,n})]'\Sigma^{-1}[\Gamma_n(\beta_n - \beta_{cm,n})] + o_P(1)$$
by (17), which is asymptotically equivalent to the MD test under both the null and local alternative hypotheses by contiguity. Q.E.D.

Proof of Proposition 7.1 By the expansion of $\ln L_n(\beta_{cn})$ at $\beta_n$,
$$2[\ln L_n(\beta_n) - \ln L_n(\beta_{cn})] = -(\beta_n - \beta_{cn})'\frac{\partial^2\ln L_n(\beta_n^*)}{\partial\beta\partial\beta'}(\beta_n - \beta_{cn}) = -[\Gamma_n(\beta_n - \beta_{cn})]'\Gamma_n'^{-1}\frac{\partial^2\ln L_n(\beta_n^*)}{\partial\beta\partial\beta'}\Gamma_n^{-1}[\Gamma_n(\beta_n - \beta_{cn})] = [\Gamma_n(\beta_n - \beta_{cn})]'\Omega[\Gamma_n(\beta_n - \beta_{cn})] + o_p(1), \qquad (32)$$
where $\beta_n^*$ lies between $\beta_n$ and $\beta_{cn}$. The results (2) and (24) imply that
$$\Gamma_n(\beta_n - \beta_{cn}) = [\Omega^{-1} - G_0(G_0'\Omega G_0)^{-1}G_0']\Gamma_n'^{-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta} + o_p(1),$$
and, hence, (32) can be written as
$$2[\ln L_n(\beta_n) - \ln L_n(\beta_{cn})] = \frac{\partial\ln L_n(\beta_0)}{\partial\beta'}\Gamma_n^{-1}[\Omega^{-1} - G_0(G_0'\Omega G_0)^{-1}G_0']\Omega[\Omega^{-1} - G_0(G_0'\Omega G_0)^{-1}G_0']\Gamma_n'^{-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta} + o_p(1) = u_n'[I_p - \Omega^{\frac12}G_0(G_0'\Omega G_0)^{-1}G_0'\Omega^{\frac12}]u_n + o_P(1) \xrightarrow{d} \chi^2_{(p-q)},$$
where $u_n = \Omega^{-1/2}\Gamma_n'^{-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta} = \Omega^{\frac12}\Gamma_n(\beta_n - \beta_0) + o_p(1) \xrightarrow{d} N(0, I_p)$ under $H_0$.

Under $H_1$, Assumptions ML-C′ and ML-D′ imply that
$$\Gamma_n'^{-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta} = \Gamma_n'^{-1}\frac{\partial\ln L_n(\beta_{n0})}{\partial\beta} + \Gamma_n'^{-1}\frac{\partial^2\ln L_n(B_n^*)}{\partial\beta\partial\beta'}(\beta_0 - \beta_{n0}) = \Gamma_n'^{-1}\frac{\partial\ln L_n(\beta_{n0})}{\partial\beta} - \Gamma_n'^{-1}\frac{\partial^2\ln L_n(B_n^*)}{\partial\beta\partial\beta'}\Gamma_n^{-1}\Delta = \Gamma_n'^{-1}\frac{\partial\ln L_n(\beta_{n0})}{\partial\beta} + \Omega\Delta + o_p(1).$$
It follows that
$$\Gamma_n(\beta_n - \beta_0) = \Omega^{-1}\Gamma_n'^{-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta} + o_p(1) = \Omega^{-1}\Gamma_n'^{-1}\frac{\partial\ln L_n(\beta_{n0})}{\partial\beta} + \Delta + o_p(1) \xrightarrow{d} N(\Delta, \Omega^{-1}).$$
Equation (24) implies that
$$\Gamma_n(\beta_{cn} - \beta_0) = G_0(G_0'\Omega G_0)^{-1}G_0'\Gamma_n'^{-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta} + o_p(1) = G_0(G_0'\Omega G_0)^{-1}G_0'\Gamma_n'^{-1}\frac{\partial\ln L_n(\beta_{n0})}{\partial\beta} + G_0(G_0'\Omega G_0)^{-1}G_0'\Omega\Delta + o_p(1).$$
Their difference gives
$$\Omega^{\frac12}\Gamma_n(\beta_n - \beta_{cn}) = [I_p - \Omega^{\frac12}G_0(G_0'\Omega G_0)^{-1}G_0'\Omega^{\frac12}](u_n + \Omega^{\frac12}\Delta) + o_p(1),$$
where $u_n = \Omega^{-\frac12}\Gamma_n'^{-1}\frac{\partial\ln L_n(\beta_{n0})}{\partial\beta} \xrightarrow{d} N(0, I_p)$ by Assumption ML-D′ under $H_1$. Therefore,
$$2[\ln L_n(\beta_n) - \ln L_n(\beta_{cn})] = [\Gamma_n(\beta_n - \beta_{cn})]'\Omega[\Gamma_n(\beta_n - \beta_{cn})] + o_p(1) = (u_n + \Omega^{\frac12}\Delta)'[I_p - \Omega^{\frac12}G_0(G_0'\Omega G_0)^{-1}G_0'\Omega^{\frac12}](u_n + \Omega^{\frac12}\Delta) + o_p(1) \xrightarrow{d} \chi^2_{(p-q)}\big[\Delta'(\Omega - \Omega G_0(G_0'\Omega G_0)^{-1}G_0'\Omega)\Delta\big].$$
Q.E.D.

Proof of Proposition 7.2 From Proposition 2.2, for the unconstrained GMM estimator $\beta_n$,
$$\Gamma_n(\beta_n - \beta_0) = -(F_0'V^{-1}F_0)^{-1}F_0'V^{-1}\Lambda_n f_n(\beta_0) + o_p(1),$$
under $H_0$. By contiguity, this also holds under $H_1$.

By expansion,
$$\Lambda_n f_n(\beta_n) = \Lambda_n f_n(\beta_0) + \Lambda_n\frac{\partial f_n(B_n^*)}{\partial\beta'}(\beta_n - \beta_0) = \Lambda_n f_n(\beta_0) + F_0\Gamma_n(\beta_n - \beta_0) + o_P(1) = [I_k - F_0\Sigma F_0'V^{-1}]\Lambda_n f_n(\beta_0) + o_p(1),$$
and
$$V^{-1/2}\Lambda_n f_n(\beta_n) = [I_k - V^{-\frac12}F_0\Sigma F_0'V^{-\frac12}]u_n + o_P(1),$$
where $\Sigma = (F_0'V^{-1}F_0)^{-1}$ and $u_n = V^{-\frac12}\Lambda_n f_n(\beta_0)$. It follows that
$$f_n'(\beta_n)\Lambda_n'V_n^{-1}\Lambda_n f_n(\beta_n) = u_n'[I_k - V^{-\frac12}F_0\Sigma F_0'V^{-\frac12}]u_n + o_P(1). \qquad (33)$$
For the constrained GMM estimate $\delta_n$,
$$\Lambda_n f_n(g(\delta_n)) = \Lambda_n f_n(\beta_0) + \Lambda_n\frac{\partial f_n(B_n^*)}{\partial\beta'}\frac{\partial g(\Delta_n^*)}{\partial\delta'}(\delta_n - \delta_0) = \Lambda_n f_n(\beta_0) + F_n(B_n^*)G_n(\Delta_n^*)D_n(\Delta_n^*)(\delta_n - \delta_0) = \Lambda_n f_n(\beta_0) + F_0G_0\cdot D_n(\Delta_n^*)(\delta_n - \delta_0) + o_P(1) = [I_k - F_0G_0(G_0'\Sigma^{-1}G_0)^{-1}G_0'F_0'V^{-1}]\Lambda_n f_n(\beta_0) + o_P(1),$$
and
$$V^{-\frac12}\Lambda_n f_n(g(\delta_n)) = [I_k - V^{-\frac12}F_0G_0(G_0'\Sigma^{-1}G_0)^{-1}G_0'F_0'V^{-\frac12}]u_n + o_P(1). \qquad (34)$$
It follows that
$$f_n'(g(\delta_n))\Lambda_n'V_n^{-1}\Lambda_n f_n(g(\delta_n)) = u_n'[I_k - V^{-\frac12}F_0G_0(G_0'\Sigma^{-1}G_0)^{-1}G_0'F_0'V^{-\frac12}]u_n + o_P(1). \qquad (35)$$
From these asymptotic expansions,
$$f_n'(g(\delta_n))\Lambda_n'V_n^{-1}\Lambda_n f_n(g(\delta_n)) - f_n'(\beta_n)\Lambda_n'V_n^{-1}\Lambda_n f_n(\beta_n) = u_n'[V^{-\frac12}F_0\Sigma F_0'V^{-\frac12} - V^{-\frac12}F_0G_0(G_0'\Sigma^{-1}G_0)^{-1}G_0'F_0'V^{-\frac12}]u_n + o_p(1) \qquad (36)$$
$$= [\Gamma_n(\beta_n - \beta_0)]'\big(\Sigma^{-1} - \Sigma^{-1}G_0(G_0'\Sigma^{-1}G_0)^{-1}G_0'\Sigma^{-1}\big)[\Gamma_n(\beta_n - \beta_0)] + o_p(1),$$
because $\Gamma_n(\beta_n - \beta_0) = -\Sigma F_0'V^{-\frac12}u_n + o_p(1)$ from Proposition 2.2. From this expression and (17) in Proposition 5.2, we conclude that
$$f_n'(g(\delta_n))\Lambda_n'V_n^{-1}\Lambda_n f_n(g(\delta_n)) - f_n'(\beta_n)\Lambda_n'V_n^{-1}\Lambda_n f_n(\beta_n) = [\Gamma_n(\beta_n - \beta_{cm,n})]'F_0'V^{-1}F_0[\Gamma_n(\beta_n - \beta_{cm,n})] + o_p(1),$$
where the latter is the MD test statistic, under both $H_0$ and $H_1$. Q.E.D.

Proof of Proposition 8.1 By the mean value theorem, $\frac{\partial\ln L_n(\beta_{cn})}{\partial\beta} = \frac{\partial^2\ln L_n(\beta_n^*)}{\partial\beta\partial\beta'}(\beta_{cn} - \beta_n)$, because the score vanishes at the unconstrained MLE $\beta_n$. Hence,
$$\frac{\partial\ln L_n(\beta_{cn})}{\partial\beta'}\Big(-\frac{\partial^2\ln L_n(\beta_{cn})}{\partial\beta\partial\beta'}\Big)^{-1}\frac{\partial\ln L_n(\beta_{cn})}{\partial\beta} = [\Gamma_n(\beta_{cn} - \beta_n)]'\Gamma_n'^{-1}\Big(-\frac{\partial^2\ln L_n(\beta_0)}{\partial\beta\partial\beta'}\Big)\Gamma_n^{-1}[\Gamma_n(\beta_{cn} - \beta_n)] + o_P(1) = 2[\ln L_n(\beta_n) - \ln L_n(\beta_{cn})] + o_P(1). \qquad (37)$$
From (37), the LM statistic is asymptotically equivalent to the LR test statistic in (32) under both $H_0$ and $H_1$. Q.E.D.

Proof of Proposition 8.2 Define the following two-step estimates
\[
\beta_n^* = \beta_{cn} - \Big(\frac{\partial^2\ln L_n(\beta_{cn})}{\partial\beta\partial\beta'}\Big)^{-1}\frac{\partial\ln L_n(\beta_{cn})}{\partial\beta},
\]
and
\[
\beta_{cn}^* = \beta_{cn} - \frac{\partial g(\delta_n)}{\partial\delta'}\Big(\frac{\partial g'(\delta_n)}{\partial\delta}\frac{\partial^2\ln L_n(\beta_{cn})}{\partial\beta\partial\beta'}\frac{\partial g(\delta_n)}{\partial\delta'}\Big)^{-1}\frac{\partial\ln L_n(\beta_{cn})}{\partial\delta}.
\]
First, it shall be shown that $C_\alpha$ can be rewritten in terms of the distance between $\beta_n^*$ and $\beta_{cn}^*$. Because $\frac{\partial\ln L_n(g(\delta))}{\partial\delta'} = \frac{\partial\ln L_n(g(\delta))}{\partial\beta'}\frac{\partial g(\delta)}{\partial\delta'}$, the difference of these two estimates is
\[
\beta_n^* - \beta_{cn}^* = \Big[\Big(-\frac{\partial^2\ln L_n(\beta_{cn})}{\partial\beta\partial\beta'}\Big)^{-1} - \frac{\partial g(\delta_n)}{\partial\delta'}\Big(-\frac{\partial g'(\delta_n)}{\partial\delta}\frac{\partial^2\ln L_n(\beta_{cn})}{\partial\beta\partial\beta'}\frac{\partial g(\delta_n)}{\partial\delta'}\Big)^{-1}\frac{\partial g'(\delta_n)}{\partial\delta}\Big]\frac{\partial\ln L_n(\beta_{cn})}{\partial\beta}.
\]


With this expression, it follows that
\[
(\beta_n^* - \beta_{cn}^*)'\Big(-\frac{\partial^2\ln L_n(\beta_{cn})}{\partial\beta\partial\beta'}\Big)(\beta_n^* - \beta_{cn}^*) = C_\alpha.
\]
Second, it shall be shown that $\beta_n^*$ and $\beta_{cn}^*$ are, respectively, asymptotically equivalent to the unconstrained and constrained MLEs $\beta_n$ and $\beta_{cn}$ under both $H_0$ and $H_1$. By the mean value theorem, $\frac{\partial\ln L_n(\beta_{cn})}{\partial\beta} = \frac{\partial\ln L_n(\beta_0)}{\partial\beta} + \frac{\partial^2\ln L_n(\bar\beta_n)}{\partial\beta\partial\beta'}(\beta_{cn} - \beta_0)$, where $\bar\beta_n$ lies between $\beta_{cn}$ and $\beta_0$, it follows that
\[
\begin{aligned}
\Gamma_n(\beta_n^* - \beta_0) &= -\Gamma_n\Big(\frac{\partial^2\ln L_n(\beta_{cn})}{\partial\beta\partial\beta'}\Big)^{-1}\Big[\frac{\partial\ln L_n(\beta_{cn})}{\partial\beta} - \frac{\partial^2\ln L_n(\beta_{cn})}{\partial\beta\partial\beta'}(\beta_{cn} - \beta_0)\Big]\\
&= -\Gamma_n\Big(\frac{\partial^2\ln L_n(\beta_{cn})}{\partial\beta\partial\beta'}\Big)^{-1}\Gamma_n'\cdot\Gamma_n'^{-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta}
- \Gamma_n\Big(\frac{\partial^2\ln L_n(\beta_{cn})}{\partial\beta\partial\beta'}\Big)^{-1}\Gamma_n'\cdot\Gamma_n'^{-1}\Big[\frac{\partial^2\ln L_n(\bar\beta_n)}{\partial\beta\partial\beta'} - \frac{\partial^2\ln L_n(\beta_{cn})}{\partial\beta\partial\beta'}\Big]\Gamma_n^{-1}\cdot\Gamma_n(\beta_{cn} - \beta_0)\\
&= -\Omega^{-1}\Gamma_n'^{-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta} + o_P(1)
\end{aligned}
\]
because $\Gamma_n(\beta_{cn} - \beta_0) = O_P(1)$. For $\beta_{cn}^*$, one has
\[
\begin{aligned}
\Gamma_n(\beta_{cn}^* - \beta_0) &= G_0(G_0'\Omega G_0)^{-1}G_0'\cdot\Gamma_n'^{-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta} + [I_p - G_0(G_0'\Omega G_0)^{-1}G_0'\Omega]\cdot\Gamma_n(\beta_{cn} - \beta_0) + o_P(1)\\
&= G_0(G_0'\Omega G_0)^{-1}G_0'\cdot\Gamma_n'^{-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta} + o_P(1),
\end{aligned}
\]
where the second term in the first equality goes to zero in probability because, by the mean value theorem,
\[
\Gamma_n(\beta_{cn} - \beta_0) = \Gamma_n\frac{\partial g(\bar\delta_n)}{\partial\delta'}(\delta_n - \delta_0) = G_n(\bar\delta_n)D_n(\delta_n - \delta_0) = G_0\cdot D_n(\delta_n - \delta_0) + o_P(1).
\]
From the proofs of Propositions 3.2 and 4.2, one concludes that $\Gamma_n(\beta_n^* - \beta_0) = \Gamma_n(\beta_n - \beta_0) + o_P(1)$ and $\Gamma_n(\beta_{cn}^* - \beta_0) = \Gamma_n(\beta_{cn} - \beta_0) + o_P(1)$ under $H_0$. By contiguity, the asymptotic equivalence holds also under $H_1$.

Therefore, $C_\alpha = [\Gamma_n(\beta_n - \beta_{cn})]'\Omega[\Gamma_n(\beta_n - \beta_{cn})] + o_P(1)$, i.e., $C_\alpha$ is asymptotically equivalent to the minimum distance test statistic under both $H_0$ and $H_1$. Q.E.D.
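The asymptotic equivalence between the one-Newton-step estimate $\beta_n^*$ and the MLE $\beta_n$ can be illustrated numerically. The following is a minimal sketch (not from the paper, all names illustrative) for a $N(\mu, 1)$ likelihood: there the log likelihood is exactly quadratic in $\mu$, so a single Newton step from any starting value lands exactly on the MLE $\bar{x}$; in general the two estimates differ only by a term that vanishes after $\Gamma_n$-normalization.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=500)

def score(mu):
    # d log L_n(mu) / d mu for the N(mu, 1) likelihood
    return np.sum(x - mu)

def hessian(mu):
    # d^2 log L_n(mu) / d mu^2 (constant in mu for this model)
    return -float(len(x))

mu_c = 0.0                                    # an illustrative constrained starting value
mu_star = mu_c - score(mu_c) / hessian(mu_c)  # one Newton step: the two-step estimate
mle = x.mean()                                # the unconstrained MLE

# For a quadratic log likelihood the one-step estimate equals the MLE exactly
print(mu_star, mle)
```

For non-quadratic likelihoods the equality is only asymptotic, which is the content of the equivalence argument in the proof above.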

Proof of Proposition 8.3

By Assumption GMM-D2,
\[
\begin{aligned}
&\Lambda_n\frac{\partial f_n(\beta_{cn})}{\partial\beta'}\Big[\frac{\partial f_n'(\beta_{cn})}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n\frac{\partial f_n(\beta_{cn})}{\partial\beta'}\Big]^{-1}\frac{\partial f_n'(\beta_{cn})}{\partial\beta}\Lambda_n'\\
&\quad= F_n(\beta_{cn})\Gamma_n[\Gamma_n'F_n'(\beta_{cn})V_n^{-1}F_n(\beta_{cn})\Gamma_n]^{-1}\Gamma_n'F_n'(\beta_{cn}) = F_0\Sigma F_0' + o_P(1),
\end{aligned}
\]


where $\Sigma = (F_0'V^{-1}F_0)^{-1}$.

By (34) in the proof of Proposition 7.2,
\[
V^{-1/2}\Lambda_n f_n(\beta_{cn}) = [I_k - V^{-1/2}F_0G_0(G_0'\Sigma^{-1}G_0)^{-1}G_0'F_0'V^{-1/2}]u_n + o_P(1).
\]
Therefore,
\[
\begin{aligned}
&f_n'(\beta_{cn})\Lambda_n'V_n^{-1}\Lambda_n\frac{\partial f_n(\beta_{cn})}{\partial\beta'}\Big[\frac{\partial f_n'(\beta_{cn})}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n\frac{\partial f_n(\beta_{cn})}{\partial\beta'}\Big]^{-1}\frac{\partial f_n'(\beta_{cn})}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n f_n(\beta_{cn})\\
&\quad= f_n'(\beta_{cn})\Lambda_n'V_n^{-1}F_0\Sigma F_0'V_n^{-1}\Lambda_n f_n(\beta_{cn}) + o_P(1)\\
&\quad= u_n'[I_k - V^{-1/2}F_0G_0(G_0'\Sigma^{-1}G_0)^{-1}G_0'F_0'V^{-1/2}]V^{-1/2}F_0\Sigma F_0'V^{-1/2}[I_k - V^{-1/2}F_0G_0(G_0'\Sigma^{-1}G_0)^{-1}G_0'F_0'V^{-1/2}]u_n + o_P(1)\\
&\quad= u_n'[V^{-1/2}F_0\Sigma F_0'V^{-1/2} - V^{-1/2}F_0G_0(G_0'\Sigma^{-1}G_0)^{-1}G_0'F_0'V^{-1/2}]u_n + o_P(1),
\end{aligned}
\]
where the latter is the difference test statistic in (36), under both $H_0$ and $H_1$. Q.E.D.
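The key algebraic step in the last display, that $(I_k - Q)P(I_k - Q) = P - Q$ for $P = V^{-1/2}F_0\Sigma F_0'V^{-1/2}$ and $Q = V^{-1/2}F_0G_0(G_0'\Sigma^{-1}G_0)^{-1}G_0'F_0'V^{-1/2}$ (which holds because $PQ = QP = Q$ and $Q^2 = Q$), can be verified numerically. A minimal sketch with randomly generated $F_0$, $G_0$, and $V$; the dimensions $k$, $p$, $q$ are illustrative choices, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(42)
k, p, q = 6, 4, 2                      # moments, parameters, constrained dims (q < p < k)

F0 = rng.standard_normal((k, p))
G0 = rng.standard_normal((p, q))
A = rng.standard_normal((k, k))
V = A @ A.T + k * np.eye(k)            # a symmetric positive definite V

Vinv = np.linalg.inv(V)
# V^{-1/2} via the symmetric eigendecomposition of V
w, U = np.linalg.eigh(V)
Vmh = U @ np.diag(w ** -0.5) @ U.T

Sigma = np.linalg.inv(F0.T @ Vinv @ F0)
P = Vmh @ F0 @ Sigma @ F0.T @ Vmh
Q = Vmh @ F0 @ G0 @ np.linalg.inv(G0.T @ np.linalg.inv(Sigma) @ G0) @ G0.T @ F0.T @ Vmh

I = np.eye(k)
# (I - Q) P (I - Q) = P - Q, since PQ = QP = Q and Q is idempotent
assert np.allclose((I - Q) @ P @ (I - Q), P - Q)
# P - Q is symmetric idempotent; its trace (= rank) is p - q
assert np.isclose(np.trace(P - Q), p - q)
```

The rank $p - q$ of $P - Q$ is what delivers the $\chi^2$ degrees of freedom of the difference test.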

Proof of Proposition 8.4 Define the following two-step estimates
\[
\beta_n^* = \beta_{cn} - \Big(\frac{\partial f_n'(\beta_{cn})}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n\frac{\partial f_n(\beta_{cn})}{\partial\beta'}\Big)^{-1}\frac{\partial f_n'(\beta_{cn})}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n f_n(\beta_{cn}),
\]
and
\[
\beta_{cn}^* = \beta_{cn} - \frac{\partial g(\delta_n)}{\partial\delta'}\Big(\frac{\partial f_n'(\beta_{cn})}{\partial\delta}\Lambda_n'V_n^{-1}\Lambda_n\frac{\partial f_n(\beta_{cn})}{\partial\delta'}\Big)^{-1}\frac{\partial f_n'(\beta_{cn})}{\partial\delta}\Lambda_n'V_n^{-1}\Lambda_n f_n(\beta_{cn}).
\]
First, it shall be shown that $C$ can be rewritten in terms of the distance between $\beta_n^*$ and $\beta_{cn}^*$. Because $\frac{\partial f_n(g(\delta))}{\partial\delta'} = \frac{\partial f_n(g(\delta))}{\partial\beta'}\frac{\partial g(\delta)}{\partial\delta'}$, the difference of these two estimates can be rewritten as
\[
\beta_{cn}^* - \beta_n^* = L_n'^{-1}M_nL_n^{-1}\frac{\partial f_n'(\beta_{cn})}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n f_n(\beta_{cn}),
\]
where $L_n$ is defined by the decomposition $\big(\frac{\partial f_n'(\beta_{cn})}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n\frac{\partial f_n(\beta_{cn})}{\partial\beta'}\big) = L_nL_n'$ and $M_n = I_p - L_n'\frac{\partial g(\delta_n)}{\partial\delta'}\big[\frac{\partial g'(\delta_n)}{\partial\delta}L_nL_n'\frac{\partial g(\delta_n)}{\partial\delta'}\big]^{-1}\frac{\partial g'(\delta_n)}{\partial\delta}L_n$. The $M_n$ is a symmetric and idempotent matrix. Therefore,
\[
\begin{aligned}
&(\beta_n^* - \beta_{cn}^*)'\frac{\partial f_n'(\beta_{cn})}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n\frac{\partial f_n(\beta_{cn})}{\partial\beta'}(\beta_n^* - \beta_{cn}^*)\\
&\quad= f_n'(\beta_{cn})\Lambda_n'V_n^{-1}\Lambda_n\frac{\partial f_n(\beta_{cn})}{\partial\beta'}L_n'^{-1}M_nL_n^{-1}\frac{\partial f_n'(\beta_{cn})}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n f_n(\beta_{cn}) = C.
\end{aligned}
\]

Second, it shall be shown that $\beta_n^*$ and $\beta_{cn}^*$ are, respectively, asymptotically equivalent to the unconstrained and constrained optimum GMM estimators $\beta_n$ and $\beta_{cn}$ under both $H_0$ and $H_1$. The definition of $\beta_n^*$ implies that
\[
\Gamma_n(\beta_n^* - \beta_0) = \Gamma_n\Big(\frac{\partial f_n'(\beta_{cn})}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n\frac{\partial f_n(\beta_{cn})}{\partial\beta'}\Big)^{-1}\frac{\partial f_n'(\beta_{cn})}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n\Big[\frac{\partial f_n(\beta_{cn})}{\partial\beta'}(\beta_{cn} - \beta_0) - f_n(\beta_{cn})\Big].
\]
By the mean value theorem, $f_n(\beta_{cn}) = f_n(\beta_0) + \frac{\partial f_n(\bar\beta_n)}{\partial\beta'}(\beta_{cn} - \beta_0)$, and because
\[
\Gamma_n\Big(\frac{\partial f_n'(\beta_{cn})}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n\frac{\partial f_n(\beta_{cn})}{\partial\beta'}\Big)^{-1}\frac{\partial f_n'(\beta_{cn})}{\partial\beta}\Lambda_n' = (F_0'V^{-1}F_0)^{-1}F_0' + o_P(1),
\]
it follows that
\[
\Gamma_n(\beta_n^* - \beta_0) = -(F_0'V^{-1}F_0)^{-1}F_0'V^{-1}\Lambda_n f_n(\beta_0) + o_P(1).
\]
For $\beta_{cn}^*$, one has
\[
\begin{aligned}
\Gamma_n(\beta_{cn}^* - \beta_0) &= -G_0(G_0'F_0'V^{-1}F_0G_0)^{-1}G_0'F_0'V^{-1}\Lambda_n f_n(\beta_0)\\
&\quad+ [I_p - G_0(G_0'F_0'V^{-1}F_0G_0)^{-1}G_0'F_0'V^{-1}F_0]\Gamma_n(\beta_{cn} - \beta_0)\\
&= -G_0(G_0'F_0'V^{-1}F_0G_0)^{-1}G_0'F_0'V^{-1}\Lambda_n f_n(\beta_0) + o_P(1),
\end{aligned}
\]
where the second term in the first equality goes to zero in probability because $\Gamma_n(\beta_{cn} - \beta_0) = G_0\cdot D_n(\delta_n - \delta_0) + o_P(1)$. From the proofs of Propositions 3.3 and 4.3, one concludes that $\Gamma_n(\beta_n^* - \beta_0) = \Gamma_n(\beta_n - \beta_0) + o_P(1)$ and $\Gamma_n(\beta_{cn}^* - \beta_0) = \Gamma_n(\beta_{cn} - \beta_0) + o_P(1)$ under $H_0$. By contiguity, the asymptotic equivalence holds also under $H_1$.

Therefore, $C = [\Gamma_n(\beta_n - \beta_{cn})]'F_0'V^{-1}F_0[\Gamma_n(\beta_n - \beta_{cn})] + o_P(1)$, i.e., $C$ is asymptotically equivalent to the minimum distance test statistic under both $H_0$ and $H_1$. Q.E.D.

B Appendix: GMM Overidentification Test

In this appendix, we demonstrate the possible construction of the overidentification test statistic in our framework. The overidentification test in Hansen (1982) is designed to test the validity of the moment conditions when their number exceeds that of the parameters in the GMM estimation of $\beta_0$.

With the (unconstrained) GMM estimate $\beta_n$, the minimized objective function is $f_n'(\beta_n)\Lambda_n'V_n^{-1}\Lambda_n f_n(\beta_n)$, which can be used as the overidentification test statistic. The following proposition provides the asymptotic distribution of this statistic. The number of unknown parameters in $\beta$ is $p$ and the number of moment equations is $k$, where $p < k$.

Proposition A.1 Suppose that Assumptions GMM-C, GMM-D1, and GMM-D2 hold. Then
\[
f_n'(\beta_n)\Lambda_n'V_n^{-1}\Lambda_n f_n(\beta_n) \xrightarrow{d} \chi^2(k-p).
\]

Proof. From (33) in the proof of Proposition 7.2,
\[
f_n'(\beta_n)\Lambda_n'V_n^{-1}\Lambda_n f_n(\beta_n) = u_n'[I_k - V^{-1/2}F_0(F_0'V^{-1}F_0)^{-1}F_0'V^{-1/2}]u_n + o_P(1),
\]
where $u_n = V^{-1/2}\Lambda_n f_n(\beta_0)$. Because $u_n \xrightarrow{d} N(0, I_k)$ under Assumption GMM-D1, and $I_k - V^{-1/2}F_0(F_0'V^{-1}F_0)^{-1}F_0'V^{-1/2}$ is symmetric and idempotent with rank $k - p$, it follows that $f_n'(\beta_n)\Lambda_n'V_n^{-1}\Lambda_n f_n(\beta_n) \xrightarrow{d} \chi^2(k-p)$. Q.E.D.
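As an illustration (not from the paper), the statistic can be computed for a simple over-identified toy model: $x_i$ i.i.d. with mean $\mu$ and unit variance gives the $k = 2$ moment conditions $E[x - \mu] = 0$ and $E[x^2 - \mu^2 - 1] = 0$ for the $p = 1$ parameter $\mu$. A hedged sketch of the two-step GMM objective and its minimized value; all function and variable names are illustrative:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
x = rng.normal(0.5, 1.0, size=2000)
n = len(x)

def moments(mu):
    # k = 2 moment conditions for the p = 1 parameter mu (overidentified)
    return np.column_stack([x - mu, x**2 - mu**2 - 1.0])

def objective(mu, W):
    # n * gbar' W gbar, the (normalized) GMM objective
    g = moments(mu).mean(axis=0)
    return n * g @ W @ g

# first step: identity weighting
mu1 = minimize_scalar(objective, args=(np.eye(2),), bounds=(-5, 5), method="bounded").x
# second step: efficient weighting by the inverse sample moment covariance
W = np.linalg.inv(np.cov(moments(mu1), rowvar=False))
res = minimize_scalar(objective, args=(W,), bounds=(-5, 5), method="bounded")
mu_hat, J = res.x, res.fun

# Under correct specification, J is asymptotically chi-square with k - p = 1 df
print(mu_hat, J)
```

Comparing `J` against a $\chi^2(1)$ critical value implements the test; a large `J` would signal that the extra moment condition is inconsistent with the data.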

C Appendix: An Illustrative Example Where Assumption G Does Not Hold

Consider the case where $\beta = (\beta_1, \beta_2)'$ and $\beta = g(\delta) = (\delta^2, \delta)'$. Suppose that the unconstrained estimate $\beta_n = (\beta_{n1}, \beta_{n2})'$ has the asymptotic property that $\Gamma_n(\beta_n - \beta_0) \xrightarrow{d} N(0, I_2)$, where $\Gamma_n = \mathrm{diag}(\gamma_{n1}, \gamma_{n2})$ and $\gamma_{n1}$ is a faster rate than $\gamma_{n2}$, i.e., $\gamma_{n2}/\gamma_{n1} \to 0$.

The derivative of $g(\delta)$ with respect to $\delta$ is $\frac{\partial g(\delta)}{\partial\delta} = (2\delta, 1)'$. As
\[
\Gamma_n\frac{\partial g(\delta)}{\partial\delta} = \begin{pmatrix} 2\delta \\ \gamma_{n2}/\gamma_{n1} \end{pmatrix}\gamma_{n1},
\]
$G_n(\delta) = (2\delta, \gamma_{n2}/\gamma_{n1})'$ and $D_n = \gamma_{n1}$. Assumption G will not be satisfied when the true $\delta_0$ is 0 in this example, because $G_n(\delta_0) = G_n(0) \to G_0 = (0, 0)'$, which does not have full column rank one.


Consider now the MD estimation of $\delta$. Because $\Sigma_n = I_2$, the MD estimation is $\min_\delta\,[\Gamma_n(\beta_n - g(\delta))]'[\Gamma_n(\beta_n - g(\delta))] = \min_\delta Q_n(\delta)$, where $Q_n(\delta) = [\gamma_{n1}(\beta_{n1} - \delta^2)]^2 + [\gamma_{n2}(\beta_{n2} - \delta)]^2$. Because $Q_n(\delta)$ is a polynomial of order four in $\delta$, one has the following exact expansion from the first-order condition of the MDE $\delta_{cm,n}$ at $\delta_0 = 0$:
\[
0 = \frac{\partial Q_n(\delta_{cm,n})}{\partial\delta} = \frac{\partial Q_n(0)}{\partial\delta} + \frac{\partial^2 Q_n(0)}{\partial\delta^2}\delta_{cm,n} + \frac{1}{2!}\frac{\partial^3 Q_n(0)}{\partial\delta^3}\delta_{cm,n}^2 + \frac{1}{3!}\frac{\partial^4 Q_n(0)}{\partial\delta^4}\delta_{cm,n}^3,
\]
where $\frac{\partial Q_n(0)}{\partial\delta} = -2\gamma_{n2}^2\beta_{n2}$, $\frac{\partial^2 Q_n(0)}{\partial\delta^2} = -4\gamma_{n1}^2\beta_{n1} + 2\gamma_{n2}^2$, $\frac{\partial^3 Q_n(0)}{\partial\delta^3} = 0$, and $\frac{\partial^4 Q_n(0)}{\partial\delta^4} = 24\gamma_{n1}^2$. Together, this implies the following relationship among $\beta_{n1}$, $\beta_{n2}$ and $\delta_{cm,n}$:
\[
\gamma_{n2}\beta_{n2} = \Big(\gamma_{n2} - 2\frac{\gamma_{n1}}{\gamma_{n2}}\gamma_{n1}\beta_{n1}\Big)\delta_{cm,n} + 2\frac{\gamma_{n1}^2}{\gamma_{n2}}\delta_{cm,n}^3. \tag{38}
\]
It turns out that the rate of convergence of $\delta_{cm,n}$ to $\delta_0 = 0$ will depend on how the ratio $\gamma_{n1}/\gamma_{n2}^2$ behaves, and its asymptotic distribution may or may not be normal.

Case (1). $\gamma_{n1}/\gamma_{n2}^2 \to 0$, i.e., the rate $\gamma_{n1}$ is faster than $\gamma_{n2}$ but slower than $\gamma_{n2}^2$: In this case, the preceding relation (38) can be rewritten as
\[
\gamma_{n2}\beta_{n2} = \Big(1 - 2\frac{\gamma_{n1}}{\gamma_{n2}^2}\gamma_{n1}\beta_{n1}\Big)\gamma_{n2}\delta_{cm,n} + 2\frac{\gamma_{n1}^2}{\gamma_{n2}^4}(\gamma_{n2}\delta_{cm,n})^3,
\]
so one has $\gamma_{n2}\delta_{cm,n} = \gamma_{n2}\beta_{n2} + o_P(1)$. Under this situation, $\delta_{cm,n}$ has the lower $\gamma_{n2}$-rate of convergence. As $\gamma_{n2}\beta_{n2} = \gamma_{n2}(\beta_{n2} - \beta_{02}) \xrightarrow{d} N(0, 1)$ under $\beta_{02} = 0$, $\delta_{cm,n}$ is asymptotically normally distributed, like $\beta_{n2}$. In this case, the information in $\beta_{n1}$ does not play a role even though $\beta_{n1}$ converges in probability to zero at the faster rate $\gamma_{n1}$.

Case (2). $\gamma_{n1}/\gamma_{n2}^2 \to c$, where $c \neq 0$ is a finite constant, i.e., $\gamma_{n1}$ is as fast as $\gamma_{n2}^2$: The limiting distribution $z$ of $\gamma_{n2}\delta_{cm,n}$ might not be normal and will be characterized by the polynomial equation with normal random coefficients
\[
v_2 - (1 - 2cv_1)z - 2c^2z^3 = 0,
\]
where $v_1$ and $v_2$ are two independent $N(0, 1)$ variables, because the limiting distributions of $\gamma_{n1}\beta_{n1}$ and $\gamma_{n2}\beta_{n2}$ are independently distributed $N(0, 1)$ variables.


Case (3). $\gamma_{n1}/\gamma_{n2}^2 \to \infty$, i.e., the rate $\gamma_{n1}$ is faster than the $\gamma_{n2}^2$-rate: For this case, (38) can be rewritten as
\[
\gamma_{n2}\beta_{n2} = \Big(\frac{\gamma_{n2}^2}{\gamma_{n1}} - 2\gamma_{n1}\beta_{n1}\Big)\Big(\frac{\gamma_{n1}}{\gamma_{n2}}\delta_{cm,n}\Big) + 2\frac{\gamma_{n2}^2}{\gamma_{n1}}\Big(\frac{\gamma_{n1}}{\gamma_{n2}}\delta_{cm,n}\Big)^3.
\]
As $\gamma_{n2}^2/\gamma_{n1} \to 0$, it follows that $\frac{\gamma_{n1}}{\gamma_{n2}}\delta_{cm,n} = -\frac{1}{2}\frac{\gamma_{n2}\beta_{n2}}{\gamma_{n1}\beta_{n1}} + o_P(1)$. Thus, in this case, $\delta_{cm,n}$ has the $\gamma_{n1}/\gamma_{n2}$-rate of convergence, which is faster than the $\gamma_{n2}$-rate but slower than the $\gamma_{n1}$-rate. The limiting distribution of $\frac{\gamma_{n1}}{\gamma_{n2}}\delta_{cm,n}$ is one half of the ratio of two independently distributed $N(0, 1)$ variables.
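The rate results above can be checked by simulation. Because $Q_n$ is quartic, the first-order condition is a cubic equation in $\delta$ that can be solved exactly. Below is a minimal Monte Carlo sketch (not from the paper) for Case (1), with the illustrative choices $\gamma_{n1} = n^{3/4}$ and $\gamma_{n2} = n^{1/2}$, so that $\gamma_{n1}/\gamma_{n2}^2 = n^{-1/4} \to 0$; the sample size, replication count, and tolerances are assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10**8
g1, g2 = n ** 0.75, n ** 0.5          # gamma_n1, gamma_n2; g1 / g2**2 = n**-0.25 -> 0

def mde(b1, b2):
    # exact minimizer of Q_n(d) = [g1*(b1 - d^2)]^2 + [g2*(b2 - d)]^2:
    # the FOC 4*g1^2*d^3 + (2*g2^2 - 4*g1^2*b1)*d - 2*g2^2*b2 = 0 is cubic in d
    roots = np.roots([4 * g1**2, 0.0, 2 * g2**2 - 4 * g1**2 * b1, -2 * g2**2 * b2])
    real = roots[np.abs(roots.imag) < 1e-8].real
    Q = (g1 * (b1 - real**2))**2 + (g2 * (b2 - real))**2
    return real[np.argmin(Q)]          # pick the real root minimizing Q_n

R = 300
v1 = rng.standard_normal(R)           # limit draw of gamma_n1 * beta_n1
v2 = rng.standard_normal(R)           # limit draw of gamma_n2 * beta_n2
z = np.empty(R)
for r in range(R):
    d = mde(v1[r] / g1, v2[r] / g2)   # beta_n = (v1/g1, v2/g2) around beta_0 = 0
    z[r] = g2 * d                     # normalized MDE

# In Case (1), g2 * delta_cm,n = g2 * beta_n2 + o_p(1), with an N(0, 1) limit
print(np.mean(np.abs(z - v2)), np.std(z))
```

Replacing the rate choices with $\gamma_{n1} = \gamma_{n2}^2$ or a still faster $\gamma_{n1}$ would reproduce the non-normal limits of Cases (2) and (3).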

References

[1] Amemiya, T. (1985), Advanced Econometrics. Harvard University Press, Cambridge, Massachusetts.

[2] Bickel, P.J., C.A.J. Klaassen, Y. Ritov, and J.A. Wellner (1993), Efficient and Adaptive Estimation for Semiparametric Models. Baltimore: Johns Hopkins University Press.

[3] Brock, W.A. and S.N. Durlauf (2001), "Interactions-based models". In Handbook of Econometrics, J.J. Heckman and E.E. Leamer (eds). North-Holland: Amsterdam, 3297-3380.

[4] Dagenais, M.G. and J-M. Dufour (1991), "Invariance, nonlinear models, and asymptotic tests". Econometrica 59, 1601-1615.

[5] Hajek, J. and Z. Sidak (1967), Theory of Rank Tests. New York: Academic Press.

[6] Hansen, L.P. (1982), "Large sample properties of generalized method of moments estimators". Econometrica 50, 1029-1054.

[7] Le Cam, L. (1960), "Locally asymptotically normal families of distributions". University of California Publications in Statistics 3, 37-98.

[8] Lee, L.F. (2004a), "Asymptotic distributions of quasi-maximum likelihood estimators for spatial econometric models". Econometrica 72, 1899-1926.

[9] Lee, L.F. (2004b), "Pooling estimates with different rates of convergence – a minimum χ2 approach: with an emphasis on a social interactions model". Manuscript, Department of Economics, OSU.

[10] Lee, L.F. (2005), "A C(α)-type gradient test in the GMM approach". Manuscript, Department of Economics, OSU.

[11] Manski, C.F. (1993), "Identification of endogenous social effects: the reflection problem". Review of Economic Studies 60, 531-542.

[12] Moon, H.R. and F. Schorfheide (2002), "Minimum distance estimation of nonstationary time series models". Econometric Theory 18, 1385-1407.

[13] Nagaraj, N. and W. Fuller (1991), "Estimation of the parameters of linear time series models subject to nonlinear restrictions". Annals of Statistics 19, 1143-1154.

[14] Newey, W.K. and K.D. West (1987), "Hypothesis testing with efficient method of moments estimation". International Economic Review 28, 777-787.

[15] Neyman, J. (1959), "Optimal asymptotic tests of composite statistical hypotheses". In Probability and Statistics, the Harald Cramer Volume, ed. U. Grenander. New York: Wiley.

[16] Park, J. and P.C.B. Phillips (2000), "Nonstationary binary choice". Econometrica 68, 1249-1280.

[17] Smith, R.J. (1987), "Alternative asymptotically optimal tests and their application to dynamic specification". Review of Economic Studies 54, 665-680.

[18] Ruud, P.A. (2000), Classical Econometric Theory. Oxford University Press, New York, NY.

[19] Wu, C. (1981), "Asymptotic theory of nonlinear least squares estimation". Annals of Statistics 9, 501-513.