Estimating a spatial autoregressive model with an endogenous
spatial weight matrix
Xi Qu*, Lung-fei Lee
The Ohio State University
October 29, 2012
Abstract
The spatial autoregressive (SAR) model is a standard tool for analyzing data with spatial correlation.
Conventional estimation methods rely on the key assumption that the spatial weight matrix W is strictly
exogenous, which is likely to be violated in empirical analyses. This paper presents the specification and
estimation of the SAR model with an endogenous spatial weight matrix. The outcome equation is a
cross-sectional SAR model where W has a model structure on its entries. Endogeneity of W comes from
the correlation between the error terms in its entries and the disturbances in the SAR outcome equation. We
explore three estimation methods for this model with an endogenous spatial weight matrix: the two-stage
instrumental variable (2SIV) method, the maximum likelihood (ML) approach, and the generalized method
of moments (GMM). We then establish the consistency of the estimators from these methods and derive
their asymptotic distributions. Finite sample properties of these estimators are investigated by Monte
Carlo simulations.
Keywords: Spatial autoregressive model; Endogenous spatial weight matrix; Estimation methods
1 Introduction
The SAR model is of great interest to economists because it has a game structure and can be interpreted as
a reaction function. It is widely used in spatial econometrics and for modeling social networks. In spatial
econometrics, the SAR model has been applied to cases where the outcome of a spatial unit at one location
depends on those of its neighbors. The corresponding spatial weight matrix is a measure of distance between
different locations. Consequently, the spatial dependence parameter provides a multiplier for the spillover
effect. SAR models can also be used to model social networks. For example, a student's behavior (such as
smoking or academic achievement) can be directly affected by his/her friends' behaviors. The weight matrix
*Corresponding author: Department of Economics, Ohio State University, 310 Arps Hall, Columbus, OH 43210, [email protected].
can then be constructed by using the friendship relation, and the network (spatial) dependence parameter can
be interpreted as the strength of peer effects. Since spillover and peer effects are important to governments
for school policies, how to consistently estimate the spatial dependence parameter has drawn econometricians'
attention.
Spatial econometrics has well-established estimation procedures for the SAR model with an exogenous
spatial weight matrix: see, for example, Ord (1975) and Lee (2004) for the MLE, Anselin (1980) and
Kelejian and Prucha (1998, 1999) for the IV methods, and Lee (2007), Lee and Liu (2010), Lin and Lee
(2010), and Liu et al. (2010) for the GMM. Consistency and asymptotic normality of these estimators rely
on the key assumption that the spatial weight matrix is strictly exogenous. This may be true when spatial
weights are constructed using predetermined geographic distances, for example, between different cities or
countries. However, if an "economic distance" such as relative GDP or trade volume is used to construct the
weight matrix, it is quite likely that these elements are correlated with the final outcome. Similarly, in
the social network framework, some unobserved characteristics may affect both the friendship relationship
and behavioral outcomes (Hsieh and Lee 2011). Even in the pure physical distance case, the spatial weight
matrix may be endogenous for individuals if the location involves a decision process, as when we consider
neighborhood choices or the migration of an individual. Therefore, in empirical work, the exogenous spatial
weight assumption is likely to be violated.
Many researchers have recognized this problem, but owing to the complications engendered by spatial models
with an endogenous spatial weight matrix, little research has been done along this line. Pinkse and Slade
(2009) pointed out future directions for spatial econometrics. Among the several problems they
emphasized, endogeneity of spatial weights was an important one. They concluded that "many of these are
still waiting for good solutions" and that the endogeneity problem "can admittedly be challenging."
In this paper, we attempt to tackle the endogenous weights issue. By making explicit assumptions on
the source of endogeneity, we obtain two sets of equations: one is the SAR outcome equation, and the other
is the equation for the entries of the spatial weight matrix. The disturbances in the SAR outcome equation
and the error terms in the entry equation can be correlated. When the correlation coefficient between them
is nonzero, it introduces endogeneity into the spatial weights. We focus on estimation issues for this type
of SAR model. Assuming a jointly normal distribution, conditional on the sources of the spatial weights, the
disturbances in the SAR outcome equation provide unobserved control variables. Exploiting these unobservable
control variables for the endogeneity in the outcome SAR equation, we consider three estimation methods. The
first estimation method is a two-stage instrumental variables (2SIV) approach: in the first stage,
we consistently estimate the entry equation; in the second stage, we substitute the residuals of the entry
equation into the outcome equation to replace the unobserved control variables and then use standard
IV methods to estimate the SAR outcome equation. The second method, the ML approach, is based on
the joint normal distribution function, so all the parameters can be jointly estimated via the likelihood
function of the equation system. The third method is a GMM approach, in which an outcome equation with
control variables for endogeneity introduces additional quadratic moments for identification, and the best
moment conditions can be found.
The main task of this paper is to show the consistency and asymptotic normality of these three estimators.
The estimates involve statistics with linear-quadratic forms of disturbances, but the quadratic matrix involves
the spatial weight matrix. The difficulty arises from the fact that the entries of the quadratic matrix are
nonlinear functions of disturbances, so those statistics are not really of quadratic form with nonstochastic
quadratic matrices. Therefore, the standard results for linear-quadratic forms do not directly apply to the
situation here. For the asymptotic theory to apply, we assume that each agent can be directly affected by
its physical neighbors, but the magnitude of a neighbor's effect depends on the endogenous entries.
The rest of this paper is organized as follows. In Section 2, we present the model specification of the
outcome equation and the entries of its spatial weight matrix. In Section 3, we explain the 2SIV and ML
estimation methods for this model. Consistency and asymptotic normality of estimates from these methods
are derived in Sections 4 and 5. In Section 6, we relax some of the assumptions used and also
briefly discuss the best GMM estimation with linear and quadratic moments. In Section 7, Monte Carlo
simulations are employed to investigate finite sample properties and compare the performance of the proposed
estimation methods with traditional methods. The log likelihood, its first and second derivatives, and some
related expressions are collected in Appendix A. Proofs for all the lemmas, propositions, and theorems are
given in Appendix B.
2 The model
2.1 Model specification
Following Jenish and Prucha (2009), we consider spatial processes located on a (possibly) unevenly spaced
lattice $D \subset \mathbb{R}^d$, $d \geq 1$. The asymptotic methods we employ are increasing-domain asymptotics: growth of the
sample is ensured by an unbounded expansion of the sample region. The lattice satisfies the following basic
assumption.

Assumption 1 The lattice $D \subset \mathbb{R}^d$, $d \geq 1$, is countably infinite. All elements in $D$ are located at distances
of at least $\rho_0 > 0$ from each other, i.e., $\forall i, j \in D: \rho(i, j) \geq \rho_0$; w.l.o.g. we assume that $\rho_0 = 1$.
Let $\{(\varepsilon_{i,n}, v_{i,n}); i \in D_n, n \in \mathbb{N}\}$ be a triangular double array of real random variables defined on a
probability space $(\Omega, \mathcal{F}, P)$, where the index set $D_n$ is a finite subset of $D$, and $D$ satisfies Assumption 1.
The variables $Z_n$, from which the spatial weights are constructed, are generated by

$$Z_n = X_n\Gamma + \varepsilon_n, \qquad (2.1)$$

where $X_n$ is an $n \times k$ matrix with elements $\{x_{i,n}; i \in D_n, n \in \mathbb{N}\}$ that are deterministic and bounded in
absolute value for all $i$ and $n$, $\Gamma$ is a $k \times p$ matrix of coefficients, $\varepsilon_n = (\varepsilon_{1,n}, \dots, \varepsilon_{n,n})'$ is an $n \times p$ matrix of
disturbances with $\varepsilon_{i,n} = (\varepsilon_{i1}, \dots, \varepsilon_{ip})'$ being $p$-dimensional column vectors, and $Z_n = (z_1, \dots, z_n)'$ is an $n \times p$
matrix with $z_i = (z_{i1}, \dots, z_{ip})'$. $W_n = (w_{ij,n})$ is an $n \times n$ non-negative matrix with its elements constructed
by $Z_n$: $w_{ij,n} = h_{ij}(Z_n)$ for $i, j = 1, \dots, n$.$^1$ $Y_n = (y_{1,n}, \dots, y_{n,n})'$ is an $n \times 1$ vector from a cross-sectional SAR
model specified as

$$Y_n = \lambda W_n Y_n + X_n\beta + V_n, \qquad (2.2)$$

where $V_n = (v_{1,n}, \dots, v_{n,n})'$, $\lambda$ is a scalar, and $\beta = (\beta_1, \dots, \beta_k)'$ is a $k \times 1$ vector of coefficients.
2.2 Model interpretation
We consider $n$ agents in an area, each endowed with a predetermined location $i$. Any two agents are separated
by a distance of at least 1. Due to spillover effects, each agent $i$ has an outcome $y_{i,n}$ directly
affected by its neighbors' outcomes $y_{j,n}$'s. The outcome equation is $y_{i,n} = \lambda \sum_{j \neq i} w_{ij,n} y_{j,n} + x_{i,n}'\beta + v_{i,n}$,
where the spatial weight $w_{ij,n}$ is a measure of the relationship between agents $i$ and $j$, and the spatial coefficient
$\lambda$ provides a multiplier for the spillover effects. However, the spatial weight $w_{ij,n}$ is not predetermined but
depends on some observable variable $Z_n$. We can think of $z_{i,n}$ as economic variables at location $i$, such
as GDP, consumption, the economic growth rate, etc.
This specification introduces endogeneity into the spatial weight matrix and has been used in the
literature. For example, Anselin and Bera (1997) stated that, in economic applications, the use of weights
based on "economic" distance has been suggested, and they provided several examples. In Case et al. (1993),
weights (before row normalization) of the form $w_{ij} = 1/(z_i - z_j)$ were specifically suggested, where $z_i$ and
$z_j$ are observations on "meaningful" socioeconomic characteristics. Conway and Rork (2004) used
migration flow data to construct a spatial weight matrix. Another example is Crabbé and Vandenbussche
(2008), where, in addition to the physical distance, spatial weight matrices were constructed from the inverse
trade share and the inverse distance between GDP per capita.
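As a concrete illustration of such constructions, the following sketch builds a weight matrix from simulated locations and a scalar economic variable. The similarity function $g(z_i, z_j) = 1/(1 + |z_i - z_j|)$, the cutoff $k_0 = 1.5$, and all numerical values are hypothetical choices for illustration; the paper leaves the entry functions $h_{ij}$ generic.

```python
import numpy as np

def build_weights(coords, z, k0=1.5, row_normalize=True):
    """Sketch of a weight matrix whose entries combine a physical cutoff with an
    economic similarity: w*_ij = g(z_i, z_j) * 1{dist(i, j) <= k0}. The choice
    g(z_i, z_j) = 1/(1 + |z_i - z_j|) and the cutoff k0 are hypothetical."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    W = 1.0 / (1.0 + np.abs(z[:, None] - z[None, :]))   # economic similarity
    W[dist > k0] = 0.0            # only physical neighbors get nonzero weight
    np.fill_diagonal(W, 0.0)      # no self-influence
    if row_normalize:             # row-normalized variant
        rs = W.sum(axis=1, keepdims=True)
        rs[rs == 0.0] = 1.0       # isolated units keep a zero row
        W = W / rs
    return W

rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(50, 2))   # locations in R^2
z = rng.normal(size=50)                     # an "economic" variable per location
W = build_weights(coords, z)
```

Because the entries of W are functions of z, any correlation between z's innovations and the outcome disturbances makes W endogenous, which is the situation formalized in the next subsection.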
2.3 Source of endogeneity
In the following analysis, we suppress the subscript n in the triangular array for convenience. The endogeneity
ofWn comes from the correlation between vi and �i, which is assumed to be multivariate normally distributed,
i.e., (vi; �i) � i:i:d N 0;
�2v �0v�
�v� �"
!!; where �2v is a scalar variance, covariance �v� = (�v�1 ; :::�v�p)
0 is
a p dimensional vector, and �" is a p � p matrix. If �v� is zero, then the spatial weight matrix Wn can be
strictly exogenous and we can apply conventional methodology of SAR models for estimation. However, in
general, �v� is not zero and Wn becomes an endogenous spatial weights matrix. Thus, Wn will be correlated
with the disturbances in Vn. Under the joint normality assumption, conditional on "n, we have
vij�i � N(�0v���1" �i; �2v � �0v���1" �v�)
$^1$Here we simplify the notation by regarding the subscripts $i$ and $j$ as integer values indicating entries in a vector or matrix, even though $i$ and $j$ refer formally, as in Assumption 1, to locations in the lattice $D$ contained in the $d$-dimensional Euclidean space $\mathbb{R}^d$.
and (2.2) becomes

$$Y_n = \lambda W_n Y_n + X_n\beta + (Z_n - X_n\Gamma)\delta + \xi_n, \qquad (2.3)$$

where $\delta = \Sigma_\varepsilon^{-1}\sigma_{v\varepsilon}$ is a $p$-dimensional column vector of parameters, $\xi_n \sim N(0, \sigma_\xi^2 I_n)$ with variance
$\sigma_\xi^2 = \sigma_v^2 - \sigma_{v\varepsilon}'\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon}$, and $\xi_n$ is independent of $\varepsilon_n$ in equation (2.1). Our subsequent asymptotic analysis
will rely mainly on equation (2.3), where the added variables $(Z_n - X_n\Gamma)$ are control variables that control for the
endogeneity of $W_n$.
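The control-function decomposition above can be checked numerically. The sketch below, for the scalar case $p = 1$ with hypothetical parameter values, draws correlated pairs $(v_i, \varepsilon_i)$ and verifies that $\xi_i = v_i - \delta\varepsilon_i$, with $\delta = \sigma_{v\varepsilon}/\sigma_\varepsilon^2$, is uncorrelated with $\varepsilon_i$ and has variance $\sigma_\xi^2 = \sigma_v^2 - \sigma_{v\varepsilon}^2/\sigma_\varepsilon^2$.

```python
import numpy as np

# Draw (v_i, eps_i) i.i.d. jointly normal (p = 1; all numbers hypothetical) and
# check the control-function decomposition v_i = delta * eps_i + xi_i.
rng = np.random.default_rng(1)
n = 2000
sigma_v2, sigma_eps2, sigma_veps = 1.0, 1.0, 0.6
cov = np.array([[sigma_v2, sigma_veps],
                [sigma_veps, sigma_eps2]])
draws = rng.multivariate_normal([0.0, 0.0], cov, size=n)
v, eps = draws[:, 0], draws[:, 1]

delta = sigma_veps / sigma_eps2                    # delta = Sigma_eps^{-1} sigma_veps
xi = v - delta * eps                               # implied xi_i
sigma_xi2 = sigma_v2 - sigma_veps**2 / sigma_eps2  # theoretical Var(xi)
```

In a large sample, the empirical covariance of xi with eps is near zero and the empirical variance of xi is near sigma_xi2, which is exactly what makes $(Z_n - X_n\Gamma)$ a valid control variable in (2.3).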
3 Estimation methods
3.1 The two-stage IV estimation
In the first stage, we estimate $Z_n = X_n\Gamma + \varepsilon_n$ by the ordinary least squares (OLS) method, so $\hat{\Gamma} = (X_n'X_n)^{-1}X_n'Z_n$.
Then, in the second stage, substituting $\hat{\Gamma}$ for $\Gamma$ in (2.3), we have

$$Y_n = \lambda W_n Y_n + X_n\beta + (Z_n - X_n\hat{\Gamma})\delta + \hat{\xi}_n, \qquad (3.1)$$

where $\hat{\xi}_n = \xi_n + X_n(\hat{\Gamma} - \Gamma)\delta = \xi_n + P_n\varepsilon_n\delta$ with $P_n = X_n(X_n'X_n)^{-1}X_n'$. Since $Z_n - X_n\hat{\Gamma} = P_n^{\perp}Z_n = P_n^{\perp}\varepsilon_n$
with $P_n^{\perp} = I_n - P_n$, (3.1) can be rewritten explicitly as

$$Y_n = (W_nY_n, X_n, P_n^{\perp}Z_n)\theta + (\xi_n + P_n\varepsilon_n\delta), \qquad (3.2)$$

where $\theta = (\lambda, \beta', \delta')'$. For estimation, with the control variables $(Z_n - X_n\Gamma)$ added in (2.3), or $P_n^{\perp}Z_n$ in
(3.2), $W_n$ can be treated as predetermined or exogenous. However, $W_nY_n$ remains endogenous in (2.3)
and (3.2), so for an IV estimation we need instruments for $W_nY_n$. Let $Q_n$ be an $n \times m$ matrix of IVs; then
an IV estimator of $\theta$ with $Q_n$ is

$$\hat{\theta} = [(W_nY_n, X_n, P_n^{\perp}Z_n)'Q_n(Q_n'Q_n)^{-1}Q_n'(W_nY_n, X_n, P_n^{\perp}Z_n)]^{-1}(W_nY_n, X_n, P_n^{\perp}Z_n)'Q_n(Q_n'Q_n)^{-1}Q_n'Y_n.$$
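The two stages can be sketched in a few lines of linear algebra. The simulation design below (a line of agents where only adjacent units are neighbors, with an inverse-similarity weight and all parameter values hypothetical) is ours for illustration; the estimator itself follows the formula above with the instrument choice $Q_n = [W_nX_n, X_n, P_n^{\perp}Z_n]$ used later in Section 4.1.

```python
import numpy as np

def tsiv(Y, X, Z, W):
    """2SIV sketch: first stage regresses Z on X by OLS; second stage runs IV on
    (3.2) with regressors (WY, X, Pperp Z) and instruments Q = [WX, X, Pperp Z]."""
    n = len(Y)
    P = X @ np.linalg.solve(X.T @ X, X.T)          # projector onto col(X)
    Pperp = np.eye(n) - P
    R = np.column_stack([W @ Y, X, Pperp @ Z])     # regressors of (3.2)
    Q = np.column_stack([W @ X, X, Pperp @ Z])     # instrument matrix
    A = R.T @ Q @ np.linalg.solve(Q.T @ Q, Q.T)    # R'Q(Q'Q)^{-1}Q'
    return np.linalg.solve(A @ R, A @ Y)           # theta_hat = (lam, beta, delta)

# Simulated check with hypothetical parameter values (p = 1; only adjacent
# units on a line are neighbors, weight = inverse economic similarity)
rng = np.random.default_rng(2)
n = 1000
X = np.column_stack([np.ones(n), rng.uniform(-2, 2, n)])
eps = rng.normal(size=n)
Z = X @ np.array([0.5, 1.0]) + eps                 # entry equation (2.1)
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0 / (1.0 + abs(Z[i] - Z[i + 1]))
lam, beta, delta = 0.2, np.array([1.0, 1.0]), 0.6
V = delta * eps + 0.8 * rng.normal(size=n)         # disturbance correlated with eps
Y = np.linalg.solve(np.eye(n) - lam * W, X @ beta + V)
theta_hat = tsiv(Y, X, Z, W)                       # (lam, beta_0, beta_1, delta)
```

Note that omitting the control-variable column Pperp @ Z would reintroduce the endogeneity of W into the second stage; with it, the remaining endogeneity of WY is handled by the instruments WX.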
3.2 The maximum likelihood estimation
Denote � = (�; �0; vec(�)0; �2v; �0; �0v�)
0; where � is a J dimensional column vector of distinct parameters of
elements in �" (J � p(p+ 1)=2), the log likelihood function can be written as
lnLn(�jYn; Zn) = �n ln(2�)�n
2ln(�2v � �0v���1" �v�)�
n
2ln j�"j+ ln jSn(�)j
� 12vec(Zn �Xn�)0(��1" In)vec(Zn �Xn�)�
1
2(�2v � �0v���1" �v�)(3.3)
� [Sn(�)Yn �Xn� � (Zn �Xn�)��1" �v�]0[Sn(�)Yn �Xn� � (Zn �Xn�)��1" �v�];
where $S_n(\lambda) = I_n - \lambda W_n$ and $S_n = S_n(\lambda_0)$. The MLE $\hat{\theta}$ maximizes the log likelihood function. A necessary
condition is $\partial \ln L_n(\hat{\theta} \mid Y_n, Z_n)/\partial\theta = 0$, where the first-order derivatives of the log likelihood function are listed in
Appendix A.2.
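For the scalar case $p = 1$, the likelihood (3.3) collapses to a compact expression that can be evaluated directly; the sketch below is our illustrative implementation of that special case (the parameter ordering in theta is an assumption made for this sketch). When $\sigma_{v\varepsilon} = 0$, it separates into the usual SAR likelihood for the outcome equation plus a Gaussian regression likelihood for the entry equation.

```python
import numpy as np

def loglik(theta, Y, X, Z, W):
    """Evaluate the log likelihood (3.3) for the scalar case p = 1, with
    theta = (lam, beta (k), gamma (k), sigma_v2, sigma_eps2, sigma_veps)."""
    n, k = X.shape
    lam = theta[0]
    beta = theta[1:1 + k]
    gamma = theta[1 + k:1 + 2 * k]
    sigma_v2, sigma_eps2, sigma_veps = theta[1 + 2 * k:]
    sigma_xi2 = sigma_v2 - sigma_veps**2 / sigma_eps2     # conditional variance
    e = Z - X @ gamma                                     # entry-equation residual
    S = np.eye(n) - lam * W                               # S_n(lam)
    u = S @ Y - X @ beta - e * (sigma_veps / sigma_eps2)  # outcome residual
    logdet = np.linalg.slogdet(S)[1]
    return (-n * np.log(2 * np.pi)
            - 0.5 * n * np.log(sigma_xi2)
            - 0.5 * n * np.log(sigma_eps2)
            + logdet
            - 0.5 * (e @ e) / sigma_eps2
            - 0.5 * (u @ u) / sigma_xi2)
```

In practice one would maximize this function numerically over theta; the log-determinant term ln|S_n(lam)| is what distinguishes it from a seemingly-unrelated-regressions likelihood.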
4 Asymptotic properties of estimators: consistency
4.1 Key statistics
The 2SIV

For the 2SIV estimator $\hat{\theta}$,

$$\hat{\theta} - \theta_0 = [(W_nY_n, X_n, P_n^{\perp}Z_n)'Q_n(Q_n'Q_n)^{-1}Q_n'(W_nY_n, X_n, P_n^{\perp}Z_n)]^{-1}(W_nY_n, X_n, P_n^{\perp}Z_n)'Q_n(Q_n'Q_n)^{-1}Q_n'(\xi_n + P_n\varepsilon_n\delta_0),$$

where the subscript 0 on parameters denotes their true values. Here $W_nY_n = W_n(I_n - \lambda_0 W_n)^{-1}(X_n\beta_0 + \varepsilon_n\delta_0 + \xi_n) = G_n(X_n\beta_0 + \varepsilon_n\delta_0 + \xi_n)$, where $G_n(\lambda) = W_n(I_n - \lambda W_n)^{-1}$ and $G_n = G_n(\lambda_0)$. Therefore, for
$(W_nY_n, X_n, P_n^{\perp}Z_n)'Q_n(Q_n'Q_n)^{-1}Q_n'(W_nY_n, X_n, P_n^{\perp}Z_n)$, we have the quadratic terms with disturbances

$$(\varepsilon_n\delta_0 + \xi_n)'G_n'Q_n(Q_n'Q_n)^{-1}Q_n'G_n(\varepsilon_n\delta_0 + \xi_n), \quad (\varepsilon_n\delta_0 + \xi_n)'G_n'Q_n(Q_n'Q_n)^{-1}Q_n'P_n^{\perp}\varepsilon_n, \quad \text{and} \quad \varepsilon_n'P_n^{\perp}Q_n(Q_n'Q_n)^{-1}Q_n'P_n^{\perp}\varepsilon_n;$$

the linear terms

$$X_n'G_n'Q_n(Q_n'Q_n)^{-1}Q_n'G_n(\varepsilon_n\delta_0 + \xi_n), \quad X_n'G_n'Q_n(Q_n'Q_n)^{-1}Q_n'P_n^{\perp}\varepsilon_n, \quad \text{and} \quad X_n'Q_n(Q_n'Q_n)^{-1}Q_n'P_n^{\perp}\varepsilon_n;$$

and the other terms

$$X_n'G_n'Q_n(Q_n'Q_n)^{-1}Q_n'G_nX_n, \quad X_n'G_n'Q_n(Q_n'Q_n)^{-1}Q_n'X_n, \quad \text{and} \quad X_n'Q_n(Q_n'Q_n)^{-1}Q_n'X_n.$$

As for $(W_nY_n, X_n, P_n^{\perp}Z_n)'Q_n(Q_n'Q_n)^{-1}Q_n'(\xi_n + P_n\varepsilon_n\delta_0)$, we have the quadratic terms with disturbances

$$(\varepsilon_n\delta_0 + \xi_n)'G_n'Q_n(Q_n'Q_n)^{-1}Q_n'(\xi_n + P_n\varepsilon_n\delta_0) \quad \text{and} \quad \varepsilon_n'P_n^{\perp}Q_n(Q_n'Q_n)^{-1}Q_n'(\xi_n + P_n\varepsilon_n\delta_0),$$

and the linear terms

$$X_n'G_n'Q_n(Q_n'Q_n)^{-1}Q_n'(\xi_n + P_n\varepsilon_n\delta_0) \quad \text{and} \quad X_n'Q_n(Q_n'Q_n)^{-1}Q_n'(\xi_n + P_n\varepsilon_n\delta_0).$$
Each of these terms needs to be analyzed carefully. If we choose $Q_n = [W_nX_n, X_n, P_n^{\perp}Z_n]$, then

$$Q_n'Q_n = \begin{pmatrix} X_n'W_n'W_nX_n & X_n'W_n'X_n & X_n'W_n'P_n^{\perp}\varepsilon_n \\ \cdot & X_n'X_n & 0 \\ \cdot & \cdot & \varepsilon_n'P_n^{\perp}\varepsilon_n \end{pmatrix}$$

(a symmetric matrix, with the lower-triangular blocks implied), and many terms can be simplified. For example, $Q_n(Q_n'Q_n)^{-1}Q_n'P_n = P_n$, $Q_n(Q_n'Q_n)^{-1}Q_n'P_n^{\perp}Z_n = P_n^{\perp}Z_n = P_n^{\perp}\varepsilon_n$, and $\varepsilon_n'P_n^{\perp}Q_n(Q_n'Q_n)^{-1}Q_n'P_n^{\perp}\varepsilon_n = \varepsilon_n'P_n^{\perp}\varepsilon_n$.
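These simplifications are projection identities: col(X_n) and P_n^⊥Z_n both lie inside col(Q_n), so projecting them onto col(Q_n) leaves them unchanged. A quick numerical spot-check (with arbitrary simulated data) confirms them:

```python
import numpy as np

# Numerical spot-check of the projection identities used above.
rng = np.random.default_rng(4)
n, k, p = 40, 3, 1
X = rng.normal(size=(n, k))
eps = rng.normal(size=(n, p))
Z = X @ rng.normal(size=(k, p)) + eps            # Z = X Gamma + eps
W = rng.uniform(size=(n, n))
np.fill_diagonal(W, 0.0)

P = X @ np.linalg.solve(X.T @ X, X.T)            # P_n, projector onto col(X)
Pperp = np.eye(n) - P                            # P_n^perp
Q = np.column_stack([W @ X, X, Pperp @ Z])       # the IV choice above
PQ = Q @ np.linalg.solve(Q.T @ Q, Q.T)           # projector onto col(Q)
```

Since X and Pperp @ Z are columns of Q, PQ reproduces P and Pperp @ Z exactly, and the quadratic form in P_n^⊥ ε_n is unchanged by the intermediate projection.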
To summarize, for the 2SIV estimator, the terms we need to analyze are

$$\frac{1}{n}X_n'W_n'W_nX_n, \quad \frac{1}{n}X_n'W_n'X_n, \quad \frac{1}{n}X_n'G_n'W_nX_n, \quad \frac{1}{n}X_n'G_n'X_n, \quad \frac{1}{n}\varepsilon_n'G_n'\varepsilon_n, \quad \frac{1}{n}\varepsilon_n'G_n'X_n, \quad \frac{1}{n}X_n'W_n'G_n\varepsilon_n,$$
$$\frac{1}{n}X_n'G_n'G_n\varepsilon_n, \quad \frac{1}{n}\xi_n'G_n'\varepsilon_n, \quad \frac{1}{n}\xi_n'G_n'X_n, \quad \frac{1}{n}X_n'W_n'G_n\xi_n, \quad \frac{1}{n}X_n'W_n'\xi_n, \quad \text{and} \quad \frac{1}{n}\varepsilon_n'P_n^{\perp}\xi_n.$$
The MLE

To show the consistency of the MLE $\hat{\theta}$, we first need to show the uniform convergence of the log likelihood
function, i.e., the uniform convergence of the sample averages of $\ln|S_n(\lambda)|$, $\mathrm{vec}(Z_n - X_n\Gamma)'(\Sigma_\varepsilon^{-1} \otimes I_n)\mathrm{vec}(Z_n - X_n\Gamma)$, and $[S_n(\lambda)Y_n - X_n\beta - (Z_n - X_n\Gamma)\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon}]'[S_n(\lambda)Y_n - X_n\beta - (Z_n - X_n\Gamma)\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon}]$. Since $Z_n - X_n\Gamma = X_n(\Gamma_0 - \Gamma) + \varepsilon_n$, the convergence of $\frac{1}{n}\mathrm{vec}(Z_n - X_n\Gamma)'(\Sigma_\varepsilon^{-1} \otimes I_n)\mathrm{vec}(Z_n - X_n\Gamma)$ can be derived directly from
the linear and quadratic forms of $\varepsilon_n$. The complication comes from the last term in the likelihood function.
As $Y_n = S_n^{-1}(X_n\beta_0 + \varepsilon_n\delta_0 + \xi_n)$, we have

$$S_n(\lambda)Y_n - X_n\beta - (Z_n - X_n\Gamma)\delta = (S_n^{-1} - \lambda G_n)(X_n\beta_0 + \varepsilon_n\delta_0 + \xi_n) - X_n[\beta + (\Gamma_0 - \Gamma)\delta] - \varepsilon_n\delta,$$

so in the log likelihood function the terms we need to analyze are

$$\frac{1}{n}X_n'S_n^{-1\prime}S_n^{-1}X_n, \quad \frac{1}{n}X_n'S_n^{-1\prime}G_nX_n, \quad \frac{1}{n}X_n'G_n'G_nX_n, \quad \frac{1}{n}X_n'S_n^{-1\prime}X_n, \quad \frac{1}{n}X_n'G_n'X_n, \quad \frac{1}{n}X_n'S_n^{-1\prime}S_n^{-1}\varepsilon_n,$$
$$\frac{1}{n}X_n'G_n'G_n\varepsilon_n, \quad \frac{1}{n}X_n'S_n^{-1\prime}G_n\varepsilon_n, \quad \frac{1}{n}X_n'S_n^{-1\prime}\varepsilon_n, \quad \frac{1}{n}X_n'G_n'\varepsilon_n, \quad \frac{1}{n}\varepsilon_n'S_n^{-1\prime}S_n^{-1}\varepsilon_n, \quad \frac{1}{n}\varepsilon_n'S_n^{-1\prime}G_n\varepsilon_n, \quad \frac{1}{n}\varepsilon_n'G_n'G_n\varepsilon_n,$$
$$\frac{1}{n}X_n'S_n^{-1\prime}\xi_n, \quad \frac{1}{n}X_n'G_n'\xi_n, \quad \frac{1}{n}X_n'S_n^{-1\prime}S_n^{-1}\xi_n, \quad \frac{1}{n}X_n'G_n'G_n\xi_n, \quad \frac{1}{n}X_n'S_n^{-1\prime}G_n\xi_n, \quad \frac{1}{n}\xi_n'S_n^{-1\prime}\varepsilon_n, \quad \frac{1}{n}\xi_n'G_n'\varepsilon_n,$$
$$\frac{1}{n}\varepsilon_n'S_n^{-1\prime}S_n^{-1}\xi_n, \quad \frac{1}{n}\varepsilon_n'G_n'G_n\xi_n, \quad \frac{1}{n}\varepsilon_n'S_n^{-1\prime}G_n\xi_n, \quad \frac{1}{n}\xi_n'S_n^{-1\prime}S_n^{-1}\xi_n, \quad \frac{1}{n}\xi_n'S_n^{-1\prime}G_n\xi_n, \quad \text{and} \quad \frac{1}{n}\xi_n'G_n'G_n\xi_n.$$

For the special term $\frac{1}{n}\ln|S_n(\lambda)|$, we have the Taylor expansion

$$\frac{1}{n}\ln|S_n(\lambda)| = \frac{1}{n}\ln|I_n - \lambda W_n| = -\frac{1}{n}\sum_{i=1}^{n}\left[\sum_{l=1}^{\infty}\frac{\lambda^l}{l}(W_n^l)_{ii}\right].$$
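This expansion is the matrix identity $\ln\det(I - A) = -\sum_{l\geq 1} \mathrm{tr}(A^l)/l$ applied to $A = \lambda W_n$, valid when the series converges (ensured here by Assumption 2.3)). A short numerical check on an arbitrary scaled matrix:

```python
import numpy as np

# Check (1/n) ln|I - lam W| = -(1/n) sum_i sum_l (lam^l / l) (W^l)_ii numerically;
# W and lam are arbitrary illustrative choices with ||lam W|| < 1 by construction.
rng = np.random.default_rng(5)
n = 20
W = rng.uniform(size=(n, n))
np.fill_diagonal(W, 0.0)
W /= 2.0 * W.sum(axis=1).max()                  # scale so all row sums are <= 1/2
lam = 0.5

exact = np.linalg.slogdet(np.eye(n) - lam * W)[1] / n

series, Wl = 0.0, np.eye(n)
for l in range(1, 60):                          # truncate the infinite sum
    Wl = Wl @ W
    series -= (lam ** l / l) * np.trace(Wl) / n
```

With row sums of lam * W bounded by 1/4, the truncated tail is negligible and the two sides agree to high precision.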
In summary, for the consistency of the estimators, the terms we need to analyze can be written in the
general form

$$\frac{1}{n}X_n'M_n'X_n, \quad \frac{1}{n}\varepsilon_n'M_n\varepsilon_n, \quad \frac{1}{n}\xi_n'M_n\xi_n, \quad \frac{1}{n}\varepsilon_n'M_n\xi_n, \quad \frac{1}{n}X_n'M_n\xi_n, \quad \text{and} \quad \frac{1}{n}\sum_{i=1}^{n}\left[\sum_{l=1}^{\infty}\frac{\lambda^l}{l}(W_n^l)_{ii}\right],$$

where $M_n$ can be $W_n$, $G_n$, $S_n^{-1}$, $W_n'W_n$, $G_n'W_n$, $S_n^{-1\prime}S_n^{-1}$, $S_n^{-1\prime}G_n$, or $G_n'G_n$. To analyze these terms, we
need some additional assumptions and topological structures.
4.2 Assumptions and topological structures
Assumption 2 2.1) The parameter $\theta = (\lambda, \beta', \mathrm{vec}(\Gamma)', \sigma_v^2, \alpha', \sigma_{v\varepsilon}')'$ is in a compact set $\Theta$ in the Euclidean
space $\mathbb{R}^{k_\theta}$, where $k_\theta = k + 2 + kp + p + J$; $k$ is the dimension of $\beta$, $p$ is the dimension of $\sigma_{v\varepsilon}$, $kp$ is the
number of parameters in $\Gamma$, and $J$ is the number of distinct parameters in $\Sigma_\varepsilon(\alpha)$. In this set, $\sigma_v^2 > 0$ and
$\Sigma_\varepsilon(\alpha)$ is positive definite. The true parameter $\theta_0$ is contained in the interior of $\Theta$.

2.2) The matrix $S_n$ is nonsingular.

2.3) The sequence of matrices $\{W_n\}$ is bounded in row and column sum norms.$^2$ $\lambda$ satisfies $\sup_\lambda \|\lambda W_n\|_1 < 1$ and $\sup_\lambda \|\lambda W_n\|_\infty < 1$.

2.4) $\lim_{n\to\infty}(\frac{1}{n}X_n'X_n)^{-1}$ exists and is nonsingular.

2.5) $v_i$ and $\varepsilon_i$ have the joint normal distribution

$$(v_i, \varepsilon_i')' \sim i.i.d.\; N(0, \Sigma), \quad \text{where } \Sigma = \begin{pmatrix} \sigma_v^2 & \sigma_{v\varepsilon}' \\ \sigma_{v\varepsilon} & \Sigma_\varepsilon \end{pmatrix} \text{ is positive definite.}$$

2.6) The spatial weight $w_{ij} = 0$ if $\rho(i, j) > k_0$ for some $k_0 > 0$, i.e., if the distance between two locations
exceeds the threshold $k_0$, they cannot be neighbors. We focus on two structures of $W_n$ constructed from the
elements $w_{ij}^* = g(z_i, z_j)I(\rho(i, j) \leq k_0)$:

Case 1. Without row normalization: $w_{ij} = w_{ij}^* = g(z_i, z_j)I(\rho(i, j) \leq k_0)$;

Case 2. Row-normalized weight matrix: $w_{ij} = w_{ij}^* / \sum_j w_{ij}^*$.
From these constructions, we can see that the spatial weight considered in this paper is a combination of physical
distance and economic distance: whether agents are neighbors depends on the predetermined physical
distance $\rho(i, j)$, but the magnitude of neighbors' effects depends on the economic similarity $g(z_i, z_j)$. Denote
$B_i(h)$ as a ball centered at location $i$ with radius $h$. The difference between Cases 1 and 2 is that
in Case 1, $w_{ij}$ is related only to the two points within the ball $B_i(k_0)$ and hence to the values $z_i$ and $z_j$; but in
Case 2, $w_{ij}$ may be related to all of the finitely many points within the ball $B_i(k_0)$ and the corresponding values of $z$ at those
$^2$We say the sequence of $n \times n$ matrices $\{P_n\}$ is bounded in row and column sum norms if $\sup_{1\leq i\leq n,\, n\geq 1}\sum_{j=1}^n |p_{ij,n}| < \infty$ and $\sup_{1\leq i\leq n,\, n\geq 1}\sum_{j=1}^n |p_{ji,n}| < \infty$.
locations. Similarly to Case 1, $w_{ij}$ and $w_{kl}$ are uncorrelated if the distance between locations $i$ and $k$ exceeds
$2k_0$ (twice the threshold). This is a generalization of $m$-dependence in the time series literature to our
spatial setting. Therefore, the analyses of these two cases will be similar.
The assumed topological structure in Assumption 2.6) has the following implications:

Claim 1 There exists a constant $C < \infty$ such that for $h > 1$, there are at most $Ch^d$ elements within the ball
$B_i(h)$, where $d$ is the dimension of the location space of the units.

This is Lemma A.1(ii) in Jenish and Prucha (2009). We also have the following claim.

Claim 2 The $(i, j)$th element of the $b$-fold product $W_n^b$ of $W_n$ is nonzero only if $j \in B_i(bk_0)$. Furthermore,
the $(i, j)$th element of $W_n'^{a}W_n^{b}$ is nonzero only if $j \in B_i((a + b)k_0)$.
4.3 Probability convergence of key statistics

Terms in the key statistics have common features in the analysis of their convergence, so we would like to have
some general propositions that can be applied to those terms.

Definition 1 A finite dimensional square matrix $A$ is said to be a spatial $h$-dependence matrix of $\varepsilon_n$ if each
$(i, j)$th entry $a_{ij}$ of $A$ is a function of the $\varepsilon_s$'s with (location) $s \in B_i(h)$, and $a_{ij} = 0$ if $j \notin B_i(h)$.

Example 1 1.1) $A_n = W_n^l$, $l = 1, 2, \dots$:

$$a_{ij} = \sum_{j_1=1}^{n}\cdots\sum_{j_{l-1}=1}^{n} w_{ij_1}w_{j_1j_2}\cdots w_{j_{l-1}j}.$$

As $(w_{ij_1}, w_{j_1j_2}, \dots, w_{j_{l-1}j})$ with all nonzero elements is a "forward walk" of length $l$ from $i$ to $j$, this $A_n$ is
a spatial $lk_0$-dependence matrix of $\varepsilon_n$.

1.2) $A_n = W_n'^{l}$, $l = 1, 2, \dots$:

$$a_{ij} = \sum_{i_1=1}^{n}\cdots\sum_{i_{l-1}=1}^{n} w_{ji_1}w_{i_1i_2}\cdots w_{i_{l-1}i}.$$

As $(w_{ji_1}, w_{i_1i_2}, \dots, w_{i_{l-1}i})$ with all nonzero elements is a "backward walk" of length $l$ from $i$ to $j$, this
$A_n$ is a spatial $lk_0$-dependence matrix of $\varepsilon_n$.

1.3) $A_n = W_n'^{l}W_n^{m}$, $l, m = 1, 2, \dots$:

$$a_{ij} = \sum_{k=1}^{n}\left(\sum_{i_1=1}^{n}\cdots\sum_{i_{l-1}=1}^{n} w_{ki_1}w_{i_1i_2}\cdots w_{i_{l-1}i}\right)\left(\sum_{j_1=1}^{n}\cdots\sum_{j_{m-1}=1}^{n} w_{kj_1}w_{j_1j_2}\cdots w_{j_{m-1}j}\right).$$

This has a "backward walk" of length $l$ from $i$ to some $k$, followed by a "forward walk" of length $m$ from
that $k$ to $j$. This $A_n$ is a spatial $(l + m)k_0$-dependence matrix of $\varepsilon_n$.
Definition 2 A sequence of square matrices $\{B_n\}$ has an increasing-dependence geometric series expansion
if $B_n = \sum_{l=0}^{\infty} a_l A_{nl}$ under $\|\cdot\|_1$, where the $a_l$'s are scalar coefficients and $\{A_{nl}\}$ is a sequence of conformable
spatial $(l + s)k_0$-dependence matrices of $\varepsilon_n$, with $s$ a specified non-negative integer, such that $\{\|a_0 A_{n0}\|_1\}$
is uniformly bounded and $\|a_l A_{nl}\|_1 \leq c_0(l + c_1)^{s_1}c^l$ for all $l = 1, 2, \dots$, where $c_0 > 0$, $c_1$ and $s_1$ are some
positive constants, and $0 < c < 1$.

Under Assumption 2.3), we have the following examples:

Example 2 2.1) $B_n = G_n(\lambda) = \sum_{l=0}^{\infty} \lambda^l W_n^{l+1}$.

2.2) $B_n = G_n'(\lambda) = \sum_{l=0}^{\infty} \lambda^l W_n'^{(l+1)}$.

2.3) $B_n = \sum_{l=1}^{\infty} \frac{\lambda^l}{l} W_n^l$.

2.4) $B_n = G_n'(\lambda)W_n = \sum_{l=0}^{\infty} \lambda^l W_n'^{(l+1)}W_n$, where $A_l = W_n'^{(l+1)}W_n$ is $(l + 2)k_0$-dependent (with $s = 2$).

2.5) $B_n = G_n'(\lambda)G_n(\lambda) = \sum_{i=0}^{\infty}\sum_{j=0}^{\infty} \lambda^{i+j} W_n'^{(i+1)}W_n^{(j+1)} = \sum_{l=0}^{\infty} \lambda^l A_l$, where $A_l = \sum_{i=0}^{l} W_n'^{(i+1)}W_n^{(l-i+1)}$,
with $s = 2$.

2.6) $B_n = G_n'(\lambda)G_n(\lambda)G_n'(\lambda)G_n(\lambda) = \sum_{k=0}^{\infty}\sum_{m=0}^{\infty} \lambda^{k+m}\sum_{i=0}^{k}\sum_{j=0}^{m} W_n'^{(i+1)}W_n^{(k-i+1)}W_n'^{(j+1)}W_n^{(m-j+1)} = \sum_{l=0}^{\infty} \lambda^l A_l$, where $A_l = \sum_{k=0}^{l}\sum_{i=0}^{k}\sum_{j=0}^{l-k} W_n'^{(i+1)}W_n^{(k-i+1)}W_n'^{(j+1)}W_n^{(l-k-j+1)}$, with $s = 4$.
With these defined matrices, we would like to have some general convergence results for:

(i) $\varepsilon_n'A_n\varepsilon_n$, where $A_n$ is a spatial $lk_0$-dependence matrix of $\varepsilon_n$;

(ii) $\varepsilon_n'B_n\varepsilon_n$, where $B_n$ has an increasing-dependence series expansion with $a_l = \lambda^l$ for all $l = 0, 1, \dots$,
which has the property $\|a_l A_l\|_1 \leq c_0(l + c_1)^{s_1}c^l$ for $l = 1, 2, \dots$, with $c_0 > 0$, $c_1$ and $s_1$ some positive
constants, and $0 < c < 1$.

Proposition 1 Suppose that the $\varepsilon_i$'s in $\varepsilon_n = (\varepsilon_1, \dots, \varepsilon_n)'$ are i.i.d. with zero mean and a finite fourth moment,
and that the $n$-dimensional square matrix $A_n$ is spatial $h$-dependent of $\varepsilon_n$ with all entries $a_{ij}$
uniformly bounded. Then $|E(\frac{1}{n}\varepsilon_n'A_n\varepsilon_n)| = O(1)$ and $\frac{1}{n}[\varepsilon_n'A_n\varepsilon_n - E(\varepsilon_n'A_n\varepsilon_n)] = o_p(1)$.

Proposition 2 Suppose that $B_n$ has an increasing-dependence geometric series expansion, $B_n = \sum_{l=0}^{\infty} a_l A_{nl}$, with
$\{A_{nl}\}$ a sequence of spatial $(l + s)k_0$-dependence matrices of $\varepsilon_n$, and that the $\varepsilon_i$'s in $\varepsilon_n$ are i.i.d. with zero mean and
a finite fourth moment. Then $E(\frac{1}{n}|\varepsilon_n'B_n\varepsilon_n|) = O(1)$ and $\frac{1}{n}[\varepsilon_n'B_n\varepsilon_n - E(\varepsilon_n'B_n\varepsilon_n)] = o_p(1)$.

From the proofs of the above two propositions, we also have the convergence of terms with $\varepsilon_n$ replaced by a
vector of exogenous variables.

Corollary 1 Suppose that all the elements of the vectors of exogenous variables $x_n = (x_{n1}, \dots, x_{nn})'$
and $q_n = (q_{n1}, \dots, q_{nn})'$ are uniformly bounded for all $n$. Under the setting of the previous propositions,
$E(\frac{1}{n}|\varepsilon_n'B_nx_n|) = O(1)$, $E(\frac{1}{n}|q_n'B_nx_n|) = O(1)$, $\frac{1}{n}(\varepsilon_n'B_nx_n - E(\varepsilon_n'B_nx_n)) = o_p(1)$, and $\frac{1}{n}(q_n'B_nx_n - E(q_n'B_nx_n)) = o_p(1)$.
For our statistics of the general form $\frac{1}{n}\varepsilon_n'M_n\xi_n$, because $\xi_n$ is conditionally independent of $Z_n$ (and $X_n$),
we have $E(\xi_n \mid M_n) = 0$ and

$$\mathrm{Var}\left(\frac{1}{n}\varepsilon_n'M_n\xi_n\right) = E\left[\mathrm{Var}\left(\frac{1}{n}\varepsilon_n'M_n\xi_n \,\Big|\, Z_n\right)\right] + \mathrm{Var}\left[E\left(\frac{1}{n}\varepsilon_n'M_n\xi_n \,\Big|\, Z_n\right)\right] = \frac{1}{n}\sigma_\xi^2 E\left(\frac{1}{n}\varepsilon_n'M_nM_n'\varepsilon_n\right).$$

From Proposition 2, we have $E(\varepsilon_n'M_nM_n'\varepsilon_n) = O(n)$, so $\frac{1}{n}\varepsilon_n'M_n\xi_n$ is $o_p(1)$. Similar arguments can be
applied to show that $\frac{1}{n}X_n'M_n\xi_n = o_p(1)$, $E\ln|S_n(\lambda)| = O(n)$, and $\frac{1}{n}[\ln|S_n(\lambda)| - E\ln|S_n(\lambda)|] = o_p(1)$.
4.4 Consistency of estimators

As

$$\hat{\theta} - \theta_0 = [(W_nY_n, X_n, P_n^{\perp}Z_n)'Q_n(Q_n'Q_n)^{-1}Q_n'(W_nY_n, X_n, P_n^{\perp}Z_n)]^{-1}(W_nY_n, X_n, P_n^{\perp}Z_n)'Q_n(Q_n'Q_n)^{-1}Q_n'(\xi_n + P_n\varepsilon_n\delta_0)$$

and

$$\lim_{n\to\infty}\frac{1}{n}Q_n'(W_nY_n, X_n, P_n^{\perp}Z_n) = \lim_{n\to\infty}\frac{1}{n}[Q_n'G_n(X_n\beta_0 + \varepsilon_n\delta_0 + \xi_n), Q_n'X_n, Q_n'P_n^{\perp}\varepsilon_n] \xrightarrow{p} \lim_{n\to\infty}\frac{1}{n}[E(Q_n'G_n)X_n\beta_0 + E(Q_n'G_n\varepsilon_n)\delta_0, E(Q_n')X_n, E(Q_n'\varepsilon_n)],$$

to show the consistency of the 2SIV estimator, in addition to the convergence of each separate term, we
need some rank conditions.

Assumption 3 3.1) $\lim_{n\to\infty} E(\frac{1}{n}Q_n'Q_n)$ exists and is nonsingular;

3.2) $\lim_{n\to\infty}\frac{1}{n}[E(Q_n'G_n)X_n\beta_0 + E(Q_n'G_n\varepsilon_n)\delta_0, E(Q_n')X_n, E(Q_n'\varepsilon_n)]'[E(Q_n'G_n)X_n\beta_0 + E(Q_n'G_n\varepsilon_n)\delta_0, E(Q_n')X_n, E(Q_n'\varepsilon_n)]$
is nonsingular.

Proposition 3 Under Assumptions 1-3, the 2SIV estimator $\hat{\theta} \xrightarrow{p} \theta_0$.

Now, for the consistency of the MLE, we need additional assumptions.

Assumption 4 $\lim_{n\to\infty}\frac{1}{n}[X_n, E(G_n)X_n\beta_0 + E(G_n\varepsilon_n)\delta_0]'[X_n, E(G_n)X_n\beta_0 + E(G_n\varepsilon_n)\delta_0]$ exists and is nonsingular.

Lemma 1 (Unique maximizer) Under Assumptions 1, 2, and 4, $\theta_0$ is the unique maximizer of $\lim_{n\to\infty}\frac{1}{n}E\ln L_n(\theta)$.

This lemma, the uniform convergence $\sup_{\theta\in\Theta}\left|\frac{1}{n}\ln L_n(\theta) - E\frac{1}{n}\ln L_n(\theta)\right| \xrightarrow{p} 0$, and the equicontinuity
of $\lim_{n\to\infty}\frac{1}{n}E\ln L_n(\theta)$ together imply the consistency of the MLE.

Theorem 1 Under Assumptions 1, 2, and 4, the MLE $\hat{\theta} \xrightarrow{p} \theta_0$.
5 Asymptotic distributions of estimators

For the 2SIV estimator, we need a CLT for the form $\frac{1}{\sqrt{n}}(a'Q_n'\xi_n + b'X_n'\varepsilon_n\delta_0)$, where $a$ and $b$ are nonstochastic
matrices; specifically,

$$a = \left[S'\left(\mathrm{plim}_{n\to\infty}\frac{Q_n'Q_n}{n}\right)^{-1}S\right]^{-1}S'\left(\mathrm{plim}_{n\to\infty}\frac{Q_n'Q_n}{n}\right)^{-1} \quad \text{and} \quad b = \left[S'\left(\mathrm{plim}_{n\to\infty}\frac{Q_n'Q_n}{n}\right)^{-1}S\right]^{-1}T'\left(\lim_{n\to\infty}\frac{X_n'X_n}{n}\right)^{-1},$$

with $S = \lim_{n\to\infty}\frac{1}{n}[E(Q_n'G_n)X_n\beta_0 + E(Q_n'G_n\varepsilon_n)\delta_0, E(Q_n)'X_n, E(Q_n'\varepsilon_n)]$ and $T = \lim_{n\to\infty}\frac{1}{n}[X_n'E(G_n)X_n\beta_0 + X_n'E(G_n\varepsilon_n)\delta_0, X_n'X_n, 0]$.

For the MLE, from the mean value theorem we have

$$\sqrt{n}(\hat{\theta} - \theta_0) = -\left(\frac{1}{n}\frac{\partial^2\ln L_n(\tilde{\theta})}{\partial\theta\,\partial\theta'}\right)^{-1}\frac{1}{\sqrt{n}}\frac{\partial\ln L_n(\theta_0)}{\partial\theta},$$

where $\tilde{\theta}$ lies between $\hat{\theta}$ and $\theta_0$. Therefore, we need the uniform convergence of the second-order derivative
$\frac{1}{n}\frac{\partial^2\ln L(\theta)}{\partial\theta\,\partial\theta'}$ and the CLT for $\frac{1}{\sqrt{n}}\frac{\partial\ln L(\theta_0)}{\partial\theta}$, whose expressions can be found in Appendix A.3 and Appendix
A.2.

In general, we need to consider a CLT for the form

$$R_n = \frac{1}{\sqrt{n}}\left[A_n'\zeta_n + B_n'\xi_n + \xi_n'C_n\xi_n + \zeta_n'D_n\zeta_n + \xi_n'E_n\zeta_n - E(\xi_n'C_n\xi_n \mid \zeta_n) - E(\zeta_n'D_n\zeta_n)\right],$$

where $\zeta_n$ and $\xi_n$ are independent random variables with zero means and respective nonzero variances $\sigma_\zeta^2$
and $\sigma_\xi^2$, the $(\zeta_i, \xi_i)$ are identically distributed across $n$, $A_n$ and $D_n$ are an $n$-dimensional non-stochastic vector and
matrix, but the elements in $B_n$, $C_n$, and $E_n$ may contain $\zeta_n$.

Assumption 5 5.1) $E(\xi_i^{4+\delta_R})$ and $E(\zeta_i^{4+\delta_R})$ exist.

5.2) For the vectors $A_n$ and $B_n$, the elements are uniformly bounded; the matrices $C_n$, $D_n$, and $E_n$ are bounded in
both row and column sums.

5.3) Each of the stochastic matrices $C_n$ and $E_n$ is either a spatial $h$-dependence matrix of $\zeta_n$ or has an
increasing-dependence geometric series expansion.

5.4) The stochastic vector $B_n$ can be expressed as the product of a bounded non-stochastic vector and
a matrix that is either a spatial $h$-dependence matrix of $\zeta_n$ or has an increasing-dependence geometric series expansion.

Lemma 2 Denote $\sigma_{R_n}^2$ as the variance of $R_n$. Under Assumption 5, and assuming that $\sigma_{R_n}^2$ is bounded away
from zero,$^3$ we have $R_n/\sigma_{R_n} \xrightarrow{d} N(0, 1)$.

$^3$For example, if $\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^n\sum_{j\neq i}(d_{ij}^2 + d_{ji}^2) > 0$ or $\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^n\sum_{j\neq i}E(c_{ij}^2 + c_{ji}^2) > 0$, then $\sigma_{R_n}^2$ will be bounded
away from zero.
From this lemma, the asymptotic normality of the 2SIV estimator is straightforward:

$$\sqrt{n}(\hat{\theta} - \theta_0) \xrightarrow{d} N(0, \Sigma_{IV}),$$

where

$$\Sigma_{IV} = \sigma_{\xi 0}^2\left[S'\left(\mathrm{plim}_{n\to\infty}\frac{Q_n'Q_n}{n}\right)^{-1}S\right]^{-1} + \delta_0'\Sigma_{\varepsilon 0}\delta_0\left[S'\left(\mathrm{plim}_{n\to\infty}\frac{Q_n'Q_n}{n}\right)^{-1}S\right]^{-1}T'\left(\lim_{n\to\infty}\frac{X_n'X_n}{n}\right)^{-1}T\left[S'\left(\mathrm{plim}_{n\to\infty}\frac{Q_n'Q_n}{n}\right)^{-1}S\right]^{-1}.$$

As for the MLE, we have the following lemma and theorem.

Lemma 3 Under Assumptions 1, 2, and 4, the information matrix $I_\theta = \lim_{n\to\infty}E\left(\frac{1}{n}\frac{\partial\ln L_n(\theta_0)}{\partial\theta}\frac{\partial\ln L_n(\theta_0)}{\partial\theta'}\right)$ is
nonsingular.

The expression for the information matrix $I_\theta$ can be found in Appendix A.3.

Theorem 2 Under Assumptions 1, 2, and 4, $\sqrt{n}(\hat{\theta} - \theta_0) \xrightarrow{d} N(0, I_\theta^{-1})$.
6 Some extensions
6.1 The GMM estimation
With (2.3), the SAR equation implies that
$$Y_n = (I_n - \lambda W_n)^{-1}[X_n(\beta - \Gamma\delta) + Z_n\delta + \xi_n]. \qquad (6.1)$$
Because $W_n$ is a function of $Z_n$, and $\xi_n$ is conditionally independent of $Z_n$ and $X_n$, we have $E(\xi_n|Z_n, W_n, X_n) = E(\xi_n|Z_n, X_n) = 0$. This suggests additional moments for estimation, such as $E(Z_n'\xi_n) = 0$ and $E((W_nX_n)'\xi_n) = 0$. From (6.1), we have $W_nY_n = G_nX_n(\beta_0 - \Gamma_0\delta_0) + G_nZ_n\delta_0 + G_n\xi_n$. This suggests that the possible set of linear moments may consist of
$$E(X_n'\varepsilon_n) = 0, \quad E(X_n'\xi_n) = 0, \quad E((G_nX_n)'\xi_n) = 0, \quad E((G_nZ_n)'\xi_n) = 0.$$
In addition to the linear moments, as motivated by Lee (2007), the quadratic moment may be
$$E\Big(\xi_n'\Big[G_n - \frac{1}{n}\operatorname{tr}(G_n)I_n\Big]'\xi_n\Big) = 0.$$
The corresponding empirical moments are:
(1) $X_n'(Z_n - X_n\Gamma)$;
(2) $X_n'[Y_n - \lambda W_nY_n - X_n(\beta - \Gamma\delta) - Z_n\delta]$;
(3) $X_n'G_n'[Y_n - \lambda W_nY_n - X_n(\beta - \Gamma\delta) - Z_n\delta]$;
(4) $Z_n'G_n'[Y_n - \lambda W_nY_n - X_n(\beta - \Gamma\delta) - Z_n\delta]$; and
(5) $[Y_n - \lambda W_nY_n - X_n(\beta - \Gamma\delta) - Z_n\delta]'[G_n - \frac{1}{n}\operatorname{tr}(G_n)I_n]'[Y_n - \lambda W_nY_n - X_n(\beta - \Gamma\delta) - Z_n\delta]$.
By comparing these moments with the first order condition (FOC) of the ML approach, we can see that the FOC consists of linear combinations of these empirical moments. Thus, in place of the ML estimation, one may base estimation on these moments. Within the optimum GMM framework, the moments involving $G_n$ and the optimum distance matrix need to be feasibly estimated with an initial estimate $\hat\lambda$, i.e., using $\hat G_n = W_n(I_n - \hat\lambda W_n)^{-1}$ to replace $G_n$. For the same reason as in Lee (2007), these feasible IVs and quadratic matrix yield GMM estimators asymptotically equivalent to those based on the true $G_n$. With these moments, the optimum GMM estimators are asymptotically equivalent to the MLE.
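The stacked empirical moments (1)-(5) can be sketched numerically as follows for the scalar-$z$ case. This is an illustrative sketch, not the paper's code: the function name `gmm_moments` and the parameter packing $(\lambda, \beta, \gamma, \delta)$ are assumptions, and $G_n$ is replaced by its feasible version $\hat G_n = W_n(I_n - \hat\lambda W_n)^{-1}$ at an initial estimate.

```python
import numpy as np

def gmm_moments(par, y, x, z, w, lam_init):
    """Stack the empirical moments (1)-(5) at parameter value `par`.
    Packing (lam, beta..., gamma..., delta) is an illustrative assumption."""
    n, k = x.shape
    lam = par[0]
    beta = par[1:1 + k]
    gamma = par[1 + k:1 + 2 * k]
    delta = par[-1]
    g_hat = w @ np.linalg.inv(np.eye(n) - lam_init * w)   # feasible G_n
    eps = z - x @ gamma                                   # z-equation residual
    # outcome residual: Y - lam*W*Y - X(beta - gamma*delta) - Z*delta
    xi = y - lam * (w @ y) - x @ (beta - gamma * delta) - z * delta
    q = g_hat - (np.trace(g_hat) / n) * np.eye(n)         # quadratic matrix
    return np.concatenate([
        x.T @ eps,            # (1)
        x.T @ xi,             # (2)
        (g_hat @ x).T @ xi,   # (3)
        [(g_hat @ z) @ xi],   # (4), z scalar
        [xi @ q.T @ xi],      # (5)
    ]) / n

# toy evaluation on arbitrary data
rng = np.random.default_rng(0)
n, k = 50, 2
x = np.column_stack([np.ones(n), rng.standard_normal(n)])
w = rng.random((n, n)); np.fill_diagonal(w, 0.0)
w /= w.sum(axis=1, keepdims=True)
y, z = rng.standard_normal(n), rng.standard_normal(n)
par = np.array([0.2, 1.0, 1.0, 1.0, 0.8, 0.5])
m = gmm_moments(par, y, x, z, w, lam_init=0.2)
print(m.shape)  # (8,): k + k + k + 1 + 1 moments
```

A GMM estimator would then minimize a quadratic form in `m` over `par`, iterating on `lam_init`.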
This GMM approach is, in particular, valuable if we have more than one spatial weight matrix in the
outcome equation, say
$$Y_n = \lambda_1 W_{1n}Y_n + \lambda_2 W_{2n}Y_n + X_n\beta + V_n,$$
as the GMM and 2SIV estimation can be easily extended without additional computational complication.
6.2 Control function method
For simplicity, we assume $z$ is a scalar in this section. The model is the same, $Y_n = \lambda W_nY_n + X_n\beta + V_n$ with $Z_n = X_n\gamma + \varepsilon_n$. We want to relax the distributional assumptions on $(v_i, \varepsilon_i)$. $X_n$ is still strictly exogenous, i.e., $(v_i, \varepsilon_i)$ is independent of $X_n$. Instead of joint normality, we assume
$$E(v_i|X_n, Z_i) = E(v_i|X_n, \varepsilon_i) = E(v_i|\varepsilon_i) = \delta\varepsilon_i.$$
Then we have
$$E(Y_n|X_n, \varepsilon_n) = \lambda W_nE(Y_n|X_n, \varepsilon_n) + X_n\beta + \delta\varepsilon_n.$$
The ordinary control function approach would run an OLS regression of $Y_n$ on $W_nY_n$, $X_n$, and $\hat\varepsilon_n$, where $\hat\varepsilon_n$ represents the reduced-form residuals. Here, however, OLS does not work because, even after controlling for $\varepsilon_n$, endogeneity remains through $W_nY_n$. Hence, we run an IV regression of $Y_n$ on $W_nY_n$, $X_n$, and $\hat\varepsilon_n$, using $W_nX_n$ as an instrumental variable for $W_nY_n$. This yields the same estimation method as the 2SIV estimation, but it relaxes the joint normality assumption.
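The two-step procedure just described can be sketched as follows. This is a minimal sketch under the section's assumptions, not the paper's implementation: the function name, the coefficient packing, and the toy simulation design (which loosely follows the later Monte Carlo setup) are all illustrative.

```python
import numpy as np

def control_function_2siv(y, x, z, w):
    """Step 1: OLS of z on x gives residuals eps_hat.
    Step 2: 2SLS of y on (Wy, x, eps_hat), instrumenting Wy with Wx."""
    gamma_hat, *_ = np.linalg.lstsq(x, z, rcond=None)
    eps_hat = z - x @ gamma_hat
    wy = w @ y
    regressors = np.column_stack([wy, x, eps_hat])
    # instruments: W x (non-intercept columns, since W is row-normalized and
    # W times a constant reproduces the intercept), x, and the control eps_hat
    instruments = np.column_stack([w @ x[:, 1:], x, eps_hat])
    proj, *_ = np.linalg.lstsq(instruments, regressors, rcond=None)
    coef, *_ = np.linalg.lstsq(instruments @ proj, y, rcond=None)
    return coef, gamma_hat  # coef = (lam_hat, beta_hat..., control coef)

# simulate data roughly following the paper's Monte Carlo design
rng = np.random.default_rng(2)
n = 400
x = np.column_stack([np.ones(n), rng.standard_normal(n)])
ve = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=n)
v, eps = ve[:, 0], ve[:, 1]
z = x @ np.array([1.0, 0.8]) + eps
wd = (rng.random((n, n)) < 0.02).astype(float)
wd = np.triu(wd, 1); wd = wd + wd.T                 # symmetric contiguity
diff = np.abs(z[:, None] - z[None, :])
w = wd * np.where(diff > 0, 1.0 / np.maximum(diff, 1e-12), 0.0)
np.fill_diagonal(w, 0.0)
rs = w.sum(axis=1, keepdims=True); rs[rs == 0.0] = 1.0
w = w / rs                                          # row-normalize
y = np.linalg.solve(np.eye(n) - 0.2 * w, x @ np.array([1.0, 1.0]) + v)
coef, gamma_hat = control_function_2siv(y, x, z, w)
print(coef[0])  # lambda estimate (should be near the true 0.2)
```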
Instead of joint normality, if we maintain the independence of the disturbances �n and "n in (2.1) and
(2.2), the log likelihood function in (3.3) can be regarded as a quasi-likelihood function. The MLE would be a
quasi-MLE (QMLE). As the consistency of the QMLE depends only on the �rst two moments of disturbances,
the QMLE would be consistent. The asymptotic variance would be in a "sandwich" form, which will involve
additional third and fourth moments of �n and "n because relevant conditional and unconditional linear and
quadratic forms consist of those disturbances (see Lee (2004) for the QML of the SAR model, and White (1982)).
6.3 Spatial correlation in the economic distance
Still $z$ is a scalar, but now we consider the case with a spatial structure in the formation of $z$. The outcome equation is the same, $Y_n = \lambda W_nY_n + X_n\beta + V_n$. For the $z$ equation, however, there is an MA structure in the disturbance term: $Z_n = X_n\gamma + u_n$ with $u_n = \varepsilon_n - \rho_2 W_{dn}\varepsilon_n$, where $W_n$ has the same structure as in Assumption 2 and $W_{dn}$ is the spatial weight matrix constructed from predetermined physical distances, i.e., $W_{dn}(i,j) = I(d(i,j) \le k_0)$. The joint normality assumption still holds:
$$(v_i,\varepsilon_i) \sim i.i.d.\ N\Big(0, \begin{pmatrix}\sigma_v^2 & \sigma_{v\varepsilon}\\ \sigma_{v\varepsilon} & \sigma_\varepsilon^2\end{pmatrix}\Big).$$
With joint normality, the log likelihood function is
$$\begin{aligned}
\ln L_n(\theta|Y_n,Z_n) ={}& -n\ln(2\pi) - \frac{n}{2}\ln[\sigma_\varepsilon^2(\sigma_v^2 - \sigma_\varepsilon^2\delta^2)] + \ln|S_n(\lambda)| + \ln|(I_n - \rho_2 W_{dn})^{-1}| \\
&- \frac{1}{2\sigma_\varepsilon^2}(Z_n - X_n\gamma)'(I_n - \rho_2 W_{dn}')^{-1}(I_n - \rho_2 W_{dn})^{-1}(Z_n - X_n\gamma) \\
&- \frac{1}{2(\sigma_v^2 - \sigma_\varepsilon^2\delta^2)}\big[S_n(\lambda)Y_n - X_n\beta - \delta(I_n - \rho_2 W_{dn})^{-1}(Z_n - X_n\gamma)\big]'\big[S_n(\lambda)Y_n - X_n\beta - \delta(I_n - \rho_2 W_{dn})^{-1}(Z_n - X_n\gamma)\big].
\end{aligned}$$
As $Y_n = S_n^{-1}(X_n\beta_0 + \varepsilon_n\delta_0 + \xi_n)$,
$$\begin{aligned}
&S_n(\lambda)Y_n - X_n\beta - \delta(I_n - \rho_2 W_{dn})^{-1}(Z_n - X_n\gamma) \\
&\quad= (S_n^{-1} - \lambda G_n)(X_n\beta_0 + \varepsilon_n\delta_0 + \xi_n) - X_n\beta - \delta(I_n - \rho_2 W_{dn})^{-1}\big[X_n(\gamma_0 - \gamma) + (I_n - \rho_{20}W_{dn})\varepsilon_n\big].
\end{aligned}$$
Let $G_{dn}(\rho_2) = W_{dn}(I_n - \rho_2 W_{dn})^{-1}$ and $G_{dn} = G_{dn}(\rho_{20})$. Because $G_{dn}$ is exogenous and uniformly bounded, it adds no complication to our analysis, so the most complicated term in the log likelihood function is still $\frac{1}{n}\varepsilon_n'G_n(\lambda)'G_n(\lambda)\varepsilon_n$. The difference from the previous analysis is that the spatial weight in the outcome equation is now
$$w_{ij} = g(z_i,z_j)I(d(i,j) \le k_0) = f(u_i,u_j)I(d(i,j) \le k_0) \quad\text{with}\quad u_i = \varepsilon_i - \rho_2\sum_{k\in B_i(k_0)}\varepsilon_k.$$
Thus, $w_{ij}$ is a function of all $\varepsilon$'s within $B_i(2k_0)$. For two nonzero weights $w_{ij}$ and $w_{kl}$, if the distance between locations $i$ and $k$ exceeds $4k_0$, they contain no common $\varepsilon$'s and hence are independent. Therefore, the previous arguments for consistency and asymptotic normality still hold.
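The locality claim, that a weight $w_{ij}$ depends only on $\varepsilon$'s within $B_i(2k_0)$, can be checked numerically. The sketch below places units on a line; the particular $\rho_2$ value, the inverse-distance weight function, and the helper name `weights` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k0, rho2 = 40, 2, 0.3
loc = np.arange(n)                         # locations on a line
dist = np.abs(loc[:, None] - loc[None, :])
wd = (dist <= k0).astype(float)
np.fill_diagonal(wd, 0.0)                  # W_dn(i,j) = I(d(i,j) <= k0), i != j

def weights(eps):
    u = eps - rho2 * (wd @ eps)            # MA form u = (I - rho2 W_d) eps
    diff = np.abs(u[:, None] - u[None, :])
    # w_ij = f(u_i, u_j) I(d(i,j) <= k0), with f an inverse-distance choice
    return np.where((dist <= k0) & (dist > 0),
                    1.0 / np.maximum(diff, 1e-12), 0.0)

eps = rng.standard_normal(n)
w1 = weights(eps)
eps2 = eps.copy(); eps2[20] += 5.0         # shock a unit far from location 0
w2 = weights(eps2)
# w_{0j} uses only eps's within B_0(2 k0), untouched by the shock at 20
print(np.allclose(w1[0, :k0 + 1], w2[0, :k0 + 1]))  # prints True
```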
7 Monte Carlo Simulations
7.1 Data generating process
In this section, we evaluate four estimation methods of a spatial autoregressive model with an endogenous
Wn. The data generating process (DGP) is
$$Y_n = (I_n - \lambda W_n)^{-1}(X_n\beta + V_n),$$
where $x_i = (x_{i1}, x_{i2})'$ with $x_{i1}$ being the intercept term and $x_{i2} \sim N(0,1)$; $\beta_1 = \beta_2 = 1$. We construct the row-normalized $W_n = (w_{ij})$ in this way:
1. Generate bivariate normal random variables $(v_i, \varepsilon_i)$ i.i.d. $N\big(0, \big(\begin{smallmatrix}1 & \delta\\ \delta & 1\end{smallmatrix}\big)\big)$ as the disturbances in the outcome equation and the spatial weight equation.
2. Construct the spatial weight matrix as the Hadamard product $W_n = W_n^d \circ W_n^e$, i.e., $w_{ij} = w_{ij}^d w_{ij}^e$, where $W_n^d$ is a predetermined matrix based on geographic distance, with $w_{ij}^d = 1$ if the two locations are neighbors and $w_{ij}^d = 0$ otherwise, and $W_n^e$ is a matrix based on economic similarity, with $w_{ij}^e = 1/|z_i - z_j|$ if $i \neq j$ and $w_{ii}^e = 0$, where the elements of $z$ are generated by $z_i = 1 + 0.8x_{i2} + \varepsilon_i$.
3. Row-normalize Wn.
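The three construction steps above can be sketched in a short routine; the helper name `build_weight_matrix` and the toy random contiguity matrix are illustrative assumptions, not part of the paper's designs.

```python
import numpy as np

def build_weight_matrix(wd, z):
    """Steps 2-3 of the DGP: Hadamard product of a 0/1 geographic matrix wd
    with an economic-similarity matrix 1/|z_i - z_j|, then row-normalized."""
    diff = np.abs(z[:, None] - z[None, :])
    we = np.where(diff > 0, 1.0 / np.maximum(diff, 1e-12), 0.0)
    np.fill_diagonal(we, 0.0)              # w^e_ii = 0
    w = wd * we                            # Hadamard product
    rs = w.sum(axis=1, keepdims=True)
    rs[rs == 0.0] = 1.0                    # rows with no neighbors stay zero
    return w / rs

rng = np.random.default_rng(0)
n = 6
# toy symmetric contiguity matrix standing in for W^d (e.g., state contiguity)
wd = (rng.random((n, n)) < 0.5).astype(float)
wd = np.triu(wd, 1); wd = wd + wd.T
x2 = rng.standard_normal(n)
eps = rng.standard_normal(n)
z = 1 + 0.8 * x2 + eps                     # step 1: z_i = 1 + 0.8 x_i2 + eps_i
W = build_weight_matrix(wd, z)
rows = W.sum(axis=1)
print(np.allclose(rows[rows > 0], 1.0))    # prints True: rows sum to 1
```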
For the predetermined $W_n^d$, we use four examples. The first is the U.S. states spatial weight matrix $W_S$ ($49 \times 49$), based on the contiguity of the 48 contiguous states and D.C. The second is the Toledo spatial weight matrix $W_O$ ($98 \times 98$), based on the 5 nearest neighbors of 98 census tracts in Toledo, Ohio. The third is the Iowa "Adjacency" spatial weight matrix $W_A$ ($361 \times 361$), based on the adjacency of 361 school districts in Iowa in 2009. The last is the Iowa "County" spatial weight matrix $W_C$ ($361 \times 361$), based on whether the 361 school districts are in the same county in Iowa in 2009.
The four estimation methods we consider are the conventional IV, the 2-stage IV, the conventional MLE of the SAR model, and the MLE we proposed in Section 2.4. A conventional method refers to treating the spatial weight matrix as exogenous. We refer to these four methods as IV, 2SIV, SAR, and MLE in the tables. Here 2SIV and MLE treat $W_n$ as endogenous: 2SIV estimates the $z$ equation by OLS in the first stage and then estimates the conditional outcome equation $Y_n = \lambda W_nY_n + X_n\beta + (Z_n - X_n\hat\gamma)\delta + \hat\xi_n$ by the IV method, choosing $W_nX_n$ as an IV for $W_nY_n$; the MLE estimates all the coefficients $(\lambda, \beta', \gamma', \sigma_\varepsilon^2, \sigma_v^2, \delta)'$ jointly by maximizing the log likelihood function
$$\begin{aligned}
\ln L(\theta|Y_n,Z_n) ={}& -n\ln(2\pi) - \frac{n}{2}\ln[\sigma_\varepsilon^2\sigma_v^2(1-\delta^2)] + \ln|S_n(\lambda)| - \frac{1}{2\sigma_\varepsilon^2}(Z_n - X_n\gamma)'(Z_n - X_n\gamma) \\
&- \frac{1}{2\sigma_v^2(1-\delta^2)}[S_n(\lambda)Y_n - X_n\beta - (Z_n - X_n\gamma)\delta\sigma_v/\sigma_\varepsilon]'[S_n(\lambda)Y_n - X_n\beta - (Z_n - X_n\gamma)\delta\sigma_v/\sigma_\varepsilon];
\end{aligned}$$
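This likelihood can be transcribed directly into an objective function for a numerical optimizer. The sketch below is an illustrative transcription, with the function name and the flat parameter packing $(\lambda, \beta, \gamma, \sigma_\varepsilon^2, \sigma_v^2, \delta)$ assumed.

```python
import numpy as np

def sar_endog_loglik(theta, y, x, z, w):
    """Log likelihood above for scalar z; packing is
    (lam, beta (k), gamma (k), sig2_eps, sig2_v, delta)."""
    n, k = x.shape
    lam = theta[0]
    beta = theta[1:1 + k]
    gamma = theta[1 + k:1 + 2 * k]
    sig2_eps, sig2_v, delta = theta[1 + 2 * k:]
    s = np.eye(n) - lam * w                      # S_n(lam)
    eps = z - x @ gamma                          # z-equation residual
    # conditional outcome residual with E(v|eps) = delta*(sigma_v/sigma_eps)*eps
    xi = s @ y - x @ beta - eps * delta * np.sqrt(sig2_v / sig2_eps)
    sign, logdet = np.linalg.slogdet(s)
    if sign <= 0:
        return -np.inf                           # outside the parameter space
    return (-n * np.log(2 * np.pi)
            - 0.5 * n * np.log(sig2_eps * sig2_v * (1 - delta ** 2))
            + logdet
            - eps @ eps / (2 * sig2_eps)
            - xi @ xi / (2 * sig2_v * (1 - delta ** 2)))

# toy evaluation (the data need not be model-generated just to evaluate it)
rng = np.random.default_rng(1)
n = 8
x = np.column_stack([np.ones(n), rng.standard_normal(n)])
w = rng.random((n, n)); np.fill_diagonal(w, 0.0)
w /= w.sum(axis=1, keepdims=True)
y, z = rng.standard_normal(n), rng.standard_normal(n)
theta = np.array([0.3, 1.0, 1.0, 1.0, 0.8, 1.0, 1.0, 0.5])
ll = sar_endog_loglik(theta, y, x, z, w)
print(ll)
```

Maximizing this over `theta` (e.g., with any generic optimizer, imposing `|delta| < 1` and positive variances) yields the MLE used in the tables.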
the IV and SAR methods treat $W_n$ as exogenous and estimate only the outcome equation (the $z$ equation is not estimated). From these, we want to see how estimation that misspecifies the spatial weight matrix as exogenous behaves when the actual one is endogenous. We choose correlation coefficients $\delta = 0.2$, $0.5$, and $0.8$ to control for different magnitudes of endogeneity. We also set the spatial correlation to $\lambda = 0.2$ or $0.4$ to see how the spatial correlation parameter affects the estimates. 1000 replications are carried out for each setting.$^4$
7.2 Monte Carlo results
Table 1: Estimates from the U.S. states spatial weight matrix
WS(49)                       λ = 0.2                                          λ = 0.4
          λ̂        β̂1       β̂2       γ̂1       γ̂2        λ̂        β̂1       β̂2       γ̂1       γ̂2
δ = 0.2
IV       0.119    1.1325   0.9658                        0.2804   1.199    0.9717
        (0.2336) (0.3996) (0.1494)                      (0.3358) (0.5931) (0.185)
2SIV     0.1849   1.0278   0.9739   1.0013   0.8027      0.3728   1.0458   0.9811   1.003    0.8001
        (0.2361) (0.3968) (0.1486) (0.144)  (0.1388)    (0.3391) (0.592)  (0.1809) (0.1395) (0.1651)
SAR      0.1354   1.1077   0.9779                        0.3078   1.1589   0.9925
        (0.1278) (0.2537) (0.1425)                      (0.1367) (0.2865) (0.1632)
MLE      0.177    1.0415   0.9826   1.0013   0.8027      0.3548   1.0812   0.9965   1.003    0.8001
        (0.1338) (0.2564) (0.1428) (0.144)  (0.1388)    (0.1402) (0.2858) (0.1634) (0.1395) (0.1651)
δ = 0.5
IV      -0.0067   1.254    0.9614                        0.2285   1.2959   0.991
        (0.2595) (0.3745) (0.1722)                      (0.2334) (0.4437) (0.1626)
2SIV     0.1819   1.0213   0.9833   1.0028   0.8017      0.3948   1.0142   0.9886   1.0008   0.8084
        (0.235)  (0.3244) (0.166)  (0.1404) (0.1572)    (0.2021) (0.3787) (0.1607) (0.1409) (0.155)
SAR      0.0686   1.1607   0.9804                        0.2739   1.2176   1.0014
        (0.1358) (0.2383) (0.1594)                      (0.1262) (0.2727) (0.158)
MLE      0.1753   1.0287   0.991    1.0028   0.8017      0.3711   1.0538   0.9979   1.0008   0.8084
        (0.1293) (0.2133) (0.1595) (0.1404) (0.1572)    (0.1125) (0.2413) (0.1571) (0.1409) (0.155)
δ = 0.8
IV      -0.0756   1.3012   0.8888                        0.2135   1.3561   0.9659
        (0.2401) (0.354)  (0.1686)                      (0.1728) (0.3907) (0.1263)
2SIV     0.1859   1.0147   0.9937   0.9969   0.8046      0.3943   1.0098   1.0013   0.9978   0.8042
        (0.1929) (0.2806) (0.1529) (0.15)   (0.1326)    (0.0988) (0.2392) (0.1172) (0.1434) (0.1145)
SAR      0.0183   1.1971   0.9287                        0.2643   1.2592   0.9797
        (0.1269) (0.2351) (0.1378)                      (0.1175) (0.2928) (0.1178)
MLE      0.1874   1.0129   0.9952   0.9972   0.804       0.3881   1.0223   1.0029   0.9978   0.805
        (0.0892) (0.1894) (0.1438) (0.1557) (0.1373)    (0.0756) (0.2115) (0.1213) (0.1494) (0.1216)
Note: Observations n = 49; β1 = β2 = γ1 = 1 and γ2 = 0.8.
$^4$We also tried the DGP with other specifications of β and γ; the results are similar.
Table 2: Estimates from the Toledo spatial weight matrix
WO(98)                       λ = 0.2                                          λ = 0.4
          λ̂        β̂1       β̂2       γ̂1       γ̂2        λ̂        β̂1       β̂2       γ̂1       γ̂2
δ = 0.2
IV       0.107    1.124    0.989                         0.2916   1.1851   0.9719
        (0.1826) (0.2638) (0.1155)                      (0.1903) (0.3392) (0.1175)
2SIV     0.1969   1.0076   0.9967   1.0001   0.7974      0.382    1.036    0.9905   0.9974   0.8004
        (0.1854) (0.2618) (0.1148) (0.0991) (0.1147)    (0.1871) (0.3318) (0.1154) (0.1034) (0.1013)
SAR      0.1313   1.0914   0.9956                        0.3214   1.1365   0.9813
        (0.1042) (0.1733) (0.1119)                      (0.1012) (0.2051) (0.1071)
MLE      0.1839   1.0233   0.9998   1.0001   0.7974      0.3776   1.0432   0.9928   0.9974   0.8004
        (0.1087) (0.1732) (0.1122) (0.0991) (0.1147)    (0.1018) (0.2015) (0.1076) (0.1034) (0.1013)
δ = 0.5
IV      -0.0114   1.2686   0.955                         0.2088   1.3263   0.9808
        (0.1625) (0.2468) (0.1123)                      (0.1522) (0.2937) (0.1042)
2SIV     0.1896   1.0121   0.9938   1.0009   0.7982      0.3869   1.0244   0.9961   1.0022   0.8
        (0.1419) (0.2068) (0.1067) (0.1063) (0.1048)    (0.1293) (0.2423) (0.1013) (0.1019) (0.1031)
SAR      0.0549   1.1829   0.9706                        0.2663   1.2297   0.989
        (0.102)  (0.1758) (0.103)                       (0.0976) (0.209)  (0.1004)
MLE      0.1792   1.0247   0.994    1.0009   0.7982      0.3791   1.0381   0.9981   1.0022   0.8
        (0.0939) (0.1547) (0.1028) (0.1063) (0.1048)    (0.0844) (0.1765) (0.1001) (0.1019) (0.1031)
δ = 0.8
IV      -0.0796   1.3667   0.9404                        0.1588   1.4063   0.991
        (0.1453) (0.2483) (0.1064)                      (0.1542) (0.2904) (0.1226)
2SIV     0.1982   0.9988   0.9968   1.0006   0.796       0.3953   1.0093   0.995    1.0016   0.7966
        (0.0866) (0.1581) (0.0978) (0.103)  (0.1002)    (0.0902) (0.1799) (0.1195) (0.0967) (0.1182)
SAR     -0.0145   1.2798   0.9562                        0.2044   1.3304   0.9966
        (0.1011) (0.1931) (0.0986)                      (0.102)  (0.2129) (0.1196)
MLE      0.1931   1.0062   0.9958   1.001    0.7951      0.3906   1.0209   0.9979   1.0037   0.7976
        (0.0634) (0.14)   (0.0971) (0.1057) (0.1008)    (0.0681) (0.1497) (0.1242) (0.1068) (0.124)
Note: Observations n = 98; β1 = β2 = γ1 = 1 and γ2 = 0.8.
Table 3: Estimates from the Iowa "Adjacency" spatial weight matrix
WA(361)                      λ = 0.2                                          λ = 0.4
          λ̂        β̂1       β̂2       γ̂1       γ̂2        λ̂        β̂1       β̂2       γ̂1       γ̂2
δ = 0.2
IV       0.1328   1.0825   0.9892                        0.3278   1.1225   0.9903
        (0.0823) (0.1195) (0.0563)                      (0.0766) (0.1441) (0.0519)
2SIV     0.2018   0.994    0.9999   1.0007   0.7993      0.3956   1.0064   0.9947   1.0024   0.7998
        (0.0792) (0.1136) (0.0558) (0.0552) (0.054)     (0.0743) (0.1375) (0.0519) (0.0509) (0.0516)
SAR      0.1548   1.0545   0.9937                        0.3535   1.0782   0.9931
        (0.0526) (0.0878) (0.0546)                      (0.0478) (0.0979) (0.0516)
MLE      0.1964   1.0012   1        1.0007   0.7993      0.3958   1.0058   0.9958   1.0024   0.7998
        (0.0517) (0.0851) (0.0548) (0.0552) (0.054)     (0.0470) (0.0944) (0.0518) (0.0509) (0.0516)
δ = 0.5
IV       0.0259   1.2331   0.9717                        0.2488   1.2546   0.9811
        (0.0743) (0.1182) (0.0547)                      (0.0755) (0.1439) (0.0561)
2SIV     0.1933   1.0103   0.9978   1        0.8021      0.4004   0.9955   1.0001   0.9994   0.7998
        (0.0645) (0.1004) (0.0532) (0.0519) (0.0504)    (0.0603) (0.1140) (0.0547) (0.055)  (0.0536)
SAR      0.0878   1.1505   0.9822                        0.3048   1.1592   0.989
        (0.0509) (0.0872) (0.0525)                      (0.05)   (0.1055) (0.0544)
MLE      0.1942   1.0088   0.9987   1        0.8021      0.3958   1.0037   1        0.9995   0.7998
        (0.0452) (0.0767) (0.0525) (0.0519) (0.0504)    (0.0409) (0.0875) (0.0543) (0.055)  (0.0546)
δ = 0.8
IV      -0.0814   1.3369   0.9509                        0.1342   1.4259   0.9746
        (0.0795) (0.1201) (0.0615)                      (0.0784) (0.148)  (0.0603)
2SIV     0.1989   1.001    0.9955   0.9989   0.7962      0.3987   1.0017   0.9955   0.9989   0.7962
        (0.046)  (0.076)  (0.058)  (0.0537) (0.0572)    (0.0421) (0.0852) (0.0578) (0.0537) (0.0572)
SAR     -0.0011   1.2409   0.9648                        0.2303   1.2721   0.9832
        (0.055)  (0.0963) (0.0586)                      (0.0537) (0.1124) (0.0584)
MLE      0.1968   1.0036   0.9955   0.9989   0.7963      0.3962   1.006    0.9945   0.9987   0.7956
        (0.0331) (0.0671) (0.0577) (0.0538) (0.0573)    (0.0359) (0.0856) (0.0802) (0.0631) (0.08)
Note: Observations n = 361; β1 = β2 = γ1 = 1 and γ2 = 0.8.
Table 4: Estimates from the Iowa "County" spatial weight matrix
WC(361)                      λ = 0.2                                          λ = 0.4
          λ̂        β̂1       β̂2       γ̂1       γ̂2        λ̂        β̂1       β̂2       γ̂1       γ̂2
δ = 0.2
IV       0.1578   1.0522   0.9927                        0.3627   1.0584   1.0004
        (0.0629) (0.098)  (0.0518)                      (0.0591) (0.1087) (0.0586)
2SIV     0.1967   1.003    0.9946   1.0024   0.7998      0.3955   1.0068   0.9947   0.9983   0.7978
        (0.0618) (0.0952) (0.0518) (0.0509) (0.0516)    (0.057)  (0.1049) (0.0588) (0.0509) (0.0579)
SAR      0.1778   1.0268   0.995                         0.3789   1.0332   0.9992
        (0.0389) (0.0723) (0.0517)                      (0.0353) (0.0794) (0.0582)
MLE      0.1977   1.0015   0.9959   1.0024   0.7998      0.3969   1.0048   0.996    0.9983   0.7978
        (0.0381) (0.0706) (0.0518) (0.0509) (0.0516)    (0.0336) (0.0765) (0.0582) (0.0509) (0.0579)
δ = 0.5
IV       0.1033   1.1129   0.994                         0.3121   1.1484   1.0019
        (0.0658) (0.0973) (0.0581)                      (0.06)   (0.1143) (0.0503)
2SIV     0.1967   1.0036   0.9946   0.9985   0.7969      0.3959   1.0088   0.9994   1.0026   0.8
        (0.0552) (0.0841) (0.0579) (0.0521) (0.0574)    (0.0469) (0.0909) (0.0503) (0.0533) (0.0522)
SAR      0.1478   1.0611   0.9955                        0.3601   1.0687   1.0016
        (0.0422) (0.077)  (0.0578)                      (0.0346) (0.0782) (0.0501)
MLE      0.1973   1.003    0.9956   0.9985   0.7969      0.397    1.0071   1.0003   1.0026   0.8
        (0.0356) (0.069)  (0.0578) (0.0521) (0.0574)    (0.0296) (0.0688) (0.0501) (0.0533) (0.0522)
δ = 0.8
IV       0.0511   1.1872   0.9783                        0.2731   1.1989   1.0093
        (0.0601) (0.0996) (0.0529)                      (0.0579) (0.1122) (0.0582)
2SIV     0.1975   1.0018   0.9958   1.0005   0.7977      0.3982   1.0025   0.9958   0.9989   0.7962
        (0.0357) (0.068)  (0.0519) (0.0508) (0.0516)    (0.0312) (0.0726) (0.058)  (0.0537) (0.0572)
SAR      0.1104   1.112    0.9861                        0.3351   1.1018   1.0034
        (0.0406) (0.0774) (0.0519)                      (0.0361) (0.0843) (0.058)
MLE      0.1981   1.0015   0.9967   1.0005   0.7986      0.3988   1.0024   0.9942   0.9994   0.7943
        (0.0272) (0.0765) (0.061)  (0.0659) (0.0597)    (0.0255) (0.0664) (0.0692) (0.058)  (0.0729)
Note: Observations n = 361; β1 = β2 = γ1 = 1 and γ2 = 0.8.
Tables 1-4 list the empirical means and standard errors (in parentheses) of the estimators from 1000 replications, using WS, WO, WA, and WC as the predetermined spatial weight matrices. In each table, the left panel shows the results for λ = 0.2 and the right panel for λ = 0.4. To see how the different estimation methods react to the magnitude of endogeneity, each table contains three sets of results corresponding to δ = 0.2, 0.5, and 0.8.
There are several patterns in the estimation results in all four tables:
1. For the accuracy of the estimates, we consider the empirical standard errors: IV is close to 2SIV, and SAR is close to MLE. In general, the likelihood-based methods perform better than the IV methods.
2. For the bias, 2SIV and MLE behave well in all cases. For the conventional IV and SAR methods, the more serious the endogeneity, i.e., the larger the correlation coefficient δ, the more bias they create. Most of the bias appears in the estimator of the spatial correlation, λ̂; the estimators of β are much less biased. The λ̂ from IV and SAR suffers severe downward bias once δ reaches 0.5, in some cases exceeding 100%. Although both create large biases, conventional SAR clearly creates less bias than conventional IV.
3. The patterns in the results are similar across the different values of the spatial correlation. When the true λ is relatively small (0.2 in our design), the estimation performs less well than with a larger λ (0.4 in our design): the bias is more serious and the standard errors are larger.
From Table 1 to Table 4, as the sample size increases, the bias and standard errors of the estimators decrease. When the sample size is 361, as with WA and WC, the estimates are quite precise and close to the true parameters.
8 Conclusion
In this paper, we consider the specification and estimation of a cross-sectional SAR model with an endogenous spatial weight matrix. First, we present the model specification with two sets of equations: one is the SAR outcome equation, and the other is the equation for the entries of the spatial weight matrix. The source of endogeneity is the correlation between the disturbances in the SAR outcome equation and the errors in the entry equation. Second, given the model specification and a joint normality assumption on the disturbances, we consider three estimation methods: the 2SIV approach, the MLE, and the GMM.
In deriving the consistency and asymptotic normality of the estimators from these methods, we need to consider statistics in linear-quadratic forms of the disturbances in which the quadratic matrix involves the spatial weight matrix, such as $\varepsilon_n'W_n\varepsilon_n$ and $\varepsilon_n'G_n\varepsilon_n$. These statistics differ from standard quadratic forms with nonstochastic quadratic matrices because the entries of the quadratic matrix are nonlinear functions of the disturbances. To prove these asymptotic properties, we impose some restrictions on the spatial weights: whether or not agents are neighbors depends only on the predetermined (physical) distance, but the magnitudes of neighbors' effects depend on the economic distance. Therefore, the spatial weight considered in this paper combines (physical) distance and economic distance, with the predetermined physical distances playing an important role in our analysis.
We then consider three extensions: having more than one spatial weight matrix in the SAR model, relaxing the joint normality of the disturbances, and introducing spatial correlation into the entries of the spatial weights. Without much modification, the estimation methods used for the basic model can be applied to these situations.
To examine the behavior of our proposed estimation methods in finite samples, we use Monte Carlo simulations. The simulation results indicate serious downward bias if we treat $W_n$ as exogenous when the true DGP is endogenous. As the sample size increases, the estimates approach the true parameters.
This paper focuses on estimating a cross-sectional SAR model with a specified source of endogeneity from the spatial weight matrix. In future research, we are also interested in estimating the SAR model in a spatial panel data setting where the spatial weight matrix varies over time for endogenous reasons. Another topic that needs further research is an endogenous spatial weight matrix constructed purely from economic distance. This is a challenging issue because, as the number of observations increases, there may be no strict rule that keeps agents apart from each other. In this setting, agents can be highly correlated with one another, so alternative large-sample theorems need to be developed.
References
[1] Anselin, L. (1980), Estimation methods for spatial autoregressive structures, Regional Science Dissertation and Monograph Series, Cornell University, Ithaca, NY.
[2] Anselin, L. and A. Bera (1997), Spatial dependence in linear regression models with an introduction to spatial econometrics, in Handbook of Economic Statistics, D. Giles and A. Ullah, Eds., Marcel Dekker, NY.
[3] Case, A., H. Rosen, and J. Hines (1993), Budget spillovers and fiscal policy interdependence: Evidence from the states, Journal of Public Economics 52, 285-307.
[4] Conway, K. and J. Rork (2004), Diagnosis murder: The death of state death taxes, Economic Inquiry 42, 537-559.
[5] Crabbé, K. and H. Vandenbussche (2008), Spatial tax competition in the EU15, working paper.
[6] Hsieh, C. and L. Lee (2011), A social interactions model with endogenous friendship, working paper.
[7] Kelejian, H. and I. Prucha (1998), A generalized spatial two stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances, Journal of Real Estate Finance and Economics 17, 99-121.
[8] Kelejian, H. and I. Prucha (1999), A generalized moments estimator for the autoregressive parameter in a spatial model, International Economic Review 40, 509-533.
[9] Jenish, N. and I. Prucha (2009), Central limit theorems and uniform laws of large numbers for arrays of random fields, Journal of Econometrics 150, 86-98.
[10] Lee, L. (2004), Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models, Econometrica 72, 1899-1925.
[11] Lee, L. (2007), GMM and 2SLS estimation of mixed regressive, spatial autoregressive models, Journal of Econometrics 137, 489-514.
[12] Lee, L. and X. Liu (2010), Efficient GMM estimation of high order spatial autoregressive models with autoregressive disturbances, Econometric Theory 26, 187-230.
[13] Lin, X. and L. Lee (2010), GMM estimation of spatial autoregressive models with unknown heteroskedasticity, Journal of Econometrics 157, 34-52.
[14] Liu, X., L. Lee, and C. Bollinger (2010), Improved efficient quasi maximum likelihood estimator of spatial autoregressive models, Journal of Econometrics 159, 303-319.
[15] Ord, J. (1975), Estimation methods for models of spatial interaction, Journal of the American Statistical Association 70, 120-126.
[16] Pinkse, J. and M. Slade (2010), The future of spatial econometrics, Journal of Regional Science 50, 103-117.
[17] Qu, X. and L. Lee (2012), LM tests for spatial correlation in spatial models with limited dependent variables, Regional Science and Urban Economics 42, 430-445.
[18] White, H. (1982), Maximum likelihood estimation of misspecified models, Econometrica 50, 1-26.
Appendix A: Log likelihood function and its first and second order derivatives
A.1 Log likelihood function and its expectation
As $\sigma_\xi^2 = \sigma_v^2 - \sigma_{v\varepsilon}'\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon}$, the log likelihood function in (3.3) can be written as
$$\ln L_n(\theta) = -n\ln(2\pi) - \frac{n}{2}\ln\sigma_\xi^2 - \frac{n}{2}\ln|\Sigma_\varepsilon| + \ln|S_n(\lambda)| - \frac{1}{2}\operatorname{vec}(Z_n - X_n\Gamma)'(\Sigma_\varepsilon^{-1}\otimes I_n)\operatorname{vec}(Z_n - X_n\Gamma) - \frac{1}{2\sigma_\xi^2}\xi_n(\theta)'\xi_n(\theta),$$
where $\xi_n(\theta) = S_n(\lambda)Y_n - X_n\beta - (Z_n - X_n\Gamma)\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon}$. The corresponding expectation is
$$\begin{aligned}
\frac{1}{n}E(\ln L_n(\theta)) ={}& -\ln(2\pi) - \frac{1}{2}\ln(\sigma_\xi^2) - \frac{1}{2}\ln|\Sigma_\varepsilon| - \frac{1}{2n}\sum_{i=1}^n x_i'(\Gamma_0 - \Gamma)\Sigma_\varepsilon^{-1}(\Gamma_0 - \Gamma)'x_i \\
&+ \frac{1}{n}E(\ln|S_n(\lambda)|) - \frac{1}{2}\operatorname{tr}(\Sigma_\varepsilon^{-1}\Sigma_{\varepsilon 0}) - \frac{\sigma_{\xi 0}^2}{2\sigma_\xi^2}\operatorname{tr}\Big[E\frac{1}{n}(S_n^{-1} - \lambda G_n)'(S_n^{-1} - \lambda G_n)\Big] \\
&- \frac{1}{2\sigma_\xi^2}E\frac{1}{n}(X_n\beta_0 + \varepsilon_n\delta_0)'(S_n^{-1} - \lambda G_n)'(S_n^{-1} - \lambda G_n)(X_n\beta_0 + \varepsilon_n\delta_0) \\
&- \frac{1}{2\sigma_\xi^2}[\beta + (\Gamma_0 - \Gamma)\delta]'\frac{X_n'X_n}{n}[\beta + (\Gamma_0 - \Gamma)\delta] - \frac{1}{2\sigma_\xi^2}\delta'\Sigma_{\varepsilon 0}\delta \\
&+ \frac{1}{\sigma_\xi^2}E\frac{1}{n}(X_n\beta_0 + \varepsilon_n\delta_0)'(S_n^{-1} - \lambda G_n)'\{X_n[\beta + (\Gamma_0 - \Gamma)\delta] + \varepsilon_n\delta\}, \qquad (A.1)
\end{aligned}$$
where $\delta = \Sigma_\varepsilon^{-1}\sigma_{v\varepsilon}$.
A.2 First order derivatives and scores
The first order derivatives are:
$$\frac{\partial\ln L_n(\theta|Y_n,Z_n)}{\partial\beta} = \frac{1}{\sigma_\xi^2}X_n'\xi_n(\theta), \qquad
\frac{\partial\ln L_n(\theta|Y_n,Z_n)}{\partial\lambda} = \frac{1}{\sigma_\xi^2}(W_nY_n)'\xi_n(\theta) - \operatorname{tr}[W_nS_n^{-1}(\lambda)],$$
$$\frac{\partial\ln L_n(\theta|Y_n,Z_n)}{\partial\operatorname{vec}(\Gamma)} = (\Sigma_\varepsilon^{-1}\otimes X_n')\operatorname{vec}(Z_n - X_n\Gamma) - \frac{1}{\sigma_\xi^2}[(\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon})\otimes X_n']\xi_n(\theta),$$
$$\frac{\partial\ln L_n(\theta|Y_n,Z_n)}{\partial\sigma_v^2} = -\frac{n}{2\sigma_\xi^2} + \frac{1}{2\sigma_\xi^4}\xi_n(\theta)'\xi_n(\theta),$$
$$\begin{aligned}
\frac{\partial\ln L_n(\theta|Y_n,Z_n)}{\partial\alpha_j} ={}& -\frac{n}{2}\operatorname{tr}\Big(\Sigma_\varepsilon^{-1}\frac{\partial\Sigma_\varepsilon}{\partial\alpha_j}\Big) + \frac{1}{2}\operatorname{vec}(Z_n - X_n\Gamma)'\Big(\Sigma_\varepsilon^{-1}\frac{\partial\Sigma_\varepsilon}{\partial\alpha_j}\Sigma_\varepsilon^{-1}\otimes I_n\Big)\operatorname{vec}(Z_n - X_n\Gamma) \\
&- \frac{1}{\sigma_\xi^2}\xi_n(\theta)'(Z_n - X_n\Gamma)\Sigma_\varepsilon^{-1}\frac{\partial\Sigma_\varepsilon}{\partial\alpha_j}\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon} - \frac{n}{2\sigma_\xi^2}\sigma_{v\varepsilon}'\Sigma_\varepsilon^{-1}\frac{\partial\Sigma_\varepsilon}{\partial\alpha_j}\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon} \\
&+ \frac{1}{2\sigma_\xi^4}\sigma_{v\varepsilon}'\Sigma_\varepsilon^{-1}\frac{\partial\Sigma_\varepsilon}{\partial\alpha_j}\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon}\,\xi_n(\theta)'\xi_n(\theta),
\end{aligned}$$
where $\alpha_j$ is the $j$th element of $\alpha$;
$$\frac{\partial\ln L_n(\theta|Y_n,Z_n)}{\partial\sigma_{v\varepsilon}} = \frac{n\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon}}{\sigma_\xi^2} + \frac{1}{\sigma_\xi^2}[(Z_n - X_n\Gamma)\Sigma_\varepsilon^{-1}]'\xi_n(\theta) - \frac{\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon}}{\sigma_\xi^4}\xi_n(\theta)'\xi_n(\theta).$$
The corresponding scores are:
$$\frac{\partial\ln L_n(\theta_0)}{\partial\beta} = \frac{1}{\sigma_{\xi 0}^2}X_n'\xi_n, \qquad
\frac{\partial\ln L_n(\theta_0)}{\partial\lambda} = -\operatorname{tr}(W_nS_n^{-1}) + \frac{1}{\sigma_{\xi 0}^2}(W_nY_n)'\xi_n,$$
$$\frac{\partial\ln L_n(\theta_0)}{\partial\operatorname{vec}(\Gamma)} = (\Sigma_{\varepsilon 0}^{-1}\otimes X_n')\operatorname{vec}(\varepsilon_n) - \frac{1}{\sigma_{\xi 0}^2}[(\Sigma_{\varepsilon 0}^{-1}\sigma_{v\varepsilon 0})\otimes X_n']\xi_n,$$
$$\frac{\partial\ln L_n(\theta_0)}{\partial\sigma_v^2} = -\frac{n}{2\sigma_{\xi 0}^2} + \frac{1}{2\sigma_{\xi 0}^4}\xi_n'\xi_n,$$
$$\begin{aligned}
\frac{\partial\ln L_n(\theta_0)}{\partial\alpha_j} ={}& -\frac{n}{2}\operatorname{tr}\Big(\Sigma_{\varepsilon 0}^{-1}\frac{\partial\Sigma_{\varepsilon 0}}{\partial\alpha_j}\Big) + \frac{1}{2}\operatorname{vec}(\varepsilon_n)'\Big(\Sigma_{\varepsilon 0}^{-1}\frac{\partial\Sigma_{\varepsilon 0}}{\partial\alpha_j}\Sigma_{\varepsilon 0}^{-1}\otimes I_n\Big)\operatorname{vec}(\varepsilon_n) - \frac{\xi_n'\varepsilon_n\Sigma_{\varepsilon 0}^{-1}}{\sigma_{\xi 0}^2}\frac{\partial\Sigma_{\varepsilon 0}}{\partial\alpha_j}\Sigma_{\varepsilon 0}^{-1}\sigma_{v\varepsilon 0} \\
&- \frac{n\sigma_{v\varepsilon 0}'\Sigma_{\varepsilon 0}^{-1}}{2\sigma_{\xi 0}^2}\frac{\partial\Sigma_{\varepsilon 0}}{\partial\alpha_j}\Sigma_{\varepsilon 0}^{-1}\sigma_{v\varepsilon 0} + \frac{1}{2\sigma_{\xi 0}^4}\sigma_{v\varepsilon 0}'\Sigma_{\varepsilon 0}^{-1}\frac{\partial\Sigma_{\varepsilon 0}}{\partial\alpha_j}\Sigma_{\varepsilon 0}^{-1}\sigma_{v\varepsilon 0}\,\xi_n'\xi_n,
\end{aligned}$$
$$\frac{\partial\ln L_n(\theta_0)}{\partial\sigma_{v\varepsilon}} = \frac{n\Sigma_{\varepsilon 0}^{-1}\sigma_{v\varepsilon 0}}{\sigma_{\xi 0}^2} + \frac{\Sigma_{\varepsilon 0}^{-1}}{\sigma_{\xi 0}^2}\varepsilon_n'\xi_n - \frac{\Sigma_{\varepsilon 0}^{-1}\sigma_{v\varepsilon 0}}{\sigma_{\xi 0}^4}\xi_n'\xi_n.$$
A.3 Second order derivatives and the expectation
Expressions of the second order derivatives are:
$$\frac{\partial^2\ln L_n(\theta|Y_n,Z_n)}{\partial\beta\partial\beta'} = -\frac{1}{\sigma_\xi^2}X_n'X_n, \qquad
\frac{\partial^2\ln L_n(\theta|Y_n,Z_n)}{\partial\beta\partial\lambda} = -\frac{1}{\sigma_\xi^2}X_n'W_nY_n,$$
$$\frac{\partial^2\ln L_n(\theta|Y_n,Z_n)}{\partial\beta\partial\operatorname{vec}(\Gamma)'} = \frac{1}{\sigma_\xi^2}[(\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon})\otimes X_n']X_n, \qquad
\frac{\partial^2\ln L_n(\theta|Y_n,Z_n)}{\partial\beta\partial\sigma_v^2} = -\frac{1}{\sigma_\xi^4}X_n'\xi_n(\theta),$$
$$\frac{\partial^2\ln L_n(\theta|Y_n,Z_n)}{\partial\beta\partial\alpha_j} = \frac{1}{\sigma_\xi^2}X_n'(Z_n - X_n\Gamma)\Sigma_\varepsilon^{-1}\frac{\partial\Sigma_\varepsilon}{\partial\alpha_j}\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon} - \frac{\sigma_{v\varepsilon}'\Sigma_\varepsilon^{-1}}{\sigma_\xi^4}\frac{\partial\Sigma_\varepsilon}{\partial\alpha_j}\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon}\,X_n'\xi_n(\theta),$$
$$\frac{\partial^2\ln L_n(\theta|Y_n,Z_n)}{\partial\beta\partial\sigma_{v\varepsilon}'} = \frac{2}{\sigma_\xi^4}X_n'\xi_n(\theta)\sigma_{v\varepsilon}'\Sigma_\varepsilon^{-1} - \frac{1}{\sigma_\xi^2}X_n'(Z_n - X_n\Gamma)\Sigma_\varepsilon^{-1},$$
$$\frac{\partial^2\ln L_n(\theta|Y_n,Z_n)}{\partial\lambda\partial\lambda} = -\operatorname{tr}\{[W_nS_n^{-1}(\lambda)]^2\} - \frac{1}{\sigma_\xi^2}(W_nY_n)'W_nY_n,$$
$$\frac{\partial^2\ln L_n(\theta|Y_n,Z_n)}{\partial\lambda\partial\operatorname{vec}(\Gamma)'} = \frac{1}{\sigma_\xi^2}[(\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon})\otimes X_n']W_nY_n, \qquad
\frac{\partial^2\ln L_n(\theta|Y_n,Z_n)}{\partial\lambda\partial\sigma_v^2} = -\frac{1}{\sigma_\xi^4}(W_nY_n)'\xi_n(\theta),$$
$$\frac{\partial^2\ln L_n(\theta|Y_n,Z_n)}{\partial\lambda\partial\alpha_j} = \frac{1}{\sigma_\xi^2}(W_nY_n)'(Z_n - X_n\Gamma)\Sigma_\varepsilon^{-1}\frac{\partial\Sigma_\varepsilon}{\partial\alpha_j}\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon} - \frac{\sigma_{v\varepsilon}'\Sigma_\varepsilon^{-1}}{\sigma_\xi^4}\frac{\partial\Sigma_\varepsilon}{\partial\alpha_j}\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon}\,\xi_n(\theta)'W_nY_n,$$
$$\frac{\partial^2\ln L_n(\theta|Y_n,Z_n)}{\partial\lambda\partial\sigma_{v\varepsilon}'} = \frac{2\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon}}{\sigma_\xi^4}\xi_n(\theta)'(W_nY_n) - \frac{1}{\sigma_\xi^2}[(Z_n - X_n\Gamma)\Sigma_\varepsilon^{-1}]'(W_nY_n),$$
$$\frac{\partial^2\ln L_n(\theta|Y_n,Z_n)}{\partial\operatorname{vec}(\Gamma)\partial\operatorname{vec}(\Gamma)'} = -\Sigma_\varepsilon^{-1}\otimes(X_n'X_n) - \frac{1}{\sigma_\xi^2}[(\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon}\sigma_{v\varepsilon}'\Sigma_\varepsilon^{-1})\otimes(X_n'X_n)],$$
$$\frac{\partial^2\ln L_n(\theta|Y_n,Z_n)}{\partial\operatorname{vec}(\Gamma)\partial\sigma_v^2} = \frac{1}{\sigma_\xi^4}[(\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon})\otimes X_n']\xi_n(\theta), \qquad
\frac{\partial^2\ln L_n(\theta|Y_n,Z_n)}{\partial\sigma_v^2\partial\sigma_v^2} = \frac{n}{2\sigma_\xi^4} - \frac{1}{\sigma_\xi^6}\xi_n(\theta)'\xi_n(\theta),$$
$$\frac{\partial^2\ln L_n(\theta|Y_n,Z_n)}{\partial\sigma_v^2\partial\sigma_{v\varepsilon}} = -\frac{n\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon}}{\sigma_\xi^4} - \frac{[(Z_n - X_n\Gamma)\Sigma_\varepsilon^{-1}]'}{\sigma_\xi^4}\xi_n(\theta) + \frac{2\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon}}{\sigma_\xi^6}\xi_n(\theta)'\xi_n(\theta),$$
$$\begin{aligned}
\frac{\partial^2\ln L_n(\theta|Y_n,Z_n)}{\partial\operatorname{vec}(\Gamma)\partial\alpha_j} ={}& -\Big(\Sigma_\varepsilon^{-1}\frac{\partial\Sigma_\varepsilon}{\partial\alpha_j}\Sigma_\varepsilon^{-1}\otimes X_n'\Big)\operatorname{vec}(Z_n - X_n\Gamma) + \frac{\sigma_{v\varepsilon}'\Sigma_\varepsilon^{-1}}{\sigma_\xi^4}\frac{\partial\Sigma_\varepsilon}{\partial\alpha_j}\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon}\,[(\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon})\otimes X_n']\xi_n(\theta) \\
&+ \frac{1}{\sigma_\xi^2}\Big[\Big(\Sigma_\varepsilon^{-1}\frac{\partial\Sigma_\varepsilon}{\partial\alpha_j}\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon}\Big)\otimes X_n'\Big]\xi_n(\theta) - \frac{1}{\sigma_\xi^2}[(\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon})\otimes X_n']\Big[(Z_n - X_n\Gamma)\Sigma_\varepsilon^{-1}\frac{\partial\Sigma_\varepsilon}{\partial\alpha_j}\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon}\Big],
\end{aligned}$$
$$\frac{\partial^2\ln L_n(\theta|Y_n,Z_n)}{\partial\operatorname{vec}(\Gamma)\partial\sigma_{v\varepsilon}'} = \frac{1}{\sigma_\xi^2}(\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon})\otimes X_n'[(Z_n - X_n\Gamma)\Sigma_\varepsilon^{-1}] - \frac{\Sigma_\varepsilon^{-1}}{\sigma_\xi^2}\otimes(X_n'\xi_n(\theta)) - \frac{2}{\sigma_\xi^4}[(\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon})\otimes X_n']\xi_n(\theta)\sigma_{v\varepsilon}'\Sigma_\varepsilon^{-1},$$
$$\frac{\partial^2\ln L_n(\theta|Y_n,Z_n)}{\partial\sigma_v^2\partial\alpha_j} = \frac{\xi_n(\theta)'}{\sigma_\xi^4}(Z_n - X_n\Gamma)\Sigma_\varepsilon^{-1}\frac{\partial\Sigma_\varepsilon}{\partial\alpha_j}\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon} + \frac{n\sigma_{v\varepsilon}'\Sigma_\varepsilon^{-1}}{2\sigma_\xi^4}\frac{\partial\Sigma_\varepsilon}{\partial\alpha_j}\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon} - \frac{\sigma_{v\varepsilon}'\Sigma_\varepsilon^{-1}}{\sigma_\xi^6}\frac{\partial\Sigma_\varepsilon}{\partial\alpha_j}\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon}\,\xi_n(\theta)'\xi_n(\theta),$$
$$\begin{aligned}
\frac{\partial^2\ln L_n(\theta|Y_n,Z_n)}{\partial\alpha_j\partial\alpha_k} ={}& \frac{1}{2}\operatorname{vec}(Z_n - X_n\Gamma)'\Big\{\Big[\Sigma_\varepsilon^{-1}\Big(\frac{\partial^2\Sigma_\varepsilon}{\partial\alpha_j\partial\alpha_k} - 2\frac{\partial\Sigma_\varepsilon}{\partial\alpha_k}\Sigma_\varepsilon^{-1}\frac{\partial\Sigma_\varepsilon}{\partial\alpha_j}\Big)\Sigma_\varepsilon^{-1}\Big]\otimes I_n\Big\}\operatorname{vec}(Z_n - X_n\Gamma) \\
&+ \frac{n}{2}\operatorname{tr}\Big[\Sigma_\varepsilon^{-1}\Big(\frac{\partial\Sigma_\varepsilon}{\partial\alpha_k}\Sigma_\varepsilon^{-1}\frac{\partial\Sigma_\varepsilon}{\partial\alpha_j} - \frac{\partial^2\Sigma_\varepsilon}{\partial\alpha_j\partial\alpha_k}\Big)\Big] \\
&+ \frac{n\sigma_{v\varepsilon}'\Sigma_\varepsilon^{-1}}{2\sigma_\xi^4}\Big[\frac{\partial\Sigma_\varepsilon}{\partial\alpha_k}\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon}\sigma_{v\varepsilon}'\Sigma_\varepsilon^{-1}\frac{\partial\Sigma_\varepsilon}{\partial\alpha_j} - \sigma_\xi^2\Big(\frac{\partial^2\Sigma_\varepsilon}{\partial\alpha_j\partial\alpha_k} - 2\frac{\partial\Sigma_\varepsilon}{\partial\alpha_k}\Sigma_\varepsilon^{-1}\frac{\partial\Sigma_\varepsilon}{\partial\alpha_j}\Big)\Big]\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon} \\
&+ \frac{2\sigma_{v\varepsilon}'\Sigma_\varepsilon^{-1}}{\sigma_\xi^4}\frac{\partial\Sigma_\varepsilon}{\partial\alpha_j}\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon}\,\xi_n(\theta)'\Big[(Z_n - X_n\Gamma)\Sigma_\varepsilon^{-1}\frac{\partial\Sigma_\varepsilon}{\partial\alpha_k}\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon}\Big] \\
&- \frac{\sigma_{v\varepsilon}'\Sigma_\varepsilon^{-1}}{\sigma_\xi^2}\Big\{\Big[(Z_n - X_n\Gamma)\Sigma_\varepsilon^{-1}\Big(\frac{\partial^2\Sigma_\varepsilon}{\partial\alpha_j\partial\alpha_k} - 2\frac{\partial\Sigma_\varepsilon}{\partial\alpha_k}\Sigma_\varepsilon^{-1}\frac{\partial\Sigma_\varepsilon}{\partial\alpha_j}\Big)\Big]'\xi_n(\theta) \\
&\qquad + \Big[(Z_n - X_n\Gamma)\Sigma_\varepsilon^{-1}\frac{\partial\Sigma_\varepsilon}{\partial\alpha_k}\Big]'(Z_n - X_n\Gamma)\Sigma_\varepsilon^{-1}\frac{\partial\Sigma_\varepsilon}{\partial\alpha_j}\Big\}\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon} \\
&+ \frac{\sigma_{v\varepsilon}'\Sigma_\varepsilon^{-1}}{2\sigma_\xi^6}\Big[\sigma_\xi^2\Big(\frac{\partial^2\Sigma_\varepsilon}{\partial\alpha_j\partial\alpha_k} - 2\frac{\partial\Sigma_\varepsilon}{\partial\alpha_k}\Sigma_\varepsilon^{-1}\frac{\partial\Sigma_\varepsilon}{\partial\alpha_j}\Big) - 2\frac{\partial\Sigma_\varepsilon}{\partial\alpha_k}\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon}\sigma_{v\varepsilon}'\Sigma_\varepsilon^{-1}\frac{\partial\Sigma_\varepsilon}{\partial\alpha_j}\Big]\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon}\,\xi_n(\theta)'\xi_n(\theta),
\end{aligned}$$
$$\frac{\partial^2\ln L_n(\theta|Y_n,Z_n)}{\partial\sigma_{v\varepsilon}\partial\sigma_{v\varepsilon}'} = \frac{2n\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon}\sigma_{v\varepsilon}'\Sigma_\varepsilon^{-1}}{\sigma_\xi^4} + \frac{n\Sigma_\varepsilon^{-1}}{\sigma_\xi^2} - \frac{\Sigma_\varepsilon^{-1}(Z_n - X_n\Gamma)'}{\sigma_\xi^2}(Z_n - X_n\Gamma)\Sigma_\varepsilon^{-1} + \frac{4\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon}}{\sigma_\xi^4}\xi_n(\theta)'(Z_n - X_n\Gamma)\Sigma_\varepsilon^{-1} - \frac{4\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon}\sigma_{v\varepsilon}'\Sigma_\varepsilon^{-1} + \sigma_\xi^2\Sigma_\varepsilon^{-1}}{\sigma_\xi^6}\xi_n(\theta)'\xi_n(\theta),$$
$$\begin{aligned}
\frac{\partial^2\ln L_n(\theta|Y_n,Z_n)}{\partial\sigma_{v\varepsilon}\partial\alpha_j} ={}& -\frac{\sigma_{v\varepsilon}'\Sigma_\varepsilon^{-1}}{\sigma_\xi^4}\frac{\partial\Sigma_\varepsilon}{\partial\alpha_j}\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon}\,[(Z_n - X_n\Gamma)\Sigma_\varepsilon^{-1}]'\xi_n(\theta) - \frac{n\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon}\sigma_{v\varepsilon}'\Sigma_\varepsilon^{-1}}{\sigma_\xi^4}\frac{\partial\Sigma_\varepsilon}{\partial\alpha_j}\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon} \\
&+ \frac{\Sigma_\varepsilon^{-1}}{\sigma_\xi^2}\Big\{(Z_n - X_n\Gamma)'(Z_n - X_n\Gamma)\Sigma_\varepsilon^{-1}\frac{\partial\Sigma_\varepsilon}{\partial\alpha_j}\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon} - \Big[(Z_n - X_n\Gamma)\Sigma_\varepsilon^{-1}\frac{\partial\Sigma_\varepsilon}{\partial\alpha_j}\Big]'\xi_n(\theta)\Big\} \\
&- \frac{2\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon}}{\sigma_\xi^4}\xi_n(\theta)'\Big[(Z_n - X_n\Gamma)\Sigma_\varepsilon^{-1}\frac{\partial\Sigma_\varepsilon}{\partial\alpha_j}\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon}\Big] - \frac{n\Sigma_\varepsilon^{-1}}{\sigma_\xi^2}\frac{\partial\Sigma_\varepsilon}{\partial\alpha_j}\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon} \\
&+ \frac{1}{\sigma_\xi^6}[2\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon}\sigma_{v\varepsilon}'\Sigma_\varepsilon^{-1} + \sigma_\xi^2\Sigma_\varepsilon^{-1}]\frac{\partial\Sigma_\varepsilon}{\partial\alpha_j}\Sigma_\varepsilon^{-1}\sigma_{v\varepsilon}\,\xi_n(\theta)'\xi_n(\theta).
\end{aligned}$$
Expectation of the second order derivative matrix is
$$E\Big(\frac{\partial^2\ln L_n(\theta_0)}{\partial\theta\partial\theta'}\Big) = \frac{1}{\sigma_{\xi 0}^2}\begin{pmatrix}
I_{\lambda\lambda} & I_{\lambda\beta}' & I_{\lambda\Gamma}' & I_{\lambda v} & I_{\lambda\alpha}' & I_{\lambda\sigma} \\
* & -X_n'X_n & I_{\beta\Gamma} & 0 & 0 & 0 \\
* & * & I_{\Gamma\Gamma} & 0 & 0 & 0 \\
* & 0 & 0 & I_{vv} & I_{v\alpha} & I_{v\sigma} \\
* & 0 & 0 & * & I_{\alpha\alpha} & I_{\alpha\sigma} \\
* & 0 & 0 & * & * & I_{\sigma\sigma}
\end{pmatrix},$$
where
$$\begin{aligned}
I_{\lambda\lambda} &= -\sigma_{\xi 0}^2\operatorname{tr}[E(G_n^2 + G_nG_n')] - \beta_0'X_n'E(G_n'G_n)X_n\beta_0 - \delta_0'E(\varepsilon_n'G_n'G_n\varepsilon_n)\delta_0 - 2\delta_0'E(\varepsilon_n'G_n'G_n)X_n\beta_0, \\
I_{\lambda\beta} &= -X_n'E(G_nX_n\beta_0 + G_n\varepsilon_n\delta_0), \qquad
I_{\lambda\Gamma} = (\delta_0\otimes X_n')E(G_nX_n\beta_0 + G_n\varepsilon_n\delta_0), \qquad
I_{\lambda v} = -E[\operatorname{tr}(G_n)], \\
(I_{\lambda\alpha})_j &= \delta_0'\frac{\partial\Sigma_{\varepsilon 0}}{\partial\alpha_j}\Sigma_{\varepsilon 0}^{-1}E[\varepsilon_n'G_n(X_n\beta_0 + \varepsilon_n\delta_0) - \Sigma_{\varepsilon 0}\delta_0\operatorname{tr}(G_n)], \\
I_{\lambda\sigma} &= 2\delta_0E[\operatorname{tr}(G_n)] - \Sigma_{\varepsilon 0}^{-1}E[\varepsilon_n'G_n(X_n\beta_0 + \varepsilon_n\delta_0)], \\
I_{\beta\Gamma} &= \delta_0'\otimes(X_n'X_n), \qquad
I_{\Gamma\Gamma} = -(\Sigma_{\varepsilon 0}^{-1}\sigma_{\xi 0}^2 + \delta_0\delta_0')\otimes(X_n'X_n) = -(\Sigma_{\varepsilon 0}^{-1}\sigma_{v0}^2)\otimes(X_n'X_n), \\
I_{vv} &= -\frac{n}{2\sigma_{\xi 0}^2}, \qquad
I_{v\alpha} = -\frac{n}{2\sigma_{\xi 0}^2}\delta_0'\frac{\partial\Sigma_{\varepsilon 0}}{\partial\alpha_j}\delta_0, \qquad
I_{v\sigma} = \frac{n\delta_0}{\sigma_{\xi 0}^2}, \\
(I_{\alpha\alpha})_{jk} &= -\frac{n\sigma_{\xi 0}^2}{2}\operatorname{tr}\Big(\frac{\partial\Sigma_{\varepsilon 0}}{\partial\alpha_j}\Sigma_{\varepsilon 0}^{-1}\frac{\partial\Sigma_{\varepsilon 0}}{\partial\alpha_k}\Sigma_{\varepsilon 0}^{-1}\Big) - n\delta_0'\frac{\partial\Sigma_{\varepsilon 0}}{\partial\alpha_k}\Sigma_{\varepsilon 0}^{-1}\frac{\partial\Sigma_{\varepsilon 0}}{\partial\alpha_j}\delta_0 - \frac{n}{2\sigma_{\xi 0}^2}\delta_0'\frac{\partial\Sigma_{\varepsilon 0}}{\partial\alpha_k}\delta_0\,\delta_0'\frac{\partial\Sigma_{\varepsilon 0}}{\partial\alpha_j}\delta_0, \quad j,k = 1,\dots,J, \\
I_{\alpha\sigma} &= \frac{n}{\sigma_{\xi 0}^2}(\Sigma_{\varepsilon 0}^{-1}\sigma_{\xi 0}^2 + \delta_0\delta_0')\frac{\partial\Sigma_{\varepsilon 0}}{\partial\alpha_j}\delta_0 = \frac{n}{\sigma_{\xi 0}^2}\Sigma_{\varepsilon 0}^{-1}\sigma_{v0}^2\frac{\partial\Sigma_{\varepsilon 0}}{\partial\alpha_j}\delta_0, \qquad
I_{\sigma\sigma} = -n\Big(\Sigma_{\varepsilon 0}^{-1} + \frac{2\delta_0\delta_0'}{\sigma_{\xi 0}^2}\Big).
\end{aligned}$$
Appendix B: Proofs
Proof of Claim 2. For any $b\ge 2$, the $(i,j)$th element of $W_n^b$ is $W_n^b(i,j)=\sum_{i_1}\cdots\sum_{i_{b-1}}w_{ii_1}w_{i_1i_2}\cdots w_{i_{b-1}j}$. From Lemma 2 in Qu and Lee (2012), locations $i$ and $i_2$ can have common neighbors only if $i_2\in B_i(2k_0)$; therefore, by induction, $w_{i_{b-1}j}\neq 0$ only if $j\in B_i(bk_0)$. Similarly, the $(i,j)$th element of $(W_n')^aW_n^b$ is $(W_n')^aW_n^b(i,j)=\sum_{k=1}^n g_{ki}(a)g_{kj}(b)$, where $g_{ki}(a)=\sum_{i_1}\cdots\sum_{i_{a-1}}w_{ki_1}w_{i_1i_2}\cdots w_{i_{a-1}i}$ with $a,b\ge 2$. So $(W_n')^aW_n^b(i,j)\neq 0$ only if $i\in B_k(ak_0)$ and $j\in B_k(bk_0)$. Therefore, $j\in B_i((a+b)k_0)$. Q.E.D.
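The neighborhood-expansion argument in Claim 2 can be checked numerically. The following is a minimal illustrative sketch (not from the paper): units sit on a line, $B_i(r)$ is the set of units within distance $r$ of $i$, and $W_n$ has $w_{ij}\neq 0$ only for $j\in B_i(k_0)$; the support of $W_n^b$ then stays within $B_i(bk_0)$, and that of $(W_n')^aW_n^b$ within $B_i((a+b)k_0)$. The size $n$, radius $k_0$, and random entries are assumptions for the demonstration.

```python
import numpy as np

n, k0 = 40, 2          # illustrative: n units on a line, neighborhood radius k0
rng = np.random.default_rng(0)

# Weight matrix with w_ij != 0 only if j is within distance k0 of i
W = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j and abs(i - j) <= k0:
            W[i, j] = rng.uniform(0.1, 1.0)

def support_radius(M):
    """Largest |i - j| over the nonzero entries of M."""
    idx = np.argwhere(np.abs(M) > 1e-12)
    return int(np.max(np.abs(idx[:, 0] - idx[:, 1]))) if idx.size else 0

# W^b has support within b*k0; (W')^a W^b within (a+b)*k0
for b in range(2, 6):
    assert support_radius(np.linalg.matrix_power(W, b)) <= b * k0
a, b = 2, 3
M = np.linalg.matrix_power(W.T, a) @ np.linalg.matrix_power(W, b)
assert support_radius(M) <= (a + b) * k0
print("support bounds hold")
```

Because all entries here are positive, no cancellation can shrink the support, so the bound is exercised directly.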
Proof of Proposition 1. For the mean,
\[
\varepsilon_n'A_n\varepsilon_n=\sum_{i=1}^n\sum_{j=1}^n a_{ij}\varepsilon_i\varepsilon_j=\sum_{i=1}^n T_i,
\]
where $T_i=\sum_{j=1}^n a_{ij}\varepsilon_i\varepsilon_j=\sum_{j\in B_i(h)}a_{ij}\varepsilon_i\varepsilon_j$, because $a_{ij}=0$ when $j\notin B_i(h)$. As the $a_{ij}$'s are uniformly bounded, there exists a finite constant $c_a$ such that $|a_{ij}|\le c_a$. It follows that $E|T_i|\le c_a(\mu_2+Ch^d\mu_{12})<\infty$, where $\mu_2=E(\varepsilon_i^2)$ and $\mu_{12}=E^2(|\varepsilon_i|)$. Hence, $E(\frac{1}{n}\varepsilon_n'A_n\varepsilon_n)=O(1)$.

For the variance, $\mathrm{var}(\varepsilon_n'A_n\varepsilon_n)=\sum_{i=1}^n\sum_{j=1}^n\mathrm{cov}(T_i,T_j)=\sum_{i=1}^n\sum_{j\in B_i(2h)}\mathrm{cov}(T_i,T_j)$, because $T_j=\sum_{k\in B_j(h)}a_{jk}\varepsilon_j\varepsilon_k$ and there are no common $\varepsilon$'s in $T_i$ and $T_j$ when $i$ and $j$ are more than $2h$ apart. As $|T_i|\le c_a|\varepsilon_i|\sum_{j\in B_i(h)}|\varepsilon_j|$, we have $|T_i|^2\le c_a^2|\varepsilon_i|^2(\sum_{j\in B_i(h)}|\varepsilon_j|)^2=c_a^2|\varepsilon_i|^2\sum_{j\in B_i(h)}\sum_{k\in B_i(h)}|\varepsilon_j||\varepsilon_k|$. It follows that $E|T_i|^2\le c_a^2(Ch^d)^2(E|\varepsilon_i|^4+2E(|\varepsilon_i^3|)E|\varepsilon_i|+E^2|\varepsilon_i^2|+E|\varepsilon_i^2|E^2|\varepsilon_i|)$, as $B_i(h)$ contains at most $Ch^d$ spatial units, a bound that is finite and does not depend on $n$. In consequence, $\sup_iE|T_i|^2<\infty$, which in turn implies $\sup_i\mathrm{var}(T_i)<\infty$. Hence, we have
\[
\mathrm{var}\Big(\frac{1}{n}\varepsilon_n'A_n\varepsilon_n\Big)=\frac{1}{n^2}\sum_{i=1}^n\sum_{j\in B_i(2h)}\mathrm{cov}(T_i,T_j)=O\Big(\frac{1}{n}\Big),
\]
which goes to zero as $n\to\infty$. Q.E.D.
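As a quick numerical illustration of Proposition 1 (a sketch under assumed ingredients, not the paper's code): take $A_n$ banded with bandwidth $h$, so $a_{ij}=0$ for $|i-j|>h$ in a one-dimensional layout, and i.i.d. disturbances. The Monte Carlo variance of $n^{-1}\varepsilon_n'A_n\varepsilon_n$ then shrinks as $n$ grows, consistent with the $O(1/n)$ rate.

```python
import numpy as np

rng = np.random.default_rng(1)
h = 3  # banded A_n: a_ij = 0 when |i - j| > h, entries bounded by 1

def quad_form_var(n, reps=2000):
    """Monte Carlo variance of (1/n) eps' A eps for a fixed banded A."""
    A = np.zeros((n, n))
    for i in range(n):
        lo, hi = max(0, i - h), min(n, i + h + 1)
        A[i, lo:hi] = rng.uniform(-1, 1, hi - lo)
    eps = rng.standard_normal((reps, n))
    q = ((eps @ A) * eps).sum(axis=1) / n   # (1/n) eps' A eps, per replication
    return q.var()

v_small, v_large = quad_form_var(50), quad_form_var(400)
assert v_large < v_small  # variance decays as n grows
print(v_small, v_large)
```

The eight-fold increase in $n$ produces a clearly smaller variance; the exact values depend on the assumed random $A_n$ and seed.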
Proof of Proposition 2. For any $0<c<1$ and a finite integer $m$, the series $\sum_{l=0}^\infty l^mc^l$ is convergent by the ratio test. The absolute mean property of $\frac{1}{n}E|\eta_n'B_n\eta_n|$ then follows by the proof of the first part of Proposition 1. Define $A_{nl}^*=A_{nl}/\|A_{nl}\|_1$ so that all its entries satisfy $|a_{ij,nl}^*|\le 1$. As $\frac{1}{n}|\eta_n'A_{nl}^*\eta_n|\le\frac{1}{n}\sum_{i=1}^n\sum_{j=1}^n|a_{ij}^*||\eta_i||\eta_j|\le\frac{1}{n}\sum_{i=1}^n\sum_{j\in B_i((l+s)k_0)}|\eta_i||\eta_j|$, we have $E\frac{1}{n}|\eta_n'A_{nl}^*\eta_n|\le c^*(l+s)^d\mu_{12}^+$, where $c^*=Ck_0^d$ and $\mu_{12}^+=\mu_2+\mu_{12}$ with $\mu_2=E(\eta_i^2)$ and $\mu_{12}=E^2|\eta_i|=E|\eta_i|E|\eta_j|$ when $i\neq j$. Therefore, $E\frac{1}{n}|\eta_n'B_n\eta_n|=E\frac{1}{n}|\sum_{l=0}^\infty a_l\eta_n'A_{nl}\eta_n|\le\sum_{l=0}^\infty\|a_lA_{nl}\|_1E\frac{1}{n}|\eta_n'A_{nl}^*\eta_n|\le\sum_{l=0}^\infty c_0(l+c_1)^{s_1}c^l\,c^*(l+s)^d\mu_{12}^+=O(1)$.

To show $\frac{1}{n}[\eta_n'B_n\eta_n-E(\eta_n'B_n\eta_n)]=o_p(1)$, decompose $\eta_n'B_n\eta_n=\sum_{l=0}^{L-1}a_l\eta_n'A_{nl}\eta_n+\sum_{l=L}^\infty a_l\eta_n'A_{nl}\eta_n$ for some large $L$. For the first, finite summation, by Proposition 1 its deviation from its mean converges to zero in probability for any finite $L$. So it remains to investigate the convergence of the second, infinite summation for large enough $L$. As $E\frac{1}{n}|\eta_n'A_{nl}^*\eta_n|\le c^*(l+s)^d\mu_{12}^+$, for any positive value $M$ and a specified integer $s$,
\[
P\Big(\frac{1}{n}|\eta_n'A_{nl}^*\eta_n|\ge M(l+s)^{d+2+s}\Big)\le\frac{1}{M(l+s)^{d+2+s}}E\Big|\frac{1}{n}\eta_n'A_{nl}^*\eta_n\Big|\le\frac{c^*(l+s)^d\mu_{12}^+}{M(l+s)^{d+2+s}}=\frac{c^*\mu_{12}^+}{M(l+s)^{2+s}}.
\]
For a given small value $\epsilon>0$, there exists an $M_\epsilon$ such that $\frac{c^*\mu_{12}^+}{M_\epsilon}\sum_{l=0}^\infty\frac{1}{(l+s)^{2+s}}<\epsilon$. Then, for the joint probability of the events below,
\[
P\Big(\frac{1}{n}|\eta_n'A_{nl}^*\eta_n|<M_\epsilon(l+s)^{d+2+s},\ \forall l=1,2,\cdots\Big)
=1-P\Big(\frac{1}{n}|\eta_n'A_{nl}^*\eta_n|\ge M_\epsilon(l+s)^{d+2+s}\text{ for some }l\Big)
\ge 1-\sum_{l=0}^\infty P\Big(\frac{1}{n}|\eta_n'A_{nl}^*\eta_n|\ge M_\epsilon(l+s)^{d+2+s}\Big)
\ge 1-\frac{c^*\mu_{12}^+}{M_\epsilon}\sum_{l=0}^\infty\frac{1}{(l+s)^{2+s}}>1-\epsilon.
\]
The event $\frac{1}{n}|\eta_n'A_{nl}^*\eta_n|<M_\epsilon(l+s)^{d+2+s},\ \forall l=1,2,\cdots$, implies that
\[
\Big|\frac{1}{n}\sum_{l=L}^\infty a_l\eta_n'A_{nl}\eta_n\Big|\le\sum_{l=L}^\infty|a_l|\,\|A_{nl}\|_1\Big|\frac{1}{n}\eta_n'A_{nl}^*\eta_n\Big|\le\sum_{l=L}^\infty c_0(l+c_1)^{s_1}c^l\Big|\frac{1}{n}\eta_n'A_{nl}^*\eta_n\Big|\le c_0M_\epsilon\sum_{l=L}^\infty[l+\max(s,c_1)]^{d+2+s+s_1}c^l\le\epsilon
\]
occurs with probability greater than $1-\epsilon$ for some large $L_\epsilon$. Furthermore, $E|\frac{1}{n}\sum_{l=L}^\infty a_l\eta_n'A_{nl}\eta_n|\le\sum_{l=L}^\infty c_0(l+c_1)^{s_1}c^lE|\frac{1}{n}\eta_n'A_{nl}^*\eta_n|\le c_0c^*\mu_{12}^+\sum_{l=L}^\infty[l+\max(s,c_1)]^{d+2+s+s_1}c^l<\epsilon$ for some large $L_\epsilon$.

From the above, we have, for any arbitrarily small $\epsilon>0$,
\[
P\Big(\Big|\frac{1}{n}\sum_{l=0}^\infty a_l[\eta_n'A_{nl}\eta_n-E(\eta_n'A_{nl}\eta_n)]\Big|>3\epsilon\Big)
\le P\Big(\Big|\frac{1}{n}\sum_{l=0}^{L_\epsilon-1}a_l[\eta_n'A_{nl}\eta_n-E(\eta_n'A_{nl}\eta_n)]\Big|>\epsilon\Big)+P\Big(\Big|\frac{1}{n}\sum_{l=L_\epsilon}^\infty a_l\eta_n'A_{nl}\eta_n\Big|>\epsilon\Big)\le 2\epsilon,\ \text{as}\ n\to\infty.
\]
Hence, $|\frac{1}{n}\sum_{l=0}^\infty a_l[\eta_n'A_{nl}\eta_n-E(\eta_n'A_{nl}\eta_n)]|=o_p(1)$. Q.E.D.
Proof of Lemma 1 (the unique maximizer). By the information inequality, $\frac{1}{n}E(\ln L_n(\theta))\le\frac{1}{n}E(\ln L_n(\theta_0))$. For uniqueness, we only need to show that if $\lim_{n\to\infty}[\frac{1}{n}E(\ln L_n(\theta))-\frac{1}{n}E(\ln L_n(\theta_0))]=0$, then it must be that $\theta=\theta_0$. From (A.1), the difference of the expected log likelihood functions is
\[
\frac{1}{n}E(\ln L_n(\theta))-\frac{1}{n}E(\ln L_n(\theta_0))
= -\frac{1}{2}\ln\frac{\sigma_\xi^2}{\sigma_{\xi 0}^2}-\frac{1}{2}\ln\frac{|\Sigma_\varepsilon|}{|\Sigma_{\varepsilon 0}|}-\frac{1}{2n}\sum_{i=1}^n x_i'(\Gamma_0-\Gamma)\Sigma_\varepsilon^{-1}(\Gamma_0-\Gamma)'x_i
+\frac{1}{n}E\Big(\ln\frac{|S_n(\lambda)|}{|S_n|}\Big)-\frac{1}{2}\mathrm{tr}(\Sigma_{\varepsilon 0}^{1/2}\Sigma_\varepsilon^{-1}\Sigma_{\varepsilon 0}^{1/2}-I_p)-\frac{1}{2\sigma_\xi^2}\delta'\Sigma_{\varepsilon 0}\delta+\frac{1}{2\sigma_{\xi 0}^2}\delta_0'\Sigma_{\varepsilon 0}\delta_0
\]
\[
-\frac{\sigma_{\xi 0}^2}{2\sigma_\xi^2}\mathrm{tr}\Big[E\frac{1}{n}(S_n^{-1}-\lambda G_n)'(S_n^{-1}-\lambda G_n)\Big]+\frac{1}{2}
-\frac{1}{2\sigma_\xi^2}E\frac{1}{n}(X_n\beta_0+\varepsilon_n\delta_0)'(S_n^{-1}-\lambda G_n)'(S_n^{-1}-\lambda G_n)(X_n\beta_0+\varepsilon_n\delta_0)+\frac{1}{2\sigma_{\xi 0}^2}E\frac{1}{n}(X_n\beta_0+\varepsilon_n\delta_0)'(X_n\beta_0+\varepsilon_n\delta_0)
\]
\[
-\frac{1}{2\sigma_\xi^2}[\beta+(\Gamma_0-\Gamma)\delta]'\frac{X_n'X_n}{n}[\beta+(\Gamma_0-\Gamma)\delta]+\frac{1}{2\sigma_{\xi 0}^2}\beta_0'\frac{X_n'X_n}{n}\beta_0
+\frac{1}{\sigma_\xi^2}E\frac{1}{n}(X_n\beta_0+\varepsilon_n\delta_0)'(S_n^{-1}-\lambda G_n)'\{X_n[\beta+(\Gamma_0-\Gamma)\delta]+\varepsilon_n\delta\}-\frac{1}{\sigma_{\xi 0}^2}E\frac{1}{n}(X_n\beta_0+\varepsilon_n\delta_0)'(X_n\beta_0+\varepsilon_n\delta_0).
\]
Since $S_n^{-1}-\lambda_0G_n=I_n$, we have $S_n(\lambda)S_n^{-1}=S_n^{-1}-\lambda G_n=I_n-(\lambda-\lambda_0)G_n$ and
\[
\frac{1}{n}[E(\ln L_n(\theta))-E(\ln L_n(\theta_0))]
= -\frac{1}{2}\ln\frac{\sigma_\xi^2}{\sigma_{\xi 0}^2}-\frac{1}{2}\ln\frac{|\Sigma_\varepsilon|}{|\Sigma_{\varepsilon 0}|}+\frac{1}{n}E\Big(\ln\frac{|S_n(\lambda)|}{|S_n|}\Big)-\frac{1}{2}\mathrm{tr}(\Sigma_{\varepsilon 0}^{1/2}\Sigma_\varepsilon^{-1}\Sigma_{\varepsilon 0}^{1/2}-I_p)
\]
\[
-\frac{1}{2n}\sum_{i=1}^n x_i'(\Gamma_0-\Gamma)\Sigma_\varepsilon^{-1}(\Gamma_0-\Gamma)'x_i-\frac{1}{2n\sigma_\xi^2}E\{[(\lambda-\lambda_0)H_n-J_n]'[(\lambda-\lambda_0)H_n-J_n]\}
-\frac{\sigma_{\xi 0}^2}{2n\sigma_\xi^2}\mathrm{tr}\,E\{[I_n-(\lambda-\lambda_0)G_n]'[I_n-(\lambda-\lambda_0)G_n]\}+\frac{1}{2}
\]
\[
= -\frac{1}{2}\big[\mathrm{tr}(\Sigma_{\varepsilon 0}^{1/2}\Sigma_\varepsilon^{-1}\Sigma_{\varepsilon 0}^{1/2})-\ln|\Sigma_{\varepsilon 0}^{1/2}\Sigma_\varepsilon^{-1}\Sigma_{\varepsilon 0}^{1/2}|-p\big]
-\frac{1}{2n}\sum_{i=1}^n x_i'(\Gamma_0-\Gamma)\Sigma_\varepsilon^{-1}(\Gamma_0-\Gamma)'x_i-\frac{1}{2n\sigma_\xi^2}E\{[(\lambda-\lambda_0)H_n-J_n]'[(\lambda-\lambda_0)H_n-J_n]\}
\]
\[
-\frac{1}{2n}E\Big[\mathrm{tr}\Big(\frac{\sigma_{\xi 0}^2}{\sigma_\xi^2}[I_n-(\lambda-\lambda_0)G_n]'[I_n-(\lambda-\lambda_0)G_n]\Big)-\ln\Big|\frac{\sigma_{\xi 0}^2}{\sigma_\xi^2}[I_n-(\lambda-\lambda_0)G_n]'[I_n-(\lambda-\lambda_0)G_n]\Big|-n\Big],
\]
where $H_n=G_n(X_n\beta_0+\varepsilon_n\delta_0)$ and $J_n=X_n[\beta_0-\beta-(\Gamma_0-\Gamma)\delta]-\varepsilon_n(\delta_0-\delta)$. It is easy to check that for any $x>0$, the function $f(x)=x-\ln x-1\ge 0$ and is minimized only at $x=1$. Also, for any symmetric, positive definite real matrix $M$, the function $f(M)=\mathrm{tr}(M)-\ln|M|-m=\sum_{i=1}^m(\varphi_i-\ln\varphi_i-1)\ge 0$ and is minimized only at $M=I_m$, where $m$ is the dimension of $M$ and the $\varphi_i$'s $(i=1,\ldots,m)$ are the eigenvalues of $M$.

If $\lim_{n\to\infty}[\frac{1}{n}E(\ln L_n(\theta))-\frac{1}{n}E(\ln L_n(\theta_0))]=0$, then it must be that $\Sigma_{\varepsilon 0}^{1/2}\Sigma_\varepsilon^{-1}\Sigma_{\varepsilon 0}^{1/2}=I_p$; $(\Gamma_0-\Gamma)\Sigma_\varepsilon^{-1}(\Gamma_0-\Gamma)'=0$ because $\lim_{n\to\infty}\frac{X_n'X_n}{n}$ is nonsingular; $\lim_{n\to\infty}\frac{1}{n}E\{[(\lambda-\lambda_0)H_n-J_n]'[(\lambda-\lambda_0)H_n-J_n]\}=0$; and $\frac{\sigma_{\xi 0}^2}{\sigma_\xi^2}[I_n-(\lambda-\lambda_0)G_n]'[I_n-(\lambda-\lambda_0)G_n]=I_n$ with probability close to 1 for large enough $n$. Apparently, $\Sigma_\varepsilon=\Sigma_{\varepsilon 0}$ and $\Gamma=\Gamma_0$, so $\lim_{n\to\infty}\frac{1}{n}E\{[(\lambda-\lambda_0)G_n(X_n\beta_0+\varepsilon_n\delta_0)-X_n(\beta_0-\beta)+\varepsilon_n(\delta_0-\delta)]'[(\lambda-\lambda_0)G_n(X_n\beta_0+\varepsilon_n\delta_0)-X_n(\beta_0-\beta)+\varepsilon_n(\delta_0-\delta)]\}=0$. As the outer product of the mean vector is bounded by the second moment matrix, this implies $\lim_{n\to\infty}\frac{1}{n}[(\lambda-\lambda_0)E(G_nX_n\beta_0+G_n\varepsilon_n\delta_0)-X_n(\beta_0-\beta)][(\lambda-\lambda_0)E(G_nX_n\beta_0+G_n\varepsilon_n\delta_0)-X_n(\beta_0-\beta)]'=0$.

Under Assumption 4, it must be that $\lambda=\lambda_0$ and $\beta=\beta_0$. Then $\frac{\sigma_{\xi 0}^2}{\sigma_\xi^2}=1$ and $(\delta_0-\delta)'\Sigma_{\varepsilon 0}(\delta_0-\delta)=0$, so it must be that $\delta=\delta_0$, i.e., $\sigma_{v\varepsilon}=\sigma_{v\varepsilon 0}$. These give us $\theta=\theta_0$ if $\lim_{n\to\infty}\frac{1}{n}[E(\ln L_n(\theta))-E(\ln L_n(\theta_0))]=0$. Therefore, we can conclude that $\theta_0$ is the unique maximizer of $\lim_{n\to\infty}\frac{1}{n}E(\ln L_n(\theta))$. Q.E.D.
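The proof of Lemma 1 leans on the scalar fact $f(x)=x-\ln x-1\ge 0$ and its matrix analogue $f(M)=\mathrm{tr}(M)-\ln|M|-m\ge 0$, with equality only at $M=I_m$. A small numerical sketch (with assumed random symmetric positive definite matrices, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
m = 4

def f(M):
    """f(M) = tr(M) - ln|M| - m for symmetric positive definite M."""
    sign, logdet = np.linalg.slogdet(M)
    assert sign > 0
    return np.trace(M) - logdet - M.shape[0]

# f(M) >= 0 for random SPD matrices, with equality at the identity
for _ in range(100):
    B = rng.standard_normal((m, m))
    M = B @ B.T + 0.1 * np.eye(m)  # SPD by construction
    assert f(M) >= -1e-10
assert abs(f(np.eye(m))) < 1e-12
print("trace-logdet inequality verified")
```

The inequality follows from the eigenvalue decomposition, since $f(M)=\sum_i(\varphi_i-\ln\varphi_i-1)$ and each summand is nonnegative.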
Proof of Theorem 1. With the compact parameter space, once we have that $\theta_0$ is the unique maximizer, we need to check two conditions for the consistency of the MLE.

Step 1: Uniform convergence of the log likelihood function. All terms in the log likelihood function can be expressed in our general terms with $M_n$, so the pointwise convergence is straightforward:
\[
\frac{1}{n}\ln L_n(\theta|Y_n,Z_n)-E\Big[\frac{1}{n}\ln L_n(\theta|Y_n,Z_n)\Big]\xrightarrow{p}0.
\]
All parameters are bounded in their parameter spaces and they enter the log likelihood function polynomially, except for the term $\ln|S_n(\lambda)|$, so we need to show the stochastic equicontinuity of $\frac{1}{n}\ln|S_n(\lambda)|$ to obtain the uniform convergence. Applying the mean value theorem,
\[
\Big|\frac{1}{n}(\ln|S_n(\lambda_1)|-\ln|S_n(\lambda_2)|)\Big|=\Big|(\lambda_2-\lambda_1)\frac{1}{n}\mathrm{tr}(G_n(\bar\lambda))\Big|\le|\lambda_2-\lambda_1|C_{se},\tag{A.2}
\]
where $\bar\lambda$ is a value between $\lambda_1$ and $\lambda_2$ and $C_{se}$ is a constant not depending on $n$. This inequality is implied by the fact that $G_n(\lambda)$ is bounded in both row and column sums uniformly in our compact parameter space under Assumption 2.3). From this, we can conclude the uniform convergence
\[
\sup_{\theta\in\Theta}\Big|\frac{1}{n}\ln L_n(\theta|Y_n,Z_n)-E\Big[\frac{1}{n}\ln L_n(\theta|Y_n,Z_n)\Big]\Big|\xrightarrow{p}0.
\]
Step 2: Uniform equicontinuity of the limiting sample average of the expected log likelihood function $\lim_{n\to\infty}E[\frac{1}{n}\ln L_n(\theta|Y_n,Z_n)]$. From (A.1), with the inequality in (A.2), the variance parameters being bounded away from zero in compact parameter spaces, and the earlier results that $E(\frac{1}{n}|\eta_n'B_n\eta_n|)=O(1)$ and $E(\frac{1}{n}|X_n'B_n\eta_n|)=O(1)$, we can conclude that $E[\frac{1}{n}\ln L_n(\theta|Y_n,Z_n)]$ is uniformly equicontinuous in $\theta\in\Theta$.

From Lemma 1, $\theta_0$ is the unique maximizer of $\lim_{n\to\infty}E[\frac{1}{n}\ln L_n(\theta|Y_n,Z_n)]$. These together imply $\hat\theta\xrightarrow{p}\theta_0$. Q.E.D.
Proof of Lemma 2 (the CLT). We prove the CLT for the form
\[
R_n=\frac{1}{\sqrt n}\big[A_n'\zeta_n+B_n'\eta_n+\eta_n'C_n\eta_n+\zeta_n'D_n\zeta_n+\eta_n'E_n\zeta_n-E(\eta_n'C_n\eta_n|\zeta_n)-E(\zeta_n'D_n\zeta_n)\big].
\]
The entries $\zeta_i$ of $\zeta_n$ are i.i.d. with mean zero and variance $\sigma_\zeta^2$, and $\eta_n$ is independent of $B_n$, $C_n$, and $E_n$,⁵ so
\[
R_n=\frac{1}{\sqrt n}\sum_{i=1}^n\Big(a_i\zeta_i+b_i\eta_i+\eta_i\sum_j^n c_{ij}\eta_j-E\Big(\eta_i\sum_j^n c_{ij}\eta_j\Big|\zeta_n\Big)+\zeta_i\sum_j^n d_{ij}\zeta_j-E\Big(\zeta_i\sum_j^n d_{ij}\zeta_j\Big)+\eta_i\sum_j^n e_{ij}\zeta_j\Big)
\]
\[
=\frac{1}{\sqrt n}\sum_{i=1}^n\Big(a_i\zeta_i+b_i\eta_i+2\eta_i\sum_{j=1}^{i-1}c_{ij}\eta_j+c_{ii}(\eta_i^2-\sigma_\eta^2)+2\zeta_i\sum_{j=1}^{i-1}d_{ij}\zeta_j+d_{ii}(\zeta_i^2-\sigma_\zeta^2)+\eta_i\sum_j^n e_{ij}\zeta_j\Big),
\]
⁵ Here we treat $C_n$ and $D_n$ as symmetric. If not, we can always replace $C_n$ with $(C_n+C_n')/2$ (and similarly for $D_n$).
Let $R_n=\sum_{l=1}^{2n}R_{n,l}$; then we can define $R_{n,l}$ and the corresponding $\sigma$-fields $\mathcal F_{n,l}$ with the following expressions:
\[
R_{n,1}=\frac{1}{\sqrt n}[a_1\zeta_1+d_{11}(\zeta_1^2-\sigma_\zeta^2)],\quad \mathcal F_{n,1}=\langle\zeta_1\rangle;
\]
\[
R_{n,2}=\frac{1}{\sqrt n}[a_2\zeta_2+2\zeta_2d_{21}\zeta_1+d_{22}(\zeta_2^2-\sigma_\zeta^2)],\quad \mathcal F_{n,2}=\langle\zeta_1,\zeta_2\rangle;
\]
\[
\vdots
\]
\[
R_{n,n}=\frac{1}{\sqrt n}\Big[a_n\zeta_n+2\zeta_n\sum_{j=1}^{n-1}d_{nj}\zeta_j+d_{nn}(\zeta_n^2-\sigma_\zeta^2)\Big],\quad \mathcal F_{n,n}=\langle\zeta_1,\cdots,\zeta_n\rangle;
\]
\[
R_{n,n+1}=\frac{1}{\sqrt n}\Big[b_1\eta_1+\eta_1\sum_j^n e_{1j}\zeta_j+c_{11}(\eta_1^2-\sigma_\eta^2)\Big],\quad \mathcal F_{n,n+1}=\langle\zeta_1,\cdots,\zeta_n,\eta_1\rangle;
\]
\[
R_{n,n+2}=\frac{1}{\sqrt n}\Big[b_2\eta_2+\eta_2\sum_j^n e_{2j}\zeta_j+2\eta_2c_{21}\eta_1+c_{22}(\eta_2^2-\sigma_\eta^2)\Big],\quad \mathcal F_{n,n+2}=\langle\zeta_1,\cdots,\zeta_n,\eta_1,\eta_2\rangle;
\]
\[
\vdots
\]
\[
R_{n,2n}=\frac{1}{\sqrt n}\Big[b_n\eta_n+\eta_n\sum_j^n e_{nj}\zeta_j+2\eta_n\sum_{j=1}^{n-1}c_{nj}\eta_j+c_{nn}(\eta_n^2-\sigma_\eta^2)\Big],\quad \mathcal F_{n,2n}=\langle\zeta_1,\cdots,\zeta_n,\eta_1,\cdots,\eta_n\rangle.
\]
Because the $\zeta$'s and $\eta$'s are mutually independent, $E(R_{n,l}|\mathcal F_{n,l-1})=0$ for $l=2,\ldots,2n$. Then $\{(R_{n,l},\mathcal F_{n,l}):1\le l\le 2n\}$ forms a martingale difference double array. We have $Var(\sum_{l=1}^{2n}R_{n,l})=\sum_{l=1}^{2n}ER_{n,l}^2$ as the $R_{n,l}$'s are martingale differences. Define this variance as $\sigma_{R_n}^2$ and the normalized variables $R_{n,l}^*=R_{n,l}/\sigma_{R_n}$. In order for the martingale central limit theorem to be applicable, we shall show that there exists a $\delta_R>0$ such that
\[
\sum_{l=1}^{2n}E|R_{n,l}^*|^{2+\delta_R}\to 0\ \text{as}\ n\to\infty,\qquad
\sum_{l=1}^{2n}E(R_{n,l}^{*2}|\mathcal F_{n,l-1})-1\xrightarrow{p}0.
\]
Denote by $\sigma_{R_1,n}^2$ and $\sigma_{R_2,n}^2$ the variances of $R_{1,n}^d$ and $R_{2,n}^d$, where
\[
R_{1,n}^d=\frac{1}{\sqrt n}\big[A_n'\zeta_n+\zeta_n'D_n\zeta_n-E(\zeta_n'D_n\zeta_n)\big]\quad\text{and}\quad
R_{2,n}^d=\frac{1}{\sqrt n}\big[B_n'\eta_n+\eta_n'C_n\eta_n+\eta_n'E_n\zeta_n-E(\eta_n'C_n\eta_n|\zeta_n)\big];
\]
then $R_n=R_{1,n}^d+R_{2,n}^d$ and $\sigma_{R_n}^2=\sigma_{R_1,n}^2+\sigma_{R_2,n}^2$. The expressions are
\[
\sigma_{R_1,n}^2=\frac{\sigma_\zeta^2}{n}A_n'A_n+\frac{\sigma_\zeta^4}{n}\mathrm{tr}(D_n^2+D_nD_n')+[E(\zeta_i^4)-3\sigma_\zeta^4]\frac{1}{n}\sum_{i=1}^n d_{ii}^2+\frac{2}{n}E(\zeta_i^3)\sum_{i=1}^n a_id_{ii}
\]
and
\[
\sigma_{R_2,n}^2=\frac{\sigma_\eta^2}{n}E(B_n'B_n)+\frac{\sigma_\eta^4}{n}\mathrm{tr}[E(C_n^2+C_nC_n')]+\frac{\sigma_\eta^2}{n}E(\zeta_n'E_n'E_n\zeta_n)+\frac{2\sigma_\eta^2}{n}E(B_n'E_n\zeta_n)
+[E(\eta_i^4)-3\sigma_\eta^4]\frac{1}{n}\sum_{i=1}^nE(c_{ii}^2)+\frac{2}{n}E(\eta_i^3)\sum_{i=1}^nE(b_ic_{ii})+\frac{2}{n}E(\eta_i^3)\sum_{i=1}^nE\Big(c_{ii}\sum_j^n e_{ij}\zeta_j\Big).
\]
As $\sigma_\zeta^2\sum_{i=1}^na_i^2+Var(\zeta_i^2)\sum_{i=1}^nd_{ii}^2\ge 2\sqrt{\sigma_\zeta^2Var(\zeta_i^2)}\sum_{i=1}^n|a_id_{ii}|\ge 2|E(\zeta_i^3)|\sum_{i=1}^n|a_id_{ii}|$, we have
\[
\sigma_{R_1,n}^2=\frac{\sigma_\zeta^2}{n}\sum_{i=1}^na_i^2+\frac{\sigma_\zeta^4}{n}\sum_i\sum_{j\neq i}(d_{ij}^2+d_{ji}^2)+\frac{Var(\zeta_i^2)}{n}\sum_{i=1}^nd_{ii}^2+\frac{2}{n}E(\zeta_i^3)\sum_{i=1}^na_id_{ii}\ge\frac{\sigma_\zeta^4}{n}\sum_i\sum_{j\neq i}(d_{ij}^2+d_{ji}^2).
\]
So a sufficient condition for $\sigma_{R_1,n}^2$ to be bounded away from zero is that $\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^n\sum_{j\neq i}^n(d_{ij}^2+d_{ji}^2)>0$. Similarly, a sufficient condition for $\sigma_{R_2,n}^2$ to be bounded away from zero is that $\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^n\sum_{j\neq i}^nE(c_{ij}^2+c_{ji}^2)>0$. Therefore, a sufficient condition for $\sigma_{R_n}^2>0$ is that $\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^n\sum_{j\neq i}^n(d_{ij}^2+d_{ji}^2)>0$ or $\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^n\sum_{j\neq i}^nE(c_{ij}^2+c_{ji}^2)>0$.

From Propositions 1–2 and the corollary, we know that $E(B_n'B_n)$, $\mathrm{tr}[E(C_n^2+C_nC_n')]$, $E(\zeta_n'E_n'E_n\zeta_n)$, and $E(B_n'E_n\zeta_n)$ are $O(n)$. Therefore, $\sigma_{R_1,n}^2$, $\sigma_{R_2,n}^2$, and $\sigma_{R_n}^2$ are $O(1)$. For $l=1,\cdots,n$ and $i=l$, $R_{n,l}=\frac{1}{\sqrt n}(a_i\zeta_i+2\zeta_i\sum_{j=1}^{i-1}d_{ij}\zeta_j+d_{ii}(\zeta_i^2-\sigma_\zeta^2))$. Since $R_{n,l}/\sigma_{R_1,n}$ is a conventional linear-quadratic form, the two conditions for the summation of the first $n$ terms are easy to check. We have $\sum_{l=1}^nE|R_{n,l}^*|^{2+\delta_R}\to 0$ as $n\to\infty$ and $\sum_{l=1}^nE(R_{n,l}^2|\mathcal F_{n,l-1})/\sigma_{R_1,n}^2-1\xrightarrow{p}0$.
For $l=n+1,\cdots,2n$, $R_{n,l}=\frac{1}{\sqrt n}[b_i\eta_i+2\eta_i\sum_{j=1}^{i-1}c_{ij}\eta_j+c_{ii}(\eta_i^2-\sigma_\eta^2)+\eta_i\sum_j^ne_{ij}\zeta_j]$, where $i=l-n$. Since $\eta_i$ is independent of $\mathcal F_{n,l-1}$, evaluating the conditional expectation over $\eta_i$ gives
\[
E(R_{n,l}^2|\mathcal F_{n,l-1})=\frac{1}{n}\Big[b_i^2\sigma_\eta^2+4\sigma_\eta^2\sum_{j=1}^{i-1}c_{ij}\eta_j\sum_{k=1}^{i-1}c_{ik}\eta_k+c_{ii}^2Var(\eta_i^2)+\sigma_\eta^2\sum_{j=1}^ne_{ij}\zeta_j\sum_{k=1}^ne_{ik}\zeta_k+4b_i\sigma_\eta^2\sum_{j=1}^{i-1}c_{ij}\eta_j
\]
\[
+2b_ic_{ii}E(\eta_i^3)+2b_i\sigma_\eta^2\sum_j^ne_{ij}\zeta_j+4c_{ii}E(\eta_i^3)\sum_{j=1}^{i-1}c_{ij}\eta_j+4\sigma_\eta^2\sum_{j=1}^{i-1}c_{ij}\eta_j\sum_{k=1}^ne_{ik}\zeta_k+2c_{ii}E(\eta_i^3)\sum_j^ne_{ij}\zeta_j\Big],
\]
\[
E(R_{n,l}^2)=\frac{1}{n}\Big[E(b_i^2)\sigma_\eta^2+4\sigma_\eta^4\sum_{j=1}^{i-1}E(c_{ij}^2)+E(c_{ii}^2)Var(\eta_i^2)+\sigma_\eta^2\sum_{j=1}^n\sum_{k=1}^nE(e_{ij}e_{ik}\zeta_j\zeta_k)
+2E(b_ic_{ii})E(\eta_i^3)+2\sigma_\eta^2\sum_j^nE(b_ie_{ij}\zeta_j)+2E(\eta_i^3)\sum_j^nE(c_{ii}e_{ij}\zeta_j)\Big].
\]
Therefore, for $l=n+1,\cdots,2n$ and $i=l-n$,
\[
E(R_{n,l}^2|\mathcal F_{n,l-1})-E(R_{n,l}^2)=\frac{1}{n}\big\{\sigma_\eta^2[b_i^2-E(b_i^2)]+Var(\eta_i^2)[c_{ii}^2-E(c_{ii}^2)]+2E(\eta_i^3)[b_ic_{ii}-E(b_ic_{ii})]\big\}
\]
\[
+\frac{4}{n}\sum_{j=1}^{i-1}\Big[\sigma_\eta^2\Big(c_{ij}\eta_j\sum_{k=1}^{i-1}c_{ik}\eta_k-\sigma_\eta^2E(c_{ij}^2)\Big)+b_i\sigma_\eta^2c_{ij}\eta_j+c_{ii}E(\eta_i^3)c_{ij}\eta_j+\sigma_\eta^2c_{ij}\eta_j\sum_{k=1}^ne_{ik}\zeta_k\Big]
\]
\[
+\frac{1}{n}\sum_{j=1}^n\Big[\sigma_\eta^2\sum_{k=1}^n\big(e_{ij}\zeta_je_{ik}\zeta_k-E(e_{ij}\zeta_je_{ik}\zeta_k)\big)+2\sigma_\eta^2\big(b_ie_{ij}\zeta_j-E(b_ie_{ij}\zeta_j)\big)+2E(\eta_i^3)\big(c_{ii}e_{ij}\zeta_j-E(c_{ii}e_{ij}\zeta_j)\big)\Big],
\]
and hence,
\[
\sum_{l=n+1}^{2n}\big[E(R_{n,l}^2|\mathcal F_{n,l-1})-E(R_{n,l}^2)\big]=H_{1n}+H_{2n}+H_{3n}+H_{4n}+H_{5n}+H_{6n}+H_{7n}+H_{8n}+H_{9n}+H_{10n},
\]
where
\[
H_{1n}=\frac{\sigma_\eta^2}{n}\sum_{i=1}^n[b_i^2-E(b_i^2)],\quad H_{2n}=\frac{Var(\eta_i^2)}{n}\sum_{i=1}^n[c_{ii}^2-E(c_{ii}^2)],\quad H_{3n}=\frac{2E(\eta_i^3)}{n}\sum_{i=1}^n[b_ic_{ii}-E(b_ic_{ii})],
\]
\[
H_{4n}=\frac{4\sigma_\eta^2}{n}\sum_{i=1}^n\sum_{j=1}^{i-1}\Big(c_{ij}\eta_j\sum_{k=1}^{i-1}c_{ik}\eta_k-\sigma_\eta^2E(c_{ij}^2)\Big),\quad H_{5n}=\frac{4\sigma_\eta^2}{n}\sum_{i=1}^n\sum_{j=1}^{i-1}b_ic_{ij}\eta_j,\quad H_{6n}=\frac{4E(\eta_i^3)}{n}\sum_{i=1}^n\sum_{j=1}^{i-1}c_{ii}c_{ij}\eta_j,
\]
\[
H_{7n}=\frac{4\sigma_\eta^2}{n}\sum_{i=1}^n\sum_{j=1}^{i-1}c_{ij}\eta_j\sum_{k=1}^ne_{ik}\zeta_k,\quad H_{8n}=\frac{2\sigma_\eta^2}{n}\sum_{i=1}^n\sum_{j=1}^n\big(b_ie_{ij}\zeta_j-E(b_ie_{ij}\zeta_j)\big),
\]
\[
H_{9n}=\frac{2E(\eta_i^3)}{n}\sum_{i=1}^n\sum_{j=1}^n\big(c_{ii}e_{ij}\zeta_j-E(c_{ii}e_{ij}\zeta_j)\big),\quad H_{10n}=\frac{\sigma_\eta^2}{n}\sum_{i=1}^n\sum_{j=1}^n\sum_{k=1}^n\big(e_{ij}\zeta_je_{ik}\zeta_k-E(e_{ij}\zeta_je_{ik}\zeta_k)\big).
\]
We need to show that $H_{jn}=o_p(1)$ for all $j=1,\ldots,10$. As $H_{1n}=\frac{\sigma_\eta^2}{n}\sum_{i=1}^n[b_i^2-E(b_i^2)]=\frac{\sigma_\eta^2}{n}[B_n'B_n-E(B_n'B_n)]$ and $H_{10n}=\frac{\sigma_\eta^2}{n}\sum_{i=1}^n\sum_{j=1}^n\sum_{k=1}^n(e_{ij}\zeta_je_{ik}\zeta_k-E(e_{ij}\zeta_je_{ik}\zeta_k))=\frac{\sigma_\eta^2}{n}[\zeta_n'E_n'E_n\zeta_n-E(\zeta_n'E_n'E_n\zeta_n)]$, if $B_n$, $C_n$, and $E_n$ satisfy the conditions in Assumption 5, then from Propositions 1–2 and Corollary 1, $H_{1n}=o_p(1)$ and $H_{10n}=o_p(1)$. Similar arguments can be applied to $H_{jn}$ for $j=2,\ldots,9$. These imply that $\sum_{l=n+1}^{2n}E(R_{n,l}^2|\mathcal F_{n,l-1})/\sigma_{R_2,n}^2-1\xrightarrow{p}0$. Together with $\sum_{l=1}^nE(R_{n,l}^2|\mathcal F_{n,l-1})/\sigma_{R_1,n}^2-1\xrightarrow{p}0$, we have
\[
\sum_{l=1}^{2n}E(R_{n,l}^2|\mathcal F_{n,l-1})-(\sigma_{R_1,n}^2+\sigma_{R_2,n}^2)\xrightarrow{p}0,\ \text{i.e.,}\ \sum_{l=1}^{2n}E(R_{n,l}^{*2}|\mathcal F_{n,l-1})-1\xrightarrow{p}0.
\]
The next step is to show $\sum_{l=n+1}^{2n}E|R_{n,l}|^{2+\delta_R}\to 0$ as $n\to\infty$. For $l=n+1,\cdots,2n$, let $i=l-n$; then for any positive constants $p$ and $q$ such that $1/p+1/q=1$, we have
\[
|R_{n,l}|=\frac{1}{\sqrt n}\Big|b_i\eta_i+2\eta_i\sum_{j=1}^{i-1}c_{ij}\eta_j+c_{ii}(\eta_i^2-\sigma_\eta^2)+\eta_i\sum_j^ne_{ij}\zeta_j\Big|
\le\frac{1}{\sqrt n}\Big(|b_i||\eta_i|+2|\eta_i|\sum_{j=1}^{i-1}|c_{ij}||\eta_j|+|c_{ii}||\eta_i^2-\sigma_\eta^2|+|\eta_i|\sum_j^n|e_{ij}||\zeta_j|\Big)
\]
\[
=\frac{1}{\sqrt n}\Big(|b_i|^{\frac1p}|b_i|^{\frac1q}|\eta_i|+2|\eta_i|\sum_{j=1}^{i-1}|c_{ij}|^{\frac1p}|c_{ij}|^{\frac1q}|\eta_j|+|c_{ii}|^{\frac1p}|c_{ii}|^{\frac1q}|\eta_i^2-\sigma_\eta^2|+|\eta_i|\sum_j^n|e_{ij}|^{\frac1p}|e_{ij}|^{\frac1q}|\zeta_j|\Big).
\]
By Hölder's inequality, we can get
\[
|R_{n,l}|^q\le n^{-q/2}\Big[\Big(|b_i|+\sum_{j=1}^i|c_{ij}|+\sum_j^n|e_{ij}|\Big)^{\frac1p}\Big((|b_i|^{\frac1q}|\eta_i|)^q+\sum_{j=1}^{i-1}(2|\eta_i||c_{ij}|^{\frac1q}|\eta_j|)^q+(|c_{ii}|^{\frac1q}|\eta_i^2-\sigma_\eta^2|)^q+\sum_j^n(|\eta_i||e_{ij}|^{\frac1q}|\zeta_j|)^q\Big)^{\frac1q}\Big]^q
\]
\[
=n^{-\frac q2}\Big(|b_i|+\sum_{j=1}^i|c_{ij}|+\sum_j^n|e_{ij}|\Big)^{\frac qp}\Big(|b_i||\eta_i^q|+2^q\sum_{j=1}^{i-1}|c_{ij}||\eta_i^q||\eta_j^q|+|c_{ii}||\eta_i^2-\sigma_\eta^2|^q+|\eta_i^q|\sum_j^n|e_{ij}||\zeta_j^q|\Big).
\]
As $C_n$ and $E_n$ are uniformly bounded in both row and column sum norms and the elements of $B_n$ are uniformly bounded, there exists a constant $c_R$ such that $|b_i|\le c_R/3$, $\sum_{j=1}^i|c_{ij}|\le c_R/3$, and $\sum_j^n|e_{ij}|\le c_R/3$ for all $i$ and $n$. Hence,
\[
|R_{n,l}|^q\le n^{-\frac q2}c_R^{q/p}\Big(|b_i||\eta_i^q|+2^q\sum_{j=1}^{i-1}|c_{ij}||\eta_i^q||\eta_j^q|+|c_{ii}||\eta_i^2-\sigma_\eta^2|^q+\sum_j^n|e_{ij}||\eta_i^q||\zeta_j^q|\Big).
\]
There exists a finite constant $c_q>1$ such that $E|\zeta_i^q|\le c_q$ and $E|\eta_i^2-\sigma_\eta^2|^q\le c_q$ for all $i$ and $n$. Taking $q=2+\delta_R$, as $E|\eta_i^{2(2+\delta_R)}|$ exists and $\sum_j^n(|e_{ij}||\eta_i^q|)\le c_G|\eta_i^q|$ where $c_G=\|E_n\|_\infty<\infty$, it follows that
\[
\sum_{l=n+1}^{2n}E|R_{n,l}|^{2+\delta_R}\le 2^{2+\delta_R}n^{-\frac{2+\delta_R}{2}}c_R^{(2+\delta_R)/p}c_q^2\sum_{i=1}^n\Big(E|b_i|+\sum_{j=1}^iE|c_{ij}|+E|c_{ii}|+\sum_j^nE(|e_{ij}||\zeta_j^{2+\delta_R}|)\Big)=O(n^{-\frac{\delta_R}{2}}).
\]
As $\sigma_{R_n}^2=O(1)$ and is bounded away from zero,
\[
\sum_{l=1}^{2n}E|R_{n,l}^*|^{2+\delta_R}=\frac{1}{\sigma_{R_n}^{2+\delta_R}}\sum_{l=1}^{2n}E|R_{n,l}|^{2+\delta_R}=O(n^{-\frac{\delta_R}{2}}),
\]
which goes to zero as $n$ tends to infinity. The conditions for the martingale CLT are thus satisfied, so $R_n/\sigma_{R_n}\xrightarrow{d}N(0,1)$. Q.E.D.
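The variance expression for $\sigma_{R_1,n}^2$ used in the proof can be checked by simulation. The following is an illustrative sketch with assumed simple ingredients (not the paper's setup): $R_{1,n}^d=n^{-1/2}[A_n'\zeta_n+\zeta_n'D_n\zeta_n-E(\zeta_n'D_n\zeta_n)]$ with i.i.d. $\zeta_i$ drawn as centered exponentials, so that the third- and fourth-moment corrections in the formula are nonzero.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 100, 20000

# Symmetric D_n and linear coefficients a, as in R^d_{1,n}
D = rng.uniform(-0.5, 0.5, (n, n))
D = (D + D.T) / 2
a = rng.uniform(-1, 1, n)

# zeta_i i.i.d. with mean 0, variance 1, E(zeta^3) = 2, E(zeta^4) = 9
zeta = rng.exponential(1.0, (reps, n)) - 1.0

q = zeta @ a + ((zeta @ D) * zeta).sum(axis=1)   # a'zeta + zeta'D zeta
R = (q - np.trace(D)) / np.sqrt(n)               # E(zeta'D zeta) = tr(D)

dii = np.diag(D)
# sigma^2_{R1,n} = sig^2/n a'a + sig^4/n tr(D^2+DD') + [E(z^4)-3 sig^4]/n sum d_ii^2
#                  + 2/n E(z^3) sum a_i d_ii, here with sig^2 = 1
var_formula = (a @ a + np.trace(D @ D + D @ D.T)
               + (9 - 3) * (dii @ dii) + 2 * 2 * (a @ dii)) / n
var_mc = (R ** 2).mean()
assert abs(var_mc - var_formula) / var_formula < 0.1
print(var_mc, var_formula)
```

The Monte Carlo variance matches the closed-form expression to within sampling error; the specific dimensions, distributions, and tolerance are assumptions of the sketch.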
Proof of Lemma 3 (the nonsingular information matrix). The information matrix is $I_\theta=-\lim_{n\to\infty}E\big(\frac{1}{n}\frac{\partial^2\ln L_n(\theta_0)}{\partial\theta\partial\theta'}\big)$, where the expression of $E\big(\frac{\partial^2\ln L_n(\theta_0)}{\partial\theta\partial\theta'}\big)$ can be found in Appendix A.3. Let $C_I=(c_1,c_2',\mathrm{vec}(c_3)',c_4,c_5',c_6')'$ be a $k_\theta$-dimensional column vector of constants such that $I_{\theta_0}C_I=0$, where $c_1$ and $c_4$ are scalars; $c_2$, $c_5$, and $c_6$ are column vectors of dimensions $k$, $J$, and $p$; and $c_3$ is a $k\times p$ matrix. To show that the information matrix is nonsingular, it is sufficient to show that $C_I=0$.

From the second row block of the linear equation system $I_{\theta_0}C_I=0$, we have
\[
\lim_{n\to\infty}\frac{1}{n}[-c_1X_n'E(G_nX_n\beta_0+G_n\varepsilon_n\delta_0)-X_n'X_nc_2+(\delta_0'\otimes(X_n'X_n))\mathrm{vec}(c_3)]=0.
\]
Therefore,
\[
\lim_{n\to\infty}\frac{1}{n}X_n'X_nc_2=\lim_{n\to\infty}\frac{1}{n}[(\delta_0'\otimes(X_n'X_n))\mathrm{vec}(c_3)-c_1X_n'E(G_nX_n\beta_0+G_n\varepsilon_n\delta_0)].
\]
From the third row block, we have
\[
\lim_{n\to\infty}\frac{1}{n}[c_1(\delta_0\otimes X_n')E(G_nX_n\beta_0+G_n\varepsilon_n\delta_0)+(\delta_0\otimes(X_n'X_n))c_2-((\Sigma_{\varepsilon 0}^{-1}\sigma_{v0}^2)\otimes(X_n'X_n))\mathrm{vec}(c_3)]=0.
\]
Since $(\delta_0\otimes(X_n'X_n))c_2=\delta_0\otimes(X_n'X_nc_2)$, by eliminating $\lim_{n\to\infty}\frac{1}{n}X_n'X_nc_2$, we have
\[
\mathrm{vec}\Big(\lim_{n\to\infty}\frac{X_n'X_n}{n}c_3(\delta_0\delta_0'-\sigma_{v0}^2\Sigma_{\varepsilon 0}^{-1})\Big)=\mathrm{vec}\Big(\lim_{n\to\infty}\frac{X_n'X_n}{n}c_3\Sigma_{\varepsilon 0}^{-1}(\sigma_{v\varepsilon}\sigma_{v\varepsilon}'-\sigma_{v0}^2\Sigma_{\varepsilon 0})\Sigma_{\varepsilon 0}^{-1}\Big)=0.
\]
Under Assumption 2.5), $\sigma_{v\varepsilon}\sigma_{v\varepsilon}'-\sigma_{v0}^2\Sigma_{\varepsilon 0}$ is negative definite by the Cauchy–Schwarz inequality, and Assumption 2.3) implies that $\lim_{n\to\infty}\frac{X_n'X_n}{n}$ is positive definite. Hence, it follows that $c_3=0$. Now, with $c_3=0$,
\[
c_2=-c_1\lim_{n\to\infty}(X_n'X_n)^{-1}X_n'E(G_nX_n\beta_0+G_n\varepsilon_n\delta_0).
\]
From the fourth row block, we have
\[
\lim_{n\to\infty}\frac{1}{n}E[-c_1\mathrm{tr}(G_n)]-\frac{c_4}{2\sigma_{\xi 0}^2}-\sum_{j=1}^J\frac{c_{5j}}{2\sigma_{\xi 0}^2}\delta_0'\frac{\partial\Sigma_{\varepsilon 0}}{\partial\alpha_j}\delta_0+\frac{\delta_0'}{\sigma_{\xi 0}^2}c_6=0.\tag{B.1}
\]
From the sixth row block, we have
\[
\lim_{n\to\infty}\frac{1}{n}c_1\{2\delta_0E[\mathrm{tr}(G_n)]-\Sigma_{\varepsilon 0}^{-1}E[\varepsilon_n'G_n(X_n\beta_0+\varepsilon_n\delta_0)]\}+c_4\frac{\delta_0}{\sigma_{\xi 0}^2}+\sum_{j=1}^J\frac{c_{5j}}{\sigma_{\xi 0}^2}\Sigma_{\varepsilon 0}^{-1}\sigma_{v0}^2\frac{\partial\Sigma_{\varepsilon 0}}{\partial\alpha_j}\delta_0-\Big(\Sigma_{\varepsilon 0}^{-1}+\frac{2\delta_0\delta_0'}{\sigma_{\xi 0}^2}\Big)c_6=0.\tag{B.2}
\]
Then (B.1)$\times 2\delta_0$+(B.2) gives
\[
\lim_{n\to\infty}\frac{c_1}{n}\Sigma_{\varepsilon 0}^{-1}E[\varepsilon_n'G_n(X_n\beta_0+\varepsilon_n\delta_0)]-\sum_{j=1}^Jc_{5j}\Sigma_{\varepsilon 0}^{-1}\frac{\partial\Sigma_{\varepsilon 0}}{\partial\alpha_j}\delta_0+\Sigma_{\varepsilon 0}^{-1}c_6=0.
\]
Therefore,
\[
c_6=\sum_{j=1}^Jc_{5j}\frac{\partial\Sigma_{\varepsilon 0}}{\partial\alpha_j}\delta_0-c_1\lim_{n\to\infty}\frac{1}{n}E[\varepsilon_n'G_n(X_n\beta_0+\varepsilon_n\delta_0)].
\]
From the fifth row block, for any $k=1,\ldots,J$, we have
\[
\lim_{n\to\infty}\frac{1}{n}c_1\Big\{\delta_0'\frac{\partial\Sigma_{\varepsilon 0}}{\partial\alpha_k}\Sigma_{\varepsilon 0}^{-1}E[\varepsilon_n'G_n(X_n\beta_0+\varepsilon_n\delta_0)-\Sigma_{\varepsilon 0}\delta_0\mathrm{tr}(G_n)]\Big\}-\frac{c_4}{2\sigma_{\xi 0}^2}\delta_0'\frac{\partial\Sigma_{\varepsilon 0}}{\partial\alpha_k}\delta_0+\frac{\sigma_{v0}^2}{\sigma_{\xi 0}^2}\delta_0'\frac{\partial\Sigma_{\varepsilon 0}}{\partial\alpha_k}\Sigma_{\varepsilon 0}^{-1}c_6\tag{B.3}
\]
\[
-\sum_{j=1}^Jc_{5j}\Big[\frac{\sigma_{\xi 0}^2}{2}\mathrm{tr}\Big(\frac{\partial\Sigma_{\varepsilon 0}}{\partial\alpha_j}\Sigma_{\varepsilon 0}^{-1}\frac{\partial\Sigma_{\varepsilon 0}}{\partial\alpha_k}\Sigma_{\varepsilon 0}^{-1}\Big)+\delta_0'\frac{\partial\Sigma_{\varepsilon 0}}{\partial\alpha_k}\Sigma_{\varepsilon 0}^{-1}\frac{\partial\Sigma_{\varepsilon 0}}{\partial\alpha_j}\delta_0+\frac{1}{2\sigma_{\xi 0}^2}\delta_0'\frac{\partial\Sigma_{\varepsilon 0}}{\partial\alpha_k}\delta_0\,\delta_0'\frac{\partial\Sigma_{\varepsilon 0}}{\partial\alpha_j}\delta_0\Big]=0.
\]
Plugging $c_6$ into (B.1), (B.1) becomes
\[
-c_1\lim_{n\to\infty}\frac{E\mathrm{tr}(G_n)}{n}-\frac{c_4}{2\sigma_{\xi 0}^2}+\frac{\delta_0'}{2\sigma_{\xi 0}^2}\Big(\sum_{j=1}^Jc_{5j}\frac{\partial\Sigma_{\varepsilon 0}}{\partial\alpha_j}\delta_0-2c_1\lim_{n\to\infty}\frac{1}{n}E[\varepsilon_n'G_n(X_n\beta_0+\varepsilon_n\delta_0)]\Big)=0.
\]
Plugging $c_6$ into (B.3), (B.3) becomes
\[
\delta_0'\frac{\partial\Sigma_{\varepsilon 0}}{\partial\alpha_k}\delta_0\Big[\frac{\delta_0'}{2\sigma_{\xi 0}^2}\Big(\sum_{j=1}^Jc_{5j}\frac{\partial\Sigma_{\varepsilon 0}}{\partial\alpha_j}\delta_0-2c_1\lim_{n\to\infty}\frac{1}{n}E[\varepsilon_n'G_n(X_n\beta_0+\varepsilon_n\delta_0)]\Big)-c_1\lim_{n\to\infty}\frac{E\mathrm{tr}(G_n)}{n}-\frac{c_4}{2\sigma_{\xi 0}^2}\Big]
-\frac{\sigma_{\xi 0}^2}{2}\sum_{j=1}^Jc_{5j}\mathrm{tr}\Big(\frac{\partial\Sigma_{\varepsilon 0}}{\partial\alpha_j}\Sigma_{\varepsilon 0}^{-1}\frac{\partial\Sigma_{\varepsilon 0}}{\partial\alpha_k}\Sigma_{\varepsilon 0}^{-1}\Big)=0.
\]
These two equations imply
\[
\sum_{j=1}^Jc_{5j}\mathrm{tr}\Big(\frac{\partial\Sigma_{\varepsilon 0}}{\partial\alpha_j}\Sigma_{\varepsilon 0}^{-1}\frac{\partial\Sigma_{\varepsilon 0}}{\partial\alpha_k}\Sigma_{\varepsilon 0}^{-1}\Big)=0\quad\text{for any }k=1,\ldots,J.
\]
Since the $\alpha_j$'s are distinct parameter elements in $\Sigma_\varepsilon$, all the entries in $\frac{\partial\Sigma_{\varepsilon 0}}{\partial\alpha_k}$ are zero except for the place where $\alpha_k$ is. Hence, for each $k=1,\ldots,J$, $c_{5k}=0$, i.e., $c_5=0$. With $c_3=0$ and $c_5=0$, we have
\[
c_6=-c_1\lim_{n\to\infty}\frac{1}{n}E[\varepsilon_n'G_n(X_n\beta_0+\varepsilon_n\delta_0)],\qquad
c_4=-2c_1\delta_0'\lim_{n\to\infty}\frac{1}{n}E[\varepsilon_n'G_n(X_n\beta_0+\varepsilon_n\delta_0)]-2c_1\sigma_{\xi 0}^2\lim_{n\to\infty}\frac{E\mathrm{tr}(G_n)}{n}.
\]
Turning to the first row block,
\[
0=-\lim_{n\to\infty}c_1\frac{1}{n}E[(X_n\beta_0+\varepsilon_n\delta_0)'G_n'G_n(X_n\beta_0+\varepsilon_n\delta_0)]-c_1\lim_{n\to\infty}\frac{1}{n}\sigma_{\xi 0}^2\mathrm{tr}[E(G_n^2+G_nG_n')]
\]
\[
+c_1\lim_{n\to\infty}\frac{1}{n}E[(X_n\beta_0+\varepsilon_n\delta_0)'G_n']X_n(X_n'X_n)^{-1}X_n'E[G_n(X_n\beta_0+\varepsilon_n\delta_0)]+2c_1\sigma_{\xi 0}^2\Big(\lim_{n\to\infty}\frac{E\mathrm{tr}(G_n)}{n}\Big)^2
\]
\[
+c_1\Big(\lim_{n\to\infty}\frac{E\varepsilon_n'G_n(X_n\beta_0+\varepsilon_n\delta_0)}{n}\Big)'\Sigma_{\varepsilon 0}^{-1}\Big(\lim_{n\to\infty}\frac{E\varepsilon_n'G_n(X_n\beta_0+\varepsilon_n\delta_0)}{n}\Big).
\]
Denote $H_n=G_n(X_n\beta_0+\varepsilon_n\delta_0)$; then
\[
0=c_1\lim_{n\to\infty}\frac{1}{n}\big[E(H_n'H_n)-E(H_n'\varepsilon_n)\Sigma_{\varepsilon 0}^{-1}E(\varepsilon_n'H_n)-E(H_n')X_n(X_n'X_n)^{-1}X_n'E(H_n)\big]
+c_1\lim_{n\to\infty}\sigma_{\xi 0}^2\bigg(\frac{E\mathrm{tr}(G_n^2+G_nG_n')}{n}-2\Big(\frac{E\mathrm{tr}(G_n)}{n}\Big)^2\bigg).
\]
By the Cauchy–Schwarz inequality, $E(H_n'H_n)-E(H_n')E(H_n)\ge E(H_n'\varepsilon_n)\Sigma_{\varepsilon 0}^{-1}E(\varepsilon_n'H_n)$. Hence,
\[
E(H_n'H_n)-E(H_n'\varepsilon_n)\Sigma_{\varepsilon 0}^{-1}E(\varepsilon_n'H_n)-E(H_n')X_n(X_n'X_n)^{-1}X_n'E(H_n)
\ge E(H_n')E(H_n)-E(H_n')X_n(X_n'X_n)^{-1}X_n'E(H_n)=E(H_n')[I_n-X_n(X_n'X_n)^{-1}X_n']E(H_n)\ge 0.
\]
Because Assumption 3.2 implies that $\lim_{n\to\infty}\frac{1}{n}E(H_n')[I_n-X_n(X_n'X_n)^{-1}X_n']E(H_n)$ is positive definite, and
\[
E[\mathrm{tr}(G_n^2+G_nG_n')]-\frac{2}{n}E^2[\mathrm{tr}(G_n)]\ge E\Big[\mathrm{tr}(G_n^2+G_nG_n')-\frac{2}{n}\mathrm{tr}^2(G_n)\Big]=\frac{1}{2}E\,\mathrm{tr}\Big[\Big(G_n+G_n'-\frac{2\mathrm{tr}(G_n)}{n}I_n\Big)^2\Big]\ge 0,
\]
it follows that $c_1=0$, and therefore $c_2$, $c_4$, and $c_6$ are all zeros. Q.E.D.
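The final step of the proof uses the trace identity $\mathrm{tr}(G^2+GG')-\frac{2}{n}\mathrm{tr}^2(G)=\frac{1}{2}\mathrm{tr}[(G+G'-\frac{2\mathrm{tr}(G)}{n}I_n)^2]$, whose right side is the trace of the square of a symmetric matrix and hence nonnegative. A short numerical sketch, with an arbitrary random matrix standing in for $G_n$:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
G = rng.standard_normal((n, n))  # arbitrary stand-in for G_n

# tr(G^2 + GG') - (2/n) tr(G)^2 equals (1/2) tr(S^2) with symmetric S
lhs = np.trace(G @ G + G @ G.T) - 2 * np.trace(G) ** 2 / n
S = G + G.T - 2 * np.trace(G) / n * np.eye(n)
rhs = 0.5 * np.trace(S @ S)

assert abs(lhs - rhs) < 1e-10
assert rhs >= 0  # trace of the square of a symmetric matrix
print(lhs, rhs)
```

Since $S$ is symmetric, $\mathrm{tr}(S^2)=\sum_{ij}s_{ij}^2\ge 0$, which delivers the nonnegativity claimed in the proof.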
Proof of Theorem 2. The stochastic terms in the second derivatives can be written in the general form associated with $A_n$ (the $h$-spatial dependence of $\varepsilon_n$) or $B_n$ (an increasing-dependence geometric series expansion), so from Propositions 1 and 2 we have the pointwise convergence
\[
\frac{1}{n}\Big[\frac{\partial^2\ln L_n(\theta|Y_n,Z_n)}{\partial\theta\partial\theta'}-E\Big(\frac{\partial^2\ln L_n(\theta|Y_n,Z_n)}{\partial\theta\partial\theta'}\Big)\Big]\xrightarrow{p}0.
\]
All parameters in $\theta$ enter the second derivatives polynomially except for $\mathrm{tr}[G_n^2(\lambda)]$. As
\[
\frac{1}{n}\big|\mathrm{tr}[G_n^2(\lambda_1)]-\mathrm{tr}[G_n^2(\lambda_2)]\big|=\frac{1}{n}\big|2\mathrm{tr}[G_n^3(\bar\lambda)](\lambda_1-\lambda_2)\big|=|\lambda_1-\lambda_2|O(1),
\]
the second derivatives are stochastically equicontinuous. Then the uniform convergence holds:
\[
\sup_{\theta\in\Theta}\frac{1}{n}\Big|\frac{\partial^2\ln L_n(\theta|Y_n,Z_n)}{\partial\theta\partial\theta'}-E\Big(\frac{\partial^2\ln L_n(\theta|Y_n,Z_n)}{\partial\theta\partial\theta'}\Big)\Big|\xrightarrow{p}0.
\]
Hence,
\[
\sqrt n(\hat\theta-\theta_0)=-\Big(\frac{1}{n}\frac{\partial^2\ln L_n(\tilde\theta)}{\partial\theta\partial\theta'}\Big)^{-1}\frac{1}{\sqrt n}\frac{\partial\ln L_n(\theta_0)}{\partial\theta}
=-\Big[E\Big(\frac{1}{n}\frac{\partial^2\ln L_n(\theta_0)}{\partial\theta\partial\theta'}\Big)\Big]^{-1}\frac{1}{\sqrt n}\frac{\partial\ln L_n(\theta_0)}{\partial\theta}+o_p(1)
\xrightarrow{d}N\bigg(0,\Big[\lim_{n\to\infty}E\Big(\frac{1}{n}\frac{\partial\ln L_n(\theta_0)}{\partial\theta}\frac{\partial\ln L_n(\theta_0)}{\partial\theta'}\Big)\Big]^{-1}\bigg).
\]
The convergence in distribution holds because any linear combination of the elements of $\frac{\partial\ln L_n(\theta_0)}{\partial\theta}$ can be expressed in the form of $R_n$ in the CLT of Lemma 2. By the Cramér–Wold device, we can conclude that $\sqrt n(\hat\theta-\theta_0)\xrightarrow{d}N(0,I_\theta^{-1})$. Q.E.D.