arXiv:2101.10255v2 [econ.EM] 31 Aug 2021
Consistent specification testing under spatial
dependence∗
Abhimanyu Gupta†‡ Xi Qu§¶
September 1, 2021
Abstract
We propose a series-based nonparametric specification test for a regression function
when data are spatially dependent, the ‘space’ being of a general economic or social
nature. Dependence can be parametric, parametric with increasing dimension, semi-
parametric or any combination thereof, thus covering a vast variety of settings. These
include spatial error models of varying types and levels of complexity. Under a new
smooth spatial dependence condition, our test statistic is asymptotically standard nor-
mal. To prove the latter property, we establish a central limit theorem for quadratic
forms in linear processes in an increasing dimension setting. Finite sample perfor-
mance is investigated in a simulation study, with a bootstrap method also justified and
illustrated, and empirical examples illustrate the test with real-world data.
Keywords: Specification testing, nonparametric regression, spatial dependence, cross-
sectional dependence
JEL Classification: C21, C55
∗We thank the co-editor and three anonymous referees for insightful comments that improved the paper. We are grateful to Swati Chandna, Miguel Delgado, Emmanuel Guerre, Fernando Lopez Hernandez, Hon Ho Kwok, Arthur Lewbel, Daisuke Murakami, Ryo Okui and Amol Sasane for helpful comments. We also thank seminar participants at YEAP 2018 (Shanghai University of Finance and Economics), NYU Shanghai, Carlos III Madrid, SEW 2018 (Dijon), Aarhus University, SEA 2018 (Vienna), EcoSta 2018 (Hong Kong), Hong Kong University, AFES 2018 (Cotonou), ESEM 2018 (Cologne), CFE 2018 (Pisa), University of York, Pennsylvania State University, Michigan State University, University of Michigan, Texas A&M University, 1st Southampton Workshop on Econometrics and Statistics and MEG 2019 (Columbus).
†Department of Economics, University of Essex, Wivenhoe Park, Colchester CO4 3SQ, UK. E-mail: [email protected].
‡Research supported by ESRC grant ES/R006032/1.
§Antai College of Economics and Management, Shanghai Jiao Tong University, Shanghai, China, 200052. E-mail: [email protected].
¶Research supported by the National Natural Science Foundation of China, Project nos. 71973097, 71601115 and 72031006.
1 Introduction
Models for spatial dependence have recently become the subject of vigorous research. This
burgeoning interest has roots in the needs of practitioners who frequently have access to data
sets featuring inter-connected cross-sectional units. Motivated by these practical concerns,
we propose a specification test for a regression function in a general setup that covers a vast
variety of commonly employed spatial dependence models and permits the complexity of
dependence to increase with sample size. Our test is consistent, in the sense that a parametric
specification is tested with asymptotically unit power against a nonparametric alternative.
The ‘spatial’ models that we study are not restricted to be geographic in nature; indeed, ‘space’ can be a very general economic or social space. Our empirical examples, for instance, feature conflict alliances and technology externalities as forms of ‘spatial dependence’.
Specification testing is an important problem, and this is reflected in a huge literature
studying consistent tests. Much of this is based on independent, and often also identically
distributed, data. However data frequently exhibit dependence and consequently a branch
of the literature has also examined specification tests under time series dependence. Our
interest centres on dependence across a ‘space’, which differs quite fundamentally from de-
pendence in a time series context. Time series are naturally ordered and locations of the
observations can be observed, or at least the process generating these locations may be
modelled. It can be imagined that concepts from time series dependence be extended to
settings where the data are observed on a geographic space and dependence can be treated
as decreasing function of distance between observations. Indeed much work has been done to
extend notions of time series dependence in this type of setting, see e.g. Jenish and Prucha
(2009, 2012).
However, in a huge variety of economics and social science applications agents influence
each other in ways that do not conform to such a setting. For example, farmers affect the de-
mand of farmers in the same village but not in different villages, as in Case (1991). Likewise,
price competition among firms exhibits spatial features (Pinkse, Slade, and Brett (2002)),
input-output relations lead to complementarities between sectors (Conley and Dupor (2003)),
co-author connections form among scientists (Oettl (2012), Mohnen (2021)), R&D spillovers
occur through technology and product market spaces (Bloom, Schankerman, and van Reenen
(2013)), networks form due to allegiances in conflicts (Konig, Rohner, Thoenig, and Zilibotti
(2017)) and overlapping bank portfolios lead to correlated lending decisions (Gupta, Kokas, and Michaelides
(2021)). Such examples cannot be studied by simply extending results developed for time
series and illustrate the growing need for suitable methods.
A very popular model for general spatial dependence is the spatial autoregressive (SAR)
class, due to Cliff and Ord (1973). The key feature of SAR models, and various gener-
alizations such as SARMA (SAR moving average) and matrix exponential spatial speci-
fications (MESS, due to LeSage and Pace (2007)), is the presence of one or more spatial
weight matrices whose elements characterize the links between agents. As noted above,
these links may form for a variety of reasons, so the ‘spatial’ terminology represents a very
general notion of space, such as social or economic space. Key papers on the estimation of
SAR models and their variants include Kelejian and Prucha (1998) and Lee (2004), but re-
search on various aspects of these is active and ongoing, see e.g. Robinson and Rossi (2015);
Hillier and Martellosio (2018a,b); Kuersteiner and Prucha (2020); Han, Lee, and Xu (2021);
Hahn, Kuersteiner, and Mazzocco (2021).
Unlike work focusing on independent or time series data, a general drawback of spatially oriented research has been the lack of unified theory. Typically, individual papers have studied specific special cases of various spatial specifications. A strand of the
literature has introduced the notion of a cross-sectional linear-process to help address this
problem, and we follow this approach. This representation can accommodate SAR mod-
els in the error term (so called spatial error models (SEM)) as a special case, as well as
variants like SARMA and MESS, whence its generality is apparent. The linear-process
structure shares some similarities with that familiar from the time series literature (see e.g.
Hannan (1970)). Indeed, time series versions may be regarded as very special cases but,
as stressed before, the features of spatial dependence must be taken into account in the
general formulation. Such a representation was introduced by Robinson (2011) and further
examined in other situations by Robinson and Thawornkaiwong (2012) (partially linear re-
gression), Delgado and Robinson (2015) (non-nested correlation testing), Lee and Robinson
(2016) (series estimation of nonparametric regression) and Hidalgo and Schafgans (2017)
(cross-sectionally dependent panels).
In this paper, we propose a test statistic similar to that of Hong and White (1995), based
on estimating the nonparametric specification via series approximations. Assuming an inde-
pendent and identically distributed sample, their statistic is based on the sample covariance
between the residual from the parametric model and the discrepancy between the paramet-
ric and nonparametric fitted values. Allowing additionally for spatial dependence through
the form of a linear process as discussed above, our statistic is shown to be asymptotically
standard normal, consistent and possessing nontrivial power against local alternatives of a
certain type. To prove asymptotic normality, we present a new central limit theorem (CLT)
for quadratic forms in linear processes in an increasing dimension setting that may be of
independent interest. A CLT for quadratic forms under time series dependence in the context of series estimation can be found in Gao and Anh (2000), and our result can be viewed
as complementary to this. The setting of Su and Qu (2017) is a very special case of our
framework. There has been recent interest in specification testing for spatial models, see for
example Sun (2020) for a kernel-based model specification test and Lee, Phillips, and Rossi
(2020) for a consistent omnibus test. We contribute to this literature by studying a linear
process based increasing parameter dimension framework.
Our linear process framework permits spatial dependence to be parametric, parametric
with increasing dimension, semiparametric or any combination thereof, thus covering a vast
variety of settings. A class of models of great empirical interest is that of ‘higher-order’ SAR models in the outcome variables, but with spatial dependence structure also in the errors.
We initially present the familiar nonparametric regression to clarify the exposition, and then
cover this class as the main model of interest. Our theory covers as special cases SAR, SMA,
SARMA, MESS models for the error term. These specifications may be of any fixed spatial
order, but our theory also covers the case where they are of increasing order.
Thus we permit a more complex model of spatial dependence as more data become avail-
able, which encourages a more flexible approach to modelling such dependence as stressed
by Gupta and Robinson (2015, 2018) in a higher-order SAR context, Huber (1973), Portnoy
(1984, 1985) and Anatolyev (2012) in a regression context and Koenker and Machado (1999)
for the generalized method of moments setting, amongst others. This literature focuses on
a sequence of true models, rather than a sequence of models approximating an infinite true
model. Our paper also takes the same approach. On the other hand, in the spatial setting,
Gupta (2018a) considers increasing lag models as approximations to an infinite lag model
with lattice data and also suggests criteria for choice of lag length.
Our framework is also extended to the situation where spatial dependence occurs through
nonparametric functions of raw distances (these may be exogenous economic or social dis-
tances, say), as in Pinkse et al. (2002). This allows for greater flexibility in modelling spatial
weights as the practitioner only has to choose an exogenous economic distance measure and
allow the data to determine the functional form. It also adds a degree of robustness to the
theory by avoiding potential parametric misspecification. The case of geographical data is
also covered, for example the important classes of Matern and Wendland (see e.g. Gneiting
(2002)) covariance functions. Finally, we introduce a new notion of smooth spatial depen-
dence that provides more primitive, and checkable, conditions for certain properties than
extant ones in the literature.
To illustrate the performance of the test in finite samples, we present Monte Carlo simu-
lations that exhibit satisfactory small sample properties. The test is demonstrated in three
empirical examples, including two based on recently published work on social networks:
Bloom et al. (2013) (R&D spillovers in innovation), Konig et al. (2017) (conflict alliances
during the Congolese civil war). Another example studies cross-country spillovers in economic growth. Our test rejects the null hypothesis of a linear regression in some of these examples but not in others, illustrating its ability to distinguish between the null and alternative models.
The next section introduces our basic setup using a nonparametric regression with no
SAR structure in responses. We treat this abstraction as a base case, and Section 3 discusses
estimation and defines the test statistic, while Section 4 introduces assumptions and the
key asymptotic results of the paper. Section 5 examines the most commonly employed
higher-order SAR models, while Section 6 deals with nonparametric spatial error structures.
Nonparametric specification tests are often criticized for poor finite sample performance
when using the asymptotic critical values. In Section 7 we present a bootstrap version
of our testing procedure. Sections 8 and 9 contain a study of finite sample performance
and the empirical examples respectively. Proofs are contained in appendices, including a
supplementary online appendix.
For the convenience of the reader, we collect some frequently used notation here. First, introduce three notational conventions for any parameter ν for the rest of the paper: ν ∈ R^{d_ν}, ν_0 denotes the true value of ν and, for any scalar, vector or matrix valued function f(ν), we denote f ≡ f(ν_0). Let ϕ̄(·) (respectively ϕ(·)) denote the largest (respectively smallest) eigenvalue of a generic square nonnegative definite matrix argument. For a generic matrix A, denote ‖A‖ = [ϕ̄(A′A)]^{1/2}, i.e. the spectral norm of A, which reduces to the Euclidean norm if A is a vector. ‖A‖_R denotes the maximum absolute row sum norm of a generic matrix A, while ‖A‖_F = [tr(AA′)]^{1/2} denotes the Frobenius norm. Throughout the paper | · | is absolute value when applied to a scalar and determinant when applied to a matrix. Denote by c (C) generic positive constants, independent of any quantities that tend to infinity, and arbitrarily small (big).
2 Setup
To illustrate our approach, we first consider the nonparametric regression
yi = θ0 (xi) + ui, i = 1, . . . , n, (2.1)
where θ0(·) is an unknown function and xi is a vector of strictly exogenous explanatory
variables with support X ⊂ Rk. Spatial dependence is explicitly modeled via the error term
ui, which we assume is generated by:
u_i = ∑_{s=1}^∞ b_{is} ε_s, (2.2)
where εs are independent random variables, with zero mean and identical variance σ20. Fur-
ther conditions on the εs will be assumed later. The linear process coefficients bis can depend
on n, as may the covariates xi. This is generally the case with spatial models and implies
that asymptotic theory ought to be developed for triangular arrays. There are a number of
reasons to permit dependence on sample size. The bis can depend on spatial weight matrices,
which are usually normalized for both stability and identification purposes.
Such normalizations, e.g. row-standardization or division by spectral norm, may be n-
dependent. Furthermore, xi often includes underlying covariates of ‘neighbours’ defined by
spatial weight matrices. For instance, for some n × 1 covariate vector z and exogenous
spatial weight matrix W ≡ Wn, a component of xi can be e′iWz, where ei has unity in the
i-th position and zeroes elsewhere, which depends on n. Thus, subsequently, any spatial
weight matrices will also be allowed to depend on n. Finally, treating triangular arrays
permits re-labelling of quantities that is often required when dealing with spatial data, due
to the lack of natural ordering, see e.g. Robinson (2011). We suppress explicit reference to
this n-dependence of various quantities for brevity, although mention will be made of this at
times to remind the reader of this feature.
Now, assume the existence of a dγ × 1 vector γ0 such that bis = bis(γ0), possibly with
dγ → ∞ as n → ∞, for all i = 1, . . . , n and s ≥ 1. Let u be the n × 1 vector with typical
element ui, ε be the infinite dimensional vector with typical element εs, and B be an infinite
dimensional matrix (Cooke, 1950) with typical element bis. In matrix form,
u = Bε and E(uu′) = σ_0^2 BB′ = σ_0^2 Σ ≡ σ_0^2 Σ(γ_0). (2.3)
We assume that γ0 ∈ Γ, where Γ is a compact subset of Rdγ . With dγ diverging, ensuring
Γ has bounded volume requires some care, see Gupta and Robinson (2018). For a known
function f(·), our aim is to test
H_0 : P[θ_0(x_i) = f(x_i, α_0)] = 1, for some α_0 ∈ A ⊂ R^{d_α}, (2.4)
against the global alternative H_1 : P[θ_0(x_i) ≠ f(x_i, α)] > 0, for all α ∈ A.
We now nest commonly used models for spatial dependence in (2.3). Introduce a set
of n × n spatial weight (equivalently network adjacency) matrices W_j, j = 1, . . . , m_1 + m_2. Each W_j can be thought of as representing dependence through a particular space.
Now, consider models of the form Σ(γ) = A^{−1}(γ)A′^{−1}(γ). For example, with ξ denoting a vector of iid disturbances with variance σ_0^2, the model with SARMA(m_1, m_2) errors is
u = ∑_{j=1}^{m_1} γ_j W_j u + ∑_{j=m_1+1}^{m_1+m_2} γ_j W_j ξ + ξ, with A(γ) = (I_n + ∑_{j=m_1+1}^{m_1+m_2} γ_j W_j)^{−1}(I_n − ∑_{j=1}^{m_1} γ_j W_j),
assuming conditions that guarantee the existence of the inverse. Such conditions can be found in the literature, see e.g. Lee and Liu (2010) and Gupta and Robinson (2018). The SEM model is obtained by setting m_2 = 0, while the model with SMA errors has m_1 = 0. The model with MESS(m) errors (LeSage and Pace (2007), Debarsy, Jin, and Lee (2015)) is
u = exp(∑_{j=1}^m γ_j W_j) ξ, with A(γ) = exp(−∑_{j=1}^m γ_j W_j).
In some cases the space under consideration is geographic, i.e. the data may be observed at irregular points in Euclidean space. Making the identification u_i ≡ U(t_i), t_i ∈ R^d for some d > 1, and assuming covariance stationarity, U(t) is said to follow an isotropic model if, for some function δ on R, the covariance at lag s is r(s) = E[U(t)U(t + s)] = δ(‖s‖). An important class of parametric isotropic models is that of Matern (1986), which can be parameterized in several ways, see e.g. Stein (1999). Denoting by Γ_f the Gamma function and by K_{γ_1} the modified Bessel function of the second kind (Gradshteyn and Ryzhik (1994)), take
δ(‖s‖, γ) = (2^{γ_1−1} Γ_f(γ_1))^{−1} (γ_2^{−1} √(2γ_1) ‖s‖)^{γ_1} K_{γ_1}(γ_2^{−1} √(2γ_1) ‖s‖),
with γ_1, γ_2 > 0 and d_γ = 2. With d_γ = 3, another model takes δ(‖s‖, γ) = γ_1 exp(−‖s/γ_2‖^{γ_3}), see e.g. De Oliveira, Kedem, and Short (1997), Stein (1999). Fuentes (2007) considers this model with γ_3 = 1, as well as a specific parameterization of the Matern covariance function.
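The Matern family is straightforward to evaluate numerically. The sketch below is our own illustration (parameter values invented), using scipy's modified Bessel function K_ν; a standard sanity check is that γ_1 = 1/2 reduces the Matern covariance to the exponential model exp(−‖s‖/γ_2).

```python
import numpy as np
from scipy.special import gamma as Gamma_f, kv  # kv = modified Bessel K of 2nd kind

def matern(s, g1, g2):
    """Matern covariance delta(||s||, gamma) with gamma = (g1, g2), g1, g2 > 0."""
    s = np.asarray(s, dtype=float)
    x = np.sqrt(2.0 * g1) * s / g2
    out = np.ones_like(x)  # limit as ||s|| -> 0 is 1
    nz = x > 0
    out[nz] = (2.0 ** (1.0 - g1) / Gamma_f(g1)) * x[nz] ** g1 * kv(g1, x[nz])
    return out

s = np.array([0.0, 0.5, 1.0, 2.0])
half = matern(s, 0.5, 1.3)  # gamma_1 = 1/2: should equal exp(-s/1.3)
```

Note (2^{γ_1−1}Γ_f(γ_1))^{−1} = 2^{1−γ_1}/Γ_f(γ_1), which is the constant used in the code.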
3 Test statistic
We estimate θ0(·) via a series approximation. Certain technical conditions are needed to
allow for X to have unbounded support. To this end, for a function g(x) on X , define a
weighted sup-norm (see e.g. Chen, Hong, and Tamer (2005), Chen (2007), Lee and Robinson
(2016)) by ‖g‖_w = sup_{x∈X} |g(x)|(1 + ‖x‖^2)^{−w/2}, for some w > 0. Assume that there exists a sequence of functions ψ_i := ψ(x_i) : R^k ↦ R^p, where p → ∞ as n → ∞, and a p × 1 vector
of coefficients β0 such that
θ0 (xi) = ψ′iβ0 + e (xi) , (3.1)
where e(·) satisfies:
Assumption R.1. There exists a constant µ > 0 such that ‖e‖_{w_x} = O(p^{−µ}) as p → ∞, where w_x ≥ 0 is the largest value such that sup_{i=1,...,n} E‖x_i‖^{w_x} < ∞, for all n.
By Lemma 1 in Appendix B of Lee and Robinson (2016), this assumption implies that
sup_{i=1,...,n} E(e^2(x_i)) = O(p^{−2µ}). (3.2)
Due to the large number of assumptions in the paper, sometimes with changes reflecting
only the various setups we consider, we prefix assumptions with R in this section and the
next, to signify ‘regression’. In Section 5 the prefix is SAR, for ‘spatial autoregression’, while
in Section 6 we use NPN, for ‘nonparametric network’.
Let y = (y_1, . . . , y_n)′, θ_0 = (θ_0(x_1), . . . , θ_0(x_n))′ and Ψ = (ψ_1, . . . , ψ_n)′. We will estimate
γ0 using a quasi maximum likelihood estimator (QMLE) based on a Gaussian likelihood,
although Gaussianity is nowhere assumed. For any admissible values β, σ2 and γ, the
(multiplied by 2/n) negative quasi log likelihood function based on using the approximation
(3.1) is
L(β, σ^2, γ) = ln(2πσ^2) + (1/n) ln |Σ(γ)| + (1/(nσ^2)) (y − Ψβ)′Σ(γ)^{−1}(y − Ψβ), (3.3)
which is minimised with respect to β and σ2 by
β(γ) = (Ψ′Σ(γ)^{−1}Ψ)^{−1} Ψ′Σ(γ)^{−1} y, (3.4)
σ^2(γ) = n^{−1} y′E(γ)′M(γ)E(γ)y, (3.5)
where M(γ) = I_n − E(γ)Ψ(Ψ′Σ(γ)^{−1}Ψ)^{−1}Ψ′E(γ)′ and E(γ) is the n × n matrix such that
E(γ)E(γ)′ = Σ(γ)−1. The use of the approximate likelihood relies on the negligibility of
e(·), which in turn permits the replacement of θ0(·) by ψ′β0 with asymptotically negligible
cost. Thus the concentrated likelihood function is
L(γ) = ln(2π) + ln σ^2(γ) + (1/n) ln |Σ(γ)|. (3.6)
We define the QMLE of γ_0 as γ̂ = arg min_{γ∈Γ} L(γ) and the QMLEs of β_0 and σ_0^2 as β̂ = β(γ̂) and σ̂^2 = σ^2(γ̂). At a given x_1, . . . , x_n, the series estimate of θ_0 is defined as
θ̂ = (θ̂(x_1), . . . , θ̂(x_n))′ = (ψ(x_1)′β̂, . . . , ψ(x_n)′β̂)′. (3.7)
Let α̂_n ≡ α̂ denote an estimator consistent for α_0 under H_0, for example the (nonlinear) least squares estimator. Note that α̂ is consistent only under H_0, so we introduce a general probability limit of α̂, as in Hong and White (1995).
Assumption R.2. There exists a deterministic sequence α*_n ≡ α* such that α̂ − α* = O_p(1/√n).
Examples of estimators that satisfy this assumption include (nonlinear) least squares, generalized method of moments estimators or adaptive efficient weighted least squares (Stinchcombe and White, 1998).
Following Hong and White (1995), define the regression error u_i ≡ y_i − f(x_i, α*) and the specification error v_i ≡ θ_0(x_i) − f(x_i, α*). Our test statistic is based on a scaled and centred version of m̂_n = σ̂^{−2} v̂′Σ(γ̂)^{−1}û/n = σ̂^{−2}(θ̂ − f(x, α̂))′Σ(γ̂)^{−1}(y − f(x, α̂))/n, where f(x, α) = (f(x_1, α), . . . , f(x_n, α))′. Precisely, it is defined as
T_n = (n m̂_n − p)/√(2p). (3.8)
The motivation for such a centering and scaling stems from the fact that, for fixed p, n m̂_n has an asymptotic χ^2_p distribution. Such a distribution has mean p and variance 2p, and it is a well-known fact that (χ^2_p − p)/√(2p) →_d N(0, 1) as p → ∞. This motivates our use of (3.8) and explains why we aspire to establish a standard normal distribution under the null hypothesis. Intuitively, the test statistic is based on the sample covariance between the residual from the parametric model and the discrepancy between the parametric and nonparametric fitted values, as in Hong and White (1995). Hong and White (1995) also note that, due to the nonparametric nature of the problem, such a statistic vanishes faster than the parametric (n^{1/2}) rate, so an n^{1/2}-normalization leads to degeneracy of the test. A proper normalization as in (3.8) yields a non-degenerate limiting distribution.
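The centering and scaling just described can be seen in a short simulation (our own illustration): standardized χ^2_p draws have mean zero and unit variance for every p, and their distribution approaches N(0, 1) as p grows.

```python
import numpy as np

rng = np.random.default_rng(3)

for p in (5, 50, 500):
    # Standardized chi-square draws (chi2_p - p)/sqrt(2p): mean ~ 0 and std ~ 1
    # for every p, while the skewness sqrt(8/p) shrinks as p grows, giving
    # approximate normality.
    z = (rng.chisquare(p, size=200_000) - p) / np.sqrt(2 * p)
    print(p, round(z.mean(), 3), round(z.std(), 3))
```

For small p the standardized distribution remains right-skewed, which is one reason the paper also justifies a bootstrap version of the test in Section 7.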
4 Asymptotic theory
4.1 Consistency of γ̂
We first provide conditions under which our estimator γ̂ of γ_0 is consistent. Such a property
of asymptotic boundedness and full-rank condition on Σ(γ).
Assumption R.3. lim_{n→∞} sup_{γ∈Γ} ϕ̄(Σ(γ)) < ∞ and lim_{n→∞} inf_{γ∈Γ} ϕ(Σ(γ)) > 0.
Assumption R.4. The u_i, i = 1, . . . , n, satisfy the representation (2.2). The ε_s, s ≥ 1, have zero mean, finite third and fourth moments µ_3 and µ_4 respectively and, denoting by σ_{ij}(γ) the (i, j)-th element of Σ(γ) and defining b*_{is} = b_{is}/σ_{ii}^{1/2}, i = 1, . . . , n, n ≥ 1, s ≥ 1, we have
lim_{n→∞} sup_{i=1,...,n} ∑_{s=1}^∞ |b*_{is}| + sup_{s≥1} lim_{n→∞} ∑_{i=1}^n |b*_{is}| < ∞. (4.1)
By Assumption R.3, σ_{ii} is bounded and bounded away from zero, so the normalization of the b_{is} in Assumption R.4 is well defined. The summability conditions in (4.1) are typical conditions on linear process coefficients that are needed to control dependence; for instance, in the case of stationary time series b*_{is} = b*_{i−s}. The infinite linear process assumed in (2.2) is
further discussed by Robinson (2011), who introduced it, and also by Delgado and Robinson
(2015).
Because we often need to consider the difference between values of the matrix-valued
function Σ(·) at distinct points, it is useful to introduce an appropriate concept of ‘smooth-
ness’. This concept has been employed before in economics, see e.g. Chen (2007), and is
defined below.
Definition 1. Let (X, ‖·‖_X) and (Y, ‖·‖_Y) be Banach spaces, L(X, Y) be the Banach space of linear continuous maps from X to Y with norm ‖T‖_{L(X,Y)} = sup_{‖x‖_X≤1} ‖T(x)‖_Y, and U be an open subset of X. A map F : U → Y is said to be Frechet-differentiable at u ∈ U if there exists L ∈ L(X, Y) such that
lim_{‖h‖_X→0} ‖F(u + h) − F(u) − L(h)‖_Y / ‖h‖_X = 0. (4.2)
L is called the Frechet-derivative of F at u. The map F is said to be Frechet-differentiable on U if it is Frechet-differentiable for all u ∈ U.
The above definition extends the notion of a derivative that is familiar from real analysis
to the functional spaces and allows us to check high-level assumptions that past literature
has imposed. To the best of our knowledge, this is the first use of such a concept in the
literature on spatial/network models. Denote by M_{n×n} the set of real, symmetric and positive semi-definite n × n matrices. Let Γ^o be an open subset of Γ and consider the Banach spaces (Γ, ‖·‖_g) and (M_{n×n}, ‖·‖), where ‖·‖_g is a generic ℓ_p norm, p ≥ 1. The following assumption ensures that Σ(·) is a ‘smooth’ function, in the sense of Frechet-smoothness.
Assumption R.5. The map Σ : Γ^o → M_{n×n} is Frechet-differentiable on Γ^o with Frechet-derivative denoted DΣ ∈ L(Γ^o, M_{n×n}). Furthermore, the map DΣ satisfies
sup_{γ∈Γ^o} ‖DΣ(γ)‖_{L(Γ^o, M_{n×n})} ≤ C. (4.3)
Assumption R.5 is a functional smoothness condition on spatial dependence. It has the advantage of being checkable for a variety of commonly employed models. For example, a first-order SEM has Σ(γ) = A^{−1}(γ)A′^{−1}(γ) with A = I_n − γW. Corollary D.1 in the supplementary appendix shows (DΣ(γ))(γ†) = γ†A^{−1}(γ)(G′(γ) + G(γ))A′^{−1}(γ) at a given point γ ∈ Γ^o, where G(γ) = WA^{−1}(γ). Then, taking
‖W‖ + sup_{γ∈Γ} ‖A^{−1}(γ)‖ < C (4.4)
yields Assumption R.5. Condition (4.4) limits the extent of spatial dependence and is very standard in the spatial literature; see e.g. Lee (2004) and numerous subsequent papers employing similar conditions. Frechet derivatives for higher-order SAR, SMA, SARMA and MESS error structures are computed in supplementary appendix D, in Lemmas D.5-D.6 and Corollaries D.1-D.2. The following proposition is very useful in ‘linearizing’ perturbations in Σ(·).
Proposition 4.1. If Assumption R.5 holds, then for any γ_1, γ_2 ∈ Γ^o,
‖Σ(γ_1) − Σ(γ_2)‖ ≤ C ‖γ_1 − γ_2‖. (4.5)
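The Lipschitz property (4.5) can be checked numerically for the first-order SEM example. The sketch below is our own illustration with an invented random weight matrix, normalized so that ‖γW‖ < 1 on the parameter set, which keeps A^{−1}(γ) uniformly bounded as (4.4) requires.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 25

# Hypothetical sparse weight matrix, spectral norm scaled below 1.
W = rng.random((n, n)) * (rng.random((n, n)) < 0.3)
np.fill_diagonal(W, 0.0)
W /= 1.1 * np.linalg.norm(W, 2)

def Sigma(g):
    A_inv = np.linalg.inv(np.eye(n) - g * W)
    return A_inv @ A_inv.T

# Sample pairs (gamma_1, gamma_2) in [-0.5, 0.5] and compute the Lipschitz
# ratio of (4.5); it should stay bounded on this compact interior set.
pairs = rng.uniform(-0.5, 0.5, size=(200, 2))
ratios = [
    np.linalg.norm(Sigma(a) - Sigma(b), 2) / abs(a - b)
    for a, b in pairs if abs(a - b) > 1e-8
]
```

With these normalizations a crude bound is 2‖W‖ ‖A^{−1}(γ)‖^3 ≈ 11, so all sampled ratios should sit comfortably below that order of magnitude.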
To illustrate how the concept of Frechet-differentiability allows us to check high-level as-
sumptions extant in the literature, a consequence of Proposition 4.1 is the following corollary,
a version of which appears as an assumption in Delgado and Robinson (2015).
Corollary 4.1. For any γ* ∈ Γ^o and any η > 0,
lim_{n→∞} sup_{γ∈{γ:‖γ−γ*‖<η}∩Γ^o} ‖Σ(γ) − Σ(γ*)‖ < Cη. (4.6)
We now introduce regularity conditions needed to establish the consistency of γ̂. Define
σ̄^2(γ) = n^{−1}σ_0^2 tr(Σ(γ)^{−1}Σ) = n^{−1}σ_0^2 ‖E(γ)E^{−1}‖_F^2,
which is nonnegative by definition and bounded by Assumption R.3.
Assumption R.6. c ≤ σ̄^2(γ) ≤ C for all γ ∈ Γ.
Assumption R.7. γ_0 ∈ Γ and, for any η > 0,
lim_{n→∞} inf_{γ∈N̄_γ(η)} [n^{−1} tr(Σ(γ)^{−1}Σ)] / |Σ(γ)^{−1}Σ|^{1/n} > 1, (4.7)
where N̄_γ(η) = Γ \ N_γ(η) and N_γ(η) = {γ : ‖γ − γ_0‖ < η} ∩ Γ.
Assumption R.8. {ϕ(n^{−1}Ψ′Ψ)}^{−1} + ϕ̄(n^{−1}Ψ′Ψ) = O_p(1).
Assumption R.6 is a boundedness condition originally considered in Gupta and Robinson
(2018), while Assumptions R.7 and R.8 are identification conditions. Indeed, Assumption
R.7 requires that Σ(γ) be identifiable in a small neighbourhood around γ0. This is ap-
parent on noticing that the ratio in (4.7) is at least one by the inequality between arith-
metic and geometric means, and equals one when Σ(γ) = Σ. Similar assumptions arise
frequently in related literature, see e.g. Lee (2004), Delgado and Robinson (2015). Assump-
tion R.8 is a typical asymptotic boundedness and non-multicollinearity condition, see e.g.
Newey (1997) and much other literature on series estimation. By Assumption R.3, it implies
sup_{γ∈Γ} {ϕ(n^{−1}Ψ′Σ(γ)^{−1}Ψ)}^{−1} = O_p(1).
Theorem 4.1. Under either H_0 or H_1, Assumptions R.1-R.8 and p^{−1} + (d_γ + p)/n → 0 as n → ∞, ‖(γ̂, σ̂^2) − (γ_0, σ_0^2)‖ →_p 0.
4.2 Asymptotic properties of the test statistic
Write Σj(γ) = ∂Σ(γ)/∂γj , j = 1, . . . , dγ, the matrix differentiated element-wise. While
Assumption R.5 guarantees that these partial derivatives exist, the next assumption imposes
a uniform bound on their spectral norms.
Assumption R.9. limn→∞ supj=1,...,dγ ‖Σj(γ)‖ < C.
We will later consider the sequence of local alternatives
H_{ℓn} ≡ H_ℓ : f(x_i, α*_n) = θ_0(x_i) + (p^{1/4}/n^{1/2}) h(x_i), (4.8)
where h is square integrable on the support X of the x_i. Under the null H_0, we have h(x_i) = 0.
Assumption R.10. For each n ∈ N and i = 1, . . . , n, the function f : X × A → R is such that f(x_i, α) is measurable for each α ∈ A, f(x_i, ·) is a.s. continuous on A, with sup_{α∈A} f^2(x_i, α) ≤ D_n(x_i), where sup_{n∈N} D_n(x_i) is integrable, and sup_{α∈A} ‖∂f(x_i, α)/∂α‖^2 ≤ D_n(x_i), sup_{α∈A} ‖∂^2 f(x_i, α)/∂α∂α′‖ ≤ D_n(x_i).
Define the infinite-dimensional matrix V = B′Σ^{−1}Ψ(Ψ′Σ^{−1}Ψ)^{−1}Ψ′Σ^{−1}B, which is symmetric, idempotent and has rank p. We now show that our test statistic is approximated by a quadratic form in ε, weighted by V.
Theorem 4.2. Under Assumptions R.1-R.10, p^{−1} + p(p + d_γ^2)/n + √n/p^{µ+1/4} → 0 as n → ∞, and H_0, T_n − (σ_0^{−2} ε′V ε − p)/√(2p) = o_p(1).
Assumption R.11. lim_{n→∞} ‖Σ^{−1}‖_R < ∞.
Because ‖Σ−1‖ ≤ ‖Σ−1‖R, this restriction on spatial dependence is somewhat stronger
than a restriction on spectral norm but is typically imposed for central limit theorems in
this type of setting, cf. Lee (2004), Delgado and Robinson (2015), Gupta and Robinson
(2018). The next assumption is needed in our proofs to check a Lyapunov condition. A typical approach would be to assume moments of order 4 + ǫ, for some ǫ > 0. Due to the
linear process structure under consideration, taking ǫ = 4 makes the proof tractable, see for
example Delgado and Robinson (2015).
Assumption R.12. The εs, s ≥ 1, have finite eighth moment.
The next assumption is strong if the basis functions ψij(·) are polynomials, requiring all
moments to exist in that case.
Assumption R.13. E |ψij (x)| < C, i = 1, . . . , n and j = 1, . . . , p.
The next theorem establishes the asymptotic normality of the approximating quadratic
form introduced above.
Theorem 4.3. Under Assumptions R.3, R.4, R.8, R.11-R.13 and p^{−1} + p^3/n → 0, as n → ∞, (σ_0^{−2} ε′V ε − p)/√(2p) →_d N(0, 1).
This is a new type of CLT, integrating both a linear process framework as well as an
increasing dimension element. A linear-quadratic form in iid disturbances is treated by
Kelejian and Prucha (2001), while a quadratic form in a linear process framework is treated
by Delgado and Robinson (2015). However both results are established in a parametric
framework, entailing no increasing dimension aspect of the type we face with p→ ∞.
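The structure behind Theorem 4.3 can be illustrated with a toy simulation; this is our own sketch under simplifying assumptions, with V replaced by a random rank-p projection and ε taken Gaussian iid, in which case σ_0^{−2}ε′Vε is exactly χ^2_p and the standardized quadratic form is approximately standard normal.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, reps = 400, 40, 20_000

# Random symmetric idempotent V of rank p (a projection), standing in for
# B' Sigma^{-1} Psi (Psi' Sigma^{-1} Psi)^{-1} Psi' Sigma^{-1} B.
Q, _ = np.linalg.qr(rng.normal(size=(n, p)))

# eps' V eps = ||Q' eps||^2 since V = QQ', which avoids forming V explicitly.
eps = rng.normal(size=(reps, n))
qf = ((eps @ Q) ** 2).sum(axis=1)
z = (qf - p) / np.sqrt(2 * p)
```

The paper's contribution is precisely that this standardized limit survives non-Gaussian ε, the linear-process weighting by B, and p → ∞; the sketch only displays the mechanics in the simplest case.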
Next, we summarize the properties of our test statistic in a theorem that records its
asymptotic normality under the null, consistency and ability to detect local alternatives
at the p^{1/4}/n^{1/2} rate. This rate has also been found by De Jong and Bierens (1994) and Gupta (2018b). Introduce the quantity κ = (√2 σ_0^2)^{−1} plim_{n→∞} n^{−1} h′Σ^{−1}h, where h = (h(x_1), . . . , h(x_n))′ and h(x_i) is from (4.8).
Theorem 4.4. Under the conditions of Theorems 4.2 and 4.3, (1) T_n →_d N(0, 1) under H_0, (2) T_n is a consistent test statistic, (3) T_n →_d N(κ, 1) under local alternatives H_ℓ.
5 Models with SAR structure in responses
We now introduce the SAR model
y_i = ∑_{j=1}^{d_λ} λ_{0j} w′_{i,j} y + θ_0(x_i) + u_i, i = 1, . . . , n, (5.1)
where Wj , j = 1, . . . , dλ, are known spatial weight matrices with i-th rows denoted w′i,j,
as discussed earlier, and λ0j are unknown parameters measuring the strength of spatial
dependence. We take dλ to be fixed for convenience of exposition. The error structure
remains the same as in (2.2). Here spatial dependence arises not only in errors but also
responses. For example, this corresponds to a situation where agents in a network influence
each other both in their observed and unobserved actions.
While the model in (5.1) is new to the literature, some related models deserve mention. Models such as (5.1) but without dependence in the error structure are considered by Su and Jin (2010) and Gupta and Robinson (2015, 2018), but the former consider only d_λ = 1 and the latter only parametric θ_0(·). Linear θ_0(·) and d_λ > 1 are permitted by Lee and Liu (2010), but the dependence structure in the errors differs from what we allow in (5.1). Using the same setup as Su and Jin (2010) with independent disturbances, a specification test for the linearity of θ_0(·) is proposed by Su and Qu (2017). In comparison, our model is much more general and our test can handle more general parametric null hypotheses. We thank a referee for pointing out that (5.1) is a particular case of Sun (2016) when the u_i are iid and of Malikov and Sun (2017) when d_λ = 1.
Denoting S(λ) = I_n − Σ_{j=1}^{d_λ} λ_j W_j, the quasi-likelihood function based on Gaussianity and conditional on covariates is

L(β, σ^2, φ) = log(2πσ^2) − (2/n) log|S(λ)| + (1/n) log|Σ(γ)| + (1/(σ^2 n)) (S(λ)y − Ψβ)′ Σ(γ)^{-1} (S(λ)y − Ψβ),  (5.2)

at any admissible point (β′, φ′, σ^2)′ with φ = (λ′, γ′)′, for nonsingular S(λ) and Σ(γ). For given φ = (λ′, γ′)′, (5.2) is minimised with respect to β and σ^2 by

β(φ) = (Ψ′Σ(γ)^{-1}Ψ)^{-1} Ψ′Σ(γ)^{-1} S(λ)y,  (5.3)
σ^2(φ) = n^{-1} y′S′(λ) E(γ)′ M(γ) E(γ) S(λ)y.  (5.4)
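The concentration steps (5.3)-(5.4), followed by minimisation of the concentrated objective over φ, can be sketched as follows. The design (d_λ = d_γ = 1, circulant weight matrices, the sample size, and a crude grid search over a compact Φ in place of a proper optimiser) is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40

# Illustrative weight matrices: one lag matrix W1 and one error matrix W2.
W1 = np.zeros((n, n)); W2 = np.zeros((n, n))
for i in range(n):
    W1[i, (i + 1) % n] = 1.0
    W2[i, (i - 1) % n] = 1.0
Psi = np.column_stack([np.ones(n), rng.standard_normal(n)])  # series regressors
beta0 = np.array([1.0, 2.0])
lam_true, gam_true = 0.3, 0.2
u = np.linalg.solve(np.eye(n) - gam_true * W2, rng.standard_normal(n))
y = np.linalg.solve(np.eye(n) - lam_true * W1, Psi @ beta0 + u)

def concentrated_loglik(lam, gam):
    """Concentrate beta and sigma^2 out of the quasi-likelihood (eqs (5.3)-(5.4))."""
    S = np.eye(n) - lam * W1
    Binv = np.linalg.inv(np.eye(n) - gam * W2)
    Sigma = Binv @ Binv.T                    # error covariance up to sigma^2
    Sig_inv = np.linalg.inv(Sigma)
    z = S @ y
    beta = np.linalg.solve(Psi.T @ Sig_inv @ Psi, Psi.T @ Sig_inv @ z)
    resid = z - Psi @ beta
    sigma2 = resid @ Sig_inv @ resid / n     # GLS residual variance
    _, ld = np.linalg.slogdet(np.linalg.inv(S) @ Sigma @ np.linalg.inv(S).T)
    return np.log(sigma2) + ld / n           # concentrated objective in phi

# Crude grid search over a compact parameter space Phi.
grid = np.linspace(-0.6, 0.6, 25)
lam_hat, gam_hat = min(((l, g) for l in grid for g in grid),
                       key=lambda p: concentrated_loglik(*p))
```

In practice the grid search would be replaced by a numerical optimiser; the grid keeps the sketch transparent and guarantees the minimiser lies in Φ.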
The QMLE of φ_0 is φ̂ = argmin_{φ∈Φ} L(φ), where

L(φ) = log σ^2(φ) + n^{-1} log|S′^{-1}(λ) Σ(γ) S^{-1}(λ)|,  (5.5)

and Φ = Λ × Γ is taken to be a compact subset of R^{d_λ+d_γ}. The QMLEs of β_0 and σ_0^2 are defined as β̂ ≡ β(φ̂) and σ̂^2 ≡ σ^2(φ̂), respectively. The following assumption controls spatial dependence and is discussed below equation (4.4).
Assumption SAR.1. max_{j=1,…,d_λ} ‖W_j‖ + ‖S^{-1}‖ < C.
Writing T(λ) = S(λ)S^{-1} and φ = (λ′, γ′)′, define the quantity

σ̄^2(φ) = n^{-1} σ_0^2 tr(T′(λ) Σ(γ)^{-1} T(λ) Σ) = n^{-1} σ_0^2 ‖E(γ) T(λ) E^{-1}‖_F^2,

which is nonnegative by definition and bounded by Assumptions R.3 and SAR.1. The assumptions below directly extend Assumptions R.6 and R.7 to the present setup.
Assumption SAR.2. c ≤ σ̄^2(φ) ≤ C, for all φ ∈ Φ.
Assumption SAR.3. φ_0 ∈ Φ and, for any η > 0,

lim_{n→∞} inf_{φ ∈ N̄^φ(η)} n^{-1} tr(T′(λ)Σ(γ)^{-1}T(λ)Σ) / |T′(λ)Σ(γ)^{-1}T(λ)Σ|^{1/n} > 1,  (5.6)

where N̄^φ(η) = Φ \ N^φ(η) and N^φ(η) = {φ : ‖φ − φ_0‖ < η} ∩ Φ.
We now introduce an identification condition that is required in the setup of this section.
Assumption SAR.4. β_0 ≠ 0 and, for any η > 0,

P( lim_{n→∞} inf_{(λ′,γ′)′ ∈ Λ×N̄^γ(η)} n^{-1} β′_0 Ψ′ T′(λ) E(γ)′ M(γ) E(γ) T(λ) Ψ β_0 / ‖β_0‖^2 > 0 ) = 1.  (5.7)

Upon minimizing with respect to β, the event inside the probability in (5.7) is equivalent to the event

lim_{n→∞} min_{β ∈ R^p} inf_{(λ′,γ′)′ ∈ Λ×N̄^γ(η)} n^{-1} (Ψβ − T(λ)Ψβ_0)′ Σ(γ)^{-1} (Ψβ − T(λ)Ψβ_0) / ‖β_0‖^2 > 0,
which is analogous to the identification condition for the nonlinear regression model with a
parametric linear factor in Robinson (1972), weighted by the inverse of the error covariance
matrix. This reduces the condition to a scalar form of a rank condition, making the iden-
tifying nature of the assumption transparent. A similar identifying assumption is used by
Gupta and Robinson (2018).
Theorem 5.1. Under either H_0 or H_1, Assumptions R.1-R.5, R.8, SAR.1-SAR.4 and p^{-1} + (d_γ + p)/n → 0 as n → ∞, ‖(φ̂, σ̂^2) − (φ_0, σ_0^2)‖ →_p 0.
The test statistic T_n can be constructed as before, but with the null residuals redefined to incorporate the spatially lagged terms, i.e. û = S(λ̂)y − f(x, α̂). Then we have the following theorem.
Theorem 5.2. Under Assumptions R.1-R.5, R.8-R.10, SAR.1-SAR.4,

p^{-1} + p(p + d_γ^2)/n + √n/p^{µ+1/4} + d_γ^2/p → 0, as n → ∞,

and H_0, T_n − (σ_0^{-2} ε′Vε − p)/√(2p) = o_p(1).
Theorem 5.3. Under the conditions of Theorems 4.3, 5.1 and 5.2, (1) T_n →_d N(0, 1) under H_0, (2) T_n is a consistent test statistic, (3) T_n →_d N(κ, 1) under local alternatives H_ℓ.
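For illustration, once the QMLE output is available, the statistic with the redefined null residuals can be assembled as below. All inputs are placeholders for estimated quantities (here toy values), and the form of m̂_n anticipates the definition used in Section 7.

```python
import numpy as np

def spec_test_statistic(u_hat, v_hat, Sigma_inv, sigma2_hat, p):
    """T_n = (n m_n - p)/sqrt(2p), with m_n = v'Sigma^{-1}u / (n sigma^2).

    u_hat     : null residuals, here S(lam_hat) y - f(x, alpha_hat)
    v_hat     : theta_hat(x) - f(x, alpha_hat), series fit minus null fit
    Sigma_inv : inverse of the estimated error covariance Sigma(gamma_hat)
    p         : number of series terms in the alternative
    """
    n = u_hat.size
    m_n = v_hat @ Sigma_inv @ u_hat / (n * sigma2_hat)
    return (n * m_n - p) / np.sqrt(2.0 * p)

# Toy inputs standing in for QMLE output (Sigma = I, sigma^2 = 1).
rng = np.random.default_rng(2)
n, p = 200, 10
u_hat = rng.standard_normal(n)
v_hat = 0.05 * rng.standard_normal(n)
Tn = spec_test_statistic(u_hat, v_hat, np.eye(n), 1.0, p)
```

The statistic is then compared to standard normal critical values, per Theorem 5.3.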
6 Nonparametric spatial weights
In this section we are motivated by settings where spatial dependence operates through nonparametric functions of raw distances (geographic, social, economic, or of any other type), as is the case in Pinkse et al. (2002), for example. In their setup, d_{ij} is a raw distance between units i and j and the corresponding element of the spatial weight matrix is w_{ij} = ζ_0(d_{ij}), where ζ_0(·) is an unknown nonparametric function. Pinkse et al. (2002) use such a setup in a SAR model like (5.1), but with a linear regression function. In contrast, in keeping with the focus of this paper, we instead model dependence in the errors in this manner. Our formulation is rather general, covering, for example, a specification like w_{ij} = f(γ_0, ζ_0(d_{ij})), with f(·) a known function, γ_0 an unknown parameter of possibly increasing dimension, and ζ_0(·) an unknown nonparametric function.
Let Ξ be a compact space of functions, on which we will specify more conditions later.
For notational simplicity we abstract away from the SAR dependence in the responses. Thus
we consider (2.1), but with

u_i = Σ_{s=1}^∞ b_{is}(γ_0, ζ_0(z)) ε_s,  (6.1)

where ζ_0(·) = (ζ_{01}(·), …, ζ_{0d_ζ}(·))′ is a fixed-dimensional vector of real-valued nonparametric functions with ζ_{0ℓ} ∈ Ξ for each ℓ = 1, …, d_ζ, and z is a fixed-dimensional vector of data, independent of the ε_s, s ≥ 1, with support Z. One can also take z to be a fixed distance
measure. We base our estimation on approximating each ζ_{0ℓ}(z), ℓ = 1, …, d_ζ, by the series representation δ′_{0ℓ}ϕ_ℓ(z), where ϕ_ℓ(z) ≡ ϕ_ℓ is an r_ℓ × 1 (r_ℓ → ∞ as n → ∞) vector of basis functions with typical element ϕ_{ℓk}, k = 1, …, r_ℓ. The set of linear combinations δ′_ℓϕ_ℓ(z) forms the sequence of sieve spaces Φ_{r_ℓ} ⊂ Ξ as r_ℓ → ∞, for any ℓ = 1, …, d_ζ, and

ζ_{0ℓ}(z) = δ′_{0ℓ}ϕ_ℓ + ν_ℓ,  (6.2)
with the following restriction on the function space Ξ:

Assumption NPN.1. For some scalars κ_ℓ > 0, ‖ν_ℓ‖_{w_z} = O(r_ℓ^{-κ_ℓ}) as r_ℓ → ∞, ℓ = 1, …, d_ζ, where w_z ≥ 0 is the largest value such that sup_{z∈Z} E‖z‖^{w_z} < ∞.
Just as Assumption R.1 implied (3.2), by Lemma 1 of Lee and Robinson (2016) we obtain

sup_{z∈Z} E(ν_ℓ^2) = O(r_ℓ^{-2κ_ℓ}),  ℓ = 1, …, d_ζ.  (6.3)
Thus we now have an infinite-dimensional nuisance parameter ζ_0(·) and an increasing-dimensional nuisance parameter γ. Writing Σ_{ℓ=1}^{d_ζ} r_ℓ = r and τ = (γ′, δ′_1, …, δ′_{d_ζ})′, which has increasing dimension d_τ = d_γ + r, define ς(r) = sup_{z∈Z; ℓ=1,…,d_ζ} ‖ϕ_ℓ‖. Write Σ(τ) for the covariance matrix of the n × 1 vector of the u_i in (6.1), with δ′_ℓϕ_ℓ replacing each admissible function ζ_ℓ(·). This is analogous to the definition of Σ(γ) in earlier sections, and indeed after conditioning on z it can be treated in a similar way because d_γ → ∞ was already permitted. For example, suppose that u = (I_n − W)^{-1}ε, where ‖W‖ < 1 and the elements satisfy w_{ij} = ζ_0(d_{ij}), i, j = 1, …, n, for some fixed distances d_{ij} and unknown function ζ_0(·); see e.g. Pinkse (1999). Approximating ζ_0(x) = τ′_0ϕ(x) + ν, for some r × 1 basis function vector ϕ(x) and approximation error ν, we define W(τ) as the n × n matrix with elements w_{ij}(τ) = τ′ϕ(d_{ij}), and set Σ(τ) = var((I_n − W(τ))^{-1}ε) = σ_0^2 (I_n − W(τ))^{-1}(I_n − W′(τ))^{-1}.
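The example above can be sketched numerically: a polynomial basis in the raw distances generates W(τ), and hence Σ(τ). The basis order, the distance draws, and the rescaling device that enforces ‖W‖ < 1 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n, r = 30, 3                                  # r basis functions for zeta_0 (illustrative)
D = rng.uniform(0.5, 3.0, (n, n))             # raw pairwise "distances" d_ij
np.fill_diagonal(D, 0.0)

def W_of_tau(tau):
    """Weights w_ij(tau) = tau' phi(d_ij), with phi a polynomial basis in d_ij."""
    basis = np.stack([D ** k for k in range(1, r + 1)])   # shape r x n x n
    W = np.tensordot(tau, basis, axes=1)                   # linear combination
    np.fill_diagonal(W, 0.0)
    # Rescale so ||W|| < 1, keeping I - W invertible (a sketch device only).
    return W / (2.0 * np.linalg.norm(W, 2))

def Sigma_of_tau(tau, sigma2=1.0):
    """Sigma(tau) = sigma^2 (I - W(tau))^{-1} (I - W'(tau))^{-1}, as in the text."""
    A = np.linalg.inv(np.eye(n) - W_of_tau(tau))
    return sigma2 * A @ A.T

Sigma = Sigma_of_tau(np.array([0.5, -0.1, 0.01]))   # hypothetical tau values
```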
For any admissible values β, σ^2 and τ, the redefined (multiplied by 2/n) negative quasi log-likelihood function based on the approximations (3.1) and (6.2) is

L(β, σ^2, τ) = ln(2πσ^2) + (1/n) ln|Σ(τ)| + (1/(nσ^2)) (y − Ψβ)′Σ(τ)^{-1}(y − Ψβ),  (6.4)

which is minimised with respect to β and σ^2 by

β(τ) = (Ψ′Σ(τ)^{-1}Ψ)^{-1} Ψ′Σ(τ)^{-1} y,  (6.5)
σ^2(τ) = n^{-1} y′E(τ)′M(τ)E(τ)y,  (6.6)

where M(τ) = I_n − E(τ)Ψ(Ψ′Σ(τ)^{-1}Ψ)^{-1}Ψ′E(τ)′ and E(τ) is the n × n matrix such that E(τ)E(τ)′ = Σ(τ)^{-1}. Thus the concentrated likelihood function is

L(τ) = ln(2π) + ln σ^2(τ) + (1/n) ln|Σ(τ)|.  (6.7)
Again, for compact Γ and sieve coefficient space ∆, the QMLE of τ_0 is τ̂ = argmin_{τ∈Γ×∆} L(τ), and the QMLEs of β and σ^2 are β̂ = β(τ̂) and σ̂^2 = σ^2(τ̂). The series estimate of θ_0 is defined as in (3.7). Define also the product Banach space T = Γ × Ξ^{d_ζ} with norm ‖(γ′, ζ′)′‖_{T,w} = ‖γ‖ + Σ_{ℓ=1}^{d_ζ} ‖ζ_ℓ‖_w, and consider the map Σ : T° → M_{n×n}, where T° is an open subset of T.
Assumption NPN.2. The map Σ : T° → M_{n×n} is Fréchet-differentiable on T°, with Fréchet derivative denoted DΣ ∈ L(T°, M_{n×n}). Furthermore, conditional on z, the map DΣ satisfies

sup_{t∈T°} ‖DΣ(t)‖_{L(T°, M_{n×n})} ≤ C,  (6.8)

on its domain T°.
This assumption can be checked in a similar way to how we checked Assumption R.5, where
a diverging dimension for the argument was already permitted.
Proposition 6.1. If Assumption NPN.2 holds, then for any t_1, t_2 ∈ T°, conditional on z,

‖Σ(t_1) − Σ(t_2)‖ ≤ C ς(r) ‖t_1 − t_2‖.  (6.9)

Corollary 6.1. For any t* ∈ T° and any η > 0, conditional on z,

lim_{n→∞} sup_{t ∈ {t : ‖t − t*‖ < η} ∩ T°} ‖Σ(t) − Σ(t*)‖ < C ς(r) η.  (6.10)
Assumption NPN.3. c ≤ σ2 (τ) ≤ C for τ ∈ Γ×∆, conditional on z.
Denote Σ(τ_0) = Σ_0. Note that this is not the true covariance matrix, which is Σ ≡ Σ(γ_0, ζ_0).
Assumption NPN.4. τ_0 ∈ Γ × ∆ and, for any η > 0, conditional on z,

lim_{n→∞} inf_{τ ∈ N̄^τ(η)} n^{-1} tr(Σ(τ)^{-1}Σ_0) / |Σ(τ)^{-1}Σ_0|^{1/n} > 1,  (6.11)

where N̄^τ(η) = (Γ × ∆) \ N^τ(η) and N^τ(η) = {τ : ‖τ − τ_0‖ < η} ∩ (Γ × ∆).
Remark 1. Expressing the identification condition in Assumption NPN.4 in terms of τ implies that identification is guaranteed via the sieve spaces Φ_{r_ℓ}, ℓ = 1, …, d_ζ. This approach is common in the sieve estimation literature; see e.g. Chen (2007), p. 5589, Condition 3.1.
Theorem 6.1. Under either H_0 or H_1, Assumptions R.1-R.4 (with R.3 and R.4 holding for t ∈ T rather than γ ∈ Γ), R.8, NPN.1-NPN.4 and p^{-1} + (min_{ℓ=1,…,d_ζ} r_ℓ)^{-1} + (d_γ + p + max_{ℓ=1,…,d_ζ} r_ℓ)/n → 0 as n → ∞, ‖(τ̂, σ̂^2) − (τ_0, σ_0^2)‖ →_p 0.
Theorem 6.2. Under the conditions of Theorems 4.2 and 6.1, but with τ and T replacing γ and Γ in assumptions prefixed with R, and with p → ∞,

(min_{ℓ=1,…,d_ζ} r_ℓ)^{-1} + p^2/n + √n/p^{µ+1/4} + p^{1/2} ς(r) (d_γ + max_{ℓ=1,…,d_ζ} r_ℓ)/√n + (Σ_{ℓ=1}^{d_ζ} r_ℓ^{-2κ_ℓ})^{1/2} → 0,

as n → ∞, and H_0, T_n − (σ_0^{-2} ε′Vε − p)/√(2p) = o_p(1).
Theorem 6.3. Let the conditions of Theorems 4.3 and 6.2 hold, but with τ and T replacing γ and Γ in assumptions prefixed with R. Then (1) T_n →_d N(0, 1) under H_0, (2) T_n is a consistent test statistic, (3) T_n →_d N(κ, 1) under local alternatives H_ℓ.
7 Bootstrap test
The finite-sample performance of nonparametric tests based on asymptotic distributions often leaves something to be desired. An alternative approach is the bootstrap approximation. In this section we propose a bootstrap version of our test, focusing on the setting of Section 5. In our simulations and empirical studies, we consider test statistics based on both m̂_n = σ̂^{-2} v̂′Σ(γ̂)^{-1}û/n and m̃_n = σ̂^{-2}(û′Σ(γ̂)^{-1}û − η̂′Σ(γ̂)^{-1}η̂)/n, where η̂ = S(λ̂)y − θ̂, i.e. the residual from nonparametric estimation, û = S(λ̂)y − f(x, α̂), and v̂ = θ̂ − f(x, α̂). Analogous to the definition of T_n, define the statistic T^a_n = (n m̃_n − p)/√(2p). In the case of no spatial autoregressive term, and under the power series, T^a_n and T_n are numerically identical, as was observed by Hong and White (1995). However, in the SARSE setting a difference arises due to the spatial structure in the response y. We show that T^a_n − T_n = o_p(1) in Theorem D.1 in the online supplementary appendix.
The bootstrap versions of the test statistics T_n and T^a_n are

T*_n = (n m̂*_n − p)/√(2p) = (σ*^{-2} v*′Σ(γ*)^{-1}u* − p)/√(2p),
T^{a*}_n = (n m̃*_n − p)/√(2p) = (σ*^{-2}(u*′Σ(γ*)^{-1}u* − η*′Σ(γ*)^{-1}η*) − p)/√(2p),

respectively, where u* is the bootstrap residual vector under the null, η* is the bootstrap residual vector under the alternative, v* = θ*(x) − f(x, α*), and (γ*, λ*, σ*^2, θ*, α*) is the estimator from the bootstrap sample. We elaborate on the bootstrap statistics using the SARARMA(m_1, m_2, m_3) model as an example:
y = Σ_{k=1}^{m_1} λ_k W_{1k} y + θ(x) + u,  u = Σ_{l=1}^{m_2} γ_{2l} W_{2l} u + Σ_{l=1}^{m_3} γ_{3l} W_{3l} ξ + ξ.
Following Jin and Lee (2015), we first deduct the empirical mean of the residual vector from

ξ̂ = (Σ_{l=1}^{m_3} γ̂_{3l} W_{3l} + I_n)^{-1} (I_n − Σ_{l=1}^{m_2} γ̂_{2l} W_{2l}) (y − Σ_{k=1}^{m_1} λ̂_k W_{1k} y − θ̂_n)

to obtain ξ̃ = (I_n − n^{-1} l_n l′_n) ξ̂. Next, we sample randomly with replacement n times from the elements of ξ̃ to obtain a vector ξ*. After this, we generate the bootstrap sample y* by treating f̂ = f(x, α̂), λ̂ and γ̂ as the true parameters:
y* = (I_n − Σ_{k=1}^{m_1} λ̂_k W_{1k})^{-1} f̂ + (I_n − Σ_{l=1}^{m_2} γ̂_{2l} W_{2l})^{-1} (Σ_{l=1}^{m_3} γ̂_{3l} W_{3l} + I_n) ξ*.
We estimate the model based on the bootstrap sample y* by QMLE to obtain the estimators θ* = ψ′β*, λ*, and γ* under the alternative hypothesis, and α* under the null hypothesis θ(x) = f(x, α_0). Then η* = y* − Σ_{k=1}^{m_1} λ*_k W_{1k} y* − θ* and u* = y* − Σ_{k=1}^{m_1} λ*_k W_{1k} y* − f(x, α*). This procedure is repeated B times to obtain the sequence {T*_{nj}}_{j=1}^B. We reject the null when p* = B^{-1} Σ_{j=1}^B 1(T_n < T*_{nj}) is smaller than the given level of significance. An identical procedure holds for the test based on T^{a*}_n. The asymptotic validity of the bootstrap method can be shown as in Theorem 4 of Su and Qu (2017) and Lemma 2 of Jin and Lee (2015); a detailed analysis can be found in the supplementary appendix (see the proof of Theorem TS.1).
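The resampling loop above can be sketched as follows for the special case m_1 = m_2 = 1, m_3 = 0. The functions `estimate` and `statistic` are placeholders standing in for the QMLE and test-statistic computations described in the text, and the toy usage below supplies trivial versions of them purely so the sketch runs.

```python
import numpy as np

rng = np.random.default_rng(4)

def bootstrap_pvalue(y, W1, W2, estimate, statistic, B=199):
    """Residual bootstrap sketched for y = lam*W1*y + theta(x) + u,
    u = gam*W2*u + xi (the SARARMA(1,1,0) special case).

    `estimate(y)` must return (lam, gam, theta_fit, f_fit, Tn);
    `statistic(y_star)` must return the test statistic on a bootstrap sample.
    """
    n = y.size
    lam, gam, theta_fit, f_fit, Tn = estimate(y)
    # Recover the innovations and centre them (Jin and Lee, 2015).
    u = y - lam * (W1 @ y) - theta_fit
    xi = (np.eye(n) - gam * W2) @ u
    xi = xi - xi.mean()
    exceed = 0
    for _ in range(B):
        xi_star = rng.choice(xi, size=n, replace=True)   # resample residuals
        u_star = np.linalg.solve(np.eye(n) - gam * W2, xi_star)
        y_star = np.linalg.solve(np.eye(n) - lam * W1, f_fit + u_star)
        exceed += Tn < statistic(y_star)
    return exceed / B   # bootstrap p-value p*

# Toy usage with trivial placeholder estimators (no real QMLE here).
n = 25
W1 = np.zeros((n, n)); W2 = np.zeros((n, n))
est = lambda y: (0.0, 0.0, np.zeros(n), np.zeros(n), 0.5)
stat = lambda y_star: float(y_star @ y_star) / n - 1.0
p_star = bootstrap_pvalue(rng.standard_normal(n), W1, W2, est, stat, B=50)
```

The null is rejected when p* falls below the chosen significance level, as in the text.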
8 Finite sample performance
8.1 Parametric error spatial structure
Taking n = 60, 100, 200, we choose two specifications to generate y from the SARARMA(m_1, m_2, m_3) models:

SARARMA(0,1,0): y = θ(x) + u,  u = γ_2 W_2 u + ξ,
SARARMA(1,0,1): y = λ_1 W_1 y + θ(x) + u,  u = γ_3 W_3 ξ + ξ,

where the ξ_i are i.i.d. N(0, 1). The DGP of θ(x) is

θ(x_i) = x′_iα + c p^{1/4} sin(x′_iα)/n^{1/2},

where x′_iα = 1 + x_{1i} + x_{2i}, with x_{1i} = (z_i + z_{1i})/2, x_{2i} = (z_i + z_{2i})/2, and z_i, z_{1i}, z_{2i} i.i.d. U[0, 2π].
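A minimal sketch of this DGP for θ(x), with α = (1, 1, 1)′ as in the text (the sample size and the value of c below are just one configuration):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, c = 100, 10, 3     # c = 0 gives the null (linear) model; c = 3, 6 local alternatives

z, z1, z2 = (rng.uniform(0.0, 2 * np.pi, n) for _ in range(3))
x1, x2 = (z + z1) / 2, (z + z2) / 2
xa = 1 + x1 + x2         # x_i' alpha with alpha = (1, 1, 1)'

# theta(x_i) = x_i'alpha + c p^{1/4} sin(x_i'alpha) / n^{1/2}
theta = xa + c * p ** 0.25 * np.sin(xa) / np.sqrt(n)
```

The deviation from linearity shrinks at the p^{1/4}/n^{1/2} rate of the local alternatives detected by the test.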
We use two series bases for our experiments: power (polynomial) series of third and fourth order (p = 10, p = 15) and the trigonometric series

trig_1 = (1, sin(x_1), sin(x_1/2), sin(x_2), sin(x_2/2), cos(x_1), cos(x_1/2), cos(x_2), cos(x_2/2))′

and trig_2 = (trig′_1, sin(x_1^2), cos(x_1^2), sin(x_2^2), cos(x_2^2))′. We also set γ_2 = 0.3, λ_1 = 0.3 and γ_3 = 0.4; the value c = 0, 3, 6 indicates the null hypothesis and the local alternatives. The spatial weight matrices are generated using LeSage's code make_neighborsw from http://www.spatial-econometrics.com/, where a row-normalized sparse matrix is generated by choosing a specific number of the closest locations from randomly generated coordinates; we set the number of neighbours to n/20. We employ 100 bootstrap replications in each of 500 Monte Carlo replications, except for the SARARMA(1,0,1) design with n = 200, where we use 50 bootstrap replications in view of the computation time. We report the rejection frequencies of tests based on both asymptotic and bootstrap critical values.
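LeSage's make_neighborsw is MATLAB code; a minimal Python analogue of the same k-nearest-neighbour, row-normalised construction (not his implementation) is:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 60
k = max(n // 20, 1)                # number of neighbours, n/20 as in the text

coords = rng.uniform(0.0, 1.0, (n, 2))          # random planar coordinates
diff = coords[:, None, :] - coords[None, :, :]  # pairwise coordinate differences
dist = np.sqrt((diff ** 2).sum(-1))             # Euclidean distance matrix
np.fill_diagonal(dist, np.inf)                  # exclude self-neighbours

W = np.zeros((n, n))
for i in range(n):
    nearest = np.argsort(dist[i])[:k]  # indices of the k closest locations
    W[i, nearest] = 1.0 / k            # equal weights => rows sum to one
```

Each row then has exactly k nonzero entries, giving the sparse row-normalised matrix used in the simulations.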
Tables 1-4 report the empirical rejection frequencies of the test statistics at nominal levels of 1%, 5% and 10%. To see how the bootstrap test improves small-sample outcomes, we report two sets of results for each parameter configuration: the first row for each value of c is from the asymptotic test, while the second row is from the bootstrap test. We summarize some important findings. First, for most DGPs the bootstrap test is closer to the nominal level than the asymptotic test, although the sizes of both types of tests generally improve as the sample size increases. Second, both the bootstrap and asymptotic tests are powerful in detecting deviations from linearity under the local alternatives.
Both types of bases give qualitatively similar results, but we note that T_n = T^a_n when using polynomial series under the SARARMA(0,1,0) model, as observed in Hong and White (1995). When using trigonometric series, tests based on these two statistics give slightly different rejection rates. However, under the SARARMA(1,0,1) model, both series give quantitatively different results, as illustrated in Tables 3 and 4. The patterns are similar across all cases: the bootstrap generally affords better size control, albeit not always.
8.2 Nonparametric error spatial structure
Now we examine finite-sample performance in the setting of Section 6. The three DGPs of θ(x) are the same as in the parametric setting, but we generate the n × n matrix W* as w*_{ij} = Φ(−d_{ij}) I(c_{ij} < 0.05) if i ≠ j, and w*_{ii} = 0, where Φ(·) is the standard normal cdf, d_{ij} ~ iid U[−3, 3], and c_{ij} ~ iid U[0, 1]. This construction ensures that W* is sparse, with no more than 5% of its elements nonzero. Then y is generated from y = θ(x) + u, u = Wu + ξ, where ξ ~ N(0, I_n) and W = W*/(1.2 ϕ(W*)), ensuring the existence of (I − W)^{-1}. In estimation, we know the distance d_{ij} and the indicator I(c_{ij} < 0.05), but we do not know the functional form of w_{ij}, so we approximate the elements of W by w_{ij} = Σ_{l=0}^r a_l d_{ij}^l I(c_{ij} < 0.05) if i ≠ j, and w_{ii} = 0.
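This construction of W* and W can be reproduced as below; Φ is evaluated through the error function, and reading ϕ(W*) as the spectral radius of W* is our interpretation of the normalisation above.

```python
import numpy as np
from math import erf

rng = np.random.default_rng(7)
n = 100

# Standard normal cdf via the error function.
Phi = np.vectorize(lambda t: 0.5 * (1.0 + erf(t / np.sqrt(2.0))))

d = rng.uniform(-3.0, 3.0, (n, n))       # raw distances d_ij ~ U[-3, 3]
c = rng.uniform(0.0, 1.0, (n, n))        # sparsity draws c_ij ~ U[0, 1]

# w*_ij = Phi(-d_ij) 1(c_ij < 0.05), zero diagonal: around 5% nonzero entries.
W_star = np.where(c < 0.05, Phi(-d), 0.0)
np.fill_diagonal(W_star, 0.0)

# Scale by 1.2 times the spectral radius so that (I - W)^{-1} exists.
rho = np.max(np.abs(np.linalg.eigvals(W_star)))
W = W_star / (1.2 * rho)

u = np.linalg.solve(np.eye(n) - W, rng.standard_normal(n))  # u = (I-W)^{-1} xi
```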
Table 5 reports the rejection rates from 500 Monte Carlo simulations at the 5% asymptotic level (critical value 1.645), using polynomial bases with r = 2, 3, 4, 5 and p = 10, 15, 20. We take n = 150, 300, 500, 600, 700, larger sample sizes than earlier because two nonparametric functions must be estimated in this spatial setting. The two largest bandwidths (r = 5, p = 20) are employed only for the largest sample size, n = 700. We observe a clear pattern of rejection rates approaching the theoretical level as the sample size increases. Power improves as c increases for all designs and is non-trivial in all cases, even for c = 3. Sizes are acceptable for n = 500, particularly when p = 15. Size performance improves further at n = 600, indicating asymptotic stability. Note that with two diverging bandwidths (p and r), we expect sizes to improve in a diagonal pattern from the top left corner to the bottom right corner of Table 5. This is indeed the case. For n = 700, the pairs (r, p) = (5, 15), (5, 20) deliver acceptable sizes.
9 Empirical applications
In this section, we illustrate the specification test presented in previous sections using several
empirical examples.
9.1 Conflict alliances
This example is based on a study by König et al. (2017) of how a network of military alliances and enmities affects the intensity of a conflict. They stress that understanding the role of informal networks of military alliances and enmities is important not only for predicting outcomes, but also for designing and implementing policies to contain or put an end to violence. König et al. (2017) obtain a closed-form characterization of the Nash equilibrium and perform an empirical analysis using data on the Second Congo War, a conflict that involves many groups in a complex network of informal alliances and rivalries.
To study the fighting effort of each group, the authors use a panel data model with individual fixed effects, where key regressors include the total fighting effort of allies and of enemies. They further correct for potential spatial correlation in the error term by using spatial heteroskedasticity and autocorrelation robust standard errors. We use their data and the main structure of their specification to build a cross-sectional SAR(2) model with two weight matrices, W^A (W^A_{ij} = 1 if groups i and j are allies, and W^A_{ij} = 0 otherwise) and W^E (W^E_{ij} = 1 if groups i and j are enemies, and W^E_{ij} = 0 otherwise):

y = λ_1 W^A y + λ_2 W^E y + 1_n β_0 + Xβ + u,
where y is a vector of the fighting efforts of the groups and X includes current rainfall, rainfall from the previous year, and their squares.¹ To allow for spatial correlation in the error term, we consider both the Error SARMA(1,0) and Error SARMA(0,1) structures. For these, we employ a spatial weight matrix W^d, based on the inverse distance between group locations and set to 0 beyond 150 km, following König et al. (2017). The idea is that geographic spatial correlation dies out as groups become further apart. We also report results using a nonparametric estimator of the spatial weights, as described in Section 6 and studied in the simulations of Section 8. For the nonparametric estimator we take r = 2.
In the original dataset there are 80 groups, but groups 62 and 63 have identical variables and locations, so we drop one of them and end up with a sample of 79 groups. We use data from 1998 as an example and further use the pooled data from all years as a robustness check. H_0 stands for the restricted model, in which the linear functional form of the regression is imposed, while H_1 stands for the unrestricted model, in which we use basis functions comprising power series with p = 10. In all our specifications the test statistics are negative, so we cannot reject the null hypothesis that the model is correctly specified.
As Table 6 indicates, this failure to reject the null persists when we use pooled data from 13 years, yielding 1027 observations. Thus we conclude that a linear specification is not inappropriate for this setting. One possible reason is that the original regression, though linear, already includes the squared rainfall terms as regressors. This finding is robust to using the bootstrap tests of Section 7, which generally yield smaller p-values but unchanged conclusions.

¹We follow the analysis in the original paper and do not row normalize. This is because the economic content of the weight matrices is defined by the total fights of allies or enemies.
9.2 Innovation spillovers
This example is based on the study of the impact of R&D on growth from Bloom et al.
(2013). They develop a general framework incorporating two types of spillovers: a positive
effect from technology (knowledge) spillovers and a negative ‘business stealing’ effect from
product market rivals. They implement this model using panel data on U.S. firms.
We consider the Productivity Equation in Bloom et al. (2013):
ln y = ϕ1 ln(R&D) + ϕ2 ln(Sptec) + ϕ3 ln(Spsic) + ϕ4X + error, (9.1)
where y is a vector of sales, R&D is a vector of R&D stocks, and regressors in X include the
log of capital (Capital), log of labour (Labour), R&D, a dummy for missing values in R&D,
a price index, and two spillover terms constructed as the log of WSICR&D (Spsic) and the
log of WTECR&D (Sptec), where WSIC measures the product market proximity and WTEC
measures the technological proximity. Specifically, they define
WSIC,ij = SiS′j/(SiS
′i)
1/2(SjS′j)
1/2,WTEC,ij = TiT′j/(TiT
′i )
1/2(TjT′j)
1/2,
where Si = (Si1, Si2, . . . , Si597)′, with Sik being the share of patents of firm i in the four
digit industry k and Ti = (Ti1, Ti2, . . . , Ti426)′, with Tiτ being the share of patents of firm i
in technology class τ . Focussing on a cross-sectional analysis, we use observations from the
year 2000 and obtain a sample size of 577.
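The proximity measures W_{SIC} and W_{TEC} are cosine similarities of share vectors, which are simple to compute directly; the share matrix below is randomly generated for illustration, with toy dimensions in place of the paper's 577 firms and 426 technology classes.

```python
import numpy as np

rng = np.random.default_rng(8)
n_firms, n_classes = 20, 30    # toy sizes; the paper has 577 firms, 426 classes

# Rows are firms' patent shares across technology classes (nonnegative, summing to 1).
T = rng.random((n_firms, n_classes))
T /= T.sum(axis=1, keepdims=True)

# W_TEC,ij = T_i T_j' / ((T_i T_i')^{1/2} (T_j T_j')^{1/2}): cosine similarity.
norms = np.linalg.norm(T, axis=1)
W_tec = (T @ T.T) / np.outer(norms, norms)
```

By the Cauchy-Schwarz inequality every entry lies in [0, 1], with 1 on the diagonal; W_{SIC} is built identically from the industry-share vectors S_i.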
The column FE of Table 7 is taken from Table 5 of Bloom et al. (2013), based on their panel fixed effects estimation, and we use it as a baseline for comparison. We use both W_{SIC} and W_{TEC} in the Error SARMA(2,0), Error SARMA(0,2), and Error MESS(2) models. In all of these specifications the test statistics are larger than 1.645, so we reject the null hypothesis of a linear specification. The rejection persists with the bootstrap tests, albeit with higher p-values than the asymptotic ones. However, we can say even more, as our estimation also sheds light on spatial effects in the disturbances in (9.1). As before, H_0 imposes a linear functional form in the regressors, while H_1 uses the nonparametric series estimate employing power series with p = 10. Regardless of the specification of the regression function, the
disturbances suggest a strong spatial effect, as the coefficients on W_{TEC} and W_{SIC} are large in magnitude.
9.3 Economic growth
The final example is based on the study of economic growth rates in Ertur and Koch (2007). Knowledge accumulated in one area may depend on knowledge accumulated in other areas, especially neighbouring ones, implying possible spatial spillover effects. These questions are of interest to economists as well as regional scientists. For example, Autant-Bernard and LeSage (2011) examine spatial spillovers associated with research expenditures for French regions, Ho, Wang, and Yu (2013) examine the international spillover of economic growth through bilateral trade amongst OECD countries, Cuaresma and Feldkircher (2013) study spatially correlated growth spillovers in the income convergence process of Europe, and Evans and Kim (2014) study the spatial dynamics of growth and convergence in Korean regional incomes.
In this section we test the linear SAR model specification of Ertur and Koch (2007). Their dataset covers a sample of 91 countries over the period 1960-1995, originally from Heston, Summers, and Aten (2002) and obtained from the Penn World Tables (PWT version 6.1). The variables in use include per worker income in 1960 (y60) and 1995 (y95), the average rate of growth between 1960 and 1995 (gy), the average investment rate over this period (s), and the average rate of growth of the working-age population (np).
Ertur and Koch (2007) consider the model
y = λWy +Xβ +WXθ + ε, (9.2)
where the dependent variable is log real income per worker ln(y95), the explanatory variables X = (x_1, x_2) comprise the log investment rate ln(s) = x_1 and the log physical capital effective rate of depreciation ln(np + 0.05) = x_2, with corresponding coefficients β_1, β_2, θ_1, θ_2. A restricted regression based on the joint constraints β_1 = −β_2 and θ_1 = −θ_2 (implied by economic theory) is also considered in Ertur and Koch (2007). The model (9.2) is a special case of the SAR model with regressors (X, WX) and iid errors, so the test derived in Section 5 can be applied directly. Denoting by d_{ij} the great-circle distance between the capital cities of countries i and j, one construction of W takes w_{ij} = d_{ij}^{-2} while the other takes w_{ij} = e^{-2d_{ij}}, following Ertur and Koch (2007). Table 8 presents the estimation and testing results based on power series basis functions with p = 10. We cannot reject linearity of the regression function for the unrestricted model. On the other hand, linearity is rejected for the restricted model with w_{ij} = e^{-2d_{ij}}, which is the preferred specification of Ertur and Koch (2007). Thus, not only can we conclude that the specification of the model is under suspicion, we can also infer that this is due to the constraints from economic theory. The findings are supported by the bootstrap tests of Section 7.
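For completeness, both weight constructions from great-circle distances can be sketched as follows; the haversine formula and the toy capital coordinates are illustrative assumptions (the paper's distance units are not stated here).

```python
import numpy as np

def great_circle(lat1, lon1, lat2, lon2, radius=6371.0):
    """Great-circle distance in km via the haversine formula."""
    p1, p2, dl = map(np.radians, (lat1, lat2, lon2 - lon1))
    a = np.sin((p2 - p1) / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dl / 2) ** 2
    return 2.0 * radius * np.arcsin(np.sqrt(a))

# Toy capital coordinates (latitude, longitude) for three countries.
caps = np.array([[48.9, 2.4], [52.5, 13.4], [40.4, -3.7]])
n = len(caps)
D = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:
            D[i, j] = great_circle(*caps[i], *caps[j])

# The two weight choices in the text: w_ij = d_ij^{-2} and w_ij = exp(-2 d_ij).
W_inv2 = np.where(D > 0, 1.0 / np.maximum(D, 1e-12) ** 2, 0.0)
W_exp = np.where(D > 0, np.exp(-2.0 * D), 0.0)
```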
            SARARMA(0,1,0)                          SARARMA(1,0,1)
       p = 10            p = 15            p = 10            p = 15
  1%    5%   10%    1%    5%   10%    1%    5%   10%    1%    5%   10%
n = 60
c = 0 0.01 0.032 0.05 0.036 0.084 0.122 0.048 0.078 0.108 0.094 0.148 0.172
bootstrap 0.008 0.032 0.084 0.004 0.038 0.096 0.006 0.054 0.08 0.016 0.062 0.118
c = 3 0.07 0.156 0.194 0.454 0.58 0.658 0.298 0.41 0.47 0.532 0.656 0.716
0.036 0.154 0.296 0.154 0.414 0.62 0.08 0.264 0.402 0.132 0.41 0.578
c = 6 0.37 0.532 0.644 0.998 1 1 0.748 0.816 0.848 0.96 0.97 0.972
0.22 0.544 0.748 0.844 0.992 1 0.266 0.588 0.748 0.444 0.804 0.894
n = 100
c = 0 0.008 0.03 0.044 0.022 0.052 0.068 0.016 0.036 0.056 0.028 0.062 0.084
0.006 0.044 0.098 0.012 0.046 0.096 0.006 0.054 0.116 0.02 0.056 0.112
c = 3 0.352 0.478 0.574 0.54 0.666 0.744 0.202 0.332 0.398 0.49 0.622 0.716
0.294 0.578 0.72 0.37 0.662 0.824 0.134 0.366 0.496 0.222 0.556 0.732
c = 6 0.984 0.99 0.99 0.998 0.998 0.998 0.78 0.844 0.886 0.994 0.994 0.994
0.95 0.99 0.996 0.992 0.998 1 0.566 0.832 0.916 0.732 0.964 0.986
n = 200
c = 0 0.002 0.016 0.034 0.008 0.026 0.048 0.018 0.038 0.062 0.018 0.046 0.06
0.006 0.038 0.104 0.006 0.048 0.088 0.04 0.086 0.11 0.03 0.078 0.114
c = 3 0.176 0.29 0.356 0.34 0.496 0.582 0.17 0.252 0.354 0.46 0.61 0.68
0.178 0.402 0.55 0.282 0.56 0.694 0.186 0.4 0.524 0.402 0.636 0.754
c = 6 0.888 0.942 0.96 0.99 0.998 1 0.81 0.9 0.934 1 1 1
0.846 0.968 0.984 0.982 0.998 1 0.718 0.904 0.962 0.872 0.98 0.998
Table 1: Rejection probabilities for T_n and T*_n at the 1%, 5% and 10% levels with polynomial series basis.
            SARARMA(0,1,0)                          SARARMA(1,0,1)
       trig_1            trig_2            trig_1            trig_2
  1%    5%   10%    1%    5%   10%    1%    5%   10%    1%    5%   10%
n = 60
c = 0 0.01 0.028 0.054 0.02 0.056 0.084 0.052 0.092 0.11 0.112 0.158 0.188
bootstrap 0.004 0.004 0.092 0.004 0.038 0.094 0.012 0.062 0.106 0.026 0.09 0.138
c = 3 0.166 0.248 0.296 0.172 0.29 0.358 0.254 0.35 0.426 0.326 0.456 0.528
0.092 0.276 0.39 0.056 0.22 0.374 0.082 0.256 0.406 0.096 0.222 0.354
c = 6 0.688 0.806 0.854 0.676 0.822 0.866 0.734 0.822 0.874 0.776 0.854 0.896
0.454 0.794 0.908 0.314 0.714 0.872 0.266 0.616 0.782 0.204 0.474 0.658
n = 100
c = 0 0.006 0.012 0.028 0.004 0.028 0.05 0.01 0.03 0.054 0.028 0.052 0.076
0.002 0.04 0.09 0.006 0.036 0.102 0.012 0.046 0.114 0.012 0.044 0.088
c = 3 0.27 0.39 0.484 0.288 0.412 0.508 0.188 0.296 0.366 0.484 0.634 0.702
0.214 0.508 0.626 0.194 0.45 0.632 0.132 0.346 0.514 0.242 0.542 0.698
c = 6 0.956 0.986 0.992 0.948 0.99 0.992 0.824 0.898 0.92 0.854 0.908 0.928
0.902 0.99 0.998 0.856 0.988 1 0.59 0.888 0.96 0.476 0.846 0.918
n = 200
c = 0 0.002 0.014 0.034 0.012 0.028 0.036 0.018 0.034 0.05 0.016 0.034 0.062
0.008 0.042 0.112 0.016 0.038 0.082 0.026 0.076 0.108 0.032 0.074 0.118
c = 3 0.164 0.256 0.312 0.144 0.274 0.356 0.166 0.28 0.344 0.23 0.364 0.436
0.162 0.374 0.532 0.136 0.346 0.468 0.242 0.432 0.526 0.244 0.42 0.542
c = 6 0.818 0.898 0.934 0.816 0.904 0.944 0.884 0.93 0.954 0.91 0.956 0.968
0.796 0.95 0.98 0.776 0.934 0.974 0.78 0.942 0.982 0.794 0.948 0.98
Table 2: Rejection probabilities for T_n and T*_n at the 1%, 5% and 10% levels with trigonometric series basis.
            SARARMA(0,1,0)                          SARARMA(1,0,1)
       p = 10            p = 15            p = 10            p = 15
  1%    5%   10%    1%    5%   10%    1%    5%   10%    1%    5%   10%
n = 60
c = 0 0.01 0.032 0.05 0.036 0.084 0.122 0.052 0.074 0.102 0.094 0.14 0.168
bootstrap 0.008 0.032 0.084 0.004 0.038 0.096 0.006 0.052 0.084 0.012 0.068 0.114
c = 3 0.07 0.156 0.194 0.454 0.58 0.658 0.308 0.422 0.48 0.55 0.66 0.712
0.036 0.154 0.296 0.154 0.414 0.62 0.092 0.27 0.396 0.164 0.408 0.596
c = 6 0.37 0.532 0.644 0.998 1 1 0.776 0.838 0.864 0.98 0.984 0.984
0.22 0.544 0.748 0.844 0.992 1 0.268 0.596 0.752 0.518 0.824 0.9
n = 100
c = 0 0.008 0.03 0.044 0.022 0.052 0.068 0.016 0.04 0.066 0.032 0.062 0.086
0.006 0.044 0.098 0.012 0.046 0.096 0.008 0.058 0.122 0.024 0.062 0.118
c = 3 0.352 0.478 0.574 0.54 0.666 0.744 0.216 0.334 0.41 0.49 0.628 0.71
0.294 0.578 0.72 0.37 0.662 0.824 0.14 0.36 0.494 0.252 0.566 0.73
c = 6 0.984 0.99 0.99 0.998 0.998 0.998 0.784 0.848 0.9 0.98 0.98 0.98
0.95 0.99 0.996 0.992 0.998 1 0.536 0.818 0.914 0.786 0.958 0.974
n = 200
c = 0 0.002 0.016 0.034 0.008 0.026 0.048 0.002 0.038 0.06 0.014 0.034 0.058
0.006 0.038 0.104 0.006 0.048 0.088 0.04 0.08 0.116 0.026 0.064 0.108
c = 3 0.176 0.29 0.356 0.34 0.496 0.582 0.172 0.266 0.356 0.46 0.608 0.688
0.178 0.402 0.55 0.282 0.56 0.694 0.176 0.382 0.516 0.41 0.632 0.738
c = 6 0.888 0.942 0.96 0.99 0.998 1 0.808 0.902 0.932 1 1 1
0.846 0.968 0.984 0.982 0.998 1 0.704 0.894 0.948 0.914 0.986 0.996
Table 3: Rejection probabilities for T^a_n and T^{a*}_n at the 1%, 5% and 10% levels with polynomial series basis.
            SARARMA(0,1,0)                          SARARMA(1,0,1)
       trig_1            trig_2            trig_1            trig_2
  1%    5%   10%    1%    5%   10%    1%    5%   10%    1%    5%   10%
n = 60
c = 0 0.01 0.028 0.054 0.02 0.056 0.084 0.048 0.092 0.108 0.104 0.152 0.18
bootstrap 0.004 0.004 0.092 0.004 0.038 0.094 0.014 0.064 0.096 0.024 0.088 0.13
c = 3 0.16 0.252 0.292 0.174 0.29 0.358 0.276 0.366 0.44 0.344 0.468 0.538
0.09 0.274 0.384 0.056 0.22 0.376 0.08 0.25 0.406 0.102 0.242 0.37
c = 6 0.682 0.798 0.85 0.676 0.822 0.866 0.744 0.864 0.9 0.81 0.874 0.91
0.444 0.794 0.906 0.312 0.714 0.87 0.248 0.61 0.792 0.206 0.484 0.658
n = 100
c = 0 0.006 0.012 0.026 0.006 0.028 0.05 0.014 0.032 0.054 0.036 0.06 0.084
0.004 0.038 0.092 0.006 0.036 0.106 0.01 0.046 0.116 0.014 0.044 0.096
c = 3 0.268 0.396 0.486 0.288 0.412 0.508 0.194 0.304 0.376 0.49 0.648 0.718
0.214 0.504 0.63 0.194 0.45 0.632 0.122 0.354 0.52 0.272 0.568 0.696
c = 6 0.958 0.986 0.992 0.952 0.99 0.992 0.832 0.908 0.916 0.874 0.92 0.94
0.9 0.99 0.998 0.856 0.988 1 0.554 0.884 0.948 0.478 0.834 0.916
n = 200
c = 0 0.002 0.018 0.038 0.012 0.028 0.032 0.018 0.036 0.044 0.016 0.034 0.064
0.012 0.046 0.114 0.016 0.042 0.08 0.03 0.076 0.102 0.028 0.06 0.122
c = 3 0.156 0.258 0.312 0.144 0.272 0.352 0.17 0.272 0.348 0.23 0.364 0.436
0.162 0.38 0.524 0.134 0.35 0.466 0.22 0.438 0.526 0.256 0.428 0.538
c = 6 0.816 0.908 0.936 0.816 0.904 0.944 0.876 0.928 0.956 0.912 0.956 0.97
0.802 0.952 0.978 0.774 0.934 0.972 0.746 0.934 0.976 0.776 0.93 0.976
Table 4: Rejection probabilities for T^a_n and T^{a*}_n at the 1%, 5% and 10% levels with trigonometric series basis.
             r = 2             r = 3             r = 4                r = 5
          p = 10  p = 15   p = 10  p = 15   p = 10  p = 15   p = 10  p = 15  p = 20
n = 150
c = 0     0.0860  0.2020   0.1180  0.2060   0.1420  0.2240
c = 3     0.3320  0.6340   0.3700  0.6380   0.3760  0.6700
c = 6     0.9060  0.9920   0.9180  0.9940   0.9220  0.9960
n = 300
c = 0     0.0820  0.0960   0.0880  0.1080   0.1060  0.1100
c = 3     0.2680  0.5980   0.2600  0.6120   0.2780  0.6220
c = 6     0.8140  0.9980   0.8160  0.9980   0.8220  0.9980
n = 500
c = 0     0.0280  0.0420   0.0260  0.0400   0.0360  0.0480
c = 3     0.2320  0.6660   0.2400  0.6620   0.2460  0.6680
c = 6     0.8920  1        0.9040  1        0.9000  1
n = 600
c = 0     0.0320  0.0500   0.0340  0.0540   0.0360  0.0540
c = 3     0.3140  0.6480   0.3080  0.6280   0.3120  0.6460
c = 6     0.9220  1        0.9180  1        0.9180  1
n = 700
c = 0     0.0260  0.0300   0.0280  0.0380   0.0280  0.0380   0.0280  0.0420  0.0580
c = 3     0.2420  0.5540   0.2400  0.5480   0.2520  0.5500   0.2420  0.5600  0.6920
c = 6     0.9580  0.9980   0.9560  0.9980   0.9600  0.9980   0.9500  0.9980  1

Table 5: Rejection probabilities of T_n at the 5% asymptotic level, nonparametric spatial error structure.
31
                           1998                                  Pooled
                 H0      p-value     H1      p-value     H0      p-value     H1      p-value
SARARMA(2,1,0)
WAy            -0.005    <0.001    -0.003    <0.001     0.013    <0.001     0.013    <0.001
WEy             0.130    <0.001     0.129    <0.001     0.121    <0.001     0.121    <0.001
Wd             -0.159     0.281    -0.225    <0.001    -0.086     0.033    -0.086     0.033
Tn             -1.921     0.973                        -2.531     0.994
T*n                       0.840                                   0.940
Tan            -1.918     0.972                        -2.547     0.995
Ta*n                      0.870                                   0.730
SARARMA(2,0,1)
WAy             0.001     <0.01     0.011     <0.01     0.013     <0.01     0.013     <0.01
WEy             0.127     <0.01     0.122     <0.01     0.121     <0.01     0.121     <0.01
Wd             -0.153     <0.01    -0.050     <0.01    -0.086     <0.01    -0.086     0.025
Tn             -1.763     0.961                        -2.421     0.992
T*n                       0.900                                   0.990
Tan            -2.349     0.991                        -2.423     0.992
Ta*n                      0.850                                   0.790
Nonparametric
WAy            -0.052    <0.001    -0.011    <0.001     0.033    <0.001     0.033    <0.001
WEy             0.149    <0.001     0.133    <0.001     0.110    <0.001     0.109    <0.001
Wd
Tn             -1.294     0.902                        -2.314     0.990
T*n                       0.830                                   0.850
Tan            -1.898     0.971                        -2.530     0.994
Ta*n                      0.660                                   0.910

Table 6: The estimates and test statistics for the conflict data. ∗ denotes the bootstrap p-value.
                     FE              SARARMA(0,1,0), WTEC                 SARARMA(0,1,0), WSIC
                                      H0                H1                 H0                H1
Variables      est.  p-value    est.  p-value    est.  p-value      est.  p-value    est.  p-value
ln(sic)      -0.005   0.649    0.007   0.574    0.015   0.166      0.008   0.620    0.017   0.193
ln(tech)      0.191  <0.001    0.006   0.850   -0.001   0.998      0.039   0.157    0.020   0.336
ln(Lab.)      0.636  <0.001    0.572  <0.001                       0.571  <0.001
ln(Cap.)      0.154  <0.001    0.336  <0.001                       0.318  <0.001
ln(R&D)       0.043  <0.001    0.081  <0.001                       0.082  <0.001
WSIC                           0.722  <0.001                       0.724  <0.001
WTEC                           0.835  <0.001                       0.829  <0.001
Tn                            15.528  <0.001                      10.451  <0.001
T*n                            0.050                              <0.001

                   SARARMA(0,2,0)                  SARARMA(0,0,2)                 Error MESS(2)
                  H0             H1               H0             H1              H0             H1
Variables    est.  p-val    est.  p-val      est.  p-val    est.  p-val     est.  p-val    est.  p-val
ln(sic)     0.009  0.587   0.018  0.170   -0.0002  0.991   0.013  0.266    0.002  0.788   0.014  0.040
ln(tech)    0.044  0.112   0.026  0.236    0.033   0.200   0.017  0.434    0.045  0.025   0.027  0.088
ln(Lab.)    0.573 <0.001                   0.565   <0.01                   0.569 <0.001
ln(Cap.)    0.315 <0.001                   0.334   <0.01                   0.323 <0.001
ln(R&D)     0.082 <0.001                   0.076   <0.01                   0.077 <0.001
WSIC        0.696 <0.001   0.693 <0.001    0.624   <0.01   0.728 <0.001    0.775 <0.001   0.836 <0.001
WTEC        0.157  0.092   0.164  0.079    0.312   0.123   0.321  0.112    0.338  0.010   0.380  0.004
Tn         10.485 <0.001                  15.144  <0.001                  12.776 <0.001
T*n         0.060                          0.020                           0.050

Table 7: The estimates and test statistics for the R&D data. ∗ denotes the bootstrap p-value.
                             w*_ij = d_ij^{-2} for i ≠ j    w*_ij = e^{-2 d_ij} for i ≠ j
Variable                        estimate    p-value             estimate    p-value
Constant                         1.0711      0.608               0.5989      0.798
ln(s)                            0.8256     <0.001               0.7938     <0.001
ln(np + 0.05)                   -1.4984      0.008              -1.4512      0.009
W ln(s)                         -0.3159      0.075              -0.3595      0.020
W ln(np + 0.05)                  0.5633      0.498               0.1283      0.856
Wy                               0.7360     <0.001               0.6510     <0.001
Tn                              -1.88        0.970              -2.08        0.981
T*n                                          0.850                           0.900
Tan                             -1.90        0.971              -2.05        0.980
Ta*n                                         0.820                           0.810
Restricted regression
Constant                         2.1411     <0.001               2.9890     <0.001
ln(s) - ln(np + 0.05)            0.8426     <0.001               0.8195     <0.001
W[ln(s) - ln(np + 0.05)]        -0.2675      0.122              -0.2589      0.098
W ln(y)                          0.7320     <0.001               0.6380     <0.001
Tn                               0.30        0.382               4.04       <0.001
T*n                                          0.500                          <0.001
Tan                              0.10        0.460               4.50       <0.001
Ta*n                                         0.560                           0.040

Table 8: The estimates and test statistics of the linear SAR model for the growth data. ∗ denotes the bootstrap p-value.
Appendix
A Proofs of theorems and propositions
Proof of Proposition 4.1: Because the map $\Sigma : \Gamma^o \to \mathcal{M}_{n\times n}$ is Fréchet-differentiable on $\Gamma^o$, it is also Gâteaux-differentiable and the two derivative maps coincide. Thus by Theorem 1.8 of Ambrosetti and Prodi (1995),
$$\|\Sigma(\gamma_1)-\Sigma(\gamma_2)\| \le \sup_{\gamma\in\ell[\gamma_1,\gamma_2]} \|D\Sigma(\gamma)\|\,\|\gamma_1-\gamma_2\|,$$
where $\ell[\gamma_1,\gamma_2] = \{t\gamma_1+(1-t)\gamma_2 : t\in[0,1]\}$. The claim now follows by (4.3) in Assumption 8.
Proof of Theorem 4.1. This is a particular case of the proof of Theorem 5.1 with $\lambda = 0$, so that $S(\lambda) = I_n$.
Proof of Theorem 4.2. See Appendix B.
Proof of Theorem 4.3. We would like to establish the asymptotic standard normality of
$$\frac{\sigma_0^{-2}\varepsilon'V\varepsilon - p}{\sqrt{2p}}. \tag{A.1}$$
Writing $q=\sqrt{2p}$, the ratio in (A.1) has zero mean and variance equal to one, and may be written as $w=\sum_{s=1}^{\infty}w_s$, where
$$w_s = \sigma_0^{-2}q^{-1}v_{ss}\big(\varepsilon_s^2-\sigma_0^2\big) + 2\sigma_0^{-2}q^{-1}1(s\ge 2)\varepsilon_s\sum_{t<s}v_{st}\varepsilon_t,$$
with $v_{st}$ the typical element of $V$, $s,t=1,2,\dots$. We first show that
$$w_* \overset{p}{\longrightarrow} 0, \tag{A.2}$$
where $w_*=w-w_S$, $w_S=\sum_{s=1}^{S}w_s$ and $S=S_n$ is a positive integer sequence that is increasing in $n$. All expectations in the sequel are taken conditional on $X$. By Chebyshev's inequality, proving
$$Ew_*^2 \overset{p}{\longrightarrow} 0 \tag{A.3}$$
is sufficient to establish (A.2). Notice that $Ew_s^2 \le Cq^{-2}v_{ss}^2 + Cq^{-2}1(s\ge 2)\sum_{t<s}v_{st}^2 \le Cq^{-2}\sum_{t\le s}v_{st}^2$, so that, writing $M=\Sigma^{-1}\Psi[\Psi'\Sigma^{-1}\Psi]^{-1}\Psi'\Sigma^{-1}$,
$$\sum_{s=S+1}^{\infty}Ew_s^2 \le Cq^{-2}\sum_{s=S+1}^{\infty}\sum_{t\le s}v_{st}^2 \le Cq^{-2}\sum_{s=S+1}^{\infty}b_s'M\sum_{t\le s}b_tb_t'M b_s \le Cq^{-2}\|\Sigma\|\sum_{s=S+1}^{\infty}b_s'M^2b_s$$
$$\le Cq^{-2}\sum_{s=S+1}^{\infty}\sum_{i,j,k=1}^{n}b_{is}b_{ks}m_{ij}m_{kj} \le Cq^{-2}\sum_{s=S+1}^{\infty}\sum_{i,k=1}^{n}|b^*_{is}||b^*_{ks}|\sum_{j=1}^{n}\big(m_{kj}^2+m_{ij}^2\big), \tag{A.4}$$
where $m_{ij}$ is the $(i,j)$-th element of $M$ and we have used the inequality $|ab|\le(a^2+b^2)/2$ in the last step. Now, denote by $h_i'$ the $i$-th row of the $n\times p$ matrix $\Sigma^{-1}\Psi$. Denoting the elements of $\Sigma^{-1}$ by $\Sigma^{-1}_{ij}$ and writing $\psi_{jl}=\psi_l(x_j)$, $h_i$ has entries $h_{il}=\sum_{j=1}^{n}\Sigma^{-1}_{ij}\psi_{jl}$, $l=1,\dots,p$. We have $|h_{il}|=O_p\big(\sum_{j=1}^{n}|\Sigma^{-1}_{ij}|\big)=O_p(\|\Sigma^{-1}\|_R)=O_p(1)$, uniformly, by Assumptions R.11 and R.13. Thus we have $\|h_i\|=O_p(\sqrt{p})$, uniformly in $i$. As a result,
$$|m_{ij}| = n^{-1}\Big|h_i'\big(n^{-1}\Psi'\Sigma^{-1}\Psi\big)^{-1}h_j\Big| = O_p\big(n^{-1}\|h_i\|\|h_j\|\big) = O_p\big(pn^{-1}\big), \tag{A.5}$$
uniformly in $i,j$, by Assumption R.11. Similarly, note that
$$\sum_{j=1}^{n}m_{ij}^2 = n^{-1}h_i'\big(n^{-1}\Psi'\Sigma^{-1}\Psi\big)^{-1}\big(n^{-1}\Psi'\Sigma^{-2}\Psi\big)\big(n^{-1}\Psi'\Sigma^{-1}\Psi\big)^{-1}h_i \le n^{-1}\|h_i\|^2\Big\|\big(n^{-1}\Psi'\Sigma^{-1}\Psi\big)^{-1}\Big\|^2\big\|n^{-1}\Psi'\Sigma^{-2}\Psi\big\| = O_p\big(pn^{-2}\|\Psi\|^2\|\Sigma^{-1}\|^2\big) = O_p\big(pn^{-1}\big), \tag{A.6}$$
uniformly in $i$. Thus (A.4) is
$$O_p\bigg(q^{-2}pn^{-1}\sum_{s=S+1}^{\infty}\sum_{i=1}^{n}|b^*_{is}|\sum_{k=1}^{n}|b^*_{ks}|\bigg) = O_p\bigg(q^{-2}p\sup_{i=1,\dots,n}\sum_{s=S+1}^{\infty}|b^*_{is}|\bigg), \tag{A.7}$$
by Assumption R.4. By the same assumption, there exists $S_{in}$ such that $\sum_{s=S_{in}+1}^{\infty}|b^*_{is}|\le\epsilon_n$ for any decreasing sequence $\epsilon_n\to 0$ as $n\to\infty$. Choosing $S=\max_{i=1,\dots,n}S_{in}$ in $w_S$, we deduce that (A.7) is $O_p(q^{-2}p\epsilon_n)=O_p(\epsilon_n)=o_p(1)$, proving (A.3). Thus we need only focus on $w_S$, and seek to establish that
$$w_S \overset{d}{\longrightarrow} N(0,1), \quad\text{as } n\to\infty. \tag{A.8}$$
From Scott (1973), (A.8) follows if
$$\sum_{s=1}^{S}Ew_s^4 \overset{p}{\longrightarrow} 0, \quad\text{as } n\to\infty, \tag{A.9}$$
and
$$\sum_{s=1}^{S}\Big[E\big(w_s^2 \mid \varepsilon_t, t<s\big)-E\big(w_s^2\big)\Big] \overset{p}{\longrightarrow} 0, \quad\text{as } n\to\infty. \tag{A.10}$$
We show (A.9) first. Evaluating the expectation and using (A.6) yields
$$Ew_s^4 \le Cq^{-4}v_{ss}^4 + Cq^{-4}\sum_{t<s}v_{st}^4 \le Cq^{-4}\Big(\sum_{t\le s}v_{st}^2\Big)^2 \le Cq^{-4}\Big(b_s'M\sum_{t\le s}b_tb_t'Mb_s\Big)^2 \le Cq^{-4}\big(b_s'M^2b_s\big)^2.$$
Since
$$b_s'M^2b_s = \sum_{i,j,k=1}^{n}b_{is}b_{ks}m_{ij}m_{kj} \le \sum_{i,k=1}^{n}|b^*_{is}||b^*_{ks}|\sum_{j=1}^{n}\big(m_{ij}^2+m_{kj}^2\big) = O_p\bigg(pn^{-1}\Big(\sum_{i=1}^{n}|b^*_{is}|\Big)^2\bigg)$$
is bounded in probability uniformly in $s$, it follows that
$$Ew_s^4 = O_p\bigg(q^{-4}pn^{-1}\Big(\sum_{i=1}^{n}|b^*_{is}|\Big)^2\bigg),$$
whence
$$\sum_{s=1}^{S}Ew_s^4 = O_p\bigg(q^{-4}pn^{-1}\sum_{s=1}^{S}\Big(\sum_{i=1}^{n}|b^*_{is}|\Big)^2\bigg) = O_p\bigg(q^{-4}pn^{-1}\sum_{s=1}^{S}\sum_{i=1}^{n}|b^*_{is}|\bigg) = O_p\big(q^{-4}p\big),$$
by Assumption R.4. Thus (A.9) is established. Notice that $E(w_s^2\mid\varepsilon_t, t<s)$ equals
$$4q^{-2}\sigma_0^{-4}\Big\{\big(\mu_4-\sigma_0^4\big)v_{ss}^2 + 2\mu_3 1(s\ge 2)\sum_{t<s}v_{st}v_{ss}\varepsilon_t\Big\} + 4q^{-2}\sigma_0^{-2}1(s\ge 2)\Big(\sum_{t<s}v_{st}\varepsilon_t\Big)^2,$$
and $Ew_s^2 = 4q^{-2}\sigma_0^{-4}(\mu_4-\sigma_0^4)v_{ss}^2 + 4q^{-2}1(s\ge 2)\sum_{t<s}v_{st}^2$, so that (A.10) is bounded by a constant times
$$q^{-2}\Bigg[\sum_{s=2}^{S}\sum_{t<s}v_{st}v_{ss}\varepsilon_t + \sum_{s=2}^{S}\bigg\{\Big(\sum_{t<s}v_{st}\varepsilon_t\Big)^2 - \sigma_0^2\sum_{t<s}v_{st}^2\bigg\}\Bigg]. \tag{A.11}$$
By transforming the range of summation, the square of the first term in (A.11) has expectation bounded by
$$Cq^{-4}E\Big(\sum_{t=1}^{S-1}\sum_{s=t+1}^{S}v_{st}v_{ss}\varepsilon_t\Big)^2 \le Cq^{-4}\sum_{t=1}^{S-1}\Big(\sum_{s=t+1}^{S}v_{st}v_{ss}\Big)^2, \tag{A.12}$$
where the square of the factor in parentheses on the right-hand side of (A.12) is
$$\sum_{s,r=t+1}^{S}b_s'Mb_s\,b_s'Mb_t\,b_r'Mb_r\,b_r'Mb_t \le \sum_{s,r=t+1}^{S}\big|b_s'Mb_s\,b_r'Mb_r\big|\,\big|b_s'Mb_t\big|\,\big|b_r'Mb_t\big|$$
$$\le C\Big(\sup_{i,j}|m_{ij}|\Big)^2\Big(\sup_{s\ge 1}\sum_{i=1}^{n}|b^*_{is}|\Big)^4\sum_{s,r=t+1}^{S}\big|b_s'Mb_t\big|\,\big|b_r'Mb_t\big| = O_p\Bigg(p^2n^{-2}\Big(\sum_{s=t+1}^{S}|b_t'Mb_s|\Big)^2\Bigg) = O_p\Bigg(p^2n^{-2}\Big(\sum_{s=t+1}^{S}\sum_{i,j=1}^{n}|b^*_{it}||m_{ij}||b^*_{js}|\Big)^2\Bigg),$$
where we used Assumption R.4 and (A.5). Now Assumptions R.4, R.11 and (A.5) imply that
$$\sum_{s=t+1}^{S}\sum_{i,j=1}^{n}|b^*_{it}||m_{ij}||b^*_{js}| = O_p\bigg(\sup_{i,j}|m_{ij}|\,\sup_{t}\sum_{i=1}^{n}|b^*_{it}|\sum_{j=1}^{n}\sum_{s=t+1}^{S}|b^*_{js}|\bigg) = O_p\bigg(p\sup_{t}\sum_{i=1}^{n}|b^*_{it}|\bigg),$$
so (A.12) is $O_p\big(q^{-4}p^4n^{-2}\sup_t\big(\sum_{i=1}^{n}|b^*_{it}|\big)\sum_{i=1}^{n}\sum_{t=1}^{S-1}|b^*_{it}|\big)$. By Assumption R.4 the latter is $O_p(q^{-4}p^4n^{-1}) = O_p(p^2n^{-1})$, which is negligible because $p^2/n\to 0$; therefore the first term in (A.11) is negligible.
Once again transforming the summation range and using the inequality $|a+b|^2\le C(a^2+b^2)$, we can bound the square of the second term in (A.11) by a constant times
$$q^{-4}\Bigg(\sum_{t=1}^{S-1}\sum_{s=t+1}^{S}v_{st}^2\big(\varepsilon_t^2-\sigma_0^2\big)\Bigg)^2 + q^{-4}\Bigg(2\sum_{t=1}^{S-1}\sum_{r=1}^{t-1}\sum_{s=t+1}^{S}v_{st}v_{sr}\varepsilon_t\varepsilon_r\Bigg)^2. \tag{A.13}$$
Using Assumption R.4, the expectations of the two terms in (A.13) are bounded by a constant times $\alpha_1$ and a constant times $\alpha_2$, respectively, where
$$\alpha_1 = q^{-4}\sum_{t=1}^{S-1}\Big(\sum_{s=t+1}^{S}v_{st}^2\Big)^2, \qquad \alpha_2 = q^{-4}\sum_{t=1}^{S-1}\sum_{r=1}^{t-1}\Big(\sum_{s=t+1}^{S}v_{st}v_{sr}\Big)^2.$$
Thus (A.13) is $O_p(\alpha_1+\alpha_2)$. Now by (A.5), Assumptions R.4, R.11 and elementary inequalities, $\alpha_2$ is bounded by
$$q^{-4}\sum_{t=1}^{S-1}\sum_{r=1}^{t-1}\sum_{s,u=t+1}^{S}b_s'Mb_t\,b_s'Mb_r\,b_u'Mb_t\,b_u'Mb_r$$
$$= O_p\Bigg(q^{-4}\sum_{s,r,t,u=1}^{S}\Big(\sum_{i,j=1}^{n}|b^*_{ir}||m_{ij}||b^*_{js}|\Big)\Big(\sum_{i,j=1}^{n}|b^*_{ir}||m_{ij}||b^*_{ju}|\Big)\Big(\sum_{i,j=1}^{n}|b^*_{it}||m_{ij}||b^*_{js}|\Big)\Big(\sum_{i,j=1}^{n}|b^*_{it}||m_{ij}||b^*_{ju}|\Big)\Bigg).$$
Successively summing out $u$, $t$, $s$ and $r$, using the uniform bounds $\sup_{i,j}|m_{ij}|=O_p(pn^{-1})$ from (A.5) together with $\sup_s\sum_{i=1}^{n}|b^*_{is}|\le C$ and $\sum_{s}\sum_{i=1}^{n}|b^*_{is}|=O(n)$ from Assumption R.4, this is
$$O_p\bigg(q^{-4}p^2\sup_{j}\sum_{i=1}^{n}m_{ij}^2\bigg) = O_p\big(pn^{-1}\big),$$
where we used (A.6) in the last step. A similar use of the conditions of the theorem and (A.5) implies that $\alpha_1$ is
$$O_p\Bigg(q^{-4}\sum_{t=1}^{S-1}\bigg(\sum_{s=t+1}^{S}\Big(\sum_{i,j=1}^{n}|m_{ij}||b^*_{jt}||b^*_{is}|\Big)^2\bigg)^2\Bigg) = O_p\Bigg(q^{-4}\Big(\sup_{i,j}|m_{ij}|\Big)^4\sum_{t=1}^{S-1}\bigg(\sum_{s=t+1}^{S}\Big(\sum_{i=1}^{n}|b^*_{is}|\sum_{j=1}^{n}|b^*_{jt}|\Big)^2\bigg)^2\Bigg)$$
$$= O_p\Bigg(q^{-4}p^4n^{-4}\sum_{t=1}^{S-1}\bigg(\sum_{s=t+1}^{S}\Big(\sum_{i=1}^{n}|b^*_{is}|\Big)^2\bigg)^2\Big(\sum_{j=1}^{n}|b^*_{jt}|\Big)^4\Bigg) = O_p\big(q^{-4}p^4n^{-1}\big) = O_p\big(p^2n^{-1}\big),$$
proving (A.10), as $p^2/n\to 0$ by the conditions of the theorem.
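The limiting behaviour established in this proof can be illustrated numerically. The following sketch is not the paper's setup: it replaces $V$ by an exact rank-$p$ projection built from a random basis and takes Gaussian errors, so that $\varepsilon'V\varepsilon/\sigma_0^2 \sim \chi^2_p$ and the standardized statistic of (A.1) has mean 0 and variance 1 exactly; all names (`Psi`, `stats`, the sample sizes) are illustrative.

```python
# Sanity check of the standardization in (A.1): with Gaussian errors and V an
# exact rank-p projection, eps'V eps / sigma0^2 is chi-square(p), so
# (eps'V eps / sigma0^2 - p) / sqrt(2p) has mean 0 and variance 1.
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma0, reps = 200, 8, 1.5, 4000

# Rank-p projection V = Psi (Psi'Psi)^{-1} Psi' from a random n x p basis.
Psi = rng.standard_normal((n, p))
V = Psi @ np.linalg.solve(Psi.T @ Psi, Psi.T)

stats = np.empty(reps)
for r in range(reps):
    eps = sigma0 * rng.standard_normal(n)
    stats[r] = (eps @ V @ eps / sigma0**2 - p) / np.sqrt(2 * p)

print(round(stats.mean(), 3), round(stats.var(), 3))  # close to 0 and 1
```

With skewness of order $\sqrt{8/p}$, the normal approximation sharpens as $p$ grows, in line with the requirement that $p\to\infty$ in the theorem.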
Proof of Theorem 4.4. See Appendix B.
Proof of Theorem 5.1. Due to the similarity with proofs in Delgado and Robinson (2015)
and Gupta and Robinson (2018), the details are in the supplementary appendix.
Proof of Theorem 5.2. Denote by $\theta_*$ the solution of $\min_\theta E\big(y_i - \sum_{j=1}^{d_\lambda}\lambda_j w_{i,j}'y - \theta(x_i)\big)^2$. Put $\theta_{*i}=\theta_*(x_i)$, $\theta_{0i}=\theta_0(x_i)$, $\hat\theta_i=\psi_i'\hat\beta$, $\hat f_i=f(x_i,\hat\alpha)$, $f_{*i}=f(x_i,\alpha_*)$. Then
$$\hat u_i = y_i - \sum_{j=1}^{d_\lambda}\hat\lambda_j w_{i,j}'y - f(x_i,\hat\alpha) = u_i + \theta_{0i} + \sum_{j=1}^{d_\lambda}(\lambda_{j0}-\hat\lambda_j)w_{i,j}'y - \hat f_i.$$
Proceeding as in the proof of Theorem 4.2, we obtain $nm_n = \hat\sigma^{-2}u'\Sigma(\hat\gamma)^{-1}\Psi[\Psi'\Sigma(\hat\gamma)^{-1}\Psi]^{-1}\Psi'\Sigma(\hat\gamma)^{-1}u + \hat\sigma^{-2}\sum_{j=1}^{7}A_j$. Thus, compared to the test statistic with no spatial lag (cf. the proof of Theorem 4.2), we have the additional terms
$$A_5 = \Big(\sum_{j=1}^{d_\lambda}(\lambda_{j0}-\hat\lambda_j)W_jy\Big)'\Sigma(\hat\gamma)^{-1}\Psi[\Psi'\Sigma(\hat\gamma)^{-1}\Psi]^{-1}\Psi'\Sigma(\hat\gamma)^{-1}\Big(\sum_{j=1}^{d_\lambda}(\lambda_{j0}-\hat\lambda_j)W_jy\Big),$$
$$A_6 = \sum_{j=1}^{d_\lambda}(\lambda_{j0}-\hat\lambda_j)y'W_j'\Sigma(\hat\gamma)^{-1}\Psi[\Psi'\Sigma(\hat\gamma)^{-1}\Psi]^{-1}\Psi'\Sigma(\hat\gamma)^{-1}(u+\theta_0-\hat f),$$
$$A_7 = \Big(\Psi\big(\Psi'\Sigma(\hat\gamma)^{-1}\Psi\big)^{-1}\Psi'\Sigma(\hat\gamma)^{-1}(u+e) - e + \theta_0 - \hat f\Big)'\Sigma(\hat\gamma)^{-1}\sum_{j=1}^{d_\lambda}(\lambda_{j0}-\hat\lambda_j)W_jy.$$
We now show that $A_\ell = o_p(\sqrt{p})$, $\ell > 4$, so the leading term in $nm_n$ is the same as before. First, $\|y\| = O_p(\sqrt{n})$ from $y = (I_n - \sum_{j=1}^{d_\lambda}\lambda_{j0}W_j)^{-1}(\theta_0+u)$. Then, with $\|\lambda_0-\hat\lambda\| = O_p\big(\sqrt{d_\gamma/n}\big)$ by Lemma D.2, we have
$$|A_5| \le \|\lambda_0-\hat\lambda\|^2\sum_{j=1}^{d_\lambda}\|W_j\|^2\sup_{\gamma,j}\bigg\|\Sigma(\gamma)^{-1}\frac{1}{n}\Psi\Big(\frac{1}{n}\Psi'\Sigma(\gamma)^{-1}\Psi\Big)^{-1}\Psi'\Sigma(\gamma)^{-1}\bigg\|\,\|y\|^2 = O_p(d_\gamma/n)O_p(1)O_p(n) = O_p(d_\gamma) = o_p(\sqrt{p}).$$
Uniformly in $\gamma$ and $j$,
$$E\big(u'S^{-1\prime}W_j'\Sigma(\gamma)^{-1}\Psi[\Psi'\Sigma(\gamma)^{-1}\Psi]^{-1}\Psi'\Sigma(\gamma)^{-1}u\big) = E\,\mathrm{tr}\bigg(\Big(\frac{1}{n}\Psi'\Sigma(\gamma)^{-1}\Psi\Big)^{-1}\frac{1}{n}\Psi'\Sigma(\gamma)^{-1}\Sigma S^{-1\prime}W_j'\Sigma(\gamma)^{-1}\Psi\bigg) = O_p(p)$$
and
$$E\big(\theta_0'S^{-1\prime}W_j'\Sigma(\gamma)^{-1}\Psi[\Psi'\Sigma(\gamma)^{-1}\Psi]^{-1}\Psi'\Sigma(\gamma)^{-1}u\big)^2 = O_p\bigg(\|S^{-1}\|^2\sup_\gamma\|\Sigma(\gamma)^{-1}\|^4\Big\|\frac{1}{n}\Psi\Big(\frac{1}{n}\Psi'\Sigma(\gamma)^{-1}\Psi\Big)^{-1}\Psi'\Big\|^2\sup_j\|W_j\|^2\,\|\Sigma\|\,\|\theta_0\|^2\bigg) = O_p(n).$$
Similarly, $\theta_0'S^{-1\prime}W_j'\Sigma(\gamma)^{-1}\Psi[\Psi'\Sigma(\gamma)^{-1}\Psi]^{-1}\Psi'\Sigma(\gamma)^{-1}W_j\theta_0 = O_p(n)$, uniformly. Therefore,
$$\bigg|\sum_{j=1}^{d_\lambda}(\lambda_{j0}-\hat\lambda_j)y'W_j'\Sigma(\hat\gamma)^{-1}\Psi[\Psi'\Sigma(\hat\gamma)^{-1}\Psi]^{-1}\Psi'\Sigma(\hat\gamma)^{-1}u\bigg| = \bigg|\sum_{j=1}^{d_\lambda}(\lambda_{j0}-\hat\lambda_j)(\theta_0+u)'S^{-1\prime}W_j'\Sigma(\hat\gamma)^{-1}\Psi[\Psi'\Sigma(\hat\gamma)^{-1}\Psi]^{-1}\Psi'\Sigma(\hat\gamma)^{-1}u\bigg|$$
$$\le d_\lambda\|\lambda_0-\hat\lambda\|\sup_{\gamma,j}\big|\theta_0'S^{-1\prime}W_j'\Sigma(\gamma)^{-1}\Psi[\Psi'\Sigma(\gamma)^{-1}\Psi]^{-1}\Psi'\Sigma(\gamma)^{-1}u\big| + d_\lambda\|\lambda_0-\hat\lambda\|\sup_{\gamma,j}\big|u'S^{-1\prime}W_j'\Sigma(\gamma)^{-1}\Psi[\Psi'\Sigma(\gamma)^{-1}\Psi]^{-1}\Psi'\Sigma(\gamma)^{-1}u\big|$$
$$= O_p\big(\sqrt{d_\gamma/n}\big)O_p(\sqrt{n}) + O_p\big(\sqrt{d_\gamma/n}\big)O_p(p) = O_p\big(\sqrt{d_\gamma}\big) = o_p(\sqrt{p}),$$
and
$$\bigg|\sum_{j=1}^{d_\lambda}(\lambda_{j0}-\hat\lambda_j)y'W_j'\Sigma(\hat\gamma)^{-1}\Psi[\Psi'\Sigma(\hat\gamma)^{-1}\Psi]^{-1}\Psi'\Sigma(\hat\gamma)^{-1}(\theta_0-\hat f)\bigg| \le d_\lambda\|\lambda_0-\hat\lambda\|\,\|y\|\sup_j\|W_j\|\sup_\gamma\bigg\|\frac{1}{n}\Psi\Big(\frac{1}{n}\Psi'\Sigma(\gamma)^{-1}\Psi\Big)^{-1}\Psi'\bigg\|\sup_\gamma\|\Sigma(\gamma)^{-1}\|^2\,\|\theta_0-\hat f\|$$
$$= O_p\big(\sqrt{d_\gamma/n}\big)O_p(\sqrt{n})O_p\big(p^{1/4}\big) = O_p\big(\sqrt{d_\gamma}\,p^{1/4}\big) = o_p(\sqrt{p}),$$
so that $A_6 = o_p(\sqrt{p})$. Finally,
$$\bigg|\sum_{j=1}^{d_\lambda}(\lambda_{j0}-\hat\lambda_j)y'W_j'\Sigma(\hat\gamma)^{-1}\Psi[\Psi'\Sigma(\hat\gamma)^{-1}\Psi]^{-1}\Psi'\Sigma(\hat\gamma)^{-1}e\bigg| \le d_\lambda\|\lambda_0-\hat\lambda\|\,\|y\|\sup_j\|W_j\|\sup_\gamma\bigg\|\frac{1}{n}\Psi\Big(\frac{1}{n}\Psi'\Sigma(\gamma)^{-1}\Psi\Big)^{-1}\Psi'\bigg\|\sup_\gamma\|\Sigma(\gamma)^{-1}\|^2\,\|e\|$$
$$= O_p\big(\sqrt{d_\gamma/n}\big)O_p(\sqrt{n})O_p\big(p^{-\mu}\sqrt{n}\big) = O_p\big(\sqrt{d_\gamma}\,p^{-\mu}\sqrt{n}\big) = o_p(\sqrt{p}),$$
and
$$\bigg|(e+\theta_0-\hat f)'\Sigma(\hat\gamma)^{-1}\sum_{j=1}^{d_\lambda}(\lambda_{j0}-\hat\lambda_j)W_jy\bigg| \le d_\lambda\|\lambda_0-\hat\lambda\|\big(\|e\|+\|\theta_0-\hat f\|\big)\sup_\gamma\|\Sigma(\gamma)^{-1}\|\sup_j\|W_j\|\,\|y\| = O_p\big(\sqrt{d_\gamma/n}\big)O_p\big(p^{-\mu}\sqrt{n}+p^{1/4}\big)O_p(\sqrt{n}) = O_p\big(\sqrt{d_\gamma}\,p^{-\mu}\sqrt{n}+\sqrt{d_\gamma}\,p^{1/4}\big) = o_p(\sqrt{p}),$$
implying that $A_7 = o_p(\sqrt{p})$.
Proof of Theorem 5.3. Omitted as it is similar to the proof of Theorem 4.4.
Proof of Proposition 6.1: Because the map $\Sigma : \mathcal{T}^o \to \mathcal{M}_{n\times n}$ is Fréchet-differentiable on $\mathcal{T}^o$, it is also Gâteaux-differentiable and the two derivative maps coincide. Thus by Theorem 1.8 of Ambrosetti and Prodi (1995),
$$\|\Sigma(t_1)-\Sigma(t_2)\| \le \sup_{t\in\mathcal{T}^o}\|D\Sigma(t)\|_{L(\mathcal{T}^o,\mathcal{M}_{n\times n})}\bigg(\|\gamma_1-\gamma_2\| + \sum_{\ell=1}^{d_\zeta}\big\|(\delta_{\ell 1}-\delta_{\ell 2})'\varphi_\ell\big\|_w\bigg), \tag{A.14}$$
where
$$\sum_{\ell=1}^{d_\zeta}\big\|(\delta_{\ell 1}-\delta_{\ell 2})'\varphi_\ell\big\|_w = \sum_{\ell=1}^{d_\zeta}\sup_{z\in\mathcal{Z}}\big|(\delta_{\ell 1}-\delta_{\ell 2})'\varphi_\ell(z)\big|\big(1+\|z\|^2\big)^{-w/2} \le \sum_{\ell=1}^{d_\zeta}\|\delta_{\ell 1}-\delta_{\ell 2}\|\sup_{z\in\mathcal{Z}}\|\varphi_\ell(z)\|\big(1+\|z\|^2\big)^{-w/2} \le C\varsigma(r)\sum_{\ell=1}^{d_\zeta}\|\delta_{\ell 1}-\delta_{\ell 2}\| \le C\varsigma(r)\|t_1-t_2\|.$$
The claim now follows by (6.8) in Assumption NPN.2, because $\|\gamma_1-\gamma_2\| \le C\varsigma(r)\|t_1-t_2\|$ for some suitably chosen $C$.
Proof of Theorem 6.1. The proof is omitted as it is entirely analogous to that of Theorem 5.1, with the exception of one difference when proving equicontinuity. In the setting of Section 6, we obtain via Proposition 6.1 that $\|\Sigma(\tau)-\Sigma(\tau_*)\| = O_p(\varsigma(r)\varepsilon)$. Because $\varepsilon > 0$ is arbitrarily small, we may choose it smaller than $\varepsilon'/\varsigma(r)$, for some arbitrary $\varepsilon' > 0$.
Proof of Theorem 6.2. Writing $\hat\delta(z) = \big(\hat\delta_1'\varphi_1(z),\dots,\hat\delta_{d_\zeta}'\varphi_{d_\zeta}(z)\big)'$ and taking $t_1 = (\hat\gamma',\hat\delta(z)')'$ and $t_2 = (\gamma_0',\zeta_0(z)')'$ in Proposition 6.1 implies (we suppress the argument $z$)
$$\|\Sigma(\hat\tau)-\Sigma\| = O_p\big(\varsigma(r)\big(\|\hat\gamma-\gamma_0\|+\|\hat\delta-\zeta_0\|\big)\big) = O_p\big(\varsigma(r)\big(\|\hat\tau-\tau_0\|+\|\nu\|\big)\big) = O_p\bigg(\varsigma(r)\max\bigg\{\sqrt{d_\tau/n},\ \sqrt{\textstyle\sum_{\ell=1}^{d_\zeta}r_\ell^{-2\kappa_\ell}}\bigg\}\bigg),$$
uniformly on $\mathcal{Z}$. Thus we have
$$\|\Sigma(\hat\tau)^{-1}-\Sigma^{-1}\| \le \|\Sigma(\hat\tau)^{-1}\|\,\|\Sigma(\hat\tau)-\Sigma\|\,\|\Sigma^{-1}\| = O_p\bigg(\varsigma(r)\max\bigg\{\sqrt{d_\tau/n},\ \sqrt{\textstyle\sum_{\ell=1}^{d_\zeta}r_\ell^{-2\kappa_\ell}}\bigg\}\bigg),$$
and similarly
$$\bigg\|\Big(\frac{1}{n}\Psi'\Sigma(\hat\tau)^{-1}\Psi\Big)^{-1}-\Big(\frac{1}{n}\Psi'\Sigma^{-1}\Psi\Big)^{-1}\bigg\| \le \bigg\|\Big(\frac{1}{n}\Psi'\Sigma(\hat\tau)^{-1}\Psi\Big)^{-1}\bigg\|\,\bigg\|\frac{1}{n}\Psi'\big(\Sigma(\hat\tau)^{-1}-\Sigma^{-1}\big)\Psi\bigg\|\,\bigg\|\Big(\frac{1}{n}\Psi'\Sigma^{-1}\Psi\Big)^{-1}\bigg\| = O_p\big(\|\Sigma(\hat\tau)^{-1}-\Sigma^{-1}\|\big) = O_p\bigg(\varsigma(r)\max\bigg\{\sqrt{d_\tau/n},\ \sqrt{\textstyle\sum_{\ell=1}^{d_\zeta}r_\ell^{-2\kappa_\ell}}\bigg\}\bigg).$$
As in the proof of Theorem 4.2, $nm_n = \hat\sigma^{-2}u'\Sigma(\hat\tau)^{-1}\Psi[\Psi'\Sigma(\hat\tau)^{-1}\Psi]^{-1}\Psi'\Sigma(\hat\tau)^{-1}u + \hat\sigma^{-2}\sum_{k=1}^{4}A_k$, where $\hat\gamma$ in the parametric setting is changed to $\hat\tau$ in this nonparametric setting. Then, by the mean value theorem,
$$\big|u'\big(\Sigma(\hat\tau)^{-1}\Psi[\Psi'\Sigma(\hat\tau)^{-1}\Psi]^{-1}\Psi'\Sigma(\hat\tau)^{-1} - \Sigma^{-1}\Psi[\Psi'\Sigma^{-1}\Psi]^{-1}\Psi'\Sigma^{-1}\big)u\big|$$
$$\le 2\bigg(\sup_t\Big\|\frac{1}{\sqrt{n}}u'\Sigma(t)^{-1}\Psi\Big\|\,\bigg\|\Big(\frac{1}{n}\Psi'\Sigma(t)^{-1}\Psi\Big)^{-1}\bigg\|\bigg)\sum_{j=1}^{d_\tau}\Big\|\frac{1}{\sqrt{n}}\Psi'\big(\Sigma(\bar\tau)^{-1}\Sigma_j(\bar\tau)\Sigma(\bar\tau)^{-1}\big)u\Big\|\,|\hat\tau_j-\tau_{j0}|$$
$$\quad + 2\sup_t\Big\|\frac{1}{\sqrt{n}}u'\Sigma(t)^{-1}\Psi\Big\|\,\bigg\|\Big(\frac{1}{n}\Psi'\Sigma(t)^{-1}\Psi\Big)^{-1}\bigg\|\,\Big\|\frac{1}{\sqrt{n}}\Psi'(\Sigma_0-\Sigma)u\Big\| + \Big\|\frac{1}{\sqrt{n}}u'\Sigma^{-1}\Psi\Big\|^2\,\bigg\|\Big(\frac{1}{n}\Psi'\Sigma(\hat\tau)^{-1}\Psi\Big)^{-1}-\Big(\frac{1}{n}\Psi'\Sigma^{-1}\Psi\Big)^{-1}\bigg\|$$
$$= O_p(\sqrt{p})O_p\big(d_\tau\sqrt{p}\,\varsigma(r)/\sqrt{n}\big) + O_p(\sqrt{p})O_p\bigg(\sqrt{p}\,\varsigma(r)\sqrt{\textstyle\sum_{\ell=1}^{d_\zeta}r_\ell^{-2\kappa_\ell}}\bigg) + O_p(p)O_p\bigg(\varsigma(r)\max\bigg\{\sqrt{d_\tau/n},\ \sqrt{\textstyle\sum_{\ell=1}^{d_\zeta}r_\ell^{-2\kappa_\ell}}\bigg\}\bigg)$$
$$= O_p\bigg(p\,\varsigma(r)\max\bigg\{d_\tau/\sqrt{n},\ \sqrt{\textstyle\sum_{\ell=1}^{d_\zeta}r_\ell^{-2\kappa_\ell}}\bigg\}\bigg) = o_p(\sqrt{p}),$$
where $\bar\tau$ lies between $\hat\tau$ and $\tau_0$ and the last equality holds under the conditions of the theorem. Next, it remains to show $A_k = o_p(p^{1/2})$, $k=1,\dots,4$. The order of $A_k$, $k\le 3$, is the same as in the parametric case:
$$|A_1| = \big|u'\Sigma(\hat\tau)^{-1}\big(\theta_0-\hat f\big)\big| \le \sup_{\alpha,t}\sum_{j=1}^{d_\alpha}\Big\|u'\Sigma(t)^{-1}\frac{\partial f(x,\alpha)}{\partial\alpha_j}\Big\|\,|\alpha_{*j}-\hat\alpha_j| + \frac{p^{1/4}}{n^{1/2}}\sup_t\big\|u'\Sigma(t)^{-1}h\big\| = O_p(\sqrt{n})O_p\Big(\frac{1}{\sqrt{n}}\Big)+O\Big(\frac{p^{1/4}}{n^{1/2}}\Big)O_p(\sqrt{n}) = O_p\big(p^{1/4}\big) = o_p\big(p^{1/2}\big),$$
$$|A_2| = \big|(u+\theta_0-\hat f)'\big(\Sigma(\hat\tau)^{-1}-\Sigma(\hat\tau)^{-1}\Psi[\Psi'\Sigma(\hat\tau)^{-1}\Psi]^{-1}\Psi'\Sigma(\hat\tau)^{-1}\big)e\big| \le \sup_t\big|u'\Sigma(t)^{-1}e\big| + \sup_t\big|u'\Sigma(t)^{-1}\Psi[\Psi'\Sigma(t)^{-1}\Psi]^{-1}\Psi'\Sigma(t)^{-1}e\big| + \big\|\theta_0-\hat f\big\|\sup_t\big(\|\Sigma(t)^{-1}\|+\|\Sigma(t)^{-1}\Psi[\Psi'\Sigma(t)^{-1}\Psi]^{-1}\Psi'\Sigma(t)^{-1}\|\big)\|e\|$$
$$= O_p\big(p^{-\mu}n^{1/2}\big)+O_p\big(p^{-\mu+1/4}n^{1/2}\big) = O_p\big(p^{-\mu+1/4}n^{1/2}\big) = o_p(\sqrt{p}),$$
$$|A_3| = \big|u'\Sigma(\hat\tau)^{-1}\Psi\big(\Psi'\Sigma(\hat\tau)^{-1}\Psi\big)^{-1}\Psi'\Sigma(\hat\tau)^{-1}\big(\theta_0-\hat f\big)\big| \le \sup_{\alpha,t}\sum_{j=1}^{d_\alpha}\Big\|u'\Sigma(t)^{-1}\Psi\big(\Psi'\Sigma(t)^{-1}\Psi\big)^{-1}\Psi'\Sigma(t)^{-1}\frac{\partial f(x,\alpha)}{\partial\alpha_j}\Big\|\,|\alpha_{*j}-\hat\alpha_j| + \frac{p^{1/4}}{n^{1/2}}\sup_t\big\|u'\Sigma(t)^{-1}\Psi\big(\Psi'\Sigma(t)^{-1}\Psi\big)^{-1}\Psi'\Sigma(t)^{-1}h\big\| = O_p(1)+O_p\big(p^{1/4}\big) = O_p\big(p^{1/4}\big) = o_p\big(p^{1/2}\big).$$
However, $A_4$ has a different order. Under $H_\ell$,
$$A_4 = \big(\theta_0-\hat f\big)'\Sigma(\hat\tau)^{-1}\big(\theta_0-\hat f\big) = \big(\theta_0-\hat f\big)'\Sigma_0^{-1}\big(\theta_0-\hat f\big)+\big(\theta_0-\hat f\big)'\big(\Sigma(\hat\tau)^{-1}-\Sigma_0^{-1}\big)\big(\theta_0-\hat f\big)$$
$$= \frac{p^{1/2}}{n}h'\Sigma_0^{-1}h + o_p(1) + O_p\big(p^{1/2}\big)O_p\bigg(\varsigma(r)\max\bigg\{\sqrt{d_\tau/n},\ \sqrt{\textstyle\sum_{\ell=1}^{d_\zeta}r_\ell^{-2\kappa_\ell}}\bigg\}\bigg) = \frac{p^{1/2}}{n}h'\Sigma_0^{-1}h + o_p(\sqrt{p}),$$
where the last equality holds under the conditions of the theorem. Combining these together, we have
$$nm_n = \hat\sigma^{-2}\hat v'\Sigma(\hat\tau)^{-1}\hat u = \sigma_0^{-2}\varepsilon'V\varepsilon + \frac{p^{1/2}}{n}h'\Sigma_0^{-1}h + o_p(\sqrt{p})$$
under $H_\ell$, and the same expression holds with $h=0$ under $H_0$.
Proof of Theorem 6.3. Omitted as it is similar to the proof of Theorem 4.4.
B Proofs of Theorems 4.2 and 4.4
Proof of Theorem 4.2. From Corollary 4.1 and Lemma D.2, $\|\Sigma(\hat\gamma)-\Sigma\| = O_p(\|\hat\gamma-\gamma_0\|) = O_p\big(\sqrt{d_\gamma/n}\big)$, so we have, from Assumption R.3,
$$\|\Sigma(\hat\gamma)^{-1}-\Sigma^{-1}\| \le \|\Sigma(\hat\gamma)^{-1}\|\,\|\Sigma(\hat\gamma)-\Sigma\|\,\|\Sigma^{-1}\| = O_p(\|\hat\gamma-\gamma_0\|) = O_p\big(\sqrt{d_\gamma/n}\big). \tag{B.1}$$
Similarly,
$$\bigg\|\Big(\frac{1}{n}\Psi'\Sigma(\hat\gamma)^{-1}\Psi\Big)^{-1}-\Big(\frac{1}{n}\Psi'\Sigma^{-1}\Psi\Big)^{-1}\bigg\| \le \bigg\|\Big(\frac{1}{n}\Psi'\Sigma(\hat\gamma)^{-1}\Psi\Big)^{-1}\bigg\|\,\bigg\|\frac{1}{n}\Psi'\big(\Sigma(\hat\gamma)^{-1}-\Sigma^{-1}\big)\Psi\bigg\|\,\bigg\|\Big(\frac{1}{n}\Psi'\Sigma^{-1}\Psi\Big)^{-1}\bigg\|$$
$$\le \sup_{\gamma\in\Gamma}\bigg\|\Big(\frac{1}{n}\Psi'\Sigma(\gamma)^{-1}\Psi\Big)^{-1}\bigg\|\,\|\Sigma(\hat\gamma)^{-1}-\Sigma^{-1}\|\,\Big\|\frac{1}{\sqrt{n}}\Psi\Big\|^2 = O_p(\|\hat\gamma-\gamma_0\|) = O_p\big(\sqrt{d_\gamma/n}\big).$$
By Assumption R.2, we have $\hat\alpha-\alpha_* = O_p(1/\sqrt{n})$. Denote $\theta_*(x) = \psi(x)'\beta_*$, where $\beta_* = \arg\min_\beta E[y_i-\psi(x_i)'\beta]^2$, and set $\theta_{*i} = \theta_*(x_i)$, $\theta_{0i} = \theta_0(x_i)$, $\hat\theta_i = \psi_i'\hat\beta$, $\hat f_i = f(x_i,\hat\alpha)$, $f_{*i} = f(x_i,\alpha_*)$. Then $\hat u_i = y_i - f(x_i,\hat\alpha) = u_i + \theta_{0i} - \hat f_i$. Let $\theta_0 = (\theta_0(x_1),\dots,\theta_0(x_n))'$ as before, with similar component-wise notation for the $n$-dimensional vectors $\theta_*$, $\hat f$ and $\hat u$. As the approximation error is $e = \theta_0-\theta_* = \theta_0-\Psi\beta_*$,
$$\hat\theta-\theta_* = \Psi(\hat\beta-\beta_*) = \Psi\big(\Psi'\Sigma(\hat\gamma)^{-1}\Psi\big)^{-1}\Psi'\Sigma(\hat\gamma)^{-1}(u+\theta_0-\Psi\beta_*) = \Psi\big(\Psi'\Sigma(\hat\gamma)^{-1}\Psi\big)^{-1}\Psi'\Sigma(\hat\gamma)^{-1}(u+e),$$
so that
$$nm_n = \hat\sigma^{-2}\hat v'\Sigma(\hat\gamma)^{-1}\hat u = \hat\sigma^{-2}\big(\hat\theta-\hat f\big)'\Sigma(\hat\gamma)^{-1}\big(y-\hat f\big) = \hat\sigma^{-2}\big(\hat\theta-\theta_*+\theta_*-\theta_0+\theta_0-\hat f\big)'\Sigma(\hat\gamma)^{-1}\big(u+\theta_0-\hat f\big)$$
$$= \hat\sigma^{-2}\Big[\Psi\big(\Psi'\Sigma(\hat\gamma)^{-1}\Psi\big)^{-1}\Psi'\Sigma(\hat\gamma)^{-1}(u+e)-e+\theta_0-\hat f\Big]'\Sigma(\hat\gamma)^{-1}\big(u+\theta_0-\hat f\big)$$
$$= \hat\sigma^{-2}u'\Sigma(\hat\gamma)^{-1}\Psi[\Psi'\Sigma(\hat\gamma)^{-1}\Psi]^{-1}\Psi'\Sigma(\hat\gamma)^{-1}u + \hat\sigma^{-2}u'\Sigma(\hat\gamma)^{-1}\big(\theta_0-\hat f\big) - \hat\sigma^{-2}\big(u+\theta_0-\hat f\big)'\Sigma(\hat\gamma)^{-1}\big(I-\Psi[\Psi'\Sigma(\hat\gamma)^{-1}\Psi]^{-1}\Psi'\Sigma(\hat\gamma)^{-1}\big)e$$
$$\quad + \hat\sigma^{-2}\big(\theta_0-\hat f\big)'\Sigma(\hat\gamma)^{-1}\Psi[\Psi'\Sigma(\hat\gamma)^{-1}\Psi]^{-1}\Psi'\Sigma(\hat\gamma)^{-1}u + \hat\sigma^{-2}\big(\theta_0-\hat f\big)'\Sigma(\hat\gamma)^{-1}\big(\theta_0-\hat f\big)$$
$$= \hat\sigma^{-2}u'\Sigma(\hat\gamma)^{-1}\Psi[\Psi'\Sigma(\hat\gamma)^{-1}\Psi]^{-1}\Psi'\Sigma(\hat\gamma)^{-1}u + \hat\sigma^{-2}(A_1+A_2+A_3+A_4),$$
say. First, for any vector $g$ comprising conditioned random variables,
$$E\big[\big(u'\Sigma(\gamma)^{-1}g\big)^2\big] = g'\Sigma(\gamma)^{-1}\Sigma\Sigma(\gamma)^{-1}g \le \sup_{\gamma\in\Gamma}\|\Sigma(\gamma)^{-1}\|^2\|\Sigma\|\|g\|^2 = O_p\big(\|g\|^2\big),$$
uniformly in $\gamma\in\Gamma$, where the expectation is taken conditional on $g$. Similarly,
$$E\Big[\big(u'\Sigma(\gamma)^{-1}\Psi\big(\Psi'\Sigma(\gamma)^{-1}\Psi\big)^{-1}\Psi'\Sigma(\gamma)^{-1}g\big)^2\Big] = g'\Sigma(\gamma)^{-1}\Psi\big(\Psi'\Sigma(\gamma)^{-1}\Psi\big)^{-1}\Psi'\Sigma(\gamma)^{-1}\Sigma\Sigma(\gamma)^{-1}\Psi\big(\Psi'\Sigma(\gamma)^{-1}\Psi\big)^{-1}\Psi'\Sigma(\gamma)^{-1}g$$
$$\le \sup_{\gamma\in\Gamma}\|\Sigma(\gamma)^{-1}\|^4\|\Sigma\|\bigg\|\frac{1}{n}\Psi\Big(\frac{1}{n}\Psi'\Sigma(\gamma)^{-1}\Psi\Big)^{-1}\Psi'\bigg\|^2\|g\|^2 = O_p\big(\|g\|^2\big),$$
uniformly, and, for any $j=1,\dots,d_\gamma$,
$$E\big[\big(u'\Sigma(\gamma)^{-1}\Sigma_j(\gamma)\Sigma(\gamma)^{-1}g\big)^2\big] = g'\Sigma(\gamma)^{-1}\Sigma_j(\gamma)\Sigma(\gamma)^{-1}\Sigma\Sigma(\gamma)^{-1}\Sigma_j(\gamma)\Sigma(\gamma)^{-1}g \le \sup_{\gamma\in\Gamma}\|\Sigma(\gamma)^{-1}\|^4\|\Sigma_j(\gamma)\|^2\|\Sigma\|\|g\|^2 = O_p\big(\|g\|^2\big).$$
Let $\Psi_k$ be the $k$-th column of $\Psi$, $k=1,\dots,p$. Then we have $\|\Psi_k/\sqrt{n}\| = O_p(1)$ and, for any $\gamma\in\Gamma$,
$$E\Big\|\frac{1}{\sqrt{n}}u'\Sigma(\gamma)^{-1}\Psi\Big\|^2 \le \sum_{k=1}^{p}E\Big(u'\Sigma(\gamma)^{-1}\frac{1}{\sqrt{n}}\Psi_k\Big)^2 = O_p(p), \qquad E\Big\|\frac{1}{\sqrt{n}}u'\Sigma(\gamma)^{-1}\Sigma_j(\gamma)\Sigma(\gamma)^{-1}\Psi\Big\|^2 \le \sum_{k=1}^{p}E\Big(u'\Sigma(\gamma)^{-1}\Sigma_j(\gamma)\Sigma(\gamma)^{-1}\frac{1}{\sqrt{n}}\Psi_k\Big)^2 = O_p(p).$$
Therefore, by Chebyshev's inequality,
$$\sup_{\gamma\in\Gamma}\Big\|\frac{1}{\sqrt{n}}u'\Sigma(\gamma)^{-1}\Psi\Big\| = O_p(\sqrt{p}) \quad\text{and}\quad \sup_{\gamma\in\Gamma}\Big\|\frac{1}{\sqrt{n}}u'\Sigma(\gamma)^{-1}\Sigma_j(\gamma)\Sigma(\gamma)^{-1}\Psi\Big\| = O_p(\sqrt{p}).$$
By the decomposition
$$u'\big(\Sigma(\hat\gamma)^{-1}\Psi[\Psi'\Sigma(\hat\gamma)^{-1}\Psi]^{-1}\Psi'\Sigma(\hat\gamma)^{-1} - \Sigma^{-1}\Psi[\Psi'\Sigma^{-1}\Psi]^{-1}\Psi'\Sigma^{-1}\big)u$$
$$= u'\big(\Sigma(\hat\gamma)^{-1}+\Sigma^{-1}\big)\Psi[\Psi'\Sigma(\hat\gamma)^{-1}\Psi]^{-1}\Psi'\Big(\sum_{i=1}^{n}e_{in}e_{in}'\Big)\big(\Sigma(\hat\gamma)^{-1}-\Sigma^{-1}\big)u + u'\Sigma^{-1}\Psi\big([\Psi'\Sigma(\hat\gamma)^{-1}\Psi]^{-1}-[\Psi'\Sigma^{-1}\Psi]^{-1}\big)\Psi'\Sigma^{-1}u,$$
where $e_{in}$ is an $n\times 1$ vector with $i$-th entry one and zeros elsewhere, so $\sum_{i=1}^{n}e_{in}e_{in}' = I_n$, and, by the mean value theorem,
$$e_{in}'\big(\Sigma(\hat\gamma)^{-1}-\Sigma^{-1}\big)u = e_{in}'\sum_{j=1}^{d_\gamma}\big(\Sigma(\bar\gamma)^{-1}\Sigma_j(\bar\gamma)\Sigma(\bar\gamma)^{-1}\big)u(\hat\gamma_j-\gamma_{j0}),$$
where $\bar\gamma$ is a value between $\hat\gamma$ and $\gamma_0$. We have
$$\big|u'\big(\Sigma(\hat\gamma)^{-1}\Psi[\Psi'\Sigma(\hat\gamma)^{-1}\Psi]^{-1}\Psi'\Sigma(\hat\gamma)^{-1} - \Sigma^{-1}\Psi[\Psi'\Sigma^{-1}\Psi]^{-1}\Psi'\Sigma^{-1}\big)u\big|$$
$$\le 2\sup_{\gamma\in\Gamma}\Big\|\frac{1}{\sqrt{n}}u'\Sigma(\gamma)^{-1}\Psi\Big\|\,\bigg\|\Big(\frac{1}{n}\Psi'\Sigma(\gamma)^{-1}\Psi\Big)^{-1}\bigg\|\sum_{j=1}^{d_\gamma}\Big\|\frac{1}{\sqrt{n}}\Psi'\big(\Sigma(\bar\gamma)^{-1}\Sigma_j(\bar\gamma)\Sigma(\bar\gamma)^{-1}\big)u\Big\|\,|\hat\gamma_j-\gamma_{j0}| + \Big\|\frac{1}{\sqrt{n}}u'\Sigma^{-1}\Psi\Big\|^2\,\bigg\|\Big(\frac{1}{n}\Psi'\Sigma(\hat\gamma)^{-1}\Psi\Big)^{-1}-\Big(\frac{1}{n}\Psi'\Sigma^{-1}\Psi\Big)^{-1}\bigg\|$$
$$= O_p(\sqrt{p})O_p\big(d_\gamma\sqrt{p}/\sqrt{n}\big) + O_p(p)O_p\big(\sqrt{d_\gamma}/\sqrt{n}\big) = O_p\big(d_\gamma p/\sqrt{n}\big) = o_p(\sqrt{p}),$$
where the last equality holds under the conditions of the theorem.

It remains to show that
$$A_i = o_p\big(p^{1/2}\big), \quad i=1,\dots,4. \tag{B.2}$$
It is convenient to perform the calculations under $H_\ell$, which covers $H_0$ as a particular case. Using the mean value theorem and either $H_0$ or $H_\ell$, we can express
$$\theta_{0i}-\hat f_i = f_{*i}-\hat f_i-\frac{p^{1/4}}{n^{1/2}}h_i = \sum_{j=1}^{d_\alpha}\frac{\partial f(x_i,\bar\alpha)}{\partial\alpha_j}(\alpha_{*j}-\hat\alpha_j) - \frac{p^{1/4}}{n^{1/2}}h_i, \tag{B.3}$$
where $\bar\alpha_j$ is a value between $\alpha_{*j}$ and $\hat\alpha_j$. Then, for any $j=1,\dots,d_\alpha$, $|\alpha_{*j}-\hat\alpha_j| = O_p(1/\sqrt{n})$. Based on
$$\sup_{\gamma\in\Gamma}\big|u'\Sigma(\gamma)^{-1}\Psi\big(\Psi'\Sigma(\gamma)^{-1}\Psi\big)^{-1}\Psi'\Sigma(\gamma)^{-1}g\big| = O_p(\|g\|) \quad\text{and}\quad \sup_{\gamma\in\Gamma}\big|u'\Sigma(\gamma)^{-1}g\big| = O_p(\|g\|)$$
for any conditioned vector $g$, if we take $g = \partial f(x,\bar\alpha)/\partial\alpha_j$ or $g = h$, then both satisfy $O_p(\|g\|) = O_p(\sqrt{n})$ and it follows that
$$|A_1| = \big|u'\Sigma(\hat\gamma)^{-1}\big(\theta_0-\hat f\big)\big| \le \sup_{\gamma,\alpha}\sum_{j=1}^{d_\alpha}\Big\|u'\Sigma(\gamma)^{-1}\frac{\partial f(x,\alpha)}{\partial\alpha_j}\Big\|\,|\alpha_{*j}-\hat\alpha_j| + \frac{p^{1/4}}{n^{1/2}}\sup_\gamma\big\|u'\Sigma(\gamma)^{-1}h\big\| = O_p(\sqrt{n})O_p\Big(\frac{1}{\sqrt{n}}\Big)+O\Big(\frac{p^{1/4}}{n^{1/2}}\Big)O_p(\sqrt{n}) = O_p\big(p^{1/4}\big) = o_p\big(p^{1/2}\big).$$
Similarly,
$$|A_3| = \big|u'\Sigma(\hat\gamma)^{-1}\Psi\big(\Psi'\Sigma(\hat\gamma)^{-1}\Psi\big)^{-1}\Psi'\Sigma(\hat\gamma)^{-1}\big(\theta_0-\hat f\big)\big| \le \sup_{\gamma,\alpha}\sum_{j=1}^{d_\alpha}\Big\|u'\Sigma(\gamma)^{-1}\Psi\big(\Psi'\Sigma(\gamma)^{-1}\Psi\big)^{-1}\Psi'\Sigma(\gamma)^{-1}\frac{\partial f(x,\alpha)}{\partial\alpha_j}\Big\|\,|\alpha_{*j}-\hat\alpha_j| + \frac{p^{1/4}}{n^{1/2}}\sup_\gamma\big\|u'\Sigma(\gamma)^{-1}\Psi\big(\Psi'\Sigma(\gamma)^{-1}\Psi\big)^{-1}\Psi'\Sigma(\gamma)^{-1}h\big\| = O_p(1)+O_p\big(p^{1/4}\big) = O_p\big(p^{1/4}\big) = o_p\big(p^{1/2}\big).$$
Also, by Assumptions R.2 and R.10, we have
$$\big\|\theta_0-\hat f\big\| \le \sup_\alpha\sum_{j=1}^{d_\alpha}\Big\|\frac{\partial f(x,\alpha)}{\partial\alpha_j}\Big\|\,|\alpha_{*j}-\hat\alpha_j| + \|h\|\frac{p^{1/4}}{n^{1/2}} = O_p\big(p^{1/4}\big). \tag{B.4}$$
By (3.2), we have $\|e\| = O\big(p^{-\mu}n^{1/2}\big)$ and
$$|A_2| = \big|(u+\theta_0-\hat f)'\big(\Sigma(\hat\gamma)^{-1}-\Sigma(\hat\gamma)^{-1}\Psi[\Psi'\Sigma(\hat\gamma)^{-1}\Psi]^{-1}\Psi'\Sigma(\hat\gamma)^{-1}\big)e\big| \le \sup_\gamma\big|u'\Sigma(\gamma)^{-1}e\big| + \sup_\gamma\big|u'\Sigma(\gamma)^{-1}\Psi[\Psi'\Sigma(\gamma)^{-1}\Psi]^{-1}\Psi'\Sigma(\gamma)^{-1}e\big| + \big\|\theta_0-\hat f\big\|\sup_\gamma\big(\|\Sigma(\gamma)^{-1}\|+\|\Sigma(\gamma)^{-1}\Psi[\Psi'\Sigma(\gamma)^{-1}\Psi]^{-1}\Psi'\Sigma(\gamma)^{-1}\|\big)\|e\|$$
$$= O_p\big(p^{-\mu}n^{1/2}\big)+O_p\big(p^{-\mu+1/4}n^{1/2}\big) = O_p\big(p^{-\mu+1/4}n^{1/2}\big) = o_p(\sqrt{p}),$$
where the last equality holds under the conditions of the theorem. Finally, under $H_\ell$,
$$A_4 = \big(\theta_0-\hat f\big)'\Sigma(\hat\gamma)^{-1}\big(\theta_0-\hat f\big) = \big(\theta_0-\hat f\big)'\Sigma^{-1}\big(\theta_0-\hat f\big)+\big(\theta_0-\hat f\big)'\big(\Sigma(\hat\gamma)^{-1}-\Sigma^{-1}\big)\big(\theta_0-\hat f\big) = \frac{p^{1/2}}{n}h'\Sigma^{-1}h+o_p(1)+O_p\big(p^{1/2}d_\gamma^{1/2}/n^{1/2}\big) = \frac{p^{1/2}}{n}h'\Sigma^{-1}h+o_p(\sqrt{p}).$$
Combining these together, we have
$$nm_n = \hat\sigma^{-2}\hat v'\Sigma(\hat\gamma)^{-1}\hat u = \frac{1}{\sigma_0^2}\varepsilon'V\varepsilon+\frac{p^{1/2}}{n}h'\Sigma^{-1}h+o_p(\sqrt{p})$$
under $H_\ell$, and the same expression holds with $h=0$ under $H_0$.
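The opening step (B.1) rests on the identity $A^{-1}-B^{-1} = A^{-1}(B-A)B^{-1}$, which yields $\|A^{-1}-B^{-1}\| \le \|A^{-1}\|\,\|A-B\|\,\|B^{-1}\|$ in spectral norm. A quick numerical check on random symmetric positive definite matrices (illustrative only; `A` and `B` stand in for $\Sigma(\hat\gamma)$ and $\Sigma$):

```python
# Verify ||A^{-1} - B^{-1}|| <= ||A^{-1}|| ||A - B|| ||B^{-1}|| (spectral norm),
# the perturbation bound used in (B.1), on random SPD matrix pairs.
import numpy as np

rng = np.random.default_rng(1)
spec = lambda M: np.linalg.norm(M, 2)  # spectral (operator 2-) norm

for _ in range(100):
    n = 6
    X = rng.standard_normal((n, n))
    A = X @ X.T + n * np.eye(n)          # SPD
    B = 0.5 * (A + A.T) + 0.1 * rng.standard_normal((n, n))
    B = 0.5 * (B + B.T) + n * np.eye(n)  # SPD perturbation of A
    lhs = spec(np.linalg.inv(A) - np.linalg.inv(B))
    rhs = spec(np.linalg.inv(A)) * spec(A - B) * spec(np.linalg.inv(B))
    assert lhs <= rhs + 1e-12
print("inverse-difference bound verified")
```

The bound holds for any pair of invertible matrices; positive definiteness here only mirrors the covariance setting of the proof.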
Proof of Theorem 4.4. (1) Follows from Theorems 4.2 and 4.3. (2) Following reasoning analogous to the proofs of Theorems 4.2 and 4.3, it can be shown that under $H_1$,
$$m_n = n^{-1}\sigma_*^{-2}(\theta_0-f_*)'\Sigma(\gamma_*)^{-1}(\theta_0-f_*) + o_p(1).$$
Then
$$T_n = \frac{nm_n-p}{\sqrt{2p}} = \frac{n}{\sqrt{p}}\cdot\frac{(\theta_0-f_*)'\Sigma(\gamma_*)^{-1}(\theta_0-f_*)}{\sqrt{2}\,n\sigma_*^2} + o_p\big(n/\sqrt{p}\big),$$
and for any nonstochastic sequence $\{C_n\}$ with $C_n = o(n/p^{1/2})$, $P(T_n > C_n)\to 1$, so that consistency follows. (3) Follows from Theorems 4.2 and 4.3.
C Proof of Theorem 5.1
Proof. We prove the result under $H_1$, which is the more challenging case as it involves nonparametric estimation. The proof under $H_0$ is similar. We will show $\hat\phi \overset{p}{\to} \phi_0$, whence $\hat\beta \overset{p}{\to} \beta_0$ and $\hat\sigma^2 \overset{p}{\to} \sigma_0^2$ follow from (5.3) and (5.4) respectively. First note that
$$\mathcal{L}(\phi)-\mathcal{L} = \log\frac{\tilde\sigma^2(\phi)}{\tilde\sigma^2} - n^{-1}\log\big|T'(\lambda)\Sigma(\gamma)^{-1}T(\lambda)\Sigma\big| = \log\frac{\tilde\sigma^2(\phi)}{\sigma^2(\phi)} - \log\frac{\tilde\sigma^2}{\sigma_0^2} + \log r(\phi), \tag{C.1}$$
where recall that $\sigma^2(\phi) = n^{-1}\sigma_0^2\mathrm{tr}\big(T'(\lambda)\Sigma(\gamma)^{-1}T(\lambda)\Sigma\big)$, $\tilde\sigma^2 = \tilde\sigma^2(\phi_0) = n^{-1}u'E'MEu$, using (5.4), and also $r(\phi) = n^{-1}\mathrm{tr}\big(T'(\lambda)\Sigma(\gamma)^{-1}T(\lambda)\Sigma\big)/\big|T'(\lambda)\Sigma(\gamma)^{-1}T(\lambda)\Sigma\big|^{1/n}$. We have
$$\tilde\sigma^2(\phi) = n^{-1}\big\{S^{-1}(\Psi\beta_0+u)\big\}'S'(\lambda)E(\gamma)'M(\gamma)E(\gamma)S(\lambda)S^{-1}(\Psi\beta_0+u) = c_1(\phi)+c_2(\phi)+c_3(\phi),$$
where
$$c_1(\phi) = n^{-1}\beta_0'\Psi'T'(\lambda)E(\gamma)'M(\gamma)E(\gamma)T(\lambda)\Psi\beta_0,$$
$$c_2(\phi) = n^{-1}\sigma_0^2\mathrm{tr}\big(T'(\lambda)E(\gamma)'M(\gamma)E(\gamma)T(\lambda)\Sigma\big),$$
$$c_3(\phi) = n^{-1}\mathrm{tr}\big(T'(\lambda)E(\gamma)'M(\gamma)E(\gamma)T(\lambda)(uu'-\sigma_0^2\Sigma)\big) + 2n^{-1}\beta_0'\Psi'T'(\lambda)E(\gamma)'M(\gamma)E(\gamma)T(\lambda)u.$$
Note that in the particular cases of Theorems 4.1 and 6.1, where $T(\lambda) = I_n$, the $c_1$ term vanishes because $M(\gamma)E(\gamma)\Psi = 0$ and $M(\tau)E(\tau)\Psi = 0$. Proceeding with the current, more general proof,
$$\log\frac{\tilde\sigma^2(\phi)}{\sigma^2(\phi)} = \log\frac{\tilde\sigma^2(\phi)}{c_1(\phi)+c_2(\phi)} + \log\frac{c_1(\phi)+c_2(\phi)}{\sigma^2(\phi)} = \log\bigg(1+\frac{c_3(\phi)}{c_1(\phi)+c_2(\phi)}\bigg) + \log\bigg(1+\frac{c_1(\phi)-f(\phi)}{\sigma^2(\phi)}\bigg),$$
where $f(\phi) = n^{-1}\sigma_0^2\mathrm{tr}\big(E'^{-1}T'(\lambda)E(\gamma)'(I_n-M(\gamma))E(\gamma)T(\lambda)E^{-1}\big)$. Then (C.1) implies
$$P\big(\hat\phi \in \bar N_\phi(\eta)\big) = P\Big(\inf_{\phi\in\bar N_\phi(\eta)}\mathcal{L}(\phi)-\mathcal{L} \le 0\Big) \le P\Bigg(\log\bigg(1+\sup_{\phi\in\bar N_\phi(\eta)}\bigg|\frac{c_3(\phi)}{c_1(\phi)+c_2(\phi)}\bigg|\bigg) + \big|\log\big(\tilde\sigma^2/\sigma_0^2\big)\big| \ge \inf_{\phi\in\bar N_\phi(\eta)}\bigg(\log\bigg(1+\frac{c_1(\phi)-f(\phi)}{\sigma^2(\phi)}\bigg)+\log r(\phi)\bigg)\Bigg),$$
where recall that $\bar N_\phi(\eta) = \Phi\backslash N_\phi(\eta)$, $N_\phi(\eta) = \{\phi : \|\phi-\phi_0\|<\eta\}\cap\Phi$. Because $\tilde\sigma^2/\sigma_0^2 \overset{p}{\to} 1$, the property $\log(1+x) = x+o(x)$ as $x\to 0$ implies that it is sufficient to show that
$$\sup_{\phi\in\bar N_\phi(\eta)}\bigg|\frac{c_3(\phi)}{c_1(\phi)+c_2(\phi)}\bigg| \overset{p}{\longrightarrow} 0, \tag{C.2}$$
$$\sup_{\phi\in\bar N_\phi(\eta)}\bigg|\frac{f(\phi)}{\sigma^2(\phi)}\bigg| \overset{p}{\longrightarrow} 0, \tag{C.3}$$
$$P\Bigg(\inf_{\phi\in\bar N_\phi(\eta)}\bigg\{\frac{c_1(\phi)}{\sigma^2(\phi)}+\log r(\phi)\bigg\} > 0\Bigg) \longrightarrow 1. \tag{C.4}$$
Because $\bar N_\phi(\eta) \subseteq \{\Lambda\times\bar N_\gamma(\eta/2)\}\cup\{\bar N_\lambda(\eta/2)\times\Gamma\}$, we have
$$P\Bigg(\inf_{\phi\in\bar N_\phi(\eta)}\bigg\{\frac{c_1(\phi)}{\sigma^2(\phi)}+\log r(\phi)\bigg\}>0\Bigg) \ge P\Bigg(\min\bigg\{\inf_{\Lambda\times\bar N_\gamma(\eta/2)}\frac{c_1(\phi)}{\sigma^2(\phi)},\ \inf_{\bar N_\lambda(\eta/2)}\log r(\phi)\bigg\}>0\Bigg) \ge P\Bigg(\min\bigg\{\inf_{\Lambda\times\bar N_\gamma(\eta/2)}\frac{c_1(\phi)}{C},\ \inf_{\bar N_\lambda(\eta/2)}\log r(\phi)\bigg\}>0\Bigg),$$
from Assumption SAR.2, whence Assumptions SAR.3 and SAR.4 imply (C.4). Again using Assumption SAR.2, uniformly in $\phi$, $|f(\phi)/\sigma^2(\phi)| = O_p(|f(\phi)|)$ and
$$|f(\phi)| = O_p\Big(\mathrm{tr}\big(E'^{-1}T'(\lambda)\Sigma(\gamma)^{-1}\Psi\big(\Psi'\Sigma(\gamma)^{-1}\Psi\big)^{-1}\Psi'\Sigma(\gamma)^{-1}T(\lambda)E^{-1}\big)/n\Big) = O_p\Big(\mathrm{tr}\big(E'^{-1}T'(\lambda)\Sigma(\gamma)^{-1}\Psi\Psi'\Sigma(\gamma)^{-1}T(\lambda)E^{-1}\big)/n^2\Big) = O_p\Big(\big\|\Psi'\Sigma(\gamma)^{-1}T(\lambda)E^{-1}/n\big\|_F^2\Big)$$
$$= O_p\Big(\|\Psi/n\|_F^2\,\varphi^2\big(\Sigma(\gamma)^{-1}\big)\|T(\lambda)\|^2\big\|E^{-1}\big\|^2\Big) = O_p\Big(\|\Psi/n\|_F^2\,\|T(\lambda)\|^2\,\varphi(\Sigma)/\varphi^2(\Sigma(\gamma))\Big) = O_p\big(\|T(\lambda)\|^2/n\big), \tag{C.5}$$
where we have twice made use of the inequality
$$\|AB\|_F \le \|A\|_F\|B\| \tag{C.6}$$
for generic multiplication-compatible matrices $A$ and $B$. (C.3) now follows by Assumption SAR.1 and compactness of $\Lambda$ because $T(\lambda) = I_n + \sum_{j=1}^{d_\lambda}(\lambda_{0j}-\lambda_j)G_j$. Finally consider (C.2). We first prove pointwise convergence. For any fixed $\phi\in\bar N_\phi(\eta)$ and large enough $n$, Assumptions SAR.2 and SAR.4 imply
$$\{c_1(\phi)\}^{-1} = O_p\big(\|\beta_0\|^{-2}\big) = O_p(1), \tag{C.7}$$
$$\{c_2(\phi)\}^{-1} = O_p(1), \tag{C.8}$$
because $\{n^{-1}\sigma_0^2\mathrm{tr}(T'(\lambda)\Sigma(\gamma)^{-1}T(\lambda)\Sigma)\}^{-1} = O_p(1)$ and, proceeding as in the bound for $|f(\phi)|$, $n^{-1}\mathrm{tr}\big(E'^{-1}T'(\lambda)E(\gamma)'(I-M(\gamma))E(\gamma)T(\lambda)E^{-1}\big) = O_p\big(\|T(\lambda)\|^2/n\big) = O_p(1/n)$. In fact it is worth noting, for the equicontinuity argument presented later, that Assumptions SAR.2 and SAR.4 actually imply that (C.7) and (C.8) hold uniformly over $\bar N_\phi(\eta)$, a property not needed for the present pointwise arguments. Thus $c_3(\phi)/(c_1(\phi)+c_2(\phi)) = O_p(|c_3(\phi)|)$ where, writing $B(\phi) = T'(\lambda)E(\gamma)'M(\gamma)E(\gamma)T(\lambda)$ with typical element $b_{rs}(\phi)$, $r,s=1,\dots,n$, $c_3(\phi)$ has mean 0 and variance
$$O_p\Bigg(\frac{\|B(\phi)\Sigma\|_F^2}{n^2} + \frac{\sum_{r,s,t,v=1}^{n}b_{rs}(\phi)b_{tv}(\phi)\kappa_{rstv}}{n^2} + \frac{\|\beta_0'\Psi'B(\phi)E^{-1}\|^2}{n^2}\Bigg), \tag{C.9}$$
with $\kappa_{rstv}$ denoting the fourth cumulant of $u_r,u_s,u_t,u_v$, $r,s,t,v=1,\dots,n$. Under the linear process assumed in Assumption R.4 it is known that
$$\sum_{r,s,t,v=1}^{n}\kappa_{rstv}^2 = O(n). \tag{C.10}$$
Using (C.6) and Assumptions SAR.1 and R.3, the first term in parentheses in (C.9) is
$$O_p\big(\|B(\phi)\|_F^2\,\varphi^2(\Sigma)/n^2\big) = O_p\big(\|T(\lambda)\|_F^2\|E(\gamma)\|^4\|M(\gamma)\|^2\|T(\lambda)\|^2/n^2\big) = O_p\big(\|T(\lambda)\|^4/n\varphi^2(\Sigma(\gamma))\big) = O_p\big(\|T(\lambda)\|^4/n\big), \tag{C.11}$$
while the second is similarly
$$O_p\big(\|B(\phi)\|_F^2/n\big)\Bigg(\sum_{r,s,t,v=1}^{n}\kappa_{rstv}^2/n^2\Bigg)^{1/2} = o_p\big(\|T(\lambda)\|^4\big), \tag{C.12}$$
using (C.10). Finally, the third term in parentheses in (C.9) is
$$O_p\big(\|B(\phi)\|^2/n\big) = O_p\big(\|T(\lambda)\|^4/n\big). \tag{C.13}$$
By compactness of $\Lambda$ and Assumption SAR.1, (C.11), (C.12) and (C.13) are negligible; thus pointwise convergence is established.

Uniform convergence will follow from an equicontinuity argument. First, for arbitrary $\varepsilon>0$ we can find points $\phi_* = (\lambda_*',\gamma_*')'$, possibly infinitely many, such that the neighbourhoods $\|\phi-\phi_*\|<\varepsilon$ form an open cover of $\bar N_\phi(\eta)$. Since $\Phi$ is compact, any open cover has a finite subcover and thus we may in fact choose finitely many $\phi_* = (\lambda_*',\gamma_*')'$, whence it suffices to prove
$$\sup_{\|\phi-\phi_*\|<\varepsilon}\bigg|\frac{c_3(\phi)}{c_1(\phi)+c_2(\phi)} - \frac{c_3(\phi_*)}{c_1(\phi_*)+c_2(\phi_*)}\bigg| \overset{p}{\longrightarrow} 0.$$
Proceeding as in Gupta and Robinson (2018), we denote the two components of $c_3(\phi)$ by $c_{31}(\phi)$, $c_{32}(\phi)$, and are left with establishing the negligibility of
$$\frac{|c_{31}(\phi)-c_{31}(\phi_*)|}{c_2(\phi)} + \frac{|c_{32}(\phi)-c_{32}(\phi_*)|}{c_1(\phi)} + \frac{|c_3(\phi_*)|}{c_1(\phi)c_1(\phi_*)}|c_1(\phi_*)-c_1(\phi)| + \frac{|c_3(\phi_*)|}{c_2(\phi)c_2(\phi_*)}|c_2(\phi_*)-c_2(\phi)|, \tag{C.14}$$
uniformly on $\|\phi-\phi_*\|<\varepsilon$. By the fact that (C.7) and (C.8) hold uniformly over $\Phi$, we first consider only the numerators in the first two terms in (C.14). As in the proof of Theorem 1 of Delgado and Robinson (2015), (C.6) implies that $E\big(\sup_{\|\phi-\phi_*\|<\varepsilon}|c_{31}(\phi)-c_{31}(\phi_*)|\big)$ is bounded by
$$n^{-1}\big(E\|u\|^2+\sigma_0^2\,\mathrm{tr}\,\Sigma\big)\sup_{\|\phi-\phi_*\|<\varepsilon}\|B(\phi)-B(\phi_*)\| = O_p\Big(\sup_{\|\phi-\phi_*\|<\varepsilon}\|B(\phi)-B(\phi_*)\|\Big),$$
because $E\|u\|^2 = O(n)$ and $\mathrm{tr}\,\Sigma = O(n)$. $B(\phi)-B(\phi_*)$ can be written as
$$(T(\lambda)-T(\lambda_*))'E(\gamma)'M(\gamma)E(\gamma)T(\lambda) + T(\lambda_*)'E(\gamma_*)'M(\gamma_*)E(\gamma_*)(T(\lambda)-T(\lambda_*)) + T'(\lambda_*)\big(E(\gamma)'M(\gamma)E(\gamma)-E(\gamma_*)'M(\gamma_*)E(\gamma_*)\big)T(\lambda), \tag{C.15}$$
which, by the triangle inequality, has spectral norm bounded by
$$\|T(\lambda)-T(\lambda_*)\|\big(\|E(\gamma)\|^2\|T(\lambda)\|+\|E(\gamma_*)\|^2\|T(\lambda_*)\|\big) + \|T(\lambda_*)\|\,\|E(\gamma)'M(\gamma)E(\gamma)-E(\gamma_*)'M(\gamma_*)E(\gamma_*)\|\,\|T(\lambda)\|$$
$$= O_p\big(\|T(\lambda)-T(\lambda_*)\| + \|E(\gamma)'M(\gamma)E(\gamma)-E(\gamma_*)'M(\gamma_*)E(\gamma_*)\|\big). \tag{C.16}$$
By Assumption SAR.1 the first term in parentheses on the right side of (C.16) is bounded uniformly on $\|\phi-\phi_*\|<\varepsilon$ by
$$\sum_{j=1}^{d_\lambda}|\lambda_j-\lambda_{*j}|\,\|G_j\| \le C\max_{j=1,\dots,d_\lambda}\|G_j\|\,\|\lambda-\lambda_*\| = O_p(\varepsilon), \tag{C.17}$$
while, because $E(\gamma)'M(\gamma)E(\gamma) = n^{-1}\Sigma(\gamma)^{-1}\Psi\big(n^{-1}\Psi'\Sigma(\gamma)^{-1}\Psi\big)^{-1}\Psi'\Sigma(\gamma)^{-1}$ for any $\gamma\in\Gamma$, the second one can be decomposed into terms with bounds typified by
$$n^{-1}\|\Sigma(\gamma)^{-1}-\Sigma(\gamma_*)^{-1}\|\,\|\Psi\|^2\,\big\|\big(n^{-1}\Psi'\Sigma(\gamma)^{-1}\Psi\big)^{-1}\big\|\,\|\Sigma(\gamma)^{-1}\|^2 \le n^{-1}\|\Sigma(\gamma)-\Sigma(\gamma_*)\|\,\|\Psi\|^2\,\big\|\big(n^{-1}\Psi'\Sigma(\gamma)^{-1}\Psi\big)^{-1}\big\|\,\|\Sigma(\gamma)^{-1}\|^3\|\Sigma(\gamma_*)^{-1}\| = O_p(\|\Sigma(\gamma)-\Sigma(\gamma_*)\|) = O_p(\varepsilon),$$
uniformly on $\|\phi-\phi_*\|<\varepsilon$, by Assumptions R.3 and R.8, Proposition 4.1 and the inequality $\|A\| \le \|A\|_F$ for a generic matrix $A$, so that
$$\sup_{\|\phi-\phi_*\|<\varepsilon}\|B(\phi)-B(\phi_*)\| = O_p(\varepsilon). \tag{C.18}$$
Thus equicontinuity of the first term in (C.14) follows because $\varepsilon$ is arbitrary. The equicontinuity of the second term in (C.14) follows in much the same way: indeed,
$$\sup_{\|\phi-\phi_*\|<\varepsilon}|c_{32}(\phi)-c_{32}(\phi_*)| = 2n^{-1}\sup_{\|\phi-\phi_*\|<\varepsilon}\big|\beta_0'\Psi'(B(\phi)-B(\phi_*))u\big| = O_p\Big(\sup_{\|\phi-\phi_*\|<\varepsilon}\|B(\phi)-B(\phi_*)\|\Big) = O_p(\varepsilon),$$
using earlier arguments and (C.18). Because $c_1(\phi)$ is bounded and bounded away from zero in probability (see (C.7)) for sufficiently large $n$ and all $\phi\in\bar N_\phi(\eta)$, the third term in (C.14) may be bounded by $\big(|c_3(\phi_*)|/c_1(\phi_*)\big)\big(1+c_1(\phi_*)/c_1(\phi)\big) \overset{p}{\to} 0$, convergence being uniform on $\|\phi-\phi_*\|<\varepsilon$ by pointwise convergence of $c_3(\phi)/(c_1(\phi)+c_2(\phi))$, cf. Gupta and Robinson (2018). The uniform convergence to zero of the fourth term in (C.14) follows in identical fashion, because $c_2(\phi)$ is bounded and bounded away from zero in probability (see (C.8)) for sufficiently large $n$ and all $\phi\in\bar N_\phi(\eta)$. This concludes the proof.
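The inequality (C.6), $\|AB\|_F \le \|A\|_F\|B\|$ with the spectral norm on the second factor, is used twice in bounding $|f(\phi)|$ and again in the equicontinuity step. A quick numerical check on random rectangular matrices (illustrative only):

```python
# Verify the Frobenius-spectral norm inequality (C.6): ||AB||_F <= ||A||_F ||B||_2
# for random rectangular matrix pairs of compatible dimensions.
import numpy as np

rng = np.random.default_rng(2)
for _ in range(200):
    m, k, n = rng.integers(2, 8, size=3)
    A = rng.standard_normal((m, k))
    B = rng.standard_normal((k, n))
    lhs = np.linalg.norm(A @ B, "fro")
    rhs = np.linalg.norm(A, "fro") * np.linalg.norm(B, 2)
    assert lhs <= rhs + 1e-10
print("(C.6) verified on random matrices")
```

The inequality follows from applying $\|Bx\| \le \|B\|\|x\|$ to each row of $A$, which is why the spectral norm must sit on the right-hand factor.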
D Lemmas
Lemma D.1. Under the conditions of Theorem 4.1, $c_1(\gamma) = n^{-1}\beta'\Psi'E'(\gamma)M(\gamma)E(\gamma)\Psi\beta + o_p(1)$.

Proof. First,
$$c_1(\gamma) = n^{-1}\beta'\Psi'E'(\gamma)M(\gamma)E(\gamma)\Psi\beta + c_{12}(\gamma) + c_{13}(\gamma),$$
with $c_{12}(\gamma) = 2n^{-1}e'E'(\gamma)M(\gamma)E(\gamma)\Psi\beta$ and $c_{13}(\gamma) = n^{-1}e'E'(\gamma)M(\gamma)E(\gamma)e$. It is readily seen that $c_{12}(\gamma)$ and $c_{13}(\gamma)$ are negligible.
Lemma D.2. Under the conditions of Theorem 4.2 or Theorem 5.2, $\|\hat\gamma-\gamma_0\| = O_p\big(\sqrt{d_\gamma/n}\big)$.

Proof. We give the details for the setting of Theorem 4.2 and omit those for the setting of Theorem 5.2. Write $l = \partial\mathcal{L}(\beta_0,\gamma_0)/\partial\gamma$. By Robinson (1988), we have $\|\hat\gamma-\gamma_0\| = O_p(\|l\|)$. Now $l = (l_1,\dots,l_{d_\gamma})'$, with $l_j = n^{-1}\mathrm{tr}(\Sigma^{-1}\Sigma_j) - n^{-1}\sigma_0^{-2}u'\Sigma^{-1}\Sigma_j\Sigma^{-1}u$. Next, $E\|l\|^2 = \sum_{j=1}^{d_\gamma}E(l_j^2)$ and
$$E(l_j^2) = \frac{1}{n^2\sigma_0^4}\mathrm{var}\big(u'\Sigma^{-1}\Sigma_j\Sigma^{-1}u\big) = \frac{1}{n^2\sigma_0^4}\mathrm{var}\big(\varepsilon'B'\Sigma^{-1}\Sigma_j\Sigma^{-1}B\varepsilon\big) = \frac{1}{n^2\sigma_0^4}\mathrm{var}\big(\varepsilon'D_j\varepsilon\big), \tag{D.1}$$
say. But, writing $d_{j,st}$ for a typical element of the infinite-dimensional matrix $D_j$, we have
$$\mathrm{var}\big(\varepsilon'D_j\varepsilon\big) = \big(\mu_4-3\sigma_0^4\big)\sum_{s=1}^{\infty}d_{j,ss}^2 + 2\sigma_0^4\,\mathrm{tr}\big(D_j^2\big) = \big(\mu_4-3\sigma_0^4\big)\sum_{s=1}^{\infty}d_{j,ss}^2 + 2\sigma_0^4\sum_{s,t=1}^{\infty}d_{j,st}^2. \tag{D.2}$$
Next, by Assumptions R.4, R.3 and R.9,
$$\sum_{s=1}^{\infty}d_{j,ss}^2 = \sum_{s=1}^{\infty}\big(b_s'\Sigma^{-1}\Sigma_j\Sigma^{-1}b_s\big)^2 \le \Big(\sum_{s=1}^{\infty}\|b_s\|^2\Big)\|\Sigma^{-1}\|^2\|\Sigma_j\| = O\Big(\sum_{i=1}^{n}\sum_{s=1}^{\infty}b_{is}^{*2}\Big) = O(n). \tag{D.3}$$
Similarly,
$$\sum_{s,t=1}^{\infty}d_{j,st}^2 = \sum_{s=1}^{\infty}b_s'\Sigma^{-1}\Sigma_j\Sigma^{-1}\Big(\sum_{t=1}^{\infty}b_tb_t'\Big)\Sigma^{-1}\Sigma_j\Sigma^{-1}b_s = \sum_{s=1}^{\infty}b_s'\Sigma^{-1}\Sigma_j\Sigma^{-1}\Sigma_j\Sigma^{-1}b_s = O(n). \tag{D.4}$$
Using (D.3) and (D.4) in (D.2) implies that $E(l_j^2) = O(n^{-1})$, by (D.1). Thus we have $E\|l\|^2 = O(d_\gamma/n)$, and hence $\|l\| = O_p\big(\sqrt{d_\gamma/n}\big)$ by Markov's inequality, proving the lemma.
Lemma D.3. Under the conditions of Theorem 4.3, E(σ−20 ε′V ε
)= p and
V ar(σ−20 ε′V ε
)/2p→ 1.
54
Proof. As $E\left(\sigma_0^{-2}\varepsilon'V\varepsilon\right) = \mathrm{tr}\left(E\left[B'\Sigma^{-1}\Psi\left(\Psi'\Sigma^{-1}\Psi\right)^{-1}\Psi'\Sigma^{-1}B\right]\right) = p$, and
$$\mathrm{Var}\left(\frac{1}{\sigma_0^2}\varepsilon'V\varepsilon\right) = \left(\frac{\mu_4}{\sigma_0^4} - 3\right)\sum_{s=1}^{\infty} E(v_{ss}^2) + E\left[\mathrm{tr}(VV') + \mathrm{tr}(V^2)\right] = \left(\frac{\mu_4}{\sigma_0^4} - 3\right)\sum_{s=1}^{\infty} v_{ss}^2 + 2p, \quad \text{(D.5)}$$
it suffices to show that
$$(2p)^{-1}\sum_{s=1}^{\infty} v_{ss}^2 \stackrel{p}{\to} 0. \quad \text{(D.6)}$$
Because $v_{ss} = b_s'Mb_s$, we have $v_{ss}^2 = \left(\sum_{i,j=1}^{n} b_{is}b_{js}m_{ij}\right)^2$. Thus, using Assumption R.4 and (A.5), we have
$$\sum_{s=1}^{\infty} v_{ss}^2 \leq \left(\sup_{i,j}|m_{ij}|\right)^2\sum_{s=1}^{\infty}\left(\sum_{i,j=1}^{n}|b_{is}^*||b_{js}^*|\right)^2 = O_p\left(p^2 n^{-2}\left(\sup_s\sum_{i=1}^{n}|b_{is}^*|\right)^3\sum_{i=1}^{n}\sum_{s=1}^{\infty}|b_{is}^*|\right) = O_p\left(p^2 n^{-1}\right), \quad \text{(D.7)}$$
establishing (D.6) because $p^2/n \to 0$.
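The mean computation in Lemma D.3 rests on the cyclic trace identity: with $\Sigma = BB'$, $\mathrm{tr}\left(B'\Sigma^{-1}\Psi(\Psi'\Sigma^{-1}\Psi)^{-1}\Psi'\Sigma^{-1}B\right) = \mathrm{tr}\left((\Psi'\Sigma^{-1}\Psi)^{-1}\Psi'\Sigma^{-1}\Psi\right) = p$. A finite-dimensional numerical sketch (all matrices below are made up; in the paper $B$ is infinite dimensional):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 4

# B: a square-root factor of Sigma; Psi: an n x p sieve regressor matrix.
B = rng.standard_normal((n, n))
Sigma = B @ B.T
Psi = rng.standard_normal((n, p))

Sinv = np.linalg.inv(Sigma)
# V = B' Sigma^{-1} Psi (Psi' Sigma^{-1} Psi)^{-1} Psi' Sigma^{-1} B.
M = Sinv @ Psi @ np.linalg.inv(Psi.T @ Sinv @ Psi) @ Psi.T @ Sinv
V = B.T @ M @ B

# tr(V) = p by cyclicity of the trace, since Sigma = BB'.
assert np.isclose(np.trace(V), p)
# V is symmetric and idempotent, so tr(VV') = tr(V^2) = p, giving the 2p in (D.5).
assert np.isclose(np.trace(V @ V), p)
```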
Lemma D.4. Under the conditions of Theorem 6.2, $\|\hat{\tau} - \tau_0\| = O_p(\sqrt{d_\tau/n})$.

Proof. The proof is similar to that of Lemma D.2 and is omitted.
Denote $H(\gamma) = I_n + \sum_{j=m_1+1}^{m_1+m_2}\gamma_j W_j$ and $K(\gamma) = I_n - \sum_{j=1}^{m_1}\gamma_j W_j$. Let $G_j(\gamma) = W_j K^{-1}(\gamma)$, $j = 1, \ldots, m_1$, $T_j(\gamma) = H^{-1}(\gamma)W_j$, $j = m_1+1, \ldots, m_1+m_2$ and, for a generic matrix $A$, denote $\overline{A} = A + A'$. Our final conditions may differ according to whether the $W_j$ are of general form or have 'single nonzero diagonal block structure', see e.g. Gupta and Robinson (2015). To define these, denote by $V$ an $n \times n$ block diagonal matrix with $i$-th block $V_i$, an $s_i \times s_i$ matrix, where $\sum_{i=1}^{m_1+m_2} s_i = n$, and for $i = 1, \ldots, m_1+m_2$ obtain $W_i$ from $V$ by replacing each $V_j$, $j \neq i$, by a matrix of zeros. Thus $V = \sum_{i=1}^{m_1+m_2} W_i$.
Lemma D.5. For the spatial error model with SARMA($m_1$, $m_2$) errors, if
$$\sup_{\gamma\in\Gamma^o}\left(\left\|K^{-1}(\gamma)\right\| + \left\|K'^{-1}(\gamma)\right\| + \left\|H^{-1}(\gamma)\right\| + \left\|H'^{-1}(\gamma)\right\|\right) + \max_{j=1,\ldots,m_1+m_2}\|W_j\| < C, \quad \text{(D.8)}$$
then
$$(D\Sigma(\gamma))(\gamma^\dagger) = A^{-1}(\gamma)\left(\sum_{j=1}^{m_1}\gamma_j^\dagger\,\overline{H^{-1}(\gamma)G_j(\gamma)H(\gamma)} + \sum_{j=m_1+1}^{m_1+m_2}\gamma_j^\dagger\,\overline{T_j(\gamma)}\right)A'^{-1}(\gamma).$$
Proof. We first show that $D\Sigma \in L(\Gamma^o, M_{n\times n})$. Clearly, $D\Sigma$ is a linear map, and (D.8) implies
$$\left\|(D\Sigma(\gamma))(\gamma^\dagger)\right\| \leq C\left\|\gamma^\dagger\right\|_1$$
in the general case and
$$\left\|(D\Sigma(\gamma))(\gamma^\dagger)\right\| \leq C\max_{j=1,\ldots,m_1+m_2}\left|\gamma_j^\dagger\right|$$
in the 'single nonzero diagonal block' case. Thus $D\Sigma$ is a bounded linear operator between two normed linear spaces, i.e. it is a continuous linear operator.
With $A(\gamma) = H^{-1}(\gamma)K(\gamma)$, we now show that
$$\frac{\left\|A^{-1}(\gamma+\gamma^\dagger)A'^{-1}(\gamma+\gamma^\dagger) - A^{-1}(\gamma)A'^{-1}(\gamma) - (D\Sigma(\gamma))(\gamma^\dagger)\right\|}{\|\gamma^\dagger\|_g} \to 0, \text{ as } \left\|\gamma^\dagger\right\|_g \to 0, \quad \text{(D.9)}$$
where $\|\cdot\|_g$ is either the 1-norm or the max norm on $\Gamma$. First, note that
$$A^{-1}(\gamma+\gamma^\dagger)A'^{-1}(\gamma+\gamma^\dagger) - A^{-1}(\gamma)A'^{-1}(\gamma)$$
$$= A^{-1}(\gamma+\gamma^\dagger)\left(A^{-1}(\gamma+\gamma^\dagger) - A^{-1}(\gamma)\right)' + \left(A^{-1}(\gamma+\gamma^\dagger) - A^{-1}(\gamma)\right)A'^{-1}(\gamma)$$
$$= -A^{-1}(\gamma+\gamma^\dagger)A'^{-1}(\gamma+\gamma^\dagger)\left(A(\gamma+\gamma^\dagger) - A(\gamma)\right)'A'^{-1}(\gamma) - A^{-1}(\gamma+\gamma^\dagger)\left(A(\gamma+\gamma^\dagger) - A(\gamma)\right)A^{-1}(\gamma)A'^{-1}(\gamma). \quad \text{(D.10)}$$
Next,
$$A(\gamma+\gamma^\dagger) - A(\gamma) = H^{-1}(\gamma+\gamma^\dagger)K(\gamma+\gamma^\dagger) - H^{-1}(\gamma)K(\gamma)$$
$$= H^{-1}(\gamma+\gamma^\dagger)\left(K(\gamma+\gamma^\dagger) - K(\gamma)\right) + H^{-1}(\gamma+\gamma^\dagger)\left(H(\gamma) - H(\gamma+\gamma^\dagger)\right)H^{-1}(\gamma)K(\gamma)$$
$$= -H^{-1}(\gamma+\gamma^\dagger)\left(\sum_{j=1}^{m_1}\gamma_j^\dagger W_j + \sum_{j=m_1+1}^{m_1+m_2}\gamma_j^\dagger W_j H^{-1}(\gamma)K(\gamma)\right). \quad \text{(D.11)}$$
Substituting (D.11) in (D.10) implies that
$$A^{-1}(\gamma+\gamma^\dagger)A'^{-1}(\gamma+\gamma^\dagger) - A^{-1}(\gamma)A'^{-1}(\gamma) = \Delta_1(\gamma, \gamma^\dagger) + \Delta_2(\gamma, \gamma^\dagger) = \Delta(\gamma, \gamma^\dagger), \quad \text{(D.12)}$$
say, where
$$\Delta_1(\gamma, \gamma^\dagger) = A^{-1}(\gamma+\gamma^\dagger)A'^{-1}(\gamma+\gamma^\dagger)\left(\sum_{j=1}^{m_1}\gamma_j^\dagger W_j' + K'(\gamma)H'^{-1}(\gamma)\sum_{j=m_1+1}^{m_1+m_2}\gamma_j^\dagger W_j'\right)H'^{-1}(\gamma+\gamma^\dagger)A'^{-1}(\gamma),$$
$$\Delta_2(\gamma, \gamma^\dagger) = A^{-1}(\gamma+\gamma^\dagger)H^{-1}(\gamma+\gamma^\dagger)\left(\sum_{j=1}^{m_1}\gamma_j^\dagger W_j + \sum_{j=m_1+1}^{m_1+m_2}\gamma_j^\dagger W_j H^{-1}(\gamma)K(\gamma)\right)A^{-1}(\gamma)A'^{-1}(\gamma).$$
From the definitions above and recalling that $A(\gamma) = H^{-1}(\gamma)K(\gamma)$, we can write
$$\Delta(\gamma, \gamma^\dagger) = A^{-1}(\gamma+\gamma^\dagger)\Upsilon(\gamma, \gamma^\dagger)A'^{-1}(\gamma), \quad \text{(D.13)}$$
with
$$\Upsilon(\gamma, \gamma^\dagger) = \sum_{j=1}^{m_1}\gamma_j^\dagger H'(\gamma+\gamma^\dagger)G_j'(\gamma+\gamma^\dagger)H'^{-1}(\gamma+\gamma^\dagger) + A'^{-1}(\gamma+\gamma^\dagger)A'(\gamma)\sum_{j=m_1+1}^{m_1+m_2}\gamma_j^\dagger T_j'(\gamma+\gamma^\dagger)$$
$$+ \sum_{j=1}^{m_1}\gamma_j^\dagger H^{-1}(\gamma+\gamma^\dagger)G_j(\gamma)H(\gamma) + \sum_{j=m_1+1}^{m_1+m_2}\gamma_j^\dagger T_j(\gamma+\gamma^\dagger).$$
Then (D.12) implies that
$$A^{-1}(\gamma+\gamma^\dagger)A'^{-1}(\gamma+\gamma^\dagger) - A^{-1}(\gamma)A'^{-1}(\gamma) - (D\Sigma(\gamma))(\gamma^\dagger)$$
$$= A^{-1}(\gamma+\gamma^\dagger)A'^{-1}(\gamma+\gamma^\dagger) - A^{-1}(\gamma)A'^{-1}(\gamma) - \Delta(\gamma, \gamma^\dagger) - (D\Sigma(\gamma))(\gamma^\dagger) + \Delta(\gamma, \gamma^\dagger)$$
$$= \Delta(\gamma, \gamma^\dagger) - (D\Sigma(\gamma))(\gamma^\dagger), \quad \text{(D.14)}$$
so to prove (D.9) it is sufficient to show that
$$\frac{\left\|\Delta(\gamma, \gamma^\dagger) - (D\Sigma(\gamma))(\gamma^\dagger)\right\|}{\|\gamma^\dagger\|_g} \to 0 \text{ as } \left\|\gamma^\dagger\right\|_g \to 0. \quad \text{(D.15)}$$
The matrix in the numerator of (D.15) can be written as $\sum_{i=1}^{7}\Pi_i(\gamma, \gamma^\dagger)A'^{-1}(\gamma)$ by adding, subtracting and grouping terms, where (omitting the argument $(\gamma, \gamma^\dagger)$)
$$\Pi_1 = A^{-1}(\gamma+\gamma^\dagger)\sum_{j=1}^{m_1}\gamma_j^\dagger G_j'(\gamma+\gamma^\dagger)H'^{-1}(\gamma)\left(H(\gamma) - H(\gamma+\gamma^\dagger)\right)'H'^{-1}(\gamma+\gamma^\dagger),$$
$$\Pi_2 = A^{-1}(\gamma+\gamma^\dagger)\sum_{j=1}^{m_1}\gamma_j^\dagger H^{-1}(\gamma+\gamma^\dagger)\left(H(\gamma) - H(\gamma+\gamma^\dagger)\right)H^{-1}(\gamma)G_j(\gamma),$$
$$\Pi_3 = A^{-1}(\gamma+\gamma^\dagger)\sum_{j=m_1+1}^{m_1+m_2}\gamma_j^\dagger\left(A^{-1}(\gamma+\gamma^\dagger) - A^{-1}(\gamma)\right)T_j'(\gamma+\gamma^\dagger),$$
$$\Pi_4 = \left(A^{-1}(\gamma+\gamma^\dagger) - A^{-1}(\gamma)\right)\sum_{j=m_1+1}^{m_1+m_2}\gamma_j^\dagger T_j(\gamma+\gamma^\dagger),$$
$$\Pi_5 = A^{-1}(\gamma)\sum_{j=m_1+1}^{m_1+m_2}\gamma_j^\dagger H^{-1}(\gamma+\gamma^\dagger)\left(H(\gamma) - H(\gamma+\gamma^\dagger)\right)H^{-1}(\gamma)W_j,$$
$$\Pi_6 = \Delta(\gamma, \gamma^\dagger)\sum_{j=1}^{m_1}\gamma_j^\dagger W_j'H'^{-1}(\gamma),$$
$$\Pi_7 = \left(A^{-1}(\gamma+\gamma^\dagger) - A^{-1}(\gamma)\right)\sum_{j=1}^{m_1}\gamma_j^\dagger H^{-1}(\gamma)G_j(\gamma).$$
By (D.8), (D.13) and replication of earlier techniques, we have
$$\max_{i=1,\ldots,7}\sup_{\gamma\in\Gamma^o}\left\|\Pi_i(\gamma, \gamma^\dagger)A'^{-1}(\gamma)\right\| \leq C\left\|\gamma^\dagger\right\|_g^2, \quad \text{(D.16)}$$
where the norm used on the RHS of (D.16) depends on whether we are considering the general case or the 'single nonzero diagonal block' case. Thus
$$\frac{\left\|\Delta(\gamma, \gamma^\dagger) - (D\Sigma(\gamma))(\gamma^\dagger)\right\|}{\|\gamma^\dagger\|_g} \leq C\left\|\gamma^\dagger\right\|_g \to 0 \text{ as } \left\|\gamma^\dagger\right\|_g \to 0,$$
proving (D.15) and thus (D.9).
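The derivative formula of Lemma D.5 can be sanity-checked against a central finite difference of $\Sigma(\gamma) = A^{-1}(\gamma)A'^{-1}(\gamma)$. A minimal sketch for an illustrative SARMA(1, 1) configuration ($m_1 = m_2 = 1$; the weight matrices and parameter values below are made up):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
I = np.eye(n)
# Made-up weight matrices, scaled so K and H stay invertible.
W1 = 0.2 * rng.standard_normal((n, n))
W2 = 0.2 * rng.standard_normal((n, n))

def Sigma(g1, g2):
    K = I - g1 * W1                # K(gamma) = I_n - gamma_1 W_1
    H = I + g2 * W2                # H(gamma) = I_n + gamma_2 W_2
    Ainv = np.linalg.inv(np.linalg.inv(H) @ K)   # A^{-1}, with A = H^{-1} K
    return Ainv @ Ainv.T           # Sigma(gamma) = A^{-1} A'^{-1}

g1, g2 = 0.3, 0.2
K, H = I - g1 * W1, I + g2 * W2
Kinv, Hinv = np.linalg.inv(K), np.linalg.inv(H)
Ainv = Kinv @ H                    # A^{-1}(gamma)
G1, T2 = W1 @ Kinv, Hinv @ W2      # G_1(gamma), T_2(gamma)

d = np.array([0.7, -0.4])          # direction gamma^dagger
M1 = Hinv @ G1 @ H                 # H^{-1}(gamma) G_1(gamma) H(gamma)
# (D Sigma(gamma))(gamma^dagger) from Lemma D.5, with bar(A) = A + A'.
deriv = Ainv @ (d[0] * (M1 + M1.T) + d[1] * (T2 + T2.T)) @ Ainv.T

# Central finite difference of Sigma along the direction d.
h = 1e-5
fd = (Sigma(g1 + h * d[0], g2 + h * d[1])
      - Sigma(g1 - h * d[0], g2 - h * d[1])) / (2 * h)

assert np.allclose(fd, deriv, atol=1e-6)
```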
Corollary D.1. For the spatial error model with SAR($m_1$) errors,
$$(D\Sigma(\gamma))(\gamma^\dagger) = K^{-1}(\gamma)\sum_{j=1}^{m_1}\gamma_j^\dagger\,\overline{G_j(\gamma)}\,K'^{-1}(\gamma).$$

Proof. Taking $m_2 = 0$ in Lemma D.5, the elements involving sums from $m_1+1$ to $m_1+m_2$ do not arise and $H(\gamma) = I_n$, proving the claim.
Corollary D.2. For the spatial error model with SMA($m_2$) errors,
$$(D\Sigma(\gamma))(\gamma^\dagger) = H(\gamma)\sum_{j=1}^{m_2}\gamma_j^\dagger\,\overline{T_j(\gamma)}\,H'(\gamma).$$

Proof. Taking $m_1 = 0$ in Lemma D.5, the elements involving sums from 1 to $m_1$ do not arise and $K(\gamma) = I_n$, proving the claim.
Lemma D.6. For the spatial error model with MESS($m_1$) errors, if
$$\max_{j=1,\ldots,m_1}\left(\|W_j\| + \left\|W_j'\right\|\right) < 1, \quad \text{(D.17)}$$
then
$$(D\Sigma(\gamma))(\gamma^\dagger) = \exp\left(\sum_{j=1}^{m_1}\gamma_j(W_j + W_j')\right)\sum_{j=1}^{m_1}\gamma_j^\dagger(W_j + W_j').$$
Proof. Clearly $D\Sigma \in L(\Gamma^o, M_{n\times n})$. Next,
$$\left\|A^{-1}(\gamma+\gamma^\dagger)A'^{-1}(\gamma+\gamma^\dagger) - A^{-1}(\gamma)A'^{-1}(\gamma) - (D\Sigma(\gamma))(\gamma^\dagger)\right\|$$
$$= \left\|\exp\left(\sum_{j=1}^{m_1}(\gamma_j + \gamma_j^\dagger)(W_j + W_j')\right) - \exp\left(\sum_{j=1}^{m_1}\gamma_j(W_j + W_j')\right) - (D\Sigma(\gamma))(\gamma^\dagger)\right\|$$
$$= \left\|\exp\left(\sum_{j=1}^{m_1}\gamma_j(W_j + W_j')\right)\left(\exp\left(\sum_{j=1}^{m_1}\gamma_j^\dagger(W_j + W_j')\right) - I_n - \sum_{j=1}^{m_1}\gamma_j^\dagger(W_j + W_j')\right)\right\|$$
$$\leq \left\|\exp\left(\sum_{j=1}^{m_1}\gamma_j(W_j + W_j')\right)\right\|\left\|\exp\left(\sum_{j=1}^{m_1}\gamma_j^\dagger(W_j + W_j')\right) - I_n - \sum_{j=1}^{m_1}\gamma_j^\dagger(W_j + W_j')\right\|$$
$$\leq C\left\|I_n + \sum_{j=1}^{m_1}\gamma_j^\dagger(W_j + W_j') + \sum_{k=2}^{\infty}\left\{\sum_{j=1}^{m_1}\gamma_j^\dagger(W_j + W_j')\right\}^k - I_n - \sum_{j=1}^{m_1}\gamma_j^\dagger(W_j + W_j')\right\|$$
$$\leq C\left\|\sum_{k=2}^{\infty}\left\{\sum_{j=1}^{m_1}\gamma_j^\dagger(W_j + W_j')\right\}^k\right\| \leq C\sum_{k=2}^{\infty}\left(\sum_{j=1}^{m_1}\left|\gamma_j^\dagger\right|\left\|W_j + W_j'\right\|\right)^k \leq C\sum_{k=2}^{\infty}\left\|\gamma^\dagger\right\|_g^k, \quad \text{(D.18)}$$
by (D.17), without loss of generality, and again the norm used in (D.18) depending on whether we are in the general or the 'single nonzero diagonal block' case. Thus
$$\frac{\left\|A^{-1}(\gamma+\gamma^\dagger)A'^{-1}(\gamma+\gamma^\dagger) - A^{-1}(\gamma)A'^{-1}(\gamma) - (D\Sigma(\gamma))(\gamma^\dagger)\right\|}{\|\gamma^\dagger\|_g} \leq C\sum_{k=2}^{\infty}\left\|\gamma^\dagger\right\|_g^{k-1} \to 0,$$
as $\|\gamma^\dagger\|_g \to 0$, proving the claim.
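In the MESS case the exponential factorizes exactly because $\sum_j \gamma_j(W_j + W_j')$ and $\sum_j \gamma_j^\dagger(W_j + W_j')$ commute when $m_1 = 1$. A numerical sketch of the resulting derivative formula for a single made-up $W$, computing the matrix exponential of the symmetric matrix $W + W'$ via its eigendecomposition:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
W = 0.2 * rng.standard_normal((n, n))   # made-up weight matrix
S = W + W.T                              # symmetric, so expm via eigh

def expm_sym(t, M):
    # Matrix exponential of t*M for symmetric M via eigendecomposition.
    lam, V = np.linalg.eigh(t * M)
    return (V * np.exp(lam)) @ V.T

g, d = 0.4, 0.9                          # gamma and direction gamma^dagger

# Formula of Lemma D.6 with m_1 = 1: exp(gamma(W+W')) gamma^dagger (W+W').
deriv = expm_sym(g, S) @ (d * S)

# Central finite difference of gamma -> exp(gamma(W+W')) along d.
h = 1e-5
fd = (expm_sym(g + h * d, S) - expm_sym(g - h * d, S)) / (2 * h)

assert np.allclose(fd, deriv, atol=1e-6)
```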
Theorem D.1. Under the conditions of Theorem 4.4 or 5.3, $T_n - T_n^a = o_p(1)$ as $n \to \infty$.
Proof. It suffices to show that $n\tilde{m}_n = n\hat{m}_n + o_p(\sqrt{p})$. As $\eta = y - \hat{\theta}$, $u = y - f$, and $v = \hat{\theta} - f$, we have $u = \eta + v$ and
$$n\tilde{m}_n = \hat{\sigma}^{-2}\left(u'\Sigma(\hat{\gamma})^{-1}u - \eta'\Sigma(\hat{\gamma})^{-1}\eta\right) = \hat{\sigma}^{-2}\left(2u'\Sigma(\hat{\gamma})^{-1}v - v'\Sigma(\hat{\gamma})^{-1}v\right)$$
$$= 2n\hat{m}_n - \hat{\sigma}^{-2}\left[\Psi\left(\Psi'\Sigma(\hat{\gamma})^{-1}\Psi\right)^{-1}\Psi'\Sigma(\hat{\gamma})^{-1}(u + e) - e + \theta_0 - f\right]'\Sigma(\hat{\gamma})^{-1}\left[\Psi\left(\Psi'\Sigma(\hat{\gamma})^{-1}\Psi\right)^{-1}\Psi'\Sigma(\hat{\gamma})^{-1}(u + e) - e + \theta_0 - f\right]$$
$$= 2n\hat{m}_n - \hat{\sigma}^{-2}u'\Sigma(\hat{\gamma})^{-1}\Psi\left(\Psi'\Sigma(\hat{\gamma})^{-1}\Psi\right)^{-1}\Psi'\Sigma(\hat{\gamma})^{-1}u - \hat{\sigma}^{-2}(\theta_0 - f)'\Sigma(\hat{\gamma})^{-1}(\theta_0 - f)$$
$$+ \hat{\sigma}^{-2}\left(2(\theta_0 - f) - e\right)'\Sigma(\hat{\gamma})^{-1}\left(I - \Psi\left[\Psi'\Sigma(\hat{\gamma})^{-1}\Psi\right]^{-1}\Psi'\Sigma(\hat{\gamma})^{-1}\right)e - 2\hat{\sigma}^{-2}(\theta_0 - f)'\Sigma(\hat{\gamma})^{-1}\Psi\left(\Psi'\Sigma(\hat{\gamma})^{-1}\Psi\right)^{-1}\Psi'\Sigma(\hat{\gamma})^{-1}u$$
$$= 2n\hat{m}_n - \left(n\hat{m}_n - \hat{\sigma}^{-2}(A_1 + A_2 + A_3 + A_4)\right) - \hat{\sigma}^{-2}A_4 + \hat{\sigma}^{-2}\left(2(\theta_0 - f) - e\right)'\Sigma(\hat{\gamma})^{-1}\left(I - \Psi\left[\Psi'\Sigma(\hat{\gamma})^{-1}\Psi\right]^{-1}\Psi'\Sigma(\hat{\gamma})^{-1}\right)e - 2\hat{\sigma}^{-2}A_3$$
$$= n\hat{m}_n + \hat{\sigma}^{-2}(A_1 + A_2 - A_3) + \hat{\sigma}^{-2}\left(2(\theta_0 - f) - e\right)'\Sigma(\hat{\gamma})^{-1}\left(I - \Psi\left(\Psi'\Sigma(\hat{\gamma})^{-1}\Psi\right)^{-1}\Psi'\Sigma(\hat{\gamma})^{-1}\right)e. \quad \text{(D.19)}$$
In the proof of Theorem 4.2, we have shown that
$$\left|(\theta_0 - f)'\Sigma(\hat{\gamma})^{-1}\left(I - \Psi\left[\Psi'\Sigma(\hat{\gamma})^{-1}\Psi\right]^{-1}\Psi'\Sigma(\hat{\gamma})^{-1}\right)e\right| = o_p(\sqrt{p})$$
in the process of proving $|A_2| = o_p(\sqrt{p})$. Along with
$$\left|e'\Sigma(\hat{\gamma})^{-1}\left(I - \Psi\left(\Psi'\Sigma(\hat{\gamma})^{-1}\Psi\right)^{-1}\Psi'\Sigma(\hat{\gamma})^{-1}\right)e\right| \leq \left|e'\Sigma(\hat{\gamma})^{-1}e\right| + \left|e'\Sigma(\hat{\gamma})^{-1}\Psi\left(\Psi'\Sigma(\hat{\gamma})^{-1}\Psi\right)^{-1}\Psi'\Sigma(\hat{\gamma})^{-1}e\right|$$
$$\leq \|e\|^2\sup_{\gamma\in\Gamma}\left\|\Sigma(\gamma)^{-1}\right\| + \|e\|^2\sup_{\gamma\in\Gamma}\left\|\Sigma(\gamma)^{-1}\right\|^2\left\|\frac{1}{n}\Psi\left(\frac{1}{n}\Psi'\Sigma(\gamma)^{-1}\Psi\right)^{-1}\Psi'\right\| = O_p\left(\|e\|^2\right) = O_p\left(p^{-2\mu}n\right) = o_p(\sqrt{p}),$$
we complete the proof that $n\tilde{m}_n = n\hat{m}_n + o_p(\sqrt{p})$. In the SAR setting of Section 5,
$$n\tilde{m}_n = \hat{\sigma}^{-2}\left(u'\Sigma(\hat{\gamma})^{-1}u - \eta'\Sigma(\hat{\gamma})^{-1}\eta\right) = \hat{\sigma}^{-2}\left(2u'\Sigma(\hat{\gamma})^{-1}v - v'\Sigma(\hat{\gamma})^{-1}v\right)$$
$$= 2n\hat{m}_n - \hat{\sigma}^{-2}\left[\Psi\left(\Psi'\Sigma(\hat{\gamma})^{-1}\Psi\right)^{-1}\Psi'\Sigma(\hat{\gamma})^{-1}\left(u + e + \sum_{j=1}^{d_\lambda}(\lambda_{j0} - \hat{\lambda}_j)W_j y\right) - e + \theta_0 - f\right]'$$
$$\times\ \Sigma(\hat{\gamma})^{-1}\left[\Psi\left(\Psi'\Sigma(\hat{\gamma})^{-1}\Psi\right)^{-1}\Psi'\Sigma(\hat{\gamma})^{-1}\left(u + e + \sum_{j=1}^{d_\lambda}(\lambda_{j0} - \hat{\lambda}_j)W_j y\right) - e + \theta_0 - f\right].$$
Compared to the expression in (D.19), we have the additional terms
$$-\hat{\sigma}^{-2}\left(\sum_{j=1}^{d_\lambda}(\lambda_{j0} - \hat{\lambda}_j)W_j y\right)'\Sigma(\hat{\gamma})^{-1}\Psi\left(\Psi'\Sigma(\hat{\gamma})^{-1}\Psi\right)^{-1}\Psi'\Sigma(\hat{\gamma})^{-1}\sum_{j=1}^{d_\lambda}(\lambda_{j0} - \hat{\lambda}_j)W_j y$$
and
$$-2\hat{\sigma}^{-2}\left(\sum_{j=1}^{d_\lambda}(\lambda_{j0} - \hat{\lambda}_j)W_j y\right)'\Sigma(\hat{\gamma})^{-1}\Psi\left(\Psi'\Sigma(\hat{\gamma})^{-1}\Psi\right)^{-1}\Psi'\Sigma(\hat{\gamma})^{-1}(u + \theta_0 - f).$$
Both terms are $o_p(\sqrt{p})$ from the orders of $A_5$ and $A_6$ in the proof of Theorem 5.2. Hence, in the SAR setting, $n\tilde{m}_n = n\hat{m}_n + o_p(\sqrt{p})$ also holds.
We now present similar calculations that justify the validity of our bootstrap test for the SARARMA($m_1$, $m_2$, $m_3$) model. The bootstrap test statistic is constructed as
$$nm_n^* = v^{*\prime}\Sigma(\hat{\gamma}^*)^{-1}u^* = \left(\hat{\theta}_n^* - f(x, \hat{\alpha}_n^*)\right)'\Sigma(\hat{\gamma}^*)^{-1}\left(\left(I_n - \sum_{k=1}^{m_1}\hat{\lambda}_k^* W_{1k}\right)y^* - f(x, \hat{\alpha}_n^*)\right).$$
Let $J_n = I_n - \frac{1}{n}l_n l_n'$. As $y = S(\lambda)^{-1}(\theta(x) + R(\gamma)\xi)$, we have
$$\hat{\xi} = J_n\hat{\xi}$$
$$= J_n\left[\left(\sum_{l=1}^{m_3}\gamma_{3l}W_{3l} + I_n\right)^{-1} + \left(\sum_{l=1}^{m_3}\gamma_{3l}W_{3l} + I_n\right)^{-1}\sum_{l=1}^{m_3}(\gamma_{3l} - \hat{\gamma}_{3l})W_{3l}\left(\sum_{l=1}^{m_3}\hat{\gamma}_{3l}W_{3l} + I_n\right)^{-1}\right]$$
$$\times\left(I_n - \sum_{l=1}^{m_2}\gamma_{2l}W_{2l} + \sum_{l=1}^{m_2}(\gamma_{2l} - \hat{\gamma}_{2l})W_{2l}\right)\left(S(\lambda)y - \theta(x) + \sum_{k=1}^{m_1}(\lambda_k - \hat{\lambda}_k)W_{1k}y + \theta(x) - \psi'\hat{\beta}\right)$$
$$= \xi - \frac{1}{n}l_n l_n'\xi + J_n\left(\sum_{l=1}^{m_3}\gamma_{3l}W_{3l} + I_n\right)^{-1}\left(I_n - \sum_{l=1}^{m_2}\gamma_{2l}W_{2l}\right)\left(\sum_{k=1}^{m_1}(\lambda_k - \hat{\lambda}_k)W_{1k}y + \theta(x) - \psi'\hat{\beta}\right)$$
$$+ J_n\left(\sum_{l=1}^{m_3}\gamma_{3l}W_{3l} + I_n\right)^{-1}\sum_{l=1}^{m_2}(\gamma_{2l} - \hat{\gamma}_{2l})W_{2l}\left(S(\lambda)y - \theta(x) + \sum_{k=1}^{m_1}(\lambda_k - \hat{\lambda}_k)W_{1k}y + \theta(x) - \psi'\hat{\beta}\right)$$
$$+ J_n\left(\sum_{l=1}^{m_3}\gamma_{3l}W_{3l} + I_n\right)^{-1}\sum_{l=1}^{m_3}(\gamma_{3l} - \hat{\gamma}_{3l})W_{3l}\left(\sum_{l=1}^{m_3}\hat{\gamma}_{3l}W_{3l} + I_n\right)^{-1}$$
$$\times\left(I_n - \sum_{l=1}^{m_2}\gamma_{2l}W_{2l} + \sum_{l=1}^{m_2}(\gamma_{2l} - \hat{\gamma}_{2l})W_{2l}\right)\left(S(\lambda)y - \theta(x) + \sum_{k=1}^{m_1}(\lambda_k - \hat{\lambda}_k)W_{1k}y + \theta(x) - \psi'\hat{\beta}\right),$$
which can be written as
$$\hat{\xi} = \xi + \sum_{j=1}^{r}\zeta_{1n,j}p_{nj} + \sum_{j=1}^{s}\zeta_{2n,j}Q_{nj}\xi,$$
where $p_{nj}$ is an $n$-dimensional vector with bounded elements, $Q_{nj}$ is an $n \times n$ matrix with bounded row and column sum norms, and the $\zeta_{1n,j}$ and $\zeta_{2n,j}$ are equal to $l_n'\xi/n$, elements of $\lambda_k - \hat{\lambda}_k$, $\gamma_{2l} - \hat{\gamma}_{2l}$, $\theta(x) - \psi'\hat{\beta}$ or their products. This differs from the proof of Lemma 2 in Jin and Lee (2015) in the term $\theta(x) - \psi'\hat{\beta}$ and the potentially increasing order of $d_\gamma$. Then $\zeta_{1n,j} = O_p(\sqrt{p^{1/2}/n} \vee \sqrt{d_\gamma/n})$ and $\zeta_{2n,j} = O_p(\sqrt{p^{1/2}/n} \vee \sqrt{d_\gamma/n})$, instead of $O_p(\sqrt{1/n})$ as in Jin and Lee (2015). Based on this result, the assumptions in Theorem 4 of Su and Qu (2017) hold, so the validity of our bootstrap test follows directly.
References
Ambrosetti, A. and G. Prodi (1995). A Primer of Nonlinear Analysis. Cambridge University
Press.
Anatolyev, S. (2012). Inference in regression models with many regressors. Journal of
Econometrics 170, 368–382.
Autant-Bernard, C. and J. P. LeSage (2011). Quantifying knowledge spillovers using spatial
autoregressive models. Journal of Regional Science 51, 471–496.
Bloom, N., M. Schankerman, and J. van Reenen (2013). Identifying technology spillovers and product market rivalry. Econometrica 81, 1347–1393.
Case, A. C. (1991). Spatial patterns in household demand. Econometrica 59, 953–965.
Chen, X. (2007). Large sample sieve estimation of semi-nonparametric models. In J. J. Heckman and E. E. Leamer (Eds.), Handbook of Econometrics, Volume 6B, Chapter 76, pp. 5549–5632. North Holland.
Chen, X., H. Hong, and E. Tamer (2005). Measurement error models with auxiliary data.
Review of Economic Studies 72, 343–366.
Cliff, A. D. and J. K. Ord (1973). Spatial Autocorrelation. London: Pion.
Conley, T. G. and B. Dupor (2003). A spatial analysis of sectoral complementarity. Journal
of Political Economy 111, 311–352.
Cooke, R. G. (1950). Infinite Matrices & Sequence Spaces. Macmillan and Company, London.
De Jong, R. M. and H. J. Bierens (1994). On the limit behavior of a chi-square type test if
the number of conditional moments tested approaches infinity. Econometric Theory 10,
70–90.
De Oliveira, V., B. Kedem, and D. A. Short (1997). Bayesian prediction of transformed
Gaussian random fields. Journal of the American Statistical Association 92, 1422–1433.
Debarsy, N., F. Jin, and L. F. Lee (2015). Large sample properties of the matrix exponential
spatial specification with an application to FDI. Journal of Econometrics 188, 1–21.
Delgado, M. and P. M. Robinson (2015). Non-nested testing of spatial correlation. Journal
of Econometrics 187, 385–401.
Ertur, C. and W. Koch (2007). Growth, technological interdependence and spatial external-
ities: theory and evidence. Journal of Applied Econometrics 22, 1033–1062.
Evans, P. and J. U. Kim (2014). The spatial dynamics of growth and convergence in Korean
regional incomes. Applied Economics Letters 21, 1139–1143.
Fuentes, M. (2007). Approximate likelihood for large irregularly spaced spatial data. Journal
of the American Statistical Association 102, 321–331.
Gao, J. and V. Anh (2000). A central limit theorem for a random quadratic form of strictly
stationary processes. Statistics and Probability Letters 49, 69–79.
Gneiting, T. (2002). Nonseparable, stationary covariance functions for space-time data.
Journal of the American Statistical Association 97, 590–600.
Gradshteyn, I. S. and I. M. Ryzhik (1994). Table of Integrals, Series and Products (5th ed.).
Academic Press, London.
Gupta, A. (2018a). Autoregressive spatial spectral estimates. Journal of Econometrics 203,
80–95.
Gupta, A. (2018b). Nonparametric specification testing via the trinity of tests. Journal of
Econometrics 203, 169–185.
Gupta, A., S. Kokas, and A. Michaelides (2021). Credit market spillovers in a financial
network: Evidence from spatial autoregressive model. Working paper.
Gupta, A. and P. M. Robinson (2015). Inference on higher-order spatial autoregressive
models with increasingly many parameters. Journal of Econometrics 186, 19–31.
Gupta, A. and P. M. Robinson (2018). Pseudo maximum likelihood estimation of spatial
autoregressive models with increasing dimension. Journal of Econometrics 202, 92–107.
Hahn, J., G. Kuersteiner, and M. Mazzocco (2021). Joint time-series and cross-section limit
theory under mixingale assumptions. Econometric Theory firstview, 1–17.
Han, X., L. F. Lee, and X. Xu (2021). Large sample properties of Bayesian estimation of spatial econometric models. Econometric Theory 37, 708–746.
Hannan, E. J. (1970). Multiple Time Series. John Wiley & Sons.
Heston, A., R. Summers, and B. Aten (2002). Penn World Tables Version 6.1. Downloadable dataset, Center for International Comparisons at the University of Pennsylvania.
Hidalgo, J. and M. Schafgans (2017). Inference and testing breaks in large dynamic panels
with strong cross sectional dependence. Journal of Econometrics 196, 259–274.
Hillier, G. and F. Martellosio (2018a). Exact and higher-order properties of the MLE in
spatial autoregressive models, with applications to inference. Journal of Econometrics 205,
402–422.
Hillier, G. and F. Martellosio (2018b). Exact likelihood inference in group interaction network
models. Econometric Theory 34, 383–415.
Ho, C.-Y., W. Wang, and J. Yu (2013). Growth spillover through trade: A spatial dynamic
panel data approach. Economics Letters 120, 450–453.
Hong, Y. and H. White (1995). Consistent specification testing via nonparametric series
regression. Econometrica 63, 1133–1159.
Huber, P. J. (1973). Robust regression: Asymptotics, conjectures and Monte Carlo. The
Annals of Statistics 1, 799–821.
Jenish, N. and I. R. Prucha (2009). Central limit theorems and uniform laws of large numbers
for arrays of random fields. Journal of Econometrics 150, 86–98.
Jenish, N. and I. R. Prucha (2012). On spatial processes and asymptotic inference under near-epoch dependence. Journal of Econometrics 170, 178–190.
Jin, F. and L. F. Lee (2015). On the bootstrap for Moran's I test for spatial dependence. Journal of Econometrics 184, 295–314.
Kelejian, H. H. and I. R. Prucha (1998). A generalized spatial two-stage least squares
procedure for estimating a spatial autoregressive model with autoregressive disturbances.
Journal of Real Estate Finance and Economics 17, 99–121.
Kelejian, H. H. and I. R. Prucha (2001). On the asymptotic distribution of the Moran I test
statistic with applications. Journal of Econometrics 104, 219–257.
Koenker, R. and J. A. F. Machado (1999). GMM inference when the number of moment
conditions is large. Journal of Econometrics 93, 327–344.
König, M. D., D. Rohner, M. Thoenig, and F. Zilibotti (2017). Networks in conflict: Theory and evidence from the Great War of Africa. Econometrica 85, 1093–1132.
Kuersteiner, G. M. and I. R. Prucha (2020). Dynamic spatial panel models: Networks,
common shocks, and sequential exogeneity. Econometrica 88, 2109–2146.
Lee, J., P. C. B. Phillips, and F. Rossi (2020). Consistent misspecification testing in spatial autoregressive models. Cowles Foundation Discussion Paper no. 2256.
Lee, J. and P. M. Robinson (2016). Series estimation under cross-sectional dependence.
Journal of Econometrics 190, 1–17.
Lee, L. F. (2004). Asymptotic distributions of quasi-maximum likelihood estimators for
spatial autoregressive models. Econometrica 72, 1899–1925.
Lee, L. F. and X. Liu (2010). Efficient GMM estimation of high order spatial autoregressive
models with autoregressive disturbances. Econometric Theory 26, 187–230.
LeSage, J. P. and R. Pace (2007). A matrix exponential spatial specification. Journal of
Econometrics 140, 190–214.
Malikov, E. and Y. Sun (2017). Semiparametric estimation and testing of smooth coefficient
spatial autoregressive models. Journal of Econometrics 199, 12–34.
Matérn, B. (1986). Spatial Variation. Almänna Förlaget, Stockholm.
Mohnen, M. (2021). Stars and brokers: Peer effects among medical scientists. Management
Science (forthcoming), 20 pp.
Newey, W. K. (1997). Convergence rates and asymptotic normality for series estimators.
Journal of Econometrics 79, 147–168.
Oettl, A. (2012). Reconceptualizing stars: scientist helpfulness and peer performance. Man-
agement Science 58, 1122–1140.
Pinkse, J. (1999). Asymptotic properties of Moran and related tests and testing for spatial
correlation in probit models. Mimeo: Department of Economics, University of British
Columbia and University College London.
Pinkse, J., M. E. Slade, and C. Brett (2002). Spatial price competition: A semiparametric
approach. Econometrica 70, 1111–1153.
Portnoy, S. (1984). Asymptotic behavior of M-estimators of p regression parameters when
p2/n is large. I. Consistency. The Annals of Statistics 12, 1298–1309.
Portnoy, S. (1985). Asymptotic behavior of M-estimators of p regression parameters when
p2/n is large; II. Normal approximation. The Annals of Statistics 13, 1403–1417.
Robinson, P. M. (1972). Non-linear regression for multiple time-series. Journal of Applied
Probability 9, 758–768.
Robinson, P. M. (1988). The stochastic difference between econometric statistics. Econo-
metrica 56, 531–548.
Robinson, P. M. (2011). Asymptotic theory for nonparametric regression with spatial data.
Journal of Econometrics 165, 5–19.
Robinson, P. M. and F. Rossi (2015). Refined tests for spatial correlation. Econometric
Theory 31, 1249–1280.
Robinson, P. M. and S. Thawornkaiwong (2012). Statistical inference on regression with
spatial dependence. Journal of Econometrics 167, 521–542.
Scott, D. J. (1973). Central limit theorems for martingales and for processes with stationary
increments using a Skorokhod representation approach. Advances in Applied Probability 5,
119–137.
Stein, M. (1999). Interpolation of Spatial Data. Springer-Verlag, New York.
Stinchcombe, M. B. and H. White (1998). Consistent specification testing with nuisance
parameters present only under the alternative. Econometric Theory 14, 295–325.
Su, L. and S. Jin (2010). Profile quasi-maximum likelihood estimation of partially linear
spatial autoregressive models. Journal of Econometrics 157, 18–33.
Su, L. and X. Qu (2017). Specification test for spatial autoregressive models. Journal of
Business & Economic Statistics 35, 572–584.
Sun, Y. (2016). Functional-coefficient spatial autoregressive models with nonparametric
spatial weights. Journal of Econometrics 195, 134–153.
Sun, Y. (2020). The LLN and CLT for U-statistics under cross-sectional dependence. Journal
of Nonparametric Statistics 32, 201–224.