arXiv:2101.10255v2 [econ.EM] 31 Aug 2021
Consistent specification testing under spatial
dependence∗
Abhimanyu Gupta†‡ Xi Qu§¶
September 1, 2021
Abstract
We propose a series-based nonparametric specification test for a regression function
when data are spatially dependent, the ‘space’ being of a general economic or social
nature. Dependence can be parametric, parametric with increasing dimension, semi-
parametric or any combination thereof, thus covering a vast variety of settings. These
include spatial error models of varying types and levels of complexity. Under a new
smooth spatial dependence condition, our test statistic is asymptotically standard nor-
mal. To prove the latter property, we establish a central limit theorem for quadratic
forms in linear processes in an increasing dimension setting. Finite sample perfor-
mance is investigated in a simulation study, with a bootstrap method also justified and
illustrated, and empirical examples illustrate the test with real-world data.
Keywords: Specification testing, nonparametric regression, spatial dependence, cross-
sectional dependence
JEL Classification: C21, C55
∗We thank the co-editor and three anonymous referees for insightful comments that improved the paper. We are grateful to Swati Chandna, Miguel Delgado, Emmanuel Guerre, Fernando Lopez Hernandez, Hon Ho Kwok, Arthur Lewbel, Daisuke Murakami, Ryo Okui and Amol Sasane for helpful comments. We also thank seminar participants at YEAP 2018 (Shanghai University of Finance and Economics), NYU Shanghai, Carlos III Madrid, SEW 2018 (Dijon), Aarhus University, SEA 2018 (Vienna), EcoSta 2018 (Hong Kong), Hong Kong University, AFES 2018 (Cotonou), ESEM 2018 (Cologne), CFE 2018 (Pisa), University of York, Pennsylvania State University, Michigan State University, University of Michigan, Texas A&M University, 1st Southampton Workshop on Econometrics and Statistics and MEG 2019 (Columbus).
†Department of Economics, University of Essex, Wivenhoe Park, Colchester CO4 3SQ, UK. E-mail: [email protected].
‡Research supported by ESRC grant ES/R006032/1.
§Antai College of Economics and Management, Shanghai Jiao Tong University, Shanghai, China, 200052. E-mail: [email protected].
¶Research supported by the National Natural Science Foundation of China, Project nos. 71973097, 71601115 and 72031006.
1 Introduction
Models for spatial dependence have recently become the subject of vigorous research. This
burgeoning interest has roots in the needs of practitioners who frequently have access to data
sets featuring inter-connected cross-sectional units. Motivated by these practical concerns,
we propose a specification test for a regression function in a general setup that covers a vast
variety of commonly employed spatial dependence models and permits the complexity of
dependence to increase with sample size. Our test is consistent, in the sense that a parametric
specification is tested with asymptotically unit power against a nonparametric alternative.
The ‘spatial’ models that we study are not restricted to be geographic in nature; indeed, ‘space’ can be a very general economic or social space. Our empirical examples, for instance, feature conflict alliances and technology externalities as forms of ‘spatial dependence’.
Specification testing is an important problem, and this is reflected in a huge literature
studying consistent tests. Much of this is based on independent, and often also identically
distributed, data. However data frequently exhibit dependence and consequently a branch
of the literature has also examined specification tests under time series dependence. Our
interest centres on dependence across a ‘space’, which differs quite fundamentally from de-
pendence in a time series context. Time series are naturally ordered and locations of the
observations can be observed, or at least the process generating these locations may be
modelled. It can be imagined that concepts from time series dependence be extended to
settings where the data are observed on a geographic space and dependence can be treated
as decreasing function of distance between observations. Indeed much work has been done to
extend notions of time series dependence in this type of setting, see e.g. Jenish and Prucha
(2009, 2012).
However, in a huge variety of economics and social science applications agents influence
each other in ways that do not conform to such a setting. For example, farmers affect the de-
mand of farmers in the same village but not in different villages, as in Case (1991). Likewise,
price competition among firms exhibits spatial features (Pinkse, Slade, and Brett (2002)),
input-output relations lead to complementarities between sectors (Conley and Dupor (2003)),
co-author connections form among scientists (Oettl (2012), Mohnen (2021)), R&D spillovers
occur through technology and product market spaces (Bloom, Schankerman, and van Reenen
(2013)), networks form due to allegiances in conflicts (Konig, Rohner, Thoenig, and Zilibotti
(2017)) and overlapping bank portfolios lead to correlated lending decisions (Gupta, Kokas, and Michaelides
(2021)). Such examples cannot be studied by simply extending results developed for time
series and illustrate the growing need for suitable methods.
A very popular model for general spatial dependence is the spatial autoregressive (SAR)
class, due to Cliff and Ord (1973). The key feature of SAR models, and various gener-
alizations such as SARMA (SAR moving average) and matrix exponential spatial speci-
fications (MESS, due to LeSage and Pace (2007)), is the presence of one or more spatial
weight matrices whose elements characterize the links between agents. As noted above,
these links may form for a variety of reasons, so the ‘spatial’ terminology represents a very
general notion of space, such as social or economic space. Key papers on the estimation of
SAR models and their variants include Kelejian and Prucha (1998) and Lee (2004), but re-
search on various aspects of these is active and ongoing, see e.g. Robinson and Rossi (2015);
Hillier and Martellosio (2018a,b); Kuersteiner and Prucha (2020); Han, Lee, and Xu (2021);
Hahn, Kuersteiner, and Mazzocco (2021).
Unlike work focusing on independent or time series data, a general drawback of spatially oriented research has been the lack of unified theory. Typically, individual papers have studied specific special cases of various spatial specifications. A strand of the
literature has introduced the notion of a cross-sectional linear-process to help address this
problem, and we follow this approach. This representation can accommodate SAR mod-
els in the error term (so called spatial error models (SEM)) as a special case, as well as
variants like SARMA and MESS, whence its generality is apparent. The linear-process
structure shares some similarities with that familiar from the time series literature (see e.g.
Hannan (1970)). Indeed, time series versions may be regarded as very special cases but,
as stressed before, the features of spatial dependence must be taken into account in the
general formulation. Such a representation was introduced by Robinson (2011) and further
examined in other situations by Robinson and Thawornkaiwong (2012) (partially linear re-
gression), Delgado and Robinson (2015) (non-nested correlation testing), Lee and Robinson
(2016) (series estimation of nonparametric regression) and Hidalgo and Schafgans (2017)
(cross-sectionally dependent panels).
In this paper, we propose a test statistic similar to that of Hong and White (1995), based
on estimating the nonparametric specification via series approximations. Assuming an inde-
pendent and identically distributed sample, their statistic is based on the sample covariance
between the residual from the parametric model and the discrepancy between the paramet-
ric and nonparametric fitted values. Allowing additionally for spatial dependence through
the form of a linear process as discussed above, our statistic is shown to be asymptotically
standard normal, consistent and possessing nontrivial power against local alternatives of a
certain type. To prove asymptotic normality, we present a new central limit theorem (CLT)
for quadratic forms in linear processes in an increasing dimension setting that may be of
independent interest. A CLT for quadratic forms under time series dependence in the context of series estimation can be found in Gao and Anh (2000), and our result can be viewed
as complementary to this. The setting of Su and Qu (2017) is a very special case of our
framework. There has been recent interest in specification testing for spatial models, see for
example Sun (2020) for a kernel-based model specification test and Lee, Phillips, and Rossi
(2020) for a consistent omnibus test. We contribute to this literature by studying a linear
process based increasing parameter dimension framework.
Our linear process framework permits spatial dependence to be parametric, parametric
with increasing dimension, semiparametric or any combination thereof, thus covering a vast
variety of settings. A class of models of great empirical interest is that of ‘higher-order’ SAR models in the outcome variables, but with spatial dependence structure also in the errors.
We initially present the familiar nonparametric regression to clarify the exposition, and then
cover this class as the main model of interest. Our theory covers as special cases SAR, SMA,
SARMA, MESS models for the error term. These specifications may be of any fixed spatial
order, but our theory also covers the case where they are of increasing order.
Thus we permit a more complex model of spatial dependence as more data become avail-
able, which encourages a more flexible approach to modelling such dependence as stressed
by Gupta and Robinson (2015, 2018) in a higher-order SAR context, Huber (1973), Portnoy
(1984, 1985) and Anatolyev (2012) in a regression context and Koenker and Machado (1999)
for the generalized method of moments setting, amongst others. This literature focuses on
a sequence of true models, rather than a sequence of models approximating an infinite true
model. Our paper also takes the same approach. On the other hand, in the spatial setting,
Gupta (2018a) considers increasing lag models as approximations to an infinite lag model
with lattice data and also suggests criteria for choice of lag length.
Our framework is also extended to the situation where spatial dependence occurs through
nonparametric functions of raw distances (these may be exogenous economic or social dis-
tances, say), as in Pinkse et al. (2002). This allows for greater flexibility in modelling spatial
weights as the practitioner only has to choose an exogenous economic distance measure and
allow the data to determine the functional form. It also adds a degree of robustness to the
theory by avoiding potential parametric misspecification. The case of geographical data is
also covered, for example the important classes of Matern and Wendland (see e.g. Gneiting
(2002)) covariance functions. Finally, we introduce a new notion of smooth spatial depen-
dence that provides more primitive, and checkable, conditions for certain properties than
extant ones in the literature.
To illustrate the performance of the test in finite samples, we present Monte Carlo simu-
lations that exhibit satisfactory small sample properties. The test is demonstrated in three
empirical examples, including two based on recently published work on social networks:
Bloom et al. (2013) (R&D spillovers in innovation), Konig et al. (2017) (conflict alliances
during the Congolese civil war). Another example studies cross-country spillovers in economic growth. Our test rejects the null hypothesis of a linear regression in some of these examples but not in others, illustrating its ability to distinguish between the null and alternative models.
The next section introduces our basic setup using a nonparametric regression with no
SAR structure in responses. We treat this abstraction as a base case, and Section 3 discusses
estimation and defines the test statistic, while Section 4 introduces assumptions and the
key asymptotic results of the paper. Section 5 examines the most commonly employed
higher-order SAR models, while Section 6 deals with nonparametric spatial error structures.
Nonparametric specification tests are often criticized for poor finite sample performance
when using the asymptotic critical values. In Section 7 we present a bootstrap version
of our testing procedure. Sections 8 and 9 contain a study of finite sample performance
and the empirical examples respectively. Proofs are contained in appendices, including a
supplementary online appendix.
For the convenience of the reader, we collect some frequently used notation here. First, introduce three notational conventions for any parameter ν for the rest of the paper: ν ∈ R^{d_ν}, ν_0 denotes the true value of ν and, for any scalar, vector or matrix valued function f(ν), we denote f ≡ f(ν_0). Let ϕ̄(·) (respectively ϕ(·)) denote the largest (respectively smallest) eigenvalue of a generic square nonnegative definite matrix argument. For a generic matrix A, denote ‖A‖ = [ϕ̄(A′A)]^{1/2}, i.e. the spectral norm of A, which reduces to the Euclidean norm if A is a vector. ‖A‖_R denotes the maximum absolute row sum norm of a generic matrix A, while ‖A‖_F = [tr(AA′)]^{1/2} denotes the Frobenius norm. Throughout the paper | · | is absolute value when applied to a scalar and determinant when applied to a matrix. Denote by c (C) generic positive constants, independent of any quantities that tend to infinity, and arbitrarily small (big).
2 Setup
To illustrate our approach, we first consider the nonparametric regression
yi = θ0 (xi) + ui, i = 1, . . . , n, (2.1)
where θ0(·) is an unknown function and xi is a vector of strictly exogenous explanatory
variables with support X ⊂ Rk. Spatial dependence is explicitly modeled via the error term
ui, which we assume is generated by:
u_i = ∑_{s=1}^∞ b_{is} ε_s, (2.2)
where εs are independent random variables, with zero mean and identical variance σ20. Fur-
ther conditions on the εs will be assumed later. The linear process coefficients bis can depend
on n, as may the covariates xi. This is generally the case with spatial models and implies
that asymptotic theory ought to be developed for triangular arrays. There are a number of
reasons to permit dependence on sample size. The bis can depend on spatial weight matrices,
which are usually normalized for both stability and identification purposes.
Such normalizations, e.g. row-standardization or division by spectral norm, may be n-
dependent. Furthermore, xi often includes underlying covariates of ‘neighbours’ defined by
spatial weight matrices. For instance, for some n × 1 covariate vector z and exogenous
spatial weight matrix W ≡ Wn, a component of xi can be e′iWz, where ei has unity in the
i-th position and zeroes elsewhere, which depends on n. Thus, subsequently, any spatial
weight matrices will also be allowed to depend on n. Finally, treating triangular arrays
permits re-labelling of quantities that is often required when dealing with spatial data, due
to the lack of natural ordering, see e.g. Robinson (2011). We suppress explicit reference to
this n-dependence of various quantities for brevity, although mention will be made of this at
times to remind the reader of this feature.
Now, assume the existence of a dγ × 1 vector γ0 such that bis = bis(γ0), possibly with
dγ → ∞ as n → ∞, for all i = 1, . . . , n and s ≥ 1. Let u be the n × 1 vector with typical
element ui, ε be the infinite dimensional vector with typical element εs, and B be an infinite
dimensional matrix (Cooke, 1950) with typical element bis. In matrix form,
u = Bε and E(uu′) = σ_0^2 BB′ = σ_0^2 Σ ≡ σ_0^2 Σ(γ_0). (2.3)
We assume that γ0 ∈ Γ, where Γ is a compact subset of Rdγ . With dγ diverging, ensuring
Γ has bounded volume requires some care, see Gupta and Robinson (2018). For a known
function f(·), our aim is to test
H_0 : P[θ_0(x_i) = f(x_i, α_0)] = 1, for some α_0 ∈ A ⊂ R^{d_α}, (2.4)
against the global alternative H_1 : P[θ_0(x_i) ≠ f(x_i, α)] > 0, for all α ∈ A.
We now nest commonly used models for spatial dependence in (2.3). Introduce a set
of n × n spatial weight (equivalently network adjacency) matrices W_j, j = 1, . . . , m_1 + m_2. Each W_j can be thought of as representing dependence through a particular space.
Now, consider models of the form Σ(γ) = A^{−1}(γ)A′^{−1}(γ). For example, with ξ denoting a vector of iid disturbances with variance σ_0^2, the model with SARMA(m_1, m_2) errors is
u = ∑_{j=1}^{m_1} γ_j W_j u + ∑_{j=m_1+1}^{m_1+m_2} γ_j W_j ξ + ξ, with A(γ) = (I_n + ∑_{j=m_1+1}^{m_1+m_2} γ_j W_j)^{−1}(I_n − ∑_{j=1}^{m_1} γ_j W_j),
assuming conditions that guarantee the existence of the inverse. Such conditions can be found in the literature, see e.g. Lee and Liu (2010) and Gupta and Robinson (2018). The SEM model is obtained by setting m_2 = 0, while the model with SMA errors has m_1 = 0. The model with MESS(m) errors (LeSage and Pace (2007), Debarsy, Jin, and Lee (2015)) is
u = exp(∑_{j=1}^m γ_j W_j) ξ, with A(γ) = exp(−∑_{j=1}^m γ_j W_j).
In some cases the space under consideration is geographic, i.e. the data may be observed at irregular points in Euclidean space. Making the identification u_i ≡ U(t_i), t_i ∈ R^d for some d > 1, and assuming covariance stationarity, U(t) is said to follow an isotropic model if, for some function δ on R, the covariance at lag s is r(s) = E[U(t)U(t + s)] = δ(‖s‖). An important class of parametric isotropic models is that of Matern (1986), which can be parameterized in several ways, see e.g. Stein (1999). Denoting by Γ_f the Gamma function and by K_{γ_1} the modified Bessel function of the second kind (Gradshteyn and Ryzhik (1994)), take
δ(‖s‖, γ) = (2^{γ_1−1} Γ_f(γ_1))^{−1} (γ_2^{−1} √(2γ_1) ‖s‖)^{γ_1} K_{γ_1}(γ_2^{−1} √(2γ_1) ‖s‖),
with γ_1, γ_2 > 0 and d_γ = 2. With d_γ = 3, another model takes δ(‖s‖, γ) = γ_1 exp(−‖s/γ_2‖^{γ_3}), see e.g. De Oliveira, Kedem, and Short (1997), Stein (1999). Fuentes (2007) considers this model with γ_3 = 1, as well as a specific parameterization of the Matern covariance function.
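The Matern family is straightforward to evaluate numerically. The sketch below is our own illustration (parameter values invented), using scipy's modified Bessel function K_ν; a standard sanity check is that γ_1 = 1/2 reduces the Matern covariance to the exponential model exp(−‖s‖/γ_2).

```python
import numpy as np
from scipy.special import gamma as Gamma_f, kv  # kv = modified Bessel K of 2nd kind

def matern(s, g1, g2):
    """Matern covariance delta(||s||, gamma) with gamma = (g1, g2), g1, g2 > 0."""
    s = np.asarray(s, dtype=float)
    x = np.sqrt(2.0 * g1) * s / g2
    out = np.ones_like(x)  # limit as ||s|| -> 0 is 1
    nz = x > 0
    out[nz] = (2.0 ** (1.0 - g1) / Gamma_f(g1)) * x[nz] ** g1 * kv(g1, x[nz])
    return out

s = np.array([0.0, 0.5, 1.0, 2.0])
half = matern(s, 0.5, 1.3)  # gamma_1 = 1/2: should equal exp(-s/1.3)
```

Note (2^{γ_1−1}Γ_f(γ_1))^{−1} = 2^{1−γ_1}/Γ_f(γ_1), which is the constant used in the code.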
3 Test statistic
We estimate θ0(·) via a series approximation. Certain technical conditions are needed to
allow for X to have unbounded support. To this end, for a function g(x) on X , define a
weighted sup-norm (see e.g. Chen, Hong, and Tamer (2005), Chen (2007), Lee and Robinson
(2016)) by ‖g‖_w = sup_{x∈X} |g(x)|(1 + ‖x‖^2)^{−w/2}, for some w > 0. Assume that there exists a sequence of functions ψ_i := ψ(x_i) : R^k ↦ R^p, where p → ∞ as n → ∞, and a p × 1 vector
of coefficients β0 such that
θ0 (xi) = ψ′iβ0 + e (xi) , (3.1)
where e(·) satisfies:
Assumption R.1. There exists a constant µ > 0 such that ‖e‖_{w_x} = O(p^{−µ}) as p → ∞, where w_x ≥ 0 is the largest value such that sup_{i=1,...,n} E‖x_i‖^{w_x} < ∞, for all n.
By Lemma 1 in Appendix B of Lee and Robinson (2016), this assumption implies that
sup_{i=1,...,n} E(e^2(x_i)) = O(p^{−2µ}). (3.2)
Due to the large number of assumptions in the paper, sometimes with changes reflecting
only the various setups we consider, we prefix assumptions with R in this section and the
next, to signify ‘regression’. In Section 5 the prefix is SAR, for ‘spatial autoregression’, while
in Section 6 we use NPN, for ‘nonparametric network’.
Let y = (y_1, . . . , y_n)′, θ_0 = (θ_0(x_1), . . . , θ_0(x_n))′ and Ψ = (ψ_1, . . . , ψ_n)′. We will estimate
γ0 using a quasi maximum likelihood estimator (QMLE) based on a Gaussian likelihood,
although Gaussianity is nowhere assumed. For any admissible values β, σ2 and γ, the
(multiplied by 2/n) negative quasi log likelihood function based on using the approximation
(3.1) is
L(β, σ^2, γ) = ln(2πσ^2) + (1/n) ln |Σ(γ)| + (1/(nσ^2)) (y − Ψβ)′Σ(γ)^{−1}(y − Ψβ), (3.3)
which is minimised with respect to β and σ2 by
β(γ) = (Ψ′Σ(γ)^{−1}Ψ)^{−1} Ψ′Σ(γ)^{−1} y, (3.4)
σ^2(γ) = n^{−1} y′E(γ)′M(γ)E(γ)y, (3.5)
where M(γ) = I_n − E(γ)Ψ(Ψ′Σ(γ)^{−1}Ψ)^{−1}Ψ′E(γ)′ and E(γ) is the n × n matrix such that
E(γ)E(γ)′ = Σ(γ)−1. The use of the approximate likelihood relies on the negligibility of
e(·), which in turn permits the replacement of θ0(·) by ψ′β0 with asymptotically negligible
cost. Thus the concentrated likelihood function is
L(γ) = ln(2π) + ln σ^2(γ) + (1/n) ln |Σ(γ)|. (3.6)
We define the QMLE of γ_0 as γ̂ = arg min_{γ∈Γ} L(γ) and the QMLEs of β_0 and σ_0^2 as β̂ = β(γ̂) and σ̂^2 = σ^2(γ̂). At a given x_1, . . . , x_n, the series estimate of θ_0 is defined as
θ̂ = (θ̂(x_1), . . . , θ̂(x_n))′ = (ψ(x_1)′β̂, . . . , ψ(x_n)′β̂)′. (3.7)
Let α̂_n ≡ α̂ denote an estimator consistent for α_0 under H_0, for example the (nonlinear) least squares estimator. Note that α̂ is consistent only under H_0, so we introduce a general probability limit of α̂, as in Hong and White (1995).
Assumption R.2. There exists a deterministic sequence α*_n ≡ α* such that α̂ − α* = O_p(1/√n).
Examples of estimators that satisfy this assumption include (nonlinear) least squares, generalized method of moments estimators or adaptive efficient weighted least squares (Stinchcombe and White, 1998).
Following Hong and White (1995), define the regression error u_i ≡ y_i − f(x_i, α*) and the specification error v_i ≡ θ_0(x_i) − f(x_i, α*). Our test statistic is based on a scaled and centred version of m̂_n = σ̂^{−2} v̂′Σ(γ̂)^{−1}û/n = σ̂^{−2}(θ̂ − f(x, α̂))′Σ(γ̂)^{−1}(y − f(x, α̂))/n, where f(x, α) = (f(x_1, α), . . . , f(x_n, α))′. Precisely, it is defined as
T_n = (n m̂_n − p)/√(2p). (3.8)
The motivation for such a centering and scaling stems from the fact that, for fixed p, n m̂_n has an asymptotic χ^2_p distribution. Such a distribution has mean p and variance 2p, and it is a well-known fact that (χ^2_p − p)/√(2p) →_d N(0, 1) as p → ∞. This motivates our use of (3.8) and explains why we aspire to establish a standard normal distribution under the null hypothesis. Intuitively, the test statistic is based on the sample covariance between the residual from the parametric model and the discrepancy between the parametric and nonparametric fitted values, as in Hong and White (1995). Hong and White (1995) also note that, due to the nonparametric nature of the problem, such a statistic vanishes faster than the parametric (n^{1/2}) rate, so an n^{1/2}-normalization leads to degeneracy of the test. A proper normalization as in (3.8) yields a non-degenerate limiting distribution.
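The centering and scaling just described can be seen in a short simulation (our own illustration): standardized χ^2_p draws have mean zero and unit variance for every p, and their distribution approaches N(0, 1) as p grows.

```python
import numpy as np

rng = np.random.default_rng(3)

for p in (5, 50, 500):
    # Standardized chi-square draws (chi2_p - p)/sqrt(2p): mean ~ 0 and std ~ 1
    # for every p, while the skewness sqrt(8/p) shrinks as p grows, giving
    # approximate normality.
    z = (rng.chisquare(p, size=200_000) - p) / np.sqrt(2 * p)
    print(p, round(z.mean(), 3), round(z.std(), 3))
```

For small p the standardized distribution remains right-skewed, which is one reason the paper also justifies a bootstrap version of the test in Section 7.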
4 Asymptotic theory
4.1 Consistency of γ̂
We first provide conditions under which our estimator γ̂ of γ_0 is consistent. Such a property
of asymptotic boundedness and full-rank condition on Σ(γ).
Assumption R.3. lim_{n→∞} sup_{γ∈Γ} ϕ̄(Σ(γ)) < ∞ and lim_{n→∞} inf_{γ∈Γ} ϕ(Σ(γ)) > 0.
Assumption R.4. The u_i, i = 1, . . . , n, satisfy the representation (2.2). The ε_s, s ≥ 1, have zero mean, finite third and fourth moments µ_3 and µ_4 respectively and, denoting by σ_{ij}(γ) the (i, j)-th element of Σ(γ) and defining b*_{is} = b_{is}/σ_{ii}^{1/2}, i = 1, . . . , n, n ≥ 1, s ≥ 1, we have
lim_{n→∞} sup_{i=1,...,n} ∑_{s=1}^∞ |b*_{is}| + sup_{s≥1} lim_{n→∞} ∑_{i=1}^n |b*_{is}| < ∞. (4.1)
By Assumption R.3, σ_{ii} is bounded and bounded away from zero, so the normalization of the b_{is} in Assumption R.4 is well defined. The summability conditions in (4.1) are typical conditions on linear process coefficients that are needed to control dependence; for instance, in the case of stationary time series b*_{is} = b*_{i−s}. The infinite linear process assumed in (2.2) is
further discussed by Robinson (2011), who introduced it, and also by Delgado and Robinson
(2015).
Because we often need to consider the difference between values of the matrix-valued
function Σ(·) at distinct points, it is useful to introduce an appropriate concept of ‘smooth-
ness’. This concept has been employed before in economics, see e.g. Chen (2007), and is
defined below.
Definition 1. Let (X, ‖·‖_X) and (Y, ‖·‖_Y) be Banach spaces, L(X, Y) be the Banach space of linear continuous maps from X to Y with norm ‖T‖_{L(X,Y)} = sup_{‖x‖_X≤1} ‖T(x)‖_Y, and U be an open subset of X. A map F : U → Y is said to be Frechet-differentiable at u ∈ U if there exists L ∈ L(X, Y) such that
lim_{‖h‖_X→0} ‖F(u + h) − F(u) − L(h)‖_Y / ‖h‖_X = 0. (4.2)
L is called the Frechet-derivative of F at u. The map F is said to be Frechet-differentiable on U if it is Frechet-differentiable for all u ∈ U.
The above definition extends the notion of a derivative that is familiar from real analysis
to the functional spaces and allows us to check high-level assumptions that past literature
has imposed. To the best of our knowledge, this is the first use of such a concept in the
literature on spatial/network models. Denote by M_{n×n} the set of real, symmetric and positive semi-definite n × n matrices. Let Γ^o be an open subset of Γ and consider the Banach spaces (Γ, ‖·‖_g) and (M_{n×n}, ‖·‖), where ‖·‖_g is a generic ℓ_p norm, p ≥ 1. The following assumption ensures that Σ(·) is a ‘smooth’ function, in the sense of Frechet-smoothness.
Assumption R.5. The map Σ : Γ^o → M_{n×n} is Frechet-differentiable on Γ^o with Frechet-derivative denoted DΣ ∈ L(Γ^o, M_{n×n}). Furthermore, the map DΣ satisfies
sup_{γ∈Γ^o} ‖DΣ(γ)‖_{L(Γ^o, M_{n×n})} ≤ C. (4.3)
Assumption R.5 is a functional smoothness condition on spatial dependence. It has the advantage of being checkable for a variety of commonly employed models. For example, a first-order SEM has Σ(γ) = A^{−1}(γ)A′^{−1}(γ) with A = I_n − γW. Corollary D.1 in the supplementary appendix shows (DΣ(γ))(γ†) = γ†A^{−1}(γ)(G′(γ) + G(γ))A′^{−1}(γ) at a given point γ ∈ Γ^o, where G(γ) = WA^{−1}(γ). Then, taking
‖W‖ + sup_{γ∈Γ} ‖A^{−1}(γ)‖ < C (4.4)
yields Assumption R.5. Condition (4.4) limits the extent of spatial dependence and is very standard in the spatial literature; see e.g. Lee (2004) and numerous subsequent papers employing similar conditions. Frechet derivatives for higher-order SAR, SMA, SARMA and MESS error structures are computed in supplementary appendix D, in Lemmas D.5-D.6 and Corollaries D.1-D.2. The following proposition is very useful in ‘linearizing’ perturbations in Σ(·).
Proposition 4.1. If Assumption R.5 holds, then for any γ_1, γ_2 ∈ Γ^o,
‖Σ(γ_1) − Σ(γ_2)‖ ≤ C ‖γ_1 − γ_2‖. (4.5)
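The Lipschitz property (4.5) can be checked numerically for the first-order SEM example. The sketch below is our own illustration with an invented random weight matrix, normalized so that ‖γW‖ < 1 on the parameter set, which keeps A^{−1}(γ) uniformly bounded as (4.4) requires.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 25

# Hypothetical sparse weight matrix, spectral norm scaled below 1.
W = rng.random((n, n)) * (rng.random((n, n)) < 0.3)
np.fill_diagonal(W, 0.0)
W /= 1.1 * np.linalg.norm(W, 2)

def Sigma(g):
    A_inv = np.linalg.inv(np.eye(n) - g * W)
    return A_inv @ A_inv.T

# Sample pairs (gamma_1, gamma_2) in [-0.5, 0.5] and compute the Lipschitz
# ratio of (4.5); it should stay bounded on this compact interior set.
pairs = rng.uniform(-0.5, 0.5, size=(200, 2))
ratios = [
    np.linalg.norm(Sigma(a) - Sigma(b), 2) / abs(a - b)
    for a, b in pairs if abs(a - b) > 1e-8
]
```

With these normalizations a crude bound is 2‖W‖ ‖A^{−1}(γ)‖^3 ≈ 11, so all sampled ratios should sit comfortably below that order of magnitude.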
To illustrate how the concept of Frechet-differentiability allows us to check high-level as-
sumptions extant in the literature, a consequence of Proposition 4.1 is the following corollary,
a version of which appears as an assumption in Delgado and Robinson (2015).
Corollary 4.1. For any γ* ∈ Γ^o and any η > 0,
lim_{n→∞} sup_{γ∈{γ:‖γ−γ*‖<η}∩Γ^o} ‖Σ(γ) − Σ(γ*)‖ < Cη. (4.6)
We now introduce regularity conditions needed to establish the consistency of γ̂. Define
σ̄^2(γ) = n^{−1}σ_0^2 tr(Σ(γ)^{−1}Σ) = n^{−1}σ_0^2 ‖E(γ)E^{−1}‖_F^2,
which is nonnegative by definition and bounded by Assumption R.3.
Assumption R.6. c ≤ σ̄^2(γ) ≤ C for all γ ∈ Γ.
Assumption R.7. γ_0 ∈ Γ and, for any η > 0,
lim_{n→∞} inf_{γ∈N̄_γ(η)} [n^{−1} tr(Σ(γ)^{−1}Σ)] / |Σ(γ)^{−1}Σ|^{1/n} > 1, (4.7)
where N̄_γ(η) = Γ \ N_γ(η) and N_γ(η) = {γ : ‖γ − γ_0‖ < η} ∩ Γ.
Assumption R.8. {ϕ(n^{−1}Ψ′Ψ)}^{−1} + ϕ̄(n^{−1}Ψ′Ψ) = O_p(1).
Assumption R.6 is a boundedness condition originally considered in Gupta and Robinson
(2018), while Assumptions R.7 and R.8 are identification conditions. Indeed, Assumption
R.7 requires that Σ(γ) be identifiable in a small neighbourhood around γ0. This is ap-
parent on noticing that the ratio in (4.7) is at least one by the inequality between arith-
metic and geometric means, and equals one when Σ(γ) = Σ. Similar assumptions arise
frequently in related literature, see e.g. Lee (2004), Delgado and Robinson (2015). Assump-
tion R.8 is a typical asymptotic boundedness and non-multicollinearity condition, see e.g.
Newey (1997) and much other literature on series estimation. By Assumption R.3, it implies
sup_{γ∈Γ} {ϕ(n^{−1}Ψ′Σ(γ)^{−1}Ψ)}^{−1} = O_p(1).
Theorem 4.1. Under either H_0 or H_1, Assumptions R.1-R.8 and p^{−1} + (d_γ + p)/n → 0 as n → ∞, ‖(γ̂, σ̂^2) − (γ_0, σ_0^2)‖ →_p 0.
4.2 Asymptotic properties of the test statistic
Write Σj(γ) = ∂Σ(γ)/∂γj , j = 1, . . . , dγ, the matrix differentiated element-wise. While
Assumption R.5 guarantees that these partial derivatives exist, the next assumption imposes
a uniform bound on their spectral norms.
Assumption R.9. limn→∞ supj=1,...,dγ ‖Σj(γ)‖ < C.
We will later consider the sequence of local alternatives
H_{ℓn} ≡ H_ℓ : f(x_i, α*_n) = θ_0(x_i) + (p^{1/4}/n^{1/2}) h(x_i), (4.8)
where h is square integrable on the support X of the x_i. Under the null H_0, we have h(x_i) = 0.
Assumption R.10. For each n ∈ N and i = 1, . . . , n, the function f : X × A → R is such that f(x_i, α) is measurable for each α ∈ A, f(x_i, ·) is a.s. continuous on A, with sup_{α∈A} f^2(x_i, α) ≤ D_n(x_i), where sup_{n∈N} D_n(x_i) is integrable, and sup_{α∈A} ‖∂f(x_i, α)/∂α‖^2 ≤ D_n(x_i), sup_{α∈A} ‖∂^2 f(x_i, α)/∂α∂α′‖ ≤ D_n(x_i).
Define the infinite-dimensional matrix V = B′Σ^{−1}Ψ(Ψ′Σ^{−1}Ψ)^{−1}Ψ′Σ^{−1}B, which is symmetric, idempotent and has rank p. We now show that our test statistic is approximated by a quadratic form in ε, weighted by V.
Theorem 4.2. Under Assumptions R.1-R.10, p^{−1} + p(p + d_γ^2)/n + √n/p^{µ+1/4} → 0 as n → ∞, and H_0, T_n − (σ_0^{−2} ε′V ε − p)/√(2p) = o_p(1).
Assumption R.11. lim_{n→∞} ‖Σ^{−1}‖_R < ∞.
Because ‖Σ−1‖ ≤ ‖Σ−1‖R, this restriction on spatial dependence is somewhat stronger
than a restriction on spectral norm but is typically imposed for central limit theorems in
this type of setting, cf. Lee (2004), Delgado and Robinson (2015), Gupta and Robinson
(2018). The next assumption is needed in our proofs to check a Lyapunov condition. A typical approach would be to assume moments of order 4 + ǫ, for some ǫ > 0. Due to the
linear process structure under consideration, taking ǫ = 4 makes the proof tractable, see for
example Delgado and Robinson (2015).
Assumption R.12. The εs, s ≥ 1, have finite eighth moment.
The next assumption is strong if the basis functions ψij(·) are polynomials, requiring all
moments to exist in that case.
Assumption R.13. E |ψij (x)| < C, i = 1, . . . , n and j = 1, . . . , p.
The next theorem establishes the asymptotic normality of the approximating quadratic
form introduced above.
Theorem 4.3. Under Assumptions R.3, R.4, R.8, R.11-R.13 and p^{−1} + p^3/n → 0, as n → ∞, (σ_0^{−2} ε′V ε − p)/√(2p) →_d N(0, 1).
This is a new type of CLT, integrating both a linear process framework as well as an
increasing dimension element. A linear-quadratic form in iid disturbances is treated by
Kelejian and Prucha (2001), while a quadratic form in a linear process framework is treated
by Delgado and Robinson (2015). However both results are established in a parametric
framework, entailing no increasing dimension aspect of the type we face with p→ ∞.
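The structure behind Theorem 4.3 can be illustrated with a toy simulation; this is our own sketch under simplifying assumptions, with V replaced by a random rank-p projection and ε taken Gaussian iid, in which case σ_0^{−2}ε′Vε is exactly χ^2_p and the standardized quadratic form is approximately standard normal.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, reps = 400, 40, 20_000

# Random symmetric idempotent V of rank p (a projection), standing in for
# B' Sigma^{-1} Psi (Psi' Sigma^{-1} Psi)^{-1} Psi' Sigma^{-1} B.
Q, _ = np.linalg.qr(rng.normal(size=(n, p)))

# eps' V eps = ||Q' eps||^2 since V = QQ', which avoids forming V explicitly.
eps = rng.normal(size=(reps, n))
qf = ((eps @ Q) ** 2).sum(axis=1)
z = (qf - p) / np.sqrt(2 * p)
```

The paper's contribution is precisely that this standardized limit survives non-Gaussian ε, the linear-process weighting by B, and p → ∞; the sketch only displays the mechanics in the simplest case.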
Next, we summarize the properties of our test statistic in a theorem that records its
asymptotic normality under the null, consistency and ability to detect local alternatives
at the p^{1/4}/n^{1/2} rate. This rate has also been found by De Jong and Bierens (1994) and Gupta (2018b). Introduce the quantity κ = (√2 σ_0^2)^{−1} plim_{n→∞} n^{−1} h′Σ^{−1}h, where h = (h(x_1), . . . , h(x_n))′ and h(x_i) is from (4.8).
Theorem 4.4. Under the conditions of Theorems 4.2 and 4.3, (1) T_n →_d N(0, 1) under H_0, (2) T_n is a consistent test statistic, (3) T_n →_d N(κ, 1) under local alternatives H_ℓ.
5 Models with SAR structure in responses
We now introduce the SAR model
y_i = ∑_{j=1}^{d_λ} λ_{0j} w′_{i,j} y + θ_0(x_i) + u_i, i = 1, . . . , n, (5.1)
where Wj , j = 1, . . . , dλ, are known spatial weight matrices with i-th rows denoted w′i,j,
as discussed earlier, and λ0j are unknown parameters measuring the strength of spatial
dependence. We take dλ to be fixed for convenience of exposition. The error structure
remains the same as in (2.2). Here spatial dependence arises not only in errors but also
responses. For example, this corresponds to a situation where agents in a network influence
each other both in their observed and unobserved actions.
While the model in (5.1) is new to the literature, some related models deserve mention. Models such as (5.1) but without dependence in the error structure are considered by Su and Jin (2010) and Gupta and Robinson (2015, 2018), but the former consider only d_λ = 1 and the latter only parametric θ_0(·). Linear θ_0(·) and d_λ > 1 are permitted by Lee and Liu (2010), but the dependence structure in the errors differs from what we allow in (5.1). Using the same setup as Su and Jin (2010) with independent disturbances, a specification test for the linearity of θ_0(·) is proposed by Su and Qu (2017). In comparison, our model is much more general and our test can handle more general parametric null hypotheses. We thank a referee for pointing out that (5.1) is a particular case of Sun (2016) when the u_i are iid and of Malikov and Sun (2017) when d_λ = 1.
Denoting S(λ) = I_n − Σ_{j=1}^{d_λ} λ_j W_j, the quasi-likelihood function based on Gaussianity and conditional on covariates is

L(β, σ^2, φ) = log(2πσ^2) − (2/n) log|S(λ)| + (1/n) log|Σ(γ)| + (1/(σ^2 n)) (S(λ)y − Ψβ)′ Σ(γ)^{-1} (S(λ)y − Ψβ),  (5.2)

at any admissible point (β′, φ′, σ^2)′ with φ = (λ′, γ′)′, for nonsingular S(λ) and Σ(γ). For given φ = (λ′, γ′)′, (5.2) is minimised with respect to β and σ^2 by

β(φ) = (Ψ′Σ(γ)^{-1}Ψ)^{-1} Ψ′Σ(γ)^{-1} S(λ)y,  (5.3)
σ^2(φ) = n^{-1} y′S′(λ) E(γ)′ M(γ) E(γ) S(λ)y.  (5.4)
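The concentration steps (5.3)-(5.4), followed by minimisation of the concentrated objective over φ, can be sketched as follows. The design (d_λ = d_γ = 1, circulant weight matrices, the sample size, and a crude grid search over a compact Φ in place of a proper optimiser) is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40

# Illustrative weight matrices: one lag matrix W1 and one error matrix W2.
W1 = np.zeros((n, n)); W2 = np.zeros((n, n))
for i in range(n):
    W1[i, (i + 1) % n] = 1.0
    W2[i, (i - 1) % n] = 1.0
Psi = np.column_stack([np.ones(n), rng.standard_normal(n)])  # series regressors
beta0 = np.array([1.0, 2.0])
lam_true, gam_true = 0.3, 0.2
u = np.linalg.solve(np.eye(n) - gam_true * W2, rng.standard_normal(n))
y = np.linalg.solve(np.eye(n) - lam_true * W1, Psi @ beta0 + u)

def concentrated_loglik(lam, gam):
    """Concentrate beta and sigma^2 out of the quasi-likelihood (eqs (5.3)-(5.4))."""
    S = np.eye(n) - lam * W1
    Binv = np.linalg.inv(np.eye(n) - gam * W2)
    Sigma = Binv @ Binv.T                    # error covariance up to sigma^2
    Sig_inv = np.linalg.inv(Sigma)
    z = S @ y
    beta = np.linalg.solve(Psi.T @ Sig_inv @ Psi, Psi.T @ Sig_inv @ z)
    resid = z - Psi @ beta
    sigma2 = resid @ Sig_inv @ resid / n     # GLS residual variance
    _, ld = np.linalg.slogdet(np.linalg.inv(S) @ Sigma @ np.linalg.inv(S).T)
    return np.log(sigma2) + ld / n           # concentrated objective in phi

# Crude grid search over a compact parameter space Phi.
grid = np.linspace(-0.6, 0.6, 25)
lam_hat, gam_hat = min(((l, g) for l in grid for g in grid),
                       key=lambda p: concentrated_loglik(*p))
```

In practice the grid search would be replaced by a numerical optimiser; the grid keeps the sketch transparent and guarantees the minimiser lies in Φ.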
The QMLE of φ_0 is φ̂ = argmin_{φ∈Φ} L(φ), where

L(φ) = log σ^2(φ) + n^{-1} log|S′^{-1}(λ) Σ(γ) S^{-1}(λ)|,  (5.5)

and Φ = Λ × Γ is taken to be a compact subset of R^{d_λ+d_γ}. The QMLEs of β_0 and σ_0^2 are defined as β̂ ≡ β(φ̂) and σ̂^2 ≡ σ^2(φ̂), respectively. The following assumption controls spatial dependence and is discussed below equation (4.4).
Assumption SAR.1. max_{j=1,…,d_λ} ‖W_j‖ + ‖S^{-1}‖ < C.
Writing T(λ) = S(λ)S^{-1} and φ = (λ′, γ′)′, define the quantity

σ̄^2(φ) = n^{-1} σ_0^2 tr(T′(λ) Σ(γ)^{-1} T(λ) Σ) = n^{-1} σ_0^2 ‖E(γ) T(λ) E^{-1}‖_F^2,

which is nonnegative by definition and bounded by Assumptions R.3 and SAR.1. The assumptions below directly extend Assumptions R.6 and R.7 to the present setup.
Assumption SAR.2. c ≤ σ̄^2(φ) ≤ C, for all φ ∈ Φ.
Assumption SAR.3. φ_0 ∈ Φ and, for any η > 0,

lim_{n→∞} inf_{φ ∈ N̄^φ(η)} n^{-1} tr(T′(λ)Σ(γ)^{-1}T(λ)Σ) / |T′(λ)Σ(γ)^{-1}T(λ)Σ|^{1/n} > 1,  (5.6)

where N̄^φ(η) = Φ \ N^φ(η) and N^φ(η) = {φ : ‖φ − φ_0‖ < η} ∩ Φ.
We now introduce an identification condition that is required in the setup of this section.
Assumption SAR.4. β_0 ≠ 0 and, for any η > 0,

P( lim_{n→∞} inf_{(λ′,γ′)′ ∈ Λ×N̄^γ(η)} n^{-1} β′_0 Ψ′ T′(λ) E(γ)′ M(γ) E(γ) T(λ) Ψ β_0 / ‖β_0‖^2 > 0 ) = 1.  (5.7)

Upon minimizing with respect to β, the event inside the probability in (5.7) is equivalent to the event

lim_{n→∞} min_{β ∈ R^p} inf_{(λ′,γ′)′ ∈ Λ×N̄^γ(η)} n^{-1} (Ψβ − T(λ)Ψβ_0)′ Σ(γ)^{-1} (Ψβ − T(λ)Ψβ_0) / ‖β_0‖^2 > 0,
which is analogous to the identification condition for the nonlinear regression model with a
parametric linear factor in Robinson (1972), weighted by the inverse of the error covariance
matrix. This reduces the condition to a scalar form of a rank condition, making the iden-
tifying nature of the assumption transparent. A similar identifying assumption is used by
Gupta and Robinson (2018).
Theorem 5.1. Under either H_0 or H_1, Assumptions R.1-R.5, R.8, SAR.1-SAR.4 and p^{-1} + (d_γ + p)/n → 0 as n → ∞, ‖(φ̂, σ̂^2) − (φ_0, σ_0^2)‖ →_p 0.
The test statistic T_n can be constructed as before, but with the null residuals redefined to incorporate the spatially lagged terms, i.e. û = S(λ̂)y − f(x, α̂). Then we have the following theorem.
Theorem 5.2. Under Assumptions R.1-R.5, R.8-R.10, SAR.1-SAR.4,

p^{-1} + p(p + d_γ^2)/n + √n/p^{µ+1/4} + d_γ^2/p → 0, as n → ∞,

and H_0, T_n − (σ_0^{-2} ε′Vε − p)/√(2p) = o_p(1).
Theorem 5.3. Under the conditions of Theorems 4.3, 5.1 and 5.2, (1) T_n →_d N(0, 1) under H_0, (2) T_n is a consistent test statistic, (3) T_n →_d N(κ, 1) under local alternatives H_ℓ.
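For illustration, once the QMLE output is available, the statistic with the redefined null residuals can be assembled as below. All inputs are placeholders for estimated quantities (here toy values), and the form of m̂_n anticipates the definition used in Section 7.

```python
import numpy as np

def spec_test_statistic(u_hat, v_hat, Sigma_inv, sigma2_hat, p):
    """T_n = (n m_n - p)/sqrt(2p), with m_n = v'Sigma^{-1}u / (n sigma^2).

    u_hat     : null residuals, here S(lam_hat) y - f(x, alpha_hat)
    v_hat     : theta_hat(x) - f(x, alpha_hat), series fit minus null fit
    Sigma_inv : inverse of the estimated error covariance Sigma(gamma_hat)
    p         : number of series terms in the alternative
    """
    n = u_hat.size
    m_n = v_hat @ Sigma_inv @ u_hat / (n * sigma2_hat)
    return (n * m_n - p) / np.sqrt(2.0 * p)

# Toy inputs standing in for QMLE output (Sigma = I, sigma^2 = 1).
rng = np.random.default_rng(2)
n, p = 200, 10
u_hat = rng.standard_normal(n)
v_hat = 0.05 * rng.standard_normal(n)
Tn = spec_test_statistic(u_hat, v_hat, np.eye(n), 1.0, p)
```

The statistic is then compared to standard normal critical values, per Theorem 5.3.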
6 Nonparametric spatial weights
In this section we are motivated by settings where spatial dependence operates through nonparametric functions of raw distances (geographic, social, economic, or of any other type), as is the case in Pinkse et al. (2002), for example. In their setup, d_{ij} is a raw distance between units i and j and the corresponding element of the spatial weight matrix is w_{ij} = ζ_0(d_{ij}), where ζ_0(·) is an unknown nonparametric function. Pinkse et al. (2002) use such a setup in a SAR model like (5.1), but with a linear regression function. In contrast, in keeping with the focus of this paper, we instead model dependence in the errors in this manner. Our formulation is rather general, covering, for example, a specification like w_{ij} = f(γ_0, ζ_0(d_{ij})), with f(·) a known function, γ_0 an unknown parameter of possibly increasing dimension, and ζ_0(·) an unknown nonparametric function.
Let Ξ be a compact space of functions, on which we will specify more conditions later.
For notational simplicity we abstract away from the SAR dependence in the responses. Thus
we consider (2.1), but with

u_i = Σ_{s=1}^∞ b_{is}(γ_0, ζ_0(z)) ε_s,  (6.1)

where ζ_0(·) = (ζ_{01}(·), …, ζ_{0d_ζ}(·))′ is a fixed-dimensional vector of real-valued nonparametric functions with ζ_{0ℓ} ∈ Ξ for each ℓ = 1, …, d_ζ, and z is a fixed-dimensional vector of data, independent of the ε_s, s ≥ 1, with support Z. One can also take z to be a fixed distance
measure. We base our estimation on approximating each ζ_{0ℓ}(z), ℓ = 1, …, d_ζ, by the series representation δ′_{0ℓ}ϕ_ℓ(z), where ϕ_ℓ(z) ≡ ϕ_ℓ is an r_ℓ × 1 (r_ℓ → ∞ as n → ∞) vector of basis functions with typical element ϕ_{ℓk}, k = 1, …, r_ℓ. The set of linear combinations δ′_ℓϕ_ℓ(z) forms the sequence of sieve spaces Φ_{r_ℓ} ⊂ Ξ as r_ℓ → ∞, for any ℓ = 1, …, d_ζ, and

ζ_{0ℓ}(z) = δ′_{0ℓ}ϕ_ℓ + ν_ℓ,  (6.2)
with the following restriction on the function space Ξ:

Assumption NPN.1. For some scalars κ_ℓ > 0, ‖ν_ℓ‖_{w_z} = O(r_ℓ^{-κ_ℓ}) as r_ℓ → ∞, ℓ = 1, …, d_ζ, where w_z ≥ 0 is the largest value such that sup_{z∈Z} E‖z‖^{w_z} < ∞.
Just as Assumption R.1 implied (3.2), by Lemma 1 of Lee and Robinson (2016) we obtain

sup_{z∈Z} E(ν_ℓ^2) = O(r_ℓ^{-2κ_ℓ}),  ℓ = 1, …, d_ζ.  (6.3)
Thus we now have an infinite-dimensional nuisance parameter ζ_0(·) and an increasing-dimensional nuisance parameter γ. Writing Σ_{ℓ=1}^{d_ζ} r_ℓ = r and τ = (γ′, δ′_1, …, δ′_{d_ζ})′, which has increasing dimension d_τ = d_γ + r, define ς(r) = sup_{z∈Z; ℓ=1,…,d_ζ} ‖ϕ_ℓ‖. Write Σ(τ) for the covariance matrix of the n × 1 vector of the u_i in (6.1), with δ′_ℓϕ_ℓ replacing each admissible function ζ_ℓ(·). This is analogous to the definition of Σ(γ) in earlier sections, and indeed after conditioning on z it can be treated in a similar way because d_γ → ∞ was already permitted. For example, suppose that u = (I_n − W)^{-1}ε, where ‖W‖ < 1 and the elements satisfy w_{ij} = ζ_0(d_{ij}), i, j = 1, …, n, for some fixed distances d_{ij} and unknown function ζ_0(·); see e.g. Pinkse (1999). Approximating ζ_0(x) = τ′_0ϕ(x) + ν, for some r × 1 basis function vector ϕ(x) and approximation error ν, we define W(τ) as the n × n matrix with elements w_{ij}(τ) = τ′ϕ(d_{ij}), and set Σ(τ) = var((I_n − W(τ))^{-1}ε) = σ_0^2 (I_n − W(τ))^{-1}(I_n − W′(τ))^{-1}.
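The example above can be sketched numerically: a polynomial basis in the raw distances generates W(τ), and hence Σ(τ). The basis order, the distance draws, and the rescaling device that enforces ‖W‖ < 1 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n, r = 30, 3                                  # r basis functions for zeta_0 (illustrative)
D = rng.uniform(0.5, 3.0, (n, n))             # raw pairwise "distances" d_ij
np.fill_diagonal(D, 0.0)

def W_of_tau(tau):
    """Weights w_ij(tau) = tau' phi(d_ij), with phi a polynomial basis in d_ij."""
    basis = np.stack([D ** k for k in range(1, r + 1)])   # shape r x n x n
    W = np.tensordot(tau, basis, axes=1)                   # linear combination
    np.fill_diagonal(W, 0.0)
    # Rescale so ||W|| < 1, keeping I - W invertible (a sketch device only).
    return W / (2.0 * np.linalg.norm(W, 2))

def Sigma_of_tau(tau, sigma2=1.0):
    """Sigma(tau) = sigma^2 (I - W(tau))^{-1} (I - W'(tau))^{-1}, as in the text."""
    A = np.linalg.inv(np.eye(n) - W_of_tau(tau))
    return sigma2 * A @ A.T

Sigma = Sigma_of_tau(np.array([0.5, -0.1, 0.01]))   # hypothetical tau values
```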
For any admissible values β, σ^2 and τ, the redefined (multiplied by 2/n) negative quasi log-likelihood function based on the approximations (3.1) and (6.2) is

L(β, σ^2, τ) = ln(2πσ^2) + (1/n) ln|Σ(τ)| + (1/(nσ^2)) (y − Ψβ)′Σ(τ)^{-1}(y − Ψβ),  (6.4)

which is minimised with respect to β and σ^2 by

β(τ) = (Ψ′Σ(τ)^{-1}Ψ)^{-1} Ψ′Σ(τ)^{-1} y,  (6.5)
σ^2(τ) = n^{-1} y′E(τ)′M(τ)E(τ)y,  (6.6)

where M(τ) = I_n − E(τ)Ψ(Ψ′Σ(τ)^{-1}Ψ)^{-1}Ψ′E(τ)′ and E(τ) is the n × n matrix such that E(τ)E(τ)′ = Σ(τ)^{-1}. Thus the concentrated likelihood function is

L(τ) = ln(2π) + ln σ^2(τ) + (1/n) ln|Σ(τ)|.  (6.7)
Again, for compact Γ and sieve coefficient space ∆, the QMLE of τ_0 is τ̂ = argmin_{τ∈Γ×∆} L(τ), and the QMLEs of β and σ^2 are β̂ = β(τ̂) and σ̂^2 = σ^2(τ̂). The series estimate of θ_0 is defined as in (3.7). Define also the product Banach space T = Γ × Ξ^{d_ζ} with norm ‖(γ′, ζ′)′‖_{T,w} = ‖γ‖ + Σ_{ℓ=1}^{d_ζ} ‖ζ_ℓ‖_w, and consider the map Σ : T° → M_{n×n}, where T° is an open subset of T.
Assumption NPN.2. The map Σ : T° → M_{n×n} is Fréchet-differentiable on T°, with Fréchet derivative denoted DΣ ∈ L(T°, M_{n×n}). Furthermore, conditional on z, the map DΣ satisfies

sup_{t∈T°} ‖DΣ(t)‖_{L(T°, M_{n×n})} ≤ C,  (6.8)

on its domain T°.
This assumption can be checked in a similar way to how we checked Assumption R.5, where
a diverging dimension for the argument was already permitted.
Proposition 6.1. If Assumption NPN.2 holds, then for any t_1, t_2 ∈ T°, conditional on z,

‖Σ(t_1) − Σ(t_2)‖ ≤ C ς(r) ‖t_1 − t_2‖.  (6.9)

Corollary 6.1. For any t* ∈ T° and any η > 0, conditional on z,

lim_{n→∞} sup_{t ∈ {t : ‖t − t*‖ < η} ∩ T°} ‖Σ(t) − Σ(t*)‖ < C ς(r) η.  (6.10)
Assumption NPN.3. c ≤ σ2 (τ) ≤ C for τ ∈ Γ×∆, conditional on z.
Denote Σ(τ_0) = Σ_0. Note that this is not the true covariance matrix, which is Σ ≡ Σ(γ_0, ζ_0).
Assumption NPN.4. τ_0 ∈ Γ × ∆ and, for any η > 0, conditional on z,

lim_{n→∞} inf_{τ ∈ N̄^τ(η)} n^{-1} tr(Σ(τ)^{-1}Σ_0) / |Σ(τ)^{-1}Σ_0|^{1/n} > 1,  (6.11)

where N̄^τ(η) = (Γ × ∆) \ N^τ(η) and N^τ(η) = {τ : ‖τ − τ_0‖ < η} ∩ (Γ × ∆).
Remark 1. Expressing the identification condition in Assumption NPN.4 in terms of τ implies that identification is guaranteed via the sieve spaces Φ_{r_ℓ}, ℓ = 1, …, d_ζ. This approach is common in the sieve estimation literature; see e.g. Chen (2007), p. 5589, Condition 3.1.
Theorem 6.1. Under either H_0 or H_1, Assumptions R.1-R.4 (with R.3 and R.4 holding for t ∈ T rather than γ ∈ Γ), R.8, NPN.1-NPN.4 and p^{-1} + (min_{ℓ=1,…,d_ζ} r_ℓ)^{-1} + (d_γ + p + max_{ℓ=1,…,d_ζ} r_ℓ)/n → 0 as n → ∞, ‖(τ̂, σ̂^2) − (τ_0, σ_0^2)‖ →_p 0.
Theorem 6.2. Under the conditions of Theorems 4.2 and 6.1, but with τ and T replacing γ and Γ in assumptions prefixed with R, and with p → ∞,

(min_{ℓ=1,…,d_ζ} r_ℓ)^{-1} + p^2/n + √n/p^{µ+1/4} + p^{1/2} ς(r) (d_γ + max_{ℓ=1,…,d_ζ} r_ℓ)/√n + (Σ_{ℓ=1}^{d_ζ} r_ℓ^{-2κ_ℓ})^{1/2} → 0,

as n → ∞, and H_0, T_n − (σ_0^{-2} ε′Vε − p)/√(2p) = o_p(1).
Theorem 6.3. Let the conditions of Theorems 4.3 and 6.2 hold, but with τ and T replacing γ and Γ in assumptions prefixed with R. Then (1) T_n →_d N(0, 1) under H_0, (2) T_n is a consistent test statistic, (3) T_n →_d N(κ, 1) under local alternatives H_ℓ.
7 Bootstrap test
The finite-sample performance of nonparametric tests based on asymptotic distributions often leaves something to be desired. An alternative approach is the bootstrap approximation. In this section we propose a bootstrap version of our test, focusing on the setting of Section 5. In our simulations and empirical studies, we consider test statistics based on both m̂_n = σ̂^{-2} v̂′Σ(γ̂)^{-1}û/n and m̃_n = σ̂^{-2}(û′Σ(γ̂)^{-1}û − η̂′Σ(γ̂)^{-1}η̂)/n, where η̂ = S(λ̂)y − θ̂, i.e. the residual from nonparametric estimation, û = S(λ̂)y − f(x, α̂), and v̂ = θ̂ − f(x, α̂). Analogous to the definition of T_n, define the statistic T^a_n = (n m̃_n − p)/√(2p). In the case of no spatial autoregressive term, and under the power series, T^a_n and T_n are numerically identical, as was observed by Hong and White (1995). However, in the SARSE setting a difference arises due to the spatial structure in the response y. We show that T^a_n − T_n = o_p(1) in Theorem D.1 in the online supplementary appendix.
The bootstrap versions of the test statistics T_n and T^a_n are

T*_n = (n m̂*_n − p)/√(2p) = (σ*^{-2} v*′Σ(γ*)^{-1}u* − p)/√(2p),
T^{a*}_n = (n m̃*_n − p)/√(2p) = (σ*^{-2}(u*′Σ(γ*)^{-1}u* − η*′Σ(γ*)^{-1}η*) − p)/√(2p),

respectively, where u* is the bootstrap residual vector under the null, η* is the bootstrap residual vector under the alternative, v* = θ*(x) − f(x, α*), and (γ*, λ*, σ*^2, θ*, α*) is the estimator from the bootstrap sample. We elaborate on the bootstrap statistics using the SARARMA(m_1, m_2, m_3) model as an example:
y = Σ_{k=1}^{m_1} λ_k W_{1k} y + θ(x) + u,  u = Σ_{l=1}^{m_2} γ_{2l} W_{2l} u + Σ_{l=1}^{m_3} γ_{3l} W_{3l} ξ + ξ.
Following Jin and Lee (2015), we first deduct the empirical mean of the residual vector from

ξ̂ = (Σ_{l=1}^{m_3} γ̂_{3l} W_{3l} + I_n)^{-1} (I_n − Σ_{l=1}^{m_2} γ̂_{2l} W_{2l}) (y − Σ_{k=1}^{m_1} λ̂_k W_{1k} y − θ̂_n)

to obtain ξ̃ = (I_n − n^{-1} l_n l′_n) ξ̂. Next, we sample randomly with replacement n times from the elements of ξ̃ to obtain a vector ξ*. After this, we generate the bootstrap sample y* by treating f̂ = f(x, α̂), λ̂ and γ̂ as the true parameters:
y* = (I_n − Σ_{k=1}^{m_1} λ̂_k W_{1k})^{-1} f̂ + (I_n − Σ_{l=1}^{m_2} γ̂_{2l} W_{2l})^{-1} (Σ_{l=1}^{m_3} γ̂_{3l} W_{3l} + I_n) ξ*.
We estimate the model based on the bootstrap sample y* by QMLE to obtain the estimators θ* = ψ′β*, λ*, and γ* under the alternative hypothesis, and α* under the null hypothesis θ(x) = f(x, α_0). Then η* = y* − Σ_{k=1}^{m_1} λ*_k W_{1k} y* − θ* and u* = y* − Σ_{k=1}^{m_1} λ*_k W_{1k} y* − f(x, α*). This procedure is repeated B times to obtain the sequence {T*_{nj}}_{j=1}^B. We reject the null when p* = B^{-1} Σ_{j=1}^B 1(T_n < T*_{nj}) is smaller than the given level of significance. An identical procedure holds for the test based on T^{a*}_n. The asymptotic validity of the bootstrap method can be shown as in Theorem 4 of Su and Qu (2017) and Lemma 2 of Jin and Lee (2015); a detailed analysis can be found in the supplementary appendix (see the proof of Theorem TS.1).
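The resampling loop above can be sketched as follows for the special case m_1 = m_2 = 1, m_3 = 0. The functions `estimate` and `statistic` are placeholders standing in for the QMLE and test-statistic computations described in the text, and the toy usage below supplies trivial versions of them purely so the sketch runs.

```python
import numpy as np

rng = np.random.default_rng(4)

def bootstrap_pvalue(y, W1, W2, estimate, statistic, B=199):
    """Residual bootstrap sketched for y = lam*W1*y + theta(x) + u,
    u = gam*W2*u + xi (the SARARMA(1,1,0) special case).

    `estimate(y)` must return (lam, gam, theta_fit, f_fit, Tn);
    `statistic(y_star)` must return the test statistic on a bootstrap sample.
    """
    n = y.size
    lam, gam, theta_fit, f_fit, Tn = estimate(y)
    # Recover the innovations and centre them (Jin and Lee, 2015).
    u = y - lam * (W1 @ y) - theta_fit
    xi = (np.eye(n) - gam * W2) @ u
    xi = xi - xi.mean()
    exceed = 0
    for _ in range(B):
        xi_star = rng.choice(xi, size=n, replace=True)   # resample residuals
        u_star = np.linalg.solve(np.eye(n) - gam * W2, xi_star)
        y_star = np.linalg.solve(np.eye(n) - lam * W1, f_fit + u_star)
        exceed += Tn < statistic(y_star)
    return exceed / B   # bootstrap p-value p*

# Toy usage with trivial placeholder estimators (no real QMLE here).
n = 25
W1 = np.zeros((n, n)); W2 = np.zeros((n, n))
est = lambda y: (0.0, 0.0, np.zeros(n), np.zeros(n), 0.5)
stat = lambda y_star: float(y_star @ y_star) / n - 1.0
p_star = bootstrap_pvalue(rng.standard_normal(n), W1, W2, est, stat, B=50)
```

The null is rejected when p* falls below the chosen significance level, as in the text.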
8 Finite sample performance
8.1 Parametric error spatial structure
Taking n = 60, 100, 200, we choose two specifications to generate y from the SARARMA(m_1, m_2, m_3) models:

SARARMA(0,1,0): y = θ(x) + u,  u = γ_2 W_2 u + ξ,
SARARMA(1,0,1): y = λ_1 W_1 y + θ(x) + u,  u = γ_3 W_3 ξ + ξ,

where the ξ_i are i.i.d. N(0, 1). The DGP of θ(x) is

θ(x_i) = x′_iα + c p^{1/4} sin(x′_iα)/n^{1/2},

where x′_iα = 1 + x_{1i} + x_{2i}, with x_{1i} = (z_i + z_{1i})/2, x_{2i} = (z_i + z_{2i})/2, and z_i, z_{1i}, z_{2i} i.i.d. U[0, 2π].
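A minimal sketch of this DGP for θ(x), with α = (1, 1, 1)′ as in the text (the sample size and the value of c below are just one configuration):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, c = 100, 10, 3     # c = 0 gives the null (linear) model; c = 3, 6 local alternatives

z, z1, z2 = (rng.uniform(0.0, 2 * np.pi, n) for _ in range(3))
x1, x2 = (z + z1) / 2, (z + z2) / 2
xa = 1 + x1 + x2         # x_i' alpha with alpha = (1, 1, 1)'

# theta(x_i) = x_i'alpha + c p^{1/4} sin(x_i'alpha) / n^{1/2}
theta = xa + c * p ** 0.25 * np.sin(xa) / np.sqrt(n)
```

The deviation from linearity shrinks at the p^{1/4}/n^{1/2} rate of the local alternatives detected by the test.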
We use two series bases for our experiments: power (polynomial) series of third and fourth order (p = 10, p = 15) and the trigonometric series

trig_1 = (1, sin(x_1), sin(x_1/2), sin(x_2), sin(x_2/2), cos(x_1), cos(x_1/2), cos(x_2), cos(x_2/2))′

and trig_2 = (trig′_1, sin(x_1^2), cos(x_1^2), sin(x_2^2), cos(x_2^2))′. We also set γ_2 = 0.3, λ_1 = 0.3 and γ_3 = 0.4; the value c = 0, 3, 6 indicates the null hypothesis and the local alternatives. The spatial weight matrices are generated using LeSage's code make_neighborsw from http://www.spatial-econometrics.com/, where a row-normalized sparse matrix is generated by choosing a specific number of the closest locations from randomly generated coordinates; we set the number of neighbours to n/20. We employ 100 bootstrap replications in each of 500 Monte Carlo replications, except for the SARARMA(1,0,1) design with n = 200, where we use 50 bootstrap replications in view of the computation time. We report the rejection frequencies of tests based on both asymptotic and bootstrap critical values.
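LeSage's make_neighborsw is MATLAB code; a minimal Python analogue of the same k-nearest-neighbour, row-normalised construction (not his implementation) is:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 60
k = max(n // 20, 1)                # number of neighbours, n/20 as in the text

coords = rng.uniform(0.0, 1.0, (n, 2))          # random planar coordinates
diff = coords[:, None, :] - coords[None, :, :]  # pairwise coordinate differences
dist = np.sqrt((diff ** 2).sum(-1))             # Euclidean distance matrix
np.fill_diagonal(dist, np.inf)                  # exclude self-neighbours

W = np.zeros((n, n))
for i in range(n):
    nearest = np.argsort(dist[i])[:k]  # indices of the k closest locations
    W[i, nearest] = 1.0 / k            # equal weights => rows sum to one
```

Each row then has exactly k nonzero entries, giving the sparse row-normalised matrix used in the simulations.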
Tables 1-4 report the empirical rejection frequencies of the test statistics at nominal levels of 1%, 5% and 10%. To see how the bootstrap test improves small-sample outcomes, we report two sets of results for each parameter configuration: the first row for each value of c is from the asymptotic test, while the second row is from the bootstrap test. We summarize some important findings. First, for most DGPs the bootstrap test is closer to the nominal level than the asymptotic test, although the sizes of both types of tests generally improve as the sample size increases. Second, both the bootstrap and asymptotic tests are powerful in detecting deviations from linearity under the local alternatives.
Both types of bases give qualitatively similar results, but we note that T_n = T^a_n when using polynomial series under the SARARMA(0,1,0) model, as observed in Hong and White (1995). When using trigonometric series, tests based on these two statistics give slightly different rejection rates. However, under the SARARMA(1,0,1) model, both series give quantitatively different results, as illustrated in Tables 3 and 4. The patterns are similar across all cases: the bootstrap generally affords better size control, albeit not always.
8.2 Nonparametric error spatial structure
Now we examine finite-sample performance in the setting of Section 6. The three DGPs of θ(x) are the same as in the parametric setting, but we generate the n × n matrix W* as w*_{ij} = Φ(−d_{ij}) I(c_{ij} < 0.05) if i ≠ j, and w*_{ii} = 0, where Φ(·) is the standard normal cdf, d_{ij} ~ iid U[−3, 3], and c_{ij} ~ iid U[0, 1]. This construction ensures that W* is sparse, with no more than 5% of its elements nonzero. Then y is generated from y = θ(x) + u, u = Wu + ξ, where ξ ~ N(0, I_n) and W = W*/(1.2 ϕ(W*)), ensuring the existence of (I − W)^{-1}. In estimation, we know the distance d_{ij} and the indicator I(c_{ij} < 0.05), but we do not know the functional form of w_{ij}, so we approximate the elements of W by w_{ij} = Σ_{l=0}^r a_l d_{ij}^l I(c_{ij} < 0.05) if i ≠ j, and w_{ii} = 0.
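This construction of W* and W can be reproduced as below; Φ is evaluated through the error function, and reading ϕ(W*) as the spectral radius of W* is our interpretation of the normalisation above.

```python
import numpy as np
from math import erf

rng = np.random.default_rng(7)
n = 100

# Standard normal cdf via the error function.
Phi = np.vectorize(lambda t: 0.5 * (1.0 + erf(t / np.sqrt(2.0))))

d = rng.uniform(-3.0, 3.0, (n, n))       # raw distances d_ij ~ U[-3, 3]
c = rng.uniform(0.0, 1.0, (n, n))        # sparsity draws c_ij ~ U[0, 1]

# w*_ij = Phi(-d_ij) 1(c_ij < 0.05), zero diagonal: around 5% nonzero entries.
W_star = np.where(c < 0.05, Phi(-d), 0.0)
np.fill_diagonal(W_star, 0.0)

# Scale by 1.2 times the spectral radius so that (I - W)^{-1} exists.
rho = np.max(np.abs(np.linalg.eigvals(W_star)))
W = W_star / (1.2 * rho)

u = np.linalg.solve(np.eye(n) - W, rng.standard_normal(n))  # u = (I-W)^{-1} xi
```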
Table 5 reports the rejection rates from 500 Monte Carlo simulations at the 5% asymptotic level (critical value 1.645), using polynomial bases with r = 2, 3, 4, 5 and p = 10, 15, 20. We take n = 150, 300, 500, 600, 700, larger sample sizes than earlier because two nonparametric functions must be estimated in this spatial setting. The two largest bandwidths (r = 5, p = 20) are employed only for the largest sample size, n = 700. We observe a clear pattern of rejection rates approaching the theoretical level as the sample size increases. Power improves as c increases for all designs and is non-trivial in all cases, even for c = 3. Sizes are acceptable for n = 500, particularly when p = 15. Size performance improves further at n = 600, indicating asymptotic stability. Note that with two diverging bandwidths (p and r), we expect sizes to improve in a diagonal pattern from the top left corner to the bottom right corner of Table 5. This is indeed the case. For n = 700, the pairs (r, p) = (5, 15), (5, 20) deliver acceptable sizes.
9 Empirical applications
In this section, we illustrate the specification test presented in previous sections using several
empirical examples.
9.1 Conflict alliances
This example is based on a study by König et al. (2017) of how a network of military alliances and enmities affects the intensity of a conflict. They stress that understanding the role of informal networks of military alliances and enmities is important not only for predicting outcomes, but also for designing and implementing policies to contain or put an end to violence. König et al. (2017) obtain a closed-form characterization of the Nash equilibrium and perform an empirical analysis using data on the Second Congo War, a conflict that involves many groups in a complex network of informal alliances and rivalries.
To study the fighting effort of each group, the authors use a panel data model with individual fixed effects, where key regressors include the total fighting effort of allies and of enemies. They further correct for potential spatial correlation in the error term by using spatial heteroskedasticity and autocorrelation robust standard errors. We use their data and the main structure of their specification to build a cross-sectional SAR(2) model with two weight matrices, W^A (W^A_{ij} = 1 if groups i and j are allies, and W^A_{ij} = 0 otherwise) and W^E (W^E_{ij} = 1 if groups i and j are enemies, and W^E_{ij} = 0 otherwise):

y = λ_1 W^A y + λ_2 W^E y + 1_n β_0 + Xβ + u,
where y is a vector of the fighting efforts of the groups and X includes current rainfall, rainfall from the previous year, and their squares.¹ To allow for spatial correlation in the error term, we consider both the Error SARMA(1,0) and Error SARMA(0,1) structures. For these, we employ a spatial weight matrix W^d, based on the inverse distance between group locations and set to 0 beyond 150 km, following König et al. (2017). The idea is that geographic spatial correlation dies out as groups become further apart. We also report results using a nonparametric estimator of the spatial weights, as described in Section 6 and studied in the simulations of Section 8. For the nonparametric estimator we take r = 2.
In the original dataset there are 80 groups, but groups 62 and 63 have identical variables and locations, so we drop one of them and end up with a sample of 79 groups. We use data from 1998 as an example and further use the pooled data from all years as a robustness check. H_0 stands for the restricted model, in which the linear functional form of the regression is imposed, while H_1 stands for the unrestricted model, in which we use basis functions comprising power series with p = 10. In all our specifications the test statistics are negative, so we cannot reject the null hypothesis that the model is correctly specified.
As Table 6 indicates, this failure to reject the null persists when we use pooled data from 13 years, yielding 1027 observations. Thus we conclude that a linear specification is not inappropriate for this setting. One possible reason is that the original regression, though linear, already includes the squared rainfall terms as regressors. This finding is robust to using the bootstrap tests of Section 7, which generally yield smaller p-values but unchanged conclusions.

¹We follow the analysis in the original paper and do not row normalize. This is because the economic content of the weight matrices is defined by the total fights of allies or enemies.
9.2 Innovation spillovers
This example is based on the study of the impact of R&D on growth from Bloom et al.
(2013). They develop a general framework incorporating two types of spillovers: a positive
effect from technology (knowledge) spillovers and a negative ‘business stealing’ effect from
product market rivals. They implement this model using panel data on U.S. firms.
We consider the Productivity Equation in Bloom et al. (2013):
ln y = ϕ1 ln(R&D) + ϕ2 ln(Sptec) + ϕ3 ln(Spsic) + ϕ4X + error, (9.1)
where y is a vector of sales, R&D is a vector of R&D stocks, and regressors in X include the
log of capital (Capital), log of labour (Labour), R&D, a dummy for missing values in R&D,
a price index, and two spillover terms constructed as the log of WSICR&D (Spsic) and the
log of WTECR&D (Sptec), where WSIC measures the product market proximity and WTEC
measures the technological proximity. Specifically, they define
WSIC,ij = SiS′j/(SiS
′i)
1/2(SjS′j)
1/2,WTEC,ij = TiT′j/(TiT
′i )
1/2(TjT′j)
1/2,
where Si = (Si1, Si2, . . . , Si597)′, with Sik being the share of patents of firm i in the four
digit industry k and Ti = (Ti1, Ti2, . . . , Ti426)′, with Tiτ being the share of patents of firm i
in technology class τ . Focussing on a cross-sectional analysis, we use observations from the
year 2000 and obtain a sample size of 577.
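The proximity measures W_{SIC} and W_{TEC} are cosine similarities of share vectors, which are simple to compute directly; the share matrix below is randomly generated for illustration, with toy dimensions in place of the paper's 577 firms and 426 technology classes.

```python
import numpy as np

rng = np.random.default_rng(8)
n_firms, n_classes = 20, 30    # toy sizes; the paper has 577 firms, 426 classes

# Rows are firms' patent shares across technology classes (nonnegative, summing to 1).
T = rng.random((n_firms, n_classes))
T /= T.sum(axis=1, keepdims=True)

# W_TEC,ij = T_i T_j' / ((T_i T_i')^{1/2} (T_j T_j')^{1/2}): cosine similarity.
norms = np.linalg.norm(T, axis=1)
W_tec = (T @ T.T) / np.outer(norms, norms)
```

By the Cauchy-Schwarz inequality every entry lies in [0, 1], with 1 on the diagonal; W_{SIC} is built identically from the industry-share vectors S_i.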
The column FE of Table 7 is taken from Table 5 of Bloom et al. (2013), based on their panel fixed effects estimation, and we use it as a baseline for comparison. We use both W_{SIC} and W_{TEC} in the Error SARMA(2,0), Error SARMA(0,2), and Error MESS(2) models. In all of these specifications the test statistics are larger than 1.645, so we reject the null hypothesis of a linear specification. The rejection persists with the bootstrap tests, albeit with higher p-values than the asymptotic ones. However, we can say even more, as our estimation also sheds light on spatial effects in the disturbances in (9.1). As before, H_0 imposes a linear functional form in the regressors, while H_1 uses the nonparametric series estimate employing power series with p = 10. Regardless of the specification of the regression function, the
disturbances suggest a strong spatial effect, as the coefficients on W_{TEC} and W_{SIC} are large in magnitude.
9.3 Economic growth
The final example is based on the study of economic growth rates in Ertur and Koch (2007). Knowledge accumulated in one area may depend on knowledge accumulated in other areas, especially neighbouring ones, implying possible spatial spillover effects. These questions are of interest to economists as well as regional scientists. For example, Autant-Bernard and LeSage (2011) examine spatial spillovers associated with research expenditures for French regions, Ho, Wang, and Yu (2013) examine the international spillover of economic growth through bilateral trade amongst OECD countries, Cuaresma and Feldkircher (2013) study spatially correlated growth spillovers in the income convergence process of Europe, and Evans and Kim (2014) study the spatial dynamics of growth and convergence in Korean regional incomes.
In this section we test the linear SAR model specification of Ertur and Koch (2007). Their dataset covers a sample of 91 countries over the period 1960-1995, originally from Heston, Summers, and Aten (2002) and obtained from the Penn World Tables (PWT version 6.1). The variables in use include per worker income in 1960 (y60) and 1995 (y95), the average rate of growth between 1960 and 1995 (gy), the average investment rate over this period (s), and the average rate of growth of the working-age population (np).
Ertur and Koch (2007) consider the model
y = λWy +Xβ +WXθ + ε, (9.2)
where the dependent variable is log real income per worker ln(y95), the explanatory variables X = (x_1, x_2) comprise the log investment rate ln(s) = x_1 and the log physical capital effective rate of depreciation ln(np + 0.05) = x_2, with corresponding coefficients β_1, β_2, θ_1, θ_2. A restricted regression based on the joint constraints β_1 = −β_2 and θ_1 = −θ_2 (implied by economic theory) is also considered in Ertur and Koch (2007). The model (9.2) is a special case of the SAR model with regressors (X, WX) and iid errors, so the test derived in Section 5 can be applied directly. Denoting by d_{ij} the great-circle distance between the capital cities of countries i and j, one construction of W takes w_{ij} = d_{ij}^{-2} while the other takes w_{ij} = e^{-2d_{ij}}, following Ertur and Koch (2007). Table 8 presents the estimation and testing results based on power series basis functions with p = 10. We cannot reject linearity of the regression function for the unrestricted model. On the other hand, linearity is rejected for the restricted model with w_{ij} = e^{-2d_{ij}}, which is the preferred specification of Ertur and Koch (2007). Thus, not only can we conclude that the specification of the model is under suspicion, we can also infer that this is due to the constraints from economic theory. The findings are supported by the bootstrap tests of Section 7.
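For completeness, both weight constructions from great-circle distances can be sketched as follows; the haversine formula and the toy capital coordinates are illustrative assumptions (the paper's distance units are not stated here).

```python
import numpy as np

def great_circle(lat1, lon1, lat2, lon2, radius=6371.0):
    """Great-circle distance in km via the haversine formula."""
    p1, p2, dl = map(np.radians, (lat1, lat2, lon2 - lon1))
    a = np.sin((p2 - p1) / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dl / 2) ** 2
    return 2.0 * radius * np.arcsin(np.sqrt(a))

# Toy capital coordinates (latitude, longitude) for three countries.
caps = np.array([[48.9, 2.4], [52.5, 13.4], [40.4, -3.7]])
n = len(caps)
D = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:
            D[i, j] = great_circle(*caps[i], *caps[j])

# The two weight choices in the text: w_ij = d_ij^{-2} and w_ij = exp(-2 d_ij).
W_inv2 = np.where(D > 0, 1.0 / np.maximum(D, 1e-12) ** 2, 0.0)
W_exp = np.where(D > 0, np.exp(-2.0 * D), 0.0)
```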
            SARARMA(0,1,0)                          SARARMA(1,0,1)
       p = 10            p = 15            p = 10            p = 15
  1%    5%   10%    1%    5%   10%    1%    5%   10%    1%    5%   10%
n = 60
c = 0 0.01 0.032 0.05 0.036 0.084 0.122 0.048 0.078 0.108 0.094 0.148 0.172
bootstrap 0.008 0.032 0.084 0.004 0.038 0.096 0.006 0.054 0.08 0.016 0.062 0.118
c = 3 0.07 0.156 0.194 0.454 0.58 0.658 0.298 0.41 0.47 0.532 0.656 0.716
0.036 0.154 0.296 0.154 0.414 0.62 0.08 0.264 0.402 0.132 0.41 0.578
c = 6 0.37 0.532 0.644 0.998 1 1 0.748 0.816 0.848 0.96 0.97 0.972
0.22 0.544 0.748 0.844 0.992 1 0.266 0.588 0.748 0.444 0.804 0.894
n = 100
c = 0 0.008 0.03 0.044 0.022 0.052 0.068 0.016 0.036 0.056 0.028 0.062 0.084
0.006 0.044 0.098 0.012 0.046 0.096 0.006 0.054 0.116 0.02 0.056 0.112
c = 3 0.352 0.478 0.574 0.54 0.666 0.744 0.202 0.332 0.398 0.49 0.622 0.716
0.294 0.578 0.72 0.37 0.662 0.824 0.134 0.366 0.496 0.222 0.556 0.732
c = 6 0.984 0.99 0.99 0.998 0.998 0.998 0.78 0.844 0.886 0.994 0.994 0.994
0.95 0.99 0.996 0.992 0.998 1 0.566 0.832 0.916 0.732 0.964 0.986
n = 200
c = 0 0.002 0.016 0.034 0.008 0.026 0.048 0.018 0.038 0.062 0.018 0.046 0.06
0.006 0.038 0.104 0.006 0.048 0.088 0.04 0.086 0.11 0.03 0.078 0.114
c = 3 0.176 0.29 0.356 0.34 0.496 0.582 0.17 0.252 0.354 0.46 0.61 0.68
0.178 0.402 0.55 0.282 0.56 0.694 0.186 0.4 0.524 0.402 0.636 0.754
c = 6 0.888 0.942 0.96 0.99 0.998 1 0.81 0.9 0.934 1 1 1
0.846 0.968 0.984 0.982 0.998 1 0.718 0.904 0.962 0.872 0.98 0.998
Table 1: Rejection probabilities for T_n and T*_n at the 1%, 5% and 10% levels with polynomial series basis.
            SARARMA(0,1,0)                          SARARMA(1,0,1)
       trig_1            trig_2            trig_1            trig_2
  1%    5%   10%    1%    5%   10%    1%    5%   10%    1%    5%   10%
n = 60
c = 0 0.01 0.028 0.054 0.02 0.056 0.084 0.052 0.092 0.11 0.112 0.158 0.188
bootstrap 0.004 0.004 0.092 0.004 0.038 0.094 0.012 0.062 0.106 0.026 0.09 0.138
c = 3 0.166 0.248 0.296 0.172 0.29 0.358 0.254 0.35 0.426 0.326 0.456 0.528
0.092 0.276 0.39 0.056 0.22 0.374 0.082 0.256 0.406 0.096 0.222 0.354
c = 6 0.688 0.806 0.854 0.676 0.822 0.866 0.734 0.822 0.874 0.776 0.854 0.896
0.454 0.794 0.908 0.314 0.714 0.872 0.266 0.616 0.782 0.204 0.474 0.658
n = 100
c = 0 0.006 0.012 0.028 0.004 0.028 0.05 0.01 0.03 0.054 0.028 0.052 0.076
0.002 0.04 0.09 0.006 0.036 0.102 0.012 0.046 0.114 0.012 0.044 0.088
c = 3 0.27 0.39 0.484 0.288 0.412 0.508 0.188 0.296 0.366 0.484 0.634 0.702
0.214 0.508 0.626 0.194 0.45 0.632 0.132 0.346 0.514 0.242 0.542 0.698
c = 6 0.956 0.986 0.992 0.948 0.99 0.992 0.824 0.898 0.92 0.854 0.908 0.928
0.902 0.99 0.998 0.856 0.988 1 0.59 0.888 0.96 0.476 0.846 0.918
n = 200
c = 0 0.002 0.014 0.034 0.012 0.028 0.036 0.018 0.034 0.05 0.016 0.034 0.062
0.008 0.042 0.112 0.016 0.038 0.082 0.026 0.076 0.108 0.032 0.074 0.118
c = 3 0.164 0.256 0.312 0.144 0.274 0.356 0.166 0.28 0.344 0.23 0.364 0.436
0.162 0.374 0.532 0.136 0.346 0.468 0.242 0.432 0.526 0.244 0.42 0.542
c = 6 0.818 0.898 0.934 0.816 0.904 0.944 0.884 0.93 0.954 0.91 0.956 0.968
0.796 0.95 0.98 0.776 0.934 0.974 0.78 0.942 0.982 0.794 0.948 0.98
Table 2: Rejection probabilities for T_n and T*_n at the 1%, 5% and 10% levels with trigonometric series basis.
            SARARMA(0,1,0)                          SARARMA(1,0,1)
       p = 10            p = 15            p = 10            p = 15
  1%    5%   10%    1%    5%   10%    1%    5%   10%    1%    5%   10%
n = 60
c = 0 0.01 0.032 0.05 0.036 0.084 0.122 0.052 0.074 0.102 0.094 0.14 0.168
bootstrap 0.008 0.032 0.084 0.004 0.038 0.096 0.006 0.052 0.084 0.012 0.068 0.114
c = 3 0.07 0.156 0.194 0.454 0.58 0.658 0.308 0.422 0.48 0.55 0.66 0.712
0.036 0.154 0.296 0.154 0.414 0.62 0.092 0.27 0.396 0.164 0.408 0.596
c = 6 0.37 0.532 0.644 0.998 1 1 0.776 0.838 0.864 0.98 0.984 0.984
0.22 0.544 0.748 0.844 0.992 1 0.268 0.596 0.752 0.518 0.824 0.9
n = 100
c = 0 0.008 0.03 0.044 0.022 0.052 0.068 0.016 0.04 0.066 0.032 0.062 0.086
0.006 0.044 0.098 0.012 0.046 0.096 0.008 0.058 0.122 0.024 0.062 0.118
c = 3 0.352 0.478 0.574 0.54 0.666 0.744 0.216 0.334 0.41 0.49 0.628 0.71
0.294 0.578 0.72 0.37 0.662 0.824 0.14 0.36 0.494 0.252 0.566 0.73
c = 6 0.984 0.99 0.99 0.998 0.998 0.998 0.784 0.848 0.9 0.98 0.98 0.98
0.95 0.99 0.996 0.992 0.998 1 0.536 0.818 0.914 0.786 0.958 0.974
n = 200
c = 0 0.002 0.016 0.034 0.008 0.026 0.048 0.002 0.038 0.06 0.014 0.034 0.058
0.006 0.038 0.104 0.006 0.048 0.088 0.04 0.08 0.116 0.026 0.064 0.108
c = 3 0.176 0.29 0.356 0.34 0.496 0.582 0.172 0.266 0.356 0.46 0.608 0.688
0.178 0.402 0.55 0.282 0.56 0.694 0.176 0.382 0.516 0.41 0.632 0.738
c = 6 0.888 0.942 0.96 0.99 0.998 1 0.808 0.902 0.932 1 1 1
0.846 0.968 0.984 0.982 0.998 1 0.704 0.894 0.948 0.914 0.986 0.996
Table 3: Rejection probabilities for T^a_n and T^{a*}_n at the 1%, 5% and 10% levels with polynomial series basis.
            SARARMA(0,1,0)                          SARARMA(1,0,1)
       trig_1            trig_2            trig_1            trig_2
  1%    5%   10%    1%    5%   10%    1%    5%   10%    1%    5%   10%
n = 60
c = 0 0.01 0.028 0.054 0.02 0.056 0.084 0.048 0.092 0.108 0.104 0.152 0.18
bootstrap 0.004 0.004 0.092 0.004 0.038 0.094 0.014 0.064 0.096 0.024 0.088 0.13
c = 3 0.16 0.252 0.292 0.174 0.29 0.358 0.276 0.366 0.44 0.344 0.468 0.538
0.09 0.274 0.384 0.056 0.22 0.376 0.08 0.25 0.406 0.102 0.242 0.37
c = 6 0.682 0.798 0.85 0.676 0.822 0.866 0.744 0.864 0.9 0.81 0.874 0.91
0.444 0.794 0.906 0.312 0.714 0.87 0.248 0.61 0.792 0.206 0.484 0.658
n = 100
c = 0 0.006 0.012 0.026 0.006 0.028 0.05 0.014 0.032 0.054 0.036 0.06 0.084
0.004 0.038 0.092 0.006 0.036 0.106 0.01 0.046 0.116 0.014 0.044 0.096
c = 3 0.268 0.396 0.486 0.288 0.412 0.508 0.194 0.304 0.376 0.49 0.648 0.718
0.214 0.504 0.63 0.194 0.45 0.632 0.122 0.354 0.52 0.272 0.568 0.696
c = 6 0.958 0.986 0.992 0.952 0.99 0.992 0.832 0.908 0.916 0.874 0.92 0.94
0.9 0.99 0.998 0.856 0.988 1 0.554 0.884 0.948 0.478 0.834 0.916
n = 200
c = 0 0.002 0.018 0.038 0.012 0.028 0.032 0.018 0.036 0.044 0.016 0.034 0.064
0.012 0.046 0.114 0.016 0.042 0.08 0.03 0.076 0.102 0.028 0.06 0.122
c = 3 0.156 0.258 0.312 0.144 0.272 0.352 0.17 0.272 0.348 0.23 0.364 0.436
0.162 0.38 0.524 0.134 0.35 0.466 0.22 0.438 0.526 0.256 0.428 0.538
c = 6 0.816 0.908 0.936 0.816 0.904 0.944 0.876 0.928 0.956 0.912 0.956 0.97
0.802 0.952 0.978 0.774 0.934 0.972 0.746 0.934 0.976 0.776 0.93 0.976
Table 4: Rejection probabilities for T^a_n and T^{a*}_n at the 1%, 5% and 10% levels with trigonometric series basis.
             r = 2             r = 3             r = 4                r = 5
          p = 10  p = 15   p = 10  p = 15   p = 10  p = 15   p = 10  p = 15  p = 20
n = 150
c = 0     0.0860  0.2020   0.1180  0.2060   0.1420  0.2240
c = 3     0.3320  0.6340   0.3700  0.6380   0.3760  0.6700
c = 6     0.9060  0.9920   0.9180  0.9940   0.9220  0.9960
n = 300
c = 0     0.0820  0.0960   0.0880  0.1080   0.1060  0.1100
c = 3     0.2680  0.5980   0.2600  0.6120   0.2780  0.6220
c = 6     0.8140  0.9980   0.8160  0.9980   0.8220  0.9980
n = 500
c = 0     0.0280  0.0420   0.0260  0.0400   0.0360  0.0480
c = 3     0.2320  0.6660   0.2400  0.6620   0.2460  0.6680
c = 6     0.8920  1        0.9040  1        0.9000  1
n = 600
c = 0     0.0320  0.0500   0.0340  0.0540   0.0360  0.0540
c = 3     0.3140  0.6480   0.3080  0.6280   0.3120  0.6460
c = 6     0.9220  1        0.9180  1        0.9180  1
n = 700
c = 0     0.0260  0.0300   0.0280  0.0380   0.0280  0.0380   0.0280  0.0420  0.0580
c = 3     0.2420  0.5540   0.2400  0.5480   0.2520  0.5500   0.2420  0.5600  0.6920
c = 6     0.9580  0.9980   0.9560  0.9980   0.9600  0.9980   0.9500  0.9980  1

Table 5: Rejection probabilities of T_n at the 5% asymptotic level, nonparametric spatial error structure.
31
                           1998                                  Pooled
                 H0      p-value     H1      p-value     H0      p-value     H1      p-value
SARARMA(2,1,0)
WAy            -0.005    <0.001    -0.003    <0.001     0.013    <0.001     0.013    <0.001
WEy             0.130    <0.001     0.129    <0.001     0.121    <0.001     0.121    <0.001
Wd             -0.159     0.281    -0.225    <0.001    -0.086     0.033    -0.086     0.033
Tn             -1.921     0.973                        -2.531     0.994
T*n                       0.840                                   0.940
Tan            -1.918     0.972                        -2.547     0.995
Ta*n                      0.870                                   0.730
SARARMA(2,0,1)
WAy             0.001     <0.01     0.011     <0.01     0.013     <0.01     0.013     <0.01
WEy             0.127     <0.01     0.122     <0.01     0.121     <0.01     0.121     <0.01
Wd             -0.153     <0.01    -0.050     <0.01    -0.086     <0.01    -0.086     0.025
Tn             -1.763     0.961                        -2.421     0.992
T*n                       0.900                                   0.990
Tan            -2.349     0.991                        -2.423     0.992
Ta*n                      0.850                                   0.790
Nonparametric
WAy            -0.052    <0.001    -0.011    <0.001     0.033    <0.001     0.033    <0.001
WEy             0.149    <0.001     0.133    <0.001     0.110    <0.001     0.109    <0.001
Wd
Tn             -1.294     0.902                        -2.314     0.990
T*n                       0.830                                   0.850
Tan            -1.898     0.971                        -2.530     0.994
Ta*n                      0.660                                   0.910

Table 6: The estimates and test statistics for the conflict data. ∗ denotes the bootstrap p-value.
                     FE              SARARMA(0,1,0), WTEC                 SARARMA(0,1,0), WSIC
                                      H0                H1                 H0                H1
Variables      est.  p-value    est.  p-value    est.  p-value      est.  p-value    est.  p-value
ln(sic)      -0.005   0.649    0.007   0.574    0.015   0.166      0.008   0.620    0.017   0.193
ln(tech)      0.191  <0.001    0.006   0.850   -0.001   0.998      0.039   0.157    0.020   0.336
ln(Lab.)      0.636  <0.001    0.572  <0.001                       0.571  <0.001
ln(Cap.)      0.154  <0.001    0.336  <0.001                       0.318  <0.001
ln(R&D)       0.043  <0.001    0.081  <0.001                       0.082  <0.001
WSIC                           0.722  <0.001                       0.724  <0.001
WTEC                           0.835  <0.001                       0.829  <0.001
Tn                            15.528  <0.001                      10.451  <0.001
T*n                            0.050                              <0.001

                   SARARMA(0,2,0)                  SARARMA(0,0,2)                 Error MESS(2)
                  H0             H1               H0             H1              H0             H1
Variables    est.  p-val    est.  p-val      est.  p-val    est.  p-val     est.  p-val    est.  p-val
ln(sic)     0.009  0.587   0.018  0.170   -0.0002  0.991   0.013  0.266    0.002  0.788   0.014  0.040
ln(tech)    0.044  0.112   0.026  0.236    0.033   0.200   0.017  0.434    0.045  0.025   0.027  0.088
ln(Lab.)    0.573 <0.001                   0.565   <0.01                   0.569 <0.001
ln(Cap.)    0.315 <0.001                   0.334   <0.01                   0.323 <0.001
ln(R&D)     0.082 <0.001                   0.076   <0.01                   0.077 <0.001
WSIC        0.696 <0.001   0.693 <0.001    0.624   <0.01   0.728 <0.001    0.775 <0.001   0.836 <0.001
WTEC        0.157  0.092   0.164  0.079    0.312   0.123   0.321  0.112    0.338  0.010   0.380  0.004
Tn         10.485 <0.001                  15.144  <0.001                  12.776 <0.001
T*n         0.060                          0.020                           0.050

Table 7: The estimates and test statistics for the R&D data. ∗ denotes the bootstrap p-value.
                             w*_ij = d_ij^{-2} for i ≠ j    w*_ij = e^{-2 d_ij} for i ≠ j
Variable                        estimate    p-value             estimate    p-value
Constant                         1.0711      0.608               0.5989      0.798
ln(s)                            0.8256     <0.001               0.7938     <0.001
ln(np + 0.05)                   -1.4984      0.008              -1.4512      0.009
W ln(s)                         -0.3159      0.075              -0.3595      0.020
W ln(np + 0.05)                  0.5633      0.498               0.1283      0.856
Wy                               0.7360     <0.001               0.6510     <0.001
Tn                              -1.88        0.970              -2.08        0.981
T*n                                          0.850                           0.900
Tan                             -1.90        0.971              -2.05        0.980
Ta*n                                         0.820                           0.810
Restricted regression
Constant                         2.1411     <0.001               2.9890     <0.001
ln(s) - ln(np + 0.05)            0.8426     <0.001               0.8195     <0.001
W[ln(s) - ln(np + 0.05)]        -0.2675      0.122              -0.2589      0.098
W ln(y)                          0.7320     <0.001               0.6380     <0.001
Tn                               0.30        0.382               4.04       <0.001
T*n                                          0.500                          <0.001
Tan                              0.10        0.460               4.50       <0.001
Ta*n                                         0.560                           0.040

Table 8: The estimates and test statistics of the linear SAR model for the growth data. ∗ denotes the bootstrap p-value.
Appendix
A Proofs of theorems and propositions
Proof of Proposition 4.1: Because the map $\Sigma : \Gamma^o \to \mathcal{M}_{n\times n}$ is Fréchet-differentiable on $\Gamma^o$, it is also Gâteaux-differentiable and the two derivative maps coincide. Thus by Theorem 1.8 of Ambrosetti and Prodi (1995),
$$\|\Sigma(\gamma_1)-\Sigma(\gamma_2)\| \le \sup_{\gamma\in\ell[\gamma_1,\gamma_2]} \|D\Sigma(\gamma)\|\,\|\gamma_1-\gamma_2\|,$$
where $\ell[\gamma_1,\gamma_2] = \{t\gamma_1+(1-t)\gamma_2 : t\in[0,1]\}$. The claim now follows by (4.3) in Assumption 8.
Proof of Theorem 4.1. This is a particular case of the proof of Theorem 5.1 with $\lambda = 0$, so that $S(\lambda) = I_n$.
Proof of Theorem 4.2. See Appendix B.
Proof of Theorem 4.3. We would like to establish the asymptotic standard normality of
$$\frac{\sigma_0^{-2}\varepsilon'V\varepsilon - p}{\sqrt{2p}}. \tag{A.1}$$
Writing $q=\sqrt{2p}$, the ratio in (A.1) has zero mean and variance equal to one, and may be written as $w=\sum_{s=1}^{\infty}w_s$, where
$$w_s = \sigma_0^{-2}q^{-1}v_{ss}\big(\varepsilon_s^2-\sigma_0^2\big) + 2\sigma_0^{-2}q^{-1}1(s\ge 2)\varepsilon_s\sum_{t<s}v_{st}\varepsilon_t,$$
with $v_{st}$ the typical element of $V$, $s,t=1,2,\dots$. We first show that
$$w_* \overset{p}{\longrightarrow} 0, \tag{A.2}$$
where $w_*=w-w_S$, $w_S=\sum_{s=1}^{S}w_s$ and $S=S_n$ is a positive integer sequence that is increasing in $n$. All expectations in the sequel are taken conditional on $X$. By Chebyshev's inequality, proving
$$Ew_*^2 \overset{p}{\longrightarrow} 0 \tag{A.3}$$
is sufficient to establish (A.2). Notice that $Ew_s^2 \le Cq^{-2}v_{ss}^2 + Cq^{-2}1(s\ge 2)\sum_{t<s}v_{st}^2 \le Cq^{-2}\sum_{t\le s}v_{st}^2$, so that, writing $M=\Sigma^{-1}\Psi[\Psi'\Sigma^{-1}\Psi]^{-1}\Psi'\Sigma^{-1}$,
$$\sum_{s=S+1}^{\infty}Ew_s^2 \le Cq^{-2}\sum_{s=S+1}^{\infty}\sum_{t\le s}v_{st}^2 \le Cq^{-2}\sum_{s=S+1}^{\infty}b_s'M\sum_{t\le s}b_tb_t'M b_s \le Cq^{-2}\|\Sigma\|\sum_{s=S+1}^{\infty}b_s'M^2b_s$$
$$\le Cq^{-2}\sum_{s=S+1}^{\infty}\sum_{i,j,k=1}^{n}b_{is}b_{ks}m_{ij}m_{kj} \le Cq^{-2}\sum_{s=S+1}^{\infty}\sum_{i,k=1}^{n}|b^*_{is}||b^*_{ks}|\sum_{j=1}^{n}\big(m_{kj}^2+m_{ij}^2\big), \tag{A.4}$$
where $m_{ij}$ is the $(i,j)$-th element of $M$ and we have used the inequality $|ab|\le(a^2+b^2)/2$ in the last step. Now, denote by $h_i'$ the $i$-th row of the $n\times p$ matrix $\Sigma^{-1}\Psi$. Denoting the elements of $\Sigma^{-1}$ by $\Sigma^{-1}_{ij}$ and writing $\psi_{jl}=\psi_l(x_j)$, $h_i$ has entries $h_{il}=\sum_{j=1}^{n}\Sigma^{-1}_{ij}\psi_{jl}$, $l=1,\dots,p$. We have $|h_{il}|=O_p\big(\sum_{j=1}^{n}|\Sigma^{-1}_{ij}|\big)=O_p(\|\Sigma^{-1}\|_R)=O_p(1)$, uniformly, by Assumptions R.11 and R.13. Thus we have $\|h_i\|=O_p(\sqrt{p})$, uniformly in $i$. As a result,
$$|m_{ij}| = n^{-1}\Big|h_i'\big(n^{-1}\Psi'\Sigma^{-1}\Psi\big)^{-1}h_j\Big| = O_p\big(n^{-1}\|h_i\|\|h_j\|\big) = O_p\big(pn^{-1}\big), \tag{A.5}$$
uniformly in $i,j$, by Assumption R.11. Similarly, note that
$$\sum_{j=1}^{n}m_{ij}^2 = n^{-1}h_i'\big(n^{-1}\Psi'\Sigma^{-1}\Psi\big)^{-1}\big(n^{-1}\Psi'\Sigma^{-2}\Psi\big)\big(n^{-1}\Psi'\Sigma^{-1}\Psi\big)^{-1}h_i \le n^{-1}\|h_i\|^2\Big\|\big(n^{-1}\Psi'\Sigma^{-1}\Psi\big)^{-1}\Big\|^2\big\|n^{-1}\Psi'\Sigma^{-2}\Psi\big\| = O_p\big(pn^{-2}\|\Psi\|^2\|\Sigma^{-1}\|^2\big) = O_p\big(pn^{-1}\big), \tag{A.6}$$
uniformly in $i$. Thus (A.4) is
$$O_p\bigg(q^{-2}pn^{-1}\sum_{s=S+1}^{\infty}\sum_{i=1}^{n}|b^*_{is}|\sum_{k=1}^{n}|b^*_{ks}|\bigg) = O_p\bigg(q^{-2}p\sup_{i=1,\dots,n}\sum_{s=S+1}^{\infty}|b^*_{is}|\bigg), \tag{A.7}$$
by Assumption R.4. By the same assumption, there exists $S_{in}$ such that $\sum_{s=S_{in}+1}^{\infty}|b^*_{is}|\le\epsilon_n$ for any decreasing sequence $\epsilon_n\to 0$ as $n\to\infty$. Choosing $S=\max_{i=1,\dots,n}S_{in}$ in $w_S$, we deduce that (A.7) is $O_p(q^{-2}p\epsilon_n)=O_p(\epsilon_n)=o_p(1)$, proving (A.3). Thus we need only focus on $w_S$, and seek to establish that
$$w_S \overset{d}{\longrightarrow} N(0,1), \quad\text{as } n\to\infty. \tag{A.8}$$
From Scott (1973), (A.8) follows if
$$\sum_{s=1}^{S}Ew_s^4 \overset{p}{\longrightarrow} 0, \quad\text{as } n\to\infty, \tag{A.9}$$
and
$$\sum_{s=1}^{S}\Big[E\big(w_s^2 \mid \varepsilon_t, t<s\big)-E\big(w_s^2\big)\Big] \overset{p}{\longrightarrow} 0, \quad\text{as } n\to\infty. \tag{A.10}$$
We show (A.9) first. Evaluating the expectation and using (A.6) yields
$$Ew_s^4 \le Cq^{-4}v_{ss}^4 + Cq^{-4}\sum_{t<s}v_{st}^4 \le Cq^{-4}\Big(\sum_{t\le s}v_{st}^2\Big)^2 \le Cq^{-4}\Big(b_s'M\sum_{t\le s}b_tb_t'Mb_s\Big)^2 \le Cq^{-4}\big(b_s'M^2b_s\big)^2.$$
Since
$$b_s'M^2b_s = \sum_{i,j,k=1}^{n}b_{is}b_{ks}m_{ij}m_{kj} \le \sum_{i,k=1}^{n}|b^*_{is}||b^*_{ks}|\sum_{j=1}^{n}\big(m_{ij}^2+m_{kj}^2\big) = O_p\bigg(pn^{-1}\Big(\sum_{i=1}^{n}|b^*_{is}|\Big)^2\bigg)$$
is bounded in probability uniformly in $s$, it follows that
$$Ew_s^4 = O_p\bigg(q^{-4}pn^{-1}\Big(\sum_{i=1}^{n}|b^*_{is}|\Big)^2\bigg),$$
whence
$$\sum_{s=1}^{S}Ew_s^4 = O_p\bigg(q^{-4}pn^{-1}\sum_{s=1}^{S}\Big(\sum_{i=1}^{n}|b^*_{is}|\Big)^2\bigg) = O_p\bigg(q^{-4}pn^{-1}\sum_{s=1}^{S}\sum_{i=1}^{n}|b^*_{is}|\bigg) = O_p\big(q^{-4}p\big),$$
by Assumption R.4. Thus (A.9) is established. Notice that $E(w_s^2\mid\varepsilon_t, t<s)$ equals
$$4q^{-2}\sigma_0^{-4}\Big\{\big(\mu_4-\sigma_0^4\big)v_{ss}^2 + 2\mu_3 1(s\ge 2)\sum_{t<s}v_{st}v_{ss}\varepsilon_t\Big\} + 4q^{-2}\sigma_0^{-2}1(s\ge 2)\Big(\sum_{t<s}v_{st}\varepsilon_t\Big)^2,$$
and $Ew_s^2 = 4q^{-2}\sigma_0^{-4}(\mu_4-\sigma_0^4)v_{ss}^2 + 4q^{-2}1(s\ge 2)\sum_{t<s}v_{st}^2$, so that (A.10) is bounded by a constant times
$$q^{-2}\Bigg[\sum_{s=2}^{S}\sum_{t<s}v_{st}v_{ss}\varepsilon_t + \sum_{s=2}^{S}\bigg\{\Big(\sum_{t<s}v_{st}\varepsilon_t\Big)^2 - \sigma_0^2\sum_{t<s}v_{st}^2\bigg\}\Bigg]. \tag{A.11}$$
By transforming the range of summation, the square of the first term in (A.11) has expectation bounded by
$$Cq^{-4}E\Big(\sum_{t=1}^{S-1}\sum_{s=t+1}^{S}v_{st}v_{ss}\varepsilon_t\Big)^2 \le Cq^{-4}\sum_{t=1}^{S-1}\Big(\sum_{s=t+1}^{S}v_{st}v_{ss}\Big)^2, \tag{A.12}$$
where the square of the factor in parentheses on the right-hand side of (A.12) is
$$\sum_{s,r=t+1}^{S}b_s'Mb_s\,b_s'Mb_t\,b_r'Mb_r\,b_r'Mb_t \le \sum_{s,r=t+1}^{S}\big|b_s'Mb_s\,b_r'Mb_r\big|\,\big|b_s'Mb_t\big|\,\big|b_r'Mb_t\big|$$
$$\le C\Big(\sup_{i,j}|m_{ij}|\Big)^2\Big(\sup_{s\ge 1}\sum_{i=1}^{n}|b^*_{is}|\Big)^4\sum_{s,r=t+1}^{S}\big|b_s'Mb_t\big|\,\big|b_r'Mb_t\big| = O_p\Bigg(p^2n^{-2}\Big(\sum_{s=t+1}^{S}|b_t'Mb_s|\Big)^2\Bigg) = O_p\Bigg(p^2n^{-2}\Big(\sum_{s=t+1}^{S}\sum_{i,j=1}^{n}|b^*_{it}||m_{ij}||b^*_{js}|\Big)^2\Bigg),$$
where we used Assumption R.4 and (A.5). Now Assumptions R.4, R.11 and (A.5) imply that
$$\sum_{s=t+1}^{S}\sum_{i,j=1}^{n}|b^*_{it}||m_{ij}||b^*_{js}| = O_p\bigg(\sup_{i,j}|m_{ij}|\,\sup_{t}\sum_{i=1}^{n}|b^*_{it}|\sum_{j=1}^{n}\sum_{s=t+1}^{S}|b^*_{js}|\bigg) = O_p\bigg(p\sup_{t}\sum_{i=1}^{n}|b^*_{it}|\bigg),$$
so (A.12) is $O_p\big(q^{-4}p^4n^{-2}\sup_t\big(\sum_{i=1}^{n}|b^*_{it}|\big)\sum_{i=1}^{n}\sum_{t=1}^{S-1}|b^*_{it}|\big)$. By Assumption R.4 the latter is $O_p(q^{-4}p^4n^{-1}) = O_p(p^2n^{-1})$, which is negligible because $p^2/n\to 0$; therefore the first term in (A.11) is negligible.
Once again transforming the summation range and using the inequality $|a+b|^2\le C(a^2+b^2)$, we can bound the square of the second term in (A.11) by a constant times
$$q^{-4}\Bigg(\sum_{t=1}^{S-1}\sum_{s=t+1}^{S}v_{st}^2\big(\varepsilon_t^2-\sigma_0^2\big)\Bigg)^2 + q^{-4}\Bigg(2\sum_{t=1}^{S-1}\sum_{r=1}^{t-1}\sum_{s=t+1}^{S}v_{st}v_{sr}\varepsilon_t\varepsilon_r\Bigg)^2. \tag{A.13}$$
Using Assumption R.4, the expectations of the two terms in (A.13) are bounded by a constant times $\alpha_1$ and a constant times $\alpha_2$, respectively, where
$$\alpha_1 = q^{-4}\sum_{t=1}^{S-1}\Big(\sum_{s=t+1}^{S}v_{st}^2\Big)^2, \qquad \alpha_2 = q^{-4}\sum_{t=1}^{S-1}\sum_{r=1}^{t-1}\Big(\sum_{s=t+1}^{S}v_{st}v_{sr}\Big)^2.$$
Thus (A.13) is $O_p(\alpha_1+\alpha_2)$. Now by (A.5), Assumptions R.4, R.11 and elementary inequalities, $\alpha_2$ is bounded by
$$q^{-4}\sum_{t=1}^{S-1}\sum_{r=1}^{t-1}\sum_{s,u=t+1}^{S}b_s'Mb_t\,b_s'Mb_r\,b_u'Mb_t\,b_u'Mb_r$$
$$= O_p\Bigg(q^{-4}\sum_{s,r,t,u=1}^{S}\Big(\sum_{i,j=1}^{n}|b^*_{ir}||m_{ij}||b^*_{js}|\Big)\Big(\sum_{i,j=1}^{n}|b^*_{ir}||m_{ij}||b^*_{ju}|\Big)\Big(\sum_{i,j=1}^{n}|b^*_{it}||m_{ij}||b^*_{js}|\Big)\Big(\sum_{i,j=1}^{n}|b^*_{it}||m_{ij}||b^*_{ju}|\Big)\Bigg).$$
Successively summing out $u$, $t$, $s$ and $r$, using the uniform bounds $\sup_{i,j}|m_{ij}|=O_p(pn^{-1})$ from (A.5) together with $\sup_s\sum_{i=1}^{n}|b^*_{is}|\le C$ and $\sum_{s}\sum_{i=1}^{n}|b^*_{is}|=O(n)$ from Assumption R.4, this is
$$O_p\bigg(q^{-4}p^2\sup_{j}\sum_{i=1}^{n}m_{ij}^2\bigg) = O_p\big(pn^{-1}\big),$$
where we used (A.6) in the last step. A similar use of the conditions of the theorem and (A.5) implies that $\alpha_1$ is
$$O_p\Bigg(q^{-4}\sum_{t=1}^{S-1}\bigg(\sum_{s=t+1}^{S}\Big(\sum_{i,j=1}^{n}|m_{ij}||b^*_{jt}||b^*_{is}|\Big)^2\bigg)^2\Bigg) = O_p\Bigg(q^{-4}\Big(\sup_{i,j}|m_{ij}|\Big)^4\sum_{t=1}^{S-1}\bigg(\sum_{s=t+1}^{S}\Big(\sum_{i=1}^{n}|b^*_{is}|\sum_{j=1}^{n}|b^*_{jt}|\Big)^2\bigg)^2\Bigg)$$
$$= O_p\Bigg(q^{-4}p^4n^{-4}\sum_{t=1}^{S-1}\bigg(\sum_{s=t+1}^{S}\Big(\sum_{i=1}^{n}|b^*_{is}|\Big)^2\bigg)^2\Big(\sum_{j=1}^{n}|b^*_{jt}|\Big)^4\Bigg) = O_p\big(q^{-4}p^4n^{-1}\big) = O_p\big(p^2n^{-1}\big),$$
proving (A.10), as $p^2/n\to 0$ by the conditions of the theorem.
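The limiting behaviour established in this proof can be illustrated numerically. The following sketch is not the paper's setup: it replaces $V$ by an exact rank-$p$ projection built from a random basis and takes Gaussian errors, so that $\varepsilon'V\varepsilon/\sigma_0^2 \sim \chi^2_p$ and the standardized statistic of (A.1) has mean 0 and variance 1 exactly; all names (`Psi`, `stats`, the sample sizes) are illustrative.

```python
# Sanity check of the standardization in (A.1): with Gaussian errors and V an
# exact rank-p projection, eps'V eps / sigma0^2 is chi-square(p), so
# (eps'V eps / sigma0^2 - p) / sqrt(2p) has mean 0 and variance 1.
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma0, reps = 200, 8, 1.5, 4000

# Rank-p projection V = Psi (Psi'Psi)^{-1} Psi' from a random n x p basis.
Psi = rng.standard_normal((n, p))
V = Psi @ np.linalg.solve(Psi.T @ Psi, Psi.T)

stats = np.empty(reps)
for r in range(reps):
    eps = sigma0 * rng.standard_normal(n)
    stats[r] = (eps @ V @ eps / sigma0**2 - p) / np.sqrt(2 * p)

print(round(stats.mean(), 3), round(stats.var(), 3))  # close to 0 and 1
```

With skewness of order $\sqrt{8/p}$, the normal approximation sharpens as $p$ grows, in line with the requirement that $p\to\infty$ in the theorem.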
Proof of Theorem 4.4. See Appendix B.
Proof of Theorem 5.1. Due to the similarity with proofs in Delgado and Robinson (2015)
and Gupta and Robinson (2018), the details are in the supplementary appendix.
Proof of Theorem 5.2. Denote by $\theta_*$ the solution of $\min_\theta E\big(y_i - \sum_{j=1}^{d_\lambda}\lambda_j w_{i,j}'y - \theta(x_i)\big)^2$. Put $\theta_{*i}=\theta_*(x_i)$, $\theta_{0i}=\theta_0(x_i)$, $\hat\theta_i=\psi_i'\hat\beta$, $\hat f_i=f(x_i,\hat\alpha)$, $f_{*i}=f(x_i,\alpha_*)$. Then
$$\hat u_i = y_i - \sum_{j=1}^{d_\lambda}\hat\lambda_j w_{i,j}'y - f(x_i,\hat\alpha) = u_i + \theta_{0i} + \sum_{j=1}^{d_\lambda}(\lambda_{j0}-\hat\lambda_j)w_{i,j}'y - \hat f_i.$$
Proceeding as in the proof of Theorem 4.2, we obtain $nm_n = \hat\sigma^{-2}u'\Sigma(\hat\gamma)^{-1}\Psi[\Psi'\Sigma(\hat\gamma)^{-1}\Psi]^{-1}\Psi'\Sigma(\hat\gamma)^{-1}u + \hat\sigma^{-2}\sum_{j=1}^{7}A_j$. Thus, compared to the test statistic with no spatial lag (cf. the proof of Theorem 4.2), we have the additional terms
$$A_5 = \Big(\sum_{j=1}^{d_\lambda}(\lambda_{j0}-\hat\lambda_j)W_jy\Big)'\Sigma(\hat\gamma)^{-1}\Psi[\Psi'\Sigma(\hat\gamma)^{-1}\Psi]^{-1}\Psi'\Sigma(\hat\gamma)^{-1}\Big(\sum_{j=1}^{d_\lambda}(\lambda_{j0}-\hat\lambda_j)W_jy\Big),$$
$$A_6 = \sum_{j=1}^{d_\lambda}(\lambda_{j0}-\hat\lambda_j)y'W_j'\Sigma(\hat\gamma)^{-1}\Psi[\Psi'\Sigma(\hat\gamma)^{-1}\Psi]^{-1}\Psi'\Sigma(\hat\gamma)^{-1}(u+\theta_0-\hat f),$$
$$A_7 = \Big(\Psi\big(\Psi'\Sigma(\hat\gamma)^{-1}\Psi\big)^{-1}\Psi'\Sigma(\hat\gamma)^{-1}(u+e) - e + \theta_0 - \hat f\Big)'\Sigma(\hat\gamma)^{-1}\sum_{j=1}^{d_\lambda}(\lambda_{j0}-\hat\lambda_j)W_jy.$$
We now show that $A_\ell = o_p(\sqrt{p})$, $\ell > 4$, so the leading term in $nm_n$ is the same as before. First, $\|y\| = O_p(\sqrt{n})$ from $y = (I_n - \sum_{j=1}^{d_\lambda}\lambda_{j0}W_j)^{-1}(\theta_0+u)$. Then, with $\|\lambda_0-\hat\lambda\| = O_p\big(\sqrt{d_\gamma/n}\big)$ by Lemma D.2, we have
$$|A_5| \le \|\lambda_0-\hat\lambda\|^2\sum_{j=1}^{d_\lambda}\|W_j\|^2\sup_{\gamma,j}\bigg\|\Sigma(\gamma)^{-1}\frac{1}{n}\Psi\Big(\frac{1}{n}\Psi'\Sigma(\gamma)^{-1}\Psi\Big)^{-1}\Psi'\Sigma(\gamma)^{-1}\bigg\|\,\|y\|^2 = O_p(d_\gamma/n)O_p(1)O_p(n) = O_p(d_\gamma) = o_p(\sqrt{p}).$$
Uniformly in $\gamma$ and $j$,
$$E\big(u'S^{-1\prime}W_j'\Sigma(\gamma)^{-1}\Psi[\Psi'\Sigma(\gamma)^{-1}\Psi]^{-1}\Psi'\Sigma(\gamma)^{-1}u\big) = E\,\mathrm{tr}\bigg(\Big(\frac{1}{n}\Psi'\Sigma(\gamma)^{-1}\Psi\Big)^{-1}\frac{1}{n}\Psi'\Sigma(\gamma)^{-1}\Sigma S^{-1\prime}W_j'\Sigma(\gamma)^{-1}\Psi\bigg) = O_p(p)$$
and
$$E\big(\theta_0'S^{-1\prime}W_j'\Sigma(\gamma)^{-1}\Psi[\Psi'\Sigma(\gamma)^{-1}\Psi]^{-1}\Psi'\Sigma(\gamma)^{-1}u\big)^2 = O_p\bigg(\|S^{-1}\|^2\sup_\gamma\|\Sigma(\gamma)^{-1}\|^4\Big\|\frac{1}{n}\Psi\Big(\frac{1}{n}\Psi'\Sigma(\gamma)^{-1}\Psi\Big)^{-1}\Psi'\Big\|^2\sup_j\|W_j\|^2\,\|\Sigma\|\,\|\theta_0\|^2\bigg) = O_p(n).$$
Similarly, $\theta_0'S^{-1\prime}W_j'\Sigma(\gamma)^{-1}\Psi[\Psi'\Sigma(\gamma)^{-1}\Psi]^{-1}\Psi'\Sigma(\gamma)^{-1}W_j\theta_0 = O_p(n)$, uniformly. Therefore,
$$\bigg|\sum_{j=1}^{d_\lambda}(\lambda_{j0}-\hat\lambda_j)y'W_j'\Sigma(\hat\gamma)^{-1}\Psi[\Psi'\Sigma(\hat\gamma)^{-1}\Psi]^{-1}\Psi'\Sigma(\hat\gamma)^{-1}u\bigg| = \bigg|\sum_{j=1}^{d_\lambda}(\lambda_{j0}-\hat\lambda_j)(\theta_0+u)'S^{-1\prime}W_j'\Sigma(\hat\gamma)^{-1}\Psi[\Psi'\Sigma(\hat\gamma)^{-1}\Psi]^{-1}\Psi'\Sigma(\hat\gamma)^{-1}u\bigg|$$
$$\le d_\lambda\|\lambda_0-\hat\lambda\|\sup_{\gamma,j}\big|\theta_0'S^{-1\prime}W_j'\Sigma(\gamma)^{-1}\Psi[\Psi'\Sigma(\gamma)^{-1}\Psi]^{-1}\Psi'\Sigma(\gamma)^{-1}u\big| + d_\lambda\|\lambda_0-\hat\lambda\|\sup_{\gamma,j}\big|u'S^{-1\prime}W_j'\Sigma(\gamma)^{-1}\Psi[\Psi'\Sigma(\gamma)^{-1}\Psi]^{-1}\Psi'\Sigma(\gamma)^{-1}u\big|$$
$$= O_p\big(\sqrt{d_\gamma/n}\big)O_p(\sqrt{n}) + O_p\big(\sqrt{d_\gamma/n}\big)O_p(p) = O_p\big(\sqrt{d_\gamma}\big) = o_p(\sqrt{p}),$$
and
$$\bigg|\sum_{j=1}^{d_\lambda}(\lambda_{j0}-\hat\lambda_j)y'W_j'\Sigma(\hat\gamma)^{-1}\Psi[\Psi'\Sigma(\hat\gamma)^{-1}\Psi]^{-1}\Psi'\Sigma(\hat\gamma)^{-1}(\theta_0-\hat f)\bigg| \le d_\lambda\|\lambda_0-\hat\lambda\|\,\|y\|\sup_j\|W_j\|\sup_\gamma\bigg\|\frac{1}{n}\Psi\Big(\frac{1}{n}\Psi'\Sigma(\gamma)^{-1}\Psi\Big)^{-1}\Psi'\bigg\|\sup_\gamma\|\Sigma(\gamma)^{-1}\|^2\,\|\theta_0-\hat f\|$$
$$= O_p\big(\sqrt{d_\gamma/n}\big)O_p(\sqrt{n})O_p\big(p^{1/4}\big) = O_p\big(\sqrt{d_\gamma}\,p^{1/4}\big) = o_p(\sqrt{p}),$$
so that $A_6 = o_p(\sqrt{p})$. Finally,
$$\bigg|\sum_{j=1}^{d_\lambda}(\lambda_{j0}-\hat\lambda_j)y'W_j'\Sigma(\hat\gamma)^{-1}\Psi[\Psi'\Sigma(\hat\gamma)^{-1}\Psi]^{-1}\Psi'\Sigma(\hat\gamma)^{-1}e\bigg| \le d_\lambda\|\lambda_0-\hat\lambda\|\,\|y\|\sup_j\|W_j\|\sup_\gamma\bigg\|\frac{1}{n}\Psi\Big(\frac{1}{n}\Psi'\Sigma(\gamma)^{-1}\Psi\Big)^{-1}\Psi'\bigg\|\sup_\gamma\|\Sigma(\gamma)^{-1}\|^2\,\|e\|$$
$$= O_p\big(\sqrt{d_\gamma/n}\big)O_p(\sqrt{n})O_p\big(p^{-\mu}\sqrt{n}\big) = O_p\big(\sqrt{d_\gamma}\,p^{-\mu}\sqrt{n}\big) = o_p(\sqrt{p}),$$
and
$$\bigg|(e+\theta_0-\hat f)'\Sigma(\hat\gamma)^{-1}\sum_{j=1}^{d_\lambda}(\lambda_{j0}-\hat\lambda_j)W_jy\bigg| \le d_\lambda\|\lambda_0-\hat\lambda\|\big(\|e\|+\|\theta_0-\hat f\|\big)\sup_\gamma\|\Sigma(\gamma)^{-1}\|\sup_j\|W_j\|\,\|y\| = O_p\big(\sqrt{d_\gamma/n}\big)O_p\big(p^{-\mu}\sqrt{n}+p^{1/4}\big)O_p(\sqrt{n}) = O_p\big(\sqrt{d_\gamma}\,p^{-\mu}\sqrt{n}+\sqrt{d_\gamma}\,p^{1/4}\big) = o_p(\sqrt{p}),$$
implying that $A_7 = o_p(\sqrt{p})$.
Proof of Theorem 5.3. Omitted as it is similar to the proof of Theorem 4.4.
Proof of Proposition 6.1: Because the map $\Sigma : \mathcal{T}^o \to \mathcal{M}_{n\times n}$ is Fréchet-differentiable on $\mathcal{T}^o$, it is also Gâteaux-differentiable and the two derivative maps coincide. Thus by Theorem 1.8 of Ambrosetti and Prodi (1995),
$$\|\Sigma(t_1)-\Sigma(t_2)\| \le \sup_{t\in\mathcal{T}^o}\|D\Sigma(t)\|_{L(\mathcal{T}^o,\mathcal{M}_{n\times n})}\bigg(\|\gamma_1-\gamma_2\| + \sum_{\ell=1}^{d_\zeta}\big\|(\delta_{\ell 1}-\delta_{\ell 2})'\varphi_\ell\big\|_w\bigg), \tag{A.14}$$
where
$$\sum_{\ell=1}^{d_\zeta}\big\|(\delta_{\ell 1}-\delta_{\ell 2})'\varphi_\ell\big\|_w = \sum_{\ell=1}^{d_\zeta}\sup_{z\in\mathcal{Z}}\big|(\delta_{\ell 1}-\delta_{\ell 2})'\varphi_\ell(z)\big|\big(1+\|z\|^2\big)^{-w/2} \le \sum_{\ell=1}^{d_\zeta}\|\delta_{\ell 1}-\delta_{\ell 2}\|\sup_{z\in\mathcal{Z}}\|\varphi_\ell(z)\|\big(1+\|z\|^2\big)^{-w/2} \le C\varsigma(r)\sum_{\ell=1}^{d_\zeta}\|\delta_{\ell 1}-\delta_{\ell 2}\| \le C\varsigma(r)\|t_1-t_2\|.$$
The claim now follows by (6.8) in Assumption NPN.2, because $\|\gamma_1-\gamma_2\| \le C\varsigma(r)\|t_1-t_2\|$ for some suitably chosen $C$.
Proof of Theorem 6.1. The proof is omitted as it is entirely analogous to that of Theorem 5.1, with the exception of one difference when proving equicontinuity. In the setting of Section 6, we obtain via Proposition 6.1 that $\|\Sigma(\tau)-\Sigma(\tau_*)\| = O_p(\varsigma(r)\varepsilon)$. Because $\varepsilon > 0$ is arbitrarily small, we may choose it smaller than $\varepsilon'/\varsigma(r)$, for some arbitrary $\varepsilon' > 0$.
Proof of Theorem 6.2. Writing $\hat\delta(z) = \big(\hat\delta_1'\varphi_1(z),\dots,\hat\delta_{d_\zeta}'\varphi_{d_\zeta}(z)\big)'$ and taking $t_1 = (\hat\gamma',\hat\delta(z)')'$ and $t_2 = (\gamma_0',\zeta_0(z)')'$ in Proposition 6.1 implies (we suppress the argument $z$)
$$\|\Sigma(\hat\tau)-\Sigma\| = O_p\big(\varsigma(r)\big(\|\hat\gamma-\gamma_0\|+\|\hat\delta-\zeta_0\|\big)\big) = O_p\big(\varsigma(r)\big(\|\hat\tau-\tau_0\|+\|\nu\|\big)\big) = O_p\bigg(\varsigma(r)\max\bigg\{\sqrt{d_\tau/n},\ \sqrt{\textstyle\sum_{\ell=1}^{d_\zeta}r_\ell^{-2\kappa_\ell}}\bigg\}\bigg),$$
uniformly on $\mathcal{Z}$. Thus we have
$$\|\Sigma(\hat\tau)^{-1}-\Sigma^{-1}\| \le \|\Sigma(\hat\tau)^{-1}\|\,\|\Sigma(\hat\tau)-\Sigma\|\,\|\Sigma^{-1}\| = O_p\bigg(\varsigma(r)\max\bigg\{\sqrt{d_\tau/n},\ \sqrt{\textstyle\sum_{\ell=1}^{d_\zeta}r_\ell^{-2\kappa_\ell}}\bigg\}\bigg),$$
and similarly
$$\bigg\|\Big(\frac{1}{n}\Psi'\Sigma(\hat\tau)^{-1}\Psi\Big)^{-1}-\Big(\frac{1}{n}\Psi'\Sigma^{-1}\Psi\Big)^{-1}\bigg\| \le \bigg\|\Big(\frac{1}{n}\Psi'\Sigma(\hat\tau)^{-1}\Psi\Big)^{-1}\bigg\|\,\bigg\|\frac{1}{n}\Psi'\big(\Sigma(\hat\tau)^{-1}-\Sigma^{-1}\big)\Psi\bigg\|\,\bigg\|\Big(\frac{1}{n}\Psi'\Sigma^{-1}\Psi\Big)^{-1}\bigg\| = O_p\big(\|\Sigma(\hat\tau)^{-1}-\Sigma^{-1}\|\big) = O_p\bigg(\varsigma(r)\max\bigg\{\sqrt{d_\tau/n},\ \sqrt{\textstyle\sum_{\ell=1}^{d_\zeta}r_\ell^{-2\kappa_\ell}}\bigg\}\bigg).$$
As in the proof of Theorem 4.2, $nm_n = \hat\sigma^{-2}u'\Sigma(\hat\tau)^{-1}\Psi[\Psi'\Sigma(\hat\tau)^{-1}\Psi]^{-1}\Psi'\Sigma(\hat\tau)^{-1}u + \hat\sigma^{-2}\sum_{k=1}^{4}A_k$, where $\hat\gamma$ in the parametric setting is changed to $\hat\tau$ in this nonparametric setting. Then, by the mean value theorem,
$$\big|u'\big(\Sigma(\hat\tau)^{-1}\Psi[\Psi'\Sigma(\hat\tau)^{-1}\Psi]^{-1}\Psi'\Sigma(\hat\tau)^{-1} - \Sigma^{-1}\Psi[\Psi'\Sigma^{-1}\Psi]^{-1}\Psi'\Sigma^{-1}\big)u\big|$$
$$\le 2\bigg(\sup_t\Big\|\frac{1}{\sqrt{n}}u'\Sigma(t)^{-1}\Psi\Big\|\,\bigg\|\Big(\frac{1}{n}\Psi'\Sigma(t)^{-1}\Psi\Big)^{-1}\bigg\|\bigg)\sum_{j=1}^{d_\tau}\Big\|\frac{1}{\sqrt{n}}\Psi'\big(\Sigma(\bar\tau)^{-1}\Sigma_j(\bar\tau)\Sigma(\bar\tau)^{-1}\big)u\Big\|\,|\hat\tau_j-\tau_{j0}|$$
$$\quad + 2\sup_t\Big\|\frac{1}{\sqrt{n}}u'\Sigma(t)^{-1}\Psi\Big\|\,\bigg\|\Big(\frac{1}{n}\Psi'\Sigma(t)^{-1}\Psi\Big)^{-1}\bigg\|\,\Big\|\frac{1}{\sqrt{n}}\Psi'(\Sigma_0-\Sigma)u\Big\| + \Big\|\frac{1}{\sqrt{n}}u'\Sigma^{-1}\Psi\Big\|^2\,\bigg\|\Big(\frac{1}{n}\Psi'\Sigma(\hat\tau)^{-1}\Psi\Big)^{-1}-\Big(\frac{1}{n}\Psi'\Sigma^{-1}\Psi\Big)^{-1}\bigg\|$$
$$= O_p(\sqrt{p})O_p\big(d_\tau\sqrt{p}\,\varsigma(r)/\sqrt{n}\big) + O_p(\sqrt{p})O_p\bigg(\sqrt{p}\,\varsigma(r)\sqrt{\textstyle\sum_{\ell=1}^{d_\zeta}r_\ell^{-2\kappa_\ell}}\bigg) + O_p(p)O_p\bigg(\varsigma(r)\max\bigg\{\sqrt{d_\tau/n},\ \sqrt{\textstyle\sum_{\ell=1}^{d_\zeta}r_\ell^{-2\kappa_\ell}}\bigg\}\bigg)$$
$$= O_p\bigg(p\,\varsigma(r)\max\bigg\{d_\tau/\sqrt{n},\ \sqrt{\textstyle\sum_{\ell=1}^{d_\zeta}r_\ell^{-2\kappa_\ell}}\bigg\}\bigg) = o_p(\sqrt{p}),$$
where $\bar\tau$ lies between $\hat\tau$ and $\tau_0$ and the last equality holds under the conditions of the theorem. Next, it remains to show $A_k = o_p(p^{1/2})$, $k=1,\dots,4$. The order of $A_k$, $k\le 3$, is the same as in the parametric case:
$$|A_1| = \big|u'\Sigma(\hat\tau)^{-1}\big(\theta_0-\hat f\big)\big| \le \sup_{\alpha,t}\sum_{j=1}^{d_\alpha}\Big\|u'\Sigma(t)^{-1}\frac{\partial f(x,\alpha)}{\partial\alpha_j}\Big\|\,|\alpha_{*j}-\hat\alpha_j| + \frac{p^{1/4}}{n^{1/2}}\sup_t\big\|u'\Sigma(t)^{-1}h\big\| = O_p(\sqrt{n})O_p\Big(\frac{1}{\sqrt{n}}\Big)+O\Big(\frac{p^{1/4}}{n^{1/2}}\Big)O_p(\sqrt{n}) = O_p\big(p^{1/4}\big) = o_p\big(p^{1/2}\big),$$
$$|A_2| = \big|(u+\theta_0-\hat f)'\big(\Sigma(\hat\tau)^{-1}-\Sigma(\hat\tau)^{-1}\Psi[\Psi'\Sigma(\hat\tau)^{-1}\Psi]^{-1}\Psi'\Sigma(\hat\tau)^{-1}\big)e\big| \le \sup_t\big|u'\Sigma(t)^{-1}e\big| + \sup_t\big|u'\Sigma(t)^{-1}\Psi[\Psi'\Sigma(t)^{-1}\Psi]^{-1}\Psi'\Sigma(t)^{-1}e\big| + \big\|\theta_0-\hat f\big\|\sup_t\big(\|\Sigma(t)^{-1}\|+\|\Sigma(t)^{-1}\Psi[\Psi'\Sigma(t)^{-1}\Psi]^{-1}\Psi'\Sigma(t)^{-1}\|\big)\|e\|$$
$$= O_p\big(p^{-\mu}n^{1/2}\big)+O_p\big(p^{-\mu+1/4}n^{1/2}\big) = O_p\big(p^{-\mu+1/4}n^{1/2}\big) = o_p(\sqrt{p}),$$
$$|A_3| = \big|u'\Sigma(\hat\tau)^{-1}\Psi\big(\Psi'\Sigma(\hat\tau)^{-1}\Psi\big)^{-1}\Psi'\Sigma(\hat\tau)^{-1}\big(\theta_0-\hat f\big)\big| \le \sup_{\alpha,t}\sum_{j=1}^{d_\alpha}\Big\|u'\Sigma(t)^{-1}\Psi\big(\Psi'\Sigma(t)^{-1}\Psi\big)^{-1}\Psi'\Sigma(t)^{-1}\frac{\partial f(x,\alpha)}{\partial\alpha_j}\Big\|\,|\alpha_{*j}-\hat\alpha_j| + \frac{p^{1/4}}{n^{1/2}}\sup_t\big\|u'\Sigma(t)^{-1}\Psi\big(\Psi'\Sigma(t)^{-1}\Psi\big)^{-1}\Psi'\Sigma(t)^{-1}h\big\| = O_p(1)+O_p\big(p^{1/4}\big) = O_p\big(p^{1/4}\big) = o_p\big(p^{1/2}\big).$$
However, $A_4$ has a different order. Under $H_\ell$,
$$A_4 = \big(\theta_0-\hat f\big)'\Sigma(\hat\tau)^{-1}\big(\theta_0-\hat f\big) = \big(\theta_0-\hat f\big)'\Sigma_0^{-1}\big(\theta_0-\hat f\big)+\big(\theta_0-\hat f\big)'\big(\Sigma(\hat\tau)^{-1}-\Sigma_0^{-1}\big)\big(\theta_0-\hat f\big)$$
$$= \frac{p^{1/2}}{n}h'\Sigma_0^{-1}h + o_p(1) + O_p\big(p^{1/2}\big)O_p\bigg(\varsigma(r)\max\bigg\{\sqrt{d_\tau/n},\ \sqrt{\textstyle\sum_{\ell=1}^{d_\zeta}r_\ell^{-2\kappa_\ell}}\bigg\}\bigg) = \frac{p^{1/2}}{n}h'\Sigma_0^{-1}h + o_p(\sqrt{p}),$$
where the last equality holds under the conditions of the theorem. Combining these together, we have
$$nm_n = \hat\sigma^{-2}\hat v'\Sigma(\hat\tau)^{-1}\hat u = \sigma_0^{-2}\varepsilon'V\varepsilon + \frac{p^{1/2}}{n}h'\Sigma_0^{-1}h + o_p(\sqrt{p})$$
under $H_\ell$, and the same expression holds with $h=0$ under $H_0$.
Proof of Theorem 6.3. Omitted as it is similar to the proof of Theorem 4.4.
B Proofs of Theorems 4.2 and 4.4
Proof of Theorem 4.2. From Corollary 4.1 and Lemma D.2, $\|\Sigma(\hat\gamma)-\Sigma\| = O_p(\|\hat\gamma-\gamma_0\|) = O_p\big(\sqrt{d_\gamma/n}\big)$, so we have, from Assumption R.3,
$$\|\Sigma(\hat\gamma)^{-1}-\Sigma^{-1}\| \le \|\Sigma(\hat\gamma)^{-1}\|\,\|\Sigma(\hat\gamma)-\Sigma\|\,\|\Sigma^{-1}\| = O_p(\|\hat\gamma-\gamma_0\|) = O_p\big(\sqrt{d_\gamma/n}\big). \tag{B.1}$$
Similarly,
$$\bigg\|\Big(\frac{1}{n}\Psi'\Sigma(\hat\gamma)^{-1}\Psi\Big)^{-1}-\Big(\frac{1}{n}\Psi'\Sigma^{-1}\Psi\Big)^{-1}\bigg\| \le \bigg\|\Big(\frac{1}{n}\Psi'\Sigma(\hat\gamma)^{-1}\Psi\Big)^{-1}\bigg\|\,\bigg\|\frac{1}{n}\Psi'\big(\Sigma(\hat\gamma)^{-1}-\Sigma^{-1}\big)\Psi\bigg\|\,\bigg\|\Big(\frac{1}{n}\Psi'\Sigma^{-1}\Psi\Big)^{-1}\bigg\|$$
$$\le \sup_{\gamma\in\Gamma}\bigg\|\Big(\frac{1}{n}\Psi'\Sigma(\gamma)^{-1}\Psi\Big)^{-1}\bigg\|\,\|\Sigma(\hat\gamma)^{-1}-\Sigma^{-1}\|\,\Big\|\frac{1}{\sqrt{n}}\Psi\Big\|^2 = O_p(\|\hat\gamma-\gamma_0\|) = O_p\big(\sqrt{d_\gamma/n}\big).$$
By Assumption R.2, we have $\hat\alpha-\alpha_* = O_p(1/\sqrt{n})$. Denote $\theta_*(x) = \psi(x)'\beta_*$, where $\beta_* = \arg\min_\beta E[y_i-\psi(x_i)'\beta]^2$, and set $\theta_{*i} = \theta_*(x_i)$, $\theta_{0i} = \theta_0(x_i)$, $\hat\theta_i = \psi_i'\hat\beta$, $\hat f_i = f(x_i,\hat\alpha)$, $f_{*i} = f(x_i,\alpha_*)$. Then $\hat u_i = y_i - f(x_i,\hat\alpha) = u_i + \theta_{0i} - \hat f_i$. Let $\theta_0 = (\theta_0(x_1),\dots,\theta_0(x_n))'$ as before, with similar component-wise notation for the $n$-dimensional vectors $\theta_*$, $\hat f$ and $\hat u$. As the approximation error is $e = \theta_0-\theta_* = \theta_0-\Psi\beta_*$,
$$\hat\theta-\theta_* = \Psi(\hat\beta-\beta_*) = \Psi\big(\Psi'\Sigma(\hat\gamma)^{-1}\Psi\big)^{-1}\Psi'\Sigma(\hat\gamma)^{-1}(u+\theta_0-\Psi\beta_*) = \Psi\big(\Psi'\Sigma(\hat\gamma)^{-1}\Psi\big)^{-1}\Psi'\Sigma(\hat\gamma)^{-1}(u+e),$$
so that
$$nm_n = \hat\sigma^{-2}\hat v'\Sigma(\hat\gamma)^{-1}\hat u = \hat\sigma^{-2}\big(\hat\theta-\hat f\big)'\Sigma(\hat\gamma)^{-1}\big(y-\hat f\big) = \hat\sigma^{-2}\big(\hat\theta-\theta_*+\theta_*-\theta_0+\theta_0-\hat f\big)'\Sigma(\hat\gamma)^{-1}\big(u+\theta_0-\hat f\big)$$
$$= \hat\sigma^{-2}\Big[\Psi\big(\Psi'\Sigma(\hat\gamma)^{-1}\Psi\big)^{-1}\Psi'\Sigma(\hat\gamma)^{-1}(u+e)-e+\theta_0-\hat f\Big]'\Sigma(\hat\gamma)^{-1}\big(u+\theta_0-\hat f\big)$$
$$= \hat\sigma^{-2}u'\Sigma(\hat\gamma)^{-1}\Psi[\Psi'\Sigma(\hat\gamma)^{-1}\Psi]^{-1}\Psi'\Sigma(\hat\gamma)^{-1}u + \hat\sigma^{-2}u'\Sigma(\hat\gamma)^{-1}\big(\theta_0-\hat f\big) - \hat\sigma^{-2}\big(u+\theta_0-\hat f\big)'\Sigma(\hat\gamma)^{-1}\big(I-\Psi[\Psi'\Sigma(\hat\gamma)^{-1}\Psi]^{-1}\Psi'\Sigma(\hat\gamma)^{-1}\big)e$$
$$\quad + \hat\sigma^{-2}\big(\theta_0-\hat f\big)'\Sigma(\hat\gamma)^{-1}\Psi[\Psi'\Sigma(\hat\gamma)^{-1}\Psi]^{-1}\Psi'\Sigma(\hat\gamma)^{-1}u + \hat\sigma^{-2}\big(\theta_0-\hat f\big)'\Sigma(\hat\gamma)^{-1}\big(\theta_0-\hat f\big)$$
$$= \hat\sigma^{-2}u'\Sigma(\hat\gamma)^{-1}\Psi[\Psi'\Sigma(\hat\gamma)^{-1}\Psi]^{-1}\Psi'\Sigma(\hat\gamma)^{-1}u + \hat\sigma^{-2}(A_1+A_2+A_3+A_4),$$
say. First, for any vector $g$ comprising conditioned random variables,
$$E\big[\big(u'\Sigma(\gamma)^{-1}g\big)^2\big] = g'\Sigma(\gamma)^{-1}\Sigma\Sigma(\gamma)^{-1}g \le \sup_{\gamma\in\Gamma}\|\Sigma(\gamma)^{-1}\|^2\|\Sigma\|\|g\|^2 = O_p\big(\|g\|^2\big),$$
uniformly in $\gamma\in\Gamma$, where the expectation is taken conditional on $g$. Similarly,
$$E\Big[\big(u'\Sigma(\gamma)^{-1}\Psi\big(\Psi'\Sigma(\gamma)^{-1}\Psi\big)^{-1}\Psi'\Sigma(\gamma)^{-1}g\big)^2\Big] = g'\Sigma(\gamma)^{-1}\Psi\big(\Psi'\Sigma(\gamma)^{-1}\Psi\big)^{-1}\Psi'\Sigma(\gamma)^{-1}\Sigma\Sigma(\gamma)^{-1}\Psi\big(\Psi'\Sigma(\gamma)^{-1}\Psi\big)^{-1}\Psi'\Sigma(\gamma)^{-1}g$$
$$\le \sup_{\gamma\in\Gamma}\|\Sigma(\gamma)^{-1}\|^4\|\Sigma\|\bigg\|\frac{1}{n}\Psi\Big(\frac{1}{n}\Psi'\Sigma(\gamma)^{-1}\Psi\Big)^{-1}\Psi'\bigg\|^2\|g\|^2 = O_p\big(\|g\|^2\big),$$
uniformly, and, for any $j=1,\dots,d_\gamma$,
$$E\big[\big(u'\Sigma(\gamma)^{-1}\Sigma_j(\gamma)\Sigma(\gamma)^{-1}g\big)^2\big] = g'\Sigma(\gamma)^{-1}\Sigma_j(\gamma)\Sigma(\gamma)^{-1}\Sigma\Sigma(\gamma)^{-1}\Sigma_j(\gamma)\Sigma(\gamma)^{-1}g \le \sup_{\gamma\in\Gamma}\|\Sigma(\gamma)^{-1}\|^4\|\Sigma_j(\gamma)\|^2\|\Sigma\|\|g\|^2 = O_p\big(\|g\|^2\big).$$
Let $\Psi_k$ be the $k$-th column of $\Psi$, $k=1,\dots,p$. Then we have $\|\Psi_k/\sqrt{n}\| = O_p(1)$ and, for any $\gamma\in\Gamma$,
$$E\Big\|\frac{1}{\sqrt{n}}u'\Sigma(\gamma)^{-1}\Psi\Big\|^2 \le \sum_{k=1}^{p}E\Big(u'\Sigma(\gamma)^{-1}\frac{1}{\sqrt{n}}\Psi_k\Big)^2 = O_p(p), \qquad E\Big\|\frac{1}{\sqrt{n}}u'\Sigma(\gamma)^{-1}\Sigma_j(\gamma)\Sigma(\gamma)^{-1}\Psi\Big\|^2 \le \sum_{k=1}^{p}E\Big(u'\Sigma(\gamma)^{-1}\Sigma_j(\gamma)\Sigma(\gamma)^{-1}\frac{1}{\sqrt{n}}\Psi_k\Big)^2 = O_p(p).$$
Therefore, by Chebyshev's inequality,
$$\sup_{\gamma\in\Gamma}\Big\|\frac{1}{\sqrt{n}}u'\Sigma(\gamma)^{-1}\Psi\Big\| = O_p(\sqrt{p}) \quad\text{and}\quad \sup_{\gamma\in\Gamma}\Big\|\frac{1}{\sqrt{n}}u'\Sigma(\gamma)^{-1}\Sigma_j(\gamma)\Sigma(\gamma)^{-1}\Psi\Big\| = O_p(\sqrt{p}).$$
By the decomposition
$$u'\big(\Sigma(\hat\gamma)^{-1}\Psi[\Psi'\Sigma(\hat\gamma)^{-1}\Psi]^{-1}\Psi'\Sigma(\hat\gamma)^{-1} - \Sigma^{-1}\Psi[\Psi'\Sigma^{-1}\Psi]^{-1}\Psi'\Sigma^{-1}\big)u$$
$$= u'\big(\Sigma(\hat\gamma)^{-1}+\Sigma^{-1}\big)\Psi[\Psi'\Sigma(\hat\gamma)^{-1}\Psi]^{-1}\Psi'\Big(\sum_{i=1}^{n}e_{in}e_{in}'\Big)\big(\Sigma(\hat\gamma)^{-1}-\Sigma^{-1}\big)u + u'\Sigma^{-1}\Psi\big([\Psi'\Sigma(\hat\gamma)^{-1}\Psi]^{-1}-[\Psi'\Sigma^{-1}\Psi]^{-1}\big)\Psi'\Sigma^{-1}u,$$
where $e_{in}$ is an $n\times 1$ vector with $i$-th entry one and zeros elsewhere, so $\sum_{i=1}^{n}e_{in}e_{in}' = I_n$, and, by the mean value theorem,
$$e_{in}'\big(\Sigma(\hat\gamma)^{-1}-\Sigma^{-1}\big)u = e_{in}'\sum_{j=1}^{d_\gamma}\big(\Sigma(\bar\gamma)^{-1}\Sigma_j(\bar\gamma)\Sigma(\bar\gamma)^{-1}\big)u(\hat\gamma_j-\gamma_{j0}),$$
where $\bar\gamma$ is a value between $\hat\gamma$ and $\gamma_0$. We have
$$\big|u'\big(\Sigma(\hat\gamma)^{-1}\Psi[\Psi'\Sigma(\hat\gamma)^{-1}\Psi]^{-1}\Psi'\Sigma(\hat\gamma)^{-1} - \Sigma^{-1}\Psi[\Psi'\Sigma^{-1}\Psi]^{-1}\Psi'\Sigma^{-1}\big)u\big|$$
$$\le 2\sup_{\gamma\in\Gamma}\Big\|\frac{1}{\sqrt{n}}u'\Sigma(\gamma)^{-1}\Psi\Big\|\,\bigg\|\Big(\frac{1}{n}\Psi'\Sigma(\gamma)^{-1}\Psi\Big)^{-1}\bigg\|\sum_{j=1}^{d_\gamma}\Big\|\frac{1}{\sqrt{n}}\Psi'\big(\Sigma(\bar\gamma)^{-1}\Sigma_j(\bar\gamma)\Sigma(\bar\gamma)^{-1}\big)u\Big\|\,|\hat\gamma_j-\gamma_{j0}| + \Big\|\frac{1}{\sqrt{n}}u'\Sigma^{-1}\Psi\Big\|^2\,\bigg\|\Big(\frac{1}{n}\Psi'\Sigma(\hat\gamma)^{-1}\Psi\Big)^{-1}-\Big(\frac{1}{n}\Psi'\Sigma^{-1}\Psi\Big)^{-1}\bigg\|$$
$$= O_p(\sqrt{p})O_p\big(d_\gamma\sqrt{p}/\sqrt{n}\big) + O_p(p)O_p\big(\sqrt{d_\gamma}/\sqrt{n}\big) = O_p\big(d_\gamma p/\sqrt{n}\big) = o_p(\sqrt{p}),$$
where the last equality holds under the conditions of the theorem.

It remains to show that
$$A_i = o_p\big(p^{1/2}\big), \quad i=1,\dots,4. \tag{B.2}$$
It is convenient to perform the calculations under $H_\ell$, which covers $H_0$ as a particular case. Using the mean value theorem and either $H_0$ or $H_\ell$, we can express
$$\theta_{0i}-\hat f_i = f_{*i}-\hat f_i-\frac{p^{1/4}}{n^{1/2}}h_i = \sum_{j=1}^{d_\alpha}\frac{\partial f(x_i,\bar\alpha)}{\partial\alpha_j}(\alpha_{*j}-\hat\alpha_j) - \frac{p^{1/4}}{n^{1/2}}h_i, \tag{B.3}$$
where $\bar\alpha_j$ is a value between $\alpha_{*j}$ and $\hat\alpha_j$. Then, for any $j=1,\dots,d_\alpha$, $|\alpha_{*j}-\hat\alpha_j| = O_p(1/\sqrt{n})$. Based on
$$\sup_{\gamma\in\Gamma}\big|u'\Sigma(\gamma)^{-1}\Psi\big(\Psi'\Sigma(\gamma)^{-1}\Psi\big)^{-1}\Psi'\Sigma(\gamma)^{-1}g\big| = O_p(\|g\|) \quad\text{and}\quad \sup_{\gamma\in\Gamma}\big|u'\Sigma(\gamma)^{-1}g\big| = O_p(\|g\|)$$
for any conditioned vector $g$, if we take $g = \partial f(x,\bar\alpha)/\partial\alpha_j$ or $g = h$, then both satisfy $O_p(\|g\|) = O_p(\sqrt{n})$ and it follows that
$$|A_1| = \big|u'\Sigma(\hat\gamma)^{-1}\big(\theta_0-\hat f\big)\big| \le \sup_{\gamma,\alpha}\sum_{j=1}^{d_\alpha}\Big\|u'\Sigma(\gamma)^{-1}\frac{\partial f(x,\alpha)}{\partial\alpha_j}\Big\|\,|\alpha_{*j}-\hat\alpha_j| + \frac{p^{1/4}}{n^{1/2}}\sup_\gamma\big\|u'\Sigma(\gamma)^{-1}h\big\| = O_p(\sqrt{n})O_p\Big(\frac{1}{\sqrt{n}}\Big)+O\Big(\frac{p^{1/4}}{n^{1/2}}\Big)O_p(\sqrt{n}) = O_p\big(p^{1/4}\big) = o_p\big(p^{1/2}\big).$$
Similarly,
$$|A_3| = \big|u'\Sigma(\hat\gamma)^{-1}\Psi\big(\Psi'\Sigma(\hat\gamma)^{-1}\Psi\big)^{-1}\Psi'\Sigma(\hat\gamma)^{-1}\big(\theta_0-\hat f\big)\big| \le \sup_{\gamma,\alpha}\sum_{j=1}^{d_\alpha}\Big\|u'\Sigma(\gamma)^{-1}\Psi\big(\Psi'\Sigma(\gamma)^{-1}\Psi\big)^{-1}\Psi'\Sigma(\gamma)^{-1}\frac{\partial f(x,\alpha)}{\partial\alpha_j}\Big\|\,|\alpha_{*j}-\hat\alpha_j| + \frac{p^{1/4}}{n^{1/2}}\sup_\gamma\big\|u'\Sigma(\gamma)^{-1}\Psi\big(\Psi'\Sigma(\gamma)^{-1}\Psi\big)^{-1}\Psi'\Sigma(\gamma)^{-1}h\big\| = O_p(1)+O_p\big(p^{1/4}\big) = O_p\big(p^{1/4}\big) = o_p\big(p^{1/2}\big).$$
Also, by Assumptions R.2 and R.10, we have
$$\big\|\theta_0-\hat f\big\| \le \sup_\alpha\sum_{j=1}^{d_\alpha}\Big\|\frac{\partial f(x,\alpha)}{\partial\alpha_j}\Big\|\,|\alpha_{*j}-\hat\alpha_j| + \|h\|\frac{p^{1/4}}{n^{1/2}} = O_p\big(p^{1/4}\big). \tag{B.4}$$
By (3.2), we have $\|e\| = O\big(p^{-\mu}n^{1/2}\big)$ and
$$|A_2| = \big|(u+\theta_0-\hat f)'\big(\Sigma(\hat\gamma)^{-1}-\Sigma(\hat\gamma)^{-1}\Psi[\Psi'\Sigma(\hat\gamma)^{-1}\Psi]^{-1}\Psi'\Sigma(\hat\gamma)^{-1}\big)e\big| \le \sup_\gamma\big|u'\Sigma(\gamma)^{-1}e\big| + \sup_\gamma\big|u'\Sigma(\gamma)^{-1}\Psi[\Psi'\Sigma(\gamma)^{-1}\Psi]^{-1}\Psi'\Sigma(\gamma)^{-1}e\big| + \big\|\theta_0-\hat f\big\|\sup_\gamma\big(\|\Sigma(\gamma)^{-1}\|+\|\Sigma(\gamma)^{-1}\Psi[\Psi'\Sigma(\gamma)^{-1}\Psi]^{-1}\Psi'\Sigma(\gamma)^{-1}\|\big)\|e\|$$
$$= O_p\big(p^{-\mu}n^{1/2}\big)+O_p\big(p^{-\mu+1/4}n^{1/2}\big) = O_p\big(p^{-\mu+1/4}n^{1/2}\big) = o_p(\sqrt{p}),$$
where the last equality holds under the conditions of the theorem. Finally, under $H_\ell$,
$$A_4 = \big(\theta_0-\hat f\big)'\Sigma(\hat\gamma)^{-1}\big(\theta_0-\hat f\big) = \big(\theta_0-\hat f\big)'\Sigma^{-1}\big(\theta_0-\hat f\big)+\big(\theta_0-\hat f\big)'\big(\Sigma(\hat\gamma)^{-1}-\Sigma^{-1}\big)\big(\theta_0-\hat f\big) = \frac{p^{1/2}}{n}h'\Sigma^{-1}h+o_p(1)+O_p\big(p^{1/2}d_\gamma^{1/2}/n^{1/2}\big) = \frac{p^{1/2}}{n}h'\Sigma^{-1}h+o_p(\sqrt{p}).$$
Combining these together, we have
$$nm_n = \hat\sigma^{-2}\hat v'\Sigma(\hat\gamma)^{-1}\hat u = \frac{1}{\sigma_0^2}\varepsilon'V\varepsilon+\frac{p^{1/2}}{n}h'\Sigma^{-1}h+o_p(\sqrt{p})$$
under $H_\ell$, and the same expression holds with $h=0$ under $H_0$.
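The opening step (B.1) rests on the identity $A^{-1}-B^{-1} = A^{-1}(B-A)B^{-1}$, which yields $\|A^{-1}-B^{-1}\| \le \|A^{-1}\|\,\|A-B\|\,\|B^{-1}\|$ in spectral norm. A quick numerical check on random symmetric positive definite matrices (illustrative only; `A` and `B` stand in for $\Sigma(\hat\gamma)$ and $\Sigma$):

```python
# Verify ||A^{-1} - B^{-1}|| <= ||A^{-1}|| ||A - B|| ||B^{-1}|| (spectral norm),
# the perturbation bound used in (B.1), on random SPD matrix pairs.
import numpy as np

rng = np.random.default_rng(1)
spec = lambda M: np.linalg.norm(M, 2)  # spectral (operator 2-) norm

for _ in range(100):
    n = 6
    X = rng.standard_normal((n, n))
    A = X @ X.T + n * np.eye(n)          # SPD
    B = 0.5 * (A + A.T) + 0.1 * rng.standard_normal((n, n))
    B = 0.5 * (B + B.T) + n * np.eye(n)  # SPD perturbation of A
    lhs = spec(np.linalg.inv(A) - np.linalg.inv(B))
    rhs = spec(np.linalg.inv(A)) * spec(A - B) * spec(np.linalg.inv(B))
    assert lhs <= rhs + 1e-12
print("inverse-difference bound verified")
```

The bound holds for any pair of invertible matrices; positive definiteness here only mirrors the covariance setting of the proof.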
Proof of Theorem 4.4. (1) Follows from Theorems 4.2 and 4.3. (2) Following reasoning analogous to the proofs of Theorems 4.2 and 4.3, it can be shown that under $H_1$,
$$m_n = n^{-1}\sigma_*^{-2}(\theta_0-f_*)'\Sigma(\gamma_*)^{-1}(\theta_0-f_*) + o_p(1).$$
Then
$$T_n = \frac{nm_n-p}{\sqrt{2p}} = \frac{n}{\sqrt{p}}\cdot\frac{(\theta_0-f_*)'\Sigma(\gamma_*)^{-1}(\theta_0-f_*)}{\sqrt{2}\,n\sigma_*^2} + o_p\big(n/\sqrt{p}\big),$$
and for any nonstochastic sequence $\{C_n\}$ with $C_n = o(n/p^{1/2})$, $P(T_n > C_n)\to 1$, so that consistency follows. (3) Follows from Theorems 4.2 and 4.3.
C Proof of Theorem 5.1
Proof. We prove the result under $H_1$, which is the more challenging case as it involves nonparametric estimation. The proof under $H_0$ is similar. We will show $\hat\phi \overset{p}{\to} \phi_0$, whence $\hat\beta \overset{p}{\to} \beta_0$ and $\hat\sigma^2 \overset{p}{\to} \sigma_0^2$ follow from (5.3) and (5.4) respectively. First note that
$$\mathcal{L}(\phi)-\mathcal{L} = \log\frac{\tilde\sigma^2(\phi)}{\tilde\sigma^2} - n^{-1}\log\big|T'(\lambda)\Sigma(\gamma)^{-1}T(\lambda)\Sigma\big| = \log\frac{\tilde\sigma^2(\phi)}{\sigma^2(\phi)} - \log\frac{\tilde\sigma^2}{\sigma_0^2} + \log r(\phi), \tag{C.1}$$
where recall that $\sigma^2(\phi) = n^{-1}\sigma_0^2\mathrm{tr}\big(T'(\lambda)\Sigma(\gamma)^{-1}T(\lambda)\Sigma\big)$, $\tilde\sigma^2 = \tilde\sigma^2(\phi_0) = n^{-1}u'E'MEu$, using (5.4), and also $r(\phi) = n^{-1}\mathrm{tr}\big(T'(\lambda)\Sigma(\gamma)^{-1}T(\lambda)\Sigma\big)/\big|T'(\lambda)\Sigma(\gamma)^{-1}T(\lambda)\Sigma\big|^{1/n}$. We have
$$\tilde\sigma^2(\phi) = n^{-1}\big\{S^{-1}(\Psi\beta_0+u)\big\}'S'(\lambda)E(\gamma)'M(\gamma)E(\gamma)S(\lambda)S^{-1}(\Psi\beta_0+u) = c_1(\phi)+c_2(\phi)+c_3(\phi),$$
where
$$c_1(\phi) = n^{-1}\beta_0'\Psi'T'(\lambda)E(\gamma)'M(\gamma)E(\gamma)T(\lambda)\Psi\beta_0,$$
$$c_2(\phi) = n^{-1}\sigma_0^2\mathrm{tr}\big(T'(\lambda)E(\gamma)'M(\gamma)E(\gamma)T(\lambda)\Sigma\big),$$
$$c_3(\phi) = n^{-1}\mathrm{tr}\big(T'(\lambda)E(\gamma)'M(\gamma)E(\gamma)T(\lambda)(uu'-\sigma_0^2\Sigma)\big) + 2n^{-1}\beta_0'\Psi'T'(\lambda)E(\gamma)'M(\gamma)E(\gamma)T(\lambda)u.$$
Note that in the particular cases of Theorems 4.1 and 6.1, where $T(\lambda) = I_n$, the $c_1$ term vanishes because $M(\gamma)E(\gamma)\Psi = 0$ and $M(\tau)E(\tau)\Psi = 0$. Proceeding with the current, more general proof,
$$\log\frac{\tilde\sigma^2(\phi)}{\sigma^2(\phi)} = \log\frac{\tilde\sigma^2(\phi)}{c_1(\phi)+c_2(\phi)} + \log\frac{c_1(\phi)+c_2(\phi)}{\sigma^2(\phi)} = \log\bigg(1+\frac{c_3(\phi)}{c_1(\phi)+c_2(\phi)}\bigg) + \log\bigg(1+\frac{c_1(\phi)-f(\phi)}{\sigma^2(\phi)}\bigg),$$
where $f(\phi) = n^{-1}\sigma_0^2\mathrm{tr}\big(E'^{-1}T'(\lambda)E(\gamma)'(I_n-M(\gamma))E(\gamma)T(\lambda)E^{-1}\big)$. Then (C.1) implies
$$P\big(\hat\phi \in \bar N_\phi(\eta)\big) = P\Big(\inf_{\phi\in\bar N_\phi(\eta)}\mathcal{L}(\phi)-\mathcal{L} \le 0\Big) \le P\Bigg(\log\bigg(1+\sup_{\phi\in\bar N_\phi(\eta)}\bigg|\frac{c_3(\phi)}{c_1(\phi)+c_2(\phi)}\bigg|\bigg) + \big|\log\big(\tilde\sigma^2/\sigma_0^2\big)\big| \ge \inf_{\phi\in\bar N_\phi(\eta)}\bigg(\log\bigg(1+\frac{c_1(\phi)-f(\phi)}{\sigma^2(\phi)}\bigg)+\log r(\phi)\bigg)\Bigg),$$
where recall that $\bar N_\phi(\eta) = \Phi\backslash N_\phi(\eta)$, $N_\phi(\eta) = \{\phi : \|\phi-\phi_0\|<\eta\}\cap\Phi$. Because $\tilde\sigma^2/\sigma_0^2 \overset{p}{\to} 1$, the property $\log(1+x) = x+o(x)$ as $x\to 0$ implies that it is sufficient to show that
$$\sup_{\phi\in\bar N_\phi(\eta)}\bigg|\frac{c_3(\phi)}{c_1(\phi)+c_2(\phi)}\bigg| \overset{p}{\longrightarrow} 0, \tag{C.2}$$
$$\sup_{\phi\in\bar N_\phi(\eta)}\bigg|\frac{f(\phi)}{\sigma^2(\phi)}\bigg| \overset{p}{\longrightarrow} 0, \tag{C.3}$$
$$P\Bigg(\inf_{\phi\in\bar N_\phi(\eta)}\bigg\{\frac{c_1(\phi)}{\sigma^2(\phi)}+\log r(\phi)\bigg\} > 0\Bigg) \longrightarrow 1. \tag{C.4}$$
Because $\bar N_\phi(\eta) \subseteq \{\Lambda\times\bar N_\gamma(\eta/2)\}\cup\{\bar N_\lambda(\eta/2)\times\Gamma\}$, we have
$$P\Bigg(\inf_{\phi\in\bar N_\phi(\eta)}\bigg\{\frac{c_1(\phi)}{\sigma^2(\phi)}+\log r(\phi)\bigg\}>0\Bigg) \ge P\Bigg(\min\bigg\{\inf_{\Lambda\times\bar N_\gamma(\eta/2)}\frac{c_1(\phi)}{\sigma^2(\phi)},\ \inf_{\bar N_\lambda(\eta/2)}\log r(\phi)\bigg\}>0\Bigg) \ge P\Bigg(\min\bigg\{\inf_{\Lambda\times\bar N_\gamma(\eta/2)}\frac{c_1(\phi)}{C},\ \inf_{\bar N_\lambda(\eta/2)}\log r(\phi)\bigg\}>0\Bigg),$$
from Assumption SAR.2, whence Assumptions SAR.3 and SAR.4 imply (C.4). Again using Assumption SAR.2, uniformly in $\phi$, $|f(\phi)/\sigma^2(\phi)| = O_p(|f(\phi)|)$ and
$$|f(\phi)| = O_p\Big(\mathrm{tr}\big(E'^{-1}T'(\lambda)\Sigma(\gamma)^{-1}\Psi\big(\Psi'\Sigma(\gamma)^{-1}\Psi\big)^{-1}\Psi'\Sigma(\gamma)^{-1}T(\lambda)E^{-1}\big)/n\Big) = O_p\Big(\mathrm{tr}\big(E'^{-1}T'(\lambda)\Sigma(\gamma)^{-1}\Psi\Psi'\Sigma(\gamma)^{-1}T(\lambda)E^{-1}\big)/n^2\Big) = O_p\Big(\big\|\Psi'\Sigma(\gamma)^{-1}T(\lambda)E^{-1}/n\big\|_F^2\Big)$$
$$= O_p\Big(\|\Psi/n\|_F^2\,\varphi^2\big(\Sigma(\gamma)^{-1}\big)\|T(\lambda)\|^2\big\|E^{-1}\big\|^2\Big) = O_p\Big(\|\Psi/n\|_F^2\,\|T(\lambda)\|^2\,\varphi(\Sigma)/\varphi^2(\Sigma(\gamma))\Big) = O_p\big(\|T(\lambda)\|^2/n\big), \tag{C.5}$$
where we have twice made use of the inequality
$$\|AB\|_F \le \|A\|_F\|B\| \tag{C.6}$$
for generic multiplication-compatible matrices $A$ and $B$. (C.3) now follows by Assumption SAR.1 and compactness of $\Lambda$ because $T(\lambda) = I_n + \sum_{j=1}^{d_\lambda}(\lambda_{0j}-\lambda_j)G_j$. Finally consider (C.2). We first prove pointwise convergence. For any fixed $\phi\in\bar N_\phi(\eta)$ and large enough $n$, Assumptions SAR.2 and SAR.4 imply
$$\{c_1(\phi)\}^{-1} = O_p\big(\|\beta_0\|^{-2}\big) = O_p(1), \tag{C.7}$$
$$\{c_2(\phi)\}^{-1} = O_p(1), \tag{C.8}$$
because $\{n^{-1}\sigma_0^2\mathrm{tr}(T'(\lambda)\Sigma(\gamma)^{-1}T(\lambda)\Sigma)\}^{-1} = O_p(1)$ and, proceeding as in the bound for $|f(\phi)|$, $n^{-1}\mathrm{tr}\big(E'^{-1}T'(\lambda)E(\gamma)'(I-M(\gamma))E(\gamma)T(\lambda)E^{-1}\big) = O_p\big(\|T(\lambda)\|^2/n\big) = O_p(1/n)$. In fact it is worth noting, for the equicontinuity argument presented later, that Assumptions SAR.2 and SAR.4 actually imply that (C.7) and (C.8) hold uniformly over $\bar N_\phi(\eta)$, a property not needed for the present pointwise arguments. Thus $c_3(\phi)/(c_1(\phi)+c_2(\phi)) = O_p(|c_3(\phi)|)$ where, writing $B(\phi) = T'(\lambda)E(\gamma)'M(\gamma)E(\gamma)T(\lambda)$ with typical element $b_{rs}(\phi)$, $r,s=1,\dots,n$, $c_3(\phi)$ has mean 0 and variance
$$O_p\Bigg(\frac{\|B(\phi)\Sigma\|_F^2}{n^2} + \frac{\sum_{r,s,t,v=1}^{n}b_{rs}(\phi)b_{tv}(\phi)\kappa_{rstv}}{n^2} + \frac{\|\beta_0'\Psi'B(\phi)E^{-1}\|^2}{n^2}\Bigg), \tag{C.9}$$
with $\kappa_{rstv}$ denoting the fourth cumulant of $u_r,u_s,u_t,u_v$, $r,s,t,v=1,\dots,n$. Under the linear process assumed in Assumption R.4 it is known that
$$\sum_{r,s,t,v=1}^{n}\kappa_{rstv}^2 = O(n). \tag{C.10}$$
Using (C.6) and Assumptions SAR.1 and R.3, the first term in parentheses in (C.9) is
$$O_p\big(\|B(\phi)\|_F^2\,\varphi^2(\Sigma)/n^2\big) = O_p\big(\|T(\lambda)\|_F^2\|E(\gamma)\|^4\|M(\gamma)\|^2\|T(\lambda)\|^2/n^2\big) = O_p\big(\|T(\lambda)\|^4/n\varphi^2(\Sigma(\gamma))\big) = O_p\big(\|T(\lambda)\|^4/n\big), \tag{C.11}$$
while the second is similarly
$$O_p\big(\|B(\phi)\|_F^2/n\big)\Bigg(\sum_{r,s,t,v=1}^{n}\kappa_{rstv}^2/n^2\Bigg)^{1/2} = o_p\big(\|T(\lambda)\|^4\big), \tag{C.12}$$
using (C.10). Finally, the third term in parentheses in (C.9) is
$$O_p\big(\|B(\phi)\|^2/n\big) = O_p\big(\|T(\lambda)\|^4/n\big). \tag{C.13}$$
By compactness of $\Lambda$ and Assumption SAR.1, (C.11), (C.12) and (C.13) are negligible; thus pointwise convergence is established.

Uniform convergence will follow from an equicontinuity argument. First, for arbitrary $\varepsilon>0$ we can find points $\phi_* = (\lambda_*',\gamma_*')'$, possibly infinitely many, such that the neighbourhoods $\|\phi-\phi_*\|<\varepsilon$ form an open cover of $\bar N_\phi(\eta)$. Since $\Phi$ is compact, any open cover has a finite subcover and thus we may in fact choose finitely many $\phi_* = (\lambda_*',\gamma_*')'$, whence it suffices to prove
$$\sup_{\|\phi-\phi_*\|<\varepsilon}\bigg|\frac{c_3(\phi)}{c_1(\phi)+c_2(\phi)} - \frac{c_3(\phi_*)}{c_1(\phi_*)+c_2(\phi_*)}\bigg| \overset{p}{\longrightarrow} 0.$$
Proceeding as in Gupta and Robinson (2018), we denote the two components of $c_3(\phi)$ by $c_{31}(\phi)$, $c_{32}(\phi)$, and are left with establishing the negligibility of
$$\frac{|c_{31}(\phi)-c_{31}(\phi_*)|}{c_2(\phi)} + \frac{|c_{32}(\phi)-c_{32}(\phi_*)|}{c_1(\phi)} + \frac{|c_3(\phi_*)|}{c_1(\phi)c_1(\phi_*)}|c_1(\phi_*)-c_1(\phi)| + \frac{|c_3(\phi_*)|}{c_2(\phi)c_2(\phi_*)}|c_2(\phi_*)-c_2(\phi)|, \tag{C.14}$$
uniformly on $\|\phi-\phi_*\|<\varepsilon$. By the fact that (C.7) and (C.8) hold uniformly over $\Phi$, we first consider only the numerators in the first two terms in (C.14). As in the proof of Theorem 1 of Delgado and Robinson (2015), (C.6) implies that $E\big(\sup_{\|\phi-\phi_*\|<\varepsilon}|c_{31}(\phi)-c_{31}(\phi_*)|\big)$ is bounded by
$$n^{-1}\big(E\|u\|^2+\sigma_0^2\,\mathrm{tr}\,\Sigma\big)\sup_{\|\phi-\phi_*\|<\varepsilon}\|B(\phi)-B(\phi_*)\| = O_p\Big(\sup_{\|\phi-\phi_*\|<\varepsilon}\|B(\phi)-B(\phi_*)\|\Big),$$
because $E\|u\|^2 = O(n)$ and $\mathrm{tr}\,\Sigma = O(n)$. $B(\phi)-B(\phi_*)$ can be written as
$$(T(\lambda)-T(\lambda_*))'E(\gamma)'M(\gamma)E(\gamma)T(\lambda) + T(\lambda_*)'E(\gamma_*)'M(\gamma_*)E(\gamma_*)(T(\lambda)-T(\lambda_*)) + T'(\lambda_*)\big(E(\gamma)'M(\gamma)E(\gamma)-E(\gamma_*)'M(\gamma_*)E(\gamma_*)\big)T(\lambda), \tag{C.15}$$
which, by the triangle inequality, has spectral norm bounded by
$$\|T(\lambda)-T(\lambda_*)\|\big(\|E(\gamma)\|^2\|T(\lambda)\|+\|E(\gamma_*)\|^2\|T(\lambda_*)\|\big) + \|T(\lambda_*)\|\,\|E(\gamma)'M(\gamma)E(\gamma)-E(\gamma_*)'M(\gamma_*)E(\gamma_*)\|\,\|T(\lambda)\|$$
$$= O_p\big(\|T(\lambda)-T(\lambda_*)\| + \|E(\gamma)'M(\gamma)E(\gamma)-E(\gamma_*)'M(\gamma_*)E(\gamma_*)\|\big). \tag{C.16}$$
By Assumption SAR.1 the first term in parentheses on the right side of (C.16) is bounded uniformly on $\|\phi-\phi_*\|<\varepsilon$ by
$$\sum_{j=1}^{d_\lambda}|\lambda_j-\lambda_{*j}|\,\|G_j\| \le C\max_{j=1,\dots,d_\lambda}\|G_j\|\,\|\lambda-\lambda_*\| = O_p(\varepsilon), \tag{C.17}$$
while, because $E(\gamma)'M(\gamma)E(\gamma) = n^{-1}\Sigma(\gamma)^{-1}\Psi\big(n^{-1}\Psi'\Sigma(\gamma)^{-1}\Psi\big)^{-1}\Psi'\Sigma(\gamma)^{-1}$ for any $\gamma\in\Gamma$, the second one can be decomposed into terms with bounds typified by
$$n^{-1}\|\Sigma(\gamma)^{-1}-\Sigma(\gamma_*)^{-1}\|\,\|\Psi\|^2\,\big\|\big(n^{-1}\Psi'\Sigma(\gamma)^{-1}\Psi\big)^{-1}\big\|\,\|\Sigma(\gamma)^{-1}\|^2 \le n^{-1}\|\Sigma(\gamma)-\Sigma(\gamma_*)\|\,\|\Psi\|^2\,\big\|\big(n^{-1}\Psi'\Sigma(\gamma)^{-1}\Psi\big)^{-1}\big\|\,\|\Sigma(\gamma)^{-1}\|^3\|\Sigma(\gamma_*)^{-1}\| = O_p(\|\Sigma(\gamma)-\Sigma(\gamma_*)\|) = O_p(\varepsilon),$$
uniformly on $\|\phi-\phi_*\|<\varepsilon$, by Assumptions R.3 and R.8, Proposition 4.1 and the inequality $\|A\| \le \|A\|_F$ for a generic matrix $A$, so that
$$\sup_{\|\phi-\phi_*\|<\varepsilon}\|B(\phi)-B(\phi_*)\| = O_p(\varepsilon). \tag{C.18}$$
Thus equicontinuity of the first term in (C.14) follows because $\varepsilon$ is arbitrary. The equicontinuity of the second term in (C.14) follows in much the same way: indeed,
$$\sup_{\|\phi-\phi_*\|<\varepsilon}|c_{32}(\phi)-c_{32}(\phi_*)| = 2n^{-1}\sup_{\|\phi-\phi_*\|<\varepsilon}\big|\beta_0'\Psi'(B(\phi)-B(\phi_*))u\big| = O_p\Big(\sup_{\|\phi-\phi_*\|<\varepsilon}\|B(\phi)-B(\phi_*)\|\Big) = O_p(\varepsilon),$$
using earlier arguments and (C.18). Because $c_1(\phi)$ is bounded and bounded away from zero in probability (see (C.7)) for sufficiently large $n$ and all $\phi\in\bar N_\phi(\eta)$, the third term in (C.14) may be bounded by $\big(|c_3(\phi_*)|/c_1(\phi_*)\big)\big(1+c_1(\phi_*)/c_1(\phi)\big) \overset{p}{\to} 0$, convergence being uniform on $\|\phi-\phi_*\|<\varepsilon$ by pointwise convergence of $c_3(\phi)/(c_1(\phi)+c_2(\phi))$, cf. Gupta and Robinson (2018). The uniform convergence to zero of the fourth term in (C.14) follows in identical fashion, because $c_2(\phi)$ is bounded and bounded away from zero in probability (see (C.8)) for sufficiently large $n$ and all $\phi\in\bar N_\phi(\eta)$. This concludes the proof.
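The inequality (C.6), $\|AB\|_F \le \|A\|_F\|B\|$ with the spectral norm on the second factor, is used twice in bounding $|f(\phi)|$ and again in the equicontinuity step. A quick numerical check on random rectangular matrices (illustrative only):

```python
# Verify the Frobenius-spectral norm inequality (C.6): ||AB||_F <= ||A||_F ||B||_2
# for random rectangular matrix pairs of compatible dimensions.
import numpy as np

rng = np.random.default_rng(2)
for _ in range(200):
    m, k, n = rng.integers(2, 8, size=3)
    A = rng.standard_normal((m, k))
    B = rng.standard_normal((k, n))
    lhs = np.linalg.norm(A @ B, "fro")
    rhs = np.linalg.norm(A, "fro") * np.linalg.norm(B, 2)
    assert lhs <= rhs + 1e-10
print("(C.6) verified on random matrices")
```

The inequality follows from applying $\|Bx\| \le \|B\|\|x\|$ to each row of $A$, which is why the spectral norm must sit on the right-hand factor.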
D Lemmas
Lemma D.1. Under the conditions of Theorem 4.1, $c_1(\gamma) = n^{-1}\beta'\Psi'E'(\gamma)M(\gamma)E(\gamma)\Psi\beta + o_p(1)$.

Proof. First,
$$c_1(\gamma) = n^{-1}\beta'\Psi'E'(\gamma)M(\gamma)E(\gamma)\Psi\beta + c_{12}(\gamma) + c_{13}(\gamma),$$
with $c_{12}(\gamma) = 2n^{-1}e'E'(\gamma)M(\gamma)E(\gamma)\Psi\beta$ and $c_{13}(\gamma) = n^{-1}e'E'(\gamma)M(\gamma)E(\gamma)e$. It is readily seen that $c_{12}(\gamma)$ and $c_{13}(\gamma)$ are negligible.
Lemma D.2. Under the conditions of Theorem 4.2 or Theorem 5.2, $\|\hat\gamma-\gamma_0\| = O_p\big(\sqrt{d_\gamma/n}\big)$.

Proof. We give the details for the setting of Theorem 4.2 and omit those for the setting of Theorem 5.2. Write $l = \partial\mathcal{L}(\beta_0,\gamma_0)/\partial\gamma$. By Robinson (1988), we have $\|\hat\gamma-\gamma_0\| = O_p(\|l\|)$. Now $l = (l_1,\dots,l_{d_\gamma})'$, with $l_j = n^{-1}\mathrm{tr}(\Sigma^{-1}\Sigma_j) - n^{-1}\sigma_0^{-2}u'\Sigma^{-1}\Sigma_j\Sigma^{-1}u$. Next, $E\|l\|^2 = \sum_{j=1}^{d_\gamma}E(l_j^2)$ and
$$E(l_j^2) = \frac{1}{n^2\sigma_0^4}\mathrm{var}\big(u'\Sigma^{-1}\Sigma_j\Sigma^{-1}u\big) = \frac{1}{n^2\sigma_0^4}\mathrm{var}\big(\varepsilon'B'\Sigma^{-1}\Sigma_j\Sigma^{-1}B\varepsilon\big) = \frac{1}{n^2\sigma_0^4}\mathrm{var}\big(\varepsilon'D_j\varepsilon\big), \tag{D.1}$$
say. But, writing $d_{j,st}$ for a typical element of the infinite-dimensional matrix $D_j$, we have
$$\mathrm{var}\big(\varepsilon'D_j\varepsilon\big) = \big(\mu_4-3\sigma_0^4\big)\sum_{s=1}^{\infty}d_{j,ss}^2 + 2\sigma_0^4\,\mathrm{tr}\big(D_j^2\big) = \big(\mu_4-3\sigma_0^4\big)\sum_{s=1}^{\infty}d_{j,ss}^2 + 2\sigma_0^4\sum_{s,t=1}^{\infty}d_{j,st}^2. \tag{D.2}$$
Next, by Assumptions R.4, R.3 and R.9,
$$\sum_{s=1}^{\infty}d_{j,ss}^2 = \sum_{s=1}^{\infty}\big(b_s'\Sigma^{-1}\Sigma_j\Sigma^{-1}b_s\big)^2 \le \Big(\sum_{s=1}^{\infty}\|b_s\|^2\Big)\|\Sigma^{-1}\|^2\|\Sigma_j\| = O\Big(\sum_{i=1}^{n}\sum_{s=1}^{\infty}b_{is}^{*2}\Big) = O(n). \tag{D.3}$$
Similarly,
$$\sum_{s,t=1}^{\infty}d_{j,st}^2 = \sum_{s=1}^{\infty}b_s'\Sigma^{-1}\Sigma_j\Sigma^{-1}\Big(\sum_{t=1}^{\infty}b_tb_t'\Big)\Sigma^{-1}\Sigma_j\Sigma^{-1}b_s = \sum_{s=1}^{\infty}b_s'\Sigma^{-1}\Sigma_j\Sigma^{-1}\Sigma_j\Sigma^{-1}b_s = O(n). \tag{D.4}$$
Using (D.3) and (D.4) in (D.2) implies that $E(l_j^2) = O(n^{-1})$, by (D.1). Thus we have $E\|l\|^2 = O(d_\gamma/n)$, and hence $\|l\| = O_p\big(\sqrt{d_\gamma/n}\big)$ by Markov's inequality, proving the lemma.
Lemma D.3. Under the conditions of Theorem 4.3, E(σ−20 ε′V ε
)= p and
V ar(σ−20 ε′V ε
)/2p→ 1.
54
Proof. As $E\left(\sigma_0^{-2}\varepsilon'V\varepsilon\right) = \mathrm{tr}\left(E\left[B'\Sigma^{-1}\Psi\left(\Psi'\Sigma^{-1}\Psi\right)^{-1}\Psi'\Sigma^{-1}B\right]\right) = p$, and
$$\mathrm{Var}\left(\frac{1}{\sigma_0^2}\varepsilon'V\varepsilon\right) = \left(\frac{\mu_4}{\sigma_0^4} - 3\right)\sum_{s=1}^{\infty} E(v_{ss}^2) + E\left[\mathrm{tr}(VV') + \mathrm{tr}(V^2)\right] = \left(\frac{\mu_4}{\sigma_0^4} - 3\right)\sum_{s=1}^{\infty} v_{ss}^2 + 2p, \quad \text{(D.5)}$$
it suffices to show that
$$(2p)^{-1}\sum_{s=1}^{\infty} v_{ss}^2 \stackrel{p}{\to} 0. \quad \text{(D.6)}$$
Because $v_{ss} = b_s'Mb_s$, we have $v_{ss}^2 = \left(\sum_{i,j=1}^{n} b_{is}b_{js}m_{ij}\right)^2$. Thus, using Assumption R.4 and (A.5), we have
$$\sum_{s=1}^{\infty} v_{ss}^2 \leq \left(\sup_{i,j}|m_{ij}|\right)^2\sum_{s=1}^{\infty}\left(\sum_{i,j=1}^{n}|b_{is}^*||b_{js}^*|\right)^2 = O_p\left(p^2 n^{-2}\left(\sup_s\sum_{i=1}^{n}|b_{is}^*|\right)^3\sum_{i=1}^{n}\sum_{s=1}^{\infty}|b_{is}^*|\right) = O_p\left(p^2 n^{-1}\right), \quad \text{(D.7)}$$
establishing (D.6) because $p^2/n \to 0$.
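The mean computation in Lemma D.3 rests on the cyclic trace identity: with $\Sigma = BB'$, $\mathrm{tr}\left(B'\Sigma^{-1}\Psi(\Psi'\Sigma^{-1}\Psi)^{-1}\Psi'\Sigma^{-1}B\right) = \mathrm{tr}\left((\Psi'\Sigma^{-1}\Psi)^{-1}\Psi'\Sigma^{-1}\Psi\right) = p$. A finite-dimensional numerical sketch (all matrices below are made up; in the paper $B$ is infinite dimensional):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 4

# B: a square-root factor of Sigma; Psi: an n x p sieve regressor matrix.
B = rng.standard_normal((n, n))
Sigma = B @ B.T
Psi = rng.standard_normal((n, p))

Sinv = np.linalg.inv(Sigma)
# V = B' Sigma^{-1} Psi (Psi' Sigma^{-1} Psi)^{-1} Psi' Sigma^{-1} B.
M = Sinv @ Psi @ np.linalg.inv(Psi.T @ Sinv @ Psi) @ Psi.T @ Sinv
V = B.T @ M @ B

# tr(V) = p by cyclicity of the trace, since Sigma = BB'.
assert np.isclose(np.trace(V), p)
# V is symmetric and idempotent, so tr(VV') = tr(V^2) = p, giving the 2p in (D.5).
assert np.isclose(np.trace(V @ V), p)
```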
Lemma D.4. Under the conditions of Theorem 6.2, $\|\hat{\tau} - \tau_0\| = O_p(\sqrt{d_\tau/n})$.

Proof. The proof is similar to that of Lemma D.2 and is omitted.
Denote $H(\gamma) = I_n + \sum_{j=m_1+1}^{m_1+m_2}\gamma_j W_j$ and $K(\gamma) = I_n - \sum_{j=1}^{m_1}\gamma_j W_j$. Let $G_j(\gamma) = W_j K^{-1}(\gamma)$, $j = 1, \ldots, m_1$, $T_j(\gamma) = H^{-1}(\gamma)W_j$, $j = m_1+1, \ldots, m_1+m_2$ and, for a generic matrix $A$, denote $\overline{A} = A + A'$. Our final conditions may differ according to whether the $W_j$ are of general form or have 'single nonzero diagonal block structure', see e.g. Gupta and Robinson (2015). To define these, denote by $V$ an $n \times n$ block diagonal matrix with $i$-th block $V_i$, an $s_i \times s_i$ matrix, where $\sum_{i=1}^{m_1+m_2} s_i = n$, and for $i = 1, \ldots, m_1+m_2$ obtain $W_i$ from $V$ by replacing each $V_j$, $j \neq i$, by a matrix of zeros. Thus $V = \sum_{i=1}^{m_1+m_2} W_i$.
Lemma D.5. For the spatial error model with SARMA($m_1$, $m_2$) errors, if
$$\sup_{\gamma\in\Gamma^o}\left(\left\|K^{-1}(\gamma)\right\| + \left\|K'^{-1}(\gamma)\right\| + \left\|H^{-1}(\gamma)\right\| + \left\|H'^{-1}(\gamma)\right\|\right) + \max_{j=1,\ldots,m_1+m_2}\|W_j\| < C, \quad \text{(D.8)}$$
then
$$(D\Sigma(\gamma))(\gamma^\dagger) = A^{-1}(\gamma)\left(\sum_{j=1}^{m_1}\gamma_j^\dagger\,\overline{H^{-1}(\gamma)G_j(\gamma)H(\gamma)} + \sum_{j=m_1+1}^{m_1+m_2}\gamma_j^\dagger\,\overline{T_j(\gamma)}\right)A'^{-1}(\gamma).$$
Proof. We first show that $D\Sigma \in L(\Gamma^o, M_{n\times n})$. Clearly, $D\Sigma$ is a linear map, and (D.8) implies
$$\left\|(D\Sigma(\gamma))(\gamma^\dagger)\right\| \leq C\left\|\gamma^\dagger\right\|_1$$
in the general case and
$$\left\|(D\Sigma(\gamma))(\gamma^\dagger)\right\| \leq C\max_{j=1,\ldots,m_1+m_2}\left|\gamma_j^\dagger\right|$$
in the 'single nonzero diagonal block' case. Thus $D\Sigma$ is a bounded linear operator between two normed linear spaces, i.e. it is a continuous linear operator.
With $A(\gamma) = H^{-1}(\gamma)K(\gamma)$, we now show that
$$\frac{\left\|A^{-1}(\gamma+\gamma^\dagger)A'^{-1}(\gamma+\gamma^\dagger) - A^{-1}(\gamma)A'^{-1}(\gamma) - (D\Sigma(\gamma))(\gamma^\dagger)\right\|}{\|\gamma^\dagger\|_g} \to 0, \text{ as } \left\|\gamma^\dagger\right\|_g \to 0, \quad \text{(D.9)}$$
where $\|\cdot\|_g$ is either the 1-norm or the max norm on $\Gamma$. First, note that
$$A^{-1}(\gamma+\gamma^\dagger)A'^{-1}(\gamma+\gamma^\dagger) - A^{-1}(\gamma)A'^{-1}(\gamma)$$
$$= A^{-1}(\gamma+\gamma^\dagger)\left(A^{-1}(\gamma+\gamma^\dagger) - A^{-1}(\gamma)\right)' + \left(A^{-1}(\gamma+\gamma^\dagger) - A^{-1}(\gamma)\right)A'^{-1}(\gamma)$$
$$= -A^{-1}(\gamma+\gamma^\dagger)A'^{-1}(\gamma+\gamma^\dagger)\left(A(\gamma+\gamma^\dagger) - A(\gamma)\right)'A'^{-1}(\gamma) - A^{-1}(\gamma+\gamma^\dagger)\left(A(\gamma+\gamma^\dagger) - A(\gamma)\right)A^{-1}(\gamma)A'^{-1}(\gamma). \quad \text{(D.10)}$$
Next,
$$A(\gamma+\gamma^\dagger) - A(\gamma) = H^{-1}(\gamma+\gamma^\dagger)K(\gamma+\gamma^\dagger) - H^{-1}(\gamma)K(\gamma)$$
$$= H^{-1}(\gamma+\gamma^\dagger)\left(K(\gamma+\gamma^\dagger) - K(\gamma)\right) + H^{-1}(\gamma+\gamma^\dagger)\left(H(\gamma) - H(\gamma+\gamma^\dagger)\right)H^{-1}(\gamma)K(\gamma)$$
$$= -H^{-1}(\gamma+\gamma^\dagger)\left(\sum_{j=1}^{m_1}\gamma_j^\dagger W_j + \sum_{j=m_1+1}^{m_1+m_2}\gamma_j^\dagger W_j H^{-1}(\gamma)K(\gamma)\right). \quad \text{(D.11)}$$
Substituting (D.11) in (D.10) implies that
$$A^{-1}(\gamma+\gamma^\dagger)A'^{-1}(\gamma+\gamma^\dagger) - A^{-1}(\gamma)A'^{-1}(\gamma) = \Delta_1(\gamma, \gamma^\dagger) + \Delta_2(\gamma, \gamma^\dagger) = \Delta(\gamma, \gamma^\dagger), \quad \text{(D.12)}$$
say, where
$$\Delta_1(\gamma, \gamma^\dagger) = A^{-1}(\gamma+\gamma^\dagger)A'^{-1}(\gamma+\gamma^\dagger)\left(\sum_{j=1}^{m_1}\gamma_j^\dagger W_j' + K'(\gamma)H'^{-1}(\gamma)\sum_{j=m_1+1}^{m_1+m_2}\gamma_j^\dagger W_j'\right)H'^{-1}(\gamma+\gamma^\dagger)A'^{-1}(\gamma),$$
$$\Delta_2(\gamma, \gamma^\dagger) = A^{-1}(\gamma+\gamma^\dagger)H^{-1}(\gamma+\gamma^\dagger)\left(\sum_{j=1}^{m_1}\gamma_j^\dagger W_j + \sum_{j=m_1+1}^{m_1+m_2}\gamma_j^\dagger W_j H^{-1}(\gamma)K(\gamma)\right)A^{-1}(\gamma)A'^{-1}(\gamma).$$
From the definitions above and recalling that $A(\gamma) = H^{-1}(\gamma)K(\gamma)$, we can write
$$\Delta(\gamma, \gamma^\dagger) = A^{-1}(\gamma+\gamma^\dagger)\Upsilon(\gamma, \gamma^\dagger)A'^{-1}(\gamma), \quad \text{(D.13)}$$
with
$$\Upsilon(\gamma, \gamma^\dagger) = \sum_{j=1}^{m_1}\gamma_j^\dagger H'(\gamma+\gamma^\dagger)G_j'(\gamma+\gamma^\dagger)H'^{-1}(\gamma+\gamma^\dagger) + A'^{-1}(\gamma+\gamma^\dagger)A'(\gamma)\sum_{j=m_1+1}^{m_1+m_2}\gamma_j^\dagger T_j'(\gamma+\gamma^\dagger)$$
$$+ \sum_{j=1}^{m_1}\gamma_j^\dagger H^{-1}(\gamma+\gamma^\dagger)G_j(\gamma)H(\gamma) + \sum_{j=m_1+1}^{m_1+m_2}\gamma_j^\dagger T_j(\gamma+\gamma^\dagger).$$
Then (D.12) implies that
$$A^{-1}(\gamma+\gamma^\dagger)A'^{-1}(\gamma+\gamma^\dagger) - A^{-1}(\gamma)A'^{-1}(\gamma) - (D\Sigma(\gamma))(\gamma^\dagger)$$
$$= A^{-1}(\gamma+\gamma^\dagger)A'^{-1}(\gamma+\gamma^\dagger) - A^{-1}(\gamma)A'^{-1}(\gamma) - \Delta(\gamma, \gamma^\dagger) - (D\Sigma(\gamma))(\gamma^\dagger) + \Delta(\gamma, \gamma^\dagger)$$
$$= \Delta(\gamma, \gamma^\dagger) - (D\Sigma(\gamma))(\gamma^\dagger), \quad \text{(D.14)}$$
so to prove (D.9) it is sufficient to show that
$$\frac{\left\|\Delta(\gamma, \gamma^\dagger) - (D\Sigma(\gamma))(\gamma^\dagger)\right\|}{\|\gamma^\dagger\|_g} \to 0 \text{ as } \left\|\gamma^\dagger\right\|_g \to 0. \quad \text{(D.15)}$$
The matrix in the numerator of (D.15) can be written as $\sum_{i=1}^{7}\Pi_i(\gamma, \gamma^\dagger)A'^{-1}(\gamma)$ by adding, subtracting and grouping terms, where (omitting the argument $(\gamma, \gamma^\dagger)$)
$$\Pi_1 = A^{-1}(\gamma+\gamma^\dagger)\sum_{j=1}^{m_1}\gamma_j^\dagger G_j'(\gamma+\gamma^\dagger)H'^{-1}(\gamma)\left(H(\gamma) - H(\gamma+\gamma^\dagger)\right)'H'^{-1}(\gamma+\gamma^\dagger),$$
$$\Pi_2 = A^{-1}(\gamma+\gamma^\dagger)\sum_{j=1}^{m_1}\gamma_j^\dagger H^{-1}(\gamma+\gamma^\dagger)\left(H(\gamma) - H(\gamma+\gamma^\dagger)\right)H^{-1}(\gamma)G_j(\gamma),$$
$$\Pi_3 = A^{-1}(\gamma+\gamma^\dagger)\sum_{j=m_1+1}^{m_1+m_2}\gamma_j^\dagger\left(A^{-1}(\gamma+\gamma^\dagger) - A^{-1}(\gamma)\right)T_j'(\gamma+\gamma^\dagger),$$
$$\Pi_4 = \left(A^{-1}(\gamma+\gamma^\dagger) - A^{-1}(\gamma)\right)\sum_{j=m_1+1}^{m_1+m_2}\gamma_j^\dagger T_j(\gamma+\gamma^\dagger),$$
$$\Pi_5 = A^{-1}(\gamma)\sum_{j=m_1+1}^{m_1+m_2}\gamma_j^\dagger H^{-1}(\gamma+\gamma^\dagger)\left(H(\gamma) - H(\gamma+\gamma^\dagger)\right)H^{-1}(\gamma)W_j,$$
$$\Pi_6 = \Delta(\gamma, \gamma^\dagger)\sum_{j=1}^{m_1}\gamma_j^\dagger W_j'H'^{-1}(\gamma),$$
$$\Pi_7 = \left(A^{-1}(\gamma+\gamma^\dagger) - A^{-1}(\gamma)\right)\sum_{j=1}^{m_1}\gamma_j^\dagger H^{-1}(\gamma)G_j(\gamma).$$
By (D.8), (D.13) and replication of earlier techniques, we have
$$\max_{i=1,\ldots,7}\sup_{\gamma\in\Gamma^o}\left\|\Pi_i(\gamma, \gamma^\dagger)A'^{-1}(\gamma)\right\| \leq C\left\|\gamma^\dagger\right\|_g^2, \quad \text{(D.16)}$$
where the norm used on the RHS of (D.16) depends on whether we are considering the general case or the 'single nonzero diagonal block' case. Thus
$$\frac{\left\|\Delta(\gamma, \gamma^\dagger) - (D\Sigma(\gamma))(\gamma^\dagger)\right\|}{\|\gamma^\dagger\|_g} \leq C\left\|\gamma^\dagger\right\|_g \to 0 \text{ as } \left\|\gamma^\dagger\right\|_g \to 0,$$
proving (D.15) and thus (D.9).
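The derivative formula of Lemma D.5 can be sanity-checked against a central finite difference of $\Sigma(\gamma) = A^{-1}(\gamma)A'^{-1}(\gamma)$. A minimal sketch for an illustrative SARMA(1, 1) configuration ($m_1 = m_2 = 1$; the weight matrices and parameter values below are made up):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
I = np.eye(n)
# Made-up weight matrices, scaled so K and H stay invertible.
W1 = 0.2 * rng.standard_normal((n, n))
W2 = 0.2 * rng.standard_normal((n, n))

def Sigma(g1, g2):
    K = I - g1 * W1                # K(gamma) = I_n - gamma_1 W_1
    H = I + g2 * W2                # H(gamma) = I_n + gamma_2 W_2
    Ainv = np.linalg.inv(np.linalg.inv(H) @ K)   # A^{-1}, with A = H^{-1} K
    return Ainv @ Ainv.T           # Sigma(gamma) = A^{-1} A'^{-1}

g1, g2 = 0.3, 0.2
K, H = I - g1 * W1, I + g2 * W2
Kinv, Hinv = np.linalg.inv(K), np.linalg.inv(H)
Ainv = Kinv @ H                    # A^{-1}(gamma)
G1, T2 = W1 @ Kinv, Hinv @ W2      # G_1(gamma), T_2(gamma)

d = np.array([0.7, -0.4])          # direction gamma^dagger
M1 = Hinv @ G1 @ H                 # H^{-1}(gamma) G_1(gamma) H(gamma)
# (D Sigma(gamma))(gamma^dagger) from Lemma D.5, with bar(A) = A + A'.
deriv = Ainv @ (d[0] * (M1 + M1.T) + d[1] * (T2 + T2.T)) @ Ainv.T

# Central finite difference of Sigma along the direction d.
h = 1e-5
fd = (Sigma(g1 + h * d[0], g2 + h * d[1])
      - Sigma(g1 - h * d[0], g2 - h * d[1])) / (2 * h)

assert np.allclose(fd, deriv, atol=1e-6)
```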
Corollary D.1. For the spatial error model with SAR($m_1$) errors,
$$(D\Sigma(\gamma))(\gamma^\dagger) = K^{-1}(\gamma)\sum_{j=1}^{m_1}\gamma_j^\dagger\,\overline{G_j(\gamma)}\,K'^{-1}(\gamma).$$

Proof. Taking $m_2 = 0$ in Lemma D.5, the elements involving sums from $m_1+1$ to $m_1+m_2$ do not arise and $H(\gamma) = I_n$, proving the claim.
Corollary D.2. For the spatial error model with SMA($m_2$) errors,
$$(D\Sigma(\gamma))(\gamma^\dagger) = H(\gamma)\sum_{j=1}^{m_2}\gamma_j^\dagger\,\overline{T_j(\gamma)}\,H'(\gamma).$$

Proof. Taking $m_1 = 0$ in Lemma D.5, the elements involving sums from 1 to $m_1$ do not arise and $K(\gamma) = I_n$, proving the claim.
Lemma D.6. For the spatial error model with MESS($m_1$) errors, if
$$\max_{j=1,\ldots,m_1}\left(\|W_j\| + \left\|W_j'\right\|\right) < 1, \quad \text{(D.17)}$$
then
$$(D\Sigma(\gamma))(\gamma^\dagger) = \exp\left(\sum_{j=1}^{m_1}\gamma_j(W_j + W_j')\right)\sum_{j=1}^{m_1}\gamma_j^\dagger(W_j + W_j').$$
Proof. Clearly $D\Sigma \in L(\Gamma^o, M_{n\times n})$. Next,
$$\left\|A^{-1}(\gamma+\gamma^\dagger)A'^{-1}(\gamma+\gamma^\dagger) - A^{-1}(\gamma)A'^{-1}(\gamma) - (D\Sigma(\gamma))(\gamma^\dagger)\right\|$$
$$= \left\|\exp\left(\sum_{j=1}^{m_1}(\gamma_j + \gamma_j^\dagger)(W_j + W_j')\right) - \exp\left(\sum_{j=1}^{m_1}\gamma_j(W_j + W_j')\right) - (D\Sigma(\gamma))(\gamma^\dagger)\right\|$$
$$= \left\|\exp\left(\sum_{j=1}^{m_1}\gamma_j(W_j + W_j')\right)\left(\exp\left(\sum_{j=1}^{m_1}\gamma_j^\dagger(W_j + W_j')\right) - I_n - \sum_{j=1}^{m_1}\gamma_j^\dagger(W_j + W_j')\right)\right\|$$
$$\leq \left\|\exp\left(\sum_{j=1}^{m_1}\gamma_j(W_j + W_j')\right)\right\|\left\|\exp\left(\sum_{j=1}^{m_1}\gamma_j^\dagger(W_j + W_j')\right) - I_n - \sum_{j=1}^{m_1}\gamma_j^\dagger(W_j + W_j')\right\|$$
$$\leq C\left\|I_n + \sum_{j=1}^{m_1}\gamma_j^\dagger(W_j + W_j') + \sum_{k=2}^{\infty}\left\{\sum_{j=1}^{m_1}\gamma_j^\dagger(W_j + W_j')\right\}^k - I_n - \sum_{j=1}^{m_1}\gamma_j^\dagger(W_j + W_j')\right\|$$
$$\leq C\left\|\sum_{k=2}^{\infty}\left\{\sum_{j=1}^{m_1}\gamma_j^\dagger(W_j + W_j')\right\}^k\right\| \leq C\sum_{k=2}^{\infty}\left(\sum_{j=1}^{m_1}\left|\gamma_j^\dagger\right|\left\|W_j + W_j'\right\|\right)^k \leq C\sum_{k=2}^{\infty}\left\|\gamma^\dagger\right\|_g^k, \quad \text{(D.18)}$$
by (D.17), without loss of generality, and again the norm used in (D.18) depending on whether we are in the general or the 'single nonzero diagonal block' case. Thus
$$\frac{\left\|A^{-1}(\gamma+\gamma^\dagger)A'^{-1}(\gamma+\gamma^\dagger) - A^{-1}(\gamma)A'^{-1}(\gamma) - (D\Sigma(\gamma))(\gamma^\dagger)\right\|}{\|\gamma^\dagger\|_g} \leq C\sum_{k=2}^{\infty}\left\|\gamma^\dagger\right\|_g^{k-1} \to 0,$$
as $\|\gamma^\dagger\|_g \to 0$, proving the claim.
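In the MESS case the exponential factorizes exactly because $\sum_j \gamma_j(W_j + W_j')$ and $\sum_j \gamma_j^\dagger(W_j + W_j')$ commute when $m_1 = 1$. A numerical sketch of the resulting derivative formula for a single made-up $W$, computing the matrix exponential of the symmetric matrix $W + W'$ via its eigendecomposition:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
W = 0.2 * rng.standard_normal((n, n))   # made-up weight matrix
S = W + W.T                              # symmetric, so expm via eigh

def expm_sym(t, M):
    # Matrix exponential of t*M for symmetric M via eigendecomposition.
    lam, V = np.linalg.eigh(t * M)
    return (V * np.exp(lam)) @ V.T

g, d = 0.4, 0.9                          # gamma and direction gamma^dagger

# Formula of Lemma D.6 with m_1 = 1: exp(gamma(W+W')) gamma^dagger (W+W').
deriv = expm_sym(g, S) @ (d * S)

# Central finite difference of gamma -> exp(gamma(W+W')) along d.
h = 1e-5
fd = (expm_sym(g + h * d, S) - expm_sym(g - h * d, S)) / (2 * h)

assert np.allclose(fd, deriv, atol=1e-6)
```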
Theorem D.1. Under the conditions of Theorem 4.4 or 5.3, $T_n - T_n^a = o_p(1)$ as $n \to \infty$.
Proof. It suffices to show that $n\tilde{m}_n = n\hat{m}_n + o_p(\sqrt{p})$. As $\eta = y - \hat{\theta}$, $u = y - f$, and $v = \hat{\theta} - f$, we have $u = \eta + v$ and
$$n\tilde{m}_n = \hat{\sigma}^{-2}\left(u'\Sigma(\hat{\gamma})^{-1}u - \eta'\Sigma(\hat{\gamma})^{-1}\eta\right) = \hat{\sigma}^{-2}\left(2u'\Sigma(\hat{\gamma})^{-1}v - v'\Sigma(\hat{\gamma})^{-1}v\right)$$
$$= 2n\hat{m}_n - \hat{\sigma}^{-2}\left[\Psi\left(\Psi'\Sigma(\hat{\gamma})^{-1}\Psi\right)^{-1}\Psi'\Sigma(\hat{\gamma})^{-1}(u + e) - e + \theta_0 - f\right]'\Sigma(\hat{\gamma})^{-1}\left[\Psi\left(\Psi'\Sigma(\hat{\gamma})^{-1}\Psi\right)^{-1}\Psi'\Sigma(\hat{\gamma})^{-1}(u + e) - e + \theta_0 - f\right]$$
$$= 2n\hat{m}_n - \hat{\sigma}^{-2}u'\Sigma(\hat{\gamma})^{-1}\Psi\left(\Psi'\Sigma(\hat{\gamma})^{-1}\Psi\right)^{-1}\Psi'\Sigma(\hat{\gamma})^{-1}u - \hat{\sigma}^{-2}(\theta_0 - f)'\Sigma(\hat{\gamma})^{-1}(\theta_0 - f)$$
$$+ \hat{\sigma}^{-2}\left(2(\theta_0 - f) - e\right)'\Sigma(\hat{\gamma})^{-1}\left(I - \Psi\left[\Psi'\Sigma(\hat{\gamma})^{-1}\Psi\right]^{-1}\Psi'\Sigma(\hat{\gamma})^{-1}\right)e - 2\hat{\sigma}^{-2}(\theta_0 - f)'\Sigma(\hat{\gamma})^{-1}\Psi\left(\Psi'\Sigma(\hat{\gamma})^{-1}\Psi\right)^{-1}\Psi'\Sigma(\hat{\gamma})^{-1}u$$
$$= 2n\hat{m}_n - \left(n\hat{m}_n - \hat{\sigma}^{-2}(A_1 + A_2 + A_3 + A_4)\right) - \hat{\sigma}^{-2}A_4 + \hat{\sigma}^{-2}\left(2(\theta_0 - f) - e\right)'\Sigma(\hat{\gamma})^{-1}\left(I - \Psi\left[\Psi'\Sigma(\hat{\gamma})^{-1}\Psi\right]^{-1}\Psi'\Sigma(\hat{\gamma})^{-1}\right)e - 2\hat{\sigma}^{-2}A_3$$
$$= n\hat{m}_n + \hat{\sigma}^{-2}(A_1 + A_2 - A_3) + \hat{\sigma}^{-2}\left(2(\theta_0 - f) - e\right)'\Sigma(\hat{\gamma})^{-1}\left(I - \Psi\left(\Psi'\Sigma(\hat{\gamma})^{-1}\Psi\right)^{-1}\Psi'\Sigma(\hat{\gamma})^{-1}\right)e. \quad \text{(D.19)}$$
In the proof of Theorem 4.2, we have shown that
$$\left|(\theta_0 - f)'\Sigma(\hat{\gamma})^{-1}\left(I - \Psi\left[\Psi'\Sigma(\hat{\gamma})^{-1}\Psi\right]^{-1}\Psi'\Sigma(\hat{\gamma})^{-1}\right)e\right| = o_p(\sqrt{p})$$
in the process of proving $|A_2| = o_p(\sqrt{p})$. Along with
$$\left|e'\Sigma(\hat{\gamma})^{-1}\left(I - \Psi\left(\Psi'\Sigma(\hat{\gamma})^{-1}\Psi\right)^{-1}\Psi'\Sigma(\hat{\gamma})^{-1}\right)e\right| \leq \left|e'\Sigma(\hat{\gamma})^{-1}e\right| + \left|e'\Sigma(\hat{\gamma})^{-1}\Psi\left(\Psi'\Sigma(\hat{\gamma})^{-1}\Psi\right)^{-1}\Psi'\Sigma(\hat{\gamma})^{-1}e\right|$$
$$\leq \|e\|^2\sup_{\gamma\in\Gamma}\left\|\Sigma(\gamma)^{-1}\right\| + \|e\|^2\sup_{\gamma\in\Gamma}\left\|\Sigma(\gamma)^{-1}\right\|^2\left\|\frac{1}{n}\Psi\left(\frac{1}{n}\Psi'\Sigma(\gamma)^{-1}\Psi\right)^{-1}\Psi'\right\| = O_p\left(\|e\|^2\right) = O_p\left(p^{-2\mu}n\right) = o_p(\sqrt{p}),$$
we complete the proof that $n\tilde{m}_n = n\hat{m}_n + o_p(\sqrt{p})$. In the SAR setting of Section 5,
$$n\tilde{m}_n = \hat{\sigma}^{-2}\left(u'\Sigma(\hat{\gamma})^{-1}u - \eta'\Sigma(\hat{\gamma})^{-1}\eta\right) = \hat{\sigma}^{-2}\left(2u'\Sigma(\hat{\gamma})^{-1}v - v'\Sigma(\hat{\gamma})^{-1}v\right)$$
$$= 2n\hat{m}_n - \hat{\sigma}^{-2}\left[\Psi\left(\Psi'\Sigma(\hat{\gamma})^{-1}\Psi\right)^{-1}\Psi'\Sigma(\hat{\gamma})^{-1}\left(u + e + \sum_{j=1}^{d_\lambda}(\lambda_{j0} - \hat{\lambda}_j)W_j y\right) - e + \theta_0 - f\right]'$$
$$\times\ \Sigma(\hat{\gamma})^{-1}\left[\Psi\left(\Psi'\Sigma(\hat{\gamma})^{-1}\Psi\right)^{-1}\Psi'\Sigma(\hat{\gamma})^{-1}\left(u + e + \sum_{j=1}^{d_\lambda}(\lambda_{j0} - \hat{\lambda}_j)W_j y\right) - e + \theta_0 - f\right].$$
Compared to the expression in (D.19), we have the additional terms
$$-\hat{\sigma}^{-2}\left(\sum_{j=1}^{d_\lambda}(\lambda_{j0} - \hat{\lambda}_j)W_j y\right)'\Sigma(\hat{\gamma})^{-1}\Psi\left(\Psi'\Sigma(\hat{\gamma})^{-1}\Psi\right)^{-1}\Psi'\Sigma(\hat{\gamma})^{-1}\sum_{j=1}^{d_\lambda}(\lambda_{j0} - \hat{\lambda}_j)W_j y$$
and
$$-2\hat{\sigma}^{-2}\left(\sum_{j=1}^{d_\lambda}(\lambda_{j0} - \hat{\lambda}_j)W_j y\right)'\Sigma(\hat{\gamma})^{-1}\Psi\left(\Psi'\Sigma(\hat{\gamma})^{-1}\Psi\right)^{-1}\Psi'\Sigma(\hat{\gamma})^{-1}(u + \theta_0 - f).$$
Both terms are $o_p(\sqrt{p})$ from the orders of $A_5$ and $A_6$ in the proof of Theorem 5.2. Hence, in the SAR setting, $n\tilde{m}_n = n\hat{m}_n + o_p(\sqrt{p})$ also holds.
We now present similar calculations that justify the validity of our bootstrap test for the SARARMA($m_1$, $m_2$, $m_3$) model. The bootstrap test statistic is constructed as
$$nm_n^* = v^{*\prime}\Sigma(\hat{\gamma}^*)^{-1}u^* = \left(\hat{\theta}_n^* - f(x, \hat{\alpha}_n^*)\right)'\Sigma(\hat{\gamma}^*)^{-1}\left(\left(I_n - \sum_{k=1}^{m_1}\hat{\lambda}_k^* W_{1k}\right)y^* - f(x, \hat{\alpha}_n^*)\right).$$
Let $J_n = I_n - \frac{1}{n}l_n l_n'$. As $y = S(\lambda)^{-1}(\theta(x) + R(\gamma)\xi)$, we have
$$\hat{\xi} = J_n\hat{\xi}$$
$$= J_n\left[\left(\sum_{l=1}^{m_3}\gamma_{3l}W_{3l} + I_n\right)^{-1} + \left(\sum_{l=1}^{m_3}\gamma_{3l}W_{3l} + I_n\right)^{-1}\sum_{l=1}^{m_3}(\gamma_{3l} - \hat{\gamma}_{3l})W_{3l}\left(\sum_{l=1}^{m_3}\hat{\gamma}_{3l}W_{3l} + I_n\right)^{-1}\right]$$
$$\times\left(I_n - \sum_{l=1}^{m_2}\gamma_{2l}W_{2l} + \sum_{l=1}^{m_2}(\gamma_{2l} - \hat{\gamma}_{2l})W_{2l}\right)\left(S(\lambda)y - \theta(x) + \sum_{k=1}^{m_1}(\lambda_k - \hat{\lambda}_k)W_{1k}y + \theta(x) - \psi'\hat{\beta}\right)$$
$$= \xi - \frac{1}{n}l_n l_n'\xi + J_n\left(\sum_{l=1}^{m_3}\gamma_{3l}W_{3l} + I_n\right)^{-1}\left(I_n - \sum_{l=1}^{m_2}\gamma_{2l}W_{2l}\right)\left(\sum_{k=1}^{m_1}(\lambda_k - \hat{\lambda}_k)W_{1k}y + \theta(x) - \psi'\hat{\beta}\right)$$
$$+ J_n\left(\sum_{l=1}^{m_3}\gamma_{3l}W_{3l} + I_n\right)^{-1}\sum_{l=1}^{m_2}(\gamma_{2l} - \hat{\gamma}_{2l})W_{2l}\left(S(\lambda)y - \theta(x) + \sum_{k=1}^{m_1}(\lambda_k - \hat{\lambda}_k)W_{1k}y + \theta(x) - \psi'\hat{\beta}\right)$$
$$+ J_n\left(\sum_{l=1}^{m_3}\gamma_{3l}W_{3l} + I_n\right)^{-1}\sum_{l=1}^{m_3}(\gamma_{3l} - \hat{\gamma}_{3l})W_{3l}\left(\sum_{l=1}^{m_3}\hat{\gamma}_{3l}W_{3l} + I_n\right)^{-1}$$
$$\times\left(I_n - \sum_{l=1}^{m_2}\gamma_{2l}W_{2l} + \sum_{l=1}^{m_2}(\gamma_{2l} - \hat{\gamma}_{2l})W_{2l}\right)\left(S(\lambda)y - \theta(x) + \sum_{k=1}^{m_1}(\lambda_k - \hat{\lambda}_k)W_{1k}y + \theta(x) - \psi'\hat{\beta}\right),$$
which can be written as
$$\hat{\xi} = \xi + \sum_{j=1}^{r}\zeta_{1n,j}p_{nj} + \sum_{j=1}^{s}\zeta_{2n,j}Q_{nj}\xi,$$
where $p_{nj}$ is an $n$-dimensional vector with bounded elements, $Q_{nj}$ is an $n \times n$ matrix with bounded row and column sum norms, and the $\zeta_{1n,j}$ and $\zeta_{2n,j}$ are equal to $l_n'\xi/n$, elements of $\lambda_k - \hat{\lambda}_k$, $\gamma_{2l} - \hat{\gamma}_{2l}$, $\theta(x) - \psi'\hat{\beta}$ or their products. This differs from the proof of Lemma 2 in Jin and Lee (2015) in the term $\theta(x) - \psi'\hat{\beta}$ and the potentially increasing order of $d_\gamma$. Then $\zeta_{1n,j} = O_p(\sqrt{p^{1/2}/n} \vee \sqrt{d_\gamma/n})$ and $\zeta_{2n,j} = O_p(\sqrt{p^{1/2}/n} \vee \sqrt{d_\gamma/n})$, instead of $O_p(\sqrt{1/n})$ as in Jin and Lee (2015). Based on this result, the assumptions in Theorem 4 of Su and Qu (2017) hold, so the validity of our bootstrap test follows directly.
References
Ambrosetti, A. and G. Prodi (1995). A Primer of Nonlinear Analysis. Cambridge University
Press.
Anatolyev, S. (2012). Inference in regression models with many regressors. Journal of
Econometrics 170, 368–382.
Autant-Bernard, C. and J. P. LeSage (2011). Quantifying knowledge spillovers using spatial
autoregressive models. Journal of Regional Science 51, 471–496.
Bloom, N., M. Schankerman, and J. van Reenen (2013). Identifying technology spillovers and product market rivalry. Econometrica 81, 1347–1393.
Case, A. C. (1991). Spatial patterns in household demand. Econometrica 59, 953–965.
Chen, X. (2007). Large sample sieve estimation of semi-nonparametric models. In J. J. Heckman and E. E. Leamer (Eds.), Handbook of Econometrics, Volume 6B, Chapter 76, pp. 5549–5632. North Holland.
Chen, X., H. Hong, and E. Tamer (2005). Measurement error models with auxiliary data.
Review of Economic Studies 72, 343–366.
Cliff, A. D. and J. K. Ord (1973). Spatial Autocorrelation. London: Pion.
Conley, T. G. and B. Dupor (2003). A spatial analysis of sectoral complementarity. Journal
of Political Economy 111, 311–352.
Cooke, R. G. (1950). Infinite Matrices & Sequence Spaces. Macmillan and Company, London.
De Jong, R. M. and H. J. Bierens (1994). On the limit behavior of a chi-square type test if
the number of conditional moments tested approaches infinity. Econometric Theory 10,
70–90.
De Oliveira, V., B. Kedem, and D. A. Short (1997). Bayesian prediction of transformed
Gaussian random fields. Journal of the American Statistical Association 92, 1422–1433.
Debarsy, N., F. Jin, and L. F. Lee (2015). Large sample properties of the matrix exponential
spatial specification with an application to FDI. Journal of Econometrics 188, 1–21.
Delgado, M. and P. M. Robinson (2015). Non-nested testing of spatial correlation. Journal
of Econometrics 187, 385–401.
Ertur, C. and W. Koch (2007). Growth, technological interdependence and spatial external-
ities: theory and evidence. Journal of Applied Econometrics 22, 1033–1062.
Evans, P. and J. U. Kim (2014). The spatial dynamics of growth and convergence in Korean
regional incomes. Applied Economics Letters 21, 1139–1143.
Fuentes, M. (2007). Approximate likelihood for large irregularly spaced spatial data. Journal
of the American Statistical Association 102, 321–331.
Gao, J. and V. Anh (2000). A central limit theorem for a random quadratic form of strictly
stationary processes. Statistics and Probability Letters 49, 69–79.
Gneiting, T. (2002). Nonseparable, stationary covariance functions for space-time data.
Journal of the American Statistical Association 97, 590–600.
Gradshteyn, I. S. and I. M. Ryzhik (1994). Table of Integrals, Series and Products (5th ed.).
Academic Press, London.
Gupta, A. (2018a). Autoregressive spatial spectral estimates. Journal of Econometrics 203,
80–95.
Gupta, A. (2018b). Nonparametric specification testing via the trinity of tests. Journal of
Econometrics 203, 169–185.
Gupta, A., S. Kokas, and A. Michaelides (2021). Credit market spillovers in a financial
network: Evidence from spatial autoregressive model. Working paper.
Gupta, A. and P. M. Robinson (2015). Inference on higher-order spatial autoregressive
models with increasingly many parameters. Journal of Econometrics 186, 19–31.
Gupta, A. and P. M. Robinson (2018). Pseudo maximum likelihood estimation of spatial
autoregressive models with increasing dimension. Journal of Econometrics 202, 92–107.
Hahn, J., G. Kuersteiner, and M. Mazzocco (2021). Joint time-series and cross-section limit
theory under mixingale assumptions. Econometric Theory firstview, 1–17.
Han, X., L. F. Lee, and X. Xu (2021). Large sample properties of Bayesian estimation of spatial econometric models. Econometric Theory 37, 708–746.
Hannan, E. J. (1970). Multiple Time Series. John Wiley & Sons.
Heston, A., R. Summers, and B. Aten (2002). Penn World Tables Version 6.1. Downloadable dataset, Center for International Comparisons at the University of Pennsylvania.
Hidalgo, J. and M. Schafgans (2017). Inference and testing breaks in large dynamic panels
with strong cross sectional dependence. Journal of Econometrics 196, 259–274.
Hillier, G. and F. Martellosio (2018a). Exact and higher-order properties of the MLE in
spatial autoregressive models, with applications to inference. Journal of Econometrics 205,
402–422.
Hillier, G. and F. Martellosio (2018b). Exact likelihood inference in group interaction network
models. Econometric Theory 34, 383–415.
Ho, C.-Y., W. Wang, and J. Yu (2013). Growth spillover through trade: A spatial dynamic
panel data approach. Economics Letters 120, 450–453.
Hong, Y. and H. White (1995). Consistent specification testing via nonparametric series
regression. Econometrica 63, 1133–1159.
Huber, P. J. (1973). Robust regression: Asymptotics, conjectures and Monte Carlo. The
Annals of Statistics 1, 799–821.
Jenish, N. and I. R. Prucha (2009). Central limit theorems and uniform laws of large numbers
for arrays of random fields. Journal of Econometrics 150, 86–98.
Jenish, N. and I. R. Prucha (2012). On spatial processes and asymptotic inference under near-epoch dependence. Journal of Econometrics 170, 178–190.
Jin, F. and L. F. Lee (2015). On the bootstrap for Moran's I test for spatial dependence. Journal of Econometrics 184, 295–314.
Kelejian, H. H. and I. R. Prucha (1998). A generalized spatial two-stage least squares
procedure for estimating a spatial autoregressive model with autoregressive disturbances.
Journal of Real Estate Finance and Economics 17, 99–121.
Kelejian, H. H. and I. R. Prucha (2001). On the asymptotic distribution of the Moran I test
statistic with applications. Journal of Econometrics 104, 219–257.
Koenker, R. and J. A. F. Machado (1999). GMM inference when the number of moment
conditions is large. Journal of Econometrics 93, 327–344.
König, M. D., D. Rohner, M. Thoenig, and F. Zilibotti (2017). Networks in conflict: Theory and evidence from the Great War of Africa. Econometrica 85, 1093–1132.
Kuersteiner, G. M. and I. R. Prucha (2020). Dynamic spatial panel models: Networks,
common shocks, and sequential exogeneity. Econometrica 88, 2109–2146.
Lee, J., P. C. B. Phillips, and F. Rossi (2020). Consistent misspecification testing in spatial autoregressive models. Cowles Foundation Discussion Paper no. 2256.
Lee, J. and P. M. Robinson (2016). Series estimation under cross-sectional dependence.
Journal of Econometrics 190, 1–17.
Lee, L. F. (2004). Asymptotic distributions of quasi-maximum likelihood estimators for
spatial autoregressive models. Econometrica 72, 1899–1925.
Lee, L. F. and X. Liu (2010). Efficient GMM estimation of high order spatial autoregressive
models with autoregressive disturbances. Econometric Theory 26, 187–230.
LeSage, J. P. and R. Pace (2007). A matrix exponential spatial specification. Journal of
Econometrics 140, 190–214.
Malikov, E. and Y. Sun (2017). Semiparametric estimation and testing of smooth coefficient
spatial autoregressive models. Journal of Econometrics 199, 12–34.
Matérn, B. (1986). Spatial Variation. Almänna Förlaget, Stockholm.
Mohnen, M. (2021). Stars and brokers: Peer effects among medical scientists. Management
Science (forthcoming), 20 pp.
Newey, W. K. (1997). Convergence rates and asymptotic normality for series estimators.
Journal of Econometrics 79, 147–168.
Oettl, A. (2012). Reconceptualizing stars: scientist helpfulness and peer performance. Man-
agement Science 58, 1122–1140.
Pinkse, J. (1999). Asymptotic properties of Moran and related tests and testing for spatial
correlation in probit models. Mimeo: Department of Economics, University of British
Columbia and University College London.
Pinkse, J., M. E. Slade, and C. Brett (2002). Spatial price competition: A semiparametric
approach. Econometrica 70, 1111–1153.
Portnoy, S. (1984). Asymptotic behavior of M-estimators of p regression parameters when
p2/n is large. I. Consistency. The Annals of Statistics 12, 1298–1309.
Portnoy, S. (1985). Asymptotic behavior of M-estimators of p regression parameters when
p2/n is large; II. Normal approximation. The Annals of Statistics 13, 1403–1417.
Robinson, P. M. (1972). Non-linear regression for multiple time-series. Journal of Applied
Probability 9, 758–768.
Robinson, P. M. (1988). The stochastic difference between econometric statistics. Econo-
metrica 56, 531–548.
Robinson, P. M. (2011). Asymptotic theory for nonparametric regression with spatial data.
Journal of Econometrics 165, 5–19.
Robinson, P. M. and F. Rossi (2015). Refined tests for spatial correlation. Econometric
Theory 31, 1249–1280.
Robinson, P. M. and S. Thawornkaiwong (2012). Statistical inference on regression with
spatial dependence. Journal of Econometrics 167, 521–542.
Scott, D. J. (1973). Central limit theorems for martingales and for processes with stationary
increments using a Skorokhod representation approach. Advances in Applied Probability 5,
119–137.
Stein, M. (1999). Interpolation of Spatial Data. Springer-Verlag, New York.
Stinchcombe, M. B. and H. White (1998). Consistent specification testing with nuisance
parameters present only under the alternative. Econometric Theory 14, 295–325.
Su, L. and S. Jin (2010). Profile quasi-maximum likelihood estimation of partially linear
spatial autoregressive models. Journal of Econometrics 157, 18–33.
Su, L. and X. Qu (2017). Specification test for spatial autoregressive models. Journal of
Business & Economic Statistics 35, 572–584.
Sun, Y. (2016). Functional-coefficient spatial autoregressive models with nonparametric
spatial weights. Journal of Econometrics 195, 134–153.
Sun, Y. (2020). The LLN and CLT for U-statistics under cross-sectional dependence. Journal
of Nonparametric Statistics 32, 201–224.