
A Consistent Nonparametric Test of Parametric Regression Functional Form in Fixed Effects Panel Data Models∗

Qi Li
Department of Economics
Texas A&M University
College Station, TX, U.S.A.

Yiguo Sun
Department of Economics
University of Guelph
Guelph, Ontario N1G 2W1, Canada

October 18, 2011

We propose a consistent nonparametric test of a parametric linear functional form against a nonparametric alternative in the framework of fixed effects panel data models. The proposed test statistic is based on an integrated squared difference between a parametric and a non-iterative kernel curve estimate. We show that the test has a limiting standard normal distribution under the null hypothesis of a linear fixed effects panel data model, and that the test is consistent. We also establish the asymptotic validity of a bootstrap procedure used to better approximate the finite sample null distribution of the test statistic. Simulation results show that the proposed test performs well for panel data with a large number of cross-sectional units and a finite number of observations across time.

Key words: Bootstrap; Consistent test; Fixed effects; Nonparametric estimation; Panel data.

JEL Classification: C12, C14, C23

∗The corresponding author: Qi Li, email: [email protected]. The authors thank the Social Sciences and Humanities Research Council of Canada (Project # 410-2009-0109) for financial support. Li also thanks the National Natural Science Foundation of China (Project # 70773005) for financial support.


1 Introduction

Panel data analysis based on parametric (often linear) model specifications is well developed and widely used in empirical studies. The three most popularly cited econometrics textbooks, by Arellano (2003), Baltagi (2005) and Hsiao (2003), give excellent overviews of parametric panel data analysis techniques. As more advanced (parametric model based) estimation methods and more panel data sets become available, we expect to see more in-depth empirical studies using panel data. One shortcoming of the parametric modeling approach, however, is that parametric models may be misspecified in practice, so one should guard against potential model misspecification. For example, it is always worth checking the validity of an assumed parametric functional form before making inferences based on that specification. This paper aims to provide a test statistic for a linear parametric functional form against a nonparametric alternative in a fixed effects panel data model framework.

To the best of our knowledge, in the fixed effects panel data model framework, the only nonparametric test statistic available in the existing literature is the one proposed by Henderson, Carroll and Li (2008; henceforth, HCL), which is based on the average squared difference between a parametric and a nonparametric curve estimate. However, the limiting distribution of HCL's test statistic was not obtained, owing to the complication of the back-fitting-based nonparametric curve estimator used to construct it. Instead, they relied on a bootstrap procedure to approximate the null distribution of their test statistic. HCL further conjectured that, after being properly centered and scaled, their test statistic may have an asymptotic normal distribution under the null hypothesis of a linear panel data fixed effects model, and they left the verification of this conjecture as a future research topic.

In this paper, we propose a test statistic based on the integrated squared difference between parametric and nonparametric curve estimates. After several simplification steps, we obtain a simple test statistic, which is shown to have a limiting standard normal distribution under the null hypothesis of the linear fixed effects panel data model. Our success in obtaining an explicit limiting result under the null hypothesis stems from a non-iterative kernel estimator of the nonparametric fixed effects panel data model motivated by Sun, Carroll and Li (2009; henceforth, SCL), who proposed a nonparametric version of the least-squares dummy variable (LSDV) approach to estimate semiparametric varying coefficient models.

Both HCL and SCL provide nonparametric tests of random effects against fixed effects models, in the frameworks of nonparametric and semiparametric varying coefficient models, respectively. As our proposed functional form test is designed to work for a fixed effects model, it evidently also works for a random effects panel data model. Therefore, one does not have to pre-test for a random effects model against a fixed effects model before using our proposed test of a parametric regression functional form, although knowing that the true model is a random effects model would evidently reduce the difficulty of both the nonparametric estimation and the testing procedures.


The literature on nonparametric model specification testing is huge. For the independent data case, it includes Bierens and Ploberger (1997), Ellison and Ellison (2000), Fan and Li (1996), Härdle and Mammen (1993), Hong and White (1995), Horowitz (2001), Stengos and Sun (2001), Zheng (1996), Whang (2000), and Yatchew (1992), among many others. Many of the existing tests have been shown to remain valid with weakly dependent data; see Chen and Fan (1999), Fan and Li (1999), and Li (1999), to mention only a few. More recently, nonparametric model specification tests have been extended to the integrated time series case; see Gao et al. (2009), and Sun, Cai and Li (2009).

Our test is in the spirit of Härdle and Mammen (1993), who proposed a test statistic based on the integrated squared distance between the parametric and nonparametric fits for cross-sectional data. We adopt the same distance measure; i.e., we construct a test statistic based on the integrated squared difference between the parametric and nonparametric fits of fixed effects panel data models. After several simplifying steps, our final test statistic is closely linked to the test statistics of Fan and Li (1996), Li and Wang (1998), and Zheng (1996). Equipped with a standard limiting result under the null hypothesis, our test provides empirical researchers with a simple but useful alternative to HCL's test of the same null hypothesis; compared with HCL's test, ours is computationally friendlier. Moreover, we establish theoretically the asymptotic validity of a bootstrap procedure used to better approximate the finite sample null distribution of our test statistic. Monte Carlo simulation results show that the proposed test performs well in small samples when critical values are obtained from the bootstrap method.

The rest of the paper is organized as follows. Section 2 describes the parametric and nonparametric fixed effects panel data models and provides the respective consistent estimators. In Section 3, we explain how to construct a consistent test of a parametric panel data model against a nonparametric alternative in the presence of unobserved fixed effects; in that section we derive not only the limiting results of the proposed test statistic but also the limiting result of a bootstrap statistic used to enhance the finite-sample performance of our test. Monte Carlo simulations in Section 4 evaluate the finite sample performance of the proposed test and the nonparametric estimator. Finally, Section 5 concludes the paper. All mathematical proofs are relegated to the Appendix.

2 Model and the estimation method

Given a panel data set $\{(X_{it}, Y_{it}) : i = 1, \ldots, n;\ t = 1, \ldots, T\}$, without further knowledge of how the explanatory variables are related to the dependent variable, one may start to study the relationship between $Y$ and $X$ using a general nonparametric fixed effects panel data regression model defined below,

$$Y_{it} = \theta(X_{it,1}, \ldots, X_{it,q}) + \mu_i + \nu_{it}, \quad (i = 1, \ldots, n;\ t = 1, \ldots, T) \qquad (2.1)$$


where the model contains $q \ge 1$ covariates $(X_{it,1}, \ldots, X_{it,q})$, $\theta(\cdot)$ is an unknown, measurable smooth function to be estimated, and $Y_{it}$, $\mu_i$, and $\nu_{it}$ are all scalars. We allow $E(\mu_i \mid X_{it,1}, \ldots, X_{it,q}) \neq 0$. Hence, this is a fixed effects nonparametric panel data model, where we neither assume a known functional form for $E(\mu_i \mid X_{it,1}, \ldots, X_{it,q})$ nor assume the existence of adequate instrumental variables for the model. We further assume that the unobserved individual effects $\{\mu_i\}$ are an i.i.d. sequence of random variables with zero mean and finite variance $\sigma_\mu^2 > 0$. For the idiosyncratic error terms, we assume that $\{\nu_{it}\}$ is i.i.d. with zero mean and finite variance $\sigma_\nu^2 > 0$, independent of $\{\mu_i\}$, and satisfying $E(\nu_{it} \mid \mathcal{X}) = 0$ for all $i, t$, where $\mathcal{X} = \{(X_{js,1}, \ldots, X_{js,q})\}_{j=1,\ldots,n;\ s=1,\ldots,T}$. Hence, the explanatory variables are strictly exogenous. The i.i.d. assumption on $\nu_{it}$ can be relaxed to some stationary process in $t$, and the strict exogeneity of the covariates can be relaxed to contemporaneous exogeneity, but we do not pursue these extensions in this paper, to keep the exposition simple.
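To fix ideas, the setup just described can be simulated in a few lines. Everything concrete below (the sine choice of θ(·), the uniform covariate, the noise scales) is illustrative rather than taken from the paper; the structural features that matter are that $\mu_i$ is correlated with the covariate and that $\nu_{it}$ is independent noise.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 200, 5                                  # many units, few time periods

# Covariates: i.i.d. across units i (Assumption A1)
X = rng.uniform(0.0, 1.0, size=(n, T))

# Fixed effects correlated with the covariate, so that E(mu_i | X) != 0
mu = X.mean(axis=1) + 0.2 * rng.standard_normal(n)
mu -= mu.mean()                                # impose the zero-mean normalization

theta = lambda x: np.sin(2.0 * np.pi * x)      # illustrative theta(.)
nu = 0.2 * rng.standard_normal((n, T))         # i.i.d. idiosyncratic errors

Y = theta(X) + mu[:, None] + nu                # model (2.1)
```

Because $E(\mu_i \mid X) \neq 0$ here, a pooled nonparametric regression of $Y$ on $X$ would be biased; this is the situation the fixed effects machinery below is designed for.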

When $q \ge 2$, one may find it difficult to interpret the function $\theta(X)$. Moreover, nonparametric estimation methods suffer from the 'curse of dimensionality' problem. Therefore, applied researchers tend to impose some parametric structure on model (2.1). The most popular choice is a linear fixed effects panel data model,

$$Y_{it} = \alpha + \beta^\top X_{it} + \mu_i + \nu_{it}, \quad (i = 1, \ldots, n;\ t = 1, \ldots, T) \qquad (2.2)$$

where $\beta$ is a $q \times 1$ parameter vector to be estimated. We keep the same assumptions on the unknown fixed effects $\mu_i$ and the error terms $\nu_{it}$ as explained above. We consider the problem of testing the null hypothesis of a linear fixed effects panel data model (2.2) against the nonparametric alternative model (2.1) in Section 3. To help motivate our kernel estimator for model (2.1) and our proposed test statistic, we first review the estimation procedure for the linear fixed effects model (2.2) in the next subsection.

We introduce some notation for clarity. Throughout this paper, $I_n$ denotes the identity matrix of dimension $n$, $\iota_m$ denotes an $m \times 1$ column vector of ones, and "$\xrightarrow{d}$" and "$\xrightarrow{p}$" refer to convergence in distribution and in probability, respectively. Also, $A^\top$ denotes the transpose of $A$; e.g., it transforms an $n \times k$ matrix $A$ into a $k \times n$ matrix $A^\top$. In addition, for a function $g : \mathbb{R}^q \to \mathbb{R}$, we define its first- and second-order partial derivatives as $g^{(1)}(x) = \partial g(x)/\partial x$ (a $q \times 1$ vector) and $g^{(2)}(x) = \partial^2 g(x)/\partial x \partial x^\top$ (a $q \times q$ matrix). Finally, $M$ is a generic finite constant which can take different values in different places, and $nT = n \times T$.

2.1 A Linear panel data fixed effects model

In this subsection, we review the parametric fixed effects estimator of the linear panel data fixed effects

model, which is useful in motivating our nonparametric estimator for θ(·) and for constructing our test

statistic.


Rewriting model (2.2) in matrix form yields

$$\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} = \alpha\,\iota_{nT} + \begin{pmatrix} \iota_T & 0 & \cdots & 0 \\ 0 & \iota_T & \cdots & 0 \\ & & \ddots & \\ 0 & 0 & \cdots & \iota_T \end{pmatrix} \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{pmatrix} + \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix}\beta + \begin{pmatrix} \nu_1 \\ \nu_2 \\ \vdots \\ \nu_n \end{pmatrix},$$

where $Y_i = (Y_{i1}, \ldots, Y_{iT})^\top$, $\nu_i = (\nu_{i1}, \ldots, \nu_{iT})^\top$, and $X_i = (X_{i1}^\top, \ldots, X_{iT}^\top)^\top$ is a $T \times q$ matrix. Letting $Y = (Y_1^\top, \ldots, Y_n^\top)^\top$, $\nu = (\nu_1^\top, \ldots, \nu_n^\top)^\top$ and $X = (X_1^\top, \ldots, X_n^\top)^\top$, we can write the above model more compactly as

$$Y = \alpha\,\iota_{nT} + X\beta + D_0\,\mu + \nu, \qquad (2.3)$$

where $D_0 = I_n \otimes \iota_T$ is an $nT \times n$ matrix and "$\otimes$" is the Kronecker product. Model (2.3) is called the least-squares dummy variable (LSDV) model in traditional textbooks (e.g., Hsiao (2003, p. 32)).

To remove the unknown fixed effects, one can pre-multiply both sides of equation (2.3) by the projection matrix $M_{D_0} = I_{nT} - D_0(D_0^\top D_0)^{-1}D_0^\top$ (the 'within-groups transformation'), which gives

$$M_{D_0} Y = M_{D_0} X\beta + M_{D_0}\nu, \qquad (2.4)$$

as $M_{D_0}\iota_{nT} = 0$ and $M_{D_0}D_0 = 0$. Applying the OLS method to model (2.4) gives the traditional fixed effects estimator of $\beta$ (the within-groups estimator):

$$\hat\beta = (X^\top M_{D_0} X)^{-1} X^\top M_{D_0} Y. \qquad (2.5)$$

Then, $\alpha$ can be estimated by

$$\hat\alpha = (nT)^{-1}\iota_{nT}^\top (Y - X\hat\beta). \qquad (2.6)$$
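Numerically, pre-multiplying by $M_{D_0}$ is just subtracting each unit's time average, so (2.4)-(2.6) can be sketched without ever forming the $nT \times nT$ projection matrix. The data-generating choices below are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T, q = 300, 4, 2
alpha, beta = 1.0, np.array([2.0, -1.0])

X = rng.standard_normal((n, T, q))
mu = X[:, :, 0].mean(axis=1) + rng.standard_normal(n)   # fixed effects correlated with X
mu -= mu.mean()
Y = alpha + X @ beta + mu[:, None] + 0.1 * rng.standard_normal((n, T))  # model (2.2)

# Within-groups transformation: M_{D0} subtracts each unit's time average
Xw = (X - X.mean(axis=1, keepdims=True)).reshape(n * T, q)
Yw = (Y - Y.mean(axis=1, keepdims=True)).reshape(n * T)

beta_hat, *_ = np.linalg.lstsq(Xw, Yw, rcond=None)      # eq. (2.5)
alpha_hat = (Y - X @ beta_hat).mean()                   # eq. (2.6)
```

The within transformation removes $\mu_i$ exactly, so $\hat\beta$ is unbiased even though the fixed effects are correlated with $X$; pooled OLS on the untransformed data would not be.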

2.2 Fixed-effects nonparametric panel data models

Model (2.1) has been studied by Henderson, Carroll and Li (2008, HCL) for sufficiently large $n$ and finite $T$, who propose a backfitting method to estimate θ(·). Mammen, Støve, and Tjøstheim (2009) considered the fixed effects nonparametric additive model and showed that the kernel estimator is consistent, also via a backfitting method. No asymptotic distribution results are obtained in either Henderson, Carroll and Li (2008) or Mammen, Støve, and Tjøstheim (2009). Sun, Carroll and Li (2009, SCL) applied a nonparametric version of the LSDV approach to obtain a consistent local linear estimator for a semiparametric varying coefficient fixed effects panel data model. We call SCL's estimation method the nonparametric LSDV method. Compared with the backfitting method, the nonparametric LSDV method enjoys one advantage: it has a closed-form mathematical expression for the kernel estimator of the unknown function θ(·), which significantly simplifies both the limit theory and the computation of the kernel estimator. Our proposed test is based on the integrated squared difference between the parametric LSDV (or fixed effects) fit from model (2.3) and the nonparametric LSDV fit from model (2.1).


Rewriting model (2.1) in matrix form gives

$$Y = \theta(X) + D_0\mu + \nu, \qquad (2.7)$$

where $\theta(X) = [\theta(X_1), \theta(X_2), \ldots, \theta(X_n)]^\top$ with $\theta(X_i) = [\theta(X_{i1}), \ldots, \theta(X_{iT})]^\top$ for $i = 1, 2, \ldots, n$.

Define a $T \times T$ diagonal matrix $K_H(X_i, x) = \mathrm{diag}\{K_H(X_{i1}, x), \ldots, K_H(X_{iT}, x)\}$ for each $i = 1, \ldots, n$, and an $(nT) \times (nT)$ diagonal matrix $W_H(x) = \mathrm{diag}\{K_H(X_1, x), \ldots, K_H(X_n, x)\}$, where $K_H(X_{it}, x) = K(H^{-1}(X_{it} - x))$. The product kernel $K(\cdot)$ is defined in Assumption A3 below, and $H = \mathrm{diag}(h_1, \ldots, h_q)$ is a $q \times q$ diagonal bandwidth matrix, which allows different bandwidths for different covariates. As in SCL, mimicking the parametric LSDV estimation method, we remove the unknown fixed effects nonparametrically. Specifically, we solve the following optimization problem:

$$\min_{\{\theta(X),\,\mu\}}\; [Y - \theta(X) - D_0\mu]^\top W_H(x)\,[Y - \theta(X) - D_0\mu], \qquad (2.8)$$

where the kernel weighting matrix $W_H(x)$ ensures that only data with $X_{it}$ close to $x$ (an interior point in the support of $X$) are effectively used in estimating $\theta(x)$. Note that we do not place any weighting matrix for data variation, as we have assumed that the idiosyncratic errors $\{\nu_{it}\}$ are i.i.d. across equations.

Taking the first-order condition of the objective function in (2.8) with respect to $\mu$ gives

$$D_0^\top W_H(x)\,[Y - \theta(X) - D_0\hat\mu(x)] = 0,$$

which yields

$$\hat\mu(x) = [D_0^\top W_H(x) D_0]^{-1} D_0^\top W_H(x)\,[Y - \theta(X)]. \qquad (2.9)$$

Define $S_H^0(x) = M_H^0(x)^\top W_H(x) M_H^0(x)$ and $M_H^0(x) = I_{nT} - D_0[D_0^\top W_H(x) D_0]^{-1} D_0^\top W_H(x)$. Replacing $\mu$ in (2.8) by $\hat\mu(x)$, we obtain the concentrated weighted least squares problem

$$\min_{\theta(X)}\; [Y - \theta(X)]^\top S_H^0(x)\,[Y - \theta(X)]. \qquad (2.10)$$

The kernel estimator of $\theta(x)$ is obtained by replacing $\theta(X_{it})$ with $\theta(x)$ for those $X_{it}$ in the vicinity of $x$; i.e., (2.10) becomes $\min_a\,(Y - a\iota_{nT})^\top S_H^0(x)(Y - a\iota_{nT})$. Consequently, one may be tempted to use the solution $\hat a = [\iota_{nT}^\top S_H^0(x)\iota_{nT}]^{-1}\iota_{nT}^\top S_H^0(x) Y$ to estimate $\theta(x)$. However, this method is not feasible, as the time-invariant term $a\iota_{nT}$ is removed by the local weighting matrix $S_H^0(x)$. That is, $S_H^0(x)\iota_{nT} \equiv 0$, since $S_H^0(x)$ is designed to remove any time-invariant term in model (2.7). Hence, $\iota_{nT}^\top S_H^0(x)\iota_{nT}$ is not invertible.

As the infeasibility of the estimator derived above stems from the complete elimination of any time-invariant term in model (2.7), to make this estimation methodology work we have to replace the local weighting matrix $S_H^0(x)$ in (2.10) by another matrix, say $S_H(x)$, so that the estimator of $\theta(x)$ becomes

$$\hat\theta(x) = [\iota_{nT}^\top S_H(x)\iota_{nT}]^{-1}\iota_{nT}^\top S_H(x) Y = \theta(x) + [\iota_{nT}^\top S_H(x)\iota_{nT}]^{-1}\iota_{nT}^\top S_H(x)\,[\theta(X) - \theta(x)\iota_{nT} + D_0\mu + \nu], \qquad (2.11)$$


where the second equality is obtained by replacing $Y$ in the first expression with $Y = \theta(X) + D_0\mu + \nu$ from (2.7) and adding and subtracting the term $\theta(x)\iota_{nT}$. To make $\hat\theta(x)$ a consistent nonparametric estimator of $\theta(x)$, $S_H(x)$ has to serve two goals: (a) it removes the unobserved fixed effects either completely or asymptotically, and (b) it selects only $X_{it}$ close to $x$ for the estimation of $\theta(x)$. Goal (b) can be met by keeping the kernel matrix $W_H(x)$ as it is. This leaves only one choice in modifying $S_H^0(x)$: replace $D_0$ by another matrix $D$ such that $\iota_{nT}^\top S_H(x)\iota_{nT} \neq 0$ but $[\iota_{nT}^\top S_H(x)\iota_{nT}]^{-1}[\iota_{nT}^\top S_H(x) D_0\mu]$ is asymptotically negligible. This is a reasonable remedy for problem (2.10). On the other hand, one may wonder whether it is possible to find an $S_H(x)$ matrix such that $\iota_{nT}^\top S_H(x) D_0\mu = 0$ (i.e., $\mu$ is completely removed) and, at the same time, $\iota_{nT}^\top S_H(x)\iota_{nT}$ is invertible. It is easy to see that this is not possible, because $\iota_{nT}^\top S_H(x) D_0\mu = 0$ implies $\iota_{nT}^\top S_H(x)\iota_{nT} = 0$. Hence, the best we can hope for is a matrix $S_H(x)$ such that $\iota_{nT}^\top S_H(x)\iota_{nT}$ is invertible and $[\iota_{nT}^\top S_H(x)\iota_{nT}]^{-1}[\iota_{nT}^\top S_H(x) D_0\mu]$ is asymptotically negligible. Our idea therefore is to introduce a new local weighting matrix $S_H(x) = M_H(x)^\top W_H(x) M_H(x)$ with $M_H(x) = I_{nT} - D[D^\top W_H(x) D]^{-1} D^\top W_H(x)$; that is, $S_H(x)$ and $M_H(x)$ are $S_H^0(x)$ and $M_H^0(x)$, respectively, with $D_0$ replaced by an alternative matrix $D$. As a result, unlike the classical nonparametric kernel estimation method, $\hat\theta(x)$ evidently contains a bias term arising from the nonparametric approximation, $\theta(X) - \theta(x)\iota_{nT}$, as well as a bias term due to the presence of the unknown fixed effects, $D_0\mu$.

As for the choice of $D$ satisfying the requirements discussed above, we use the same $D$ as in SCL: $D = [-\iota_{n-1}\ \ I_{n-1}]^\top \otimes \iota_T$, an $(nT) \times (n-1)$ matrix. In the Appendix, equation (A.4) shows that

$$\iota_{nT}^\top S_H(x)\iota_{nT} = n^2 \Big/ \sum_{i=1}^n \Big(\sum_{t=1}^T K(H^{-1}(X_{it} - x))\Big)^{-1} > 0$$

with a second-order kernel $K(\cdot)$ and a properly chosen bandwidth matrix $H$; equations (A.4) and (A.5) give

$$[\iota_{nT}^\top S_H(x)\iota_{nT}]^{-1}\,\iota_{nT}^\top S_H(x) D_0\mu = n^{-1}\sum_{i=1}^n \mu_i = O_p(n^{-1/2}),$$

as $\{\mu_i\}$ is an i.i.d. sequence with zero mean and finite variance. Therefore, using $S_H(x)$ as the local weighting matrix removes the unobserved fixed effects asymptotically as the sample size $n \to \infty$.

Let us formally summarize our proposed estimator as follows:

$$\hat\theta(x) = [\iota_{nT}^\top S_H(x)\iota_{nT}]^{-1}\,\iota_{nT}^\top S_H(x) Y, \qquad (2.12)$$

where $S_H(x) = M_H(x)^\top W_H(x) M_H(x)$ with $M_H(x) = I_{nT} - D[D^\top W_H(x) D]^{-1} D^\top W_H(x)$ and $D = [-\iota_{n-1}\ \ I_{n-1}]^\top \otimes \iota_T$. We give the limiting result of $\hat\theta(x)$ in Theorem 2.1 below.¹

By applying the nonparametric LSDV method, we see that our proposed estimator $\hat\theta(x)$ defined in (2.12) is parallel to the pooled nonparametric estimator (which ignores the fixed effects) defined by

$$\tilde\theta(x) = [\iota_{nT}^\top W_H(x)\iota_{nT}]^{-1}\,\iota_{nT}^\top W_H(x) Y = \sum_{i=1}^n\sum_{t=1}^T \tilde\omega_{it} Y_{it},$$

where we denote $\lambda_{it} = K(H^{-1}(X_{it} - x))$ and $\tilde\omega_{it} = \lambda_{it}/\sum_{j=1}^n\sum_{s=1}^T \lambda_{js}$. The pooled estimator $\tilde\theta(x)$ is an average of the dependent variable with weight $\tilde\omega_{it}$ attached to $Y_{it}$. In contrast, by (A.3) and (A.4),

¹In practice, one may use a different $D$ matrix as long as (a) and (b) are satisfied, or some other method, as long as it removes the fixed effects asymptotically. Su and Ullah (2006) used the same $D$ matrix and a different estimation procedure to estimate a partially linear fixed effects panel data model.


we show that our proposed estimator satisfies $\hat\theta(x) = n^{-1}\sum_{i=1}^n\sum_{t=1}^T \omega_{it} Y_{it}$, where $\omega_{it} = \lambda_{it}/\sum_{s=1}^T \lambda_{is}$. Hence, $\hat\theta(x)$ is also an average of the dependent variable, with the weight attached to $Y_{it}$ given by $n^{-1}\omega_{it}$.

Of course, the pooled estimator is inconsistent due to the presence of the unknown fixed effects. By replacing the weighting matrix $W_H(x)$ in the pooled estimator with $S_H(x)$, the nonparametric LSDV estimator becomes a consistent estimator of $\theta(x)$, because it removes the fixed effects locally before estimation. Specifically, the local weights of the pooled estimator $\tilde\theta(x)$ satisfy $\sum_{i=1}^n\sum_{t=1}^T \tilde\omega_{it} = 1$, while $\sum_{t=1}^T \omega_{it} = 1$ for our proposed estimator $\hat\theta(x)$; that is, $\hat\theta(x)$ forces the local weights to sum to one within each individual unit. Hence, the relative importance of $Y_{it}$ in $\tilde\theta(x)$ is measured against all other observations on $Y$, while the relative importance of $Y_{it}$ in $\hat\theta(x)$, given $n$, is measured against the $T$ observations within the $i$th unit. In some sense, the presence of the unknown fixed effects only affects the way the weight function enters the local least-squares dummy variable regression.

Owing to the clear-cut expression of the kernel estimator (also called the local constant estimator), readers can see directly how the local weights are imposed in $\hat\theta(x)$. This is why we choose to illustrate our estimation methodology via the kernel estimator, although our approach can easily be extended to local polynomial kernel estimators.
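The weighting contrast just described is easy to see numerically. The kernel and design below are illustrative (the kernel's normalizing constant cancels from both weight ratios). The last lines illustrate the fixed effects logic: adding a unit-specific constant $c_i$ to $Y$ shifts the within-unit-weighted estimator by exactly the cross-sectional mean of $c$, which is the $\bar\mu$ term appearing in Theorem 2.1.

```python
import numpy as np

rng = np.random.default_rng(3)
n, T, h, x0 = 400, 5, 0.25, 0.5

X = rng.uniform(0.0, 1.0, size=(n, T))
Y = np.sin(2 * np.pi * X) + 0.1 * rng.standard_normal((n, T))

lam = np.exp(-0.5 * ((X - x0) / h) ** 2)         # kernel weights (constant factor cancels)

w_pooled = lam / lam.sum()                       # pooled: normalized over ALL observations
w_within = lam / lam.sum(axis=1, keepdims=True)  # proposed: normalized WITHIN each unit

theta_pooled = np.sum(w_pooled * Y)
theta_fe = np.mean(np.sum(w_within * Y, axis=1)) # n^{-1} sum_i sum_t omega_it Y_it

# A unit-specific shift c_i moves theta_fe by exactly mean(c): the \bar\mu term
c = rng.standard_normal(n)
theta_fe_c = np.mean(np.sum(w_within * (Y + c[:, None]), axis=1))
```

Because the within-unit weights sum to one for each $i$, unit-level shifts enter only through their cross-sectional average, which vanishes at rate $n^{-1/2}$; the pooled weights have no such property.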

To derive the asymptotic distribution of $\hat\theta(x)$, we list some regularity conditions below.

(A1) The continuous random variables $(Y_{it}, X_{it})$ are independently and identically distributed (i.i.d.) across the $i$ index and are generated according to equation (2.1). Further, we assume that

(a) $X_{it}$ is a strictly stationary $\alpha$-mixing process with mixing coefficients $\alpha_k = O(k^{-(\delta+2)/\delta})$ and $E(\|X_{it}\|^{2+\delta'}) < \infty$ for some $\delta' > \delta > 0$. $X_{it}$ has a common p.d.f. $f(x)$, and $(X_{it}, X_{is})$ has a joint p.d.f. $f_{t,s}(x_1, x_2)$. Also, $f(x) > 0$ at an interior point $x \in S$, where $S$ is the support of $X_{it}$.

(b) $\theta(x)$, $f(x)$, and $f_{t,s}(x_1, x_2)$ are all twice continuously differentiable in a neighborhood of $x \in S$.

(A2) $X$, the $nT \times q$ matrix defined in eq. (2.3), has full rank $q$. The unobserved fixed effects $\mu_i$ are i.i.d. with zero mean, finite variance $\sigma_\mu^2 > 0$, and $E(\mu_i \mid X_{it}) \neq 0$. The idiosyncratic errors $\{\nu_{it}\}$ are assumed to be an i.i.d. sequence satisfying $E(\nu_{it} \mid \{(\mu_i, X_{it})\}) = 0$, $E(\nu_{it}^2 \mid \{(\mu_i, X_{it})\}) = \sigma_\nu^2$, and $E(|\nu_{it}|^{2+\delta'} \mid \{(\mu_i, X_{it})\}) < \infty$ for all $i$ and $t$.

(A3) $K(u) = \prod_{s=1}^q k(u_s)$ is a product kernel, where the univariate kernel function $k(\cdot)$ is a uniformly bounded, symmetric (around zero) probability density function with compact support $[-1, 1]$.

(A4) Define $|H| = h_1 \cdots h_q$ and $\|H\| = \sqrt{\sum_{j=1}^q h_j^2}$. As $T \to \infty$, $\|H\| \to 0$ and $T|H| \to \infty$. Also, $\sqrt{nT|H|}\,\|H\|^2 = O(1)$ as $n \to \infty$ and $T \to \infty$.

The assumptions listed above are regularity conditions commonly seen in the nonparametric estimation literature. Of course, one can extend the current paper by allowing the error terms to exhibit serial correlation and heteroskedasticity. Also, the strict stationarity condition imposed by Assumption A1(a) can be relaxed to allow for heteroskedasticity. However, without pursuing these extensions, we can explain the proposed estimator better in the simple setup regulated by Assumptions A1 and A2. In Assumption A3, restricting the kernel function to the closed interval $[-1, 1]$ is not essential, and the assumption can be relaxed to a general second-order kernel (such as a standard normal kernel) at the cost of some extra, lengthy proofs. In Assumption A4, we need $\|H\| \to 0$ and $T|H| \to \infty$ as $T \to \infty$ to deal with the random denominator $(\sum_{t=1}^T \lambda_{it})^{-1}$ appearing in $\hat\theta(x)$, so that $(T|H|)^{-1}\sum_{t=1}^T \lambda_{it} = f(x) + O_p(\|H\|^2) + O_p((T|H|)^{-1/2})$, which implies $(T|H|)^{-1}\sum_{t=1}^T \lambda_{it} \xrightarrow{p} f(x)$ for each $i$. However, in practice $T$ need not be very large for the proposed estimator to be accurate. In fact, $T$ goes to infinity at a much slower speed than $n$ does; see Remark 2.2 below.

Below, we present the limiting results of $\hat\theta(x)$; the proofs are delayed to Appendix 6.1.

THEOREM 2.1 Under Assumptions A1-A4, we have, at an interior point $x \in S$,

$$\sqrt{nT|H|}\,\big[\hat\theta(x) - \theta(x) - B_H(x) - \bar\mu\big] \xrightarrow{d} N\Big(0,\ \frac{\zeta_0\sigma_\nu^2}{f(x)}\Big), \qquad (2.13)$$

where $B_H(x) = \kappa_2\,\mathrm{tr}\big\{H\big[\theta^{(1)}(x) f^{(1)}(x)^\top/f(x) + \theta^{(2)}(x)/2\big]H\big\} = O(\|H\|^2)$, $\kappa_2 = \int K(v)v^2\,dv$, $\zeta_0 = \int K(v)^2\,dv$, and $\bar\mu = n^{-1}\sum_{i=1}^n \mu_i$.

Remark 2.1 Theorem 2.1 implies that $\hat\theta(x) = \theta(x) + B_H(x) + \bar\mu + O_p((nT|H|)^{-1/2}) = \theta(x) + O(\|H\|^2) + O_p(n^{-1/2}) + O_p((nT|H|)^{-1/2})$. Then, Assumption A4 ensures that $\hat\theta(x) \xrightarrow{p} \theta(x)$ as $n \to \infty$, since $\bar\mu \to 0$ as $n \to \infty$. Therefore, the unobserved fixed effects are asymptotically negligible (with a sufficiently large $n$) even though they are not completely removed in small-sample applications.

Remark 2.2 To find the optimal bandwidth, one usually balances the asymptotic squared bias term $B_H^2(x)$ against the asymptotic variance term. For the scalar case with $q = 1$, we have $h_{\mathrm{opt}} \sim (nT)^{-1/5}$. In addition, to make the other bias term $\bar\mu$ no larger than $B_H(x)$ in absolute value, i.e., $n^{-1/2} = O(h_{\mathrm{opt}}^2)$, we obtain $T = O(n^{1/4})$. For general $q \ge 1$, if $T = O(n^{q/4})$, then $(nT)^{-1/(4+q)}$ is the optimal rate for the smoothing parameters $h_j$, $j = 1, \ldots, q$.
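The rate arithmetic behind Remark 2.2 can be checked in one line; this is a restatement of the remark, not an additional result:

```latex
h_{\mathrm{opt}} \sim (nT)^{-1/(4+q)}, \quad T = O(n^{q/4})
\ \Rightarrow\ nT = n^{(4+q)/4}
\ \Rightarrow\ h_{\mathrm{opt}}^{2} \sim (nT)^{-2/(4+q)}
   = n^{-\frac{4+q}{4}\cdot\frac{2}{4+q}} = n^{-1/2},
```

so the fixed-effects bias $\bar\mu = O_p(n^{-1/2})$ is exactly of the same order as the smoothing bias $B_H(x) = O(h_{\mathrm{opt}}^2)$.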

As Theorem 2.1 implies that the nonparametric LSDV estimator $\hat\theta(x)$ removes the unknown fixed effects only when $n$ is large ($\bar\mu = O_p(n^{-1/2})$), the presence of the fixed effects may affect the finite-sample performance of the proposed estimator if $n$ is not large enough and/or if $Var(\mu_i)$ is large relative to the variances of $\nu_{it}$ and $X_{it}$. Below, we illustrate a simple modification of $\hat\theta(x)$ that completely removes the unknown fixed effects, which is of special interest.

Model (2.1) implies $E(Y_{it}) = E[\theta(X_{it})]$, as $E(\mu_i) = E(\nu_{it}) = 0$ for all $i$ and $t$. Letting $c_0 = E[\theta(X_{it})]$, one can consistently estimate $\theta(x) - c_0$ by

$$\check\theta(x) \stackrel{\mathrm{def}}{=} \hat\theta(x) - \bar Y,$$


where Y = (nT )−1∑ni=1

∑Tt=1 Yit = µ+(nT )

−1∑ni=1

∑Tt=1 [θ (Xit) + νit] = E[θ(Zit)]+ µ + Op

((nT )

−1/2).

Here, (nT )−1∑n

i=1

∑Tt=1 θ (Xit) = E[θ(Zit)] +Op

((nT )

−1/2)holds under Assumption A1, if E

[θ(Zit)

2]<

M < ∞; and (nT )−1∑n

i=1

∑Tt=1 νit = Op

((nT )

−1/2)holds under Assumption A2. Therefore, we have

θ (x) = θ(x)− c0 +Op(‖H‖2

)+Op

((nT |H|)−1/2

), as µ is cancelled out by the subtraction of Y from θ(z),

the bias term now only involves the conventional term of order Op(||H||2

), and the variance term is of the

order Op(

(nT |H|)−1/2).

The demeaned estimator θ(z) is useful if one is interested in studying partial effects ∂θ(x)/∂x because

θ(z) and θ(z) have the same partial derivatives. From Theorem 2.1, we immediately obtain the following

result.

THEOREM 2.2 Under Assumptions A1-A4 and $E[\theta(X_{it})^2] < M < \infty$, we have, at an interior point $x \in S$,

$$\sqrt{nT|H|}\,\big\{\check\theta(x) - [\theta(x) - c_0] - B_H(x)\big\} \xrightarrow{d} N\Big(0,\ \frac{\zeta_0\sigma_\nu^2}{f(x)}\Big), \qquad (2.14)$$

where $B_H(x)$ and $\zeta_0$ are the same as defined in Theorem 2.1.

3 A consistent test for a linear panel data fixed effects model

In this section, we propose a test statistic for testing the null hypothesis of a linear panel data fixed effects model against a nonparametric alternative. Such a testing problem has been considered by HCL, who proposed a test statistic based on a backfitting kernel estimator of model (2.1). As the estimator proposed by HCL does not have a closed-form expression, it is difficult to obtain the asymptotic distribution of their test statistic; although this disadvantage may be sidestepped by using a bootstrap method, the asymptotic validity of their bootstrap procedure has not been verified. In this section, we are able to show not only a limiting standard normal distribution for our test statistic under the null of a linear panel data fixed effects model, but also the consistency of the test when the null hypothesis is false. Moreover, to improve the finite sample performance of our test, we establish the asymptotic validity of a wild bootstrap procedure used to better approximate the finite sample null distribution of our test statistic.

In Section 3.1, we propose our test statistic, whose asymptotic behavior is given in Section 3.2. Note that the test statistic proposed below is a consistent test when $n$ is large, allowing $T$ to be a finite positive integer.

3.1 The test statistic

We consider the following null and alternative hypotheses:

$$H_0:\ \Pr\{\theta(X_{it}) = \alpha_0 + \beta_0^\top X_{it}\} = 1 \ \text{ for some } (\alpha_0, \beta_0^\top)^\top \in \Theta \subset \mathbb{R}^{1+q}, \qquad (3.1)$$

$$H_1:\ \Pr\{\theta(X_{it}) = \alpha + \beta^\top X_{it}\} < 1 \ \text{ for any } (\alpha, \beta^\top)^\top \in \Theta, \qquad (3.2)$$


where $\Theta$ is a compact and convex subset of $\mathbb{R}^{1+q}$. Our test statistic is based on the integrated squared distance between the parametric and nonparametric fits of the data, in line with Härdle and Mammen (1993). Let $(\hat\alpha, \hat\beta)$ be the fixed effects estimator of model (2.2) and $\hat\theta(x)$ be the kernel estimator of $\theta(x)$ from model (2.1) defined in (2.12). In principle, one can construct a consistent test for $H_0$ based on $\int [\theta(x) - (\alpha + x^\top\beta)]^2\,dx$, replacing $\theta(x)$ by our nonparametric LSDV estimator $\hat\theta(x)$, and $\alpha$ and $\beta$ by the LSDV estimators for the linear fixed effects panel data model. However, the resulting test would have several bias terms (i.e., non-zero centering terms), which complicate the asymptotic analysis as well as affect the finite sample performance of the test. Therefore, to construct a test statistic that is correctly centered at zero under the null hypothesis, following the approach of Härdle and Mammen (1993), we choose to smooth the parametric fit of the linear fixed effects panel data model. Since the nonparametric estimator of $\theta(x)$ is given by $\hat\theta(x) = [\iota_{nT}^\top S_H(x)\iota_{nT}]^{-1}\iota_{nT}^\top S_H(x) Y$, we define a kernel-smoothed least squares estimate of $\alpha + x^\top\beta$ by $\hat\theta_{para}(x) = [\iota_{nT}^\top S_H(x)\iota_{nT}]^{-1}\iota_{nT}^\top S_H(x)(\hat\alpha\iota_{nT} + X\hat\beta)$. Denote by $\hat U = Y - \hat\alpha\iota_{nT} - X\hat\beta$ the estimated residuals from the parametric fixed effects panel data model. We then have $\hat\theta(x) - \hat\theta_{para}(x) = A_n^{-1}\iota_{nT}^\top S_H(x)\hat U$, where $A_n = \iota_{nT}^\top S_H(x)\iota_{nT}$. Then, we have

$$I_n = \int \big[\hat\theta(x) - \hat\theta_{para}(x)\big]^2\,dx = \hat U^\top \int S_H(x)\,\iota_{nT} A_n^{-1} A_n^{-1} \iota_{nT}^\top\, S_H(x)\,dx\ \hat U.$$

In trying to derive the asymptotic distribution of $I_n$ defined above, we find it difficult to deal with the random denominator matrix $A_n^{-1}$. We therefore replace $\hat{\theta}(x) - \hat{\theta}_{para}(x)$ by $A_n \big[\hat{\theta}(x) - \hat{\theta}_{para}(x)\big]$ to remove the random denominator. Also, the purpose of the $M_H(x)$ matrix is to remove the fixed effects term $D_0\mu$. The matrix $M_{D_0}$ serves the same purpose of removing the fixed effects but is easier to handle than $M_H(x)$. Therefore, we make a further simplification, replacing $M_H(x)$ by $M_{D_0}$. Hence, our modified test statistic is given by

$$I_n = \sum_{i=1}^{n} \sum_{j=1}^{n} \hat{U}_i^{\top} Q_T \int K_H(X_i, x)\, \iota_T \iota_T^{\top} K_H(X_j, x)\, dx\, Q_T \hat{U}_j,$$

where $Q_T = I_T - T^{-1} \iota_T \iota_T^{\top}$.
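The role of $Q_T$ is purely to wipe out the additive fixed effects. The following short numerical sketch (our own illustration, not part of the paper) makes this concrete:

```python
import numpy as np

# Within-groups transform Q_T = I_T - iota_T iota_T'/T annihilates iota_T,
# so any additive individual effect mu_i * iota_T is removed.
T = 4
iota = np.ones((T, 1))
Q_T = np.eye(T) - iota @ iota.T / T

v = np.array([1.0, 2.0, 3.0, 4.0])   # one unit's T observations
mu_i = 5.0                            # an arbitrary fixed effect

print(np.allclose(Q_T @ iota, 0))                # True
print(np.allclose(Q_T @ (v + mu_i), Q_T @ v))    # True
```

The second check shows that shifting all of a unit's observations by a common $\mu_i$ leaves the transformed vector unchanged, which is exactly why the test statistic below is invariant to the fixed effects.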

We notice that the typical element of $|H|^{-1}\int K_H(X_i, x)\, \iota_T \iota_T^{\top} K_H(X_j, x)\, dx$ is given by $\bar{K}_H(X_{it}, X_{js}) = \int K\big(H^{-1}(X_{it} - X_{js}) + \omega\big) K(\omega)\, d\omega$, which acts as a weighting function selecting pairs $(i, t)$ and $(j, s)$ so that only those $X_{it}$ and $X_{js}$ close to each other make a non-negligible contribution to the test statistic. Hence, without loss of generality, we further simplify the test by replacing $\bar{K}_H(X_{it}, X_{js})$ with $K_H(X_{it}, X_{js})$; this replacement does not affect the essence of the test statistic since the local weight property is preserved.
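For the standard normal kernel used in the simulations of Section 4, the convolution kernel $\bar{K}$ has a closed form: the convolution of two $N(0,1)$ densities is the $N(0,2)$ density. A small numerical sketch (our own, not from the paper) verifies this:

```python
import numpy as np

def K(v):
    """Standard normal kernel."""
    return np.exp(-0.5 * v**2) / np.sqrt(2.0 * np.pi)

def K_bar(u):
    """Convolution kernel for Gaussian K: the N(0, 2) density."""
    return np.exp(-u**2 / 4.0) / np.sqrt(4.0 * np.pi)

# Numerical check that K_bar(u) = \int K(u + w) K(w) dw on a fine grid.
w = np.linspace(-10.0, 10.0, 20001)
dw = w[1] - w[0]
for u in (0.0, 0.7, 1.5):
    num = np.sum(K(u + w) * K(w)) * dw
    print(abs(num - K_bar(u)) < 1e-6)   # True
```

Both $\bar{K}$ and $K$ are symmetric local weight functions, which is why swapping one for the other leaves the test's logic intact.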

Finally, we use a leave-one-out estimator to remove a non-zero center term from the test statistic, which is equivalent to replacing the diagonal elements of $K_H(X_i, X_i)$ by zeros; i.e., we replace $K_H(X_{it}, X_{it})$ by zero for all $i$ and $t$. We are now in a position to finalize our modified test statistic, which is given by

$$I_n = \frac{1}{n^2 |H|} \sum_{i=1}^{n} \sum_{j=1}^{n} \hat{U}_i^{\top} Q_T \tilde{K}_H(X_i, X_j) Q_T \hat{U}_j, \qquad (3.3)$$


where $\tilde{K}_H(X_i, X_j) = K_H(X_i, X_j)$ if $j \neq i$, and $\tilde{K}_H(X_i, X_i) = K_H(X_i, X_i) - K_H(0) I_T$ for all $i$. Here, $K_H(X_i, X_j)$ is a $T \times T$ matrix whose typical $(t, s)$th element is $K_H(X_{it}, X_{js})$ for $t, s = 1, \ldots, T$, and $K_H(0) = K_H(X_{it}, X_{it})$. Therefore, the diagonal elements of $\tilde{K}_H(X_i, X_i)$ are all zeros. Leaving the diagonal elements out of the kernel matrix $K_H(X_i, X_i)$ makes the test statistic $I_n$ correctly centered at $0$ under $H_0$. To see this more clearly, define the within-groups transformed residual $\tilde{U}_i = Q_T \hat{U}_i$. Note that $\hat{U}_i^{\top} Q_T \tilde{K}_H(X_i, X_j) Q_T \hat{U}_j = \tilde{U}_i^{\top} \tilde{K}_H(X_i, X_j) \tilde{U}_j = \sum_{t=1}^{T} \sum_{s=1}^{T} 1_{(i,t) \neq (j,s)}\, \tilde{u}_{it} \tilde{u}_{js} K_H(X_{it}, X_{js})$, where the indicator function $1_{(i,t) \neq (j,s)} = 1$ if $(i, t) \neq (j, s)$ and zero otherwise, and $\tilde{U}_i = (\tilde{u}_{i1}, \ldots, \tilde{u}_{iT})^{\top}$ with the within-groups transformed LSDV residuals $\tilde{u}_{it} = \hat{u}_{it} - T^{-1} \sum_{s=1}^{T} \hat{u}_{is}$ for all $i$ and $t$. Hence, our test statistic is a leave-one-out version of the kernel estimate of $E[\tilde{u}_{it} E(\tilde{u}_{it} | X_{it}) f(X_{it})]$, where $f(\cdot)$ is the p.d.f. of $X_{it}$ defined in Assumption T1 in the next subsection. Hence, (3.3) can be equivalently written as

$$I_n = \frac{1}{n^2 |H|} \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{t=1}^{T} \sum_{s=1}^{T} 1_{(i,t) \neq (j,s)}\, \tilde{u}_{it} \tilde{u}_{js} K_H(X_{it}, X_{js}). \qquad (3.4)$$

The use of the leave-one-out kernel estimator ensures that the test statistic $I_n$ is properly centered at zero when the null hypothesis holds true.
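As an illustration of (3.4), the following minimal Python sketch (ours, not part of the paper; it assumes a scalar regressor, a Gaussian kernel, and residuals and regressors stored as $n \times T$ arrays) computes the leave-one-out statistic directly:

```python
import numpy as np

def I_n_stat(u_hat, X, h):
    """Leave-one-out statistic (3.4) for a scalar regressor.

    u_hat, X : (n, T) arrays of LSDV residuals and regressors.
    h        : scalar bandwidth, so |H| = h.  A Gaussian kernel is
               used here; any second-order kernel would do.
    """
    n, T = u_hat.shape
    u_til = u_hat - u_hat.mean(axis=1, keepdims=True)   # within transform
    u = u_til.ravel()                                   # stack the (i, t) pairs
    x = X.ravel()
    d = (x[:, None] - x[None, :]) / h
    k = np.exp(-0.5 * d**2) / np.sqrt(2.0 * np.pi)      # K_H(X_it, X_js)
    np.fill_diagonal(k, 0.0)                            # drop (i, t) = (j, s)
    return float(u @ k @ u) / (n**2 * h)
```

Because the statistic is a quadratic form in the transformed residuals, doubling the residuals multiplies $I_n$ by four; this provides a cheap sanity check on any implementation.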

HCL proposed a test statistic for testing a linear model against a nonparametric alternative in the same fixed effects panel data framework. Their test statistic (their $\hat{I}_n^b$ statistic) is given by

$$\hat{I}_n^b = \frac{1}{nT} \sum_{i=1}^{n} \sum_{t=1}^{T} \big(\hat{\alpha} + X_{it}^{\top}\hat{\beta} - \tilde{\theta}(X_{it})\big)^2,$$

where $\tilde{\theta}(x)$ denotes the nonparametric estimator of $\theta(x)$ proposed by HCL.

HCL conjecture that their proposed test has an asymptotic standard normal distribution after proper normalization and centering. Besides the theoretical complication due to the use of an iterative estimator $\tilde{\theta}(X_{it})$ in constructing $\hat{I}_n^b$, another complication associated with their test is that $\hat{I}_n^b$ contains several centering terms that need to be estimated and subtracted from $\hat{I}_n^b$ in order to properly center the test statistic. In contrast, our test statistic $I_n$ only involves the computation of the kernel weighted LSDV residuals. The simplicity of our test statistic enables us to derive its asymptotic distribution, which is the subject of the next subsection.

3.2 Asymptotic distribution of the In test

In this subsection, we present the asymptotic properties of the $I_n$ test statistic and defer the proofs to Appendix 6.2. We first list some regularity conditions used in deriving the asymptotic null distribution of our test statistic and in establishing the consistency of the test.

(T1) The continuous random variables $(Y_{it}, X_{it})$ are independently and identically distributed (i.i.d.) across the $i$ index and are generated according to equation (2.1). Further, we assume


(a) $E(\nu_{it}^4) = \mu_4 < \infty$, and across all cross-sectional units, $X_{it}$ is strictly stationary with common p.d.f. $f(x)$. Also, $(X_{it}, X_{is})$ has a joint p.d.f. $f_{t,s}(x_1, x_2)$ for all $t \neq s$, and both $f(x)$ and $f_{t,s}(x_1, x_2)$ are continuously differentiable. In addition, $\sup_{t \neq s} \int f_{t,s}(x, x)\, dx < M < \infty$.

(b) $E(X_{it,j}^m | X_{it'})$, $E(\theta(X_{it})^m | X_{it'})$, $f(x)$ and $f_{t,s}(x_1, x_2)$ and their first-order partial derivatives are all uniformly bounded for $m = 1, \ldots, 4$, $j = 1, 2, \ldots, q$, and for all $i$ and $t \neq t'$.

(T2) Let $\tilde{X}_{it} = X_{it} - T^{-1} \sum_{t=1}^{T} X_{it}$ for all $i$ and $t$. Then $n^{-1} X^{\top} M_{D_0} X \xrightarrow{p} \Sigma_{xx}$, where $\Sigma_{xx} = E\big(\tilde{X}_i^{\top} \tilde{X}_i\big)$ is a finite, positive definite matrix.

(T3) As $n \to \infty$, $\|H\| \to 0$ and $n|H| \to \infty$. $T$ is a finite, positive integer.

As the validity of our test does not require $T$ to be large, there is no need to place restrictions on $(X_{it}, Y_{it})$ across time. In particular, there is no need to impose strict stationarity on $X_{it}$; however, imposing strict stationarity simplifies the consistency proof substantially, so we keep this assumption. In Assumption T1(b), the conditions on $E(X_{it}^m | X_{it'})$ and $E(\theta(X_{it})^m | X_{it'})$ are used to obtain the stochastic orders of the test statistic. The uniform boundedness condition imposed by Assumption T1(b) can be replaced by an assumption that those functions are bounded by functions with finite expectations. Also, by the law of iterated expectations, Assumption T1(b) implies $E\big(\|X_{it}\|^4\big) < M < \infty$ and $E\big[\theta(X_{it})^4\big] < M < \infty$. Under Assumptions A2, A3, T1 and T2, we can show that the fixed effects estimators given by (2.5) and (2.6) converge to well-defined finite values under both the null and alternative hypotheses. Specifically, $\hat{\alpha} - \bar{\alpha} = O_p(n^{-1/2})$ and $\hat{\beta} - \bar{\beta} = O_p(n^{-1/2})$, where $\bar{\alpha} = \alpha_0$ and $\bar{\beta} = \beta_0$ under $H_0$, and $\bar{\alpha} = \mathrm{plim}_{n\to\infty}\, \hat{\alpha}$ and $\bar{\beta} = \mathrm{plim}_{n\to\infty}\, \hat{\beta}$ under $H_1$. Therefore, under $H_0$, the parametric residuals $\hat{u}_{it}$ can be replaced by $u_{it} = \mu_i + \nu_{it}$ (asymptotically) when we examine the asymptotic null distribution of the test statistic $I_n$.

The asymptotic distributions of our proposed test statistic $I_n$ under $H_0$ and under $H_1$ are given in the following two theorems.

THEOREM 3.1 Under Assumptions A2, A3, and T1-T3, we have, under $H_0$,

$$J_n = n\sqrt{|H|}\, I_n \big/ \sqrt{\hat{\sigma}_0^2} \xrightarrow{d} N(0, 1), \qquad (3.5)$$

where

$$\hat{\sigma}_0^2 = 2\big(n^2|H|\big)^{-1} \sum_{i=1}^{n} \sum_{j \neq i}^{n} \sum_{t=1}^{T} \sum_{s=1}^{T} \tilde{u}_{it}^2 \tilde{u}_{js}^2 K_H^2(X_{it}, X_{js}) \qquad (3.6)$$

is a consistent estimator of the asymptotic variance of $n\sqrt{|H|}\, I_n$,

$$\sigma_0^2 = 2 \zeta_0 \sigma_{\nu}^4 (T - 1)^2 E[f(X_{1t})]. \qquad (3.7)$$

Note that Theorem 3.1 does not require $T$ to be large. Of course, if $T$ is large, the $J_n$ test also works under some mild conditions on the serial dependence structure of $X_{it}$ over $t$; then only some minor modifications are needed in the statement of Theorem 3.1, such as replacing $n\sqrt{|H|}$ and $\big(n^2|H|\big)^{-1}$ by $nT\sqrt{|H|}$ and $\big(n^2T^2|H|\big)^{-1}$, respectively.

Remark 3.1 If we define $\tilde{\sigma}_0^2 = 2\big(n^2|H|\big)^{-1} \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{t=1}^{T} \sum_{s=1}^{T} 1_{(i,t)\neq(j,s)}\, \tilde{u}_{it}^2 \tilde{u}_{js}^2 K_H^2(X_{it}, X_{js})$, it is easy to show that $\tilde{\sigma}_0^2 = \sigma_0^2 + o_p(1)$. This follows from $\tilde{\sigma}_0^2 = \hat{\sigma}_0^2 + O_p(n^{-1})$ and $\hat{\sigma}_0^2 = \sigma_0^2 + o_p(1)$. Note that $\hat{\sigma}_0^2$ leaves $T$ observations out (the whole $i$th unit), while $\tilde{\sigma}_0^2$ only leaves out the single point with subscript $(i, t)$. In fact, it is easy to show that $\hat{\sigma}_0^2$ is the leading term of $\tilde{\sigma}_0^2$. Hence, one can also use $\tilde{\sigma}_0^2$ to replace $\hat{\sigma}_0^2$ in Theorem 3.1. Although $\tilde{\sigma}_0^2$ mimics the second moment of $n\sqrt{|H|}\, I_n$ more closely, it is easier to prove $\hat{\sigma}_0^2 = \sigma_0^2 + o_p(1)$ than to prove $\tilde{\sigma}_0^2 = \sigma_0^2 + o_p(1)$. This is the reason we choose to use $\hat{\sigma}_0^2$ in Theorem 3.1.

THEOREM 3.2 Under Assumptions A2, A3, and T1-T3, we have, under $H_1$,

$$\Pr\{J_n \geq M_n\} \to 1 \text{ as } n \to \infty, \qquad (3.8)$$

where $M_n$ is any non-stochastic, positive sequence such that $M_n = o(n\sqrt{|H|})$.

Theorem 3.2 means that the proposed test is a one-sided test: the null hypothesis is rejected at a given significance level if $J_n$ is greater than the corresponding critical value. Theorem 3.2 implies that, when the null hypothesis is false, the probability that the $J_n$ test rejects the null hypothesis approaches one as $n \to \infty$; i.e., $J_n$ is a consistent test.

It is well known that kernel-based nonparametric tests suffer from substantial finite sample size distortions, and our $J_n$ test is no exception. We therefore suggest using a bootstrap procedure to better approximate the null distribution of $J_n$.

3.3 A bootstrap procedure

In this subsection, we propose to use a residual-based wild bootstrap procedure to better approximate the

finite sample null distribution of the $J_n$ test. The detailed bootstrap procedure is given below.

1. Estimate the linear fixed effects panel data model (2.3) and obtain the LSDV residuals $\hat{u}_{it} = Y_{it} - \hat{\alpha} - X_{it}^{\top}\hat{\beta}$.

2. Obtain the two-point wild bootstrap errors by setting $u_{it}^* = a\hat{u}_{it}$ with probability $r$ and $u_{it}^* = b\hat{u}_{it}$ with probability $1 - r$, where $a = (1 - \sqrt{5})/2$, $b = (1 + \sqrt{5})/2$, and $r = (1 + \sqrt{5})/(2\sqrt{5})$. Then calculate $Y_{it}^* = \hat{\alpha} + X_{it}^{\top}\hat{\beta} + u_{it}^*$. Call $\{(X_{it}, Y_{it}^*)\}_{i=1,\ldots,n;\, t=1,\ldots,T}$ the bootstrap sample.

3. Using the bootstrap sample, calculate the parametric LSDV (or fixed effects) estimators $\hat{\beta}^*$ and $\hat{\alpha}^*$, which are the same as given in (2.5) and (2.6) except that $Y_{it}$ is replaced by $Y_{it}^*$. Then calculate the bootstrap residuals $\hat{u}_{it}^* = Y_{it}^* - \hat{\alpha}^* - X_{it}^{\top}\hat{\beta}^*$.


4. Compute $J_n^* = n\sqrt{|H|}\, I_n^* \big/ \sqrt{\hat{\sigma}^{*2}}$, where $I_n^* = \big(n^2|H|\big)^{-1} \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{t=1}^{T} \sum_{s=1}^{T} 1_{(i,t)\neq(j,s)}\, \tilde{u}_{it}^* \tilde{u}_{js}^* K_H(X_{it}, X_{js})$, $\hat{\sigma}^{*2} = 2\big(n^2|H|\big)^{-1} \sum_{i=1}^{n} \sum_{j \neq i}^{n} \sum_{t=1}^{T} \sum_{s=1}^{T} \tilde{u}_{it}^{*2} \tilde{u}_{js}^{*2} K_H^2(X_{it}, X_{js})$, and $\tilde{u}_{it}^* = \hat{u}_{it}^* - T^{-1}\sum_{s=1}^{T} \hat{u}_{is}^*$ for all $i$ and $t$.

5. Repeat steps 1 to 4 $B$ times. Use the empirical distribution of the $B$ bootstrap statistics to obtain the upper percentile values, which approximate the upper percentile values of the test statistic $J_n$ under $H_0$.
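As a side check on step 2, the two-point weights are the usual golden-ratio wild bootstrap weights. The short sketch below (our own, not part of the procedure) verifies the moment conditions that motivate them:

```python
import numpy as np

# Two-point weights from step 2: eta = a w.p. r, eta = b w.p. 1 - r.
a = (1 - np.sqrt(5)) / 2
b = (1 + np.sqrt(5)) / 2
r = (1 + np.sqrt(5)) / (2 * np.sqrt(5))

# These weights satisfy E(eta) = 0 and E(eta^2) = E(eta^3) = 1, so the
# bootstrap errors u*_it = eta * u_hat_it match the first three moments
# of the residuals conditional on the data.
m1 = a * r + b * (1 - r)
m2 = a**2 * r + b**2 * (1 - r)
m3 = a**3 * r + b**3 * (1 - r)
print(abs(m1) < 1e-12, abs(m2 - 1) < 1e-12, abs(m3 - 1) < 1e-12)
# True True True
```

Matching the second and third conditional moments is what allows the bootstrap distribution of $J_n^*$ to track the finite sample null distribution of $J_n$.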

Remark 3.2 By the same reasoning as given below Theorem 3.1, one can also use $\tilde{\sigma}^{*2} = 2\big(n^2|H|\big)^{-1} \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{t=1}^{T} \sum_{s=1}^{T} 1_{(i,t)\neq(j,s)}\, \tilde{u}_{it}^{*2} \tilde{u}_{js}^{*2} K_H^2(X_{it}, X_{js})$ to replace $\hat{\sigma}^{*2}$ in step 4 above.

We need an additional assumption for the bootstrap statistic.

(B) $E(\mu_i^m | X_{it} = x)$ for $m = 1, \ldots, 4$ are continuously differentiable and uniformly bounded over $x \in S$.

THEOREM 3.3 Under Assumptions A2, A3, T1-T3, and B, we have

$$\sup_{z \in \mathbb{R}} \big|\Pr{}^*(J_n^* \leq z) - \Phi(z)\big| = o_p(1), \qquad (3.9)$$

where $\Pr^*(\cdot) = \Pr\big(\cdot \,\big|\, \{(X_{it}, Y_{it})\}_{i=1,\ldots,n;\, t=1,\ldots,T}\big)$, and $\Phi(\cdot)$ is the standard normal cumulative distribution function.

Theorem 3.3 shows that the bootstrap method is an asymptotically valid procedure for approximating the null distribution of $J_n$ regardless of whether the null hypothesis holds. The proof of Theorem 3.3 is given in Appendix 6.2.

Note that if the right-hand-side term, $o_p(1)$, in (3.9) were replaced by $o(1)$ with probability one, then the above result would state that the bootstrap statistic $J_n^*$ converges in distribution to a standard normal random variable with probability one. However, the 'with probability one' result is more difficult to establish. Therefore, we prove the weaker result that the bootstrap statistic $J_n^*$ converges in distribution to a standard normal random variable in probability, as stated in (3.9).

4 Monte Carlo simulations

In this section we use Monte Carlo simulations to assess how our proposed estimator and test statistic perform

in finite sample applications. Specifically, Section 4.1 studies the finite sample behavior of the nonparametric

LSDV estimator and Section 4.2 studies that of the proposed test.

4.1 The finite sample performance of our proposed estimator

We use the same DGP as in HCL:

$$Y_{it} = \sin(2X_{it}) + \mu_i + \nu_{it}, \qquad (4.1)$$


where $X_{it}$ is i.i.d. uniform$[-1, 1]$, $\nu_{it}$ is i.i.d. $N(0, 1)$, $v_i$ is i.i.d. uniform$[-1, 1]$, and $\mu_i = v_i + a_0 T^{-1} \sum_{s=1}^{T} X_{is}$. Also, $X_{it}$, $\nu_{it}$, and $v_i$ are mutually independent. We take $a_0 = 0.5$, $1$ and $2$ so that $\mu_i$ and $\{X_{it} : t = 1, \ldots, T\}$ are correlated. As $\sin(-x) = -\sin(x)$, we have $E(Y_{it}) = E[\sin(2X_{it})] = 0$ for the current DGP; i.e., $c_0 = E[\theta(X_{it})] = 0$. Therefore, both $\hat{\theta}(x)$ and $\tilde{\theta}(x)$ consistently estimate $\theta(x)$ for the DGP considered here.

We take $n = 50$, $100$, and $200$, and set $T = \lceil 1.5 n^{1/4} \rceil$, where $\lceil a \rceil$ is the smallest integer greater than or equal to $a$. Hence, with $n = 50$, we take $T = 4$; with $n = 100$, $T = 5$; with $n = 200$, $T = 6$. The number of Monte Carlo replications is 1,000. We use the standard normal kernel function to compute the proposed estimator, and the bandwidth is selected via $h = c\sigma_x n^{-1/5}$, where $\sigma_x$ is the sample standard deviation of $\{X_{it}\}_{i=1,\ldots,n;\, t=1,\ldots,T}$ and $c$ is selected by the cross-validation method.
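For readers wishing to replicate this design, a minimal data-generating sketch follows (our own; `generate_panel` is a hypothetical helper, and $c = 1$ is used in place of the cross-validated constant):

```python
import numpy as np

def generate_panel(n, a0, rng):
    """Simulate Y_it = sin(2 X_it) + mu_i + nu_it as in (4.1),
    with T = ceil(1.5 n^{1/4}) as in the text."""
    T = int(np.ceil(1.5 * n**0.25))
    X = rng.uniform(-1.0, 1.0, (n, T))
    nu = rng.normal(size=(n, T))
    v = rng.uniform(-1.0, 1.0, n)
    mu = v + a0 * X.mean(axis=1)          # fixed effects correlated with X
    Y = np.sin(2.0 * X) + mu[:, None] + nu
    return Y, X, mu

rng = np.random.default_rng(42)
Y, X, mu = generate_panel(n=100, a0=1.0, rng=rng)
h = 1.0 * X.std() * 100 ** (-0.2)         # h = c * sigma_x * n^{-1/5}, c = 1
print(X.shape)                             # (100, 5)
```

With $n = 100$ the rule gives $T = 5$, matching the sample sizes reported in Table 1.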

To assess estimation accuracy, we report in Table 1 the average mean squared errors (AMSE) of both estimators $\hat{\theta}(x)$ and $\tilde{\theta}(x)$. That is, for $\hat{\theta}(x)$ we calculate

$$\mathrm{AMSE}_{\hat{\theta}} = \frac{1}{M} \sum_{j=1}^{M} \frac{1}{nT} \sum_{i=1}^{n} \sum_{t=1}^{T} \big[\hat{\theta}(X_{it,j}) - \theta(X_{it,j})\big]^2,$$

where $j$ refers to the $j$th simulation replication and $M = 1000$ is the total number of replications; $\mathrm{AMSE}_{\tilde{\theta}}$ is defined in the same way with $\hat{\theta}(X_{it,j})$ replaced by $\tilde{\theta}(X_{it,j})$.

We observe three patterns from Table 1. First, given $a_0$, the AMSEs of both estimators decrease quickly as $n$ and $T$ grow. Second, given $a_0$, $n$ and $T$, $\tilde{\theta}(x)$ performs better than $\hat{\theta}(x)$ in all cases, which is consistent with what Theorems 2.1 and 2.2 predict. Finally, as $a_0$ grows from 0.5 to 1 and 2, $\tilde{\theta}(x)$ is invariant to the changes in $a_0$ because the unknown fixed effects are removed; however, the larger $a_0$ is, the larger the variance of the fixed effects, and the larger the AMSEs of $\hat{\theta}(x)$. This shows that $\hat{\theta}(x)$ is affected by the unknown fixed effects in finite sample applications, but the finite sample bias resulting from the unknown fixed effects does go away quickly as $n$ and $T$ grow.

In the last column of Table 1 we also report the average fixed-effect-noise-to-signal ratio, $\sigma_\mu/\sigma_y$, where $\sigma_\mu$ and $\sigma_y$ are the sample standard deviations of $\mu_i$ and $Y_{it}$, respectively. With a fixed-effect-noise-to-signal ratio averaging between 40% and 50%, our proposed estimator performs quite well.

4.2 Finite sample performance of our proposed test

To illustrate how our proposed test performs in finite sample applications, we consider the following DGPs:

$$\mathrm{DGP0}: Y_{it} = 1 + X_{it} + \mu_i + \nu_{it},$$
$$\mathrm{DGP1}: Y_{it} = \sin(X_{it}\pi) + \mu_i + \nu_{it},$$
$$\mathrm{DGP2}: Y_{it} = X_{it} - 0.5 X_{it}^2 + \mu_i + \nu_{it},$$


Table 1: AMSE of $\hat{\theta}(x)$ and $\tilde{\theta}(x)$

a0     n     T    $\hat{\theta}(x)$   $\tilde{\theta}(x)$   $\sigma_\mu/\sigma_y$
0.5    50    4    .0463    .0354    .4138
0.5    100   5    .0213    .0161    .4142
0.5    200   6    .0089    .0067    .4129
1      50    4    .0480    .0354    .4311
1      100   5    .0220    .0161    .4281
1      200   6    .0091    .0067    .4250
2      50    4    .0539    .0354    .4956
2      100   5    .0244    .0161    .4839
2      200   6    .0099    .0067    .4743

Table 2: Estimated Size: The univariate regressor case (DGP0)

n      1%     5%     10%    20%    50%
50     .014   .054   .104   .212   .520
100    .020   .058   .107   .200   .510
200    .019   .053   .099   .209   .486

where $X_{it}$ is i.i.d. uniform$[-1,1]$, $\nu_{it}$ is i.i.d. $N(0, 1)$, $v_i$ is i.i.d. uniform$[-1,1]$, and $\mu_i = v_i + a_0 T^{-1} \sum_{s=1}^{T} X_{is}$ with $a_0 \neq 0$. Evidently, all the data generating processes have unknown fixed effects that are correlated with the regressor for each given cross-sectional unit. Here, $X_{it}$, $\nu_{it}$, and $v_i$ are all mutually independent, but $X_{it}$ and $\mu_i$ are correlated whenever $a_0 \neq 0$. Because our test statistic $J_n$ is invariant to different values of $a_0$ (the fixed effects $\mu_i$ are completely canceled out by the matrix $Q_T$), we only report results with $a_0 = 1$.

We take $T = 3$ and $n = 50$, $100$, and $200$. The number of replications is 1,000, and within each replication, 400 bootstrap iterations are conducted to estimate the 1%, 5%, 10%, 20% and 50% upper percentile values of the null distribution of our test statistic $J_n$. We use the standard normal kernel function for the nonparametric test, and the bandwidth used is $h = \sigma_x n^{-1/5}$, where $\sigma_x$ is the sample standard deviation of $X_{it}$. The estimated sizes and powers are given in Tables 2 and 3, where bootstrap critical values are used.

From Table 2, we observe that our test statistic $J_n$ has good estimated sizes for all sample sizes considered and for all percentile values. Table 3 shows that the power of the test increases as the sample size rises (as expected), and that the test is more powerful against DGP1 ($\theta(x)$ traces a complete phase of the sine curve over $[-\pi, \pi]$) than against DGP2 ($\theta(x)$ only contains the increasing part of a parabola). Evidently, the more curvature $\theta(x)$ has, the more powerful the test is.

Table 3: Estimated Power: The univariate regressor case

              DGP1                          DGP2
n      1%     5%     10%    20%      1%     5%     10%    20%
50     .830   .934   .962   .990     .074   .212   .306   .470
100    .990   .998   1.00   1.00     .210   .408   .520   .638
200    1.00   1.00   1.00   1.00     .436   .654   .786   .866

Table 4: Estimated Sizes and Powers: The bivariate regressor case

              Size (DGP2,0)                 Power (DGP2,1)
n      1%     5%     10%    20%      1%     5%     10%    20%
25     .009   .068   .114   .226     .698   .892   .934   .962
50     .012   .053   .106   .195     .972   .992   .998   1.00
100    .012   .053   .106   .195     1.00   1.00   1.00   1.00

Next, we consider the performance of our test statistic in a bivariate case. We use the same DGPs as in HCL:

$$\mathrm{DGP}_{2,0}: Y_{it} = 5X_{it,1} + 2X_{it,2} + \mu_i + \nu_{it},$$
$$\mathrm{DGP}_{2,1}: Y_{it} = 5X_{it,1} + 2X_{it,2}^2 + \mu_i + \nu_{it},$$

where $X_{it,1}$ is i.i.d. uniform$[-1,1]$, $X_{it,2}$ is i.i.d. uniform$[2,4]$, $v_i$ is i.i.d. uniform$[-1,1]$, $\mu_i = v_i + T^{-1}\sum_{t=1}^{T} X_{it,1}$, and $\nu_{it}$ is i.i.d. $N(0, 1)$. Here, $X_{it,1}$, $X_{it,2}$, $\nu_{it}$, and $v_i$ are all mutually independent, but the fixed effects $\mu_i$ are correlated with $X_{it,1}$. DGP$_{2,0}$ gives the null linear fixed effects panel data model, and DGP$_{2,1}$ is an alternative quadratic (in $X_{it,2}$) fixed effects panel data model.

We take $T = 3$ and $n = 25$, $50$, and $100$. The number of replications is 1,000; again, within each replication, 400 bootstrap iterations are conducted to estimate the 1%, 5%, 10% and 20% upper percentile values of the null distribution of our test statistic $J_n$. The kernel is the product of two standard normal kernel functions, and the bandwidth matrix is $H = \mathrm{diag}\{h_1, h_2\}$ with $h_j = \sigma_{x_j} n^{-1/6}$, where $\sigma_{x_j}$ is the sample standard deviation of $X_{it,j}$ for $j = 1, 2$. The estimated sizes and powers are reported in Table 4.

From Table 4, we observe that the estimated sizes are quite close to their nominal levels. The estimated power of our test is also quite good, and the power performance is similar to that of the test statistic proposed by HCL (see the $\hat{I}_n^b$ test in Table 2 of HCL).

5 Conclusion

In this paper, we propose a nonparametric least squares dummy variable estimator and construct a simple test statistic for testing the null hypothesis of a linear fixed effects panel data regression model against a nonparametric alternative. Our estimator of the nonparametric fixed effects panel data model has a closed-form expression, which allows us to derive its limiting normal distribution when both $n$ and $T$ are sufficiently large. Based on the proposed estimator, we construct a simple test statistic which does not require an iterative procedure to estimate the nonparametric panel data model with fixed effects as in HCL. We establish the asymptotic null distribution of the test statistic and prove that the test is consistent. A bootstrap procedure is provided to approximate the null distribution of the test statistic.


Monte Carlo simulations show that our proposed test with bootstrap critical values performs well in finite

sample applications.

6 Appendix

This appendix contains two subsections: Section 6.1 gives the proof of the limiting result for the kernel estimator of the nonparametric fixed effects panel data model, and Section 6.2 gives the proofs of the limiting results for the proposed test statistic.

6.1 Proof of Theorem 2.1

To simplify notation, let, for each $i$ and $t$, $\lambda_{it} = K_H(X_{it}, x)$ and $c_H(X_i, x)^{-1} = \sum_{t=1}^{T} \lambda_{it}$. Applying the identity $(A + BCD)^{-1} = A^{-1} - A^{-1}B(DA^{-1}B + C^{-1})^{-1}DA^{-1}$ to the inverse matrix appearing in $M_H(x)$ (Poirier (1995, p. 627)), we obtain $M_H(x) = I_{nT} - D\big\{A^{-1} - A^{-1}\iota_{n-1}\iota_{n-1}^{\top}A^{-1}/\sum_{i=1}^{n} c_H(X_i, x)\big\}D^{\top}W_H(x)$, where $A = \mathrm{diag}\{c_H(X_2, x)^{-1}, \ldots, c_H(X_n, x)^{-1}\}$. Further simplification gives

$$M_H(x) = I_{nT} - \big[Q \otimes \big(\iota_T \iota_T^{\top}\big)\big] W_H(x), \qquad (A.1)$$

where the typical elements of $Q$ are $q_{ii} = c_H(X_i, x) - c_H(X_i, x)^2/\sum_{l=1}^{n} c_H(X_l, x)$ and $q_{ij} = -c_H(X_i, x)\, c_H(X_j, x)/\sum_{l=1}^{n} c_H(X_l, x)$ for $i \neq j \in \{1, \ldots, n\}$. It is easy to see that $\sum_{i=1}^{n} q_{ij} = 0$ for all $j = 1, \ldots, n$.

Moreover, simple calculations give

$$S_H(x) = M_H^{\top}(x) W_H(x) M_H(x) = W_H(x) M_H(x), \qquad (A.2)$$

and for any $(nT) \times 1$ vector $\Pi$, we obtain

$$\iota_{nT}^{\top} S_H(x) \Pi = \sum_{j=1}^{n} \iota_T^{\top} K_H(X_j, x) \Pi_j - \sum_{j=1}^{n} \Big(\sum_{i=1}^{n} q_{ij}/c_H(X_i, x)\Big) \iota_T^{\top} K_H(X_j, x) \Pi_j = \frac{n}{\sum_{i=1}^{n} c_H(X_i, x)} \sum_{i=1}^{n} c_H(X_i, x)\, \iota_T^{\top} K_H(X_i, x) \Pi_i. \qquad (A.3)$$

It follows that

$$\iota_{nT}^{\top} S_H(x) \iota_{nT} = \frac{n^2}{\sum_{i=1}^{n} c_H(X_i, x)}, \qquad (A.4)$$

$$\iota_{nT}^{\top} S_H(x) \big(\theta(X) - \theta(x)\iota_{nT}\big) = \frac{n}{\sum_{j=1}^{n} c_H(X_j, x)} \sum_{i=1}^{n} c_H(X_i, x)\, \iota_T^{\top} K_H(X_i, x) \big(\theta(X_i) - \theta(x)\iota_T\big),$$

$$\iota_{nT}^{\top} S_H(x) D_0 \mu = \frac{n^2}{\sum_{i=1}^{n} c_H(X_i, x)}\, \bar{\mu}, \qquad (A.5)$$

$$\iota_{nT}^{\top} S_H(x)\, \nu = \frac{n}{\sum_{i=1}^{n} c_H(X_i, x)} \sum_{i=1}^{n} c_H(X_i, x)\, \iota_T^{\top} K_H(X_i, x)\, \nu_i, \qquad (A.6)$$


where $\bar{\mu} = n^{-1}\sum_{i=1}^{n} \mu_i$. Then, by (2.12), we have

$$\hat{\theta}(x) = \theta(x) + n^{-1}\sum_{i=1}^{n} c_H(X_i, x)\, \iota_T^{\top} K_H(X_i, x)\big(\theta(X_i) - \theta(x)\iota_T\big) + n^{-1}\sum_{i=1}^{n} c_H(X_i, x)\, \iota_T^{\top} K_H(X_i, x)\, \nu_i + \bar{\mu} = \theta(x) + A_{n1} + A_{n2} + \bar{\mu}, \qquad (A.7)$$

where the definitions of $A_{nj}$ for $j = 1, 2$ should be apparent from the context. By Assumption A2, $\mu_i$ is i.i.d. with zero mean and finite variance, which means $\bar{\mu} = O_p(n^{-1/2})$. We will show that $A_{n1}$ is a bias term of $\hat{\theta}(x)$ and that $A_{n2}$, properly scaled, converges to a normal distribution. We derive the limiting results for $A_{n1}$ and $A_{n2}$ in Lemmas A.1 and A.2 below.

Lemma A.1 Under Assumptions A1-A4, we have

$$A_{n1} = B_H(x) + O_p\big(\|H\|^4\big) + O_p\big((nT|H|)^{-1/2}\|H\|\big), \qquad (A.8)$$

where $B_H(x) = \kappa_2\, \mathrm{tr}\big\{H\big[\theta^{(1)}(x) f^{(1)}(x)^{\top}/f(x) + \theta^{(2)}(x)/2\big]H\big\} = O\big(\|H\|^2\big)$ and $\kappa_2 = \int K(v)\, v^2\, dv$.

Proof: From Equation (A.7), we have $A_{n1} = n^{-1}\sum_{i=1}^{n} c_H(X_i, x)\, \iota_T^{\top} K_H(X_i, x)[\theta(X_i) - \theta(x)\iota_T] = n^{-1}\sum_{i=1}^{n}\sum_{t=1}^{T} \omega_{it}[\theta(X_{it}) - \theta(x)]$, where $\omega_{it} = \lambda_{it}/\sum_{s=1}^{T} \lambda_{is} \geq 0$. Since $\|H\| \to 0$ and $T|H| \to \infty$ as $T \to \infty$, we have, under Assumptions A1, A3 and A4, $(T|H|)^{-1}\sum_{t=1}^{T} \lambda_{it} = f(x) + O_p\big(\|H\|^2\big) + O_p\big((T|H|)^{-1/2}\big)$ at an interior point $x \in S$, where the $\alpha$-mixing property of $X_{it}$ across $t$ is used to find the order of the variance of $(T|H|)^{-1}\sum_{t=1}^{T} \lambda_{it}$.

Denote $\hat{f}_i(x) = (T|H|)^{-1}\sum_{t=1}^{T} \lambda_{it}$ and $\hat{b}_i(x) = (T|H|)^{-1}\sum_{t=1}^{T} \lambda_{it}[\theta(X_{it}) - \theta(x)]$. We can rewrite $A_{n1}$ as

$$A_{n1} = \frac{1}{n}\sum_{i=1}^{n} \frac{\hat{b}_i(x)}{\hat{f}_i(x)} = \frac{1}{n}\sum_{i=1}^{n} \frac{\hat{b}_i(x)}{E[\hat{f}_i(x)]}\Bigg\{1 - \frac{\hat{f}_i(x) - E[\hat{f}_i(x)]}{\hat{f}_i(x)}\Bigg\}.$$

As $\{\hat{f}_i(x) - E[\hat{f}_i(x)]\}/\hat{f}_i(x) = O_p\big((T|H|)^{-1/2}\big)$, the leading term of $A_{n1}$ is $A_{n1,1} = n^{-1}\sum_{i=1}^{n} \hat{b}_i(x)/E[\hat{f}_i(x)]$. It is easy to show that $E(A_{n1,1}) = B_H(x) + O\big(\|H\|^4\big)$ and $\mathrm{Var}(A_{n1,1}) = n^{-1}\mathrm{Var}\big(\hat{b}_i(x)/E[\hat{f}_i(x)]\big) = O\big((nT|H|)^{-1}\|H\|^2\big)$. Hence, we obtain $A_{n1,1} = B_H(x) + O_p\big(\|H\|^4\big) + O_p\big((nT|H|)^{-1/2}\|H\|\big)$. With $A_{n1} = A_{n1,1}[1 + o_p(1)]$, we complete the proof of Lemma A.1.

Lemma A.2 Under Assumptions A1-A4, we have

$$\sqrt{nT|H|}\, A_{n2} \xrightarrow{d} N\big(0, \zeta_0 \sigma_{\nu}^2/f(x)\big), \qquad (A.9)$$

where $\zeta_0 = \int K(v)^2\, dv$.

Proof: From Equation (A.7), we have $A_{n2} = n^{-1}\sum_{i=1}^{n} c_H(X_i, x)\, \iota_T^{\top} K_H(X_i, x)\, \nu_i = n^{-1}\sum_{i=1}^{n}\sum_{t=1}^{T} \omega_{it}\nu_{it}$. Applying the same method as in the proof of Lemma A.1, we can show that the leading term of $A_{n2}$ is $A_{n2,1} = (nT|H|)^{-1}\sum_{i=1}^{n}\sum_{t=1}^{T} \lambda_{it}\nu_{it}/E[\hat{f}_i(x)]$, which has zero mean and $\mathrm{Var}\big(\sqrt{nT|H|}\, A_{n2,1}\big) = \sigma_{\nu}^2 |H|^{-1} E\big(\lambda_{it}^2\big)/\big\{E[\hat{f}_i(x)]\big\}^2 = \zeta_0 \sigma_{\nu}^2/f(x) + O\big(\|H\|^2\big)$.

Now, denote $Z_i = \iota_T^{\top} K_H(X_i, x)\, \nu_i$. Evidently, $E(Z_i) = 0$. Assumptions A1 and A2 imply that $Z_i$ is an i.i.d. triangular array with $H$ depending on $n$. Applying Lyapunov's CLT to $\sqrt{nT|H|}\, E[\hat{f}_i(x)]\, A_{n2,1} = (nT|H|)^{-1/2}\sum_{i=1}^{n} Z_i$ completes the proof of Lemma A.2.

Proof of Theorem 2.1: Combining Equation (A.7) with Lemmas A.1 and A.2 proves Theorem 2.1.

6.2 Proof of Theorems 3.1, 3.2, and 3.3

Proof of Theorem 3.1: Our proposed test statistic is $I_n = \big(n^2|H|\big)^{-1}\sum_{i=1}^{n}\sum_{j=1}^{n} \hat{U}_i^{\top} Q_T \tilde{K}_H(X_i, X_j) Q_T \hat{U}_j$, where $\hat{U}_j = Y_j - \hat{\alpha}\iota_T - X_j\hat{\beta}$ and $Q_T = I_T - T^{-1}\iota_T\iota_T^{\top}$.

Under $H_0$, we have $\hat{U}_j = (\alpha_0 - \hat{\alpha})\iota_T + X_j\big(\beta_0 - \hat{\beta}\big) + \mu_j\iota_T + \nu_j$. Since $Q_T\iota_T = 0$, we have $Q_T\hat{U}_j = Q_T X_j\big(\beta_0 - \hat{\beta}\big) + Q_T\nu_j$. Hence, we have

$$I_n = \frac{1}{n^2|H|}\big(\beta_0 - \hat{\beta}\big)^{\top} \sum_{i=1}^{n}\sum_{j=1}^{n} X_i^{\top} Q_T \tilde{K}_H(X_i, X_j) Q_T X_j \big(\beta_0 - \hat{\beta}\big) + \frac{2}{n^2|H|}\big(\beta_0 - \hat{\beta}\big)^{\top} \sum_{i=1}^{n}\sum_{j=1}^{n} X_i^{\top} Q_T \tilde{K}_H(X_i, X_j) Q_T \nu_j$$
$$+ \frac{1}{n^2|H|} \sum_{i=1}^{n}\sum_{j=1}^{n} \nu_i^{\top} Q_T \tilde{K}_H(X_i, X_j) Q_T \nu_j = I_{n1} + 2I_{n2} + I_{n3}, \qquad (A.10)$$

where the definitions of $I_{nj}$, $j = 1, 2, 3$, should be apparent from the context. Using $\beta_0 - \hat{\beta} = O_p(n^{-1/2})$, it is straightforward to show that under $H_0$, $I_{n1} = O_p(n^{-1})$, $I_{n2} = O_p(n^{-1})$, and $I_{n3} = O_p\big((n\sqrt{|H|})^{-1}\big)$. Hence, $I_{n3}$ is the leading term of $I_n$ under $H_0$.

Define the within-groups transformed error $\tilde{\nu}_i = Q_T\nu_i$. We can decompose $I_{n3}$ into two terms:

$$I_{n3} = \frac{1}{n^2|H|}\sum_{i=1}^{n}\sum_{j \neq i}^{n} \tilde{\nu}_i^{\top} \tilde{K}_H(X_i, X_j) \tilde{\nu}_j + \frac{1}{n^2|H|}\sum_{i=1}^{n} \tilde{\nu}_i^{\top} \tilde{K}_H(X_i, X_i) \tilde{\nu}_i = I_{n3,1} + I_{n3,2},$$

where $I_{n3,2} = O_p(n^{-1})$ because, under Assumptions A2, A3 and T1(a),

$$E|I_{n3,2}| \leq \frac{1}{n|H|}\sum_{t \neq s} E\big|\tilde{\nu}_{it}\tilde{\nu}_{is} K_H(X_{it}, X_{is})\big| \leq \frac{1}{n|H|}\sum_{t \neq s} E\big(\tilde{\nu}_{it}^2\big) E[K_H(X_{it}, X_{is})]$$
$$= \frac{\sigma_{\nu}^2(1 - T^{-1})}{n|H|}\sum_{t \neq s} \int\!\!\int K\big(H^{-1}(x_{it} - x_{is})\big) f_{t,s}(x_{it}, x_{is})\, dx_{it}\, dx_{is} = \frac{\sigma_{\nu}^2(1 - T^{-1})}{n}\sum_{t \neq s} \int f_{t,s}(x, x)\, dx\, [1 + o(1)] = O\big(n^{-1}\big),$$

where Hölder's inequality is applied to obtain the second inequality, and the $t = s$ terms vanish because the diagonal elements of $\tilde{K}_H(X_i, X_i)$ are zero. Without the leave-one-out device, $I_{n3,2}$ would contain a term related to $K_H(X_{it}, X_{it}) = K_H(0)$, a constant, and $I_{n3,2}$ would be of order $O_p\big((n|H|)^{-1}\big)$; after multiplying by the normalization factor $n\sqrt{|H|}$, this would give an exploding center term of order $n|H|^{1/2} O\big((n|H|)^{-1}\big) = O_p\big(|H|^{-1/2}\big)$.

Lemma A.3 below shows that $I_{n3,1}$ has zero mean and asymptotic variance $(n^2|H|)^{-1}\sigma_0^2(1 + o(1))$, where $\sigma_0^2$ is defined in equation (3.7). Next, applying Hall's (1984) central limit theorem for a second-order degenerate U-statistic, one can show that under both $H_0$ and $H_1$,

$$n\sqrt{|H|}\, I_{n3,1} \xrightarrow{d} N\big(0, \sigma_0^2\big). \qquad (A.11)$$

To prove (A.11), we define $H_n(\chi_i, \chi_j) = \nu_i^{\top} Q_T \tilde{K}_H(X_i, X_j) Q_T \nu_j$ with $\chi_i = (X_i, \nu_i)$, which is a symmetric, centered and degenerate kernel. Also, let $G_n(\chi_1, \chi_2) = E[H_n(\chi_1, \chi_i) H_n(\chi_2, \chi_i) | \chi_1, \chi_2]$. Then straightforward calculations yield

$$\frac{E\big[G_n^2(\chi_1, \chi_2)\big] + n^{-1}E\big[H_n^4(\chi_1, \chi_2)\big]}{\big\{E\big[H_n^2(\chi_1, \chi_2)\big]\big\}^2} = \frac{O\big(|H|^3\big) + O\big(n^{-1}|H|\big)}{O\big(|H|^2\big)} \to 0,$$

provided that $|H| \to 0$ and $n|H| \to \infty$ as $n \to \infty$. Hence, (A.11) follows from Theorem 1 of Hall (1984).

Combining (A.11) and $I_{n3,2} = O_p\big(n^{-1}\big)$, we have $n\sqrt{|H|}\, I_{n3} \xrightarrow{d} N\big(0, \sigma_0^2\big)$. Hence, by (A.10), under $H_0$, we obtain $n\sqrt{|H|}\, I_n = n\sqrt{|H|}\, I_{n3} + O_p\big(\sqrt{|H|}\big)$, so $n\sqrt{|H|}\, I_n \xrightarrow{d} N\big(0, \sigma_0^2\big)$.

Lemma A.3 $\mathrm{Var}\big(n\sqrt{|H|}\, I_{n3,1}\big) = \sigma_0^2 + o(1)$, where $\sigma_0^2 = 2\zeta_0\sigma_{\nu}^4(T-1)^2 E[f(X_{1t})]$ and $\zeta_0 = \int K^2(\omega)\, d\omega$.

Proof: Assumption A2 implies that $E(I_{n3,1}) = 0$ and

$$\mathrm{Var}(I_{n3,1}) = \frac{2}{n^4|H|^2}\sum_{i=1}^{n}\sum_{j \neq i}^{n} E\Big[\big(\tilde{\nu}_i^{\top} \tilde{K}_H(X_i, X_j)\tilde{\nu}_j\big)^2\Big] = \frac{2(n-1)}{n^3|H|^2} E\Big[\big(\tilde{\nu}_1^{\top} K_H(X_1, X_2)\tilde{\nu}_2\big)^2\Big]. \qquad (A.12)$$

Note that

$$\tilde{\nu}_1^{\top} \tilde{K}_H(X_1, X_2)\tilde{\nu}_2 = \sum_{t=1}^{T}\sum_{s=1}^{T} \tilde{\nu}_{1t}\tilde{\nu}_{2s} K_H(X_{1t}, X_{2s})$$

because $\tilde{K}_H(X_i, X_j) = K_H(X_i, X_j)$ when $j \neq i$. Hence, under Assumption A2, we have

$$E\Big[\big(\tilde{\nu}_1^{\top} K_H(X_1, X_2)\tilde{\nu}_2\big)^2\Big] = \sum_{t=1}^{T}\sum_{s=1}^{T}\sum_{t'=1}^{T}\sum_{s'=1}^{T} E\big[\tilde{\nu}_{1t}\tilde{\nu}_{2s}\tilde{\nu}_{1t'}\tilde{\nu}_{2s'} K_H(X_{1t}, X_{2s})K_H(X_{1t'}, X_{2s'})\big] = \big[E\big(\tilde{\nu}_{1t}^2\big)\big]^2 \sum_{t=1}^{T}\sum_{s=1}^{T} E\big[K_H^2(X_{1t}, X_{2s})\big]\, [1 + o(1)],$$

where $E\big(\tilde{\nu}_{1t}^2\big) = \sigma_{\nu}^2\big(1 - T^{-1}\big)$, and the cross terms with $(t, s) \neq (t', s')$ are of the smaller order $O(|H|^2)$.

Letting $\omega = H^{-1}(X_{2s} - X_{1t})$ and applying a change of variables, we obtain

$$E\big[K_H^2(X_{1t}, X_{2s})\big] = \int\!\!\int f(x_{1t})f(x_{2s})K^2\big(H^{-1}(x_{2s} - x_{1t})\big)\, dx_{1t}\, dx_{2s} = |H|\int\!\!\int f(x_{1t})f(x_{1t} + H\omega)K^2(\omega)\, d\omega\, dx_{1t}$$
$$= |H|\int f(x_{1t})^2\, dx_{1t} \int K^2(\omega)\, d\omega\, [1 + o(1)] = |H|\, \zeta_0\, E[f(X_{1t})]\, [1 + o(1)].$$

Summarizing the above results, we have shown that

$$\mathrm{Var}(I_{n3,1}) = \frac{2(n-1)\zeta_0\sigma_{\nu}^4(T-1)^2}{n^3|H|} E[f(X_{1t})]\, [1 + o(1)] = \frac{\sigma_0^2(n-1)}{n^3|H|}\, [1 + o(1)],$$

which completes the proof of Lemma A.3.

Lemma A.4 Under H0, σ2 p→ σ2

0, where

σ2 = 2(n2 |H|

)−1n∑i=1

n∑j 6=i

T∑t=1

T∑s=1

u2

itu2

jsK2H(Xit, Xjs).

Proof: Following the above proof, we can show that the leading term of $\hat\sigma^2$ is given by
\[
\tilde\sigma^2 = \frac{2}{n(n-1)|H|}\sum_{i=1}^{n}\sum_{j\neq i}^{n}\sum_{t=1}^{T}\sum_{s=1}^{T} \tilde\nu_{it}^2\,\tilde\nu_{js}^2\, K_H^2(X_{it},X_{js}),
\]
which is a standard second-order U-statistic with kernel $H_n(\chi_i,\chi_j) = |H|^{-1}\sum_{t=1}^{T}\sum_{s=1}^{T}\tilde\nu_{it}^2\tilde\nu_{js}^2 K_H^2(X_{it},X_{js})$ and $\chi_i = (X_i,\nu_i)$. We have
\[
E\big[H_n(\chi_i,\chi_j)^2\big] \simeq |H|^{-2}\sum_{t=1}^{T}\sum_{s=1}^{T}\big[E\big(\tilde\nu_{it}^4\big)\big]^2 E\big[K_H^4(X_{it},X_{js})\big] = O\big(|H|^{-1}\big) = o(n)
\]
with a finite $T$, because $E\big(\nu_{it}^4\big) = \mu_4 < \infty$ under Assumption T1(a) and $|H|^{-1} E\big[K_H^4(X_{it},X_{js})\big] = \int K^4(\omega)\,d\omega \int f_{t,s}(x,x)\,dx + o(1)$ under Assumptions T1(a) and A3. Applying Lemma 3.1 of Powell et al. (1989), we then have $\tilde\sigma^2 = E(\tilde\sigma^2) + o_p(1)$. It is easy to show that $E(\tilde\sigma^2) = \sigma_0^2 + o(1)$, which completes the proof of this lemma.

Proof of Theorem 3.2: Under $H_1$, we have $\hat U_j = \theta(X_j) - \hat\alpha\iota_T - X_j\hat\beta + \mu_j\iota_T + \nu_j$. Since $Q_T\iota_T = 0$, we have $Q_T\hat U_j = Q_T\theta(X_j) - Q_T X_j\hat\beta + Q_T\nu_j$. Assumptions A2, T1, and T2 ensure $\hat\beta = \bar\beta + O_p(n^{-1/2})$, where $\bar\beta$ is the probability limit of $\hat\beta$ ($\bar\beta$ is a well-defined constant vector). Hence, we can replace $\hat\beta$ by $\bar\beta$ in analyzing the test statistic $I_n$ under $H_1$. Letting $Z_j = \theta(X_j) - X_j\bar\beta$, we have
\[
I_n = \frac{1}{n^2|H|}\sum_{i=1}^{n}\sum_{j=1}^{n} Z_i^\top Q_T K_H(X_i,X_j) Q_T Z_j + I_{n3} + O_p\big(n^{-1/2}\big)
\]
\[
= \frac{1}{n^2|H|}\sum_{i=1}^{n} Z_i^\top Q_T K_H(X_i,X_i) Q_T Z_i
+ \frac{1}{n^2|H|}\sum_{i=1}^{n}\sum_{j\neq i}^{n} Z_i^\top Q_T K_H(X_i,X_j) Q_T Z_j + I_{n3} + O_p\big(n^{-1/2}\big)
\]
\[
= I_{n1} + I_{n2} + I_{n3} + O_p\big(n^{-1/2}\big), \tag{A.13}
\]


where the definitions of $I_{n1}$ and $I_{n2}$ should be obvious, and $I_{n3}$ is exactly the same as defined in the proof of Theorem 3.1. Hence, $I_{n3} = O_p\big((n\sqrt{|H|})^{-1}\big)$ as shown in the proof of Theorem 3.1.

Consider $I_{n1}$ first. Define $\tilde Z_{it} = Z_{it} - T^{-1}\sum_{s=1}^{T} Z_{is}$ for all $i$ and $t$. We have
\[
E|I_{n1}| \le \frac{1}{n|H|}\, E\big|Z_i^\top Q_T K_H(X_i,X_i) Q_T Z_i\big|
\le \frac{1}{n|H|}\sum_{t=1}^{T}\sum_{s\neq t}^{T} E\big|\tilde Z_{it}\tilde Z_{is}\, K_H(X_{it},X_{is})\big|
\le \frac{T(T-1)}{n|H|}\, E\big(\tilde Z_{it}^2\, K_H(X_{it},X_{is})\big),
\]
where the last inequality is obtained from Hölder's inequality and Assumption T1. For any $t < s$, letting $\omega = H^{-1}(X_{is} - X_{it})$ and applying the change of variables gives
\[
|H|^{-1} E\big(\tilde Z_{it}^2\, K_H(X_{it},X_{is})\big)
= |H|^{-1}\int\!\!\int E\big(\tilde Z_{it}^2\,\big|\,x_{it}\big)\, K\big(H^{-1}(x_{it} - x_{is})\big)\, f_{t,s}(x_{it},x_{is})\,dx_{is}\,dx_{it}
\]
\[
= \int\!\!\int E\big(\tilde Z_{it}^2\,\big|\,x_{it}\big)\, K(\omega)\, f_{t,s}(x_{it}, H\omega + x_{it})\,d\omega\,dx_{it}
= \Big(\int K(\omega)\,d\omega\Big)\int E\big(\tilde Z_{it}^2\,\big|\,x\big)\, f_{t,s}(x,x)\,dx + o(1),
\]
where the last expression is bounded under Assumption T1. Hence, we have $E|I_{n1}| \le M n^{-1}$, which implies $I_{n1} = O_p\big(n^{-1}\big)$.

Now, we consider $I_{n2}$. Let $H_n(X_i,X_j) = |H|^{-1} Z_i^\top Q_T K_H(X_i,X_j) Q_T Z_j$, which is a symmetric function. Then, $I_{n2}$ can be written as a symmetric second-order U-statistic,
\[
\frac{n}{n-1}\, I_{n2} = \frac{2}{n(n-1)}\sum_{i=1}^{n}\sum_{j=i+1}^{n} H_n(X_i,X_j).
\]
Since $E\big[H_n(X_i,X_j)^2\big] = O\big(|H|^{-1}\big) = o(n)$, by Lemma 3.1 of Powell et al. (1989), we obtain $I_{n2} = E[H_n(X_i,X_j)] + o_p(1) = |H|^{-1}\sum_{t=1}^{T}\sum_{s=1}^{T} E\big[\tilde Z_{it}\tilde Z_{js}\, K_H(X_{it},X_{js})\big] + o_p(1)$. Since $X_i$ is independent of $X_j$ for $i \neq j$, we have
\[
A_n \overset{def}{=} |H|^{-1}\sum_{t=1}^{T}\sum_{s=1}^{T} E\big[\tilde Z_{it}\tilde Z_{js}\, K_H(X_{it},X_{js})\big]
= |H|^{-1}\sum_{t=1}^{T}\sum_{s=1}^{T} E\big[E\big(\tilde Z_{it}\big|X_{it}\big)\, E\big(\tilde Z_{js}\big|X_{js}\big)\, K_H(X_{it},X_{js})\big]. \tag{A.14}
\]

Define $m(x) = \theta(x) - x^\top\bar\beta$ and $m_{tt'}(x) = E\big[\theta(X_{it'}) - X_{it'}^\top\bar\beta\,\big|\,X_{it} = x\big]$. Then, we have $E(Z_{it} - Z_{it'}|X_{it}) = m(X_{it}) - m_{tt'}(X_{it})$ and $E(Z_{js} - Z_{js'}|X_{js} = X_{it}) = m(X_{it}) - m_{ss'}(X_{it})$. It follows that
\[
E\big(\tilde Z_{it}\big|X_{it}\big) = E\Big(Z_{it} - T^{-1}\sum_{t'=1}^{T} Z_{it'}\,\Big|\,X_{it}\Big)
= T^{-1}\sum_{t'\neq t} E\big(Z_{it} - Z_{it'}\big|X_{it}\big)
= T^{-1}\sum_{t'\neq t}\big[m(X_{it}) - m_{tt'}(X_{it})\big] \tag{A.15}
\]


and
\[
E\big(\tilde Z_{js}\big|X_{js} = X_{it}\big) = E\Big(Z_{js} - T^{-1}\sum_{s'=1}^{T} Z_{js'}\,\Big|\,X_{js} = X_{it}\Big)
= T^{-1}\sum_{s'\neq s}\big[m(X_{it}) - m_{ss'}(X_{it})\big]. \tag{A.16}
\]

Hence, using (A.14), (A.15), and (A.16), we have
\[
A_n = |H|^{-1}\sum_{t=1}^{T}\sum_{s=1}^{T} E\big[E\big(\tilde Z_{it}\big|X_{it}\big)\, E\big(\tilde Z_{js}\big|X_{js}\big)\, K_H(X_{it},X_{js})\big]
\]
\[
= |H|^{-1}\sum_{t=1}^{T}\sum_{s=1}^{T}\int\!\!\int f(x_{it}) f(x_{js})\, E\big(\tilde Z_{it}\big|x_{it}\big)\, E\big(\tilde Z_{js}\big|x_{js}\big)\, K_H(x_{it},x_{js})\,dx_{it}\,dx_{js}
\]
\[
= \sum_{t=1}^{T}\sum_{s=1}^{T}\int\!\!\int f(x_{it})^2\, E\big(\tilde Z_{it}\big|x_{it}\big)\, E\big(\tilde Z_{js}\big|x_{js} = x_{it}\big)\, K(\omega)\,dx_{it}\,d\omega\, [1 + o(1)]
\]
\[
\simeq \sum_{t=1}^{T}\sum_{s=1}^{T}\int f(x)^2\, E\big(\tilde Z_{it}\big|x\big)\, E\big(\tilde Z_{js}\big|x\big)\,dx
= \int f(x)^2 \sum_{t=1}^{T}\sum_{s=1}^{T}\Big\{\frac{1}{T}\sum_{t'\neq t}\big[m(x) - m_{tt'}(x)\big]\Big\}\Big\{\frac{1}{T}\sum_{s'\neq s}\big[m(x) - m_{ss'}(x)\big]\Big\}\,dx
\]
\[
= \int f(x)^2 \Big\{\frac{1}{T}\sum_{t=1}^{T}\sum_{t'\neq t}\big[m(x) - m_{tt'}(x)\big]\Big\}^2\,dx
= E\Bigg(f(X_{11})\Big\{\frac{1}{T}\sum_{t=1}^{T}\sum_{t'\neq t}\big[m(X_{11}) - m_{tt'}(X_{11})\big]\Big\}^2\Bigg) \equiv C,
\]
where $C > 0$ is a positive constant under $H_1$. Then, it follows that $I_{n2} = C + o_p(1)$; equivalently, $I_{n2} \xrightarrow{p} C > 0$.

Next, we show that $\hat\sigma^2$ converges to a finite positive constant under $H_1$. As the argument closely follows the proof above, we only sketch it here. First,
\[
\hat\sigma^2 = \frac{2}{n^2|H|}\sum_{i=1}^{n}\sum_{j\neq i}^{n}\sum_{t=1}^{T}\sum_{s=1}^{T} \hat u_{it}^2\, \hat u_{js}^2\, K_H^2(X_{it},X_{js})
= \frac{2}{n^2|H|}\sum_{i=1}^{n}\sum_{j\neq i}^{n}\sum_{t=1}^{T}\sum_{s=1}^{T} \big(\tilde Z_{it} + \tilde\nu_{it}\big)^2\big(\tilde Z_{js} + \tilde\nu_{js}\big)^2 K_H^2(X_{it},X_{js}) + o_p(1)
\equiv \frac{2(n-1)}{n}\, B_n + o_p(1),
\]
where $B_n$ is a standard second-order U-statistic with kernel $H_n(\chi_i,\chi_j) = |H|^{-1}\sum_{t=1}^{T}\sum_{s=1}^{T}(\tilde Z_{it} + \tilde\nu_{it})^2(\tilde Z_{js} + \tilde\nu_{js})^2 K_H^2(X_{it},X_{js})$ and $\chi_i = (\tilde Z_i, \tilde\nu_i)$. As $E\big[H_n(\chi_i,\chi_j)^2\big] = O\big(|H|^{-1}\big) = o(n)$ under Assumptions T1(b) and A3, again we have $B_n = E(B_n) + o_p(1)$. A simple calculation gives $E\big[(\tilde Z_{it} + \tilde\nu_{it})^2\,\big|\,X_{it}\big] = E\big(\tilde Z_{it}^2 + \tilde\nu_{it}^2\,\big|\,X_{it}\big) = E\big(\tilde Z_{it}^2\,\big|\,X_{it}\big) + \sigma_\nu^2\big(1 - T^{-1}\big)$. Following the same method used to prove $I_{n2} \xrightarrow{p} C > 0$, we can show that $B_n$ converges in probability to a finite positive number as $n \to \infty$, and so does $\hat\sigma^2$.


Now, combining the above results, we conclude that, under $H_1$, the standardized test statistic $J_n = n\sqrt{|H|}\, I_n/\sqrt{\hat\sigma^2}$ diverges to $+\infty$ at the rate $n\sqrt{|H|}$. This completes the proof of Theorem 3.2.

Proof of Theorem 3.3: We need to show that $J_n^*$ converges to $N(0,1)$ in distribution in probability. Because $\Phi(z)$ is a continuous cdf, by Polya's Theorem (Bhattacharya and Rao, 1986), we only need to show that, for any fixed value of $z$,
\[
\Pr{}^{\!*}\big(J_n^* \le z\big) - \Phi(z) = o_p(1). \tag{A.17}
\]
The proof is similar to that of Theorem 3.1, except that one uses de Jong's (1987) central limit theorem (CLT) for generalized quadratic forms in place of Hall's (1984) CLT when checking the regularity conditions.

The bootstrap test statistic is $I_n^* = \big(n^2|H|\big)^{-1}\sum_{i=1}^{n}\sum_{j=1}^{n} \hat U_i^{*\top} Q_T K_H(X_i,X_j) Q_T \hat U_j^{*}$, where $\hat U_j^{*} = Y_j^{*} - \hat\alpha^{*}\iota_T - X_j\hat\beta^{*} = (\hat\alpha - \hat\alpha^{*})\iota_T + X_j(\hat\beta - \hat\beta^{*}) + U_j^{*}$. Since $Q_T\iota_T = 0$, we have $Q_T\hat U_j^{*} = Q_T X_j(\hat\beta - \hat\beta^{*}) + Q_T U_j^{*}$. Hence, we have
\[
I_n^* = \frac{1}{n^2|H|}\,(\hat\beta - \hat\beta^{*})^\top \sum_{i=1}^{n}\sum_{j=1}^{n} X_i^\top Q_T K_H(X_i,X_j) Q_T X_j\, (\hat\beta - \hat\beta^{*})
+ \frac{2}{n^2|H|}\,(\hat\beta - \hat\beta^{*})^\top \sum_{i=1}^{n}\sum_{j=1}^{n} X_i^\top Q_T K_H(X_i,X_j) Q_T U_j^{*}
\]
\[
+ \frac{1}{n^2|H|}\sum_{i=1}^{n}\sum_{j=1}^{n} U_i^{*\top} Q_T K_H(X_i,X_j) Q_T U_j^{*}
= I_{n1}^* + 2 I_{n2}^* + I_{n3}^*, \tag{A.18}
\]
where the definitions of $I_{nj}^*$, $j = 1, 2, 3$, are apparent from the context.

Let $E^*(\cdot) = E\big(\cdot\,\big|\,\{(X_{it},Y_{it})\}_{i=1,\dots,n;\,t=1,\dots,T}\big)$. The two-point wild bootstrap method ensures $E^*(u_{it}^{*}) = 0$, $E^*(u_{it}^{*2}) = \hat u_{it}^2$, and $E^*(u_{it}^{*3}) = \hat u_{it}^3$ for all $i$ and $t$, where $\hat u_{it}$ is the parametric LSDV residual. Using $\hat\beta - \hat\beta^{*} = O_p(n^{-1/2})$ and $E^*(u_{it}^{*}) = 0$, it is straightforward to show that $I_{n1}^* = O_p(n^{-1})$, $I_{n2}^* = O_p(n^{-1})$, and $I_{n3}^* = O_p\big((n\sqrt{|H|})^{-1}\big)$. Hence, $I_{n3}^*$ is the leading term of $I_n^*$.
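One two-point distribution delivering exactly these three moment restrictions is the law often attributed to Mammen (1993); we assume it here for illustration, since the text does not spell out the specific weights. Draw $v_{it}$ with $P\big(v_{it} = (1-\sqrt5)/2\big) = (\sqrt5+1)/(2\sqrt5)$ and $P\big(v_{it} = (1+\sqrt5)/2\big) = (\sqrt5-1)/(2\sqrt5)$, and set $u_{it}^{*} = v_{it}\hat u_{it}$:

```python
import math

s5 = math.sqrt(5.0)
a, b = (1 - s5) / 2, (1 + s5) / 2          # the two support points
pa = (s5 + 1) / (2 * s5)                   # P(v = a); P(v = b) = 1 - pa
pb = 1 - pa

m1 = pa * a + pb * b            # E(v)   -> 0
m2 = pa * a**2 + pb * b**2      # E(v^2) -> 1
m3 = pa * a**3 + pb * b**3      # E(v^3) -> 1
print(m1, m2, m3)
```

Since $E(v) = 0$ and $E(v^2) = E(v^3) = 1$, the draws $u_{it}^{*} = v_{it}\hat u_{it}$ inherit the three conditional moment restrictions used in the proof.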

As in the proof of Theorem 3.1, we decompose $I_{n3}^*$ into two terms:
\[
I_{n3}^* = \frac{1}{n^2|H|}\sum_{i=1}^{n}\sum_{j\neq i}^{n} U_i^{*\top} Q_T K_H(X_i,X_j) Q_T U_j^{*}
+ \frac{1}{n^2|H|}\sum_{i=1}^{n} U_i^{*\top} Q_T K_H(X_i,X_i) Q_T U_i^{*}
\equiv I_{n3,1}^* + I_{n3,2}^*,
\]


where $I_{n3,2}^* = O_p\big(n^{-1}\big)$ because
\[
E^*\big|I_{n3,2}^*\big| \le \frac{1}{n^2|H|}\sum_{i=1}^{n}\sum_{t=1}^{T}\sum_{s\neq t}^{T} E^*\big[\big|\tilde u_{it}^{*}\tilde u_{is}^{*}\big|\big]\, K_H(X_{it},X_{is})
\le \frac{1}{n^2|H|}\sum_{i=1}^{n}\sum_{t=1}^{T}\sum_{s\neq t}^{T} \big(\tilde\sigma_{it}^{*2}\,\tilde\sigma_{is}^{*2}\big)^{1/2} K_H(X_{it},X_{is})
\sim \frac{1}{n^2|H|}\sum_{i=1}^{n}\sum_{t=1}^{T}\sum_{s\neq t}^{T} \hat u_{it}^2\, K_H(X_{it},X_{is}) = O_p\big(n^{-1}\big), \tag{A.19}
\]
where $\tilde u_{it}^{*}$ denotes the $t$th element of $\tilde U_i^{*} = Q_T U_i^{*}$, the second inequality follows from the Cauchy-Schwarz inequality, and the notation $A \sim B$ means that $A$ and $B$ have the same probability order. We have also used
\[
E^*\big(\tilde u_{it}^{*2}\big) = \big(1 - T^{-1}\big)^2 \hat u_{it}^2 + T^{-2}\sum_{s\neq t}\hat u_{is}^2 \overset{def}{=} \tilde\sigma_{it}^{*2},
\]
which follows from $E^*\big(u_{it}^{*2}\big) = \hat u_{it}^2$ and the conditional independence of $\{u_{it}^{*}\}$ across $t$.
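The identity $E^*\big(\tilde u_{it}^{*2}\big) = (1-T^{-1})^2\hat u_{it}^2 + T^{-2}\sum_{s\neq t}\hat u_{is}^2$ can be verified by brute-force enumeration of the bootstrap weights for a single cross-sectional unit. This is only a sketch: the Mammen two-point weights are assumed, and the residual values are arbitrary placeholders:

```python
import itertools
import math

s5 = math.sqrt(5.0)
vals = [(1 - s5) / 2, (1 + s5) / 2]
probs = [(s5 + 1) / (2 * s5), (s5 - 1) / (2 * s5)]

uhat = [0.7, -1.3, 0.4]            # hypothetical LSDV residuals for one unit, T = 3
T = len(uhat)

# exact E*(utilde*_t^2) by enumerating all 2^T weight combinations
lhs = [0.0] * T
for combo in itertools.product(range(2), repeat=T):
    p = math.prod(probs[c] for c in combo)
    ustar = [vals[c] * u for c, u in zip(combo, uhat)]
    ubar = sum(ustar) / T
    for t in range(T):
        lhs[t] += p * (ustar[t] - ubar) ** 2

# right-hand side of the displayed identity
rhs = [(1 - 1 / T) ** 2 * uhat[t] ** 2
       + sum(uhat[s] ** 2 for s in range(T) if s != t) / T ** 2
       for t in range(T)]
print(lhs, rhs)
```

Because $u_{it}^{*}$ is independent across $t$ with $E^*(u_{it}^{*}) = 0$ and $E^*(u_{it}^{*2}) = \hat u_{it}^2$, the within transform mixes the conditional variances exactly as the displayed formula states, and the two lists coincide.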

Under Assumptions A2, T1, and T2, there exists $(\bar\alpha, \bar\beta^\top)^\top \in \Theta \subset \mathbb{R}^{1+q}$ such that $\hat\alpha = \bar\alpha + O_p\big(n^{-1/2}\big)$ and $\hat\beta = \bar\beta + O_p\big(n^{-1/2}\big)$. It follows that $\hat u_{it} = Y_{it} - \hat\alpha - X_{it}^\top\hat\beta = \bar u_{it} + (\bar\alpha - \hat\alpha) + X_{it}^\top(\bar\beta - \hat\beta) = \bar u_{it} + O_p\big(n^{-1/2}\big)$, where $\bar u_{it} = Y_{it} - \bar\alpha - X_{it}^\top\bar\beta$ is an i.i.d. sequence across $i$.

Letting the within-transformed bootstrap error be $\tilde U_i^{*} = Q_T U_i^{*}$ and $H_n^*(\chi_i^*,\chi_j^*) = |H|^{-1}\tilde U_i^{*\top} K_H(X_i,X_j)\tilde U_j^{*}$ with $\chi_i^* = (X_i, \tilde U_i^{*})$, we have, writing $I_{n3}^*$ for its leading term $(n/(n-1))\, I_{n3,1}^*$ with a slight abuse of notation,
\[
I_{n3}^* = \frac{n}{n-1}\, I_{n3,1}^* = \frac{2}{n(n-1)|H|}\sum_{i=1}^{n}\sum_{j=i+1}^{n} \tilde U_i^{*\top} K_H(X_i,X_j)\tilde U_j^{*}
= \frac{2}{n(n-1)}\sum_{i=1}^{n}\sum_{j=i+1}^{n} H_n^*\big(\chi_i^*,\chi_j^*\big),
\]
where $I_{n3}^*$ is a centered second-order U-statistic, as $E^*\big[H_n^*(\chi_i^*,\chi_j^*)\big] = 0$ holds for all $i \neq j$; this follows from the fact that $\{u_{it}^{*}\}$ is an independent sequence with zero mean, conditional on $\{(X_{it},Y_{it})\}$.

Since $E^*\big(u_{it}^{*2}\big) = \hat u_{it}^2$, we have $E^*\big(\tilde u_{it}^{*2}\big) = \big(1 - T^{-1}\big)^2\hat u_{it}^2 + T^{-2}\sum_{s\neq t}\hat u_{is}^2 = \tilde\sigma_{it}^{*2}$. Then, the conditional second moment of $I_{n3}^*$ is given by
\[
S_n^{*2} = E^*\big(I_{n3}^{*2}\big)
= \frac{4}{n^2(n-1)^2|H|^2}\sum_{i=1}^{n}\sum_{j=i+1}^{n}\sum_{t=1}^{T}\sum_{s=1}^{T} E^*\big[\tilde u_{it}^{*2}\tilde u_{js}^{*2} K_H^2(X_{it},X_{js})\big]
= \frac{4}{n^2(n-1)^2|H|^2}\sum_{i=1}^{n}\sum_{j=i+1}^{n}\sum_{t=1}^{T}\sum_{s=1}^{T} \tilde\sigma_{it}^{*2}\,\tilde\sigma_{js}^{*2}\, K_H^2(X_{it},X_{js}).
\]

By de Jong's (1987) Proposition 3.2, we know that $I_{n3}^*/\sqrt{S_n^{*2}} \xrightarrow{d} N(0,1)$ if $G_I^*$, $G_{II}^*$, and $G_{IV}^*$ are all of order $o_p\big(S_n^{*4}\big)$. Define $W_{n,ij}^* = \{2/[n(n-1)]\}\, H_n^*(\chi_i^*,\chi_j^*)$ and $\lambda_{it,js} = K_H(X_{it},X_{js})$. Then,
\[
G_I^* = \sum_{i=1}^{n}\sum_{j>i}^{n} E^*\big(W_{n,ij}^{*4}\big), \qquad
G_{II}^* = \sum_{i=1}^{n}\sum_{j>i}^{n}\sum_{l>j}^{n} E^*\big(W_{n,ij}^{*2} W_{n,il}^{*2} + W_{n,ji}^{*2} W_{n,jl}^{*2} + W_{n,li}^{*2} W_{n,lj}^{*2}\big),
\]
and
\[
G_{IV}^* = \Big\{\frac{2}{n(n-1)|H|}\Big\}^4 \sum_{i=1}^{n}\sum_{j>i}^{n}\sum_{k>j}^{n}\sum_{l>k}^{n}\sum_{t=1}^{T}\sum_{s=1}^{T}\sum_{t'=1}^{T}\sum_{s'=1}^{T}
E^*\big(\tilde u_{it}^{*2}\tilde u_{js}^{*2}\tilde u_{kt'}^{*2}\tilde u_{ls'}^{*2}\big)
\big[\lambda_{it,js}\lambda_{kt',ls'}\big(\lambda_{it,kt'}\lambda_{ls',js} + \lambda_{it,ls'}\lambda_{kt',js}\big) + \lambda_{it,kt'}\lambda_{ls',js}\lambda_{it,ls'}\lambda_{kt',js}\big].
\]

Replacing $\hat u_{it}$ by $\bar u_{it}$ in $\tilde\sigma_{it}^{*2}$ gives $\bar\sigma_{it}^{*2}$. It is easy to show that
\[
\frac{1}{n^2|H|}\sum_{i=1}^{n}\sum_{j\neq i}^{n}\sum_{t=1}^{T}\sum_{s=1}^{T} \bar\sigma_{it}^{*2}\,\bar\sigma_{js}^{*2}\, K_H^2(X_{it},X_{js})
= \frac{1}{n^2|H|}\sum_{i=1}^{n}\sum_{j\neq i}^{n}\sum_{t=1}^{T}\sum_{s=1}^{T} \tilde\sigma_{it}^{*2}\,\tilde\sigma_{js}^{*2}\, K_H^2(X_{it},X_{js}) + O_p\big(n^{-1/2}\big),
\]
which implies
\[
S_n^{*2} = \frac{4}{[n(n-1)|H|]^2}\sum_{i=1}^{n}\sum_{j>i}^{n}\sum_{t=1}^{T}\sum_{s=1}^{T} \bar\sigma_{it}^{*2}\,\bar\sigma_{js}^{*2}\, K_H^2(X_{it},X_{js})\, [1 + o_p(1)]
\overset{def}{=} \bar S_n^{*2}\, [1 + o_p(1)].
\]
Applying the


similar proof method used in the proof of Theorem 3.2, one can easily show that
\[
\bar S_n^{*2} = O_p\big(n^{-2}|H|^{-1}\big). \tag{A.20}
\]

Applying the same method to $G_I^*$, we have
\[
G_I^* = \sum_{i=1}^{n}\sum_{j>i}^{n} E^*\big(W_{n,ij}^{*4}\big)
= \frac{16}{[n(n-1)]^4}\sum_{i=1}^{n}\sum_{j>i}^{n} E^*\big[H_n^*\big(\chi_i^*,\chi_j^*\big)^4\big]
= \frac{16}{[n(n-1)|H|]^4}\sum_{i=1}^{n}\sum_{j>i}^{n} E^*\Bigg[\bigg(\sum_{t=1}^{T}\sum_{s=1}^{T}\tilde u_{it}^{*2}\tilde u_{js}^{*2}\lambda_{it,js}^2\bigg)^{\!2}\Bigg]
\]
\[
= \frac{16}{[n(n-1)|H|]^4}\sum_{i=1}^{n}\sum_{j>i}^{n}\sum_{t=1}^{T}\sum_{s=1}^{T}\sum_{t'=1}^{T}\sum_{s'=1}^{T}
E^*\big(\tilde u_{it}^{*2}\tilde u_{js}^{*2}\tilde u_{it'}^{*2}\tilde u_{js'}^{*2}\big)\,\lambda_{it,js}^2\lambda_{it',js'}^2
= O_p\big(\big(n^2|H|\big)^{-3}\big) = o_p\big(S_n^{*4}\big).
\]

Similarly, we can show that $G_{II}^* = O_p\big(\big(n^5|H|^2\big)^{-1}\big) = o_p\big(S_n^{*4}\big)$ and $G_{IV}^* = O_p\big(\big(n^4|H|\big)^{-1}\big) = o_p\big(S_n^{*4}\big)$. Therefore, we have $I_{n3}^*/\sqrt{S_n^{*2}} \to N(0,1)$ in distribution in probability.

Finally, we define
\[
\hat\sigma^{*2} = \frac{2}{n^2|H|}\sum_{i=1}^{n}\sum_{j\neq i}^{n}\sum_{t=1}^{T}\sum_{s=1}^{T} \tilde u_{it}^{*2}\,\tilde u_{js}^{*2}\, K_H^2(X_{it},X_{js}),
\]
and we need to show that $\hat\sigma^{*2} = n^2|H|\, S_n^{*2} + o_p(1)$, so that $n\sqrt{|H|}\, I_{n3}^*/\sqrt{\hat\sigma^{*2}} \to N(0,1)$ in distribution in probability. Applying the method used in the proof of Theorem 3.2, we have
\[
\hat\sigma^{*2} = E^*\big(\hat\sigma^{*2}\big) + o_p(1)
= \frac{4}{n^2|H|}\sum_{i=1}^{n}\sum_{j>i}^{n}\sum_{t=1}^{T}\sum_{s=1}^{T} E^*\big(\tilde u_{it}^{*2}\big)\, E^*\big(\tilde u_{js}^{*2}\big)\, K_H^2(X_{it},X_{js}) + o_p(1)
\]
\[
= \frac{4}{n^2|H|}\sum_{i=1}^{n}\sum_{j>i}^{n}\sum_{t=1}^{T}\sum_{s=1}^{T} \tilde\sigma_{it}^{*2}\,\tilde\sigma_{js}^{*2}\, K_H^2(X_{it},X_{js}) + o_p(1)
= n^2|H|\, S_n^{*2} + o_p(1).
\]

Since $I_n^* = \frac{n-1}{n}\, I_{n3}^* + O_p\big(n^{-1}\big)$, we have
\[
J_n^* = n\sqrt{|H|}\, I_n^*/\sqrt{\hat\sigma^{*2}} = \frac{n-1}{n}\, n\sqrt{|H|}\, I_{n3}^*/\sqrt{\hat\sigma^{*2}} + O_p\big(\sqrt{|H|}\big) \xrightarrow{d} N(0,1)
\]
in distribution in probability. This completes the proof of Theorem 3.3.
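Putting the pieces together, the bootstrap test justified by Theorem 3.3 can be sketched end to end. The code below is a schematic only, under simplifying assumptions (scalar regressor, Gaussian kernel with $K_H(x,y) = K((x-y)/h)$, diagonal $i = j$ terms dropped, Mammen two-point weights); the function names and design are ours, not the authors' implementation:

```python
import numpy as np

def demean(A):                      # within (Q_T) transform, applied row-wise
    return A - A.mean(axis=1, keepdims=True)

def stat(Uq, X, h):
    """J_n = n sqrt(h) I_n / sigma_hat from within-transformed residuals Uq (n x T)."""
    n, T = Uq.shape
    u = Uq.reshape(n * T)
    d = (X.reshape(-1, 1) - X.reshape(1, -1)) / h
    K = np.exp(-0.5 * d ** 2) / np.sqrt(2.0 * np.pi)
    idx = np.repeat(np.arange(n), T)
    off = idx[:, None] != idx[None, :]              # keep only i != j pairs
    I_n = (u @ (K * off) @ u) / (n * n * h)
    s2 = 2.0 * ((u ** 2) @ ((K * off) ** 2) @ (u ** 2)) / (n * n * h)
    return n * np.sqrt(h) * I_n / np.sqrt(s2)

def boot_test(Y, X, h, B=199, seed=0):
    rng = np.random.default_rng(seed)
    n, T = Y.shape
    # LSDV fit of the linear null: pooled within regression of Y on X
    yq, xq = demean(Y), demean(X)
    beta = (xq * yq).sum() / (xq ** 2).sum()
    res = Y - beta * X                              # residuals up to fixed effects
    J = stat(demean(res), X, h)
    s5 = np.sqrt(5.0)
    Js = np.empty(B)
    for b in range(B):
        # Mammen two-point weights, drawn independently for each (i, t)
        v = np.where(rng.random((n, T)) < (s5 + 1) / (2 * s5),
                     (1 - s5) / 2, (1 + s5) / 2)
        Ystar = beta * X + demean(res) * v          # impose the null in the bootstrap world
        bstar = (xq * demean(Ystar)).sum() / (xq ** 2).sum()
        Js[b] = stat(demean(Ystar - bstar * X), X, h)
    return J, (Js >= J).mean()                      # right-tail bootstrap p-value

rng = np.random.default_rng(1)
n, T = 40, 3
X = rng.normal(size=(n, T))
alpha_i = rng.normal(size=(n, 1))
Y = 1.0 + 2.0 * X + alpha_i + rng.normal(size=(n, T))   # linear null holds
J, p = boot_test(Y, X, h=0.5, B=99)
print(J, p)
```

The null is rejected at level $\alpha$ when the bootstrap p-value falls below $\alpha$; Theorem 3.3 guarantees that the conditional distribution of $J_n^*$ approximates the finite-sample null distribution of $J_n$.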


References

[1] Arellano, M., 2003. Panel Data Econometrics. Oxford University Press, New York, NY.

[2] Baltagi, B., 2005. Econometric Analysis of Panel Data (2nd edition). Wiley, New York, NY.

[3] Bhattacharya, R.N., Rao, R.R., 1986. Normal Approximations and Asymptotic Expansions. Krieger.

[4] Bierens, H.J., Ploberger, W., 1997. Asymptotic theory of integrated conditional moment tests. Econometrica 65, 1129-1151.

[5] Chen, X., Fan, Y., 1999. Consistent hypothesis testing in semiparametric and nonparametric models for econometric time series. Journal of Econometrics 91, 373-401.

[6] de Jong, P., 1987. A central limit theorem for generalized quadratic forms. Probability Theory and Related Fields 75, 261-277.

[7] Ellison, G., Ellison, S.F., 2000. A simple framework for nonparametric specification testing. Journal of Econometrics 96, 1-23.

[8] Fan, Y., Li, Q., 1996. Consistent model specification tests: omitted variables and semiparametric functional forms. Econometrica 64, 865-890.

[9] Fan, Y., Li, Q., 1999. Central limit theorem for degenerate U-statistics of absolutely regular processes with application to model specification test. Journal of Nonparametric Statistics 11, 251-269.

[10] Gao, J., King, M., Lu, Z., Tjøstheim, D., 2009. Nonparametric specification testing for nonlinear time series with non-stationarity. Econometric Theory 25, 1869-1892.

[11] Hall, P., 1984. Central limit theorem for integrated square error of multivariate nonparametric density estimators. Journal of Multivariate Analysis 14, 1-16.

[12] Härdle, W., Mammen, E., 1993. Comparing nonparametric versus parametric regression fits. Annals of Statistics 21, 1926-1947.

[13] Henderson, D.J., Carroll, R.J., Li, Q., 2008. Nonparametric estimation and testing of fixed effects panel data models. Journal of Econometrics 144, 257-275.

[14] Hong, Y., White, H., 1995. Consistent specification testing via nonparametric series regression. Econometrica 63, 1133-1159.

[15] Horowitz, J.L., Spokoiny, V.G., 2001. An adaptive, rate-optimal test of a parametric mean-regression model against a nonparametric alternative. Econometrica 69, 599-631.

[16] Hsiao, C., 2003. Analysis of Panel Data (2nd edition). Cambridge University Press, New York, NY.


[17] Li, Q., 1999. Consistent model specification tests for time series econometric models. Journal of Econometrics 92, 101-147.

[18] Li, Q., Wang, S., 1998. A simple consistent bootstrap test for a parametric regression function. Journal of Econometrics 87, 145-165.

[19] Mammen, E., Støve, B., Tjøstheim, D., 2009. Nonparametric additive models for panels of time series. Econometric Theory 25, 442-481.

[20] Poirier, D.J., 1995. Intermediate Statistics and Econometrics: A Comparative Approach. The MIT Press, Cambridge, MA.

[21] Powell, J.L., Stock, J.H., Stoker, T.M., 1989. Semiparametric estimation of index coefficients. Econometrica 57, 1403-1430.

[22] Stengos, T., Sun, Y., 2001. Consistent model specification test for a regression function based on nonparametric wavelet estimation. Econometric Reviews 20, 41-60.

[23] Su, L., Ullah, A., 2006. Profile likelihood estimation of partially linear panel data models with fixed effects. Economics Letters 92, 75-81.

[24] Sun, Y., Cai, Z., Li, Q., 2009. Consistent nonparametric test on parametric smooth coefficient model with nonstationary data. Unpublished manuscript.

[25] Sun, Y., Carroll, R.J., Li, D., 2009. Semiparametric estimation of fixed effects panel data varying coefficient models. Advances in Econometrics 24, 101-130.

[26] Whang, Y., 2000. Consistent bootstrap tests of parametric regression functions. Journal of Econometrics 98, 27-46.

[27] Yatchew, A.J., 1992. Nonparametric regression tests based on least squares. Econometric Theory 8, 435-451.

[28] Zheng, J.X., 1996. A consistent test of functional form via nonparametric estimation techniques. Journal of Econometrics 75, 263-289.
