

ISSN 1440-771X

Department of Econometrics and Business Statistics

http://business.monash.edu/econometrics-and-business-statistics/research/publications

January 2018

Working Paper 1/18

Varying-Coefficient Panel Data Models

with Partially Observed Factor Structure

Chaohua Dong, Jiti Gao and Bin Peng


Varying–Coefficient Panel Data Models with Partially Observed Factor Structure¹

Chaohua Dong∗, Jiti Gao⋆² and Bin Peng†

∗Southwestern University of Finance and Economics

⋆Monash University and †University of Bath

January 30, 2018

Abstract

In this paper, we study a varying–coefficient panel data model with nonstationarity,

wherein a factor structure is adopted to capture different effects of time invariant variables

over time. The methodology employed in this paper fills a gap in the literature on dealing with mixed

I(1)/I(0) regressors and factors. For comparison purposes, we consider the

scenarios where the factors are either observable or unobservable, respectively. We propose

an estimation method for both the unknown coefficient functions involved and the unknown

factors before we establish the corresponding theory. We then evaluate the finite–sample

performance of the proposed estimation theory through extensive Monte Carlo simulations.

In an empirical study, we use our newly proposed model and method to study the returns

to scale of large commercial banks in the U.S. Some overlooked modelling issues in the

literature of production econometrics are addressed.

Keywords: Asymptotic theory; Orthogonal series method; Translog cost function; Return to scale

JEL classification: C14, C23, D24

¹ The first author thanks the National Natural Science Foundation of China for its support under Grant 71671143. The second author was supported by the Australian Research Council Discovery Grants Program under Grant numbers DP150101012 and DP170104421.

² Corresponding author: Jiti Gao, Department of Econometrics and Business Statistics, Monash University, Caulfield East, Victoria 3145, Australia. Email: [email protected].


1 Introduction

For panel data models, in terms of time dimension, one often encounters three types of regressors:

(1) stationary (e.g., interest rate), (2) nonstationary (e.g., exchange rate) and (3) time invariant

(e.g., distance from the sea). However, there are not many panel data models which allow for

the existence of these three kinds of variables together.

In the literature, the most commonly studied panel data model is formulated as

$$y_{it} = x_{it}'\beta_0 + \gamma_i + e_{it}, \tag{1.1}$$

where the $\gamma_i$'s are the so–called fixed effects, which are used to capture individual heterogeneity.

In order to implement the estimation, a within transformation is always necessary (Hsiao, 2003).

However, by doing so, one cannot allow xit to include any time invariant variables, as they get

cancelled together with the fixed effects. Around a decade ago, panel data models with factor

structure (also known as panel data models with interactive fixed effects),

$$y_{it} = x_{it}'\beta_0 + f_{0t}'\gamma_i + e_{it}, \tag{1.2}$$

were introduced by two parallel lines of work (Pesaran, 2006 and Kapetanios et al., 2011;

and Bai, 2009 and Bai et al., 2009). These models not only capture strong cross–sectional dependence

among individuals, but also allow time invariant variables to enter linear parametric

panel data models as regressors. By imposing a structure on the regressors, Pesaran (2006)

and Kapetanios et al. (2011) replace the unobservable stationary and nonstationary

$f_{0t}$'s, respectively, with observed variables $y_{it}$ and $x_{it}$, so that consistent estimates are achieved

in these two papers. Meanwhile, Bai (2009) and Bai et al. (2009) also investigate (1.2) under

stationary and nonstationary scenarios separately, wherein a specific form on the regressor xit is

no longer necessary due to the usage of a principal component analysis (PCA) technique.

Both groups of studies require $\{x_{i1}, \ldots, x_{iT}\}$ and $\{f_{01}, \ldots, f_{0T}\}$ to be either both stationary or both

nonstationary, which raises the question of what happens if panel data models

have mixed I(1)/I(0) regressors and factors. Although Section 4.2 of Bai et al. (2009) sketches

an idea for dealing with the singular matrix that might be encountered in the mixed

situation, no detailed development or application has been carried out since then. In addition,

because the coefficient is constant, it is hard for model (1.2) to measure the impacts of time

invariant variables over time. For instance, as policies change from time to time, the location of

cities certainly has a time–varying impact. The rise of California in the history of the U.S. serves

as a good example for this argument.

Recently, in order to study the U.S. stock market, Connor et al. (2012) extend the Fama–French

three–factor model to a semiparametric setting:

$$y_{it} = f_{0t}'\gamma_0(v_i) + e_{it}, \tag{1.3}$$

where $\gamma_0(v) = (\gamma_{01}(v_1), \ldots, \gamma_{0d_v}(v_{d_v}))'$ is a $d_v$–dimensional unobservable loading function with

$v = (v_1, \ldots, v_{d_v})'$. Building on this work, Fan et al. (2016) provide an extension with statistical inference based on projected

PCA. On the one hand, the setting of these two

papers provides a guide on how to capture the strong cross–sectional dependence caused by time

invariant variables; on the other hand, it allows us to measure the different time effects of these

time invariant variables.

Further to the aforementioned issues, another challenge of studying panel data models involving

nonstationarity is establishing the asymptotics for joint limits. So far, only a limited number

of studies have addressed the issue. Relevant literature includes Phillips and

Moon (1999), Pedroni (2004), Bai et al. (2009), Bai and Carrion-I-Silvestre (2009) and Dong

et al. (2015a). However, among these studies, the joint limits are not always achievable due to

some technical hurdles.

In view of the above discussion, the following model will be studied in this paper:

$$y_{it} = x_{it}'\beta_0(r_{it}, \tau_t) + f_{0t}'\gamma_0(v_i) + e_{it}, \tag{1.4}$$

where $i \in \{1, \ldots, N\}$, $t \in \{1, \ldots, T\}$, $\tau_t = t/T$ and $e_{it}$ is an error process having serial correlation

over $t$ and weak cross–sectional dependence across $i$. The subscript “0” stands for the

true parameter or true function throughout this study. This paper focuses on the case where

$\{f_{01}, \ldots, f_{0T}\}$ is a stationary process, and we start with observable factors and then turn to unobservable

ones.

Below, we briefly introduce the rest of the variables and functions at first, and will provide

detailed assumptions and discussions wherever necessary.

• Observable variables:

– $x_{it} = x_{i,t-1} + w_{it}$ is a $d_x$–dimensional integrated process on the time dimension;

– $r_{it}$ is a locally stationary³ process across $t$;

– $v_i = (v_{i,1}, \ldots, v_{i,d_v})'$ is a $d_v$–dimensional vector;

– $d_x$ and $d_v$ are known and finite.

• Flexible unknown functions of interest: $\beta_0(\cdot,\cdot)$ and $\gamma_0(\cdot)$

– $\beta_0(\cdot,\cdot) = (\beta_{01}(\cdot,\cdot), \ldots, \beta_{0d_x}(\cdot,\cdot))'$, where, for $\ell = 1, \ldots, d_x$, $\beta_{0\ell}$ is square integrable on $\mathbb{R}\times[0,1]$, i.e., $\beta_{0\ell}(\cdot,\cdot) \in L^2(\mathbb{R}\times[0,1])$;

– $\gamma_0(v) = (\gamma_{01}(v_1), \ldots, \gamma_{0d_v}(v_{d_v}))'$, where $v = (v_1, \ldots, v_{d_v})' \in \mathbb{R}^{d_v}$ and $\gamma_{0\ell}(\cdot) \in L^2(\mathbb{R})$ for $\ell = 1, \ldots, d_v$.

³ We follow Vogt (2012) and Dong and Linton (2017) in using the following definition of a locally stationary process. Definition: The $d\times 1$ dimensional process $\{r_t \mid t = 1, \ldots, T\}$ is locally stationary if for each rescaled time point $u \in [0,1]$ there exists an associated process $\{r_t[u] \mid t = 1, \ldots, T\}$ with the following two properties:

1. $\{r_t[u] \mid t = 1, \ldots, T\}$ is strictly stationary with density $f_u(r)$;

2. It holds that $\|r_t - r_t[u]\|_\nu \le \left(|\tau_t - u| + T^{-1}\right) R_t(u)$ a.s., where $\tau_t = t/T$, $R_t(u)$ is a process of positive variables satisfying $E|R_t(u)|^\rho < C$ for some $\rho \ge 1$ and $C < \infty$ independent of $u$, $t$, and $T$. Moreover, $\|\cdot\|_\nu$ denotes an arbitrary norm on $\mathbb{R}^d$.
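To make the ingredients of model (1.4) concrete, the following is a minimal simulation sketch with $d_x = d_v = 1$. The function name `simulate_panel`, the particular choices of $\beta_0$ and $\gamma_0$, and the i.i.d. Gaussian errors are illustrative assumptions only; they are not taken from the paper, and the errors here are simpler than the dependence structure the paper allows.

```python
import numpy as np

def simulate_panel(N=50, T=100, seed=0):
    """Draw one panel from a toy version of model (1.4) with d_x = d_v = 1."""
    rng = np.random.default_rng(seed)
    tau = np.arange(1, T + 1) / T                     # tau_t = t / T
    v = rng.normal(size=N)                            # time-invariant variable v_i
    f = rng.normal(size=T)                            # stationary factor f_0t
    x = np.cumsum(rng.normal(size=(N, T)), axis=1)    # I(1): x_it = x_i,t-1 + w_it
    # Scale drifts smoothly with tau, a simple locally stationary process.
    r = (0.5 + 0.5 * tau) * rng.normal(size=(N, T))
    beta = np.exp(-r**2) * (1.0 + tau)                # hypothetical beta_0(r, tau)
    gamma = v * np.exp(-v**2 / 2)                     # hypothetical gamma_0(v)
    e = rng.normal(size=(N, T))                       # i.i.d. errors (simplification)
    y = x * beta + gamma[:, None] * f[None, :] + e
    return {"y": y, "x": x, "r": r, "v": v, "f": f, "tau": tau}

panel = simulate_panel()
```

Both hypothetical functions are square integrable on their domains, in line with the function spaces listed above.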

Model (1.4) clearly nests the model (1.3) of Connor et al. (2012) as a special case. Moreover,

when the factor loading variables are not observable, we can assume that the factor part has the

traditional parametric linear structure, i.e., $y_{it} = x_{it}'\beta_0(r_{it}, \tau_t) + f_{0t}'\gamma_i + e_{it}$, which extends the

parametric model of Bai (2009) to a semiparametric setting. Another related paper is Feng et al.

(2017), who consider a varying–coefficient panel data model of the form $y_{it} = x_{it}'\beta_0(z_{it}) + \alpha_i + e_{it}$

without involving a factor structure, in which $z_{it}$ is a vector of discrete covariates and $\alpha_i$ is the

fixed effect.

When the vector function $\beta_0(r_{it}, \tau_t)$ takes a specific structure, we may rewrite model (1.4)

as $y_{it} = x_{1it}'\beta_{01}(r_{it}) + x_{2it}'\beta_{02}(\tau_t) + f_{0t}'\gamma_0(v_i) + e_{it}$, in the spirit of equation (3) in Vogt

(2012). Also, $r_{it}$ can be the same as $v_i$, which gives another special case of the form

$y_{it} = x_{it}'\beta_0(v_i, \tau_t) + f_{0t}'\gamma_0(v_i) + e_{it}$. For instance, it is reasonable to let $r_{it} = v_i = \text{distance from the sea}_i$

be the driving force in economic growth models in view of the history of cities such as London,

New York and Hong Kong; or one can let $r_{it} = v_i = \text{size characteristic}_i$ for financial models

(cf., Connor et al., 2012); and so forth.

In the empirical study of this paper, we specifically use (1.4) to consider the economies of scale for commercial banks in the U.S. More often than not, the literature of production econometrics (e.g., Feng and Serletis, 2008; Feng and Zhang, 2012) is interested in the relationship between

$$\frac{CT}{\eta_J} \quad\text{and}\quad \left(\frac{\eta_1}{\eta_J}, \cdots, \frac{\eta_{J-1}}{\eta_J}, \zeta_1, \cdots, \zeta_K\right), \tag{1.5}$$

where $CT$ represents total costs, $\{\eta_1, \ldots, \eta_J\}$ represent the input prices, and $\{\zeta_1, \ldots, \zeta_K\}$ represent the output prices.⁴ Researchers usually impose a linear parametric relationship on the translog data:

$$\ln\frac{CT}{\eta_J} = \alpha_1 \ln\frac{\eta_1}{\eta_J} + \cdots + \alpha_{J-1}\ln\frac{\eta_{J-1}}{\eta_J} + \psi_1\ln\zeta_1 + \cdots + \psi_K\ln\zeta_K. \tag{1.6}$$

⁴ In the literature, one always needs to divide $CT, \eta_1, \ldots, \eta_{J-1}$ by $\eta_J$ to maintain linear homogeneity with respect to input prices.


Moreover, to capture different marginal effects of the variables over time, the traditional literature

normally includes certain interaction terms between time $t$ (and/or $t^2$) and some

other variables.

Nevertheless, the choice of variables like $t$ or $t^2$ is arbitrary and lacks statistical support,

though these simple forms are relatively easy for modelling and estimation. Note also that, in the

empirical study in Section 4 below, the Augmented Dickey–Fuller (ADF) test suggests that all

the time series associated with $\ln\eta_1, \ldots, \ln\eta_{J-1}$ and $\ln\zeta_1, \ldots, \ln\zeta_K$ are in fact I(1) processes.

Such a feature is barely mentioned in the literature of production economics. Therefore, this

economic example strongly motivates us to study panel data models with nonstationarity and

more flexible functional forms, such as (1.4).

In summary, this paper makes the following contributions to the literature.

• It introduces a new nonlinear panel data model in equation (1.4) associated with both

stochastic and deterministic variables involved in the nonparametrically unknown varying–

coefficients;

• The proposed model incorporates local stationarity in rit and unit–root nonstationarity

in xit for the time series dimension as well as a type of cross–sectional dependence in

the cross–sectional dimension;

• It allows for a factor structure in the regression component to reflect a possible ‘macro–type’

of cross–sectional dependence, in which the loadings may be nonparametrically specified

as unknown functions of observable variables;

• This paper also relaxes the conventional mutual independence assumption between eit and

(xit, rit, f0t, vi) to a type of weak exogeneity among them; and

• This paper establishes a set of new asymptotic properties for the proposed estimators,

including both uniform convergence and central limit theorem for the factor estimator.

The structure of this paper is as follows. Section 2 introduces the necessary assumptions

on the model and its estimation procedure; meanwhile, the asymptotic theories under both

observable and unobservable factor cases are established, respectively. Section 3 uses some

Monte Carlo simulations to examine the theoretical findings of Section 2. In Section 4, we study

the issue of economies of scale of commercial banks in the U.S. In Section 5, some potential

extensions (e.g., how to allow regressors and factors to include both I(1)/I(0) processes) of our

methodology are discussed. Section 6 concludes. Appendix A states the main lemmas, and then

presents the proofs of the main asymptotic results of this paper. All the preliminary lemmas

and the omitted proofs are provided in Appendix B of a supplementary material available from


the authors. In addition, Appendix B includes some extra discussions and numerical studies due

to the space limitation in the main sections of this paper.

Before proceeding further, it is convenient to introduce some notation. $\|\cdot\|$ denotes the Euclidean norm of a vector or the Frobenius norm of a matrix; $\|\cdot\|_{sp}$ denotes the spectral norm of a matrix; a.s. stands for almost surely; the symbol “$\to_P$” denotes convergence in probability; “$\to_D$” denotes convergence in distribution; $\{a_{ij}\}_{m\times n}$ stands for an $m\times n$ matrix with $a_{ij}$ being the element at the $i$th row and $j$th column; $\lfloor a\rfloor$ means the integer part of a real number $a$; for a square matrix $W$, let $\lambda_{\min}(W)$ and $\lambda_{\max}(W)$ stand for the minimum and maximum eigenvalues of $W$ respectively; $M_W = I_T - P_W$ denotes the orthogonal projection matrix generated by matrix $W$, where $P_W = W(W'W)^{-1}W'$, and $W$ is a $T\times q$ matrix with rank $q$; $\mathrm{diag}\{A_1, \ldots, A_k\}$ means constructing a block diagonal matrix from matrices (or scalars) $A_1, \ldots, A_k$; for a matrix $M$, let $\mathrm{vec}(M)$ denote the vectorization operation.

2 Estimation Procedure and Asymptotic Results

In order to recover different unknown functions, the sieve estimation method is adopted in

this paper since its biggest advantage is that it can convert a nonparametric function into an

approximate parametric form. Thus, we first briefly introduce some Hilbert spaces and their

orthonormal systems with the respective function norms.

The function space $L^2(\mathbb{R})$: Let $\{H_j(w) \mid j \ge 0\}$ be the Hermite polynomial system, which is orthogonal with respect to $\exp(-w^2)$. The orthogonality reads $\int H_i(w)H_j(w)\exp(-w^2)\,dw = \sqrt{\pi}\,2^j j!\,\delta_{ij}$, where $\delta_{ij}$ is the Kronecker delta. Define

$$h_j(w) = \frac{1}{\sqrt[4]{\pi}\sqrt{2^j j!}}\,H_j(w)\exp\!\left(-\frac{w^2}{2}\right), \quad j \ge 0.$$

Thus, $\{h_j(w)\}$ is an orthonormal basis in the Hilbert space $L^2(\mathbb{R}) = \{g(w) \mid \int g^2(w)\,dw < \infty\}$. As a result, for any $g(w) \in L^2(\mathbb{R})$ we have an orthogonal series expansion:

$$g(w) = \sum_{j=0}^{\infty} c_j h_j(w) := g_m(w) + \delta_{g,m}(w), \tag{2.1}$$

where $c_j = \int g(w)h_j(w)\,dw$, and we define for $m \ge 1$ the partial sum $g_m(w) := \sum_{j=0}^{m-1} c_j h_j(w)$ and the residue $\delta_{g,m}(w) := \sum_{j=m}^{\infty} c_j h_j(w)$ of the series for later use. For any $g \in L^2(\mathbb{R})$, the norm is defined as $\|g\|_{L^2} = \left\{\int_{\mathbb{R}} g^2(w)\,dw\right\}^{1/2}$ and, by the Parseval equality, $\|g\|_{L^2}^2 = \sum_{j=0}^{\infty} c_j^2$, which implies the attenuation of the coefficients.
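The orthonormality of the Hermite functions can be checked numerically. The sketch below is only an illustration: the helper names `hermite_functions` and `gram_matrix` are hypothetical, and the integral $\int h_i(w)h_j(w)\,dw$ is evaluated by Gauss–Hermite quadrature, which is exact here because $h_i h_j \exp(w^2)$ is a polynomial of low degree.

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermval, hermgauss

def hermite_functions(w, n):
    """Return an (n, len(w)) array whose row j is h_j(w), j = 0, ..., n-1."""
    w = np.atleast_1d(np.asarray(w, dtype=float))
    rows = []
    for j in range(n):
        coef = np.zeros(j + 1)
        coef[j] = 1.0                                   # selects the physicists' H_j
        norm = math.sqrt(math.sqrt(math.pi) * 2.0**j * math.factorial(j))
        rows.append(hermval(w, coef) * np.exp(-w**2 / 2) / norm)
    return np.vstack(rows)

def gram_matrix(n, quad_points=20):
    """Approximate the matrix of integrals of h_i(w) h_j(w) over the real line."""
    nodes, weights = hermgauss(quad_points)             # weight function exp(-w^2)
    H = hermite_functions(nodes, n)
    # Undo the quadrature weight exp(-w^2), which is already inside h_i * h_j.
    return (H * weights * np.exp(nodes**2)) @ H.T

G = gram_matrix(6)   # numerically close to the 6 x 6 identity matrix
```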

The function space $L^2([0,1])$: Let $s_0(u) = 1$ and $s_j(u) = \sqrt{2}\cos(\pi j u)$ with $j \ge 1$. It is easy to see that $\int_0^1 s_i(u)s_j(u)\,du = \delta_{ij}$, so $\{s_j(u)\}$ is an orthonormal basis in the Hilbert space $L^2([0,1]) = \{r(u) \mid \int_0^1 r^2(u)\,du < \infty\}$. Hence, for any $r(u) \in L^2([0,1])$, we have the following orthogonal series expansion:

$$r(u) = \sum_{j=0}^{\infty} c_j s_j(u) := r_m(u) + \delta_{r,m}(u), \tag{2.2}$$

where $c_j = \int_0^1 r(u)s_j(u)\,du$, and we define similarly the partial sum $r_m(u)$ and the residue $\delta_{r,m}(u)$ of the series, respectively. The norm of a function $r(u) \in L^2([0,1])$ is defined as $\|r\|_{L^2} = \left\{\int_0^1 r^2(u)\,du\right\}^{1/2}$, for which the Parseval equality also applies.
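As an illustration of the expansion on $[0,1]$, the sketch below computes the cosine coefficients of the example function $r(u) = u^2$ and evaluates a truncated series. The example function, the helper names and the midpoint-rule integration are assumptions made for this illustration only.

```python
import numpy as np

def cosine_basis(u, m):
    """Rows are s_0(u), ..., s_{m-1}(u): s_0 = 1 and s_j = sqrt(2) cos(pi j u)."""
    u = np.atleast_1d(np.asarray(u, dtype=float))
    j = np.arange(m)[:, None]
    S = np.sqrt(2.0) * np.cos(np.pi * j * u[None, :])
    S[0] = 1.0
    return S

def cosine_coefficients(func, m, grid_size=2000):
    """Approximate c_j, the integral of func(u) s_j(u) over [0, 1], by a midpoint rule."""
    u = (np.arange(grid_size) + 0.5) / grid_size
    return cosine_basis(u, m) @ func(u) / grid_size

def partial_sum(coef, u):
    """Evaluate the truncated series r_m(u) = sum_{j<m} c_j s_j(u)."""
    return coef @ cosine_basis(u, len(coef))

coef = cosine_coefficients(lambda u: u**2, m=20)
grid = np.linspace(0.0, 1.0, 201)
max_err = np.max(np.abs(partial_sum(coef, grid) - grid**2))
```

For this example the coefficients decay like $j^{-2}$, so the sum of squared coefficients approaches $\int_0^1 u^4\,du = 1/5$, as the Parseval equality requires.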

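The function space $L^2(\mathbb{R}\times[0,1])$: This function space can be viewed as a tensor product of the first two. By definition, $L^2(\mathbb{R}\times[0,1]) = \{\beta(w,u) \mid \iint_{\mathbb{R}\times[0,1]} \beta^2(w,u)\,dw\,du < \infty\}$, in which the norm of functions is given by $\|\beta\|_{L^2} = \left\{\iint_{\mathbb{R}\times[0,1]} \beta^2(w,u)\,dw\,du\right\}^{1/2}$. It is known that the tensor product of $\{h_j(w)\}$ and $\{s_j(u)\}$ is an orthonormal basis in $L^2(\mathbb{R}\times[0,1])$. That is,

$$\iint_{\mathbb{R}\times[0,1]} h_{j_1}(w)s_{j_2}(u)h_{j_1^*}(w)s_{j_2^*}(u)\,dw\,du = \begin{cases} 1, & \text{if } (j_1, j_2) = (j_1^*, j_2^*),\\ 0, & \text{otherwise.}\end{cases} \tag{2.3}$$

Thus, for any $\beta(w,u) \in L^2(\mathbb{R}\times[0,1])$ we have the following orthogonal series expansion:

$$\beta(w,u) = \sum_{j_1=0}^{\infty}\sum_{j_2=0}^{\infty} c_{j_1j_2}h_{j_1}(w)s_{j_2}(u),$$

where $c_{j_1j_2} = \iint_{\mathbb{R}\times[0,1]} \beta(w,u)h_{j_1}(w)s_{j_2}(u)\,dw\,du$.

By the above description, we then have $\beta_{0\ell}(w,u) = \sum_{j_1=0}^{\infty}\sum_{j_2=0}^{\infty} c_{0\ell,j_1j_2}h_{j_1}(w)s_{j_2}(u)$ for each element of $\beta_0(\cdot,\cdot)$, where $1 \le \ell \le d_x$ and $c_{0\ell,j_1j_2} = \iint_{\mathbb{R}\times[0,1]} \beta_{0\ell}(w,u)h_{j_1}(w)s_{j_2}(u)\,dw\,du$. Similarly, for $m_1, m_2 \ge 1$ define the partial sum and residue such that

$$\begin{aligned}\beta_{0\ell}(w,u) &= \sum_{j_1=0}^{m_1-1}\sum_{j_2=0}^{m_2-1} c_{0\ell,j_1j_2}h_{j_1}(w)s_{j_2}(u) + \delta_{\beta_{0\ell},m}(w,u)\\ &:= \sum_{j=1}^{m} c_{0\ell,j}b_j(w,u) + \delta_{\beta_{0\ell},m}(w,u),\end{aligned} \tag{2.4}$$

where $b_j(w,u)$ represents the corresponding product $h_{j_1}(w)s_{j_2}(u)$ arranged in a suitable order with respect to the indices $(j_1, j_2)$, and $m = m_1m_2$. Without loss of generality, truncating the expansions of all the elements of $\beta_0(\cdot,\cdot)$ by the same pair of $(m_1, m_2)$ allows us to further write

$$\beta_0(w,u) := \beta_{0,m}(w,u) + \Delta_{\beta_0,m}(w,u), \tag{2.5}$$

where $\beta_{0,m}(w,u) = C_{\beta_0}B_m(w,u)$, $B_m(w,u) = (b_1(w,u), \ldots, b_m(w,u))'$, and the coefficient matrix $C_{\beta_0}$ and the truncation residual vector $\Delta_{\beta_0,m}(w,u)$ are defined conformably.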
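The tensor-product basis $B_m(w,u)$ and the truncated function $C_\beta B_m(w,u)$ can be assembled as follows. This is a sketch under one particular ordering of the index pairs ($j_2$ varying fastest); the text only requires "a suitable order", so this ordering and the helper names are assumptions.

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermval

def hermite_functions(w, n):
    """Row j is the Hermite function h_j evaluated at the points w."""
    w = np.atleast_1d(np.asarray(w, dtype=float))
    rows = []
    for j in range(n):
        coef = np.zeros(j + 1)
        coef[j] = 1.0
        norm = math.sqrt(math.sqrt(math.pi) * 2.0**j * math.factorial(j))
        rows.append(hermval(w, coef) * np.exp(-w**2 / 2) / norm)
    return np.vstack(rows)

def cosine_basis(u, m):
    """Row j is s_j evaluated at the points u."""
    u = np.atleast_1d(np.asarray(u, dtype=float))
    S = np.sqrt(2.0) * np.cos(np.pi * np.arange(m)[:, None] * u[None, :])
    S[0] = 1.0
    return S

def tensor_basis(w, u, m1, m2):
    """Return B_m at paired points (w_k, u_k); row j1 * m2 + j2 is h_{j1}(w) s_{j2}(u)."""
    Hw = hermite_functions(w, m1)                       # (m1, P)
    Su = cosine_basis(u, m2)                            # (m2, P)
    return (Hw[:, None, :] * Su[None, :, :]).reshape(m1 * m2, -1)

def beta_m(C, w, u, m1, m2):
    """Evaluate C B_m(w, u) for a d_x x (m1 * m2) coefficient matrix C."""
    return C @ tensor_basis(w, u, m1, m2)

w = np.array([0.3, -1.2])
u = np.array([0.25, 0.8])
B = tensor_basis(w, u, m1=3, m2=4)                      # shape (12, 2)
```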

Likewise, we can expand $\gamma_0(v)$ as

$$\gamma_0(v) := \gamma_{0,n}(v) + \Delta_{\gamma_0,n}(v), \tag{2.6}$$

where $\gamma_{0,n}(v) = \mathcal{H}_n'(v)C_{\gamma_0}$, $\mathcal{H}_n(v) = \mathrm{diag}\{H_n(v_1), \ldots, H_n(v_{d_v})\}$, $H_n(\cdot) = (h_0(\cdot), \ldots, h_{n-1}(\cdot))'$, and $C_{\gamma_0}$ and $\Delta_{\gamma_0,n}(v)$ are defined conformably.

All these orthogonal series forms for $\beta_0(\cdot,\cdot)$ and $\gamma_0(\cdot)$ in (2.5) and (2.6) are used later in the estimation of these unknown functions.

Remark 2.1. Note that the convergence of all orthogonal series expansions, for example, (2.1)

and (2.2), in the Hilbert spaces aforementioned can be pointwise or in the sense of norm under

certain conditions (e.g., Assumption 2.2 of this paper). We omit these details for the time being

in order not to deviate from our main goal.

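Moreover, for any vector of functions $G(\cdot) = (g_1(\cdot), \ldots, g_{d_G}(\cdot))'$ we define its norm as

$\|G\|_{L^2} = \left\{\sum_{\ell=1}^{d_G} \|g_\ell\|_{L^2}^2\right\}^{1/2}$, where $g_\ell(\cdot)$ can be in either $L^2(\mathbb{R})$ or $L^2([0,1])$ or $L^2(\mathbb{R}\times[0,1])$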

for ` = 1, . . . , dG. When there is no misunderstanding, we use ‖ · ‖L2 throughout this paper

without mentioning the particular function space.

Now, we state assumptions that are used to establish an asymptotic theory for our proposed

estimators. We would like to point out that the relevant discussions and justifications of all the

assumptions of this paper will not be mentioned in the main text for better presentation and

conciseness, but are provided at the beginning of Appendix A below.

Assumption 1.

1. Let $\{\varepsilon_{ij} \mid i \in \mathbb{Z}^+, j \in \mathbb{Z}\}$ be an array of $d_x$–dimensional independent and identically distributed (i.i.d.) random variables over $i$ and $j$. Moreover, $E[\varepsilon_{11}] = 0$, $E[\varepsilon_{11}\varepsilon_{11}'] = I_{d_x}$, and $E\|\varepsilon_{11}\|^q < \infty$ for some $q > 4$. In addition, the characteristic function of $\varepsilon_{11}$ is integrable.

2. For each $i \ge 1$, let $x_{it} = x_{i,t-1} + w_{it}$, where $\max_{i\ge1}\|x_{i0}\| = O_P(1)$, and $w_{it}$ is a linear process given by $w_{it} = \sum_{j=0}^{\infty} D_{ij}\varepsilon_{i,t-j}$. In addition, $\{D_{ij} \mid i \in \mathbb{Z}^+, j \in \mathbb{Z}\}$ is a sequence of deterministic matrices such that (1) $D_{i0} = I_{d_x}$, (2) $\max_{i\ge1}\sum_{j=0}^{\infty} j\|D_{ij}\| < \infty$, and (3) $D_i := \sum_{j=0}^{\infty} D_{ij}$ is of full rank uniformly in $i$.

3. $\{r_{i1}, \ldots, r_{iT}\}$ is locally stationary with an associated process $\{r_{i1}[u], \ldots, r_{iT}[u]\}$ for each $i \ge 1$. Moreover, for any $u \in [0,1]$, let $\{r_{i1}[u], \ldots, r_{iT}[u]\}$ be identically distributed across $i$.

4. Denote $\mathcal{X}_{it} = \{\ldots, \varepsilon_{i,t-1}, \varepsilon_{it};\ r_{i1}, \ldots, r_{it};\ v_i;\ f_{01}, \ldots, f_{0t}\}$.

(a) Let $\{e_{i1}, \ldots, e_{iT}\}$ be identically distributed across $i$, and let $\{e_t = (e_{1t}, \ldots, e_{Nt})' \mid t \ge 1\}$ be strictly stationary and $\alpha$–mixing with $E[e_{it} \mid \mathcal{X}_{it}] = 0$ almost surely (a.s.).

(b) Conditioning on $\mathcal{X}_{it}$ and $\mathcal{X}_{js}$, let $\alpha_{ij}(|t-s|)$ denote the mixing coefficient between $e_{it}$ and $e_{js}$, such that for some $\delta > 0$, $\sum_{i,j=1}^{N}\sum_{t,s=1}^{T} |\alpha_{ij}(|t-s|)|^{\delta/(4+\delta)} = O(NT)$ a.s. Moreover, $\max_{i,j,t,s} E\left[|e_{it}|^{2+\delta/2} \mid \mathcal{X}_{it}, \mathcal{X}_{js}\right] < \infty$ a.s.

(c) For $t \ge 1$, suppose that $E[e_{it}e_{jt} \mid \mathcal{X}_{iT}, \mathcal{X}_{jT}] = \sigma_{ij}$ a.s. In addition, suppose that $\max\left\{\sum_{i\ne j}\sigma_{ij}^2,\ \sum_{i\ne j}|\sigma_{ij}|\right\} = O(N)$ and $\sum_{i\ne j}\sum_{t\ne s} E[(e_{it}e_{jt} - \sigma_{ij})(e_{is}e_{js} - \sigma_{ij})] = O(TN^2)$.

5. (a) $\left\|\frac{1}{T}F_0'F_0 - \Sigma_f\right\| = O_P\left(\frac{1}{\sqrt{T}}\right)$, where $F_0 = (f_{01}, \ldots, f_{0T})'$ is $T\times d_v$, and $\Sigma_f$ is a deterministic and positive definite matrix of $d_v\times d_v$. Moreover, $\max_t E\|f_{0t}\|^4 < \infty$.

(b) $\left\|\frac{1}{N}\Gamma_0'\Gamma_0 - \Sigma_\gamma\right\| = O_P\left(\frac{1}{\sqrt{N}}\right)$, where $\Gamma_0 = (\gamma_0(v_1), \ldots, \gamma_0(v_N))'$ is $N\times d_v$, and $\Sigma_\gamma$ is a deterministic and positive definite matrix of $d_v\times d_v$.

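Assumption 2.

1. Suppose that there exist deterministic matrices $\Sigma_{1mm}$, $\Sigma_{12m}$, $\Sigma_{12mn}$ and $\Sigma_{2nn}$ such that

(a) $\left\|\frac{1}{NT}\sum_{i,t}[B_{m,it}B_{m,it}'] \otimes \int_{\tau_t}^{\tau_{t+1}} W_i(w)W_i'(w)\,dw - \Sigma_{1mm}\right\| \to_P 0$,

(b) $\left\|\frac{1}{NT}\sum_{i,t}\left\{B_{m,it} \otimes \int_{\tau_t}^{\tau_{t+1}} W_i(w)\,dw\right\} f_{0t}'\left(I_{d_v}, \mathcal{H}_n'(v_i)\right) - (\Sigma_{12m}, \Sigma_{12mn})\right\| \to_P 0$,

(c) $\left\|\frac{1}{N}\sum_{i=1}^{N} \mathcal{H}_n(v_i)\Sigma_f\mathcal{H}_n'(v_i) - \Sigma_{2nn}\right\| \to_P 0$,

where $B_{m,it} = B_m(r_{it}[\tau_t], \tau_t)$ with the $m$–dimensional vector $B_m(\cdot,\cdot)$ being defined in (2.5), and $W_i(w)$ stands for a $d_x$–dimensional Brownian motion with a covariance matrix $D_iD_i'$ defined in Assumption 1. Moreover, suppose that

$$0 < A_1 \le \min\{\lambda_{\min}(\Sigma_*), \lambda_{\min}(\Sigma_\dagger)\} \le \max\{\lambda_{\max}(\Sigma_*), \lambda_{\max}(\Sigma_\dagger)\} < A_2 < \infty$$

uniformly in $(m, n)$, where

$$\Sigma_* = \begin{pmatrix}\Sigma_{1mm} & \Sigma_{12mn}\\ \Sigma_{12mn}' & \Sigma_{2nn}\end{pmatrix} \quad\text{and}\quad \Sigma_\dagger = \Sigma_{1mm} - \Sigma_{12m}\Sigma_f^{-1}\Sigma_{12m}'.$$

2. (a) $\max_{1\le\ell\le d_v}\sum_{j=n}^{\infty}\left|\int_{\mathbb{R}} \gamma_{0\ell}(w)h_j(w)\,dw\right| = O\left(n^{-\mu_1/2}\right)$ for some $\mu_1 > 0$;

(b) $\max_{1\le\ell\le d_x}\sum_{j=m+1}^{\infty}\left|\iint_{\mathbb{R}\times[0,1]} \beta_{0\ell}(w,u)b_j(w,u)\,dw\,du\right| = O\left(m^{-\mu_2/2}\right)$ for some $\mu_2 > 1$, where $b_j(\cdot,\cdot)$ is defined in (2.4).

3. $\frac{m^2}{\min\{N,T\}} \to 0$, $\frac{n^2}{T} \to 0$ and $\frac{T}{m^{\mu_2}} \to 0$.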

We are now ready to propose our estimation procedure, and then establish the corresponding

asymptotic results.


2.1 The Case with Observable Factors

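We start with the case where the $f_{0t}$'s are observable. According to (2.5) and (2.6), the model (1.4) can be rewritten as

$$y_{it} = [B_m'(r_{it}, \tau_t) \otimes x_{it}']\,\mathrm{vec}(C_{\beta_0}) + f_{0t}'\mathcal{H}_n'(v_i)C_{\gamma_0} + \omega_{it}, \tag{2.7}$$

where $\omega_{it} = x_{it}'\Delta_{\beta_0,m}(r_{it}, \tau_t) + f_{0t}'\Delta_{\gamma_0,n}(v_i) + e_{it}$.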

Remark 2.2. It is worth commenting on the truncation residual $x_{it}'\Delta_{\beta_0,m}(r_{it}, \tau_t)$ here. Since $x_{it}$ is an I(1) process, the rate at which $x_{it}'\Delta_{\beta_0,m}(r_{it}, \tau_t)$ converges to 0 is affected by the following three factors: (1) the smoothness of each element of $\beta_0(\cdot,\cdot)$, (2) the truncation parameter $m$, and (3) the rate at which $x_{it}$ diverges. The first two factors are well documented in the literature. For the third one, as $E\|x_{it}\|^2 = O(t)$, the rate at which $x_{it}'\Delta_{\beta_0,m}(r_{it}, \tau_t)$ converges to 0 in fact becomes slower as $t$ diverges.
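A short calculation illustrates the third factor in a special case, assumed here only for illustration: $x_{i0} = 0$ and $w_{it} = \varepsilon_{it}$, i.e., $D_{ij} = 0$ for $j \ge 1$ in Assumption 1. Since the $\varepsilon_{is}$ are independent with mean zero and identity covariance, the cross terms vanish and

$$E\|x_{it}\|^2 = E\Big\|\sum_{s=1}^{t}\varepsilon_{is}\Big\|^2 = \sum_{s=1}^{t} E\|\varepsilon_{is}\|^2 = t\,\mathrm{tr}(I_{d_x}) = t\,d_x.$$

Heuristically, $\|x_{it}\|$ is therefore of order $\sqrt{t} \le \sqrt{T}$, so a truncation error of order $m^{-\mu_2/2}$ in $\Delta_{\beta_0,m}$ (the decay rate in Assumption 2.2) is inflated to order $T^{1/2}m^{-\mu_2/2}$ once it is multiplied by $x_{it}$.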

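By virtue of (2.7), we have an explicit expression for the estimate of $(\mathrm{vec}(C_{\beta_0})', C_{\gamma_0}')'$ from the ordinary least squares (OLS) method:

$$(\mathrm{vec}(\widehat{C}_\beta)', \widehat{C}_\gamma')' = \left[\sum_{i=1}^{N}\sum_{t=1}^{T}\Theta_{it}\Theta_{it}'\right]^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}\Theta_{it}y_{it}, \tag{2.8}$$

where $\Theta_{it} = (z_{it}', f_{0t}'\mathcal{H}_n'(v_i))'$ and $z_{it} = B_m(r_{it}, \tau_t) \otimes x_{it}$. Correspondingly, the estimators of $\beta_0(w,u)$ and $\gamma_0(v)$ are defined by $\widehat{\beta}_m(w,u) = \widehat{C}_\beta B_m(w,u)$ and $\widehat{\gamma}_n(v) = \mathcal{H}_n'(v)\widehat{C}_\gamma$, respectively.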
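The least squares step in (2.8) can be sketched in a few lines for the scalar case $d_x = d_v = 1$, in which $\mathrm{vec}(C_\beta)$ is simply a vector of length $m = m_1m_2$. The function names, the ordering of the tensor basis and the use of `numpy.linalg.lstsq` in place of an explicit matrix inverse are choices made for this illustration, not the authors' implementation.

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermval

def hermite_functions(w, n):
    """Stack h_0(w), ..., h_{n-1}(w) along a new leading axis."""
    w = np.asarray(w, dtype=float)
    out = np.empty((n,) + w.shape)
    for j in range(n):
        coef = np.zeros(j + 1)
        coef[j] = 1.0
        norm = math.sqrt(math.sqrt(math.pi) * 2.0**j * math.factorial(j))
        out[j] = hermval(w, coef) * np.exp(-w**2 / 2) / norm
    return out

def cosine_basis(u, m):
    """Stack s_0(u), ..., s_{m-1}(u) along a new leading axis."""
    u = np.asarray(u, dtype=float)
    out = np.stack([np.sqrt(2.0) * np.cos(np.pi * j * u) for j in range(m)])
    out[0] = 1.0
    return out

def sieve_design(x, r, v, f, m1, m2, n):
    """Rows are Theta_it' = (z_it', f_t H_n(v_i)') with observations ordered (i, t)."""
    N, T = x.shape
    tau = np.broadcast_to(np.arange(1, T + 1) / T, (N, T))
    Hr = hermite_functions(r, m1)                        # (m1, N, T)
    St = cosine_basis(tau, m2)                           # (m2, N, T)
    B = (Hr[:, None] * St[None, :]).reshape(m1 * m2, N, T)
    Z = (B * x).reshape(m1 * m2, N * T).T                # z_it = B_m(r_it, tau_t) x_it
    Hv = hermite_functions(v, n)                         # (n, N)
    G = (Hv[:, :, None] * f[None, None, :]).reshape(n, N * T).T
    return np.hstack([Z, G])

def sieve_ols(y, x, r, v, f, m1, m2, n):
    """Return the coefficient estimates (C_beta, C_gamma) of (2.8)."""
    Theta = sieve_design(x, r, v, f, m1, m2, n)
    coef, *_ = np.linalg.lstsq(Theta, y.reshape(-1), rcond=None)
    m = m1 * m2
    return coef[:m], coef[m:]
```

Calling `sieve_ols(y, x, r, v, f, m1, m2, n)` with `(N, T)` arrays `y`, `x`, `r`, a length-`N` vector `v` and a length-`T` factor `f` returns the two coefficient vectors; the fitted functions are then obtained by multiplying them with the corresponding basis vectors.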

Based on Assumptions 1 and 2 and in view of Lemma A.1, we first establish the uniform

convergence of the estimator defined by (2.8).

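Theorem 2.1. Let Assumptions 1 and 2 hold. As $(N, T) \to (\infty, \infty)$,

1. $\sup_{(w,u)\in\mathbb{R}\times[0,1]} \|\widehat{\beta}_m(w,u) - \beta_0(w,u)\| = O_P\left(mN^{-1/2}T^{-1}\right) + O_P\left(m^{1/2}\kappa\right)$,

2. $\sup_{v\in\mathbb{R}^{d_v}} \|\widehat{\gamma}_n(v) - \gamma_0(v)\| = O_P\left(n(NT)^{-1/2}\right) + O_P\left(n^{1/2}\kappa\right)$,

where $\kappa = \max\left\{T^{1/2}m^{-\mu_2/2},\ n^{-\mu_1/2}\right\}$, and $\mu_1$ and $\mu_2$ are defined in Assumption 2.2.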

Due to some properties of the Hermite functions (cf., Lemma B1 of the supplementary file),

we can achieve the rates of convergence for sup norm in Theorem 2.1 without restricting both

β0(·, ·) and γ0(·) to compact sets (cf., Newey, 1997). Note that both results of Theorem 2.1 share

the same rate (i.e., κ) generated by two truncation residuals of (2.7), which is consistent with

the development given for the term A1n of Dong and Linton (2017, pp. 36–37). This is primarily

due to the fact that the two truncation residuals $x_{it}'\Delta_{\beta_0,m}(r_{it}, \tau_t)$ and $f_{0t}'\Delta_{\gamma_0,n}(v_i)$ in (2.7) exist

in an additive form (i.e., not separable when implementing (2.8)). Moreover, as explained in

Remark 2.2, the term $x_{it}'\Delta_{\beta_0,m}(r_{it}, \tau_t)$ will yield a rate of $T^{1/2}m^{-\mu_2/2}$, which is slower than the rate

associated with the stationary cases.

The asymptotic normality of both βm(w, u) and γn(v) is established in the next theorem.

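Theorem 2.2. Let Assumptions 1 and 2 hold. Suppose further that $N^{1/2}Tm^{-1/2}\kappa \to 0$ and for any $(w, u, v) \in \mathbb{R}\times[0,1]\times\mathbb{R}^{d_v}$,

$$D_1 \cdot \mathrm{diag}\left\{[B_m'(w,u) \otimes I_{d_x}],\ \mathcal{H}_n'(v)\right\}\Sigma_*^{-1}D_2\sum_{i=1}^{N}\sum_{t=1}^{T}\Theta_{it}e_{it} \to_D N(0, \Sigma), \tag{2.9}$$

where $D_1 = \mathrm{diag}\left\{m^{-1/2}I_{d_x},\ n^{-1/2}I_{d_v}\right\}$, $D_2 = \mathrm{diag}\left\{N^{-1/2}T^{-1}I_{md_x},\ (NT)^{-1/2}I_{nd_v}\right\}$, and $\kappa$ and $\Sigma_*$ are defined in Theorem 2.1 and Assumption 2.1 respectively. Then, for any $(w, u, v) \in \mathbb{R}\times[0,1]\times\mathbb{R}^{d_v}$, as $(N, T) \to (\infty, \infty)$,

$$D_0\left\{(\widehat{\beta}_m(w,u)', \widehat{\gamma}_n(v)')' - (\beta_0(w,u)', \gamma_0(v)')'\right\} \to_D N(0, \Sigma),$$

where $D_0 = \mathrm{diag}\left\{N^{1/2}Tm^{-1/2}I_{d_x},\ (NT)^{1/2}n^{-1/2}I_{d_v}\right\}$.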

The extra condition $N^{1/2}Tm^{-1/2}\kappa \to 0$ of Theorem 2.2 ensures that the truncation residuals

of (2.7) are asymptotically negligible. The assumption (2.9) in this theorem can be verified using a

procedure similar to Lemma A.1 of Chen et al. (2012a). However, it will lead to a quite lengthy

derivation. For the sake of conciseness, we do not further establish this assumption from some

preliminary conditions in order not to deviate from our main goal.

In view of the factor structure of $f_{0t}'\gamma_0(v_i)$, we can in fact improve the rate of convergence of the estimator related to $\beta_0$ slightly. To do so, we first denote

$$\phi_i[\beta] := (x_{i1}'\beta(r_{i1}, \tau_1), \ldots, x_{iT}'\beta(r_{iT}, \tau_T))' \tag{2.10}$$

for any $\beta(\cdot,\cdot) = (\beta_1(\cdot,\cdot), \ldots, \beta_{d_x}(\cdot,\cdot))'$, and let $Z_i = (z_{i1}, \ldots, z_{iT})'$, where $z_{it}$ has been defined under (2.8). Thus, (1.4) can be written in the following matrix notation:

$$\begin{aligned} Y_i &= \phi_i[\beta_{0,m}] + F_0\gamma_0(v_i) + \phi_i[\Delta_{\beta_0,m}] + e_i\\ &= Z_i\,\mathrm{vec}(C_{\beta_0}) + F_0\gamma_0(v_i) + \phi_i[\Delta_{\beta_0,m}] + e_i,\end{aligned} \tag{2.11}$$

where $Y_i$ and $e_i$ are defined conformably. Since $F_0$ is observable, we can concentrate out the factor structure using an orthogonal projection matrix as follows:

$$M_{F_0}Y_i = M_{F_0}Z_i\,\mathrm{vec}(C_{\beta_0}) + M_{F_0}\phi_i[\Delta_{\beta_0,m}] + M_{F_0}e_i, \tag{2.12}$$

so that the terms associated with $\gamma_0$ get removed completely, which implies that the term $T^{1/2}m^{-\mu_2/2}$ in the $\kappa$ of Theorem 2.1 will not exist in the rate of convergence any more. Consequently, an improved estimator of $\mathrm{vec}(C_{\beta_0})$ is defined by

$$\mathrm{vec}(\widetilde{C}_\beta) = \left[\sum_{i=1}^{N} Z_i'M_{F_0}Z_i\right]^{-1}\sum_{i=1}^{N} Z_i'M_{F_0}Y_i. \tag{2.13}$$

We thereby can define $\widetilde{\beta}_m(w,u) = \widetilde{C}_\beta B_m(w,u)$ as an estimator for $\beta_0(w,u)$. The corresponding asymptotic results are summarized in Corollary A.1 of Appendix A of this paper. However, this potential improvement is not applicable to the rates related to $\gamma_0$.
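The projection step behind (2.12) and (2.13) is easy to check numerically: because $M_{F_0}F_0 = 0$, the factor component drops out whatever the loadings are. The sketch below uses hypothetical names (`annihilator`, `projected_ols`) and stores the $Z_i$ in a three-dimensional array; it is an illustration under these conventions rather than the authors' code.

```python
import numpy as np

def annihilator(F):
    """M_F = I_T - F (F'F)^{-1} F' for a T x d matrix F of full column rank."""
    F = np.asarray(F, dtype=float)
    if F.ndim == 1:
        F = F[:, None]
    T = F.shape[0]
    return np.eye(T) - F @ np.linalg.solve(F.T @ F, F.T)

def projected_ols(Y, Z, F):
    """Estimator of the form (2.13).

    Y : (N, T) array of responses, row i is Y_i'
    Z : (N, T, p) array, Z[i] is the T x p matrix Z_i
    F : (T, d) matrix of observed factors
    """
    M = annihilator(F)
    A = np.einsum("itp,ts,isq->pq", Z, M, Z)    # sum_i Z_i' M_F Z_i
    b = np.einsum("itp,ts,is->p", Z, M, Y)      # sum_i Z_i' M_F Y_i
    return np.linalg.solve(A, b)
```

With noiseless data of the form $Y_i = Z_ic + F_0\gamma_i$, `projected_ols` returns `c` up to rounding error for any loadings $\gamma_i$, which is the sense in which the terms associated with $\gamma_0$ are removed.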

Before moving on to the next subsection, we point out one important fact, which is, regardless

of the availability of f0t’s, we can always recover β0(·, ·). This fact will be used to establish

consistent estimators for the case of unobservable factors below.

Denote the estimator of $C_{\beta_0}$ by

$$\mathrm{vec}(\bar{C}_\beta) = \left[\sum_{i=1}^{N}\sum_{t=1}^{T} z_{it}z_{it}'\right]^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T} z_{it}y_{it}, \tag{2.14}$$

where $z_{it}$ is defined under (2.8), and the interaction effects (i.e., $f_{0t}'\gamma_0(v_i)$) are treated as one part of the error term. Accordingly, we may define $\bar{\beta}_m(w,u) = \bar{C}_\beta B_m(w,u)$ as an estimator of $\beta_0(w,u)$. It is not difficult to derive the rates of convergence of $\bar{C}_\beta$ and $\bar{\beta}_m(w,u)$ in view of the proof for Lemma A.1, and we present the results in Corollary A.2 of Appendix A.

2.2 The Case with Unobservable Factors

In this subsection, we consider the case where the $f_{0t}$'s are unobservable. By (2.14) and by virtue of Corollary A.2, it is reasonable to use the following restriction to narrow down the set that $C_{\beta_0}$ belongs to:

$$B_T := \left\{C \mid \|C - \bar{C}_\beta\| \le T^{-1/2}\ln T\right\}, \tag{2.15}$$

where $C$ is $d_x\times m$ and is defined conformably with $\bar{C}_\beta$. It is easy to see that $C_{\beta_0}$ falls in the set $B_T$ with probability approaching 1. The set $B_T$ serves as a normalizer below, and allows us to avoid the difficulty that the I(1) process $\{x_{i1}, \ldots, x_{iT}\}$ and the stationary process $\{f_{01}, \ldots, f_{0T}\}$ require different normalizers when deriving asymptotic properties.

Remark 2.3. It is worthwhile to mention that we can in fact impose a sharper restriction on the set $B_T$ of (2.15) as follows:

$$B_T := \left\{C \mid \|C - \bar{C}_\beta\| \le \alpha_0T^{-1/2}\right\},$$

where $\alpha_0$ is a sufficiently large constant. It also ensures that $C_{\beta_0}$ falls in the set $B_T$ with probability approaching one, and further allows us to drop the condition Assumption 3.2.(1) below. This type of technique is well documented in the literature (cf., the arguments under Fan and Li (2001, eq. A.2) and Wang and Xia (2009, eq. A3)). All the proofs will go through with very minor modifications. To better demonstrate that $B_T$ indeed makes sense, we blow it up by $\ln T$.

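Recall that by (2.10), the model (1.4) can be expressed in matrix notation as $Y_i = \phi_i[\beta_0] + F_0\gamma_0(v_i) + e_i$. Left–multiplying both sides by $M_{F_0}$ gives $M_{F_0}Y_i = M_{F_0}\phi_i[\beta_0] + M_{F_0}e_i$. Then the objective function is intuitively defined by

$$Q_{NT}(C_\beta, F) = \frac{1}{NT}\sum_{i=1}^{N}(Y_i - \phi_i[\beta_m])'M_F(Y_i - \phi_i[\beta_m]), \tag{2.16}$$

where $\beta_m(\cdot,\cdot) = C_\beta B_m(\cdot,\cdot)$ with $C_\beta$ being a $d_x\times m$ matrix, and $F$ is restricted to the set

$$D_F = \left\{F \mid \frac{1}{T}F'F = I_{d_v}\right\}.$$

Therefore, the estimators of $(C_{\beta_0}, F_0)$ are obtained by

$$(\widehat{C}_\beta, \widehat{F}) = \operatorname*{argmin}_{(C_\beta, F)\in B_T\times D_F} Q_{NT}(C_\beta, F). \tag{2.17}$$

Accordingly, the estimator of $\beta_0(\cdot,\cdot)$ is defined as $\widehat{\beta}_m(\cdot,\cdot) = \widehat{C}_\beta B_m(\cdot,\cdot)$.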

Remark 2.4.

1. It is easy to see that if we knew $F_0$ and set $F = F_0$ in (2.16), then (2.17) would yield an explicit solution for $C_\beta$, which is exactly the estimator of $C_{\beta_0}$ given in (2.13).

2. Numerically, we just need to implement an iterative procedure to obtain $\widehat{C}_\beta$ and $\widehat{F}$. We omit the description of the iterative procedure here and refer interested readers to Jiang et al. (2017), where the numerical algorithm for linear panel data models with interactive fixed effects has been studied carefully. Note further that, in order to start the iteration, we can always use (2.14) as an initial estimate in practice, which is exactly what we do in the simulation and empirical studies below.
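The kind of iteration referred to in item 2 can be sketched as follows. This is only a minimal illustration of a generic alternating least squares and principal components scheme in the spirit of Bai (2009); it is not the algorithm of Jiang et al. (2017). It assumes that the sieve regressors have been stacked so that $\phi_i[\beta_m] = Z_i c$ with $c$ the vectorised coefficient matrix, and it ignores the restriction to $B_T$. All function and variable names are hypothetical.

```python
import numpy as np

def annihilator(F):
    """Return M_F = I_T - F (F'F)^{-1} F' for a T x d_v matrix F."""
    T = F.shape[0]
    return np.eye(T) - F @ np.linalg.solve(F.T @ F, F.T)

def objective(Y, Z, c, F):
    """Q_NT(c, F) = (NT)^{-1} sum_i (Y_i - Z_i c)' M_F (Y_i - Z_i c)."""
    N, T = Y.shape
    R = Y - Z @ c                      # (N, T); row i is Y_i - Z_i c
    return np.einsum("it,ts,is->", R, annihilator(F), R) / (N * T)

def update_c(Y, Z, F):
    """Least squares step for c with F held fixed."""
    M = annihilator(F)
    A = np.einsum("itk,ts,isl->kl", Z, M, Z)   # sum_i Z_i' M_F Z_i
    b = np.einsum("itk,ts,is->k", Z, M, Y)     # sum_i Z_i' M_F Y_i
    return np.linalg.solve(A, b)

def update_F(Y, Z, c, d_v):
    """sqrt(T) times the top d_v eigenvectors of sum_i R_i R_i'."""
    T = Y.shape[1]
    R = Y - Z @ c
    _, vecs = np.linalg.eigh(R.T @ R)   # eigenvalues in ascending order
    return np.sqrt(T) * vecs[:, -d_v:]  # satisfies F'F / T = I

def alternate(Y, Z, c0, d_v, n_iter=50):
    """Alternate the two steps, starting from an initial estimate c0."""
    c = np.asarray(c0, dtype=float)
    for _ in range(n_iter):
        F = update_F(Y, Z, c, d_v)
        c = update_c(Y, Z, F)
    return c, F
```

Here `Y` is an N x T array, `Z` is an N x T x p array and `c0` plays the role of the initial estimate. Each step minimises the objective in one argument with the other held fixed, so the objective cannot increase along the iterations; convergence to the global minimiser is not guaranteed by this sketch.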

As f0t’s are unobservable in this subsection, we need to introduce the next assumption in

order to establish consistency.

Assumption 3.

1. Let $\inf_{F\in D_F} \lambda_{\min}(\Omega_\dagger(F)) \ge A_3 > 0$ uniformly in $(m, N, T)$, where

$$\Omega_\dagger(F) = \Omega_1(F) - \Omega_2'(F)\left[\frac{1}{NT}(\Gamma_0'\Gamma_0)\otimes I_T\right]^{-1}\Omega_2(F),$$

$$\Omega_1(F) = \frac{1}{NT^2}\sum_{i=1}^{N} Z_i' M_F Z_i, \qquad \Omega_2(F) = \frac{1}{NT^{3/2}}\sum_{i=1}^{N} \gamma_0(v_i)\otimes(M_F Z_i),$$

in which $Z_i$ is defined below equation (2.10).

2. As $(N, T) \to (\infty, \infty)$, the following holds: (1) $\frac{T\ln T}{m^{\mu_2}} \to 0$ and $\frac{\ln T}{\sqrt[4]{N}} \to 0$; (2) $\frac{N}{T} \to A_4$ with $0 \le A_4 < \infty$.

With Assumption 3 in hand, we summarize the consistency by the next lemma.

Lemma 2.1. Let Assumptions 1, 2, 3.1 and 3.2.(1) hold. As $(N, T) \to (\infty, \infty)$,

1. $\|\widehat{\beta}_m - \beta_0\|_{L^2} = o_P\left(\frac{1}{\sqrt{T}}\right)$,

2. $\|P_{\widehat{F}} - P_{F_0}\| = o_P(1)$.

It is worth emphasizing that the extra restriction given in Assumption 3.2.(2) is not needed for consistency. In other words, one can ignore the relative rates at which $N$ and $T$ diverge if the main interest is only to obtain a consistent estimation procedure in practice. However, Assumption 3.2.(2) is important for deriving the rate of convergence and asymptotic normality below.

Building on Lemmas 2.1 and A.2, we establish the uniform convergence of βm(w, u) in the

next theorem.

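Theorem 2.3. Let Assumptions 1–3 hold. As $(N, T) \to (\infty, \infty)$,

$$\sup_{(w,u)\in\mathbb{R}\times[0,1]} \|\widehat{\beta}_m(w,u) - \beta_0(w,u)\| = O_P\left(mN^{-1/2}T^{-1}\right) + O_P\left(m^{\frac{1-\mu_2}{2}}\right),$$

where $\widehat{\beta}_m(\cdot,\cdot)$ is defined under (2.17).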

Although we do not observe f0t’s, it is easy to see that, compared to Corollary A.1.3, the

rate of convergence given in Theorem 2.3 is identical to the case where f0t’s are observable.

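To further derive asymptotic normality for $\widehat{\beta}_m(\cdot,\cdot)$, we define the following quantities for notational simplicity, and impose some extra restrictions. For all $(w, u) \in \mathbb{R}\times[0,1]$, let

$$\Psi_1 = \frac{N^{1/2}T}{m^{1/2}}\left[B_m'(w,u)\otimes I_{d_x}\right]\Psi_2^{-1}\Sigma_\dagger^{-1}\cdot\frac{1}{NT^{3/2}}\sum_{i=1}^{N}\left(\frac{Z_i'}{\sqrt{T}}M_{F_0} + \Psi_{3i}\right)e_i,$$

$$\Psi_2 = I_{md_x} - \Sigma_\dagger^{-1}\frac{1}{NT}\sum_{i=1}^{N}\Psi_{3i}\frac{Z_i}{\sqrt{T}}, \qquad \Psi_{3i} = \frac{1}{N}\sum_{j=1}^{N}\frac{Z_j'}{\sqrt{T}}M_{F_0}\gamma_0'(v_j)\Sigma_\gamma^{-1}\gamma_0(v_i),$$

where $\Sigma_\gamma$ and $\Sigma_\dagger$ are defined in Assumptions 1 and 2, respectively.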


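Assumption 4.

1. Denote $E_{Nt} = \{\varepsilon_{ij} \mid 1 \le i \le N, -\infty \le j \le t\}$ and $R_{N,ts} = \{r_{1t}, \ldots, r_{Nt}; r_{1s}, \ldots, r_{Ns}\}$. Suppose that for $t \ge s$, $E[f_{0t}'f_{0s} \mid E_{Nt}, R_{N,ts}] = a_{ts}$ a.s., and $\sum_{t=1}^{T}\sum_{s=1}^{t} |a_{ts}| = O(T)$.

2. Suppose that $\Psi_1 \to_D N(0, \Omega_\star)$, where $\Omega_\star = \lim_{(N,T)\to(\infty,\infty)} E[\Psi_1\Psi_1']$.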

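We are now ready to establish the asymptotic normality of $\widehat{\beta}_m$ below.

Theorem 2.4. Let Assumptions 1–4 hold. In addition, let $\frac{mN}{T} \to 0$, $\frac{mT}{N^2} \to 0$ and $\frac{NT^2}{m^{1+\mu_2}} \to 0$. For all $(w, u) \in \mathbb{R}\times[0,1]$, as $(N, T) \to (\infty, \infty)$,

$$m^{-1/2}N^{1/2}T\left(\widehat{\beta}_m(w,u) - \beta_0(w,u)\right) \to_D N(0, \Omega_\star),$$

where $\widehat{\beta}_m(\cdot,\cdot)$ and $\Omega_\star$ are defined in (2.17) and Assumption 4, respectively.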

Note that the condition $\frac{mN}{T} \to 0$ in Theorem 2.4 implies $\frac{N}{T} \to 0$, that is, $A_4 = 0$ in Assumption 3.2.(2). On the other hand, the conditions $\frac{mN}{T} \to 0$ and $\frac{mT}{N^2} \to 0$ in Theorem 2.4 imply that $\frac{m^2}{N} \to 0$, because $\frac{m^2}{N} = \frac{mN}{T}\cdot\frac{mT}{N^2}$. Thus, the combination of these two restrictions is a bit stronger than the first condition of Assumption 2.3, since the latter gives $\frac{m^2}{N} \to 0$. Owing to Assumption 4.1, we are able to establish the normality without any bias.

Before discussing how to estimate $\Omega_\star$, we first investigate the estimators of $\gamma_0(\cdot)$ and $F_0$. Notice that, making use of the estimator $(\widehat{C}_\beta, \widehat{F})$ of $(C_{\beta_0}, F_0)$ defined in (2.17) as well as the restriction $\frac{1}{T}F'F = I_{d_v}$, we are able to estimate $C_{\gamma_0}$ of (2.6) via equation (2.11) by

$$\widehat{C}_\gamma = \left[\sum_{i=1}^{N} H_n(v_i)H_n(v_i)'\right]^{-1}\sum_{i=1}^{N} H_n(v_i)\,\frac{1}{T}\widehat{F}'(Y_i - \phi_i[\widehat{\beta}_m]). \tag{2.18}$$

Hence, we obtain the estimator $\widehat{\gamma}_n(v) = H_n'(v)\widehat{C}_\gamma$ for $\gamma_0(\cdot)$. To facilitate the development, we impose the next assumptions.

Assumption 5.

1. Let $F_0 \in D_F = \left\{F \mid \frac{1}{T}F'F = I_{d_v}\right\}$.

2. Furthermore, suppose that $\Sigma_\gamma$ in Assumption 1.5 is a $d_v \times d_v$ diagonal matrix with distinct entries.

Assumption 5∗.

1. Let $F_0 \in D_F$.

2. $\frac{\Gamma_0'\Gamma_0}{N}$ is a $d_v \times d_v$ diagonal matrix with distinct entries a.s.

3. $\sqrt{N}\sup_{i\ge 1}\sup_{F\in\{F \mid \frac{1}{\sqrt{T}}\|F - F_0\| \le \epsilon\}} \left\|\frac{1}{T}F'e_i\right\| = o_P(1)$, where $e_i$ is defined in (2.11), and $\epsilon$ is a sufficiently small positive number.

4. For each fixed $t$, $\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\gamma_0(v_i)e_{it} \to_D N(0, \Sigma_t(\gamma))$, where $\Sigma_t(\gamma)$ is positive definite for each fixed $t$.

Based on Lemmas A.3 and A.4, we can now establish the next theorem.

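Theorem 2.5. Let Assumptions 1–3 and 5 hold. As $(N, T) \to (\infty, \infty)$,

1. $\frac{1}{\sqrt{T}}\|\widehat{F} - F_0\| = O_P\left(N^{-1/2}\right) + O_P\left(T^{1/2}m^{-\mu_2/2}\right)$,

2. $\sup_{v\in\mathbb{R}^{d_v}} \|\widehat{\gamma}_n(v) - \gamma_0(v)\| = O_P\left(n^{1/2}N^{-1/2}\right) + O_P\left(n^{1/2}\kappa\right)$,

where $\kappa$ is defined in Theorem 2.1.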

Note that Assumption 4 is not needed for Theorem 2.5. The first result of Theorem 2.5 says that, after imposing Assumption 5 on the factor structure, we can fully recover $F_0$. Compared with the second result of Theorem 2.1, the rate of convergence of $\widehat{\gamma}_n(v)$ is slower than in the case with observable $f_{0t}$'s. The reason is that, as we plug $\widehat{\beta}_m$ into (2.11) to obtain (2.18), the rate of convergence associated with $\widehat{\beta}_m$ is the first hurdle to overcome for the convergence of $\widehat{C}_\gamma$ in Theorem 2.5.

Note further that if one is willing to impose stronger assumptions (i.e., Assumption 5∗), an asymptotic normality result can be established for the factor estimator, as stated in Corollary 2.1, based on the development of Theorem 2.5.

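Corollary 2.1. Let Assumptions 1–3 and 5∗ hold. Suppose that $\frac{NT}{m^{\mu_2}} \to 0$. As $(N, T) \to (\infty, \infty)$,

1. for each fixed $t$, $\sqrt{N}(\widehat{f}_t - f_{0t}) \to_D N(0, \Sigma_\gamma^{-1}\Sigma_t(\gamma)\Sigma_\gamma^{-1})$, where $\widehat{f}_t$ denotes the $t$th column of $\widehat{F}'$,

2. $\sup_{v\in\mathbb{R}^{d_v}} \|\widehat{\gamma}_n(v) - \gamma_0(v)\| = O_P\left(\sqrt{\frac{\max\{mn, n^2\}}{NT}}\right) + O_P\left(\frac{n}{N^{3/2}}\right) + O_P\left(n^{1/2}\kappa\right)$.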

Although Model (1.4) is complicated, we are able to recover the asymptotic distribution as-

sociated with ft for each fixed t, and, more importantly, the asymptotic distribution is consistent

with Theorem 1 of Bai and Ng (2013), wherein a factor model without regressors is considered.

However, due to plugging βm(·, ·) and F in, it is difficult to establish an asymptotic normality

for γn(v) at this stage.

With all the above results in hand, we are now ready to estimate the asymptotic covariance matrix in Theorem 2.4. Intuitively, the estimator is defined as $\widehat{\Omega}_\star = \widehat{\Psi}_1\widehat{\Psi}_1'$, where

$$\widehat{\Psi}_1 = \frac{N^{1/2}T}{m^{1/2}}\left[B_m'(w,u)\otimes I_{d_x}\right]\widehat{\Psi}_2^{-1}\Omega_1^{-1}(\widehat{F})\cdot\frac{1}{NT^{3/2}}\sum_{i=1}^{N}\left(\frac{Z_i'}{\sqrt{T}}M_{\widehat{F}} + \widehat{\Psi}_{3i}\right)\widehat{e}_i,$$

$$\widehat{\Psi}_2 = I_{md_x} - \Omega_1^{-1}(\widehat{F})\frac{1}{NT}\sum_{i=1}^{N}\widehat{\Psi}_{3i}\frac{Z_i}{\sqrt{T}},$$

$$\widehat{\Psi}_{3i} = \frac{1}{N}\sum_{j=1}^{N}\frac{Z_j'}{\sqrt{T}}M_{\widehat{F}}\,\widehat{\gamma}_n'(v_j)\left(\frac{1}{N}\sum_{\ell=1}^{N}\widehat{\gamma}_n(v_\ell)\widehat{\gamma}_n'(v_\ell)\right)^{-1}\widehat{\gamma}_n(v_i),$$

$$\widehat{e}_i = Y_i - \phi_i[\widehat{\beta}_m] - \widehat{F}\widehat{\gamma}_n(v_i),$$

in which $\Omega_1(F)$ has been defined in Assumption 3.1. To show $\widehat{\Omega}_\star \to_P \Omega_\star$, certain types of independence of the errors need to be imposed (e.g., Section 7 of Bai (2009)). Also, it can easily lead to another research paper in a much more general way than what has been done in the literature (cf., Bai and Liao, 2017), so we do not pursue it further.

We now move on to examine the finite–sample properties of the proposed estimators in

Sections 3 and 4 below.

3 Numerical Studies

In this section, we implement simulation studies to examine our theoretical findings. Note that a variety of numerical studies have been implemented in Fan et al. (2016) to examine the finite sample performance of the estimates of $F_0$ and $\gamma_0(\cdot)$, so, given the similarity, we do not stress the estimates of the factor structure in this section. In the following, we carefully study the finite sample performance of $\widehat{\beta}_m$ defined in (2.17), and always compare it with the estimator of $\beta_0$ defined under (2.13), as $x_{it}'\beta_0(r_{it}, \tau_t)$ is the dominant term of model (1.4).⁵

We now describe the data generating process for model (1.4). Let $x_{it} = x_{i,t-1} + w_{it}$ with $x_{i0} \sim$ i.i.d. $N(0,1)$ and $w_{it} \sim$ i.i.d. $N(0,1)$. The factors are generated by $f_{0t} \sim$ i.i.d. $N(0,1)$. For $\ell = 1, \ldots, d_v$, $v_{i,\ell} \sim$ i.i.d. $U(0,1)$ and the relevant loading function is $\gamma_{0\ell}(v) = \exp(-(v - \ell/4)^2)$. Let $e_t = (e_{1t}, \ldots, e_{Nt})'$, where $e_t = 0.4\,e_{t-1} + N(0, \Sigma_e)$ with $\Sigma_e = \{0.6^{|i-j|}\}_{N\times N}$. Similarly, generate $r_t^* = (r_{1t}^*, \ldots, r_{Nt}^*)'$, where $r_t^* = 0.8\,r_{t-1}^* + N(0, \Sigma_{r^*})$ with $\Sigma_{r^*} = \{0.4^{|i-j|}\}_{N\times N}$. Let $r_{it} = r_{it}^* + \|f_{0t}\|^2 + \sum_{\ell=1}^{d_v} |v_{i,\ell}|$, so that $r_{it}$ is correlated with the factor structure. Throughout the simulation studies, we choose $d_x = 2$, $d_v = 3$, $m_1 = \lfloor (NT)^{1/7} \rfloor + 1$ and⁶ $m_2 = \lfloor (NT)^{1/7} \rfloor$.

For the functional forms of β0(·, ·), consider the following two cases.

⁵ In Bai (2009) and Bai et al. (2009) (or Pesaran (2006) and Kapetanios et al. (2011)), it is actually hard to conclude which term between $x_{it}'\beta$ and $\gamma_i'f_t$ is the dominant one in terms of magnitude.

⁶ Note that the choices of $m_1$ and $m_2$ may not be the optimal ones, but they satisfy all the requirements of our assumptions. Although the optimal choice of truncation parameter and the optimal bandwidth selection have been solved for some cross–sectional models and time series models (e.g., Gao, 2007; Hall et al., 2007), it is well understood that the question is still open even for the nonparametric panel data model with fixed effects (cf., Chen et al., 2012b; Su and Jin, 2012). The question is even more daunting when both integrated processes and a factor structure are involved.


• Case 1 – The vector $\beta_0(w, u)$ has two elements: $\beta_{01}(w) = \exp(-w^2/2)$, depending on $w$ only, and $\beta_{02}(u) = u^2$, depending on $u$ only.⁷

• Case 2 – The vector $\beta_0(w, u)$ has two elements: $\beta_{01}(w, u) = (1 + u)\exp(-w^2)$ and $\beta_{02}(w, u) = \cos(u\pi)\exp(-w^2/2)$.
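The data generating process described above, combined with Case 1, can be sketched as follows. This is a minimal illustration rather than the code used for the reported results. It assumes $\tau_t = t/T$, reads $\|f_{0t}\|^2$ as the squared Euclidean norm, draws the components of $f_{0t}$ independently, and uses a burn-in period so that the AR(1) vectors start close to their stationary distribution. All function names are hypothetical.

```python
import numpy as np

def ar1_panel(N, T, rho, base, rng, burn=100):
    """u_t = rho * u_{t-1} + N(0, Sigma) with Sigma_ij = base**|i-j|."""
    idx = np.arange(N)
    Sigma = base ** np.abs(idx[:, None] - idx[None, :])
    L = np.linalg.cholesky(Sigma)
    u = np.zeros(N)
    out = np.empty((N, T))
    for t in range(burn + T):          # burn-in is an assumption of this sketch
        u = rho * u + L @ rng.standard_normal(N)
        if t >= burn:
            out[:, t - burn] = u
    return out

def truncation_parameters(N, T):
    """m1 = floor((NT)^(1/7)) + 1 and m2 = floor((NT)^(1/7))."""
    base = int(np.floor((N * T) ** (1.0 / 7.0)))
    return base + 1, base

def simulate_case1(N, T, d_v=3, seed=0):
    """Simulate y_it = x_it' beta_0(r_it, tau_t) + f_0t' gamma_0(v_i) + e_it."""
    rng = np.random.default_rng(seed)
    d_x = 2                                    # Case 1 has two coefficient functions
    tau = np.arange(1, T + 1) / T              # assumed definition of tau_t
    x0 = rng.standard_normal((N, 1, d_x))
    w = rng.standard_normal((N, T, d_x))
    x = x0 + np.cumsum(w, axis=1)              # I(1) regressors
    f = rng.standard_normal((T, d_v))          # factors
    v = rng.uniform(size=(N, d_v))
    ell = np.arange(1, d_v + 1)
    gamma = np.exp(-(v - ell / 4.0) ** 2)      # loadings, shape (N, d_v)
    e = ar1_panel(N, T, 0.4, 0.6, rng)
    r_star = ar1_panel(N, T, 0.8, 0.4, rng)
    r = r_star + np.sum(f ** 2, axis=1)[None, :] + np.abs(v).sum(axis=1)[:, None]
    beta = np.stack([np.exp(-r ** 2 / 2.0),
                     np.broadcast_to(tau ** 2, (N, T))], axis=2)
    y = np.sum(x * beta, axis=2) + gamma @ f.T + e
    return y, x, r, v, f
```

A call such as `simulate_case1(N=20, T=40)` returns the N x T outcome array together with the regressors, the variable $r_{it}$, the loading variables and the factors. Case 2 would only change the two lines that build `beta`.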

For each replication, we estimate the coefficient functions using (2.13) and (2.17), and record the estimated coefficient functions at selected points, on the relevant intervals for Case 1 and over the relevant areas for Case 2. The estimates associated with (2.13) and (2.17) are referred to as “M1” and “M2”, respectively, and the true curve is referred to as “True”. We plot the lower and upper bounds of the estimated functions at these selected points in Figures 1–4 under a variety of choices of $(N, T)$ based on 1000 replications.

The results of Case 1 are summarized in Figures 1 and 2. The lower and upper bounds of

M1 and M2 are almost identical in Figure 1, but M2 has a much wider band in Figure 2, which

is not surprising given that M1 in fact has utilized more information. For Case 2, as it is hard

to distinguish 5 layers in a 3D plot given the fact they are all very close to each other, we focus

on the estimates of M2 only. In each subplot of Figures 3 and 4, the middle (red) layer stands

for the real function, while the other two layers represent the corresponding lower and upper

bounds of the estimates. Throughout Figures 1–4, the distance between the lower and upper bounds shrinks under all scenarios as the sample size increases, and both bounds move towards the true function.

4 Empirical Study

In this section, we use our newly proposed model and method to study the returns to scale of large commercial banks in the U.S. Owing to various regulatory changes and technological and financial innovations, a considerable body of research has investigated the returns to scale of large banks in the U.S. over the past three decades. According to Jones and Critchfield (2005), the asset share of large banks (those with assets in excess of $1 billion) increased from 76% in 1984 to 86% in 2003, and the average size of those banks increased from $4.97 billion to $15.50 billion (in 2002 dollars). Meanwhile, a serious concern that some banks might be too large to operate efficiently has been raised and debated repeatedly. See Berger et al. (1999) for an excellent review. This empirical study aims to address this concern using the model and method proposed above. When no misunderstanding arises, we suppress the subscripts $i$ and $t$ for better presentation in the following descriptions.

⁷ Assuming this form makes it easier to plot the estimates of $\beta_0$ (see Figures 1 and 2 below), because a three-dimensional picture is not easy to draw for the purpose of comparison. Though not in $L^2(\mathbb{R}\times[0,1])$, this form of $\beta_0$ is easier to estimate by the sieve method than the general form, as mentioned in the first section.


Figure 1: Case 1 – $\beta_{01}(w) = \exp(-w^2/2)$ and its estimates by both M1 and M2 (nine panels; horizontal axis from −1 to 1, vertical axis from 0 to 1; legend: True, M1, M2)

4.1 Data

The quarterly data are obtained from the Reports of Income and Condition published by the

Federal Reserve Bank of Chicago, and cover the period 1986–2005. We examine only continuously

operating large banks with assets of at least $1 billion (in 1986 dollars) to avoid the impact of

entry and exit and to focus on the performance of a core of healthy, surviving institutions. It

then gives a total of 466 banks over 20 years (so T = 80 quarters). The relevant variables

are selected by following the commonly–accepted intermediation approach (Sealey and Lindley,

1977). To be specific, three input prices and three output quantities are identified for our study

as follows.

• Inputs:⁸

1. η1 – the wage rate for labour;

2. η2 – the interest rate for borrowed funds;

3. η3 – the price of physical capital.

⁸ Following the literature (e.g., Stiroh, 2000; Berger and Mester, 2003), the wage rate equals total salaries and benefits divided by the number of full–time employees; the price of deposits and purchased funds equals total interest expense divided by total deposits and purchased funds; the price of capital equals expenses on premises and equipment divided by premises and fixed assets. Total cost is thus the sum of these three input costs.

Figure 2: Case 1 – $\beta_{02}(u) = u^2$ and its estimates by both M1 and M2 (nine panels; both axes from 0 to 1; legend: True, M1, M2)

• Outputs:⁹

1. ζ1 – consumer loans;

2. ζ2 – non–consumer loans, consisting of industrial and commercial loans and real estate

loans;

3. ζ3 – securities, including non–loan financial assets, i.e., all financial and physical assets

minus the sum of consumer loans, non–consumer loans, and equity.

Based on the above discussion, we provide a summary of the descriptive statistics of all the

variables in Table 1, and show the average asset of the banks for each year in Table 2 below.

⁹ All outputs are deflated by the GDP deflator to the base year 1986.


Figure 3: Case 2 – $\beta_{01}(w, u) = (1 + u)\exp(-w^2)$ and its estimates by M2

Table 1: Descriptive Statistics

            ASSETS     COST       η1       η2       η3      ζ1         ζ2         ζ3
Mean        3.24E+06   9.80E+04   17.42    0.01     0.03    1.34E+06   2.11E+05   1.69E+06
Std         1.17E+05   3.63E+03   0.05     0.00     0.00    5.77E+04   7.25E+03   5.69E+04
Median      2.10E+05   6.15E+03   16.19    0.00     0.02    7.96E+04   1.44E+04   1.08E+05
Mode        9.29E+04   1.57E+03   5.81     0.00     0.01    9.14E+03   6.54E+03   4.40E+04
Kurtosis    3.13E+02   3.30E+02   15.13    281.37   59.72   4.93E+02   4.77E+02   2.47E+02
Skewness    1.59E+01   1.63E+01   1.78     9.77     4.71    1.98E+01   1.89E+01   1.41E+01
Minimum     7.73E+03   9.14E+01   0.09     0.00     0.00    2.41E+03   4.29E+00   3.02E+03
Maximum     6.75E+08   2.22E+07   253.00   0.44     0.44    3.93E+08   4.96E+07   3.17E+08


Figure 4: Case 2 – $\beta_{02}(w, u) = \cos(u\pi)\exp(-w^2/2)$ and its estimates by M2

Table 2: Average Asset for Each Year (Million Dollars)

YEAR   ASSET      YEAR   ASSET
1986   1.26E+06   1996   2.32E+06
1987   1.33E+06   1997   2.92E+06
1988   1.36E+06   1998   3.58E+06
1989   1.38E+06   1999   4.03E+06
1990   1.43E+06   2000   4.68E+06
1991   1.42E+06   2001   5.09E+06
1992   1.55E+06   2002   5.69E+06
1993   1.66E+06   2003   6.26E+06
1994   1.85E+06   2004   6.95E+06
1995   2.04E+06   2005   8.01E+06

4.2 Model Specification

In the literature of production econometrics (e.g., Feng and Serletis, 2008; Feng and Zhang, 2012), researchers are often interested in the relationship between

$$\frac{CT}{\eta_3} \quad\text{and}\quad \left(\frac{\eta_1}{\eta_3}, \frac{\eta_2}{\eta_3}, \zeta_1, \zeta_2, \zeta_3\right),$$

where $CT$ represents total costs, and all the other variables have been defined already. Here, one needs to divide $CT$, $\eta_1$, and $\eta_2$ by $\eta_3$ to maintain linear homogeneity with respect to input prices.

However, as argued in the first section, the existing literature usually simply imposes a linear relationship for the translog data as

$$\ln\frac{CT}{\eta_3} = C\left(\frac{\eta_1}{\eta_3}, \frac{\eta_2}{\eta_3}, \zeta_1, \zeta_2, \zeta_3\right) = \alpha_1\ln\frac{\eta_1}{\eta_3} + \alpha_2\ln\frac{\eta_2}{\eta_3} + \psi_1\ln\zeta_1 + \psi_2\ln\zeta_2 + \psi_3\ln\zeta_3, \tag{4.1}$$

where $C(\cdot)$ represents the normalized cost function. Moreover, to capture different marginal effects of all variables over time, the literature arbitrarily specifies certain interaction terms between time $t$ and all the other variables.

Based on the above review, in what follows we aim to address two obvious modelling issues.

• How to maintain linear homogeneity (i.e., allow η3 to enter the system) in a more general form?

• How to capture the time varying marginal effects in a better fashion?

In addition, we would like to point out one minor but overlooked modelling issue in the literature of production economics when translog cost/production models are adopted, that is,

• Are the regressors of (4.1) stationary or nonstationary?

Towards this end, we invoke the model (1.4), and, in particular, we focus on the case where the $f_t$'s are unobservable in this empirical study:¹⁰

$$\begin{aligned}
\ln\frac{CT_{it}}{\eta_{3,it}} ={}& \beta_{01}(\eta_{3,it}, \tau_t)\cdot\ln\eta_{1,it} + \beta_{02}(\eta_{3,it}, \tau_t)\cdot\ln\eta_{2,it} \\
&+ \beta_{03}(\tau_t)\cdot\ln\zeta_{1,it} + \beta_{04}(\tau_t)\cdot\ln\zeta_{2,it} + \beta_{05}(\tau_t)\cdot\ln\zeta_{3,it} \\
&+ \gamma_{01}(v_{1,i})\cdot f_{0t,1} + \gamma_{02}(v_{2,i})\cdot f_{0t,2} + e_{it},
\end{aligned} \tag{4.2}$$

where $v_{1,i}$ represents the logarithm of the initial asset of individual $i$ divided by 10, and $v_{2,i}$ represents the initial operational policy of individual $i$.

If the variables of (4.2) satisfy the settings of Section 2, then model (4.2) falls into the category of model (1.4). Here, we implement unit root tests (i.e., the ADF test) for η3, ln η` for ` = 1, 2 and ln ζj for j = 1, 2, 3 across all individuals, and report in Table 3 the percentage of individuals for which the null (i.e., the hypothesis of having a unit root) is rejected for each variable.

¹⁰ One can add more interaction terms to the right hand side of (4.2) as in Feng and Serletis (2008) and Feng and Zhang (2012) in order to address some particular questions in practice. For the purpose of demonstration and conciseness, we do not include those interactions in this study. The Matlab code and data are available upon request.

Table 3: Percentage of Rejection of the Unit Root Tests across Individuals

          % of rejection             % of rejection
ln η1     0.00%             ln ζ1    0.00%
ln η2     0.00%             ln ζ2    6.65%
η3        100%              ln ζ3    0.43%

The results indicate that for the variables ln η`, ` = 1, 2 and ln ζj, j = 1, 2, 3, the unit root null is rejected for very few individuals, so the majority of individuals follow an I(1) process, while for η3 all individuals follow a stationary (i.e., I(0)) process. Therefore, the model (4.2) indeed falls into the category of model (1.4).
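The per-variable rejection percentages of the kind reported in Table 3 can be illustrated with the following sketch. It is not the test implementation used in the paper: it assumes a plain Dickey–Fuller regression with a constant and no lag augmentation, and it uses the asymptotic 5% critical value of −2.86 for that specification. The paper does not state the lag choice or the significance level, so both are assumptions, and all names are hypothetical.

```python
import numpy as np

DF_CRIT_5PCT = -2.86  # asymptotic 5% critical value, regression with a constant

def df_tstat(y):
    """t-statistic on rho in: diff(y)_t = a + rho * y_{t-1} + u_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ coef
    s2 = resid @ resid / (len(dy) - 2)          # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)           # OLS covariance matrix
    return coef[1] / np.sqrt(cov[1, 1])

def rejection_share(panel, crit=DF_CRIT_5PCT):
    """Percentage of rows (individuals) for which the unit root null is rejected."""
    stats = np.array([df_tstat(row) for row in panel])
    return 100.0 * np.mean(stats < crit)
```

Here `panel` is an N x T array holding one variable for all individuals. A low rejection share is consistent with I(1) behaviour for most individuals, while a share near 100% points to stationarity.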

Finally, an important question in production economics is whether a production function exhibits constant, increasing or decreasing returns to scale. Given the consistent estimation of the model (4.2), the index of returns to scale is computed as follows:

$$RTS = \left[\sum_{j=1}^{3}\frac{\partial}{\partial\ln\zeta_j} C\left(\frac{\eta_1}{\eta_3}, \frac{\eta_2}{\eta_3}, \zeta_1, \zeta_2, \zeta_3\right)\right]^{-1}, \tag{4.3}$$

where the function $C(\cdot)$ is the regression function and $\frac{\partial}{\partial\ln\zeta_j} C\left(\frac{\eta_1}{\eta_3}, \frac{\eta_2}{\eta_3}, \zeta_1, \zeta_2, \zeta_3\right)$ is the cost elasticity of output $j$ for $j = 1, 2, 3$. The results reveal whether the large bank industry exhibits increasing returns to scale or not.
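As a small illustration of (4.3): under specification (4.2) as written, the cost elasticity of output $j$ is the coefficient on $\ln\zeta_{j,it}$, that is, $\beta_{03}(\tau_t)$, $\beta_{04}(\tau_t)$ and $\beta_{05}(\tau_t)$ for $j = 1, 2, 3$. The sketch below simply inverts the sum of three such elasticities. The function name is hypothetical and the numbers in the usage note are made up for illustration; they are not estimates from the paper.

```python
import numpy as np

def returns_to_scale(output_elasticities):
    """RTS = 1 / (sum of output cost elasticities), computed row by row.

    output_elasticities: array of shape (..., 3), one column per output.
    """
    e = np.asarray(output_elasticities, dtype=float)
    return 1.0 / e.sum(axis=-1)
```

For example, elasticities of 0.3, 0.3 and 0.3 sum to 0.9 and give an RTS above one (increasing returns to scale), whereas elasticities summing to exactly one give constant returns to scale.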

4.3 Summary of the Results

We now implement our estimation procedure to estimate¹¹ all the unknown functions on the right–hand side of model (4.2), and we then plot the estimated coefficient functions in Figure 5 below. It is easy to see that these coefficient functions are highly nonlinear.

Moreover, we calculate the RTS for each individual at each time period using (4.3), and report the overall average RTS and the average RTS for each year in Table 4 below. First, the overall average RTS (1.0897) is close to 1.1, which reveals increasing returns to scale for the large bank industry during the observed period.

Second, the annual average RTS is also greater than unity for all the sample years, implying increasing returns to scale over all observed years; the index is particularly large in the 1990s. Indeed, all the annual averages in the 1990s are larger than 1.1 except for the last two years, 1998 and 1999, which are nevertheless close to 1.1 as well. This is possibly due to the rapid development of internet technologies in that decade, which yielded abundant returns, since, as banks grow bigger, they are more likely to afford new technologies. The adoption of new technologies then increases the banks’ optimal scales over time, which results in higher RTS for given bundles of inputs. More interesting results can be calculated and discussed in the same spirit as in Feng and Serletis (2008) and Feng and Zhang (2012). We do not pursue them further in this study.

¹¹ The truncation parameters are chosen in the same way as in the Monte Carlo studies, i.e., $m_1 = \lfloor (NT)^{1/7} \rfloor$, $m_2 = \lfloor (NT)^{1/7} \rfloor - 1$ and $n = \lfloor (NT)^{1/7} \rfloor - 1$.

Figure 5: Estimated Coefficient Functions

Table 4: Returns to Scale

Overall average RTS: 1.0897

YEAR   RTS      YEAR   RTS
1986   1.0627   1996   1.1102
1987   1.0678   1997   1.1008
1988   1.0768   1998   1.0913
1989   1.0882   1999   1.0828
1990   1.1003   2000   1.0761
1991   1.1113   2001   1.0715
1992   1.1194   2002   1.0687
1993   1.1234   2003   1.0673
1994   1.1227   2004   1.0668
1995   1.1180   2005   1.0667


5 Extensions and Discussion

In this section, we discuss some extensions of the model and some contributions of this study.

As mentioned by Connor et al. (2012), it is of interest to allow the variable of the loading function to change over both $i$ and $t$. In this case, suppose that the time series $\{\gamma_0(v_{i1}), \ldots, \gamma_0(v_{iT})\}$ has mean $\gamma_i$ for each $i \ge 1$. Thus, we can rewrite (1.4) as

$$y_{it} = x_{it}'\beta_0(r_{it}, \tau_t) + f_{0t}'\gamma_i + e_{it}^*, \tag{5.1}$$

where $e_{it}^* = f_{0t}'(\gamma_0(v_{it}) - \gamma_i) + e_{it}$. If $\{v_{it}\}$ is independent of $\{x_{it}\}$ and $\{r_{it}\}$, all the development of Section 2 holds true with minor modifications. For the case where $\{v_{it}\}$ is correlated with $\{x_{it}\}$ and $\{r_{it}\}$, an approach similar to that discussed in Connor et al. (2012) may be adopted, but since we have one extra component $x_{it}'\beta_0(r_{it}, \tau_t)$ in the model, such an extension requires careful thought and additional techniques.
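The step behind (5.1) is a simple add-and-subtract argument, written out below. The first line assumes that the extended model is (1.4) with $v_i$ replaced by $v_{it}$; this is an assumption made here for illustration, since the extended model is not displayed explicitly.

```latex
\begin{align*}
y_{it} &= x_{it}'\beta_0(r_{it},\tau_t) + f_{0t}'\gamma_0(v_{it}) + e_{it} \\
       &= x_{it}'\beta_0(r_{it},\tau_t) + f_{0t}'\gamma_i
          + \underbrace{f_{0t}'\bigl(\gamma_0(v_{it}) - \gamma_i\bigr) + e_{it}}_{=\,e_{it}^*} .
\end{align*}
```

The loading $\gamma_i$ is then time invariant, as in Section 2, and the time variation of the loading is absorbed into the new error term.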

As argued in Bai et al. (2009), models with mixed I(1)/I(0) regressors and I(1)/I(0) factors are also important. Consider the following model with I(1)/I(0) regressors and I(0) factors as an example:

$$y_{it} = x_{1it}'\alpha_0(r_{it}, \tau_t) + x_{2it}'\beta_0(r_{it}, \tau_t) + f_{0t}'\gamma(v_i) + e_{it}, \tag{5.2}$$

where $x_{1it}$ and $x_{2it}$ are I(1) and I(0) across $t$, respectively. As explained in Section 4.2 of Bai et al. (2009), the difficulty of considering such a model lies in the need for different normalizers, which gives rise to degenerate asymptotics since the covariance matrix would be singular. The detailed development of this study (e.g., Theorems 2.2 and 2.4) provides a clear solution to this type of challenge when mixed I(1)/I(0) variables are involved in a model with an unobservable factor structure. Meanwhile, the case where the factors are also integrated is of general interest. We leave such extensions for future research.

We now briefly sketch how to estimate (5.2), and implement a simple Monte Carlo study in Appendix B to back up the arguments below. For the case where the $f_{0t}$'s are observable, the solution is simple and is therefore omitted. For the case with unobserved $f_{0t}$'s, we still need to restrict the coefficient of the I(1) regressors to a set like $B_T$ of (2.15). The objective function (2.16) and the corresponding estimator (2.17) then remain unchanged. One only needs to derive the asymptotic properties of (2.17) by choosing a suitable diagonal matrix as a normalizer, as in the proof of Theorem 2.2.


6 Conclusion

In this paper, we have proposed a new class of varying–coefficient panel data models with nonstationary regressors, wherein a partially observed factor structure similar to what has been discussed in Connor et al. (2012) and Fan et al. (2016) is also adopted to capture different effects of time invariant variables over time. The methodology employed in this paper fills a gap in the literature on dealing with mixed I(1)/I(0) regressors and factors (cf., Pesaran, 2006 and Kapetanios et al., 2011; or Bai, 2009 and Bai et al., 2009). For the purpose of comparison, we have considered both scenarios, where the factors are either observable or unobservable. The corresponding asymptotic theories have been established. We have further examined our theoretical findings through an extensive Monte Carlo simulation study. In the empirical study, we use our newly proposed model and method to study the returns to scale of large commercial banks in the U.S. Several possibly overlooked modelling issues in the literature of production econometrics have been discussed. Some possible extensions with discussions have been provided at the end of this paper, and they may guide our future research projects.

Appendix A

In this appendix, we first discuss and justify the assumptions. We then state the necessary main lemmas and provide the proofs of the main asymptotic results of the paper. The omitted simulation, proofs, and the preliminary lemmas with the associated proofs are provided in Appendix B. O(1) always denotes a constant and may be different at each appearance in the following development.

A.1 Discussion and Justification to the Assumptions

Assumption 1:

Assumptions 1.1 and 1.2 are standard in the literature of nonstationary time series (e.g., Park and Phillips, 2001), and also account for the heteroskedasticity of nonstationary panel data.

Assumption 1.3 nests stationary time series as special cases. A variety of studies and discussions on locally stationary processes can be found in Koo and Linton (2012), Vogt (2012) and Dong and Linton (2017), for example.

The current Assumption 1.4 requires weak exogeneity between the errors and the other variables. The mixing conditions of Assumption 1.4 are the same as Assumption 3 of Jiang et al. (2017), and are similar to Condition 2.4 of Chang et al. (2015) and Assumption 3.4 of Fan et al. (2016). It is worth mentioning that the current Assumption 1.4 formulates some commonly used assumptions in the literature. For example, conditions like Assumptions B and C of Bai (2009) and Assumption A.1.iii of Li et al. (2016) are commonly imposed to restrict the terms in the expansion of $E\left\|\frac{1}{NT}\sum_{i=1}^{N} e_ie_i'\right\|^2$, where $e_i = (e_{i1}, \ldots, e_{iT})'$. With the current setting, we are able to explicitly produce the expansion as follows:

$$\begin{aligned}
E\left\|\frac{1}{NT}\sum_{i=1}^{N} e_ie_i'\right\|^2
={}& \frac{1}{N^2T^2}\sum_{t=1}^{T}\sum_{s=1}^{T}\left\{\sum_{i=1}^{N} E[e_{it}^2e_{is}^2] + \sum_{i\ne j} E[e_{it}e_{is}e_{jt}e_{js}]\right\} \\
={}& \frac{1}{N^2T^2}\sum_{t=1}^{T}\sum_{s=1}^{T}\left\{\sum_{i=1}^{N} E[e_{it}^2e_{is}^2] + \sum_{i\ne j} E[(e_{it}e_{jt} - \sigma_{ij})(e_{is}e_{js} - \sigma_{ij})] + \sum_{i\ne j}\sigma_{ij}^2\right\} \\
={}& \frac{1}{N^2T^2}\sum_{t=1}^{T}\left\{\sum_{i=1}^{N} E[e_{it}^4] + \sum_{i\ne j} E[(e_{it}e_{jt} - \sigma_{ij})^2]\right\} \\
&+ \frac{1}{N^2T^2}\sum_{t\ne s}\left\{\sum_{i=1}^{N} E[e_{it}^2e_{is}^2] + \sum_{i\ne j} E[(e_{it}e_{jt} - \sigma_{ij})(e_{is}e_{js} - \sigma_{ij})]\right\} + \frac{1}{N^2}\sum_{i\ne j}\sigma_{ij}^2 \\
={}& O(1)\frac{1}{NT} + O(1)\frac{1}{T} + O(1)\frac{1}{N} + O(1)\frac{1}{T} + O(1)\frac{1}{N} = O(1)\frac{1}{N} + O(1)\frac{1}{T}.
\end{aligned}$$

Moreover, in Assumptions 1.4.a-b, $X_{it}$ can be simplified to $\chi_{it} = \{\ldots, \varepsilon_{i,t-1}, \varepsilon_{it}; r_{it}; v_i; f_{0t}\}$, which gives

Assumption 1.4∗.

(a) Let $\{e_{i1}, \ldots, e_{iT}\}$ be identically distributed across $i$, and let $\{e_t = (e_{1t}, \ldots, e_{Nt})' \mid t \ge 1\}$ be strictly stationary and α–mixing with $E[e_{it} \mid \chi_{it}] = 0$ a.s.

(b) Conditional on $\chi_{it}$ and $\chi_{js}$, let $\alpha_{ij}(|t - s|)$ denote the mixing coefficient between $e_{it}$ and $e_{js}$, such that for some $\delta > 0$, $\sum_{i,j=1}^{N}\sum_{t,s=1}^{T} |\alpha_{ij}(|t - s|)|^{\delta/(4+\delta)} = O(NT)$ a.s. Moreover, $\max_{i,j,t,s} E\left[|e_{it}|^{2+\delta/2} \mid \chi_{it}, \chi_{js}\right] < \infty$ a.s.

However, doing so leads to quite messy notation. Therefore, we stick to the current form of Assumption 1.4.

Assumption 1.5.a is standard in the literature. See, e.g., Assumption B of Bai (2009). In Assumption 1.5.b, as each element of $\gamma_0(\cdot)$ is defined on $L^2(\mathbb{R})$, these elements are uniformly bounded on $\mathbb{R}$. Thus, there is no need to assume that the fourth moment of $v_i$ is bounded as in Assumption 1.5.a. In addition, the current form of Assumption 1.5.b implicitly allows for cross–sectional dependence and heteroskedasticity among the $v_i$. Alternatively, one can use a restriction like $\sum_{i\ne j} \|\mathrm{Cov}[\gamma_0(v_i), \gamma_0(v_j)']\| = O(N)$ plus $v_i$ being identically distributed across $i$, which after simple algebra yields the explicit form $E[\gamma_0(v_1)]E[\gamma_0'(v_1)] + \mathrm{Cov}[\gamma_0(v_1), \gamma_0'(v_1)]$ for $\Sigma_\gamma$.

Assumption 2:

The current form of Assumption 2.1 allows xit to be potentially correlated with rit, vi and

f0t, and is in the same spirit as how the weak cross–sectional dependence usually gets introduced to

the system in the literature of panel data models (cf., Chen et al., 2012b, Assumption A4; Fan et al.,

2016, Assumption 3.4; Jiang et al., 2017, Assumption 3). It is worth mentioning that if (1) εij is


independent of all the other variables; or (2) a structure similar to Assumption B.1.b of Dong and

Linton (2017) is imposed, then Σ1mm, Σ12m and Σ12mn can be further simplified. We now focus on

Σ1mm, and use the independence case as an example. Consider the first moment only:

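$$
\begin{aligned}
&E\left\{\frac{1}{NT}\sum_{i,t}\left[B_m(r_{it}[\tau_t], \tau_t)B_m'(r_{it}[\tau_t], \tau_t)\right] \otimes \int_{\tau_t}^{\tau_{t+1}} W_i(w)W_i'(w)\,dw\right\} \\
&\quad= \frac{1}{NT}\sum_{i,t} E\left[B_m(r_{it}[\tau_t], \tau_t)B_m'(r_{it}[\tau_t], \tau_t)\right] \otimes E\left[\int_{\tau_t}^{\tau_{t+1}} W_i(w)W_i'(w)\,dw\right] \\
&\quad= \frac{1}{NT}\sum_{i,t} E\left[B_m(r_{it}[\tau_t], \tau_t)B_m'(r_{it}[\tau_t], \tau_t)\right] \otimes \left[D_iD_i'\tau_t\right]\cdot(1+o(1)) \\
&\quad= \left\{\frac{1}{T}\sum_{t=1}^T \tau_t E\left[B_m(r_{11}[\tau_t], \tau_t)B_m'(r_{11}[\tau_t], \tau_t)\right]\right\} \otimes \left\{\frac{1}{N}\sum_{i=1}^N D_iD_i'\right\}\cdot(1+o(1)),
\end{aligned}
\tag{A.1}
$$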

where the second equality follows from a straightforward calculation; and the third equality follows from Assumption 1.3. Moreover, by the definition of the Riemann integral and accounting for the dimension of $B_m(\cdot,\cdot)$, the right hand side of (A.1) further has the asymptotic form

$$
\int_0^1 w\,E\left[B_m(r_{11}[w], w)B_m'(r_{11}[w], w)\right]dw \otimes \Upsilon, \tag{A.2}
$$

where $\left\|\frac{1}{N}\sum_{i=1}^N D_iD_i' - \Upsilon\right\| \to 0$. When the variables have a structure similar to Assumption B.1.b of Dong and Linton (2017), we can still show that $\Sigma_{1mm}$ reduces to (A.2). However, for panel data models, this requires more technicalities than those involved in the proof of their paper. We do not pursue it further in this study in order not to deviate from our main goal.

Assumption 2.2 states the rates of convergence for the orthogonal expansions in (2.5) and (2.6),

both of which are achievable given certain smoothness of the related functions, and are in the same

spirit as the assumptions in Section 6.1 of Hansen (2015). Assumption 2.3 further puts restrictions on

the rate of divergence of the truncation parameters in order to ensure the consistency of the estimators

studied below.

Assumption 3:

Assumption 3.1 ensures that the estimators given in (2.17) are well defined, and is equivalent to

Assumption A of Bai (2009) and Assumption 1 of Jiang et al. (2017) used for the linear parametric

model. Assumption 3.2 requires that the number of individuals cannot diverge to infinity faster than

the number of time periods, and also imposes a strong condition on the smoothness of the elements of

β0(·, ·).

Assumption 4:

Assumption 4.1 further imposes restrictions on the unknown factors in order to ensure that the

estimator βm given by (2.17) is not asymptotically biased in the sense of Theorem 3 of Bai (2009). The

current requirements of Assumption 4.1 are in the same spirit as Connor et al. (2012, Eq. 3 and Eq.

20) and Jiang et al. (2017, pp. 21–22). Without this assumption, some other types of conditions are


needed to achieve asymptotic normality. For example, one can require N/T → ρ with 0 < ρ <∞ and

establish the normality with biases as in Theorem 3 of Bai (2009).

Assumption 4.2 can be verified using a procedure similar to Lemma A.1 of Chen et al. (2012a).

However, it will lead to a quite lengthy derivation. For the sake of conciseness, we do not further

establish Assumption 4.2 from some low–level conditions.

Assumptions 5 and 5∗:

These two assumptions serve the purpose of identifying both γ0(·) and F0, but Assumption 5∗ is

a stronger version. These two assumptions allow us to avoid using an assumption like (4.7) of Jiang

et al. (2017), because we can derive their assumption from some fundamental conditions (see Lemma

A.3 of the Appendix A for details). Assumptions 5∗.1–5∗.2 are the same as Assumptions 3.2 and 4.1

of Fan et al. (2016). Assumption 5∗.2 is stronger than Assumption 5.2, as it imposes the restriction on

the sample directly. Replacing $F$ with $F_0$ in Assumption 5∗.3, it is easy to see that $\frac{1}{T}F_0'e_i = O_P\left(\frac{1}{\sqrt{T}}\right)$ for each given $i$, so Assumption 5∗.3 essentially further imposes a relationship between $N$ and $T$.

Assumption 5∗.4 is fairly standard.

A.2 Some Lemmas and Corollaries

A.2.1 The Case with Observable ft’s

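Lemma A.1. Under Assumptions 1 and 2, as $(N,T)\to(\infty,\infty)$,

1. (a) $\|C_\beta - C_{\beta_0}\| = O_P\left(m^{1/2}N^{-1/2}T^{-1}\right) + O_P(\kappa)$,

   (b) $\|\beta_m - \beta_0\|_{L^2} = O_P\left(m^{1/2}N^{-1/2}T^{-1}\right) + O_P(\kappa)$,

2. (a) $\|C_\gamma - C_{\gamma_0}\| = O_P\left(n^{1/2}(NT)^{-1/2}\right) + O_P(\kappa)$,

   (b) $\|\gamma_n - \gamma_0\|_{L^2} = O_P\left(n^{1/2}(NT)^{-1/2}\right) + O_P(\kappa)$,

where $\kappa = \max\left\{T^{1/2}m^{-\mu_2/2},\, n^{-\mu_1/2}\right\}$, and $\mu_1$ and $\mu_2$ are defined in Assumption 2.2.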

The leading terms of the above results should be expected in view of the literature (e.g., Bai et al.,

2009; Chen et al., 2012b; Dong et al., 2015b).

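Corollary A.1. Under Assumptions 1 and 2, as $(N,T)\to(\infty,\infty)$,

1. $\|C_\beta - C_{\beta_0}\| = O_P\left(m^{1/2}N^{-1/2}T^{-1}\right) + O_P\left(m^{-\mu_2/2}\right)$,

2. $\|\beta_m - \beta_0\|_{L^2} = O_P\left(m^{1/2}N^{-1/2}T^{-1}\right) + O_P\left(m^{-\mu_2/2}\right)$,

3. $\sup_{(w,u)\in\mathbb{R}\times[0,1]} \|\beta_m(w,u) - \beta_0(w,u)\| = O_P\left(mN^{-1/2}T^{-1}\right) + O_P\left(m^{(1-\mu_2)/2}\right)$.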

It is readily seen that the rates of convergence for Cβ and particularly for βm, comparing with Cβ

and βm, could be improved if the second term in the expressions dominates the first, since Corollary

A.1 only can enhance the rates of the second terms.
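As a rough illustration of how the two terms in Corollary A.1 trade off (a back-of-the-envelope calculation under the rates stated above, not a result from the paper), equating the stochastic term with the truncation term gives the order of $m$ at which they balance:

$$
m^{1/2}N^{-1/2}T^{-1} \asymp m^{-\mu_2/2}
\;\Longleftrightarrow\;
m^{(1+\mu_2)/2} \asymp N^{1/2}T
\;\Longleftrightarrow\;
m \asymp \left(NT^2\right)^{1/(1+\mu_2)}.
$$

For $m$ of larger order than this, the first (stochastic) term dominates. This is the regime selected by a condition of the form $NT^2/m^{1+\mu_2} \to 0$, which is the condition invoked in the proof of Theorem 2.4.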


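Corollary A.2. Under Assumptions 1 and 2, as $(N,T)\to(\infty,\infty)$,

1. $\|C_\beta - C_{\beta_0}\| = O_P\left(T^{-1/2}\right) + O_P\left(m^{-\mu_2/2}\right)$,

2. $\|\beta_m - \beta_0\|_{L^2} = O_P\left(T^{-1/2}\right) + O_P\left(m^{-\mu_2/2}\right)$,

3. $\sup_{(w,u)\in\mathbb{R}\times[0,1]} \|\beta_m(w,u) - \beta_0(w,u)\| = O_P\left(m^{1/2}T^{-1/2}\right) + O_P\left(m^{(1-\mu_2)/2}\right)$.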

The slow leading terms in the rates of convergence shown by Corollary A.2 are simply due to

ignoring the existence of the factor structure when using (2.14).

A.2.2 The Case with Unobservable ft’s

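Lemma A.2. Let Assumptions 1–3 hold. As $(N,T)\to(\infty,\infty)$,

1. $\|C_\beta - C_{\beta_0}\| = O_P\left(m^{1/2}N^{-1/2}T^{-1}\right) + O_P\left(m^{-\mu_2/2}\right)$,

2. $\|\beta_m - \beta_0\|_{L^2} = O_P\left(m^{1/2}N^{-1/2}T^{-1}\right) + O_P\left(m^{-\mu_2/2}\right)$,

3. $\|P_F - P_{F_0}\| = O_P\left(m^{1/4}(NT)^{-1/4}\right) + O_P\left(T^{1/4}m^{-\mu_2/4}\right) + O_P\left(N^{-1/2}\right)$,

where $\beta_m(\cdot,\cdot) = C_\beta B_m(\cdot,\cdot)$.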

Again, although we do not observe f0t’s, the rates of convergence achieved in Lemma A.2 are

identical to the case with observable f0t’s (i.e., Corollary A.1), under certain conditions.

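Lemma A.3. Suppose that Assumptions 1–3 and 5 hold. As $(N,T)\to(\infty,\infty)$,

1. $\|\Pi_{NT} - I_{d_v}\| = O_P\left(\sqrt{T}\,m^{-\mu_2/2}\right) + O_P\left(\frac{1}{\sqrt{N}}\right)$,

2. $\left\|\frac{1}{T}F_0'F - I_{d_v}\right\| = O_P\left(\sqrt{T}\,m^{-\mu_2/2}\right) + O_P\left(\frac{1}{\sqrt{N}}\right)$.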

Lemma A.4. Let Assumptions 1–3 and 5 hold. As $(N,T)\to(\infty,\infty)$,

1. $\|C_\gamma - C_{\gamma_0}\| = O_P\left(N^{-1/2}\right) + O_P(\kappa)$,

2. $\|\gamma_n - \gamma_0\|_{L^2} = O_P\left(N^{-1/2}\right) + O_P(\kappa)$,

where $\kappa$ is given in Theorem 2.1.

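Corollary A.3. Let Assumptions 1–3 and 5∗ hold. As $(N,T)\to(\infty,\infty)$,

1. $\|C_\gamma - C_{\gamma_0}\| = O_P\left(\sqrt{\frac{\max\{m,n\}}{NT}}\right) + O_P\left(\sqrt{\frac{n}{N^3}}\right) + O_P(\kappa)$,

2. $\|\gamma_n - \gamma_0\|_{L^2} = O_P\left(\sqrt{\frac{\max\{m,n\}}{NT}}\right) + O_P\left(\sqrt{\frac{n}{N^3}}\right) + O_P(\kappa)$.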


A.3 Proofs

Proof of Lemma 2.1:

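Recall that we have defined $\Gamma_0 = (\gamma_0(v_1), \ldots, \gamma_0(v_N))'$. We further define some variables, which will be repeatedly used below. Let $\Delta\phi_i[\beta_m] = \phi_i[\beta_{0,m}] - \phi_i[\beta_m]$ for $i \ge 1$, where $\phi_i[\beta]$ has been defined in (2.10). For any $F \in D_F$, let $\xi_F = \mathrm{vec}(M_FF_0)$, $A_1 = \frac{1}{NT}\sum_{i=1}^N Z_i'M_FZ_i$, $A_2 = \frac{1}{NT}(\Gamma_0'\Gamma_0)\otimes I_T$, and $A_3 = \frac{1}{NT}\sum_{i=1}^N \gamma_0(v_i)\otimes(M_FZ_i)$.

(1). We are now ready to start the proof. By (2)–(8) of Lemma B.5, it is straightforward to obtain that

$$
Q_{NT}(C_\beta, F) - Q_{NT}(C_{\beta_0}, F_0) = Q^*_{NT}(\beta_m, F) + o_P(1), \tag{A.3}
$$

where $Q^*_{NT}(\beta_m, F) = \frac{1}{NT}\sum_{i=1}^N \left(\Delta\phi_i[\beta_m] + F_0\gamma_0(v_i)\right)' M_F \left(\Delta\phi_i[\beta_m] + F_0\gamma_0(v_i)\right)$.

Further organise $Q^*_{NT}(\beta_m, F)$ as follows:

$$
\begin{aligned}
Q^*_{NT}(\beta_m, F)
&= \frac{1}{NT}\sum_{i=1}^N \Delta\phi_i[\beta_m]'M_F\Delta\phi_i[\beta_m] + \frac{1}{NT}\sum_{i=1}^N \gamma_0(v_i)'F_0'M_FF_0\gamma_0(v_i) + \frac{2}{NT}\sum_{i=1}^N \Delta\phi_i[\beta_m]'M_FF_0\gamma_0(v_i) \\
&= \mathrm{vec}(C_{\beta_0} - C_\beta)'\,\frac{1}{NT}\sum_{i=1}^N Z_i'M_FZ_i\,\mathrm{vec}(C_{\beta_0} - C_\beta) + \frac{1}{NT}\mathrm{tr}\left(M_FF_0\Gamma_0'\Gamma_0F_0'M_F\right) \\
&\quad + 2\,\mathrm{vec}(C_{\beta_0} - C_\beta)'\,\frac{1}{NT}\sum_{i=1}^N Z_i'M_FF_0\gamma_0(v_i) \\
&= \mathrm{vec}(C_{\beta_0} - C_\beta)'A_1\,\mathrm{vec}(C_{\beta_0} - C_\beta) + \xi_F'A_2\xi_F + 2\,\mathrm{vec}(C_{\beta_0} - C_\beta)'A_3'\xi_F \\
&= \sqrt{T}\,\mathrm{vec}(C_{\beta_0} - C_\beta)'\left(\frac{A_1 - A_3'A_2^{-1}A_3}{T}\right)\sqrt{T}\,\mathrm{vec}(C_{\beta_0} - C_\beta) \\
&\quad + \left[\xi_F' + \mathrm{vec}(C_{\beta_0} - C_\beta)'A_3'A_2^{-1}\right]A_2\left[\xi_F + A_2^{-1}A_3\,\mathrm{vec}(C_{\beta_0} - C_\beta)\right] \\
&= \sqrt{T}\,\mathrm{vec}(C_{\beta_0} - C_\beta)'\,\Omega_\dagger(F)\,\sqrt{T}\,\mathrm{vec}(C_{\beta_0} - C_\beta) \\
&\quad + \left[\xi_F' + \mathrm{vec}(C_{\beta_0} - C_\beta)'A_3'A_2^{-1}\right]A_2\left[\xi_F + A_2^{-1}A_3\,\mathrm{vec}(C_{\beta_0} - C_\beta)\right],
\end{aligned}
\tag{A.4}
$$

where the third equality follows from Bernstein (2005, Fact 7.4.6 and Fact 7.4.8 on p. 253), and $\Omega_\dagger(F)$ has been defined in Assumption 3.1.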
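The fourth equality in (A.4) is a completing-the-square step. The following sketch checks that algebra numerically with random inputs; the dimensions and matrices are hypothetical stand-ins (here `x` plays the role of $\mathrm{vec}(C_{\beta_0} - C_\beta)$ and `xi` the role of $\xi_F$), and the only structural assumption is that `A2` is symmetric and invertible.

```python
import numpy as np

def quadratic_forms(A1, A2, A3, x, xi):
    """Return both sides of the completing-the-square identity

    x'A1 x + xi'A2 xi + 2 x'A3' xi
        = x'(A1 - A3'A2^{-1}A3) x + (xi + A2^{-1}A3 x)' A2 (xi + A2^{-1}A3 x),

    which requires A2 to be symmetric and invertible.
    """
    lhs = x @ A1 @ x + xi @ A2 @ xi + 2 * x @ A3.T @ xi
    A2_inv_A3 = np.linalg.solve(A2, A3)     # A2^{-1} A3
    schur = A1 - A3.T @ A2_inv_A3           # A1 - A3' A2^{-1} A3
    shifted = xi + A2_inv_A3 @ x            # xi + A2^{-1} A3 x
    rhs = x @ schur @ x + shifted @ A2 @ shifted
    return lhs, rhs

rng = np.random.default_rng(0)
p, q = 3, 5                                 # hypothetical dimensions
G = rng.standard_normal((q, q))
A2 = G @ G.T + q * np.eye(q)                # symmetric positive definite
A1 = np.eye(p)
A3 = rng.standard_normal((q, p))
x, xi = rng.standard_normal(p), rng.standard_normal(q)
lhs, rhs = quadratic_forms(A1, A2, A3, x, xi)
```

The factors $\sqrt{T}$ and $1/T$ in (A.4) cancel, so they are omitted from the check.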

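In view of Assumption 3.1, by the same arguments as in Bai (2009, p. 1265), we obtain that $\sqrt{T}\|C_{\beta_0} - C_\beta\| = o_P(1)$. Therefore, further write

$$
\begin{aligned}
\sqrt{T}\|\beta_m - \beta_0\|_{L^2}
&= \sqrt{T}\|\beta_m - \beta_{0,m}\|_{L^2} + \sqrt{T}\|\Delta\beta_{0,m}\|_{L^2} \\
&= \sqrt{T}\|C_\beta - C_{\beta_0}\| + \sqrt{T}\|\Delta\beta_{0,m}\|_{L^2} \\
&= o_P(1) + \sqrt{T}\,O_P\left(m^{-\mu_2/2}\right) = o_P(1),
\end{aligned}
\tag{A.5}
$$

where the second equality follows from the definition of $\|\cdot\|_{L^2}$, the third equality follows from $\sqrt{T}\|C_{\beta_0} - C_\beta\| = o_P(1)$ and Assumption 2.2, and the last equality follows from Assumption 2.3.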


(2). By the first result of this lemma, we further obtain that

$$
0 \ge Q_{NT}(C_\beta, F) - Q_{NT}(C_{\beta_0}, F_0) = \frac{1}{NT}\mathrm{tr}\left[\left(F_0'M_FF_0\right)\left(\Gamma_0'\Gamma_0\right)\right] + o_P(1),
$$

which indicates that $\frac{1}{NT}\mathrm{tr}\left[(F_0'M_FF_0)(\Gamma_0'\Gamma_0)\right] = o_P(1)$. As in Bai (2009, p. 1265), we can further conclude that $\frac{1}{T}\mathrm{tr}(F_0'M_FF_0) = o_P(1)$, $\frac{1}{T}F'F_0$ is invertible with probability approaching one, and $\|P_F - P_{F_0}\| = o_P(1)$. Then the proof is completed.

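Following the same arguments as in Bai (2009, p. 1236), equation (2.17) can be further decomposed into the following two expressions:

$$
\begin{aligned}
C_\beta &= \operatorname*{argmin}_{C_\beta \in B_T} \frac{1}{NT}\sum_{i=1}^N \left(Y_i - \phi_i[\beta_m]\right)' M_F \left(Y_i - \phi_i[\beta_m]\right), \\
\left[\frac{1}{NT}\sum_{i=1}^N \left(Y_i - \phi_i[\beta_m]\right)\left(Y_i - \phi_i[\beta_m]\right)'\right] F &= F\,V_{NT},
\end{aligned}
\tag{A.6}
$$

where $V_{NT}$ is a diagonal matrix with the diagonal being the $d_v$ largest eigenvalues of $\frac{1}{NT}\sum_{i=1}^N \left(Y_i - \phi_i[\beta_m]\right)\left(Y_i - \phi_i[\beta_m]\right)'$ arranged in descending order.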
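The two expressions in (A.6) suggest an alternating scheme in the style of Bai (2009): given the factors, update the coefficients by least squares after projecting the factors out; given the coefficients, update the factors from the leading eigenvectors of the residual second-moment matrix. The sketch below is a minimal illustration and not the authors' implementation. It assumes the sieve term is linear in a coefficient vector, $\phi_i[\beta_m] = Z_i c$, it ignores the constraint set in (A.6), and the function name and arguments are hypothetical.

```python
import numpy as np

def interactive_effects_fit(Y, Z, d_v, n_iter=200, tol=1e-10):
    """Alternate between the two equations of (A.6) for Y_i = Z_i c + F gamma_i + e_i.

    Y: (N, T) outcomes; Z: (N, T, p) regressors; d_v: number of factors.
    Returns the coefficient vector c (p,) and the factor matrix F (T, d_v),
    normalised so that F'F / T = I.
    """
    N, T, p = Z.shape
    # Starting value: pooled least squares that ignores the factor structure.
    c = np.linalg.lstsq(Z.reshape(N * T, p), Y.reshape(N * T), rcond=None)[0]
    F = np.zeros((T, d_v))
    for _ in range(n_iter):
        R = Y - Z @ c                                # (N, T) residuals
        S = R.T @ R / (N * T)                        # residual second-moment matrix
        _, vecs = np.linalg.eigh(S)                  # eigenvalues in ascending order
        F = np.sqrt(T) * vecs[:, ::-1][:, :d_v]      # d_v leading eigenvectors
        M = np.eye(T) - F @ F.T / T                  # M_F, using F'F / T = I
        A = np.einsum('itp,ts,isq->pq', Z, M, Z)     # sum_i Z_i' M_F Z_i
        b = np.einsum('itp,ts,is->p', Z, M, Y)       # sum_i Z_i' M_F Y_i
        c_new = np.linalg.solve(A, b)
        converged = np.max(np.abs(c_new - c)) < tol
        c = c_new
        if converged:
            break
    return c, F
```

Because the factors are identified only up to rotation, the returned `F` should be compared with the true factors through the projection $FF'/T$ rather than element by element.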

Proofs of Theorems 2.1–2.3:

The proofs are given in the last part of Appendix B of the supplementary document.

Proof of Theorem 2.4:

Recall that we have denoted $A_{1NT}$, $A_{2NT}$ and $A_{3,i}$ in (B.20) of the supplementary file, and we will keep using these notations here. By the definition of $\beta_m$ and (2.5), we can write for all $(w,u) \in \mathbb{R}\times[0,1]$

$$
\beta_m(w,u) - \beta_0(w,u) = \left(C_\beta - C_{\beta_0}\right)B_m(w,u) + \Delta\beta_{0,m}(w,u).
$$

Therefore, we can further write

$$
\begin{aligned}
\frac{N^{1/2}T}{m^{1/2}}\left(\beta_m(w,u) - \beta_0(w,u)\right)
&= \frac{N^{1/2}T}{m^{1/2}}\left[B_m'(w,u)\otimes I_{d_x}\right]\left[\mathrm{vec}(C_\beta) - \mathrm{vec}(C_{\beta_0})\right] + \frac{N^{1/2}T}{m^{1/2}}\Delta\beta_{0,m}(w,u) \\
&= \frac{N^{1/2}T}{m^{1/2}}\left[B_m'(w,u)\otimes I_{d_x}\right]\left[\mathrm{vec}(C_\beta) - \mathrm{vec}(C_{\beta_0})\right] + o_P(1) \\
&= \frac{N^{1/2}T}{m^{1/2}}\left[B_m'(w,u)\otimes I_{d_x}\right]A_{1NT}^{-1}\Sigma_\dagger^{-1}\cdot\frac{1}{NT^{3/2}}\sum_{i=1}^N\left\{\frac{Z_i'}{\sqrt{T}}M_F + A_{3,i}\right\}e_i \\
&\quad + \frac{N^{1/2}T}{m^{1/2}}\left[B_m'(w,u)\otimes I_{d_x}\right]A_{1NT}^{-1}\Sigma_\dagger^{-1}\cdot J_{6NT,1} + o_P(1) \\
&= \frac{N^{1/2}T}{m^{1/2}}\left[B_m'(w,u)\otimes I_{d_x}\right]A_{1NT}^{-1}\Sigma_\dagger^{-1}\cdot\frac{1}{NT^{3/2}}\sum_{i=1}^N\left\{\frac{Z_i'}{\sqrt{T}}M_F + A_{3,i}\right\}e_i + o_P(1) \\
&:= \Lambda_1 + o_P(1),
\end{aligned}
$$

where the second equality follows from $\|\Delta\beta_{0,m}(w,u)\| = O(m^{-\mu_2/2})$ and the condition $\frac{NT^2}{m^{1+\mu_2}} \to 0$, the third equality follows from the proof of Lemma A.2, and the fourth equality follows from (B.21) of the supplementary file and the condition $\frac{m}{NT} \to 0$.

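We then just need to consider $\Lambda_1$. Start from $\frac{1}{NT^{3/2}}\sum_{i=1}^N \frac{Z_i'}{\sqrt{T}}M_Fe_i$:

$$
\begin{aligned}
\frac{1}{NT^{3/2}}\sum_{i=1}^N \frac{Z_i'}{\sqrt{T}}M_Fe_i
&= \frac{1}{NT^{3/2}}\sum_{i=1}^N \frac{Z_i'}{\sqrt{T}}M_{F_0}e_i + \frac{1}{NT^{3/2}}\sum_{i=1}^N \frac{Z_i'}{\sqrt{T}}\left(M_F - M_{F_0}\right)e_i \\
&= \frac{1}{NT^{3/2}}\sum_{i=1}^N \frac{Z_i'}{\sqrt{T}}M_{F_0}e_i - \frac{1}{NT^{3/2}}\sum_{i=1}^N \frac{Z_i'}{\sqrt{T}}\left(P_F - P_{F_0}\right)e_i \\
&:= D_1 - D_2.
\end{aligned}
$$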

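Firstly, we shall show $\left\|\frac{N^{1/2}T}{m^{1/2}}\left[B_m'(w,u)\otimes I_{d_x}\right]A_{1NT}^{-1}\Sigma_\dagger^{-1}D_2\right\| = o_P(1)$. Denote $U_{iT} = \frac{Z_i}{\sqrt{T}}$ and let $U_{iT,j}$ be the $j$th column of $U_{iT}$. Write

$$
\begin{aligned}
D_2 &= \frac{1}{NT^{3/2}}\sum_{i=1}^N U_{iT}'\left(\frac{FF'}{T} - P_{F_0}\right)e_i \\
&= \frac{1}{NT^{3/2}}\sum_{i=1}^N U_{iT}'\frac{(F - F_0\Pi_{NT})}{T}\Pi_{NT}'F_0'e_i + \frac{1}{NT^{3/2}}\sum_{i=1}^N U_{iT}'\frac{(F - F_0\Pi_{NT})}{T}(F - F_0\Pi_{NT})'e_i \\
&\quad + \frac{1}{NT^{3/2}}\sum_{i=1}^N U_{iT}'\frac{F_0\Pi_{NT}}{T}(F - F_0\Pi_{NT})'e_i + \frac{1}{NT^{3/2}}\sum_{i=1}^N U_{iT}'\frac{F_0}{T}\left[\Pi_{NT}\Pi_{NT}' - (F_0'F_0/T)^{-1}\right]F_0'e_i \\
&:= D_{21} + D_{22} + D_{23} + D_{24},
\end{aligned}
$$

where the definitions of $D_{21}$ to $D_{24}$ should be obvious.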
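The four-term split of $D_2$ rests on a purely algebraic decomposition of $FF'/T - P_{F_0}$, which can be verified numerically. In the sketch below the function name is hypothetical and the inputs are arbitrary random matrices; the identity holds for any $F$, any full-column-rank $F_0$ and any square $\Pi_{NT}$, so no estimation is involved.

```python
import numpy as np

def projection_gap_terms(F_hat, F0, Pi):
    """Four pieces whose sum equals F_hat F_hat'/T - P_{F0},
    where P_{F0} = F0 (F0'F0)^{-1} F0'."""
    T = F0.shape[0]
    D = F_hat - F0 @ Pi                          # F - F0 Pi
    G_inv = np.linalg.inv(F0.T @ F0 / T)         # (F0'F0 / T)^{-1}
    t1 = D @ Pi.T @ F0.T / T                     # kernel of D21
    t2 = D @ D.T / T                             # kernel of D22
    t3 = F0 @ Pi @ D.T / T                       # kernel of D23
    t4 = F0 @ (Pi @ Pi.T - G_inv) @ F0.T / T     # kernel of D24
    return t1, t2, t3, t4
```

Sandwiching each piece between $U_{iT}'$ and $e_i$ and averaging over $i$ with the factor $1/(NT^{3/2})$ gives $D_{21}$ to $D_{24}$ in the same order.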

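In the following, we let $D_{2\ell,j}$ be the $j$th row of $D_{2\ell}$ for $\ell = 1, 2, 3, 4$. Thus, for $D_{21}$, consider

$$
\begin{aligned}
\|D_{21,j}\| &= \left\|\frac{1}{NT^{3/2}}\sum_{i=1}^N U_{iT,j}'\frac{(F - F_0\Pi_{NT})}{T}\Pi_{NT}'F_0'e_i\right\| \\
&\le \left\|\frac{1}{NT^{3/2}}\sum_{i=1}^N (e_i'F_0)\otimes\frac{U_{iT,j}'}{\sqrt{T}}\right\|\cdot\left\|\frac{1}{\sqrt{T}}\mathrm{vec}\left[(F - F_0\Pi_{NT})\Pi_{NT}'\right]\right\| \\
&= O_P\left(\frac{1}{N^{1/2}T}\right)\frac{1}{\sqrt{T}}\|F - F_0\Pi_{NT}\|,
\end{aligned}
$$

where the second equality follows from the development similar to (B.15) of the supplementary file. Summing up over $j$ for $D_{21,j}$, we obtain that $\|D_{21}\| = O_P\left(\frac{m^{1/2}}{N^{1/2}T}\right)\frac{1}{\sqrt{T}}\|F - F_0\Pi_{NT}\|$.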

For $D_{22}$, write

$$
\begin{aligned}
\|D_{22,j}\| &= \left\|\frac{1}{NT^{3/2}}\sum_{i=1}^N U_{iT,j}'\frac{(F - F_0\Pi_{NT})}{T}(F - F_0\Pi_{NT})'e_i\right\| \\
&\le \left\|\frac{1}{NT}\sum_{i=1}^N e_i'\otimes\frac{U_{iT,j}'}{\sqrt{T}}\right\|\cdot\left\|\frac{1}{T}\mathrm{vec}\left[(F - F_0\Pi_{NT})(F - F_0\Pi_{NT})'\right]\right\| \\
&= O_P\left(\frac{1}{\sqrt{NT}}\right)\frac{1}{T}\|F - F_0\Pi_{NT}\|^2,
\end{aligned}
$$

where the second equality follows from the development similar to (B.14) of the supplementary file. Summing $D_{22,j}$ up over $j$, we obtain that $\|D_{22}\| = O_P\left(\sqrt{\frac{m}{NT}}\right)\frac{1}{T}\|F - F_0\Pi_{NT}\|^2$.

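For $D_{23}$, write

$$
\begin{aligned}
\|D_{23,j}\| &= \left\|\frac{1}{NT^{3/2}}\sum_{i=1}^N U_{iT,j}'\frac{F_0\Pi_{NT}}{T}(F - F_0\Pi_{NT})'e_i\right\| \\
&\le \left\|\frac{1}{NT}\sum_{i=1}^N e_i'\otimes\frac{U_{iT,j}'F_0}{T}\right\|\cdot\left\|\frac{1}{\sqrt{T}}\mathrm{vec}\left[\Pi_{NT}(F - F_0\Pi_{NT})'\right]\right\|.
\end{aligned}
$$

Note that

$$
\begin{aligned}
E\left\|\frac{1}{NT}\sum_{i=1}^N e_i'\otimes\frac{U_{iT,j}'F_0}{T}\right\|^2
&= \frac{1}{N^2T^4}E\left\|\sum_{i=1}^N e_i'\otimes\sum_{t=1}^T\frac{z_{it,j}}{\sqrt{T}}f_{0t}'\right\|^2 \\
&= \frac{1}{N^2T^4}\sum_{s=1}^T E\left\|\sum_{i=1}^N e_{is}\sum_{t=1}^T\frac{z_{it,j}}{\sqrt{T}}f_{0t}'\right\|^2 \\
&= \frac{1}{N^2T^4}\sum_{s=1}^T\sum_{i_1=1}^N\sum_{i_2=1}^N E\left[\left(\sum_{t=1}^T\frac{z_{i_1t,j}}{\sqrt{T}}f_{0t}'\right)\left(\sum_{t=1}^T\frac{z_{i_2t,j}}{\sqrt{T}}f_{0t}\right)E\left[e_{i_1s}e_{i_2s}\mid\mathcal{X}_{i_1T},\mathcal{X}_{i_2T}\right]\right] \\
&= \frac{1}{N^2T^3}\sum_{i_1=1}^N\sum_{i_2=1}^N E\left[\left(\sum_{t=1}^T\frac{z_{i_1t,j}}{\sqrt{T}}f_{0t}'\right)\left(\sum_{t=1}^T\frac{z_{i_2t,j}}{\sqrt{T}}f_{0t}\right)\right]\sigma_{i_1i_2} \\
&= \frac{1}{N^2T^3}\sum_{t=1}^T\sum_{i_1=1}^N\sum_{i_2=1}^N E\left[\frac{z_{i_1t,j}}{\sqrt{T}}\frac{z_{i_2t,j}}{\sqrt{T}}E\left[\|f_{0t}\|^2\mid\mathcal{E}_{Nt},\mathcal{R}_{N,tt}\right]\right]\sigma_{i_1i_2} \\
&\quad + \frac{2}{N^2T^3}\sum_{t_1>t_2}\sum_{i_1=1}^N\sum_{i_2=1}^N E\left[\frac{z_{i_1t_1,j}}{\sqrt{T}}\frac{z_{i_2t_2,j}}{\sqrt{T}}E\left[f_{0t_1}'f_{0t_2}\mid\mathcal{E}_{Nt_1},\mathcal{R}_{N,t_1t_2}\right]\right]\sigma_{i_1i_2} \\
&= \frac{1}{N^2T^3}\sum_{t=1}^T\sum_{i_1=1}^N\sum_{i_2=1}^N E\left[\frac{z_{i_1t,j}}{\sqrt{T}}\frac{z_{i_2t,j}}{\sqrt{T}}\right]a_{tt}\sigma_{i_1i_2} \\
&\quad + \frac{2}{N^2T^3}\sum_{t_1>t_2}\sum_{i_1=1}^N\sum_{i_2=1}^N E\left[\frac{z_{i_1t_1,j}}{\sqrt{T}}\frac{z_{i_2t_2,j}}{\sqrt{T}}\right]a_{t_1t_2}\sigma_{i_1i_2} \\
&\le O(1)\frac{2}{N^2T^3}\sum_{t_1\ge t_2}\sum_{i_1=1}^N\sum_{i_2=1}^N |a_{t_1t_2}|\cdot|\sigma_{i_1i_2}| = O(1)\frac{1}{NT^2},
\end{aligned}
$$

where $z_{it,j}$ stands for the $j$th element of $z_{it}$; the fourth equality follows from Assumption 1.4.c; the sixth equality follows from Assumption 4.1; and the seventh equality follows from both Assumptions 1.4.c and 4.1.

Thus, $\|D_{23,j}\| = O_P\left(\frac{1}{N^{1/2}T}\right)\frac{1}{\sqrt{T}}\|F - F_0\Pi_{NT}\|$. Summing $D_{23,j}$ up over $j$, we obtain that $\|D_{23}\| = O_P\left(\frac{m^{1/2}}{N^{1/2}T}\right)\frac{1}{\sqrt{T}}\|F - F_0\Pi_{NT}\|$.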

For $D_{24}$, write

$$
\begin{aligned}
\|D_{24,j}\| &= \left\|\frac{1}{NT^{3/2}}\sum_{i=1}^N U_{iT,j}'\frac{F_0}{T}\left[\Pi_{NT}\Pi_{NT}' - (F_0'F_0/T)^{-1}\right]F_0'e_i\right\| \\
&\le \left\|\frac{1}{NT^{3/2}}\sum_{i=1}^N (e_i'F_0)\otimes\frac{U_{iT,j}'F_0}{T}\right\|\cdot\left\|\Pi_{NT}\Pi_{NT}' - (F_0'F_0/T)^{-1}\right\| \\
&= O_P\left(\frac{1}{N^{1/2}T}\right)\left\|\Pi_{NT}\Pi_{NT}' - (F_0'F_0/T)^{-1}\right\|,
\end{aligned}
$$

where the second equality follows from the development similar to (B.15) of the supplementary file. Summing $D_{24,j}$ up over $j$, we obtain that $\|D_{24}\| = O_P\left(\frac{m^{1/2}}{N^{1/2}T}\right)\left\|\Pi_{NT}\Pi_{NT}' - (F_0'F_0/T)^{-1}\right\|$.

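Based on the analyses of $D_{21}$ to $D_{24}$, we obtain

$$
\frac{N^{1/2}T}{m^{1/2}}\|D_2\| = O_P(1)\frac{1}{\sqrt{T}}\|F - F_0\Pi_{NT}\| + O_P(1)\left\|\Pi_{NT}\Pi_{NT}' - (F_0'F_0/T)^{-1}\right\| + O_P(1)\sqrt{T}\cdot\frac{1}{T}\|F - F_0\Pi_{NT}\|^2,
$$

which further gives $\left\|\frac{N^{1/2}T}{m^{1/2}}\left[B_m'(w,u)\otimes I_{d_x}\right]A_{1NT}^{-1}\Sigma_\dagger^{-1}D_2\right\| = o_P(1)$ given the condition $\frac{mT}{N^2} \to 0$.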

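Similarly, we obtain

$$
\left\|\frac{N^{1/2}T}{m^{1/2}}\left[B_m'(w,u)\otimes I_{d_x}\right]A_{1NT}^{-1}\Sigma_\dagger^{-1}\frac{1}{NT^{3/2}}\sum_{i=1}^N A_{3,i}e_i - \frac{N^{1/2}T}{m^{1/2}}\left[B_m'(w,u)\otimes I_{d_x}\right]\tilde{A}_{1NT}^{-1}\Sigma_\dagger^{-1}\frac{1}{NT^{3/2}}\sum_{i=1}^N \tilde{A}_{3,i}e_i\right\| = o_P(1),
$$

where $\tilde{A}_{3,i} = \frac{1}{N}\sum_{j=1}^N \frac{Z_j'}{\sqrt{T}}M_{F_0}\,\gamma_0(v_j)'(\Gamma_0'\Gamma_0/N)^{-1}\gamma_0(v_i)$.

Finally, by Assumption 4 and after some simple algebra, we obtain

$$
\Lambda_1 = \frac{N^{1/2}T}{m^{1/2}}\left[B_m'(w,u)\otimes I_{d_x}\right]\tilde{A}_{1NT}^{-1}\Sigma_\dagger^{-1}\cdot\frac{1}{NT^{3/2}}\sum_{i=1}^N\left\{\frac{Z_i'}{\sqrt{T}}M_{F_0} + \tilde{A}_{3,i}\right\}e_i + o_P(1) \to_D N(0, \Omega_\star),
$$

where $\tilde{A}_{1NT} = I_{md_x} - \Sigma_\dagger^{-1}\tilde{A}_{2NT}$ and $\tilde{A}_{2NT} = \frac{1}{N^2T}\sum_{i=1}^N\sum_{j=1}^N \frac{Z_i'}{\sqrt{T}}M_{F_0}\frac{Z_j}{\sqrt{T}}\,\gamma_0(v_j)'\Sigma_\gamma^{-1}\gamma_0(v_i)$. The proof is then completed.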

Proof of Theorem 2.5:

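(1). Write

$$
\begin{aligned}
\frac{1}{\sqrt{T}}\|F - F_0\| &= \frac{1}{\sqrt{T}}\left\|F - F\Pi_{NT}^{-1} + F\Pi_{NT}^{-1} - F_0\right\| \le \frac{1}{\sqrt{T}}\left\|F(I_{d_v} - \Pi_{NT}^{-1})\right\| + \frac{1}{\sqrt{T}}\left\|F\Pi_{NT}^{-1} - F_0\right\| \\
&= O_P\left(\sqrt{T/m^{\mu_2}} + \frac{1}{\sqrt{N}}\right),
\end{aligned}
$$

where the second equality follows from Lemma B.2, (1) of Lemma A.3 and (2) of Lemma B.6.

(2). Similar to the development of Theorem 2.1, the second result follows immediately, building on the proof of Lemma A.4.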

Proof of Corollary 2.1:

(1). For each fixed $t$, we consider the asymptotic distribution of $\sqrt{N}(f_t - f_{0t})$. Note that using Assumption 5∗.2, the rates of convergence shown in Lemma A.3 can be improved as follows:

$$
\|\Pi_{NT} - I_{d_v}\| = O_P\left(\sqrt{m/(NT)} + \sqrt{T/m^{\mu_2}} + \frac{1}{N}\right), \qquad
\left\|\frac{1}{T}F_0'F - I_{d_v}\right\| = O_P\left(\sqrt{m/(NT)} + \sqrt{T/m^{\mu_2}} + \frac{1}{N}\right). \tag{A.7}
$$

Note further that in the body of this theorem, we have imposed that $\frac{NT}{m^{\mu_2}} \to 0$. Thus, by (B.11) of the supplementary file, we write

$$
\begin{aligned}
\sqrt{N}(f_t - f_{0t}) &= \sqrt{N}(f_t - \Pi_{NT}^{-1}f_t) + \sqrt{N}(\Pi_{NT}^{-1}f_t - f_{0t}) = \sqrt{N}(\Pi_{NT}^{-1}f_t - f_{0t}) + o_P(1) \\
&= \sqrt{N}(\Gamma_0'\Gamma_0/N)^{-1}(F'F_0/T)^{-1}\frac{1}{NT}\sum_{i=1}^N\left(F'e_ie_{it} + F'e_i\gamma_0'(v_i)f_{0t} + F'F_0\gamma_0(v_i)e_{it}\right) + o_P(1) \\
&= \sqrt{N}(\Gamma_0'\Gamma_0/N)^{-1}\frac{1}{N}\sum_{i=1}^N\gamma_0(v_i)e_{it} + o_P(1) \to_D N\left(0, \Sigma_\gamma^{-1}\Sigma_t(\gamma)\Sigma_\gamma^{-1}\right),
\end{aligned}
$$

where the second equality follows from (A.7) and $\frac{NT}{m^{\mu_2}} \to 0$; the third equality follows from the proof of Lemma B.6; the fourth equality follows from

$$
\begin{aligned}
\left\|(\Gamma_0'\Gamma_0/N)^{-1}(F'F_0/T)^{-1}\right\|\frac{\sqrt{N}}{T}\sup_{i\ge1}\;\sup_{\{F\in\mathcal{F}\,\mid\,\frac{1}{\sqrt{T}}\|F - F_0\|\le\varepsilon\}}\|F'e_i\|\;\frac{1}{N}\sum_{i=1}^N|e_{it}| &= o_P(1), \\
\left\|(\Gamma_0'\Gamma_0/N)^{-1}(F'F_0/T)^{-1}\right\|\frac{\sqrt{N}}{T}\sup_{i\ge1}\;\sup_{\{F\in\mathcal{F}\,\mid\,\frac{1}{\sqrt{T}}\|F - F_0\|\le\varepsilon\}}\|F'e_i\|\;\frac{1}{N}\sum_{i=1}^N\|\gamma_0(v_i)\|\|f_{0t}\| &= o_P(1),
\end{aligned}
$$

by Assumption 5∗.3; and the last step follows from Assumption 5∗.4.

Thus, we can conclude that for each fixed $t$, we have $\sqrt{N}(f_t - f_{0t}) \to_D N\left(0, \Sigma_\gamma^{-1}\Sigma_t(\gamma)\Sigma_\gamma^{-1}\right)$.

(2). By the proof of Corollary A.3, it is easy to show that the second result follows.

References

Bai, J. (2009), ‘Panel data models with interactive fixed effects’, Econometrica 77(4), 1229–1279.

Bai, J. and Carrion-I-Silvestre, J. L. (2009), ‘Structural changes, common stochastic trends, and unit roots in

panel data’, The Review of Economic Studies 76(2), 471–501.

Bai, J., Kao, C. and Ng, S. (2009), ‘Panel cointegration with global stochastic trends’, Journal of Econometrics

149(1), 82–99.

Bai, J. and Liao, Y. (2017), ‘Inferences in panel data with interactive effects using large covariance matrices’,

Journal of Econometrics 200(1), 59–78.

Bai, J. and Ng, S. (2013), ‘Principal components estimation and identification of static factors’, Journal of Econometrics 176(1), 18–29.

Berger, A. N., Demsetz, R. S. and Strahan, P. E. (1999), ‘The consolidation of the financial services industry:

Causes, consequences, and implications for the future’, Journal of Banking and Finance 23(2-4), 135–194.


Berger, A. N. and Mester, L. J. (2003), ‘Explaining the dramatic changes in the performance of U.S. banks:

Technological change, deregulation, and dynamic changes in competition’, Journal of Financial Intermediation

12(1), 57–95.

Bernstein, D. S. (2005), Matrix Mathematics: Theory, Facts, and Formulas, Princeton University Press.

Chang, J., Guo, B. and Yao, Q. (2015), ‘High dimensional stochastic regression with latent factors, endogeneity

and nonlinearity’, Journal of Econometrics 189(2), 297–312.

Chen, J., Gao, J. and Li, D. (2012a), ‘A new diagnostic test for cross–section uncorrelatedness in nonparametric

panel data models’, Econometric Theory 28(5), 1144–1163.

Chen, J., Gao, J. and Li, D. (2012b), ‘Semiparametric trending panel data models with cross–sectional dependence’, Journal of Econometrics 171(1), 71–85.

Connor, G., Hagmann, M. and Linton, O. (2012), ‘Efficient semiparametric estimation of the Fama-French model and extensions’, Econometrica 80(2), 713–754.

Dong, C., Gao, J. and Peng, B. (2015a), Partially linear panel data models with cross-sectional dependence and

nonstationarity. Working paper at https://ideas.repec.org/p/msh/ebswps/2015-7.html.

Dong, C., Gao, J. and Peng, B. (2015b), ‘Semiparametric single-index panel data models with cross-sectional

dependence’, Journal of Econometrics 188, 301–312.

Dong, C. and Linton, O. (2017), Additive nonparametric models with time variable and both stationary and

nonstationary regressors. Working paper available at https://ssrn.com/abstract=2847681.

Fan, J. and Li, R. (2001), ‘Variable selection via nonconcave penalized likelihood and its oracle properties’,

Journal of the American Statistical Association 96(456), 1348–1360.

Fan, J., Liao, Y. and Wang, W. (2016), ‘Projected Principal Component Analysis in Factor Models’, The Annals

of Statistics 44(1), 219–254.

Feng, G., Gao, J., Peng, B. and Zhang, X. (2017), ‘A varying–coefficient panel data model with fixed effects: theory and an application to the US commercial banks’, Journal of Econometrics 196(1), 68–82.

Feng, G. and Serletis, A. (2008), ‘Productivity trends in U.S. manufacturing: Evidence from the NQ and AIM

cost functions’, Journal of Econometrics 142(1), 281–311.

Feng, G. and Zhang, X. (2012), ‘Productivity and efficiency at large and community banks in the U.S.: A Bayesian true random effects stochastic distance frontier analysis’, Journal of Banking and Finance 36(7), 1883–1895.

Gao, J. (2007), Nonlinear Time Series: Semi– and Non–Parametric Methods, Chapman & Hall/CRC.

Hall, P., Li, Q. and Racine, J. S. (2007), ‘Nonparametric estimation of regression functions in the presence of

irrelevant regressors’, Review of Economics and Statistics 89(4), 784–789.

Hansen, B. (2015), A unified asymptotic distribution theory for parametric and non-parametric least squares. Working paper available at https://www.ssc.wisc.edu/~bhansen/preliminary/rnormal4.pdf.

Hsiao, C. (2003), Analysis of Panel Data, Cambridge University Press.


Jiang, B., Yang, Y., Gao, J. and Hsiao, C. (2017), Recursive estimation in large panel data models: Theory and

practice. Working paper available at https://ssrn.com/abstract=2915749.

Jones, K. D. and Critchfield, T. (2005), ‘Consolidation in the U.S. banking industry: Is the long, strange trip

about to end?’, FDIC Banking Review 17(4), 31–61.

Kapetanios, G., Pesaran, M. H. and Yamagata, T. (2011), ‘Panels with non-stationary multifactor error structures’, Journal of Econometrics 160(2), 326–348.

Koo, B. and Linton, O. (2012), ‘Estimation of semiparametric locally stationary diffusion models’, Journal of

Econometrics 170(1), 210–233.

Li, D., Qian, J. and Su, L. (2016), ‘Panel data models with interactive fixed effects and multiple structural

breaks’, Journal of the American Statistical Association 111(516), 1804–1819.

Newey, W. K. (1997), ‘Convergence rates and asymptotic normality for series estimators’, Journal of Econometrics

79(1), 147–168.

Park, J. Y. and Phillips, P. C. B. (2001), ‘Nonlinear regressions with integrated time series’, Econometrica

69(1), 117–161.

Pedroni, P. (2004), ‘Panel cointegration: Asymptotic and finite sample properties of pooled time series tests with an application of the PPP hypothesis’, Econometric Theory 20, 597–625.

Pesaran, M. H. (2006), ‘Estimation and inference in large heterogeneous panels with a multifactor error structure’,

Econometrica 74(4), 967–1012.

Phillips, P. C. B. and Moon, H. R. (1999), ‘Linear regression limit theory for nonstationary panel data’, Econometrica 67(5), 1057–1111.

Sealey, C. W. and Lindley, J. T. (1977), ‘Inputs, outputs, and a theory of production and cost at depository

financial institutions’, Journal of Finance 32(4), 1251–1266.

Stiroh, K. (2000), ‘How did bank holding companies prosper in the 1990s?’, Journal of Banking and Finance

24(11), 1703–1745.

Su, L. and Jin, S. (2012), ‘Sieve estimation of panel data models with cross section dependence’, Journal of

Econometrics 169(1), 34–47.

Vogt, M. (2012), ‘Nonparametric regression for locally stationary time series’, The Annals of Statistics

40(5), 2601–2633.

Wang, H. and Xia, Y. (2009), ‘Shrinkage estimation of the varying coefficient’, Journal of the American Statistical Association 104(486), 747–757.
