A robust bootstrap approach to the Hausman test in stationary panel data models with general error covariance structure

Helmut Herwartz∗ Michael H. Neumann†

February 4, 2008

Abstract

In panel data econometrics the Hausman test is of central importance for selecting an efficient estimator of the model's slope parameters. For testing the null hypothesis of no correlation between unobserved heterogeneity and observable explanatory variables, model disturbances are typically assumed to be independent and identically distributed (iid) over the time and the cross section dimension. The test statistic lacks pivotalness if the iid assumption is violated. GLS based test statistics also build upon strong homogeneity restrictions that might not be met by empirical data. We propose a wild bootstrap approach to specification testing in panel data models which is robust under cross sectional or time heteroskedasticity and inhomogeneous patterns of serial correlation. A Monte Carlo study shows that in small samples the wild bootstrap outperforms inference based on critical values taken from a $\chi^2$-distribution. Moreover, iid resampling schemes, which serve as a benchmark, fail under cross sectional heterogeneity.

Keywords: Hausman test, random effects model, wild bootstrap, heteroskedasticity.

JEL Classification: C12, C33.

∗ Institut für Statistik und Ökonometrie, Christian-Albrechts-Universität zu Kiel, Ohlshausenstr. 40, D-24098 Kiel, E-mail: [email protected] (corresponding author)

† Institut für Stochastik, Friedrich-Schiller-Universität Jena, Ernst-Abbe-Platz 2, D-07743 Jena, E-mail: [email protected]

1 Introduction

Panel data models are often formalized under conditional absence of serial error correlation and homoskedasticity over both the time and cross section dimension. Model disturbances are often modelled as iid random variables in microeconometric studies where a set of anonymous households or firms enters the analysis. Even in such widespread applications of panel models, however, (neglected) dynamic features might show up in autocorrelated error terms. The use of regional panel data has recently become more and more popular in macro- and spatial econometrics. Typical fields where panel data models are employed cover, for instance, models of growth, international or interregional trade, or empirical approaches to urban crime or environmental economics (Baltagi and Kao, 2000; Anselin, Florax and Rey, 2004). A core issue in panel specification testing is the selection of an efficient estimator in the presence of unobserved heterogeneity. The Hausman test has become a prominent means of inference against correlation between individual effects and observable explanatory variables (Hausman, 1978). In applied spatial econometrics serial error correlation is likely to emerge whenever a region only partially absorbs a shock within the unit of time used as the sampling frequency. The presumption of time invariant error variances may also be criticized. In the econometrics of financial data time dependent variances have attracted huge theoretical and empirical interest (Bollerslev, Chou and Kroner, 1992). Similarly, shifts in the variation of disturbances may occur as a consequence of (fiscal or monetary) policy changes, central bank interventions or regime switches. Cross sectional patterns of second order heterogeneity are also more the rule than the exception. For instance, one may intuitively expect that large firms or industrialized regions are likely to respond to exogenous shocks at a different scale in comparison with small firms or more agricultural regions.

Occasionally panel data models have been formalized with some pattern of serial error correlation (Lillard and Willis, 1979; Baltagi, 2001, Chapter 5). In that case, correlation might be specified parsimoniously with some first order autoregressive parameter. Over all members of a cross section, however, a first order autocorrelation scheme might fail to provide a uniformly accurate approximation of the true underlying pattern of error dynamics. Moreover, it is likely that the autocorrelation parameter, if it exists, is cross section specific. Obviously, when allowing for serially correlated disturbances within a panel data framework, potential directions of covariance misspecification are manifold.

The asymptotic distribution of common panel specification test statistics derived under an iid assumption depends on nuisance parameters if model disturbances are actually heteroskedastic over time, serially correlated or lack homogeneity over the cross section. On the one hand, neglecting such forms of heterogeneity may invalidate conclusions obtained under an unrealistic modelling framework. On the other hand, deriving first order asymptotic approximations is often cumbersome if not impossible in the presence of nuisance parameters. Under such circumstances bootstrap approaches are in widespread use to obtain robust critical values for a particular test statistic. Li (2006) illustrates the merits of a block bootstrap approach to detect a failure of exogeneity in regression models under serially correlated error terms by means of an adjusted Hausman statistic. This approach is outlined for the case of an infinite time dimension. By means of Edgeworth expansions, Bole and Rebec (2004) prove asymptotic refinements achieved by iid resampling for the case of the classical Hausman test in stationary panel data models with iid error terms. Cameron and Trivedi (2005) advocate resampling of cross sectional error vectors with replacement to estimate the covariance matrix that enters the panel Hausman statistic. In spite of its feasibility, however, iid resampling might suffer from theoretically invalid size features if model disturbances stem from distributions that are (conditionally) heterogeneous over the cross section.

It is the purpose of this paper to contribute a robust approach to determine critical values for the Hausman statistic. It retains its validity in panels with finite time series dimension, under cross sectional heteroskedasticity, and under (possibly time varying or cross section specific) serial error correlation. The proposed method exploits a convenient feature of the wild bootstrap, which copes with heteroskedasticity of model disturbances (Wu, 1986; Liu, 1988; Mammen, 1993) and cross sectional error correlation (Herwartz and Neumann, 2005).

This paper is organized as follows: The panel model and the test statistic are given in the next section. Then, Section 3 provides a bootstrap approach to generate critical values for the Hausman statistic. A simulation study, given in Section 4, illustrates the finite sample performance of alternative resampling schemes and approaches motivated by first order asymptotic approximations. As an empirical example Section 5 provides specification tests for modelling dairy production in a cross section of farms located in Northern Germany. Conclusions are drawn in Section 6. An Appendix provides the proofs of the asymptotic results.

2 The model and the test statistic

2.1 A panel model with generalized covariance structure

Consider the common panel data model with random individual effects, written by observation as

$$y_{it} = x_{it}'\beta + \nu + u_{it}, \qquad u_{it} = \alpha_i + e_{it} \qquad (i = 1,\dots,N;\ t = 1,\dots,T). \tag{1}$$

Defining $Y_i = (y_{i1},\dots,y_{iT})'$, $X_i = (x_{i1},\dots,x_{iT})'$ and $e_i = (e_{i1},\dots,e_{iT})'$ we can rewrite (1) in matrix notation as

$$Y_i = X_i\beta + \nu\mathbf{1}_T + u_i, \qquad u_i = \alpha_i\mathbf{1}_T + e_i \qquad (i = 1,\dots,N) \tag{2}$$

or, with $Y = (Y_1',\dots,Y_N')'$, $X = (X_1',\dots,X_N')'$, $u = (u_1',\dots,u_N')'$, $\alpha = (\alpha_1\mathbf{1}_T',\dots,\alpha_N\mathbf{1}_T')'$ and $e = (e_1',\dots,e_N')'$,

$$Y = X\beta + \nu\mathbf{1}_{NT} + u, \qquad u = \alpha + e. \tag{3}$$

In (1) $x_{it}$ is a $K \times 1$ random vector of explanatory variables. Accordingly, $\beta$ is a $K$-dimensional parameter vector, $\nu$ denotes an intercept term and $\mathbf{1}_R$ denotes, for any $R \in \mathbb{N}$, an $R$-dimensional vector consisting of ones. By assumption, the random individual effects $\alpha_i \sim (0, \omega_i^2)$ are independent of the disturbances $e_{it}$. Note that in case $\omega_1^2 = \dots = \omega_N^2 = 0$ the pooled regression is obtained as a special case of (1). With respect to the covariance of the mean zero innovations $e_{it}$ we allow a pattern of serially correlated but cross sectionally uncorrelated error terms. Then, with $E[e_i] = 0_T$ the latter scenario is formalized as

$$E[e_i e_j'] = \delta_{ij}\,\Sigma_i, \tag{4}$$

where $\Sigma_i$ is a positive definite matrix of dimension $T \times T$ and $\delta_{ij}$ is the Kronecker delta.
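As a purely illustrative sketch (not part of the paper), model (1) with a single regressor can be simulated under cross sectionally heterogeneous variances. The function name `simulate_panel`, the Gaussian draws and the diagonal choice $\Sigma_i = \sigma_i^2 I_T$ are assumptions made only for this example.

```python
import random


def simulate_panel(n, t, beta, nu, omega, sigma, rng):
    """Simulate y_it = x_it * beta + nu + alpha_i + e_it for one regressor (K = 1).

    omega[i] and sigma[i] are unit-specific standard deviations of alpha_i and
    e_it, so the error distribution may differ over the cross section.
    """
    x, y = [], []
    for i in range(n):
        alpha_i = rng.gauss(0.0, omega[i])                   # random individual effect
        x_i = [rng.gauss(0.0, 1.0) for _ in range(t)]        # explanatory variable
        e_i = [rng.gauss(0.0, sigma[i]) for _ in range(t)]   # here Sigma_i = sigma_i^2 * I_T
        y_i = [x_it * beta + nu + alpha_i + e_it for x_it, e_it in zip(x_i, e_i)]
        x.append(x_i)
        y.append(y_i)
    return x, y


rng = random.Random(0)
n_units, n_periods = 4, 5
x, y = simulate_panel(n_units, n_periods, beta=1.0, nu=0.5,
                      omega=[1.0] * n_units, sigma=[0.5, 1.0, 1.5, 2.0], rng=rng)
```

Setting all entries of `omega` to zero reproduces the pooled regression mentioned above.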

According to (4) model disturbances may stem from cross sectionally heterogeneous distributions and show time specific second order features. With regard to serial correlation the general specification covers e.g. the first order autoregressive model put forth by Lillard and Willis (1979), i.e.

$$e_{it} = \rho\, e_{i,t-1} + \epsilon_{it}, \qquad |\rho| < 1, \qquad \epsilon_{it} \sim \mathrm{iid}(0, \sigma_\epsilon^2). \tag{5}$$

The autocorrelation structure specified in (5) is, however, very restrictive owing to the postulates of an exponentially decaying autocorrelation function on the one hand and cross sectional homogeneity on the other hand. As alternative specifications one may regard error terms following a higher order autoregression or some moving average pattern. For a brief review of alternative parametric suggestions and their treatment for feasible GLS estimation the reader may consult Baltagi (2001, Chapter 5.2). In any case, the more general error distribution complicates feasible GLS estimation of the model's slope parameters and, more importantly, introduces a source of potential misspecification of the model.
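To make the restrictiveness of (5) concrete, the following sketch builds the $T \times T$ covariance matrix implied by a stationary first order autoregression, whose $(k,l)$ element is $\sigma_\epsilon^2 \rho^{|k-l|}/(1-\rho^2)$. The function name `ar1_covariance` is an illustrative assumption.

```python
def ar1_covariance(rho, sigma_eps2, t):
    """T x T covariance matrix of a stationary AR(1) process e_t = rho * e_{t-1} + eps_t."""
    if abs(rho) >= 1:
        raise ValueError("stationarity requires |rho| < 1")
    var_e = sigma_eps2 / (1.0 - rho ** 2)   # unconditional variance of e_t
    # autocovariances decay exponentially in the lag |k - l|
    return [[var_e * rho ** abs(k - l) for l in range(t)] for k in range(t)]


sigma_ar = ar1_covariance(rho=0.5, sigma_eps2=1.0, t=3)
```

The whole matrix is governed by two parameters that are common to all units, which is exactly the homogeneity restriction criticized above.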

Given the likelihood of cross section specific covariance features it appears more natural to allow general unspecified patterns of second order features in microeconometric or spatial panel data models such as (1). For the reader's convenience, we briefly discuss in this section generalized estimators, one of which is efficient if individual effects and explanatory variables are uncorrelated. As is typical in panel data modelling, this correlation feature is subjected to specification testing by means of a (generalized) Hausman statistic which is provided below. Since misspecification might be seen as a crucial issue in this vein of econometric modelling we also discuss the distributional features of the generalized Hausman statistic under misspecification of the covariance pattern. Before introducing generalized estimators and test statistics we make the following assumptions:

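(A1) (i) $\alpha_1,\dots,\alpha_N, e_1,\dots,e_N$ are, conditionally on $X = (X_1',\dots,X_N')'$, independent,

(ii) $E(e_i \mid X) = 0_T$, $E(\alpha_i \mid X) = 0$,

(iii) there exist positive constants $C_1$ and $C_2$ such that

$$\mathrm{var}(\alpha_i \mid X) = \omega_i^2, \qquad 0 < C_1 \le \omega_i^2 \le C_2 < \infty,$$

$$\mathrm{Cov}(e_i \mid X) = \Sigma_i, \qquad C_1 I_T \preceq \Sigma_i \preceq C_2 I_T$$

($A \preceq B$ if $B - A$ is positive semidefinite),

(iv) the random variables $(e_{it}^2)_{i,t}$ and $(\alpha_i^2)_i$ are conditionally on $X$ uniformly integrable, that is,

$$\sup_{i\in\mathbb{N}} \max_{1\le t\le T} E\big(e_{it}^2\, I(|e_{it}| > c) \,\big|\, X\big) \xrightarrow[c\to\infty]{} 0,$$

$$\sup_{i\in\mathbb{N}} E\big(\alpha_i^2\, I(|\alpha_i| > c) \,\big|\, X\big) \xrightarrow[c\to\infty]{} 0,$$

(v) the random variables $(\|X_i'X_i\|)_{i\in\mathbb{N}}$ are uniformly integrable, that is,

$$\sup_{i\in\mathbb{N}} E\big[\|X_i'X_i\|\, I(\|X_i'X_i\| > c)\big] \xrightarrow[c\to\infty]{} 0.$$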

Remark 1. It is well known that uniform integrability follows from boundedness of moments of higher order. For example, since $E\big(e_{it}^2 I(|e_{it}| > c) \mid X\big) \le c^{-\delta} E\big(|e_{it}|^{2+\delta} \mid X\big)$, the first condition in (A1)(iv) will follow from $\sup_{i\in\mathbb{N}} \max_{1\le t\le T} E\big(|e_{it}|^{2+\delta} \mid X\big) < \infty$, for some $\delta > 0$.

2.2 Generalized estimators

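Denote $\Omega := \mathrm{Cov}(u) = \mathrm{Diag}[\Omega_1,\dots,\Omega_N]$, where $\Omega_i = \Sigma_i + \omega_i^2 \mathbf{1}_T\mathbf{1}_T'$. An efficient estimator of $\beta$ is given as

$$\hat\beta_{GLS} = A_N^{-1} \sum_{i=1}^N a_{N,i} Y_i, \tag{6}$$

where

$$A_N = \frac{1}{N}\left( X'\Omega^{-1}X - \frac{X'\Omega^{-1}\mathbf{1}_{NT}\mathbf{1}_{NT}'\Omega^{-1}X}{\mathbf{1}_{NT}'\Omega^{-1}\mathbf{1}_{NT}} \right),$$

$$a_{N,i} = \frac{1}{N}\left( X_i'\Omega_i^{-1} - \frac{1}{\mathbf{1}_{NT}'\Omega^{-1}\mathbf{1}_{NT}}\, X'\Omega^{-1}\mathbf{1}_{NT}\mathbf{1}_T'\Omega_i^{-1} \right).$$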

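In the special case of $\Sigma_i = \sigma^2 I_T$ and $\omega_i^2 = \omega^2$, this yields the common random effects estimator (Baltagi, 2001, Chapter 2)

$$\hat\beta^{iid}_{GLS} = \left( \sum_{i,t} \tilde x_{it}\tilde x_{it}' + T\,\frac{\sigma^2}{\sigma^2 + T\omega^2} \sum_i \tilde x_i \tilde x_i' \right)^{-1} \left( \sum_{i,t} \tilde x_{it}\tilde y_{it} + T\,\frac{\sigma^2}{\sigma^2 + T\omega^2} \sum_i \tilde x_i \tilde y_i \right), \tag{7}$$

where standard conventions are used to denote centered variables, for instance $\tilde x_{it} = x_{it} - \bar x_{i\cdot}$, $\tilde x_i = \bar x_{i\cdot} - \bar x$, $\bar x_{i\cdot} = (\sum_t x_{it})/T$, $\bar x = (\sum_{i,t} x_{it})/(NT)$.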

If $E(\alpha_i \mid X_i)$ does not necessarily vanish, then $\hat\beta_{GLS}$ is in general not a consistent estimator of $\beta$. In this case we can augment the design matrix $X$ with the $NT \times N$ matrix $W = \mathrm{Diag}[\mathbf{1}_T,\dots,\mathbf{1}_T]$ and obtain from (3) the equation

$$Y = (W\ \ X)\begin{pmatrix} \delta \\ \beta \end{pmatrix} + \bar u, \tag{8}$$

where $\delta = (\nu + E(\alpha_1 \mid X_1),\dots,\nu + E(\alpha_N \mid X_N))'$ and $\bar u = u - W\,(E(\alpha_1 \mid X_1),\dots,E(\alpha_N \mid X_N))'$. Then, an efficient estimator of $\beta$ is the (generalized) fixed effect or least squares dummy variable estimator (LSDV)

$$\hat\beta_{FE} = \left( X'\Omega^{-1}X - X'\Omega^{-1}W(W'\Omega^{-1}W)^{-1}W'\Omega^{-1}X \right)^{-1} \left( X'\Omega^{-1} - X'\Omega^{-1}W(W'\Omega^{-1}W)^{-1}W'\Omega^{-1} \right) Y.$$

$\hat\beta_{FE}$ is the (unique) best linear unbiased estimator (BLUE) of $\beta$ in model (8), that is, it can be written in the form $LY$, where unbiasedness of $\hat\beta_{FE}$ requires that $LX = I_K$ and $LW = 0_{K\times N}$, while optimality means that $\mathrm{Cov}(\hat\beta_{FE}) = L\Omega L'$ is minimal under these side conditions. However, since all matrices $L$ satisfying these side conditions fulfill $L\Omega L' = L\,\mathrm{Diag}[\Sigma_1,\dots,\Sigma_N]\,L'$, it follows that $\hat\beta_{FE}$ is equal to the BLUE in model (8) with $\mathrm{Cov}(\bar u) = \Sigma := \mathrm{Diag}[\Sigma_1,\dots,\Sigma_N]$.

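Therefore, $\hat\beta_{FE}$ can also be written as

$$\hat\beta_{FE} = B_N^{-1} \sum_{i=1}^N b_{N,i} Y_i, \tag{9}$$

where

$$B_N = \frac{1}{N} \sum_{i=1}^N \left( X_i'\Sigma_i^{-1}X_i - \frac{1}{\mathbf{1}_T'\Sigma_i^{-1}\mathbf{1}_T}\, X_i'\Sigma_i^{-1}\mathbf{1}_T\mathbf{1}_T'\Sigma_i^{-1}X_i \right),$$

$$b_{N,i} = \frac{1}{N} \left( X_i'\Sigma_i^{-1} - \frac{1}{\mathbf{1}_T'\Sigma_i^{-1}\mathbf{1}_T}\, X_i'\Sigma_i^{-1}\mathbf{1}_T\mathbf{1}_T'\Sigma_i^{-1} \right).$$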

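In the special case of $\Sigma_i = \sigma^2 I_T$ this estimator simplifies to the standard LSDV estimator

$$\hat\beta^{iid}_{FE} = \left( \sum_{i,t} \tilde x_{it}\tilde x_{it}' \right)^{-1} \sum_{i,t} \tilde x_{it}\tilde y_{it}; \tag{10}$$

see Baltagi (2001, Chapter 2).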

2.3 The Hausman statistic

As mentioned, OLS or GLS estimation of the slope parameters in (1) will be biased if the individual effects $\alpha_i$ are correlated with (some of) the explanatory variables $x_{it,1},\dots,x_{it,K}$. On the other hand, under assumption (A1,iii) the GLS estimator is to be preferred over the OLS or LSDV estimator since it exploits the underlying error covariance structure efficiently. Moreover, estimation of $N$ fixed effects is avoided such that model evaluation does not suffer from incidental parameters. Therefore, a test for correlation between individual effects and explanatory variables is essential to select an efficient estimator for the model in (1). The Hausman statistic (Hausman, 1978) has become a prominent tool to test the null hypothesis that individual effects are uncorrelated with the variables in $x_{it}$ against the alternative of correlation, i.e.

$$H_0 : E(\alpha_i \mid X_i) = 0 \quad \text{vs.} \quad H_1 : E(\alpha_i \mid X_i) \not\equiv 0 \ \text{for at least one } i. \tag{11}$$

In this paper we allow the error terms $e_i$ to have some general covariance pattern as formalized in (4). Accordingly, we consider a GLS based modification of the Hausman statistic. It follows from least squares theory that $\mathrm{Cov}(\hat\beta_{GLS} \mid X) = \frac{1}{N} A_N^{-1}$, $\mathrm{Cov}(\hat\beta_{FE} \mid X) = \frac{1}{N} B_N^{-1}$ and, since $\hat\beta_{GLS}$ is efficient under (A1), we have also that $\mathrm{Cov}\big(\sqrt{N}(\hat\beta_{FE} - \hat\beta_{GLS}) \mid X\big) = B_N^{-1} - A_N^{-1}$ (Hausman, 1978). Moreover, it follows from assumption (A2) below that $B_N^{-1} - A_N^{-1}$ is a positive definite matrix if $N$ is sufficiently large. For simplicity of presentation we assume that this holds true even for all $N$. The Hausman statistic is

$$H_N = N(\hat\beta_{FE} - \hat\beta_{GLS})'\big(B_N^{-1} - A_N^{-1}\big)^{-1}(\hat\beta_{FE} - \hat\beta_{GLS}). \tag{12}$$

Note that

$$\sqrt{N}(\hat\beta_{FE} - \hat\beta_{GLS}) = \sum_{i=1}^N C_{N,i} u_i, \tag{13}$$

where $C_{N,i} = \sqrt{N}\big(B_N^{-1} b_{N,i} - A_N^{-1} a_{N,i}\big)$. Accordingly, the Hausman statistic allows a representation as a quadratic form in the underlying model disturbances, i.e.

$$H_N = \left\| \big(B_N^{-1} - A_N^{-1}\big)^{-1/2} \sum_{i=1}^N C_{N,i} u_i \right\|^2. \tag{14}$$
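For intuition, the following sketch evaluates (12) in the simplest setting of a single regressor ($K = 1$) with $\Sigma_i = \sigma^2 I_T$ and $\omega_i^2 = \omega^2$ treated as known, so that the two estimators reduce to the scalar versions of (7) and (10) and $\frac{1}{N}(B_N^{-1} - A_N^{-1})$ becomes the difference of the two scalar variances. The function name `hausman_scalar` and the toy data are assumptions made for illustration only.

```python
def hausman_scalar(x, y, sigma2, omega2):
    """Hausman statistic for one regressor with Sigma_i = sigma2 * I_T and omega_i^2 = omega2 known."""
    n, t = len(x), len(x[0])
    xbar_i = [sum(row) / t for row in x]
    ybar_i = [sum(row) / t for row in y]
    xbar = sum(xbar_i) / n
    ybar = sum(ybar_i) / n
    # within (unit-centered) sums of squares and cross products
    s_w = sum((x[i][j] - xbar_i[i]) ** 2 for i in range(n) for j in range(t))
    s_wy = sum((x[i][j] - xbar_i[i]) * (y[i][j] - ybar_i[i])
               for i in range(n) for j in range(t))
    # between sums of squares and cross products of unit means
    s_b = t * sum((xb - xbar) ** 2 for xb in xbar_i)
    s_by = t * sum((xb - xbar) * (yb - ybar) for xb, yb in zip(xbar_i, ybar_i))
    theta = sigma2 / (sigma2 + t * omega2)
    beta_fe = s_wy / s_w                                      # scalar version of (10)
    beta_gls = (s_wy + theta * s_by) / (s_w + theta * s_b)    # scalar version of (7)
    var_diff = sigma2 / s_w - sigma2 / (s_w + theta * s_b)    # Var(FE) - Var(GLS)
    h = (beta_fe - beta_gls) ** 2 / var_diff                  # scalar version of (12)
    return beta_fe, beta_gls, h


# toy panel: y_it = 2 * x_it + alpha_i without idiosyncratic noise
x = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [0.0, 1.0, 5.0]]
alpha = [1.0, -1.0, 0.0]
y = [[2.0 * x_it + a for x_it in row] for row, a in zip(x, alpha)]
beta_fe, beta_gls, h = hausman_scalar(x, y, sigma2=1.0, omega2=1.0)
```

In this toy panel the within estimator recovers the slope exactly because the individual effects are swept out, whereas the GLS estimate also uses the between variation and therefore differs.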

In the case that the covariance parameters are unknown but respective estimators are available, we can estimate the matrices $A_N$ and $B_N$ by $\hat A_N$ and $\hat B_N$. In this case we consider the statistic with estimated covariance parameters

$$\hat H_N = N\big(\hat{\hat\beta}_{FE} - \hat{\hat\beta}_{GLS}\big)'\big(\hat B_N^{-1} - \hat A_N^{-1}\big)^{-1}\big(\hat{\hat\beta}_{FE} - \hat{\hat\beta}_{GLS}\big), \tag{15}$$

where $\hat{\hat\beta}_{FE}$ and $\hat{\hat\beta}_{GLS}$ are feasible LSDV and GLS estimators. The corresponding quadratic form representation of $\hat H_N$ is analogous to (14).

To derive the asymptotic properties of the Hausman statistic we make the following assumptions:

(A2) It holds that $A_N \xrightarrow{P} A$ and $B_N \xrightarrow{P} B$, as $N \to \infty$, where $B$ and $A - B$ are positive definite matrices.

The following assertion characterizes the asymptotic behavior of $H_N$ and $\hat H_N$ under the null hypothesis.

Proposition 1. Suppose that (A1) and (A2) are fulfilled. Then, as $N \to \infty$,

$$H_N \xrightarrow{d} \chi^2(K). \tag{16}$$

Furthermore, if $\hat A_N$ and $\hat B_N$ are consistent estimators of $A_N$ and $B_N$, that is, $\|\hat A_N - A_N\| \xrightarrow{P} 0$ and $\|\hat B_N - B_N\| \xrightarrow{P} 0$, and if $\sqrt{N}\big(\hat{\hat\beta}_{FE} - \hat{\hat\beta}_{GLS}\big) \xrightarrow{d} \mathcal{N}(0_K, B^{-1} - A^{-1})$, then

$$\hat H_N \xrightarrow{d} \chi^2(K). \tag{17}$$

The asymptotic results in (16) and (17) are both derived for the case of a finite time dimension $T$. Owing to consistency of $\hat\beta_{GLS}$ and $\hat\beta_{FE}$ their difference vanishes under (A1) and (A2) as $T \to \infty$. For the case of an underlying iid covariance structure Ahn and Moon (2001) show that as $T \to \infty$, $\mathrm{Cov}[\hat\beta_{FE} - \hat\beta_{GLS}]$ converges sufficiently fast to ensure a nondegenerate limit distribution of the standard Hausman statistic.

As argued before, any (cross sectionally homogeneous) a-priori formalization of panel covariance features is likely subject to misspecification. Therefore we also consider the realistic case where the presumed covariance pattern differs from the unknown covariance structure. We still assume that the true covariances are given by (A1,iii), that is, $\Omega$ is the covariance matrix of $u$ as above. Let now $\tilde\Omega$ denote the covariance specification which is actually used for constructing the (feasible) LSDV and GLS estimators and the test statistic. Denote by $\tilde A_N$, $\tilde a_{N,i}$, $\tilde B_N$, $\tilde b_{N,i}$ and $\tilde C_{N,i}$ the analogues of $A_N$, $a_{N,i}$, $B_N$, $b_{N,i}$ and $C_{N,i}$, respectively, where for each term the true covariance matrix $\Omega$ is replaced by the presumed (false) covariance matrix $\tilde\Omega$. In this case, the difference between the two panel estimators can be written as $\sqrt{N}(\tilde\beta_{FE} - \tilde\beta_{GLS}) = \sum_{i=1}^N \tilde C_{N,i} u_i$, which yields the test statistic

$$\tilde H_N = N\big(\tilde\beta_{FE} - \tilde\beta_{GLS}\big)'\big(\tilde B_N^{-1} - \tilde A_N^{-1}\big)^{-1}\big(\tilde\beta_{FE} - \tilde\beta_{GLS}\big) = \left\| \big(\tilde B_N^{-1} - \tilde A_N^{-1}\big)^{-1/2} \sum_{i=1}^N \tilde C_{N,i} u_i \right\|^2. \tag{18}$$

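We can see in complete analogy to the correctly specified case that the matrix $\tilde B_N^{-1} - \tilde A_N^{-1}$ is positive semidefinite. Actually, assume that $\mathrm{Cov}(u)$ were equal to $\tilde\Omega$ rather than $\Omega$. Then $\tilde\beta_{GLS}$ would be efficient and it would follow that $\tilde B_N^{-1} - \tilde A_N^{-1} = \mathrm{Cov}_{\tilde\Omega}(\sqrt{N}\tilde\beta_{FE}) - \mathrm{Cov}_{\tilde\Omega}(\sqrt{N}\tilde\beta_{GLS})$ is equal to $\mathrm{Cov}_{\tilde\Omega}\big(\sqrt{N}(\tilde\beta_{FE} - \tilde\beta_{GLS})\big)$ ($\mathrm{Cov}_{\tilde\Omega}$ denotes the covariance matrix under the hypothetical scenario that $\mathrm{Cov}(u) = \tilde\Omega$). Hence, $\tilde B_N^{-1} - \tilde A_N^{-1}$ is positive semidefinite, even if $\tilde\Omega$ deviates from the true covariance matrix $\Omega$. Regularity of $\tilde B_N^{-1} - \tilde A_N^{-1}$ follows from assumption (A3) below, for $N$ sufficiently large. For simplicity of presentation, we assume again that this holds true for all $N$.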

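Finally, in the realistic case that estimates of the covariances are used, we obtain the statistic

$$\hat{\tilde H}_N = \left\| \big(\hat{\tilde B}_N^{-1} - \hat{\tilde A}_N^{-1}\big)^{-1/2} \sum_{i=1}^N \hat{\tilde C}_{N,i} u_i \right\|^2, \tag{19}$$

where $\hat{\tilde A}_N$, $\hat{\tilde B}_N$ and $\hat{\tilde C}_{N,i}$ are the analogues of $\tilde A_N$, $\tilde B_N$ and $\tilde C_{N,i}$, respectively, with the presumed (false) covariances $\tilde\Omega$ replaced by their estimates $\hat{\tilde\Omega}$.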

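For the asymptotic considerations we assume additionally

(A3) It holds that $\tilde A_N \xrightarrow{P} \tilde A$ and $\tilde B_N \xrightarrow{P} \tilde B$, as $N \to \infty$, where $\tilde B$ and $\tilde A - \tilde B$ are positive definite matrices. Furthermore, $\sum_{i=1}^N \tilde C_{N,i}\Omega_i \tilde C_{N,i}' \xrightarrow{P} D$, where $D$ is a non-vanishing matrix.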

The following proposition describes the asymptotic behavior of the Hausman statistics $\tilde H_N$ and $\hat{\tilde H}_N$ in the misspecified case, and under the null hypothesis. In contrast to the result in Proposition 1, they converge now in distribution to a random variable which is a weighted sum of independent $\chi^2(1)$ random variables.

Proposition 2. Suppose that (A1), (A2) and (A3) are fulfilled. Then, as $N \to \infty$,

$$\tilde H_N \xrightarrow{d} \sum_{i=1}^K \lambda_i Z_i^2,$$

where $Z_1,\dots,Z_K$ are independent standard normal random variables and $\lambda_1,\dots,\lambda_K$ are the eigenvalues of the matrix $(\tilde B^{-1} - \tilde A^{-1})^{-1/2} D\, (\tilde B^{-1} - \tilde A^{-1})^{-1/2}$.

Furthermore, if $\hat{\tilde A}_N$ and $\hat{\tilde B}_N$ are consistent estimators of $\tilde A_N$ and $\tilde B_N$, that is, $\|\hat{\tilde A}_N - \tilde A_N\| \xrightarrow{P} 0$ and $\|\hat{\tilde B}_N - \tilde B_N\| \xrightarrow{P} 0$, and if $\sum_{i=1}^N \hat{\tilde C}_{N,i} u_i \xrightarrow{d} \mathcal{N}(0_K, D)$, then

$$\hat{\tilde H}_N \xrightarrow{d} \sum_{i=1}^K \lambda_i Z_i^2.$$
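The practical consequence of Proposition 2 can be illustrated by simulation: unless all $\lambda_i$ equal one, the quantiles of $\sum_i \lambda_i Z_i^2$ differ from those of the $\chi^2(K)$ distribution. The sketch below approximates such quantiles by Monte Carlo; the function name `weighted_chi2_quantile` and the eigenvalues used are hypothetical choices, not values from the paper.

```python
import math
import random


def weighted_chi2_quantile(lambdas, level, draws, rng):
    """Monte Carlo (1 - level)-quantile of sum_i lambda_i * Z_i^2 with Z_i iid N(0, 1)."""
    sims = sorted(
        sum(lam * rng.gauss(0.0, 1.0) ** 2 for lam in lambdas)
        for _ in range(draws)
    )
    # smallest order statistic whose empirical distribution function reaches 1 - level
    index = max(0, min(draws - 1, math.ceil((1.0 - level) * draws) - 1))
    return sims[index]


rng = random.Random(42)
# lambda_i = 1 for all i: the limit is chi^2(2), the correctly specified case
pivotal = weighted_chi2_quantile([1.0, 1.0], level=0.05, draws=20000, rng=rng)
# unequal eigenvalues (hypothetical): the limit depends on nuisance parameters
nonpivotal = weighted_chi2_quantile([2.0, 1.0], level=0.05, draws=20000, rng=rng)
```

Comparing a misspecified statistic with the $\chi^2(2)$ critical value would in this hypothetical case lead to overrejection, since the appropriate critical value is larger.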

It is widespread folklore that bootstrap inference offers asymptotic improvements over first order asymptotic approximations if the respective test statistic is asymptotically pivotal; see for example Hall (1992). From this perspective our results suggest that, depending on the true covariance $\Omega$, the resampling scheme addressed in Section 3 either provides valid significance levels ($\tilde\Omega \ne \Omega$) or faster convergence of empirical to nominal significance levels ($\tilde\Omega = \Omega$); see also Bole and Rebec (2004) for an analysis of the Hausman statistic in the iid case.

2.4 Alternative covariance estimators

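The implementation of the generalized estimators introduced in Section 2.2 or the Hausman statistic in Section 2.3 requires some a-priori choice of $\mathrm{Cov}[u_i]$ or $\mathrm{Cov}[e_i]$. To estimate these matrices a convenient starting point is the common fixed effect estimator building on iid assumptions and given in (10). An intercept estimate is $\hat\nu = \bar y - \bar x'\hat\beta^{iid}_{FE}$, with $\bar y = (NT)^{-1}\sum_{i,t} y_{it}$ and $\bar x$ denoting the $K \times 1$ vector of unconditional means of explanatory variables. Implied disturbance vectors $\hat u_i$, $\hat e_i$, and fixed effect estimates $\hat\alpha_i$ are, respectively,

$$\hat u_i = Y_i - \mathbf{1}_T\hat\nu - X_i\hat\beta^{iid}_{FE}, \qquad \hat e_i = \tilde Y_i - \tilde X_i\hat\beta^{iid}_{FE}, \qquad \text{and} \qquad \hat\alpha_i = \hat\nu + \mathbf{1}_T'\hat u_i/T,$$

where $\tilde Y_i$ and $\tilde X_i$ collect the within-transformed (unit-centered) observations $\tilde y_{it}$ and $\tilde x_{it}$ of unit $i$.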

Then, the variance of individual effects can be evaluated as (Nerlove, 1971)

$$\hat\omega^2 = \frac{1}{N-1} \sum_{i=1}^N (\hat\alpha_i - \bar{\hat\alpha})^2, \qquad \bar{\hat\alpha} = \frac{1}{N} \sum_{i=1}^N \hat\alpha_i. \tag{20}$$

The estimator $\hat\omega^2$ is consistent as $N \to \infty$ and nonnegative by construction. Owing to the latter property it might be particularly useful in Monte Carlo experiments. Apart from the estimator in (20) one may also evaluate $\omega^2$ by means of other approaches going back to Swamy and Arora (1972), Wallace and Hussain (1969) or Amemiya (1971). The focus of this paper lies, however, on the characterization of alternative avenues to obtain critical values for the Hausman statistic.

In a second step feasible covariance estimators ($\hat\Omega$, $\hat\Sigma$) entering the Hausman statistic $\hat H_N$, $\hat{\tilde H}_N$ are constructed conditional on particular parametric assumptions. Alternatively, to avoid overly strong restrictions, an analyst may also rely on semiparametric covariance estimators. In the following we list a number of potential covariance estimators that are likely to offer different empirical features of inference on correlation between $x_{it}$ and $\alpha_i$.

1. Unconditional estimation of parametric covariances

Presuming cross sectionally homogeneous serial correlation as in (5) the unconditional variance is estimated as

$$\widehat{\mathrm{var}}(e_{it}) = \hat\sigma_\epsilon^2/(1 - \hat\rho^2) = \frac{1}{N(T-1) - K} \sum_{i=1}^N \sum_{t=1}^T \hat e_{it}^2. \tag{21}$$

A pooled regression delivers an estimate of the autoregressive parameter

$$\hat\rho = \frac{\sum_{i=1}^N \sum_{t=2}^T \hat e_{i,t}\hat e_{i,t-1}}{\sum_{i=1}^N \sum_{t=2}^T \hat e_{i,t-1}^2}, \quad \text{and, accordingly,} \quad \hat\sigma_\epsilon^2 = \widehat{\mathrm{var}}(e_{it})(1 - \hat\rho^2). \tag{22}$$

By means of the structural estimates $\hat\rho$ and $\hat\sigma_\epsilon^2$ the matrix $\mathrm{Cov}[e_i]$, denoted $\hat\Sigma^{(AR)}$, can be composed in the usual way.

2. Conditional estimation of variance parameters

A stronger parametric approach in comparison with $\hat\Sigma^{(AR)}$ is obtained by imposing some a-priori restriction $\rho = \rho_0$ and subsequently estimating the variance parameter in (22) conditionally. A covariance estimator obtained along these lines is denoted $\hat\Sigma^{(\rho=\rho_0)}$.

3. Cross sectional averaging

Instead of presuming an explicit parametric autoregression one may alternatively estimate $\mathrm{Cov}[e_i]$ as an average pattern of second order moments evaluated over the cross section. Along these lines the overall estimator depends on the particular covariance structure presumed to hold for cross section specific quantities. We distinguish three alternative scenarios, a finite order moving average representation (MA), time dependent second order features (TH) and a general covariance pattern (GP). The respective covariance estimators take the form

$$\hat\Sigma^{(\bullet)} = \frac{1}{N} \sum_{i=1}^N \hat\Sigma_i^{(\bullet)}, \qquad \bullet = \mathrm{MA}, \mathrm{TH}, \mathrm{GP}. \tag{23}$$

With regard to the cross sectional quantities $\hat\Sigma_i^{(\bullet)}$ we distinguish:

MA: By assumption, $e_{it}$ obeys MA dynamics of order $[\sqrt{T}]$, where $[z]$ is the integer part of $z$. With $I(\bullet)$ denoting an indicator function, a typical element of $\hat\Sigma_i^{(MA)}$ is

$$\hat\sigma^{(i,MA)}_{kl} = \frac{1}{T} \sum_{j=1}^{T-|k-l|} \hat e_{i,j}\,\hat e_{i,j+|k-l|}\; I\big(|k-l| \le [\sqrt{T}]\big), \qquad k, l = 1,\dots,T.$$

TH: Time heterogeneity of second order moments and absence of higher order serial correlation motivates an estimator $\hat\Sigma_i^{(TH)}$ with typical elements

$$\hat\sigma^{(i,TH)}_{kl} = \hat e_{i,k}\,\hat e_{i,l}\; I\big(|k-l| \le [\sqrt{T}]\big).$$

GP: Owing to the within transformation applied to determine $\hat\beta^{iid}_{FE}$ the matrix $\hat e_i\hat e_i'$ is of reduced rank (Kiefer, 1980). The third choice of a cross section specific covariance matrix utilizes as many serial cross products as possible to obtain a full rank covariance matrix. A typical element of $\hat\Sigma_i^{(GP)}$ is

$$\hat\sigma^{(i,GP)}_{kl} = \hat e_{i,k}\,\hat e_{i,l}\; I\big(|k-l| \le T-2\big).$$

4. Semiparametric covariance estimation

Nonparametric estimation of $\Sigma_i$ suffers from the rank deficit of $\hat e_i\hat e_i'$. An indirect nonparametric estimator of $\mathrm{Cov}[e_i]$ is to estimate $\mathrm{Cov}[u_i]$ nonparametrically and subtract the variance contribution of the individual effects. This yields

$$\hat\Sigma^{(SP)} = \hat\Omega - \hat\omega^2\mathbf{1}_T\mathbf{1}_T', \qquad \text{with} \quad \hat\Omega = \frac{1}{N} \sum_{i=1}^N \hat u_i\hat u_i'. \tag{24}$$
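Two of the estimators listed above can be sketched in a few lines: the pooled autoregressive estimate in (22) and the banded MA-type matrix averaged over the cross section as in (23). The function names and the small residual panel are assumptions made for illustration; `e[i][t]` stands for the residual of unit $i$ in period $t$.

```python
import math


def rho_hat(e):
    """Pooled first order autoregressive estimate as in (22)."""
    num = sum(e_i[t] * e_i[t - 1] for e_i in e for t in range(1, len(e_i)))
    den = sum(e_i[t - 1] ** 2 for e_i in e for t in range(1, len(e_i)))
    return num / den


def sigma_ma_i(e_i):
    """Unit-specific MA-type covariance matrix, banded at lag [sqrt(T)]."""
    t = len(e_i)
    band = math.isqrt(t)   # integer part of sqrt(T)

    def gamma(lag):
        # (1/T) * sum_j e_j * e_{j+lag}, the empirical autocovariance at this lag
        return sum(e_i[j] * e_i[j + lag] for j in range(t - lag)) / t

    return [[gamma(abs(k - l)) if abs(k - l) <= band else 0.0 for l in range(t)]
            for k in range(t)]


def cross_sectional_average(mats):
    """Average of unit-specific T x T matrices as in (23)."""
    n, t = len(mats), len(mats[0])
    return [[sum(m[k][l] for m in mats) / n for l in range(t)] for k in range(t)]


e = [[1.0, -1.0, 1.0, -1.0], [2.0, 0.0, -2.0, 0.0]]   # hypothetical residuals
rho = rho_hat(e)
sigma_ma = cross_sectional_average([sigma_ma_i(e_i) for e_i in e])
```

The TH and GP variants would differ only in the element-wise rule applied inside the unit-specific matrix.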

3 Bootstrapping the Hausman statistic

As stated in Proposition 2, misspecification of the covariance structure of the innovations entering a panel model implies that the (generalized) Hausman test statistic lacks pivotalness. In this case the asymptotic distribution depends on the true covariance matrix $\Omega$, which is unknown. Therefore, it is hardly possible to derive the nuisance parameters $\lambda_i$ analytically, such that the actual asymptotic distribution of the Hausman statistic is infeasible. A-priori information necessary to justify conditional parameter estimation is, however, often hardly available. From this perspective the generality of the bootstrap methodology is immediately clear, as it obtains asymptotically correct critical values even under some misspecification of the actual covariance pattern. As a particularly important case of misspecification one may regard the imposition of cross sectionally homogeneous covariance features whenever the true error distribution varies over the cross section. Owing to particular issues raised for the Hausman test, such as cross and intra sectional heteroskedasticity, the so-called wild or external bootstrap (Wu, 1986) can be seen as a natural tool to determine critical values. Addressing the case of heteroskedasticity of unknown form, Liu (1988) and Mammen (1993) established the wild bootstrap to approximate the distribution of studentized statistics and F-type tests in static linear regression models, respectively. Recently, Herwartz and Neumann (2005) have used the wild bootstrap to mimic cross sectional covariance patterns observed in systems of error correction models. Regarding the general convenience of the wild bootstrap it is worthwhile to mention that its implementation does not require any a-priori parametric guess concerning the covariance structure of model disturbances.

According to the different versions of the Hausman statistic in (12), (15), (18) and (19), resampling the statistic may proceed under alternative degrees of knowledge of the underlying covariance structure. Depending on the estimates $\hat\Sigma^{(\bullet)}$ a particular Hausman statistic is obtained that is either asymptotically pivotal ($H_N$, $\hat H_N$) or depends on nuisance parameters ($\tilde H_N$, $\hat{\tilde H}_N$). To exemplify the bootstrap based generation of critical values for the Hausman statistic we consider the case of $\hat H_N$, which has a quadratic form representation analogous to (14). Resampling the other variants of the Hausman statistic proceeds in complete analogy and merely differs in the estimate of $\Sigma_i$ or $\Omega_i$. The implementation of the bootstrap scheme consists of the imitation of $\hat H_N$ and the test decision, which we now sketch in more detail.

1. Particular bootstrap counterparts of $H_N$ are obtained in two steps:

i. draw bootstrap variables $u_i^*$ sharing second order features of $u_i$ as

$$u_i^* = \hat{u}_i \cdot \eta_i, \quad \eta_i \sim \text{iid}(0, 1), \quad i = 1, \ldots, N, \qquad (25)$$

where $\eta_i$ are independent of the variables in the model;

ii. obtain a bootstrap statistic $H_N^*$ from its quadratic form representation as

$$H_N^* = \left\| \left(\hat{B}_N^{-1} - \hat{A}_N^{-1}\right)^{-1/2} \sum_{i=1}^{N} \hat{C}_{N,i} u_i^* \right\|^2, \qquad (26)$$

where $\hat{A}_N$, $\hat{B}_N$, $\hat{C}_N$ are the estimated counterparts of $A_N$, $B_N$, $C_N$ defined in connection with (9), (6) and (13), respectively;

2. Decision: Define $c_\gamma^*$ as the $(1-\gamma)$-quantile of $H_N^*$, i.e. $c_\gamma^* = \inf\{c : P(H_N^* \le c \mid X, u_1, \ldots, u_N) \ge 1 - \gamma\}$. In practice, $c_\gamma^*$ can conveniently be obtained by simulating $H_N^*$ $S$ times, for sufficiently large $S$, and choosing $c_\gamma^*$ as the empirical $(1-\gamma)$-quantile. Reject $H_0$ with significance level $\gamma$ if $H_N$ exceeds $c_\gamma^*$.
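The two resampling steps and the decision rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function name `wild_bootstrap_critical_value`, the array layout (`C` for the weight matrices, `u_hat` for the residual vectors, `V` for an estimate of the matrix whose inverse square root enters the quadratic form) and the use of Rademacher multipliers are choices made here for illustration only.

```python
import numpy as np


def wild_bootstrap_critical_value(C, u_hat, V, gamma=0.05, S=299, seed=None):
    """Simulate S wild bootstrap statistics and return (c_star, H_star).

    C     : (N, K, T) array, hypothetical layout of the weights C_{N,i}
    u_hat : (N, T) array of residual vectors, one row per cross-section unit
    V     : (K, K) symmetric positive definite matrix standing in for the
            estimate of B_N^{-1} - A_N^{-1}
    """
    rng = np.random.default_rng(seed)
    # C_{N,i} u_i for every unit; these stay fixed across bootstrap draws.
    contrib = np.einsum("nkt,nt->nk", C, u_hat)
    # One multiplier per unit and draw, so the whole vector u_i is rescaled
    # by a single eta_i. Rademacher multipliers are an assumption here.
    eta = rng.choice([-1.0, 1.0], size=(S, contrib.shape[0]))
    sums = eta @ contrib  # row s holds sum_i C_{N,i} u_i eta_i
    # ||V^{-1/2} s||^2 equals s' V^{-1} s for symmetric positive definite V.
    H_star = np.einsum("sk,sk->s", sums @ np.linalg.inv(V), sums)
    # Empirical (1 - gamma)-quantile: the smallest draw at which the
    # empirical distribution function reaches 1 - gamma.
    k = int(np.ceil((1.0 - gamma) * S)) - 1
    c_star = np.sort(H_star)[k]
    return c_star, H_star
```

Given an observed statistic `H`, the test would reject the null hypothesis when `H > c_star`.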

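The central ingredient of the bootstrap procedure is the imitation of the first two moments of $u_i = (u_{i1}, \ldots, u_{iT})'$ by means of the quantity

$$u_i^* = (u_{i1}^*, \ldots, u_{iT}^*)' = \eta_i (\hat{u}_{i1}, \ldots, \hat{u}_{iT})' = \eta_i \hat{u}_i.$$

Since

$$\frac{1}{N}\sum_{i=1}^{N} \mathrm{Cov}(u_i^* \mid X, u_1, \ldots, u_N) = \frac{1}{N}\sum_{i=1}^{N} \hat{u}_i\hat{u}_i' = \frac{1}{N}\sum_{i=1}^{N} u_iu_i' + o_P(1) \xrightarrow{P} \frac{1}{N}\sum_{i=1}^{N}\mathrm{Cov}(u_i),$$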

the bootstrap reflects, on average, the true underlying covariances. Note that the latter may

exhibit some variation over the cross section as formalized by assumption (A1,iii).

Several approaches to draw $\eta_i$ are available from the literature (Mammen, 1993; Liu, 1988). For the Monte Carlo study $\eta_i$ is drawn alternatively from the Rademacher distribution (Liu, 1988; Davidson and Flachaire, 2001),

$$P(\eta_i^{(R)} = 1) = P(\eta_i^{(R)} = -1) = 0.5, \qquad (27)$$

and a two point distribution proposed in Mammen (1993),

$$\eta_i^{(M)} = \begin{cases} -(\sqrt{5} - 1)/2 & \text{with probability } q = (\sqrt{5} + 1)/(2\sqrt{5}), \\ (\sqrt{5} + 1)/2 & \text{with probability } 1 - q. \end{cases} \qquad (28)$$

While both choices are in line with (25), the two distributions differ with respect to higher order moments imitated by the bootstrap design. In particular, $E[(\eta_i^{(M)})^3] = 1$ while $E[(\eta_i^{(R)})^3] = 0$ and $E[(\eta_i^{(R)})^4] = 1$.
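The moment properties of the two multiplier distributions can be verified numerically. The following sketch encodes the Rademacher distribution and Mammen's two point distribution and computes their moments exactly from the support points and probabilities; the names `SCHEMES`, `multiplier_moment` and `draw_multipliers` are hypothetical and introduced only for this illustration.

```python
import numpy as np

SQRT5 = np.sqrt(5.0)

# Support points and probabilities of the two multiplier distributions.
SCHEMES = {
    "rademacher": (np.array([-1.0, 1.0]), np.array([0.5, 0.5])),
    "mammen": (
        np.array([-(SQRT5 - 1.0) / 2.0, (SQRT5 + 1.0) / 2.0]),
        np.array([(SQRT5 + 1.0) / (2.0 * SQRT5), (SQRT5 - 1.0) / (2.0 * SQRT5)]),
    ),
}


def multiplier_moment(scheme, order):
    """Exact moment E[eta^order] of the chosen two point distribution."""
    values, probs = SCHEMES[scheme]
    return float(np.sum(probs * values**order))


def draw_multipliers(n, scheme="mammen", seed=None):
    """Draw n iid multipliers eta_i with mean zero and unit variance."""
    values, probs = SCHEMES[scheme]
    rng = np.random.default_rng(seed)
    return rng.choice(values, size=n, p=probs)
```

Both schemes have mean zero and unit variance, so either can serve as $\eta_i$; they differ in the third moment, which is one for the asymmetric distribution and zero for the Rademacher distribution.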

To state the asymptotic features of the wild bootstrap scheme denote XN = (X, u1, . . . , uN).

We will show that the bootstrap counterpart H∗N of the Hausman statistic has the same

asymptotic behavior as HN . Since the conditional distribution of H∗N given XN is itself ran-

dom we obtain weak convergence of these distributions to their common limit in probability.

Proposition 3. Suppose that (A1) and (A2) are fulfilled. Then, as $N \to \infty$,

$$\sup_{-\infty<z<\infty} \left| P(H_N^* \le z \mid X_N) - P(\chi^2(K) \le z) \right| \xrightarrow{P} 0.$$

Propositions 1 and 3 together imply that the bootstrap test has asymptotically the correct

size.

Theorem 1. Suppose that (A1) and (A2) are fulfilled. Then

$$P_{H_0}\left(H_N > c_\gamma^*\right) \underset{N\to\infty}{\longrightarrow} \gamma.$$

In the case of incorrectly specified covariances we obtain analogous results: Denote by

H∗N the analogue to HN given in (18).

Proposition 4. Suppose that (A1), (A2) and (A3) are fulfilled. Then, as $N \to \infty$,

$$\sup_{-\infty<z<\infty} \left| P(H_N^* \le z \mid X_N) - P\left( \sum_{i=1}^{K} \lambda_i Z_i^2 \le z \right) \right| \xrightarrow{P} 0,$$

where $Z_1, \ldots, Z_K$ and $\lambda_1, \ldots, \lambda_K$ are as in Proposition 2.

Denote by c∗γ the (1 − γ)-quantile of L(H∗N | XN). Propositions 2 and 4 together imply

that the bootstrap test has asymptotically the correct size.

Theorem 2. Suppose that (A1), (A2) and (A3) are fulfilled. Then

$$P_{H_0}\left(H_N > c_\gamma^*\right) \underset{N\to\infty}{\longrightarrow} \gamma.$$

4 Monte Carlo analysis

As argued above the bootstrap approach to test the null hypothesis of no correlation be-

tween individual effects αi and explanatory variables xit allows for numerous deviations from

conditional iid assumptions for the analysis of stationary panel data models. Error vectors

ei may have a non-diagonal covariance matrix Σi, formalizing serial correlation. Along its

diagonal Σi might collect time specific variances, and, moreover, the covariance is allowed

to vary over the cross section. Finally, the unobservable error components in αi may feature

cross section specific second order properties.

The Monte Carlo study documented in this section addresses the performance of the

bootstrap method over potential violations of homogeneity assumptions. Moreover, the

finite sample properties of wild bootstrap and standard feasible GLS inference are compared.

The Hausman statistic depends on an analyst’s choice of a particular covariance estimator.

Since any presumption concerning the correlation pattern could be wrong, the Monte Carlo

analysis also sheds light on inferential features invoked by alternative covariance estimators.

In particular, we address the effect of neglecting the potential of serial correlation. Moreover,

the recommended wild bootstrap scheme is contrasted against iid resampling and a bootstrap

based strategy to estimate the covariance of the difference between the inefficient standard

GLS and LSDV estimators under the null hypothesis of the Hausman test.

4.1 The simulation design

4.1.1 The considered data generating processes

Our Monte Carlo experiments basically employ the following homogeneous model specifica-

tion

$$y_{it} = 1 + x_{it,1} + x_{it,2} + e_{it}, \quad t = 1, \ldots, T, \quad i = 1, \ldots, N. \qquad (29)$$

The right hand side variables $x_{it,2}$ are drawn from a Gaussian distribution, $x_{it,2} \sim N(0, 1)$, and fixed over all replications of an experiment. Similarly, variables $x_{it,1}$ are generated from the model

$$x_{it,1} = \mu_i + \xi_{it}, \quad \xi_{it} \sim N(0, 1), \quad \mu_i = 4(i - 1)/(N - 1).$$

Owing to the deterministic component µi the unconditional level of xit,1 is ordered equidis-

tantly over the cross section between values of 0 and 4. Individual effects αi are also drawn

from the normal distribution. Nesting the null and the alternative hypothesis of the Hausman

test, the individual effects are drawn as

$$\alpha_i = \delta x_{i\cdot,1} + \omega_i \zeta_i, \quad \zeta_i \sim N(0, 1). \qquad (30)$$

According to (30) individual effects $\alpha_i$ and explanatory variables $x_{it,1}$ are correlated if $\delta \neq 0$. Under the null hypothesis $\delta = 0$.
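The generation of regressors and individual effects can be sketched as below. The sketch assumes that $x_{i\cdot,1}$ denotes the time average of $x_{it,1}$ for unit $i$, which the text does not state explicitly; the function name `simulate_regressors_and_effects` and its arguments are hypothetical.

```python
import numpy as np


def simulate_regressors_and_effects(N, T, delta, omega=1.0, seed=None):
    """Draw x_{it,1}, x_{it,2} and alpha_i following the design in the text.

    delta = 0 corresponds to the null hypothesis; a nonzero delta makes the
    individual effects correlated with the first regressor.
    """
    rng = np.random.default_rng(seed)
    # mu_i = 4(i - 1)/(N - 1) for i = 1, ..., N: equidistant levels in [0, 4].
    mu = 4.0 * np.arange(N) / (N - 1)
    x1 = mu[:, None] + rng.standard_normal((N, T))
    x2 = rng.standard_normal((N, T))
    # Assumption: x_{i.,1} is the within-unit time average of x_{it,1}.
    x1_bar = x1.mean(axis=1)
    alpha = delta * x1_bar + omega * rng.standard_normal(N)
    return x1, x2, alpha
```

In a Monte Carlo loop the regressors would be drawn once and held fixed, while `alpha` and the error terms are redrawn in every replication.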

Vectors of error terms $e_i$ are drawn from a $T$-dimensional normal distribution as

$$e_i = G_i' v_i, \quad v_i \sim N(0, I_T), \quad G_i'G_i = \Sigma_i, \qquad (31)$$

where $I_T$ is the $T$-dimensional identity matrix and $G_i$ is an upper triangular matrix obtained from a Cholesky decomposition of $\Sigma_i$. The particular choices of $\Sigma_i = [\sigma^{(i)}_{kl}]$ cover the following data generating models (DGMs):

• DGM 1: Cross sectionally homogeneous patterns of first order serial correlation with unconditional unit variance, i.e.

$$\sigma^{(i)}_{kl} = \rho^{|k-l|}, \quad \rho = 0.5. \qquad (32)$$

• DGM 2: Cross sectionally homogeneous patterns of serial correlation, $\rho = 0.5$, combined with heterogeneous variances, such that

$$\mathrm{diag}[\Sigma^{(i)}] = \exp(0.5(x_{it,1} - x_{i\cdot,1})),$$

and

$$\sigma^{(i)}_{kl} = \rho^{|k-l|} \sqrt{\sigma^{(i)}_{kk}\sigma^{(i)}_{ll}}, \quad \text{if } k \neq l.$$

• DGM 3: Cross sectionally varying patterns of serial correlation with an unconditional variance of unity, such that

$$\sigma^{(i)}_{kl} = \rho_i^{|k-l|}.$$

The parameters, $\rho_1 \le \rho_2 \le \ldots \le \rho_N$, are drawn once from a uniform distribution, $U[-0.9, 0.9]$, and then fixed over all replications of an experiment.
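The covariance matrices of DGM 1 to DGM 3 and the Cholesky based error draws can be sketched as follows. The helper names are hypothetical, and the variance term of DGM 2 is computed under the assumption that $x_{i\cdot,1}$ is the time average of $x_{it,1}$. NumPy returns the lower triangular Cholesky factor $L$ with $LL' = \Sigma_i$, which plays the role of $G_i'$ in (31).

```python
import numpy as np


def ar1_correlation(T, rho):
    """T x T matrix with entries rho^{|k-l|}."""
    lags = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
    return rho**lags


def with_variances(R, variances):
    """Scale a correlation matrix R to a covariance matrix with given diagonal."""
    s = np.sqrt(np.asarray(variances, dtype=float))
    return R * np.outer(s, s)


def dgm_covariances(x1, dgm, rho=0.5, seed=None):
    """Return a list of N covariance matrices Sigma_i for DGM 1, 2 or 3."""
    N, T = x1.shape
    if dgm == 1:
        return [ar1_correlation(T, rho) for _ in range(N)]
    if dgm == 2:
        # Assumption: deviations are taken from the within-unit time average.
        dev = x1 - x1.mean(axis=1, keepdims=True)
        R = ar1_correlation(T, rho)
        return [with_variances(R, np.exp(0.5 * dev[i])) for i in range(N)]
    if dgm == 3:
        rng = np.random.default_rng(seed)
        rhos = np.sort(rng.uniform(-0.9, 0.9, size=N))  # drawn once, ordered
        return [ar1_correlation(T, r) for r in rhos]
    raise ValueError("dgm must be 1, 2 or 3")


def draw_errors(Sigma, rng):
    """Draw e_i = L v_i with L L' = Sigma and v_i ~ N(0, I_T)."""
    L = np.linalg.cholesky(Sigma)
    return L @ rng.standard_normal(Sigma.shape[0])
```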

Over all specifications DGM 1 to DGM 3 serial dependence is established by a first order autoregression. We consider a further underlying error distribution with an irregular covariance structure. For this purpose we first generate uniform variables $\theta_j \sim U(-1, 1)$, $j = 1, 2, 4, 8, 12$, and draw for each cross section member 200 observations from the model

$$z_{is} = w_{is} + \theta_1 w_{is-1} + \theta_2 w_{is-2} + \theta_4 w_{is-4} + \theta_8 w_{is-8} + \theta_{12} w_{is-12}, \quad w_{is} \sim N(0, 1), \quad s = 1, \ldots, 200.$$

Then, the empirical autocovariance pattern is estimated from $\{z_{is}\}_{s=1}^{200}$ in the usual way and employed to compose a correlation matrix $R_i$. Cross sectional covariances are then

• DGM 4:

$$\Sigma^{(i)} = \mathrm{diag}[\exp(0.5(x_{it,1} - x_{i\cdot,1}))]^{1/2} \, R_i \, \mathrm{diag}[\exp(0.5(x_{it,1} - x_{i\cdot,1}))]^{1/2}.$$

Over all replications of an experiment the sequence of covariance matrices $\Sigma^{(i)}$ is kept fixed.
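A possible construction of the irregular covariance matrices is sketched below. Several details are assumptions not fixed by the text: presample values of $w_{is}$ are drawn from the same normal distribution, the autocovariances are estimated with divisor 200 (which keeps the resulting Toeplitz matrix positive semidefinite), and $x_{i\cdot,1}$ is taken to be the time average of $x_{it,1}$. The function names are hypothetical.

```python
import numpy as np


def irregular_correlation(theta, T, rng, n_obs=200, lags=(1, 2, 4, 8, 12)):
    """Correlation matrix R_i built from the sample ACF of a simulated MA process."""
    max_lag = max(lags)
    # Assumption: presample innovations are drawn like all other w_{is}.
    w = rng.standard_normal(n_obs + max_lag)
    z = w[max_lag:].copy()
    for th, lag in zip(theta, lags):
        z += th * w[max_lag - lag : max_lag - lag + n_obs]
    z -= z.mean()
    # Sample autocovariances at lags 0, ..., T-1 with divisor n_obs.
    acov = np.array([z[: n_obs - k] @ z[k:] for k in range(T)]) / n_obs
    acf = acov / acov[0]
    idx = np.arange(T)
    return acf[np.abs(idx[:, None] - idx[None, :])]


def dgm4_covariance(R, x1_row):
    """Sigma_i = D^{1/2} R_i D^{1/2} with D = diag(exp(0.5 (x_it - mean_t x_it)))."""
    s = np.sqrt(np.exp(0.5 * (x1_row - x1_row.mean())))
    return R * np.outer(s, s)
```

The coefficients `theta` would be drawn once, for instance as `rng.uniform(-1, 1, size=5)`, and reused for every cross section member.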

For all models introduced so far the variance of individual effects is homogeneous, $\omega_i^2 = \omega^2 = 1$. To illustrate the potentially adverse impacts of heterogeneous second order features we consider a final scenario.

• DGM 5: Cross sectionally homogeneous covariance of $e_{it}$ as given for DGM 1 in (32) combined with a cross section specific variance of individual effects,

$$\omega_i^2 = \exp(0.5 x_{i\cdot,1}).$$

4.1.2 DGMs and alternative covariance estimators

All DGMs are characterized by some pattern of serial correlation such that the standard

Hausman test is likely characterized by invalid empirical test levels. DGM 1 is the only

specification for which an asymptotically pivotal test statistic, for instance implemented with

Σ(AR), can be obtained. For all other DGMs we expect asymptotically invalid significance

levels when using critical values from a χ2 distribution for inference. Moreover, DGMs 2 to 4

feature cross sectional heterogeneity of Σ such that iid resampling schemes might fail. For the

particular scenario DGM 5 (cross sectional variance of αi) it is worthwhile to point out that

along common lines of panel data modelling the true underlying parameters ω2i cannot be

estimated consistently. In this case the bootstrap approach is particularly promising since it

allows robust inference even under a false presumption concerning the individual effects’

variances. We consider two alternative a-priori restrictions made for the autoregressive

parameter, namely ρ0 = 0 and ρ0 = 0.5. The first choice resembles the widespread situation

where the potential of serial correlation is neglected. Choosing ρ0 = 0.5 under DGMs 1,2

and 5 mirrors the (unrealistic) scenario where an analyst has access to the true unconditional

autoregressive parameter.

4.1.3 A benchmark approach

As argued, ignoring potential deviations from iid assumptions will render the standard LSDV ($\beta^{(iid)}_{FE}$) and GLS estimators ($\beta^{(iid)}_{GLS}$) inefficient and thereby the covariance estimator of their difference becomes invalid. Cameron and Trivedi (2005, p. 718) propose a bootstrap algorithm to estimate $\mathrm{Cov}[\beta^{(iid)}_{GLS} - \beta^{(iid)}_{FE}]$ under serial dependence of unknown form. This approach builds on cross sectional iid resampling of $e_i^*$ from $\{e_i\}_{i=1}^{N}$ to obtain bootstrap vectors of dependent variables, $Y_i^* = \mathbf{1}_T \alpha_i + X_i \beta^{(iid)}_{FE} + e_i^*$. From this sample the difference between the two estimators, $\delta_s^* = \beta^{(iid,*)}_{GLS} - \beta^{(iid,*)}_{FE}$, is computed. After $S$ replications the Hausman statistic can be determined as

$$H_{iid} = \left(\beta^{(iid)}_{GLS} - \beta^{(iid)}_{FE}\right)' \left( \frac{1}{S-1} \sum_{s=1}^{S} \left(\delta_s^* - \bar{\delta}^*\right)\left(\delta_s^* - \bar{\delta}^*\right)' \right)^{-1} \left(\beta^{(iid)}_{GLS} - \beta^{(iid)}_{FE}\right), \qquad (33)$$

where $\bar{\delta}^*$ is the mean of $\delta_s^*$ over $S$ bootstrap replications.
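Once the bootstrap differences are available, the statistic in (33) reduces to a quadratic form in the sample covariance of these differences. A minimal sketch, assuming the two estimates and the $S$ bootstrap differences have already been computed elsewhere (the function name `hausman_bootstrap_cov` is hypothetical):

```python
import numpy as np


def hausman_bootstrap_cov(beta_gls, beta_fe, delta_draws):
    """Hausman statistic with a bootstrap estimate of Cov[beta_GLS - beta_FE].

    beta_gls, beta_fe : length-K arrays of slope estimates
    delta_draws       : (S, K) array, one bootstrap difference per row
    """
    d = np.asarray(beta_gls, dtype=float) - np.asarray(beta_fe, dtype=float)
    # Sample covariance over the S draws with divisor S - 1, as in (33).
    cov = np.atleast_2d(np.cov(np.asarray(delta_draws), rowvar=False, ddof=1))
    # Solve cov * w = d rather than inverting the covariance explicitly.
    return float(d @ np.linalg.solve(cov, d))
```

As a small check, four draws `[[1, 0], [-1, 0], [0, 1], [0, -1]]` have sample covariance $(2/3) I_2$, so a difference of `(1, 1)` gives a statistic of 3.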

4.1.4 Further remarks

To implement wild bootstrap inference we employ the Rademacher distribution in (27) and

the asymmetric distribution in (28). For the purpose of comparison we also evaluate boot-

strap approximations implemented by means of iid resampling. The number of bootstrap

replications is set to S = 299. Investigating size and (local) power properties the parameter δ

in (30) is chosen as δ = 0 and δ = $1/\sqrt{N}$, respectively. The considered time series dimensions

are T = 5, 10, 20. Since the asymptotic theory in Sections 2 and 3 has been set out under

the assumption of a fixed time dimension T and N → ∞ the cross section dimensions are

chosen as N = 40 and N = 1000, where the latter choice is thought to provide sufficiently

precise estimates of asymptotic size and power. Each DGM is generated 5000 times. Mostly

our discussion of empirical test features focusses on a nominal test level γ = 0.05. However,

a few selected results are also documented for testing at the nominal 1% and 10% level.

4.2 Monte Carlo results

4.2.1 Documentation

Simulation results for cross section dimensions N = 1000 and N = 40 are provided in Ta-

ble 1 and Table 2, respectively. Most of the documented results are obtained for time series

dimension T = 10. In the left hand side panels both tables provide empirical size (H0)

and size adjusted empirical power estimates (H1) over 5 distinct DGMs, 7 alternative esti-

mators Σ(•) and a benchmark iid bootstrap approach (Cameron and Trivedi 2005, p. 718).

Size adjustment is achieved by tuning the nominal level of a particular test procedure such

that the corresponding empirical level is 5%. For each covariance estimator critical values

of the respective Hausman statistic are alternatively taken from the χ2(2) distribution or

determined by means of wild or iid resampling. To facilitate the interpretation of the empir-

ical size estimates bold entries indicate that the nominal and empirical size differ with 5%

significance. Significant size distortions are diagnosed if the empirical rejection frequencies

under H0 are not covered by a confidence interval constructed around the nominal level as

$\gamma \pm 1.96\sqrt{\gamma(1 - \gamma)/5000}$. Setting γ = 0.05, 4.396% and 5.604% are the lower and upper

bound of this interval, respectively. A comparison of size adjusted power features is, how-

ever, only sensible if the empirical size estimates are insignificantly close to the respective

nominal levels. Although detailed results for (unadjusted) empirical power are not provided,

all test implementations turn out to have power in the sense that rejection frequencies are

larger under H1 in comparison to H0. As simulations were performed for a local alternative

δ = $1/\sqrt{N}$ the rejection frequencies under H1 are similar for N = 40 and N = 1000. Using

a fixed alternative, δ = 1 say, all test procedures turned out to be consistent, as empirical

power is unity for all experiments with N = 1000.
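The confidence band used to flag significant size distortions follows from the binomial standard error of a rejection frequency over 5000 replications. A short sketch reproducing the bounds quoted above (the function names `size_interval` and `is_size_distorted` are hypothetical):

```python
import math


def size_interval(gamma, replications, z=1.96):
    """Interval gamma +/- z * sqrt(gamma (1 - gamma) / replications)."""
    half_width = z * math.sqrt(gamma * (1.0 - gamma) / replications)
    return gamma - half_width, gamma + half_width


def is_size_distorted(empirical_size, gamma, replications, z=1.96):
    """True if the empirical rejection frequency lies outside the interval."""
    lower, upper = size_interval(gamma, replications, z)
    return not (lower <= empirical_size <= upper)


lower, upper = size_interval(0.05, 5000)
print(f"{100 * lower:.3f}% to {100 * upper:.3f}%")
```

With γ = 0.05 and 5000 replications the half width is about 0.604 percentage points, which gives the bounds of 4.396% and 5.604% stated in the text.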

While most results in Tables 1 and 2 refer to time dimension T = 10 a few selected results

are also provided for Monte Carlo experiments with T = 5, 20 performed at alternative levels

1% and 10%. For the time dimensions T = 5, 10, 20 test specific power estimates aggregated

over 5 alternative DGMs are documented. Moreover, we count significant size violations over

the 5 DGMs for nominal levels 1%, 5% and 10%.

For the case of a fully parametric covariance estimate we document empirical test features

for two versions of the wild bootstrap that differ with respect to the employed distribution

of ηi. Empirical size estimates are almost uniformly (i.e. over all DGMs and N = 40, 1000)

closer to the 5% nominal level if the wild bootstrap scheme is implemented by means of the

asymmetric distribution in (28). For 3 out of 10 experiments (5 DGMs and N = 40, 1000) the

Rademacher distribution yields significant oversizing of wild bootstrap inference. In terms of

size adjusted power both versions of the wild bootstrap perform similarly with a slight and

overall advantage of the asymmetric scheme in the asymptotic case (N = 1000). Detailed

results on the relative performance of these two resampling schemes over distinct covariance

estimators are not provided for space considerations but available from the authors upon

request. Since wild bootstrap inference implemented by means of an asymmetric distribution

appears slightly superior, the documented results for wild bootstrap inference refer to this

particular implementation.

Insert Table 1 about here

4.2.2 Asymptotic results

For the asymptotic case (N = 1000) the wild bootstrap shows by far the best empirical size

features. Over all time dimensions T = 5, 10, 20, DGMs, covariance estimators and nominal

significance levels γ = 0.01, 0.05 and 0.10 only two size estimates differ significantly from

the nominal counterpart. For all competing approaches the overall number of significant

size violations is considerably larger. Valuing significance of the Hausman statistic by means

of the χ2 distribution or versions of iid resampling is particularly invalid for the DGMs

where the variance of individual effects is heteroskedastic (DGM 5) or the underlying error

covariance is irregular (DGM 4).

Using the χ2(2)0.95 quantile to assess the significance of the Hausman statistic for DGMs 1

to 4, most size violations reflect undersizing. However, for the model with cross sectional

heteroskedasticity and uniform serial correlation (DGM 2) we have the interesting result

that the fully parametric covariance estimator induces significant undersizing (γ=3.98%)

while the semiparametric estimator yields too many rejections (γ=5.78%). Thus, ignoring

cross sectional heterogeneity an analyst is subjected to the risk of invalid inference without

control of the bias’ direction.

As the empirical performance of bootstrap based inference is similar under the null hy-

pothesis over all distinct covariance estimators, alternative wild bootstrap approaches can

directly be compared in terms of size adjusted asymptotic power. Throughout, asymptotic

power features for a given DGM are similar over alternative covariance estimators. While

power estimates are also similar for a given testing strategy over DGM 1 to DGM 4, inference

under DGM 5 (heterogeneous variance of individual effects) turns out to be less powerful. For

instance, presuming a parametric MA($[\sqrt{T}]$) structure for model disturbances yields size

adjusted power estimates between 13.6% and 14.3% for DGMs 1 to 4 while the corresponding

measure for DGM 5 is only 8.32%. In sum, over all simulated DGMs with T = 10 the

MA($[\sqrt{T}]$) based covariance estimator offers highest asymptotic power for testing at the 5%

level. The overall power difference to more restrictive covariance estimators, as, for instance,

the fully parametric estimate is moderate, however. While for the MA($[\sqrt{T}]$) covariance

estimator the wild bootstrap offers aggregated power of 64.3% the corresponding quantity

documented for the fully parametric estimator is 63.3%. Even ignoring the potential of serial

correlation, i.e. conditioning the parametric estimate on ρ0 = 0, yields an aggregated power

of 61.8%.

From the summary measures characterizing empirical power over time dimensions T =

5, 10, 20 it seems that there is hardly a dominating strategy to estimate Cov[ei]. For power

estimates of wild bootstrap inference it turns out that conditional on T = 5 a presumption of

MA(2) innovation dynamics yields superior power properties. For experiments with T = 20,

however, the more restrictive covariance estimator conditioning on ρ0 = 0.5 yields highest

aggregated power.

Although one should be careful in comparing alternative tests with divergent size prop-

erties, Table 1 documents that implementing the wild bootstrap with an asymmetric distri-

bution of ηi does not go along with power loss in comparison with iid resampling. Similarly,

the iid bootstrap scheme to estimate the covariance of the difference between the standard

LSDV and GLS estimator is not preferable in terms of asymptotic power. Testing at the 5%

level the latter benchmark is, however, characterized by significant size violations over all

DGMs except the cross sectionally homogeneous specification (DGM 1).

To summarize the asymptotic features of alternative approaches to Hausman testing we

conclude that ignoring cross sectional heterogeneity is likely to invalidate the level of the

test. Size violations are most suitably overcome by means of the wild bootstrap. For the

power features the relative impact of alternative covariance estimators is surprisingly small

given that their parametric strength is markedly varying. Note that the variety of covari-

ance estimators covers a most restrictive scenario of ignoring serial correlation ρ0 = 0 and

estimating the covariance matrix in a general semiparametric manner.

Insert Table 2 about here

4.2.3 Finite sample results

Table 2 documents simulation results for the ’finite’ sample case N = 40. Most strikingly,

conditional on the ’small’ cross sectional dimension the relative merits of (semi)parametric

covariance estimates deteriorate. For all underlying DGMs the fully parametric covariance

specification offers superior size adjusted power properties in comparison with Hausman

testing by means of Σ(SP ). For instance, employing χ2-quantiles for inference under DGM 2

(cross section specific variances with homogeneous correlation), size adjusted power estimates

are 12.9% and 8.5% if the Hausman statistic is determined with Σ(AR) and Σ(SP ), respectively.

While standard inference suffers from marked size distortions, bootstrap approaches, and

in particular, wild resampling, achieve most favorable empirical size features which are almost

always insignificantly close to the nominal counterpart of 5%. The summary statistics in

the right hand side panel of Table 2 reveal that for inference at nominal levels of 1% or

10% an iid resampling scheme may offer fewer size violations in comparison with the wild

bootstrap. With particular reference to DGM 5 (heterogeneous variance of individual effects),

however, the relative merits of the wild bootstrap against all competing test procedures

become evident. Except for test implementations employing the semiparametric covariance

estimator all empirical size estimates documented for iid resampling or χ2(2) based inference

exceed the nominal 5% level significantly.

Insert Table 3 about here

5 An empirical illustration

To illustrate some effects of alternative covariance estimators when analyzing real data we

consider specification testing in a translog model that quantifies yearly revenues from dairy

production over the period 1997-2005 for 149 farms located in Northern Germany (Abdulai

and Tietje, 2007). For specification testing the following model excluding time dummy

variables is considered:

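$$y_{it} = \nu + \sum_{k=1}^{5} \beta_k x_{it,k} + \frac{1}{2}\sum_{k=1}^{5} \beta_{kk} x_{it,k}^2 + \sum_{k=1}^{5}\sum_{l=k+1}^{5} \beta_{kl} x_{it,k} x_{it,l} + \alpha_i + e_{it}. \qquad (34)$$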

In (34) the output variable yit is the log total revenue from dairy production and log input

factors xit,k are, respectively, expenditure on feed (k = 1), expenditure on livestock (k = 2),

herd size (k = 3), land (k = 4) and labor (k = 5). All variables are deflated by an appropriate

price index to approximate physical quantities. For a detailed description of data collection

and processing we refer to Abdulai and Tietje (2007).
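The regressor matrix implied by the translog specification can be assembled as sketched below; `translog_design` is a hypothetical helper that takes the matrix of log inputs and returns the linear, squared and cross product terms. With five inputs this gives 5 + 5 + 10 = 20 slope regressors.

```python
import numpy as np
from itertools import combinations


def translog_design(X):
    """Build translog regressors from an (n, K) matrix of log inputs, K >= 2.

    Columns: K linear terms, K terms 0.5 * x_k^2, and K(K-1)/2 cross
    products x_k * x_l for k < l. The intercept is not included.
    """
    X = np.asarray(X, dtype=float)
    K = X.shape[1]
    squares = 0.5 * X**2
    cross = np.column_stack(
        [X[:, k] * X[:, l] for k, l in combinations(range(K), 2)]
    )
    return np.hstack([X, squares, cross])
```

The factor 0.5 is absorbed into the squared regressors here, so the corresponding coefficients are the $\beta_{kk}$ of (34).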

Estimation and diagnostic results are displayed in Table 3. Complementary to character-

izing the entire sample period we also provide statistical features of three subperiods each

covering three years. For the entire set of panel data we confirm that individual effects

are likely correlated with explanatory variables featuring the translog model. Depending

on the employed covariance estimator the Hausman statistic varies between 52.01 (para-

metric covariance conditional on ρ = −0.5) and 98.24 (GP). According to an asymptotic

χ2-distribution all statistics are highly significant at any conventional nominal level. Deter-

mining critical values by means of the wild bootstrap obtains throughout higher p−values

and thereby weakens the evidence against the null hypothesis. Conditional on a few co-

variance estimators (e.g. parametric covariance conditional on ρ = −0.2,−0.5) the wild

bootstrap amounts to accepting H0 with 1% significance. Applying the semiparametric co-

variance estimator (SP), the wild bootstrap yields a p−value in excess of 4%. Since the data

based AR coefficient is ρ = .077, the latter results might be attributed to a potential power

deficit of resampling the Hausman statistic. More interesting is that iid based resampling

almost uniformly confirms the rather low p−values implied by the χ2 distribution. Noting

that iid and wild resampling are equivalent only in case of cross sectional panel homogeneity

the latter observation hints at the incidence of heterogeneous error distributions featuring

the data.

The case of panel heterogeneity is further underpinned when looking at subsample specific

parameter estimates of the AR parameter and both variance parameters. Subsample specific

AR parameters are throughout negative and might suffer from small sample (T = 3) biases.

Standard deviations featuring individual effects (idiosyncratic disturbances) are smallest over

the period 2000-02 (2003-05). For these subsamples the respective standard error estimates

are by a factor .4 smaller in comparison with maximum estimates obtained from conditioning

on the period (1997-99).

While subsample specific Hausman statistics stay significant at conventional nominal lev-

els according to the χ2(20) distribution, resampling based critical values reveal that the evi-

dence in favor of correlation between individual effects and explanatory variables is strongest

for the first subperiod (1997-99). Conditional on the time span 2003-05 wild bootstrap based

critical values imply marginal significance levels of at least 2% (parametric covariance con-

ditional on ρ = 0.5) while a majority of test implementations hints at acceptance of H0.

For the third subperiod, distinct p−values implied by iid and wild resampling further sup-

port the likelihood of cross sectional heterogeneity of the underlying distribution of model

disturbances.

Throughout, using cross sectional iid resampling to robustly evaluate the covariance of

the difference between the common LSDV and GLS estimator yields Hausman statistics

which are rather close to the parametric covariance estimator building upon MA type serial

dependence.

6 Conclusions

In this paper we address the issue of testing for correlation between unobserved panel het-

erogeneity and explanatory variables under general covariance structures of underlying error

distributions. We consider the case of a finite time dimension while N → ∞. Second order

features cover (cross sectionally varying patterns of) serial correlation, time heteroskedastic-

ity or cross sectional variance of individual effects. For the determination of critical values

we propose a wild bootstrap scheme that retains its validity even in case the presumed co-

variance structure differs from the true second order features of error terms. In this case

nuisance parameters are likely to invalidate asymptotic pivotalness of a generalized Hausman

statistic.

Finite sample features involved when critical values for the Hausman statistic are taken

from the χ2-distribution or estimated alternatively by means of the bootstrap are examined.

For benchmarking purposes cross sectional iid resampling schemes are also investigated.

We find that the wild bootstrap approach is characterized by more accurate empirical size

features. Inference by means of critical values from the χ2-distribution suffers from both,

weaker empirical size features if the cross section dimension is small under correct covariance

specification, and nonpivotalness in case second order features are misspecified.

In terms of power the choice of particular covariance estimators is not crucial asymptot-

ically. For small cross sections, however, (misspecified) parsimonious parametric covariance

representations promise power advantages in comparison with using more general (semipara-

metric) covariance estimators.

With respect to the empirical example it is apparent that panel covariance homogeneity

is likely exceptional at least when modelling longitudinal data. In the light of potential

heterogeneity robust critical values promise actual significance levels which are close to the

nominal test levels. The considered subsamples underscore, in addition, that correlation

between individual effects and explanatory variables might also undergo some form of time

variation. In the latter case it is important to have tools of inference at hand that show

accurate empirical features in case of a small time dimension.

Throughout our analysis proceeds under the (common) assumption of cross sectional

independence which might be at odds with macroeconomic or spatial panel data. Recent

contributions to spatial econometrics or panel unit root testing allow for cross sectional error

correlation. Immunizing the Hausman statistic against cross sectional error correlation is an

important issue of further research.

Acknowledgements

The authors thank two anonymous referees, an associate editor and the editor for helpful

comments. Moreover, we are grateful to Awudu Abdulai for providing us the data used

for the empirical illustration. The first author gratefully acknowledges financial support of

Deutsche Forschungsgemeinschaft (DFG) (HE 2188/1-1).

7 Appendix

7.1 Proofs

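Proof of Proposition 1. We obtain from (13) that

$$\mathrm{Cov}\left( \sum_{i=1}^{N} C_{N,i} u_i \,\Big|\, X \right) = \mathrm{Cov}\left( \sqrt{N}(\beta_{FE} - \beta_{GLS}) \,\Big|\, X \right) = B_N^{-1} - A_N^{-1} \xrightarrow{P} B^{-1} - A^{-1}.$$

Furthermore, we obtain from the uniform integrability of $(\|X_i'X_i\|)_{i\in\mathbb{N}}$ that, for arbitrary $c > 0$,

$$P\left( \max_{1\le i\le N} \|X_i'X_i\| > cN \right) \le \sum_{i=1}^{N} P(\|X_i'X_i\| > cN) \le \frac{1}{cN}\sum_{i=1}^{N} E\left[\|X_i'X_i\| \, I(\|X_i'X_i\| > cN)\right] \underset{N\to\infty}{\longrightarrow} 0.$$

In other words, we have that $\max_{1\le i\le N} \|X_i\| = o_P(\sqrt{N})$, which implies that $c_N = \max_{1\le i\le N} \|C_{N,i}\| = o_P(1)$. Hence, we obtain by (A1,iv) that, for arbitrary $\epsilon > 0$,

$$\sum_{i=1}^{N} E\left( \|C_{N,i}u_i\|^2 I(\|C_{N,i}u_i\| > \epsilon) \,\big|\, X \right) \le \sum_{i=1}^{N} \|C_{N,i}\|^2 E\left( \|u_i\|^2 I(\|u_i\| > \epsilon/c_N) \,\big|\, X \right) = o_P(1) \cdot \sum_{i=1}^{N} \|C_{N,i}\|^2 = o_P(1), \qquad (35)$$

that is, a conditional Lindeberg condition is fulfilled. Now we obtain by the Lindeberg-Feller central limit theorem that

$$\left(B_N^{-1} - A_N^{-1}\right)^{-1/2} \sqrt{N}(\beta_{FE} - \beta_{GLS}) \xrightarrow{d} N(0_K, I_K),$$

which implies by the continuous mapping theorem

$$H_N = N(\beta_{FE} - \beta_{GLS})'\left(B_N^{-1} - A_N^{-1}\right)^{-1}(\beta_{FE} - \beta_{GLS}) \xrightarrow{d} \chi^2(K).$$

The second assertion (17) follows immediately from $\|\hat{A}_N - A_N\| \xrightarrow{P} 0$ and $\|\hat{B}_N - B_N\| \xrightarrow{P} 0$.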

Proof of Proposition 2. Analogous to the proof of Proposition 1.


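Proof of Proposition 3. We will first show that

$$
\mathcal{L}\left( \left( B_N^{-1} - A_N^{-1} \right)^{-1/2} \sum_{i=1}^{N} C_{N,i} u_i^* \,\Big|\, X_N \right)
\Longrightarrow N(0_K, I_K) \quad \text{in probability}, \tag{36}
$$

which implies by the continuous mapping theorem that

$$
\mathcal{L}\left( H_N^* \mid X_N \right) \Longrightarrow \chi^2(K) \quad \text{in probability}.
$$

Since $\chi^2(K)$ is a continuous distribution we obtain that

$$
\sup_{-\infty<z<\infty} \left| P(H_N^* \le z \mid X_N) - P(\chi^2(K) \le z) \right| \stackrel{P}{\longrightarrow} 0.
$$

(36) will actually follow from

$$
\mathcal{L}\left( \left( B_N^{-1} - A_N^{-1} \right)^{-1/2} \sum_{i=1}^{N} C_{N,i} u_i \eta_i \,\Big|\, X_N \right)
\Longrightarrow N(0_K, I_K) \quad \text{in probability} \tag{37}
$$

and

$$
T_N := \sum_{i=1}^{N} C_{N,i} (\hat u_i - u_i) \eta_i \stackrel{P}{\longrightarrow} 0. \tag{38}
$$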

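It follows from (35) that there exists a null sequence $(\epsilon_N)_{N\in\mathbb{N}}$ such that

$$
E\left( \sum_{i=1}^{N} \|C_{N,i}u_i\|^2 \, I(\|C_{N,i}u_i\|^2 > \epsilon_N) \,\Big|\, X \right) \stackrel{P}{\longrightarrow} 0.
$$

Let $\gamma_{N,i} = C_{N,i}u_i \, I(\|C_{N,i}u_i\| \le \epsilon_N)$. It follows from the latter display that

$$
\sum_{i=1}^{N} C_{N,i} u_i u_i' C_{N,i}' = \sum_{i=1}^{N} \gamma_{N,i}\gamma_{N,i}' + o_P(1). \tag{39}
$$

Using $E\left( \sum_{i=1}^{N} C_{N,i} u_i u_i' C_{N,i}' \mid X \right) = B_N^{-1} - A_N^{-1}$ we obtain that

$$
\left\| E\left( \sum_{i=1}^{N} \gamma_{N,i}\gamma_{N,i}' \,\Big|\, X \right) - \left( B_N^{-1} - A_N^{-1} \right) \right\|
\le \sum_{i=1}^{N} E\left( \|C_{N,i}u_i\|^2 \, I(\|C_{N,i}u_i\| > \epsilon_N) \,\big|\, X \right) \stackrel{P}{\longrightarrow} 0. \tag{40}
$$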


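For the $(k, l)$-th entry of the matrix $\sum_{i=1}^{N} \gamma_{N,i}\gamma_{N,i}'$, we have

$$
\begin{aligned}
E\left( \left[ \sum_{i=1}^{N} \Big( (\gamma_{N,i})_k (\gamma_{N,i})_l - E\big( (\gamma_{N,i})_k (\gamma_{N,i})_l \mid X \big) \Big) \right]^2 \,\Big|\, X \right)
&= \sum_{i=1}^{N} E\left( \big[ (\gamma_{N,i})_k (\gamma_{N,i})_l - E\big( (\gamma_{N,i})_k (\gamma_{N,i})_l \mid X \big) \big]^2 \,\big|\, X \right) \\
&\le \sum_{i=1}^{N} E\left( \big[ (\gamma_{N,i})_k (\gamma_{N,i})_l \big]^2 \,\big|\, X \right) \\
&\le \epsilon_N^2 \sum_{i=1}^{N} E\left( (\gamma_{N,i})_k^2 \,\big|\, X \right) \\
&\le \epsilon_N^2 \sum_{i=1}^{N} E\left( (C_{N,i}u_i)_k^2 \,\big|\, X \right) \\
&= o_P(1),
\end{aligned}
$$

which implies that

$$
\sum_{i=1}^{N} \gamma_{N,i}\gamma_{N,i}' = E\left( \sum_{i=1}^{N} \gamma_{N,i}\gamma_{N,i}' \,\Big|\, X \right) + o_P(1). \tag{41}
$$

From (39), (40) and (41) we conclude that

$$
\sum_{i=1}^{N} C_{N,i} u_i u_i' C_{N,i}' = B_N^{-1} - A_N^{-1} + o_P(1),
$$

which implies that

$$
\operatorname{Cov}\left( \left( B_N^{-1} - A_N^{-1} \right)^{-1/2} \sum_{i=1}^{N} C_{N,i} u_i \eta_i \,\Big|\, X_N \right)
= \left( B_N^{-1} - A_N^{-1} \right)^{-1/2} \sum_{i=1}^{N} C_{N,i} u_i u_i' C_{N,i}' \left( B_N^{-1} - A_N^{-1} \right)^{-1/2}
\stackrel{P}{\longrightarrow} I_K. \tag{42}
$$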

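Moreover, since $\sum_{i=1}^{N} \|C_{N,i}u_i\|^2 \stackrel{P}{\longrightarrow} \operatorname{tr}(B^{-1} - A^{-1})$ and, according to (35), $P(\max_{1\le i\le N} \|C_{N,i}u_i\| > c \mid X) \le (1/c^2) \sum_{i=1}^{N} E[\|C_{N,i}u_i\|^2 I(\|C_{N,i}u_i\| > c) \mid X] \stackrel{P}{\longrightarrow} 0$, we obtain, for arbitrary $\epsilon > 0$, that

$$
\sum_{i=1}^{N} E\left( \|C_{N,i}u_i\eta_i\|^2 \, I(\|C_{N,i}u_i\eta_i\| > \epsilon) \,\big|\, X_N \right)
\le \sum_{i=1}^{N} \|C_{N,i}u_i\|^2 \, E\left( \eta_i^2 \, I(\|C_{N,i}u_i\| |\eta_i| > \epsilon) \,\big|\, X_N \right)
\stackrel{P}{\longrightarrow} 0,
$$

that is, we have again a conditional Lindeberg condition being fulfilled. Therefore, (37) follows from (42) by the Lindeberg-Feller central limit theorem.

Now it remains to prove (38). We have that

$$
T_N = -\sum_{i=1}^{N} C_{N,i} X_i (\hat\beta_{FE} - \beta) \eta_i.
$$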


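Since $E(\eta_i^2 \mid X_N) = 1$ we obtain that

$$
\begin{aligned}
E\left( \|T_N\|^2 \mid X_N \right)
&= \sum_{i=1}^{N} (\hat\beta_{FE} - \beta)' X_i' C_{N,i}' C_{N,i} X_i (\hat\beta_{FE} - \beta) \\
&\le \|\hat\beta_{FE} - \beta\|^2 \cdot \max_{1\le i\le N} \|X_i'X_i\| \cdot \sum_{i=1}^{N} C_{N,i}' C_{N,i} \\
&= O_P(N^{-1}) \cdot o_P(N) \cdot O_P(1) = o_P(1),
\end{aligned}
$$

that is, (38) also holds.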
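The resampling step behind this proof can be sketched in Python. Reading (36) to (38) together, the bootstrap disturbances take the form $u_i^* = \hat u_i \eta_i$ with one multiplier per cross section unit and $E(\eta_i^2 \mid X_N) = 1$. The two multiplier laws below (a symmetric two-point Rademacher law and the asymmetric two-point law commonly attributed to Mammen, 1993) are an assumption about what the labels $\eta_i(R)$ and $\eta_i(M)$ in the tables refer to, and all function and variable names are hypothetical.

```python
import math
import random

S5 = math.sqrt(5.0)

# Each law is a list of (value, probability) pairs with mean 0 and variance 1.
RADEMACHER = [(-1.0, 0.5), (1.0, 0.5)]
MAMMEN = [(-(S5 - 1.0) / 2.0, (S5 + 1.0) / (2.0 * S5)),
          ((S5 + 1.0) / 2.0, (S5 - 1.0) / (2.0 * S5))]


def moment(points, k):
    """Exact k-th moment of a two-point law."""
    return sum(p * v ** k for v, p in points)


def draw(points, rng):
    """Draw one multiplier eta_i from a two-point law."""
    (low, p_low), (high, _) = points
    return low if rng.random() < p_low else high


def wild_resample(residuals, points, rng):
    """Multiply the whole residual vector of unit i by a single draw eta_i.

    Keeping one multiplier per unit leaves the within-unit (time series)
    dependence of the residuals untouched.
    """
    resampled = []
    for u_i in residuals:
        eta = draw(points, rng)
        resampled.append([eta * u for u in u_i])
    return resampled


# Toy residuals for two units and three periods (hypothetical numbers).
u_hat = [[1.0, -2.0, 0.5], [0.3, 0.3, -0.1]]
u_star = wild_resample(u_hat, RADEMACHER, random.Random(0))
```

Under the Rademacher law each unit's residual vector is either kept or sign-flipped as a block, which is why unit-specific variances and serial correlation patterns carry over to the bootstrap sample.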

Proof of Proposition 4. Analogous to the proof of Proposition 3.

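Proof of Theorem 1. Since $\chi^2(K)$ is a continuous distribution we conclude from Proposition 1 that

$$
\sup_{-\infty<z<\infty} \left| P(H_N \le z) - P(\chi^2(K) \le z) \right| \mathop{\longrightarrow}_{N\to\infty} 0, \tag{43}
$$

which implies by Proposition 3 that

$$
\sup_{-\infty<z<\infty} \left| P(H_N \le z) - P(H_N^* \le z \mid X_N) \right| \stackrel{P}{\longrightarrow} 0. \tag{44}
$$

Using the fact that $P(H_N^* > c_\gamma^* \mid X_N) \stackrel{P}{\longrightarrow} \gamma$ (since the distribution of $H_N^*$ can be discrete we cannot guarantee that $P(H_N^* > c_\gamma^* \mid X_N) = \gamma$; however, Proposition 3 ensures at least this convergence), we obtain that

$$
\left| P(H_N > c)\big|_{c = c_\gamma^*} - \gamma \right|
\le \sup_{-\infty<z<\infty} \left| P(H_N \le z) - P(H_N^* \le z \mid X_N) \right| + o_P(1)
\stackrel{P}{\longrightarrow} 0.
$$

This implies that

$$
P(H_N > c_\gamma^*) \mathop{\longrightarrow}_{N\to\infty} \gamma.
$$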
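In practice the bootstrap critical value $c_\gamma^*$ is approximated from a finite number of bootstrap replications of $H_N^*$. The following Python sketch shows one common convention, taking the smallest bootstrap value whose empirical exceedance frequency does not exceed $\gamma$; the function names, the number of replications and the toy values are assumptions of this sketch rather than the authors' implementation.

```python
def bootstrap_critical_value(h_star, gamma):
    """Smallest bootstrap value c with #{H* > c} / B <= gamma."""
    ordered = sorted(h_star)
    b = len(ordered)
    for c in ordered:
        if sum(1 for s in ordered if s > c) <= gamma * b:
            return c
    return ordered[-1]  # not reached for gamma >= 0; kept for safety


def bootstrap_p_value(h, h_star):
    """Share of bootstrap statistics at least as large as the observed one."""
    return sum(1 for s in h_star if s >= h) / len(h_star)


# Toy bootstrap draws (hypothetical): B = 10 replications with values 1, ..., 10.
h_star = [float(v) for v in range(1, 11)]
c_star = bootstrap_critical_value(h_star, gamma=0.2)
reject = 9.5 > c_star
p_val = bootstrap_p_value(9.5, h_star)
```

Because the bootstrap distribution is discrete, the exceedance frequency at `c_star` need not equal $\gamma$ exactly, which mirrors the remark in the proof.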

Proof of Theorem 2. Analogous to the proof of Theorem 1.

References

Abdulai, A. and Tietje, H. (2007). Estimating technical efficiency under unobserved heterogeneity with stochastic frontier models: Application to northern Germany dairy farms. European Review of Agricultural Economics, 18, 1–24.


Ahn, S.C. and Moon, H.R. (2001). On Large-N and Large-T properties of panel data estimators and the Hausman test. mimeo, University of Southern California.

Amemiya, T. (1971). The estimation of variances in a variance-components model. International Economic Review, 12, 1–13.

Anselin, L., Florax, R.J.G.M. and Rey, S.J. (Eds.) (2004). Advances in Spatial Econometrics.

Springer, Berlin.

Baltagi, B.H. (2001). Econometric Analysis of Panel Data. John Wiley, Chichester.

Baltagi, B.H. and Griffin, J.M. (1983). Gasoline demand in the OECD: An application of

pooling and testing procedures. European Economic Review, 29, 745–753.

Baltagi, B.H. and Kao, C. (2000). Nonstationary panels, cointegration in panels and dynamic panels: A survey. In: Baltagi (Ed.): Nonstationary panels, Cointegration in panels and dynamic panels. Advances in Econometrics, 15, JAI Press, Amsterdam, 7–52.

Baltagi, B.H. and Pinnoi, N. (1995). Public capital stock and state productivity growth:

Further evidence from an error components model. Empirical Economics, 20, 351–359.

Bole, V.A. and Rebec, P. (2004). Bootstrapping the Hausman test in panel data models.

Manuscript, available at SSRN: http://ssrn.com/abstract=628321

Bollerslev, T., Chou, R.Y. and Kroner, K.F. (1992). ARCH modelling in finance: A review

of the theory and empirical evidence. Journal of Econometrics, 52, 5–59.

Cameron, A.C. and Trivedi, P.K. (2005). Microeconometrics: Methods and Applications. Cambridge University Press, New York.

Davidson, R. and Flachaire, E. (2001). The wild bootstrap, tamed at last. GREQAM Document de Travail 99A32.

Grunfeld, Y. (1958). The determinants of corporate investment, unpublished Ph.D. dissertation (University of Chicago, Chicago).

Hall, P. (1992). The Bootstrap and Edgeworth Expansion. Springer, New York.

Hausman, J.A. (1978). Specification tests in econometrics. Econometrica, 46, 1251–1271.


Herwartz, H. and Neumann, M.H. (2005). Bootstrap inference in single equation error correction models. Journal of Econometrics, 128, 165–193.

Herwartz, H. and Neumann, M.H. (2007). A robust bootstrap approach to the Hausman test

in stationary panel data models. Economic Working Papers, Kiel University, 2007-29

Kiefer, N.M. (1980). Estimation of fixed effect models for time series of cross sections with arbitrary intertemporal covariance. Journal of Econometrics, 14, 195–202.

Lillard, L.A. and Willis, R.J. (1979). Components of variation in panel earnings data: American scientists 1960-1970. Econometrica, 47, 437–454.

Liu, R.Y. (1988). Bootstrap procedures under some non-i.i.d. models. Annals of Statistics,

16, 1696–1708.

Mammen, E. (1993). Bootstrap and wild bootstrap for high dimensional linear models. Annals of Statistics, 21, 255–285.

Munnell, A. (1990). Why has productivity growth declined? Productivity and public investment. New England Economic Review, January/February, 3–22.

Nerlove, M. (1971). Further evidence on the estimation of dynamic economic relations from

a time series of cross sections. Econometrica, 39, 359–382.

Swamy, P.A.V.B. and Arora, S.S. (1972). The exact finite sample properties of the estimators

of coefficients in the error components regression model. Econometrica, 40, 261–275.

Wallace, T.D. and Hussain, A. (1969). The use of error components models in combining cross section and time series data. Econometrica, 37, 55–72.

Wu, C.F.J. (1986). Jackknife, bootstrap, and other resampling methods in regression analysis

(with discussion). Annals of Statistics, 14, 1261–1343.


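| Σ(•) | | DGM 1 $H_0$ | DGM 1 $H_1$ | DGM 2 $H_0$ | DGM 2 $H_1$ | DGM 3 $H_0$ | DGM 3 $H_1$ | DGM 4 $H_0$ | DGM 4 $H_1$ | DGM 5 $H_0$ | DGM 5 $H_1$ | $\sum_{\text{DGMs}}\hat\gamma_T(H_1)$, $T=5$ | $T=10$ | $T=20$ | $\#(\hat\gamma\neq\gamma)_5$, 1% | 5% | 10% | $\#(\hat\gamma\neq\gamma)_{10}$, 1% | 5% | 10% | $\#(\hat\gamma\neq\gamma)_{20}$, 1% | 5% | 10% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AR | $\chi^2(2)$ | 3.76 | 13.9 | 3.98 | 13.9 | 4.72 | 15.0 | 4.56 | 13.4 | 5.78 | 7.96 | 57.5 | 64.1 | 67.5 | 2 | 3 | 4 | 3 | 3 | 3 | 3 | 2 | 3 |
| AR | $\eta_i(R)$ | 5.30 | 13.0 | 5.30 | 13.4 | 5.04 | 15.3 | 4.98 | 14.0 | 5.62 | 7.52 | 57.8 | 63.2 | 66.7 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| AR | $\eta_i(M)$ | 5.24 | 13.6 | 5.02 | 13.9 | 5.22 | 14.2 | 5.10 | 13.7 | 5.34 | 7.80 | 58.4 | 63.3 | 67.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| AR | iid b. | 5.40 | 13.6 | 5.36 | 14.2 | 5.56 | 14.3 | 5.84 | 13.3 | 6.62 | 7.78 | 58.4 | 63.2 | 66.6 | 1 | 1 | 2 | 3 | 2 | 2 | 1 | 1 | 1 |
| AR, $\rho = 0.5$ | $\chi^2(2)$ | 3.20 | 13.6 | 3.50 | 13.3 | 4.24 | 13.3 | 2.96 | 14.1 | 5.60 | 7.58 | 59.1 | 62.0 | 68.6 | 4 | 4 | 4 | 3 | 4 | 4 | 5 | 4 | 4 |
| AR, $\rho = 0.5$ | $\eta_i(M)$ | 5.34 | 13.3 | 5.22 | 13.2 | 5.48 | 13.1 | 4.82 | 13.6 | 5.30 | 7.86 | 58.5 | 61.1 | 67.8 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| AR, $\rho = 0.5$ | iid b. | 5.46 | 13.4 | 5.64 | 13.9 | 5.70 | 12.8 | 5.18 | 14.0 | 6.48 | 7.72 | 58.0 | 61.9 | 68.7 | 2 | 1 | 2 | 1 | 3 | 3 | 1 | 1 | 1 |
| AR, $\rho = 0.0$ | $\chi^2(2)$ | 4.30 | 13.7 | 4.78 | 13.4 | 4.36 | 14.3 | 3.70 | 13.6 | 5.96 | 8.08 | 57.8 | 63.2 | 66.6 | 4 | 4 | 5 | 3 | 4 | 3 | 1 | 3 | 4 |
| AR, $\rho = 0.0$ | $\eta_i(M)$ | 5.28 | 13.3 | 5.30 | 13.3 | 5.36 | 14.0 | 5.10 | 13.5 | 5.18 | 7.76 | 58.6 | 61.8 | 67.6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| AR, $\rho = 0.0$ | iid b. | 5.22 | 13.4 | 5.58 | 13.2 | 5.82 | 14.1 | 5.58 | 13.4 | 6.38 | 7.74 | 59.0 | 61.9 | 67.1 | 1 | 1 | 2 | 2 | 2 | 4 | 1 | 1 | 1 |
| MA | $\chi^2(2)$ | 4.34 | 14.0 | 4.56 | 13.5 | 4.90 | 15.1 | 5.18 | 13.5 | 5.82 | 8.00 | 58.1 | 64.0 | 68.0 | 1 | 1 | 2 | 2 | 2 | 2 | 3 | 3 | 2 |
| MA | $\eta_i(M)$ | 5.34 | 13.9 | 4.98 | 14.1 | 5.18 | 14.3 | 5.00 | 13.6 | 5.06 | 8.32 | 58.1 | 64.3 | 66.8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| MA | iid b. | 5.28 | 13.8 | 5.52 | 14.1 | 5.52 | 15.1 | 5.56 | 14.2 | 6.44 | 7.70 | 58.0 | 64.9 | 65.7 | 2 | 1 | 1 | 1 | 1 | 2 | 1 | 1 | 1 |
| TH | $\chi^2(2)$ | 4.54 | 14.0 | 4.88 | 13.0 | 5.14 | 14.8 | 5.22 | 13.8 | 5.70 | 7.62 | 57.8 | 63.2 | 67.0 | 0 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 3 |
| TH | $\eta_i(M)$ | 5.24 | 13.6 | 5.00 | 13.9 | 5.20 | 14.2 | 4.96 | 13.8 | 5.24 | 7.80 | 57.1 | 63.5 | 65.9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| TH | iid b. | 5.20 | 13.5 | 5.62 | 13.7 | 5.52 | 14.8 | 5.60 | 13.4 | 6.24 | 8.06 | 57.8 | 63.5 | 66.7 | 1 | 1 | 1 | 3 | 2 | 2 | 1 | 1 | 1 |
| GP | $\chi^2(2)$ | 5.28 | 14.2 | 5.46 | 13.6 | 5.70 | 14.6 | 5.62 | 13.8 | 6.26 | 7.80 | 59.5 | 64.0 | 67.8 | 1 | 1 | 2 | 1 | 3 | 3 | 1 | 1 | 1 |
| GP | $\eta_i(M)$ | 5.24 | 13.7 | 5.02 | 13.8 | 5.20 | 15.0 | 4.84 | 13.6 | 5.26 | 7.92 | 60.1 | 64.0 | 67.3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| GP | iid b. | 5.46 | 13.7 | 5.52 | 14.2 | 5.72 | 14.4 | 5.68 | 13.3 | 6.54 | 7.80 | 57.3 | 63.3 | 66.7 | 2 | 1 | 1 | 2 | 3 | 3 | 1 | 1 | 1 |
| SP | $\chi^2(2)$ | 4.82 | 14.4 | 5.78 | 12.7 | 5.34 | 14.9 | 5.26 | 14.5 | 5.72 | 8.40 | 56.8 | 64.9 | 67.5 | 0 | 1 | 2 | 2 | 2 | 1 | 0 | 2 | 1 |
| SP | $\eta_i(M)$ | 4.88 | 13.8 | 5.22 | 12.8 | 4.90 | 14.6 | 4.90 | 13.6 | 4.78 | 8.08 | 58.7 | 62.8 | 65.2 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| SP | iid b. | 5.10 | 13.9 | 5.60 | 12.7 | 5.40 | 14.5 | 5.48 | 13.7 | 5.72 | 8.10 | 57.6 | 62.8 | 67.2 | 0 | 1 | 2 | 0 | 1 | 1 | 0 | 1 | 1 |
| $H_{iid}$ | | 5.36 | 13.7 | 5.72 | 13.2 | 6.00 | 14.4 | 5.82 | 13.6 | 6.64 | 8.20 | 58.2 | 63.1 | 67.3 | 2 | 1 | 2 | 3 | 4 | 4 | 1 | 2 | 1 |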

Table 1: Empirical rejection frequencies from alternative critical values for the Hausman statistic. Panel dimensions are $N = 1000$ and (mostly) $T = 10$. For each covariance estimator Σ(•) (see Section 2.4) critical values are obtained from the $\chi^2$-distribution, the wild bootstrap ($\eta_i(R)$, $\eta_i(M)$) and cross sectional iid resampling (iid b.). $H_{iid}$ signifies a bootstrap approach to estimate $\operatorname{Cov}[\hat\beta^{(iid)}_{GLS} - \hat\beta^{(iid)}_{FE}]$. Error distributions are cross sectionally homogeneous (DGM 1), have cross section specific variance (DGM 2) or correlation (DGM 3). DGMs 4 and 5 feature irregular correlation patterns and cross section specific variances of individual effects, respectively. Rejection frequencies ($100 \cdot \hat\gamma$) under $H_0$ and size adjusted rejection frequencies under $H_1$ are given. Bold entries indicate that under $H_0$ $\hat\gamma$ is not covered by a 95% confidence interval around the nominal 5% level. Columns '$\sum_{\text{DGMs}}\hat\gamma_T(H_1)$' display the sum of power estimates over 5 alternative DGMs when $\gamma = 5\%$ and $T = 5, 10, 20$. The number of significant violations of $\gamma = 1\%, 5\%, 10\%$ over the five DGMs with alternative time dimensions is listed underneath $\#(\hat\gamma \neq \gamma)_T$.
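The check for significant size violations described in the caption can be sketched as a normal approximation band around the nominal level. The number of Monte Carlo replications is not stated in this part of the paper, so `n_rep = 5000` below is an assumption made only for illustration, as are the function names and the use of the 1.96 quantile.

```python
import math


def nominal_band(gamma, n_rep, z=1.96):
    """Approximate 95% band for an empirical rejection frequency under H0."""
    half_width = z * math.sqrt(gamma * (1.0 - gamma) / n_rep)
    return gamma - half_width, gamma + half_width


def violates(freq_percent, gamma, n_rep):
    """True if a rejection frequency (in percent) falls outside the band."""
    low, high = nominal_band(gamma, n_rep)
    return not (low <= freq_percent / 100.0 <= high)


# n_rep = 5000 is a hypothetical replication count, not taken from the paper.
low, high = nominal_band(0.05, n_rep=5000)
outside = violates(3.76, 0.05, n_rep=5000)  # first chi-square entry of Table 1
inside = violates(5.30, 0.05, n_rep=5000)   # first eta_i(R) entry of Table 1
```

With a different replication count the band widens or narrows accordingly, so the classification of individual entries depends on that assumption.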


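| Σ(•) | | DGM 1 $H_0$ | DGM 1 $H_1$ | DGM 2 $H_0$ | DGM 2 $H_1$ | DGM 3 $H_0$ | DGM 3 $H_1$ | DGM 4 $H_0$ | DGM 4 $H_1$ | DGM 5 $H_0$ | DGM 5 $H_1$ | $\sum_{\text{DGMs}}\hat\gamma(H_1)$, $T=5$ | $T=10$ | $T=20$ | $\#(\hat\gamma\neq\gamma)_5$, 1% | 5% | 10% | $\#(\hat\gamma\neq\gamma)_{10}$, 1% | 5% | 10% | $\#(\hat\gamma\neq\gamma)_{20}$, 1% | 5% | 10% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AR | $\chi^2(2)$ | 2.82 | 13.5 | 2.92 | 12.9 | 4.14 | 13.6 | 2.02 | 12.6 | 5.74 | 8.00 | 56.3 | 60.6 | 62.0 | 5 | 4 | 4 | 4 | 5 | 5 | 3 | 4 | 4 |
| AR | $\eta_i(R)$ | 5.72 | 13.2 | 5.00 | 13.2 | 5.38 | 12.7 | 5.32 | 13.4 | 5.72 | 8.06 | 52.3 | 60.6 | 62.4 | 2 | 2 | 2 | 1 | 2 | 2 | 1 | 4 | 5 |
| AR | $\eta_i(M)$ | 4.94 | 13.6 | 5.00 | 12.1 | 4.90 | 12.7 | 4.82 | 12.9 | 5.10 | 8.12 | 51.1 | 59.4 | 62.4 | 5 | 2 | 1 | 3 | 0 | 2 | 4 | 1 | 4 |
| AR | iid b. | 4.74 | 13.2 | 5.02 | 12.9 | 5.04 | 13.3 | 4.32 | 12.7 | 6.64 | 7.58 | 53.1 | 59.6 | 59.5 | 2 | 2 | 3 | 3 | 2 | 1 | 1 | 3 | 4 |
| AR, $\rho = 0.5$ | $\chi^2(2)$ | 2.60 | 14.1 | 2.18 | 13.3 | 2.74 | 14.4 | 1.54 | 12.9 | 5.64 | 7.66 | 57.4 | 62.4 | 61.5 | 5 | 5 | 5 | 4 | 5 | 5 | 4 | 5 | 5 |
| AR, $\rho = 0.5$ | $\eta_i(M)$ | 5.16 | 13.7 | 4.88 | 12.0 | 4.52 | 13.9 | 4.98 | 13.4 | 5.16 | 8.22 | 52.3 | 61.2 | 62.1 | 5 | 2 | 3 | 4 | 0 | 2 | 3 | 0 | 4 |
| AR, $\rho = 0.5$ | iid b. | 4.76 | 14.4 | 5.00 | 12.8 | 4.70 | 14.2 | 4.64 | 12.5 | 7.32 | 7.96 | 55.4 | 61.8 | 60.3 | 2 | 2 | 3 | 4 | 1 | 1 | 1 | 2 | 4 |
| AR, $\rho = 0.0$ | $\chi^2(2)$ | 3.46 | 13.6 | 3.58 | 13.0 | 3.66 | 14.1 | 2.74 | 12.9 | 5.78 | 8.22 | 55.4 | 61.9 | 61.7 | 5 | 5 | 5 | 4 | 5 | 5 | 3 | 3 | 2 |
| AR, $\rho = 0.0$ | $\eta_i(M)$ | 4.84 | 13.1 | 4.88 | 12.0 | 4.74 | 13.4 | 4.80 | 13.2 | 4.90 | 7.96 | 51.5 | 59.7 | 61.3 | 5 | 2 | 1 | 4 | 0 | 1 | 4 | 2 | 4 |
| AR, $\rho = 0.0$ | iid b. | 4.68 | 13.5 | 4.82 | 12.6 | 4.94 | 13.2 | 4.24 | 11.9 | 6.46 | 8.16 | 53.3 | 59.3 | 59.0 | 2 | 2 | 3 | 3 | 2 | 2 | 1 | 2 | 4 |
| MA | $\chi^2(2)$ | 3.32 | 13.0 | 3.44 | 12.7 | 4.30 | 13.2 | 2.66 | 12.5 | 5.90 | 7.94 | 56.2 | 59.3 | 61.4 | 4 | 4 | 4 | 4 | 5 | 4 | 3 | 4 | 4 |
| MA | $\eta_i(M)$ | 4.92 | 13.7 | 4.80 | 11.9 | 4.76 | 13.5 | 4.72 | 13.5 | 5.02 | 7.88 | 51.3 | 60.5 | 62.9 | 5 | 3 | 2 | 2 | 0 | 1 | 3 | 0 | 4 |
| MA | iid b. | 4.78 | 13.8 | 4.92 | 12.6 | 5.00 | 13.3 | 4.16 | 12.4 | 6.68 | 7.52 | 55.2 | 59.6 | 58.0 | 2 | 2 | 3 | 4 | 2 | 2 | 1 | 4 | 4 |
| TH | $\chi^2(2)$ | 4.72 | 12.3 | 4.48 | 13.1 | 4.80 | 13.3 | 5.52 | 10.2 | 5.72 | 7.52 | 55.6 | 56.5 | 51.8 | 3 | 3 | 3 | 2 | 1 | 1 | 3 | 3 | 4 |
| TH | $\eta_i(M)$ | 4.84 | 13.2 | 5.02 | 12.2 | 5.32 | 12.1 | 5.12 | 12.0 | 4.80 | 7.84 | 51.9 | 57.4 | 60.5 | 5 | 2 | 1 | 2 | 0 | 2 | 3 | 2 | 2 |
| TH | iid b. | 4.92 | 12.8 | 5.02 | 12.4 | 5.08 | 13.5 | 4.72 | 11.4 | 6.20 | 7.34 | 55.9 | 57.4 | 57.0 | 3 | 2 | 3 | 1 | 1 | 2 | 1 | 2 | 3 |
| GP | $\chi^2(2)$ | 4.66 | 13.6 | 4.94 | 12.4 | 4.96 | 13.8 | 4.42 | 12.6 | 6.14 | 8.36 | 55.3 | 60.7 | 61.4 | 3 | 2 | 3 | 2 | 1 | 2 | 1 | 1 | 4 |
| GP | $\eta_i(M)$ | 5.24 | 13.3 | 5.50 | 11.9 | 5.28 | 12.9 | 5.32 | 12.5 | 5.14 | 8.38 | 51.9 | 59.0 | 62.5 | 5 | 3 | 2 | 3 | 0 | 3 | 3 | 2 | 4 |
| GP | iid b. | 4.86 | 13.6 | 5.30 | 12.5 | 5.34 | 13.4 | 5.02 | 12.3 | 6.50 | 8.26 | 55.4 | 60.0 | 60.9 | 3 | 2 | 4 | 0 | 1 | 4 | 1 | 4 | 5 |
| SP | $\chi^2(2)$ | 4.82 | 8.84 | 4.76 | 8.50 | 4.76 | 10.2 | 4.20 | 9.18 | 5.36 | 6.28 | 49.4 | 43.0 | 32.3 | 2 | 1 | 2 | 3 | 1 | 1 | 2 | 1 | 0 |
| SP | $\eta_i(M)$ | 4.90 | 8.64 | 4.28 | 8.68 | 4.48 | 10.0 | 4.92 | 8.34 | 5.06 | 5.74 | 48.4 | 41.4 | 33.9 | 5 | 4 | 1 | 4 | 1 | 1 | 5 | 4 | 2 |
| SP | iid b. | 4.66 | 8.80 | 4.72 | 8.80 | 4.58 | 10.1 | 4.44 | 8.88 | 5.34 | 6.14 | 49.3 | 42.7 | 33.1 | 2 | 1 | 2 | 3 | 0 | 2 | 4 | 1 | 0 |
| $H_{iid}$ | | 5.12 | 12.9 | 5.08 | 12.7 | 5.16 | 13.2 | 4.32 | 12.8 | 7.22 | 8.00 | 54.4 | 59.7 | 59.3 | 2 | 2 | 3 | 2 | 2 | 3 | 1 | 3 | 4 |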

Table 2: Empirical rejection frequencies for alternative strategies to obtain critical values for the Hausman statistic for panel dimensions $N = 40$ and (mostly) $T = 10$. For further notes see Table 1.


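| crit | Σ | 97-05 | 97-99 | 00-02 | 03-05 | Σ | 97-05 | 97-99 | 00-02 | 03-05 |
|---|---|---|---|---|---|---|---|---|---|---|
| $H_N$ | AR | 61.47 | 59.40 | 32.78 | 41.14 | $\rho = 0.5$ | 84.25 | 55.94 | 47.56 | 56.99 |
| $\chi^2$ | AR | 0.000 | 0.001 | 3.567 | 0.357 | $\rho = 0.5$ | 0.000 | 0.003 | 0.049 | 0.002 |
| $\eta_i(M)$ | AR | 0.835 | 2.504 | 3.506 | 5.843 | $\rho = 0.5$ | 0.835 | 2.671 | 0.501 | 2.003 |
| iid b. | AR | 0.000 | 0.334 | 4.508 | 0.000 | $\rho = 0.5$ | 0.000 | 0.501 | 3.005 | 0.000 |
| $H_N$ | TH | 65.57 | 62.38 | 36.28 | 49.22 | $\rho = 0.2$ | 66.06 | 53.81 | 39.12 | 47.00 |
| $\chi^2$ | TH | 0.000 | 0.000 | 1.426 | 0.029 | $\rho = 0.2$ | 0.000 | 0.006 | 0.644 | 0.059 |
| $\eta_i(M)$ | TH | 0.501 | 2.170 | 4.841 | 4.341 | $\rho = 0.2$ | 0.835 | 2.671 | 0.668 | 5.008 |
| iid b. | TH | 0.000 | 0.334 | 5.008 | 0.000 | $\rho = 0.2$ | 0.000 | 0.501 | 1.669 | 0.167 |
| $H_N$ | MA | 62.53 | 65.23 | 38.54 | 49.73 | $\rho = 0$ | 59.20 | 54.46 | 35.38 | 43.29 |
| $\chi^2$ | MA | 0.000 | 0.000 | 0.761 | 0.024 | $\rho = 0$ | 0.001 | 0.005 | 1.816 | 0.187 |
| $\eta_i(M)$ | MA | 1.002 | 2.170 | 2.003 | 3.840 | $\rho = 0$ | 0.835 | 2.671 | 1.503 | 6.845 |
| iid b. | MA | 0.000 | 0.334 | 2.838 | 0.000 | $\rho = 0$ | 0.000 | 0.501 | 1.503 | 0.167 |
| $H_N$ | GP | 98.24 | 62.38 | 36.28 | 49.22 | $\rho = -0.2$ | 54.92 | 56.65 | 33.02 | 41.23 |
| $\chi^2$ | GP | 0.000 | 0.000 | 1.426 | 0.029 | $\rho = -0.2$ | 0.004 | 0.002 | 3.355 | 0.348 |
| $\eta_i(M)$ | GP | 0.668 | 2.170 | 4.841 | 4.341 | $\rho = -0.2$ | 1.336 | 2.504 | 2.003 | 7.012 |
| iid b. | GP | 0.000 | 0.334 | 5.008 | 0.000 | $\rho = -0.2$ | 0.000 | 0.501 | 2.170 | 0.167 |
| $H_N$ | SP | 74.25 | 49.37 | 31.02 | 44.46 | $\rho = -0.5$ | 52.01 | 64.03 | 32.62 | 41.43 |
| $\chi^2$ | SP | 0.000 | 0.027 | 5.495 | 0.130 | $\rho = -0.5$ | 0.011 | 0.000 | 3.713 | 0.328 |
| $\eta_i(M)$ | SP | 4.007 | 4.841 | 10.35 | 9.683 | $\rho = -0.5$ | 2.337 | 2.170 | 3.506 | 5.342 |
| iid b. | SP | 0.167 | 0.501 | 8.848 | 0.167 | $\rho = -0.5$ | 0.000 | 0.334 | 4.174 | 0.000 |
| $H_{iid}$ | | 62.27 | 70.21 | 42.52 | 50.44 | | | | | |
| $\chi^2$ | | 0.000 | 0.000 | 0.236 | 0.019 | | | | | |
| $\sigma_e$ | | .095 | .088 | .071 | .056 | $\rho$ | .077 | -.343 | -.520 | -.465 |
| $\sigma_i$ | | .168 | .288 | .172 | .268 | | | | | |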

Table 3: Hausman statistics and p-values (·100) obtained for a translog production function describing dairy production for $N = 149$ farms in Northern Germany over the time period 1997 to 2005 ($T = 9$). Alternative covariance estimators are indicated as in Table 1. The right hand side panel lists inferential results based on alternative preselections of the autoregressive parameter $\rho$. '$\chi^2$' signifies that critical values are taken from a $\chi^2$-distribution with 20 degrees of freedom. The Table provides inferential and estimation results for the entire sample period and for 3 subsamples each covering a time span of 3 years. For further notes see Table 1.
