A robust bootstrap approach to the Hausman test in
stationary panel data models with general error
covariance structure
Helmut Herwartz∗ Michael H. Neumann†
February 4, 2008
Abstract
In panel data econometrics the Hausman test is of central importance to select an
efficient estimator of the models’ slope parameters. For testing the null hypothesis of
no correlation between unobserved heterogeneity and observable explanatory variables,
model disturbances are typically assumed to be independent and identically distributed
over the time and the cross section dimension. The test statistic lacks pivotalness in
case the iid assumption is violated. GLS based test statistics also build upon strong
homogeneity restrictions that might not be met by empirical data. We propose a
wild bootstrap approach to specification testing in panel data models which is robust
under cross sectional or time heteroskedasticity and inhomogeneous patterns of serial
correlation. A Monte Carlo study shows that in small samples the wild bootstrap
outperforms inference based on critical values taken from a χ2-distribution. Moreover,
iid resampling schemes, considered as a benchmark, fail under cross sectional heterogeneity.
Keywords: Hausman test, random effects model, wild bootstrap, heteroskedasticity.
JEL Classification: C12, C33.
∗Institut für Statistik und Ökonometrie, Christian-Albrechts-Universität zu Kiel, Ohlshausenstr. 40, D-24098 Kiel, E-mail: [email protected] (corresponding author)
†Institut für Stochastik, Friedrich-Schiller-Universität Jena, Ernst-Abbe-Platz 2, D-07743 Jena, E-mail:
1 Introduction
Panel data models are often formalized under conditional absence of serial error correlation
and homoskedasticity over both the time and cross section dimension. Model disturbances
are often modelled by iid random variables in case of microeconometric studies where a set
of anonymous households or firms enters the analysis. For such widespread applications of
panel models, however, (neglected) dynamic features might show up in autocorrelated error
terms. The use of regional panel data has recently become increasingly popular in macro-
and spatial econometrics. Typical fields where panel data models are employed cover, for
instance, models of growth, international or interregional trade, or empirical approaches to
urban crime or environmental economics (Baltagi and Kao, 2000; Anselin, Florax and Rey,
2004). A core issue in panel specification testing is the selection of an efficient estimator in
presence of unobserved heterogeneity. The Hausman test has become a prominent means of
inference against correlation between individual effects and observable explanatory variables
(Hausman, 1978). In applied spatial econometrics serial error correlation is likely to emerge
whenever a region only partially absorbs a shock within the unit of time used as the sampling
frequency. The presumption of time invariant error variances may also be criticized. In
econometrics of financial data time dependent variances have attracted a huge theoretical
and empirical interest (Bollerslev, Chou and Kroner, 1992). Similarly, shifts in the variations
of disturbances may occur as a consequence of (fiscal or monetary) policy changes, central
bank interventions or regime switches. Cross sectional patterns of second order heterogeneity
are also more the rule than an exception. For instance, one may intuitively expect that large
firms or industrialized regions are likely to respond to exogenous shocks at a different scale
in comparison with small firms or more agricultural regions.
Occasionally panel data models have been formalized with some pattern of serial error
correlation (Lillard and Willis, 1979; Baltagi, 2001, Chapter 5). Then, correlation might be
specified parsimoniously with some first order autoregressive parameter. Over all members
of a cross section a first order autocorrelation scheme might fail to provide a uniformly accu-
rate approximation of the true underlying pattern of error dynamics. Moreover, it is likely
that the autocorrelation parameter, if it exists, is cross section specific. Obviously, when
allowing serially correlated disturbances within a panel data framework, the potential
directions of covariance misspecification are manifold.
The asymptotic distribution of common panel specification test statistics derived under
an iid assumption depends on nuisance parameters if model disturbances are actually het-
eroskedastic over time, serially correlated or lack homogeneity over the cross section. On
the one hand neglecting such forms of heterogeneity may invalidate conclusions obtained un-
der an unrealistic modelling framework. On the other hand deriving first order asymptotic
approximations is often cumbersome if not impossible in presence of nuisance parameters.
Under such circumstances bootstrap approaches are in widespread use to obtain robust crit-
ical values for a particular test statistic. Li (2006) illustrates the merits of a block bootstrap
approach to detect a failure of exogeneity in regression models under serially correlated error
terms by means of an adjusted Hausman statistic. This approach is outlined for the case
of an infinite time dimension. By means of Edgeworth expansions, Bole and Rebec (2004)
prove asymptotic refinements achieved by iid resampling for the case of the classical Haus-
man test in stationary panel data models with iid error terms. Cameron and Trivedi (2005)
advocate resampling of cross sectional error vectors with replacement to estimate the covari-
ance matrix that enters the panel Hausman statistic. In spite of its feasibility, however, iid
resampling might suffer from theoretically invalid size features if model disturbances stem
from distributions that are (conditionally) heterogeneous over the cross section.
It is the purpose of this paper to contribute a robust approach to determine critical values
for the Hausman statistic. It retains its validity in panels with finite time series dimension,
under cross sectional heteroskedasticity, and (possibly time varying or cross section specific)
serial error correlation. The proposed method exploits a convenient feature of the wild
bootstrap which copes with heteroskedasticity of model disturbances (Wu, 1986; Liu, 1988;
Mammen, 1993) and cross sectional error correlation (Herwartz and Neumann, 2005).
This paper is organized as follows: The panel model and the test statistic are given in
the next section. Then, Section 3 provides a bootstrap approach to generate critical values
for the Hausman statistic. A simulation study, given in Section 4, illustrates the finite
sample performance of alternative resampling schemes and approaches motivated by first
order asymptotic approximations. As an empirical example Section 5 provides specification
tests for modelling dairy production in a cross section of farms located in Northern Germany.
Conclusions are drawn in Section 6. An Appendix provides the proofs of the asymptotic
results.
2 The model and the test statistic
2.1 A panel model with generalized covariance structure
Consider the common panel data model with random individual effects, observed as

$$y_{it} = x_{it}'\beta + \nu + u_{it}, \quad u_{it} = \alpha_i + e_{it} \quad (i = 1,\ldots,N;\ t = 1,\ldots,T). \qquad (1)$$

Defining $Y_i = (y_{i1},\ldots,y_{iT})'$, $X_i = (x_{i1},\ldots,x_{iT})'$ and $e_i = (e_{i1},\ldots,e_{iT})'$ we can rewrite (1) in matrix notation as

$$Y_i = X_i\beta + \nu\mathbf{1}_T + u_i, \quad u_i = \alpha_i\mathbf{1}_T + e_i \quad (i = 1,\ldots,N) \qquad (2)$$

or, with $Y = (Y_1',\ldots,Y_N')'$, $X = (X_1',\ldots,X_N')'$, $u = (u_1',\ldots,u_N')'$, $\alpha = (\alpha_1\mathbf{1}_T',\ldots,\alpha_N\mathbf{1}_T')'$ and $e = (e_1',\ldots,e_N')'$,

$$Y = X\beta + \nu\mathbf{1}_{NT} + u, \quad u = \alpha + e. \qquad (3)$$
In (1) $x_{it}$ is a $K \times 1$ random vector of explanatory variables. Accordingly, $\beta$ is a $K$-dimensional parameter vector, $\nu$ denotes an intercept term and $\mathbf{1}_R$ denotes, for any $R \in \mathbb{N}$, an $R$-dimensional vector consisting of ones. By assumption, the random individual effects $\alpha_i \sim (0, \omega_i^2)$ are independent of the disturbances $e_{it}$. Note that in case $\omega_1^2 = \ldots = \omega_N^2 = 0$ the pooled regression is obtained as a special case of (1). With respect to the covariance of the mean zero innovations $e_{it}$ we allow a pattern of serially correlated but cross sectionally uncorrelated error terms. Then, with $E[e_i] = 0_T$, the latter scenario is formalized as

$$E[e_ie_j'] = \delta_{ij}\,\Sigma_i, \qquad (4)$$

where $\Sigma_i$ is a positive definite matrix of dimension $T \times T$ and $\delta_{ij}$ is the Kronecker delta. According to (4) model disturbances may stem from cross sectionally heterogeneous distributions and show time specific second order features. With regard to serial correlation the general specification covers, e.g., the first order autoregressive model put forth by Lillard and Willis (1979), i.e.

$$e_{it} = \rho e_{it-1} + \epsilon_{it}, \quad |\rho| < 1, \quad \epsilon_{it} \sim iid(0, \sigma_\epsilon^2). \qquad (5)$$
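The AR(1) error scheme in (5) is easy to simulate; a minimal sketch (all parameter values are illustrative, not from the paper) confirming that the implied autocorrelation function decays geometrically in the lag:

```python
import numpy as np

rng = np.random.default_rng(0)
rho, N, T = 0.5, 500, 200

eps = rng.normal(size=(N, T))
e = np.zeros((N, T))
e[:, 0] = eps[:, 0] / np.sqrt(1 - rho ** 2)     # start from the stationary distribution
for t in range(1, T):
    e[:, t] = rho * e[:, t - 1] + eps[:, t]

# the implied autocorrelation function decays as rho ** h
for h in (1, 2, 3):
    acf = np.mean(e[:, h:] * e[:, :-h]) / np.mean(e ** 2)
    print(h, round(float(acf), 2))
```

This exponential decay is exactly the restriction criticized below: a single cross sectionally homogeneous $\rho$ pins down the whole autocorrelation function.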
The autocorrelation structure specified in (5) is, however, very restrictive owing to the pos-
tulates of an exponentially decaying autocorrelation function on the one hand and cross
sectional homogeneity on the other hand. As alternative specifications one may regard error
terms following a higher order autoregression or some moving average pattern. For a brief
review of alternative parametric suggestions and their treatment for feasible GLS estimation
the reader may consult Baltagi (2001, Chapter 5.2). In any case, the more general error
distribution complicates feasible GLS estimation of the models’ slope parameters and, more
importantly, introduces a source of potential misspecification of the model.
Given the likelihood of cross section specific covariance features it appears more natural
to allow general unspecified patterns of second order features in microeconometric or spatial
panel data models as (1). For the reader’s convenience, we briefly discuss in this section
generalized estimators one of which is efficient if individual effects and explanatory variables
are uncorrelated. As it is typical in panel data modelling this correlation feature is subjected
to specification testing by means of a (generalized) Hausman statistic which is provided
below. Since misspecification might be seen as a crucial issue in this vein of econometric
modelling we also discuss the distributional features of the generalized Hausman statistic
under misspecification of the covariance pattern. Before introducing generalized estimators
and test statistics we make the following assumptions:
(A1) (i) $\alpha_1, \ldots, \alpha_N, e_1, \ldots, e_N$ are, conditionally on $X = (X_1', \ldots, X_N')'$, independent,

(ii) $E(e_i \mid X) = 0_T$, $E(\alpha_i \mid X) = 0$,

(iii) there exist positive constants $C_1$ and $C_2$ such that

$\mathrm{var}(\alpha_i \mid X) = \omega_i^2$, $\quad 0 < C_1 \le \omega_i^2 \le C_2 < \infty$,

$\mathrm{Cov}(e_i \mid X) = \Sigma_i$, $\quad C_1 I_T \preceq \Sigma_i \preceq C_2 I_T$ $\quad$ ($A \preceq B$ if $B - A$ is positive semidefinite),

(iv) the random variables $(e_{it}^2)_{i,t}$ and $(\alpha_i^2)_i$ are, conditionally on $X$, uniformly integrable, that is,

$$\sup_{i\in\mathbb{N}}\max_{1\le t\le T} E\big(e_{it}^2 I(|e_{it}| > c)\,\big|\,X\big) \xrightarrow[c\to\infty]{} 0, \qquad \sup_{i\in\mathbb{N}} E\big(\alpha_i^2 I(|\alpha_i| > c)\,\big|\,X\big) \xrightarrow[c\to\infty]{} 0,$$

(v) the random variables $(\|X_i'X_i\|)_{i\in\mathbb{N}}$ are uniformly integrable, that is,

$$\sup_{i\in\mathbb{N}} E\left[\|X_i'X_i\|\, I(\|X_i'X_i\| > c)\right] \xrightarrow[c\to\infty]{} 0.$$
Remark 1. It is well known that uniform integrability follows from boundedness of moments of higher order. For example, since $E(e_{it}^2 I(|e_{it}| > c) \mid X) \le c^{-\delta}E(|e_{it}|^{2+\delta} \mid X)$ the first condition in (A1)(iv) will follow from $\sup_{i\in\mathbb{N}}\max_{1\le t\le T} E(|e_{it}|^{2+\delta} \mid X) < \infty$, for some $\delta > 0$.
2.2 Generalized estimators
Denote $\Omega := \mathrm{Cov}(u) = \mathrm{Diag}[\Omega_1, \ldots, \Omega_N]$, where $\Omega_i = \Sigma_i + \omega_i^2\mathbf{1}_T\mathbf{1}_T'$. An efficient estimator of $\beta$ is given as

$$\beta_{GLS} = A_N^{-1}\sum_{i=1}^N a_{N,i}Y_i, \qquad (6)$$

where

$$A_N = \frac{1}{N}\left(X'\Omega^{-1}X - \frac{X'\Omega^{-1}\mathbf{1}_{NT}\mathbf{1}_{NT}'\Omega^{-1}X}{\mathbf{1}_{NT}'\Omega^{-1}\mathbf{1}_{NT}}\right),$$

$$a_{N,i} = \frac{1}{N}\left(X_i'\Omega_i^{-1} - \frac{1}{\mathbf{1}_{NT}'\Omega^{-1}\mathbf{1}_{NT}}\,X'\Omega^{-1}\mathbf{1}_{NT}\mathbf{1}_T'\Omega_i^{-1}\right).$$

In the special case of $\Sigma_i = \sigma^2 I_T$ and $\omega_i^2 = \omega^2$, this yields the common random effects estimator (Baltagi, 2001, Chapter 2)

$$\beta_{GLS}^{iid} = \left(\sum_{i,t}\tilde{x}_{it}\tilde{x}_{it}' + \frac{T\sigma^2}{\sigma^2 + T\omega^2}\sum_i \bar{\tilde{x}}_i\bar{\tilde{x}}_i'\right)^{-1}\left(\sum_{i,t}\tilde{x}_{it}\tilde{y}_{it} + \frac{T\sigma^2}{\sigma^2 + T\omega^2}\sum_i \bar{\tilde{x}}_i\bar{\tilde{y}}_i\right), \qquad (7)$$

where standard conventions are used to denote centered variables, for instance $\tilde{x}_{it} = x_{it} - \bar{x}_{i\cdot}$, $\bar{\tilde{x}}_i = \bar{x}_{i\cdot} - \bar{x}$, $\bar{x}_{i\cdot} = (\sum_t x_{it})/T$, $\bar{x} = (\sum_{i,t} x_{it})/(NT)$.
If $E(\alpha_i \mid X_i)$ does not necessarily vanish, then $\beta_{GLS}$ is in general not a consistent estimator of $\beta$. In this case we can augment the design matrix $X$ with the $NT \times N$ matrix $W = \mathrm{Diag}[\mathbf{1}_T, \ldots, \mathbf{1}_T]$ and obtain from (3) the equation

$$Y = (W\ X)\begin{pmatrix}\delta\\ \beta\end{pmatrix} + \bar{u}, \qquad (8)$$

where $\delta = (\nu + E(\alpha_1 \mid X_1), \ldots, \nu + E(\alpha_N \mid X_N))'$ and $\bar{u} = u - W(E(\alpha_1 \mid X_1), \ldots, E(\alpha_N \mid X_N))'$. Then, an efficient estimator of $\beta$ is the (generalized) fixed effect or least squares dummy variable estimator (LSDV)

$$\beta_{FE} = \left(X'\Omega^{-1}X - X'\Omega^{-1}W(W'\Omega^{-1}W)^{-1}W'\Omega^{-1}X\right)^{-1}\left(X'\Omega^{-1} - X'\Omega^{-1}W(W'\Omega^{-1}W)^{-1}W'\Omega^{-1}\right)Y.$$

$\beta_{FE}$ is the (unique) best linear unbiased estimator (BLUE) of $\beta$ in model (8), that is, it can be written in the form $LY$, where unbiasedness of $\beta_{FE}$ requires that $LX = I_K$ and $LW = 0_{K\times N}$ while optimality means that $\mathrm{Cov}(\beta_{FE}) = L\Omega L'$ is minimal under these side conditions. However, since all matrices $L$ satisfying these side conditions fulfill $L\Omega L' = L\,\mathrm{Diag}[\Sigma_1, \ldots, \Sigma_N]\,L'$, it follows that $\beta_{FE}$ is equal to the BLUE in model (8) with $\mathrm{Cov}(u) = \Sigma := \mathrm{Diag}[\Sigma_1, \ldots, \Sigma_N]$.

Therefore, $\beta_{FE}$ can also be written as

$$\beta_{FE} = B_N^{-1}\sum_{i=1}^N b_{N,i}Y_i, \qquad (9)$$

where

$$B_N = \frac{1}{N}\sum_{i=1}^N\left(X_i'\Sigma_i^{-1}X_i - \frac{1}{\mathbf{1}_T'\Sigma_i^{-1}\mathbf{1}_T}\,X_i'\Sigma_i^{-1}\mathbf{1}_T\mathbf{1}_T'\Sigma_i^{-1}X_i\right),$$

$$b_{N,i} = \frac{1}{N}\left(X_i'\Sigma_i^{-1} - \frac{1}{\mathbf{1}_T'\Sigma_i^{-1}\mathbf{1}_T}\,X_i'\Sigma_i^{-1}\mathbf{1}_T\mathbf{1}_T'\Sigma_i^{-1}\right).$$

In the special case of $\Sigma_i = \sigma^2 I_T$ this estimator simplifies to the standard LSDV estimator

$$\beta_{FE}^{iid} = \left(\sum_{i,t}\tilde{x}_{it}\tilde{x}_{it}'\right)^{-1}\sum_{i,t}\tilde{x}_{it}\tilde{y}_{it}; \qquad (10)$$
see Baltagi (2001, Chapter 2).
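The within estimator in (10) is straightforward to compute; a minimal numpy sketch on simulated data (all names, sizes and parameter values are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, K = 100, 8, 2
beta = np.array([1.0, -0.5])

X = rng.normal(size=(N, T, K))
alpha = rng.normal(size=N)                       # individual effects
y = X @ beta + alpha[:, None] + rng.normal(size=(N, T))

# within transformation: subtract individual means over t, as in (10)
Xw = X - X.mean(axis=1, keepdims=True)
yw = y - y.mean(axis=1, keepdims=True)

XtX = np.einsum('itk,itl->kl', Xw, Xw)
Xty = np.einsum('itk,it->k', Xw, yw)
beta_fe = np.linalg.solve(XtX, Xty)
print(beta_fe)                                   # close to (1.0, -0.5)
```

The within transformation wipes out the $\alpha_i$, so the estimate is consistent even when effects and regressors are correlated, at the cost of efficiency under the null.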
2.3 The Hausman statistic
As mentioned OLS or GLS estimation of the slope parameters in (1) will be biased if the
individual effects αi are correlated with (some of) the explanatory variables xit,1, . . . , xit,K .
On the other hand, under assumption (A1,iii) the GLS estimator is to be preferred over the
OLS or LSDV estimator since it exploits the underlying error covariance structure efficiently.
Moreover, estimation of N fixed effects is avoided such that model evaluation does not suffer
from incidental parameters. Therefore, a test for correlation between individual effects and
explanatory variables is essential to select an efficient estimator for the model in (1). The
Hausman statistic (Hausman, 1978) has become a prominent tool to test the null hypothesis
that individual effects are uncorrelated with the variables in xit against the alternative of
correlation, i.e.
$$H_0: E(\alpha_i \mid X_i) = 0 \quad \text{vs.} \quad H_1: E(\alpha_i \mid X_i) \not\equiv 0 \text{ for at least one } i. \qquad (11)$$
In this paper we allow the error terms ei to have some general covariance pattern as
formalized in (4). Accordingly, we consider a GLS based modification of the Hausman
statistic. It follows from least squares theory that $\mathrm{Cov}(\beta_{GLS} \mid X) = \frac{1}{N}A_N^{-1}$, $\mathrm{Cov}(\beta_{FE} \mid X) = \frac{1}{N}B_N^{-1}$ and, since $\beta_{GLS}$ is efficient under (A1), we have also that $\mathrm{Cov}(\sqrt{N}(\beta_{FE} - \beta_{GLS}) \mid X) = B_N^{-1} - A_N^{-1}$ (Hausman, 1978). Moreover, it follows from assumption (A2) below that $B_N^{-1} - A_N^{-1}$ is a positive definite matrix if $N$ is sufficiently large. For simplicity of presentation we assume that this holds true even for all $N$. The Hausman statistic is

$$H_N = N(\beta_{FE} - \beta_{GLS})'\left(B_N^{-1} - A_N^{-1}\right)^{-1}(\beta_{FE} - \beta_{GLS}). \qquad (12)$$

Note that

$$\sqrt{N}(\beta_{FE} - \beta_{GLS}) = \sum_{i=1}^N C_{N,i}u_i, \qquad (13)$$

where $C_{N,i} = \sqrt{N}(B_N^{-1}b_{N,i} - A_N^{-1}a_{N,i})$. Accordingly, the Hausman statistic allows a representation as a quadratic form in the underlying model disturbances, i.e.

$$H_N = \left\|\left(B_N^{-1} - A_N^{-1}\right)^{-1/2}\sum_{i=1}^N C_{N,i}u_i\right\|^2. \qquad (14)$$

In the case that the covariance parameters are unknown but respective estimators are available, we can estimate the matrices $A_N$ and $B_N$ by $\hat{A}_N$ and $\hat{B}_N$. In this case we consider the statistic with estimated covariance parameters

$$\hat{H}_N = N(\hat\beta_{FE} - \hat\beta_{GLS})'\left(\hat{B}_N^{-1} - \hat{A}_N^{-1}\right)^{-1}(\hat\beta_{FE} - \hat\beta_{GLS}), \qquad (15)$$

where $\hat\beta_{FE}$ and $\hat\beta_{GLS}$ are feasible LSDV and GLS estimators. The corresponding quadratic form representation of $\hat{H}_N$ is analogous to (14).
To derive the asymptotic properties of the Hausman statistic we make the following assumption:

(A2) It holds that $A_N \xrightarrow{P} A$ and $B_N \xrightarrow{P} B$, as $N \to \infty$, where $B$ and $A - B$ are positive definite matrices.

The following assertion characterizes the asymptotic behavior of $H_N$ and $\hat{H}_N$ under the null hypothesis.

Proposition 1. Suppose that (A1) and (A2) are fulfilled. Then, as $N \to \infty$,

$$H_N \xrightarrow{d} \chi^2(K). \qquad (16)$$

Furthermore, if $\hat{A}_N$ and $\hat{B}_N$ are consistent estimators of $A_N$ and $B_N$, that is, $\|\hat{A}_N - A_N\| \xrightarrow{P} 0$ and $\|\hat{B}_N - B_N\| \xrightarrow{P} 0$, and if $\sqrt{N}(\hat\beta_{FE} - \hat\beta_{GLS}) \xrightarrow{d} \mathcal{N}(0_K, B^{-1} - A^{-1})$, then

$$\hat{H}_N \xrightarrow{d} \chi^2(K). \qquad (17)$$
The asymptotic results in (16) and (17) are both derived for the case of a finite time
dimension T . Owing to consistency of βGLS and βFE their difference vanishes under (A1)
and (A2) as T → ∞. For the case of an underlying iid covariance structure Ahn and
Moon (2001) show that as T → ∞, Cov[βFE − βGLS] converges sufficiently fast to ensure a
nondegenerate limit distribution of the standard Hausman statistic.
As argued before, any (cross sectionally homogeneous) a-priori formalization of panel
covariance features is likely subject to misspecification. Therefore we also consider the
realistic case where the presumed covariance pattern differs from the unknown covariance
structure. We still assume that the true covariances are given by (A1,iii), that is, $\Omega$ is the covariance matrix of $u$ as above. Let now $\tilde\Omega$ denote the covariance specification which is actually used for constructing the (feasible) LSDV and GLS estimators and the test statistic. Denote by $\tilde{A}_N$, $\tilde{a}_{N,i}$, $\tilde{B}_N$, $\tilde{b}_{N,i}$ and $\tilde{C}_{N,i}$ the analogues of $A_N$, $a_{N,i}$, $B_N$, $b_{N,i}$ and $C_{N,i}$, respectively, where for each term the true covariance matrix $\Omega$ is replaced by the presumed (false) covariance matrix $\tilde\Omega$. In this case, the difference between the two panel estimators writes as $\sqrt{N}(\tilde\beta_{FE} - \tilde\beta_{GLS}) = \sum_{i=1}^N \tilde{C}_{N,i}u_i$, obtaining the test statistic

$$\tilde{H}_N = N\left(\tilde\beta_{FE} - \tilde\beta_{GLS}\right)'\left(\tilde{B}_N^{-1} - \tilde{A}_N^{-1}\right)^{-1}\left(\tilde\beta_{FE} - \tilde\beta_{GLS}\right) \qquad (18)$$

$$= \left\|\left(\tilde{B}_N^{-1} - \tilde{A}_N^{-1}\right)^{-1/2}\sum_{i=1}^N \tilde{C}_{N,i}u_i\right\|^2.$$
In complete analogy to the correctly specified case, the matrix $\tilde{B}_N^{-1} - \tilde{A}_N^{-1}$ is positive semidefinite. Indeed, assume that $\mathrm{Cov}(u)$ were equal to $\tilde\Omega$ rather than $\Omega$. Then $\tilde\beta_{GLS}$ would be efficient and it would follow that $\tilde{B}_N^{-1} - \tilde{A}_N^{-1} = \mathrm{Cov}_{\tilde\Omega}(\sqrt{N}\tilde\beta_{FE}) - \mathrm{Cov}_{\tilde\Omega}(\sqrt{N}\tilde\beta_{GLS})$ is equal to $\mathrm{Cov}_{\tilde\Omega}(\sqrt{N}(\tilde\beta_{FE} - \tilde\beta_{GLS}))$ ($\mathrm{Cov}_{\tilde\Omega}$ denotes the covariance matrix under the hypothetical scenario that $\mathrm{Cov}(u) = \tilde\Omega$). Hence, $\tilde{B}_N^{-1} - \tilde{A}_N^{-1}$ is positive semidefinite, even if $\tilde\Omega$ deviates from the true covariance matrix $\Omega$. Regularity of $\tilde{B}_N^{-1} - \tilde{A}_N^{-1}$ follows from assumption (A3) below, for $N$ sufficiently large. For simplicity of presentation, we assume again that this holds true for all $N$.
Finally, in the realistic case that estimates of the covariances are used, we obtain the statistic

$$\hat{\tilde{H}}_N = \left\|\left(\hat{\tilde{B}}_N^{-1} - \hat{\tilde{A}}_N^{-1}\right)^{-1/2}\sum_{i=1}^N \hat{\tilde{C}}_{N,i}u_i\right\|^2, \qquad (19)$$

where $\hat{\tilde{A}}_N$, $\hat{\tilde{B}}_N$ and $\hat{\tilde{C}}_{N,i}$ are the analogues of $\tilde{A}_N$, $\tilde{B}_N$ and $\tilde{C}_{N,i}$, respectively, with the presumed (false) covariances $\tilde\Omega$ replaced by their estimates $\hat{\tilde\Omega}$.

For the asymptotic considerations we assume additionally

(A3) It holds that $\tilde{A}_N \xrightarrow{P} \tilde{A}$ and $\tilde{B}_N \xrightarrow{P} \tilde{B}$, as $N \to \infty$, where $\tilde{B}$ and $\tilde{A} - \tilde{B}$ are positive definite matrices. Furthermore, $\sum_{i=1}^N \tilde{C}_{N,i}\Omega_i\tilde{C}_{N,i}' \xrightarrow{P} D$, where $D$ is a non-vanishing matrix.
The following proposition describes the asymptotic behavior of the Hausman statistics $\tilde{H}_N$ and $\hat{\tilde{H}}_N$ in the misspecified case, under the null hypothesis. In contrast to the result in Proposition 1, they now converge in distribution to a random variable which is a weighted sum of independent $\chi^2(1)$ random variables.

Proposition 2. Suppose that (A1), (A2) and (A3) are fulfilled. Then, as $N \to \infty$,

$$\tilde{H}_N \xrightarrow{d} \sum_{i=1}^K \lambda_i Z_i^2,$$

where $Z_1, \ldots, Z_K$ are independent standard normal random variables and $\lambda_1, \ldots, \lambda_K$ are the eigenvalues of the matrix $(\tilde{B}^{-1} - \tilde{A}^{-1})^{-1/2}D(\tilde{B}^{-1} - \tilde{A}^{-1})^{-1/2}$.

Furthermore, if $\hat{\tilde{A}}_N$ and $\hat{\tilde{B}}_N$ are consistent estimators of $\tilde{A}_N$ and $\tilde{B}_N$, that is, $\|\hat{\tilde{A}}_N - \tilde{A}_N\| \xrightarrow{P} 0$ and $\|\hat{\tilde{B}}_N - \tilde{B}_N\| \xrightarrow{P} 0$, and if $\sum_{i=1}^N \hat{\tilde{C}}_{N,i}u_i \xrightarrow{d} \mathcal{N}(0_K, D)$, then

$$\hat{\tilde{H}}_N \xrightarrow{d} \sum_{i=1}^K \lambda_i Z_i^2.$$
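The limiting law in Proposition 2 can be simulated directly. The sketch below compares the 95% quantile of a weighted sum of independent $\chi^2(1)$ variables with the $\chi^2(K)$ quantile, for hypothetical weights $\lambda = (2, 0.5)$ (illustrative values, not derived from any particular covariance pair):

```python
import numpy as np

rng = np.random.default_rng(2)
lam = np.array([2.0, 0.5])                   # hypothetical eigenvalues, K = 2
Z = rng.normal(size=(200_000, 2))
wsum = (lam * Z ** 2).sum(axis=1)            # draws of sum_i lambda_i Z_i^2

q_weighted = np.quantile(wsum, 0.95)
q_chi2 = 5.99                                # 95% quantile of chi-square(2)
print(q_weighted > q_chi2)
```

Unless all $\lambda_i$ equal one, critical values taken from the $\chi^2(K)$ table are misleading, which is precisely what motivates the bootstrap approach of the next section.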
It is widespread folklore that bootstrap inference offers asymptotic improvements over first order asymptotic approximations if a respective test statistic is asymptotically pivotal; see for example Hall (1992). From this perspective our results suggest that, depending on the true covariance $\Omega$, the resampling addressed in Section 3 either provides valid significance levels ($\tilde\Omega \neq \Omega$) or faster convergence of empirical to nominal significance levels ($\tilde\Omega = \Omega$); see also Bole and Rebec (2004) for an analysis of the Hausman statistic in the iid case.
2.4 Alternative covariance estimators
The implementation of the generalized estimators introduced in Section 2.2 or the Hausman
statistic in Section 2.3 requires some a-priori choice of Cov[ui] or Cov[ei]. To estimate
these matrices a convenient starting point is the common fixed effect estimator building
on iid assumptions and given in (10). An intercept estimate is $\hat\nu = \bar{y} - \bar{x}'\beta_{FE}^{iid}$, with $\bar{y} = (NT)^{-1}\sum_{i,t} y_{it}$ and $\bar{x}$ denoting the $K \times 1$-dimensional vector of unconditional means of the explanatory variables. Implied disturbance vectors $\hat{u}_i$, $\tilde{e}_i$, and fixed effect estimates are, respectively,

$$\hat{u}_i = Y_i - \mathbf{1}_T\hat\nu - X_i\beta_{FE}^{iid}, \quad \tilde{e}_i = \tilde{Y}_i - \tilde{X}_i\beta_{FE}^{iid}, \quad \text{and} \quad \hat\alpha_i = \hat\nu + \mathbf{1}_T'\hat{u}_i/T.$$

Then, the variance of individual effects can be evaluated as (Nerlove, 1971)

$$\hat\omega^2 = \frac{1}{N-1}\sum_{i=1}^N(\hat\alpha_i - \bar{\hat\alpha})^2, \qquad \bar{\hat\alpha} = \frac{1}{N}\sum_{i=1}^N\hat\alpha_i. \qquad (20)$$
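A sketch of the variance estimator in (20); the first-step effect estimates are replaced here by simulated stand-ins, so only the formula itself is taken from the text:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 200
omega2_true = 0.8
alpha_hat = rng.normal(0.0, np.sqrt(omega2_true), size=N)   # stand-in effect estimates

# eq. (20): sample variance of the estimated individual effects
omega2_hat = np.sum((alpha_hat - alpha_hat.mean()) ** 2) / (N - 1)
print(omega2_hat > 0)                                       # nonnegative by construction
```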
The estimator $\hat\omega^2$ is consistent as $N \to \infty$ and nonnegative by construction. Owing to the latter property it might be particularly useful in Monte Carlo experiments. Apart from the estimator in (20) one may also evaluate $\omega^2$ by means of other approaches going back to Swamy and Arora (1972), Wallace and Hussain (1969) or Amemiya (1971). The focus of this paper lies, however, on the characterization of alternative avenues to obtain critical values for the Hausman statistic.
In a second step, feasible covariance estimators ($\hat\Omega$, $\hat\Sigma$) entering the Hausman statistics $\hat{H}_N$ and $\hat{\tilde{H}}_N$ are constructed conditional on particular parametric assumptions. Alternatively,
to avoid overly strong restrictions, an analyst may also rely on semiparametric covariance
estimators. In the following we list a number of potential covariance estimators that are
likely to offer different empirical features of inference on correlation between xit and αi.
1. Unconditional estimation of parametric covariances
Presuming cross sectionally homogeneous serial correlation as in (5) the unconditional
variance is estimated as

$$\widehat{\mathrm{var}}(e_{it}) = \hat\sigma_\epsilon^2/(1 - \hat\rho^2) = \frac{1}{N(T-1) - K}\sum_{i=1}^N\sum_{t=1}^T \tilde{e}_{it}^2. \qquad (21)$$

A pooled regression delivers an estimate of the autoregressive parameter

$$\hat\rho = \frac{\sum_{i=1}^N\sum_{t=2}^T \tilde{e}_{i,t}\tilde{e}_{i,t-1}}{\sum_{i=1}^N\sum_{t=2}^T \tilde{e}_{i,t-1}^2}, \quad \text{and, accordingly,} \quad \hat\sigma_\epsilon^2 = \widehat{\mathrm{var}}(e_{it})(1 - \hat\rho^2). \qquad (22)$$

By means of the structural estimates $\hat\rho$ and $\hat\sigma_\epsilon^2$ the matrix $\mathrm{Cov}[e_i]$, denoted $\hat\Sigma^{(AR)}$, can be composed in the usual way.
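The pooled estimates (21)-(22) can be sketched as follows; residuals are simulated directly rather than taken from a first-step within regression, and the degrees-of-freedom correction in (21) is replaced by a simple mean of squares, so the snippet illustrates the mechanics only:

```python
import numpy as np

rng = np.random.default_rng(4)
N, T, rho = 400, 20, 0.5

e = np.zeros((N, T))
e[:, 0] = rng.normal(0.0, 1.0 / np.sqrt(1 - rho ** 2), size=N)
for t in range(1, T):
    e[:, t] = rho * e[:, t - 1] + rng.normal(size=N)

# pooled AR(1) regression, eq. (22)
rho_hat = np.sum(e[:, 1:] * e[:, :-1]) / np.sum(e[:, :-1] ** 2)
var_hat = np.mean(e ** 2)                    # stand-in for the corrected estimator (21)
sig2_eps = var_hat * (1 - rho_hat ** 2)
print(round(float(rho_hat), 2), round(float(sig2_eps), 2))
```

From $\hat\rho$ and $\hat\sigma_\epsilon^2$ the Toeplitz matrix $\hat\Sigma^{(AR)}$ with elements $\hat\rho^{|k-l|}\widehat{\mathrm{var}}(e_{it})$ follows directly.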
2. Conditional estimation of variance parameters
A stronger parametric approach in comparison with $\hat\Sigma^{(AR)}$ obtains by imposing some a-priori restriction $\rho = \rho_0$ and subsequently estimating the variance parameter in (22) conditionally. A covariance estimator obtained along these lines is denoted $\hat\Sigma^{(\rho=\rho_0)}$.
3. Cross sectional averaging
Instead of presuming an explicit parametric autoregression one may alternatively esti-
mate Cov[ei] as an average pattern of second order moments evaluated over the cross
section. Along these lines the overall estimator depends on the particular covariance
structure presumed to hold for cross section specific quantities. We distinguish three
alternative scenarios, a finite order moving average representation (MA), time depen-
dent second order features (TH) and a general covariance pattern (GP). The respective
covariance estimators take the form

$$\hat\Sigma^{(\bullet)} = \frac{1}{N}\sum_{i=1}^N \hat\Sigma_i^{(\bullet)}, \quad \bullet = MA, TH, GP. \qquad (23)$$

With regard to the cross sectional quantities $\hat\Sigma_i^{(\bullet)}$ we distinguish:

MA: By assumption, $e_{it}$ obeys MA dynamics of order $[\sqrt{T}]$, where $[z]$ is the integer part of $z$. With $I_{(\bullet)}$ denoting an indicator function, a typical element of $\hat\Sigma_i^{(MA)}$ is

$$\hat\sigma_{kl}^{(i,MA)} = \frac{1}{T}\sum_{j=1}^{T-|k-l|}\tilde{e}_{i,j}\tilde{e}_{i,j+|k-l|}\,I_{(|k-l|\le[\sqrt{T}])}, \quad k,l = 1,\ldots,T.$$

TH: Time heterogeneity of second order moments and absence of higher order serial correlation motivates an estimator $\hat\Sigma_i^{(TH)}$ with typical elements

$$\hat\sigma_{kl}^{(i,TH)} = \tilde{e}_{i,k}\tilde{e}_{i,l}\,I_{(|k-l|\le[\sqrt{T}])}.$$

GP: Owing to the within transformation applied to determine $\beta_{FE}^{iid}$ the matrix $\tilde{e}_i\tilde{e}_i'$ is of reduced rank (Kiefer, 1980). The third choice of a cross section specific covariance matrix utilizes as many serial cross products as possible to obtain a full rank covariance matrix. A typical element of $\hat\Sigma_i^{(GP)}$ is

$$\hat\sigma_{kl}^{(i,GP)} = \tilde{e}_{i,k}\tilde{e}_{i,l}\,I_{(|k-l|\le T-2)}.$$
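The cross sectional averaging idea in (23) for the GP case amounts to averaging residual outer products and truncating the longest lags; a sketch with simulated stand-in residuals (sizes illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
N, T = 300, 6
e = rng.normal(size=(N, T))                  # stand-in residuals

Sigma_GP = np.einsum('ik,il->kl', e, e) / N  # average of e_i e_i' over the cross section

# keep serial cross products with |k - l| <= T - 2 only, per the GP indicator
k = np.arange(T)
mask = np.abs(k[:, None] - k[None, :]) <= T - 2
Sigma_GP = Sigma_GP * mask
print(Sigma_GP.shape)                        # (6, 6)
```

For iid stand-in residuals the averaged matrix is close to the identity; with genuinely serially correlated residuals the off-diagonal bands pick up the average autocovariances.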
4. Semiparametric covariance estimation
Nonparametric estimation of $\Sigma_i$ suffers from the rank deficit of $\tilde{e}_i\tilde{e}_i'$. An indirect nonparametric estimator of $\mathrm{Cov}[e_i]$ is to estimate $\mathrm{Cov}[u_i]$ nonparametrically and subtract the variance contribution of the individual effects. This yields

$$\hat\Sigma^{(SP)} = \hat\Omega - \hat\omega^2\mathbf{1}_T\mathbf{1}_T', \quad \text{with} \quad \hat\Omega = \frac{1}{N}\sum_{i=1}^N \hat{u}_i\hat{u}_i'. \qquad (24)$$
3 Bootstrapping the Hausman statistic
As stated in Proposition 2 misspecification of the covariance structure of the innovations
entering a panel model implies that the (generalized) Hausman test statistic lacks pivotalness.
In this case the asymptotic distribution depends on the true covariance matrix Ω which is
unknown. Therefore, it is hardly possible to derive the nuisance parameters $\lambda_i$ analytically, so that the actual asymptotic distribution of the Hausman statistic is infeasible in practice. A-priori
information necessary to justify conditional parameter estimation is, however, often hardly
available. From this perspective the generality of the bootstrap methodology is immediately
clear as it obtains asymptotically correct critical values even under some misspecification of
the actual covariance pattern. As a particularly important case of misspecification one may
regard the imposition of cross sectionally homogeneous covariance features whenever the
true error distribution varies over the cross section. Owing to particular issues raised for the
Hausman test, such as cross and intra sectional heteroskedasticity, the so-called wild or external
bootstrap (Wu, 1986) can be seen as a natural tool to determine critical values. Addressing
the case of heteroskedasticity of unknown form, Liu (1988) and Mammen (1993) established
the wild bootstrap to approximate the distribution of studentized statistics and F-type tests
in static linear regression models, respectively. Recently, Herwartz and Neumann (2005) have
used the wild bootstrap to mimic cross sectional covariance patterns observed in systems of
error correction models. For the general convenience of the wild bootstrap it is worthwhile to
mention that its implementation does not require any a-priori parametric guess concerning
the covariance structure of model disturbances.
According to the different versions of the Hausman statistic in (12), (15), (18) and (19)
resampling the statistic may proceed under alternative degrees of knowledge of the underlying
covariance structure. Depending on the estimates $\hat\Sigma^{(\bullet)}$ a particular Hausman statistic is obtained that is either asymptotically pivotal ($H_N$, $\hat{H}_N$) or depends on nuisance parameters ($\tilde{H}_N$, $\hat{\tilde{H}}_N$). To exemplify the bootstrap based generation of critical values for the Hausman statistic we consider the case of $\hat{H}_N$, which has a quadratic form representation analogous to (14). Resampling the other variants of the Hausman statistic proceeds in complete analogy and merely differs in the estimate of $\Sigma_i$ or $\Omega_i$. The implementation of the bootstrap scheme consists of the imitation of $\hat{H}_N$ and the test decision, which we now sketch in more detail.
1. Particular bootstrap counterparts of $\hat{H}_N$ are obtained in two steps:

i. draw bootstrap variables $u_i^*$ sharing second order features of $\hat{u}_i$ as

$$u_i^* = \hat{u}_i \cdot \eta_i, \quad \eta_i \sim iid(0,1), \quad i = 1,\ldots,N, \qquad (25)$$

where the $\eta_i$ are independent of the variables in the model;

ii. obtain a bootstrap statistic $H_N^*$ from its quadratic form representation as

$$H_N^* = \left\|\left(\hat{B}_N^{-1} - \hat{A}_N^{-1}\right)^{-1/2}\sum_{i=1}^N \hat{C}_{N,i}u_i^*\right\|^2, \qquad (26)$$

where $\hat{A}_N$, $\hat{B}_N$, $\hat{C}_{N,i}$ are the estimated counterparts of $A_N$, $B_N$, $C_{N,i}$ defined in connection with (9), (6) and (13), respectively;

2. Decision: Define $c_\gamma^*$ as the $(1-\gamma)$-quantile of $H_N^*$, i.e. $c_\gamma^* = \inf\{c : P(H_N^* \le c \mid X, \hat{u}_1, \ldots, \hat{u}_N) \ge 1 - \gamma\}$. In practice, $c_\gamma^*$ can conveniently be obtained by simulating $H_N^*$ $S$ times, for sufficiently large $S$, and choosing $c_\gamma^*$ as the empirical $(1-\gamma)$-quantile. Reject $H_0$ at significance level $\gamma$ if $\hat{H}_N$ exceeds $c_\gamma^*$.
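Steps 1.-2. above can be sketched as follows; the weighting matrices and disturbance estimates are replaced by simple stand-ins, so the snippet illustrates only the resampling logic, not the feasible statistic itself:

```python
import numpy as np

rng = np.random.default_rng(6)
N, T, S, gamma = 100, 5, 499, 0.05
u_hat = rng.normal(size=(N, T))              # stand-in for the estimated disturbances

C = rng.normal(size=(N, 2, T)) / N           # stand-in for the weights C_hat_{N,i} (K = 2)
M = np.eye(2)                                # stand-in for (B_hat^-1 - A_hat^-1)^(-1/2)

def stat(u):
    """Quadratic form || M sum_i C_{N,i} u_i ||^2, the shape of (14)/(26)."""
    s = np.einsum('ikt,it->k', C, u)
    return float(np.sum((M @ s) ** 2))

H_obs = stat(u_hat)

# step 1.i: wild bootstrap draws u*_i = eta_i * u_hat_i with Rademacher eta_i
H_star = np.empty(S)
for b in range(S):
    eta = rng.choice([-1.0, 1.0], size=(N, 1))
    H_star[b] = stat(eta * u_hat)

# step 2: empirical (1 - gamma)-quantile and test decision
c_star = np.quantile(H_star, 1 - gamma)
reject = H_obs > c_star
print(reject)
```

Since each $\eta_i$ multiplies the whole vector $\hat{u}_i$, cross section specific variances and serial correlation patterns are preserved draw by draw.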
The central ingredient of the bootstrap procedure is the imitation of the first two moments of $\hat{u}_i = (\hat{u}_{i1}, \ldots, \hat{u}_{iT})'$ by means of the quantity

$$u_i^* = (u_{i1}^*, \ldots, u_{iT}^*)' = \eta_i(\hat{u}_{i1}, \ldots, \hat{u}_{iT})' = \eta_i\hat{u}_i.$$

Since

$$\frac{1}{N}\sum_{i=1}^N \mathrm{Cov}(u_i^* \mid X, \hat{u}_1, \ldots, \hat{u}_N) = \frac{1}{N}\sum_{i=1}^N \hat{u}_i\hat{u}_i' = \frac{1}{N}\sum_{i=1}^N u_iu_i' + o_P(1) \xrightarrow{P} \frac{1}{N}\sum_{i=1}^N \mathrm{Cov}(u_i),$$

the bootstrap reflects, on average, the true underlying covariances. Note that the latter may exhibit some variation over the cross section as formalized by assumption (A1,iii).
Several approaches to draw $\eta_i$ are available from the literature (Mammen, 1993; Liu, 1988). For the Monte Carlo study $\eta_i$ is drawn alternatively from the Rademacher distribution (Liu, 1988; Davidson and Flachaire, 2001),

$$P(\eta_i^{(R)} = 1) = P(\eta_i^{(R)} = -1) = 0.5, \qquad (27)$$

and a two point distribution proposed in Mammen (1993),

$$\eta_i^{(M)} = \begin{cases} -\left(\sqrt{5} - 1\right)/2 & \text{with probability } q = (\sqrt{5} + 1)/(2\sqrt{5}), \\ \left(\sqrt{5} + 1\right)/2 & \text{with probability } 1 - q. \end{cases} \qquad (28)$$

While both choices are in line with (25), the two distributions differ with respect to the higher order moments imitated by the bootstrap design. In particular, $E[(\eta_i^{(M)})^3] = 1$ while $E[(\eta_i^{(R)})^3] = 0$ and $E[(\eta_i^{(R)})^4] = 1$.
To state the asymptotic features of the wild bootstrap scheme denote $\mathcal{X}_N = (X, \hat{u}_1, \ldots, \hat{u}_N)$. We will show that the bootstrap counterpart $H_N^*$ of the Hausman statistic has the same asymptotic behavior as $H_N$. Since the conditional distribution of $H_N^*$ given $\mathcal{X}_N$ is itself random we obtain weak convergence of these distributions to their common limit in probability.

Proposition 3. Suppose that (A1) and (A2) are fulfilled. Then, as $N \to \infty$,

$$\sup_{-\infty < z < \infty}\left|P(H_N^* \le z \mid \mathcal{X}_N) - P(\chi^2(K) \le z)\right| \xrightarrow{P} 0.$$

Propositions 1 and 3 together imply that the bootstrap test has asymptotically the correct size.

Theorem 1. Suppose that (A1) and (A2) are fulfilled. Then

$$P_{H_0}\left(\hat{H}_N > c_\gamma^*\right) \xrightarrow[N\to\infty]{} \gamma.$$
In the case of incorrectly specified covariances we obtain analogous results: Denote by $\tilde{H}_N^*$ the analogue to $\tilde{H}_N$ given in (18).

Proposition 4. Suppose that (A1), (A2) and (A3) are fulfilled. Then, as $N \to \infty$,

$$\sup_{-\infty < z < \infty}\left|P(\tilde{H}_N^* \le z \mid \mathcal{X}_N) - P\Big(\sum_{i=1}^K \lambda_i Z_i^2 \le z\Big)\right| \xrightarrow{P} 0,$$

where $Z_1, \ldots, Z_K$ and $\lambda_1, \ldots, \lambda_K$ are as in Proposition 2.

Denote by $\tilde{c}_\gamma^*$ the $(1-\gamma)$-quantile of $\mathcal{L}(\tilde{H}_N^* \mid \mathcal{X}_N)$. Propositions 2 and 4 together imply that the bootstrap test has asymptotically the correct size.

Theorem 2. Suppose that (A1), (A2) and (A3) are fulfilled. Then

$$P_{H_0}\left(\hat{\tilde{H}}_N > \tilde{c}_\gamma^*\right) \xrightarrow[N\to\infty]{} \gamma.$$
4 Monte Carlo analysis
As argued above the bootstrap approach to test the null hypothesis of no correlation be-
tween individual effects αi and explanatory variables xit allows for numerous deviations from
conditional iid assumptions for the analysis of stationary panel data models. Error vectors
ei may have a non-diagonal covariance matrix Σi, formalizing serial correlation. Along its
diagonal Σi might collect time specific variances, and, moreover, the covariance is allowed
to vary over the cross section. Finally, the unobservable error components in αi may feature
cross section specific second order properties.
The Monte Carlo study documented in this section addresses the performance of the
bootstrap method over potential violations of homogeneity assumptions. Moreover, the
finite sample properties of wild bootstrap and standard feasible GLS inference are compared.
The Hausman statistic depends on an analyst’s choice of a particular covariance estimator.
Since any presumption concerning the correlation pattern could be wrong, the Monte Carlo
analysis also sheds light on inferential features invoked by alternative covariance estimators.
In particular, we address the effect of neglecting the potential of serial correlation. Moreover,
the recommended wild bootstrap scheme is contrasted against iid resampling and a bootstrap
based strategy to estimate the covariance of the difference between the inefficient standard
GLS and LSDV estimators under the null hypothesis of the Hausman test.
4.1 The simulation design
4.1.1 The considered data generating processes
Our Monte Carlo experiments basically employ the following homogeneous model specification
\[
y_{it} = 1 + x_{it,1} + x_{it,2} + \alpha_i + e_{it}, \qquad t = 1,\ldots,T,\; i = 1,\ldots,N. \tag{29}
\]
The right hand side variables x_{it,2} are drawn from a Gaussian distribution, x_{it,2} ∼ N(0, 1),
and fixed over all replications of an experiment. Similarly, the variables x_{it,1} are generated from
the model
\[
x_{it,1} = \mu_i + \xi_{it}, \qquad \xi_{it} \sim N(0,1), \qquad \mu_i = 4(i-1)/(N-1).
\]
Owing to the deterministic component μ_i the unconditional level of x_{it,1} is ordered equidistantly over the cross section between values of 0 and 4. Individual effects α_i are also drawn
from the normal distribution. Nesting the null and the alternative hypothesis of the Hausman
test, the individual effects are drawn as
\[
\alpha_i = \delta \bar{x}_{i\cdot,1} + \omega_i \zeta_i, \qquad \zeta_i \sim N(0,1). \tag{30}
\]
According to (30) individual effects α_i and explanatory variables x_{it,1} are correlated if δ ≠ 0.
Under the null hypothesis δ = 0.
Vectors of error terms e_i are drawn from a T-dimensional normal distribution as
\[
e_i = G_i' v_i, \qquad v_i \sim N(0, I_T), \qquad G_i' G_i = \Sigma_i, \tag{31}
\]
where I_T is the T-dimensional identity matrix and G_i is an upper triangular matrix obtained
from a Cholesky decomposition of Σ_i. The particular choices of Σ_i = [σ^{(i)}_{kl}] cover the following
data generating models (DGMs):

• DGM 1: Cross sectionally homogeneous patterns of first order serial correlation with
unconditional unit variance, i.e.
\[
\sigma^{(i)}_{kl} = \rho^{|k-l|}, \qquad \rho = 0.5. \tag{32}
\]

• DGM 2: Cross sectionally homogeneous patterns of serial correlation, ρ = 0.5, combined with heterogeneous variances, such that
\[
\mathrm{diag}[\Sigma^{(i)}] = \exp\big(0.5(x_{it,1} - \bar{x}_{i\cdot,1})\big),
\]
and
\[
\sigma^{(i)}_{kl} = \rho^{|k-l|}\sqrt{\sigma^{(i)}_{kk}\,\sigma^{(i)}_{ll}}, \qquad \text{if } k \ne l.
\]

• DGM 3: Cross sectionally varying patterns of serial correlation with an unconditional
variance of unity, such that
\[
\sigma^{(i)}_{kl} = \rho_i^{|k-l|}.
\]
The parameters ρ_1 ≤ ρ_2 ≤ … ≤ ρ_N are drawn once from a uniform distribution,
U[−0.9, 0.9], and then fixed over all replications of an experiment.
Over all specifications DGM 1 to DGM 3 serial dependence is established by a first
order autoregression. We consider a further underlying error distribution with an irregular
covariance structure. For this purpose we first generate uniform variables θ_j ∼ U(−1, 1), j = 1, 2, 4, 8, 12, and draw for each cross section member 200 observations from the model
\[
z_{is} = w_{is} + \theta_1 w_{i,s-1} + \theta_2 w_{i,s-2} + \theta_4 w_{i,s-4} + \theta_8 w_{i,s-8} + \theta_{12} w_{i,s-12}, \quad w_{is} \sim N(0,1), \; s = 1,\ldots,200.
\]
Then, the empirical autocovariance pattern is estimated from \{z_{is}\}_{s=1}^{200} in the usual way and
employed to compose a correlation matrix R_i. Cross sectional covariances are then

• DGM 4:
\[
\Sigma^{(i)} = \mathrm{diag}\big[\exp\big(0.5(x_{it,1} - \bar{x}_{i\cdot,1})\big)\big]^{1/2} R_i \,\mathrm{diag}\big[\exp\big(0.5(x_{it,1} - \bar{x}_{i\cdot,1})\big)\big]^{1/2}.
\]
Over all replications of an experiment the sequence of covariance matrices Σ^{(i)} is kept fixed.
For all models introduced so far the variance of the individual effects is homogeneous, ω_i² =
ω² = 1. To illustrate the potentially adverse impact of heterogeneous second order features
we consider a final scenario.

• DGM 5: Cross sectionally homogeneous covariance of e_{it} as given for DGM 1 in (32),
combined with a cross section specific variance of individual effects,
\[
\omega_i^2 = \exp(0.5\,\bar{x}_{i\cdot,1}).
\]
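As an illustration, the error-generating step in (31) under the Toeplitz covariance of DGM 1 can be sketched as follows (a simulation sketch under the stated design; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
T, rho = 10, 0.5

# DGM 1: Toeplitz covariance with sigma_kl = rho^{|k-l|} (unconditional unit variance)
k = np.arange(T)
Sigma = rho ** np.abs(k[:, None] - k[None, :])

# eq. (31): e_i = G_i' v_i with G_i' G_i = Sigma_i; numpy returns the lower
# Cholesky factor L with L @ L.T = Sigma, so G = L.T is the upper triangular factor
G = np.linalg.cholesky(Sigma).T
v = rng.standard_normal(T)
e_i = G.T @ v  # one error vector with Cov(e_i) = Sigma
```

DGMs 2 to 4 differ only in how Sigma is assembled (scaled diagonals, cross section specific ρ_i, or an estimated correlation matrix R_i), so the Cholesky step stays identical.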
4.1.2 DGMs and alternative covariance estimators
All DGMs are characterized by some pattern of serial correlation such that the standard
Hausman test is likely to exhibit invalid empirical test levels. DGM 1 is the only
specification for which an asymptotically pivotal test statistic, for instance implemented with
Σ^{(AR)}, can be obtained. For all other DGMs we expect asymptotically invalid significance
levels when using critical values from a χ² distribution for inference. Moreover, DGMs 2 to 4
feature cross sectional heterogeneity of Σ such that iid resampling schemes might fail. For the
particular scenario DGM 5 (cross sectional variance of α_i) it is worthwhile to point out that
along common lines of panel data modelling the true underlying parameters ω_i² cannot be
estimated consistently. In this case the bootstrap approach is particularly promising since it
allows robust inference even under a false presumption concerning the individual effects'
variances. We consider two alternative a-priori restrictions on the autoregressive
parameter, namely ρ_0 = 0 and ρ_0 = 0.5. The first choice resembles the widespread situation
where the potential of serial correlation is neglected. Choosing ρ_0 = 0.5 under DGMs 1, 2
and 5 mirrors the (unrealistic) scenario where an analyst has access to the true unconditional
autoregressive parameter.
4.1.3 A benchmark approach
As argued, ignoring potential deviations from iid assumptions will render the standard LSDV
(\hat\beta^{(iid)}_{FE}) and GLS (\hat\beta^{(iid)}_{GLS}) estimators inefficient, and thereby the covariance estimator of their
difference becomes invalid. Cameron and Trivedi (2005, p. 718) propose a bootstrap algorithm to estimate Cov[\hat\beta^{(iid)}_{GLS} − \hat\beta^{(iid)}_{FE}] under serial dependence of unknown form.
This approach builds on cross sectional iid resampling of e^*_i from
\{\hat e_i\}_{i=1}^N to obtain bootstrap vectors of dependent variables, Y^*_i = 1_T \hat\alpha_i + X_i \hat\beta^{(iid)}_{FE} + e^*_i. From
this sample the difference between the two estimators, \delta^*_s = \hat\beta^{(iid,*)}_{GLS} − \hat\beta^{(iid,*)}_{FE}, is computed.
After S replications the Hausman statistic can be determined as
\[
H_{iid} = \big(\hat\beta^{(iid)}_{GLS} - \hat\beta^{(iid)}_{FE}\big)' \Big(\frac{1}{S-1}\sum_{s=1}^{S}\big(\delta^*_s - \bar\delta^*\big)\big(\delta^*_s - \bar\delta^*\big)'\Big)^{-1}\big(\hat\beta^{(iid)}_{GLS} - \hat\beta^{(iid)}_{FE}\big), \tag{33}
\]
where \bar\delta^* is the mean of \delta^*_s over the S bootstrap replications.
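Once the S bootstrap differences are available, the statistic in (33) is a simple quadratic form. A sketch, assuming the observed contrast and the bootstrap differences are already stored as arrays (names are placeholders, not from the paper):

```python
import numpy as np

def hausman_iid_bootstrap(diff, delta):
    """Eq. (33): quadratic form of the observed contrast `diff` (shape (K,))
    in the inverse of the bootstrap covariance estimated from the S
    differences `delta` (shape (S, K))."""
    delta = np.asarray(delta)
    diff = np.asarray(diff)
    dev = delta - delta.mean(axis=0)           # center the bootstrap differences
    cov = dev.T @ dev / (delta.shape[0] - 1)   # (1/(S-1)) * sum of outer products
    return float(diff @ np.linalg.solve(cov, diff))
```

Under the null hypothesis the statistic is then compared with χ²(K) critical values.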
4.1.4 Further remarks
To implement wild bootstrap inference we employ the Rademacher distribution in (27) and
the asymmetric distribution in (28). For the purpose of comparison we also evaluate bootstrap approximations implemented by means of iid resampling. The number of bootstrap
replications is set to S = 299. To investigate size and (local) power properties, the parameter δ
in (30) is chosen as δ = 0 and δ = 1/√N, respectively. The considered time series dimensions
are T = 5, 10, 20. Since the asymptotic theory in Sections 2 and 3 has been set out under
the assumption of a fixed time dimension T and N → ∞, the cross section dimensions are
chosen as N = 40 and N = 1000, where the latter choice is thought to provide sufficiently
precise estimates of asymptotic size and power. Each DGM is generated 5000 times. Mostly
our discussion of empirical test features focuses on a nominal test level γ = 0.05. However,
a few selected results are also documented for testing at the nominal 1% and 10% level.
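Equations (27) and (28) are not restated in this section; (27) is the Rademacher two-point law, and the asymmetric distribution in (28) is assumed here, for illustration, to be of the Mammen (1993) two-point type, which satisfies E(η_i) = 0 and E(η_i²) = E(η_i³) = 1. Drawing the multipliers can be sketched as:

```python
import numpy as np

def rademacher(rng, n):
    """Rademacher multipliers: +1 or -1, each with probability 1/2."""
    return rng.choice([-1.0, 1.0], size=n)

def mammen(rng, n):
    """Mammen's asymmetric two-point multipliers (assumed form of (28)):
    eta = -(sqrt(5)-1)/2 with prob. (sqrt(5)+1)/(2*sqrt(5)),
    eta =  (sqrt(5)+1)/2 otherwise."""
    s5 = np.sqrt(5.0)
    a, b = -(s5 - 1.0) / 2.0, (s5 + 1.0) / 2.0
    p = (s5 + 1.0) / (2.0 * s5)
    return np.where(rng.random(n) < p, a, b)
```

Both laws have zero mean and unit variance; the asymmetric variant additionally matches the third moment, which the Monte Carlo results below suggest is beneficial for empirical size.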
4.2 Monte Carlo results
4.2.1 Documentation
Simulation results for cross section dimensions N = 1000 and N = 40 are provided in Table 1 and Table 2, respectively. Most of the documented results are obtained for time series
dimension T = 10. In the left hand side panels both tables provide empirical size (H0)
and size adjusted empirical power estimates (H1) over 5 distinct DGMs, 7 alternative estimators Σ^{(•)} and a benchmark iid bootstrap approach (Cameron and Trivedi 2005, p. 718).
Size adjustment is achieved by tuning the nominal level of a particular test procedure such
that the corresponding empirical level is 5%. For each covariance estimator critical values
of the respective Hausman statistic are alternatively taken from the χ²(2) distribution or
determined by means of wild or iid resampling. To facilitate the interpretation of the empirical size estimates, bold entries indicate that the nominal and empirical size differ at the 5%
significance level. Significant size distortions are diagnosed if the empirical rejection frequencies
under H0 are not covered by a confidence interval constructed around the nominal level as
γ ± 1.96 √(γ(1 − γ)/5000). Setting γ = 0.05, 4.396% and 5.604% are the lower and upper
bound of this interval, respectively. A comparison of size adjusted power features is, however, only sensible if the empirical size estimates are insignificantly close to the respective
nominal levels. Although detailed results for (unadjusted) empirical power are not provided,
all test implementations turn out to have power in the sense that rejection frequencies are
larger under H1 than under H0. As simulations were performed for a local alternative
δ = 1/√N, the rejection frequencies under H1 are similar for N = 40 and N = 1000. Using
a fixed alternative, δ = 1 say, all test procedures turned out to be consistent, as empirical
power is unity for all experiments with N = 1000.
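The significance bounds used for flagging size distortions follow directly from the binomial approximation; a quick numerical check of the figures quoted above (illustrative helper, not from the paper):

```python
import math

def size_bounds(gamma, replications, z=1.96):
    """95% interval around the nominal level gamma for an empirical rejection
    frequency estimated from `replications` Monte Carlo trials."""
    half = z * math.sqrt(gamma * (1.0 - gamma) / replications)
    return gamma - half, gamma + half

lo, hi = size_bounds(0.05, 5000)
# lo, hi are approximately 0.04396 and 0.05604, i.e. 4.396% and 5.604%
```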
While most results in Tables 1 and 2 refer to time dimension T = 10 a few selected results
are also provided for Monte Carlo experiments with T = 5, 20 performed at alternative levels
1% and 10%. For the time dimensions T = 5, 10, 20 test specific power estimates aggregated
over 5 alternative DGMs are documented. Moreover, we count significant size violations over
the 5 DGMs for nominal levels 1%, 5% and 10%.
For the case of a fully parametric covariance estimate we document empirical test features
for two versions of the wild bootstrap that differ with respect to the employed distribution
of ηi. Empirical size estimates are almost uniformly (i.e. over all DGMs and N = 40, 1000)
closer to the 5% nominal level if the wild bootstrap scheme is implemented by means of the
asymmetric distribution in (28). For 3 out of 10 experiments (5 DGMs and N = 40, 1000) the
Rademacher distribution yields significant oversizing of wild bootstrap inference. In terms of
size adjusted power both versions of the wild bootstrap perform similarly with a slight and
overall advantage of the asymmetric scheme in the asymptotic case (N = 1000). Detailed
results on the relative performance of these two resampling schemes over distinct covariance
estimators are not provided for reasons of space but are available from the authors upon
request. Since wild bootstrap inference implemented by means of an asymmetric distribution
appears slightly superior, the documented results for wild bootstrap inference refer to this
particular implementation.
Insert Table 1 about here
4.2.2 Asymptotic results
For the asymptotic case (N = 1000) the wild bootstrap shows by far the best empirical size
features. Over all time dimensions T = 5, 10, 20, DGMs, covariance estimators and nominal
significance levels γ = 0.01, 0.05 and 0.10 only two size estimates differ significantly from
the nominal counterpart. For all competing approaches the overall number of significant
size violations is considerably larger. Valuing significance of the Hausman statistic by means
of the χ2 distribution or versions of iid resampling is particularly invalid for the DGMs
where the variance of individual effects is heteroskedastic (DGM 5) or the underlying error
covariance is irregular (DGM 4).
Using the χ²(2) 0.95-quantile to assess the significance of the Hausman statistic for DGMs 1
to 4, most size violations reflect undersizing. However, for the model with cross sectional
heteroskedasticity and uniform serial correlation (DGM 2) we have the interesting result
that the fully parametric covariance estimator invokes significant undersizing (γ̂ = 3.98%)
while the semiparametric estimator yields too many rejections (γ̂ = 5.78%). Thus, ignoring
cross sectional heterogeneity, an analyst is subject to the risk of invalid inference without
control of the direction of the bias.
As the empirical performance of bootstrap based inference is similar under the null hy-
pothesis over all distinct covariance estimators, alternative wild bootstrap approaches can
directly be compared in terms of size adjusted asymptotic power. Throughout, asymptotic
power features for a given DGM are similar over alternative covariance estimators. While
power estimates are also similar for a given testing strategy over DGM 1 to DGM 4, inference
under DGM 5 (heterogeneous variance of individual effects) turns out to be less powerful. For
instance, presuming a parametric MA([√T]) structure for model disturbances obtains size
adjusted power estimates between 13.6% and 14.3% for DGMs 1 to 4, while the corresponding measure for DGM 5 is only 8.32%. In sum, over all simulated DGMs with T = 10 the
MA([√T]) based covariance estimator offers the highest asymptotic power for testing at the 5%
level. The overall power difference to more restrictive covariance estimators, as, for instance,
the fully parametric estimate, is moderate, however. While for the MA([√T]) covariance
estimator the wild bootstrap offers aggregated power of 64.3%, the corresponding quantity
documented for the fully parametric estimator is 63.3%. Even ignoring the potential of serial
correlation, i.e. conditioning the parametric estimate on ρ_0 = 0, yields an aggregated power
of 61.8%.
From the summary measures characterizing empirical power over time dimensions T =
5, 10, 20 it seems that there is hardly a dominating strategy to estimate Cov[e_i]. For power
estimates of wild bootstrap inference it turns out that conditional on T = 5 a presumption of
MA(2) innovation dynamics yields superior power properties. For experiments with T = 20,
however, the more restrictive covariance estimator conditioning on ρ0 = 0.5 yields highest
aggregated power.
Although one should be careful in comparing alternative tests with divergent size prop-
erties, Table 1 documents that implementing the wild bootstrap with an asymmetric distri-
bution of ηi does not go along with power loss in comparison with iid resampling. Similarly,
the iid bootstrap scheme to estimate the covariance of the difference between the standard
LSDV and GLS estimators is not preferable in terms of asymptotic power. Testing at the 5%
level the latter benchmark is, however, characterized by significant size violations over all
DGMs except the cross sectionally homogeneous specification (DGM 1).
To summarize the asymptotic features of alternative approaches to Hausman testing we
conclude that ignoring cross sectional heterogeneity is likely to invalidate the level of the
test. Size violations are most suitably overcome by means of the wild bootstrap. For the
power features the relative impact of alternative covariance estimators is surprisingly small
given that their parametric strength varies markedly. Note that the variety of covariance
estimators ranges from the most restrictive scenario of ignoring serial correlation (ρ_0 = 0) to
estimating the covariance matrix in a general semiparametric manner.
Insert Table 2 about here
4.2.3 Finite sample results
Table 2 documents simulation results for the ’finite’ sample case N = 40. Most strikingly,
conditional on the ’small’ cross sectional dimension the relative merits of (semi)parametric
covariance estimates deteriorate. For all underlying DGMs the fully parametric covariance
specification offers superior size adjusted power properties in comparison with Hausman
testing by means of Σ^{(SP)}. For instance, employing χ²-quantiles for inference under DGM 2
(cross section specific variances with homogeneous correlation), size adjusted power estimates
are 12.9% and 8.5% if the Hausman statistic is determined with Σ^{(AR)} and Σ^{(SP)}, respectively.
While standard inference suffers from marked size distortions, bootstrap approaches, and
in particular, wild resampling, achieve most favorable empirical size features which are almost
always insignificantly close to the nominal counterpart of 5%. The summary statistics in
the right hand side panel of Table 2 reveal that for inference at nominal levels of 1% or
10% an iid resampling scheme may offer fewer size violations in comparison with the wild
bootstrap. With particular reference to DGM 5 (heterogeneous variance of individual effects),
however, the relative merits of the wild bootstrap against all competing test procedures
become evident. Except for test implementations employing the semiparametric covariance
estimator all empirical size estimates documented for iid resampling or χ2(2) based inference
exceed the nominal 5% level significantly.
Insert Table 3 about here
5 An empirical illustration
To illustrate some effects of alternative covariance estimators when analyzing real data we
consider specification testing in a translog model that quantifies yearly revenues from dairy
production over the period 1997-2005 for 149 farms located in Northern Germany (Abdulai
and Tietje, 2007). For specification testing the following model excluding time dummy
variables is considered:
\[
y_{it} = \nu + \sum_{k=1}^{5}\beta_k x_{it,k} + \frac{1}{2}\sum_{k=1}^{5}\beta_{kk} x_{it,k}^2 + \sum_{k=1}^{5}\sum_{l=k+1}^{5}\beta_{kl}\, x_{it,k} x_{it,l} + \alpha_i + e_{it}. \tag{34}
\]
In (34) the output variable y_{it} is the log total revenue from dairy production and the log input
factors x_{it,k} are, respectively, expenditure on feed (k = 1), expenditure on live stock (k = 2),
herd size (k = 3), land (k = 4) and labor (k = 5). All variables are deflated by an appropriate
price index to approximate physical quantities. For a detailed description of data collection
and processing we refer to Abdulai and Tietje (2007).
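For illustration, the design implied by (34), i.e. the five log inputs, their squared terms scaled by 1/2 and the ten distinct cross products, can be assembled as follows (x denotes a hypothetical (n × 5) array of the log input factors):

```python
import numpy as np

def translog_regressors(x):
    """Build the translog design of eq. (34): levels (beta_k terms),
    0.5 * squares (beta_kk terms) and all cross products x_k * x_l with
    l > k (beta_kl terms). Returns an (n, 20) array."""
    x = np.asarray(x)
    levels = x
    squares = 0.5 * x ** 2
    cross = [x[:, k] * x[:, l]
             for k in range(5) for l in range(k + 1, 5)]
    return np.column_stack([levels, squares] + cross)
```

The intercept ν and the individual effects α_i are handled by the panel estimators themselves and are therefore not part of this regressor matrix.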
Estimation and diagnostic results are displayed in Table 3. Complementary to character-
izing the entire sample period we also provide statistical features of three subperiods each
covering three years. For the entire set of panel data we confirm that individual effects
are likely correlated with explanatory variables featuring the translog model. Depending
on the employed covariance estimator the Hausman statistic varies between 52.01 (parametric covariance conditional on ρ = −0.5) and 98.24 (GP). According to an asymptotic
χ²-distribution all statistics are highly significant at any conventional nominal level. Determining critical values by means of the wild bootstrap throughout yields higher p-values
and thereby weakens the evidence against the null hypothesis. Conditional on a few covariance estimators (e.g. parametric covariance conditional on ρ = −0.2, −0.5) the wild
bootstrap amounts to accepting H0 at the 1% level. Applying the semiparametric covariance estimator (SP), the wild bootstrap yields a p-value in excess of 4%. Since the data
based AR coefficient is ρ̂ = 0.077, the latter results might be attributed to a potential power
deficit of resampling the Hausman statistic. More interesting is that iid based resampling
almost uniformly confirms the rather low p-values implied by the χ² distribution. Noting
that iid and wild resampling are equivalent only in case of cross sectional panel homogeneity,
the latter observation hints at the incidence of heterogeneous error distributions featuring
the data.
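The resampling based p-values referred to throughout are simply the fraction of bootstrap statistics that exceed the observed one; a sketch (names are illustrative):

```python
import numpy as np

def bootstrap_p_value(h_obs, h_star):
    """Fraction of bootstrap Hausman statistics H*_s that are at least as
    large as the observed statistic; small values speak against H0."""
    h_star = np.asarray(h_star)
    return float(np.mean(h_star >= h_obs))
```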
The case of panel heterogeneity is further underpinned when looking at subsample specific
parameter estimates of the AR parameter and both variance parameters. Subsample specific
AR parameters are throughout negative and might suffer from small sample (T = 3) biases.
Standard deviations featuring individual effects (idiosyncratic disturbances) are smallest over
the period 2000-02 (2003-05). For these subsamples the respective standard error estimates
are smaller by a factor of 0.4 in comparison with the maximum estimates obtained from
conditioning on the period 1997-99.
While subsample specific Hausman statistics remain significant at conventional nominal levels according to the χ²(20) distribution, resampling based critical values reveal that the evidence in favor of correlation between individual effects and explanatory variables is strongest
for the first subperiod (1997-99). Conditional on the time span 2003-05, wild bootstrap based
critical values imply marginal significance levels of at least 2% (parametric covariance conditional on ρ = 0.5), while a majority of test implementations hints at acceptance of H0.
For the third subperiod, distinct p-values implied by iid and wild resampling further support the likelihood of cross sectional heterogeneity of the underlying distribution of model
disturbances.
Throughout, using cross sectional iid resampling to robustly evaluate the covariance of
the difference between the common LSDV and GLS estimator yields Hausman statistics
which are rather close to the parametric covariance estimator building upon MA type serial
dependence.
6 Conclusions
In this paper we address the issue of testing for correlation between unobserved panel het-
erogeneity and explanatory variables under general covariance structures of underlying error
distributions. We consider the case of a finite time dimension while N → ∞. Second order
features cover (cross sectionally varying patterns of) serial correlation, time heteroskedastic-
ity or cross sectional variance of individual effects. For the determination of critical values
we propose a wild bootstrap scheme that retains its validity even in case the presumed co-
variance structure differs from the true second order features of error terms. In this case
nuisance parameters are likely to invalidate asymptotic pivotalness of a generalized Hausman
statistic.
Finite sample features involved when critical values for the Hausman statistic are taken
from the χ²-distribution or, alternatively, estimated by means of the bootstrap are examined.
For benchmarking purposes cross sectional iid resampling schemes are also investigated.
We find that the wild bootstrap approach is characterized by more accurate empirical size
features. Inference by means of critical values from the χ²-distribution suffers both from
weaker empirical size features if the cross section dimension is small under correct covariance
specification, and from nonpivotalness in case second order features are misspecified.
In terms of power the choice of particular covariance estimators is not crucial asymptot-
ically. For small cross sections, however, (misspecified) parsimonious parametric covariance
representations promise power advantages in comparison with using more general (semipara-
metric) covariance estimators.
With respect to the empirical example it is apparent that panel covariance homogeneity
is likely exceptional at least when modelling longitudinal data. In the light of potential
heterogeneity robust critical values promise actual significance levels which are close to the
nominal test levels. The considered subsamples underscore, in addition, that correlation
between individual effects and explanatory variables might also undergo some form of time
variation. In the latter case it is important to have tools of inference at hand that show
accurate empirical features in case of a small time dimension.
Throughout, our analysis proceeds under the (common) assumption of cross sectional
independence, which might be at odds with macroeconomic or spatial panel data. Recent
contributions to spatial econometrics or panel unit root testing allow for cross sectional error
correlation. Immunizing the Hausman statistic against cross sectional error correlation is an
important issue of further research.
Acknowledgements
The authors thank two anonymous referees, an associate editor and the editor for helpful
comments. Moreover, we are grateful to Awudu Abdulai for providing us the data used
for the empirical illustration. The first author gratefully acknowledges financial support of
Deutsche Forschungsgemeinschaft (DFG) (HE 2188/1-1).
7 Appendix
7.1 Proofs
Proof of Proposition 1. We obtain from (13) that
\[
\mathrm{Cov}\Big(\sum_{i=1}^{N} C_{N,i} u_i \,\Big|\, X\Big)
= \mathrm{Cov}\big(\sqrt{N}(\hat\beta_{FE} - \hat\beta_{GLS}) \,\big|\, X\big)
= B_N^{-1} - A_N^{-1} \stackrel{P}{\longrightarrow} B^{-1} - A^{-1}.
\]
Furthermore, we obtain from the uniform integrability of (‖X_i'X_i‖)_{i∈N} that, for arbitrary
c > 0,
\[
P\Big(\max_{1\le i\le N}\|X_i'X_i\| > cN\Big)
\le \sum_{i=1}^{N} P\big(\|X_i'X_i\| > cN\big)
\le \frac{1}{cN}\sum_{i=1}^{N} E\big[\|X_i'X_i\|\, I(\|X_i'X_i\| > cN)\big]
\underset{N\to\infty}{\longrightarrow} 0.
\]
In other words, we have that \max_{1\le i\le N}\|X_i\| = o_P(\sqrt{N}), which implies that c_N = \max_{1\le i\le N}\|C_{N,i}\| = o_P(1). Hence, we obtain by (A1,iv) that, for arbitrary ε > 0,
\[
\sum_{i=1}^{N} E\big(\|C_{N,i}u_i\|^2 I(\|C_{N,i}u_i\| > \epsilon) \,\big|\, X\big)
\le \sum_{i=1}^{N}\|C_{N,i}\|^2\, E\big(\|u_i\|^2 I(\|u_i\| > \epsilon/c_N) \,\big|\, X\big)
= o_P(1)\cdot \sum_{i=1}^{N}\|C_{N,i}\|^2 = o_P(1), \tag{35}
\]
that is, a conditional Lindeberg condition is fulfilled. Now we obtain by the Lindeberg-Feller
central limit theorem that
\[
\big(B_N^{-1} - A_N^{-1}\big)^{-1/2}\sqrt{N}(\hat\beta_{FE} - \hat\beta_{GLS}) \stackrel{d}{\longrightarrow} N(0_K, I_K),
\]
which implies by the continuous mapping theorem
\[
H_N = N(\hat\beta_{FE} - \hat\beta_{GLS})'\big(B_N^{-1} - A_N^{-1}\big)^{-1}(\hat\beta_{FE} - \hat\beta_{GLS}) \stackrel{d}{\longrightarrow} \chi^2(K).
\]
The second assertion (17) follows immediately from \|\hat A_N - A_N\| \stackrel{P}{\longrightarrow} 0 and \|\hat B_N - B_N\| \stackrel{P}{\longrightarrow} 0.

Proof of Proposition 2. Analogous to the proof of Proposition 1.
Proof of Proposition 3. We will first show that
\[
\mathcal{L}\Big(\big(B_N^{-1} - A_N^{-1}\big)^{-1/2}\sum_{i=1}^{N} C_{N,i} u_i^* \,\Big|\, X_N\Big) \Longrightarrow N(0_K, I_K) \quad \text{in probability}, \tag{36}
\]
which implies by the continuous mapping theorem that
\[
\mathcal{L}(H_N^* \,|\, X_N) \Longrightarrow \chi^2(K) \quad \text{in probability}.
\]
Since χ²(K) is a continuous distribution we obtain that
\[
\sup_{-\infty<z<\infty}\big|P(H_N^* \le z \,|\, X_N) - P(\chi^2(K) \le z)\big| \stackrel{P}{\longrightarrow} 0.
\]
(36) will actually follow from
\[
\mathcal{L}\Big(\big(B_N^{-1} - A_N^{-1}\big)^{-1/2}\sum_{i=1}^{N} C_{N,i} u_i \eta_i \,\Big|\, X_N\Big) \Longrightarrow N(0_K, I_K) \quad \text{in probability} \tag{37}
\]
and
\[
T_N := \sum_{i=1}^{N} C_{N,i}(\hat u_i - u_i)\eta_i \stackrel{P}{\longrightarrow} 0. \tag{38}
\]
It follows from (35) that there exists a null sequence (\epsilon_N)_{N\in\mathbb{N}} such that
\[
E\Big(\sum_{i=1}^{N}\|C_{N,i}u_i\|^2 I(\|C_{N,i}u_i\| > \epsilon_N) \,\Big|\, X\Big) \stackrel{P}{\longrightarrow} 0.
\]
Let \gamma_{N,i} = C_{N,i}u_i I(\|C_{N,i}u_i\| \le \epsilon_N). It follows from the latter display that
\[
\sum_{i=1}^{N} C_{N,i}u_iu_i'C_{N,i}' = \sum_{i=1}^{N}\gamma_{N,i}\gamma_{N,i}' + o_P(1). \tag{39}
\]
Using E(\sum_{i=1}^{N} C_{N,i}u_iu_i'C_{N,i}' \,|\, X) = B_N^{-1} - A_N^{-1} we obtain that
\[
\Big\|E\Big(\sum_{i=1}^{N}\gamma_{N,i}\gamma_{N,i}' \,\Big|\, X\Big) - \big(B_N^{-1} - A_N^{-1}\big)\Big\|
\le \sum_{i=1}^{N} E\big(\|C_{N,i}u_i\|^2 I(\|C_{N,i}u_i\| > \epsilon_N) \,\big|\, X\big) \stackrel{P}{\longrightarrow} 0. \tag{40}
\]
For the (k, l)-th entry of the matrix \sum_{i=1}^{N}\gamma_{N,i}\gamma_{N,i}', we have
\[
E\Big(\Big[\sum_{i=1}^{N}\big\{(\gamma_{N,i})_k(\gamma_{N,i})_l - E\big((\gamma_{N,i})_k(\gamma_{N,i})_l \,|\, X\big)\big\}\Big]^2 \,\Big|\, X\Big)
= \sum_{i=1}^{N} E\big(\big[(\gamma_{N,i})_k(\gamma_{N,i})_l - E\big((\gamma_{N,i})_k(\gamma_{N,i})_l \,|\, X\big)\big]^2 \,\big|\, X\big)
\]
\[
\le \sum_{i=1}^{N} E\big(\big[(\gamma_{N,i})_k(\gamma_{N,i})_l\big]^2 \,\big|\, X\big)
\le \epsilon_N^2 \sum_{i=1}^{N} E\big((\gamma_{N,i})_k^2 \,\big|\, X\big)
\le \epsilon_N^2 \sum_{i=1}^{N} E\big((C_{N,i}u_i)_k^2 \,\big|\, X\big) = o_P(1),
\]
which implies that
\[
\sum_{i=1}^{N}\gamma_{N,i}\gamma_{N,i}' = E\Big(\sum_{i=1}^{N}\gamma_{N,i}\gamma_{N,i}' \,\Big|\, X\Big) + o_P(1). \tag{41}
\]
From (39), (40) and (41) we conclude that
\[
\sum_{i=1}^{N} C_{N,i}u_iu_i'C_{N,i}' = B_N^{-1} - A_N^{-1} + o_P(1),
\]
which implies that
\[
\mathrm{Cov}\Big(\big(B_N^{-1} - A_N^{-1}\big)^{-1/2}\sum_{i=1}^{N} C_{N,i}u_i\eta_i \,\Big|\, X_N\Big)
= \big(B_N^{-1} - A_N^{-1}\big)^{-1/2}\sum_{i=1}^{N} C_{N,i}u_iu_i'C_{N,i}'\,\big(B_N^{-1} - A_N^{-1}\big)^{-1/2} \stackrel{P}{\longrightarrow} I_K. \tag{42}
\]
Moreover, since \sum_{i=1}^{N}\|C_{N,i}u_i\|^2 \stackrel{P}{\longrightarrow} \mathrm{tr}(B^{-1} - A^{-1}) and, according to (35), P(\max_{1\le i\le N}\|C_{N,i}u_i\| > c \,|\, X) \le (1/c^2)\sum_{i=1}^{N} E[\|C_{N,i}u_i\|^2 I(\|C_{N,i}u_i\| > c) \,|\, X] \stackrel{P}{\longrightarrow} 0, we obtain, for arbitrary ε > 0, that
\[
\sum_{i=1}^{N} E\big(\|C_{N,i}u_i\eta_i\|^2 I(\|C_{N,i}u_i\eta_i\| > \epsilon) \,\big|\, X_N\big)
\le \sum_{i=1}^{N}\|C_{N,i}u_i\|^2 E\big(\eta_i^2 I(\|C_{N,i}u_i\|\,|\eta_i| > \epsilon) \,\big|\, X_N\big) \stackrel{P}{\longrightarrow} 0,
\]
that is, we have again a conditional Lindeberg condition being fulfilled. Therefore, (37)
follows from (42) by the Lindeberg-Feller central limit theorem.
Now it remains to prove (38). We have that
\[
T_N = -\sum_{i=1}^{N} C_{N,i}X_i(\hat\beta_{FE} - \beta)\eta_i.
\]
Since E(η_i² | X_N) = 1 we obtain that
\[
E\big(\|T_N\|^2 \,\big|\, X_N\big)
= \sum_{i=1}^{N}(\hat\beta_{FE} - \beta)'X_i'C_{N,i}'C_{N,i}X_i(\hat\beta_{FE} - \beta)
\le \|\hat\beta_{FE} - \beta\|^2 \cdot \max_{1\le i\le N}\|X_i'X_i\| \cdot \sum_{i=1}^{N}\|C_{N,i}\|^2
= O_P(N^{-1})\cdot o_P(N)\cdot O_P(1) = o_P(1),
\]
that is, (38) holds also true.

Proof of Proposition 4. Analogous to the proof of Proposition 3.
Proof of Theorem 1. Since χ²(K) is a continuous distribution we conclude from Proposition 1 that
\[
\sup_{-\infty<z<\infty}\big|P(H_N \le z) - P\big(\chi^2(K) \le z\big)\big| \underset{N\to\infty}{\longrightarrow} 0, \tag{43}
\]
which implies by Proposition 3 that
\[
\sup_{-\infty<z<\infty}\big|P(H_N \le z) - P(H_N^* \le z \,|\, X_N)\big| \stackrel{P}{\longrightarrow} 0. \tag{44}
\]
Using the fact that P(H_N^* \le c_\gamma^* \,|\, X_N) \stackrel{P}{\longrightarrow} 1 - \gamma (since the distribution of H_N^* can be discrete we
cannot guarantee that P(H_N^* \le c_\gamma^* \,|\, X_N) = 1 - \gamma; however, Proposition 3 ensures at least this
convergence) we obtain that
\[
\Big|P(H_N > c)\big|_{c=c_\gamma^*} - \gamma\Big|
\le \sup_{-\infty<z<\infty}\big|P(H_N \le z) - P(H_N^* \le z \,|\, X_N)\big| + o_P(1) \stackrel{P}{\longrightarrow} 0.
\]
This implies that
\[
P(H_N > c_\gamma^*) \underset{N\to\infty}{\longrightarrow} \gamma.
\]

Proof of Theorem 2. Analogous to the proof of Theorem 1.
References
Abdulai, A. and Tietje, H. (2007). Estimating technical efficiency under unobserved hetero-
geneity with stochastic frontier models: Application to northern Germany dairy farms.
European Review of Agricultural Economics, 18, 1–24.
Ahn, S.C. and Moon, H.R. (2001). On Large-N and Large-T properties of panel data esti-
mators and the Hausman test. mimeo, University of Southern California.
Amemiya, T. (1971). The estimation of variances in a variance-components model. Interna-
tional Economic Review, 12, 1–13.
Anselin, L., Florax, R.J.G.M. and Rey, S.J. (Eds.) (2004). Advances in Spatial Econometrics.
Springer, Berlin.
Baltagi, B.H. (2001). Econometric Analysis of Panel Data. John Wiley, Chichester.
Baltagi, B.H. and Griffin, J.M. (1983). Gasoline demand in the OECD: An application of
pooling and testing procedures. European Economic Review, 29, 745–753.
Baltagi, B.H. and Kao, C. (2000). Nonstationary panels, Cointegration in panels and dynamic
panels. A survey. in: Baltagi (Ed.): Nonstationary panels, Cointegration in panels and
dynamic panels. Advances in Econometrics, 15, JAI Press, Amsterdam, 7–52.
Baltagi, B.H. and Pinnoi, N. (1995). Public capital stock and state productivity growth:
Further evidence from an error components model. Empirical Economics, 20, 351–359.
Bole, V.A. and Rebec, P. (2004). Bootstrapping the Hausman test in panel data models.
Manuscript, available at SSRN: http://ssrn.com/abstract=628321
Bollerslev, T., Chou, R.Y. and Kroner, K. F. (1992). ARCH modelling in finance: A review
of the theory and empirical evidence. Journal of Econometrics, 52, 5–59.
Cameron, A.C. and Trivedi, P.K. (2005). Microeconometrics: Methods and Applications. Cam-
bridge University Press, New York.
Davidson, R. and Flachaire, E. (2001). The wild bootstrap, tamed at last. GREQAM Doc-
ument de Travail 99A32
Grunfeld, Y. (1958). The determinants of corporate investment, unpublished Ph.D. disser-
tation (University of Chicago, Chicago).
Hall, P. (1992). The Bootstrap and Edgeworth Expansion. Springer, New York.
Hausman, J.A. (1978). Specification tests in econometrics. Econometrica, 46, 1251–1271.
Herwartz, H. and Neumann, M.H. (2005). Bootstrap inference in single equation error cor-
rection models. Journal of Econometrics, 128, 165–193.
Herwartz, H. and Neumann, M.H. (2007). A robust bootstrap approach to the Hausman test
in stationary panel data models. Economic Working Papers, Kiel University, 2007-29
Kiefer, N.M. (1980). Estimation of fixed effect models for time series of cross sections
with arbitrary intertemporal covariance. Journal of Econometrics, 14, 195–202.
Lillard, L.A. and Willis, R.J. (1979). Components of variation in panel earnings data: Amer-
ican scientists 1960-1970, Econometrica, 47, 437–454.
Liu, R.Y. (1988). Bootstrap procedures under some non-i.i.d. models. Annals of Statistics,
16, 1696–1708.
Mammen, E. (1993). Bootstrap and wild bootstrap for high dimensional linear models. An-
nals of Statistics, 21, 255–285.
Munnell, A. (1990). Why has productivity growth declined? Productivity and public invest-
ment. New England Economic Review, January/February, 3–22.
Nerlove, M. (1971). Further evidence on the estimation of dynamic economic relations from
a time series of cross sections. Econometrica, 39, 359–382.
Swamy, P.A.V.B. and Arora, S.S. (1972). The exact finite sample properties of the estimators
of coefficients in the error components regression model. Econometrica, 40, 261–275.
Wallace, T.D. and Hussain, A. (1969). The use of error components models in combining
cross section and time series data, Econometrica, 37, 55–72.
Wu, C.F.J. (1986). Jackknife, bootstrap, and other resampling methods in regression analysis
(with discussion). Annals of Statistics, 14, 1261–1343.
                 DGM 1       DGM 2       DGM 3       DGM 4       DGM 5      ∑DGMs γ̂T(H1)    #(γ̂≠γ)5    #(γ̂≠γ)10   #(γ̂≠γ)20
Σ(•)   crit.    H0    H1    H0    H1    H0    H1    H0    H1    H0    H1   T=5  T=10 T=20  1% 5% 10%  1% 5% 10%  1% 5% 10%
AR     χ2(2)   3.76  13.9  3.98  13.9  4.72  15.0  4.56  13.4  5.78  7.96  57.5 64.1 67.5   2  3  4    3  3  3    3  2  3
       ηi(R)   5.30  13.0  5.30  13.4  5.04  15.3  4.98  14.0  5.62  7.52  57.8 63.2 66.7   0  0  0    0  1  0    0  0  0
       ηi(M)   5.24  13.6  5.02  13.9  5.22  14.2  5.10  13.7  5.34  7.80  58.4 63.3 67.0   0  0  0    0  0  0    0  0  0
       iid b.  5.40  13.6  5.36  14.2  5.56  14.3  5.84  13.3  6.62  7.78  58.4 63.2 66.6   1  1  2    3  2  2    1  1  1
AR     χ2(2)   3.20  13.6  3.50  13.3  4.24  13.3  2.96  14.1  5.60  7.58  59.1 62.0 68.6   4  4  4    3  4  4    5  4  4
ρ=0.5  ηi(M)   5.34  13.3  5.22  13.2  5.48  13.1  4.82  13.6  5.30  7.86  58.5 61.1 67.8   0  0  0    1  0  0    0  0  0
       iid b.  5.46  13.4  5.64  13.9  5.70  12.8  5.18  14.0  6.48  7.72  58.0 61.9 68.7   2  1  2    1  3  3    1  1  1
AR     χ2(2)   4.30  13.7  4.78  13.4  4.36  14.3  3.70  13.6  5.96  8.08  57.8 63.2 66.6   4  4  5    3  4  3    1  3  4
ρ=0.0  ηi(M)   5.28  13.3  5.30  13.3  5.36  14.0  5.10  13.5  5.18  7.76  58.6 61.8 67.6   0  0  0    0  0  0    0  0  0
       iid b.  5.22  13.4  5.58  13.2  5.82  14.1  5.58  13.4  6.38  7.74  59.0 61.9 67.1   1  1  2    2  2  4    1  1  1
MA     χ2(2)   4.34  14.0  4.56  13.5  4.90  15.1  5.18  13.5  5.82  8.00  58.1 64.0 68.0   1  1  2    2  2  2    3  3  2
       ηi(M)   5.34  13.9  4.98  14.1  5.18  14.3  5.00  13.6  5.06  8.32  58.1 64.3 66.8   0  0  0    0  0  0    0  0  0
       iid b.  5.28  13.8  5.52  14.1  5.52  15.1  5.56  14.2  6.44  7.70  58.0 64.9 65.7   2  1  1    1  1  2    1  1  1
TH     χ2(2)   4.54  14.0  4.88  13.0  5.14  14.8  5.22  13.8  5.70  7.62  57.8 63.2 67.0   0  1  1    1  1  1    2  2  3
       ηi(M)   5.24  13.6  5.00  13.9  5.20  14.2  4.96  13.8  5.24  7.80  57.1 63.5 65.9   0  0  0    0  0  0    0  0  0
       iid b.  5.20  13.5  5.62  13.7  5.52  14.8  5.60  13.4  6.24  8.06  57.8 63.5 66.7   1  1  1    3  2  2    1  1  1
GP     χ2(2)   5.28  14.2  5.46  13.6  5.70  14.6  5.62  13.8  6.26  7.80  59.5 64.0 67.8   1  1  2    1  3  3    1  1  1
       ηi(M)   5.24  13.7  5.02  13.8  5.20  15.0  4.84  13.6  5.26  7.92  60.1 64.0 67.3   0  0  0    0  0  0    0  0  0
       iid b.  5.46  13.7  5.52  14.2  5.72  14.4  5.68  13.3  6.54  7.80  57.3 63.3 66.7   2  1  1    2  3  3    1  1  1
SP     χ2(2)   4.82  14.4  5.78  12.7  5.34  14.9  5.26  14.5  5.72  8.40  56.8 64.9 67.5   0  1  2    2  2  1    0  2  1
       ηi(M)   4.88  13.8  5.22  12.8  4.90  14.6  4.90  13.6  4.78  8.08  58.7 62.8 65.2   0  0  0    0  0  0    1  0  0
       iid b.  5.10  13.9  5.60  12.7  5.40  14.5  5.48  13.7  5.72  8.10  57.6 62.8 67.2   0  1  2    0  1  1    0  1  1
Hiid           5.36  13.7  5.72  13.2  6.00  14.4  5.82  13.6  6.64  8.20  58.2 63.1 67.3   2  1  2    3  4  4    1  2  1
Table 1: Empirical rejection frequencies from alternative critical values for the Hausman statistic. Panel dimensions are N = 1000 and (mostly) T = 10. For each covariance estimator Σ(•) (see Section 2.4) critical values are obtained from the χ2-distribution, the wild bootstrap (ηi(R), ηi(M)) and cross sectional iid resampling (iid b.). Hiid signifies a bootstrap approach to estimate Cov[β̂(iid)_GLS − β̂(iid)_FE]. Error distributions are cross sectionally homogeneous (DGM 1), have cross section specific variance (DGM 2) or correlation (DGM 3). DGMs 4 and 5 feature irregular correlation patterns and cross section specific variances of individual effects, respectively. Rejection frequencies (100 · γ̂) under H0 and size adjusted rejection frequencies under H1 are given. Bold entries indicate that under H0 γ̂ is not covered by a 95% confidence interval around the nominal 5% level. Columns '∑DGMs γ̂T(H1)' display the sum of power estimates over 5 alternative DGMs when γ = 5% and T = 5, 10, 20. The number of significant violations of γ = 1%, 5%, 10% over the five DGMs with alternative time dimensions is listed underneath #(γ̂ ≠ γ)T.
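The two wild bootstrap schemes compared above, ηi(R) and ηi(M), are conventionally generated from two-point distributions with mean zero and unit variance. The sketch below assumes these labels denote Rademacher and Mammen weights, a standard reading of the notation; the function names are illustrative, not the authors' code:

```python
import math
import random

def rademacher(rng=random):
    """Draw eta = +1 or -1, each with probability 1/2 (E[eta]=0, E[eta^2]=1)."""
    return 1.0 if rng.random() < 0.5 else -1.0

# Mammen's two-point distribution: also E[eta]=0, E[eta^2]=1, and in
# addition E[eta^3]=1, which reproduces the skewness of the residuals.
SQRT5 = math.sqrt(5.0)
MAMMEN_LOW = -(SQRT5 - 1.0) / 2.0      # ~ -0.618
MAMMEN_HIGH = (SQRT5 + 1.0) / 2.0      # ~  1.618
P_LOW = (SQRT5 + 1.0) / (2.0 * SQRT5)  # ~  0.724

def mammen(rng=random):
    """Draw from Mammen's two-point law."""
    return MAMMEN_LOW if rng.random() < P_LOW else MAMMEN_HIGH

# First two moments of the Mammen law, evaluated analytically:
mammen_mean = P_LOW * MAMMEN_LOW + (1.0 - P_LOW) * MAMMEN_HIGH       # 0 up to rounding
mammen_var = P_LOW * MAMMEN_LOW**2 + (1.0 - P_LOW) * MAMMEN_HIGH**2  # 1 up to rounding
```

In a wild bootstrap resample, the residual vector of cross section unit i is multiplied by one such draw ηi, which preserves second moments (and hence any within-unit correlation pattern) while remaining valid under heteroskedasticity.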
                 DGM 1       DGM 2       DGM 3       DGM 4       DGM 5      ∑DGMs γ̂(H1)     #(γ̂≠γ)5    #(γ̂≠γ)10   #(γ̂≠γ)20
Σ(•)   crit.    H0    H1    H0    H1    H0    H1    H0    H1    H0    H1   T=5  T=10 T=20  1% 5% 10%  1% 5% 10%  1% 5% 10%
AR     χ2(2)   2.82  13.5  2.92  12.9  4.14  13.6  2.02  12.6  5.74  8.00  56.3 60.6 62.0   5  4  4    4  5  5    3  4  4
       ηi(R)   5.72  13.2  5.00  13.2  5.38  12.7  5.32  13.4  5.72  8.06  52.3 60.6 62.4   2  2  2    1  2  2    1  4  5
       ηi(M)   4.94  13.6  5.00  12.1  4.90  12.7  4.82  12.9  5.10  8.12  51.1 59.4 62.4   5  2  1    3  0  2    4  1  4
       iid b.  4.74  13.2  5.02  12.9  5.04  13.3  4.32  12.7  6.64  7.58  53.1 59.6 59.5   2  2  3    3  2  1    1  3  4
AR     χ2(2)   2.60  14.1  2.18  13.3  2.74  14.4  1.54  12.9  5.64  7.66  57.4 62.4 61.5   5  5  5    4  5  5    4  5  5
ρ=0.5  ηi(M)   5.16  13.7  4.88  12.0  4.52  13.9  4.98  13.4  5.16  8.22  52.3 61.2 62.1   5  2  3    4  0  2    3  0  4
       iid b.  4.76  14.4  5.00  12.8  4.70  14.2  4.64  12.5  7.32  7.96  55.4 61.8 60.3   2  2  3    4  1  1    1  2  4
AR     χ2(2)   3.46  13.6  3.58  13.0  3.66  14.1  2.74  12.9  5.78  8.22  55.4 61.9 61.7   5  5  5    4  5  5    3  3  2
ρ=0.0  ηi(M)   4.84  13.1  4.88  12.0  4.74  13.4  4.80  13.2  4.90  7.96  51.5 59.7 61.3   5  2  1    4  0  1    4  2  4
       iid b.  4.68  13.5  4.82  12.6  4.94  13.2  4.24  11.9  6.46  8.16  53.3 59.3 59.0   2  2  3    3  2  2    1  2  4
MA     χ2(2)   3.32  13.0  3.44  12.7  4.30  13.2  2.66  12.5  5.90  7.94  56.2 59.3 61.4   4  4  4    4  5  4    3  4  4
       ηi(M)   4.92  13.7  4.80  11.9  4.76  13.5  4.72  13.5  5.02  7.88  51.3 60.5 62.9   5  3  2    2  0  1    3  0  4
       iid b.  4.78  13.8  4.92  12.6  5.00  13.3  4.16  12.4  6.68  7.52  55.2 59.6 58.0   2  2  3    4  2  2    1  4  4
TH     χ2(2)   4.72  12.3  4.48  13.1  4.80  13.3  5.52  10.2  5.72  7.52  55.6 56.5 51.8   3  3  3    2  1  1    3  3  4
       ηi(M)   4.84  13.2  5.02  12.2  5.32  12.1  5.12  12.0  4.80  7.84  51.9 57.4 60.5   5  2  1    2  0  2    3  2  2
       iid b.  4.92  12.8  5.02  12.4  5.08  13.5  4.72  11.4  6.20  7.34  55.9 57.4 57.0   3  2  3    1  1  2    1  2  3
GP     χ2(2)   4.66  13.6  4.94  12.4  4.96  13.8  4.42  12.6  6.14  8.36  55.3 60.7 61.4   3  2  3    2  1  2    1  1  4
       ηi(M)   5.24  13.3  5.50  11.9  5.28  12.9  5.32  12.5  5.14  8.38  51.9 59.0 62.5   5  3  2    3  0  3    3  2  4
       iid b.  4.86  13.6  5.30  12.5  5.34  13.4  5.02  12.3  6.50  8.26  55.4 60.0 60.9   3  2  4    0  1  4    1  4  5
SP     χ2(2)   4.82  8.84  4.76  8.50  4.76  10.2  4.20  9.18  5.36  6.28  49.4 43.0 32.3   2  1  2    3  1  1    2  1  0
       ηi(M)   4.90  8.64  4.28  8.68  4.48  10.0  4.92  8.34  5.06  5.74  48.4 41.4 33.9   5  4  1    4  1  1    5  4  2
       iid b.  4.66  8.80  4.72  8.80  4.58  10.1  4.44  8.88  5.34  6.14  49.3 42.7 33.1   2  1  2    3  0  2    4  1  0
Hiid           5.12  12.9  5.08  12.7  5.16  13.2  4.32  12.8  7.22  8.00  54.4 59.7 59.3   2  2  3    2  2  3    1  3  4
Table 2: Empirical rejection frequencies for alternative strategies to obtain critical values for the Hausman statistic; panel dimensions are N = 40 and (mostly) T = 10. For further notes see Table 1.
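The size-adjusted rejection frequencies under H1 reported in Tables 1 and 2 are obtained by replacing the nominal critical value with an empirical quantile of the Monte Carlo null distribution, so that each test has exact size under H0 before its power is compared. A minimal sketch; the function name and the simple quantile convention are illustrative assumptions, not the authors' exact code:

```python
import math

def size_adjusted_power(stats_h0, stats_h1, level=0.05):
    """Size-adjusted rejection frequency: take the empirical (1-level)
    quantile of the H0 statistics as critical value, then count how
    often the H1 statistics exceed it."""
    s = sorted(stats_h0)
    k = int(math.ceil((1.0 - level) * len(s))) - 1  # index of empirical quantile
    crit = s[k]
    return sum(1 for h in stats_h1 if h > crit) / len(stats_h1)
```

With 5000 Monte Carlo replications under H0 and H1, `size_adjusted_power` at level 0.05 yields the H1 columns of Tables 1 and 2 (after scaling by 100).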
crit     Σ(•)   97-05   97-99   00-02   03-05        ρ =    97-05   97-99   00-02   03-05
HN       AR     61.47   59.40   32.78   41.14        0.5    84.25   55.94   47.56   56.99
χ2              0.000   0.001   3.567   0.357               0.000   0.003   0.049   0.002
ηi(M)           0.835   2.504   3.506   5.843               0.835   2.671   0.501   2.003
iid b.          0.000   0.334   4.508   0.000               0.000   0.501   3.005   0.000
HN       TH     65.57   62.38   36.28   49.22        0.2    66.06   53.81   39.12   47.00
χ2              0.000   0.000   1.426   0.029               0.000   0.006   0.644   0.059
ηi(M)           0.501   2.170   4.841   4.341               0.835   2.671   0.668   5.008
iid b.          0.000   0.334   5.008   0.000               0.000   0.501   1.669   0.167
HN       MA     62.53   65.23   38.54   49.73        0      59.20   54.46   35.38   43.29
χ2              0.000   0.000   0.761   0.024               0.001   0.005   1.816   0.187
ηi(M)           1.002   2.170   2.003   3.840               0.835   2.671   1.503   6.845
iid b.          0.000   0.334   2.838   0.000               0.000   0.501   1.503   0.167
HN       GP     98.24   62.38   36.28   49.22       -0.2    54.92   56.65   33.02   41.23
χ2              0.000   0.000   1.426   0.029               0.004   0.002   3.355   0.348
ηi(M)           0.668   2.170   4.841   4.341               1.336   2.504   2.003   7.012
iid b.          0.000   0.334   5.008   0.000               0.000   0.501   2.170   0.167
HN       SP     74.25   49.37   31.02   44.46       -0.5    52.01   64.03   32.62   41.43
χ2              0.000   0.027   5.495   0.130               0.011   0.000   3.713   0.328
ηi(M)           4.007   4.841   10.35   9.683               2.337   2.170   3.506   5.342
iid b.          0.167   0.501   8.848   0.167               0.000   0.334   4.174   0.000
HN       Hiid   62.27   70.21   42.52   50.44
χ2              0.000   0.000   0.236   0.019
σ̂e               .095    .088    .071    .056        ρ̂       .077   -.343   -.520   -.465
σ̂i               .168    .288    .172    .268
Table 3: Hausman statistics and p-values (·100) obtained for a translog production function describing dairy production for N = 149 farms in Northern Germany over the time period 1997 to 2005 (T = 9). Alternative covariance estimators are indicated as in Table 1. The right hand side panel lists inferential results based on alternative preselections of the autoregressive parameter ρ. 'χ2' signifies that critical values are taken from a χ2-distribution with 20 degrees of freedom. The table provides inferential and estimation results for the entire sample period and for 3 subsamples each covering a time span of 3 years. For further notes see Table 1.
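The bootstrap p-values in Table 3 (scaled by 100) are the share of bootstrap replicates of the Hausman statistic that reach or exceed the observed value. A minimal sketch; the function name is illustrative, not the authors' code:

```python
def bootstrap_p_value(h_obs, h_boot):
    """p-value as the share of bootstrap Hausman statistics that are at
    least as large as the observed statistic h_obs."""
    if not h_boot:
        raise ValueError("need at least one bootstrap replicate")
    return sum(1 for h in h_boot if h >= h_obs) / len(h_boot)
```

Note that with B bootstrap replicates the smallest attainable non-zero p-value is 1/B; for instance, a single exceedance among B = 599 replicates would give 100/599 ≈ 0.167 on the ·100 scale used in Table 3 (the replication count is an illustration, not stated in the table).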