Multicollinearity in Cross-Sectional Regressions‡
Jørgen Lauridsen (*), Jesús Mur (**)
(*) Corresponding author: The Econometric Group, Department of Economics, University of Southern Denmark, Odense, Denmark. e-mail: [email protected]
(**) Department of Economic Analysis, University of Zaragoza, Zaragoza, Spain. e-mail: [email protected]
Abstract
The robustness of the results coming from an econometric application depends to a
great extent on the quality of the sampling information. This statement is a general rule
that becomes especially relevant in a spatial context where data usually have lots of
irregularities.
The purpose of this paper is to examine more closely this question paying attention to
the impact of multicollinearity. It is well known that the reliability of estimators (least-
squares or maximum-likelihood) gets worse as the linear relationships between the
regressors become more acute. The main aspect of our work is that we resolve the
discussion in a spatial context, looking closely into the behaviour shown, under several
unfavourable conditions, by the most outstanding misspecification tests when collinear
variables are added to the regression. For this purpose, we plan and solve a Monte Carlo
simulation. The conclusions point to the fact that these statistics react in different ways
to the problems posed.
‡ Acknowledgements: This work has been carried out with the financial support of project SEC 2002-02350 of the Spanish Ministerio de Educación. The authors also wish to thank Ana Angulo for her invaluable and disinterested collaboration.
1- Introduction
The main purpose of this paper is to examine the relationship between quality of the
sampling information and trustworthiness of econometric results in a cross-sectional
setting. We will focus on one point in particular, namely multicollinearity.
Multicollinearity among regressors is an intriguing and common property of data. The
consequences for estimation and inference are well known: unreliable estimation
results; high standard errors; coefficients with wrong signs and implausible magnitudes,
etc. (Belsley et al., 1980). In light of these problems, it is striking to see the relatively
cursory treatment of the problem in the econometric literature. Usually the discussion is
restricted to a few diagnostics, together with some standard suggestions for estimation,
trading an (expected) small increase in bias for an (expected) small reduction of MSE.
This slight treatment is frequently justified by judging the problem irrelevant, using
statements like that of Greene (2003): ‘Suggested “remedies” to multicollinearity
might well amount to attempts to force the theory on the data’.
Indeed, a number of serious attempts to resolve the multicollinearity problem show up
in the literature, but they are generally not included as part of the econometrician’s
toolbox. Such attempts include the three-stage test procedure (Farrar and Glauber, 1967,
Wichers, 1975, Kumar, 1975, and O’Hagan and McCabe, 1975); regularisation methods
(Draper and Van Nostrand, 1979, Hocking, 1983) and factor analysis regression (Scott,
1966, King, 1969, Scott, 1969). A recent attempt by Kosfeld and Lauridsen (2005)
integrates a common factor measurement model in an errors in variables setting and
suggests a feasible factor analysis regression (FAR) estimator which outperforms the
OLS estimator for cases of medium and strong multicollinearity. In-depth treatment of
the multicollinearity problem and tools for detection and remedying are presented by
Belsley et al. (1980) and Chatterjee and Hadi (1988).
The purpose of the present investigation is to address the specific problems caused
when performing misspecification tests in a spatial cross-sectional regression. In section
2 we go deeply into the issues related to multicollinearity by tracing the partial effect on
misspecification tests from an additional variable which is collinear to the remaining
variables, using a partial regression framework established by Chatterjee and Hadi
(1988). As is well known (Chatterjee and Hadi, 1988; Belsley et al., 1980),
multicollinearity affects the least squares estimates but not the least squares residuals,
on which the tests are based; nevertheless, this property is shown to be sensitive in
unpredictable ways to the amount of spatial dependency as well as to misspecification of the
underlying spatial process. While the framework applied also serves well as a tool to
trace the impact of extremal observations (which is equivalent to omitting relevant
variables, i.e. a set of dummies each of which holds the value 1 for an extremal
observation) as well as the impact of joint presence of outliers and multicollinearity, the
present study concentrates on the multicollinearity problem. An analysis of the effects
of extremal observations in spatial cross-sectional regression appears in Mur and
Lauridsen (2005), and an integrative study combining the two aspects is planned
(Lauridsen and Mur, 2005). A simulation study is carried out in the third section
in order to analyse finite-sample size and power effects on tests for spatial dependency
of 1) omission/inclusion of an additional collinear variable, and 2) misspecification of
the underlying spatial process. The paper finishes with a section of conclusions.
2- Multicollinearity in cross-sectional econometric models
Essentially, multicollinearity refers to the successive inclusion of additional variables
that lift the collinearity of the full set of explanatory variables to a ‘harmful’ level. This
is the case if the additional variables 1) correlate closely with one or more linear
combinations of the variables already in the model and 2) contribute relatively little to
the prediction beyond what is provided by the variables already in the model.
Formally, the problem faced can be expressed as

(1)  Y = X_1\beta_1 + X_2\beta_2 + u

where X_1 are the variables included in the model and X_2 a set of k_2 additional
variables to be considered added to the specification. Rewriting X_2 = X_1\gamma + \xi = \tilde{X}_2 + \xi,
where \xi is a set of k_2 error vectors, multicollinearity occurs when the variance of \xi is
relatively small as compared to the variance of X_2.
To trace the impact of adding the additional regressor, two matrices are central: the
prediction matrix, defined as P = X(X'X)^{-1}X', which maps y onto the prediction \hat{y},
i.e. \hat{y} = Py, and the residual matrix M = I - P, which maps y onto the residuals from a
regression on X, i.e. u = My.
The prediction \hat{y} can be thought of as made up of two independent predictions: 1) the
prediction provided by X_1 and 2) the additional prediction provided by the part of X_2
that is independent of X_1, i.e. the residual from a regression of X_2 on X_1, which is
equal to \xi. This can be formalised by partialising the prediction matrix into two
prediction matrices as

(2)  P = P_1 + P_2 = X_1(X_1'X_1)^{-1}X_1' + M_1X_2(X_2'M_1X_2)^{-1}X_2'M_1

so that the prediction is partialised as \hat{y} = Py = P_1y + P_2y = \hat{y}_1 + \hat{y}_2.
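The partition in (2) is easy to check numerically. The sketch below is ours, not the paper's (made-up data, illustrative names); it builds the two prediction matrices and verifies that they add up to the full one.

```python
# Minimal numerical check of the partition P = P1 + P2 (illustrative data).
import numpy as np

rng = np.random.default_rng(0)
R = 50
X1 = np.column_stack([np.ones(R), rng.uniform(size=R)])  # regressors in the model
X2 = rng.normal(size=(R, 1))                             # additional regressor
X = np.hstack([X1, X2])

def hat(Z):
    """Prediction ('hat') matrix of a regressor block Z."""
    return Z @ np.linalg.solve(Z.T @ Z, Z.T)

P = hat(X)                        # full prediction matrix
P1 = hat(X1)                      # prediction from X1 alone
M1 = np.eye(R) - P1               # residual maker for X1
P2 = hat(M1 @ X2)                 # prediction from the part of X2 orthogonal to X1
assert np.allclose(P, P1 + P2)    # the partition P = P1 + P2 holds
```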
Essentially, spatial dependency may be incorporated in (1) in one of two ways. A
dynamic substantive dependency is included by respecifying (1) with a spatially
autoregressive (SAR) term as

(3)  Y = \rho Wy + X_1\beta_1 + X_2\beta_2 + u

while a static residual dependency is included by respecifying (1) as

(4)  Y = X_1\beta_1 + X_2\beta_2 + \nu

which further divides into the spatially autocorrelated (SAC) specification obtained by
letting

(5)  \nu = \rho W\nu + u

or the spatial moving average (SMA) specification obtained by letting

(6)  \nu = u - \rho Wu.

Further, static and substantive dependency may be combined by replacing the residual
of (3) with a residual of the form (5) to obtain a spatially autoregressive – spatially
autocorrelated (SARC) specification, or of the form (6) to obtain a spatially
autoregressive – spatial moving average (SARMA) specification.
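The three data-generating processes in (3)-(6) can be sketched as follows. This is our own illustration (the function name, arguments and the W used are not from the paper), using the reduced forms y = (I - \rho W)^{-1}(X\beta + u) for SAR and \nu = (I - \rho W)^{-1}u for SAC.

```python
import numpy as np

def simulate(W, X, beta, rho, process, rng):
    """Draw one sample of y under the SAR (3), SAC (4)+(5) or SMA (4)+(6) process."""
    R = W.shape[0]
    u = rng.normal(size=R)
    I = np.eye(R)
    if process == "SAR":          # y = rho*W*y + X*beta + u
        return np.linalg.solve(I - rho * W, X @ beta + u)
    if process == "SAC":          # y = X*beta + nu, nu = rho*W*nu + u
        return X @ beta + np.linalg.solve(I - rho * W, u)
    if process == "SMA":          # y = X*beta + nu, nu = u - rho*W*u
        return X @ beta + u - rho * (W @ u)
    raise ValueError(process)

# usage with an arbitrary row-standardised W (purely illustrative)
R = 25
W = np.full((R, R), 1.0 / (R - 1))
np.fill_diagonal(W, 0.0)
X = np.column_stack([np.ones(R), np.random.default_rng(0).uniform(size=R)])
y = simulate(W, X, np.array([10.0, 10.0]), 0.5, "SAC", np.random.default_rng(1))
```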
For the moment we will limit ourselves to evaluating the impact on the misspecification
statistics habitually used in cross-sectional econometric models, that is, on Moran's I,
LM-ERR, LM-EL and KR, which address the problem of spatial dependence in
the error term, together with LM-LAG and LM-LE, whose objective is to
analyse the dynamic structure of the equation. To these we add the SARMA test, whose
alternative hypothesis is composite (dynamic structure in the equation and a moving
average error term). Appendix 1 provides a brief presentation. With respect to our work,
it is important to point out that the seven tests are constructed from the residuals of the
LS estimation. Given that these residuals react in different ways to the presence of
anomalies in the sample, this sensitivity should appear, at least in part, also in the tests.
To establish the impact on the Moran test of adding X_2 to the regression, use that
M = I - (P_1 + P_2) = M_1 - P_2, where M_1 is the residual matrix for the regression of y
on X_1, whereby \hat{u} = My = (M_1 - P_2)y, so that the Moran I test reads as

(7)  I = (R/S_0) \hat{u}'W\hat{u}/\hat{u}'\hat{u} = (R/S_0) y'MWMy/y'My
       = (R/S_0) (y'M_1WM_1y + y'P_2WP_2y - 2y'M_1WP_2y)/(y'M_1y - y'P_2y)
       = (R/S_0) (m_1 + m_2 - m_3)/(m_4 - m_5)
       = (R/S_0) D_1 + (R/S_0) D_2 = I_1 + i_2

where D_1 = m_1/m_4 and D_2 = (m_2m_4 - m_3m_4 + m_1m_5)/(m_4^2 - m_4m_5), with m_1 = y'M_1WM_1y,
m_2 = y'P_2WP_2y, m_3 = 2y'M_1WP_2y, m_4 = y'M_1y, and m_5 = y'P_2y. Thus, I_1 is the
Moran test that emerges when y is regressed on X_1 only, and i_2 is the additional effect
on the test when adding X_2 to the regression. Increasing collinearity implies that all
quadratic and cross-product terms involving \hat{y}_2 = P_2y go toward 0, i.e. m_2, m_3 and
m_5 go to 0, while m_1 and m_4 are left unaffected. This implies that i_2 goes toward 0,
so that the test involving X_1 and X_2 moves toward the test involving X_1 only.
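As a numerical illustration (ours, not part of the paper), the decomposition in (7) can be verified exactly when W is symmetric; for brevity we use a circle of neighbours rather than the rook matrix of section 3, and all names below are our own. With a non-symmetric W the cross term would have to be kept as y'M_1WP_2y + y'P_2WM_1y rather than 2y'M_1WP_2y.

```python
import numpy as np

rng = np.random.default_rng(42)
R = 25
# illustrative symmetric, row-standardised W: two neighbours on a circle
W = np.zeros((R, R))
for r in range(R):
    W[r, (r - 1) % R] = W[r, (r + 1) % R] = 0.5
S0 = W.sum()

x1 = rng.uniform(size=R)
xi = rng.normal(size=R)                # independent part of x2
X1 = np.column_stack([np.ones(R), x1])

def hat(Z):
    return Z @ np.linalg.solve(Z.T @ Z, Z.T)

def moran_parts(c):
    """Moran's I, I1 and i2 for x2 = x1 + c*xi (small c = strong collinearity)."""
    x2 = x1 + c * xi
    y = 10 + 10 * x1 + 10 * x2 + rng.normal(size=R)
    M1 = np.eye(R) - hat(X1)
    P2 = hat(M1 @ x2[:, None])
    m1 = y @ M1 @ W @ M1 @ y
    m2 = y @ P2 @ W @ P2 @ y
    m3 = 2 * y @ M1 @ W @ P2 @ y
    m4 = y @ M1 @ y
    m5 = y @ P2 @ y
    D1 = m1 / m4
    D2 = (m2 * m4 - m3 * m4 + m1 * m5) / (m4**2 - m4 * m5)
    u = (np.eye(R) - hat(np.column_stack([X1, x2]))) @ y   # full-model residuals
    I_direct = (R / S0) * (u @ W @ u) / (u @ u)
    return I_direct, (R / S0) * D1, (R / S0) * D2

for c in (1.0, 1e-3):                  # weak and strong collinearity
    I_direct, I1, i2 = moran_parts(c)
    assert np.isclose(I_direct, I1 + i2)   # decomposition (7) holds exactly
```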
The effect on the expected value of the I test under the null is traced as follows:

(8)  E(I) = (R/S_0) tr(MW)/(R-k) = (R/S_0) tr(M_1W)/(R-k) - (R/S_0) tr(P_2W)/(R-k) = E(I_1) + ei_2

where E(I_1) is the expectation when including only X_1 in the regression, and
ei_2 = -(R/S_0) tr(P_2W)/(R-k) the additional effect, which goes toward 0 for increasing collinearity.
A similar – but admittedly more involved – development for the variance of I provides:

(9)  V(I) = (R/S_0)^2 [tr(MWMW') + tr(MWMW) + {tr(MW)}^2]/[(R-k)(R-k+2)] - {E(I)}^2
          = (R/S_0)^2 [tr(M_1WM_1W') + tr(M_1WM_1W) + {tr(M_1W)}^2]/[(R-k)(R-k+2)] - {E(I_1)}^2
            + (R/S_0)^2 [tr(P_2WP_2W') + tr(P_2WP_2W) + {tr(P_2W)}^2 - tr(M_1WP_2W') - tr(P_2WM_1W')
              - 2tr(P_2WM_1W) - 2tr(M_1W)tr(P_2W)]/[(R-k)(R-k+2)] - 2E(I_1)ei_2 - (ei_2)^2
          = V(I_1) + vi_2

where the traces involving P_2 go to 0 under increasing collinearity, so that vi_2 moves
toward 0.
To conclude, the effect on the Moran test of adding a variable that is collinear to the
variables already included is an additional term, which is small in the sense that its
magnitude moves toward 0 for increasing collinearity. While this result is promising in
itself, its practical applicability is limited for at least two reasons. First, ‘small’ is not
necessarily equivalent to ‘small enough’. It is therefore necessary to establish some
knowledge about the relative importance of the term i_2 (i.e. the effect of omitting X_2)
for the size and power of the test under fairly general circumstances and specific levels
of collinearity. Second, there is an interactive effect of collinearity and the presence of
spatial dependencies. Assuming a fixed level of collinearity, increasing spatial
dependency will change the terms m_1, m_3 and m_4, so that the magnitude of i_2 is
reduced or increased in an unpredictable direction. Put another way, the effect of
omitting X_2, which is well understood for the size of the test or for the power of the
test at a specific level of spatial dependency, may well vary over the range of the
parameter or parameters that define the spatial dependency, and further be sensitive to
misspecification of the spatial process. These questions are resolved by the Monte Carlo
studies in the following section, which investigate the finite-sample effect of a varying
degree of collinearity on the size of the test as well as its power against increasing
spatial dependence.
For the LM-ERR test, a similar development provides:

(10)  LM-ERR = (R^2/T) (\hat{u}'W\hat{u}/\hat{u}'\hat{u})^2 = (R^2/T) [(m_1 + m_2 - m_3)/(m_4 - m_5)]^2
            = (R^2/T) D_1^2 + (R^2/T)(D_2^2 + 2D_1D_2) = LM-ERR_1 + errlm_2

where m_1 to m_5, D_1 and D_2 are defined as above under the treatment of the Moran
test. Thus, LM-ERR_1 is the test when only X_1 is included in the regression, and
errlm_2 the additional effect of adding X_2. Increasing collinearity implies that D_2
moves toward 0, so that errlm_2 moves toward 0. But, again, increasing spatial
dependency and misspecification of the spatial process interact with this effect in an
unpredictable way.
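An analogous numerical check (again ours, with toy data and an illustrative symmetric W): the LM-ERR value computed from the LS residuals as in Appendix 1 coincides with the partitioned form in (10).

```python
import numpy as np

rng = np.random.default_rng(7)
R = 30
W = np.zeros((R, R))
for r in range(R):
    W[r, (r - 1) % R] = W[r, (r + 1) % R] = 0.5   # symmetric, row-standardised
T = np.trace((W.T + W) @ W)

x1 = rng.uniform(size=R)
x2 = x1 + 0.1 * rng.normal(size=R)                # fairly collinear regressor
X1 = np.column_stack([np.ones(R), x1])
y = 10 + 10 * x1 + 10 * x2 + rng.normal(size=R)

def hat(Z):
    return Z @ np.linalg.solve(Z.T @ Z, Z.T)

M1 = np.eye(R) - hat(X1)
P2 = hat(M1 @ x2[:, None])
m1, m4 = y @ M1 @ W @ M1 @ y, y @ M1 @ y
m2, m5 = y @ P2 @ W @ P2 @ y, y @ P2 @ y
m3 = 2 * y @ M1 @ W @ P2 @ y
D1 = m1 / m4
D2 = (m2 * m4 - m3 * m4 + m1 * m5) / (m4**2 - m4 * m5)

u = (np.eye(R) - hat(np.column_stack([X1, x2]))) @ y   # full-model residuals
lm_err_direct = (u @ W @ u / (u @ u / R)) ** 2 / T     # Appendix form (A.2)
lm_err_parts = (R**2 / T) * (D1 + D2) ** 2             # decomposition (10)
assert np.isclose(lm_err_direct, lm_err_parts)
```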
The LM-LAG test can be viewed as a product of two terms, multiplied by a constant:

(11)  LM-LAG = (1/R_j)(\hat{u}'Wy/\hat{\sigma}^2)^2 = R^2 S^2 (1/R_j)

Using that the first term may be partialised as

(12)  S = \hat{u}'Wy/\hat{u}'\hat{u} = y'MWy/y'My = (y'M_1Wy - y'P_2Wy)/(y'M_1y - y'P_2y)
        = (p_1 - p_2)/(p_3 - p_4) = p_1/p_3 + (p_1p_4 - p_2p_3)/(p_3^2 - p_3p_4) = S_1 + s_2

with p_1 = y'M_1Wy, p_2 = y'P_2Wy, p_3 = y'M_1y and p_4 = y'P_2y,
where increasing collinearity will move p_2 and p_4 toward 0, it follows that s_2 moves
toward 0. For the second term, a partialisation of R_j gives

(13)  R_j = T + R \hat{y}'W'MW\hat{y}/y'My
          = T + R [y'P_1W'M_1WP_1y + y'P_1W'M_1WP_2y + y'P_2W'M_1WP_1y + y'P_2W'M_1WP_2y
                   - y'P_1W'P_2WP_1y - y'P_2W'P_2WP_1y - y'P_1W'P_2WP_2y - y'P_2W'P_2WP_2y] / [y'M_1y - y'P_2y]
          = T + R (n_1 + n_2)/(n_3 - n_4)
          = T + R n_1/n_3 + R (n_2n_3 + n_1n_4)/(n_3^2 - n_3n_4) = R_{j1} + r_{j2}

where n_1 = y'P_1W'M_1WP_1y, n_2 equals the sum of the remaining terms in the square
brackets, n_3 = y'M_1y, and n_4 = y'P_2y, so that R_{j1} is the value of R_j obtained when
regressing only on X_1 and r_{j2} the additional effect of adding X_2. For increasing
collinearity, all terms involved in n_2 move toward 0, as does n_4, while n_3 is
unchanged. This implies that r_{j2} moves toward 0. Combining these results,
(14)  LM-LAG = R^2 S^2/R_j = R^2 (S_1 + s_2)^2/(R_{j1} + r_{j2})
             = R^2 S_1^2/R_{j1} + R^2 (2R_{j1}S_1s_2 + R_{j1}s_2^2 - S_1^2 r_{j2})/(R_{j1}^2 + R_{j1}r_{j2})
             = LM-LAG_1 + lmlag_2

where LM-LAG_1 is the test obtained when regressing on X_1 only. For increasing
collinearity, the numerator of lmlag_2 goes toward 0 and the denominator toward the
constant R_{j1}^2, implying that LM-LAG moves toward LM-LAG_1. Thus, in case of
strong collinearity, the tests calculated with and without X_2 will be close to each other.
Again, increasing spatial dependency and misspecification of the spatial process may
work against this conclusion, as they increase the numerator of lmlag_2.
For the LM-EL test, it is found, using the above definitions, that

(15)  LM-EL = (\hat{u}'W\hat{u}/\hat{\sigma}^2 - (T/R_j)\hat{u}'Wy/\hat{\sigma}^2)^2 / (T - T^2/R_j)
            = [R(D_1 + D_2) - (RT/R_j)(S_1 + s_2)]^2 / (T - T^2/R_j)
            = [RD_1 - (RT/R_{j1})S_1]^2 / (T - T^2/R_{j1}) + ellm_2 = LM-EL_1 + ellm_2

where LM-EL_1 (i.e. the term in square brackets, divided by T - T^2/R_{j1}) is the value
of the test when regressing on X_1 only, and ellm_2 is the additional effect of including
X_2 in the regression. The terms collected in ellm_2 each involve at least one of the
factors D_2, s_2, r_{j2} and R_{j1}r_{j2}(1/R_j - 1/R_{j1}), all of which go
toward zero for increasing collinearity, so that the bias of omitting X_2 vanishes. But,
again, the impact of spatial dependency and misspecification of the spatial process may
interact with this effect.
For the LM-LE test, a similar derivation provides

(16)  LM-LE = (\hat{u}'Wy/\hat{\sigma}^2 - \hat{u}'W\hat{u}/\hat{\sigma}^2)^2 / (R_j - T)
            = R^2 [(S_1 + s_2) - (D_1 + D_2)]^2 / (R_{j1} + r_{j2} - T)
            = [R^2 (S_1 - D_1)^2 / (R_{j1} - T)] + lelm_2 = LM-LE_1 + lelm_2

where LM-LE_1 (i.e. the term in square brackets) is the value of the test when
regressing on X_1 only and lelm_2 the additional effect on the test of adding X_2 to the
regression, collecting terms that each involve at least one of D_2, s_2 and r_{j2}. As
increasing collinearity implies that D_2, s_2 and r_{j2} go toward zero, the
bias vanishes, and the LM-LE value converges toward LM-LE_1.
For the SARMA test, which is the sum of the LM-LE and the LM-ERR tests, it
follows from the above derivations that

SARMA = SARMA_1 + sarma_2

where SARMA_1 is the value of the test for a regression on X_1 only and sarma_2 the
additional effect of adding X_2 to the regression, which goes toward zero for increasing
collinearity.
Concluding our investigation of the tests, it generally holds true that the value of a test
calculated from a regression of y on a set of regressors X_1 and an additional regressor
X_2 moves toward the value of the test obtained when regressing y on X_1 only, as
the collinearity between X_2 and X_1 is increased. This leads to an important guideline
for applied research: if the tests calculated with and without the additional variable
included in the regression lead to test values on the same side of the critical value, then
conclusions about spatial effects can, ceteris paribus, be drawn without any further
concern about multicollinearity. It is, however, important to verify to what extent such a
similarity can be expected to occur under general conditions. Resolving this question
implies an investigation of the empirical performance of the size and power functions of
the tests under increasing collinearity and finite sample sizes, as well as their robustness
toward different spatial specifications. Such an investigation is the aim of the Monte
Carlo study in section 3.
3.- Monte Carlo results
Regarding spatial structure, two models are simulated: a static one with a structure of
residual dependence in the form of a SAC or a SMA process, and a substantive
dependence specification in the form of a SAR with a white noise error term. The
models simulated in the static case share a simple design, y = X\beta + \nu, while the model
for the SAR case is expressed as y = \rho Wy + X\beta + u.
For the SAC structure, \nu is defined from \nu = \rho W\nu + u, while it is defined from
\nu = u - \rho Wu for the SMA case. For each structure, a simple model is simulated using
X = [1 x_1] and \beta = (\beta_0, \beta_1)', together with a bivariate model using X = [1 x_1 x_2]
and \beta = (\beta_0, \beta_1, \beta_2)'. The impact of increasing collinearity is studied by defining 1)
x_2 = \xi and 2) x_2 = x_1 + 10^{-i/2}\xi, with i varying over the values 0, 1, 2, 3 and 4, which
implies that the correlation between x_1 and x_2 ranges from 0 to 0.9994. The variable x_1
is obtained from a U(0,1) distribution, while u and \xi are obtained from N(0,1)
distributions. The parameters \beta_j (j = 0, 1, 2) are set to 10. Furthermore, three systems of
regions were defined from the rook contiguity matrix on an r x r board with r selected
as 5, 10 and 15, leading to R being 25, 100 and 225 respectively. The resulting matrix
W has been row-standardised. Only non-negative values of the parameter \rho, ranging
from 0 to 0.99, have been applied, and the number of simulations was fixed to 1000 for each
case.
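The design can be sketched as follows. This is our reconstruction, not the authors' code; in particular, the scaling 10^{-i/2} on \xi is inferred from the correlations reported in Table 1 (with a U(0,1) x_1, a scale of 1 implies a correlation near 0.2774, 0.1 near 0.9449 and 0.01 near 0.9994).

```python
import numpy as np

def rook_W(r):
    """Row-standardised rook contiguity matrix on an r x r board."""
    R = r * r
    W = np.zeros((R, R))
    for a in range(r):
        for b in range(r):
            for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                na, nb = a + da, b + db
                if 0 <= na < r and 0 <= nb < r:
                    W[a * r + b, na * r + nb] = 1.0
    return W / W.sum(axis=1, keepdims=True)

W = rook_W(5)                                   # R = 25 system of regions
assert np.allclose(W.sum(axis=1), 1.0)          # row-standardised

# the collinearity design: x2 = x1 + 10**(-i/2) * xi
rng = np.random.default_rng(0)
n = 100_000                                     # large n, to check the implied correlations
x1 = rng.uniform(size=n)
for i in (0, 2, 4):
    x2 = x1 + 10 ** (-i / 2) * rng.normal(size=n)
    print(f"i={i}: corr = {np.corrcoef(x1, x2)[0, 1]:.4f}")  # near 0.2774, 0.9449, 0.9994
```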
The principal results obtained are gathered in Table 1, dedicated to the empirical size,
and in Figures 1 to 7 which present the estimated power functions of the tests.
The results reflected in Table 1 do not contain surprises. Multicollinearity does not
affect the size of the tests. The estimated values fluctuate, in most cases, around the
theoretical 5.0%, with the KR test as an exception, as the size of the test is lifted by
multicollinearity.
The results corresponding to the estimated power functions are presented in graphic
form in Figures 1 to 7 (details can be obtained from the authors upon request).
Some aspects of these graphs were already well known beforehand. The weakness of
the spatial dependency tests in a context of small samples is one of them. Another is the
worsening of the performance of the tests when a moving average is used in the
alternative hypothesis. Neither the greater reliability of Moran's I for detecting
dependency processes in the error term nor the weakness of the KR test, especially under
SMA structures (Florax and de Graaff, 2004), is anything new. The figures also illustrate
the low level of specificity of the traditional statistics, which react to all types of spatial
dependence. This is the raison d'être of the robust Lagrange Multipliers (LM-EL and
LM-LE).
TABLE 1: Empirical size of the tests for a theoretical significance level of 0.05.(*)

R=25                            Moran's I  LM-ERR  LM-LAG  LM-EL  LM-LE  SARMA  KR
Simple model (x1 only)            0.036     0.040   0.054   0.053  0.056  0.049  0.035
Bivariate, corr=0                 0.052     0.040   0.073   0.041  0.066  0.064  0.089
Bivariate, i=0 (corr=0.2774)      0.049     0.039   0.072   0.039  0.071  0.055  0.097
Bivariate, i=2 (corr=0.9449)      0.055     0.053   0.064   0.049  0.061  0.065  0.069
Bivariate, i=4 (corr=0.9994)      0.053     0.050   0.065   0.056  0.062  0.052  0.067

R=100                           Moran's I  LM-ERR  LM-LAG  LM-EL  LM-LE  SARMA  KR
Simple model (x1 only)            0.051     0.052   0.054   0.062  0.046  0.043  0.019
Bivariate, corr=0                 0.052     0.049   0.056   0.054  0.056  0.060  0.068
Bivariate, i=0 (corr=0.2774)      0.050     0.050   0.040   0.051  0.041  0.043  0.059
Bivariate, i=2 (corr=0.9449)      0.041     0.037   0.047   0.036  0.044  0.046  0.045
Bivariate, i=4 (corr=0.9994)      0.048     0.050   0.055   0.052  0.056  0.056  0.067

R=225                           Moran's I  LM-ERR  LM-LAG  LM-EL  LM-LE  SARMA  KR
Simple model (x1 only)            0.041     0.041   0.052   0.051  0.055  0.044  0.019
Bivariate, corr=0                 0.039     0.037   0.049   0.037  0.050  0.042  0.046
Bivariate, i=0 (corr=0.2774)      0.053     0.048   0.040   0.049  0.041  0.048  0.068
Bivariate, i=2 (corr=0.9449)      0.055     0.049   0.051   0.049  0.048  0.048  0.047
Bivariate, i=4 (corr=0.9994)      0.035     0.038   0.050   0.038  0.047  0.046  0.049

(*) The 95% confidence interval for p (the probability of rejecting the null hypothesis), around the theoretical
value for the case of 1000 replications, is 0.036 < p < 0.064.
Figures 1-2 show that the Moran’s I and LM-ERR tests are unaffected by
multicollinearity. This holds for all sample sizes and for each DGP.
The situation in the case of the LM-LAG test seems intriguing. In Figure 3 it is seen that
the power function moves downward under residual dependence as we increase
multicollinearity among the regressors; this is clear once the dependency parameter
exceeds 0.2. On the other hand, the power function reacts upward to multicollinearity
when the DGP includes substantive dependence. In this case, multicollinearity improves
the performance of the LM-LAG test.
For the LM-EL test, it is noticed from Figure 4 that the power function in the case of
substantive dependency is absolutely flat, confirming that the test discriminates correctly
between residual and substantive dependence. In the case of residual dependence, the
LM-EL test is slightly lifted by multicollinearity.
Turning to the LM-LE test, Figure 5 shows that the power function in the cases of
residual dependence is absolutely flat, confirming that the test discriminates correctly between
residual and substantive dependence. Regarding multicollinearity, the LM-LE test is
seen to react like the LM-LAG test in the case of substantive dependency, i.e.
increasing multicollinearity improves the performance of the test.
Regarding the SARMA test, it is noticed from Figure 6 that the power function is
unaffected by multicollinearity when residual dependence is present but, as for the LM-
LAG and the LM-LE tests, the power function moves upward in the case of substantive
dependence.
Finally, for the KR test, Figure 7 shows a general tendency of the power function to
increase with collinearity. This tendency is only marginal when residual
dependence is present, and somewhat larger in the case of substantive dependency.
The effects of multicollinearity on the size and power of the tests are summarised in Table
2. It is readily concluded that multicollinearity only impacts the size of the KR test.
Further, the power of the Moran's I and LM-ERR tests is unaffected by
multicollinearity, while it is lifted for the KR test. For the robust LM-EL test the power
increases slightly, both under SAC and SMA processes. Next, all tests for substantive
dependence have their power functions increased by multicollinearity in the case where
the DGP includes a lag of the endogenous variable on the right-hand side of the equation.
Table 2. Effects of multicollinearity on tests for omitted spatial effects(**).

             Effect on size                     Effect on power
             Substantive  Residual dep.        Substantive  Residual dep.
             SAR          SAC      SMA         SAR          SAC      SMA
Moran's I    ↔            ↔        ↔           ↔            ↔        ↔
LM-ERR       ↔            ↔        ↔           ↔            ↔        ↔
LM-EL        ↔            ↔        ↔           ↔            ↑        ↑
KR           ↑            ↑        ↑           ↑            ↑        ↑
LM-LAG       ↔            ↔        ↔           ↑            ↓        ↓
LM-LE        ↔            ↔        ↔           ↑            ↔        ↔
SARMA        ↔            ↔        ↔           ↑            ↔        ↔

(**) ↑, ↓ and ↔ indicate that the size or power function is increased/decreased/unaffected by increasing collinearity among the regressors.
4.- Conclusions and final remarks
The objective of this paper was to examine the influence of multicollinearity
on the misspecification tests most often used in the context of spatial
econometric modelling. It was shown that the additional effects on the tests from adding an
additional variable generally vanish for increasing multicollinearity, but that this feature
interacts with spatial dependence in an unpredictable manner.
The simulation carried out has served to corroborate some hypotheses. Multicollinearity
does not impact the size of the tests for omitted spatial effects. This holds true
irrespective of the sample size and the characteristics of the DGP, with only slight
deviations for the KR test. Further, the power of the unadjusted tests for residual
dependence (i.e. the Moran's I, LM-ERR and KR tests) is largely unaffected by
multicollinearity for any sample size and DGP, with slight exceptions for the KR test.
But for the robust one, the LM-EL test, the power increases with multicollinearity.
Next, all tests for an omitted lagged endogenous variable have their power functions
increased by multicollinearity in the case where there is, actually, substantive
dependence in the equation.
Finally, we wish to stress that this paper is nothing more than a first approximation to
the problem of multicollinearity in cross-sectional econometric models. We have
analysed a limited number of combinations, from which we have reached a few
conclusions. Nevertheless, the cases that remain to be studied (including outliers and/or
more complex collinear patterns) seem even more interesting.
APPENDIX 1: Misspecification tests used.
The tests always refer to the model of the null hypothesis, which is of the static
type: y = X\beta + u. This model has been estimated by LS, where \hat{\sigma}^2 and \hat{\beta} are the
corresponding LS estimates and \hat{u} the residual series. The tests are the following (see
Anselin and Florax, 1995, or Florax and de Graaff, 2004, for details):
(A.1) Moran's I test: I = (R/S_0) \hat{u}'W\hat{u}/\hat{u}'\hat{u}; S_0 = \sum_{r=1}^{R} \sum_{s=1}^{R} w_{rs}

(A.2) LM-ERR test: LM-ERR = (1/T_1)(\hat{u}'W\hat{u}/\hat{\sigma}^2)^2; T_1 = tr[(W' + W)W]

(A.3) LM-EL test: LM-EL = (\hat{u}'W\hat{u}/\hat{\sigma}^2 - T_1 R_{\hat{j}}^{-1} \hat{u}'Wy/\hat{\sigma}^2)^2 / (T_1 - T_1^2 R_{\hat{j}}^{-1})

(A.4) KR test: KR = h_R \hat{\gamma}'Z'Z\hat{\gamma}/\hat{e}'\hat{e}

(A.5) LM-LAG test: LM-LAG = (1/R_{\hat{j}})(\hat{u}'Wy/\hat{\sigma}^2)^2

(A.6) LM-LE test: LM-LE = (\hat{u}'Wy/\hat{\sigma}^2 - \hat{u}'W\hat{u}/\hat{\sigma}^2)^2 / (R_{\hat{j}} - T_1)

(A.7) SARMA test: SARMA = (\hat{u}'Wy/\hat{\sigma}^2 - \hat{u}'W\hat{u}/\hat{\sigma}^2)^2 / (R_{\hat{j}} - T_1) + (1/T_1)(\hat{u}'W\hat{u}/\hat{\sigma}^2)^2

Moreover, R_{\hat{j}} = T_1 + \hat{\beta}'X'W'MWX\hat{\beta}/\hat{\sigma}^2 and M = I - X(X'X)^{-1}X'. Furthermore, \hat{e} is the
vector of residuals from the auxiliary regression of the Kelejian-Robinson (KR) test, of
order h_R x 1, Z is the matrix of exogenous variables included in that regression and
\hat{\gamma} the estimated coefficients of the corresponding vector of parameters.
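Read together, these formulas can be implemented in a few lines. The sketch below is ours (W and the data are purely illustrative); it computes the statistics for a toy static model and checks that the SARMA statistic is the sum of the LM-LE and LM-ERR statistics, as noted in section 2.

```python
import numpy as np

rng = np.random.default_rng(3)
R = 36
W = np.zeros((R, R))                       # illustrative W: neighbours on a circle
for r in range(R):
    W[r, (r - 1) % R] = W[r, (r + 1) % R] = 0.5
S0 = W.sum()
T1 = np.trace((W.T + W) @ W)

X = np.column_stack([np.ones(R), rng.uniform(size=R)])
y = X @ np.array([10.0, 10.0]) + rng.normal(size=R)

beta = np.linalg.solve(X.T @ X, X.T @ y)   # LS estimate
u = y - X @ beta                           # LS residuals
s2 = u @ u / R                             # sigma^2 hat
M = np.eye(R) - X @ np.linalg.solve(X.T @ X, X.T)
RJ = T1 + (W @ X @ beta) @ M @ (W @ X @ beta) / s2   # R_j-hat

moran = (R / S0) * (u @ W @ u) / (u @ u)                               # (A.1)
lm_err = (u @ W @ u / s2) ** 2 / T1                                    # (A.2)
lm_el = (u @ W @ u / s2 - (T1 / RJ) * (u @ W @ y) / s2) ** 2 \
        / (T1 - T1 ** 2 / RJ)                                          # (A.3)
lm_lag = (u @ W @ y / s2) ** 2 / RJ                                    # (A.5)
lm_le = (u @ W @ y / s2 - u @ W @ u / s2) ** 2 / (RJ - T1)             # (A.6)
sarma = (u @ W @ y / s2 - u @ W @ u / s2) ** 2 / (RJ - T1) \
        + (u @ W @ u / s2) ** 2 / T1                                   # (A.7)
assert np.isclose(sarma, lm_le + lm_err)   # SARMA is the sum of LM-LE and LM-ERR
```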
As is well known, the asymptotic distribution of the standardised Moran's I, obtained
as (I - E(I))/\sqrt{V(I)}, with E(I) = (R/S_0) tr(MW)/(R-k) and

V(I) = (R/S_0)^2 [tr(MWMW') + tr(MWMW) + {tr(MW)}^2]/[(R-k)(R-k+2)] - {E(I)}^2,

is an N(0,1); the two Lagrange Multipliers that follow, LM-ERR and LM-EL,
have an asymptotic \chi^2(1); the distribution of the KR test is a \chi^2(m), with m being
the number of regressors included in the auxiliary regression. The three final tests also
have a chi-square distribution, with one degree of freedom in the first two, and two
degrees of freedom in the SARMA test.
References
Anselin L and R Florax (1995): Small Sample Properties of Tests for Spatial
Dependence in Regression Models. In L. Anselin and R. Florax (eds.): New
Directions in Spatial Econometrics (pp. 21-74). Berlin: Springer.
Belsley D, Kuh E and Welsch R (1980): Regression Diagnostics: Identifying Influential
Data and Sources of Collinearity. New York: Wiley.
Chatterjee S and A Hadi (1988): Sensitivity Analysis in Linear Regression. New York:
Wiley.
Draper N and R Van Nostrand (1979): Ridge Regression and James-Stein Estimation:
Review and Comments. Technometrics, 21, 451-65.
Farrar D and R Glauber (1967): Multicollinearity in Regression Analysis: The Problem
Revisited. Review of Economics and Statistics 49, 92-107.
Florax R and T de Graaff (2004, forthcoming): The Performance of Diagnostic Tests
for Spatial Dependence in Linear Regression Models: A Meta-Analysis of
Simulation Studies. In L. Anselin, R. Florax and S. Rey (eds.): Advances in Spatial
Econometrics: Methodology, Tools and Applications. Berlin: Springer.
Greene W (2003): Econometric Analysis, Fifth Edition. New Jersey: Prentice Hall.
Hocking R (1983): Developments in Linear Regression Methodology: 1959-1982.
Technometrics, 25, 219-30.
King B (1969): Comments on “Factor analysis and Regression”. Econometrica, 37, 538-
40.
Kosfeld R and J Lauridsen (2005): Factor Analysis Regression. University of Kassel,
Department of Economics, Working Paper.
Kumar T (1975): Multicollinearity in Regression Analysis. Review of Economics and
Statistics, 57, 365-6.
Lauridsen J and Mur J (2005): Multicollinearity and Outliers in Cross-Sectional
Analysis. University of Southern Denmark, Department of Economics, Working
Paper.
Mur J and Lauridsen J (2005): Outliers in Cross-Sectional Analysis. University of
Zaragoza, Department of Economic Analysis, Working Paper.
O’Hagan J and B McCabe (1975): Tests for the Severity of Multicollinearity in
Regression Analysis: A Comment. Review of Economics and Statistics, 57, 368-
70.
Scott J (1966): Factor Analysis and Regression. Econometrica, 34, 552-62.
Scott J (1969): Factor analysis and Regression Revisited. Econometrica, 37, 719.
Wichers C (1975): The Detection of Multicollinearity: A Comment. Review of
Economics and Statistics, 57, 366-8.
Note. i=NC: corr(X1,X2)=0; i=NA: X1 included only; i=0: corr(X1,X2)=0.2774; i=2: corr(X1,X2)=0.9449; i=4: corr(X1,X2)=0.9994.
Figure 1. Sizes and powers of the Moran I test.
Note. See Figure 1.
Figure 2. Sizes and powers of the LM-ERR test.
Note. See Figure 1.
Figure 3. Sizes and powers of the LM-LAG test.
Note. See Figure 1.
Figure 4. Sizes and powers of the LM-EL test.
Note. See Figure 1.
Figure 5. Sizes and powers of the LM-LE test.
Note. See Figure 1.
Figure 6. Sizes and powers of the SARMA test.
Note. See Figure 1.
Figure 7. Sizes and powers of the KR test.