Multicollinearity in Cross-Sectional Regressions‡
Jørgen Lauridsen (*), Jesús Mur (**)
(*) Corresponding author: The Econometric Group, Department of Economics, University of Southern Denmark, Odense, Denmark. e-mail: [email protected]
(**) Department of Economic Analysis, University of Zaragoza, Zaragoza, Spain. e-mail: [email protected]
Abstract
The robustness of the results coming from an econometric application depends to a
great extent on the quality of the sampling information. This statement is a general rule
that becomes especially relevant in a spatial context where data usually have lots of
irregularities.
The purpose of this paper is to examine more closely this question paying attention to
the impact of multicollinearity. It is well known that the reliability of estimators (least-
squares or maximum-likelihood) gets worse as the linear relationships between the
regressors become more acute. The main aspect of our work is that we resolve the
discussion in a spatial context, looking closely into the behaviour shown, under several
unfavourable conditions, by the most outstanding misspecification tests when collinear
variables are added to the regression. For this purpose, we plan and solve a Monte Carlo
simulation. The conclusions point to the fact that these statistics react in different ways
to the problems posed.
‡ Acknowledgements: This work has been carried out with the financial support of project SEC 2002-02350 of the Spanish Ministerio de Educación. The authors also wish to thank Ana Angulo for her invaluable and disinterested collaboration.
1- Introduction
The main purpose of this paper is to examine the relationship between quality of the
sampling information and trustworthiness of econometric results in a cross-sectional
setting. We will focus on one point in particular, namely multicollinearity.
Multicollinearity among regressors is an intriguing and common property of data. The
consequences for estimation and inference are well known: unreliable estimation
results; high standard errors; coefficients with wrong signs and implausible magnitudes,
etc. (Belsley et al., 1980). In light of these problems, it is striking to see the relatively
cursory treatment of the problem in the econometric literature. Usually the discussion is
restricted to a few diagnostics, together with some standard suggestions for estimation,
trading an (expected) small increase in bias for an (expected) small reduction of MSE.
This slight treatment is frequently justified by judging the problem irrelevant, using
statements like that of Greene (2003): ‘Suggested “remedies” to multicollinearity
might well amount to attempts to force the theory on the data’.
Indeed, a number of serious attempts to resolve the multicollinearity problem show up
in the literature, but they are generally not included as part of the econometrician’s
toolbox. Such attempts include the three-stage test procedure (Farrar and Glauber, 1967,
Wichers, 1975, Kumar, 1975, and O’Hagan and McCabe, 1975); regularisation methods
(Draper and Van Nostrand, 1979, Hocking, 1983) and factor analysis regression (Scott,
1966, King, 1969, Scott, 1969). A recent attempt by Kosfeld and Lauridsen (2005)
integrates a common factor measurement model in an errors in variables setting and
suggests a feasible factor analysis regression (FAR) estimator which outperforms the
OLS estimator for cases of medium and strong multicollinearity. In-depth treatment of
the multicollinearity problem and tools for detection and remedying are presented by
Belsley et al. (1980) and Chatterjee and Hadi (1988).
The purpose of the present investigation is to address the specific problems caused
when performing misspecification tests in a spatial cross-sectional regression. In section
2 we go deeply into the issues related to multicollinearity by tracing the partial effect on
misspecification tests from an additional variable which is collinear to the remaining
variables, using a partial regression framework established by Chatterjee and Hadi
(1988). As is well known (Chatterjee and Hadi, 1988; Belsley et al., 1980),
multicollinearity affects the least squares estimates but not the least squares residuals,
on which the tests are based; nevertheless, this property is shown to be sensitive in
unpredictable ways to the amount of spatial dependency as well as to misspecification of the
underlying spatial process. While the framework applied also serves well as a tool to
trace the impact of extremal observations (which is equivalent to omitting relevant
variables, i.e. a set of dummies each of which holds the value 1 for an extremal
observation) as well as the impact of joint presence of outliers and multicollinearity, the
present study concentrates on the multicollinearity problem. An analysis of the effects
of extremal observations in spatial cross-sectional regression appears in Mur and
Lauridsen (2005), and an integrative study combining the two aspects is planned
(Lauridsen and Mur, 2005). A simulation study is carried out in the third section
in order to analyse finite-sample size and power effects on tests for spatial dependency
of 1) omission/inclusion of an additional collinear variable, and 2) misspecification of
the underlying spatial process. The paper finishes with a section of conclusions.
2- Multicollinearity in cross-sectional econometric models
Essentially, multicollinearity refers to the successive inclusion of additional variables
that lift the collinearity of the full set of explanatory variables to a ‘harmful’ level. This
is the case if the additional variables 1) correlate closely with one or more linear
combinations of the variables already in the model and 2) contribute relatively little to
the prediction beyond what is provided by the variables already in the model.
Formally, the problem faced can be expressed as

(1)  Y = X_1\beta_1 + X_2\beta_2 + u

where X_1 are the variables included in the model and X_2 a set of k_2 additional
variables to be considered added to the specification. Rewriting X_2 = X_1\gamma + \xi = \tilde{X}_2 + \xi,
where \xi is a set of k_2 error vectors, multicollinearity occurs when the variance of \xi is
relatively small as compared to the variance of X_2.
To trace the impact of adding the additional regressor, two matrices are central: the
prediction matrix, defined as P = X(X'X)^{-1}X', which maps y onto the prediction \hat{y},
i.e. \hat{y} = Py, and the residual matrix M = I - P, which maps y onto the residuals from a
regression on X, i.e. u = My.
The prediction \hat{y} can be thought of as made up of two independent predictions: 1) the
prediction provided by X_1 and 2) the additional prediction provided by the part of X_2
that is independent of X_1, i.e. the residual from a regression of X_2 on X_1, which is
equal to \xi. This can be formalised by partialising the prediction matrix into two
prediction matrices as

(2)  P = P_1 + P_2 = X_1(X_1'X_1)^{-1}X_1' + M_1X_2(X_2'M_1X_2)^{-1}X_2'M_1

so that the prediction is partialised as \hat{y} = Py = P_1y + P_2y = \hat{y}_1 + \hat{y}_2.
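The partition in (2) is easy to check numerically. The sketch below is ours, not the paper's (made-up data, illustrative names); it builds the two prediction matrices and verifies that they add up to the full one.

```python
# Minimal numerical check of the partition P = P1 + P2 (illustrative data).
import numpy as np

rng = np.random.default_rng(0)
R = 50
X1 = np.column_stack([np.ones(R), rng.uniform(size=R)])  # regressors in the model
X2 = rng.normal(size=(R, 1))                             # additional regressor
X = np.hstack([X1, X2])

def hat(Z):
    """Prediction ('hat') matrix of a regressor block Z."""
    return Z @ np.linalg.solve(Z.T @ Z, Z.T)

P = hat(X)                        # full prediction matrix
P1 = hat(X1)                      # prediction from X1 alone
M1 = np.eye(R) - P1               # residual maker for X1
P2 = hat(M1 @ X2)                 # prediction from the part of X2 orthogonal to X1
assert np.allclose(P, P1 + P2)    # the partition P = P1 + P2 holds
```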
Essentially, spatial dependency may be incorporated in (1) in one of two ways. A
dynamic substantive dependency is included by respecifying (1) with a spatially
autoregressive (SAR) term as

(3)  Y = \rho Wy + X_1\beta_1 + X_2\beta_2 + u

while a static residual dependency is included by respecifying (1) as

(4)  Y = X_1\beta_1 + X_2\beta_2 + \nu

which further divides into the spatially autocorrelated (SAC) specification obtained by
letting

(5)  \nu = \rho W\nu + u

or the spatial moving average (SMA) specification obtained by letting

(6)  \nu = u - \rho Wu.

Further, static and substantive dependency may be combined by replacing the residual
of (3) with a residual of the form (5) to obtain a spatially autoregressive – spatially
autocorrelated (SARC) specification, or of the form (6) to obtain a spatially
autoregressive – spatial moving average (SARMA) specification.
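The three data-generating processes in (3)-(6) can be sketched as follows. This is our own illustration (the function name, arguments and the W used are not from the paper), using the reduced forms y = (I - \rho W)^{-1}(X\beta + u) for SAR and \nu = (I - \rho W)^{-1}u for SAC.

```python
import numpy as np

def simulate(W, X, beta, rho, process, rng):
    """Draw one sample of y under the SAR (3), SAC (4)+(5) or SMA (4)+(6) process."""
    R = W.shape[0]
    u = rng.normal(size=R)
    I = np.eye(R)
    if process == "SAR":          # y = rho*W*y + X*beta + u
        return np.linalg.solve(I - rho * W, X @ beta + u)
    if process == "SAC":          # y = X*beta + nu, nu = rho*W*nu + u
        return X @ beta + np.linalg.solve(I - rho * W, u)
    if process == "SMA":          # y = X*beta + nu, nu = u - rho*W*u
        return X @ beta + u - rho * (W @ u)
    raise ValueError(process)

# usage with an arbitrary row-standardised W (purely illustrative)
R = 25
W = np.full((R, R), 1.0 / (R - 1))
np.fill_diagonal(W, 0.0)
X = np.column_stack([np.ones(R), np.random.default_rng(0).uniform(size=R)])
y = simulate(W, X, np.array([10.0, 10.0]), 0.5, "SAC", np.random.default_rng(1))
```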
For the moment we will limit ourselves to evaluating the impact on the misspecification
statistics habitually used in cross-sectional econometric models, that is, on Moran's I,
LM-ERR, LM-EL and KR, which address the problem of spatial dependence in
the error term, together with LM-LAG and LM-LE, whose objective is to
analyse the dynamic structure of the equation. To these we add the SARMA test, whose
alternative hypothesis is composite (dynamic structure in the equation and a moving
average error term). Appendix 1 provides a brief presentation. With respect to our work,
it is important to point out that the seven tests are constructed from the residuals of the
LS estimation. Given that these residuals react in different ways to the presence of
anomalies in the sample, this sensitivity should appear, at least in part, also in the tests.
To establish the impact on the Moran test of adding X_2 to the regression, use that
M = I - (P_1 + P_2) = M_1 - P_2, where M_1 is the residual matrix for the regression of y
on X_1, whereby \hat{u} = My = (M_1 - P_2)y, so that the Moran I test reads as

(7)  I = (R/S_0) \hat{u}'W\hat{u}/\hat{u}'\hat{u} = (R/S_0) y'MWMy/y'My
       = (R/S_0) (y'M_1WM_1y + y'P_2WP_2y - 2y'M_1WP_2y)/(y'M_1y - y'P_2y)
       = (R/S_0) (m_1 + m_2 - m_3)/(m_4 - m_5)
       = (R/S_0) D_1 + (R/S_0) D_2 = I_1 + i_2

where D_1 = m_1/m_4 and D_2 = (m_2m_4 - m_3m_4 + m_1m_5)/(m_4^2 - m_4m_5), with m_1 = y'M_1WM_1y,
m_2 = y'P_2WP_2y, m_3 = 2y'M_1WP_2y, m_4 = y'M_1y, and m_5 = y'P_2y. Thus, I_1 is the
Moran test that emerges when y is regressed on X_1 only, and i_2 is the additional effect
on the test when adding X_2 to the regression. Increasing collinearity implies that all
quadratic and cross-product terms involving \hat{y}_2 = P_2y go toward 0, i.e. m_2, m_3 and
m_5 go to 0, while m_1 and m_4 are left unaffected. This implies that i_2 goes toward 0,
so that the test involving X_1 and X_2 moves toward the test involving X_1 only.
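As a numerical illustration (ours, not part of the paper), the decomposition in (7) can be verified exactly when W is symmetric; for brevity we use a circle of neighbours rather than the rook matrix of section 3, and all names below are our own. With a non-symmetric W the cross term would have to be kept as y'M_1WP_2y + y'P_2WM_1y rather than 2y'M_1WP_2y.

```python
import numpy as np

rng = np.random.default_rng(42)
R = 25
# illustrative symmetric, row-standardised W: two neighbours on a circle
W = np.zeros((R, R))
for r in range(R):
    W[r, (r - 1) % R] = W[r, (r + 1) % R] = 0.5
S0 = W.sum()

x1 = rng.uniform(size=R)
xi = rng.normal(size=R)                # independent part of x2
X1 = np.column_stack([np.ones(R), x1])

def hat(Z):
    return Z @ np.linalg.solve(Z.T @ Z, Z.T)

def moran_parts(c):
    """Moran's I, I1 and i2 for x2 = x1 + c*xi (small c = strong collinearity)."""
    x2 = x1 + c * xi
    y = 10 + 10 * x1 + 10 * x2 + rng.normal(size=R)
    M1 = np.eye(R) - hat(X1)
    P2 = hat(M1 @ x2[:, None])
    m1 = y @ M1 @ W @ M1 @ y
    m2 = y @ P2 @ W @ P2 @ y
    m3 = 2 * y @ M1 @ W @ P2 @ y
    m4 = y @ M1 @ y
    m5 = y @ P2 @ y
    D1 = m1 / m4
    D2 = (m2 * m4 - m3 * m4 + m1 * m5) / (m4**2 - m4 * m5)
    u = (np.eye(R) - hat(np.column_stack([X1, x2]))) @ y   # full-model residuals
    I_direct = (R / S0) * (u @ W @ u) / (u @ u)
    return I_direct, (R / S0) * D1, (R / S0) * D2

for c in (1.0, 1e-3):                  # weak and strong collinearity
    I_direct, I1, i2 = moran_parts(c)
    assert np.isclose(I_direct, I1 + i2)   # decomposition (7) holds exactly
```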
The effect on the expected value of the I test under the null is traced as follows:

(8)  E(I) = (R/S_0) tr(MW)/(R-k) = (R/S_0) tr(M_1W)/(R-k) - (R/S_0) tr(P_2W)/(R-k) = E(I_1) + ei_2

where E(I_1) is the expectation when including only X_1 in the regression, and
ei_2 = -(R/S_0) tr(P_2W)/(R-k) the additional effect, which goes toward 0 for increasing collinearity.
A similar – but admittedly more involved – development for the variance of I provides:

(9)  V(I) = (R/S_0)^2 [tr(MWMW') + tr(MWMW) + {tr(MW)}^2]/[(R-k)(R-k+2)] - {E(I)}^2
          = (R/S_0)^2 [tr(M_1WM_1W') + tr(M_1WM_1W) + {tr(M_1W)}^2]/[(R-k)(R-k+2)] - {E(I_1)}^2
            + (R/S_0)^2 [tr(P_2WP_2W') + tr(P_2WP_2W) + {tr(P_2W)}^2 - tr(M_1WP_2W') - tr(P_2WM_1W')
              - 2tr(P_2WM_1W) - 2tr(M_1W)tr(P_2W)]/[(R-k)(R-k+2)] - 2E(I_1)ei_2 - (ei_2)^2
          = V(I_1) + vi_2

where the traces involving P_2 go to 0 under increasing collinearity, so that vi_2 moves
toward 0.
To conclude, the effect on the Moran test of adding a variable that is collinear to the
variables already included is an additional term, which is small in the sense that its
magnitude moves toward 0 for increasing collinearity. While this result is promising in
itself, its practical applicability is limited for at least two reasons. First, ‘small’ is not
necessarily equivalent to ‘small enough’. It is therefore necessary to establish some
knowledge about the relative importance of the term i_2 (i.e. the effect of omitting X_2)
for the size and power of the test under fairly general circumstances and specific levels
of collinearity. Second, there is an interactive effect of collinearity and the presence of
spatial dependencies. Assuming a fixed level of collinearity, increasing spatial
dependency will change the terms m_1, m_3 and m_4, so that the magnitude of i_2 is
reduced or increased in an unpredictable direction. Put another way, the effect of
omitting X_2, which is well understood for the size of the test or for the power of the
test at a specific level of spatial dependency, may well vary over the range of the
parameter or parameters that define the spatial dependency, and further be sensitive to
misspecification of the spatial process. These questions are resolved by the Monte Carlo
studies in the following section, which investigate the finite-sample effect of a varying
degree of collinearity on the size of the test as well as its power against increasing
spatial dependence.
For the LM-ERR test, a similar development provides:

(10)  LM-ERR = (R^2/T) (\hat{u}'W\hat{u}/\hat{u}'\hat{u})^2 = (R^2/T) [(m_1 + m_2 - m_3)/(m_4 - m_5)]^2
            = (R^2/T) D_1^2 + (R^2/T)(D_2^2 + 2D_1D_2) = LM-ERR_1 + errlm_2

where m_1 to m_5, D_1 and D_2 are defined as above under the treatment of the Moran
test. Thus, LM-ERR_1 is the test when only X_1 is included in the regression, and
errlm_2 the additional effect of adding X_2. Increasing collinearity implies that D_2
moves toward 0, so that errlm_2 moves toward 0. But, again, increasing spatial
dependency and misspecification of the spatial process interact with this effect in an
unpredictable way.
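An analogous numerical check (again ours, with toy data and an illustrative symmetric W): the LM-ERR value computed from the LS residuals as in Appendix 1 coincides with the partitioned form in (10).

```python
import numpy as np

rng = np.random.default_rng(7)
R = 30
W = np.zeros((R, R))
for r in range(R):
    W[r, (r - 1) % R] = W[r, (r + 1) % R] = 0.5   # symmetric, row-standardised
T = np.trace((W.T + W) @ W)

x1 = rng.uniform(size=R)
x2 = x1 + 0.1 * rng.normal(size=R)                # fairly collinear regressor
X1 = np.column_stack([np.ones(R), x1])
y = 10 + 10 * x1 + 10 * x2 + rng.normal(size=R)

def hat(Z):
    return Z @ np.linalg.solve(Z.T @ Z, Z.T)

M1 = np.eye(R) - hat(X1)
P2 = hat(M1 @ x2[:, None])
m1, m4 = y @ M1 @ W @ M1 @ y, y @ M1 @ y
m2, m5 = y @ P2 @ W @ P2 @ y, y @ P2 @ y
m3 = 2 * y @ M1 @ W @ P2 @ y
D1 = m1 / m4
D2 = (m2 * m4 - m3 * m4 + m1 * m5) / (m4**2 - m4 * m5)

u = (np.eye(R) - hat(np.column_stack([X1, x2]))) @ y   # full-model residuals
lm_err_direct = (u @ W @ u / (u @ u / R)) ** 2 / T     # Appendix form (A.2)
lm_err_parts = (R**2 / T) * (D1 + D2) ** 2             # decomposition (10)
assert np.isclose(lm_err_direct, lm_err_parts)
```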
The LM-LAG test can be viewed as a product of two terms, multiplied by a constant:

(11)  LM-LAG = (1/R_j)(\hat{u}'Wy/\hat{\sigma}^2)^2 = R^2 S^2 (1/R_j)

Using that the first term may be partialised as

(12)  S = \hat{u}'Wy/\hat{u}'\hat{u} = y'MWy/y'My = (y'M_1Wy - y'P_2Wy)/(y'M_1y - y'P_2y)
        = (p_1 - p_2)/(p_3 - p_4) = p_1/p_3 + (p_1p_4 - p_2p_3)/(p_3^2 - p_3p_4) = S_1 + s_2

with p_1 = y'M_1Wy, p_2 = y'P_2Wy, p_3 = y'M_1y and p_4 = y'P_2y,
where increasing collinearity will move p_2 and p_4 toward 0, it follows that s_2 moves
toward 0. For the second term, a partialisation of R_j gives

(13)  R_j = T + R \hat{y}'W'MW\hat{y}/y'My
          = T + R [y'P_1W'M_1WP_1y + y'P_1W'M_1WP_2y + y'P_2W'M_1WP_1y + y'P_2W'M_1WP_2y
                   - y'P_1W'P_2WP_1y - y'P_2W'P_2WP_1y - y'P_1W'P_2WP_2y - y'P_2W'P_2WP_2y] / [y'M_1y - y'P_2y]
          = T + R (n_1 + n_2)/(n_3 - n_4)
          = T + R n_1/n_3 + R (n_2n_3 + n_1n_4)/(n_3^2 - n_3n_4) = R_{j1} + r_{j2}

where n_1 = y'P_1W'M_1WP_1y, n_2 equals the sum of the remaining terms in the square
brackets, n_3 = y'M_1y, and n_4 = y'P_2y, so that R_{j1} is the value of R_j obtained when
regressing only on X_1 and r_{j2} the additional effect of adding X_2. For increasing
collinearity, all terms involved in n_2 move toward 0, as does n_4, while n_3 is
unchanged. This implies that r_{j2} moves toward 0. Combining these results,
(14)  LM-LAG = R^2 S^2/R_j = R^2 (S_1 + s_2)^2/(R_{j1} + r_{j2})
             = R^2 S_1^2/R_{j1} + R^2 (2R_{j1}S_1s_2 + R_{j1}s_2^2 - S_1^2 r_{j2})/(R_{j1}^2 + R_{j1}r_{j2})
             = LM-LAG_1 + lmlag_2

where LM-LAG_1 is the test obtained when regressing on X_1 only. For increasing
collinearity, the numerator of lmlag_2 goes toward 0 and the denominator toward the
constant R_{j1}^2, implying that LM-LAG moves toward LM-LAG_1. Thus, in case of
strong collinearity, the tests calculated with and without X_2 will be close to each other.
Again, increasing spatial dependency and misspecification of the spatial process may
work against this conclusion, as they increase the numerator of lmlag_2.
For the LM-EL test, it is found, using the above definitions, that

(15)  LM-EL = (\hat{u}'W\hat{u}/\hat{\sigma}^2 - (T/R_j)\hat{u}'Wy/\hat{\sigma}^2)^2 / (T - T^2/R_j)
            = [R(D_1 + D_2) - (RT/R_j)(S_1 + s_2)]^2 / (T - T^2/R_j)
            = [RD_1 - (RT/R_{j1})S_1]^2 / (T - T^2/R_{j1}) + ellm_2 = LM-EL_1 + ellm_2

where LM-EL_1 (i.e. the term in square brackets, divided by T - T^2/R_{j1}) is the value
of the test when regressing on X_1 only, and ellm_2 is the additional effect of including
X_2 in the regression. The terms collected in ellm_2 each involve at least one of the
factors D_2, s_2, r_{j2} and R_{j1}r_{j2}(1/R_j - 1/R_{j1}), all of which go
toward zero for increasing collinearity, so that the bias of omitting X_2 vanishes. But,
again, the impact of spatial dependency and misspecification of the spatial process may
interact with this effect.
For the LM-LE test, a similar derivation provides

(16)  LM-LE = (\hat{u}'Wy/\hat{\sigma}^2 - \hat{u}'W\hat{u}/\hat{\sigma}^2)^2 / (R_j - T)
            = R^2 [(S_1 + s_2) - (D_1 + D_2)]^2 / (R_{j1} + r_{j2} - T)
            = [R^2 (S_1 - D_1)^2 / (R_{j1} - T)] + lelm_2 = LM-LE_1 + lelm_2

where LM-LE_1 (i.e. the term in square brackets) is the value of the test when
regressing on X_1 only and lelm_2 the additional effect on the test of adding X_2 to the
regression, collecting terms that each involve at least one of D_2, s_2 and r_{j2}. As
increasing collinearity implies that D_2, s_2 and r_{j2} go toward zero, the
bias vanishes, and the LM-LE value converges toward LM-LE_1.
For the SARMA test, which is the sum of the LM-LE and the LM-ERR tests, it
follows from the above derivations that

SARMA = SARMA_1 + sarma_2

where SARMA_1 is the value of the test for a regression on X_1 only and sarma_2 the
additional effect of adding X_2 to the regression, which goes toward zero for increasing
collinearity.
Concluding our investigation of the tests, it generally holds true that the value of a test
calculated from a regression of y on a set of regressors X_1 and an additional regressor
X_2 moves toward the value of the test obtained when regressing y on X_1 only, as
the collinearity between X_2 and X_1 is increased. This leads to an important guideline
for applied research: if the tests calculated with and without the additional variable
included in the regression lead to test values on the same side of the critical value, then
conclusions about spatial effects can, ceteris paribus, be drawn without any further
concern about multicollinearity. It is, however, important to verify to what extent such a
similarity can be expected to occur under general conditions. Resolving this question
implies an investigation of the empirical performance of the size and power functions of
the tests under increasing collinearity and finite sample sizes, as well as their robustness
toward different spatial specifications. Such an investigation is the aim of the Monte
Carlo study in section 3.
3.- Monte Carlo results
Regarding spatial structure, two models are simulated: a static one with a structure of
residual dependence in the form of a SAC or a SMA process, and a substantive
dependence specification in the form of a SAR with a white noise error term. The
models simulated in the static case share a simple design, y = X\beta + \nu, while the model
for the SAR case is expressed as y = \rho Wy + X\beta + u.
For the SAC structure, \nu is defined from \nu = \rho W\nu + u, while it is defined from
\nu = u - \rho Wu for the SMA case. For each structure, a simple model is simulated using
X = [1 x_1] and \beta = (\beta_0, \beta_1)', together with a bivariate model using X = [1 x_1 x_2]
and \beta = (\beta_0, \beta_1, \beta_2)'. The impact of increasing collinearity is studied by defining 1)
x_2 = \xi and 2) x_2 = x_1 + 10^{-i/2}\xi, with i varying over the values 0, 1, 2, 3 and 4, which
implies that the correlation between x_1 and x_2 ranges from 0 to 0.9994. The variable x_1
is obtained from a U(0,1) distribution, while u and \xi are obtained from N(0,1)
distributions. The parameters \beta_j (j = 0, 1, 2) are set to 10. Furthermore, three systems of
regions were defined from the rook contiguity matrix on an r x r board with r selected
as 5, 10 and 15, leading to R being 25, 100 and 225 respectively. The resulting matrix
W has been row-standardised. Only non-negative values of the parameter \rho, ranging
from 0 to 0.99, have been applied, and the number of simulations was fixed to 1000 for each
case.
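The design can be sketched as follows. This is our reconstruction, not the authors' code; in particular, the scaling 10^{-i/2} on \xi is inferred from the correlations reported in Table 1 (with a U(0,1) x_1, a scale of 1 implies a correlation near 0.2774, 0.1 near 0.9449 and 0.01 near 0.9994).

```python
import numpy as np

def rook_W(r):
    """Row-standardised rook contiguity matrix on an r x r board."""
    R = r * r
    W = np.zeros((R, R))
    for a in range(r):
        for b in range(r):
            for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                na, nb = a + da, b + db
                if 0 <= na < r and 0 <= nb < r:
                    W[a * r + b, na * r + nb] = 1.0
    return W / W.sum(axis=1, keepdims=True)

W = rook_W(5)                                   # R = 25 system of regions
assert np.allclose(W.sum(axis=1), 1.0)          # row-standardised

# the collinearity design: x2 = x1 + 10**(-i/2) * xi
rng = np.random.default_rng(0)
n = 100_000                                     # large n, to check the implied correlations
x1 = rng.uniform(size=n)
for i in (0, 2, 4):
    x2 = x1 + 10 ** (-i / 2) * rng.normal(size=n)
    print(f"i={i}: corr = {np.corrcoef(x1, x2)[0, 1]:.4f}")  # near 0.2774, 0.9449, 0.9994
```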
The principal results obtained are gathered in Table 1, dedicated to the empirical size,
and in Figures 1 to 7 which present the estimated power functions of the tests.
The results reflected in Table 1 do not contain surprises. Multicollinearity does not
affect the size of the tests. The estimated values fluctuate, in most cases, around the
theoretical 5.0%, with the KR test as an exception, as the size of the test is lifted by
multicollinearity.
The results corresponding to the estimated power functions are presented in graphic
form in Figures 1 to 7 (details can be obtained from the authors upon request).
Some aspects of these graphs were already well known beforehand. The weakness of
the spatial dependency tests in a context of small samples is one of them. Another is the
worsening of the performance of the tests when a moving average is used in the
alternative hypothesis. Neither the greater reliability of Moran's I for detecting
dependency processes in the error term nor the weakness of the KR test, especially under
SMA structures (Florax and de Graaff, 2004), is anything new. The figures also illustrate
the low level of specificity of the traditional statistics, which react to all types of spatial
dependence. This is the raison d'être of the robust Lagrange Multipliers (LM-EL and
LM-LE).
TABLE 1: Empirical size of the tests for a theoretical significance level of 0.05.(*)

R=25                            Moran's I  LM-ERR  LM-LAG  LM-EL  LM-LE  SARMA  KR
Simple model (x1 only)            0.036     0.040   0.054   0.053  0.056  0.049  0.035
Bivariate, corr=0                 0.052     0.040   0.073   0.041  0.066  0.064  0.089
Bivariate, i=0 (corr=0.2774)      0.049     0.039   0.072   0.039  0.071  0.055  0.097
Bivariate, i=2 (corr=0.9449)      0.055     0.053   0.064   0.049  0.061  0.065  0.069
Bivariate, i=4 (corr=0.9994)      0.053     0.050   0.065   0.056  0.062  0.052  0.067

R=100                           Moran's I  LM-ERR  LM-LAG  LM-EL  LM-LE  SARMA  KR
Simple model (x1 only)            0.051     0.052   0.054   0.062  0.046  0.043  0.019
Bivariate, corr=0                 0.052     0.049   0.056   0.054  0.056  0.060  0.068
Bivariate, i=0 (corr=0.2774)      0.050     0.050   0.040   0.051  0.041  0.043  0.059
Bivariate, i=2 (corr=0.9449)      0.041     0.037   0.047   0.036  0.044  0.046  0.045
Bivariate, i=4 (corr=0.9994)      0.048     0.050   0.055   0.052  0.056  0.056  0.067

R=225                           Moran's I  LM-ERR  LM-LAG  LM-EL  LM-LE  SARMA  KR
Simple model (x1 only)            0.041     0.041   0.052   0.051  0.055  0.044  0.019
Bivariate, corr=0                 0.039     0.037   0.049   0.037  0.050  0.042  0.046
Bivariate, i=0 (corr=0.2774)      0.053     0.048   0.040   0.049  0.041  0.048  0.068
Bivariate, i=2 (corr=0.9449)      0.055     0.049   0.051   0.049  0.048  0.048  0.047
Bivariate, i=4 (corr=0.9994)      0.035     0.038   0.050   0.038  0.047  0.046  0.049

(*) The 95% confidence interval for p (the probability of rejecting the null hypothesis), around the theoretical
value for the case of 1000 replications, is 0.036 < p < 0.064.
Figures 1-2 show that the Moran’s I and LM-ERR tests are unaffected by
multicollinearity. This holds for all sample sizes and for each DGP.
The situation in the case of the LM-LAG test seems intriguing. In Figure 3 it is seen that
the power function moves downward under residual dependence as we increase
multicollinearity among the regressors; this is clear once the dependency parameter
exceeds 0.2. On the other hand, the power function reacts upward to multicollinearity
when the DGP includes substantive dependence. In this case, multicollinearity improves
the performance of the LM-LAG test.
For the LM-EL test, it is noticed from Figure 4 that the power function in the case of
substantive dependency is absolutely flat, confirming that the test discriminates correctly
between residual and substantive dependence. In the case of residual dependence, the
LM-EL test is slightly lifted by multicollinearity.
Turning to the LM-LE test, Figure 5 shows that the power function in the cases of
residual dependence is absolutely flat, confirming that the test discriminates correctly between
residual and substantive dependence. Regarding multicollinearity, the LM-LE test is
seen to react like the LM-LAG test in the case of substantive dependency, i.e.
increasing multicollinearity improves the performance of the test.
Regarding the SARMA test, it is noticed from Figure 6 that the power function is
unaffected by multicollinearity when residual dependence is present but, as for the LM-
LAG and the LM-LE tests, the power function moves upward in the case of substantive
dependence.
Finally, for the KR test, Figure 7 shows a general tendency of the power function to
increase with collinearity. This tendency is only marginal when residual
dependence is present, and somewhat larger in the case of substantive dependency.
The effects of multicollinearity on the size and power of the tests are summarised in Table
2. It is readily concluded that multicollinearity only impacts the size of the KR test.
Further, the power of the Moran's I and LM-ERR tests is unaffected by
multicollinearity, while it is lifted for the KR test. For the robust LM-EL test the power
increases slightly, both under SAC and SMA processes. Next, all tests for substantive
dependence have their power functions increased by multicollinearity in the case where
the DGP includes a lag of the endogenous variable on the right-hand side of the equation.
Table 2. Effects of multicollinearity on tests for omitted spatial effects(**).

             Effect on size                     Effect on power
             Substantive  Residual dep.        Substantive  Residual dep.
             SAR          SAC      SMA         SAR          SAC      SMA
Moran's I    ↔            ↔        ↔           ↔            ↔        ↔
LM-ERR       ↔            ↔        ↔           ↔            ↔        ↔
LM-EL        ↔            ↔        ↔           ↔            ↑        ↑
KR           ↑            ↑        ↑           ↑            ↑        ↑
LM-LAG       ↔            ↔        ↔           ↑            ↓        ↓
LM-LE        ↔            ↔        ↔           ↑            ↔        ↔
SARMA        ↔            ↔        ↔           ↑            ↔        ↔

(**) ↑, ↓ and ↔ indicate that the size or power function is increased/decreased/unaffected by increasing collinearity among the regressors.
4.- Conclusions and final remarks
The objective of this paper was to examine the influence of multicollinearity
on the misspecification tests most often used in the context of spatial
econometric modelling. It was shown that the additional effects on the tests from adding an
additional variable generally vanish for increasing multicollinearity, but that this feature
interacts with spatial dependence in an unpredictable manner.
The simulation carried out has served to corroborate some hypotheses. Multicollinearity
does not impact the size of the tests for omitted spatial effects. This holds true
irrespective of the sample size and the characteristics of the DGP, with only slight
deviations for the KR test. Further, the power of the unadjusted tests for residual
dependence (i.e. the Moran's I, LM-ERR and KR tests) is largely unaffected by
multicollinearity for any sample size and DGP, with slight exceptions for the KR test.
But for the robust one, the LM-EL test, the power increases with multicollinearity.
Next, all tests for an omitted lagged endogenous variable have their power functions
increased by multicollinearity in the case where there is, actually, substantive
dependence in the equation.
Finally, we wish to stress that this paper is nothing more than a first approximation to
the problem of multicollinearity in cross-sectional econometric models. We have
analysed a limited number of combinations, from which we have reached a few
conclusions. Nevertheless, the cases that remain to be studied (including outliers and/or
more complex collinear patterns) seem even more interesting.
APPENDIX 1: Misspecification tests used.
The tests always refer to the model of the null hypothesis, which is of the static
type: y = X\beta + u. This model has been estimated by LS, where \hat{\sigma}^2 and \hat{\beta} are the
corresponding LS estimates and \hat{u} the residual series. The tests are the following (see
Anselin and Florax, 1995, or Florax and de Graaff, 2004, for details):
(A.1) Moran's I test: I = (R/S_0) \hat{u}'W\hat{u}/\hat{u}'\hat{u}; S_0 = \sum_{r=1}^{R} \sum_{s=1}^{R} w_{rs}

(A.2) LM-ERR test: LM-ERR = (1/T_1)(\hat{u}'W\hat{u}/\hat{\sigma}^2)^2; T_1 = tr[(W' + W)W]

(A.3) LM-EL test: LM-EL = (\hat{u}'W\hat{u}/\hat{\sigma}^2 - T_1 R_{\hat{j}}^{-1} \hat{u}'Wy/\hat{\sigma}^2)^2 / (T_1 - T_1^2 R_{\hat{j}}^{-1})

(A.4) KR test: KR = h_R \hat{\gamma}'Z'Z\hat{\gamma}/\hat{e}'\hat{e}

(A.5) LM-LAG test: LM-LAG = (1/R_{\hat{j}})(\hat{u}'Wy/\hat{\sigma}^2)^2

(A.6) LM-LE test: LM-LE = (\hat{u}'Wy/\hat{\sigma}^2 - \hat{u}'W\hat{u}/\hat{\sigma}^2)^2 / (R_{\hat{j}} - T_1)

(A.7) SARMA test: SARMA = (\hat{u}'Wy/\hat{\sigma}^2 - \hat{u}'W\hat{u}/\hat{\sigma}^2)^2 / (R_{\hat{j}} - T_1) + (1/T_1)(\hat{u}'W\hat{u}/\hat{\sigma}^2)^2

Moreover, R_{\hat{j}} = T_1 + \hat{\beta}'X'W'MWX\hat{\beta}/\hat{\sigma}^2 and M = I - X(X'X)^{-1}X'. Furthermore, \hat{e} is the
vector of residuals from the auxiliary regression of the Kelejian-Robinson (KR) test, of
order h_R x 1, Z is the matrix of exogenous variables included in that regression and
\hat{\gamma} the estimated coefficients of the corresponding vector of parameters.
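Read together, these formulas can be implemented in a few lines. The sketch below is ours (W and the data are purely illustrative); it computes the statistics for a toy static model and checks that the SARMA statistic is the sum of the LM-LE and LM-ERR statistics, as noted in section 2.

```python
import numpy as np

rng = np.random.default_rng(3)
R = 36
W = np.zeros((R, R))                       # illustrative W: neighbours on a circle
for r in range(R):
    W[r, (r - 1) % R] = W[r, (r + 1) % R] = 0.5
S0 = W.sum()
T1 = np.trace((W.T + W) @ W)

X = np.column_stack([np.ones(R), rng.uniform(size=R)])
y = X @ np.array([10.0, 10.0]) + rng.normal(size=R)

beta = np.linalg.solve(X.T @ X, X.T @ y)   # LS estimate
u = y - X @ beta                           # LS residuals
s2 = u @ u / R                             # sigma^2 hat
M = np.eye(R) - X @ np.linalg.solve(X.T @ X, X.T)
RJ = T1 + (W @ X @ beta) @ M @ (W @ X @ beta) / s2   # R_j-hat

moran = (R / S0) * (u @ W @ u) / (u @ u)                               # (A.1)
lm_err = (u @ W @ u / s2) ** 2 / T1                                    # (A.2)
lm_el = (u @ W @ u / s2 - (T1 / RJ) * (u @ W @ y) / s2) ** 2 \
        / (T1 - T1 ** 2 / RJ)                                          # (A.3)
lm_lag = (u @ W @ y / s2) ** 2 / RJ                                    # (A.5)
lm_le = (u @ W @ y / s2 - u @ W @ u / s2) ** 2 / (RJ - T1)             # (A.6)
sarma = (u @ W @ y / s2 - u @ W @ u / s2) ** 2 / (RJ - T1) \
        + (u @ W @ u / s2) ** 2 / T1                                   # (A.7)
assert np.isclose(sarma, lm_le + lm_err)   # SARMA is the sum of LM-LE and LM-ERR
```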
As is well known, the asymptotic distribution of the standardised Moran's I, obtained
as (I - E(I))/\sqrt{V(I)}, with E(I) = (R/S_0) tr(MW)/(R-k) and

V(I) = (R/S_0)^2 [tr(MWMW') + tr(MWMW) + {tr(MW)}^2]/[(R-k)(R-k+2)] - {E(I)}^2,

is an N(0,1); the two Lagrange Multipliers that follow, LM-ERR and LM-EL,
have an asymptotic \chi^2(1); the distribution of the KR test is a \chi^2(m), with m being
the number of regressors included in the auxiliary regression. The three final tests also
have a chi-square distribution, with one degree of freedom in the first two, and two
degrees of freedom in the SARMA test.
References
Anselin L and R Florax (1995): Small Sample Properties of Tests for Spatial
Dependence in Regression Models. In L. Anselin and R. Florax (eds.): New
Directions in Spatial Econometrics (pp. 21-74). Berlin: Springer.
Belsley D, Kuh E and Welsch R (1980): Regression Diagnostics: Identifying Influential
Data and Sources of Collinearity. New York: Wiley.
Chatterjee S and A Hadi (1988): Sensitivity Analysis in Linear Regression. New York:
Wiley.
Draper N and R Van Nostrand (1979): Ridge Regression and James-Stein Estimation:
Review and Comments. Technometrics, 21, 451-65.
Farrar D and R Glauber (1967): Multicollinearity in Regression Analysis: The Problem
Revisited. Review of Economics and Statistics 49, 92-107.
Florax R and T de Graaff (2004, forthcoming): The Performance of Diagnostic Tests
for Spatial Dependence in Linear Regression Models: A Meta-Analysis of
Simulation Studies. In L. Anselin, R. Florax and S. Rey (eds.): Advances in Spatial
Econometrics: Methodology, Tools and Applications. Berlin: Springer.
Greene W (2003): Econometric Analysis, Fifth Edition. New Jersey: Prentice Hall.
Hocking R (1983): Developments in Linear Regression Methodology: 1959-1982.
Technometrics, 25, 219-30.
King B (1969): Comments on “Factor analysis and Regression”. Econometrica, 37, 538-
40.
Kosfeld R and J Lauridsen (2005): Factor Analysis Regression. University of Kassel,
Department of Economics, Working Paper.
Kumar T (1975): Multicollinearity in Regression Analysis. Review of Economics and
Statistics, 57, 365-6.
Lauridsen J and Mur J (2005): Multicollinearity and Outliers in Cross-Sectional
Analysis. University of Southern Denmark, Department of Economics, Working
Paper.
Mur J and Lauridsen J (2005): Outliers in Cross-Sectional Analysis. University of
Zaragoza, Department of Economic Analysis, Working Paper.
O’Hagan J and B McCabe (1975): Tests for the Severity of Multicollinearity in
Regression Analysis: A Comment. Review of Economics and Statistics, 57, 368-
70.
Scott J (1966): Factor Analysis and Regression. Econometrica, 34, 552-62.
Scott J (1969): Factor analysis and Regression Revisited. Econometrica, 37, 719.
Wichers C (1975): The Detection of Multicollinearity: A Comment. Review of
Economics and Statistics, 57, 366-8.
Note. i=NC: corr(X1,X2)=0; i=NA: X1 included only; i=0: corr(X1,X2)=0.2774; i=2: corr(X1,X2)=0.9449; i=4: corr(X1,X2)=0.9994.
Figure 1. Sizes and powers of the Moran I test.
Note. See Figure 1.
Figure 2. Sizes and powers of the LM-ERR test.
Note. See Figure 1.
Figure 3. Sizes and powers of the LM-LAG test.
Note. See Figure 1.
Figure 4. Sizes and powers of the LM-EL test.
Note. See Figure 1.
Figure 5. Sizes and powers of the LM-LE test.
Note. See Figure 1.
Figure 6. Sizes and powers of the SARMA test.
Note. See Figure 1.
Figure 7. Sizes and powers of the KR test.