
OLS: Violation of Assumptions
Non-Normality, Multicollinearity, Specification Error

CDS M Phil Econometrics
Vijayamohanan Pillai N

Assumptions of Regression

An acronym NOLINE:

• Non-stochastic X

• Orthogonal X and Error

• Linearity

• Independence of Errors

• Normality of Error

• Equal Variance (Homoscedasticity)

23/10/2009, CDS M Phil Econometrics, Vijayamohan

Assumption Violations:

How we will approach the question.

• Definition

• Implications

• Causes

• Tests

• Remedies


Assumption Violations

• Problems with u:

• The disturbances are not normally distributed

• The variance parameters in the variance-covariance matrix are not all equal (heteroscedasticity)

• The disturbance terms are correlated (autocorrelation)

Assumption Violations:

• Problems with X:

• The explanatory variables and the disturbance term are correlated

• There is high linear dependence between two or more explanatory variables

• Incorrect model: e.g. exclusion of relevant variables; inclusion of irrelevant variables; incorrect functional form

Residual Analysis

• The residual for observation i, $e_i$, is the difference between its observed and predicted value:

$e_i = Y_i - \hat{Y}_i$

• Check the assumptions of regression by examining the residuals

• Graphical Analysis of Residuals

Residual Analysis

• Check the assumptions of regression by examining the residuals:

o Examine for linearity assumption
o Evaluate independence assumption
o Evaluate normal distribution assumption
o Examine for constant variance for all levels of X (homoscedasticity)

• Graphical Analysis of Residuals

o Can plot the residuals $e_i = Y_i - \hat{Y}_i$ vs. X

Residual Analysis or Model Adequacy Tests

Model adequacy diagnosis: an important stage before hypothesis testing in forecast modelling.

The fitted model is said to be adequate if it explains the data set adequately, i.e., if the residual does not contain (or conceal) any 'explainable non-randomness' left over from the ('explained') model.

In other words, the model is adequate if the residual is purely random (white noise), which holds when all the OLS assumptions are satisfied.
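As a concrete illustration of these residual checks, here is a minimal sketch in Python with NumPy (the simulated data and all names are hypothetical, not from the slides): with an intercept in the model, the OLS residuals average to zero and are uncorrelated with the regressor by construction, so any remaining pattern points to a violated assumption.

```python
import numpy as np

# Hypothetical simulated data: Y = 2 + 0.5 X + noise
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, n)

# OLS fit: b = (X'X)^{-1} X'y
X = np.column_stack([np.ones(n), x])
b = np.linalg.solve(X.T @ X, X.T @ y)

# Residuals e_i = Y_i - Yhat_i: mean ~ 0, no correlation with the regressor
e = y - X @ b
print(f"mean(e) = {e.mean():.2e}, corr(x, e) = {np.corrcoef(x, e)[0, 1]:.2e}")
```

Plotting e against x (or against the fitted values) gives the graphical checks described on the following slides.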

Residual Analysis for Linearity

[Figure: Y vs. x and residuals vs. x for two cases, 'Not Linear' and 'Linear'. A curved, systematic pattern in the residuals indicates non-linearity; a random scatter around zero indicates linearity.]

Residual Analysis for Independence

[Figure: residuals vs. X for two cases, 'Not Independent' (systematic patterns in the residuals) and 'Independent' (random scatter).]

(1) Non-zero Mean for the Residuals (Definition)

o The residuals have a mean other than 0:

$E(u_i) \ne 0 \quad (i = 1, 2, \dots, n)$, i.e. $E(u) > 0$ or $< 0$.

o Note that this refers to the true disturbances: the estimated residuals always have a mean of 0, even when the true disturbances do not.

Non-zero Mean for the Residuals (Implications)

• The true regression line is $Y_i = a + bX_i + e_i$, with $E(e) = \mu \ne 0$.

• The intercept estimate absorbs $\mu$: therefore the intercept is biased.

• The slope, b, is unbiased.

Non-zero Mean for the Residuals (Causes, Tests, Remedies)

• Causes: some form of specification error, e.g. omitted variables.

• We will discuss Tests and Remedies when we look closely at Specification Errors.

(2) Non-normally Distributed Errors: Definition

• The residuals are not NID(0, $\sigma^2$)

[Figure: histogram of the residuals of rate90 (counts up to 35 over the range -1000 to 2000).]

Normality Tests

Assumption   Value     Probability   Decision (5%)
Skewness     5.1766    0.000000      Rejected
Kurtosis     4.6390    0.000004      Rejected

Non-normally Distributed Errors: Implications

o The model is to some degree misspecified.

o A collection of truly stochastic disturbances should have a normal distribution:

the central limit theorem states that as the number of random variables increases, the distribution of their sum tends to a normal distribution.

Non-normally Distributed Errors: Implications (cont.)

• If the residuals are not normally distributed, then the estimators of a and b are also not normally distributed.

• Estimates are, however, still BLUE: unbiased and with minimum variance.

• BUT they are no longer asymptotically efficient, even though they are asymptotically unbiased (consistent).

• Asymptotically efficient?

Non-normally Distributed Errors: Implications (cont.)

• An estimator is asymptotically efficient if it is consistent, asymptotically normally distributed, and has an asymptotic covariance matrix that is not larger than that of any other similar estimator.

• If the residuals are normally distributed, then the LS estimator is also the ML estimator.

• MLEs are asymptotically efficient among consistent and asymptotically normally distributed estimators.

• If residuals are non-normal, it is only our hypothesis tests which are affected.

Non-normally Distributed Errors: Causes

• Generally caused by a misspecification error.

• Usually an omitted variable.

• Can also result from

o Outliers in data.
o Wrong functional form.

Non-Normality Tests: Residual Analysis for Normality

A normal probability plot of the residuals can be used to check for normality:

[Figure: normal probability plot, percent vs. residual. The plotted points are reasonably linear, supporting normality.]

Non-Normality Tests: Residual Analysis for Normality

[Figure: normal probability plot of a positively skewed residual distribution. The plotted points lie above the comparison line in both tails of the distribution.]

Non-Normality Tests: Residual Analysis for Normality

[Figure: normal probability plot of a heavy-tailed residual distribution. The plotted points in the upper tail lie above the comparison line and those in the lower tail below it.]

Non-normally Distributed Errors: Tests for Non-normality (cont.)

• Jarque-Bera test

o This test examines both the skewness and the kurtosis of a distribution to test for normality:

$JB = n\left[\dfrac{S^2}{6} + \dfrac{(K-3)^2}{24}\right]$

o where S is the skewness and K is the kurtosis of the residuals.

o JB has a $\chi^2$ distribution with 2 df.

o $H_0$: S = 0 and K = 3 (residuals normal).

o If the estimated JB is near zero, with p-value > 0.05, do not reject $H_0$.
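The JB statistic can be computed directly from the moment formula; a sketch in Python/NumPy on simulated stand-in residuals (the data and names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(42)
e = rng.normal(0, 1, 500)   # stand-in residuals; use your own in practice

n = e.size
d = e - e.mean()
m2 = np.mean(d**2)             # second central moment
S = np.mean(d**3) / m2**1.5    # skewness (normal: S = 0)
K = np.mean(d**4) / m2**2      # kurtosis (normal: K = 3)

# JB = n [S^2/6 + (K - 3)^2/24], chi-square(2) under H0
JB = n * (S**2 / 6 + (K - 3)**2 / 24)
print(f"JB = {JB:.3f}  (5% critical value of chi-square(2) is about 5.99)")
```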

JB is a portmanteau test, since the four lowest moments about the origin are used jointly in its calculation.

Non-normally Distributed Errors: Tests for Non-normality (cont.)

o Jarque, Carlos M.; Bera, Anil K. (1980). "Efficient tests for normality, homoscedasticity and serial independence of regression residuals". Economics Letters 6 (3): 255-259.

o Jarque, Carlos M.; Bera, Anil K. (1981). "Efficient tests for normality, homoscedasticity and serial independence of regression residuals: Monte Carlo evidence". Economics Letters 7 (4): 313-318.

Non-normality Tests: An Example

Variable name: resid; label: residuals

Residual vs. Predictor Plot

[Figure: residuals (-50 to 100) plotted against square feet (1000 to 2500).]

Residual vs. Fit Plot

[Figure: residuals (-50 to 100) plotted against fitted values (200 to 400).]

Histogram

[Figure: density histogram of the residuals (-50 to 100), density up to about 0.02.]

Stata Normality Tests

Statistics > Summaries, tables, and tests > Distributional plots and tests:

o Skewness and kurtosis normality tests
o Shapiro-Wilk normality test
o Shapiro-Francia normality test
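Equivalent tests are available in Python via scipy.stats; a sketch on hypothetical simulated residuals (the SciPy functions are real, the data are an assumption):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
resid = rng.normal(0, 5, 300)   # stand-in residuals

sw_stat, sw_p = stats.shapiro(resid)        # Shapiro-Wilk
sk_stat, sk_p = stats.skewtest(resid)       # skewness test
ku_stat, ku_p = stats.kurtosistest(resid)   # kurtosis test
jb_stat, jb_p = stats.jarque_bera(resid)    # Jarque-Bera

print(f"Shapiro-Wilk p = {sw_p:.3f}, Jarque-Bera p = {jb_p:.3f}")
```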

Non-normality Tests: An Example

Graphics > Distributional graphs > Normal probability plot

[Figure: normal probability plot of the residuals, Normal F[(resid - m)/s] against the empirical P[i] = i/(N+1).]

Normal Probability Plot

Non-normally Distributed Errors: Remedies

• Try to modify your theory:

o an omitted variable?
o an outlier needing specification?

• Modify your functional form by taking some variance-transforming step such as

o square root, exponentiation, logs, etc.

Multicollinearity: Definition

• Multicollinearity: the condition where the independent variables are related to each other. Causation is not implied by multicollinearity.

• As any two (or more) variables become more and more closely correlated, the condition worsens, and 'approaches singularity'.

• Since the X's are supposed to be fixed, this is a sample problem.

• Since multicollinearity is almost always present, it is a problem of degree, not merely existence.

Multicollinearity: Implications

• Consider the 2-explanatory-variable model:

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + U_i$

• In matrix format, $Y = X\beta + U$, with

$\hat\beta = (X'X)^{-1}(X'y)$, $\quad \mathrm{Var}(\hat\beta) = \sigma_u^2 (X'X)^{-1}$

Multicollinearity: Implications

• $Y = X\beta + U$

• In mean deviations, we have

$(x'x) = \begin{pmatrix} \sum x_1^2 & \sum x_1 x_2 \\ \sum x_1 x_2 & \sum x_2^2 \end{pmatrix}$, $\quad (x'y) = \begin{pmatrix} \sum x_1 y \\ \sum x_2 y \end{pmatrix}$

$(x'x)^{-1} = \dfrac{1}{D}\begin{pmatrix} \sum x_2^2 & -\sum x_1 x_2 \\ -\sum x_1 x_2 & \sum x_1^2 \end{pmatrix}$

where $D = |(x'x)| = \sum x_1^2 \sum x_2^2 - \left(\sum x_1 x_2\right)^2$

Multicollinearity: Implications

• $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + U_i$

• From $\hat\beta = (x'x)^{-1}(x'y)$:

$\hat\beta_1 = \dfrac{\sum x_2^2 \sum x_1 y - \sum x_1 x_2 \sum x_2 y}{\sum x_1^2 \sum x_2^2 - \left(\sum x_1 x_2\right)^2}$

Multicollinearity: Implications

• $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + U_i$

• From $\mathrm{Var}(\hat\beta) = \sigma_u^2 (x'x)^{-1}$:

$\mathrm{var}(\hat\beta_1) = \dfrac{\sigma_u^2 \sum x_2^2}{\sum x_1^2 \sum x_2^2 - \left(\sum x_1 x_2\right)^2} = \dfrac{\sigma_u^2}{(1 - R_{1.2}^2)\sum x_1^2}$

where $R_{1.2}^2$ = the $R^2$ in the regression of $x_1$ on all the other variables.
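The variance formula can be verified numerically; a sketch in Python/NumPy with two deliberately correlated hypothetical regressors (all data and names are assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
x2 = rng.normal(0, 1, n)
x1 = 0.9 * x2 + rng.normal(0, 0.3, n)   # x1 closely related to x2

x1c = x1 - x1.mean()                    # mean deviations

# R^2 from the auxiliary regression of x1 on x2 (two-variable case)
R2_aux = np.corrcoef(x1, x2)[0, 1] ** 2

sigma2_u = 1.0                                       # assumed error variance
var_orthogonal = sigma2_u / np.sum(x1c**2)           # if x1, x2 were orthogonal
var_collinear = sigma2_u / ((1 - R2_aux) * np.sum(x1c**2))

print(f"VIF = {1 / (1 - R2_aux):.1f}: var(beta1-hat) inflated by that factor")
```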

Multicollinearity: Implications

Consider the following cases:

A) No multicollinearity

$X_1$ and $X_2$ orthogonal: $\mathrm{Cov}(X_1, X_2) = 0$, so $R_{1.2}^2 = 0$ and $\sum x_1 x_2 = 0$. Then

$\hat\beta_1 = \dfrac{\sum x_1 y}{\sum x_1^2}$, $\quad \mathrm{var}(\hat\beta_1) = \dfrac{\sigma_u^2}{\sum x_1^2}$

The regression would appear to be identical to separate bivariate regressions: both coefficients and variances.

Multicollinearity: Implications

B) Perfect Multicollinearity

Given $X = (x_1, x_2, \dots, x_k)$, with $x_i$ the i-th column of X (n observations):

If some $x_i$ is a perfect linear combination of one or more of the other columns, i.e.

$c_1 x_1 + c_2 x_2 + \dots + c_k x_k = 0$

where the constants $(c_i)$ are not all zero, then X'X is singular: |X'X| = 0.

The matrix does not have full rank.

Multicollinearity: Implications

For example, if $x_1 = k x_2$, then the variables $x_1$ and $x_2$ are exactly linearly related:

the matrix X'X is singular, and the inverse does not exist.

Substitute $x_1 = k x_2$ in the following and see (both the numerator and the denominator become zero):

$\hat\beta_1 = \dfrac{\sum x_2^2 \sum x_1 y - \sum x_1 x_2 \sum x_2 y}{\sum x_1^2 \sum x_2^2 - \left(\sum x_1 x_2\right)^2}$
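A quick numerical check of the singularity (hypothetical simulated data, with k = 2):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50
x2 = rng.normal(0, 1, n)
x1 = 2.0 * x2                     # perfect linear dependence: x1 = k x2

X = np.column_stack([np.ones(n), x1, x2])
XtX = X.T @ X

# |X'X| = 0 and X'X is rank-deficient, so (X'X)^{-1} does not exist
print(f"det(X'X) = {np.linalg.det(XtX):.2e}")
print(f"rank(X'X) = {np.linalg.matrix_rank(XtX)} (full rank would be 3)")
```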

Multicollinearity: Implications

The more common problem: imperfect or near multicollinearity,

where two or more of the explanatory variables are approximately linearly related:

$c_1 x_1 + c_2 x_2 + \dots + c_k x_k \approx 0$

Multicollinearity: Implications

For example: $x_1 = k x_2 + v$, where v is a random element.

The smaller the variance of v, the higher the correlation r between $x_1$ and $x_2$, and the greater the prevalence of multicollinearity.

(Naturally, without v, r = 1 by definition.)

Multicollinearity: Implications

The impact of multicollinearity on the standard errors of regression estimates:

In the context of a simple (bivariate) regression

$Y_i = \beta_1 + \beta_2 X_i + u_i$,

$\mathrm{var}(\hat\beta_2) = \dfrac{\sigma_u^2}{\sum x_i^2}$, where $x_i = (X_i - \bar{X})$

Multicollinearity: Implications

With several explanatory variables:

$\mathrm{var}(\hat\beta_k) = \dfrac{\sigma_u^2}{(1 - R_{k.}^2)\sum x_k^2}$

where $R_{k.}^2$ is the coefficient of determination when $X_k$ is regressed on all the other explanatory variables.

Multicollinearity: Implications

$\mathrm{var}(\hat\beta_k) = \dfrac{\sigma_u^2}{(1 - R_{k.}^2)\sum x_k^2}$

The variance (and standard error) of the coefficients thus depends not only on the standard error of the regression and the variation in the explanatory variable in question,

but also on $R_{k.}^2$.

Multicollinearity: Implications

$\mathrm{var}(\hat\beta_k) = \dfrac{\sigma_u^2}{(1 - R_{k.}^2)\sum x_k^2}$

Ceteris paribus,

• the higher the correlation of $X_k$ with the other variables (larger $R_{k.}^2$), the larger $\mathrm{var}(\hat\beta_k)$;

• the greater the variation in $X_k$ (larger $\sum x_k^2$), the smaller $\mathrm{var}(\hat\beta_k)$;

• the better the overall fit of the regression (smaller $\sigma_u^2$), the smaller $\mathrm{var}(\hat\beta_k)$.

Multicollinearity: Implications

• If the independent variables are highly correlated, $\mathrm{var}(\hat\beta_k)$ is inflated:

• t ratios are lower,

• $\hat\beta_k$ appears insignificant,

• and $R^2$ tends to be high as well.

• Significant F (with insignificant t's).

• Sign changes occur with the introduction of a new variable.

• The $\hat\beta$'s are still BLUE (but useless).

Multicollinearity: Tests/Indicators

Nonexperimental data are never orthogonal ($R_{k.}^2 = 0$),

so to some extent multicollinearity is always present.

When is multicollinearity a problem?

Some diagnostic statistics:

Multicollinearity: Tests/Indicators

Some diagnostic statistics:

1. Klein's Rule of Thumb (Lawrence Klein, 1980):

Multicollinearity is a problem if $R_{k.}^2$ from an auxiliary regression > the overall $R^2$.

2. Variance inflation factor (VIF):

$VIF_k = \dfrac{1}{(1 - R_{k.}^2)}$

"The most useful single diagnostic guide" (J. Johnston, 1984)

Interpreting VIFs

• No multicollinearity: VIF = 1.0

• If the VIF for a variable were 9, its standard error would be three times as large as it would be if its VIF were 1.

• In such a case, the coefficient would have to be 3 times as large to be statistically significant.

Interpreting VIFs

• If the VIF is greater than 10.0, then multicollinearity is probably severe: 90% of the variance of $X_j$ is explained by the other Xs.

• In small samples, a VIF of about 5.0 may indicate problems.
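VIFs can be computed directly from the auxiliary regressions; a self-contained Python/NumPy sketch (the vif helper and the simulated data are hypothetical, not a standard library API):

```python
import numpy as np

def vif(X):
    """VIF_k = 1/(1 - R2_k), with R2_k from regressing column k on the rest."""
    n, p = X.shape
    out = []
    for k in range(p):
        y = X[:, k]
        Z = np.column_stack([np.ones(n), np.delete(X, k, axis=1)])
        b = np.linalg.lstsq(Z, y, rcond=None)[0]
        e = y - Z @ b
        r2 = 1.0 - e.var() / y.var()
        out.append(1.0 / (1.0 - r2))
    return out

rng = np.random.default_rng(5)
x1 = rng.normal(0, 1, 400)
x2 = x1 + rng.normal(0, 0.2, 400)   # nearly collinear with x1
x3 = rng.normal(0, 1, 400)          # unrelated
print([round(v, 1) for v in vif(np.column_stack([x1, x2, x3]))])
```

The first two VIFs come out large (severe collinearity), the third close to 1.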

Multicollinearity: Tests/Indicators (cont.)

• Also Tolerance:

$TOL_k = \dfrac{1}{VIF_k} = (1 - R_{k.}^2)$

• If the tolerance equals 1, the variables are unrelated.

• If $TOL_j = 0$, then they are perfectly correlated.

Multicollinearity: Tests/Indicators

How large does a VIF value have to be to be "large enough"?

Belsley (1991) suggests:

1. Eigenvalues of X'X
2. Condition index (CI) and
3. Condition number (CN)

Multicollinearity: Tests/Indicators

Given the eigenvalues $\lambda_1 > \lambda_2 > \lambda_3 > \dots$ of X'X:

$CI_j = \lambda_1 / \lambda_j$; $\quad j = 1, 2, 3, \dots$

Stata/SPSS reports the square root of $CI_j$.

$CN = \sqrt{\text{max eigenvalue} / \text{min eigenvalue}}$, or CN = the largest $\sqrt{CI_j}$.
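The eigenvalue-based diagnostics can be sketched in Python/NumPy (hypothetical near-collinear data):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 200
x1 = rng.normal(0, 1, n)
x2 = x1 + rng.normal(0, 0.05, n)    # near-perfect collinearity
X = np.column_stack([np.ones(n), x1, x2])

# Eigenvalues of X'X in descending order
eig = np.sort(np.linalg.eigvalsh(X.T @ X))[::-1]

CI = eig[0] / eig                   # condition indices lambda_1 / lambda_j
CN = np.sqrt(eig[0] / eig[-1])      # condition number

print("sqrt(CI_j) =", np.round(np.sqrt(CI), 1))
print(f"condition number = {CN:.1f}")
```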

Multicollinearity: Tests/Indicators

Largest CI = 5–10: no problem
Largest CI = 30–100: problematic
Largest CI = 1000–3000: severe problem

See:
1. D. A. Belsley (1991) Conditioning Diagnostics: Collinearity and Weak Data in Regression, Wiley.
2. D. A. Belsley, E. Kuh and R. E. Welsch (1980) Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, Wiley.
3. Norman R. Draper and Harry Smith (2003) Applied Regression Analysis, 3rd Ed., Wiley.

Multicollinearity: Tests/Indicators: An Example

In Stata: Postestimation statistics for regress > Reports and statistics > Variance inflation factors.

For the Condition Index and Condition Number, download the collin command.

[Figure: collin output showing the eigenvalues and the condition indices $\sqrt{\lambda_1/\lambda_j}$ for each eigenvalue.]

Multicollinearity: Causes

• Sampling mechanism: a poorly constructed design and measurement scheme, or limited range.

• Statistical model specification: adding polynomial terms or trend indicators.

• Too many variables in the model: the model is overdetermined.

• Theoretical specification is wrong: inappropriate construction of theory or even measurement.

Multicollinearity: Remedies

• Increase sample size

• Omit variables

• Scale construction/transformation

• Factor analysis

• Constrain the estimation, such as the case where you can set the value of one coefficient relative to another.

• Ignore it: report adjusted $R^2$ and claim it warrants retention in the model.

Model Specification: Definition

Specification error

• covers any mistake in the set of assumptions of a model and the associated inference procedures,

• but it has come to be used for errors in specifying the data matrix X.

Model Specification: Definition (cont.)

• There are basically 4 types of misspecification we need to examine:

o exclusion of a relevant variable
o inclusion of an irrelevant variable
o functional form
o measurement error and misspecified error term

Too Many or Too Few Variables

• What happens if we include variables in our specification that don't belong?

• There is no effect on our parameter estimates, and OLS remains unbiased.

• What if we exclude a variable from our specification that does belong?

• OLS will usually be biased.

1. Omission of a Relevant Variable: Exclusion/Underfitting Bias

Suppose the true model is

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$,

but we estimate

$\tilde{y} = \tilde\beta_0 + \tilde\beta_1 x_1 + \tilde{u}$.

Then (in mean deviations)

$\tilde\beta_1 = \dfrac{\sum x_{1i} y_i}{\sum x_{1i}^2}$

Omitted Variable Bias

$\tilde\beta_1 = \dfrac{\sum x_{1i} y_i}{\sum x_{1i}^2}$

But $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$, so the numerator becomes

$\sum x_{1i} y_i = \beta_1 \sum x_{1i}^2 + \beta_2 \sum x_{1i} x_{2i} + \sum x_{1i} u_i$

Omitted Variable Bias (cont.)

$\tilde\beta_1 = \beta_1 + \beta_2 \dfrac{\sum x_{1i} x_{2i}}{\sum x_{1i}^2} + \dfrac{\sum x_{1i} u_i}{\sum x_{1i}^2}$

Since $E(u) = 0$, taking expectations we have

$E(\tilde\beta_1) = \beta_1 + \beta_2 \dfrac{\sum x_{1i} x_{2i}}{\sum x_{1i}^2}$

Omitted Variable Bias (cont.)

Consider the regression of $x_2$ on $x_1$:

$\tilde{x}_2 = \tilde\delta_0 + \tilde\delta_1 x_1$, where $\tilde\delta_1 = \dfrac{\sum x_{1i} x_{2i}}{\sum x_{1i}^2}$,

so

$E(\tilde\beta_1) = \beta_1 + \beta_2 \tilde\delta_1$.
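The bias formula $E(\tilde\beta_1) = \beta_1 + \beta_2 \tilde\delta_1$ can be checked by simulation; a Python/NumPy sketch (hypothetical data with true $\beta_1 = 2$, $\beta_2 = 3$):

```python
import numpy as np

rng = np.random.default_rng(11)
n = 100_000
x1 = rng.normal(0, 1, n)
x2 = 0.5 * x1 + rng.normal(0, 1, n)                  # x2 correlated with x1
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(0, 1, n)  # true beta1 = 2, beta2 = 3

# Short regression: omit x2
X = np.column_stack([np.ones(n), x1])
b_short = np.linalg.lstsq(X, y, rcond=None)[0]

# delta1 from the auxiliary regression x2 = d0 + d1 x1
delta1 = np.cov(x1, x2)[0, 1] / np.var(x1, ddof=1)

print(f"short-regression slope = {b_short[1]:.3f}")
print(f"beta1 + beta2*delta1   = {2.0 + 3.0 * delta1:.3f}")
```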

Mis-specified Models

For the mis-specified model $\tilde{y} = \tilde\beta_0 + \tilde\beta_1 x_1$:

$\mathrm{Var}(\tilde\beta_1) = \dfrac{\sigma_u^2}{\sum x_1^2}$,

whereas for the true model

$\mathrm{Var}(\hat\beta_1) = \dfrac{\sigma_u^2}{(1 - R_{1.2}^2)\sum x_1^2}$.

Thus $\mathrm{Var}(\tilde\beta_1) \le \mathrm{Var}(\hat\beta_1)$ unless $x_1$ and $x_2$ are uncorrelated.

Model Specification: Implications

• If an omitted variable is correlated with the included variables, the estimates are biased as well as inconsistent.

• In addition, the error variance is incorrect, and usually overestimated.

• If the omitted variable is uncorrelated with the included variables, the estimated error variance is still biased, even though the $\hat\beta$'s are not.

Mis-specified Models (cont.)

• While the variance of the estimator is smaller for the mis-specified model, unless $\beta_2 = 0$ the mis-specified model is biased.

• As the sample size grows, the variance of each estimator shrinks to zero, making the variance difference less important.

2. Inclusion of an Irrelevant Variable: Inclusion/Overfitting Bias

Suppose this time the "true" model is

$y = \beta_1 X_1 + u$,

but we estimate

$y = b_1 X_1 + b_2 X_2 + e$.

This equation is over-specified; it tends to occur when researchers adopt a "kitchen sink" approach to model building.

This specification error does not lead to bias: both the parameters and the error variance are unbiasedly estimated.

Inclusion of an Irrelevant Variable: Inclusion/Overfitting Bias

$E(b_1) = \beta_1$, $\quad E(b_2) = 0$.

However, the estimates are inefficient: that is to say, including irrelevant variables raises the standard errors of our coefficient estimates.
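A small Monte Carlo sketch of this unbiased-but-inefficient result (hypothetical setup: true model $y = 2x_1 + u$, with an irrelevant $x_2$ correlated with $x_1$ included in the fit):

```python
import numpy as np

rng = np.random.default_rng(4)
reps, n = 2000, 60
slopes = []
for _ in range(reps):
    x1 = rng.normal(0, 1, n)
    x2 = 0.8 * x1 + rng.normal(0, 0.6, n)   # irrelevant but correlated with x1
    y = 2.0 * x1 + rng.normal(0, 1, n)      # true model uses x1 only
    X = np.column_stack([np.ones(n), x1, x2])
    slopes.append(np.linalg.lstsq(X, y, rcond=None)[0][1])

slopes = np.asarray(slopes)
# b1 stays centred on the true value 2, but its sampling spread is larger
# than the roughly 1/sqrt(n) ~ 0.13 it would have without x2 in the model
print(f"mean(b1) = {slopes.mean():.3f}, sd(b1) = {slopes.std():.3f}")
```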

3. Functional Form Mis-specification

A third type of mis-specification occurs when we adopt an incorrect functional form.

For example, we estimate a linear regression model whereas the "true" regression model is log-linear.

Functional Form Mis-specification: Implications

• Incorrect functional form can result in autocorrelation or heteroskedasticity.

• Next class

Functional Form Mis-specification: Causes

• Theoretical design:

o something is omitted,
o irrelevantly included,
o mismeasured, or
o non-linear.

Functional Form Mis-specification: Tests

• Actual specification tests

o No test can reveal poor theoretical construction per se.

o The best indicator: the model has some undesirable statistical property;

o e.g. a misspecified functional form will often be indicated by a significant test for autocorrelation.

o Sometimes time-series models will have negative autocorrelation as a result of poor design.

Specification Error Tests

A common test for mis-specification is Ramsey's regression specification error test (RESET) (Ramsey, 1969).

We augment the model:

$y = X\beta + Z\alpha + u$, with $Z = [\hat{y}^2 \;\; \hat{y}^3 \;\; \hat{y}^4]$,

where Z contains powers of the predicted values of the dependent variable.

The test is $H_0: \alpha = 0$.

Essentially the Z variables are 'proxying' for other possible variables (or nonlinearities).

Specification Error Tests

We then perform a standard F-test on the significance of the additional variables:

$F = \dfrac{(e_*'e_* - e'e)/J}{e'e/(n-k)}$

or

$F = \dfrac{(R^2 - R_*^2)/J}{(1 - R^2)/(n-k)}$

$R^2$: from the model with the Z variables
$R_*^2$: from the model without the Z variables

If the estimated F value is significant, accept the hypothesis that the model without the Z variables is mis-specified.
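The RESET F-test above can be implemented in a few lines of Python/NumPy; a sketch on hypothetical data whose true model is quadratic, so the linear specification should be rejected (here $Z = [\hat{y}^2, \hat{y}^3]$, so J = 2):

```python
import numpy as np

def ols_r2(X, y):
    """Return (R^2, coefficients) from an OLS fit of y on X."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    sst = (y - y.mean()) @ (y - y.mean())
    return 1.0 - (e @ e) / sst, b

rng = np.random.default_rng(2)
n = 300
x = rng.uniform(1, 5, n)
y = 1.0 + 0.5 * x**2 + rng.normal(0, 0.5, n)   # true model is quadratic

# Restricted model: linear in x
Xr = np.column_stack([np.ones(n), x])
R2_star, b = ols_r2(Xr, y)
yhat = Xr @ b

# Augmented model adds powers of the fitted values: Z = [yhat^2, yhat^3]
Xu = np.column_stack([Xr, yhat**2, yhat**3])
R2, _ = ols_r2(Xu, y)

J, k = 2, Xu.shape[1]
F = ((R2 - R2_star) / J) / ((1.0 - R2) / (n - k))
print(f"RESET F = {F:.1f}")   # a large F rejects the linear specification
```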

Ramsey's Regression Specification Error Test (RESET): An Example

In Stata: Statistics > Linear models and related > Regression diagnostics > Specification tests, etc.

Model Specification: Tests

• Specification criteria for lagged designs

o Most useful for comparing time series models with the same set of variables but differing numbers of parameters:

AIC (Akaike Information Criterion)
Schwarz Criterion

Model Specification: Remedies

• Model building

o Hendry and the LSE school of "top-down" modelling.

o Nested models

o Stepwise regression: a process of including the variables in the model "one step at a time."