
CDS M Phil Econometrics Vijayamohan

OLS: Violation of Assumptions

Non-Normality, Multicollinearity, Specification Error

Vijayamohanan Pillai N

Assumption Violations: How we will approach the question.

• Definition

• Implications

• Causes

• Tests

• Remedies

3/3/2014

Assumption Violations

•Problems with u:

•The disturbances are not normally distributed

•The variance parameters in the variance-covariance matrix are different

•The disturbance terms are correlated


Assumption Violations:

•Problems with X:

•The explanatory variables and the disturbance term are correlated

•There is high linear dependence between two or more explanatory variables

•Incorrect model – e.g. exclusion of relevant variables; inclusion of irrelevant variables; incorrect functional form


Residual Analysis

•The residual for observation i, e_i, is the difference between its observed and predicted value:

  e_i = Y_i − Ŷ_i

•Check the assumptions of regression by examining the residuals

•Graphical analysis of residuals


Residual Analysis or Model Adequacy Tests

  e_i = Y_i − Ŷ_i

Model adequacy diagnosis: an important stage before hypothesis testing in forecast modelling.

The fitted model is said to be adequate if it explains the data set adequately, i.e., if the residual does not contain (or conceal) any 'explainable non-randomness' left from the ('explained') model:

i.e. if the residual is purely random/white noise, and all the OLS assumptions are satisfied.
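As a minimal numerical sketch (with made-up data): the residuals e_i = Y_i − Ŷ_i from an OLS fit that includes an intercept always sum to zero by construction, which is why residual analysis looks at their pattern, not their level.

```python
# Hypothetical data; any small sample illustrates the point.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
    sum((xi - xbar) ** 2 for xi in x)
a = ybar - b * xbar

# e_i = Y_i - Yhat_i
resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# With an intercept, OLS residuals sum to zero by construction,
# so only their pattern (against x or fitted values) is informative.
print(abs(sum(resid)) < 1e-9)
```

Plotting these residuals against x or against the fitted values is the graphical analysis the slides describe next.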

Residual Analysis for Linearity

[Figure: Y vs. x scatter plots with the corresponding residuals-vs.-x plots. A curved pattern in the residuals indicates a non-linear relationship ("Not Linear"); a patternless band around zero indicates linearity ("Linear").]

Residual Analysis for Independence

[Figure: residuals plotted against X for three cases. Systematic patterns mark the "Not Independent" panels; a random scatter marks the "Independent" panel.]

(1) Non-zero Mean for the residuals (Definition)

o The residuals have a mean other than 0.

o Note that this refers to the true disturbances: the estimated OLS residuals always have mean 0 by construction, while the true disturbances may not.

o E(u) > or < 0; that is, E(u_i) ≠ 0 (i = 1, 2, …, n)

Non-zero Mean for the residuals (Implications)

• The true regression line is Y_i = a + bX_i + e_i with E(e_i) = μ_e ≠ 0, so what OLS actually fits is Y_i = (a + μ_e) + bX_i + (e_i − μ_e).

• Therefore the intercept is biased: its estimate centres on a + μ_e rather than a.

• The slope, b, is unbiased.

Non-zero Mean for the residuals

• Causes: some form of specification error, e.g. omitted variables.


(2) Non-normally distributed errors : Definition

• The residuals are not NID(0, σ²)

[Figure: histogram of the residuals of rate90, visibly non-normal, with counts from 0 to 35 over the range −1000 to 2000.]

Normality tests (5% level):

Assumption   Value    Probability   Decision
Skewness     5.1766   0.000000      Rejected
Kurtosis     4.6390   0.000004      Rejected

Non-normally distributed errors : Implications

o The model is to some degree misspecified.

o A collection of truly stochastic disturbances should have a normal distribution: the central limit theorem states that as the number of random variables increases, the sum of their distributions tends to a normal distribution.

Non-normally distributed errors : Implications (cont.)

• If the residuals are not normally distributed, then the estimators of a and b are also not normally distributed.

• Estimates are, however, still BLUE: unbiased with minimum variance among linear unbiased estimators.

• BUT they are no longer asymptotically efficient, even though they are asymptotically unbiased (consistent).

Non-normally distributed errors : Implications (cont.)

• If the residuals are normally distributed, then the LS estimator is also the ML estimator.

• MLEs are asymptotically efficient among consistent and asymptotically normally distributed estimators.

• If the residuals are non-normal, it is only our hypothesis tests that are affected.

Non-normally distributed errors: Causes

• Generally caused by a specification error, usually an omitted variable.

• Can also result from:

  o Outliers in the data.

  o Wrong functional form.

Non-Normality Tests: Residual Analysis for Normality

• A normal probability plot of the residuals (Percent vs. Residual) can be used to check for normality: under normality the plotted points are reasonably linear.

• Positively skewed residual distribution: the plotted points lie above the comparison line in both tails of the distribution.

• Heavy-tailed residual distribution: the plotted points in the upper tail lie above the comparison line and those in the lower tail below it.

Non-normally distributed errors: Tests for non-normality (cont.)

• Jarque-Bera test

o This test examines both the skewness and kurtosis of a distribution to test for normality:

  JB = n [S²/6 + (K − 3)²/24]

o where S is the skewness and K is the kurtosis of the residuals.

o JB has a χ² distribution with 2 df.

o H₀: S = 0; K = 3 (residuals normal)

o If the estimated JB is near zero, p-value > 0.05: do not reject H₀.
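The computation can be sketched in a few lines. The residuals below are simulated stand-ins (not the rate90 residuals of the example); the χ²(2) p-value uses the closed form exp(−JB/2).

```python
import math
import random

random.seed(1)
e = [random.gauss(0.0, 1.0) for _ in range(500)]  # stand-in residuals

n = len(e)
mean = sum(e) / n
m2 = sum((v - mean) ** 2 for v in e) / n
m3 = sum((v - mean) ** 3 for v in e) / n
m4 = sum((v - mean) ** 4 for v in e) / n

S = m3 / m2 ** 1.5        # sample skewness (0 under normality)
K = m4 / m2 ** 2          # sample kurtosis (3 under normality)
JB = n * (S ** 2 / 6 + (K - 3) ** 2 / 24)

# The chi-square(2) survival function has the closed form exp(-x/2)
p_value = math.exp(-JB / 2)
print(round(S, 3), round(K, 3), round(JB, 3), round(p_value, 3))
```

With normal draws, S stays near 0, K near 3, and the p-value is typically well above 0.05, so H₀ is not rejected.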

A portmanteau test, since the four lowest moments about the origin are used jointly for its calculation.

Non-normally distributed errors: Tests for non-normality (cont.)

o Jarque, Carlos M.; Anil K. Bera (1980). "Efficient tests for normality, homoscedasticity and serial independence of regression residuals". Economics Letters 6 (3): 255–259.

o Jarque, Carlos M.; Anil K. Bera (1981). "Efficient tests for normality, homoscedasticity and serial independence of regression residuals: Monte Carlo evidence". Economics Letters 7 (4): 313–318.

Non-normality Tests: An Example

Variable name: resid; label: residuals

Residual vs. Predictor Plot

[Figure: residuals (−50 to 100) plotted against square feet (1000 to 2500).]

Residual vs. Fit Plot

[Figure: residuals (−50 to 100) plotted against fitted values (200 to 400).]

Histogram

[Figure: density histogram of the residuals, −50 to 100.]

Stata Normality tests

Statistics → Summaries, tables, and tests → Distributional plots and tests → Skewness and kurtosis normality tests; Shapiro-Wilk normality test; Shapiro-Francia normality test

Normal Probability Plot

Graphics → Distributional graphs → Normal probability plot

[Figure: normal probability plot of resid: Normal F[(resid − m)/s] against Empirical P[i] = i/(N+1), both axes from 0.00 to 1.00.]

Non-normally distributed errors: Remedies

•Try to modify your theory.

•Omitted variable?

•Outlier needing specification?

•Modify your functional form by taking some variance-transforming step such as a square root, exponentiation, logs, etc.

Multicollinearity: Definition

• Multicollinearity : the condition where the independent variables are related to each other. Causation is not implied by multicollinearity.

• As any two (or more) variables become more and more closely correlated, the condition worsens, and ‘approaches singularity’.

• Since the X's are supposed to be fixed, this is a sample problem.

• Since multicollinearity is almost always present, it is a problem of degree, not merely existence.

Multicollinearity: Implications

• Consider the 2-explanatory-variable model:

  Y_i = β₀ + β₁X_{1i} + β₂X_{2i} + U_i

• In matrix format, Y = Xβ + U, with

  β̂ = (x′x)⁻¹(x′y)

  Var(β̂) = σ²_u (x′x)⁻¹

• Y = Xβ + U. In mean-deviations,

  (x′x) = [ Σx₁²    Σx₁x₂ ]
          [ Σx₁x₂   Σx₂²  ]

  (x′x)⁻¹ = (1/D) [  Σx₂²    −Σx₁x₂ ]
                  [ −Σx₁x₂    Σx₁²  ]

  (x′y) = [ Σx₁y ]
          [ Σx₂y ]

where D = |x′x| = Σx₁² Σx₂² − (Σx₁x₂)².

Multicollinearity: Implications

• For Y_i = β₀ + β₁X_{1i} + β₂X_{2i} + U_i, β̂ = (x′x)⁻¹(x′y) gives

  β̂₁ = (Σx₁y Σx₂² − Σx₂y Σx₁x₂) / (Σx₁² Σx₂² − (Σx₁x₂)²)

Multicollinearity: Implications

• Similarly, Var(β̂) = σ²_u (x′x)⁻¹ gives

  var(β̂₁) = σ²_u Σx₂² / (Σx₁² Σx₂² − (Σx₁x₂)²)

  = σ²_u / [(1 − R²₁.₂) Σx₁²]

where R²₁.₂ is the R² in the regression of x₁ on all the other variables.

Multicollinearity: Implications

• Consider the following cases.

• A) No multicollinearity

• X₁ ⊥ X₂ (orthogonal) ⇒ Cov(X₁, X₂) = 0 and R²₁.₂ = 0.

• The regression is then identical to separate bivariate regressions, in both coefficients and variances:

  β̂₁ = Σx₁y / Σx₁²,  var(β̂₁) = σ²_u / Σx₁²

Multicollinearity: Implications

o B) Perfect Multicollinearity

Given X = (x₁, x₂, …, x_k), with x_i the i-th column of X (n observations): if one column is a perfect linear combination of one or more of the others,

  c₁x₁ + c₂x₂ + … + c_k x_k = 0

where the constants c_i are not all zero, then X′X is singular: |X′X| = 0. The matrix does not have full rank.

Multicollinearity: Implications

For example, if x₁ = kx₂ then the variables x₁ and x₂ are exactly linearly related ⇒ the matrix X′X is singular ⇒ the inverse does not exist.

Substitute x₁ = kx₂ in the following and see:

  β̂₁ = (Σx₁y Σx₂² − Σx₂y Σx₁x₂) / (Σx₁² Σx₂² − (Σx₁x₂)²)

(the denominator becomes k²(Σx₂²)² − k²(Σx₂²)² = 0).
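The substitution can be checked numerically. With x₁ = kx₂ exactly (hypothetical numbers below), the determinant Σx₁²Σx₂² − (Σx₁x₂)² is exactly zero:

```python
k = 2.0
x2 = [1.0, 2.0, 3.0, 4.0]
x1 = [k * v for v in x2]   # perfect collinearity: x1 = k * x2

s11 = sum(a * a for a in x1)              # sum of x1^2
s22 = sum(b * b for b in x2)              # sum of x2^2
s12 = sum(a * b for a, b in zip(x1, x2))  # sum of x1*x2

# determinant of x'x: k^2*(sum x2^2)^2 - k^2*(sum x2^2)^2 = 0
D = s11 * s22 - s12 ** 2
print(D)  # 0.0: x'x is singular, so no inverse and no unique OLS estimate
```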

Multicollinearity: Implications

The more common problem is imperfect or near multicollinearity: two or more of the explanatory variables are approximately linearly related,

  c₁x₁ + c₂x₂ + … + c_k x_k ≈ 0

Multicollinearity: Implications

For example: x₁ = kx₂ + v, where v is a random element. The higher the correlation r between x₁ and x₂, the greater the degree of multicollinearity. (Naturally, without v, r = 1 by definition.)

Multicollinearity: Implications

The impact of multicollinearity on the standard errors of regression estimates. In the context of a simple (bivariate) regression

  Y_i = β₁ + β₂X_i + u_i,

  var(β̂₂) = σ²_u / Σx_i²,  where x_i = X_i − X̄.

Multicollinearity: Implications

With several explanatory variables:

  var(β̂_k) = σ²_u / [(1 − R²_k.) Σx_k²]

where R²_k. is the coefficient of determination when X_k is regressed on all the other explanatory variables.

Multicollinearity: Implications

⇒ The variance (and standard error) of the coefficients depends not only on the standard error of the regression and the variation in the explanatory variable in question, but also on R²_k.

Multicollinearity: Implications

In var(β̂_k) = σ²_u / [(1 − R²_k.) Σx_k²], ceteris paribus:

• As the correlation of X_k with the other variables ↑, var ↑.

• As the variation in X_k ↑, var ↓.

• As the overall fit of the regression ↑, var ↓.

Multicollinearity: Implications

• If the independent variables are highly correlated, var(β̂_k) is inflated

• ⇒ t-ratios are lower

• ⇒ coefficients appear insignificant

• while R² tends to be high as well ⇒ significant F

• Sign changes occur with the introduction of a new variable

• The estimates are still BLUE (but useless)

Multicollinearity: Tests/Indicators

Non-experimental data are never orthogonal (R²_k. = 0 never holds exactly), so to some extent multicollinearity is always present. When is multicollinearity a problem? Some diagnostic statistics:

1. Klein's Rule of Thumb (Lawrence Klein): multicollinearity is a problem if the R²_k. from an auxiliary regression exceeds the overall R².

2. Variance inflation factor (VIF), "the most useful single diagnostic guide" (J. Johnston, 1984):

  VIF_k = 1 / (1 − R²_k.)

Interpreting VIFs

• No multicollinearity ⇒ VIF = 1.0.

• If the VIF for a variable were 9, its standard error would be three times as large as it would be if its VIF were 1. In such a case, the coefficient would have to be 3 times as large to be statistically significant.

Interpreting VIFs

• If the VIF is greater than 10.0, then multicollinearity is probably severe ⇒ 90% of the variance of X_j is explained by the other X's.

• In small samples, a VIF of about 5.0 may indicate problems.

Multicollinearity: Tests/Indicators (cont.)

• Also Tolerance:

  TOL_k = 1/VIF = (1 − R²_k.)

• If the tolerance equals 1, the variables are unrelated.

• If TOL_j = 0, they are perfectly correlated.
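A sketch of the VIF and tolerance computation by hand for two regressors (hypothetical data): run the auxiliary regression of x₁ on x₂, take its R², then VIF = 1/(1 − R²) and TOL = 1/VIF.

```python
x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x1 = [1.1, 2.0, 3.2, 3.9, 5.1, 6.0]   # nearly collinear with x2

n = len(x1)
m1, m2 = sum(x1) / n, sum(x2) / n

# auxiliary regression: x1 on x2 (with intercept)
b = sum((a - m1) * (c - m2) for a, c in zip(x1, x2)) / \
    sum((c - m2) ** 2 for c in x2)
a0 = m1 - b * m2

ss_res = sum((a - (a0 + b * c)) ** 2 for a, c in zip(x1, x2))
ss_tot = sum((a - m1) ** 2 for a in x1)
r2_aux = 1 - ss_res / ss_tot    # R^2_k. of the auxiliary regression

vif = 1 / (1 - r2_aux)
tol = 1 / vif                   # = 1 - R^2_k.
print(round(r2_aux, 4), round(vif, 1), round(tol, 4))
```

Here the auxiliary R² is above 0.99, so the VIF is in the hundreds: the variance of β̂₁ would be inflated by that factor relative to the orthogonal case.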

Multicollinearity: Tests/Indicators

How large does a VIF value have to be to be "large enough"? Belsley (1991) suggests:

1. the eigenvalues of X′X,

2. the condition index (CI), and

3. the condition number (CN).

Multicollinearity: Tests/Indicators

Given the eigenvalues λ₁ > λ₂ > λ₃ > …,

  CI_j = λ₁/λ_j,  j = 1, 2, 3, …

Stata/SPSS reports the square root of CI_j.

  CN = sqrt(max eigenvalue / min eigenvalue) = sqrt(max CI_j)

Largest CI = 5–10 ⇒ no problem; largest CI = 30–100 ⇒ problematic; largest CI = 1000–3000 ⇒ severe problem.
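For a 2×2 cross-product matrix the eigenvalues, condition index and condition number can be computed directly with the quadratic formula (hypothetical standardized regressors with correlation 0.95):

```python
import math

# x'x for two standardized regressors with correlation 0.95
s11, s22, s12 = 1.0, 1.0, 0.95

# eigenvalues of a symmetric 2x2 matrix via the quadratic formula
tr = s11 + s22
det = s11 * s22 - s12 ** 2
root = math.sqrt(tr ** 2 - 4 * det)
lam1 = (tr + root) / 2          # largest eigenvalue (1.95)
lam2 = (tr - root) / 2          # smallest eigenvalue (0.05)

ci2 = lam1 / lam2               # condition index for the second dimension
cn = math.sqrt(ci2)             # condition number = sqrt(max CI)
print(round(lam1, 4), round(lam2, 4), round(cn, 3))
```

Even with r = 0.95 the condition number is only about 6, illustrating why the CI thresholds above are far larger than the VIF thresholds.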

See:

1. D. A. Belsley (1991) Conditioning Diagnostics: Collinearity and Weak Data in Regression, Wiley.

2. D. A. Belsley, E. Kuh and R. E. Welsch (1980) Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, Wiley.

3. Norman R. Draper and Harry Smith (2003) Applied Regression Analysis, 3rd ed., Wiley.

Multicollinearity: Tests/Indicators: An Example

[Figure: example regression output.]

CDS M Phil Econometrics Vijayamohan 57

Postestimation statistics for regress:Reports and statistics:

Variance inflation factors

MulticollinearityMulticollinearity: : Tests/IndicatorsTests/Indicators

An ExampleAn Example

In Stata

CDS M Phil Econometrics Vijayamohan 58

For Condition Index and NumberFor Condition Index and Number

Download the Download the collincollin commandcommand

CDS M Phil Econometrics Vijayamohan 59 CDS M Phil Econometrics Vijayamohan 60

Multicollinearity: Tests/Indicators: An Example

[Figure: collin output; the condition indexes reported are sqrt(λ₁/λ₁), sqrt(λ₁/λ₂), sqrt(λ₁/λ₃).]


Multicollinearity: Causes

• Sampling mechanism. Poorly constructed design & measurement scheme or limited range.

• Statistical model specification: adding polynomial terms or trend indicators.

• Too many variables in the model - the model is overdetermined.

• Theoretical specification is wrong: inappropriate construction of theory, or even of measurement.

Multicollinearity: Remedies

• Increase sample size

• Omit Variables

• Scale Construction/Transformation

• Factor Analysis

• Constrain the estimation, e.g. set the value of one coefficient relative to another.

• Ignore it: report adjusted R² and claim it warrants retention in the model.

Model Specification: Definition

Specification error:

• covers any mistake in the set of assumptions of a model and the associated inference procedures

• but has come to be used mainly for errors in specifying the data matrix X.

Model Specification: Definition (cont.)

• There are basically 4 types of misspecification we need to examine:

o exclusion of a relevant variable

o inclusion of an irrelevant variable

o incorrect functional form

o measurement error and a misspecified error term

Too Many or Too Few Variables

• What happens if we include variables in our specification that don't belong? There is no effect on our parameter estimates, and OLS remains unbiased.

• What if we exclude a variable from our specification that does belong? OLS will usually be biased.

1. Omission of a Relevant Variable: Exclusion/Underfitting Bias

Suppose the true model is

  y = β₀ + β₁x₁ + β₂x₂ + u,

but we estimate

  ỹ = β̃₀ + β̃₁x₁ + ũ.

Then

  β̃₁ = Σ(x_{i1} − x̄₁) y_i / Σ(x_{i1} − x̄₁)²

Omitted Variable Bias

But y_i = β₀ + β₁x_{i1} + β₂x_{i2} + u_i, so the numerator of

  β̃₁ = Σ(x_{i1} − x̄₁) y_i / Σ(x_{i1} − x̄₁)²

becomes

  Σ(x_{i1} − x̄₁)(β₀ + β₁x_{i1} + β₂x_{i2} + u_i)
  = β₁ Σ(x_{i1} − x̄₁)² + β₂ Σ(x_{i1} − x̄₁)x_{i2} + Σ(x_{i1} − x̄₁)u_i

Omitted Variable Bias Omitted Variable Bias (cont)(cont)

( ) ( )

( )∑∑

∑∑

∑∑

−β+β=β

=

−+

−β+β=β

211i

2i11i211

i

211i

i11i

211i

2i11i21

)xx(

xxx)

~(E

have we nsexpectatio taking 0,)uE( since

)xx(

uxx

)xx(

xxx~

( )( )∑

∑−

−=β

211i

i11i1

xx

yxx~

CDS M Phil Econometrics Vijayamohan

Omitted Variable Bias (cont.)

Consider the regression of x₂ on x₁: x̃₂ = δ̃₀ + δ̃₁x₁. Then

  δ̃₁ = [Σ(x_{i1} − x̄₁)x_{i2}] / [Σ(x_{i1} − x̄₁)²],

so

  E(β̃₁) = β₁ + β₂ δ̃₁
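The result E(β̃₁) = β₁ + β₂δ̃₁ can be illustrated numerically with made-up data. If y is generated without noise (so expectations are exact), the short-regression slope equals β₁ + β₂δ̃₁ exactly:

```python
b0, b1, b2 = 1.0, 2.0, 3.0
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.0, 1.0, 2.0, 3.0, 5.0]     # correlated with x1
y = [b0 + b1 * a + b2 * c for a, c in zip(x1, x2)]  # noiseless true model

def slope(x, z):
    """OLS slope from regressing z on x (with an intercept)."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    return sum((a - mx) * (c - mz) for a, c in zip(x, z)) / \
        sum((a - mx) ** 2 for a in x)

b1_short = slope(x1, y)   # short regression: y on x1 only
delta1 = slope(x1, x2)    # auxiliary regression: x2 on x1

# omitted-variable bias: b1_short = b1 + b2 * delta1
print(round(b1_short, 6), round(b1 + b2 * delta1, 6))
```

Here the bias term β₂δ̃₁ is large and positive, so the short regression badly overstates the effect of x₁.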

Mis-specified Models

For the mis-specified model ỹ = β̃₀ + β̃₁x₁:

  Var(β̃₁) = σ²_u / Σx₁²

while for the correctly specified model

  var(β̂₁) = σ²_u / [(1 − R²₁.₂) Σx₁²]

Thus Var(β̃₁) < Var(β̂₁) unless x₁ and x₂ are uncorrelated.

Misspecified Models (cont.)

• While the variance of the estimator is smaller for the misspecified model, unless β₂ = 0 the misspecified model is biased.

• As the sample size grows, the variance of each estimator shrinks to zero, making the variance difference less important.

2. Inclusion of an Irrelevant Variable: Inclusion/Overfitting Bias

Suppose this time the "true" model is

  y = X₁β₁ + u,

but we estimate

  y = X₁b₁ + X₂b₂ + e.

This equation is over-specified; it tends to occur when researchers adopt a "kitchen sink" approach to model building. This specification error does not lead to bias: both the parameters and the error variance are unbiasedly estimated.

Inclusion of an Irrelevant Variable: Inclusion/Overfitting Bias

  E(b₁) = β₁,  E(b₂) = 0

However, the estimates are inefficient: including irrelevant variables raises the standard errors of our coefficient estimates.

3. Functional Form Mis-specification

A third type of mis-specification occurs when we adopt an incorrect functional form. For example, we estimate a linear regression model whereas the "true" regression model is log-linear.

Functional Form Mis-specification: Implications

• Incorrect functional form can result in autocorrelation or heteroskedasticity.

• Next class.

Functional Form Mis-specification: Causes

• Theoretical design:

o something is omitted,

o irrelevantly included,

o mismeasured, or

o non-linear.

Functional Form Mis-specification: Tests

• Actual specification tests:

o No test can reveal poor theoretical construction per se.

o The best indicator: the model has some undesirable statistical property; e.g. a misspecified functional form will often be indicated by a significant test for autocorrelation.

o Sometimes time-series models will have negative autocorrelation as a result of poor design.

Specification Error Tests

A common test for mis-specification is Ramsey's regression specification error test (RESET) (Ramsey, 1969). Estimate

  y = Xβ + Zα + u,  Z = [ŷ² ŷ³ ŷ⁴]

where Z contains powers of the predicted values of the dependent variable. The test is H₀: α = 0. Essentially the Z variables are 'proxying' for other possible variables (or nonlinearities).

Specification Error Tests

We then perform a standard F-test on the significance of the additional variables:

  F = [(e*′e* − e′e)/J] / [e′e/(n − k)]

or, equivalently,

  F = [(R² − R*²)/J] / [(1 − R²)/(n − k)]

where R² is from the model with the Z variables, R*² is from the model without the Z variables, J is the number of Z variables, and k is the number of parameters in the extended model. If the estimated F value is significant, accept the hypothesis that the model without the Z variables is mis-specified.
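A sketch of the RESET mechanics with hypothetical data containing a quadratic signal. Only ŷ² is used as the Z variable here (so J = 1, rather than the three powers above), and the code also verifies that the SSR form and the R² form of the F statistic coincide:

```python
# Hypothetical data with curvature, so a linear fit is mis-specified
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 4.1, 9.3, 15.8, 25.1, 35.9]   # roughly y = x^2
n = len(x)

def fitted_line(x, y):
    """Fitted values from the simple regression of y on x."""
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / \
        sum((a - mx) ** 2 for a in x)
    a0 = my - b * mx
    return [a0 + b * ai for ai in x]

yhat = fitted_line(x, y)        # restricted model: y on x
z = [v ** 2 for v in yhat]      # RESET regressor: yhat^2 (J = 1)

# unrestricted model: y on x and z (2x2 normal equations in deviations)
mx, mz, my = sum(x) / n, sum(z) / n, sum(y) / n
sxx = sum((a - mx) ** 2 for a in x)
szz = sum((c - mz) ** 2 for c in z)
sxz = sum((a - mx) * (c - mz) for a, c in zip(x, z))
sxy = sum((a - mx) * (d - my) for a, d in zip(x, y))
szy = sum((c - mz) * (d - my) for c, d in zip(z, y))
det = sxx * szz - sxz ** 2
bx = (szz * sxy - sxz * szy) / det
bz = (sxx * szy - sxz * sxy) / det
a0 = my - bx * mx - bz * mz

ee_star = sum((d - f) ** 2 for d, f in zip(y, yhat))        # restricted SSR
ee = sum((d - (a0 + bx * a + bz * c)) ** 2
         for d, a, c in zip(y, x, z))                       # unrestricted SSR
sst = sum((d - my) ** 2 for d in y)
R2, R2_star = 1 - ee / sst, 1 - ee_star / sst

J, k = 1, 3   # one Z variable; 3 parameters in the extended model
F_ssr = ((ee_star - ee) / J) / (ee / (n - k))
F_r2 = ((R2 - R2_star) / J) / ((1 - R2) / (n - k))
print(round(F_ssr, 4), round(F_r2, 4))   # the two forms agree
```

Because the data are curved, adding ŷ² cuts the SSR sharply and the F statistic is large, flagging the linear specification as inadequate.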

Ramsey's regression specification error test (RESET): An Example

In Stata: Statistics → Linear models and related → Regression diagnostics → Specification tests, etc.

Model Specification: Tests

• Specification criteria for lagged designs:

o Most useful for comparing time series models with the same set of variables but differing numbers of parameters:

  AIC (Akaike Information Criterion)

  SC (Schwarz Criterion)

Model Specification: Remedies

• Model building:

o Hendry and the LSE school of "top-down" modelling.

o Nested models.

o Stepwise regression: a process of including the variables in the model "one step at a time."