OLS: Violation of Assumptions
Non-Normality, Multicollinearity, Specification Error

CDS M Phil Econometrics
Vijayamohanan Pillai N
23/10/2009
Assumptions of Regression
An acronym NOLINE:
• Non-stochastic X
• Orthogonal X and Error
• Linearity
• Independence of Errors
• Normality of Error
• Equal Variance (Homoscedasticity)
Assumption Violations:
How we will approach the question.
• Definition
• Implications
• Causes
• Tests
• Remedies
Assumption Violations
• Problems with u:
  • The disturbances are not normally distributed
  • The variance parameters in the variance-covariance matrix are different
  • The disturbance terms are correlated
Assumption Violations:

• Problems with X:
  • The explanatory variables and the disturbance term are correlated
  • There is high linear dependence between two or more explanatory variables
  • Incorrect model: e.g. exclusion of relevant variables; inclusion of irrelevant variables; incorrect functional form
Residual Analysis

• The residual for observation i, eᵢ, is the difference between its observed and predicted value:
  eᵢ = Yᵢ − Ŷᵢ
• Check the assumptions of regression by examining the residuals
• Graphical Analysis of Residuals
Residual Analysis

• Check the assumptions of regression by examining the residuals:
  o Examine for linearity assumption
  o Evaluate independence assumption
  o Evaluate normal distribution assumption
  o Examine for constant variance for all levels of X (homoscedasticity)
• Graphical Analysis of Residuals:
  o Can plot residuals vs. X
Residual Analysis or Model Adequacy Tests
Model adequacy diagnosis: an important stage, before hypothesis testing, in forecast modelling.

The fitted model is said to be adequate if it explains the data set adequately, i.e., if the residual does not contain (or conceal) any 'explainable non-randomness' left over from the ('explained') model;
i.e., if the residual is purely random (white noise);
if all the OLS assumptions are satisfied.
Residual Analysis for Linearity

[Figure: Y vs. x and residuals vs. x, for "Not Linear" and "Linear" cases]
Residual Analysis for Independence

[Figure: residuals vs. X, for "Not Independent" and "Independent" cases]
(1) Non-zero Mean for the Residuals (Definition)

o The disturbances have a mean other than 0:
  E(uᵢ) ≠ 0 (i = 1, 2, …, n); i.e. E(u) > or < 0.
o Note that this refers to the true disturbances: the estimated (OLS) residuals always have a mean of 0 (when an intercept is included), even when the true disturbances do not.
Non-zero Mean for the Residuals (Implications)

• The true regression line is Yᵢ = a + bXᵢ + uᵢ, with E(u) = μ ≠ 0, so OLS in effect estimates the intercept (a + μ).
• Therefore the intercept is biased.
• The slope, b, is unbiased.
Non-zero Mean for the Residuals (Causes, Tests, Remedies)

• Causes: some form of specification error, e.g. omitted variables.
• We will discuss Tests and Remedies when we look closely at Specification Errors.
(2) Non-normally Distributed Errors: Definition

• The disturbances are not NID(0, σ²)
[Figure: histogram of the residuals of rate90, counts vs. residuals from −1000 to 2000]

Normality Tests:
Assumption   Value    Probability   Decision (5%)
Skewness     5.1766   0.000000      Rejected
Kurtosis     4.6390   0.000004      Rejected
Non-normally Distributed Errors: Implications

o The model is to some degree misspecified.
o A collection of truly stochastic disturbances should have a normal distribution: the central limit theorem states that as the number of random variables increases, the sum of their distributions tends to a normal distribution.
Non-normally Distributed Errors: Implications (cont.)

• If the residuals are not normally distributed, then the estimators of a and b are also not normally distributed.
• Estimates are, however, still BLUE: unbiased and with minimum variance.
• BUT they are no longer asymptotically efficient, even though they are asymptotically unbiased (consistent).
• Asymptotically efficient?
Non-normally Distributed Errors: Implications (cont.)

• An estimator is asymptotically efficient if it is consistent, asymptotically normally distributed, and has an asymptotic covariance matrix that is not larger than that of any other similar estimator.
• If the residuals are normally distributed, then the LS estimator is also the ML estimator.
• MLEs are asymptotically efficient among consistent and asymptotically normally distributed estimators.
• If residuals are non-normal, it is only our hypothesis tests which are affected.
Non-normally Distributed Errors: Causes

• Generally caused by a misspecification error, usually an omitted variable.
• Can also result from:
  o Outliers in data.
  o Wrong functional form.
Non-Normality Tests: Residual Analysis for Normality

A normal probability plot of the residuals can be used to check for normality: the plotted points should be reasonably linear.

[Figure: normal probability plot, percent vs. residual, points close to a straight line]
Non-Normality Tests: Residual Analysis for Normality

Residual distribution positively skewed: the plotted points lie above the comparison line in both tails of the distribution.

[Figure: normal probability plot for a positively skewed residual distribution]
Non-Normality Tests: Residual Analysis for Normality

Residual distribution heavy-tailed: the plotted points in the upper tail lie above the comparison line and those in the lower tail below the line.

[Figure: normal probability plot for a heavy-tailed residual distribution]
Non-normally Distributed Errors: Tests for Non-normality (cont.)

• Jarque-Bera test
  o This test examines both the skewness and kurtosis of a distribution to test for normality:
    JB = (n/6)[S² + (K − 3)²/4],
    where S is the skewness and K is the kurtosis of the residuals.
  o JB has a χ² distribution with 2 df.
  o H₀: S = 0, K = 3 (residuals normal).
  o If the estimated JB is near zero (p-value > 0.05), do not reject H₀.
A portmanteau test, since the four lowest moments about the origin are used jointly in its calculation.
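As a quick illustration (not from the slides), the JB statistic can be computed by hand from the residual moments; the residual values below are made-up numbers chosen only to show the arithmetic:

```python
def jarque_bera(resid):
    """Jarque-Bera statistic from raw residuals (moments about the mean)."""
    n = len(resid)
    mean = sum(resid) / n
    m2 = sum((e - mean) ** 2 for e in resid) / n
    m3 = sum((e - mean) ** 3 for e in resid) / n
    m4 = sum((e - mean) ** 4 for e in resid) / n
    S = m3 / m2 ** 1.5            # skewness: 0 under normality
    K = m4 / m2 ** 2              # kurtosis: 3 under normality
    JB = (n / 6.0) * (S ** 2 + (K - 3.0) ** 2 / 4.0)
    return S, K, JB

# perfectly symmetric (hypothetical) residuals: skewness is exactly zero,
# so only the kurtosis term contributes to JB
S, K, JB = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
```

In practice one would compare JB against the χ²(2) critical value (5.99 at 5%) rather than eyeball it.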
Non-normally Distributed Errors: Tests for Non-normality (cont.)
Jarque, Carlos M.; Anil K. Bera (1980). "Efficient tests for normality, homoscedasticity and serial independence of regression residuals". Economics Letters 6 (3): 255–259.

Jarque, Carlos M.; Anil K. Bera (1981). "Efficient tests for normality, homoscedasticity and serial independence of regression residuals: Monte Carlo evidence". Economics Letters 7 (4): 313–318.
Non-normality Tests: An Example

Variable name: resid; label: residuals
Residual vs. Predictor Plot (Non-normality Tests: An Example)

[Figure: residuals (−50 to 100) plotted against square feet (1000 to 2500)]
Residual vs. Fit Plot (Non-normality Tests: An Example)

[Figure: residuals (−50 to 100) plotted against fitted values (200 to 400)]
Histogram (Non-normality Tests: An Example)

[Figure: histogram of residuals (−50 to 100), with density on the vertical axis]
Stata Normality Tests

Statistics: Summaries, tables and tests: Distributional plots and tests:
  Skewness and kurtosis normality tests
  Shapiro-Wilk normality tests
  Shapiro-Francia normality tests
Normal Probability Plot (Non-normality Tests: An Example)

In Stata: Graphics: Distributional graphs: Normal probability plot

[Figure: normal probability plot of the residuals: Normal F[(resid − m)/s] against Empirical P[i] = i/(N+1)]
Non-normally Distributed Errors: Remedies

• Try to modify your theory:
  • Omitted variable?
  • Outlier needing specification?
• Modify your functional form by taking some variance-transforming step, such as square root, exponentiation, logs, etc.
Multicollinearity: Definition

• Multicollinearity: the condition where the independent variables are related to each other. Causation is not implied by multicollinearity.
• As any two (or more) variables become more and more closely correlated, the condition worsens and 'approaches singularity'.
• Since the X's are supposed to be fixed, this is a sample problem.
• Since multicollinearity is almost always present, it is a problem of degree, not merely existence.
Multicollinearity: Implications

• Consider the 2-explanatory-variable model:
  Yᵢ = β₀ + β₁X₁ᵢ + β₂X₂ᵢ + Uᵢ
• In matrix format, Y = Xβ + U, with
  β̂ = (x'x)⁻¹(x'y)
  Var(β̂) = σ²ᵤ (x'x)⁻¹
Multicollinearity: Implications

• Y = Xβ + U
• In mean-deviations, we have:

  (x'x) = | Σx₁²    Σx₁x₂ |
          | Σx₁x₂   Σx₂²  |

  (x'x)⁻¹ = (1/D) |  Σx₂²    −Σx₁x₂ |
                  | −Σx₁x₂    Σx₁²  |

  (x'y) = | Σx₁y |
          | Σx₂y |

  where D = |x'x| = Σx₁² Σx₂² − (Σx₁x₂)²
Multicollinearity: Implications

• Yᵢ = β₀ + β₁X₁ᵢ + β₂X₂ᵢ + Uᵢ
• With β̂ = (x'x)⁻¹(x'y), the first slope estimate is:

  β̂₁ = (Σx₁y Σx₂² − Σx₂y Σx₁x₂) / (Σx₁² Σx₂² − (Σx₁x₂)²)
Multicollinearity: Implications

• Yᵢ = β₀ + β₁X₁ᵢ + β₂X₂ᵢ + Uᵢ
• With Var(β̂) = σ²ᵤ (x'x)⁻¹:

  Var(β̂₁) = σ²ᵤ Σx₂² / (Σx₁² Σx₂² − (Σx₁x₂)²)

  var(β̂₁) = σ²ᵤ / [(1 − R²₁.₂) Σx₁²]

• R²₁.₂ = R² in the regression of x₁ on all the other variables
Multicollinearity: Implications

Consider the following cases:

A) No multicollinearity

X₁, X₂ orthogonal: Cov(X₁, X₂) = R²₁.₂ = 0.

The regression would appear to be identical to separate bivariate regressions, in both coefficients and variances:

  β̂₁ = (Σx₁y Σx₂² − Σx₂y Σx₁x₂) / (Σx₁² Σx₂² − (Σx₁x₂)²) reduces to β̂₁ = Σx₁y / Σx₁²

  var(β̂₁) = σ²ᵤ / [(1 − R²₁.₂) Σx₁²] reduces to var(β̂₁) = σ²ᵤ / Σx₁²
B) Perfect Multicollinearity

Given X = (x₁, x₂, …, xₖ), xᵢ: the i-th column of X with n observations.

If xᵢ is a perfect linear combination of one or more other variables xⱼ,

  c₁x₁ + c₂x₂ + … + cₖxₖ = 0,

where the constants (cᵢ) are not all zero, then X'X is singular: |X'X| = 0.
The matrix does not have full rank.
Multicollinearity: Implications

For example, if x₁ = kx₂, then the variables x₁ and x₂ are exactly linearly related: the matrix X'X is singular and the inverse does not exist.

Substitute x₁ = kx₂ in the following and see:

  β̂₁ = (Σx₁y Σx₂² − Σx₂y Σx₁x₂) / (Σx₁² Σx₂² − (Σx₁x₂)²)

(Both numerator and denominator vanish, so β̂₁ is indeterminate.)
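A small numerical sketch (hypothetical data) of why |X'X| vanishes under exact collinearity: with x₁ = kx₂, the determinant of the 2×2 cross-product matrix is identically zero:

```python
# Two regressors in mean-deviation form, with x1 = k * x2 exactly
x2 = [-1.5, -0.5, 0.5, 1.5]            # hypothetical values, mean zero
k = 2.0
x1 = [k * v for v in x2]               # perfect collinearity

s11 = sum(a * a for a in x1)           # sum of x1^2
s22 = sum(b * b for b in x2)           # sum of x2^2
s12 = sum(a * b for a, b in zip(x1, x2))  # sum of x1*x2

# |x'x| = (sum x1^2)(sum x2^2) - (sum x1 x2)^2
det = s11 * s22 - s12 ** 2
```

Since s11 = k²·s22 and s12 = k·s22, the determinant is k²·s22² − k²·s22² = 0, so (x'x)⁻¹ does not exist.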
Multicollinearity: Implications

The more common problem: imperfect or near multicollinearity, where two or more of the explanatory variables are approximately linearly related:

  c₁x₁ + c₂x₂ + … + cₖxₖ ≈ 0
Multicollinearity: Implications

For example: x₁ = kx₂ + v (where v is a random element):
the higher the correlation r between x₁ and x₂, the greater the prevalence of multicollinearity.
(Naturally, without v, r = 1 by definition.)
Multicollinearity: Implications

The impact of multicollinearity on the standard errors of regression estimates:

In the context of a simple (bivariate) regression:

  Yᵢ = β₁ + β₂Xᵢ + uᵢ

  var(β̂₂) = σ²ᵤ / Σxᵢ², where xᵢ = (Xᵢ − X̄)
Multicollinearity: Implications

With several explanatory variables:

  var(β̂ₖ) = σ²ᵤ / [(1 − R²ₖ.) Σxₖ²]

where R²ₖ. is the coefficient of determination when Xₖ is regressed on all the other explanatory variables.
Multicollinearity: Implications

The variance (and standard error) of the coefficients depends not only on the standard error of the regression and the variation in the explanatory variable in question, but also on R²ₖ.:

  var(β̂ₖ) = σ²ᵤ / [(1 − R²ₖ.) Σxₖ²]
Multicollinearity: Implications

  var(β̂ₖ) = σ²ᵤ / [(1 − R²ₖ.) Σxₖ²]

Ceteris paribus:
• As the correlation of Xₖ with the other variables rises, var(β̂ₖ) rises.
• As the variation in Xₖ rises, var(β̂ₖ) falls.
• As the overall fit of the regression rises, var(β̂ₖ) falls.
Multicollinearity: Implications

• If the independent variables are highly correlated, var(β̂ₖ) is inflated:
  • t ratios are lower,
  • β̂ₖ appears insignificant,
• and R² tends to be high as well:
  • significant F,
  • sign changes occur with the introduction of a new variable.
• The β̂ₖ are still BLUE (but useless).
Multicollinearity: Tests/Indicators

Non-experimental data are never orthogonal (R²ₖ. = 0), so to some extent multicollinearity is always present.

When is multicollinearity a problem? Some diagnostic statistics:
Multicollinearity: Tests/Indicators

1. Klein's Rule of Thumb (Lawrence Klein: 1980)

Multicollinearity is a problem if R²ₖ. from the auxiliary regression > the overall R².

2. Variance inflation factor (VIF):

  VIF = 1 / (1 − R²ₖ.)

"The most useful single diagnostic guide": J Johnston (1984)
Interpreting VIFs

• No multicollinearity: VIF = 1.0.
• If the VIF for a variable were 9, its standard error would be three times as large as it would be if its VIF were 1.
• In such a case, the coefficient would have to be 3 times as large to be statistically significant.
Interpreting VIFs

• If the VIF is greater than 10.0, then multicollinearity is probably severe: 90% of the variance of Xⱼ is explained by the other Xs.
• In small samples, a VIF of about 5.0 may indicate problems.
Multicollinearity: Tests/Indicators (cont.)

• Also Tolerance:

  TOLₖ = 1/VIF = (1 − R²ₖ.)

• If the tolerance equals 1, the variables are unrelated.
• If TOLⱼ = 0, then they are perfectly correlated.
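For the two-regressor case, R²₁.₂ is just the squared correlation r² between x₁ and x₂, so VIF and TOL can be sketched directly; the data below are hypothetical, chosen to be nearly collinear:

```python
import math

# hypothetical regressors, nearly collinear
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1]

n = len(x1)
m1, m2 = sum(x1) / n, sum(x2) / n
cov = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
v1 = sum((a - m1) ** 2 for a in x1)
v2 = sum((b - m2) ** 2 for b in x2)
r = cov / math.sqrt(v1 * v2)     # sample correlation of x1, x2

# with two regressors, the auxiliary R-squared is r^2
vif = 1.0 / (1.0 - r ** 2)       # variance inflation factor
tol = 1.0 / vif                  # tolerance = 1 - r^2
```

Here r is close to 1, so the VIF blows up well past the rule-of-thumb threshold of 10, flagging severe multicollinearity.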
Multicollinearity: Tests/Indicators

How large does a VIF value have to be to be "large enough"? Belsley (1991) suggests:

1. Eigenvalues of X'X
2. Condition index (CI), and
3. Condition number (CN)
Multicollinearity: Tests/Indicators

Given the eigenvalues λ₁ > λ₂ > λ₃ > …:

  CIⱼ = (λ₁/λⱼ); j = 1, 2, 3, …

Stata/SPSS reports the square root of CIⱼ.

  CN = sqrt(Max eigenvalue / Min eigenvalue), or CN = Max CIⱼ.
Multicollinearity: Tests/Indicators

Largest CI = 5–10: no problem
Largest CI = 30–100: problematic
Largest CI = 1000–3000: severe problem

See:
1. D. A. Belsley, 1991, Conditioning Diagnostics: Collinearity and Weak Data in Regression, Wiley
2. D. A. Belsley, E. Kuh and R. E. Welsch, 1980, Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, Wiley
3. Norman R. Draper and Harry Smith, 2003, Applied Regression Analysis, 3rd Ed., Wiley
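A minimal sketch of the condition-number calculation for a 2×2 cross-product matrix, using the closed-form eigenvalues of a symmetric 2×2 matrix; the matrix entries are hypothetical, with the regressors standardized so the off-diagonal term is their correlation:

```python
import math

# hypothetical X'X for two standardized regressors: [[a, b], [b, c]]
a, c = 1.0, 1.0
b = 0.9                              # correlation between the regressors

# eigenvalues of a symmetric 2x2 matrix from trace and determinant
tr = a + c
det = a * c - b * b
disc = math.sqrt(tr ** 2 - 4.0 * det)
lam_max = (tr + disc) / 2.0          # largest eigenvalue
lam_min = (tr - disc) / 2.0          # smallest eigenvalue

cn = math.sqrt(lam_max / lam_min)    # condition number, sqrt(lambda_max/lambda_min)
```

With b = 0.9 the eigenvalues are 1.9 and 0.1, giving CN = sqrt(19) ≈ 4.36: modest by the thresholds above, even though the correlation is high.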
Multicollinearity: Tests/Indicators: An Example
In Stata: Postestimation statistics for regress: Reports and statistics: Variance inflation factors

For Condition Index and Number: download the collin command.
Multicollinearity: Tests/Indicators: An Example

[Stata collin output, reporting Sqrt(1/λ₁), Sqrt(1/λ₂), Sqrt(1/λ₃)]
Multicollinearity: Causes

• Sampling mechanism: poorly constructed design & measurement scheme, or limited range.
• Statistical model specification: adding polynomial terms or trend indicators.
• Too many variables in the model: the model is overdetermined.
• Theoretical specification is wrong: inappropriate construction of theory or even measurement.
Multicollinearity: Remedies

• Increase sample size
• Omit variables
• Scale construction/transformation
• Factor analysis
• Constrain the estimation, such as the case where you can set the value of one coefficient relative to another.
• Ignore it: report adjusted R² and claim it warrants retention in the model.
Model Specification: Definition

Specification error:
• covers any mistake in the set of assumptions of a model and the associated inference procedures,
• but it has come to be used for errors in specifying the data matrix X.
Model Specification: Definition (cont.)

• There are basically 4 types of misspecification we need to examine:
  o exclusion of a relevant variable
  o inclusion of an irrelevant variable
  o functional form
  o measurement error and misspecified error term
Too Many or Too Few Variables

• What happens if we include variables in our specification that don't belong?
• There is no effect on our parameter estimates, and OLS remains unbiased.
• What if we exclude a variable from our specification that does belong?
• OLS will usually be biased.
1. Omission of a Relevant Variable: Exclusion/Underfitting Bias

Suppose the true model is:

  y = β₀ + β₁x₁ + β₂x₂ + u,

but we estimate:

  y = β̃₀ + β̃₁x₁ + u.

Then

  β̃₁ = Σ(x₁ᵢ − x̄₁)yᵢ / Σ(x₁ᵢ − x̄₁)²
Omitted Variable Bias

  β̃₁ = Σ(x₁ᵢ − x̄₁)yᵢ / Σ(x₁ᵢ − x̄₁)²

But y = β₀ + β₁x₁ + β₂x₂ + u, so the numerator becomes:

  Σ(x₁ᵢ − x̄₁)(β₀ + β₁x₁ᵢ + β₂x₂ᵢ + uᵢ)
    = β₁ Σ(x₁ᵢ − x̄₁)² + β₂ Σ(x₁ᵢ − x̄₁)x₂ᵢ + Σ(x₁ᵢ − x̄₁)uᵢ
Omitted Variable Bias (cont.)

Dividing by the denominator:

  β̃₁ = β₁ + β₂ [Σ(x₁ᵢ − x̄₁)x₂ᵢ / Σ(x₁ᵢ − x̄₁)²] + [Σ(x₁ᵢ − x̄₁)uᵢ / Σ(x₁ᵢ − x̄₁)²]

Since E(u) = 0, taking expectations we have:

  E(β̃₁) = β₁ + β₂ [Σ(x₁ᵢ − x̄₁)x₂ᵢ / Σ(x₁ᵢ − x̄₁)²]
Omitted Variable Bias (cont.)

Consider the regression of x₂ on x₁:

  x₂ = δ̃₀ + δ̃₁x₁, where δ̃₁ = Σ(x₁ᵢ − x̄₁)x₂ᵢ / Σ(x₁ᵢ − x̄₁)²,

so

  E(β̃₁) = β₁ + β₂ δ̃₁.
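The result E(β̃₁) = β₁ + β₂δ̃₁ can be checked on exact (error-free) hypothetical data: the short-regression slope then differs from β₁ by exactly β₂δ̃₁:

```python
# true model y = b0 + b1*x1 + b2*x2, with no error term,
# and x2 an exact linear function of x1 (hypothetical numbers)
b0, b1, b2 = 1.0, 2.0, 3.0
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [0.5 * v + 1.0 for v in x1]          # x2 = 0.5*x1 + 1
y = [b0 + b1 * a + b2 * b for a, b in zip(x1, x2)]

def ols_slope(x, y):
    """Bivariate OLS slope: sum of cross-deviations over sum of squared deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (c - my) for a, c in zip(x, y))
    den = sum((a - mx) ** 2 for a in x)
    return num / den

delta1 = ols_slope(x1, x2)    # auxiliary regression of x2 on x1
b1_tilde = ols_slope(x1, y)   # short regression of y on x1 alone (x2 omitted)
bias = b1_tilde - b1          # should equal b2 * delta1
```

Here δ̃₁ = 0.5, so the short-regression slope is 2 + 3(0.5) = 3.5: biased upward by exactly β₂δ̃₁.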
Mis-specified Models

For the mis-specified model

  y = β̃₀ + β̃₁x₁ + u,

  Var(β̃₁) = σ²ᵤ / Σx₁²,

whereas in the full model

  var(β̂₁) = σ²ᵤ / [(1 − R²₁.₂) Σx₁²].

Thus Var(β̃₁) < Var(β̂₁), unless x₁ and x₂ are uncorrelated.
Model Specification: Implications

• If an omitted variable is correlated with the included variables, the estimates are biased as well as inconsistent.
• In addition, the error variance is incorrect, and usually overestimated.
• If the omitted variable is uncorrelated with the included variables, the error-variance estimate is still biased, even though the β̂s are not.
Misspecified Models (cont.)

• While the variance of the estimator is smaller for the misspecified model, unless β₂ = 0 the misspecified model is biased.
• As the sample size grows, the variance of each estimator shrinks to zero, making the variance difference less important.
2. Inclusion of an Irrelevant Variable: Inclusion/Overfitting Bias

Suppose this time the "true" model is:

  y = X₁β₁ + u,

but we estimate:

  y = X₁b₁ + X₂b₂ + e.

This equation is over-specified; it tends to occur when researchers adopt a "kitchen sink" approach to model building.

This specification error does not lead to bias: both the parameters and the error variance are unbiasedly estimated.
Inclusion of an Irrelevant Variable: Inclusion/Overfitting Bias (cont.)

  E(b₁) = β₁, E(b₂) = 0.

However, the estimates are inefficient: that is to say, including irrelevant variables raises the standard errors of our coefficient estimates.
3. Functional Form Mis-specification

A third type of mis-specification occurs when we adopt an incorrect functional form. For example, we estimate a linear regression model whereas the "true" regression model is log-linear.
Functional Form Mis-specification: Implications

• Incorrect functional form can result in autocorrelation or heteroskedasticity.
• Next class.
Functional Form Mis-specification: Causes

• Theoretical design:
  o something is omitted,
  o irrelevantly included,
  o mismeasured, or
  o non-linear.
Functional Form Mis-specification: Tests

• Actual specification tests:
  o No test can reveal poor theoretical construction per se.
  o The best indicator: the model has some undesirable statistical property;
  o e.g. a misspecified functional form will often be indicated by a significant test for autocorrelation.
  o Sometimes time-series models will have negative autocorrelation as a result of poor design.
Specification Error Tests

A common test for mis-specification is Ramsey's regression specification error test (RESET): Ramsey (1969).

  y = Xβ + Zα + u, with Z = [ŷ², ŷ³, ŷ⁴]

Z contains powers of the predicted values of the dependent variable. The test is H₀: α = 0.

Essentially the Z variables are 'proxying' for other possible variables (or non-linearities).
Specification Error Tests (cont.)

We then perform a standard F-test on the significance of the additional variables:

  F = [(e*'e* − e'e)/J] / [e'e/(n − k)]

or

  F = [(R² − R*²)/J] / [(1 − R²)/(n − k)]

where R² is from the model with the Z variables, R*² is from the model without the Z variables, and J is the number of added variables.

If the estimated F value is significant, accept the hypothesis that the model without the Z variables is mis-specified.
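A sketch of the R²-based F computation above; n, k, J and the two R² values are hypothetical numbers made up purely to show the arithmetic:

```python
# hypothetical setup: restricted model (no Z) vs. model with J = 2 powers of yhat
n = 50            # sample size
k = 5             # regressors in the augmented model (incl. constant and Z terms)
J = 2             # number of added Z variables
r2_full = 0.90    # R-squared with the Z variables
r2_restr = 0.85   # R-squared without the Z variables

# F = [(R^2 - R*^2)/J] / [(1 - R^2)/(n - k)]
F = ((r2_full - r2_restr) / J) / ((1.0 - r2_full) / (n - k))
```

With these numbers F ≈ 11.25 on (J, n − k) = (2, 45) df, which would be significant at conventional levels, so the restricted (no-Z) model would be judged mis-specified.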
Ramsey's Regression Specification Error Test (RESET): An Example

In Stata: Statistics: Linear models and related: Regression diagnostics: Specification tests, etc.
Model Specification: Tests

• Specification criteria for lagged designs:
  o Most useful for comparing time-series models with the same set of variables but differing numbers of parameters:
    AIC (Akaike Information Criterion)
    Schwarz Criterion