TRANSCRIPT
OLS: Violation of Assumptions
Non-Normality, Multicollinearity, Specification Error

CDS M Phil Econometrics
Vijayamohanan Pillai N
Assumption Violations: How we will approach the question.
• Definition
• Implications
• Causes
• Tests
• Remedies
Assumption Violations
• Problems with u:
• The disturbances are not normally distributed
• The variance terms on the diagonal of the variance-covariance matrix are not constant across observations (heteroscedasticity)
• The disturbance terms are correlated (autocorrelation)
Assumption Violations: Problems with X:
•The explanatory variables and the disturbance term are correlated
•There is high linear dependence between two or more explanatory variables
•Incorrect model – e.g. exclusion of relevant variables; inclusion of irrelevant variables; incorrect functional form
Residual Analysis
• The residual for observation i, $e_i$, is the difference between its observed and predicted value:
$e_i = Y_i - \hat{Y}_i$
• Check the assumptions of regression by examining the residuals
• Graphical Analysis of Residuals
Residual Analysis or Model Adequacy Tests
$e_i = Y_i - \hat{Y}_i$
Model adequacy diagnosis: an important stage before hypothesis testing in forecast modelling.
The fitted model is said to be adequate if it explains the data set adequately, i.e., if the residual does not contain (or conceal) any ‘explainable non-randomness’ left over from the (‘explained’) model –
i.e., if the residual is purely random (white noise) –
i.e., if all the OLS assumptions are satisfied.
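A minimal sketch of this first step in Stata (hypothetical dataset and variable names; the residual and plot commands are standard postestimation tools):
regress price sqft              // hypothetical model
predict e, residuals            // e_i = Y_i - Yhat_i
predict yhat, xb                // fitted values
rvfplot                         // residuals vs. fitted values
rvpplot sqft                    // residuals vs. the predictor
histogram e, normal             // distribution of the residuals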
Residual Analysis for Linearity
[Figure: Y vs. x and residuals vs. x, for a ‘Not Linear’ and a ‘Linear’ relationship]
Residual Analysis for Independence
[Figure: residuals vs. X for a ‘Not Independent’ and an ‘Independent’ case]
(1) Non-zero Mean for the residuals (Definition)
o The residuals have a mean other than 0:
$E(u_i) \neq 0 \quad (i = 1, 2, \dots, n)$, i.e. E(u) > 0 or E(u) < 0.
o Note that this refers to the true disturbances: the estimated (OLS) residuals always have a mean of 0 by construction, even when the true disturbances do not.
Non-zero Mean for the residuals (Implications)
• If $E(e_i) = \mu_e \neq 0$, the true regression line can be written as
$Y_i = (a + \mu_e) + bX_i + (e_i - \mu_e)$
• Therefore the intercept is biased: OLS estimates $a + \mu_e$ rather than $a$.
• The slope, b, is unbiased.
Non-zero Mean for the residuals (Causes)
• Some form of specification error, e.g. omitted variables.
(2) Non-normally distributed errors: Definition
• The residuals are not NID(0, σ²) (normally and independently distributed)
[Figure: histogram of the residuals of rate90]
Normality Tests:
Assumption   Value     Probability   Decision (5%)
Skewness     5.1766    0.000000      Rejected
Kurtosis     4.6390    0.000004      Rejected
Non-normally distributed errors: Implications
o The model is to some degree misspecified.
o A collection of truly stochastic disturbances should have a normal distribution: the central limit theorem states that as the number of random variables increases, the sum of their distributions tends to a normal distribution.
Non-normally distributed errors: Implications (cont.)
• If the residuals are not normally distributed, then the estimators of a and b are also not normally distributed.
• Estimates are, however, still BLUE: unbiased and with minimum variance among linear unbiased estimators.
• BUT they are no longer asymptotically efficient, even though they are asymptotically unbiased (consistent).
Non-normally distributed errors: Implications (cont.)
• If the residuals are normally distributed, then the LS estimator is also the ML estimator.
• MLEs are asymptotically efficient among consistent and asymptotically normally distributed estimators.
• If the residuals are non-normal, it is only our hypothesis tests which are affected.
Non-normally distributed errors: Causes
• Generally caused by a specification error.
• Usually an omitted variable.
• Can also result from
o Outliers in data.
o Wrong functional form.
Non-Normality Tests: Residual Analysis for Normality
• A normal probability plot of the residuals can be used to check for normality:
[Figure: normal probability plot (percent vs. residual) – the plotted points are reasonably linear, indicating normality]
Non-Normality Tests: Residual Analysis for Normality (cont.)
• [Figure: normal probability plot – residual distribution positively skewed: the plotted points lie above the comparison line at both tails of the distribution]
Non-Normality Tests: Residual Analysis for Normality (cont.)
• [Figure: normal probability plot – residual distribution heavy-tailed: the plotted points in the upper tail lie above the comparison line and those in the lower tail below it]
Non-normally distributed errors: Tests for non-normality
• Jarque–Bera test
o This test examines both the skewness and kurtosis of a distribution to test for normality:
$JB = \dfrac{n}{6}\left(S^2 + \dfrac{(K-3)^2}{4}\right)$
where S is the skewness and K is the kurtosis of the residuals.
o JB has a χ² distribution with 2 df.
o H0: S = 0; K = 3 (residuals normal).
o If the estimated JB is near zero and the p-value > 0.05,
» do not reject H0.
A portmanteau test, since the four lowest moments about the origin are used jointly in its calculation.
Non-normally distributed errors: Tests for non-normality (cont.)
Jarque, Carlos M.; Bera, Anil K. (1980). “Efficient tests for normality, homoscedasticity and serial independence of regression residuals”. Economics Letters 6 (3): 255–259.
Jarque, Carlos M.; Bera, Anil K. (1981). “Efficient tests for normality, homoscedasticity and serial independence of regression residuals: Monte Carlo evidence”. Economics Letters 7 (4): 313–318.
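A minimal sketch of computing the JB statistic by hand in Stata after a regression (variable names hypothetical; summarize, detail returns the sample skewness and kurtosis used in the formula above):
regress y x1 x2                 // hypothetical model
predict e, residuals            // save the residuals
quietly summarize e, detail
scalar S  = r(skewness)
scalar K  = r(kurtosis)
scalar JB = (r(N)/6)*(S^2 + ((K-3)^2)/4)
display "Jarque-Bera = " JB ",  p-value = " chi2tail(2, JB)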
Non-normality Tests: An Example
Variable name: resid; label: residuals
Residual vs. Predictor Plot
[Figure: residuals plotted against square feet]
Residual vs. Fit Plot
[Figure: residuals plotted against fitted values]
Histogram
[Figure: histogram (density) of the residuals]
Stata Normality tests
Statistics: Summaries, tables and tests: Distributional plots and tests: Skewness and kurtosis normality tests; Shapiro–Wilk normality tests; Shapiro–Francia normality tests
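The corresponding commands (a sketch, assuming the residuals are stored in the variable resid):
sktest resid                    // skewness and kurtosis test for normality
swilk resid                     // Shapiro-Wilk W test
sfrancia resid                  // Shapiro-Francia W' test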
Normal Probability Plot
Graphics: Distributional graphs: Normal probability plot
[Figure: normal probability plot of the residuals – Normal F[(resid − m)/s] against Empirical P[i] = i/(N+1)]
Non-normally distributed errors: Remedies
• Try to modify your theory.
• Omitted variable?
• Outlier needing specification?
• Modify your functional form by taking some variance-transforming step such as square root, exponentiation, logs, etc. (a sketch follows below).
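For instance, a log transformation of the dependent variable (hypothetical variable names; which transformation is appropriate depends on the data):
gen ln_price = ln(price)        // log-transform the dependent variable
regress ln_price sqft
predict e_ln, residuals
sktest e_ln                     // re-check normality of the residuals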
Multicollinearity: Definition
• Multicollinearity: the condition where the independent variables are related to each other. Causation is not implied by multicollinearity.
• As any two (or more) variables become more and more closely correlated, the condition worsens, and ‘approaches singularity’.
• Since the X's are supposed to be fixed, this is a sample problem.
• Since multicollinearity is almost always present, it is a problem of degree, not merely existence.
Multicollinearity: Implications
• Consider a 2-explanatory-variable model:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + U_i$
• In matrix format, $Y = X\beta + U$, so that
$\hat{\beta} = (X'X)^{-1}X'y$ and $\mathrm{Var}(\hat{\beta}) = \sigma_u^2 (X'X)^{-1}$
• In mean-deviations, we have
$x'x = \begin{pmatrix} \sum x_1^2 & \sum x_1 x_2 \\ \sum x_1 x_2 & \sum x_2^2 \end{pmatrix}, \qquad x'y = \begin{pmatrix} \sum x_1 y \\ \sum x_2 y \end{pmatrix}$
$(x'x)^{-1} = \dfrac{1}{D}\begin{pmatrix} \sum x_2^2 & -\sum x_1 x_2 \\ -\sum x_1 x_2 & \sum x_1^2 \end{pmatrix}$
• where $D = |x'x| = \sum x_1^2 \sum x_2^2 - \left(\sum x_1 x_2\right)^2$
Multicollinearity: Implications
• $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + U_i$
• $\hat{\beta} = (x'x)^{-1}(x'y)$, so that
$\hat{\beta}_1 = \dfrac{\sum x_1 y \sum x_2^2 - \sum x_2 y \sum x_1 x_2}{\sum x_1^2 \sum x_2^2 - \left(\sum x_1 x_2\right)^2}$
Multicollinearity: Implications
• $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + U_i$
• $\mathrm{Var}(\hat{\beta}) = \sigma_u^2 (x'x)^{-1}$, so that
$\mathrm{var}(\hat{\beta}_1) = \dfrac{\sigma_u^2 \sum x_2^2}{\sum x_1^2 \sum x_2^2 - \left(\sum x_1 x_2\right)^2} = \dfrac{\sigma_u^2}{(1 - R_{1.2}^2)\sum x_1^2}$
• where $R_{1.2}^2$ = the $R^2$ in the regression of $x_1$ on all the other explanatory variables.
Multicollinearity: Implications
• Consider the following cases:
• A) No multicollinearity
• $X_1 \perp X_2$: orthogonal ⇒ $\mathrm{Cov}(X_1, X_2) = R_{1.2}^2 = 0$.
• The regression would appear to be identical to separate bivariate regressions, for both coefficients and variances: with $R_{1.2}^2 = 0$ (and $\sum x_1 x_2 = 0$), the general formulas above reduce to
$\hat{\beta}_1 = \dfrac{\sum x_1 y}{\sum x_1^2}, \qquad \mathrm{var}(\hat{\beta}_1) = \dfrac{\sigma_u^2}{\sum x_1^2}$
Multicollinearity: Implications
o B) Perfect Multicollinearity
Given $X = (x_1, x_2, \dots, x_k)$, $x_i$: the i-th column of X with n observations.
If $X_i$ is a perfect linear combination of one or more other variables $X_j$, i.e.
$x_1 c_1 + x_2 c_2 + \dots + x_k c_k = 0$
where the constants ($c_i$) are not all zero, then X'X is singular: |X'X| = 0.
The matrix does not have full rank.
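A quick illustration (hypothetical simulated data; Stata detects the exact linear dependence and drops the offending regressor rather than attempting to invert a singular X'X):
clear
set obs 100
set seed 12345
gen x1 = rnormal()
gen x2 = rnormal()
gen x3 = 2*x1 + 3*x2            // x3 is an exact linear combination of x1 and x2
gen y  = 1 + x1 + x2 + rnormal()
regress y x1 x2 x3              // Stata notes that x3 is omitted because of collinearity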
Multicollinearity: Implications
For example, if $x_1 = kx_2$ then the variables $x_1$ and $x_2$ are exactly linearly related
⇒ the matrix X'X is singular ⇒ the inverse does not exist.
Substitute $x_1 = kx_2$ in the following and see (the denominator becomes $k^2(\sum x_2^2)^2 - k^2(\sum x_2^2)^2 = 0$):
$\hat{\beta}_1 = \dfrac{\sum x_1 y \sum x_2^2 - \sum x_2 y \sum x_1 x_2}{\sum x_1^2 \sum x_2^2 - \left(\sum x_1 x_2\right)^2}$
Multicollinearity: Implications
The more common problem: imperfect or near multicollinearity –
two or more of the explanatory variables are approximately linearly related:
$c_1 x_1 + c_2 x_2 + \dots + c_k x_k \approx 0$
Multicollinearity: Implications
For example: $x_1 = kx_2 + v$ (where v is a random element):
the higher the correlation r between $x_1$ and $x_2$, the greater the degree of multicollinearity.
(Naturally, without v, r = 1 by definition.)
Multicollinearity: Implications
The impact of multicollinearity on the standard errors of regression estimates:
In the context of a simple (bivariate) regression
$Y_i = \beta_1 + \beta_2 X_i + u_i$,
$\mathrm{var}(\hat{\beta}_2) = \dfrac{\sigma_u^2}{\sum x_i^2}$, where $\sum x_i^2 = \sum (X_i - \bar{X})^2$.
Multicollinearity: Implications
With several explanatory variables:
$\mathrm{var}(\hat{\beta}_k) = \dfrac{\sigma_u^2}{(1 - R_{k.}^2)\sum x_k^2}$
where $R_{k.}^2$ is the coefficient of determination when $X_k$ is regressed on all the other explanatory variables.
Multicollinearity: Implications
⇒ the variance (and standard error) of the coefficients depends not only on the standard error of the regression and the variation in the explanatory variable in question,
but also on $R_{k.}^2$:
$\mathrm{var}(\hat{\beta}_k) = \dfrac{\sigma_u^2}{(1 - R_{k.}^2)\sum x_k^2}$
Multicollinearity: Implications
$\mathrm{var}(\hat{\beta}_k) = \dfrac{\sigma_u^2}{(1 - R_{k.}^2)\sum x_k^2}$
Ceteris paribus,
• the correlation of $X_k$ with the other variables ↑ ⇒ var ↑;
• variation in $X_k$ ↑ ⇒ var ↓;
• overall fit of the regression ↑ ⇒ var ↓.
Multicollinearity: Implications
• If the independent variables are highly correlated, $\mathrm{var}(\hat{\beta}_k)$ is inflated
• ⇒ t-ratios are lower
• ⇒ $\hat{\beta}_k$ appears insignificant,
• while R² tends to be high as well
• ⇒ significant F.
• Sign changes occur with the introduction of a new variable.
• The $\hat{\beta}_k$ are still BLUE (but useless).
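A small simulation sketch of these symptoms (hypothetical data: two nearly collinear regressors give a high R² and a significant F, but large standard errors on the individual coefficients):
clear
set obs 50
set seed 2014
gen x1 = rnormal()
gen x2 = x1 + 0.05*rnormal()    // x2 is almost collinear with x1
gen y  = 1 + 2*x1 + 2*x2 + rnormal()
regress y x1 x2                 // high R-squared and significant F, but inflated standard errors
estat vif                       // very large variance inflation factors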
Multicollinearity: Tests/Indicators
Non-experimental data are never orthogonal (i.e., $R_{k.}^2$ is never exactly 0),
so to some extent multicollinearity is always present.
When is multicollinearity a problem?
Some diagnostic statistics:
1. Klein’s Rule of Thumb (Lawrence Klein, 1980):
Multicollinearity is a problem if $R_{k.}^2$ from the auxiliary regression > the overall $R^2$.
2. Variance inflation factor (VIF):
“The most useful single diagnostic guide” – J. Johnston (1984)
$\mathrm{VIF}_k = \dfrac{1}{1 - R_{k.}^2}$
Interpreting VIFs
• No multicollinearity ⇒ VIF = 1.0
• If the VIF for a variable were 9, its standard error would be three times as large as it would be if its VIF were 1 (since the standard error scales with $\sqrt{\mathrm{VIF}}$).
• In such a case, the coefficient would have to be 3 times as large to be statistically significant.
Interpreting VIFs
• If the VIF is greater than 10.0, then multicollinearity is probably severe:
• ⇒ 90% of the variance of $X_j$ is explained by the other Xs.
• In small samples, a VIF of about 5.0 may indicate problems.
Multicollinearity: Tests/Indicators (cont.)
• Also Tolerance:
$\mathrm{TOL}_k = \dfrac{1}{\mathrm{VIF}_k} = (1 - R_{k.}^2)$
• If the tolerance equals 1, the variables are unrelated.
• If $\mathrm{TOL}_j$ = 0, then they are perfectly correlated.
How large does a VIF value have to be to be “large enough”?
Belsley (1991) suggests:
1. Eigenvalues of X'X, 2. Condition index (CI), and 3. Condition number (CN).
Given the eigenvalues $\lambda_1 > \lambda_2 > \lambda_3 > \dots$,
$CI_j = \lambda_1 / \lambda_j$;  j = 1, 2, 3, …
Stata/SPSS report the square root of $CI_j$.
$CN = \sqrt{\lambda_{max} / \lambda_{min}}$, or $CN = \sqrt{\max_j CI_j}$.
Largest CI = 5–10 ⇒ no problem
Largest CI = 30–100 ⇒ problematic
Largest CI = 1000–3000 ⇒ severe problem
See:
1. D.A. Belsley (1991) Conditioning Diagnostics: Collinearity and Weak Data in Regression, Wiley
2. D.A. Belsley, E. Kuh and R.E. Welsch (1980) Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, Wiley
3. Norman R. Draper and Harry Smith (2003) Applied Regression Analysis, 3rd Ed., Wiley
Multicollinearity: Tests/Indicators – An Example
In Stata:
Postestimation statistics for regress: Reports and statistics: Variance inflation factors
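Equivalently, from the command line after fitting the regression (a sketch; variable names hypothetical):
regress y x1 x2 x3              // hypothetical model
estat vif                       // VIF and 1/VIF (tolerance) for each regressor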
For Condition Index and Number:
Download the collin command.
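A sketch (collin is a user-written command that must be installed first; variable names hypothetical):
findit collin                   // locate and install the user-written collin command
collin x1 x2 x3                 // reports VIF, tolerance, eigenvalues, condition index and condition number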
Multicollinearity: Tests/Indicators – An Example
[collin output: the condition indexes reported are $\sqrt{\lambda_1/\lambda_1}$, $\sqrt{\lambda_1/\lambda_2}$, $\sqrt{\lambda_1/\lambda_3}$, …]
Multicollinearity: Causes
• Sampling mechanism. Poorly constructed design & measurement scheme or limited range.
• Statistical model specification: adding polynomial terms or trend indicators.
• Too many variables in the model - the model is overdetermined.
• Theoretical specification is wrong.
Inappropriate construction of theory or even measurement
Multicollinearity: Remedies
• Increase sample size
• Omit Variables
• Scale Construction/Transformation
• Factor Analysis
• Constrain the estimation, such as the case where you can set the value of one coefficient relative to another.
• Ignore it – report adjusted R² and claim it warrants retention in the model.
Model Specification: Definition
Specification error
• covers any mistake in the set of assumptions of a model and the associated inference procedures
• But it has come to be used for errors in specifying the data matrix X.
Model Specification: Definition (cont.)
• There are basically 4 types of misspecification we need to examine:
o exclusion of a relevant variable
o inclusion of an irrelevant variable
o functional form
o measurement error and misspecified error term
Too Many or Too Few Variables
• What happens if we include variables in our specification that don’t belong?
• There is no effect on our parameter estimates, and OLS remains unbiased.
• What if we exclude a variable from our specification that does belong?
• OLS will usually be biased.
1. Omission of a relevant variable: Exclusion/Underfitting Bias
Suppose the true model is
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$,
but we estimate
$y = \tilde{\beta}_0 + \tilde{\beta}_1 x_1 + \tilde{u}$.
Then
$\tilde{\beta}_1 = \dfrac{\sum (x_{i1} - \bar{x}_1)\, y_i}{\sum (x_{i1} - \bar{x}_1)^2}$
Omitted Variable Bias
$\tilde{\beta}_1 = \dfrac{\sum (x_{i1} - \bar{x}_1)\, y_i}{\sum (x_{i1} - \bar{x}_1)^2}$
But $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + u_i$, so the numerator becomes
$\sum (x_{i1} - \bar{x}_1)(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + u_i) = \beta_1 \sum (x_{i1} - \bar{x}_1)^2 + \beta_2 \sum (x_{i1} - \bar{x}_1) x_{i2} + \sum (x_{i1} - \bar{x}_1) u_i$
Omitted Variable Bias (cont.)
Thus
$\tilde{\beta}_1 = \beta_1 + \beta_2 \dfrac{\sum (x_{i1} - \bar{x}_1) x_{i2}}{\sum (x_{i1} - \bar{x}_1)^2} + \dfrac{\sum (x_{i1} - \bar{x}_1) u_i}{\sum (x_{i1} - \bar{x}_1)^2}$
Since E(u) = 0, taking expectations we have
$E(\tilde{\beta}_1) = \beta_1 + \beta_2 \dfrac{\sum (x_{i1} - \bar{x}_1) x_{i2}}{\sum (x_{i1} - \bar{x}_1)^2}$
Omitted Variable Bias (cont.)
Consider the regression of $x_2$ on $x_1$: $\tilde{x}_2 = \tilde{\delta}_0 + \tilde{\delta}_1 x_1$, where
$\tilde{\delta}_1 = \dfrac{\sum (x_{i1} - \bar{x}_1) x_{i2}}{\sum (x_{i1} - \bar{x}_1)^2}$,
so
$E(\tilde{\beta}_1) = \beta_1 + \beta_2 \tilde{\delta}_1$.
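A small simulation sketch of this bias (hypothetical data: x2 is correlated with x1 and is omitted from the short regression, so the estimate of β1 picks up part of β2's effect, E(β̃1) = β1 + β2δ1 = 1 + 2×0.5 = 2):
clear
set obs 1000
set seed 101
gen x1 = rnormal()
gen x2 = 0.5*x1 + rnormal()     // x2 correlated with x1 (delta1 = 0.5)
gen y  = 1 + 1*x1 + 2*x2 + rnormal()
regress y x1 x2                 // full model: estimates close to beta1 = 1 and beta2 = 2
regress y x1                    // short model: coefficient on x1 biased towards 1 + 2*0.5 = 2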
Mis-specified Models
For the correctly specified model,
$\mathrm{var}(\hat{\beta}_1) = \dfrac{\sigma_u^2}{(1 - R_{1.2}^2)\sum x_1^2}$,
while for the mis-specified model $\tilde{y} = \tilde{\beta}_0 + \tilde{\beta}_1 x_1$,
$\mathrm{Var}(\tilde{\beta}_1) = \dfrac{\sigma_u^2}{\sum x_1^2}$.
Thus $\mathrm{Var}(\tilde{\beta}_1) < \mathrm{Var}(\hat{\beta}_1)$, unless $x_1$ and $x_2$ are uncorrelated.
Misspecified Models (cont.)
• While the variance of the estimator is smaller for the misspecified model, unless β₂ = 0 the misspecified model is biased.
• As the sample size grows, the variance of each estimator shrinks to zero, making the variance difference less important.
2. Inclusion of an Irrelevant Variable: Inclusion/Overfitting Bias
Suppose this time the “true” model is
$y = X_1\beta_1 + u$,
but we estimate
$y = X_1 b_1 + X_2 b_2 + e$.
This equation is over-specified, and this tends to occur when researchers adopt a “kitchen sink” approach to model building.
This specification error does not lead to bias: both the parameters and the error variance are unbiasedly estimated.
Inclusion of an Irrelevant Variable: Inclusion/Overfitting Bias (cont.)
$E\begin{pmatrix} b_1 \\ b_2 \end{pmatrix} = \begin{pmatrix} \beta_1 \\ 0 \end{pmatrix}$
However, the estimates are inefficient: that is to say, including irrelevant variables raises the standard errors of our coefficient estimates.
3. Functional Form Mis-specification
A third type of mis-specification occurs when we adopt an incorrect functional form.
For example, we estimate a linear regression model whereas the "true" regression model is log-linear.
Functional Form Mis-specification: Implications
• Incorrect functional form can result in autocorrelation or heteroskedasticity.
• Next class
Functional Form Mis-specification: Causes
• Theoretical design:
o something is omitted,
o irrelevantly included,
o mismeasured, or
o non-linear.
Functional Form Mis-specification: Tests
• Actual Specification Tests
o No test can reveal poor theoretical construction per se.
o The best indicator: the model has some undesirable statistical property;
o e.g. a misspecified functional form will often be indicated by a significant test for autocorrelation.
o Sometimes time-series models will have negative autocorrelation as a result of poor design.
Specification Error Tests
A common test for mis-specification is Ramsey’s regression specification error test (RESET) – Ramsey (1969).
The auxiliary regression is
$y = X\beta + Z\alpha + u$, with $Z = [\hat{y}^2 \; \hat{y}^3 \; \hat{y}^4]$:
Z contains powers of the predicted values of the dependent variable.
The test is H0: α = 0.
Essentially the Z variables are ‘proxying’ for other possible variables (or nonlinearities).
Specification Error Tests (cont.)
We then perform a standard F-test on the significance of the additional variables:
$F = \dfrac{(e_*'e_* - e'e)/J}{e'e/(n-k)}$  or  $F = \dfrac{(R^2 - R_*^2)/J}{(1 - R^2)/(n-k)}$
where $R^2$ is from the model with the Z variables, $R_*^2$ is from the model without the Z variables, and J is the number of added Z variables.
If the estimated F value is significant, accept the hypothesis that the model without the Z variables is mis-specified.
Ramsey’s regression specification error test (RESET) – An Example
In Stata:
Statistics: Linear models and related: Regression diagnostics: Specification tests, etc.
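From the command line, after fitting the regression (a sketch; estat ovtest implements Ramsey's RESET using powers of the fitted values):
regress y x1 x2                 // hypothetical model
estat ovtest                    // Ramsey RESET test using powers of the fitted values of y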
Model Specification: Tests
• Specification Criteria for lagged designs
o Most useful for comparing time series models with the same set of variables, but differing numbers of parameters:
AIC (Akaike Information Criterion)
Schwarz Criterion (see the sketch below)
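In Stata, both criteria can be obtained after estimation (a sketch; estat ic reports AIC and BIC, Stata's label for the Schwarz criterion; variable names hypothetical):
tsset year                      // declare the time variable
regress y L.y x1                // a lagged specification
estat ic                        // reports AIC and BIC (Schwarz criterion)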
Model Specification: Remedies
• Model Building
o Hendry and the LSE school of “top-down” modeling.
o Nested Models
o Stepwise Regression:
Stepwise regression is a process of including the variables in the model “one step at a time.”