Violations of OLS
Post on 10-Feb-2018
-
7/22/2019 Violations of OLS
1/64
Violations of Assumptions
of OLS
-
Readings: Asteriou,
pp. 93-196 (2nd edition) or
pp. 83-179 (1st edition),
covers all this material.
-
The Linear Regression Model
The Assumptions
1. Linearity: the dependent variable is a linear function of the independent variable(s).
2. Xt has some variation, i.e. Var(X) is not 0.
3. Xt is non-stochastic and fixed in repeated samples.
4. E(ut) = 0.
5. Homoskedasticity: Var(ut) = σ² = constant for all t.
6. Cov(ut, us) = 0: serial independence.
7. ut ~ N(0, σ²).
8. No multicollinearity (i.e. no explanatory variable can be a linear combination of others in the model).
-
Violation of the Assumptions of the Classical Linear
Regression Model (CLRM) (i.e. OLS)
We will focus on the assumptions regarding the
disturbance terms:
1. E(ut) = 0
2. Var(ut) = σ² (homoskedasticity)
3. Cov(ui, uj) = 0 (absence of autocorrelation)
4. ut ~ N(0, σ²) (normally distributed errors)
-
Statistical Distributions for Diagnostic Tests
Often, an F- and a χ²-version of the test are available.
The F-test version involves estimating a restricted and an unrestricted version of a test regression and comparing the RSS (i.e. Wald tests).
The χ²-version is sometimes called an LM test, and has only one degrees-of-freedom parameter: the number of restrictions being tested, m.
For small samples, the F-version is preferable.
-
Assumption 1:
E(ut) = 0
-
Assumption 1: E(ut) = 0
Assumption that the mean of the disturbances is zero.
For all diagnostic tests, we cannot observe the disturbances and so
perform the tests on the residuals instead.
Recall:
Disturbance -> difference between observation and true line
Residual -> difference between observation and estimated line
The mean of the residuals will always be zero provided that there is a constant term in the regression, so in most cases we don't need to worry about this!!
-
Assumption 2:
Var(ut) = σ²
True => homoskedasticity
False => heteroskedasticity
-
Detection of Heteroscedasticity
Graphical methods
Formal tests: one of the best is White's general test for heteroskedasticity.
The test is carried out as follows:
1. Assume that the regression we carried out is as follows:
   yt = β1 + β2 x2t + β3 x3t + ut
   and we want to test Var(ut) = σ².
2. We estimate the model, obtaining the residuals, ût, then run the auxiliary regression:
   ût² = α1 + α2 x2t + α3 x3t + α4 x2t² + α5 x3t² + α6 x2t x3t + vt
-
Performing White's Test for Heteroscedasticity
3. Obtain R² from the auxiliary regression and multiply it by the number of observations, T. It can be shown that
   T·R² ~ χ²(m)
   where m is the number of regressors in the auxiliary regression, excluding the constant term. The idea is that if R² in the auxiliary regression is high, then the equation has explanatory power.
4. If the χ² test statistic from step 3 is greater than the corresponding value from the statistical table, then reject the null hypothesis that the disturbances are homoscedastic.
A test for heteroscedasticity is given with OLS regression output.
I.e. the basic idea is to see whether the x's are correlated with the size of the errors (they shouldn't be if the errors are homoskedastic).
-
Consequences of Using OLS in the Presence of Heteroscedasticity
OLS estimation still gives unbiased coefficient estimates (in linear models), but they are no longer BLUE [not efficient, because heteroskedasticity increases the variance of the distribution of the β̂s].
This implies that if we still use OLS in the presence of heteroscedasticity, our standard errors could be inappropriate and hence any inferences we make could be misleading.
If the standard error is too high: our t-statistic will be too low and we will fail to reject hypotheses we should reject.
If the standard error is too low: our t-statistic will be too high and we will reject hypotheses we should fail to reject.
Whether the standard errors calculated using the usual formulae are too big or too small will depend upon the form of the heteroscedasticity.
-
How Do we Deal with Heteroscedasticity?
If the form (i.e. the cause) of the heteroskedasticity is known, then we can use an estimation method which takes this into account (called generalised least squares, GLS).
Suppose the original regression is: yt = β1 + β2 x2t + β3 x3t + ut
A simple illustration of GLS is as follows: suppose that the error variance is related to another variable zt by
   Var(ut) = σ² zt²
To remove the heteroscedasticity, divide the regression equation by zt:
   yt/zt = β1 (1/zt) + β2 (x2t/zt) + β3 (x3t/zt) + vt
where vt = ut/zt is just the rescaled error term.
Now, for known zt,
   Var(vt) = Var(ut/zt) = Var(ut)/zt² = σ² zt²/zt² = σ²
So the disturbances from the new regression equation will be homoscedastic.
-
Other Approaches to Dealing with Heteroscedasticity
Other solutions include:
1. Transforming the variables into logs, or deflating by some other measure of size.
2. Using White's heteroscedasticity-consistent standard error estimates.
The effect of using White's correction is that, in general, the standard errors for the slope coefficients are increased relative to the usual OLS standard errors.
This makes us more conservative in hypothesis testing, so that we would need more evidence against the null hypothesis before we would reject it.
-
Assumption 3:
Cov (ui,uj) = 0
(Absence of autocorrelation)
-
Aside: Covariance
What is covariance?
The covariance between two real-valued random variables X and Y, with expected values E(X) = μX and E(Y) = μY, is defined as
   Cov(X, Y) = E[(X − μX)(Y − μY)]
If two variables tend to vary together (that is, when one of them is above its expected value, the other variable tends to be above its expected value too), then the covariance between the two variables will be positive. On the other hand, if one of them tends to be above its expected value when the other variable is below its expected value, then the covariance between the two variables will be negative.
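The sign logic can be seen numerically; a small sketch with simulated data (numpy assumed, names illustrative):

```python
import numpy as np

rng = np.random.default_rng(10)
x = rng.normal(size=1000)

y_pos = x + 0.5 * rng.normal(size=1000)    # tends to move with x
y_neg = -x + 0.5 * rng.normal(size=1000)   # tends to move against x

# np.cov returns the 2x2 sample covariance matrix; [0, 1] is Cov(x, y)
cov_pos = np.cov(x, y_pos)[0, 1]
cov_neg = np.cov(x, y_neg)[0, 1]
print(cov_pos, cov_neg)
```
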
-
Autocorrelation
Autocorrelation is when the error terms of different observations are not independent of each other, i.e. they are correlated with each other.
Consider the time series regression
   Yt = β1 X1t + β2 X2t + β3 X3t + … + βk Xkt + ut
but in this case there is some relation between the error terms across observations. [Imagine a shock like the current economic collapse: it has effects in more than one period!]
   E(ut) = 0, Var(ut) = σ²
But: Cov(ut, us) ≠ 0
Thus the error covariances are not zero.
This means that one of the assumptions that makes OLS BLUE does not hold.
Autocorrelation is most likely to occur in a time series framework.
In cross-section data there may be spatial correlation, which is a form of autocorrelation too!
-
Likely causes of Autocorrelation
1. Omitting a variable that ought to be included.
2. Misspecification of the functional form. This is most obvious where a straight line is fitted where a curve should have been used. This would clearly show up in plots of residuals.
3. Errors of measurement in the dependent variable. If the errors are not random, then the error term will pick up any systematic mistakes.
-
The Problem with Autocorrelation
OLS estimators will be inefficient, so no longer BLUE, but they are still unbiased, so we get the right answer for the betas on average!!
"Inefficient" here basically means the estimate is more variable than it needs to be => we are less confident in our estimate than we could be.
The estimated variances of the regression coefficients will be biased and inconsistent, therefore hypothesis testing will no longer be valid. t-statistics will tend to be higher.
R² will tend to be overestimated [i.e. we think we explain more than we actually do!].
-
Detecting Autocorrelation
By observation:
Plot the residuals against time: if the error is random, there should be no pattern!
Plot ût against ût-1: if there is no correlation, there should be no relationship!
The Durbin-Watson test:
Only good for first-order serial correlation.
It can give inconclusive results.
It is not applicable when a lagged dependent variable is included in the regression.
Breusch and Godfrey (1978) developed an alternative test.
[Figure: scatter plot of RESID against PREVRESID, both axes running from -30 to 30.]
-
Breusch-Godfrey Test
This is an example of an LM (Lagrange Multiplier) type test, where only the restricted form of the model is estimated. We then test for a relaxation of these restrictions by applying a formula.
Consider the model
   Yt = β1 X1t + β2 X2t + β3 X3t + … + βk Xkt + ut
where ut = ρ1 ut-1 + ρ2 ut-2 + ρ3 ut-3 + … + ρp ut-p + εt
Combining these two equations gives
   Yt = β1 X1t + β2 X2t + β3 X3t + … + βk Xkt + ρ1 ut-1 + ρ2 ut-2 + ρ3 ut-3 + … + ρp ut-p + εt
Test the following H0 and Ha:
   H0: ρ1 = ρ2 = ρ3 = … = ρp = 0
   Ha: at least one of the ρ's is not zero, thus there is serial correlation.
-
Breusch-Godfrey Test
This two-stage test begins by considering the model
   Yt = β1 X1t + β2 X2t + β3 X3t + … + βk Xkt + ut (1)
1) Estimate model (1) and save the residuals, ût.
2) Run the following auxiliary regression, with the number of lags determined by the order of serial correlation you are willing to test:
   ût = α0 + α1 X2t + α2 X3t + … + αR XRt + αR+1 ût-1 + αR+2 ût-2 + … + αR+p ût-p + vt
-
Breusch-Godfrey Test
The test statistic may be written as an
   LM statistic = (n − p)·R²,
where R² relates to this auxiliary regression and p is the number of residual lags.
The statistic is distributed asymptotically as chi-square (χ²) with p degrees of freedom.
If the LM statistic is bigger than the χ² critical value, then we reject the null hypothesis of no serial correlation and conclude that serial correlation exists.
-
H0: no autocorrelation. Here we have autocorrelation (the Prob. F value in the test output is below the significance level, so we reject H0). [EViews test output omitted from transcript.]
-
Solutions to Autocorrelation
1. Find the cause.
2. Increase the number of observations.
3. Re-specify the model correctly.
4. EViews provides a number of procedures, e.g. Cochrane-Orcutt (a last resort).
5. Most important: it is easy to confuse misspecified dynamics with serial correlation in the errors. In fact, it is best to always start from a general dynamic model and test the restrictions before applying the tests for serial correlation.
[i.e. If this period's Y depends on last period's Y, we should include a lagged term for Y in our model, rather than thinking it's a problem with the errors of the original model!]
-
Assumption 4:
ut ~ N(0, σ²)
Normally distributed error term
-
Testing the Normality Assumption
Recall that we needed to assume normally distributed disturbances for many hypothesis tests to be valid.
Testing for departures from normality: the Bera-Jarque normality test.
Skewness and kurtosis are the (standardised) third and fourth moments of a distribution. The first and second are the mean and variance respectively.
A normal distribution is not skewed and is defined to have a coefficient of kurtosis of 3. Since the kurtosis of the normal distribution is 3, its excess kurtosis (b2 − 3) is zero.
-
Normal versus Skewed Distributions
[Figure: two density sketches, f(x) against x; left panel a symmetric normal distribution, right panel a skewed distribution.]
-
Leptokurtic versus Normal Distribution
[Figure: a leptokurtic density plotted against a normal density; the leptokurtic distribution is more peaked at the mean and has fatter tails.]
-
Testing for Normality
Bera and Jarque formalise this by testing the residuals for normality: testing whether the coefficient of skewness and the coefficient of excess kurtosis are jointly zero.
It can be proved that the coefficients of skewness and kurtosis can be expressed respectively as:
   b1 = E[u³] / (σ²)^(3/2)   and   b2 = E[u⁴] / (σ²)²
The Bera-Jarque test statistic is given by
   W = T [ b1²/6 + (b2 − 3)²/24 ] ~ χ²(2)
We estimate b1 and b2 using the residuals from the OLS regression, û.
This test of normality can be easily performed using EViews.
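The statistic can be computed directly from the formulas above and checked against statsmodels' `jarque_bera`, which uses the same demeaned, biased moment estimators. A sketch on simulated residuals (names illustrative):

```python
import numpy as np
from statsmodels.stats.stattools import jarque_bera

rng = np.random.default_rng(5)
u = rng.normal(size=1000)
u = u - u.mean()   # OLS residuals have mean zero when a constant is included

# Coefficients of skewness and kurtosis, as on the slide
sigma2 = np.mean(u**2)
b1 = np.mean(u**3) / sigma2**1.5
b2 = np.mean(u**4) / sigma2**2

T = len(u)
W = T * (b1**2 / 6 + (b2 - 3)**2 / 24)   # ~ chi2(2) under normality

jb_stat, jb_pvalue, skew, kurt = jarque_bera(u)
print(W, jb_stat)   # the two statistics agree
```
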
-
What do we do if we find evidence of Non-Normality?
It is not obvious what we should do!
We could use a method which does not assume normality, but this is difficult, and what are its properties?
It is often the case that one or two very extreme residuals cause us to reject the normality assumption.
A solution is to use dummy variables.
e.g. say we estimate a monthly model of asset returns from 1980-1990, plot the residuals, and find a particularly large outlier for October 1987:
-
What do we do if we find evidence of Non-Normality? (cont'd)
Create a new variable:
   D87M10t = 1 during October 1987, and zero otherwise.
This effectively knocks out that observation. But we need a theoretical reason for adding dummy variables.
[Figure: plot of the residuals ût against time, showing a large outlier in October 1987.]
-
A test for functional form
-
Adopting the Wrong Functional Form
We have previously assumed that the appropriate functional form is linear.
This may not always be true.
We can formally test this using Ramsey's RESET test, which is a general test for mis-specification of functional form.
1. Essentially the method works by adding higher-order terms of the fitted values (e.g. ŷt², ŷt³, etc.) into an auxiliary regression.
2. Regress ût on powers of the fitted values:
   ût = α0 + α1 ŷt² + α2 ŷt³ + … + αp-1 ŷt^p + vt
3. Obtain R² from this regression. The test statistic is given by T·R² and is distributed as a χ²(p − 1).
4. So if the value of the test statistic is greater than a χ²(p − 1) critical value, then reject the null hypothesis that the functional form was correct.
-
But what do we do if this is the case?
The RESET test gives us no guide as to what a better specification might be.
One possible cause of rejection of the test is if the true model is
   yt = β1 + β2 x2t + β3 x2t² + β4 x2t³ + ut
In this case the remedy is obvious.
Another possibility is to transform the data into logarithms. This will linearise many previously multiplicative models into additive ones:
   yt = A xt^β e^(ut)  <=>  ln yt = α + β ln xt + ut
-
Up to here is covered in the
first 9 chapters of Asteriou
and Hall
-
Intro to Endogeneity
[We won't have time to cover this
in class; it is not on the exam.]
-
Learning outcomes
1. To understand what endogeneity is & why it
is a problem.
2. To have an intuitive understanding of how
we might fix it.
But: we won't cover this in any detail, despite the fact that it is an extremely important topic. Unfortunately we have limited time!
-
Sources of Endogeneity
3 causes of endogeneity:
1. Omitted Variables
2. Measurement Error
3. Simultaneity
I will only focus on endogeneity caused by
omitted variables.
-
The assumption:
When we estimate a model such as:
   Y = α + β1 X1 + β2 X2 + ε
using OLS, we assume that cov(Xi, ε) = 0,
i.e. our included variables (Xi) are not related to any
unobserved variables (ε) that influence the dependent
variable (Y).
The example we will work with today:
   Earnings = α + β1 YrsEd + β2 (Other variables) + ε
-
Assuming cov(X, ε) = 0: We can visualize the case where the assumption holds as follows:
   [Diagram: X → Y and ε → Y, with no link between X and ε.]
Notice that there is no link between X and ε.
If X affects Y, when X changes so should Y.
The effect of X on Y is:
   β̂ = Cov(X, Y) / Var(X)
-
Example:
Comparing the earnings of people with different education levels will give us an estimate of the effect of education on earnings (controlling for gender etc.):
   [Diagram: Education → Earnings, with Experience, Ethnicity, Region etc. also affecting Earnings.]
-
Example 15.4, Wooldridge's book
An extra year of education increases wages by roughly 7.4%.
(Coefficient × 100 is the % effect because Y is logged.)
-
What if our assumption is
incorrect, i.e. cov(X, ε) ≠ 0?
=> Endogeneity
-
Endogeneity
In our example we looked at the effect of education on earnings.
However, many unobserved variables affect both income and education, e.g. the individual's ability:
High-ability people may be more likely to go to college.
High-ability people generally earn more.
Since we do not observe ability, we will think all of the higher earnings are due to education, i.e. we estimate the effect, β, incorrectly!
-
Endogeneity:
Now we have a problem: some variables in ε are related to X as well as to Y => Cov(X, ε) ≠ 0.
A change in ε will change X and Y, but we do not observe ε, so we think that X is causing the change in Y directly!!!
Thus we get the wrong estimate for β:
   plim β̂ = β + corr(X, ε) · (σε / σX)
   [Diagram: ε → Y, ε → X, and X → Y.]
-
Endogeneity:
The same problem in our example: ability (in ε) is related to education (X) as well as to earnings (Y), so Cov(X, ε) ≠ 0 and
   plim β̂ = β + corr(X, ε) · (σε / σX)
   [Diagram: ε → Earnings, ε → Education, and Education → Earnings.]
-
Solutions to endogeneity:
Instrumental Variables (IV)
-
Solution 1:
If we could include the omitted variable (or a proxy) in our regression, this would solve the problem, since we would be controlling for ability.
E.g. we might try to measure ability with an aptitude test and include that in our model.
Problem: often good proxies are not available.
-
Solution 2: Instrumental variable
Suppose we can find a variable Z that is related to X but not to ε (or Y).
Since Z is not related to ε, if we focus on changes in X that are due to Z, we overcome the problem (since these changes in X are not related to ε).
   [Diagram: Z → X → Y; ε → Y, with no link between Z and ε.]
-
Solution 2: Instrumental variable
In our example, we need a variable related to years of education but not related to ability.
Proximity to college may affect the likelihood of going to college, but not your ability!
   [Diagram: Proximity to college → Education → Earnings.]
-
Two stage least squares
First stage:
Regress X on Z (and any other observed variables):
   X = α* + β*1 Z + β*2 (Other variables) + ν
Now predict X => X̂
Key point: since Z is not related to ε, the predicted X̂ can't be related to ε, so it is exogenous.
-
Problem solved:
Second stage: regress Y on X̂. This IV estimate, β̂IV, will be unbiased.
Note, however, that the standard error needs to be corrected, since we reduced the variability of X in stage 1.
-
First stage: regress Education on proximity
to college
-
Second stage: use predicted education in the original regression instead of education.
-
Original OLS (biased)
-
Comparison of Results:
Coefficients:
OLS: 0.074009 => 7.4% increase in wages per year
IV: 0.1322888 => 13.2% increase in wages per year
This difference may be important to an individual deciding whether to go to college!
Note: standard errors are larger in IV than OLS because we lose some explanatory power when we exclude part of X (the part not correlated with Z).
-
Some issues with IV (we won't have time to cover these today!)
-
Weak Instruments
Suppose Z is not strongly correlated with X, e.g. my education level may be related to my cousin's, but probably not very strongly!
In the first-stage regression we will not get a very good prediction for X => so our second-stage estimate won't be good.
   [Diagram: only a weak link from Z to X; X → Y.]
-
Invalid Instruments
Here the instrument is not exogenous, since it is also related to ε.
=> We would still get a biased estimate (in fact it could be even worse than the original estimate).
   [Diagram: Z → X → Y, but ε is linked to Z as well as to Y.]
-
Two Stage Least Squares estimator (2SLS):
Recall the standard OLS estimator is:
   β̂OLS = Cov(X, Y) / Var(X)
The 2SLS estimator is:
   β̂2SLS = Cov(Z, Y) / Cov(Z, X)
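Both estimators can be computed directly from sample covariances; with simulated endogenous data the IV formula recovers the true coefficient while the OLS formula does not (all names and numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 10000
ability = rng.normal(size=n)   # unobserved, ends up in the error term
z = rng.normal(size=n)         # instrument
x = z + ability + rng.normal(size=n)
y = 2.0 * x + 2.0 * ability + rng.normal(size=n)   # true beta = 2

c_xy = np.cov(x, y)   # 2x2 sample covariance matrix
c_zy = np.cov(z, y)
c_zx = np.cov(z, x)

beta_ols = c_xy[0, 1] / c_xy[0, 0]   # Cov(X,Y) / Var(X): biased here
beta_iv = c_zy[0, 1] / c_zx[0, 1]    # Cov(Z,Y) / Cov(Z,X): consistent

print(beta_ols, beta_iv)
```
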
-
Exercise:
Suppose that I am interested in the effect of hours of study (X) on exam results (Y).
1. Are there any unobserved variables that might influence both X and Y?
2. Can you think of a proxy variable for them?
3. Suppose we have data on the following variables; would any of these be valid instruments for hours of study? Why, or why not?
-
Sample questions on Blackboard: try to do the ones with the *.