the simple regression model - universidade nova de lisboadocentes.fe.unl.pt/~azevedoj/web...

33
Simple Regression Model 1 The Simple Regression Model

Upload: hoangdung

Post on 05-Jun-2018

213 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: The Simple Regression Model - Universidade Nova de Lisboadocentes.fe.unl.pt/~azevedoj/Web Page_files/Teaching_files... · Simple Regression Model 4 Further assumptions ... Now take

Simple Regression Model 1

The Simple Regression Model

Page 2: The Simple Regression Model - Universidade Nova de Lisboadocentes.fe.unl.pt/~azevedoj/Web Page_files/Teaching_files... · Simple Regression Model 4 Further assumptions ... Now take

Simple Regression Model 2

Simple regression model: Objectives

Given the model:

- where y is earnings and x years of education

- Or y is sales and x is spending in advertising

- Or y is the market value of an apartment and x is its area

Our objectives for the moment will be:

- Estimate the unknown parameters β0 andβ1

- Assess “how much”x explains y

Page 3: The Simple Regression Model - Universidade Nova de Lisboadocentes.fe.unl.pt/~azevedoj/Web Page_files/Teaching_files... · Simple Regression Model 4 Further assumptions ... Now take

Simple Regression Model 3

The Simple Linear Regression (SLR) Model with Cross-Sectional Data

• y is the dependent variable (or explained variable, or response variable, or theregressand)

• x is the independent variable (or explanatory variable, or control variable, orthe regressor)

• u is the error term or disturbance (y is allowed to vary for the same level ofx)

• This is a Population equation. It islinear in the (unknown) parameters

Assumption SLR.1(Linearity in parameters)

Page 4: The Simple Regression Model - Universidade Nova de Lisboadocentes.fe.unl.pt/~azevedoj/Web Page_files/Teaching_files... · Simple Regression Model 4 Further assumptions ... Now take

Simple Regression Model 4

Further assumptionsAssumption SLR.2(Random Sampling)

We have a random sample from the population such that:

Assumption SLR.3(Variation in the Regressor)

xi, i=1,2,...,n don’t have all the same value

Page 5: The Simple Regression Model - Universidade Nova de Lisboadocentes.fe.unl.pt/~azevedoj/Web Page_files/Teaching_files... · Simple Regression Model 4 Further assumptions ... Now take

Simple Regression Model 5

Further assumptions

ueducationEarnings ++= 10 ββ

This is not a restrictive assumption, since we can always use β0 so that SLR.4´holds

Assumption SLR.4(Zero Conditional Mean)

Assumption SLR.4(́Zero Mean)

Not really necessary given that SLR.4 implies SLR.4´

Crucial assumption! Remember previous examples:

uPolicemanCrimeRate ++= 10 ββIn these cases SLR.4 most likely fails.

Page 6: The Simple Regression Model - Universidade Nova de Lisboadocentes.fe.unl.pt/~azevedoj/Web Page_files/Teaching_files... · Simple Regression Model 4 Further assumptions ... Now take

Simple Regression Model 6

Implication of Zero Conditional Mean

..

x1 x2

E(y|x) = β0 + β1x

y

f(y)

implies

For anyx, distribution ofy is centered at E(y|x)

Page 7: The Simple Regression Model - Universidade Nova de Lisboadocentes.fe.unl.pt/~azevedoj/Web Page_files/Teaching_files... · Simple Regression Model 4 Further assumptions ... Now take

Simple Regression Model 7

.

..

.

y4

y1

y2

y3

x1 x2 x3 x4

}

}

{

{

u1

u2

u3

u4

x

y

Population regression line, sample data pointsand the associated error terms

E(y|x) = β0 + β1x

Page 8: The Simple Regression Model - Universidade Nova de Lisboadocentes.fe.unl.pt/~azevedoj/Web Page_files/Teaching_files... · Simple Regression Model 4 Further assumptions ... Now take

Simple Regression Model 8

Estimation of unknown parameters by method of moments

• Want to estimate the population parameters from a sample

• Let {(xi,yi): i=1, …,n} denote a random sample of size n from the population

• yi = β0 + β1xi + ui

implies

implies

since

i)

ii)

Page 9: The Simple Regression Model - Universidade Nova de Lisboadocentes.fe.unl.pt/~azevedoj/Web Page_files/Teaching_files... · Simple Regression Model 4 Further assumptions ... Now take

Simple Regression Model 9

Estimation of unknown parameters by method of moments (intuition)

Clever idea is to pick so that sample moments match the population moments

These are two equations in the two unknowns

The sample counterparts are:

Page 10: The Simple Regression Model - Universidade Nova de Lisboadocentes.fe.unl.pt/~azevedoj/Web Page_files/Teaching_files... · Simple Regression Model 4 Further assumptions ... Now take

Simple Regression Model 10

Estimation of unknown parameters by method of moments (solution)

Need sample variation of the x variable.Otherwise the slope parameter is notwell defined!

implies:

Plug-in in

to obtain:

this simplifies to:

Page 11: The Simple Regression Model - Universidade Nova de Lisboadocentes.fe.unl.pt/~azevedoj/Web Page_files/Teaching_files... · Simple Regression Model 4 Further assumptions ... Now take

Simple Regression Model 11

Slope estimator

• The slope estimate is the sample covariance between x and ydivided by the sample variance of x

• If x and y are positively correlated, the slope will be positive

• If x and y are negatively correlated, the slope will be negative

• Need x to vary in our sample

It is anestimator if we treat (xi,yi) as randomvariables. It is an estimate once we substitute(xi,yi) by the actual observations.

Page 12: The Simple Regression Model - Universidade Nova de Lisboadocentes.fe.unl.pt/~azevedoj/Web Page_files/Teaching_files... · Simple Regression Model 4 Further assumptions ... Now take

Simple Regression Model 12

Example and interpretation

• wageis net monthly wage andeducmeasures the number ofyears of schooling completed

• So, we estimate a positive relation between these twovariables; an additional year of schooling leads to an estimatedexpected increase of 52.513 euros in the wage

• Again, must be cautious about the interpretation of theseestimates

n=11064, Data from the Employment Survey (INE), 2003

Page 13: The Simple Regression Model - Universidade Nova de Lisboadocentes.fe.unl.pt/~azevedoj/Web Page_files/Teaching_files... · Simple Regression Model 4 Further assumptions ... Now take

Simple Regression Model 13

Definitions

Fitted value

Residual

Sum of Squared Residuals (RSS)

Page 14: The Simple Regression Model - Universidade Nova de Lisboadocentes.fe.unl.pt/~azevedoj/Web Page_files/Teaching_files... · Simple Regression Model 4 Further assumptions ... Now take

Simple Regression Model 14

.

..

.

y4

y1

y2

y3

x1 x2 x3 x4

}

}

{

{

û1

û2

û3

û4

x

y

Sample regression line, sample data pointsand the associated estimatederror terms

(called residuals); fitted values are on the line

xy 10ˆˆˆ ββ +=

Page 15: The Simple Regression Model - Universidade Nova de Lisboadocentes.fe.unl.pt/~azevedoj/Web Page_files/Teaching_files... · Simple Regression Model 4 Further assumptions ... Now take

Simple Regression Model 15

Alternate approach to derivationOLS – Ordinary Least Squares

• Can fit a line so as to minimise the sum of squared residuals

Equivalent to the equations we had before: same solution!

Page 16: The Simple Regression Model - Universidade Nova de Lisboadocentes.fe.unl.pt/~azevedoj/Web Page_files/Teaching_files... · Simple Regression Model 4 Further assumptions ... Now take

Simple Regression Model 16

Review – OLS estimators

OLS obtained by minimising:

Fitted value:

Residual:

Given a random sample:

with respect to

Alternatively, can exploit the fact that:

Page 17: The Simple Regression Model - Universidade Nova de Lisboadocentes.fe.unl.pt/~azevedoj/Web Page_files/Teaching_files... · Simple Regression Model 4 Further assumptions ... Now take

Simple Regression Model 17

Today

- Assess “how much”x explains y

- Assess “how close” our estimators

are from the true (population) values

Page 18: The Simple Regression Model - Universidade Nova de Lisboadocentes.fe.unl.pt/~azevedoj/Web Page_files/Teaching_files... · Simple Regression Model 4 Further assumptions ... Now take

Simple Regression Model 18

Looking for a Goodness of fit measure: Decomposition of the Total Sum of

Squares (SST) We know:

SSE, Explained Sum ofSquares

This implies: , to be used later

Also: Now define the Total Sum of SquaresSST

Page 19: The Simple Regression Model - Universidade Nova de Lisboadocentes.fe.unl.pt/~azevedoj/Web Page_files/Teaching_files... · Simple Regression Model 4 Further assumptions ... Now take

Simple Regression Model 19

Goodness-of-Fit

• How well does our sample regression line fit the data? If SSRis very small relative to SST(or SSEvery large relative to SST), then a lot of the variation in the dependent variable y is “explained” by the model

• Can compute the fraction of the total sum of squares (SST) that is explained by the model, denote this as the R-squared of the regression

• R2 = SSE/SST= 1 –SSR/SST

• The lower SSR, the higher will be R2

• OLS maximises R2 (minimisesSSR). We have always 0≤ R2 ≤1

Page 20: The Simple Regression Model - Universidade Nova de Lisboadocentes.fe.unl.pt/~azevedoj/Web Page_files/Teaching_files... · Simple Regression Model 4 Further assumptions ... Now take

Simple Regression Model 20

Example (continued)

• A lot of variation in the dependent variable (wage) is left unexplained by the model

• This isnot related to the fact thateducis most probably correlated withu

• Can have low R2 even if all the assumptions hold true

n=11064, Data from the Employment Survey (INE), 2003

Page 21: The Simple Regression Model - Universidade Nova de Lisboadocentes.fe.unl.pt/~azevedoj/Web Page_files/Teaching_files... · Simple Regression Model 4 Further assumptions ... Now take

Simple Regression Model 21

Assumptions again...Assumption SLR.1(Linearity in parameters)

Assumption SLR.2(Random Sampling from the population)

We have a random sample

Assumption SLR.3(Variation in the Regressor)

xi, i=1,2,...,n don’t have all the same value

Assumption SLR.4(Zero Conditional Mean)

This implies

Page 22: The Simple Regression Model - Universidade Nova de Lisboadocentes.fe.unl.pt/~azevedoj/Web Page_files/Teaching_files... · Simple Regression Model 4 Further assumptions ... Now take

Simple Regression Model 22

Properties of OLS EstimatorsThought experiment:

• Get a sample of sizen from the population many times (infinite times)• Compute the OLS estimates• Is the average of these estimates equal to the true parameter values?

• Under the previous assumption (SLR.1 to SLR.4) it turnsout that theanswer is positive

It will be the case that:

These are estimators if we treat (xi,yi) as random variables

Page 23: The Simple Regression Model - Universidade Nova de Lisboadocentes.fe.unl.pt/~azevedoj/Web Page_files/Teaching_files... · Simple Regression Model 4 Further assumptions ... Now take

Simple Regression Model 23

Unbiasedness of OLS (Proof)

∑ = −=n

i ix xxSST1

2)(

Let us rewrite our estimator as:

Put:

Now, note that: ( ) ( ) ( )2111

and 0 ∑∑∑ ===−=−=−

n

i ii

n

i i

n

i i xxxxxxx

Page 24: The Simple Regression Model - Universidade Nova de Lisboadocentes.fe.unl.pt/~azevedoj/Web Page_files/Teaching_files... · Simple Regression Model 4 Further assumptions ... Now take

Simple Regression Model 24

Unbiasedness of OLS (cont)

Now take expectationsConditional on the values of the x’s, that is, treat thex’s asconstants. To avoid heavy notation, I don´t make this explicit.

Now, it´s really easy to prove unbiasedness of

Try it yourself!

And if the conditional expectation is a constantthe unconditinal expectation is also that constant...

Page 25: The Simple Regression Model - Universidade Nova de Lisboadocentes.fe.unl.pt/~azevedoj/Web Page_files/Teaching_files... · Simple Regression Model 4 Further assumptions ... Now take

Simple Regression Model 25

Unbiasedness Summary

• The OLS estimators ofβ1 andβ0 are unbiased

• However, in a given sample we may be “close” or “far” from the true parameter, unbiasedness is a minimal requirement

• Unbiasedness depends on the 4 assumptions:if any assumption fails, then OLS is not necessarily unbiased

Page 26: The Simple Regression Model - Universidade Nova de Lisboadocentes.fe.unl.pt/~azevedoj/Web Page_files/Teaching_files... · Simple Regression Model 4 Further assumptions ... Now take

Simple Regression Model 26

Variance of the OLS Estimators

• We know already that the sampling distribution of our estimators is centered around the true parameter

• Is the distribution concentrated around the true parameter?

• To answer this question, we add an additional assumption

Assumption SLR.5(Homoskedasticity)

Page 27: The Simple Regression Model - Universidade Nova de Lisboadocentes.fe.unl.pt/~azevedoj/Web Page_files/Teaching_files... · Simple Regression Model 4 Further assumptions ... Now take

Simple Regression Model 27

Variance of OLS (cont.)

• Var(u|x) = E(u2|x)-[E(u|x)]2

• E(u|x) = 0, soσ2 = E(u2|x) = E(u2) = Var(u)

• So,σ2 is also the unconditional variance, called the variance of

the error

• σ, the square root of the error variance is called the standard deviation of the error

• Can write: E(y|x)=β0 + β1x and Var(y|x) = σ2

Assumption SLR.5(Homoskedasticity)

Page 28: The Simple Regression Model - Universidade Nova de Lisboadocentes.fe.unl.pt/~azevedoj/Web Page_files/Teaching_files... · Simple Regression Model 4 Further assumptions ... Now take

Simple Regression Model 28

..

x1 x2

Homoskedastic Case

E(y|x) = β0 + β1x

y

f(y|x)

Page 29: The Simple Regression Model - Universidade Nova de Lisboadocentes.fe.unl.pt/~azevedoj/Web Page_files/Teaching_files... · Simple Regression Model 4 Further assumptions ... Now take

Simple Regression Model 29

.

xx1 x2

yf(y|x)

Heteroskedastic Case

x3

..

E(y|x) = β0 + β1x

Page 30: The Simple Regression Model - Universidade Nova de Lisboadocentes.fe.unl.pt/~azevedoj/Web Page_files/Teaching_files... · Simple Regression Model 4 Further assumptions ... Now take

Simple Regression Model 30

Variance of OLS (cont.)

( )( ) ( )

1 1

2 22

2 222 2 2

ˆ 1 ( )

1 1( ) ( )

1 1( )

i ix

i i i ix x

i xx x x

Var Var x x uSST

Var x x u x x Var uSST SST

x x SSTSST SST SST

β β

σσ σ

= + − = = − = − = − = =

∑∑ ∑∑

This is a conditional variance!

∑ = −=n

i ix xxSST1

2)(

)()()(

)()()( 2

XVarYVarXYVar

XVarbbXVarbXaVar

+=+

==+

Note that:

- a andb are constants andX a Random Variable

- If X andYare independent

Page 31: The Simple Regression Model - Universidade Nova de Lisboadocentes.fe.unl.pt/~azevedoj/Web Page_files/Teaching_files... · Simple Regression Model 4 Further assumptions ... Now take

Simple Regression Model 31

Variance of OLS Summary

• The larger the error variance, σ2, the larger the variance of the slope estimator

• The larger the variability in the xi, the smaller the variance of the slope estimate

• SSTx tends to increase with the sample size n, so variance tends to decrease with the sample size

• Can also show:

( )xSST

Var2

1̂σβ = ∑ = −=

n

i ix xxSST1

2)(

Page 32: The Simple Regression Model - Universidade Nova de Lisboadocentes.fe.unl.pt/~azevedoj/Web Page_files/Teaching_files... · Simple Regression Model 4 Further assumptions ... Now take

Simple Regression Model 32

Estimating the Error Variance• In practice, the error variance σ2 is unknown, must estimate

it...

• Can show that this is an unbiased estimator of the error variance

• The so-called standar error of the regression is given by:

Page 33: The Simple Regression Model - Universidade Nova de Lisboadocentes.fe.unl.pt/~azevedoj/Web Page_files/Teaching_files... · Simple Regression Model 4 Further assumptions ... Now take

Simple Regression Model 33

Estimated standard error (se) of the parameters

And similarly for

In our leading example: