the simple regression model - universidade nova de lisboadocentes.fe.unl.pt/~azevedoj/web...

Simple Regression Model 1

The Simple Regression Model


Simple regression model: Objectives

Given the model:

- where y is earnings and x years of education

- Or y is sales and x is spending in advertising

- Or y is the market value of an apartment and x is its area

Our objectives for the moment will be:

- Estimate the unknown parameters β0 andβ1

- Assess “how much”x explains y


The Simple Linear Regression (SLR) Model with Cross-Sectional Data

• y is the dependent variable (or explained variable, or response variable, or theregressand)

• x is the independent variable (or explanatory variable, or control variable, orthe regressor)

• u is the error term or disturbance (y is allowed to vary for the same level ofx)

• This is a Population equation. It islinear in the (unknown) parameters

Assumption SLR.1(Linearity in parameters)


Further assumptionsAssumption SLR.2(Random Sampling)

We have a random sample from the population such that:

Assumption SLR.3(Variation in the Regressor)

xi, i=1,2,...,n don’t have all the same value


Further assumptions

ueducationEarnings ++= 10 ββ

This is not a restrictive assumption, since we can always use β0 so that SLR.4´holds

Assumption SLR.4(Zero Conditional Mean)

Assumption SLR.4(́Zero Mean)

Not really necessary given that SLR.4 implies SLR.4´

Crucial assumption! Remember previous examples:

uPolicemanCrimeRate ++= 10 ββIn these cases SLR.4 most likely fails.


Implication of Zero Conditional Mean

..

x1 x2

E(y|x) = β0 + β1x

y

f(y)

implies

For anyx, distribution ofy is centered at E(y|x)


.

..

.

y4

y1

y2

y3

x1 x2 x3 x4

}

}

{

{

u1

u2

u3

u4

x

y

Population regression line, sample data pointsand the associated error terms

E(y|x) = β0 + β1x


Estimation of unknown parameters by method of moments

• Want to estimate the population parameters from a sample

• Let {(xi,yi): i=1, …,n} denote a random sample of size n from the population

• yi = β0 + β1xi + ui

implies

implies

since

i)

ii)


Estimation of unknown parameters by method of moments (intuition)

Clever idea is to pick so that sample moments match the population moments

These are two equations in the two unknowns

The sample counterparts are:


Estimation of unknown parameters by method of moments (solution)

Need sample variation of the x variable.Otherwise the slope parameter is notwell defined!

implies:

Plug-in in

to obtain:

this simplifies to:


Slope estimator

• The slope estimate is the sample covariance between x and ydivided by the sample variance of x

• If x and y are positively correlated, the slope will be positive

• If x and y are negatively correlated, the slope will be negative

• Need x to vary in our sample

It is anestimator if we treat (xi,yi) as randomvariables. It is an estimate once we substitute(xi,yi) by the actual observations.


Example and interpretation

• wageis net monthly wage andeducmeasures the number ofyears of schooling completed

• So, we estimate a positive relation between these twovariables; an additional year of schooling leads to an estimatedexpected increase of 52.513 euros in the wage

• Again, must be cautious about the interpretation of theseestimates

n=11064, Data from the Employment Survey (INE), 2003


Definitions

Fitted value

Residual

Sum of Squared Residuals (RSS)


.

..

.

y4

y1

y2

y3

x1 x2 x3 x4

}

}

{

{

û1

û2

û3

û4

x

y

Sample regression line, sample data pointsand the associated estimatederror terms

(called residuals); fitted values are on the line

xy 10ˆˆˆ ββ +=


Alternate approach to derivationOLS – Ordinary Least Squares

• Can fit a line so as to minimise the sum of squared residuals

Equivalent to the equations we had before: same solution!


Review – OLS estimators

OLS obtained by minimising:

Fitted value:

Residual:

Given a random sample:

with respect to

Alternatively, can exploit the fact that:


Today

- Assess “how much”x explains y

- Assess “how close” our estimators

are from the true (population) values


Looking for a Goodness of fit measure: Decomposition of the Total Sum of

Squares (SST) We know:

SSE, Explained Sum ofSquares

This implies: , to be used later

Also: Now define the Total Sum of SquaresSST


Goodness-of-Fit

• How well does our sample regression line fit the data? If SSRis very small relative to SST(or SSEvery large relative to SST), then a lot of the variation in the dependent variable y is “explained” by the model

• Can compute the fraction of the total sum of squares (SST) that is explained by the model, denote this as the R-squared of the regression

• R2 = SSE/SST= 1 –SSR/SST

• The lower SSR, the higher will be R2

• OLS maximises R2 (minimisesSSR). We have always 0≤ R2 ≤1


Example (continued)

• A lot of variation in the dependent variable (wage) is left unexplained by the model

• This isnot related to the fact thateducis most probably correlated withu

• Can have low R2 even if all the assumptions hold true

n=11064, Data from the Employment Survey (INE), 2003


Assumptions again...Assumption SLR.1(Linearity in parameters)

Assumption SLR.2(Random Sampling from the population)

We have a random sample

Assumption SLR.3(Variation in the Regressor)

xi, i=1,2,...,n don’t have all the same value

Assumption SLR.4(Zero Conditional Mean)

This implies


Properties of OLS EstimatorsThought experiment:

• Get a sample of sizen from the population many times (infinite times)• Compute the OLS estimates• Is the average of these estimates equal to the true parameter values?

• Under the previous assumption (SLR.1 to SLR.4) it turnsout that theanswer is positive

It will be the case that:

These are estimators if we treat (xi,yi) as random variables


Unbiasedness of OLS (Proof)

∑ = −=n

i ix xxSST1

2)(

Let us rewrite our estimator as:

Put:

Now, note that: ( ) ( ) ( )2111

and 0 ∑∑∑ ===−=−=−

n

i ii

n

i i

n

i i xxxxxxx


Unbiasedness of OLS (cont)

Now take expectationsConditional on the values of the x’s, that is, treat thex’s asconstants. To avoid heavy notation, I don´t make this explicit.

Now, it´s really easy to prove unbiasedness of

Try it yourself!

And if the conditional expectation is a constantthe unconditinal expectation is also that constant...


Unbiasedness Summary

• The OLS estimators ofβ1 andβ0 are unbiased

• However, in a given sample we may be “close” or “far” from the true parameter, unbiasedness is a minimal requirement

• Unbiasedness depends on the 4 assumptions:if any assumption fails, then OLS is not necessarily unbiased


Variance of the OLS Estimators

• We know already that the sampling distribution of our estimators is centered around the true parameter

• Is the distribution concentrated around the true parameter?

• To answer this question, we add an additional assumption

Assumption SLR.5(Homoskedasticity)


Variance of OLS (cont.)

• Var(u|x) = E(u2|x)-[E(u|x)]2

• E(u|x) = 0, soσ2 = E(u2|x) = E(u2) = Var(u)

• So,σ2 is also the unconditional variance, called the variance of

the error

• σ, the square root of the error variance is called the standard deviation of the error

• Can write: E(y|x)=β0 + β1x and Var(y|x) = σ2

Assumption SLR.5(Homoskedasticity)


..

x1 x2

Homoskedastic Case

E(y|x) = β0 + β1x

y

f(y|x)


.

xx1 x2

yf(y|x)

Heteroskedastic Case

x3

..

E(y|x) = β0 + β1x


Variance of OLS (cont.)

( )( ) ( )

1 1

2 22

2 222 2 2

ˆ 1 ( )

1 1( ) ( )

1 1( )

i ix

i i i ix x

i xx x x

Var Var x x uSST

Var x x u x x Var uSST SST

x x SSTSST SST SST

β β

σσ σ

= + − = = − = − = − = =

∑∑ ∑∑

This is a conditional variance!

∑ = −=n

i ix xxSST1

2)(

)()()(

)()()( 2

XVarYVarXYVar

XVarbbXVarbXaVar

+=+

==+

Note that:

- a andb are constants andX a Random Variable

- If X andYare independent


Variance of OLS Summary

• The larger the error variance, σ2, the larger the variance of the slope estimator

• The larger the variability in the xi, the smaller the variance of the slope estimate

• SSTx tends to increase with the sample size n, so variance tends to decrease with the sample size

• Can also show:

( )xSST

Var2

1̂σβ = ∑ = −=

n

i ix xxSST1

2)(


Estimating the Error Variance• In practice, the error variance σ2 is unknown, must estimate

it...

• Can show that this is an unbiased estimator of the error variance

• The so-called standar error of the regression is given by:


Estimated standard error (se) of the parameters

And similarly for

In our leading example:

the simple regression model - universidade nova de lisboadocentes.fe.unl.pt/~azevedoj/web...

Documents