The Simple Regression Model — web.uvic.ca/~mfarnham/345/t2notes.pdf
TRANSCRIPT
The Simple Regression Model
y = β0 + β1x + u
Some Terminology
! In the simple linear regression model, where y = β0 + β1x + u, we typically refer to y as the Dependent Variable, or Left-Hand Side Variable, or Explained Variable, or Regressand
Some Terminology, cont.
! In the simple linear regression of y on x, we typically refer to x as the Independent Variable, or Right-Hand Side Variable, or Explanatory Variable, or Regressor, or Covariate
Some Terminology, cont.
! The parameters of the model, β0 and β1, are constants
! In this case they are the intercept and slope of a function relating education to earnings
! The error term, u, includes all other factors affecting earnings, including unobserved factors
! One could estimate the parameters of this model using data on a random sample of working people
! Earnings = β0 + β1Educ + u
A Simple Assumption
! We start by making a simple assumption about the error
! The average value of u, the error term, in the population is 0. That is, E(u) = 0
! This is not a restrictive assumption, since we can always use β0 to normalize E(u) to 0 (e.g. use β0 to normalize average ability to zero)
Zero Conditional Mean
! We need to make a crucial assumption about how u and x are related
! We need to assume that the average value of u does not depend on the value of x
! x and u are independent, that is, E(u|x) = E(u) = 0, which implies E(y|x) = β0 + β1x
Zero Conditional Mean
! To get some intuition as to what this means, think about the returns to education example
! Suppose ability (which is unobservable to us) was included in u
! Our assumption implies: E(ability|8 years ed.) = E(ability|12 years ed.)
! This would not be the case if those with more ability chose more education
[Figure: Population Regression Function — E(y|x) = β0 + β1x is a linear function of x; for any x, the conditional distribution f(y|x) is centered about E(y|x).]
Ordinary Least Squares
! Note that I haven't mentioned data yet.
! Basic idea of regression is to estimate the population parameters (β0 and β1) from a sample
! Let {(xi, yi): i = 1, …, n} denote a random sample of size n from the population
! Since they are drawn from the population, for each observation in this sample, it will be the case that yi = β0 + β1xi + ui
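As an illustration (ours, not from the slides), this sampling setup can be sketched in Python; the particular values of β0, β1, the range of x, and the error spread below are all hypothetical:

```python
# A hypothetical simulation of drawing a random sample {(x_i, y_i)}
# from the population model y = beta0 + beta1*x + u.
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1 = 1.0, 0.5           # true parameters (unknown to us in practice)
n = 1000                          # sample size

x = rng.uniform(8, 16, size=n)    # e.g. years of education
u = rng.normal(0.0, 2.0, size=n)  # error term with E(u) = 0
y = beta0 + beta1 * x + u         # each draw satisfies y_i = beta0 + beta1*x_i + u_i
```

Every observation in the simulated sample satisfies the population equation exactly, because that is how it was generated; the estimation problem is recovering β0 and β1 from (x, y) alone.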
[Figure: Population regression line E(y|x) = β0 + β1x, some actual draws (x1, y1), …, (x4, y4) from the population, and the associated error terms u1, …, u4.]
• Assume a population regression function (true relationship; parameters unknown to us)
• Actual outcomes differ from true relationship because of u
• Want to estimate the true relationship based on data
Deriving OLS Estimates
! We need some way of estimating the true relationship using the sample data
! Can do this using some of our assumptions
! First we need to realize that our main assumption of E(u|x) = E(u) = 0 also implies Cov(x,u) = E(xu) = 0. Why?
! From basic probability we know that Cov(x,u) = E(xu) − E(x)E(u)
! Given E(u) = 0 (by assumption) and Cov(x,u) = 0 (implied by the same assumption), it follows that E(xu) = 0
Deriving OLS continued
! So we have that
1. E(u) = 0
2. E(xu) = 0
! These can be rewritten in terms of x, y, β0 and β1, since u = y − β0 − β1x:
1. E(y − β0 − β1x) = 0
2. E[x(y − β0 − β1x)] = 0
! These are called moment restrictions
Deriving OLS using M.O.M.
! So we have two equations (2 restrictions) and 2 unknowns (β0, β1)
! We will solve using the sample counterparts of these two equations
! The method of moments approach to estimation imposes the population moment restrictions on the sample moments
! What does this mean? Recall that for E(X), the mean of a population distribution, a sample estimator of E(X) is simply the arithmetic mean of the sample
More Derivation of OLS
! We want to choose values of the parameters that will ensure that the sample versions of our moment restrictions are true
! The sample versions (with sums running over i = 1, …, n) are as follows:

1A) (1/n) Σ (yi − β̂0 − β̂1xi) = 0

2A) (1/n) Σ xi(yi − β̂0 − β̂1xi) = 0
More Derivation of OLS
! Given the definition of a sample mean, and properties of summation, we can rewrite equation 1A) as follows:

(1/n) Σ yi = (1/n) Σ β̂0 + β̂1 (1/n) Σ xi , or

ȳ = β̂0 + β̂1x̄

Solving for β̂0:

β̂0 = ȳ − β̂1x̄

! The equation passes through the means of x and y
! Need an expression for the estimate of β1 to get an estimate of β0
More Derivation of OLS
! 2A) can be rewritten, dropping 1/n and plugging in our estimate β̂0 = ȳ − β̂1x̄:

Σ xi(yi − (ȳ − β̂1x̄) − β̂1xi) = 0

Σ xi(yi − ȳ) = β̂1 Σ xi(xi − x̄)

Σ (xi − x̄)(yi − ȳ) = β̂1 Σ (xi − x̄)²
So the OLS estimated slope is

β̂1 = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)²

provided that Σ (xi − x̄)² > 0

Of course, this sum can't be less than zero. But if it's zero, we're out of luck. Intuitively, this should make sense: we need variation in x to discern a relationship between x and y.
Summary of OLS slope estimate
! The slope estimate is the sample covariance between x and y divided by the sample variance of x
! If x and y are positively correlated, the slope will be positive
! If x and y are negatively correlated, the slope will be negative
! Only need x to vary in our sample to get an estimate
Example: If everyone in the population has the same education, the denominator equals zero (an uninteresting question)
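The slope and intercept formulas translate directly into code. This is a minimal sketch in Python (a language choice of ours, not the slides); the helper name `ols_simple` is made up:

```python
import numpy as np

def ols_simple(x, y):
    """OLS estimates for the simple regression of y on x:
    slope = sample covariance of (x, y) / sample variance of x,
    intercept = ybar - slope * xbar."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    sxx = ((x - x.mean()) ** 2).sum()
    if sxx == 0:
        # no variation in x: the slope formula divides by zero
        raise ValueError("no variation in x: slope is not defined")
    b1 = ((x - x.mean()) * (y - y.mean())).sum() / sxx
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

b0, b1 = ols_simple([1, 2, 3, 4], [2, 4, 6, 8])   # data lying exactly on y = 2x
```

On data that lie exactly on a line, the estimates recover that line (b0 = 0, b1 = 2 here); the `ValueError` branch is the "everyone has the same education" case from the slide.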
OLS Interpretation
! Intuitively, OLS is fitting a line through the sample points such that the sum of squared residuals is as small as possible, hence the term least squares
! The residual, ûi, is an estimate of the error term, ui, and is the difference between the actual data point, yi, and its fitted value, ŷi:

ûi = yi − ŷi = yi − β̂0 − β̂1xi
[Figure: Sample Regression Function ŷ = β̂0 + β̂1x, sample data points (x1, y1), …, (x4, y4), and the associated estimated error terms (residuals) û1, …, û4.]
! Residual is the difference between actual y and fitted value
! SRF is obtained for a given sample by minimizing the sum of squared residuals
! Each new sample will generate a different function
! Unlike the PRF, which is fixed
Alternate approach to derivation
! Given the intuitive idea of fitting a line, we can set up a formal minimization problem
! That is, we want to choose our parameters such that we minimize the following (the sum of squared residuals):

Σ ûi² = Σ (yi − β̂0 − β̂1xi)²
Alternate approach, continued
! Solving the minimization problem using calculus, you obtain the following first order conditions:

foc1: Σ (yi − β̂0 − β̂1xi) = 0

foc2: Σ xi(yi − β̂0 − β̂1xi) = 0

! These are the same as we obtained before (using method of moments), multiplied by n
! We could divide both sides by n and proceed as before
Algebraic Properties of OLS
! These properties hold basically by definition
1. The sum of the OLS residuals is zero (this follows from foc1): Σ ûi = 0
   ! Thus, the sample average of the OLS residuals is zero as well: (1/n) Σ ûi = 0
2. The sample covariance between the regressors and the OLS residuals is zero (by foc2): Σ xiûi = 0
3. The OLS regression line always goes through the mean of the sample: ȳ = β̂0 + β̂1x̄
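These three properties can be checked numerically on any sample. A Python sketch (our illustration; the simulated data are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 1.0 + 0.5 * x + rng.normal(size=200)

# closed-form OLS estimates
b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b0 = y.mean() - b1 * x.mean()
uhat = y - b0 - b1 * x                            # OLS residuals

sum_resid = uhat.sum()                            # property 1: zero
sum_x_resid = (x * uhat).sum()                    # property 2: zero
gap_at_means = y.mean() - (b0 + b1 * x.mean())    # property 3: zero
```

All three quantities are zero up to floating-point rounding, regardless of which sample was drawn, because they follow from the first order conditions rather than from any property of the data.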
Decomposition of sum of squares
! We can think of each observation as being made up of two parts: an explained part and an unexplained part

yi = ŷi + ûi

! We can then define the following:

Total Sum of Squares: SST ≡ Σ (yi − ȳ)²
Explained Sum of Squares: SSE ≡ Σ (ŷi − ȳ)²
Residual Sum of Squares: SSR ≡ Σ ûi²
Proof that SST = SSE + SSR
! We can show that SST = SSE + SSR. Recall that ûi = yi − ŷi.

Σ (yi − ȳ)² = Σ [(yi − ŷi) + (ŷi − ȳ)]²
            = Σ [ûi + (ŷi − ȳ)]²
            = Σ ûi² + 2 Σ ûi(ŷi − ȳ) + Σ (ŷi − ȳ)²
            = SSR + 2 Σ ûi(ŷi − ȳ) + SSE

and we know that Σ ûi(ŷi − ȳ) = 0, since the sample covariance between the residuals and the fitted values is zero.
Goodness-of-Fit
! We can now tell how well our sample regression line fits our sample data (i.e. how well x explains y)
! Can compute the fraction of the total sum of squares (SST) that is explained by the model
! This is called the coefficient of determination, or the R-squared of the regression
! R² = SSE/SST = 1 − SSR/SST
! R² always lies between 0 and 1
Example: If R² = 0.65, this tells us that variation in x explains 65% of the variation in y.
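The decomposition and both R² formulas can be verified on simulated data. A Python sketch (all numbers hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=500)
y = 2.0 + 1.5 * x + rng.normal(size=500)

b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x
uhat = y - yhat

sst = ((y - y.mean()) ** 2).sum()       # total sum of squares
sse = ((yhat - y.mean()) ** 2).sum()    # explained sum of squares
ssr = (uhat ** 2).sum()                 # residual sum of squares

r2_a = sse / sst
r2_b = 1.0 - ssr / sst                  # the two formulas agree
```

Numerically, SST = SSE + SSR holds to rounding error, and the two R² formulas give the same value between 0 and 1.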
iClickers
! R² = SSE/SST = 1 − SSR/SST
! Question: If you take a sample and estimate a model and have a high sum of squared residuals, you would say your model's "goodness of fit" is:
A) High
B) Low
Properties of OLS Estimators
! We want to know how our estimates of β0 and β1 relate to the population parameters
! Suppose that we get estimates of the parameters from one sample of the population, then take new samples and get a new estimate of parameters for each sample
! What are the sampling properties of our estimates of β0 and β1?
iClickers
! Recall the formula for the OLS slope estimator:

β̂1 = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)²

! Question: Which of the following statements about the estimator is false:
A) It is only defined if there is variation in x.
B) It is a constant, in repeated samples.
C) It will have sampling variance.
D) It will have an expected value.
Unbiasedness of OLS
! We can establish the unbiasedness of OLS under a few simple assumptions
1. Assume the population model is linear in parameters: y = β0 + β1x + u
2. Assume we have a random sample of size n, {(xi, yi): i = 1, 2, …, n}, from the population. We can rewrite the population model in terms of the random sample as yi = β0 + β1xi + ui for i = 1, 2, …, n, where ui is the error for observation i.
Unbiasedness of OLS
3. Assume there is at least some variation in the xi
4. Assume E(u|x) = 0
For a random sample this implies that E(ui|xi) = 0 for all i. This allows us to derive properties of the OLS estimators conditional on the sample. That is, we will assume that the x’s remain constant in repeated sampling (so that what changes in repeated sampling is the y’s and the u’s). The assumption that ui and xi are independent should not be taken for granted. When it is violated, the parameter estimates will be biased, in general.
! In order to think about unbiasedness, we need to rewrite our estimator in terms of the population parameter
Unbiasedness of OLS (cont)
! Start with a simple rewrite of the formula as

β̂1 = Σ (xi − x̄)yi / Σ (xi − x̄)²

! We can do this rewrite because Σ (xi − x̄)(yi − ȳ) = Σ (xi − x̄)yi
Unbiasedness of OLS (cont)
! Substituting for yi, the numerator can be written in terms of the population parameters:

Σ (xi − x̄)yi = Σ (xi − x̄)(β0 + β1xi + ui)
             = Σ (xi − x̄)β0 + Σ (xi − x̄)β1xi + Σ (xi − x̄)ui
             = β0 Σ (xi − x̄) + β1 Σ (xi − x̄)xi + Σ (xi − x̄)ui
Unbiasedness of OLS (cont)
! Recall from the appendices that

Σ (xi − x̄) = 0 and Σ (xi − x̄)xi = Σ (xi − x̄)²

! Thus, the numerator can be rewritten as

β1 Σ (xi − x̄)² + Σ (xi − x̄)ui

so we can rewrite β̂1 as

β̂1 = [β1 Σ (xi − x̄)² + Σ (xi − x̄)ui] / Σ (xi − x̄)²
    = β1 + Σ (xi − x̄)ui / Σ (xi − x̄)²
Unbiasedness of OLS (cont)
Let di = (xi − x̄), so that

β̂1 = β1 + Σ diui / Σ di²

This tells us that the estimator equals the population slope plus a linear combination of the errors. Since the x's are assumed to be constant in repeated samples, the second term varies (in repeated samples) only through the u's. In general the estimator will differ from the true population value. But will it differ systematically?
Unbiasedness of OLS (cont)
! Taking the expected value of the estimator (conditional on the x's):

E(β̂1) = β1 + E[ (1/Σ di²) Σ diui ]
       = β1 + (1/Σ di²) Σ E(diui)
       = β1 + (1/Σ di²) Σ di E(ui)
       = β1 + (1/Σ di²) Σ di · 0
       = β1

We find that the slope coefficient estimator is unbiased. It doesn't vary systematically from the true population parameter.
Unbiasedness Summary
! Similarly, we could also prove that our estimate of β0 is unbiased (homework)
! Thus, the OLS estimates of β1 and β0 are unbiased
! Proof of unbiasedness depends on our 4 assumptions; if any assumption fails, then OLS is not necessarily unbiased
! Remember, unbiasedness is a description of the estimator; in a given sample we may be "near" or "far" from the true parameter
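Unbiasedness can be illustrated by Monte Carlo, holding the x's fixed across repeated samples exactly as in the derivation. A Python sketch (sample size, number of replications, and parameter values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
beta0, beta1 = 1.0, 0.5
x = rng.uniform(0, 10, size=100)          # x's held fixed in repeated sampling

slopes = []
for _ in range(5000):                     # 5000 repeated samples
    u = rng.normal(0.0, 1.0, size=100)    # E(u|x) = 0 holds by construction
    y = beta0 + beta1 * x + u
    b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
    slopes.append(b1)

mean_b1 = np.mean(slopes)                 # close to the true beta1 = 0.5
```

Each individual estimate misses the truth, but the average over many repeated samples sits very close to β1, which is what unbiasedness asserts.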
Variance of the OLS Estimators
! Now we know that the sampling distribution of our estimate is centered around the true parameter
! Want to think about how spread out this distribution is
! It's much easier to think about this variance under an additional assumption, so
! Assume Var(u|x) = σ²
We're assuming that the variance of the error term is constant over all values of x. This is referred to as assuming that the errors are homoskedastic.
Variance of OLS (cont)
! Var(u|x) = E(u²|x) − [E(u|x)]² = σ²
! Since E(u|x) = 0, we have σ² = E(u²|x) = E(u²) = Var(u)
! Thus σ² is also the unconditional variance of u
! σ is called the standard deviation of the error
! For convenience, we can say: E(y|x) = β0 + β1x and Var(y|x) = σ²
Homoskedastic Case
[Figure: E(y|x) = β0 + β1x with conditional densities f(y|x) at x1 and x2 having the same spread.]
! Var(y|x) does not depend on x
Heteroskedastic Case
[Figure: E(y|x) = β0 + β1x with conditional densities f(y|x) at x1, x2, x3 whose spread increases with x.]
! When Var(u|x) depends on x, so does Var(y|x)
Variance of OLS (cont)
! The error term in our returns to education example may exhibit heteroskedasticity
! Recall that y is the wage
Low educated: likely to become tradespersons, factory workers. Low wage variance.
High educated: may become poets, musicians (low wage), investment bankers (high wage) or anything else. High wage variance.
Later in the course, we will relax the homoskedasticity assumption.
Variance of OLS (cont)
! Now, we will use the assumption to find the sampling variances of the OLS estimators
! We are conditioning on x, so:

Var(β̂1) = Var( β1 + Σ diui / Σ di² )
        = (1/Σ di²)² Var(Σ diui)
        = (1/Σ di²)² Σ di² Var(ui)
        = (1/Σ di²)² Σ di² σ²
        = σ² (1/Σ di²)² Σ di²
        = σ² / Σ di²
        = σ² / Σ (xi − x̄)²
Some Intuition
! Under homoskedasticity, we now know that

Var(β̂1) = σ² / Σ (xi − x̄)²

Larger σ² means more variance in β̂1. More variation in unobservables (which are captured in u) makes it harder to discern the effect of x on y.
Larger variation in the x's (bigger denominator) lowers the variance of β̂1. More variation in the "treatment" (independent variable) makes it easier to discern the effect of x on y.
Note also that a larger sample increases the denominator ==> a more precise estimator.
Variance of OLS (Continued)
! We could also solve to find the sampling variance of the intercept estimator:

Var(β̂0) = σ² (n⁻¹ Σ xi²) / Σ (xi − x̄)² = σ² Σ xi² / [n Σ (xi − x̄)²]

! These formulas would not hold in the presence of heteroskedasticity, i.e. we couldn't rely on these variances for hypothesis testing
Variance of OLS Summary
! We will primarily be interested in the slope estimate. Its variance showed:
! The larger the error variance, σ², the larger the variance of the slope estimate
! The larger the variability in the xi, the smaller the variance of the slope estimate
! As a result, a larger sample size should decrease the variance of the slope estimate (giving us a more precise estimate)
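The variance formula can be checked by Monte Carlo against the simulated sampling variance of the slope estimate. A Python sketch (all numbers hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)
sigma = 2.0
x = rng.uniform(0, 10, size=50)                   # fixed in repeated samples
theory = sigma**2 / ((x - x.mean()) ** 2).sum()   # Var(b1) = sigma^2 / sum (xi - xbar)^2

slopes = []
for _ in range(20000):
    u = rng.normal(0.0, sigma, size=50)           # homoskedastic errors
    y = 1.0 + 0.5 * x + u
    b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
    slopes.append(b1)

mc_var = np.var(slopes)                           # should be close to `theory`
```

Doubling σ would quadruple both numbers, while spreading the x's out (or adding observations) shrinks both, matching the intuition above.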
Estimating the Error Variance
! Recall: σ² = E(u²)
! So an unbiased estimator of σ² would be (1/n) Σ ui²
Problem
! We don't know what the error variance, σ², is, because we don't observe the errors, ui
! However, we do observe the residuals, ûi
! Why not replace the errors with the residuals to form an estimate of the error variance:

σ̃² = (1/n) Σ ûi² = SSR/n
Error Variance Estimate (cont)
! However, this would be a biased estimate, because it does not account for the 2 restrictions already placed on the OLS residuals:
1. Σ ûi = 0
2. Σ xiûi = 0
! If we know n − 2 of the residuals we can solve for the other 2 (there are n − 2 degrees of freedom)
! So an unbiased estimate would be:

σ̂² = 1/(n − 2) Σ ûi² = SSR/(n − 2)
Error Variance Estimate (cont)
! Work through the proof on pg. 57 (Theorem 2.3) to prove that this is an unbiased estimator
! You will need to get the residuals as a function of the parameter estimates:

ûi = yi − β̂0 − β̂1xi

Substituting for yi:

ûi = (β0 + β1xi + ui) − β̂0 − β̂1xi

! Substitute this in and take the expected value of the error variance estimator
Error Variance Estimate (cont)
! We can now also calculate estimates for the standard errors of the coefficient estimates
! If we define the Standard Error of the Regression (SER): σ̂ = √σ̂²
We can use this to estimate the standard deviations of the coefficient estimates:

Var(β̂1) = σ² / Σ (xi − x̄)², so

sd(β̂1) = σ / √(Σ (xi − x̄)²)
Error Variance Estimate (cont)
! A natural estimator of this standard deviation is

se(β̂1) = σ̂ / √(Σ (xi − x̄)²)

! This is called the standard error of β̂1
Error Variance Estimate (cont)
! Similarly for β0:

se(β̂0) = √( σ̂² Σ xi² / [n Σ (xi − x̄)²] )

! These standard errors will differ for each sample
! They will be used to construct test statistics and confidence intervals
! We will do this in a more general framework (with more RHS variables)
! But this simple case is meant to illustrate the basic ideas as simply as possible
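Putting the pieces together, σ̂² = SSR/(n − 2) and both standard errors can be computed from a single sample in a few lines. A Python sketch (the simulated data and the true σ = 2 are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
x = rng.uniform(0, 10, size=n)
y = 1.0 + 0.5 * x + rng.normal(0.0, 2.0, size=n)   # true sigma = 2, sigma^2 = 4

sxx = ((x - x.mean()) ** 2).sum()
b1 = ((x - x.mean()) * (y - y.mean())).sum() / sxx
b0 = y.mean() - b1 * x.mean()
uhat = y - b0 - b1 * x

sigma2_hat = (uhat ** 2).sum() / (n - 2)           # unbiased: SSR / (n - 2)
se_b1 = np.sqrt(sigma2_hat / sxx)                  # se of the slope estimate
se_b0 = np.sqrt(sigma2_hat * (x ** 2).sum() / (n * sxx))   # se of the intercept
```

With a different sample, all three quantities would change, which is the point of the slide: standard errors are themselves estimates and vary from sample to sample.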
iClickers
! Recall that Var(β̂1) = σ² / Σ (xi − x̄)²
! Question: Which of the following will lead to a larger sampling variance of the OLS estimator of the slope coefficient in the univariate model?
A) A larger sample size
B) More variation in x
C) More variation in u (the error term)
iClickers
! Recall that the standard error of the OLS slope coefficient estimate is given by

se(β̂1) = σ̂ / √(Σ (xi − x̄)²)

! Question: Which of the following statements is FALSE?
A) The standard error is an estimate of the dispersion of the OLS estimator.
B) The standard error is an estimate of the variance of the OLS estimator.
C) The standard error conveys information about how precisely the slope coefficient is estimated.