TRANSCRIPT
Part 3: Least Squares Algebra 3-1/29
Econometrics I
Professor William Greene
Stern School of Business
Department of Economics
Part 3: Least Squares Algebra 3-3/29
Vocabulary
Some terms to be used in the discussion.
Population characteristics and entities vs.
sample quantities and analogs
Residuals and disturbances
Population regression line and sample regression
Objective: Learn about the conditional mean function. ‘Estimate’ β and σ².
First step: Mechanics of fitting a line (hyperplane) to
a set of data
Part 3: Least Squares Algebra 3-4/29
Fitting Criteria
The set of points in the sample
Fitting criteria – what are they?
LAD: Minimize over b: Σi |yi − xi'b_LAD|
Least squares: Minimize over b: Σi (yi − xi'b_LS)²
and so on
Why least squares?
A fundamental result:
Sample moments are “good” estimators of
their population counterparts
We will examine this principle and apply it to least squares computation.
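To make the two criteria concrete, here is a minimal sketch in Python with NumPy and SciPy (my own illustration, not part of the lecture; the data are simulated): least squares has a closed-form solution, while the LAD fit is found by general-purpose numerical minimization.

```python
# Minimal sketch: comparing the least squares and LAD fitting criteria
# on simulated data. Illustration only, not from the lecture.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])               # constant + one regressor
y = 1.0 + 0.5 * x + rng.standard_t(df=3, size=n)   # heavy-tailed disturbances

# Least squares: minimize the sum of squared residuals (closed form).
b_ls = np.linalg.solve(X.T @ X, X.T @ y)

# LAD: minimize the sum of absolute residuals (no closed form; use a
# derivative-free optimizer since the objective is not smooth).
b_lad = minimize(lambda b: np.abs(y - X @ b).sum(),
                 x0=b_ls, method="Nelder-Mead").x

print("b_LS :", b_ls)
print("b_LAD:", b_lad)
```

Both estimates should land near (1.0, 0.5); with heavy-tailed disturbances the LAD fit is typically less affected by the outlying observations.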
Part 3: Least Squares Algebra 3-5/29
An Analogy Principle for Estimating
In the population: E[y|X] = Xβ, so E[y − Xβ|X] = 0.
Continuing (assumed): E[xiεi] = 0 for every i.
Summing: Σi E[xiεi] = Σi 0 = 0.
Exchanging Σi and E[·]: E[Σi xiεi] = E[X'ε] = 0, i.e., E[X'(y − Xβ)] = 0.
So, if Xβ is the conditional mean, then E[X'ε] = 0.
We choose b, the estimator of β, to mimic this population result: i.e., mimic the population mean with the sample mean. Find b such that
(1/n)X'e = (1/n)X'(y − Xb) = 0.
As we will see, the solution is the least squares coefficient vector.
Part 3: Least Squares Algebra 3-6/29
Population Moments
We assumed that E[εi|xi] = 0. (Slide 2:40)
It follows that Cov[xi, εi] = 0.
Proof: Cov(xi, εi) = Cov(xi, E[εi|xi]) = Cov(xi, 0) = 0 (Theorem B.2).
If E[yi|xi] = xi'β, then β = (Var[xi])⁻¹ Cov[xi, yi].
Proof: Cov[xi, yi] = Cov[xi, E[yi|xi]] = Cov[xi, xi'β] = Var[xi]β.
This will provide a population analog to the statistics we
compute with the data.
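As a check on the population analogy, here is a small simulation (my own sketch, not from the slides) in which the sample variance and covariance recover the slope vector used to generate the data.

```python
# Minimal sketch: beta = (Var[x])^{-1} Cov[x, y], verified by simulation.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
beta = np.array([0.5, -2.0])              # true slopes on two regressors
x = rng.normal(size=(n, 2))
y = x @ beta + rng.normal(size=n)         # E[y|x] = x'beta

V = np.cov(x, rowvar=False)               # sample Var[x], a 2 x 2 matrix
c = np.array([np.cov(x[:, 0], y)[0, 1],   # sample Cov[x1, y]
              np.cov(x[:, 1], y)[0, 1]])  # sample Cov[x2, y]
print(np.linalg.solve(V, c))              # approximately [0.5, -2.0]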
Part 3: Least Squares Algebra 3-8/29
Least Squares
The example will be Gi regressed on
xi = [1, PGi, Yi].
Fitting criterion: the fitted equation will be
ŷi = b1xi1 + b2xi2 + ... + bKxiK.
The criterion is based on the residuals:
ei = yi − (b1xi1 + b2xi2 + ... + bKxiK) = yi − xi'b.
Make the ei as small as possible. Form a criterion and minimize it.
Part 3: Least Squares Algebra 3-9/29
Fitting Criteria
Sum of residuals: Σi ei
Sum of squares: Σi ei²
Sum of absolute values of residuals: Σi |ei|
Absolute value of sum of residuals: |Σi ei|
We focus on the sum of squares, Σi ei², now and on Σi |ei| later.
Part 3: Least Squares Algebra 3-10/29
Least Squares Algebra
Σi ei² = Σi (yi − xi'b)² = e'e = (y − Xb)'(y − Xb)
Matrix and vector derivatives.
Derivative of a scalar with respect to a vector
Derivative of a column vector wrt a row vector
Other derivatives
Part 3: Least Squares Algebra 3-11/29
Least Squares Normal Equations
∂[Σi ei²]/∂b = ∂[Σi (yi − xi'b)²]/∂b
= Σi 2(yi − xi'b)(−xi)
= −2 Σi xiyi + 2 Σi xixi'b
= −2 X'y + 2 X'Xb
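A quick numerical check of this derivative (my own sketch with simulated data): the analytic gradient −2X'(y − Xb) should agree with a finite-difference approximation of e'e at any trial b.

```python
# Minimal sketch: verify d(e'e)/db = -2 X'(y - Xb) by finite differences.
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 3))
y = rng.normal(size=50)
b = rng.normal(size=3)                        # arbitrary trial value of b

sse = lambda b: (y - X @ b) @ (y - X @ b)     # e'e as a function of b
grad_analytic = -2 * X.T @ (y - X @ b)

h = 1e-6
grad_fd = np.array([(sse(b + h * np.eye(3)[k]) - sse(b - h * np.eye(3)[k]))
                    / (2 * h) for k in range(3)])
print(np.allclose(grad_analytic, grad_fd))    # True
```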
Part 3: Least Squares Algebra 3-12/29
Least Squares Normal Equations
∂[(y − Xb)'(y − Xb)]/∂b = (−2)X'(y − Xb)
Dimensions: (−2)(K×n)(n×1) = K×1.
Note: the derivative of a (1×1) scalar with respect to a K×1 vector is a K×1 vector.
Solution: −2 X'(y − Xb) = 0, so X'y = X'Xb (the normal equations).
Part 3: Least Squares Algebra 3-13/29
Least Squares Solution
Assuming it exists: b = (X'X)⁻¹X'y.
Note the analogy: β = [Var(x)]⁻¹ Cov(x, y), while
b = (X'X)⁻¹X'y = [(1/n) Σi xixi']⁻¹ [(1/n) Σi xiyi].
Suggests something desirable about least squares.
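The solution and the moment analogy are easy to verify numerically; a minimal sketch with simulated data (not from the slides):

```python
# Minimal sketch: b solves the normal equations X'y = X'Xb, and equals
# the sample-moment form [(1/n) sum xi xi']^{-1} [(1/n) sum xi yi].
import numpy as np

rng = np.random.default_rng(2)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -0.25]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)              # b = (X'X)^{-1} X'y
Sxx = (X.T @ X) / n                                # (1/n) sum_i xi xi'
Sxy = (X.T @ y) / n                                # (1/n) sum_i xi yi
print(np.allclose(b, np.linalg.solve(Sxx, Sxy)))   # True: the same b
```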
Part 3: Least Squares Algebra 3-14/29
Second Order Conditions
Necessary condition, first derivatives:
∂[(y − Xb)'(y − Xb)]/∂b = −2 X'(y − Xb) = 0
Sufficient condition, second derivatives:
∂²[(y − Xb)'(y − Xb)]/∂b∂b' = ∂[−2 X'(y − Xb)]/∂b'
= ∂[−2 X'y + 2 X'Xb]/∂b'
= 2 X'X
Note: the derivative of a K×1 column vector with respect to a 1×K row vector is a K×K matrix.
Part 3: Least Squares Algebra 3-15/29
Side Result: Sample Moments
X'X =
[ Σi xi1²     Σi xi1xi2   ...   Σi xi1xiK ]
[ Σi xi2xi1   Σi xi2²     ...   Σi xi2xiK ]
[ ...         ...         ...   ...       ]
[ Σi xiKxi1   Σi xiKxi2   ...   Σi xiK²   ]
= Σi=1..n xixi',
where xi = (xi1, xi2, ..., xiK)' is observation i's regressor vector and each xixi' is a K×K outer product.
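This side result is a one-liner to confirm numerically (my own sketch):

```python
# Minimal sketch: X'X equals the sum over observations of the outer
# products xi xi'.
import numpy as np

rng = np.random.default_rng(8)
X = rng.normal(size=(10, 3))
S = sum(np.outer(xi, xi) for xi in X)   # sum_i xi xi', iterating over rows
print(np.allclose(X.T @ X, S))          # True
```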
Part 3: Least Squares Algebra 3-16/29
Does b Minimize e’e?
∂²e'e/∂b∂b' = 2 X'X = 2 Σi xixi', the K×K matrix of sums of squares and cross products of the regressors.
If there were a single coefficient b, we would require this to be positive, which it would be: 2 Σi xi² > 0. OK.
The matrix counterpart of a positive number is a positive definite matrix.
Part 3: Least Squares Algebra 3-17/29
A Positive Definite Matrix
A matrix C is positive definite if a'Ca > 0 for every a ≠ 0.
Generally hard to check: it requires a look at the characteristic roots (later in the course).
For some matrices it is easy to verify, and X'X is one of these:
a'X'Xa = (a'X')(Xa) = (Xa)'(Xa) = v'v = Σk vk² ≥ 0.
Could v'v = 0? That would mean v = 0, i.e., Xa = 0 for some a ≠ 0. Is this possible? No, as long as the columns of X are linearly independent.
Conclusion: b = (X'X)⁻¹X'y does indeed minimize e'e.
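A minimal numerical illustration (not from the slides) of the quadratic-form argument and the characteristic-root check:

```python
# Minimal sketch: a'(X'X)a = (Xa)'(Xa) is a sum of squares, hence > 0
# for a != 0 when X has full column rank.
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 4))                 # full column rank (w.p. 1)

for _ in range(3):
    a = rng.normal(size=4)
    v = X @ a
    print(a @ X.T @ X @ a, v @ v)             # identical, and positive

print(np.linalg.eigvalsh(X.T @ X).min() > 0)  # all characteristic roots > 0
```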
Part 3: Least Squares Algebra 3-18/29
Algebraic Results - 1
In the population: E[X'ε] = 0.
In the sample: (1/n) Σi xiei = (1/n) X'e = 0.
X'e = 0 means that for each column xk of X, xk'e = 0.
(1) Each column of X is orthogonal to e.
(2) One of the columns of X is a column of ones, so Σi ei = 0: the residuals sum to zero.
(3) It follows that (1/n) Σi ei = ē = 0, which mimics E[ε] = 0.
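These results are mechanical consequences of the normal equations, so they hold in any least squares fit with a constant term; a minimal sketch with simulated data:

```python
# Minimal sketch: after least squares with a constant term, X'e = 0,
# so every column of X is orthogonal to e and the residuals sum to zero.
import numpy as np

rng = np.random.default_rng(5)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b

print(np.round(X.T @ e, 10))            # a vector of (numerical) zeros
print(round(e.sum(), 10))               # 0: the residuals sum to zero
```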
Part 3: Least Squares Algebra 3-19/29
Residuals vs. Disturbances
Disturbances (population): yi = xi'β + εi.
Partitioning y: y = E[y|X] + ε = conditional mean + disturbance.
Residuals (sample): yi = xi'b + ei.
Partitioning y: y = Xb + e = projection + residual.
(Note: the projection is into the column space of X, i.e., the set of linear combinations of the columns of X; Xb is one of these.)
Part 3: Least Squares Algebra 3-20/29
Algebraic Results - 2
A “residual maker” M = (I - X(X’X)-1X’)
e = y - Xb = y - X(X’X)-1X’y = My
My = The residuals that result when y is regressed on X
MX = 0 (This result is fundamental!)
How do we interpret this result in terms of residuals?
When a column of X is regressed on all of X, we get a perfect fit and zero residuals.
(Therefore) My = MXb + Me = Me = e
(You should be able to prove this.)
y = Py + My, P = X(X’X)-1X’ = (I - M).
PM = MP = 0.
Py is the projection of y into the column space of X.
Part 3: Least Squares Algebra 3-21/29
The M Matrix
M = I - X(X’X)-1X’ is an n×n matrix
M is symmetric – M = M’
M is idempotent – MM = M
(just multiply it out)
M is singular; M-1 does not exist.
(We will prove this later as a side result in another derivation.)
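All of the claimed properties of M (and of P) can be checked by direct computation; a minimal sketch, assuming simulated data:

```python
# Minimal sketch: build M = I - X(X'X)^{-1}X' and P = I - M, then check
# MX = 0, My = e, symmetry, idempotency, PM = 0, and singularity of M.
import numpy as np

rng = np.random.default_rng(6)
n, K = 30, 3
X = rng.normal(size=(n, K))
y = rng.normal(size=n)

P = X @ np.linalg.solve(X.T @ X, X.T)   # projection matrix X(X'X)^{-1}X'
M = np.eye(n) - P                       # the residual maker

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b                           # least squares residuals

print(np.allclose(M @ X, 0))            # MX = 0
print(np.allclose(M @ y, e))            # My = e
print(np.allclose(M, M.T))              # M is symmetric
print(np.allclose(M @ M, M))            # M is idempotent
print(np.allclose(P @ M, 0))            # PM = MP = 0
print(np.isclose(np.linalg.det(M), 0))  # M is singular; no inverse
```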
Part 3: Least Squares Algebra 3-22/29
Results when X Contains a Constant Term
X = [1,x2,…,xK]
The first column of X is a column of ones
Since X'e = 0, x1'e = Σi ei = 0 – the residuals sum to zero.
Define i = [1, 1, ..., 1]', a column of n ones. Then i'y = Σi yi = nȳ.
Since y = Xb + e, i'y = i'Xb + i'e = i'Xb,
so (1/n)i'y = (1/n)i'Xb implies
ȳ = x̄'b (the regression line passes through the means).
These results do not apply if the model has no constant term.
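A last check (my own sketch): with a constant term, the fitted hyperplane passes through the point of sample means.

```python
# Minimal sketch: with a column of ones in X, ybar = xbar'b.
import numpy as np

rng = np.random.default_rng(7)
n = 150
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = 2.0 + X[:, 1] - 0.5 * X[:, 2] + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
print(np.isclose(y.mean(), X.mean(axis=0) @ b))   # True
```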