TRANSCRIPT
Part 3: Least Squares Algebra 3-1/29
Econometrics I
Professor William Greene
Stern School of Business
Department of Economics
Part 3: Least Squares Algebra 3-3/29
Vocabulary
Some terms to be used in the discussion.
Population characteristics and entities vs.
sample quantities and analogs
Residuals and disturbances
Population regression line and sample regression
Objective: Learn about the conditional mean function. ‘Estimate’ β and σ².
First step: Mechanics of fitting a line (hyperplane) to
a set of data
Part 3: Least Squares Algebra 3-4/29
Fitting Criteria
The set of points in the sample
Fitting criteria – what are they?
LAD: Minimize over b: Σi |yi − xi'b_LAD|
Least squares: Minimize over b: Σi (yi − xi'b_LS)²
and so on
Why least squares?
A fundamental result:
Sample moments are “good” estimators of
their population counterparts
We will examine this principle and apply it to least squares computation.
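To make the two criteria concrete, here is a minimal sketch in Python with NumPy and SciPy (my own illustration, not part of the lecture; the data are simulated): least squares has a closed-form solution, while the LAD fit is found by general-purpose numerical minimization.

```python
# Minimal sketch: comparing the least squares and LAD fitting criteria
# on simulated data. Illustration only, not from the lecture.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])               # constant + one regressor
y = 1.0 + 0.5 * x + rng.standard_t(df=3, size=n)   # heavy-tailed disturbances

# Least squares: minimize the sum of squared residuals (closed form).
b_ls = np.linalg.solve(X.T @ X, X.T @ y)

# LAD: minimize the sum of absolute residuals (no closed form; use a
# derivative-free optimizer since the objective is not smooth).
b_lad = minimize(lambda b: np.abs(y - X @ b).sum(),
                 x0=b_ls, method="Nelder-Mead").x

print("b_LS :", b_ls)
print("b_LAD:", b_lad)
```

Both estimates should land near (1.0, 0.5); with heavy-tailed disturbances the LAD fit is typically less affected by the outlying observations.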
Part 3: Least Squares Algebra 3-5/29
An Analogy Principle for Estimating
In the population: E[y|X] = Xβ, so E[y − Xβ|X] = 0.
Continuing (assumed): E[xiεi] = 0 for every i.
Summing: Σi E[xiεi] = Σi 0 = 0.
Exchanging Σi and E[·]: E[Σi xiεi] = E[X'ε] = 0, i.e., E[X'(y − Xβ)] = 0.
So, if Xβ is the conditional mean, then E[X'ε] = 0.
We choose b, the estimator of β, to mimic this population result: i.e., mimic the population mean with the sample mean. Find b such that
(1/n)X'e = (1/n)X'(y − Xb) = 0.
As we will see, the solution is the least squares coefficient vector.
Part 3: Least Squares Algebra 3-6/29
Population Moments
We assumed that E[εi|xi] = 0. (Slide 2:40)
It follows that Cov[xi, εi] = 0.
Proof: Cov(xi, εi) = Cov(xi, E[εi|xi]) = Cov(xi, 0) = 0 (Theorem B.2).
If E[yi|xi] = xi'β, then β = (Var[xi])⁻¹ Cov[xi, yi].
Proof: Cov[xi, yi] = Cov[xi, E[yi|xi]] = Cov[xi, xi'β] = Var[xi]β.
This will provide a population analog to the statistics we
compute with the data.
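As a check on the population analogy, here is a small simulation (my own sketch, not from the slides) in which the sample variance and covariance recover the slope vector used to generate the data.

```python
# Minimal sketch: beta = (Var[x])^{-1} Cov[x, y], verified by simulation.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
beta = np.array([0.5, -2.0])              # true slopes on two regressors
x = rng.normal(size=(n, 2))
y = x @ beta + rng.normal(size=n)         # E[y|x] = x'beta

V = np.cov(x, rowvar=False)               # sample Var[x], a 2 x 2 matrix
c = np.array([np.cov(x[:, 0], y)[0, 1],   # sample Cov[x1, y]
              np.cov(x[:, 1], y)[0, 1]])  # sample Cov[x2, y]
print(np.linalg.solve(V, c))              # approximately [0.5, -2.0]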
Part 3: Least Squares Algebra 3-8/29
Least Squares
The example will be Gi regressed on
xi = [1, PGi, Yi].
Fitting criterion: the fitted equation will be
ŷi = b1xi1 + b2xi2 + ... + bKxiK.
The criterion is based on the residuals:
ei = yi − (b1xi1 + b2xi2 + ... + bKxiK) = yi − xi'b.
Make the ei as small as possible. Form a criterion and minimize it.
Part 3: Least Squares Algebra 3-9/29
Fitting Criteria
Sum of residuals: Σi ei
Sum of squares: Σi ei²
Sum of absolute values of residuals: Σi |ei|
Absolute value of sum of residuals: |Σi ei|
We focus on the sum of squares, Σi ei², now and on Σi |ei| later.
Part 3: Least Squares Algebra 3-10/29
Least Squares Algebra
Σi ei² = Σi (yi − xi'b)² = e'e = (y − Xb)'(y − Xb)
Matrix and vector derivatives.
Derivative of a scalar with respect to a vector
Derivative of a column vector wrt a row vector
Other derivatives
Part 3: Least Squares Algebra 3-11/29
Least Squares Normal Equations
∂[Σi ei²]/∂b = ∂[Σi (yi − xi'b)²]/∂b
= Σi 2(yi − xi'b)(−xi)
= −2 Σi xiyi + 2 Σi xixi'b
= −2 X'y + 2 X'Xb
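A quick numerical check of this derivative (my own sketch with simulated data): the analytic gradient −2X'(y − Xb) should agree with a finite-difference approximation of e'e at any trial b.

```python
# Minimal sketch: verify d(e'e)/db = -2 X'(y - Xb) by finite differences.
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 3))
y = rng.normal(size=50)
b = rng.normal(size=3)                        # arbitrary trial value of b

sse = lambda b: (y - X @ b) @ (y - X @ b)     # e'e as a function of b
grad_analytic = -2 * X.T @ (y - X @ b)

h = 1e-6
grad_fd = np.array([(sse(b + h * np.eye(3)[k]) - sse(b - h * np.eye(3)[k]))
                    / (2 * h) for k in range(3)])
print(np.allclose(grad_analytic, grad_fd))    # True
```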
Part 3: Least Squares Algebra 3-12/29
Least Squares Normal Equations
∂[(y − Xb)'(y − Xb)]/∂b = (−2)X'(y − Xb)
Dimensions: (−2)(K×n)(n×1) = K×1.
Note: the derivative of a (1×1) scalar with respect to a K×1 vector is a K×1 vector.
Solution: −2 X'(y − Xb) = 0, so X'y = X'Xb (the normal equations).
Part 3: Least Squares Algebra 3-13/29
Least Squares Solution
Assuming it exists: b = (X'X)⁻¹X'y.
Note the analogy: β = [Var(x)]⁻¹ Cov(x, y), while
b = (X'X)⁻¹X'y = [(1/n) Σi xixi']⁻¹ [(1/n) Σi xiyi].
Suggests something desirable about least squares.
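The solution and the moment analogy are easy to verify numerically; a minimal sketch with simulated data (not from the slides):

```python
# Minimal sketch: b solves the normal equations X'y = X'Xb, and equals
# the sample-moment form [(1/n) sum xi xi']^{-1} [(1/n) sum xi yi].
import numpy as np

rng = np.random.default_rng(2)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -0.25]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)              # b = (X'X)^{-1} X'y
Sxx = (X.T @ X) / n                                # (1/n) sum_i xi xi'
Sxy = (X.T @ y) / n                                # (1/n) sum_i xi yi
print(np.allclose(b, np.linalg.solve(Sxx, Sxy)))   # True: the same b
```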
Part 3: Least Squares Algebra 3-14/29
Second Order Conditions
Necessary condition, first derivatives:
∂[(y − Xb)'(y − Xb)]/∂b = −2 X'(y − Xb) = 0
Sufficient condition, second derivatives:
∂²[(y − Xb)'(y − Xb)]/∂b∂b' = ∂[−2 X'(y − Xb)]/∂b'
= ∂[−2 X'y + 2 X'Xb]/∂b'
= 2 X'X
Note: the derivative of a K×1 column vector with respect to a 1×K row vector is a K×K matrix.
Part 3: Least Squares Algebra 3-15/29
Side Result: Sample Moments
X'X =
[ Σi xi1²     Σi xi1xi2   ...   Σi xi1xiK ]
[ Σi xi2xi1   Σi xi2²     ...   Σi xi2xiK ]
[ ...         ...         ...   ...       ]
[ Σi xiKxi1   Σi xiKxi2   ...   Σi xiK²   ]
= Σi=1..n xixi',
where xi = (xi1, xi2, ..., xiK)' is observation i's regressor vector and each xixi' is a K×K outer product.
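This side result is a one-liner to confirm numerically (my own sketch):

```python
# Minimal sketch: X'X equals the sum over observations of the outer
# products xi xi'.
import numpy as np

rng = np.random.default_rng(8)
X = rng.normal(size=(10, 3))
S = sum(np.outer(xi, xi) for xi in X)   # sum_i xi xi', iterating over rows
print(np.allclose(X.T @ X, S))          # True
```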
Part 3: Least Squares Algebra 3-16/29
Does b Minimize e’e?
∂²e'e/∂b∂b' = 2 X'X = 2 Σi xixi', the K×K matrix of sums of squares and cross products of the regressors.
If there were a single coefficient b, we would require this to be positive, which it would be: 2 Σi xi² > 0. OK.
The matrix counterpart of a positive number is a positive definite matrix.
Part 3: Least Squares Algebra 3-17/29
A Positive Definite Matrix
A matrix C is positive definite if a'Ca > 0 for every a ≠ 0.
Generally hard to check: it requires a look at the characteristic roots (later in the course).
For some matrices it is easy to verify, and X'X is one of these:
a'X'Xa = (a'X')(Xa) = (Xa)'(Xa) = v'v = Σk vk² ≥ 0.
Could v'v = 0? That would mean v = 0, i.e., Xa = 0 for some a ≠ 0. Is this possible? No, as long as the columns of X are linearly independent.
Conclusion: b = (X'X)⁻¹X'y does indeed minimize e'e.
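A minimal numerical illustration (not from the slides) of the quadratic-form argument and the characteristic-root check:

```python
# Minimal sketch: a'(X'X)a = (Xa)'(Xa) is a sum of squares, hence > 0
# for a != 0 when X has full column rank.
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 4))                 # full column rank (w.p. 1)

for _ in range(3):
    a = rng.normal(size=4)
    v = X @ a
    print(a @ X.T @ X @ a, v @ v)             # identical, and positive

print(np.linalg.eigvalsh(X.T @ X).min() > 0)  # all characteristic roots > 0
```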
Part 3: Least Squares Algebra 3-18/29
Algebraic Results - 1
In the population: E[X'ε] = 0.
In the sample: (1/n) Σi xiei = (1/n) X'e = 0.
X'e = 0 means that for each column xk of X, xk'e = 0.
(1) Each column of X is orthogonal to e.
(2) One of the columns of X is a column of ones, so Σi ei = 0: the residuals sum to zero.
(3) It follows that (1/n) Σi ei = ē = 0, which mimics E[ε] = 0.
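These results are mechanical consequences of the normal equations, so they hold in any least squares fit with a constant term; a minimal sketch with simulated data:

```python
# Minimal sketch: after least squares with a constant term, X'e = 0,
# so every column of X is orthogonal to e and the residuals sum to zero.
import numpy as np

rng = np.random.default_rng(5)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b

print(np.round(X.T @ e, 10))            # a vector of (numerical) zeros
print(round(e.sum(), 10))               # 0: the residuals sum to zero
```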
Part 3: Least Squares Algebra 3-19/29
Residuals vs. Disturbances
Disturbances (population): yi = xi'β + εi.
Partitioning y: y = E[y|X] + ε = conditional mean + disturbance.
Residuals (sample): yi = xi'b + ei.
Partitioning y: y = Xb + e = projection + residual.
(Note: the projection is into the column space of X, i.e., the set of linear combinations of the columns of X; Xb is one of these.)
Part 3: Least Squares Algebra 3-20/29
Algebraic Results - 2
A “residual maker” M = (I - X(X’X)-1X’)
e = y - Xb = y - X(X’X)-1X’y = My
My = The residuals that result when y is regressed on X
MX = 0 (This result is fundamental!)
How do we interpret this result in terms of residuals?
When a column of X is regressed on all of X, we get a perfect fit and zero residuals.
(Therefore) My = MXb + Me = Me = e
(You should be able to prove this.)
y = Py + My, P = X(X’X)-1X’ = (I - M).
PM = MP = 0.
Py is the projection of y into the column space of X.
Part 3: Least Squares Algebra 3-21/29
The M Matrix
M = I - X(X’X)-1X’ is an n×n matrix
M is symmetric – M = M’
M is idempotent – MM = M
(just multiply it out)
M is singular; M-1 does not exist.
(We will prove this later as a side result in another derivation.)
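All of the claimed properties of M (and of P) can be checked by direct computation; a minimal sketch, assuming simulated data:

```python
# Minimal sketch: build M = I - X(X'X)^{-1}X' and P = I - M, then check
# MX = 0, My = e, symmetry, idempotency, PM = 0, and singularity of M.
import numpy as np

rng = np.random.default_rng(6)
n, K = 30, 3
X = rng.normal(size=(n, K))
y = rng.normal(size=n)

P = X @ np.linalg.solve(X.T @ X, X.T)   # projection matrix X(X'X)^{-1}X'
M = np.eye(n) - P                       # the residual maker

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b                           # least squares residuals

print(np.allclose(M @ X, 0))            # MX = 0
print(np.allclose(M @ y, e))            # My = e
print(np.allclose(M, M.T))              # M is symmetric
print(np.allclose(M @ M, M))            # M is idempotent
print(np.allclose(P @ M, 0))            # PM = MP = 0
print(np.isclose(np.linalg.det(M), 0))  # M is singular; no inverse
```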
Part 3: Least Squares Algebra 3-22/29
Results when X Contains a Constant Term
X = [1,x2,…,xK]
The first column of X is a column of ones
Since X'e = 0, x1'e = Σi ei = 0 – the residuals sum to zero.
Define i = [1, 1, ..., 1]', a column of n ones. Then i'y = Σi yi = nȳ.
Since y = Xb + e, i'y = i'Xb + i'e = i'Xb,
so (1/n)i'y = (1/n)i'Xb implies
ȳ = x̄'b (the regression line passes through the means).
These results do not apply if the model has no constant term.
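A last check (my own sketch): with a constant term, the fitted hyperplane passes through the point of sample means.

```python
# Minimal sketch: with a column of ones in X, ybar = xbar'b.
import numpy as np

rng = np.random.default_rng(7)
n = 150
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = 2.0 + X[:, 1] - 0.5 * X[:, 2] + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
print(np.isclose(y.mean(), X.mean(axis=0) @ b))   # True
```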