TRANSCRIPT
September 1, 2009 Session 2 Slide 1
PSC 5940: Regression Review and Questions about “Causality”
Session 2
Fall, 2009
September 1, 2009 Session 2 Slide 2
Data Discussion
• EE09 & NS09 Data: research ideas?
• Fixing data in Excel: EE09
  – NA replacement
  – Text to numeric (e28_gcc)
  – Getting rid of extraneous characters
    • $ in “random_p”
• EE and partisanship
  – Loading and attaching the data
  – Examining party identification (“e216_par”)
  – Examining gender (“e3_gender”)
• Dealing with awkward names and NA values (see the R sketch below)
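A minimal R sketch of these cleaning steps, assuming the EE09 data have been exported from Excel to a file named "EE09.csv" and that 99 is the survey's missing-data code (the file name and the NA code are assumptions; the variable names come from the slide):

```r
# Read the EE09 extract; file name is an assumption
ee <- read.csv("EE09.csv", stringsAsFactors = FALSE)

# Getting rid of extraneous characters: strip "$" from random_p, then make it numeric
ee$random_p <- as.numeric(gsub("$", "", ee$random_p, fixed = TRUE))

# Text to numeric: non-numeric entries in e28_gcc are coerced to NA (with a warning)
ee$e28_gcc <- as.numeric(ee$e28_gcc)

# NA replacement: recode an assumed missing-data code (99) to NA
ee$e216_par[ee$e216_par == 99] <- NA

# Load/attach the data, then examine party ID and gender
attach(ee)
table(e216_par, useNA = "ifany")
table(e3_gender, useNA = "ifany")
```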
September 1, 2009 Session 2 Slide 3
Deterministic Linear Models
• Theoretical Model: $Y_i = a + bX_i$
  – $a$ and $b$ are constant terms
    • $a$ is the intercept
    • $b$ is the slope
  – $X_i$ is a predictor of $Y_i$

[Figure: the line $Y_i = a + bX_i$ in X–Y space, with intercept $a$ and slope $b$ marked]
September 1, 2009 Session 2 Slide 4
Stochastic Linear Models
• $E[Y_i] = \beta_0 + \beta_1 X_i$
  – $\beta_0$ = Y when X = 0
  – Each 1-unit increase in X increases Y by $\beta_1$
  – Variation in Y is caused by more than X: error ($\varepsilon_i$)
• $\varepsilon_i = Y_i - (\beta_0 + \beta_1 X_i) = Y_i - E[Y_i]$
• So: $Y_i = E[Y_i] + \varepsilon_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ (simulated in the sketch below)
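A short R simulation of this stochastic model: Y is the deterministic line plus a normal iid error (all parameter values below are illustrative, not from the course data):

```r
set.seed(42)
n     <- 100
beta0 <- 2                       # intercept: Y when X = 0
beta1 <- 0.5                     # slope: change in E[Y] per 1-unit increase in X
x     <- runif(n, 0, 10)
e     <- rnorm(n, 0, 1)          # epsilon_i: zero mean, constant variance
y     <- beta0 + beta1 * x + e   # Y_i = E[Y_i] + epsilon_i

plot(x, y)                       # points scatter around the line...
abline(beta0, beta1)             # ...which is the true E[Y]
```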
September 1, 2009 Session 2 Slide 5
Assumptions Necessary for Estimating Linear Models
1. Errors have identical distributions
   – Zero mean, same variance, across the range of X
2. Errors are independent of X and of the other $\varepsilon_j$
   – $E[\varepsilon_i] \neq f(X)$ and $E[\varepsilon_i] \neq f(\varepsilon_j),\ j \neq i$
3. Errors are normally distributed
September 1, 2009 Session 2 Slide 6
Normal, Independent & Identical $\varepsilon_i$ Distributions (“Normal iid”)

[Figure: identical normal error distributions centered on the regression line at several values of X, in Y–X space]

Problem: We don’t know:
a) if the error assumptions hold true; b) the values for $\beta_0$ and $\beta_1$
Solution: Estimate ‘em!
September 1, 2009 Session 2 Slide 7
OLS Derivation of b0
Given that $Y_i = b_0 + b_1 X_i + e_i$ and $\hat{Y} = b_0 + b_1 X$:

$$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2$$

Use partial differentiation in this step: we minimize SSE w.r.t. $b_0$, using the chain rule:

$$f'(b_0) = \sum 2(Y_i - b_0 - b_1 X_i)\cdot(-1) = -2\sum (Y_i - b_0 - b_1 X_i) = -2\sum Y_i + 2nb_0 + 2b_1 \sum X_i$$
September 1, 2009 Session 2 Slide 8
Derivation of b0, step 2
Now we want to set the derivative to zero and solve:

$$-2\sum Y_i + 2nb_0 + 2b_1 \sum X_i = 0$$

First shift the non-$b_0$ terms to the other side:

$$2nb_0 = 2\sum Y_i - 2b_1 \sum X_i$$

Now divide through by $2n$:

$$\frac{2nb_0}{2n} = \frac{2\sum Y_i}{2n} - \frac{2b_1 \sum X_i}{2n}$$

which is: $b_0 = \bar{Y} - b_1 \bar{X}$
September 1, 2009 Session 2 Slide 9
Derivation of b1
Step 1: Multiply out $e^2$

$$\sum e_i^2 = \sum (Y_i - b_0 - b_1 X_i)^2 = \sum (Y_i - b_0 - b_1 X_i)\cdot(Y_i - b_0 - b_1 X_i)$$

$$= \sum Y_i^2 - \sum Y_i b_0 - \sum Y_i b_1 X_i - \sum Y_i b_0 + \sum b_0^2 + \sum b_0 b_1 X_i - \sum Y_i b_1 X_i + \sum b_0 b_1 X_i + \sum b_1^2 X_i^2$$

Now add the like terms and then drag all the constants through the summations to get:

$$= \sum Y^2 - 2b_0 \sum Y - 2b_1 \sum XY + nb_0^2 + 2b_0 b_1 \sum X + b_1^2 \sum X^2$$
September 1, 2009 Session 2 Slide 10
Derivation of b1
Step 2: Differentiate w.r.t. $b_1$

Next, partially differentiate $e^2$ with respect to $b_1$:

$$f'(b_1) = \frac{\partial}{\partial b_1}\left(\sum Y^2 - 2b_0 \sum Y - 2b_1 \sum XY + nb_0^2 + 2b_0 b_1 \sum X + b_1^2 \sum X^2\right)$$

Note that all terms without $b_1$ are, in effect, constants and can therefore be dropped from the derivation to obtain:

$$f'(b_1) = -2\sum XY + 2b_0 \sum X + 2b_1 \sum X^2$$
September 1, 2009 Session 2 Slide 11
Derivation of b1
Step 3: Substitute for b0
$$f'(b_1) = -2\sum XY + 2b_0 \sum X + 2b_1 \sum X^2$$

Since $b_0 = \bar{Y} - b_1 \bar{X}$, we can write $f'(b_1)$ as follows:

$$-2\sum XY + \frac{2\sum Y \sum X}{n} - \frac{2b_1 (\sum X)^2}{n} + 2b_1 \sum X^2 = 0$$
September 1, 2009 Session 2 Slide 12
Derivation of b1
Step 4: Simplify and Isolate $b_1$

Now we can multiply through by $n/2$ and put all the $b_1$ terms on the same side:

$$-n\sum XY + \sum X \sum Y - b_1 (\sum X)^2 + nb_1 \sum X^2 = 0 \text{, so}$$

$$nb_1 \sum X^2 - b_1 (\sum X)^2 = n\sum XY - \sum X \sum Y \text{, or}$$

$$b_1 \left(n\sum X^2 - (\sum X)^2\right) = n\sum XY - \sum X \sum Y \text{, and finally}$$

$$b_1 = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2}$$
September 1, 2009 Session 2 Slide 13
Calculating b0 and b1
• The formulas for $b_1$ and $b_0$ allow you (or, preferably, your computer) to calculate the error-minimizing slope and intercept for any data set representing a bivariate, linear relationship.
$$\hat{b}_1 = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$$

$$\hat{b}_0 = \bar{Y} - b_1 \bar{X}$$
• No other line, using the same data, will result in a smaller sum of squared errors ($\sum e^2$). OLS gives the best fit (verified in the sketch below).
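As a quick check, the closed-form slope and intercept can be computed directly in R and compared with lm(); here the simulated x and y from the earlier sketch stand in for any bivariate data:

```r
# Closed-form OLS estimates (deviation form for b1)
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)
c(b0 = b0, b1 = b1)

coef(lm(y ~ x))   # identical, up to floating-point precision
```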
September 1, 2009 Session 2 Slide 14
Interpreting b1 and b0
$$\hat{b}_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$$

For each 1-unit increase in X, you get $b_1$ units change in Y.

$$\hat{b}_0 = \bar{Y} - b_1 \bar{X}$$

When X is zero, Y will be equal to $b_0$. Note that a regression model with no independent variables is simply the mean (see the one-line check below).
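That last point is easy to verify in R: an intercept-only regression returns the mean of Y.

```r
coef(lm(y ~ 1))   # the estimated intercept...
mean(y)           # ...equals the sample mean of Y
```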
September 1, 2009 Session 2 Slide 15
Theoretical Specification of Multivariate Regression
$$E[Y_i] = \beta_0 + \beta_1 X_{i,1} + \beta_2 X_{i,2} + \ldots + \beta_{K-1} X_{i,K-1}$$

where K is the number of parameters (the $\beta$'s)

$$Y_i = E[Y_i] + \varepsilon_i \quad \text{or (in matrix form)} \quad Y = X\beta + U$$

$$\hat{Y} = b_0 + b_1 X_{i,1} + b_2 X_{i,2} + \ldots + b_{K-1} X_{i,K-1}$$

So $RSS = \sum (Y_i - \hat{Y}_i)^2 = \sum e_i^2$
September 1, 2009 Session 2 Slide 16
Regression in Matrix Form
• Assume a model using n observations, with K−1 $X_i$ (independent) variables

Y (n × 1) is a column vector of the observed dependent variable
Ŷ (n × 1) is a column vector of predicted Y values
X (n × K): each column holds the observations on one X; the first column is all 1’s
B (K × 1) is a column vector of regression coefficients (the first is $b_0$)
U (n × 1) is a column vector of n residual values
September 1, 2009 Session 2 Slide 17
Regression in Matrix Form

$$Y = XB + U \qquad \hat{Y} = XB$$

$$B = (X'X)^{-1} X'Y$$

Note: we can’t uniquely define $(X'X)^{-1}$ if any column in the X matrix is a linear function of any other column(s) in X. (A short R sketch of this formula follows.)
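A minimal R sketch of the matrix formula, again using the simulated x and y from above (a two-column X, so K = 2):

```r
X <- cbind(1, x)                        # n x K design matrix, first column all 1's
B <- solve(t(X) %*% X) %*% t(X) %*% y   # B = (X'X)^-1 X'Y; solve() fails if X'X is singular
B

coef(lm(y ~ x))                         # lm() produces the same estimates
```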
September 1, 2009 Session 2 Slide 18
The X’X Matrix

$$(X'X) = \begin{bmatrix} n & \sum X & \sum X_2 & \sum X_3 \\ \sum X & \sum X^2 & \sum X X_2 & \sum X X_3 \\ \sum X_2 & \sum X_2 X & \sum X_2^2 & \sum X_2 X_3 \\ \sum X_3 & \sum X_3 X & \sum X_3 X_2 & \sum X_3^2 \end{bmatrix}$$

Note that you can obtain the basis for all the necessary means, variances and covariances among the Xs from the (X’X) matrix.
September 1, 2009 Session 2 Slide 19
An Example of Matrix Regression

Using a sample of 7 observations, where X has elements {X0, X1, X2, X3}

[The slide works the model out numerically as Y = XB + U: the 7 × 1 vector of observed Y values, the 7 × 4 X matrix with a leading column of 1’s, the estimated 4 × 1 coefficient vector B, and the 7 × 1 residual vector U.]
September 1, 2009 Session 2 Slide 20
Summary of OLS Assumption Failures and their Implications
Problem                    Biased b   Biased SE   Invalid t/F   High Variance
Non-linearity              Yes        Yes         Yes           ---
Omitted relevant X         Yes        Yes         Yes           ---
Irrelevant X               No         No          No            Yes
X measurement error        Yes        Yes         Yes           ---
Heteroscedasticity         No         Yes         Yes           Yes
Autocorrelation            No         Yes         Yes           Yes
X correlated with error    Yes        Yes         Yes           ---
Non-normal errors          No         No          Yes           Yes
Multicollinearity          No         No          No            Yes
September 1, 2009 Session 2 Slide 21
BREAK
September 1, 2009 Session 2 Slide 22
Causality and Experiments
[Diagram: X2 → Y, i.e., Number of Fire Trucks → Number of Fire Deaths]

Question: What is the relationship between the number of fire trucks at the scene of a fire and the number of deaths caused by that fire?

Experimental approach: Randomly assign fire incidents to different categories, which receive different numbers of trucks (treatment).
September 1, 2009 Session 2 Slide 23
Causality and Observational Data
The problem of spurious relations...
[Diagram: Size of Fire (X1) causes both Number of Fire Trucks (X2) and Number of Fire Deaths (Y), making the X2 → Y relationship spurious]

In an experimental design, we fully control for spurious relationships. With OLS we try to manage them statistically.
September 1, 2009 Session 2 Slide 24
Statistical Calculation of Partial Effects
In calculating the effect of X1 on Y, we remove the effect of the other X’s on both X1 and Y:
$$\hat{Y}_i = b_0 + b_2 X_{i,2} + e_{i,\,Y|X_2}$$

$$\hat{X}_{i,1} = b_0 + b_2 X_{i,2} + e_{i,\,X_1|X_2}$$

so

$$e_{i,\,Y|X_2} = b_0 + b_1 e_{i,\,X_1|X_2}$$

Here $e_{i,\,Y|X_2}$ is Y stripped of the effect of X2, and $e_{i,\,X_1|X_2}$ is X1 stripped of the effect of X2. The use of residuals “cleans” both Y and X1 of their correlations with X2, permitting estimation of partial regression coefficients (PRCs). (A demonstration follows below.)
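This two-step residual regression can be demonstrated in R with simulated data (all values illustrative): the slope from the residual-on-residual regression matches the coefficient on X1 from the full model.

```r
set.seed(7)
x2 <- rnorm(100)
x1 <- 0.6 * x2 + rnorm(100)      # X1 and X2 overlap
yy <- 1 + 2 * x1 + 3 * x2 + rnorm(100)

e_y  <- resid(lm(yy ~ x2))       # Y stripped of the effect of X2
e_x1 <- resid(lm(x1 ~ x2))       # X1 stripped of the effect of X2

coef(lm(e_y ~ e_x1))             # slope ~ 2: the partial effect of X1
coef(lm(yy ~ x1 + x2))           # same coefficient on x1
```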
September 1, 2009 Session 2 Slide 25
Intuition of PRCs
• All overlapping variance is stripped
• Highly correlated IVs are problematic
  – But what if the overlap is important?
    • What if X1 and X2 are really part of some larger construct?
  – The case of knowledge, efficacy and behavior
  – Kelstet et al
• How should we interpret the PRCs in this case?
September 1, 2009 Session 2 Slide 26
Workshop
• Load EE data
• Run a simple model (a hedged sketch follows below):
  – Willingness to pay for an alternative energy tax
  – Use randomly assigned cost as IV
• Plot the relationship (use jitter)
• Now add: Income, Ideology
  – Change in cost variable? (Why?)
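One way the workshop model might be set up in R; the column names (wtp, cost, income, ideology) are placeholders for the actual EE09 variable names:

```r
m1 <- lm(wtp ~ cost, data = ee)        # WTP on randomly assigned cost (placeholder names)
summary(m1)

plot(jitter(ee$cost), jitter(ee$wtp))  # jitter separates overplotted survey responses
abline(m1)

m2 <- lm(wtp ~ cost + income + ideology, data = ee)
summary(m2)                            # does the cost coefficient change?
```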
September 1, 2009 Session 2 Slide 27
Homework
• Generate and analyze the residuals (see the starter sketch below)
• Add to the model:
  – Belief in anthropogenic climate change
    • Will require recodes
  – Understanding of GCC science
    • Recode the “What scientists believe…” variables
• 1-page summary of findings for class next week
• Next Extension: Modeling Dummies and Interactions
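A starting point for the residual analysis, building on the hypothetical m2 from the workshop sketch (names remain placeholders):

```r
e2 <- resid(m2)
summary(e2)              # mean should be ~0
hist(e2)                 # roughly normal?
plot(fitted(m2), e2)     # a fan shape would suggest heteroscedasticity
abline(h = 0)
```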