TRANSCRIPT
September 1, 2009 Session 2 Slide 1
PSC 5940: Regression Review and Questions about “Causality”
Session 2
Fall, 2009
September 1, 2009 Session 2 Slide 2
Data Discussion
• EE09 & NS09 Data: research ideas?
• Fixing data in Excel: EE09
  – NA replacement
  – Text to numeric (e28_gcc)
  – Getting rid of extraneous characters
    • $ in “random_p”
• EE and partisanship
  – Loading and attaching the data
  – Examining party identification (“e216_par”)
  – Examining gender (“e3_gender”)
• Dealing with awkward names and NA values (see the R sketch below)
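A minimal R sketch of these cleaning steps, assuming the EE09 data have been exported from Excel to a file named "EE09.csv" and that 99 is the survey's missing-data code (the file name and the NA code are assumptions; the variable names come from the slide):

```r
# Read the EE09 extract; file name is an assumption
ee <- read.csv("EE09.csv", stringsAsFactors = FALSE)

# Getting rid of extraneous characters: strip "$" from random_p, then make it numeric
ee$random_p <- as.numeric(gsub("$", "", ee$random_p, fixed = TRUE))

# Text to numeric: non-numeric entries in e28_gcc are coerced to NA (with a warning)
ee$e28_gcc <- as.numeric(ee$e28_gcc)

# NA replacement: recode an assumed missing-data code (99) to NA
ee$e216_par[ee$e216_par == 99] <- NA

# Load/attach the data, then examine party ID and gender
attach(ee)
table(e216_par, useNA = "ifany")
table(e3_gender, useNA = "ifany")
```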
September 1, 2009 Session 2 Slide 3
Deterministic Linear Models
• Theoretical Model: $Y_i = a + bX_i$
  – $a$ and $b$ are constant terms
    • $a$ is the intercept
    • $b$ is the slope
  – $X_i$ is a predictor of $Y_i$

[Figure: the line $Y_i = a + bX_i$ in X–Y space, with intercept $a$ and slope $b$ marked]
September 1, 2009 Session 2 Slide 4
Stochastic Linear Models
• $E[Y_i] = \beta_0 + \beta_1 X_i$
  – $\beta_0$ = Y when X = 0
  – Each 1-unit increase in X increases Y by $\beta_1$
  – Variation in Y is caused by more than X: error ($\varepsilon_i$)
• $\varepsilon_i = Y_i - (\beta_0 + \beta_1 X_i) = Y_i - E[Y_i]$
• So: $Y_i = E[Y_i] + \varepsilon_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ (simulated in the sketch below)
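A short R simulation of this stochastic model: Y is the deterministic line plus a normal iid error (all parameter values below are illustrative, not from the course data):

```r
set.seed(42)
n     <- 100
beta0 <- 2                       # intercept: Y when X = 0
beta1 <- 0.5                     # slope: change in E[Y] per 1-unit increase in X
x     <- runif(n, 0, 10)
e     <- rnorm(n, 0, 1)          # epsilon_i: zero mean, constant variance
y     <- beta0 + beta1 * x + e   # Y_i = E[Y_i] + epsilon_i

plot(x, y)                       # points scatter around the line...
abline(beta0, beta1)             # ...which is the true E[Y]
```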
September 1, 2009 Session 2 Slide 5
Assumptions Necessary for Estimating Linear Models
1. Errors have identical distributions
   – Zero mean, same variance, across the range of X
2. Errors are independent of X and of the other $\varepsilon_j$
   – $E[\varepsilon_i] \neq f(X)$ and $E[\varepsilon_i] \neq f(\varepsilon_j),\ j \neq i$
3. Errors are normally distributed
September 1, 2009 Session 2 Slide 6
Normal, Independent & Identical $\varepsilon_i$ Distributions (“Normal iid”)

[Figure: identical normal error distributions centered on the regression line at several values of X, in Y–X space]

Problem: We don’t know:
a) if the error assumptions hold true; b) the values for $\beta_0$ and $\beta_1$
Solution: Estimate ‘em!
September 1, 2009 Session 2 Slide 7
OLS Derivation of b0
Given that $Y_i = b_0 + b_1 X_i + e_i$ and $\hat{Y} = b_0 + b_1 X$:

$$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2$$

Use partial differentiation in this step: we minimize SSE w.r.t. $b_0$, using the chain rule:

$$f'(b_0) = \sum 2(Y_i - b_0 - b_1 X_i)\cdot(-1) = -2\sum (Y_i - b_0 - b_1 X_i) = -2\sum Y_i + 2nb_0 + 2b_1 \sum X_i$$
September 1, 2009 Session 2 Slide 8
Derivation of b0, step 2
Now we want to set the derivative to zero and solve:

$$-2\sum Y_i + 2nb_0 + 2b_1 \sum X_i = 0$$

First shift the non-$b_0$ terms to the other side:

$$2nb_0 = 2\sum Y_i - 2b_1 \sum X_i$$

Now divide through by $2n$:

$$\frac{2nb_0}{2n} = \frac{2\sum Y_i}{2n} - \frac{2b_1 \sum X_i}{2n}$$

which is: $b_0 = \bar{Y} - b_1 \bar{X}$
September 1, 2009 Session 2 Slide 9
Derivation of b1
Step 1: Multiply out $e^2$

$$\sum e_i^2 = \sum (Y_i - b_0 - b_1 X_i)^2 = \sum (Y_i - b_0 - b_1 X_i)\cdot(Y_i - b_0 - b_1 X_i)$$

$$= \sum Y_i^2 - \sum Y_i b_0 - \sum Y_i b_1 X_i - \sum Y_i b_0 + \sum b_0^2 + \sum b_0 b_1 X_i - \sum Y_i b_1 X_i + \sum b_0 b_1 X_i + \sum b_1^2 X_i^2$$

Now add the like terms and then drag all the constants through the summations to get:

$$= \sum Y^2 - 2b_0 \sum Y - 2b_1 \sum XY + nb_0^2 + 2b_0 b_1 \sum X + b_1^2 \sum X^2$$
September 1, 2009 Session 2 Slide 10
Derivation of b1
Step 2: Differentiate w.r.t. $b_1$

Next, partially differentiate $e^2$ with respect to $b_1$:

$$f'(b_1) = \frac{\partial}{\partial b_1}\left(\sum Y^2 - 2b_0 \sum Y - 2b_1 \sum XY + nb_0^2 + 2b_0 b_1 \sum X + b_1^2 \sum X^2\right)$$

Note that all terms without $b_1$ are, in effect, constants and can therefore be dropped from the derivation to obtain:

$$f'(b_1) = -2\sum XY + 2b_0 \sum X + 2b_1 \sum X^2$$
September 1, 2009 Session 2 Slide 11
Derivation of b1
Step 3: Substitute for b0
$$f'(b_1) = -2\sum XY + 2b_0 \sum X + 2b_1 \sum X^2$$

Since $b_0 = \bar{Y} - b_1 \bar{X}$, we can write $f'(b_1)$ as follows:

$$-2\sum XY + \frac{2\sum Y \sum X}{n} - \frac{2b_1 (\sum X)^2}{n} + 2b_1 \sum X^2 = 0$$
September 1, 2009 Session 2 Slide 12
Derivation of b1
Step 4: Simplify and Isolate $b_1$

Now we can multiply through by $n/2$ and put all the $b_1$ terms on the same side:

$$-n\sum XY + \sum X \sum Y - b_1 (\sum X)^2 + nb_1 \sum X^2 = 0 \text{, so}$$

$$nb_1 \sum X^2 - b_1 (\sum X)^2 = n\sum XY - \sum X \sum Y \text{, or}$$

$$b_1 \left(n\sum X^2 - (\sum X)^2\right) = n\sum XY - \sum X \sum Y \text{, and finally}$$

$$b_1 = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2}$$
September 1, 2009 Session 2 Slide 13
Calculating b0 and b1
• The formulas for $b_1$ and $b_0$ allow you (or, preferably, your computer) to calculate the error-minimizing slope and intercept for any data set representing a bivariate, linear relationship.
$$\hat{b}_1 = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$$

$$\hat{b}_0 = \bar{Y} - b_1 \bar{X}$$
• No other line, using the same data, will result in a smaller sum of squared errors ($\sum e^2$). OLS gives the best fit (verified in the sketch below).
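As a quick check, the closed-form slope and intercept can be computed directly in R and compared with lm(); here the simulated x and y from the earlier sketch stand in for any bivariate data:

```r
# Closed-form OLS estimates (deviation form for b1)
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)
c(b0 = b0, b1 = b1)

coef(lm(y ~ x))   # identical, up to floating-point precision
```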
September 1, 2009 Session 2 Slide 14
Interpreting b1 and b0
$$\hat{b}_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$$

For each 1-unit increase in X, you get $b_1$ units change in Y.

$$\hat{b}_0 = \bar{Y} - b_1 \bar{X}$$

When X is zero, Y will be equal to $b_0$. Note that a regression model with no independent variables is simply the mean (see the one-line check below).
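That last point is easy to verify in R: an intercept-only regression returns the mean of Y.

```r
coef(lm(y ~ 1))   # the estimated intercept...
mean(y)           # ...equals the sample mean of Y
```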
September 1, 2009 Session 2 Slide 15
Theoretical Specification of Multivariate Regression
$$E[Y_i] = \beta_0 + \beta_1 X_{i,1} + \beta_2 X_{i,2} + \ldots + \beta_{K-1} X_{i,K-1}$$

where K is the number of parameters (the $\beta$'s)

$$Y_i = E[Y_i] + \varepsilon_i \quad \text{or (in matrix form)} \quad Y = X\beta + U$$

$$\hat{Y} = b_0 + b_1 X_{i,1} + b_2 X_{i,2} + \ldots + b_{K-1} X_{i,K-1}$$

So $RSS = \sum (Y_i - \hat{Y}_i)^2 = \sum e_i^2$
September 1, 2009 Session 2 Slide 16
Regression in Matrix Form
• Assume a model using n observations, with K−1 $X_i$ (independent) variables

Y (n × 1) is a column vector of the observed dependent variable
Ŷ (n × 1) is a column vector of predicted Y values
X (n × K): each column holds the observations on one X; the first column is all 1’s
B (K × 1) is a column vector of regression coefficients (the first is $b_0$)
U (n × 1) is a column vector of n residual values
September 1, 2009 Session 2 Slide 17
Regression in Matrix Form

$$Y = XB + U \qquad \hat{Y} = XB$$

$$B = (X'X)^{-1} X'Y$$

Note: we can’t uniquely define $(X'X)^{-1}$ if any column in the X matrix is a linear function of any other column(s) in X. (A short R sketch of this formula follows.)
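A minimal R sketch of the matrix formula, again using the simulated x and y from above (a two-column X, so K = 2):

```r
X <- cbind(1, x)                        # n x K design matrix, first column all 1's
B <- solve(t(X) %*% X) %*% t(X) %*% y   # B = (X'X)^-1 X'Y; solve() fails if X'X is singular
B

coef(lm(y ~ x))                         # lm() produces the same estimates
```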
September 1, 2009 Session 2 Slide 18
The X’X Matrix

$$(X'X) = \begin{bmatrix} n & \sum X & \sum X_2 & \sum X_3 \\ \sum X & \sum X^2 & \sum X X_2 & \sum X X_3 \\ \sum X_2 & \sum X_2 X & \sum X_2^2 & \sum X_2 X_3 \\ \sum X_3 & \sum X_3 X & \sum X_3 X_2 & \sum X_3^2 \end{bmatrix}$$

Note that you can obtain the basis for all the necessary means, variances and covariances among the Xs from the (X’X) matrix.
September 1, 2009 Session 2 Slide 19
An Example of Matrix Regression

Using a sample of 7 observations, where X has elements {X0, X1, X2, X3}

[The slide works the model out numerically as Y = XB + U: the 7 × 1 vector of observed Y values, the 7 × 4 X matrix with a leading column of 1’s, the estimated 4 × 1 coefficient vector B, and the 7 × 1 residual vector U.]
September 1, 2009 Session 2 Slide 20
Summary of OLS Assumption Failures and their Implications
Problem                    Biased b   Biased SE   Invalid t/F   High Variance
Non-linearity              Yes        Yes         Yes           ---
Omitted relevant X         Yes        Yes         Yes           ---
Irrelevant X               No         No          No            Yes
X measurement error        Yes        Yes         Yes           ---
Heteroscedasticity         No         Yes         Yes           Yes
Autocorrelation            No         Yes         Yes           Yes
X correlated with error    Yes        Yes         Yes           ---
Non-normal errors          No         No          Yes           Yes
Multicollinearity          No         No          No            Yes
September 1, 2009 Session 2 Slide 21
BREAK
September 1, 2009 Session 2 Slide 22
Causality and Experiments
[Diagram: X2 → Y, i.e., Number of Fire Trucks → Number of Fire Deaths]

Question: What is the relationship between the number of fire trucks at the scene of a fire and the number of deaths caused by that fire?

Experimental approach: Randomly assign fire incidents to different categories, which receive different numbers of trucks (treatment).
September 1, 2009 Session 2 Slide 23
Causality and Observational Data
The problem of spurious relations...
[Diagram: Size of Fire (X1) causes both Number of Fire Trucks (X2) and Number of Fire Deaths (Y), making the X2 → Y relationship spurious]

In an experimental design, we fully control for spurious relationships. With OLS we try to manage them statistically.
September 1, 2009 Session 2 Slide 24
Statistical Calculation of Partial Effects
In calculating the effect of X1 on Y, we remove the effect of the other X’s on both X1 and Y:
$$\hat{Y}_i = b_0 + b_2 X_{i,2} + e_{i,\,Y|X_2}$$

$$\hat{X}_{i,1} = b_0 + b_2 X_{i,2} + e_{i,\,X_1|X_2}$$

so

$$e_{i,\,Y|X_2} = b_0 + b_1 e_{i,\,X_1|X_2}$$

Here $e_{i,\,Y|X_2}$ is Y stripped of the effect of X2, and $e_{i,\,X_1|X_2}$ is X1 stripped of the effect of X2. The use of residuals “cleans” both Y and X1 of their correlations with X2, permitting estimation of partial regression coefficients (PRCs). (A demonstration follows below.)
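This two-step residual regression can be demonstrated in R with simulated data (all values illustrative): the slope from the residual-on-residual regression matches the coefficient on X1 from the full model.

```r
set.seed(7)
x2 <- rnorm(100)
x1 <- 0.6 * x2 + rnorm(100)      # X1 and X2 overlap
yy <- 1 + 2 * x1 + 3 * x2 + rnorm(100)

e_y  <- resid(lm(yy ~ x2))       # Y stripped of the effect of X2
e_x1 <- resid(lm(x1 ~ x2))       # X1 stripped of the effect of X2

coef(lm(e_y ~ e_x1))             # slope ~ 2: the partial effect of X1
coef(lm(yy ~ x1 + x2))           # same coefficient on x1
```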
September 1, 2009 Session 2 Slide 25
Intuition of PRCs
• All overlapping variance is stripped
• Highly correlated IVs are problematic
  – But what if the overlap is important?
    • What if X1 and X2 are really part of some larger construct?
  – The case of knowledge, efficacy and behavior
  – Kelstet et al
• How should we interpret the PRCs in this case?
September 1, 2009 Session 2 Slide 26
Workshop
• Load EE data
• Run a simple model (a hedged sketch follows below):
  – Willingness to pay for an alternative energy tax
  – Use randomly assigned cost as IV
• Plot the relationship (use jitter)
• Now add: Income, Ideology
  – Change in cost variable? (Why?)
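One way the workshop model might be set up in R; the column names (wtp, cost, income, ideology) are placeholders for the actual EE09 variable names:

```r
m1 <- lm(wtp ~ cost, data = ee)        # WTP on randomly assigned cost (placeholder names)
summary(m1)

plot(jitter(ee$cost), jitter(ee$wtp))  # jitter separates overplotted survey responses
abline(m1)

m2 <- lm(wtp ~ cost + income + ideology, data = ee)
summary(m2)                            # does the cost coefficient change?
```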
September 1, 2009 Session 2 Slide 27
Homework
• Generate and analyze the residuals (see the starter sketch below)
• Add to the model:
  – Belief in anthropogenic climate change
    • Will require recodes
  – Understanding of GCC science
    • Recode the “What scientists believe…” variables
• 1-page summary of findings for class next week
• Next Extension: Modeling Dummies and Interactions
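A starting point for the residual analysis, building on the hypothetical m2 from the workshop sketch (names remain placeholders):

```r
e2 <- resid(m2)
summary(e2)              # mean should be ~0
hist(e2)                 # roughly normal?
plot(fitted(m2), e2)     # a fan shape would suggest heteroscedasticity
abline(h = 0)
```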