Regression, Econ 240A
Post on 20-Dec-2015
Outline
• A cognitive device to help understand the formulas for estimating the slope and the intercept, as well as the analysis of variance
• Table of Analysis of Variance (ANOVA) for regression
• F distribution for testing the significance of the regression, i.e. does the independent variable, x, significantly explain the dependent variable, y?
• The Coefficient of Determination, R^2, and the Coefficient of Correlation, r
• Estimate of the error variance, $\sigma^2$
• Hypothesis tests on the slope, b
A Cognitive Device: The Conceptual Model
(1) yi = a + b*xi + ei
Take expectations, E: (2) E yi = a + b*E xi + E ei, where (3) E ei = 0
Subtract (2) from (1) to obtain the model in deviations:
(4) [yi - E yi] = b*[xi - E xi] + ei
Multiply (4) by [xi - E xi] and take expectations:
A Cognitive Device: (Cont.)
(5) E{[yi - E yi][xi - E xi]} = b*E[xi - E xi]^2 + E{ei [xi - E xi]}, where E{ei [xi - E xi]} = 0
By definition, (6) cov yx = b * var x, i.e. (7) b = cov yx / var x
The corresponding empirical estimate:
(8) $\hat{b} = \sum_i [y(i) - \bar{y}][x(i) - \bar{x}] \,/\, \sum_i [x(i) - \bar{x}]^2$
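As a rough illustration of (7) and (8), the slope estimate is the sum of cross-deviations of y and x divided by the sum of squared deviations of x. The sketch below uses made-up data and a hypothetical helper name, `slope_estimate`; neither comes from the lecture.

```python
def slope_estimate(x, y):
    """Estimate b as in (8): cross-deviations over squared x-deviations."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

# Illustrative data only (not from the lecture): y = 2 + 3x with no error term
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2.0, 5.0, 8.0, 11.0, 14.0]
b_hat = slope_estimate(x, y)
print(b_hat)
```

Because the illustrative y is an exact linear function of x, the estimate recovers the slope used to build the data.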
A Cognitive Device (Cont.)
The empirical counterpart to (2):
$\bar{y} = \hat{a} + \hat{b}\,\bar{x}$, so (9) $\hat{a} = \bar{y} - \hat{b}\,\bar{x}$
Square both sides of (4), and take expectations:
(10) E[yi - E yi]^2 = b^2*E[xi - E xi]^2 + 2b*E{ei*[xi - E xi]} + E[ei]^2
where E{ei*[xi - E xi]} = 0, i.e. the explanatory variable x and the error e are assumed to be independent, cov ex = 0
A Cognitive Device (Cont.)
From (10), by definition, (11) var y = b^2 * var x + var e. This is the partition of the total variance in y into the variance explained by x, b^2 * var x, and the unexplained or error variance, var e.
The empirical counterpart to (11) is that the total sum of squares equals the explained sum of squares plus the unexplained sum of squares:
(12) $\sum_i [y(i) - \bar{y}]^2 = \hat{b}^2 \sum_i [x(i) - \bar{x}]^2 + \sum_i [\hat{e}(i)]^2$
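The decomposition in (12) can be checked numerically. The following is a minimal sketch on made-up data (not from the lecture); the variable names are illustrative, with `tss`, `ess` and `uss` standing for the total, explained and unexplained sums of squares.

```python
# Illustrative data only (not from the lecture)
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))

b_hat = sxy / sxx                 # slope estimate, as in (8)
a_hat = y_bar - b_hat * x_bar     # intercept estimate, as in (9)
residuals = [yi - (a_hat + b_hat * xi) for xi, yi in zip(x, y)]

tss = sum((yi - y_bar) ** 2 for yi in y)   # total sum of squares
ess = b_hat ** 2 * sxx                     # explained sum of squares
uss = sum(e ** 2 for e in residuals)       # unexplained sum of squares
print(tss, ess, uss)
```

The identity holds because least squares residuals are uncorrelated with x in the sample, the empirical analogue of cov ex = 0.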
ANOVA
Testing the significance of the regression, i.e. does x significantly explain y?
$F_{1,\,n-2}$ = EMS/UMS, the explained mean square divided by the unexplained mean square
This ratio follows the F distribution with 1 degree of freedom in the numerator and n-2 degrees of freedom in the denominator
Table of Analysis of Variance (ANOVA)
Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square
Explained, ESS | $\hat{b}^2 \sum_i [x(i) - \bar{x}]^2$ | 1 | $\hat{b}^2 \sum_i [x(i) - \bar{x}]^2$
Error, USS | $\sum_i [\hat{e}(i)]^2$ | n - 2 | $\sum_i [\hat{e}(i)]^2 / (n - 2)$
Total, TSS | $\sum_i [y(i) - \bar{y}]^2$ | n - 1 | $\sum_i [y(i) - \bar{y}]^2 / (n - 1)$

$F_{1,\,n-2}$ = Explained Mean Square / Error Mean Square
[Figure: UC Budget in Millions of Nominal $, 1968-69 to 2001-02, with fitted linear trend y = 84.877x - 19.953, R^2 = 0.9071. Horizontal axis: Fiscal Year; vertical axis: Millions $, 0 to 4000]
SUMMARY OUTPUT

Regression Statistics
Multiple R | 0.95239903
R Square | 0.90706392
Adjusted R Square | 0.90415967
Standard Error | 274.743234
Observations | 34

ANOVA
 | df | SS | MS | F | Significance F
Regression | 1 | 23575316.6 | 23575316.6 | 312.322678 | 4.5374E-18
Residual | 32 | 2415483.03 | 75483.8448 | |
Total | 33 | 25990799.6 | | |

 | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0%
Intercept | 64.9235479 | 92.1946935 | 0.70420049 | 0.48640139 | -122.87074 | 252.717834 | -122.87074 | 252.717834
X Variable 1 | 84.8767885 | 4.80271901 | 17.6726534 | 4.5374E-18 | 75.0939783 | 94.6595988 | 75.0939783 | 94.6595988
Time index, t = 0 for 1968-69, t=1 for 1969-70 etc.
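The mean squares and the F statistic in the ANOVA block of this output follow directly from the sums of squares and their degrees of freedom. A short check, using only the figures reported in the output (the variable names are illustrative):

```python
# Sums of squares and sample size copied from the linear trend regression output
ess = 23575316.6   # explained (regression) sum of squares
uss = 2415483.03   # unexplained (residual) sum of squares
n = 34             # observations

ems = ess / 1          # explained mean square, 1 degree of freedom
ums = uss / (n - 2)    # unexplained mean square, n - 2 degrees of freedom
f_stat = ems / ums     # F statistic with 1 and n - 2 degrees of freedom
print(round(ums, 2), round(f_stat, 2))
```

Small differences in the last digits relative to the reported values come from the rounding of the sums of squares in the output.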
Example from Lab Four
Exponential trend model for UC Budget:
UCBud(t) = exp[a + b*t + e(t)]
Taking the logarithms of both sides:
ln UCBud(t) = a + b*t + e(t)
[Figure: UC Budget in Millions of Nominal $, 1968-69 to 2001-02, with fitted exponential trend y = 339.05 e^(0.0708x), R^2 = 0.9245. Horizontal axis: Fiscal Year; vertical axis: Millions $, 0 to 4000]
SUMMARY OUTPUT

Regression Statistics
Multiple R | 0.96150521
R Square | 0.92449227
Adjusted R Square | 0.92213265
Standard Error | 0.20475476
Observations | 34

ANOVA
 | df | SS | MS | F | Significance F
Regression | 1 | 16.4259265 | 16.4259265 | 391.797669 | 1.6212E-19
Residual | 32 | 1.34158442 | 0.04192451 | |
Total | 33 | 17.7675109 | | |

 | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0%
Intercept | 5.89699127 | 0.06870889 | 85.8257392 | 2.1056E-39 | 5.75703596 | 6.03694658 | 5.75703596 | 6.03694658
X Variable 1 | 0.07084759 | 0.00357927 | 19.7938796 | 1.6212E-19 | 0.06355687 | 0.07813832 | 0.06355687 | 0.07813832
ln 339.05 = 5.83
Time index, t = 0 for 1968-69, t=1 for 1969-70 etc.
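One way to read the log-linear estimates is to convert them back to levels. The sketch below does this with the reported coefficients. It also rests on an assumption that is inferred from the numbers rather than stated in the lecture: the chart's trend line appears to use x = t + 1, which would explain why its constant is 339.05 rather than exp(5.897).

```python
import math

a_hat = 5.89699127   # intercept from the regression of ln UCBud(t) on t
b_hat = 0.07084759   # slope, with t = 0 for 1968-69

level_at_t0 = math.exp(a_hat)        # fitted budget for 1968-69, millions of $
growth_rate = math.exp(b_hat) - 1    # implied annual growth rate

# Assumption (not stated in the source): the chart's x equals t + 1,
# so the chart constant would be exp(a_hat - b_hat)
chart_constant = math.exp(a_hat - b_hat)
print(round(level_at_t0, 1), round(growth_rate, 4), round(chart_constant, 2))
```

Under that assumption the two fits describe the same trend, and the slope implies nominal growth of a little over 7% per year.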
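The F Distribution
The density function of the F distribution:
$f(F) = \dfrac{\left[\frac{\nu_1+\nu_2-2}{2}\right]!}{\left[\frac{\nu_1-2}{2}\right]!\,\left[\frac{\nu_2-2}{2}\right]!} \left(\dfrac{\nu_1}{\nu_2}\right)^{\nu_1/2} \dfrac{F^{(\nu_1-2)/2}}{\left(1+\dfrac{\nu_1 F}{\nu_2}\right)^{(\nu_1+\nu_2)/2}}, \quad F > 0$
$\nu_1$ and $\nu_2$ are the numerator and denominator degrees of freedom. For a non-integer argument, $z!$ is read as $\Gamma(z+1)$.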
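A minimal sketch of this density in code, writing the factorials as gamma functions, with [(v - 2)/2]! = Gamma(v/2). The function names `f_density` and `integrate` are illustrative, and the midpoint-rule integration is only a sanity check that the density integrates to about one.

```python
import math

def f_density(f, nu1, nu2):
    """F density with nu1 numerator and nu2 denominator degrees of freedom, f > 0."""
    const = math.gamma((nu1 + nu2) / 2) / (math.gamma(nu1 / 2) * math.gamma(nu2 / 2))
    return (const * (nu1 / nu2) ** (nu1 / 2) * f ** ((nu1 - 2) / 2)
            / (1 + nu1 * f / nu2) ** ((nu1 + nu2) / 2))

def integrate(func, lo, hi, steps=200_000):
    """Midpoint-rule approximation to the integral of func over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (i + 0.5) * h) for i in range(steps))

# Total probability for nu1 = 5, nu2 = 10; the tail beyond 200 is negligible
total = integrate(lambda f: f_density(f, 5, 10), 0.0, 200.0)
print(round(total, 4))
```

With nu1 = nu2 = 2 the formula reduces to 1/(1 + F)^2, which gives a quick hand check of the constant.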
The F Distribution
This density function generates a rich family of distributions, depending on the values of $\nu_1$ and $\nu_2$.
[Figure: F density curves for F from 0 to 5. First panel: $\nu_1 = 5, \nu_2 = 10$ and $\nu_1 = 50, \nu_2 = 10$. Second panel: $\nu_1 = 5, \nu_2 = 10$ and $\nu_1 = 5, \nu_2 = 1$]
Determining Values of F
The values of the F variable can be found in the F table, Table 6(a) in Appendix B, for a type I error of 5%, or in Excel.
The entries in the table are the values $F_A$ of the F variable with right-hand tail probability A, i.e. $P(F_{\nu_1,\nu_2} > F_A) = A$.
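As an alternative to the table, a critical value can be approximated numerically. The sketch below is one possible approach, not the lecture's method: it integrates the F density with the midpoint rule and bisects on the tail probability. The names `f_tail` and `f_critical` are illustrative, and the integration is only reliable for numerator degrees of freedom of 2 or more, where the density is bounded at zero.

```python
import math

def f_density(f, nu1, nu2):
    """F density with nu1 numerator and nu2 denominator degrees of freedom, f > 0."""
    const = math.gamma((nu1 + nu2) / 2) / (math.gamma(nu1 / 2) * math.gamma(nu2 / 2))
    return (const * (nu1 / nu2) ** (nu1 / 2) * f ** ((nu1 - 2) / 2)
            / (1 + nu1 * f / nu2) ** ((nu1 + nu2) / 2))

def f_tail(x, nu1, nu2, steps=4000):
    """P(F > x): one minus a midpoint-rule integral of the density on [0, x]."""
    h = x / steps
    cdf = h * sum(f_density((i + 0.5) * h, nu1, nu2) for i in range(steps))
    return 1.0 - cdf

def f_critical(alpha, nu1, nu2, lo=0.0, hi=100.0):
    """Bisect for the value F_A with P(F > F_A) = alpha."""
    for _ in range(50):
        mid = (lo + hi) / 2
        if f_tail(mid, nu1, nu2) > alpha:
            lo = mid   # tail still too large: the critical value lies further right
        else:
            hi = mid
    return (lo + hi) / 2

crit = f_critical(0.05, 5, 10)
print(round(crit, 2))
```

The example uses 5 and 10 degrees of freedom, one of the parameter pairs plotted earlier in the lecture.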
Part IV: The Pearson Coefficient of Correlation, r
The Pearson coefficient of correlation, r, is
(13) r = cov yx / {[var x]^(1/2) [var y]^(1/2)}
Estimated counterpart:
(14) $\hat{r} = \sum_i [y(i) - \bar{y}][x(i) - \bar{x}] \,/\, \sqrt{\sum_i [y(i) - \bar{y}]^2 \, \sum_i [x(i) - \bar{x}]^2}$
Comparing (13) to (7), note that (15) r*{[var y]^(1/2) / [var x]^(1/2)} = b
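Relation (15) between the correlation coefficient and the slope can be illustrated with sample quantities. The data below are made up for the example and the variable names are illustrative.

```python
import math

# Illustrative data only (not from the lecture)
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2.0, 6.0, 4.0, 10.0, 8.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))

r_hat = sxy / math.sqrt(syy * sxx)            # estimated correlation, as in (14)
b_hat = sxy / sxx                             # estimated slope, as in (8)
b_from_r = r_hat * math.sqrt(syy / sxx)       # sample version of (15)
print(r_hat, b_hat, b_from_r)
```

The slope equals the correlation rescaled by the ratio of the standard deviation of y to that of x, so the two always share the same sign.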
Part IV (Cont.)
The Coefficient of Determination, R^2
For a bivariate regression of y on a single explanatory variable, x, R^2 = r^2, i.e. the coefficient of determination equals the square of the Pearson coefficient of correlation
Using (14) to square the estimate of r:
(16) $[\hat{r}]^2 = \left\{\sum_i [y(i) - \bar{y}][x(i) - \bar{x}]\right\}^2 \,/\, \left\{\sum_i [y(i) - \bar{y}]^2 \, \sum_i [x(i) - \bar{x}]^2\right\}$
Part IV (Cont.)
Using (8), (16) can be expressed as
(19) $\hat{r}^2 = \hat{b}^2 \sum_i [x(i) - \bar{x}]^2 \,/\, \sum_i [y(i) - \bar{y}]^2 = ESS/TSS$
And so
(20) $1 - \hat{r}^2 = 1 - [ESS/TSS] = [TSS - ESS]/TSS = USS/TSS$
In general, including multivariate regression, the estimate of the coefficient of determination, $\hat{R}^2$, can be calculated from (21) $\hat{R}^2 = 1 - USS/TSS$.
Part IV (Cont.)
For the bivariate regression, the F-test can be calculated from
$F_{1,\,n-2} = [(n-2)/1]\,[ESS/TSS]/[USS/TSS]$
$F_{1,\,n-2} = [(n-2)/1]\,[ESS/USS] = (n-2)\,\hat{R}^2/[1 - \hat{R}^2]$
For a multivariate regression with k explanatory variables and an intercept, the error degrees of freedom are n-k-1, so the F-test can be calculated as
$F_{k,\,n-k-1} = [(n-k-1)/k]\,[ESS/USS] = [(n-k-1)/k]\,\hat{R}^2/[1 - \hat{R}^2]$
For the linear trend regression of the UC budget (n = 34, R Square = 0.907):
$F_{1,\,32} = (n-2)\,[R^2/(1 - R^2)] = 32 \times (0.907/0.093) = 312$
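The same calculation can be run at full precision for both UC budget regressions. The helper name `f_from_r2` is illustrative, and the function covers only the bivariate case.

```python
def f_from_r2(r2, n):
    """F statistic for a bivariate regression: (n - 2) * R^2 / (1 - R^2)."""
    return (n - 2) * r2 / (1 - r2)

# R Square and sample size from the two UC budget regression outputs
f_linear = f_from_r2(0.90706392, 34)   # linear trend
f_log = f_from_r2(0.92449227, 34)      # exponential (log-linear) trend
print(round(f_linear, 2), round(f_log, 2))
```

These reproduce the F values reported in the two ANOVA tables, 312.32 and 391.80.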
Part V: Estimate of the Error Variance
Var ei = $\sigma^2$
Estimate is the unexplained mean square, UMS:
$\hat{\sigma}^2 = \sum_i [\hat{e}(i)]^2/(n-2) = \sum_i [y(i) - \hat{y}(i)]^2/(n-2)$
Standard error of the regression is $\hat{\sigma}$
For the linear trend regression of the UC budget: $\hat{\sigma} = 274.74$, UMS = 75483.84
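The standard error of the regression is the square root of the unexplained mean square. A short check for both UC budget regressions, using the residual sums of squares reported in their outputs (variable names are illustrative):

```python
import math

n = 34  # observations in both regressions

# Residual sums of squares copied from the two regression outputs
uss_linear = 2415483.03   # linear trend, millions of $ squared
uss_log = 1.34158442      # log-linear trend, log units squared

sigma_linear = math.sqrt(uss_linear / (n - 2))
sigma_log = math.sqrt(uss_log / (n - 2))
print(round(sigma_linear, 2), round(sigma_log, 4))
```

The two values are not comparable with each other, since one is in millions of dollars and the other in log units.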
Part VI: Hypothesis Tests on the Slope
Hypotheses, H0: b = 0; HA: b > 0
Test statistic: $t = [\hat{b} - E(\hat{b})]/\hat{\sigma}(\hat{b})$, where $E(\hat{b}) = b$, and b = 0 under H0
Set probability for the type I error, say 5%
Note: for bivariate regression, the square of the t-statistic for the null that the slope is zero is the F-statistic
For the linear trend regression of the UC budget:
t = [84.88 - 0]/4.80 = 17.7
t^2 = F, i.e. 17.67*17.67 = 312
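The t statistic can be rebuilt from the pieces of the regression output. The sketch below relies on the standard textbook formula for the standard error of the slope, the standard error of the regression divided by the square root of the sum of squared deviations of x, which is not written out in the lecture. It also assumes x is the time index t = 0, 1, ..., 33.

```python
import math

n = 34
sigma_hat = 274.743234   # standard error of the regression, from the output
b_hat = 84.8767885       # estimated slope, from the output

# Assumption: x is the time index t = 0, 1, ..., n - 1
t_index = range(n)
t_bar = sum(t_index) / n
sxx = sum((t - t_bar) ** 2 for t in t_index)

se_b = sigma_hat / math.sqrt(sxx)   # textbook standard error of the slope
t_stat = (b_hat - 0) / se_b         # null hypothesis: b = 0
print(round(se_b, 4), round(t_stat, 2), round(t_stat ** 2, 1))
```

The result matches the reported standard error and t Stat for X Variable 1, and the squared t statistic matches the F statistic.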
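The Student t Distribution
The Student t density function:
$f(t) = \dfrac{\left[\frac{\nu-1}{2}\right]!}{\sqrt{\nu\pi}\,\left[\frac{\nu-2}{2}\right]!} \left[1 + \dfrac{t^2}{\nu}\right]^{-(\nu+1)/2}$
$\nu$ is the parameter of the Student t distribution (the number of degrees of freedom)
E(t) = 0, V(t) = $\nu/(\nu - 2)$ (for $\nu > 2$)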
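A minimal sketch of this density in code, writing the factorials as gamma functions, with [(v - 1)/2]! = Gamma((v + 1)/2) and [(v - 2)/2]! = Gamma(v/2). The function names are illustrative, and the numerical integrals are only a sanity check on the total probability and the variance formula.

```python
import math

def t_density(t, nu):
    """Student t density with nu degrees of freedom."""
    const = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return const * (1 + t * t / nu) ** (-(nu + 1) / 2)

def integrate(func, lo, hi, steps=100_000):
    """Midpoint-rule approximation to the integral of func over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (i + 0.5) * h) for i in range(steps))

nu = 10
total = integrate(lambda t: t_density(t, nu), -100.0, 100.0)
variance = integrate(lambda t: t * t * t_density(t, nu), -100.0, 100.0)
print(round(total, 4), round(variance, 4))
```

For 10 degrees of freedom the variance formula gives 10/8 = 1.25, which the numerical integral reproduces.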
The Student t Distribution
[Figure: Student t density curves for $\nu = 3$ and $\nu = 10$, plotted for t from -6 to 6]
Determining Student t Values
The Student t distribution is used extensively in statistical inference.
Thus, it is important to determine values of $t_A$ associated with a given number of degrees of freedom.
We can do this using
• t tables, Table 4 in Appendix B
• Excel
Using the t Table

Degrees of Freedom | t.100 | t.05 | t.025 | t.01 | t.005
1 | 3.078 | 6.314 | 12.706 | 31.821 | 63.657
2 | 1.886 | 2.92 | 4.303 | 6.965 | 9.925
... | ... | ... | ... | ... | ...
10 | 1.372 | 1.812 | 2.228 | 2.764 | 3.169
... | ... | ... | ... | ... | ...
200 | 1.286 | 1.653 | 1.972 | 2.345 | 2.601
∞ | 1.282 | 1.645 | 1.96 | 2.326 | 2.576

The table provides the t values ($t_A$) for which $P(t > t_A) = A$.
The t distribution is symmetrical around 0.
Example: with 10 degrees of freedom and A = .05, $t_A$ = 1.812 and $-t_A$ = -1.812.
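Entries of the t table can be approximated numerically. The sketch below is one possible approach rather than the lecture's method: it uses the symmetry of the t density, integrates it from 0 to x with the midpoint rule, and bisects on the tail probability. The names `t_tail` and `t_critical` are illustrative.

```python
import math

def t_density(t, nu):
    """Student t density with nu degrees of freedom."""
    const = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return const * (1 + t * t / nu) ** (-(nu + 1) / 2)

def t_tail(x, nu, steps=4000):
    """P(t > x) for x >= 0: by symmetry, 0.5 minus the integral of the density on [0, x]."""
    h = x / steps
    return 0.5 - h * sum(t_density((i + 0.5) * h, nu) for i in range(steps))

def t_critical(alpha, nu, lo=0.0, hi=100.0):
    """Bisect for the value t_A with P(t > t_A) = alpha."""
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_tail(mid, nu) > alpha:
            lo = mid   # tail still too large: the critical value lies further right
        else:
            hi = mid
    return (lo + hi) / 2

t_05 = t_critical(0.05, 10)
t_025 = t_critical(0.025, 10)
print(round(t_05, 3), round(t_025, 3))
```

For 10 degrees of freedom these should agree with the table row for 10, i.e. 1.812 and 2.228, up to numerical error.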