
1

Regression

Econ 240A

2

Outline

• A cognitive device to help understand the formulas for estimating the slope and the intercept, as well as the analysis of variance

• Table of Analysis of Variance (ANOVA) for regression

• F distribution for testing the significance of the regression, i.e. does the independent variable, x, significantly explain the dependent variable, y?

3

Outline (Cont.)

• The Coefficient of Determination, R², and the Coefficient of Correlation, r

• Estimate of the error variance, σ²

• Hypothesis tests on the slope, b

4

Part I: A Cognitive Device

5

A Cognitive Device: The Conceptual Model

(1) yi = a + b*xi + ei

Take expectations, E: (2) E yi = a + b*E xi + E ei, where (3) E ei = 0

Subtract (2) from (1) to obtain model in deviations:

(4) [yi - E yi ] = b*[xi - E xi ] + ei

Multiply (4) by [xi − E xi] and take expectations:

6

A Cognitive Device: (Cont.)

(5) E{[yi − E yi][xi − E xi]} = b*E[xi − E xi]² + E{ei[xi − E xi]}, where E{ei[xi − E xi]} = 0

By definition, (6) cov yx = b*var x, i.e. (7) b = cov yx / var x. The corresponding empirical estimate is

(8) b̂ = Σi [y(i) − ȳ][x(i) − x̄] / Σi [x(i) − x̄]²

7

A Cognitive Device (Cont.)

The empirical counterpart to (2) is ȳ = â + b̂*x̄, so (9) â = ȳ − b̂*x̄

Square both sides of (4), and take expectations:

(10) E[yi − E yi]² = b²*E[xi − E xi]² + 2E{ei*[xi − E xi]} + E[ei]²

where E{ei*[xi − E xi]} = 0, i.e. the explanatory variable x and the error e are assumed to be independent, cov ex = 0

8

A Cognitive Device (Cont.)

From (10), by definition, (11) var y = b²*var x + var e. This is the partition of the total variance in y into the variance explained by x, b²*var x, and the unexplained or error variance, var e.

The empirical counterpart to (11) is that the total sum of squares equals the explained sum of squares plus the unexplained sum of squares:

(12) Σi [y(i) − ȳ]² = b̂² Σi [x(i) − x̄]² + Σi [ê(i)]²
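A minimal numerical sketch of Part I, assuming NumPy and a small made-up (x, y) sample (not data from the course): it computes the slope and intercept from (8)–(9) and checks the sum-of-squares partition (12).

import numpy as np

# Hypothetical sample; any paired observations would do.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.3])

xbar, ybar = x.mean(), y.mean()

# (8) slope: sum of cross-deviations over the sum of squared x-deviations
b_hat = np.sum((y - ybar) * (x - xbar)) / np.sum((x - xbar) ** 2)
# (9) intercept: the fitted line passes through the sample means
a_hat = ybar - b_hat * xbar

# residuals from the fitted line
e_hat = y - (a_hat + b_hat * x)

# (12) total SS = explained SS + unexplained SS
TSS = np.sum((y - ybar) ** 2)
ESS = b_hat ** 2 * np.sum((x - xbar) ** 2)
USS = np.sum(e_hat ** 2)
print(b_hat, a_hat)
print(TSS, ESS + USS)  # the two agree up to rounding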

9

Part II: ANOVA in Regression

10

ANOVA

Testing the significance of the regression, i.e. does x significantly explain y?

F1,n−2 = EMS/UMS, the explained mean square divided by the unexplained (error) mean square

Distributed with the F distribution with 1 degree of freedom in the numerator and n-2 degrees of freedom in the denominator

11

Table of Analysis of Variance (ANOVA)

Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square
Explained, ESS | b̂² Σi [x(i) − x̄]² | 1 | b̂² Σi [x(i) − x̄]²
Error, USS | Σi [ê(i)]² | n − 2 | Σi [ê(i)]² / (n − 2)
Total, TSS | Σi [y(i) − ȳ]² | n − 1 | Σi [y(i) − ȳ]² / (n − 1)

F1,n−2 = Explained Mean Square / Error Mean Square
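A sketch of how this table could be filled in, under the same assumptions as the earlier snippet (NumPy, a made-up sample):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.3])
n = len(y)

xbar, ybar = x.mean(), y.mean()
b_hat = np.sum((y - ybar) * (x - xbar)) / np.sum((x - xbar) ** 2)
a_hat = ybar - b_hat * xbar
e_hat = y - (a_hat + b_hat * x)

ESS = b_hat ** 2 * np.sum((x - xbar) ** 2)   # explained sum of squares, 1 df
USS = np.sum(e_hat ** 2)                     # error sum of squares, n - 2 df
TSS = np.sum((y - ybar) ** 2)                # total sum of squares, n - 1 df

EMS = ESS / 1
UMS = USS / (n - 2)
F = EMS / UMS            # F statistic with 1 and n - 2 degrees of freedom
print(ESS, USS, TSS, F)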

12

Example from Lab Four

Linear Trend Model for UC Budget

[Chart: UC Budget in Millions of Nominal $, 1968-69 to 2001-02, plotted by fiscal year (vertical axis in Millions $, 0 to 4000). Fitted linear trend: y = 84.877x − 19.953, R² = 0.9071.]

SUMMARY OUTPUT

Regression Statistics
Multiple R         0.95239903
R Square           0.90706392
Adjusted R Square  0.90415967
Standard Error     274.743234
Observations       34

ANOVA
            df   SS           MS           F            Significance F
Regression  1    23575316.6   23575316.6   312.322678   4.5374E-18
Residual    32   2415483.03   75483.8448
Total       33   25990799.6

              Coefficients   Standard Error   t Stat       P-value      Lower 95%    Upper 95%
Intercept     64.9235479     92.1946935       0.70420049   0.48640139   -122.87074   252.717834
X Variable 1  84.8767885     4.80271901       17.6726534   4.5374E-18   75.0939783   94.6595988

Time index, t = 0 for 1968-69, t=1 for 1969-70 etc.

15

Example from Lab Four

Exponential trend model for UC Budget: UCBud(t) = exp[a + b*t + e(t)]. Taking the logarithms of both sides: ln UCBud(t) = a + b*t + e(t)

[Chart: UC Budget in Millions of Nominal $, 1968-69 to 2001-02, plotted by fiscal year (vertical axis in Millions $, 0 to 4000). Fitted exponential trend: y = 339.05e^(0.0708x), R² = 0.9245.]

SUMMARY OUTPUT

Regression Statistics
Multiple R         0.96150521
R Square           0.92449227
Adjusted R Square  0.92213265
Standard Error     0.20475476
Observations       34

ANOVA
            df   SS           MS           F            Significance F
Regression  1    16.4259265   16.4259265   391.797669   1.6212E-19
Residual    32   1.34158442   0.04192451
Total       33   17.7675109

              Coefficients   Standard Error   t Stat       P-value      Lower 95%    Upper 95%
Intercept     5.89699127     0.06870889       85.8257392   2.1056E-39   5.75703596   6.03694658
X Variable 1  0.07084759     0.00357927       19.7938796   1.6212E-19   0.06355687   0.07813832

ln 339.05 = 5.83

Time index, t = 0 for 1968-69, t=1 for 1969-70 etc.
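A hedged sketch of fitting both trend models with NumPy. The series below is a placeholder standing in for the UC budget data, not the actual figures; the time index follows the slide's convention (t = 0 for the first fiscal year).

import numpy as np

# Placeholder series (hypothetical), roughly exponential like the budget data
t = np.arange(34, dtype=float)
budget = 340.0 * np.exp(0.07 * t) + np.random.default_rng(0).normal(0.0, 60.0, 34)

# Linear trend: budget(t) = a + b*t + e(t)
b_lin, a_lin = np.polyfit(t, budget, 1)

# Exponential trend: budget(t) = exp[a + b*t + e(t)]  <=>  ln budget(t) = a + b*t + e(t)
b_exp, a_exp = np.polyfit(t, np.log(budget), 1)

print(a_lin, b_lin)          # intercept and slope of the linear trend
print(np.exp(a_exp), b_exp)  # level at t = 0 and the estimated growth rate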

18

Part III: The F Distribution

19

The F Distribution

The density function of the F distribution:

ν1 and ν2 are the numerator and denominator degrees of freedom.

f(F) = { Γ[(ν1 + ν2)/2] / ( Γ(ν1/2) Γ(ν2/2) ) } (ν1/ν2)^(ν1/2) F^((ν1 − 2)/2) / [1 + ν1F/ν2]^((ν1 + ν2)/2),  for F > 0

20

[Chart: an F density curve, plotted for F from 0 to 5.]

This density function generates a rich family of distributions, depending on the values of ν1 and ν2.

The F Distribution

[Charts: F densities compared for (ν1 = 5, ν2 = 10) versus (ν1 = 50, ν2 = 10), and for (ν1 = 5, ν2 = 10) versus (ν1 = 5, ν2 = 1), plotted for F from 0 to 5.]
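A small sketch, assuming SciPy is available, that evaluates the F density at a few points for the parameter pairs shown above, which is enough to reproduce the shape comparison.

import numpy as np
from scipy.stats import f

F_grid = np.linspace(0.1, 5.0, 6)
for dfn, dfd in [(5, 10), (50, 10), (5, 1)]:
    # f.pdf evaluates the F density with dfn numerator and dfd denominator df
    print((dfn, dfd), np.round(f.pdf(F_grid, dfn, dfd), 3))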

21

Determining Values of F

The values of the F variable can be found in the F table, Table 6(a) in Appendix B, for a type I error of 5%, or in Excel.

The entries in the table are the values FA of the F variable for which the right-hand tail probability is A, i.e. P(Fν1,ν2 > FA) = A.
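The same values can be obtained in code; a sketch assuming SciPy, using the degrees of freedom from the Lab Four linear trend example (1 and 32):

from scipy.stats import f

# Critical value F_A with P(F > F_A) = A, for A = 0.05
F_crit = f.isf(0.05, 1, 32)
print(F_crit)       # about 4.15

# Right-tail p-value of the observed F statistic from the linear trend model
p_value = f.sf(312.322678, 1, 32)
print(p_value)      # about 4.5e-18, matching Excel's "Significance F"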

22

Part IV: The Pearson Coefficient of Correlation, r

The Pearson coefficient of correlation, r, is

(13) r = cov yx / {[var x]^1/2 [var y]^1/2}

The estimated counterpart is

(14) r̂ = Σi [y(i) − ȳ][x(i) − x̄] / {Σi [y(i) − ȳ]² Σi [x(i) − x̄]²}^1/2

Comparing (13) to (7), note that (15) r*{[var y]^1/2 / [var x]^1/2} = b

23

Part IV (Cont.): The Coefficient of Determination, R²

For a bivariate regression of y on a single explanatory variable, x, R² = r², i.e. the coefficient of determination equals the square of the Pearson coefficient of correlation.

Using (14) to square the estimate of r:

(16) [r̂]² = {Σi [y(i) − ȳ][x(i) − x̄]}² / {Σi [y(i) − ȳ]² Σi [x(i) − x̄]²}
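A sketch, again assuming NumPy and the same made-up sample, that computes r̂ from (14) and checks that its square equals ESS/TSS:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.3])

xd, yd = x - x.mean(), y - y.mean()

# (14) Pearson correlation
r_hat = np.sum(yd * xd) / np.sqrt(np.sum(yd ** 2) * np.sum(xd ** 2))

# R^2 from the regression sums of squares
b_hat = np.sum(yd * xd) / np.sum(xd ** 2)
ESS = b_hat ** 2 * np.sum(xd ** 2)
TSS = np.sum(yd ** 2)

print(r_hat ** 2, ESS / TSS)  # equal for the bivariate regression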

24

Part IV (Cont.)

Using (8), (16) can be expressed as

(19) r̂² = b̂² Σi [x(i) − x̄]² / Σi [y(i) − ȳ]² = ESS/TSS

And so

(20) 1 − r̂² = 1 − [ESS/TSS] = [TSS − ESS]/TSS = USS/TSS

In general, including multivariate regression, the estimate of the coefficient of determination, R̂², can be calculated from (21) R̂² = 1 − USS/TSS.

25

Part IV (Cont.)

For the bivariate regression, the F-test can be calculated from

F1,n−2 = [(n−2)/1][ESS/TSS]/[USS/TSS]
F1,n−2 = [(n−2)/1][ESS/USS] = (n−2) R̂²/[1 − R̂²]

For a multivariate regression with k explanatory variables, the F-test can be calculated as

Fk,n−k−1 = [(n−k−1)/k][ESS/USS]
Fk,n−k−1 = [(n−k−1)/k] R̂²/[1 − R̂²]

(Excel SUMMARY OUTPUT for the linear trend model, repeated from the Lab Four example above: R Square = 0.907, F = 312.32 with 1 and 32 degrees of freedom.)

F1,32 = (n−2)*[R²/(1 − R²)] = 32*(0.907/0.093) = 312
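A quick check of that arithmetic in plain Python, using only the reported R² and sample size:

n = 34
R2 = 0.90706392

# Bivariate case: F with 1 and n - 2 degrees of freedom, computed from R^2
F = (n - 2) * R2 / (1 - R2)
print(F)   # about 312.3, matching the F in the Excel ANOVA table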

27

Part V: Estimate of the Error Variance

Var ei = σ²

The estimate is the unexplained mean square, UMS:

σ̂² = Σi [ê(i)]² / (n − 2) = Σi [y(i) − ŷ(i)]² / (n − 2)

The standard error of the regression is σ̂.

(Excel SUMMARY OUTPUT for the linear trend model, repeated from the Lab Four example above: Standard Error = 274.743, Residual MS = 75483.8448 on 32 degrees of freedom.)

σ̂ = √UMS = √75483.84 = 274.74
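The same quantities for the made-up sample used in the earlier sketches (assuming NumPy):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.3])
n = len(y)

xd = x - x.mean()
b_hat = np.sum((y - y.mean()) * xd) / np.sum(xd ** 2)
a_hat = y.mean() - b_hat * x.mean()
e_hat = y - (a_hat + b_hat * x)

sigma2_hat = np.sum(e_hat ** 2) / (n - 2)   # unexplained mean square, UMS
sigma_hat = np.sqrt(sigma2_hat)             # standard error of the regression
print(sigma2_hat, sigma_hat)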

29

Part VI: Hypothesis Tests on the Slope

Hypotheses, H0 : b=0; HA: b>0

Test statistic:

t = [b̂ − E(b̂)] / σ̂(b̂), where E(b̂) = b = 0 under H0

Set the probability of a type I error, say 5%.

Note: for the bivariate regression, the square of the t-statistic for the null that the slope is zero is the F-statistic.

(Excel SUMMARY OUTPUT for the linear trend model, repeated from the Lab Four example above: slope 84.8768 with standard error 4.8027 and t Stat 17.67.)

t = [84.88 − 0]/4.80 = 17.7

t² = F, i.e. 17.67*17.67 = 312
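A sketch of the same check in plain Python, using the slope and standard error reported by Excel:

b_hat = 84.8767885
se_b = 4.80271901

t_stat = (b_hat - 0.0) / se_b   # test of H0: b = 0
print(t_stat)                   # about 17.67
print(t_stat ** 2)              # about 312.3, equal to the regression F statistic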

31

Part VII: Student’s t-Distribution

32

The Student t Distribution

The Student t density function is

f(t) = { Γ[(ν + 1)/2] / ( √(νπ) Γ(ν/2) ) } [1 + t²/ν]^(−(ν + 1)/2)

ν is the parameter of the Student t distribution.

E(t) = 0, V(t) = ν/(ν − 2) (for ν > 2)

33

The Student t Distribution

[Charts: Student t densities for ν = 3 and ν = 10, plotted for t from −6 to 6.]

34

Determining Student t Values

The Student t distribution is used extensively in statistical inference.

Thus, it is important to determine values of tA associated with a given number of degrees of freedom.

We can do this using:
• t tables, Table 4 in Appendix B
• Excel

35

Using the t Table

Degrees of Freedom   t.100   t.05    t.025    t.01     t.005
1                    3.078   6.314   12.706   31.821   63.657
2                    1.886   2.92    4.303    6.965    9.925
...
10                   1.372   1.812   2.228    2.764    3.169
...
200                  1.286   1.653   1.972    2.345    2.601
∞                    1.282   1.645   1.96     2.326    2.576

The table provides the t values (tA) for which P(t > tA) = A.

The t distribution is symmetrical around 0, so P(t < −tA) = A as well. For example, with A = .05 and 10 degrees of freedom, tA = 1.812 and −tA = −1.812.
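A sketch, assuming SciPy, of obtaining the same tA values from code rather than from the table:

from scipy.stats import t

# t_A such that P(t > t_A) = A, for A = 0.05 and 10 degrees of freedom
print(t.isf(0.05, df=10))    # about 1.812, matching the table

# Two-sided 95% value for 32 degrees of freedom
print(t.isf(0.025, df=32))   # about 2.04

Multiplying the slope's standard error in the Lab Four output by this last value reproduces Excel's Lower 95% and Upper 95% bounds for the slope.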