GRAPHING A RELATIONSHIP IN A MULTIPLE REGRESSION MODEL


Upload: edward-kelley

Post on 18-Jan-2018



TRANSCRIPT

Page 1: GRAPHING A RELATIONSHIP IN A MULTIPLE REGRESSION MODEL The output above shows the result of regressing EARNINGS, hourly earnings in dollars, on S, years

GRAPHING A RELATIONSHIP IN A MULTIPLE REGRESSION MODEL

The output above shows the result of regressing EARNINGS, hourly earnings in dollars, on S, years of schooling, and EXP, years of work experience.

. reg EARNINGS S EXP

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  2,   537) =   67.54
       Model |  22513.6473     2  11256.8237           Prob > F      =  0.0000
    Residual |  89496.5838   537  166.660305           R-squared     =  0.2010
-------------+------------------------------           Adj R-squared =  0.1980
       Total |  112010.231   539  207.811189           Root MSE      =   12.91

------------------------------------------------------------------------------
    EARNINGS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |   2.678125   .2336497    11.46   0.000     2.219146    3.137105
         EXP |   .5624326   .1285136     4.38   0.000     .3099816    .8148837
       _cons |  -26.48501    4.27251    -6.20   0.000    -34.87789   -18.09213
------------------------------------------------------------------------------

$\widehat{EARNINGS} = -26.49 + 2.68\,S + 0.56\,EXP$
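As a quick illustration of how the fitted equation is read, the sketch below plugs values into it. The coefficients are copied from the Stata output; the function name and the example individual (12 years of schooling, 10 years of work experience) are assumptions made for illustration and do not come from the slides.

```python
# Coefficients copied from the Stata output for: reg EARNINGS S EXP
B_CONS = -26.48501
B_S = 2.678125
B_EXP = 0.5624326


def predict_earnings(s, exp):
    """Fitted hourly earnings ($) for s years of schooling and exp years of experience."""
    return B_CONS + B_S * s + B_EXP * exp


# Hypothetical individual (an assumption): 12 years of schooling, 10 years of experience.
fitted = predict_earnings(12, 10)
print(round(fitted, 2))

# Holding EXP fixed, one extra year of schooling shifts the fitted value by B_S.
extra_year = predict_earnings(13, 10) - fitted
print(round(extra_year, 6))
```

The second figure is the point of the exercise: the coefficient of S measures the effect of schooling with EXP held constant, which is the relationship the rest of the slideshow sets out to graph.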

Page 2

Suppose that you were particularly interested in the relationship between EARNINGS and S and wished to represent it graphically, using the sample data.

[Figure: scatter diagram of hourly earnings ($) against years of schooling (highest grade completed)]

Page 3

A simple plot would be misleading.

[Figure: scatter diagram of hourly earnings ($) against years of schooling (highest grade completed)]

Page 4

[Figure: scatter diagram of hourly earnings ($) against years of schooling (highest grade completed)]

Schooling is negatively correlated with work experience. The plot fails to take account of this, and as a consequence the regression line underestimates the impact of schooling on earnings.

. cor S EXP
(obs=540)

        |        S      EXP
--------+------------------
      S |   1.0000
    EXP |  -0.2179   1.0000

Page 5

We will investigate the distortion mathematically when we come to omitted variable bias.

[Figure: scatter diagram of hourly earnings ($) against years of schooling (highest grade completed)]

Page 6

To eliminate the distortion, you purge both EARNINGS and S of their components related to EXP and then draw a scatter diagram using the purged variables.

[Figure: scatter diagram of hourly earnings ($) against years of schooling (highest grade completed)]

Page 7

. reg EARNINGS EXP

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  1,   538) =    2.98
       Model |  617.717488     1  617.717488           Prob > F      =  0.0847
    Residual |  111392.514   538  207.049282           R-squared     =  0.0055
-------------+------------------------------           Adj R-squared =  0.0037
       Total |  112010.231   539  207.811189           Root MSE      =  14.389

------------------------------------------------------------------------------
    EARNINGS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         EXP |   .2414715   .1398002     1.73   0.085    -.0331497    .5160927
       _cons |   15.55527   2.442468     6.37   0.000     10.75732    20.35321
------------------------------------------------------------------------------

. predict EEARN, resid

We start by regressing EARNINGS on EXP, as shown above. The residuals are the part of EARNINGS which is not related to EXP. The ‘predict’ command is the Stata command for saving the residuals from the most recent regression. We name them EEARN.


Page 8

. reg S EXP

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  1,   538) =   26.82
       Model |  152.160205     1  152.160205           Prob > F      =  0.0000
    Residual |  3052.82313   538  5.67439243           R-squared     =  0.0475
-------------+------------------------------           Adj R-squared =  0.0457
       Total |  3204.98333   539  5.94616574           Root MSE      =  2.3821

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         EXP |  -.1198454   .0231436    -5.18   0.000    -.1653083   -.0743826
       _cons |   15.69765   .4043447    38.82   0.000     14.90337    16.49194
------------------------------------------------------------------------------

. predict ES, resid

We do the same with S. We regress it on EXP and save the residuals as ES.


Page 9

Now we plot EEARN against ES. The scatter is a faithful representation of the relationship, both in terms of the slope of the trend line (the red line) and in terms of the variation about that line.

[Figure: scatter diagram of EEARN against ES, with the trend line shown in red]

Page 10

As you would expect, the trend line is steeper than in the scatter diagram that did not control for EXP (reproduced here as the black dashed line).

[Figure: scatter diagram of EEARN against ES, with the trend line and, as a black dashed line, the line from the scatter diagram that did not control for EXP]
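The size of the gap between the two lines can be checked from figures already reported. A standard OLS identity says that the slope of a simple regression of EARNINGS on S equals the multiple regression coefficient of S plus the coefficient of EXP times the slope from regressing EXP on S. That last slope is not shown in the slides; the sketch below backs it out from the output of the regression of S on EXP, using the fact that the two reverse slopes multiply to the squared correlation. The resulting simple-regression slope is therefore implied by the reported output, not quoted from it.

```python
# Figures copied from the Stata outputs in this slideshow.
b_s = 2.678125                        # coefficient of S in: reg EARNINGS S EXP
b_exp = 0.5624326                     # coefficient of EXP in: reg EARNINGS S EXP
slope_s_on_exp = -0.1198454           # coefficient of EXP in: reg S EXP
r2_s_exp = 152.160205 / 3204.98333    # Model SS / Total SS in: reg S EXP

# The slope of S on EXP times the slope of EXP on S equals the squared
# correlation, so the unreported slope of EXP on S can be recovered.
slope_exp_on_s = r2_s_exp / slope_s_on_exp

# Implied slope of a simple regression of EARNINGS on S, with EXP left out.
simple_slope = b_s + b_exp * slope_exp_on_s

print(round(slope_exp_on_s, 4))
print(round(simple_slope, 3))
```

Because the coefficient of EXP is positive and the slope of EXP on S is negative, the correction term is negative, so the implied simple slope comes out below 2.678125. This is consistent with the dashed line being flatter.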

Page 11

Here is the regression of EEARN on ES.

. reg EEARN ES

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  1,   538) =  131.63
       Model |  21895.9298     1  21895.9298           Prob > F      =  0.0000
    Residual |  89496.5833   538  166.350527           R-squared     =  0.1966
-------------+------------------------------           Adj R-squared =  0.1951
       Total |  111392.513   539  206.665145           Root MSE      =  12.898

------------------------------------------------------------------------------
       EEARN |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          ES |   2.678125   .2334325    11.47   0.000     2.219574    3.136676
       _cons |   8.10e-09   .5550284     0.00   1.000    -1.090288    1.090288
------------------------------------------------------------------------------

Page 12

A mathematical proof that the technique works requires matrix algebra. We will content ourselves with verifying that the estimate of the slope coefficient is the same as in the multiple regression.

. reg EEARN ES
------------------------------------------------------------------------------
       EEARN |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          ES |   2.678125   .2334325    11.47   0.000     2.219574    3.136676
       _cons |   8.10e-09   .5550284     0.00   1.000    -1.090288    1.090288
------------------------------------------------------------------------------

From multiple regression:

. reg EARNINGS S EXP
------------------------------------------------------------------------------
    EARNINGS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |   2.678125   .2336497    11.46   0.000     2.219146    3.137105
         EXP |   .5624326   .1285136     4.38   0.000     .3099816    .8148837
       _cons |  -26.48501    4.27251    -6.20   0.000    -34.87789   -18.09213
------------------------------------------------------------------------------
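The same check can be repeated on other data. The sketch below is a minimal illustration in plain Python, not the Stata workflow of the slides: it generates synthetic data (the sample size, parameter values, and seed are all assumptions), purges both the dependent variable and one regressor of the other regressor, and compares the slope from the purged regression with the coefficient from the two-regressor regression. All function names are hypothetical.

```python
import random


def mean(v):
    return sum(v) / len(v)


def cov_sum(u, v):
    """Sum of cross-products of deviations from the means."""
    u_bar, v_bar = mean(u), mean(v)
    return sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))


def residuals(y, x):
    """Residuals from a simple regression of y on x (with intercept)."""
    slope = cov_sum(x, y) / cov_sum(x, x)
    intercept = mean(y) - slope * mean(x)
    return [yi - intercept - slope * xi for xi, yi in zip(x, y)]


def multiple_slope(y, x1, x2):
    """Coefficient of x1 in a regression of y on x1 and x2 (with intercept)."""
    s11, s22, s12 = cov_sum(x1, x1), cov_sum(x2, x2), cov_sum(x1, x2)
    s1y, s2y = cov_sum(x1, y), cov_sum(x2, y)
    return (s1y * s22 - s2y * s12) / (s11 * s22 - s12 ** 2)


# Synthetic data (assumed for illustration, not the slideshow's sample):
# schooling is built to be negatively correlated with experience.
rng = random.Random(0)
exp_years = [rng.uniform(0, 25) for _ in range(200)]
s = [16 - 0.12 * x + rng.gauss(0, 2) for x in exp_years]
earnings = [-26 + 2.7 * si + 0.56 * xi + rng.gauss(0, 10)
            for si, xi in zip(s, exp_years)]

e_earn = residuals(earnings, exp_years)   # earnings purged of experience
e_s = residuals(s, exp_years)             # schooling purged of experience

slope_purged = cov_sum(e_s, e_earn) / cov_sum(e_s, e_s)
slope_multiple = multiple_slope(earnings, s, exp_years)
slope_simple = cov_sum(s, earnings) / cov_sum(s, s)

print(slope_purged, slope_multiple, slope_simple)
```

The first two printed values should agree up to floating-point rounding, whatever data are used. The third, from the regression that ignores experience, will in general differ.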

Page 13

Finally, a small and not very important technical point. You may have noticed that the standard error and t statistic do not quite match. The reason for this is that the number of degrees of freedom is overstated by 1 in the residuals regression.


Page 14

That regression has not made allowance for the fact that we have already used up 1 degree of freedom in removing EXP from the model.
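The arithmetic behind this point can be sketched as follows. The figures are copied from the two Stata outputs; the variable names are assumptions. Since both regressions share the same residual sum of squares, rescaling by the square root of the ratio of the degrees of freedom should reconcile the two standard errors.

```python
import math

# Figures copied from the Stata outputs in this slideshow.
n = 540
se_residuals_reg = 0.2334325   # std. err. of ES in: reg EEARN ES
se_multiple_reg = 0.2336497    # std. err. of S in: reg EARNINGS S EXP
rss = 89496.5838               # residual sum of squares in: reg EARNINGS S EXP

df_residuals_reg = n - 2       # degrees of freedom assumed by the residuals regression
df_multiple_reg = n - 3        # degrees of freedom left after estimating 3 parameters

# Rescale the standard error to the correct degrees of freedom.
se_corrected = se_residuals_reg * math.sqrt(df_residuals_reg / df_multiple_reg)
print(round(se_corrected, 6), se_multiple_reg)

# The same adjustment links the two reported Root MSE figures.
root_mse_residuals_reg = math.sqrt(rss / df_residuals_reg)
root_mse_multiple_reg = math.sqrt(rss / df_multiple_reg)
print(round(root_mse_residuals_reg, 3), round(root_mse_multiple_reg, 2))
```

The corrected standard error agrees with the multiple regression figure to the precision of the reported output, which is why the discrepancy is of no practical importance.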


Page 15

Copyright Christopher Dougherty 2012.

These slideshows may be downloaded by anyone, anywhere for personal use. Subject to respect for copyright and, where appropriate, attribution, they may be used as a resource for teaching an econometrics course. There is no need to refer to the author.

The content of this slideshow comes from Section 3.2 of C. Dougherty, Introduction to Econometrics, fourth edition 2011, Oxford University Press. Additional (free) resources for both students and instructors may be downloaded from the OUP Online Resource Centre http://www.oup.com/uk/orc/bin/9780199567089/.

Individuals studying econometrics on their own who feel that they might benefit from participation in a formal course should consider the London School of Economics summer school course EC212 Introduction to Econometrics http://www2.lse.ac.uk/study/summerSchools/summerSchool/Home.aspx or the University of London International Programmes distance learning course EC2020 Elements of Econometrics www.londoninternational.ac.uk/lse.

2012.10.28