Slides Prepared by John S. Loucks, St. Edward’s University


Upload: takara

Post on 07-Jan-2016


DESCRIPTION

Slides Prepared by JOHN S. LOUCKS, St. Edward’s University. Chapter 15: Multiple Regression. Topics: Multiple Regression Model; Least Squares Method; Multiple Coefficient of Determination; Model Assumptions; Testing for Significance; Using the Estimated Regression Equation.

TRANSCRIPT

Page 1: Slides Prepared by JOHN S. LOUCKS St. Edward’s University

© 2003 South-Western/Thomson Learning™

Slides Prepared by JOHN S. LOUCKS

St. Edward’s University

Page 2: Slides Prepared by JOHN S. LOUCKS St. Edward’s University

Chapter 15: Multiple Regression

• Multiple Regression Model
• Least Squares Method
• Multiple Coefficient of Determination
• Model Assumptions
• Testing for Significance
• Using the Estimated Regression Equation for Estimation and Prediction
• Qualitative Independent Variables
• Residual Analysis

Page 3: Slides Prepared by JOHN S. LOUCKS St. Edward’s University

The Multiple Regression Model

The Multiple Regression Model

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon$

The Multiple Regression Equation

$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$

The Estimated Multiple Regression Equation

$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p$

Page 4: Slides Prepared by JOHN S. LOUCKS St. Edward’s University

The Least Squares Method

Least Squares Criterion

$\min \sum (y_i - \hat{y}_i)^2$

Computation of Coefficient Values

The formulas for the regression coefficients $b_0, b_1, b_2, \ldots, b_p$ involve the use of matrix algebra. We will rely on computer software packages to perform the calculations.

A Note on Interpretation of Coefficients

$b_i$ represents an estimate of the change in $y$ corresponding to a one-unit change in $x_i$ when all other independent variables are held constant.
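For readers who want to see the matrix algebra this slide refers to, the standard closed form is sketched below. The notation is an assumption added here and is not from the slides: $X$ is the $n \times (p+1)$ matrix whose first column is all ones and whose remaining columns hold $x_1, \ldots, x_p$, $\mathbf{y}$ is the vector of observed $y_i$, and $X^\top X$ is assumed to be invertible.

$$
\mathbf{b} = \begin{pmatrix} b_0 \\ b_1 \\ \vdots \\ b_p \end{pmatrix} = (X^\top X)^{-1} X^\top \mathbf{y},
\qquad
\hat{\mathbf{y}} = X \mathbf{b}
$$

This $\mathbf{b}$ is the vector that minimizes $\sum (y_i - \hat{y}_i)^2$, which is why software can compute all of the coefficients in one step.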

Page 5: Slides Prepared by JOHN S. LOUCKS St. Edward’s University

The Multiple Coefficient of Determination

Relationship Among SST, SSR, SSE

SST = SSR + SSE

$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$

Multiple Coefficient of Determination

$R^2 = \text{SSR}/\text{SST}$

Adjusted Multiple Coefficient of Determination

$R_a^2 = 1 - (1 - R^2)\,\dfrac{n - 1}{n - p - 1}$
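As a quick numerical check of the two formulas, the sketch below plugs in the sums of squares reported in the ANOVA output for the Programmer Salary Survey example elsewhere in this transcript (SSR = 500.3285, SST = 599.7855, n = 20, p = 2). The function names are illustrative only.

```python
def r_squared(ssr, sst):
    """Multiple coefficient of determination: R^2 = SSR / SST."""
    return ssr / sst


def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Values taken from the ANOVA output of the two-variable salary example.
ssr, sst = 500.3285, 599.7855
n, p = 20, 2

r2 = r_squared(ssr, sst)
r2_adj = adjusted_r_squared(r2, n, p)
print(f"R^2 = {r2:.4f}, adjusted R^2 = {r2_adj:.4f}")
```

Both values should agree with the "R Square" and "Adjusted R Square" rows of the Excel regression statistics to about four decimal places.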

Page 6: Slides Prepared by JOHN S. LOUCKS St. Edward’s University

Model Assumptions

Assumptions About the Error Term $\varepsilon$

• The error $\varepsilon$ is a random variable with mean of zero.

• The variance of $\varepsilon$, denoted by $\sigma^2$, is the same for all values of the independent variables.

• The values of $\varepsilon$ are independent.

• The error $\varepsilon$ is a normally distributed random variable reflecting the deviation between the $y$ value and the expected value of $y$ given by $\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$.

Page 7: Slides Prepared by JOHN S. LOUCKS St. Edward’s University

Testing for Significance: F Test

Hypotheses

$H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0$

$H_a$: One or more of the parameters is not equal to zero.

Test Statistic

$F = \text{MSR}/\text{MSE}$

Page 8: Slides Prepared by JOHN S. LOUCKS St. Edward’s University

Testing for Significance: F Test

Rejection Rule

Using test statistic: Reject $H_0$ if $F > F_\alpha$

Using p-value: Reject $H_0$ if p-value < $\alpha$

where $F_\alpha$ is based on an F distribution with $p$ d.f. in the numerator and $n - p - 1$ d.f. in the denominator.

Page 9: Slides Prepared by JOHN S. LOUCKS St. Edward’s University

Testing for Significance: F Test

ANOVA Table (assuming p independent variables)

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Squares            F
Regression            SSR              p                    MSR = SSR/p             F = MSR/MSE
Error                 SSE              n - p - 1            MSE = SSE/(n - p - 1)
Total                 SST              n - 1

Page 10: Slides Prepared by JOHN S. LOUCKS St. Edward’s University

Testing for Significance: t Test

Hypotheses

$H_0: \beta_i = 0$

$H_a: \beta_i \neq 0$

Test Statistic

$t = \dfrac{b_i}{s_{b_i}}$

Page 11: Slides Prepared by JOHN S. LOUCKS St. Edward’s University

Testing for Significance: t Test

Rejection Rule

Using test statistic: Reject $H_0$ if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$

Using p-value: Reject $H_0$ if p-value < $\alpha$

where $t_{\alpha/2}$ is based on a t distribution with $n - p - 1$ degrees of freedom.

Page 12: Slides Prepared by JOHN S. LOUCKS St. Edward’s University

Testing for Significance: Multicollinearity

The term multicollinearity refers to the correlation among the independent variables.

When the independent variables are highly correlated (say, |r| > .7), it is not possible to determine the separate effect of any particular independent variable on the dependent variable.

If the estimated regression equation is to be used only for predictive purposes, multicollinearity is usually not a serious problem.

Every attempt should be made to avoid including independent variables that are highly correlated.
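The |r| > .7 rule of thumb can be checked directly. The sketch below computes the sample correlation between the two independent variables of the Programmer Salary Survey example that appears elsewhere in this transcript (years of experience and aptitude test score). The helper name `pearson_r` is illustrative, and .7 is the slide's own rough cutoff rather than a formal test.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Independent variables from the 20-programmer sample.
experience = [4, 7, 1, 5, 8, 10, 0, 1, 6, 6, 9, 2, 10, 5, 6, 8, 4, 6, 3, 3]
score = [78, 100, 86, 82, 86, 84, 75, 80, 83, 91,
         88, 73, 75, 81, 74, 87, 79, 94, 70, 89]

r = pearson_r(experience, score)
highly_correlated = abs(r) > 0.7  # rule-of-thumb threshold from the slide
print(f"r = {r:.3f}, highly correlated: {highly_correlated}")
```

For this sample the correlation is well below the threshold, so by the slide's guideline multicollinearity is not a concern for these two variables.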

Page 13: Slides Prepared by JOHN S. LOUCKS St. Edward’s University

Using the Estimated Regression Equation for Estimation and Prediction

The procedures for estimating the mean value of $y$ and predicting an individual value of $y$ in multiple regression are similar to those in simple regression.

We substitute the given values of $x_1, x_2, \ldots, x_p$ into the estimated regression equation and use the corresponding value of $\hat{y}$ as the point estimate.

The formulas required to develop interval estimates for the mean value of $y$ and for an individual value of $y$ are beyond the scope of the text.

Software packages for multiple regression will often provide these interval estimates.

Page 14: Slides Prepared by JOHN S. LOUCKS St. Edward’s University

Example: Programmer Salary Survey

A software firm collected data for a sample of 20 computer programmers. A suggestion was made that regression analysis could be used to determine if salary was related to the years of experience and the score on the firm’s programmer aptitude test.

The years of experience, score on the aptitude test, and corresponding annual salary ($1000s) for a sample of 20 programmers are shown on the next slide.

Page 15: Slides Prepared by JOHN S. LOUCKS St. Edward’s University

Example: Programmer Salary Survey

Exper.   Score   Salary      Exper.   Score   Salary
4        78      24          9        88      38
7        100     43          2        73      26.6
1        86      23.7        10       75      36.2
5        82      34.3        5        81      31.6
8        86      35.8        6        74      29
10       84      38          8        87      34
0        75      22.2        4        79      30.1
1        80      23.1        6        94      33.9
6        83      30          3        70      28.2
6        91      33          3        89      30

Page 16: Slides Prepared by JOHN S. LOUCKS St. Edward’s University

Example: Programmer Salary Survey

Multiple Regression Model

Suppose we believe that salary ($y$) is related to the years of experience ($x_1$) and the score on the programmer aptitude test ($x_2$) by the following regression model:

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$

where

$y$ = annual salary ($000)

$x_1$ = years of experience

$x_2$ = score on programmer aptitude test

Page 17: Slides Prepared by JOHN S. LOUCKS St. Edward’s University

Example: Programmer Salary Survey

Multiple Regression Equation

Using the assumption $E(\varepsilon) = 0$, we obtain

$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$

Estimated Regression Equation

$b_0, b_1, b_2$ are the least squares estimates of $\beta_0, \beta_1, \beta_2$.

Thus

$\hat{y} = b_0 + b_1 x_1 + b_2 x_2$

Page 18: Slides Prepared by JOHN S. LOUCKS St. Edward’s University

Example: Programmer Salary Survey

Solving for the Estimates of $\beta_0, \beta_1, \beta_2$

[Diagram: Input Data (columns $x_1$, $x_2$, $y$; rows 4 78 24, 7 100 43, . . . , 3 89 30) → Computer Package for Solving Multiple Regression Problems → Least Squares Output ($b_0$ =, $b_1$ =, $b_2$ =, $R^2$ =, etc.)]

Page 19: Slides Prepared by JOHN S. LOUCKS St. Edward’s University

Using Excel’s Regression Tool to Develop the Estimated Multiple Regression Equation

Formula Worksheet (showing data entered)

     A            B                  C            D
1    Programmer   Experience (yrs)   Test Score   Salary ($K)
2    1            4                  78           24.0
3    2            7                  100          43.0
4    3            1                  86           23.7
5    4            5                  82           34.3
6    5            8                  86           35.8
7    6            10                 84           38.0
8    7            0                  75           22.2
9    8            1                  80           23.1

Note: Rows 10-21 are not shown.

Page 20: Slides Prepared by JOHN S. LOUCKS St. Edward’s University

Using Excel’s Regression Tool to Develop the Estimated Multiple Regression Equation

Performing the Multiple Regression Analysis

Step 1  Select the Tools pull-down menu

Step 2  Choose the Data Analysis option

Step 3  Choose Regression from the list of Analysis Tools

… continued

Page 21: Slides Prepared by JOHN S. LOUCKS St. Edward’s University

Using Excel’s Regression Tool to Develop the Estimated Multiple Regression Equation

Performing the Multiple Regression Analysis

Step 4  When the Regression dialog box appears:

  Enter D1:D21 in the Input Y Range box

  Enter B1:C21 in the Input X Range box

  Select Labels

  Select Confidence Level

  Enter 95 in the Confidence Level box

  Select Output Range and enter A24 in the Output Range box

  Click OK

Page 22: Slides Prepared by JOHN S. LOUCKS St. Edward’s University

Using Excel’s Regression Tool to Develop the Estimated Multiple Regression Equation

Value Worksheet (Regression Statistics)

      A                    B
24    SUMMARY OUTPUT
26    Regression Statistics
27    Multiple R           0.913334059
28    R Square             0.834179103
29    Adjusted R Square    0.814670762
30    Standard Error       2.418762076
31    Observations         20

Page 23: Slides Prepared by JOHN S. LOUCKS St. Edward’s University

Using Excel’s Regression Tool to Develop the Estimated Multiple Regression Equation

Value Worksheet (ANOVA Output)

      A            B     C          D          E          F
33    ANOVA
34                 df    SS         MS         F          Significance F
35    Regression   2     500.3285   250.1643   42.76013   2.32774E-07
36    Residual     17    99.45697   5.85041
37    Total        19    599.7855

The Significance F value in cell F35 is the p-value used to test for overall significance.

Page 24: Slides Prepared by JOHN S. LOUCKS St. Edward’s University

Using Excel’s Regression Tool to Develop the Estimated Multiple Regression Equation

Value Worksheet (Regression Equation Output)

      A            B          C           D        E
39                 Coeffic.   Std. Err.   t Stat   P-value
40    Intercept    3.17394    6.15607     0.5156   0.61279
41    Experience   1.4039     0.19857     7.0702   1.9E-06
42    Test Score   0.25089    0.07735     3.2433   0.00478

Note: Columns F-I are not shown.

The P-value in cell E41 is used to test for the individual significance of Experience.

Page 25: Slides Prepared by JOHN S. LOUCKS St. Edward’s University

Using Excel’s Regression Tool to Develop the Estimated Multiple Regression Equation

Value Worksheet (Regression Equation Output)

      A            B          C           D        E
39                 Coeffic.   Std. Err.   t Stat   P-value
40    Intercept    3.17394    6.15607     0.5156   0.61279
41    Experience   1.4039     0.19857     7.0702   1.9E-06
42    Test Score   0.25089    0.07735     3.2433   0.00478

Note: Columns F-I are not shown.

The P-value in cell E42 is used to test for the individual significance of Test Score.

Page 26: Slides Prepared by JOHN S. LOUCKS St. Edward’s University

Using Excel’s Regression Tool to Develop the Estimated Multiple Regression Equation

Estimated Regression Equation

SALARY = 3.174 + 1.404(EXPER) + 0.2509(SCORE)

Note: Predicted salary will be in thousands of dollars.
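The same estimates can be reproduced outside Excel. The following is a minimal sketch, not the method used in the slides: it builds the normal equations $(X^\top X)\,\mathbf{b} = X^\top \mathbf{y}$ from the 20 observations listed earlier and solves them with plain Gaussian elimination, using only the Python standard library. The function names `solve` and `fit_least_squares` are illustrative.

```python
# Programmer salary data from the transcript:
# (years of experience, aptitude test score, salary in $1000s).
DATA = [
    (4, 78, 24.0), (7, 100, 43.0), (1, 86, 23.7), (5, 82, 34.3), (8, 86, 35.8),
    (10, 84, 38.0), (0, 75, 22.2), (1, 80, 23.1), (6, 83, 30.0), (6, 91, 33.0),
    (9, 88, 38.0), (2, 73, 26.6), (10, 75, 36.2), (5, 81, 31.6), (6, 74, 29.0),
    (8, 87, 34.0), (4, 79, 30.1), (6, 94, 33.9), (3, 70, 28.2), (3, 89, 30.0),
]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_least_squares(rows):
    """Return [b0, b1, ..., bp]; the last entry of each row is y."""
    X = [[1.0] + list(r[:-1]) for r in rows]  # leading 1 gives the intercept
    y = [r[-1] for r in rows]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


b0, b1, b2 = fit_least_squares(DATA)
print(f"SALARY = {b0:.3f} + {b1:.3f}(EXPER) + {b2:.4f}(SCORE)")
```

The printed coefficients should match the Excel output up to rounding. For larger or badly conditioned problems a numerical library routine would be a safer choice than hand-rolled elimination.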

Page 27: Slides Prepared by JOHN S. LOUCKS St. Edward’s University

Using Excel’s Regression Tool to Develop the Estimated Multiple Regression Equation

Value Worksheet (Regression Equation Output)

      A            B          F           G         H            I
39                 Coeffic.   Low. 95%    Up. 95%   Low. 95.0%   Up. 95.0%
40    Intercept    3.17394    -9.814248   16.1621   -9.814248    16.1621
41    Experience   1.4039     0.984962    1.82284   0.984962     1.82284
42    Test Score   0.25089    0.087682    0.41409   0.087682     0.41409

Note: Columns C-E are hidden.

Page 28: Slides Prepared by JOHN S. LOUCKS St. Edward’s University

Example: Programmer Salary Survey

F Test

• Hypotheses

$H_0: \beta_1 = \beta_2 = 0$

$H_a$: One or both of the parameters is not equal to zero.

• Rejection Rule

For $\alpha$ = .05 and d.f. = 2, 17: $F_{.05}$ = 3.59

Reject $H_0$ if $F$ > 3.59.

• Test Statistic

$F$ = MSR/MSE = 250.16/5.85 = 42.76

• Conclusion

We can reject $H_0$.
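The arithmetic behind this F test can be traced step by step. The sketch below uses the sums of squares from the ANOVA output and the tabled critical value 3.59 quoted on the slide; the variable names are illustrative, and the critical value is hard-coded rather than computed from an F distribution.

```python
# Sums of squares and sample sizes from the ANOVA output of the salary example.
ssr, sse = 500.3285, 99.45697
n, p = 20, 2

msr = ssr / p            # mean square due to regression, p d.f.
mse = sse / (n - p - 1)  # mean square error, n - p - 1 d.f.
f_stat = msr / mse

F_CRIT = 3.59  # F_.05 with (2, 17) d.f., as quoted on the slide
reject_h0 = f_stat > F_CRIT

print(f"MSR = {msr:.2f}, MSE = {mse:.2f}, F = {f_stat:.2f}, reject H0: {reject_h0}")
```

Because F is far above the critical value, the conclusion matches the slide: the overall relationship is significant.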

Page 29: Slides Prepared by JOHN S. LOUCKS St. Edward’s University

Example: Programmer Salary Survey

t Test for Significance of Individual Parameters

• Hypotheses

$H_0: \beta_i = 0$

$H_a: \beta_i \neq 0$

• Rejection Rule

For $\alpha$ = .05 and d.f. = 17, $t_{.025}$ = 2.11

Reject $H_0$ if $|t|$ > 2.11

• Test Statistics

$\dfrac{b_1}{s_{b_1}} = \dfrac{1.4039}{.1986} = 7.07$

$\dfrac{b_2}{s_{b_2}} = \dfrac{.25089}{.07735} = 3.24$

• Conclusions

Reject $H_0: \beta_1 = 0$

Reject $H_0: \beta_2 = 0$
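The two t statistics can be recomputed from the coefficient and standard-error columns of the Excel output. This is a small sketch with illustrative names; the critical value 2.11 is the tabled $t_{.025}$ for 17 degrees of freedom quoted on the slide, and the two-tailed comparison uses the absolute value of t.

```python
T_CRIT = 2.11  # t_.025 with 17 d.f., as quoted on the slide

# (coefficient b_i, standard error s_{b_i}) from the regression equation output.
estimates = {
    "Experience": (1.4039, 0.19857),
    "Test Score": (0.25089, 0.07735),
}

t_stats = {name: b / s for name, (b, s) in estimates.items()}
rejected = {name: abs(t) > T_CRIT for name, t in t_stats.items()}

for name, t in t_stats.items():
    print(f"{name}: t = {t:.2f}, reject H0: {rejected[name]}")
```

Both statistics exceed the critical value, so each parameter is individually significant at the .05 level.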

Page 30: Slides Prepared by JOHN S. LOUCKS St. Edward’s University

Qualitative Independent Variables

In many situations we must work with qualitative independent variables such as gender (male, female), method of payment (cash, check, credit card), etc.

For example, $x_2$ might represent gender where $x_2$ = 0 indicates male and $x_2$ = 1 indicates female.

In this case, $x_2$ is called a dummy or indicator variable.

If a qualitative variable has $k$ levels, $k - 1$ dummy variables are required, with each dummy variable being coded as 0 or 1.

For example, a variable with levels A, B, and C would be represented by $x_1$ and $x_2$ values of (0, 0), (1, 0), and (0, 1), respectively.
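The k - 1 coding rule can be written as a small helper. In this sketch the first level in the list is treated as the baseline and gets all zeros; the function name `dummy_code` and the choice of the first level as baseline are assumptions made for illustration.

```python
def dummy_code(level, levels):
    """Return the k - 1 dummy values for `level`; levels[0] is the baseline."""
    if level not in levels:
        raise ValueError(f"unknown level: {level!r}")
    # One 0/1 indicator per non-baseline level.
    return tuple(1 if level == other else 0 for other in levels[1:])


levels = ["A", "B", "C"]
codes = {level: dummy_code(level, levels) for level in levels}
print(codes)
```

With three levels the helper returns two values per level, reproducing the (0, 0), (1, 0), (0, 1) coding described on the slide.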

Page 31: Slides Prepared by JOHN S. LOUCKS St. Edward’s University

Example: Programmer Salary Survey (B)

As an extension of the problem involving the computer programmer salary survey, suppose that management also believes that the annual salary is related to whether or not the individual has a graduate degree in computer science or information systems.

The years of experience, the score on the programmer aptitude test, whether or not the individual has a relevant graduate degree, and the annual salary ($000) for each of the sampled 20 programmers are shown on the next slide.

Page 32: Slides Prepared by JOHN S. LOUCKS St. Edward’s University

Example: Programmer Salary Survey (B)

Exp.   Score   Degr.   Salary      Exp.   Score   Degr.   Salary
4      78      No      24          9      88      Yes     38
7      100     Yes     43          2      73      No      26.6
1      86      No      23.7        10     75      Yes     36.2
5      82      Yes     34.3        5      81      No      31.6
8      86      Yes     35.8        6      74      No      29
10     84      Yes     38          8      87      Yes     34
0      75      No      22.2        4      79      No      30.1
1      80      No      23.1        6      94      Yes     33.9
6      83      No      30          3      70      No      28.2
6      91      Yes     33          3      89      No      30

Page 33: Slides Prepared by JOHN S. LOUCKS St. Edward’s University

Example: Programmer Salary Survey (B)

Multiple Regression Equation

$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$

Estimated Regression Equation

$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3$

where

$y$ = annual salary ($000)

$x_1$ = years of experience

$x_2$ = score on programmer aptitude test

$x_3$ = 0 if individual does not have a grad. degree
      1 if individual does have a grad. degree

Note: $x_3$ is referred to as a dummy variable.

Page 34: Slides Prepared by JOHN S. LOUCKS St. Edward’s University

Using Excel’s Regression Tool to Develop the Estimated Multiple Regression Equation

Formula Worksheet (showing data)

     A            B                    C            D              E
1    Programmer   Experience (years)   Test Score   Grad. Degree   Salary ($000)
2    1            4                    78           0              24.0
3    2            7                    100          1              43.0
4    3            1                    86           0              23.7
5    4            5                    82           1              34.3
6    5            8                    86           1              35.8
7    6            10                   84           1              38.0
8    7            0                    75           0              22.2

Note: Rows 9-21 are not shown.

Page 35: Slides Prepared by JOHN S. LOUCKS St. Edward’s University

Using Excel’s Regression Tool to Develop the Estimated Multiple Regression Equation

Value Worksheet (Regression Statistics)

      A                    B
24    SUMMARY OUTPUT
26    Regression Statistics
27    Multiple R           0.920215239
28    R Square             0.846796085
29    Adjusted R Square    0.818070351
30    Standard Error       2.396475101
31    Observations         20

Page 36: Slides Prepared by JOHN S. LOUCKS St. Edward’s University

Using Excel’s Regression Tool to Develop the Estimated Multiple Regression Equation

Value Worksheet (ANOVA Output)

      A            B     C          D          E          F
33    ANOVA
34                 df    SS         MS         F          Significance F
35    Regression   3     507.896    169.2987   29.47866   9.41675E-07
36    Residual     16    91.88949   5.743093
37    Total        19    599.7855

Page 37: Slides Prepared by JOHN S. LOUCKS St. Edward’s University

Using Excel’s Regression Tool to Develop the Estimated Multiple Regression Equation

Value Worksheet (Regression Equation Output)

      A             B          C           D        E
39                  Coeffic.   Std. Err.   t Stat   P-value
40    Intercept     7.94485    7.3808      1.0764   0.2977
41    Experience    1.14758    0.2976      3.8561   0.0014
42    Test Score    0.19694    0.0899      2.1905   0.04364
43    Grad. Degr.   2.28042    1.98661     1.1479   0.26789

Note: Columns F-I are not shown.

Page 38: Slides Prepared by JOHN S. LOUCKS St. Edward’s University

Using Excel’s Regression Tool to Develop the Estimated Multiple Regression Equation

Value Worksheet (Regression Equation Output)

      A             B          F           G         H            I
39                  Coeffic.   Low. 95%    Up. 95%   Low. 95.0%   Up. 95.0%
40    Intercept     7.94485    -7.701739   23.5914   -7.7017385   23.591436
41    Experience    1.14758    0.516695    1.77847   0.51669483   1.7784686
42    Test Score    0.19694    0.00635     0.38752   0.00634964   0.3875243
43    Grad. Degr.   2.28042    -1.931002   6.49185   -1.9310017   6.4918494

Note: Columns C-E are hidden.

Page 39: Slides Prepared by JOHN S. LOUCKS St. Edward’s University

Example: Programmer Salary Survey (B)

Interpreting the Parameters

• $b_1$ = 1.15

Salary is expected to increase by $1,150 for each additional year of experience (when all other independent variables are held constant).

Page 40: Slides Prepared by JOHN S. LOUCKS St. Edward’s University

Example: Programmer Salary Survey (B)

Interpreting the Parameters

• $b_2$ = 0.197

Salary is expected to increase by $197 for each additional point scored on the programmer aptitude test (when all other independent variables are held constant).

Page 41: Slides Prepared by JOHN S. LOUCKS St. Edward’s University

Example: Programmer Salary Survey (B)

Interpreting the Parameters

• $b_3$ = 2.28

Salary is expected to be $2,280 higher for an individual with a graduate degree than one without a graduate degree (when all other independent variables are held constant).
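The interpretation of $b_3$ can be seen by predicting salary for two otherwise identical programmers. The sketch below plugs the coefficients from the three-variable Excel output into the estimated regression equation; `predict_salary` is an illustrative name, and the inputs (4 years, score 78) are those of the first programmer in the sample.

```python
# Coefficients from the regression equation output for the model with the dummy variable.
B0, B1, B2, B3 = 7.94485, 1.14758, 0.19694, 2.28042


def predict_salary(experience, score, has_grad_degree):
    """Point estimate of annual salary in $1000s."""
    x3 = 1 if has_grad_degree else 0  # dummy variable for the graduate degree
    return B0 + B1 * experience + B2 * score + B3 * x3


without_degree = predict_salary(4, 78, False)
with_degree = predict_salary(4, 78, True)
print(f"without degree: {without_degree:.3f}")
print(f"with degree:    {with_degree:.3f}")
print(f"difference:     {with_degree - without_degree:.3f}")
```

Holding experience and test score fixed, the two predictions differ by exactly $b_3$, that is, about $2,280. Note that the P-value for this coefficient in the Excel output is 0.26789, so the estimated difference is not statistically significant at the .05 level.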

Page 42: Slides Prepared by JOHN S. LOUCKS St. Edward’s University

Residual Analysis

For simple linear regression the residual plot against $\hat{y}$ and the residual plot against $x$ provide the same information.

In multiple regression analysis it is preferable to use the residual plot against $\hat{y}$ to determine if the model assumptions are satisfied.

Page 43: Slides Prepared by JOHN S. LOUCKS St. Edward’s University

Residual Analysis

Standardized residuals are frequently used in residual plots for purposes of:

• Identifying outliers (typically, standardized residuals < -2 or > +2)

• Providing insight about the assumption that the error term $\varepsilon$ has a normal distribution

The computation of the standardized residuals in multiple regression analysis is too complex to be done by hand.

Excel’s Regression tool can be used.

Page 44: Slides Prepared by JOHN S. LOUCKS St. Edward’s University

Using Excel to Construct a Standardized Residual Plot

Value Worksheet (Residual Output)

      A             B             C             D
29    RESIDUAL OUTPUT
31    Observation   Predicted Y   Residuals     Standard Residuals
32    1             27.89626052   -3.89626052   -1.771706896
33    2             37.95204323   5.047956775   2.295406016
34    3             26.02901122   -2.32901122   -1.059047572
35    4             32.11201403   2.187985973   0.994920596
36    5             36.34250715   -0.54250715   -0.246688757

Note: Rows 37-51 are not shown.
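The outlier rule of thumb from the Residual Analysis slides (standardized residuals below -2 or above +2) can be applied to the rows shown in this output. The sketch below covers only the five observations listed here, since rows 37-51 are not shown; the variable names are illustrative.

```python
# Standard residuals for observations 1-5, copied from the residual output.
standard_residuals = {
    1: -1.771706896,
    2: 2.295406016,
    3: -1.059047572,
    4: 0.994920596,
    5: -0.246688757,
}

# Flag observations whose standardized residual is below -2 or above +2.
outliers = [obs for obs, z in standard_residuals.items() if abs(z) > 2]
print("Possible outliers:", outliers)
```

Among the listed rows, only observation 2 is flagged. A flagged point is a candidate for closer inspection, not automatically an error.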

Page 45: Slides Prepared by JOHN S. LOUCKS St. Edward’s University

Using Excel to Construct a Standardized Residual Plot

[Figure: Standardized Residual Plot. Horizontal axis: Predicted Salary (0 to 50). Vertical axis: Standard Residuals (-2 to 3). One point is labeled "Outlier".]

Page 46: Slides Prepared by JOHN S. LOUCKS St. Edward’s University

End of Chapter 15