Slide 1
© 2003 South-Western/Thomson Learning™
Slides Prepared by JOHN S. LOUCKS
St. Edward’s University
Slide 2
Chapter 15
Multiple Regression
• Multiple Regression Model
• Least Squares Method
• Multiple Coefficient of Determination
• Model Assumptions
• Testing for Significance
• Using the Estimated Regression Equation for Estimation and Prediction
• Qualitative Independent Variables
• Residual Analysis
Slide 3
The Multiple Regression Model
The Multiple Regression Model
  $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon$
The Multiple Regression Equation
  $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$
The Estimated Multiple Regression Equation
  $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p$
Slide 4
The Least Squares Method
Least Squares Criterion
  $\min \sum (y_i - \hat{y}_i)^2$
Computation of Coefficient Values
  The formulas for the regression coefficients $b_0, b_1, b_2, \ldots, b_p$ involve the use of matrix algebra. We will rely on computer software packages to perform the calculations.
A Note on Interpretation of Coefficients
  $b_i$ represents an estimate of the change in $y$ corresponding to a one-unit change in $x_i$ when all other independent variables are held constant.
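The slide says the coefficient formulas involve matrix algebra but does not state them. As a sketch in standard matrix notation (the bold symbols $\mathbf{X}$, $\mathbf{y}$, and $\mathbf{b}$ are assumptions introduced here, not notation from the slides), and assuming $\mathbf{X}^{\top}\mathbf{X}$ is invertible, the least squares estimates can be written as:

```latex
\mathbf{b} = (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{y},
\qquad
\mathbf{X} =
\begin{bmatrix}
1 & x_{11} & \cdots & x_{1p} \\
\vdots & \vdots & & \vdots \\
1 & x_{n1} & \cdots & x_{np}
\end{bmatrix},
\qquad
\mathbf{b} = (b_0, b_1, \ldots, b_p)^{\top}
```

The leading column of ones in $\mathbf{X}$ corresponds to the intercept $b_0$.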
Slide 5
The Multiple Coefficient of Determination
Relationship Among SST, SSR, SSE
  SST = SSR + SSE
  $\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$
Multiple Coefficient of Determination
  $R^2 = \text{SSR}/\text{SST}$
Adjusted Multiple Coefficient of Determination
  $R_a^2 = 1 - (1 - R^2)\,\dfrac{n - 1}{n - p - 1}$
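The two formulas can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides: the function names are hypothetical, and the SSR, SST, n, and p values are taken from the Excel ANOVA output for the programmer salary example shown later in the deck.

```python
def r_squared(ssr, sst):
    """Multiple coefficient of determination: R^2 = SSR / SST."""
    return ssr / sst


def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Values from the programmer salary ANOVA output (n = 20, p = 2).
ssr, sst = 500.3285, 599.7855
n, p = 20, 2

r2 = r_squared(ssr, sst)
r2_adj = adjusted_r_squared(r2, n, p)
print(round(r2, 4), round(r2_adj, 4))
```

The results should agree, up to rounding, with the R Square and Adjusted R Square rows of the Excel regression statistics.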
Slide 6
Model Assumptions
Assumptions About the Error Term $\varepsilon$
• The error $\varepsilon$ is a random variable with mean of zero.
• The variance of $\varepsilon$, denoted by $\sigma^2$, is the same for all values of the independent variables.
• The values of $\varepsilon$ are independent.
• The error $\varepsilon$ is a normally distributed random variable reflecting the deviation between the $y$ value and the expected value of $y$ given by $\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$.
Slide 7
Testing for Significance: F Test
Hypotheses
  $H_0$: $\beta_1 = \beta_2 = \cdots = \beta_p = 0$
  $H_a$: One or more of the parameters is not equal to zero.
Test Statistic
  $F = \text{MSR}/\text{MSE}$
Slide 8
Testing for Significance: F Test
Rejection Rule
  Using test statistic: Reject $H_0$ if $F > F_\alpha$
  Using p-value: Reject $H_0$ if p-value < $\alpha$
where $F_\alpha$ is based on an F distribution with $p$ d.f. in the numerator and $n - p - 1$ d.f. in the denominator
Slide 9
Testing for Significance: F Test
ANOVA Table (assuming p independent variables)
Source of     Sum of     Degrees of    Mean
Variation     Squares    Freedom       Squares                  F
Regression    SSR        p             MSR = SSR/p              F = MSR/MSE
Error         SSE        n - p - 1     MSE = SSE/(n - p - 1)
Total         SST        n - 1
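The ANOVA table entries translate directly into code. The following sketch uses a hypothetical helper name (`anova_f`) and plugs in the SSR, SSE, n, and p values from the programmer salary example later in the deck; the critical value 3.59 is the $F_{.05}$ value quoted in that example rather than something computed here.

```python
def anova_f(ssr, sse, n, p):
    """Return (MSR, MSE, F) for a regression with p independent variables."""
    msr = ssr / p              # mean square due to regression
    mse = sse / (n - p - 1)    # mean square due to error
    return msr, mse, msr / mse


# Programmer salary example: SSR and SSE from the Excel ANOVA output.
msr, mse, f_stat = anova_f(ssr=500.3285, sse=99.45697, n=20, p=2)

F_CRIT = 3.59  # F_.05 with 2 and 17 d.f., as quoted in the example slides
reject_h0 = f_stat > F_CRIT

print(round(msr, 2), round(mse, 2), round(f_stat, 2), reject_h0)
```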
Slide 10
Testing for Significance: t Test
Hypotheses
  $H_0$: $\beta_i = 0$
  $H_a$: $\beta_i \neq 0$
Test Statistic
  $t = \dfrac{b_i}{s_{b_i}}$
Slide 11
Testing for Significance: t Test
Rejection Rule
  Using test statistic: Reject $H_0$ if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$
  Using p-value: Reject $H_0$ if p-value < $\alpha$
where $t_{\alpha/2}$ is based on a t distribution with $n - p - 1$ degrees of freedom
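As a small illustration of the two-sided rule, the sketch below computes $t = b_i / s_{b_i}$ and compares $|t|$ with the critical value. The function names are hypothetical; the coefficients and standard errors come from the Excel output for the programmer salary example, and 2.11 is the $t_{.025}$ value (17 d.f.) quoted in that example.

```python
def t_statistic(b_i, s_bi):
    """t = b_i / s_{b_i}."""
    return b_i / s_bi


def reject_two_sided(t, t_crit):
    """Reject H0 if t < -t_crit or t > t_crit."""
    return abs(t) > t_crit


T_CRIT = 2.11  # t_.025 with 17 d.f., as quoted in the example slides

# (coefficient, standard error) pairs from the Excel regression output.
estimates = {
    "Intercept": (3.17394, 6.15607),
    "Experience": (1.4039, 0.19857),
    "Test Score": (0.25089, 0.07735),
}

results = {}
for name, (b, s) in estimates.items():
    t = t_statistic(b, s)
    results[name] = (t, reject_two_sided(t, T_CRIT))
    print(f"{name}: t = {t:.2f}, reject H0: {results[name][1]}")
```

Small differences from the t Stat column in the Excel output are expected because the inputs here are rounded.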
Slide 12
Testing for Significance: Multicollinearity
• The term multicollinearity refers to the correlation among the independent variables.
• When the independent variables are highly correlated (say, |r| > .7), it is not possible to determine the separate effect of any particular independent variable on the dependent variable.
• If the estimated regression equation is to be used only for predictive purposes, multicollinearity is usually not a serious problem.
• Every attempt should be made to avoid including independent variables that are highly correlated.
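A quick way to apply the |r| > .7 rule of thumb is to compute the sample correlation between each pair of independent variables. The sketch below does this in plain Python for the two predictors of the programmer salary example (data transcribed from the example slides); `pearson_r` is a hypothetical helper name, not something the slides define.

```python
from math import sqrt


def pearson_r(a, b):
    """Sample correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    s_aa = sum((x - mean_a) ** 2 for x in a)
    s_bb = sum((y - mean_b) ** 2 for y in b)
    return s_ab / sqrt(s_aa * s_bb)


# Years of experience and aptitude test scores for the 20 programmers.
exper = [4, 7, 1, 5, 8, 10, 0, 1, 6, 6, 9, 2, 10, 5, 6, 8, 4, 6, 3, 3]
score = [78, 100, 86, 82, 86, 84, 75, 80, 83, 91,
         88, 73, 75, 81, 74, 87, 79, 94, 70, 89]

r = pearson_r(exper, score)
highly_correlated = abs(r) > 0.7  # rule of thumb from the slide
print(round(r, 2), highly_correlated)
```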
Slide 13
Using the Estimated Regression Equation for Estimation and Prediction
• The procedures for estimating the mean value of $y$ and predicting an individual value of $y$ in multiple regression are similar to those in simple regression.
• We substitute the given values of $x_1, x_2, \ldots, x_p$ into the estimated regression equation and use the corresponding value of $\hat{y}$ as the point estimate.
• The formulas required to develop interval estimates for the mean value of $y$ and for an individual value of $y$ are beyond the scope of the text.
• Software packages for multiple regression will often provide these interval estimates.
Slide 14
Example: Programmer Salary Survey
A software firm collected data for a sample of 20 computer programmers. A suggestion was made that regression analysis could be used to determine if salary was related to the years of experience and the score on the firm’s programmer aptitude test.
The years of experience, score on the aptitude test, and corresponding annual salary ($1000s) for a sample of 20 programmers are shown on the next slide.
Slide 15
Example: Programmer Salary Survey
Exper.   Score   Salary      Exper.   Score   Salary
4        78      24          9        88      38
7        100     43          2        73      26.6
1        86      23.7        10       75      36.2
5        82      34.3        5        81      31.6
8        86      35.8        6        74      29
10       84      38          8        87      34
0        75      22.2        4        79      30.1
1        80      23.1        6        94      33.9
6        83      30          3        70      28.2
6        91      33          3        89      30
Slide 16
Example: Programmer Salary Survey
Multiple Regression Model
Suppose we believe that salary ($y$) is related to the years of experience ($x_1$) and the score on the programmer aptitude test ($x_2$) by the following regression model:
  $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$
where
  $y$ = annual salary ($000)
  $x_1$ = years of experience
  $x_2$ = score on programmer aptitude test
Slide 17
Example: Programmer Salary Survey
Multiple Regression Equation
Using the assumption $E(\varepsilon) = 0$, we obtain
  $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
Estimated Regression Equation
$b_0$, $b_1$, $b_2$ are the least squares estimates of $\beta_0$, $\beta_1$, $\beta_2$.
Thus
  $\hat{y} = b_0 + b_1 x_1 + b_2 x_2$
Slide 18
Example: Programmer Salary Survey
Solving for the Estimates of $\beta_0$, $\beta_1$, $\beta_2$
[Diagram: Input Data (columns $x_1$, $x_2$, $y$; rows 4 78 24, 7 100 43, . . . , 3 89 30) → Computer Package for Solving Multiple Regression Problems → Least Squares Output ($b_0$ =, $b_1$ =, $b_2$ =, $R^2$ =, etc.)]
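The "computer package" step can be sketched without any special software. The code below is an illustrative plain-Python implementation that builds the normal equations and solves them by Gauss-Jordan elimination; the function name `fit_least_squares` is hypothetical, and this is not the method the slides' software necessarily uses. The data are the 20 observations from the programmer salary example.

```python
def fit_least_squares(rows, y):
    """Least squares coefficients [b0, b1, ..., bp] via the normal equations.

    rows: one tuple of independent-variable values per observation
    y:    the dependent-variable values
    """
    X = [[1.0] + list(r) for r in rows]  # leading 1 for the intercept
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(xi[r] * xi[c] for xi in X) for c in range(k)]
        + [sum(xi[r] * yi for xi, yi in zip(X, y))]
        for r in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# (years of experience, aptitude test score, annual salary in $1000s)
DATA = [
    (4, 78, 24.0), (7, 100, 43.0), (1, 86, 23.7), (5, 82, 34.3),
    (8, 86, 35.8), (10, 84, 38.0), (0, 75, 22.2), (1, 80, 23.1),
    (6, 83, 30.0), (6, 91, 33.0), (9, 88, 38.0), (2, 73, 26.6),
    (10, 75, 36.2), (5, 81, 31.6), (6, 74, 29.0), (8, 87, 34.0),
    (4, 79, 30.1), (6, 94, 33.9), (3, 70, 28.2), (3, 89, 30.0),
]

b0, b1, b2 = fit_least_squares([(e, s) for e, s, _ in DATA],
                               [sal for _, _, sal in DATA])
print(round(b0, 3), round(b1, 3), round(b2, 4))
```

The printed coefficients should agree with the Excel output shown later in the deck (roughly 3.174, 1.404, and 0.2509). For real work a tested numerical library is a better choice than hand-rolled elimination.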
Slide 19
Using Excel’s Regression Tool to Develop the Estimated Multiple Regression Equation
Formula Worksheet (showing data entered)
     A            B                  C            D
1    Programmer   Experience (yrs)   Test Score   Salary ($K)
2    1            4                  78           24.0
3    2            7                  100          43.0
4    3            1                  86           23.7
5    4            5                  82           34.3
6    5            8                  86           35.8
7    6            10                 84           38.0
8    7            0                  75           22.2
9    8            1                  80           23.1
Note: Rows 10-21 are not shown.
Slide 20
Using Excel’s Regression Tool to Develop the Estimated Multiple Regression Equation
Performing the Multiple Regression Analysis
Step 1 Select the Tools pull-down menu
Step 2 Choose the Data Analysis option
Step 3 Choose Regression from the list of Analysis Tools
… continued
Slide 21
Using Excel’s Regression Tool to Develop the Estimated Multiple Regression Equation
Performing the Multiple Regression Analysis
Step 4 When the Regression dialog box appears:
  Enter D1:D21 in the Input Y Range box
  Enter B1:C21 in the Input X Range box
  Select Labels
  Select Confidence Level
  Enter 95 in the Confidence Level box
  Select Output Range and enter A24 in the Output Range box
  Click OK
Slide 22
Using Excel’s Regression Tool to Develop the Estimated Multiple Regression Equation
Value Worksheet (Regression Statistics)
     A                    B
23
24   SUMMARY OUTPUT
25
26   Regression Statistics
27   Multiple R           0.913334059
28   R Square             0.834179103
29   Adjusted R Square    0.814670762
30   Standard Error       2.418762076
31   Observations         20
32
Slide 23
Using Excel’s Regression Tool to Develop the Estimated Multiple Regression Equation
Value Worksheet (ANOVA Output)
     A            B     C          D          E          F
32
33   ANOVA
34                df    SS         MS         F          Significance F
35   Regression   2     500.3285   250.1643   42.76013   2.32774E-07
36   Residual     17    99.45697   5.85041
37   Total        19    599.7855
38
The Significance F value in cell F35 is the p-value used to test for overall significance.
Slide 24
Using Excel’s Regression Tool to Develop the Estimated Multiple Regression Equation
Value Worksheet (Regression Equation Output)
     A            B          C           D        E
38
39                Coeffic.   Std. Err.   t Stat   P-value
40   Intercept    3.17394    6.15607     0.5156   0.61279
41   Experience   1.4039     0.19857     7.0702   1.9E-06
42   Test Score   0.25089    0.07735     3.2433   0.00478
43
Note: Columns F-I are not shown.
The P-value in cell E41 is used to test for the individual significance of Experience.
Slide 25
Using Excel’s Regression Tool to Develop the Estimated Multiple Regression Equation
Value Worksheet (Regression Equation Output)
     A            B          C           D        E
38
39                Coeffic.   Std. Err.   t Stat   P-value
40   Intercept    3.17394    6.15607     0.5156   0.61279
41   Experience   1.4039     0.19857     7.0702   1.9E-06
42   Test Score   0.25089    0.07735     3.2433   0.00478
43
Note: Columns F-I are not shown.
The P-value in cell E42 is used to test for the individual significance of Test Score.
Slide 26
Using Excel’s Regression Tool to Develop the Estimated Multiple Regression Equation
Estimated Regression Equation
  SALARY = 3.174 + 1.404(EXPER) + 0.2509(SCORE)
Note: Predicted salary will be in thousands of dollars.
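To obtain a point estimate, substitute the values of the independent variables into the estimated equation. A minimal sketch follows; the function name `predict_salary` is hypothetical, and the inputs are those of the first programmer in the data set (4 years of experience, test score of 78).

```python
def predict_salary(exper, score):
    """Point estimate of salary in $1000s from the estimated equation."""
    return 3.174 + 1.404 * exper + 0.2509 * score


estimate = predict_salary(exper=4, score=78)
print(f"Predicted salary: ${estimate * 1000:,.0f}")
```

The estimate is in thousands of dollars, so it is multiplied by 1000 for display. The observed salary for this programmer was 24 ($1000s), so the difference between the two is that observation's residual.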
Slide 27
Using Excel’s Regression Tool to Develop the Estimated Multiple Regression Equation
Value Worksheet (Regression Equation Output)
     A            B          F           G         H            I
38
39                Coeffic.   Low. 95%    Up. 95%   Low. 95.0%   Up. 95.0%
40   Intercept    3.17394    -9.814248   16.1621   -9.814248    16.1621
41   Experience   1.4039     0.984962    1.82284   0.984962     1.82284
42   Test Score   0.25089    0.087682    0.41409   0.087682     0.41409
43
Note: Columns C-E are hidden.
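The slides do not state how the Low. 95% and Up. 95% columns are computed. Under the standard formula $b_i \pm t_{\alpha/2}\, s_{b_i}$ (an assumption here, not something the slides spell out), they can be approximated from the coefficient, its standard error in the hidden Std. Err. column, and the rounded critical value 2.11 quoted in the t test example. The function name is hypothetical.

```python
def confidence_interval(b_i, s_bi, t_crit):
    """Interval b_i +/- t_crit * s_{b_i} for a regression coefficient."""
    margin = t_crit * s_bi
    return b_i - margin, b_i + margin


T_CRIT = 2.11  # t_.025 with 17 d.f., rounded as in the example slides

# Experience: coefficient 1.4039, standard error 0.19857 (Excel output).
lower, upper = confidence_interval(1.4039, 0.19857, T_CRIT)
print(round(lower, 3), round(upper, 3))
```

Because the critical value is rounded, the limits match the Excel columns only to about three decimal places.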
Slide 28
Example: Programmer Salary Survey
F Test
• Hypotheses
  $H_0$: $\beta_1 = \beta_2 = 0$
  $H_a$: One or both of the parameters is not equal to zero.
• Rejection Rule
  For $\alpha$ = .05 and d.f. = 2, 17: $F_{.05}$ = 3.59
  Reject $H_0$ if F > 3.59.
• Test Statistic
  F = MSR/MSE = 250.16/5.85 = 42.76
• Conclusion
  We can reject $H_0$.
Slide 29
Example: Programmer Salary Survey
t Test for Significance of Individual Parameters
• Hypotheses
  $H_0$: $\beta_i = 0$
  $H_a$: $\beta_i \neq 0$
• Rejection Rule
  For $\alpha$ = .05 and d.f. = 17, $t_{.025}$ = 2.11
  Reject $H_0$ if t < -2.11 or t > 2.11
• Test Statistics
  $t = \dfrac{b_1}{s_{b_1}} = \dfrac{1.4039}{.1986} = 7.07$
  $t = \dfrac{b_2}{s_{b_2}} = \dfrac{.25089}{.07735} = 3.24$
• Conclusions
  Reject $H_0$: $\beta_1 = 0$
  Reject $H_0$: $\beta_2 = 0$
Slide 30
Qualitative Independent Variables
• In many situations we must work with qualitative independent variables such as gender (male, female), method of payment (cash, check, credit card), etc.
• For example, $x_2$ might represent gender where $x_2$ = 0 indicates male and $x_2$ = 1 indicates female.
• In this case, $x_2$ is called a dummy or indicator variable.
• If a qualitative variable has $k$ levels, $k - 1$ dummy variables are required, with each dummy variable being coded as 0 or 1.
• For example, a variable with levels A, B, and C would be represented by $x_1$ and $x_2$ values of (0, 0), (1, 0), and (0, 1), respectively.
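The k - 1 coding rule can be sketched in a few lines. In this illustration the first level in the list is treated as the baseline (all zeros); that choice of baseline and the function name `dummy_code` are assumptions made for the example.

```python
def dummy_code(level, levels):
    """Code `level` as k - 1 dummy variables; levels[0] is the baseline."""
    if level not in levels:
        raise ValueError(f"unknown level: {level!r}")
    return tuple(1 if level == other else 0 for other in levels[1:])


levels = ["A", "B", "C"]
codes = {level: dummy_code(level, levels) for level in levels}
print(codes)  # {'A': (0, 0), 'B': (1, 0), 'C': (0, 1)}
```

Using k - 1 rather than k dummy variables avoids a column that is an exact linear combination of the intercept and the other dummies.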
Slide 31
Example: Programmer Salary Survey (B)
As an extension of the problem involving the computer programmer salary survey, suppose that management also believes that the annual salary is related to whether or not the individual has a graduate degree in computer science or information systems.
The years of experience, the score on the programmer aptitude test, whether or not the individual has a relevant graduate degree, and the annual salary ($000) for each of the sampled 20 programmers are shown on the next slide.
Slide 32
Example: Programmer Salary Survey (B)
Exp.   Score   Degr.   Salary      Exp.   Score   Degr.   Salary
4      78      No      24          9      88      Yes     38
7      100     Yes     43          2      73      No      26.6
1      86      No      23.7        10     75      Yes     36.2
5      82      Yes     34.3        5      81      No      31.6
8      86      Yes     35.8        6      74      No      29
10     84      Yes     38          8      87      Yes     34
0      75      No      22.2        4      79      No      30.1
1      80      No      23.1        6      94      Yes     33.9
6      83      No      30          3      70      No      28.2
6      91      Yes     33          3      89      No      30
Slide 33
Example: Programmer Salary Survey (B)
Multiple Regression Equation
  $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$
Estimated Regression Equation
  $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3$
where
  $y$ = annual salary ($000)
  $x_1$ = years of experience
  $x_2$ = score on programmer aptitude test
  $x_3$ = 0 if individual does not have a grad. degree
          1 if individual does have a grad. degree
Note: $x_3$ is referred to as a dummy variable.
Slide 34
Using Excel’s Regression Tool to Develop the Estimated Multiple Regression Equation
Formula Worksheet (showing data)
     A            B                    C            D              E
1    Programmer   Experience (years)   Test Score   Grad. Degree   Salary ($000)
2    1            4                    78           0              24.0
3    2            7                    100          1              43.0
4    3            1                    86           0              23.7
5    4            5                    82           1              34.3
6    5            8                    86           1              35.8
7    6            10                   84           1              38.0
8    7            0                    75           0              22.2
Note: Rows 9-21 are not shown.
Slide 35
Using Excel’s Regression Tool to Develop the Estimated Multiple Regression Equation
Value Worksheet (Regression Statistics)
     A                    B
23
24   SUMMARY OUTPUT
25
26   Regression Statistics
27   Multiple R           0.920215239
28   R Square             0.846796085
29   Adjusted R Square    0.818070351
30   Standard Error       2.396475101
31   Observations         20
32
Slide 36
Using Excel’s Regression Tool to Develop the Estimated Multiple Regression Equation
Value Worksheet (ANOVA Output)
     A            B     C          D          E          F
32
33   ANOVA
34                df    SS         MS         F          Significance F
35   Regression   3     507.896    169.2987   29.47866   9.41675E-07
36   Residual     16    91.88949   5.743093
37   Total        19    599.7855
38
Slide 37
Using Excel’s Regression Tool to Develop the Estimated Multiple Regression Equation
Value Worksheet (Regression Equation Output)
     A             B          C           D        E
38
39                 Coeffic.   Std. Err.   t Stat   P-value
40   Intercept     7.94485    7.3808      1.0764   0.2977
41   Experience    1.14758    0.2976      3.8561   0.0014
42   Test Score    0.19694    0.0899      2.1905   0.04364
43   Grad. Degr.   2.28042    1.98661     1.1479   0.26789
Note: Columns F-I are not shown.
Slide 38
Using Excel’s Regression Tool to Develop the Estimated Multiple Regression Equation
Value Worksheet (Regression Equation Output)
     A             B          F           G         H            I
38
39                 Coeffic.   Low. 95%    Up. 95%   Low. 95.0%   Up. 95.0%
40   Intercept     7.94485    -7.701739   23.5914   -7.7017385   23.591436
41   Experience    1.14758    0.516695    1.77847   0.51669483   1.7784686
42   Test Score    0.19694    0.00635     0.38752   0.00634964   0.3875243
43   Grad. Degr.   2.28042    -1.931002   6.49185   -1.9310017   6.4918494
Note: Columns C-E are hidden.
Slide 39
Example: Programmer Salary Survey (B)
Interpreting the Parameters
• $b_1$ = 1.15
  Salary is expected to increase by $1,150 for each additional year of experience (when all other independent variables are held constant).
Slide 40
Example: Programmer Salary Survey (B)
Interpreting the Parameters
• $b_2$ = 0.197
  Salary is expected to increase by $197 for each additional point scored on the programmer aptitude test (when all other independent variables are held constant).
Slide 41
Example: Programmer Salary Survey (B)
Interpreting the Parameters
• $b_3$ = 2.28
  Salary is expected to be $2,280 higher for an individual with a graduate degree than one without a graduate degree (when all other independent variables are held constant).
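The interpretation of the dummy coefficient can be made concrete by predicting salary for the same experience and test score with and without a graduate degree. This sketch uses the coefficients from the Excel output for model (B); the function name and the example inputs (4 years of experience, test score of 78) are illustrative choices.

```python
# Coefficients from the Excel regression output for model (B).
B0, B1, B2, B3 = 7.94485, 1.14758, 0.19694, 2.28042


def predict_salary_b(exper, score, has_degree):
    """Point estimate of salary in $1000s, with x3 coded 0 (no) or 1 (yes)."""
    x3 = 1 if has_degree else 0
    return B0 + B1 * exper + B2 * score + B3 * x3


without_degree = predict_salary_b(4, 78, has_degree=False)
with_degree = predict_salary_b(4, 78, has_degree=True)
gap = with_degree - without_degree  # equals b3 for any fixed exper and score

print(round(without_degree, 2), round(with_degree, 2), round(gap, 2))
```

The gap is the same for every combination of experience and score because $x_3$ enters the equation additively. Note that the P-value for this coefficient in the Excel output is 0.26789, so at $\alpha$ = .05 the estimated difference is not statistically significant.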
Slide 42
Residual Analysis
• For simple linear regression the residual plot against $\hat{y}$ and the residual plot against $x$ provide the same information.
• In multiple regression analysis it is preferable to use the residual plot against $\hat{y}$ to determine if the model assumptions are satisfied.
Slide 43
Residual Analysis
Standardized residuals are frequently used in residual plots for purposes of:
• Identifying outliers (typically, standardized residuals < -2 or > +2)
• Providing insight about the assumption that the error term $\varepsilon$ has a normal distribution
The computation of the standardized residuals in multiple regression analysis is too complex to be done by hand.
Excel’s Regression tool can be used.
Slide 44
Using Excel to Construct a Standardized Residual Plot
Value Worksheet (Residual Output)
     A             B             C              D
28
29   RESIDUAL OUTPUT
30
31   Observation   Predicted Y   Residuals      Standard Residuals
32   1             27.89626052   -3.89626052    -1.771706896
33   2             37.95204323   5.047956775    2.295406016
34   3             26.02901122   -2.32901122    -1.059047572
35   4             32.11201403   2.187985973    0.994920596
36   5             36.34250715   -0.54250715    -0.246688757
Note: Rows 37-51 are not shown.
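The slides do not say how Excel computes the Standard Residuals column. The values shown appear consistent with dividing each residual by $\sqrt{\text{SSE}/(n - 1)}$, using SSE = 91.88949 from the model (B) ANOVA output; this is an inference from the numbers rather than a documented formula, and other software may standardize differently. The sketch below reproduces the five rows shown and applies the ±2 outlier rule.

```python
from math import sqrt

SSE = 91.88949  # residual sum of squares from the model (B) ANOVA output
N = 20          # number of observations

# Residuals for observations 1-5 from the Residual Output worksheet.
residuals = [-3.89626052, 5.047956775, -2.32901122, 2.187985973, -0.54250715]

# Assumed scale: sample standard deviation of the residuals.
scale = sqrt(SSE / (N - 1))
standardized = [r / scale for r in residuals]

# Flag observations whose standardized residual is below -2 or above +2.
outliers = [i + 1 for i, z in enumerate(standardized) if abs(z) > 2]

for i, z in enumerate(standardized, start=1):
    print(i, round(z, 4))
print("Outliers among observations 1-5:", outliers)
```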
Slide 45
Using Excel to Construct a Standardized Residual Plot
[Figure: Standardized Residual Plot. Horizontal axis: Predicted Salary (0 to 50). Vertical axis: Standard Residuals (-2 to 3). One point is labeled Outlier.]
Slide 46
End of Chapter 15