![Page 1: 1 8.4 Multiple Regression Lecture Unit 8. 2 8.4 Introduction In this section we extend simple linear regression where we had one explanatory variable,](https://reader036.vdocument.in/reader036/viewer/2022081518/551af9835503462e578b4dd5/html5/thumbnails/1.jpg)
8.4 Multiple Regression
Lecture Unit 8
8.4 Introduction
• In this section we extend simple linear regression, where we had one explanatory variable, to allow for any number of explanatory variables.
• We expect to build a model that fits the data better than the simple linear regression model.
Introduction
• We shall use computer printout to
– Assess the model
  • How well does it fit the data?
  • Is it useful?
  • Are any required conditions violated?
– Employ the model
  • Interpreting the coefficients
  • Predictions using the prediction equation
  • Estimating the expected value of the dependent variable
Multiple Regression Model
• We allow for k explanatory variables to potentially be related to the response variable:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$
– y: dependent variable
– $x_1, \dots, x_k$: independent variables
– $\beta_0, \beta_1, \dots, \beta_k$: coefficients
– $\varepsilon$: random error variable
The Multiple Regression Model
Idea: Examine the linear relationship between one response variable (y) and two or more explanatory variables ($x_i$).
Population model:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$
– $\beta_0$: y-intercept
– $\beta_1, \dots, \beta_k$: population slopes
– $\varepsilon$: random error
Estimated multiple regression model:
$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k$
– $\hat{y}$: estimated (or predicted) value of y
– $b_0$: estimated intercept
– $b_1, \dots, b_k$: estimated slope coefficients
Simple Linear Regression
[Figure: scatterplot of y against x with the fitted line $\hat{y} = b_0 + b_1 x$ (intercept $b_0$, slope $b_1$). At a given $x_i$, the random error $\varepsilon_i$ is the vertical distance between the observed value of y and the predicted value of y.]
Multiple Regression, 2 explanatory variables
[Figure: three-dimensional scatterplot with axes X1, X2, and Y, showing the least squares plane (instead of a line). The scatter of points around the plane is random error.]
Multiple Regression Model
Two variable model
[Figure: fitted plane $\hat{y} = b_0 + b_1 x_1 + b_2 x_2$ over axes $x_1$, $x_2$, and y. For a sample observation $(x_{1i}, x_{2i}, y_i)$, the residual is $e = y_i - \hat{y}_i$.]
The best fit equation, $\hat{y}$, is found by minimizing the sum of squared errors, $\sum e^2$.
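The least squares idea for the two variable model can be sketched in a few lines of Python. This is a minimal illustration, not the software the slides rely on: it solves the normal equations directly, and it uses only the ten sample rows listed with the final exam example, with exam 3 and homework total chosen arbitrarily as $x_1$ and $x_2$. The function names are hypothetical, and the coefficients it prints will not match the printout, which is based on all 203 observations and four predictors.

```python
def fit_least_squares(rows, y):
    """Return [b0, b1, ..., bk] minimizing the sum of squared errors."""
    # Design matrix with a leading column of ones for the intercept.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    p = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = []
    for r in range(p):
        lhs = [sum(xi[r] * xi[c] for xi in X) for c in range(p)]
        rhs = sum(xi[r] * yi for xi, yi in zip(X, y))
        A.append(lhs + [rhs])
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    b = [0.0] * p
    for r in range(p - 1, -1, -1):
        tail = sum(A[r][c] * b[c] for c in range(r + 1, p))
        b[r] = (A[r][p] - tail) / A[r][r]
    return b


def predict(b, row):
    """Fitted value y-hat = b0 + b1*x1 + ... + bk*xk for one observation."""
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], row))


def sum_squared_errors(b, rows, y):
    """Sum of squared residuals e = y - y-hat."""
    return sum((yi - predict(b, row)) ** 2 for row, yi in zip(rows, y))


# (exam3, hwtot) and finalexm from the ten sample rows in the slides.
rows = [(80, 159), (75, 359), (90, 330), (100, 359), (80, 272),
        (70, 344), (90, 351), (90, 200), (70, 251), (95, 293)]
y = [72, 76, 84, 92, 64, 84, 88, 76, 60, 64]

b = fit_least_squares(rows, y)
print("b0, b1, b2 =", [round(v, 4) for v in b])
print("SSE =", round(sum_squared_errors(b, rows, y), 2))
```

Any other choice of $b_0$, $b_1$, $b_2$ gives a larger sum of squared errors on these rows, which is what "best fit" means here.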
Required conditions for the error variable
• The error $\varepsilon$ is normally distributed.
• The mean is equal to zero and the standard deviation is constant ($\sigma_\varepsilon$) for all values of y.
• The errors are independent.
8.4 Estimating the Coefficients and Assessing the Model
• The procedure used to perform regression analysis:
– Obtain the model coefficients and statistics using statistical software.
– Diagnose violations of required conditions. Try to remedy problems when identified.
– Assess the model fit using statistics obtained from the sample.
– If the model assessment indicates good fit to the data, use it to interpret the coefficients and generate predictions.
Estimating the Coefficients and Assessing the Model, Example
• Predicting final exam scores in BUS/ST 350
– We would like to predict final exam scores in 350.
– Use information generated during the semester.
– Predictors of the final exam score:
  • Exam 1
  • Exam 2
  • Exam 3
  • Homework total
Estimating the Coefficients and Assessing the Model, Example
• Data were collected from 203 randomly selected students from previous semesters.
• The following model is proposed:
$\text{final exam} = \beta_0 + \beta_1\,\text{exam1} + \beta_2\,\text{exam2} + \beta_3\,\text{exam3} + \beta_4\,\text{hwtot} + \varepsilon$

| exam 1 | exam2 | exam3 | hwtot | finalexm |
|---|---|---|---|---|
| 80 | 60 | 80 | 159 | 72 |
| 80 | 70 | 75 | 359 | 76 |
| 95 | 70 | 90 | 330 | 84 |
| 90 | 100 | 100 | 359 | 92 |
| 70 | 60 | 80 | 272 | 64 |
| 90 | 70 | 70 | 344 | 84 |
| 90 | 85 | 90 | 351 | 88 |
| 85 | 35 | 90 | 200 | 76 |
| 85 | 55 | 70 | 251 | 60 |
| 40 | 80 | 95 | 293 | 64 |
Regression Analysis, Excel Output

| Regression Statistics | |
|---|---|
| Multiple R | 0.618439 |
| R Square | 0.38246679 |
| Adjusted R Square | 0.36999137 |
| Standard Error | 11.5122313 |
| Observations | 203 |

| ANOVA | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 4 | 16252.40443 | 4063 | 30.66 | 7.32692E-20 |
| Residual | 198 | 26241.23104 | 132.5 | | |
| Total | 202 | 42493.63547 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 0.04978935 | 8.17368799 | 0.006 | 0.995 | -16.06886586 | 16.16844455 |
| exam 1 | 0.10021107 | 0.075633398 | 1.325 | 0.187 | -0.048939306 | 0.249361453 |
| exam2 | 0.15413733 | 0.072271404 | 2.133 | 0.034 | 0.011616858 | 0.296657794 |
| exam3 | 0.29600913 | 0.066724619 | 4.436 | 2E-05 | 0.16442702 | 0.427591244 |
| hwtot | 0.10771069 | 0.022685084 | 4.748 | 4E-06 | 0.062975308 | 0.152446072 |

This is the sample regression equation (sometimes called the prediction equation):
Final exam score = 0.0498 + 0.1002 exam1 + 0.1541 exam2 + 0.2960 exam3 + 0.1077 hwtot
Interpreting the Coefficients
• b0 = 0.0498. This is the intercept, the value of y when all the variables take the value zero. Since the data range of the independent variables does not cover the value zero, do not interpret the intercept.
• b1 = 0.1002. In this model, for each additional point on exam 1, the final exam score increases on average by 0.1002 (assuming the other variables are held constant).
Interpreting the Coefficients
• b2 = 0.1541. In this model, for each additional point on exam 2, the final exam score increases on average by 0.1541 (assuming the other variables are held constant).
• b3 = 0.2960. For each additional point on exam 3, the final exam score increases on average by 0.2960 (assuming the other variables are held constant).
• b4 = 0.1077. For each additional point on the homework, the final exam score increases on average by 0.1077 (assuming the other variables are held constant).
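The phrase "assuming the other variables are held constant" can be checked numerically. The sketch below uses the rounded coefficients from the prediction equation; the student scores are hypothetical values chosen only for illustration, and the function name is an assumption.

```python
# Rounded coefficients from the sample regression equation in the printout.
COEF = {"intercept": 0.0498, "exam1": 0.1002, "exam2": 0.1541,
        "exam3": 0.2960, "hwtot": 0.1077}


def predict_final(exam1, exam2, exam3, hwtot):
    """Predicted final exam score from the sample regression equation."""
    return (COEF["intercept"] + COEF["exam1"] * exam1 + COEF["exam2"] * exam2
            + COEF["exam3"] * exam3 + COEF["hwtot"] * hwtot)


# Hypothetical student (illustrative scores, not taken from the data set).
base = predict_final(80, 70, 85, 300)
# Same student with one more point on exam 1, everything else unchanged.
plus_one = predict_final(81, 70, 85, 300)

change = plus_one - base
print(round(change, 4))  # equals b1 = 0.1002
```

Whatever values are held fixed for the other three predictors, the change in the prediction is the coefficient of the variable that moved.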
Final Exam Scores, Predictions
• Predict the average final exam score of a student with the following exam scores and homework score:
– Exam 1 score 75,
– Exam 2 score 79,
– Exam 3 score 85,
– Homework score 310
– Use the TREND function in Excel
Final exam score = 0.0498 + 0.1002(75) + 0.1541(79) + 0.2960(85) + 0.1077(310) = 78.2857
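As a cross-check on the arithmetic, the same prediction can be reproduced in Python. This is only a sketch using the rounded coefficients shown in the prediction equation; the variable names are illustrative.

```python
# b0, b1, b2, b3, b4 as rounded in the prediction equation.
b = [0.0498, 0.1002, 0.1541, 0.2960, 0.1077]
# Exam 1, exam 2, exam 3, homework total for the student in question.
x = [75, 79, 85, 310]

y_hat = b[0] + sum(b_i * x_i for b_i, x_i in zip(b[1:], x))
print(f"{y_hat:.4f}")  # 78.2857
```

Using the unrounded coefficients from the printout instead would shift the result slightly, in the second decimal place.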
Model Assessment
• The model is assessed using three tools:
– The standard error of the residuals
– The coefficient of determination
– The F-test of the analysis of variance
• The standard error of the residuals participates in building the other tools.
Standard Error of Residuals
• The standard deviation of the residuals is estimated by the Standard Error of the Residuals:
$s = \sqrt{\dfrac{SSE}{n - k - 1}}$
• The magnitude of s is judged by comparing it to $\bar{y}$.
Regression Analysis, Excel Output

| Regression Statistics | |
|---|---|
| Multiple R | 0.618439 |
| R Square | 0.38246679 |
| Adjusted R Square | 0.36999137 |
| Standard Error | 11.5122313 |
| Observations | 203 |

| ANOVA | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 4 | 16252.40443 | 4063 | 30.66 | 7.32692E-20 |
| Residual | 198 | 26241.23104 | 132.5 | | |
| Total | 202 | 42493.63547 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 0.04978935 | 8.17368799 | 0.006 | 0.995 | -16.06886586 | 16.16844455 |
| exam 1 | 0.10021107 | 0.075633398 | 1.325 | 0.187 | -0.048939306 | 0.249361453 |
| exam2 | 0.15413733 | 0.072271404 | 2.133 | 0.034 | 0.011616858 | 0.296657794 |
| exam3 | 0.29600913 | 0.066724619 | 4.436 | 2E-05 | 0.16442702 | 0.427591244 |
| hwtot | 0.10771069 | 0.022685084 | 4.748 | 4E-06 | 0.062975308 | 0.152446072 |

• Standard Error (11.5122313) is the standard error of the residuals, sqrt(MSE).
• (standard error of the residuals)² = MSE = SSE/198 (the Residual MS, 132.5).
• The Residual SS (26241.23104) is the sum of squares of residuals, SSE.
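The relationship between SSE, MSE, and the standard error of the residuals can be verified from the printout values. A minimal sketch, with variable names chosen for illustration:

```python
import math

sse = 26241.23104   # Residual SS from the ANOVA table
n = 203             # Observations
k = 4               # explanatory variables: exam1, exam2, exam3, hwtot

mse = sse / (n - k - 1)   # Residual MS, with n - k - 1 = 198
s = math.sqrt(mse)        # standard error of the residuals

print(round(mse, 1), round(s, 4))  # 132.5 11.5122
```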
Standard Error of Residuals
• From the printout, s = 11.5122….
• Calculating the mean value of y we have $\bar{y} = 78.84$.
• It seems s is not particularly small.
• Question: Can we conclude the model does not fit the data well?
Coefficient of Determination R² (like r² in simple linear regression)
• The proportion of the variation in y that is explained by differences in the explanatory variables x1, x2, …, xk
• R² = 1 – (SSE/SSTotal)
• From the printout, R² = 0.382466…
• 38.25% of the variation in final exam score is explained by differences in the exam1, exam2, exam3, and hwtot explanatory variables. 61.75% remains unexplained.
• When adjusted for degrees of freedom, Adjusted R² = 36.99%
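Both values can be reproduced from the ANOVA sums of squares. The slides do not spell out the adjustment formula; the sketch below assumes the standard one, Adjusted R² = 1 – (1 – R²)(n – 1)/(n – k – 1), which agrees with the printout.

```python
sse = 26241.23104       # Residual SS
ss_total = 42493.63547  # Total SS
n, k = 203, 4           # observations, explanatory variables

r2 = 1 - sse / ss_total
# Standard degrees-of-freedom adjustment (assumed; not stated in the slides).
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(f"R^2 = {r2:.4f}, adjusted R^2 = {adj_r2:.4f}")
# R^2 = 0.3825, adjusted R^2 = 0.3700
```

The adjustment penalizes the model for each additional explanatory variable, so adjusted R² is always a little below R² when k ≥ 1.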
Testing the Validity of the Model
• We pose the question: Is there at least one explanatory variable linearly related to the response variable?
• To answer the question we test the hypothesis
$H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$
$H_1$: At least one $\beta_i$ is not equal to zero.
• If at least one $\beta_i$ is not equal to zero, the model has some validity.
Testing the Validity of the Final Exam Scores Regression Model
• The hypotheses are tested by what is called an F test, shown in the Excel output below.

| ANOVA | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 4 | 16252.404 | 4063 | 30.66 | 7.32692E-20 |
| Residual | 198 | 26241.231 | 132.5 | | |
| Total | 202 | 42493.635 | | | |

– df column: k = 4, n – k – 1 = 198, n – 1 = 202
– SS column: SSR (Regression), SSE (Residual)
– MS column: MSR = SSR/k, MSE = SSE/(n – k – 1)
– F = MSR/MSE
– Significance F is the P-value
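The F statistic in the ANOVA table follows directly from the sums of squares and degrees of freedom. The sketch below recomputes it from the printout values; it does not compute the P-value, which requires the F distribution and is taken from the printout instead.

```python
ssr = 16252.40443   # Regression SS
sse = 26241.23104   # Residual SS
n, k = 203, 4       # observations, explanatory variables

msr = ssr / k              # mean square for regression
mse = sse / (n - k - 1)    # mean square for error
f_stat = msr / mse

print(round(msr), round(mse, 1), round(f_stat, 2))  # 4063 132.5 30.66
```

Note also that SSR + SSE reproduces the Total SS, 42493.63547, which is the decomposition of the variation in y that the test relies on.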
Testing the Validity of the Final Exam Scores Regression Model
[Variation in y] = SSR + SSE. A large F results from a large SSR relative to SSE. Then much of the variation in y is explained by the regression model; the model is useful, and thus the null hypothesis H0 should be rejected. Reject H0 when P-value < 0.05.
Testing the Validity of the Final Exam Scores Regression Model

| ANOVA | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 4 | 16252.404 | 4063 | 30.66 | 7.32692E-20 |
| Residual | 198 | 26241.231 | 132.5 | | |
| Total | 202 | 42493.635 | | | |

The P-value (Significance F) < 0.05, so reject the null hypothesis.
Conclusion: There is sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis. At least one of the $\beta_i$ is not equal to zero. Thus, at least one explanatory variable is linearly related to y. This linear regression model is valid.
Testing the Coefficients
• The hypothesis for each $\beta_i$ is
$H_0: \beta_i = 0$
$H_1: \beta_i \neq 0$
Test statistic: $t = \dfrac{b_i - 0}{s_{b_i}}$, d.f. = n – k – 1
• Excel printout

| | Coefficients | Standard Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | 0.04978935 | 8.17368799 | 0.006 | 0.995145915 |
| exam 1 | 0.10021107 | 0.075633398 | 1.325 | 0.186712117 |
| exam2 | 0.15413733 | 0.072271404 | 2.133 | 0.034176157 |
| exam3 | 0.29600913 | 0.066724619 | 4.436 | 1.51714E-05 |
| hwtot | 0.10771069 | 0.022685084 | 4.748 | 3.93288E-06 |
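The t Stat column is each coefficient divided by its standard error, which the following sketch reproduces from the printout. The P-values are copied from the printout rather than computed, and the 0.05 cutoff is an assumption carried over from the F test; the slides do not state a significance level for the individual coefficient tests.

```python
# (coefficient b_i, standard error s_b_i, P-value) from the Excel printout.
printout = {
    "Intercept": (0.04978935, 8.17368799, 0.995145915),
    "exam 1":    (0.10021107, 0.075633398, 0.186712117),
    "exam2":     (0.15413733, 0.072271404, 0.034176157),
    "exam3":     (0.29600913, 0.066724619, 1.51714e-05),
    "hwtot":     (0.10771069, 0.022685084, 3.93288e-06),
}

ALPHA = 0.05  # assumed significance level, matching the F test cutoff

t_stats = {}
for name, (b_i, s_b_i, p_value) in printout.items():
    t = (b_i - 0) / s_b_i          # test statistic under H0: beta_i = 0
    t_stats[name] = t
    decision = "reject H0" if p_value < ALPHA else "do not reject H0"
    print(f"{name:9s} t = {t:6.3f}  p = {p_value:.3g}  -> {decision}")
```

Under that assumed cutoff, exam2, exam3, and hwtot show evidence of a linear relationship with the final exam score, while exam 1 (P-value about 0.187) does not once the other predictors are in the model.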