Copyright © 2017, 2013, 2010 Pearson Education, Inc. All Rights Reserved
STATISTICS: INFORMED DECISIONS USING DATA, Fifth Edition
Chapter 14
Inference on the Least-Squares Regression Model and Multiple Regression
14.1 Testing the Significance of the Least-Squares Regression Model
Learning Objectives
1. State the requirements of the least-squares regression model
2. Compute the standard error of the estimate
3. Verify that the residuals are normally distributed
4. Conduct inference on the slope of the least-squares regression model
5. Construct a confidence interval about the slope of the least-squares regression model
14.1.1 State the Requirements of the Least-Squares Regression Model
Requirement 1 for Inference on the Least-Squares Regression Model
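For any particular value of the explanatory variable x, the mean of the corresponding responses in the population depends linearly on x. That is,
$\mu_{y|x} = \beta_0 + \beta_1 x$
for some numbers β0 and β1, where μy|x is the mean response when the explanatory variable equals x.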
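Requirement 2 for Inference on the Least-Squares Regression Model
The response variable is normally distributed with mean $\mu_{y|x} = \beta_0 + \beta_1 x$ and standard deviation σ.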
“In Other Words”
When doing inference on the least-squares regression model, we require that (1) for any value of the explanatory variable, x, the mean of the response variable, y, depends on the value of x through a linear equation, and (2) the response variable, y, is normally distributed with a constant standard deviation, σ. The mean increases or decreases at a constant rate determined by the slope, while the standard deviation remains constant.
A large value of σ, the population standard deviation, indicates that the data are widely dispersed about the regression line, and a small value of σ indicates that the data lie fairly close to the regression line.
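The least-squares regression model is given by
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
where
yi is the value of the response variable for the ith individual
xi is the value of the explanatory variable for the ith individual
β0 and β1 are the parameters to be estimated based on sample data
εi is a random error term with mean 0 and standard deviation σ; the error terms are independent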
14.1.2 Compute the Standard Error of the Estimate
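The standard error of the estimate, se, is found using the formula
$s_e = \sqrt{\dfrac{\sum (y_i - \hat{y}_i)^2}{n - 2}} = \sqrt{\dfrac{\sum \text{residuals}^2}{n - 2}}$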
Parallel Example 2: Compute the Standard Error
Compute the standard error of the estimate for the drilling data presented below.
Depth at Which Drilling Begins, x (in feet)    Time to Drill 5 Feet, y (in minutes)
 35    5.88
 50    5.99
 75    6.74
 95    6.10
120    7.47
130    6.93
145    6.42
155    7.97
160    7.92
175    7.62
185    6.89
190    7.90
Steps 2 and 3: The predicted values and the residuals for the 12 observations are given in the table on the next slide.
Solution
Step 4: We find the sum of the squared residuals by summing the last column of the table:
CAUTION!
Be sure to divide by n − 2 when computing the standard error of the estimate.
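The steps of this example can be sketched in a few lines of Python. This is only an illustration using the standard library; the helper names `least_squares` and `standard_error_of_estimate` are chosen here and do not come from the text. Run on the drilling data, it gives a standard error of roughly 0.52 minutes.

```python
import math

# Drilling data from the example: depth x (feet), time to drill 5 feet y (minutes).
depth = [35, 50, 75, 95, 120, 130, 145, 155, 160, 175, 185, 190]
time = [5.88, 5.99, 6.74, 6.10, 7.47, 6.93, 6.42, 7.97, 7.92, 7.62, 6.89, 7.90]

def least_squares(x, y):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def standard_error_of_estimate(x, y):
    """Square root of (sum of squared residuals) / (n - 2)."""
    b0, b1 = least_squares(x, y)
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(r ** 2 for r in residuals)
    return math.sqrt(sse / (len(x) - 2))  # divide by n - 2, not n

s_e = standard_error_of_estimate(depth, time)
print(f"s_e = {s_e:.4f}")
```

Dividing by n − 2 rather than n reflects the two parameters, b0 and b1, that were estimated from the data.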
14.1.3 Verify That the Residuals Are Normally Distributed
Parallel Example 4: Verifying That the Residuals Are Normally Distributed
Verify that the residuals from the drilling example are normally distributed.
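One way to check this numerically, sketched below under stated assumptions, is to compute the correlation that underlies a normal probability plot: sort the residuals and correlate them with expected normal scores. The plotting positions (i − 0.375)/(n + 0.25) are one common convention and are an assumption here, as are the helper names. A correlation close to 1 is consistent with normally distributed residuals.

```python
import math
from statistics import NormalDist

depth = [35, 50, 75, 95, 120, 130, 145, 155, 160, 175, 185, 190]
time = [5.88, 5.99, 6.74, 6.10, 7.47, 6.93, 6.42, 7.97, 7.92, 7.62, 6.89, 7.90]

def residuals(x, y):
    """Residuals y - y-hat from the least-squares line."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

def normal_scores(n):
    """Expected z-scores for n ordered observations (assumed plotting positions)."""
    return [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def correlation(a, b):
    """Pearson linear correlation coefficient."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)

res = sorted(residuals(depth, time))
r = correlation(res, normal_scores(len(res)))
print(f"normal probability plot correlation = {r:.3f}")
```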
14.1.4 Conduct Inference on the Slope of the Least-Squares Regression Model
Hypothesis Test Regarding the Slope Coefficient, β1
To test whether two quantitative variables are linearly related, we use the following steps provided that
1. the sample is obtained using random sampling.
2. the residuals are normally distributed with constant error variance.
Step 1: Determine the null and alternative hypotheses. The hypotheses can be structured in one of three ways:
Two-Tailed      Left-Tailed     Right-Tailed
H0: β1 = 0      H0: β1 = 0      H0: β1 = 0
H1: β1 ≠ 0      H1: β1 < 0      H1: β1 > 0
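Step 2: Select a level of significance, α.
Step 3: Compute the test statistic
$t_0 = \dfrac{b_1 - \beta_1}{s_{b_1}} = \dfrac{b_1}{s_{b_1}}$, where $s_{b_1} = \dfrac{s_e}{\sqrt{\sum (x_i - \bar{x})^2}}$,
which follows Student’s t-distribution with n − 2 degrees of freedom. Remember, when computing the test statistic, we assume the null hypothesis to be true. So, we assume that β1 = 0. Use Table VII to determine the critical value using n − 2 degrees of freedom.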
Classical Approach
Two-Tailed: The critical values are −tα/2 and tα/2. Reject the null hypothesis if t0 < −tα/2 or t0 > tα/2.
Left-Tailed: The critical value is −tα. Reject the null hypothesis if t0 < −tα.
Right-Tailed: The critical value is tα. Reject the null hypothesis if t0 > tα.
Step 4: Compare the critical value with the test statistic.
P-value Approach
By Hand Step 3: Compute the test statistic t0 = b1/sb1 and use Table VII to approximate the P-value.
Two-Tailed: The P-value is the sum of the areas in the two tails, to the left of −|t0| and to the right of |t0|.
Left-Tailed: The P-value is the area to the left of t0.
Right-Tailed: The P-value is the area to the right of t0.
Technology Step 3: Use a statistical spreadsheet or calculator with statistical capabilities to obtain the P-value. The directions for obtaining the P-value using the TI-83/84 Plus graphing calculators, Minitab, Excel, and StatCrunch are in the Technology Step-by-Step in the text.
Step 4: If the P-value < α, reject the null hypothesis.
Step 5: State the conclusion.
CAUTION!
Before testing H0: β1 = 0, be sure to draw a residual plot to verify that a linear model is appropriate.
Parallel Example 5: Testing for a Linear Relation
Test the claim that there is a linear relation between drill depth and drill time at the α = 0.05 level of significance using the drilling data.
Solution
Verify the requirements:
• We assume that the experiment was randomized so that the data can be assumed to represent a random sample.
• In Parallel Example 4 we confirmed that the residuals were normally distributed by constructing a normal probability plot.
• To verify the requirement of constant error variance, we plot the residuals against the explanatory variable, drill depth.
The plot of the residuals against drill depth shows no discernible pattern.
Step 1: We want to determine whether a linear relation exists between drill depth and drill time without regard to the sign of the slope. This is a two-tailed test with
H0: β1 = 0 versus H1: β1 ≠ 0
Step 2: The level of significance is α = 0.05.
Step 3: We have b1 = 0.0116 and sb1 = 0.003, so the test statistic is
$t_0 = \dfrac{b_1}{s_{b_1}} = \dfrac{0.0116}{0.003} = 3.867$
Solution: Classical Approach
Step 3, cont’d: Since this is a two-tailed test, we determine the critical t-values at the α = 0.05 level of significance with n − 2 = 12 − 2 = 10 degrees of freedom to be −t0.025 = −2.228 and t0.025 = 2.228.
Step 4: Since the value of the test statistic, 3.867, is greater than 2.228, we reject the null hypothesis.
Solution: P-value Approach
Step 3: Since this is a two-tailed test, the P-value is the sum of the area under the t-distribution with 12 − 2 = 10 degrees of freedom to the left of −t0 = −3.867 and to the right of t0 = 3.867. Using Table VII we find that with 10 degrees of freedom, the value 3.867 is between 3.581 and 4.144 corresponding to right-tail areas of 0.0025 and 0.001, respectively. Thus, the P-value is between 0.002 and 0.005.
Step 4: Since the P-value is less than the level of significance, 0.05, we reject the null hypothesis.
Step 5: There is sufficient evidence at the α = 0.05 level of significance to conclude that a linear relation exists between drill depth and drill time.
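The test can be reproduced with a short Python sketch. It is an illustration under stated assumptions, not the method used in the text: the P-value is obtained by numerically integrating the t density with Simpson's rule instead of using Table VII, and the function names are chosen here. With unrounded inputs the test statistic comes out near 3.85, slightly below the 3.867 obtained above from the rounded values b1 = 0.0116 and sb1 = 0.003. The conclusion is unchanged.

```python
import math

depth = [35, 50, 75, 95, 120, 130, 145, 155, 160, 175, 185, 190]
time = [5.88, 5.99, 6.74, 6.10, 7.47, 6.93, 6.42, 7.97, 7.92, 7.62, 6.89, 7.90]

def slope_test(x, y):
    """Return (b1, s_b1, t0, df) for testing H0: beta1 = 0."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))
    s_b1 = s_e / math.sqrt(sxx)
    return b1, s_b1, b1 / s_b1, n - 2

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def two_tailed_p(t0, df, steps=2000):
    """Two-tailed P-value; integrates the density over [0, |t0|] by Simpson's rule."""
    b = abs(t0)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * (0.5 - total * h / 3)

b1, s_b1, t0, df = slope_test(depth, time)
p_value = two_tailed_p(t0, df)
print(f"t0 = {t0:.3f}, df = {df}, P-value = {p_value:.4f}")
print("reject H0" if p_value < 0.05 else "do not reject H0")
```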
14.1.5 Construct a Confidence Interval about the Slope of the Least-Squares Regression Model
Confidence Intervals for the Slope of the Regression Line
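A (1 − α) • 100% confidence interval for the slope of the true regression line, β1, is given by the following formulas:
Lower bound: $b_1 - t_{\alpha/2} \cdot s_{b_1}$
Upper bound: $b_1 + t_{\alpha/2} \cdot s_{b_1}$
where $s_{b_1} = s_e / \sqrt{\sum (x_i - \bar{x})^2}$ and tα/2 is computed with n − 2 degrees of freedom.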
Note: The confidence interval formula for β1 can be computed only if the data are randomly obtained, the residuals are normally distributed, and there is constant error variance.
Parallel Example 7: Constructing a Confidence Interval for the Slope of the True Regression Line
Construct a 95% confidence interval for the slope of the least-squares regression line for the drilling example.
Solution
The requirements for the usage of the confidence interval formula were verified in previous examples.
Since t0.025 = 2.228 for 10 degrees of freedom, we have
Lower bound = 0.0116 − 2.228 • 0.003 = 0.0049
Upper bound = 0.0116 + 2.228 • 0.003 = 0.0183.
We are 95% confident that the mean increase in the time it takes to drill 5 feet for each additional foot of depth at which the drilling begins is between 0.005 and 0.018 minutes.
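As a check on the arithmetic, the interval can be recomputed from the raw drilling data. The sketch below takes the critical value 2.228 from the example and uses a helper name, `slope_interval`, chosen for illustration. Because it does not round b1 and sb1 first, the bounds differ from the ones above in the fourth decimal place but round to the same 0.005 and 0.018.

```python
import math

depth = [35, 50, 75, 95, 120, 130, 145, 155, 160, 175, 185, 190]
time = [5.88, 5.99, 6.74, 6.10, 7.47, 6.93, 6.42, 7.97, 7.92, 7.62, 6.89, 7.90]
T_CRIT = 2.228  # t_{0.025} with 10 degrees of freedom, as given in the example

def slope_interval(x, y, t_crit):
    """Confidence interval b1 +/- t_crit * s_b1 for the slope."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
    return b1 - t_crit * s_b1, b1 + t_crit * s_b1

lower, upper = slope_interval(depth, time, T_CRIT)
print(f"95% CI for the slope: {lower:.4f} to {upper:.4f}")
```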
14.2 Confidence and Prediction Intervals
Learning Objectives
1. Construct confidence intervals for a mean response
2. Construct prediction intervals for an individual response
Introduction
Confidence intervals for a mean response are intervals constructed about the predicted value of y, at a given level of x, that are used to measure the accuracy of the mean response of all the individuals in the population.
Prediction intervals for an individual response are intervals constructed about the predicted value of y that are used to measure the accuracy of a single individual’s predicted value.
14.2.1 Construct Confidence Intervals for a Mean Response
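A (1 − α) • 100% confidence interval for the mean response of y at a given value of x is given by
Lower bound: $\hat{y} - t_{\alpha/2} \cdot s_e \sqrt{\dfrac{1}{n} + \dfrac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$
Upper bound: $\hat{y} + t_{\alpha/2} \cdot s_e \sqrt{\dfrac{1}{n} + \dfrac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$
where x* is the given value of the explanatory variable, n is the number of observations, and tα/2 is the critical value with n − 2 degrees of freedom.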
Parallel Example 1: Constructing a Confidence Interval for a Mean Response
Construct a 95% confidence interval about the predicted mean time to drill 5 feet for all drillings started at a depth of 110 feet.
Solution
We are 95% confident that the mean time to drill 5 feet for all drillings started at a depth of 110 feet is between 6.45 and 7.15 minutes.
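The interval for the mean response can be recomputed from the drilling data with the sketch below. It assumes the usual form ŷ ± tα/2 · se · sqrt(1/n + (x* − x̄)² / Σ(xi − x̄)²), takes t0.025 = 2.228 from the earlier example, and uses a function name chosen for illustration. The result should agree with the interval of about 6.45 to 7.15 minutes stated above.

```python
import math

depth = [35, 50, 75, 95, 120, 130, 145, 155, 160, 175, 185, 190]
time = [5.88, 5.99, 6.74, 6.10, 7.47, 6.93, 6.42, 7.97, 7.92, 7.62, 6.89, 7.90]
T_CRIT = 2.228  # t_{0.025} with n - 2 = 10 degrees of freedom

def mean_response_interval(x, y, x_star, t_crit):
    """Confidence interval for the mean response at x = x_star."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))
    y_hat = b0 + b1 * x_star
    margin = t_crit * s_e * math.sqrt(1 / n + (x_star - x_bar) ** 2 / sxx)
    return y_hat - margin, y_hat + margin

ci_lower, ci_upper = mean_response_interval(depth, time, 110, T_CRIT)
print(f"95% CI for the mean time at 110 feet: {ci_lower:.2f} to {ci_upper:.2f} minutes")
```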
14.2.2 Construct Prediction Intervals for an Individual Response
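A (1 − α) • 100% prediction interval for an individual response of y at a given value of x is given by
Lower bound: $\hat{y} - t_{\alpha/2} \cdot s_e \sqrt{1 + \dfrac{1}{n} + \dfrac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$
Upper bound: $\hat{y} + t_{\alpha/2} \cdot s_e \sqrt{1 + \dfrac{1}{n} + \dfrac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$
where x* is the given value of the explanatory variable, n is the number of observations, and tα/2 is the critical value with n − 2 degrees of freedom.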
Parallel Example 2: Constructing a Prediction Interval for an Individual Response
Construct a 95% prediction interval about the predicted time to drill 5 feet for a single drilling started at a depth of 110 feet.
Solution
We are 95% confident that the time to drill 5 feet for a random drilling started at a depth of 110 feet is between 5.59 and 8.01 minutes.
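The prediction interval differs from the confidence interval for a mean response only by the extra 1 under the square root, which accounts for the variability of a single observation. The sketch below assumes that standard form, takes t0.025 = 2.228 from the earlier example, and uses a function name chosen for illustration.

```python
import math

depth = [35, 50, 75, 95, 120, 130, 145, 155, 160, 175, 185, 190]
time = [5.88, 5.99, 6.74, 6.10, 7.47, 6.93, 6.42, 7.97, 7.92, 7.62, 6.89, 7.90]
T_CRIT = 2.228  # t_{0.025} with n - 2 = 10 degrees of freedom

def prediction_interval(x, y, x_star, t_crit):
    """Prediction interval for one individual response at x = x_star."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))
    y_hat = b0 + b1 * x_star
    # The leading 1 is what widens this interval relative to the mean-response interval.
    margin = t_crit * s_e * math.sqrt(1 + 1 / n + (x_star - x_bar) ** 2 / sxx)
    return y_hat - margin, y_hat + margin

pi_lower, pi_upper = prediction_interval(depth, time, 110, T_CRIT)
print(f"95% prediction interval at 110 feet: {pi_lower:.2f} to {pi_upper:.2f} minutes")
```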
14.3 Introduction to Multiple Regression
Learning Objectives
1. Obtain the correlation matrix
2. Use technology to find a multiple regression equation
3. Interpret the coefficients of a multiple regression equation
4. Determine R² and adjusted R²
5. Perform an F-test for lack of fit
6. Test individual regression coefficients for significance
7. Construct confidence and prediction intervals
14.3.1 Obtain the Correlation Matrix
The multiple regression model is given by
$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki} + \varepsilon_i$
where
– yi is the value of the response variable for the ith individual
– x1i is the ith observation for the first explanatory variable, x2i is the ith observation for the second explanatory variable, and so on
– β0, β1,…, βk are the parameters to be estimated based on sample data
– εi is a random error term that is normally distributed with mean 0 and standard deviation σ
– The error terms are independent, and i = 1,…, n, where n is the sample size.
A correlation matrix shows the linear correlation between each pair of variables under consideration in a multiple regression model.
Multicollinearity exists between two explanatory variables if they have a high linear correlation.
CAUTION!
If two explanatory variables in the regression model are highly correlated with each other, watch out for strange results in the regression output.
Parallel Example 1: Constructing a Correlation Matrix
As cheese ages, various chemical processes take place that determine the taste of the final product. The table below gives concentrations of various chemicals in 30 samples of mature cheddar cheese and a subjective measure of taste for each sample.
Source: Moore, David S., and George P. McCabe (1989)
Obs Taste ln(Acetic) ln(H2S) Lactic
1 12.3 4.543 3.135 0.86
2 20.9 5.159 5.043 1.53
3 39 5.366 5.438 1.57
4 47.9 5.759 7.496 1.81
5 5.6 4.663 3.807 0.99
6 25.9 5.697 7.601 1.09
7 37.3 5.892 8.726 1.29
8 21.9 6.078 7.966 1.78
9 18.1 4.898 3.85 1.29
10 21 5.242 4.174 1.58
11 34.9 5.74 6.142 1.68
12 57.2 6.446 7.908 1.9
13 0.7 4.477 2.996 1.06
14 25.9 5.236 4.942 1.3
15 54.9 6.151 6.752 1.52
16 40.9 6.365 9.588 1.74
17 15.9 4.787 3.912 1.16
18 6.4 5.412 4.7 1.49
19 18 5.247 6.174 1.63
20 38.9 5.438 9.064 1.99
21 14 4.564 4.949 1.15
22 15.2 5.298 5.22 1.33
23 32 5.455 9.242 1.44
24 56.7 5.855 10.199 2.01
25 16.8 5.366 3.664 1.31
26 11.6 6.043 3.219 1.46
27 26.5 6.458 6.962 1.72
28 0.7 5.328 3.912 1.25
29 13.4 5.802 6.685 1.08
30 5.5 6.176 4.787 1.25
Solution
The following correlation matrix is from MINITAB:
Correlations: taste, Acetic, H2S, Lactic
          taste    Acetic   H2S
Acetic    0.550
H2S       0.756    0.618
Lactic    0.704    0.604    0.645
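The same matrix can be recomputed from the table of 30 samples. The sketch below is an illustration in plain Python rather than MINITAB; the names `CHEESE`, `pearson`, and `corr` are chosen here. Each entry is the linear correlation coefficient between one pair of variables, and the values should agree with the MINITAB output to three decimal places.

```python
import math

# (taste, ln(Acetic), ln(H2S), Lactic) for the 30 cheese samples in the table.
CHEESE = [
    (12.3, 4.543, 3.135, 0.86), (20.9, 5.159, 5.043, 1.53),
    (39.0, 5.366, 5.438, 1.57), (47.9, 5.759, 7.496, 1.81),
    (5.6, 4.663, 3.807, 0.99), (25.9, 5.697, 7.601, 1.09),
    (37.3, 5.892, 8.726, 1.29), (21.9, 6.078, 7.966, 1.78),
    (18.1, 4.898, 3.850, 1.29), (21.0, 5.242, 4.174, 1.58),
    (34.9, 5.740, 6.142, 1.68), (57.2, 6.446, 7.908, 1.90),
    (0.7, 4.477, 2.996, 1.06), (25.9, 5.236, 4.942, 1.30),
    (54.9, 6.151, 6.752, 1.52), (40.9, 6.365, 9.588, 1.74),
    (15.9, 4.787, 3.912, 1.16), (6.4, 5.412, 4.700, 1.49),
    (18.0, 5.247, 6.174, 1.63), (38.9, 5.438, 9.064, 1.99),
    (14.0, 4.564, 4.949, 1.15), (15.2, 5.298, 5.220, 1.33),
    (32.0, 5.455, 9.242, 1.44), (56.7, 5.855, 10.199, 2.01),
    (16.8, 5.366, 3.664, 1.31), (11.6, 6.043, 3.219, 1.46),
    (26.5, 6.458, 6.962, 1.72), (0.7, 5.328, 3.912, 1.25),
    (13.4, 5.802, 6.685, 1.08), (5.5, 6.176, 4.787, 1.25),
]
NAMES = ["taste", "Acetic", "H2S", "Lactic"]

def pearson(a, b):
    """Linear correlation coefficient between two equal-length sequences."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)

columns = list(zip(*CHEESE))  # one tuple per variable
# Lower triangle only, keyed as (row variable, column variable).
corr = {(NAMES[i], NAMES[j]): pearson(columns[i], columns[j])
        for i in range(len(NAMES)) for j in range(i)}

for (row, col), r in corr.items():
    print(f"{row:>6} vs {col:<6} r = {r:.3f}")
```

The largest correlation between two explanatory variables is the one to inspect when checking for multicollinearity.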
14.3.2 Use Technology to Find a Multiple Regression Equation
2. Draw residual plots and a boxplot of the residuals to assess the adequacy of the model.
Solution
2. None of the residual plots show any discernible pattern, and the boxplot does not show any outliers. Therefore, the linear model is appropriate.
14.3.3 Interpret the Coefficients of a Multiple Regression Equation
Parallel Example 3: Interpreting Regression Coefficients
Interpret the regression coefficients for the least-squares regression equation found in Parallel Example 2.
Solution
• Since b1 = 0.328, for every 1-unit increase in the natural logarithm of acetic acid concentration, the cheese’s taste score is expected to increase by 0.328, assuming that the hydrogen sulfide and lactic acid concentrations remain unchanged.
• Since b2 = 3.912, for every 1-unit increase in the natural logarithm of hydrogen sulfide concentration, the cheese’s taste score is expected to increase by 3.912, assuming that the acetic acid and lactic acid concentrations remain unchanged.
• Since b3 = 19.671, for every 1-unit increase in lactic acid concentration, the cheese’s taste score is expected to increase by 19.671, assuming that the hydrogen sulfide and acetic acid concentrations remain unchanged.
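The coefficients being interpreted can be recovered from the cheese data by solving the normal equations (XᵀX)b = Xᵀy. The sketch below does this in plain Python with Gaussian elimination. It stands in for the statistical software used in the text, and the names `CHEESE`, `solve`, and `fit` are chosen for illustration.

```python
# (taste, ln(Acetic), ln(H2S), Lactic) for the 30 cheese samples in the table.
CHEESE = [
    (12.3, 4.543, 3.135, 0.86), (20.9, 5.159, 5.043, 1.53),
    (39.0, 5.366, 5.438, 1.57), (47.9, 5.759, 7.496, 1.81),
    (5.6, 4.663, 3.807, 0.99), (25.9, 5.697, 7.601, 1.09),
    (37.3, 5.892, 8.726, 1.29), (21.9, 6.078, 7.966, 1.78),
    (18.1, 4.898, 3.850, 1.29), (21.0, 5.242, 4.174, 1.58),
    (34.9, 5.740, 6.142, 1.68), (57.2, 6.446, 7.908, 1.90),
    (0.7, 4.477, 2.996, 1.06), (25.9, 5.236, 4.942, 1.30),
    (54.9, 6.151, 6.752, 1.52), (40.9, 6.365, 9.588, 1.74),
    (15.9, 4.787, 3.912, 1.16), (6.4, 5.412, 4.700, 1.49),
    (18.0, 5.247, 6.174, 1.63), (38.9, 5.438, 9.064, 1.99),
    (14.0, 4.564, 4.949, 1.15), (15.2, 5.298, 5.220, 1.33),
    (32.0, 5.455, 9.242, 1.44), (56.7, 5.855, 10.199, 2.01),
    (16.8, 5.366, 3.664, 1.31), (11.6, 6.043, 3.219, 1.46),
    (26.5, 6.458, 6.962, 1.72), (0.7, 5.328, 3.912, 1.25),
    (13.4, 5.802, 6.685, 1.08), (5.5, 6.176, 4.787, 1.25),
]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit(rows):
    """Least-squares coefficients [b0, b1, b2, b3] from the normal equations."""
    design = [[1.0, acetic, h2s, lactic] for _, acetic, h2s, lactic in rows]
    y = [taste for taste, _, _, _ in rows]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

b0, b1, b2, b3 = fit(CHEESE)
print(f"taste = {b0:.2f} + {b1:.3f} Acetic + {b2:.3f} H2S + {b3:.3f} Lactic")
```

Solving the normal equations directly is fine for a small, well-scaled data set like this one. Statistical packages generally use more numerically stable factorizations.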
If the change in the mean value of the response variable y associated with a 1-unit change in one explanatory variable depends on the value of a second explanatory variable, there is interaction between the two explanatory variables. When interaction exists between two explanatory variables, x1 and x2, we introduce a term with the variable x1x2 in the regression model as an explanatory variable.
An indicator (or dummy) variable is a qualitative explanatory variable in a multiple regression model that takes on the value 0 or 1.
In general, if there are c categories for a qualitative explanatory variable, the regression model will require c − 1 indicator variables, each taking on a value of 0 or 1.
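A small sketch of this encoding follows. The category labels ("mild", "medium", "sharp") and the function name `indicator_columns` are hypothetical and only illustrate the rule: one category is chosen as the baseline and gets all zeros, and each remaining category gets its own 0/1 column, for c − 1 columns in total.

```python
def indicator_columns(values, baseline):
    """Encode a qualitative variable with c categories as c - 1 indicator columns.

    The baseline category is represented by all zeros. Returns the ordered list
    of non-baseline levels and one row of 0/1 indicators per observation.
    """
    levels = sorted(set(values) - {baseline})
    rows = [[1 if value == level else 0 for level in levels] for value in values]
    return levels, rows

# Hypothetical qualitative variable with c = 3 categories.
sharpness = ["mild", "medium", "sharp", "mild"]
levels, rows = indicator_columns(sharpness, baseline="mild")

print(levels)
for value, row in zip(sharpness, rows):
    print(value, row)
```

Using c indicators instead of c − 1 would make the columns sum to the intercept column, so the coefficients could not be estimated uniquely.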
14.3.4 Determine R² and Adjusted R²
“In Other Words”
The value of R² never decreases when one more explanatory variable is added, and it almost always increases.
CAUTION!
Never use R² to compare regression models with a different number of explanatory variables. Rather, use the adjusted R².
Parallel Example 4: Coefficient of Determination
For the regression model obtained in Parallel Example 2, determine the coefficient of determination and the adjusted R².
Regression Analysis: taste versus Acetic, H2S, Lactic
The regression equation is
taste = −28.9 + 0.33 Acetic + 3.91 H2S + 19.7 Lactic
Predictor   Coef      SE Coef   T       P
Constant    −28.88    19.74     −1.46   0.155
Acetic      0.328     4.460     0.07    0.942
H2S         3.912     1.248     3.13    0.004
Lactic      19.671    8.629     2.28    0.031
S = 10.1307   R-Sq = 65.2%   R-Sq(adj) = 61.2%
Analysis of Variance
Source           DF   SS       MS       F       P
Regression       3    4994.5   1664.8   16.22   0.000
Residual Error   26   2668.4   102.6
Total            29   7662.9
Solution
From the MINITAB output, R² = 65.2% and the adjusted R² = 61.2%.
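Both quantities can be recovered from the Analysis of Variance table in the output above. The sketch below assumes the standard definitions R² = SS(Regression)/SS(Total) and adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), with n = 30 observations and k = 3 explanatory variables. The variable names are chosen for illustration.

```python
# Sums of squares read from the Analysis of Variance table in the MINITAB output.
ss_regression = 4994.5
ss_total = 7662.9
n = 30  # number of cheese samples
k = 3   # explanatory variables: Acetic, H2S, Lactic

r_squared = ss_regression / ss_total
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

print(f"R-Sq = {100 * r_squared:.1f}%")
print(f"R-Sq(adj) = {100 * adjusted_r_squared:.1f}%")
```

The adjustment factor (n − 1)/(n − k − 1) grows with k, so the adjusted R² rises only when a new explanatory variable reduces the residual sum of squares by more than the lost degree of freedom costs.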
14.3.5 Perform an F-Test for Lack of Fit
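To test H0: β1 = β2 = ··· = βk = 0 versus H1: at least one βi ≠ 0, use the F-test statistic
$F_0 = \dfrac{\text{mean square due to regression}}{\text{mean square error}} = \dfrac{MSR}{MSE}$
with k degrees of freedom in the numerator and n − k − 1 degrees of freedom in the denominator, where k is the number of explanatory variables and n is the sample size.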
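The F-test statistic can also be written in terms of R²:
$F_0 = \dfrac{R^2}{1 - R^2} \cdot \dfrac{n - (k + 1)}{k}$
where
R² is the coefficient of determination
k is the number of explanatory variables
n is the sample size.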
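As a worked check, the F statistic for the cheese model can be computed two ways from the regression output shown earlier: as the ratio of mean squares, and from R² through the identity F0 = [R² / (1 − R²)] · [(n − (k + 1)) / k]. The sketch below assumes n = 30 and k = 3, and the function name is chosen for illustration. Both routes give a value of about 16.2, with 3 and 26 degrees of freedom.

```python
def f_from_r_squared(r_squared, n, k):
    """F statistic for H0: all slope coefficients are zero, computed from R-squared."""
    return (r_squared / (1 - r_squared)) * ((n - (k + 1)) / k)

# Values read from the Analysis of Variance table for the cheese data.
ms_regression = 1664.8
ms_error = 102.6
ss_regression = 4994.5
ss_total = 7662.9
n, k = 30, 3

f_anova = ms_regression / ms_error
f_r2 = f_from_r_squared(ss_regression / ss_total, n, k)

print(f"F from mean squares = {f_anova:.2f}")
print(f"F from R-squared    = {f_r2:.2f}")
print(f"degrees of freedom  = {k} and {n - k - 1}")
```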
Decision Rule for Testing H0: β1 = β2 = ··· = βk = 0
If the P-value is less than the level of significance, α, reject the null hypothesis. Otherwise, do not reject the null hypothesis.
“In Other Words”
The null hypothesis states that there is no linear relation between the explanatory variables and the response variable. The alternative hypothesis states that there is a linear relation between at least one explanatory variable and the response variable.
Parallel Example 5: Inference on the Regression Model
Test H0: β1 = β2 = β3 = 0 versus H1: at least one βi ≠ 0 for the multiple regression model for the cheese taste data.
Solution
We must first determine whether it is reasonable to believe that the residuals are normally distributed with no outliers.
Regression Analysis: taste versus Acetic, H2S, Lactic
The regression equation is
taste = −28.9 + 0.33 Acetic + 3.91 H2S + 19.7 Lactic
Predictor     Coef  SE Coef      T      P
Constant    −28.88    19.74  −1.46  0.155
Acetic       0.328    4.460   0.07  0.942
H2S          3.912    1.248   3.13  0.004
Lactic      19.671    8.629   2.28  0.031
S = 10.1307   R-Sq = 65.2%   R-Sq(adj) = 61.2%
Analysis of Variance
Source          DF      SS      MS      F      P
Regression       3  4994.5  1664.8  16.22  0.000
Residual Error  26  2668.4   102.6
Total           29  7662.9
Although there appears to be one outlier, the sample size is large enough for this to not be of great concern. We look at the P-value associated with the F-test statistic from the MINITAB output.
Since the P-value < 0.001, we reject H0 and conclude that at least one of the regression coefficients is different from zero.
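The size of this P-value can be sanity-checked without statistical tables. The following is a minimal simulation sketch, not the method MINITAB uses: it draws values from an F-distribution with 3 and 26 degrees of freedom (built from two chi-square variables) and counts how often they reach the observed ratio of mean squares. The function name, number of draws, and seed are arbitrary choices made for illustration.

```python
import random


def simulate_f_tail(f_obs, df_num, df_den, draws=200_000, seed=1):
    """Estimate P(F >= f_obs) by simulating F(df_num, df_den) values."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(draws):
        # A chi-square variable with d degrees of freedom is Gamma(shape=d/2, scale=2).
        num = rng.gammavariate(df_num / 2, 2) / df_num
        den = rng.gammavariate(df_den / 2, 2) / df_den
        if num / den >= f_obs:
            exceed += 1
    return exceed / draws


# Observed statistic: MSR / MSE from the ANOVA table.
f_obs = 1664.8 / 102.6
p_estimate = simulate_f_tail(f_obs, df_num=3, df_den=26)

print(f"F0 = {f_obs:.2f}, simulated P-value = {p_estimate:.6f}")
```

With a few hundred thousand draws the simulation can only show that the P-value is very small, which is all the decision rule needs here; an exact value requires the F-distribution's cumulative distribution function.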
CAUTION!
If we reject the null hypothesis that all the slope coefficients are zero, then we are saying that at least one of the slopes is different from zero, not that they all are different from zero.
14.3.6 Test Individual Regression Coefficients for Significance
Parallel Example 6: Testing the Significance of Individual Predictor Variables
Test the following hypotheses for the cheese taste data:
a) H0: β1 = 0 versus H1: β1 ≠ 0
b) H0: β2 = 0 versus H1: β2 ≠ 0
c) H0: β3 = 0 versus H1: β3 ≠ 0
Regression Analysis: taste versus Acetic, H2S, Lactic
The regression equation is
taste = −28.9 + 0.33 Acetic + 3.91 H2S + 19.7 Lactic
Predictor     Coef  SE Coef      T      P
Constant    −28.88    19.74  −1.46  0.155
Acetic       0.328    4.460   0.07  0.942
H2S          3.912    1.248   3.13  0.004
Lactic      19.671    8.629   2.28  0.031
S = 10.1307   R-Sq = 65.2%   R-Sq(adj) = 61.2%
Analysis of Variance
Source          DF      SS      MS      F      P
Regression       3  4994.5  1664.8  16.22  0.000
Residual Error  26  2668.4   102.6
Total           29  7662.9
Solution
We again use the MINITAB output.
a) The test statistic for acetic acid is 0.07 with a P-value of 0.942, so we do not reject H0.
b) The test statistic for hydrogen sulfide is 3.13 with a P-value of 0.004, so we reject H0.
c) The test statistic for lactic acid is 2.28 with a P-value of 0.031, so we reject H0.
We conclude that the natural logarithm of hydrogen sulfide concentration and the lactic acid concentration are useful predictors of taste, but the natural logarithm of acetic acid concentration is not.
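Each test statistic in the T column is the coefficient divided by its standard error. The sketch below recomputes those ratios from the Coef and SE Coef columns and applies the decision rule to the reported P-values. The level of significance α = 0.05 is an assumption (the example does not state it), and the variable names are illustrative.

```python
# (Coef, SE Coef) pairs copied from the MINITAB output.
OUTPUT = {
    "Constant": (-28.88, 19.74),
    "Acetic": (0.328, 4.460),
    "H2S": (3.912, 1.248),
    "Lactic": (19.671, 8.629),
}

# P-values reported for the three slope coefficients.
REPORTED_P = {"Acetic": 0.942, "H2S": 0.004, "Lactic": 0.031}

ALPHA = 0.05  # assumed level of significance


def t_statistic(coef, se_coef):
    """Test statistic for H0: beta_i = 0."""
    return coef / se_coef


t_values = {name: round(t_statistic(c, se), 2) for name, (c, se) in OUTPUT.items()}
decisions = {
    name: "reject H0" if p < ALPHA else "do not reject H0"
    for name, p in REPORTED_P.items()
}

for name in REPORTED_P:
    print(f"{name:8s} t = {t_values[name]:5.2f}  P = {REPORTED_P[name]:.3f}  {decisions[name]}")
```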
We refit the model using the natural logarithm of hydrogen sulfide concentration and lactic acid concentration to obtain
14.3.7 Construct Confidence and Prediction Intervals
Parallel Example 7: Constructing Confidence and Prediction Intervals
a) Construct a 95% confidence interval for the mean taste score of all cheddar cheeses whose natural logarithm of hydrogen sulfide concentration is 5.5 and whose lactic acid concentration is 1.75.
b) Construct a 95% prediction interval for the taste score of an individual cheddar cheese whose natural logarithm of hydrogen sulfide concentration is 5.5 and whose lactic acid concentration is 1.75.
Solution
Predicted Values for New Observations
New Obs    Fit  SE Fit          95% CI         95% PI
      1  28.92    3.34  (22.07, 35.76)  (7.40, 50.43)
Values of Predictors for New Observations
New Obs   H2S  Lactic
      1  5.50    1.75
Based on the MINITAB output, we are 95% confident that the mean taste score of all cheddar cheeses with ln(hydrogen sulfide) = 5.5 and a lactic acid concentration of 1.75 is between 22.07 and 35.76. We are 95% confident that the taste score of an individual cheddar cheese with ln(hydrogen sulfide) = 5.5 and a lactic acid concentration of 1.75 will be between 7.40 and 50.43.
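The prediction interval is wider than the confidence interval because an individual score varies around the mean in addition to the uncertainty in the fitted mean itself. The sketch below illustrates this with the numbers in the output. It assumes the usual forms fit ± t · (SE Fit) for the confidence interval and fit ± t · sqrt(s² + (SE Fit)²) for the prediction interval, with a critical value of about 2.0518 for 30 − 2 − 1 = 27 degrees of freedom. The standard error s of the refitted model is not shown in the output, so it is backed out from the reported prediction interval rather than taken from the source.

```python
import math

# Values copied from the MINITAB output.
FIT, SE_FIT = 28.92, 3.34
REPORTED_CI = (22.07, 35.76)
REPORTED_PI = (7.40, 50.43)

# Assumption: t critical value for 95% confidence with 27 degrees of freedom.
T_CRIT = 2.0518


def interval(center, std_error, t_crit):
    """Return (lower, upper) for center +/- t_crit * std_error."""
    margin = t_crit * std_error
    return (center - margin, center + margin)


# Confidence interval for the mean response, rebuilt from Fit and SE Fit.
ci = interval(FIT, SE_FIT, T_CRIT)

# Standard error of an individual prediction implied by the reported PI.
se_pred = (REPORTED_PI[1] - REPORTED_PI[0]) / (2 * T_CRIT)

# Since se_pred^2 = s^2 + SE_FIT^2, solve for the implied s.
implied_s = math.sqrt(se_pred**2 - SE_FIT**2)

print(f"Rebuilt 95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"Implied SE of prediction: {se_pred:.2f}")
print(f"Implied s for the refitted model: {implied_s:.2f}")
```

The rebuilt confidence interval differs from the reported one only in the second decimal place, which is consistent with Fit and SE Fit having been rounded in the output.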