![Page 1: Shonda Kuiper Grinnell College. Statistical techniques taught in introductory statistics courses typically have one response variable and one explanatory](https://reader037.vdocument.in/reader037/viewer/2022110103/56649ea05503460f94ba3562/html5/thumbnails/1.jpg)
Shonda Kuiper
Grinnell College
Comparing the two-sample t-test, ANOVA and regression
Comparing Statistical Tests
Statistical techniques taught in introductory statistics courses typically have one response variable and one explanatory variable.
[Diagram: Explanatory Variable → Response Variable]
A response variable measures the outcome of a study.
An explanatory variable explains changes in the response variable.
Comparing Statistical Tests
Each variable can be classified as either categorical or quantitative.
| | Categorical response variable | Quantitative response variable |
|---|---|---|
| Categorical explanatory variable | Chi-square test, two-proportion test | Two-sample t-test, ANOVA |
| Quantitative explanatory variable | Logistic regression | Regression |
Categorical data place individuals into one of several groups (such as red/blue/white, male/female or yes/no).
Quantitative data consist of numerical values for which most arithmetic operations make sense.
Model for a Two-sample t-test
Statistical models have the following form:
observed value = mean response + random error
$y_{ij} = \bar{y}_i + \hat{\varepsilon}_{ij}$, where i = 1,2 and j = 1,2,3,4

| Group | Observed value $y_{ij}$ | Group mean $\bar{y}_i$ | Residual $\hat{\varepsilon}_{ij}$ |
|---|---|---|---|
| Generic | 70 | 80 | -10 |
| Generic | 82 | 80 | 2 |
| Generic | 90 | 80 | 10 |
| Generic | 78 | 80 | -2 |
| Brand Name | 75 | 85 | -10 |
| Brand Name | 85 | 85 | 0 |
| Brand Name | 95 | 85 | 10 |
| Brand Name | 85 | 85 | 0 |

Generic Group: $\hat{\mu}_1 = \bar{y}_1 = (70 + 82 + 90 + 78)/4 = 80$
Brand Name Group: $\hat{\mu}_2 = \bar{y}_2 = (75 + 85 + 95 + 85)/4 = 85$
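The decomposition "observed value = group mean + residual" can be checked with a few lines of code. This is a minimal sketch that uses the eight battery values from the slide; the function name `decompose` is only illustrative.

```python
# Battery lifetimes from the slide (the units are not stated in the source).
groups = {
    "Generic": [70, 82, 90, 78],
    "Brand Name": [75, 85, 95, 85],
}


def decompose(values):
    """Return (group mean, residuals) so that each value = mean + residual."""
    mean = sum(values) / len(values)
    residuals = [v - mean for v in values]
    return mean, residuals


for name, values in groups.items():
    mean, residuals = decompose(values)
    for value, residual in zip(values, residuals):
        # Each printed row mirrors one row of the slide's table.
        print(f"{name:>10}: {value} = {mean:g} + {residual:g}")
```

Within each group the residuals sum to zero, which is what makes the group mean the natural estimate of the mean response.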
Null Hypothesis: the two groups of batteries last the same amount of time
[Figure: the sample means $\bar{y}_1 = 80$ and $\bar{y}_2 = 85$ shown as estimates of the population means μ1 and μ2]
Model for a Two-Sample t-test
The theoretical model used in the two-sample t-test is designed to account for these two group means (µ1 and µ2) and random error.
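Null Hypothesis: $H_0: \mu_1 = \mu_2$
Alternative Hypothesis: $H_a: \mu_1 \neq \mu_2$
observed value = mean response + random error
Theoretical model: $Y_{ij} = \mu_i + \varepsilon_{ij}$, where i = 1,2 and j = 1,2,3,4
Fitted model: $y_{ij} = \bar{y}_i + \hat{\varepsilon}_{ij}$, where i = 1,2 and j = 1,2,3,4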
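As an illustration of how the test statistic follows from this model, the sketch below computes the equal-variance (pooled) two-sample t statistic for the battery data by hand. The pooled form is an assumption here, chosen because it is the version that matches ANOVA and regression; the slides do not show the calculation.

```python
import math

generic = [70, 82, 90, 78]
brand = [75, 85, 95, 85]


def pooled_t(sample1, sample2):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = sum(sample1) / n1, sum(sample2) / n2
    # Sums of squared residuals around each group mean.
    ss1 = sum((y - mean1) ** 2 for y in sample1)
    ss2 = sum((y - mean2) ** 2 for y in sample2)
    df = n1 + n2 - 2
    pooled_var = (ss1 + ss2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


t_stat, df = pooled_t(generic, brand)
print(f"t = {t_stat:.4f} on {df} degrees of freedom")  # t = -0.8575 on 6 degrees of freedom
```

A statistic this close to zero gives little evidence against the null hypothesis; a p-value would come from the t distribution with 6 degrees of freedom.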
Model for ANOVA
ANOVA: Instead of using two group means, we break the mean response into a grand mean, $\mu$, and two group effects ($\alpha_1$ and $\alpha_2$).
$y_{ij} = \bar{y} + \hat{\alpha}_i + \hat{\varepsilon}_{ij}$, where i = 1,2 and j = 1,2,3,4

| Observed value $y_{ij}$ | Grand mean $\bar{y}$ | Group effect $\hat{\alpha}_i$ | Residual $\hat{\varepsilon}_{ij}$ |
|---|---|---|---|
| 70 | 82.5 | -2.5 | -10 |
| 82 | 82.5 | -2.5 | 2 |
| 90 | 82.5 | -2.5 | 10 |
| 78 | 82.5 | -2.5 | -2 |
| 75 | 82.5 | 2.5 | -10 |
| 85 | 82.5 | 2.5 | 0 |
| 95 | 82.5 | 2.5 | 10 |
| 85 | 82.5 | 2.5 | 0 |

$\hat{\mu} = \bar{y} = (70 + 82 + 90 + 78 + 75 + 85 + 95 + 85)/8 = 82.5$
$\hat{\alpha}_1 = \bar{y}_1 - \bar{y} = 80 - 82.5 = -2.5$
$\hat{\alpha}_2 = \bar{y}_2 - \bar{y} = 85 - 82.5 = 2.5$
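The same data can be run through the ANOVA decomposition in code. This sketch computes the grand mean, the two group effects and the one-way F statistic; the variable names are illustrative and the F calculation is the standard one-way formula, which the slides do not spell out.

```python
groups = {"Generic": [70, 82, 90, 78], "Brand Name": [75, 85, 95, 85]}

all_values = [y for values in groups.values() for y in values]
grand_mean = sum(all_values) / len(all_values)

effects = {}
ss_between = 0.0  # variation explained by the group effects
ss_within = 0.0   # variation left in the residuals
for name, values in groups.items():
    group_mean = sum(values) / len(values)
    effects[name] = group_mean - grand_mean
    ss_between += len(values) * effects[name] ** 2
    ss_within += sum((y - group_mean) ** 2 for y in values)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)
f_stat = (ss_between / df_between) / (ss_within / df_within)

print(f"grand mean = {grand_mean}, effects = {effects}")
print(f"F = {f_stat:.4f} on ({df_between}, {df_within}) degrees of freedom")
```

The group effects come out as -2.5 and 2.5, matching the table, and they sum to zero because the two groups are the same size.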
[Figure: the sample means 80 and 85 (estimating μ1 and μ2), the grand mean 82.5, and the estimated group effects -2.5 and 2.5]
Model for ANOVA
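Null Hypothesis: $H_0: \mu_1 = \mu_2$ (equivalently, $\alpha_1 = \alpha_2 = 0$)
Alternative Hypothesis: $H_a: \mu_1 \neq \mu_2$
observed value = mean response + random error
Two-sample t-test: $Y_{ij} = \mu_i + \varepsilon_{ij}$, where i = 1,2 and j = 1,2,3,4
ANOVA: $Y_{ij} = \{\mu + \alpha_i\} + \varepsilon_{ij}$, where i = 1,2 and j = 1,2,3,4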
Model for Regression
Regression: Instead of using two group means, we create a model for a straight line (using $\beta_0$ and $\beta_1$).
observed value = mean response + random error
Two-sample t-test: $Y_{ij} = \mu_i + \varepsilon_{ij}$, where i = 1,2 and j = 1,2,3,4
Regression: $Y_i = \{\beta_0 + \beta_1 X_i\} + \varepsilon_i$, where i = 1,2,…,8
$X_i$ is either 0 or 1.
When $X_i = 0$, the mean response is $\beta_0 = \mu_1$; when $X_i = 1$, the mean response is $\beta_0 + \beta_1 = \mu_2$.
$H_0: \beta_1 = \mu_2 - \mu_1 = 0$
Model for Regression
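Regression: Instead of using two group means, we create a model for a straight line (using $\beta_0$ and $\beta_1$).
$y_i = b_0 + b_1 X_i + \hat{\varepsilon}_i$, where i = 1,2,…,8

| Observed value $y_i$ | $b_0$ | $b_1 X_i$ | Residual $\hat{\varepsilon}_i$ |
|---|---|---|---|
| 70 | 80 | 0 | -10 |
| 82 | 80 | 0 | 2 |
| 90 | 80 | 0 | 10 |
| 78 | 80 | 0 | -2 |
| 75 | 80 | 5 | -10 |
| 85 | 80 | 5 | 0 |
| 95 | 80 | 5 | 10 |
| 85 | 80 | 5 | 0 |

$b_0 = \bar{y}_1 = 80$
$b_1 = \bar{y}_2 - \bar{y}_1 = 85 - 80 = 5$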
Model for Regression
Regression: Instead of using two group means, we create a model for a straight line (using $\beta_0$ and $\beta_1$).
Fitted values: $\hat{y}_i = b_0 + b_1 X_i$, where i = 1,2,…,8

| Fitted value $\hat{y}_i$ | $b_0$ | $b_1 X_i$ |
|---|---|---|
| 80 | 80 | 0 |
| 80 | 80 | 0 |
| 80 | 80 | 0 |
| 80 | 80 | 0 |
| 85 | 80 | 5 |
| 85 | 80 | 5 |
| 85 | 80 | 5 |
| 85 | 80 | 5 |

The equation for the line is often written as: $\hat{y} = 80 + 5X$
Comparing the Two-sample t-test, Regression and ANOVA
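When there are only two groups (and we have the same assumptions), all three models are algebraically equivalent.
Two-sample t-test: $Y_{ij} = \mu_i + \varepsilon_{ij}$, where i = 1,2 and j = 1,2,3,4
ANOVA: $Y_{ij} = \{\mu + \alpha_i\} + \varepsilon_{ij}$, where i = 1,2 and j = 1,2,3,4
Regression: $Y_i = \{\beta_0 + \beta_1 X_i\} + \varepsilon_i$, where i = 1,2,…,8
$H_0: \mu_1 = \mu_2$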
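The equivalence can be checked numerically on the battery data. The sketch below computes the pooled t statistic, the one-way ANOVA F statistic and the t statistic for the regression slope by hand. It assumes equal variances and the 0/1 coding Generic = 0, Brand Name = 1, neither of which is stated explicitly in the slides.

```python
import math

generic = [70, 82, 90, 78]
brand = [75, 85, 95, 85]
y = generic + brand
x = [0] * len(generic) + [1] * len(brand)  # assumed 0/1 coding
n = len(y)


def mean(values):
    return sum(values) / len(values)


# Error sum of squares around the group means. With a 0/1 explanatory
# variable the regression fitted values are the group means, so all
# three models share this quantity.
sse = (sum((v - mean(generic)) ** 2 for v in generic)
       + sum((v - mean(brand)) ** 2 for v in brand))
mse = sse / (n - 2)

# 1. Two-sample t-test (pooled variance), Brand Name minus Generic.
t_two_sample = (mean(brand) - mean(generic)) / math.sqrt(
    mse * (1 / len(generic) + 1 / len(brand)))

# 2. One-way ANOVA F statistic (one degree of freedom between groups).
grand = mean(y)
ss_between = (len(generic) * (mean(generic) - grand) ** 2
              + len(brand) * (mean(brand) - grand) ** 2)
f_anova = ss_between / mse

# 3. Regression: t statistic for the slope.
sxx = sum((xi - mean(x)) ** 2 for xi in x)
slope = sum((xi - mean(x)) * (yi - grand) for xi, yi in zip(x, y)) / sxx
t_slope = slope / math.sqrt(mse / sxx)

print(f"two-sample t = {t_two_sample:.4f}")  # 0.8575
print(f"slope t      = {t_slope:.4f}")       # 0.8575
print(f"ANOVA F      = {f_anova:.4f}")       # 0.7353, the square of t
```

The two t statistics agree and the F statistic is their square, so all three tests lead to the same p-value for a two-sided alternative.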
Shonda Kuiper
Grinnell College
Introduction to Multiple Regression: Hypothesis Tests and R²
Goals of Multiple Regression
• Multiple regression analysis can be used to serve different goals. The goals will influence the type of analysis that is conducted. The most common goals of multiple regression are to:
• Describe: A model may be developed to describe the relationship between multiple explanatory variables and the response variable.
• Predict: A regression model may be used to generalize to observations outside the sample.
• Confirm: Theories are often developed about which variables or combination of variables should be included in a model. Hypothesis tests can be used to evaluate the relationship between the explanatory variables and the response.
Introduction to Multiple Regression
• Build a multiple regression model to predict retail price of cars
• Price = 35738 - 0.22 Mileage    R-Sq: 4.1%
• Slope coefficient (b1): t = -2.95 (p-value = 0.004)
Questions:
• What happens to Price as Mileage increases?
• Since b1 = -0.22 is small, can we conclude it is unimportant?
• Does mileage help you predict price? What does the p-value tell you?
• Does mileage help you predict price? What does the R-Sq value tell you?
• Are there outliers or influential observations?
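One way to approach the question about the size of b1 is to plug realistic mileages into the fitted equation. The sketch below does that; the units (dollars and miles) and the example mileages are assumptions, since the slides do not state them.

```python
# Coefficients from the slide: Price = 35738 - 0.22 Mileage
INTERCEPT = 35738.0
SLOPE = -0.22


def predicted_price(mileage):
    """Predicted retail price for a given mileage under the fitted line."""
    return INTERCEPT + SLOPE * mileage


# Hypothetical mileages, chosen only for illustration.
low, high = 10_000, 20_000
change = predicted_price(high) - predicted_price(low)
print(f"Predicted price at {low} miles: {predicted_price(low):.0f}")
print(f"Predicted price at {high} miles: {predicted_price(high):.0f}")
print(f"Change for 10,000 extra miles: {change:.0f}")
```

A slope of -0.22 per unit of mileage looks small, but over 10,000 units it corresponds to a predicted drop of about 2,200 in price, so the magnitude of a coefficient has to be read against the scale of its variable.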
What is R2?
What happens when all the points fall on the regression line?
What happens when the regression line does not help us estimate Y?
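The formulas on these slides were images and did not survive in the transcript. The sketch below assumes the standard definition R² = 1 - SSE/SST and uses the battery data from the first presentation to show the two extreme cases the slides ask about, plus the fit that uses the two group means.

```python
def r_squared(y, fitted):
    """R^2 = 1 - SSE/SST (standard definition, assumed to match the slides)."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation
    return 1 - sse / sst


y = [70, 82, 90, 78, 75, 85, 95, 85]

# All points fall on the line: fitted values equal observed values, SSE = 0.
perfect = r_squared(y, y)

# The line does not help: every fitted value is just the overall mean.
flat = r_squared(y, [82.5] * len(y))

# Fit from the two group means (80 and 85), as in the first presentation.
group_fit = r_squared(y, [80] * 4 + [85] * 4)

print(perfect, flat, round(group_fit, 4))  # 1.0 0.0 0.1092
```

R² therefore runs from 0, when the line explains none of the variation in Y, to 1, when it explains all of it.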
Adjusted R²
• R²adj includes a penalty when more terms are included in the model.
• n is the sample size and p is the number of coefficients (including the constant term): β0, β1, β2, β3, …, βp-1
• When many terms are in the model: p is larger, so (n - 1)/(n - p) is larger and R²adj is smaller.
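The formula itself is missing from the transcript. The sketch below assumes the usual form, R²adj = 1 - (1 - R²)(n - 1)/(n - p), which is consistent with the (n - 1)/(n - p) factor mentioned above, and applies it to the battery example from the first presentation (n = 8, p = 2).

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R^2, where p counts all coefficients including the constant.

    Assumes the usual form 1 - (1 - R^2)(n - 1)/(n - p); the slide's own
    formula was an image and is not in the transcript.
    """
    if n <= p:
        raise ValueError("need more observations than coefficients")
    return 1 - (1 - r2) * (n - 1) / (n - p)


# Battery example: SSE = 408 and SST = 458, so R^2 = 50/458.
r2 = 50 / 458
adj = adjusted_r_squared(r2, n=8, p=2)
print(f"R^2 = {r2:.4f}, adjusted R^2 = {adj:.4f}")  # R^2 = 0.1092, adjusted R^2 = -0.0393

# Same R^2 and n, more coefficients: the penalty grows.
print(adjusted_r_squared(0.5, n=100, p=2) > adjusted_r_squared(0.5, n=100, p=5))  # True
```

Unlike R², the adjusted value can be negative, which happens here because one explanatory variable explains very little with only eight observations.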
Shonda Kuiper
Grinnell College
Introduction to Multiple Regression: Variable Selection
Variable Selection Techniques
• Build a multiple regression model to predict retail price of cars
[Figure: Scatterplot of Price vs Mileage, R² = 2%]
Explanatory variables: Mileage, Cylinder, Liter, Leather, Cruise, Doors, Sound
Price = 6759 + 6289 Cruise + 3792 Cyl - 1543 Doors + 3349 Leather - 787 Liter - 0.17 Mileage - 1994 Sound
R² = 44.6%
Introduction to Multiple Regression
Step Forward Regression (Forward Selection):
Which single explanatory variable best predicts Price?
Price = 13921.9 + 9862.3 Cruise    R² = 18.56%
Price = -17.06 + 4054.2 Cyl    R² = 32.39%
Price = 24764.6 - 0.17 Mileage    R² = 2.04%
Price = 6185.8.6 + 4990.4 Liter    R² = 31.15%
Price = 23130.1 - 2631.4 Sound    R² = 1.55%
Price = 18828.8 + 3473.46 Leather    R² = 2.47%
Price = 27033.6 - 1613.2 Doors    R² = 1.93%
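The first step of forward selection amounts to picking the single-variable model with the largest R². A minimal sketch using the R² values listed above:

```python
# R^2 (in percent) of each single-variable model, copied from the slide.
single_r2 = {
    "Cruise": 18.56,
    "Cyl": 32.39,
    "Mileage": 2.04,
    "Liter": 31.15,
    "Sound": 1.55,
    "Leather": 2.47,
    "Doors": 1.93,
}

# Rank the candidates from best to worst single predictor.
ranked = sorted(single_r2, key=single_r2.get, reverse=True)
best = ranked[0]

print("Ranking:", ranked)
print("First variable to enter the model:", best)  # Cyl
```

Cyl enters first, with Liter a close second. Two such similar candidates often carry overlapping information, which is worth keeping in mind when the second variable is chosen.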
Introduction to Multiple Regression
Step Forward Regression:
Which combination of two terms best predicts Price?
Price = -17.06 + 4054.2 Cyl    R² = 32.39%
Price = -1046.4 + 3392.6 Cyl + 6000.4 Cruise    R² = 38.4% (38.2%)
Price = 3145.8 + 4027.6 Cyl - 0.152 Mileage    R² = 34% (33.8%)
Price = 1372.4 + 2976.4 Cyl + 1412.2 Liter    R² = 32.6% (32.4%)
Introduction to Multiple Regression
Step Forward Regression:
Which combination of terms best predicts Price?
Price = -17.06 + 4054.2 Cyl    R² = 32.39%
Price = -1046.4 + 3393 Cyl + 6000.4 Cruise    R² = 38.4% (38.2%)
Price = -2978.4 + 3276 Cyl + 6362 Cruise + 3139 Leather    R² = 40.4% (40.2%)
Price = 412.6 + 3233 Cyl + 6492 Cruise + 3162 Leather - 0.17 Mileage    R² = 42.3% (42%)
Price = 5530.3 + 3258 Cyl + 6320 Cruise + 2979 Leather - 0.17 Mileage - 1402 Doors    R² = 43.7% (43.3%)
Price = 7323.2 + 3200 Cyl + 6206 Cruise + 3327 Leather - 0.17 Mileage - 1463 Doors - 2024 Sound    R² = 44.6% (44.15%)
Price = 6759 + 3792 Cyl + 6289 Cruise + 3349 Leather - 787 Liter - 0.17 Mileage - 1543 Doors - 1994 Sound    R² = 44.6% (44.14%)
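The sequence of models above can be summarized by how much each added term raises R². The sketch below assumes that the values in parentheses are adjusted R² and uses "stop when adjusted R² no longer increases" as the stopping rule; both are assumptions, since the slides do not label the parenthesized values or state a rule.

```python
# (term added, R^2 %, value in parentheses %) for each step on the slide.
# The parenthesized values are assumed to be adjusted R^2.
steps = [
    ("Cyl", 32.39, None),
    ("Cruise", 38.4, 38.2),
    ("Leather", 40.4, 40.2),
    ("Mileage", 42.3, 42.0),
    ("Doors", 43.7, 43.3),
    ("Sound", 44.6, 44.15),
    ("Liter", 44.6, 44.14),
]

# Gain in R^2 from each added term (percentage points).
gains = [(term, round(r2 - prev_r2, 2))
         for (term, r2, _), (_, prev_r2, _) in zip(steps[1:], steps[:-1])]

# Assumed stopping rule: keep adding terms while adjusted R^2 increases.
kept = []
previous_adj = None
for term, r2, adj in steps:
    if previous_adj is not None and adj is not None and adj <= previous_adj:
        break
    kept.append(term)
    if adj is not None:
        previous_adj = adj

print("Gains:", gains)
print("Terms kept:", kept)
```

Under these assumptions Liter is left out: it adds nothing visible to R² once Cyl is in the model, and the parenthesized value drops from 44.15% to 44.14%.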
Introduction to Multiple Regression
Step Backward Regression (Backward Elimination):
Price = 7323.2 + 3200 Cyl + 6206 Cruise + 3327 Leather - 0.17 Mileage - 1463 Doors - 2024 Sound    R² = 44.6% (44.15%)
Price = 6759 + 3792 Cyl + 6289 Cruise + 3349 Leather - 787 Liter - 0.17 Mileage - 1543 Doors - 1994 Sound    R² = 44.6% (44.14%)
Other criteria, such as the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and Mallows' Cp, are often used to find the best model.
Bidirectional stepwise procedures
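The stepwise procedures share a greedy loop: fit every candidate model, keep the best, repeat. The sketch below shows a forward version in plain Python on a small synthetic data set. The data, the variable names x1 to x3 and the use of R² as the criterion are all assumptions for illustration; it is not the car data and not necessarily the rule used by the software behind the slides.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(columns, y):
    """R^2 of the least-squares fit of y on an intercept plus the given columns."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(design, y)) for a in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, row)) for row in design]
    y_bar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst


def forward_selection(candidates, y):
    """Greedily add the variable that raises R^2 the most; return [(name, R^2), ...]."""
    chosen, path = [], []
    remaining = dict(candidates)
    while remaining:
        scores = {name: r_squared([candidates[c] for c in chosen] + [col], y)
                  for name, col in remaining.items()}
        best = max(scores, key=scores.get)
        chosen.append(best)
        path.append((best, scores[best]))
        del remaining[best]
    return path


# Synthetic data (hypothetical): y depends on x1 and x2 but not on x3.
candidates = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [1, 0, 1, 0, 1, 0, 1, 0],
    "x3": [3, 1, 4, 1, 5, 9, 2, 6],
}
noise = [0.1, -0.2, 0.05, 0.1, -0.1, 0.2, -0.05, -0.1]
y = [2 * a + 5 * b + e for a, b, e in zip(candidates["x1"], candidates["x2"], noise)]

path = forward_selection(candidates, y)
for name, r2 in path:
    print(f"add {name}: R^2 = {r2:.4f}")
```

This version adds every variable and only reports the order; a practical procedure would stop using a criterion such as adjusted R², AIC, BIC or Mallows' Cp. Backward elimination runs the same loop in reverse, starting from the full model and dropping the least useful term.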
Introduction to Multiple Regression
Best Subsets Regression:
Here we see that Liter is the second best single predictor of price.
Introduction to Multiple Regression
Important Cautions:
• Stepwise regression techniques can often ignore very important explanatory variables. Best subsets is often preferable.
• Both best subsets and stepwise regression methods only consider linear relationships between the response and explanatory variables.
• Residual graphs are still essential in validating whether the model is appropriate.
• Transformations, interactions and quadratic terms can often improve the model.
• Whenever these iterative variable selection techniques are used, the p-values corresponding to the significance of each individual coefficient are not reliable.