Chapter 10, Part 2
Linear Regression
Predictions with Scatterplots
• Last Time: A scatterplot gives a picture of the relationship between two quantitative variables.
• One variable is explanatory, and the other is the response.
• Today: If we know the value of the explanatory variable, can we predict the value of the response variable?
The Regression Line
• To make predictions, we’ll find a straight line that is the “best fit” for the points in the scatterplot. This is not so simple….
[Scatterplot: Exam 2 score (40–100) vs. Exam 1 score (20–110)]
Regression Line in JMP
• Start by making a scatterplot.
• Red Triangle menu -> “Fit Line.”
• The equation of the regression line appears under the “Linear Fit” group.
• JMP uses column headings as variable names (instead of x and y).
• Example from the Cars 1993 file: MaxPrice = 2.3139014 + 1.1435971*MinPrice
Predicted Values
• We use the equation of the regression line to make predictions about:
– Individuals not in the original data set.
– Later measurements of the same individuals.
• Example: In 1994, a vehicle had a Min. Price of $15,000. Use the previous data to predict the Max. Price.
• You can do this by hand from the equation: MaxPrice = 2.3139014 + 1.1435971*MinPrice
• 2.3139014 + 1.1435971*(15) = 19.4678579, so the predicted Max. Price is about $19,468.
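The calculation above can be sketched in Python (not JMP; the coefficients are the ones from the JMP output on this slide, with prices in $1000s):

```python
# Regression equation from the JMP "Linear Fit" output (Cars 1993):
#   MaxPrice = 2.3139014 + 1.1435971 * MinPrice   (both in $1000s)
intercept = 2.3139014
slope = 1.1435971

def predict_max_price(min_price_thousands):
    """Predicted Max. Price ($1000s) for a given Min. Price ($1000s)."""
    return intercept + slope * min_price_thousands

# A 1994 vehicle with Min. Price = $15,000, so x = 15:
print(predict_max_price(15))  # about 19.4678579, i.e. roughly $19,468
```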
Are the Predictions Useful?
• In some cases, the regression line gives more reliable predictions than in others. Consider the following examples (from Cars 1993):
Coefficient of Determination
• If the scatterplot is well-approximated by a straight line, the regression equation is more useful for making predictions.
• Correlation is one measure of this.
• The square of the correlation has a more intuitive meaning: What proportion of variation in the Response Variable is explained by variation in the Explanatory Variable?
JMP: “RSquare” under “Summary of Fit”
• In predicting Max. Price from Min. Price, we had RSquare = 0.822202.
• About 82% of the variation in Max. Price is explained by variation in Min. Price.
• In predicting Highway MPG from Engine Size, we have RSquare = 0.392871.
• Only about 39% of the variation in Highway MPG is explained by variation in Engine Size.
• RSquare takes values from 0 to 1.
• For values close to 0, the regression line is not very useful for predictions.
• For values close to 1, the regression line is more useful for making predictions.
• RSquare makes no distinction between positive and negative association of variables.
Residuals
• For each individual in the data set we can compute the difference (error) between the actual and predicted values of the response variable. This difference is called a residual:
Residual = (actual value) – (predicted value)
• In JMP: Click the red triangle by “Linear Fit” and select “Save Residuals” from the drop-down menu. You can also “Plot Residuals.”
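The definition can be sketched directly in Python (hypothetical line and data, not JMP’s saved residuals column):

```python
# Hypothetical regression line and data (prices in $1000s)
intercept, slope = 2.31, 1.14
min_price = [10.0, 15.0, 20.0]   # actual x values
max_price = [14.0, 19.0, 26.0]   # actual y values

predicted = [intercept + slope * x for x in min_price]

# Residual = (actual value) - (predicted value), one per individual
residuals = [actual - pred for actual, pred in zip(max_price, predicted)]
print(residuals)
```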
How does JMP find the Regression Line?
• JMP uses the most popular method, Ordinary Least Squares (OLS).
• To measure how well a given line fits the data:
– Compute all residuals and square each one.
– Add up the squares to get a “total error.”
• The closer this total is to zero, the better the line fits the data. OLS chooses the line with the smallest “total error.”
• (Thankfully) JMP takes care of the details.
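The “total error” is the sum of squared residuals. A sketch using numpy’s least-squares fit as a stand-in for JMP (hypothetical data): no other line can beat the OLS line on this total.

```python
import numpy as np

# Hypothetical data
x = np.array([10.0, 15.0, 20.0, 25.0, 30.0])
y = np.array([14.0, 19.0, 26.0, 31.0, 35.0])

# Degree-1 least-squares fit; coefficients come highest power first
slope, intercept = np.polyfit(x, y, deg=1)

def total_error(b0, b1):
    """Sum of squared residuals for the line y = b0 + b1*x."""
    return float(np.sum((y - (b0 + b1 * x)) ** 2))

# The OLS line has a smaller total error than any perturbed line:
print(total_error(intercept, slope) <= total_error(intercept + 0.1, slope))
```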
Limitations of Correlation and Linear Regression:
• Both describe linear relationships only.
• Both are sensitive to outliers.
• Beware of extrapolation: predicting outside the given range of the explanatory variable.
• Beware of lurking variables: other factors that may explain a strong correlation.
• Correlation does not imply causality!
Beware Extrapolation!
• A child’s height was plotted against her age...
• Can you predict her height at age 8 (96 months)?
• Can you predict her height at age 30 (360 months)?
[Scatterplot: height (cm), 80–100, vs. age (months), 30–65]
• Regression line: y = 71.95 + 0.383x
• Height at 96 months? y ≈ 108.7 cm (about 3' 7")
• Height at 360 months? y ≈ 209.8 cm (about 6' 10")
• Height at birth (x = 0)? y = 71.95 cm (about 2' 4")
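Plugging ages into the fitted line (coefficients from the slide) shows why extrapolation fails:

```python
# Regression line from the slide: height (cm) vs. age (months)
intercept, slope = 71.95, 0.383

def height(age_months):
    return intercept + slope * age_months

print(height(96))   # about 108.7 cm: plausible for age 8
print(height(360))  # about 209.8 cm: nearly 7 feet at age 30, clearly wrong
print(height(0))    # 71.95 cm at birth, also unrealistic
```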
[Scatterplot: height (cm), 70–210, vs. age (months), 30–390]
Beware Lurking Variables!
• Although there may be a strong correlation (statistical relationship) between two variables, there might not be a direct practical (cause-and-effect) relationship.
• A lurking variable is a third variable (not in the scatterplot) that might cause the apparent relationship between explanatory and response variables.
Example: Pizza vs. Subway Fare
The scatterplot for these data shows a strong correlation (0.9878) between the cost of:
• A slice of pizza
• Subway fare
Q: Does the price of pizza affect the subway fare?
Caution: Correlation Does Not Imply Causation
• In a study of emergency services, it was noted that larger fires tend to have more firefighters present.
• Suppose we used:
– Explanatory Variable: Number of firefighters
– Response Variable: Size of the fire
• We would expect a strong correlation.
• But it’s ludicrous to conclude that having more firefighters present causes the fire to be larger.