Chapter 10, Part 2
Linear Regression
Predictions with Scatterplots
• Last Time: A scatterplot gives a picture of the relationship between two quantitative variables.
• One variable is explanatory, and the other is the response.
• Today: If we know the value of the explanatory variable, can we predict the value of the response variable?
The Regression Line
• To make predictions, we’ll find a straight line that is the “best fit” for the points in the scatterplot. This is not so simple….
[Scatterplot: Exam 2 score (40–100) vs. Exam 1 score (20–110)]
Regression Line in JMP
• Start by making a scatterplot.
• Red Triangle menu -> “Fit Line.”
• The equation of the regression line appears under the “Linear Fit” group.
• JMP uses column headings as variable names (instead of x and y).
• Example from the Cars 1993 file: MaxPrice = 2.3139014 + 1.1435971*MinPrice
Predicted Values
• We use the equation of the regression line to make predictions about:
– Individuals not in the original data set.
– Later measurements of the same individuals.
• Example: In 1994, a vehicle had a Min. Price of $15,000. Use the previous data to predict the Max. Price.
• You can do this by hand from the equation: MaxPrice = 2.3139014 + 1.1435971*MinPrice
• 2.3139014 + 1.1435971*(15) = 19.4678579, so the predicted Max. Price is about $19,468.
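The calculation above can be sketched in Python (not JMP; the coefficients are the ones from the JMP output on this slide, with prices in $1000s):

```python
# Regression equation from the JMP "Linear Fit" output (Cars 1993):
#   MaxPrice = 2.3139014 + 1.1435971 * MinPrice   (both in $1000s)
intercept = 2.3139014
slope = 1.1435971

def predict_max_price(min_price_thousands):
    """Predicted Max. Price ($1000s) for a given Min. Price ($1000s)."""
    return intercept + slope * min_price_thousands

# A 1994 vehicle with Min. Price = $15,000, so x = 15:
print(predict_max_price(15))  # about 19.4678579, i.e. roughly $19,468
```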
Are the Predictions Useful?
• In some cases, the regression line gives more reliable predictions than in others. Consider the following examples (from Cars 1993):
Coefficient of Determination
• If the scatterplot is well-approximated by a straight line, the regression equation is more useful for making predictions.
• Correlation is one measure of this.
• The square of the correlation has a more intuitive meaning: What proportion of variation in the Response Variable is explained by variation in the Explanatory Variable?
JMP: “RSquare” under “Summary of Fit”
• In predicting Max. Price from Min. Price, we had RSquare = 0.822202.
• About 82% of the variation in Max. Price is explained by variation in Min. Price.
• In predicting Highway MPG from Engine Size, we have RSquare = 0.392871.
• Only about 39% of the variation in Highway MPG is explained by variation in Engine Size.
• RSquare takes values from 0 to 1.
• For values close to 0, the regression line is not very useful for predictions.
• For values close to 1, the regression line is more useful for making predictions.
• RSquare makes no distinction between positive and negative association of variables.
Residuals
• For each individual in the data set we can compute the difference (error) between the actual and predicted values of the response variable. This difference is called a residual:
Residual = (actual value) – (predicted value)
• In JMP: Click the red triangle by “Linear Fit” and select “Save Residuals” from the drop-down menu. You can also “Plot Residuals.”
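The definition can be sketched directly in Python (hypothetical line and data, not JMP’s saved residuals column):

```python
# Hypothetical regression line and data (prices in $1000s)
intercept, slope = 2.31, 1.14
min_price = [10.0, 15.0, 20.0]   # actual x values
max_price = [14.0, 19.0, 26.0]   # actual y values

predicted = [intercept + slope * x for x in min_price]

# Residual = (actual value) - (predicted value), one per individual
residuals = [actual - pred for actual, pred in zip(max_price, predicted)]
print(residuals)
```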
How does JMP find the Regression Line?
• JMP uses the most popular method, Ordinary Least Squares (OLS).
• To measure how well a given line fits the data:
– Compute all residuals and square each one.
– Add up the squares to get a “total error.”
• The closer this total is to zero, the better the line fits the data. OLS chooses the line with the smallest “total error.”
• (Thankfully) JMP takes care of the details.
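The “total error” is the sum of squared residuals. A sketch using numpy’s least-squares fit as a stand-in for JMP (hypothetical data): no other line can beat the OLS line on this total.

```python
import numpy as np

# Hypothetical data
x = np.array([10.0, 15.0, 20.0, 25.0, 30.0])
y = np.array([14.0, 19.0, 26.0, 31.0, 35.0])

# Degree-1 least-squares fit; coefficients come highest power first
slope, intercept = np.polyfit(x, y, deg=1)

def total_error(b0, b1):
    """Sum of squared residuals for the line y = b0 + b1*x."""
    return float(np.sum((y - (b0 + b1 * x)) ** 2))

# The OLS line has a smaller total error than any perturbed line:
print(total_error(intercept, slope) <= total_error(intercept + 0.1, slope))
```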
Limitations of Correlation and Linear Regression:
• Both describe linear relationships only.
• Both are sensitive to outliers.
• Beware of extrapolation: predicting outside the given range of the explanatory variable.
• Beware of lurking variables: other factors that may explain a strong correlation.
• Correlation does not imply causality!
Beware Extrapolation!
• A child’s height was plotted against her age...
• Can you predict her height at age 8 (96 months)?
• Can you predict her height at age 30 (360 months)?
[Scatterplot: height (cm), 80–100, vs. age (months), 30–65]
• Regression line: y = 71.95 + 0.383x
• Height at 96 months? y ≈ 108.7 cm (about 3' 7")
• Height at 360 months? y ≈ 209.8 cm (about 6' 10")
• Height at birth (x = 0)? y = 71.95 cm (about 2' 4")
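Plugging ages into the fitted line (coefficients from the slide) shows why extrapolation fails:

```python
# Regression line from the slide: height (cm) vs. age (months)
intercept, slope = 71.95, 0.383

def height(age_months):
    return intercept + slope * age_months

print(height(96))   # about 108.7 cm: plausible for age 8
print(height(360))  # about 209.8 cm: nearly 7 feet at age 30, clearly wrong
print(height(0))    # 71.95 cm at birth, also unrealistic
```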
[Scatterplot: height (cm), 70–210, vs. age (months), 30–390]
Beware Lurking Variables!
• Although there may be a strong correlation (statistical relationship) between two variables, there might not be a direct practical (cause-and-effect) relationship.
• A lurking variable is a third variable (not in the scatterplot) that might cause the apparent relationship between explanatory and response variables.
Example: Pizza vs. Subway Fare
The scatterplot for these data shows a strong correlation (0.9878) between the cost of:
• A slice of pizza
• Subway fare
Q: Does the price of pizza affect the subway fare?
Caution: Correlation Does Not Imply Causation
• In a study of emergency services, it was noted that larger fires tend to have more firefighters present.
• Suppose we used:
– Explanatory Variable: Number of firefighters
– Response Variable: Size of the fire
• We would expect a strong correlation.
• But it’s ludicrous to conclude that having more firefighters present causes the fire to be larger.