Regression Wisdom Getting to Know Your Scatterplot and Residuals

Upload: solomon-stokes

Post on 18-Dec-2015

Page 1: Regression Wisdom Getting to Know Your Scatterplot and Residuals

Regression Wisdom

Getting to Know Your Scatterplot and Residuals

Important Terms

- Extrapolation (p. 203)

- Outlier (p. 205)

- Leverage (p. 206)

- Influential Point (p. 206)

- Lurking Variable (p. 208)

Residuals

- Recall: residuals are the differences between the observed data values and the corresponding values predicted by the regression model.

- Residual = Observed value − Predicted value, or $e = y - \hat{y}$

(page 172)

When Residuals Aren’t Random

- We want our plot of residuals to be boring.

- It should show no direction, no shape, and no other structure.

- When it does show structure, something else is going on in the data that helps explain the variation in the two variables.

Sifting Residuals for Groups

- We can form subsets (groups) within the same population to get a better analysis of the data.

- Often the easiest way to find such groups is to examine a plot or histogram of the residuals.

Sifting and Subsets

- You can perform a separate regression analysis on each subset of the larger population, noting the correlation and all appropriate summary statistics for each subset.

Extrapolation

- Our linear model is $\hat{y} = b_0 + b_1 x$: plug in a new x, and it gives you a predicted $\hat{y}$.

- But the farther the new x-value is from $\bar{x}$, the less trust we can place in the predicted y-value.

- Once we venture into new x territory, such a prediction is called an extrapolation.

- Extrapolations require the very questionable assumption that nothing about the relationship between x and y changes, even at extreme values of x and beyond.

Extrapolation

- If your x-variable is Time, extrapolation becomes a prediction about the future!

- Example:

- In the mid-1970s, oil cost $17 a barrel (in 2005 dollars).

- That is what it had cost for about 20 years!

- But suddenly, within a few years, the price skyrocketed to over $40 a barrel.

- If you had fit your model to data from the price spike, you might be predicting today's oil prices at many hundreds of dollars per barrel; if you had done your analysis before the spike, you might still be predicting around $17 a barrel.

Outliers, Leverage, Influence

- Outliers can have a big impact on the fitted regression line.

- Points with large residuals always deserve special attention.

- A data point whose x-value is unusually far from the mean of x is said to have high leverage.

- High leverage doesn't necessarily mean the point changes the overall picture.

- If the point lines up with the pattern of the other points, including it doesn't change our estimate of the line.

- But by sitting so far from $\bar{x}$, it may strengthen the apparent relationship, inflating the correlation and R-squared.
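Leverage can be computed directly. For simple linear regression, the standard leverage of point i is $h_i = 1/n + (x_i - \bar{x})^2 / \sum_j (x_j - \bar{x})^2$. It depends only on x, not on y. The sketch below uses hypothetical x-values, one of them far from the rest; the function name `leverages` is illustrative.

```python
from statistics import fmean

def leverages(x):
    """Leverage of each point in simple linear regression:
    h_i = 1/n + (x_i - x_bar)^2 / sum_j (x_j - x_bar)^2.
    In simple regression the leverages always sum to 2 (two fitted parameters)."""
    n = len(x)
    x_bar = fmean(x)
    sxx = sum((xj - x_bar) ** 2 for xj in x)
    return [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

# Hypothetical x-values: the last one is far from the others.
x = [1, 2, 3, 4, 5, 20]
h = leverages(x)
for xi, hi in zip(x, h):
    print(f"x={xi:>2}  leverage={hi:.3f}")
```

The point at x = 20 has leverage close to 1, while the others cluster around 0.2 to 0.26.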

Outliers, Leverage, Influence

- A point is influential if omitting it from the analysis gives a very different model.

- Influence depends on both leverage and residual.

- A case with high leverage whose y-value sits right on the line is not influential.

- Removing this point may not change the slope, but it may change R-squared.

Outliers, Leverage, Influence

- A point is influential if omitting it from the analysis gives a very different model.

- Influence depends on both leverage and residual.

- A case with modest leverage but a very large residual can be influential.

- With enough leverage, a point can pull the regression line right to it. Then it's highly influential, yet it will have a small residual.

Outliers, Leverage, Influence

- A point is influential if omitting it from the analysis gives a very different model.

- Influence depends on both leverage and residual.

- The only thing to do is to do your analysis twice:

- Once with the point

- Once omitting the point

Does the unusual point have high leverage, a large residual, and is it influential?

- Not high leverage, large residual, not influential

- High leverage, small residual, not influential

- High leverage, not a large residual, influential

Lurking Variables and Causation

- No matter how strong the association,

- no matter how large the R-squared value,

- no matter how straight the line,

- there is NO way to conclude from regression alone that one variable causes the other.

- There may always be a lurking variable that causes the apparent association.

Lurking Variable Example

- The scatterplot shows the life expectancy of men and women in 41 different countries.

- These values are plotted against the square root of doctors per person in each country.

Lurking Variable Example

- There is a strong positive correlation.

- This seems to confirm our expectation that more doctors per person improves health care, leading to longer lifetimes and greater life expectancy.

Lurking Variable Example

- Can we conclude, though, that doctors cause greater life expectancy? Perhaps, but increasing numbers of doctors and greater life expectancy may both be results of some larger underlying change.

Lurking Variable Example

- Here is a similar-looking scatterplot, now comparing life expectancy to the square root of TVs per person.

- This is an even stronger association!

A Final Note

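- Beware of scatterplots of summary statistics (data that have already been summarized, such as means). Summaries vary less than the individual values they are based on, so they tend to make a relationship look stronger than it is.

- For example,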

Homework: p. 214, #1, 3, 4, 8, 10