![Page 1: Simple Linear Regression Least squares line Interpreting coefficients Prediction Cautions The formal model Section 2.6, 9.1, 9.2 Professor Kari Lock Morgan](https://reader036.vdocument.in/reader036/viewer/2022081603/56649d9c5503460f94a84ae1/html5/thumbnails/1.jpg)
Simple Linear Regression
• Least squares line
• Interpreting coefficients
• Prediction
• Cautions
• The formal model
Section 2.6, 9.1, 9.2
Professor Kari Lock Morgan
Duke University
![Page 2: Simple Linear Regression Least squares line Interpreting coefficients Prediction Cautions The formal model Section 2.6, 9.1, 9.2 Professor Kari Lock Morgan](https://reader036.vdocument.in/reader036/viewer/2022081603/56649d9c5503460f94a84ae1/html5/thumbnails/2.jpg)
Exam 2 Grades
In Class:
Lab:
Total:
![Page 3: Simple Linear Regression Least squares line Interpreting coefficients Prediction Cautions The formal model Section 2.6, 9.1, 9.2 Professor Kari Lock Morgan](https://reader036.vdocument.in/reader036/viewer/2022081603/56649d9c5503460f94a84ae1/html5/thumbnails/3.jpg)
Comments on In-Class Exam
• Test whether this data provides evidence that Melanoma is found significantly more often on the left side of the body: one categorical variable -> single proportion
• 2011 Hollywood movies: If the sample is the same as the population, then no need for inference!
• Standard deviation of a bootstrap distribution is the standard error
![Page 4: Simple Linear Regression Least squares line Interpreting coefficients Prediction Cautions The formal model Section 2.6, 9.1, 9.2 Professor Kari Lock Morgan](https://reader036.vdocument.in/reader036/viewer/2022081603/56649d9c5503460f94a84ae1/html5/thumbnails/4.jpg)
Comments on Lab Exam
• Most common reason for points off: applying the wrong method
• The first step should ALWAYS be asking yourself: What is/are the variable(s)? Are they categorical or quantitative?
• Always plot/visualize your data. Outliers can strongly affect the results; you should either explain why they are left in, or else remove them
![Page 5: Simple Linear Regression Least squares line Interpreting coefficients Prediction Cautions The formal model Section 2.6, 9.1, 9.2 Professor Kari Lock Morgan](https://reader036.vdocument.in/reader036/viewer/2022081603/56649d9c5503460f94a84ae1/html5/thumbnails/5.jpg)
Simulation Methods
• For any one or two variables, resample( ) gives a confidence interval
• For any two variables, reallocate( ) tests for an association between the variables
• No conditions to check!
• Automatically deals with missing data!
• Only two commands to remember!
• No distributions to remember!
![Page 6: Simple Linear Regression Least squares line Interpreting coefficients Prediction Cautions The formal model Section 2.6, 9.1, 9.2 Professor Kari Lock Morgan](https://reader036.vdocument.in/reader036/viewer/2022081603/56649d9c5503460f94a84ae1/html5/thumbnails/6.jpg)
MODELING
![Page 7: Simple Linear Regression Least squares line Interpreting coefficients Prediction Cautions The formal model Section 2.6, 9.1, 9.2 Professor Kari Lock Morgan](https://reader036.vdocument.in/reader036/viewer/2022081603/56649d9c5503460f94a84ae1/html5/thumbnails/7.jpg)
Crickets and Temperature
• Can you estimate the temperature on a summer evening, just by listening to crickets chirp?
We will fit a model to predict temperature (response variable, y) based on cricket chirp rate (explanatory variable, x)
![Page 8: Simple Linear Regression Least squares line Interpreting coefficients Prediction Cautions The formal model Section 2.6, 9.1, 9.2 Professor Kari Lock Morgan](https://reader036.vdocument.in/reader036/viewer/2022081603/56649d9c5503460f94a84ae1/html5/thumbnails/8.jpg)
Linear Model
• A linear model predicts a response variable, y, using a linear function of explanatory variables
• Simple linear regression predicts one response variable, y, as a linear function of one explanatory variable, x
• We will create a model that predicts temperature as a linear function of cricket chirp rate
![Page 9: Simple Linear Regression Least squares line Interpreting coefficients Prediction Cautions The formal model Section 2.6, 9.1, 9.2 Professor Kari Lock Morgan](https://reader036.vdocument.in/reader036/viewer/2022081603/56649d9c5503460f94a84ae1/html5/thumbnails/9.jpg)
Regression Line
Goal: Find a straight line that best fits the data in a scatterplot
![Page 10: Simple Linear Regression Least squares line Interpreting coefficients Prediction Cautions The formal model Section 2.6, 9.1, 9.2 Professor Kari Lock Morgan](https://reader036.vdocument.in/reader036/viewer/2022081603/56649d9c5503460f94a84ae1/html5/thumbnails/10.jpg)
Predicted and Actual Values
• The actual response value, y, is the response value observed for a particular data point
• The predicted response value, $\hat{y}$, is the response value that would be predicted for a given x value, based on a model
• In linear regression, the predicted values fall on the regression line directly above each x value
• The best fitting line is that which makes the predicted values closest to the actual values
![Page 11: Simple Linear Regression Least squares line Interpreting coefficients Prediction Cautions The formal model Section 2.6, 9.1, 9.2 Professor Kari Lock Morgan](https://reader036.vdocument.in/reader036/viewer/2022081603/56649d9c5503460f94a84ae1/html5/thumbnails/11.jpg)
Predicted and Actual Values
[Scatterplot with the regression line, marking an actual value $y$ and its predicted value $\hat{y}$]
![Page 12: Simple Linear Regression Least squares line Interpreting coefficients Prediction Cautions The formal model Section 2.6, 9.1, 9.2 Professor Kari Lock Morgan](https://reader036.vdocument.in/reader036/viewer/2022081603/56649d9c5503460f94a84ae1/html5/thumbnails/12.jpg)
Residual
• The residual for each data point is
residual = actual − predicted = $y - \hat{y}$
or the vertical distance from the point to the line
• Example: $y - \hat{y} = 63.5 - 61.44 = 2.06$
Want to make all the residuals as small as possible.
How would you measure this?
![Page 13: Simple Linear Regression Least squares line Interpreting coefficients Prediction Cautions The formal model Section 2.6, 9.1, 9.2 Professor Kari Lock Morgan](https://reader036.vdocument.in/reader036/viewer/2022081603/56649d9c5503460f94a84ae1/html5/thumbnails/13.jpg)
Least Squares Regression
• Least squares regression chooses the regression line that minimizes the sum of the squared residuals
minimize $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
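One way to see what "minimizes the sum of the squared residuals" means is to compute that sum for the fitted line and for a slightly different line. The sketch below is illustrative only: the seven data pairs are an assumption (the slides show the fitted output, not the raw values), and `ssr` is a hypothetical helper name.

```r
# Assumed data: seven (chirps, temperature) pairs; not listed in the slides.
chirps <- c(81, 97, 103, 123, 150, 182, 195)
temperature <- c(54.5, 59.5, 63.5, 67.5, 72.0, 78.5, 83.0)

# Sum of squared residuals for a candidate line with intercept b0 and slope b1
ssr <- function(b0, b1) {
  predicted <- b0 + b1 * chirps
  sum((temperature - predicted)^2)
}

fit <- lm(temperature ~ chirps)      # least squares fit
b <- unname(coef(fit))               # b[1] = intercept, b[2] = slope

ssr_fit <- ssr(b[1], b[2])           # least squares line
ssr_other <- ssr(b[1], b[2] + 0.01)  # same intercept, slightly steeper slope
ssr_fit < ssr_other                  # TRUE: any other line has a larger sum
```

Trying other values of `b0` and `b1` in `ssr` should never give a smaller sum than `ssr_fit`.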
![Page 14: Simple Linear Regression Least squares line Interpreting coefficients Prediction Cautions The formal model Section 2.6, 9.1, 9.2 Professor Kari Lock Morgan](https://reader036.vdocument.in/reader036/viewer/2022081603/56649d9c5503460f94a84ae1/html5/thumbnails/14.jpg)
Least Squares Regression
![Page 15: Simple Linear Regression Least squares line Interpreting coefficients Prediction Cautions The formal model Section 2.6, 9.1, 9.2 Professor Kari Lock Morgan](https://reader036.vdocument.in/reader036/viewer/2022081603/56649d9c5503460f94a84ae1/html5/thumbnails/15.jpg)
Equation of the Line
• The estimated regression line is
$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$
where $\hat{\beta}_0$ is the intercept and $\hat{\beta}_1$ is the slope
• Slope: increase in predicted y for every unit increase in x
• Intercept: predicted y value when x = 0
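For reference, the least squares slope and intercept have a closed form. This is the standard result rather than something stated on the slide; here $r$ is the sample correlation, $s_x$ and $s_y$ are the sample standard deviations, and $\bar{x}$, $\bar{y}$ are the sample means.

$$
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = r \, \frac{s_y}{s_x},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
$$

The second formula says the least squares line always passes through the point $(\bar{x}, \bar{y})$.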
![Page 16: Simple Linear Regression Least squares line Interpreting coefficients Prediction Cautions The formal model Section 2.6, 9.1, 9.2 Professor Kari Lock Morgan](https://reader036.vdocument.in/reader036/viewer/2022081603/56649d9c5503460f94a84ae1/html5/thumbnails/16.jpg)
Regression in R
> lm(Temperature~Chirps)

Call:
lm(formula = Temperature ~ Chirps)

Coefficients:
(Intercept)       Chirps
    37.6786       0.2307

$\widehat{\text{temperature}} = 37.69 + 0.23 \times \text{chirps}$
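A slightly fuller version of the same call might look like the sketch below. The data frame is an assumption: the slides do not list the raw values, so these seven pairs are only a stand-in that happens to reproduce the coefficients shown above.

```r
# Assumed data (the slides show only the fitted output, not the raw values).
crickets <- data.frame(
  Chirps = c(81, 97, 103, 123, 150, 182, 195),
  Temperature = c(54.5, 59.5, 63.5, 67.5, 72.0, 78.5, 83.0)
)

# Response on the left of ~, explanatory variable on the right.
fit <- lm(Temperature ~ Chirps, data = crickets)

b <- coef(fit)   # named vector: "(Intercept)" and "Chirps"
round(b, 4)

# Predicted values and residuals (actual minus predicted) for each data point
crickets$Predicted <- fitted(fit)
crickets$Residual <- resid(fit)
crickets
```

`fitted()` and `resid()` return one value per row, so the residual for any single point can be read straight off the table.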
![Page 17: Simple Linear Regression Least squares line Interpreting coefficients Prediction Cautions The formal model Section 2.6, 9.1, 9.2 Professor Kari Lock Morgan](https://reader036.vdocument.in/reader036/viewer/2022081603/56649d9c5503460f94a84ae1/html5/thumbnails/17.jpg)
Regression Model
$\widehat{\text{temperature}} = 37.69 + 0.23 \times \text{chirps}$
Which is a correct interpretation?
a) The average temperature is 37.69
b) For every extra 0.23 chirps per minute, the predicted temperature increases by 1 degree
c) Predicted temperature increases by 0.23 degrees for each extra chirp per minute
d) For every extra 0.23 chirps per minute, the predicted temperature increases by 37.69
![Page 18: Simple Linear Regression Least squares line Interpreting coefficients Prediction Cautions The formal model Section 2.6, 9.1, 9.2 Professor Kari Lock Morgan](https://reader036.vdocument.in/reader036/viewer/2022081603/56649d9c5503460f94a84ae1/html5/thumbnails/18.jpg)
Units
• It is helpful to think about units when interpreting a regression equation
$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$
y units = y units + (y units / x units) × x units
$\widehat{\text{temperature}} = 37.69 + 0.23 \times \text{chirps}$
degrees = degrees + (degrees / chirps per minute) × chirps per minute
![Page 19: Simple Linear Regression Least squares line Interpreting coefficients Prediction Cautions The formal model Section 2.6, 9.1, 9.2 Professor Kari Lock Morgan](https://reader036.vdocument.in/reader036/viewer/2022081603/56649d9c5503460f94a84ae1/html5/thumbnails/19.jpg)
Prediction
• The regression equation can be used to predict y for a given value of x
• If you listen and hear crickets chirping about 140 times per minute, your best guess at the outside temperature is
$\widehat{\text{temperature}} = 37.69 + 0.23 \times \text{chirps}$
$37.69 + 0.23 \times 140 = 69.9$
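The same arithmetic can be wrapped in a small function. This is a minimal sketch using the rounded coefficients from the slide; `predict_temperature` is a hypothetical name, not something from the course materials.

```r
# Rounded coefficients as written on the slide
intercept <- 37.69
slope <- 0.23

# Predicted temperature (degrees) for a given chirp rate (chirps per minute)
predict_temperature <- function(chirps_per_min) {
  intercept + slope * chirps_per_min
}

predict_temperature(140)   # 69.89, which rounds to 69.9
```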
![Page 20: Simple Linear Regression Least squares line Interpreting coefficients Prediction Cautions The formal model Section 2.6, 9.1, 9.2 Professor Kari Lock Morgan](https://reader036.vdocument.in/reader036/viewer/2022081603/56649d9c5503460f94a84ae1/html5/thumbnails/20.jpg)
Prediction
$37.69 + 0.23 \times 140 = 69.9$
![Page 21: Simple Linear Regression Least squares line Interpreting coefficients Prediction Cautions The formal model Section 2.6, 9.1, 9.2 Professor Kari Lock Morgan](https://reader036.vdocument.in/reader036/viewer/2022081603/56649d9c5503460f94a84ae1/html5/thumbnails/21.jpg)
Prediction
If the crickets are chirping about 180 times per minute, your best guess at the temperature is
(a) 60
(b) 70
(c) 80
$37.69 + 0.23 \times 180 = 79.09$
![Page 22: Simple Linear Regression Least squares line Interpreting coefficients Prediction Cautions The formal model Section 2.6, 9.1, 9.2 Professor Kari Lock Morgan](https://reader036.vdocument.in/reader036/viewer/2022081603/56649d9c5503460f94a84ae1/html5/thumbnails/22.jpg)
Exam Scores
$\widehat{\text{Exam 2}} = 24.2 + 0.7 \times \text{Exam 1}$
Calculate your residual.
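As a sketch of the calculation, with made-up scores: the two exam scores below are hypothetical, and the equation is assumed to be Exam 2 = 24.2 + 0.7 × Exam 1 (the operators did not survive in the slide text).

```r
# Hypothetical scores, for illustration only
exam1 <- 80
exam2 <- 85

# Assumed equation: predicted Exam 2 = 24.2 + 0.7 * Exam 1
predicted_exam2 <- 24.2 + 0.7 * exam1

# Residual = actual minus predicted
residual <- exam2 - predicted_exam2
residual   # about 4.8: this student scored above the line's prediction
```

A positive residual means the actual score is above the regression line; a negative one means it is below.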
![Page 23: Simple Linear Regression Least squares line Interpreting coefficients Prediction Cautions The formal model Section 2.6, 9.1, 9.2 Professor Kari Lock Morgan](https://reader036.vdocument.in/reader036/viewer/2022081603/56649d9c5503460f94a84ae1/html5/thumbnails/23.jpg)
Prediction
$\widehat{\text{temperature}} = 37.69 + 0.23 \times \text{chirps}$
The intercept tells us that the predicted temperature when the crickets are not chirping at all is 37.69. Do you think this is a good prediction?
(a) Yes
(b) No
![Page 24: Simple Linear Regression Least squares line Interpreting coefficients Prediction Cautions The formal model Section 2.6, 9.1, 9.2 Professor Kari Lock Morgan](https://reader036.vdocument.in/reader036/viewer/2022081603/56649d9c5503460f94a84ae1/html5/thumbnails/24.jpg)
Regression Caution 1
• Do not use the regression equation or line to predict outside the range of x values available in your data (do not extrapolate!)
• If none of the x values are anywhere near 0, then the intercept is meaningless!
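One way to make this caution hard to forget is to have the prediction code check the observed range first. The sketch below is an illustration, not part of the course code: `predict_in_range` is a hypothetical helper and the chirp values are assumed.

```r
# Hypothetical helper: refuse to predict outside the observed range of x.
predict_in_range <- function(x_new, x_observed, b0, b1) {
  x_range <- range(x_observed)
  if (x_new < x_range[1] || x_new > x_range[2]) {
    warning("x is outside the observed data; prediction would be an extrapolation")
    return(NA_real_)
  }
  b0 + b1 * x_new
}

# Assumed chirp rates (not listed in the slides)
chirps <- c(81, 97, 103, 123, 150, 182, 195)

inside <- predict_in_range(140, chirps, 37.69, 0.23)                   # 69.89
outside <- suppressWarnings(predict_in_range(0, chirps, 37.69, 0.23))  # NA
```

With chirp rates like these, x = 0 is far below the smallest observed value, so the intercept is not a prediction to trust.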
![Page 25: Simple Linear Regression Least squares line Interpreting coefficients Prediction Cautions The formal model Section 2.6, 9.1, 9.2 Professor Kari Lock Morgan](https://reader036.vdocument.in/reader036/viewer/2022081603/56649d9c5503460f94a84ae1/html5/thumbnails/25.jpg)
Duke Rank and Duke Shirts
Are the rank of Duke among schools applied to and the number of Duke shirts owned
a) positively associated
b) negatively associated
c) not associated
d) other
[Scatterplot with axes DukeRank and DukeShirts]
![Page 26: Simple Linear Regression Least squares line Interpreting coefficients Prediction Cautions The formal model Section 2.6, 9.1, 9.2 Professor Kari Lock Morgan](https://reader036.vdocument.in/reader036/viewer/2022081603/56649d9c5503460f94a84ae1/html5/thumbnails/26.jpg)
Duke Rank and Duke Shirts
Are the rank of Duke among schools applied to and the number of Duke shirts owned
a) positively associated
b) negatively associated
c) not associated
d) other
[Two scatterplots with axes DukeRank and DukeShirts]
![Page 27: Simple Linear Regression Least squares line Interpreting coefficients Prediction Cautions The formal model Section 2.6, 9.1, 9.2 Professor Kari Lock Morgan](https://reader036.vdocument.in/reader036/viewer/2022081603/56649d9c5503460f94a84ae1/html5/thumbnails/27.jpg)
Regression Caution 2
• Computers will calculate a regression line for any two quantitative variables, even if they are not associated or if the association is not linear
• ALWAYS PLOT YOUR DATA!
• The regression line/equation should only be used if the association is approximately linear
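A quick way to see why plotting matters is Anscombe's quartet, which ships with R as the `anscombe` data frame. It is not part of this lecture; it is swapped in here only because its four x/y pairs give almost the same fitted line while looking completely different in a scatterplot.

```r
# anscombe (datasets package) has columns x1..x4 and y1..y4.
fits <- lapply(1:4, function(i) {
  lm(anscombe[[paste0("y", i)]] ~ anscombe[[paste0("x", i)]])
})

# One row per data set: intercept and slope of the fitted line
coefs <- t(sapply(fits, function(f) unname(coef(f))))
colnames(coefs) <- c("intercept", "slope")
round(coefs, 2)   # every row is close to intercept 3.00, slope 0.50

# Plot each data set with its line before trusting any of them
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i))
  abline(fits[[i]])
}
par(op)
```

Only the first panel shows a roughly linear association; the coefficients alone give no hint of that.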
![Page 28: Simple Linear Regression Least squares line Interpreting coefficients Prediction Cautions The formal model Section 2.6, 9.1, 9.2 Professor Kari Lock Morgan](https://reader036.vdocument.in/reader036/viewer/2022081603/56649d9c5503460f94a84ae1/html5/thumbnails/28.jpg)
Regression Caution 3
• Outliers (especially outliers in both variables) can be very influential on the regression line
• ALWAYS PLOT YOUR DATA!
http://illuminations.nctm.org/LessonDetail.aspx?ID=L455
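The effect of a single outlier can be sketched by refitting with one extra point. Everything below is illustrative: the seven data pairs are assumed (the slides do not list them) and the added point (300 chirps, 50 degrees) is made up to be extreme in both variables.

```r
# Assumed data: seven (chirps, temperature) pairs; not listed in the slides.
chirps <- c(81, 97, 103, 123, 150, 182, 195)
temperature <- c(54.5, 59.5, 63.5, 67.5, 72.0, 78.5, 83.0)

# Slope of the least squares line for y on x
slope <- function(x, y) unname(coef(lm(y ~ x))[2])

slope_original <- slope(chirps, temperature)

# One made-up point, far from the rest in both x and y
slope_with_outlier <- slope(c(chirps, 300), c(temperature, 50))

c(original = slope_original, with_outlier = slope_with_outlier)
```

One point out of eight is enough to flatten the line almost completely, which is why a scatterplot should come before the fit.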
![Page 29: Simple Linear Regression Least squares line Interpreting coefficients Prediction Cautions The formal model Section 2.6, 9.1, 9.2 Professor Kari Lock Morgan](https://reader036.vdocument.in/reader036/viewer/2022081603/56649d9c5503460f94a84ae1/html5/thumbnails/29.jpg)
Life Expectancy and Birth Rate
Coefficients:
(Intercept)  LifeExpectancy
    83.4090         -0.8895
Which of the following interpretations is correct?
(a) A decrease of 0.89 in the birth rate corresponds to a 1 year increase in predicted life expectancy
(b) Increasing life expectancy by 1 year will cause the birth rate to decrease by 0.89
(c) Both
(d) Neither
![Page 30: Simple Linear Regression Least squares line Interpreting coefficients Prediction Cautions The formal model Section 2.6, 9.1, 9.2 Professor Kari Lock Morgan](https://reader036.vdocument.in/reader036/viewer/2022081603/56649d9c5503460f94a84ae1/html5/thumbnails/30.jpg)
Regression Caution 4
• Higher values of x may lead to higher (or lower) predicted values of y, but this does NOT mean that changing x will cause y to increase or decrease
• Causation can only be determined if the values of the explanatory variable were determined randomly (which is rarely the case for a continuous explanatory variable)
![Page 31: Simple Linear Regression Least squares line Interpreting coefficients Prediction Cautions The formal model Section 2.6, 9.1, 9.2 Professor Kari Lock Morgan](https://reader036.vdocument.in/reader036/viewer/2022081603/56649d9c5503460f94a84ae1/html5/thumbnails/31.jpg)
Explanatory and Response
• Unlike correlation, for linear regression it does matter which is the explanatory variable and which is the response
$\widehat{\text{temperature}} = 37.69 + 0.23 \times \text{chirps}$
$\widehat{\text{chirps}} = -157.8 + 4.25 \times \text{temperature}$
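The asymmetry is easy to check by fitting both directions. This is a sketch under an assumption: the seven data pairs below are not in the slides and are used only because they reproduce the two equations above.

```r
# Assumed data: seven (chirps, temperature) pairs; not listed in the slides.
chirps <- c(81, 97, 103, 123, 150, 182, 195)
temperature <- c(54.5, 59.5, 63.5, 67.5, 72.0, 78.5, 83.0)

fit_temp <- lm(temperature ~ chirps)     # temperature is the response
fit_chirps <- lm(chirps ~ temperature)   # chirps is the response

round(coef(fit_temp), 2)     # about 37.68 and 0.23
round(coef(fit_chirps), 2)   # about -157.8 and 4.25

# Correlation does not care about the order of the variables
r <- cor(chirps, temperature)
isTRUE(all.equal(r, cor(temperature, chirps)))

# The two slopes are not reciprocals; their product is r squared
slope_product <- unname(coef(fit_temp)[2] * coef(fit_chirps)[2])
isTRUE(all.equal(slope_product, r^2))
```

The second line is not the first one solved for chirps, because each fit minimizes residuals in its own response direction.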
![Page 32: Simple Linear Regression Least squares line Interpreting coefficients Prediction Cautions The formal model Section 2.6, 9.1, 9.2 Professor Kari Lock Morgan](https://reader036.vdocument.in/reader036/viewer/2022081603/56649d9c5503460f94a84ae1/html5/thumbnails/32.jpg)
r = 0
Challenge: If the correlation between x and y is 0, what would the regression line be?
![Page 33: Simple Linear Regression Least squares line Interpreting coefficients Prediction Cautions The formal model Section 2.6, 9.1, 9.2 Professor Kari Lock Morgan](https://reader036.vdocument.in/reader036/viewer/2022081603/56649d9c5503460f94a84ae1/html5/thumbnails/33.jpg)
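Simple Linear Model
• The population/true simple linear model is
$y_i = \beta_0 + \beta_1 x_i + \epsilon_i$
where $\beta_0$ is the intercept, $\beta_1$ is the slope, and $\epsilon_i$ is the random error
• $\beta_0$ and $\beta_1$ are unknown parameters
• Can use familiar inference methods!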
![Page 34: Simple Linear Regression Least squares line Interpreting coefficients Prediction Cautions The formal model Section 2.6, 9.1, 9.2 Professor Kari Lock Morgan](https://reader036.vdocument.in/reader036/viewer/2022081603/56649d9c5503460f94a84ae1/html5/thumbnails/34.jpg)
Inference for the Slope
• Confidence intervals and hypothesis tests for the slope can be done using the familiar formulas:
$\text{sample statistic} \pm t^* \times SE$
$t = \dfrac{\text{sample statistic} - \text{null value}}{SE}$
• Population Parameter: $\beta_1$, Sample Statistic: $\hat{\beta}_1$
• Use t-distribution with n – 2 degrees of freedom
![Page 35: Simple Linear Regression Least squares line Interpreting coefficients Prediction Cautions The formal model Section 2.6, 9.1, 9.2 Professor Kari Lock Morgan](https://reader036.vdocument.in/reader036/viewer/2022081603/56649d9c5503460f94a84ae1/html5/thumbnails/35.jpg)
Inference for Slope
$n = 7$
Give a 95% confidence interval for the true slope.
Is the slope significantly different from 0?
(a) Yes
(b) No
![Page 36: Simple Linear Regression Least squares line Interpreting coefficients Prediction Cautions The formal model Section 2.6, 9.1, 9.2 Professor Kari Lock Morgan](https://reader036.vdocument.in/reader036/viewer/2022081603/56649d9c5503460f94a84ae1/html5/thumbnails/36.jpg)
Confidence Interval
$n = 7$
$\text{statistic} \pm t^* \times SE$
> qt(.975,5)
[1] 2.570582
$0.23 \pm 2.57 \times 0.014 = (0.194, 0.266)$
We are 95% confident that the true slope, regressing temperature on cricket chirp rate, is between 0.194 and 0.266 degrees per chirp per minute.
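The interval can be reproduced in a few lines. This is a minimal sketch using the rounded slope and standard error from the slide; the variable names are illustrative.

```r
slope <- 0.23    # sample statistic (rounded, from the slide)
se <- 0.014      # standard error of the slope (rounded, from the slide)
n <- 7

# t* for 95% confidence with n - 2 degrees of freedom
t_star <- qt(0.975, df = n - 2)

# statistic +/- t* x SE
ci <- slope + c(-1, 1) * t_star * se
round(ci, 3)     # 0.194 0.266
```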
![Page 37: Simple Linear Regression Least squares line Interpreting coefficients Prediction Cautions The formal model Section 2.6, 9.1, 9.2 Professor Kari Lock Morgan](https://reader036.vdocument.in/reader036/viewer/2022081603/56649d9c5503460f94a84ae1/html5/thumbnails/37.jpg)
Hypothesis Test
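$n = 7$
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$
$t = \dfrac{0.231 - 0}{0.014} = 16.21$ (16.21 comes from the unrounded estimate and SE; the rounded values shown give about 16.5)
> 2*pt(16.21,5,lower.tail=FALSE)
[1] 1.628701e-05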
There is strong evidence that the slope is significantly different from 0, and that there is an association between cricket chirp rate and temperature.
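R reports the same test directly in the coefficient table of `summary()`. The sketch below assumes seven data pairs that are not listed in the slides; they are used only because they reproduce the slope and t-statistic shown above.

```r
# Assumed data: seven (chirps, temperature) pairs; not listed in the slides.
chirps <- c(81, 97, 103, 123, 150, 182, 195)
temperature <- c(54.5, 59.5, 63.5, 67.5, 72.0, 78.5, 83.0)

fit <- lm(temperature ~ chirps)

# Row for the slope: Estimate, Std. Error, t value, Pr(>|t|)
slope_row <- summary(fit)$coefficients["chirps", ]
slope_row

# The same numbers by hand: t = (statistic - 0) / SE, two-sided p-value
t_stat <- slope_row[["Estimate"]] / slope_row[["Std. Error"]]
p_value <- 2 * pt(abs(t_stat), df = df.residual(fit), lower.tail = FALSE)
c(t = t_stat, p = p_value)
```

`df.residual(fit)` is n − 2 = 5 here, matching the degrees of freedom used with `pt()` on the slide.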
![Page 38: Simple Linear Regression Least squares line Interpreting coefficients Prediction Cautions The formal model Section 2.6, 9.1, 9.2 Professor Kari Lock Morgan](https://reader036.vdocument.in/reader036/viewer/2022081603/56649d9c5503460f94a84ae1/html5/thumbnails/38.jpg)
Small Samples
• The t-distribution is only appropriate for large samples (definitely not n = 7)!
• We should have done inference for the slope using simulation methods...
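A simulation-based version might look like the sketch below. It does not use the course's resample( ) and reallocate( ) functions; base R `sample()` is swapped in to do the same two jobs. The data pairs are assumed (not listed in the slides), and the seed and the 5000 repetitions are arbitrary choices.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Assumed data: seven (chirps, temperature) pairs; not listed in the slides.
chirps <- c(81, 97, 103, 123, 150, 182, 195)
temperature <- c(54.5, 59.5, 63.5, 67.5, 72.0, 78.5, 83.0)
n <- length(chirps)

# Least squares slope of y on x
slope <- function(x, y) cov(x, y) / var(x)
observed <- slope(chirps, temperature)

# Bootstrap: resample (x, y) pairs with replacement and recompute the slope
boot_slopes <- replicate(5000, {
  idx <- sample(n, replace = TRUE)
  slope(chirps[idx], temperature[idx])
})
# na.rm drops the rare resample where every x is the same point
ci <- quantile(boot_slopes, c(0.025, 0.975), na.rm = TRUE)

# Randomization: shuffle temperature to break any association with chirps
null_slopes <- replicate(5000, slope(chirps, sample(temperature)))
p_value <- mean(abs(null_slopes) >= abs(observed))

ci
p_value
```

The percentile interval plays the role of the t-based confidence interval, and `p_value` is the proportion of shuffled slopes at least as extreme as the observed one.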
![Page 39: Simple Linear Regression Least squares line Interpreting coefficients Prediction Cautions The formal model Section 2.6, 9.1, 9.2 Professor Kari Lock Morgan](https://reader036.vdocument.in/reader036/viewer/2022081603/56649d9c5503460f94a84ae1/html5/thumbnails/39.jpg)
If results are very significant, it doesn’t really matter if you get the exact p-value… you come to the same conclusion!
![Page 40: Simple Linear Regression Least squares line Interpreting coefficients Prediction Cautions The formal model Section 2.6, 9.1, 9.2 Professor Kari Lock Morgan](https://reader036.vdocument.in/reader036/viewer/2022081603/56649d9c5503460f94a84ae1/html5/thumbnails/40.jpg)
Project 2
• Details here
• Group project on regression (modeling)
• If you want to change groups, email me TODAY OR TOMORROW. If other people in your lab section want to change, I’ll move people around.
• Need a data set with a quantitative response variable and multiple explanatory variables; explanatory variables must have at least one categorical and at least one quantitative
• Proposal due next Wednesday