chapter 5 regression. u objective: to quantify the linear relationship between an explanatory...
TRANSCRIPT
![Page 1: Chapter 5 Regression. u Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). u We can then predict](https://reader035.vdocument.in/reader035/viewer/2022062515/56649f415503460f94c606dd/html5/thumbnails/1.jpg)
Chapter 5
Regression
![Page 2: Chapter 5 Regression. u Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). u We can then predict](https://reader035.vdocument.in/reader035/viewer/2022062515/56649f415503460f94c606dd/html5/thumbnails/2.jpg)
Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y).
We can then predict the average response for all subjects with a given value of the explanatory variable.
Linear Regression
![Page 3: Chapter 5 Regression. u Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). u We can then predict](https://reader035.vdocument.in/reader035/viewer/2022062515/56649f415503460f94c606dd/html5/thumbnails/3.jpg)
Prediction via Regression Line Number of new birds and Percent returning
Example: predicting number (y) of new adult birds that join the colony based on the percent (x) of adult birds that return to the colony from the previous year.
![Page 4: Chapter 5 Regression. u Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). u We can then predict](https://reader035.vdocument.in/reader035/viewer/2022062515/56649f415503460f94c606dd/html5/thumbnails/4.jpg)
Correlation tells us about
strength (scatter) and direction
of the linear relationship
between two quantitative
variables.
In addition, we would like to have a numerical description of how both
variables vary together. For instance, is one variable increasing faster
than the other one? And we would like to make predictions based on that
numerical description.But which line best
describes our data?
![Page 5: Chapter 5 Regression. u Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). u We can then predict](https://reader035.vdocument.in/reader035/viewer/2022062515/56649f415503460f94c606dd/html5/thumbnails/5.jpg)
Least Squares
Used to determine the “best” line
We want the line to be as close as possible to the data points in the vertical (y) direction (since that is what we are trying to predict)
Least Squares: use the line that minimizes the sum of the squares of the vertical distances of the data points from the line
![Page 6: Chapter 5 Regression. u Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). u We can then predict](https://reader035.vdocument.in/reader035/viewer/2022062515/56649f415503460f94c606dd/html5/thumbnails/6.jpg)
Distances between the points and line are squared so all are positive values. This is done so that distances can be properly added (Pythagoras).
The regression lineThe least-squares regression line is the unique line such that the sum of
the squared vertical (y) distances between the data points and the line
is the smallest possible.
![Page 7: Chapter 5 Regression. u Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). u We can then predict](https://reader035.vdocument.in/reader035/viewer/2022062515/56649f415503460f94c606dd/html5/thumbnails/7.jpg)
Least Squares Regression Line
Regression equation: y = a + bx^
– x is the value of the explanatory variable– “y-hat” is the average value of the response
variable (predicted response for a value of x)
– note that a and b are just the intercept and slope of a straight line
– note that r and b are not the same thing, but their signs will agree
![Page 8: Chapter 5 Regression. u Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). u We can then predict](https://reader035.vdocument.in/reader035/viewer/2022062515/56649f415503460f94c606dd/html5/thumbnails/8.jpg)
Prediction via Regression Line Number of new birds and Percent returning
The regression equation is y-hat = 31.9343 0.3040x– y-hat is the average number of new birds for all
colonies with percent x returning For all colonies with 60% returning, we predict
the average number of new birds to be 13.69:31.9343 (0.3040)(60) = 13.69 birds
Suppose we know that an individual colony has 60% returning. What would we predict the number of new birds to be for just that colony?
![Page 9: Chapter 5 Regression. u Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). u We can then predict](https://reader035.vdocument.in/reader035/viewer/2022062515/56649f415503460f94c606dd/html5/thumbnails/9.jpg)
^
xbya
s
srb
x
y
Regression equation: y = a + bx
Regression Line Calculation
where sx and sy are the standard deviations of the two variables, and r is their correlation
![Page 10: Chapter 5 Regression. u Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). u We can then predict](https://reader035.vdocument.in/reader035/viewer/2022062515/56649f415503460f94c606dd/html5/thumbnails/10.jpg)
Regression CalculationCase Study
Per Capita Gross Domestic Productand Average Life Expectancy for
Countries in Western Europe
![Page 11: Chapter 5 Regression. u Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). u We can then predict](https://reader035.vdocument.in/reader035/viewer/2022062515/56649f415503460f94c606dd/html5/thumbnails/11.jpg)
Country Per Capita GDP (x) Life Expectancy (y)
Austria 21.4 77.48
Belgium 23.2 77.53
Finland 20.0 77.32
France 22.7 78.63
Germany 20.8 77.17
Ireland 18.6 76.39
Italy 21.5 78.51
Netherlands 22.0 78.15
Switzerland 23.8 78.99
United Kingdom 21.2 77.37
Regression CalculationCase Study
![Page 12: Chapter 5 Regression. u Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). u We can then predict](https://reader035.vdocument.in/reader035/viewer/2022062515/56649f415503460f94c606dd/html5/thumbnails/12.jpg)
0.795 1.532
0.809 77.754 21.52
yx ss
ryx
Linear regression equation:
68.716.52)(0.420)(21-77.754
0.4201.532
0.795(0.809)
xbya
s
srb
x
y
y = 68.716 + 0.420x^
Regression CalculationCase Study
![Page 13: Chapter 5 Regression. u Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). u We can then predict](https://reader035.vdocument.in/reader035/viewer/2022062515/56649f415503460f94c606dd/html5/thumbnails/13.jpg)
Facts about least-squares regression1. The distinction between explanatory and response variables is
essential in regression.
2. There is a close connection between correlation and the slope of the
least-squares line.
3. The least-squares regression line always passes through the point
4. The correlation r describes the strength of a straight-line
relationship. The square of the correlation, r2, is the fraction of the
variation in the values of y that is explained by the least-squares
regression of y on x.
,x y
![Page 14: Chapter 5 Regression. u Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). u We can then predict](https://reader035.vdocument.in/reader035/viewer/2022062515/56649f415503460f94c606dd/html5/thumbnails/14.jpg)
Coefficient of Determination (R2)
Measures usefulness of regression prediction R2 (or r2, the square of the correlation): measures
what fraction of the variation in the values of the response variable (y) is explained by the regression line r=1: R2=1: regression line explains all (100%) of
the variation in y r=.7: R2=.49: regression line explains almost half
(50%) of the variation in y
![Page 15: Chapter 5 Regression. u Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). u We can then predict](https://reader035.vdocument.in/reader035/viewer/2022062515/56649f415503460f94c606dd/html5/thumbnails/15.jpg)
Residuals
A residual is the difference between an observed value of the response variable and the value predicted by the regression line:
residual = y y
![Page 16: Chapter 5 Regression. u Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). u We can then predict](https://reader035.vdocument.in/reader035/viewer/2022062515/56649f415503460f94c606dd/html5/thumbnails/16.jpg)
Residuals
A residual plot is a scatterplot of the regression residuals against the explanatory variable
– used to assess the fit of a regression line
– look for a “random” scatter around zero
![Page 17: Chapter 5 Regression. u Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). u We can then predict](https://reader035.vdocument.in/reader035/viewer/2022062515/56649f415503460f94c606dd/html5/thumbnails/17.jpg)
Case StudyGesell Adaptive Score and Age at First Word
![Page 18: Chapter 5 Regression. u Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). u We can then predict](https://reader035.vdocument.in/reader035/viewer/2022062515/56649f415503460f94c606dd/html5/thumbnails/18.jpg)
Residual Plot:Case Study
Gesell Adaptive Score and Age at First Word
![Page 19: Chapter 5 Regression. u Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). u We can then predict](https://reader035.vdocument.in/reader035/viewer/2022062515/56649f415503460f94c606dd/html5/thumbnails/19.jpg)
The x-axis in a residual plot is the same as on the scatterplot.
The line on both plots is the regression line.
Only the y-axis is different.
![Page 20: Chapter 5 Regression. u Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). u We can then predict](https://reader035.vdocument.in/reader035/viewer/2022062515/56649f415503460f94c606dd/html5/thumbnails/20.jpg)
Residuals are randomly scattered—good!
A curved pattern—means the relationship
you are looking at is not linear.
A change in variability across plot is a
warning sign. You need to find out why it
is and remember that predictions made in
areas of larger variability will not be as
good.
![Page 21: Chapter 5 Regression. u Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). u We can then predict](https://reader035.vdocument.in/reader035/viewer/2022062515/56649f415503460f94c606dd/html5/thumbnails/21.jpg)
Outliers and Influential Points
An outlier is an observation that lies far away from the other observations
– outliers in the y direction have large residuals
– outliers in the x direction are often influential for the least-squares regression line, meaning that the removal of such points would markedly change the equation of the line
![Page 22: Chapter 5 Regression. u Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). u We can then predict](https://reader035.vdocument.in/reader035/viewer/2022062515/56649f415503460f94c606dd/html5/thumbnails/22.jpg)
Outliers:Case Study
Gesell Adaptive Score and Age at First Word
From all the data
r2 = 41%
r2 = 11%
After removing child 18
![Page 23: Chapter 5 Regression. u Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). u We can then predict](https://reader035.vdocument.in/reader035/viewer/2022062515/56649f415503460f94c606dd/html5/thumbnails/23.jpg)
Cautionsabout Correlation and Regression
only describe linear relationships are both affected by outliers always plot the data before interpreting beware of extrapolation
– predicting outside of the range of x
beware of lurking variables– have important effect on the relationship among the
variables in a study, but are not included in the study
association does not imply causation
![Page 24: Chapter 5 Regression. u Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). u We can then predict](https://reader035.vdocument.in/reader035/viewer/2022062515/56649f415503460f94c606dd/html5/thumbnails/24.jpg)
Caution:Beware of Extrapolation
Sarah’s height was plotted against her age
Can you predict her height at age 42 months?
Can you predict her height at age 30 years (360 months)?
80
85
90
95
100
30 35 40 45 50 55 60 65
age (months)
hei
gh
t (c
m)
![Page 25: Chapter 5 Regression. u Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). u We can then predict](https://reader035.vdocument.in/reader035/viewer/2022062515/56649f415503460f94c606dd/html5/thumbnails/25.jpg)
Caution:Beware of Extrapolation
Regression line:y-hat = 71.95 + .383 x
height at age 42 months? y-hat = 88
height at age 30 years? y-hat = 209.8– She is predicted to
be 6’ 10.5” at age 30.70
90
110
130
150
170
190
210
30 90 150 210 270 330 390
age (months)
hei
gh
t (c
m)
![Page 26: Chapter 5 Regression. u Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). u We can then predict](https://reader035.vdocument.in/reader035/viewer/2022062515/56649f415503460f94c606dd/html5/thumbnails/26.jpg)
Caution: Beware of Lurking Variables
A lurking variable is a variable not included in the study design that does have an effect on the variables studied.
Lurking variables can falsely suggest a relationship.
What is the lurking variable in these examples?
How could you answer if you didn’t know anything about the topic?
Strong positive association between
the number firefighters at a fire site and
the amount of damage a fire does
–Negative association between moderate
amounts of wine drinking and death rates
from heart disease in developed nations
![Page 27: Chapter 5 Regression. u Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). u We can then predict](https://reader035.vdocument.in/reader035/viewer/2022062515/56649f415503460f94c606dd/html5/thumbnails/27.jpg)
Even very strong correlations may not correspond to a real causal relationship (changes in x actually
causing changes in y).(correlation may be explained by a
lurking variable)
Caution:Correlation Does Not Imply Causation
![Page 28: Chapter 5 Regression. u Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). u We can then predict](https://reader035.vdocument.in/reader035/viewer/2022062515/56649f415503460f94c606dd/html5/thumbnails/28.jpg)
Social Relationships and Health
Does lack of social relationships cause people to become ill? (there was a strong correlation)
Or, are unhealthy people less likely to establish and maintain social relationships? (reversed relationship)
Or, is there some other factor that predisposes people both to have lower social activity and become ill?
Caution:Correlation Does Not Imply Causation
![Page 29: Chapter 5 Regression. u Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). u We can then predict](https://reader035.vdocument.in/reader035/viewer/2022062515/56649f415503460f94c606dd/html5/thumbnails/29.jpg)
Evidence of Causation A properly conducted experiment establishes
the connection (chapter 8)
Other considerations:– The association is strong– The association is consistent
The connection happens in repeated trials The connection happens under varying conditions
– Higher doses are associated with stronger responses– Alleged cause precedes the effect in time– Alleged cause is plausible (reasonable explanation)