regression
Regression
• What is regression to the mean?
• Suppose the mean temperature in November is 5 degrees
• What’s your best guess for tomorrow’s temperature?
1. exactly 5?
2. warmer than 5?
3. colder than 5?
Regression
• What is regression to the mean?
• Suppose the mean temperature in November is 5 degrees and today the temperature is 15
• What’s your best guess for tomorrow’s temperature?
1. exactly 15 again?
2. exactly 5?
3. warmer than 15?
4. something between 5 and 15?
Regression
• What is regression to the mean?
• Regression to the mean is the tendency for a score to lie closer to its mean (in standard units) than the score it is paired with
– e.g. daughters of mothers who are taller than the mean tend to be shorter than their mothers, and daughters of mothers who are shorter than the mean tend to be taller than their mothers
– e.g. parents with high IQs tend to have children with lower IQs than their own (typically still above the mean), and parents with low IQs tend to have children with higher IQs than their own
Regression
• What is regression to the mean?
• The strength of the correlation between two variables tells you how much regression to the mean affects predicted scores
– a strong correlation means little regression to the mean
– a weak correlation means a lot of regression to the mean
– no correlation means that knowing one variable does not help you predict the other, so the mean is always your best guess
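To make the link between correlation strength and regression to the mean concrete, here is a minimal Python sketch. It borrows the z-score prediction rule z'_y = r_xy z_x that these slides derive further on; `predict_z` is a hypothetical helper name, not part of any library.

```python
def predict_z(r_xy: float, z_x: float) -> float:
    """Predicted z score of y for a case with z score z_x on x (z'_y = r_xy * z_x)."""
    if not -1.0 <= r_xy <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    return r_xy * z_x


# A case two standard deviations above the mean on x, under different correlations:
# the weaker the correlation, the closer the prediction is pulled toward the mean (0).
for r in (1.0, 0.8, 0.3, 0.0):
    print(f"r = {r:.1f} -> predicted z_y = {predict_z(r, 2.0):.1f}")
```

With r = 1 the prediction stays two SDs out; with r = 0 it collapses to the mean, matching the three cases listed above.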
Regression
• Suppose you measured workload and credit hours for 8 students
Could you predict the number of homework hours from credit hours?
Regression
• Suppose you measured workload and credit hours for 8 students
Your first guess might be the mean number of homework hours, which is 12.9
Regression
• Sum of Squares
• Adding up the squared deviations of the actual scores from your guess gives a measure of the total error of your estimate
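A small Python sketch of this total error, using made-up homework hours for five students (the slide's eight-student data appeared only in the slide image). The helper `sum_of_squares` is a hypothetical name. Among all single constant guesses, the mean gives the smallest sum of squared deviations, which is why it is a sensible first guess.

```python
from statistics import mean


def sum_of_squares(actual, guess):
    """Total squared error when every score is predicted by the same single guess."""
    return sum((y - guess) ** 2 for y in actual)


# Hypothetical homework hours for five students (made-up numbers, not the slide data).
homework = [11, 13, 14, 13, 14]

best = mean(homework)  # 13 for these data
for g in (best - 1, best, best + 1):
    print(g, sum_of_squares(homework, g))
```

Guessing 13 (the mean) gives a total of 6, while guessing 12 or 14 gives 11.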
Regression
• Sum of Squares
• Ideally you would pick an equation that minimizes the sum of the squared deviations
• You would need a line that is as close as possible to all of the points at once
Regression
• The regression line
• That line is called the regression line
• The sum of squared deviations from it is called the sum of squared errors, or SSE
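As a rough illustration of comparing lines by their SSE, the Python sketch below scores three candidate lines on a small made-up data set (five hypothetical students, not the eight-student data on the slide). The helper `sse` is a hypothetical name, and the least-squares pair (0.6, 4.6) was worked out by hand for these particular numbers.

```python
def sse(xs, ys, slope, intercept):
    """Sum of squared errors of the line y' = slope * x + intercept."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical credit hours (x) and homework hours (y) for five students.
credit = [12, 13, 14, 15, 16]
homework = [11, 13, 14, 13, 14]

candidates = {
    "flat line at the mean": (0.0, 13.0),
    "steeper line": (1.0, -1.0),
    "least-squares line": (0.6, 4.6),
}
for name, (a, b) in candidates.items():
    print(f"{name}: SSE = {sse(credit, homework, a, b):.2f}")
```

The flat line gives 6.00, the steeper line 4.00, and the least-squares line 2.40; no other straight line does better than 2.40 on these data.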
Regression
• The regression line
• Its equation is:
$$y'_i = r_{xy}\frac{S_y}{S_x}x_i + \left(\bar{y} - r_{xy}\frac{S_y}{S_x}\bar{x}\right)$$
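The equation can be turned into code directly: slope r_xy·S_y/S_x and intercept ȳ minus slope times x̄. The Python sketch below does this on made-up data (five hypothetical students); `pearson_r` and `regression_line` are hypothetical helper names. It uses the population SD (`statistics.pstdev`, dividing by n), but the ratio S_y/S_x is the same whether you divide by n or n - 1, so the slope does not depend on that choice.

```python
import math
from statistics import mean, pstdev


def pearson_r(xs, ys):
    """Pearson correlation computed from deviation scores."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def regression_line(xs, ys):
    """Slope r * Sy / Sx and intercept mean(y) - slope * mean(x)."""
    slope = pearson_r(xs, ys) * pstdev(ys) / pstdev(xs)
    intercept = mean(ys) - slope * mean(xs)
    return slope, intercept


# Hypothetical credit hours (x) and homework hours (y); made-up numbers.
credit = [12, 13, 14, 15, 16]
homework = [11, 13, 14, 13, 14]
slope, intercept = regression_line(credit, homework)
print(f"y' = {slope:.2f} x + {intercept:.2f}")
```

For these numbers the line is y' = 0.60 x + 4.60.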
Regression
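$$y'_i = r_{xy}\frac{S_y}{S_x}x_i + \left(\bar{y} - r_{xy}\frac{S_y}{S_x}\bar{x}\right)$$
• $y'_i$ is the predicted y
• Remember: y = ax + b. The first term is the ax part (slope $a = r_{xy}S_y/S_x$) and the term in parentheses is b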
Regression
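• What happens if you transform all the scores to z scores and try to predict a z score?
$$y'_i = r_{xy}\frac{S_y}{S_x}x_i + \left(\bar{y} - r_{xy}\frac{S_y}{S_x}\bar{x}\right)$$
• For z scores, $S_y = S_x = 1$ and $\bar{y} = \bar{x} = 0$
• So the intercept vanishes and the equation becomes:
$$z'_{y_i} = r_{xy}z_{x_i}$$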
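A quick numerical check of this result, as a Python sketch on made-up data: standardize both variables, fit an ordinary least-squares line to the z scores, and the slope comes out equal to r_xy while the intercept is zero (up to rounding). The helpers `z_scores` and `least_squares` are hypothetical names; z scores here use the population SD, matching S_x and S_y with n in the denominator.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize using the population SD (divide by n)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def least_squares(xs, ys):
    """Ordinary least-squares slope and intercept, computed directly from the data."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    slope = num / den
    return slope, my - slope * mx


# Hypothetical raw scores (made-up numbers).
xs = [12, 13, 14, 15, 16]
ys = [11, 13, 14, 13, 14]
zx, zy = z_scores(xs), z_scores(ys)
slope, intercept = least_squares(zx, zy)
print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
```

For these numbers r_xy is the square root of 0.6 (about 0.7746), and that is the fitted slope.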
The Regression Line
• The regression line is a linear function that generates a y for a given x
• What should its slope and y-intercept be to be the best predictor?
• What does best predictor mean? It means the smallest distance between the predicted y and the actual y for a given x
• In other words, the best predictor leaves as little residual (leftover) variability as possible after the correlation is used to explain the y scores
Mean Square Residual
• Recall that
$$S_y^2 = \frac{\sum (y_i - \bar{y})^2}{n}$$
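This formula divides by n. In Python's standard library that corresponds to `statistics.pvariance`, not `statistics.variance` (which divides by n - 1). A short sketch on made-up scores; `variance_n` is a hypothetical helper written out to mirror the formula.

```python
from statistics import pvariance, variance


def variance_n(values):
    """Variance with n in the denominator, as in the formula for S_y^2 above."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


ys = [11, 13, 14, 13, 14]  # hypothetical scores
print(variance_n(ys), pvariance(ys), variance(ys))  # 1.2 1.2 1.5
```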
Mean Square Residual
• The variance of Zy (which is always 1) is the average squared vertical distance of each point from the x axis, because the mean of Zy is 0
[Figure: "Regression" scatterplot of z scores labeled "Actual Scores", both axes from -3.0 to 3.0]
Mean Square Residual
• Some of the variance in the Zy scores is due to the correlation with x
• Some of the variance in the Zy scores is due to other (probably random) factors
[Figure: "Regression" scatterplot of z scores labeled "Actual Scores", both axes from -3.0 to 3.0]
Mean Square Residual
• the variance due to other factors is called “residual” because it is “leftover” after fitting a regression line
• The best predictor should minimize this residual variance
Mean Square Residual
$$MS_{res} = \frac{\sum (y_i - y'_i)^2}{n}$$
MSres is the average squared deviation of the actual scores from the regression line
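A minimal Python sketch of MS_res, on made-up data. The helper `ms_res` is a hypothetical name; the pair (0.6, 4.6) is the least-squares line for these particular numbers, worked out by hand. Note that for a flat line at the mean of y, MS_res is just the variance S_y^2, so the regression line "beats" the mean exactly when its MS_res is smaller than S_y^2.

```python
def ms_res(xs, ys, slope, intercept):
    """Mean squared residual: average squared distance of y from the line's prediction."""
    n = len(ys)
    if n == 0 or len(xs) != n:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / n


# Hypothetical data (made-up numbers).
xs = [12, 13, 14, 15, 16]
ys = [11, 13, 14, 13, 14]
print(ms_res(xs, ys, 0.6, 4.6))   # regression line
print(ms_res(xs, ys, 0.0, 13.0))  # flat line at the mean of y, equal to S_y^2
```

Here the regression line gives about 0.48 and the flat line gives 1.2.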
Minimizing MSres
• the regression line (the best predictor of y) is the line with a slope and y intercept such that MSres is minimized
Minimizing MSres
• What will be its y intercept?
– if there was no correlation at all, your best guess for y at any x would be the mean of y
– if there was a strong correlation between x and y, your best guess for the y that matches the mean x would be the mean y
– the mean of Zx is zero so the best guess for the Zy that goes with it will be zero (the mean of the Zy’s)
Minimizing MSres
• In other words, the regression line will predict zero when Zx is zero, so the y intercept of the regression line will be zero (this is only true for z scores!)
Minimizing MSres
• y intercept is zero
[Figure: "Regression" scatterplot of z scores labeled "Actual Scores", both axes from -3.0 to 3.0]
Minimizing MSres
• what is the slope?
[Figure: "Regression" scatterplot of z scores labeled "Actual Scores", both axes from -3.0 to 3.0]
Minimizing MSres
• What is the slope? Consider the extremes:
[Figure: three panels of z scores, both axes from -3.0 to 3.0]
– Zy = Zx, so Zy' = Zx: slope = 1
– Zy = -Zx, so Zy' = -Zx: slope = -1
– Zy is random with respect to Zx, so Zy' = mean Zy = 0: slope = 0
• Do the slopes look familiar?
Minimizing MSres
• a line (regression of Zy on Zx) that has a slope of rxy and a y intercept of zero minimizes MSres
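One way to see why the slope is r_xy: for z scores, the mean of Zx² and of Zy² is 1 and the mean of Zx·Zy is r_xy, so the MS_res of the line Zy' = b·Zx works out to 1 - 2b·r_xy + b², which is smallest at b = r_xy. The Python sketch below checks this numerically by a coarse grid search over slopes on made-up data; the helper names are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize using the population SD (divide by n)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def ms_res_z(zx, zy, slope):
    """MS_res of the line Zy' = slope * Zx (y intercept fixed at zero)."""
    return sum((y - slope * x) ** 2 for x, y in zip(zx, zy)) / len(zy)


# Hypothetical raw data (made-up numbers), converted to z scores.
zx = z_scores([12, 13, 14, 15, 16])
zy = z_scores([11, 13, 14, 13, 14])
r = sum(x * y for x, y in zip(zx, zy)) / len(zx)  # r is the mean product of z scores

# Try slopes from -1.00 to 1.00 in steps of 0.01 and keep the one with the smallest MS_res.
best = min((k / 100 for k in range(-100, 101)), key=lambda b: ms_res_z(zx, zy, b))
print(f"r = {r:.3f}, best grid slope = {best:.2f}")
```

The grid search lands on 0.77, the grid point nearest r_xy (about 0.775) for these numbers.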
Predicting raw scores
• We have a regression line in z scores:
$$z'_{y_i} = r_{xy}z_{x_i}$$
• Can we predict a raw-score y from a raw-score x?
Predicting raw scores
• Recall that:
$$z_{y_i} = \frac{y_i - \bar{y}}{S_y} \quad\text{and}\quad z_{x_i} = \frac{x_i - \bar{x}}{S_x}$$
Predicting raw scores
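• By substituting we get:
$$y'_i = r_{xy}\frac{S_y}{S_x}x_i + \left(\bar{y} - r_{xy}\frac{S_y}{S_x}\bar{x}\right)$$
• Note that this is still of the form y = ax + b, with slope $a = r_{xy}S_y/S_x$ and intercept $b = \bar{y} - r_{xy}(S_y/S_x)\bar{x}$
• Note that the slope still depends on r and the intercept still depends on the mean of y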
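The substitution step itself is skipped on the slide. Assuming the z-score line $z'_{y_i} = r_{xy}z_{x_i}$ and the z-score definitions recalled earlier (with the predicted score $y'_i$ standardized the same way as $y_i$), the algebra runs:

```latex
\frac{y'_i - \bar{y}}{S_y} = r_{xy}\,\frac{x_i - \bar{x}}{S_x}
\quad\Longrightarrow\quad
y'_i - \bar{y} = r_{xy}\frac{S_y}{S_x}\left(x_i - \bar{x}\right)
\quad\Longrightarrow\quad
y'_i = r_{xy}\frac{S_y}{S_x}x_i + \left(\bar{y} - r_{xy}\frac{S_y}{S_x}\bar{x}\right)
```

Multiplying both sides by $S_y$ and then adding $\bar{y}$ is all that is needed.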
Interpreting rxy in terms of variance
• Recall that rxy is the slope of the regression line (in z scores) that minimizes MSres
$$MS_{res} = \frac{\sum (y_i - y'_i)^2}{n} = S_{y-y'}^2$$
Interpreting rxy in terms of variance
• MSres can be simplified to:
$$S_{y-y'}^2 = S_y^2\left(1 - r_{xy}^2\right)$$
Interpreting rxy in terms of variance
• Thus:
$$r_{xy}^2 = \frac{S_y^2 - S_{y-y'}^2}{S_y^2}$$
• So $r_{xy}^2$ can be thought of as the proportion of the original variance accounted for by the regression line
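Both identities, MS_res = S_y²(1 - r²) and r² = (S_y² - MS_res)/S_y², can be checked numerically. The Python sketch below fits the least-squares line to made-up data, with n in every denominator to match the slides; `fit_and_residual_variance` is a hypothetical helper name.

```python
from statistics import mean, pvariance


def fit_and_residual_variance(xs, ys):
    """Least-squares fit, returning (r, S_y^2, MS_res) with n in every denominator."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = my - slope * mx
    ms_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / len(ys)
    r = sxy / (sxx * syy) ** 0.5
    return r, pvariance(ys), ms_res


xs = [12, 13, 14, 15, 16]  # hypothetical data
ys = [11, 13, 14, 13, 14]
r, s2_y, ms_res = fit_and_residual_variance(xs, ys)
print(ms_res, s2_y * (1 - r ** 2))      # MS_res computed two ways
print(r ** 2, (s2_y - ms_res) / s2_y)   # proportion of variance accounted for
```

For these numbers both lines agree up to floating-point rounding: MS_res is about 0.48 and r² is about 0.6.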
Interpreting rxy in terms of variance
[Figure: a regression line with an observed y, its predicted y, and the mean of y marked; annotations ask "What % of this distance is this distance" and "Subtract this distance"]
Interpreting rxy in terms of variance
• It follows that $1 - r_{xy}^2$ is the proportion of variance not accounted for by the regression line; this is the residual variance, expressed as a proportion of $S_y^2$
Interpreting rxy in terms of variance
• This can be thought of as a partitioning of variance into the variance accounted for by the regression and the variance unaccounted for:
$$S_y^2 = S_{y'}^2 + S_{y-y'}^2$$
Interpreting rxy in terms of variance
• Expanding each variance:
$$\frac{\sum (y_i - \bar{y})^2}{n} = \frac{\sum (y'_i - \bar{y})^2}{n} + \frac{\sum (y_i - y'_i)^2}{n}$$
Interpreting rxy in terms of variance
• Often written in terms of sums of squares:
$$\sum (y_i - \bar{y})^2 = \sum (y'_i - \bar{y})^2 + \sum (y_i - y'_i)^2$$
• or simply:
SStotal = SSregression + SSresidual
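A Python sketch that checks the partition on made-up data. It fits the least-squares line, computes the three sums of squares, and confirms SStotal = SSregression + SSresidual (up to rounding), with SSregression / SStotal giving r². `sums_of_squares` is a hypothetical helper name. The partition holds for the least-squares line with an intercept; for an arbitrary line the cross term does not vanish and the two parts need not add up.

```python
from statistics import mean


def sums_of_squares(xs, ys):
    """Return (SS_total, SS_regression, SS_residual) for the least-squares line."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = num / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    preds = [slope * x + intercept for x in xs]
    ss_total = sum((y - my) ** 2 for y in ys)
    ss_reg = sum((p - my) ** 2 for p in preds)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    return ss_total, ss_reg, ss_res


xs = [12, 13, 14, 15, 16]  # hypothetical data
ys = [11, 13, 14, 13, 14]
ss_total, ss_reg, ss_res = sums_of_squares(xs, ys)
print(ss_total, ss_reg + ss_res)  # equal up to floating-point rounding
print(ss_reg / ss_total)          # r^2, the proportion accounted for
```

For these numbers SStotal is 6, split into about 3.6 (regression) and 2.4 (residual).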