Regression Basics
Predicting a DV with a Single IV
Questions
• What are predictors and criteria?
• Write an equation for the linear regression. Describe each term.
• How do changes in the slope and intercept affect (move) the regression line?
• What does it mean to test the significance of the regression sum of squares? R-square?
• What is R-square?
• What does it mean to choose a regression line to satisfy the loss function of least squares?
• How do we find the slope and intercept for the regression line with a single independent variable? (Either formula for the slope is acceptable.)
• Why does testing for the regression sum of squares turn out to have the same result as testing for R-square?
Basic Ideas
• Jargon
  – IV = X = Predictor (pl. predictors)
  – DV = Y = Criterion (pl. criteria)
  – Regression of Y on X, e.g., GPA on SAT
• Linear Model = relations between IV and DV represented by straight line.
• A score on Y has 2 parts – (1) linear function of X and (2) error.
Yi = α + βXi + εi (population values)
Basic Ideas (2)
• Sample value:
• Intercept – place where X=0
• Slope – change in Y if X changes 1 unit. Rise over run.
• If error is removed, we have a predicted value for each person at X (the line):
Yi = a + bXi + ei
Y' = a + bX

Suppose on average houses are worth about $75.00 a square foot. Then the equation relating price to size would be Y' = 0 + 75X. The predicted price for a 2000 square foot house would be $150,000.
Linear Transformation
• 1 to 1 mapping of variables via line
• Permissible operations are addition and multiplication (interval data)
Changing the Y Intercept
Y = 5 + 2X
Y = 10 + 2X
Y = 15 + 2X
Add a constant
Changing the Slope
Y = 5 + .5X
Y = 5 + X
Y = 5 + 2X
Multiply by a constant
Y' = a + bX
Linear Transformation (2)
• Centigrade to Fahrenheit
• Note 1 to 1 map
• Intercept?
• Slope?
[Graph: Degrees F plotted against Degrees C]
32 degrees F, 0 degrees C
212 degrees F, 100 degrees C
Intercept is 32. When X (Cent) is 0, Y (Fahr) is 32.
Slope is 1.8. When Cent goes from 0 to 100 (run), Fahr goes from 32 to 212 (rise): 212 - 32 = 180, and 180/100 = 1.8, which is rise over run, the slope. Y = 32 + 1.8X, i.e., F = 32 + 1.8C.
Y' = a + bX
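The conversion above is easy to check numerically. A minimal sketch in Python (the function name is illustrative, not from the slides):

```python
def c_to_f(c):
    # Linear transformation from the slide: intercept a = 32, slope b = 1.8
    return 32 + 1.8 * c

print(c_to_f(0))    # 32.0 (freezing point)
print(c_to_f(100))  # 212.0 (boiling point)
```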
Review
• What are predictors and criteria?
• Write an equation for the linear regression with 1 IV. Describe each term.
• How do changes in the slope and intercept affect (move) the regression line?
Regression of Weight on Height
Ht Wt
61 105
62 120
63 120
65 160
65 120
68 145
69 175
70 160
72 185
75 210
N=10 N=10
M=67 M=150
SD=4.57 SD=33.99
[Scatterplot: Weight in Lbs vs. Height in Inches]

Regression of Weight on Height
Slope = rise over run = 6.97
Correlation (r) = .94
Regression equation: Y' = -316.86 + 6.97X
Y' = a + bX
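The slope, intercept, and correlation for the weight-on-height regression can be recovered from the raw data. A sketch in plain Python (variable names are illustrative):

```python
from math import sqrt

# Height (inches) and weight (lbs) from the slide's table
ht = [61, 62, 63, 65, 65, 68, 69, 70, 72, 75]
wt = [105, 120, 120, 160, 120, 145, 175, 160, 185, 210]

n = len(ht)
mx, my = sum(ht) / n, sum(wt) / n  # means: 67 and 150

# Sums of squares and cross-products about the means
sxx = sum((x - mx) ** 2 for x in ht)
syy = sum((y - my) ** 2 for y in wt)
sxy = sum((x - mx) * (y - my) for x, y in zip(ht, wt))

b = sxy / sxx               # slope (rise over run)
a = my - b * mx             # intercept
r = sxy / sqrt(sxx * syy)   # correlation

print(round(b, 2), round(a, 2), round(r, 2))  # 6.97 -316.86 0.94
```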
Illustration of the Linear Model. This concept is vital!
[Figure: scatterplot of Weight on Height with the regression line. The point (65, 120) is marked; relative to the mean of X and the mean of Y, its deviation from the mean of Y (y) splits into a linear part (Y' - Ȳ) and an error part (e = Y - Y').]

Regression of Weight on Height

Yi = α + βXi + εi
Yi = a + bXi + ei
Consider Y as a deviation from the mean.
Part of that deviation can be associated with X (the linear part) and part cannot (the error).
Y' = a + bX
ei = Yi - Y'i
Predicted Values & Residuals

N  Ht  Wt   Y'     Resid
1 61 105 108.19 -3.19
2 62 120 115.16 4.84
3 63 120 122.13 -2.13
4 65 160 136.06 23.94
5 65 120 136.06 -16.06
6 68 145 156.97 -11.97
7 69 175 163.94 11.06
8 70 160 170.91 -10.91
9 72 185 184.84 0.16
10 75 210 205.75 4.25
M 67 150 150.00 0.00
SD 4.57 33.99 31.85 11.89
V 20.89 1155.56 1014.37 141.32
Numbers for the linear part and the error. Note the mean of Y' equals the mean of Y (150) and the mean of the residuals is 0. Note the variance of Y is V(Y') + V(res): 1155.56 = 1014.37 + 141.32 (within rounding).
Y' = a + bX
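These notes can be verified numerically: the predictions average to the mean of Y, the residuals average to zero, and the variance of Y splits into the variance of Y' plus the variance of the residuals. A sketch that recomputes the fit from the raw data rather than the rounded coefficients:

```python
ht = [61, 62, 63, 65, 65, 68, 69, 70, 72, 75]
wt = [105, 120, 120, 160, 120, 145, 175, 160, 185, 210]

n = len(ht)
mx, my = sum(ht) / n, sum(wt) / n
b = sum((x - mx) * (y - my) for x, y in zip(ht, wt)) / sum((x - mx) ** 2 for x in ht)
a = my - b * mx

pred = [a + b * x for x in ht]              # Y', the linear part
resid = [y - p for y, p in zip(wt, pred)]   # e = Y - Y'

def var(v):
    # Sample variance (N-1 denominator, matching the SDs on the slides)
    m = sum(v) / len(v)
    return sum((x - m) ** 2 for x in v) / (len(v) - 1)

print(round(sum(pred) / n, 2))             # 150.0: mean of Y' = mean of Y
print(round(sum(resid) / n, 6))            # 0.0: residuals average to zero
print(round(var(wt), 2))                   # 1155.56
print(round(var(pred) + var(resid), 2))    # 1155.56: V(Y) = V(Y') + V(res)
```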
Finding the Regression Line
Need to know the correlation, SDs and means of X and Y. The correlation is the slope when both X and Y are expressed as z scores. To translate to raw scores, just bring back original SDs for both.
r_XY = Σ(z_X z_Y) / N

b = r_XY (SD_Y / SD_X)   (rise over run)

To find the intercept, use: a = M_Y - b·M_X

Suppose r = .50, SD_X = .5, M_X = 10, SD_Y = 2, M_Y = 5.

b = .50(2/.5) = 2
a = 5 - 2(10) = -15
Y' = -15 + 2X
Slope Intercept Equation
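The recipe (slope from the correlation and SDs, intercept from the means) takes only a couple of lines. A sketch with the numbers from the worked example above:

```python
# Given summary statistics: r = .50, SD_X = .5, M_X = 10, SD_Y = 2, M_Y = 5
r, sd_x, m_x, sd_y, m_y = 0.50, 0.5, 10.0, 2.0, 5.0

b = r * sd_y / sd_x  # slope: bring back the raw-score SDs
a = m_y - b * m_x    # intercept: the line passes through (M_X, M_Y)

print(b, a)  # 2.0 -15.0, i.e., Y' = -15 + 2X
```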
Line of Least Squares
[Figure: the same scatterplot of Weight on Height, showing the regression line, the point (65, 120), and its linear and error parts.]

Regression of Weight on Height
We have some points. Assume a linear relation is reasonable, so the two variables can be represented by a line. Where should the line go? Place the line so the errors (residuals) are small. The line we calculate has a sum of errors equal to 0, and a sum of squared errors that is as small as possible; the line provides the smallest sum of squared errors, or least squares.
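The least-squares property can be checked directly: compute the sum of squared errors for the fitted line, then for a few perturbed lines; every perturbation does worse. A sketch using the height/weight data (the exact minimum, 1271.8, differs slightly from the slides' 1271.91 because the slides round the coefficients):

```python
ht = [61, 62, 63, 65, 65, 68, 69, 70, 72, 75]
wt = [105, 120, 120, 160, 120, 145, 175, 160, 185, 210]

def sse(a, b):
    # Sum of squared errors for the line Y' = a + bX
    return sum((y - (a + b * x)) ** 2 for x, y in zip(ht, wt))

# Least-squares fit
n = len(ht)
mx, my = sum(ht) / n, sum(wt) / n
b = sum((x - mx) * (y - my) for x, y in zip(ht, wt)) / sum((x - mx) ** 2 for x in ht)
a = my - b * mx

best = sse(a, b)
for da, db in [(5, 0), (-5, 0), (0, 0.5), (0, -0.5), (3, -0.2)]:
    assert sse(a + da, b + db) > best  # any nearby line has a larger SSE
print(round(best, 1))  # 1271.8
```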
Least Squares (2)
Review
• What does it mean to choose a regression line to satisfy the loss function of least squares?
• What are predicted values and residuals?
Suppose r = .25, SD_X = 1, M_X = 10, SD_Y = 2, M_Y = 5. What is the regression equation (line)?
Partitioning the Sum of Squares
Y = a + bX + e
Y' = a + bX
Y = Y' + e
e = Y - Y'
Definitions
Y - Ȳ = (Y' - Ȳ) + (Y - Y') = y, deviation from mean
Σ(Y - Ȳ)² = Σ[(Y' - Ȳ) + (Y - Y')]²   Sum of squares
Σy² = Σ(Y' - Ȳ)² + Σ(Y - Y')²   (cross products drop out)
Sum of squared deviations from the mean
= Sum of squares due to regression (reg)
+ Sum of squared residuals (error)

Analog: SS_tot = SS_B + SS_W
Partitioning SS (2)
SSY=SSReg + SSRes Total SS is regression SS plus residual SS. Can also get proportions of each. Can get variance by dividing SS by N if you want. Proportion of total SS due to regression = proportion of total variance due to regression = R2 (R-square).
SS_Y/SS_Y = SS_Reg/SS_Y + SS_Res/SS_Y

1 = R² + (1 - R²)
Partitioning SS (3)    (Wt = Y, M = 150)

Wt (Y)  (Y-Ȳ)²  Y'      Y'-Ȳ    (Y'-Ȳ)²    Resid (Y-Y')  Resid²
105     2025    108.19  -41.81  1748.076    -3.19          10.1761
120      900    115.16  -34.84  1213.826     4.84          23.4256
120      900    122.13  -27.87   776.7369   -2.13           4.5369
160      100    136.06  -13.94   194.3236   23.94         573.1236
120      900    136.06  -13.94   194.3236  -16.06         257.9236
145       25    156.97    6.97    48.5809  -11.97         143.2809
175      625    163.94   13.94   194.3236   11.06         122.3236
160      100    170.91   20.91   437.2281  -10.91         119.0281
185     1225    184.84   34.84  1213.826     0.16           0.0256
210     3600    205.75   55.75  3108.063     4.25          18.0625
Sum:
1500    10400   1500.01    0.01 9129.307    -0.01        1271.907

Variance: V(Y) = 1155.56, V(Y') = 1014.37, V(resid) = 141.32
Partitioning SS (4)

          Total     Regress   Residual
SS        10400     9129.31   1271.91
Variance  1155.56   1014.37   141.32

Proportion of SS: 10400/10400 = 9129.31/10400 + 1271.91/10400, i.e., 1 = .88 + .12
Proportion of Variance: 1155.56/1155.56 = 1014.37/1155.56 + 141.32/1155.56, i.e., 1 = .88 + .12

R² = .88

Note Y' is a linear function of X, so r_XY' = 1 and r_YY' = r_XY = .94.
R² = r²_YY' = .94² = .88; the residuals are uncorrelated with Y' (r_Y'e = 0), and r²_Ye = 1 - R² = .12 (r_Ye = .35).
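The partition and R² can be reproduced from the raw data (exact arithmetic gives SS values a hair off the slides', which round Y' to two decimals, but the same proportions):

```python
from math import sqrt

ht = [61, 62, 63, 65, 65, 68, 69, 70, 72, 75]
wt = [105, 120, 120, 160, 120, 145, 175, 160, 185, 210]

n = len(ht)
mx, my = sum(ht) / n, sum(wt) / n
sxx = sum((x - mx) ** 2 for x in ht)
syy = sum((y - my) ** 2 for y in wt)
sxy = sum((x - mx) * (y - my) for x, y in zip(ht, wt))

ss_tot = syy              # total SS: 10400
ss_reg = sxy ** 2 / sxx   # SS due to regression
ss_res = ss_tot - ss_reg  # SS residual

r2 = ss_reg / ss_tot
r = sxy / sqrt(sxx * syy)

print(round(r2, 2))      # 0.88: proportion of SS due to regression
print(round(r ** 2, 2))  # 0.88: same as the squared X-Y correlation
print(round(ss_reg / ss_tot + ss_res / ss_tot, 6))  # 1.0: proportions sum to one
```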
Significance Testing
Testing for the SS due to regression = testing for the variance due to regression = testing the significance of R². All are the same.

H0: R²_population = 0

F = (SS_reg / df_reg) / (SS_res / df_res) = (SS_reg / k) / (SS_res / (N - k - 1))

k = number of IVs (here it's 1) and N is the sample size (# people). F with k and (N - k - 1) df.
F = (SS_reg / df_reg) / (SS_res / df_res) = (9129.31 / 1) / (1271.91 / (10 - 1 - 1)) = 57.42
Equivalent test using R-square instead of SS:

F = (R² / k) / ((1 - R²) / (N - k - 1))

F = (.88 / 1) / ((1 - .88) / (10 - 1 - 1)) = 58.67

Results will be the same within rounding error.
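The two F formulas are the same ratio in disguise: divide the numerator and denominator of the SS version by SS_tot and it becomes the R² version. With unrounded inputs they agree exactly; the 57.42 vs. 58.67 gap above comes from rounding R² to .88. A sketch using the exact SS from the data:

```python
ht = [61, 62, 63, 65, 65, 68, 69, 70, 72, 75]
wt = [105, 120, 120, 160, 120, 145, 175, 160, 185, 210]

n = len(ht)
mx, my = sum(ht) / n, sum(wt) / n
sxx = sum((x - mx) ** 2 for x in ht)
sxy = sum((x - mx) * (y - my) for x, y in zip(ht, wt))
ss_tot = sum((y - my) ** 2 for y in wt)
ss_reg = sxy ** 2 / sxx
ss_res = ss_tot - ss_reg

k = 1  # number of IVs
f_ss = (ss_reg / k) / (ss_res / (n - k - 1))  # F from sums of squares
r2 = ss_reg / ss_tot
f_r2 = (r2 / k) / ((1 - r2) / (n - k - 1))    # F from R-square

print(round(f_ss, 2), round(f_r2, 2))  # 57.42 57.42 -- identical
```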
Review
• What does it mean to test the significance of the regression sum of squares? R-square?
• What is R-square?
• Why does testing for the regression sum of squares turn out to have the same result as testing for R-square?