Assessing Model Fit
CORRELATION AND REGRESSION IN R
Ben Baumer
Assistant Professor at Smith College
How well does our textbook model fit?
library(ggplot2)

ggplot(data = textbooks, aes(x = amazNew, y = uclaNew)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
How well does our possum model fit?
ggplot(data = possum, aes(y = totalL, x = tailL)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
Sums of squared deviations
SSE
library(broom)
library(dplyr)  # provides %>%, summarize(), and n()

mod_possum <- lm(totalL ~ tailL, data = possum)
mod_possum %>%
  augment() %>%
  summarize(SSE = sum(.resid^2),
            SSE_also = (n() - 1) * var(.resid))

   SSE SSE_also
1 1301     1301
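The two columns agree because least-squares residuals from a model with an intercept average to zero, so their sample variance is the SSE divided by n - 1. Written out, with e_i denoting the i-th residual (notation assumed here, not taken from the slides):

```latex
\mathrm{SSE} = \sum_{i=1}^{n} e_i^2 = (n - 1)\,\widehat{\mathrm{Var}}(e),
\qquad \text{since } \bar{e} = 0 .
```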
RMSE
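One common definition, and the one that matches the residual standard error printed by summary(), divides the SSE by the residual degrees of freedom before taking the square root. As a sketch for the possum model, using SSE = 1301 and n - 2 = 102:

```latex
\mathrm{RMSE} = \sqrt{\frac{\mathrm{SSE}}{n - 2}}
             = \sqrt{\frac{1301}{102}} \approx 3.57
```

The result is in the units of the response, so a typical possum's total length differs from the model's prediction by roughly 3.57 units.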
Residual standard error (possums)
summary(mod_possum)

Call:
lm(formula = totalL ~ tailL, data = possum)

Residuals:
   Min     1Q Median     3Q    Max
-9.210 -2.326  0.179  2.777  6.790

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    41.04       6.66    6.16  1.4e-08
tailL           1.24       0.18    6.93  3.9e-10

Residual standard error: 3.57 on 102 degrees of freedom
Multiple R-squared: 0.32, Adjusted R-squared: 0.313
F-statistic: 48 on 1 and 102 DF, p-value: 3.94e-10
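The "Residual standard error" line can be reproduced by hand. A minimal sketch, assuming base R only and the built-in `cars` data as a stand-in for the possum data; the names `mod_cars`, `rse_by_hand`, and `rse_from_r` are illustrative and do not come from the slides:

```r
# Stand-in model on the built-in `cars` data (an assumption, not the possum data)
mod_cars <- lm(dist ~ speed, data = cars)

sse <- sum(residuals(mod_cars)^2)  # sum of squared residuals
df  <- df.residual(mod_cars)       # n - 2 for simple linear regression

rse_by_hand <- sqrt(sse / df)

# sigma() extracts the residual standard error that summary() prints
rse_from_r <- sigma(mod_cars)

all.equal(rse_by_hand, rse_from_r)
```

Dividing by the residual degrees of freedom rather than n is what separates this quantity from a plain root mean square of the residuals.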
Residual standard error (textbooks)
lm(uclaNew ~ amazNew, data = textbooks) %>%
  summary()

Call:
lm(formula = uclaNew ~ amazNew, data = textbooks)

Residuals:
   Min     1Q Median     3Q    Max
-34.78  -4.57   0.58   4.01  39.00

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.9290     1.9354    0.48     0.63
amazNew       1.1990     0.0252   47.60   <2e-16

Residual standard error: 10.5 on 71 degrees of freedom
Multiple R-squared: 0.97, Adjusted R-squared: 0.969
F-statistic: 2.27e+03 on 1 and 71 DF, p-value: <2e-16
Let's practice!
Comparing model fits
How well does our textbook model fit?
ggplot(data = textbooks, aes(x = amazNew, y = uclaNew)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
How well does our possum model fit?
ggplot(data = possum, aes(y = totalL, x = tailL)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
Null (average) model
For all observations…
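In symbols, the null model ignores the explanatory variable and predicts the sample mean of the response for every observation. A sketch in the usual notation (assumed here, not taken from the slides):

```latex
\hat{y}_i = \bar{y} \quad \text{for all } i = 1, \dots, n
```

In R this corresponds to an intercept-only formula such as `totalL ~ 1`.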
Visualization of null model
SSE, null model
mod_null <- lm(totalL ~ 1, data = possum)
mod_null %>%
  augment(possum) %>%
  summarize(SSE = sum(.resid^2))

   SSE
1 1914
SSE, our model
mod_possum <- lm(totalL ~ tailL, data = possum)
mod_possum %>%
  augment() %>%
  summarize(SSE = sum(.resid^2))

   SSE
1 1301
Coefficient of determination
SST is the SSE for the null model
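The coefficient of determination compares the two sums of squares: it is the share of the null model's variability that the fitted model accounts for. A worked sketch using the rounded possum values (SSE = 1301 for the model, SST = 1914 for the null model):

```latex
R^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}
    = 1 - \frac{1301}{1914} \approx 0.32
```

This agrees with the "Multiple R-squared" value that summary() reports for the possum model.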
Connection to correlation
For simple linear regression...
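With a single explanatory variable, the squared correlation between x and y equals the model's R-squared. A minimal check, assuming base R and the built-in `cars` data as a stand-in for the course data; `mod_cars`, `r`, and `r_squared` are illustrative names:

```r
# Stand-in model on the built-in `cars` data (an assumption)
mod_cars <- lm(dist ~ speed, data = cars)

r         <- cor(cars$speed, cars$dist)    # Pearson correlation
r_squared <- summary(mod_cars)$r.squared   # coefficient of determination

# For simple linear regression the two quantities coincide
all.equal(r^2, r_squared)
```

The identity does not carry over to models with several explanatory variables, where there is no single pairwise correlation to square.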
Summary
summary(mod_possum)
Call:
lm(formula = totalL ~ tailL, data = possum)

Residuals:
   Min     1Q Median     3Q    Max
-9.210 -2.326  0.179  2.777  6.790

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    41.04       6.66    6.16  1.4e-08
tailL           1.24       0.18    6.93  3.9e-10

Residual standard error: 3.57 on 102 degrees of freedom
Multiple R-squared: 0.32, Adjusted R-squared: 0.313
F-statistic: 48 on 1 and 102 DF, p-value: 3.94e-10
Over-reliance on R-squared
Let's practice!
Unusual Points
Unusual points
regulars <- mlbBat10 %>%
  filter(AB > 400)

ggplot(data = regulars, aes(x = SB, y = HR)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
Leverage
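Leverage measures how far an observation's explanatory value lies from the mean of the explanatory variable; it does not depend on the response. For simple linear regression the standard formula, in notation assumed here rather than taken from the slides, is:

```latex
h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_{j=1}^{n} (x_j - \bar{x})^2}
```

broom's augment() reports this quantity in the `.hat` column.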
Leverage computations
library(broom)

mod <- lm(HR ~ SB, data = regulars)
mod %>%
  augment() %>%
  arrange(desc(.hat)) %>%
  select(HR, SB, .fitted, .resid, .hat) %>%
  head()

  HR SB .fitted .resid    .hat
1  1 68   2.383 -1.383 0.13082  # Juan Pierre
2  2 52   6.461 -4.461 0.07034
3  5 50   6.971 -1.971 0.06417
4 19 47   7.736 11.264 0.05550
5  5 47   7.736 -2.736 0.05550
6  1 42   9.010 -8.010 0.04261
Consider Rickey Henderson…
Influence via Cook's distance
mod <- lm(HR ~ SB, data = regulars_plus)
mod %>%
  augment() %>%
  arrange(desc(.cooksd)) %>%
  select(HR, SB, .fitted, .resid, .hat, .cooksd) %>%
  head()

  HR SB .fitted .resid     .hat .cooksd
1 28 65   5.770 22.230 0.105519 0.33430  # Henderson
2 54  9  17.451 36.549 0.006070 0.04210
3 34 26  13.905 20.095 0.013150 0.02797
4 19 47   9.525  9.475 0.049711 0.02535
5 39  0  19.328 19.672 0.010479 0.02124
6 42 14  16.408 25.592 0.006061 0.02061
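Cook's distance combines the size of a residual with the leverage of its point, which is why a point needs both to rank highly. A minimal sketch of that calculation, assuming base R's `hatvalues()` and `cooks.distance()` and the built-in `cars` data rather than `regulars_plus`; `mod_cars` and `cooks_by_hand` are illustrative names:

```r
# Stand-in model on the built-in `cars` data (an assumption)
mod_cars <- lm(dist ~ speed, data = cars)

e  <- residuals(mod_cars)     # raw residuals (.resid in augment() output)
h  <- hatvalues(mod_cars)     # leverage (.hat in augment() output)
p  <- length(coef(mod_cars))  # number of coefficients, 2 here
s2 <- sigma(mod_cars)^2       # estimated residual variance

# Large residual and high leverage together produce a large distance
cooks_by_hand <- (e^2 / (p * s2)) * h / (1 - h)^2

all.equal(cooks_by_hand, cooks.distance(mod_cars))
```

A point with high leverage but a small residual, or a large residual but low leverage, ends up with a modest distance under this formula.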
Let's practice!
Dealing with Outliers
Dealing with outliers
ggplot(data = regulars_plus, aes(x = SB, y = HR)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
The full model
coef(lm(HR ~ SB, data = regulars_plus))

(Intercept)          SB
    19.3282     -0.2086
Removing outliers that don't fit
regulars <- regulars_plus %>%
  filter(!(SB > 60 & HR > 20))  # remove Henderson

coef(lm(HR ~ SB, data = regulars))

(Intercept)          SB
    19.7169     -0.2549

What is the justification?
How does the scope of inference change?
Removing outliers that do fit
regulars_new <- regulars %>%
  filter(SB < 60)  # remove Pierre

coef(lm(HR ~ SB, data = regulars_new))

(Intercept)          SB
    19.6870     -0.2514

What is the justification?
How does the scope of inference change?
Let's practice!
Conclusion
Graphical: scatterplots
Numerical: correlation
Modular: linear regression
Focus on interpretation
Objects and formulas
summary(mod)
Call:
lm(formula = uclaNew ~ amazNew, data = textbooks)

Residuals:
   Min     1Q Median     3Q    Max
-34.78  -4.57   0.58   4.01  39.00

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.9290     1.9354    0.48     0.63
amazNew       1.1990     0.0252   47.60   <2e-16

Residual standard error: 10.5 on 71 degrees of freedom
Multiple R-squared: 0.97, Adjusted R-squared: 0.969
F-statistic: 2.27e+03 on 1 and 71 DF, p-value: <2e-16
Model fit
Let's practice!