eNote 3

Contents

3 MLR, Multiple Linear Regression (OLS) in R
    3.1 Reading material
    3.2 Example: Car data
        3.2.1 Exploration
        3.2.2 Modelling
        3.2.3 Validation
        3.2.4 More validation - modelling non-linearities and interactions
        3.2.5 Removing outliers
        3.2.6 Prediction
    3.3 Exercises
3.1 Reading material
• The intro statistics course eNote 6 on MLR (incl. an ozone concentration example): http://introstat.compute.dtu.dk/enote/afsnit/NUID177/
• The Wehrens book, Chapter 8, pp. 145-148
• The Varmuza book, Chapter 4, Sections 4.3.1-4.3.2
• Some basics on regression in R: http://www.statmethods.net/stats/regression.html
• Regression diagnostics in R: http://www.statmethods.net/stats/regression.html
3.2 Example: Car data
3.2.1 Exploration
Please check the section on explorative plotting of the car data in chapter 1 of the eNote.
## The data:
data(mtcars)
summary(mtcars) # Summarize each variable in the data set
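As a quick supplement to summary(), one may also look at pairwise scatter plots and the correlations with mpg. A small sketch (the variables picked here are just illustrative, not a recommendation):

```r
data(mtcars)
# Pairwise scatter plots of mpg against a few illustratively chosen variables:
pairs(mtcars[, c("mpg", "cyl", "disp", "hp", "wt")], panel = panel.smooth)
# Correlation of every variable with mpg:
round(cor(mtcars)[, "mpg"], 2)
```

The strong negative correlation between mpg and wt already hints at why wt turns up in every model below.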
3.2.2 Modelling
Simple linear regression is carried out as follows:
# Simple regression
lm1 <- lm(mpg ~ wt, data = mtcars)
summary(lm1)
Call:
lm(formula = mpg ~ wt, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-4.543 -2.365 -0.125 1.410 6.873
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 37.285 1.878 19.86 < 2e-16 ***
wt -5.344 0.559 -9.56 1.3e-10 ***
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
Residual standard error: 3.05 on 30 degrees of freedom
Multiple R-squared: 0.753, Adjusted R-squared: 0.745
F-statistic: 91.4 on 1 and 30 DF, p-value: 1.29e-10
For the full MLR-model we simply add all the terms in the model:
# Full MLR model:
lm2 <- lm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb, data = mtcars)
summary(lm2)
Call:
lm(formula = mpg ~ cyl + disp + hp + drat + wt + qsec + vs +
am + gear + carb, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-3.45 -1.60 -0.12 1.22 4.63
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.3034 18.7179 0.66 0.518
cyl -0.1114 1.0450 -0.11 0.916
disp 0.0133 0.0179 0.75 0.463
hp -0.0215 0.0218 -0.99 0.335
drat 0.7871 1.6354 0.48 0.635
wt -3.7153 1.8944 -1.96 0.063 .
qsec 0.8210 0.7308 1.12 0.274
vs 0.3178 2.1045 0.15 0.881
am 2.5202 2.0567 1.23 0.234
gear 0.6554 1.4933 0.44 0.665
carb -0.1994 0.8288 -0.24 0.812
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
Residual standard error: 2.65 on 21 degrees of freedom
Multiple R-squared: 0.869, Adjusted R-squared: 0.807
F-statistic: 13.9 on 10 and 21 DF, p-value: 3.79e-07
Some effects (a few, actually) appear significant, others not. It is possible to have R do some automated model selection, e.g. removing non-significant terms in a stepwise manner:
step(lm2, direction = "backward")
Start: AIC=70.9
mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb
Df Sum of Sq RSS AIC
- cyl 1 0.08 148 68.9
- vs 1 0.16 148 68.9
- carb 1 0.41 148 69.0
- gear 1 1.35 149 69.2
- drat 1 1.63 149 69.2
- disp 1 3.92 151 69.7
- hp 1 6.84 154 70.3
- qsec 1 8.86 156 70.8
<none> 148 70.9
- am 1 10.55 158 71.1
- wt 1 27.01 174 74.3
Step: AIC=68.92
mpg ~ disp + hp + drat + wt + qsec + vs + am + gear + carb
Df Sum of Sq RSS AIC
- vs 1 0.27 148 67.0
- carb 1 0.52 148 67.0
- gear 1 1.82 149 67.3
- drat 1 1.98 150 67.3
- disp 1 3.90 152 67.7
- hp 1 7.36 155 68.5
<none> 148 68.9
- qsec 1 10.09 158 69.0
- am 1 11.84 159 69.4
- wt 1 27.03 175 72.3
Step: AIC=66.97
mpg ~ disp + hp + drat + wt + qsec + am + gear + carb
Df Sum of Sq RSS AIC
- carb 1 0.69 148 65.1
- gear 1 2.14 150 65.4
- drat 1 2.21 150 65.4
- disp 1 3.65 152 65.8
- hp 1 7.11 155 66.5
<none> 148 67.0
- am 1 11.57 159 67.4
- qsec 1 15.68 164 68.2
- wt 1 27.38 175 70.4
Step: AIC=65.12
mpg ~ disp + hp + drat + wt + qsec + am + gear
Df Sum of Sq RSS AIC
- gear 1 1.6 150 63.5
- drat 1 1.9 150 63.5
<none> 148 65.1
- disp 1 10.1 159 65.2
- am 1 12.3 161 65.7
- hp 1 14.8 163 66.2
- qsec 1 26.4 175 68.4
- wt 1 69.1 218 75.3
Step: AIC=63.46
mpg ~ disp + hp + drat + wt + qsec + am
Df Sum of Sq RSS AIC
- drat 1 3.3 153 62.2
- disp 1 8.5 159 63.2
<none> 150 63.5
- hp 1 13.3 163 64.2
- am 1 20.0 170 65.5
- qsec 1 25.6 176 66.5
- wt 1 67.6 218 73.4
Step: AIC=62.16
mpg ~ disp + hp + wt + qsec + am
Df Sum of Sq RSS AIC
- disp 1 6.6 160 61.5
<none> 153 62.2
- hp 1 12.6 166 62.7
- qsec 1 26.5 180 65.3
- am 1 32.2 186 66.3
- wt 1 69.0 222 72.1
Step: AIC=61.52
mpg ~ hp + wt + qsec + am
Df Sum of Sq RSS AIC
- hp 1 9.2 169 61.3
<none> 160 61.5
- qsec 1 20.2 180 63.3
- am 1 26.0 186 64.3
- wt 1 78.5 239 72.3
Step: AIC=61.31
mpg ~ wt + qsec + am
Df Sum of Sq RSS AIC
<none> 169 61.3
- am 1 26.2 195 63.9
- qsec 1 109.0 278 75.2
- wt 1 183.3 353 82.8
Call:
lm(formula = mpg ~ wt + qsec + am, data = mtcars)
Coefficients:
(Intercept) wt qsec am
9.62 -3.92 1.23 2.94
In each step, the least important term is dropped from the model if it is judged not to be important. The default criterion used here for judging "importance" is the so-called AIC, which in this case is very closely related to the usual t-testing of each individual term (which could also be used as a criterion). The AIC is defined from the log-likelihood value of the model fit:

AIC = 2k − 2 log L

where k is the number of parameters in the model and L is the (maximum) likelihood value. For the full MLR model here we have k = 12 parameters: 10 x-variables, the intercept and the residual variance parameter. We can check the AIC and log-likelihood values with built-in R functions:
logLik(lm2)
’log Lik.’ -69.855 (df=12)
2*12-2*logLik(lm2)
’log Lik.’ 163.71 (df=12)
AIC(lm2)
[1] 163.71
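Note a possible point of confusion: the AIC values printed by step() above (starting at 70.9) differ from AIC(lm2) = 163.71. This is because step() uses extractAIC(), which for lm fits drops additive constants that are the same for every candidate model, so the model ranking is unaffected. A small sketch of the relation:

```r
lm2 <- lm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb,
          data = mtcars)
extractAIC(lm2)  # (edf, AIC up to a constant); the value step() reports, about 70.9
AIC(lm2)         # the full-likelihood AIC, about 163.7
# For a Gaussian lm the two differ by the constant n*(log(2*pi) + 1) + 2,
# which does not depend on which x-variables are in the model:
n <- nrow(mtcars)
AIC(lm2) - extractAIC(lm2)[2]  # about 92.8
```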
We will not teach the likelihood theory here, but merely mention that the maximum log-likelihood value for these models simply amounts to a measure of the fit of the model using the normal distribution; that is, it basically depends on the residuals of the model:
lm2summary <- summary(lm2)
lm2summary$sigma
[1] 2.6502
var_ml <- lm2summary$sigma^2 * 21 / 32 # ML variance estimate: RSS/n = sigma^2 * df/n
sum(log(dnorm(resid(lm2), sd = sqrt(var_ml))))
[1] -69.855
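As a concrete check of the stepwise bookkeeping, the first step() decision can be redone by hand: dropping cyl loses almost no likelihood but saves one parameter, so the AIC decreases. A sketch:

```r
lm2 <- lm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb,
          data = mtcars)
# Refit without cyl, the variable step() removes first:
lm2_nocyl <- update(lm2, . ~ . - cyl)
AIC(lm2)                   # full model
AIC(lm2_nocyl)             # without cyl: smaller, so cyl is dropped
AIC(lm2_nocyl) < AIC(lm2)  # TRUE
```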
The better the fit, the smaller the residual variance and the larger the log-likelihood. You then choose the model where the AIC is smallest: whenever you drop a variable from the model, the fit becomes poorer and the log-likelihood of the reduced model is smaller. But the reduced model also has one parameter fewer, so with the AIC approach a variable is only dropped if twice the log-likelihood decreases by less than 2.

The AIC comparison for each variable is hence in close correspondence with the t-test p-values. But being kept in the model by the AIC criterion does not necessarily mean that a variable is significant at the 5% level, so we check the final model chosen by the step function:
lm3 <- lm(formula = mpg ~ wt + qsec + am, data = mtcars)
summary(lm3)
Call:
lm(formula = mpg ~ wt + qsec + am, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-3.481 -1.556 -0.726 1.411 4.661
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.618 6.960 1.38 0.17792
wt -3.917 0.711 -5.51 7e-06 ***
qsec 1.226 0.289 4.25 0.00022 ***
am 2.936 1.411 2.08 0.04672 *
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
Residual standard error: 2.46 on 28 degrees of freedom
Multiple R-squared: 0.85, Adjusted R-squared: 0.834
F-statistic: 52.7 on 3 and 28 DF, p-value: 1.21e-11
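The t-tests above can be supplemented by confidence intervals for the coefficients of the chosen model, obtained with confint():

```r
lm3 <- lm(mpg ~ wt + qsec + am, data = mtcars)
# 95% intervals; note that the am interval only just excludes zero,
# matching its p-value of 0.047:
round(confint(lm3), 3)
```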
3.2.3 Validation
# plot fitted vs. observed - identifying observation nr:
plot(mtcars$mpg, lm3$fitted, type = "n")
text(mtcars$mpg, lm3$fitted, labels = row.names(mtcars))
[Figure: fitted values (lm3$fitted) plotted against observed mtcars$mpg, each point labelled with its car model name.]
# Regression Diagnostics plots given automatically, either 4:
par(mfrow = c(2, 2))
plot(lm3)
[Figure: the four standard lm diagnostics plots for lm3 - Residuals vs Fitted, Normal Q-Q, Scale-Location and Residuals vs Leverage (with Cook's distance contours). Chrysler Imperial, Fiat 128, Toyota Corolla and Merc 230 are flagged as the most extreme observations.]
# Or 6:
par(mfrow = c(3, 2))
plot(lm3, 1:6)
[Figure: the six lm diagnostics plots for lm3 - the four above plus Cook's distance vs observation number and Cook's distance vs leverage. Again Chrysler Imperial, Merc 230 and Fiat 128 stand out.]
# Plot residuals versus the individual x's (columns 4-8: hp, drat, wt, qsec, vs):
par(mfrow = c(2, 3))
for (i in 4:8) {
  plot(lm3$residuals ~ mtcars[, i], type = "n", xlab = names(mtcars)[i])
  text(mtcars[, i], lm3$residuals, labels = row.names(mtcars))
  lines(lowess(mtcars[, i], lm3$residuals), col = "blue")
}
[Figure: residuals of lm3 plotted against hp, drat, wt, qsec and vs, points labelled by car model, with a lowess smoother in blue in each panel.]
Before moving on with e.g. more linearity-investigation diagnostics, it already seems that we have a problem with the variability: it appears to increase with increasing fitted values. This points towards a log-transformation. Let's try the Box-Cox power transformations for the full model:
par(mfrow = c(1, 1))
library(MASS)
boxcox(lm2)
[Figure: Box-Cox profile log-likelihood for lm2 as a function of λ, with the 95% interval marked; the maximum lies near λ = 0.]
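The optimal λ can also be extracted numerically instead of being read off the plot. A sketch using the same boxcox() call with plotit = FALSE:

```r
library(MASS)
lm2 <- lm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb,
          data = mtcars)
bc <- boxcox(lm2, plotit = FALSE)       # grid of lambda values and log-likelihoods
(lambda_hat <- bc$x[which.max(bc$y)])   # close to 0, i.e. a log-transformation
```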
This clearly supports the log-transformation, so we redo the initial analysis:
lm2 <- lm(log(mpg) ~ cyl + disp + hp + drat + wt + qsec + vs +
am + gear + carb, data = mtcars)
summary(lm2)
Call:
lm(formula = log(mpg) ~ cyl + disp + hp + drat + wt + qsec +
vs + am + gear + carb, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-0.1457 -0.0789 -0.0175 0.0652 0.2513
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.78e+00 8.49e-01 3.27 0.0037 **
cyl 7.66e-03 4.74e-02 0.16 0.8733
disp 4.99e-05 8.10e-04 0.06 0.9515
hp -8.96e-04 9.88e-04 -0.91 0.3744
drat 2.22e-02 7.42e-02 0.30 0.7677
wt -1.72e-01 8.60e-02 -2.00 0.0580 .
qsec 3.08e-02 3.32e-02 0.93 0.3640
vs -2.87e-03 9.55e-02 -0.03 0.9763
am 4.74e-02 9.33e-02 0.51 0.6169
gear 5.93e-02 6.78e-02 0.87 0.3917
carb -2.01e-02 3.76e-02 -0.54 0.5983
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
Residual standard error: 0.12 on 21 degrees of freedom
Multiple R-squared: 0.89, Adjusted R-squared: 0.837
F-statistic: 16.9 on 10 and 21 DF, p-value: 6.89e-08
step(lm2, direction = "backward")
Start: AIC=-127.05
log(mpg) ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear +
carb
Df Sum of Sq RSS AIC
- vs 1 0.0000 0.304 -129
- disp 1 0.0001 0.304 -129
- cyl 1 0.0004 0.304 -129
- drat 1 0.0013 0.305 -129
- am 1 0.0037 0.307 -129
- carb 1 0.0041 0.308 -129
- gear 1 0.0111 0.315 -128
- hp 1 0.0119 0.316 -128
- qsec 1 0.0124 0.316 -128
<none> 0.304 -127
- wt 1 0.0581 0.362 -123
Step: AIC=-129.05
log(mpg) ~ cyl + disp + hp + drat + wt + qsec + am + gear + carb
Df Sum of Sq RSS AIC
- disp 1 0.0001 0.304 -131
- cyl 1 0.0005 0.304 -131
- drat 1 0.0013 0.305 -131
- am 1 0.0040 0.308 -131
- carb 1 0.0041 0.308 -131
- gear 1 0.0110 0.315 -130
- hp 1 0.0131 0.317 -130
- qsec 1 0.0140 0.318 -130
<none> 0.304 -129
- wt 1 0.0584 0.362 -125
Step: AIC=-131.04
log(mpg) ~ cyl + hp + drat + wt + qsec + am + gear + carb
Df Sum of Sq RSS AIC
- cyl 1 0.0007 0.304 -133
- drat 1 0.0014 0.305 -133
- am 1 0.0040 0.308 -133
- carb 1 0.0088 0.312 -132
- gear 1 0.0112 0.315 -132
- qsec 1 0.0152 0.319 -132
- hp 1 0.0167 0.320 -131
<none> 0.304 -131
- wt 1 0.1443 0.448 -121
Step: AIC=-132.97
log(mpg) ~ hp + drat + wt + qsec + am + gear + carb
Df Sum of Sq RSS AIC
- drat 1 0.0010 0.305 -135
- am 1 0.0035 0.308 -135
- carb 1 0.0085 0.313 -134
- gear 1 0.0109 0.315 -134
- hp 1 0.0164 0.321 -133
- qsec 1 0.0189 0.323 -133
<none> 0.304 -133
- wt 1 0.1499 0.454 -122
Step: AIC=-134.87
log(mpg) ~ hp + wt + qsec + am + gear + carb
Df Sum of Sq RSS AIC
- am 1 0.0046 0.310 -136
- carb 1 0.0078 0.313 -136
- gear 1 0.0129 0.318 -136
- hp 1 0.0178 0.323 -135
<none> 0.305 -135
- qsec 1 0.0205 0.326 -135
- wt 1 0.1638 0.469 -123
Step: AIC=-136.39
log(mpg) ~ hp + wt + qsec + gear + carb
Df Sum of Sq RSS AIC
- carb 1 0.0083 0.318 -138
- qsec 1 0.0163 0.326 -137
<none> 0.310 -136
- hp 1 0.0207 0.331 -136
- gear 1 0.0304 0.340 -135
- wt 1 0.1937 0.504 -123
Step: AIC=-137.55
log(mpg) ~ hp + wt + qsec + gear
Df Sum of Sq RSS AIC
<none> 0.318 -138
- gear 1 0.0228 0.341 -137
- qsec 1 0.0231 0.341 -137
- hp 1 0.0332 0.351 -136
- wt 1 0.2879 0.606 -119
Call:
lm(formula = log(mpg) ~ hp + wt + qsec + gear, data = mtcars)
Coefficients:
(Intercept) hp wt qsec gear
3.08237 -0.00108 -0.19155 0.02603 0.05023
lm3 <- lm(log(mpg) ~ hp + wt + qsec + gear, data = mtcars)
summary(lm3)
Call:
lm(formula = log(mpg) ~ hp + wt + qsec + gear, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-0.1421 -0.0674 -0.0316 0.0613 0.2736
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.082366 0.417139 7.39 6.0e-08 ***
hp -0.001080 0.000643 -1.68 0.10
wt -0.191553 0.038757 -4.94 3.6e-05 ***
qsec 0.026027 0.018592 1.40 0.17
gear 0.050232 0.036090 1.39 0.18
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
Residual standard error: 0.109 on 27 degrees of freedom
Multiple R-squared: 0.884, Adjusted R-squared: 0.867
F-statistic: 51.6 on 4 and 27 DF, p-value: 2.95e-12
Let us now redo versions of the diagnostics plots using the ggplot2 package, as illustrated in Chapter 1 for the raw data plotting. Using the melt function of the reshape2 package, we make a version of the data set where the (relevant) variables are stacked on top of each other as a single variable, with a new variable coding which is which (we take the 4 x-variables from the model):
library(reshape2)
mtcars$residuals <- resid(lm3)
# Columns 4, 6, 7 and 10 are hp, wt, qsec and gear:
mtcars2 <- melt(mtcars, measure.vars = c(4, 6, 7, 10))
And then we use this new "variable coding" factor to produce a panel of plots, one for each variable:
library(ggplot2)
p <- ggplot(mtcars2, aes(value, residuals))
p <- p + geom_point(shape = 1)
p <- p + geom_smooth(method = "loess")
p <- p + facet_wrap(~ variable, scales = "free")
print(p)
[Figure: lm3 residuals plotted against hp, wt, qsec and gear in separate facets, with loess smoothers and confidence bands.]
Even though the residual patterns are not perfectly linear, all of the confidence bands contain the zero line throughout (the wt band is on the limit).
3.2.4 More validation - modelling non-linearities and interactions
The potential non-linearity of hp or wt can be checked more formally by including it in the modelling. This is easily done in R with e.g. the poly function, here applied to fit and test a 3rd-degree polynomial for each of those two variables:
lm4 <- lm(log(mpg) ~ hp + wt + qsec + gear + poly(hp, 3) + poly(wt, 3),
          data = mtcars)
summary(lm4)
Call:
lm(formula = log(mpg) ~ hp + wt + qsec + gear + poly(hp, 3) +
poly(wt, 3), data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-0.16380 -0.06647 -0.00527 0.03225 0.24115
Coefficients: (2 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.155942 0.485279 6.50 1.2e-06 ***
hp -0.001501 0.000811 -1.85 0.0770 .
wt -0.164364 0.048387 -3.40 0.0025 **
qsec 0.019608 0.021417 0.92 0.3694
gear 0.054383 0.043106 1.26 0.2197
poly(hp, 3)1 NA NA NA NA
poly(hp, 3)2 0.137961 0.151826 0.91 0.3729
poly(hp, 3)3 -0.080076 0.127262 -0.63 0.5354
poly(wt, 3)1 NA NA NA NA
poly(wt, 3)2 0.044573 0.129784 0.34 0.7344
poly(wt, 3)3 -0.132549 0.122949 -1.08 0.2922
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
Residual standard error: 0.11 on 23 degrees of freedom
Multiple R-squared: 0.9,Adjusted R-squared: 0.865
F-statistic: 25.8 on 8 and 23 DF, p-value: 1.01e-09
This confirms that neither curvature (2nd-degree terms) nor more complicated non-linear structures are really significant. Let's double-check without the 3rd-degree terms:
lm4 <- lm(log(mpg) ~ hp + wt + qsec + gear + poly(hp, 2) + poly(wt, 2),
          data = mtcars)
summary(lm4)
Call:
lm(formula = log(mpg) ~ hp + wt + qsec + gear + poly(hp, 2) +
poly(wt, 2), data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-0.1621 -0.0647 -0.0197 0.0580 0.2293
Coefficients: (2 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.284438 0.465993 7.05 2.2e-07 ***
hp -0.001286 0.000768 -1.68 0.10632
wt -0.183001 0.044749 -4.09 0.00039 ***
qsec 0.018703 0.020985 0.89 0.38130
gear 0.031637 0.038656 0.82 0.42085
poly(hp, 2)1 NA NA NA NA
poly(hp, 2)2 0.137100 0.148705 0.92 0.36536
poly(wt, 2)1 NA NA NA NA
poly(wt, 2)2 0.097600 0.115796 0.84 0.40730
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
Residual standard error: 0.109 on 25 degrees of freedom
Multiple R-squared: 0.893, Adjusted R-squared: 0.867
F-statistic: 34.7 on 6 and 25 DF, p-value: 6.04e-11
We could also check for interactions - the possibility that the effect of some of the x-variables depends on the values of some of the other x-variables. Again, in R it is easy to include all the interactions:
lm5 <- lm(log(mpg) ~ (hp + wt + qsec + gear) *
(hp + wt + qsec + gear), data = mtcars)
summary(lm5)
Call:
lm(formula = log(mpg) ~ (hp + wt + qsec + gear) * (hp + wt +
qsec + gear), data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-0.15227 -0.05985 -0.00141 0.04864 0.19782
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.54e+00 5.19e+00 -0.68 0.503
hp 4.60e-04 8.26e-03 0.06 0.956
wt 2.10e+00 1.25e+00 1.68 0.107
qsec 2.90e-01 2.39e-01 1.21 0.239
gear 3.47e-01 7.36e-01 0.47 0.642
hp:wt -8.52e-04 8.28e-04 -1.03 0.315
hp:qsec -7.06e-05 3.81e-04 -0.19 0.855
hp:gear 4.09e-04 1.08e-03 0.38 0.707
wt:qsec -9.10e-02 5.24e-02 -1.74 0.097 .
wt:gear -1.34e-01 8.64e-02 -1.56 0.135
qsec:gear 5.43e-03 3.63e-02 0.15 0.883
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
Residual standard error: 0.107 on 21 degrees of freedom
Multiple R-squared: 0.913, Adjusted R-squared: 0.872
F-statistic: 22.1 on 10 and 21 DF, p-value: 6.1e-09
None of them appears significant. But please also remember that even though some might appear slightly significant, one should be careful about over-interpreting this. First of all, we do many tests in this process - in this case 6 tests for interactions. Good advice is to always do a so-called "Bonferroni-type" correction in such a case; here it would amount to using the significance level 0.05/6 ≈ 0.01 instead of 0.05, to protect against random significances. Secondly, things may change when we begin to remove non-significant interactions from the model again. I do this in the following, without showing all the steps. (And important: do NOT interpret main-effect information for a variable when it is also part of interaction terms in the model.)
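The Bonferroni-type screening just described can be sketched directly on the six interaction p-values copied from the summary(lm5) table above:

```r
# The six interaction p-values from summary(lm5):
p_int <- c("hp:wt" = 0.315, "hp:qsec" = 0.855, "hp:gear" = 0.707,
           "wt:qsec" = 0.097, "wt:gear" = 0.135, "qsec:gear" = 0.883)
alpha_bonf <- 0.05 / length(p_int)  # 0.05/6, approx. 0.0083
any(p_int < alpha_bonf)             # FALSE: none survive the correction
# Equivalently, via adjusted p-values compared against 0.05:
p.adjust(p_int, method = "bonferroni")
```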
lm6 <- lm(log(mpg) ~ hp + wt + qsec + gear +
hp:wt + hp:qsec + hp:gear + wt:qsec + wt:gear, data = mtcars)
summary(lm6)
Call:
lm(formula = log(mpg) ~ hp + wt + qsec + gear + hp:wt + hp:qsec +
hp:gear + wt:qsec + wt:gear, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-0.15216 -0.06213 -0.00031 0.05002 0.19604
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.08e+00 3.59e+00 -1.14 0.268
hp 1.10e-03 6.91e-03 0.16 0.875
wt 2.13e+00 1.20e+00 1.77 0.091 .
qsec 3.18e-01 1.43e-01 2.23 0.036 *
gear 4.52e-01 2.23e-01 2.03 0.055 .
hp:wt -8.67e-04 8.03e-04 -1.08 0.292
hp:qsec -7.61e-05 3.71e-04 -0.21 0.839
hp:gear 2.78e-04 6.07e-04 0.46 0.651
wt:qsec -9.32e-02 4.92e-02 -1.89 0.072 .
wt:gear -1.31e-01 8.11e-02 -1.61 0.121
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
Residual standard error: 0.104 on 22 degrees of freedom
Multiple R-squared: 0.913, Adjusted R-squared: 0.877
F-statistic: 25.7 on 9 and 22 DF, p-value: 1.16e-09
lm7 <- lm(log(mpg) ~ hp + wt + qsec + gear +
hp:wt + hp:gear + wt:qsec + wt:gear, data = mtcars)
summary(lm7)
Call:
lm(formula = log(mpg) ~ hp + wt + qsec + gear + hp:wt + hp:gear +
wt:qsec + wt:gear, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-0.15190 -0.05938 0.00164 0.04910 0.19396
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.122793 3.511592 -1.17 0.252
hp -0.000169 0.003046 -0.06 0.956
wt 2.174015 1.158859 1.88 0.073 .
qsec 0.321575 0.138457 2.32 0.029 *
gear 0.443498 0.214573 2.07 0.050 .
hp:wt -0.000865 0.000786 -1.10 0.283
hp:gear 0.000299 0.000586 0.51 0.614
wt:qsec -0.096674 0.045174 -2.14 0.043 *
wt:gear -0.127107 0.077412 -1.64 0.114
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
Residual standard error: 0.102 on 23 degrees of freedom
Multiple R-squared: 0.913, Adjusted R-squared: 0.883
F-statistic: 30.1 on 8 and 23 DF, p-value: 2.07e-10
lm8 <- lm(log(mpg) ~ hp + wt + qsec + gear +
hp:wt + wt:qsec + wt:gear, data = mtcars)
summary(lm8)
Call:
lm(formula = log(mpg) ~ hp + wt + qsec + gear + hp:wt + wt:qsec +
wt:gear, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-0.14743 -0.06462 -0.00812 0.05687 0.19267
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.653785 3.336892 -1.09 0.284
hp 0.000809 0.002332 0.35 0.732
wt 1.938639 1.046850 1.85 0.076 .
qsec 0.302590 0.131308 2.30 0.030 *
gear 0.419263 0.206017 2.04 0.053 .
hp:wt -0.000751 0.000742 -1.01 0.322
wt:qsec -0.089738 0.042417 -2.12 0.045 *
wt:gear -0.102841 0.060180 -1.71 0.100
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
Residual standard error: 0.1 on 24 degrees of freedom
Multiple R-squared: 0.912, Adjusted R-squared: 0.886
F-statistic: 35.5 on 7 and 24 DF, p-value: 3.75e-11
lm9 <- lm(log(mpg) ~ hp + wt + qsec + gear +
wt:qsec + wt:gear, data = mtcars)
summary(lm9)
Call:
lm(formula = log(mpg) ~ hp + wt + qsec + gear + wt:qsec + wt:gear,
data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-0.16175 -0.06625 -0.00705 0.05506 0.19559
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.658008 1.541204 -0.43 0.673
hp -0.001443 0.000699 -2.06 0.050 *
wt 1.000935 0.487534 2.05 0.051 .
qsec 0.192462 0.073529 2.62 0.015 *
gear 0.243207 0.110427 2.20 0.037 *
wt:qsec -0.054639 0.024435 -2.24 0.035 *
wt:gear -0.052699 0.034178 -1.54 0.136
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
Residual standard error: 0.101 on 25 degrees of freedom
Multiple R-squared: 0.908, Adjusted R-squared: 0.886
F-statistic: 41.2 on 6 and 25 DF, p-value: 9.03e-12
lm9 <- lm(log(mpg) ~ hp + wt + qsec + gear +
wt:qsec, data = mtcars)
summary(lm9)
Call:
lm(formula = log(mpg) ~ hp + wt + qsec + gear + wt:qsec, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-0.1286 -0.0731 -0.0182 0.0596 0.2222
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.388875 1.419765 0.27 0.786
hp -0.001724 0.000693 -2.49 0.020 *
wt 0.724936 0.465341 1.56 0.131
qsec 0.167048 0.073531 2.27 0.032 *
gear 0.082830 0.038056 2.18 0.039 *
wt:qsec -0.048975 0.024789 -1.98 0.059 .
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
Residual standard error: 0.103 on 26 degrees of freedom
Multiple R-squared: 0.899, Adjusted R-squared: 0.88
F-statistic: 46.5 on 5 and 26 DF, p-value: 3.8e-12
Even in this model the remaining interaction is not really significant, so we end up with the purely additive model with the four x's (already fitted as lm3).

A remark on modelling interactions: here we just specified the interactions as multiplicative terms in the model. This fits with how a potential interaction is modelled by this approach: as an effect where log(mpg) changes linearly with the product of two quantitative x-variables. Clearly, interaction effects could take many other forms. A way to model this more generally is to categorize an x-variable and include it in the modelling as a factor, as in Analysis of Variance (ANOVA).
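A hedged sketch of that more general approach (the grouping variable wt_grp and the three equal-width groups are our own illustrative choices, not from the eNote): categorize wt with cut() and let it interact with qsec as a factor, so the interaction is no longer forced to be linear in the product:

```r
mtc <- mtcars  # work on a copy so mtcars itself is unchanged
# Categorize wt into three arbitrary, equal-width groups:
mtc$wt_grp <- cut(mtc$wt, breaks = 3,
                  labels = c("light", "medium", "heavy"))
# Let the qsec effect differ between weight groups:
lm_cat <- lm(log(mpg) ~ hp + gear + qsec * wt_grp, data = mtc)
anova(lm_cat)  # F-test for the qsec:wt_grp interaction, ANOVA-style
```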
Let's look at the basic diagnostics plots again, now for the log-transformed analysis:
# Regression Diagnostics plots given automatically, either 4:
par(mfrow = c(2, 2))
plot(lm3)
[Figure: the four standard lm diagnostics plots for the log-scale lm3. Chrysler Imperial, Pontiac Firebird, Fiat 128, Cadillac Fleetwood and Merc 230 are flagged as the most extreme observations.]
And the Box-Cox plot together with another QQ-plot, using boxCox and qqPlot from the car package:
library(car)
par(mfrow = c(1, 2))
boxCox(lm3)
qqPlot(residuals(lm3))
[Figure: Box-Cox profile log-likelihood for the log-scale lm3 with its 95% interval, and a QQ-plot of residuals(lm3).]
3.2.5 Removing outliers
Even though, for the log-transformed mpg-values, none of the points appears really extreme, let's illustrate what could be done by trying to remove the Chrysler Imperial data point:
# How to remove a potential outlier and re-analyze:
# Which data point number is "Chrysler Imperial":
rownames(mtcars)
[1] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710"
[4] "Hornet 4 Drive" "Hornet Sportabout" "Valiant"
[7] "Duster 360" "Merc 240D" "Merc 230"
[10] "Merc 280" "Merc 280C" "Merc 450SE"
[13] "Merc 450SL" "Merc 450SLC" "Cadillac Fleetwood"
[16] "Lincoln Continental" "Chrysler Imperial" "Fiat 128"
[19] "Honda Civic" "Toyota Corolla" "Toyota Corona"
[22] "Dodge Challenger" "AMC Javelin" "Camaro Z28"
[25] "Pontiac Firebird" "Fiat X1-9" "Porsche 914-2"
[28] "Lotus Europa" "Ford Pantera L" "Ferrari Dino"
[31] "Maserati Bora" "Volvo 142E"
# E.g. let's try without observation no. 17,
# removing it into a reduced copy of the data:
mtcars_red <- mtcars[-17, ]
# Check:
dim(mtcars)
[1] 32 12
dim(mtcars_red)
[1] 31 12
#row.names(mtcars_red)
lm3_red <- lm(log(mpg) ~ hp + wt + qsec + gear, data = mtcars_red)
summary(lm3_red)
Call:
lm(formula = log(mpg) ~ hp + wt + qsec + gear, data = mtcars_red)
Residuals:
Min 1Q Median 3Q Max
-0.1317 -0.0493 -0.0287 0.0599 0.2253
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.193185 0.358957 8.90 2.3e-09 ***
hp -0.000934 0.000553 -1.69 0.10
wt -0.227616 0.034972 -6.51 6.7e-07 ***
qsec 0.026968 0.015930 1.69 0.10
gear 0.038415 0.031127 1.23 0.23
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
Residual standard error: 0.093 on 26 degrees of freedom
Multiple R-squared: 0.916, Adjusted R-squared: 0.903
F-statistic: 70.8 on 4 and 26 DF, p-value: 1.36e-13
lm4_red <- lm(log(mpg) ~ hp + wt + qsec, data = mtcars_red)
summary(lm4_red)
Call:
lm(formula = log(mpg) ~ hp + wt + qsec, data = mtcars_red)
Residuals:
Min 1Q Median 3Q Max
-0.16200 -0.05817 -0.00423 0.05698 0.20961
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.428723 0.306947 11.17 1.3e-11 ***
hp -0.000795 0.000547 -1.45 0.16
wt -0.252602 0.028791 -8.77 2.2e-09 ***
qsec 0.025044 0.016007 1.56 0.13
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
Residual standard error: 0.0939 on 27 degrees of freedom
Multiple R-squared:  0.911, Adjusted R-squared:  0.901
F-statistic: 92.1 on 3 and 27 DF, p-value: 2.69e-14
lm5_red <- lm(log(mpg) ~ wt + qsec, data = mtcars_red)
summary(lm5_red)
Call:
lm(formula = log(mpg) ~ wt + qsec, data = mtcars_red)
Residuals:
Min 1Q Median 3Q Max
-0.1861 -0.0654 -0.0105 0.0543 0.2220
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.08000 0.19550 15.75 1.9e-15 ***
wt -0.28399 0.01944 -14.61 1.3e-14 ***
qsec 0.04369 0.00978 4.47 0.00012 ***
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
Residual standard error: 0.0958 on 28 degrees of freedom
Multiple R-squared:  0.904, Adjusted R-squared:  0.897
F-statistic: 132 on 2 and 28 DF, p-value: 5.67e-15
par(mfrow = c(3, 2))
plot(lm5_red, 1:6)
[Figure: the six standard diagnostic plots for lm5_red - Residuals vs Fitted, Normal Q-Q, Scale-Location, Cook's distance, Residuals vs Leverage, and Cook's dist vs Leverage hii/(1 - hii). The labelled cars are Toyota Corona, Pontiac Firebird and Fiat 128.]
So some of the significances were due to this single extreme observation.
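Rather than picking a suspicious point by eye, influential observations can also be flagged numerically with Cook's distance. A minimal sketch, assuming the four-variable log-model from above refitted on the full data (the cutoff 4/n is only a common rule of thumb, not a hard rule):

```r
data(mtcars)

# Same formula as lm3_red above, but fitted on all 32 cars:
lm3_full <- lm(log(mpg) ~ hp + wt + qsec + gear, data = mtcars)

# Cook's distance for each observation:
cd <- cooks.distance(lm3_full)

# Observations exceeding the rule-of-thumb cutoff 4/n:
names(cd)[cd > 4 / nrow(mtcars)]
```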
3.2.6 Prediction
It is easy to predict one or several observations for which data on the x-variables are available, e.g. predicting the (log-transformed) mpg for observations number 3, 7 and 9:
# With confidence intervals:
predict(lm3, mtcars[c(3,7,9),], interval = ’confidence’)
fit lwr upr
Datsun 710 3.2229 3.1679 3.2778
Duster 360 2.6970 2.5941 2.7999
Merc 230 3.1734 3.0136 3.3331
# With prediction intervals:
predict(lm3, mtcars[c(3,7,9),], interval = ’prediction’)
fit lwr upr
Datsun 710 3.2229 2.9934 3.4523
Duster 360 2.6970 2.4516 2.9424
Merc 230 3.1734 2.8992 3.4475
Or using the final model without the Chrysler Imperial:
# With confidence intervals:
predict(lm5_red, mtcars[c(3,7,9),], interval = ’confidence’)
fit lwr upr
Datsun 710 3.2342 3.1854 3.2830
Duster 360 2.7582 2.7040 2.8123
Merc 230 3.1859 3.0790 3.2928
# With prediction intervals:
predict(lm5_red, mtcars[c(3,7,9),], interval = ’prediction’)
fit lwr upr
Datsun 710 3.2342 3.0321 3.4363
Duster 360 2.7582 2.5547 2.9617
Merc 230 3.1859 2.9625 3.4093
In this case these observations were also used for the fit, but that is not essential: any new data set can be used in the prediction function.
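The same mechanics work for a genuinely new car, and since the response is log(mpg), predictions can be back-transformed with exp() to the mpg scale. A sketch with a hypothetical new car (the values wt = 3.0 and qsec = 18 are made up for illustration):

```r
data(mtcars)
mtcars_red <- mtcars[-17, ]
lm5_red <- lm(log(mpg) ~ wt + qsec, data = mtcars_red)

# A hypothetical new car: weight 3.0 (in 1000 lbs), quarter-mile time 18 s:
newcar <- data.frame(wt = 3.0, qsec = 18)

# Prediction interval on the log scale ...
pred_log <- predict(lm5_red, newcar, interval = "prediction")

# ... and back-transformed to the mpg scale:
exp(pred_log)
```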
3.3 Exercises
Exercise 1 Prostate Cancer data
Analyze the Prostate data from the following book website: http://statweb.stanford.edu/~tibs/ElemStatLearn/ (check pp. 47-48 of Hastie et al. for a short description of the data). (Use the uploaded version of the data file in Campusnet to avoid import problems.)
prostate <- read.table("prostatedata.txt", header = TRUE,
sep = ";", dec = ",")
head(prostate)
no lcavol lweight age lbph svi lcp gleason pgg45 lpsa
1 7 0.73716 3.4735 64 0.61519 0 -1.38629 6 0 0.76547
2 9 -0.77653 3.5395 47 -1.38629 0 -1.38629 6 0 1.04732
3 10 0.22314 3.2445 63 -1.38629 0 -1.38629 6 0 1.04732
4 15 1.20597 3.4420 57 -1.38629 0 -0.43078 7 5 1.39872
5 22 2.05924 3.5010 60 1.47476 0 1.34807 7 20 1.65823
6 25 0.38526 3.6674 69 1.59939 0 -1.38629 6 0 1.73165
train
1 FALSE
2 FALSE
3 FALSE
4 FALSE
5 FALSE
6 FALSE
summary(prostate)
no lcavol lweight age
Min. : 1 Min. :-1.347 Min. :2.37 Min. :41.0
1st Qu.:25 1st Qu.: 0.513 1st Qu.:3.38 1st Qu.:60.0
Median :49 Median : 1.447 Median :3.62 Median :65.0
Mean :49 Mean : 1.350 Mean :3.65 Mean :63.9
3rd Qu.:73 3rd Qu.: 2.127 3rd Qu.:3.88 3rd Qu.:68.0
Max. :97 Max. : 3.821 Max. :6.11 Max. :79.0
lbph svi lcp gleason
Min. :-1.39 Min. :0.000 Min. :-1.386 Min. :6.00
1st Qu.:-1.39 1st Qu.:0.000 1st Qu.:-1.386 1st Qu.:6.00
Median : 0.30 Median :0.000 Median :-0.799 Median :7.00
Mean : 0.10 Mean :0.216 Mean :-0.179 Mean :6.75
3rd Qu.: 1.56 3rd Qu.:0.000 3rd Qu.: 1.179 3rd Qu.:7.00
Max. : 2.33 Max. :1.000 Max. : 2.904 Max. :9.00
pgg45 lpsa train
Min. : 0.0 Min. :-0.431 Mode :logical
1st Qu.: 0.0 1st Qu.: 1.732 FALSE:30
Median : 15.0 Median : 2.591 TRUE :67
Mean : 24.4 Mean : 2.478 NA’s :0
3rd Qu.: 40.0 3rd Qu.: 3.056
Max. :100.0 Max. : 5.583
dim(prostate)
[1] 97 11
Note that the first and the last variable of the data set are not "real" variables: no is just a numbering of the persons, and train is a coding into two groups for (later) analysis: some persons have been assigned to a "training" part of the data set (train=TRUE) and others to a "test" part of the data set (train=FALSE). If not otherwise stated, we will for now ignore this grouping.
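When the grouping is needed later, the logical train column subsets the data directly. A small sketch; the commented lines show the idea for the prostate data itself, and the runnable part demonstrates the same mechanics on a toy data frame:

```r
# With 'prostate' read in as above, the split would simply be:
# prostate_train <- prostate[prostate$train, ]
# prostate_test  <- prostate[!prostate$train, ]

# The same subsetting mechanics on a toy data frame:
toy <- data.frame(x = 1:5, train = c(TRUE, TRUE, FALSE, TRUE, FALSE))
toy[toy$train, ]   # the "training" rows
toy[!toy$train, ]  # the "test" rows
```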
pairs(prostate)
[Figure: scatter plot matrix (pairs plot) of the prostate data variables no, lcavol, lweight, age, lbph, svi, lcp, gleason, pgg45, lpsa and train.]
Try to answer the following questions:
a) What are the pair-wise relations between lpsa and the other variables – any indications of important relations?
b) Are there any clearly non-normal (e.g. skew) distributions among the variables?
c) Run the 8-variable MLR analysis and try to reduce the model by removing the most non-significant variables one by one – what is the final model?
d) Interpret the parameters of the final model – compare with the investigation in 1.
e) What is the estimate (and interpretation) of the residual standard deviation?
f) Investigate the validity/assumptions of the final model:
1. Residual checks
2. Influential/outlying observations
3. Any additional model structure? (non-linearities, interactions?) (diagnostic plots and/or model extensions)
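For question c), one possible (not the only) way to organise the one-by-one reduction in R is with drop1() and update(). A sketch of the mechanics on the car data; the same approach applies to the 8-variable prostate model:

```r
data(mtcars)

# A full model on the car data (stand-in for the 8-variable prostate model):
full <- lm(mpg ~ hp + wt + qsec + gear, data = mtcars)

# F-tests for removing each single term:
drop1(full, test = "F")

# Refit without one term (here gear, for illustration), then repeat
# until all remaining terms are significant:
reduced <- update(full, . ~ . - gear)
summary(reduced)
```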