
eNote 3

MLR, Multiple Linear Regression (OLS) in R

Contents

3 MLR, Multiple Linear Regression (OLS) in R
  3.1 Reading material
  3.2 Example: Car data
    3.2.1 Exploration
    3.2.2 Modelling
    3.2.3 Validation
    3.2.4 More validation - modelling non-linearities and interactions
    3.2.5 Removing outliers
    3.2.6 Prediction
  3.3 Exercises

3.1 Reading material

• Intro statistics course eNote6 on MLR (incl. an ozone concentration example): http://introstat.compute.dtu.dk/enote/afsnit/NUID177/

• Wehrens' book, Chapter 8, pp. 145-148

• The Varmuza book, Chapter 4, Sections 4.3.1-4.3.2.

• And some basics on Regression in R: http://www.statmethods.net/stats/regression.html

• And Regression diagnostics in R: http://www.statmethods.net/stats/regression.html


3.2 Example: Car data

3.2.1 Exploration

Please check the section on explorative plotting of the car data in chapter 1 of the eNote.

## The data:

data(mtcars)

summary(mtcars) # Summarize each variable in the data set
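Besides summary(), a quick numerical and graphical overview can also be useful. The following is a small sketch supplementing the Chapter 1 plots (not part of the original eNote code):

```r
data(mtcars)

# Pairwise correlations, rounded for readability:
round(cor(mtcars), 2)

# Scatterplot matrix of the response and a few candidate predictors:
pairs(mtcars[, c("mpg", "wt", "hp", "qsec")])
```

The correlation matrix already hints at the strong negative relation between mpg and wt that the first regression below quantifies.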

3.2.2 Modelling

Simple linear regression is carried out as follows:

# Simple regression

lm1 <- lm(mpg ~ wt, data = mtcars)

summary(lm1)

Call:

lm(formula = mpg ~ wt, data = mtcars)

Residuals:

Min 1Q Median 3Q Max

-4.543 -2.365 -0.125 1.410 6.873

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 37.285 1.878 19.86 < 2e-16 ***

wt -5.344 0.559 -9.56 1.3e-10 ***

---

Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Residual standard error: 3.05 on 30 degrees of freedom

Multiple R-squared: 0.753, Adjusted R-squared: 0.745


F-statistic: 91.4 on 1 and 30 DF, p-value: 1.29e-10
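The individual pieces of the fit can also be pulled out programmatically, e.g. the estimates and their 95% confidence intervals (a small supplement to the summary above):

```r
lm1 <- lm(mpg ~ wt, data = mtcars)

coef(lm1)     # intercept 37.285 and slope -5.344, as in the summary
confint(lm1)  # 95% confidence intervals for both parameters
```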

For the full MLR model we simply include all the x-variables:

# Full MLR model:

lm2 <- lm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb, data = mtcars)

summary(lm2)

Call:

lm(formula = mpg ~ cyl + disp + hp + drat + wt + qsec + vs +

am + gear + carb, data = mtcars)

Residuals:

Min 1Q Median 3Q Max

-3.45 -1.60 -0.12 1.22 4.63

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 12.3034 18.7179 0.66 0.518

cyl -0.1114 1.0450 -0.11 0.916

disp 0.0133 0.0179 0.75 0.463

hp -0.0215 0.0218 -0.99 0.335

drat 0.7871 1.6354 0.48 0.635

wt -3.7153 1.8944 -1.96 0.063 .

qsec 0.8210 0.7308 1.12 0.274

vs 0.3178 2.1045 0.15 0.881

am 2.5202 2.0567 1.23 0.234

gear 0.6554 1.4933 0.44 0.665

carb -0.1994 0.8288 -0.24 0.812

---

Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Residual standard error: 2.65 on 21 degrees of freedom

Multiple R-squared: 0.869, Adjusted R-squared: 0.807

F-statistic: 13.9 on 10 and 21 DF, p-value: 3.79e-07
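Since all remaining columns of mtcars are used as predictors, the model can equivalently be specified with the "." shorthand (a small aside, not in the original code):

```r
data(mtcars)

lm2  <- lm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb,
           data = mtcars)

# "." means: all columns except the response
lm2b <- lm(mpg ~ ., data = mtcars)

all.equal(coef(lm2), coef(lm2b))  # the two fits are identical
```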

Some (few, actually) effects appear significant, others not. It is possible to have R do some automated model selection, e.g. removing non-significant terms in a stepwise manner:

step(lm2, direction = "backward")

Start: AIC=70.9

mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb

Df Sum of Sq RSS AIC

- cyl 1 0.08 148 68.9

- vs 1 0.16 148 68.9

- carb 1 0.41 148 69.0

- gear 1 1.35 149 69.2

- drat 1 1.63 149 69.2

- disp 1 3.92 151 69.7

- hp 1 6.84 154 70.3

- qsec 1 8.86 156 70.8

<none> 148 70.9

- am 1 10.55 158 71.1

- wt 1 27.01 174 74.3

Step: AIC=68.92

mpg ~ disp + hp + drat + wt + qsec + vs + am + gear + carb

Df Sum of Sq RSS AIC

- vs 1 0.27 148 67.0

- carb 1 0.52 148 67.0

- gear 1 1.82 149 67.3

- drat 1 1.98 150 67.3

- disp 1 3.90 152 67.7

- hp 1 7.36 155 68.5

<none> 148 68.9

- qsec 1 10.09 158 69.0

- am 1 11.84 159 69.4

- wt 1 27.03 175 72.3

Step: AIC=66.97

mpg ~ disp + hp + drat + wt + qsec + am + gear + carb

Df Sum of Sq RSS AIC

- carb 1 0.69 148 65.1


- gear 1 2.14 150 65.4

- drat 1 2.21 150 65.4

- disp 1 3.65 152 65.8

- hp 1 7.11 155 66.5

<none> 148 67.0

- am 1 11.57 159 67.4

- qsec 1 15.68 164 68.2

- wt 1 27.38 175 70.4

Step: AIC=65.12

mpg ~ disp + hp + drat + wt + qsec + am + gear

Df Sum of Sq RSS AIC

- gear 1 1.6 150 63.5

- drat 1 1.9 150 63.5

<none> 148 65.1

- disp 1 10.1 159 65.2

- am 1 12.3 161 65.7

- hp 1 14.8 163 66.2

- qsec 1 26.4 175 68.4

- wt 1 69.1 218 75.3

Step: AIC=63.46

mpg ~ disp + hp + drat + wt + qsec + am

Df Sum of Sq RSS AIC

- drat 1 3.3 153 62.2

- disp 1 8.5 159 63.2

<none> 150 63.5

- hp 1 13.3 163 64.2

- am 1 20.0 170 65.5

- qsec 1 25.6 176 66.5

- wt 1 67.6 218 73.4

Step: AIC=62.16

mpg ~ disp + hp + wt + qsec + am

Df Sum of Sq RSS AIC

- disp 1 6.6 160 61.5

<none> 153 62.2


- hp 1 12.6 166 62.7

- qsec 1 26.5 180 65.3

- am 1 32.2 186 66.3

- wt 1 69.0 222 72.1

Step: AIC=61.52

mpg ~ hp + wt + qsec + am

Df Sum of Sq RSS AIC

- hp 1 9.2 169 61.3

<none> 160 61.5

- qsec 1 20.2 180 63.3

- am 1 26.0 186 64.3

- wt 1 78.5 239 72.3

Step: AIC=61.31

mpg ~ wt + qsec + am

Df Sum of Sq RSS AIC

<none> 169 61.3

- am 1 26.2 195 63.9

- qsec 1 109.0 278 75.2

- wt 1 183.3 353 82.8

Call:

lm(formula = mpg ~ wt + qsec + am, data = mtcars)

Coefficients:

(Intercept) wt qsec am

9.62 -3.92 1.23 2.94
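To work with the selected model rather than just read off the printout, the result of step() can be stored directly; trace = 0 suppresses the step-by-step output (a small supplement to the call above):

```r
lm2 <- lm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb,
          data = mtcars)

lm_step <- step(lm2, direction = "backward", trace = 0)
formula(lm_step)  # mpg ~ wt + qsec + am, as found above
```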

In each step, the least important term is dropped from the model if it is judged not to be important. The default criterion used here for judging "importance" is the so-called AIC, which in this case is closely related to the usual t-test of each individual term, which could also have been used as a criterion. The AIC is defined from the log-likelihood value of the model fit:

AIC = 2k - 2 log L

where k is the number of parameters in the model and L is the (maximum) likelihood value. For the full MLR model here we have k = 12 parameters: 10 x-variables, the intercept and the residual variance parameter. We can check the AIC and log-likelihood values with built-in R functions:

logLik(lm2)

’log Lik.’ -69.855 (df=12)

2*12-2*logLik(lm2)

’log Lik.’ 163.71 (df=12)

AIC(lm2)

[1] 163.71
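A small caveat when comparing these numbers with the step() output above: step() prints the values of extractAIC(), which for lm models omits an additive constant from the full likelihood-based AIC(). The two always differ by the same constant, so model comparisons are unaffected:

```r
lm2 <- lm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb,
          data = mtcars)

AIC(lm2)         # 163.71, the full likelihood-based value
extractAIC(lm2)  # edf = 11 and roughly 70.9, the value step() prints
```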

We will not teach likelihood theory here, but merely mention that the maximum log-likelihood value for these models is simply a measure of the fit of the model under the normal distribution; that is, it basically depends on the residuals of the model:

lm2summary <- summary(lm2)

lm2summary$sigma

[1] 2.6502

var_ml <- lm2summary$sigma^2 * 21/32  # ML variance estimate: RSS/n instead of RSS/df

sum(log(dnorm(resid(lm2), sd = sqrt(var_ml))))

[1] -69.855

The better the fit, the smaller the variance and the larger the log-likelihood. One then chooses the model with the smallest AIC: whenever you drop a variable from the model, the fit becomes poorer and the log-likelihood smaller. Under the AIC approach, a single variable is only dropped if 2 log L decreases by less than 2 (equivalently, log L by less than 1), so that the saving of 2 in the 2k penalty outweighs the loss of fit.
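The first table of the step() search can also be produced directly with drop1(), which evaluates the AIC of the model with each single term removed (a small supplement, not in the eNote code):

```r
lm2 <- lm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb,
          data = mtcars)

d1 <- drop1(lm2)  # same RSS/AIC table as the first step above
d1
```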


The AIC value for each variable is hence in one-to-one correspondence with the t-test p-values. But being kept in the model by the AIC criterion does not necessarily mean that the variable is significant at the 5% level, so we check the final model chosen by the step function:

lm3 <- lm(formula = mpg ~ wt + qsec + am, data = mtcars)

summary(lm3)

Call:

lm(formula = mpg ~ wt + qsec + am, data = mtcars)

Residuals:

Min 1Q Median 3Q Max

-3.481 -1.556 -0.726 1.411 4.661

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 9.618 6.960 1.38 0.17792

wt -3.917 0.711 -5.51 7e-06 ***

qsec 1.226 0.289 4.25 0.00022 ***

am 2.936 1.411 2.08 0.04672 *

---

Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Residual standard error: 2.46 on 28 degrees of freedom

Multiple R-squared: 0.85, Adjusted R-squared: 0.834

F-statistic: 52.7 on 3 and 28 DF, p-value: 1.21e-11
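The borderline significance of am can also be read off its confidence interval, which only just excludes zero (a small supplement to the summary above):

```r
lm3 <- lm(mpg ~ wt + qsec + am, data = mtcars)

confint(lm3)  # the interval for am is roughly (0.05, 5.8)
```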

3.2.3 Validation

# plot fitted vs. observed - identifying observation nr:

plot(mtcars$mpg, lm3$fitted, type = "n")

text(mtcars$mpg, lm3$fitted, labels = row.names(mtcars))

[Figure: fitted values (lm3$fitted) plotted against observed mtcars$mpg, with each point labelled by car model name.]

# Regression Diagnostics plots given automatically, either 4:

par(mfrow = c(2, 2))

plot(lm3)

[Figure: the four standard diagnostics plots for lm3: Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage (with Cook's distance contours). Chrysler Imperial, Fiat 128, Toyota Corolla and Merc 230 are flagged.]

# Or 6:

par(mfrow = c(3, 2))

plot(lm3, 1:6)

[Figure: the six diagnostics plots for lm3: Residuals vs Fitted, Normal Q-Q, Scale-Location, Cook's distance, Residuals vs Leverage, and Cook's distance vs Leverage hii/(1 - hii). Chrysler Imperial, Merc 230 and Fiat 128 are flagged.]

# Plot residuals versus individual xs:

par(mfrow = c(2, 3))

for (i in 4:8) {plot(lm3$residuals ~ mtcars[,i], type = "n", xlab = names(mtcars)[i])

text(mtcars[,i], lm3$residuals, labels = row.names(mtcars))

lines(lowess(mtcars[,i], lm3$residuals), col = "blue")

}

[Figure: residuals of lm3 plotted against hp, drat, wt, qsec and vs, with points labelled by car model and a lowess smooth (blue) added to each panel.]

Before moving on with e.g. more linearity-investigation diagnostics, it already seems that we have a problem with the variability: it appears to increase with increasing fitted values. This points towards a log-transformation. Let's try the Box-Cox power transformations for the full model:

par(mfrow = c(1, 1))

library(MASS)

boxcox(lm2)
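The numerically optimal λ can also be extracted from the profile rather than read off the plot (a small supplement to the eNote code):

```r
library(MASS)

lm2 <- lm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb,
          data = mtcars)

bc <- boxcox(lm2, plotit = FALSE)  # profile log-likelihood over lambda
bc$x[which.max(bc$y)]              # maximizing lambda, close to 0 (= log)
```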

[Figure: Box-Cox profile log-likelihood over λ with a 95% confidence interval.]

This clearly supports the log-transformation, so we redo the initial analysis:

lm2 <- lm(log(mpg) ~ cyl + disp + hp + drat + wt + qsec + vs +

am + gear + carb, data = mtcars)

summary(lm2)

Call:

lm(formula = log(mpg) ~ cyl + disp + hp + drat + wt + qsec +

vs + am + gear + carb, data = mtcars)

Residuals:

Min 1Q Median 3Q Max

-0.1457 -0.0789 -0.0175 0.0652 0.2513

Coefficients:


Estimate Std. Error t value Pr(>|t|)

(Intercept) 2.78e+00 8.49e-01 3.27 0.0037 **

cyl 7.66e-03 4.74e-02 0.16 0.8733

disp 4.99e-05 8.10e-04 0.06 0.9515

hp -8.96e-04 9.88e-04 -0.91 0.3744

drat 2.22e-02 7.42e-02 0.30 0.7677

wt -1.72e-01 8.60e-02 -2.00 0.0580 .

qsec 3.08e-02 3.32e-02 0.93 0.3640

vs -2.87e-03 9.55e-02 -0.03 0.9763

am 4.74e-02 9.33e-02 0.51 0.6169

gear 5.93e-02 6.78e-02 0.87 0.3917

carb -2.01e-02 3.76e-02 -0.54 0.5983

---

Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Residual standard error: 0.12 on 21 degrees of freedom

Multiple R-squared: 0.89, Adjusted R-squared: 0.837

F-statistic: 16.9 on 10 and 21 DF, p-value: 6.89e-08

step(lm2, direction = "backward")

Start: AIC=-127.05

log(mpg) ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear +

carb

Df Sum of Sq RSS AIC

- vs 1 0.0000 0.304 -129

- disp 1 0.0001 0.304 -129

- cyl 1 0.0004 0.304 -129

- drat 1 0.0013 0.305 -129

- am 1 0.0037 0.307 -129

- carb 1 0.0041 0.308 -129

- gear 1 0.0111 0.315 -128

- hp 1 0.0119 0.316 -128

- qsec 1 0.0124 0.316 -128

<none> 0.304 -127

- wt 1 0.0581 0.362 -123

Step: AIC=-129.05


log(mpg) ~ cyl + disp + hp + drat + wt + qsec + am + gear + carb

Df Sum of Sq RSS AIC

- disp 1 0.0001 0.304 -131

- cyl 1 0.0005 0.304 -131

- drat 1 0.0013 0.305 -131

- am 1 0.0040 0.308 -131

- carb 1 0.0041 0.308 -131

- gear 1 0.0110 0.315 -130

- hp 1 0.0131 0.317 -130

- qsec 1 0.0140 0.318 -130

<none> 0.304 -129

- wt 1 0.0584 0.362 -125

Step: AIC=-131.04

log(mpg) ~ cyl + hp + drat + wt + qsec + am + gear + carb

Df Sum of Sq RSS AIC

- cyl 1 0.0007 0.304 -133

- drat 1 0.0014 0.305 -133

- am 1 0.0040 0.308 -133

- carb 1 0.0088 0.312 -132

- gear 1 0.0112 0.315 -132

- qsec 1 0.0152 0.319 -132

- hp 1 0.0167 0.320 -131

<none> 0.304 -131

- wt 1 0.1443 0.448 -121

Step: AIC=-132.97

log(mpg) ~ hp + drat + wt + qsec + am + gear + carb

Df Sum of Sq RSS AIC

- drat 1 0.0010 0.305 -135

- am 1 0.0035 0.308 -135

- carb 1 0.0085 0.313 -134

- gear 1 0.0109 0.315 -134

- hp 1 0.0164 0.321 -133

- qsec 1 0.0189 0.323 -133

<none> 0.304 -133

- wt 1 0.1499 0.454 -122


Step: AIC=-134.87

log(mpg) ~ hp + wt + qsec + am + gear + carb

Df Sum of Sq RSS AIC

- am 1 0.0046 0.310 -136

- carb 1 0.0078 0.313 -136

- gear 1 0.0129 0.318 -136

- hp 1 0.0178 0.323 -135

<none> 0.305 -135

- qsec 1 0.0205 0.326 -135

- wt 1 0.1638 0.469 -123

Step: AIC=-136.39

log(mpg) ~ hp + wt + qsec + gear + carb

Df Sum of Sq RSS AIC

- carb 1 0.0083 0.318 -138

- qsec 1 0.0163 0.326 -137

<none> 0.310 -136

- hp 1 0.0207 0.331 -136

- gear 1 0.0304 0.340 -135

- wt 1 0.1937 0.504 -123

Step: AIC=-137.55

log(mpg) ~ hp + wt + qsec + gear

Df Sum of Sq RSS AIC

<none> 0.318 -138

- gear 1 0.0228 0.341 -137

- qsec 1 0.0231 0.341 -137

- hp 1 0.0332 0.351 -136

- wt 1 0.2879 0.606 -119

Call:

lm(formula = log(mpg) ~ hp + wt + qsec + gear, data = mtcars)

Coefficients:

(Intercept) hp wt qsec gear

3.08237 -0.00108 -0.19155 0.02603 0.05023


lm3 <- lm(log(mpg) ~ hp + wt + qsec + gear, data = mtcars)

summary(lm3)

Call:

lm(formula = log(mpg) ~ hp + wt + qsec + gear, data = mtcars)

Residuals:

Min 1Q Median 3Q Max

-0.1421 -0.0674 -0.0316 0.0613 0.2736

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 3.082366 0.417139 7.39 6.0e-08 ***

hp -0.001080 0.000643 -1.68 0.10

wt -0.191553 0.038757 -4.94 3.6e-05 ***

qsec 0.026027 0.018592 1.40 0.17

gear 0.050232 0.036090 1.39 0.18

---

Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Residual standard error: 0.109 on 27 degrees of freedom

Multiple R-squared: 0.884, Adjusted R-squared: 0.867

F-statistic: 51.6 on 4 and 27 DF, p-value: 2.95e-12
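Since the response is now log(mpg), fitted values and predictions from this model live on the log scale; exponentiating brings them back to miles per gallon. (A hedged aside, not in the eNote: exp of a fitted mean on the log scale is a geometric mean, i.e. it estimates the median rather than the mean on the original scale.)

```r
lm3 <- lm(log(mpg) ~ hp + wt + qsec + gear, data = mtcars)

# Back-transformed fitted values, in mpg units:
head(exp(fitted(lm3)))
```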

Let us now redo versions of the diagnostics plots using the ggplot package, as illustrated in Chapter 1 for the raw data plotting. Using the melt function of the reshape2 package, we create a version of the data set where the (relevant) variables are "stringed out on top of each other" as a single variable, with a new variable coding which is which (we take the 4 x-variables from the model):

library(reshape2)

mtcars$residuals <- resid(lm3)

mtcars2 <- melt(mtcars, measure.vars=c(4, 6, 7, 10))

And then using this new "variable coding" factor to produce a plot for each variable:


library(ggplot2)

p <- ggplot(mtcars2, aes(value, residuals))

p <- p + geom_point(shape=1)

p <- p + geom_smooth(method="loess")

p <- p + facet_wrap(~ variable, scales="free")

print(p)

[Figure: faceted residual plots against hp, wt, qsec and gear, each with points and a loess smooth with confidence band.]

Even though the residual patterns are not perfectly linear, all of the confidence bands contain the zero value throughout (the band for hp is on the limit).

3.2.4 More validation - modelling non-linearities and interactions

The potential non-linearity of hp or wt can be checked more formally by including it in the modelling. This is easily done in R with e.g. the poly function, here used to fit and test a 3rd-degree polynomial for each of those two variables:


lm4 <- lm(log(mpg) ~ hp + wt + qsec + gear + poly(hp, 3) + poly(wt, 3),

data = mtcars)

summary(lm4)

Call:

lm(formula = log(mpg) ~ hp + wt + qsec + gear + poly(hp, 3) +

poly(wt, 3), data = mtcars)

Residuals:

Min 1Q Median 3Q Max

-0.16380 -0.06647 -0.00527 0.03225 0.24115

Coefficients: (2 not defined because of singularities)

Estimate Std. Error t value Pr(>|t|)

(Intercept) 3.155942 0.485279 6.50 1.2e-06 ***

hp -0.001501 0.000811 -1.85 0.0770 .

wt -0.164364 0.048387 -3.40 0.0025 **

qsec 0.019608 0.021417 0.92 0.3694

gear 0.054383 0.043106 1.26 0.2197

poly(hp, 3)1 NA NA NA NA

poly(hp, 3)2 0.137961 0.151826 0.91 0.3729

poly(hp, 3)3 -0.080076 0.127262 -0.63 0.5354

poly(wt, 3)1 NA NA NA NA

poly(wt, 3)2 0.044573 0.129784 0.34 0.7344

poly(wt, 3)3 -0.132549 0.122949 -1.08 0.2922

---

Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Residual standard error: 0.11 on 23 degrees of freedom

Multiple R-squared: 0.9, Adjusted R-squared: 0.865

F-statistic: 25.8 on 8 and 23 DF, p-value: 1.01e-09
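The "not defined because of singularities" lines are no accident: poly() builds orthogonal polynomials, and the first (linear) column of such a basis is just a centered and rescaled copy of the raw variable, so it is perfectly collinear with the hp and wt main effects already in the model. A small check (not in the eNote):

```r
data(mtcars)

p1 <- poly(mtcars$hp, 3)[, 1]  # linear column of the orthogonal basis

cor(p1, mtcars$hp)  # exactly 1: aliased with hp, hence the NA rows
```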

(R drops the aliased linear poly terms, since hp and wt already enter the model; the remaining 2nd- and 3rd-degree terms are the ones of interest.) We are thus confirmed that neither curvature (2nd-degree terms) nor more complicated non-linear structures are really significant. Let's double check without the 3rd-degree terms:

lm4 <- lm(log(mpg) ~ hp + wt + qsec + gear + poly(hp, 2) + poly(wt, 2),

data = mtcars)

summary(lm4)


Call:

lm(formula = log(mpg) ~ hp + wt + qsec + gear + poly(hp, 2) +

poly(wt, 2), data = mtcars)

Residuals:

Min 1Q Median 3Q Max

-0.1621 -0.0647 -0.0197 0.0580 0.2293

Coefficients: (2 not defined because of singularities)

Estimate Std. Error t value Pr(>|t|)

(Intercept) 3.284438 0.465993 7.05 2.2e-07 ***

hp -0.001286 0.000768 -1.68 0.10632

wt -0.183001 0.044749 -4.09 0.00039 ***

qsec 0.018703 0.020985 0.89 0.38130

gear 0.031637 0.038656 0.82 0.42085

poly(hp, 2)1 NA NA NA NA

poly(hp, 2)2 0.137100 0.148705 0.92 0.36536

poly(wt, 2)1 NA NA NA NA

poly(wt, 2)2 0.097600 0.115796 0.84 0.40730

---

Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Residual standard error: 0.109 on 25 degrees of freedom

Multiple R-squared: 0.893, Adjusted R-squared: 0.867

F-statistic: 34.7 on 6 and 25 DF, p-value: 6.04e-11

We could also check for interactions - the possibility that the effect of some of the x-variables depends on the values of some of the other x-variables.

Again in R, it is easy to try to include all the interactions:

lm5 <- lm(log(mpg) ~ (hp + wt + qsec + gear) *

(hp + wt + qsec + gear), data = mtcars)

summary(lm5)

Call:

lm(formula = log(mpg) ~ (hp + wt + qsec + gear) * (hp + wt +

qsec + gear), data = mtcars)


Residuals:

Min 1Q Median 3Q Max

-0.15227 -0.05985 -0.00141 0.04864 0.19782

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) -3.54e+00 5.19e+00 -0.68 0.503

hp 4.60e-04 8.26e-03 0.06 0.956

wt 2.10e+00 1.25e+00 1.68 0.107

qsec 2.90e-01 2.39e-01 1.21 0.239

gear 3.47e-01 7.36e-01 0.47 0.642

hp:wt -8.52e-04 8.28e-04 -1.03 0.315

hp:qsec -7.06e-05 3.81e-04 -0.19 0.855

hp:gear 4.09e-04 1.08e-03 0.38 0.707

wt:qsec -9.10e-02 5.24e-02 -1.74 0.097 .

wt:gear -1.34e-01 8.64e-02 -1.56 0.135

qsec:gear 5.43e-03 3.63e-02 0.15 0.883

---

Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Residual standard error: 0.107 on 21 degrees of freedom

Multiple R-squared: 0.913, Adjusted R-squared: 0.872

F-statistic: 22.1 on 10 and 21 DF, p-value: 6.1e-09
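The self-product formula above can be written more compactly with the ^ operator: (hp + wt + qsec + gear)^2 expands to all main effects plus all pairwise interactions and gives the identical fit (a small aside on formula syntax):

```r
lm5a <- lm(log(mpg) ~ (hp + wt + qsec + gear) * (hp + wt + qsec + gear),
           data = mtcars)

lm5b <- lm(log(mpg) ~ (hp + wt + qsec + gear)^2, data = mtcars)

all.equal(coef(lm5a)[names(coef(lm5b))], coef(lm5b))  # same coefficients
```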

None of them appears significant. But please also remember that even if some had appeared slightly significant, one should be careful about over-interpreting this. First of all, in this process we may do many tests, in this case 6 tests for interactions. Good advice is to always do a so-called "Bonferroni-type correction" in such a case; here it would amount to using the significance level 0.05/6 ≈ 0.01 instead of 0.05, to protect against random significances. And things may change as non-significant interactions are removed from the model again. I do this in the following, without showing all the steps. (And important: do NOT interpret main-effect information for a variable when it is also part of interaction terms in the model.)

lm6 <- lm(log(mpg) ~ hp + wt + qsec + gear +

hp:wt + hp:qsec + hp:gear + wt:qsec + wt:gear, data = mtcars)

summary(lm6)


Call:

lm(formula = log(mpg) ~ hp + wt + qsec + gear + hp:wt + hp:qsec +

hp:gear + wt:qsec + wt:gear, data = mtcars)

Residuals:

Min 1Q Median 3Q Max

-0.15216 -0.06213 -0.00031 0.05002 0.19604

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) -4.08e+00 3.59e+00 -1.14 0.268

hp 1.10e-03 6.91e-03 0.16 0.875

wt 2.13e+00 1.20e+00 1.77 0.091 .

qsec 3.18e-01 1.43e-01 2.23 0.036 *

gear 4.52e-01 2.23e-01 2.03 0.055 .

hp:wt -8.67e-04 8.03e-04 -1.08 0.292

hp:qsec -7.61e-05 3.71e-04 -0.21 0.839

hp:gear 2.78e-04 6.07e-04 0.46 0.651

wt:qsec -9.32e-02 4.92e-02 -1.89 0.072 .

wt:gear -1.31e-01 8.11e-02 -1.61 0.121

---

Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Residual standard error: 0.104 on 22 degrees of freedom

Multiple R-squared: 0.913, Adjusted R-squared: 0.877

F-statistic: 25.7 on 9 and 22 DF, p-value: 1.16e-09

lm7 <- lm(log(mpg) ~ hp + wt + qsec + gear +

hp:wt + hp:gear + wt:qsec + wt:gear, data = mtcars)

summary(lm7)

Call:

lm(formula = log(mpg) ~ hp + wt + qsec + gear + hp:wt + hp:gear +

wt:qsec + wt:gear, data = mtcars)

Residuals:

Min 1Q Median 3Q Max

-0.15190 -0.05938 0.00164 0.04910 0.19396


Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) -4.122793 3.511592 -1.17 0.252

hp -0.000169 0.003046 -0.06 0.956

wt 2.174015 1.158859 1.88 0.073 .

qsec 0.321575 0.138457 2.32 0.029 *

gear 0.443498 0.214573 2.07 0.050 .

hp:wt -0.000865 0.000786 -1.10 0.283

hp:gear 0.000299 0.000586 0.51 0.614

wt:qsec -0.096674 0.045174 -2.14 0.043 *

wt:gear -0.127107 0.077412 -1.64 0.114

---

Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Residual standard error: 0.102 on 23 degrees of freedom

Multiple R-squared: 0.913, Adjusted R-squared: 0.883

F-statistic: 30.1 on 8 and 23 DF, p-value: 2.07e-10

lm8 <- lm(log(mpg) ~ hp + wt + qsec + gear +

hp:wt + wt:qsec + wt:gear, data = mtcars)

summary(lm8)

Call:

lm(formula = log(mpg) ~ hp + wt + qsec + gear + hp:wt + wt:qsec +

wt:gear, data = mtcars)

Residuals:

Min 1Q Median 3Q Max

-0.14743 -0.06462 -0.00812 0.05687 0.19267

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) -3.653785 3.336892 -1.09 0.284

hp 0.000809 0.002332 0.35 0.732

wt 1.938639 1.046850 1.85 0.076 .

qsec 0.302590 0.131308 2.30 0.030 *

gear 0.419263 0.206017 2.04 0.053 .


hp:wt -0.000751 0.000742 -1.01 0.322

wt:qsec -0.089738 0.042417 -2.12 0.045 *

wt:gear -0.102841 0.060180 -1.71 0.100

---

Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Residual standard error: 0.1 on 24 degrees of freedom

Multiple R-squared: 0.912, Adjusted R-squared: 0.886

F-statistic: 35.5 on 7 and 24 DF, p-value: 3.75e-11

lm9 <- lm(log(mpg) ~ hp + wt + qsec + gear +

wt:qsec + wt:gear, data = mtcars)

summary(lm9)

Call:

lm(formula = log(mpg) ~ hp + wt + qsec + gear + wt:qsec + wt:gear,

data = mtcars)

Residuals:

Min 1Q Median 3Q Max

-0.16175 -0.06625 -0.00705 0.05506 0.19559

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) -0.658008 1.541204 -0.43 0.673

hp -0.001443 0.000699 -2.06 0.050 *

wt 1.000935 0.487534 2.05 0.051 .

qsec 0.192462 0.073529 2.62 0.015 *

gear 0.243207 0.110427 2.20 0.037 *

wt:qsec -0.054639 0.024435 -2.24 0.035 *

wt:gear -0.052699 0.034178 -1.54 0.136

---

Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Residual standard error: 0.101 on 25 degrees of freedom

Multiple R-squared: 0.908, Adjusted R-squared: 0.886

F-statistic: 41.2 on 6 and 25 DF, p-value: 9.03e-12


lm9 <- lm(log(mpg) ~ hp + wt + qsec + gear +

wt:qsec, data = mtcars)

summary(lm9)

Call:

lm(formula = log(mpg) ~ hp + wt + qsec + gear + wt:qsec, data = mtcars)

Residuals:

Min 1Q Median 3Q Max

-0.1286 -0.0731 -0.0182 0.0596 0.2222

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 0.388875 1.419765 0.27 0.786

hp -0.001724 0.000693 -2.49 0.020 *

wt 0.724936 0.465341 1.56 0.131

qsec 0.167048 0.073531 2.27 0.032 *

gear 0.082830 0.038056 2.18 0.039 *

wt:qsec -0.048975 0.024789 -1.98 0.059 .

---

Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Residual standard error: 0.103 on 26 degrees of freedom

Multiple R-squared: 0.899, Adjusted R-squared: 0.88

F-statistic: 46.5 on 5 and 26 DF, p-value: 3.8e-12

Even in this model the remaining interaction is not really significant, so we end up with the purely additive model with the four x's (already in lm3).

A remark on modelling interactions: here we just specified the interactions as multiplicative terms in the model. This fits with how a potential interaction is modelled by this approach: as an effect where log(mpg) changes linearly with the product of two quantitative x-variables. Clearly, interaction effects could take many other forms. A way to model this more generally would be to categorize an x-variable and include it in the modelling as a factor, as in Analysis of Variance (ANOVA).

Let’s look at the basic diagnostics plots again: (now for the log-transformed analysis)


# Regression Diagnostics plots given automatically, either 4:

par(mfrow = c(2, 2))

plot(lm3)

[Figure: the four standard diagnostics plots for the log-scale lm3: Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage. Chrysler Imperial, Pontiac Firebird, Fiat 128, Cadillac Fleetwood and Merc 230 are flagged.]

And the Box-Cox profile together with another QQ-plot (boxCox and qqPlot are from the car package):

library(car)

par(mfrow = c(1, 2))

boxCox(lm3)

qqPlot(residuals(lm3))

[Figure: Box-Cox profile log-likelihood (with 95% interval) for the log-scale model, next to a normal quantile plot of residuals(lm3).]

3.2.5 Removing outliers

Even though, for the log-transformed mpg-values, none of the observations appears really extreme, let's illustrate what could be done by trying to remove the Chrysler Imperial data point:

# How to remove a potential outlier and re-analyze:

# Which data point number is "Chrysler Imperial":

rownames(mtcars)

[1] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710"

[4] "Hornet 4 Drive" "Hornet Sportabout" "Valiant"

[7] "Duster 360" "Merc 240D" "Merc 230"


[10] "Merc 280" "Merc 280C" "Merc 450SE"

[13] "Merc 450SL" "Merc 450SLC" "Cadillac Fleetwood"

[16] "Lincoln Continental" "Chrysler Imperial" "Fiat 128"

[19] "Honda Civic" "Toyota Corolla" "Toyota Corona"

[22] "Dodge Challenger" "AMC Javelin" "Camaro Z28"

[25] "Pontiac Firebird" "Fiat X1-9" "Porsche 914-2"

[28] "Lotus Europa" "Ford Pantera L" "Ferrari Dino"

[31] "Maserati Bora" "Volvo 142E"

# E.g. Let’s try without observation no 17:

# Remove and copy-paste:

mtcars_red <- mtcars[-17,]

# Check:

dim(mtcars)

[1] 32 12

dim(mtcars_red)

[1] 31 12
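Selecting by row name instead of a hard-coded index is a bit safer if the data ordering ever changes. A small alternative to the [-17, ] subsetting above (mtcars_red2 is just an illustrative name):

```r
data(mtcars)

# Keep every row except the one named "Chrysler Imperial":
mtcars_red2 <- mtcars[rownames(mtcars) != "Chrysler Imperial", ]

nrow(mtcars_red2)  # 31: the Chrysler Imperial row is gone
```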

#row.names(mtcars_red)

lm3_red <- lm(log(mpg) ~ hp + wt + qsec + gear, data = mtcars_red)

summary(lm3_red)

Call:

lm(formula = log(mpg) ~ hp + wt + qsec + gear, data = mtcars_red)

Residuals:

Min 1Q Median 3Q Max

-0.1317 -0.0493 -0.0287 0.0599 0.2253

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 3.193185 0.358957 8.90 2.3e-09 ***


hp -0.000934 0.000553 -1.69 0.10

wt -0.227616 0.034972 -6.51 6.7e-07 ***

qsec 0.026968 0.015930 1.69 0.10

gear 0.038415 0.031127 1.23 0.23

---

Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Residual standard error: 0.093 on 26 degrees of freedom

Multiple R-squared: 0.916, Adjusted R-squared: 0.903

F-statistic: 70.8 on 4 and 26 DF, p-value: 1.36e-13

lm4_red <- lm(log(mpg) ~ hp + wt + qsec, data = mtcars_red)

summary(lm4_red)

Call:

lm(formula = log(mpg) ~ hp + wt + qsec, data = mtcars_red)

Residuals:

Min 1Q Median 3Q Max

-0.16200 -0.05817 -0.00423 0.05698 0.20961

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 3.428723 0.306947 11.17 1.3e-11 ***

hp -0.000795 0.000547 -1.45 0.16

wt -0.252602 0.028791 -8.77 2.2e-09 ***

qsec 0.025044 0.016007 1.56 0.13

---

Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Residual standard error: 0.0939 on 27 degrees of freedom

Multiple R-squared: 0.911, Adjusted R-squared: 0.901

F-statistic: 92.1 on 3 and 27 DF, p-value: 2.69e-14

lm5_red <- lm(log(mpg) ~ wt + qsec, data = mtcars_red)

summary(lm5_red)


Call:

lm(formula = log(mpg) ~ wt + qsec, data = mtcars_red)

Residuals:

Min 1Q Median 3Q Max

-0.1861 -0.0654 -0.0105 0.0543 0.2220

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 3.08000 0.19550 15.75 1.9e-15 ***

wt -0.28399 0.01944 -14.61 1.3e-14 ***

qsec 0.04369 0.00978 4.47 0.00012 ***

---

Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Residual standard error: 0.0958 on 28 degrees of freedom

Multiple R-squared: 0.904, Adjusted R-squared: 0.897

F-statistic: 132 on 2 and 28 DF, p-value: 5.67e-15

par(mfrow = c(3, 2))

plot(lm5_red, 1:6)

[Figure: the six diagnostic plots from plot(lm5_red, 1:6): Residuals vs Fitted, Normal Q-Q, Scale-Location, Cook's distance, Residuals vs Leverage, and Cook's distance vs Leverage; Toyota Corona, Pontiac Firebird and Fiat 128 are the labelled extremes]

So some of the significances were due to this single extreme observation.
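The influence seen in the diagnostic plots can also be checked numerically via Cook's distances. A small sketch (refitting the reduced model so the snippet is self-contained; mtcars ships with R and row 17 is the Chrysler Imperial):

```r
# Inspect Cook's distances numerically rather than only graphically.
mtcars_red <- mtcars[-17, ]
lm5_red <- lm(log(mpg) ~ wt + qsec, data = mtcars_red)
cd <- cooks.distance(lm5_red)
head(sort(cd, decreasing = TRUE), 3)  # the three most influential points
```

Values well below the usual rule-of-thumb cutoffs (0.5 and 1) suggest no single point dominates the reduced fit.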

3.2.6 Prediction

It is easy to predict one or several observations for which data on the x-variables are available, e.g. predicting the (log-transformed) mpg for observations 3, 7 and 9:

# With confidence intervals:

predict(lm3, mtcars[c(3,7,9),], interval = ’confidence’)

fit lwr upr

Datsun 710 3.2229 3.1679 3.2778

Duster 360 2.6970 2.5941 2.7999

Merc 230 3.1734 3.0136 3.3331


# With prediction intervals:

predict(lm3, mtcars[c(3,7,9),], interval = ’prediction’)

fit lwr upr

Datsun 710 3.2229 2.9934 3.4523

Duster 360 2.6970 2.4516 2.9424

Merc 230 3.1734 2.8992 3.4475

Or using the final model without the Chrysler Imperial:

# With confidence intervals:

predict(lm5_red, mtcars[c(3,7,9),], interval = ’confidence’)

fit lwr upr

Datsun 710 3.2342 3.1854 3.2830

Duster 360 2.7582 2.7040 2.8123

Merc 230 3.1859 3.0790 3.2928

# With prediction intervals:

predict(lm5_red, mtcars[c(3,7,9),], interval = ’prediction’)

fit lwr upr

Datsun 710 3.2342 3.0321 3.4363

Duster 360 2.7582 2.5547 2.9617

Merc 230 3.1859 2.9625 3.4093

In this case these observations were also used for the fit, but this was not important here: any new data set with the required x-variables can be used in the prediction function.
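Since the response in these models is log(mpg), the fits and intervals above are on the log scale; exponentiating maps them back to mpg units. A sketch for a hypothetical new car (the wt and qsec values below are made up for illustration):

```r
# Predict for a new, hypothetical car and back-transform from log(mpg) to mpg.
mtcars_red <- mtcars[-17, ]                  # without the Chrysler Imperial
lm5_red <- lm(log(mpg) ~ wt + qsec, data = mtcars_red)
newcar <- data.frame(wt = 3.0, qsec = 18.0)  # made-up x-values
pi_log <- predict(lm5_red, newcar, interval = "prediction")
exp(pi_log)                                  # fit, lwr, upr in mpg units
```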

3.3 Exercises


Exercise 1 Prostate Cancer data

Analyze the Prostate data from the following book website: http://statweb.stanford.edu/~tibs/ElemStatLearn/ (check pp. 47-48 of Hastie et al. for a short description of the data). (Use the uploaded version of the data file in Campusnet to avoid import problems.)

prostate <- read.table("prostatedata.txt", header = TRUE,

sep = ";", dec = ",")

head(prostate)

no lcavol lweight age lbph svi lcp gleason pgg45 lpsa

1 7 0.73716 3.4735 64 0.61519 0 -1.38629 6 0 0.76547

2 9 -0.77653 3.5395 47 -1.38629 0 -1.38629 6 0 1.04732

3 10 0.22314 3.2445 63 -1.38629 0 -1.38629 6 0 1.04732

4 15 1.20597 3.4420 57 -1.38629 0 -0.43078 7 5 1.39872

5 22 2.05924 3.5010 60 1.47476 0 1.34807 7 20 1.65823

6 25 0.38526 3.6674 69 1.59939 0 -1.38629 6 0 1.73165

train

1 FALSE

2 FALSE

3 FALSE

4 FALSE

5 FALSE

6 FALSE

summary(prostate)

no lcavol lweight age

Min. : 1 Min. :-1.347 Min. :2.37 Min. :41.0

1st Qu.:25 1st Qu.: 0.513 1st Qu.:3.38 1st Qu.:60.0

Median :49 Median : 1.447 Median :3.62 Median :65.0

Mean :49 Mean : 1.350 Mean :3.65 Mean :63.9

3rd Qu.:73 3rd Qu.: 2.127 3rd Qu.:3.88 3rd Qu.:68.0

Max. :97 Max. : 3.821 Max. :6.11 Max. :79.0

lbph svi lcp gleason

Min. :-1.39 Min. :0.000 Min. :-1.386 Min. :6.00

1st Qu.:-1.39 1st Qu.:0.000 1st Qu.:-1.386 1st Qu.:6.00

Median : 0.30 Median :0.000 Median :-0.799 Median :7.00

Mean : 0.10 Mean :0.216 Mean :-0.179 Mean :6.75


3rd Qu.: 1.56 3rd Qu.:0.000 3rd Qu.: 1.179 3rd Qu.:7.00

Max. : 2.33 Max. :1.000 Max. : 2.904 Max. :9.00

pgg45 lpsa train

Min. : 0.0 Min. :-0.431 Mode :logical

1st Qu.: 0.0 1st Qu.: 1.732 FALSE:30

Median : 15.0 Median : 2.591 TRUE :67

Mean : 24.4 Mean : 2.478 NA’s :0

3rd Qu.: 40.0 3rd Qu.: 3.056

Max. :100.0 Max. : 5.583

dim(prostate)

[1] 97 11

Note that the first and the last variable of the data set are not "real" variables: no is just a numbering of persons, and train is a coding into two groups for (later) analysis: some persons have been assigned to a "training" part of the data set (train=TRUE) and others to a "test" part of the data set (train=FALSE). If not otherwise stated, we will for now ignore this grouping.
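When the grouping is eventually needed, the split can be sketched as below; a tiny mock data frame stands in here so the snippet runs without the external prostatedata.txt file:

```r
# Sketch: split by the logical 'train' column and drop the 'no' ID column.
# The mock data frame stands in for the real prostate data.
prostate_mock <- data.frame(no = 1:4, lpsa = c(0.8, 1.0, 2.5, 3.1),
                            train = c(TRUE, TRUE, FALSE, FALSE))
train_set <- subset(prostate_mock, train, select = -c(no, train))
test_set  <- subset(prostate_mock, !train, select = -c(no, train))
```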

pairs(prostate)

[Figure: pairs(prostate) scatter plot matrix of no, lcavol, lweight, age, lbph, svi, lcp, gleason, pgg45, lpsa and train]

Try to answer the following questions:

a) What are the pair-wise relations between lpsa and the other variables – any indications of important relations?

b) Are there any clearly non-normal (e.g. skew) distributions among the variables?

c) Run the 8-variable MLR analysis and try to reduce the model by removing the most non-significant variables one by one – what is the final model?


d) Interpret the parameters of the final model – compare with the investigation in 1.

e) What is the estimate (and interpretation) of the residual standard deviation?

f) Investigate the validity/assumptions of the final model:

1. Residual checks

2. Influential/outlying observations

3. Any additional model structure? (non-linearities, interactions?) (diagnostic plots and/or model extensions)
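For question c), one backward-elimination step can be sketched as follows; mtcars is used as a stand-in here, since the prostate file is external (with lpsa as the response, the same pattern applies):

```r
# One step of backward elimination: drop the least significant term.
full <- lm(mpg ~ ., data = mtcars)               # stand-in for the 8-variable fit
p <- summary(full)$coefficients[-1, "Pr(>|t|)"]  # p-values, intercept excluded
worst <- names(which.max(p))                     # least significant variable
reduced <- update(full, as.formula(paste(". ~ . -", worst)))
length(coef(reduced))  # one parameter fewer than the full model
```

Repeating this until all remaining terms are significant gives the final model.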