
8 - Introduction to Logistic Regression

These data are taken from the text “Applied Logistic Regression” by Hosmer and Lemeshow. Researchers are interested in the relationship between age and presence or absence of evidence of coronary heart disease (CHD).

The smooth is an estimate of:  E(CHD|Age) = P(CHD = 1|Age).  Why?

Expectation of a Bernoulli Random Variable


Fitting the Model in JMP

Select Analyze > Fit Y by X and place CHD (y/n) in the Y box and age in the X box. The resulting output is shown below. Because the response is a dichotomous categorical variable, logistic regression is performed.

Example:P(CHD|Age=40)=

P(CHD|Age=60 )=

The curve is a plot of:

P(CHD|Age) = exp(β0 + β1·Age) / (1 + exp(β0 + β1·Age))


Interpretation of Model Parameters

P(CHD = 1|Age) = e^(β0 + β1·Age) / (1 + e^(β0 + β1·Age))

Odds for Success

θ(x) / (1 − θ(x)) = e^(β0 + β1·Age)

thus

ln( θ(x) / (1 − θ(x)) ) = β0 + β1·Age

Suppose we contrast individuals who are Age = x to those who are Age = x + c. What can we say about the increased risk associated with a c year increase in age? The logistic model gives us a means to do this through the odds ratio (OR).

ln(OR associated with a c year increase in age)
  = ln[ ( θ(Age = x + c)/(1 − θ(Age = x + c)) ) / ( θ(Age = x)/(1 − θ(Age = x)) ) ]
  = ln( θ(Age = x + c)/(1 − θ(Age = x + c)) ) − ln( θ(Age = x)/(1 − θ(Age = x)) )
  = β0 + β1·(Age + c) − (β0 + β1·Age) = c·β1

Exponentiating both sides gives OR = e^(c·β1).

Thus the multiplicative increase (or decrease if β1 < 0) in odds associated with a c year increase in age is e^(c·β1).


Example: Interpreting a c year increase in age.

Question: Is it reasonable to assume that the effect of a c unit increase in a continuous predictor is constant regardless of starting point? For example, does the risk associated with a 5 year increase in age remain constant throughout one's life?


Statistical Inference for the Logistic Regression Model

Given estimates for the model parameters and their estimated standard errors, what types of statistical inferences can be made?

Hypothesis Testing

For testing:   Ho: βi = 0   vs.   Ha: βi ≠ 0

Large sample test for significance of the “slope” parameter (βi):

z = βi / SE(βi)  ≈  N(0, 1)

Confidence Intervals for Parameters and Corresponding OR's

100(1 − α)% CI for βi:   βi ± z(1 − α/2)·SE(βi)

100(1 − α)% CI for the OR associated with βi:   exp( βi ± z(1 − α/2)·SE(βi) )

If βi corresponds to a continuous predictor and we wish to examine the OR associated with a c unit increase, the CI for the OR becomes

exp( c·βi ± z(1 − α/2)·c·SE(βi) )

Example: What is the OR for CHD associated with a 10 year increase in age? Give a 95% confidence interval based on this estimate.
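A minimal R sketch of this calculation, using the slope estimate and standard error reported from the R fit later in this handout (β1 ≈ 0.11092, SE ≈ 0.02404); the specific numbers are carried forward from that output.

b1   <- 0.11092    # slope estimate for age (from summary(chd.glm) below)
se1  <- 0.02404    # its standard error
cinc <- 10         # a 10 year increase in age
exp(cinc*b1)                                  # estimated OR, about 3.03
exp(cinc*b1 + c(-1,1)*1.96*cinc*se1)          # approximate 95% CI, roughly (1.89, 4.86)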


In JMP Using the Analyze > Fit Y by X Approach

Estimated Odds Ratios

ROC Curve and Table

By changing the classification rule based on estimated probability we can obtain an ROC curve.

OPTIONS FOR LOGISTIC REGRESSION

Range Odds Ratios – Odds ratio associated with being at the maximum of x vs. the minimum of x.

Unit Odds Ratios – Odds ratio associated with a unit increase in x, i.e. c = 1.

ROC Curve – if we use θ(x) = P(CHD|x) to construct a rule for classifying a patient as having CHD vs. no CHD, this option gives the ROC curve coming from all possible cutpoints based on this estimated probability.


Logistic Regression for the CHD data in R

> CHD <- read.table(file.choose(),header=T)
> CHD
    agegrp age chd
1        1  20   0
2        1  23   0
3        1  24   0
4        1  25   0
5        1  25   1
.        .   .   .
.        .   .   .
.        .   .   .
96       8  63   1
97       8  64   0
98       8  64   1
99       8  65   1
100      8  69   1

> names(CHD)
[1] "agegrp" "age"    "chd"   
> attach(CHD)

> chd <- factor(chd)
> chd.glm <- glm(chd~age,family="binomial")
> summary(chd.glm)

Call:
glm(formula = chd ~ age, family = "binomial")

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.9718  -0.8456  -0.4576   0.8253   2.2859  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -5.30945    1.13263  -4.688 2.76e-06 ***
age           0.11092    0.02404   4.614 3.95e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Make sure that you specify family="binomial" or R will perform ordinary least squares regression instead.


(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 136.66  on 99  degrees of freedom
Residual deviance: 107.35  on 98  degrees of freedom
AIC: 111.35

Number of Fisher Scoring iterations: 3

> probCHD <- exp(-5.30945 + .11092*age)/(1+exp(-5.30945 + .11092*age))
> plot(age,probCHD,type="b",ylab="P(CHD|Age)",xlab="Age")

An easier way to obtain the estimated probabilities is to extract them from the model object.

> probCHD <- fitted(chd.glm)
> plot(age,probCHD,type="b",ylab="P(CHD|Age)")   # This produces the plot above

We can obtain the estimated logit (Li = β0 + β1·Age) by using the predict command.

> chd.logit = predict(chd.glm)
> plot(age,chd.logit,type="b",ylab="L = bo + b1*Age")
> title(main="Plot of Estimated Logit vs. Age")

P(CHD|Age) = e^(β0 + β1·Age) / (1 + e^(β0 + β1·Age)),   with β0 = −5.310 and β1 = 0.11092


The Logistic Regression Model (single predictor case)

yi = e^(β0 + β1·xi) / (1 + e^(β0 + β1·xi)) + εi = θ(xi) + εi

where yi = 1 if the outcome is a success and yi = 0 if the outcome is a failure.

What can we say about the errors?

If y i=1 then

If y i=0 then

Thus E( ε )= and Var (ε )=

We see that the errors are binomial NOT normal!

Estimation of Model Parameters (Method of Maximum Likelihood)


For the ith observed pair (xi, yi) the contribution to the likelihood is

θ(xi)^yi · (1 − θ(xi))^(1 − yi)

where θ(xi) = e^(β0 + β1·xi) / (1 + e^(β0 + β1·xi)) and yi = 1 or 0.

The Likelihood Function

L(β) = L(β0, β1) = ∏(i=1 to n) θ(xi)^yi · (1 − θ(xi))^(1 − yi)

Maximizing this as a function of both β0 and β1 yields the maximum likelihood estimates of the model parameters.

For computational purposes it is usually easier to maximize the logarithm of the likelihood function rather than the likelihood function itself. This is fine because the logarithm is a monotonic increasing function, so the maximizing parameter values are the same for the likelihood and the log-likelihood function. The log-likelihood function is given by

ln L(β0, β1) = ∑(i=1 to n) [ yi·ln(θ(xi)) + (1 − yi)·ln(1 − θ(xi)) ]

To find the parameter estimates we solve the equations given by setting the partial derivative with respect to each parameter equal to 0, i.e. solve simultaneously

∂/∂β0 ln L(β0, β1) = 0     and     ∂/∂β1 ln L(β0, β1) = 0

Several different nonlinear optimization routines are used to find solutions to such systems. Realize of course that this process gets increasingly computationally intensive as the number of terms in the model increases.
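As an illustration, the maximization can be sketched directly with R's general-purpose optimizer optim(); this is a minimal sketch, assuming the CHD data frame (with columns age and chd) read in earlier, and it is not how glm() itself fits the model (glm uses iteratively reweighted least squares / Fisher scoring, as the output reports).

# negative log-likelihood for the single-predictor logistic model
negloglik <- function(beta, x, y) {
  theta <- exp(beta[1] + beta[2]*x) / (1 + exp(beta[1] + beta[2]*x))
  -sum(y*log(theta) + (1 - y)*log(1 - theta))
}
fit <- optim(c(0, 0), negloglik, x = CHD$age, y = CHD$chd, hessian = TRUE)
fit$par                          # estimates of (beta0, beta1); compare to coef(chd.glm)
sqrt(diag(solve(fit$hessian)))   # approximate standard errors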

How do we measure discrepancy between observed and fitted values?

In OLS regression with a continuous response we used

RSS = ∑(i=1 to n) (yi − ŷi)² = ∑(i=1 to n) (yi − ηᵀui)² = ∑(i=1 to n) ( yi − (η0 + η1·u1i + ⋯ + ηk·uki) )²

In logistic regression modeling we can use the deviance (typically denoted D or G²), which is defined as

D = 2·ln( likelihood of saturated model / likelihood of fitted model )
  = 2·∑(i=1 to n) [ yi·ln( yi / θ(xi) ) + (1 − yi)·ln( (1 − yi) / (1 − θ(xi)) ) ]


Because the likelihood of the saturated model is equal to 1 when the response (yi) is 0 or 1, the deviance reduces to:

D = -2 ln(likelihood of the fitted model)

The deviance can be used to compare two potential models where one model is nested within the other by using the “General Chi-Square Test” for comparing rival logistic regression models.

Nested model concept:

General Chi-Square Test

Consider comparing two rival models, where the alternative hypothesis model contains the additional terms in x2:

Ho: log( θ(x)/(1 − θ(x)) ) = β1ᵀx1

H1: log( θ(x)/(1 − θ(x)) ) = β1ᵀx1 + β2ᵀx2

General Chi-Square Statistic

χ² = (residual deviance of reduced model) – (residual deviance of full model)
   = D(model without the terms in x2) − D(model with the terms in x2)  ~  χ²(Δ df)

If the full model is needed, χ² is BIG and the associated p-value = P( χ²(Δ df) > χ² ) is small.

Example: CHD and Age

Ho:
H1:

From JMP

(reduced model OK)

(full model needed)


From R

> summary(chd.glm)

Call:
glm(formula = chd ~ Age, family = "binomial")

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.9718  -0.8456  -0.4576   0.8253   2.2859  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -5.30945    1.13365  -4.683 2.82e-06 ***
Age           0.11092    0.02406   4.610 4.02e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Null deviance: 136.66  on 99  degrees of freedom
Residual deviance: 107.35  on 98  degrees of freedom
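The same comparison can be computed directly from the deviances above, or with anova() on the fitted object; a minimal sketch:

1 - pchisq(136.66 - 107.35, df = 1)   # p-value for dropping age; essentially 0
anova(chd.glm, test = "Chisq")        # same general chi-square test from the fitted model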

Logistic Regression with a Single Dichotomous Predictor

Example: CHD and Indicator of Age Over 55

Computed using standard approach

Logistic Model

There are two different ways to code dichotomous variables: (0,1) coding or (−1,+1, i.e. contrast) coding. JMP uses contrast coding, whereas the other packages we use generally employ the (0,1) coding. The two coding types are shown below.

Age 55+ = 1 if Age > 55, 0 otherwise          or          Age 55+ = +1 if Age > 55, −1 otherwise

For the purposes of discussion we will consider the (0,1) coding.

Recall the 2×2 table cross-classifying CHD status by the Age 55+ indicator (Age > 55 vs. Age < 55).


θ(x) = P(CHD = 1|x) = e^(β0 + β1·x) / (1 + e^(β0 + β1·x))

where x = the Age 55+ indicator, we have the following.

                   Age > 55 (x = 1)                              Age < 55 (x = 0)
CHD = 1    θ(x=1) = e^(β0 + β1) / (1 + e^(β0 + β1))       θ(x=0) = e^β0 / (1 + e^β0)
CHD = 0    1 − θ(x=1) = 1 / (1 + e^(β0 + β1))             1 − θ(x=0) = 1 / (1 + e^β0)

Estimating the model parameters “by hand”

OR = [ θ(x=1)/(1 − θ(x=1)) ] / [ θ(x=0)/(1 − θ(x=0)) ] =

Logistic Regression in R

> Over55
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[53] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Levels: 0 1

> chd
 [1] 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 1 0 0 1 1
[53] 0 1 0 1 0 0 1 0 1 1 0 0 1 0 1 0 0 1 1 1 1 0 1 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 0 1 1 1
Levels: 0 1

> table(chd,Over55)

   Over55
chd  0  1
  0 51  6
  1 22 21

> chd55 = glm(chd~Over55,family="binomial")
> summary(chd55)


Call:
glm(formula = chd ~ Over55, family = "binomial")

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-1.734  -0.847  -0.847   0.709   1.549  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -0.8408     0.2551  -3.296  0.00098 ***
Over55        2.0935     0.5285   3.961 7.46e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 136.66  on 99  degrees of freedom
Residual deviance: 117.96  on 98  degrees of freedom
AIC: 121.96

Number of Fisher Scoring iterations: 4
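As a check on the “by hand” calculation above, the sample odds ratio from the 2×2 table should match exp of the fitted Over55 coefficient; a minimal sketch using the counts and estimate shown above:

(21/6) / (22/51)   # odds of CHD for Age > 55 vs. Age < 55, about 8.11
exp(2.0935)        # exp of the Over55 coefficient, also about 8.11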

In JMP

To fit a logistic regression model it is best to use the Analyze > Fit Model option. We place CHD y/n (1 = Yes, 2 = No) in the Y box and Over 55 (1 = Yes, 2 = No) in the model effects box. The key is to have “Yes” for risk and disease alpha-numerically before “No”, thus the use of 1 for “Yes” and 2 for “No”.

The summary of the fitted logistic model is shown below. Notice that the parameter estimates are not the same as those obtained from R. This is because JMP uses contrast coding for the Over 55 predictor (+1 = Age > 55 and −1 = Age < 55).


OR’s and Fitted Probabilities

Using JMP to Compute OR’s, CI’s, Fitted Probabilities


For dichotomous predictors the range odds ratios compare x = −1 to x = +1, which is precisely the comparison we want.

By selecting Save Probability Formula we can save the fitted probabilities to the spreadsheet.


Example 1: Oral Contraceptive Use and Myocardial Infarctions

Set up a text file with the data in columns with variable names at the top. The case and control counts are in separate columns. The risk factor OC use and stratification variable Age follow.

> OCMI.data = read.table(file.choose(),header=T)   # read in text file
> OCMI.data
   MI NoMI Age OCuse
1   4   62   1   Yes
2   2  224   1    No
3   9   33   2   Yes
4  12  390   2    No
5   4   26   3   Yes
6  33  330   3    No
7   6    9   4   Yes
8  65  362   4    No
9   6    5   5   Yes
10 93  301   5    No
> attach(OCMI.data)

> Age <- factor(Age)        # treat the age group (1-5) as a factor, matching the output below
> OC.glm <- glm(cbind(MI,NoMI)~Age+OCuse,family=binomial)   # fit model

> summary(OC.glm)

Call:
glm(formula = cbind(MI, NoMI) ~ Age + OCuse, family = binomial)

Deviance Residuals:


 [1]  0.456248 -0.520517  1.377693 -0.886710 -1.685521  0.714695 -0.130922  0.033643
 [9] -0.045061  0.008822

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -4.3698     0.4347 -10.054  < 2e-16 ***
Age2          1.1384     0.4768   2.388   0.0170 *  
Age3          1.9344     0.4582   4.221 2.43e-05 ***
Age4          2.6481     0.4496   5.889 3.88e-09 ***
Age5          3.1943     0.4474   7.140 9.36e-13 ***
OCuseYes      1.3852     0.2505   5.530 3.19e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 158.0085  on 9  degrees of freedom
Residual deviance:   6.5355  on 4  degrees of freedom
AIC: 58.825

Number of Fisher Scoring iterations: 3

Find OR associated with oral contraceptive use ADJUSTED for age. Note: CMH procedure gave 3.97.

> exp(1.3852)
[1] 3.995625

Find a 95% CI for OR associated with OC use.

> exp(1.3852-1.96*.2505)
[1] 2.445428
> exp(1.3852+1.96*.2505)
[1] 6.528518

Interpreting the age effect in terms of OR’s ADJUSTING for OC use. Note: The reference group is Age = 1 which was women 25 – 29 years of age.

> OC.glm$coefficients
(Intercept)        Age2        Age3        Age4        Age5    OCuseYes 
  -4.369850    1.138363    1.934401    2.648059    3.194292    1.385176 
> Age.coefs <- OC.glm$coefficients[2:5]
> exp(Age.coefs)
     Age2      Age3      Age4      Age5 
 3.121653  6.919896 14.126585 24.392906 

Find 95% CI for age = 5 group.

> exp(3.1943-1.96*.4474)
[1] 10.14921
> exp(3.1943+1.96*.4474)
[1] 58.62751
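As an alternative to plugging in estimates and standard errors by hand, Wald-based intervals for all of the coefficients (and hence the ORs) can be pulled from the fitted object in one step; a minimal sketch, assuming the OC.glm fit above:

exp(confint.default(OC.glm))   # 95% Wald confidence intervals on the OR scale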

Example 2: Coffee Drinking and Myocardial Infarctions

> CoffeeMI.data = read.table(file.choose(),header=T)


> CoffeeMI.data
      Smoking Coffee MI NoMI
1       Never    > 5  7   31
2       Never    < 5 55  269
3      Former    > 5  7   18
4      Former    < 5 20  112
5   1-14 Cigs    > 5  7   24
6   1-14 Cigs    < 5 33  114
7  15-25 Cigs    > 5 40   45
8  15-25 Cigs    < 5 88  172
9  25-34 Cigs    > 5 34   24
10 25-34 Cigs    < 5 50   55
11 35-44 Cigs    > 5 27   24
12 35-44 Cigs    < 5 55   58
13   45+ Cigs    > 5 30   17
14   45+ Cigs    < 5 34   17
> attach(CoffeeMI.data)
> Coffee.glm = glm(cbind(MI,NoMI)~Smoking+Coffee,family=binomial)
> summary(Coffee.glm)

Call:
glm(formula = cbind(MI, NoMI) ~ Smoking + Coffee, family = binomial)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.7650  -0.4510  -0.0232   0.2999   0.7917  

Coefficients:
                   Estimate Std. Error z value Pr(>|z|)    
(Intercept)         -1.2981     0.1819  -7.136 9.60e-13 ***
Smoking15-25 Cigs    0.6892     0.2119   3.253  0.00114 ** 
Smoking25-34 Cigs    1.2462     0.2398   5.197 2.02e-07 ***
Smoking35-44 Cigs    1.1988     0.2389   5.017 5.24e-07 ***
Smoking45+ Cigs      1.7811     0.2808   6.342 2.27e-10 ***
SmokingFormer       -0.3291     0.2778  -1.185  0.23616    
SmokingNever        -0.3153     0.2279  -1.384  0.16646    
Coffee> 5            0.3200     0.1377   2.324  0.02012 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 173.7899  on 13  degrees of freedom
Residual deviance:   3.7622  on  6  degrees of freedom
AIC: 84.311

Number of Fisher Scoring iterations: 3

OR for drinking 5 or more cups of coffee per day. Note: CMH procedure gave OR = 1.375.

> exp(.3200)
[1] 1.377128

95% CI for OR associated with heavy coffee drinking


> exp(.3200 - 1.96*.1377)
[1] 1.051385
> exp(.3200 + 1.96*.1377)
[1] 1.803794

Reordering a Factor

To examine the effect of smoking we might want to “reorder” the levels of smoking status so that individuals who have never smoked are used as the reference group. To do this in R you must do the following:

Smoking = factor(Smoking,levels=c("Never","Former","1-14 Cigs","15-25 Cigs","25-34 Cigs","35-44 Cigs","45+ Cigs"))

The first level specified in the levels subcommand will be used as the reference group, “Never” in this case. Refitting the model with the reordered smoking status factor gives the following:

> Coffee.glm2 <- glm(cbind(MI,NoMI)~Smoking+Coffee,family=binomial)
> summary(Coffee.glm2)

Call:
glm(formula = cbind(MI, NoMI) ~ Smoking + Coffee, family = binomial)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.7650  -0.4510  -0.0232   0.2999   0.7917  

Coefficients:
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept)       -1.61344    0.14068 -11.469  < 2e-16 ***
SmokingFormer     -0.01376    0.25376  -0.054   0.9568    
Smoking1-14 Cigs   0.31533    0.22789   1.384   0.1665    
Smoking15-25 Cigs  1.00451    0.17976   5.588 2.30e-08 ***
Smoking25-34 Cigs  1.56150    0.21254   7.347 2.03e-13 ***
Smoking35-44 Cigs  1.51417    0.21132   7.165 7.77e-13 ***
Smoking45+ Cigs    2.09646    0.25855   8.108 5.13e-16 ***
Coffee> 5          0.31995    0.13766   2.324   0.0201 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 173.7899  on 13  degrees of freedom
Residual deviance:   3.7622  on  6  degrees of freedom
AIC: 84.311

Number of Fisher Scoring iterations: 3

Notice that “SmokingNever” is now absent from the output so we know it is being used as the reference group. The OR’s associated with the various levels of smoking are computed below.

> Smoke.coefs = Coffee.glm2$coefficients[2:7]
> exp(Smoke.coefs)
    SmokingFormer  Smoking1-14 Cigs Smoking15-25 Cigs Smoking25-34 Cigs 
         0.986338          1.370715          2.730561          4.765984 
Smoking35-44 Cigs   Smoking45+ Cigs 
         4.545632          8.137279 

Confidence intervals for each could be computed in the standard way.
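For example, a 95% CI for the OR comparing 45+ cigarette smokers to never smokers can be built from the estimate and standard error in the summary above; a minimal sketch:

exp(2.09646 - 1.96*0.25855)   # lower limit, roughly 4.9
exp(2.09646 + 1.96*0.25855)   # upper limit, roughly 13.5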

Some Details for Categorical Predictors with More Than Two Levels

Consider the coffee drinking/MI study above. The stratification variable smoking has seven levels, so it requires six dummy variables to define it. The level that is not defined using a dichotomous dummy variable serves as the reference group. The table below shows how the dummy variables are defined:

Level                     D2  D3  D4  D5  D6  D7
Never (Reference Group)    0   0   0   0   0   0
Former                     1   0   0   0   0   0
1 – 14 Cigs                0   1   0   0   0   0
15 – 24 Cigs               0   0   1   0   0   0
25 – 34 Cigs               0   0   0   1   0   0
35 – 44 Cigs               0   0   0   0   1   0
45+ Cigs                   0   0   0   0   0   1

Example: Coffee Drinking and Myocardial Infarctions

> CoffeeMI.data = read.table(file.choose(),header=T)
> CoffeeMI.data
      Smoking Coffee MI NoMI
1       Never    > 5  7   31
2       Never    < 5 55  269
3      Former    > 5  7   18
4      Former    < 5 20  112
5   1-14 Cigs    > 5  7   24
6   1-14 Cigs    < 5 33  114
7  15-25 Cigs    > 5 40   45
8  15-25 Cigs    < 5 88  172
9  25-34 Cigs    > 5 34   24
10 25-34 Cigs    < 5 50   55
11 35-44 Cigs    > 5 27   24
12 35-44 Cigs    < 5 55   58
13   45+ Cigs    > 5 30   17
14   45+ Cigs    < 5 34   17

The Logistic Model

ln( θ(x)/(1 − θ(x)) ) = β0 + β1·Coffee + β2·D2 + β3·D3 + β4·D4 + β5·D5 + β6·D6 + β7·D7

where Coffee is a dichotomous predictor equal to 1 if they drink 5 or more cups of coffee per day.

Comparing the log-odds of a heavy coffee drinker who smokes 15-25 cigarettes a day to a heavy coffee drinker who has never smoked we have:

ln( θ1(x)/(1 − θ1(x)) ) = β0 + β1 + β4

ln( θ2(x)/(1 − θ2(x)) ) = β0 + β1

Taking the difference gives

ln[ ( θ1(x)/(1 − θ1(x)) ) / ( θ2(x)/(1 − θ2(x)) ) ] = β4

thus e^β4 = the odds ratio associated with smoking 15-25 cigarettes per day compared to individuals who have never smoked, amongst heavy coffee drinkers. Because β1 is not involved in the odds ratio, the result is the same for non-heavy coffee drinkers as well!


You can also consider combinations of factors, e.g. if we compared heavy coffee drinkers who smoked 15-25 cigarettes per day to non-heavy coffee drinkers who have never smoked, the associated OR would be given by e^(β1 + β4).

Using our fitted model, the ORs discussed above would be:

> summary(Coffee.glm2)

Coefficients:
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept)       -1.61344    0.14068 -11.469  < 2e-16 ***
SmokingFormer     -0.01376    0.25376  -0.054   0.9568    
Smoking1-14 Cigs   0.31533    0.22789   1.384   0.1665    
Smoking15-25 Cigs  1.00451    0.17976   5.588 2.30e-08 ***
Smoking25-34 Cigs  1.56150    0.21254   7.347 2.03e-13 ***
Smoking35-44 Cigs  1.51417    0.21132   7.165 7.77e-13 ***
Smoking45+ Cigs    2.09646    0.25855   8.108 5.13e-16 ***
Coffee> 5          0.31995    0.13766   2.324   0.0201 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

OR for 15-25 cigarette smokers vs. never smokers (regardless of coffee drinking status)

> exp(1.00451)
[1] 2.730569

OR for 15-25 cigarette smokers who are also heavy coffee drinkers vs. non-smokers who are not heavy coffee drinkers

> exp(.31995 + 1.00451)
[1] 3.760154

Similar calculations could be done for other combinations of coffee and cigarette use.

Using Arc when the Number of Trials is not 1

Example 1: Oral contraceptive use, myocardial infarctions, and age

To read these data in Arc it is easiest to create a text file that looks like:

Age OCuse  MI NoMI Trials
1   Yes     4   62     66
1   No      2  224    226
2   Yes     9   33     42
2   No     12  390    402
3   Yes     4   26     30
3   No     33  330    363
4   Yes     6    9     15
4   No     65  362    427
5   Yes     6    5     11
5   No     93  301    394


The Trials column contains the total number of patients in each age and oral contraceptive use category, i.e. the sum of the number of patients with MI and the number of patients without MI (NoMI).

When read in Arc we have:

; loading D:\Data\Deppa Documents\Biostatistics (Biometry II)\Book Data\OCMI.txt
Arc 1.06, rev July 2004, Mon Oct 16, 2006, 12:58:46.
Data set name: OCMI
Oral contraceptive use, age, and myocardial infarctions
Name    Type     n   Info
AGE     Variate  10
MI      Variate  10
NOMI    Variate  10
TRIALS  Variate  10
OCUSE   Text     10

In Arc we need to turn the Age variable into a factor, as we don't want it to be interpreted as an actual number, and we need to create a factor based on OCuse. By default Arc does things alphabetically, so No would be used as “present”, which is not desirable. Thus it is best to create separate dichotomous dummy variables for each level individually. This will allow us to use those who used oral contraceptives as having “risk present”. To do this in Arc we need to use the Make Factors… option in the data menu.

For oral contraceptive use we want two separate dummy variables, one for each level of use, i.e. Yes and No.


Fitting the logistic model in Arc with MI as the response and OCUSE[YES] as the risk factor indicator.

Results for Fitted Logistic Model

Iteration 1: deviance = 6.69914
Iteration 2: deviance = 6.53561

Data set = OCMI, Name of Fit = B1
Binomial Regression
Kernel mean function = Logistic
Response = MI
Terms = ({F}AGE {T}OCUSE[YES])
Trials = TRIALS
Coefficient Estimates
Label           Estimate   Std. Error   Est/SE    p-value
Constant        -4.36985   0.434642     -10.054   0.0000
{F}AGE[2]        1.13836   0.476782       2.388   0.0170
{F}AGE[3]        1.93440   0.458227       4.221   0.0000
{F}AGE[4]        2.64806   0.449627       5.889   0.0000
{F}AGE[5]        3.19429   0.447386       7.140   0.0000
{T}OCUSE[YES]    1.38518   0.250458       5.531   0.0000

Scale factor: 1.
Number of cases: 10
Degrees of freedom: 4
Pearson X2: 6.386
Deviance: 6.536


We can work with these parameter estimates as above to obtain OR’s of interest etc.

Logistic Regression Case Study 1: Risk Factors for Low Birth Weight

Response
Y = low birth weight, i.e. birth weight < 2500 grams (1 = yes, 0 = no)

Set of potential predictors
X1 = previous history of premature labor (1 = yes, 0 = no)
X2 = hypertension during pregnancy (1 = yes, 0 = no)
X3 = smoker (1 = yes, 0 = no)
X4 = uterine irritability (1 = yes, 0 = no)
X5 = minority (1 = yes, 0 = no)
X6 = mother's age in years
X7 = mother's weight at last menstrual cycle

Analysis in R

> Lowbirth = read.table(file.choose(),header=T)
> Lowbirth[1:5,]   # print first 5 rows of the data set
  Low Prev Hyper Smoke Uterine Minority Age Lwt race  bwt
1   0    0     0     0       1        1  19 182    2 2523
2   0    0     0     0       0        1  33 155    3 2551
3   0    0     0     1       0        0  20 105    1 2557
4   0    0     0     1       1        0  21 108    1 2594
5   0    0     0     1       1        0  18 107    1 2600

Make sure categorical variables are interpreted as factors by using the factor command.

> Low = factor(Low)
> Prev = factor(Prev)
> Hyper = factor(Hyper)
> Smoke = factor(Smoke)
> Uterine = factor(Uterine)
> Minority = factor(Minority)

Note: This is not really necessary for dichotomous variables that are coded (0,1).

Fit a preliminary model using available covariates.

> low.glm = glm(Low~Prev+Hyper+Smoke+Uterine+Minority+Age+Lwt,family=binomial)
> summary(low.glm)

Call:
glm(formula = Low ~ Prev + Hyper + Smoke + Uterine + Minority + Age + Lwt, family = binomial)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.6010  -0.8149  -0.5128   1.0188   2.1977  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)   
(Intercept)  0.378479   1.170627   0.323  0.74646   
Prev1        1.196011   0.461534   2.591  0.00956 **
Hyper1       1.452236   0.652085   2.227  0.02594 * 
Smoke1       0.959406   0.405302   2.367  0.01793 * 
Uterine1     0.647498   0.466468   1.388  0.16511   
Minority1    0.990929   0.404969   2.447  0.01441 * 
Age         -0.043221   0.037493  -1.153  0.24900   
Lwt         -0.012047   0.006422  -1.876  0.06066 . 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Null deviance: 232.40  on 185  degrees of freedom
Residual deviance: 196.71  on 178  degrees of freedom
AIC: 212.71

Number of Fisher Scoring iterations: 3

It appears that both uterine irritability and mother's age are not significant. We can fit the reduced model eliminating both terms and test whether the model is significantly degraded by using the general chi-square test (see the General Chi-Square Test section earlier in these notes).

> low.reduced = glm(Low~Prev+Hyper+Smoke+Minority+Lwt,family=binomial)
> summary(low.reduced)

Call:
glm(formula = Low ~ Prev + Hyper + Smoke + Minority + Lwt, family = binomial)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.7277  -0.8219  -0.5368   0.9867   2.1517  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)   
(Intercept) -0.261274   0.885803  -0.295  0.76803   
Prev1        1.181940   0.444254   2.661  0.00780 **
Hyper1       1.397219   0.656271   2.129  0.03325 * 
Smoke1       0.981849   0.398300   2.465  0.01370 * 
Minority1    1.044804   0.394956   2.645  0.00816 **
Lwt         -0.014127   0.006387  -2.212  0.02697 * 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 232.40  on 185  degrees of freedom
Residual deviance: 200.32  on 180  degrees of freedom
AIC: 212.32

Number of Fisher Scoring iterations: 3
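Equivalently, the two nested fits can be compared directly in R rather than by hand; a minimal sketch, assuming the low.glm and low.reduced objects above:

anova(low.reduced, low.glm, test = "Chisq")   # general chi-square test, 3.61 on 2 df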

Ho: ln( θ(x)/(1 − θ(x)) ) = β0 + β1·X1 + β2·X2 + β3·X3 + β5·X5 + β7·X7

H1: ln( θ(x)/(1 − θ(x)) ) = β0 + β1·X1 + β2·X2 + β3·X3 + β4·X4 + β5·X5 + β6·X6 + β7·X7

* Recall: θ(x) = P(Low = 1|X)

Residual Deviance of the Null Hypothesis Model:        D(Ho) = 200.32,  df = 180
Residual Deviance of the Alternative Hypothesis Model: D(H1) = 196.71,  df = 178

General Chi-Square Test:   χ² = D(Ho) − D(H1) = 200.32 − 196.71 = 3.607

p-value = P( χ²2 > 3.607 ) = .1647

Fail to reject the null; the reduced model is adequate.

Interpretation of Model Parameters – OR's Associated with Categorical Predictors

> low.reduced

Call:  glm(formula = Low ~ Prev + Hyper + Smoke + Minority + Lwt, family = binomial)

Coefficients:
(Intercept)        Prev1       Hyper1       Smoke1    Minority1          Lwt  
   -0.26127      1.18194      1.39722      0.98185      1.04480     -0.01413  

Degrees of Freedom: 185 Total (i.e. Null);  180 Residual
Null Deviance:     232.4 
Residual Deviance: 200.3    AIC: 212.3

Estimated OR's

> exp(low.reduced$coefficients[2:5])
    Prev1    Hyper1    Smoke1 Minority1 
 3.260693  4.043938  2.669388  2.842841 

95% CI for OR Associated with History of Premature Labor

> exp(1.182 - 1.96*.444)
[1] 1.365827
> exp(1.182 + 1.96*.444)
[1] 7.78532

Holding everything else constant we estimate that the odds of having an infant with low birth weight are between 1.366 and 7.785 times larger for mothers with a history of premature labor.

95% CI for OR Associated with Hypertension

> exp(1.397 - 1.96*.6563)
[1] 1.117006
> exp(1.397 + 1.96*.6563)
[1] 14.63401

Holding everything else constant we estimate that the odds of having an infant with low birth weight are between 1.117 and 14.63 times larger for mothers with hypertension during pregnancy.

95% CI for OR Associated with Smoking

> exp(.981849 - 1.96*.3983)
[1] 1.222846
> exp(.981849 + 1.96*.3983)
[1] 5.827086

Holding everything else constant we estimate that the odds of having an infant with low birth weight are between 1.223 and 5.827 times larger for mothers who smoked during pregnancy.

95% CI for OR Associated with Minority Status

> exp(1.0448 - 1.96*.3950)
[1] 1.310751
> exp(1.0448 + 1.96*.3950)
[1] 6.16569

Holding everything else constant we estimate that the odds of having an infant with low birth weight are between 1.311 and 6.166 times larger for non-white mothers.

OR Associated with Mother’s Weight at Last Menstrual Cycle

Because this is a continuous predictor with values over 100 we should use an increment larger than one when considering the effect of mother’s weight on birth weight. Here we will use an increment of c = 10 lbs. although certainly there are other possibilities.

> exp(-10*.014127)
[1] 0.8682549


i.e. an estimated 13.2% decrease in the odds of low birth weight for each additional 10 lbs. of weight at the last menstrual cycle.

A 95% CI for this OR is:

> exp(10*(-.014127) - 1.96*10*.006387)
[1] 0.7660903
> exp(10*(-.014127) + 1.96*10*.006387)
[1] 0.9840439

x = seq(min(Lwt),max(Lwt),.5)
fit = predict(low.reduced,data.frame(Prev=factor(rep(1,length(x))),
      Hyper=factor(rep(0,length(x))),Smoke=factor(rep(1,length(x))),
      Minority=factor(rep(0,length(x))),Lwt=x),type="response")
plot(x,fit,xlab="Mother's Weight",ylab="P(Low|Prev=1,Smoke=1,Lwt)")

Diagnostics (Delta Deviance and Cook's Distance)

As in the case of ordinary least squares (OLS) regression, we need to be wary of cases that may have unduly high influence on our results and those that are poorly fit. The most common influence measure is Cook's Distance, and a good measure of poorly fit cases is the Delta Deviance.

Essentially Cook's Distance (Δβ(−i)) measures the change in the estimated parameters when the ith observation is deleted. This change is measured for each of the observations and can be plotted versus θ(x) or observation number to aid in the identification of high influence cases. Several cut-offs have been proposed for Cook's Distance, the most common being to classify an observation as having large influence if Δβ(−i) > 1 or, in the case of a large sample size n, Δβ(−i) > 4/n. (Details of Cook's Distance are given in the diagnostic plot section later in these notes.)

This is a plot of the effect of premenstrual weight for smoking mothers with a history of premature labor. Using the predict command above similar plots could be constructed by examining other combinations of the categorical predictors.


Delta deviance measures the change in the deviance (D) when the ith case is deleted. Values around 4 or larger are considered to indicate cases that are poorly fit. These correspond to individuals where yi = 1 but θ(x) is small, or where yi = 0 but θ(x) is large.

In cases of both high influence and poor fit it is good to look at the covariate values for these individuals and we can begin to address the role they play in the analysis. In many cases there will be several individuals with the same covariate pattern, especially if most or all of the predictors are categorical in nature.
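The plots below use the Diagplot.glm() and Diagplot.log() functions, which are not part of base R. If they are unavailable, roughly comparable diagnostics can be sketched with base R functions (squared deviance residuals are used here as a rough stand-in for the delta deviance):

cd   <- cooks.distance(low.reduced)                # influence of each case
dres <- residuals(low.reduced, type = "deviance")  # deviance residuals
plot(cd, type = "h", xlab = "Case Number", ylab = "Cook's Distance")
plot(fitted(low.reduced), dres^2, xlab = "Fitted Probability",
     ylab = "Squared Deviance Residual")
which(cd > 4/length(cd))                           # cases exceeding the 4/n cut-off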

> Diagplot.glm(low.reduced)

> Diagplot.log(low.reduced)


Cases 11 and 13 have the highest Cook’s distances although they are not that large. It should be noted also that they are also somewhat poorly fit. Cases 129, 144, 152, and 180 appear to be poorly fit. The information on all of these cases is shown below.

> Lowbirth[c(11,13,129,144,152,180),]
    Low Prev Hyper Smoke Uterine Minority Age Lwt race  bwt
11    0    0     1     0       0        1  19  95    3 2722
13    0    0     1     0       0        1  22  95    3 2750
129   1    0     0     0       1        0  29 130    1 1021
144   1    0     0     0       1        1  21 200    2 1928
152   1    0     0     0       0        0  24 138    1 2100
180   1    0     0     1       0        0  26 190    1 2466

Case 152 had a low birth weight infant even in the absence of the identified potential risk factors. The fitted values for all four of the poorly fit cases are quite small.

> fitted(low.reduced)[c(11,13,129,144,152,180)]
        11         13        129        144        152        180 
0.69818500 0.69818500 0.10930602 0.11486743 0.09877858 0.12307383 

Cases 11 and 13 have high predicted probabilities despite the fact that they had babies with normal birth weight. Their relatively high leverage might come from the fact that there were very few hypertensive minority women in the study. These two facts combined lead to the relatively large Cook’s Distances for these two cases.


Plotting Estimated Conditional Probabilities P(Low = 1|x)

A summary of the reduced model is given below:

> low.reduced

Call:  glm(formula = Low ~ Prev + Hyper + Smoke + Minority + Lwt, family = binomial)

Coefficients:
(Intercept)        Prev1       Hyper1       Smoke1    Minority1          Lwt  
   -0.26127      1.18194      1.39722      0.98185      1.04480     -0.01413  

Degrees of Freedom: 185 Total (i.e. Null);  180 Residual
Null Deviance:     232.4 
Residual Deviance: 200.3    AIC: 212.3

To easily plot probabilities in R we can write a function that takes covariate values and computes the desired conditional probability.

> x <- seq(min(Lwt),max(Lwt),.5)

> PrLwt <- function(x,Prev,Hyper,Smoke,Minority) {
+   L <- -.26127 + 1.18194*Prev + 1.39722*Hyper + .98185*Smoke + 
+        1.0448*Minority - .01413*x
+   exp(L)/(1 + exp(L))
+ }
> plot(x,PrLwt(x,1,1,1,1),xlab="Mother's Weight",ylab="P(Low=1|x)",
+   ylim=c(0,1),type="l")
> title(main="Plot of P(Low=1|X) vs. Mother's Weight")
> lines(x,PrLwt(x,0,0,0,0),lty=2,col="red")
> lines(x,PrLwt(x,1,1,0,0),lty=3,col="blue")
> lines(x,PrLwt(x,0,0,1,1),lty=4,col="green")


Fitting Logistic Models in Arc and More Diagnostics (lowbirtharc.txt from website)

Again we consider the low birth weight case study.

Arc 1.03, rev Aug, 2000, Wed Oct 22, 2003, 12:10:14.
Data set name: Lowbw
Low birth weight study.
Name       Type     n    Info
AGE        Variate  189  Age of mother
BWT        Variate  189  Actual birthweight of child in grams
HT         Variate  189  Mother hypertensive during pregnancy (1 = yes, 0 = no)
ID         Variate  189  
LOW        Variate  189  (1 = low birthweight, 0 = normal birthweight)
LWT        Variate  189  Mothers weight at last menstrual cycle
PTD        Variate  189  do not know
PTL        Variate  189  Previous history of premature labor (1 = yes, 0 = no)
RACE       Variate  189  Race of mother (1 = white, 2 = black, 3 = other)
SMOKE      Variate  189  Mother smoke (1 = yes, 0 = no)
UI         Variate  189  Uterine irritability (1 = yes, 0 = no)
FTV        Text     189  # of doctor visits during 1st trimester
{F}FTV     Factor   189  Factor--first level dropped
{F}HT      Factor   189  Factor--first level dropped
{F}PTD     Factor   189  Factor--first level dropped
{F}RACE    Factor   189  Factor--first level dropped
{F}SMOKE   Factor   189  Factor--first level dropped
{F}UI      Factor   189  Factor--first level dropped

Select Fit binomial response… from the Graph & Fit menu. In the resulting dialog box, specify the model as shown below.


The output below shows the results of fitting this initial model.

Data set = Lowbw, Name of Fit = B1
Binomial Regression
Kernel mean function = Logistic
Response = LOW
Terms = (AGE LWT {F}FTV {F}HT {F}PTD {F}RACE {F}SMOKE {F}UI)
Trials = Ones
Coefficient Estimates
Label          Estimate     Std. Error   Est/SE   p-value
Constant        0.386634    1.27736       0.303   0.7621
AGE            -0.0372340   0.0386777    -0.963   0.3357
LWT            -0.0156530   0.00707594   -2.212   0.0270
{F}FTV[0]       0.436379    0.479161      0.911   0.3624
{F}FTV[2+]      0.615386    0.553104      1.113   0.2659
{F}HT[1]        1.91316     0.720434      2.656   0.0079
{F}PTD[1]       1.34376     0.480445      2.797   0.0052
{F}RACE[2]      1.19241     0.535746      2.226   0.0260
{F}RACE[3]      0.740681    0.461461      1.605   0.1085
{F}SMOKE[1]     0.755525    0.424764      1.779   0.0753
{F}UI[1]        0.680195    0.464216      1.465   0.1429

Scale factor: 1.
Number of cases: 189
Degrees of freedom: 178
Pearson X2: 179.059
Deviance: 195.476

(Note: AIC = D + 2k*(scale factor) = 195.48 + 22 = 217.48)

The results are identical to those obtained from R:

    Null deviance: 234.67  on 188  degrees of freedom
Residual deviance: 195.48  on 178  degrees of freedom

Give the model a name if you want.

Always include an intercept.

Use the Make Factors… option from the data set menu to ensure all categorical predictors are treated as factors.

Put dichotomous response in the Response… box. The response may also be the number of “successes” observed. (see below)

If mi = 1 for all cases then put the variable Ones in the Trials… box. If your response represents the number of “successes” observed in mi > 1 trials then you need to import the number of trials and put that variable in this box.

Note: For FTV those who went to the doctor once during the first trimester are used as the reference group


Examining Submodels – Backward Elimination and Forward Selection

The results of backward elimination for the current low birth weight model are shown below.

Data set = Lowbw, Name of Fit = B1
Binomial Regression
Kernel mean function = Logistic
Response = LOW
Terms = (AGE LWT {F}FTV {F}HT {F}PTD {F}RACE {F}SMOKE {F}UI)
Trials = Ones
Backward Elimination: Sequentially remove terms that give the smallest change in AIC.
All fits include an intercept.

Current terms: (AGE LWT {F}FTV {F}HT {F}PTD {F}RACE {F}SMOKE {F}UI)
                      df  Deviance  Pearson X2  |   k      AIC
Delete: {F}FTV       180   196.834     180.989  |   9  214.834 *
Delete: AGE          179   196.417     181.401  |  10  216.417
Delete: {F}UI        179   197.585     180.753  |  10  217.585
Delete: {F}SMOKE     179   198.674     186.809  |  10  218.674
Delete: {F}RACE      180   201.227     183.365  |   9  219.227
Delete: LWT          179   200.949     177.855  |  10  220.949
Delete: {F}HT        179   202.934     177.447  |  10  222.934
Delete: {F}PTD       179   203.584     180.74   |  10  223.584

Current terms: (AGE LWT {F}HT {F}PTD {F}RACE {F}SMOKE {F}UI)
                      df  Deviance  Pearson X2  |   k      AIC
Delete: AGE          181   197.852     183.999  |   8  213.852 *
Delete: {F}UI        181   199.151     184.559  |   8  215.151
Delete: {F}RACE      182   203.24      182.815  |   7  217.240
Delete: {F}SMOKE     181   201.247     186.953  |   8  217.247
Delete: LWT          181   201.833     181.355  |   8  217.833
Delete: {F}PTD       181   203.948     181.536  |   8  219.948
Delete: {F}HT        181   204.013     179.069  |   8  220.013

Forward Elimination – Select this option and click OK. It will then show how terms are sequentially added to a model containing any base terms. By default the base contains the intercept only.

Backward Elimination – Simply select this option and click OK. It will show how terms are sequentially eliminated from the model along with the resulting AIC for each deletion.

The other options do what they say.


Current terms: (LWT {F}HT {F}PTD {F}RACE {F}SMOKE {F}UI)
                      df  Deviance  Pearson X2  |   k      AIC
Delete: {F}UI        182   200.482     186.918  |   7  214.482 *
Delete: {F}SMOKE     182   202.567     189.716  |   7  216.567
Delete: {F}RACE      183   205.466     186.461  |   6  217.466
Delete: LWT          182   203.816     185.551  |   7  217.816
Delete: {F}PTD       182   204.217     182.499  |   7  218.217
Delete: {F}HT        182   205.162     182.282  |   7  219.162

Current terms: (LWT {F}HT {F}PTD {F}RACE {F}SMOKE)
                      df  Deviance  Pearson X2  |   k      AIC
Delete: {F}SMOKE     183   205.397     189.925  |   6  217.397
Delete: {F}RACE      184   207.955     192.506  |   5  217.955
Delete: {F}HT        183   207.039     184.17   |   6  219.039
Delete: LWT          183   207.165     187.234  |   6  219.165
Delete: {F}PTD       183   208.247     184.45   |   6  220.247

Current terms: (LWT {F}HT {F}PTD {F}RACE)
                      df  Deviance  Pearson X2  |   k      AIC
Delete: {F}RACE      185   210.123     194.086  |   4  218.123
Delete: {F}HT        184   212.18      188.048  |   5  222.180
Delete: LWT          184   213.226     187.544  |   5  223.226
Delete: {F}PTD       184   216.295     191.533  |   5  226.295

Current terms: (LWT {F}HT {F}PTD)
                      df  Deviance  Pearson X2  |   k      AIC
Delete: {F}HT        186   217.497     190.809  |   3  223.497
Delete: LWT          186   217.662     188.394  |   3  223.662
Delete: {F}PTD       186   221.142     193.26   |   3  227.142

Current terms: (LWT {F}PTD)
                      df  Deviance  Pearson X2  |   k      AIC
Delete: LWT          187   221.898     188.863  |   2  225.898
Delete: {F}PTD       187   228.691     189.647  |   2  232.691

* indicates a potential “final” model using the AIC criteria, Arc does not add the *’s.

Making Interactions

To make interactions in Arc…

1st - Select Make Interactions from the data set menu.

2nd - Placing all covariates in the right-hand box will create all possible two-way interactions.


Deciding which interactions to include however is not as easy as in R. You could potentially include all interactions and then backward eliminate, however things will get unstable numerically with that many terms in the model. It is better to choose any interactions you feel might make physiological sense and then backward eliminate.

If Arc does not use the reference group you would like to use, you can create dummy variables for each level of the factor and then leave the one for the reference group out when you specify the model.

The model with the age*recoded FTV and the smoking*uterine irritability interactions we saw in the R handout is summarized below.

Data set = Lowbw, Name of Fit = B6
Binomial Regression
Kernel mean function = Logistic
Response = LOW
Terms = (AGE LWT {F}HT {F}PTD {F}SMOKE {F}UI {F}SMOKE*{F}UI {T}FTV[1] {T}FTV[2+] {T}FTV[1]*AGE {T}FTV[2+]*AGE)
Trials = Ones
Coefficient Estimates
Label                  Estimate     Std. Error   Est/SE   p-value
Constant               -0.582374    1.42158      -0.410   0.6821
AGE                     0.0755389   0.0539665     1.400   0.1616
LWT                    -0.0203726   0.00749678   -2.718   0.0066
{F}HT[1]                2.06570     0.748727      2.759   0.0058
{F}PTD[1]               1.56032     0.496986      3.140   0.0017
{F}SMOKE[1]             0.780044    0.420371      1.856   0.0635
{F}UI[1]                1.81853     0.667517      2.724   0.0064
{F}SMOKE[1].{F}UI[1]   -1.91668     0.973066     -1.970   0.0489
{T}FTV[1]               2.92109     2.28571       1.278   0.2013
{T}FTV[2+]              9.24491     2.66099       3.474   0.0005
{T}FTV[1].AGE          -0.161824    0.0968164    -1.671   0.0946
{T}FTV[2+].AGE         -0.411033    0.119117     -3.451   0.0006

Number of cases: 189
Degrees of freedom: 177
Pearson X2: 179.282
Deviance: 183.073

Selecting these options will create three dummy variables one for each level of FTV (0, 1, 2+).

Notice: The recoding of FTV so FTV=0 is now the reference group.


Diagnostic Plots

There are several plotting options in Arc to help assess a model's adequacy. They are as follows:

Residuals (deviance or chi-square) vs. the estimated logit (L = βᵀx)

Deviance residual:
Di = sgn(yi − θ(xi)) · sqrt( 2·[ yi·ln( yi/θ(xi) ) + (1 − yi)·ln( (1 − yi)/(1 − θ(xi)) ) ] )

Chi-residual for the ith covariate pattern:
eχi = (yi − ŷi) / sqrt( mi·θ(xi)·(1 − θ(xi)) )     (the sum of the squared chi-residuals = Pearson's X²)

where ŷi = mi·θ(xi), and yi = 1 for cases and 0 for controls.

Plot of Cook's distance vs. case number or some other quantity.

Plot of leverage (potential for influence) vs. case number.

Model checking plots.

Residuals vs. Estimated Logit (or some other function of the covariates)

If the model is adequate, a lowess smooth (smoothing parameter 0.6) added to the plot should be constant, i.e. flat. This plot will not work well when the numbers of replicates, mi, are small, i.e. close to 1. Model checking plots work better for checking model adequacy in those cases.

Eta'U           ~ estimated logit (Li = βᵀxi)
Obs-Fraction    ~ yi/mi (1's and 0's in the case mi = 1)
Fit-Fraction    ~ θ(xi) = e^Li / (1 + e^Li)
Chi-Residuals   ~ see above
Dev-Residuals   ~ see above

T-Residuals     ~ eχi / sqrt(1 − hi), the studentized chi-residual
Leverages       ~ hi = ith diagonal element of the hat matrix H
Cook's Distance ~ Di = (1/k)·( eχi² / (1 − hi) )·( hi / (1 − hi) ), which measures the influence of the ith case.
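For reference, rough R analogues of these quantities are available for any fitted glm object; a minimal sketch, using the low.reduced fit from the case study as a stand-in:

L    <- predict(low.reduced)                      # estimated logits
hat  <- hatvalues(low.reduced)                    # leverages h_i
chi  <- residuals(low.reduced, type = "pearson")  # chi (Pearson) residuals
tres <- chi / sqrt(1 - hat)                       # studentized chi-residuals
cook <- cooks.distance(low.reduced)               # Cook's distance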


As an example, consider the simple, but reasonable, main effects model shown below.

Data set = Lowbw, Name of Fit = B3
Binomial Regression
Kernel mean function = Logistic
Response = LOW
Terms = (LWT {F}HT {F}PTD {F}RACE {F}SMOKE {F}UI)
Trials = Ones
Coefficient Estimates
Label          Estimate     Std. Error   Est/SE   p-value
Constant       -0.125327    0.967238     -0.130   0.8969
LWT            -0.0159185   0.00695085   -2.290   0.0220
{F}HT[1]        1.86689     0.707212      2.640   0.0083
{F}PTD[1]       1.12886     0.450330      2.507   0.0122
{F}RACE[2]      1.30085     0.528349      2.462   0.0138
{F}RACE[3]      0.854413    0.440761      1.938   0.0526
{F}SMOKE[1]     0.866581    0.404341      2.143   0.0321
{F}UI[1]        0.750648    0.458753      1.636   0.1018

Scale factor: 1.
Number of cases: 189
Degrees of freedom: 181
Pearson X2: 183.999
Deviance: 197.852

The plots of the chi-square residuals vs. the estimated logit (L =βT X ) and LWT are shown below. The lowess smooth looks fairly flat and so no model inadequacies are suggested.


Cook’s Distance vs. Case Number and Est. Probs - (no cases have high influence)

Leverages vs. Case Numbers

For leverages the average value is k/n, so values far exceeding the average have the potential to be influential. The following is a good rule of thumb:

1/n < hi < .25     no worries
.25 < hi < .50     worry
.50 < hi < 1       worry lots

Model Checking Plots

For any linear combination bᵀxi of the predictors or terms, imagine drawing two plots: one of yi/mi vs. bᵀxi, and one of θ(xi) vs. bᵀxi. If the model is adequate, the lowess smooth of each should match for any linear combination we choose. A model checking plot is a plot with bᵀxi on the x-axis and both of the lowess smooths described above added to the plot. If they agree for a variety of choices of bᵀxi then we can feel reasonably confident that our model is adequate. Large differences between these smooths can indicate model deficiencies. Common choices for bᵀxi include the estimated logits (L), the individual predictors, and randomly chosen combinations of the terms in the model.
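A version of this plot can be sketched in R as well, using the estimated logits as the linear combination; a minimal sketch, assuming the low.reduced fit from earlier (the names low01 and eta are introduced here just for illustration):

low01 <- as.numeric(as.character(Low))     # response as numeric 0/1
eta   <- predict(low.reduced)              # estimated logits b'x
plot(eta, low01, xlab = "Estimated Logit", ylab = "Observed / Fitted")
lines(lowess(eta, low01), col = "blue")               # smooth of the observed responses
lines(lowess(eta, fitted(low.reduced)), col = "red")  # smooth of the fitted probabilities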


Here we see good agreement between the two smooths for the estimated logits.

Model checking plot with the single term LWT on the x-axis.


Model checking plot for one random linear combination of the terms in the model. Again we see good agreement.


Interactions and Higher Order Terms (Note ~ uses data frame: Lowbwt)

We now work with a slightly different version of the low birth weight data which includes an additional predictor, ftv, a factor indicating the number of first trimester doctor visits the woman had (coded as 0, 1, or 2+). We will examine how the model below was developed in the next section, where we discuss model development.

In the model below we have added an interaction between age and the number of first trimester visits. The logistic model is:

log( θ(x)/(1 − θ(x)) ) = β0 + β1·Age + β2·Lwt + β3·Smoke + β4·Prev + β5·HT + β6·UI +
                         β7·FTV1 + β8·FTV2 + β9·Age*FTV1 + β10·Age*FTV2 + β11·Smoke*UI

> summary(bigmodel)

Call:
glm(formula = low ~ age + lwt + smoke + ptd + ht + ui + ftv + age:ftv + smoke:ui, family = binomial)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.8945  -0.7128  -0.4817   0.7841   2.3418  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.582389   1.420834  -0.410 0.681885    
age          0.075538   0.053945   1.400 0.161428    
lwt         -0.020372   0.007488  -2.721 0.006513 ** 
smoke1       0.780047   0.420043   1.857 0.063302 .  
ptd1         1.560304   0.496626   3.142 0.001679 ** 
ht1          2.065680   0.748330   2.760 0.005773 ** 
ui1          1.818496   0.666670   2.728 0.006377 ** 
ftv1         2.921068   2.284093   1.279 0.200941    
ftv2+        9.244460   2.650495   3.488 0.000487 ***
age:ftv1    -0.161823   0.096736  -1.673 0.094360 .  
age:ftv2+   -0.411011   0.118553  -3.467 0.000527 ***
smoke1:ui1  -1.916644   0.972366  -1.971 0.048711 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 234.67  on 188  degrees of freedom
Residual deviance: 183.07  on 177  degrees of freedom
AIC: 207.07

Number of Fisher Scoring iterations: 4

> bigmodel$coefficients
 (Intercept)          age          lwt       smoke1        prev1          ht1 
 -0.58238913   0.07553844  -0.02037234   0.78004747   1.56030401   2.06567991 
         ui1         ftv1        ftv2+     age:ftv1    age:ftv2+   smoke1:ui1 
  1.81849631   2.92106773   9.24445985  -0.16182328  -0.41101103  -1.91664380 


Calculate P(Low|Age,FTV) for women of average pre-pregnancy weight with all other risk factors absent. Similar calculations could be done if we wanted to add in other factors as well.

First we calculate the logits as a function of age for the three levels of FTV (0, 1, and 2+ respectively); here agex is a grid of age values created beforehand, e.g. with seq().

> L <- -.5824 + .0755*agex - .02037*mean(lwt)
> L1 <- -.5824 + .0755*agex - .02037*mean(lwt) + 2.9211 - .16182*agex
> L2 <- -.5824 + .0755*agex - .02037*mean(lwt) + 9.2445 - .4110*agex

Next we calculate the associated conditional probabilities.

> P <- exp(L)/(1+exp(L))
> P1 <- exp(L1)/(1+exp(L1))
> P2 <- exp(L2)/(1+exp(L2))

Finally we plot the probability curves as a function of age and FTV.

> plot(agex,P,type="l",xlab="Age",ylab="P(Low|Age,FTV)",ylim=c(0,1))
> lines(agex,P1,lty=2,col="blue")
> lines(agex,P2,lty=3,col="red")
> title(main="Interaction Between Age and First Trimester Visits",cex=.6)

We also have an interaction between smoking and uterine irritability added to the model. This will affect how we interpret the two in terms of odds ratios. We need to consider the OR associated with smoking for women without uterine irritability, the OR associated with uterine irritability for nonsmokers, and finally the OR associated with smoking and having uterine irritability during pregnancy.

The interaction between age and FTV produces differences in the direction and magnitude of the age effect. For women with no first trimester doctor visits the probability of low birth weight increases with age. However, for women with at least one first trimester visit the probability of low birth weight decreases with age. The magnitude of that drop is largest for women with 2 or more first trimester visits.


These estimated odds ratios are given below:

OR for Smoking with No Uterine Irritability
> exp(.7800)
[1] 2.181472

OR for Uterine Irritability with No Smoking
> exp(1.8185)
[1] 6.162608

OR for Smoking and Uterine Irritability 
> exp(.7800+1.8185-1.91664)
[1] 1.977553

This result is hard to explain physiologically and so this interaction term might be removed from the model.

Model Selection Methods
Stepwise methods used in logistic regression are the same as those used in ordinary least squares regression; however, the criterion is AIC (Akaike Information Criterion) rather than Mallows' Ck statistic. Like Mallows' statistic, AIC balances residual deviance against the number of parameters in the model.

AIC = D + 2kφ

where D = residual deviance, k = total number of estimated parameters, and φ is an estimate of the dispersion parameter, which is taken to be 1 in models where overdispersion is not present. Overdispersion occurs when the data consist of the number of successes out of mi > 1 trials and the trials are not independent (e.g. the male birth data from your last homework).
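As a quick check of this formula for an ordinary binomial fit (where φ = 1), the AIC reported by R can be recomputed from the residual deviance and the number of estimated coefficients. A minimal sketch, using the low.glm fit shown below (any fitted binomial glm would do):

> D <- deviance(low.glm)        # residual deviance
> k <- length(coef(low.glm))    # number of estimated parameters
> D + 2*k                       # matches AIC(low.glm) when the dispersion is taken to be 1
> AIC(low.glm)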

Forward, backward, both forward and backward simultaneously, and all possible subsets regression methods can be employed to find models with small AIC values. By default R uses both forward and backward selection simultaneously. The command to do this in R has the basic form:

> step(current model name)

To have it select from models containing all potential two-way interactions use:

> step(current model name, scope=~.^2)

This sometimes will have problems with convergence due to overfitting (i.e. the estimated probabilities approach 0 and 1, as in the saturated model). If this occurs you can have R consider adding each of the potential interaction terms one at a time, and then you can scan the list and decide which you might want to add to your existing model. You can then continue adding terms until the AIC criterion suggests that additional terms do not improve the current model.


These commands are illustrated for the low birth weight data with first trimester visits included in the output shown below.

Base Model
> low.glm <- glm(low~age+lwt+race+smoke+ht+ui+ptd+ftv,family=binomial)
> summary(low.glm)

Call:
glm(formula = low ~ age + lwt + race + smoke + ht + ui + ptd + ftv, family = binomial)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.7038  -0.8068  -0.5009   0.8836   2.2151  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)   
(Intercept)  0.822706   1.240174   0.663  0.50709   
age         -0.037220   0.038530  -0.966  0.33404   
lwt         -0.015651   0.007048  -2.221  0.02637 * 
race2        1.192231   0.534428   2.231  0.02569 * 
race3        0.740513   0.459769   1.611  0.10726   
smoke1       0.755374   0.423246   1.785  0.07431 . 
ht1          1.912974   0.718586   2.662  0.00776 **
ui1          0.680162   0.463464   1.468  0.14222   
ptd1         1.343654   0.479409   2.803  0.00507 **
ftv1        -0.436331   0.477792  -0.913  0.36112   
ftv2+        0.178939   0.455227   0.393  0.69426   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 234.67  on 188  degrees of freedom
Residual deviance: 195.48  on 178  degrees of freedom
AIC: 217.48

Number of Fisher Scoring iterations: 3

Find "best" model that includes all potential two-way interactions
> low.step <- step(low.glm,scope=~.^2)
Start:  AIC= 217.48
low ~ age + lwt + race + smoke + ht + ui + ptd + ftv

             Df Deviance    AIC
+ age:ftv     2   183.00 209.00
- ftv         2   196.83 214.83
- age         1   196.42 216.42
<none>            195.48 217.48
- ui          1   197.59 217.59
+ smoke:ui    1   193.76 217.76
+ lwt:smoke   1   194.04 218.04
+ ui:ptd      1   194.24 218.24
+ lwt:ui      1   194.28 218.28


+ ptd:ftv     2   192.38 218.38
+ ht:ptd      1   194.55 218.55
+ age:ptd     1   194.58 218.58
+ age:ht      1   194.59 218.59
+ age:smoke   1   194.61 218.61
+ race:ui     2   192.63 218.63
- smoke       1   198.67 218.67
+ smoke:ht    1   195.03 219.03
+ smoke:ptd   1   195.16 219.16
- race        2   201.23 219.23
+ race:smoke  2   193.24 219.24
+ lwt:ptd     1   195.35 219.35
+ lwt:ht      1   195.44 219.44
+ age:lwt     1   195.46 219.46
+ age:ui      1   195.47 219.47
+ ht:ftv      2   194.00 220.00
+ lwt:ftv     2   194.19 220.19
+ smoke:ftv   2   194.47 220.47
+ age:race    2   194.58 220.58
+ lwt:race    2   194.63 220.63
+ race:ptd    2   194.83 220.83
- lwt         1   200.95 220.95
+ race:ht     2   195.19 221.19
+ ui:ftv      2   195.32 221.32
- ht          1   202.93 222.93
- ptd         1   203.58 223.58
+ race:ftv    4   193.81 223.81

Step:  AIC= 209
low ~ age + lwt + race + smoke + ht + ui + ptd + ftv + age:ftv

             Df Deviance    AIC
+ smoke:ui    1   179.94 207.94
+ lwt:smoke   1   180.89 208.89
- race        2   186.99 208.99
<none>            183.00 209.00
+ ui:ptd      1   181.42 209.42
+ lwt:ui      1   181.90 209.90
+ ht:ptd      1   182.06 210.06
- smoke       1   186.11 210.11
+ age:smoke   1   182.16 210.16
+ race:ui     2   180.32 210.32
+ age:ptd     1   182.50 210.50
- ui          1   186.61 210.61
+ smoke:ht    1   182.71 210.71
+ lwt:ptd     1   182.75 210.75
+ smoke:ptd   1   182.82 210.82
+ age:ht      1   182.90 210.90
+ age:ui      1   182.96 210.96
+ age:lwt     1   183.00 211.00
+ lwt:ht      1   183.00 211.00
+ race:smoke  2   181.23 211.23
+ lwt:ftv     2   181.44 211.44
+ ptd:ftv     2   181.57 211.57
+ age:race    2   181.62 211.62
+ smoke:ftv   2   181.65 211.65
+ ht:ftv      2   181.82 211.82


+ lwt:race    2   182.55 212.55
+ race:ht     2   182.78 212.78
+ race:ptd    2   182.85 212.85
- lwt         1   188.88 212.88
+ ui:ftv      2   182.94 212.94
- ht          1   190.13 214.13
- ptd         1   191.05 215.05
+ race:ftv    4   181.69 215.69
- age:ftv     2   195.48 217.48

Step:  AIC= 207.94
low ~ age + lwt + race + smoke + ht + ui + ptd + ftv + age:ftv + smoke:ui

             Df Deviance    AIC
- race        2   183.07 207.07
<none>            179.94 207.94
+ lwt:smoke   1   178.34 208.34
+ ht:ptd      1   178.89 208.89
- smoke:ui    1   183.00 209.00
+ ui:ptd      1   179.07 209.07
+ age:ptd     1   179.35 209.35
+ age:smoke   1   179.37 209.37
+ smoke:ptd   1   179.58 209.58
+ lwt:ptd     1   179.61 209.61
+ lwt:ui      1   179.76 209.76
+ age:ht      1   179.78 209.78
+ smoke:ht    1   179.82 209.82
+ age:lwt     1   179.84 209.84
+ age:ui      1   179.86 209.86
+ lwt:ht      1   179.94 209.94
+ lwt:ftv     2   178.25 210.25
+ ptd:ftv     2   178.53 210.53
+ smoke:ftv   2   178.64 210.64
+ race:smoke  2   178.73 210.73
+ age:race    2   178.84 210.84
+ ht:ftv      2   178.89 210.89
+ race:ui     2   179.13 211.13
+ ui:ftv      2   179.50 211.50
+ race:ht     2   179.52 211.52
+ lwt:race    2   179.68 211.68
+ race:ptd    2   179.86 211.86
- lwt         1   187.15 213.15
- ht          1   187.66 213.66
+ race:ftv    4   178.51 214.51
- ptd         1   188.83 214.83
- age:ftv     2   193.76 217.76

Step:  AIC= 207.07
low ~ age + lwt + smoke + ht + ui + ptd + ftv + age:ftv + smoke:ui

             Df Deviance    AIC
<none>            183.07 207.07
+ lwt:smoke   1   181.40 207.40
+ ui:ptd      1   181.88 207.88
+ ht:ptd      1   181.93 207.93
+ race        2   179.94 207.94


+ age:smoke   1   181.97 207.97
+ age:ht      1   182.64 208.64
+ age:ptd     1   182.69 208.69
+ lwt:ptd     1   182.73 208.73
+ lwt:ui      1   182.76 208.76
+ smoke:ptd   1   182.85 208.85
+ age:lwt     1   182.92 208.92
- smoke:ui    1   186.99 208.99
+ age:ui      1   182.99 208.99
+ smoke:ht    1   183.02 209.02
+ lwt:ht      1   183.06 209.06
+ smoke:ftv   2   181.48 209.48
+ lwt:ftv     2   181.69 209.69
+ ptd:ftv     2   181.85 209.85
+ ui:ftv      2   182.28 210.28
+ ht:ftv      2   182.41 210.41
- ht          1   191.21 213.21
- lwt         1   191.56 213.56
- ptd         1   193.59 215.59
- age:ftv     2   199.00 219.00

Summarize the model returned from the stepwise search
> summary(low.step)

Call:
glm(formula = low ~ age + lwt + smoke + ht + ui + ptd + ftv + age:ftv + smoke:ui, family = binomial)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.582389   1.420834  -0.410 0.681885    
age          0.075538   0.053945   1.400 0.161428    
lwt         -0.020372   0.007488  -2.721 0.006513 ** 
smoke1       0.780047   0.420043   1.857 0.063302 .  
ht1          2.065680   0.748330   2.760 0.005773 ** 
ui1          1.818496   0.666670   2.728 0.006377 ** 
ptd1         1.560304   0.496626   3.142 0.001679 ** 
ftv1         2.921068   2.284093   1.279 0.200941    
ftv2+        9.244460   2.650495   3.488 0.000487 ***
age:ftv1    -0.161823   0.096736  -1.673 0.094360 .  
age:ftv2+   -0.411011   0.118553  -3.467 0.000527 ***
smoke1:ui1  -1.916644   0.972366  -1.971 0.048711 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 234.67  on 188  degrees of freedom
Residual deviance: 183.07  on 177  degrees of freedom
AIC: 207.07

Number of Fisher Scoring iterations: 4

This is the model used to demonstrate model interpretation in the presence of interactions.

An alternative to the full-blown search above is to consider adding a single interaction term to the "Base Model", chosen from the set of all possible two-way interaction terms.


> add1(low.glm,scope=~.^2)
Single term additions

Model:
low ~ age + lwt + race + smoke + ht + ui + ptd + ftv
            Df Deviance    AIC
<none>           195.48 217.48
age:lwt      1   195.46 219.46
age:race     2   194.58 220.58
age:smoke    1   194.61 218.61
age:ht       1   194.59 218.59
age:ui       1   195.47 219.47
age:ptd      1   194.58 218.58
age:ftv      2   183.00 209.00 *
lwt:race     2   194.63 220.63
lwt:smoke    1   194.04 218.04
lwt:ht       1   195.44 219.44
lwt:ui       1   194.28 218.28
lwt:ptd      1   195.35 219.35
lwt:ftv      2   194.19 220.19
race:smoke   2   193.24 219.24
race:ht      2   195.19 221.19
race:ui      2   192.63 218.63
race:ptd     2   194.83 220.83
race:ftv     4   193.81 223.81
smoke:ht     1   195.03 219.03
smoke:ui     1   193.76 217.76
smoke:ptd    1   195.16 219.16
smoke:ftv    2   194.47 220.47
ht:ui        0   195.48 217.48
ht:ptd       1   194.55 218.55
ht:ftv       2   194.00 220.00
ui:ptd       1   194.24 218.24
ui:ftv       2   195.32 221.32
ptd:ftv      2   192.38 218.38

We can then "manually" enter this term into our base model by using the update command in R.
> low.glm2 <- update(low.glm,.~.+age:ftv)
> summary(low.glm2)

Call:
glm(formula = low ~ age + lwt + race + smoke + ht + ui + ptd + ftv + age:ftv, family = binomial)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.0338  -0.7690  -0.4510   0.8354   2.3383  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)   
(Intercept) -1.636485   1.558677  -1.050  0.29376   
age          0.085461   0.055734   1.533  0.12519   
lwt         -0.017599   0.007653  -2.300  0.02147 * 
race2        0.994134   0.550962   1.804  0.07118 . 
race3        0.700669   0.491400   1.426  0.15391   


smoke1       0.792972   0.452303   1.753  0.07957 . 
ht1          1.936204   0.747576   2.590  0.00960 **
ui1          0.938620   0.492240   1.907  0.05654 . 
ptd1         1.373390   0.495738   2.770  0.00560 **
ftv1         2.877889   2.253710   1.277  0.20162   
ftv2+        8.264965   2.594444   3.186  0.00144 **
age:ftv1    -0.149619   0.096342  -1.553  0.12043   
age:ftv2+   -0.359454   0.115429  -3.114  0.00185 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 234.67  on 188  degrees of freedom
Residual deviance: 183.00  on 176  degrees of freedom
AIC: 209

Number of Fisher Scoring iterations: 4

Next we could use add1 to consider the remaining interaction terms for addition to this model.
> add1(low.glm2,scope=~.^2)
Single term additions
Model:
low ~ age + lwt + race + smoke + ht + ui + ptd + ftv + age:ftv
            Df Deviance    AIC
<none>           183.00 209.00
age:lwt      1   183.00 211.00
age:race     2   181.62 211.62
age:smoke    1   182.16 210.16
age:ht       1   182.90 210.90
age:ui       1   182.96 210.96
age:ptd      1   182.50 210.50
lwt:race     2   182.55 212.55
lwt:smoke    1   180.89 208.89 *
lwt:ht       1   183.00 211.00
lwt:ui       1   181.90 209.90
lwt:ptd      1   182.75 210.75
lwt:ftv      2   181.44 211.44
race:smoke   2   181.23 211.23
race:ht      2   182.78 212.78
race:ui      2   180.32 210.32
race:ptd     2   182.85 212.85
race:ftv     4   181.69 215.69
smoke:ht     1   182.71 210.71
smoke:ui     1   179.94 207.94 **
smoke:ptd    1   182.82 210.82
smoke:ftv    2   181.65 211.65
ht:ui        0   183.00 209.00
ht:ptd       1   182.06 210.06
ht:ftv       2   181.82 211.82
ui:ptd       1   181.42 209.42
ui:ftv       2   182.94 212.94
ptd:ftv      2   181.57 211.57


Motivating Example: Recumbent Cows
Clark, R. G., Henderson, H. V., Hoggard, G. K., Ellison, R. S., and Young, B. J. (1987). "The ability of biochemical and haematological tests to predict recovery in periparturient recumbent cows." NZ Veterinary Journal, 35, 126-133.

Study Description:
For unknown reasons, many pregnant dairy cows become recumbent (they lie down) either shortly before or after calving. This condition can be serious and may lead to the death of the cow. These data are from a study of blood samples of over 500 cows studied at the Ruakura (N.Z.) Animal Health Laboratory during 1983-84. A variety of blood tests were performed, and for many of the animals the outcome (survived, died, or animal was killed) was determined. The goal is to see if survival can be predicted from the blood measurements. Case numbers 12607 and 11630 were noted as having exceptional care, and they survived.

Name      Type     n    Info
AST       Variate  429  serum aspartate amino transferase (U/l at 30C)
Calving   Variate  431  0 if measured before calving, 1 if after
CK        Variate  413  serum creatine phosphokinase (U/l at 30C)
Daysrec   Variate  432  days recumbent
Inflamat  Variate  136  inflammation: 0 = no, 1 = yes
Myopathy  Variate  222  muscle disorder: 1 if present, 0 if absent
Outcome   Variate  435  outcome: 1 if survived, 0 if died or killed (response)
PCV       Variate  175  packed cell volume (haematocrit), %
Urea      Variate  266  serum urea (mmol/l)
CaseNo    Text     435  case number

Because calving, inflammation, and myopathy are dichotomous (Bernoulli) predictors they will not be transformed, although we might consider potential interactions involving these predictors. However, we will not consider inflammation and myopathy, as most of the cows have that information missing.

Guidelines for Transforming Predictors in Logistic Regression

Examine univariate conditional density plots f(xi | y) for the continuous predictors (Cook & Weisberg).

Consider

f(x | y) = the conditional density of x given the outcome variable y, where y = 1 if success and y = 0 if failure.

Idea:


Univariate considerations

f(x | y)                                           Suggested model terms
Normal, common variance                            x
  (i.e. Var(x | y=0) = Var(x | y=1))
Normal, unequal variances                          x and x^2
  (i.e. Var(x | y=0) ≠ Var(x | y=1))
Skewed right                                       x and log2(x)  (base 2 is easier to interpret)
x in [0, 1]                                        log(x), log(1-x)
x is dichotomous (Bernoulli)                       x
x ~ Poisson, i.e. x is a count                     x

Multivariate considerations
When considering multiple continuous predictors simultaneously we look at multivariate normality.

If f(x | y) ~ MVN(μ_{y=k}, Σ), i.e. the mean vector depends on the outcome group but the covariance matrix is common, then use the x's themselves.

If f(x | y) ~ MVN(μ_{y=k}, Σ_{y=k}), i.e. both the mean vector and the covariance matrix depend on the outcome group, then also include xi^2 and xi xj terms.

For example, in the two predictor case (p = 2), the x1 x2 interaction term is needed if E(x1 | x2, y=k) = β0,y=k + β1,y=k x2, i.e. if the regression of x1 on x2 changes across the levels of y; and if the variances of the xi differ across the levels of y, then we add the xi^2 terms as well.
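A quick numerical companion to these guidelines is to compare the conditional means and variances of a continuous predictor across the two outcome groups. A minimal sketch in R, assuming x is a continuous predictor and y the 0/1 outcome (hypothetical names):

> tapply(x, y, mean, na.rm = TRUE)        # conditional means
> tapply(x, y, var, na.rm = TRUE)         # roughly equal variances -> x only; unequal -> add x^2
> tapply(log2(x), y, var, na.rm = TRUE)   # re-check after a log2 transform for right-skewed x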

AST


Clearly AST has a right-skewed distribution, so using log2(AST) in the model is recommended. After transformation we have

f(log2(AST) | Outcome) appears to be approximately normal for both outcome levels with a constant variance, so quadratic terms in the log scale are not suggested.

CK


Clearly CK is extremely right skewed and would benefit from a log transformation.

Again the conditional densities appear approximately normal with equal variance, so we will consider adding only log2(CK) to the model.

PCV


f(PCV | Outcome) is approximately normal for both outcome groups, but the variation in PCV levels appears to be higher for cows that survived. Thus we will consider both PCV and PCV^2 terms in the model.

Daysrec

Despite the fact that Daysrec is right skewed, we will not log transform it. It represents a count of the number of days the cow was recumbent, so it could be modeled as a Poisson count, and thus the only term recommended is Daysrec itself.

Urea


Consider the log transformation of the urea level.

f(log2(Urea) | Outcome) is approximately normal; however, the variation for cows that survived appears larger, so we will consider both log2(Urea) and log2(Urea)^2 terms.


Data set = Downer, Name of Fit = B2
372 cases are missing at least one value. (PCV has lots of missing values also)
Binomial Regression
Kernel mean function = Logistic
Response = Outcome
Terms    = (AST log2[AST] CK log2[CK] Urea log2[Urea] log2[Urea]^2 PCV PCV^2 Daysrec Calving)
Trials   = Ones
Coefficient Estimates
Label         Estimate      Std. Error   Est/SE   p-value
Constant      -1.03935      6.35298      -0.164   0.8700
AST           -0.000720027  0.00242524   -0.297   0.7666
log2[AST]     -0.330179     0.554239     -0.596   0.5514
CK            -0.000109772  0.000135315  -0.811   0.4172
log2[CK]      -0.0121434    0.223648     -0.054   0.9567
Urea          -1.13453      1.05860      -1.072   0.2838
log2[Urea]     0.730468     2.89371       0.252   0.8007
log2[Urea]^2   0.660165     1.38757       0.476   0.6342
PCV            0.182480     0.224691      0.812   0.4167
PCV^2         -0.00165620   0.00325722   -0.508   0.6111
Daysrec       -0.391937     0.157490     -2.489   0.0128
Calving        1.28561      0.648089      1.984   0.0473

Scale factor:          1.
Number of cases:       435
Number of cases used:  165
Degrees of freedom:    153
Pearson X2:            127.410
Deviance:              141.988

Clearly we have some model reduction to do, as many of the current terms are not significant. Before backward elimination we will drop the untransformed versions of the log-scale predictors.

Coefficient Estimates
Label         Estimate     Std. Error  Est/SE   p-value
Constant      -3.82598     5.84498     -0.655   0.5127
log2[AST]     -0.554005    0.293416    -1.888   0.0590
log2[CK]      -0.118575    0.160536    -0.739   0.4601
log2[Urea]     4.09939     3.12355      1.312   0.1894
log2[Urea]^2  -0.978895    0.545929    -1.793   0.0730
PCV            0.218085    0.213730     1.020   0.3075
PCV^2         -0.00229912  0.00305947  -0.751   0.4524
Daysrec       -0.383179    0.153758    -2.492   0.0127
Calving        1.39322     0.647605     2.151   0.0314

Scale factor:          1.
Number of cases:       435
Number of cases used:  165
Degrees of freedom:    156
Pearson X2:            134.154
Deviance:              145.123

Backward Elimination: Sequentially remove terms that give the smallest change in AIC.
All fits include an intercept.


Current terms: (log2[AST] log2[CK] log2[Urea] log2[Urea]^2 PCV PCV^2 Daysrec Calving)
                       df  Deviance  Pearson X2 |  k      AIC
Delete: log2[CK]      157   145.671     134.797 |  8  161.671
Delete: PCV^2         157   145.786     134.995 |  8  161.786
Delete: PCV           157   146.392     135.415 |  8  162.392
Delete: log2[Urea]    157   148.141     140.787 |  8  164.141
Delete: log2[AST]     157   148.92      140.737 |  8  164.920
Delete: Calving       157   150.163     141.672 |  8  166.163
Delete: Daysrec       157   151.993     135.976 |  8  167.993
Delete: log2[Urea]^2  157   152.536     143.299 |  8  168.536

Current terms: (log2[AST] log2[Urea] log2[Urea]^2 PCV PCV^2 Daysrec Calving)
                       df  Deviance  Pearson X2 |  k      AIC
Delete: PCV^2         158   146.202     135.813 |  7  160.202 *
Delete: PCV           158   146.701     136.211 |  7  160.701
Delete: log2[Urea]    158   149.035     142.035 |  7  163.035
Delete: Calving       158   151.207     140.587 |  7  165.207
Delete: Daysrec       158   152.168     136.078 |  7  166.168
Delete: log2[Urea]^2  158   153.767     145.12  |  7  167.767
Delete: log2[AST]     158   161.383     144.17  |  7  175.383

Current terms: (log2[AST] log2[Urea] log2[Urea]^2 PCV Daysrec Calving)
                       df  Deviance  Pearson X2 |  k      AIC
Delete: PCV           159   148.955     137.789 |  6  160.955
Delete: log2[Urea]    159   150.035     144.626 |  6  162.035
Delete: Calving       159   152.176     141.179 |  6  164.176
Delete: Daysrec       159   152.699     136.298 |  6  164.699
Delete: log2[Urea]^2  159   155.31      149.108 |  6  167.310
Delete: log2[AST]     159   163.059     140.738 |  6  175.059

Current terms: (log2[AST] log2[Urea] log2[Urea]^2 Daysrec Calving)
                       df  Deviance  Pearson X2 |  k      AIC
Delete: log2[Urea]    160   152.373     144.523 |  5  162.373
Delete: Daysrec       160   155.744     138.388 |  5  165.744
Delete: Calving       160   155.99      142.871 |  5  165.990
Delete: log2[Urea]^2  160   157.017     148.417 |  5  167.017
Delete: log2[AST]     160   164.785     143.03  |  5  174.785

Current terms: (log2[AST] log2[Urea]^2 Daysrec Calving)
                       df  Deviance  Pearson X2 |  k      AIC
Delete: Calving       161   160.932     150.399 |  4  168.932
Delete: Daysrec       161   162.036     146.037 |  4  170.036
Delete: log2[AST]     161   169.755     148.817 |  4  177.755
Delete: log2[Urea]^2  161   176.794     157.24  |  4  184.794

Current terms: (log2[AST] log2[Urea]^2 Daysrec)
                       df  Deviance  Pearson X2 |  k      AIC
Delete: Daysrec       162   167.184     150.961 |  3  173.184
Delete: log2[AST]     162   178.021     150.618 |  3  184.021
Delete: log2[Urea]^2  162   181.641     162.028 |  3  187.641

Current terms: (log2[AST] log2[Urea]^2)
                       df  Deviance  Pearson X2 |  k      AIC
Delete: log2[Urea]^2  163   182.688     162.386 |  2  186.688
Delete: log2[AST]     163   192.479     151.943 |  2  196.479


Forward selection suggests the same model.

"Final" Model
Data set = Downer, Name of Fit = B5
372 cases are missing at least one value.
Binomial Regression
Kernel mean function = Logistic
Response = Outcome
Terms    = (log2[AST] log2[Urea] log2[Urea]^2 PCV Daysrec Calving)
Trials   = Ones
Coefficient Estimates
Label         Estimate   Std. Error  Est/SE   p-value
Constant      -1.12404   5.01853     -0.224   0.8228
log2[AST]     -0.733670  0.196044    -3.742   0.0002
log2[Urea]     4.44950   3.17044      1.403   0.1605
log2[Urea]^2  -1.05918   0.554282    -1.911   0.0560
PCV            0.0514512 0.0335256    1.535   0.1249
Daysrec       -0.386695  0.153067    -2.526   0.0115
Calving        1.44641   0.623820     2.319   0.0204

Scale factor:          1.
Number of cases:       435
Number of cases used:  170
Degrees of freedom:    163
Pearson X2:            138.509
Deviance:              148.269

Diagnostics and Model Checking Plots

Chi-residuals vs. estimated logits ~ Looks good.


Cook’s Distance and Leverage vs. Case Numbers

Model Checking Plots (Estimated Logits and Marginals)

LOGIT


AST

UREA


PCV

DAYSREC

All of these plots look OK. The largest departure observed is in the case of urea, but the discrepancy there is primarily due to one observation that stands out from the rest.
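The case-influence diagnostics shown above come from Arc, but analogous quantities are available in base R for any fitted glm. A minimal sketch, assuming a fitted binomial model called fit (a hypothetical name; the Downer fits in R appear below):

> plot(cooks.distance(fit), type = "h", ylab = "Cook's Distance")   # influence by case number
> plot(hatvalues(fit), type = "h", ylab = "Leverage")               # leverage by case number
> chi.resid <- residuals(fit, type = "pearson")                     # chi (Pearson) residuals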


In R

To replicate the analysis above in R you will need the following functions to look at the conditional densities

f(x | y=0) and f(x | y=1).

The first two functions are used to make pretty histograms inside the conplot function. The function conplot takes the predictor X and the outcome Y as arguments and overlays density estimates of X fit separately within each level of Y. If there are missing values on either the response or the predictor, those cases are automatically removed before constructing the plot.

nclass.FD = function (x) {
    r <- quantile(x, c(0.25, 0.75))
    names(r) <- NULL
    h <- 2 * (r[2] - r[1]) * length(x)^{-1/3}
    ceiling(diff(range(x))/h)
}

bandwidth.nrd = function (x) {
    r <- quantile(x, c(0.25, 0.75))
    h <- (r[2] - r[1])/1.34
    4 * 1.06 * min(sqrt(var(x)), h) * length(x)^(-1/5)
}

conplot = function (x, xname = deparse(substitute(x)), y) {
    xname <- deparse(substitute(x))
    data = na.omit(cbind(x, y))
    x = data[, 1]
    y = as.numeric(data[, 2])
    lev = unique(y)
    par(err = -1)
    dens0 <- density(x[y == 0], width = bandwidth.nrd(x[y == lev[1]]))
    dens1 <- density(x[y == 1], width = bandwidth.nrd(x[y == lev[2]]))
    ylim <- range(c(dens0$y, dens1$y))
    xlim <- range(c(dens0$x, dens1$x))
    hist(x, nclass.FD(x), prob = T, xlab = xname, xlim = xlim, ylim = ylim,
         main = paste("Conditional X|Y Plot of ", xname))
    lines(dens0, col = "blue")
    lines(dens1, col = "red")
    invisible()
}


> conplot(x=AST,y=Outcome)

> conplot(x=log(AST),y=Outcome)

Etc…

To obtain model checking plots in R you will need to install the car package from CRAN, which is essentially a collection of functions that replicate Arc in R. The two functions that create model checking plots in the car library are called mmp and mmps; the latter creates model checking plots for each predictor as well as for the overall fit.
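A minimal sketch of getting the package and producing the plots (the model name fit is a placeholder; the actual Downer fits follow below):

> install.packages("car")   # one-time install from CRAN
> library(car)
> mmps(fit)                 # marginal model plots for each predictor and for the overall fit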


Downer Example in R
> mod1 = glm(Outcome~AST+Urea+PCV+Calving+Daysrec+CK,family="binomial")
> summary(mod1)

Call:
glm(formula = Outcome ~ AST + Urea + PCV + Calving + Daysrec + CK, family = "binomial")

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.7678  -0.7541  -0.1928   0.7546   2.0696  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.3313771  1.1987644   0.276  0.78222    
AST         -0.0022405  0.0014726  -1.521  0.12815    
Urea        -0.3140380  0.0770497  -4.076 4.59e-05 ***
PCV          0.0601745  0.0339726   1.771  0.07652 .  
Calving      1.3192777  0.6238318   2.115  0.03445 *  
Daysrec     -0.4804961  0.1498000  -3.208  0.00134 ** 
CK          -0.0001435  0.0001121  -1.280  0.20068    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 212.71  on 164  degrees of freedom
Residual deviance: 146.39  on 158  degrees of freedom
  (270 observations deleted due to missingness)
AIC: 160.39

Number of Fisher Scoring iterations: 7

> mmps(mod1)


Using the same approach as in the Arc analysis, we might fit the model with several terms based on the predictor transformations.

> logAST = log2(AST)
> logCK = log2(CK)
> logUrea = log2(Urea)
> logUrea2 = logUrea^2
> PCV2 = PCV^2

> Downer2 = data.frame(Outcome,logAST,logCK,logUrea,logUrea2,PCV,PCV2,Daysrec,Calving)
> Downer2 = na.omit(Downer2)
> attach(Downer2)

> mod2 = glm(Outcome~logAST+logCK+logUrea+logUrea2+PCV+PCV2+Daysrec+Calving,
             family="binomial",data=Downer2)
> summary(mod2)

Call:
glm(formula = Outcome ~ logAST + logCK + logUrea + logUrea2 + PCV + PCV2 + Daysrec + Calving, 
    family = "binomial", data = Downer2)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.9522  -0.7094  -0.2869   0.7109   2.0585  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)  
(Intercept) -3.826289   5.856671  -0.653   0.5135  
logAST      -0.554007   0.293529  -1.887   0.0591 .
logCK       -0.118574   0.160621  -0.738   0.4604  
logUrea      4.099642   3.137295   1.307   0.1913  
logUrea2    -0.978940   0.548487  -1.785   0.0743 .
PCV          0.218083   0.213771   1.020   0.3076  
PCV2        -0.002299   0.003060  -0.751   0.4525  


Daysrec     -0.383178   0.153795  -2.491   0.0127 *
Calving      1.393222   0.647759   2.151   0.0315 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 212.71  on 164  degrees of freedom
Residual deviance: 145.12  on 156  degrees of freedom
AIC: 163.12

Number of Fisher Scoring iterations: 7

Backward elimination using the step() function

> mod3 = step(mod2)
Start:  AIC=163.12
Outcome ~ logAST + logCK + logUrea + logUrea2 + PCV + PCV2 + Daysrec + Calving

           Df Deviance    AIC
- logCK     1   145.67 161.67
- PCV2      1   145.79 161.79
- PCV       1   146.39 162.39
<none>          145.12 163.12
- logUrea   1   148.14 164.14
- logAST    1   148.92 164.92
- Calving   1   150.16 166.16
- Daysrec   1   151.99 167.99
- logUrea2  1   152.54 168.54

Step:  AIC=161.67
Outcome ~ logAST + logUrea + logUrea2 + PCV + PCV2 + Daysrec + Calving

           Df Deviance    AIC
- PCV2      1   146.20 160.20
- PCV       1   146.70 160.70
<none>          145.67 161.67
- logUrea   1   149.03 163.03
- Calving   1   151.21 165.21
- Daysrec   1   152.17 166.17
- logUrea2  1   153.77 167.77
- logAST    1   161.38 175.38

Step:  AIC=160.2
Outcome ~ logAST + logUrea + logUrea2 + PCV + Daysrec + Calving

Df Deviance AIC


<none>          146.20 160.20
- PCV       1   148.96 160.96
- logUrea   1   150.03 162.03
- Calving   1   152.18 164.18
- Daysrec   1   152.70 164.70
- logUrea2  1   155.31 167.31
- logAST    1   163.06 175.06

> summary(mod3)

Call:
glm(formula = Outcome ~ logAST + logUrea + logUrea2 + PCV + Daysrec + Calving, 
    family = "binomial", data = Downer2)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.0329  -0.6836  -0.2644   0.7002   2.0893  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.49051    5.04944  -0.295 0.767853    
logAST      -0.72986    0.19514  -3.740 0.000184 ***
logUrea      4.61037    3.19802   1.442 0.149406    
logUrea2    -1.08728    0.55899  -1.945 0.051768 .  
PCV          0.05489    0.03370   1.629 0.103394    
Daysrec     -0.37191    0.15422  -2.411 0.015888 *  
Calving      1.45572    0.62845   2.316 0.020538 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 212.71  on 164  degrees of freedom
Residual deviance: 146.20  on 158  degrees of freedom
AIC: 160.2

Number of Fisher Scoring iterations: 7

> mmps(mod3)


We could consider adding interaction terms to our “final” model. This is easily done using the scope option.

> mod3 = step(mod2,scope=~.^2)
> summary(mod3)

Call:
glm(formula = Outcome ~ logAST + logUrea + logUrea2 + PCV + PCV2 + Daysrec + Calving + 
    PCV:Calving + logAST:PCV2 + logAST:PCV, family = "binomial", data = Downer2)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.8712  -0.6639  -0.1012   0.6784   2.5298  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)  
(Intercept)  44.950653  39.720672   1.132   0.2578  
logAST       -9.472551   5.720701  -1.656   0.0978 .


logUrea       6.107698   3.425871   1.783   0.0746 .
logUrea2     -1.315449   0.602659  -2.183   0.0291 *
PCV          -3.814468   2.386123  -1.599   0.1099  
PCV2          0.069737   0.036368   1.918   0.0552 .
Daysrec      -0.388998   0.162716  -2.391   0.0168 *
Calving      12.376954   6.037254   2.050   0.0404 *
PCV:Calving  -0.322466   0.168853  -1.910   0.0562 .
logAST:PCV2  -0.010067   0.005179  -1.944   0.0519 .
logAST:PCV    0.608959   0.344917   1.766   0.0775 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 212.71  on 164  degrees of freedom
Residual deviance: 134.15  on 154  degrees of freedom
AIC: 156.15

Number of Fisher Scoring iterations: 7

> mmps(mod3)


There is a slight improvement in fit. The same model fit in JMP produces the following ROC curve. The resulting classification is very good using this model.


MORE EXAMPLES OF LOGISTIC REGRESSION

Example 8.1 - Classification of Credit Card Defaults
In this example, we seek to develop classification models to predict which customers will default on their credit card debt. The data frame is called Default and is in the ISLR library. The variables in the data frame are summarized below:

> summary(Default)

 default    student       balance           income     
 No :9667   No :7056   Min.   :   0.0   Min.   :  772  
 Yes: 333   Yes:2944   1st Qu.: 481.7   1st Qu.:21340  
                       Median : 823.6   Median :34553  
                       Mean   : 835.4   Mean   :33517  
                       3rd Qu.:1166.3   3rd Qu.:43808  
                       Max.   :2654.3   Max.   :73554  

The response is default = 1 if default = Yes, and 0 if default = No. The predictors are the customer's student status (Yes or No), the average credit card balance ($) after making their monthly payment, and the customer's annual income ($).
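To follow along, the data frame can be loaded from the ISLR package; a minimal sketch (the attach() call simply mirrors how variables such as student and balance are referenced by name below):

> # install.packages("ISLR")   # one-time install, if needed
> library(ISLR)
> data(Default)
> attach(Default)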

> def.glm1 = glm(default~student,data=Default,family="binomial")
> summary(def.glm1)

Call:
glm(formula = default ~ student, family = "binomial", data = Default)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.2970  -0.2970  -0.2434  -0.2434   2.6585  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -3.50413    0.07071  -49.55  < 2e-16 ***
studentYes   0.40489    0.11502    3.52 0.000431 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2920.6  on 9999  degrees of freedom
Residual deviance: 2908.7  on 9998  degrees of freedom
AIC: 2912.7

Number of Fisher Scoring iterations: 6

> logits = predict(def.glm1,type="link")
> Pdefault = 1/(1+exp(-logits))
> table(Pdefault)

Pdefault
0.0291950113382457 0.0431385869565177 
              7056               2944 


> table(student,default)
       default
student   No  Yes
    No  6850  206
    Yes 2817  127

> mosaicplot(~student+default,color=3:5,main="Mosaic Plot of Defaults vs. Student Status")

> def.glm2 = glm(default~.,data=Default,family="binomial")
> summary(def.glm2)

Call:
glm(formula = default ~ ., family = "binomial", data = Default)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-2.469  -0.142  -0.056  -0.020   3.738  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.09e+01   4.92e-01  -22.08   <2e-16 ***
studentYes  -6.47e-01   2.36e-01   -2.74   0.0062 ** 
balance      5.74e-03   2.32e-04   24.74   <2e-16 ***
income       3.03e-06   8.20e-06    0.37   0.7115    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2920.6  on 9999  degrees of freedom
Residual deviance: 1571.5  on 9996  degrees of freedom
AIC: 1580

Number of Fisher Scoring iterations: 8

> PrDefault = function(balance,student){
      # logit from the fitted model def.glm2, with income fixed at its mean (33517)
      L = -10.9 - .647*student + .00574*balance + .00000303*33517
      1/(1+exp(-L))
  }


> plot(balance,PrDefault(balance,student=1),col=5,ylab="P(Default|X)",pch=19)
> points(balance,PrDefault(balance,student=0),col=3,pch=20)
> legend(250,.8,c("Student","Non-student"),col=c(3,5),pch=c(19,20))

> par(mfrow=c(1,2))
> boxplot(split(balance,student),col=c(3:5))
> boxplot(split(income,student),col=c(3:5))


> def.glm3 = glm(default~.^2,data=Default,family="binomial")
> summary(def.glm3)

Call:
glm(formula = default ~ .^2, family = "binomial", data = Default)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-2.485  -0.142  -0.055  -0.020   3.758  

Coefficients:
                    Estimate Std. Error z value Pr(>|z|)    
(Intercept)        -1.10e+01   1.87e+00   -5.91  3.3e-09 ***
studentYes         -5.20e-01   1.34e+00   -0.39     0.70    
balance             5.88e-03   1.18e-03    4.98  6.3e-07 ***
income              4.05e-06   4.46e-05    0.09     0.93    
studentYes:balance -2.55e-04   7.90e-04   -0.32     0.75    
studentYes:income   1.45e-05   2.78e-05    0.52     0.60    
balance:income     -1.58e-09   2.82e-08   -0.06     0.96    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2920.6  on 9999  degrees of freedom
Residual deviance: 1571.1  on 9993  degrees of freedom
AIC: 1585

Number of Fisher Scoring iterations: 8

> def.step = step(def.glm3)
Start:  AIC=1585
default ~ (student + balance + income)^2

                  Df Deviance  AIC
- balance:income   1     1571 1583
- student:balance  1     1571 1583
- student:income   1     1571 1583
<none>                   1571 1585

Step:  AIC=1583
default ~ student + balance + income + student:balance + student:income

                  Df Deviance  AIC
- student:balance  1     1571 1581
- student:income   1     1571 1581
<none>                   1571 1583

Step:  AIC=1581
default ~ student + balance + income + student:income

                  Df Deviance  AIC
- student:income   1     1572 1580
<none>                   1571 1581
- balance          1     2907 2915

Step:  AIC=1580
default ~ student + balance + income

Df Deviance AIC

- income 1 1572 1578


<none>        1572 1580
- student  1  1579 1585
- balance  1  2907 2913

Step:  AIC=1578
default ~ student + balance

           Df Deviance  AIC
<none>           1572 1578
- student   1    1596 1600
- balance   1    2909 2913

> summary(def.step)

Call:
glm(formula = default ~ student + balance, family = "binomial", data = Default)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-2.458  -0.142  -0.056  -0.020   3.743  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.07e+01   3.69e-01  -29.12  < 2e-16 ***
studentYes  -7.15e-01   1.48e-01   -4.85  1.3e-06 ***
balance      5.74e-03   2.32e-04   24.75  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2920.6  on 9999  degrees of freedom
Residual deviance: 1571.7  on 9997  degrees of freedom
AIC: 1578

Number of Fisher Scoring iterations: 8

The package ROCR contains functions to examine the classification performance of any model that returns predicted probabilities of class membership. Given the estimated probabilities, we first run the prediction function to pair the predicted probabilities with the actual class memberships for the training data. We can then compute various performance measures and plot them using the performance function. The ROC curve is obtained by running performance with the true positive rate ("tpr") and false positive rate ("fpr") as arguments and plotting the result. The area under the curve requires another call to performance with "auc" as the performance measure. This process is demonstrated below for our simple model to classify credit card defaulters.


> library(ROCR)
> PrDefault = fitted(def.step)
> pred = prediction(PrDefault,default)
> perf = performance(pred,"tpr","fpr")
> plot(perf,main="ROC Curve for Credit Card Default")
> performance(pred,"auc")

AUC = .9495
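The call performance(pred,"auc") returns an S4 object; the numeric area under the curve is stored in its y.values slot. A minimal sketch of extracting it:

> auc.obj = performance(pred,"auc")
> unlist(auc.obj@y.values)      # area under the ROC curve as a single number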

Example 8.2 - Classification of Real vs. Forged Swiss Francs

> names(Swiss)
[1] "id"     "leng"   "left"   "right"  "bottom" "top"    "diagon" "genu"  
> Swiss = Swiss[,-1]
> attach(Swiss)
> pairs(Swiss[,-7],panel=function(x,y){
      points(x[genu==1],y[genu==1],pch="+",col="blue")
      points(x[genu==0],y[genu==0],pch="o",col="red")
  })



> pairs.image(Swiss[,-7],cont=T)


> pairs.persp(Swiss[,-7])

Two predictor model with only linear terms
> rb.sim = glm(genu~right+bottom,data=Swiss,family="binomial")
> right.seq = seq(min(right),max(right),length=100)
> bottom.seq = seq(min(bottom),max(bottom),length=100)
> rb.grid = expand.grid(right=right.seq,bottom=bottom.seq)
> PrGenu = predict(rb.sim,newdata=rb.grid,"response")
> plot(right,bottom,xlab="Right (mm)",ylab="Bottom (mm)",type="n")
> points(right,bottom,col=as.numeric(genu)+3,pch=17+as.numeric(genu))
> z = matrix(PrGenu,100,100)


> contour(right.seq,bottom.seq,z,add=T,levels=.5,lty=1,lwd=2)

Two predictor model with non-linear terms
> rb.glm = glm(genu~poly(right,2)+poly(bottom,2)+right:bottom,data=Swiss,family="binomial")
> summary(rb.glm)

Call:
glm(formula = genu ~ poly(right, 2) + poly(bottom, 2) + right:bottom, 
    family = "binomial", data = Swiss)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.1890  -0.0486   0.0024   0.1998   2.7819  

Coefficients:
                 Estimate Std. Error z value Pr(>|z|)    
(Intercept)      -1855.43    1370.41   -1.35  0.17576    
poly(right, 2)1   -106.44      59.52   -1.79  0.07373 .  
poly(right, 2)2      9.81       5.06    1.94  0.05230 .  
poly(bottom, 2)1 -4110.61    2973.80   -1.38  0.16689    
poly(bottom, 2)2   -43.92      13.15   -3.34  0.00084 ***
right:bottom         1.51       1.12    1.35  0.17637    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 277.259  on 199  degrees of freedom
Residual deviance:  76.157  on 194  degrees of freedom
AIC: 88.16

Number of Fisher Scoring iterations: 8

> PrGenu = fitted(rb.glm,type="response")
> table(PrGenu>.5,genu)
        genu
           0   1
  FALSE   90   8
  TRUE    10  92

> PrGenu = predict(rb.glm,newdata=rb.grid,"response")
> plot(right,bottom,xlab="Right (mm)",ylab="Bottom (mm)",type="n")


> points(right,bottom,col=as.numeric(genu)+3,pch=17+as.numeric(genu))
> z = matrix(PrGenu,100,100)
> contour(right.seq,bottom.seq,z,add=T)      # adds contours of P(genu=1)

> plot(right,bottom,xlab="Right (mm)",ylab="Bottom (mm)",type="n")
> points(right,bottom,col=as.numeric(genu)+3,pch=17+as.numeric(genu))

Add the decision boundary for the rule: classify as a real Swiss Franc if P(genu=1) > .50
> contour(right.seq,bottom.seq,z,add=T,levels=0.5,lty=1,lwd=2)

Building a logistic model using all available bill dimensions

> swiss.glm = glm(genu~.,data=Swiss,family="binomial")

Warning messages:
1: glm.fit: algorithm did not converge 
2: glm.fit: fitted probabilities numerically 0 or 1 occurred 


Logistic regression will become unstable if the estimated probabilities are near 0 and/or 1. For these data, this is precisely what happens. Despite this instability, the model produces a nearly perfect classification of the Swiss francs in the training data, with an overall misclassification rate of .015, or 1.5%.

> table(PrGenu>.5,genu)
        genu
           0   1
  FALSE   99   2
  TRUE     1  98
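One way to see this instability numerically (a quick sketch using the swiss.glm fit above) is to look at how extreme the fitted probabilities are; values piling up at essentially 0 and 1 signal the separation that causes the warnings:

> summary(fitted(swiss.glm))                                   # fitted probabilities pile up near 0 and 1
> mean(fitted(swiss.glm) < 1e-6 | fitted(swiss.glm) > 1-1e-6)  # proportion at the numerical extremes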

In cases where this instability occurs, both ridge and Lasso logistic regression are good options. They are also useful when you have a "wide data" problem where n < p, when p is large, or when you have some highly correlated predictors.

Both regularized methods use the usual logistic model for the log-odds,

     ln( θ(x) / (1 - θ(x)) ) = ηo + Σ (j = 1 to k) ηj uj ,

but estimate the coefficients by maximizing a penalized log-likelihood:

Ridge logistic regression:  maximize  log L(ηo, η1, ..., ηk) - λ Σ (j = 1 to k) ηj^2

Lasso logistic regression:  maximize  log L(ηo, η1, ..., ηk) - λ Σ (j = 1 to k) |ηj|

In both cases λ ≥ 0 controls the amount of shrinkage toward 0 and is typically chosen by cross-validation (see cv.glmnet below).

We now consider fitting both a ridge and Lasso logistic regression to the Swiss Franc data.

library(glmnet)
X = model.matrix(genu~.,data=Swiss[,-1])[,-1]
y = Swiss$genu
forg.ridge = glmnet(X,y,alpha=0,family="binomial")
forg.lasso = glmnet(X,y,alpha=1,family="binomial")
ridge.cv = cv.glmnet(X,y,alpha=0,family="binomial")
lasso.cv = cv.glmnet(X,y,alpha=1,family="binomial")
plot(ridge.cv)

ridge.lam = ridge.cv$lambda.min
ridge.lam
[1] 0.04476108


plot(lasso.cv)

lasso.lam = lasso.cv$lambda.min
lasso.lam
[1] 0.001849533

ypred.ridge = predict(forg.ridge,newx=X,s=ridge.lam,type="response")
ypred.lasso = predict(forg.lasso,newx=X,s=lasso.lam,type="response")

table(ypred.ridge>.5,y)
        y
           0   1
  FALSE  100   1
  TRUE     0  99

table(ypred.lasso>.5,y)
        y
           0   1
  FALSE  100   0
  TRUE     0 100


Cross-validation of a Classification from GLM Models (non-regularized or regularized, i.e. ridge and Lasso)

log.cv = function (fit, B = 50, p = .67, pcut = 0.5) {
    cv <- rep(0, B)
    data = fit$data
    y = fit$y
    n = dim(data)[1]
    k = floor(n*p)
    for (i in 1:B) {
        sam <- sample(1:n, k, replace = F)
        fit2 <- glm(formula(fit), data = data[sam,], family = "binomial")
        phat <- predict(fit2, newdata = data[-sam,], type = "response")
        predclass <- phat > pcut
        tab <- table(predclass, y[-sam])
        mc <- (n-k) - sum(diag(tab))
        cv[i] <- mc/(n-k)
    }
    cv
}

It should not be hard to modify this code to handle the ridge and Lasso glmnet() models as well.

glmnetlog.cv = function (X, y, s = .10, alpha = 0, B = 50, p = .67, pcut = 0.5) {
    cv <- rep(0, B)
    n = length(y)
    k = floor(n*p)
    for (i in 1:B) {
        sam <- sample(1:n, k, replace = F)
        fit2 <- glmnet(X[sam,], y[sam], alpha = alpha, family = "binomial")
        phat <- predict(fit2, newx = X[-sam,], type = "response", s = s)
        predclass <- phat > pcut
        tab <- table(predclass, y[-sam])
        mc <- (n-k) - sum(diag(tab))
        cv[i] <- mc/(n-k)
    }
    cv
}

Recall,
alpha = 0  RIDGE LOGISTIC REGRESSION
alpha = 1  LASSO LOGISTIC REGRESSION

and s will be the value you found to be optimal from running the cv.glmnet() function.
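A minimal usage sketch for both functions (the object names def.step, X, y, ridge.lam, and lasso.lam come from the fits above; because the splits are random, your numbers will vary):

cv.err = log.cv(def.step, B = 50)       # repeated 2/3 train, 1/3 test splits of the glm fit
mean(cv.err)                            # estimated misclassification rate

cv.ridge = glmnetlog.cv(X, y, s = ridge.lam, alpha = 0, B = 50)
cv.lasso = glmnetlog.cv(X, y, s = lasso.lam, alpha = 1, B = 50)
c(ridge = mean(cv.ridge), lasso = mean(cv.lasso))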
