Lecture 15: Logistic Regression: Inference and link functions
BMTRY 701 Biostatistical Methods II

Page 1: Lecture 15: Logistic Regression: Inference and link functions

BMTRY 701 Biostatistical Methods II

Page 2: More on our example

> pros5.reg <- glm(cap.inv ~ log(psa) + gleason, family=binomial)
> summary(pros5.reg)

Call:
glm(formula = cap.inv ~ log(psa) + gleason, family = binomial)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -8.1061     0.9916  -8.174 2.97e-16 ***
log(psa)      0.4812     0.1448   3.323 0.000892 ***
gleason       1.0229     0.1595   6.412 1.43e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 512.29  on 379  degrees of freedom
Residual deviance: 403.90  on 377  degrees of freedom
AIC: 409.9
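The coefficients are log odds ratios; a quick sketch for moving them to the odds ratio scale with profile-likelihood confidence intervals, assuming the pros5.reg fit above:

> # exponentiate coefficients and profile-likelihood CIs to get odds ratios
> exp(cbind(OR = coef(pros5.reg), confint(pros5.reg)))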

Page 3: Other covariates: Simple logistic models

Covariate                   Beta   exp(Beta)       Z
Age                      -0.0082        0.99   -0.51
Race                     -0.054         0.95   -0.15
Vol                      -0.014         0.99   -2.26
Dig Exam (vs. no nodule)
  Unilobar left            0.88         2.41    2.81
  Unilobar right           1.56         4.76    4.78
  Bilobar                  2.10         8.17    5.44
Detection in RE            1.71         5.53    4.48
LogPSA                     0.87         2.39    6.62
Gleason                    1.24         3.46    8.12
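Each row comes from its own simple (one-covariate) logistic fit. A sketch for the digital exam rows, assuming dpros is coded 1 = no nodule, 2 = unilobar left, 3 = unilobar right, 4 = bilobar:

> dig.reg <- glm(cap.inv ~ factor(dpros), family=binomial)
> summary(dig.reg)      # Beta and Z for each nodule category vs. no nodule
> exp(coef(dig.reg))    # the exp(Beta) column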

Page 4: What is a good multiple regression model?

Principles of model building are analogous to linear regression

We use the same approach:
• Look for significant covariates in simple models
• Consider multicollinearity
• Look for confounding, i.e., a change in betas when a covariate is removed (sketch below)
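A minimal sketch of the confounding check (the fit names here are hypothetical): fit the model with and without the suspect covariate and watch the other betas.

> fit.with    <- glm(cap.inv ~ gleason + log(psa), family=binomial)
> fit.without <- glm(cap.inv ~ gleason, family=binomial)
> coef(fit.with)["gleason"]     # beta for gleason, adjusted for log(psa)
> coef(fit.without)["gleason"]  # beta for gleason alone; a big shift suggests confounding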

Page 5: Multiple regression model proposal

Gleason, logPSA, Volume, Digital Exam result, detection in RE

But what about collinearity? With 5 covariates, there are 5 choose 2 = 10 pairs to check.

         gleason log.psa.   vol
gleason     1.00     0.46 -0.06
log.psa.    0.46     1.00  0.05
vol        -0.06     0.05  1.00

[Scatterplot matrix of gleason, log.psa., and vol]
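A sketch of the commands behind the matrix and plot, assuming the covariates sit in a data frame (called pros here) with the column names shown above:

> round(cor(pros[, c("gleason", "log.psa.", "vol")], use = "complete.obs"), 2)  # complete.obs drops the missing vol value
> pairs(pros[, c("gleason", "log.psa.", "vol")])  # scatterplot matrix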

Page 6: Categorical pairs

> dpros.dcaps <- epitab(dpros, dcaps)
> dpros.dcaps$tab
         Outcome
Predictor   1        p0  2         p1 oddsratio     lower     upper      p.value
        1  95 0.2802360  4 0.09756098  1.000000        NA        NA           NA
        2 123 0.3628319  9 0.21951220  1.737805 0.5193327  5.815089 4.050642e-01
        3  84 0.2477876 12 0.29268293  3.392857 1.0540422 10.921270 3.777900e-02
        4  37 0.1091445 16 0.39024390 10.270270 3.2208157 32.748987 1.271225e-05
> fisher.test(table(dpros, dcaps))

Fisher's Exact Test for Count Data

data:  table(dpros, dcaps)
p-value = 2.520e-05
alternative hypothesis: two.sided
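epitab() comes from the epitools package. A minimal sketch reproducing its odds ratios by hand (each dpros level against level 1, assuming dcaps is coded 1/2 as in the output above):

> tab <- table(dpros, dcaps)
> # odds ratio of row k vs. row 1: (cases_k * controls_1) / (controls_k * cases_1)
> (tab[, 2] * tab[1, 1]) / (tab[, 1] * tab[1, 2])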

Page 7: Categorical vs. continuous

t-tests and ANOVA: means by category

> summary(lm(log(psa) ~ dcaps))
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.2506     0.1877   6.662 9.55e-11 ***
dcaps         0.8647     0.1632   5.300 1.97e-07 ***
---
Residual standard error: 0.9868 on 378 degrees of freedom
Multiple R-squared: 0.06917,  Adjusted R-squared: 0.06671
F-statistic: 28.09 on 1 and 378 DF,  p-value: 1.974e-07

> summary(lm(log(psa) ~ factor(dpros)))
Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)     2.1418087  0.0992064  21.589  < 2e-16 ***
factor(dpros)2 -0.1060634  0.1312377  -0.808    0.419
factor(dpros)3  0.0001465  0.1413909   0.001    0.999
factor(dpros)4  0.7431101  0.1680055   4.423 1.28e-05 ***
---
Residual standard error: 0.9871 on 376 degrees of freedom
Multiple R-squared: 0.07348,  Adjusted R-squared: 0.06609
F-statistic: 9.94 on 3 and 376 DF,  p-value: 2.547e-06

Page 8: Categorical vs. continuous

> summary(lm(vol ~ dcaps))
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   22.905      3.477   6.587 1.51e-10 ***
dcaps         -6.362      3.022  -2.106   0.0359 *
---
Residual standard error: 18.27 on 377 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared: 0.01162,  Adjusted R-squared: 0.009003
F-statistic: 4.434 on 1 and 377 DF,  p-value: 0.03589

> summary(lm(vol ~ factor(dpros)))
Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)      17.417      1.858   9.374   <2e-16 ***
factor(dpros)2   -1.638      2.453  -0.668    0.505
factor(dpros)3   -1.976      2.641  -0.748    0.455
factor(dpros)4   -3.513      3.136  -1.120    0.263
---
Residual standard error: 18.39 on 375 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared: 0.003598,  Adjusted R-squared: -0.004373
F-statistic: 0.4514 on 3 and 375 DF,  p-value: 0.7164

Page 9: Categorical vs. continuous

> summary(lm(gleason ~ dcaps))
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   5.2560     0.1991  26.401  < 2e-16 ***
dcaps         1.0183     0.1730   5.885 8.78e-09 ***
---
Residual standard error: 1.047 on 378 degrees of freedom
Multiple R-squared: 0.08394,  Adjusted R-squared: 0.08151
F-statistic: 34.63 on 1 and 378 DF,  p-value: 8.776e-09

> summary(lm(gleason ~ factor(dpros)))
Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)      5.9798     0.1060  56.402  < 2e-16 ***
factor(dpros)2   0.4217     0.1403   3.007  0.00282 **
factor(dpros)3   0.4890     0.1511   3.236  0.00132 **
factor(dpros)4   0.9636     0.1795   5.367 1.40e-07 ***
---
Residual standard error: 1.055 on 376 degrees of freedom
Multiple R-squared: 0.07411,  Adjusted R-squared: 0.06672
F-statistic: 10.03 on 3 and 376 DF,  p-value: 2.251e-06

Page 10: Lots of “correlation” between covariates

We should expect some insignificant coefficients and some confounding.

Still, try the ‘full model’ and see what happens

Page 11: Full model results

> mreg <- glm(cap.inv ~ gleason + log(psa) + vol + dcaps + factor(dpros), family=binomial)

> summary(mreg)
Coefficients:
                Estimate Std. Error z value Pr(>|z|)
(Intercept)    -8.617036   1.102909  -7.813 5.58e-15 ***
gleason         0.908424   0.166317   5.462 4.71e-08 ***
log(psa)        0.514200   0.156739   3.281  0.00104 **
vol            -0.014171   0.007712  -1.838  0.06612 .
dcaps           0.464952   0.456868   1.018  0.30882
factor(dpros)2  0.753759   0.355762   2.119  0.03411 *
factor(dpros)3  1.517838   0.372366   4.076 4.58e-05 ***
factor(dpros)4  1.384887   0.453127   3.056  0.00224 **
---

    Null deviance: 511.26  on 378  degrees of freedom
Residual deviance: 376.00  on 371  degrees of freedom
  (1 observation deleted due to missingness)
AIC: 392

Page 12: What next?

Drop or retain? How to interpret?

Page 13: Likelihood Ratio Test

Recall testing multiple coefficients in linear regression:
• Approach: ANOVA
• We don’t have ANOVA for logistic regression
• More general approach: Likelihood Ratio Test
• Based on the likelihood (or log-likelihood) for “competing” nested models

Page 14: Likelihood Ratio Test

H0: small model
Ha: large model

Example, where GS = Gleason score and I(D=k) indicates digital exam category k:

logit(π) = β0 + β1·GS + β2·log(PSA) + β3·vol + β4·I(D=2) + β5·I(D=3) + β6·I(D=4)

H0: β4 = β5 = β6 = 0
H1: β4 ≠ 0 and/or β5 ≠ 0 and/or β6 ≠ 0

Page 15: Recall the likelihood function

L(β0, β1; y, x) = ∏(i=1..n) exp(yi(β0 + β1 xi)) / (1 + exp(β0 + β1 xi))

log L(β0, β1; y, x) = Σ(i=1..n) [ yi(β0 + β1 xi) − log(1 + exp(β0 + β1 xi)) ]
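As an illustration only (the function and argument names are mine), the log-likelihood above translates directly into R:

loglik <- function(beta, y, x) {
  eta <- beta[1] + beta[2] * x        # linear predictor: beta0 + beta1*xi
  sum(y * eta - log(1 + exp(eta)))    # sum of yi*eta_i - log(1 + exp(eta_i))
}

glm() maximizes exactly this quantity (via iteratively reweighted least squares) to produce the MLEs.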

Page 16: Estimating the log-likelihood

Recall that we use the log-likelihood because it is simpler to work with (as in linear regression)

MLEs:
• Betas are selected to maximize the likelihood
• Betas also maximize the log-likelihood
• If we plug in the estimated betas, we get our ‘maximized’ log-likelihood for that model

We compare the log-likelihoods from competing (nested) models
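In R the maximized log-likelihood is available directly. A quick check using the pros5.reg fit from page 2 (for binary 0/1 data the residual deviance is −2 times the maximized log-likelihood):

> logLik(pros5.reg)                  # maximized log-likelihood
> -2 * as.numeric(logLik(pros5.reg)) # matches the residual deviance, 403.90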

Page 17: Likelihood Ratio Test

LR statistic = G2 = -2*(LogL(H0)-LogL(H1))

Under the null: G2 ~ χ2 with (p − q) degrees of freedom

If G2 < χ2(p-q), 1-α, conclude H0

If G2 > χ2(p-q), 1-α, conclude H1
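A sketch of the test in R for two hypothetical nested fits, fit0 (small) and fit1 (large):

> G2 <- as.numeric(-2 * (logLik(fit0) - logLik(fit1)))       # LR statistic
> df <- attr(logLik(fit1), "df") - attr(logLik(fit0), "df")  # p - q
> 1 - pchisq(G2, df)   # p-value: evidence against H0 when below alpha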

Page 18: LRT in R

-2 LogL = Residual Deviance
So, G2 = Dev(0) − Dev(1)

Fit two models:

H0: β4 = β5 = β6 = 0
H1: β4 ≠ 0 and/or β5 ≠ 0 and/or β6 ≠ 0

Page 19

> mreg1 <- glm(cap.inv ~ gleason + log(psa) + vol + factor(dpros),
+              family=binomial)
> mreg0 <- glm(cap.inv ~ gleason + log(psa) + vol, family=binomial)

> mreg1
Coefficients:
   (Intercept)         gleason        log(psa)             vol
      -8.31383         0.93147         0.53422        -0.01507
factor(dpros)2  factor(dpros)3  factor(dpros)4
       0.76840         1.55109         1.44743

Degrees of Freedom: 378 Total (i.e. Null);  372 Residual
  (1 observation deleted due to missingness)
Null Deviance:     511.3
Residual Deviance: 377.1    AIC: 391.1

> mreg0
Coefficients:
(Intercept)      gleason     log(psa)          vol
   -7.76759      0.99931      0.50406     -0.01583

Degrees of Freedom: 378 Total (i.e. Null);  375 Residual
  (1 observation deleted due to missingness)
Null Deviance:     511.3
Residual Deviance: 399    AIC: 407

Page 20: Testing DPROS

Dev(0) – Dev(1) =

p – q =

χ2(p-q), 1-α =

Conclusion?

p-value?

Page 21: More in R

> qchisq(0.975, 3)
> -2 * (logLik(mreg0) - logLik(mreg1))
> 1 - pchisq(21.96, 3)

> anova(mreg0, mreg1)
Analysis of Deviance Table

Model 1: cap.inv ~ gleason + log(psa) + vol
Model 2: cap.inv ~ gleason + log(psa) + vol + factor(dpros)
  Resid. Df Resid. Dev Df Deviance
1       375     399.02
2       372     377.06  3    21.96
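anova() for glm fits can also attach the chi-square p-value directly, a one-line variant of the comparison above:

> anova(mreg0, mreg1, test = "Chisq")   # adds a Pr(>Chi) column for G2 = 21.96 on 3 df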

Page 22: Notes on LRT

Again, models have to be NESTED.
For comparing models that are not nested, you need to use other approaches. Examples (a quick sketch follows):
• AIC
• BIC
• DIC
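A minimal sketch comparing the page 19 fits by information criteria (smaller is better; no nesting required):

> AIC(mreg0, mreg1)
> BIC(mreg0, mreg1)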

Next time….

Page 23: For next time, read the following article

Mary K. Townsend, Gary C. Curhan, Neil M. Resnick, Francine Grodstein. Oral Contraceptive Use and Incident Urinary Incontinence in Premenopausal Women. The Journal of Urology, in press.
(http://www.sciencedirect.com/science/article/B7XMT-4VVN50M-K/2/31c7620a20865c25c70c93736ef2814d)

Keywords: urinary incontinence; contraceptives, oral; epidemiology