Stats 330: Lecture 21
More on Logistic Regression
© Department of Statistics 2012 STATS 330 Lecture 21: Slide 1
Plan of the day
• In today’s lecture we continue our discussion of the logistic regression model
• Topics covered:
  – Probabilities, odds and log-odds
  – Inference for coefficients, probabilities and log-odds
  – Calculating them in R
• Reference: Coursebook, section 5.2.1
Probabilities, Odds and Log Odds
• If E is an event, the probability that E occurs is written P(E).
• The odds on E occurring is the ratio
$$\frac{P(E)}{1 - P(E)}$$
• The log-odds is the (natural) logarithm of the odds
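A small R sketch of these definitions (not from the slides). The probability 0.8 is an arbitrary example value; `qlogis()` and `plogis()` are base R's log-odds (logit) function and its inverse.

```r
# Convert between probability, odds and log-odds (natural log)
p <- 0.8                     # an arbitrary example probability P(E)
odds <- p / (1 - p)          # odds on E: 0.8 / 0.2 = 4
log.odds <- log(odds)        # log-odds: log(4)

# Base R's logistic helpers perform the same conversions:
# qlogis(p) = log(p / (1 - p)) and plogis(z) = exp(z) / (1 + exp(z))
all.equal(log.odds, qlogis(p))   # TRUE
all.equal(plogis(log.odds), p)   # TRUE
odds / (1 + odds)                # going back from odds to probability gives 0.8
```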
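For the logistic regression model
• Binary response $Y = 0/1$, covariate $x$
• Let $E$ be the event that $Y = 1$. Let $\pi$ denote this probability. Then
$$\pi = \frac{\exp(\alpha + \beta x)}{1 + \exp(\alpha + \beta x)}$$
$$1 - \pi = 1 - \frac{\exp(\alpha + \beta x)}{1 + \exp(\alpha + \beta x)} = \frac{1}{1 + \exp(\alpha + \beta x)}$$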
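Odds & log-odds
Odds:
$$\frac{\pi}{1 - \pi} = \frac{\exp(\alpha + \beta x)/[1 + \exp(\alpha + \beta x)]}{1/[1 + \exp(\alpha + \beta x)]} = \exp(\alpha + \beta x)$$
Log-odds (logits):
$$\log\left(\frac{\pi}{1 - \pi}\right) = \log(\text{odds}) = \log(\exp(\alpha + \beta x)) = \alpha + \beta x$$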
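Logistic regression model
Probability form:
$$\pi = \frac{\exp(\alpha + \beta x)}{1 + \exp(\alpha + \beta x)}$$
Odds form:
$$\frac{\pi}{1 - \pi} = \exp(\alpha + \beta x)$$
Log-odds form:
$$\log\left(\frac{\pi}{1 - \pi}\right) = \alpha + \beta x$$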
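A numerical sanity check, sketched in R, that the three forms describe the same model. The values $\alpha = -1$, $\beta = 0.5$ and $x = 4$ are arbitrary illustrative choices, not estimates from the lecture.

```r
alpha <- -1; beta <- 0.5; x <- 4     # hypothetical parameter values and covariate
eta  <- alpha + beta * x             # linear predictor (log-odds form)
pi.x <- exp(eta) / (1 + exp(eta))    # probability form (named pi.x to avoid masking pi)
odds <- pi.x / (1 - pi.x)

all.equal(odds, exp(eta))                  # odds form holds: TRUE
all.equal(log(odds), eta)                  # log-odds form holds: TRUE
all.equal(1 - pi.x, 1 / (1 + exp(eta)))    # 1 - pi = 1 / (1 + exp(alpha + beta x)): TRUE
```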
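Interpretation of $\beta$
• If $x$ is increased by 1, the odds become
$$\exp(\alpha + \beta(x+1)) = \exp(\alpha + \beta x)\exp(\beta) = \text{old odds} \times \exp(\beta)$$
so $\beta$ measures the effect of a unit increase in $x$ on the odds (multiplies them by $\exp(\beta)$)
• If $x$ is increased by 1, the log-odds become
$$\alpha + \beta(x+1) = \alpha + \beta x + \beta = \text{old log-odds} + \beta$$
so $\beta$ measures the effect of a unit increase in $x$ on the log-odds (adds $\beta$)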
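To make the interpretation concrete, here is a sketch using the CHD estimates quoted later in the lecture ($\hat\alpha = -5.2784$, $\hat\beta = 0.1103$): one extra year of age multiplies the estimated odds by $\exp(0.1103) \approx 1.117$ and adds 0.1103 to the estimated log-odds.

```r
beta.hat  <- 0.1103     # estimated age coefficient from the CHD fit
alpha.hat <- -5.2784    # estimated intercept from the CHD fit
exp(beta.hat)           # odds multiplier per extra year, about 1.117

# Check using the estimated odds at ages 45 and 46
odds <- function(age) exp(alpha.hat + beta.hat * age)
odds(46) / odds(45)              # equals exp(beta.hat)
log(odds(46)) - log(odds(45))    # equals beta.hat
```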
Estimating probabilities and log-odds
• Given a fitted model and a value of $x$, how can we estimate the probability $\pi$?
• In practical terms, how can we estimate the probability that a person of a given age has CHD (coronary heart disease)?
• Example: if age is 45, what is $\pi = P(\text{CHD})$?
• Use the estimates of $\alpha$ and $\beta$: the estimate of $\alpha$ is -5.2784 and the estimate of $\beta$ is 0.1103
Hand Calculations
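• Estimated probability is
$$\hat{\pi} = \frac{\exp(-5.2784 + 0.1103 \times 45)}{1 + \exp(-5.2784 + 0.1103 \times 45)} = 0.4221$$
• Estimated odds is $0.4221/(1 - 0.4221) = 0.7304$
• Log-odds (logit) is $\log(0.7304) = -0.3142$
• Small differences from the R output on the next slide are due to rounding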
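The same hand calculation in R, as a sketch that uses only the rounded estimates quoted above (so no fitted model is needed). With these rounded coefficients the probability comes out at about 0.422, agreeing with the slide to about three decimal places.

```r
# Hand calculation at age 45 using the rounded estimates
alpha.hat <- -5.2784
beta.hat  <- 0.1103
eta  <- alpha.hat + beta.hat * 45     # estimated log-odds (logit)
prob <- exp(eta) / (1 + exp(eta))     # estimated probability, same as plogis(eta)
odds <- prob / (1 - prob)             # estimated odds, same as exp(eta)
round(c(prob = prob, odds = odds, log.odds = eta), 4)
```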
Calculations using R
> predict(chd.glm, data.frame(age=45), type="response")   # calculates probability
[1] 0.4221367
> predict(chd.glm, data.frame(age=45))   # calculates log-odds
[1] -0.314008
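The fitted `chd.glm` object is not reproduced here, so this sketch fits a model to hypothetical simulated data (`sim.df`, with made-up ages and coefficients) only to show how the two `predict()` outputs relate: the default `type = "link"` returns the log-odds, and `type = "response"` returns its logistic transform.

```r
set.seed(1)
# Hypothetical simulated data standing in for chd.df (not the lecture's data)
sim.df <- data.frame(age = sample(20:69, 100, replace = TRUE))
sim.df$chd <- rbinom(100, 1, plogis(-5.3 + 0.11 * sim.df$age))
sim.glm <- glm(chd ~ age, family = binomial, data = sim.df)

new.df <- data.frame(age = 45)
p.hat <- as.numeric(predict(sim.glm, new.df, type = "response"))  # estimated probability
eta   <- as.numeric(predict(sim.glm, new.df))                     # default type = "link": log-odds

# The probability is the logistic function of the log-odds
all.equal(p.hat, plogis(eta))   # TRUE
```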
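Plotting estimated probability: grouped approach
# one row per distinct age: r = number with CHD, n = number of people at that age
grouped.chd.df <- data.frame(g.age = sort(unique(chd.df$age)),
                             r = as.vector(tapply(chd.df$chd, chd.df$age, sum)),
                             n = as.vector(tapply(chd.df$chd, chd.df$age, length)))
# plot the observed proportions
with(grouped.chd.df, plot(g.age, r/n, xlab = "age", ylab = "r/n"))
# binomial fit with cbind(successes, failures) as the response
grouped.chd.glm <- glm(cbind(r, n - r) ~ g.age, family = binomial, data = grouped.chd.df)
est.prob <- predict(grouped.chd.glm, grouped.chd.df, type = "response")
lines(grouped.chd.df$g.age, est.prob, lwd = 2, col = "blue")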
[Figure: r/n plotted against age]
Ungrouped approach
plot(chd.df$age, chd.df$chd, xlab = "age", ylab = "CHD")
chd.glm <- glm(chd ~ age, family = binomial, data = chd.df)
# need age in ascending order so that lines() joins the fitted values from left to right
est.prob <- predict(chd.glm, data.frame(age = sort(chd.df$age)), type = "response")
lines(sort(chd.df$age), est.prob, lwd = 3, col = "blue")
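A sketch, again on hypothetical simulated data standing in for `chd.df`, showing that the grouped and ungrouped approaches give the same coefficient estimates: the two binomial likelihoods differ only by a constant factor (the binomial coefficients), although the residual deviances reported by the two fits differ. The estimates are compared with a small tolerance because the iterative fitting stops at a convergence threshold.

```r
set.seed(330)
# Hypothetical simulated 0/1 data (a stand-in for chd.df)
sim.df <- data.frame(age = sample(20:69, 200, replace = TRUE))
sim.df$chd <- rbinom(nrow(sim.df), 1, plogis(-5.3 + 0.11 * sim.df$age))

# Ungrouped fit: one row per person
ungrouped.glm <- glm(chd ~ age, family = binomial, data = sim.df)

# Grouped fit: one row per distinct age, r successes out of n trials
grouped.df <- data.frame(g.age = sort(unique(sim.df$age)),
                         r = as.vector(tapply(sim.df$chd, sim.df$age, sum)),
                         n = as.vector(tapply(sim.df$chd, sim.df$age, length)))
grouped.glm <- glm(cbind(r, n - r) ~ g.age, family = binomial, data = grouped.df)

# Side-by-side coefficients, then a tolerance-based comparison
cbind(ungrouped = coef(ungrouped.glm), grouped = coef(grouped.glm))
all.equal(unname(coef(ungrouped.glm)), unname(coef(grouped.glm)), tolerance = 1e-6)
```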
[Figure: CHD (0/1) plotted against age]
Inference for coefficients and probabilities
• Provided we have sufficient data, the estimated coefficients are approximately normally distributed, as in linear regression
  – (in linear regression they are exactly normal under the model assumptions)
• The maximum likelihood method gives us a way of computing standard errors for the coefficients and for the estimated probabilities; we skip the (complicated) mathematical details
Testing for a zero coefficient
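• To test whether a coefficient is zero we use the z-statistic (estimate / standard error) and its p-value, labelled "z value" and "Pr(>|z|)" in the R output; these play the same role as the t-statistic and p-value in linear regression, and the tests are interpreted the same way
• (in the case of a single covariate, testing that its coefficient is zero is testing that there is no relationship between the covariate and the response)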
CHD example
> summary(chd.glm)
Call:
glm(formula = chd ~ age, family = binomial, data = chd.df)
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -5.2784     1.1296  -4.673 2.97e-06 ***
age           0.1103     0.0240   4.596 4.30e-06 ***
---
Both p-values are small, so we need both the covariate and the intercept
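The z value and p-value in the table can be reproduced from the estimate and its standard error. This sketch does it for the age coefficient using the rounded figures above, so the results agree only approximately.

```r
# Rounded estimate and standard error for age, from summary(chd.glm)
est <- 0.1103
se  <- 0.0240
z   <- est / se              # Wald z-statistic, about 4.596
p   <- 2 * pnorm(-abs(z))    # two-sided p-value from the standard normal, about 4.3e-06
c(z = z, p = p)
```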
Confidence intervals
• These take the form (Wald intervals)
  Estimate ± 1.96 × standard error   (for 95% intervals)
• e.g. for $\beta$, we get 0.1103 ± 1.96 × 0.0240, i.e. 0.1103 ± 0.04704, or (0.0633, 0.1573)
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -5.2784     1.1296  -4.673 2.97e-06 ***
age           0.1103     0.0240   4.596 4.30e-06 ***
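The Wald calculation above, sketched in R with the rounded estimate and standard error. For a fitted model, base R's `confint.default()` computes intervals of this form directly from the coefficients and their estimated covariance matrix, using `qnorm()` rather than the rounded 1.96.

```r
# Rounded estimate and standard error for the age coefficient
est <- 0.1103
se  <- 0.0240

wald.95 <- est + c(-1, 1) * 1.96 * se   # 95% Wald interval, about (0.0633, 0.1573)
wald.95

# 1.96 is a rounded standard normal quantile; other levels use other quantiles
z.crit <- qnorm(0.975)                  # 1.959964
est + c(-1, 1) * qnorm(0.995) * se      # 99% Wald interval
```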
Confidence intervals (2)
Or, use the confint function (likelihood-ratio, i.e. profile-likelihood, intervals)
> confint(chd.glm)
Waiting for profiling to be done...
                  2.5 %     97.5 %
(Intercept) -7.68700761 -3.2196722
age          0.06638715  0.1612957
> confint(chd.glm, level=0.99)
Waiting for profiling to be done...
                  0.5 %     99.5 %
(Intercept) -8.53291031 -2.6281457
age          0.05368102  0.1791288
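Since `chd.df` is not reproduced here, this sketch uses hypothetical simulated data to compare Wald intervals (`confint.default()`) with the profile-likelihood intervals from `confint()`. In R versions before 4.4.0, the `glm` method of `confint()` relies on the recommended MASS package being installed.

```r
set.seed(21)
# Hypothetical simulated data; chd.df itself is not reproduced here
sim.df <- data.frame(age = sample(20:69, 100, replace = TRUE))
sim.df$chd <- rbinom(100, 1, plogis(-5.3 + 0.11 * sim.df$age))
sim.glm <- glm(chd ~ age, family = binomial, data = sim.df)

wald.ci    <- confint.default(sim.glm)   # Wald: estimate +/- qnorm(0.975) * SE
profile.ci <- confint(sim.glm)           # likelihood-ratio (profile) intervals
wald.ci
profile.ci
```

The two kinds of interval are usually close when there is plenty of data; unlike Wald intervals, the profile intervals need not be symmetric about the estimate.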
Confidence intervals for probabilities
Calculated with the predict function (like confidence intervals for the mean response in linear regression)
Form is Estimate ± 1.96 × standard error
Example: 0.4221 ± 1.96 × 0.0578, i.e. 0.4221 ± 0.11328, or (0.3088, 0.5354)
> predict(chd.glm, data.frame(age=45), type="response", se.fit=TRUE)
$fit
[1] 0.4221367
$se.fit
[1] 0.05780285
$residual.scale
[1] 1
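A sketch of the interval calculation for the probability, using the `fit` and `se.fit` values printed above and `qnorm(0.975)` in place of 1.96.

```r
# Values printed by predict(..., type = "response", se.fit = TRUE) at age 45
p.fit <- 0.4221367
p.se  <- 0.05780285
ci.prob <- p.fit + c(-1, 1) * qnorm(0.975) * p.se   # 95% interval for P(CHD)
round(ci.prob, 4)
```

Because this interval is symmetric on the probability scale, it can in principle extend below 0 or above 1 when the estimated probability is close to either boundary.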
Confidence intervals for log-odds
Calculated with the predict function (like confidence intervals for the mean response in linear regression)
Form is Estimate ± 1.96 × standard error
Example: -0.3140 ± 1.96 × 0.2369578, i.e. -0.3140 ± 0.4644, or (-0.7784, 0.1504)
> predict(chd.glm, data.frame(age=45), se.fit=TRUE)
$fit
[1] -0.314008
$se.fit
[1] 0.2369578
$residual.scale
[1] 1
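A similar sketch for the log-odds. The last line (not on the slide) applies the logistic function to the endpoints, one common way to turn a log-odds interval into an interval for the probability that always lies within (0, 1).

```r
# Values printed by predict(..., se.fit = TRUE) at age 45 (log-odds scale)
eta.fit <- -0.314008
eta.se  <- 0.2369578
ci.logit <- eta.fit + c(-1, 1) * qnorm(0.975) * eta.se
round(ci.logit, 4)            # about (-0.7784, 0.1504)

# Transform the endpoints back to the probability scale
round(plogis(ci.logit), 4)    # about (0.315, 0.538)
```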