Lecture 6: Non-parametric Estimation II - Confidence intervals and bands - Mean and median

Upload: kevin-merritt

Post on 03-Jan-2016


Point-wise Confidence Intervals

• Recall last time we discussed several possibilities for constructing point-wise confidence intervals for S(t) at a particular t

• The estimates rely heavily on the Greenwood estimate of the variance of $\hat{S}(t)$

• Recall that $\sigma_S^2(t)$ is the sum in Greenwood's formula:

$$\sigma_S^2(t) = \frac{\hat{V}[\hat{S}(t)]}{\hat{S}^2(t)} = \sum_{t_i \le t} \frac{d_i}{Y_i (Y_i - d_i)}$$

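Types of Point-wise CIs

• Linear

$$\hat{S}(t) \pm z_{1-\alpha/2}\, \sigma_S(t)\, \hat{S}(t), \quad \text{where } \sigma_S(t) = \sqrt{\sum_{t_i \le t} \frac{d_i}{Y_i (Y_i - d_i)}}$$

• Log

$$\hat{S}(t)\, \exp\left\{ \pm z_{1-\alpha/2}\, \sigma_S(t) \right\}$$

• Log-log

$$\left[ \hat{S}(t)^{1/\theta},\ \hat{S}(t)^{\theta} \right], \quad \text{where } \theta = \exp\left\{ \frac{z_{1-\alpha/2}\, \sigma_S(t)}{\ln \hat{S}(t)} \right\}$$

• Arcsine Square root

$$\sin^2\left\{ \max\left[ 0,\ \arcsin\left(\hat{S}(t)^{1/2}\right) - 0.5\, z_{1-\alpha/2}\, \sigma_S(t) \left( \frac{\hat{S}(t)}{1 - \hat{S}(t)} \right)^{1/2} \right] \right\} \le S(t)$$

$$S(t) \le \sin^2\left\{ \min\left[ \frac{\pi}{2},\ \arcsin\left(\hat{S}(t)^{1/2}\right) + 0.5\, z_{1-\alpha/2}\, \sigma_S(t) \left( \frac{\hat{S}(t)}{1 - \hat{S}(t)} \right)^{1/2} \right] \right\}$$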

Example: Tongue Cancer

• 80 subjects with tongue cancer
• Outcome: Time to death (in weeks)
• Two tumor types
  – Aneuploid
  – Diploid

[Figure: Example: Tongue Cancer. Survival curves for the Aneuploid and Diploid groups; x-axis: Time to Death (months), 0 to 400; y-axis: Survival, 0.0 to 1.0]

R Life Tables

> dat<-Surv(tongue$Time, tongue$Cens)
> type<-tongue$Type
> fit<-survfit(dat~type)
> summary(fit)
Call: survfit(formula = dat ~ type)

type=1 *Aneuploid
 time n.risk n.event survival std.err L 95% CI U 95% CI
    1     52       1    0.981  0.0190    0.944    1.000
    3     51       2    0.942  0.0323    0.881    1.000
    4     49       1    0.923  0.0370    0.853    0.998
   10     48       1    0.904  0.0409    0.827    0.988
   13     47       2    0.865  0.0473    0.777    0.963
   16     45       2    0.827  0.0525    0.730    0.936
   24     43       1    0.808  0.0547    0.707    0.922
   26     42       1    0.788  0.0566    0.685    0.908
   27     41       1    0.769  0.0584    0.663    0.893
   28     40       1    0.750  0.0600    0.641    0.877
   30     39       2    0.712  0.0628    0.598    0.846
   32     37       1    0.692  0.0640    0.578    0.830
   41     36       1    0.673  0.0651    0.557    0.813
   51     35       1    0.654  0.0660    0.537    0.797
   65     33       1    0.634  0.0669    0.516    0.780
….

type=2 *Diploid
 time n.risk n.event survival std.err L 95% CI U 95% CI
    1     28       1   0.9643  0.0351   0.8979    1.000
    3     27       1   0.9286  0.0487   0.8379    1.000
    4     26       1   0.8929  0.0585   0.7853    1.000
    5     25       2   0.8214  0.0724   0.6911    0.976
    8     23       1   0.7857  0.0775   0.6475    0.953
   12     21       1   0.7483  0.0824   0.6031    0.929
   13     20       1   0.7109  0.0863   0.5603    0.902
   18     19       1   0.6735  0.0895   0.5190    0.874
   23     18       1   0.6361  0.0921   0.4790    0.845
   26     17       1   0.5986  0.0939   0.4402    0.814
   27     16       1   0.5612  0.0952   0.4025    0.783
   30     15       1   0.5238  0.0959   0.3658    0.750
   42     14       1   0.4864  0.0961   0.3302    0.716
   56     13       1   0.4490  0.0957   0.2956    0.682
   62     12       1   0.4116  0.0948   0.2621    0.646
…

CIs for S(52) for Aneuploid Tumors

• Linear
• Log
• Log-log
• Arcsine-Square root

Comparing CI’s
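The four point-wise intervals for S(52) in the aneuploid group can be recomputed directly from the life table. The following is a minimal sketch in base R, not the lecture's own code: the vectors `d` and `Y` are copied by hand from the `n.event` and `n.risk` columns of the type=1 output for event times up to 51, and all variable names are illustrative assumptions.

```r
# Aneuploid rows with event times <= 52, copied from the summary(fit) output
d <- c(1, 2, 1, 1, 2, 2, 1, 1, 1, 1, 2, 1, 1, 1)                 # n.event
Y <- c(52, 51, 49, 48, 47, 45, 43, 42, 41, 40, 39, 37, 36, 35)   # n.risk

S_hat  <- prod(1 - d / Y)            # Kaplan-Meier estimate of S(52)
sigma2 <- sum(d / (Y * (Y - d)))     # Greenwood sum sigma_S^2(52)
sigma  <- sqrt(sigma2)
z      <- qnorm(0.975)               # two-sided 95% interval

# Linear: S +/- z * sigma * S
linear <- S_hat + c(-1, 1) * z * sigma * S_hat
# Log: S * exp(+/- z * sigma)
log_ci <- S_hat * exp(c(-1, 1) * z * sigma)
# Log-log: [S^(1/theta), S^theta] with theta = exp(z * sigma / log(S))
theta  <- exp(z * sigma / log(S_hat))
loglog <- c(S_hat^(1 / theta), S_hat^theta)
# Arcsine square root: back-transform asin(sqrt(S)) -/+ half-width
hw     <- 0.5 * z * sigma * sqrt(S_hat / (1 - S_hat))
arcsin <- c(sin(max(0, asin(sqrt(S_hat)) - hw))^2,
            sin(min(pi / 2, asin(sqrt(S_hat)) + hw))^2)

cis <- rbind(linear, log = log_ci, "log-log" = loglog, "arcsine-sqrt" = arcsin)
colnames(cis) <- c("lower", "upper")
print(round(cis, 3))
```

The `log` row should agree with the 0.537 and 0.797 limits that `summary(fit)` prints at time 51, which is a quick check that the Greenwood sum was entered correctly.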

R Code

### Aneuploid only
afit.lin<-survfit(dat[type==1]~1, conf.type="plain")
afit.log<-survfit(dat[type==1]~1, conf.type="log")
afit.loglog<-survfit(dat[type==1]~1, conf.type="log-log")

### Diploid only
dfit.lin<-survfit(dat[type==2]~1, conf.type="plain")
dfit.log<-survfit(dat[type==2]~1, conf.type="log")
dfit.loglog<-survfit(dat[type==2]~1, conf.type="log-log")

par(mfrow=c(2,2))

plot(afit.lin, conf.int=T, col=1, lwd=2, lty=c(2,2), main="Linear")

lines(dfit.lin, conf.int=T, col=3, lwd=2, lty=c(1,1))

mtext("S(t)", side=2, line=3, at=-0.4)

plot(afit.log, conf.int=T, col=1, lwd=2, lty=c(2,2), main="Log")

lines(dfit.log, conf.int=T, col=3, lwd=2, lty=c(1,1))

plot(afit.loglog, conf.int=T, col=1, lwd=2, lty=c(2,2), main="Log-log")

lines(dfit.loglog, conf.int=T, col=3, lwd=2, lty=c(1,1))

mtext("Time to Death (days)", side=1, line=3, at=500)

plot(afit.lin, conf.int=F, col=1, lwd=2, lty=2, main="Arcsine squareroot")

lines(dfit.lin, conf.int=F, col=3, lwd=2, lty=1)

lines(afit.lin$time, aarcsn[,1], type="s", lwd=2, lty=2)

lines(afit.lin$time, aarcsn[,3], type="s", lwd=2, lty=2)

lines(dfit.lin$time, darcsn[,1], type="s", col=3, lwd=2, lty=1)

lines(dfit.lin$time, darcsn[,3], type="s", col=3, lwd=2, lty=1)

par(mfrow=c(1,2), xpd=NA)

plot(afit.loglog, conf.int=T, col=2, lwd=2, lty=c(3,3), main="Aneuploid", ylab="S(t)")

lines(afit.log, conf.int=T, col=3, lwd=2, lty=c(2,2))

lines(afit.lin, conf.int=T, col=1, lwd=2, lty=c(1,1))

lines(afit.lin$time, aarcsn[,1], type="s", col=4, lwd=2, lty=4)

lines(afit.lin$time, aarcsn[,3], type="s", col=4, lwd=2, lty=4)

mtext("Time to Death (days)", side=1, line=3, at=500)

plot(dfit.loglog, conf.int=T, col=2, lwd=2, lty=c(3,3), main="Diploid")

lines(dfit.log, conf.int=T, col=3, lwd=2, lty=c(2,2))

lines(dfit.lin, conf.int=T, col=1, lwd=2, lty=c(1,1))

lines(dfit.lin$time, darcsn[,1], type="s", col=4, lwd=2, lty=4)

lines(dfit.lin$time, darcsn[,3], type="s", col=4, lwd=2, lty=4)

legend(x=125, y=1, legend=c("Linear","Log","Log-log","Arcsine sqrt"), col=c(1,3,2,4), lty=c(1,2,3,4), lwd=2, cex=0.75, bty="n")

CI Function for Arcsine Sqrt

arcsincis<-function(fit, t, alpha)
# fit is a fitted survival curve from survfit, t is the time at which we want S(t) and its CI
# alpha is the significance level we want to consider
{
  # inherits() also works when class(fit) has more than one element
  if(!inherits(fit, "survfit")) stop("The object is not of class 'survfit'")
  tloc<-max(which(fit$time<=t))      # index of the last time point at or before t
  St_hat<-fit$surv[tloc]
  # Greenwood sum: d_i/(Y_i*(Y_i-d_i)) over all time points up to t
  di<-fit$n.event[1:tloc]
  yi<-fit$n.risk[1:tloc]
  gwsum<-sum(di/(yi*(yi-di)))
  # half-width on the arcsine square-root scale
  hw<-0.5*qnorm(1-alpha/2)*sqrt(gwsum*St_hat/(1-St_hat))
  lo<-round((sin(max(0, asin(sqrt(St_hat))-hw)))^2, 3)
  hi<-round((sin(min(pi/2, asin(sqrt(St_hat))+hw)))^2, 3)
  St_hat<-round(St_hat, 3)
  cat("Arcsine Square root ", 100*(1-alpha), "% Confidence interval for S(", t, ") is: ",
      lo, " <= ", St_hat, " <= ", hi, "\n", sep="")
}

> afit<-survfit(dat[type==1]~1, conf.type="none")
> arcsincis(fit=afit, t=52, alpha=0.05)
Arcsine Square root 95% Confidence interval for S(52) is: 0.52 <= 0.654 <= 0.776

Just Some Reminders About Usage…

• For N > 25 and < 50% censoring
  – Log-log is good
  – Arcsine square-root is good
  – All three give approximately nominal coverage for 95% CI
  – Exception: extreme right tail where there is little data
• Linear approach requires larger N for good coverage

Which Ones are More/Less Conservative

• Arcsine square root
  – Slightly conservative
  – A little wider than necessary
• Log
  – Conservative for upper limit
  – Anti-conservative on lower limit
• Log-log
  – For small N, slightly anti-conservative
  – A little too narrow
• Linear
  – For small N, overly anti-conservative
  – Too narrow
• Large Samples: all about the same

Confidence Intervals

• What is most commonly produced by software packages
• Valid ONLY for point-wise intervals
• Problem is they are often misinterpreted:
  – Plot a set of point-wise 95% CIs
  – Interpret as confidence “band”
  – These “bands” are too narrow!

Confidence Bands

• A band for which we are 100(1 − α)% confident that the survival function falls within the band for all t in some interval

• Tend to be wider than the point-wise intervals

• Looking for two random functions, L(t) and U(t), s.t.

$$1 - \alpha = P\left[ L(t) \le S(t) \le U(t), \ \text{for all } t_L \le t \le t_U \right]$$

EP (“equal probability”) bands (Nair)

• Proportional to point-wise confidence intervals
• Several steps:

1. Define a's:

$$a_L = \frac{n \sigma_S^2(t_L)}{1 + n \sigma_S^2(t_L)}, \qquad a_U = \frac{n \sigma_S^2(t_U)}{1 + n \sigma_S^2(t_U)}$$

2. Define confidence coefficient: $c_\alpha(a_L, a_U)$ from table C3

3. Use linear, log-log, or arcsine approach…

Linear Confidence Bands

$$\hat{S}(t) \pm c_\alpha(a_L, a_U)\, \sigma_S(t)\, \hat{S}(t)$$

where

$$\sigma_S^2(t) = \sum_{t_i \le t} \frac{d_i}{Y_i (Y_i - d_i)} \quad \text{for } t_L \le t \le t_U$$

Log-Log Confidence Bands

• Define θ:

$$\theta = \exp\left\{ \frac{c_\alpha(a_L, a_U)\, \sigma_S(t)}{\ln \hat{S}(t)} \right\}$$

• Confidence band:

$$\left( \hat{S}(t)^{1/\theta},\ \hat{S}(t)^{\theta} \right)$$

Arcsine-Square Root Confidence Bands

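$$\text{lower: } \sin^2\left\{ \max\left[ 0,\ \arcsin\left(\hat{S}(t)^{1/2}\right) - 0.5\, c_\alpha(a_L, a_U)\, \sigma_S(t) \left( \frac{\hat{S}(t)}{1 - \hat{S}(t)} \right)^{1/2} \right] \right\}$$

$$\text{upper: } \sin^2\left\{ \min\left[ \frac{\pi}{2},\ \arcsin\left(\hat{S}(t)^{1/2}\right) + 0.5\, c_\alpha(a_L, a_U)\, \sigma_S(t) \left( \frac{\hat{S}(t)}{1 - \hat{S}(t)} \right)^{1/2} \right] \right\}$$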

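Hall-Wellner Confidence Bands

• Similar to previous approach
• Confidence coefficients $k_\alpha(a_L, a_U)$ in table C4

• Linear:

$$\hat{S}(t) \pm \frac{k_\alpha(a_L, a_U) \left[ 1 + n \sigma_S^2(t) \right]}{n^{1/2}}\, \hat{S}(t)$$

• Log-log:

$$\left( \hat{S}(t)^{1/\theta},\ \hat{S}(t)^{\theta} \right), \quad \text{where } \theta = \exp\left\{ \frac{k_\alpha(a_L, a_U) \left[ 1 + n \sigma_S^2(t) \right]}{n^{1/2} \ln \hat{S}(t)} \right\}$$

• Arcsine:

$$\sin^2\left\{ \max\left[ 0,\ \arcsin\left(\hat{S}(t)^{1/2}\right) - 0.5\, \frac{k_\alpha(a_L, a_U) \left[ 1 + n \sigma_S^2(t) \right]}{n^{1/2}} \left( \frac{\hat{S}(t)}{1 - \hat{S}(t)} \right)^{1/2} \right] \right\} \le S(t)$$

$$S(t) \le \sin^2\left\{ \min\left[ \frac{\pi}{2},\ \arcsin\left(\hat{S}(t)^{1/2}\right) + 0.5\, \frac{k_\alpha(a_L, a_U) \left[ 1 + n \sigma_S^2(t) \right]}{n^{1/2}} \left( \frac{\hat{S}(t)}{1 - \hat{S}(t)} \right)^{1/2} \right] \right\}$$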
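Both families of bands reuse the point-wise formulas with a different multiplier: the EP band replaces z·σ_S(t) with c·σ_S(t), and the Hall-Wellner band replaces it with k·[1 + nσ_S²(t)]/√n. The sketch below is one way this could be coded in base R. The function name `conf_band`, its arguments, and the critical value used in the call are all hypothetical; the real coefficient has to be looked up in table C3 or C4 for the chosen a_L and a_U.

```r
# S:      Kaplan-Meier estimate at time t
# sigma2: Greenwood sum sigma_S^2(t)
# n:      sample size
# crit:   c_alpha(a_L, a_U) for EP bands or k_alpha(a_L, a_U) for Hall-Wellner
#         bands; must be supplied by the user from the appropriate table
conf_band <- function(S, sigma2, n, crit,
                      band = c("EP", "HW"),
                      type = c("linear", "log-log", "arcsine")) {
  band <- match.arg(band)
  type <- match.arg(type)
  # Multiplier that takes the place of z * sigma_S(t) in the point-wise formulas
  m <- if (band == "EP") crit * sqrt(sigma2) else crit * (1 + n * sigma2) / sqrt(n)
  switch(type,
    "linear"  = c(S - m * S, S + m * S),
    "log-log" = {
      theta <- exp(m / log(S))
      c(S^(1 / theta), S^theta)
    },
    "arcsine" = {
      hw <- 0.5 * m * sqrt(S / (1 - S))
      c(sin(max(0, asin(sqrt(S)) - hw))^2,
        sin(min(pi / 2, asin(sqrt(S)) + hw))^2)
    })
}

# Illustration only: 2.5 is a made-up placeholder, NOT a value from table C3
crit_placeholder <- 2.5
conf_band(S = 0.862, sigma2 = 0.0019988, n = 80,
          crit = crit_placeholder, band = "EP", type = "log-log")
```

To draw a full band, the function would be applied at every event time between t_L and t_U with the same `crit`.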

Life-Table for Tongue Cancer

> sort(tongue$Time)
 [1]   1   1   3   3   3   4   4   5   5   8   8  10  12  13  13  13  16  16  18  23  24  26  26  27  27  28  30  30  30  32
[31]  41  42  51  56  61  62  65  67  67  69  70  72  73  74  76  77  79  80  81  87  87  88  89  91  93  93  96
[58]  97 100 101 104 104 104 104 104 108 109 112 120 129 131 150 157 167 176 181 231 231 240 400
> dat<-Surv(tongue$Time, tongue$Cens)
> full.mod<-survfit(dat~1)
> summary(full.mod)
Call: survfit(formula = dat ~ 1)

 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    1     80       2    0.975  0.0175       0.9414        1.000
    3     78       3    0.938  0.0271       0.8859        0.992
    4     75       2    0.913  0.0316       0.8526        0.977
    5     73       2    0.888  0.0353       0.8209        0.960
    8     71       1    0.875  0.0370       0.8054        0.951
   10     69       1    0.862  0.0386       0.7900        0.941
   12     68       1    0.850  0.0400       0.7747        0.932
…
  157      8       1    0.252  0.0634       0.1541        0.413
  167      7       1    0.216  0.0638       0.1213        0.386
  181      5       1    0.173  0.0640       0.0838        0.357

Example: Tongue Cancer

• Steps:

1. Choose an interval of time: $t_L = 10$ and $t_U = 200$

2. Define a's:

$$\sigma_S^2(t_L) = \sigma_S^2(10) = \frac{2}{80 \cdot 78} + \frac{3}{78 \cdot 75} + \frac{2}{75 \cdot 73} + \frac{2}{73 \cdot 71} + \frac{1}{71 \cdot 70} + \frac{1}{69 \cdot 68} = 0.0019988$$

$$\sigma_S^2(t_U) = \sigma_S^2(200) = \sigma_S^2(181) = 0.137027$$

$$a_L = \frac{n \sigma_S^2(t_L)}{1 + n \sigma_S^2(t_L)} = \frac{80 \cdot 0.0019988}{1 + 80 \cdot 0.0019988} = 0.1379$$

$$a_U = \frac{n \sigma_S^2(t_U)}{1 + n \sigma_S^2(t_U)} = \frac{80 \cdot 0.137027}{1 + 80 \cdot 0.137027} = 0.9164$$

3. Define confidence coefficient: $c_\alpha(a_L, a_U)$ from table C3 (or $k_\alpha(a_L, a_U)$ from table C4)

4. Use linear, (log), log-log, or arcsine approach…
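Step 2 can be checked numerically. The following base R sketch assumes the `d` and `Y` values are read off the full-sample life table for event times up to t_L = 10, and takes σ_S²(181) = 0.137027 as given on the slide, because the rows needed to recompute it are elided from the printed table. The helper name `a_coef` is illustrative.

```r
n <- 80

# Life-table rows with event times <= t_L = 10 (times 1, 3, 4, 5, 8, 10)
d <- c(2, 3, 2, 2, 1, 1)         # n.event
Y <- c(80, 78, 75, 73, 71, 69)   # n.risk

sigma2_L <- sum(d / (Y * (Y - d)))   # Greenwood sum at t_L
sigma2_U <- 0.137027                 # value at t = 181, taken from the slide

# a = n * sigma^2 / (1 + n * sigma^2)
a_coef <- function(sigma2, n) n * sigma2 / (1 + n * sigma2)

a_L <- a_coef(sigma2_L, n)
a_U <- a_coef(sigma2_U, n)
round(c(sigma2_L = sigma2_L, a_L = a_L, a_U = a_U), 4)
```

The resulting a_L and a_U are the row and column entries used to look up the confidence coefficient in step 3.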

Example: Confidence Bands (Nair)

[Figure: four panels of Nair confidence bands, titled Linear, Log, Loglog, and Arcsine-Sqrt; x-axis 0 to 250, y-axis 0.0 to 1.0 in each panel]

There is an R Package for That…

• R package “km.ci”
  – Will estimate different types of confidence intervals
    • But no arcsine square root
  – Will estimate both the Nair and Hall-Wellner confidence bands
    • But only the log-log transformed
• It does include the $c_\alpha$ and $k_\alpha$ tables from Klein and Moeschberger!
  – So… you could write your own confidence band function

Confidence Band Performance

• Linear is poor for n < 200
  – Poor coverage probability
• Log-log and arcsine square-root have pretty accurate coverage probabilities (even for n = 20)

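Mean Estimation

• Recall: $\mu = \int_0^\infty S(t)\, dt$
• Non-parametric approach
  – Can be done using the integral of $\hat{S}(t)$
  – Requires that the last observed time is not censored
    • “fixes” exist if it is censored, e.g. restricting the integral to $[0, \tau]$

$$\hat{\mu}_\tau = \int_0^\tau \hat{S}(t)\, dt \quad \& \quad \hat{V}ar(\hat{\mu}_\tau) = \sum_{i=1}^{D} \left[ \int_{t_i}^{\tau} \hat{S}(t)\, dt \right]^2 \frac{d_i}{Y_i (Y_i - d_i)}$$

$$100(1 - \alpha)\% \text{ CI}: \quad \hat{\mu}_\tau \pm z_{1-\alpha/2} \sqrt{\hat{V}ar(\hat{\mu}_\tau)}$$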

Example: 6-MP treatment for Leukemia

• Time in weeks to relapse for acute leukemia patients
• 21 patients observed for up to 35 weeks

Time  d_i  Y_i  S(t)
   6    3   21  0.857
   7    1   17  0.807
  10    1   15  0.753
  13    1   12  0.690
  16    1   11  0.628
  22    1    7  0.538
  23    1    6  0.448
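The mean restricted to the 35-week follow-up can be worked out from this table by treating the Kaplan-Meier curve as a step function and adding up rectangles. The base R sketch below assumes τ = 35 and recomputes S(t) from `d` and `Y` instead of using the rounded table values; the variable names are illustrative.

```r
time <- c(6, 7, 10, 13, 16, 22, 23)   # event times (weeks)
d    <- c(3, 1, 1, 1, 1, 1, 1)        # events at each time
Y    <- c(21, 17, 15, 12, 11, 7, 6)   # number at risk at each time
tau  <- 35                            # upper limit of the integral (assumed)

S <- cumprod(1 - d / Y)               # Kaplan-Meier estimate at each event time

# Step function: height 1 on [0, 6), then S[i] from time[i] to the next
# event time, with the last step running out to tau
knots   <- c(0, time, tau)
heights <- c(1, S)
pieces  <- diff(knots) * heights      # area of each rectangle

mu_hat <- sum(pieces)                 # restricted mean: integral of S from 0 to tau

# A[i] = integral of S from time[i] to tau (areas to the right of each event time)
A <- rev(cumsum(rev(pieces)))[-1]

var_mu <- sum(A^2 * d / (Y * (Y - d)))
se_mu  <- sqrt(var_mu)
ci     <- mu_hat + c(-1, 1) * qnorm(0.975) * se_mu

round(c(mean = mu_hat, se = se_mu, lower = ci[1], upper = ci[2]), 3)
```

With these inputs the restricted mean comes out at about 23.29 weeks with a standard error of about 2.83.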


Median Survival

• Any quantile of S(t) can be estimated in the same way

• The most common is median (p = 0.50)
• Definition of median survival:

$$\hat{x}_{0.5} = \inf\left\{ t : \hat{S}(t) \le 0.5 \right\}$$

• That is, the smallest event time for which $\hat{S}(t)$ is less than or equal to 0.50

Other Quantiles

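• For the pth quantile

$$\hat{x}_p = \inf\left\{ t : \hat{S}(t) \le 1 - p \right\}$$

• So for example, the 25th percentile (p = 0.25) is

$$\hat{x}_{0.25} = \inf\left\{ t : \hat{S}(t) \le 1 - 0.25 \right\} = \inf\left\{ t : \hat{S}(t) \le 0.75 \right\}$$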
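The definition translates almost directly into code. The sketch below uses a hypothetical helper, `km_quantile`, applied to the 6-MP life table from the mean-estimation example; it returns NA when the estimated curve never drops to 1 − p, which is an assumption about how to report a quantile that cannot be estimated.

```r
# Smallest event time t with S_hat(t) <= 1 - p; NA if the curve never gets that low
km_quantile <- function(time, surv, p) {
  idx <- which(surv <= 1 - p)
  if (length(idx) == 0) NA else time[min(idx)]
}

# 6-MP life table: event times and Kaplan-Meier estimates
time <- c(6, 7, 10, 13, 16, 22, 23)
surv <- c(0.857, 0.807, 0.753, 0.690, 0.628, 0.538, 0.448)

km_quantile(time, surv, 0.50)   # median
km_quantile(time, surv, 0.25)   # 25th percentile
km_quantile(time, surv, 0.75)   # 75th percentile
```

Here the median is 23 weeks and the 25th percentile is 13 weeks (S(10) = 0.753 is still above 0.75), while the 75th percentile is NA because the estimated curve never falls to 0.25.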

Precision of pth Quantile

• Technically difficult…
• Requires knowledge of the density function of X at $x_p$
• Approximation approach by Brookmeyer and Crowley (1982)
• Most commonly used approach for estimating a confidence interval for median survival

Brookmeyer-Crowley Approach

• For each observed t, estimate a z-score
• Example
  – 95% confidence interval
    • Calculate Z for each t
    • All t for which |Z| < 1.96 are included in the confidence interval
• Looks similar to an approach for estimating the confidence interval for a mean

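• Linear

$$Z = \frac{\hat{S}(t) - (1 - p)}{\sqrt{\hat{V}[\hat{S}(t)]}}$$

• Log-log

$$Z = \frac{\left\{ \ln\left[ -\ln \hat{S}(t) \right] - \ln\left[ -\ln(1 - p) \right] \right\} \hat{S}(t) \ln \hat{S}(t)}{\sqrt{\hat{V}[\hat{S}(t)]}}$$

• Arcsine square root

$$Z = \frac{2 \left\{ \arcsin\left[ \hat{S}(t)^{1/2} \right] - \arcsin\left[ (1 - p)^{1/2} \right] \right\} \left[ \hat{S}(t) \left( 1 - \hat{S}(t) \right) \right]^{1/2}}{\sqrt{\hat{V}[\hat{S}(t)]}}$$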

Example

• Kim paper
• Event = time to relapse
• Data (+ denotes a censored time):
  – 10, 20+, 35, 40+, 50+, 55, 70+, 71+, 80, 90+

[Figure: Kim Data. Kaplan-Meier and Nelson-Aalen estimates of the survival function; x-axis: Time to Relapse (months), 0 to 80; y-axis: Survival Function, 0.0 to 1.0]

Kim Data

 t  d_i  Y_i  S(t)  V(t)  Linear Z  Log-log Z  Arcsine Z
10    1   10
35    1    8
55    1    5
80    1    2
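The empty columns of this table can be filled in from t, d_i and Y_i alone. The base R sketch below is an illustration, not the lecture's own computation: it assumes p = 0.5 (the median), takes V(t) to be the Greenwood variance of the Kaplan-Meier estimate, and applies the three Brookmeyer-Crowley Z statistics at each event time.

```r
time <- c(10, 35, 55, 80)   # event times
d    <- c(1, 1, 1, 1)       # events at each time
Y    <- c(10, 8, 5, 2)      # number at risk at each time
p    <- 0.5                 # quantile of interest (median)

S <- cumprod(1 - d / Y)                  # Kaplan-Meier estimate
V <- S^2 * cumsum(d / (Y * (Y - d)))     # Greenwood variance of S

# Brookmeyer-Crowley test statistics at each event time
z_linear <- (S - (1 - p)) / sqrt(V)
z_loglog <- (log(-log(S)) - log(-log(1 - p))) * S * log(S) / sqrt(V)
z_arcsin <- 2 * (asin(sqrt(S)) - asin(sqrt(1 - p))) * sqrt(S * (1 - S)) / sqrt(V)

kim <- data.frame(t = time, d = d, Y = Y,
                  S = round(S, 4), V = round(V, 4),
                  linear = round(z_linear, 3),
                  loglog = round(z_loglog, 3),
                  arcsine = round(z_arcsin, 3))
print(kim)

# Event times kept in a 95% interval under the linear statistic
time[abs(z_linear) < qnorm(0.975)]
```

Among the four event times, the linear statistic stays inside ±1.96 only at t = 55 and t = 80. With so few events each statistic is evaluated at only four points, so the resulting interval is very coarse.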

CI Around Median

• Median for Kim data is 80 weeks
• CI around this value is the set of all points that satisfy the selected inequality
  – E.g. for 90%, z is 1.645
  – In the case of the Kim data, estimates are highly variable
  – Limitation with SMALL data set

For Next Time

• Left Truncated data
• Competing Risks