ST745: Survival Analysis: Nonparametric methods
Eric B. Laber
Department of Statistics, North Carolina State University
February 5, 2015
The KM estimator is used ubiquitously in medical studies to estimate and depict the fraction of patients living for a certain amount of time after treatment. It has since been applied to data from clinical trials of therapies for every disease from cancer to cardiology to concussion. (Science Life)
Paul Meier's work and the KM analysis have been responsible for saving millions of lives. (Significance)
Then and now
- Last time we discussed max-LH with censoring
  - Right-censoring schemes
  - Left-truncation
  - Interval censored data
  - Current status data
  - Estimating parametric models in R
  - Large sample theory and inference
- Today we'll discuss
  - Kaplan-Meier estimator and inference
  - Nelson-Aalen estimator and inference
  - Using R for nonpar estimation
Warm-up
- Explain to your stat buddy
  1. What's the difference between left-censoring and left-truncation?
  2. Give two examples of nonparametric estimators
  3. Pros and cons of nonparametric methods relative to parametric methods
  4. What is a confidence interval?
- True or false:
  - (T/F) Paul Meier is still alive
  - (T/F) The bootstrap is an asymptotic approximation
  - (T/F) The integral symbol $\int$ was invented by Gottfried Wilhelm Leibniz III
Things to recall
- For a discrete distribution with failure times $t_1, \ldots$
$$S(t) = \prod_{j : t_j < t} \left[1 - h(t_j)\right],$$
  where $h(t_j) = P(T = t_j \mid T \ge t_j)$
Family feud!
- I surveyed statisticians in SAS Hall for the five most important steps in an applied statistical analysis. What are they?
  1.
  2.
  3.
  4.
  5.
Complications due to censoring
- Consider making a simple visual display of lifetime data subject to right-censoring
- Why is this important?
- Consider making a histogram, what goes wrong?
- What about plotting the empirical CDF?
- Today we'll see how to make these plots (and more!)
Product limit estimator: warm-up
- Let $T_1, \ldots, T_n$ denote an iid sample (no censoring)
  - Empirical CDF
$$\hat F(t) = \frac{1}{n} \sum_{i=1}^{n} 1_{T_i \le t}$$
  - Empirical survival function (ESF)
$$\hat S(t) = \frac{1}{n} \sum_{i=1}^{n} 1_{T_i \ge t}$$
  - Does $\hat F(t) = 1 - \hat S(t)$ everywhere?
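A quick numerical check of whether $\hat F(t)$ and $1 - \hat S(t)$ agree can help here. The sketch below uses base R only; the sample `x` is made up for illustration and is not from the lecture.

```r
# Illustrative sample with a tie (hypothetical values, not from the lecture)
x <- c(2, 3, 3, 5, 8)
n <- length(x)

ecdf_hat <- function(t) sum(x <= t) / n   # \hat F(t): fraction with T_i <= t
esf_hat  <- function(t) sum(x >= t) / n   # \hat S(t): fraction with T_i >= t

# Between observed values the two estimators sum to one ...
between <- ecdf_hat(4) + esf_hat(4)
# ... but at an observed value both indicators count the observations at t
at_obs <- ecdf_hat(3) + esf_hat(3)
print(c(between = between, at_obs = at_obs))
```

The discrepancy at an observed value equals the fraction of observations tied at that value, because $T_i \le t$ and $T_i \ge t$ both hold when $T_i = t$.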
Ex. ECDF and ESF
[Figure: two step-function panels, the empirical CDF $\hat F(x)$ (left) and the empirical survival function $\hat S(x)$ (right), each plotted against $x$ from 0 to 15 with the vertical axis from 0 to 1]
- How big are the steps above?
Ex. ECDF and ESF cont’d
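    n <- 100
    x <- rchisq(n, df = 4)                            # iid sample, no censoring
    par(mar = c(5, 5, 4, 1) + 0.1, mfrow = c(1, 2))   # two panels side by side
    # ECDF: jumps of size 1/n at the sorted observations
    plot(stepfun(sort(x), c(0, (1:n) / n)), xlab = "x",
         ylab = expression(hat(F)(x)), main = "", lwd = 3)
    # ESF: starts at 1 and drops by 1/n at each observation
    plot(stepfun(sort(x), c(1, 1 - (1:n) / n)), xlab = "x",
         ylab = expression(hat(S)(x)), main = "", lwd = 3)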
Ex. ECDF and ESF cont’d
- If $t_1 < t_2 < \cdots < t_k$ are the distinct failure times
$$\hat S(t) = \frac{1}{n} \sum_{j=1}^{k} d_j 1_{t_j \ge t},$$
  where $d_j$ is the number of observations equal to $t_j$
- Why?
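One way to answer the "Why?" is to group the $n$ observations by the distinct value they take. This is a sketch of that regrouping, using only the notation already defined:

```latex
% Each observation equals exactly one distinct failure time t_j,
% and d_j observations share the value t_j.
\hat S(t) = \frac{1}{n}\sum_{i=1}^{n} 1_{T_i \ge t}
          = \frac{1}{n}\sum_{j=1}^{k} \sum_{i : T_i = t_j} 1_{t_j \ge t}
          = \frac{1}{n}\sum_{j=1}^{k} d_j\, 1_{t_j \ge t}
```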
ECDF and ESF under censoring
- When there is censoring
  - Number of points in an interval $[a, b]$ is unknown
  - Cannot compute ESF or ECDF
- Kaplan-Meier (KM) estimator (aka product limit estimator) is an analog of the ESF for right-censored data
- The original KM paper is the most highly cited statistics paper to date. What is the second most highly cited?
KM estimator
- Let $\{(t'_i, \delta_i)\}_{i=1}^{n}$ denote the observed data with distinct failure times $t_1 < t_2 < \cdots < t_k$ (these DO NOT include censoring times)
- Define
  - $d_j \triangleq \sum_{i=1}^{n} 1_{t'_i = t_j,\, \delta_i = 1}$ to be the number of failures at $t_j$
  - $n_j \triangleq \sum_{i=1}^{n} 1_{t'_i \ge t_j}$ to be the number at risk at $t_j$
- The KM estimator of $S(t)$ is
$$\hat S(t) = \prod_{j : t_j < t} \left(\frac{n_j - d_j}{n_j}\right)$$
- Explain $\hat S(t)$ intuitively to your stat buddy
Why does KM make sense?
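- Given $\{(t'_i, \delta_i)\}_{i=1}^{n}$ how can we estimate $h(t_j)$? (Assume discrete for now)
$$h(t_j) = P(T = t_j \mid T \ge t_j) \approx \frac{\#\text{fail at } t_j}{\#\text{at risk at } t_j} = \frac{d_j}{n_j}$$
  apply
$$S(t) = \prod_{j : t_j < t} \left[1 - h(t_j)\right] \approx \prod_{j : t_j < t} \left(1 - \frac{d_j}{n_j}\right) = \hat S(t)$$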
Ex. compute the KM estimator
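| $t$ | $\delta$ |
|---|---|
| 6 | 1 |
| 4 | 1 |
| 5 | 0 |
| 11 | 0 |
| 1 | 1 |
| 15 | 1 |
| 2 | 0 |

| $t_j$ | $n_j$ | $d_j$ | $(n_j - d_j)/n_j$ | $\hat S(t_j+)$ |
|---|---|---|---|---|
| 1 | | | | |
| 4 | | | | |
| 6 | | | | |
| 15 | | | | |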
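A hand computation of this table can be checked with a few lines of base R. The sketch below assumes the flattened data listing reads as the pairs (6,1), (4,1), (5,0), (11,0), (1,1), (15,1), (2,0); `km_table` is a hypothetical helper name and is not the contents of firstKM.R.

```r
# Data as read from the slide: observed time t and failure indicator delta (1 = failure)
time   <- c(6, 4, 5, 11, 1, 15, 2)
status <- c(1, 1, 0, 0, 1, 1, 0)

# Hypothetical helper: one row per distinct failure time t_j
km_table <- function(time, status) {
  tj <- sort(unique(time[status == 1]))
  nj <- vapply(tj, function(s) sum(time >= s), numeric(1))                # at risk at t_j
  dj <- vapply(tj, function(s) sum(time == s & status == 1), numeric(1))  # failures at t_j
  data.frame(tj = tj, nj = nj, dj = dj,
             ratio = (nj - dj) / nj,
             S_after = cumprod((nj - dj) / nj))   # KM estimate just after t_j
}

tab <- km_table(time, status)
print(tab)
```

Censored observations never appear as rows; they only shrink the risk set $n_j$ at later failure times.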
Code break I: Computing KM in R
- See file firstKM.R
Sanity check
- Claim: The KM estimator reduces to the ESF when there is no censoring. Why?
- Answer on board.
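One possible argument for the claim, sketched under the assumption of no censoring, so that everyone at risk at $t_j$ who does not fail there is still at risk at $t_{j+1}$:

```latex
% No censoring: n_1 = n and n_{j+1} = n_j - d_j, so the product telescopes.
% Take t with t_m < t \le t_{m+1}, and set n_{k+1} = 0 when m = k.
\hat S(t) = \prod_{j=1}^{m} \frac{n_j - d_j}{n_j}
          = \prod_{j=1}^{m} \frac{n_{j+1}}{n_j}
          = \frac{n_{m+1}}{n_1}
          = \frac{1}{n} \sum_{i=1}^{n} 1_{T_i \ge t}
```

The last step uses that no observation falls strictly between $t_m$ and $t_{m+1}$, so the number at risk at $t_{m+1}$ equals the number of observations with $T_i \ge t$.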
Code break II: Example 3.2.1 from Lawless
- See file ex321.R
Variance estimation
- A consistent estimator of the variance of $\hat S(t)$ is given by Greenwood's formula:
$$\hat\sigma_S^2(t) = \hat S^2(t) \sum_{j : t_j < t} \frac{d_j}{n_j (n_j - d_j)}$$
- When there is no censoring, this reduces to $\hat S(t)\{1 - \hat S(t)\}/n$. Why is this the right quantity?
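Greenwood's formula and its no-censoring reduction can be checked numerically. This is a minimal base R sketch; `km_greenwood` is a hypothetical helper, the censored data are the seven $(t, \delta)$ pairs as read from the worked example, and the uncensored sample `x` is made up for illustration.

```r
time   <- c(6, 4, 5, 11, 1, 15, 2)
status <- c(1, 1, 0, 0, 1, 1, 0)

# KM estimate and Greenwood variance at one time t (products and sums over t_j < t)
km_greenwood <- function(t, time, status) {
  tj <- sort(unique(time[status == 1 & time < t]))
  nj <- vapply(tj, function(s) sum(time >= s), numeric(1))
  dj <- vapply(tj, function(s) sum(time == s & status == 1), numeric(1))
  S  <- prod((nj - dj) / nj)                      # empty product is 1
  c(S = S, var = S^2 * sum(dj / (nj * (nj - dj))))
}

est <- km_greenwood(10, time, status)
print(est)

# No censoring (hypothetical sample): Greenwood should match S(1 - S)/n
x  <- c(2, 3, 3, 5, 8)
nc <- km_greenwood(4, x, rep(1, length(x)))
print(c(greenwood = unname(nc["var"]),
        binomial  = unname(nc["S"] * (1 - nc["S"]) / length(x))))
```

The sketch does not handle times at which $\hat S(t) = 0$, where a term of the sum has $n_j = d_j$ and the formula is undefined.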
KM as nonparametric MLE
- Recall our counting process notation
$$Y_i(t) = 1_{T_i \ge t,\ i\text{th subj not cens at } t}$$
$$dN_i(t) = Y_i(t)\, 1_{T_i = t}$$
$$dC_i(t) = Y_i(t)\, 1_{i\text{th subj cens at } t},$$
  we'll assume a discrete distribution with potential failure times $t = 0, 1, \ldots$
- With your stat buddy prove $\sum_{i=1}^{n} dN_i(t) = \sum_{i=1}^{n} Y_i(t)\, dN_i(t)$
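A short sketch of one way to prove the identity, using only the definitions of the at-risk indicator and of $dN_i(t)$:

```latex
% Y_i(t) takes values in {0, 1}, so Y_i(t)^2 = Y_i(t)
Y_i(t)\, dN_i(t) = Y_i(t)^2\, 1_{T_i = t} = Y_i(t)\, 1_{T_i = t} = dN_i(t)
\quad\Longrightarrow\quad
\sum_{i=1}^{n} Y_i(t)\, dN_i(t) = \sum_{i=1}^{n} dN_i(t)
```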
KM as nonparametric MLE cont’d
- Recall from our work on non-informative censoring that
$$L \propto \prod_{i=1}^{n} \prod_{t=0}^{\infty} h(t)^{dN_i(t)} \left[1 - h(t)\right]^{Y_i(t)\{1 - dN_i(t)\}}$$
- Note: We saw this en route to simplifying to an expression involving $f(t)$ and $S(t)$; for our purposes it will be convenient to use the above form.
KM as nonparametric MLE cont’d
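- The LH simplifies to
$$L \propto \prod_{t=0}^{\infty} h(t)^{d_t} \left[1 - h(t)\right]^{n_t - d_t},$$
  where $d_t \triangleq \sum_{i=1}^{n} dN_i(t)$, $n_t \triangleq \sum_{i=1}^{n} Y_i(t)$
- Why? Interchange products to obtain
$$\prod_{t=0}^{\infty} \prod_{i=1}^{n} h(t)^{dN_i(t)} \left[1 - h(t)\right]^{Y_i(t)\{1 - dN_i(t)\}} = \prod_{t=0}^{\infty} h(t)^{\sum_{i=1}^{n} dN_i(t)} \left[1 - h(t)\right]^{\sum_{i=1}^{n} Y_i(t)\{1 - dN_i(t)\}},$$
  and use $\sum_{i=1}^{n} Y_i(t)\, dN_i(t) = \sum_{i=1}^{n} dN_i(t) = d_t$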
KM as nonparametric MLE cont’d
- To obtain the nonparametric MLE we view $(h(0), h(1), \ldots)$ as our parameter and maximize $L$
- If $n_t = 0$ then there is no information about $h(t)$; let $\tau$ denote the largest $t$ s.t. $n_t > 0$, then
$$L \propto \prod_{t=0}^{\tau} h(t)^{d_t} \left[1 - h(t)\right]^{n_t - d_t},$$
  and the log-LH is
$$\ell = \sum_{t=0}^{\tau} \left\{ d_t \log h(t) + (n_t - d_t) \log\left(1 - h(t)\right) \right\}$$
KM as nonparametric MLE cont’d
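- Differentiate $\ell$ wrt $h(t)$ to obtain
$$\frac{\partial}{\partial h(t)} \ell = \frac{d_t}{h(t)} - \frac{n_t - d_t}{1 - h(t)},$$
  set this to zero and solve for $h(t)$ to obtain $\hat h(t) = d_t / n_t$
- Then
$$\hat S(t) = \prod_{j : t_j < t} \left[1 - \hat h(t_j)\right] = \prod_{j : t_j < t} \left[1 - \frac{d_j}{n_j}\right],$$
  is the MLE for $S(t)$ by the invariance property of the MLE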
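The closed form $\hat h(t) = d_t/n_t$ can be sanity-checked by maximizing one term of the log-LH numerically. This sketch uses base R's `optimize`; the counts `d` and `n` are illustrative values, not data from the lecture.

```r
# Log-likelihood contribution of a single time point t, as a function of h = h(t)
loglik_t <- function(h, d, n) d * log(h) + (n - d) * log(1 - h)

d <- 3; n <- 10   # illustrative counts (hypothetical)

# Maximize over h in (0, 1); extra arguments d and n are passed through to loglik_t
opt <- optimize(loglik_t, interval = c(1e-6, 1 - 1e-6), d = d, n = n, maximum = TRUE)
print(c(numerical = opt$maximum, closed_form = d / n))
```

Because the log-LH is a sum of such terms, one per time point, each $h(t)$ can be maximized separately.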
KM as nonpar MLE, enough already!
- Some things to note
  1. If the last obs time $\tau$ is a failure then $\hat S(t) \equiv 0$ for all $t > \tau$
  2. If the last obs time $\tau$ is a censoring time then $\hat S(t)$ is not defined for $t > \tau$
  3. MLE formulation is powerful since large sample theory can be used to study efficiency and conduct statistical inference
Fact from your past
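- Let $g$ be a smooth function from $\mathbb{R}$ into $\mathbb{R}$ then
$$g(\hat\theta_n) \approx g(\theta) + \nabla g(\theta)(\hat\theta_n - \theta)$$
  so that
$$\mathrm{Var}\, g(\hat\theta_n) \approx \left[\nabla g(\theta)\right]^2 \mathrm{Var}\, \hat\theta_n,$$
  thus we can approximate the variance of $\hat\theta_n$ via
$$\mathrm{Var}\, \hat\theta_n \approx \frac{1}{\left[\nabla g(\theta)\right]^2} \mathrm{Var}\, g(\hat\theta_n)$$
- Ex. Let $g(u) = \log u$ to obtain
$$\mathrm{Var}\, \hat S(t) \approx S^2(t)\, \mathrm{Var} \log \hat S(t)$$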
Computing Greenwood’s formula
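- If we can approximate the variance of $\log \hat S(t)$ then we can use the preceding expansion to approximate $\mathrm{Var}\, \hat S(t)$
- Recall the score function (derivative of log-LH) is
$$u(h(t)) = \frac{d_t}{h(t)} - \frac{n_t - d_t}{1 - h(t)},$$
  so that, evaluating at $\hat h(t) = d_t / n_t$,
$$u'(\hat h(t)) = -\frac{d_t}{\hat h^2(t)} - \frac{n_t - d_t}{\left(1 - \hat h(t)\right)^2} = -n_t \left[\frac{1}{\hat h(t)} + \frac{1}{1 - \hat h(t)}\right] = -\frac{n_t}{\hat h(t)\left(1 - \hat h(t)\right)}$$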
Computing Greenwood’s formula cont’d
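- Observed Fisher info is a diagonal matrix with entries
$$I_t = \frac{n_t}{\hat h(t)\left(1 - \hat h(t)\right)}$$
- Thus $(\hat h(0), \hat h(1), \ldots, \hat h(\tau))$ are asymptotically independent s.t.
$$\mathrm{Var} \log \hat S(t) = \mathrm{Var} \log \prod_{j : t_j < t} \left[1 - \hat h(t_j)\right] = \mathrm{Var} \sum_{j : t_j < t} \log\left[1 - \hat h(t_j)\right] \approx \sum_{j : t_j < t} \mathrm{Var} \log\left[1 - \hat h(t_j)\right]$$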
Computing Greenwood’s formula cont’d
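- We can estimate $\mathrm{Var} \log\left[1 - \hat h(t_j)\right]$ using our approx
$$\mathrm{Var} \log\left[1 - \hat h(t_j)\right] \approx \frac{\mathrm{Var}\, \hat h(t_j)}{\left(1 - \hat h(t_j)\right)^2} \approx \frac{I_{t_j}^{-1}}{\left(1 - \hat h(t_j)\right)^2} = \frac{\hat h(t_j)}{n_{t_j}\left(1 - \hat h(t_j)\right)}$$
- Putting it all together
$$\mathrm{Var}\left(\hat S(t)\right) \approx \hat S^2(t) \sum_{j : t_j < t} \frac{\hat h(t_j)}{n_{t_j}\left(1 - \hat h(t_j)\right)} = \hat S^2(t) \sum_{j : t_j < t} \frac{d_j}{n_j (n_j - d_j)},$$
  where we have used $n_{t_j} = n_j$ and $\hat h(t_j) = d_j / n_j$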
Computing Greenwood’s formula epilogue
- We glossed over some slippery technical details; for a rigorous treatment see advanced survival texts (e.g., Fleming and Harrington, 2005). For a treatment of infinite dimensional parameter spaces see Butch's semi-parametrics course.
Nelson-Aalen estimator
- One could obtain an estimator of the cumulative hazard via $-\log \hat S(t)$ (why?) but the following estimator is typically preferred
$$\hat H(t) \triangleq \sum_{j : t_j \le t} \frac{d_j}{n_j},$$
  this is called the Nelson-Aalen (pronounced OH-len) estimator
Ex. compute the NA estimator
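| $t$ | $\delta$ |
|---|---|
| 6 | 1 |
| 4 | 1 |
| 5 | 0 |
| 11 | 0 |
| 1 | 1 |
| 15 | 1 |
| 2 | 0 |

| $t_j$ | $n_j$ | $d_j$ | $d_j/n_j$ | $\hat H(t_j)$ |
|---|---|---|---|---|
| 1 | | | | |
| 4 | | | | |
| 6 | | | | |
| 15 | | | | |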
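The same check can be done for the Nelson-Aalen table. This base R sketch again assumes the data listing reads as (6,1), (4,1), (5,0), (11,0), (1,1), (15,1), (2,0), and it is not the contents of firstNA.R; the comparison with $-\log$ of the KM estimate is added for illustration.

```r
time   <- c(6, 4, 5, 11, 1, 15, 2)
status <- c(1, 1, 0, 0, 1, 1, 0)

tj <- sort(unique(time[status == 1]))                                   # distinct failure times
nj <- vapply(tj, function(s) sum(time >= s), numeric(1))                # at risk at t_j
dj <- vapply(tj, function(s) sum(time == s & status == 1), numeric(1))  # failures at t_j

# Nelson-Aalen: running sum of d_j / n_j over t_j <= t
na_tab <- data.frame(tj = tj, nj = nj, dj = dj, ratio = dj / nj, H = cumsum(dj / nj))
print(na_tab)

# Alternative estimator -log(KM), evaluated just after each t_j
km_after <- cumprod((nj - dj) / nj)
print(cbind(tj, H = na_tab$H, minus_log_S = -log(km_after)))
```

At the last failure time the KM estimate is 0, so $-\log \hat S$ is infinite there while the Nelson-Aalen estimate stays finite.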
Code break III: Computing NA in R
- See file firstNA.R
Plotting the NA estimator
- Plot of $\hat H(t)$ is informative for the shape of the hazard fn
  - $\hat H(t)$ linear implies constant hazard
  - $\hat H(t)$ convex implies monotone (increasing) hazard
  - Slope of $\hat H(t)$ approximates $h(t)$
Match the NA estimator with the true hazard
[Figure: three panels of Nelson-Aalen estimates $\hat H(t)$ plotted against time, and three panels of true hazard functions $h(t)$ plotted against time, to be matched]
Variance estimation
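- NA estimator is an MLE just like KM
- Variance estimator for $\hat H(t)$ is
$$\hat\sigma_H^2(t) = \sum_{j : t_j \le t} \frac{d_j (n_j - d_j)}{n_j^3},$$
  which can be derived using large-sample approximations: each term is $\hat h(t_j)\{1 - \hat h(t_j)\}/n_j$, the inverse Fisher information for $h(t_j)$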
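A minimal base R sketch of a variance estimate for the Nelson-Aalen estimator. It assumes the binomial-type form obtained by summing the inverse Fisher information $\hat h(t_j)\{1 - \hat h(t_j)\}/n_j$ from the Greenwood derivation over $t_j \le t$, and it reuses the seven $(t, \delta)$ pairs as read from the worked example.

```r
time   <- c(6, 4, 5, 11, 1, 15, 2)
status <- c(1, 1, 0, 0, 1, 1, 0)

tj <- sort(unique(time[status == 1]))
nj <- vapply(tj, function(s) sum(time >= s), numeric(1))
dj <- vapply(tj, function(s) sum(time == s & status == 1), numeric(1))

h_hat <- dj / nj                             # estimated discrete hazard at each t_j
H_hat <- cumsum(h_hat)                       # Nelson-Aalen estimate at each t_j
var_H <- cumsum(h_hat * (1 - h_hat) / nj)    # algebraically d_j (n_j - d_j) / n_j^3, summed

print(data.frame(tj, H_hat, se_H = sqrt(var_H)))
```

The last term contributes zero because everyone still at risk fails at the final failure time.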
Codebreak IV: NA on Example 3.2.1 from Lawless
- See file ex321NA.R
Confidence interval for S(t)
- Fact: For any fixed $t > 0$
$$\frac{\hat S(t) - S(t)}{\hat\sigma_S(t)} \rightsquigarrow N(0, 1)$$
- Stronger convergence results (simultaneous over all $t$) exist
- $(1 - \alpha) \times 100\%$ CI based on Greenwood's formula
$$\hat S(t) \pm z_{1-\alpha/2}\, \hat\sigma_S(t)$$
Alternative confidence intervals
- Greenwood's formula is intuitive but has drawbacks
  - CI generally does not perform well in small samples
  - Can generate a CI with endpoints outside of $(0, 1)$
- Recall our general strategy for modeling probabilities
  1. Transform to take values in $\mathbb{R}$
  2. Conduct estimation/inference on transformed scale
  3. Transform back to $(0, 1)$
Transformed confidence interval
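- Let $g(s)$ be a decreasing cts function from $(0, 1)$ onto $\mathbb{R}$, construct a CI for $g(S(t))$ then transform back via Taylor approx
- Define $\psi(t) \triangleq g(S(t))$ and $\hat\psi(t) \triangleq g(\hat S(t))$ then
$$\hat\sigma_\psi^2(t) \approx \left[\nabla g\{\hat S(t)\}\right]^2 \hat\sigma_S^2(t)$$
- Taylor series arguments show
$$P\left(-z_{1-\alpha/2} \le \frac{\hat\psi(t) - \psi(t)}{\hat\sigma_\psi(t)} \le z_{1-\alpha/2}\right) \approx 1 - \alpha$$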
Transformed confidence interval cont’d
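- Rearrange terms to obtain
$$P\left(\hat\psi(t) - z_{1-\alpha/2}\, \hat\sigma_\psi(t) \le \psi(t) \le \hat\psi(t) + z_{1-\alpha/2}\, \hat\sigma_\psi(t)\right) \approx 1 - \alpha$$
- Solve for $S(t)$ using $\psi(t) = g(S(t))$
$$P\left(g^{-1}\{\hat\psi(t) + z_{1-\alpha/2}\, \hat\sigma_\psi(t)\} \le S(t) \le g^{-1}\{\hat\psi(t) - z_{1-\alpha/2}\, \hat\sigma_\psi(t)\}\right) \approx 1 - \alpha$$
- Note the arguments within $g^{-1}$ have flipped
- Question: How do we know $g^{-1}$ exists and is decreasing?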
Transformed confidence interval cont’d
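- If $g(s) = \log(-\log(s))$
  - CI is
$$\left[\exp\{-\exp(\hat\psi(t) + z_{1-\alpha/2}\, \hat\sigma_\psi(t))\},\ \exp\{-\exp(\hat\psi(t) - z_{1-\alpha/2}\, \hat\sigma_\psi(t))\}\right]$$
  - Variance is
$$\hat\sigma_\psi^2(t) = \frac{\hat\sigma_S^2(t)}{\left[\hat S(t) \log \hat S(t)\right]^2}$$
- Another common choice is $g(s) = -\log(s)$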
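To see the two intervals side by side, here is a base R sketch that computes the plain Greenwood interval and the log-log interval at a single time point. The data are the seven $(t, \delta)$ pairs as read from the earlier worked example, and the evaluation time `t0 = 3` is an arbitrary choice made for illustration.

```r
time   <- c(6, 4, 5, 11, 1, 15, 2)
status <- c(1, 1, 0, 0, 1, 1, 0)
t0 <- 3; alpha <- 0.05
z  <- qnorm(1 - alpha / 2)

# KM estimate and Greenwood standard error at t0 (t_j < t0 convention)
tj <- sort(unique(time[status == 1 & time < t0]))
nj <- vapply(tj, function(s) sum(time >= s), numeric(1))
dj <- vapply(tj, function(s) sum(time == s & status == 1), numeric(1))
S  <- prod((nj - dj) / nj)
se <- S * sqrt(sum(dj / (nj * (nj - dj))))

# Plain interval: S +/- z * se
plain <- c(S - z * se, S + z * se)

# Log-log interval: work on psi = log(-log S), then transform back (endpoints flip)
psi    <- log(-log(S))
se_psi <- se / abs(S * log(S))
loglog <- c(exp(-exp(psi + z * se_psi)), exp(-exp(psi - z * se_psi)))

print(rbind(plain = plain, loglog = loglog))
```

With this small sample the plain interval has an upper endpoint above 1, while the log-log interval stays inside $(0, 1)$ by construction. The sketch does not handle $\hat S(t) \in \{0, 1\}$, where the transformation is undefined.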
Bootstrap: AKA the boostarp
- Eric draws a brilliant depiction of the bootstrap on the board
- Applause subsides
- A quiet moment of reflection reveals a new appreciation for the beauty of statistics in each of us
The boostarp cont’d
- Let $D = \{(T_i, \delta_i)\}_{i=1}^{n}$ denote the observed data and $P_n$ the empirical distribution
- A (nonparametric) bootstrap sample is a sample of size $n$, say $D^{(b)}$, drawn uniformly (with replacement) from $D$
  - $D^{(b)}$ is an i.i.d. draw of size $n$ from $P_n$
  - Other resample sizes are possible
- Standard percentile bootstrap CI for $S(t)$
  1. Draw $B$ nonparametric bootstrap samples, $D^{(1)}, \ldots, D^{(B)}$
  2. Compute $\hat S^{(b)}(t)$, KM on $D^{(b)}$, $b = 1, \ldots, B$
  3. Let $\ell_{\alpha/2}$ and $u_{1-\alpha/2}$ be the $(\alpha/2) \times 100$ and $(1 - \alpha/2) \times 100$ percentiles of $\hat S^{(1)}(t), \ldots, \hat S^{(B)}(t)$
  4. Final $(1 - \alpha) \times 100\%$ CI is $\left[\ell_{\alpha/2}, u_{1-\alpha/2}\right]$
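The four steps above can be sketched in base R as follows. The helper `km_at`, the seed, the evaluation time `t0`, and the number of resamples `B` are all illustrative assumptions; the data are the seven $(t, \delta)$ pairs as read from the earlier worked example.

```r
set.seed(745)
time   <- c(6, 4, 5, 11, 1, 15, 2)
status <- c(1, 1, 0, 0, 1, 1, 0)
n <- length(time)

# Hypothetical helper: KM estimate at a single time t (product over t_j < t)
km_at <- function(t, time, status) {
  tj <- sort(unique(time[status == 1 & time < t]))
  nj <- vapply(tj, function(s) sum(time >= s), numeric(1))
  dj <- vapply(tj, function(s) sum(time == s & status == 1), numeric(1))
  prod((nj - dj) / nj)   # empty product is 1 when no failures occur before t
}

t0 <- 5; B <- 2000; alpha <- 0.05

# Steps 1-2: resample (time, status) pairs with replacement and recompute KM
boot <- replicate(B, {
  idx <- sample.int(n, n, replace = TRUE)
  km_at(t0, time[idx], status[idx])
})

# Steps 3-4: percentile interval
ci <- quantile(boot, c(alpha / 2, 1 - alpha / 2), names = FALSE)
print(c(estimate = km_at(t0, time, status), lower = ci[1], upper = ci[2]))
```

Resampling whole $(T_i, \delta_i)$ pairs keeps each time attached to its own censoring indicator, which is what drawing from $P_n$ means here.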
Simulated experiment: coverage probabilities
- $T \sim \text{log-normal}(-1, 2)$, $C \sim \exp(1.75)$
- Sample size of $n = 200$ and 10K MC replications
- Compare coverage of Greenwood's formula with log-log transform
[Figure: coverage (vertical axis, 0.90 to 0.95) plotted against $t$ (horizontal axis, 0 to 0.35) for the Greenwood interval (●) and the log-log interval (+)]
See coverageExample.R
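A reduced version of such an experiment might look like the sketch below. It is not the contents of coverageExample.R. It assumes that log-normal$(-1, 2)$ means `meanlog = -1, sdlog = 2` and that $\exp(1.75)$ means rate 1.75, since the slide does not state the parameterization, and it uses far fewer replications and a coarse grid of $t$ values so that it runs quickly.

```r
set.seed(745)
n <- 200; reps <- 300; alpha <- 0.05
z <- qnorm(1 - alpha / 2)
t_grid <- c(0.05, 0.15, 0.25, 0.35)

# KM estimate and Greenwood standard error at time t (t_j < t convention)
km_gw <- function(t, time, status) {
  tj <- sort(unique(time[status == 1 & time < t]))
  nj <- vapply(tj, function(s) sum(time >= s), numeric(1))
  dj <- vapply(tj, function(s) sum(time == s & status == 1), numeric(1))
  S  <- prod((nj - dj) / nj)
  c(S = S, se = S * sqrt(sum(dj / (nj * (nj - dj)))))
}

# Does each interval cover the true S(t)? Undefined cases count as misses.
covers <- function(t, time, status) {
  est <- km_gw(t, time, status); S <- est[["S"]]; se <- est[["se"]]
  truth <- plnorm(t, meanlog = -1, sdlog = 2, lower.tail = FALSE)
  plain <- (S - z * se <= truth) && (truth <= S + z * se)
  psi <- log(-log(S)); se_psi <- se / abs(S * log(S))
  lo <- exp(-exp(psi + z * se_psi)); hi <- exp(-exp(psi - z * se_psi))
  c(greenwood = isTRUE(plain), loglog = isTRUE(lo <= truth && truth <= hi))
}

# One replication: simulate right-censored data, check coverage on the grid
one_rep <- function() {
  T_true <- rlnorm(n, meanlog = -1, sdlog = 2)
  C      <- rexp(n, rate = 1.75)
  sapply(t_grid, covers, time = pmin(T_true, C), status = as.numeric(T_true <= C))
}

cov_array <- replicate(reps, one_rep())          # 2 x length(t_grid) x reps
coverage  <- apply(cov_array, c(1, 2), mean)     # average over replications
colnames(coverage) <- t_grid
print(round(coverage, 3))
```

With only 300 replications the Monte Carlo error is roughly 0.01 to 0.02 per cell, so small differences between the two methods should not be over-read.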
Confidence intervals for quantiles
- In some settings a quantile is of interest
  - E.g., the median
  - Quantiles are often easier to estimate than moments
- Recall $t_p$ is the $p$th quantile of $T$
$$t_p = \inf\{t : 1 - S(t) \ge p\}$$
- Given an estimator $\hat S(t)$ of $S(t)$ we obtain
$$\hat t_p = \inf\{t : 1 - \hat S(t) \ge p\}$$
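Because the KM estimate is a step function, the infimum in $\hat t_p$ is the first failure time at which $1 - \hat S$ reaches $p$. The base R sketch below illustrates this; `km_quantile` is a hypothetical helper and the data are the seven $(t, \delta)$ pairs as read from the earlier worked example.

```r
time   <- c(6, 4, 5, 11, 1, 15, 2)
status <- c(1, 1, 0, 0, 1, 1, 0)

tj <- sort(unique(time[status == 1]))
nj <- vapply(tj, function(s) sum(time >= s), numeric(1))
dj <- vapply(tj, function(s) sum(time == s & status == 1), numeric(1))
S_after <- cumprod((nj - dj) / nj)   # KM estimate just after each t_j

# Smallest failure time with 1 - S_hat >= p; NA if the KM curve never gets that low
km_quantile <- function(p) {
  hit <- which(1 - S_after >= p)
  if (length(hit) == 0) NA_real_ else tj[hit[1]]
}

q <- sapply(c(0.25, 0.5, 0.9), km_quantile)
print(q)
```

The `NA` branch matters when the largest observation is censored, since the KM curve then never reaches 0 and high quantiles cannot be estimated.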
Confidence intervals for quantiles cont’d
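- For continuous $T$, $S(t_p) = 1 - p$
- Suppose $t_L = t_L(\text{Data})$ satisfies
$$P\left(S(t_L) \ge 1 - p\right) \ge 1 - \alpha,$$
  then $t_L$ is a lower confidence bound for $t_p$ (Why?)
- For any fixed $t$
$$P\left(S(t) \ge \hat S(t) - z_{1-\alpha/2}\, \hat\sigma_S(t)\right) \approx 1 - \alpha/2 \ge 1 - \alpha,$$
  solve $\hat S(t_L) - z_{1-\alpha/2}\, \hat\sigma_S(t_L) = 1 - p$ for $t_L$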
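Since $\hat S$ and $\hat\sigma_S$ are step functions, "solving" the last equation amounts in practice to finding the first failure time at which the lower Greenwood band drops to $1 - p$ or below. The base R sketch below takes that reading as an assumption, uses the seven $(t, \delta)$ pairs as read from the earlier worked example, and targets the median ($p = 0.5$).

```r
time   <- c(6, 4, 5, 11, 1, 15, 2)
status <- c(1, 1, 0, 0, 1, 1, 0)
p <- 0.5; alpha <- 0.05
z <- qnorm(1 - alpha / 2)

tj <- sort(unique(time[status == 1]))
nj <- vapply(tj, function(s) sum(time >= s), numeric(1))
dj <- vapply(tj, function(s) sum(time == s & status == 1), numeric(1))

S  <- cumprod((nj - dj) / nj)                      # KM just after each t_j
gw <- ifelse(nj > dj, dj / (nj * (nj - dj)), 0)    # Greenwood terms; set to 0 once S hits 0
se <- S * sqrt(cumsum(gw))

lower_band <- S - z * se
t_L   <- tj[which(lower_band <= 1 - p)[1]]         # first crossing of the lower band
t_hat <- tj[which(1 - S >= p)[1]]                  # point estimate of t_p
print(c(t_L = t_L, t_hat = t_hat))
```

The lower bound can never exceed the point estimate, because the lower band lies below $\hat S$ and therefore crosses $1 - p$ no later than $\hat S$ does.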