HSRP 734: Advanced Statistical Methods, July 10, 2008

Objectives
  Describe the Kaplan-Meier estimated survival curve
  Describe the log-rank test
  Use SAS to implement

Kaplan-Meier Estimate of Survival Function S(t)
  The Kaplan-Meier estimate of the survival function is a simple, useful and popular estimate for the survival function.
  This estimate incorporates both censored and noncensored observations
  Breaks the estimation problem down into small pieces

Kaplan-Meier Estimate of the Survival Function S(t)
  For grouped survival data,
$$\hat{S}(t) = \widehat{\Pr}(\text{survive beyond } t) = \prod_{j:\ \text{bins 1 thru } t} \left(1 - \frac{y_j L_j}{N_j}\right)$$
  Let interval lengths L_j become very small, all of length L = Δt, and let t_1, t_2, … be times of events (survival times)

Kaplan-Meier Estimate of the Survival Function S(t)
  2 cases to consider in the previous equation
  Case 1. No event in a bin (interval)
$$\frac{y_j L}{N_j} = \frac{0 \cdot L}{N_j} = 0 \quad\Rightarrow\quad 1 - \frac{y_j L}{N_j} = 1$$
  $\hat{S}(t)$ does not change, which means that we can ignore bins with no events

Kaplan-Meier Estimate of the Survival Function S(t)
  Case 2. y_j events occur in a bin (interval)
  Also: n_j persons enter the bin
  Assume any censored times that occur in the bin occur at the end of the bin
$$1 - \frac{y_j L}{N_j} = 1 - \frac{y_j\,\Delta t}{n_j\,\Delta t} = \frac{n_j - y_j}{n_j}$$

Kaplan-Meier Estimate of the Survival Function S(t)
  So, as Δt → 0, we get the Kaplan-Meier estimate of the survival function S(t)
$$\hat{S}(t) = \prod_{j:\ t_j \le t} \frac{n_j - y_j}{n_j}, \qquad \hat{S}(0) = 1 \ \text{(by convention)}$$
  Also called the “product-limit estimate” of the survival function S(t)
  Note: each conditional probability estimate is obtained from the observed number at risk for an event and the observed number of events: (n_j - y_j) / n_j
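
The limiting argument above can be checked numerically. The sketch below is illustrative only: the function name `grouped_survival` and the toy sample are hypothetical, and it assumes N_j = n_j × Δt, which is how the Case 2 equality reads. It multiplies the factor 1 - y_j L / N_j over every bin, including the empty ones, for shrinking bin widths.

```python
def grouped_survival(times, events, dt, t_end):
    """Product over bins (0, dt], (dt, 2*dt], ... up to t_end of 1 - y_j*L/N_j."""
    surv = 1.0
    for j in range(int(round(t_end / dt))):
        lo, hi = j * dt, (j + 1) * dt
        n_j = sum(1 for u in times if u > lo)  # persons entering bin j
        if n_j == 0:
            break  # nobody left at risk
        # events inside the bin; censored times are treated as occurring at the bin's end
        y_j = sum(1 for u, e in zip(times, events) if e == 1 and lo < u <= hi)
        person_time = n_j * dt  # N_j, ASSUMED equal to n_j * dt as in Case 2
        surv *= 1.0 - y_j * dt / person_time  # factor is 1 when y_j == 0 (Case 1)
    return surv


# Hypothetical toy sample: event = 1, censored = 0
toy_times = [2, 3, 3, 5, 8]
toy_events = [1, 1, 0, 1, 0]

for dt in (0.5, 0.25, 0.125):
    print(dt, grouped_survival(toy_times, toy_events, dt, t_end=5))
```

Every bin width gives the same value as the product-limit form, (4/5) × (3/4) × (1/2) = 0.3, because the empty bins contribute a factor of 1.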

Kaplan-Meier Estimate of Survival Function S(t)
  We begin by rank ordering the survival times (including the censored survival times)
  Define each interval as starting at an observed time and ending just before the next ordered time
  Identify the number at risk within each interval
  Identify the number of events within each interval
  Calculate the probability of surviving within that interval
  Calculate the survival function for that interval as the probability of surviving that interval times the probability of surviving to the start of that interval
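
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the SAS procedure used in the course; the function name `kaplan_meier` and the toy sample are hypothetical.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return rows (t_j, n_j, y_j, S_hat) at each distinct event time.

    times  : observed times (event or censoring)
    events : 1 if the time is an event, 0 if it is censored
    """
    event_counts = Counter(t for t, e in zip(times, events) if e == 1)
    rows = []
    surv = 1.0  # S_hat(0) = 1 by convention
    for t in sorted(event_counts):  # rank-ordered event times
        # At risk: everyone still under observation at t.
        # A censored time tied with t is counted as still at risk.
        n_j = sum(1 for u in times if u >= t)
        y_j = event_counts[t]
        surv *= (n_j - y_j) / n_j  # survive this interval, given survival so far
        rows.append((t, n_j, y_j, surv))
    return rows


# Hypothetical toy sample: event = 1, censored = 0
toy_times = [2, 3, 3, 5, 8]
toy_events = [1, 1, 0, 1, 0]
for t, n_j, y_j, s in kaplan_meier(toy_times, toy_events):
    print(f"t={t:>2}  at risk={n_j}  events={y_j}  S={s:.3f}")
```

For the toy sample the estimate is 0.800, 0.600 and 0.300 at t = 2, 3 and 5. The censored observation at t = 3 stays in the risk set for the event at t = 3 and leaves it afterwards.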

Example - AML
Group                         Weeks in remission (i.e., time to relapse)
Maintenance chemo (X=1)       9, 13, 13+, 18, 23, 28+, 31, 34, 45+, 48, 161+
No maintenance chemo (X=0)    5, 5, 8, 8, 12, 16+, 23, 27, 30+, 33, 43, 45
+ indicates a censored time to relapse; e.g., 13+ = more than 13 weeks to relapse

Example – AML
  Calculation of Kaplan-Meier estimates:
  In the “not maintained on chemotherapy” group:
$$\hat{S}(t_j) = \hat{S}(t_{j-1}) \times \frac{n_j - y_j}{n_j}$$
Time t_j   At risk n_j   Events y_j   Estimated S(t)
0          12            0            1.000
5          12            2            1.000 x ((12-2)/12) = 0.833
8          10            2            0.833 x ((10-2)/10) = 0.666
12         8             1            0.666 x ((8-1)/8) = 0.583
23         6             1            0.583 x ((6-1)/6) = 0.486
27         5             1            0.486 x ((5-1)/5) = 0.389
33         3             1            0.389 x ((3-1)/3) = 0.259
43         2             1            0.259 x ((2-1)/2) = 0.130
45         1             1            0.130 x ((1-1)/1) = 0

Example – AML (cont’d)
  In the “maintained on chemotherapy” group:
$$\hat{S}(t_j) = \hat{S}(t_{j-1}) \times \frac{n_j - y_j}{n_j}$$
Time t_j   At risk n_j   Events y_j   Estimated S(t)
0          11            0            1.000
9          11            1            1.000 x ((11-1)/11) = 0.909
13         10            1            0.909 x ((10-1)/10) = 0.818
18         8             1            0.818 x ((8-1)/8) = 0.716
23         7             1            0.716 x ((7-1)/7) = 0.614
31         5             1            0.614 x ((5-1)/5) = 0.491
34         4             1            0.491 x ((4-1)/4) = 0.368
48         2             1            0.368 x ((2-1)/2) = 0.184
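
As a cross-check on the two AML tables, the recursion can be run with exact fractions so that no rounding error accumulates from row to row. This is a sketch only; the function name `km_exact` and the (weeks, event) encoding of the "+" marks are choices made here, not part of the lecture.

```python
from fractions import Fraction

# AML data from the example: (weeks, 1 = relapse, 0 = censored "+")
not_maintained = [(5, 1), (5, 1), (8, 1), (8, 1), (12, 1), (16, 0),
                  (23, 1), (27, 1), (30, 0), (33, 1), (43, 1), (45, 1)]
maintained = [(9, 1), (13, 1), (13, 0), (18, 1), (23, 1), (28, 0),
              (31, 1), (34, 1), (45, 0), (48, 1), (161, 0)]


def km_exact(data):
    """Map each event time t_j to the exact product-limit estimate S_hat(t_j)."""
    surv = Fraction(1)
    out = {}
    for t in sorted({t for t, e in data if e == 1}):
        n_j = sum(1 for u, _ in data if u >= t)               # number at risk
        y_j = sum(1 for u, e in data if u == t and e == 1)    # events at t
        surv *= Fraction(n_j - y_j, n_j)
        out[t] = surv
    return out


for name, data in [("not maintained", not_maintained), ("maintained", maintained)]:
    print(name)
    for t, s in km_exact(data).items():
        print(f"  t={t:>3}  S={s}  ~ {float(s):.3f}")
```

The exact values agree with the tables to three decimals except at week 8 in the not-maintained group, where the exact estimate is 2/3 ≈ 0.667; the tabled 0.666 comes from multiplying the already-rounded 0.833 by 0.8.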

Example – AML (cont’d)
  The “Kaplan-Meier curve” plots the estimated survival function vs. time, with separate curves for each group
  [Figure: Kaplan-Meier curves of Survival (0.0 to 1.0) vs. Time (0 to 150), one curve for Maintained=0 and one for Maintained=1]

Example – AML (cont’d)
  Notes
  Can count the number of distinct event times by counting the number of steps; a step at a tied time covers more than one event
  If feasible, picture the censoring times on the graph as shown above.

Kaplan-Meier Estimate Using SAS

Comments on the Kaplan-Meier Estimate
  If the event and censoring times are tied, we assume that the censoring time is slightly larger than the death time.
  If the largest observation is an event, the Kaplan-Meier estimate drops to 0 at that time.
  If the largest observation is censored, the Kaplan-Meier estimate stays constant, above 0, after the last event time.

Comments on the Kaplan-Meier Estimate
  If we plot the empirical survival estimates, we observe a step function. If there are no ties and no censoring, the step function drops by 1/n at each event time.
  With every censored observation, the size of the subsequent steps increases.
  When does the number of intervals equal the number of deaths in the sample?
  When does the number of intervals equal n?
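
The two remarks about step sizes can be illustrated with exact arithmetic. The sketch below uses a hypothetical five-person sample and a hypothetical helper `km_drops`; it is not taken from the lecture.

```python
from fractions import Fraction


def km_drops(times, events):
    """Return the size of each downward step of the K-M curve, in time order."""
    surv = Fraction(1)
    drops = []
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        n_j = sum(1 for u in times if u >= t)
        y_j = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        new_surv = surv * Fraction(n_j - y_j, n_j)
        drops.append(surv - new_surv)
        surv = new_surv
    return drops


times = [1, 2, 3, 4, 5]  # hypothetical sample, no ties

no_censoring = km_drops(times, [1, 1, 1, 1, 1])
one_censored = km_drops(times, [1, 0, 1, 1, 1])  # observation at t = 2 censored

print([str(d) for d in no_censoring])
print([str(d) for d in one_censored])
```

With no censoring every step is 1/5. Censoring the observation at t = 2 leaves the first step at 1/5 and enlarges each later step to 4/15, because the censored person's share of the remaining probability is spread over those still at risk.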

Comments on the Kaplan-Meier Estimate
  The Kaplan-Meier estimate is a consistent estimate of the true S(t). That means that as the sample size gets large, the KM estimate converges to the true value.
  The Kaplan-Meier estimate can be used to empirically estimate any cumulative distribution function, since F(t) = 1 - S(t)

Comments on the Kaplan-Meier Estimate
  The step function in the K-M curve really looks like this:
  [Figure: K-M step function]
  If you have a failure at t_1 then you want to say survivorship at t_1 should be less than 1.
  This convention at the jump points matters for small data sets, but for large data sets it does not matter.

Confidence Interval for S(t) – Greenwood’s Formula
  Greenwood’s formula for the variance of $\hat{S}(t)$:
$$\widehat{\mathrm{Var}}\big(\hat{S}(t)\big) = \hat{S}(t)^2 \sum_{j:\ t_j \le t} \frac{y_j}{n_j\,(n_j - y_j)}$$
  Using Greenwood’s formula, an approximate 95% CI for S(t) is
$$\hat{S}(t) \pm 1.96 \sqrt{\widehat{\mathrm{Var}}\big(\hat{S}(t)\big)}$$
  There is a “problem”: the 95% CI is not constrained to lie within the interval (0,1)
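
Greenwood's formula and the "problem" with its interval can be seen on the not-maintained AML group. The sketch below is illustrative; the function name `km_greenwood` is hypothetical, and the row where the estimate reaches 0 is reported without a standard error because the Greenwood term divides by n_j - y_j = 0 there.

```python
import math


def km_greenwood(times, events, z=1.96):
    """Rows (t_j, S_hat, se, lower, upper) using Greenwood's variance formula."""
    surv = 1.0
    gsum = 0.0  # running sum of y_j / (n_j * (n_j - y_j))
    rows = []
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        n_j = sum(1 for u in times if u >= t)
        y_j = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        surv *= (n_j - y_j) / n_j
        if n_j == y_j:
            # Everyone at risk had the event: S_hat = 0, Greenwood term undefined
            rows.append((t, surv, float("nan"), float("nan"), float("nan")))
            break
        gsum += y_j / (n_j * (n_j - y_j))
        se = surv * math.sqrt(gsum)
        rows.append((t, surv, se, surv - z * se, surv + z * se))
    return rows


# Not-maintained AML group from the example (1 = relapse, 0 = censored)
nm_times = [5, 5, 8, 8, 12, 16, 23, 27, 30, 33, 43, 45]
nm_events = [1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1]

for t, s, se, lo, hi in km_greenwood(nm_times, nm_events):
    print(f"t={t:>2}  S={s:.3f}  se={se:.3f}  CI=({lo:.3f}, {hi:.3f})")
```

At week 5 the upper limit is above 1, which is the unconstrained-interval problem noted above.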

Confidence Interval for S(t) – Alternative Formula
  Based on log(-log(S(t))), which ranges from -∞ to ∞
  Find the standard error of the above, find the CI of the above, then transform the CI to one for S(t)
  This CI will lie within the interval [0,1]
  This is the default in SAS
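
A sketch of the transformation follows. It assumes the usual delta-method standard error for log(-log S(t)), namely sqrt(sum of y_j / (n_j (n_j - y_j))) / |log S(t)|; that formula and the function name `loglog_ci` are assumptions made here and are not stated in the lecture. Exact agreement with SAS output is not claimed.

```python
import math


def loglog_ci(surv, gsum, z=1.96):
    """CI for S(t) from a symmetric interval on log(-log S(t)).

    surv : K-M estimate at t, strictly between 0 and 1
    gsum : Greenwood sum, sum over t_j <= t of y_j / (n_j * (n_j - y_j))
    """
    if not 0.0 < surv < 1.0:
        raise ValueError("log(-log S) is undefined unless 0 < S < 1")
    se_theta = math.sqrt(gsum) / abs(math.log(surv))  # ASSUMED delta-method s.e.
    # theta +/- z*se on the log(-log) scale maps back to S ** exp(+/- z*se)
    lower = surv ** math.exp(z * se_theta)
    upper = surv ** math.exp(-z * se_theta)
    return lower, upper


# Not-maintained AML group at week 5: n_j = 12, y_j = 2
s5 = 10 / 12
g5 = 2 / (12 * 10)
lo, hi = loglog_ci(s5, g5)
print(f"S(5) = {s5:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

Both limits stay inside (0, 1) by construction, since a number between 0 and 1 raised to a positive power remains between 0 and 1.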

Log-rank test for comparing survivor curves
  Are two survivor curves the same?
  Use the times of events: t_1, t_2, ... (do not include censoring times)
  Treat each event and its “set of persons still at risk” (i.e., risk set) at each time t_j as an independent table
  Make a 2×2 table at each t_j
           Event   No Event       Total
Group A    a_j     n_jA - a_j     n_jA
Group B    c_j     n_jB - c_j     n_jB
Total      d_j     n_j - d_j      n_j

Log-rank test for comparing survivor curves
  At each event time t_j, under the assumption of equal survival (i.e., S_A(t) = S_B(t)), the expected number of events in Group A out of the total events (d_j = a_j + c_j) is in proportion to the number at risk in Group A relative to the total at risk at time t_j:
  Ea_j = d_j x n_jA / n_j
  Differences between a_j and Ea_j represent evidence against the null hypothesis of equal survival in the two groups

Log-rank test for comparing survivor curves
  Use the Cochran-Mantel-Haenszel idea of pooling over events j to get the log-rank chi-squared statistic with one degree of freedom
$$\hat{\chi}^2 = \frac{\left[\sum_j (a_j - Ea_j)\right]^2}{\sum_j \widehat{\mathrm{Var}}(a_j)} \sim \chi^2_1$$
$$\widehat{\mathrm{Var}}(a_j) = \frac{d_j\,(n_j - d_j)\,n_{jA}\,n_{jB}}{n_j^2\,(n_j - 1)}$$
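
The statistic can be computed directly from the 2×2 tables. The sketch below applies it to the AML data as listed in this lecture, with Group A taken to be the maintained group (a labelling chosen here); the function name `logrank` is hypothetical.

```python
import math


def logrank(times_a, events_a, times_b, events_b):
    """Return (observed_a, expected_a, variance, chi2) for the two-group log-rank test."""
    all_times = times_a + times_b
    all_events = events_a + events_b
    event_times = sorted({t for t, e in zip(all_times, all_events) if e == 1})
    obs_a = exp_a = var = 0.0
    for t in event_times:  # one 2x2 table per distinct event time
        n_a = sum(1 for u in times_a if u >= t)  # n_jA
        n_b = sum(1 for u in times_b if u >= t)  # n_jB
        a = sum(1 for u, e in zip(times_a, events_a) if u == t and e == 1)
        c = sum(1 for u, e in zip(times_b, events_b) if u == t and e == 1)
        n, d = n_a + n_b, a + c
        obs_a += a
        exp_a += d * n_a / n                      # Ea_j
        if n > 1:
            var += d * (n - d) * n_a * n_b / (n * n * (n - 1))
    chi2 = (obs_a - exp_a) ** 2 / var
    return obs_a, exp_a, var, chi2


# AML data from the example (1 = relapse, 0 = censored)
m_times = [9, 13, 13, 18, 23, 28, 31, 34, 45, 48, 161]
m_events = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0]
nm_times = [5, 5, 8, 8, 12, 16, 23, 27, 30, 33, 43, 45]
nm_events = [1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1]

obs_a, exp_a, var, chi2 = logrank(m_times, m_events, nm_times, nm_events)
p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-squared with 1 df
print(f"observed={obs_a:.0f} expected={exp_a:.2f} chi2={chi2:.2f} p={p_value:.3f}")
```

The maintained group has fewer relapses than expected under equal survival (7 observed against about 10.1 expected), giving a chi-squared statistic of about 2.6 on one degree of freedom.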

Log-rank test for comparing survivor curves
  Idea summary:
  Create a 2x2 table at each uncensored failure time
  The construction of each 2x2 table is based on the corresponding risk set
  Combine information from all the tables
  The null hypothesis is S_A(t) = S_B(t) for all time t.

Comparisons across Groups
  Extensions of the log-rank test to several groups require knowledge of matrix algebra. In general, these tests are well approximated by a chi-squared distribution with G-1 degrees of freedom, where G is the number of groups.
  Alternative tests:
    Wilcoxon family of tests (including Peto test)
    Likelihood ratio test (SAS)

Comparison between Log-Rank and Wilcoxon Tests
  The log-rank test weights each failure time equally. No parametric model is assumed for failure times within a stratum.
  The Wilcoxon test weights each failure time by a function of the number at risk. Thus, more weight tends to be given to early failure times. As in the log-rank test, no parametric model is assumed for failure times within a stratum.
  Between these two tests (Wilcoxon and log-rank tests), the Wilcoxon test will tend to be better at picking up early departures from the null hypothesis and the log-rank test will tend to be more sensitive to departures in the tail.
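
The difference between the two tests is only in the weight given to each event time. The sketch below assumes the Gehan form of the Wilcoxon test, with weight w_j = n_j (the total number at risk), and weight 1 for the log-rank test; the lecture says only "a function of the number at risk", so the specific weight and the function name `weighted_logrank` are assumptions.

```python
def weighted_logrank(group_a, group_b, weight):
    """Chi-squared statistic with per-time weights w_j = weight(n_j).

    group_a, group_b : lists of (time, event) pairs, event 1 = failure, 0 = censored
    weight           : function of the total number at risk n_j
    """
    pooled = group_a + group_b
    num = var = 0.0
    for t in sorted({t for t, e in pooled if e == 1}):
        n_a = sum(1 for u, _ in group_a if u >= t)
        n_b = sum(1 for u, _ in group_b if u >= t)
        a = sum(1 for u, e in group_a if u == t and e == 1)
        d = sum(1 for u, e in pooled if u == t and e == 1)
        n = n_a + n_b
        w = weight(n)
        num += w * (a - d * n_a / n)              # weighted observed minus expected
        if n > 1:
            var += w * w * d * (n - d) * n_a * n_b / (n * n * (n - 1))
    return num * num / var


# AML data from the example
maintained = [(9, 1), (13, 1), (13, 0), (18, 1), (23, 1), (28, 0),
              (31, 1), (34, 1), (45, 0), (48, 1), (161, 0)]
not_maintained = [(5, 1), (5, 1), (8, 1), (8, 1), (12, 1), (16, 0),
                  (23, 1), (27, 1), (30, 0), (33, 1), (43, 1), (45, 1)]

logrank_chi2 = weighted_logrank(maintained, not_maintained, lambda n: 1.0)
wilcoxon_chi2 = weighted_logrank(maintained, not_maintained, lambda n: float(n))  # ASSUMED Gehan weights
print(f"log-rank chi2 = {logrank_chi2:.3f}, Wilcoxon (Gehan) chi2 = {wilcoxon_chi2:.3f}")
```

Because n_j shrinks over time, the Gehan weights are largest at the earliest event times, which is why this version responds more to early differences between the curves.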

Comparison with Likelihood Ratio Test in SAS
  The likelihood ratio test employed in SAS assumes the data within the various strata are exponentially distributed and censoring is non-informative. Thus, this is a parametric method that smooths across the entire curve.
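
A likelihood ratio test under the exponential assumption can be sketched as follows. This is a textbook construction, comparing a separate constant hazard per group against one common hazard; it is not claimed to reproduce SAS's computation exactly, and the function name `exponential_lrt` is hypothetical.

```python
import math


def exponential_lrt(groups):
    """Likelihood ratio chi-squared (df = number of groups - 1) under exponential models.

    groups : list of groups, each a list of (time, event) pairs
    """
    def max_loglik(d, total_time):
        # Exponential log-likelihood at the MLE rate d / T is d*log(d/T) - d
        return d * math.log(d / total_time) - d if d > 0 else 0.0

    d_all = 0.0
    t_all = 0.0
    ll_separate = 0.0
    for g in groups:
        d = sum(e for _, e in g)            # events in the group
        total_time = sum(t for t, _ in g)   # total follow-up time in the group
        ll_separate += max_loglik(d, total_time)
        d_all += d
        t_all += total_time
    return 2.0 * (ll_separate - max_loglik(d_all, t_all))


# AML data from the example
maintained = [(9, 1), (13, 1), (13, 0), (18, 1), (23, 1), (28, 0),
              (31, 1), (34, 1), (45, 0), (48, 1), (161, 0)]
not_maintained = [(5, 1), (5, 1), (8, 1), (8, 1), (12, 1), (16, 0),
                  (23, 1), (27, 1), (30, 0), (33, 1), (43, 1), (45, 1)]

lr = exponential_lrt([maintained, not_maintained])
print(f"exponential LR chi2 = {lr:.3f} on 1 df")
```

The test uses only each group's event count and total follow-up time (7 events in 423 weeks against 10 events in 255 weeks), which is the sense in which it smooths across the entire curve.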