HSRP 734: Advanced Statistical Methods
July 10, 2008

Objectives

- Describe the Kaplan-Meier estimated survival curve
- Describe the log-rank test
- Use SAS to implement these methods

Kaplan-Meier Estimate of Survival Function S(t)

The Kaplan-Meier estimate of the survival function is a simple, useful and popular estimate of the survival function.
- It incorporates both censored and noncensored observations.
- It breaks the estimation problem down into small pieces.

Kaplan-Meier Estimate of the Survival Function S(t)

For grouped survival data, let the interval lengths L_j become very small, all of length L = Δt, and let t_1, t_2, … be the times of events (survival times). With y_j events and N_j person-time at risk in bin j, the estimated survival function is

$$\hat S(t) = \widehat{\Pr}(\text{survive beyond } t) = \prod_{j:\ \text{bins through } t} \left(1 - \frac{y_j L_j}{N_j}\right)$$


There are 2 cases to consider in the previous equation.

Case 1. No event in a bin (interval), i.e., y_j = 0:

$$1 - \frac{y_j L_j}{N_j} = 1 - \frac{0 \cdot L_j}{N_j} = 1$$

so Ŝ(t) does not change, which means that we can ignore bins with no events.

Kaplan-Meier Estimate of the Survival Function S(t)

Case 2. y_j events occur in a bin (interval).
- Also, n_j persons enter the bin.
- Assume any censored times that occur in the bin occur at the end of the bin.

With L_j = Δt and a person-time at risk of N_j ≈ n_j Δt:

$$1 - \frac{y_j L_j}{N_j} = 1 - \frac{y_j \, \Delta t}{n_j \, \Delta t} = \frac{n_j - y_j}{n_j}$$


So, as Δt → 0, we get the Kaplan-Meier estimate of the survival function S(t):

$$\hat S(t) = \prod_{j:\ t_j \le t} \frac{n_j - y_j}{n_j}, \qquad \hat S(0) = 1 \text{ (by convention)}$$

- It is also called the “product-limit estimate” of the survival function S(t).
- Note: each conditional probability estimate, (n_j − y_j)/n_j, is obtained from the observed number at risk for an event (n_j) and the observed number of events (y_j).

Kaplan-Meier Estimate of Survival Function S(t)

We begin by:
1. Rank-ordering the survival times (including the censored survival times).
2. Defining each interval as starting at an observed time and ending just before the next ordered time.
3. Identifying the number at risk within each interval.
4. Identifying the number of events within each interval.
5. Calculating the probability of surviving within that interval.
6. Calculating the survival function for that interval as the probability of surviving that interval times the probability of surviving to the start of that interval.
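As a sketch of these steps (not the SAS implementation), the Python function below computes the product-limit estimate. It assumes the data are coded as two parallel lists, `times` and `events` (1 = event, 0 = censored "+"); the function name `kaplan_meier` is hypothetical. Exact `Fraction` arithmetic is used so that chaining the products is not affected by rounding.

```python
from fractions import Fraction


def kaplan_meier(times, events):
    """Product-limit estimate S_hat(t_j) at each distinct event time.

    times  -- observed times (event or censoring)
    events -- 1 if the time is an observed event, 0 if it is censored
    Returns a list of (t_j, n_j, y_j, S_hat(t_j)) tuples.
    """
    # Rank-order all observed times, censored ones included.
    data = sorted(zip(times, events))
    event_times = sorted({t for t, e in data if e == 1})
    surv = Fraction(1)  # S_hat(0) = 1 by convention
    rows = []
    for t_j in event_times:
        # Number at risk: everyone whose observed time is >= t_j. A censored
        # time tied with t_j stays in the risk set (censoring is treated as
        # happening just after the event).
        n_j = sum(1 for t, _ in data if t >= t_j)
        # Number of events at t_j.
        y_j = sum(1 for t, e in data if t == t_j and e == 1)
        # Probability of surviving this interval, times survival to its start.
        surv *= Fraction(n_j - y_j, n_j)
        rows.append((t_j, n_j, y_j, surv))
    return rows


# AML example data: censored ("+") times are coded with event = 0.
maintained_times = [9, 13, 13, 18, 23, 28, 31, 34, 45, 48, 161]
maintained_events = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0]
not_maintained_times = [5, 5, 8, 8, 12, 16, 23, 27, 30, 33, 43, 45]
not_maintained_events = [1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1]

for label, ts, es in [("X=0", not_maintained_times, not_maintained_events),
                      ("X=1", maintained_times, maintained_events)]:
    print(label)
    for t_j, n_j, y_j, s in kaplan_meier(ts, es):
        print(f"  t={t_j:>3}  n={n_j:>2}  y={y_j}  S={float(s):.3f}")
```

Applied to the AML data, this lists the t_j, n_j, y_j and Ŝ(t_j) values of the hand calculations for each group. The tied pair 13 and 13+ keeps the censored subject in the risk set at 13 weeks, matching the tie convention in the comments on the Kaplan-Meier estimate.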

Example - AML

Group                          Weeks in remission (i.e., time to relapse)
Maintenance chemo (X=1)        9, 13, 13+, 18, 23, 28+, 31, 34, 45+, 48, 161+
No maintenance chemo (X=0)     5, 5, 8, 8, 12, 16+, 23, 27, 30+, 33, 43, 45

+ indicates a censored time to relapse; e.g., 13+ = more than 13 weeks to relapse.

Example – AML

Calculation of Kaplan-Meier estimates in the “not maintained on chemotherapy” group, using Ŝ(t_j) = Ŝ(t_{j−1}) × (n_j − y_j)/n_j (each product is carried from the unrounded previous value):

Time t_j   At risk n_j   Events y_j   Ŝ(t_j)
   0           12            0        1.000
   5           12            2        1.000 × ((12−2)/12) = 0.833
   8           10            2        0.833 × ((10−2)/10) = 0.667
  12            8            1        0.667 × ((8−1)/8) = 0.583
  23            6            1        0.583 × ((6−1)/6) = 0.486
  27            5            1        0.486 × ((5−1)/5) = 0.389
  33            3            1        0.389 × ((3−1)/3) = 0.259
  43            2            1        0.259 × ((2−1)/2) = 0.130
  45            1            1        0.130 × ((1−1)/1) = 0

Example – AML (cont’d)

In the “maintained on chemotherapy” group, using Ŝ(t_j) = Ŝ(t_{j−1}) × (n_j − y_j)/n_j:

Time t_j   At risk n_j   Events y_j   Ŝ(t_j)
   0           11            0        1.000
   9           11            1        1.000 × ((11−1)/11) = 0.909
  13           10            1        0.909 × ((10−1)/10) = 0.818
  18            8            1        0.818 × ((8−1)/8) = 0.716
  23            7            1        0.716 × ((7−1)/7) = 0.614
  31            5            1        0.614 × ((5−1)/5) = 0.491
  34            4            1        0.491 × ((4−1)/4) = 0.368
  48            2            1        0.368 × ((2−1)/2) = 0.184

Example – AML (cont’d)

The “Kaplan-Meier curve” plots the estimated survival function vs. time, with separate curves for each group.

[Figure: Kaplan-Meier curves, Survival vs. Time, for Maintained=0 and Maintained=1]

Example – AML (cont’d)

Notes
- The number of steps in a curve equals the number of distinct event times. This equals the total number of events only when no two events share a time: in the not-maintained group, 10 events produce 8 steps because of the ties at 5 and 8 weeks.
- If feasible, mark the censoring times on the graph, as in the figure above.

Kaplan-Meier Estimate Using SAS

Comments on the Kaplan-Meier Estimate

- If an event time and a censoring time are tied, we assume that the censoring time is slightly larger than the event (death) time.
- If the largest observation is an event, the Kaplan-Meier estimate drops to 0 at that time.
- If the largest observation is censored, the Kaplan-Meier estimate remains constant at its last value forever.


- If we plot the empirical survival estimates, we observe a step function. If there are no ties and no censoring, the step function drops by 1/n at each event time.
- With every censored observation, the size of the subsequent steps increases.
- When does the number of intervals equal the number of deaths in the sample?
- When does the number of intervals equal n?
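A small illustration of the two step-size statements, using made-up times 2, 4, 6, 8, 10 (hypothetical data) and a hypothetical helper `km_step_sizes` that returns the drop in Ŝ(t) at each distinct event time:

```python
from fractions import Fraction


def km_step_sizes(times, events):
    """Size of each downward step of S_hat, one per distinct event time."""
    surv = Fraction(1)
    steps = []
    for t_j in sorted({t for t, e in zip(times, events) if e == 1}):
        n_j = sum(1 for t in times if t >= t_j)                           # at risk
        y_j = sum(1 for t, e in zip(times, events) if t == t_j and e == 1)  # events
        new_surv = surv * Fraction(n_j - y_j, n_j)
        steps.append(surv - new_surv)
        surv = new_surv
    return steps


times = [2, 4, 6, 8, 10]
# No ties, no censoring: every step is 1/n = 1/5.
print(km_step_sizes(times, [1, 1, 1, 1, 1]))
# Censor the observation at 4: the three later steps each grow to 4/15.
print(km_step_sizes(times, [1, 0, 1, 1, 1]))
```

Each later step equals Ŝ(t_{j−1})/n_j; once the censored subject has left the risk set, this works out to 4/15 here, which is larger than 1/5.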


- The Kaplan-Meier estimate is a consistent estimate of the true S(t). That means that as the sample size gets large, the KM estimate converges to the true value.
- The Kaplan-Meier estimate can be used to empirically estimate any cumulative distribution function, via F̂(t) = 1 − Ŝ(t).


In the K-M curve, each drop occurs at the event time itself: if you have a failure at t_1, then you want to say that survivorship at t_1 is already less than 1 (the product in Ŝ(t) runs over t_j ≤ t).

For small data sets this matters, but for large data sets it does not.

Confidence Interval for S(t) – Greenwood’s Formula

Greenwood’s formula for the variance of Ŝ(t):

$$\widehat{\mathrm{Var}}\big(\hat S(t)\big) = \big[\hat S(t)\big]^2 \sum_{j:\ t_j \le t} \frac{y_j}{n_j (n_j - y_j)}$$

Using Greenwood’s formula, an approximate 95% CI for S(t) is

$$\hat S(t) \pm 1.96 \sqrt{\widehat{\mathrm{Var}}\big(\hat S(t)\big)}$$

There is a “problem”: this 95% CI is not constrained to lie within the interval (0,1).
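The Python sketch below applies Greenwood’s formula to the “not maintained” AML group at t = 43 weeks, where Ŝ(t) is small; the helper name `greenwood_ci` and the choice of time point are ours, for illustration. It shows the “problem” concretely: the lower limit comes out below 0.

```python
import math


def greenwood_ci(times, events, t, z=1.96):
    """Return (S_hat(t), lower, upper) using Greenwood's variance and a
    plain +/- z * SE interval, which may fall outside [0, 1]."""
    surv, gw_sum = 1.0, 0.0
    for t_j in sorted({s for s, e in zip(times, events) if e == 1 and s <= t}):
        n_j = sum(1 for s in times if s >= t_j)
        y_j = sum(1 for s, e in zip(times, events) if s == t_j and e == 1)
        surv *= (n_j - y_j) / n_j
        if n_j > y_j:
            # y_j / (n_j (n_j - y_j)) is undefined when everyone at risk fails;
            # S_hat is then 0, and this sketch simply skips the term.
            gw_sum += y_j / (n_j * (n_j - y_j))
    se = surv * math.sqrt(gw_sum)  # sqrt(S_hat^2 * sum)
    return surv, surv - z * se, surv + z * se


# "Not maintained" AML group; event = 0 marks a censored "+" time.
times = [5, 5, 8, 8, 12, 16, 23, 27, 30, 33, 43, 45]
events = [1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
est, lo, hi = greenwood_ci(times, events, t=43)
print(f"S(43) = {est:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")  # lower limit is negative
```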

Confidence Interval for S(t) – Alternative Formula

- Based on log(−log(S(t))), which ranges from −∞ to ∞.
- Find the standard error of log(−log(Ŝ(t))), find the CI for that quantity, then transform the CI back into one for S(t).
- This CI will lie within the interval [0,1].
- This is the default in SAS.
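A sketch of the log(−log) interval, assuming the standard delta-method step: the standard error of log(−log Ŝ(t)) is taken as sqrt(Σ y_j/(n_j(n_j − y_j))) / |log Ŝ(t)|, and transforming back gives the limits Ŝ(t)^exp(±1.96·SE). The helper `loglog_ci` is hypothetical, and we have not checked that it matches SAS output digit for digit. For the “not maintained” AML group at t = 43 weeks, both limits stay inside [0, 1].

```python
import math


def loglog_ci(times, events, t, z=1.96):
    """CI for S(t) built on log(-log S_hat(t)) and mapped back into [0, 1].
    Requires 0 < S_hat(t) < 1."""
    surv, gw_sum = 1.0, 0.0
    for t_j in sorted({s for s, e in zip(times, events) if e == 1 and s <= t}):
        n_j = sum(1 for s in times if s >= t_j)
        y_j = sum(1 for s, e in zip(times, events) if s == t_j and e == 1)
        surv *= (n_j - y_j) / n_j
        if n_j > y_j:
            gw_sum += y_j / (n_j * (n_j - y_j))  # Greenwood sum
    if not 0.0 < surv < 1.0:
        raise ValueError("log(-log S) needs 0 < S_hat(t) < 1")
    # Delta method: SE of log(-log S_hat) = sqrt(Greenwood sum) / |log S_hat|
    se_loglog = math.sqrt(gw_sum) / abs(math.log(surv))
    # S = exp(-exp(u)), so the interval u +/- z*SE maps to S_hat ** exp(+/- z*SE);
    # the larger exponent gives the lower limit because S_hat < 1.
    lower = surv ** math.exp(z * se_loglog)
    upper = surv ** math.exp(-z * se_loglog)
    return surv, lower, upper


times = [5, 5, 8, 8, 12, 16, 23, 27, 30, 33, 43, 45]
events = [1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
est, lo, hi = loglog_ci(times, events, t=43)
print(f"S(43) = {est:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
```

The interval is asymmetric around Ŝ(t), which is how it manages to respect the [0, 1] bounds.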

Log-rank Test for Comparing Survivor Curves

Are two survivor curves the same?
- Use the times of events: t_1, t_2, ... (do not include censoring times).
- Treat each event and its “set of persons still at risk” (i.e., risk set) at each time t_j as an independent table.
- Make a 2×2 table at each t_j:

             Event    No Event       Total
Group A      a_j      n_jA − a_j     n_jA
Group B      c_j      n_jB − c_j     n_jB
Total        d_j      n_j − d_j      n_j

Log-rank Test for Comparing Survivor Curves

At each event time t_j, under the assumption of equal survival (i.e., S_A(t) = S_B(t)), the expected number of events in Group A out of the total events (d_j = a_j + c_j) is in proportion to the number at risk in Group A relative to the total at risk at time t_j:

$$Ea_j = d_j \times \frac{n_{jA}}{n_j}$$

Differences between a_j and Ea_j represent evidence against the null hypothesis of equal survival in the two groups.


Use the Cochran-Mantel-Haenszel idea of pooling over events j to get the log-rank chi-squared statistic with one degree of freedom:

$$\hat\chi^2 = \frac{\left[\sum_j (a_j - Ea_j)\right]^2}{\sum_j \widehat{\mathrm{Var}}(a_j)} \sim \chi^2_{(1)}, \qquad \widehat{\mathrm{Var}}(a_j) = \frac{d_j \, n_{jA} \, n_{jB} \, (n_j - d_j)}{n_j^2 \, (n_j - 1)}$$
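The pooling can be sketched in Python: build the 2×2 table at each distinct event time, accumulate a_j − Ea_j and Var̂(a_j), and refer the ratio to χ² with 1 df. The function `logrank_test` is hypothetical; the p-value uses the identity P(χ²₁ > x) = erfc(√(x/2)). Group A is taken to be the maintained group of the AML example.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test from one 2x2 table per distinct event time.
    Returns (observed A events, expected A events, chi-square, p-value, 1 df)."""
    event_times = sorted({t for t, e in zip(times_a, events_a) if e == 1}
                         | {t for t, e in zip(times_b, events_b) if e == 1})
    observed = expected = var = 0.0
    for t in event_times:
        n_a = sum(1 for s in times_a if s >= t)                             # n_jA
        n_b = sum(1 for s in times_b if s >= t)                             # n_jB
        a = sum(1 for s, e in zip(times_a, events_a) if s == t and e == 1)  # a_j
        c = sum(1 for s, e in zip(times_b, events_b) if s == t and e == 1)  # c_j
        n, d = n_a + n_b, a + c                                             # n_j, d_j
        observed += a
        expected += d * n_a / n                                             # Ea_j
        if n > 1:
            var += d * n_a * n_b * (n - d) / (n ** 2 * (n - 1))             # Var(a_j)
    chi2 = (observed - expected) ** 2 / var
    p_value = math.erfc(math.sqrt(chi2 / 2))  # P(chi-square with 1 df > chi2)
    return observed, expected, chi2, p_value


# Group A = maintained (X=1), Group B = not maintained (X=0); event 0 = censored.
obs, expected, chi2, p = logrank_test(
    [9, 13, 13, 18, 23, 28, 31, 34, 45, 48, 161], [1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0],
    [5, 5, 8, 8, 12, 16, 23, 27, 30, 33, 43, 45], [1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1])
print(f"O_A = {obs:.0f}, E_A = {expected:.2f}, chi-square = {chi2:.2f}, p = {p:.3f}")
```

Censored times enter only through the risk-set counts n_jA and n_jB; they never create a table of their own.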


Idea summary:
- Create a 2x2 table at each uncensored failure time.
- The construction of each 2x2 table is based on the corresponding risk set.
- Combine information from all the tables.

The null hypothesis is S_A(t) = S_B(t) for all times t.

Comparisons across Groups

Extensions of the log-rank test to several (G) groups require knowledge of matrix algebra. In general, these tests are well approximated by a chi-squared distribution with G−1 degrees of freedom.

Alternative tests:
- Wilcoxon family of tests (including the Peto test)
- Likelihood ratio test (SAS)

Comparison between Log-Rank and Wilcoxon Tests

- The log-rank test weights each failure time equally. No parametric model is assumed for failure times within a stratum.
- The Wilcoxon test weights each failure time by a function of the number at risk. Since the number at risk can only shrink over time, more weight tends to be given to early failure times. As in the log-rank test, no parametric model is assumed for failure times within a stratum.
- Between these two tests, the Wilcoxon test will tend to be better at picking up early departures from the null hypothesis, and the log-rank test will tend to be more sensitive to departures in the tail.
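Both tests fit one template: a weighted sum of a_j − Ea_j over event times, squared and divided by the correspondingly weighted variance. The sketch below assumes Gehan’s weight w_j = n_j for the Wilcoxon version; other members of the Wilcoxon family (e.g., the Peto test) use different weights, and the exact weight SAS uses should be checked in its documentation. Setting w_j = 1 recovers the log-rank statistic. `weighted_logrank` is a hypothetical helper.

```python
def weighted_logrank(times_a, events_a, times_b, events_b, weight):
    """Weighted two-group statistic
    [sum_j w_j (a_j - Ea_j)]^2 / sum_j w_j^2 Var(a_j), with w_j = weight(n_j)."""
    event_times = sorted({t for t, e in zip(times_a, events_a) if e == 1}
                         | {t for t, e in zip(times_b, events_b) if e == 1})
    num = den = 0.0
    for t in event_times:
        n_a = sum(1 for s in times_a if s >= t)
        n_b = sum(1 for s in times_b if s >= t)
        a = sum(1 for s, e in zip(times_a, events_a) if s == t and e == 1)
        c = sum(1 for s, e in zip(times_b, events_b) if s == t and e == 1)
        n, d = n_a + n_b, a + c
        w = weight(n)
        num += w * (a - d * n_a / n)  # weighted a_j - Ea_j
        if n > 1:
            den += w ** 2 * d * n_a * n_b * (n - d) / (n ** 2 * (n - 1))
    return num ** 2 / den


# AML example: Group A = maintained (X=1), Group B = not maintained (X=0).
times_a = [9, 13, 13, 18, 23, 28, 31, 34, 45, 48, 161]
events_a = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0]
times_b = [5, 5, 8, 8, 12, 16, 23, 27, 30, 33, 43, 45]
events_b = [1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1]

log_rank = weighted_logrank(times_a, events_a, times_b, events_b, lambda n: 1)  # w_j = 1
gehan = weighted_logrank(times_a, events_a, times_b, events_b, lambda n: n)     # w_j = n_j
print(f"log-rank chi-square = {log_rank:.2f}, Gehan-Wilcoxon chi-square = {gehan:.2f}")
```

Because n_j is largest at the start of follow-up, the w_j = n_j version lets early tables dominate, which is the weighting difference described above.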

Comparison with Likelihood Ratio Test in SAS

The likelihood ratio test employed in SAS assumes the data within the various strata are exponentially distributed and censoring is non-informative. Thus, this is a parametric method that smooths across the entire curve.
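Under the exponential assumption, each group’s maximized log-likelihood depends only on its number of events d_g and its total follow-up time T_g (censored times included): maximizing d log λ − λT gives λ̂ = d/T. The likelihood ratio statistic is then 2[Σ_g d_g log(d_g/T_g) − D log(D/T)], with D and T the pooled totals, compared to χ² with G−1 df. The Python sketch below (hypothetical `exponential_lr_test`) computes it for the AML groups; it illustrates the method and is not a copy of SAS’s code.

```python
import math


def exponential_lr_test(groups):
    """Likelihood ratio test of equal exponential hazards across groups.

    groups -- list of (times, events) pairs, events coded 1 = event, 0 = censored.
    Returns (LR statistic, degrees of freedom)."""
    def max_loglik(d, total_time):
        # Censored exponential log-likelihood d*log(lam) - lam*T evaluated
        # at its maximum lam = d / T (limit 0 when there are no events).
        return d * math.log(d / total_time) - d if d > 0 else 0.0

    stats = [(sum(e), sum(t)) for t, e in groups]  # (events, total time) per group
    d_all = sum(d for d, _ in stats)
    t_all = sum(total for _, total in stats)
    lr = 2 * (sum(max_loglik(d, total) for d, total in stats) - max_loglik(d_all, t_all))
    return lr, len(groups) - 1


maintained = ([9, 13, 13, 18, 23, 28, 31, 34, 45, 48, 161],
              [1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0])
not_maintained = ([5, 5, 8, 8, 12, 16, 23, 27, 30, 33, 43, 45],
                  [1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1])
lr, df = exponential_lr_test([maintained, not_maintained])
# The erfc formula for the p-value is valid for 1 df only.
print(f"LR = {lr:.2f} on {df} df, p = {math.erfc(math.sqrt(lr / 2)):.3f}")
```

Because each group is summarized by a single rate d_g/T_g, the test uses the whole follow-up at once rather than table-by-table comparisons, which is the “smoothing across the entire curve” described above.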
