sanjib basu association for - iasct home to... · 2015-03-24 · 1 . sanjib basu . northern...
TRANSCRIPT
Sanjib Basu
Northern Illinois University, De Kalb, USA
&
Rush University Medical Center, Chicago, USA
Indian Association for Statistics in Clinical Trials
Outline
Time-to-event data
Censored Data
Notations and Terminology
Survival Analysis in R
Kaplan-Meier estimator, with R
Two-sample comparison (log-rank test), with R
Proportional hazards regression model, with R
Time-to-event data
Many questions in medicine deal with evaluating the effect of a characteristic or drug on the time to an event, whether the event is a primary outcome or a surrogate.
Such a time to event is often referred to as the “survival time”. To determine the survival time T, three basic elements are needed:
A time origin or starting point (birth, start of study, entry time of the subject into the study)
An ending event of interest
A measurement scale for the passage of time (days, weeks etc.).
Examples of Time-to-event data
We can use survival analysis when we wish to analyze survival times or “time-to-event” times
“Time-to-Event” data include:
Time to death (Overall Survival or OS)
Time to recurrence or relapse (TTR)
Time to disease progression or death, whichever happens first (Progression-free survival or PFS)
Time until response to a treatment
Time until cancellation of service
Time until resumption of smoking by someone who had quit
Time until certain percentage of weight loss
More Examples of Time-to-event data
Suppose you wish to analyze the time between diagnosis at a hospital until death for a lung cancer patient.
Response/Status Variables : Length-of-Follow up, Status
Predictor Variables: Age, Gender, Race, White Blood Counts, Tumor Type, Treatment Type, Cancerous Mass Size
Suppose you are interested in the time it takes before one sees results for a certain treatment.
Response/Status Variables: Time it Takes to see results, Status
Predictor Variables: Age, Gender, Type of Treatment, Weight, Height, exercise (Y/N), healthy eating (Y/N)
More Examples of Time-to-event data
Suppose you are interested in comparing the time until a subject loses 10% body weight on one of two exercise programs.
Response/Status Variables: Time it Takes, Status
Predictor Variables: Age, Gender, Starting Weight, BP, BMI, Exercise Program
An actuary (someone who works in the insurance industry) might be interested in the time until the company has to make a payment on a life insurance or car insurance policy
Response/Status Variables: Time it Takes, Status
Predictor Variables: Age, Gender, Weight, Height, Risk category, many others
More non-medical Examples
A bank or a mortgage company might be interested in time-until default or refinancing
Response/Status Variables: Time it Takes, Status
Predictor Variables: Age, Gender, Socioeconomic status, many others
A service provider (say Cable TV) might be interested in the time it takes before a customer switches to a competing service
Response/Status Variables: Time it Takes, Status
Predictor Variables: Age, Gender, Race, Cable Provider, Average Income, Average number of complaints per month
Time-to-event: Choice of time scale
How do we measure time, and when does the clock start ticking (that is, what is the time origin, t = 0)?
This is usually dictated by the scientific question of interest
Scale        Origin                                 Comment
-----------------------------------------------------------------------
Study time   Diagnosis time or start of treatment   Clinical trials
Study time   First exposure                         (Occupational) epidemiology
Age          Birth (subject)                        Epidemiology
Examples
Time to cancer relapse among children in remission from acute leukemia, 6-MP vs. placebo
Time origin: entry into the clinical trial/start treatment
Outcome: time to relapse
Retrospective study of survival of cancer patients and level of a biomarker
Time Origin: (typically) Date of diagnosis or date of tumor resection
Outcome: time to death
Examples
Survival analysis of Breast cancer patients from a cancer (SEER) registry
Time Origin: Birth or date of diagnosis
Outcome: time to death
Time until default or refinancing of a mortgage
Time Origin: Start of the mortgage contract
Outcome: Time to end of contract
Censoring
In many respects, time-to-event data analysis grew as a unique topic area in Statistics due to the presence of censoring
Censoring is present when we have some information about a subject’s event time, but we don’t know the exact event time.
For the analysis methods we will discuss to be valid, the censoring mechanism must be independent of the survival mechanism.
Example
The recurrence times of 42 patients with acute leukemia were reported from a clinical trial undertaken to assess the ability of the drug 6-mercaptopurine (6-MP) to maintain remission.
Each patient was randomized to either 6-MP or placebo;
the study was terminated after one year.
Patients were enrolled sequentially at different times.
The recurrence times are measured in weeks.
Time to relapse in the Treatment (6-MP) group 6,6,6,7,10,13,16,22,23, 6+,9+,10+,11+,17+,19+,20+,25+, 32+, 32+, 34+, 35+
6-MP Leukemia study
Time to relapse in the Treatment (6-MP) group 6,6,6,7,10,13,16,22,23,
6+,9+,10+,11+,17+,19+,20+,25+, 32+, 32+, 34+, 35+
For the 6+ data point,
The subject was followed up to the 6th week
The subject had not relapsed by that time (6th week)
The subject was lost to follow-up
We do not know what happened after the 6th week. The subject may have relapsed at a later time, but we do not know whether that happened or, if it did, when.
6-MP Leukemia study contd
Time to relapse in the Treatment (6-MP) group 6,6,6,7,10,13,16,22,23, 6+,9+,10+,11+,17+,19+,20+,25+, 32+, 32+, 34+, 35+
If there were no censoring and the data were 6,6,6,7,10,13,16,22,23, 6,9,10,11,17,19,20,25, 32, 32, 34, 35
Then one could have used usual statistical methods for their analysis
For example, median survival would have been median(6,6,..,23,6,9,…..,35)= 16
Presence of censored data is an aspect that makes time-to-event analysis distinct
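The naive median quoted above can be checked in a few lines of R. This is only a sketch: the vector names `relapse` and `censored` are illustrative and do not come from the course files.

```r
# Observed relapse times (weeks) in the 6-MP group
relapse  <- c(6, 6, 6, 7, 10, 13, 16, 22, 23)
# Censored follow-up times (the "+" values), treated here as if they were exact
censored <- c(6, 9, 10, 11, 17, 19, 20, 25, 32, 32, 34, 35)

# Naive median that pretends no observation was censored
naive.median <- median(c(relapse, censored))
print(naive.median)
```

Because each censored time is only a lower bound on the true relapse time, a summary computed this way will generally understate survival, which is why censoring needs dedicated methods.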
Censored data in time-to-event analysis
Time to relapse in the Treatment (6-MP) group 6,6,6,7,10,13,16,22,23,
6+,9+,10+,11+,17+,19+,20+,25+, 32+, 32+, 34+, 35+
In time-to-event analysis, the information in the data point 6+ is taken as it is
The time to relapse for this subject is not imputed
The time to relapse is not taken as 6 weeks.
Rather, the information used in a time-to-event analysis is that the subject did not relapse (remained relapse-free) up to the 6th week.
Right Censoring
Time to relapse in the Treatment (6-MP) group 6,6,6,7,10,13,16,22,23, 6+,9+,10+,11+,17+,19+,20+,25+, 32+, 32+, 34+, 35+
Usual Reasons for Right-Censoring
A subject does not experience the event before the study ends
A person is lost to follow-up during the study period
A person withdraws from the study
A subject experiences a terminating event (such as death) from a different cause
Types of right censoring
Fixed type I censoring occurs when a study is designed to end after a specified duration of follow-up (one year in the 6-MP study). In this case, everyone who does not have an event observed during the course of the study is censored at end of study.
This is also sometimes referred to as administrative censoring
Potential data without censoring
Time 0
“Administrative” censoring
Time 0 STUDY END
Observed data
Time 0 STUDY END
Types of right censoring
In random type I censoring, the study is designed to end after a specified duration, but censored subjects do not all have the same censoring time. This is the main type of right-censoring we will be concerned with.
This is usually caused by
Patient drop-out
Loss to follow-up
In type II censoring, a study ends when there is a prespecified number of events.
Drop-out or Loss-TFU
Time 0 STUDY END
How do we “treat” the data?
Time of
enrollment
Shift everything so each patient time represents time on study
Other types of censoring
Left censoring: In right censoring, we observe that the subject survived up to a certain time (such as 6 weeks), that is, the survival time is ≥ 6 weeks. In left censoring, we only observe that the survival time is ≤ some value.
Interval censoring refers to the case where we only know that the survival time is in some interval, that is, L ≤ Survival time ≤ U
Example: Any study where events are observed at fixed follow-up visits. For example, a study periodically screens women for breast cancer. If a woman tests positive for cancer, we only know that cancer occurred in the interval between the previous screening and the current one.
Non-informative censoring
Regardless of the type of censoring, in all standard survival analysis, we assume that censoring is non-informative about the event; that is, the censoring is caused by something other than the impending failure.
Modeling dependent censoring is an active research area.
Terminology and notations
We will use T to denote the time-to-event. This is our outcome variable.
The survival function is S(t) = P(T > t)
For example, at t=2 years, the survival function S(2) provides the (theoretical) probability of surviving beyond 2 years. This, for example, can be interpreted as the proportion of subjects who will survive beyond 2 years.
The Survival function
The survival function gives the probability that a subject will survive past time t.
As t ranges from 0 to ∞ (infinity), the survival function has the following properties
It is non-increasing
At time t = 0, S(t) = 1. In other words, everyone is surviving at time 0.
As t → ∞, S(t) → 0. As time goes to infinity, the survival curve goes to 0. That is, the probability of eventual failure is 1.
In theory, the survival function is smooth. In practice, we observe events on a discrete time scale (days, weeks, etc.).
The hazard function
The hazard function h(t) (or λ(t)) is defined as
\[ h(t) = \lim_{\Delta \to 0} \frac{P(t < T \le t + \Delta \mid T > t)}{\Delta} \]
The hazard function is based on a conditional probability of the type (for t=3 and Δ=2) P(3 < T ≤ 5 | T > 3), that is, given that a patient has survived up to time t=3, what is the probability that the patient will have the event by time 5?
The hazard function is an instantaneous failure rate. Given that a patient has survived up to time t, the hazard at time t is the rate (probability per unit time) of having the event in the next instant
The hazard function
The hazard function h(t), being an instantaneous failure rate, is the derivative of the cumulative hazard function H(t)
\[ h(t) = \frac{d}{dt} H(t) \quad \text{or} \quad H(t) = \int_0^t h(u)\,du \]
The hazard function, being an instantaneous conditional probability, is also linked to the survival function.
\[ H(t) = -\log S(t) \quad \text{or} \quad S(t) = \exp(-H(t)) = \exp\left(-\int_0^t h(u)\,du\right) \]
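The link between the hazard, the cumulative hazard and the survival function can be checked numerically. The sketch below assumes a Weibull distribution with illustrative parameters (`shape`, `scale` and the time point `t0` are not from the slides) and compares the survival obtained by integrating the hazard with the survival computed directly.

```r
# Weibull hazard with illustrative (assumed) parameters
shape <- 1.5
scale <- 10
hazard <- function(u) (shape / scale) * (u / scale)^(shape - 1)

t0 <- 5
# Cumulative hazard H(t0): integral of the hazard from 0 to t0
H.t0 <- integrate(hazard, lower = 0, upper = t0)$value
# Survival recovered from the cumulative hazard: S(t) = exp(-H(t))
S.from.H <- exp(-H.t0)
# Survival computed directly from the Weibull distribution
S.direct <- pweibull(t0, shape = shape, scale = scale, lower.tail = FALSE)

print(c(H = H.t0, S.from.H = S.from.H, S.direct = S.direct))
```

The two survival values should agree up to numerical integration error.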
Relative Risk or hazard ratio
Suppose we want to compare two groups. The Treatment group (group 1) has hazard hT(t) and the Control group (group 2) has hazard hC(t). The relative risk of Treatment to Control is then given by the ratio
\[ RR(t) = \frac{h_T(t)}{h_C(t)} \]
In general, this relative risk is a function of time t, that is, the relative risk at time t=2 years may not be the same as the relative risk at time t=5 years.
If we make the assumption that relative risk remains constant over time, RR(t)=constant, we have a proportional hazards model with the hazard of the control group serving as the baseline.
Recording Survival data
Recording Survival Data contd.
Recorded data
Survival analysis in R
R is an Open Source (and freely available) environment for statistical computing and graphics.
Base R and most R packages are available for download from the Comprehensive R Archive Network (CRAN)
R is under active development - typically two major releases per year.
Base R comes with a number of basic data management, analysis, and graphical tools
R's power and flexibility, however, lie in its array of packages (currently more than 4,000 !)
Interacting with R
You can work straight with R but most people use some other front end.
RStudio, an Integrated Development Environment (IDE)
Deducer, a Graphical User Interface (GUI)
Data input in R
The simplest way to input a rectangular data set is to save it as a comma-separated value (csv) file and read it with the R command read.csv.
The first argument to read.csv is the name of the file. On Windows it can be tricky to get the file path correct. You can set the working directory globally using
> setwd("C:/Mydirectory/SurvivalCourse/")
> mydata <- read.csv("filename.csv")
Another approach is to use the function file.choose, which brings up a “chooser” panel through which you can select a particular file. The idiom to remember is
> mydata <- read.csv(file.choose())
or, for tab-delimited files,
> mydata <- read.delim(file.choose())
Reading data from other software
You can read in datasets from other statistical analysis software using functions found in the foreign package
require(foreign)
read.dta Read Stata Binary Files
read.mtp Read a Minitab Portable Worksheet
read.spss Read an SPSS Data File
read.ssd Obtain a Data Frame from a SAS Permanent Dataset, via read.xport
read.systat Obtain a Data Frame from a Systat File
read.xport Read a SAS XPORT Format Library
Reading Excel data
Excel spreadsheets can be read into R utilizing the xlsx package
require(xlsx)
Package: xlsx
Title: Read, write, format Excel 2007 and Excel 97/2000/XP/2003 files
Version: 0.5.1
Date: 2012-03-31
Description: Provide R functions to read/write/format Excel 2007 and Excel 97/2000/XP/2003 file formats.
URL: http://code.google.com/p/rexcel/
Data in R
Standard rectangular data sets (columns are variables, rows are observations) are stored in R as data frames.
The columns can be numeric variables (e.g. measurements or counts) or factor variables (categorical data) or ordered factor variables. These types are called the class of the variable.
Estimating the survival function S(t) based on observed data
We assume here that every subject follows the same survival function (no covariates or other individual differences)
• We can use nonparametric estimators like the Kaplan-Meier estimator
• We can estimate the survival distribution by making parametric assumptions
Exponential
Weibull
Gamma
log-normal
Non-parametric estimation of S(t)
• When no event times are censored, a non-parametric estimator of S(t) is 1−Fn(t). Here Fn(t) is the empirical distribution function, which puts equal mass, 1/n, at each data point.
• When some observations are censored, we can estimate S(t) using the Kaplan-Meier product-limit estimator.
Like the empirical CDF, the Kaplan-Meier estimator per se is completely model free. In this sense, it can also be considered a descriptive statistic.
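For the uncensored case, the estimator 1−Fn(t) can be sketched with base R's `ecdf()`. The data below are the hypothetical "no censoring" version of the 6-MP times used earlier in these slides; the names `times`, `Fn` and `S.hat` are illustrative.

```r
# Hypothetical uncensored version of the 6-MP data (all times treated as exact)
times <- c(6, 6, 6, 7, 10, 13, 16, 22, 23,
           6, 9, 10, 11, 17, 19, 20, 25, 32, 32, 34, 35)

Fn <- ecdf(times)                 # empirical distribution function, mass 1/n per point
S.hat <- function(t) 1 - Fn(t)    # empirical survival function

# Estimated proportion surviving beyond 6, 10 and 20 weeks
print(S.hat(c(6, 10, 20)))
```

With censored observations this shortcut is no longer valid, and the Kaplan-Meier estimator takes its place.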
Kaplan-Meier estimator of S(t)
Begin with data as observed
Convert to “on-study” time
Kaplan-Meier estimator of S(t)
“On-study” time
Kaplan-Meier estimator of the survival curve
Kaplan-Meier estimator of S(t)
6-MP data: Treatment (6-MP) group, n=21
Time to recurrence: 6,6,6,7,10,13,16,22,23, 6+,9+,10+,11+,17+,19+,20+,25+, 32+, 32+, 34+, 35+
t    # at risk = n   # events = d   1-d/n              KM estimator S(t)
6    21              3              1-3/21 = 0.8571    1*0.8571 = 0.8571
7    17              1              1-1/17 = 0.9412    0.8571*0.9412 = 0.8067
10   15              1              1-1/15 = 0.9333    0.8067*0.9333 = 0.7529
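The hand calculation in the table can be reproduced in base R. This is a minimal sketch of the product-limit computation; the variable names (`time`, `event`, `km.table`) are illustrative, and the event indicator is coded 1 for an observed relapse and 0 for a censored ("+") time.

```r
# 6-MP group: first 9 times are relapses, last 12 are censored
time  <- c(6, 6, 6, 7, 10, 13, 16, 22, 23,
           6, 9, 10, 11, 17, 19, 20, 25, 32, 32, 34, 35)
event <- c(rep(1, 9), rep(0, 12))   # 1 = relapse observed, 0 = censored

# Distinct times at which at least one relapse was observed
event.times <- sort(unique(time[event == 1]))
# Number still at risk just before each event time
n.risk  <- sapply(event.times, function(t) sum(time >= t))
# Number of relapses at each event time
n.event <- sapply(event.times, function(t) sum(time == t & event == 1))
# Kaplan-Meier estimate: running product of (1 - d/n)
km <- cumprod(1 - n.event / n.risk)

km.table <- data.frame(t = event.times, n.risk, n.event,
                       factor = round(1 - n.event / n.risk, 4),
                       S = round(km, 4))
print(km.table)
```

The first three rows correspond to the rows of the table above; later rows continue the same product.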
How to obtain the Kaplan-Meier estimate from R
We will use the survival package in R
Therneau T (2013). A Package for Survival Analysis in S. R package version 2.37-4,
http://CRAN.R-project.org/package=survival.
Survival objects in R
Many functions in the survival package apply methods to Surv objects, which are survival-type objects created using the Surv() function.
For right-censored data, only two arguments are needed in the Surv() function: a vector of times and a vector indicating which times are observed and which are censored.
my.surv.object <- Surv(time, event)
There are many other types of survival objects that can be created, such as Surv(time, time2, event, type)
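As a concrete sketch, a right-censored Surv object for the 6-MP group could be built as follows. The vectors `time` and `event` are typed in by hand here for illustration; in the course they would presumably come from a data file.

```r
library(survival)

# 6-MP group: first 9 times are relapses, last 12 are censored
time  <- c(6, 6, 6, 7, 10, 13, 16, 22, 23,
           6, 9, 10, 11, 17, 19, 20, 25, 32, 32, 34, 35)
event <- c(rep(1, 9), rep(0, 12))   # 1 = relapse observed, 0 = censored

my.surv.object <- Surv(time, event)
# Printing shows censored times with a trailing "+"
print(my.surv.object)
```

The printed form mirrors the "+" notation used for censored observations throughout these slides.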
KM estimate in R: survfit() function
The Kaplan-Meier estimate can be obtained using the survfit function in R
km.est <- survfit(my.surv.object ~ 1)
summary(km.est)
time n.risk n.event survival std.err lower 95% CI upper 95% CI
6 21 3 0.857 0.0764 0.720 1.000
7 17 1 0.807 0.0869 0.653 0.996
10 15 1 0.753 0.0963 0.586 0.968
13 12 1 0.690 0.1068 0.510 0.935
16 11 1 0.627 0.1141 0.439 0.896
… .. … … …. …. ….
KM estimate from survfit() function
time n.risk n.event survival std.err lower 95% CI upper 95% CI
6 21 3 0.857 0.0764 0.720 1.000
7 17 1 0.807 0.0869 0.653 0.996
10 15 1 0.753 0.0963 0.586 0.968
13 12 1 0.690 0.1068 0.510 0.935
16 11 1 0.627 0.1141 0.439 0.896
… .. … … …. …. ….
t    # at risk = n   # events = d   1-d/n              KM estimator S(t)
6    21              3              1-3/21 = 0.8571    1*0.8571 = 0.8571
7    17              1              1-1/17 = 0.9412    0.8571*0.9412 = 0.8067
10   15              1              1-1/15 = 0.9333    0.8067*0.9333 = 0.7529
KM estimate plot
plot(km.est, main="Kaplan-Meier estimate with 95% confidence bounds - 6-MP group",
     xlab="Weeks", ylab="Proportion without recurrence",
     lty=c(1,2,2), lwd=2, col=c("blue","purple","purple"))
Mean and median survival estimates with bounds
Most of the medical literature reports the median survival and its associated confidence interval
Mean and Median survival estimates with bounds
The survfit() function can estimate the median survival and its 95% confidence interval
km.est <- survfit(Surv(Time.to.Relapse,recur) ~ 1)
km.est
records n.max n.start events median 0.95LCL 0.95UCL
21 21 21 9 23 16 NA
Using survfit() in conjunction with print(), the mean survival time and its standard error may be obtained. This however requires a finite upper bound on survival times which is set at the largest observed or censored time.
> print(km.est, print.rmean=TRUE)
records n.max n.start events *rmean *se(rmean) median 0.95LCL 0.95UCL
21.00 21.00 21.00 9.00 23.29 2.83 23.00 16.00 NA
* restricted mean with upper limit = 35
Parametric models for survival functions
The Kaplan-Meier estimator is a very useful tool for estimating survival functions. Sometimes, we may want to make more assumptions that allow us to model the data in more detail. By specifying a parametric form for S(t), we can
easily compute selected quantiles of the distribution
estimate the expected failure time
derive a concise equation and smooth function for estimating S(t), H(t) and h(t)
may be able to estimate S(t) better than KM assuming the parametric form is correct!
Parametric survival models
Some popular distributions for estimating survival curves are
• Exponential
• Weibull
• log-normal (log(T) has a normal distribution)
• log-logistic
Maximum likelihood estimation
Maximum likelihood estimation is usually used to estimate the unknown parameters of the parametric distributions.
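When an event is observed (uncensored), the ith subject contributes a density term f(y_i) to the likelihood
If the event is censored, the ith subject contributes a survival term S(y_i) = P(T > y_i) to the likelihood
The joint likelihood for all n subjects is
\[ L = \prod_{i=1}^{n} f(y_i)^{\delta_i} \, S(y_i)^{1-\delta_i} \]
where δ_i = 1 if the event is observed for subject i and δ_i = 0 if it is censored.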
survreg() function in survival package
We use the survreg function to fit a parametric exponential model to the 6-MP group data
survreg(formula = Surv(Time.to.Relapse, recur) ~ 1, data = sixmp.trt, dist = "exponential")
            Value Std. Error    z     p
(Intercept)  3.69      0.333 11.1 2e-28
Scale fixed at 1
Exponential distribution
Loglik(model)= -42.2 Loglik(intercept only)= -42.2
Number of Newton-Raphson Iterations: 4
n= 21
Exponential model fit
Exponential distribution has S(t) = exp(-λt)
survreg returns an estimate of λ on the negative log scale
Estimate of -log λ = 3.686
Hence, the estimate of λ is exp(-3.686) = 0.02506
Estimate of the survival function is
\[ \hat{S}(t) = \exp(-0.02506\, t) \]
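For the exponential model with right censoring, the maximum likelihood estimate of the rate has a closed form: the number of observed events divided by the total follow-up time. The sketch below uses that standard result to cross-check the survreg output by hand; the rate is called `lambda.hat` here, and the data vectors are typed in for illustration.

```r
# 6-MP group: first 9 times are relapses, last 12 are censored
time  <- c(6, 6, 6, 7, 10, 13, 16, 22, 23,
           6, 9, 10, 11, 17, 19, 20, 25, 32, 32, 34, 35)
event <- c(rep(1, 9), rep(0, 12))   # 1 = relapse observed, 0 = censored

d          <- sum(event)   # number of observed relapses
total.time <- sum(time)    # total follow-up, events and censored times together

# Closed-form MLE of the exponential rate
lambda.hat <- d / total.time
# survreg() reports the intercept on the -log(rate) scale
neg.log.lambda <- -log(lambda.hat)
# Maximized log-likelihood: d*log(rate) - rate*total.time
loglik <- d * log(lambda.hat) - lambda.hat * total.time

# Fitted survival function
S.hat <- function(t) exp(-lambda.hat * t)

print(c(lambda.hat = lambda.hat, neg.log.lambda = neg.log.lambda, loglik = loglik))
print(S.hat(c(6, 12, 24)))
```

The intercept and log-likelihood computed this way can be compared with the values in the survreg output above.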
Two sample comparison for uncensored data: Introduction
Two group comparison is one of the most widely used methods in biomedical analysis
The simplest and one of the most used two-sample methods is the two (independent) samples t-test
This test is parametric: it assumes that both samples come from normal distributions.
Even under parametric assumptions, one can compare parameters other than the mean, such as
Two sample comparison for uncensored data in more nonparametric settings
1. 2.
Two sample comparison for survival data
Two sample comparison for survival data
Comparing survivals at a single time point
Issues
Choice of the time point t0
From the scientific point of view (for example, clinically), this could be of some interest. But generally, more interest lies in comparing the survival probabilities at many time points.
This is inefficient since we throw away information on the rest of the survival curves
Goal: compare the entire survival curves (up to the maximum observed time)
Two sample comparison for survival data: Logrank test
Goal: compare the entire survival curves (up to the maximum observed time)
LogRank Test: Idea
At an observed event time t_k, we can think of the data as follows

Group     Number of deaths at t_k   Number alive at t_k   Total = # at risk
6-MP      D1k                       A1k                   n1k
Placebo   D2k                       A2k                   n2k
Total     Dk                        Ak                    Nk

The null hypothesis of EQUAL survival functions for the two groups implies the independence of Group and Status (Dead or Alive). Thus, under the null hypothesis, the expected value of D1k is
\[ E_k = E(D_{1k}) = \frac{D_k \, n_{1k}}{N_k} \]
The test statistic is based on the differences between the observed Ok = D1k in the first group and what is expected, Ek, under the null hypothesis, that is
\[ \frac{\left[\sum_k (O_k - E_k)\right]^2}{\mathrm{Var}\left[\sum_k (O_k - E_k)\right]} \]
which has a \chi^2_1 distribution (chi-square with 1 degree of freedom) under the null.
LogRank test in R: survdiff()
survdiff(Surv(Time.to.Relapse,recur) ~ Treatment)
                   N Observed Expected (O-E)^2/E (O-E)^2/V
Treatment=6-MP    21        9     19.3      5.46      16.8
Treatment=Placebo 21       21     10.7      9.77      16.8
Chisq= 16.8 on 1 degrees of freedom, p= 4.17e-05
We find a strongly significant difference in survival between the 6-MP and placebo groups, as expected
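The observed and expected counts and the chi-square statistic in the survdiff() output can be reproduced by hand in base R. One caveat: the placebo-arm times are not listed in these slides. The values below are an assumption, namely the placebo times as commonly tabulated for this trial (all 21 are relapses), and should be checked against the course data file. The variance uses the usual hypergeometric form for a 2x2 table at each event time.

```r
# 6-MP arm (from the slides); 1 = relapse, 0 = censored
t.6mp <- c(6, 6, 6, 7, 10, 13, 16, 22, 23,
           6, 9, 10, 11, 17, 19, 20, 25, 32, 32, 34, 35)
e.6mp <- c(rep(1, 9), rep(0, 12))
# Placebo arm: ASSUMED values, not given in the slides (all relapses)
t.pl <- c(1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23)
e.pl <- rep(1, 21)

time  <- c(t.6mp, t.pl)
event <- c(e.6mp, e.pl)
group <- rep(c("6-MP", "Placebo"), times = c(21, 21))

# Distinct event times pooled over both groups
tk <- sort(unique(time[event == 1]))
O <- E <- V <- numeric(length(tk))
for (k in seq_along(tk)) {
  at.risk <- time >= tk[k]
  Nk  <- sum(at.risk)                       # total at risk
  n1k <- sum(at.risk & group == "6-MP")     # at risk in the 6-MP group
  Dk  <- sum(time == tk[k] & event == 1)    # total events at this time
  O[k] <- sum(time == tk[k] & event == 1 & group == "6-MP")
  E[k] <- Dk * n1k / Nk                     # expected events in 6-MP under H0
  # Hypergeometric variance of the 6-MP event count under H0
  V[k] <- if (Nk > 1) Dk * (n1k / Nk) * (1 - n1k / Nk) * (Nk - Dk) / (Nk - 1) else 0
}

chisq   <- sum(O - E)^2 / sum(V)
p.value <- pchisq(chisq, df = 1, lower.tail = FALSE)
print(c(observed = sum(O), expected = sum(E), chisq = chisq, p.value = p.value))
```

If the assumed placebo times match the course data, the observed and expected counts for the 6-MP row and the chi-square value should agree with the survdiff() output above after rounding.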
LogRank type tests
A whole family of log-rank type tests is obtained based on the weighted differences
\[ \left[\sum_k W_k (O_k - E_k)\right]^2 \]
The weights are often taken as
\[ W_k = \{\hat{S}(t_k)\}^{\rho} \]
ρ=0 gives the usual Log-rank test
ρ=1 gives the Peto & Peto modification of the Gehan-Wilcoxon test
survdiff(Surv(Time.to.Relapse,recur) ~ Treatment, rho=1)
Regression analysis for survival data
We consider the case when, in addition to the survival time Y and the censoring indicator, we also observe a covariate x for each observation.
Example: A group of patients who died from acute myelogenous leukemia were classified into two subgroups according to the presence or absence of a morphologic characteristic of white cells. Patients termed “AG positive” were identified by the presence of Auer rods and/or significant granulature of leukemic cells in the bone marrow at diagnosis. These factors were absent for AG-negative patients. Leukemia is a cancer characterized by an overproliferation of white blood cells; the higher the white blood cell count (WBC), the more severe the disease. Thus, WBC could be an important covariate for predicting survival of leukemia patients.
Link to CSV data
Recap: The hazard function
The hazard function h(t) (or λ(t)) is defined as
\[ h(t) = \lim_{\Delta \to 0} \frac{P(t < T \le t + \Delta \mid T > t)}{\Delta} \]
The hazard function is based on a conditional probability of the type (for t=3 and Δ=2) P(3 < T ≤ 5 | T > 3), that is, given that a patient has survived up to time t=3, what is the probability that the patient will have the event by time 5?
The hazard function is an instantaneous failure rate. Given that a patient has survived up to time t, the hazard at time t is the rate (probability per unit time) of having the event in the next instant
Hazard Function
Useful for conceptualizing how chance of event changes over time
That is, consider hazard ‘relative’ over time
Examples:
Treatment toxicity related mortality
Early on, high risk of death
Later on, risk of death decreases
Aging
Early on, low risk of death
Later on, higher risk of death
Shapes of hazard functions
Increasing (Increasing failure rate or IFR)
Natural aging and wear
Decreasing (DFR)
Early failures due to device or transplant failures
Bathtub
Populations followed from birth
Hump-shaped
Initial risk of event, followed by decreasing chance of event
Examples
[Figure: example hazard functions; x-axis: Time (0 to 6), y-axis: Hazard Function (0.0 to 1.0)]
Relative Risk or hazard ratio
Suppose we want to compare two groups. The AG+ group has hazard hAG+(t) and the AG- group has hazard hAG-(t). The relative risk of AG+ to AG- is then given by the ratio
\[ RR(t) = \frac{h_{AG+}(t)}{h_{AG-}(t)} \]
In general, this relative risk is a function of time t, that is, the relative risk at time t=2 years may not be the same as the relative risk at time t=5 years.
If we make the assumption that relative risk remains constant over time, RR(t) = ψ, we have a proportional hazards model hAG+(t) = ψ hAG-(t) with the hazard of the AG- group serving as the baseline.
Proportional hazards (PH) model
hAG+(t) = ψ hAG-(t), or
\[ S_{AG+}(t) = [S_{AG-}(t)]^{\psi} \]
Proportional hazards means that the log-hazard functions of the two groups are parallel and the survival curves do not cross each other. This is the setup of the Cox Proportional Hazards model.
The “relative risk” is called the hazard ratio (HR). In the proportional hazards model, we make the assumption that the hazard ratio is constant over time. Hence HR is a single number
Proportional hazards (PH) model
For a continuous covariate x, such as WBC, the (Cox) proportional hazards (PH) regression models the relative risk of a subject with covariate x relative to another subject who has covariate x=0.
In particular, the PH model assumes that the relative risk is constant over time (hence the name proportional) and depends only on the covariate x, as
\[ \frac{h(t \mid X = x)}{h(t \mid X = 0)} = RR(t, x) = \psi(x) \]
Next, this relative risk is modeled as ψ(x) = exp(βx)
Once we combine these two parts and rewrite, we get
h(t|X=x) = exp(βx) h0(t)
Proportional hazards (PH) model
h(t|X=x) = h0(t) exp(βx)
The 1st component is h0(t), which is an unspecified baseline hazard
The 2nd component is the linear regression part, βx, inside the exponent.
Going back to the binary covariate AG (1=AG+ and 0=AG-), we have
h(t|X=AG+) = h0(t) exp(β)
h(t|X=AG-) = h0(t)
So, the hazard ratio (HR) between AG+ and AG- is given by ψ = exp(β)
The crucial assumption in the proportional hazards regression model is that the covariate changes the baseline hazard by a constant (not time-varying) multiple
Proportional hazards (PH) model in R: the coxph() function
The coxph() function in the survival package can fit a Proportional Hazards regression model
ph.ag <- coxph(Surv(SurvWk,Status)~ AG, data=agwbc)
        coef exp(coef) se(coef)     z      p
AGAG+  -1.18     0.307    0.417 -2.83 0.0047
Likelihood ratio test=8.26 on 1 df, p=0.00405
n= 33, number of events= 33
Estimate of β is -1.18. Estimate of HR = ψ = exp(β) is 0.307
Proportional hazards (PH) model in R: the coxph() function
        coef exp(coef) se(coef)     z      p
AGAG+  -1.18     0.307    0.417 -2.83 0.0047
Likelihood ratio test=8.26 on 1 df, p=0.00405
n= 33, number of events= 33
AG+ has a reduced hazard of death compared to AG-, which is the reference. The HR is 0.307, that is, the hazard for AG+ is 0.307 times the hazard for AG- (about a 69% reduction).
The effect of AG status on survival is strongly significant (Wald test p-value=0.0047, or likelihood ratio test p=0.0041)
Confidence interval for HR in R
summary(ph.ag) returns a confidence interval for HR
       exp(coef) exp(-coef) lower .95 upper .95
AGAG+     0.3072      3.255    0.1356     0.696
Likelihood ratio test= 8.26 on 1 df, p=0.00405
Wald test = 8 on 1 df, p=0.00468
Score (logrank) test = 8.75 on 1 df, p=0.0031
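The confidence interval for the hazard ratio is a Wald interval: it is formed on the log scale from the coefficient and its standard error and then exponentiated. The sketch below recomputes it from the rounded values printed in the coxph() output, so the result matches the summary() output only up to rounding; the names `coef.ag` and `se.ag` are illustrative.

```r
# Values as printed in the coxph() output (rounded)
coef.ag <- -1.18    # log hazard ratio for AG+ versus AG-
se.ag   <- 0.417    # standard error of the log hazard ratio

# Point estimate of the hazard ratio
hr <- exp(coef.ag)
# 95% Wald interval: exponentiate coef -/+ z * se
ci <- exp(coef.ag + c(-1, 1) * qnorm(0.975) * se.ag)

print(round(c(HR = hr, lower = ci[1], upper = ci[2]), 3))
```

Building the interval on the log scale keeps both limits positive, which is why the interval is not symmetric around the hazard ratio.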
PH regression with a numeric covariate
ph.wbc <- coxph(Surv(SurvWk,Status)~ WBC1000, data=agwbc)
           coef exp(coef) se(coef)    z     p
WBC1000  0.0101      1.01  0.00488 2.07 0.038
The HR is 1.01. Each 1 unit increase in WBC1000 (that is, an increase of 1000 in WBC) multiplies the hazard of death by 1.01.
A patient with WBC=10000 has 1.01 times the hazard of a patient with WBC=9000
A patient with WBC=11000 has (1.01)^2 times the hazard of a patient with WBC=9000
The effect of WBC on the hazard of death is significant, p=0.038 (within the Cox PH model).
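The scaling of the hazard ratio with the size of the WBC difference can be sketched in R. This assumes, as the interpretation above implies, that WBC1000 is the white blood cell count in units of 1000; the helper `hr.diff` is hypothetical and uses the rounded coefficient from the output.

```r
# Rounded coefficient for WBC1000 from the coxph() output
beta.wbc <- 0.0101

# Hazard ratio comparing a patient with count wbc.a to one with count wbc.b
# (counts on the original scale; divided by 1000 to match WBC1000)
hr.diff <- function(wbc.a, wbc.b) exp(beta.wbc * (wbc.a - wbc.b) / 1000)

print(hr.diff(10000, 9000))   # 1 unit of WBC1000
print(hr.diff(11000, 9000))   # 2 units: the square of the 1-unit ratio
print(hr.diff(50000, 9000))   # 41 units: the effect compounds multiplicatively
```

Because the model is linear on the log-hazard scale, a difference of k units multiplies the hazard by exp(k * coef), not by k times the one-unit ratio.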
PH regression with multiple covariates
When there is more than one covariate, such as X1 = AG status (a binary covariate) and X2 = WBC (a continuous covariate), the PH regression models the hazard function as
h(t|X=x) = h0(t) exp(β1 x1 + β2 x2)
where, as before, h0(t) is the baseline hazard function.
ph.mult <- coxph(Surv(SurvWk,Status)~ AG + WBC1000, data=agwbc)
             coef exp(coef) se(coef)     z     p
AGAG+    -1.08919     0.336  0.42635 -2.55 0.011
WBC1000   0.00784     1.008  0.00499  1.57 0.120
Likelihood ratio test=10.5 on 2 df, p=0.00519
n= 33, number of events= 33
PH regression with multiple covariates
h(t|X=x) = h0(t) exp(β1 x1 + β2 x2)
Overall test: H0: β1 = 0, β2 = 0
Likelihood ratio test=10.5 on 2 df, p=0.00519 n= 33
Tests for single covariates: H0(1): β1 = 0 and H0(2): β2 = 0
             coef exp(coef) se(coef)     z     p
AGAG+    -1.08919     0.336  0.42635 -2.55 0.011
WBC1000   0.00784     1.008  0.00499  1.57 0.120
These are conditional tests: each p-value is adjusted for the other covariate in the model
Predictions from the Cox PH model
Interaction
An interaction effect allows one to assess whether the joint effect of X1 = AG status and X2 = WBC on survival goes beyond additive (within the setting of PH regression)
h(t|X=x) = h0(t) exp(β1 x1 + β2 x2 + β3 x1*x2)
ph.interac <- coxph(Surv(SurvWk,Status)~ AG*WBC1000, data=agwbc)
                    coef exp(coef) se(coef)       z      p
AGAG+          -1.735824     0.176  0.54631 -3.1774 0.0015
WBC1000        -0.000184     1.000  0.00684 -0.0268 0.9800
AGAG+:WBC1000   0.019265     1.019  0.00978  1.9706 0.0490
Likelihood ratio test=14.4 on 3 df, p=0.00245 n= 33
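With an interaction term, the hazard ratio of AG+ to AG- is no longer a single number: it depends on the WBC level. Under the fitted model the ratio at a given WBC1000 value w is exp(coef for AG+ plus the interaction coefficient times w), since the baseline hazard and the WBC main effect cancel. The sketch below evaluates this from the printed coefficients; the helper `hr.ag` and the grid of w values are illustrative.

```r
# Coefficients as printed in the interaction model output
b.ag  <- -1.735824   # AG+ main effect (log hazard ratio at WBC1000 = 0)
b.int <-  0.019265   # AG+ by WBC1000 interaction

# Hazard ratio of AG+ versus AG- at a given WBC1000 value w
hr.ag <- function(w) exp(b.ag + b.int * w)

w <- c(0, 10, 50, 100)
print(data.frame(WBC1000 = w, HR = round(hr.ag(w), 3)))

# WBC1000 value at which the estimated hazard ratio equals 1
w.cross <- -b.ag / b.int
print(w.cross)
```

In this fit the estimated advantage of AG+ shrinks as WBC increases. The crossover value is a point estimate only and carries the uncertainty of both coefficients.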
Predictions from Interaction model
Graphical Check for proportional hazards: categorical covariate
Proportional Hazards regression using the binary covariate AG
ph.ag <- coxph(Surv(SurvWk,Status)~ AG, data=agwbc)
        coef exp(coef) se(coef)     z      p
AGAG+  -1.18     0.307    0.417 -2.83 0.0047
Proportional hazards means that the log-hazards, and hence the log-cumulative hazards, for the two groups, AG+ and AG-, will be parallel (they differ by a constant).
This is the setup of the PH model, and hence this is what we are going to see if we plot the log-cumulative hazards from the fitted PH model.
Graphical Check for proportional hazards: categorical covariate
However, since AG is categorical, we can also fit Kaplan-Meier estimates without the proportional hazards assumption
But the KM estimates will be fitted separately in the AG+ and AG- subsets
km.ag <- survfit(Surv(SurvWk,Status)~ AG, data=agwbc)
Graphical Check for proportional hazards: categorical covariate
Comparison of KM and PH estimates
Other diagnostics for PH assumption
Based on constructing time-dependent covariates
Based on residual diagnostics