sanjib basu association for - iasct home to... · 2015-03-24 · 1 . sanjib basu . northern...


Sanjib Basu

Northern Illinois University, De Kalb, USA

&

Rush University Medical Center, Chicago, USA

Indian Association for Statistics in Clinical Trials


Outline

Time-to-event data

Censored Data

Notations and Terminology

Survival Analysis in R

Kaplan-Meier Estimator and in R

Two sample comparison (Log-Rank test) and in R

Proportional Hazards Regression Model and in R


Time-to-event data

Many questions in medicine deal with evaluating the effect of a characteristic or drug on the time to an event, whether the event is a primary outcome or a surrogate.

Such a time-to-event is often referred to as the “survival time”. To determine the survival time T, three basic elements are needed:

A time origin or starting point (birth, start of study, entry time of the subject into the study)

An ending event of interest

A measurement scale for the passage of time (days, weeks etc.).


Examples of Time-to-event data

We can use survival analysis when we wish to analyze survival times or “time-to-event” times

“Time-to-Event” data include:

Time to death (Overall Survival or OS)

Time to recurrence or relapse (TTR)

Time to disease progression or death, whichever happens first (Progression-free survival or PFS)

Time until response to a treatment

Time until cancellation of service

Time until resumption of smoking by someone who had quit

Time until a certain percentage of weight loss

More Examples of Time-to-event data

Suppose you wish to analyze the time between diagnosis at a hospital until death for a lung cancer patient.

Response/Status Variables : Length-of-Follow up, Status

Predictor Variables: Age, Gender, Race, White Blood Counts, Tumor Type, Treatment Type, Cancerous Mass Size

Suppose you are interested in the time it takes before one sees results for a certain treatment.

Response/Status Variables: Time it Takes to see results, Status

Predictor Variables: Age, Gender, Type of Treatment, Weight, Height, exercise (Y/N), healthy eating (Y/N)


More Examples of Time-to-event data

Suppose you are interested in comparing the time until a subject loses 10% body weight on one of two exercise programs.

Response/Status Variables: Time it Takes, Status

Predictor Variables: Age, Gender, Starting Weight, BP, BMI, Exercise Program

An actuary (someone who works in the insurance industry) might be interested in the time until the company has to make a payment on a life-insurance or car-insurance policy


Predictor Variables: Age, Gender, Weight, Height, Risk category, many others


More non-medical Examples

A bank or a mortgage company might be interested in time-until default or refinancing


Predictor Variables: Age, Gender, Socioeconomic status, many others

A service provider (say Cable TV) might be interested in the time it takes before a customer switches to a competing service


Predictor Variables: Age, Gender, Race, Cable Provider, Average Income, Average number of complaints per month


Time-to-event: Choice of time scale

How do we measure time and when does the clock start ticking (that is, what is time origin =0)?

This is usually dictated by the scientific question of interest

Scale         Origin                                  Comment
-----------------------------------------------------------------------
Study time    Diagnosis time or start of treatment    Clinical trials
Study time    First exposure (occupational)           Epidemiology
Age           Birth (subject)                         Epidemiology

Examples

Time to cancer relapse among children in remission from acute leukemia, 6-MP vs. placebo

Time origin: entry into the clinical trial/start treatment

Outcome: time to relapse

Retrospective study of survival of cancer patients and level of a biomarker

Time Origin: (typically) Date of diagnosis or date of tumor resection

Outcome: time to death


Examples

Survival analysis of Breast cancer patients from a cancer (SEER) registry

Time Origin: Birth or date of diagnosis

Outcome: time to death

Time until default or refinancing of a mortgage

Time Origin: Start of the mortgage contract

Outcome: Time to end of contract


Censoring

In many respects, time-to-event data analysis grew as a unique topic area in Statistics due to the presence of censoring

Censoring is present when we have some information about a subject’s event time, but we don’t know the exact event time.

For the analysis methods we will discuss to be valid, the censoring mechanism must be independent of the survival mechanism.


Example

The recurrence times of 42 patients with acute leukemia were reported from a clinical trial undertaken to assess the ability of the drug 6-mercaptopurine (6-MP) to maintain remission.

Each patient was randomized to either 6-MP or placebo; the study was terminated after one year.

Patients were enrolled sequentially at different times.

The recurrence times are measured in weeks.

Time to relapse in the Treatment (6-MP) group 6,6,6,7,10,13,16,22,23, 6+,9+,10+,11+,17+,19+,20+,25+, 32+, 32+, 34+, 35+
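As a minimal sketch, the 21 observations above could be recorded in R as a pair of vectors, assuming the usual convention that a status of 1 marks an observed relapse and 0 marks a censored time (the values written with "+"). The names `time`, `status` and `sixmp` are illustrative and do not come from the slides.

```r
# Relapse times (weeks) in the 6-MP group, in the order listed on the slide.
time <- c(6, 6, 6, 7, 10, 13, 16, 22, 23,
          6, 9, 10, 11, 17, 19, 20, 25, 32, 32, 34, 35)

# Assumed coding: 1 = relapse observed, 0 = censored (the "+" values).
status <- c(rep(1, 9), rep(0, 12))

sixmp <- data.frame(time = time, status = status)
table(sixmp$status)   # number of censored (0) and observed (1) times
```

Keeping the time and the status in separate columns preserves the information that a "+" value is only a lower bound on the relapse time.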


6-MP Leukemia study

Time to relapse in the Treatment (6-MP) group 6,6,6,7,10,13,16,22,23,

6+,9+,10+,11+,17+,19+,20+,25+, 32+, 32+, 34+, 35+

For the 6+ data point,

The subject was followed up to the 6th week

The subject did not relapse up to that time (6th week)

The subject was lost-to-follow-up

We do not know what happened after the 6th week. The subject may have relapsed at a later time, but we do not know that, and if so, we do not know the exact time either


6-MP Leukemia study contd


If there were no censoring and the data were 6,6,6,7,10,13,16,22,23, 6,9,10,11,17,19,20,25, 32, 32, 34, 35

Then one could have used usual statistical methods for their analysis

For example, median survival would have been median(6,6,..,23,6,9,…..,35)= 16

Presence of censored data is an aspect that makes time-to-event analysis distinct
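The "no censoring" calculation above can be reproduced in base R. This is only a sketch of the naive approach: it treats every "+" time as if it were an observed relapse, so it tends to understate the median (the Kaplan-Meier output later in these slides reports a median of 23 weeks). The variable names are illustrative.

```r
# All 21 times from the 6-MP group, with the "+" marks dropped.
time <- c(6, 6, 6, 7, 10, 13, 16, 22, 23,
          6, 9, 10, 11, 17, 19, 20, 25, 32, 32, 34, 35)

# Ordinary sample median, ignoring the censoring information.
naive_median <- median(time)
naive_median
```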


Censored data in time-to-event analysis

Time to relapse in the Treatment (6-MP) group 6,6,6,7,10,13,16,22,23,

6+,9+,10+,11+,17+,19+,20+,25+, 32+, 32+, 34+, 35+

In time-to-event analysis, the information in the data point 6+ is taken as it is

The time to relapse for this subject is not imputed

The time to relapse is not taken as 6 weeks.

Rather, the information that is used in a time-to-event analysis is that the subject did not relapse (survived from relapse) up to the 6th week.


Right Censoring


Usual Reasons for Right-Censoring

A subject does not experience the event before the study ends

A person is lost to follow-up during the study period

A person withdraws from the study

A subject experiences a terminating event (such as death) from a different cause


Types of right censoring

Fixed type I censoring occurs when a study is designed to end after a specified duration of follow-up (one year in the 6-MP study). In this case, everyone who does not have an event observed during the course of the study is censored at end of study.

This is also sometimes referred to as administrative censoring


Potential data without censoring

[Figure: time axis starting at Time 0]


“Administrative” censoring

[Figure: time axis from Time 0 to STUDY END]


Observed data

[Figure: time axis from Time 0 to STUDY END]


Types of right censoring

In random type I censoring, the study is designed to end after a specified duration, but censored subjects do not all have the same censoring time. This is the main type of right-censoring we will be concerned with.

This is usually caused by

Patient drop-out

Loss to follow-up

In type II censoring, a study ends when there is a prespecified number of events.

Drop-out or Loss-TFU

[Figure: time axis from Time 0 to STUDY END]


How do we “treat” the data?

[Figure: time axis starting at Time of enrollment]

Shift everything so that each patient's time represents time on study
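The shift to time on study can be sketched in base R as follows. The enrollment and last-observation dates are hypothetical values made up for illustration; only the idea of subtracting each patient's own enrollment date comes from the slide.

```r
# Hypothetical calendar dates for three patients (illustrative only).
enroll   <- as.Date(c("2014-01-06", "2014-03-17", "2014-06-02"))
last_obs <- as.Date(c("2014-04-14", "2014-05-12", "2014-12-29"))

# Time on study in weeks: each patient's clock starts at his or her enrollment.
days_on_study  <- as.numeric(difftime(last_obs, enroll, units = "days"))
weeks_on_study <- days_on_study / 7
weeks_on_study
```

After this shift, patients who enrolled at different calendar times share a common time origin of 0.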


Other types of censoring

Left censoring: In right censoring, we observe that the subject survived up to a certain time (such as 6 weeks), that is, the survival time is ≥ 6 weeks. In left censoring, we only observe that the survival time is ≤ some value.

Interval censoring refers to the case where we only know that the survival time is in some interval, that is L ≤ Survival time ≤ U

Example: Any study where events are observed at fixed follow-up visits. For example, a study periodically screens women for breast cancer. If a woman tests positive for cancer, we only know that cancer occurred in the interval between the previous screening and the current one.


Non-informative censoring

Regardless of the type of censoring, in all standard survival analysis, we assume that censoring is non-informative about the event; that is, the censoring is caused by something other than the impending failure.

Modeling dependent censoring is an active research area.


Terminology and notations

We will use T to denote the time-to-event. This is our outcome variable.

The survival function is S(t) = P(T > t)

For example, at t=2 years, the survival function S(2) provides the (theoretical) probability of surviving beyond 2 years. This, for example, can be interpreted as the proportion of subjects who will survive beyond 2 years.


The Survival function

The survival function gives the probability that a subject will survive past time t.

As t ranges from 0 to ∞ (infinity), the survival function has the following properties

It is non-increasing

At time t = 0, S(t) = 1. In other words, everyone is surviving at time 0.

As t → ∞, S(t) → 0. As time goes to infinity, the survival curve goes to 0. That is, the probability of eventual failure is 1.

In theory, the survival function is smooth. In practice, we observe events on a discrete time scale (days, weeks, etc.).


The hazard function

The hazard function h(t) (or λ(t)) is defined as

$$ h(t) = \lim_{\Delta \to 0} \frac{P(t < T \le t + \Delta \mid T > t)}{\Delta} $$

The hazard function is based on a conditional probability of the type (for t=3 and Δ=2) P(3 < T ≤ 5 | T > 3); that is, given that a patient has survived up to time t=3, what is the probability that the patient will have the event by time 5?

The hazard function is an instantaneous failure rate. Given that a patient has survived up to time t, the hazard at time t is the probability per unit time of having the event in the next instant


The hazard function

The hazard function h(t), being an instantaneous failure rate, is the derivative of the cumulative hazard function H(t)

$$ h(t) = \frac{d}{dt} H(t) \quad \text{or} \quad H(t) = \int_0^t h(u)\,du $$

The hazard function, being an instantaneous conditional probability, is also linked to the survival function.

$$ H(t) = -\log S(t) \quad \text{or} \quad S(t) = \exp(-H(t)) = \exp\left(-\int_0^t h(u)\,du\right) $$
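The links between h(t), H(t) and S(t) can be checked numerically. The sketch below assumes a Weibull distribution with arbitrary illustrative parameters (shape 1.5, scale 2), which is not part of the slides; it integrates the hazard numerically and compares the result with -log S(t).

```r
# Illustrative Weibull parameters (an assumption, not from the slides).
shape <- 1.5
scale <- 2

# Survival function S(t) = P(T > t) and hazard h(t) = f(t) / S(t).
S <- function(t) pweibull(t, shape, scale, lower.tail = FALSE)
h <- function(t) dweibull(t, shape, scale) / S(t)

t0 <- 3
# Cumulative hazard H(t0) as the integral of h(u) from 0 to t0.
H_num <- integrate(h, lower = 0, upper = t0)$value

c(H_numeric = H_num, minus_log_S = -log(S(t0)))
c(S = S(t0), exp_minus_H = exp(-H_num))
```

Both pairs of numbers should agree up to the numerical integration error.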


Relative Risk or hazard ratio

Suppose we want to compare two groups. The Treatment group 1 has hazard hT(t) and the Control group 2 has hazard hC(t). The relative risk of Treatment to Control is then given by the ratio

$$ RR(t) = \frac{h_T(t)}{h_C(t)} $$

In general, this relative risk is a function of time t, that is, the relative risk at time t=2 years may not be the same as the relative risk at time t=5 years.

If we make the assumption that relative risk remains constant over time, RR(t)=constant, we have a proportional hazards model with the hazard of the control group serving as the baseline.


Recording Survival data


Recording Survival Data contd.

Recorded data

Survival analysis in R

R is an Open Source (and freely available) environment for statistical computing and graphics.

Base R and most R packages are available for download from the Comprehensive R Archive Network (CRAN)

R is under active development - typically two major releases per year.

base R comes with a number of basic data management, analysis, and graphical tools

R's power and flexibility, however, lie in its array of packages (currently more than 4,000 !)


Interacting with R

You can work straight with R but most people use some other front end.

RStudio, an Integrated Development Environment (IDE)

Deducer, a Graphical User Interface (GUI)


http://www.rstudio.com/ide/

http://www.deducer.org/pmwiki/index.php?n=Main.DeducerManual?from=Main.HomePage

Data input in R

The simplest way to input a rectangular data set is to save it as a comma-separated value (csv) file and read it with the R command read.csv.

The first argument to read.csv is the name of the file. On Windows it can be tricky to get the file path correct. You can set the path globally using

> setwd("C:/Mydirectory/SurvivalCourse/")

> mydata <- read.csv("filename.csv")

Another approach is to use the function file.choose which brings up a “chooser“ panel through which you can select a particular file. The idiom to remember is

> mydata <- read.csv(file.choose()) or read.delim(file.choose())


Reading data from other software

You can read in datasets from other statistical analysis software using functions found in the foreign package

require(foreign)

read.dta Read Stata Binary Files

read.mtp Read a Minitab Portable Worksheet

read.spss Read an SPSS Data File

read.ssd Obtain a Data Frame from a SAS Permanent Dataset, via read.xport

read.systat Obtain a Data Frame from a Systat File

read.xport Read a SAS XPORT Format Library


Reading Excel data

Excel spreadsheets can be read into R utilizing the xlsx package

require(xlsx)

Package: xlsx
Title: Read, write, format Excel 2007 and Excel 97/2000/XP/2003 files
Version: 0.5.1
Date: 2012-03-31

Description: Provide R functions to read/write/format Excel 2007 and Excel 97/2000/XP/2003 file formats.

URL: http://code.google.com/p/rexcel/


Data in R

Standard rectangular data sets (columns are variables, rows are observations) are stored in R as data frames.

The columns can be numeric variables (e.g. measurements or counts) or factor variables (categorical data) or ordered factor variables. These types are called the class of the variable.


Estimating the survival function S(t) based on observed data

We assume here that every subject follows the same survival function (no covariates or other individual differences)

• We can use nonparametric estimators like the Kaplan-Meier estimator

• We can estimate the survival distribution by making parametric assumptions

Exponential

Weibull

Gamma

log-normal


Non-parametric estimation of S(t)

• When no event times are censored, a non-parametric estimator of S(t) is 1−Fn(t). Here Fn(t) is the empirical distribution function, which puts equal mass, 1/n, at each data point.

• When some observations are censored, we can estimate S(t) using the Kaplan-Meier product-limit estimator.

Like the empirical CDF, the Kaplan-Meier estimator, per se, is completely model free. In this sense, it can also be considered a descriptive statistic.
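For the uncensored case, the estimator 1−Fn(t) can be sketched with the base R function ecdf(). The event times below are hypothetical values chosen for illustration, and `S_hat` is an illustrative name.

```r
# Hypothetical fully observed (uncensored) event times, in weeks.
t_obs <- c(2, 3, 3, 5, 8, 13)

# Empirical distribution function: mass 1/n at each data point.
Fn <- ecdf(t_obs)

# Non-parametric estimate of S(t) = P(T > t) when nothing is censored.
S_hat <- function(t) 1 - Fn(t)
S_hat(c(0, 3, 5, 13))
```

With censored observations this estimator is no longer appropriate, which is where the Kaplan-Meier estimator comes in.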


Kaplan-Meier estimator of S(t)

Begin with data as observed

Convert to “on-study” time



“On-study” time

Kaplan-Meier estimator of the survival curve



6-MP data: Treatment (6-MP) group, n=21

Time to recurrence: 6,6,6,7,10,13,16,22,23, 6+,9+,10+,11+,17+,19+,20+,25+, 32+, 32+, 34+, 35+

t    # at risk = n   # events = d   1-d/n             KM Estimator S(t)
6    21              3              1-3/21=0.8571     1*0.8571=0.8571
7    17              1              1-1/17=0.9412     0.8571*0.9412=0.8067
10   15              1              1-1/15=0.9333     0.8067*0.9333=0.7529
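The hand calculation in the table can be sketched in base R, without any package. The status coding (1 = relapse, 0 = censored) and all variable names are assumptions made for illustration; the product of the factors 1 - d/n over the event times is the Kaplan-Meier estimate.

```r
# 6-MP group: times in weeks; assumed coding 1 = relapse, 0 = censored.
time   <- c(6, 6, 6, 7, 10, 13, 16, 22, 23,
            6, 9, 10, 11, 17, 19, 20, 25, 32, 32, 34, 35)
status <- c(rep(1, 9), rep(0, 12))

# Distinct times at which at least one relapse was observed.
event_times <- sort(unique(time[status == 1]))

# n = number still at risk just before t; d = number of relapses at t.
n_risk  <- sapply(event_times, function(t) sum(time >= t))
n_event <- sapply(event_times, function(t) sum(time == t & status == 1))

# Kaplan-Meier estimate: running product of (1 - d/n).
km <- cumprod(1 - n_event / n_risk)

data.frame(t = event_times, n = n_risk, d = n_event,
           factor = round(1 - n_event / n_risk, 4), S = round(km, 4))
```

A subject censored at time t is counted as at risk at t, which is why 17 subjects (not 18) remain at risk at week 7: three relapsed and one was censored at week 6.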


How to obtain the Kaplan-Meier estimate from R

We will use the survival package in R

Therneau T (2013). A Package for Survival Analysis in S. R package version 2.37-4,

http://CRAN.R-project.org/package=survival.




Survival objects in R

Many functions in the survival package apply methods to Surv objects, which are survival-type objects created using the Surv() function.

For right-censored data, only two arguments are needed in the Surv() function: a vector of times and a vector indicating which times are observed and which are censored.

my.surv.object <- Surv(time, event)

Other types of survival objects can also be created, for example with Surv(time, time2, event, type)
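A minimal sketch of creating a Surv object for the 6-MP group, assuming the survival package is installed and that the event indicator is coded 1 = relapse, 0 = censored. The vector names are illustrative.

```r
library(survival)

# 6-MP group: times in weeks; assumed coding 1 = relapse, 0 = censored.
time  <- c(6, 6, 6, 7, 10, 13, 16, 22, 23,
           6, 9, 10, 11, 17, 19, 20, 25, 32, 32, 34, 35)
event <- c(rep(1, 9), rep(0, 12))

my.surv.object <- Surv(time, event)

# Censored times are printed with a trailing "+".
my.surv.object[1:12]
```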


KM estimate in R: survfit() function

The Kaplan-Meier estimate can be obtained using the survfit function in R

km.est <- survfit(my.surv.object ~ 1)

summary(km.est)

time n.risk n.event survival std.err lower 95% CI upper 95% CI
   6     21       3    0.857  0.0764        0.720        1.000
   7     17       1    0.807  0.0869        0.653        0.996
  10     15       1    0.753  0.0963        0.586        0.968
  13     12       1    0.690  0.1068        0.510        0.935
  16     11       1    0.627  0.1141        0.439        0.896
 ...


KM estimate from survfit() function

time n.risk n.event survival std.err lower 95% CI upper 95% CI
   6     21       3    0.857  0.0764        0.720        1.000
   7     17       1    0.807  0.0869        0.653        0.996
  10     15       1    0.753  0.0963        0.586        0.968
  13     12       1    0.690  0.1068        0.510        0.935
  16     11       1    0.627  0.1141        0.439        0.896
 ...

t    # at risk = n   # events = d   1-d/n             KM Estimator S(t)
6    21              3              1-3/21=0.8571     1*0.8571=0.8571
7    17              1              1-1/17=0.9412     0.8571*0.9412=0.8067
10   15              1              1-1/15=0.9333     0.8067*0.9333=0.7529


KM estimate plot

plot(km.est, main="Kaplan-Meier estimate with 95% confidence bounds- 6-MP group", xlab="Weeks", ylab="Time to Recurrence",

lty = c(1,2,2), lwd=2, col=c("blue","purple","purple") )


Mean and median survival estimates with bounds

Most of the medical literature reports the median survival and its associated confidence interval


Mean and Median survival estimates with bounds

The survfit() function can estimate the median survival and its 95% confidence interval

km.est <- survfit(Surv(Time.to.Relapse,recur) ~ 1)

km.est

records n.max n.start events median 0.95LCL 0.95UCL
     21    21      21      9     23      16      NA

Using survfit() in conjunction with print(), the mean survival time and its standard error may be obtained. This, however, requires a finite upper bound on survival times, which is set at the largest observed or censored time.

> print(km.est, print.rmean=TRUE)

records n.max n.start events *rmean *se(rmean) median 0.95LCL 0.95UCL
  21.00 21.00   21.00   9.00  23.29       2.83  23.00   16.00      NA

* restricted mean with upper limit = 35
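The median and the restricted mean reported above can be traced back to the Kaplan-Meier step function. The base R sketch below assumes the status coding 1 = relapse, 0 = censored; it takes the median as the first event time at which the estimated survival drops to 0.5 or below, and the restricted mean as the area under the step function from 0 to the largest time (35 weeks). Variable names are illustrative.

```r
# 6-MP group: times in weeks; assumed coding 1 = relapse, 0 = censored.
time   <- c(6, 6, 6, 7, 10, 13, 16, 22, 23,
            6, 9, 10, 11, 17, 19, 20, 25, 32, 32, 34, 35)
status <- c(rep(1, 9), rep(0, 12))

# Kaplan-Meier estimate at the distinct event times.
event_times <- sort(unique(time[status == 1]))
n_risk  <- sapply(event_times, function(t) sum(time >= t))
n_event <- sapply(event_times, function(t) sum(time == t & status == 1))
km <- cumprod(1 - n_event / n_risk)

# Median: first event time where the KM curve is at or below 0.5.
km_median <- event_times[which(km <= 0.5)[1]]

# Restricted mean: area under the step function on [0, max(time)].
cuts    <- c(0, event_times, max(time))   # interval end points
heights <- c(1, km)                       # curve height on each interval
rmean   <- sum(diff(cuts) * heights)

c(median = km_median, restricted_mean = round(rmean, 2))
```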


Parametric models for survival functions

The Kaplan-Meier estimator is a very useful tool for estimating survival functions. Sometimes, we may want to make more assumptions that allow us to model the data in more detail. By specifying a parametric form for S(t), we can

easily compute selected quantiles of the distribution

estimate the expected failure time

derive a concise equation and smooth function for estimating S(t), H(t) and h(t)

may be able to estimate S(t) better than KM assuming the parametric form is correct!


Parametric survival models

Some popular distributions for estimating survival curves are

• Exponential

• Weibull

• log-normal (log(T) has a normal distribution)

• log-logistic


Maximum likelihood estimation

Maximum likelihood estimation is usually used to estimate the unknown parameters of the parametric distributions.

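When an event is observed (uncensored), the ith subject contributes a density term f(y_i) to the likelihood

If the event is censored, the ith subject contributes a survival term S(y_i) = P(T > y_i) to the likelihood

The joint likelihood for all n subjects is

$$ L = \prod_{i=1}^{n} f(y_i)^{\delta_i} \, S(y_i)^{1 - \delta_i} $$

where δ_i = 1 if the event of the ith subject is observed and δ_i = 0 if it is censored.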


survreg() function in the survival package

We use the survreg function to fit a parametric exponential model to the 6-MP group data

survreg(formula = Surv(Time.to.Relapse, recur) ~ 1, data = sixmp.trt, dist = "exponential")

            Value Std. Error    z     p
(Intercept)  3.69      0.333 11.1 2e-28

Scale fixed at 1

Exponential distribution
Loglik(model)= -42.2   Loglik(intercept only)= -42.2
Number of Newton-Raphson Iterations: 4
n= 21


Exponential model fit

Exponential distribution has S(t) = exp(-λt)

survreg returns an estimate of λ on the negative log scale. Estimate of -log λ = 3.686. Hence, the estimate of λ is exp(-3.686) = 0.02506

Estimate of the survival function is

$$ \hat{S}(t) = \exp(-0.02506 \, t) $$
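For the exponential model the maximum likelihood estimate has a closed form: the number of observed events divided by the total follow-up time. The base R sketch below, with illustrative variable names and the assumed coding 1 = relapse, 0 = censored, recomputes the estimate, its negative log, and the maximized log-likelihood without calling survreg.

```r
# 6-MP group: times in weeks; assumed coding 1 = relapse, 0 = censored.
time   <- c(6, 6, 6, 7, 10, 13, 16, 22, 23,
            6, 9, 10, 11, 17, 19, 20, 25, 32, 32, 34, 35)
status <- c(rep(1, 9), rep(0, 12))

d     <- sum(status)   # number of observed relapses
total <- sum(time)     # total follow-up time in weeks

# Closed-form MLE of the exponential rate under right censoring.
lambda_hat <- d / total

# Log-likelihood: log f(y) = log(lambda) - lambda*y for events,
# log S(y) = -lambda*y for censored times.
loglik <- d * log(lambda_hat) - lambda_hat * total

c(lambda = lambda_hat, neg_log_lambda = -log(lambda_hat), loglik = loglik)
```

The values agree with the intercept (about 3.69) and the log-likelihood (about -42.2) in the survreg output shown earlier.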



Two sample comparison for uncensored data Introduction

Two-group comparison is one of the most widely used methods in biomedical analysis

The simplest and one of the most used two-sample methods is the two (independent) samples t-test

This test is based on the parametric assumption that both samples come from normal distributions.

Even under parametric assumptions, one can compare parameters other than the mean, such as


Two sample comparison for uncensored data in more nonparametric settings

1. 2.


Two sample comparison for survival data


Two sample comparison for survival data


Comparing survivals at a single time point

Issues

Choice of the time point t0

From the scientific point of view (such as clinically), this could be of some interest. But generally, more interest lies in comparing the survival probabilities at many time points.

This is inefficient since we throw away information on the rest of the survival curves

Goal: compare the entire survival curves (up to the maximum observed time)


Two sample comparison for survival data: Logrank test

Goal: compare the entire survival curves (up to the maximum observed time)


LogRank Test: Idea

At an observed event time t_k, we can think of the data as follows

Group      Number of deaths at t_k   Number alive at t_k   Total = # at risk
6-MP       D1k                       A1k                   n1k
Placebo    D2k                       A2k                   n2k
Total      Dk                        Ak                    Nk

The null hypothesis of EQUAL survival functions for the two groups implies the independence of Group and Status (Dead or Alive). Thus, under the null hypothesis, the expected value of D1k is

$$ E_k = E(D_{1k}) = \frac{D_k \, n_{1k}}{N_k} $$

The test statistic is based on the differences between the observed O_k = D1k in the first group and what is expected, E_k, under the null hypothesis, that is

$$ \frac{\left[ \sum_k (O_k - E_k) \right]^2}{\mathrm{Var}\left( \sum_k (O_k - E_k) \right)} $$

which has a $\chi^2_1$ distribution under the null.


LogRank test in R: survdiff()

survdiff(Surv(Time.to.Relapse,recur) ~ Treatment)

                   N Observed Expected (O-E)^2/E (O-E)^2/V
Treatment=6-MP    21        9     19.3      5.46      16.8
Treatment=Placebo 21       21     10.7      9.77      16.8

Chisq= 16.8 on 1 degrees of freedom, p= 4.17e-05

We find a strongly significant difference in survival between the 6-MP and placebo groups, as expected
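The log-rank calculation can be sketched by hand in base R, building one 2x2 table per event time. Two assumptions should be noted loudly: the placebo relapse times are not listed in these slides, so the values below are the commonly cited ones for this trial (all 21 placebo patients relapsed), and the variance uses the usual hypergeometric formula for a 2x2 table. Variable names are illustrative.

```r
# 6-MP group (from the slides): 1 = relapse, 0 = censored.
t1 <- c(6, 6, 6, 7, 10, 13, 16, 22, 23,
        6, 9, 10, 11, 17, 19, 20, 25, 32, 32, 34, 35)
s1 <- c(rep(1, 9), rep(0, 12))

# Placebo group: ASSUMED values (not in the slides), all relapses observed.
t2 <- c(1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8,
        11, 11, 12, 12, 15, 17, 22, 23)
s2 <- rep(1, 21)

# Distinct event times pooled over both groups.
tk <- sort(unique(c(t1[s1 == 1], t2[s2 == 1])))

O <- E <- V <- numeric(length(tk))
for (k in seq_along(tk)) {
  n1 <- sum(t1 >= tk[k])                    # at risk in 6-MP group
  n2 <- sum(t2 >= tk[k])                    # at risk in placebo group
  N  <- n1 + n2
  D1 <- sum(t1 == tk[k] & s1 == 1)          # events in 6-MP group
  D  <- D1 + sum(t2 == tk[k] & s2 == 1)     # events in both groups
  O[k] <- D1
  E[k] <- D * n1 / N                        # expected events under H0
  # Hypergeometric variance of D1 given the table margins.
  V[k] <- if (N > 1) D * (n1 / N) * (n2 / N) * (N - D) / (N - 1) else 0
}

chisq <- sum(O - E)^2 / sum(V)
p_val <- pchisq(chisq, df = 1, lower.tail = FALSE)
round(c(observed = sum(O), expected = sum(E), chisq = chisq), 2)
```

Under these assumptions the observed and expected counts for the 6-MP group and the chi-square statistic agree with the survdiff output above.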


LogRank type tests

A whole family of log-rank type tests is obtained based on the weighted differences

$$ \left[ \sum_k W_k (O_k - E_k) \right]^2 $$

The weights are often taken as

$$ W_k = \{\hat{S}(t_k)\}^{\rho} $$

ρ=0 gives the usual Log-rank test

ρ=1 gives the Peto & Peto modification of the Gehan-Wilcoxon test

survdiff(Surv(Time.to.Relapse,recur) ~ Treatment, rho=1)


Regression analysis for survival data

We consider the case when, in addition to the survival time Y and the censoring indicator, we also observe a covariate x for each observation.

Example: A group of patients who died from acute myelogenous leukemia were classified into two subgroups according to the presence or absence of a morphologic characteristic of white cells. Patients termed “AG positive” were identified by the presence of Auer rods and/or significant granulature of leukemic cells in the bone marrow at diagnosis. These factors were absent for AG-negative patients. Leukemia is a cancer characterized by an overproliferation of white blood cells; the higher the white blood cell count (WBC), the more severe the disease. Thus, WBC could be an important covariate for predicting survival of leukemia patients.

Link to CSV data


R-code/AGWBCLeuk.csv

Recap: The hazard function

The hazard function h(t) (or λ(t)) is defined as

$$ h(t) = \lim_{\Delta \to 0} \frac{P(t < T \le t + \Delta \mid T > t)}{\Delta} $$

The hazard function is based on a conditional probability of the type (for t=3 and Δ=2) P(3 < T ≤ 5 | T > 3); that is, given that a patient has survived up to time t=3, what is the probability that the patient will have the event by time 5?

The hazard function is an instantaneous failure rate. Given that a patient has survived up to time t, the hazard at time t is the probability per unit time of having the event in the next instant


Hazard Function

Useful for conceptualizing how chance of event changes over time

That is, consider hazard ‘relative’ over time

Examples:

Treatment toxicity related mortality

Early on, high risk of death

Later on, risk of death decreases

Aging

Early on, low risk of death

Later on, higher risk of death


Shapes of hazard functions

Increasing (Increasing failure rate or IFR)

Natural aging and wear

Decreasing (DFR)

Early failures due to device or transplant failures

Bathtub

Populations followed from birth

Hump-shaped

Initial risk of event, followed by decreasing chance of event
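Three of these shapes can be illustrated with the Weibull family, whose hazard is decreasing, constant or increasing depending on the shape parameter. The sketch below uses base R only; the function name `weibull_hazard` and the chosen parameter values are illustrative assumptions. Bathtub and hump-shaped hazards need other models and are not covered here.

```r
# Hazard of a Weibull distribution: density divided by survival function.
weibull_hazard <- function(t, shape, scale = 1) {
  dweibull(t, shape, scale) / pweibull(t, shape, scale, lower.tail = FALSE)
}

t <- c(0.5, 1, 2, 4)

h_dec   <- weibull_hazard(t, shape = 0.5)  # shape < 1: decreasing (DFR)
h_const <- weibull_hazard(t, shape = 1)    # shape = 1: constant (exponential)
h_inc   <- weibull_hazard(t, shape = 2)    # shape > 1: increasing (IFR)

round(rbind(t, h_dec, h_const, h_inc), 3)
```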


Examples

[Figure: example hazard functions; horizontal axis: Time (0 to 6), vertical axis: Hazard Function (0.0 to 1.0)]


Relative Risk or hazard ratio

Suppose we want to compare two groups. The AG+ group has hazard hAG+(t) and the AG- group has hazard hAG-(t). The relative risk of AG+ to AG- is then given by the ratio

$$ RR(t) = \frac{h_{AG+}(t)}{h_{AG-}(t)} $$

In general, this relative risk is a function of time t, that is, the relative risk at time t=2 years may not be the same as the relative risk at time t=5 years.

If we make the assumption that relative risk remains constant over time, RR(t) = θ, we have a proportional hazards model hAG+(t) = θ hAG-(t) with the hazard of the AG- group serving as the baseline.


Proportional hazards (PH) model

hAG+(t) = θ hAG-(t), or

$$ S_{AG+}(t) = [S_{AG-}(t)]^{\theta} $$

Proportional hazard means that the log-hazard functions of the two groups are parallel and the survival curves do not cross each other. This is the setup of the Cox Proportional Hazards model.

The “relative risk” is called the hazard ratio (HR). In the proportional hazards model, we make the assumption that the hazard ratio is constant over time. Hence HR is a single number
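The link between a constant hazard ratio and the two survival curves can be checked numerically. In the sketch below the baseline cumulative hazard and the value of the hazard ratio (written `theta`) are arbitrary illustrative assumptions; the point is only that multiplying the hazard by a constant raises the baseline survival function to that power.

```r
theta <- 0.5                 # assumed constant hazard ratio (illustrative)
t  <- c(0.5, 1, 2, 3)

# Illustrative baseline cumulative hazard H0(t) (Weibull form).
H0 <- (t / 2)^1.5
S0 <- exp(-H0)               # baseline survival, S(t) = exp(-H(t))

# Hazard theta * h0(t) gives cumulative hazard theta * H0(t).
S1 <- exp(-theta * H0)

round(cbind(t, S0, S1, S0_to_theta = S0^theta), 4)
```

With a hazard ratio below 1, the second curve lies above the baseline curve at every time, so the two curves do not cross.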



For a continuous covariate x, such as WBC, the (Cox) proportional hazard (PH) regression models the relative risk of a subject with covariate x to another subject who has covariate x=0.

In particular, the PH model assumes that the relative risk is constant over time (hence the name proportional) and depends only on the covariate x, as

$$ RR(t, x) = \frac{h(t \mid X = x)}{h(t \mid X = 0)} = \theta(x) $$

Next, this relative risk is modeled as θ(x) = exp(βx)

Once we combine these two parts and rewrite, with h0(t) = h(t|X=0), we get

h(t|X=x) = exp(βx) h0(t)



h(t|X=x) = h0(t) exp(βx)

The 1st component is h0(t), which is an unspecified baseline hazard

The 2nd component is the linear regression part, βx, inside the exponent.

Going back to the binary covariate AG (1=AG+ and 0=AG-), we have

h(t|X=AG+) = h0(t) exp(β)

h(t|X=AG-) = h0(t)

So, the hazard ratio (HR) between AG+ and AG- is given by θ = exp(β)

The crucial assumption in the proportional hazard regression model is that the covariate changes the baseline hazard by a constant (not time-varying) multiple


Proportional hazards (PH) model in R the coxph() function

The coxph() function in the survival package can fit a Proportional Hazards regression model

ph.ag <- coxph(Surv(SurvWk,Status)~ AG, data=agwbc)

       coef exp(coef) se(coef)     z      p
AGAG+ -1.18     0.307    0.417 -2.83 0.0047

Likelihood ratio test=8.26 on 1 df, p=0.00405
n= 33, number of events= 33

Estimate of β is -1.18. Estimate of HR = θ = exp(β) is 0.307


Proportional hazards (PH) model in R: the coxph() function

AG+ has a reduced hazard of death compared to AG-, which is the reference group. The HR is 0.307, that is, the hazard for AG+ is 0.307 times the hazard for AG- (about a 69% reduction).

The effect of AG status on survival is strongly significant (Wald test p-value=0.0047, or likelihood ratio test p=0.0041).

76


Confidence interval for HR in R

summary(ph.ag) returns a confidence interval for HR

        exp(coef)  exp(-coef)  lower .95  upper .95
AGAG+      0.3072       3.255     0.1356      0.696

Likelihood ratio test= 8.26 on 1 df, p=0.00405
Wald test            = 8 on 1 df, p=0.00468
Score (logrank) test = 8.75 on 1 df, p=0.0031
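The interval reported by summary() is a Wald interval: it is built on the log-hazard scale and then exponentiated. A minimal sketch using the rounded coefficient and standard error from the earlier coxph() output (results match the printed values only up to rounding):

```r
# Rounded values from the printed coxph() output
coef_ag <- -1.18
se_ag   <- 0.417

z_crit <- qnorm(0.975)                           # about 1.96 for a 95% interval
ci_log <- coef_ag + c(-1, 1) * z_crit * se_ag    # interval for the coefficient
ci_hr  <- exp(ci_log)                            # interval for the hazard ratio

hr_inverse <- exp(-coef_ag)                      # exp(-coef): HR of AG- vs AG+
wald_chisq <- (coef_ag / se_ag)^2                # Wald test statistic on 1 df

print(round(c(lower = ci_hr[1], upper = ci_hr[2],
              inverse = hr_inverse, wald = wald_chisq), 4))
```

Because the interval excludes 1, it agrees with the significant Wald test.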

77


PH regression with a numeric covariate

ph.wbc <- coxph(Surv(SurvWk,Status)~ WBC1000, data=agwbc)

           coef  exp(coef)  se(coef)     z      p
WBC1000  0.0101       1.01   0.00488  2.07  0.038

The HR is 1.01. Each 1 unit increase in WBC1000 (that is, 1000 units of WBC) multiplies the hazard of death by 1.01, an increase of about 1%.

A patient with WBC=10000 has 1.01 times the hazard of a patient with WBC=9000

A patient with WBC=11000 has (1.01)^2 times the hazard of a patient with WBC=9000

The effect of WBC on the hazard of death is significant, p=0.038 (within the Cox PH model).
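For a numeric covariate, the hazard ratio between two patients depends only on the difference in their covariate values. A minimal sketch using the rounded coefficient printed above; the WBC value of 19000 is a hypothetical addition for illustration.

```r
coef_wbc <- 0.0101                    # coefficient for WBC1000, as printed by coxph()

wbc_ref <- 9000                       # reference patient
wbc_new <- c(10000, 11000, 19000)     # comparison patients (19000 is hypothetical)

diff_units <- (wbc_new - wbc_ref) / 1000     # difference in WBC1000 units
hr_vs_ref  <- exp(coef_wbc * diff_units)     # HR relative to the reference patient

print(data.frame(WBC = wbc_new, diff_units = diff_units, HR = round(hr_vs_ref, 4)))
```

A difference of k units gives a hazard ratio of exp(k * coef), which equals (exp(coef))^k.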

78


PH regression with multiple covariates

When there is more than one covariate, such as X1 = AG status (a binary covariate) and X2 = WBC (a continuous covariate), the PH regression models the hazard function as

h(t | X = x) = h0(t) exp(β1 x1 + β2 x2)

where, as before, h0(t) is the baseline hazard function.

ph.mult <- coxph(Surv(SurvWk,Status)~ AG + WBC1000, data=agwbc)

             coef  exp(coef)  se(coef)      z      p
AGAG+    -1.08919      0.336   0.42635  -2.55  0.011
WBC1000   0.00784      1.008   0.00499   1.57  0.120


79


PH regression with multiple covariates

h(t | X = x) = h0(t) exp(β1 x1 + β2 x2)

Overall test: H0: β1 = 0, β2 = 0

Likelihood ratio test=10.5 on 2 df, p=0.00519 n= 33

Tests for single covariates: H0(1): β1 = 0 and H0(2): β2 = 0

             coef  exp(coef)  se(coef)      z      p
AGAG+    -1.08919      0.336   0.42635  -2.55  0.011
WBC1000   0.00784      1.008   0.00499   1.57  0.120

These are conditional tests: each p-value is adjusted for the other covariate in the model.
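A minimal sketch of where the overall and single-covariate p-values come from, using the rounded statistics printed on this slide (so the recomputed values agree only up to rounding).

```r
# Overall likelihood ratio test: chi-square with 2 df (one per coefficient)
lr_stat   <- 10.5
lr_df     <- 2
p_overall <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# Conditional Wald tests: each coefficient divided by its standard error
coefs <- c(AG = -1.08919, WBC1000 = 0.00784)
ses   <- c(AG =  0.42635, WBC1000 = 0.00499)
z     <- coefs / ses
p     <- 2 * pnorm(-abs(z))

print(round(p_overall, 5))
print(round(rbind(z = z, p = p), 4))
```

At the 5% level, AG remains significant after adjusting for WBC, while WBC is not significant after adjusting for AG.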

80


Predictions from the Cox PH model
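A minimal sketch of how predicted survival curves can be obtained from a fitted Cox model with survfit() and a newdata data frame. The agwbc data are not bundled with R, so this sketch uses the lung data set shipped with the survival package as a stand-in; the covariates age and sex and the two patient profiles are illustrative assumptions, not the analysis in these slides.

```r
library(survival)

# Stand-in data: survival::lung (NOT the agwbc data used in the slides)
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Hypothetical patient profiles to predict for
newpat <- data.frame(age = c(50, 70), sex = c(1, 1))

# survfit() on a coxph fit returns one predicted survival curve per row of newdata
pred <- survfit(fit, newdata = newpat)

# Predicted survival probabilities at 180 and 365 days
# (rows = time points, columns = patient profiles)
pred_surv <- summary(pred, times = c(180, 365))$surv
print(round(pred_surv, 3))

# The curves themselves can be drawn with:
# plot(pred, col = 1:2, xlab = "Days", ylab = "Predicted survival")
```

With the slides' data, the same pattern would presumably be survfit(ph.mult, newdata = ...) with columns AG and WBC1000 in newdata.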

81


Interaction

An interaction effect allows one to assess whether the joint effect of X1 = AG status and X2 = WBC on survival goes beyond additive on the log-hazard scale (within the setting of PH regression)

h(t | X = x) = h0(t) exp(β1 x1 + β2 x2 + β3 x1*x2)

ph.interac <- coxph(Surv(SurvWk,Status)~ AG*WBC1000, data=agwbc)

                    coef  exp(coef)  se(coef)        z       p
AGAG+          -1.735824      0.176   0.54631  -3.1774  0.0015
WBC1000        -0.000184      1.000   0.00684  -0.0268  0.9800
AGAG+:WBC1000   0.019265      1.019   0.00978   1.9706  0.0490

Likelihood ratio test=14.4 on 3 df, p=0.00245 n= 33
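With an interaction term, the hazard ratio for AG+ vs AG- is no longer a single number: it depends on WBC. A minimal sketch using the printed coefficients; the WBC1000 values below are hypothetical and chosen only for illustration.

```r
# Coefficients as printed by coxph() for the interaction model
b_ag  <- -1.735824     # AGAG+
b_int <-  0.019265     # AGAG+:WBC1000

# Hypothetical WBC levels, in WBC1000 units
wbc1000 <- c(1, 10, 50, 100)

# HR of AG+ vs AG- at a fixed WBC level: exp(b_ag + b_int * WBC1000)
# (the WBC main effect cancels because both patients share the same WBC)
hr_ag_at_wbc <- exp(b_ag + b_int * wbc1000)

# WBC1000 level at which the fitted HR equals 1
wbc_equal <- -b_ag / b_int

print(data.frame(WBC1000 = wbc1000, HR_AGpos_vs_AGneg = round(hr_ag_at_wbc, 3)))
print(round(wbc_equal, 1))
```

Under this fitted model, the estimated benefit of AG+ shrinks as WBC increases.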

82


Predictions from Interaction model

83


Graphical Check for proportional hazards: categorical covariate

Proportional Hazards regression using the binary covariate AG

ph.ag <- coxph(Surv(SurvWk,Status)~ AG, data=agwbc)

Proportional hazards means that the hazards, and hence the cumulative hazards, for the two groups, AG+ and AG-, are proportional, so the log-cumulative hazards are parallel.

This is the setup of the PH model, and hence this is what we are going to see if we plot the log-cumulative hazards estimated from the fitted model.

84


85



However, since AG is categorical, we can also fit Kaplan-Meier estimates

Without the proportional hazards assumption

But the KM estimates will be fitted separately in the AG+ and AG- subsets

km.ag <- survfit(Surv(SurvWk,Status)~ AG, data=agwbc)
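A minimal sketch of the graphical check based on Kaplan-Meier estimates: compute log(-log S(t)) separately in each group and see whether the gap between the groups stays roughly constant. The agwbc data are not bundled with R, so the lung data set from the survival package, grouped by sex, is used as a stand-in; the time points are illustrative assumptions.

```r
library(survival)

# Stand-in data: survival::lung grouped by sex (NOT the agwbc data used in the slides)
km <- survfit(Surv(time, status) ~ sex, data = lung)

# Kaplan-Meier survival in each group at a few illustrative time points (days)
times <- c(200, 400, 600)
s <- summary(km, times = times)

# Log-cumulative hazard, log(-log S(t)), split by group
cloglog <- split(log(-log(s$surv)), s$strata)

# Under proportional hazards the gap between the groups is roughly constant in t
gap <- cloglog[[2]] - cloglog[[1]]
print(round(gap, 3))

# Graphical version of the same check:
# plot(km, fun = "cloglog", col = 1:2, xlab = "Days (log scale)", ylab = "log(-log S(t))")
```

With the slides' data the same check would presumably use km.ag in place of km.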

86



87


Comparison of KM and PH estimates

88


Other diagnostics for PH assumption

Based on constructing time-dependent covariates

Based on residual diagnostics
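One residual-based diagnostic available in the survival package is cox.zph(), which tests for a trend over time in the scaled Schoenfeld residuals. A minimal sketch, again using the lung data as a stand-in for agwbc; the model with age and sex is an illustrative assumption, not the analysis in these slides.

```r
library(survival)

# Stand-in data and model (NOT the agwbc analysis from the slides)
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Test of the PH assumption per covariate, plus a global test
zph <- cox.zph(fit)
print(zph)

# Table of test statistics and p-values; small p-values suggest non-proportionality
zph_table <- zph$table

# Smoothed residual plots, one per covariate:
# plot(zph)
```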

89


90

[email protected]
