
01/2015 1

EPI 5344: Survival Analysis in Epidemiology
Introduction to concepts and basic methods

February 24, 2015

Dr. N. Birkett, School of Epidemiology, Public Health & Preventive Medicine, University of Ottawa

Survival concepts (1)

• Cohort studies
  – Follow up a pre-defined group of people for a period of time, which can be:
    • The same time for everyone
    • Different times for different people
  – Determine which people achieve the specified outcome.

• Cohort studies
  – Outcomes could be many different things, such as:
    • Death
      – Any cause
      – Cause-specific
    • Onset of new disease
    • Resumption of smoking in someone who had quit
    • Recidivism for drug use or criminal activity
    • Change in a numerical measure such as blood pressure
      – Longitudinal data analysis

• Cohort studies
  – Traditional approach to cohorts assumes everyone is followed for the same time
    • Incidence proportion
    • Logistic regression modeling
  – If follow-up time varies, what do you do with subjects who don’t make it to the end of the study?
    • Censoring

• Cohort studies
  – Cohort studies can provide more information than presence/absence of outcome.
    • Time when outcome occurred
    • Type of outcome (competing outcomes)
  – Can look at rate or speed of development of outcome
    • Incidence rate
    • Person-time

• Time to event analysis
  – Survival analysis (general term)
  – Life tables
  – Kaplan-Meier curves
  – Actuarial methods
  – Log-rank test
  – Cox modeling (proportional hazards)
• Strong link to engineering
  – Failure time studies
• Common epidemiological approach to the analysis of cohort studies
  – Most common outcome measure is:
    • Incidence proportion
    • Cumulative incidence
  – Select a point in time as the end of follow-up.
  – Compare groups using t-test
  – Logistic regression is commonly used
  – Produces a CIR (RR)

• Issues with this approach include:
  – What point in time to use?
  – What if not all subjects remain under follow-up that long?
  – Ignores information from subjects who don’t get the outcome or reach the time point
  – What is the incidence proportion for the outcome ‘death’ if we set the follow-up time to 200 years?
    • Will always be 100%

• Alternate method uses incidence rate (density)
  – Based on person-time of follow-up
  – Can include information on drop-outs, etc.
  – Closely linked to survival analysis methods

• Cumulative incidence
  – The probability of becoming ill over a pre-defined period of time.
  – No units
  – Range 0-1
• Incidence density (rate)
  – The rate at which people get ill during person-time of follow-up
    • Units: 1/time or cases/person-time
    • Range 0 to +∞
  – Very closely related to hazard rate.
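The two measures can be contrasted on a tiny cohort. The sketch below is in Python rather than SAS, purely for illustration; the five follow-up times and outcomes are hypothetical, and it assumes everyone is followed either to the event or to the 2-year end of follow-up (no losses).

```python
# Hypothetical cohort (not from the course data): five people, each followed
# until the event or until the end of follow-up at 2 years.
follow_up_years = [2.0, 1.5, 0.5, 2.0, 1.0]
had_event = [False, True, True, False, True]

cases = sum(had_event)

# Cumulative incidence: proportion of the cohort with the event by 2 years.
# It has no units and lies between 0 and 1.
cumulative_incidence = cases / len(had_event)

# Incidence density (rate): cases per unit of person-time at risk.
# Units are cases per person-year; it can exceed 1.
person_years = sum(follow_up_years)
incidence_rate = cases / person_years

print(f"Cumulative incidence over 2 years: {cumulative_incidence:.2f}")
print(f"Incidence rate: {incidence_rate:.3f} cases per person-year")
```

The same three cases give a proportion (3 of 5 people) in one measure and a rate (3 cases in 7 person-years) in the other.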

Measuring Time (1)

• Units to use to measure time
  – Normally, years/months/days
  – Time of events is usually measured using dates on a calendar
  – Other measures are possible (e.g. hours)
• ‘Scale’ to be used
  – time on study
  – age
  – calendar date
• Time ‘0’ (‘origin of time’)
  – The point when time starts

Time Scale (1)

• Time of events is usually measured using ‘calendar dates’
• Can be represented in graphic display by ‘time lines’
  – The conceptual idea used in analyses

Patient #1 enters on Feb 15, 2000 & dies on Nov 8, 2000

Patient #2 enters on July 2, 2000 & is lost (censored) on April 23, 2001

Patient #3 enters on June 5, 2001 & is still alive (censored) at the end of the follow-up period

Patient #4 enters on July 13, 2001 & dies on December 12, 2002

Time Scale (2)

• In RCTs, focus is commonly on ‘study time’
  – How long after a patient starts follow-up do their events occur?
  – Need to define a ‘time 0’, the point when study time starts accumulating for each patient.
  – Frequently used as the ‘default’ in observational research
• Most epidemiologists recommend using ‘age’ as the time scale for etiological studies
  – More in Session 6
• For now, focus on ‘study time’ as the time scale

Origin of Time (1)

• Choice of time ‘0’ affects analysis
  – can produce very different regression coefficients and model fit
• Preferred origin is often unavailable
• More than one origin may make sense
  – no clear criterion to choose which to use

Time ‘0’ (2)

• No best time ‘0’ for all situations
  – Depends on study objectives and design
• RCT of Rx
  – ‘0’ = date of randomization
• Prognostic study
  – ‘0’ = date of disease onset
  – Inception cohort
  – Often use: date of disease diagnosis

Time ‘0’ (3)

• ‘Point source’ exposure
  – Use date of event
    • Hiroshima atomic bomb
    • Dioxin spill (Seveso, Italy)

Time ‘0’ (4)

• Chronic exposure
  – Date of study entry
  – Date of first exposure
  – For age as time scale, time ‘0’ is date of birth
• Issues to consider
  – There often is no first exposure (or no clear date of first exposure)
  – Recruitment long after first exposure
    • Immortal person-time
    • Lack of info on early events

Time ‘0’ (5)

• Here is our sample time line data
• Convert for analysis by defining a time ‘0’

Patient #1 enters on Feb 15, 2000 & dies on Nov 8, 2000

Patient #2 enters on July 2, 2000 & is lost (censored) on April 23, 2001

Patient #3 enters on June 5, 2001 & is still alive (censored) at the end of the follow-up period

Patient #4 enters on July 13, 2001 & dies on December 12, 2002
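As a rough illustration of this conversion, the sketch below (Python, for illustration only) takes each patient's entry date as time ‘0’ and computes days on study. The slides do not state when the follow-up period ends, so the end date used for Patient #3 is an assumption made only so the example runs.

```python
from datetime import date

# ASSUMPTION: the end of the follow-up period is not given on the slides;
# December 31, 2002 is a hypothetical value chosen for illustration.
ASSUMED_END_OF_FOLLOW_UP = date(2002, 12, 31)

# (patient id, entry date = time '0', exit date, status at exit)
patients = [
    (1, date(2000, 2, 15), date(2000, 11, 8), "died"),
    (2, date(2000, 7, 2), date(2001, 4, 23), "censored"),
    (3, date(2001, 6, 5), ASSUMED_END_OF_FOLLOW_UP, "censored"),
    (4, date(2001, 7, 13), date(2002, 12, 12), "died"),
]

# Study time = days elapsed between entry (time '0') and exit.
study_times = [
    (pid, (exit_date - entry_date).days, status)
    for pid, entry_date, exit_date, status in patients
]

for pid, days, status in study_times:
    print(f"Patient #{pid}: {days} days on study, {status}")
```

After the conversion every patient's time line starts at 0, regardless of the calendar date on which they entered.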

Time ‘0’ (6)

• Calendar time can be very important
  – Uses the actual date of the event
  – Studies of incidence/mortality trends
  – Normally uses Poisson or similar models
• In survival analysis, focus is on ‘study time’
  – When after a patient starts follow-up do their events occur?
• Need to change time lines to reflect new time scale

[Figure: Study course for patients in cohort; time axis marked 2001, 2003, 2013]

Time ‘0’ (7)

• Can be interested in more than one ‘event’
  – More than one ‘time to event’
• An example:
  – Patients treated for malignant melanoma
  – Treated with drug ‘A’ or ‘B’
  – Expected to influence both:
    • Time to relapse
    • Time to death (survival time)

Time ‘0’ (8)

• SAS code to compute time-to-event.
• Surgical treatment for breast cancer
• Four time points:
  – Date of surgery
  – Relapse
  – Death
  – Last follow-up (if still alive without relapse)

Time ‘0’ (9)

• Time ‘0’: Date of surgery
• Event #1: Relapse
  – Earliest of relapse/death/end
• Event #2: Death
  – Earliest of death/end

Time ‘0’

• How do we compute the ‘time on study’ for each of these events?
• Convert to days (or weeks, months, years) from time ‘0’ for each person
• Let’s talk some SAS

Dates in SAS (1)

• Multiple ways to get date data into SAS
  – I commonly use three variables for each date:
    • Day
    • Month
    • Year
  – Facilitates data entry and editing
  – Requires more complicated manipulation later
• Stored as SAS date variables
  – Multiple formats available for data entry
  – Always stored as # days since Jan 1, 1960.

Dates in SAS (2)

data dates;
  input ptid 1-5 @7 surgdate mmddyy8.;
  datalines;
13725 10/5/95
25422 3/7/97
34721 9/6/94
11111 6/6/55
;
run;

proc print data=dates;
run;

Dates in SAS (3)

Obs #   ptid    surgdate
1       13725   13061
2       25422   13580
3       34721   12667
4       11111   -1670
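The unformatted values are simply day counts from January 1, 1960, which can be checked outside SAS. The sketch below uses Python's standard `datetime` module for that check; it assumes the two-digit years in the data lines are read as 19xx, which is what the formatted output on the later slide shows.

```python
from datetime import date

SAS_EPOCH = date(1960, 1, 1)  # SAS date value 0


def sas_date_number(d: date) -> int:
    """Days elapsed since Jan 1, 1960 (negative for earlier dates)."""
    return (d - SAS_EPOCH).days


# Surgery dates from the data lines, with two-digit years assumed to be 19xx.
surgery_dates = {
    13725: date(1995, 10, 5),
    25422: date(1997, 3, 7),
    34721: date(1994, 9, 6),
    11111: date(1955, 6, 6),
}

sas_numbers = {ptid: sas_date_number(d) for ptid, d in surgery_dates.items()}

for ptid, number in sas_numbers.items():
    print(ptid, number)
```

A date before 1960, such as the 1955 surgery, comes out negative, matching the printed value for patient 11111.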

Dates in SAS (4)

data dates;
  input ptid 1-5 @7 surgdate mmddyy8.;
  datalines;
13725 10/5/95
25422 3/7/97
34721 9/6/94
11111 6/6/55
;
run;

proc print data=dates;
  format surgdate date9.;
run;

Dates in SAS (5)

Obs #   ptid    surgdate
1       13725   05OCT1995
2       25422   07MAR1997
3       34721   06SEP1994
4       11111   06JUN1955

Time ‘0’

• Read the date data using a ‘date format’
• If the event didn’t happen, then the date is ‘missing’

SAS code to create event variables

data melanoma;
  set melanoma;
  /* survevent: 1 = died, 0 = alive (censored) at the end of follow-up */
  if (date_of_death = .) then survevent = 0;
  else survevent = 1;

data melanoma;
  set melanoma;
  /* Equivalent one-line form: the comparison evaluates to 1 (true) or 0 (false) */
  survevent = (date_of_death ne .);

  /* Survival time in months from surgery (30.4 days per month) */
  if (survevent = 0) then survtime = (date_of_last - date_of_surg)/30.4;
  else survtime = (date_of_death - date_of_surg)/30.4;

  /* dfs -> Died or relapsed */
  if ((date_of_relapse = .) and (date_of_death = .)) then dfsevent = 0;
  else dfsevent = 1;

  /* dfs -> Died or relapsed */
  dfsevent = 1 - (date_of_relapse = .)*(date_of_death = .);

  /* dfs -> Died or relapsed */
  dfsevent = 1 - (date_of_relapse = .)*(date_of_death = .);

  /* Time to the first of relapse, death, or last follow-up, in months */
  if (dfsevent = 0) then dfstime = (date_of_last - date_of_surg)/30.4;
  else if (date_of_relapse NE .) then dfstime = (date_of_relapse - date_of_surg)/30.4;
  else if (date_of_relapse = . and date_of_death NE .) then dfstime = (date_of_death - date_of_surg)/30.4;
  else dfstime = .E;
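For readers who want to trace the logic outside SAS, here is a minimal Python sketch of the same rules. It is not the course's code: the function name `survival_and_dfs` and the two test patients are hypothetical, and a missing SAS date is represented by `None`.

```python
from datetime import date
from typing import Optional, Tuple

DAYS_PER_MONTH = 30.4  # same days-to-months conversion as the SAS code


def months_between(start: date, end: date) -> float:
    return (end - start).days / DAYS_PER_MONTH


def survival_and_dfs(
    surg: date,
    last: date,
    relapse: Optional[date] = None,
    death: Optional[date] = None,
) -> Tuple[int, float, int, float]:
    """Return (survevent, survtime, dfsevent, dfstime); None means 'missing'."""
    # Overall survival: the event is death; otherwise censor at last follow-up.
    survevent = int(death is not None)
    survtime = months_between(surg, death if death is not None else last)

    # Disease-free survival: the event is relapse or death.
    dfsevent = int(relapse is not None or death is not None)
    if relapse is not None:
        dfs_end = relapse      # relapse date takes priority, as in the SAS code
    elif death is not None:
        dfs_end = death        # died without a recorded relapse
    else:
        dfs_end = last         # censored at last follow-up
    dfstime = months_between(surg, dfs_end)
    return survevent, survtime, dfsevent, dfstime


# Hypothetical patients, for illustration only.
relapsed_then_died = survival_and_dfs(
    surg=date(2000, 1, 1), last=date(2001, 12, 31),
    relapse=date(2000, 12, 31), death=date(2001, 12, 31),
)
still_well = survival_and_dfs(surg=date(2000, 1, 1), last=date(2000, 1, 31))
print(relapsed_then_died)
print(still_well)
```

The first patient contributes an event to both analyses but with different times; the second is censored in both at the last follow-up date.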


Survival curve (1)

• What can we do with data which includes time-to-event?

• Might be nice to see a picture of the number of people surviving from the start to the end of follow-up.

Sample Data: Mortality, no losses

Year   # still alive   # dying in the year
2000   10,000          2,000
2001    8,000          1,600
2002    6,400          1,280
2003    5,120          1,024
2004    4,096            820
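With no losses, the counts in this table convert directly to proportions of the original cohort. A small Python sketch of that conversion, using only the numbers in the table (the variable names are illustrative):

```python
# "# still alive" at the start of each year, taken from the sample data table.
alive_at_start = {2000: 10_000, 2001: 8_000, 2002: 6_400, 2003: 5_120, 2004: 4_096}

cohort_size = alive_at_start[2000]

# With no losses to follow-up, the proportion surviving to the start of each
# year is the number still alive divided by the original cohort size.
proportion_surviving = {
    year: n_alive / cohort_size for year, n_alive in alive_at_start.items()
}

for year, proportion in proportion_surviving.items():
    print(year, f"{proportion:.4f}")
```

The proportions fall by a factor of 0.8 each year, that is, 20% of those still alive die in each year of this sample data.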


Not the right axis for a survival curve

Survival curve (2)

• Previous graph has a problem
  – What if some people were lost to follow-up?
  – Plotting the number of people still alive would effectively say that the lost people had all died.

Sample Data: Mortality, with losses to follow-up

Year   # still alive   # dying in the year   Lost to follow-up
2000   10,000          2,000                 1,000
2001    7,000          1,400                   800
2002    4,800            960                   500
2003    3,340            670                   400
2004    2,270            460                   260

Survival curve (2)

• Previous graph has a problem
  – What if some people were lost to follow-up?
  – Plotting the number of people still alive would effectively say that the lost people had all died.
• Instead
  – A true survival curve plots the probability of surviving.
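One way to see the difference is to compute both quantities from the sample data with losses. The Python sketch below is not the method developed later in the course; it assumes, for simplicity, that all losses occur at the end of each year, so that everyone alive at the start of a year is at risk for the whole year.

```python
# (year, # still alive at start, # dying in the year, # lost to follow-up)
rows = [
    (2000, 10_000, 2_000, 1_000),
    (2001, 7_000, 1_400, 800),
    (2002, 4_800, 960, 500),
    (2003, 3_340, 670, 400),
    (2004, 2_270, 460, 260),
]
cohort_size = rows[0][1]

# Probability-based estimate: multiply the yearly survival probabilities.
# ASSUMPTION: losses happen at year end, so the at-risk count is the number
# still alive at the start of the year.
estimated_survival = []
surviving = 1.0
for year, at_risk, deaths, lost in rows:
    surviving *= 1 - deaths / at_risk
    estimated_survival.append((year, surviving))

# Count-based curve: fraction of the original cohort still under observation
# at year end. This implicitly treats everyone who was lost as dead.
count_based = [
    (year, (at_risk - deaths - lost) / cohort_size)
    for year, at_risk, deaths, lost in rows
]

for (year, s_est), (_, s_count) in zip(estimated_survival, count_based):
    print(year, f"estimated {s_est:.3f}", f"count-based {s_count:.3f}")
```

The count-based curve drops faster in every year because each lost person is effectively counted as a death.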


Survival Curves (1)

• Primary outcome is ‘an event happened’
• You need to know:
  – type of event
  – time to event

Person   Type    Time
1        Death   100
2        Alive   200
3        Lost    150
4        Death    65
And so on

Survival Curves (2)

• Censoring (censored outcome)
  – People who do not have the targeted outcome (e.g. death)
  – For now, assume no censoring
• How do we represent the ‘time’ data in a statistical method?
  – Histogram of death times - f(t)
  – Survival curve - S(t)
  – Hazard curve - h(t)
• To know one is to know them all

Histogram of death time
- Skewed to right
- pdf or f(t)
- CDF or F(t)
  - Area under ‘pdf’ from ‘0’ to ‘t’: $F(t) = \int_0^t f(x)\,dx$

Survival curves (3)

• Plot % of group still alive (or % dead)

S(t) = survival curve
     = % still surviving at time ‘t’
     = P(survive to time ‘t’)

Cumulative mortality = 1 - S(t)
                     = F(t)
                     = Cumulative incidence

[Figure: survival curve S(t) and deaths curve CI(t) = 1 - S(t)]

‘Rate’ of dying

• Consider these 2 survival curves
• Which has the better survival profile?
  – Both have S(3) = 0

Survival curves (4)

• Most people would prefer to be in group ‘A’ rather than group ‘B’.
  – Death rate is lower in first two years.
  – Will live longer than in pop ‘B’
• Concept is called:
  – Hazard: Survival analysis/stats
  – Force of mortality: Demography
  – Incidence rate/density: Epidemiology

Survival curves (5)

• DEFINITION of hazard
  – h(t) = rate of dying at time ‘t’ GIVEN that you have survived to time ‘t’
  – Similar to asking the speed of your car given that you are two hours into a five hour trip from Ottawa to Toronto
• Slight detour and then back to main theme

Conditional Probability

h(t0) = rate of failing at ‘t0’ conditional on surviving to t0

Requires the ‘conditional survival curve’: $S^*(t) = S(t) / S(t_0)$ for $t \ge t_0$

Essentially, you are re-scaling S(t) so that S*(t0) = 1.0

S*(t) = survival curve conditional on surviving to ‘t0’

CI*(t) = failure/death/cumulative incidence at ‘t’ conditional on surviving to ‘t0’

Hazard at t0 is defined as: ‘the slope of CI*(t) at t0’

Hazard (instantaneous) / Force of mortality / Incidence rate / Incidence density

Range: 0 to ∞
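The slope definition can be written out as a short derivation. This is a sketch using the symbols already introduced (S(t), S*(t), CI*(t), and f(t) for the density of death times); it assumes S(t) is differentiable at t0 and that S(t0) > 0.

```latex
\begin{align*}
S^*(t)  &= \frac{S(t)}{S(t_0)}, \qquad t \ge t_0 \\
CI^*(t) &= 1 - S^*(t) = 1 - \frac{S(t)}{S(t_0)} \\
h(t_0)  &= \left.\frac{d\,CI^*(t)}{dt}\right|_{t=t_0}
         = -\frac{S'(t_0)}{S(t_0)}
         = \frac{f(t_0)}{S(t_0)}
         = -\left.\frac{d}{dt}\ln S(t)\right|_{t=t_0}
\end{align*}
```

The step from $-S'(t_0)$ to $f(t_0)$ uses $S(t) = 1 - F(t)$, so $S'(t) = -f(t)$. This is one way to see why knowing any one of f(t), S(t) and h(t) determines the other two.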

Some relationships

If the rate of disease is small: CI(t) ≈ H(t), where H(t) is the cumulative hazard up to time ‘t’

If we assume h(t) is constant (= ID): CI(t) ≈ ID*t
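A quick numerical check shows when the approximation holds. The Python sketch below uses the standard constant-hazard relationship CI(t) = 1 - exp(-ID*t); the two rates are hypothetical values chosen to contrast a rare and a common outcome.

```python
import math


def exact_cumulative_incidence(rate: float, t: float) -> float:
    """CI(t) = 1 - exp(-rate * t) when the hazard is constant."""
    return 1.0 - math.exp(-rate * t)


def approximate_cumulative_incidence(rate: float, t: float) -> float:
    """The rare-disease approximation CI(t) ~ ID * t."""
    return rate * t


# Hypothetical incidence densities, per person-year, over 5 years.
years = 5.0
for label, rate in [("rare outcome", 0.01), ("common outcome", 0.5)]:
    exact = exact_cumulative_incidence(rate, years)
    approx = approximate_cumulative_incidence(rate, years)
    print(f"{label}: exact {exact:.4f}, approximation {approx:.4f}")
```

For the rare outcome the two values agree closely; for the common outcome the approximation exceeds 1, which is impossible for a probability.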

Some survival functions (1)

• Exponential
  – $h(t) = \lambda$
  – $S(t) = \exp(-\lambda t)$
• Underlies most of the ‘standard’ epidemiological formulae.
• Assumes that the hazard is constant over time
  – Big assumption which is not usually true

• Weibull
  – $h(t) = \lambda \gamma t^{\gamma - 1}$
  – $S(t) = \exp(-\lambda t^{\gamma})$
• Allows fitting a broader range of hazard functions
• Assumes hazard is monotonic
  – Always increasing (or decreasing)
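The Weibull formulas are easy to evaluate directly. The Python sketch below codes the two expressions as written on the slide; the parameter values are hypothetical and chosen only to show that γ = 1 gives back the exponential model, while γ above or below 1 gives a rising or falling hazard.

```python
import math


def weibull_hazard(t: float, lam: float, gamma: float) -> float:
    """h(t) = lambda * gamma * t^(gamma - 1), for t > 0."""
    return lam * gamma * t ** (gamma - 1)


def weibull_survival(t: float, lam: float, gamma: float) -> float:
    """S(t) = exp(-lambda * t^gamma)."""
    return math.exp(-lam * t ** gamma)


lam = 0.1  # hypothetical scale parameter
for gamma in (0.5, 1.0, 2.0):  # hypothetical shape parameters
    hazards = [weibull_hazard(t, lam, gamma) for t in (1.0, 2.0, 4.0)]
    print(f"gamma = {gamma}: hazards at t = 1, 2, 4 ->",
          [round(h, 4) for h in hazards])
```

In every case the hazard moves in one direction only, which is the monotonic restriction noted above.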


Hazard curves (2)


Hazard curves (3)

• All these functions assume that everyone eventually gets the outcome event.
• That might not be true:
  – Cures occur
  – Immunity
• Mixture models

• Piece-wise exponential
  – Divide follow-up into intervals
  – The hazard is constant within interval but can differ across intervals (e.g. ‘0’ for cure)
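A piece-wise exponential survival curve can be sketched by adding up hazard × time within each interval and exponentiating. The Python example below is illustrative only: the function name, interval cut points and hazards are hypothetical, with the last interval given a hazard of 0 to mimic cure.

```python
import math
from typing import Sequence


def piecewise_exponential_survival(
    t: float, interval_starts: Sequence[float], hazards: Sequence[float]
) -> float:
    """S(t) = exp(-sum of hazard * time spent in each interval up to t).

    interval_starts begins at 0; hazards[j] applies from interval_starts[j]
    up to the next start, and the last interval is open-ended.
    """
    cumulative_hazard = 0.0
    for j, (start, hazard) in enumerate(zip(interval_starts, hazards)):
        if t <= start:
            break  # t falls before this interval begins
        end = interval_starts[j + 1] if j + 1 < len(interval_starts) else math.inf
        cumulative_hazard += hazard * (min(t, end) - start)
    return math.exp(-cumulative_hazard)


# Hypothetical hazards per year: 0.2 in year 1, 0.1 in years 2-3, then 0 (cure).
interval_starts = [0.0, 1.0, 3.0]
hazards = [0.2, 0.1, 0.0]

for t in (1.0, 3.0, 10.0):
    s = piecewise_exponential_survival(t, interval_starts, hazards)
    print(f"S({t}) = {s:.4f}")
```

Because the hazard is 0 after year 3, the curve stays flat from then on instead of falling to zero.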

• Gompertz Model
  – Uses a functional form for S(t) which goes to a fixed, non-zero value after a finite time

Censoring (1)

• So much for theory
• In real world, we run into practical issues:
  – May know that subject was disease-free up to time ‘t’ but then you lost track of them
  – May only know subject got disease before time ‘t’
  – May only know subject got disease between two exam dates.
  – May know subject must have been outcome-free for the first ‘x’ years of follow-up (immortal person-time)
  – Can’t measure time to infinite precision
    • Often only know year of event
  – Exact time of event might not even exist in theory

Censoring (2)

• Three main kinds of censoring
  – Right censoring
    • The time of the event is known to be later than some time
    • Subject moves to Australia after three years of follow-up
      – We only know that they died some time after 3 years.
  – Left censoring
    • The time of the event is known to be before some time
      – Looking at age of menarche, starting with a group of 12 year old girls.
      – Some girls are already menstruating

Censoring (3)

• Three main kinds of censoring
  – Interval censoring
    • The event is known to have occurred between two known times
      – Annual HIV test
      – Negative on Jan 1, 2012
      – Positive on Jan 1, 2013
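One convenient way to think about the three kinds of censoring is as a pair of bounds on the true event time. The Python sketch below is an illustrative convention, not something taken from the slides: each observation is a (lower, upper) pair, an unbounded upper limit means right censoring, and a lower limit of 0 means left censoring. The example times are hypothetical.

```python
import math


def censoring_type(lower: float, upper: float) -> str:
    """Classify an observation from the bounds on its true event time."""
    if lower == upper:
        return "exact"               # event time observed directly
    if math.isinf(upper):
        return "right-censored"      # only know the event is after `lower`
    if lower == 0:
        return "left-censored"       # only know the event is before `upper`
    return "interval-censored"       # event lies between two known times


# Hypothetical observations, in years.
observations = {
    "died at 2.5 years": (2.5, 2.5),
    "moved away after 3 years": (3.0, math.inf),
    "menarche before age 12": (0.0, 12.0),
    "HIV negative at year 4, positive at year 5": (4.0, 5.0),
}

for label, (lower, upper) in observations.items():
    print(f"{label}: {censoring_type(lower, upper)}")
```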


Censoring (4)

• Right censoring is most commonly considered
  – Type 1 censoring
    • The censoring time is ‘fixed’ (under control of investigator)
  – Singly censored
    • Everyone has the same censoring time
    • Commonly due to the study ending on a specific date

Censoring (5)

• Right censoring is most commonly considered
  – Type 2 censoring
    • Terminate study after a fixed number of events has happened
      – most common in lab studies
  – Random censoring
    • Observation terminated for reason not under investigator’s control
    • Varying reasons for drop-out
    • Varying entry times

Censoring (6)

• Right censoring is most commonly assumed
• At the end of their follow-up, subject has not had event.
  – Administrative censoring
  – Loss to follow-up
    • A patient moves away or is lost without having experienced event of interest
  – Drop-out
    • Patient dropped from study due to protocol violation, etc.
  – Competing risks
    • Death occurs due to a competing event
• We know something about these patients.
• Discarding them would ‘waste’ information

[Figure: Study course for patients in cohort; time axis marked 2001, 2003, 2013]

Censoring (7)

• Standard analysis ignores method used to generate censoring.
• Type 1/2 methods work fine
• ‘Random’ censoring can be a problem.
• Informative vs. uninformative censoring
  – Standard analyses require ‘uninformative’ censoring
  – The development of the outcome in subjects who are censored must be the same as in the subjects who remained in follow-up

Censoring (8)

• Informative vs. uninformative censoring
  – RCT of new therapy with serious side effects.
    • Patients on this Rx can tolerate side effects until near death. Then, they drop out.
    • Mortality rate in this group will be 0 (/100,000)
  – Control therapy has no side-effects
    • Patients do not drop out near death.
    • Strong bias
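The bias in this example can be made concrete with a small deterministic calculation. Everything in the Python sketch below is hypothetical: both arms are given the same true death times, the study is assumed to end at 36 months, and patients on the new therapy are assumed to drop out one month before they die.

```python
from typing import Optional, Sequence

# Hypothetical true death times (months from randomization), identical in
# both arms so that the true mortality is the same.
new_rx_death_times = [10, 14, 20, 26, 30]
control_death_times = [10, 14, 20, 26, 30]

STUDY_END = 36      # ASSUMED end of follow-up, in months
DROP_OUT_LEAD = 1   # ASSUMED: new-Rx patients drop out 1 month before death


def observed_mortality_rate(
    death_times: Sequence[float], drop_out_lead: Optional[float] = None
) -> float:
    """Observed deaths per person-month of follow-up."""
    deaths = 0
    person_months = 0.0
    for t in death_times:
        if drop_out_lead is not None:
            # Informative censoring: the patient leaves just before dying,
            # so the death is never observed.
            person_months += min(t - drop_out_lead, STUDY_END)
        elif t <= STUDY_END:
            deaths += 1
            person_months += t
        else:
            person_months += STUDY_END  # administratively censored
    return deaths / person_months


new_rx_rate = observed_mortality_rate(new_rx_death_times, DROP_OUT_LEAD)
control_rate = observed_mortality_rate(control_death_times)
print(f"New Rx observed rate:  {new_rx_rate:.3f} per person-month")
print(f"Control observed rate: {control_rate:.3f} per person-month")
```

Although the true death times are identical, the observed rate in the new-therapy arm is 0, because censoring is tied to imminent death.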
