

Learning Patient-Specific Cancer Survival Distributions as a Sequence of Dependent Regressors

Chun-Nam Yu (1), Russ Greiner (1), Hsiu-Chin Lin (1), Vickie Baracos (2)

(1) Dept of Computing Science, (2) Dept of Oncology, University of Alberta, Canada

Introduction

- Cancer patients and doctors need to make decisions about treatments and end-of-life care
- Good prognostic models help manage the uncertainty and make better decisions
- Use statistics such as median survival time, 5-year survival rate
- Now: estimated from a large heterogeneous population, based on only:
  - Site + Stage of cancer
  - (Perhaps) 1 or 2 covariates (e.g., age, sex, race)
- Should use other information already in electronic health records:
  - prescriptions
  - blood test results
  - performance assessment by physicians
- e.g. the Alberta Cancer Registry
- Our GOAL: Build accurate patient-specific prognostic predictors with electronic health records

Basic Survival Analysis

- Survival function:

  $S(t) = P(T \ge t)$

  is the proportion of patients surviving longer than $t$ months

[Figure: survival curve S(t) against months, with marks at 0.5 and 12]

- Hazard function:

  $h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T \le t + \Delta t \mid T \ge t)}{\Delta t}$

  is the instantaneous rate of failure/death at time $t$, given the patient survives longer than $t$

[Figure: survival curve S(t) against months, with marks at 6 and 24]

- Censored observations: Some patients leave in the middle of a study or remain alive at the end of a study ⇒ only a lower bound on their survival time is known.
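For a continuous survival time $T$ with $S(0) = 1$, the survival and hazard functions defined above are linked by a standard identity, sketched here. The density $f(t)$ and the integration variable $u$ are notation introduced for this note and do not appear on the poster.

```latex
\begin{align*}
h(t) &= \lim_{\Delta t \to 0} \frac{P(t \le T \le t + \Delta t \mid T \ge t)}{\Delta t}
      = \frac{f(t)}{S(t)}
      = -\frac{d}{dt} \log S(t), \\
S(t) &= \exp\!\Big( -\int_0^t h(u)\, du \Big).
\end{align*}
```

Either function therefore determines the other, which is why regression models can be stated in terms of the hazard and still yield survival curves.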

Survival Regression Models

- One of the most important regression models in survival analysis is the Cox Proportional Hazards (PH) model [1]
- Given covariates $\vec{x}$, Cox regression assumes the hazard function has the form

  $h(t \mid \vec{x}) = h_0(t) \exp(\vec{\theta}^T \vec{x})$,

  where $h_0(t)$ is a base hazard independent of $\vec{x}$.
- The hazard ratio is constant (independent of $t$) for two patients $\vec{x}_1$ and $\vec{x}_2$:

  $h(t \mid \vec{x}_1) / h(t \mid \vec{x}_2) = \exp(\vec{\theta}^T (\vec{x}_1 - \vec{x}_2))$

- Objective depends on the rank of survival times only; cannot handle covariates with time-varying effects
- Other survival regression models include the Aalen linear hazards model [2] and parametric survival regression based on Weibull or log-normal distributions
- Survival Analysis vs. Survival Prediction:

[Figure: Cox survival curves for 3 patients; x-axis: Time (Months), 0 to 50; y-axis: P(survival), 0.0 to 1.0]

          Survival Analysis                       Survival Prediction
  Goal:   Evaluate prognostic factors and         Predict accurate patient-specific
          treatment effectiveness                 survival time
  Focus:  Evaluation on populations               Evaluation on individuals
  Uses:   Testing new drugs & medical devices     Treatment planning & patient management
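The constant hazard ratio of the Cox model described above can be checked numerically. The following is a minimal sketch, not the authors' code: the Weibull-shaped base hazard, the coefficient vector `theta`, and the two covariate vectors are hypothetical values chosen only for illustration.

```python
import math

def weibull_baseline(t, shape=1.5, scale=10.0):
    """Hypothetical base hazard h0(t); any positive function of t would do."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)

def cox_hazard(t, x, theta, baseline=weibull_baseline):
    """Cox PH hazard h(t | x) = h0(t) * exp(theta^T x)."""
    linear = sum(th * xi for th, xi in zip(theta, x))
    return baseline(t) * math.exp(linear)

# Hypothetical coefficients and patients (assumptions, not from the poster).
theta = [0.5, -0.2]
x1 = [1.0, 2.0]
x2 = [0.0, 1.0]

# The base hazard cancels in the ratio, so it is the same at every time point.
times = [1.0, 6.0, 12.0, 24.0]
ratios = [cox_hazard(t, x1, theta) / cox_hazard(t, x2, theta) for t in times]
expected_ratio = math.exp(sum(th * (a - b) for th, a, b in zip(theta, x1, x2)))

for t, r in zip(times, ratios):
    print(f"t = {t:5.1f} months  hazard ratio = {r:.4f}")
```

Because the ratio never changes with `t`, survival curves produced by a Cox model for different patients cannot cross, which is the restriction the poster refers to as the PH assumption.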

Data - Cancer Patient Composition and Patient Attributes

  site\stage          1     2     3     4
  Bronchus & Lung    61    44   186   390
  Colorectal         15   157   233   545
  Head and Neck       6     8    14   206
  Esophagus           0     1     1    63
  Pancreas            1     3     0   134
  Stomach             0     0     1   128
  Other Digestive     0     1     0    77
  Misc                1     0     3   123

  basic:              age, sex, weight gain/loss, BMI, cancer site, cancer stage

  general wellbeing:  no appetite, nausea, sore mouth, taste funny, constipation, pain,
                      dental problem, dry mouth, vomit, diarrhea, performance status

  blood test:         granulocytes, LDH-serum, HGB, lymphocytes, platelet, WBC count,
                      calcium-serum, creatinine, albumin

Predicting Personalized Survival Distributions

- Consider a simpler problem: will a subject with covariates $\vec{x}$ survive $\ge t$ months?
- Model this with a simple logistic regression model:

  $P_{\vec{\theta}}(T \ge t \mid \vec{x}) = 1 / (1 + \exp(\vec{\theta}^T \vec{x} + b))$

[Figure: three panels of S(t) against months, with thresholds t1, t2 and t3 marked]

- Multiple thresholds $t_1, t_2, \ldots$ to capture the survival function at different times:

  $P_{\vec{\theta}_j}(T \ge t_j \mid \vec{x}) = 1 / (1 + \exp(\vec{\theta}_j^T \vec{x} + b_j))$

- Predict a sequence of dependent bits ($N + 1$ sequences):

  y1  y2  y3  y4
   1   1   1   1    Dies in (0, t1)
   0   1   1   1    Dies in [t1, t2)
   0   0   1   1    Dies in [t2, t3)
   0   0   0   1    Dies in [t3, t4)
   0   0   0   0    Alive at t4
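The encoding of a survival time as a sequence of dependent bits, as laid out above, can be sketched as follows. The function names and the threshold values are hypothetical and serve only to illustrate the mapping.

```python
def encode_survival_time(time, thresholds):
    """Bit y_j is 1 if the patient died before threshold t_j, else 0.

    A death in [t_{j-1}, t_j) gives zeros up to position j-1 and ones from
    position j onward; a patient alive at the last threshold gets all zeros.
    """
    return [1 if time < t_j else 0 for t_j in thresholds]

def legal_sequences(n):
    """The N+1 legal sequences: k zeros followed by n-k ones, for k = 0..n."""
    return [[0] * k + [1] * (n - k) for k in range(n + 1)]

# Hypothetical thresholds in months (an assumption, not from the poster).
thresholds = [6, 12, 24, 36]

print(encode_survival_time(3, thresholds))   # dies in (0, t1)
print(encode_survival_time(18, thresholds))  # dies in [t2, t3)
print(encode_survival_time(40, thresholds))  # alive at t4
print(len(legal_sequences(len(thresholds))))
```

Once a bit turns to 1 it stays 1, so only N+1 of the 2^N possible bit patterns can occur. This is the dependence between the N regressors.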

[Figure: MTLR survival curves for 4 patients; x-axis: Time (Months), 0 to 50; y-axis: P(survival), 0.0 to 1.0]

- Multi-task logistic regression (MTLR) for survival prediction:
  - Likelihood of observing a sequence:

    $P(Y = (y_1, y_2, \ldots, y_N) \mid \vec{x}) = \frac{\exp\left(\sum_{j=1}^{N} y_j (\vec{\theta}_j \cdot \vec{x} + b_j)\right)}{Z(\vec{\Theta}; \vec{x})}$

  - Add multi-task regularizer to prevent overfitting:

    $\min_{\vec{\Theta}} \; \frac{C_1}{2} \sum_{j=1}^{N} \|\vec{\theta}_j\|^2 + \frac{C_2}{2} \sum_{j=1}^{N-1} \|\vec{\theta}_{j+1} - \vec{\theta}_j\|^2 - \sum_{i=1}^{n} \left[ \sum_{j=1}^{N} y_j(s_i) (\vec{\theta}_j \cdot \vec{x}_i + b_j) - \log Z(\vec{\Theta}; \vec{x}_i) \right]$

- Similar to CRF training objective, but
  - no transition features
  - no sharing of node potentials
  - Partition function $Z(\vec{\Theta}; \vec{x})$ involves only a linear number of terms
- Model also related to local regression approaches [3]
- Advantages of the MTLR approach:
  - No PH assumption: effects of covariates can change with time
  - Handles censoring by integrating out hidden variables in a survival sequence
  - More accurate survival probability predictions (below)
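The sequence likelihood above, with a partition function over only the N+1 legal sequences, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the parameter values, and the max-subtraction used for numerical stability are all choices made for this sketch.

```python
import math

def sequence_scores(x, thetas, biases):
    """Scores of the N+1 legal sequences.

    Sequence k (k = 0..N) has k zeros followed by N-k ones, so its score is
    the sum of (theta_j . x + b_j) over the positions where y_j = 1.
    """
    f = [sum(t * xi for t, xi in zip(theta, x)) + b
         for theta, b in zip(thetas, biases)]
    n = len(f)
    scores = [0.0] * (n + 1)          # scores[n] = 0: the all-zero sequence
    for k in range(n - 1, -1, -1):    # suffix sums, linear in N
        scores[k] = scores[k + 1] + f[k]
    return scores

def sequence_probabilities(x, thetas, biases):
    """P(Y = sequence k | x), normalized by Z over the N+1 legal sequences."""
    scores = sequence_scores(x, thetas, biases)
    m = max(scores)                   # subtract the max before exponentiating
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    return [w / z for w in weights]

def survival_curve(x, thetas, biases):
    """P(T >= t_j | x) for j = 1..N: total mass of sequences with y_j = 0."""
    probs = sequence_probabilities(x, thetas, biases)
    return [sum(probs[j:]) for j in range(1, len(thetas) + 1)]

# Hypothetical parameters for N = 3 thresholds and 2 covariates.
thetas = [[0.1, -0.3], [0.2, 0.0], [0.4, 0.1]]
biases = [-1.0, -0.5, 0.0]
x = [1.0, 2.0]

probs = sequence_probabilities(x, thetas, biases)
curve = survival_curve(x, thetas, biases)
print([round(p, 3) for p in probs])
print([round(s, 3) for s in curve])
```

In this sketch the survival probability at a threshold is a sum over all legal sequences consistent with being alive there. The same kind of sum is one way to read the poster's remark on censoring: a patient censored at a threshold contributes the total probability of every sequence in which death has not yet occurred.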

Experimental Results

- Classification Accuracy and Survival Rate Prediction (5-fold cross-validation):

  Accuracy    5 month         12 month        22 month
  MTLR        86.5(0.7)       76.1(0.9)       74.5(1.3)
  Cox         74.5(0.9)       59.3(1.1)       62.8(3.5)
  Aalen       73.3(1.2)       61.0(1.7)       59.6(3.6)
  Baseline    69.2(0.3)       56.2(2.0)       57.0(1.4)

  MSE         5 month         12 month        22 month
  MTLR        0.101(0.005)    0.158(0.004)    0.170(0.007)
  Cox         0.196(0.009)    0.270(0.008)    0.232(0.016)
  Aalen       0.198(0.004)    0.278(0.008)    0.288(0.020)
  Baseline    0.227(0.012)    0.299(0.011)    0.243(0.012)

- Baseline is the common approach of predicting with cancer site & stage only.
- Predicted survival curves for two test patients (Patient 1: 3 months, Patient 2: censored at 46 months)

[Figure: three panels (MTLR, Cox, Aalen), each plotting P(survival) from 0 to 1 against months from 0 to 60]

- Optimizing clinically relevant loss functions (prediction $p$, true survival $t$):
  - Absolute error on survival time (AE): $|p - t|$
  - Absolute error on log survival time (AE-log): $|\log p - \log t|$
  - Relative absolute error (RAE): $\min\{|p - t| / p, 1\}$

            MTLR           Cox            Aalen          CSVR           Site+Stage
  AE        9.58(0.11)     10.76(0.12)    19.06(2.04)    9.96(0.32)     11.73(0.62)
  AE-log    0.56(0.02)     0.61(0.02)     0.76(0.06)     0.56(0.02)     0.70(0.05)
  RAE       0.40(0.01)     0.44(0.02)     0.44(0.02)     0.44(0.03)     0.53(0.02)

- CSVR is censored support vector regression [4].
- Similar results on two other large survival datasets, SUPPORT2 and RHC
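The three loss functions listed above can be written down directly. This is a minimal sketch with hypothetical function names. It reads the relative absolute error as min{|p - t| / p, 1}, which is an assumption about a formula that is garbled in the extracted text, and it covers only uncensored patients with positive `p` and `t`.

```python
import math

def absolute_error(p, t):
    """AE: |p - t|, in the same units as the survival time."""
    return abs(p - t)

def absolute_error_log(p, t):
    """AE-log: |log p - log t|; requires p > 0 and t > 0."""
    return abs(math.log(p) - math.log(t))

def relative_absolute_error(p, t):
    """RAE, read here as min(|p - t| / p, 1); requires p > 0."""
    return min(abs(p - t) / p, 1.0)

# Hypothetical prediction and true survival time, in months.
p, t = 12.0, 9.0
print(absolute_error(p, t))
print(round(absolute_error_log(p, t), 4))
print(relative_absolute_error(p, t))
```

AE-log and RAE both penalize a fixed error in months more heavily when the predicted survival time is short, and RAE additionally caps the penalty at 1.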

References

[1] D.R. Cox, Regression models and life-tables, Journal of the Royal Statistical Society, Series B (Methodological), 1972

[2] O.O. Aalen, A linear regression model for the analysis of life times, Statistics in Medicine, 1989

[3] T. Hastie & R. Tibshirani, Varying-coefficient models, Journal of the Royal Statistical Society, Series B (Methodological), 1993

[4] Shivaswamy et al., A support vector approach to censored targets, ICDM 2007