
Recent Advances in Outcome Weighted Learning for Precision Medicine

Michael R. Kosorok

Department of Biostatistics
Department of Statistics and Operations Research
University of North Carolina at Chapel Hill

September 15, 2017

Acknowledgments

- Many co-authors and collaborators helped with this work.

- Especially Guanhua Chen (UW-Madison), Yifan Cui (PhD student in Statistics and Operations Research at UNC-Chapel Hill), Anna Kahkoska (MD-PhD student at UNC-Chapel Hill), Eric Laber (NCSU), Daniel Luckett (PhD student in Biostatistics at UNC-Chapel Hill), David Maahs (Stanford), Elizabeth Mayer-Davis (UNC-Chapel Hill), Donglin Zeng (UNC-Chapel Hill), and Ruoqing Zhu (University of Illinois at Urbana-Champaign).

- This project was funded in part by US NIH grants P01 CA142538 and UL1 TR001111, and US NSF grant DMS-1407732, among others.

Michael R. Kosorok 2/ 47


Outline

Introduction

Outcome weighted learning

A Tree Based Approach for Censored Data

Dose finding

Precision medicine in mHealth

Overall conclusions and future work


Personalized vs. precision medicine

- Personalized medicine
  - The idea of developing targeted treatments that take into account patient heterogeneity
  - Clinicians have been doing this for centuries: not really new

- Precision medicine
  - The idea of developing targeted treatments that take into account patient heterogeneity
  - Empirically based, scientifically rigorous, reproducible, and generalizable (i.e., will work with future patients): very new

Ingredients of precision medicine

- Incorporates data on:
  - Genetic makeup, proteomics, metabolomics, metabiomics, etc.
  - Environmental factors, demographics, etc.
  - Lifestyle, phenotypic information, etc.

- Scientific tools:
  - Biomedical knowledge
  - Data (potentially integrated across many platforms)
  - Knowledge driven vs. data driven
  - Computational, mathematical, and statistical tools

Clinical focus

We want to make the best treatment decisions based on data.

- The single-decision setting:
  - A patient presents with a disease and we need to decide what treatment (or dose) to give from a list of choices.
  - We want to make the best decision (treatment regimen) based on available baseline patient-level feature data.

- The multi-decision setting:
  - Treat patients for diseases with multiple treatment decisions occurring over time.
  - Make the best decision based on historical patient-level data at each decision time (dynamic treatment regimen).
  - The best decisions take into account delayed effects.

- Real-time decision making in mHealth:
  - A large number of decisions need to be made in real time.
  - Technical and practical challenges for implementation.


Outline

Introduction

Outcome weighted learning

A Tree Based Approach for Censored Data

Dose finding

Precision medicine in mHealth

Overall conclusions and future work


Single decision setting

- Let X be the vector of patient tailoring variables, A the choice of treatment given, and R the clinical outcome (with larger being better).

- The "standard" approach is to first estimate the Q-function

  Q(x, a) = E[R | X = x, A = a],

  through regression of R on (X, A), and invert to obtain

  d(x) = argmax_a Q(x, a).

- Issue: This approach is indirect, since we must estimate Q(x, a) and invert.
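As an illustration of this indirect route, here is a hedged Python sketch on simulated data. The variable names, the random-forest regressor, and the toy outcome model are all assumptions for illustration, not part of the slides.

```python
# Illustrative sketch of the indirect (regression-based) approach:
# fit Q(x, a) = E[R | X = x, A = a] by regression, then invert via argmax over a.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 2))                       # patient tailoring variables (toy)
A = rng.choice([-1, 1], size=n)                   # randomized binary treatment
R = X[:, 0] * A + rng.normal(scale=0.1, size=n)   # toy outcome: treat +1 iff X1 > 0

# Step 1: regress R on (X, A) to estimate the Q-function.
q_model = RandomForestRegressor(n_estimators=200, random_state=0)
q_model.fit(np.column_stack([X, A]), R)

# Step 2: invert -- recommend the treatment with the larger predicted outcome.
def d_hat(x):
    q_pos = q_model.predict(np.column_stack([x, np.ones(len(x))]))
    q_neg = q_model.predict(np.column_stack([x, -np.ones(len(x))]))
    return np.where(q_pos > q_neg, 1, -1)

x_new = np.array([[1.5, 0.0], [-1.5, 0.0]])
print(d_hat(x_new))   # the learned rule should follow the sign of X1
```

The two-step structure is exactly the "indirect" issue the slide raises: errors in estimating Q(x, a) propagate into the inverted rule.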


Value function and optimal individualized treatment rule

- Let P be the distribution of (X, A, R) with treatments randomized via π(A|X), and P^d the distribution of (X, A, R) with treatments chosen according to d. The value function of d (Qian & Murphy, 2011) is

  V(d) = E^d(R) = ∫ R dP^d = ∫ R (dP^d / dP) dP = E[ I(A = d(X)) R / π(A|X) ].

- Optimal Individualized Treatment Rule:

  d* ∈ argmax_d V(d).

  E(R | X, A = 1) > E(R | X, A = -1) ⇒ d*(X) = 1
  E(R | X, A = 1) < E(R | X, A = -1) ⇒ d*(X) = -1
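The change-of-measure identity above can be checked numerically. The sketch below (toy data, uniform randomization, illustrative names) estimates V(d) by the sample analogue of the right-hand side and compares two rules:

```python
# Hedged sketch: estimate V(d) = E[ I(A = d(X)) R / pi(A|X) ] by its sample mean.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
X = rng.normal(size=n)
A = rng.choice([-1, 1], size=n)           # pi(A|X) = 0.5 (uniform randomization)
R = 1.0 + X * A + rng.normal(size=n)      # toy outcome; optimal rule is d*(X) = sign(X)

def value_ipw(d, X, A, R, pi=0.5):
    """Inverse-probability-weighted estimate of the value of rule d."""
    return np.mean((A == d(X)) * R / pi)

v_opt = value_ipw(np.sign, X, A, R)                          # follow d*(X) = sign(X)
v_all_pos = value_ipw(lambda x: np.ones_like(x), X, A, R)    # always assign A = 1
print(v_opt, v_all_pos)   # the optimal rule should achieve a clearly higher value
```

Only the treated-as-recommended observations contribute, reweighted by 1/π(A|X), which is what makes the estimator unbiased for V(d).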


Outcome weighted learning (OWL or O-learning)

Optimal Individualized Treatment Rule d*:

  Maximize the value                      Minimize the risk
  E[ I(A = d(X)) R / π(A|X) ]      ⟺     E[ I(A ≠ d(X)) R / π(A|X) ]

- For any rule d, d(X) = sign(f(X)) for some function f.

- Empirical approximation to the risk function:

  n^{-1} Σ_{i=1}^{n}  R_i / π(A_i|X_i) · I(A_i ≠ sign(f(X_i))).

- Computational challenges: non-convexity and discontinuity of the 0-1 loss.

Using a support vector machine (SVM) approach

Objective Function: Regularization Framework

  min_f { (1/n) Σ_{i=1}^{n}  R_i / π(A_i|X_i) · φ(A_i f(X_i)) + λ_n ‖f‖² }.   (1)

- φ(u) = (1 - u)_+ is the hinge loss surrogate, ‖f‖ is some norm for f, and λ_n controls the penalty on f.

- A linear decision rule: f(X) = X^T β + β_0, with ‖f‖ as the Euclidean norm of β.

- Estimated individualized treatment rule:

  d_n = sign(f_n(X)),

  where f_n is the solution to (1).
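One practical reading of (1) is weighted hinge-loss classification of the treatment label A with weights R_i/π(A_i|X_i). The sketch below uses scikit-learn's SVC with sample_weight under that reading; the simulated data and the use of SVC are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch: solving the OWL objective (1) as a weighted linear SVM.
# Classifying A with sample weights R_i / pi(A_i|X_i) matches the hinge-loss
# surrogate above; C here plays the role of 1/lambda_n.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
n = 1000
X = rng.normal(size=(n, 2))
A = rng.choice([-1, 1], size=n)                         # pi(A|X) = 0.5
# toy positive outcome (OWL weights should be nonnegative); larger R is better
R = 2.0 + np.tanh(X[:, 0]) * A + 0.1 * rng.normal(size=n)

owl = SVC(kernel="linear", C=1.0)
owl.fit(X, A, sample_weight=R / 0.5)                    # weights R_i / pi(A_i|X_i)

d_hat = owl.predict(np.array([[2.0, 0.0], [-2.0, 0.0]]))
print(d_hat)   # the learned rule should treat according to the sign of X1
```

Replacing kernel="linear" with a Gaussian kernel gives the nonparametric rule mentioned on the next slides.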


Outcome Weighted Learning (OWL or O-Learning)

Figure 1: Zhao YQ, Zeng D, Rush AJ, and Kosorok MR (2012). Estimating individualized treatment rules using outcome weighted learning. JASA 107:1106-1118.

Results for O-Learning

- Can use the kernel trick to extend to nonparametric decision rules (e.g., the Gaussian kernel).
- Fisher consistent, asymptotically consistent, and model robust.
- Risk bounds and convergence rates similar to those observed in the SVM literature (Tsybakov, 2004).
- Excellent simulation results and data analysis of the Nefazodone-CBASP clinical trial (Keller et al., 2000).
- Promising performance overall (Y.Q. Zhao, et al., 2012).
- An example of a direct value function approach (see also B. Zhang, et al., 2012).
- Opens the door to a unique application of machine learning techniques to personalized medicine.

O-Learning Extensions

- Multiple decision times:
  - Zhao YQ, Zeng D, Laber EB, and Kosorok MR (2015). New statistical learning methods for estimating optimal dynamic treatment regimes. JASA 110:583-598.
  - Also works well, but a concern is the low fraction of data being used.

- Issue: O-learning is not invariant under constant shifts in the outcome R.
  - Zhou X, Mayer-Hamblett N, Khan U, and Kosorok MR (2017). Residual weighted learning for estimating individualized treatment rules. JASA 112:169-187.

O-Learning Extensions

- What about observational data?
  - One method that seems to work is to replace the known propensity score π(A_i|X_i) in (1) with an estimate π̂(A_i|X_i).

- More than two treatment options:
  - Ordinal treatment options (e.g., organized by dose level) (in progress)
  - Nonordinal treatments (in progress)
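A minimal sketch of the plug-in idea for observational data, assuming a logistic-regression propensity model (the confounded data-generating setup is illustrative):

```python
# Sketch: estimate pi_hat(A_i | X_i) for observational data, to replace the
# known randomization probability in objective (1).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 2000
X = rng.normal(size=(n, 2))
# confounded assignment: patients with larger X1 more often receive A = 1
p_treat = 1.0 / (1.0 + np.exp(-X[:, 0]))
A = np.where(rng.uniform(size=n) < p_treat, 1, -1)

ps_model = LogisticRegression().fit(X, A)
probs = ps_model.predict_proba(X)       # columns follow ps_model.classes_ = [-1, 1]
# pi_hat(A_i | X_i): probability of the treatment actually received
pi_hat = np.where(A == 1, probs[:, 1], probs[:, 0])
print(pi_hat.mean())    # these weights then stand in for pi(A_i|X_i) in (1)
```

In practice one would guard against extreme weights (e.g., truncating very small pi_hat values) before plugging them into the OWL objective.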


O-Learning Extensions

- Censored data:
  - Zhao YQ, Zeng D, Laber EB, Song R, Yuan M, and Kosorok MR (2015). Doubly robust learning for estimating individualized treatment with censored data. Biometrika 102:151-168.

- Two basic approaches:
  - Inverse weighting: requires correct modeling of the conditional distribution function of the censoring time given covariates.
  - Doubly robust: requires correct modeling of one of either the conditional censoring or conditional survival distribution functions (inverse weighting still used).

- Works quite well, but a concern is the need for inverse probability of censoring weighting.

O-Learning Extensions

- Continuous treatment options (e.g., dose or time):
  - Chen G, Zeng D, and Kosorok MR (2016). Personalized dose finding using outcome weighted learning (with discussion and rejoinder). Journal of the American Statistical Association 111:1509-1547.
  - V-learning for continuous time and mHealth (in progress)


Outline

Introduction

Outcome weighted learning

A Tree Based Approach for Censored Data

Dose finding

Precision medicine in mHealth

Overall conclusions and future work


A Tree Based Approach for Censored Data

- Cui Y, Zhu R, and Kosorok MR (In press). Tree based weighted learning for estimating individualized treatment rules with censored data. Electronic Journal of Statistics.

- Recall that (X, A, R) is the observed data in outcome weighted learning:
  - X is the vector of patient-level features.
  - A is the treatment indicator.
  - R is the positive outcome: in survival data this is the event time, which is censored.

A Tree Based Approach for Censored Data

Recall that for outcome weighted learning, the targeted ITR satisfies

  D*(X) = argmin_d E[ R 1{A ≠ d(X)} / π(A|X) ],   (2)

where we use the hinge loss surrogate to ensure convexity.

However, due to censoring, we do not observe R = T but only observe Y = T ∧ C and δ = 1{Y = T}. We will use two nonparametric estimates of T based on censored data:

- R1 = Ê(T | X, A) and
- R2 = 1{δ = 1} Y + 1{δ = 0} Ê(T | X, A, Y, T > Y),

where the Ê terms are estimated via random forests.
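The imputed outcome R2 can be assembled as below once conditional-mean estimates are in hand. The Ehat_tail line here is a toy memoryless stand-in for the random-forest estimate, purely to make the construction concrete; a real analysis would use RIST or a similar survival forest.

```python
# Illustrative construction of R2 = 1{delta=1} Y + 1{delta=0} Ehat(T | ..., T > Y)
# from right-censored data. All inputs are toy placeholders.
import numpy as np

rng = np.random.default_rng(4)
n = 10
T = rng.exponential(2.0, size=n)          # true event times (unobserved when censored)
C = rng.exponential(3.0, size=n)          # censoring times
Y = np.minimum(T, C)                      # observed follow-up: Y = T ^ C
delta = (T <= C).astype(int)              # event indicator: delta = 1{Y = T}

# toy stand-in for Ehat(T | X, A, Y, T > Y); Exp(2) is memoryless, so the
# conditional mean beyond Y is Y + 2 -- a real analysis would fit RIST here
Ehat_tail = Y + 2.0

R2 = np.where(delta == 1, Y, Ehat_tail)   # observed time if event, imputed tail mean if censored
print(R2)
```

Uncensored subjects keep their observed time; censored subjects are imputed with an estimate of their remaining life beyond Y, which avoids the inverse-probability-of-censoring weights discussed earlier.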


Recursively Imputed Survival Trees

We use recursively imputed survival trees (RIST) to estimate Ê:

- Zhu R and Kosorok MR (2013). Recursively imputed survival trees. JASA 107:331-340.

The basic ingredients of RIST:

- RIST is a random forest method (see, e.g., Breiman, 2001, Machine Learning) for right censored data (slightly modified).
- RIST uses a form of Monte Carlo EM (see, e.g., Wei and Tanner, 1990, JASA) to avoid inverse probability of censoring weighting (IPCW) as done in Hothorn et al (2006, Biostatistics), or single-step hazard estimation in terminal nodes (see, e.g., Ishwaran et al, 2008, Annals of Applied Statistics).

Theoretical Results

Theorem. Under reasonable regularity conditions, there exist sequences r_n, w_n > 0 converging to zero such that for each covariate value x and treatment choice a,

  pr{ |Ê(T | x, a) − E(T | x, a)| ≤ r_n } ≥ 1 − w_n

and

  pr{ |Ê(T | x, a, T > Y, Y) − E(T | x, a, T > Y, Y)| ≤ r_n } ≥ 1 − 2 w_n.

Theoretical Results

Let d*(x) = sign(f*(x)) be the decision function which optimizes the value function V(d) over all d, and let d̂_n(x) = sign(f̂_n(x)) be the estimated decision function.

Theorem. Under reasonable regularity conditions and assuming that the sequence λ_n > 0 satisfies λ_n → 0 and λ_n ln n → ∞, there exist sequences ε_n, w_n > 0 converging to zero such that

  pr{ V(f*) ≤ V(f̂_n) + ε_n } ≥ 1 − w_n.

Simulation Results

We considered several simulation scenarios, but present two here:

- Scenario 3: Failure time is non-linear in 5 covariates while censoring time follows a Cox model.
- Scenario 4: Failure time is AFT with 10 covariates while censoring time follows a Cox model.

Other features of both scenarios:

- 45% censoring rate with n = 200 sample size
- 500 replications and predictions based on log failure time
- Compare six methods: (1) oracle, (2) RIST-R1, (3) RIST-R2, (4) ICO, (5) DR, and (6) Cox
- Both linear and Gaussian kernels used (except for Cox)

Simulation Results

[Boxplot figure: mean log survival time by method (T, RIST-R1, RIST-R2, ICO, DR, Cox), with linear and Gaussian kernels, Scenario 3.]

Figure 2: Boxplots of mean log survival time for different treatment regimes. The black horizontal line is the theoretical optimal value.

Simulation Results

[Boxplot figure: mean log survival time by method (T, RIST-R1, RIST-R2, ICO, DR, Cox), with linear and Gaussian kernels, Scenario 4.]

Figure 3: Boxplots of mean log survival time for different treatment regimes. The black horizontal line is the theoretical optimal value.

Non-Small Cell Lung Cancer (NSCLC) Trial Data

We apply the proposed methods to the NSCLC randomized clinical trial data described in Socinski, et al (2002, Journal of Clinical Oncology 29:1335-1343):

- Two treatment arms with 114 patients each (228 total)
- Study duration of 104 weeks
- Five covariates:
  - performance status: ranging from 70% to 100%
  - cancer stage: 3 or 4
  - race: black, white, other
  - gender
  - age: ranging from 31 to 82
- We use the methods from the simulations with the value function based on log(T)

NSCLC Trial Data: Results

[Boxplot figure: cross-validated value by method (RIST-R1, RIST-R2, ICO, DR, Cox), with linear and Gaussian kernels.]

Figure 4: Boxplots of cross-validated (CV) value of survival weeks on the log scale. Four-fold CV randomly repeated 100 times.

Overall Results for Tree Based Approach

- The proposed outcome-weighted learning using RIST-R2 works well over a range of right-censored survival settings.
- The idea of using random forests to impute before downstream analyses seems beneficial.
- There remain several important open research questions about inference in this context.


Outline

Introduction

Outcome weighted learning

A Tree Based Approach for Censored Data

Dose finding

Precision medicine in mHealth

Overall conclusions and future work


Motivating example

Bronchopulmonary Dysplasia

The most common morbidity associated with premature birth is bronchopulmonary dysplasia (BPD).

  BPD --(20%)--> pulmonary arterial hypertension --(40%)--> death

Sildenafil

Sildenafil is a potent inhibitor of type 5 phosphodiesterase. It has been used for adults with pulmonary arterial hypertension.

Challenge: Researchers want to know the best dose of sildenafil for premature infants with BPD as a function of infant characteristics.

Traditional dose trial for sildenafil

- 100 patients randomized to:

  Dose (mg/kg/dose):                 0      0.75   1.50   2.25   3
  Rate of developing hypertension:   20%    15%    10%    5%     10%

- But 2.25 mg/kg/dose may not be optimal for all babies:
  - a very early preterm baby with moderate BPD: 1.5 mg/kg/dose.
  - an older baby with severe BPD: 5 mg/kg/dose.

What is the challenge?

- For a continuous dose, P(A = f(X)) = 0.

- Thus O-learning cannot be applied directly.

- Solution: Chen G, Zeng D, and Kosorok MR (2016). Personalized dose finding using outcome weighted learning (with discussion and rejoinder). JASA 111:1509-1547.

- Note that maximizing the value function is approximately equivalent to minimizing:

  R_φ(f) = E( R ℓ_φ(A − f(X)) / (φ p(A|X)) ),

  where ℓ_φ(A − f(X)) = min(|A − f(X)|/φ, 1).
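A minimal sketch of ℓ_φ exactly as defined above:

```python
# The bounded dose-finding loss: l_phi(u) = min(|u|/phi, 1).
# It is 0 when the dose matches the rule exactly, rises linearly, and
# saturates at 1 once |u| >= phi (a triangular-kernel shape).
import numpy as np

def ell_phi(u, phi):
    return np.minimum(np.abs(u) / phi, 1.0)

u = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(ell_phi(u, phi=1.0))   # -> [1.  0.5 0.  0.5 1. ]
```

Because the loss is capped at 1, a single patient with an extreme outcome cannot dominate the objective, which is the robustness point made on the next slide.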


The ℓ_φ Loss

[Figure: two panels plotting ℓ_φ against u, with the loss equal to 0 at u = 0, rising linearly to 1 at u = ±φ, and flat at 1 beyond.]

- Connection to kernel estimators: triangular kernel (left panel).
- ℓ_φ is a bounded loss and brings robustness.

Outcome weighted learning (O-Learning)

Objective Function: Loss + Penalty

  min_f { (1/n) Σ_{i=1}^{n}  R_i ℓ_φ(A_i − f(X_i)) / (2φ p(A_i|X_i)) + λ_n ‖f‖² }.   (3)

- ‖f‖ is some norm for f, and λ_n controls the penalty.

- A linear decision rule: f(X) = X^T w + b, with ‖f‖ being the Euclidean norm of w.

- The kernel trick generalizes to nonlinear/nonparametric rules.

- Estimated individualized treatment rule: f_n solves (3).

- (3) is not convex but is the difference of two convex functions, so we use the DC algorithm.
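To make objective (3) concrete, the sketch below evaluates its empirical version for a linear rule on toy data, without running the DC algorithm itself. Uniform dosing is assumed so p(A|X) is a known constant; the reward model and parameter values are illustrative assumptions.

```python
# Hedged sketch: the empirical dose-finding objective (3) for f(X) = X w + b.
import numpy as np

def ell_phi(u, phi):
    return np.minimum(np.abs(u) / phi, 1.0)

def objective(w, b, X, A, R, phi, p, lam):
    """Empirical version of (3): weighted l_phi loss plus ridge penalty on w."""
    fX = X @ w + b
    loss = np.mean(R * ell_phi(A - fX, phi) / (2.0 * phi * p))
    return loss + lam * np.dot(w, w)

rng = np.random.default_rng(5)
n = 500
X = rng.normal(size=(n, 1))
A = rng.uniform(0.0, 2.0, size=n)                 # doses ~ Uniform(0, 2), so p(A|X) = 0.5
opt_dose = 1.0 + 0.5 * np.tanh(X[:, 0])           # toy optimal dose, bounded in (0.5, 1.5)
R = 3.0 - (A - opt_dose) ** 2                     # positive reward, peaked at the optimal dose

good = objective(np.array([0.5]), 1.0, X, A, R, phi=0.5, p=0.5, lam=0.01)
bad = objective(np.array([0.0]), 2.0, X, A, R, phi=0.5, p=0.5, lam=0.01)
print(good, bad)   # a rule near the true optimal dose attains a lower objective
```

The DC algorithm mentioned on the slide would iteratively linearize one convex piece of ℓ_φ and solve the resulting convex subproblem; here we only evaluate the objective to show that rules closer to the optimal dose score better.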


ℓ_φ as D.C.

[Figure: two panels writing ℓ_φ as the difference of two convex piecewise-linear functions of u, each with kinks at ±φ.]

Results for Dose Finding

- Consistent, with a convergence rate able to get close to n^{-1/4}; this can be strengthened to nearly n^{-1/2} under a modified kernel and additional assumptions (Luedtke and van der Laan, 2016).
- In contrast, unbounded loss functions such as the absolute deviation loss will not yield consistent results.
- Generalizes to observational data via inverse propensity score weighting.
- Good simulation results: improves over other methods, especially indirect methods.
- Works well when applied to a Warfarin dosing data set.


Outline

Introduction

Outcome weighted learning

A Tree Based Approach for Censored Data

Dose finding

Precision medicine in mHealth

Overall conclusions and future work


Precision medicine in mHealth

Overall research goal:

- Develop estimation techniques (using data collected with mobile devices) for dynamic treatment regimes (which can be implemented as personalized mHealth interventions)

Motivating example: type 1 diabetes

- Understand type 1 diabetes (T1D) and how it is managed
- Develop tailored mHealth interventions for T1D management

What is T1D?

- Autoimmune disease wherein the pancreas produces insufficient insulin
- Daily regimen of monitoring glucose and replacing insulin
- Hypo- and hyperglycemia result from poor management
- Glucose levels are affected by insulin, diet, and physical activity

The glucose-insulin dynamical system

A day in the life of a T1D patient:

[Figure: glucose (mg/dL, roughly 50-250) over 48 half-hour intervals, plotted alongside insulin, physical activity, and food intake.]

Figure 5: Plot of glucose, insulin, physical activity, and food intake.

Mobile technology in T1D care

Mobile devices can be used to administer treatment and assist with data collection in an outpatient setting, including:

- Continuous glucose monitoring
- Accelerometers to track physical activity
- Insulin pumps to administer and log injections automatically

These technologies can be incorporated using mobile phones.

Research goals

Methodological goals:

- Estimate dynamic treatment regimes for use in mobile health
- Infinite time horizon, minimal modeling assumptions
- Observational data with minute-by-minute observations
- Online estimation to facilitate real-time decision making

Clinical goals:

- Provide patients information on the best actions to stabilize glucose
- Recommendations that are dynamic and personalized to the patient

Conceptual framework

- We use a Markov decision process (MDP) context.

- One potential approach is to use infinite-horizon Q-learning (models the value of each action in a state, assuming all future actions are optimal):
  - Ertefaie A (2014). Constructing dynamic treatment regimes in infinite-horizon settings. arXiv preprint arXiv:1406.0764.

- We developed V-learning, which uses a policy learning approach (models the state-value as a function of the policy):
  - Luckett DJ, Laber EB, Kahkoska AR, Maahs DM, Mayer-Davis E, Kosorok MR (2016). Estimating dynamic treatment regimes in mobile health using V-learning. arXiv preprint arXiv:1611.03531.
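V-learning itself is beyond a short sketch, but the underlying idea of modeling a fixed policy's state-value function can be illustrated with TD(0) policy evaluation on a toy two-state MDP. Everything below (the MDP, rewards, and tabular value model) is a toy stand-in, not the V-learning estimator.

```python
# Conceptual sketch: evaluate a fixed policy's state-value V(s) with TD(0)
# updates, and compare against the exact Bellman solution V = r + gamma P V.
import numpy as np

rng = np.random.default_rng(6)
gamma = 0.9
P = np.array([[0.7, 0.3],     # toy transition probabilities under the policy
              [0.4, 0.6]])
r = np.array([0.0, 1.0])      # expected reward by state (state 1 is "good")

# exact solution of the Bellman equation (I - gamma P) V = r
V_exact = np.linalg.solve(np.eye(2) - gamma * P, r)

# online TD(0) with a tabular value estimate (a stand-in for a parametric model)
V = np.zeros(2)
s, alpha = 0, 0.01
for _ in range(200_000):
    s_next = rng.choice(2, p=P[s])
    V[s] += alpha * (r[s] + gamma * V[s_next] - V[s])   # TD(0) update
    s = s_next

print(V, V_exact)   # the online estimate should approach the exact values
```

The online, incremental nature of the update is what makes this style of estimator attractive for the real-time mHealth setting described above.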


Summary for V-Learning

- Advantages of V-learning include:
  - Flexibility in choosing a model for V(π, s; θ_π)
  - Online estimation, randomized decision rules
  - Flexibility in specifying the reference distribution
  - Parametric value estimates

- A tailored treatment regime delivered through mobile devices may help to reduce hypo- and hyperglycemia in T1D patients.


Outline

Introduction

Outcome weighted learning

A Tree Based Approach for Censored Data

Dose finding

Precision medicine in mHealth

Overall conclusions and future work


Overall Conclusions and Future Work

- This is an exciting time for precision medicine at the confluence of machine learning, Big Data, and statistics.
- Outcome weighted learning (O-learning), V-learning, and related methods bridge between machine learning and precision medicine.
- There are numerous open questions.
- This work is part of the emergence of a new (or renewed) discipline focused on data-driven decision making and precision medicine, and statistics is playing a key role.