11 confidence intervals, q-learning and dynamic treatment regimes s.a. murphy time for causality –...

11

Confidence Intervals, Q-Learning and

Dynamic Treatment Regimes

S.A. Murphy

Time for Causality – Bristol

April, 2012

2

Outline

• Dynamic Treatment Regimes

• Example Experimental Designs & Challenges

• Q-Learning & Challenges

3

Dynamic treatment regimes are individually tailored treatments, with treatment type and dosage changing according to patient outcomes. Operationalize clinical practice.

k Stages for one individual

Observation available at jth stage

Action at jth stage (usually a treatment)

4

Example of a Dynamic Treatment Regime

•Adaptive Drug Court Program for drug abusing offenders.

•Goal is to minimize recidivism and drug use.

•Marlowe et al. (2008, 2011)

5

non-responsiveAs-needed court hearings As-needed court hearings

low risk + standard counseling + ICM

non-complianthigh risk

non-responsiveBi-weekly court hearings Bi-weekly court hearings + standard counseling + ICM

non-compliant Court-determined disposition

Adaptive Drug Court Program

6

Goal: Construct decision rules that input information available at each stage and output a recommended decision; these decision rules should lead to a maximal mean Y where Y is a function of

The dynamic treatment regime is a sequence of two decision rules:

k=2 Stages

7

Outline




8

Data for Constructing the Dynamic Treatment Regime:

Subject data from sequential, multiple assignment, randomized trials. At each stage subjects are randomized among alternative options.

Aj is a randomized action with known randomization probability.

binary actions with P[Aj=1]=P[Aj=-1]=.5

9

Pelham’s ADHD Study

B. Begin low dosemedication

8 weeks

Assess-Adequate response?

B1. Continue, reassess monthly; randomize if deteriorate

B2. Increase dose of medication with monthly changes

as neededRandom

assignment:B3. Add behavioral

treatment; medication dose remains stable but intensity

of bemod may increase with adaptive modifications

based on impairment

No

A. Begin low-intensity behavior modification

8 weeks


A1. Continue, reassess monthly;randomize if deteriorate

A2. Add medication;bemod remains stable butmedication dose may vary

Randomassignment:

A3. Increase intensity of bemod with adaptive modifi-

cations based on impairment

Yes

No

Randomassignment:

10

Oslin’s ExTENd Study

Late Trigger forNonresponse

8 wks Response

TDM + Naltrexone

CBIRandom

assignment:

CBI +Naltrexone

Nonresponse

Early Trigger for Nonresponse

Randomassignment:

Randomassignment:

Randomassignment:

Naltrexone

8 wks Response

Randomassignment:

CBI +Naltrexone

CBI

TDM + Naltrexone

Naltrexone

Nonresponse

11

Kasari Autism Study

B. JAE + AAC

12 weeks


B!. JAE+AAC

B2. JAE +AAC ++No

A. JAE+ EMT

12 weeks


JAE+EMT

JAE+EMT+++

Randomassignment:

JAE+AAC

Yes

No

Randomassignment:

Yes

Jones’ Study for Drug-Addicted Pregnant Women

rRBT

2 wks Response

rRBT

tRBTRandom

assignment:

rRBT

Nonresponse

tRBT

Randomassignment:

Randomassignment:

Randomassignment:

aRBT

2 wks Response

Randomassignment:

eRBT

tRBT

tRBT

rRBT

Nonresponse

13

Challenges• Goal of trial may differ

– Experimental designs for settings involving new drugs (cancer, ulcerative colitis)

– Experimental designs for settings involving already approved drugs/treatments

• Choice of Primary Hypothesis/Analysis & Secondary Hypothesis/Analysis

• Longitudinal/Survival Primary Outcomes

• Sample Size Formulae

14

Outline




15

Secondary Analysis: Q-Learning

•Q-Learning (Watkins, 1989; Ernst et al., 2005; Murphy, 2005) (a popular method from computer science)

•Optimal nested structural mean model (Murphy, 2003; Robins, 2004)

• The first method is an inefficient version of the second method when (a) linear models are used, (b) each stages’ covariates include the prior stages’ covariates and (c) the treatment variables are coded to have conditional mean zero.

16

Goal: Construct

for which is maximal.

k=2 Stages

is called the value

and the maximal value is denoted by

17

Idea behind Q-Learning

18

There is a regression for each stage.

Simple Version of Q-Learning –

• Stage 2 regression: Regress Y on to obtain

• Stage 1 regression: Regress on to obtain

19

for subjects entering stage 2:

• is a predictor of

• is the predicted end of stage 2 response when the stage 2 treatment is equal to the “best” treatment.

• is the dependent variable in the stage 1 regression for patients moving to stage 2

maxa2 Q2(X 1;A1;X 2;a2)

20

A Simple Version of Q-Learning –

• Stage 2 regression, (using Y as dependent variable) yields

• Arg-max over a2 yields

21

A Simple Version of Q-Learning –

• Stage 1 regression, (using as dependent variable) yields

• Arg-max over a1 yields

22

Confidence Intervals

Y = ®T2 S02+maxa2

^T2 S2a2

= ®T2 S02+j^

T2 S2j

23

Non-regularity

•

• Limiting distribution of is non-regular (Robins, 2004)

• Problematic area in parameter space is around for which

• Standard asymptotic approaches invalid without modification (see Shao, 1994; Andrews, 2000).

P [ T2 S2¼0]> 0

¯2

24

Non-regularity –

• Problematic term in is

where

• This term is well-behaved if is bounded away from zero.

• We want to form an adaptive confidence interval.

cT § ¡ 11 Pn

·B1¡jpn^T2 S2)j ¡ j

pn¯T2 S2j

¢¸

25

Idea from Econometrics

• In nonregular settings in Econometrics there are a fixed number of easily identified “bad” parameter values at which you have nonregular behavior of the estimator

• Use a pretest (e.g. an hypothesis test) to test if you are near a “bad” parameter value; if the pretest rejects, use standard critical values to form confidence interval; if the pretest accepts, use the maximal critical value over all possible local alternatives. (Andrews & Soares, 2007; Andrews & Guggenberger, 2009)

• Construct smooth upper and lower bounds so that for all n

• The upper/lower bounds use a pretest:

• Embed in the formula for a pretest of

based on26

Our Approach

Tn(s2) =n(sT2 ^2)

2

sT2 § 2s2

H0: sT2 ¯2 = 0

27

Non-regularity

• The upper bound adds to :

28

Confidence Interval• Let and be the bootstrap analogues of and

• is the probability with respect to the bootstrap weights. is the quantile of ; is the quantile of .

Theorem: Assume moment conditions, invertible var-covariance matrices and , then

PMl

29

Adaptation Theorem: Assuming finite moment conditions and invertible var-covariance matrices are invertible, ,

then

each converge, in distribution, to the same limiting distribution (the last two in probability).

P [ST2 ¯2 6= 0]= 1

30

• Some Competing Methods

• Soft-thresholding (ST) Chakraborty et al. (2009)

• Centered percentile bootstrap (CPB)

• Plug-in pretesting estimator (PPE)

• Generative Models

• Nonregular (NR):

• Nearly Nonregular (NNR):

• Regular (R):

• n=150, 1000 Monte Carlo Reps, 1000 Bootstrap Samples

Empirical Study

P [ST2 ¯2 =0]> 0

31

Example –Two Stages, two treatments per stage

Type CPB ST PPE ACI

NR .93* .95 (.34) .93* .99(.50)

R .94(.45) .92* .93* .95(.48)

NR .93* .76* .90* .96(.49)

NNR .93* .76* .90* .97(.49)

Size(width)

32

Example –Two Stages, three treatments in stage 2

Type CPB PPE ACI

NR .93* .93* 1.0(.72)

R .94(.56) .92* .96(.63)

NR .89* .86* .97(.67)

NNR .90* .86* .97(.67)

Size(width)

33

Pelham’s ADHD Study

B. Begin low dosemedication

8 weeks


B1. Continue, reassess monthly; randomize if deteriorate

B2. Increase dose of medication with monthly changes

as neededRandom

assignment:B3. Add behavioral

treatment; medication dose remains stable but intensity

of bemod may increase with adaptive modifications

based on impairment

No

A. Begin low-intensity behavior modification

8 weeks


A1. Continue, reassess monthly;randomize if deteriorate

A2. Add medication;bemod remains stable butmedication dose may vary

Randomassignment:

A3. Increase intensity of bemod with adaptive modifi-

cations based on impairment

Yes

No

Randomassignment:

3434

• (X1, A1, R1, X2, A2, Y)

– Y = end of year school performance

– R1=1 if responder; =0 if non-responder

– X2 includes the month of non-response, M2, and a measure of adherence in stage 1 (S2 )

– S2 =1 if adherent in stage 1; =0, if non-adherent

– X1 includes baseline school performance, Y0 , whether medicated in prior year (S1), ODD (O1)

– S1 =1 if medicated in prior year; =0, otherwise.

ADHD Example

3535

• Stage 2 regression for Y:

• Stage 1 outcome:

ADHD Example

R1Y +(1¡ R1)Y

3636

• Stage 1 regression for

• Interesting stage 1 contrast: is it important to know whether the child was medicated in the prior year (S1=1) to determine the best initial treatment in the sequence?

ADHD Example

(1;Y0;S1;O1)®1+A1(¯11+S1¯12)

3737

• Stage 1 treatment effect when S1=1:

• Stage 1 treatment effect when S1=0:

ADHD Example

90% ACI

(-0.48, 0.16)

(-0.05, 0.39)

¯11+¯12¯11

¯11+¯12

¯11

3838

IF medication was not used in the prior year THEN begin with BMOD;

ELSE select either BMOD or MED.

IF the child is nonresponsive and was non-adherent, THEN augment present treatment;

ELSE IF the child is nonresponsive and was adherent, THEN select intensification of current treatment.

Dynamic Treatment Regime Proposal

39

• There are multiple ways to form ; what are the pros and cons?

•Improve adaptation by a pretest of

• High dimensional data; investigators want to collect real time data

• Feature construction & Feature selection

•Many stages or infinite horizon

Challenges

Y

4040

This seminar can be found at:

http://www.stat.lsa.umich.edu/~samurphy/

seminars/Bristol04.10.12.ppt

Email Eric Laber or me for questions:

[email protected] or [email protected]

11 confidence intervals, q-learning and dynamic treatment regimes s.a. murphy time for causality –...

Documents

treatment slide

nonresponse random assignment

wks response random

aaaaaaaaaa slide

multiple assignment

adequate response

treatment type

jae aac