11 confidence intervals, q-learning and dynamic treatment regimes s.a. murphy time for causality –...
TRANSCRIPT
11
Confidence Intervals, Q-Learning and
Dynamic Treatment Regimes
S.A. Murphy
Time for Causality – Bristol
April, 2012
2
Outline
• Dynamic Treatment Regimes
• Example Experimental Designs & Challenges
• Q-Learning & Challenges
3
Dynamic treatment regimes are individually tailored treatments, with treatment type and dosage changing according to patient outcomes. Operationalize clinical practice.
k Stages for one individual
Observation available at jth stage
Action at jth stage (usually a treatment)
4
Example of a Dynamic Treatment Regime
•Adaptive Drug Court Program for drug abusing offenders.
•Goal is to minimize recidivism and drug use.
•Marlowe et al. (2008, 2011)
5
non-responsiveAs-needed court hearings As-needed court hearings
low risk + standard counseling + ICM
non-complianthigh risk
non-responsiveBi-weekly court hearings Bi-weekly court hearings + standard counseling + ICM
non-compliant Court-determined disposition
Adaptive Drug Court Program
6
Goal: Construct decision rules that input information available at each stage and output a recommended decision; these decision rules should lead to a maximal mean Y where Y is a function of
The dynamic treatment regime is a sequence of two decision rules:
k=2 Stages
7
Outline
• Dynamic Treatment Regimes
• Example Experimental Designs & Challenges
• Q-Learning & Challenges
8
Data for Constructing the Dynamic Treatment Regime:
Subject data from sequential, multiple assignment, randomized trials. At each stage subjects are randomized among alternative options.
Aj is a randomized action with known randomization probability.
binary actions with P[Aj=1]=P[Aj=-1]=.5
9
Pelham’s ADHD Study
B. Begin low dosemedication
8 weeks
Assess-Adequate response?
B1. Continue, reassess monthly; randomize if deteriorate
B2. Increase dose of medication with monthly changes
as neededRandom
assignment:B3. Add behavioral
treatment; medication dose remains stable but intensity
of bemod may increase with adaptive modifications
based on impairment
No
A. Begin low-intensity behavior modification
8 weeks
Assess-Adequate response?
A1. Continue, reassess monthly;randomize if deteriorate
A2. Add medication;bemod remains stable butmedication dose may vary
Randomassignment:
A3. Increase intensity of bemod with adaptive modifi-
cations based on impairment
Yes
No
Randomassignment:
10
Oslin’s ExTENd Study
Late Trigger forNonresponse
8 wks Response
TDM + Naltrexone
CBIRandom
assignment:
CBI +Naltrexone
Nonresponse
Early Trigger for Nonresponse
Randomassignment:
Randomassignment:
Randomassignment:
Naltrexone
8 wks Response
Randomassignment:
CBI +Naltrexone
CBI
TDM + Naltrexone
Naltrexone
Nonresponse
11
Kasari Autism Study
B. JAE + AAC
12 weeks
Assess-Adequate response?
B!. JAE+AAC
B2. JAE +AAC ++No
A. JAE+ EMT
12 weeks
Assess-Adequate response?
JAE+EMT
JAE+EMT+++
Randomassignment:
JAE+AAC
Yes
No
Randomassignment:
Yes
Jones’ Study for Drug-Addicted Pregnant Women
rRBT
2 wks Response
rRBT
tRBTRandom
assignment:
rRBT
Nonresponse
tRBT
Randomassignment:
Randomassignment:
Randomassignment:
aRBT
2 wks Response
Randomassignment:
eRBT
tRBT
tRBT
rRBT
Nonresponse
13
Challenges• Goal of trial may differ
– Experimental designs for settings involving new drugs (cancer, ulcerative colitis)
– Experimental designs for settings involving already approved drugs/treatments
• Choice of Primary Hypothesis/Analysis & Secondary Hypothesis/Analysis
• Longitudinal/Survival Primary Outcomes
• Sample Size Formulae
14
Outline
• Dynamic Treatment Regimes
• Example Experimental Designs & Challenges
• Q-Learning & Challenges
15
Secondary Analysis: Q-Learning
•Q-Learning (Watkins, 1989; Ernst et al., 2005; Murphy, 2005) (a popular method from computer science)
•Optimal nested structural mean model (Murphy, 2003; Robins, 2004)
• The first method is an inefficient version of the second method when (a) linear models are used, (b) each stages’ covariates include the prior stages’ covariates and (c) the treatment variables are coded to have conditional mean zero.
16
Goal: Construct
for which is maximal.
k=2 Stages
is called the value
and the maximal value is denoted by
18
There is a regression for each stage.
Simple Version of Q-Learning –
• Stage 2 regression: Regress Y on to obtain
• Stage 1 regression: Regress on to obtain
19
for subjects entering stage 2:
• is a predictor of
• is the predicted end of stage 2 response when the stage 2 treatment is equal to the “best” treatment.
• is the dependent variable in the stage 1 regression for patients moving to stage 2
maxa2 Q2(X 1;A1;X 2;a2)
20
A Simple Version of Q-Learning –
• Stage 2 regression, (using Y as dependent variable) yields
• Arg-max over a2 yields
21
A Simple Version of Q-Learning –
• Stage 1 regression, (using as dependent variable) yields
• Arg-max over a1 yields
23
Non-regularity
•
• Limiting distribution of is non-regular (Robins, 2004)
• Problematic area in parameter space is around for which
• Standard asymptotic approaches invalid without modification (see Shao, 1994; Andrews, 2000).
P [ T2 S2¼0]> 0
¯2
24
Non-regularity –
• Problematic term in is
where
• This term is well-behaved if is bounded away from zero.
• We want to form an adaptive confidence interval.
cT § ¡ 11 Pn
·B1¡jpn^T2 S2)j ¡ j
pn¯T2 S2j
¢¸
25
Idea from Econometrics
• In nonregular settings in Econometrics there are a fixed number of easily identified “bad” parameter values at which you have nonregular behavior of the estimator
• Use a pretest (e.g. an hypothesis test) to test if you are near a “bad” parameter value; if the pretest rejects, use standard critical values to form confidence interval; if the pretest accepts, use the maximal critical value over all possible local alternatives. (Andrews & Soares, 2007; Andrews & Guggenberger, 2009)
• Construct smooth upper and lower bounds so that for all n
• The upper/lower bounds use a pretest:
• Embed in the formula for a pretest of
based on26
Our Approach
Tn(s2) =n(sT2 ^2)
2
sT2 § 2s2
H0: sT2 ¯2 = 0
28
Confidence Interval• Let and be the bootstrap analogues of and
• is the probability with respect to the bootstrap weights. is the quantile of ; is the quantile of .
Theorem: Assume moment conditions, invertible var-covariance matrices and , then
PMl
29
Adaptation Theorem: Assuming finite moment conditions and invertible var-covariance matrices are invertible, ,
then
each converge, in distribution, to the same limiting distribution (the last two in probability).
P [ST2 ¯2 6= 0]= 1
30
• Some Competing Methods
• Soft-thresholding (ST) Chakraborty et al. (2009)
• Centered percentile bootstrap (CPB)
• Plug-in pretesting estimator (PPE)
• Generative Models
• Nonregular (NR):
• Nearly Nonregular (NNR):
• Regular (R):
• n=150, 1000 Monte Carlo Reps, 1000 Bootstrap Samples
Empirical Study
P [ST2 ¯2 =0]> 0
31
Example –Two Stages, two treatments per stage
Type CPB ST PPE ACI
NR .93* .95 (.34) .93* .99(.50)
R .94(.45) .92* .93* .95(.48)
NR .93* .76* .90* .96(.49)
NNR .93* .76* .90* .97(.49)
Size(width)
32
Example –Two Stages, three treatments in stage 2
Type CPB PPE ACI
NR .93* .93* 1.0(.72)
R .94(.56) .92* .96(.63)
NR .89* .86* .97(.67)
NNR .90* .86* .97(.67)
Size(width)
33
Pelham’s ADHD Study
B. Begin low dosemedication
8 weeks
Assess-Adequate response?
B1. Continue, reassess monthly; randomize if deteriorate
B2. Increase dose of medication with monthly changes
as neededRandom
assignment:B3. Add behavioral
treatment; medication dose remains stable but intensity
of bemod may increase with adaptive modifications
based on impairment
No
A. Begin low-intensity behavior modification
8 weeks
Assess-Adequate response?
A1. Continue, reassess monthly;randomize if deteriorate
A2. Add medication;bemod remains stable butmedication dose may vary
Randomassignment:
A3. Increase intensity of bemod with adaptive modifi-
cations based on impairment
Yes
No
Randomassignment:
3434
• (X1, A1, R1, X2, A2, Y)
– Y = end of year school performance
– R1=1 if responder; =0 if non-responder
– X2 includes the month of non-response, M2, and a measure of adherence in stage 1 (S2 )
– S2 =1 if adherent in stage 1; =0, if non-adherent
– X1 includes baseline school performance, Y0 , whether medicated in prior year (S1), ODD (O1)
– S1 =1 if medicated in prior year; =0, otherwise.
ADHD Example
3636
• Stage 1 regression for
• Interesting stage 1 contrast: is it important to know whether the child was medicated in the prior year (S1=1) to determine the best initial treatment in the sequence?
ADHD Example
(1;Y0;S1;O1)®1+A1(¯11+S1¯12)
3737
• Stage 1 treatment effect when S1=1:
• Stage 1 treatment effect when S1=0:
ADHD Example
90% ACI
(-0.48, 0.16)
(-0.05, 0.39)
¯11+¯12¯11
¯11+¯12
¯11
3838
IF medication was not used in the prior year THEN begin with BMOD;
ELSE select either BMOD or MED.
IF the child is nonresponsive and was non-adherent, THEN augment present treatment;
ELSE IF the child is nonresponsive and was adherent, THEN select intensification of current treatment.
Dynamic Treatment Regime Proposal
39
• There are multiple ways to form ; what are the pros and cons?
•Improve adaptation by a pretest of
• High dimensional data; investigators want to collect real time data
• Feature construction & Feature selection
•Many stages or infinite horizon
Challenges
Y
4040
This seminar can be found at:
http://www.stat.lsa.umich.edu/~samurphy/
seminars/Bristol04.10.12.ppt
Email Eric Laber or me for questions: