
Stat Biosci, DOI 10.1007/s12561-013-9107-8

Optimal Dynamic Treatment Strategies with Protection Against Missed Decision Points

Susanne Rosthøj · Robin Henderson · Jessica K. Barrett

Received: 13 November 2012 / Accepted: 29 November 2013
© International Chinese Statistical Association 2013

Abstract We review methods for determination of optimal dynamic treatment strategies and consider the consequences of patients missing scheduled clinic visits. We describe a Markov chain Monte Carlo procedure for parameter estimation in the presence of incomplete data. We propose an optimal dynamic fixed-dose treatment allocation rule that accommodates the possibility of patients missing future scheduled visits. We compare our strategy with a globally optimal strategy through simulations and an application on control of blood clotting time for patients on long-term anticoagulation.

Keywords Causal inference · Optimal dynamic treatments · Regret-regression · Missing data

1 Introduction

A dynamic treatment strategy allows drug types, drug doses or other therapies to be repeatedly adjusted in response to measurements taken on a patient over time. How to formulate and estimate new strategies based on data from observational or randomized studies has, from a statistical point of view, received considerable attention over

S. Rosthøj (B)
Department of Biostatistics, Institute of Public Health, University of Copenhagen, Copenhagen, Denmark
e-mail: [email protected]

R. Henderson
School of Mathematics and Statistics, University of Newcastle, Newcastle upon Tyne, UK

J.K. Barrett
MRC Biostatistics Unit, Institute of Public Health, University Forvie Site, Cambridge, UK


the past 10 years. Murphy [9] suggested a method for estimation of so-called optimal dynamic treatment strategies. Assuming we are interested in an outcome variable Y measured at the end of the treatment phase, this strategy will for each time point describe how to assign treatments A as a function of observed state variables S and previous treatments such that the mean of Y is maximized. Robins [17] considered the same problem from a different perspective, and subsequently several authors have addressed the problem (e.g. [5, 8, 10, 19, 27]). Attention has mainly been concentrated on the causal effect of binary treatment decisions, but in this work we are particularly interested in the causal effect of changing drug doses and our focus is on continuous treatments.

Methods for causal inference for a time-dependent treatment invariably assume that patients will all be observed at the same times during the treatment phase. In practice, however, patients and physicians seldom adhere to planned visit structures, in particular if the patients have frequent scheduled visits. Examples include studies of anticoagulation [2, 5, 21] and maintenance therapy of childhood leukemia [22]. When assessing the causal effect of a particular dose, the timing of the measurements cannot be disregarded in the analysis, since the effect of a change of dose will depend on how long the patient will stay on that dose (i.e. the time to the next visit at the clinic). Respecting this feature, Rosthøj et al. [22] formulated a new dynamic treatment strategy for the leukemia study based on an approach suggested by van der Laan et al. [25], though the method was complicated and of limited practical value.

Two particular problems arise when patients miss some of their scheduled examinations. First, the data available for estimation of new treatment strategies will contain missing values. In this case we demonstrate that estimation by regret regression as suggested by Henderson et al. [5] can be performed using an MCMC algorithm. Second, at each scheduled time point for dose assignment in future patient cohorts, a dynamic strategy to suggest the best possible dose may be optimal under an assumption that the patients will be seen again at all future time points, so that the dose can be updated as necessary. If visits are missed then the anticipated updating will not be possible and this may affect performance of the strategy. To accommodate this, we propose an optimal dynamic fixed-dose strategy. We formulate this strategy under the assumption that future patients will have no future examinations, and we will assign the best possible fixed (unchangeable) doses to these patients as if there will be no opportunity for these to be changed later in the treatment phase. Each time the patients present for examination, they are assigned a new fixed dose in response to the updated available information. Repeatedly searching for the best possible fixed dose defines a new dynamic treatment strategy.

We begin in Sect. 2 by outlining the framework for the estimation of causal effects based on Robins' structural nested mean models [14, 16, 17]. Next, in Sect. 3 we define the new optimal dynamic fixed-dose strategy intended for incomplete data structures. We evaluate the performance of this strategy and compare it to the optimal dynamic treatment strategy through an illustrative example and simulation studies in Sect. 4. We demonstrate that if patients have no missed visits, the performance of the two strategies does not necessarily differ substantially. If the patients are allowed to miss some of the visits our new regime will perform better. In Sect. 5 we treat


regret regression with missing data and suggest an MCMC procedure for estimation. We furthermore discuss how to formulate dosing strategies when previous important information needed for dose assignment is missing. In Sect. 6 we compare the optimal dynamic fixed-dose strategy with the optimal dynamic strategy in an application on anticoagulation. Some closing remarks in Sect. 7 complete the paper.

2 Outcome Models for Dynamic Dosing Strategies

2.1 Framework

We consider longitudinal data on n independent patients, assuming that the length of the treatment period is the same for all patients. Each patient is assumed to have the same K scheduled visits, indexed by $j = 1, \dots, K$. At each visit the treating physician measures a state variable $S_j$ and decides which dose $A_j$ the patient shall receive until the next visit. Doses are assumed to be continuous but bounded. For simplicity, the states are assumed to be one-dimensional, but the results can easily be generalized to multidimensional states. At the end of the treatment a final outcome Y with finite mean is measured. The observed data in observational order are thus $(S_1, A_1, \dots, S_K, A_K, Y)$ for each patient. Letting a bar over a variable denote its current and previous values, the physician has information on $\bar S_j = (S_1, \dots, S_j)$ and $\bar A_{j-1} = (A_1, \dots, A_{j-1})$ when deciding which dose $A_j$ the patient is in need of. The target is to formulate a dosing strategy to assist the physicians in assigning the doses $A_j$, in particular in a setting where the patients may miss some of the visits. We first consider the situation in which all patients are examined at all K time points.

The notion of potential outcomes or counterfactuals [23] is useful for investigation of what would have happened to a patient if a treatment other than the actually observed treatment had been given. At each time point of dose assignment $j = 1, \dots, K$ the physician may choose the dose from the set $\mathcal{A}_j$ of all possible doses. For each time point j, let $\bar a_j = (a_1, \dots, a_j)$, taking values in $\bar{\mathcal{A}}_j = \mathcal{A}_1 \times \cdots \times \mathcal{A}_j$, denote a possible treatment sequence for the first j time points. Similarly define $\underline a_j = (a_j, \dots, a_K) \in \underline{\mathcal{A}}_j = \mathcal{A}_j \times \cdots \times \mathcal{A}_K$ as a future treatment sequence, an underline denoting current and future values. Let $S_j(\bar a_{j-1})$ denote the potential outcome measured at the j-th time point if the patient was given treatment $\bar a_{j-1}$ instead of the actually observed treatment. Similarly let $Y(\bar a_K)$ denote the potential outcome measured at the end of the study assuming treatments $\bar a_K$ were given.

A decision rule $d_j$ for the dose to be assigned at the j-th time point is a function that takes as input the observed history $(\bar S_j, \bar A_{j-1})$ and outputs a dose $a_j$ to be given, $a_j = d_j(\bar S_j, \bar A_{j-1})$. Depending on the observed history, some doses in $\mathcal{A}_j$ may be impossible to assign (e.g. increased doses for patients in whom an increase may cause side effects), and therefore the decision rule $d_j$ should suggest feasible doses only, namely doses belonging to a subset $\mathcal{A}_j(\bar S_j, \bar A_{j-1}) \subseteq \mathcal{A}_j$ of all possible doses. The set of dosing rules $d = \bar d_K = \underline d_1 = (d_1, \dots, d_K)$, describing how to assign doses at each time point in the study as a function of the observed history, constitutes a dynamic treatment strategy.
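To fix ideas, the observed-data structure and the notion of a decision rule can be sketched in a few lines of Python. This is purely illustrative (the type names and the helper are ours, not the paper's), and for simplicity the states are supplied exogenously, whereas in the model above the states also depend on earlier doses:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Trajectory:
    """One patient in observational order: (S1, A1, ..., SK, AK, Y)."""
    states: List[float]   # S_1, ..., S_K
    doses: List[float]    # A_1, ..., A_K
    outcome: float        # final outcome Y

# A decision rule d_j maps the observed history (S_1..S_j, A_1..A_{j-1})
# to the dose a_j; a dynamic treatment strategy is one rule per visit.
DecisionRule = Callable[[List[float], List[float]], float]

def apply_strategy(rules: List[DecisionRule], states: List[float]) -> List[float]:
    """Assign doses visit by visit: rule j sees S_1..S_j and A_1..A_{j-1}."""
    doses: List[float] = []
    for j, rule in enumerate(rules):
        doses.append(rule(states[: j + 1], doses[:]))
    return doses
```

For instance, `apply_strategy([lambda s, a: s[-1]] * 3, [1.0, 2.0, 3.0])` applies the rule "dose equals current state" at K = 3 visits and returns `[1.0, 2.0, 3.0]`.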


2.2 Treatment Contrasts and the Structural Nested Mean Model

Considering a particular dynamic treatment regime d, we are interested in modeling the mean of the potential outcome Y(d). This can, due to Robins [14, 16, 17] and Murphy [9], be accomplished by considering treatment contrasts for each time point j, defined as

$$\tilde{\Delta}^d_j\bigl(a_j \mid \bar S_j(\bar a_{j-1}), \bar a_{j-1}\bigr) = E\bigl(Y(\bar a_{j-1}, a_j, \underline d_{j+1}) \mid \bar S_j(\bar a_{j-1})\bigr) - E\bigl(Y(\bar a_{j-1}, \underline d_j) \mid \bar S_j(\bar a_{j-1})\bigr),$$

where $\underline d_j = (d_j, \dots, d_K)$ is the treatment sequence defined by the strategy d from the j-th time point onwards. These treatment contrasts compare the mean expected outcome when initiating the dynamic treatment strategy d at the j-th time point to the mean expected outcome when assigning dose $a_j$ at the current time point and following the dynamic treatment strategy d from the next time point onwards, for a patient who had been given treatment $\bar a_{j-1}$. Note that the future doses prescribed by $\underline d_{j+1}$ for the two potential outcomes on the right hand side need not be equal, since the observed history at future time points will differ depending on the dose assigned at the j-th time point.

To relate the potential outcomes to the observed outcomes and to allow for estimation of causal effects, we need the standard assumptions of consistency (e.g. [3]) and no unmeasured confounding [16]. Because we assume continuous actions, we do not require the so-called positivity assumption, under which it is assumed that all possible actions have non-zero probability of appearing in a data set. Instead, we will need to assume that the effect of any potential action can be obtained from an appropriate parametric model.

For the remainder of this paper we will only consider the means of the potential outcomes corresponding to a strategy d conditional on the observed history $(\bar S_j, \bar A_{j-1})$ for each time point. To simplify notation we write the mean of these potential outcomes using the do-notation of Pearl [11] as

$$E\bigl(Y(\bar A_{j-1}, \underline d_j) \mid \bar S_j(\bar A_{j-1})\bigr) = E\bigl(Y \mid (\bar S_j, \bar A_{j-1}), \mathrm{do}(\underline d_j)\bigr).$$

The contrasts based on the observed history can then be written as

$$\Delta^d_j(a_j \mid \bar S_j, \bar A_{j-1}) = E\bigl(Y \mid (\bar S_j, \bar A_{j-1}), \mathrm{do}(a_j, \underline d_{j+1})\bigr) - E\bigl(Y \mid (\bar S_j, \bar A_{j-1}), \mathrm{do}(\underline d_j)\bigr)$$

for each time point j.

Robins' Structural Nested Mean Model (SNMM) [1, 9, 14, 16, 17] is an additive decomposition of the mean of the outcome Y conditional on $(\bar S_K, \bar A_K)$ involving the contrasts $\Delta^d_j$, $j = 1, \dots, K$. The SNMM specifies the conditional mean as a telescoping sum of the following form:

$$
\begin{aligned}
E(Y \mid \bar S_K, \bar A_K)
&= E\bigl(Y \mid \mathrm{do}(\underline d_1)\bigr) + \bigl\{E\bigl(Y \mid S_1, \mathrm{do}(\underline d_1)\bigr) - E\bigl(Y \mid \mathrm{do}(\underline d_1)\bigr)\bigr\} \\
&\quad + \bigl\{E\bigl(Y \mid (S_1, A_1), \mathrm{do}(\underline d_2)\bigr) - E\bigl(Y \mid S_1, \mathrm{do}(d_1, \underline d_2)\bigr)\bigr\} \\
&\quad + \bigl\{E\bigl(Y \mid (\bar S_2, A_1), \mathrm{do}(\underline d_2)\bigr) - E\bigl(Y \mid (S_1, A_1), \mathrm{do}(\underline d_2)\bigr)\bigr\} \\
&\quad + \bigl\{E\bigl(Y \mid (\bar S_2, \bar A_2), \mathrm{do}(\underline d_3)\bigr) - E\bigl(Y \mid (\bar S_2, A_1), \mathrm{do}(d_2, \underline d_3)\bigr)\bigr\} \\
&\quad + \cdots \\
&\quad + \bigl\{E\bigl(Y \mid (\bar S_K, \bar A_{K-1}), \mathrm{do}(d_K)\bigr) - E\bigl(Y \mid (\bar S_{K-1}, \bar A_{K-1}), \mathrm{do}(d_K)\bigr)\bigr\} \\
&\quad + \bigl\{E(Y \mid \bar S_K, \bar A_K) - E\bigl(Y \mid (\bar S_K, \bar A_{K-1}), \mathrm{do}(d_K)\bigr)\bigr\} \\
&= E\bigl(Y \mid \mathrm{do}(\bar d_K)\bigr) + \sum_{j=1}^{K} \varepsilon^d_j(\bar S_j, \bar A_{j-1}) + \sum_{j=1}^{K} \Delta^d_j(A_j \mid \bar S_j, \bar A_{j-1}), \qquad (1)
\end{aligned}
$$

where we have defined

$$\varepsilon^d_j(\bar S_j, \bar A_{j-1}) = E\bigl(Y \mid (\bar S_j, \bar A_{j-1}), \mathrm{do}(\underline d_j)\bigr) - E\bigl(Y \mid (\bar S_{j-1}, \bar A_{j-1}), \mathrm{do}(\underline d_j)\bigr).$$

The first term $E(Y \mid \mathrm{do}(\bar d_K))$ of (1) is the marginal mean of the potential outcome if all patients follow strategy $d = \bar d_K$. The $\varepsilon^d_j$ are defined such that the right hand side equals the left hand side. For each time point, they measure the effect of the states being revealed on the potential outcome. The conditional mean of $\varepsilon^d_j$ over $S_j$ is zero by construction. Murphy [9] termed these functions nuisance functions, given that interest in her case was in the effect of actions, not states.

2.3 Models for the Treatment Contrasts

The so-called A-learning approach to determining the causal effects of actions [9] is based on specification of parametric models for the contrasts $\Delta^d_j$ under particular regimes d. Two regimes have been considered in the literature: the zero regime and the optimal dynamic regime.

Robins and co-authors (e.g. [14, 16]) and Almirall et al. [1] focus on the zero regime, $d = (0, \dots, 0)$, namely the treatment strategy prescribing no or baseline treatment at each time point. This is a non-dynamic strategy as it does not depend on the observed history. The treatment contrast at the j-th time point with the zero regime as reference is

$$\gamma_j(a_j \mid \bar S_j, \bar A_{j-1}) = \Delta^0_j(a_j \mid \bar S_j, \bar A_{j-1}) = E\bigl(Y \mid (\bar S_j, \bar A_{j-1}), \mathrm{do}(a_j, \underline 0_{j+1})\bigr) - E\bigl(Y \mid (\bar S_j, \bar A_{j-1}), \mathrm{do}(\underline 0_j)\bigr).$$

Robins termed these quantities 'blips' as they describe the causal effect of one last 'blip' of treatment compared to no further treatment. Note that the only constraint on the blip is that it equals zero when a dose of size zero is assigned. The zero regime itself may not be of particular relevance, but the blips $\gamma_j$ can be used to isolate and study the causal effects of dose changes at each time point [1].

Murphy [9] focused on estimating a treatment strategy possessing the desirable property of being optimal in the sense that the strategy will optimise the mean of the outcome. Denoting the optimal dynamic (OD) regime by $d^{\mathrm{OD}}$, the optimal regime


thus fulfills $E(Y \mid \mathrm{do}(d^{\mathrm{OD}})) \ge E(Y \mid \mathrm{do}(\bar a_K))$ for all $\bar a_K \in \bar{\mathcal{A}}_K$. Using $d^{\mathrm{OD}}$ as the reference regime, the (negative) contrasts have the form

$$\mu_j(a_j \mid \bar S_j, \bar A_{j-1}) = -\Delta^{d^{\mathrm{OD}}}_j(a_j \mid \bar S_j, \bar A_{j-1}) = E\bigl(Y \mid (\bar S_j, \bar A_{j-1}), \mathrm{do}(\underline d^{\mathrm{OD}}_j)\bigr) - E\bigl(Y \mid (\bar S_j, \bar A_{j-1}), \mathrm{do}(a_j, \underline d^{\mathrm{OD}}_{j+1})\bigr).$$

The (negative) contrasts are constrained such that each $\mu_j \ge 0$, and a minimum value of zero is obtained when assigning the optimal dose $a_j = d^{\mathrm{OD}}_j(\bar S_j, \bar A_{j-1})$. Murphy termed these contrasts 'regrets' as each term, due to (1), directly quantifies the loss in expected outcome when choosing a non-optimal dose.

In the examples considered in the literature, models for the blips are usually of the form $\gamma_j(a_j \mid \bar S_j, \bar A_{j-1}; \psi_j) = a_j \cdot \psi_j^T f_j(\bar S_j, \bar A_{j-1})$, i.e. a model containing a relevant summary vector of the history given by a function $f_j$ and being linear in an unknown vector $\psi_j$ of parameters, $j = 1, \dots, K$ (e.g. [1, 17]). With this parameterization, the constraint $\gamma_j(0 \mid \bar S_j, \bar A_{j-1}; \psi_j) = 0$ is always fulfilled. Similarly, models for regrets (e.g. [5, 9, 21]) have been of the form

$$\mu_j(a_j \mid \bar S_j, \bar A_{j-1}; \psi_j) = \eta_j(\bar S_j, \bar A_{j-1}; \psi_j)\, h\bigl(a_j - d^{\psi_j}_j(\bar S_j, \bar A_{j-1})\bigr),$$

where $\eta_j$ is a positive valued function depending on the observed history, h is a non-negative link function for which $h(0) = 0$, and $d^{\psi_j}_j$ is a function parameterized by $\psi_j$ that takes as input the observed history $(\bar S_j, \bar A_{j-1})$ and outputs a dose to be given at the j-th time point. With this parameterization of the regret functions, the regrets are non-negative and attain the value 0 when assigning doses $a_j = d^{\psi_j}_j(\bar S_j, \bar A_{j-1})$. Therefore $d^{\psi_j}_j$ is a direct parameterization of the optimal dose $d^{\mathrm{OD}}_j$ at the j-th time point.
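For intuition, the quadratic regrets used in the illustrative example of Sect. 4 are one instance of this form, with $\eta_j = \psi_1$ and link $h(u) = u^2$. A minimal sketch in Python (our own code; the linear dose rule inside mirrors the example's regret models and is an assumption of that example, not of the general theory):

```python
import numpy as np

def regret(a, s_hist, a_prev, psi):
    """Quadratic regret mu_j = psi1 * (a - d_j)^2 with a linear dose rule d_j."""
    psi1, psi2, psi3, psi4 = psi
    d_opt = psi2 * s_hist[-1] + psi3 * s_hist[-2] + psi4 * a_prev
    return psi1 * (a - d_opt) ** 2

psi = (1.0, 1.0, 0.75, -1.0)
s_hist, a_prev = [0.5, 1.2], 0.3
d = 1.0 * 1.2 + 0.75 * 0.5 - 1.0 * 0.3   # recommended dose, here 1.275

# The regret is non-negative and vanishes exactly at the recommended dose.
assert regret(d, s_hist, a_prev, psi) < 1e-12
assert all(regret(a, s_hist, a_prev, psi) >= 0.0 for a in np.linspace(-3, 3, 61))
```

Any other choice of positive $\eta_j$ and non-negative link h with $h(0) = 0$ would give the same two defining properties.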

With $\psi = (\psi_1^T, \dots, \psi_K^T)^T$, the dynamic treatment strategies $d^{\psi} = (d^{\psi_1}_1, \dots, d^{\psi_K}_K)$ indexed by $\psi$ define a class $\mathcal{D}^{\psi}$ of dynamic treatment strategies. An estimate $\hat\psi$ of $\psi$ for which

$$E\bigl(Y \mid \mathrm{do}(d^{\hat\psi})\bigr) \ge E\bigl(Y \mid \mathrm{do}(d^{\psi})\bigr) \quad \text{for all } \psi$$

defines an estimate of the optimal dynamic treatment regime $d^{\mathrm{OD}}$.

A variety of estimation methods have been proposed for determining the parameter $\psi$ in either blips or regrets (e.g. [1, 5, 8, 9, 14, 16, 17]). In this work we will focus on the regret-regression method of Henderson et al. [5], under which the nuisance functions $\varepsilon^d_j$ in (1) are modelled as linear combinations of residuals between observed and expected states $S_j$ or powered states $S^m_j$. The method is illustrated by example in Sect. 4 and is explained in more detail in Henderson et al. [5] and Barrett et al. [2].
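As a concrete sketch of regret-regression, the following fits the outcome model of the illustrative example in Sect. 4 to simulated complete data by nonlinear least squares. This is our own illustration, not the authors' implementation: for simplicity the state-model residuals $Z_j$ are taken as known (i.e. $\theta$ known) rather than estimated, and `scipy` performs the optimisation:

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(1)
theta = (0.0, 0.9, -1.0)
psi_true = np.array([1.0, 1.0, 0.75, -1.0])
n = 2000

# Simulate the K = 3 model of Sect. 4.1 under independent N(0, 1) doses.
Z = rng.standard_normal((n, 3))
A = rng.standard_normal((n, 3))
S = np.zeros((n, 3))
S[:, 0] = Z[:, 0]
for j in (1, 2):
    S[:, j] = theta[0] + theta[1] * S[:, j - 1] + theta[2] * A[:, j - 1] + Z[:, j]

def total_regret(psi):
    """Sum of the three quadratic regrets (6) evaluated at the observed doses."""
    d1 = psi[1] * S[:, 0]
    d2 = psi[1] * S[:, 1] + psi[2] * S[:, 0] + psi[3] * A[:, 0]
    d3 = psi[1] * S[:, 2] + psi[2] * S[:, 1] + psi[3] * A[:, 1]
    return psi[0] * ((A[:, 0] - d1) ** 2 + (A[:, 1] - d2) ** 2 + (A[:, 2] - d3) ** 2)

Y = 1.0 + 1.0 * Z.sum(axis=1) - total_regret(psi_true) + rng.standard_normal(n)

# Regret-regression: regress Y on the Z-residuals and the (negative) regrets,
# estimating (beta1, beta2, psi) jointly by nonlinear least squares.
def residuals(par):
    b1, b2 = par[0], par[1]
    return Y - (b1 + b2 * Z.sum(axis=1) - total_regret(par[2:]))

fit = least_squares(residuals, x0=[0.0, 0.5, 0.5, 0.5, 0.5, -0.5])
assert np.allclose(fit.x[2:], psi_true, atol=0.2)   # psi is recovered
```

In the method proper the residuals are themselves built from a fitted state model; treating them as known here only shortens the sketch.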

3 The Optimal Dynamic Fixed-Dose Strategy

We assume that each patient has assessments scheduled for times $j = 1, 2, \dots, K$, but we now allow the possibility that some visits may be missed. For simplicity we


assume that missingness is completely at random [7]. Extension to missing at random is in principle straightforward. We assume that the response Y is always observed.

We assume that a dose cannot be changed unless the state of the patient is measured, corresponding to $a_j = a_{j-1}$ if the patient misses visit j. Thus, only states can be missing. The challenge when patients skip some of the scheduled visits is twofold: we have missing data in the past and we will have missing data in the future. With missing data in the past we refer to the situation where the data set used for estimation contains missing values. Missing data in the future refers to the patients being allowed to miss visits in the future.

Missing data in the past challenges the estimation procedure because of the missing states. Standard techniques for handling missing covariates in longitudinal data settings may solve this problem, e.g. multiple imputation or Markov chain Monte Carlo (MCMC) methods [24].
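As a sketch of such an imputation step (our own illustration, built on the first-order state model used later in Sect. 4.1): a missing state can be replaced by draws from its conditional distribution given the previous state and dose. A full MCMC treatment would also condition on subsequent states and on Y; the forward draw below is deliberately simplified:

```python
import numpy as np

rng = np.random.default_rng(0)

def impute_state(s_prev, a_prev, theta, sigma, m=20):
    """Draw m imputations of a missing S_j from the forward state model
    S_j | S_{j-1}, A_{j-1} ~ N(theta1 + theta2*S_{j-1} + theta3*A_{j-1}, sigma^2)."""
    mean = theta[0] + theta[1] * s_prev + theta[2] * a_prev
    return mean + sigma * rng.standard_normal(m)

# Example: previous state 1.0 and previous dose 0.5 under theta = (0, 0.9, -1)
# give conditional mean 0.9 * 1.0 - 1.0 * 0.5 = 0.4.
draws = impute_state(1.0, 0.5, theta=(0.0, 0.9, -1.0), sigma=1.0)
assert draws.shape == (20,)
assert abs(draws.mean() - 0.4) < 1.0   # draws centred near 0.4
```

For multiple imputation, the downstream analysis would be repeated on each of the m completed data sets and the results combined in the usual way.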

Being interested in the optimal dynamic treatment regime, missing data in the future constitutes a larger problem. The optimal dynamic strategy $d^{\mathrm{OD}}$ is defined such that the regret

$$\mu_j(a_j \mid \bar S_j, \bar A_{j-1}) = E\bigl(Y \mid (\bar S_j, \bar A_{j-1}), \mathrm{do}(\underline d^{\mathrm{OD}}_j)\bigr) - E\bigl(Y \mid (\bar S_j, \bar A_{j-1}), \mathrm{do}(a_j, \underline d^{\mathrm{OD}}_{j+1})\bigr)$$

is zero for all time points $j = 1, \dots, K$ when the optimal dose is chosen. In the definition of the optimal regime it is assumed that the patient will be seen again at the future time points $j+1, \dots, K$ and that doses will be assigned according to the future optimal dynamic treatment regime $\underline d^{\mathrm{OD}}_{j+1}$. This strategy thus depends on the patients being monitored regularly and on the future states. However, when the time point for the next visit is uncertain, the future strategy should not depend on future states, since we might not have the chance to update the dose. Therefore, the optimal dynamic treatment strategy assuming complete visits may not perform optimally if the patients are not monitored on a regular basis.

To handle the issue of measurements being missing in the future, we will focus on specifying another strategy that is intended to be more robust than the optimal dynamic treatment strategy in a setting where the time point for the next visit is uncertain. We will term this strategy the optimal dynamic fixed-dose strategy. We first specify this regime as if no visits are missed in Sect. 3.1 and investigate the performance of this regime compared to the optimal dynamic regime in a complete data setting through a simple illustrative example in Sect. 4. Missingness is deferred to Sect. 5.

3.1 The Optimal Dynamic Fixed-Dose Strategy, ODFD

In the definition of a treatment regime not depending on the timing of future measurements, we suggest a combination of dynamic and static treatment strategies. A static treatment strategy is one that at each point in time fully prescribes the sequence of doses the patient shall receive from the current time point onwards, irrespective of future evolving state information. We will consider a specific example of a static strategy, namely the strategy that does not change the dose. We will term it the fixed-dose strategy. For each time point j it assigns the current dose $a_{j-1}$ to the patient


at all remaining times within the treatment period, i.e. $a_{j-1} = a_j = \cdots = a_K$. Denote the fixed-dose regime by $d^{\mathrm{FD}}$, defined by $d^{\mathrm{FD}}_j(\bar S_j, \bar A_{j-1}) = A_{j-1}$. Note that this regime only depends on the history through the current dose. We need the definition of a baseline dose $a_0$ such that the fixed-dose strategy is well defined at the first time point, $d^{\mathrm{FD}}_1(S_1) = a_0$. Based on this regime, treatment contrasts $\nu_j$ are defined as

$$
\begin{aligned}
\nu_j(a_j \mid \bar S_j, \bar A_{j-1}) &= \Delta^{d^{\mathrm{FD}}}_j(a_j \mid \bar S_j, \bar A_{j-1}) \\
&= E\bigl(Y \mid (\bar S_j, \bar A_{j-1}), \mathrm{do}(a_j, \underline d^{\mathrm{FD}}_{j+1})\bigr) - E\bigl(Y \mid (\bar S_j, \bar A_{j-1}), \mathrm{do}(\underline d^{\mathrm{FD}}_j)\bigr) \\
&= E\bigl(Y \mid (\bar S_j, \bar A_{j-1}), \mathrm{do}(a_j, \dots, a_j)\bigr) - E\bigl(Y \mid (\bar S_j, \bar A_{j-1}), \mathrm{do}(A_{j-1}, \dots, A_{j-1})\bigr). \qquad (2)
\end{aligned}
$$

Thus $\nu_j$ measures the causal effect of changing to dose $a_j$ and subsequently staying at that level, compared with staying on the current dose $A_{j-1}$ throughout the treatment period. Note that the contrast is constrained to be zero whenever the current dose is assigned to the patient, i.e. $\nu_j(A_{j-1} \mid \bar S_j, \bar A_{j-1}) = 0$.

The fixed-dose strategy $d^{\mathrm{FD}}$ is not of particular relevance itself. However, having specified the fixed-dose contrasts we can define another strategy by considering the dose $a^{\mathrm{opt}}_j$ maximizing the contrasts, namely

$$a^{\mathrm{opt}}_j(\bar S_j, \bar A_{j-1}) := \arg\max_{a_j} \nu_j(a_j \mid \bar S_j, \bar A_{j-1}), \qquad (3)$$

which we will term the optimal fixed dose. The optimal fixed dose thus specifies which dose the patient should stay on to optimise the potential outcome, assuming that the dose will not be changed in the future. Prescribing the optimal fixed dose leads to protection against the event that the patient misses all future visits. Since we do not expect this always to happen, our proposal is to re-calculate the optimal fixed dose at each visit that takes place, based on the information available at that time. Thus our proposal is a dynamic regime. We will refer to it as the optimal dynamic fixed-dose strategy, ODFD, and will denote this strategy by $d^{\mathrm{ODFD}}$. We will compare it to the optimal dynamic treatment strategy, OD, which assigns the best dose under the assumption that all future visits will take place.
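Because, for a given history, the fixed-dose contrast is a function of the single scalar dose $a_j$, the optimal fixed dose in (3) can in practice be found by a one-dimensional search over an estimate of the expected outcome under "dose held fixed". A hedged sketch (our own code; `mean_outcome` is a placeholder the analyst supplies, e.g. a Monte Carlo or model-based estimate):

```python
import numpy as np

def optimal_fixed_dose(history, mean_outcome, grid):
    """Grid-search version of (3): the dose a maximising the expected outcome
    when a is held fixed for the rest of the treatment phase."""
    values = [mean_outcome(history, a) for a in grid]
    return grid[int(np.argmax(values))]

# Toy check: a quadratic expected-outcome surface peaking at a = 1.5.
grid = np.linspace(-3.0, 3.0, 601)
a_opt = optimal_fixed_dose({}, lambda hist, a: -(a - 1.5) ** 2, grid)
assert abs(a_opt - 1.5) < 0.011   # recovered to grid precision
```

Re-running this search at every visit that actually takes place is exactly what makes the ODFD proposal a dynamic regime.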

An example of a dynamic strategy that has often been considered in the literature is the question of the optimal timing of when to start a particular treatment (i.e. doses are 0/1). Formulated in terms of the fixed-dose approach this corresponds to the patient at each visit being told to stay on the treatment throughout the whole treatment phase. When the patient comes to the clinic at the next time point of measurement, whenever that might be, the decision is revisited. As an example, with the application of Rosthøj et al. [22] in mind, children with leukemia will be told to take daily/weekly tablets of chemotherapy of a certain dose, in principle throughout the whole treatment phase, but at the next visit to the clinic the dose will be adjusted as necessary.

As an aside, we note that our proposal is reminiscent of the work of van der Laan et al. [25] and Petersen et al. [12], who proposed a statically optimal dynamic treatment regime. At each time point j a marginal structural model for Y conditional on history


is assumed, and the sequence of doses maximising the mean of Y is obtained:

$$(a_j, \dots, a_K)^{\mathrm{opt}}(\bar S_j, \bar A_{j-1}) = \arg\max_{a_j, a_{j+1}, \dots, a_K} E\bigl(Y \mid (\bar S_j, \bar A_{j-1}), \mathrm{do}(a_j, \dots, a_K)\bigr).$$

This sequence of doses does not depend on future states, meaning the doses are static. The first dose $a^{\mathrm{opt}}_j$ in the sequence $(a_j, \dots, a_K)^{\mathrm{opt}}(\bar S_j, \bar A_{j-1})$ is assigned, and the statically optimal doses are then re-evaluated when the next measurement $S_{j+1}$ becomes available, meaning that the policy is dynamic. The difference from the ODFD policy is that we optimise over a single dose, to be held fixed, rather than a sequence of doses. van der Laan et al. [25] term their class of models history adjusted marginal structural models and estimate the parameters by inverse-probability weighting. We have in a previous application to the childhood leukemia data experimented with the implementation of this class of models [22] and demonstrated that estimation based on inverse-probability weighting for complex dynamic treatments involving dosing on a continuous scale may be impossible. We therefore do not consider this class of models a useful approach for our problem, especially when missing data techniques are needed. Furthermore, Robins et al. [18] discuss the similarities and differences between history adjusted marginal structural models and SNMMs and recommend the use of SNMMs, since the parameters estimated from a history adjusted model may be logically incompatible.

3.2 Parameterization of the Optimal Dynamic Fixed-Dose Strategy

Parameterization of the contrasts $\Delta^d_j$ defining blips and regrets was discussed in Sect. 2.3. In deciding how to model the fixed-dose contrasts (2) we take into account the following points.

1. $\nu_j(A_{j-1} \mid \bar S_j, \bar A_{j-1}) = 0$ by construction.
2. With continuous treatments, linear models for $\nu_j(a_j \mid \bar S_j, \bar A_{j-1})$ will force an optimum to be on a boundary of the dose space $\mathcal{A}_j(\bar S_j, \bar A_{j-1})$.
3. The length of treatment for which the fixed dose will be applied will decrease as visit number j increases, meaning parameters should be time-dependent.
4. Depending on the history, a physician will often know whether to leave the dose unchanged, increase or decrease doses. It is more problematic to determine the adequate size of the change of dose [22].

With these in mind, we follow Murphy [9] in suggesting that attention is first given to specifying a model for the ODFD strategy $d^{\mathrm{ODFD}}_j(\bar S_j, \bar A_{j-1}; \phi_j)$ parameterized by a parameter $\phi_j$. Next we suggest that (2) might be parameterized as

$$\nu_j\bigl(a_j \mid \bar S_j, \bar A_{j-1}; (\phi_{j0}, \phi_j)\bigr) = \phi_{j0}(A_{j-1} - a_j) \cdot \bigl((A_{j-1} + a_j) - 2 d^{\mathrm{ODFD}}_j(\bar S_j, \bar A_{j-1}; \phi_j)\bigr) \qquad (4)$$

for a positive scale parameter $\phi_{j0}$ and a vector of parameters $\phi_j$. This form for $\nu_j$ prescribes a parabola in $a_j$ with axis of symmetry at $a_j = d^{\mathrm{ODFD}}_j(\bar S_j, \bar A_{j-1}; \phi_j)$, such that $\nu_j$ is maximized at the optimal fixed dose $d^{\mathrm{ODFD}}_j$.


4 Illustrative Example

We consider a simple example in some detail so as to compare the ODFD and OD regimes and to consider estimation of parameters in a situation where there are no missed visits.

4.1 Setup

We assume K = 3, $S_1 = Z_1$ and $S_j = \theta_1 + \theta_2 S_{j-1} + \theta_3 A_{j-1} + Z_j$ for j = 2, 3, where the $\{Z_j\}$ are independent zero-mean Normal with variance $\sigma^2_Z$. We assume the regret form (1) of the SNMM, with outcome Y having a Normal distribution with variance $\sigma^2_Y$ and conditional mean

$$E(Y \mid \bar S_3, \bar A_3) = \beta_1 + \beta_2 \sum_{j=1}^{3} Z_j - \sum_{j=1}^{3} \mu_j(A_j \mid \bar S_j, \bar A_{j-1}; \psi). \qquad (5)$$

The terms in this conditional mean correspond to the terms in (1), having defined $E(Y \mid \mathrm{do}(d^{\mathrm{OD}})) = \beta_1$, $\varepsilon^{d^{\mathrm{OD}}}_j(\bar S_j, \bar A_{j-1}) = \beta_2 Z_j$ and $\Delta^{d^{\mathrm{OD}}}_j(A_j \mid \bar S_j, \bar A_{j-1}) = -\mu_j(A_j \mid \bar S_j, \bar A_{j-1}; \psi)$. Note that the nuisance functions $\varepsilon^{d^{\mathrm{OD}}}_j$ do not depend on the history. The regret models are defined as

$$
\begin{aligned}
\mu_1(a; \psi) &= \psi_1 (a - \psi_2 S_1)^2 \\
\mu_j(a \mid \bar S_j, \bar A_{j-1}; \psi) &= \psi_1 \bigl(a - (\psi_2 S_j + \psi_3 S_{j-1} + \psi_4 A_{j-1})\bigr)^2, \quad j = 2, 3. \qquad (6)
\end{aligned}
$$

The optimal dynamic recommended doses are thus $d^{\mathrm{OD}}_1(S_1; \psi) = \psi_2 S_1$ for the first time point and $d^{\mathrm{OD}}_j(\bar S_j, \bar A_{j-1}; \psi) = \psi_2 S_j + \psi_3 S_{j-1} + \psi_4 A_{j-1}$ for the later time points. Parameter choices for simulations are $\theta = (0, 0.9, -1)$, $\sigma^2_Z = \sigma^2_Y = 1$, $\psi = (1, 1, 0.75, -1)$ and $(\beta_1, \beta_2) = (1, 1)$. With these parameter values the optimal dynamic treatment strategy is

$$
\begin{aligned}
d^{\mathrm{OD}}_3(\bar S_3, \bar A_2) &= S_3 + 0.75 S_2 - A_2 \\
d^{\mathrm{OD}}_2(\bar S_2, A_1) &= S_2 + 0.75 S_1 - A_1 \qquad (7) \\
d^{\mathrm{OD}}_1(S_1) &= S_1.
\end{aligned}
$$

When simulating from the model the baseline dose was set at $a_0 = 0$ and later doses were independent standard Normal.
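This model is easy to simulate, which gives a direct check that the OD strategy (7) attains the maximal mean outcome $\beta_1 = 1$: every regret term is zero along its path. The script below is our own sketch, comparing (7) with independent standard Normal dosing:

```python
import numpy as np

rng = np.random.default_rng(42)
theta = (0.0, 0.9, -1.0)
psi = (1.0, 1.0, 0.75, -1.0)

def simulate(policy, n=20000):
    """Mean outcome for the K = 3 model of Sect. 4.1 under a dosing policy;
    policy(j, S, A) returns the visit-j doses given the history so far."""
    Z = rng.standard_normal((n, 3))
    S = np.zeros((n, 3))
    A = np.zeros((n, 3))
    S[:, 0] = Z[:, 0]
    for j in range(3):
        A[:, j] = policy(j, S, A)
        if j < 2:
            S[:, j + 1] = theta[0] + theta[1] * S[:, j] + theta[2] * A[:, j] + Z[:, j + 1]
    d1 = psi[1] * S[:, 0]
    d2 = psi[1] * S[:, 1] + psi[2] * S[:, 0] + psi[3] * A[:, 0]
    d3 = psi[1] * S[:, 2] + psi[2] * S[:, 1] + psi[3] * A[:, 1]
    regret = psi[0] * ((A[:, 0] - d1) ** 2 + (A[:, 1] - d2) ** 2 + (A[:, 2] - d3) ** 2)
    Y = 1.0 + Z.sum(axis=1) - regret + rng.standard_normal(n)
    return Y.mean()

def od(j, S, A):
    """The optimal dynamic strategy (7): zero regret at every visit."""
    return S[:, 0] if j == 0 else S[:, j] + 0.75 * S[:, j - 1] - A[:, j - 1]

mean_od = simulate(od)
mean_random = simulate(lambda j, S, A: rng.standard_normal(S.shape[0]))
assert abs(mean_od - 1.0) < 0.1   # OD attains beta1 = 1 up to Monte Carlo error
assert mean_od > mean_random      # any other dosing incurs positive regret
```

The gap between the two means is exactly the total expected regret incurred by the random-dosing policy.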

4.2 ODFD Form

Another form of the SNMM (1) incorporates the fixed-dose contrasts (2):

$$E(Y \mid \bar S_3, \bar A_3) = E\bigl(Y \mid \mathrm{do}(d^{\mathrm{FD}})\bigr) + \sum_{j=1}^{3} \varepsilon^{d^{\mathrm{FD}}}_j(\bar S_j, \bar A_{j-1}) + \sum_{j=1}^{3} \nu_j(A_j \mid \bar S_j, \bar A_{j-1}). \qquad (8)$$


To see that this is actually a valid decomposition of the conditional mean of Y, the telescoping form of the SNMM (1) corresponding to the fixed-dose regime $d = d^{\mathrm{FD}}$ has to be written out. Expressions for the $\{\nu_j\}$ can be derived from the regrets $\{\mu_j\}$. At time j = 3 we have

$$
\begin{aligned}
\nu_3(a_3 \mid \bar S_3, \bar A_2) &= E\bigl(Y \mid (\bar S_3, \bar A_2), \mathrm{do}(a_3)\bigr) - E\bigl(Y \mid (\bar S_3, \bar A_2), \mathrm{do}(A_2)\bigr) \\
&= E\bigl(Y \mid (\bar S_3, \bar A_2), \mathrm{do}(a_3)\bigr) - E\bigl(Y \mid (\bar S_3, \bar A_2), \mathrm{do}(d^{\mathrm{OD}}_3)\bigr) \\
&\quad + E\bigl(Y \mid (\bar S_3, \bar A_2), \mathrm{do}(d^{\mathrm{OD}}_3)\bigr) - E\bigl(Y \mid (\bar S_3, \bar A_2), \mathrm{do}(A_2)\bigr) \\
&= -\mu_3(a_3 \mid \bar S_3, \bar A_2) + \mu_3(A_2 \mid \bar S_3, \bar A_2).
\end{aligned}
$$

Similarly at j = 2 we find

$$
\begin{aligned}
\nu_2(a_2 \mid \bar S_2, A_1) &= E\bigl(Y \mid (\bar S_2, A_1), \mathrm{do}(a_2, d^{\mathrm{FD}}_3)\bigr) - E\bigl(Y \mid (\bar S_2, A_1), \mathrm{do}(\underline d^{\mathrm{FD}}_2)\bigr) \\
&= E\bigl(Y \mid (\bar S_2, A_1), \mathrm{do}(a_2, a_2)\bigr) - E\bigl(Y \mid (\bar S_2, A_1), \mathrm{do}(a_2, d^{\mathrm{OD}}_3)\bigr) \\
&\quad + E\bigl(Y \mid (\bar S_2, A_1), \mathrm{do}(a_2, d^{\mathrm{OD}}_3)\bigr) - E\bigl(Y \mid (\bar S_2, A_1), \mathrm{do}(\underline d^{\mathrm{OD}}_2)\bigr) \\
&\quad + E\bigl(Y \mid (\bar S_2, A_1), \mathrm{do}(\underline d^{\mathrm{OD}}_2)\bigr) - E\bigl(Y \mid (\bar S_2, A_1), \mathrm{do}(A_1, d^{\mathrm{OD}}_3)\bigr) \\
&\quad + E\bigl(Y \mid (\bar S_2, A_1), \mathrm{do}(A_1, d^{\mathrm{OD}}_3)\bigr) - E\bigl(Y \mid (\bar S_2, A_1), \mathrm{do}(A_1, A_1)\bigr). \qquad (9)
\end{aligned}
$$

To simplify, we define

$$
\begin{aligned}
\mu^{(-1)}_3(a \mid \bar S_2, A_1) &= E\bigl(Y \mid (\bar S_2, A_1), \mathrm{do}(a, d^{\mathrm{OD}}_3)\bigr) - E\bigl(Y \mid (\bar S_2, A_1), \mathrm{do}(a, a)\bigr) \\
&= E_{S_3 \mid (\bar S_2, A_1), \mathrm{do}(a)}\bigl\{E\bigl(Y \mid (\bar S_3, A_1, \mathrm{do}(a)), \mathrm{do}(d^{\mathrm{OD}}_3)\bigr) - E\bigl(Y \mid (\bar S_3, A_1, \mathrm{do}(a)), \mathrm{do}(a)\bigr)\bigr\} \\
&= E_{S_3 \mid (\bar S_2, A_1), \mathrm{do}(a)}\bigl(\mu_3\bigl(a \mid \bar S_3, A_1, \mathrm{do}(a)\bigr)\bigr),
\end{aligned}
$$

with $(\bar S_3, A_1, \mathrm{do}(a))$ being short for $((\bar S_2, S_3(a)), (A_1, \mathrm{do}(a)))$. Thus $\mu^{(-1)}_3(a \mid \bar S_2, A_1)$ is the expected regret for dose a at the third time point if initiating dose a at the second time point and continuing with this dose throughout the treatment period, when having observed $(\bar S_2, A_1)$. The notation $\mu^{(-1)}_3$ is used to indicate that it corresponds to the one-step-ahead expectation of $\mu_3$. The contrast at j = 2 can then be written as

$$\nu_2(a_2 \mid \bar S_2, A_1) = -\mu^{(-1)}_3(a_2 \mid \bar S_2, A_1) - \mu_2(a_2 \mid \bar S_2, A_1) + \mu_2(A_1 \mid \bar S_2, A_1) + \mu^{(-1)}_3(A_1 \mid \bar S_2, A_1),$$

namely a sum of regrets at the second time point and future expected regrets assuming the fixed-dose strategy is followed at the third time point.

Finally considering j = 1, and omitting details, the fixed-dose contrast can be shown to be


ν1(a1 | S1) = −μ(−2)3 (a1 | S1) − μ

(−1)2 (a1 | S1) − μ1(a1 | S1)

+ μ(−2)3 (a0 | S1) + μ

(−1)2 (a0 | S1) + μ1(a0 | S1),

with

μ(−1)2 (a | S1) = ES2|S1,do(a)

(

μ2(

a | S2,do(a)))

μ(−2)3 (a | S1) = ES3|S2,do(a,a)

{

ES2|S1,do(a)

(

μ3(

a | S3,do(a, a)))}

,

where S3 | S2, do(a, a) is short for S3 | ((S1, S2(a)), do(a)), do(a). Detailed calculation of the contrasts for the scenario we are considering is provided in the Appendix. In particular

ν1(a1 | S1; (φ10, φ1)) = φ10(0 − a1) × (a1 − 2(φ11 + φ12 S1))   (10)

and for j = 2, 3

νj(aj | Sj, Aj−1; (φj0, φj)) = φj0(Aj−1 − aj)((Aj−1 + aj) − 2(φj1 + φj2 Sj + φj3 Sj−1 + φj4 Aj−1)),   (11)

where φj0, φ1 = (φ11, φ12) and φj = (φj1, . . . , φj4) are derived from the parameters θ and ψ appearing in the state and regret models of Sect. 4.1. Thus the fixed-dose contrasts, though derived from regrets, are of the proposed form (4). Further, the ODFD-recommended doses are d1^ODFD(S1; φ1) = φ11 + φ12 S1 for the first time point and dj^ODFD(Sj, Aj−1; φj) = φj1 + φj2 Sj + φj3 Sj−1 + φj4 Aj−1 for j = 2, 3. These have the same linear structure as the OD-recommended doses which follow from the regrets (6), but with different coefficients. For the parameters chosen in this particular example we have

d3^ODFD(S3, A2) = S3 + 0.75 S2 − A2
d2^ODFD(S2, A1) = 0.595 S2 + 0.075 S1 − 0.1 A1   (12)
d1^ODFD(S1) = 0.4165 S1.

Compared to the OD strategy (7), the coefficients for time points 1 and 2 are shrunk towards 0. The ODFD dose at j = 1 will never be numerically larger than the OD equivalent. The same is usually true at time point j = 2, though it is not mathematically guaranteed since, for example, an OD dose of zero may be recommended. At j = 3 the ODFD and OD strategies are the same. Generally we expect the ODFD strategy to be more conservative, since in principle there is no second chance to bring an extreme dose back toward the norm. Note that the parameterization of the ODFD doses does not include an intercept term in this particular example, because the OD doses do not contain intercept terms either.


Table 1 Means and standard deviations (SD) of 1000 mean responses from samples of size n = 1000 when following different dosing strategies

Dosing strategy Mean Y SD

Random (setup) −23.38 0.99

0-dose (a0 = 0) −9.68 0.44

Fixed-dose dynamic 0.25 0.07

Optimal dynamic 1.00 0.06

4.3 Performance of the ODFD Strategy When No Visits are Missed

To investigate the performance of the ODFD strategy (12) for our simple scenario, we generated data according to the regret model (5), with the doses assigned in four different ways: (1) independent standard Normal doses, (2) constant zero dose throughout treatment, a1 = a2 = a3 = a0 = 0, (3) the ODFD strategy (12), and (4) the optimal dynamic strategy, OD. From (5), the mean outcome corresponding to the OD strategy will equal β1 = 1 since for this strategy all regrets (contrasts) equal 0 and the mean of the residual terms is 0.

We generated 1000 data sets, each of sample size n = 1000 patients. For each simulated data set we determined the mean response Y. The means and standard deviations of the 1000 sample means for each dose allocation strategy are given in Table 1.

The results show that, for our example, the baseline level of treatment (a0 = 0) performs considerably better than assigning doses at random. As expected, the mean outcome when following the OD strategy is at the target of β1 = 1. The ODFD strategy gives a lower mean outcome of 0.25, which, relative to the other results, is considerably better than the first two strategies and not much worse than OD. In this particular setup where we update the doses on a regular basis, we therefore do not lose much with respect to optimization of the outcome when following the ODFD strategy instead of the OD strategy.

4.4 Estimation of the Optimal Fixed Dose

Before introducing missingness we demonstrate that we can estimate the ODFD strategy adequately under a complete visit structure. We generate data according to the regret model (5)–(6) but fit the reparametrized model (8), with contrasts given by (10)–(11). To completely specify the model (8), models for the nuisance functions {εj^dFD} need to be formulated, whereas E(Y | do(d^FD)) is an intercept term. The nuisance functions can be derived from formula (5) by adding and subtracting the expressions for the fixed-dose contrasts νj found in Sect. 4.2, rearranging and keeping track of the terms. It can then be shown that the sum of the nuisance functions is a linear combination of the following 10 terms:

(S1 − ES1), (S2 − ES2) · (1, S1, A1), (S3 − ES3) · (1, S2, A2),
(S1² − ES1²), (S2² − ES2²), (S3² − ES3²),   (13)

where ESj^m is a short form for the expected conditional mean of Sj^m given history, m = 1, 2. The required expectations follow from the autoregressive state model (with estimated parameters). This leaves the contrasts {νj}, which from (10)–(11) with the intercept terms φj1 = 0 also sum to a linear combination of 10 terms:

A1 · (A1, S1), (A1 − A2) · (A1 + A2, S2, S1, A1), (A2 − A3) · (A2 + A3, S3, S2, A2).

Including an intercept term in the SNMM (8), E(Y | S3, A3) is thus a linear combination of 21 parameters. To estimate the parameters we use the algorithm described in [5]. First the state model is estimated using the true autoregressive models described in Sect. 4.1 to allow for calculation of residuals of the powered states, Sj^m − ESj^m, m = 1, 2. Next, least squares estimates of the parameters are found by specifying a linear model for the observed responses involving the residuals (10 terms) and the terms needed for the contrasts (10 terms). From this procedure, considering the 10 terms needed for the parameterization of the contrasts, we do not directly get the estimates for the ODFD-strategy parameters (12) but have to do a simple calculation. For example, for time point 2 we estimate four parameters, say δ1, . . . , δ4, corresponding to the terms (A1 − A2)(A1 + A2), (A1 − A2)S2, (A1 − A2)S1 and (A1 − A2)A1. From formula (11) we find δ1 = φ20 and δl = 2φ20φ2l, l = 2, 3, 4 (φ21 was set to 0). Thus the latter three parameters have to be divided by 2φ20 = 2δ1 to isolate the parameters (φ22, φ23, φ24) corresponding to the ODFD strategy (12).
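The back-calculation can be sketched as follows; the delta values in the example are constructed for illustration from the true φ values in (12), not estimated from data.

```python
# Sketch: recover the time-point-2 ODFD coefficients from least-squares
# estimates delta_1..delta_4, using delta_1 = phi20 and
# delta_l = 2 * phi20 * phi2l, l = 2, 3, 4 (phi21 fixed at 0).

def phi_from_delta(delta):
    d1, d2, d3, d4 = delta
    # divide by 2 * phi20 = 2 * delta_1 to isolate (phi22, phi23, phi24)
    return tuple(d / (2.0 * d1) for d in (d2, d3, d4))

# Illustration with phi20 = 1 and (phi22, phi23, phi24) = (0.595, 0.075, -0.1):
delta = (1.0, 2 * 0.595, 2 * 0.075, 2 * -0.1)
phi22, phi23, phi24 = phi_from_delta(delta)
```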

In practice, the nuisance functions will be unknown, and we therefore also experimented with other models for these components to see how sensitive the results are to misspecification. In each case we assumed the sum of the nuisance functions could be written as a linear combination of terms. We considered four possibilities.

A. The true combination (13).
B. First-order state residuals only:

(S1 − ES1), (S2 − ES2), (S3 − ES3).

C. The first 7 components of the true combination but no second-order states:

(S1 − ES1), (S2 − ES2) · (1, S1,A1), (S3 − ES3) · (1, S2,A2).

D. As C, with additional interactions:

(S1 − ES1), (S2 − ES2) · (1, S1, A1, S1², S1A1), (S3 − ES3) · (1, S2, A2, S2², S2A2).

Results, for each model, of 1000 replications of samples of size 1000 are summarized in Table 2, where we concentrate on the parameters determining the optimal decisions (12). The same seed was used in the data generation for all four nuisance models, so that the four models are fitted on exactly the same 1000 data sets.

Specifying the true models for the nuisance functions, the parameters are estimated without bias and with small standard deviations. The misspecified models B–D all lead to biased estimates of the coefficients, and estimates often have large standard


Table 2 Means and standard deviations of ODFD parameter estimates with true (A) and three misspecified nuisance function models (B–D, see text for details). Results based on 1000 simulations of sample size n = 1000

Parameter True value A B C D

φ12 0.4165 0.4064 0.4015 0.4070 0.4069

SD(φ12) 0.0121 0.0322 0.0190 0.0190

φ22 0.5950 0.5937 −0.0213 0.5966 0.5969

SD(φ22) 0.0096 0.0393 0.0399 0.0394

φ23 0.0750 0.0762 0.4886 0.0763 0.0763

SD(φ23) 0.0036 0.0760 0.0364 0.0356

φ24 −0.1000 −0.1014 −0.5624 −0.1012 −0.1012

SD(φ24) 0.0043 0.0862 0.0288 0.0284

φ32 1.0000 0.9989 −1.2096 1.0507 1.0520

SD(φ32) 0.0349 17.3938 0.3394 0.3378

φ33 0.7500 0.7508 0.8007 0.7970 0.7964

SD(φ33) 0.0313 15.8636 0.2992 0.2966

φ34 −1.0000 −1.0051 −5.8001 −1.0672 −1.0673

SD(φ34) 0.0407 83.3965 0.3636 0.3603

deviations, particularly at the third time point. The simplest model B leads to particularly unreliable estimates. A few of the estimated parameter values were extremely large for the third time-point parameters, and some of the interquartile ranges (IQR) did not even contain the true parameter values (IQR(φ32) = (−0.27, −0.84); IQR(φ33) = (0.27, 0.78), IQR(φ34) = (−3.60, −1.78)). The more complex models C and D, on the other hand, have fairly small bias at the third time point, though more variability in estimates than the true model A.

Further simulations, with different sample sizes and alternative misspecifications, lead to the same broad conclusions. A correctly specified model for the nuisance functions leads to estimators with good properties, but overly simple forms lead to unreliable estimation. Provided estimation does not become infeasible, there is little cost to erring on the side of overfitting the nuisance functions. We note that the applications of regret regression described by Almirall et al. [1] and Henderson et al. [5] used only first-order state residuals. Our simulation study demonstrates that this may not be sufficient, and that unless some effort is put into the formulation of the nuisance functions we may obtain severe bias in our estimated optimal doses.

5 Missing Visits

When patients are allowed to skip some of the scheduled visits, we have missing data in the past and we will have missing data in the future. Since doses are left unchanged when patients miss visits, only states may be missing.

In Sect. 5.1 we first demonstrate that our ODFD strategy may solve the problem of missing data in the future, namely that future patients may miss visits, as the strategy


performs better than the OD strategy when missed visits are allowed in the simulation setup considered above. When assigning OD or ODFD doses to patients, the previously measured state is needed according to (7) and (12), but when patients are allowed to miss visits this information may be missing. We consider different imputation methods for handling the missing previous states and demonstrate by example how we can explicitly formulate the ODFD strategy in terms of whether the previous state was measured or not.

In Sect. 5.2 we address the problem of having missing data in the past, namely that the data available for estimation of the parameters contain missing values. We demonstrate that an MCMC method can be used to handle the estimation.

We assume that the visit process is Missing Completely At Random (MCAR) [7]. Adaptation to MAR would be straightforward. We assume that the group of patients to which the estimated strategy will be applied in the future is intended to follow the same visit structure as the patients generating the data used for estimation. Both groups are allowed to skip some of the visits according to an MCAR or MAR process.

5.1 Performance of the Optimal Dynamic Fixed-Dose Strategy for Missed Visits

To investigate the issue of missingness in the future, we compare the performance of the OD and ODFD strategies for the scenario of Sect. 4, but now with the complication that some or all patients may miss some of the scheduled visits. We consider five missing data patterns:

1. All patients miss visits 2 and 3.
2. All patients miss visit 3 only.
3. All patients miss visit 2 only.
4. Patients miss measurements at time points 2 and 3 with probability 0.33 on each occasion.
5. As 4, but with probability 0.67 of missingness.

The state variable S1 at the first time point is thus always observed. For the first two missingness patterns there is no missingness in the past at decision times, and the whole history can be used to inform actions. For the other patterns it is possible that S2 is missing when a decision is needed at time 3. In these cases we considered four imputation strategies: Last Observation Carried Forward, LOCF (i.e. S1); interpolation between the first and third states (i.e. the average of S1 and S3); the predicted value of the missing previous state given previous history (S2 | (S1, A1)); or the predicted value given all the information available at the time point of dose assignment, namely (S2 | ((S1, S3), (A1, A1))), found from the multivariate Normal state distribution. To be explicit, let S3* denote the observed state history for a patient, being S3* = (S1, S3) if visit 2 is missed and S3 otherwise. We can then formulate the ODFD dosing strategy at the third time point as

d^ODFD((S3*, A2); φ3) =
    φ31 + φ32 S3 + φ33 ES2 + φ34 A2   if S2 is missing
    φ31 + φ32 S3 + φ33 S2 + φ34 A2    if S2 is observed

for a prediction ES2 of S2. For the strategy based on replacing the unobserved S2 by the predicted value found from E(S2 | (S1, S3), (A1, A1)) we find ES2 =


Table 3 Mean outcome and standard deviation (SD) of the mean obtained under the OD strategy and the ODFD strategy assuming various missing patterns (1.–5., see text for details). For 3.–5., missing previous states S2 for the third time-point dose assignments were replaced by either true (unobserved) values, LOCF, interpolation or predicted values. The calculations were based on n = 1000 patients in 1000 simulated data sets

Missing pattern    Missing data strategy    Optimal dynamic: Mean Y, SD    Optimal dynamic fixed dose: Mean Y, SD

1. Visit 1 only −15.56 0.74 −4.43 0.21

2. Visit 1 and 2 only −2.61 0.17 −0.80 0.08

3. Visit 1 and 3 only True S2 −1.82 0.14 −0.51 0.08

Visit 1 and 3 only LOCF −3.07 0.19 −1.22 0.10

Visit 1 and 3 only Interpolation −2.13 0.15 −0.82 0.09

Visit 1 and 3 only S2|S1,A1 −3.76 0.26 −1.07 0.10

Visit 1 and 3 only S2|(S1, S3), (A1,A1) −2.13 0.15 −0.82 0.09

4. MCAR visit 2–3 (33 %) True S2 −2.22 0.32 −0.66 0.11

MCAR visit 2–3 (33 %) LOCF −2.50 0.33 −0.82 0.11

MCAR visit 2–3 (33 %) Interpolation −2.29 0.32 −0.73 0.11

MCAR visit 2–3 (33 %) S2|S1,A1 −2.35 0.32 −0.78 0.12

MCAR visit 2–3 (33 %) S2|(S1, S3), (A1,A1) −2.28 0.32 −0.73 0.11

5. MCAR visit 2–3 (67 %) True S2 −7.88 0.54 −2.26 0.16

MCAR visit 2–3 (67 %) LOCF −8.15 0.54 −2.41 0.16

MCAR visit 2–3 (67 %) Interpolation −7.95 0.54 −2.33 0.16

MCAR visit 2–3 (67 %) S2|S1,A1 −8.00 0.54 −2.38 0.16

MCAR visit 2–3 (67 %) S2|(S1, S3), (A1,A1) −7.95 0.54 −2.33 0.16

0.497(S1 + S3) − 0.055 A1. Plugging in the parameters from our simulation setup (12) in combination with this predicted value ES2 of S2, we obtain the ODFD dose

d^ODFD((S3*, A2); φ3) =
    1.497 S3 + 0.373 S1 − 1.041 A2   if S2 is missing
    S3 + 0.75 S2 − A2                if S2 is observed

since A1 = A2. Thus when visit 2 is missed, more weight is given to the current measurement S3 and the current dose A1 = A2, and the first state S1 has an effect too, though with a reduced weighting.
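The rule just derived, with its fallback predictors, can be sketched as follows. The coefficients are those of (12) and the conditional-mean predictor given above; the `method` names and the `None` convention for a missed state are ours.

```python
# Sketch: ODFD dose at visit 3 when the second state may be missing (None).
# Coefficients are those of (12); predict_s2 implements the three feasible
# predictors discussed in the text (method names are illustrative).

def predict_s2(s1, s3, a1, method="conditional"):
    if method == "locf":
        return s1                   # last observation carried forward
    if method == "interpolation":
        return 0.5 * (s1 + s3)      # average of first and third states
    # conditional mean given (S1, S3, A1) from the Normal state model
    return 0.497 * (s1 + s3) - 0.055 * a1

def odfd_dose_3(s3, s2, s1, a2, method="conditional"):
    if s2 is None:                               # visit 2 was missed
        s2 = predict_s2(s1, s3, a2, method)      # A1 = A2 under the fixed dose
    return s3 + 0.75 * s2 - a2
```

With `s2` observed the rule reduces to d3^ODFD of (12); with `s2` missing it plugs the chosen prediction into the same formula.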

As a reference, we also consider performance if the true state S2 were known in the simulations (although the visit is still considered as missed and no change in treatment occurs). Based on n = 1000 patients, 1000 data sets were generated. For each data set the mean outcome was determined. The means and standard deviations of the mean outcomes are given in Table 3.

For all missing data patterns, the ODFD strategy outperforms the OD strategy. The more the missingness, the larger the difference between the two strategies. For the scenario in which only the first measurement was taken, and for which the patient had to stay on the assigned dose for the following two time points, the largest


difference is seen. If the patients only missed the second or third visit, a smaller difference in outcome is seen since the patients were allocated a non-optimal dose for a shorter part of the treatment phase. For the other two scenarios with visits MCAR at time points 2 and 3, the difference between the performance of the two strategies depends on the frequency of missed visits (33 % vs. 67 %). The four feasible methods of imputation of a missing state led to quite similar results. Not surprisingly, replacing the missing state with the predicted value given all available information gives the best performance. The much simpler option of simply interpolating between observed states gives almost the same performance. The four feasible strategies all performed reasonably well when compared to the gold standard of knowing the true value of the state. In summary, the ODFD strategy shows good performance, and the choice of missing data handling method does not seem to have a large influence in this setup.

5.2 Estimation Based on Incomplete Data

Missing values in the data set used for estimation complicate the regret-regression estimation method. The state models may easily be estimated using a likelihood approach, but the calculation of the conditional mean of Y in the SNMM (1) used for ordinary least squares is challenging. In this section we briefly consider a Markov chain Monte Carlo (MCMC) procedure [13, 24] in which we treat the missing states as targets for inference alongside the unknown parameters.

Once more we consider the scenario of Sect. 4. For the implementation of this method, the densities of the distribution of each state conditional on previous history are needed, based on the autoregressive model of Sect. 4.1. Using a Metropolis–Hastings step, missing states were iteratively updated, simulating values from the conditional state distributions. For each update, acceptance probabilities were calculated based on the conditional distribution of the states, as well as further assuming a Gaussian distribution for Y with conditional mean given by the SNMM (8) with terms described in Sects. 4.2–4.3. Flat priors were used for the parameters. The MCMC algorithm was used to sample iteratively from the posterior distribution of the parameters.
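A single Metropolis–Hastings update of a missing state can be sketched as below. The Gaussian transition and response densities are placeholders with made-up coefficients, not the fitted models of Sect. 4.1; the point is the structure of the update, in which the proposal drawn from S2 | S1 cancels against the matching factor of the target.

```python
import math
import random

def norm_logpdf(x, mean, sd):
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - (x - mean) ** 2 / (2.0 * sd * sd)

def mh_update_s2(s2, s1, s3, y, mu_y, rng):
    """One Metropolis-Hastings update of a missing state S2.

    mu_y(s2) is the SNMM conditional mean of Y as a function of the
    imputed state; the state transitions are placeholder AR-type Gaussians.
    """
    prop = rng.gauss(0.5 * s1, 1.0)   # propose from (placeholder) S2 | S1
    # the proposal density cancels the S2 | S1 factor of the full
    # conditional, leaving the S3 | S2 and Y | history factors in the ratio
    log_ratio = (norm_logpdf(s3, 0.5 * prop, 1.0) + norm_logpdf(y, mu_y(prop), 1.0)
                 - norm_logpdf(s3, 0.5 * s2, 1.0) - norm_logpdf(y, mu_y(s2), 1.0))
    return prop if math.log(rng.random()) < log_ratio else s2

# Short illustrative chain imputing S2 with all observed quantities at 0:
rng = random.Random(1)
s2, draws = 0.0, []
for _ in range(5000):
    s2 = mh_update_s2(s2, s1=0.0, s3=0.0, y=0.0, mu_y=lambda s: s, rng=rng)
    draws.append(s2)
```

In the full algorithm this update would be alternated with draws of the model parameters given the completed data.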

We generated 100 data sets of size n = 1000 according to the regret model (5), and MCAR missingness was introduced with probabilities 0.33, 0.50 and 0.67 at time points 2 and 3. We considered the ODFD model (8) as described in Sect. 4.4. For each data set we used a burn-in of 100,000 iterations and generated a further 10,000 samples from the posterior distribution of the parameters, irrespective of whether or not convergence had been achieved. Acceptance rates were approximately 60 %.

Results based on the chains deemed to have converged according to the Heidelberger–Welch diagnostic [26] are summarized in Table 4, which gives the mean and standard deviation of the posterior MCMC means of the parameters. For 33 % missingness the parameters are estimated unbiasedly, with good precision for the first two time points and acceptable precision for the third time-point parameters. Compared to complete-case estimation (model A of Table 2), the standard deviations are larger, at least doubled for all parameters except the first time-point parameter, for which there are no missing values. With 50 % of the measurements missing, there is a bias for the third time-point parameter corresponding to the current state S3 only.


Table 4 Means and standard deviations (SD) of the posterior means of the parameters of the ODFD strategy with 33 %, 50 % or 67 % missingness at time points 2 and 3. Only the data sets for which convergence was achieved according to the Heidelberger–Welch criteria were included

True values 33 % missing 50 % missing 67 % missing

% convergence 87 % 89 % 80 %

Parameter Mean SD Mean SD Mean SD

φ12 0.4165 0.4059 0.0031 0.4051 0.0065 0.4057 0.0128

φ22 0.5950 0.5941 0.0090 0.5973 0.0145 0.6086 0.0335

φ23 0.0750 0.0752 0.0044 0.0729 0.0079 0.0694 0.0220

φ24 −0.1000 −0.1027 0.0054 −0.1028 0.0119 −0.1030 0.0243

φ32 1.0000 1.0233 0.0842 1.0902 0.1674 1.3174 0.4348

φ33 0.7500 0.7291 0.0979 0.7278 0.1582 0.7576 0.4253

φ34 −1.0000 −0.9876 0.1118 −1.0090 0.1989 −1.0948 0.5837

The standard deviations are almost doubled for all parameters. For 67 % missingness the bias of the parameter corresponding to S3 is considerably larger, and there is also a bias in the parameter corresponding to the current dose A2. The standard deviations for all third time-point parameters are large.

6 Application

Rosthøj et al. [21] and Henderson et al. [5] described regret-based analyses of data on 303 patients undertaking anticoagulation therapy. All patients had chronic conditions and were prescribed warfarin treatment in order to attempt to control blood clotting time, as determined by the International Normalized Ratio, INR (Hirsch et al. [4]). Actions A were changes in prescribed dose of drug, and states S were standardized measures of INR, defined to be zero if the patient is within range, and otherwise the number of standard deviations from the range boundary [21]. The response Y was the proportion of the follow-up period during which the INR was within target range. The aim is to determine optimal treatment rules as functions of dynamic patient histories. The data include 14 state and action pairs (Sj, Aj) for each patient. Rosthøj et al. [21] and Henderson et al. [5] treated the first four measurements as burn-in and did not include them as decision times in their analyses. Timing of the measurements was not taken into account.

In this work we will concentrate on the first 20 weeks, which covers the burn-in period. Exact measurement times vary between patients, which means it is not possible to reconstruct from the data to hand whether a scheduled visit was missed. However, intervals between measurements are often around 6 weeks, and so for the purposes of this analysis we define three measurement intervals: 4–8 weeks, 10–14 weeks and 16–20 weeks. There were 291 patients with observations within at least one of these intervals. The observation pattern is summarized in Table 5. The response Y is the proportion of time INR was within range over the first 20 weeks of follow-up. It is obtained as a linear interpolation between the actual INR values and actual visit times. Since the actual visit times vary between patients, the response is


Table 5 Observation pattern for warfarin data

Interval 1   Interval 2   Interval 3   Count
✓            ✓            ✓            122
✓            ✓                         62
✓                         ✓            16
✓                                      58
             ✓            ✓            9
             ✓                         4
                          ✓            20

essentially continuous except for a small point mass at the boundaries for people who were never in range or always in range. The interpolation passes through any missed visits. This way of defining the response for warfarin patients is standard in the literature (e.g. [20]).

We used regret regression to fit the same regret functions as used in Sect. 4, namely

Y = β0 + Σ_{j=1}^{3} βj Zj − Σ_{j=1}^{3} μj(Aj | Sj, Aj−1; ψ)

where

μ1(a | S1; ψ) = ψ1(a − ψ2 S1)²

and

μj(a | Sj, Aj−1; ψ) = ψ1(a − ψ2 Sj − ψ3 Sj−1 − ψ4 Aj−1)²,   j = 2, 3.

Given that the response Y, defined as a proportion, is bounded, this model should be considered an approximation. We have previously demonstrated [5] that this model provides a good fit to the anticoagulation data. To generate residuals {Zj} we adapted the state model of Henderson et al. [5] to reflect the reduced number of visits. State Sj was assumed to have a mixed distribution, with a logistic binary component for in range (Sj = 0) or not, and a Gaussian distribution for the amount out of range if appropriate. In both cases we included linear predictors based on the previous state and action, if any, and their interaction, and we allowed the parameters to vary with visit number.

We concentrate on the response parameters β and ψ. We used the MCMC procedure of Sect. 5.2 for estimation, with flat priors, a burn-in of 10,000 iterations and a further 50,000 iterations for estimation. Posterior means are given in Table 6, together with standard deviations obtained from every 500th sample to remove autocorrelation. For reference, the lag-1 correlations in the chains were around 0.5, whereas the lag-500 correlations were close to zero.

The negative coefficients for Z1, Z2 and Z3 shown in Table 6 suggest an asymmetric effect of INR. High INR, and associated Z, seems to lead to a reduction in percentage time in range, whereas low INR has the opposite effect. This suggests that it is more difficult to control patients whose INR leaves the target range through


Table 6 Parameter estimates for warfarin data

Parameter Estimate SE

β0 53.968 0.318

β1 −8.186 1.143

β2 −13.355 0.823

β3 −10.790 2.345

ψ1 1.723 0.197

ψ2 −0.991 0.047

ψ3 −0.559 0.102

ψ4 −0.657 0.083

the upper boundary, or whose INR is high from the outset. It is perhaps worth noting that excursions of INR from within range tend to be more extreme if they are positive than if they are negative, since there is a lower limit of zero, whereas there is no upper limit to excursions.

Turning to regrets, it seems that current and previous states and previous actions are all important, since ψ2, ψ3 and ψ4 are all significantly different from zero. The most important term in the regret model is the current state Sj, and the estimated decision rule prescribes negative actions when the current state is high and positive actions when the current state is low. These are clinically sensible: a decrease in dose should be expected when blood clotting time is high, an increase in dose when it is low. The negative value of ψ3 shows how the previous state moderates the decision rule. If two consecutive states have the same sign then a more aggressive change in dose is indicated. If, however, consecutive states have opposite signs then more caution is needed. From ψ4 we see that the decision rule indicates that when states are in range there should be a tendency to reverse previous dose changes. Table 7 illustrates this by showing the estimated optimal dynamic decision, and the regrets for non-optimal decisions, for a range of values of current and previous states and previous actions. By assumption, the values apply to either visit two or visit three.
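The decisions and regrets of Table 7 follow directly from the fitted quadratic regret; a sketch using the rounded posterior means of Table 6 (Table 7 itself was computed from unrounded estimates, so values agree only up to rounding):

```python
# Optimal action and regret for the fitted warfarin regret model
# mu_j(a) = psi1 * (a - psi2*Sj - psi3*Sj_1 - psi4*Aj_1)^2, j = 2, 3,
# using the rounded posterior means of Table 6.

PSI1, PSI2, PSI3, PSI4 = 1.723, -0.991, -0.559, -0.657

def optimal_action(sj, sj_1, aj_1):
    # the quadratic regret is zero exactly at its centre
    return PSI2 * sj + PSI3 * sj_1 + PSI4 * aj_1

def regret(a, sj, sj_1, aj_1):
    return PSI1 * (a - optimal_action(sj, sj_1, aj_1)) ** 2

# History (Sj-1, Aj-1, Sj) = (0, 0, -1): optimal action about 1.0 and
# regret of leaving the dose unchanged about 1.7, as in Table 7.
```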

We now consider the optimal dose under the ODFD policy. We will consider visit two to illustrate. The corresponding ODFD contrast was defined (Sect. 4.2) as

ν2(a2 | S2, A1) = E(Y | (S2, A1), do(a2, a2)) − E(Y | (S2, A1), do(A1, A1)).

However, for the warfarin data we have defined the action to be a change in dose, and a missed visit three implies no further change, so A3 = 0 rather than A3 = A2. Consequently the operative fixed-dose contrast is

ν2(a2 | S2, A1) = E(Y | (S2, A1), do(a2, 0)) − E(Y | (S2, A1), do(0, 0)).

Following Sect. 4.2 this is

ν2(a2 | S2, A1) = −μ3^(−1)(a2 | S2, A1) − μ2(a2 | S2, A1)
                + μ2(0 | S2, A1) + μ3^(−1)(0 | S2, A1),


Table 7 Warfarin data: estimated optimal decisions and example regrets for various previous histories, j = 2, 3

Sj−1 Aj−1 Sj AOD μj (−2) μj (−1) μj (0) μj (1) μj (2)

−1 −1 −1 2.3 31.8 18.7 9.1 2.9 0.2

−1 0 −1 1.5 21.6 11.1 4.1 0.5 0.4

−1 1 −1 0.8 13.3 5.5 1.1 0.1 2.6

0 −1 −1 1.7 24.2 13.0 5.3 1.0 0.1

0 0 −1 1.0 15.4 6.8 1.7 0.0 1.8

0 1 −1 0.2 8.6 2.6 0.1 1.0 5.4

1 −1 −1 1.2 17.6 8.3 2.5 0.1 1.1

1 0 −1 0.4 10.3 3.6 0.3 0.5 4.2

1 1 −1 −0.3 4.9 0.8 0.2 3.0 9.2

−1 −1 0 1.3 18.8 9.2 2.9 0.2 0.8

−1 0 0 0.5 11.2 4.1 0.5 0.4 3.6

−1 1 0 −0.2 5.5 1.1 0.1 2.5 8.4

0 −1 0 0.8 13.1 5.3 1.0 0.1 2.7

0 0 0 0.0 6.9 1.7 0.0 1.7 6.9

0 1 0 −0.8 2.7 0.1 1.0 5.3 13.1

1 −1 0 0.2 8.4 2.5 0.1 1.1 5.5

1 0 0 −0.5 3.6 0.4 0.5 4.1 11.2

1 1 0 −1.3 0.8 0.2 2.9 9.2 18.8

−1 −1 1 0.3 9.2 3.0 0.2 0.8 4.9

−1 0 1 −0.4 4.2 0.5 0.3 3.6 10.3

−1 1 1 −1.2 1.1 0.1 2.5 8.3 17.6

0 −1 1 −0.2 5.4 1.0 0.1 2.6 8.6

0 0 1 −1.0 1.8 0.0 1.7 6.8 15.4

0 1 1 −1.7 0.1 1.0 5.3 13.0 24.2

1 −1 1 −0.8 2.6 0.1 1.1 5.5 13.3

1 0 1 −1.5 0.4 0.5 4.1 11.1 21.6

1 1 1 −2.3 0.2 2.9 9.1 18.7 31.8

where

μ3^(−1)(a2 | S2, A1) := E(Y | (S2, A1), do(a2, d3^OD)) − E(Y | (S2, A1), do(a2, 0))
= E_{S3 | (S2, A1), do(a2)} { E(Y | (S3, A1, do(a2)), do(d3^OD))
  − E(Y | (S3, A1, do(a2)), do(0)) }
= E_{S3 | (S2, A1), do(a2)} { μ3(0 | (S3, A1, do(a2))) }
= E_{S3 | (S2, A1), do(a2)} { ψ1(0 − (ψ2 S3 + ψ3 S2 + ψ4 a2))² }.

To find the ODFD strategy, we need to maximise ν2 over a2. The assumed mixed distribution for S3 means the expectation in the last expression is intractable and we


Fig. 1 Optimal decisions at visit two, under optimal dynamic fixed-dose (ODFD) and optimal dynamic (OD) regimes, as a function of state S2 and previous history

need to use numerical methods. Details are omitted, but we illustrate in Fig. 1. The figure shows the estimated ODFD decision at visit two, as a function of S2 and for (S1, A1) = (−2, −2), (0, 0) or (2, 2). For comparison we also include the corresponding OD decision rule.
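The numerical step can be sketched as Monte Carlo integration over S3 followed by a grid search over a2. The mixed state model below is a placeholder (in range with probability 0.5, otherwise Gaussian with made-up coefficients), not the fitted logistic-Gaussian model; the regret parameters are the posterior means of Table 6.

```python
import random

PSI1, PSI2, PSI3, PSI4 = 1.723, -0.991, -0.559, -0.657

def mu2(a, s2, s1, a1):
    return PSI1 * (a - PSI2 * s2 - PSI3 * s1 - PSI4 * a1) ** 2

def mu3_ahead(a2, s2, rng, n=500):
    # Monte Carlo estimate of E over S3 of the visit-3 regret of "no
    # further change" (A3 = 0), under a placeholder mixed state model
    total = 0.0
    for _ in range(n):
        s3 = 0.0 if rng.random() < 0.5 else rng.gauss(0.3 * s2, 1.0)
        total += PSI1 * (PSI2 * s3 + PSI3 * s2 + PSI4 * a2) ** 2
    return total / n

def nu2(a2, s2, s1, a1, rng):
    # fixed-dose contrast at visit 2 with A3 = 0 (see text)
    return (-mu3_ahead(a2, s2, rng) - mu2(a2, s2, s1, a1)
            + mu2(0.0, s2, s1, a1) + mu3_ahead(0.0, s2, rng))

def odfd_dose_2(s2, s1, a1, seed=0):
    # grid search over a2 in [-3, 3]; common random numbers across the grid
    grid = [g / 20.0 for g in range(-60, 61)]
    return max(grid, key=lambda a2: nu2(a2, s2, s1, a1, random.Random(seed)))
```

Using common random numbers across the grid keeps the Monte Carlo noise from changing the ranking of nearby doses.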

We see that at (S1, A1) = (0, 0), meaning the previous INR value was in range and the previous decision was to leave the dose unchanged, the ODFD and OD rules are very similar, with an increase in dose indicated when S2 is low and a decrease when S2 is high. The same pattern is evident when (S1, A1) = (2, 2), which means that previously there was a high INR but a counter-indicated decision was taken to increase the dose. This time, however, the rules prescribe a more severe decrease (or smaller increase) in dose, to compensate for the presumed too-aggressive A1. For this case the ODFD rule is less influenced by the unusual past values than is the OD regime.

The most interesting feature of Fig. 1 is the pattern at (S1, A1) = (−2, −2). The general behavior essentially mirrors that at (S1, A1) = (2, 2), but the ODFD decision rule has a clear slope change just before S2 = 0, and effectively prescribes no dose change over the range of positive S2 given here. Our interpretation is the following. The history is (S1, A1) = (−2, −2). If S2 stays very low then there is a clear need to increase the dose, as expected. As S2 increases, the combined effect of S1 and A1 both being negative dominates a small positive S2, and the OD regime still prescribes a further increase in dose. The ODFD regime, on the other hand, is dominated by the pressing need of the current situation when S2 is negative, but as S2 increases to around target the future consequences become more important and the rule seeks to avoid locking in a high dose. The lack of symmetry between (S1, A1) = (−2, −2) and (S1, A1) = (2, 2) is because of a lack of symmetry in the state model, and the corresponding expected behavior of S3 and associated regrets.

We also experimented with direct estimation of the ODFD strategy, formulating a linear model for the ODFD dose involving the same terms as the OD strategy. In the data, 29 % of the measurements are missing. Combined with the number of


patients, n = 291, and the 10 parameters required for estimation of the contrasts (versus 4 parameters for the regrets), we were not able to make the MCMC regret regression converge.

7 Discussion

Methods for causal inference for time-dependent treatments have so far assumed that all patients are examined at exactly the same times during the treatment phase. With particular focus on the estimation of optimal dynamic treatment strategies maximizing the mean of a final outcome measured at the end of the treatment phase, we have investigated whether the methods can be generalized to a setting in which patients are allowed to miss some of the examinations.

We distinguish between two kinds of missingness. Missing data in the future correspond to future patients not being expected to show up for examinations regularly. We have demonstrated that the optimal dynamic (OD) treatment regime, as formulated by Murphy [9], will not perform optimally if patients do not show up for all future scheduled examinations. Instead we have suggested a strategy which we termed the optimal dynamic fixed-dose (ODFD) strategy. Our proposal is to assign the best possible fixed (unchangeable) dose to the patient at each examination; when the patient shows up for the next examination, he or she is assigned a new best possible fixed dose depending on changes in the state of the patient. This defines a dynamic treatment strategy. We have demonstrated how to define the strategies when previous information on the state of the patient is missing due to a missed visit. In our simulations the new ODFD strategy did not differ substantially from the optimal dynamic strategy, OD, when there were no missed visits, and it outperformed OD when there were missed visits.
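To make the ODFD idea concrete, the decision at a single visit can be sketched as a one-dimensional search for the dose that minimizes the total expected regret if it were held fixed at all remaining decision points. Everything below (the quadratic regret, the linear Gaussian state model, the parameter values and the function names) is an illustrative assumption for the sketch, not the paper's fitted model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: quadratic regret psi1*(a - optimal dose)^2 with optimal dose
# linear in the two most recent states, and a linear Gaussian state transition.
def regret(s_prev, s_curr, a):
    return 10.0 * (a - (0.5 * s_curr + 0.1 * s_prev)) ** 2

def next_state(s, a):
    return 0.8 * s + 0.2 * a + rng.normal(0.0, 0.5)

def odfd_dose(s_prev, s_curr, horizon, grid, n_mc=200):
    """Choose one dose to hold fixed over all remaining decision points,
    minimizing current plus expected future regret by Monte Carlo (ODFD idea)."""
    best, best_val = grid[0], np.inf
    for a in grid:
        total = regret(s_prev, s_curr, a)  # regret at the current decision
        for _ in range(n_mc):
            prev, cur, acc = s_prev, s_curr, 0.0
            for _ in range(horizon):
                prev, cur = cur, next_state(cur, a)
                acc += regret(prev, cur, a)  # regret at a future decision point
            total += acc / n_mc
        if total < best_val:
            best, best_val = a, total
    return best

# One decision with history (s_prev, s_curr) = (-2, -2) and two future points.
dose = odfd_dose(s_prev=-2.0, s_curr=-2.0, horizon=2, grid=np.linspace(-3.0, 3.0, 61))
```

At the next visit the updated state is observed and the fixed dose is re-optimized, which is what makes the overall ODFD strategy dynamic.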

When the data available for estimation of optimal dynamic treatment regimes do not contain regular measurements, because some of the patients have skipped some of the examinations, we have missing data in the past. We have demonstrated that regret regression, as suggested by Henderson et al. [5] and Almirall et al. [1], is possible using an MCMC procedure when the available data contain missing values. Missing data is thus a technical problem we can solve by MCMC, no matter which strategy we are interested in. In particular, we have considered the estimation of the parameters of an SNMM based on the optimal dynamic treatment strategy to find the parameters of the OD regime, as well as an SNMM based on the fixed-dose strategy to find the parameters of the ODFD regime. However, direct estimation of the ODFD strategy from the SNMM based on the fixed-dose strategy may be impossible if there is a large number of missed visits or the sample size is small, because the ODFD requires separate parameters for each time point. In this case the MCMC estimation may become infeasible. Since models for the OD strategy may be specified using the same parameters for each time point, estimation of a model for the OD strategy based on the regret-regression MCMC method may still be feasible. From this, the ODFD strategy can be derived in a similar manner to Sect. 4.2. As demonstrated in the Appendix, for a quadratic regret with a linear parameterization of the optimal dose and a linear state model, the formulas for the ODFD strategy can be derived directly. For other link functions for the regret, non-linear optimal doses or non-linear state models, the ODFD dose will have to be derived using numerical methods (see also the application in Sect. 6). In this case we cannot supply the physicians with a simple formula for calculating the appropriate dose; instead they will need a computer program to perform the calculation.
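As an illustration of what such a program might look like, consider a toy absolute-error regret, for which no closed-form dose exists but the total expected regret is convex in the dose, so a simple one-dimensional search suffices. The state model, parameter values and function names are assumptions for the sketch, not the paper's implementation:

```python
import math

# Toy non-quadratic regret: absolute deviation of the expected state from
# target 0, so the ODFD dose has no closed form (illustrative parameters).
def expected_future_regret(a, s, horizon=3):
    total, state = 0.0, s
    for _ in range(horizon):
        state = 0.8 * state + 0.2 * a  # expected state path under fixed dose a
        total += abs(state)            # |state - target| style regret
    return total

def golden_section_min(f, lo, hi, tol=1e-6):
    """Minimise a unimodal function on [lo, hi] by golden-section search."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b, d = d, c
            c = b - invphi * (b - a)
        else:
            a, c = c, d
            d = a + invphi * (b - a)
    return (a + b) / 2.0

# The calculation a physician's software would run at a visit with state s = -2:
dose = golden_section_min(lambda a: expected_future_regret(a, s=-2.0), -5.0, 5.0)
```

For non-convex or multivariate cases a grid search or a general-purpose optimizer would replace the golden-section step.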

Our proposals are necessarily based on two untestable assumptions. One is that visits are missed at random, in that the reason for missing a visit may depend only on the past, or more precisely on those parts of the past that we have records for; the other is that there are no unmeasured confounders that influence the choice of action. These assumptions have been widely discussed in the longitudinal and causal literatures, and of course we cannot fully protect against their failure in any given application. It will be useful in future work to investigate sensitivity to local departures from these assumptions.

van der Laan et al. [25] and Petersen et al. [12] suggested a method for estimation of a so-called statically optimal dynamic treatment strategy based on history-adjusted marginal structural models. The new strategy we have proposed here has some similarities with theirs. Instead of finding the optimal fixed dose to assign throughout the remaining treatment, they suggest at each time point fully specifying the future sequence of doses; again, the future sequence of doses is updated when the patient shows up for the next examination. The regime we consider is thus a special case of the regime suggested by van der Laan et al. [25], but their strategy cannot be formulated in terms of Robins' structural nested mean model. Furthermore, Robins et al. [18] demonstrate that the parameters obtained from a history-adjusted marginal structural model may be logically incompatible and therefore recommend the use of SNMMs, while encouraging a comparison of results obtained from both classes of models to gain understanding of the differences between the two approaches and of whether logically incompatible parameters occur in practice.

An important assumption underlying our setup is that we consider time as discrete. For some treatments time is truly discrete. Examples include the childhood leukemia data considered in Rosthøj et al. [22], for which the children have scheduled weekly examinations. The discrete-time approach allows us to consider the problem as a missing data problem rather than a continuous-time problem. It is possible that we can formulate our new strategy using continuous-time methods, taking as starting point the work of e.g. Lok [6]. This will be a topic for future work.

Another important assumption in the implementation of regret regression is correct specification of the conditional means of (powers of) the states. We considered the issue of model misspecification in the setting in which patients are assumed to show up for all examinations in Barrett et al. [2]; Almirall et al. [1] also addressed this issue. Basing regret regression on an MCMC procedure, we need the full distribution of the states as well as an assumption on the distribution of the final outcome. Thus our new procedure, handling estimation based on data with missing values, might be even more sensitive to misspecification of the state (and outcome) models. It will be interesting to compare the MCMC regret regression to g-estimation based on multiple imputation of the missing states, as this approach will require fewer assumptions on the state distribution. Another possibility is to use the inverse-probability weighting method suggested by Robins et al. [15] to handle missing longitudinal covariates.
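To indicate the kind of update the MCMC procedure performs for missing data in the past, the sketch below draws a single missing intermediate state from its full conditional under a linear Gaussian state model, using a random-walk Metropolis step. The model, parameter values and names are illustrative assumptions, not this paper's implementation:

```python
import math
import random

# Full conditional (up to a constant) of a missing S2 given its neighbours,
# under S_j = th1 + th2*S_{j-1} + th3*A_{j-1} + N(0, sig2): S2 enters both
# the density of S2 given (S1, A1) and the density of S3 given (S2, A2).
def log_cond_s2(s2, s1, s3, a1, a2, th=(0.0, 0.8, -0.2), sig2=0.25):
    t1, t2, t3 = th
    lp = -(s2 - (t1 + t2 * s1 + t3 * a1)) ** 2 / (2.0 * sig2)
    lp += -(s3 - (t1 + t2 * s2 + t3 * a2)) ** 2 / (2.0 * sig2)
    return lp

def metropolis_impute(s1, s3, a1, a2, n_iter=4000, step=0.5, seed=0):
    """Random-walk Metropolis draws of the missing state S2."""
    rng = random.Random(seed)
    s2, draws = 0.0, []
    for _ in range(n_iter):
        prop = s2 + rng.gauss(0.0, step)
        if math.log(rng.random()) < (log_cond_s2(prop, s1, s3, a1, a2)
                                     - log_cond_s2(s2, s1, s3, a1, a2)):
            s2 = prop
        draws.append(s2)
    return draws

draws = metropolis_impute(s1=1.0, s3=1.0, a1=0.0, a2=0.0)
```

A full procedure would cycle such updates over all missing states and the model parameters; for this Gaussian toy the full conditional is also available in closed form, which makes it a convenient test case.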

The approach considered in this paper needs to be generalized to a longitudinal outcome rather than a single outcome measured at the end of the study. For the anticoagulation treatment, as well as for the treatment of childhood leukemia, the goal is to stabilize a certain parameter (INR and white blood count, respectively) within a prespecified range throughout the whole treatment phase. Thus the outcomes of interest will be the INR or white blood count e.g. two or three weeks ahead. We intend to consider this setup for the leukemia data in the near future.

We have discussed the estimation and performance of dosing strategies assuming that the future patient cohort to which our new strategy is to be applied is similar to the patient cohort generating the data used for estimation. In particular, we assume that both groups of patients are intended to be examined at the same time points, but we allow the patients to skip some of the intended examinations. If the two groups of patients differ considerably with respect to the visit structure, the strategies estimated from the one population might not transfer to the other. Robins et al. [19] have some ideas on how to formulate optimal dynamic treatment strategies when parameters and estimation are based on data with one particular visit structure but the rules are to be applied in a treatment setting with another visit structure. It will be important to describe and investigate the differences between their approach and our suggested strategy.

Acknowledgements S.R. was supported by Public Health Services Grant 5 R01 CA 54706-11 from the National Cancer Institute. J.B. was supported by Medical Research Council Grant number G0902100. We thank Riema Ali, Department of Biostatistics, University of Copenhagen, for assistance with the simulation studies. We are grateful for the helpful comments of two anonymous referees.

Appendix

To simplify notation, we suppress in the following the dependence of the OD and the ODFD strategies on the state history and the parameters, writing $d^{OD}_j(A_{j-1})$ and $d^{ODFD}_j(A_{j-1})$ instead of $d^{OD}_j(S_j, A_{j-1}; \psi)$ and $d^{ODFD}_j(S_j, A_{j-1}; \psi)$, respectively.

At the third time point the OD and the ODFD strategies are equal, $d^{ODFD}_3 = d^{OD}_3$, such that the fixed-dose contrast at time point 3 can be written
\[
\begin{aligned}
\nu_3(a_3 \mid S_3, A_2) &= -\mu_3(a_3 \mid S_3, A_2) + \mu_3(A_2 \mid S_3, A_2) \\
&= -\psi_1\bigl(a_3 - d^{OD}_3(A_2)\bigr)^2 + \psi_1\bigl(A_2 - d^{OD}_3(A_2)\bigr)^2 \\
&= \psi_1\bigl(\bigl(A_2^2 - a_3^2\bigr) - 2(A_2 - a_3)\,d^{OD}_3(A_2)\bigr) \\
&= \psi_1(A_2 - a_3)\bigl((A_2 + a_3) - 2\,d^{OD}_3(A_2)\bigr).
\end{aligned}
\]

In the calculation of the expected future regrets at time points 1 and 2 we need the conditional moments of the states,
\[
\begin{aligned}
E(S_j \mid S_{j-1}, A_{j-1}) &= \theta_1 + \theta_2 S_{j-1} + \theta_3 A_{j-1}, \\
E\bigl(S_j^2 \mid S_{j-1}, A_{j-1}\bigr) &= \sigma^2_Z + (\theta_1 + \theta_2 S_{j-1} + \theta_3 A_{j-1})^2.
\end{aligned}
\]

At the second time point we need the expected future regret at time point 3, assuming that the same dose is assigned at time points 2 and 3. For brevity we write $E_{do(a)}$ for the conditional expectation $E_{S_3 \mid (S_2, A_1), do(a)}$. The difference of the expected regrets assigning the unchanged dose $A_1$ versus a new dose $a_2$ is
\[
\begin{aligned}
&-\mu^{-1}_3(a_2 \mid S_2, A_1) + \mu^{-1}_3(A_1 \mid S_2, A_1) \\
&\quad= -E_{do(a_2)}\,\psi_1\bigl(a_2 - d^{OD}_3(a_2)\bigr)^2 + E_{do(A_1)}\,\psi_1\bigl(A_1 - d^{OD}_3(A_1)\bigr)^2 \\
&\quad= \psi_1\bigl\{\bigl(A_1^2 - a_2^2\bigr) + \bigl(E_{do(A_1)} d^{OD}_3(A_1)^2 - E_{do(a_2)} d^{OD}_3(a_2)^2\bigr) \\
&\qquad\; - 2\bigl(A_1 E_{do(A_1)} d^{OD}_3(A_1) - a_2 E_{do(a_2)} d^{OD}_3(a_2)\bigr)\bigr\}.
\end{aligned}
\]

The second term on the right-hand side of the above equation can be written
\[
\begin{aligned}
&E_{do(A_1)} d^{OD}_3(A_1)^2 - E_{do(a_2)} d^{OD}_3(a_2)^2 \\
&\quad= E_{do(A_1)}(\psi_2 S_3 + \psi_3 S_2 + \psi_4 A_1)^2 - E_{do(a_2)}(\psi_2 S_3 + \psi_3 S_2 + \psi_4 a_2)^2 \\
&\quad= E_{do(A_1)}\bigl((\psi_2 S_3)^2 + \psi_4^2 A_1^2 + 2\psi_3\psi_4 S_2 A_1 + 2\psi_2 S_3(\psi_3 S_2 + \psi_4 A_1)\bigr) \\
&\qquad- E_{do(a_2)}\bigl((\psi_2 S_3)^2 + \psi_4^2 a_2^2 + 2\psi_3\psi_4 S_2 a_2 + 2\psi_2 S_3(\psi_3 S_2 + \psi_4 a_2)\bigr) \\
&\quad= \psi_4^2\bigl(A_1^2 - a_2^2\bigr) + 2\psi_3\psi_4 S_2(A_1 - a_2) + \psi_2^2\bigl(E_{do(A_1)} S_3^2 - E_{do(a_2)} S_3^2\bigr) \\
&\qquad+ 2\psi_2\psi_3 S_2\bigl(E_{do(A_1)} S_3 - E_{do(a_2)} S_3\bigr) + 2\psi_2\psi_4\bigl(A_1 E_{do(A_1)} S_3 - a_2 E_{do(a_2)} S_3\bigr) \\
&\quad= \psi_4^2\bigl(A_1^2 - a_2^2\bigr) + 2\psi_3\psi_4 S_2(A_1 - a_2) + \psi_2^2\theta_3^2\bigl(A_1^2 - a_2^2\bigr) \\
&\qquad+ 2\psi_2^2\theta_3(A_1 - a_2)(\theta_1 + \theta_2 S_2) + 2\psi_2\psi_3 S_2\theta_3(A_1 - a_2) \\
&\qquad+ 2\psi_2\psi_4\bigl((A_1 - a_2)(\theta_1 + \theta_2 S_2) + \bigl(A_1^2 - a_2^2\bigr)\theta_3\bigr) \\
&\quad= \bigl(\psi_4^2 + \psi_2^2\theta_3^2 + 2\psi_2\psi_4\theta_3\bigr)\bigl(A_1^2 - a_2^2\bigr) \\
&\qquad+ \bigl(2\psi_3\psi_4 S_2 + 2\psi_2^2\theta_3(\theta_1 + \theta_2 S_2) + 2\psi_2\psi_3\theta_3 S_2 + 2\psi_2\psi_4(\theta_1 + \theta_2 S_2)\bigr)(A_1 - a_2).
\end{aligned}
\]

Similarly the third term can be written
\[
\begin{aligned}
&-2\bigl(A_1 E_{do(A_1)} d^{OD}_3(A_1) - a_2 E_{do(a_2)} d^{OD}_3(a_2)\bigr) \\
&\quad= -2\bigl(A_1(\psi_2 E_{do(A_1)} S_3 + \psi_3 S_2 + \psi_4 A_1) - a_2(\psi_2 E_{do(a_2)} S_3 + \psi_3 S_2 + \psi_4 a_2)\bigr) \\
&\quad= -2\bigl(\psi_4\bigl(A_1^2 - a_2^2\bigr) + \psi_3(A_1 - a_2) S_2 + \psi_2\bigl(A_1 E_{do(A_1)} S_3 - a_2 E_{do(a_2)} S_3\bigr)\bigr) \\
&\quad= -2\bigl(\psi_4\bigl(A_1^2 - a_2^2\bigr) + \psi_3(A_1 - a_2) S_2 + \psi_2\bigl((A_1 - a_2)(\theta_1 + \theta_2 S_2) + \bigl(A_1^2 - a_2^2\bigr)\theta_3\bigr)\bigr) \\
&\quad= -2\bigl((\psi_4 + \psi_2\theta_3)\bigl(A_1^2 - a_2^2\bigr) + (A_1 - a_2)\bigl(\psi_2\theta_1 + (\psi_3 + \psi_2\theta_2) S_2\bigr)\bigr).
\end{aligned}
\]

The fixed-dose contrast at the second time point is
\[
\begin{aligned}
\nu_2(a_2 \mid S_2, A_1) &= -\mu^{-1}_3(a_2 \mid S_2, A_1) - \mu_2(a_2 \mid S_2, A_1) + \mu_2(A_1 \mid S_2, A_1) + \mu^{-1}_3(A_1 \mid S_2, A_1) \\
&= -\mu^{-1}_3(a_2 \mid S_2, A_1) - \mu_3(a_2 \mid S_2, A_1) + \mu_3(A_1 \mid S_2, A_1) + \mu^{-1}_3(A_1 \mid S_2, A_1) \\
&= -\mu^{-1}_3(a_2 \mid S_2, A_1) + \nu_3(a_2 \mid S_2, A_1) + \mu^{-1}_3(A_1 \mid S_2, A_1),
\end{aligned}
\]
since the regrets $\mu_2$ and $\mu_3$ share parameters and depend only on the current and previous state and the previous dose. Now adding the above terms we obtain
\[
\nu_2(a_2 \mid S_2, A_1) = f_1(\psi, \theta)\bigl(A_1^2 - a_2^2\bigr) + f_2(S_2, A_1, \psi, \theta)(A_1 - a_2)
\]
with
\[
\begin{aligned}
f_1(\psi, \theta) &= \psi_1\bigl(2 + \bigl(\psi_4^2 + \psi_2^2\theta_3^2 + 2\psi_2\psi_4\theta_3\bigr) - 2(\psi_4 + \psi_2\theta_3)\bigr), \\
f_2(S_2, A_1, \psi, \theta) &= 2\psi_1\bigl(f_3(\psi, \theta) + f_4(\psi, \theta) S_2 + f_5(\psi, \theta) S_1 + f_6(\psi, \theta) A_1\bigr),
\end{aligned}
\]
such that the first term $f_1$ does not depend on the observed history $(S_2, A_1)$, whereas the second term $f_2$ is linear in the history. Here
\[
\begin{aligned}
f_3(\psi, \theta) &= \psi_2^2\theta_3\theta_1 + \psi_2\psi_4\theta_1 - \psi_2\theta_1, \\
f_4(\psi, \theta) &= -\psi_2 + \psi_3\psi_4 + \psi_2^2\theta_2\theta_3 + \psi_2\psi_3\theta_3 + \psi_2\psi_4\theta_2 - (\psi_3 + \psi_2\theta_2), \\
f_5(\psi, \theta) &= \psi_3, \\
f_6(\psi, \theta) &= \psi_4.
\end{aligned}
\]
Defining $\phi_{j0} = f_1(\psi, \theta)$, $\phi_{j1} = f_3(\psi, \theta)/\eta_1$, $\phi_{j2} = f_4(\psi, \theta)/\eta_1$, $\phi_{j3} = \psi_3/\eta_1$ and $\phi_{j4} = \psi_4/\eta_1$, we can write the fixed-dose contrast in the claimed form, namely
\[
\nu_2\bigl(a_2 \mid S_2, A_1; (\phi_{20}, \phi_2)\bigr) = \phi_{20}(A_1 - a_2)\bigl((A_1 + a_2) - 2(\phi_{21} + \phi_{22} S_2 + \phi_{23} S_1 + \phi_{24} A_1)\bigr),
\]
such that the optimal dynamic fixed-dose strategy at the second time point is linear in the same terms as the optimal dynamic strategy. The parameters are different, and there is an intercept term. For the parameters in the simulation study we find $\phi_{20} = 10$, $\phi_{21} = 0$, $\phi_{22} = 0.595$, $\phi_{23} = 0.075$ and $\phi_{24} = -0.1$.

Similar calculations for the first time point give
\[
\nu_1\bigl(a_1 \mid S_1; (\phi_{10}, \phi_1)\bigr) = \phi_{10}(a_0 - a_1)\bigl((a_0 + a_1) - 2(\phi_{11} + \phi_{12} S_1)\bigr)
\]
with $\phi_{10} = 32.2725$, $\phi_{11} = 0$ and $\phi_{12} = 0.4165$ for the chosen parameter values.
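The factorization of the second term above is pure algebra and can be spot-checked numerically. The short script below evaluates both sides at random parameter values (the names p2, ..., p4 and t1, ..., t3 mirror the $\psi$ and $\theta$ symbols, and sig2 mirrors $\sigma^2_Z$):

```python
import random

# Numeric spot-check of the appendix identity for the "second term":
#   E[d3(A1)^2 | do(A1)] - E[d3(a2)^2 | do(a2)]
# against its claimed factorization in (A1^2 - a2^2) and (A1 - a2), using
#   E[S3 | do(a)]   = t1 + t2*S2 + t3*a
#   E[S3^2 | do(a)] = sig2 + E[S3 | do(a)]^2.
random.seed(1)
for _ in range(100):
    p2, p3, p4, t1, t2, t3, S2, A1, a2, sig2 = (random.uniform(-2.0, 2.0)
                                                for _ in range(10))

    def m1(a):   # E[S3 | S2, do(a)]
        return t1 + t2 * S2 + t3 * a

    def m2(a):   # E[S3^2 | S2, do(a)]
        return sig2 + m1(a) ** 2

    def Ed3sq(a):  # E[(p2*S3 + p3*S2 + p4*a)^2 | do(a)]
        return (p2 ** 2 * m2(a) + (p3 * S2 + p4 * a) ** 2
                + 2.0 * p2 * m1(a) * (p3 * S2 + p4 * a))

    lhs = Ed3sq(A1) - Ed3sq(a2)
    rhs = ((p4 ** 2 + p2 ** 2 * t3 ** 2 + 2.0 * p2 * p4 * t3) * (A1 ** 2 - a2 ** 2)
           + (2.0 * p3 * p4 * S2 + 2.0 * p2 ** 2 * t3 * (t1 + t2 * S2)
              + 2.0 * p2 * p3 * t3 * S2 + 2.0 * p2 * p4 * (t1 + t2 * S2)) * (A1 - a2))
    assert abs(lhs - rhs) < 1e-9
```

The same style of check applies to the third term and to the final expressions for $f_1$ through $f_6$.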

References

1. Almirall D, Ten Have T, Murphy SA (2010) Structural nested mean models for assessing time-varying effect moderation. Biometrics 66:131–139
2. Barrett JK, Henderson R, Rosthøj S (2013) Doubly robust estimation of optimal dynamic treatment regimes. Stat Biosci. doi:10.1007/s12561-013-9097-6
3. Cole SR, Frangakis CE (2009) The consistency statement in causal inference: a definition or an assumption? Epidemiology 20:3–5
4. Hirsh J, Dalen JE, Anderson DR, Poller L, Bussey H, Ansell J, Deykin D (2001) Oral anticoagulants: mechanism of action, clinical effectiveness, and optimal therapeutic range. Chest 119:8S–21S
5. Henderson R, Ansell P, Alshibani D (2010) Regret-regression for optimal dynamic treatment regimes. Biometrics 66:1192–1201
6. Lok JJ (2008) Statistical modeling of causal effects in continuous time. Ann Stat 36:1464–1507
7. Little RJA, Rubin DB (2002) Statistical analysis with missing data, 2nd edn. Wiley, New York
8. Moodie EMM, Richardson TS, Stephens DA (2007) Demystifying optimal dynamic treatment regimes. Biometrics 63:447–455
9. Murphy SA (2003) Optimal dynamic treatment regimes. J R Stat Soc Ser B 65:331–355
10. Orellana L, Rotnitzky A, Robins JM (2010) Dynamic regime marginal structural mean models for estimation of optimal dynamic treatment regimes, part I: main content. Int J Biostat 2:1–47
11. Pearl J (2009) Causality: models, reasoning and inference, 2nd edn. Cambridge University Press, Cambridge
12. Petersen ML, Deeks SG, Martin JN, van der Laan MJ (2007) History-adjusted marginal structural models for estimating time-varying effect modification. Am J Epidemiol 166(9):985–993
13. Robert C, Casella G (2004) Monte Carlo statistical methods, 2nd edn. Springer, New York
14. Robins JM (1994) Correcting for non-compliance in randomized trials using structural nested mean models. Commun Stat, Theory Methods 23:2379–2412
15. Robins JM, Rotnitzky A, Zhao LP (1994) Estimation of regression coefficients when some regressors are not always observed. J Am Stat Assoc 89:846–866
16. Robins JM (1997) Causal inference from complex longitudinal data. In: Berkane M (ed) Latent variable modeling and applications to causality. Springer, New York, pp 69–117
17. Robins JM (2004) Optimal structural nested models for optimal sequential decisions. In: Lin DY, Heagerty P (eds) Proceedings of the second Seattle symposium in biostatistics. Springer, New York, pp 189–326
18. Robins JM, Hernán MA, Rotnitzky A (2007) Invited commentary: effect modification by time-varying covariates. Am J Epidemiol 166:994–1002
19. Robins JM, Orellana L, Rotnitzky A (2008) Estimation and extrapolation of optimal treatment and testing strategies. Stat Med 27:4678–4721
20. Rosendaal FR, Cannegieter SC, Van Der Meer FJ, Briët E et al (1993) A method to determine the optimal intensity of oral anticoagulant therapy. Thromb Haemost 69:236–239
21. Rosthøj S, Fullwood C, Henderson R, Stewart S (2006) Estimation of optimal dynamic anticoagulation regimes from observational data: a regret-based approach. Stat Med 25:4197–4215
22. Rosthøj S, Keiding N, Schmiegelow K (2012) Estimation of dynamic treatment strategies for maintenance therapy of children with acute lymphoblastic leukemia: an application of history-adjusted marginal structural models. Stat Med 31:470–488
23. Rubin DB (1978) Bayesian inference for causal effects: the role of randomization. Ann Stat 6:34–58
24. Schafer JL (1997) Analysis of incomplete multivariate data. Chapman & Hall, London
25. van der Laan MJ, Petersen ML, Joffe MM (2005) History-adjusted marginal structural models and statically-optimal dynamic treatment regimens. Int J Biostat 1:Article 4
26. Welch PD, Heidelberger P (1983) Simulation run length control in the presence of an initial transient. Oper Res 31:1109–1144
27. Zhang B, Tsiatis AA, Laber EB, Davidian M (2012) A robust method for estimating optimal treatment regimes. Biometrics 68:1010–1018