
ASTIN: a Bayesian adaptive dose–response trial in acute stroke

Andrew P Grieve (a) and Michael Krams (b)

Understanding the dose–response is critical for successful drug development. We describe an adaptive design to efficiently learn about the dose–response and the ED95. A dynamic termination rule allows for early discontinuation either for efficacy or futility. The design was deployed in ASTIN, a phase II proof-of-concept trial of the neuroprotectant neutrophil inhibitory factor (NIF) in acute stroke. We discuss the learning from this trial. Clinical Trials 2005; 2: 340–351. www.SCTjournal.com

1 Introduction

Developing new pharmacological therapies is expensive. Most projects fail, sometimes late in the development process, and therefore there is great value in enabling earlier and better decisions as to whether or not to continue with a drug development program. Incomplete understanding of the dose–response is recognized as a major problem in clinical drug development, potentially leading to inappropriate doses being taken into phase III.

We have implemented a design that continuously captures outcome data to allow early termination and improved learning about the dose–response. We describe the rationale for adaptive treatment allocation and dynamic termination rules, show results from simulating the design, and discuss lessons learned from implementing the design in ASTIN, an acute stroke trial.

2 Issues in dose selection

A standard Phase II dose-selection design is a randomized, parallel-group trial with placebo and three or four active doses. Let the objective of such a trial be to determine the minimum dose with satisfactory effect (MDSE) [1]. In ASTIN this was defined as the ED95, the dose delivering 95% of the maximum efficacy. This corresponds to looking for a dose that delivers almost maximal effect while minimizing the danger of unacceptable adverse events. To illustrate issues in dose selection, assume that in addition to a placebo (P) group there are three dose groups, low (L), medium (M) and high (H) (Figure 1).

Some problems associated with the standard design are immediately apparent. With a small number of doses the interval between successive doses is wide. Consequently, as in Figure 1A, there may be no effect at L and a maximum effect at M, from which we might only conclude that the ED95 lies between the two doses. Similarly, in Figure 1B we learn that the ED95 lies between P and L, and in Figure 1C that it lies between M and H. This choice of doses is only suitable for determining the ED95 if the dose–response increases gently across the whole dose range (Figure 1D).

2.1 Improvements to a standard design

2.1.1 Increase number of doses

Phase II of drug development should be an exploratory, or learning, mode of investigation rather than a confirmatory, testing mode [2]. It is known that if the objective of a trial is to estimate some aspect of a dose–response function then, for a fixed total number of patients, it is better to have more doses with fewer patients per dose than fewer doses with more patients per dose [3], and we would therefore recommend using as many doses as is feasible. In ASTIN we chose 15 doses and placebo, made possible because the study drug was delivered by infusion, allowing for dilution. However, even for treatments in tablet form, many more doses than are currently used could be contemplated, for example by combining two tablets of strengths 0, 1, 3 and 4, allowing any dose in the range 0–8 to be studied. The advantage of increasing the number of doses is seen in Figure 2A: a narrower interval between doses allows the ED95 to be better determined. Increasing the number of doses is an option only if we adopt a learning stance via estimation rather than a confirmatory stance via testing, since in the latter, whether we use 3 or 15 doses has little impact on the number of subjects needed in a single group to detect a difference of a given size.

(a) Statistical Research and Consulting Centre, Pfizer Global Research and Development, Sandwich, Kent CT13 9NJ, UK. E-mail: Andy.P.Grieve@Pfizer.Com; (b) Clinical CNS, Pfizer Inc., Groton, Connecticut, CT 06340, USA

CASE STUDY Clinical Trials 2005; 2: 340–351

© Society for Clinical Trials 2005 10.1191/1740774505cn094oa

Figure 1 Issues associated with traditional dose–response designs

Figure 2 Issues in increasing the number of doses

A second issue is illustrated in Figure 2B. Patients allocated to the first four doses, excluding placebo, are wasted, since their response will essentially be the same as the response of patients on placebo. Similarly, patients allocated to the top three doses will respond similarly to patients receiving the first dose on the plateau. Ideally, if the shape and the position of the dose–response function were known, patients could be allocated to placebo or to the four doses spanning the steep portion of the dose–response curve. This would be true whether the position of the curve on the interval was as in Figure 2B or 2C.
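The tablet example in Section 2.1.1 is easy to check. The short sketch below enumerates every total dose obtainable from two tablets drawn from the strengths 0, 1, 3 and 4. Reading strength 0 as a matching placebo tablet is our interpretation, and the function name is purely illustrative.

```python
from itertools import combinations_with_replacement

# Tablet strengths quoted in Section 2.1.1; reading 0 as a matching placebo
# tablet is an interpretation, not something stated in the text.
TABLET_STRENGTHS = (0, 1, 3, 4)


def achievable_doses(strengths, n_tablets=2):
    """Sorted list of distinct total doses from taking `n_tablets` tablets."""
    return sorted({sum(combo) for combo in combinations_with_replacement(strengths, n_tablets)})


print(achievable_doses(TABLET_STRENGTHS))  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

Four strengths therefore give nine dose levels (placebo plus eight active doses) with only two tablets per patient.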

2.1.2 Adaptive allocation

While at the start of a trial we may know little about the position of the curve within the dose interval, as the trial progresses information accrues on the response of patients to differing doses, and we can learn about the position and use this information to adapt allocation. For example, if we learned that the dose–response curve was as in Figure 2B, we could reduce the chance of allocating patients to the low doses or to the very high doses, whereas if it was as in Figure 2C we would only need to allocate to placebo and the lowest four doses.

2.1.3 Early stopping

As part of learning about the shape of the dose–response curve, we are also able to make decisions about stopping early. For example, if there is little or no evidence that the drug has a real effect, then we should stop, as continuing would be futile. Similarly, there may be enough evidence to identify a dose with sufficient efficacy and an adequate safety profile to warrant going into a Phase III trial.

2.1.4 Seamless designs

During the development of this design it was intended that, if early stopping for efficacy took place, the transition to the subsequent Phase III trial would take place seamlessly. There are clear advantages to such a proposal, both in terms of time to approval if the drug were effective and in terms of continuity of recruitment. This approach was not adopted for ASTIN because of objections both from the regulatory authority and from within the company.

3 The ASTIN design process

Assume that the study is already running and that information has accrued on the shape and location of the dose–response function, allowing adaptive randomization. The design process is illustrated in Figure 3, in which four distinct elements are identified: randomization, prediction, decision making and dose allocation.

3.1 Randomization

A new patient entering the study is assigned to placebo or to an active dose predetermined to maximize learning about that aspect of the dose–response function that is of interest. Patients are allocated to placebo at a minimum rate throughout the trial to protect against a drift in the patient population, which can lead to biased estimates of the dose–response function. By maintaining a minimum allocation to placebo we can fit time as an explanatory variable in the analysis and ameliorate the influence of population drift.

3.2 Prediction

In acute stroke trials it is standard to measure a patient's response to treatment three months post-stroke. Waiting 90 days to determine the response of an individual patient to a dose means that many patients must be randomized before we have learned how to allocate them optimally to a dose. One approach is to predict a patient's outcome at 90 days based on an early outcome or a surrogate outcome. Figure 4 shows data taken from the Copenhagen Stroke Database (CSD) [4], which contains data from 1351 pharmacologically untreated stroke patients entering an acute stroke unit (ASU) in Copenhagen between September 1991 and September 1993. Figure 4A shows the relationship between the Scandinavian Stroke Scale (SSS) score (the primary endpoint used in ASTIN) at admission to the ASU and the corresponding score at discharge from the ASU. The SSS measures neurological function, with zero representing a comatose patient and a score of 58 corresponding to no neurological deficit. There is only a weak relation between the SSS scores at admission and discharge, with mild stroke patients being discharged with little neurological deficit and severe stroke patients being discharged with a substantial residual deficit. Figures 4B, 4C and 4D show similar relationships between the SSS scores at 1, 4 and 8 weeks post-stroke and the SSS score at discharge. As time progresses there is an increasingly strong relationship between the SSS score at an intermediate time point and at discharge. These data were used to establish a set of linear regressions, called the longitudinal model, to predict final outcome based on early measurements. In order to use this model in ASTIN we required immediate access to early response data. This was achieved by building an electronic data interface to the clinical sites.

Figure 3 General design process for response-adaptive trials

3.3 Updating estimated dose–response

Following the prediction of final outcome based on early response, the estimated dose–response function was updated by Bayes' theorem. A Bayesian analysis accounts appropriately for all sources of uncertainty, in particular the uncertainty associated with predicting patient outcome from early outcomes, when updating the estimate of the dose–response curve. It also allowed us to update the longitudinal model as data from ASTIN became available.

3.4 Decision making

Having updated the estimate of the dose–response, we can now make decisions as to the future conduct of the trial. There are three possibilities. First, there may be sufficient evidence to decide that there is no dose of NIF giving sufficient efficacy to take into Phase III, leading to the halting of ASTIN and of the development program. Secondly, there may be sufficient evidence to identify a dose with the appropriate risk/benefit profile, allowing ASTIN to be stopped and the planning of Phase III to begin. If neither of these decisions can be made, ASTIN continues with adaptive allocation and learning about the dose–response function.

3.5 Dose allocation

If ASTIN continues, the dose to be given to the next patient is determined by choosing the dose that maximizes learning about the aspect of the dose–response function that is of primary interest. Based on simulation, we chose the dose that minimizes the predicted variance of the response at the ED95; in other words, the dose that we expect to lead to the most precise estimate of the response at the ED95, a measure of its expected utility.

Figure 4 Relationship between early response and final outcome in untreated acute stroke patients
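A minimal sketch of the allocation logic of Sections 3.1 and 3.5, under strong simplifying assumptions: the posterior for the mean responses at the doses is Gaussian with covariance `cov` (for instance, from an NDLM), the ED95 is fixed at a point estimate given as a dose index `target`, and the observation variance is known. A fuller treatment would also account for uncertainty in the ED95 itself; this sketch does not. The squared-exponential prior, the placebo rate and all function names are hypothetical.

```python
import math
import random


def smooth_prior_cov(n_doses, length_scale=2.0, variance=25.0):
    """Hypothetical smooth prior covariance for the mean response at each dose index."""
    return [[variance * math.exp(-0.5 * ((i - j) / length_scale) ** 2)
             for j in range(n_doses)] for i in range(n_doses)]


def predicted_variance_at_target(cov, dose, target, noise_var):
    """Variance of the mean response at `target` after one more observation at
    `dose`, via the Gaussian rank-one update var_t - cov_td^2 / (var_d + noise)."""
    return cov[target][target] - cov[target][dose] ** 2 / (cov[dose][dose] + noise_var)


def choose_next_dose(cov, target, noise_var, candidates):
    """Candidate dose that minimizes the predicted variance at the target dose."""
    return min(candidates,
               key=lambda d: predicted_variance_at_target(cov, d, target, noise_var))


def allocate_next_patient(cov, target, noise_var, placebo_rate, rng):
    """Placebo (index 0) at a fixed minimum rate, otherwise the most informative active dose."""
    if rng.random() < placebo_rate:
        return 0
    return choose_next_dose(cov, target, noise_var, range(1, len(cov)))
```

Usage, with placebo plus 15 active doses as in ASTIN: `allocate_next_patient(smooth_prior_cov(16), target=9, noise_var=100.0, placebo_rate=0.25, rng=random.Random(1))`. Under a stationary prior every dose has the same variance, so the rule simply picks the target. Once data arrive, the posterior covariance becomes uneven and the choice becomes genuinely informative.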

3.6 Dose–response model

In general terms, the requirement for a dose–response model is that it relates the expected response at a given dose to a set of parameters and, possibly, covariates. Usually dose–response models are restricted to be monotonic. In ASTIN a more flexible model allowing nonmonotonicity was needed because of an indication from an early patient safety study that the dose–response curve for NIF might be nonmonotonic at high doses [5].

One of the hindrances to the use of Bayesian methods in practice has been the lack of appropriate computational tools. Recently, the development of numerical methods based on Markov chain Monte Carlo (MCMC) has greatly broadened the scope of practical Bayesian statistics. However, MCMC has a computational overhead that may be prohibitive because of the considerable simulation work that must be carried out before trials can be run. The dose–response model therefore needed to allow a degree of analytic updating of the response curve and of the calculation of the expected utilities of each dose.

There are a number of approaches that give both flexibility and efficiency, including models based on splines and kernel regression. In ASTIN we chose the Normal Dynamic Linear Model (NDLM), which has the necessary characteristics. NDLMs were originally developed in time series [6] and combine two sources of variability: observational and system. Figure 5 illustrates a second-order polynomial NDLM. The diamonds represent observed individual responses $Y_{jk}$ at dose $Z_j$. At dose $Z_j$ a straight line is fitted through the observations, parameterized so that the expected value at $z = Z_j$ is $\mu_j$, the intercept, and the slope is $\delta_j$. If the model were a simple linear model, then the expected response at dose $Z_{j+1} = Z_j + 1$ would be $\mu_j + \delta_j$ and the slope would remain $\delta_j$. The dynamic component of the model allows these model parameters to change. The model may be written as follows.

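Observation equation

$$Y_{jk} = \mu_j + \nu_{jk}, \qquad \nu_{jk} \sim N(0, V\sigma^2)$$

System equations

$$\mu_j = \mu_{j-1} + \delta_{j-1} + \omega_j, \qquad \omega_j \sim N(0, W_j\sigma^2)$$

$$\delta_j = \delta_{j-1} + \varepsilon_j, \qquad \varepsilon_j \sim N(0, W_j\sigma^2)$$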

This is a flexible family of response functions in which the multipliers $W_j$ can be regarded as smoothing tuners, with small values giving a smooth response function and large values a more erratic one. Covariates $X_k$ can be introduced by making the expected responses depend linearly on the covariates,

$$E(Y_{jk} \mid z = Z_j, X_k) = \mu_j = \theta_j + \beta X_k$$

and by applying the NDLM to the $\theta_j$ values. In ASTIN, baseline SSS was used as a covariate in modeling the dose–response, although not in choosing the dose to which the next patient is allocated.
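A minimal sketch of how the second-order NDLM above can be fitted along the dose axis: a Kalman forward filter over the state $(\mu_j, \delta_j)$ followed by a backward Rauch–Tung–Striebel smoother, so that every dose borrows strength from its neighbours. It assumes $\sigma^2$ is known, applies $W_j\sigma^2$ independently to the level and slope disturbances as in the system equations, places a Gaussian prior (`m0`, `C0`) on the state at the first (placebo) dose (index 0 here, $j = 1$ in the text), and ignores covariates. The unknown-$\sigma^2$ version described by West and Harrison [6] is not shown, and the function names are hypothetical.

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def mat_add(A, B, sign=1.0):
    return [[A[i][j] + sign * B[i][j] for j in range(len(A[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def inv2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


G = [[1.0, 1.0], [0.0, 1.0]]  # level moves by the slope; slope persists


def ndlm_smooth(obs_by_dose, W, V, sigma2, m0, C0):
    """Forward-filter / backward-smooth a second-order NDLM with known sigma2.

    obs_by_dose[j] lists the responses Y_jk at dose index j (may be empty).
    Returns the smoothed (mean, variance) of mu_j for every dose index j.
    """
    a_list, R_list, m_list, C_list = [], [], [], []
    m = [[m0[0]], [m0[1]]]
    C = [row[:] for row in C0]
    for j, ys in enumerate(obs_by_dose):
        if j > 0:  # evolve (mu, delta) from dose j-1 to dose j
            m = mat_mul(G, m)
            evol = [[W[j] * sigma2, 0.0], [0.0, W[j] * sigma2]]
            C = mat_add(mat_mul(mat_mul(G, C), transpose(G)), evol)
        a_list.append(m)
        R_list.append(C)
        for y in ys:  # sequential scalar updates; each Y_jk observes mu_j
            q = C[0][0] + V * sigma2
            K = [[C[0][0] / q], [C[1][0] / q]]
            err = y - m[0][0]
            m = [[m[0][0] + K[0][0] * err], [m[1][0] + K[1][0] * err]]
            C = mat_add(C, mat_mul(K, [[K[0][0] * q, K[1][0] * q]]), sign=-1.0)
        m_list.append(m)
        C_list.append(C)
    s, S = m_list[-1], C_list[-1]
    out = [(s[0][0], S[0][0])]
    for j in range(len(obs_by_dose) - 2, -1, -1):  # backward smoothing pass
        B = mat_mul(mat_mul(C_list[j], transpose(G)), inv2(R_list[j + 1]))
        s = mat_add(m_list[j], mat_mul(B, mat_add(s, a_list[j + 1], sign=-1.0)))
        S = mat_add(C_list[j], mat_mul(mat_mul(B, mat_add(S, R_list[j + 1], sign=-1.0)),
                                       transpose(B)))
        out.append((s[0][0], S[0][0]))
    return out[::-1]
```

Small values in `W` give a nearly straight smoothed curve; larger values let it bend, matching the role of $W_j$ as a smoothing tuner. A dose with no observations simply receives an interpolated estimate.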

Figure 5 Application of a second-order NDLM to dose–response relationships


3.7 Stopping rules

In a Bayesian framework there are essentially two ways of stopping a clinical trial: one based on decision-theoretic principles and the other on an assessment of the posterior probability of clinically meaningful effects.

The difficulty with the decision approach is, first, that as more updates are performed the number of decision scenarios increases exponentially and, secondly, that the necessary calculations are computationally intensive because they are not analytically tractable. Berry et al. [7] report approximations that have been developed to make the calculations more feasible. However, in the implementation of ASTIN it was decided to use the second approach, in which the size of the effect at the ED95 relative to placebo was determined and the decision to stop was based on the magnitude of this effect.

The NDLM models response through the expected value of the response, $\mu_j$, at a number of doses ($j = 1, \ldots, J$), including placebo ($j = 1$). The dose–response curve is converted into a dose–effect curve by defining the expected difference from placebo as

$$\varphi_j = \mu_j - \mu_1, \qquad j = 2, \ldots, J.$$

To stop, we require two clinical effect sizes. The first, $c_1$, is the smallest clinical effect that we would not wish to miss, and the second, $c_0$, is the largest clinical effect that is not of interest. These define stopping criteria for satisfactory efficacy and futility. If the posterior interval for $\varphi_k$, where $k$ indexes the dose closest to the estimated ED95 (denoted $\widehat{\mathrm{ED}}_{95}$), lies completely above $c_1$, then we would conclude that there is sufficient evidence that the effect at the ED95 is large enough to warrant going into a Phase III trial. Conversely, if the posterior interval for $\varphi_k$ lies completely below $c_0$, we could conclude that there is sufficient evidence that the effect at the ED95 is too small to warrant continuing ASTIN or the NIF development program. A hypothetical example in which there is sufficient evidence to start a Phase III program is shown in Figure 6.
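A minimal sketch of this stopping rule, assuming the posterior for the mean responses is summarized by a mean vector and covariance matrix (placebo at index 0, i.e. $j = 1$ in the text) and that $\varphi_k$ is approximately normal. The text does not state the interval level, so the 90% default is an assumption, as are the function names. In the usage, `c1=3.0` echoes the 3-point benefit ASTIN was designed to detect, while `c0=1.0` is purely hypothetical.

```python
import math
from statistics import NormalDist


def effect_at_ed95(mu_mean, mu_cov, doses, ed95_dose):
    """Posterior mean and SD of phi_k = mu_k - mu_placebo, where k is the active
    dose closest to the estimated ED95. Index 0 is placebo (j = 1 in the text)."""
    k = min(range(1, len(doses)), key=lambda j: abs(doses[j] - ed95_dose))
    mean = mu_mean[k] - mu_mean[0]
    var = mu_cov[k][k] + mu_cov[0][0] - 2 * mu_cov[k][0]  # Var(mu_k - mu_1)
    return k, mean, math.sqrt(var)


def stopping_decision(phi_mean, phi_sd, c1, c0, level=0.90):
    """Compare a central posterior interval for phi_k with c1 (efficacy) and c0 (futility)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower, upper = phi_mean - z * phi_sd, phi_mean + z * phi_sd
    if lower > c1:
        return "stop for efficacy"
    if upper < c0:
        return "stop for futility"
    return "continue"
```

For example, with hypothetical posterior means `[10, 11, 14, 17, 18]` at doses `[0, 10, 20, 30, 40]`, unit variances and an estimated ED95 of 28, `effect_at_ed95` selects dose 30 with $\varphi_k = 7$, and `stopping_decision(7, math.sqrt(2), c1=3.0, c0=1.0)` returns `"stop for efficacy"`.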

4 Simulating the design

Having developed such a complex design, were we in a position to implement it immediately? The answer is no, since there were a number of interested parties to persuade that the approach was appropriate and feasible. We needed to choose appropriate settings for the algorithm. We needed to assure ourselves that the design had acceptable operating characteristics, not only in terms of false-positive and false-negative rates but also in terms of the aspects of the design that were considered important. Could the algorithm learn appropriately? Could it accurately estimate the dose–response relationship? Did the adaptive allocation result in a sensible choice of doses? Could we stop early? Were there benefits compared with a traditional design? Answers to these questions were needed to convince senior management in the company that the design was worthwhile and acceptable to the regulatory authorities, and to convince the regulatory authorities in both North America and Europe that the approach was scientifically sound and that the computer systems that were developed were appropriately validated. All of this was achieved by simulation. In practice the simulations were conducted in two stages. In the first stage, a fractional factorial computer experiment was conducted to optimize the parameter settings for the algorithm. In the second stage, the operating characteristics of the design were determined based on the parameters chosen in the first stage.

To illustrate, Figure 7 displays a series of snapshots from a simulation of a single trial in which the true underlying dose–response (indicated by the solid diamonds) corresponded to a logistic-type curve giving a maximum benefit over placebo of eight points change from baseline on the SSS. Figure 7A shows the prior, which can be characterized as reflecting a belief that if a dose–response exists it is minimal and gradual, but that there is great uncertainty. Figures 7B to 7H show the updated estimate of the dose–response curve (solid line) and its associated uncertainty (dashed line) after 25, 50, 100, 200, 500 patients. On each graph the circles denote the observed patient responses, the arrows the doses that have already been allocated, and the dotted line a locally weighted fit (LOWESS) through the observed responses. These figures suggest that the algorithm can accurately estimate the dose–response relationship, in that the estimate converges to the truth and the uncertainty reduces rapidly. They also suggest that it can learn appropriately and consequently choose appropriate doses, as illustrated by the low density of patients allocated to the bottom four doses (indeed, the second dose remains unused) as well as to the doses on the plateau.

Figure 6 Use of a dose–effect curve to determine stopping

Hundreds of thousands of such simulations were conducted so that the algorithm could be appropriately tuned. One of the main items of interest was how the adaptive design would compare with a traditional design. Table 1 illustrates the results of the simulations. If interest centered on identifying the ED95 from a curve that plateaued at 2, 3 or 4 points benefit over placebo, then for 80% power approximately 2400, 1000 and 600 patients are required, respectively; for 90% power the corresponding numbers are 3200, 1430 and 800. For an adaptive design based on a maximum of 1000 evaluable patients, the false-positive rate is controlled at less than 5%, while the effective "power" for 3 and 4 points is acceptably high with small numbers of patients. Of course, the comparison with the traditional design is not completely fair, because the use of traditional sequential designs would reduce the average numbers of patients.

Figure 7 Simulation of a single clinical trial
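As a back-of-envelope check on the traditional-design numbers (they come from the authors' simulations, not from this formula), the usual normal approximation says that, with effect size and SD held fixed, moving from 80% to 90% power multiplies the required sample size by $\left((z_{0.975} + z_{0.90})/(z_{0.975} + z_{0.80})\right)^2 \approx 1.34$ for a two-sided 5% test. The two-sided 5% level and the function name are assumptions of this sketch.

```python
from statistics import NormalDist


def sample_size_ratio(power_hi=0.90, power_lo=0.80, alpha=0.05):
    """Ratio n(power_hi) / n(power_lo) for a two-sided normal-approximation test
    with the same effect size and SD: ((z_{1-a/2} + z_hi) / (z_{1-a/2} + z_lo))^2."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)
    return ((z_alpha + z(power_hi)) / (z_alpha + z(power_lo))) ** 2


# Traditional-design rows of Table 1: (patients for 80% power, patients for 90% power).
TABLE1_ROWS = {2: (2432, 3220), 4: (608, 808)}

for benefit, (n80, n90) in TABLE1_ROWS.items():
    print(benefit, round(n90 / n80, 3), round(sample_size_ratio(), 3))
```

The simulated ratios (about 1.32 to 1.33) sit close to the textbook value, which suggests the traditional-design columns scale with power much as a fixed-sample comparison would.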

5 The results of the ASTIN study

In ASTIN, patients who had suffered an acute ischemic stroke, who arrived in the emergency room of the hospital within 6 hours of its onset, and who had a baseline rating on the Scandinavian Stroke Scale (SSS) of between 10 and 40 points were eligible for entry. Details of the exclusion criteria can be found in Krams et al. [8].

The study was run by an executive steering committee and an Independent Data Monitoring Committee (IDMC) with an expert Bayesian statistician, who were responsible for ensuring the safety of the patients and the integrity of the study, monitoring the performance of the algorithm, and confirming decisions to continue or stop the study. A number of formal meetings of the IDMC were prespecified, for which reports were prepared by an independent statistician, and the computer system was run independently of the sponsor by Tessella Ltd.

A total of 966 patients were randomized, of whom 26% received placebo. There was preferential allocation to the top three doses (Figure 8), to which 40% of patients were allocated.

Figure 9 illustrates the course of ASTIN in terms of the estimate of the dose–response relationship. In the frame corresponding to Week 0, the prior estimate of the dose–response is shown to be flat, with an expectation of 10 points change from baseline on the SSS. This placebo effect size was estimated from the data in the CSD study. The following six frames show estimates of the dose–response at 8-week intervals up to 48 weeks. What is already apparent by 16 weeks is that the response on placebo, 19 points, is far higher than anticipated from the CSD study, and this persists during the course of the whole study. At week 32 the placebo effect is still much higher than anticipated, but there is little indication of a true dose–response. At week 48, the first opportunity at which the IDMC could stop the trial, it was stopped for futility. At this point recruitment was halted, but the protocol required that the patients who had already been entered should continue to be monitored for the full 13 weeks of the study. The study was finally completed 66 weeks after it started, at which point there was little indication of a positive dose–response.

The final estimate of the dose–response is magnified in Figure 10, and there are a number of points to consider. First, as noted above, the placebo response was far greater, at 17 points, than the anticipated 10 points. The trial was designed to detect a 3-point benefit over placebo, which is clearly far greater than the estimated effect at any dose. This final model contained a single covariate, baseline severity. After the trial was completed, a number of predefined important prognostic covariates, including tPA, were added to the model, but none materially changed the final conclusions. There was also no indication of any issues with adverse events or serious adverse events. The only dose-related effect that was seen was in the antibodies to neutrophil inhibitory factor (NIF) [8].

6 Post-trial learnings

At the conclusion of ASTIN, a number of investigations were undertaken to assess the running of the trial and the algorithm.

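Table 1 Operating characteristics of traditional and adaptive designs

Benefit over placebo | Traditional design: patients for 80% power | Traditional design: patients for 90% power | Adaptive design (max 1000 pts): proportion stopped for efficacy | Adaptive design (max 1000 pts): median number of patients
0 | – | – | 0.02 | 501
2 | 2432 | 3220 | 0.56 | 644
3 | 1080 | 1432 | 0.90 | 416
4 | 608 | 808 | 0.95 | 280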

Figure 8 Patient allocation by dose group


6.1 Operational investigations

Throughout the planning of ASTIN we anticipated a particular recruitment speed; in the event, the achieved rate was double. Centers themselves were enthusiastic about the unique aspects of the design, a phenomenon reported in a previous adaptive design [9]. One consequence of faster recruitment is that learning cannot take place efficiently, because more patients are randomized before the responses of earlier patients are available. Our investigations suggest that a lower recruitment rate would have caused the trial to stop with fewer patients recruited. It would have been preferable to work with fewer centers, aiming to achieve the optimal recruitment speed after an initial ramp-up time.

Figure 9 Posterior estimate of the dose–response function during the course of ASTIN (solid line: posterior estimate; dashed line: posterior uncertainty)

A second issue relates to the exchangeability of the centers. In ASTIN there were 100 centers worldwide. Fewer centers would not only have brought us closer to an optimal recruitment speed but would most likely also have reduced variability.

6.2 Statistical investigations

6.2.1 Longitudinal model

Figure 11A shows the average actual response for patients who completed the trial (dashed line), together with the average imputed responses from the longitudinal model (solid line) using the prior estimate of the longitudinal model from the CSD, plotted over time. Clearly the imputed values are considerably higher than the actual responses. Figure 11B shows the same plot using the imputed values obtained from the updated longitudinal model. Here we see that once the longitudinal model begins to be updated, the imputed values become closer to the actual observed responses, illustrating that the CSD prior was not very good. Why?

Figure 10 Final posterior estimate of the dose–response function

Figure 11 Actual and imputed responses in ASTIN over time

The parameterization of the longitudinal model that was implemented in ASTIN had the form

$$y_i = \alpha_i + \beta_i y_{i+1}$$

where $y_i$ is the response at week $i$. There are two issues with this parameterization. First, the parameters of the longitudinal model ($\alpha_i$ and $\beta_i$) are linked, so that if one pair of parameters is updated, all the others are too. Secondly, each pair may be estimated in different categories of patients. To see this, consider the acute stroke unit in Copenhagen. The 1351 patients who went through the unit in the two years of data collection did not all stay for the same time; those most severely impaired by their stroke stayed there the longest. If such patients were used to predict the progress from, say, week 11 to week 12, they are not necessarily representative of the whole population, and the parameter linkage will have an impact on all other parts of the longitudinal model.

6.2.2 The estimate of the ED95

Consideration of Figure 10 might suggest that the ED95 for ASTIN should be of the order of 110 mg. In fact, the posterior estimate of the ED95 was 54 mg [8]. Why?

Conventionally the Bayes' estimate of a parameter is its posterior mean, and this was used in ASTIN. Figure 12 displays the posterior distribution of the ASTIN ED95; it is clearly bimodal. It can be argued that in the context of futility a bimodal posterior distribution for the ED95 is not unexpected, since the posterior estimate of the dose–response curve will consist of a series of random bumps. This raises some issues.

First, in the context of futility the concept of an ED95 is meaningless. Secondly, given that futility is a possibility, particularly with the history of stroke trials in neuroprotection, it is preferable to use the posterior mode rather than the mean. Alternatively, an estimate could be obtained from the posterior estimate of the dose–response curve itself, rather than from the average of the ED95 values from each individual MCMC iterate. This suggestion was considered during the course of ASTIN, as the IDMC statistician identified the anomaly pointed to above. IDMC-inspired investigations showed that an algorithm based on the average curve had essentially the same properties as the ASTIN-implemented algorithm and that the final inferences would not have been fundamentally changed.
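A minimal sketch of why the posterior mean of the ED95 can mislead when its posterior is bimodal. For each posterior draw of a dose–effect curve it computes the ED95 (the smallest dose whose effect over placebo reaches 95% of the maximum effect, the definition in Section 2), then compares the mean and mode of those values with the ED95 of the averaged curve. The doses and the two families of "bumpy" curves are invented for illustration and do not reproduce the ASTIN posterior; the function names are hypothetical.

```python
from collections import Counter


def ed95(doses, effects):
    """Smallest dose whose effect over placebo reaches 95% of the maximum effect."""
    target = 0.95 * max(effects)
    return next(d for d, e in zip(doses, effects) if e >= target)


def summarize_ed95(doses, effect_draws):
    """Posterior mean and mode of per-draw ED95 values, plus the ED95 of the mean curve."""
    per_draw = [ed95(doses, draw) for draw in effect_draws]
    mean = sum(per_draw) / len(per_draw)
    mode = Counter(per_draw).most_common(1)[0][0]
    mean_curve = [sum(col) / len(col) for col in zip(*effect_draws)]
    return mean, mode, ed95(doses, mean_curve)


# Hypothetical dose grid (mg) and two kinds of "random bump" posterior draws.
DOSES = list(range(0, 130, 10))
DRAW_LOW = [0, 0.2, 1.0, 0.3, 0.1, 0, 0, 0, 0, 0, 0, 0.2, 0.1]   # bump at 20 mg
DRAW_HIGH = [0, 0.1, 0.2, 0.1, 0, 0, 0, 0, 0, 0.3, 0.5, 1.0, 0.4]  # bump at 110 mg
DRAWS = [DRAW_LOW] * 6 + [DRAW_HIGH] * 4

print(summarize_ed95(DOSES, DRAWS))
```

Here the posterior mean (56 mg) falls between the two modes (20 mg and 110 mg), at a dose that no single draw supports, whereas the posterior mode and the ED95 of the averaged curve both give 20 mg.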

7 Discussion

We have shown that adaptive designs with dynamic termination rules can be successfully implemented in phase II proof-of-concept trials, offering important advantages such as improved learning about the dose–response and potentially earlier stopping. In future applications it will be important to build more reliable longitudinal models and to better control recruitment speed, aiming for a rate that allows the system to perform optimally.

Up-front investment is required to design the trial, simulate its characteristics and implement it. However, the payoff goes beyond applying a more informative design. Investigators participating in ASTIN were extremely interested in the trial methodology, and regulatory agencies reviewing our approach were supportive of the concept of adaptive treatment allocation and dynamic termination rules in phase II. We are currently engaged in implementing similar designs in areas beyond stroke.

References

1 Lesko LJ, Rowland M, Peck CC, Blaschke TF. Optimizing the science of drug development: opportunities for better candidate selection and accelerated evaluation in humans. J Clin Pharmacol 2000; 40: 803–14.

2 Sheiner LB. Learning versus confirming in clinical drug development. Clin Pharmacol Therapeut 1997; 61: 275–91.

3 Muller H-G, Schmitt T. Choice of number of doses for maximum likelihood estimation of the ED50 for quantal dose–response data. Biometrics 1990; 46: 117–29.

Figure 12 The posterior distribution of the ED95

4 Jorgensen HS, Nakayama H, Raaschou HO, Vive-Larsen J, Stoier M, Olsen TS. Outcome and time course of recovery in stroke. The Copenhagen Stroke Study. Archives Physical Medical Rehabilitation 1995; 76: 399–412.

5 Lees KR, Diener H-C, Asplund K, Krams M. UK-279276, a neutrophil inhibitory glycoprotein, in acute stroke: tolerability and pharmacokinetics. Stroke 2003; 34: 1704–709.

6 West M, Harrison PJ. Bayesian forecasting and dynamic models, 2nd edition. New York: Springer Verlag, 1997.

7 Berry DA, Muller P, Grieve AP, et al. Adaptive Bayesian designs for dose-ranging trials. In Carlin B, Carriquiry A, Gatsonis C, et al. eds. Case studies in Bayesian statistics V. New York: Springer Verlag, 2002: 99–181.

8 Krams M, Lees KR, Hacke W, Grieve AP, Orgogozo J-M, Ford GA. Acute stroke therapy by inhibition of neutrophils (ASTIN): an adaptive dose–response study of UK-279276 in acute ischaemic stroke. Stroke 2003; 34: 2543–48.

9 Tamura RN, Faries DE, Andersen JS, Heiligenstein JH. A case study of an adaptive treatment allocation in a clinical trial in the treatment of out-patients with depressive disorder. J Am Statist Assoc 89: 768–76.


0, 1, 3, 4, allowing any dose in the range of 0–8 to be studied. The advantage of increasing the number of doses is seen in Figure 2A: a narrower interval between doses allows the ED95 to be better determined. Increasing the number of doses is an option only if we adopt a learning stance via estimation rather than a confirmatory stance via testing, since in the latter whether we use 3 or 15 doses has little impact on the number of subjects needed in a single group to detect a difference of a given size. A second issue is illustrated in Figure 2B. Patients allocated to the first four doses (excluding placebo) are wasted, since their response will essentially be the same as the response of patients to placebo. Similarly, patients allocated to the top three doses will respond similarly to patients receiving the first dose on the plateau. Ideally, if the shape and the position of the dose–response function were known, then patients could be allocated to placebo or to the four doses spanning the steep portion of the dose–response curve. This would be true whether the position of the curve on the interval was as in Figure 2B or 2C.

Figure 1 Issues associated with traditional dose–response designs

Figure 2 Issues in increasing the number of doses
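The point that a narrower dose interval pins down the ED95 more precisely can be made numerically. The sketch below assumes a hypothetical sigmoid Emax curve (maximum effect 8, ED50 = 2, Hill coefficient 3) on the 0–8 dose range; for this model the ED95 has the closed form ED50 × 19^(1/h). It then finds the closest available dose on a grid of 4 versus 16 equally spaced active doses. None of these parameter values are taken from Figure 2.

```python
def emax_effect(dose, emax=8.0, ed50=2.0, hill=3.0):
    """Hypothetical sigmoid Emax curve: expected benefit over placebo at `dose`."""
    return emax * dose**hill / (ed50**hill + dose**hill)


def ed95_emax(ed50=2.0, hill=3.0):
    """Closed-form ED95: solve d**h / (ed50**h + d**h) = 0.95, giving ed50 * 19**(1/h)."""
    return ed50 * 19 ** (1 / hill)


def dose_grid(n_active, top=8.0):
    """Placebo (0) plus n_active equally spaced active doses up to `top`."""
    return [top * i / n_active for i in range(n_active + 1)]


def closest_dose(grid, target):
    """Grid dose nearest to the target dose."""
    return min(grid, key=lambda d: abs(d - target))


true_ed95 = ed95_emax()
for n_active in (4, 16):
    d = closest_dose(dose_grid(n_active), true_ed95)
    # With equal spacing the worst-case miss is half the spacing: 1.0 for 4 doses, 0.25 for 16.
    print(n_active, d, round(d - true_ed95, 2), round(emax_effect(d) / 8.0, 3))
```

The finer grid lands much closer to the true ED95, which matters for estimation; for a test of a single dose against placebo, as the text notes, the extra doses buy little.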

2.1.2 Adaptive allocation

While at the start of a trial we may know little about the position of the curve within the dose interval, as the trial progresses information accrues on the response of patients to differing doses, and we can learn about the position and use this information to adapt allocation. For example, if we learned that the dose–response curve was as in Figure 2B, we could reduce the chance of allocating patients to the low doses or to the very high doses, whereas if it was as in Figure 2C we would only need to allocate to placebo and the lowest four doses.

2.1.3 Early stopping

As part of learning about the shape of the dose–response curve, we are also able to make decisions about stopping early. For example, if there is little or no evidence that the drug has a real effect, then we should stop, as continuing would be futile. Similarly, there may be enough evidence to identify a dose with sufficient efficacy and an adequate safety profile to warrant going into a Phase III trial.

2.1.4 Seamless designs

During the development of this design, it was intended that, if early stopping for efficacy took place, the transition to the subsequent Phase III trial would take place seamlessly. There are clear advantages to such a proposal, both in terms of time to approval if the drug were effective and in terms of continuity of recruitment. This approach was not adopted for ASTIN because of objections both from the regulatory authority and from within the company.

3 The ASTIN design process

Assume that the study is already running and that information has accrued on the shape and location of the dose–response function, allowing adaptive randomization. The design process is illustrated in Figure 3, in which four distinct elements are identified: randomization, prediction, decision making and dose allocation.

3.1 Randomization

A new patient entering the study is assigned to placebo or to an active dose predetermined to maximize learning about the aspect of the dose–response function that is of interest. Patients are allocated at a minimum rate to placebo throughout the trial to protect against a drift in the patient population that can lead to biased estimates of the dose–response function. By maintaining a minimum allocation to placebo, we can fit time as an explanatory variable in the analysis and ameliorate the influence of population drift.
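The minimum placebo allocation can be sketched as a simple two-step randomization: with a fixed probability the patient gets placebo, otherwise the dose chosen by the adaptive step. The placebo probability of 0.15 and the drifting sequence of chosen doses below are hypothetical; the text does not state the rate used in ASTIN.

```python
import random


def randomize_next_patient(chosen_dose, p_placebo, rng):
    """Assign placebo with a fixed minimum probability, otherwise the adaptively chosen dose.

    `chosen_dose` stands in for the output of the dose-allocation step (hypothetical here).
    """
    return "placebo" if rng.random() < p_placebo else chosen_dose


rng = random.Random(7)
n_patients, block = 2000, 200
# Hypothetical allocation that drifts from dose 1 to dose 8 as the trial progresses.
arms = [
    randomize_next_patient(chosen_dose=1 + (8 * i) // n_patients, p_placebo=0.15, rng=rng)
    for i in range(n_patients)
]
# Placebo patients appear in every enrolment block, so placebo data span the whole
# trial period, which is what allows a time term to be included in the model.
placebo_per_block = [arms[s:s + block].count("placebo") for s in range(0, n_patients, block)]
print(placebo_per_block)
```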

3.2 Prediction

In acute stroke trials it is standard to measure a patient's response to treatment three months post-stroke. Waiting 90 days to determine the response of an individual patient to a dose results in many patients needing to be randomized before we can learn how to allocate them optimally to a dose. One approach is to predict a patient's outcome at 90 days based on an early outcome or a surrogate outcome. Figure 4 shows data taken from the Copenhagen Stroke Database (CSD) [4], which contains data from 1351 pharmacologically untreated stroke patients entering an acute stroke unit (ASU) in Copenhagen between September 1991 and September 1993. Figure 4A shows the relationship between the Scandinavian Stroke Scale (SSS) score (the primary endpoint used in ASTIN) at admission to the ASU and the corresponding score at discharge from the ASU. The SSS measures neurological function, with zero representing a comatose patient and a score of 58 corresponding to no neurological deficit. There is only a weak relation between the SSS scores at admission and discharge, with mild stroke patients being discharged with little neurological deficit and severe stroke patients being discharged with a substantial residual deficit. Figures 4B, 4C and 4D show similar relationships between the SSS scores at 1, 4 and 8 weeks post-stroke and the SSS score at discharge. As time progresses, there is an increasingly strong relationship between the SSS score at an intermediate time point and at discharge. These data were used to establish a set of linear regressions, called the longitudinal model, to predict final outcome based on early measurements. In order to use this model in ASTIN, we required immediate access to early response data. This was achieved by building an electronic data interface to the clinical sites.

Figure 3 General design process for response-adaptive trials
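One of the linear regressions in such a longitudinal model can be sketched as an ordinary least-squares fit of the final SSS score on an early SSS score, returning both a point prediction and its classical predictive variance. The data pairs below are hypothetical, not Copenhagen Stroke Database values, and this frequentist fit is only a stand-in: in ASTIN the longitudinal model was updated in a Bayesian way as trial data arrived.

```python
import statistics


def fit_longitudinal(early, final):
    """OLS fit of final SSS on an early SSS score: final = a + b * early.

    Returns (a, b, predict), where predict(x) gives (point prediction, predictive variance).
    """
    n = len(early)
    mx, my = statistics.fmean(early), statistics.fmean(final)
    sxx = sum((x - mx) ** 2 for x in early)
    b = sum((x - mx) * (y - my) for x, y in zip(early, final)) / sxx
    a = my - b * mx
    s2 = sum((y - a - b * x) ** 2 for x, y in zip(early, final)) / (n - 2)  # residual variance

    def predict(x_new):
        mean = a + b * x_new
        var = s2 * (1 + 1 / n + (x_new - mx) ** 2 / sxx)  # classical prediction variance
        return mean, var

    return a, b, predict


# Hypothetical (week-4 SSS, discharge SSS) pairs.
week4 = [12, 18, 25, 30, 36, 41, 47, 52]
discharge = [20, 24, 33, 36, 43, 47, 53, 56]
a, b, predict = fit_longitudinal(week4, discharge)
mean, var = predict(28)
print(round(a, 2), round(b, 3), round(mean, 1), round(var, 2))
```

The predictive variance is what lets a predicted outcome enter the dose–response update with less weight than an observed one.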

3.3 Updating the estimated dose–response

Following the prediction of final outcome based on early response, the estimated dose–response function was updated by Bayes' theorem. A Bayesian analysis accounts appropriately for all sources of uncertainty, in particular the uncertainty associated with predicting patient outcome from early outcomes, when updating the estimate of the dose–response curve. It also allowed us to update the longitudinal model as data from ASTIN became available.
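A minimal sketch of how prediction uncertainty can enter a Bayesian update, assuming a conjugate normal model for the mean response at a single dose with known variances. Each predicted outcome is treated as a noisy measurement whose variance is the patient-level variance plus a prediction variance; this is a simplification, not the NDLM update used in ASTIN. The prior mean of 10 points echoes the CSD-based prior expectation mentioned later in the text, and all other numbers are hypothetical.

```python
def normal_update(prior_mean, prior_var, values, variances):
    """Conjugate normal update of one dose's mean response with known per-value variances."""
    precision = 1 / prior_var + sum(1 / v for v in variances)
    weighted_sum = prior_mean / prior_var + sum(y / v for y, v in zip(values, variances))
    return weighted_sum / precision, 1 / precision


sigma2 = 100.0      # hypothetical patient-level variance of the 90-day SSS change
pred_var = 60.0     # hypothetical extra variance of a prediction from the longitudinal model

observed = [18.0, 20.0]   # patients with a final (90-day) outcome
imputed = [19.0, 15.0]    # patients with only early data, outcome predicted

values = observed + imputed
variances = [sigma2] * len(observed) + [sigma2 + pred_var] * len(imputed)
post_mean, post_var = normal_update(10.0, 25.0, values, variances)
print(round(post_mean, 2), round(post_var, 2))
```

Because the predicted outcomes carry larger variances, they pull the posterior less than observed outcomes do.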

3.4 Decision making

Having updated the estimate of the dose–response, we can now make decisions about the future conduct of the trial. There are three possibilities. First, there may be sufficient evidence to decide that no dose of NIF gives sufficient efficacy to take into Phase III, leading to the halting of ASTIN and of the development program. Secondly, there may be sufficient evidence to identify a dose with the appropriate risk/benefit profile, allowing ASTIN to be stopped and the planning of Phase III to begin. If neither of these decisions can be made, then ASTIN continues with adaptive allocation and learning about the dose–response function.

3.5 Dose allocation

If ASTIN continues, the dose to be given to the next patient can be determined by choosing the dose which maximizes learning about the aspect of the dose–response function that is of primary interest. Based on simulation, we chose the dose which minimizes the predicted variance of the response at the ED95; in other words, the dose that we expect to lead to the most precise estimate of the response at the ED95, a measure of its expected utility.

Figure 4 Relationship between early response and final outcome in untreated acute stroke patients
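Under a jointly Gaussian posterior for the dose means (as with an NDLM, conditional on the variance), the variance reduction from one more patient has a closed form, so the "minimize predicted variance at the ED95" rule can be sketched directly. This assumes the ED95 is fixed at one dose index, and both the covariance matrix and the noise variance are hypothetical; it is not ASTIN's utility calculation.

```python
def variance_after_one_patient(cov, target, dose, noise_var):
    """Posterior variance of mu[target] after one more observation at `dose`.

    Gaussian conditioning on Y ~ N(mu[dose], noise_var):
    Var = S[t][t] - S[t][d]**2 / (S[d][d] + noise_var).
    """
    return cov[target][target] - cov[target][dose] ** 2 / (cov[dose][dose] + noise_var)


def best_next_dose(cov, target, noise_var):
    """Dose index whose next observation most reduces the variance at `target`."""
    return min(range(len(cov)), key=lambda d: variance_after_one_patient(cov, target, d, noise_var))


# Hypothetical random-walk-type posterior covariance over 5 dose means (index 0 = placebo).
cov = [[1.0 + 2.0 * min(i, j) for j in range(5)] for i in range(5)]
target = 2          # assume the current ED95 estimate sits at dose index 2
noise_var = 4.0     # hypothetical patient-level variance
for d in range(5):
    print(d, round(variance_after_one_patient(cov, target, d, noise_var), 3))
print("next dose:", best_next_dose(cov, target, noise_var))
```

With this covariance the best choice is the target dose itself; with other posterior correlations a neighbouring dose can be more informative, which is why the rule compares all candidates.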

3.6 Dose–response model

In general terms, the requirement for a dose–response model is that it relates the expected response at a given dose to a set of parameters and, possibly, covariates. Usually dose–response models are restricted to be monotonic. In ASTIN a more flexible model allowing nonmonotonicity was needed, because an early patient safety study indicated that the dose–response curve for NIF might be nonmonotonic at high doses [5].

One of the hindrances to the use of Bayesian methods in practice has been the lack of appropriate computational tools. Recently, the development of numerical methods based on Markov chain Monte Carlo (MCMC) has greatly broadened the scope of practical Bayesian statistics. However, MCMC has a computational overhead that may be prohibitive because of the considerable simulation work that must be carried out before trials can be run. The dose–response model therefore needed to allow a degree of analytic updating of the response curve and of the calculation of the expected utilities of each dose.

There are a number of approaches that give both flexibility and efficiency, including models based on splines and kernel regression. In ASTIN we chose the Normal Dynamic Linear Model (NDLM), which has the necessary characteristics. NDLMs were originally developed in time series [6] and combine two sources of variability: observational and system. Figure 5 illustrates a second-order polynomial NDLM. The diamonds represent observed individual responses $Y_{jk}$ at dose $Z_j$. At dose $Z_j$ a straight line is fitted through the observations, parameterized so that the expected value at $z = Z_j$ is $\theta_j$, the intercept, and the slope is $\delta_j$. If the model were a simple linear model, then the expected response at dose $Z_{j+1} = Z_j + 1$ would be $\theta_j + \delta_j$ and the slope would remain $\delta_j$. The dynamic component of the model allows these model parameters to change. The model may be written as follows.

Observation equation

$$Y_{jk} = \mu_j + \nu_{jk}, \qquad \nu_{jk} \sim N(0, V\sigma^2)$$

System equations

$$\mu_j = \mu_{j-1} + \delta_{j-1} + \omega_j, \qquad \omega_j \sim N(0, W_j\sigma^2)$$

$$\delta_j = \delta_{j-1} + \varepsilon_j, \qquad \varepsilon_j \sim N(0, W_j\sigma^2)$$

This is a flexible family of response functions in which the multipliers $W_j$ can be regarded as smoothing tuners, with small values giving a smooth response function and large values a more erratic one. Covariates $X_k$ can be introduced by making the expected responses depend linearly on the covariates,

$$E(y_{jk} \mid z = Z_j, X_k) = \mu_j = \theta_j + \beta X_k$$

and by applying the NDLM to the $\theta_j$ values. In ASTIN, baseline SSS was used as a covariate in modeling the dose–response, although not in choosing the dose to which the next patient is allocated.
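The role of $W_j$ as a smoothing tuner can be seen by drawing curves from the NDLM system equations alone. This sketch assumes a constant $W_j = w$ across doses, $\sigma^2 = 1$, and zero starting level and slope; it only simulates from the prior and does not implement the filtering updates used to fit the model. Reusing the same standard-normal draws for both values of $w$ makes the curves differ purely in scale.

```python
import random


def simulate_ndlm_curve(z, w, sigma2=1.0, mu0=0.0, delta0=0.0):
    """Draw mean responses mu_1..mu_J from the second-order NDLM system equations.

    mu_j    = mu_{j-1} + delta_{j-1} + omega_j,  omega_j ~ N(0, w * sigma2)
    delta_j = delta_{j-1} + eps_j,               eps_j   ~ N(0, w * sigma2)
    `z` holds pre-drawn standard-normal pairs (omega, eps), so curves for different
    w share the same underlying noise and differ only in scale.
    """
    sd = (w * sigma2) ** 0.5
    mu, delta, curve = mu0, delta0, [mu0]
    for z_omega, z_eps in z:
        # Right-hand side uses the previous delta, as in the system equation for mu_j.
        mu, delta = mu + delta + sd * z_omega, delta + sd * z_eps
        curve.append(mu)
    return curve


def roughness(curve):
    """Sum of squared second differences: larger means a more erratic curve."""
    return sum((curve[i + 1] - 2 * curve[i] + curve[i - 1]) ** 2 for i in range(1, len(curve) - 1))


rng = random.Random(3)
z = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(15)]  # 16 doses including placebo
smooth = simulate_ndlm_curve(z, w=0.01)
erratic = simulate_ndlm_curve(z, w=1.0)
print(round(roughness(smooth), 4), round(roughness(erratic), 4))
```

With a zero starting point the curve scales with the square root of $w$, so the roughness of the $w = 1$ curve is exactly 100 times that of the $w = 0.01$ curve.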

Figure 5 Application of a second-order NDLM to dose–response relationships

3.7 Stopping rules

In a Bayesian framework there are essentially two ways of stopping a clinical trial: one based on decision-theoretic principles and the other on an assessment of the posterior probability of clinically meaningful effects.

The difficulty with the decision approach is, first, that as more updates are performed the number of decision scenarios increases exponentially and, secondly, that the necessary calculations are computationally intensive because they are not analytically tractable. Berry et al. [7] report approximations that have been developed to make the calculations more feasible. However, in the implementation of ASTIN it was decided to use the second approach, in which the size of the effect at the ED95 relative to placebo was determined and the decision to stop was based on the magnitude of this effect.

The NDLM models response through the expected value of the response, $\mu_j$, at a number of doses ($j = 1, \ldots, J$), including placebo ($j = 1$). The dose–response curve is converted into a dose–effect curve by defining the expected difference to placebo as

$$f_j = \mu_j - \mu_1, \qquad j = 2, \ldots, J$$

To stop, we require two clinical effect sizes. The first, $c_1$, is the smallest clinical effect that we would not wish to miss, and the second, $c_0$, is the largest clinical effect that is not of interest. These define stopping criteria for satisfactory efficacy and futility. If the posterior interval for $f_k$, where $k$ indexes the dose closest to the estimated ED95 (denoted $\widehat{\mathrm{ED}}_{95}$), lies completely above $c_1$, then we would conclude that there is sufficient evidence that the effect at the ED95 is large enough to warrant going into a Phase III trial. Conversely, if the posterior interval for $f_k$ lies completely below $c_0$, we could conclude that there is sufficient evidence that the effect at the ED95 is too small to warrant continuing ASTIN or the NIF development program. A hypothetical example in which there is sufficient evidence to start a Phase III program is shown in Figure 6.
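The stopping rule can be sketched as a comparison of a central posterior interval for $f_k$ with the two thresholds. The 95% interval level, the thresholds $c_0 = 2$ and $c_1 = 3$ SSS points, and the independent normal posterior draws below are all hypothetical choices for illustration; the text does not give the interval level or threshold values used in ASTIN.

```python
import random


def stopping_decision(effect_draws, c0, c1, level=0.95):
    """Apply the interval rule to posterior draws of f_k = mu_k - mu_1.

    Efficacy if the whole central interval lies above c1; futility if it lies below c0.
    Interval limits are crude order statistics of the sorted draws.
    """
    draws = sorted(effect_draws)
    n = len(draws)
    tail = (1 - level) / 2
    lower = draws[int(tail * (n - 1))]
    upper = draws[int((1 - tail) * (n - 1))]
    if lower > c1:
        return "stop for efficacy"
    if upper < c0:
        return "stop for futility"
    return "continue"


rng = random.Random(11)
n = 10_000
# Hypothetical posterior draws for placebo (mu_1) and the dose nearest the ED95 (mu_k).
mu_1 = [rng.gauss(17.0, 0.5) for _ in range(n)]
flat_k = [rng.gauss(17.3, 0.5) for _ in range(n)]
strong_k = [rng.gauss(22.0, 0.5) for _ in range(n)]

c0, c1 = 2.0, 3.0   # hypothetical futility and efficacy thresholds (SSS points)
print(stopping_decision([m - p for m, p in zip(flat_k, mu_1)], c0, c1))
print(stopping_decision([m - p for m, p in zip(strong_k, mu_1)], c0, c1))
```

An interval straddling either threshold returns "continue", which corresponds to carrying on with adaptive allocation.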

4 Simulating the design

Having developed such a complex design, were we in a position to implement it immediately? The answer is no, since there were a number of interested parties to persuade that the approach was appropriate and feasible. We needed to choose the appropriate settings for the algorithm. We needed to assure ourselves that the design had acceptable operating characteristics, not only in terms of false positive and false negative rates but also in terms of the aspects of the design that were considered important. Could the algorithm learn appropriately? Could it accurately estimate the dose–response relationship? Did the adaptive allocation result in a sensible choice of doses? Could we stop early? Were there benefits compared to a traditional design? Answers to these questions were needed to convince senior management in the company that the design was worthwhile and acceptable to the regulatory authorities, and to convince the regulatory authorities in both North America and Europe that the approach was scientifically sound and that the computer systems that were developed were appropriately validated. All of this was achieved by simulation. In practice, the simulations were conducted in two stages. In the first stage, a fractional factorial computer experiment was conducted to optimize the parameter settings for the algorithm. In the second stage, the operating characteristics of the design were determined based on the parameters chosen in the first stage.

To illustrate, Figure 7 displays a series of snapshots from a simulation of a single trial in which the true underlying dose–response (indicated by the solid diamonds) corresponded to a logistic-type curve giving a maximum benefit over placebo of eight points change from baseline on the SSS. Figure 7A shows the prior, which can be characterized as reflecting a belief that if a dose–response exists it is minimal and gradual, but that there is great uncertainty. Figures 7B to 7H show the updated estimate of the dose–response curve (solid line) and its associated uncertainty (dashed line) after 25, 50, 100, 200, 500 patients. On each graph the circles denote the observed patient responses, the arrows the doses that have already been allocated, and the dotted line a locally weighted fit (LOWESS) through the observed responses. These figures suggest that the algorithm can accurately estimate the dose–response relationship, in that the estimate converges to the truth and the uncertainty reduces rapidly. They also suggest that it can learn appropriately and consequently choose appropriate doses, as illustrated by the low density of patients allocated to the bottom four doses (indeed, the second dose remains unused) as well as to the doses on the plateau.

Figure 6 Use of a dose–effect curve to determine stopping

Hundreds of thousands of such simulations were conducted so that the algorithm could be appropriately tuned. One of the main items of interest was how the adaptive design would compare to a traditional design. Table 1 illustrates the results of the simulations. If interest centered on identifying the ED95 from a curve that plateaued at 2, 3 or 4 points benefit over placebo, then for 80% power approximately 2400, 1000 and 600 patients are required, respectively; for 90% power the corresponding numbers are 3200, 430 and 800. For an adaptive design based on a maximum of 1000 evaluable patients, the false positive rate is controlled at less than 5%, while the "power" to detect effects of 3 and 4 points is acceptably high with small numbers of patients. Of course, the comparison to the traditional design is not completely fair, because the use of traditional sequential designs would reduce the average numbers of patients.

Figure 7 Simulation of a single clinical trial

5 The results of the ASTIN study

In ASTIN, patients who had suffered an acute ischemic stroke, who arrived in the emergency room of the hospital within 6 hours of its onset, and who had a baseline rating on the Scandinavian Stroke Scale (SSS) of between 10 and 40 points were eligible for entry. Details of the exclusion criteria can be found in Krams et al. [8].

The study was run by an executive steering committee and an Independent Data Monitoring Committee (IDMC) with an expert Bayesian statistician, who were responsible for ensuring the safety of the patients and the integrity of the study, monitoring the performance of the algorithm, and confirming decisions to continue or stop the study. A number of formal meetings of the IDMC were prespecified, for which reports were prepared by an independent statistician, and the computer system was run independently of the sponsor by Tessella Ltd.

A total of 966 patients were randomized, of whom 26% received placebo. There was preferential allocation to the top three doses (Figure 8), and 40% of patients were allocated to them.

Figure 9 illustrates the course of ASTIN in terms of the estimate of the dose–response relationship. In the frame corresponding to Week 0, the prior estimate of the dose–response is shown to be flat, with an expectation of 10 points change from baseline on the SSS. This placebo effect size was estimated from the data in the CSD study. The following six frames show estimates of the dose–response at 8-week intervals up to 48 weeks. What is already apparent by 16 weeks is that the response on placebo (19 points) is far higher than anticipated from the CSD study, and this persists during the course of the whole study. At week 32 the placebo effect is still much higher than anticipated, but there is little indication of a true dose–response. At week 48, the first opportunity at which the IDMC could stop the trial, it was stopped for futility. At this point recruitment was halted, but the protocol required that the patients who had already been entered should continue to be monitored for the full 13 weeks of the study. The study was finally completed 66 weeks after it started, at which point there was little indication of a positive dose–response.

The final estimate of the dose–response is magnified in Figure 10, and there are a number of points to consider. First, as noted above, the placebo response was far greater, at 17 points, than the anticipated 10 points. The trial was designed to detect a 3-point benefit over placebo, which is clearly far greater than the estimated effect at any dose. This final model contained a single covariate, baseline severity. After the trial was completed, a number of predefined important prognostic covariates, including tPA, were added to the model, but none materially changed the final conclusions. There was also no indication of any issues with adverse events or serious adverse events. The only dose-related effect that was seen was in the antibodies to neutrophil inhibitory factor (NIF) [8].

6 Post-trial learnings

At the conclusion of ASTIN, a number of investigations were undertaken to assess the running of the trial and the algorithm.

Table 1 Operating characteristic of traditional and adaptive designs

Benefit over      Traditional design:             Adaptive design (max 1000 pts)
placebo           patients needed for power of
(SSS points)      80%        90%                  Stopped for efficacy    Median number of patients
0                 –          –                    0.02                    501
2                 2432       3220                 0.56                    644
3                 1080       432                  0.90                    416
4                 608        808                  0.95                    280

Figure 8 Patient allocation by dose group

6.1 Operational investigations

Throughout the planning of ASTIN we anticipated a particular recruitment speed; in the event, the achieved rate was double that. Centers themselves were enthusiastic about the unique aspects of the design, a phenomenon reported in a previous adaptive design [9]. One consequence of faster recruitment is that learning cannot take place efficiently. Our investigations suggest that a lower recruitment rate would have caused the trial to stop with fewer patients recruited. It would have been preferable to work with fewer centers, aiming to achieve the optimal recruitment speed after an initial ramp-up time.

Figure 9 Posterior estimate of the dose–response function during the course of ASTIN (solid line: posterior estimate; dashed line: posterior uncertainty)

A second issue relates to the exchangeability of the centers. In ASTIN there were 100 centers worldwide. Fewer centers would not only have brought us closer to an optimal recruitment speed; they would most likely also have reduced variability.

6.2 Statistical investigations

6.2.1 Longitudinal model

Figure 11A shows the average actual response for patients who completed the trial (dashed line), together with the average imputed responses from the longitudinal model (solid line) using the prior estimate of the longitudinal model from the CSD, plotted over time. Clearly, the imputed values are considerably higher than the actual responses. Figure 11B shows the same plot using the imputed values obtained from the updated longitudinal model. Here we see that once the longitudinal model begins to be updated, the imputed values become closer to the actual observed responses, illustrating that the CSD prior was not very good. Why?

Figure 10 Final posterior estimate of the dose–response function

Figure 11 Actual and imputed responses in ASTIN over time

The parameterization of the longitudinal model that was implemented in ASTIN had the form

$$y_i = a_i + b_i\, y_{i+1}$$

where yi is the response at week i There are twoissues with this parameterization First the para-meters of the longitudinal model (ai and bi) arelinked and if one pair of parameters is updated allthe others are too Secondly each pair may be esti-mated in different categories of patients To see thiswe consider the acute stroke unit in CopenhagenThe 1351 patients who went through the unit in thetwo years of collection did not all stay for the sametime Those most severely impaired by their strokestayed there the longest If such patients were usedto predict the progress from say weeks 11 to 12 theyare not necessarily representative of the wholepopulation and the parameter-linkage will have animpact on all other parts of the longitudinal model

622 The estimate of the ED95

Consideration of Figure 10 might suggest that the ED95

for ASTIN should be of the order of 110 mg In fact theposterior estimate of the ED95 was 54 mg [8] Why

Conventionally the Bayesrsquo estimate of a para-meter is its posterior mean and this was used inASTIN Figure 12 displays the posterior distribution

of the ASTIN-ED95 ndash it is clearly bimodal It can beargued that in the context of futility a bimodalposterior distribution for the ED95 is not unex-pected since the posterior estimate of the dosendashresponse curve will consist of a series of randombumps This raises some issues

First in the context of futility the concept of anED95 is meaningless Secondly given that futility is apossibility ndash particularly with the history of stroketrials in neuroprotection ndash it is preferable to use theposterior mode rather than mean Alternatively anestimate could be obtained from the posteriorestimate of the dosendashresponse curve itself ratherthan average of the ED95 values from eachindividual MCMC iterate This suggestion wasconsidered during the course of ASTIN as theIDMC-statistician identified the anomaly pointedto above IDMC-inspired investigations showed thatan algorithm based on the average curve hadessentially the same properties as the ASTIN-implemented algorithm and that the final infer-ences would not have been fundamentallychanged

7 Discussion

We have shown that adaptive designs with dynamictermination rules can be successfully implementedin phase II proof-of-concept trials offering import-ant advantages such as improved learning aboutthe dosendashresponse and potentially earlier stoppingIn future applications it will be important to buildmore reliable longitudinal models and to bettercontrol recruitment speed aiming for a rate thatallows the system to perform optimally

Up front investment is required to allow designingthe trial simulating its characteristics and implement-ing it However the payoff goes beyond applying amore informative design Investigators participatingin ASTIN were extremely interested in the trialmethodology and regulatory agencies reviewing ourapproach were supportive of the concept of adaptivetreatment allocation and dynamic termination rulesin phase II We are currently engaged in implementingsimilar designs in areas beyond stroke

References

1 Lesko LJ Rowland M Peck CC Blaschke TFOptimizing the science of drug devlopment opportu-nities for better candidate selection and acceleratedevaluation in humans J Clin Pharmacol 2000 40 803ndash14

2 Sheiner LB Learning versus confirming in clinical drugdevelopment Clin Pharmacol Therapeut 1997 61 275ndash91

3 Muller H-G Schmitt T Choice of number of doses formaximum likelihood estimation of the ED50 for quantaldosendashresponse data Biometrics 1990 46 117ndash29Figure 12 The posterior distribution of the ED95

350 AP Grieve and M Krams

Clinical Trials 2005 2 340ndash351 wwwSCTjournalcom

4 Jorgensen HS Nakayama H Raaschou HO Vive-Larsen J Stoier M Olsen TS Outcome and time courseof recovery in stroke The Copenhagen Stroke StudyArchives Physical Medical Rehabilitation 1995 76399ndash412

5 Lees KR Diener H-C Asplund K Krams M UK279276 a neutrophil inhibitory glycoprotein in acutestroke tolerability and pharmacokinetics Stroke 2003 341704ndash709

6 West M Harrison PJ Bayesian forecasting anddynamic models 2nd edition New York SpringerVerlag 1997

7 Berry DA Muller P Grieve AP et al Adaptive Bayesiandesigns for dose-ranging trials In Carlin B Carriquiry AGatsonis C et al eds Case studies in Bayesian statistics VNew York Springer Verlag 2002 99ndash181

8 Krams M Lees KR Hacke W Grieve AP OrgogozoJ-M Ford GA Acute stroke therapy by inhibition of neu-trophils (ASTIN) An adaptive dosendashresponse study of UK-279276 in acute ischaemic stroke Stroke 2003 34 2543ndash48

9 Tamura RN Faries DE Andersen JS HeiligensteinJH A case study of an adaptive treatment allocation in aclinical trial in the treatment of out-patients withdepressive disorder J Am Statist Assoc 89 768ndash76

ASTIN a Bayesian adaptive dosendashresponse trial 351

wwwSCTjournalcom Clinical Trials 2005 2 340ndash351

Page 3: ASTIN: a Bayesian adaptive dose response trial in acute stroke · 0 ,1 ,3 ,4 , allowing any dose in the range of 0 8 to be studied. The advantage of increasing the number of doses

a given size A second issue is illustrated in Figure 2BPatients allocated to the first four doses excludingplacebo are wasted since their response willessentially be the same as the response of patientsto placebo Similarly patients allocated to the topthree doses will respond similarly to patientsreceiving the first dose on the plateau Ideally ifthe shape and the position of the dosendashresponsefunction were known then patients could beallocated to placebo or to the four doses spanningthe steep portion of the dosendashresponse curve Thiswould be true whether the position of the curve onthe interval was as in Figure 2B or 2C

212 Adaptive allocation

While at the start of a trial we may know little aboutthe position of the curve within the dose interval asthe trial progresses information accrues as to theresponse of patients to differing doses and we canlearn about the position and use this information toadapt allocation For example if we learned that thedosendashresponse curve was as in Figure 2B we couldreduce the chance of allocating patients to the lowdoses or to the very high doses whereas if it was asin Figure 2C we would only need to allocate toplacebo and the lowest four doses

213 Early stopping

As part of learning about the shape of the dosendashresponse curve we are also able to make decisionsabout stopping early For example if there is little orno evidence to show that the drug has a real effectthen we should stop as continuing would be futileSimilarly there may be enough evidence to identifya dose with sufficient efficacy and an adequatesafety profile to warrant going into a Phase III trial

214 Seamless designs

During thedevelopment of thisdesign itwas intendedthat if early stopping for efficacy took place transitionto the subsequent Phase III trial would take placeseamlessly There are clear advantages to such aproposal both in terms of time to approval if the drugwere effective but also in terms of continuity ofrecruitment This approach was not adopted forASTIN because of objections both from the regulatoryauthority as well as within the company

3 The ASTIN design process

Assume that the study is already running and thatinformation has accrued on the shape and location

of the dosendashresponse function allowing adaptiverandomization The design process is illustrated inFigure 3 in which four distinct elements areidentified randomization prediction decisionmaking and dose allocation

31 Randomization

A new patient entering the study is assigned toplacebo or an active dose predetermined tomaximize learning about that aspect of the dosendashresponse function that is of interest Patients areallocated at a minimum rate to placebo throughoutthe trial to protect against a drift in the patientpopulation that can lead to biased estimates of thedosendashresponse function By maintaining a mini-mum allocation to placebo we can fit time as anexplanatory variable in the analysis and amelioratethe influence of population drift

3.2 Prediction

In acute stroke trials, it is standard to measure a patient's response to treatment three months post-stroke. If we wait 90 days to determine each patient's response to a dose, many patients must be randomized before we have learnt how to allocate them optimally to a dose. One approach is to predict a patient's outcome at 90 days from an early outcome or a surrogate outcome. Figure 4 shows data taken from the Copenhagen Stroke Database (CSD) [4], which contains data from 1351 pharmacologically untreated stroke patients entering an acute stroke unit (ASU) in Copenhagen between September 1991 and September 1993. Figure 4A shows the relationship between the Scandinavian Stroke Scale (SSS) score at admission to the ASU and the corresponding score at discharge from the ASU. The SSS was the primary endpoint used in ASTIN. It measures neurological function, with zero representing a comatose patient and a score of 58 corresponding to no neurological deficit. There is only a weak relation between the SSS scores at admission and at discharge: mild stroke patients are discharged with little neurological deficit, and severe stroke patients with a substantial residual deficit. Figures 4B, 4C and 4D show similar relationships between the SSS scores at 1, 4 and 8 weeks post-stroke and the SSS score at discharge. As time progresses, the SSS score at an intermediate time point becomes increasingly strongly related to the score at discharge. These data were used to establish a set of linear regressions, called the longitudinal model, to predict final outcome from early measurements. To use this model in ASTIN, we required immediate access to early response data. This was achieved by building an electronic data interface to the clinical sites.

Figure 3 General design process for response-adaptive trials
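The longitudinal model described above can be sketched as one simple least-squares regression per assessment week. Each regression predicts the final (discharge) SSS score from the score at that week. This sketch assumes ordinary least squares and uses a handful of made-up score pairs purely for illustration. The CSD data are not reproduced, and the exact model form used in ASTIN may differ. All function names are hypothetical.

```python
from statistics import fmean


def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def fit_longitudinal_model(early_scores_by_week, final_scores):
    """One regression per week: final score ~ a_w + b_w * score at week w."""
    return {week: fit_line(scores, final_scores)
            for week, scores in early_scores_by_week.items()}


def predict_final(model, week, early_score):
    """Impute the final score from the most recent early assessment."""
    a, b = model[week]
    return a + b * early_score


# Hypothetical SSS scores (0-58) for five patients at weeks 1 and 4, and at discharge.
early = {1: [20, 30, 35, 40, 50], 4: [25, 36, 40, 46, 54]}
final = [28, 40, 44, 50, 57]
model = fit_longitudinal_model(early, final)
print(round(predict_final(model, week=4, early_score=45), 1))
```

In a Bayesian version these coefficients would carry prior distributions (for example, centred on the CSD estimates) and be updated as trial data arrive, rather than being fixed least-squares values.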

3.3 Updating estimated dose–response

Following the prediction of final outcome from early response, the estimated dose–response function was updated by Bayes' theorem. When updating the estimate of the dose–response curve, a Bayesian analysis appropriately accounts for all sources of uncertainty. In particular, it accounts for the uncertainty associated with predicting patient outcome from early outcomes. The Bayesian approach also allowed us to update the longitudinal model as data from ASTIN became available.

3.4 Decision making

Having updated the estimate of the dose–response, we can now make decisions about the future conduct of the trial. There are three possibilities. First, there may be sufficient evidence to decide that no dose of NIF gives enough efficacy to take into Phase III, leading to the halting of ASTIN and of the development program. Secondly, there may be sufficient evidence to identify a dose with an appropriate risk/benefit profile, allowing ASTIN to be stopped and the planning of Phase III to begin. If neither decision can be made, ASTIN continues with adaptive allocation and learning about the dose–response function.

3.5 Dose allocation

If ASTIN continues, the next patient receives the dose that maximizes learning about the aspect of the dose–response function that is of primary interest. Based on simulation, we chose the dose that minimizes the predicted variance of the response at the ED95. In other words, we chose the dose that we expect to lead to the most precise estimate of the response at the ED95, a measure of its expected utility.

Figure 4 Relationship between early response and final outcome in untreated acute stroke patients
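The variance-minimizing rule can be illustrated with a jointly Gaussian model for the mean responses at a grid of doses. This is a minimal sketch under several assumptions:

- The index of the dose closest to the current ED95 estimate (`target`) is treated as known.
- The sketch minimizes the variance of the mean response at that dose, not of its difference from placebo.
- The observation noise variance and the squared-exponential prior covariance are hypothetical choices.

ASTIN's own utility calculation is not reproduced here, and all names are illustrative.

```python
import math


def smooth_prior_cov(n_doses, tau2=4.0, length=2.0):
    """Hypothetical squared-exponential prior covariance between dose-level means."""
    return [[tau2 * math.exp(-((i - j) ** 2) / (2 * length ** 2)) for j in range(n_doses)]
            for i in range(n_doses)]


def condition_on_observation(cov, dose, noise_var):
    """Posterior covariance after one noisy response at `dose` (Gaussian conditioning).

    The posterior covariance does not depend on the observed value, so the benefit
    of a future observation can be computed before the patient is treated.
    """
    denom = cov[dose][dose] + noise_var
    n = len(cov)
    return [[cov[i][j] - cov[i][dose] * cov[dose][j] / denom for j in range(n)]
            for i in range(n)]


def variance_after_one_more(cov, target, dose, noise_var):
    """Predicted variance of the mean response at `target` after one more patient at `dose`."""
    return cov[target][target] - cov[target][dose] ** 2 / (cov[dose][dose] + noise_var)


def choose_next_dose(cov, target, noise_var):
    """Dose index whose next observation minimizes the predicted variance at `target`."""
    return min(range(len(cov)),
               key=lambda d: variance_after_one_more(cov, target, d, noise_var))


# Hypothetical usage: 9 dose levels, ED95 estimate at index 6,
# and 20 patients already observed at the top dose (index 8).
cov = smooth_prior_cov(9)
for _ in range(20):
    cov = condition_on_observation(cov, dose=8, noise_var=25.0)
print("next dose index:", choose_next_dose(cov, target=6, noise_var=25.0))
```

The rank-one update shows why allocation can move away from heavily sampled doses. An observation at a dose helps only through its covariance with the target, and a dose that is already well estimated gains little from one more patient.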

3.6 Dose–response model

In general terms, a dose–response model must relate the expected response at a given dose to a set of parameters and, possibly, covariates. Dose–response models are usually restricted to be monotonic. In ASTIN, a more flexible model allowing nonmonotonicity was needed, because an early patient safety study had indicated that the dose–response curve for NIF might be nonmonotonic at high doses [5].

One of the hindrances to the use of Bayesian methods in practice has been the lack of appropriate computational tools. Recently, numerical methods based on Markov chain Monte Carlo (MCMC) have greatly broadened the scope of practical Bayesian statistics. However, MCMC carries a computational overhead that may be prohibitive, because considerable simulation work must be carried out before a trial can be run. The dose–response model therefore needed to allow a degree of analytic updating, both of the response curve and of the expected utility of each dose.

Several approaches offer both flexibility and efficiency, including models based on splines and kernel regression. In ASTIN we chose the Normal Dynamic Linear Model (NDLM), which has the necessary characteristics. NDLMs were originally developed for time series [6] and combine two sources of variability: observational and system. Figure 5 illustrates a second-order polynomial NDLM. The diamonds represent observed individual responses $Y_{jk}$ at dose $Z_j$. At dose $Z_j$, a straight line is fitted through the observations. The line is parameterized so that the expected value at $z = Z_j$ is $\mu_j$, the intercept, and the slope is $\delta_j$. If the model were a simple linear model, the expected response at dose $Z_{j+1} = Z_j + 1$ would be $\mu_j + \delta_j$, and the slope would remain $\delta_j$. The dynamic component of the model allows these parameters to change from dose to dose. The model may be written as follows:

Observation equation

$$Y_{jk} = \mu_j + \nu_{jk}, \qquad \nu_{jk} \sim N(0, V\sigma^2)$$

System equations

$$\mu_j = \mu_{j-1} + \delta_{j-1} + \omega_j, \qquad \omega_j \sim N(0, W_j\sigma^2)$$

$$\delta_j = \delta_{j-1} + \epsilon_j, \qquad \epsilon_j \sim N(0, W_j\sigma^2)$$

This is a flexible family of response functions in which the multipliers $W_j$ can be regarded as smoothing tuners. Small values give a smooth response function and large values a more erratic one. Covariates $X_k$ can be introduced by making the expected responses depend linearly on the covariates,

$$E(Y_{jk} \mid z = Z_j, X_k) = \mu_{jk} = \mu_j + \beta X_k$$

and by applying the NDLM to the $\mu_j$ values. In ASTIN, baseline SSS was used as a covariate in modeling the dose–response, but not in choosing the dose to which the next patient is allocated.
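A minimal forward-filtering (Kalman filter) sketch of the second-order NDLM above, under these assumptions:

- Dose levels are equally spaced, so $Z_{j+1} = Z_j + 1$.
- $\sigma^2$ is known.
- There are no covariates.
- The same multiplier $W_j$ applies to both system equations, as written.

Only the forward pass is shown. A full dose–response estimate would add a backward smoothing pass so that every dose uses all the data. The function and parameter names are illustrative, not taken from the ASTIN software.

```python
def matmul(A, B):
    """Product of two small dense matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


# System matrix for unit dose spacing: the level moves by the slope, the slope persists.
G = [[1.0, 1.0], [0.0, 1.0]]


def ndlm_forward_filter(obs_by_dose, m0, C0, V, W, sigma2=1.0):
    """Filter the state (mu_j, delta_j) across dose levels j = 1..J.

    obs_by_dose: one list of responses Y_jk per dose level (may be empty)
    m0, C0: prior mean [mu_1, delta_1] and 2x2 prior covariance at the first dose
    V: observational variance multiplier; W: system multipliers W_j (W[0] unused)
    Returns the filtered means and covariances, one per dose level.
    """
    means, covs = [], []
    m, C = list(m0), [row[:] for row in C0]
    for j, ys in enumerate(obs_by_dose):
        if j > 0:
            # System equations: carry (mu, delta) from dose j-1 to dose j and add
            # independent N(0, W_j * sigma2) disturbances to level and slope.
            m = [m[0] + m[1], m[1]]
            C = matmul(matmul(G, C), transpose(G))
            C[0][0] += W[j] * sigma2
            C[1][1] += W[j] * sigma2
        for y in ys:
            # Observation equation: Y_jk = mu_j + nu_jk, nu_jk ~ N(0, V * sigma2).
            q = C[0][0] + V * sigma2           # predictive variance of y
            gain = [C[0][0] / q, C[1][0] / q]  # Kalman gain
            err = y - m[0]
            m = [m[0] + gain[0] * err, m[1] + gain[1] * err]
            C = [[C[r][c] - gain[r] * C[0][c] for c in range(2)] for r in range(2)]
        means.append(m[:])
        covs.append([row[:] for row in C])
    return means, covs


# Hypothetical usage: five equally spaced doses, the second one never allocated.
data = [[9.5, 10.4, 10.1], [], [14.2, 13.7], [15.9, 16.4, 16.1], [15.8, 16.3]]
means, covs = ndlm_forward_filter(data, m0=[10.0, 0.0], C0=[[25.0, 0.0], [0.0, 4.0]],
                                  V=4.0, W=[0.0] + [0.5] * 4)
for j, (mu, delta) in enumerate(means, start=1):
    print(f"dose {j}: filtered mean response {mu:.2f}, slope {delta:.2f}")
```

Setting every $W_j$ to zero reduces the filter to a Bayesian straight-line fit. Larger $W_j$ let the level and slope drift between doses, and this drift is what permits a nonmonotonic curve. A dose with no observations is simply carried forward by the system equations.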

Figure 5 Application of a second-order NDLM to dose–response relationships


3.7 Stopping rules

In a Bayesian framework there are essentially two ways of stopping a clinical trial. One is based on decision-theoretic principles. The other is based on an assessment of the posterior probability of clinically meaningful effects.

The decision approach has two difficulties. First, the number of decision scenarios increases exponentially as more updates are performed. Secondly, the necessary calculations are computationally intensive because they are not analytically tractable. Berry et al. [7] report approximations that have been developed to make the calculations more feasible. However, ASTIN was implemented with the second approach. The size of the effect at the ED95 relative to placebo was determined, and the decision to stop was based on the magnitude of this effect.

The NDLM models response through the expected value of the response, $\mu_j$, at a number of doses ($j = 1, \ldots, J$), including placebo ($j = 1$). The dose–response curve is converted into a dose–effect curve by defining the expected difference from placebo as

$$\phi_j = \mu_j - \mu_1 \qquad (j = 2, \ldots, J)$$

To stop, we require two clinical effect sizes. The first, $c_1$, is the smallest clinical effect that we would not wish to miss. The second, $c_0$, is the largest clinical effect that is not of interest. These define the stopping criteria for satisfactory efficacy and for futility. Let $k$ index the dose closest to the estimated ED95, denoted $\widehat{\mathrm{ED}}_{95}$. If the posterior interval for $\phi_k$ lies completely above $c_1$, we would conclude that there is sufficient evidence that the effect at $\widehat{\mathrm{ED}}_{95}$ is large enough to warrant going into a Phase III trial. Conversely, if the posterior interval for $\phi_k$ lies completely below $c_0$, we could conclude that there is sufficient evidence that the effect at $\widehat{\mathrm{ED}}_{95}$ is too small to warrant continuing ASTIN or the NIF development program. Figure 6 shows a hypothetical example in which there is sufficient evidence to start a Phase III program.
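The two-threshold rule can be written as a small function. This sketch assumes a normal approximation to the posterior of $\phi_k$ and a 95% central posterior interval. The paper does not state which interval level ASTIN used, and the $c_0$ and $c_1$ values in the usage lines are hypothetical. The function names are illustrative.

```python
from statistics import NormalDist


def dose_effect(mu):
    """phi_j = mu_j - mu_1 for the active doses (mu[0] is the placebo mean)."""
    return [m - mu[0] for m in mu[1:]]


def stopping_decision(phi_mean, phi_sd, c0, c1, level=0.95):
    """Compare a central posterior interval for phi_k with the thresholds c0 < c1.

    Assumes phi_k is approximately normal a posteriori (an illustrative simplification).
    Returns 'efficacy', 'futility' or 'continue'.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower, upper = phi_mean - z * phi_sd, phi_mean + z * phi_sd
    if lower > c1:   # whole interval above the smallest effect worth detecting
        return "efficacy"
    if upper < c0:   # whole interval below the largest effect not of interest
        return "futility"
    return "continue"


# Hypothetical thresholds on the SSS scale: c0 = 1 point, c1 = 2 points.
print(dose_effect([10.0, 11.5, 13.0]))
print(stopping_decision(5.0, 1.0, c0=1.0, c1=2.0))
print(stopping_decision(0.0, 0.3, c0=1.0, c1=2.0))
print(stopping_decision(2.0, 1.0, c0=1.0, c1=2.0))
```

Because $c_0 < c_1$, the two conditions can never hold at the same time. A wide interval that straddles both thresholds means the trial simply continues.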

4 Simulating the design

Having developed such a complex design, were we in a position to implement it immediately? No: we first had to persuade a number of interested parties that the approach was appropriate and feasible. We needed to choose appropriate settings for the algorithm. We also needed to assure ourselves that the design had acceptable operating characteristics. These covered false positive and false negative rates, and also the aspects of the design that were considered important:

- Could the algorithm learn appropriately?
- Could it accurately estimate the dose–response relationship?
- Did the adaptive allocation result in a sensible choice of doses?
- Could we stop early?
- Were there benefits compared to a traditional design?

We needed answers to these questions for two audiences. Senior management in the company had to be convinced that the design was worthwhile and would be acceptable to the regulatory authorities. The regulatory authorities in both North America and Europe had to be convinced that the approach was scientifically sound and that the computer systems developed for it were appropriately validated. All of this was achieved by simulation, conducted in two stages. In the first stage, a fractional factorial computer experiment was run to optimize the parameter settings of the algorithm. In the second stage, the operating characteristics of the design were determined using the parameters chosen in the first stage.

To illustrate, Figure 7 displays a series of snapshots from a simulation of a single trial. In this simulation, the true underlying dose–response (solid diamonds) was a logistic-type curve giving a maximum benefit over placebo of eight points change from baseline on the SSS. Figure 7A shows the prior. It reflects a belief that if a dose–response exists, it is minimal and gradual, but that there is great uncertainty. Figures 7B to 7H show the updated estimate of the dose–response curve (solid line) and its associated uncertainty (dashed line) after 25, 50, 100, 200, 500 patients. On each graph, the circles denote the observed patient responses and the arrows the doses that have already been allocated. The dotted line is a locally weighted fit (LOWESS) through the observed responses. These figures suggest that the algorithm can accurately estimate the dose–response relationship: the estimate converges to the truth and the uncertainty reduces rapidly. They also suggest that the algorithm learns appropriately and consequently chooses appropriate doses. Few patients are allocated to the bottom four doses (indeed, the second dose remains unused) or to the doses on the plateau.

Figure 6 Use of a dose–effect curve to determine stopping

Hundreds of thousands of such simulations were conducted so that the algorithm could be appropriately tuned. One of the main items of interest was how the adaptive design would compare to a traditional design. Table 1 illustrates the results of the simulations. Suppose interest centered on identifying the ED95 from a curve that plateaued at 2, 3 or 4 points benefit over placebo. A traditional design would then require approximately 2400, 1000 and 600 patients, respectively, for 80% power, and 3200, 430 and 800 patients for 90% power. For an adaptive design with a maximum of 1000 evaluable patients, the false positive rate is controlled at less than 5%. At the same time, the effective "power" for 3 and 4 points is acceptably high with small numbers of patients. Of course, the comparison with the traditional design is not completely fair, because traditional sequential designs would reduce the average number of patients.

Figure 7 Simulation of a single clinical trial
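Operating characteristics of this kind are estimated by brute-force simulation. The sketch below is deliberately much simpler than the ASTIN algorithm. It assumes:

- a hypothetical two-arm comparison at a single dose;
- a normal outcome with known standard deviation and a flat prior, so the posterior for the effect is normal around the observed difference;
- interim looks every 100 patients per arm;
- the $c_0$/$c_1$ posterior-interval stopping rule described earlier.

All numerical settings and names are assumptions chosen for illustration, not the values used in ASTIN.

```python
import random
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # half-width multiplier of a 95% central interval


def simulate_trial(true_effect, rng, sd=10.0, batch=100, max_per_arm=500, c0=1.0, c1=2.0):
    """Simulate one two-arm trial with an interim look after every `batch` patients per arm.

    Returns (decision, patients per arm at stopping).
    """
    sum_ctl = sum_trt = 0.0
    n = 0
    while n < max_per_arm:
        # The mean of `batch` normal outcomes is itself normal, so draw it directly.
        sum_ctl += batch * rng.gauss(0.0, sd / batch ** 0.5)
        sum_trt += batch * rng.gauss(true_effect, sd / batch ** 0.5)
        n += batch
        diff = (sum_trt - sum_ctl) / n
        se = sd * (2.0 / n) ** 0.5   # posterior sd of the effect under a flat prior
        if diff - Z95 * se > c1:
            return "efficacy", n
        if diff + Z95 * se < c0:
            return "futility", n
    return "max reached", n


def operating_characteristics(true_effect, n_sims=2000, seed=1):
    """Estimate P(stop for efficacy) and the mean sample size per arm."""
    rng = random.Random(seed)
    results = [simulate_trial(true_effect, rng) for _ in range(n_sims)]
    p_eff = sum(d == "efficacy" for d, _ in results) / n_sims
    mean_n = sum(n for _, n in results) / n_sims
    return p_eff, mean_n


for effect in (0.0, 2.0, 4.0):
    p_eff, mean_n = operating_characteristics(effect)
    print(f"true effect {effect}: P(stop for efficacy) = {p_eff:.3f}, "
          f"mean patients per arm = {mean_n:.0f}")
```

Run at a true effect of zero, the efficacy-stopping probability estimates the false positive rate. At nonzero effects it plays the role of the "stopped for efficacy" column of Table 1. This sketch reports a mean rather than a median sample size.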

5 The results of the ASTIN study

ASTIN enrolled patients who had suffered an acute ischemic stroke and who met two further conditions. They arrived in the emergency room of the hospital within 6 hours of stroke onset, and they had a baseline rating on the Scandinavian Stroke Scale (SSS) of between 10 and 40 points. Details of the exclusion criteria can be found in Krams et al. [8].

The study was run by an executive steering committee and an Independent Data Monitoring Committee (IDMC) that included an expert Bayesian statistician. These committees were responsible for ensuring the safety of the patients and the integrity of the study, for monitoring the performance of the algorithm, and for confirming decisions to continue or stop the study. A number of formal IDMC meetings were prespecified, and an independent statistician prepared the reports for them. The computer system was run independently of the sponsor by Tessella Ltd.

A total of 966 patients were randomized, of whom 26% received placebo. Allocation favored the top three doses (Figure 8), which received 40% of patients.

Figure 9 illustrates the course of ASTIN in terms of the estimate of the dose–response relationship. The frame for Week 0 shows the prior estimate of the dose–response, which is flat with an expectation of 10 points change from baseline on the SSS. This placebo effect size was estimated from the CSD data. The following six frames show estimates of the dose–response at 8-week intervals up to 48 weeks. By 16 weeks it is already apparent that the response on placebo (19 points) is far higher than anticipated from the CSD, and this persists throughout the study. At week 32, the placebo effect is still much higher than anticipated, and there is little indication of a true dose–response. At week 48, the first opportunity at which the IDMC could stop the trial, it was stopped for futility. Recruitment was halted at this point. However, the protocol required that patients already entered continue to be monitored for the full 13 weeks of the study. The study was finally completed 66 weeks after it started, at which point there was little indication of a positive dose–response.

Figure 10 shows a magnified view of the final estimate of the dose–response, and several points deserve comment. First, as noted above, the placebo response was far greater than anticipated: 17 points rather than 10. The trial was designed to detect a 3-point benefit over placebo, which is clearly far greater than the estimated effect at any dose. The final model contained a single covariate, baseline severity. After the trial was completed, a number of predefined important prognostic covariates, including tPA, were added to the model, but none materially changed the final conclusions. There was also no indication of any issues with adverse events or serious adverse events. The only dose-related effect seen was in antibodies to neutrophil inhibitory factor (NIF) [8].

6 Post-trial learnings

At the conclusion of ASTIN, a number of investigations were undertaken to assess the running of the trial and the algorithm.

Table 1 Operating characteristics of traditional and adaptive designs

Benefit over   Traditional design:      Adaptive design (max 1000 pts)
placebo        patients required for    Stopped for    Median number
(points)       80% power   90% power    efficacy       of patients
0              –           –            0.02           501
2              2432        3220         0.56           644
3              1080        432          0.90           416
4              608         808          0.95           280

Figure 8 Patient allocation by dose group


6.1 Operational investigations

Throughout the planning of ASTIN we anticipated a particular recruitment speed. In the event, the achieved rate was double that. The centers themselves were enthusiastic about the unique aspects of the design, a phenomenon also reported for a previous adaptive design [9]. One consequence of faster recruitment is that learning cannot take place efficiently. Our investigations suggest that with a lower recruitment rate, the trial would have stopped with fewer patients recruited. It would have been preferable to work with fewer centers, aiming for the optimal recruitment speed after an initial ramp-up period.

Figure 9 Posterior estimate of the dose–response function during the course of ASTIN (solid line: posterior estimate; dashed line: posterior uncertainty)

A second issue relates to the exchangeability of the centers. ASTIN had 100 centers worldwide. Fewer centers would have brought us closer to an optimal recruitment speed and would most likely also have reduced variability.

6.2 Statistical investigations

6.2.1 Longitudinal model

Figure 11A plots two quantities over time. The dashed line is the average actual response for patients who completed the trial. The solid line is the average imputed response from the longitudinal model, using the prior estimate of that model from the CSD. Clearly, the imputed values are considerably higher than the actual responses. Figure 11B shows the same plot using imputed values from the updated longitudinal model. Once the longitudinal model begins to be updated, the imputed values move closer to the actual observed responses, which shows that the CSD prior was not very good. Why?

Figure 10 Final posterior estimate of the dose–response function

Figure 11 Actual and imputed responses in ASTIN over time

The parameterization of the longitudinal modelthat was implemented in ASTIN had the form

$$y_i = a_i + b_i\, y_{i+1}$$

where $y_i$ is the response at week $i$. This parameterization has two problems. First, the parameters of the longitudinal model ($a_i$ and $b_i$) are linked: if one pair of parameters is updated, all the others are updated too. Secondly, each pair may be estimated from a different category of patients. To see this, consider the acute stroke unit in Copenhagen. The 1351 patients who went through the unit during the two years of data collection did not all stay for the same length of time; those most severely impaired by their stroke stayed longest. Suppose such patients are used to predict the progress from, say, week 11 to week 12. They are not necessarily representative of the whole population, and through the parameter linkage they affect all other parts of the longitudinal model.

6.2.2 The estimate of the ED95

Figure 10 might suggest that the ED95 for ASTIN should be of the order of 110 mg. In fact, the posterior estimate of the ED95 was 54 mg [8]. Why?

Conventionally, the Bayes estimate of a parameter is its posterior mean, and this was used in ASTIN. Figure 12 displays the posterior distribution of the ASTIN ED95, which is clearly bimodal. It can be argued that, in the context of futility, a bimodal posterior distribution for the ED95 is not unexpected: the posterior estimate of the dose–response curve will consist of a series of random bumps. This raises some issues.

First, in the context of futility the concept of an ED95 is meaningless. Secondly, given that futility is a real possibility, particularly in view of the history of neuroprotection trials in stroke, it is preferable to use the posterior mode rather than the mean. Alternatively, an estimate could be obtained from the posterior estimate of the dose–response curve itself, rather than by averaging the ED95 values from the individual MCMC iterates. This suggestion was considered during the course of ASTIN, after the IDMC statistician identified the anomaly described above. IDMC-inspired investigations showed that an algorithm based on the average curve had essentially the same properties as the algorithm implemented in ASTIN, and that the final inferences would not have changed fundamentally.
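The anomaly can be reproduced with a toy example. The sketch makes three assumptions:

- Doses lie on a discrete grid, and the mg values are hypothetical.
- The ED95 is the smallest dose whose effect over placebo reaches 95% of the maximum effect.
- Five made-up posterior curve draws stand in for MCMC output. Some draws put a small bump at a low dose and others at the top dose, as happens when the true curve is flat.

The three summaries compared are the mean of the per-draw ED95 values, their mode, and the ED95 of the mean curve.

```python
from collections import Counter
from statistics import fmean


def ed95(doses, curve):
    """Smallest dose whose effect over placebo reaches 95% of the maximum effect.

    `curve` holds expected responses at `doses`, with doses[0] being placebo.
    Returns None when no active dose beats placebo (the ED95 is undefined).
    """
    effects = [m - curve[0] for m in curve[1:]]
    best = max(effects)
    if best <= 0:
        return None
    return next(d for d, e in zip(doses[1:], effects) if e >= 0.95 * best)


def summarize_ed95(doses, draws):
    """Compare three point estimates computed from posterior curve draws."""
    values = [v for v in (ed95(doses, c) for c in draws) if v is not None]
    mean_curve = [fmean(column) for column in zip(*draws)]
    return {
        "mean of ED95 draws": fmean(values),
        "mode of ED95 draws": Counter(values).most_common(1)[0][0],
        "ED95 of mean curve": ed95(doses, mean_curve),
    }


doses = [0, 15, 30, 60, 120]                    # hypothetical dose grid (mg)
bump_low = [17.0, 17.5, 17.2, 17.1, 17.0]       # draw with a small bump at 15 mg
bump_high = [17.0, 17.0, 17.1, 17.2, 17.6]      # draw with a small bump at 120 mg
draws = [bump_low] * 3 + [bump_high] * 2
summary = summarize_ed95(doses, draws)
print(summary)
```

Here the mean of the ED95 draws is 57 mg, a dose that no individual draw supports. The mode and the ED95 of the mean curve both give 15 mg. With a continuous dose range, the mode would instead need a density estimate of the ED95 draws.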

7 Discussion

We have shown that adaptive designs with dynamic termination rules can be implemented successfully in phase II proof-of-concept trials. They offer important advantages, such as improved learning about the dose–response and potentially earlier stopping. In future applications, it will be important to build more reliable longitudinal models and to control recruitment speed better, aiming for a rate that allows the system to perform optimally.

Up-front investment is required to design the trial, simulate its characteristics and implement it. However, the payoff goes beyond applying a more informative design. Investigators participating in ASTIN were extremely interested in the trial methodology. Regulatory agencies reviewing our approach supported the concept of adaptive treatment allocation and dynamic termination rules in phase II. We are currently implementing similar designs in areas beyond stroke.

References

1 Lesko LJ, Rowland M, Peck CC, Blaschke TF. Optimizing the science of drug development: opportunities for better candidate selection and accelerated evaluation in humans. J Clin Pharmacol 2000; 40: 803–14.

2 Sheiner LB. Learning versus confirming in clinical drug development. Clin Pharmacol Therapeut 1997; 61: 275–91.

3 Müller H-G, Schmitt T. Choice of number of doses for maximum likelihood estimation of the ED50 for quantal dose–response data. Biometrics 1990; 46: 117–29.

Figure 12 The posterior distribution of the ED95


4 Jorgensen HS, Nakayama H, Raaschou HO, Vive-Larsen J, Stoier M, Olsen TS. Outcome and time course of recovery in stroke. The Copenhagen Stroke Study. Archives Physical Medical Rehabilitation 1995; 76: 399–412.

5 Lees KR, Diener H-C, Asplund K, Krams M. UK-279,276, a neutrophil inhibitory glycoprotein, in acute stroke: tolerability and pharmacokinetics. Stroke 2003; 34: 1704–709.

6 West M, Harrison PJ. Bayesian forecasting and dynamic models, 2nd edition. New York: Springer Verlag, 1997.

7 Berry DA, Müller P, Grieve AP, et al. Adaptive Bayesian designs for dose-ranging trials. In: Carlin B, Carriquiry A, Gatsonis C, et al., eds. Case studies in Bayesian statistics V. New York: Springer Verlag, 2002: 99–181.

8 Krams M, Lees KR, Hacke W, Grieve AP, Orgogozo J-M, Ford GA. Acute stroke therapy by inhibition of neutrophils (ASTIN): an adaptive dose–response study of UK-279,276 in acute ischaemic stroke. Stroke 2003; 34: 2543–48.

9 Tamura RN, Faries DE, Andersen JS, Heiligenstein JH. A case study of an adaptive treatment allocation in a clinical trial in the treatment of out-patients with depressive disorder. J Am Statist Assoc 89: 768–76.

ASTIN a Bayesian adaptive dosendashresponse trial 351

wwwSCTjournalcom Clinical Trials 2005 2 340ndash351

Page 4: ASTIN: a Bayesian adaptive dose response trial in acute stroke · 0 ,1 ,3 ,4 , allowing any dose in the range of 0 8 to be studied. The advantage of increasing the number of doses

to the ASU and the corresponding score at dischargefrom the ASU is shown The SSS measuresneurological function with a zero representing acomatose patient and a score of 58 corresponding tono neurological deficit There is only a weak relationbetween the SSS scores at admission and dischargewith mild stroke patients being discharged withlittle neurological deficit and severe stroke patientsbeing discharged with a substantial residual deficitFigures 4B 4C and 4D show similar relationshipsbetween the SSS scores at 1 4 and 8 weeks post-stroke and the SSS score at discharge As timeprogresses there is an increasingly strong relation-ship between the SSS score at an intermediate timepoint and discharge These data were used toestablish a set of linear regressions to predict finaloutcome based on early measurements called thelongitudinal model In order to use this model inASTIN we required immediate access to earlyresponse data This was achieved by building anelectronic data interface to the clinical sites

33 Updating estimated dosendashresponse

Following the prediction of final outcome based onearly response the estimated dosendashresponse functionwas updated by Bayes theorem A Bayesian analysisaccounts for all sources of uncertainty in particularthe uncertainty associated with predicting patientoutcome from early outcomes in an appropriate way

when updating the estimate of the dosendashresponsecurve It also allowed us to update the longitudinalmodel as data from ASTIN became available

34 Decision making

Having updated the estimate of dosendashresponse wecan now make decisions as to the future conduct ofthe trial There are three possibilities First theremay be sufficient evidence to decide there is no doseof NIF giving sufficient efficacy to take into aPhase III leading to the halting of ASTIN and thedevelopment program Secondly there may besufficient evidence to identify a dose with theappropriate riskbenefit profile allowing ASTIN tobe stopped and the planning of Phase III to begin Ifneither of the above decisions can be made thenASTIN continues in adaptive allocation and learningabout the dosendashresponse function

35 Dose allocation

If ASTIN continues the dose to be given to the nextpatient can be determined by choosing that dosewhich maximizes learning about the aspect of thedosendashresponse function that is of primary interestBased on simulation we chose to use that dose whichminimizes the predicted variance of the response atthe ED95 in other words the dose that we expect to

Figure 4 Relationship between early response and final outcome in untreated acute stroke patients

ASTIN a Bayesian adaptive dosendashresponse trial 343

wwwSCTjournalcom Clinical Trials 2005 2 340ndash351

lead to the most precise estimate of the response atthe ED95 a measure of its expected utility

36 Dosendashresponse model

In general terms the requirements for a dosendashresponse model are that it relates the expectedresponse at a given dose to a set of parameters andpossibly covariates Usually dosendashresponse modelsare restricted to be monotonic In ASTIN a moreflexible model allowing nonmonotonicity wasneeded because of an indication from an earlypatient safety study that the dosendashresponse curvefor NIF might be nonmonotonic at high doses [5]

One of the hindrances to the use of Bayesianmethods in practice has been the lack of appropriatecomputational tools Recently the development ofnumerical methods based on Markov chain MonteCarlo (MCMC) has greatly broadened the scope ofpractical Bayesian statistics However MCMC has acomputational overhead that may be prohibitivebecause of the considerable simulation work that isnecessary to carry before trials can be run The dosendashresponse model needed to allow a degree of analyticupdating of the response curve and the calculationof the expected utilities of each dose

There are a number of approaches that give bothflexibility and efficiency including models based onsplines and kernel regression In ASTIN we chose theNormal Dynamic Linear Model (NDLM) which hasthe necessary characteristics NDLMs were orig-inally developed in time series [6] and combine twosources of variability observational and systemFigure 5 illustrates a second-order polynomial

NDLM The diamonds represent observed individualresponses Yjk at dose Zj At dose Zj a straight line isfitted through the observations parameterized sothat the expected value at z frac14 Zj is uj the interceptand the slope is dj If the model were a simple linearmodel then the expected response at dose Zjthorn1 frac14

Zjthorn 1 would be ujthorn dj and the slope would remain djThe dynamic component of the model allows thesemodel parameters to change The model may bewritten as follows

Observation equation

Yjk frac14 mj thorn nj nj N(0 Vs2)

System equations

mj frac14 mj1 thorn dj1 thorn vj vj N(0 Wjs2)

dj frac14 dj1 thorn 1j 1j N(0 Wjs2)

This is a flexible family of response functions inwhich the multipliers Wj can be regarded assmoothing tuners with small values giving asmooth response function and large values a moreerratic response function Covariates Xk can beintroduced by making the expected responsesdepend linearly on the covariates

E( yjkjz frac14 Zj Xk) frac14 mj frac14 uj thorn bXk

and by applying the NDLM to the uj values InASTIN baseline SSS was used as a covariate inmodeling dosendashresponse although not in choosingthe dose to which the next patient is allocated

Figure 5 Application of a second-order NDLM to dosendashresponse relationships

344 AP Grieve and M Krams

Clinical Trials 2005 2 340ndash351 wwwSCTjournalcom

37 Stopping rules

In a Bayesian framework there are essentially twoways of stopping a clinical trial one based ondecision theoretic principles and the other on anassessment of the posterior probability of clinicalmeaningful effects

The difficulty with the decision approach is thatas more updates are performed the number ofdecision scenarios increases exponentiallysecondly the necessary calculations are computa-tionally intensive because they are not analyticallytractable Berry et al [7] report approximations thathave been developed to make the calculations morefeasible However in the implementation of ASTINit was decided to use the second approach in whichthe size of the effect at the ED95 relative to placebowas determined and the decision to stop was basedon the magnitude of this effect

The NDLM models response through the expectedvalue of the response mj at a number of doses( j frac14 1 J) including placebo (j frac14 1) The dosendashresponse curve is converted into a dosendasheffect curveby defining the expected difference to placebo as

fj frac14 mj m1 (j frac14 2 J)

To stop we require two clinical effect sizes The firstc1 is the smallest clinical effect that we would notwish to miss and the second c0 is the largest clinicaleffect that is not of interest These define stoppingcriteria for satisfactory efficacy and futility If theposterior interval for fk where k indexes the doseclosest to the estimated ED95 denoted ED95

liescompletely above c1 then we would conclude thatthere is sufficient evidence that the effect at the ED95

is large enough to warrant going into a Phase III trialConversely if the posterior interval for fk liescompletely below c0 we could conclude there issufficient evidence that the effect at the ED95 is toosmall to warrant continuing ASTIN or the NIFdevelopment program A hypothetical example in

which there is sufficient evidence to start a Phase IIIprogram is shown in Figure 6

4 Simulating the design

Having developed such a complex design were wein a position immediately to implement it Theanswer is no since there were a number ofinterested parties to persuade that the approachwas appropriate and feasible We needed to choosethe appropriate settings for the algorithm Weneeded to assure ourselves that the design had anacceptable operating characteristic not only interms of false positive rates and false negative ratesbut also in terms of the aspects of the design thatwere considered important Could the algorithmlearn appropriately Could it accurately estimatethe dosendashresponse relationship Did the adaptiveallocation result in a sensible choice of doses Couldwe stop early Were there benefits compared to atraditional design Answers to these questions wereneeded to convince senior management in thecompany that the design was worthwhile andacceptable to the regulatory authorities and toconvince the regulatory authorities in both NorthAmerica and Europe that the approach wasscientifically sound and that the computer systemsthat were developed were appropriately validatedAll of this was achieved by simulation In practicethe simulations were conducted in two stages In thefirst stage a fractional factorial computer experi-ment was conducted to optimize the parametersettings for the algorithm In the second stage theoperating characteristic of the design was deter-mined based on the parameters determined in thefirst stage

To illustrate, Figure 7 displays a series of snapshots from a simulation of a single trial in which the true underlying dose–response (indicated by the solid diamonds) corresponded to a logistic-type curve giving a maximum benefit over placebo of eight points change from baseline on the SSS. Figure 7A shows the prior, which can be characterized as reflecting a belief that if a dose–response exists it is minimal and gradual, but that there is great uncertainty. Figures 7B to 7H show the updated estimate of the dose–response curve (solid line) and its associated uncertainty (dashed line) after 25, 50, 100, 200, 500 patients. On each graph, the circles denote the observed patient responses, the arrows the doses that have already been allocated, and the dotted line a locally weighted fit (LOWESS) through the observed responses. These figures suggest that the algorithm can accurately estimate the dose–response relationship, in that the estimate converges to the truth and the uncertainty reduces rapidly. They also suggest that it can learn appropriately and consequently choose appropriate doses. This is illustrated by the low density of patients allocated to the bottom four doses (indeed, the second dose remains unused) as well as to the doses on the plateau.

Figure 6 Use of a dose–effect curve to determine stopping

ASTIN: a Bayesian adaptive dose–response trial 345

www.SCTjournal.com Clinical Trials 2005; 2: 340–351

Hundreds of thousands of such simulations were conducted so that the algorithm could be appropriately tuned. One of the main items of interest was how the adaptive design would compare with a traditional design. Table 1 illustrates the results of the simulations. If interest centered on identifying the ED95 from a curve that plateaued at 2, 3 or 4 points benefit over placebo, then for 80% power approximately 2400, 1000 and 600 patients, respectively, are required; for 90% power the corresponding numbers are 3200, 1430 and 800. For an adaptive design based on a maximum of 1000 evaluable patients, the false positive rate is controlled at less than 5%, while the effective "power" for 3 and 4 points is acceptably high with small numbers of patients. Of course, the comparison with the traditional design is not completely fair, because the use of traditional sequential designs would reduce the average number of patients.

Figure 7 Simulation of a single clinical trial

346 A.P. Grieve and M. Krams
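As a rough consistency check on the traditional-design figures, the sketch below assumes that the required sample size follows the usual normal-approximation formula, proportional to $(z_{1-\alpha/2} + z_{\text{power}})^2 / \Delta^2$ for a benefit $\Delta$, with a two-sided $\alpha$ of 0.05. Both the formula and $\alpha$ are assumptions, because the design's error rates and variance are not stated here. It calibrates the unknown constant on the 2-point, 80%-power entry of 2432 patients. It then reproduces the other tabulated entries to within roughly 1% and predicts about 1450 patients for a 3-point benefit at 90% power.

```python
from statistics import NormalDist

# Traditional-design entries of Table 1 (patients), keyed by (benefit, power).
TABLE_1 = {(2, 0.80): 2432, (2, 0.90): 3220,
           (3, 0.80): 1080,
           (4, 0.80): 608, (4, 0.90): 808}


def relative_n(delta, power, alpha=0.05):
    """Normal-approximation sample size up to a constant (assumed formula)."""
    z = NormalDist().inv_cdf
    return (z(1 - alpha / 2) + z(power)) ** 2 / delta ** 2


# Calibrate the unknown constant (variance, number of arms) on one entry.
scale = TABLE_1[(2, 0.80)] / relative_n(2, 0.80)
KEYS = [(d, p) for d in (2, 3, 4) for p in (0.80, 0.90)]
predicted = {key: round(scale * relative_n(*key)) for key in KEYS}

for key in KEYS:
    print(key, "table:", TABLE_1.get(key, "-"), "predicted:", predicted[key])
```

Because 90% power always requires more patients than 80% power at the same benefit, the entry for 3 points at 90% power must exceed 1080.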

5 The results of the ASTIN study

In ASTIN, patients who had suffered an acute ischemic stroke, who arrived in the emergency room of the hospital within 6 hours of its onset, and who had a baseline rating on the Scandinavian Stroke Scale (SSS) of between 10 and 40 points were eligible for entry. Details of the exclusion criteria can be found in Krams et al. [8].

The study was run by an executive steering committee and an Independent Data Monitoring Committee (IDMC), which included an expert Bayesian statistician. These committees were responsible for ensuring the safety of the patients and the integrity of the study, monitoring the performance of the algorithm, and confirming decisions to continue or stop the study. A number of formal meetings of the IDMC were prespecified, for which reports were prepared by an independent statistician, and the computer system was run independently of the sponsor by Tessella Ltd.

A total of 966 patients were randomized, of whom 26% received placebo. There was preferential allocation to the top three doses (Figure 8), to which 40% of patients were allocated.

Figure 9 illustrates the course of ASTIN in terms of the estimate of the dose–response relationship. In the frame corresponding to Week 0, the prior estimate of the dose–response is flat, with an expectation of 10 points change from baseline on the SSS. This placebo effect size was estimated from the data in the CSD study. The following six frames show estimates of the dose–response at 8-week intervals up to 48 weeks. What is already apparent by 16 weeks is that the response on placebo (19 points) is far higher than anticipated from the CSD study, and this persists throughout the study. At week 32 the placebo effect is still much higher than anticipated, but there is little indication of a true dose–response. At week 48, the first opportunity at which the IDMC could stop the trial, it was stopped for futility. At this point recruitment was halted, but the protocol required that the patients who had already been entered should continue to be monitored for the full 13 weeks of the study. The study was finally completed 66 weeks after it started, at which point there was little indication of a positive dose–response.

The final estimate of the dose–response is magnified in Figure 10, and there are a number of points to consider. First, as noted above, the placebo response was far greater, at 17 points, than the anticipated 10 points. The trial was designed to detect a 3-point benefit over placebo, which is clearly far greater than the estimated effect at any dose. This final model contained a single covariate, baseline severity. After the trial was completed, a number of predefined important prognostic covariates, including tPA, were added to the model, but none materially changed the final conclusions. There was also no indication of any issues with adverse events or serious adverse events. The only dose-related effect that was seen was in the antibodies to neutrophil inhibitory factor (NIF) [8].

6 Post-trial learnings

At the conclusion of ASTIN, a number of investigations were undertaken to assess the running of the trial and the algorithm.

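Table 1 Operating characteristics of traditional and adaptive designs

Benefit over placebo | Traditional design: patients for 80% power | Traditional design: patients for 90% power | Adaptive design (max 1000 pts): proportion stopped for efficacy | Adaptive design (max 1000 pts): median number of patients
0 | n/a | n/a | 0.02 | 501
2 | 2432 | 3220 | 0.56 | 644
3 | 1080 | 1432 | 0.90 | 416
4 | 608 | 808 | 0.95 | 280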

Figure 8 Patient allocation by dose group


6.1 Operational investigations

Throughout the planning of ASTIN we anticipated a particular recruitment speed; in the event, the achieved rate was double. Centers themselves were enthusiastic about the unique aspects of the design, a phenomenon reported in a previous adaptive design [9]. One consequence of faster recruitment is that learning cannot take place efficiently, because more patients are allocated before the outcomes of earlier patients are known. Our investigations suggest that a lower recruitment rate would have caused the trial to stop with fewer patients recruited. It would have been preferable to work with fewer centers, aiming to achieve the optimal recruitment speed after an initial ramp-up time.

Figure 9 Posterior estimate of the dose–response function during the course of ASTIN (solid line: posterior estimate; dashed line: posterior uncertainty)

A second issue relates to the exchangeability of the centers. In ASTIN there were 100 centers worldwide. Fewer centers would not only have brought us closer to an optimal recruitment speed; they would most likely also have reduced variability.

6.2 Statistical investigations

6.2.1 Longitudinal model

Figure 11A shows the average actual response for patients who completed the trial (dashed line), together with the average imputed responses from the longitudinal model (solid line) using the prior estimate of the longitudinal model from the CSD, plotted over time. Clearly the imputed values are considerably higher than the actual responses. Figure 11B shows the same plot using the imputed values obtained from the updated longitudinal model. Here we see that once the longitudinal model begins to be updated, the imputed values become closer to the actual observed responses, illustrating that the CSD prior was not very good. Why?

Figure 10 Final posterior estimate of the dose–response function

Figure 11 Actual and imputed responses in ASTIN over time

The parameterization of the longitudinal model that was implemented in ASTIN had the form

$$y_i = a_i + b_i\,y_{i+1}$$

where $y_i$ is the response at week $i$. There are two issues with this parameterization. First, the parameters of the longitudinal model ($a_i$ and $b_i$) are linked, so that if one pair of parameters is updated, all the others are too. Secondly, each pair may be estimated in different categories of patients. To see this, consider the acute stroke unit in Copenhagen. The 1351 patients who went through the unit in the two years of data collection did not all stay for the same time: those most severely impaired by their stroke stayed the longest. If such patients were used to predict the progress from, say, week 11 to week 12, they would not necessarily be representative of the whole population, and the parameter linkage would carry this bias into all other parts of the longitudinal model.

6.2.2 The estimate of the ED95

Consideration of Figure 10 might suggest that the ED95 for ASTIN should be of the order of 110 mg. In fact, the posterior estimate of the ED95 was 54 mg [8]. Why?

Conventionally, the Bayes estimate of a parameter is its posterior mean (the Bayes estimate under squared-error loss), and this was used in ASTIN. Figure 12 displays the posterior distribution of the ASTIN ED95; it is clearly bimodal. It can be argued that, in the context of futility, a bimodal posterior distribution for the ED95 is not unexpected, since the posterior estimate of the dose–response curve will consist of a series of random bumps. This raises some issues.

First, in the context of futility the concept of an ED95 is meaningless. Secondly, given that futility is a possibility (particularly with the history of stroke trials in neuroprotection), it is preferable to use the posterior mode rather than the mean. Alternatively, an estimate could be obtained from the posterior estimate of the dose–response curve itself, rather than as the average of the ED95 values from each individual MCMC iterate. This suggestion was considered during the course of ASTIN, after the IDMC statistician identified the anomaly described above. IDMC-inspired investigations showed that an algorithm based on the average curve had essentially the same properties as the ASTIN-implemented algorithm, and that the final inferences would not have been fundamentally changed.
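The difference between averaging per-iterate ED95 values and reading the ED95 off the averaged curve can be seen in a small Python sketch with synthetic draws. Everything here is hypothetical:

* the dose grid;
* the two-bump draws (40% with a bump at 20 mg, 60% at 110 mg);
* the rule defining the ED95 of a draw as the smallest dose reaching 95% of that draw's maximum effect;
* the histogram-based mode.

With a bimodal posterior, the mean (74 mg) falls between the modes, at a dose that no individual draw supports. The mode and the ED95 of the average curve, by contrast, both sit near 110 mg.

```python
import numpy as np


def ed95(doses, effect):
    """Smallest dose whose effect reaches 95% of this curve's maximum.

    `effect` is measured relative to placebo, so effect[0] (placebo) is 0 and
    the maximum is never negative; np.argmax returns the first True index.
    """
    effect = np.asarray(effect, dtype=float)
    target = 0.95 * effect.max()
    return doses[np.argmax(effect >= target)]


def ed95_summaries(doses, draws, bins=20):
    """Mean and histogram mode of per-draw ED95, plus ED95 of the mean curve."""
    per_draw = np.array([ed95(doses, d) for d in draws])
    counts, edges = np.histogram(per_draw, bins=bins)
    i = np.argmax(counts)
    post_mode = 0.5 * (edges[i] + edges[i + 1])
    from_mean_curve = ed95(doses, draws.mean(axis=0))
    return per_draw.mean(), post_mode, from_mean_curve


# Hypothetical dose grid (mg) and synthetic "MCMC" draws of the dose-effect curve.
doses = np.arange(0, 130, 10)          # 0, 10, ..., 120
low_bump = np.zeros(doses.size)
low_bump[2] = 1.0                      # draw whose ED95 is 20 mg
high_bump = np.zeros(doses.size)
high_bump[11] = 1.0                    # draw whose ED95 is 110 mg
draws = np.vstack([np.tile(low_bump, (40, 1)), np.tile(high_bump, (60, 1))])

mean_ed95, mode_ed95, curve_ed95 = ed95_summaries(doses, draws)
print(mean_ed95, mode_ed95, curve_ed95)   # 74.0 107.75 110
```

A histogram mode depends on the bin width; a kernel density estimate is a common alternative when the draws are continuous.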

7 Discussion

We have shown that adaptive designs with dynamic termination rules can be successfully implemented in phase II proof-of-concept trials, offering important advantages such as improved learning about the dose–response and potentially earlier stopping. In future applications it will be important to build more reliable longitudinal models and to better control recruitment speed, aiming for a rate that allows the system to perform optimally.

Up-front investment is required to design the trial, simulate its characteristics and implement it. However, the payoff goes beyond applying a more informative design. Investigators participating in ASTIN were extremely interested in the trial methodology, and regulatory agencies reviewing our approach were supportive of the concept of adaptive treatment allocation and dynamic termination rules in phase II. We are currently engaged in implementing similar designs in areas beyond stroke.

References

1 Lesko LJ, Rowland M, Peck CC, Blaschke TF. Optimizing the science of drug development: opportunities for better candidate selection and accelerated evaluation in humans. J Clin Pharmacol 2000; 40: 803–14.

2 Sheiner LB. Learning versus confirming in clinical drug development. Clin Pharmacol Therapeut 1997; 61: 275–91.

3 Muller H-G, Schmitt T. Choice of number of doses for maximum likelihood estimation of the ED50 for quantal dose–response data. Biometrics 1990; 46: 117–29.

Figure 12 The posterior distribution of the ED95

4 Jorgensen HS, Nakayama H, Raaschou HO, Vive-Larsen J, Stoier M, Olsen TS. Outcome and time course of recovery in stroke. The Copenhagen Stroke Study. Archives of Physical Medicine and Rehabilitation 1995; 76: 399–412.

5 Lees KR, Diener H-C, Asplund K, Krams M. UK-279,276, a neutrophil inhibitory glycoprotein, in acute stroke: tolerability and pharmacokinetics. Stroke 2003; 34: 1704–709.

6 West M, Harrison PJ. Bayesian forecasting and dynamic models, 2nd edition. New York: Springer-Verlag, 1997.

7 Berry DA, Muller P, Grieve AP, et al. Adaptive Bayesian designs for dose-ranging trials. In Carlin B, Carriquiry A, Gatsonis C, et al. eds. Case studies in Bayesian statistics V. New York: Springer-Verlag, 2002: 99–181.

8 Krams M, Lees KR, Hacke W, Grieve AP, Orgogozo J-M, Ford GA. Acute stroke therapy by inhibition of neutrophils (ASTIN): an adaptive dose–response study of UK-279,276 in acute ischaemic stroke. Stroke 2003; 34: 2543–48.

9 Tamura RN, Faries DE, Andersen JS, Heiligenstein JH. A case study of an adaptive treatment allocation in a clinical trial in the treatment of out-patients with depressive disorder. J Am Statist Assoc 89: 768–76.


lead to the most precise estimate of the response at the ED95, a measure of its expected utility.

3.6 Dose–response model

In general terms, a dose–response model must relate the expected response at a given dose to a set of parameters and, possibly, covariates. Usually dose–response models are restricted to be monotonic. In ASTIN, a more flexible model allowing nonmonotonicity was needed because an early patient safety study indicated that the dose–response curve for NIF might be nonmonotonic at high doses [5].

One of the hindrances to the use of Bayesian methods in practice has been the lack of appropriate computational tools. Recently, the development of numerical methods based on Markov chain Monte Carlo (MCMC) has greatly broadened the scope of practical Bayesian statistics. However, MCMC has a computational overhead that may be prohibitive, because of the considerable simulation work that must be carried out before trials can be run. The dose–response model therefore needed to allow a degree of analytic updating, both of the response curve and of the calculation of the expected utility of each dose.

There are a number of approaches that give both flexibility and efficiency, including models based on splines and kernel regression. In ASTIN we chose the Normal Dynamic Linear Model (NDLM), which has the necessary characteristics. NDLMs were originally developed for time series [6] and combine two sources of variability: observational and system. Figure 5 illustrates a second-order polynomial NDLM. The diamonds represent observed individual responses $Y_{jk}$ at dose $Z_j$. At dose $Z_j$ a straight line is fitted through the observations. It is parameterized so that the expected value at $z = Z_j$ is the intercept $\theta_j$ and the slope is $\delta_j$. If the model were a simple linear model, then the expected response at dose $Z_{j+1} = Z_j + 1$ would be $\theta_j + \delta_j$ and the slope would remain $\delta_j$. The dynamic component of the model allows these model parameters to change. The model may be written as follows.

Observation equation

$$Y_{jk} = \mu_j + \nu_{jk}, \qquad \nu_{jk} \sim N(0, V\sigma^2)$$

System equations

$$\mu_j = \mu_{j-1} + \delta_{j-1} + \omega_j, \qquad \omega_j \sim N(0, W_j\sigma^2)$$

$$\delta_j = \delta_{j-1} + \varepsilon_j, \qquad \varepsilon_j \sim N(0, W_j\sigma^2)$$

This is a flexible family of response functions in which the multipliers $W_j$ can be regarded as smoothing tuners: small values give a smooth response function and large values a more erratic one. Covariates $X_k$ can be introduced by making the expected responses depend linearly on the covariates,

$$E(Y_{jk} \mid z = Z_j, X_k) = \mu_j = \theta_j + \beta X_k$$

and by applying the NDLM to the $\theta_j$ values. In ASTIN, baseline SSS was used as a covariate in modeling the dose–response, although not in choosing the dose to which the next patient was allocated.
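The role of the smoothing multipliers can be illustrated by simulating curves from the system equations above. This Python sketch assumes a single common multiplier $W$ for all doses, unit dose spacing and illustrative starting values. `sample_ndlm_curve`, `observe` and `roughness` are hypothetical helpers, not ASTIN software. With $W = 0$ the curve reduces to the straight line $\mu_1 + (j-1)\delta_1$, and larger $W$ produces rougher curves.

```python
import numpy as np


def sample_ndlm_curve(n_doses, mu1, delta1, W, sigma, rng):
    """Draw mu_1, ..., mu_J from the second-order NDLM system equations.

    A single multiplier W is used for every dose (an assumption), so both
    omega_j and epsilon_j have standard deviation sqrt(W) * sigma.
    """
    mu = np.empty(n_doses)
    delta = np.empty(n_doses)
    mu[0], delta[0] = mu1, delta1
    sd = np.sqrt(W) * sigma
    for j in range(1, n_doses):
        # mu_j = mu_{j-1} + delta_{j-1} + omega_j
        mu[j] = mu[j - 1] + delta[j - 1] + rng.normal(0.0, sd)
        # delta_j = delta_{j-1} + epsilon_j
        delta[j] = delta[j - 1] + rng.normal(0.0, sd)
    return mu


def observe(mu, n_per_dose, V, sigma, rng):
    """Observation equation: Y_jk = mu_j + nu_jk, nu_jk ~ N(0, V sigma^2)."""
    noise = rng.normal(0.0, np.sqrt(V) * sigma, size=(mu.size, n_per_dose))
    return mu[:, None] + noise


def roughness(curve):
    """Mean absolute second difference: zero for a straight line."""
    return np.abs(np.diff(curve, n=2)).mean()


rng = np.random.default_rng(0)
for W in (0.0, 0.01, 1.0):
    curves = [sample_ndlm_curve(15, 10.0, 0.2, W, 1.0, rng) for _ in range(200)]
    print(f"W = {W}: mean roughness {np.mean([roughness(c) for c in curves]):.3f}")

Y = observe(sample_ndlm_curve(15, 10.0, 0.2, 0.01, 1.0, rng), 5, 1.0, 1.0, rng)
print(Y.shape)  # (15, 5)
```

In a real analysis the curve would be inferred from data rather than simulated. The same recursion then drives the prior, and, with covariates, it would be applied to the $\theta_j$ values.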

Figure 5 Application of a second-order NDLM to dose–response relationships


3.7 Stopping rules

In a Bayesian framework there are essentially two ways of stopping a clinical trial: one based on decision-theoretic principles and the other on an assessment of the posterior probability of clinically meaningful effects.

The decision approach has two difficulties. First, as more updates are performed, the number of decision scenarios increases exponentially. Secondly, the necessary calculations are computationally intensive because they are not analytically tractable. Berry et al. [7] report approximations that have been developed to make the calculations more feasible. However, in the implementation of ASTIN it was decided to use the second approach, in which the size of the effect at the ED95 relative to placebo was determined and the decision to stop was based on the magnitude of this effect.

The NDLM models response through the expected value of the response, $\mu_j$, at a number of doses ($j = 1, \ldots, J$), including placebo ($j = 1$). The dose–response curve is converted into a dose–effect curve by defining the expected difference to placebo as

$$f_j = \mu_j - \mu_1, \qquad j = 2, \ldots, J$$

To stop we require two clinical effect sizes The firstc1 is the smallest clinical effect that we would notwish to miss and the second c0 is the largest clinicaleffect that is not of interest These define stoppingcriteria for satisfactory efficacy and futility If theposterior interval for fk where k indexes the doseclosest to the estimated ED95 denoted ED95

liescompletely above c1 then we would conclude thatthere is sufficient evidence that the effect at the ED95

is large enough to warrant going into a Phase III trialConversely if the posterior interval for fk liescompletely below c0 we could conclude there issufficient evidence that the effect at the ED95 is toosmall to warrant continuing ASTIN or the NIFdevelopment program A hypothetical example in

which there is sufficient evidence to start a Phase IIIprogram is shown in Figure 6

4 Simulating the design

Having developed such a complex design were wein a position immediately to implement it Theanswer is no since there were a number ofinterested parties to persuade that the approachwas appropriate and feasible We needed to choosethe appropriate settings for the algorithm Weneeded to assure ourselves that the design had anacceptable operating characteristic not only interms of false positive rates and false negative ratesbut also in terms of the aspects of the design thatwere considered important Could the algorithmlearn appropriately Could it accurately estimatethe dosendashresponse relationship Did the adaptiveallocation result in a sensible choice of doses Couldwe stop early Were there benefits compared to atraditional design Answers to these questions wereneeded to convince senior management in thecompany that the design was worthwhile andacceptable to the regulatory authorities and toconvince the regulatory authorities in both NorthAmerica and Europe that the approach wasscientifically sound and that the computer systemsthat were developed were appropriately validatedAll of this was achieved by simulation In practicethe simulations were conducted in two stages In thefirst stage a fractional factorial computer experi-ment was conducted to optimize the parametersettings for the algorithm In the second stage theoperating characteristic of the design was deter-mined based on the parameters determined in thefirst stage

To illustrate Figure 7 displays a series of snap-shots from a simulation of a single trial in whichthe true underlying dosendashresponse (indicated bythe solid diamonds) corresponded to a logistic-typecurve giving a maximum benefit over placebo ofeight points change from baseline on the SSSIn Figure 7A the prior is shown and it can becharacterized as reflecting a belief that if a dosendashresponse exists it is minimal and gradual but thatthere is great uncertainty Figures 7B to 7H show theupdated estimate of the dosendashresponse curve (solidline) and its associated uncertainty (dashed line)after 25 50 100 200 500 patients On eachgraph the circles denote the observed patientresponses the arrows the doses that have alreadybeen allocated and the dotted line a locallyweighted fit (LOWESS) through the observedresponses These figures suggest that the algorithmcan accurately estimate the dosendashresponse relation-ship in that the estimate converges to the truth andthe uncertainty reduces rapidly They also suggestFigure 6 Use of a dosendasheffect curve to determine stopping

ASTIN a Bayesian adaptive dosendashresponse trial 345

wwwSCTjournalcom Clinical Trials 2005 2 340ndash351

that it can learn appropriately and consequentlychoose appropriate doses which is illustrated by thelow density of patients allocated to the bottom fourdoses ndash indeed the second dose remains unused ndash aswell as to the doses on the plateau

Hundreds of thousands of such simulations wereconducted so that the algorithm could be appro-priately tuned One of the main items of interest washow the adaptive design would compare to atraditional design Table 1 illustrates the results of

Figure 7 Simulation of a single clinical trial

346 AP Grieve and M Krams

Clinical Trials 2005 2 340ndash351 wwwSCTjournalcom

the simulations If interest centered on identifyingthe ED95 from a curve that plateaued at 2 3 or 4points benefit over placebo then for 80 powerapproximately 2400 1000 and 600 patients arerequired respectively for 90 the correspondingnumbers are 3200 430 and 800 For an adaptivedesign based on a maximum of 1000 evaluablepatients the false positive rate is controlled at lessthan 5 while the effect ldquopowerrdquo for 3 and 4 pointsis acceptably high with small numbers of patientsOf course the comparison to the traditional designis not completely fair because the use of traditionalsequential designs would reduce the averagenumbers of patients

5 The results of the ASTIN study

In ASTIN patients who had suffered an acuteischemic stroke who arrived in the emergency roomof the hospital within 6 hours of its onset and whohad a baseline rating on the Scandinavian StrokeScale (SSS) of between 10 and 40 points were eligiblefor entry Details of the exclusion criteria can befound in Krams et al [8]

The study was run by an executive steeringcommittee and an Independent Data MonitoringCommittee (IDMC) with an expert Bayesian statis-tician who were responsible for ensuring the safety ofthe patients the integrity of the study monitoringthe performance of the algorithm and confirmingdecisions to continue or stop the study A number offormal meetings of the IDMC were prespecified forwhich reports were prepared by an independentstatistician and the computer system was runindependently of the sponsor by Tessella Ltd

A total of 966 patients were randomized ofwhom 26 received placebo There was preferentialallocation to the top three doses (Figure 8) and 40of patients were allocated to them

Figure 9 illustrates the course of ASTIN in terms ofthe estimate of the dosendashresponse relationship Inthe frame corresponding to Week 0 the priorestimate of the dosendashresponse is shown to be flatwith expectation of 10 points change from baseline

on the SSS This placebo effect size was estimatedfrom the data in the CSD study The following sixframes show estimates of the dosendashresponse at8-week intervals up to 48 weeks What is alreadyapparent by 16 weeks is that response on placebo ndash19 points ndash is far higher than anticipated from theCSD study and this persists during the course of thewhole study At week 32 the placebo effect is stillmuch higher than anticipated but there is littleindication of a true dosendashresponse At week 48 thefirst opportunity at which the IDMC could stopthe trial it was stopped for futility At this pointrecruitment was halted but the protocol requiredthat the patients who had already been enteredshould continue to be monitored for the full 13 weeksof the study The study was finally completed 66weeks after it started at which point there was littleindication of a positive dosendashresponse

The final estimate of the dosendashresponse ismagnified in Figure 10 and there are a number ofpoints to consider First as was noted above theplacebo response was far greater at 17 points thanthe anticipated 10 points The trial was designed todetect a 3 point benefit over placebo which clearlyis far greater than the estimated effect at any doseThis final model contained a single covariate ndashbaseline severity After the trial was completed anumber of predefined important prognostic covari-ates including tPA were added to the model butnone materially changed the final conclusionsThere was also no indication of any issues withadverse events or serious adverse events The onlydose-related effect that was seen was in theantibodies to neutrophil inhibitory factor (NIF) [8]

6 Post-trial learnings

At the conclusion of ASTIN a number of investi-gations were undertaken to assess the running of thetrial and the algorithm

Table 1 Operating characteristic of traditional and adaptive

designs

Benefit overplacebo

Power oftraditional design

Adaptive design(max 1000 pts)

80 90 Stoppedfor efficacy

Median numberof patients

0 mdash mdash 002 5012 2432 3220 056 6443 1080 432 090 4164 608 808 095 280

Figure 8 Patient allocation by dose group

ASTIN a Bayesian adaptive dosendashresponse trial 347

wwwSCTjournalcom Clinical Trials 2005 2 340ndash351

61 Operational investigations

Throughout the planning of ASTIN we anticipated aparticular recruitment speed in the event theachieved rate was double Centers themselves were

enthusiastic about the unique aspects of the design aphenomenon reported in a previous adaptive design[9] One consequence of faster recruitment is thatlearning cannot take place efficiently Our investi-gations suggest that a lower recruitment rate would

Figure 9 Posterior estimate of the dosendashresponse function during the course of ASTIN mdashmdash Posterior estimate --- - - - posterioruncertainty

348 AP Grieve and M Krams

Clinical Trials 2005 2 340ndash351 wwwSCTjournalcom

have caused the trial to stop with fewer patientsrecruited It would have been preferable to work withfewer centers aiming at achieving the optimalrecruitment speed after an initial ramp-up time

A second issue relates to exchangeability of thecenters In ASTIN there were 100 centers worldwideFewer centers would not only have brought us closerto an optimal recruitment speed it would mostlikely also have reduced variability

62 Statistical investigations

621 Longitudinal model

Figure 11A shows the average actual response forpatients who completed the trial (dashed line)

Figure 10 Final posterior estimate of the dosendashresponsefunction

Figure 11 Actual and imputed responses in ASTIN over time

ASTIN a Bayesian adaptive dosendashresponse trial 349

wwwSCTjournalcom Clinical Trials 2005 2 340ndash351

together with the average imputed responses fromthe longitudinal model (solid line) using the priorestimate of the longitudinal model from the CSDplotted over time Clearly the imputed values areconsiderably higher than the actual responsesFigure 11B shows the same plot using the imputedvalues obtained from the updated longitudinalmodel Here we see that once the longitudinal modelbegins to be updated the imputed values becomecloser to the actual observed responses illustratingthat the CSD prior was not very good Why

The parameterization of the longitudinal modelthat was implemented in ASTIN had the form

yi frac14 ai thorn bithorn1

where yi is the response at week i There are twoissues with this parameterization First the para-meters of the longitudinal model (ai and bi) arelinked and if one pair of parameters is updated allthe others are too Secondly each pair may be esti-mated in different categories of patients To see thiswe consider the acute stroke unit in CopenhagenThe 1351 patients who went through the unit in thetwo years of collection did not all stay for the sametime Those most severely impaired by their strokestayed there the longest If such patients were usedto predict the progress from say weeks 11 to 12 theyare not necessarily representative of the wholepopulation and the parameter-linkage will have animpact on all other parts of the longitudinal model

622 The estimate of the ED95

Consideration of Figure 10 might suggest that the ED95

for ASTIN should be of the order of 110 mg In fact theposterior estimate of the ED95 was 54 mg [8] Why

Conventionally the Bayesrsquo estimate of a para-meter is its posterior mean and this was used inASTIN Figure 12 displays the posterior distribution

of the ASTIN-ED95 ndash it is clearly bimodal It can beargued that in the context of futility a bimodalposterior distribution for the ED95 is not unex-pected since the posterior estimate of the dosendashresponse curve will consist of a series of randombumps This raises some issues

First in the context of futility the concept of anED95 is meaningless Secondly given that futility is apossibility ndash particularly with the history of stroketrials in neuroprotection ndash it is preferable to use theposterior mode rather than mean Alternatively anestimate could be obtained from the posteriorestimate of the dosendashresponse curve itself ratherthan average of the ED95 values from eachindividual MCMC iterate This suggestion wasconsidered during the course of ASTIN as theIDMC-statistician identified the anomaly pointedto above IDMC-inspired investigations showed thatan algorithm based on the average curve hadessentially the same properties as the ASTIN-implemented algorithm and that the final infer-ences would not have been fundamentallychanged

7 Discussion

We have shown that adaptive designs with dynamictermination rules can be successfully implementedin phase II proof-of-concept trials offering import-ant advantages such as improved learning aboutthe dosendashresponse and potentially earlier stoppingIn future applications it will be important to buildmore reliable longitudinal models and to bettercontrol recruitment speed aiming for a rate thatallows the system to perform optimally

Up front investment is required to allow designingthe trial simulating its characteristics and implement-ing it However the payoff goes beyond applying amore informative design Investigators participatingin ASTIN were extremely interested in the trialmethodology and regulatory agencies reviewing ourapproach were supportive of the concept of adaptivetreatment allocation and dynamic termination rulesin phase II We are currently engaged in implementingsimilar designs in areas beyond stroke

Figure 12 The posterior distribution of the ED95

References

1. Lesko LJ, Rowland M, Peck CC, Blaschke TF. Optimizing the science of drug development: opportunities for better candidate selection and accelerated evaluation in humans. J Clin Pharmacol 2000; 40: 803–14.

2. Sheiner LB. Learning versus confirming in clinical drug development. Clin Pharmacol Therapeut 1997; 61: 275–91.

3. Muller H-G, Schmitt T. Choice of number of doses for maximum likelihood estimation of the ED50 for quantal dose–response data. Biometrics 1990; 46: 117–29.

4. Jorgensen HS, Nakayama H, Raaschou HO, Vive-Larsen J, Stoier M, Olsen TS. Outcome and time course of recovery in stroke. The Copenhagen Stroke Study. Archives Physical Medical Rehabilitation 1995; 76: 399–412.

5. Lees KR, Diener H-C, Asplund K, Krams M. UK-279,276, a neutrophil inhibitory glycoprotein, in acute stroke: tolerability and pharmacokinetics. Stroke 2003; 34: 1704–709.

6. West M, Harrison PJ. Bayesian forecasting and dynamic models, 2nd edition. New York: Springer Verlag, 1997.

7. Berry DA, Muller P, Grieve AP et al. Adaptive Bayesian designs for dose-ranging trials. In Carlin B, Carriquiry A, Gatsonis C et al. eds. Case studies in Bayesian statistics V. New York: Springer Verlag, 2002: 99–181.

8. Krams M, Lees KR, Hacke W, Grieve AP, Orgogozo J-M, Ford GA. Acute stroke therapy by inhibition of neutrophils (ASTIN): an adaptive dose–response study of UK-279,276 in acute ischaemic stroke. Stroke 2003; 34: 2543–48.

9. Tamura RN, Faries DE, Andersen JS, Heiligenstein JH. A case study of an adaptive treatment allocation in a clinical trial in the treatment of out-patients with depressive disorder. J Am Statist Assoc 89: 768–76.


3.7 Stopping rules

In a Bayesian framework there are essentially two ways of stopping a clinical trial. One is based on decision-theoretic principles. The other is based on an assessment of the posterior probability of clinically meaningful effects.

The decision approach has two difficulties. First, as more updates are performed, the number of decision scenarios increases exponentially. Secondly, the necessary calculations are computationally intensive because they are not analytically tractable. Berry et al. [7] report approximations that have been developed to make the calculations more feasible. However, in the implementation of ASTIN it was decided to use the second approach. In this approach the size of the effect at the ED95 relative to placebo was determined, and the decision to stop was based on the magnitude of this effect.

The NDLM models the response through its expected value, $\mu_j$, at a number of doses ($j = 1, \ldots, J$), including placebo ($j = 1$). The dose–response curve is converted into a dose–effect curve by defining the expected difference from placebo as

$$\phi_j = \mu_j - \mu_1, \qquad j = 2, \ldots, J.$$

To stop, we require two clinical effect sizes. The first, $c_1$, is the smallest clinical effect that we would not wish to miss. The second, $c_0$, is the largest clinical effect that is not of interest. These define stopping criteria for satisfactory efficacy and for futility. Let $k$ index the dose closest to the estimated ED95. If the posterior interval for $\phi_k$ lies completely above $c_1$, we would conclude that there is sufficient evidence that the effect at the ED95 is large enough to warrant going into a Phase III trial. Conversely, if the posterior interval for $\phi_k$ lies completely below $c_0$, we could conclude that there is sufficient evidence that the effect at the ED95 is too small to warrant continuing ASTIN or the NIF development program. Figure 6 shows a hypothetical example in which there is sufficient evidence to start a Phase III program.
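The two-threshold rule can be sketched in a few lines. This is a minimal illustration, not the ASTIN implementation, and it rests on several assumptions:

- Posterior draws of $\mu_1, \ldots, \mu_J$ are available as lists, with placebo first.
- The posterior interval is an equal-tailed 95% interval. This level is an assumption; this section does not state it.
- The values of `c0` and `c1`, and the simulated draws in the example, are hypothetical.
- The function names `central_interval` and `stopping_decision` are illustrative only.

```python
import math
import random

def central_interval(draws, level=0.95):
    """Equal-tailed posterior interval estimated from order statistics of the draws."""
    s = sorted(draws)
    n = len(s)
    lo = s[math.floor((1 - level) / 2 * (n - 1))]
    hi = s[math.ceil((1 + level) / 2 * (n - 1))]
    return lo, hi

def stopping_decision(mu_draws, k, c0, c1, level=0.95):
    """Apply the efficacy/futility rule to phi_k = mu_k - mu_1.

    mu_draws: posterior draws, each a list [mu_1, ..., mu_J] with placebo first.
    k: 0-based position of the dose closest to the estimated ED95 (k >= 1).
    c0: largest effect not of interest; c1: smallest effect we would not wish to miss.
    """
    phi_k = [mu[k] - mu[0] for mu in mu_draws]
    lo, hi = central_interval(phi_k, level)
    if lo > c1:
        return "stop for efficacy"   # whole interval lies above c1
    if hi < c0:
        return "stop for futility"   # whole interval lies below c0
    return "continue"

if __name__ == "__main__":
    rng = random.Random(2005)
    # Hypothetical draws: placebo mean 10 SSS points, dose k mean 14 points.
    draws = [[rng.gauss(10, 0.5), rng.gauss(14, 0.5)] for _ in range(4000)]
    print(stopping_decision(draws, k=1, c0=1.0, c1=2.0))
```

With these invented draws the effect at dose `k` is centred near 4 points, so the whole interval sits above `c1`. In a running trial, `k` would be re-identified at each update as the dose closest to the current ED95 estimate.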

4 Simulating the design

Having developed such a complex design, were we in a position to implement it immediately? The answer is no, since there were a number of interested parties to persuade that the approach was appropriate and feasible.

We needed to choose the appropriate settings for the algorithm. We also needed to assure ourselves that the design had acceptable operating characteristics. These covered false positive and false negative rates, but also the aspects of the design that were considered important:

- Could the algorithm learn appropriately?
- Could it accurately estimate the dose–response relationship?
- Did the adaptive allocation result in a sensible choice of doses?
- Could we stop early?
- Were there benefits compared with a traditional design?

Answers to these questions were needed for two audiences. Senior management in the company had to be convinced that the design was worthwhile and acceptable to the regulatory authorities. The regulatory authorities in both North America and Europe had to be convinced that the approach was scientifically sound and that the computer systems developed were appropriately validated.

All of this was achieved by simulation, conducted in two stages. In the first stage, a fractional factorial computer experiment was conducted to optimize the parameter settings for the algorithm. In the second stage, the operating characteristics of the design were determined using the parameters chosen in the first stage.

To illustrate, Figure 7 displays a series of snapshots from a simulation of a single trial. In this simulation the true underlying dose–response (indicated by the solid diamonds) corresponded to a logistic-type curve giving a maximum benefit over placebo of eight points change from baseline on the SSS.

Figure 7A shows the prior. It can be characterized as reflecting a belief that if a dose–response exists it is minimal and gradual, but that there is great uncertainty. Figures 7B to 7H show the updated estimate of the dose–response curve (solid line) and its associated uncertainty (dashed line) after 25, 50, 100, 200 and 500 patients. On each graph:

- the circles denote the observed patient responses;
- the arrows denote the doses that have already been allocated;
- the dotted line is a locally weighted fit (LOWESS) through the observed responses.

These figures suggest that the algorithm can accurately estimate the dose–response relationship: the estimate converges to the truth and the uncertainty reduces rapidly. They also suggest that it can learn appropriately and consequently choose appropriate doses. This is illustrated by the low density of patients allocated to the bottom four doses (indeed, the second dose remains unused) as well as to the doses on the plateau.

Figure 6 Use of a dose–effect curve to determine stopping
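A simulation of this kind needs a "true" dose–response from which patient outcomes are generated. The sketch below shows one way to set that up. It assumes a logistic curve, rescaled so that the effect over placebo is 0 at dose 0 and approaches 8 SSS points. The following are hypothetical choices, not ASTIN values:

- the midpoint `ed50` and `scale`;
- the placebo mean and residual SD;
- the dose grid.

The sketch also locates the true ED95 on the grid, the quantity the design is trying to learn.

```python
import math
import random

def logistic_effect(dose, emax=8.0, ed50=40.0, scale=8.0):
    """Hypothetical logistic-type effect over placebo (SSS points).

    Rescaled so that logistic_effect(0) == 0 and the curve plateaus at emax.
    """
    def f(d):
        return 1.0 / (1.0 + math.exp(-(d - ed50) / scale))
    return emax * (f(dose) - f(0.0)) / (1.0 - f(0.0))

def ed95_on_grid(doses, effect):
    """Smallest dose on the grid reaching 95% of the grid's maximum effect."""
    effects = [effect(d) for d in doses]
    target = 0.95 * max(effects)
    return next(d for d, e in zip(doses, effects) if e >= target)

def simulate_response(dose, rng, placebo_mean=10.0, sd=10.0):
    """One simulated change from baseline on the SSS (placebo_mean and sd are hypothetical)."""
    return rng.gauss(placebo_mean + logistic_effect(dose), sd)

if __name__ == "__main__":
    doses = list(range(0, 130, 10))   # hypothetical grid: placebo plus 10..120
    print("true ED95 on grid:", ed95_on_grid(doses, logistic_effect))
    rng = random.Random(7)
    print([round(simulate_response(d, rng), 1) for d in (0, 60, 120)])
```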

Hundreds of thousands of such simulations were conducted so that the algorithm could be appropriately tuned. One of the main items of interest was how the adaptive design would compare with a traditional design. Table 1 illustrates the results of the simulations.

Suppose interest centered on identifying the ED95 from a curve that plateaued at 2, 3 or 4 points benefit over placebo. For a traditional design with 80% power, approximately 2400, 1000 and 600 patients are required, respectively; for 90% power the corresponding numbers are 3200, 430 and 800. For an adaptive design based on a maximum of 1000 evaluable patients, the false positive rate is controlled at less than 5%. Its effective "power" for 3 and 4 points is acceptably high, with small numbers of patients.

Of course, the comparison with the traditional design is not completely fair, because traditional sequential designs would reduce the average number of patients.

Figure 7 Simulation of a single clinical trial
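The traditional-design sample sizes are quoted rather than derived here. As a rough cross-check, assume they come from a standard fixed-sample calculation for a two-sided test at the 5% level; this is our assumption, not a statement from the paper. Under that assumption, moving from 80% to 90% power multiplies the required number of patients by $\left((z_{0.975}+z_{0.90})/(z_{0.975}+z_{0.80})\right)^2$, roughly 1.34. That factor is in line with the 2-point (2432 to 3220) and 4-point (608 to 808) rows of Table 1. The function name `power_inflation` is illustrative.

```python
from statistics import NormalDist

def power_inflation(power_low=0.80, power_high=0.90, alpha=0.05):
    """Factor by which a fixed-sample size grows when power rises from power_low to power_high.

    Assumes n is proportional to (z_{1-alpha/2} + z_{power})^2, as for a two-sided z-test.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)
    return ((z_alpha + z(power_high)) / (z_alpha + z(power_low))) ** 2

if __name__ == "__main__":
    factor = power_inflation()
    print(f"inflation factor: {factor:.3f}")
    for n80 in (2432, 608):   # 80%-power entries from Table 1 (2- and 4-point rows)
        print(n80, "->", round(n80 * factor))
```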

5 The results of the ASTIN study

Patients were eligible for entry to ASTIN if they met three criteria:

- they had suffered an acute ischemic stroke;
- they arrived in the hospital emergency room within 6 hours of its onset;
- they had a baseline rating on the Scandinavian Stroke Scale (SSS) of between 10 and 40 points.

Details of the exclusion criteria can be found in Krams et al. [8].

The study was run by an executive steering committee and an Independent Data Monitoring Committee (IDMC) that included an expert Bayesian statistician. Together they were responsible for:

- ensuring the safety of the patients;
- protecting the integrity of the study;
- monitoring the performance of the algorithm;
- confirming decisions to continue or stop the study.

A number of formal IDMC meetings were prespecified, for which reports were prepared by an independent statistician. The computer system was run independently of the sponsor by Tessella Ltd.

A total of 966 patients were randomized, of whom 26% received placebo. There was preferential allocation to the top three doses (Figure 8), and 40% of patients were allocated to them.

Figure 9 illustrates the course of ASTIN in terms of the estimate of the dose–response relationship. In the frame corresponding to week 0, the prior estimate of the dose–response is flat, with an expectation of 10 points change from baseline on the SSS. This placebo effect size was estimated from the data in the CSD study. The following six frames show estimates of the dose–response at 8-week intervals up to 48 weeks.

By week 16 it is already apparent that the response on placebo (19 points) is far higher than anticipated from the CSD study, and this persists throughout the study. At week 32 the placebo effect is still much higher than anticipated, and there is little indication of a true dose–response. At week 48, the first opportunity at which the IDMC could stop the trial, it was stopped for futility.

At this point recruitment was halted. However, the protocol required that patients who had already been entered should continue to be monitored for the full 13 weeks of the study. The study was finally completed 66 weeks after it started, at which point there was little indication of a positive dose–response.

The final estimate of the dose–response is magnified in Figure 10, and there are a number of points to consider.

First, as noted above, the placebo response was far greater (17 points) than the anticipated 10 points. The trial was designed to detect a 3-point benefit over placebo, which is clearly far greater than the estimated effect at any dose.

Second, this final model contained a single covariate, baseline severity. After the trial was completed, a number of predefined important prognostic covariates, including tPA, were added to the model, but none materially changed the final conclusions.

Third, there was no indication of any issues with adverse events or serious adverse events. The only dose-related effect seen was in antibodies to neutrophil inhibitory factor (NIF) [8].

6 Post-trial learnings

At the conclusion of ASTIN, a number of investigations were undertaken to assess the running of the trial and the algorithm.

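Table 1 Operating characteristics of traditional and adaptive designs

| Benefit over placebo (SSS points) | Traditional design: patients for 80% power | Traditional design: patients for 90% power | Adaptive design (max 1000 pts): proportion stopped for efficacy | Adaptive design (max 1000 pts): median number of patients |
|---|---|---|---|---|
| 0 | n/a | n/a | 0.02 | 501 |
| 2 | 2432 | 3220 | 0.56 | 644 |
| 3 | 1080 | 432 | 0.90 | 416 |
| 4 | 608 | 808 | 0.95 | 280 |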

Figure 8 Patient allocation by dose group


6.1 Operational investigations

Throughout the planning of ASTIN we anticipated a particular recruitment speed; in the event, the achieved rate was double. Centers themselves were enthusiastic about the unique aspects of the design, a phenomenon reported in a previous adaptive design [9]. One consequence of faster recruitment is that learning cannot take place efficiently. Our investigations suggest that a lower recruitment rate would have caused the trial to stop with fewer patients recruited. It would have been preferable to work with fewer centers, aiming to achieve the optimal recruitment speed after an initial ramp-up time.

Figure 9 Posterior estimate of the dose–response function during the course of ASTIN (solid line, posterior estimate; dashed line, posterior uncertainty)
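Why does faster recruitment hamper learning? Every randomized patient needs 13 weeks of follow-up before the final outcome is observed. Compared at the same number of randomized patients, a faster rate therefore leaves more of them without a final outcome.

The toy calculation below makes this concrete. It assumes constant recruitment with no ramp-up, withdrawals or pauses. The rates of 10 and 20 patients per week are hypothetical, chosen only to show the effect of doubling. The function name `outcome_status` is illustrative.

```python
def outcome_status(n_randomized, rate_per_week, follow_up_weeks=13):
    """Split n_randomized patients into (final outcome observed, still in follow-up).

    Hypothetical simplification: constant recruitment from week 0, no ramp-up,
    no withdrawals, and every patient followed for exactly follow_up_weeks.
    """
    weeks_elapsed = n_randomized / rate_per_week
    complete = rate_per_week * max(0.0, weeks_elapsed - follow_up_weeks)
    return complete, n_randomized - complete

if __name__ == "__main__":
    for rate in (10, 20):  # hypothetical planned rate and double that rate (patients/week)
        done, pending = outcome_status(500, rate)
        print(f"{rate}/week: of 500 randomized, {done:.0f} complete, {pending:.0f} in follow-up")
```

In this toy setting, doubling the rate doubles the number of patients still in follow-up when 500 have been randomized (260 instead of 130). Allocation and stopping decisions would then lean more heavily on the longitudinal model's imputations.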

A second issue relates to exchangeability of the centers. In ASTIN there were 100 centers worldwide. Fewer centers would not only have brought us closer to an optimal recruitment speed; they would most likely also have reduced variability.

6.2 Statistical investigations

6.2.1 Longitudinal model

Figure 11A plots two averages over time. The dashed line is the average actual response for patients who completed the trial. The solid line is the average imputed response from the longitudinal model, using the prior estimate of that model from the CSD. Clearly, the imputed values are considerably higher than the actual responses.

Figure 11B shows the same plot using imputed values from the updated longitudinal model. Once the longitudinal model begins to be updated, the imputed values move closer to the actual observed responses. This shows that the CSD prior was not very good. Why?

Figure 10 Final posterior estimate of the dose–response function

Figure 11 Actual and imputed responses in ASTIN over time

The parameterization of the longitudinal model that was implemented in ASTIN had the form

$$y_i = a_i + b_i\, y_{i+1}$$

where $y_i$ is the response at week $i$. There are two issues with this parameterization.

First, the parameters of the longitudinal model ($a_i$ and $b_i$) are linked, so that if one pair of parameters is updated, all the others are too.

Secondly, each pair may be estimated in different categories of patients. To see this, consider the acute stroke unit in Copenhagen. The 1351 patients who went through the unit in the two years of data collection did not all stay for the same time; those most severely impaired by their stroke stayed the longest. If such patients were used to predict progress from, say, week 11 to week 12, they would not necessarily be representative of the whole population. Because of the parameter linkage, this would also affect all other parts of the longitudinal model.

6.2.2 The estimate of the ED95

Figure 10 might suggest that the ED95 for ASTIN should be of the order of 110 mg. In fact, the posterior estimate of the ED95 was 54 mg [8]. Why?

Conventionally, the Bayes estimate of a parameter is its posterior mean, and this was used in ASTIN. Figure 12 displays the posterior distribution of the ASTIN ED95, and it is clearly bimodal. In the context of futility, a bimodal posterior distribution for the ED95 is arguably not unexpected: the posterior estimate of the dose–response curve will consist of a series of random bumps. This raises some issues.

First, in the context of futility the concept of an ED95 is meaningless.

Secondly, futility is a real possibility, particularly given the history of neuroprotection trials in stroke. In that setting it is preferable to use the posterior mode rather than the mean. Alternatively, an estimate could be obtained from the posterior estimate of the dose–response curve itself, rather than from the average of the ED95 values from each individual MCMC iterate.

This alternative was considered during the course of ASTIN, after the IDMC statistician identified the anomaly described above. IDMC-inspired investigations showed that an algorithm based on the average curve had essentially the same properties as the algorithm implemented in ASTIN. The final inferences would not have been fundamentally changed.
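A toy example makes the difference between these estimates easy to reproduce. The estimates compared are the average of per-iterate ED95 values, their mode, and the ED95 of the averaged curve.

The draws below are invented and are not ASTIN output. They describe a futile-looking curve whose random bump sits at the low dose in six iterates and at the top dose in four. The dose grid and the draws are assumptions made for illustration. So is the convention of including placebo with effect 0, so that a draw with no positive effect returns the placebo dose. The function names are illustrative.

```python
from statistics import mean, mode

def ed95(doses, effects):
    """Smallest dose whose effect reaches 95% of the curve's maximum.

    effects[0] is placebo with effect 0, so max(effects) >= 0 and a curve
    with no positive effect returns the placebo dose.
    """
    target = 0.95 * max(effects)
    return next(d for d, e in zip(doses, effects) if e >= target)

def ed95_summaries(doses, phi_draws):
    """Three point estimates of the ED95 from posterior draws of the dose-effect curve."""
    per_draw = [ed95(doses, phi) for phi in phi_draws]
    mean_curve = [mean(column) for column in zip(*phi_draws)]
    return {
        "mean of per-draw ED95": mean(per_draw),
        "mode of per-draw ED95": mode(per_draw),
        "ED95 of mean curve": ed95(doses, mean_curve),
    }

if __name__ == "__main__":
    doses = [0, 30, 60, 90, 120]             # hypothetical grid (mg), placebo first
    low_bump = [0.0, 1.0, 0.2, 0.1, 0.0]     # invented draw: bump at 30 mg
    high_bump = [0.0, 0.0, 0.1, 0.2, 1.0]    # invented draw: bump at 120 mg
    draws = [low_bump] * 6 + [high_bump] * 4
    for name, value in ed95_summaries(doses, draws).items():
        print(f"{name}: {value}")
```

Here the mean of the per-draw values (66 mg) falls between the two modes, at a dose that no individual draw supports. The mode and the averaged-curve estimate both give 30 mg. This is qualitatively similar to the gap described above between the posterior mean and what Figure 10 suggests.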

7 Discussion

We have shown that adaptive designs with dynamictermination rules can be successfully implementedin phase II proof-of-concept trials offering import-ant advantages such as improved learning aboutthe dosendashresponse and potentially earlier stoppingIn future applications it will be important to buildmore reliable longitudinal models and to bettercontrol recruitment speed aiming for a rate thatallows the system to perform optimally

Up front investment is required to allow designingthe trial simulating its characteristics and implement-ing it However the payoff goes beyond applying amore informative design Investigators participatingin ASTIN were extremely interested in the trialmethodology and regulatory agencies reviewing ourapproach were supportive of the concept of adaptivetreatment allocation and dynamic termination rulesin phase II We are currently engaged in implementingsimilar designs in areas beyond stroke

References

1 Lesko LJ Rowland M Peck CC Blaschke TFOptimizing the science of drug devlopment opportu-nities for better candidate selection and acceleratedevaluation in humans J Clin Pharmacol 2000 40 803ndash14

2 Sheiner LB Learning versus confirming in clinical drugdevelopment Clin Pharmacol Therapeut 1997 61 275ndash91

3 Muller H-G Schmitt T Choice of number of doses formaximum likelihood estimation of the ED50 for quantaldosendashresponse data Biometrics 1990 46 117ndash29Figure 12 The posterior distribution of the ED95

350 AP Grieve and M Krams

Clinical Trials 2005 2 340ndash351 wwwSCTjournalcom

4 Jorgensen HS Nakayama H Raaschou HO Vive-Larsen J Stoier M Olsen TS Outcome and time courseof recovery in stroke The Copenhagen Stroke StudyArchives Physical Medical Rehabilitation 1995 76399ndash412

5 Lees KR Diener H-C Asplund K Krams M UK279276 a neutrophil inhibitory glycoprotein in acutestroke tolerability and pharmacokinetics Stroke 2003 341704ndash709

6 West M Harrison PJ Bayesian forecasting anddynamic models 2nd edition New York SpringerVerlag 1997

7 Berry DA Muller P Grieve AP et al Adaptive Bayesiandesigns for dose-ranging trials In Carlin B Carriquiry AGatsonis C et al eds Case studies in Bayesian statistics VNew York Springer Verlag 2002 99ndash181

8 Krams M Lees KR Hacke W Grieve AP OrgogozoJ-M Ford GA Acute stroke therapy by inhibition of neu-trophils (ASTIN) An adaptive dosendashresponse study of UK-279276 in acute ischaemic stroke Stroke 2003 34 2543ndash48

9 Tamura RN Faries DE Andersen JS HeiligensteinJH A case study of an adaptive treatment allocation in aclinical trial in the treatment of out-patients withdepressive disorder J Am Statist Assoc 89 768ndash76

ASTIN a Bayesian adaptive dosendashresponse trial 351

wwwSCTjournalcom Clinical Trials 2005 2 340ndash351

Page 7: ASTIN: a Bayesian adaptive dose response trial in acute stroke · 0 ,1 ,3 ,4 , allowing any dose in the range of 0 8 to be studied. The advantage of increasing the number of doses

that it can learn appropriately and consequentlychoose appropriate doses which is illustrated by thelow density of patients allocated to the bottom fourdoses ndash indeed the second dose remains unused ndash aswell as to the doses on the plateau

Hundreds of thousands of such simulations wereconducted so that the algorithm could be appro-priately tuned One of the main items of interest washow the adaptive design would compare to atraditional design Table 1 illustrates the results of

Figure 7 Simulation of a single clinical trial

346 AP Grieve and M Krams

Clinical Trials 2005 2 340ndash351 wwwSCTjournalcom

the simulations If interest centered on identifyingthe ED95 from a curve that plateaued at 2 3 or 4points benefit over placebo then for 80 powerapproximately 2400 1000 and 600 patients arerequired respectively for 90 the correspondingnumbers are 3200 430 and 800 For an adaptivedesign based on a maximum of 1000 evaluablepatients the false positive rate is controlled at lessthan 5 while the effect ldquopowerrdquo for 3 and 4 pointsis acceptably high with small numbers of patientsOf course the comparison to the traditional designis not completely fair because the use of traditionalsequential designs would reduce the averagenumbers of patients

5 The results of the ASTIN study

In ASTIN patients who had suffered an acuteischemic stroke who arrived in the emergency roomof the hospital within 6 hours of its onset and whohad a baseline rating on the Scandinavian StrokeScale (SSS) of between 10 and 40 points were eligiblefor entry Details of the exclusion criteria can befound in Krams et al [8]

The study was run by an executive steeringcommittee and an Independent Data MonitoringCommittee (IDMC) with an expert Bayesian statis-tician who were responsible for ensuring the safety ofthe patients the integrity of the study monitoringthe performance of the algorithm and confirmingdecisions to continue or stop the study A number offormal meetings of the IDMC were prespecified forwhich reports were prepared by an independentstatistician and the computer system was runindependently of the sponsor by Tessella Ltd

A total of 966 patients were randomized ofwhom 26 received placebo There was preferentialallocation to the top three doses (Figure 8) and 40of patients were allocated to them

Figure 9 illustrates the course of ASTIN in terms ofthe estimate of the dosendashresponse relationship Inthe frame corresponding to Week 0 the priorestimate of the dosendashresponse is shown to be flatwith expectation of 10 points change from baseline

on the SSS This placebo effect size was estimatedfrom the data in the CSD study The following sixframes show estimates of the dosendashresponse at8-week intervals up to 48 weeks What is alreadyapparent by 16 weeks is that response on placebo ndash19 points ndash is far higher than anticipated from theCSD study and this persists during the course of thewhole study At week 32 the placebo effect is stillmuch higher than anticipated but there is littleindication of a true dosendashresponse At week 48 thefirst opportunity at which the IDMC could stopthe trial it was stopped for futility At this pointrecruitment was halted but the protocol requiredthat the patients who had already been enteredshould continue to be monitored for the full 13 weeksof the study The study was finally completed 66weeks after it started at which point there was littleindication of a positive dosendashresponse

The final estimate of the dosendashresponse ismagnified in Figure 10 and there are a number ofpoints to consider First as was noted above theplacebo response was far greater at 17 points thanthe anticipated 10 points The trial was designed todetect a 3 point benefit over placebo which clearlyis far greater than the estimated effect at any doseThis final model contained a single covariate ndashbaseline severity After the trial was completed anumber of predefined important prognostic covari-ates including tPA were added to the model butnone materially changed the final conclusionsThere was also no indication of any issues withadverse events or serious adverse events The onlydose-related effect that was seen was in theantibodies to neutrophil inhibitory factor (NIF) [8]

6 Post-trial learnings

At the conclusion of ASTIN a number of investi-gations were undertaken to assess the running of thetrial and the algorithm

Table 1 Operating characteristic of traditional and adaptive

designs

Benefit overplacebo

Power oftraditional design

Adaptive design(max 1000 pts)

80 90 Stoppedfor efficacy

Median numberof patients

0 mdash mdash 002 5012 2432 3220 056 6443 1080 432 090 4164 608 808 095 280

Figure 8 Patient allocation by dose group

ASTIN a Bayesian adaptive dosendashresponse trial 347

wwwSCTjournalcom Clinical Trials 2005 2 340ndash351

61 Operational investigations

Throughout the planning of ASTIN we anticipated aparticular recruitment speed in the event theachieved rate was double Centers themselves were

enthusiastic about the unique aspects of the design aphenomenon reported in a previous adaptive design[9] One consequence of faster recruitment is thatlearning cannot take place efficiently Our investi-gations suggest that a lower recruitment rate would

Figure 9 Posterior estimate of the dosendashresponse function during the course of ASTIN mdashmdash Posterior estimate --- - - - posterioruncertainty

348 AP Grieve and M Krams

Clinical Trials 2005 2 340ndash351 wwwSCTjournalcom

have caused the trial to stop with fewer patientsrecruited It would have been preferable to work withfewer centers aiming at achieving the optimalrecruitment speed after an initial ramp-up time

A second issue relates to exchangeability of thecenters In ASTIN there were 100 centers worldwideFewer centers would not only have brought us closerto an optimal recruitment speed it would mostlikely also have reduced variability

62 Statistical investigations

621 Longitudinal model

Figure 11A shows the average actual response forpatients who completed the trial (dashed line)

Figure 10 Final posterior estimate of the dosendashresponsefunction

Figure 11 Actual and imputed responses in ASTIN over time

ASTIN a Bayesian adaptive dosendashresponse trial 349

wwwSCTjournalcom Clinical Trials 2005 2 340ndash351

together with the average imputed responses fromthe longitudinal model (solid line) using the priorestimate of the longitudinal model from the CSDplotted over time Clearly the imputed values areconsiderably higher than the actual responsesFigure 11B shows the same plot using the imputedvalues obtained from the updated longitudinalmodel Here we see that once the longitudinal modelbegins to be updated the imputed values becomecloser to the actual observed responses illustratingthat the CSD prior was not very good Why

The parameterization of the longitudinal modelthat was implemented in ASTIN had the form

yi frac14 ai thorn bithorn1

where yi is the response at week i There are twoissues with this parameterization First the para-meters of the longitudinal model (ai and bi) arelinked and if one pair of parameters is updated allthe others are too Secondly each pair may be esti-mated in different categories of patients To see thiswe consider the acute stroke unit in CopenhagenThe 1351 patients who went through the unit in thetwo years of collection did not all stay for the sametime Those most severely impaired by their strokestayed there the longest If such patients were usedto predict the progress from say weeks 11 to 12 theyare not necessarily representative of the wholepopulation and the parameter-linkage will have animpact on all other parts of the longitudinal model

622 The estimate of the ED95

Consideration of Figure 10 might suggest that the ED95

for ASTIN should be of the order of 110 mg In fact theposterior estimate of the ED95 was 54 mg [8] Why

Conventionally the Bayesrsquo estimate of a para-meter is its posterior mean and this was used inASTIN Figure 12 displays the posterior distribution

of the ASTIN-ED95 ndash it is clearly bimodal It can beargued that in the context of futility a bimodalposterior distribution for the ED95 is not unex-pected since the posterior estimate of the dosendashresponse curve will consist of a series of randombumps This raises some issues

First in the context of futility the concept of anED95 is meaningless Secondly given that futility is apossibility ndash particularly with the history of stroketrials in neuroprotection ndash it is preferable to use theposterior mode rather than mean Alternatively anestimate could be obtained from the posteriorestimate of the dosendashresponse curve itself ratherthan average of the ED95 values from eachindividual MCMC iterate This suggestion wasconsidered during the course of ASTIN as theIDMC-statistician identified the anomaly pointedto above IDMC-inspired investigations showed thatan algorithm based on the average curve hadessentially the same properties as the ASTIN-implemented algorithm and that the final infer-ences would not have been fundamentallychanged

7 Discussion

We have shown that adaptive designs with dynamictermination rules can be successfully implementedin phase II proof-of-concept trials offering import-ant advantages such as improved learning aboutthe dosendashresponse and potentially earlier stoppingIn future applications it will be important to buildmore reliable longitudinal models and to bettercontrol recruitment speed aiming for a rate thatallows the system to perform optimally

Up front investment is required to allow designingthe trial simulating its characteristics and implement-ing it However the payoff goes beyond applying amore informative design Investigators participatingin ASTIN were extremely interested in the trialmethodology and regulatory agencies reviewing ourapproach were supportive of the concept of adaptivetreatment allocation and dynamic termination rulesin phase II We are currently engaged in implementingsimilar designs in areas beyond stroke

References

1 Lesko LJ Rowland M Peck CC Blaschke TFOptimizing the science of drug devlopment opportu-nities for better candidate selection and acceleratedevaluation in humans J Clin Pharmacol 2000 40 803ndash14

2 Sheiner LB Learning versus confirming in clinical drugdevelopment Clin Pharmacol Therapeut 1997 61 275ndash91

3 Muller H-G Schmitt T Choice of number of doses formaximum likelihood estimation of the ED50 for quantaldosendashresponse data Biometrics 1990 46 117ndash29Figure 12 The posterior distribution of the ED95

350 AP Grieve and M Krams

Clinical Trials 2005 2 340ndash351 wwwSCTjournalcom

4 Jorgensen HS Nakayama H Raaschou HO Vive-Larsen J Stoier M Olsen TS Outcome and time courseof recovery in stroke The Copenhagen Stroke StudyArchives Physical Medical Rehabilitation 1995 76399ndash412

5 Lees KR Diener H-C Asplund K Krams M UK279276 a neutrophil inhibitory glycoprotein in acutestroke tolerability and pharmacokinetics Stroke 2003 341704ndash709

6 West M Harrison PJ Bayesian forecasting anddynamic models 2nd edition New York SpringerVerlag 1997

7 Berry DA Muller P Grieve AP et al Adaptive Bayesiandesigns for dose-ranging trials In Carlin B Carriquiry AGatsonis C et al eds Case studies in Bayesian statistics VNew York Springer Verlag 2002 99ndash181

8 Krams M Lees KR Hacke W Grieve AP OrgogozoJ-M Ford GA Acute stroke therapy by inhibition of neu-trophils (ASTIN) An adaptive dosendashresponse study of UK-279276 in acute ischaemic stroke Stroke 2003 34 2543ndash48

9 Tamura RN Faries DE Andersen JS HeiligensteinJH A case study of an adaptive treatment allocation in aclinical trial in the treatment of out-patients withdepressive disorder J Am Statist Assoc 89 768ndash76

ASTIN a Bayesian adaptive dosendashresponse trial 351

wwwSCTjournalcom Clinical Trials 2005 2 340ndash351

Page 8: ASTIN: a Bayesian adaptive dose response trial in acute stroke · 0 ,1 ,3 ,4 , allowing any dose in the range of 0 8 to be studied. The advantage of increasing the number of doses

the simulations If interest centered on identifyingthe ED95 from a curve that plateaued at 2 3 or 4points benefit over placebo then for 80 powerapproximately 2400 1000 and 600 patients arerequired respectively for 90 the correspondingnumbers are 3200 430 and 800 For an adaptivedesign based on a maximum of 1000 evaluablepatients the false positive rate is controlled at lessthan 5 while the effect ldquopowerrdquo for 3 and 4 pointsis acceptably high with small numbers of patientsOf course the comparison to the traditional designis not completely fair because the use of traditionalsequential designs would reduce the averagenumbers of patients

5 The results of the ASTIN study

In ASTIN patients who had suffered an acuteischemic stroke who arrived in the emergency roomof the hospital within 6 hours of its onset and whohad a baseline rating on the Scandinavian StrokeScale (SSS) of between 10 and 40 points were eligiblefor entry Details of the exclusion criteria can befound in Krams et al [8]

The study was run by an executive steeringcommittee and an Independent Data MonitoringCommittee (IDMC) with an expert Bayesian statis-tician who were responsible for ensuring the safety ofthe patients the integrity of the study monitoringthe performance of the algorithm and confirmingdecisions to continue or stop the study A number offormal meetings of the IDMC were prespecified forwhich reports were prepared by an independentstatistician and the computer system was runindependently of the sponsor by Tessella Ltd

A total of 966 patients were randomized ofwhom 26 received placebo There was preferentialallocation to the top three doses (Figure 8) and 40of patients were allocated to them

Figure 9 illustrates the course of ASTIN in terms ofthe estimate of the dosendashresponse relationship Inthe frame corresponding to Week 0 the priorestimate of the dosendashresponse is shown to be flatwith expectation of 10 points change from baseline

on the SSS This placebo effect size was estimatedfrom the data in the CSD study The following sixframes show estimates of the dosendashresponse at8-week intervals up to 48 weeks What is alreadyapparent by 16 weeks is that response on placebo ndash19 points ndash is far higher than anticipated from theCSD study and this persists during the course of thewhole study At week 32 the placebo effect is stillmuch higher than anticipated but there is littleindication of a true dosendashresponse At week 48 thefirst opportunity at which the IDMC could stopthe trial it was stopped for futility At this pointrecruitment was halted but the protocol requiredthat the patients who had already been enteredshould continue to be monitored for the full 13 weeksof the study The study was finally completed 66weeks after it started at which point there was littleindication of a positive dosendashresponse

The final estimate of the dose–response is magnified in Figure 10, and there are a number of points to consider. First, as noted above, the placebo response was far greater (17 points) than the anticipated 10 points. The trial was designed to detect a 3-point benefit over placebo, which is clearly far greater than the estimated effect at any dose. This final model contained a single covariate, baseline severity. After the trial was completed, a number of predefined important prognostic covariates, including tPA, were added to the model, but none materially changed the final conclusions. There was also no indication of any issues with adverse events or serious adverse events. The only dose-related effect seen was in the antibodies to neutrophil inhibitory factor (NIF) [8].

6 Post-trial learnings

At the conclusion of ASTIN, a number of investigations were undertaken to assess the running of the trial and the algorithm.

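Table 1 Operating characteristics of traditional and adaptive designs

| Benefit over placebo (points) | Traditional design: patients for 80% power | Traditional design: patients for 90% power | Adaptive design (max 1000 pts): proportion stopped for efficacy | Adaptive design (max 1000 pts): median number of patients |
|---|---|---|---|---|
| 0 | – | – | 0.02 | 501 |
| 2 | 2432 | 3220 | 0.56 | 644 |
| 3 | 1080 | 1432 | 0.90 | 416 |
| 4 | 608 | 808 | 0.95 | 280 |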
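The two traditional-design columns of Table 1 are consistent with the usual normal-approximation rule, in which the required sample size scales as $(z_{1-\alpha/2}+z_{1-\beta})^2/\delta^2$. The small check below predicts every entry from the 2-point, 80%-power entry. It assumes a two-sided 5% significance level. The table does not state the test or the significance level, so both are our assumptions.

```python
from statistics import NormalDist

z = NormalDist().inv_cdf

def size_factor(power, alpha=0.05):
    """(z_{1-alpha/2} + z_{power})^2 for a two-sided test (normal approximation)."""
    return (z(1 - alpha / 2) + z(power)) ** 2

# Table 1, traditional design: benefit (points) -> patients for 80% / 90% power.
table = {2: (2432, 3220), 3: (1080, 1432), 4: (608, 808)}

base_n = table[2][0]  # 2-point benefit, 80% power
for delta, (n80, n90) in table.items():
    # n is proportional to size_factor(power) / delta^2, so rescale the base entry.
    pred80 = base_n * (2 / delta) ** 2
    pred90 = pred80 * size_factor(0.90) / size_factor(0.80)
    print(f"benefit {delta}: predicted {pred80:6.0f} / {pred90:6.0f}, "
          f"table {n80} / {n90}")
```

Every entry is reproduced to within about 1%. This supports, without proving, the reading that each traditional column gives the total sample size needed for that power. The number of arms and the variance behind the base entry are not stated here.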

Figure 8 Patient allocation by dose group

ASTIN: a Bayesian adaptive dose–response trial 347

www.SCTjournal.com Clinical Trials 2005; 2: 340–351

6.1 Operational investigations

Throughout the planning of ASTIN we anticipated a particular recruitment speed; in the event, the achieved rate was double. Centers themselves were enthusiastic about the unique aspects of the design, a phenomenon reported for a previous adaptive design [9]. One consequence of faster recruitment is that learning cannot take place efficiently, because more patients are randomized before the 13-week outcomes of earlier patients can inform the allocation. Our investigations suggest that a lower recruitment rate would have caused the trial to stop with fewer patients recruited. It would have been preferable to work with fewer centers, aiming to achieve the optimal recruitment speed after an initial ramp-up period.

Figure 9 Posterior estimate of the dose–response function during the course of ASTIN. Solid line: posterior estimate; dashed line: posterior uncertainty.

A second issue relates to the exchangeability of the centers. In ASTIN there were 100 centers worldwide. Fewer centers would not only have brought us closer to an optimal recruitment speed, but would most likely also have reduced variability.
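The remark that faster recruitment hampers learning follows from the 13-week follow-up. Once recruitment has run for at least 13 weeks, roughly rate × 13 randomized patients lack a final outcome at any moment. The sketch below assumes constant recruitment with no ramp-up. It compares a hypothetical cohort of 500 randomized patients under two rates. The first is about 20 patients per week, which is ASTIN's average if its 966 patients are spread over the 48 weeks of recruitment. The second is half that rate, roughly the anticipated one.

```python
FOLLOW_UP_WEEKS = 13  # weeks until the final (primary) outcome is observed

def pending_at_enrolment(n_randomized, rate_per_week, follow_up=FOLLOW_UP_WEEKS):
    """Weeks elapsed, and patients still without a final outcome, at the moment
    `n_randomized` patients have been enrolled at a constant weekly rate."""
    weeks_elapsed = n_randomized / rate_per_week
    # Only patients randomized in the last `follow_up` weeks are still pending.
    pending = rate_per_week * min(weeks_elapsed, follow_up)
    return weeks_elapsed, pending

for rate in (10, 20):  # hypothetical: roughly anticipated vs roughly achieved
    weeks, pending = pending_at_enrolment(500, rate)
    print(f"{rate:2d}/week: 500 randomized after {weeks:.0f} weeks, "
          f"{pending:.0f} ({pending / 500:.0%}) without a final outcome")
```

Doubling the rate doubles the share of enrolled patients whose outcomes cannot yet inform the allocation or the stopping rule (26% versus 52% here). More patients are therefore committed before the data can trigger a stop.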

6.2 Statistical investigations

6.2.1 Longitudinal model

Figure 11A plots over time the average actual response for patients who completed the trial (dashed line), together with the average imputed responses from the longitudinal model (solid line) based on the prior estimate of the longitudinal model from the CSD. Clearly, the imputed values are considerably higher than the actual responses. Figure 11B shows the same plot using the imputed values obtained from the updated longitudinal model. Once the longitudinal model begins to be updated, the imputed values move closer to the actual observed responses, illustrating that the CSD prior was not very good. Why?

Figure 10 Final posterior estimate of the dose–response function

Figure 11 Actual and imputed responses in ASTIN over time

The parameterization of the longitudinal model that was implemented in ASTIN had the form

$$y_i = a_i + b_i\,y_{i+1}$$

where $y_i$ is the response at week $i$. There are two issues with this parameterization. First, the parameters of the longitudinal model ($a_i$ and $b_i$) are linked, so if one pair of parameters is updated, all the others are too. Secondly, each pair may be estimated in different categories of patients. To see this, consider the acute stroke unit in Copenhagen. The 1351 patients who went through the unit during the two years of data collection did not all stay for the same time: those most severely impaired by their stroke stayed the longest. If such patients were used to predict the progress from, say, week 11 to week 12, they would not necessarily be representative of the whole population, and through the parameter linkage this would affect all other parts of the longitudinal model.
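The first issue, parameter linkage, can be made concrete by composing the chained model. Substituting recursively from week 12 up to the final week gives $y_i = A_i + B_i\,y_{13}$, with $B_i = \prod_{j=i}^{12} b_j$. The mapping from any week to the final outcome therefore depends on every later pair $(a_j, b_j)$. In this sketch the coefficient values are hypothetical, the error term is ignored, and the imputation is a simple point inversion rather than the fully Bayesian imputation used in ASTIN.

```python
FINAL_WEEK = 13  # week of the final outcome

def compose(a, b):
    """Return dicts A, B with y_i = A[i] + B[i] * y_final for every week i.

    `a` and `b` hold the chained coefficients of y_i = a_i + b_i * y_{i+1}
    for weeks 1..FINAL_WEEK-1; the error term is ignored.
    """
    A, B = {FINAL_WEEK: 0.0}, {FINAL_WEEK: 1.0}
    for i in range(FINAL_WEEK - 1, 0, -1):  # substitute backwards from week 12
        A[i] = a[i] + b[i] * A[i + 1]
        B[i] = b[i] * B[i + 1]
    return A, B

def impute_final(y_obs, week, A, B):
    """Point imputation of the final response from a response at `week`."""
    return (y_obs - A[week]) / B[week]

# Hypothetical coefficients, identical for every week.
a = {i: -1.0 for i in range(1, FINAL_WEEK)}
b = {i: 0.95 for i in range(1, FINAL_WEEK)}
before = impute_final(20.0, 4, *compose(a, b))

# Change only a_11 (the week 11 -> 12 link): the imputation from week 4 moves too.
a[11] = -3.0
after = impute_final(20.0, 4, *compose(a, b))
print(f"imputed final response from week 4: {before:.2f} -> {after:.2f}")
```

Suppose the week 11 to week 12 link is estimated mainly from the most severely affected patients, who stay longest in hospital. That bias then shifts imputations made from every earlier week, not only those made from week 11.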

6.2.2 The estimate of the ED95

Consideration of Figure 10 might suggest that the ED95 for ASTIN should be of the order of 110 mg. In fact, the posterior estimate of the ED95 was 54 mg [8]. Why?

Conventionally, the Bayes' estimate of a parameter is its posterior mean, and this was used in ASTIN. Figure 12 displays the posterior distribution of the ASTIN ED95, which is clearly bimodal. It can be argued that, in the context of futility, a bimodal posterior distribution for the ED95 is not unexpected, since the posterior estimate of the dose–response curve will consist of a series of random bumps. This raises some issues.

First, in the context of futility the concept of an ED95 is meaningless. Secondly, given that futility is a possibility, particularly in view of the history of neuroprotection trials in stroke, it is preferable to use the posterior mode rather than the mean. Alternatively, an estimate could be obtained from the posterior estimate of the dose–response curve itself, rather than by averaging the ED95 values from the individual MCMC iterates. This suggestion was considered during the course of ASTIN, after the IDMC statistician identified the anomaly described above. IDMC-inspired investigations showed that an algorithm based on the average curve had essentially the same properties as the algorithm implemented in ASTIN, and that the final inferences would not have been fundamentally changed.
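Simulated posterior draws can reproduce the gap between three estimates: the posterior mean of the per-draw ED95 (the estimate ASTIN used), the posterior mode, and the ED95 of the average curve. The sketch below is hypothetical throughout. The dose grid (0 to 120 mg), the single-bump curve family and the two clusters of bump locations were chosen only to mimic the "random bumps" behaviour behind Figure 12. The ED95 of a curve is taken as the smallest grid dose whose benefit reaches 95% of that curve's maximum.

```python
import numpy as np

rng = np.random.default_rng(2005)
doses = np.arange(0, 121)  # hypothetical dose grid, mg

# Hypothetical posterior draws of the benefit over placebo: each draw is one
# random bump, centred near 20 mg or near 100 mg with equal probability.
n_draws = 4000
centres = np.where(rng.random(n_draws) < 0.5,
                   rng.normal(20, 5, n_draws), rng.normal(100, 5, n_draws))
heights = rng.uniform(0.5, 2.0, n_draws)
curves = heights[:, None] * np.exp(-((doses[None, :] - centres[:, None]) / 10.0) ** 2)

def ed95(curve):
    """Smallest grid dose whose benefit reaches 95% of the curve's maximum."""
    return doses[np.argmax(curve >= 0.95 * curve.max())]

per_draw = np.array([ed95(c) for c in curves])    # one ED95 per MCMC-like draw

mean_est = per_draw.mean()                        # posterior mean of the ED95
counts, edges = np.histogram(per_draw, bins=np.arange(0, 131, 10))
k = counts.argmax()
mode_est = 0.5 * (edges[k] + edges[k + 1])        # centre of the modal 10 mg bin
curve_est = ed95(curves.mean(axis=0))             # ED95 of the average curve

print(f"posterior mean {mean_est:.1f} mg, mode ~{mode_est:.0f} mg, "
      f"average-curve ED95 {curve_est} mg")
```

With this set-up the per-draw ED95 values form two clusters, around 18 mg and 98 mg. Their mean lands in the empty region between them, a dose that almost no individual draw supports. The mode and the average-curve estimate both fall inside one of the clusters.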

7 Discussion

We have shown that adaptive designs with dynamic termination rules can be successfully implemented in phase II proof-of-concept trials, offering important advantages such as improved learning about the dose–response and potentially earlier stopping. In future applications it will be important to build more reliable longitudinal models and to control recruitment speed better, aiming for a rate that allows the system to perform optimally.

Up-front investment is required to design the trial, simulate its characteristics and implement it. However, the payoff goes beyond applying a more informative design. Investigators participating in ASTIN were extremely interested in the trial methodology, and regulatory agencies reviewing our approach were supportive of the concept of adaptive treatment allocation and dynamic termination rules in phase II. We are currently implementing similar designs in areas beyond stroke.

Figure 12 The posterior distribution of the ED95

References

1 Lesko LJ, Rowland M, Peck CC, Blaschke TF. Optimizing the science of drug development: opportunities for better candidate selection and accelerated evaluation in humans. J Clin Pharmacol 2000; 40: 803–14.

2 Sheiner LB. Learning versus confirming in clinical drug development. Clin Pharmacol Therapeut 1997; 61: 275–91.

3 Müller H-G, Schmitt T. Choice of number of doses for maximum likelihood estimation of the ED50 for quantal dose–response data. Biometrics 1990; 46: 117–29.

4 Jorgensen HS, Nakayama H, Raaschou HO, Vive-Larsen J, Stoier M, Olsen TS. Outcome and time course of recovery in stroke. The Copenhagen Stroke Study. Archives Physical Medical Rehabilitation 1995; 76: 399–412.

5 Lees KR, Diener H-C, Asplund K, Krams M. UK-279,276, a neutrophil inhibitory glycoprotein, in acute stroke: tolerability and pharmacokinetics. Stroke 2003; 34: 1704–709.

6 West M, Harrison PJ. Bayesian forecasting and dynamic models, 2nd edition. New York: Springer Verlag, 1997.

7 Berry DA, Müller P, Grieve AP, et al. Adaptive Bayesian designs for dose-ranging trials. In Carlin B, Carriquiry A, Gatsonis C, et al. eds. Case studies in Bayesian statistics V. New York: Springer Verlag, 2002: 99–181.

8 Krams M, Lees KR, Hacke W, Grieve AP, Orgogozo J-M, Ford GA. Acute stroke therapy by inhibition of neutrophils (ASTIN): an adaptive dose–response study of UK-279,276 in acute ischaemic stroke. Stroke 2003; 34: 2543–48.

9 Tamura RN, Faries DE, Andersen JS, Heiligenstein JH. A case study of an adaptive treatment allocation in a clinical trial in the treatment of out-patients with depressive disorder. J Am Statist Assoc 89: 768–76.
