Some Considerations for Choosing Among Types of
Phase II Designs
Paul Catalano
June 26, 2009
Determine if a new agent or a new treatment regimen appears sufficiently efficacious to be worth further investigation
◦ Not attempting to prove or establish that the new agent improves outcome
Verify the safety of the therapy
Provide statistical rigor/formal evaluation context and targeted patient population
Purpose of Phase II Studies
Often formulated as testing a null hypothesis vs. an alternative
◦ E.g. H0: pr = 0.05 vs. Ha: pr = 0.20, where pr is the true proportion of patients who will respond to the new agent
Consequence of a type I error (α): an ineffective agent will be studied further
◦ Use α = 0.10 (one-sided)
◦ Larger than in phase III studies
General Approach
Consequence of a type II error (β): an effective agent will not be studied further
◦ β should be < 0.10
In practice, there tend to be multiple phase II studies performed in multiple diseases, so the overall chance of missing an effective treatment is lower
Selection of therapies for phase III testing is based on all available data, not on a single phase II study
General Approach
Single arm with single analysis (can have multiple single arm studies in one protocol)
Single arm with interim stopping rules (usually with suspension of accrual)
Randomized selection designs (pick-the-winner)
Comparative randomized control
Randomized discontinuation designs
Types of Phase II Designs
Patients refractory to standard therapy
If some patients improve, agent must have some activity
Often use H0: pr = 0.05 vs. Ha: pr = 0.20
Simon’s (1989) optimal two-stage designs minimize expected sample size under H0
Classic Design for Screening New Agents
Simon’s optimal design for pr = 0.05 vs 0.20:
◦ 1st stage: treat 12 patients; stop if no responses
◦ 2nd stage: treat 25 patients; conclude inactive if < 4 / 37 (11%) respond
CTEP / IDB has been pushing this design for new agents in diseases without prior evidence of activity
Classic Design for Screening New Agents
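The operating characteristics of the two-stage rule quoted above (12 patients, stop if no responses; then 25 more, inactive if fewer than 4 of 37 respond) can be checked by direct binomial enumeration. The following is a minimal sketch, not the software used to derive the design; the function and variable names are illustrative assumptions.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_stage_characteristics(p, n1, r1, n2, r):
    """Two-stage rule: stop after stage 1 if responses <= r1;
    otherwise enroll n2 more and call the agent inactive if total responses <= r.
    Returns (P(early stop), P(declare active), expected sample size)."""
    p_early_stop = sum(binom_pmf(x, n1, p) for x in range(r1 + 1))
    p_active = 0.0
    for x1 in range(r1 + 1, n1 + 1):
        needed = max(r + 1 - x1, 0)  # stage-2 responses needed to exceed r overall
        tail = sum(binom_pmf(x2, n2, p) for x2 in range(needed, n2 + 1))
        p_active += binom_pmf(x1, n1, p) * tail
    expected_n = n1 + (1 - p_early_stop) * n2
    return p_early_stop, p_active, expected_n


# Design from the slide: n1 = 12, r1 = 0, n2 = 25, r = 3 (inactive if < 4 of 37)
pet0, alpha, en0 = two_stage_characteristics(0.05, 12, 0, 25, 3)
_, power, _ = two_stage_characteristics(0.20, 12, 0, 25, 3)
print(f"P(early stop | H0) = {pet0:.3f}, expected n under H0 = {en0:.1f}")
print(f"type I error = {alpha:.3f}, power = {power:.3f}")
```

Under H0 the trial stops early a little more than half the time, so the expected sample size is about 23.5 rather than 37.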
Single arm two-stage designs are inefficient for multicenter studies
◦ Time and effort needed to develop protocol and CRFs and set up database
◦ Cost of activation at institutions
Prefer settings where single stage designs are appropriate or studies with multiple strata and / or multiple arms
Classic Design for Screening New Agents
Might be appropriate
◦ If some prior evidence of activity
◦ For combinations of new drugs with standard treatments
Example: H0: pr = 0.20 vs Ha: pr = 0.37 (null rate depends on level of activity for standard rx)
◦ 1-stage: 45 patients, reject H0 if > 12 / 45 (27%) respond
◦ 2-stage: conclude inactive if < 5 / 25 (20%, 1st stage) or < 13 / 50 (26% overall) respond
Single Stage Accrual Designs
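The single-stage example (45 patients, reject H0 if more than 12 respond) can be checked with an exact binomial tail calculation. This is a minimal sketch; the helper name `upper_tail` is an illustrative assumption.

```python
from math import comb


def upper_tail(n, p, cutoff):
    """P(X > cutoff) for X ~ Binomial(n, p)."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(cutoff + 1, n + 1)
    )


# 1-stage design from the slide: n = 45, reject H0 if > 12 responses
alpha_1stage = upper_tail(45, 0.20, 12)  # chance of rejecting when pr = 0.20
power_1stage = upper_tail(45, 0.37, 12)  # chance of rejecting when pr = 0.37
print(f"type I error = {alpha_1stage:.3f}, power = {power_1stage:.3f}")
```

The same function applies to a disease stabilization endpoint by substituting ps for pr.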
Cytostatic agents might improve disease stabilization rates rather than improve response rates
Test for improvement in disease stabilization rates; e.g. H0: ps = 0.30 vs. Ha: ps = 0.50, where ps = proportion stable or responding (free of progression) at x months (e.g. x = 4)
Calculations the same as for response
Improvement in Disease Stabilization
Multinomial: test e.g. H0: pr = 0.05 and ps = 0.30 vs. Ha: pr > 0.05 or ps > 0.30
◦ Less efficient than binomial
◦ May be more difficult to interpret
TTP or PFS
◦ Kaplan-Meier estimate at single time or other nonparametric test
◦ Parametric (e.g. exponential) models can be slightly more efficient
Survival generally not appropriate
Other Endpoints
Test e.g. H0: pr = 0.05 and ps = 0.30 vs. Ha: pr > 0.05 or ps > 0.30
Need to consider power against multiple alternative values; e.g.
◦ Ha1: pr = 0.20, ps = 0.30
◦ Ha2: pr = 0.14, ps = 0.40
◦ Ha3: pr = 0.05, ps = 0.50
1-stage: n = 46, reject H0 if > 6 responses or > 20 cases responding or stable
◦ α = 0.09; power = 0.92 for Ha1, Ha2, & Ha3
Example Multinomial Design
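The rejection probability of the multinomial rule (n = 46, reject if more than 6 responses or more than 20 patients responding or stable) can be computed by enumerating the trinomial outcomes. In this sketch ps is read as the probability of stable disease without response, so that response, stable, and progression are mutually exclusive; that reading is an assumption, and the function name is illustrative.

```python
from math import comb


def reject_probability(n, p_resp, p_stable, resp_cut, benefit_cut):
    """P(responses > resp_cut or responses + stable > benefit_cut)
    under a trinomial model with categories response / stable / progression."""
    p_prog = 1.0 - p_resp - p_stable
    total = 0.0
    for r in range(n + 1):
        for s in range(n - r + 1):
            if r > resp_cut or r + s > benefit_cut:
                ways = comb(n, r) * comb(n - r, s)
                total += ways * p_resp**r * p_stable**s * p_prog ** (n - r - s)
    return total


# Hypotheses from the slide, as (pr, ps) pairs
scenarios = {
    "H0": (0.05, 0.30),
    "Ha1": (0.20, 0.30),
    "Ha2": (0.14, 0.40),
    "Ha3": (0.05, 0.50),
}
results = {
    name: reject_probability(46, pr, ps, 6, 20) for name, (pr, ps) in scenarios.items()
}
for name, prob in results.items():
    print(f"{name}: P(reject H0) = {prob:.3f}")
```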
Separate evaluation of each arm
◦ Each arm evaluated in a similar population
Selection designs: select the ‘best’ arm for further study
Comparative randomized control
Randomized discontinuation
Randomized designs are larger and more complex – need to explain each arm to patients
Types of Randomized Phase II Designs
Concern about selection bias in studies without a simultaneous control group
◦ Studies can enroll different patient groups even with the same nominal population
◦ Population drift and stage migration
Control groups are more appropriate for evaluating contribution to a combination or effect on progression than for determining whether there is any response activity
Comparing studies from different groups
Control Arms
Often not needed because
◦ Phase II studies can only detect fairly large effects, so biases would need to be large
◦ Consequence of a false positive is further testing of an inactive drug
◦ Cooperative group or other studies conducted in the same network with central data review produce fairly consistent results
Increase the time and expense for phase II evaluation
Control Arms
(Simon, Wittes and Ellenberg, 1985) randomize between 2 or more experimental arms (no control arm)
◦ In a sense, least efficacious arm is a control for the others
Select the best arm for further evaluation
Usually define ‘best’ to be the arm with the best outcome, no matter how small the difference
Randomized Selection (Pick-the-Winner) Design
With two arms, α = 0.50
◦ Rationale: doesn’t matter which arm is selected if they are nearly equivalent
Often separate efficacy test for each arm, too
◦ 1-stage or 2-stage
Usually prefer randomizing over a series of separate studies
◦ Facilitates (informal) comparisons
◦ Guards against sampling bias
Randomized Selection (Pick-the-Winner) Design
Randomized Selection (Pick-the-Winner) Design
RANDOMIZE among RX1, RX2, ..., RXk
Estimated response rates: R1/n1, R2/n2, ..., Rk/nk
RXj is ‘best’ if Rj/nj > Ri/ni for all i ≠ j
Can use other endpoints
Example: Simon’s optimal 2-stage design for H0: pr = 0.20 vs Ha: pr = 0.40 enrolls 17 patients in the 1st stage and 20 in the 2nd (α = β = 0.10)
Apply this design to each arm in a 2-arm randomized selection design
Randomized Selection (Pick-the-Winner) Design
Probability that each arm is the winner
pr1    pr2    RX1      RX2      Neither
0.20   0.40   0.015    0.890    0.095
0.30   0.40   0.147    0.758    0.095
Probability of selecting the best arm declines as the number of arms increases
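$$
\begin{aligned}
P\{X_1 > \max(X_2, \ldots, X_k)\}
&= \sum_x P(X_1 = x)\, P(X_2 < x, X_3 < x, \ldots, X_k < x \mid X_1 = x) \\
&= \sum_x P(X_1 = x)\, P(X_2 < x)\, P(X_3 < x) \cdots P(X_k < x) \\
&= \sum_x P(X_1 = x)\, P(X_2 < x)^{k-1}
\end{aligned}
$$
if X2, …, Xk have the same distribution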
Randomized Selection (Pick-the-Winner) Design
X1~Bin(50, 0.32); X2,…,Xk~Bin(50, 0.20) gives P{X1>max(X2,…,Xk)} = 0.90 for k = 2 and P{X1>max(X2,…,Xk)} = 0.72 for k = 6
Advanced renal trial of several targeted agents: 6 arms, n = 55 / arm
◦ TTP compared via Cox model
◦ If one arm has median TTP of 7.2 months and the other 5 have median TTP of 4.8 months (50% improvement), then the probability of selecting the best arm is 0.87
Randomized Selection (Pick-the-Winner) Design
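The selection probability P{X1 > max(X2,…,Xk)} for binomial response counts can be evaluated directly from the sum over x. This is a minimal sketch assuming independent arms of equal size, with ties counted as failing to select the best arm; the function names are illustrative.

```python
from math import comb


def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def prob_select_best(n, p_best, p_other, k):
    """P(X1 > max(X2, ..., Xk)) with X1 ~ Bin(n, p_best) and
    X2, ..., Xk independent Bin(n, p_other)."""
    total = 0.0
    cdf_below = 0.0  # running value of P(X2 < x)
    for x in range(n + 1):
        total += binom_pmf(x, n, p_best) * cdf_below ** (k - 1)
        cdf_below += binom_pmf(x, n, p_other)
    return total


# Example from the slide: 50 patients per arm, 0.32 vs 0.20
p_two_arms = prob_select_best(50, 0.32, 0.20, 2)
p_six_arms = prob_select_best(50, 0.32, 0.20, 6)
print(f"k = 2: {p_two_arms:.2f}")
print(f"k = 6: {p_six_arms:.2f}")
```

Raising P(X2 < x) to the power k - 1 is what makes the probability fall as arms are added: every additional arm is another chance for an inferior treatment to match or beat the best one by chance.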
Discussed for evaluating cytostatic agents in Korn et al. (2001)
Randomize experimental vs. standard and formally compare the arms
Appropriate if don’t have a reasonable prior estimate of expected control arm outcomes
Endpoint could be any of the standard phase II endpoints (e.g. TTP, response)
Might target larger differences than a phase III
Comparative Randomized Control Design
Test could be a definitive (phase III) evaluation with α < 0.025 (one-sided)
◦ If little prior phase II efficacy data, need early stopping rules for lack of benefit
◦ Might not be appropriate if a second phase III study evaluating survival would be needed
Comparative Randomized Control Design
Test could be a suggestive (phase II) evaluation with a larger α (e.g. 0.10 to 0.20)
◦ Appropriate for screening new agents
◦ If positive, still needs to be followed by a definitive phase III study
◦ Korn et al. suggest using α = 0.20, because the sample size with α = 0.10 is large enough that it might be better to go directly to the definitive study
Comparative Randomized Control Design
3-arm comparison of TTP (two dose levels of bevacizumab), targeting a large difference (100% improvement in median TTP), but designed to be definitive (Yang, 2003)
◦ Overall α = 0.05 (two-sided), β = 0.20
◦ Each comparison at one-sided 0.0125
◦ Needed about 50 patients / arm (stopped early because of highly significant results)
◦ Crossover from placebo to low dose drug
Study of Bevacizumab vs. Placebo in RCC
Was overall α = 0.05 appropriate?
◦ A second, larger study is still needed for survival
◦ Could have identified drug as promising with even fewer patients (larger α)
Yang Study of Bevacizumab vs. Placebo
Was a placebo needed?
◦ Evaluation bias should be much smaller than a doubling of TTP
◦ May not be needed to identify promising drugs
◦ FDA tends to require a placebo for TTP
Was the control arm needed?
◦ Would results from a single arm, single institution study have been convincing?
Yang Study of Bevacizumab vs. Placebo
Cisplatin + C225 vs. Cisplatin + Placebo
Designed to have 90% power to detect an improvement in median PFS from 2 months to 4 months (100% improvement) with α = 0.025 (one-sided)
With allowance for non-compliance, required 54 eligible patients / arm
Final accrual was 117 eligible patients
E5397 – Advanced Head and Neck Cancer
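To see how the choice of α drives the size of a comparative design with a time-to-event endpoint, the number of events can be approximated with Schoenfeld's formula for a 1:1 randomized log-rank comparison. This is a sketch under stated assumptions: the source does not say which method was used to size E5397, and a doubling of median PFS is converted to a hazard ratio of 2 by assuming exponential PFS. The function name is illustrative.

```python
from math import ceil, log
from statistics import NormalDist


def required_events(hazard_ratio, alpha_one_sided, power):
    """Approximate total events for a 1:1 log-rank comparison
    (Schoenfeld approximation): 4 * (z_alpha + z_beta)^2 / (ln HR)^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha_one_sided)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(4 * (z_alpha + z_beta) ** 2 / log(hazard_ratio) ** 2)


# Median PFS 2 -> 4 months corresponds to HR = 2 under exponential PFS (assumption)
events_definitive = required_events(2.0, 0.025, 0.90)  # alpha used in E5397
events_alpha_10 = required_events(2.0, 0.10, 0.90)     # 'suggestive' phase II level
events_alpha_20 = required_events(2.0, 0.20, 0.90)     # level suggested by Korn et al.
print(events_definitive, events_alpha_10, events_alpha_20)
```

The result is a count of events, not patients; the number of patients enrolled must be larger to allow for censoring, ineligibility, and non-compliance.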
E5397 Summary of Results
                  Cisplatin + C225   Cisplatin + Placebo   P-value (one-sided)
Response Rate     26%                10%                   0.02
Median PFS        4.2 mos            2.7 mos               0.09
Median Survival   9.2 mos            8.0 mos               0.21
Hazard Ratios (Placebo/C225) and 95% CIs
PFS: 1.31 (0.91, 1.89)
Survival: 1.16 (0.80, 1.69)
E5397 PFS by Treatment
[Figure: Kaplan-Meier curves of PFS probability vs. months (0 to 20). Cisplatin + C225: 55 events / 56 cases; Cisplatin + Placebo: 59 events / 60 cases; one-sided P = 0.09]
E5397 Survival by Treatment
[Figure: Kaplan-Meier curves of survival probability vs. months (0 to 30). Cisplatin + C225: 54 events / 57 cases; Cisplatin + Placebo: 57 events / 60 cases; one-sided P = 0.21]
Study is not definitive – underpowered for both PFS and survival
Is it promising – should a follow-up study of C225 be done?
Would a better strategy have been a single arm phase II with a response endpoint, followed by a definitive phase III based on the ‘promising’ response rate of 26%?
E5397
PFS reaches the one-sided α = 0.10 cutoff for a ‘promising’ phase II result
Survival would not have been an appropriate endpoint
◦ Estimated improvement is 16%
◦ Confidence interval consistent with 20% decrease to a 69% increase
◦ Phase II sample sizes are not adequate to detect realistic survival effects
E5397
An enrichment strategy based on randomizing patients who appear to be doing well on the treatment (Rosner, Stadler, Ratain, 2002)
Initially all patients are treated, patients free of progression for some period of time are randomized between continuing treatment and placebo, with crossover from placebo to treatment at progression or specified PFI
Complex design with a blinded randomization and 3 registration points
Randomized Discontinuation Design
Randomized Discontinuation Design
REGISTER → Initial RX (run-in) → REASSESS
◦ PD: off study
◦ SD: RANDOMIZE between RX and placebo (crossover at PD or after specified PFI)
◦ Response: continue RX
Usefulness depends on how successful the run-in is in selecting patients benefiting from treatment
◦ TTP is highly variable in most diseases, so randomized population will be a mixture
◦ Korn et al. (2001), Capra (2004) suggest often less efficient than standard RCT
Carry-over effect could dilute difference between randomized arms
Requires much larger sample size
Randomized Discontinuation Design
CALGB 69901 (CAI in RCC)
Randomize patients if stable after 16 weeks
Enrolled 374 patients; randomized 65 eligible patients (17%)
◦ Enrichment strategy was not successful, but does CAI have any activity?
◦ Did they learn any more from 374 patients than ECOG did from 57 patients in a more traditional two-stage phase II design (E4896)?
Randomized Discontinuation Design
In many settings, conventional phase II designs may still be appropriate
Start-up costs for single-arm two-stage designs are a concern
Randomized phase II studies allow evaluation of multiple agents or schedules and protect against sampling bias
Selection designs are useful for informal comparison and identifying promising agents
Main Points
Control arms should not ordinarily be needed, but can be effective in some settings
Survival is seldom (never?) the best phase II endpoint
Randomized discontinuation designs may not be appropriate and need to be strongly justified
Main Points
Capra WB (2004). Comparing the power of the discontinuation design to that of the classic randomized design on time-to-event endpoints. Controlled Clinical Trials 25:168-177.
Freidlin B, Dancey J, Korn EL, Zee B, Eisenhauer E (2002). Multinomial phase II trial designs (letter to the editor). Journal of Clinical Oncology 20:599.
Korn EL, Arbuck SG, Pluda JM, Simon R, Kaplan RS, Christian MC (2001). Clinical trial designs for cytostatic agents: are new approaches needed? Journal of Clinical Oncology 19:265-272.
Rosner GL, Stadler W, Ratain MJ (2002). Randomized discontinuation design: application to cytostatic antineoplastic agents. Journal of Clinical Oncology 20:4478-4484.
Simon R (1989). Optimal two-stage designs for phase II clinical trials. Controlled Clinical Trials 10:1-10.
Simon R, Wittes RE, Ellenberg SS (1985). Randomized phase II clinical trials. Cancer Treatment Reports 69:1375-1381.
Yang JC et al. (2003). A randomized trial of bevacizumab, an anti-vascular endothelial growth factor antibody, for metastatic renal cancer. New England Journal of Medicine 349:427-434.
References