The changing landscape of interim analyses for efficacy / futility. Marc Buyse, ScD, IDDI, Louvain-la-Neuve, Belgium, marc.buyse@iddi.com. Massachusetts Biotechnology Council, Cambridge, Mass, June 2, 2009


Page 1:

The changing landscape of interim analyses for efficacy / futility

Marc Buyse, ScD

IDDI, Louvain-la-Neuve, Belgium

marc.buyse@iddi.com

Massachusetts Biotechnology Council

Cambridge, Mass

June 2, 2009

Page 2:

Reasons for Interim Analyses

• Early stopping for

• safety

• extreme efficacy

• futility

• Adaptation of design based on observed data to

• play the winner / drop the loser

• maintain power

• make any adaptation, for whatever reason and whether or not data-derived, whilst controlling for α

Page 3:

Methods for Interim Analyses

• Multi-stage designs / seamless transition designs

• Group-sequential designs

• Stochastic curtailment

• Sample size adjustments

• Adaptive (« flexible ») designs

Page 4:

Early Stopping

Helsinki Declaration:

“Physician should cease any investigation if the hazards are found to outweigh the potential benefits.” (« Primum non nocere »)

Trials with serious, irreversible endpoints should be stopped if one treatment is “proven” to be superior, and such potential stopping should be formally pre-specified in the trial design.

Page 5:

The Cost of Delay

[Bar chart: average daily sales ($ in millions, axis from $0 to $12) for Prilosec, Zocor, Norvasc, Paxil and Claritin]

« Blockbusters » reach sales > 500 M$ a year (> 1 M$ a day)

Page 6:

Fixed Sample Size Trials…

1 – the sample size is calculated to detect a given difference at given significance and power

2 – the required number of patients is accrued

3 – patient outcomes are analyzed at the end of the trial, after observation of the pre-specified number of events

Page 7:

…vs (Group) Sequential Trials…

1 – the sample size is calculated to detect a given difference at given significance and power

2 – patients are accrued until a pre-planned interim analysis of patient outcomes takes place

3a – the trial is terminated early, or

3b – the trial continues unchanged

4 – patient outcomes are analyzed at the end of the trial, after observation of the pre-specified number of events

Page 8:

…vs Adaptive Trials

1 – the sample size is calculated to detect a given difference at given significance and power

2 – patients are accrued until a pre-planned interim analysis of patient outcomes takes place

3a – the trial is terminated early, or

3b – the trial continues unchanged, or

3c – the trial continues with adaptations

4 – patient outcomes are analyzed at the end of the trial, after observation of the pre-specified or modified number of events

Page 9:

Randomized phase II trial with continuation as phase III trial

Simultaneous screening of several treatment groups with continuation as phase III trial:

[Diagram: Arms 1, 2 and 3 run through PHASE II; early stopping of one or more arms; the remaining arms continue into PHASE III for the comparison of the arms]

Page 10:

Phase III trial with interim analysis

Phase III trial with interim look at data:

[Diagram: Arms 1, 2 and 3 run through PHASE III; interim comparison of the arms (INTERIM); the trial continues in PHASE III up to the comparison of the arms]

Page 11:

Seamless transition designs (e.g. for dose selection)

Designs can be operationally or inferentially seamless:

Page 12:

Group Sequential Trials

If several analyses are carried out, the type I error is inflated if each analysis is carried out at the target level of significance.

So, the interim analyses must use an adjusted level of significance so as to preserve the overall type I error.

Page 13:

Inflation of α with multiple analyses

With 5 analyses performed at level 0.05, the overall level is about 0.14

Page 14:

Adjusting for multiple analyses

The 5 analyses must be performed at level 0.0159 in order to preserve an overall level of 0.05
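The inflation and its correction can be checked by simulation. The sketch below is an illustration only, assuming equally spaced looks, a known variance and independent standard normal increments between looks; the function name `overall_type_one_error` and its defaults are hypothetical, not taken from the slides.

```python
import random
from statistics import NormalDist


def overall_type_one_error(n_looks, nominal_level, n_trials=20000, seed=1):
    """Monte Carlo estimate of the chance that at least one of `n_looks`
    equally spaced two-sided tests rejects a true null hypothesis."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - nominal_level / 2)  # two-sided critical value
    rejections = 0
    for _ in range(n_trials):
        score = 0.0  # running sum of independent N(0, 1) group increments
        for k in range(1, n_looks + 1):
            score += rng.gauss(0.0, 1.0)
            z_k = score / k ** 0.5  # standardized statistic at look k
            if abs(z_k) >= crit:
                rejections += 1
                break  # the trial stops at the first rejection
    return rejections / n_trials


if __name__ == "__main__":
    print(overall_type_one_error(5, 0.05))    # five unadjusted looks
    print(overall_type_one_error(5, 0.0159))  # five looks at the adjusted level
```

Being a Monte Carlo estimate, the result varies slightly with `seed` and `n_trials`.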

Page 15:

Group sequential designs

Test H0: Δ = 0 vs. HA: Δ ≠ 0

m pts. accrued to each arm between analyses

Use standardized test statistic Zk, k = 1,...,K

$$Z_k = \frac{\bar{X}_{Ek} - \bar{X}_{Ck}}{\sqrt{2\sigma^2/(mk)}}, \qquad \bar{X}_{Ek} = \frac{1}{mk}\sum_{i=1}^{mk} X_{Ei}, \qquad \bar{X}_{Ck} = \frac{1}{mk}\sum_{i=1}^{mk} X_{Ci}$$
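As a minimal sketch, the standardized statistic at each analysis can be computed as the difference between the two arm means divided by its standard error, √(2σ²/(mk)). This assumes a known common standard deviation σ and m patients per arm between analyses; the function name `z_statistics` and the argument names are hypothetical.

```python
def z_statistics(x_e, x_c, m, sigma):
    """Return Z_1, ..., Z_K for two arms analysed after every m patients per arm.

    x_e, x_c: outcomes in the experimental and control arms, in accrual order.
    sigma:    common standard deviation, assumed known.
    """
    n_looks = min(len(x_e), len(x_c)) // m
    z = []
    for k in range(1, n_looks + 1):
        n = m * k  # patients per arm available at analysis k
        mean_e = sum(x_e[:n]) / n
        mean_c = sum(x_c[:n]) / n
        z.append((mean_e - mean_c) / (2 * sigma ** 2 / n) ** 0.5)
    return z
```

Each Zk uses all data accumulated so far, which is why successive statistics are correlated and the analyses cannot be treated as independent tests.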

Page 16:

Group-Sequential Designs – Type I Error

Probability of wrongly stopping/rejecting H0 at analysis k

PH0(|Z1|< c1, ..., |Zk-1|< ck-1, | Zk |≥ ck) = πk

• “Type I error spent at stage k”

P(Type I error) = ∑πk

Choose the ck’s so that ∑πk ≤ α

Page 17:

Group-Sequential Designs – Type II Error

Probability of Type II error is

1-PHA( U {|Z1|<c1, ..., |Zk-1|<ck-1, | Zk |≥ck} )

Depends on K, α, β, ck’s.

Given the values, the required sample size can be computed

• it can be expressed as R x (fixed sample size)

Page 18:

Pocock Boundaries

Reject H0 if | Zk | > cP(K,α)

• cP(K,α) chosen so that P(Type I error) = α

All analyses are carried out at the same adjusted significance level

The probability of early rejection is high but the power at the final analysis may be compromised

Page 19:

Pocock Boundaries

p-values for Zk (two-sided) per interim analysis (K=5)

Page 20:

O’Brien-Fleming Boundaries

Reject H0 if | Zk | > cOBF(K,α)√(K / k)

• for k=K we get | ZK | > cOBF(K,α)

• cOBF(K,α) chosen so that P(Type I error) = α

Early analyses are carried out at extreme adjusted significance levels

The probability of early rejection is low but the power at the final analysis is almost unaffected
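The constants cP(K,α) and cOBF(K,α) have no closed form, but they can be approximated by simulation. The sketch below assumes equally spaced looks and two-sided tests; it uses the fact that "some look rejects" is the same event as max over k of |Zk| / shape(k) exceeding c, so c is a quantile of that maximum. All function names are hypothetical, and in practice the constants are taken from tables or dedicated software rather than simulated.

```python
import random


def simulate_paths(n_looks, n_trials=40000, seed=7):
    """Simulate Z_1, ..., Z_K under H0 for equally spaced looks."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_trials):
        score, path = 0.0, []
        for k in range(1, n_looks + 1):
            score += rng.gauss(0.0, 1.0)    # independent group increment
            path.append(score / k ** 0.5)   # standardized statistic Z_k
        paths.append(path)
    return paths


def calibrate(paths, shape, alpha=0.05):
    """Approximate c such that P(|Z_k| > c * shape(k, K) for some k) = alpha.

    c is the empirical (1 - alpha) quantile of max_k |Z_k| / shape(k, K).
    """
    n_looks = len(paths[0])
    maxima = sorted(
        max(abs(z) / shape(k, n_looks) for k, z in enumerate(path, start=1))
        for path in paths
    )
    return maxima[int((1 - alpha) * len(maxima))]


def pocock_shape(k, n_looks):
    return 1.0  # same critical value at every look


def obf_shape(k, n_looks):
    return (n_looks / k) ** 0.5  # sqrt(K / k): very strict at early looks


if __name__ == "__main__":
    paths = simulate_paths(5)
    print("Pocock constant:", calibrate(paths, pocock_shape))
    print("O'Brien-Fleming constant:", calibrate(paths, obf_shape))
```

The O'Brien-Fleming constant is also the critical value at the final look, which is why it stays close to the fixed-sample value while the Pocock constant is noticeably larger.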

Page 21:

O’Brien-Fleming Boundaries

p-values for Zk (two-sided) per interim analysis (K=5)

Page 22:

Wang & Tsiatis Boundaries

Wang & Tsiatis (1987):

Reject H0 if | Zk | > cWT(K,α,θ)(k / K)^(θ − ½)

• θ = 0.5 gives Pocock’s test; θ = 0, O’Brien-Fleming

• implemented in some software (e.g. EaSt)

Can accommodate any intermediate choice between Pocock and O’Brien-Fleming
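A short sketch of the boundary shape may help. It writes the multiplier of cWT as (k / K)^(θ − ½), the form for which θ = 0.5 gives a flat (Pocock-type) boundary and θ = 0 gives the O'Brien-Fleming factor √(K / k). The function name `wang_tsiatis_shape` is hypothetical, and the constant cWT itself is not computed here.

```python
def wang_tsiatis_shape(k, n_looks, theta):
    """Multiplier applied to c_WT at look k: (k / K) ** (theta - 1/2)."""
    return (k / n_looks) ** (theta - 0.5)


if __name__ == "__main__":
    n_looks = 5
    for theta in (0.0, 0.2, 0.5):
        factors = [round(wang_tsiatis_shape(k, n_looks, theta), 3)
                   for k in range(1, n_looks + 1)]
        print(f"theta = {theta}: {factors}")
```

Intermediate values such as θ = 0.2 give early multipliers between the two extremes, and the multiplier always equals 1 at the final look.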

Page 23:

Wang & Tsiatis Boundaries

p-values for Zk (two-sided) per interim analysis (K=5) with θ = 0.2

Page 24:

Haybittle & Peto Boundaries

Haybittle & Peto (1976):

Reject H0 if | Zk | > 3 for k = 1,...,K-1

Reject H0 if | Zk | > cHP(K,α) for k = K

• | Zk | > 3 corresponds to using p < 0.0027

Early analyses are carried out at extreme, yet reasonable adjusted significance levels

Intuitive and easily implemented if correction to final significance level is ignored (pragmatic approach)
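The cost of the pragmatic approach can be gauged by simulation. The sketch below assumes K = 5 equally spaced looks, |Zk| > 3 at the interim analyses, and an uncorrected final critical value of 1.96 (one reading of "correction ignored"). The function name `haybittle_peto_level` and its defaults are hypothetical.

```python
import random
from statistics import NormalDist


def haybittle_peto_level(n_looks=5, interim_crit=3.0, final_crit=1.96,
                         n_trials=20000, seed=3):
    """Monte Carlo overall type I error when interim looks use |Z| > 3 and
    the final analysis uses an uncorrected critical value."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        score = 0.0
        for k in range(1, n_looks + 1):
            score += rng.gauss(0.0, 1.0)
            crit = final_crit if k == n_looks else interim_crit
            if abs(score / k ** 0.5) > crit:
                rejections += 1
                break
    return rejections / n_trials


# Two-sided nominal p-value that corresponds to |Z| = 3
nominal_p = 2 * (1 - NormalDist().cdf(3.0))

if __name__ == "__main__":
    print(nominal_p)
    print(haybittle_peto_level())
```

Under these assumptions the overall level can exceed 0.05 by at most the sum of the interim nominal levels, which is what makes the uncorrected final test tolerable in practice.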

Page 25:

Haybittle & Peto Boundaries

p-values for Zk (two-sided) per interim analysis (K=5)

Page 26:

Boundaries compared

p-values for Zk (two-sided) per interim analysis (K=5)

Page 27:

Boundaries compared

Zk per interim analysis (K=5)

Page 28:

Potential savings / costs in using group sequential designs

μA − μB    Fixed sample    Pocock    O’Brien-Fleming
0.0        170             205       179
0.5        170             182       168
1.0        170             117       130
1.5        170             70        94

Expected sample sizes for different designs (K=5):

- outcomes normally distributed with σ = 2

- α = 0.05

- β = 0.1 for μA − μB = 1

Page 29:

Error-Spending Approach

Removing the requirement of a fixed number of equally-spaced analyses

Lan & DeMets (1983): two-sided tests “spending” Type I error.

Maximum information design:

• Error spending function →

• Defines boundaries

• Accept H0 if Imax attained without rejecting the null

Page 30:

Error-Spending Approach

$f(t) = \min\{2 - 2\Phi(z_{1-\alpha/2}/\sqrt{t}),\ \alpha\}$ yields ≈ O’B-F boundaries

$f(t) = \min\{\alpha \ln(1 + (e - 1)t),\ \alpha\}$ yields ≈ Pocock boundaries

$f(t) = \min\{\alpha t^{\theta},\ \alpha\}$:

• θ = 1 or 3 corresponds to Pocock and O’B-F, respectively
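The three spending functions can be written down directly. The sketch below assumes the O'Brien-Fleming-type function has the Lan & DeMets form 2 − 2Φ(z1−α/2 / √t), with t the information fraction in (0, 1]; the function names are hypothetical.

```python
from math import e, log
from statistics import NormalDist

_STD = NormalDist()


def obf_type_spending(t, alpha=0.05):
    """Cumulative type I error spent by information fraction t (O'B-F type)."""
    if t <= 0:
        return 0.0
    z = _STD.inv_cdf(1 - alpha / 2)
    return min(2 - 2 * _STD.cdf(z / t ** 0.5), alpha)


def pocock_type_spending(t, alpha=0.05):
    """Cumulative type I error spent by information fraction t (Pocock type)."""
    return min(alpha * log(1 + (e - 1) * t), alpha)


def power_spending(t, theta, alpha=0.05):
    """Power family alpha * t ** theta."""
    return min(alpha * t ** theta, alpha)


if __name__ == "__main__":
    for t in (0.2, 0.4, 0.6, 0.8, 1.0):
        print(t, obf_type_spending(t), pocock_type_spending(t),
              power_spending(t, 1), power_spending(t, 3))
```

Each function spends exactly α at t = 1; they differ only in how much of it is released at early looks, the O'Brien-Fleming type releasing almost nothing before mid-trial.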

Page 31:

How Many Interim Analyses?

One or two interim analyses give most benefit in terms of a reduction of the expected sample size

Not much gain from going beyond 5 analyses

Page 32:

When to Conduct Interim Analyses?

With error-spending, full flexibility as to number and timing of analyses

• First analysis should not be “too early” (often at 50% of information time)

• Equally-spaced analyses advisable

In principle, strategy/timing should not be chosen based on the observed results

Page 33:

Who conducts interim analyses?

Independent Data Monitoring Committee

Experts from different disciplines (clinicians, statisticians, ethicists, patient advocates, …)

Reviews trial conduct, safety and efficacy data

Recommends

• Stopping the trial

• Continuing the trial unchanged

• Amending the trial

Page 34:

Sample Size Re-Estimation

Assume normally distributed endpoints

$$n_I = \frac{2\sigma^2\,(z_{1-\alpha/2} + z_{1-\beta})^2}{\Delta^2}$$

Sample size depends on σ²

If misspecified, nI can be too small

Idea: internal pilot study

• estimate σ² based on early observed data

• compute new sample size, nA

• if necessary, accrue extra patients above nI
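The internal pilot idea can be sketched as follows, assuming the usual per-arm formula n = 2σ²(z1−α/2 + z1−β)² / Δ² for normally distributed outcomes and an unblinded pooled variance estimate. Blinded estimates are also used in practice, and the function names here are hypothetical.

```python
from math import ceil
from statistics import NormalDist, variance


def per_arm_sample_size(sigma2, delta, alpha=0.05, beta=0.1):
    """Patients per arm to detect a difference delta with variance sigma2."""
    z = NormalDist().inv_cdf
    return ceil(2 * sigma2 * (z(1 - alpha / 2) + z(1 - beta)) ** 2 / delta ** 2)


def reestimate(pilot_e, pilot_c, delta, n_initial, alpha=0.05, beta=0.1):
    """Internal pilot: pool the within-arm variances and recompute n per arm."""
    df_e, df_c = len(pilot_e) - 1, len(pilot_c) - 1
    pooled = (df_e * variance(pilot_e) + df_c * variance(pilot_c)) / (df_e + df_c)
    n_adjusted = per_arm_sample_size(pooled, delta, alpha, beta)
    return max(n_initial, n_adjusted)  # only ever add patients above n_initial


if __name__ == "__main__":
    # Planning values: sigma = 2, difference of 1, alpha = 0.05, beta = 0.1
    print(per_arm_sample_size(sigma2=4.0, delta=1.0))
```

Taking the maximum of the initial and re-estimated sizes reflects the rule that patients are only added, never removed, after the pilot stage.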

Page 35:

Early Stopping for Futility

Stopping to reject H0 of no treatment difference

• Avoids exposing further patients to the inferior treatment

• Appropriate if no further checks are needed on, e.g., treatment safety or long-term effects.

Stopping to accept H0 of no treatment difference

• Stopping “for futility” or “abandoning a lost cause”

• Saves time and effort when a study is unlikely to lead to a positive conclusion.

Page 36:

Two-Sided Test

Page 37:

Stochastic Curtailment

Idea:

Terminate the trial for efficacy if there is high probability of rejecting the null, given the current data and assuming the null is true among future patients

Conversely, terminate the trial for futility if there is low probability of rejecting the null, given the current data and assuming the alternative is true among future patients

Page 38:

Conditional Power

At the interim analysis k, define

pk(Δ) = PΔ(test will reject H0 | current data)

A high value of pk(0) suggests the test will reject H0

• terminate the trial & reject H0 if pk(0) > ξ

• terminate the trial & accept H0 if 1-pk(Δ) > ξ’ (1-sided)

• probabilities of error: type I ≤ α / ξ, type II ≤ β / ξ’

Note: ξ and ξ’ ≥ 0.8

Page 39:

Conditional Power

Unconditional power for α=0.05 and β=0.1 at Δ=0.2

Conditional power for a mid-trial analysis with an estimate of Δ of 0.1

• probability of rejecting the null at the end of the trial has been reduced from 0.9 to 0.1

Page 40:

Conditional Power

B(t) = Z(t) t^(1/2), with E[B(t)] = θt

Page 41:

Conditional Power

Slope = assumed treatment effect in future patients

Page 42:

Conditional Power

Crosshatched area = conditional power

Page 43:

Predictive Power

Problem with the conditional power approach: it is computed assuming Δ not supported by the current data.

A solution: average across the values of Δ

“Predictive power”

$$P_k = \int p_k(\Delta)\,\pi(\Delta \mid \text{data})\,d\Delta$$

π(Δ | data) is the posterior density

Termination against H0 if Pk > ξ etc.

What prior?

Page 44:

Futility guidelines

Less indicated:

• Controversial intervention requiring large randomized evidence (e.g. drug-eluting stents)

• Time-to-event endpoints with rapid enrollment (e.g. cholesterol-lowering drugs)

• Intervention in current use

• Learning curve by investigators (e.g. mechanical heart valves)

• Late effects suspected

More indicated:

• Safety expected to be an issue (e.g. cox-2 inhibitors)

• Approved competitive products (e.g. drugs for allergic rhinitis)

• Long pipeline of alternative drugs (e.g. oncology)

• Short-term outcomes (e.g. 30-day mortality in sepsis)

Page 45:

Overruling futility boundaries

No stopping when boundary crossed:

• Time trends

• Baseline imbalances

• Major problems with quality of data

• Considerable imputation of missing data

• Important secondary endpoints showing benefit

• External information on benefit of similar therapies

Stopping when boundary not crossed:

• Benefit/risk ratio unlikely to be good enough to adopt experimental treatment

• All endpoints showing consistent trends against experimental treatment

• External information on lack of effect of similar therapies

Page 46:

Adaptive Designs

Based on combining p-values from different analyses

Allow for flexible designs

• sample size re-calculation

• any changes to the design (including endpoint, test, etc!)

Page 47:

Adaptive Designs

Lehmacher and Wassmer (1999):

At stage k, combine one-sided p-values p1,... ,pk

$$L = k^{-1/2} \sum_{j=1}^{k} \Phi^{-1}(1 - p_j)$$

Use any group sequential design for L

Slight power loss as compared to a group-sequential plan

Flexibility as to design modifications: OK for control of type I error, BUT…

Page 48:

Potential concerns with adaptive designs

Major changes between cohorts make clinical interpretation difficult

If eligibility / endpoint changed, what is adequate label?

Temporal trends

Operational bias

Less efficient than group sequential for sample size adjustments

Modest gains (in general), high risks