STATISTICS IN MEDICINE
Statist. Med. 2004; 23:2771–2778 (DOI: 10.1002/sim.1897)

LETTER TO THE EDITOR

Design and analysis of non-inferiority mortality trials in oncology

by M. Rothmann, N. Li, G. Chen and G.Y.H. Chi, Statistics in Medicine 2003; 22:239–264

We would like to draw your attention to the implications for oncologic drug development of the article by Rothmann et al. published in the January 2003 special edition of SIM on non-inferiority trials [1]. The methods described in this article are increasingly used by regulators in the U.S.A. and Europe to evaluate the design and analysis of trials of new agents. The consequences for trial size are enormous. Shlaes and Moellering have expressed closely related concerns for anti-infective drug development [2].

There has been something of a paradigm shift in the approach to cancer treatment over recent years. Academia and industry alike are now fully engaged in the discovery, research and development of novel, well tolerated, biologically targeted (cytostatic) anticancer agents. It is hoped that these new treatments will offer significant advantages to patients in terms of improved tolerability, but they may not always demonstrate increased efficacy. This naturally leads to the use of active-control, non-inferiority trials to compare the new agent with a standard agent, the conventional aim being to show no clinically relevant loss of efficacy.

Such trials are often designed to demonstrate that the new treatment retains some fraction of the established effect of the standard, say at least 1/2. Note that this fraction is essentially arbitrary and no regulatory guidance currently mandates this as the minimum amount either to demonstrate clinical non-inferiority or to secure regulatory approval. If the standard treatment was previously shown to double survival in a particular disease setting (hazard ratio = 0.50, p = 0.02, say), and the goal for a new, better tolerated therapy is to retain at least 1/2 of this effect, a routine sample size calculation shows that a total of 350 events is required to provide 90 per cent power at the one-sided, 2.5 per cent significance level.
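The routine calculation referred to here can be reconstructed as follows (our sketch, not a formula quoted from Rothmann et al. [1]), using the usual normal approximation for the log hazard ratio with 1:1 allocation. Retaining at least half of a standard-versus-placebo hazard ratio of 0.50 corresponds, on the log scale, to a non-inferiority margin of exp(−½ log 0.50) = √2 ≈ 1.41 for the new-to-standard hazard ratio, and the required number of events is

\[
d \;=\; \frac{4\,(z_{0.975}+z_{0.90})^{2}}{\left(\tfrac{1}{2}\log 2\right)^{2}}
\;=\; \frac{4\,(1.960+1.282)^{2}}{(0.347)^{2}} \;\approx\; 350 .
\]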

There are several important issues associated with the design and analysis of non-inferiority trials, including ‘constancy’ (the extent to which the standard treatment performs as it did in previous trials) and ‘assay sensitivity’ (the ability of a non-inferiority trial to detect a real difference between the treatments compared). Much has been published in this area. The regulatory guidelines ICH E9 and E10 describe the issues in detail and provide some general guidance with respect to trial design and conduct [3, 4].

An issue not addressed in these guidelines arises from the fact that the standard effect is an estimate from earlier work and so is not known with certainty. Sample size calculations often ignore this uncertainty. Hung et al. have shown that this approach increases the probability of erroneously accepting the efficacy of a truly inferior drug [5].



The approach offered by Rothmann tackles this issue. Assuming constancy of the effect of the standard and accepting assay sensitivity, Rothmann proposes a formal statistical comparison between the historical data characterising the standard effect and the data arising in the non-inferiority trial, thereby explicitly incorporating the uncertainty (i.e. SE) around the standard effect estimate. This is in fact akin to the putative or virtual placebo comparison approach described by Wang et al. [6]. Operationally, Rothmann’s approach is equivalent to conventional methodology using not the point estimate for the standard effect but, rather, a lesser effect somewhere between the point estimate and its lower 97.5 per cent confidence limit. This reduced effect is chosen so that the chance of falsely approving an inferior drug is exactly 2.5 per cent, thereby managing the regulatory risk.
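In symbols, a synthesis-type comparison of this kind (our sketch rather than a formula reproduced from Rothmann et al. [1]) rejects the hypothesis of an unacceptable loss of effect, and so concludes that at least the required fraction is retained, at the one-sided 2.5 per cent level when

\[
Z \;=\; \frac{\log \widehat{\mathrm{HR}}_{\text{new/std}} \;+\; (1-f)\,\log \widehat{\mathrm{HR}}_{\text{std/placebo}}}
{\sqrt{\mathrm{SE}^{2}_{\text{new/std}} \;+\; (1-f)^{2}\,\mathrm{SE}^{2}_{\text{std/placebo}}}}
\;<\; -1.96,
\]

where f is the fraction of the standard effect to be retained (f = 1/2 above), the standard errors are those of the two log hazard ratio estimates, and the standard-versus-placebo quantities come from the historical trials. Setting the historical standard error to zero recovers the conventional fixed-margin comparison against a margin of exp{−(1−f) log ĤR(std/placebo)}, i.e. 1.41 in the example above; the larger the historical standard error, the closer the effective margin is pulled towards unity and the larger the non-inferiority trial must be.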

The key problem for researchers, physicians and patients alike is that, with Rothmann’s approach, there is a dramatic increase in the size of the trial required, often rendering the trial completely infeasible. Applying this methodology to the example above increases the number of events required from 350 to 3082, a near nine-fold increase. The size of this increase derives from a combination of the (arbitrary) effect retention fraction (50 per cent in this example) and the strength of prior characterization of the standard effect, which is reflected in the (historical) p-value. As illustrated in Table I below, the application of this methodology may actually require more events than there are patients with the disease.

This serves to illustrate that even with a highly significant standard effect estimate, p ∼ 0.001 say, Rothmann’s approach can double the size of a non-inferiority trial. Importantly, if the standard treatment has only just reached statistical significance, this approach implies that no new drug can ever be approved via the non-inferiority route in that group of patients on the basis of clinical benefits other than efficacy; superiority in efficacy to the standard treatment would have to be shown.

Thus, the use of Rothmann’s approach, coupled with the arbitrary 50 per cent effect retention requirement, would result in impracticable trials in many settings, notably those where the standard was approved on the basis of a relatively small evidence base. Assuming that direct comparisons with placebo are unethical, we are forced to contemplate non-inferiority trials too large ever to be mounted, effectively removing non-inferiority as a viable tool in the evaluation and ultimate approval of new cancer medicines. This outcome is identical to the one faced by those developing anti-infective agents, which has contributed toward a decline in the number of companies investing in antibacterial research and development and, consequently, in the number of new antibiotics to treat serious infections [2].

Table I. The number of deaths required to prove a new treatment retains 1/2 of the effect of standard treatment (HR = 0.50) using Rothmann’s methodology (true HR new:standard is unity, 90 per cent power, α 2.5 per cent one-sided).

(Historical) p-value for    Upper 95 per cent CI for HR of       Approx. no. of deaths required
standard vs placebo         new-to-standard must be less than    to prove 50 per cent retention
0.049                       1.004                                3 000 000
0.02                        1.12                                 3082
0.01                        1.18                                 1563
0.001                       1.27                                 735
0.0001                      1.31                                 572
0.00001                     1.33                                 505
0.000001                    1.35                                 459
≪0.000001*                  1.41                                 350

*Equivalent to the conventional approach, i.e. the standard effect is known with (virtually) complete certainty.
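As a rough cross-check of these figures, the short program below (our own sketch, not code from Rothmann et al. [1]) solves for the number of events giving 90 per cent power under a synthesis-type criterion of the kind written out above. It assumes a 1:1 randomization, a historical standard-versus-placebo hazard ratio of 0.50 whose standard error is backed out of the quoted (two-sided) p-value, 50 per cent retention, a true new-to-standard hazard ratio of 1 and a one-sided 2.5 per cent test; the function names are ours.

from math import log, sqrt
from scipy.optimize import brentq
from scipy.stats import norm

z_alpha, z_beta = norm.ppf(0.975), norm.ppf(0.90)  # one-sided 2.5 per cent level, 90 per cent power
retained = 0.5                                     # fraction of the standard effect to be retained
log_hr_hist = log(0.5)                             # historical standard vs placebo log hazard ratio

def events_required(p_hist):
    """Events giving 90 per cent power at a true new:standard HR of 1, for a given
    (two-sided) historical p-value of the standard vs placebo comparison."""
    se_hist = abs(log_hr_hist) / norm.ppf(1 - p_hist / 2)
    margin = -(1 - retained) * log_hr_hist         # allowable excess on the log hazard ratio scale

    def power_gap(d):
        se_trial = sqrt(4 / d)                     # SE of the log HR with d events, 1:1 allocation
        cutoff = margin - z_alpha * sqrt(se_trial ** 2 + (1 - retained) ** 2 * se_hist ** 2)
        return cutoff - z_beta * se_trial          # zero at the required number of events

    return brentq(power_gap, 50, 5_000_000)

for p in (0.02, 0.001, 0.000001):
    print(p, round(events_required(p)))
# Roughly 3080, 735 and 470 events; close to the corresponding rows of Table I
# (small differences presumably reflect rounding and the exact calculations behind the table).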



Some fundamental re-thinking in this area is called for to avoid the obvious adverse impact on the future development of new cancer medicines.

One way forward is to argue that there should be no difference between the standards of evidence required in a superiority setting and in a non-inferiority (or active-control) setting. Hence the first purpose of the non-inferiority trials in question should be to prove that a new agent would have been better than placebo if placebo had been included. The second purpose should be to estimate (indirectly) the size of the effect of the new agent relative to placebo. Both of these aims are achievable with Rothmann’s approach and assumptions, with some small modifications. In well-conducted trials with hard endpoints, little non-compliance and complete follow-up, there should be no need to require 50 per cent retention of effect to demonstrate superiority to placebo. When estimating the size of the effect, attention could focus on the point estimate in the usual manner, and not on the lower confidence limit alone. An approach along these lines has been nicely illustrated by Fisher et al. [7] and does not require pre-specification of a percentage effect retention, though, having obtained the data, the result can easily be displayed in relation to the likely fraction of the standard effect retained. Concentrating on estimating the degree of benefit over placebo, albeit through indirect measures, seems more in line with the efficacy standards required by U.S. and European law, both of which call for substantial evidence of efficacy to be established, with no requirement on relative efficacy with respect to existing agents. This approach is in fact consistent with the recently issued draft CPMP guidance on non-inferiority trials [8].
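The indirect comparison we have in mind can be written down simply (again our sketch, under the constancy assumption, rather than a formula taken from the papers cited): the new-versus-placebo log hazard ratio is estimated by combining the trial and historical estimates,

\[
\log\widehat{\mathrm{HR}}_{\text{new/placebo}}
\;=\; \log\widehat{\mathrm{HR}}_{\text{new/std}} \;+\; \log\widehat{\mathrm{HR}}_{\text{std/placebo}},
\qquad
\mathrm{SE} \;=\; \sqrt{\mathrm{SE}^{2}_{\text{new/std}} + \mathrm{SE}^{2}_{\text{std/placebo}}}\,,
\]

so that superiority to placebo is demonstrated when the upper limit of the corresponding 95 per cent confidence interval lies below zero on the log scale (the criterion above with f = 0), while the point estimate, rather than a confidence limit alone, summarises the size of the benefit over placebo.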

The scientific and statistical debate on how best to draw inferences from active-control, non-inferiority trials should not be considered complete. Rothmann’s approach serves to highlight that considerable statistical, methodological and philosophical issues remain. Failure to consider these issues constructively will, at the very least, lead to ever increasing drug development times and, thus, delay the availability of new therapeutic options to patients with life-threatening diseases. At worst, the barriers posed will discourage drug development where it otherwise might have been feasible and so prevent potentially useful new medicines becoming available to patients. We sincerely hope that the scientific community together with regulatory bodies worldwide will give this important area further careful thought.

REFERENCES

1. Rothmann M, Li N, Chen G, Chi GYH, Temple R, Tsou H-H. Design and analysis of non-inferiority mortality trials in oncology. Statistics in Medicine 2003; 22:239–264.

2. Shlaes DM, Moellering RC. The United States Food and Drug Administration and the end of antibiotics. Clinical Infectious Diseases 2002; 34:420–422.

3. ICH Topic E9. Note for Guidance on Statistical Principles for Clinical Trials. ICH Technical Coordination, EMEA: London, 1998.

4. ICH Topic E10. Note for Guidance on Choice of Control Group for Clinical Trials. ICH Technical Coordination, EMEA: London, 2000.

5. Hung J, Wang S-J, Tsong Y, Lawrence J, O’Neill RT. Some fundamental issues with non-inferiority testing in active controlled trials. Statistics in Medicine 2003; 22:213–225.

6. Wang S-J, Hung J, Tsong Y. Utility and pitfalls of some statistical methods in active controlled clinical trials. Controlled Clinical Trials 2002; 23:15–28.



7. Fisher LD, Gent M, Büller HR. Active-control trials: how would a new agent compare with placebo? A method illustrated with clopidogrel, aspirin, and placebo. American Heart Journal 2001; 141:26–32.

8. Committee for Proprietary Medicinal Products. ‘Points to consider on the choice of non-inferiority margin’ (draft). EMEA: London, 2004; CPMP/EWP/2158/99.

From: Kevin Carroll¹, Bob Milsted² and John A. Lewis³
¹Statistical Expert for Global Oncology
²VP, Regulatory Director for Global Oncology, AstraZeneca Pharmaceuticals, Alderley Park, Macclesfield, U.K.
³Visiting Professor, University of Leicester, U.K. and Consultant to AstraZeneca

Author’s Reply†,‡

Sir—I would like to thank you for the opportunity to reply in writing to the comments of Mr Carroll, Dr Milsted and Professor Lewis [1]. This reply will respond to the premise of the Carroll et al. commentary that it is difficult to design a non-inferiority trial based on 50 per cent retention of the survival effect of a standard therapy using the methodology in Rothmann et al. [2] when the estimate of the standard therapy vs placebo survival hazard ratio is 0.5; discuss powering a non-inferiority trial for survival; discuss the adequacy of comparing survival between a test therapy and placebo when a standard therapy has been approved for survival; and elaborate on the content of some of the papers referenced in Reference [1].

With respect to the article by Shlaes and Moellering [3], I invite readers to read the comments by Powers et al. [4], the comments by Gilbert et al. [5], the reply by Shlaes [6], and the transcript of the February 19–20, 2002 Division of Anti-Infective Drug Products Advisory Committee meeting [7], in which Dr Shlaes was a participant. Much of the commentary of both References [5, 6] centers on the discussions of that advisory committee meeting and how these discussions clarified concerns given previously in Shlaes and Moellering. Dr Shlaes states in his reply: ‘The FDA should be congratulated for organizing a very informative and extremely useful meeting. At this meeting, the FDA succeeded in defining a number of issues and in achieving some early consensus for several of these issues.’ Dr Shlaes then lists some of the highlights of the discussions.

According to Carroll et al., it is too difficult to design a non-inferiority trial when the estimated standard therapy vs placebo survival hazard ratio is 0.5. The examples that they provide attribute very little precision to such an estimate, and less precision than required for a regulatory approval based on survival. An example Carroll et al. bring up twice in the text would have this standard therapy vs placebo survival hazard ratio estimate of 0.5 based on 45 events (p-value = 0.02; 45 events based on a one-to-one randomization). Carroll et al. discuss and apparently prefer that the estimate of 0.5 based on 45 events from one trial should be treated with complete certainty as the true theoretical value of the standard therapy vs placebo

†The views expressed in this article do not necessarily represent those of the U.S. Food and Drug Administration.
‡This article is a U.S. Government work and is in the public domain in the U.S.A.
