Sample Size Estimation
1. General considerations
2. Continuous response variable
– Parallel group comparisons
• Comparison of response after a specified period of follow-up
• Comparison of changes from baseline
– Crossover study
3. Success/failure response variable
– Impact of non-compliance, lag
– Realistic estimates of control event rate (Pc) and event rate pattern
– Use of epidemiological data to obtain realistic estimates of experimental group event rate (Pe)
4. Time to event designs and variable follow-up
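Comparison of Sample Size Formulae for Means and Proportions (n per group)
For proportions:
\[ n = \frac{\left[ z_{1-\alpha/2}\sqrt{2\bar{P}(1-\bar{P})} + z_{1-\beta}\sqrt{P_c(1-P_c) + P_e(1-P_e)} \right]^2}{(P_c - P_e)^2} \]
or
\[ n = \frac{\left[ P_c(1-P_c) + P_e(1-P_e) \right]\left( z_{1-\alpha/2} + z_{1-\beta} \right)^2}{(P_c - P_e)^2} \]
Pc = Control group event rate
Pe = Experimental group event rate
\[ \bar{P} = \frac{P_c + P_e}{2}, \qquad \Delta = P_c - P_e \]
For means:
\[ n = \frac{2\sigma^2\left( z_{1-\alpha/2} + z_{1-\beta} \right)^2}{\Delta^2} \]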
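As a rough illustration of the formula for means, here is a minimal Python sketch. The function name, argument names, and the inputs (a standard deviation of 10 units and a difference of 5 units) are hypothetical and not taken from the slides; the z-values are the rounded ones for a 2-sided alpha of 0.05 and power of 0.90.

```python
import math

def n_per_group_means(sigma, delta, z_alpha=1.96, z_beta=1.28):
    """n per group = 2 * sigma^2 * (z_alpha + z_beta)^2 / delta^2, rounded up."""
    n = 2 * sigma**2 * (z_alpha + z_beta) ** 2 / delta**2
    return math.ceil(n)

# Hypothetical inputs: SD of 10 units, clinically relevant difference of 5 units.
n_means = n_per_group_means(sigma=10.0, delta=5.0)
print(n_means)  # 84
```

Halving delta quadruples n, which is why the choice of the hypothesized difference dominates the calculation.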
Example
• H0: Pc = Pe (proportion with event on control arm = proportion with event on experimental arm)
• HA: Pc = .40, Pe = .30
Δ = .40 - .30 = .10
• Assume
α = .05, z(1-α/2) = 1.96 (2-sided)
1 - β = .90, z(1-β) = 1.28
• P̄ = (.40 + .30)/2 = .35
Example (cont.)
\[ N = \frac{\left[ 1.96\sqrt{2(.35)(.65)} + 1.28\sqrt{(.4)(.6) + (.3)(.7)} \right]^2}{(.4 - .3)^2} \]
N = 476; 2N = 952
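The worked example can be checked numerically. The following is a minimal Python sketch of the first proportions formula; the function and argument names are illustrative assumptions, and the z-values are the rounded ones used in the example (1.96 and 1.28) rather than exact normal quantiles.

```python
import math

def n_per_group_proportions(pc, pe, z_alpha=1.96, z_beta=1.28):
    """Sample size per group for comparing two proportions, rounded up."""
    p_bar = (pc + pe) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(pc * (1 - pc) + pe * (1 - pe))
    ) ** 2
    return math.ceil(numerator / (pc - pe) ** 2)

n = n_per_group_proportions(0.40, 0.30)
print(n, 2 * n)  # 476 952
```

Using exact quantiles instead of the rounded z-values can shift the result by a patient or so, which is immaterial next to the uncertainty in Pc and Pe.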
Approximate* Total Sample Size for Comparing Proportions in Two Groups of Equal Size with Significance Level (α) of 0.05 and Power (1 - β) of 0.90

PC (Control Group)   PE (Experimental Group)   Total Sample Size
0.50                 0.40                      1040
                     0.30                      250
                     0.20                      110
0.30                 0.20                      790
                     0.15                      330
                     0.10                      170
0.10                 0.08                      8600
                     0.05                      1170
                     0.02                      370
0.05                 0.03                      4030
                     0.025                     2430
                     0.01                      760
0.03                 0.015                     4100
0.03                 0.019                     8290
0.03                 0.02                      10240
0.019                0.013                     18360

* Sample sizes are rounded up to the nearest 10.
Rows in the final block (PC = 0.03 and 0.019) are labeled "Similar to MRFIT".
Factors Which Influence “Realized Delta Pc-Pe”
(Delta = Hypothesized Treatment Difference)
• Non-compliance to experimental treatment
• Switchover from control to experimental treatment
• Lag time for experimental treatment to influence endpoint
• Events counted as an endpoint that are not influenced by treatments under study (e.g., accidental or violent deaths in a study of HIV treatments)
Strategy for Specifying Delta and Estimating Sample Size
• Begin by specifying the minimal effect of experimental treatment which would be considered clinically relevant (usually this is done in terms of a relative difference, e.g., relative risk or hazard ratio)
• Assume immediate full impact of treatment on endpoint and full compliance
• Adjust this “optimistic delta” downwards for non-compliance and lag if necessary
• Calculate sample size using “adjusted delta”
• Inflate sample size (again) for competing events and losses
• For planned sample size, assess impact of deviations from “adjusted delta” on power
N per group = 2 x Variability x [Constant (α, β)]² / Delta²
Factors shown in the diagram as influencing the terms of this formula: no. of primary events, no. of competing events, losses, standardization, clinical judgement, biologic plausibility, non-compliance, lag.
Simple Adjustment for Non-Compliance in Experimental Group
Example: Heart failure trial; primary endpoint is death or hospitalization for heart failure.
Pc = 0.30 (placebo group event rate) after 3 years
Pe = 0.24 (new treatment event rate) after 3 years
• Assume 20% of patients do not comply with experimental treatment (d = 0.20)
• Assume risk of endpoint for non-compliers in experimental group is the same as placebo group
Simple Adjustment for Non-Compliance
Pe ADJ = 0.20 (Pc) + 0.80 (Pe) = 0.20 (0.30) + 0.80 (0.24) = 0.252
NEW ∆ = 0.30 - 0.252 = 0.048; OLD ∆ = 0.30 - 0.24 = 0.06
Unadjusted sample size = 1150 per group
Approximation: Inflate usual sample size by 1/(1 - d)², where d = fraction of patients not complying
New sample size = 1150 x 1/(0.8)² ≈ 1800 per group
Compliance Adjustment
d = fraction who do not comply to exp. treatment
Pe ADJ = d Pc + (1 - d) Pe
Pc - Pe ADJ = (1 - d)(Pc - Pe)
Inflate usual sample size by 1/(1 - d)²
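The compliance adjustment is simple enough to sketch in a few lines of Python. The function name and return layout below are assumptions made for illustration; the inputs are the heart failure example values (Pc = 0.30, Pe = 0.24, d = 0.20, 1150 per group unadjusted).

```python
def adjust_for_noncompliance(pc, pe, d):
    """Return (adjusted Pe, adjusted delta, sample-size inflation factor)."""
    pe_adj = d * pc + (1 - d) * pe      # non-compliers take on the control risk
    delta_adj = pc - pe_adj             # equals (1 - d) * (pc - pe)
    inflation = 1 / (1 - d) ** 2
    return pe_adj, delta_adj, inflation

pe_adj, delta_adj, inflation = adjust_for_noncompliance(pc=0.30, pe=0.24, d=0.20)
n_new = 1150 * inflation                # 1150 per group is the unadjusted size
print(round(pe_adj, 3), round(delta_adj, 3), round(n_new))  # 0.252 0.048 1797
```

The unrounded result of about 1797 per group corresponds to the "≈ 1800" quoted in the example.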
TOXO Study Design: Power to Detect a 50% Difference for Sample Size of 265 Patients

Switchover from placebo to active (%)   Non-compliance to Pyrimethamine (%)   Power
0                                       0                                     0.80
0                                       10                                    0.74
0                                       25                                    0.66
10                                      0                                     0.75
10                                      10                                    0.71
10                                      25                                    0.63
25                                      0                                     0.69
25                                      10                                    0.65
25                                      25                                    0.57
Realistic Estimates of Pe and Pc
• Halperin M, J Chronic Dis, 1968 (constant event rates for control and experimental groups, non-compliance in experimental group and lag)
• Wu M, Cont Clin Trials, 1980 (extended Halperin’s method to non-compliance in control group and time-dependent non-compliance)
• Lakatos E, Cont Clin Trials, 1986 and Biometrics, 1988 (extended to log rank test for time to event analyses)
• Shih J, Cont Clin Trials, 1995 and Encyclopedia of Clinical Trials, 2007 (implemented Lakatos methods in the SAS program Size, which allows event rates to vary and extends to weighted log rank and unequal allocation)
Definitions
Dropout – Non-compliance to exp. treatment
Dropin – Non-compliance to control treatment
Lag – Time for treatment to achieve maximum benefit
Lost-to-follow-up – A person for whom endpoint status is unknown (outcome is missing)
Halperin Model to Adjust for Non-Compliance and Lag in Experimental Group
1. Specify Pc
2. Specify Pe (or k); k = (Pc - Pe)/Pc, where k x 100 = % reduction in control group event rate due to experimental treatment
3. Specify nonadherence rate in experimental group: d
4. Specify lag: f
5. Obtain adjusted value of Pe from table
6. Obtain inflated sample size estimate
Later development: allow pattern of dropout to vary over follow-up of length T
[Figure: two panels plotting cumulative dropout rate (0 to d) against follow-up time (0 to T)]
Effect of Non-adherence on Pe
[Figure: hazard over follow-up (0 to T) for non-dropouts, with levels λc and λe marked]
Dropouts assume the risk of participants in the control arm. Their risk reverts in the same manner as it decreased before dropout (immediately if lag = 0).
λc = hazard for controls
λe = hazard for experimental group
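Lag: Halperin defined r(t), the hazard of event in the experimental group, as follows:
\[ r(t) = \begin{cases} \lambda_c\left(1 - \dfrac{kt}{f}\right), & t < f \\ \lambda_c(1 - k) = \lambda_e, & t > f \end{cases} \]
[Figure: hazard versus time (0, f, T), showing a linear decline from λc to λe between t = 0 and t = f]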
Halperin M, et al give tables for f=0, 0.5T, T and 2T.
Example: Heart Failure Trial with Death or Hospitalization for Heart Failure as Primary Endpoint
Assume event rate is 30% after 3 years; event rate is constant; 20% of those assigned new treatment will discontinue it after 3 years (cumulative dropout = 20%); and there is no lag.
pc = 0.30, pe = 0.24, K = 0.20, d = 0.20
Table 1 of Halperin (f = 0): pe = 0.246, K Adj. = 0.18, N = 1425 per group
Before we had pe ADJ = 0.252, N = 1800 per group
Impact of Dropout Pattern on pe and k: Heart Failure Example (cont.)
pc = 0.30, pe = 0.24, k = 0.20, d = 0.20, f = 0

Pattern of Dropout Over Four Years (Eight 6-Month Time Periods)   Adjusted pe   Adjusted k
(1,1,1,1,1,1,1,1) (Halperin)                                      0.246         0.180
(2,1,1,1,1,1,1,1)                                                 0.247         0.177
(1,0,0,0,0,0,0,0)                                                 0.251         0.163
(1,1,1,1,1,1,1,2)                                                 0.246         0.181
(0,0,0,0,0,0,0,1)                                                 0.241         0.197
Comparison of Non-Adherence Adjustments on Sample Size for Heart Failure Trial

Adjustment                                         Adj. pe   N Per Group
No adjustment                                      .240      1150
Simple adjustment (instantaneous non-compliance)   .252      1800
Halperin (equal over follow-up)                    .246      1425
Wu/Shih (twice as much in 1st year)                .247      1485
Dropout Assumptions in Major Trials
1. MRFIT (J Chronic Dis, 1977): 50% (2,1,1,1,1,1)
2. CPPT (JAMA, 1984): 35% (1,1,1,1,1,1,1)
3. Systolic Hypertension in the Elderly (SHEP) (J Clin Epid, 1988): 16% (2,1,1,1,1)
Example: Similar to MRFIT (Lag of 3 years)
Full Effect of Treatment is 50% and is Reached in 1/2 T
pc = 0.03 (CHD death), pe = 0.015, K = 0.50
α = 0.05, 1 - β = 0.90
d = 0 (no dropouts), T = 6 years, f = 3 years
Adjusted pe = 0.019 instead of 0.015
NEW 2N (f = 3) = 8290 versus 4100 with no lag adjustment; alpha = 0.05 (2-sided) and power = 0.90.
[Figure: treatment effect K (0 to 0.50) plotted against time (0, T/2, T)]
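The lag adjustment in this example can be approximated with a short calculation. The Python sketch below is not Halperin's tabulated method: it assumes a constant control hazard, treats k as a proportional reduction in hazard, and integrates the linearly declining hazard over follow-up. The function name and these simplifications are assumptions; under them the result agrees with the adjusted pe quoted above to three decimals.

```python
import math

def lag_adjusted_pe(pc, k, f, T):
    """Cumulative event probability in the experimental arm with a linear lag.

    Assumes constant control hazard and k applied on the hazard scale.
    """
    lam_c = -math.log(1 - pc) / T                 # control hazard per year
    t_lag = min(f, T)
    cum = 0.0
    if f > 0:
        # integral of lam_c * (1 - k*t/f) from 0 to t_lag
        cum += lam_c * (t_lag - k * t_lag**2 / (2 * f))
    # full-effect hazard lam_c * (1 - k) from f to T
    cum += lam_c * (1 - k) * max(T - f, 0)
    return 1 - math.exp(-cum)

pe_lag = lag_adjusted_pe(pc=0.03, k=0.50, f=3, T=6)
pe_nolag = lag_adjusted_pe(pc=0.03, k=0.50, f=0, T=6)
print(round(pe_lag, 3), round(pe_nolag, 3))  # 0.019 0.015
```

With a lag of half the follow-up period, the realized difference Pc - Pe shrinks from about 0.015 to about 0.011, which is why the required sample size roughly doubles.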
Adjustment for Both Non-Compliance and Lag (Parameters Similar to MRFIT)
pc = 0.03; K = 0.50; α = 0.05 (2-sided), 1 - β = 0.90
T = 6 years; f = 3 years (0.5T); d = 0.50
Adjusted pe = 0.022
NEW 2N = 16,610 and OLD 2N = 4100 (no adjustment for lag or dropout)
J Chron Dis 1976. Actually, MRFIT was designed as a 1-sided test with alpha = 0.05 and unadjusted K = 0.542.
Dropout and Dropin Assumptions in Major Cardiovascular Trials

Trial      Dropout (%)   Dropin (%)
1. MRFIT   50            0
2. CPPT    35            0
3. SHEP    16            19
Impact of Dropout, Dropin and Lag Assumptions on Hypothesized Risk Reductions

Trial   Unadjusted   Adjusted
MRFIT   54%          27%
CPPT    –            36%
SHEP    40%          32%
Example
TOXO Protocol
1. Primary endpoint: Toxoplasmic encephalitis (TE)
2. Control (placebo) group event rate: 30% in 2.5 years
3. Experimental (pyrimethamine) group event rate: 15% in 2.5 years (50% reduction)
4. Death rate unrelated to TE: 33%
5. Confidence in answer: α = 0.05 (2-sided); 1 - β (power) = 0.80
6. 2:1 allocation for pyrimethamine:placebo
TOXO Sample Size: Influence of Non-Compliance

Switchover from         Non-compliance to     Event Rate (%)             Percent     Sample
placebo to active (%)   Pyrimethamine (%)     Placebo    Pyrimethamine   Reduction   Size
0                       0                     30.0       15.0            50.0        265
0                       10                    30.0       15.8            47.4        300
0                       25                    30.0       17.0            43.3        365
10                      0                     29.3       15.0            48.7        290
10                      10                    29.3       15.8            46.0        330
10                      25                    29.3       17.0            41.8        405
25                      0                     28.1       15.0            46.6        335
25                      10                    28.1       15.8            42.3        380
25                      25                    28.1       17.0            39.4        490
Mis-specification of Control Group Event Rate
Influence on Power of Mis-Specification of Control Group Event Rate (Pc) in CPCRA TOXO Study
Design: Pc = 0.30; hypothesized percentage reduction due to treatment = 50%; α = 0.05 (2-sided); 10% switchover from placebo; 25% non-compliance to pyrimethamine; combined sample size = 405

Pc     Power
.30    0.80
.25    0.71
.20    0.62
.15    0.49
.10    0.35
Comparison of Observed and Expected Number of Deaths: Primary Prevention Studies

Study                                  Endpoint                       Observed   Expected (a)   Observed/Expected
MRFIT (6 years)                        CHD deaths                     104        187            0.56
                                       All deaths                     219        442            0.50
Physician’s Health Study (4.8 years)   CVD death                      44         366            0.12
Helsinki Heart Study                   Fatal/nonfatal cardiac events  84         152            0.55

(a) U.S. life tables
Comparison of Observed and Expected Number of Deaths

Study            Endpoint     Observed   Expected (a)   Observed/Expected
UGDP (8 years)   CVD deaths   10         17             0.59
BHAT (3 years)   All deaths   187        269            0.70
CDP (5 years)    All deaths   583        837            0.70

(a) U.S. life tables
Impact of Medical Exclusions on Mortality
Deaths, Cause Known, by Interval Between Last Exam and Death

Interval (months)   All Deaths   Dead, Cause Known (%)
≤ 6                 42           60
7-12                33           42
13-24               20           45
> 24                31           35
Total               126
• 60% of subjects who died ≤6 months after exam had a finding on exam related to death
• Impact of medical exclusions could be 50% during first 2 years
Schor et al., An Evaluation of the Periodic Health Examination, Annals Int Med, Dec. 1964.
Impact of Medical Exclusions on Mortality
Observed and Expected No. Deaths Among 85,491 White Male Veterans

Year      Observed   Expected   O/E
1947-51   623        844.3      0.738
1952-56   694        892.8      0.694
1957-61   1028       1200.1     0.857
1962-66   1621       1868.1     0.868
1967-69   1379       1597.0     0.863
Total     5345       6402.2     0.835

• 20-22 years after WWII, mortality among male veterans is lower than white U.S. males in general
Seltzer and Jablon, Effects of Selection on Mortality, Am J Epi, Vol. 100, 1974.
“Partial Solution” to Problems Resulting from Mis-Estimation of Control Group Event Rate
• Monitor parameters on which sample size is based during the trial, i.e., the control group event rate, and extend the trial if necessary
• Plan for a sample-size re-estimation
• Design the study to continue until a certain number of events occur (i.e., event-driven trial) (this may not always be possible because of funding risks)
Usual Situation for “Time-to-Event” Clinical Trials
• Recruitment extends over several months or years.
• Trial design usually specifies minimum period of follow-up for all patients and study ends on a common closing date.
• Total trial duration = Recruitment period + minimum follow-up period following enrollment.
• Patients are followed for a variable length of time as a consequence of recruitment period and common closing date.
Usual Situation (cont.)
• Time to event methods: Kaplan-Meier life-tables, Cox models and log rank statistics are used to compare groups, e.g., H0: Se = Sc (survival functions for experimental and control groups are equal)
• Sample size based on log rank test instead of tests of proportions.
For studies in which the study duration is short compared to average event time (e.g., survival time), sample size using proportions (over average follow-up) is similar to that using time to event (log rank). When this is not the case, using proportions usually results in a larger sample size than considering time to event.
Reasons for Censoring
• End of follow-up (administrative)
• Lost to follow-up (bias is a concern)
• Competing event (e.g., death from an accident in a CVD study; in some cases bias is also a concern)
Typical Enrollment in Trial
[Figure: ten patients (accession nos. 1-10; treatments A, B, B, A, B, B, A, A, A, B) plotted against calendar time from start of study (months 0-9), with end of study on April 30, 1977. Legend: x = death; the other marker = censored.]
Common Closing Date (def) – the calendar date that is the end of follow-up for all patients (except deaths, withdrawals, losses). The date through which events are counted for the primary analysis. April 30, 1977 in example.
Conversion to Time from Randomization
[Figure: the same ten patients (accession nos. 1-10; treatments A, B, B, A, B, B, A, A, A, B) plotted by follow-up since randomization (months 0-10). Follow-up times labeled on the figure, in days: 180, 30, 45, 195, 89, 225, 60, 25, 265, 91.]
Common Closing Date Examples
• MRFIT: February 28, 1982 (chosen to correspond to the 6-year anniversary of the last person randomized)
• SMART: January 11, 2006 (date investigators notified of early termination)
• ESPRIT: November 15, 2008 (date when target number of primary events, 320, estimated to occur)
Sample Size for Time to Event Comparisons
Number of required events depends on:
• Type I error (false positive rate)
• Power
• Hypothesized treatment effect, e.g., hazard ratio or relative risk
Note: Initial work assumed all participants would be followed to the event. This was extended to accommodate censoring and more complex trial situations, e.g., recruitment period, lag, dropouts, dropins.
Sample Size for Time to Event Comparison (cont.)
RR = Hypothesized hazard ratio (relative risk) (ratio of hazards for new treatment versus control)
Formulas can be derived assuming exponential survival or by assuming proportional hazards and use of log rank test.
\[ \text{No. events required} = \frac{[\text{constant}(\alpha,\beta)]^2}{P_A P_B\,[\log_e RR]^2} \]
where PA and PB are the proportions assigned to A and B
\[ \text{No. events required} = \frac{4\,[\text{constant}(\alpha,\beta)]^2}{[\log_e RR]^2} \quad \text{for 1:1 allocation} \]
\[ \text{No. events required} = \frac{31.4}{[\log_e RR]^2} \quad \text{for } \alpha = .05 \text{ (2-sided) and } \beta = .20 \]
Freedman L, Stat Med 1982 and Schoenfeld D, Biometrika 1981.
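A minimal Python sketch of the events formula follows. The function and argument names are illustrative assumptions; the constant is taken as z(1-α/2) + z(1-β) with the rounded values 1.96 and 0.84, which reproduces the 31.4 numerator for 1:1 allocation (4 x 2.8² = 31.36).

```python
import math

def events_required(hazard_ratio, z_alpha=1.96, z_beta=0.84, p_a=0.5, p_b=0.5):
    """Events needed = (z_alpha + z_beta)^2 / (p_a * p_b * ln(HR)^2)."""
    return (z_alpha + z_beta) ** 2 / (p_a * p_b * math.log(hazard_ratio) ** 2)

d_events = events_required(0.70)          # 30% reduction in hazard, 1:1 allocation
print(math.ceil(d_events))                # 247

d_unequal = events_required(0.70, p_a=2 / 3, p_b=1 / 3)   # 2:1 allocation
print(math.ceil(d_unequal))
```

Because the count depends on ln(RR) squared, small hypothesized effects drive the number of events up quickly, and unequal allocation always needs more events than 1:1.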
Sample Size for Time to Event Comparison
To obtain N, Pc and Pe for the average total duration must be determined (need to consider length of follow-up and average hazard rate).
Let d = No. events required
N = No. of patients required
Pc and Pe = cumulative event rate after time T
\[ d = N\,\frac{P_c + P_e}{2}, \qquad N = \frac{2d}{P_c + P_e} \]
Sample Size for Time to Event Comparison Assuming Exponential Survival (constant hazard)
Suppose λ = average (both treatment groups combined) event (hazard) rate.
Assuming uniform enrollment over E years and a minimum follow-up (F years) for each patient, average follow-up = E/2 + F
The prob (event) assuming exponential model is: 1- exp [- λ (E/2 +F)]
Example: λ = 10/1000 person years; E=3; F=4; then prob (event) over an average of 5.5 years = .0535
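The exponential example can be reproduced directly. This is a small Python sketch under the stated assumptions (uniform enrollment, constant hazard); the function and argument names are hypothetical.

```python
import math

def prob_event(hazard, enroll_years, min_followup_years):
    """P(event) over the average follow-up E/2 + F under an exponential model."""
    avg_followup = enroll_years / 2 + min_followup_years
    return 1 - math.exp(-hazard * avg_followup)

# 10 events per 1000 person-years, E = 3 years, F = 4 years (average 5.5 years)
p_event = prob_event(hazard=10 / 1000, enroll_years=3, min_followup_years=4)
print(round(p_event, 4))  # 0.0535
```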
There Are a Number of Ways to Get Target Number of Events
In general, to detect a hazard ratio of .70 (30% reduction) with alpha = 0.05 and 80% power, about 250 events are required.
Expected Control Rate: 7 events per 100 person years
Needed: 250 events ≈ 3,600 person years (250/0.07)
Obtain by following:
• 3,600 patients for 1 year
• 1,200 patients for 3 years
• 250 patients until they experience an event
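The person-years arithmetic can be sketched as follows. The helper name is hypothetical, and the calculation ignores the depletion of the risk set as events occur, as the slide's approximation does; the unrounded figure is about 3,571 person-years, which the slide rounds to 3,600.

```python
def person_years_needed(events, rate_per_person_year):
    """Person-years of follow-up needed to observe the target number of events."""
    return events / rate_per_person_year

py_needed = person_years_needed(250, 0.07)
print(round(py_needed))                      # 3571
for years in (1, 3):
    # patients needed if each is followed for the given number of years
    print(years, round(py_needed / years))
```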
Extensions of Sample Size for Time to Event Comparisons
Number of patients to obtain events depends on:
• Duration of follow-up
– Accrual period and pattern + minimum follow-up for each patient (if follow-up is to a common closing date)
• Expected pattern of events in control group
• Lag (whether hazards are proportional), dropouts, dropins, and losses (missing data)
Size program (Shih, Cont Clin Trials, 1995)
%size (np = 8,
  pc = 0.0322 6,
  pcratio = 1 1.074 1.778 1.852 1.963 2.037,
  prop = 0,
  logr = 0,
  lag = 3,
  k = 0.845 0.688 0.465 0.465 0.465 0.465 0.465 0.465,
  alpha = 0.05,
  power = 0.90,
  loss = 0.0 8,
  din = 0.00 8,
  dout = 0.50 6,
  doratio = 1 1 1 1 1 1 1 1,
  diratio = 1 1 1 1 1 1 1 1,
  loratio = 1 1 1 1 1 1 1 1,
  simult = 0,
  rectime = 1 2,
  recratio = 1 1,
  ratio = 1);
endsas;
Example: Similar to MRFIT
Assume patients are enrolled over a 2-year period and followed for a minimum of 6 years.
We specify eight 1-year time periods (np=8), and we estimate our event rate (pc) and dropout rate (dout) for 6 years and then extrapolate to 8 periods (8 years).
This program assumes a non-constant control event rate after 6 years of 0.0322 and a dropout rate of 50% after 6 years. The control hazard increases with time.
We will use log-rank statistics for analysis and assume a lag of 3 years.
Constant Versus Time-Dependent Outcome Pattern in Control Arm
• Halperin’s method assumed constant (exponential) event rate pattern in control arm.
• In many cases, the event rate pattern will not be constant:
• Lag (non-proportional hazards) and dropout (non-adherence to experimental treatment) can have very different effects if the time-dependent outcome pattern in the control arm is not recognized.
• Two examples:
– MRFIT, based on Framingham data, event rate increases with each year of follow-up
– In trials of participants hospitalized for myocardial infarction, event rate is higher earlier in follow-up compared to later.
MRFIT: Dropout, Lag (Non-proportional Hazards), and Constant Versus Time-Dependent Mortality Pattern

Dropout Rate+ (Lag)   Constant Mortality (2N)   Time-Dependent Mortality (2N)
0.0 (no lag)          2850                      2720
0.50 (no lag)         6700                      7260
0.0 (3 years)         4300                      3460
0.50 (3 years)        8500                      7290

Sample sizes are rounded up.
+ Dropout rate assumed constant over follow-up
K = 0.845 0.688 0.542 0.542 0.542 0.542 0.542 0.542 (3 year lag)
Control event pattern (1.0 1.074 1.778 1.852 1.963 2.037 2.139 2.246) based on Framingham
pc = 0.0322, HR = 0.465 (convert k to hazard ratio by noting HR = ln (1-pe) / ln (1-pc) and ln (1-pe) = ln [1-(1-k)pc])
2-year enrollment; minimum follow-up of 6 years; α = .05 (2-sided), 1 - β = 0.90
Another Example (Similar to IMPROVE-IT Study, Amer Heart J 2008)
• Trial of aggressive lipid-lowering treatment in patients hospitalized for myocardial infarction (MI).
• Primary endpoint: CVD death or non-fatal MI
• Event rate expected to be highest in 1st 6-12 months following hospitalization.
• Full impact of lipid-lowering treatment not achieved for 18-24 months.
Lipid Lowering Trial: Lag (Non-proportional Hazards) and Constant Versus Time-Dependent Mortality Pattern

Lag                 Constant Event Rate (2N)   Time-Dependent Event Rate (2N)
No lag (HR=0.90)    10,600                     12,070
1-year lag+         18,130                     26,970

+ HR = 0.96 0.96 0.91 0.91 0.90 0.90 0.90 0.90 0.90
Time-dependent control event pattern (3.0 1.2 1 1 1 1 1 1 1)
pc = 0.235 after 2 years, HR = 0.90; α = .05 (2-sided), 1 - β = 0.90
2 year enrollment period and 2.5 year minimum follow-up (divide into nine 6-month periods for SIZE)
Sample Size Adjustment to Account for Losses-to-Follow-up (Missing Outcome Data Due to Competing Events, Whereabouts Unknown, and Withdrawal of Consent)
L = fraction of patients expected to be lost
\[ N_{NEW} = \frac{N_{OLD}}{1 - L} \]
NOTE: SIZE allows this to vary over follow-up. However, this adjustment takes care of the loss of power but not bias. Bias cannot be eliminated by increasing sample size.
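The loss adjustment is a one-line calculation; a minimal Python sketch is given here. The function name and the inputs (1000 patients, 10% expected losses) are hypothetical and chosen only to show the arithmetic.

```python
import math

def inflate_for_losses(n_old, loss_fraction):
    """N_NEW = N_OLD / (1 - L), rounded up to a whole patient."""
    if not 0 <= loss_fraction < 1:
        raise ValueError("loss_fraction must be in [0, 1)")
    return math.ceil(n_old / (1 - loss_fraction))

n_inflated = inflate_for_losses(1000, 0.10)   # hypothetical inputs
print(n_inflated)  # 1112
```

As the note above says, this restores power only; it does nothing about bias if losses are related to outcome.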
Duration of Follow-up Considerations
• Availability of patients
• Ease of follow-up
• Funding
• Pressure for quick answers
• Information on secondary outcomes (e.g., safety)
• Generalizability of finding from high risk to low risk participants
• Possibility of changing treatment effect over time (e.g., lipid-lowering study, studies of AZT monotherapy)
Summary
• If possible, design trials with clinical outcomes with a target number of events – event-driven trials.
• Many of the assumptions underlying a sample size calculation are uncertain; therefore, plan for a sample size re-estimation based on the pooled event rate (not on unblinded treatment comparisons).