sample size estimation 1. general considerations 2. continuous response variable –parallel group...

Sample Size Estimation

1. General considerations

2. Continuous response variable

– Parallel group comparisons

• Comparison of response after a specified period of follow-up

• Comparison of changes from baseline

– Crossover study

3. Success/failure response variable

– Impact of non-compliance, lag

– Realistic estimates of control event rate (Pc) and event rate pattern

– Use of epidemiological data to obtain realistic estimates of experimental group event rate (Pe)

4. Time to event designs and variable follow-up

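Comparison of Sample Size Formulae for Means and Proportions (n per group)

For proportions:

$$n = \frac{\left[ z_{1-\alpha/2}\sqrt{2\bar{P}(1-\bar{P})} + z_{1-\beta}\sqrt{P_c(1-P_c) + P_e(1-P_e)} \right]^2}{(P_c - P_e)^2}$$

or

$$n = \frac{\left[ P_c(1-P_c) + P_e(1-P_e) \right]\left( z_{1-\alpha/2} + z_{1-\beta} \right)^2}{(P_c - P_e)^2}$$

Pc = Control group event rate

Pe = Experimental group event rate

$\bar{P} = (P_c + P_e)/2$

$\Delta = P_c - P_e$

For means:

$$n = \frac{2\sigma^2\left( z_{1-\alpha/2} + z_{1-\beta} \right)^2}{\Delta^2}$$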

Example

• H0: Pc = Pe (proportion with event on control arm = proportion with event on experimental arm)

• HA: Pc = .40, Pe = .30

Δ = .40 - .30 = .10

• Assume

α = .05, z(1-α/2) = 1.96 (2-sided)

1 - β = .90, z(1-β) = 1.28

• P̄ = (.40 + .30)/2 = .35

Example (cont.)

$$N = \frac{\left[ 1.96\sqrt{2(.35)(.65)} + 1.28\sqrt{(.3)(.7) + (.4)(.6)} \right]^2}{(.4 - .3)^2} = 476$$

N = 476; 2N = 952
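The worked example can be checked numerically. The following is a minimal Python sketch of the first proportions formula, using the rounded z-values from the example (1.96 and 1.28). The function name `n_per_group` and its argument names are illustrative and do not come from the slides.

```python
import math

def n_per_group(pc, pe, z_alpha=1.96, z_beta=1.28):
    """Sample size per group for comparing two proportions.

    Defaults are the rounded z-values used in the example
    (2-sided alpha = .05, power = .90).
    """
    p_bar = (pc + pe) / 2.0
    numerator = (
        z_alpha * math.sqrt(2.0 * p_bar * (1.0 - p_bar))
        + z_beta * math.sqrt(pc * (1.0 - pc) + pe * (1.0 - pe))
    ) ** 2
    return numerator / (pc - pe) ** 2

n = math.ceil(n_per_group(0.40, 0.30))
print(n, 2 * n)  # 476 952
```

Using exact normal quantiles instead of the rounded 1.96 and 1.28 shifts the result slightly, which is why the rounded values are kept as defaults here.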

Approximate* Total Sample Size for Comparing Proportions in Two Groups of Equal Size with Significance Level (α) of 0.05 and Power (1-β) of 0.90

PC (Control Group)    PE (Experimental Group)    Total Sample Size

0.50                  0.40                       1040
                      0.30                       250
                      0.20                       110

0.30                  0.20                       790
                      0.15                       330
                      0.10                       170

0.10                  0.08                       8600
                      0.05                       1170
                      0.02                       370

0.05                  0.03                       4030
                      0.025                      2430
                      0.01                       760

Similar to MRFIT:

0.03                  0.015                      4100
0.03                  0.019                      8290
0.03                  0.02                       10240
0.019                 0.013                      18360

* Sample sizes are rounded up to the nearest 10.

Factors Which Influence “Realized Delta Pc-Pe”

(Delta = Hypothesized Treatment Difference)

• Non-compliance to experimental treatment

• Switchover from control to experimental treatment

• Lag time for experimental treatment to influence endpoint

• Events counted as an endpoint that are not influenced by treatments under study (e.g., accidental or violent deaths in a study of HIV treatments)

Strategy for Specifying Delta and Estimating Sample Size

• Begin by specifying the minimal effect of experimental treatment which would be considered clinically relevant (usually this is done in terms of a relative difference, e.g., relative risk or hazard ratio)

• Assume immediate full impact of treatment on endpoint and full compliance

• Adjust this “optimistic delta” downwards for non-compliance and lag if necessary

• Calculate sample size using “adjusted delta”

• Inflate sample size (again) for competing events and losses

• For planned sample size, assess impact of deviations from “adjusted delta” on power
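The last step, assessing how deviations from the adjusted delta affect power, can be sketched by inverting the two-proportion sample size formula. This is a hedged illustration only: the function name `power_two_proportions` is hypothetical, the normal approximation is assumed, and the inputs (0.30 versus 0.24 or 0.252 with 1150 per group) are illustrative values.

```python
import math
from statistics import NormalDist

def power_two_proportions(n, pc, pe, alpha=0.05):
    """Approximate power of a 2-sided test of two proportions, n per group."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1.0 - alpha / 2.0)
    p_bar = (pc + pe) / 2.0
    sd_null = math.sqrt(2.0 * p_bar * (1.0 - p_bar))        # under H0
    sd_alt = math.sqrt(pc * (1.0 - pc) + pe * (1.0 - pe))   # under HA
    # Solve the sample size formula for z_(1-beta), then convert to power
    z_beta = (abs(pc - pe) * math.sqrt(n) - z_alpha * sd_null) / sd_alt
    return nd.cdf(z_beta)

planned = power_two_proportions(1150, 0.30, 0.24)    # hypothesized delta = 0.06
realized = power_two_proportions(1150, 0.30, 0.252)  # realized delta = 0.048
print(round(planned, 2), round(realized, 2))
```

With these inputs, a modest shrinkage of delta lowers power from roughly 0.90 to roughly 0.73, which is the motivation for checking the planned sample size against plausible deviations.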

N Per Group = 2 x Variability x [Constant (α, β)]² / Delta²

[Diagram: annotations attached to the terms of the formula: No. primary events; No. competing events; Losses; Standardization; Clinical judgement; Biologic plausibility; Non-compliance; Lag]

Simple Adjustment for Non-Compliance in Experimental Group

Example: Heart failure trial; primary endpoint is death or hospitalization for heart failure.

Pc = 0.30 (Placebo group event rate) after 3 years

Pe = 0.24 (New treatment event rate) after 3 years

• Assume 20% of patients do not comply with experimental treatment (d)

• Assume risk of endpoint for non-compliers in experimental group is the same as placebo group

Simple Adjustment for Non-Compliance

Pe ADJ = 0.20 (Pc) + 0.80 (Pe) = 0.20 (0.30) + 0.80 (0.24) = 0.252

NEW Δ = 0.30 - 0.252 = 0.048; OLD Δ = 0.30 - 0.24 = 0.06

Unadjusted sample size = 1150 per group

Approximation: Inflate usual sample size by 1/(1-d)², where d = fraction of patients not complying

New sample size = 1150 × 1/(0.8)² ≈ 1800 per group

Compliance Adjustment

d = fraction who do not comply to exp. treatment

$$P_{e\,ADJ} = dP_c + (1-d)P_e$$

$$P_c - P_{e\,ADJ} = (1-d)(P_c - P_e)$$

Inflate usual sample size by

$$\frac{1}{(1-d)^2}$$
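The simple compliance adjustment is easy to verify in a few lines. This is a minimal sketch of the formulas above applied to the heart failure example; the function name `adjust_for_noncompliance` is illustrative.

```python
def adjust_for_noncompliance(pc, pe, d, n_unadjusted):
    """Simple adjustment: non-compliers take on the control event rate."""
    pe_adj = d * pc + (1.0 - d) * pe
    delta_adj = pc - pe_adj                  # equals (1 - d) * (pc - pe)
    n_adj = n_unadjusted / (1.0 - d) ** 2    # inflate by 1 / (1 - d)^2
    return pe_adj, delta_adj, n_adj

pe_adj, delta_adj, n_adj = adjust_for_noncompliance(0.30, 0.24, 0.20, 1150)
print(round(pe_adj, 3), round(delta_adj, 3), round(n_adj))  # 0.252 0.048 1797
```

The inflated value of about 1797 is what the slides round to 1800 per group.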

TOXO Study Design: Power to Detect a 50% Difference for Sample Size of 265 Patients

Switchover from placebo    Non-compliance to
to active (%)              Pyrimethamine (%)     Power

0                          0                     0.80
                           10                    0.74
                           25                    0.66

10                         0                     0.75
                           10                    0.71
                           25                    0.63

25                         0                     0.69
                           10                    0.65
                           25                    0.57

Realistic Estimates of Pe and Pc

• Halperin M, J Chronic Dis, 1968 (constant event rates for control and experimental groups, non-compliance in experimental group and lag)

• Wu M, Cont Clin Trials, 1980 (extended Halperin’s method to non-compliance in control group and time-dependent non-compliance)

• Lakatos E, Cont Clin Trials, 1986 and Biometrics, 1988 (extended to log rank test – time to event analyses)

• Shih J, Cont Clin Trials, 1995 and Encyclopedia of Clinical Trials, 2007 (implemented Lakatos methods in SAS – Size program – allows event rates to vary and extended to weighted log rank and unequal allocation)

Definitions

Dropout – Non-compliance to exp. treatment

Dropin – Non-compliance to control treatment

Lag – Time for treatment to achieve maximum benefit

Lost-to-follow-up – A person for whom endpoint status is unknown (outcome is missing)

Halperin Model to Adjust for Non-Compliance and Lag in Experimental Group

1. Specify Pc

2. Specify Pe (or k); k = (Pc - Pe)/Pc, where k x 100 = % reduction in control group event rate due to experimental treatment

3. Specify nonadherence rate in experimental group: d

4. Specify lag: f

5. Obtain adjusted value of Pe from table

6. Obtain inflated sample size estimate

Later development: allow pattern of dropout to vary over follow-up of length T

[Figure: two plots of cumulative dropout rate (reaching d) against follow-up time (0 to T)]

Effect of Non-adherence on Pe

[Figure: hazard against follow-up time (0 to T) for non-dropouts, with levels λc and λe marked]

Dropouts assume the risk of participants in the control arm. Their risk reverts in the same manner as it decreased before dropout (immediately if lag = 0).

λc = hazard for controls

λe = hazard for experimental group

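Lag -- Halperin defined r(t), the hazard of event in experimental group, as follows:

$$r(t) = \begin{cases} \lambda_c\left(1 - \dfrac{kt}{f}\right), & t < f \\ \lambda_c(1 - k) = \lambda_e, & t \ge f \end{cases}$$

[Figure: hazard against time (0 to T), showing a linear decline from λc to λe between t = 0 and t = f]

Halperin M, et al. give tables for f = 0, 0.5T, T and 2T.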

Example: Heart Failure Trial with Death or Hospitalization for Heart Failure as Primary Endpoint

pc = 0.30, pe = 0.24, K = 0.20, d = 0.20

Table 1 of Halperin (f = 0): adjusted pe = 0.246, adjusted K = 0.18, N = 1425 per group

Before (simple adjustment) we had pe ADJ = 0.252, N = 1800 per group

Assume the event rate is 30% after 3 years; the event rate is constant; 20% of those assigned the new treatment will have discontinued it by 3 years (cumulative dropout = 20%); and there is no lag.

Impact of Dropout Pattern on pe and k: Heart Failure Example (cont.)

pc = 0.30, pe = 0.24, k = 0.20, d = 0.20, f = 0

Pattern of Dropout Over Four Years
(Eight 6-Month Time Periods)          Adjusted pe    Adjusted k

(1,1,1,1,1,1,1,1) (Halperin)          0.246          0.180
(2,1,1,1,1,1,1,1)                     0.247          0.177
(1,0,0,0,0,0,0,0)                     0.251          0.163
(1,1,1,1,1,1,1,2)                     0.246          0.181
(0,0,0,0,0,0,0,1)                     0.241          0.197
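The direction and rough size of these adjustments can be illustrated with a simplified calculation. The sketch below is not the Halperin or Wu/Shih algorithm; it assumes constant hazards in both arms, no lag, dropout independent of the event process, dropout occurring at the midpoint of each period, and an immediate return to the control hazard after dropout. The function name `adjusted_pe` is hypothetical. Under these assumptions the results agree with the tabulated values to within about 0.001.

```python
import math

def adjusted_pe(pc, pe, d, pattern):
    """Adjusted experimental event rate for a relative dropout pattern.

    pattern gives relative dropout weights per period; d is the
    cumulative dropout fraction over the whole follow-up.
    """
    cum_haz_e = -math.log(1.0 - pe)   # cumulative hazard if always on treatment
    cum_haz_c = -math.log(1.0 - pc)   # cumulative hazard on control
    total = sum(pattern)
    periods = len(pattern)
    surv_dropouts = 0.0
    for i, weight in enumerate(pattern):
        u = (i + 0.5) / periods       # fraction of follow-up spent on treatment
        surv = math.exp(-cum_haz_e * u - cum_haz_c * (1.0 - u))
        surv_dropouts += (weight / total) * surv
    surv_all = (1.0 - d) * (1.0 - pe) + d * surv_dropouts
    return 1.0 - surv_all

for pattern in [(1,) * 8, (2, 1, 1, 1, 1, 1, 1, 1), (1, 0, 0, 0, 0, 0, 0, 0)]:
    pe_adj = adjusted_pe(0.30, 0.24, 0.20, pattern)
    k_adj = (0.30 - pe_adj) / 0.30
    print(pattern, round(pe_adj, 3), round(k_adj, 3))
```

Early dropout pushes the adjusted pe toward the control rate, while dropout concentrated at the end of follow-up leaves it close to the unadjusted 0.24.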

Comparison of Non-Adherence Adjustments on Sample Size for Heart Failure Trial

Adjustment                                          Adj. pe    N Per Group

No adjustment                                       .240       1150
Simple adjustment (instantaneous non-compliance)    .252       1800
Halperin (equal over follow-up)                     .246       1425
Wu/Shih (twice as much in 1st year)                 .247       1485

Dropout Assumptions in Major Trials

1. MRFIT (J Chronic Dis, 1977): 50% (2,1,1,1,1,1)

2. CPPT (JAMA, 1984): 35% (1,1,1,1,1,1,1)

3. Systolic Hypertension in the Elderly (SHEP) (J Clin Epid, 1988): 16% (2,1,1,1,1)

Example: Similar to MRFIT (Lag of 3 years)

Full Effect of Treatment is 50% and is Reached in 1/2 T

pc = 0.03 (CHD death); K = 0.50

pe = 0.015; α = 0.05, 1 - β = 0.90

d = 0 (no dropouts); T = 6 years, f = 3 years

Adjusted pe = 0.019 instead of 0.015

NEW 2N (f = 3) = 8290 versus 4100 with no lag adjustment; alpha = 0.05 (2-sided) and power = 0.90.

[Figure: K plotted against time from 0 to T; K reaches 0.50 at T/2]
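The adjusted pe of 0.019 can be reproduced by integrating the lag hazard r(t) defined earlier. This is a minimal sketch assuming a constant control hazard, a linear decline in hazard up to time f, and no dropout; the function name `adjusted_pe_with_lag` is illustrative, and the published tables may rest on slightly different conventions.

```python
import math

def adjusted_pe_with_lag(pc, k, f, T):
    """Cumulative experimental event rate at T under a linear lag of length f."""
    lam_c = -math.log(1.0 - pc) / T           # constant control hazard
    ramp = min(f, T)                          # time spent on the linear decline
    if f > 0:
        # integral of lam_c * (1 - k*t/f) from 0 to ramp
        cum_ramp = lam_c * (ramp - k * ramp ** 2 / (2.0 * f))
    else:
        cum_ramp = 0.0
    cum_full = lam_c * (1.0 - k) * (T - ramp)  # full effect after the lag
    return 1.0 - math.exp(-(cum_ramp + cum_full))

print(round(adjusted_pe_with_lag(0.03, 0.50, 3, 6), 3))  # 0.019
print(round(adjusted_pe_with_lag(0.03, 0.50, 0, 6), 3))  # 0.015
```

With a 3-year lag in a 6-year trial, only part of the hypothesized 50% reduction is realized, so pe rises from 0.015 to about 0.019 and the required sample size roughly doubles.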

Adjustment for Both Non-Compliance and Lag (Parameters Similar to MRFIT)

pc = 0.03; K = 0.50; α = 0.05 (2-sided), 1 - β = 0.90

T = 6 years; f = 3 years (0.5T); d = 0.50

Adjusted pe = 0.022

NEW 2N = 16,610 and OLD 2N = 4100 (no adjustment for lag or dropout)

J Chron Dis 1976. Actually, MRFIT was designed as a 1-sided test with alpha = 0.05 and unadjusted K = 0.542.

Dropout and Dropin Assumptions in Major Cardiovascular Trials

              Dropout (%)    Dropin (%)

1. MRFIT      50             0

2. CPPT       35             0

3. SHEP       16             19

Impact of Dropout, Dropin and Lag Assumptions on Hypothesized Risk Reductions

          Unadjusted    Adjusted

MRFIT     54%           27%

CPPT      –             36%

SHEP      40%           32%

Example

TOXO Protocol

1. Primary endpoint: Toxoplasmic encephalitis (TE)

2. Control (placebo) group event rate: 30% in 2.5 years

3. Experimental (pyrimethamine) group event rate: 15% in 2.5 years (50% reduction)

4. Death rate unrelated to TE: 33%

5. Confidence in answer: α = 0.05 (2-sided); 1 - β (power) = 0.80

6. 2:1 allocation for pyrimethamine:placebo

TOXO Sample Size: Influence of Non-Compliance

Switchover from     Non-compliance      Event Rate (%)
placebo to          to Pyrimeth-        ----------------------------    Percent      Sample
active (%)          amine (%)           Placebo      Pyrimethamine      Reduction    Size

0                   0                   30.0         15.0               50.0         265
                    10                  30.0         15.8               47.4         300
                    25                  30.0         17.0               43.3         365

10                  0                   29.3         15.0               48.7         290
                    10                  29.3         15.8               46.0         330
                    25                  29.3         17.0               41.8         405

25                  0                   28.1         15.0               46.6         335
                    10                  28.1         15.8               42.3         380
                    25                  28.1         17.0               39.4         490

Mis-specification of Control Group Event Rate

The table of approximate total sample sizes shown earlier (α = 0.05, power = 0.90) applies here. For the same 50% relative reduction, the total sample size rises as the control group event rate falls: 1170 for PC = 0.10 (PE = 0.05), 2430 for PC = 0.05 (PE = 0.025), and 4100 for PC = 0.03 (PE = 0.015).

Influence on Power of Mis-Specification of Control Group Event Rate (Pc) in CPCRA TOXO Study

Design: Pc = 0.30; hypothesized percentage reduction due to treatment = 50%; α = 0.05 (2-sided); 10% switchover from placebo; 25% non-compliance to pyrimethamine; combined sample size = 405

Pc      Power

.30     0.80

.25     0.71

.20     0.62

.15     0.49

.10     0.35

Comparison of Observed and Expected Number of Deaths

Primary Prevention Studies

                                      Observed    Expected    Observed/Expected

MRFIT (6 years)
  CHD deaths                          104         187         0.56
  All deaths                          219         442         0.50

Physician’s Health Study (4.8 years)
  CVD death                           44          366         0.12

Helsinki Heart Study
  Fatal/nonfatal cardiac events       84          152         0.55

a U.S. life tables

Comparison of Observed and Expected Number of Deaths

                                      Observed    Expected    Observed/Expected

UGDP (8 years)
  CVD deaths                          10          17          0.59

BHAT (3 years)
  All deaths                          187         269         0.70

CDP (5 years)
  All deaths                          583         837         0.70

a U.S. life tables

Impact of Medical Exclusions on Mortality

Deaths, Cause Known, by Interval Between Last Exam and Death

Interval       All Deaths,      With Exam Finding
(months)       Cause Known      Related to Death (%)

≤ 6            42               60
7-12           33               42
13-24          20               45
> 24           31               35

Total          126

• 60% of subjects who died ≤6 months after exam had a finding on exam related to death

• Impact of medical exclusions could be 50% during first 2 years

Schor et al., An Evaluation of the Periodic Health Examination, Annals Int Med, Dec. 1964.

Impact of Medical Exclusions on Mortality

Observed and Expected No. Deaths Among 85,491 White Male Veterans

Year        Observed    Expected    O/E

1947-51     623         844.3       0.738

1952-56     694         892.8       0.777

1957-61     1028        1200.1      0.857

1962-66     1621        1868.1      0.868

1967-69     1379        1597.0      0.863

Total       5345        6402.2      0.835

• 20-22 years after WWII, mortality among white male veterans is still lower than among white U.S. males in general

Seltzer and Jablon, Effects of Selection on Mortality, Am J Epi, Vol. 100, 1974.

“Partial Solution” to Problems Resulting from Mis-Estimation of Control Group Event Rate

• Monitor parameters on which sample size is based during the trial, i.e., the control group event rate, and extend the trial if necessary

• Plan for a sample-size re-estimation

• Design the study to continue until a certain number of events occur (i.e., event-driven trial) (this may not always be possible because of funding risks)

Usual Situation for “Time-to-Event” Clinical Trials

• Recruitment extends over several months or years.

• Trial design usually specifies minimum period of follow-up for all patients and study ends on a common closing date.

• Total trial duration = Recruitment period + minimum follow-up period following enrollment.

• Patients are followed for a variable length of time as a consequence of recruitment period and common closing date.

Usual Situation (cont.)

• Time to event methods: Kaplan-Meier life-tables, Cox models and log rank statistics are used to compare groups, e.g., Ho: Se=Sc (survival functions for experimental and control groups are equal)

• Sample size based on log rank test instead of tests of proportions.

For studies in which the study duration is short compared to average event time (e.g., survival time), sample size using proportions (over average follow-up) is similar to that using time to event (log rank). When this is not the case, using proportions usually results in a larger sample size than considering time to event.

Reasons for Censoring

• End of follow-up (administrative)

• Lost to follow-up (bias is a concern)

• Competing event (e.g., death from an accident in a CVD study; in some cases bias is also a concern)

Typical Enrollment in Trial

[Figure: follow-up of 10 patients (accession nos. 1-10; treatments A, B, B, A, B, B, A, A, A, B) plotted against calendar time from start of study (months 0 to 9), ending at the end of study on April 30, 1977; x marks a death, other patients are censored]

Common Closing Date (def) – the calendar date that is the end of follow-up for all patients (except deaths, withdrawals, losses). The date through which events are counted for the primary analysis. April 30, 1977 in example.

Conversion to Time from Randomization

[Figure: the same 10 patients (treatments A, B, B, A, B, B, A, A, A, B) replotted against follow-up since randomization (months 0 to 10); follow-up durations shown in days: 180, 30, 45, 195, 89, 225, 60, 25, 265, 91]

Common Closing Date Examples

• MRFIT: February 28, 1982 (chosen to correspond to the 6-year anniversary of the last person randomized)

• SMART: January 11, 2006 (date investigators notified of early termination)

• ESPRIT: November 15, 2008 (date when target number of primary events, 320, estimated to occur)

Sample Size for Time to Event Comparisons

Number of required events depends on:

• Type I error (false positive rate)

• Power

• Hypothesized treatment effect, e.g., hazard ratio or relative risk

Note: Initial work assumed all participants would be followed to the event. This was extended to accommodate censoring and more complex trial situations, e.g., recruitment period, lag, dropouts, dropins.

Sample Size for Time to Event Comparison (cont.)

RR = Hypothesized hazard ratio (relative risk) (ratio of hazards for new treatment versus control)

Formulas can be derived assuming exponential survival or by assuming proportional hazards and use of log rank test.

$$\text{No. events required} = \frac{[\text{constant}(\alpha,\beta)]^2}{P_A P_B\,[\log_e RR]^2}; \quad P_A \text{ and } P_B = \text{proportion assigned A and B}$$

$$= \frac{4\,[\text{constant}(\alpha,\beta)]^2}{[\log_e RR]^2} \quad \text{for 1:1 allocation}$$

$$= \frac{31.4}{[\log_e RR]^2} \quad \text{for } \alpha = .05 \text{ (2-sided) and } \beta = .20$$

Freedman L, Stat Med 1982 and Schoenfeld D, Biometrika 1981.
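A short Python sketch of the required-events formula follows. It assumes that constant(α, β) is the sum of the standard normal quantiles z(1-α/2) + z(1-β), which is consistent with the value 31.4 quoted for 1:1 allocation. The function name `required_events` and its parameter names are illustrative.

```python
import math
from statistics import NormalDist

def required_events(hazard_ratio, alpha=0.05, power=0.80, p_a=0.5):
    """Number of events needed to detect hazard_ratio with a log rank test.

    p_a is the proportion of patients assigned to arm A (p_b = 1 - p_a).
    """
    nd = NormalDist()
    constant = nd.inv_cdf(1.0 - alpha / 2.0) + nd.inv_cdf(power)
    p_b = 1.0 - p_a
    return constant ** 2 / (p_a * p_b * math.log(hazard_ratio) ** 2)

print(round(required_events(0.70)))             # 1:1 allocation, about 247 events
print(round(required_events(0.70, p_a=2 / 3)))  # 2:1 allocation needs more events
```

Equal allocation maximizes P_A x P_B, so any unequal allocation increases the number of events required for the same hazard ratio.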

Sample Size for Time to Event Comparison

To obtain N, Pc and Pe for the average total duration must be determined (need to consider length of follow-up and average hazard rate).

Let d = No. events required

N = No. of patients required

$$d = N\,\frac{P_c + P_e}{2}$$

Pc and Pe = cumulative event rate after time T

$$N = \frac{2d}{P_c + P_e}$$

Sample Size for Time to Event Comparison Assuming Exponential Survival (constant hazard)

Suppose λ = average (both treatment groups combined) event (hazard) rate.

Assuming uniform enrollment over E years and a minimum follow-up (F years) for each patient, average follow-up = E/2 + F

The prob (event) assuming exponential model is: 1- exp [- λ (E/2 +F)]

Example: λ = 10/1000 person years; E=3; F=4; then prob (event) over an average of 5.5 years = .0535
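The exponential-model example can be reproduced directly, and the result converted into a number of patients. This is a minimal sketch; the function names are illustrative, and the target of 250 events in the usage line is an assumed value chosen only to show the conversion from events to patients.

```python
import math

def prob_event(hazard, enroll_years, min_followup_years):
    """Probability of an event over the average follow-up (exponential model)."""
    avg_followup = enroll_years / 2.0 + min_followup_years
    return 1.0 - math.exp(-hazard * avg_followup)

def patients_needed(events, hazard, enroll_years, min_followup_years):
    """Total patients so that the expected number of events reaches the target."""
    return math.ceil(events / prob_event(hazard, enroll_years, min_followup_years))

p = prob_event(10 / 1000, 3, 4)
print(round(p, 4))                            # 0.0535
print(patients_needed(250, 10 / 1000, 3, 4))  # total patients for 250 events
```

Because the hazard here is the average over both treatment groups, dividing the required events by this probability gives the total number of patients, matching N = 2d / (Pc + Pe).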

There Are a Number of Ways to Get Target Number of Events

In general, to detect a hazard ratio of .70 (30% reduction) with alpha = 0.05 and 80% power, about 250 events are required.

Expected Control Rate: 7 events per 100 person years

Needed: 250 events ≈ 3,600 person years (250/0.07)

Obtain by following:

• 3,600 patients for 1 year

• 1,200 patients for 3 years

• 250 patients until they experience an event

Extensions of Sample Size for Time to Event Comparisons

Number of patients to obtain events depends on:

• Duration of follow-up

– Accrual period and pattern + minimum follow-up for each patient (if follow-up is to a common closing date)

• Expected pattern of events in control group

• Lag (whether hazards are proportional), dropouts, dropins, and losses (missing data)

Size program (Shih, Cont Clin Trials, 1995)

    %size (np = 8,                      /* eight 1-year time periods */
           pc = 0.0322 6,               /* control event rate of 0.0322 after 6 years */
           pcratio = 1 1.074 1.778 1.852 1.963 2.037,
           prop = 0,
           logr = 0,
           lag = 3,                     /* lag of 3 years */
           k = 0.845 0.688 0.465 0.465 0.465 0.465 0.465 0.465,
           alpha = 0.05,
           power = 0.90,
           loss = 0.0 8,
           din = 0.00 8,
           dout = 0.50 6,               /* dropout rate of 50% after 6 years */
           doratio = 1 1 1 1 1 1 1 1,
           diratio = 1 1 1 1 1 1 1 1,
           loratio = 1 1 1 1 1 1 1 1,
           simult = 0,
           rectime = 1 2,
           recratio = 1 1,
           ratio = 1);
    endsas;

Example: Similar to MRFIT

Assume patients are enrolled over a 2-year period and followed for a minimum of 6 years.

We specify eight 1-year time periods (np=8), and we estimate our event rate (pc) and dropout rate (dout) for 6 years and then extrapolate to 8 periods (8 years).

This program assumes a non-constant control event rate, with a cumulative rate of 0.0322 after 6 years, and a dropout rate of 50% after 6 years. The control hazard increases with time.

We will use log-rank statistics for analysis and assume a lag of 3 years.

Constant Versus Time-Dependent Outcome Pattern in Control Arm

• Halperin’s method assumed constant (exponential) event rate pattern in control arm.

• In many cases, the event rate pattern will not be constant:

• Lag (non-proportional hazards) and dropout (non-adherence to experimental treatment) can have very different effects if the time-dependent outcome pattern in the control arm is not recognized.

• Two examples:

– MRFIT, based on Framingham data, event rate increases with each year of follow-up

– In trials of participants hospitalized for myocardial infarction, event rate is higher earlier in follow-up compared to later.

MRFIT: Dropout, Lag (Non-proportional Hazards), and Constant Versus Time-Dependent Mortality Pattern

Dropout Rate+       Constant             Time-Dependent
(Lag)               Mortality (2N)       Mortality (2N)

0.0 (no lag)        2850                 2720

0.50 (no lag)       6700                 7260

0.0 (3 years)       4300                 3460

0.50 (3 years)      8500                 7290

Sample sizes are rounded up.

+ Dropout rate assumed constant over follow-up

K = 0.845 0.688 0.542 0.542 0.542 0.542 0.542 0.542 (3 year lag)

Control event pattern (1.0 1.074 1.778 1.852 1.963 2.037 2.139 2.246) based on Framingham

pc = 0.0322, HR = 0.465 (convert k to hazard ratio) by noting HR = ln (1-pe) / ln (1-pc) and ln (1-pe) = ln [1-(1-k)pc]

2-year enrollment; minimum follow-up of 6 years; α = .05 (2-sided), 1 - β = 0.90

Another Example (Similar to IMPROVE-IT Study, Amer Heart J 2008)

• Trial of aggressive lipid-lowering treatment in patients hospitalized for myocardial infarction (MI).

• Primary endpoint: CVD death or non-fatal MI

• Event rate expected to be highest in 1st 6-12 months following hospitalization.

• Full impact of lipid-lowering treatment not achieved for 18-24 months.

Lipid Lowering Trial: Lag (Non-proportional Hazards) and Constant Versus Time-Dependent Mortality Pattern

Lag                   Constant             Time-Dependent
                      Event Rate (2N)      Event Rate (2N)

No lag (HR=0.90)      10,600               12,070

1-year lag+           18,130               26,970

+ HR = 0.96 0.96 0.91 0.91 0.90 0.90 0.90 0.90 0.90

Time-dependent control event pattern (3.0 1.2 1 1 1 1 1 1 1)

pc = 0.235 after 2 years, HR = 0.90; α = .05 (2-sided), 1 - β = 0.90

2 year enrollment period and 2.5 year minimum follow-up (divide into nine 6-month periods for SIZE)

Sample Size Adjustment to Account for Losses-to-Follow-up (Missing Outcome Data Due to Competing Events, Whereabouts Unknown, and Withdrawal of Consent)

L = fraction of patients expected to be lost

$$N_{NEW} = \frac{N_{OLD}}{1 - L}$$

NOTE: SIZE allows this to vary over follow-up. However, this adjustment takes care of the loss of power but not bias. Bias cannot be eliminated by increasing sample size.
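The loss adjustment is a one-line inflation. The sketch below uses a hypothetical loss fraction of 10% and a starting sample size of 1425 per group purely for illustration; the function name `inflate_for_losses` is not from the slides.

```python
import math

def inflate_for_losses(n_old, loss_fraction):
    """Inflate a sample size so that n_old patients remain after losses."""
    if not 0.0 <= loss_fraction < 1.0:
        raise ValueError("loss_fraction must be in [0, 1)")
    return math.ceil(n_old / (1.0 - loss_fraction))

print(inflate_for_losses(1425, 0.10))  # 1584
```

This restores the expected number of patients with known outcomes, but as noted above it does nothing about bias if losses are related to outcome.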

Duration of Follow-up Considerations

• Availability of patients

• Ease of follow-up

• Funding

• Pressure for quick answers

• Information on secondary outcomes (e.g., safety)

• Generalizability of finding from high risk to low risk participants

• Possibility of changing treatment effect over time (e.g., lipid-lowering study, studies of AZT monotherapy)

Summary

• If possible, design trials with clinical outcomes with a target number of events – event-driven trials.

• Many of the assumptions behind a sample size are uncertain; therefore, plan for a sample size re-estimation based on the pooled event rate (not on unblinded treatment comparisons).
