topics in clinical trials (6) - 2012 j. jack lee, ph.d. department of biostatistics university of...

Topics in Clinical Trials (6) - 2012

J. Jack Lee, Ph.D.
Department of Biostatistics
University of Texas M. D. Anderson Cancer Center

How many patients are needed in a clinical trial?

It depends on what you want to achieve.

As large as possible until it bankrupts your bank account?

As small as it takes to get the trial approved?

N=1? N=14? N=1000?

Adequate N is needed for proper statistical inference. Inadequate N may lead to inconclusive or wrong results. Unduly large N may not be feasible and can also be unethical. Clinical Trials should have sufficient statistical power to detect clinically meaningful differences between groups. Sample size should be considered early in the planning phase.

Examples

Pilot study (feasibility): N ≤ 18
Phase I (toxicity): 20 ≤ N ≤ 40
Phase II (efficacy): 30 ≤ N ≤ 100
Phase III (confirmatory): N > 100
Primary Prevention Trials: N > 10,000

e.g. BCPT (Tamoxifen): N=16,000 (13,388)

PHS (aspirin, β-carotene): N=22,071

Essence of Sample Size Calculation

Adequate N is needed for proper statistical inference.
  • Central limit theorem, large sample approximation
  • Control false positive (type I) and false negative (type II) errors

Inadequate N may lead to inconclusive or wrong results.
  • 71 RCTs failed to find significant results between groups
      67 had > 10% risk of missing a 25% tx improvement
      50 had > 10% risk of missing a 50% improvement (false negative results)
  • Spurious findings can occur by chance alone when N is small (false positive results)

Unduly large N may not be feasible. It can also be unethical. Why?

Fundamental Points

Clinical Trials should be designed with good operating characteristics to yield valid scientific inference.
Sufficient sample size is needed for estimation and/or hypothesis testing.
Sample size calculation should be based on the identification of
  • A primary endpoint
  • The objective to be achieved on the primary endpoint

Examples

In a single-arm Phase II trial
  • Primary endpoint: Response rate
  • Objective: find out whether the new treatment can achieve a target response rate

In a randomized controlled Phase III trial
  • Primary endpoint: Overall survival
  • Objective: find out whether the new treatment can yield a longer overall survival compared to the standard treatment

Sample Size Calculation Is Only An Estimate

Parameters used in calculation are estimates themselves with a level of uncertainty.
Estimated tx effect may be based on a different population.
Estimated tx effect is often overly optimistic, being based on highly selected pilot studies.
Pts eligibility criteria may be changed, thus affecting the sample population.
Rule of thumb: Be conservative
  • May need a pilot study to refine the estimates
  • Better to design a larger study with early stopping than a smaller study and then try to expand N / extend f/u during the trial.

Statistical Concepts

Estimation
  • 1-sample: estimating tx effect
  • 2-sample: estimating tx difference
  • Methods for 1-sample binary endpoint, e.g., response rate
      Exact: e.g. Clopper-Pearson Interval
      Asymptotic Gaussian approximation

$$\hat{p} \sim N\big(p, \operatorname{var}(\hat{p})\big)$$

$$(1-\alpha) \times 100\%\ \text{C.I. for } p:\quad \hat{p} \pm z_{\alpha/2} \sqrt{\operatorname{var}(\hat{p})}$$

Hypothesis Testing

Estimation

Let X = systolic blood pressure (SBP)
X ~ N(90, 10²)
With sample size N, mean(X) ~ N(90, 100/N)

[Figure: two density plots (x-axis: Blood Pressure, 60 to 120; y-axis: Density). Panels: "Distribution of BP, N=1" and "Distribution of Mean BP, N=4".]

For a binary response, e.g. 3 out of 10 metastatic breast cancer patients responded to Taxol. What is the estimated response rate p*?

Point estimate: p* = 0.30

p* ~ N(p, p(1-p)/N) = N(0.3, 0.021)

SE(p*) = sqrt(0.3x0.7/10) = sqrt(0.021) = 0.145

95% CI for p: p* ± 1.96 SE(p*) = 0.30 ± 0.28 = (0.02, 0.58)

Sample size N needed to reach a given standard error, by probability of response:

                  Probability of Response
Standard Error    0.1    0.2    0.3    0.4    0.5
0.2                 3      4      6      6      7
0.1                 9     16     21     24     25
0.05               36     64     84     96    100
0.025             144    256    336    384    400

Suppose 30 out of 100 metastatic breast cancer patients responded to Taxol. What is the estimated p*?

Point estimate: p* = 0.30, SE(p*) = sqrt(0.3x0.7/100) = 0.046

95% CI for p: p* ± 1.96 SE(p*) = 0.30 ± 0.09 = (0.21, 0.39)
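The two interval calculations above can be reproduced with a few lines of Python. This is a minimal sketch assuming the Gaussian approximation (not the exact Clopper-Pearson interval); the function name `wald_ci` is illustrative and does not come from the slides.

```python
import math

def wald_ci(responders, n, z=1.96):
    """Return (p_hat, se, lower, upper) using the Gaussian approximation."""
    p_hat = responders / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, se, p_hat - z * se, p_hat + z * se

# The two Taxol examples: 3/10 and 30/100 responders.
for responders, n in [(3, 10), (30, 100)]:
    p_hat, se, lo, hi = wald_ci(responders, n)
    print(f"{responders}/{n}: p* = {p_hat:.2f}, SE = {se:.3f}, "
          f"95% CI = ({lo:.2f}, {hi:.2f})")
```

The same point estimate gives a much narrower interval with N = 100 than with N = 10, which is the motivation for sizing a trial by precision.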

Sample Size Calculation Based on Estimation

SE = SD / sqrt(N)
Width of 95% CI = 1.96 x SE x 2
Compute N s.t. SE or width of CI is within a pre-specified precision
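As a rough illustration of sizing by precision for a binary endpoint, the sketch below solves SE = sqrt(p(1-p)/N) for N and rounds up. The helper names `n_for_se` and `n_for_ci_width` are hypothetical, and the rounding guard is an implementation detail added here, not part of the slides.

```python
import math

def n_for_se(p, target_se):
    """Smallest N such that sqrt(p(1-p)/N) <= target_se."""
    raw = p * (1 - p) / target_se ** 2
    # round() removes floating-point noise (e.g. 24.999999999999996) before ceil.
    return math.ceil(round(raw, 9))

def n_for_ci_width(p, width, z=1.96):
    """Smallest N such that the CI width 2 * z * SE does not exceed `width`."""
    return n_for_se(p, width / (2 * z))

probs = [0.1, 0.2, 0.3, 0.4, 0.5]
for se in [0.2, 0.1, 0.05, 0.025]:
    print(se, [n_for_se(p, se) for p in probs])
```

Halving the target SE quadruples N, since N scales with 1/SE².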

Hypothesis Testing

Framework of hypothesis testing

                        Truth
Action           H0                   H1
H0               correct              β
H1               α                    correct

α: Type I error (level of significance)

β: Type II error (1 - β = Power)

Sample Size Calculation: Find N s.t. α and β are under control.

Typically, compute N for a given α to yield (1 - β) x 100% power.

For example, compute N for α = 0.05 to yield 80% power.

P-values

P-value = probability of obtaining data as extreme as or more extreme than the observed result when the null hypothesis is true.
Smaller p-values mean stronger evidence against H0.

Nothing sacred about p = 0.05. (p = 0.045 vs. p = 0.055)

Statistical Significance vs. Clinical Significance
Large samples: small differences may be significant
Small samples: large differences may not be significant
The frequentist inference depends on sample space, i.e. the design.

Tools for Sample Size Calculation

STPLAN: http://biostatistics.mdanderson.org/SoftwareDownload/
NQuery
PASS
EaSt
Many web sites

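Example: Let Y = reduction in SBP in an anti-hypertension trial

Y ~ Normal(μ, σ²), so mean(Y) ~ Normal(μ, σ²/N); H0: μ ≤ 0 vs. H1: μ > 0

If σ = 20 and α = 0.05, how big should N be to have 80% power for testing H0 vs. H1 if the true μ = 5?

$$N = \frac{(Z_\alpha + Z_\beta)^2}{(\delta/\sigma)^2}, \qquad \delta = 5 \text{ (the true mean reduction under } H_1)$$

N = 98.9 ≈ 100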
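A short Python check of this calculation, using the standard library's `statistics.NormalDist` for the normal quantiles. The function name `one_sample_n` is an illustrative choice, and this is only a sketch of the normal-approximation formula, not a reimplementation of STPLAN.

```python
import math
from statistics import NormalDist

def one_sample_n(delta, sigma, alpha=0.05, power=0.80, two_sided=False):
    """N = (Z_alpha + Z_beta)^2 / (delta / sigma)^2 for a one-sample test."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    return (z_alpha + z_beta) ** 2 / (delta / sigma) ** 2

n = one_sample_n(delta=5, sigma=20)
print(f"N = {n:.1f} (next integer: {math.ceil(n)})")
```

In practice the result is often rounded up to a convenient number, as with N ≈ 100 here.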

[Figure: densities of the mean blood pressure reduction under H0 and H1, with the power region marked (x-axis: Blood Pressure Reduction, -5 to 15; y-axis: Density).]

dstplan

[Figure: Power vs. Sample Size (x-axis: N, 0 to 200; y-axis: power, 0 to 1).]

[Figure: Power vs. Difference (x-axis: Difference, 0 to 8; y-axis: power, 0 to 1).]

[Figure: five panels for n = 5, 10, 30, 60, and 90 (x-axis: 0 to 10). Courtesy of Don Berry.]

Selecting Appropriate Statistical Methods for Categorical Data

Goal                                                 Analysis
Describe one group                                   Proportion
Compare one group to a hypothetical value            Chi-square test
Compare two unpaired groups                          Chi-square test*
Compare two paired groups                            McNemar's test
Compare three or more unmatched groups               Chi-square test*
Model the effect of multiple prognostic variables    Logistic regression

*: When sample size is small, use Fisher’s exact test

Selecting Appropriate Statistical Methods for Gaussian Data

Goal                                                 Analysis
Describe one group                                   Mean, SD
Compare one group to a hypothetical value            One-sample t-test
Compare two unpaired groups                          Two-sample t-test
Compare paired data                                  Paired t-test
Compare three or more unmatched groups               One-way ANOVA

Selecting Appropriate Statistical Methods for Non-Gaussian Data

Goal                                                 Analysis
Describe one group                                   Median, Percentiles
Compare one group to a hypothetical value            Signed-rank test
Compare two unpaired groups                          Mann-Whitney test / Wilcoxon rank sum test
Compare paired data                                  Signed-rank test
Compare three or more unmatched groups               Kruskal-Wallis test

Selecting Appropriate Statistical Methods for Survival Data

Goal                                                 Analysis
Describe one group                                   Kaplan-Meier
Compare two unpaired groups                          log-rank test
Compare three or more unmatched groups               Cox regression
Model the effect of multiple prognostic factors      Cox regression

Sample Size Based on Hypothesis Testing for Continuous Outcome

One-sample test

$$N = \frac{(Z_\alpha + Z_\beta)^2 \sigma^2}{\delta^2}$$

Two-sample test

$$2N = \frac{4 (Z_\alpha + Z_\beta)^2 \sigma^2}{\delta^2}$$

Note: Use Zα for one-sided tests; replace it with Zα/2 for two-sided tests.

(δ/σ) is often called the effect size (ES). Cohen defined ES = .2, .5, and .8 as small, medium, and large, respectively.

Cohen (1988): Statistical Power Analysis for the Behavioral Sciences

Total N by Effect Size for 2-Sample Test

               One-sided α = 5%          Two-sided α = 5%
Effect Size    80% power    90% power    80% power    90% power
.2                   619          857          785        1,051
.5                    99          138          126          169
.8                    39           54           50           66
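The table entries follow from the two-sample formula 2N = 4(Zα + Zβ)²/ES², rounded up. A minimal Python sketch, assuming the normal approximation; the function name `two_sample_total_n` is illustrative.

```python
import math
from statistics import NormalDist

def two_sample_total_n(effect_size, alpha=0.05, power=0.80, two_sided=True):
    """Total sample size 2N = 4 (Z_alpha + Z_beta)^2 / ES^2, with ES = delta / sigma."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    return 4 * (z_alpha + z(power)) ** 2 / effect_size ** 2

# Columns: one-sided 80%, one-sided 90%, two-sided 80%, two-sided 90%.
table = {}
for es in (0.2, 0.5, 0.8):
    table[es] = [math.ceil(two_sample_total_n(es, power=pw, two_sided=ts))
                 for ts in (False, True) for pw in (0.80, 0.90)]
    print(es, table[es])
```

Because N scales with 1/ES², detecting a small effect (ES = .2) takes roughly 16 times as many patients as a large one (ES = .8).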

Sample Size Based on Hypothesis Testing for 2 Independent Binary Outcomes

$$2N = \frac{4 (Z_\alpha + Z_\beta)^2\, \bar{p}(1-\bar{p})}{(p_C - p_I)^2}, \qquad \bar{p} = \frac{p_C + p_I}{2}$$

How big should N be for comparing the response rates between doxorubicin (control) and FTI (new intervention) in pancreatic cancer patients?

H0: pC = pI vs. H1: pC < pI

Estimated pC = 0.1, pI = 0.3, 1-sided α = 0.05, 1-β = .80

2N = 98.9, N ≈ 50

dstplan

Ex: Two-sample Binomial Probability

Annual event rate: pC = 0.4, pI = 0.3
Two-sided α = 0.05
90% power

Sample size
Total N = 956
Each group = 478

dstplan
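Both two-proportion examples can be checked against the pooled-variance formula 2N = 4(Zα + Zβ)² p̄(1 - p̄)/(pC - pI)², with p̄ taken as the average of the two rates. This Python sketch is a hand calculation under that assumption; the function name is illustrative, and dstplan may use a slightly different method internally.

```python
from statistics import NormalDist

def two_proportion_total_n(p_c, p_i, alpha=0.05, power=0.80, two_sided=False):
    """Total sample size 2N for comparing two independent proportions."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    p_bar = (p_c + p_i) / 2
    return 4 * (z_alpha + z(power)) ** 2 * p_bar * (1 - p_bar) / (p_c - p_i) ** 2

# Doxorubicin vs. FTI: one-sided alpha = 0.05, 80% power.
n_response = two_proportion_total_n(0.1, 0.3)
# Annual event rates 0.4 vs. 0.3: two-sided alpha = 0.05, 90% power.
n_event = two_proportion_total_n(0.4, 0.3, power=0.90, two_sided=True)
print(f"2N = {n_response:.1f}; total N = {n_event:.0f}")
```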

Sample Size Calculation for Survival Outcome – Instantaneous Entry – No Censoring

Assume exponential survival

$$S(t) = \exp(-\lambda t), \qquad f(t) = \lambda \exp(-\lambda t), \qquad h(t) = \lambda$$

• To test H0: λC = λI vs. H1: λC ≠ λI

$$2N = \frac{4 (Z_\alpha + Z_\beta)^2}{[\ln(\lambda_C / \lambda_I)]^2}$$

Example: Exponential Survival

Assume λC = 0.30 and λI = 0.20. What sample size is needed to test the equality of hazard rates with two-sided α = 0.05 and 1-β = 0.90?
5-yr mortality rates are 0.7769 and 0.6321 for the control and intervention groups, respectively.

Median survival time = -ln(.5)/λ = 2.31 and 3.47, respectively

Plugging into the formula, N=128 or 2N=256.

Using the comparison of two proportions, 2N=412

Survival approach is more efficient.
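The numbers in this example can be reproduced from the exponential model. A minimal Python sketch, assuming the formula 2N = 4(Zα/2 + Zβ)²/[ln(λC/λI)]² for the two-sided test; the function name is illustrative.

```python
import math
from statistics import NormalDist

def exp_survival_total_n(lam_c, lam_i, alpha=0.05, power=0.90):
    """Total 2N for a two-sided test of equal hazards, no censoring."""
    z = NormalDist().inv_cdf
    return 4 * (z(1 - alpha / 2) + z(power)) ** 2 / math.log(lam_c / lam_i) ** 2

summary = {}
for arm, lam in (("control", 0.30), ("intervention", 0.20)):
    median = math.log(2) / lam            # same as -ln(0.5) / lambda
    mortality_5yr = 1 - math.exp(-lam * 5)
    summary[arm] = (median, mortality_5yr)
    print(f"{arm}: median = {median:.2f} yr, 5-yr mortality = {mortality_5yr:.4f}")

total_n = exp_survival_total_n(0.30, 0.20)
print(f"2N = {math.ceil(total_n)}")
```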

Sample Size Calculation for Survival Outcome – Instantaneous Entry – With Censoring

All patients entered at the same time and censored at time T.

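$$2N = \frac{2 (Z_\alpha + Z_\beta)^2 \left[\phi(\lambda_C) + \phi(\lambda_I)\right]}{(\lambda_C - \lambda_I)^2}$$

where

$$\phi(\lambda) = \frac{\lambda^2}{1 - e^{-\lambda T}}$$

• In the previous example, if a 5-yr study is planned, then the required sample size is 2N=376.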
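A short Python check of the 5-yr example, assuming φ(λ) = λ²/(1 - e^{-λT}) and the two-sided α = 0.05, 90% power setting from the previous example. Function names are illustrative.

```python
import math
from statistics import NormalDist

def phi_fixed_followup(lam, T):
    """phi(lambda) when all patients enter at time 0 and are censored at time T."""
    return lam ** 2 / (1 - math.exp(-lam * T))

def total_n_censored(lam_c, lam_i, T, alpha=0.05, power=0.90):
    """Total 2N = 2 (Z + Z_beta)^2 [phi(lam_c) + phi(lam_i)] / (lam_c - lam_i)^2."""
    z = NormalDist().inv_cdf
    z_sum = z(1 - alpha / 2) + z(power)
    phi_sum = phi_fixed_followup(lam_c, T) + phi_fixed_followup(lam_i, T)
    return 2 * z_sum ** 2 * phi_sum / (lam_c - lam_i) ** 2

total_n_5yr = total_n_censored(0.30, 0.20, T=5)
print(f"2N = {total_n_5yr:.0f}")
```

Shorter follow-up means fewer observed events per patient, so the required N grows as T shrinks.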

Sample Size Calculation for Survival Outcome – Staggered Entry

Assume participants are recruited uniformly over a period of T0

The trial continues for T years (T > T0)
With 3 years of accrual in a 5-yr study, 2N = 466 by using a similar formula as before but with

$$\phi(\lambda) = \frac{\lambda^2}{1 - \left[e^{-\lambda (T - T_0)} - e^{-\lambda T}\right] / (\lambda T_0)}$$
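The staggered-entry figure can be checked the same way. This Python sketch assumes uniform accrual over T0 = 3 years, total duration T = 5 years, and the earlier example's λC = 0.30, λI = 0.20, two-sided α = 0.05, 90% power; function names are illustrative.

```python
import math
from statistics import NormalDist

def phi_staggered(lam, T, T0):
    """phi(lambda) under uniform accrual over (0, T0) and analysis at time T."""
    entry_term = (math.exp(-lam * (T - T0)) - math.exp(-lam * T)) / (lam * T0)
    return lam ** 2 / (1 - entry_term)

def total_n_staggered(lam_c, lam_i, T, T0, alpha=0.05, power=0.90):
    z = NormalDist().inv_cdf
    z_sum = z(1 - alpha / 2) + z(power)
    phi_sum = phi_staggered(lam_c, T, T0) + phi_staggered(lam_i, T, T0)
    return 2 * z_sum ** 2 * phi_sum / (lam_c - lam_i) ** 2

total_n_3yr_accrual = total_n_staggered(0.30, 0.20, T=5, T0=3)
print(f"2N = {total_n_3yr_accrual:.0f}")
```

Patients who enter late contribute less follow-up, which is why 2N rises from 376 (instantaneous entry) to 466 here.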

The Key Quantity – Expected # of Events

Expected # of events is a function of sample size, hazard rate, recruitment rate, and censoring distribution.
Assume uniform accrual over (0, T0) and f/u over (0, T)

$$E(D) = \frac{N \lambda^2}{\phi(\lambda)} = N \left[1 - \frac{e^{-\lambda (T - T_0)} - e^{-\lambda T}}{\lambda T_0}\right]$$
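To see where the expected-events expression comes from, the sketch below compares the closed form with a small Monte Carlo run: each patient gets a uniform entry time and an exponential survival time, and an event counts only if it occurs before the analysis at time T. The per-arm size of 233 (half of 2N = 466) and the seed are illustrative assumptions.

```python
import math
import random

def event_probability(lam, T, T0):
    """P(event observed by T) under uniform accrual over (0, T0)."""
    return 1 - (math.exp(-lam * (T - T0)) - math.exp(-lam * T)) / (lam * T0)

def simulated_event_fraction(lam, T, T0, n=20000, seed=1):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    events = 0
    for _ in range(n):
        entry = rng.uniform(0, T0)          # calendar time of enrollment
        survival = rng.expovariate(lam)     # time from entry to event
        if entry + survival <= T:
            events += 1
    return events / n

n_per_arm = 233  # assumed: half of 2N = 466
for arm, lam in (("control", 0.30), ("intervention", 0.20)):
    p_event = event_probability(lam, T=5, T0=3)
    print(f"{arm}: P(event) = {p_event:.3f}, E(D) = {n_per_arm * p_event:.1f}")
```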

Sample Size Based on CI estimation for 2 Independent Binary Outcomes

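How big should N be such that the width of a 100(1-α)% CI for pI - pC will not exceed WCI?

$$W_{CI} = 2 Z_{\alpha/2} \sqrt{\bar{p}(1-\bar{p})\,(2/N)}$$

$$N = \frac{8 Z_{\alpha/2}^2\, \bar{p}(1-\bar{p})}{W_{CI}^2}$$

Compare with the hypothesis testing (H.T.) formula: choose pI - pC = WCI / 2

$$N = \frac{2 (Z_{\alpha/2} + Z_\beta)^2\, \bar{p}(1-\bar{p})}{(W_{CI}/2)^2} = \frac{8 (Z_{\alpha/2} + Z_\beta)^2\, \bar{p}(1-\bar{p})}{W_{CI}^2}$$

The CI-based N is the same as the H.T. formula with Zβ = 0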
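A small Python sketch of the CI-width calculation, with a check that it matches the hypothesis-testing formula when Zβ = 0 (i.e. 50% power) and the difference is set to WCI/2. The inputs p̄ = 0.2 and WCI = 0.2 are hypothetical values chosen for illustration, and the function names are not from the slides.

```python
import math
from statistics import NormalDist

def n_per_group_for_ci_width(p_bar, width, alpha=0.05):
    """Per-group N = 8 Z^2 p_bar (1 - p_bar) / W^2 for a CI on p_I - p_C."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 8 * z ** 2 * p_bar * (1 - p_bar) / width ** 2

def n_per_group_ht(p_bar, diff, alpha=0.05, power=0.80):
    """Per-group N = 2 (Z_{alpha/2} + Z_beta)^2 p_bar (1 - p_bar) / diff^2."""
    z = NormalDist().inv_cdf
    return 2 * (z(1 - alpha / 2) + z(power)) ** 2 * p_bar * (1 - p_bar) / diff ** 2

n_ci = n_per_group_for_ci_width(p_bar=0.2, width=0.2)   # hypothetical inputs
n_ht = n_per_group_ht(p_bar=0.2, diff=0.1, power=0.5)   # Z_beta = 0 at 50% power
print(f"CI-based N = {n_ci:.1f}, H.T. N with Z_beta = 0: {n_ht:.1f}")
```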

Sample Size Based on Hypothesis Testing for Paired Binary Outcomes

$$N_p = \frac{\left[Z_\alpha \sqrt{f} + Z_\beta \sqrt{f - d^2}\right]^2}{d^2}$$

or, in a simpler form,

$$N_p = \frac{(Z_\alpha + Z_\beta)^2 f}{d^2}$$

How big should N be for comparing the response rates between control and intervention given to each of the two eyes, respectively?

Estimated pC = 0.2, pI = 0.4 (so d = pI - pC = 0.2), α = 0.05, 1-β = .90, and the proportion of the pts with discordant response f=.5

Np = 132

Sample Size for McNemar’s Test


Estimated pC = 0.2, pI = 0.4, a = 0.05, 1-b = .90, and the proportion of the pts with discordant response f=.5

                        Intervention
                     Failure    Success
Control   Failure      .45        .35       .80
          Success      .15        .05       .20
                       .60        .40

Use Connor (1987), N = 104
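The two paired-sample figures can be reproduced in Python from d = 0.2 and f = 0.5. Note the assumption about sidedness: in this sketch the simpler formula with a two-sided α = 0.05 gives 132, and the square-root form with a one-sided α = 0.05 gives 104. That pairing is inferred from the arithmetic, not stated on the slides, and the function names are illustrative.

```python
import math
from statistics import NormalDist

def paired_n_simple(d, f, z_alpha, z_beta):
    """N_p = (Z_alpha + Z_beta)^2 f / d^2."""
    return (z_alpha + z_beta) ** 2 * f / d ** 2

def paired_n_sqrt_form(d, f, z_alpha, z_beta):
    """N_p = [Z_alpha sqrt(f) + Z_beta sqrt(f - d^2)]^2 / d^2."""
    return (z_alpha * math.sqrt(f) + z_beta * math.sqrt(f - d ** 2)) ** 2 / d ** 2

z = NormalDist().inv_cdf
d, f = 0.2, 0.5          # d = pI - pC; f = .35 + .15 discordant proportion
z_beta = z(0.90)

n_simple = paired_n_simple(d, f, z(0.975), z_beta)        # two-sided alpha (assumed)
n_sqrt = paired_n_sqrt_form(d, f, z(0.95), z_beta)        # one-sided alpha (assumed)
print(math.ceil(n_simple), math.ceil(n_sqrt))
```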

Impact of Noncompliance

“Diluting” the treatment effect
Increase sample size

$$N^* = N / (1 - R_o - R_I)^2$$

where R_o, R_I are drop-out and drop-in rates, respectively.

For R_o + R_I = 10%, sample size increases by 23%; for 20%, by 56%.

• Raise questions about the study validity

• Difficult to make proper inference to the population
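The inflation factor for noncompliance is easy to tabulate. A minimal Python sketch of N* = N/(1 - R_o - R_I)²; the even split of the combined rate between drop-out and drop-in is an arbitrary choice for illustration, since only the sum matters.

```python
def noncompliance_inflation(drop_out, drop_in):
    """Factor by which N must grow: 1 / (1 - R_o - R_I)^2."""
    return 1 / (1 - drop_out - drop_in) ** 2

increase_pct = {}
for total_rate in (0.10, 0.20):
    factor = noncompliance_inflation(total_rate / 2, total_rate / 2)
    increase_pct[total_rate] = (factor - 1) * 100
    print(f"R_o + R_I = {total_rate:.0%}: N increases by {increase_pct[total_rate]:.1f}%")
```

The penalty is quadratic, so doubling the noncompliance rate more than doubles the extra patients needed.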

Sample Size for Other Designs

Repeated measures
Equivalence trials
Historical control trials
Cluster randomization trials
(Reading Assignment)
Efficient targeted design trials

Premise (binary endpoint)

In the study population
  • R+ : portion of marker positive (likely to respond)
  • R- : portion of marker negative (less likely to respond)
  • Proportion of R- : γ

Patients were randomized into control and exp groups
Response probability
  • For R- pts: pc + δ0
  • For R+ pts: pc + δ1

Relative efficiency

Relative Efficiency: n/nT (γ = 0.5)

                         N (Randomized)    N (Screened)
Case 0: δ0 = 0                 4                2
Case 1: δ0 = δ1 / 2            1.75             0.89

Gefitinib Trials

INTACT I & II Trials had 2,130 pts. Results were negative.
We want to study an EGFR inhibitor in high-risk oral IEN
Only a fraction (1 - γ = .10) of subjects present the target
Response rates are:
  • std tx (e.g., retinoids): pc = .40
  • EGFR inhibitor: w/ target is pc + δ1, w/o target is pc + δ0

Sample size needed for 90% power at 2-sided 5% significance level

Entries: N (Efficiency)

                              δ0 = 0                        δ0 = δ1/2
Design                        δ1 = .2        δ1 = .4        δ1 = .2        δ1 = .4
Untargeted Design             12,806         3,248          446            116
Targeted Design               138 (92.2)     34 (95.8)      138 (3.2)      34 (3.4)
Targeted Design (screened)    1,380 (9.2)    340 (9.6)      1,380 (0.3)    340 (0.3)

Adherence/Compliance Monitoring

Pill diary
Pill count
  • Forget to bring in the bottle
  • Dump the remaining drugs into the toilet
  • Over-subscribe
  • Dispense by weight – not precise
Special pill dispenser to monitor when the bottle is opened
Laboratory test of drug level in serum or urine
  • Half-life of the drug
  • Choosing cutoff value to declare (+) or (-)
Percent compliance
  • % compliance = # of pills taken / # of pills prescribed
Dose intensity
  • # of pills taken / # of pills that should have been taken per protocol
  • Measure the actual amount of drug taken

Main Reasons for Noncompliance

Toxicity or side effects
Involving life style/behavior change
Complex or inconvenient interventions
Insufficient understanding or lack of understanding of instructions
Change of mind, refusal
Lack of family support

Other Adjustments for Sample Size

Increase number of screened/registered patients to take the ineligibility into consideration
  • 10% ineligible, Total N = N/0.9
Increase number of randomized patients if not everyone is evaluable
  • 5% inevaluable, Total N = N/0.95
Drop out, loss to f/u
Be aware of informative censoring
Interim analysis, sample size re-estimation
  • To be covered later in the course
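The ineligibility and inevaluability adjustments above amount to dividing by the expected retained fraction and rounding up. A tiny Python sketch; the starting N = 100 is a hypothetical value, and `inflate_for_loss` is an illustrative name.

```python
import math

def inflate_for_loss(n_required, loss_fraction):
    """Patients to enroll so that about n_required remain after losing loss_fraction."""
    return math.ceil(n_required / (1 - loss_fraction))

n_required = 100  # hypothetical N from a power calculation
print("screen/register:", inflate_for_loss(n_required, 0.10))   # 10% ineligible
print("randomize:", inflate_for_loss(n_required, 0.05))         # 5% inevaluable
```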

Sample Size/Power Calculation via Simulations for Hypothesis Testing

1. Generate data according to the study design
2. Compute the test statistics
3. Determine whether you reject H0 or not
4. Repeat steps 1-3
5. Useful tips
   • Set seed to initialize the random number generator
   • Check the distribution of the data to make sure they are accurately generated
   • Run the test under H0 to verify the level of significance
   • Do it for at least 1,000 trials. Precision of statistical power?
       sqrt((.8x.2)/1,000) = 0.013
       sqrt((.5x.5)/1,000) = 0.016
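The steps above can be sketched in Python for the earlier SBP example (σ = 20, true μ = 5, N = 100, one-sided α = 0.05). This is one possible implementation under assumed choices: a z-test with known σ, 2,000 simulated trials, and an arbitrary seed. It is not the course's own code.

```python
import math
import random
from statistics import NormalDist, fmean

def simulate_rejection_rate(mu, sigma=20.0, n=100, alpha=0.05,
                            n_trials=2000, seed=2012):
    """Fraction of simulated trials in which H0: mu <= 0 is rejected."""
    rng = random.Random(seed)                      # tip: set the seed
    z_crit = NormalDist().inv_cdf(1 - alpha)
    rejections = 0
    for _ in range(n_trials):                      # step 4: repeat steps 1-3
        sample = [rng.gauss(mu, sigma) for _ in range(n)]      # step 1
        z_stat = fmean(sample) / (sigma / math.sqrt(n))        # step 2
        if z_stat > z_crit:                                    # step 3
            rejections += 1
    return rejections / n_trials

power = simulate_rejection_rate(mu=5.0)    # under H1
type1 = simulate_rejection_rate(mu=0.0)    # tip: run under H0 to check the level
mc_se = math.sqrt(power * (1 - power) / 2000)
print(f"power = {power:.3f} (MC SE = {mc_se:.3f}), type I error = {type1:.3f}")
```

The simulated power should land near the 80% target and the rejection rate under H0 near 5%, each within a few Monte Carlo standard errors.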

Homework #7 (due 2/21)
Sample size calculation for comparing two binomial probabilities

In a randomized Phase II trial, patients are randomized to receive either the standard treatment or a new targeted treatment. The goal is to compare the response rate between the two treatments by testing the following hypothesis:
H0: ps = pT
H1: ps ≠ pT
Assume ps = 0.3, pT = 0.5, α=0.05, and β=0.1.

1. Calculate the sample size required assuming equal randomization between the two treatments. (Use STPLAN)

2. Applying the Bayesian response adaptive randomization, compute the required sample size using the following decision rules: At the end of trial, if Prob(ps > pT) > 0.95, conclude standard treatment is better. Otherwise, if Prob(ps < pT) > 0.95, conclude the new treatment is better. (The AR software can be downloaded from http://biostatistics.mdanderson.org/SoftwareDownload.)

3. As in 2, compute the sample size, but add an early stopping rule: at any given time during the study, if Prob(ps > pT) > 0.999, terminate the study and conclude the standard treatment is better. Otherwise, if Prob(ps < pT) > 0.999, terminate the study and conclude the new treatment is better.

4. Compare the maximum sample size, average sample size, type I error, statistical power, probability of early stopping, probability of patients randomized into each arm, and the average number of responses observed in the trial in 1, 2, and 3.


Homework #8 (due 2/21)
Sample size calculation for comparing survival endpoints in two groups

Instead of using the binary endpoint, we now assume that the anti-tumor activity is measured by a survival endpoint. Assume the 5-yr survival rate for recurrent head and neck cancer is about 30% for the standard treatment. Assume a new agent can increase the 5-yr survival rate to 50%. Please design a two-arm randomized study comparing the standard versus new treatments with a two-sided α = 5% and 90% power for testing equal hazard rate assuming exponential survival.

Compute the sample size needed (e.g., use STPLAN).

1. Assume instantaneous accrual and no censoring.

2. Assume instantaneous accrual with 5 years of f/u.

3. Compute the accrual rate and total sample size needed with 2 years of accrual and 3 years of additional follow-up, i.e. the total study duration is 5 years.

4. Please verify the result in 3. above by conducting simulation studies with at least 1,000 runs.

5. Compute the f/u time and total study duration required if the accrual time is 3 years with a rate of 5 patients per month.
