Post on 23-Dec-2015
Topics in Clinical Trials (6) - 2012
J. Jack Lee, Ph.D.
Department of Biostatistics
University of Texas M. D. Anderson Cancer Center
How many patients are needed in a clinical trial?
It depends on what you want to achieve.
As large as possible until it bankrupts your bank account?
As small as it takes to get the trial approved?
N=1? N=14? N=1000?
Adequate N is needed for proper statistical inference. Inadequate N may lead to inconclusive or wrong results. Unduly large N may not be feasible and can also be unethical. Clinical Trials should have sufficient statistical power to detect clinically meaningful differences between groups. Sample size should be considered early in the planning phase.
Examples
Pilot study (feasibility): N ≤ 18
Phase I (toxicity): 20 ≤ N ≤ 40
Phase II (efficacy): 30 ≤ N ≤ 100
Phase III (confirmatory): N > 100
Primary Prevention Trials: N > 10,000
e.g. BCPT (Tamoxifen): N=16,000 (13,388)
PHS (aspirin, β-carotene): N=22,071
Essence of Sample Size Calculation
Adequate N is needed for proper statistical inference.
  Central limit theorem, large sample approximation
  Control false positive (type I) and false negative (type II) errors
Inadequate N may lead to inconclusive or wrong results.
  71 RCTs failed to find significant results between groups
    67 had > 10% risk of missing a 25% tx improvement
    50 had > 10% risk of missing a 50% improvement
    (false negative results)
  Spurious findings can occur by chance alone when N is small (false positive results)
Unduly large N may not be feasible. It can also be unethical. Why?
Fundamental Points
Clinical trials should be designed with good operating characteristics to yield valid scientific inference.
Sufficient sample size is needed for estimation and/or hypothesis testing.
Sample size calculation should be based on the identification of
  A primary endpoint
  The objective to be achieved on the primary endpoint
Examples
In a single-arm Phase II trial
  Primary endpoint: Response rate
  Objective: find out whether the new treatment can achieve a target response rate
In a randomized controlled Phase III trial
  Primary endpoint: Overall survival
  Objective: find out whether the new treatment can yield a longer overall survival compared to the standard treatment
Sample Size Calculation Is Only An Estimate
Parameters used in the calculation are themselves estimates with a level of uncertainty.
The estimated tx effect may be based on a different population.
The estimated tx effect is often overly optimistic because it is based on highly selected pilot studies.
Pts eligibility criteria may be changed, which affects the sample population.
Rule of thumb: Be conservative
  May need a pilot study to refine the estimates
  Better to design a larger study with early stopping than to start with a smaller study and try to expand N or extend f/u during the trial.
Statistical Concepts
Estimation
  1-sample: estimating tx effect
  2-sample: estimating tx difference
  Methods for 1-sample binary endpoint, e.g., response rate
    Exact: e.g. Clopper-Pearson interval
    Asymptotic Gaussian approximation
      $\hat{p} \sim N(p, \mathrm{var}(\hat{p}))$
      $(1-\alpha) \times 100\%$ C.I. for $p$: $\hat{p} \pm z_{\alpha/2} \sqrt{\mathrm{var}(\hat{p})}$
Hypothesis Testing
Estimation
Let X = systolic blood pressure (SBP)
X ~ N(90, 10^2)
With sample size N, mean(X) ~ N(90, 100/N)
[Figure: Distribution of BP, N=1. x-axis: Blood Pressure (60 to 120); y-axis: Density]
[Figure: Distribution of Mean BP, N=4. x-axis: Blood Pressure (60 to 120); y-axis: Density]
For binary response, e.g. 3 out of 10 metastatic breast cancer patients responded to Taxol. What is the estimated response rate p*?
p* ~ N(p, p(1-p)/N) = N(0.3, 0.021)
Point estimate: p* = 0.30, SE(p*) = sqrt(0.3 x 0.7/10) = sqrt(0.021) = 0.145
95% CI for p: p* ± 1.96 SE(p*) = 0.30 ± 0.28 = (0.02, 0.58)

Sample size N needed to reach a given standard error, N = p(1-p)/SE^2 (rounded up):
Standard Error    Probability of Response
                  0.1    0.2    0.3    0.4    0.5
0.2               3      4      6      6      7
0.1               9      16     21     24     25
0.05              36     64     84     96     100
0.025             144    256    336    384    400
Suppose 30 out of 100 metastatic breast cancer patients responded to Taxol. What is the estimated p*?
Point estimate: p* = 0.30, SE(p*) = sqrt(0.3 x 0.7/100) = 0.046
95% CI for p: p* ± 1.96 SE(p*) = 0.30 ± 0.09 = (0.21, 0.39)
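The two Taxol examples can be checked with a few lines of Python. This is a minimal sketch of the asymptotic (Wald) interval only, not the exact Clopper-Pearson interval; the function name `wald_ci` is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def wald_ci(responders, n, level=0.95):
    """Point estimate, standard error, and Gaussian-approximation CI for a response rate."""
    p_hat = responders / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    # Two-sided critical value, about 1.96 for a 95% interval
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return p_hat, se, (p_hat - z * se, p_hat + z * se)


for x, n in [(3, 10), (30, 100)]:
    p_hat, se, (low, high) = wald_ci(x, n)
    print(f"{x}/{n}: p* = {p_hat:.2f}, SE = {se:.3f}, 95% CI = ({low:.2f}, {high:.2f})")
```

With N = 10 the lower limit sits close to 0, a reminder that the Gaussian approximation is rough for small samples.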
Sample Size Calculation Based on Estimation
SE = SD / sqrt(N)
Width of 95% CI = 1.96 x SE x 2
Compute N s.t. SE or width of CI is within a pre-specified precision
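For a binary endpoint SD = sqrt(p(1-p)), so the rule above can be inverted for N. The sketch below assumes that binomial SD; the helper names `n_for_se` and `n_for_ci_width` are illustrative, not from any package.

```python
from math import ceil
from statistics import NormalDist


def n_for_se(p, target_se):
    """Smallest N with sqrt(p(1-p)/N) <= target_se."""
    # Round before ceil so floating-point noise (e.g. 4.0000000001) does not add a patient
    return ceil(round(p * (1 - p) / target_se ** 2, 9))


def n_for_ci_width(p, width, level=0.95):
    """Smallest N whose Gaussian-approximation CI has total width <= width."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return ceil(round(p * (1 - p) * (2 * z / width) ** 2, 9))


# N needed for each target SE across response probabilities 0.1 to 0.5
for se in (0.2, 0.1, 0.05, 0.025):
    print(se, [n_for_se(p, se) for p in (0.1, 0.2, 0.3, 0.4, 0.5)])
```

Halving the target SE quadruples N, which is why tight precision becomes expensive quickly.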
Hypothesis Testing
Framework of hypothesis testing
                 Truth: H0            Truth: H1
Action: H0       (correct)            β
Action: H1       α                    (correct)

α: Type I error (level of significance)
β: Type II error (1 - β = Power)
Sample Size Calculation: Find N s.t. α and β are under control.
Typically, compute N for a given α to yield (1 - β) x 100% power.
For example, compute N for α = 0.05 to yield 80% power.
P-values
P-value = probability of obtaining data as extreme as or more extreme than the observed result when the null hypothesis is true.
Smaller p-values mean stronger evidence against H0.
Nothing sacred about p = 0.05. (p = 0.045 vs. p = 0.055)
Statistical Significance ≠ Clinical Significance
  Large samples: small differences may be significant
  Small samples: large differences may not be significant
The frequentist inference depends on sample space, i.e. the design.
Tools for Sample Size Calculation
STPLAN: http://biostatistics.mdanderson.org/SoftwareDownload/
NQuery
PASS
EaSt
Many web sites
Example: Let Y = reduction in SBP in an anti-hypertension trial
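Y ~ Normal(μ, σ^2), so mean(Y) ~ Normal(μ, σ^2/N); Ho: μ ≤ 0 vs. H1: μ > 0
If σ = 20 and α = 0.05, how big should N be to have 80% power for testing Ho vs. H1 if the true μ = 5?
$N = (Z_\alpha + Z_\beta)^2 / (\delta/\sigma)^2$, with $\delta = 5$ the difference to detect
N = 98.9 ≈ 100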
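The SBP example can be reproduced with the standard one-sample normal formula N = (Zα + Zβ)^2 / (δ/σ)^2. The sketch below assumes a known σ and a one-sided test; the function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal


def n_one_sample(delta, sigma, alpha=0.05, power=0.80):
    """N for a one-sided, one-sample z-test to detect a mean shift of delta."""
    z_alpha = Z.inv_cdf(1 - alpha)
    z_beta = Z.inv_cdf(power)
    return (z_alpha + z_beta) ** 2 / (delta / sigma) ** 2


def power_one_sample(n, delta, sigma, alpha=0.05):
    """Power of the same test at a given N."""
    return Z.cdf(delta * sqrt(n) / sigma - Z.inv_cdf(1 - alpha))


n = n_one_sample(delta=5, sigma=20)
print(f"N = {n:.1f}")
print(f"power at N = 100: {power_one_sample(100, 5, 20):.3f}")
```

Evaluating `power_one_sample` over a grid of N or δ values gives power curves of the kind plotted against sample size and against difference.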
[Figure (dstplan): densities under H0 and H1 with Power marked. x-axis: Blood Pressure Reduction (-5 to 15); y-axis: Density]
[Figure: Power vs. Sample Size. x-axis: N (0 to 200); y-axis: power (0 to 1)]
[Figure: Power vs. Difference. x-axis: Difference (0 to 8); y-axis: power (0 to 1)]
[Figure, courtesy of Don Berry: panels for n = 5, n = 10, n = 30, n = 60, n = 90; x-axis from 0 to 10]
Selecting Appropriate Statistical Methods for Categorical Data
Goal                                                  Analysis
Describe one group                                    Proportion
Compare one group to a hypothetical value             Chi-square test
Compare two unpaired groups                           Chi-square test*
Compare two paired groups                             McNemar's test
Compare three or more unmatched groups                Chi-square test*
Model the effect of multiple prognostic variables     Logistic regression
*: When sample size is small, use Fisher's exact test
Selecting Appropriate Statistical Methods for Gaussian Data
Goal                                                  Analysis
Describe one group                                    Mean, SD
Compare one group to a hypothetical value             One-sample t-test
Compare two unpaired groups                           Two-sample t-test
Compare paired data                                   Paired t-test
Compare three or more unmatched groups                One-way ANOVA
Selecting Appropriate Statistical Methods for Non-Gaussian Data
Goal                                                  Analysis
Describe one group                                    Median, Percentiles
Compare one group to a hypothetical value             Signed-rank test
Compare two unpaired groups                           Mann-Whitney test / Wilcoxon rank sum test
Compare paired data                                   Signed-rank test
Compare three or more unmatched groups                Kruskal-Wallis test
Selecting Appropriate Statistical Methods for Survival Data
Goal                                                  Analysis
Describe one group                                    Kaplan-Meier
Compare two unpaired groups                           log-rank test
Compare three or more unmatched groups                Cox regression
Model the effect of multiple prognostic factors       Cox regression
Sample Size Based on Hypothesis Testing for Continuous Outcome
One-sample test
  $N = (Z_\alpha + Z_\beta)^2 \sigma^2 / \delta^2$
Two-sample test
  $2N = 4 (Z_\alpha + Z_\beta)^2 \sigma^2 / \delta^2$
Note: Use $Z_\alpha$ for one-sided tests; replace it with $Z_{\alpha/2}$ for two-sided tests.
(δ/σ) is often called the effect size. Cohen defined ES = .2, .5, and .8 as small, medium, and large, respectively.
Cohen (1988): Statistical Power Analysis for the Behavioral Sciences
Total N by Effect Size for 2-Sample Test
               One-sided α = 5%           Two-sided α = 5%
Effect Size    80% power    90% power     80% power    90% power
.2             619          857           785          1,051
.5             99           138           126          169
.8             39           54            50           66
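The table entries follow from the two-sample formula 2N = 4 (Zα + Zβ)^2 / ES^2, rounded up. A minimal sketch, with a made-up function name, that regenerates them:

```python
from math import ceil
from statistics import NormalDist


def total_n_two_sample(es, alpha=0.05, power=0.80, two_sided=False):
    """Total N (both arms) for a two-sample z-test at effect size es = delta/sigma."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    return ceil(4 * (z_alpha + z_beta) ** 2 / es ** 2)


for es in (0.2, 0.5, 0.8):
    row = [total_n_two_sample(es, power=pw, two_sided=ts)
           for ts in (False, True) for pw in (0.80, 0.90)]
    print(es, row)
```

The sample size scales with 1/ES^2, so going from a medium (.5) to a small (.2) effect multiplies N by about 6.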
Sample Size Based on Hypothesis Testing for 2 Independent Binary Outcomes
$2N = 4 (Z_\alpha + Z_\beta)^2 \bar{p}(1-\bar{p}) / (p_C - p_I)^2$, where $\bar{p} = (p_C + p_I)/2$
How big should N be for comparing the response rates between doxorubicin (control) and FTI (new intervention) in pancreatic cancer patients?
Ho: pC = pI vs. H1: pC < pI
Estimated pC = 0.1, pI = 0.3, 1-sided α = 0.05, 1-β = .80
2N = 98.9, N ≈ 50 (dstplan)
Ex: Two-sample Binomial Probability
Annual event rate: pC = 0.4, pI = 0.3
Two-sided α = 0.05
90% power
Sample size: Total N = 956, each group = 478 (dstplan)
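Both binomial examples can be checked against the pooled-variance formula 2N = 4 (Zα + Zβ)^2 p̄(1-p̄) / (pC - pI)^2 with p̄ = (pC + pI)/2. This is a sketch of that textbook approximation only; STPLAN may use a different method internally, and the function name is illustrative.

```python
from statistics import NormalDist


def total_n_two_proportions(p_c, p_i, alpha=0.05, power=0.80, two_sided=False):
    """Total N (both arms) for comparing two independent proportions."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    p_bar = (p_c + p_i) / 2  # average response rate across arms
    return 4 * (z_alpha + z_beta) ** 2 * p_bar * (1 - p_bar) / (p_c - p_i) ** 2


# Doxorubicin vs. FTI example: one-sided alpha = 0.05, 80% power
print(round(total_n_two_proportions(0.1, 0.3), 1))
# Annual event rate example: two-sided alpha = 0.05, 90% power
print(round(total_n_two_proportions(0.4, 0.3, power=0.90, two_sided=True), 1))
```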
Sample Size Calculation for Survival Outcome – Instantaneous Entry – No Censoring
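Assume exponential survival
  $S(t) = \exp(-\lambda t)$
  $f(t) = \lambda \exp(-\lambda t)$
  $h(t) = \lambda$
• To test Ho: λc = λI vs. H1: λc ≠ λI
  $2N = 4 (Z_\alpha + Z_\beta)^2 / [\ln(\lambda_C / \lambda_I)]^2$ (with $Z_{\alpha/2}$ in place of $Z_\alpha$ for this two-sided test)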
Example: Exponential Survival
Assume λc = 0.30 and λI = 0.20. What will be the sample size needed to test the equality of hazard rates with two-sided α = 0.05 and 1-β = 0.90?
5-yr mortality rates are 0.7769 and 0.6321 for the control and intervention groups, respectively.
Median survival time = -ln(.5)/λ = 2.31 and 3.47, respectively
Plugging into the formula, N=128 or 2N=256.
Using the comparison of two proportions, 2N=412
Survival approach is more efficient.
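The quantities in this example can be recomputed directly from the exponential model. A minimal sketch assuming instantaneous entry, no censoring, and the log hazard ratio formula 2N = 4 (Zα/2 + Zβ)^2 / [ln(λC/λI)]^2; the function names are illustrative.

```python
from math import exp, log
from statistics import NormalDist


def total_n_exponential(lam_c, lam_i, alpha=0.05, power=0.90):
    """Total N (both arms), two-sided test of equal hazards, no censoring."""
    z = NormalDist().inv_cdf
    return 4 * (z(1 - alpha / 2) + z(power)) ** 2 / log(lam_c / lam_i) ** 2


def mortality(lam, t):
    """Probability of death by time t: 1 - S(t)."""
    return 1 - exp(-lam * t)


def median_survival(lam):
    """Time at which S(t) = 0.5."""
    return log(2) / lam


for lam in (0.30, 0.20):
    print(lam, round(mortality(lam, 5), 4), round(median_survival(lam), 2))
print(round(total_n_exponential(0.30, 0.20), 1))
```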
Sample Size Calculation for Survival Outcome – Instantaneous Entry – With Censoring
All patients entered at the same time and censored at time T.
$2N = 2 (Z_\alpha + Z_\beta)^2 [\phi(\lambda_C) + \phi(\lambda_I)] / (\lambda_I - \lambda_C)^2$
where $\phi(\lambda) = \lambda^2 / (1 - e^{-\lambda T})$
• In the previous example, if a 5-yr study is planned, then the required sample size is 2N = 376.
Sample Size Calculation for Survival Outcome – Staggered Entry
Assume participants are recruited uniformly over a period of $T_0$
The trial continues for T years ($T > T_0$)
With 3 years of accrual in a 5-yr study, 2N = 466 by using a similar formula as before but with
$\phi(\lambda) = \lambda^2 / (1 - [e^{-\lambda (T - T_0)} - e^{-\lambda T}] / (\lambda T_0))$
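The censored and staggered-entry results can be reproduced with one function that switches the variance term φ(λ). The sketch assumes exponential survival, a two-sided test, and the formula 2N = 2 (Zα/2 + Zβ)^2 [φ(λC) + φ(λI)] / (λI - λC)^2; the names `phi` and `total_n_survival` are illustrative.

```python
from math import exp
from statistics import NormalDist


def phi(lam, T, T0=None):
    """Variance term: lam^2 divided by the probability of observing the event."""
    if T0 is None:
        # Instantaneous entry, everyone censored at time T
        p_event = 1 - exp(-lam * T)
    else:
        # Uniform accrual over (0, T0), study ends at time T
        p_event = 1 - (exp(-lam * (T - T0)) - exp(-lam * T)) / (lam * T0)
    return lam ** 2 / p_event


def total_n_survival(lam_c, lam_i, T, T0=None, alpha=0.05, power=0.90):
    """Total N (both arms) for a two-sided comparison of exponential hazards."""
    z = NormalDist().inv_cdf
    zsum2 = (z(1 - alpha / 2) + z(power)) ** 2
    return 2 * zsum2 * (phi(lam_c, T, T0) + phi(lam_i, T, T0)) / (lam_i - lam_c) ** 2


print(round(total_n_survival(0.30, 0.20, T=5), 1))        # 5-yr study, instantaneous entry
print(round(total_n_survival(0.30, 0.20, T=5, T0=3), 1))  # 3 yrs accrual in a 5-yr study
```

Staggered entry needs more patients because late enrollees are followed for less time and contribute fewer events.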
The Key Quantity – Expected # of Events
Expected # of events is a function of sample size, hazard rate, recruitment rate, and censoring distribution.
Assume uniform accrual over $(0, T_0)$ and f/u over $(0, T)$
$E(D) = N \lambda^2 / \phi(\lambda) = N [1 - (e^{-\lambda (T - T_0)} - e^{-\lambda T}) / (\lambda T_0)]$
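The expected number of events per arm under uniform accrual can be computed as N times the probability that a patient's event is observed before the study ends. The sketch below assumes exponential survival with hazard λ and no loss to follow-up; the function names and the per-arm N of 233 are illustrative choices, not values from the slides.

```python
from math import exp


def prob_event(lam, T, T0):
    """P(event observed) with uniform accrual over (0, T0) and analysis at time T."""
    return 1 - (exp(-lam * (T - T0)) - exp(-lam * T)) / (lam * T0)


def expected_events(n, lam, T, T0):
    """Expected number of events among n patients in one arm."""
    return n * prob_event(lam, T, T0)


# Illustrative only: 233 patients per arm, 3 years of accrual, 5-year study
for lam in (0.30, 0.20):
    print(lam, round(prob_event(lam, 5, 3), 3), round(expected_events(233, lam, 5, 3), 1))
```

Because power is driven by the number of events rather than the number of patients, a lower hazard or shorter follow-up has to be offset by a larger N.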
Sample Size Based on CI estimation for 2 Independent Binary Outcomes
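How big should N be such that the width of a 100(1-α)% CI for pI - pC will not exceed $W_{CI}$?
$W_{CI} = 2 Z_\alpha \sqrt{\bar{p}(1-\bar{p})(2/N)}$
$N = 8 Z_\alpha^2 \bar{p}(1-\bar{p}) / W_{CI}^2$
In the hypothesis testing formula, choose pI - pC = $W_{CI}$ / 2
$N = 2 (Z_\alpha + Z_\beta)^2 \bar{p}(1-\bar{p}) / (W_{CI}/2)^2 = 8 (Z_\alpha + Z_\beta)^2 \bar{p}(1-\bar{p}) / W_{CI}^2$
The same as the H.T. formula with $Z_\beta = 0$
(N is per group; use $Z_{\alpha/2}$ in place of $Z_\alpha$ for a two-sided interval.)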
Sample Size Based on Hypothesis Testing for Paired Binary Outcomes
$N_p = [Z_\alpha \sqrt{f} + Z_\beta \sqrt{f - d^2}]^2 / d^2$
$N_p \approx (Z_\alpha + Z_\beta)^2 f / d^2$
where d = pI - pC and f is the proportion of pairs with discordant response
How big should N be for comparing the response rates between control and intervention given to each of the two eyes, respectively?
Ho: pC = pI vs. H1: pC < pI
Estimated pC = 0.2, pI = 0.4, α = 0.05, 1-β = .90, and the proportion of the pts with discordant response f = .5
Np = 132
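The paired-design numbers can be explored with the two formulas usually quoted for this setting: Np = [Zα sqrt(f) + Zβ sqrt(f - d^2)]^2 / d^2 and its simplification Np ≈ (Zα + Zβ)^2 f / d^2. This is a sketch under the assumption that those are the intended formulas. By the arithmetic, Np = 132 matches the simplified form with a two-sided critical value; that is an inference, not something the slide states. Function names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_pairs_full(d, f, z_alpha, z_beta):
    """Number of pairs, keeping the f - d^2 term under the alternative."""
    return (z_alpha * sqrt(f) + z_beta * sqrt(f - d ** 2)) ** 2 / d ** 2


def n_pairs_simplified(d, f, z_alpha, z_beta):
    """Number of pairs, conservative simplification of the formula above."""
    return (z_alpha + z_beta) ** 2 * f / d ** 2


z = NormalDist().inv_cdf
z_alpha = z(1 - 0.05 / 2)  # assumption: two-sided critical value
z_beta = z(0.90)

d, f = 0.4 - 0.2, 0.5
print(round(n_pairs_full(d, f, z_alpha, z_beta), 1))
print(ceil(n_pairs_simplified(d, f, z_alpha, z_beta)))
```

The simplified form is always at least as large as the full form because sqrt(f - d^2) <= sqrt(f).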
Sample Size for McNemar’s Test
Ho: pC = pI vs. H1: pC < pI
Estimated pC = 0.2, pI = 0.4, α = 0.05, 1-β = .90, and the proportion of the pts with discordant response f = .5
                        Intervention
                        Failure    Success
Control    Failure      .45        .35        .80
           Success      .15        .05        .20
                        .60        .40
Use Connor (1987), N = 104
Impact of Noncompliance
“Diluting” the treatment effect
Increase sample size
$N^* = N / (1 - R_o - R_I)^2$, where $R_o$, $R_I$ are drop-out and drop-in rates, respectively.
For $R_o + R_I$ = 10%, sample size increases by 23%; for 20%, by 56%.
• Raise questions about the study validity
• Difficult to make proper inference to the population
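The noncompliance adjustment N* = N / (1 - Ro - RI)^2 is easy to tabulate. A minimal sketch with an illustrative function name:

```python
def inflation_factor(drop_out, drop_in):
    """Multiplier applied to N to offset dilution from drop-outs and drop-ins."""
    return 1 / (1 - drop_out - drop_in) ** 2


for total in (0.10, 0.20, 0.30):
    # Only the sum Ro + RI matters, so split it evenly for illustration
    factor = inflation_factor(total / 2, total / 2)
    print(f"Ro + RI = {total:.0%}: N increases by {(factor - 1):.0%}")
```

The penalty is quadratic: doubling the noncompliance rate from 10% to 20% more than doubles the extra patients needed.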
Sample Size for Other Designs
Repeated measures
Equivalence trials
Historical control trials
Cluster randomization trials
(Reading Assignment)
Efficient targeted design trials
Premise (binary endpoint)
In the study population
  R+ : portion of marker positive (likely to respond)
  R- : portion of marker negative (less likely to respond)
  Proportion of R- : γ
Patients were randomized into control and exp groups
Response probability
  For R- pts: pc + δ0
  For R+ pts: pc + δ1
Relative Efficiency: n/nT (γ = 0.5)
                         N (Randomized)    N (Screened)
Case 0: δ0 = 0           4                 2
Case 1: δ0 = δ1/2        1.75              0.89
Gefitinib Trials
INTACT I & II Trials had 2,130 pts. Results were negative.
We want to study an EGFR inhibitor in high-risk oral IEN
  Only a fraction (1 - γ = .10) of subjects present the target
  Response rates are:
    std tx (e.g., retinoids): pc = .40
    EGFR inhibitor: w/ target is pc + δ1, w/o target is pc + δ0
Sample size needed for 90% power at 2-sided 5% significance level
Entries: N (Efficiency)
                              δ0 = 0                      δ0 = δ1/2
Design                        δ1 = .2       δ1 = .4       δ1 = .2       δ1 = .4
Untargeted Design             12,806        3,248         446           116
Targeted Design               138 (92.2)    34 (95.8)     138 (3.2)     34 (3.4)
Targeted Design (screened)    1,380 (9.2)   340 (9.6)     1,380 (0.3)   340 (0.3)
Adherence/Compliance Monitoring
Pill diary
Pill count
  Forget to bring in the bottle
  Dump the remaining drugs into the toilet
  Over-subscribe
  Dispense by weight: not precise
Special pill dispenser to monitor when the bottle is opened
Laboratory test of drug level in serum or urine
  Half-life of the drug
  Choosing cutoff value to declare (+) or (-)
Percent compliance
  % compliance = # of pills taken / # of pills prescribed
Dose intensity
  # of pills taken / # of pills that should have been taken per protocol
  Measure the actual amount of drug taken
Main Reasons for Noncompliance
Toxicity or side effects
Involving life style/behavior change
Complex or inconvenient interventions
Insufficient or lacking understanding of instructions
Change of mind, refusal
Lack of family support
Other Adjustments for Sample Size
Increase number of screened/registered patients to take the ineligibility into consideration
  10% ineligible, Total N = N/0.9
Increase number of randomized patients if not everyone is evaluable
  5% inevaluable, Total N = N/0.95
Drop out, loss to f/u
Be aware of informative censoring
Interim analysis, sample size re-estimation
  To be covered later in the course
Sample Size/Power Calculation via Simulations for Hypothesis Testing
1. Generate data according to the study design
2. Compute the test statistics
3. Determine whether you reject H0 or not
4. Repeat steps 1-3
5. Useful tips
• Set seed to initialize the random number generator
• Check the distribution of the data to make sure they are accurately generated
• Run the test under H0 to verify the level of significance
• Do it for at least 1,000 trials. Precision of statistical power?
  sqrt((.8x.2)/1,000) = 0.013
  sqrt((.5x.5)/1,000) = 0.016
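The simulation recipe can be sketched for the one-sample blood pressure example (σ = 20, one-sided α = 0.05, N = 100, true mean 5). The setup, a z-test with known σ, and all names below are assumptions chosen for illustration, not the course's own code.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulate_rejection_rate(mu, sigma=20.0, n=100, alpha=0.05, n_trials=2000, seed=2012):
    """Fraction of simulated trials that reject H0: mu <= 0 with a one-sided z-test."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    z_crit = NormalDist().inv_cdf(1 - alpha)
    rejections = 0
    for _ in range(n_trials):                                # step 4: repeat
        data = [rng.gauss(mu, sigma) for _ in range(n)]      # step 1: generate data
        z_stat = (sum(data) / n) / (sigma / sqrt(n))         # step 2: test statistic
        if z_stat > z_crit:                                  # step 3: reject H0?
            rejections += 1
    return rejections / n_trials


power = simulate_rejection_rate(mu=5.0)  # under H1: estimated power
level = simulate_rejection_rate(mu=0.0)  # under H0: estimated type I error
mc_se = sqrt(power * (1 - power) / 2000)  # Monte Carlo precision of the power estimate
print(f"power = {power:.3f} (MC SE {mc_se:.3f}), type I error = {level:.3f}")
```

Running the same function with mu = 0 is the check on the level of significance suggested in the tips; the estimate should sit near 0.05 up to Monte Carlo error.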
Homework #7 (due 2/21)
Sample size calculation for comparing two binomial probabilities
In a randomized Phase II trial, patients are randomized to receive either the standard treatment or a new targeted treatment. The goal is to compare the response rate between the two treatments by testing the following hypothesis
H0: ps = pT
H1: ps ≠ pT
Assume ps = 0.3, pT = 0.5, α = 0.05, and β = 0.1.
1. Calculate the sample size required assuming equal randomization between the two treatments. (Use STPLAN)
2. Applying the Bayesian response adaptive randomization, compute the required sample size using the following decision rules: At the end of trial, if Prob(ps > pT) > 0.95, conclude standard treatment is better. Otherwise, if Prob(ps < pT) > 0.95, conclude the new treatment is better. (The AR software can be downloaded from http://biostatistics.mdanderson.org/SoftwareDownload.)
3. As in 2, compute the sample size but add an early stopping rule: at any given time of the study, if Prob(ps > pT) > 0.999 is observed, terminate the study and conclude standard treatment is better. Otherwise, if Prob(ps < pT) > 0.999, terminate the study and conclude the new treatment is better.
4. Compare the maximum sample size, average sample size, type I error, statistical power, probability of early stopping, probability of patients randomized into each arm, and the average number of responses observed in the trial in 1, 2, and 3.
Homework #8 (due 2/21)
Sample size calculation for comparing survival endpoints in two groups
Instead of using the binary endpoint, we now assume that the anti-tumor activity is measured by a survival endpoint. Assume the 5-yr survival rate for recurrent head and neck cancer is about 30% for the standard treatment. Assume a new agent can increase the 5-yr survival rate to 50%. Please design a two-arm randomized study comparing the standard versus new treatments with a two-sided α = 5% and 90% power for testing equal hazard rate assuming exponential survival.
Compute the sample size needed (e.g., use STPLAN).
1. Assume instantaneous accrual and no censoring.
2. Assume instantaneous accrual with 5 years of f/u.
3. Compute the accrual rate and total sample size needed with 2 years of accrual and 3 years of additional follow-up, i.e. the total study duration is 5 years.
4. Please verify the result in 3. above by conducting simulation studies with at least 1,000 runs.
5. Compute the f/u time and total study duration required if the accrual time is 3 years with a rate of 5 patients per month.