
Sample Size/Power Calculation by Software/Online Calculators

May 24, 2018

Li Zhang, Professor
Department of Epidemiology and Biostatistics
Division of Hematology and Oncology, Department of Medicine
University of California, San Francisco

Topics

• R packages
• SAS Proc
• Online calculator: CTSI sample size calculator
• Online calculator for clinical trial: SWOG
• Software: G*Power

Power Analysis with R


https://www.statmethods.net/stats/power.html

Power/Sample size calculation for one or two proportions

• Power calculations for proportion tests (one sample)
– H0: p = p1 vs. Ha: p ≠ p1
– pwr.p.test(h, n, sig.level, power, alternative = c("two.sided", "less", "greater"))

• Power calculation for two proportions (same sample size)
– H0: p1 = p2 vs. Ha: p1 ≠ p2
– pwr.2p.test(h, n, sig.level, power, alternative = c("two.sided", "less", "greater"))

• Power calculation for two proportions (different sample sizes)
– H0: p1 = p2 vs. Ha: p1 ≠ p2
– pwr.2p2n.test(h, n1, n2, sig.level, power, alternative = c("two.sided", "less", "greater"))

• Effect size calculation
• R Demo
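The effect size h used by these functions is Cohen's arcsine-transformed difference between two proportions. The Python sketch below is an illustration, not the pwr source: it computes h and a normal-approximation power for the equal-n two-proportion test. The function names `cohen_h` and `power_two_prop` and the example inputs (0.5 vs. 0.1, 30 per group) are assumptions chosen for illustration.

```python
from math import asin, sqrt
from statistics import NormalDist

def cohen_h(p1, p2):
    """Cohen's effect size h: difference of arcsine-transformed proportions."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))

def power_two_prop(h, n, sig_level=0.05):
    """Approximate two-sided power for two proportions, n subjects per group.

    Uses the normal approximation on the arcsine scale (an assumption of
    this sketch; software may use a different method).
    """
    z = NormalDist()
    crit = z.inv_cdf(1 - sig_level / 2)   # two-sided critical value
    shift = abs(h) * sqrt(n / 2)          # standardized effect for equal n
    return z.cdf(shift - crit) + z.cdf(-shift - crit)

h = cohen_h(0.5, 0.1)
power = power_two_prop(h, n=30)
print(f"h = {h:.3f}, approximate power = {power:.3f}")
```

Because this is an approximation on the arcsine scale, the result need not match other software exactly.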

Power calculations for chi-squared tests

pwr.chisq.test(w = NULL, N = NULL, df = NULL, sig.level = 0.05, power = NULL)

• ES.w1(P0, P1):
– Effect size calculation in the chi-squared test for goodness of fit: w = sqrt(Σ (P1i − P0i)² / P0i), summed over the k cells
– Computes effect size w for two sets of k probabilities P0 (null hypothesis) and P1 (alternative hypothesis)

• ES.w2(P):
– Effect size calculation in the chi-squared test for association
– Computes effect size w for a two-way probability table P corresponding to the alternative hypothesis in the chi-squared test of association in two-way contingency tables
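The goodness-of-fit effect size w can be sketched in a few lines of Python. This is an illustrative re-implementation of Cohen's w, not the pwr code; the function name `cohen_w` and the two-cell example are assumptions.

```python
from math import sqrt

def cohen_w(p0, p1):
    """Cohen's w for a goodness-of-fit test.

    p0: cell probabilities under H0; p1: cell probabilities under Ha.
    """
    if len(p0) != len(p1):
        raise ValueError("p0 and p1 must have the same number of cells")
    # Sum of squared differences, each scaled by the null probability
    return sqrt(sum((alt - null) ** 2 / null for alt, null in zip(p1, p0)))

# Two cells: 50/50 under H0 versus 60/40 under Ha
w = cohen_w([0.5, 0.5], [0.6, 0.4])
print(round(w, 3))
```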

Power/Sample size calculation for one or two means

• Power calculations for t-tests of means (one sample, two samples and paired samples)
– One sample: H0: μ = μ1 vs. Ha: μ ≠ μ1
– Two samples or paired samples: H0: μ1 = μ2 vs. Ha: μ1 ≠ μ2
– pwr.t.test(n, d, sig.level, power, type = c("two.sample", "one.sample", "paired"), alternative = c("two.sided", "less", "greater"))

• Power calculations for two samples (different sizes) t-tests of means
– H0: μ1 = μ2 vs. Ha: μ1 ≠ μ2
– pwr.t2n.test(n1, n2, d, sig.level = 0.05, power, alternative = c("two.sided", "less", "greater"))

• R Demo
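The argument d is Cohen's standardized mean difference, d = (μ1 − μ2)/σ. As a rough cross-check on software output, the per-group sample size for a two-sample, two-sided test can be approximated by n ≈ 2((z_{1−α/2} + z_{power})/d)². The Python sketch below applies that normal approximation; it ignores the t-distribution correction, so it tends to come out slightly smaller than t-based results. The function names are illustrative.

```python
from math import ceil
from statistics import NormalDist

def cohen_d(mean_diff, sd):
    """Standardized mean difference."""
    return mean_diff / sd

def approx_n_per_group(d, sig_level=0.05, power=0.80):
    """Normal-approximation sample size per group (two-sample, two-sided)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - sig_level / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)

d = cohen_d(mean_diff=2, sd=2.8)
print(round(d, 3), approx_n_per_group(d))
print(approx_n_per_group(0.5))
```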

Power calculations for balanced one-way analysis of variance tests

• pwr.anova.test(k = NULL, n = NULL, f = NULL, sig.level = 0.05, power = NULL)
– k: number of groups
– n: number of observations (per group)
– f: effect size

Power calculations for the general linear model

• pwr.f2.test(u = NULL, v = NULL, f2 = NULL, sig.level = 0.05, power = NULL)
• u and v are the numerator and denominator degrees of freedom. We use f2 as the effect size measure.
• Used when evaluating the impact of a set of predictors on an outcome
• Used when evaluating the impact of one set of predictors above and beyond a second set of predictors (or covariates)
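For both use cases the effect size f2 can be derived from R² values. The Python sketch below assumes Cohen's standard definitions: f2 = R²/(1 − R²) for a single set of predictors, and f2 = (R²_AB − R²_A)/(1 − R²_AB) for a set B added to covariates A. The function names and the example R² values are hypothetical.

```python
def f2_from_r2(r2):
    """Cohen's f2 for a set of predictors explaining a proportion r2 of variance."""
    if not 0 <= r2 < 1:
        raise ValueError("r2 must be in [0, 1)")
    return r2 / (1 - r2)

def f2_incremental(r2_full, r2_reduced):
    """Cohen's f2 for predictors added on top of an existing covariate set."""
    if not 0 <= r2_reduced <= r2_full < 1:
        raise ValueError("need 0 <= r2_reduced <= r2_full < 1")
    return (r2_full - r2_reduced) / (1 - r2_full)

print(round(f2_from_r2(0.13), 3))            # one set of predictors
print(round(f2_incremental(0.30, 0.25), 3))  # added set beyond covariates
```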

Other R Packages for Sample Size Calculation

• powerSurvEpi: Power and Sample Size Calculation for Survival Analysis of Epidemiological Studies
• epiR: Sample size for cohort, case-control, and cross-sectional studies, under one- or two-stage cluster sampling
• kappaSize: Sample Size Estimation Functions for Studies of Interobserver Agreement
• powerMediation: Power/Sample size calculation for mediation analysis, simple linear regression, logistic regression, or longitudinal study
• power.roc.test {pROC}: Computes sample size, power, significance level or minimum AUC for ROC curves
• RNASeqPower: Sample Size for RNA-Seq and Similar Studies

Case-control study by library(epiR)

• A matched case-control study is to be carried out to quantify the association between exposure A and an outcome B.

– Assume the prevalence of exposure in controls is 0.60 and the correlation between case and control exposures for matched pairs (rho) is 0.20 (moderate).

– Assuming an equal number of cases and controls, how many subjects need to be enrolled into the study to detect an odds ratio of 3.0 with 0.80 power using a two-sided 0.05 test?

– epi.ccsize(OR = 3.0, p0 = 0.60, n = NA, power = 0.80, r = 1, rho = 0.2, design = 1, sided.test = 2, conf.level = 0.95, method = "matched", fleiss = FALSE)

– A total of 162 subjects need to be enrolled in the study: 81 cases and 81 controls.


Case-control study by library(epiR)

• How many cases and controls are required if we select three controls per case?

• epi.ccsize(OR = 3.0, p0 = 0.60, n = NA, power = 0.80, r = 3, rho = 0.2, design = 1, sided.test = 2, conf.level = 0.95, method = "matched", fleiss = FALSE)

• A total of 204 subjects need to be enrolled in the study: 51 cases and 153 controls.


kappaSize: Sample Size Estimation Functions for Studies of Interobserver Agreement

• library(kappaSize)
• Handles outcomes with 2 (binary) to 5 categories
• Confidence Interval Approach
– E.g., CI3Cats
• Calculation of the Lowest Expected Value
– E.g., FixedN4Cats
• Power-Based Approach
– E.g., PowerBinary

Computes sample size/power/minimum AUC for ROC curves

power.roc.test(...)

• One or two ROC curves test with roc objects:
– power.roc.test(roc1, roc2, sig.level = 0.05, power = NULL, alternative = c("two.sided", "one.sided"), reuse.auc = TRUE, method = c("delong", "bootstrap", "obuchowski"), ...)

• One ROC curve with a given AUC:
– power.roc.test(auc = NULL, ncontrols = NULL, ncases = NULL, sig.level = 0.05, power = NULL, kappa = 1, alternative = c("two.sided", "one.sided"), ...)

• Two ROC curves with the given parameters:
– power.roc.test(parslist, ncontrols = NULL, ncases = NULL, sig.level = 0.05, power = NULL, kappa = 1, alternative = c("two.sided", "one.sided"), ...)

RNASeqPower: Sample Size for RNA-Seq and Similar Studies

• rnapower(depth, n, n2 = n, cv, cv2 = cv, effect, alpha, power)
– depth: average depth of coverage for the transcript or gene of interest. Common values are 5-20; any numeric value > 0 is valid.
– n: sample size in group 1 (or both)
– n2: sample size in group 2
– cv: biological coefficient of variation in group 1 (or both)
– cv2: biological coefficient of variation in group 2
– effect: target effect size
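To show how depth, cv, and effect interact, the Python sketch below assumes the equal-group-size formula described for RNASeqPower, n = 2(z_{1−α/2} + z_{power})²(1/depth + cv²)/(ln effect)². It is an illustrative re-implementation, not the package code, and the function name `rnaseq_n_per_group` and the example inputs are assumptions; results from rnapower itself should be taken as authoritative.

```python
from math import log
from statistics import NormalDist

def rnaseq_n_per_group(depth, cv, effect, alpha=0.05, power=0.90):
    """Approximate samples per group for a two-group RNA-Seq comparison.

    depth: average coverage of the gene; cv: biological coefficient of
    variation; effect: target fold change (for example 2 for a doubling).
    """
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    variance_term = 1 / depth + cv ** 2   # counting noise + biological noise
    return 2 * z_sum ** 2 * variance_term / log(effect) ** 2

n = rnaseq_n_per_group(depth=20, cv=0.4, effect=2)
print(round(n, 2))
```

Under this formula, raising depth only shrinks the 1/depth term, so once depth is moderately large the biological variation cv dominates the required sample size.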

Comments about R packages

• Pros:
– Free
– A lot of resources for different tests/study designs
– Easy to generate a figure/table for different options of parameters, for example, sample size calculation for sequencing data

• Cons:
– Need to write code
– Sometimes hard to implement
– Reliability is uncertain

Power Analysis with SAS

• SAS
– PROC POWER
  • t tests, equivalence tests, and confidence intervals for means
  • tests, equivalence tests, and confidence intervals for binomial proportions
  • multiple regression
  • tests of correlation and partial correlation
  • one-way analysis of variance
  • rank tests for comparing two survival curves
  • logistic regression with binary response
  • Wilcoxon-Mann-Whitney (rank-sum) test
– PROC GLMPOWER: Compute Power and Sample Size for Repeated Measures

SAS Example: Calculate power for Pearson chi-squared tests

• Same sample size, two-sided test of proportions

proc power;
  twosamplefreq test=pchi
    groupproportions=(0.1 0.5)
    npergroup=30
    power=.;
run;

The POWER Procedure
Pearson Chi-square Test for Proportion Difference

Fixed Scenario Elements
Distribution                  Asymptotic normal
Method                        Normal approximation
Group 1 Proportion            0.1
Group 2 Proportion            0.5
Sample Size per Group         30
Number of Sides               2
Null Proportion Difference    0
Alpha                         0.05

Computed Power
Power                         0.943


SAS Example: Calculate power for Pearson chi-squared tests

• Different sample size, two-sided test of proportions

proc power;
  twosamplefreq test=pchi
    groupproportions=(0.1 0.5)
    groupns=25 | 50
    power=.;
run;

Group 1 Proportion            0.1
Group 2 Proportion            0.5
Group 1 Sample Size           25
Group 2 Sample Size           50
Number of Sides               2
Alpha                         0.05

Computed Power
Power                         0.966


SAS Example: Calculate power for t-tests

• Two independent samples, same size

proc power;
  twosamplemeans test=diff
    meandiff=2
    stddev=2.8
    npergroup=30
    power=.;
run;

The POWER Procedure
Two-Sample t Test for Mean Difference

Distribution                  Normal
Method                        Exact
Mean Difference               2
Standard Deviation            2.8
Sample Size per Group         30
Number of Sides               2
Null Difference               0
Alpha                         0.05

Computed Power
Power                         0.776


SAS Example: Calculate power for t-tests

• One sample

proc power;
  onesamplemeans test=t
    mean=2
    stddev=2.8
    ntotal=30
    power=.;
run;

The POWER Procedure
One-Sample t Test for Mean

Distribution                  Normal
Method                        Exact
Mean                          2
Standard Deviation            2.8
Total Sample Size             30
Number of Sides               2
Null Mean                     0
Alpha                         0.05

Computed Power
Power                         0.966


SAS Example: Calculate power for t-tests

• Paired samples

proc power;
  pairedmeans test=diff
    meandiff=2
    corr=0.5
    stddev=2.8
    npairs=30
    power=.;
run;

The POWER Procedure
Paired t Test for Mean Difference

Distribution                  Normal
Method                        Exact
Mean Difference               2
Standard Deviation            2.8
Correlation                   0.5
Number of Pairs               30
Number of Sides               2
Null Difference               0
Alpha                         0.05

Computed Power
Power                         0.966
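The paired example reports the same power (0.966) as the one-sample example, and a short calculation suggests why. Assuming stddev is the common standard deviation of each of the two measurements, the standard deviation of the paired differences is σ·sqrt(2(1 − ρ)), which equals σ when ρ = 0.5. The Python sketch below works through that arithmetic; the helper name is illustrative.

```python
from math import sqrt

def sd_of_paired_difference(sd, corr):
    """SD of (X - Y) when X and Y share the same SD and have correlation corr."""
    return sd * sqrt(2 * (1 - corr))

sd_diff = sd_of_paired_difference(sd=2.8, corr=0.5)
d_paired = 2 / sd_diff      # standardized effect for the paired design
d_one_sample = 2 / 2.8      # standardized effect in the one-sample example
print(sd_diff, d_paired, d_one_sample)
```

With a higher correlation the SD of the differences shrinks, so the same 30 pairs would yield more power.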


SAS Example: Calculate power for t-tests

• Two independent samples, different sizes

proc power;
  twosamplemeans test=diff
    meandiff=2
    stddev=2.8
    groupns=(20 40)
    power=.;
run;

Distribution                  Normal
Method                        Exact
Mean Difference               2
Standard Deviation            2.8
Group 1 Sample Size           20
Group 2 Sample Size           40
Number of Sides               2
Null Difference               0
Alpha                         0.05

Computed Power
Power                         0.727


UCSF CTSI Sample Size Calculators

• http://www.sample-size.net
– Can do most of the popular tests
– Can compare the mean of a continuous measurement in two samples while allowing for clustered sampling
  • A cluster randomized controlled trial is a type of randomized controlled trial in which groups of subjects (as opposed to individual subjects) are randomized.

Online Calculators: SWOG

https://stattools.crab.org

• If the primary objective is estimation rather than a hypothesis test, provide the precision of the estimate
– Example: The expected adherence rate is 80%, n = 50
– 95% CI is (66.3%, 90.0%)

• One arm binomial: H0: P = 0.1 vs. Ha: P ≠ 0.1

• One arm survival:
– Length of the accrual period
– Length of the follow-up period, i.e. the time from end of accrual to analysis
– H0: median OS = 6 months vs. Ha: median OS > 6 months
– H0: S(t) = 0.5 at 6 months vs. Ha: S(t) = 0.5 at 9 months
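The precision example (80% adherence, n = 50) can be approximated with an exact binomial interval. The Python sketch below assumes the slide's interval is a Clopper-Pearson interval for 40 successes out of 50, which is an assumption about the calculator's method; it finds the limits by bisection on the binomial distribution using only the standard library. All function names are illustrative.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def root_increasing(f, lo=0.0, hi=1.0, iters=60):
    """Bisection for the root of a function that increases from negative to positive."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, conf=0.95):
    """Exact two-sided confidence interval for a binomial proportion."""
    tail = (1 - conf) / 2
    # Lower limit: P(X >= x | p) = tail; upper limit: P(X <= x | p) = tail
    lower = 0.0 if x == 0 else root_increasing(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - tail)
    upper = 1.0 if x == n else root_increasing(
        lambda p: tail - binom_cdf(x, n, p))
    return lower, upper

lower, upper = clopper_pearson(40, 50)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```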

Online Calculators: SWOG (cont.)

• Two-arm binomial: H0: P1 = P2 vs. Ha: P1 ≠ P2
– P1 = 0.1 vs. P2 = 0.25

• Two-arm survival:
– Length of the accrual period
– Length of the follow-up period, i.e. the time from end of accrual to analysis
– H0: HR = 2 vs. Ha: HR ≠ 2 (median OS = 6 months for null, 12-month accrual and 12-month follow-up)


Two stage

• a1: If the number of successes after completing the first stage is < a1, we reject the alternative hypothesis that p > Pa.
• r1: If the number of successes after completing the first stage is > r1, we reject the null hypothesis that p < P0.
• a2: If the number of successes after completing the trial is < a2, then we reject the alternative hypothesis.
• r2: If the number of successes after completing the trial is > r2, then we reject the null hypothesis.


• Other options:
– Survival noninferiority
  • Competing risk: the hazard of the competing risk random variable
  • Hazard ratios between experimental and standard defining equivalence
  • Hazard ratio must be less than hazard ratio defining equivalence
– Expected Deaths
  • Make a table of expected death information
  • Provide expected deaths for a given time
  • Provide expected deaths for a time at which the expected proportion of deaths have occurred

Online Calculators: Simon's two stage design

• http://cancer.unc.edu/biostatistics/program/ivanova/SimonsTwoStageDesign.aspx
– One arm Phase II clinical trial
– Endpoint: Response rate or binary outcome
– Incorporate interim analysis for futility
– One-sided test
– Example: H0: P = 0.1 vs. Ha: P > 0.1

Simon's two-stage design (Simon, 1989) will be used. The null hypothesis that the true response rate is 0.1 will be tested against a one-sided alternative. In the first stage, 22 patients will be accrued. If there are 2 or fewer responses in these 22 patients, the study will be stopped. Otherwise, 18 additional patients will be accrued for a total of 40. The null hypothesis will be rejected if 8 or more responses are observed in 40 patients. This design yields a type I error rate of 0.04 and power of 80% when the true response rate is 0.25.
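The operating characteristics quoted for this design can be checked directly from binomial probabilities. The Python sketch below is an illustration, not the calculator's code: it computes the probability of rejecting H0 for a two-stage design with futility stopping, using the numbers in the example (22 patients in stage 1, stop if 2 or fewer responses, 40 in total, reject with 8 or more responses). The function names are assumptions.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_early_stop(p, n1, r1):
    """Probability of stopping for futility after stage 1 (responses <= r1)."""
    return sum(binom_pmf(x, n1, p) for x in range(r1 + 1))

def prob_reject_h0(p, n1, r1, n, r):
    """Probability of rejecting H0: continue if stage-1 responses > r1,
    then reject if total responses across both stages > r."""
    n2 = n - n1
    total = 0.0
    for x1 in range(r1 + 1, n1 + 1):
        needed = max(r + 1 - x1, 0)  # stage-2 responses still required
        p_stage2 = sum(binom_pmf(x2, n2, p) for x2 in range(needed, n2 + 1))
        total += binom_pmf(x1, n1, p) * p_stage2
    return total

n1, r1, n, r = 22, 2, 40, 7   # "8 or more responses" means total > 7
alpha = prob_reject_h0(0.10, n1, r1, n, r)
power = prob_reject_h0(0.25, n1, r1, n, r)
pet = prob_early_stop(0.10, n1, r1)
print(f"type I error = {alpha:.3f}, power = {power:.3f}, "
      f"P(early stop | H0) = {pet:.2f}")
```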

G*Power

• G*Power is a tool to compute statistical power analyses for many different t tests, F tests, χ2 tests, z tests and some exact tests.

• G*Power can also be used to compute effect sizes and to display graphically the results of power analyses.

• It is free and available for both Windows and Mac.

Exact: Proportion - inequality, two dependent groups (McNemar)

This procedure relates to tests of paired binary responses. Such data can be represented in a 2 × 2 table:

              Standard
Treatment     Yes     No
Yes           p11     p12       pt
No            p21     p22       1 − pt
              ps      1 − ps    1

where pij denotes the probability of the respective response. The probability pD of discordant pairs, that is, the probability of yes/no-response pairs, is given by pD = p12 + p21. The hypothesis of interest is that ps = pt, which is formally identical to the statement p12 = p21.

Using this fact, the null hypothesis states (in a ratio notation) that p12 is identical to p21, and the alternative hypothesis states that p12 and p21 are different:

H0: p12/p21 = 1
H1: p12/p21 ≠ 1.

In the context of the McNemar test the term odds ratio (OR) denotes the ratio p12/p21 that is used in the formulation of H0 and H1.

5.1 Effect size index

The odds ratio p12/p21 is used to specify the effect size. The odds ratio must lie inside the interval [10^-6, 10^6]. An odds ratio of 1 corresponds to a null effect. Therefore this value must not be used in a priori analyses.

In addition to the odds ratio, the proportion of discordant pairs, i.e. pD, must be given in the input parameter field called Prop discordant pairs. The values for this proportion must lie inside the interval [ε, 1 − ε], with ε = 10^-6.

If pD and d = p12 − p21 are given, then p12 = (pD + d)/2 and p21 = (pD − d)/2, so the odds ratio may be calculated as: OR = (pD + d)/(pD − d).

5.2 Options

Press the Options button in the main window to select one of the following options.

5.2.1 Alpha balancing in two-sided tests

The binomial distribution is discrete. It is therefore not normally possible to arrive at the exact nominal α-level. For two-sided tests this leads to the problem of how to "distribute" α to the two sides. G*Power offers the three options listed here, the first option being selected by default:

1. Assign α/2 to both sides: Both sides are handled independently in exactly the same way as in a one-sided test. The only difference is that α/2 is used instead of α. Of the three options offered by G*Power, this one leads to the greatest deviation from the actual α (in post hoc analyses).

2. Assign to minor tail α/2, then rest to major tail (α2 = α/2, α1 = α − α2): First α/2 is applied to the side of the central distribution that is farther away from the noncentral distribution (minor tail). The criterion used on the other side is then α − α1, where α1 is the actual α found on the minor side. Since α1 ≤ α/2 one can conclude that (in post hoc analyses) the sum of the actual values α1 + α2 is in general closer to the nominal α-level than it would be if α/2 were assigned to both sides (see Option 1).

3. Assign α/2 to both sides, then increase to minimize the difference of α1 + α2 to α: The first step is exactly the same as in Option 1. Then, in the second step, the critical values on both sides of the distribution are increased (using the lower of the two potential incremental α-values) until the sum of both actual α values is as close as possible to the nominal α.

5.2.2 Computation

You may choose between an exact procedure and a faster approximation (see implementation notes for details):

1. Exact (unconditional) power if N < x. The computation time of the exact procedure increases much faster with sample size N than that of the approximation. Given that both procedures usually produce very similar results for large sample sizes, a threshold value x for N can be specified which determines the transition between both procedures. The exact procedure is used if N < x; the approximation is used otherwise.
Note: G*Power does not show distribution plots for exact computations.

2. Faster approximation (assumes number of discordant pairs to be constant). Choosing this option instructs G*Power to always use the approximation.

5.3 Examples

As an example we replicate the computations in O'Brien (2002, p. 161-163). The assumed table is:

              Standard
Treatment     Yes     No
Yes           .54     .08     .62
No            .32     .06     .38
              .86     .14     1

In this table the proportion of discordant pairs is pD = .32 + .08 = 0.4 and the odds ratio OR = p12/p21 = 0.08/.32 = 0.25. We want to compute the exact power for a one-sided test. The sample size N, that is, the number of pairs, is 50 and α = 0.05.

• Select
Type of power analysis: Post hoc

• Options
Computation: Exact

• Input
Tail(s): One
Odds ratio: 0.25
α err prob: 0.05
Total sample size: 50
Prop discordant pairs: 0.4






• Select
Type of power analysis: Post hoc

• Options
Computation: Exact

• Input
Tail(s): Two
Odds ratio: 0.25
α err prob: 0.05
Total sample size: 50
Prop discordant pairs: 0.4

• Output
Power (1-β err prob): 0.80
Actual α: 0.04
Proportion p12: 0.08
Proportion p21: 0.32
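The two inputs G*Power asks for here, the odds ratio and the proportion of discordant pairs, follow directly from the off-diagonal cells of the 2 × 2 table. The Python sketch below derives them for the example table (p12 = 0.08, p21 = 0.32) and also recovers the odds ratio from d = p12 − p21 and pD via OR = (pD + d)/(pD − d). The function names are illustrative; the exact power computation itself is left to G*Power.

```python
def mcnemar_inputs(p12, p21):
    """Return (odds ratio, proportion of discordant pairs) from the
    off-diagonal cell probabilities of a paired 2 x 2 table."""
    if p21 <= 0:
        raise ValueError("p21 must be positive")
    return p12 / p21, p12 + p21

def odds_ratio_from_difference(d, p_d):
    """Odds ratio from d = p12 - p21 and pD = p12 + p21."""
    # p12 = (pD + d) / 2 and p21 = (pD - d) / 2, so the halves cancel
    return (p_d + d) / (p_d - d)

odds_ratio, p_discordant = mcnemar_inputs(p12=0.08, p21=0.32)
print(round(odds_ratio, 3), round(p_discordant, 3))
print(round(odds_ratio_from_difference(0.08 - 0.32, p_discordant), 3))
```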

F test: Fixed effects One-Way ANOVA

The fixed effects one-way ANOVA tests whether there are any differences between the means μi of k ≥ 2 normally distributed random variables with equal variance σ². The random variables represent measurements of a variable X in k fixed populations. The one-way ANOVA can be viewed as an extension of the two group t test for a difference of means to more than two groups.

The null hypothesis is that all k means are identical, H0: μ1 = μ2 = ... = μk. The alternative hypothesis states that at least two of the k means differ, H1: μi ≠ μj for at least one pair i, j with 1 ≤ i, j ≤ k.

10.1 Effect size index

The effect size f is defined as: f = σm/σ. In this equation σm is the standard deviation of the group means μi and σ the common standard deviation within each of the k groups. The total variance is then σt² = σm² + σ². A different but equivalent way to specify the effect size is in terms of η², which is defined as η² = σm²/σt². That is, η² is the ratio between the between-groups variance σm² and the total variance σt² and can be interpreted as "proportion of variance explained by group membership". The relationship between η² and f is: η² = f²/(1 + f²) or solved for f: f = sqrt(η²/(1 − η²)).

Cohen (1969, p. 348) defines the following effect size conventions:

• small f = 0.10
• medium f = 0.25
• large f = 0.40

If the mean μi and size ni of all k groups are known then the standard deviation σm can be calculated in the following way:

\[ \bar{\mu} = \sum_{i=1}^{k} w_i \mu_i \quad \text{(grand mean)}, \]

\[ \sigma_m = \sqrt{\sum_{i=1}^{k} w_i (\mu_i - \bar{\mu})^2}, \]

where wi = ni/(n1 + n2 + ··· + nk) stands for the relative size of group i.
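The weighted-mean calculation above can be sketched in Python as follows. The function name `anova_effect_size` and the three-group example (means 10, 12, 14; 20 subjects each; within-group SD of 4) are hypothetical values chosen only to illustrate the arithmetic.

```python
from math import sqrt

def anova_effect_size(means, sizes, sd_within):
    """Return (f, eta squared) from group means, group sizes, and the
    common within-group standard deviation."""
    total = sum(sizes)
    weights = [n / total for n in sizes]                      # w_i = n_i / N
    grand_mean = sum(w * m for w, m in zip(weights, means))
    sigma_m = sqrt(sum(w * (m - grand_mean) ** 2
                       for w, m in zip(weights, means)))      # SD of group means
    f = sigma_m / sd_within
    eta_sq = f ** 2 / (1 + f ** 2)
    return f, eta_sq

f, eta_sq = anova_effect_size(means=[10, 12, 14], sizes=[20, 20, 20], sd_within=4)
print(round(f, 3), round(eta_sq, 3))
```

By Cohen's conventions listed above, this hypothetical f of about 0.41 would count as a large effect.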

Pressing the Determine button to the left of the effect size label opens the effect size drawer. You can use this drawer to calculate the effect size f from variances, from η² or from the group means and group sizes. The drawer essentially contains two different dialogs and you can use the Select procedure selection field to choose one of them.

10.1.1 Effect size from means

In this dialog (see left side of Fig. 11) you normally start by setting the number of groups. G*Power then provides you with a mean and group size table of appropriate size. Insert the standard deviation σ common to all groups in the SD σ within each group field. Then you need to specify the mean μi and size ni for each group. If all group sizes are equal then you may insert the common group size in the input field to the right of the Equal n button. Clicking on this button fills the size column of the table with the chosen value.

Figure 11: Effect size dialogs to calculate f

Clicking on the Calculate button provides a preview of the effect size that results from your inputs. If you click on the Calculate and transfer to main window button then G*Power calculates the effect size and transfers the result into the effect size field in the main window. If the number of groups or the total sample size given in the effect size drawer differ from the corresponding values in the main window, you will be asked whether you want to adjust the values in the main window to the ones in the effect size drawer.

Example: We compare 10 groups, and we have reason to expect a "medium" effect size (f = .25). How many subjects do we need in a test with α = 0.05 to achieve a power of 0.95?

• Select
Type of power analysis: A priori

• Input
Effect size f: 0.25
α err prob: 0.05
Power (1-β err prob): 0.95
Number of groups: 10

• Output
Noncentrality parameter λ: 24.375000
Critical F: 1.904538
Numerator df: 9
Denominator df: 380
Total sample size: 390
Actual Power: 0.952363
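Several of the reported output values can be reproduced by hand, which is a useful sanity check on the inputs. The Python sketch below assumes the usual one-way ANOVA relations λ = f²·N, numerator df = k − 1, and denominator df = N − k; the function name is illustrative, and the critical F and actual power are not recomputed here.

```python
def anova_design_summary(f, total_n, k):
    """Noncentrality parameter and degrees of freedom for a one-way ANOVA."""
    return {
        "lambda": f ** 2 * total_n,      # noncentrality parameter
        "numerator_df": k - 1,           # between-groups df
        "denominator_df": total_n - k,   # within-groups df
        "n_per_group": total_n / k,
    }

summary = anova_design_summary(f=0.25, total_n=390, k=10)
print(summary)
```

With 390 subjects in 10 groups, the design corresponds to 39 subjects per group.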

Questions?
