Why Power Analysis?
•Research is expensive, so we would not want to conduct an experiment with far too
1. few experimental units (EUs): the project won’t detect important differences that do exist, so it is not really worth doing
2. many experimental units (EUs): the project will be unnecessarily expensive
•A power analysis is a typical granting agency requirement
A Simple Experimental Design
•Effect of diet on blood pressure (mmHg) in rats
•Consider a Completely Randomized Design (CRD)
• 12 rats randomly assigned to one of two different diets
• Trt 1: DASH diet, n=6
• Trt 2: Standard diet, n=6
• Investigator expects higher mean blood pressure (BP) at the end of 12 weeks when under Trt 2
• Is n=6 enough to detect this difference?
Statistical Analysis
• Two competing hypotheses (i.e., a one-tailed test, for now):
Ho: μ1 = μ2
H1: μ1 < μ2
• The basis for choosing between the two is the degree of evidence against the null hypothesis. We use the P-value relative to a declared significance level α
• P ≤ α → reject Ho and conclude mean BP is larger in Trt 2
• P > α → fail to reject Ho; not enough evidence to conclude H1
• There are two possible incorrect conclusions based on this approach to inference
Type I and Type II errors
What the data indicate (columns) versus the true, unknown state (rows):
True state | Fail to reject Ho (P > α)       | Reject Ho (P ≤ α)
Ho         | No error                        | Type I error (probability = α)
H1         | Type II error (probability = β) | No error
So is n = 6 rats large enough?
•Rephrase: Do we have enough statistical power?
•Need to “know” two things:
1. How large is the true mean difference (δ = μ2 − μ1)?
a) What do you anticipate and/or want to detect?
b) What would be economically/practically important?
2. How much variability (σ) is there between rats within a group?
• Sometimes prior information is available from a pilot study or previously published studies
• Otherwise we need to make an educated “guess”
• Always round up to be a little conservative
One way to elicit values for σ
•Empirical rule: consider the range of responses to be approximately 4σ (R ≈ 4σ)
•Question to client: What would be the likely range (max − min) of responses for rats within the same trt?
•Suppose the answer was 60 mmHg: R = 60 → σ = 60/4 = 15 mmHg.
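Suppose researchers also believe that δ ≥ 20 mmHg is important
Two competing hypotheses
• Under Ho: $\bar{y}_2 - \bar{y}_1 \sim N\left(0, \dfrac{2\sigma^2}{n}\right)$
• Under H1: $\bar{y}_2 - \bar{y}_1 \sim N\left(\delta, \dfrac{2\sigma^2}{n}\right)$
Conduct a one-tailed z-test for a given α:
$z = \dfrac{\bar{y}_2 - \bar{y}_1}{\sqrt{2\sigma^2/n}} \ge z_\alpha$
Reject Ho if $\bar{y}_2 - \bar{y}_1 \ge z_\alpha\sqrt{2\sigma^2/n}$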
Currently assuming the data are Normal.
Sole difference is in the mean of the distribution.
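A minimal sketch of the z-test power calculation above, in Python using only the standard library. `z_test_power` is a hypothetical helper name. Because it treats σ as known, it is somewhat optimistic compared with the exact t-test power that SAS reports for the same design (about 0.75 here versus 0.693 for n = 6).

```python
from statistics import NormalDist

def z_test_power(delta, sigma, n, alpha=0.05):
    """Power of the one-tailed two-sample z-test (Ho: mu1 = mu2 vs H1: mu1 < mu2)
    with known sigma and n units per group, when the true difference is delta."""
    std_normal = NormalDist()
    se = sigma * (2.0 / n) ** 0.5              # SD of ybar2 - ybar1: sqrt(2 sigma^2 / n)
    z_alpha = std_normal.inv_cdf(1.0 - alpha)  # upper-alpha critical value
    # Under H1, z = (ybar2 - ybar1)/se ~ N(delta/se, 1); power = P(z >= z_alpha)
    return 1.0 - std_normal.cdf(z_alpha - delta / se)

for n in (6, 8, 10):
    print(n, round(z_test_power(delta=20, sigma=15, n=n), 3))
```

The only change between Ho and H1 is the shift of the mean by δ/SE. Power therefore grows with δ and n and shrinks with σ.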
More reasonable statistical test
• t-test
• Because you likely won’t be able to assume σ² is known
• One-sided: Reject Ho if $t = \dfrac{\bar{y}_2 - \bar{y}_1}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \ge t_{\alpha,\,df}$
• Two-sided (H1: μ1 ≠ μ2): Reject Ho if $|t| \ge t_{\alpha/2,\,df}$
• Use of the t distribution results in a more complicated distribution under the alternative hypothesis (non-central t)
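Using SAS for power analysis
proc power;
  /* one-sided two-sample t-test */
  twosamplemeans alpha=.05 nulldiff=0 sides=1
    meandiff=20 npergroup=6 stddev=15
    power=.;   /* missing value (.) marks the quantity to solve for */
run;
or
proc power;
  /* overall one-way ANOVA F-test; with two groups this is equivalent to a two-sided t-test */
  onewayanova alpha=.05 test=overall
    groupmeans=(0 20) npergroup=6 stddev=15
    power=.;
run;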
SAS Output
Two-sample t Test for Mean Difference
Fixed Scenario Elements
Distribution             Normal
Method                   Exact
Number of Sides          1
Null Difference          0
Alpha                    0.05
Mean Difference          20
Standard Deviation       15
Sample Size Per Group    6
Computed Power
Power                    0.693
SAS Output
Overall F Test for One-Way ANOVA
Fixed Scenario Elements
Method                   Exact
Alpha                    0.05
Group Means              0 20
Standard Deviation       15
Sample Size Per Group    6
Computed Power
Power                    0.550
We typically want power of at least 80%, so more rats would be desirable
Using SAS for sample size
proc power;
  twosamplemeans alpha=.05 nulldiff=0 sides=1
    meandiff=20 npergroup=. stddev=15   /* npergroup=. : solve for n per group */
    power=.80;
run;
or
proc power;
  onewayanova alpha=.05 test=overall
    groupmeans=(0 20) npergroup=. stddev=15
    power=.80;
run;
SAS Output
Two-sample t Test for Mean Difference
Fixed Scenario Elements
Distribution             Normal
Method                   Exact
Number of Sides          1
Null Difference          0
Alpha                    0.05
Mean Difference          20
Standard Deviation       15
Nominal Power            0.8
Computed N Per Group
Actual Power    N Per Group
0.813           8
SAS Output
Overall F Test for One-Way ANOVA
Fixed Scenario Elements
Method                   Exact
Alpha                    0.05
Group Means              0 20
Standard Deviation       15
Nominal Power            0.8
Computed N Per Group
Actual Power    N Per Group
0.805           10
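A quick hand check on these sample sizes uses the normal approximation n ≈ 2σ²(z_α + z_β)²/δ² per group, with z_{α/2} in place of z_α for the two-sided test. This is a minimal Python sketch; `z_sample_size` is a hypothetical helper name. It treats σ as known and ignores rejections in the wrong tail for the two-sided case, so it should land slightly below the exact SAS answers: it gives 7 and 9 per group versus 8 and 10.

```python
import math
from statistics import NormalDist

def z_sample_size(delta, sigma, power=0.80, alpha=0.05, sides=1):
    """Per-group n for a two-sample z-test (sigma treated as known), rounded up."""
    z = NormalDist().inv_cdf
    z_alpha = z(1.0 - alpha / sides)   # z_alpha (one-sided) or z_{alpha/2} (two-sided)
    z_beta = z(power)                  # z_beta, where power = 1 - beta
    n = 2.0 * sigma ** 2 * (z_alpha + z_beta) ** 2 / delta ** 2
    return math.ceil(n)                # round up to whole experimental units

print(z_sample_size(20, 15, sides=1))  # one-sided test
print(z_sample_size(20, 15, sides=2))  # two-sided test (equivalent to the two-group F-test)
```

The gap of about one unit per group is the price of estimating σ² with only 2(n − 1) error degrees of freedom. The exact t and F calculations account for this.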
Generating Power Curve I
proc power;
  twosamplemeans alpha=.05 nulldiff=0 sides=1
    meandiff=20 stddev=15 power=.
    npergroup=3 to 20 by 1;   /* power for each n from 3 to 20 */
  plot interpol=join yopts=(ref=0.80);   /* reference line at power = 0.80 */
run;
[Figure: Power Curve for one-sided t test; Power (0.3 to 1.0) vs. Sample Size Per Group, with a reference line at 0.80]
Generating Power Curve II
proc power;
  twosamplemeans alpha=.05 nulldiff=0 sides=1
    meandiff=10 to 30 by 1   /* vary the true mean difference */
    stddev=15 npergroup=6 power=.;
  plot x=effect interpol=join yopts=(ref=0.80);   /* effect size on the x-axis */
run;
Determining sample size for a desired margin of error
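•Confidence interval for μ2 − μ1:
$(\bar{y}_2 - \bar{y}_1) \pm t_{\alpha/2,\,df}\sqrt{\dfrac{s_1^2}{n} + \dfrac{s_2^2}{n}}$, where the term after ± is the margin of error
•Given guesstimates for the variances, one can set the margin of error equal to the desired amount and solve for n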
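Solving the margin-of-error expression for a common per-group size n gives the formula below. It assumes equal group sizes and pooled degrees of freedom df = 2(n − 1). Because df depends on n, one practical route is to start with $z_{\alpha/2}$ in place of $t_{\alpha/2,\,df}$, compute n, update df, and repeat until n stops changing. Round up at the end.

```latex
E = t_{\alpha/2,\,df}\sqrt{\frac{s_1^2}{n} + \frac{s_2^2}{n}}
\quad\Longrightarrow\quad
n = \frac{t_{\alpha/2,\,df}^{2}\,\left(s_1^2 + s_2^2\right)}{E^{2}},
\qquad df = 2(n-1)
```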
What if more than two trts?
•Example: In a study of vitamin supplementation, pigs are assigned to each of 5 treatment groups and weight gains over a specified time period are to be recorded.
• Researchers anticipate mean responses to be 3.9, 4.1, 4.2, 4.3 and 4.5 kg for the five treatments, respectively
•Based on previous experience, they anticipate a within-treatment variance of about 0.30 kg²
•They want to know if n=4 animals per treatment would provide sufficient power for the ANOVA F-test.
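Linear model written two ways
1) Cell means model: $Y_{ij} = \mu_i + e_{ij}$
2) Factor level effects model: $Y_{ij} = \mu + \alpha_i + e_{ij}$, with $\mu = \frac{1}{r}\sum_{i=1}^{r}\mu_i$ and $\alpha_i = \mu_i - \mu$, so that $\sum_{i=1}^{r}\alpha_i = 0$ (i.e., sum-to-zero constraints)
where $i = 1,\dots,r = 5$; $j = 1,\dots,n = 4$; $e_{ij} \sim \mathrm{NIID}(0, \sigma^2)$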
One-way ANOVA table
Source     | Df     | SS    | MS    | EMS
Treatment  | r−1    | SSTrt | MSTrt | $\sigma^2 + \dfrac{n\sum_{i=1}^{r}\alpha_i^2}{r-1}$
Error      | r(n−1) | SSE   | MSE   | $\sigma^2$
ANOVA F-test (two equivalent specifications):
1) Ho: μ1 = μ2 = μ3 = μ4 = μ5 versus H1: at least one μi ≠ μi'
2) Ho: all αi = 0 versus H1: at least one αi ≠ 0
Note: if Ho is true then both EMS = σ², so that F = MSTrt/MSE ~ $F_{r-1,\,r(n-1)}$, a central F-distribution
Power determination for F-test
• Under H1 (at least one μi ≠ μi', or equivalently at least one αi ≠ 0):
F = MSTrt/MSE ~ $F_{r-1,\,r(n-1),\,\phi}$, a non-central F-distribution (if φ > 0), where
$\phi = \dfrac{n\sum_{i=1}^{r}\alpha_i^2}{\sigma^2}$ is the non-centrality parameter
• Pig example, with $\bar{\mu} = 4.2$:
μ1, μ2, μ3, μ4, μ5 = 3.9, 4.1, 4.2, 4.3, 4.5 → α1, α2, α3, α4, α5 = −0.3, −0.1, 0.0, 0.1, 0.3
“Corrected sum of squared means” (CSSM) = (−0.3)² + (−0.1)² + (0.0)² + (0.1)² + (0.3)² = 0.20 for this example
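Plugging the pig example into the non-centrality formula (n = 4, σ² = 0.30, r = 5) gives, as a worked check:

```latex
\phi = \frac{n\sum_{i=1}^{r}\alpha_i^2}{\sigma^2} = \frac{4 \times 0.20}{0.30} \approx 2.67,
\qquad F \sim F_{r-1,\,r(n-1),\,\phi} = F_{4,\,15,\,2.67}
```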
SAS Code
proc power;
  onewayanova alpha=.05 test=overall
    groupmeans=(3.9 4.1 4.2 4.3 4.5)
    npergroup=4 stddev=0.5477   /* 0.5477 is the square root of 0.30 */
    power=.;
run;
SAS Output
Overall F Test for One-Way ANOVA
Fixed Scenario Elements
Method                   Exact
Alpha                    0.05
Group Means              3.9 4.1 4.2 4.3 4.5
Standard Deviation       0.5477
Sample Size Per Group    4
Computed Power
Power                    0.171
Badly underpowered: as designed, this study would be a waste of time and money to run!
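SAS computes this power from the non-central F distribution. As a cross-check outside SAS, here is a minimal Python sketch using only the standard library. It evaluates the non-central F CDF as a Poisson mixture of central F CDFs and computes the regularized incomplete beta by the standard continued-fraction method (as in Numerical Recipes). All function names are hypothetical. The numerics are only intended for moderate degrees of freedom and non-centrality values like those in these examples; this is not a replacement for PROC POWER.

```python
import math

def _betacf(a, b, x, max_iter=500, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd continued-fraction coefficients for step m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def noncentral_f_cdf(x, d1, d2, lam, tol=1e-12, max_terms=5000):
    """P(F <= x) for F ~ F(d1, d2, lam): Poisson(lam/2) mixture of I_y(d1/2 + j, d2/2)."""
    if x <= 0.0:
        return 0.0
    y = d1 * x / (d1 * x + d2)
    half = lam / 2.0
    weight, total, used = math.exp(-half), 0.0, 0.0
    for j in range(max_terms):
        if j > 0:
            weight *= half / j            # Poisson weight for term j
        total += weight * reg_inc_beta(d1 / 2.0 + j, d2 / 2.0, y)
        used += weight
        if j > half and 1.0 - used < tol:
            break
    return total

def f_critical(alpha, d1, d2):
    """Upper-alpha critical value of the central F(d1, d2), found by bisection."""
    lo, hi = 0.0, 1.0
    while noncentral_f_cdf(hi, d1, d2, 0.0) < 1.0 - alpha:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if noncentral_f_cdf(mid, d1, d2, 0.0) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def anova_power(means, n, sigma2, alpha=0.05):
    """Power of the overall one-way ANOVA F-test with n units per group."""
    r = len(means)
    grand = sum(means) / r
    lam = n * sum((m - grand) ** 2 for m in means) / sigma2   # phi = n * CSSM / sigma^2
    d1, d2 = r - 1, r * (n - 1)
    return 1.0 - noncentral_f_cdf(f_critical(alpha, d1, d2), d1, d2, lam)

def smallest_n(means, sigma2, target=0.80, alpha=0.05, n_max=200):
    """Smallest per-group n whose power reaches the target (None if not found)."""
    for n in range(2, n_max + 1):
        if anova_power(means, n, sigma2, alpha) >= target:
            return n
    return None

pig_means = [3.9, 4.1, 4.2, 4.3, 4.5]
print(round(anova_power(pig_means, n=4, sigma2=0.30), 3))
print(smallest_n(pig_means, sigma2=0.30))
```

For the pig design, `anova_power(pig_means, n=4, sigma2=0.30)` should come out near the 0.171 reported by PROC POWER. `smallest_n` searches the same range the power curve covers. Using σ² = 0.30 directly instead of stddev=0.5477 makes no practical difference.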
Let’s look at a power curve to get an idea of the necessary sample size
SAS Code
proc power;
  onewayanova alpha=.05 test=overall
    groupmeans=(3.9 4.1 4.2 4.3 4.5)
    npergroup=4 to 30   /* power for each n from 4 to 30 */
    stddev=0.5477 power=.;
  plot interpol=join yopts=(ref=.80);
run;
[Figure: power curve for the overall F-test; Power (0.1 to 1.0) vs. Sample Size Per Group (up to 30), with a reference line at 0.80]
Looks like we need about 19 animals per group (almost 5 times the number before)
What if trt means unknown?
•Use the “worst case” scenario
•Conservative assessment of power
• Just have to know the difference between the largest and smallest means, or the smallest difference D that is scientifically meaningful
• Use −D/2 and D/2 with all other means clumped at zero
• This choice minimizes $\sum_i \alpha_i^2$ and so minimizes φ: true power will be greater than or equal to this
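Why the worst case is conservative: φ depends on the means only through CSSM = Σα_i². Suppose the largest and smallest means are D apart. Then CSSM is smallest when the remaining means sit at the midpoint, which gives CSSM = D²/2. The minimal Python sketch below checks this for D = 0.6 with r = 5 and compares it with the anticipated pig means (whose range is also 0.6) and with random configurations. The helper names are hypothetical.

```python
import random

def cssm(means):
    """Corrected sum of squared means: sum of (mu_i - mean(mu))^2."""
    grand = sum(means) / len(means)
    return sum((m - grand) ** 2 for m in means)

def worst_case_means(D, r):
    """-D/2 and +D/2 with the other r - 2 means clumped at zero."""
    return [-D / 2] + [0.0] * (r - 2) + [D / 2]

D, r = 0.6, 5
worst = cssm(worst_case_means(D, r))           # equals D**2 / 2
anticipated = cssm([3.9, 4.1, 4.2, 4.3, 4.5])  # pig example, same range D

# Any configuration whose extremes are D apart has CSSM >= D**2 / 2
rng = random.Random(1)
for _ in range(1000):
    inner = [rng.uniform(-D / 2, D / 2) for _ in range(r - 2)]
    assert cssm([-D / 2, *inner, D / 2]) >= worst - 1e-12

print(round(worst, 4), round(anticipated, 4))  # 0.18 0.2
```

With n = 4 and σ² = 0.30, the worst case gives φ = 4 × 0.18/0.30 = 2.4, compared with about 2.67 for the anticipated means. The worst-case curve therefore needs somewhat more animals.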
SAS Code
**Suppose D=0.6***;
proc power;
  onewayanova alpha=.05 test=overall
    groupmeans=(-0.3 0 0 0 0.3)   /* worst case: -D/2, three means at 0, +D/2 */
    npergroup=4 to 30
    stddev=0.5477 power=.;
  plot interpol=join yopts=(ref=.80);
run;
[Figure: worst-case power curve for the overall F-test; Power (0.1 to 1.0) vs. Sample Size Per Group (up to 30), with a reference line at 0.80]
Looks like we need about 21 animals per group in the worst case
There is actually a “trick” to computing φ using ANOVA software like PROC GLM/MIXED (O’Brien and Lohr, 1984)
1) Substitute the “true means” for the data in the ANOVA.
2) Use the ANOVA table to compute the noncentrality parameter.
3) Then use that computed value in power calculations!
Using “true means” for data
data oneway;
input treatment mean;
datalines;
1 4.0
1 4.0
1 4.0
1 4.0
2 4.3
2 4.3
2 4.3
2 4.3
3 4.6
3 4.6
3 4.6
3 4.6
;
Suppose you are interested in 3 treatments.
Anticipate true mean responses of 4.0, 4.3 and 4.6
Anticipate a residual variance of 0.30
Wish to compute power based on a sample size of n = 4 for each treatment.
proc mixed data=oneway noprofile;
  class treatment;
  model mean = treatment;
  parms (0.30) / noiter;        /* hold the residual variance at 0.30 */
  ods output tests3 = tests3;   /* output the ANOVA (Type 3 tests) table to a data set called tests3 */
run;
Trick to compute φ
• Compute the ANOVA treatment “F ratio”
• Multiply “F_Treatment” by the numerator degrees of freedom (NumDF) to get φ:
$\phi = F_{\text{Treatment}} \times df_{\text{Treatment}} = 1.2 \times 2 = 2.4$
• F_Treatment is a function of CSSM.
Obs  Effect     NumDF  DenDF  FValue  ProbF
1    treatment  2      9      1.20    0.3452
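The trick works because, with the “data” equal to the true means, SSTrt equals n × CSSM exactly, and MSE is held at the anticipated σ² by the parms (0.30) / noiter statement. So F × NumDF = [n·CSSM/(r − 1)]/σ² × (r − 1) = n·CSSM/σ² = φ. The minimal Python sketch below repeats that arithmetic on the 12 “observations” from the data step above. `trick_noncentrality` is a hypothetical name, not a SAS routine.

```python
from collections import defaultdict

def trick_noncentrality(rows, sigma2):
    """One-way ANOVA on 'true means' data with MSE fixed at sigma2.
    Returns (NumDF, DenDF, F value, noncentrality = F * NumDF)."""
    groups = defaultdict(list)
    for trt, y in rows:
        groups[trt].append(y)
    r, n_total = len(groups), len(rows)
    grand = sum(y for _, y in rows) / n_total
    # Treatment sum of squares: sum over groups of n_i * (group mean - grand mean)^2
    ss_trt = sum(len(ys) * (sum(ys) / len(ys) - grand) ** 2 for ys in groups.values())
    num_df, den_df = r - 1, n_total - r
    f_value = (ss_trt / num_df) / sigma2   # MSTrt / MSE, with MSE held at sigma2
    return num_df, den_df, f_value, f_value * num_df

rows = [(1, 4.0)] * 4 + [(2, 4.3)] * 4 + [(3, 4.6)] * 4
num_df, den_df, f_value, noncent = trick_noncentrality(rows, 0.30)
print(num_df, den_df, round(f_value, 2), round(noncent, 2))  # 2 9 1.2 2.4
```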
Use φ to compute power
data power;
  set tests3;
  noncent = Fvalue*numdf;                                   /* phi = F x NumDF */
  alpha = 0.05;
  criticalvalue = Finv(1-alpha, numdf, dendf, 0);           /* separates the "acceptance region" from the "rejection region" */
  Power = 1 - Probf(criticalvalue, numdf, dendf, noncent);  /* probability of falling in the rejection region if H1 is true */
run;
proc print data=power;
run;

Effect     NumDF  DenDF  FValue  ProbF   noncent  alpha  criticalvalue  Power
treatment  2      9      1.20    0.3452  2.4      0.05   4.25649        0.20010
PROC GLMPOWER does this
/* Much simpler data step: one row per treatment */
data example1;
  input FactorA $ mean;
  datalines;
1 4.0
2 4.3
3 4.6
;
run;
proc glmpower data=example1;
  class FactorA;
  model mean = FactorA;
  power
    stddev = .548
    ntotal = 12      /* total number of experimental units */
    power = .
    alpha = 0.05;
run;
The GLMPOWER Procedure
Fixed Scenario Elements
Dependent Variable           mean
Source                       FactorA
Alpha                        0.05
Error Standard Deviation     0.548
Total Sample Size            12
Test Degrees of Freedom      2
Error Degrees of Freedom     9
Computed Power
Power                        0.200
What about Factorial Designs?
•An experiment was conducted to determine the effects of three different sources of dietary phosphorus and four different varieties of corn silage on daily milk production
•Proposed a 3 x 4 factorial experiment:
• Factor A, dietary phosphorus: 1, 2, & 3 (a=3)
• Factor B, corn silage varieties: 1, 2, 3, & 4 (b=4)
•Each cow randomly assigned to just one particular A*B treatment combination.
•How many cows should be considered?
Need to specify “true” means
Power analysis requires “knowledge” of μij and σ².
Suppose the investigator anticipates that the cell means μij are:
        B=1  B=2  B=3  B=4
A=1     37   38   44   41
A=2     42   43   49   46
A=3     47   48   54   51
σ² = 5 kg²
Wishes to determine power for both main effects and the two-way interaction, and also for the difference between, say, levels 1 and 2 of A:
$\mu_{1\cdot} - \mu_{2\cdot} = \tfrac{1}{4}(\mu_{11}+\mu_{12}+\mu_{13}+\mu_{14}) - \tfrac{1}{4}(\mu_{21}+\mu_{22}+\mu_{23}+\mu_{24})$
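With the anticipated cell means, the A1 vs A2 difference of interest works out to:

```latex
\mu_{1\cdot} - \mu_{2\cdot}
  = \tfrac{1}{4}(37 + 38 + 44 + 41) - \tfrac{1}{4}(42 + 43 + 49 + 46)
  = 40 - 45 = -5
```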
Setup “data”
data power;
input FactorA FactorB cellmean;
datalines;
1 1 37
1 2 38
1 3 44
1 4 41
2 1 42
2 2 43
2 3 49
2 4 46
3 1 47
3 2 48
3 3 54
3 4 51
run;
Profile means plot
symbol1 i=join;
proc gplot data=power;
  plot cellmean*FactorB=FactorA;   /* one line per level of FactorA */
run;
Researcher anticipating no interaction
(Power analysis should still take its possibility into account in the ANOVA)
[Figure: profile plot of cellmean (37 to 54) vs. FactorB (1 to 4), one line per level of FactorA (1, 2, 3)]
Using GLMPOWER
proc glmpower data=power;
  class FactorA FactorB;
  model cellmean = FactorA | FactorB;
  /* A1 vs A2: average of the A1 cell means minus average of the A2 cell means */
  contrast 'A1 vs A2' FactorA 1 -1 0
                      FactorB 0 0 0 0
                      FactorA*FactorB 0.25 0.25 0.25 0.25
                                      -0.25 -0.25 -0.25 -0.25
                                      0 0 0 0;
  power
    stddev = 5    /* error standard deviation (square root of the residual variance); stddev=5 corresponds to a residual variance of 25 */
    ntotal = 36   /* power determination for 36/12 = 3 reps per cell */
    power = .     /* missing value: power is the quantity to compute */
    alpha = 0.05;
  plot x=n min=24 max=96;   /* power curve for ntotal from 24 to 96, i.e., 2 to 8 per cell */
run;
PROC GLMPOWER OUTPUT
Fixed Scenario Elements
Dependent Variable          cellmean
Alpha                       0.05
Error Standard Deviation    5
Total Sample Size           36
Error Degrees of Freedom    24
Computed Power
Index  Type      Source           Test DF  Power
1      Effect    FactorA          2        0.989
2      Effect    FactorB          3        0.720
3      Effect    FactorA*FactorB  6        0.050
4      Contrast  A1 vs A2         1        0.652
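Each of these powers comes from a non-central F distribution whose non-centrality parameter can be computed by hand from the anticipated cell means. The minimal Python sketch below assumes a balanced design with n = 3 cows per cell and σ = 5, the stddev= value actually used in the GLMPOWER run. `factorial_noncentralities` is a hypothetical name. The interaction has λ = 0 because the anticipated means are exactly additive (parallel profiles), which is why its power equals α = 0.050.

```python
def factorial_noncentralities(cells, n, sigma):
    """Noncentrality parameters for a balanced a x b factorial with n units per cell.
    cells[i][j] is the anticipated mean for level i of A and level j of B.
    Returns ({source: lambda}, error df)."""
    a, b = len(cells), len(cells[0])
    s2 = sigma ** 2
    row = [sum(r) / b for r in cells]                                  # mu_i.
    col = [sum(cells[i][j] for i in range(a)) / a for j in range(b)]   # mu_.j
    grand = sum(row) / a                                               # mu_..
    lam_a = n * b * sum((m - grand) ** 2 for m in row) / s2
    lam_b = n * a * sum((m - grand) ** 2 for m in col) / s2
    lam_ab = n * sum((cells[i][j] - row[i] - col[j] + grand) ** 2
                     for i in range(a) for j in range(b)) / s2
    # Contrast A1 vs A2 on cell means: +1/b on A1 cells, -1/b on A2 cells, 0 elsewhere
    coef = [[1 / b] * b, [-1 / b] * b] + [[0.0] * b for _ in range(a - 2)]
    est = sum(coef[i][j] * cells[i][j] for i in range(a) for j in range(b))
    var_est = s2 * sum(c * c for r in coef for c in r) / n   # variance of the estimated contrast
    lam_contrast = est ** 2 / var_est
    lams = {"A": lam_a, "B": lam_b, "A*B": lam_ab, "A1 vs A2": lam_contrast}
    return lams, a * b * (n - 1)

cells = [[37, 38, 44, 41],
         [42, 43, 49, 46],
         [47, 48, 54, 51]]
lams, error_df = factorial_noncentralities(cells, n=3, sigma=5)
for source, lam in lams.items():
    print(source, round(lam, 3))
print("error df:", error_df)
```

With these values, λ_A = 24, λ_B = 10.8, λ = 0 for the interaction and λ = 6 for the A1 vs A2 contrast, each evaluated on 24 error degrees of freedom. Feeding them into a non-central F power routine with test df 2, 3, 6 and 1 corresponds to the four rows of the GLMPOWER table.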