Sample size and study design
Brian Healy, PhD
Comments from last time
We did not cover confounding
Too much in one class/Not enough examples/Superficial level
– I wanted to show one example for each type of analysis so that you can determine what your data matches. This way you can speak to a statistician knowing the basic ideas.
– My hope was for you to feel confident enough to learn more about the topics relevant to you
– Worked example lectures
This is not basic biostatistics
I did Teach for America
Objectives
Type II error
How to improve power?
Sample size calculation
Study design considerations
Review
Previous classes we have focused on data analysis
– AFTER data collection
Hypothesis testing allowed us to determine whether there was a statistically significant:
– Difference between groups
– Association between two continuous factors
– Association between two dichotomous factors
Example
We know that the heart rate for a healthy adult is 80 beats per minute and this has an approximately normal distribution (according to my wife)
Some elite athletes, like Lance Armstrong, have a lower heart rate, but it is not known if this is true on average
How could we address this question?
Experimental design
One way to do this is to collect a sample of normal controls and a sample of elite athletes and compare their means
– What test would you use?
Another way is to collect a sample of elite athletes and compare their mean to the known population mean
– This is a one sample test
– Null hypothesis: mean_elite=80
Question
How large a sample of elite athletes should I collect?
What is the benefit of having a large sample size?
– More information
– More accurate estimate of the population mean
What is the disadvantage of a large sample size?
– Cost
– Effort required to collect
What is the “correct” sample size?
Effect of sample size
Let’s say we wanted to estimate the blood pressure of people at MGH
– If we sampled 3 people, would we have a good estimate of the population mean?
  How much will the sample mean vary from sample to sample?
– Does our estimate of the mean improve if we sampled 30 people?
  Would the sample mean vary more or less from sample to sample?
– What about 300 people?
Simulation
http://onlinestatbook.com/stat_sim/sampling_dist/index.html
What is the shape of the distribution of sample means?
Where is the curve centered?
What happens to curve as sample size increases?
Technical: Central limit theorem
Standard error of the mean
There are two measures of spread in the data
– Standard deviation: measure of spread of the individual observations
  The estimate of this is the standard deviation of the observations: $s$
– Standard error: standard deviation of the sample mean
  The estimate of this is the standard deviation of the observations divided by the square root of the sample size: $s/\sqrt{n}$
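The questions on the simulation and standard error slides can also be explored with a few lines of code. The sketch below is only an illustration: it assumes a normally distributed population with mean 80 and standard deviation 20 (the heart rate numbers used elsewhere in these slides), and the function name `simulate_sample_means` is made up for this example. It draws many samples of size 3, 30 and 300 and compares the observed spread of the sample means with σ/√n.

```python
import random
import statistics


def simulate_sample_means(n, n_repeats=2000, mu=80.0, sigma=20.0, seed=1):
    """Draw n_repeats samples of size n and return the list of sample means."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(n_repeats)
    ]


for n in (3, 30, 300):
    means = simulate_sample_means(n)
    centre = statistics.fmean(means)          # where the curve is centered
    empirical_se = statistics.stdev(means)    # spread of the sample means
    theoretical_se = 20.0 / n ** 0.5          # sigma / sqrt(n)
    print(f"n={n:3d}  centre={centre:6.2f}  "
          f"empirical SE={empirical_se:5.2f}  sigma/sqrt(n)={theoretical_se:5.2f}")
```

The sample means stay centered near the population mean for every sample size, while their spread shrinks as the sample size grows.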
Technical: Distribution of sample mean under the null
If we took repeated samples and calculated the sample mean each time, the sample means would have an approximately normal distribution
Mean of distribution=80
Spread in distribution is based on standard error
Type I error
We could plot the distribution of the sample means under the null before collecting data
Type I error is the probability that you reject the null given that the null is true
P(reject H0 | H0 is true)
Notice that the shaded area is still part of the null curve, but it is in the tail of the distribution
Hypothesis test-review
After data collection, we can calculate the p-value
If the p-value is less than the pre-specified α-level, we reject the null hypothesis
As the sample size increases, the standard error decreases
p-value is based on the standard error
– As your sample size increases, the p-value decreases if the mean and standard deviation do not change
– With an extremely large sample, a very small departure from the null is statistically significant
What would you think if you found the sample mean heart rate of three elite athletes was 70 beats per minute?
– Do your thoughts change if you sampled 300 athletes and found the same sample mean?
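To make the last question concrete, here is a minimal sketch of a two-sided one-sample z-test p-value for the same observed mean of 70 with different sample sizes. It assumes the null mean of 80 from the heart rate example and a known standard deviation of 20 (the value used in the power example in these slides); `one_sample_z_pvalue` is a hypothetical helper name.

```python
from statistics import NormalDist


def one_sample_z_pvalue(sample_mean, n, null_mean=80.0, sd=20.0):
    """Two-sided p-value of a one-sample z-test with known standard deviation."""
    se = sd / n ** 0.5                      # standard error of the mean
    z = (sample_mean - null_mean) / se      # test statistic
    return 2 * NormalDist().cdf(-abs(z))    # area in both tails


for n in (3, 30, 300):
    print(f"n={n:3d}  p-value={one_sample_z_pvalue(70.0, n):.3g}")
```

With three athletes the p-value is roughly 0.39, so a mean of 70 is quite compatible with the null; with 300 athletes the same mean gives a p-value far below 0.05.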
How much data should we collect?
Depends on several factors:
– Type I error
– Type II error (power)
– Difference we are trying to detect (null and alternative hypotheses)
– Standard deviation
Remember this is decided BEFORE the study!!!
Type II error
Definition: when you fail to reject the null hypothesis when the alternative is in fact true (type II error)
This type of error is based on a specific alternative
β = P(fail to reject H0 | HA is true)
Power
Definition: the probability that you reject the null hypothesis given that the alternative hypothesis is true. This is what we want to happen.
Power = P(reject H0 | HA is true) = 1 - β
Since this is a good thing, we want this to be high
Figure: distributions of the sample mean under the null hypothesis and under the alternative hypothesis. The null curve is located at μ0 and its spread is the standard error. A cut-off value separates the "Reject H0" and "Fail to reject H0" regions.
Figure: the same curves with shaded areas for P(reject H0 | H0 is true), Power = P(reject H0 | HA is true), and P(fail to reject H0 | HA is true).
Life is a trade off
These two errors are related
– We usually assume that the type I error is 0.05 and calculate the type II error for a specific alternative
– If you want to be more strict and falsely reject the null only 1% of the time (α=0.01), the chance of a type II error increases
Sensitivity/specificity or false positive/false negative
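The trade-off can be checked numerically. This sketch computes the type II error of a two-sided one-sample z-test for a specific alternative at α=0.05 and α=0.01. The sample size of 40, the means of 80 and 70, and the standard deviation of 20 are borrowed from the heart rate power example in these slides; the function name `type_ii_error` is an assumption for illustration.

```python
from statistics import NormalDist


def type_ii_error(alpha, n=40, null_mean=80.0, alt_mean=70.0, sd=20.0):
    """Beta for a two-sided one-sample z-test against a specific alternative."""
    z = NormalDist()
    se = sd / n ** 0.5
    z_crit = z.inv_cdf(1 - alpha / 2)        # cut-off on the z scale
    shift = (alt_mean - null_mean) / se      # mean of the z statistic under HA
    # Probability that the z statistic falls between the two cut-offs
    # (fail to reject H0) when the alternative is true.
    return z.cdf(z_crit - shift) - z.cdf(-z_crit - shift)


for alpha in (0.05, 0.01):
    print(f"alpha={alpha:.2f}  beta={type_ii_error(alpha):.3f}")
```

Lowering α from 0.05 to 0.01 moves the cut-off further into the tail of the null curve, so more of the alternative curve falls in the "fail to reject" region and β goes up.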
Changing the power
Note how the power (green) increases as you increase the difference between the null and alternative hypotheses
How else do you think we could increase the power?
Another way to increase power is to increase type I error rate
Two other ways to increase power involve changing the shape of the distribution
– Increasing the sample size
  When the sample size increases, the curve for the sample means tightens
– Decreasing the variability in the population
  When there is less variability, the curve for the sample means also tightens
Example
For our study, we know that we can enroll 40 elite athletes.
We also know that the population mean is 80 beats per minute and the standard deviation is 20
We believe the elite athletes will have a mean of 70 beats per minute
How much power would we have to detect this difference at the two-sided 0.05 level?
– All this information fully defines our curves
Using STATA, we find that we have 88.5% power to detect the difference of 10 beats per minute between the groups at the two-sided 0.05 level using a one sample z-test
Question: If we were able to enroll more subjects would our power increase or decrease?
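The power figure quoted above can be reproduced without a statistical package. The following is a sketch of the standard normal-theory calculation for a two-sided one-sample z-test, not the STATA routine itself; `z_test_power` is a hypothetical name.

```python
from statistics import NormalDist


def z_test_power(n, null_mean, alt_mean, sd, alpha=0.05):
    """Power of a two-sided one-sample z-test against a specific alternative."""
    z = NormalDist()
    se = sd / n ** 0.5
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = (alt_mean - null_mean) / se      # mean of the z statistic under HA
    # Reject H0 when the z statistic falls in either tail.
    return z.cdf(-z_crit - shift) + (1 - z.cdf(z_crit - shift))


print(f"n=40: power={z_test_power(40, 80.0, 70.0, 20.0):.3f}")
print(f"n=80: power={z_test_power(80, 80.0, 70.0, 20.0):.3f}")
```

With 40 athletes this gives a power of about 0.885, and enrolling more subjects increases the power because the standard error shrinks.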
Conclusions
For a specific sample size, standard deviation, difference between the means and type I error, we can calculate the power
Changing any of the four parameters above will change the power
– Some under the control of the investigator, but others are not
Sample size
Up to now we have shown how to find the power given a specific sample size, difference between the means, standard deviation and alpha level.
We can vary any four of these five factors and find the fifth.
– Usually the alpha level is required to be two-sided 0.05
– How can we calculate the sample size for specific values of the remaining parameters?
Two approaches to sample size
Hypothesis testing
– When you have a specific null AND alternative hypothesis in mind
Confidence interval
– When you want to place an interval around an estimate
Hypothesis testing approach
1) State null and alternative hypothesis
– Null usually pretty easy
– Alternative is more difficult, but very important
2) State standard deviation of outcome
3) State desired power and alpha level
– Power=0.8
– Alpha=0.05 for two-sided test
4) State test
5) Use statistical package to calculate sample size
We know the location of the null and alternative curves, but we do not know the shape because the sample size determines the shape. We need to find the sample size that will give the curves the shape so that the α level and power equal the specified values.
Figure labels: Alpha=0.025 (one tail of the two-sided 0.05 level), Power=0.8, Beta=0.2
General form of sample size calculation
Here is the general form of the normal sample size
– One-sided
$$n = \left( \frac{(z_{1-\alpha} + z_{1-\beta})\,\sigma}{\mu_1 - \mu_0} \right)^2$$
– Two-sided
$$n = \left( \frac{(z_{1-\alpha/2} + z_{1-\beta})\,\sigma}{\mu_1 - \mu_0} \right)^2$$
$n$: Sample size
$\sigma$: Standard deviation
$z_{1-\alpha}$, $z_{1-\alpha/2}$: Related to Type I error
$z_{1-\beta}$: Related to Type II error
$\mu_0$, $\mu_1$: Mean under null and alternative
Hypothesis testing approach
1) State null and alternative hypothesis
– H0: μ0=80
– HA: μ1=70
2) sd=20
3) State desired power and alpha level
– Power=0.8
– Alpha=0.05 for two-sided test
4) State test: z-test
5) n=31.36, rounded up to n=32
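The five steps above can be sketched in code using the normal sample size formula n = ((z_{1-α/2} + z_{1-β}) σ / (μ1 − μ0))². This is a minimal illustration, and the function name `z_test_sample_size` is hypothetical. With z values rounded to 1.96 and 0.84 the formula gives 31.36; using unrounded quantiles gives a slightly larger value that rounds up to the same n.

```python
import math
from statistics import NormalDist


def z_test_sample_size(null_mean, alt_mean, sd, power=0.8, alpha=0.05,
                       two_sided=True):
    """Return (unrounded n, n rounded up) for a one-sample z-test."""
    z = NormalDist()
    tail = alpha / 2 if two_sided else alpha
    z_alpha = z.inv_cdf(1 - tail)    # related to type I error
    z_beta = z.inv_cdf(power)        # related to type II error (power = 1 - beta)
    n_exact = ((z_alpha + z_beta) * sd / (alt_mean - null_mean)) ** 2
    return n_exact, math.ceil(n_exact)


n_exact, n_required = z_test_sample_size(80.0, 70.0, 20.0)
print(f"unrounded n={n_exact:.2f}, required n={n_required}")

# Same calculation with the rounded z values 1.96 and 0.84
print(f"with rounded z values: {((1.96 + 0.84) * 20.0 / 10.0) ** 2:.2f}")
```

The sample size is always rounded up, since rounding down would leave the power just short of the target.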
Example-more complex
In a recently submitted grant, we investigated the sample size required to detect a difference between RRMS and SPMS patients in terms of levels of a marker
Preliminary data:
– RRMS: mean level=0.54 +/- 0.37
– SPMS: mean level=0.94 +/- 0.42
Hypothesis testing approach
1) State null and alternative hypothesis
– H0: mean_RRMS=mean_SPMS=0.54
– HA: mean_RRMS=0.54, mean_SPMS=0.94, Difference between groups=0.4
2) sd_RRMS=0.37, sd_SPMS=0.42
3) State desired power and alpha level
– Power=0.8
– Alpha=0.05 for two-sided test
4) State test: t-test
Results
Use these values in statistical package
– 17 samples from each group are required
Website: http://hedwig.mgh.harvard.edu/sample_size/size.html
Statistical considerations for grant
“Group sample sizes of 17 and 17 achieve at least 80% power to detect a difference of -0.400 between the null hypothesis that both group means are 0.540 and the alternative hypothesis that the mean of group 2 is 0.940 with estimated group standard deviations of 0.370 and 0.420 and with a significance level (alpha) of 0.05 using a two-sided two-sample t-test.”
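As a rough cross-check of the grant calculation, the sketch below applies the normal approximation for two independent groups with unequal standard deviations, n per group = (z_{1-α/2} + z_{1-β})² (σ1² + σ2²) / Δ². This is not the t-test calculation used for the grant, and `two_sample_n_per_group` is a hypothetical name. The approximation gives about 15.4, i.e. 16 per group, which is slightly below the 17 reported above; a t-test based calculation is expected to be a little larger because the standard deviations are estimated from the data.

```python
import math
from statistics import NormalDist


def two_sample_n_per_group(mean1, mean2, sd1, sd2, power=0.8, alpha=0.05):
    """Unrounded per-group n from the two-sample normal approximation."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    variance_sum = sd1 ** 2 + sd2 ** 2
    return (z_alpha + z_beta) ** 2 * variance_sum / (mean1 - mean2) ** 2


n_approx = two_sample_n_per_group(0.54, 0.94, 0.37, 0.42)
print(f"normal approximation: {n_approx:.2f} -> {math.ceil(n_approx)} per group")
```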
Technical remarks
So we have shown that we can calculate the power for a given sample size and sample size for a given power. We can also change the clinically meaningful difference if we set the sample size and power.
In many grant applications, we show the power for a variety of sample sizes and differences in the means in a table so that the grant reviewer can see that there is sufficient power to detect a range of differences with the proposed sample size.
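A table of this kind can be generated with a short loop. The sketch below uses a normal approximation to the two-sample test with the standard deviations from the RRMS/SPMS preliminary data (0.37 and 0.42); the grid of sample sizes and differences is hypothetical and chosen only for illustration, as is the function name `two_sample_power`.

```python
from statistics import NormalDist


def two_sample_power(n_per_group, difference, sd1=0.37, sd2=0.42, alpha=0.05):
    """Approximate power of a two-sided two-sample test (normal approximation)."""
    z = NormalDist()
    se = ((sd1 ** 2 + sd2 ** 2) / n_per_group) ** 0.5   # SE of the difference
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = difference / se
    return z.cdf(-z_crit - shift) + (1 - z.cdf(z_crit - shift))


sample_sizes = (10, 17, 25)        # hypothetical per-group sample sizes
differences = (0.3, 0.4, 0.5)      # hypothetical differences in means

print("n/group " + "".join(f"  diff={d:.1f}" for d in differences))
for n in sample_sizes:
    row = "".join(f"  {two_sample_power(n, d):8.2f}" for d in differences)
    print(f"{n:7d} {row}")
```

Reading across a row shows how power grows with the difference in means, and reading down a column shows how it grows with the sample size.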
Confidence interval approach
If we do not have a set alternative, we can choose the sample size based on how close to the truth we want to get
In particular we choose the sample size so that the confidence interval is of a certain width
Under a normal distribution, the 95% confidence interval for a single sample mean is
$$\left( \text{mean} - 1.96\,\frac{\sigma}{\sqrt{n}},\ \text{mean} + 1.96\,\frac{\sigma}{\sqrt{n}} \right)$$
We can choose the sample size to provide the specified width of the confidence interval
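Solving the half-width 1.96·σ/√n for n gives n = (1.96·σ / half-width)². The sketch below applies this, assuming a known standard deviation; the target half-width of 5 beats per minute and the function name `n_for_ci_half_width` are hypothetical choices for illustration, while the standard deviation of 20 comes from the heart rate example.

```python
import math
from statistics import NormalDist


def n_for_ci_half_width(sd, half_width, confidence=0.95):
    """Smallest n whose normal-theory confidence interval is mean +/- half_width or narrower."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    return math.ceil((z * sd / half_width) ** 2)


n = n_for_ci_half_width(sd=20.0, half_width=5.0)
achieved = NormalDist().inv_cdf(0.975) * 20.0 / n ** 0.5
print(f"n={n}, achieved half-width={achieved:.2f}")
```

Halving the desired half-width roughly quadruples the required sample size, because n depends on the square of the ratio.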
Conclusions
Sample size can be calculated if the power, alpha level, difference between the groups and standard deviation are specified
For more complex settings than those presented here, statisticians have worked out the sample size calculations, but still need estimates of the hypothesized difference and variability in the data
Study design
Reasons for differences between groups
Actual effect: when there is a difference between the two groups (ex. the treatment has an effect)
Chance
Bias
Confounding
Chance
When we run a study, we can only take a sample of the population. Our conclusions are based on the sample we have drawn. Just by chance, sometimes we can draw an extreme sample from the population. If we had taken a different sample, we may have drawn different conclusions. We call this sampling variability.
Note on variability
Even though your experiments are well controlled, not all subjects will behave exactly the same
– This is true for almost all experiments
– If all animals acted EXACTLY the same, we would only need one animal
Since one is not enough, we observe a group of mice
– We call this our sample
Based on our sample, we draw a conclusion regarding the entire population
Study design considerations
Null hypothesis
Outcome variable
Explanatory variable
Sources of variability
Experimental unit
Potential correlation
Analysis plan
Sample size
Example
We start with a single group (ex. Genetically identical mice)
The group is broken into 3 groups that are treated with 3 different interventions
An outcome is measured in each individual
Questions:
– What analysis should we do?
– What is the effect of starting from the same population?
– Do we need to account for repeated measures?
Figure: Original group split into Condition 1, Condition 2 and Condition 3
Generalizability
Assume that we have found a difference between our exposure and control group and we have shown that this result is not likely due to chance, bias or confounding.
What does this mean for the general population? Specifically, to which group can we apply our results?
– This is often based on how the sample was originally collected.
Example 2
We want to compare the expression of a marker in patients vs. controls
Full sample size is 288 samples
Can only run 24 samples (1 plate) per day
Questions:
– What types of analysis should we do?
– Can we combine across the plates?
– Could other confounders be important to collect?
Plate 1: 10 patients, 14 controls. Estimate of difference in this plate
Plate 2: 14 patients, 10 controls. Estimate of difference in this plate
Plate 3: 12 patients, 12 controls. Estimate of difference in this plate
We can test if there is a different effect in each plate by investigating the interaction
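One reason the plate matters can be shown with a noise-free toy calculation. The sketch below uses the patient and control counts from the three plates above, but the plate effects (0, 1 and 2) and the true patient vs. control difference (0.5) are hypothetical numbers invented for illustration. It does not fit an interaction model; it only compares the difference estimated within each plate with the difference obtained by ignoring the plate.

```python
from statistics import fmean

TRUE_DIFFERENCE = 0.5  # hypothetical patient minus control difference

# Counts come from the slide; plate effects are made up for illustration.
plates = [
    {"plate": 1, "patients": 10, "controls": 14, "plate_effect": 0.0},
    {"plate": 2, "patients": 14, "controls": 10, "plate_effect": 1.0},
    {"plate": 3, "patients": 12, "controls": 12, "plate_effect": 2.0},
]


def build_rows(plates, true_difference):
    """One (plate, group, outcome) row per sample, without random noise."""
    rows = []
    for p in plates:
        rows += [(p["plate"], "patient", p["plate_effect"] + true_difference)] * p["patients"]
        rows += [(p["plate"], "control", p["plate_effect"])] * p["controls"]
    return rows


def difference(rows):
    """Mean outcome of patients minus mean outcome of controls."""
    patients = [y for _, group, y in rows if group == "patient"]
    controls = [y for _, group, y in rows if group == "control"]
    return fmean(patients) - fmean(controls)


rows = build_rows(plates, TRUE_DIFFERENCE)
pooled = difference(rows)
per_plate = {p["plate"]: difference([r for r in rows if r[0] == p["plate"]])
             for p in plates}

print("difference ignoring plate:", round(pooled, 3))
print("difference within each plate:", per_plate)
```

Because the mix of patients and controls differs between plates, ignoring the plate overstates the difference in this toy setup, while each within-plate estimate recovers the assumed value. Recording the plate for every sample is what makes the within-plate comparison possible.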
Example 3
We want to compare the expression of 6 markers
We measure the six markers in 5 mice
Questions:
– What types of analysis should we do?
– How many independent groups do we have?
– What is the null hypothesis?
Example 4
“In our experiments, we collect 3 measurements. If it is significant, we call it a day. If it is close to significant, we measure 1 more animal”
Question:
– Is this valid?
Always more statistically valid if the number is specified BEFORE the experiment
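A small simulation suggests why the "measure one more animal" rule is a problem. The sketch below generates data for which the null hypothesis is true, tests after 3 measurements, and adds a fourth measurement only when the first p-value is "close to significant". Several simplifications are assumptions made for illustration: a z-test with known standard deviation 1 is used instead of a t-test, and "close to significant" is taken to mean 0.05 ≤ p < 0.10.

```python
import random
from statistics import NormalDist, fmean


def z_test_p_value(data, z=NormalDist()):
    """Two-sided p-value for H0: mean = 0, with known standard deviation 1."""
    stat = fmean(data) * len(data) ** 0.5
    return 2 * z.cdf(-abs(stat))


def false_positive_rate(n_experiments=100_000, alpha=0.05, near_alpha=0.10, seed=1):
    """Share of null experiments declared significant under the add-one-more rule."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_experiments):
        data = [rng.gauss(0.0, 1.0) for _ in range(3)]   # null is true
        p = z_test_p_value(data)
        if alpha <= p < near_alpha:                      # "close to significant"
            data.append(rng.gauss(0.0, 1.0))             # measure 1 more animal
            p = z_test_p_value(data)
        if p < alpha:
            rejections += 1
    return rejections / n_experiments


print(f"type I error with the add-one-more rule: {false_positive_rate():.3f}")
```

Under these assumptions the rate of false positives comes out above the nominal 0.05, because the rule gives near-misses a second chance to cross the threshold.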
Spreadsheet formation
What to collect
– Everything that might be important for the analysis
  Plate
  Batch
  Technician
  All potential sources of variability
  All potential confounders
– Most accurate version of this you can
  If it is continuous, collect it as such. Can always dichotomize later
Spreadsheet formation
Easiest to move to a statistical package if
– One row per measurement
– One column for the outcome, each predictor and potential confounders
– No open space
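As an illustration of this layout, the sketch below reads a tiny table with one row per measurement and one column per variable, then checks that there are no blank cells. The column names and all values are hypothetical and exist only to show the shape of the data.

```python
import csv
import io

# Hypothetical example data: one row per measurement, one column for the
# outcome (marker_level), the predictor (group) and potential confounders.
raw = """subject_id,group,plate,batch,technician,marker_level
S01,patient,1,A,T1,0.91
S02,control,1,A,T1,0.52
S03,patient,2,A,T2,1.02
S04,control,2,B,T2,0.47
"""

rows = list(csv.DictReader(io.StringIO(raw)))


def find_blank_cells(rows):
    """Return (row number, column name) for every empty or missing cell."""
    return [
        (i, column)
        for i, row in enumerate(rows, start=1)
        for column, value in row.items()
        if value is None or value.strip() == ""
    ]


print("rows:", len(rows))
print("blank cells:", find_blank_cells(rows))
# The outcome is stored as a continuous value and can be dichotomized later.
print("marker levels:", [float(r["marker_level"]) for r in rows])
```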
Conclusions
Sample size for experiment must be considered BEFORE collecting data
Can improve power by reducing standard deviation, increasing sample size or increasing difference between groups
Important to consider study design as you develop your analysis plan