Sample size estimation - VAM Resource Center
Post on 19-Mar-2018
Sample size estimation
Steps for calculating the sample size per stratum
1. Choose the stratification (e.g. regions, districts, …)
2. Define the population (N) of each stratum
3. Decide on key indicator(s)
4. Estimate the mean & variance or the prevalence of the key indicator
5. Decide on precision and confidence level
6. Calculate the initial total sample size (n) according to the budget/time
7. Use a simple random sample per stratum to select your representative sample
Final sample size
Calculate sample size – random sample
To estimate sample size, you need to know:
•Estimate of the prevalence or mean & STDev of the key indicator (e.g. 30% poor food consumption)
•Precision desired (for example: ± 5%)
•Level of confidence (for example: 95%)
•Population (only if below 10,000, otherwise it will not influence the required sample size)
•Expected response rate (for example: 90%)
•Number of eligible individuals per household (if applicable)
• Precision describes the variability of the estimate, expressed as the size of the confidence interval around it
• Confidence Level is the share of repeated samples, all other things equal, whose confidence interval would contain the true value
• Confidence Interval is the interval around the estimate that corresponds to the desired confidence level
• Choose the size of the confidence interval (precision) for a given confidence level
Note:
• As long as the target population is more than a few thousand households, it will not influence the required sample size. Only if it is smaller will the sample size requirement reduce slightly.
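The note on population size can be checked numerically. The sketch below is an illustration, not part of the original slides: it assumes a prevalence of 30%, a half-width of 0.05 and Z = 1.96, and it applies the same finite-population form, 1 / (1/N + 1/n0), that the continuous-variable formula (1.2) in this document uses. The function name `adjust_for_population` is hypothetical.

```python
def adjust_for_population(n0, N):
    """Finite-population adjustment of an unadjusted sample size n0.

    Uses n = 1 / (1/N + 1/n0), the same structure as equation (1.2).
    """
    if n0 <= 0 or N <= 0:
        raise ValueError("n0 and N must be positive")
    return 1.0 / (1.0 / N + 1.0 / n0)


# Unadjusted sample size for P = 0.3, d = 0.05, Z = 1.96 (assumed inputs).
n0 = 1.96 ** 2 * 0.3 * (1 - 0.3) / 0.05 ** 2

# Required sample size for increasingly large populations.
adjusted = {N: round(adjust_for_population(n0, N))
            for N in (1_000, 10_000, 100_000, 1_000_000)}

for N, n in adjusted.items():
    print(f"N = {N:>9,}: n = {n}")
```

With these assumed inputs the requirement moves from 244 at N = 1,000 to 313 at N = 10,000 and barely changes after that, which is the behaviour the note describes.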
Prevalence vs. mean
For the purpose of calculating sample size:
Treat variables as prevalence rates only when they are naturally binary.
DO NOT threshold continuous indicators when calculating sample size, even if they are turned into prevalence indicators for analysis purposes,
e.g. the Food Consumption Score (FCS) is a continuous indicator (0-112), but it is thresholded at 21 and 35 to obtain poor and borderline prevalence.
• Prevalence is the total number of cases of a variable of interest, typically binary, within a population divided by the total population (e.g. stunting, unemployment)
• Mean is the expected value of a variable of interest that is typically continuous or within a prescribed range for a given population (e.g. height, weight, age)
Choosing the right distribution
For continuous variables we must choose a probability distribution that best fits the data
Step 1: ALWAYS plot a histogram of past data and choose the distributional form!
Two of the most common ones encountered at WFP are:
• Normal distribution
• Negative binomial distribution
Prevalence rates always follow a binomial distribution, which is why they are mathematically easy to deal with.
Sample size formula for prevalence (single survey using random sampling)
To calculate the sample size for an estimate of prevalence with 95% confidence limits:
$$ n = \frac{1.96^2 \times P(1-P)}{d^2} $$
• 1.96 = Z value for 95% confidence limits
• P = Estimated prevalence (e.g. 0.3 for 30%)
• (P)(1-P) = variance for a binary (binomial) variable
• d = ½ of desired confidence interval (e.g. 0.025 for an interval that is 5% wide, i.e. ± 2.5%)
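The prevalence formula can be sketched in Python as follows. This is a minimal illustration rather than part of the original slides: the function name `sample_size_prevalence` and the decision to round up to the next whole respondent are assumptions of the sketch.

```python
import math


def sample_size_prevalence(p, d, z=1.96):
    """Sample size for a prevalence estimate under simple random sampling.

    p: estimated prevalence as a proportion (e.g. 0.3 for 30%)
    d: half of the desired confidence interval, as a proportion
    z: Z value for the confidence level (1.96 for 95%)
    """
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if d <= 0:
        raise ValueError("d must be positive")
    n = z ** 2 * p * (1 - p) / d ** 2
    # Rounding up is an assumption of this sketch (avoids under-sampling).
    return math.ceil(n)


n_example = sample_size_prevalence(p=0.3, d=0.025)
print(n_example)
```

Because d enters the formula squared, halving d roughly quadruples the required sample size, so the choice of precision usually matters more than the choice of P.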
Sample size formula for continuous variable (single survey using random sampling)
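To calculate the sample size for an estimate with 95% confidence limits:
$$ n = \frac{1}{\dfrac{1}{N} + \dfrac{(\mu \cdot d)^2}{1.96^2 \, \sigma^2}} \qquad (1.2) $$
• 1.96 = Z value for 95% confidence limits
• μ = Expected mean
• σ2 = variance of variable
• d = ½ of desired confidence interval, as a proportion of the mean (e.g. 0.025 for an interval that is 5% wide, i.e. ± 2.5%)
• μ*d = absolute half-width of the confidence interval, in the units of the indicator
• N = population of each stratum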
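Equation (1.2) can be sketched in Python as below. The function name `sample_size_mean` is hypothetical, and the inputs are taken from the Anbar row of the Iraq example later in this document (mean 75, standard deviation 20.5, Z = 1.645 for 90% confidence, d = 0.025). Rounding to the nearest integer is an assumption that appears to match the tables in the examples.

```python
def sample_size_mean(N, mu, sigma, d, z=1.96):
    """Sample size for the mean of a continuous indicator, equation (1.2).

    N: population of the stratum
    mu: expected mean of the indicator
    sigma: standard deviation of the indicator
    d: half of the desired confidence interval, relative to the mean
    z: Z value for the confidence level
    """
    if min(N, mu, sigma, d, z) <= 0:
        raise ValueError("all inputs must be positive")
    # n = 1 / (1/N + (mu*d)^2 / (z^2 * sigma^2))
    return 1.0 / (1.0 / N + (mu * d) ** 2 / (z ** 2 * sigma ** 2))


# Anbar, Iraq example: 90% confidence, interval 5% of the mean wide.
n_anbar = sample_size_mean(N=1_723_154, mu=75, sigma=20.5, d=0.025, z=1.645)
print(round(n_anbar))
```

The 1/N term is negligible for a population of this size, which is why strata with the same mean and standard deviation end up with the same sample size regardless of population.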
Sample size formula for cluster sampling
• To calculate the sample size for an estimate of prevalence with a 95% confidence interval, taking into account cluster sampling:
$$ n = DEFF \times \frac{1.96^2 \times P(1-P)}{d^2} $$
• DEFF = Design effect
• 1.96 = Z value for p = 0.05 or 95% confidence intervals
• P = Estimated prevalence
• d = Desired precision (for example, 0.05 for ± 5%)
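As a sketch of how the design effect inflates the sample size, the snippet below multiplies the simple-random-sampling result by DEFF. The function name `sample_size_cluster` is hypothetical, and the design effect of 2 is an assumed value for illustration only; in practice DEFF would come from a previous survey of the same indicator.

```python
import math


def sample_size_cluster(p, d, deff, z=1.96):
    """Sample size for a prevalence estimate under cluster sampling.

    p: estimated prevalence as a proportion
    d: desired precision (e.g. 0.05 for +/- 5%)
    deff: design effect (1.0 reproduces simple random sampling)
    z: Z value for the confidence level
    """
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if d <= 0 or deff <= 0:
        raise ValueError("d and deff must be positive")
    n = deff * z ** 2 * p * (1 - p) / d ** 2
    # Rounding up is an assumption of this sketch.
    return math.ceil(n)


n_srs = sample_size_cluster(p=0.3, d=0.05, deff=1.0)      # no clustering
n_cluster = sample_size_cluster(p=0.3, d=0.05, deff=2.0)  # assumed DEFF
print(n_srs, n_cluster)
```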
What is Design effect?
• Ratio of the actual variance from the sampling method used, to the hypothetical variance under simple random sampling
• For clustered sampling:
• N = # of samples | K = # of clusters | M = # of samples per cluster
• In practice Deff > 1 for cluster sampling, because clustering is almost never more efficient than simple random sampling; the reverse holds for stratified sampling, where Deff is typically below 1
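The definition of the design effect as a ratio of variances can be illustrated with a few lines of Python. The two variances below are made-up numbers, and the function names are hypothetical. The effective sample size n / DEFF is a standard companion quantity and is added here as an assumption; it does not appear in the original slides.

```python
def design_effect(var_design, var_srs):
    """Ratio of the variance under the actual design to the SRS variance."""
    if var_design < 0 or var_srs <= 0:
        raise ValueError("variances must be non-negative and var_srs positive")
    return var_design / var_srs


def effective_sample_size(n, deff):
    """Number of SRS respondents carrying the same information as n."""
    if deff <= 0:
        raise ValueError("deff must be positive")
    return n / deff


# Hypothetical variances of the same estimate under two designs.
deff = design_effect(var_design=0.0006, var_srs=0.0003)
n_eff = effective_sample_size(n=600, deff=deff)
print(deff, n_eff)
```

Read this way, a design effect of about 2 means that 600 clustered interviews are worth roughly 300 interviews drawn by simple random sampling.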
Design effect
Design effect increases when
• Key indicators are highly geographically clustered (e.g. water source, access to health care)
• The number of clusters is decreased and the size of the clusters is increased
To minimize design effect
• Include more clusters of smaller size
• Stratify the sample into more homogeneous groups
• All clusters should be of the same size
Example 1: Key indicator normally distributed, Food Consumption Score
Iraq Case Study
1. Choose the stratification (strata): 18 Governorates of Iraq
2. Define the population (N) of each stratum:
Governorate(s) Population 2015
Anbar 1723154
Babil 2008609
Najaf 1428979
Baghdad 7882807
Basrah 2822646
Diyala 1592434
Duhok 848524
Erbil 1650224
Kerbala 1183818
Kirkuk 1551670
Missan 1080392
Wassit 1340116
Muthanna 792339
Qadissiya 1254963
Ninewa 3397659
Salah al-Din 1551978
Sulaymaniyah 1858506
Thi-Qar 2035734
3. Key indicator: Food Consumption Score (FCS)
4. Calculate the mean (μ) and standard deviation (STDev) of the FCS (key indicator)
5. Decide on precision and confidence level:
• 90% confidence level
• Z=1.645
N.B. A confidence interval of 10% at 90% confidence is the absolute minimum!
The wider the CI, the worse the ability to detect trends!
The STDev was rounded so that it takes only two levels (17.5 and 20.5).
6. Initial total sample size (n): 2200 surveyed respondents
The required sample size in each governorate, calculated using equation (1.2) for a 5% confidence interval at 90% confidence (SRS 5%-90%), will be:
Governorate(s) Population FCS Mean FCS Stdev SRS 5%-90%
Anbar 1723154 75 20.5 323
Babil + Najaf 3437588 82.5 17.5 195
Baghdad 7882807 82.5 17.5 195
Basrah 2822646 85 17.5 184
Diyala 1592434 80 17.5 207
Duhok 848524 77.5 20.5 303
Erbil 1650224 80 17.5 207
Kerbala 1183818 82.5 17.5 195
Kirkuk 1551670 75 17.5 236
Missan + Wassit 2420508 80 17.5 207
Muthanna + Qadissiya 2047302 82.5 17.5 195
Ninewa 3397659 77.5 20.5 303
Salah al-Din 1551978 80 17.5 207
Sulaymaniyah 1858506 80 17.5 207
Thi-Qar 2035734 85 17.5 184
Iraq 36004552 3348
Babil and Najaf, Missan and Wassit, and Muthanna and Qadissiya are each treated as one combined stratum with summed populations, which gives 15 strata and the total of 3348.
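• After estimating the sample size in each governorate, rescale it to the total sample size of 2200 (n) using the following formula:
$$ n_h^{rescaled} = n_h \times \frac{n}{n_{IRAQ}} $$
where n_h is the SRS sample size of stratum h, n = 2200 and n_IRAQ = 3348 is the sum of the SRS sample sizes.
Simple Random Sampling
Governorate(s) SRS 5%-90% Given 5/90 -> scaling by 2200
Anbar 323 212
Babil + Najaf 195 128
Baghdad 195 128
Basrah 184 121
Diyala 207 136
Duhok 303 199
Erbil 207 136
Kerbala 195 128
Kirkuk 236 155
Missan + Wassit 207 136
Muthanna + Qadissiya 195 128
Ninewa 303 199
Salah al-Din 207 136
Sulaymaniyah 207 136
Thi-Qar 184 121
Iraq 3348 2199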
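The rescaling step can be reproduced with a short Python sketch. It is an illustration under two assumptions: the three governorate pairs that share one population figure in the table are treated as single strata (which is what makes the SRS column sum to 3348), and each rescaled value is rounded to the nearest integer. The helper name `rescale` is hypothetical.

```python
# SRS sample sizes per stratum, taken from the table above (15 strata).
srs = {
    "Anbar": 323, "Babil + Najaf": 195, "Baghdad": 195, "Basrah": 184,
    "Diyala": 207, "Duhok": 303, "Erbil": 207, "Kerbala": 195,
    "Kirkuk": 236, "Missan + Wassit": 207, "Muthanna + Qadissiya": 195,
    "Ninewa": 303, "Salah al-Din": 207, "Sulaymaniyah": 207, "Thi-Qar": 184,
}


def rescale(sizes, target_total):
    """Scale each stratum proportionally so the sizes sum to about target_total."""
    total = sum(sizes.values())
    return {name: round(n * target_total / total) for name, n in sizes.items()}


rescaled = rescale(srs, target_total=2200)
for name, n in rescaled.items():
    print(f"{name}: {n}")
print("Total:", sum(rescaled.values()))
```

Because every stratum is rounded separately, the total can miss the target by a few respondents; here it comes to 2199 rather than 2200.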
Example 2: Negative binomial distributed indicator, reduced Coping Strategy Index
Malawi Case Study
Stratified Random Sampling
1. Define the total population (N): see the Malawi total in the table below
2. Define the population in each stratum (Nh) (strata = 10 aggregated districts of Malawi):
District(s) Population
Blantyre-Mwanza-Neno-Balaka 1933263
Chikwawa-Nsanje 892772
Chiradzulu-Mulanje-Thyolo-Zomba-Phalombe 3170421
Dedza-Ntcheu 1406995
Dowa-Ntchisi-Kasungu-Mchinji 2322675
Lilongwe 2310728
Machinga-Mangochi 1608745
Mzimba-Karonga-Rumphi 1578519
Nkhata Bay-Chitipa-Likoma 514968
Nkhotakota-Salima 773482
Malawi Total 16512568
3. Key indicator: reduced Coping Strategy Index (rCSI)
4. Calculate the mean (μ) and standard deviation (STDev) of the rCSI
StDev = µ + µ²/n_h,baseline
where n_h,baseline = sample size of each stratum of the baseline, with h = 1, …, 10
5. Decide on precision and confidence level:
• 90% confidence level
• Z=1.645
N.B. A confidence interval of 10% at 90% confidence is the absolute minimum!
The wider the CI, the worse the ability to detect trends!
District(s) Population rCSI Mean rCSI Stdev
Blantyre-Mwanza-Neno-Balaka 1933263 12 4
Chikwawa-Nsanje 892772 16 5
Chiradzulu-Mulanje-Thyolo-Zomba-Phalombe 3170421 14 4
Dedza-Ntcheu 1406995 16 5
Dowa-Ntchisi-Kasungu-Mchinji 2322675 16 4
Lilongwe 2310728 12 4
Machinga-Mangochi 1608745 16 5
Mzimba-Karonga-Rumphi 1578519 14 4
Nkhata Bay-Chitipa-Likoma 514968 16 5
Nkhotakota-Salima 773482 14 4
Malawi Total 16512568 15 4
The STDev was rounded so that it takes only two levels (4 and 5).
6. Total Sample Size (n): 2000 surveyed respondents
The required sample size in each aggregated district, calculated using equation (1.2), will be:
Simple Random Sampling
District(s) Population rCSI Mean rCSI Stdev SRS 5%-90%
Blantyre-Mwanza-Neno-Balaka 1933263 12 4 481
Chikwawa-Nsanje 892772 16 5 423
Chiradzulu-Mulanje-Thyolo-Zomba-Phalombe 3170421 14 4 353
Dedza-Ntcheu 1406995 16 5 423
Dowa-Ntchisi-Kasungu-Mchinji 2322675 16 4 271
Lilongwe 2310728 12 4 481
Machinga-Mangochi 1608745 16 5 423
Mzimba-Karonga-Rumphi 1578519 14 4 353
Nkhata Bay-Chitipa-Likoma 514968 16 5 422
Nkhotakota-Salima 773482 14 4 353
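The SRS column above can be recomputed from equation (1.2). The sketch below is an illustration only: it assumes Z = 1.645 (90% confidence) and d = 0.025 (an interval 5% of the mean wide), rounds to the nearest integer, and uses the hypothetical function name `sample_size_mean`.

```python
def sample_size_mean(N, mu, sigma, d=0.025, z=1.645):
    """Equation (1.2): n = 1 / (1/N + (mu*d)^2 / (z^2 * sigma^2))."""
    return 1.0 / (1.0 / N + (mu * d) ** 2 / (z ** 2 * sigma ** 2))


# (district, population, rCSI mean, rCSI standard deviation) from the table.
strata = [
    ("Blantyre-Mwanza-Neno-Balaka", 1933263, 12, 4),
    ("Chikwawa-Nsanje", 892772, 16, 5),
    ("Chiradzulu-Mulanje-Thyolo-Zomba-Phalombe", 3170421, 14, 4),
    ("Dedza-Ntcheu", 1406995, 16, 5),
    ("Dowa-Ntchisi-Kasungu-Mchinji", 2322675, 16, 4),
    ("Lilongwe", 2310728, 12, 4),
    ("Machinga-Mangochi", 1608745, 16, 5),
    ("Mzimba-Karonga-Rumphi", 1578519, 14, 4),
    ("Nkhata Bay-Chitipa-Likoma", 514968, 16, 5),
    ("Nkhotakota-Salima", 773482, 14, 4),
]

sizes = {name: round(sample_size_mean(N, mu, sigma))
         for name, N, mu, sigma in strata}

for name, n in sizes.items():
    print(f"{name}: {n}")
print("Total:", sum(sizes.values()))
```

Nkhata Bay-Chitipa-Likoma comes out at 422 rather than 423 even though it shares its mean and standard deviation with three other strata: it has the smallest population, so the 1/N term is just large enough to change the rounded result.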
Rescaling it to the total sample size of 2000 individuals:
District(s) Population rCSI Mean rCSI Stdev SRS 5%-90% Given 5/90 -> scaling by 2000
Blantyre-Mwanza-Neno-Balaka 1933263 12 4 481 242
Chikwawa-Nsanje 892772 16 5 423 212
Chiradzulu-Mulanje-Thyolo-Zomba-Phalombe 3170421 14 4 353 177
Dedza-Ntcheu 1406995 16 5 423 212
Dowa-Ntchisi-Kasungu-Mchinji 2322675 16 4 271 136
Lilongwe 2310728 12 4 481 242
Machinga-Mangochi 1608745 16 5 423 212
Mzimba-Karonga-Rumphi 1578519 14 4 353 177
Nkhata Bay-Chitipa-Likoma 514968 16 5 422 212
Nkhotakota-Salima 773482 14 4 353 177
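Final sample size: applying human judgement
District(s) Given 5/90 -> scaling by 2000 Rounding/Applying Human Judgement*
Blantyre-Mwanza-Neno-Balaka 242 240
Chikwawa-Nsanje 212 210
Chiradzulu-Mulanje-Thyolo-Zomba-Phalombe 177 180
Dedza-Ntcheu 212 210
Dowa-Ntchisi-Kasungu-Mchinji 136 135
Lilongwe 242 245
Machinga-Mangochi 212 210
Mzimba-Karonga-Rumphi 177 180
Nkhata Bay-Chitipa-Likoma 212 210
Nkhotakota-Salima 177 180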
Number of Attempts – Malawi case
Given:
• The response rate (25%)
• Estimated sample size in each stratum
Desired attempts = estimated sample size / response rate
Aggregated Districts Rounding/Applying Human Judgement* Desired Attempts
Blantyre-Mwanza-Neno-Balaka 240 960
Chikwawa-Nsanje 210 840
Chiradzulu-Mulanje-Thyolo-Zomba-Phalombe 180 720
Dedza-Ntcheu 210 840
Dowa-Ntchisi-Kasungu-Mchinji 135 540
Lilongwe 245 980
Machinga-Mangochi 210 840
Mzimba-Karonga-Rumphi 180 720
Nkhata Bay-Chitipa-Likoma 210 840
Nkhotakota-Salima 180 720
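The number of attempts follows directly from dividing each final sample size by the response rate. The sketch below reproduces the table; the function name `desired_attempts` is hypothetical, and rounding up is an assumption that has no effect here because every division is exact.

```python
import math

RESPONSE_RATE = 0.25  # 25% response rate, as given for the Malawi case

# Final sample sizes after rounding / applying human judgement.
final_sizes = {
    "Blantyre-Mwanza-Neno-Balaka": 240,
    "Chikwawa-Nsanje": 210,
    "Chiradzulu-Mulanje-Thyolo-Zomba-Phalombe": 180,
    "Dedza-Ntcheu": 210,
    "Dowa-Ntchisi-Kasungu-Mchinji": 135,
    "Lilongwe": 245,
    "Machinga-Mangochi": 210,
    "Mzimba-Karonga-Rumphi": 180,
    "Nkhata Bay-Chitipa-Likoma": 210,
    "Nkhotakota-Salima": 180,
}


def desired_attempts(sample_size, response_rate):
    """Attempts needed to reach sample_size completed interviews."""
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be in (0, 1]")
    return math.ceil(sample_size / response_rate)


attempts = {name: desired_attempts(n, RESPONSE_RATE)
            for name, n in final_sizes.items()}

for name, a in attempts.items():
    print(f"{name}: {a}")
print("Total attempts:", sum(attempts.values()))
```

At a 25% response rate the 2000 completed interviews require 8000 attempts in total, so the response rate assumption drives the fieldwork budget as much as the sample size itself.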