Sample size estimation - VAM Resource Center

Sample size estimation

Steps for calculating the sample size per stratum

1. Choose the stratification (e.g. regions, districts, …)

2. Define the population (N) of each stratum

3. Decide on the key indicator(s)

4. Estimate the mean and variance, or the prevalence, of the key indicator

5. Decide on the precision and confidence level

6. Calculate the initial total sample size (n) according to the available budget/time

7. Use simple random sampling within each stratum to select your representative sample

Final sample size

Calculate sample size – random sample

To estimate sample size, you need to know:

• Estimate of the prevalence, or of the mean and STDev, of the key indicator (e.g. 30% poor food consumption)

• Precision desired (e.g. ± 5%)

• Level of confidence (e.g. 95%)

• Population (only if below 10,000; otherwise it will not influence the required sample size)

• Expected response rate (e.g. 90%)

• Number of eligible individuals per household (if applicable)

• Precision reflects the variability of the estimate; it is expressed as the half-width of the confidence interval (e.g. ± 5%)

• Confidence level is the proportion of repeated samples, all other things equal, whose confidence intervals would contain the true value

• Confidence interval is the interval around the estimate that corresponds to the chosen confidence level

• Choose the width of the confidence interval (precision) for a given confidence level

• As long as the target population is more than a few thousand households, it will not influence the required sample size. Only when it is smaller does the required sample size decrease slightly.
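The effect of population size can be checked numerically. The sketch below is a minimal Python illustration, assuming the correction takes the same form as the 1/N term in the continuous-variable formula of this document, n = 1 / (1/N + 1/n0), where n0 is the sample size for an infinite population. The prevalence (30%), precision (± 5%) and confidence level (95%) are the illustrative values listed above; the function names are hypothetical.

```python
def infinite_population_size(p: float, d: float, z: float = 1.96) -> float:
    """Sample size for prevalence p with precision +/- d (no population term)."""
    return z**2 * p * (1 - p) / d**2


def finite_population_size(n0: float, population: int) -> float:
    """Adjust n0 for a finite population N: n = 1 / (1/N + 1/n0)."""
    return 1 / (1 / population + 1 / n0)


if __name__ == "__main__":
    n0 = infinite_population_size(p=0.30, d=0.05)  # 30% prevalence, +/- 5%, 95%
    print(f"Infinite population: n0 = {n0:.1f}")
    for population in (500, 1_000, 5_000, 10_000, 100_000, 1_000_000):
        n = finite_population_size(n0, population)
        print(f"N = {population:>9,}: n = {n:6.1f} ({n / n0:.0%} of n0)")
```

With these inputs n0 is about 323; for N = 10,000 the corrected size is only about 3% smaller, while for N = 1,000 it drops by roughly a quarter, which is why the population matters only for small strata.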

Prevalence vs. mean

For the purpose of calculating sample size:

Treat variables as prevalence rates only when they are naturally binary

DO NOT threshold continuous indicators when calculating sample size, even if they are converted into prevalence indicators for analysis

e.g. the Food Consumption Score (FCS) is a continuous indicator (0-112), but it is thresholded at 21 and 35 to obtain the prevalence of poor and borderline food consumption

• Prevalence is the number of cases of a variable of interest (typically binary) within a population, divided by the total population (e.g. stunting, unemployment)

• Mean is the expected value of a variable of interest that is typically continuous or within a prescribed range for a given population (e.g. height, weight, age)

Choosing the right distribution

For continuous variables we must choose a probability distribution that best fits the data

Step 1: ALWAYS plot a histogram of past data and choose the distributional form!

Two of the most common ones encountered at WFP are:

• Normal distribution

• Negative binomial distribution

Prevalence rates always follow a binomial distribution, which is why they are mathematically easy to deal with.
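Besides the histogram, a quick numeric check helps decide between the two forms for a count-like indicator such as the rCSI: if the variance is much larger than the mean (overdispersion), a negative binomial is usually a better fit than a normal model. A minimal sketch using only Python's standard library follows; `past_scores` is made-up illustrative data, the 1.5 cutoff is an arbitrary rule of thumb chosen for the example, and the dispersion parameter k uses the standard method-of-moments relation Var = μ + μ²/k, which is not taken from this document.

```python
from statistics import mean, variance


def dispersion_summary(values: list[float]) -> dict[str, float]:
    """Mean, sample variance and variance-to-mean ratio of past indicator values."""
    mu = mean(values)
    var = variance(values)  # sample variance (n - 1 denominator)
    summary = {"mean": mu, "variance": var, "var_to_mean": var / mu}
    if var > mu:
        # Method-of-moments negative binomial dispersion: Var = mu + mu^2 / k
        summary["nb_k"] = mu**2 / (var - mu)
    return summary


if __name__ == "__main__":
    past_scores = [0, 2, 3, 5, 8, 8, 10, 12, 15, 18, 21, 25, 30, 41]  # hypothetical
    s = dispersion_summary(past_scores)
    print({key: round(value, 2) for key, value in s.items()})
    if s["var_to_mean"] > 1.5:  # illustrative cutoff, not a WFP rule
        print("Overdispersed: a negative binomial may fit better than a normal")
    else:
        print("No strong overdispersion: a normal approximation may be adequate")
```

This complements, rather than replaces, plotting the histogram.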

Sample size formula for prevalence (single survey using random sampling)

To calculate the sample size for an estimate of prevalence with 95% confidence limits:

$$n = \frac{1.96^2 \times P(1-P)}{d^2}$$

• 1.96 = Z value for 95% confidence limits

• P = estimated prevalence (e.g. 0.3 for 30%)

• P(1-P) = variance of a binary (binomial) variable

• d = ½ of the desired confidence interval width (e.g. 0.025 for an interval 5 percentage points wide, i.e. ± 2.5%)
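As a worked check of the formula, here is a minimal Python sketch. The Z table covers only a few common two-sided levels, and the result is rounded up to a whole respondent, which is a common conservative convention assumed here (the worked examples further on round to the nearest whole number). The function name is hypothetical.

```python
import math

# Two-sided Z values for common confidence levels
Z_VALUES = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}


def prevalence_sample_size(p: float, d: float, confidence: float = 0.95) -> int:
    """n = Z^2 * P(1 - P) / d^2, rounded up to a whole respondent.

    p: estimated prevalence (e.g. 0.3 for 30%)
    d: half-width of the confidence interval (e.g. 0.025 for an interval 5 points wide)
    """
    z = Z_VALUES[confidence]
    return math.ceil(z**2 * p * (1 - p) / d**2)


if __name__ == "__main__":
    print(prevalence_sample_size(p=0.30, d=0.05))   # 323  (+/- 5 percentage points)
    print(prevalence_sample_size(p=0.30, d=0.025))  # 1291 (+/- 2.5 percentage points)
```

Halving d roughly quadruples n (323 vs 1291), because d enters the formula squared.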

Sample size formula for continuous variable (single survey using random sampling)

To calculate sample size for estimate with 95% confidence limit:

$$n = \frac{1}{\dfrac{1}{N} + \dfrac{(\mu d)^2}{1.96^2\,\sigma^2}}$$

• 1.96 = Z value for 95% confidence limits

• μ = expected mean

• σ² = variance of the variable

• d = ½ of the desired confidence interval width, relative to the mean (e.g. 0.025 for an interval 5% of the mean wide)

• μ·d = absolute half-width of the confidence interval

• N = population of each stratum
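The continuous-variable formula translates directly into code. A minimal Python sketch, with hypothetical function and parameter names; note that d is relative to the mean, so μ·d is the absolute half-width.

```python
def mean_sample_size(mu: float, sigma: float, d: float, population: int,
                     z: float = 1.96) -> float:
    """n = 1 / (1/N + (mu*d)^2 / (z^2 * sigma^2)).

    mu: expected mean of the indicator
    sigma: standard deviation of the indicator
    d: half-width of the confidence interval relative to the mean (e.g. 0.025)
    population: N, the population of the stratum
    """
    half_width = mu * d  # absolute half-width of the confidence interval
    return 1 / (1 / population + half_width**2 / (z**2 * sigma**2))


if __name__ == "__main__":
    # Illustrative inputs: mean 75, SD 20.5, interval 5% of the mean wide
    for population in (1_000, 100_000, 10_000_000):
        n = mean_sample_size(mu=75, sigma=20.5, d=0.025, population=population)
        print(f"N = {population:>10,}: n = {n:.1f}")
```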

Sample size formula for cluster sampling

• To calculate the sample size for an estimate of prevalence with a 95% confidence interval, taking cluster sampling into account:

$$n = \mathrm{DEFF} \times \frac{1.96^2 \times P(1-P)}{d^2}$$

• DEFF = design effect

• 1.96 = Z value for α = 0.05, i.e. 95% confidence intervals

• P = estimated prevalence

• d = desired precision (e.g. 0.05 for ± 5%)
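The cluster-sampling version only multiplies the simple-random-sampling size by the design effect. A minimal sketch with a hypothetical function name; the DEFF of 2 used below is a hypothetical value chosen for illustration, not one given in this document.

```python
import math


def cluster_sample_size(p: float, d: float, deff: float, z: float = 1.96) -> int:
    """n = DEFF * Z^2 * P(1 - P) / d^2, rounded up to a whole respondent."""
    return math.ceil(deff * z**2 * p * (1 - p) / d**2)


if __name__ == "__main__":
    srs = cluster_sample_size(p=0.30, d=0.05, deff=1.0)  # no clustering
    clustered = cluster_sample_size(p=0.30, d=0.05, deff=2.0)
    print(srs, clustered)  # 323 646
```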

What is the design effect?

• Ratio of the actual variance under the sampling method used to the hypothetical variance under simple random sampling

• For cluster sampling: N = number of samples, K = number of clusters, M = number of samples per cluster (N = K × M)

• DEFF > 1 for cluster sampling, because it is practically never more efficient than simple random sampling; the reverse holds for stratified sampling, where DEFF is usually < 1

Design effect

Design effect increases when:

• Key indicators are highly geographically clustered (e.g. water source, access to health care)

• The number of clusters is decreased and the size of clusters is increased

To minimize the design effect:

• Include more clusters of smaller size

• Stratify the sample into more homogeneous groups

• Make all clusters the same size
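The effect of cluster size on the design effect can be illustrated with the widely used Kish approximation DEFF = 1 + (M − 1)·ρ, where M is the number of samples per cluster and ρ is the intra-cluster correlation of the indicator. This formula is a standard approximation supplied here as an assumption (it does not appear in this document), and the total N = 600 and ρ = 0.05 below are hypothetical values.

```python
def design_effect(samples_per_cluster: int, icc: float) -> float:
    """Kish approximation: DEFF = 1 + (M - 1) * ICC."""
    return 1 + (samples_per_cluster - 1) * icc


if __name__ == "__main__":
    total_samples = 600  # N (hypothetical)
    icc = 0.05           # intra-cluster correlation (hypothetical)
    for clusters in (20, 30, 60, 120):      # K
        m = total_samples // clusters       # M = N / K samples per cluster
        deff = design_effect(m, icc)
        effective_n = total_samples / deff  # equivalent SRS sample size
        print(f"K = {clusters:>3}, M = {m:>2}: DEFF = {deff:.2f}, "
              f"effective n = {effective_n:.0f}")
```

For the same 600 interviews, moving from 20 clusters of 30 to 120 clusters of 5 lowers the DEFF from 2.45 to 1.20, which puts the "more clusters of smaller size" advice into numbers.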

Example 1: Key indicator normally distributed: Food Consumption Score

Example 1: Iraq Case Study

1. Choose the stratification (strata): 18 Governorates of Iraq

2. Define the population (N) of each stratum:

3. Key indicator: Food Consumption Score (FCS)

Governorate(s) | Population (2015)
--- | ---
Anbar | 1723154
Babil | 2008609
Najaf | 1428979
Baghdad | 7882807
Basrah | 2822646
Diyala | 1592434
Duhok | 848524
Erbil | 1650224
Kerbala | 1183818
Kirkuk | 1551670
Missan | 1080392
Wassit | 1340116
Muthanna | 792339
Qadissiya | 1254963
Ninewa | 3397659
Salah al-Din | 1551978
Sulaymaniyah | 1858506
Thi-Qar | 2035734

4. Calculate the Mean (μ) and Standard deviation (STDev) of the FCS (Key Indicator)

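5. Decide on precision and confidence level:

• 90% confidence level

• Z = 1.645

• Precision: a confidence interval 5% wide (d = 0.025), used for the SRS sample sizes below

N.B. A confidence interval of 10% at 90% confidence is the absolute minimum precision; do not use a wider interval!

The wider the CI, the worse the ability to detect trends!

The STDev values were rounded so that there are just two levels (17.5 and 20.5).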

6. Initial total sample size (n): 2200 surveyed respondents

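The required sample size in each governorate, calculated using equation (1.2), is:

Governorate(s) | Population | FCS mean | FCS StDev | SRS (5% CI, 90% confidence)
--- | --- | --- | --- | ---
Anbar | 1723154 | 75 | 20.5 | 323
Babil | 3437588 | 82.5 | 17.5 | 195
Najaf | 3437588 | 82.5 | 17.5 | 195
Baghdad | 7882807 | 82.5 | 17.5 | 195
Basrah | 2822646 | 85 | 17.5 | 184
Diyala | 1592434 | 80 | 17.5 | 207
Duhok | 848524 | 77.5 | 20.5 | 303
Erbil | 1650224 | 80 | 17.5 | 207
Kerbala | 1183818 | 82.5 | 17.5 | 195
Kirkuk | 1551670 | 75 | 17.5 | 236
Missan | 2420508 | 80 | 17.5 | 207
Wassit | 2420508 | 80 | 17.5 | 207
Muthanna | 2047302 | 82.5 | 17.5 | 195
Qadissiya | 2047302 | 82.5 | 17.5 | 195
Ninewa | 3397659 | 77.5 | 20.5 | 303
Salah al-Din | 1551978 | 80 | 17.5 | 207
Sulaymaniyah | 1858506 | 80 | 17.5 | 207
Thi-Qar | 2035734 | 85 | 17.5 | 184
Iraq | 36004552 | | | 3348

Babil and Najaf, Missan and Wassit, and Muthanna and Qadissiya are treated as merged strata: the population shown for each is the combined total of the pair, and each merged pair is counted only once in the Iraq total of 3348.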
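The SRS column can be reproduced with the continuous-variable formula, using Z = 1.645 for 90% confidence and d = 0.025 (an interval 5% of the mean wide), and rounding to the nearest whole respondent. A minimal Python sketch, with the merged governorate pairs entered once under hypothetical combined names; the function and variable names are also hypothetical.

```python
# (population, FCS mean, FCS StDev) per stratum, from the table above
IRAQ_STRATA = {
    "Anbar": (1723154, 75, 20.5),
    "Babil + Najaf": (3437588, 82.5, 17.5),
    "Baghdad": (7882807, 82.5, 17.5),
    "Basrah": (2822646, 85, 17.5),
    "Diyala": (1592434, 80, 17.5),
    "Duhok": (848524, 77.5, 20.5),
    "Erbil": (1650224, 80, 17.5),
    "Kerbala": (1183818, 82.5, 17.5),
    "Kirkuk": (1551670, 75, 17.5),
    "Missan + Wassit": (2420508, 80, 17.5),
    "Muthanna + Qadissiya": (2047302, 82.5, 17.5),
    "Ninewa": (3397659, 77.5, 20.5),
    "Salah al-Din": (1551978, 80, 17.5),
    "Sulaymaniyah": (1858506, 80, 17.5),
    "Thi-Qar": (2035734, 85, 17.5),
}


def mean_sample_size(mu: float, sigma: float, d: float, population: int,
                     z: float) -> float:
    """n = 1 / (1/N + (mu*d)^2 / (z^2 * sigma^2))."""
    return 1 / (1 / population + (mu * d) ** 2 / (z**2 * sigma**2))


def srs_sizes(strata: dict, d: float = 0.025, z: float = 1.645) -> dict[str, int]:
    """SRS sample size per stratum, rounded to the nearest whole respondent."""
    return {name: round(mean_sample_size(mu, sd, d, pop, z))
            for name, (pop, mu, sd) in strata.items()}


if __name__ == "__main__":
    sizes = srs_sizes(IRAQ_STRATA)
    for name, n in sizes.items():
        print(f"{name:<22} {n:>4}")
    print(f"{'Iraq total':<22} {sum(sizes.values()):>4}")  # 3348
```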

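• After estimating the sample size in each governorate, rescale it to the total sample size of 2200 (n) using the following formula:

$$n_h^{\text{rescaled}} = n_h \times \frac{n}{n_{\text{IRAQ}}}$$

where n_h is the SRS sample size of stratum h and n_IRAQ = 3348 is the sum of the SRS sample sizes.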
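The rescaling step is a proportional allocation of the 2200 interviews. A minimal sketch using the SRS sizes from the table above (merged pairs entered once under hypothetical combined names, so they sum to n_IRAQ = 3348); rounding to the nearest integer is an assumption, so the rescaled sizes need not add up to exactly 2200.

```python
def rescale(sizes: dict[str, int], target_total: int) -> dict[str, int]:
    """n_h * n / n_total for every stratum, rounded to the nearest integer."""
    current_total = sum(sizes.values())
    return {name: round(n * target_total / current_total)
            for name, n in sizes.items()}


# SRS sizes (5% CI, 90% confidence) from the Iraq table
IRAQ_SRS = {
    "Anbar": 323, "Babil + Najaf": 195, "Baghdad": 195, "Basrah": 184,
    "Diyala": 207, "Duhok": 303, "Erbil": 207, "Kerbala": 195, "Kirkuk": 236,
    "Missan + Wassit": 207, "Muthanna + Qadissiya": 195, "Ninewa": 303,
    "Salah al-Din": 207, "Sulaymaniyah": 207, "Thi-Qar": 184,
}

if __name__ == "__main__":
    rescaled = rescale(IRAQ_SRS, target_total=2200)
    for name, n in rescaled.items():
        print(f"{name:<22} {IRAQ_SRS[name]:>4} -> {n:>4}")
    print("total:", sum(rescaled.values()))
```

With these inputs the rounded sizes add up to 2199, one short of the target, so a final manual adjustment (like the human-judgement step in the Malawi example) is still needed.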

Simple Random Sampling

Given 5/90 -> scaling by 2200

Example 2: Key indicator negative-binomially distributed: reduced Coping Strategy Index

Example 2: Malawi Case Study

1. Define the total population (N):

2. Define the population in each stratum (Nh) (strata = 10 aggregated districts of Malawi):

Stratified Random Sampling

District(s) | Population
--- | ---
Blantyre-Mwanza-Neno-Balaka | 1933263
Chikwawa-Nsanje | 892772
Chiradzulu-Mulanje-Thyolo-Zomba-Phalombe | 3170421
Dedza-Ntcheu | 1406995
Dowa-Ntchisi-Kasungu-Mchinji | 2322675
Lilongwe | 2310728
Machinga-Mangochi | 1608745
Mzimba-Karonga-Rumphi | 1578519
Nkhata Bay-Chitipa-Likoma | 514968
Nkhotakota-Salima | 773482
Malawi total | 16512568


3. Key indicator: reduced Coping Strategy Index (rCSI)

4. Calculate the mean (μ) and standard deviation (STDev) of the rCSI:

$$\mathrm{StDev} = \frac{\mu + \mu^2}{n_{h,\mathrm{baseline}}}$$

where n_h,baseline = sample size of each stratum of the baseline, with h = 1, …, 10

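5. Decide on precision and confidence level:

• 90% confidence level

• Z = 1.645

• Precision: a confidence interval 5% wide (d = 0.025), used for the SRS sample sizes below

N.B. A confidence interval of 10% at 90% confidence is the absolute minimum precision; do not use a wider interval!

The wider the CI, the worse the ability to detect trends!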

District(s) | Population | rCSI mean | rCSI StDev
--- | --- | --- | ---
Blantyre-Mwanza-Neno-Balaka | 1933263 | 12 | 4
Chikwawa-Nsanje | 892772 | 16 | 5
Chiradzulu-Mulanje-Thyolo-Zomba-Phalombe | 3170421 | 14 | 4
Dedza-Ntcheu | 1406995 | 16 | 5
Dowa-Ntchisi-Kasungu-Mchinji | 2322675 | 16 | 4
Lilongwe | 2310728 | 12 | 4
Machinga-Mangochi | 1608745 | 16 | 5
Mzimba-Karonga-Rumphi | 1578519 | 14 | 4
Nkhata Bay-Chitipa-Likoma | 514968 | 16 | 5
Nkhotakota-Salima | 773482 | 14 | 4
Malawi total | 16512568 | 15 | 4

The STDev values were rounded so that there are just two levels (4 and 5).

6. Total Sample Size (n): 2000 surveyed respondents

The required sample size in each aggregated district, calculated using equation (1.2), is:

Simple Random Sampling

District(s) | Population | rCSI mean | rCSI StDev | SRS (5% CI, 90% confidence)
--- | --- | --- | --- | ---
Blantyre-Mwanza-Neno-Balaka | 1933263 | 12 | 4 | 481
Chikwawa-Nsanje | 892772 | 16 | 5 | 423
Chiradzulu-Mulanje-Thyolo-Zomba-Phalombe | 3170421 | 14 | 4 | 353
Dedza-Ntcheu | 1406995 | 16 | 5 | 423
Dowa-Ntchisi-Kasungu-Mchinji | 2322675 | 16 | 4 | 271
Lilongwe | 2310728 | 12 | 4 | 481
Machinga-Mangochi | 1608745 | 16 | 5 | 423
Mzimba-Karonga-Rumphi | 1578519 | 14 | 4 | 353
Nkhata Bay-Chitipa-Likoma | 514968 | 16 | 5 | 422
Nkhotakota-Salima | 773482 | 14 | 4 | 353
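The Malawi SRS column follows from the same formula (Z = 1.645, d = 0.025, nearest-integer rounding). A minimal sketch with hypothetical names, which also shows the small effect of the 1/N term: Chikwawa-Nsanje and Nkhata Bay-Chitipa-Likoma share the same mean and STDev, but the smaller population of the latter brings its size down from 423 to 422.

```python
# (population, rCSI mean, rCSI StDev) per aggregated district, from the table above
MALAWI_STRATA = {
    "Blantyre-Mwanza-Neno-Balaka": (1933263, 12, 4),
    "Chikwawa-Nsanje": (892772, 16, 5),
    "Chiradzulu-Mulanje-Thyolo-Zomba-Phalombe": (3170421, 14, 4),
    "Dedza-Ntcheu": (1406995, 16, 5),
    "Dowa-Ntchisi-Kasungu-Mchinji": (2322675, 16, 4),
    "Lilongwe": (2310728, 12, 4),
    "Machinga-Mangochi": (1608745, 16, 5),
    "Mzimba-Karonga-Rumphi": (1578519, 14, 4),
    "Nkhata Bay-Chitipa-Likoma": (514968, 16, 5),
    "Nkhotakota-Salima": (773482, 14, 4),
}


def mean_sample_size(mu: float, sigma: float, d: float, population: int,
                     z: float) -> float:
    """n = 1 / (1/N + (mu*d)^2 / (z^2 * sigma^2))."""
    return 1 / (1 / population + (mu * d) ** 2 / (z**2 * sigma**2))


if __name__ == "__main__":
    for name, (pop, mu, sd) in MALAWI_STRATA.items():
        n = mean_sample_size(mu, sd, d=0.025, population=pop, z=1.645)
        # Unrounded value shows how close each stratum is to the rounding boundary
        print(f"{name:<42} {n:8.2f} -> {round(n):>4}")
```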

Rescaling to the total sample size of 2000 respondents:

District(s) | Population | rCSI mean | rCSI StDev | SRS (5% CI, 90% confidence) | Rescaled to n = 2000
--- | --- | --- | --- | --- | ---
Blantyre-Mwanza-Neno-Balaka | 1933263 | 12 | 4 | 481 | 242
Chikwawa-Nsanje | 892772 | 16 | 5 | 423 | 212
Chiradzulu-Mulanje-Thyolo-Zomba-Phalombe | 3170421 | 14 | 4 | 353 | 177
Dedza-Ntcheu | 1406995 | 16 | 5 | 423 | 212
Dowa-Ntchisi-Kasungu-Mchinji | 2322675 | 16 | 4 | 271 | 136
Lilongwe | 2310728 | 12 | 4 | 481 | 242
Machinga-Mangochi | 1608745 | 16 | 5 | 423 | 212
Mzimba-Karonga-Rumphi | 1578519 | 14 | 4 | 353 | 177
Nkhata Bay-Chitipa-Likoma | 514968 | 16 | 5 | 422 | 212
Nkhotakota-Salima | 773482 | 14 | 4 | 353 | 177

Final sample size: rounding / applying human judgement*

Number of Attempts – Malawi case

Given:

• The response rate (25%)

• The estimated sample size in each stratum

the desired number of attempts = estimated sample size / response rate

Aggregated districts | Rounded / human judgement*
--- | ---
Blantyre-Mwanza-Neno-Balaka | 240
Chikwawa-Nsanje | 210
Chiradzulu-Mulanje-Thyolo-Zomba-Phalombe | 180
Dedza-Ntcheu | 210
Dowa-Ntchisi-Kasungu-Mchinji | 135
Lilongwe | 245
Machinga-Mangochi | 210
Mzimba-Karonga-Rumphi | 180
Nkhata Bay-Chitipa-Likoma | 210
Nkhotakota-Salima | 180

Aggregated districts | Rounded / human judgement* | Desired attempts
--- | --- | ---
Blantyre-Mwanza-Neno-Balaka | 240 | 960
Chikwawa-Nsanje | 210 | 840
Chiradzulu-Mulanje-Thyolo-Zomba-Phalombe | 180 | 720
Dedza-Ntcheu | 210 | 840
Dowa-Ntchisi-Kasungu-Mchinji | 135 | 540
Lilongwe | 245 | 980
Machinga-Mangochi | 210 | 840
Mzimba-Karonga-Rumphi | 180 | 720
Nkhata Bay-Chitipa-Likoma | 210 | 840
Nkhotakota-Salima | 180 | 720
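The Desired Attempts column is the final sample size divided by the 25% response rate. A minimal sketch with a hypothetical function name; rounding up (after trimming floating-point noise) is an assumption that only matters when the division is not exact, which is not the case for any row here.

```python
import math


def desired_attempts(sample_size: int, response_rate: float) -> int:
    """Number of attempts needed so that sample_size responses are expected."""
    # round() trims floating-point noise (e.g. 99.99999999999999) before ceil()
    return math.ceil(round(sample_size / response_rate, 9))


# Final sample sizes after rounding / human judgement (Malawi table)
FINAL_SIZES = {
    "Blantyre-Mwanza-Neno-Balaka": 240,
    "Chikwawa-Nsanje": 210,
    "Chiradzulu-Mulanje-Thyolo-Zomba-Phalombe": 180,
    "Dedza-Ntcheu": 210,
    "Dowa-Ntchisi-Kasungu-Mchinji": 135,
    "Lilongwe": 245,
    "Machinga-Mangochi": 210,
    "Mzimba-Karonga-Rumphi": 180,
    "Nkhata Bay-Chitipa-Likoma": 210,
    "Nkhotakota-Salima": 180,
}

if __name__ == "__main__":
    for district, n in FINAL_SIZES.items():
        print(f"{district:<42} {n:>4} {desired_attempts(n, 0.25):>5}")
    print("total attempts:", sum(desired_attempts(n, 0.25) for n in FINAL_SIZES.values()))
```

With a 25% response rate, the 2000 planned interviews translate into 8000 attempts in total.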
