Posted on 18-Dec-2015


Multiple Indicator Cluster Surveys
Survey Design Workshop

Sampling: Advanced Sampling

MICS Survey Design Workshop

Major steps in designing MICS sample

• Define objectives
  – Key indicators
  – Desired level of precision
  – Sub-national domains of estimation
• Identify most appropriate sampling frame
  – Most recent census of population and housing
  – Master sample or sample for another survey conducted recently

Major steps in designing MICS sample

• Determine sample size and allocation
  – Determine availability of previous MICS or DHS results to provide measures of sampling parameters

Sampling Frame

• Requirements of a sampling frame:
  – Nationally representative
  – Complete coverage
  – Measures of size (households or population) for small area units

• Generally, the most recent census is the most effective sampling frame

Sampling Frame

• In some cases, a more recent pre-census listing may be available

• When no census is available, identify the most complete geographic frame available (e.g. a list of villages/localities with estimated population)

Sampling Frame

• Common problems with area frames:
  – Coverage issues
  – Census maps of poor quality
  – Errors and changes in area boundaries
  – Inappropriate type and size of area units
  – Lack of auxiliary information

SAMPLE SIZE DETERMINATION

• n is the required sample size (number of households)

• 4 is a factor to achieve the 95 percent level of confidence (4 ≈ 1.96²)

• r is the predicted or estimated value of the indicator in the target population

• deff is the design effect

• RR is the response rate

• pb is the proportion of the target subpopulation in the total population (upon which the indicator, r, is based)

• AveSize is the average household size (that is, average number of persons per household)

• e is the margin of error to be tolerated at the 95% level of confidence

• Note that currently e = 0.12r, i.e. 12% of r. Since e = 2 × standard error(r), the relative standard error of r is 6%.
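The parameters above combine into the standard MICS household sample size formula, given here as a reconstruction to check against the MICS sample size template. Reading it from the inside out: 4r(1 − r)/e² is the simple-random-sample size for persons in the target population; multiplying by deff accounts for the complex design; dividing by pb × AveSize (the expected number of target-population members per household) converts persons into households; and dividing by RR inflates the result for nonresponse.

```latex
n = \frac{4 \, r \, (1 - r) \, \mathrm{deff}}
         {e^{2} \times \mathrm{RR} \times \mathit{pb} \times \mathit{AveSize}},
\qquad e = 0.12\, r
```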

Previously in MICS2

• Two different values for the margin of error:
  – 5 percentage points for high values of r (over 25%)
  – 3 percentage points for low values of r (25% or less)
• Users found it difficult to decide on the sample size for their surveys

MICS template for sample size calculation - EXCEL FILE
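A minimal Python sketch of the calculation behind such a template, assuming it implements n = 4·r·(1 − r)·deff / (e²·RR·pb·AveSize) with e = 0.12r, i.e. the parameters listed under SAMPLE SIZE DETERMINATION. The function name `mics_sample_size` and the example inputs are hypothetical; they are not taken from the MICS spreadsheet.

```python
import math


def mics_sample_size(r, deff, rr, pb, ave_size, rel_margin=0.12):
    """Return the number of households needed to estimate indicator r.

    Hypothetical helper; assumed formula:
        n = 4 * r * (1 - r) * deff / (e**2 * rr * pb * ave_size),  e = rel_margin * r
    """
    if not 0 < r < 1:
        raise ValueError("r must be a proportion strictly between 0 and 1")
    if not (0 < rr <= 1 and 0 < pb <= 1):
        raise ValueError("rr and pb must be proportions in (0, 1]")
    e = rel_margin * r  # margin of error at 95% confidence (12% of r by default)
    n = 4 * r * (1 - r) * deff / (e ** 2 * rr * pb * ave_size)
    return math.ceil(n)  # round up to a whole number of households


# Illustrative inputs: r = 25%, deff = 1.5, RR = 90%,
# children 12-23 months = 3% of the population, 5 persons per household.
print(mics_sample_size(r=0.25, deff=1.5, rr=0.9, pb=0.03, ave_size=5.0))
```

With these illustrative inputs the sketch returns 9260 households. Because pb sits in the denominator, the indicator based on the smallest target population usually drives the sample size.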

Selection of key indicators

• Choose an important indicator that will yield the largest sample size

• Step 1: Select 2 or 3 target populations, each representing a small percentage of the total population (pb); typically:
  – Children 12-23 months: 2-4%
  – Children under 5 years: 7-20%

Selection of key indicators

• Step 2: Review important indicators for these target groups, but ignore indicators with very low or very high prevalence (less than 10% or over 40%, respectively)

• Among indicators for which a low value is desirable, do not choose one that is already acceptably low

• Do not choose childhood and maternal mortality ratios
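Steps 1 and 2 amount to computing the sample size for each eligible candidate and keeping the largest. The sketch below assumes the same household sample size formula (e = 0.12r) with hypothetical defaults deff = 1.5, RR = 0.9 and AveSize = 5; the candidate indicators and their values are invented for illustration.

```python
import math


def sample_size(r, pb, deff=1.5, rr=0.9, ave_size=5.0):
    """Households needed for indicator r (assumed MICS formula, e = 0.12 r)."""
    e = 0.12 * r
    return math.ceil(4 * r * (1 - r) * deff / (e ** 2 * rr * pb * ave_size))


def pick_key_indicator(candidates, low=0.10, high=0.40):
    """Return (name, n) for the eligible indicator needing the most households.

    candidates: list of (name, r, pb); indicators with prevalence outside
    [low, high] are ignored, as recommended in Step 2.
    """
    eligible = [c for c in candidates if low <= c[1] <= high]
    if not eligible:
        raise ValueError("no candidate indicator within the prevalence range")
    sizes = [(name, sample_size(r, pb)) for name, r, pb in eligible]
    return max(sizes, key=lambda item: item[1])


# Hypothetical candidates: (name, expected prevalence r, target population share pb)
candidates = [
    ("A: children 12-23 months", 0.30, 0.03),
    ("B: children under 5", 0.20, 0.12),
    ("C: rare outcome, under 5", 0.05, 0.12),  # excluded: prevalence below 10%
]
print(pick_key_indicator(candidates))
```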

Explicit Stratification

• Explicit stratification: dividing the sampling frame into sub-groups (called strata) of homogeneous (similar) PSUs.

• Advantages:
  – Better precision, because variance within a stratum is reduced given the similarity of units
  – Flexible design: sub-national estimates for smaller domains (differential sampling rates)
• Example of stratification: region, urban/rural

Implicit Stratification

• Sort the sampling frame according to characteristics such as region, urban-rural residence, sub-region, district, etc., then select a systematic sample with probability proportional to size (PPS).

• Ensures a representative sample for each subgroup

• Automatically provides proportional allocation by size of subgroup

Allocation of sample to strata/domains

• Proportional allocation
  – Effective for precision of estimates at the national level
• Equal allocation to each domain
  – Used when each domain requires the same level of precision
• Optimum allocation
  – Takes into account differential variance and costs by stratum
  – For example, variability may be higher in urban areas and enumeration costs may be higher in rural areas: use a higher sampling rate for urban areas
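The three allocation rules can be compared side by side. The slide gives no formula for optimum allocation, so this sketch uses the textbook cost-variance optimum for a fixed total sample size, n_h proportional to N_h·S_h/√c_h; the strata, standard deviations and unit costs are hypothetical.

```python
import math


def allocate(total_n, pop_sizes, method="proportional", sd=None, unit_cost=None):
    """Split total_n sample households across strata (unrounded).

    proportional: n_h proportional to N_h
    equal:        same n_h in every stratum/domain
    optimum:      n_h proportional to N_h * S_h / sqrt(c_h), where S_h is the
                  within-stratum standard deviation and c_h the cost per unit
    """
    if method == "proportional":
        shares = list(pop_sizes)
    elif method == "equal":
        shares = [1.0] * len(pop_sizes)
    elif method == "optimum":
        shares = [N * s / math.sqrt(c) for N, s, c in zip(pop_sizes, sd, unit_cost)]
    else:
        raise ValueError(f"unknown method: {method}")
    total = sum(shares)
    return [total_n * share / total for share in shares]


# Hypothetical urban/rural strata: urban more variable, rural costlier to enumerate.
households = [400_000, 600_000]  # urban, rural
print(allocate(6000, households))
print(allocate(6000, households, method="equal"))
print(allocate(6000, households, method="optimum", sd=[0.5, 0.4], unit_cost=[1.0, 4.0]))
```

Under the optimum rule the urban stratum gets 3750 households (a sampling rate of about 0.94%) against 2250 rural households (about 0.38%), i.e. the higher urban sampling rate described above.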

Subnational estimates

• The number of separate areas (domains) for which separate, equally reliable estimates are wanted affects the sample size

• For example, if 10 regional estimates are wanted, theoretically the sample should be increased by a factor of 10

• As a compromise, larger sampling errors are accepted for subnational estimates
  – One proposal (by Dr. Vijay Verma): increase the national sample size by a factor of $D^{0.65}$, where D is the number of domains
  – This results in an average increase in the sampling errors for domain estimates by a factor of about 1.5
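The arithmetic behind the $D^{0.65}$ compromise can be checked directly. This sketch assumes equal allocation to domains and the same design effect everywhere: each domain then receives n·D^0.65/D = n·D^(−0.35) households, and since a standard error scales with 1/√(sample size), a domain standard error is about D^0.175 times the national one. Function names are hypothetical.

```python
def verma_sample_factor(d):
    """Multiplier on the national sample size when d domains are wanted (d ** 0.65)."""
    return d ** 0.65


def domain_se_inflation(d):
    """Approximate ratio of a domain standard error to the national one.

    Assumes equal allocation and equal design effects: each domain gets
    n * d**-0.35 households, so the standard error grows by sqrt(d**0.35) = d**0.175.
    """
    return d ** 0.175


for d in (4, 10):
    print(d, round(verma_sample_factor(d), 2), round(domain_se_inflation(d), 2))
```

For 10 domains the sample grows by a factor of about 4.47 instead of 10, and domain standard errors are about 1.5 times the national one, in line with the figure quoted above.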

Sampling Stages

• Ideally, a two-stage sample design, with enumeration areas (EAs) defined as PSUs

• In some countries, only a frame of larger administrative units is available
  – Three-stage sample design: larger area units selected as PSUs
  – Necessary to delineate smaller segments in each sample PSU

Number of PSUs and Cluster Size

• Survey costs depend not only on the number of households but also on their distribution among primary sampling units (PSUs)

• Important to determine an effective balance between the number of sample PSUs and the number of sample households per cluster

• In general, the more PSUs the better for reliability but the greater the cost (mostly costs of travel and listing)

Number of PSUs and Cluster Size

• Example: 8000 households selected in 400 PSUs of 20 sample households each is a much more reliable sample than 200 PSUs of 40 households each, but more expensive

• Number of sample households per cluster should be as small as practical for reliability

• A range of 15-25 households for MICS appears to be effective

Design Effect (DEFF)

• Deff: ratio of the variance of an estimate based on a stratified multi-stage sample design to the corresponding variance from a simple random sample of the same size

• Measure of the relative efficiency of the sample design

• Effective stratification reduces the deff
• Cluster sampling increases the deff

Design Effect (DEFF)

• In the case of cluster sampling, deff generally measures the effect of clustering

• δ = intraclass correlation coefficient, a measure of homogeneity within clusters

• $\bar{m}$ = average number of households per cluster
• Design effect increases with intraclass correlation and cluster size:

$$\mathrm{deff} = 1 + \delta\,(\bar{m} - 1)$$
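Applying deff = 1 + δ(m̄ − 1) to the 8000-household example shows why 400 clusters of 20 beat 200 clusters of 40. The intraclass correlation δ = 0.05 is a hypothetical value; real values vary by indicator and country.

```python
def design_effect(delta, m_bar):
    """deff = 1 + delta * (m_bar - 1) for cluster sampling."""
    return 1 + delta * (m_bar - 1)


def effective_sample_size(n_households, m_bar, delta):
    """Size of a simple random sample that would give the same precision."""
    return n_households / design_effect(delta, m_bar)


delta = 0.05  # hypothetical intraclass correlation
for n_psus, m_bar in ((400, 20), (200, 40)):
    n = n_psus * m_bar  # 8000 households in both designs
    print(n_psus, m_bar, round(design_effect(delta, m_bar), 2),
          round(effective_sample_size(n, m_bar, delta)))
```

With δ = 0.05 the 400 × 20 design has deff ≈ 1.95 (effective size about 4100 households), while the 200 × 40 design has deff ≈ 2.95 (about 2700), for the same number of interviews.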

First Stage Selection of PSUs

• Standard methodology for MICS and other household surveys – select EAs or clusters systematically with PPS

• Important to sort frame before selection, in order to ensure effective implicit stratification

• Traditional procedure – cumulate measures of size, determine sampling interval and random start, generate selection numbers
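The traditional procedure can be sketched as follows: cumulate the measures of size, set the interval I = (total size)/(number of PSUs), draw a random start in (0, I], and select the PSU whose cumulated range contains each selection number. The frame is assumed to be sorted already for implicit stratification; the function name and example frame are hypothetical. Each PSU's selection probability is n × M_i / M (when that is at most 1).

```python
import random
from itertools import accumulate


def systematic_pps(sizes, n_psus, start=None, rng=random):
    """Systematic PPS selection from a frame that is already sorted.

    sizes: measures of size (e.g. census households) in frame order.
    Returns the number of 'hits' per PSU (0 = not selected).
    """
    cumulated = list(accumulate(sizes))          # cumulated measures of size
    interval = cumulated[-1] / n_psus            # sampling interval I
    if start is None:
        start = interval * (1 - rng.random())    # random start in (0, I]
    hits = [0] * len(sizes)
    j = 0
    for k in range(n_psus):
        selection_number = start + k * interval
        # PSU j is hit when cumulated[j-1] < selection_number <= cumulated[j]
        while j < len(sizes) - 1 and cumulated[j] < selection_number:
            j += 1
        hits[j] += 1
    return hits


# Hypothetical sorted frame of 5 EAs, 4 PSUs to select, fixed start for reproducibility
print(systematic_pps([100, 250, 50, 550, 50], n_psus=4, start=125))
```

Here the fourth EA (size 550, larger than the interval of 250) is hit twice, which is exactly the large-PSU case discussed next.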

Large sample PSUs in PPS sampling

• Sometimes a PSU may have a measure of size larger than the sampling interval

• PSU may be selected more than once in the systematic PPS selection

• Option 1 – if the PSU is selected two or more times, multiply the number of households to be selected by the number of “hits”

• Option 2 – separate the large PSUs and include in sample with a probability of 1
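For Option 2, removing one large PSU lowers the interval for the rest of the frame, which can push another PSU over the threshold, so the check is usually repeated. A minimal sketch under that assumption (hypothetical function name and frame):

```python
def certainty_psus(sizes, n_psus):
    """Indices of PSUs to include with probability 1 (Option 2).

    Repeats until no remaining PSU has a measure of size at least as large as
    the sampling interval recomputed on the remaining frame.
    """
    certain = set()
    while len(certain) < n_psus:
        remaining = [i for i in range(len(sizes)) if i not in certain]
        interval = sum(sizes[i] for i in remaining) / (n_psus - len(certain))
        new = {i for i in remaining if sizes[i] >= interval}
        if not new:
            break
        certain |= new
    return sorted(certain)


# Hypothetical frame: 5 EAs, 4 PSUs to select
print(certainty_psus([100, 240, 60, 550, 50], n_psus=4))
```

In this example the EA of size 550 exceeds the first interval (250); after it is removed, the interval for the remaining 3 selections drops to 150, so the EA of size 240 also becomes a certainty unit. The remaining 2 PSUs would then be selected by systematic PPS from the other EAs.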

MICS Sampling Option 1 – new sample with household listing

• Design new MICS sample
• Two stages with census as frame
• Use of implicit stratification, systematic selection of census EAs at first stage with PPS
• List households in selected EAs/segments
• Select households systematically from listing
• Interview selected households; no replacement will be allowed
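Systematic selection of a fixed number of households from the new listing can be sketched with a fractional interval, so that exactly the required number of distinct households is drawn. The take of 20 households per cluster and the listing size are hypothetical; the within-EA selection probability is then n_take / n_listed.

```python
import math
import random


def select_households(n_listed, n_take, start=None, rng=random):
    """Systematic selection of n_take households from a listing of n_listed.

    Returns 1-based listing line numbers, using a fractional sampling interval.
    """
    if n_take >= n_listed:
        return list(range(1, n_listed + 1))  # small EA: take every listed household
    interval = n_listed / n_take
    if start is None:
        start = interval * (1 - rng.random())  # random start in (0, interval]
    # min() guards against floating-point overshoot past the last listed line
    return [min(n_listed, math.ceil(start + k * interval)) for k in range(n_take)]


# Hypothetical EA with 137 listed households and a fixed take of 20
print(select_households(137, 20, start=3.2))
```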

Sampling Option 1 - continued

• Advantages of option 1
  – simple design
  – probability-based
  – if possible, self-weighting (national level)

• Limitations of option 1
  – expense of listing households
  – time necessary to list households

[Example: a sample size of 5000 households may require 25000 to 50000 households to be listed]

MICS Sampling Option 2 – use an existing sample

• Design MICS as a rider to another survey if timely and feasible

• Use sample from a previous survey and re-interview households for MICS

• Or, use old survey sample EAs and construct new listing of households to select for MICS

• Old sample must be probability-based, national in scope

• Possibilities – DHS, other national health survey, recent labour force survey

• Important: design parameters must be known (such as selection probability, stratification, etc.)

Sampling option 2 - continued

• Use of existing master sampling frame
• Some countries use a master sample design for intercensal national household surveys
• Master samples are generally sufficiently large for MICS; a subsample of PSUs can be selected
• Advantage: updated maps may be available for the master sample of PSUs, and perhaps an updated listing

Sampling option 2 - continued

• Advantages of using previous sample
  – cost savings
  – maps available for interviewers
  – appropriate sampling plan available
  – simplicity

• Limitations of using old sample
  – burden on respondents
  – sample design may need modification:
    * sample size
    * sub-national coverage
    * number of PSUs or clusters

• Balance between loss and gain

Listing and Selection of Households

• Household listing manual is available
• Importance of new listing to represent current population
• Problems with using previous listing (older than 1 year)
  – Does not represent newer households
  – Distribution of sample population by age group distorted, generally with higher median age
  – Difficulty of finding households in old list

Listing and Selection of Households

• MICS recommends a separate household listing operation
  – More reliable, as listing staff are less likely than interviewers to bias the sample by excluding households that are difficult to reach
  – Allows household selection to be done in a single central location using reliable and uniform procedures

Listing and Selection of Households

• Household selection in the office:
  – Advantages: conducted by specialized staff, possible to avoid selection bias in the field, possible to control overall sample size
  – Disadvantage: increased costs from having two field visits

• Selection in the field: use household selection table
  – Advantage: cost savings of having one integrated field operation
  – Disadvantage: correct sampling may be difficult for field staff, selection may be biased

Listing and Selection of Households

• Excel template for automatically generating the sample of households based on the number of households listed (see spreadsheet)

• Common problems found in listing operations
  – Poor quality of sketch maps, making segment boundaries difficult to determine
  – Sometimes large differences found between the number of households in the frame (census) and the number listed

Sampling strategy for low fertility countries

• In MICS4 and MICS5, some low-fertility countries are using second-stage stratification of the household listing into households with and without children under 5

• Higher sampling rate used for households with children

• Increases number of households with children in MICS sample, and therefore number of sample children

Sampling strategy for low fertility countries (continued)

• Improves the reliability of the child indicators without increasing the sample size to a very high level

• This procedure also increases the variability in the weights and the design effects for the overall sample

• Important to avoid very large variability in the weights for households with and without children
  – Differential weights between households with and without children generally should not exceed a factor of about 4

Implications of sampling strategy on sample size calculations

• One parameter in the sample size calculation template is the proportion of the indicator subpopulation

• Using a higher sampling rate for households with children increases the proportion of children under 5 in the sample

• The proportion of children under 5 (or smaller age groups) should be multiplied by a factor that reflects the increase in sample households with children
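One simple way to derive the factor, sketched here as an assumption rather than the template's own rule: take the share of sample households that contain a child under 5 under the differential sampling rates, divide it by the same share in the listing, and multiply pb by that ratio. AveSize is left unchanged, so this is an approximation. Function name and inputs are hypothetical.

```python
def adjusted_pb(pb, share_hh_with_children, rate_with, rate_without):
    """Inflate pb for a design that oversamples households with children under 5.

    share_hh_with_children: proportion of listed households with a child under 5.
    rate_with / rate_without: sampling rates (only their ratio matters).
    Assumes AveSize in the sample size formula is left unchanged (approximation).
    """
    with_children = share_hh_with_children * rate_with
    without_children = (1 - share_hh_with_children) * rate_without
    share_in_sample = with_children / (with_children + without_children)
    return pb * share_in_sample / share_hh_with_children


# Hypothetical: 25% of households have a child under 5 and are sampled at twice the rate
print(adjusted_pb(0.05, 0.25, rate_with=2.0, rate_without=1.0))
```

In this example households with children rise from 25% of the listing to 40% of the sample, so pb for children under 5 rises from 5% to about 8%.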

Implications of sampling strategy on weighting procedures

• Under normal MICS sample design, weights vary by sample cluster

• With second stage stratification by households with and without children, two weights need to be calculated for each cluster: for households with and without children
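A sketch of the two design weights for one cluster, assuming PPS selection at the first stage (probability = number of PSUs × cluster size / total size) and systematic selection within each listing stratum at the second stage. The function name and all numbers are hypothetical. Within a cluster, the ratio of the two weights equals the ratio of the two second-stage sampling rates.

```python
def cluster_design_weights(n_psus, mos_cluster, mos_total,
                           listed_with, taken_with, listed_without, taken_without):
    """Design weights for households with and without children in one cluster.

    First stage: PPS selection, P1 = n_psus * mos_cluster / mos_total.
    Second stage: systematic selection within each listing stratum.
    """
    p1 = n_psus * mos_cluster / mos_total
    w_with = 1 / (p1 * taken_with / listed_with)
    w_without = 1 / (p1 * taken_without / listed_without)
    return w_with, w_without


# Hypothetical cluster: 40 listed households with children (15 taken),
# 110 without (10 taken); 300 PSUs selected from a frame of 600,000 households.
w_with, w_without = cluster_design_weights(300, 150, 600_000, 40, 15, 110, 10)
ratio = w_without / w_with
print(round(w_with, 2), round(w_without, 2), round(ratio, 3),
      "above 4" if ratio > 4 else "within 4")
```

The ratio here is 4.125, slightly above the factor of about 4 recommended above, which suggests oversampling households with children a little less in this cluster.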

Survey weighting procedures

• Survey data are collected using a complex design featuring clustering, unequal probabilities of selection and stratification:
  – All analyses must apply survey weights in order to prevent biased results
• Formulas for calculating weights depend on the exact sample design used in each country
• MICS has 4 sets of weights: households, women, men and children

Survey weighting procedures

• Components of MICS survey weights:
  – Design weight: inverse of the final probability of selection for households
  – Adjustment factors for nonresponse (cluster, household, woman, child level)
• Weights are normalized so that the total weighted number of observations equals the total unweighted number (sample size)
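A minimal sketch of household weights built from these components, assuming a household nonresponse adjustment at cluster level (selected / responding households) followed by normalization. Woman, man and child weights would add their own response adjustments in a similar way. The function name and figures are hypothetical, and every cluster is assumed to have at least one responding household.

```python
def normalized_household_weights(design_weights, selected, responding):
    """Household weights per cluster: design weight x nonresponse adjustment, normalized.

    design_weights[c]: inverse selection probability for households in cluster c
    selected[c], responding[c]: households selected / interviewed in cluster c
    """
    raw = [w * s / r for w, s, r in zip(design_weights, selected, responding)]
    weighted_total = sum(w * r for w, r in zip(raw, responding))
    scale = sum(responding) / weighted_total  # makes weighted total = unweighted total
    return [w * scale for w in raw]


# Hypothetical two-cluster example
weights = normalized_household_weights([100, 200], selected=[20, 20], responding=[20, 16])
print(weights)
```

The second cluster's weight is inflated by 20/16 for nonresponse before normalization; after scaling, the 36 responding households carry a weighted total of 36.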

Survey weighting procedures

Sampling Error Estimation

• Necessary to evaluate reliability of survey estimates
• Possible only when probability sampling is used
• Should be done for 30-50 important indicators
• Methodology is complex and design-specific
• Several software packages:
  – SPSS Complex Samples module (used in MICS)
  – SAS, Stata, SUDAAN, Clusters, WesVar, CENVAR, PCCarp, etc.
• Outputs: standard errors, confidence intervals and DEFF

Sampling Error Estimation SPSS Complex Samples module

• Advantages:
  – Simple to use
  – Template syntax available for standard indicators
  – Supported by MICS Global and Regional staff

• Steps:
  – Set up sampling parameter specifications file (csplan)
  – Define variables for stratum, PSU and weight

Sampling Error Estimation SPSS Complex Samples module

• Stratum should be lowest level of explicit stratification (for example, province, urban/rural)

• Necessary to have minimum of two sample PSUs per stratum
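The two-PSUs-per-stratum requirement comes from how the variance is estimated: PSU totals are compared with their stratum mean, which needs at least two values. The sketch below uses Taylor linearization for a weighted proportion with PSUs treated as selected with replacement within strata, broadly the kind of estimator such packages compute; it is not the SPSS implementation, and its deff uses a simple r(1 − r)/n SRS variance, which packages may compute slightly differently. Names and data are hypothetical.

```python
import math
from collections import defaultdict


def proportion_with_se(records):
    """Weighted proportion, linearized standard error and approximate deff.

    records: iterable of (stratum, psu, weight, y), with y = 1 if the person has
    the characteristic and 0 otherwise (eligible population only).
    PSUs are treated as selected with replacement within strata.
    """
    records = list(records)
    total_w = sum(w for _, _, w, _ in records)
    r = sum(w * y for _, _, w, y in records) / total_w

    # Linearized values summed to PSU level: z_hi = sum(w * (y - r)) / total_w
    z = defaultdict(float)
    for stratum, psu, w, y in records:
        z[(stratum, psu)] += w * (y - r) / total_w

    by_stratum = defaultdict(list)
    for (stratum, _psu), value in z.items():
        by_stratum[stratum].append(value)

    variance = 0.0
    for stratum, values in by_stratum.items():
        n_h = len(values)
        if n_h < 2:
            raise ValueError(f"stratum {stratum!r} has fewer than two sample PSUs")
        mean_h = sum(values) / n_h
        variance += n_h / (n_h - 1) * sum((v - mean_h) ** 2 for v in values)

    se = math.sqrt(variance)
    srs_variance = r * (1 - r) / len(records)  # undefined when r is 0 or 1
    return r, se, variance / srs_variance


# Hypothetical toy data: 2 strata x 2 PSUs x 4 persons, equal weights
data = (
    [("A", "a1", 1.0, y) for y in (1, 1, 1, 0)]
    + [("A", "a2", 1.0, y) for y in (0, 0, 0, 1)]
    + [("B", "b1", 1.0, y) for y in (1, 1, 0, 0)]
    + [("B", "b2", 1.0, y) for y in (1, 0, 0, 0)]
)
r, se, deff = proportion_with_se(data)
print(f"r = {r:.4f}, SE = {se:.4f}, deff = {deff:.2f}")
```

The toy data give r = 0.4375, SE ≈ 0.1398 and deff ≈ 1.27; a 95% confidence interval is then roughly r ± 2 SE, the same factor of 2 behind the 4 in the sample size formula.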

Reducing bias

• Accuracy of survey results depends on both variance and bias (mostly from nonsampling errors)
• Bias should be minimized with quality control for all survey operations
• Basic data quality determined during enumeration
  – Important to have good training and supervision in the field
• Data capture should include 100% or sample verification
• Important to have quality control for editing and coding procedures
• Computer consistency and range checks
