survey sampling and weights

Upload: jessica-angelina

Post on 03-Jun-2018

TRANSCRIPT

  • 8/12/2019 Survey Sampling and Weights

    1/58

    Survey sampling

    Sampling & non-sampling error

    Bias

    Simple sampling methods

    Sampling terminology

    Cluster sampling

    Design effect

    Stratified sampling

    Sampling weights

  • 8/12/2019 Survey Sampling and Weights

    2/58

    Why sample?

    To make an inference about a population

    Studying the entire population is impractical or impossible

  • 8/12/2019 Survey Sampling and Weights

    3/58

    Example of sampling

    Estimate the proportion of adults,

    ages 18-65, in Port Elizabeth that

    have type 2 diabetes

    Select a sample from which to

    estimate the proportion

    Population: adults aged 18-65 living

    in Port Elizabeth

    Inference: proportion with type 2

    diabetes

  • 8/12/2019 Survey Sampling and Weights

    4/58

    Probability sampling

    Each individual has a known (non-zero) probability of selection

    Precision of estimates can be quantified

  • 8/12/2019 Survey Sampling and Weights

    5/58

    Non-probability sampling

    Cheaper, more convenient

    Quality of estimates cannot be assessed

    May not be representative of

    population

  • 8/12/2019 Survey Sampling and Weights

    6/58

    Sampling error

    v.

    Non-sampling error

  • 8/12/2019 Survey Sampling and Weights

    7/58

    Sampling error

    Random variability in sample

    estimates that arises out of the

    randomness of the sample selection

    process

    Precision can be quantified

    (estimation of standard errors,

    confidence intervals)

  • 8/12/2019 Survey Sampling and Weights

    8/58

    Non-sampling error

    Estimation error that arises from

    sources other than random variation

    non-response

    undercoverage of survey

    poorly-trained interviewers

    non-truthful answers

    non-probability sampling

    This type of error is a bias

  • 8/12/2019 Survey Sampling and Weights

    9/58

    What is bias?

    We want to estimate the mean weight of all women aged 15-44 living in Coopersville.

    Suppose there are 50,000 such women and

    the true mean weight is 61.7 kg.

    We select a sample of 200 such women and

    interview them, asking each woman what

    her weight is.

    The sample mean weight is 59.4 kg.

    Is our estimate biased?

  • 8/12/2019 Survey Sampling and Weights

    10/58

    Bias

    Suppose we could repeat the survey

    many, many times.

    Then we compute the mean of all the sample means.

    Say the mean of the means = 62.9 kg

    Bias = (mean of means) - (true mean)

    = 62.9 - 61.7 = 1.2 kg
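
    The bias calculation can be sketched in a few lines of Python. The five sample means below are hypothetical values, chosen only so that their mean equals the 62.9 kg quoted on the slide; the function name is likewise an illustration, not part of the original material.

```python
from statistics import mean

def estimate_bias(sample_means, true_mean):
    """Bias = (mean of the sample means) - (true mean)."""
    return mean(sample_means) - true_mean

# Hypothetical sample means (kg) from five repetitions of the survey,
# chosen so that their mean is 62.9 kg as on the slide.
sample_means = [62.1, 63.4, 62.9, 63.0, 63.1]
true_mean = 61.7  # true mean weight (kg) from the slide

bias = estimate_bias(sample_means, true_mean)
print(f"bias = {bias:.1f} kg")
```

    In practice the survey cannot be repeated many times and the true mean is unknown, so the bias is usually reasoned about rather than computed this way.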

  • 8/12/2019 Survey Sampling and Weights

    11/58

    Unbiased estimation

    If . . .

    (mean of the means) = (true mean)

    then the bias is zero, and we say that

    the estimator is unbiased.

    The mean of the means is called the expected value of the estimator.

  • 8/12/2019 Survey Sampling and Weights

    12/58

    Simple sampling methods

    Task: Select a sample of n

    individuals or items from a

    population of N individuals or

    items

    Common methods

    simple random sampling

    systematic sampling

  • 8/12/2019 Survey Sampling and Weights

    13/58

    Simple sampling methods

    Simple random sampling (SRS)

    each item in population is equally likely

    to be selected

    each combination of n items is equally

    likely to be selected

    Systematic sampling (typical method)

    randomly select a starting point

    select every kth item thereafter
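
    A simple random sample can be drawn with Python's standard library, as in the minimal sketch below. The population of 100 numbered items, the sample size of 10, and the seed are assumptions made for illustration only.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct items; every subset of size n is equally likely."""
    rng = random.Random(seed)  # the seed is only there for reproducibility
    return rng.sample(list(population), n)

# Hypothetical population: items numbered 1..100
population = range(1, 101)
sample = simple_random_sample(population, 10, seed=2019)
print(sorted(sample))
```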

  • 8/12/2019 Survey Sampling and Weights

    14/58

    Systematic sampling example

    Stack of 213 hospital admission forms; select a

    sample of 15

    213/15 = 14.2

    Select every 14th form

    Starting point: random number between 1 and 14

    (we choose 11)

    First form selected is 11th from top

    Second form selected is 25th from top (11 + 14 = 25)

    Third form selected is 39th from top (11 + 2x14 = 39)

    And so forth . . .
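
    The selection above can be reproduced with a short sketch. It assumes the forms are numbered 1 to 213 from the top of the stack and that selection continues to the end of the stack; the function name is hypothetical.

```python
def systematic_sample(n_items, n_target, start):
    """Select every k-th item, where k = n_items // n_target."""
    interval = n_items // n_target  # 213 // 15 = 14
    if not 1 <= start <= interval:
        raise ValueError("start must lie between 1 and the interval")
    # Walk down the stack from the starting point in steps of `interval`.
    # Depending on the start, this can yield one item more than n_target
    # (here: starts 1 to 3 would give 16 forms instead of 15).
    return list(range(start, n_items + 1, interval))

selected = systematic_sample(n_items=213, n_target=15, start=11)
print(selected)
```

    Because the starting point is the only random choice, each form's chance of selection depends only on whether its position matches that starting point.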

  • 8/12/2019 Survey Sampling and Weights

    15/58

    Systematic sampling, continued

    What is the probability that the 146th

    form will be selected? The 195th?

    Does this qualify as a simple random

    sample? Why or why not?

    Is there any potential problem arising from the use of systematic sampling

    in this situation?

  • 8/12/2019 Survey Sampling and Weights

    16/58

    Example was typical

    quick method

    In the preceding example, we

    selected every 14th form

    Ideally, we would select every 14.2th form (see later example on 2-stage

    sample of nurses)

    Example is a quick and easy method,

    commonly used in the field; it is a

    good approximation to the more

    rigorous procedure

  • 8/12/2019 Survey Sampling and Weights

    17/58

    Systematic sampling: + and -

    Advantages of systematic sampling

    typically simpler to implement than SRS

    can provide a more uniform coverage

    Potential disadvantage of systematic

    sampling

    can produce a bias if there is a systematic pattern in the sequence of

    items from which the sample is selected

  • 8/12/2019 Survey Sampling and Weights

    18/58

    Role of simple sampling methods

    These simple sampling methods are

    necessary components of more

    complex sampling methods:

    cluster sampling

    stratified sampling

    We'll discuss these more complex methods next (following some

    definitions)

  • 8/12/2019 Survey Sampling and Weights

    19/58

    Definitions

    Listing units (or enumeration units)

    the lowest level sampled units (e.g.,

    households or individuals)

    PSUs (primary sampling units)

    the first units sampled (e.g., states or

    regions)

    Sampling probability

    for any unit eligible to be sampled, the

    probability that the unit is selected in

    the sample

  • 8/12/2019 Survey Sampling and Weights

    20/58

    More definitions

    EPSEM sampling

    equal probability of selection method,

    thus a method in which each listing unit has the same sampling probability

    Sampling frame

    the set of items from which sampling is done, often a list of items.

  • 8/12/2019 Survey Sampling and Weights

    21/58

    More definitions

    Undercoverage: the degree to which

    we fail to identify all eligible units in

    the population

    incomplete lists

    incomplete or incorrect eligibility

    information

  • 8/12/2019 Survey Sampling and Weights

    22/58

    Still more definitions

    Non-response: failure to interview

    sampled listing units (study subjects)

    refusal

    death

    physician refusal

    inability to locate subject

    unavailability

  • 8/12/2019 Survey Sampling and Weights

    23/58

    Still more definitions

    Precision: the amount of random

    error in an estimate

    often measured by the width or half-width of the confidence interval

    standard error is another measure of

    precision

    estimates with smaller standard error or

    narrower CI are said to be more precise
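
    As an illustration of these two measures, the sketch below computes the standard error and the 95% confidence-interval half-width for an estimated proportion. It assumes simple random sampling from a large population and the usual normal approximation; the proportion 0.12 and the sample sizes are hypothetical.

```python
import math

def proportion_precision(p_hat, n, z=1.96):
    """Return (standard error, CI half-width) for a proportion under SRS."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    half_width = z * se  # z = 1.96 gives an approximate 95% interval
    return se, half_width

# Hypothetical estimate of 12% from samples of two different sizes
se_small, hw_small = proportion_precision(0.12, 400)
se_large, hw_large = proportion_precision(0.12, 1600)

print(f"n=400:  SE={se_small:.4f}, half-width={hw_small:.4f}")
print(f"n=1600: SE={se_large:.4f}, half-width={hw_large:.4f}")
```

    Quadrupling the sample size halves the standard error, so the second estimate is the more precise one.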

  • 8/12/2019 Survey Sampling and Weights

    24/58

    CLUSTER SAMPLING

    single stage

  • 8/12/2019 Survey Sampling and Weights

    25/58

    Clusters

    Subsets of the listing units in the

    population

    Set of clusters must be mutually exclusive and collectively exhaustive

    counties

    townships

    regions

    institutions

  • 8/12/2019 Survey Sampling and Weights

    26/58

    Example

    Single-stage cluster sampling

    There are 361 nurses working at the

    31 hospitals and clinics in Region 4

    We wish to interview a sample of

    these nurses

    select a simple random sample of 5

    hospitals/clinics

    interview all nurses employed at the 5

    selected institutions

  • 8/12/2019 Survey Sampling and Weights

    27/58

    Assessing the example

    Hospitals/clinics are the PSUs

    Nurses are the listing units

    Sampling probability for each nurse

    is 5/31

    Thus, this is an EPSEM sample

    Sampling frame is the list of 31

    hospitals and clinics

  • 8/12/2019 Survey Sampling and Weights

    28/58

    CLUSTER SAMPLING

    two stage

  • 8/12/2019 Survey Sampling and Weights

    29/58

    Cluster sampling -- two stage

    Select a sample of clusters, as in the

    single-stage method

    From each selected cluster, select a

    subsample of listing units

  • 8/12/2019 Survey Sampling and Weights

    30/58

    Cluster sampling -- two stage

    It is always nice to do EPSEM

    sampling because such samples are

    self-weighting

    don't need sampling weights in analysis

    A common EPSEM method for two-

    stage sampling is PPS (probability

    proportional to size)

  • 8/12/2019 Survey Sampling and Weights

    31/58

    PPS sampling

    The key to the method is that the

    sampling probabilities of clusters in

    the first stage are proportional to the sizes of the clusters

    size = number of listing units in cluster

    At stage 2, select the same number of listing units from each selected

    cluster

  • 8/12/2019 Survey Sampling and Weights

    32/58

    Nurse example revisited

    Two-stage sampling

    We want to interview a sample of 36

    nurses

    We can afford to visit 9 different

    hospitals/clinics

    Thus, we need to interview 36/9 = 4

    nurses at each institution

  • 8/12/2019 Survey Sampling and Weights

    33/58

    Nurse example revisited

    Two-stage sampling

    Stage 1: select a sample of 9

    hospitals/clinics

    Selection prob. proportional to size

    Stage 2: select a sample of 4 nurses

    from each selected institution

    At each stage, use one of the simple

    sampling methods

  • 8/12/2019 Survey Sampling and Weights

    34/58

    Nurse example revisited

    Two-stage sampling

    PSUs are the hospitals/clinics

    Listing units are the nurses

    Sampling frames

    Stage 1: List of 31 hospitals/clinics

    Stage 2: Lists of nurses at each selected hospital/clinic

  • 8/12/2019 Survey Sampling and Weights

    35/58

    Selecting 2-stage nurse sample

    Sampling interval, I = 361/9 = 40.1

    Starting point, random number between 1 and 40; we choose R = 14

    First sampling number = R = 14

    2nd sampling number = 14 + 1x40.1 = 54.1

    3rd sampling number = 14 + 2x40.1 = 94.2

    We have selected institutions 2, 5, 9, . . .

  • 8/12/2019 Survey Sampling and Weights

    36/58

    Two-stage nurse sample

    Institution   No. of   Cumulative   Sampling
    Number        Nurses   Nurses       Number

    1             12        12
    2              7        19          14
    3              9        28
    4             18        46
    5             11        57          54.1
    6              7        64
    7             10        74
    8             14        88
    9              8        96          94.2
    ...
    31             9       361

    Total        361

  • 8/12/2019 Survey Sampling and Weights

    37/58

    Applying the sampling numbers

    For each sampling number, choose

    the first unit with cumulative size

    equal to or greater than the sampling

    number

    Example: sampling number 54.1

    first unit with cumulative size ≥ 54.1 is unit 5 (cum. no. of nurses = 57)

    so we select unit 5 for the sample
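
    The rule for applying sampling numbers can be sketched as follows. Only the first nine institution sizes are shown in the table, so the example is limited to those nine and to the first three sampling numbers; the function name is hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate

def select_by_sampling_numbers(sizes, sampling_numbers):
    """For each sampling number, pick the first unit whose cumulative
    size is equal to or greater than that number (units numbered from 1)."""
    cumulative = list(accumulate(sizes))
    selected = []
    for number in sampling_numbers:
        index = bisect_left(cumulative, number)  # first cumulative >= number
        if index == len(cumulative):
            raise ValueError("sampling number exceeds total size")
        selected.append(index + 1)
    return selected

# Nurse counts for institutions 1-9 from the table (the rest are not shown)
sizes = [12, 7, 9, 18, 11, 7, 10, 14, 8]
interval = 40.1  # 361/9, rounded as on the slide
start = 14       # random start R chosen on the slide
sampling_numbers = [start + k * interval for k in range(3)]

selected = select_by_sampling_numbers(sizes, sampling_numbers)
print(selected)
```

    Larger institutions span a wider range of cumulative values, which is what makes their selection probability proportional to size.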

  • 8/12/2019 Survey Sampling and Weights

    38/58

    Optional challenge

    What is the selection probability for institution 1?

    12/40.1 = 0.299

    What is the selection probability for a nurse in institution 1?

    (12/40.1) x (4/12) = 0.0997 = 36/361

    What is the selection probability for a nurse in

    institution 2?

    (7/40.1) x (4/7) = 0.0997 = 36/361

    All nurses have the same selection probability.
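
    The cancellation behind this result can be checked with exact fractions. The sketch below assumes the unrounded sampling interval 361/9 and a cluster that has at least 4 nurses and is no larger than the interval; the names are illustrative.

```python
from fractions import Fraction

TOTAL_NURSES = 361
N_CLUSTERS = 9
TAKE_PER_CLUSTER = 4
INTERVAL = Fraction(TOTAL_NURSES, N_CLUSTERS)  # exact sampling interval

def nurse_selection_probability(cluster_size):
    """P(cluster selected) x P(nurse selected within the cluster)."""
    p_cluster = Fraction(cluster_size) / INTERVAL
    p_within = Fraction(TAKE_PER_CLUSTER, cluster_size)
    return p_cluster * p_within  # the cluster size cancels out

# Institutions 1, 2 and 4 from the table have 12, 7 and 18 nurses
probabilities = {size: nurse_selection_probability(size) for size in (12, 7, 18)}
for size, p in probabilities.items():
    print(size, p, float(p))
```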

  • 8/12/2019 Survey Sampling and Weights

    39/58

    Why do cluster sampling instead

    of a simple sampling method?

    Advantages

    reduced logistical costs (e.g., travel)

    list of all 361 nurses may not be available

    (reduces listing labor)

    Disadvantages

    estimates are less precise

    analysis is more complicated (requires

    special software)

  • 8/12/2019 Survey Sampling and Weights

    40/58

    Design effect

    Relative increase in variance of an

    estimate due to the sampling design

    variance = (standard error)^2

    Formula

    s1 = standard error under simple

    random sampling

    s2 = standard error under complex

    sampling design (e.g., cluster sampling)

    design effect = (s2/s1)^2
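
    The formula translates directly into code. In the sketch below the two standard errors are hypothetical values chosen for illustration, not results from any survey in these slides.

```python
def design_effect(se_complex, se_srs):
    """Design effect = (s2 / s1)^2, the ratio of the two variances."""
    return (se_complex / se_srs) ** 2

# Hypothetical standard errors for the same estimate and sample size
s1 = 0.016  # under simple random sampling
s2 = 0.024  # under the complex design (e.g., cluster sampling)

deff = design_effect(s2, s1)
print(f"design effect = {deff:.2f}")
```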

  • 8/12/2019 Survey Sampling and Weights

    41/58

    Design effect for cluster sampling

    For cluster sampling designs, the

    design effect is typically > 1

    This means that estimates from a survey done with cluster sampling

    are less precise than corresponding

    estimates obtained from a survey having the same sample size done

    with simple random sampling

  • 8/12/2019 Survey Sampling and Weights

    42/58

    Cluster sizes

    Recommended take per cluster is 20-40 for multi-purpose surveys

    Time and resource limitations will

    often dictate the maximum number of clusters you can include in the study

    Including more clusters improves the

    precision of your estimates more

    than a corresponding increase in

    sample size within the clusters

    already in the sample

  • 8/12/2019 Survey Sampling and Weights

    43/58

    STRATIFIED

    SAMPLING

  • 8/12/2019 Survey Sampling and Weights

    44/58

    Strata

    Subsets of the listing units in the population

    Set of strata must be mutually

    exclusive and collectively exhaustive

    Strata are often based on

    demographic variables

    age

    sex

    race

  • 8/12/2019 Survey Sampling and Weights

    45/58

    Stratified sampling

    Sample from each stratum

    Often, sampling probabilities vary across strata

  • 8/12/2019 Survey Sampling and Weights

    46/58

    Stratified sampling

    Advantages

    guarantees coverage across strata

    can over-sample some strata in order to obtain

    precise within-stratum estimates

    typically, design effect < 1

    Disadvantages

    with unequal sampling probabilities, sampling

    weights must be included in analysis

    analysis is more complicated

    requires special software

  • 8/12/2019 Survey Sampling and Weights

    47/58

    Example: sampling breast cancer

    cases for the Women's CARE Study

    Stratification variables

    geographic site

    race (2 races)

    five-year age group

    Over-sampled younger women

    Over-sampled black women

  • 8/12/2019 Survey Sampling and Weights

    48/58

    Example: Sampling households

    for a reproductive health survey in 11 refugee camps in Pakistan

    Selected simple random sample of households from within each of the

    11 camps

    All households were selected with the same probability

  • 8/12/2019 Survey Sampling and Weights

    49/58

    Refugee camp sampling

    Camp            Population   Sample Size   Completed Interviews

    Lakhte Banda        12,943            64                     61
    Kotki 1              7,262            36                     29
    Kotki 2              5,781            29                     21
    Kata Kanra           8,437            42                     38
    Mohd Khoja          12,791            63                     45
    Doaba               13,584            67                     25
    Darsamand           17,797            88                     53
    Kahi                11,061            55                     32
    Naryab               5,543            28                     19
    Thal 1              11,087            55                     44
    Thal 2              17,130            85                     60
    Dallan              10,990            55                     45

    Total              134,406           667                    472
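
    Two quantities can be read off this table: the ratio of sample size to population in each camp, and the completion (response) rate. The sketch below computes both from the table values. It assumes nothing about whether the population column counts persons or households, so the ratio is only a relative check that the allocation is roughly proportional.

```python
# (population, sample size, completed interviews) per camp, from the table
camps = {
    "Lakhte Banda": (12943, 64, 61),
    "Kotki 1": (7262, 36, 29),
    "Kotki 2": (5781, 29, 21),
    "Kata Kanra": (8437, 42, 38),
    "Mohd Khoja": (12791, 63, 45),
    "Doaba": (13584, 67, 25),
    "Darsamand": (17797, 88, 53),
    "Kahi": (11061, 55, 32),
    "Naryab": (5543, 28, 19),
    "Thal 1": (11087, 55, 44),
    "Thal 2": (17130, 85, 60),
    "Dallan": (10990, 55, 45),
}

# Sample size relative to population, and completion rate, per camp
sampling_ratio = {c: n / pop for c, (pop, n, done) in camps.items()}
response_rate = {c: done / n for c, (pop, n, done) in camps.items()}

total_sampled = sum(n for _, n, _ in camps.values())
total_completed = sum(done for _, _, done in camps.values())
overall_response = total_completed / total_sampled
lowest = min(response_rate, key=response_rate.get)

for camp in camps:
    print(f"{camp:<13} ratio={sampling_ratio[camp]:.4f} "
          f"response={response_rate[camp]:.2f}")
print(f"overall response rate = {overall_response:.3f}; lowest in {lowest}")
```

    The near-constant ratio is consistent with an equal-probability design, while the varying completion rates are what the later non-response adjustments address.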

  • 8/12/2019 Survey Sampling and Weights

    50/58

    The sampling operation

    Must be carefully controlled

    don't leave to discretion in the field

    use a carefully defined procedure

    Document what you did

    for reference during analysis

    to defend your study

  • 8/12/2019 Survey Sampling and Weights

    51/58

    Sampling frames

    A list containing all listing units is

    great if you can get it

    ok if it includes some ineligibles

    Problems associated with geographic

    location-based sampling

    map-based sampling

    EPI sampling

  • 8/12/2019 Survey Sampling and Weights

    52/58

    Sampling weights

    Inverse of the net sampling

    probability

    Interpretation: the sampling weight

    for a sampled individual is the

    number of individuals his/her data

    represent

  • 8/12/2019 Survey Sampling and Weights

    53/58

    Example--sampling weights

    There are 150 employees in a firm

    stratum 1: 50 employees aged 18-29

    stratum 2: 100 employees aged 30-69

    We sample 10 from each stratum

    Sampling probabilities are

    stratum 1: 10/50 = 0.20

    stratum 2: 10/100 = 0.10
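
    Applying the definition of a sampling weight (the inverse of the sampling probability) to these two strata gives the sketch below. The dictionary layout and function name are illustrative choices.

```python
# (stratum size, number sampled) from the example
strata = {
    "stratum 1 (aged 18-29)": (50, 10),
    "stratum 2 (aged 30-69)": (100, 10),
}

def sampling_weight(stratum_size, n_sampled):
    """Weight = 1 / sampling probability."""
    probability = n_sampled / stratum_size
    return 1 / probability

weights = {name: sampling_weight(size, n) for name, (size, n) in strata.items()}

# Each sampled person "represents" this many employees; the weights of all
# sampled people should add back up to the 150 employees in the firm.
represented = sum(weights[name] * n for name, (_, n) in strata.items())

for name, w in weights.items():
    print(f"{name}: weight = {w:.0f}")
print(f"employees represented = {represented:.0f}")
```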

  • 8/12/2019 Survey Sampling and Weights

    54/58

  • 8/12/2019 Survey Sampling and Weights

    55/58

    What about non-response?

    1 employee in the stratum 1 sample

    and 3 employees in the stratum 2

    sample refuse to participate in the

    survey

    Net sampling probabilities

    stratum 1: 9/50 = 0.18

    stratum 2: 7/100 = 0.07

  • 8/12/2019 Survey Sampling and Weights

    56/58

    Revised sampling weights

    Sampling weights revised for non-

    response

    stratum 1: 1/0.18 = 5.56

    stratum 2: 1/0.07 = 14.29

    This computation is often done by

    multiplying the original sampling weights by adjustment factors to

    account for non-response rates
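
    The adjustment-factor route gives the same revised weights. In the sketch below the original weights 5 and 10 are the inverses of the sampling probabilities 0.20 and 0.10 from the earlier example, and the adjustment factor is taken to be (number sampled) / (number responding), which is one common choice rather than the only one.

```python
def nonresponse_adjusted_weight(base_weight, n_sampled, n_responded):
    """Multiply the original weight by the inverse of the response rate."""
    adjustment = n_sampled / n_responded
    return base_weight * adjustment

# Stratum 1: 10 sampled, 9 responded; stratum 2: 10 sampled, 7 responded
w1 = nonresponse_adjusted_weight(5.0, n_sampled=10, n_responded=9)
w2 = nonresponse_adjusted_weight(10.0, n_sampled=10, n_responded=7)

print(f"stratum 1: {w1:.2f}")
print(f"stratum 2: {w2:.2f}")
```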

  • 8/12/2019 Survey Sampling and Weights

    57/58

    Post-stratification weighting

    Define strata, which may or may not have

    been used as strata in the sampling design

    Compute sampling probabilities = proportion

    of each stratum that was actually sampled

    Compute sampling weights from these

    sampling probabilities

    Allows post-hoc treatment of unequal representation of population segments in

    the sample
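
    The steps above can be sketched as follows. The post-strata and all counts are hypothetical; in a real survey the population counts would come from an external source such as a census.

```python
# Hypothetical post-strata: known population counts and respondent counts
population_counts = {"women": 5200, "men": 4800}
respondent_counts = {"women": 60, "men": 40}

# Sampling probability = proportion of each stratum actually sampled;
# post-stratification weight = inverse of that probability.
probabilities = {s: respondent_counts[s] / population_counts[s]
                 for s in population_counts}
weights = {s: 1 / probabilities[s] for s in population_counts}

# Check: the weighted respondents reproduce the population total.
weighted_total = sum(weights[s] * respondent_counts[s] for s in weights)

for s in weights:
    print(f"{s}: probability = {probabilities[s]:.4f}, weight = {weights[s]:.2f}")
print(f"weighted total = {weighted_total:.0f}")
```

    Here men are under-represented among respondents (40% of the sample against 48% of the population), so each man receives a larger weight than each woman.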

  • 8/12/2019 Survey Sampling and Weights

    58/58

    Discussion topics

    What is the population of interest?

    Infinite populations

    Selecting random numbers

    Selecting simple random samples

    from finite populations

    from infinite populations

    Analysis software for complex

    surveys