sampling for research studies - wordpress.com · 10/3/2016 · systematic sampling •is a type of...
10/10/2016
Sampling for Research Studies
Introduction to Biostatistics
Haleema Masud
Background on Sampling
• Concepts
– Definitions
– Sampling methods
– Choice of the right design
• Calculations (2nd semester and later in this semester)
– Sampling error
– Design effect
– Sample size
Learning Approach
What
Why
Who
Where
When
What Is Sampling?
Sampling is a process by which we select/study a small part of a population to make judgments about the entire population.
Sampling involves selecting a number of units from a defined population.
Important Statistical terms
Population
a set of entities which includes all measurements of interest to the researcher (The collection of all responses, measurements, or counts that are of interest)
Sample:
A subset of the population
A Representative Sample
A representative sample has all the important
characteristics of the study population from
which it is drawn.
What is a sample?
“A part of a population, or a subset from a set of units, which is provided by some process or other, usually by deliberate selection with the object of investigating the properties of the parent population or set.”
A Dictionary of Statistical Terms, by Maurice G. Kendall and William R. Buckland, International Statistical Institute, 1957.
Definitions
• Target population
– The complete collection of objects we would like to say something about
• Sample
– A subset of the population
• Sampled population
– The population from which the sample was chosen
• Sampling frame
– List of sampling units
– Examples: households, street addresses, villages
Definitions
• Observation unit or element
– We collect data on these objects or people
• Sampling unit
– Unit actually sampled (can be different from observation unit)
– For example
• Households as sampling units
• Individuals as observation units
SAMPLING…….
TARGET POPULATION
STUDY POPULATION
SAMPLE
SAMPLING BREAKDOWN
What makes a "good" sample?
A RUSsIaN sample:
–Representative
–Unbiased
–Sampling Error can be quantified
–Maximum Information for minimum cost
–Non-sampling error is corrected for as much as possible.
Representativeness
A sample is representative if what we are studying (characteristic) is present in the
sampled population in the same proportion as in the target population
Unbiasedness
• Responses are not likely to systematically over- or underestimate the true population parameters
• Can be linked to representativeness
Sampling Error
• Expressed by standard error
– of mean, proportion, differences, etc
• No sample is the exact mirror image of the population
• Magnitude of error can be measured in probability samples
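A minimal sketch of how the standard error could be computed for a mean and for a proportion. It assumes a simple random sample and ignores any finite population correction; the function names and the numbers are hypothetical and only for illustration.

```python
import math
import statistics

def standard_error_of_mean(values):
    """Estimated standard error of a sample mean: s / sqrt(n)."""
    n = len(values)
    return statistics.stdev(values) / math.sqrt(n)

def standard_error_of_proportion(p, n):
    """Estimated standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

# Hypothetical example: an observed prevalence of 30% at two sample sizes.
se_100 = standard_error_of_proportion(0.30, 100)
se_400 = standard_error_of_proportion(0.30, 400)
print(se_100, se_400)
```

Under these assumptions, quadrupling the sample size halves the standard error, which is why the sampling error shrinks more and more slowly as the sample grows.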
Non-Sampling Error
Due to problems in design or conduct
– Measurement Bias (systematic over- or underreporting)
– Selection Bias (part of target population not in sampled population)
– Questionnaire Design
– Interview Bias
– Processing Error
– Non-response
– Undercoverage
Sampling Error vs Sample Size
Sample Size, Sampling and Non-Sampling Error
Learning Approach
What
Why
How
When
Why sampling?
Get information about large populations
Lower cost
Less field time
More accuracy, i.e. can do a better job of data collection
When it’s impossible to study the whole population
Why do we use samples?
Get information
– At minimal cost
– At maximum speed
– At increased accuracy
Why do we use samples?
• Large population
• Large area
• To reduce the costs/demands on personnel
• To reduce the length of the study
• To avoid response bias
• In case of a study with a destructive “test” (for example autopsy on mice)
• Need to know uncertainty or variation in target population
When do we NOT use samples?
• Very rare events/diseases
• Very detailed study for category or area
• Legal records required for every individual in target population
What
Why
How
When
Learning Approach
Sampling Process
1. Defining the population
2. Deciding the sampling and observation units
3. Specifying a sampling frame
4. Specifying a sampling method
5. Determining the sample size
6. Implementing the sampling plan
7. Sampling and collecting data
8. Reviewing the sampling process
Types of sampling methods
• Non-probability samples
• Probability samples
PROBABILITY SAMPLING
• A probability sampling scheme is one in which every unit in the population has a chance (greater than zero) of being selected in the sample, and this probability can be accurately determined.
• When every element in the population does have the same probability of selection, this is known as an 'equal probability of selection' (EPS) design.
Probability samples
Simple random sampling
Systematic sampling
Stratified sampling
Cluster sampling
Multi-stage sampling
Simple random sampling
• Principle
– Equal chance of drawing each sample of size n
– Random selection
• Procedure
– Enumerate all units
– Randomly draw units
Example: evaluate the prevalence of tooth decay among the 1200 children attending a school
• List of children attending the school
• Children numbered from 1 to 1200
• Sample size = 100 children
• Random sampling of 100 numbers between 1 and 1200
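The tooth decay example could be sketched as follows, assuming the children are simply identified by their list numbers. The seed value is arbitrary and is there only so the draw can be repeated.

```python
import random

# Sampling frame: the children, numbered 1 to 1200.
population = list(range(1, 1201))

rng = random.Random(2016)  # arbitrary seed (an assumption, not part of the example)

# Draw 100 distinct numbers; every set of 100 children is equally likely.
sample = sorted(rng.sample(population, k=100))
print(sample[:10])
```

Sampling is without replacement, so no child can be selected twice.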
How to randomly select?
Simple random sampling
Simple random sampling
Lottery method
Table of random numbers
57172 42088 70098 11333 26902 29959 43909 49607
33883 87680 28923 15659 09839 45817 89405 70743
77950 67344 10609 87119 15859 74577 42791 75889
11607 11596 01796 24498 17009 67119 00614 49529
56149 55678 38169 47228 49931 94303 67448 31286
80719 65101 77729 83949 83358 75230 56624 27549
93809 19505 82000 79068 45552 86776 48980 56684
40950 86216 48161 17646 24164 35513 94057 51834
12182 59744 65695 83710 41125 14291 74773 66391
13382 48076 73151 48724 35670 38453 63154 58116
38629 94576 48859 75654 17152 66516 78796 73099
60728 32063 12431 23898 23683 10853 04038 75246
01881 99056 46747 08846 01331 88163 74462 14551
23094 29831 95387 23917 07421 97869 88092 72201
15243 21100 48125 05243 16181 39641 36970 99522
53501 58431 68149 25405 23463 49168 02048 31522
07698 24181 01161 01527 17046 31460 91507 16050
22921 25930 79579 43488 13211 71120 91715 49881
68127 00501 37484 99278 28751 80855 02035 10910
55309 10713 36439 65660 72554 77021 46279 22705
92034 90892 69853 06175 61221 76825 18239 47687
50612 84077 41387 54107 09190 74305 68196 75634
81415 98504 32168 17822 49946 37545 47201 85224
38461 44528 30953 08633 08049 68698 08759 45611
07556 24587 88753 71626 64864 54986 38964 83534
60557 50031 75829 05622 30237 77795 41870 26300
I need 100 numbers between 1 and 1200
1) I select one random starting point
2) I decide the direction (down) and which digits to read (the last 4 digits)
3) I keep the numbers whose last 4 digits are between 0001 and 1200
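As an illustration of the three steps, the sketch below reads down the first column of the table, looks at the last 4 digits of each entry and keeps the values between 1 and 1200. Starting at the top of the first column is an assumption made for the example; in practice the starting point is chosen at random. The function name is hypothetical.

```python
# First column of the random number table, read from top to bottom.
first_column = [
    "57172", "33883", "77950", "11607", "56149", "80719", "93809",
    "40950", "12182", "13382", "38629", "60728", "01881", "23094",
    "15243", "53501", "07698", "22921", "68127", "55309", "92034",
    "50612", "81415", "38461", "07556", "60557",
]

def read_table(entries, low, high, digits=4):
    """Keep the last `digits` digits of each entry when they fall in [low, high]."""
    selected = []
    for entry in entries:
        value = int(entry[-digits:])
        # Skip values outside the range and values already selected.
        if low <= value <= high and value not in selected:
            selected.append(value)
    return selected

picked = read_table(first_column, 1, 1200)
print(picked)  # [719, 950, 728, 612, 557]
```

Only 5 of the 26 entries qualify, so reaching 100 numbers means continuing down the following columns in the same way.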
Using Excel for Random Numbers
• =randbetween(5,10) generates a random whole number between 5 and 10
• =randbetween(1,N) generates a random whole number between 1 and the total population size (N)
OR
• =trunc(rand()*(b-a+1)+a,0)
where you want to generate a random whole number greater than or equal to a, and less than or equal to b.
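For readers without Excel, a rough Python equivalent of the two approaches is sketched below. The function names are hypothetical, and the second function assumes a is not negative, since truncation rounds toward zero.

```python
import math
import random

def randbetween(a, b):
    """Random whole number from a to b, both ends included."""
    return random.randint(a, b)

def trunc_formula(a, b):
    """Scale a uniform draw in [0, 1) to [a, b + 1), then truncate."""
    return math.trunc(random.random() * (b - a + 1) + a)

print(randbetween(5, 10), trunc_formula(5, 10))
```

Multiplying by the number of possible values (b - a + 1) and then adding a is what gives every whole number from a to b the same chance.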
Simple random sampling
• Advantages
– Simple (implementation, calculation)
– Sampling error easily measured
– Simple random sampling is always an EPS design, but not all EPS designs are simple random sampling.
• Disadvantages
– Need complete list of units
– Does not always achieve representativeness
– Selected units may be scattered in a large area (geographic problem)
– Minority subgroups of interest in population may not be present in sample in sufficient numbers for study.
Distributing 5 candies in the class
Simple random sampling
Systematic sampling
• A type of probability sampling in which sample members are selected from a larger population according to a random starting point and a fixed periodic interval.
Systematic Sampling with Equal Probability
• Numbered list of all possible units
• # units ÷ desired sample size = sampling interval
– For example, to select 20 health centers from a list of 46, the sampling interval is 46/20 = 2.3
• Random # (between 0 and 1) × sampling interval = random start
– For example, if the random number is 0.183, calculate 0.183 × 2.3 = 0.421, which rounds upward to 1
• Round number up to choose sample unit
• Add sampling interval to random start for subsequent units
– 0.421 + 2.3 = 2.721 or Facility 3
– 2.721 + 2.3 = 5.021 or Facility 6
– 5.021 + 2.3 = 7.321 or Facility 8
– and so forth
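The health center example can be reproduced with a short sketch. The function name is hypothetical; when no random number is supplied, one is drawn from (0, 1] so that rounding up never gives unit 0, which is an assumption added here.

```python
import math
import random

def systematic_sample(n_units, sample_size, random_number=None):
    """Positions (1-based) selected with a fractional sampling interval."""
    interval = n_units / sample_size            # e.g. 46 / 20 = 2.3
    if random_number is None:
        random_number = 1.0 - random.random()   # value in (0, 1]
    start = random_number * interval            # random start
    # Add the interval repeatedly and round each running total up.
    return [min(n_units, math.ceil(start + k * interval))
            for k in range(sample_size)]

facilities = systematic_sample(46, 20, random_number=0.183)
print(facilities[:4])  # [1, 3, 6, 8]
```

The running total is kept unrounded and only the selected position is rounded up, so the fractional interval does not drift along the list.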
Sampling to Study Drug Use
Systematic Sampling with Probability Proportional To Size
• List where the units are sorted in decreasing order by some measure of size (like population or number of visits)
• Calculate the cumulative total
• Cumulative total ÷ sample size = sampling interval
• Random # × sampling interval = random start
• Choose the first unit whose cumulative total is at least the random start
• Add sampling interval to previous total for subsequent units.
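A minimal sketch of these steps, using made-up facility sizes that are already sorted in decreasing order. The function name and all numbers are hypothetical.

```python
import bisect
from itertools import accumulate

def pps_systematic(sizes, sample_size, random_number):
    """Return 0-based indexes of the units hit by each selection point."""
    cumulative = list(accumulate(sizes))        # running total of sizes
    interval = cumulative[-1] / sample_size     # sampling interval
    start = random_number * interval            # random start
    targets = [start + k * interval for k in range(sample_size)]
    # bisect_left finds the first unit whose cumulative total >= target.
    return [bisect.bisect_left(cumulative, t) for t in targets]

sizes = [300, 250, 200, 150, 100]   # hypothetical measure of size (e.g. visits)
chosen = pps_systematic(sizes, sample_size=2, random_number=0.3)
print(chosen)  # [0, 2]
```

A unit whose size is larger than the sampling interval can be hit more than once, which is how very large units end up selected with certainty.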
Advantages of Systematic Sampling
• The sample is easy to select.
• Using systematic sampling the selected sampling units are likely to be more uniformly spread over the whole population and may therefore be more representative than a simple random sample.
• Under most conditions, simple random sampling formulae for parameter and variance estimates can be used with systematic sampling.
Disadvantages of Systematic Sampling
• The sample may be biased if a hidden periodicity in the population coincides with that of the selection.
• It is difficult to assess the precision of the estimate from one survey.
• If there is a periodic trend in the data that matches the periodicity of selection, the estimate of the standard error will be too small, since the observed sample will not reflect the periodic trend.
• If the population is ordered monotonically (i.e. in increasing or decreasing order) the estimated standard error will be too large.
Bad systematic sample
Good systematic sample
Systematic sampling
Distributing 5 candies in the class
Systematic sampling
Stratified Sampling
• Used when the sampling frame contains clearly different categories (strata)
– For example,
• Urban and rural facilities
• Facilities with and without doctors
• Government and mission facilities
• Process:
– Organize the list of sampling units by stratum
– Select units within each stratum using a random method (simple random sampling or systematic sampling)
STRATIFIED SAMPLING…….
Draw a sample from each stratum
Stratified sampling
• Principle:
– Classify population into internally homogeneous subgroups (strata)
– Draw sample in each stratum
– Combine results of all strata
• Assumptions
– Strata are internally homogeneous, and externally heterogeneous
– Variation between strata reflects population heterogeneity
– Variation within strata low.
Stratified sampling
Advantages of Stratified Samples
• Using the same sampling fraction for all strata ensures proportionate representation in the sample of the characteristic being stratified.
• Adequate representation of minority subgroups of interest can be ensured by stratification and by varying the sampling fraction between strata as required.
• A stratified random sample may provide increased precision (i.e. narrower confidence intervals) over that which is possible with a simple random sample of the same size (i.e., DEFF<1).
• Information concerning estimates within each stratum is easily obtainable.
• For either administrative or logistic reasons, it may be easier to select a stratified sample than a simple random sample.
Disadvantages of Stratified Samples
• Stratified sampling requires advance knowledge of the characteristic in the population used for stratification.
• The sampling frame of the entire population has to be prepared separately for each stratum.
• Stratified sampling may not be less expensive than simple random sampling since detailed frames must be constructed for each stratum prior to sampling.
• Varying the sampling fraction between strata, to ensure selection of sufficient numbers in minority subgroups for study, affects the proportional representativeness of the subgroups in the sample as a whole, and makes the analysis of the survey more complex.
• Strata-level estimates may not have the desired level of precision, or the total sample size may be larger than needed
Example: Stratified sampling
Test Your Understanding
• An auto analyst is conducting a satisfaction survey, sampling from a list of 10,000 new car buyers. The list includes 2,500 Ford buyers, 2,500 GM buyers, 2,500 Honda buyers, and 2,500 Toyota buyers. The analyst selects a sample of 400 car buyers, by randomly sampling 100 buyers of each brand.
• Is this an example of a simple random sample?
(A) Yes, because each buyer in the sample was randomly sampled.
(B) Yes, because each buyer in the sample had an equal chance of being sampled.
(C) Yes, because car buyers of every brand were equally represented in the sample.
(D) No, because every possible 400-buyer sample did not have an equal chance of being chosen.
(E) No, because the population consisted of purchasers of four different brands of car.
Solution
• The correct answer is (D). A simple random sample requires that every sample of size n (in this problem, n is equal to 400) has an equal chance of being selected. In this problem, there was a 100 percent chance that the sample would include 100 purchasers of each brand of car. There was zero percent chance that the sample would include, for example, 99 Ford buyers, 101 Honda buyers, 100 Toyota buyers, and 100 GM buyers. Thus, all possible samples of size 400 did not have an equal chance of being selected; so this cannot be a simple random sample.
• The fact that each buyer in the sample was randomly sampled is a necessary condition for a simple random sample, but it is not sufficient. Similarly, the fact that each buyer in the sample had an equal chance of being selected is characteristic of a simple random sample, but it is not sufficient. The sampling method in this problem used random sampling and gave each buyer an equal chance of being selected; but the sampling method was actually stratified random sampling.
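The difference can be seen in a small simulation of the car buyer list. The buyer identifiers and the seed are made up; only the stratum sizes and sample sizes come from the problem.

```python
import random
from collections import Counter

rng = random.Random(7)  # arbitrary seed, only for repeatability
brands = ["Ford", "GM", "Honda", "Toyota"]

# Hypothetical frame: 2,500 buyers per brand, 10,000 in total.
buyers = [(brand, i) for brand in brands for i in range(2500)]

# Stratified random sample: 100 buyers drawn separately within each brand.
stratified = []
for brand in brands:
    stratum = [b for b in buyers if b[0] == brand]
    stratified.extend(rng.sample(stratum, k=100))

# Simple random sample: 400 buyers drawn from the whole list at once.
srs = rng.sample(buyers, k=400)

print(Counter(brand for brand, _ in stratified))
print(Counter(brand for brand, _ in srs))
```

The stratified draw gives exactly 100 buyers per brand every time, while the brand counts in the simple random sample vary from draw to draw.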
Cluster Sampling
• Population is divided into heterogeneous, but convenient groups ("clusters").
• Some groups are sampled.
• One-Stage Cluster Sampling: every element of selected groups is sampled.
• Two-Stage Cluster Sampling: elements within selected groups are themselves randomly chosen.
• Multi-Stage Cluster Sampling…..
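A rough sketch of the one-stage and two-stage variants, using a made-up population of 20 clusters with 50 units each. All names and sizes are hypothetical.

```python
import random

rng = random.Random(11)  # arbitrary seed, only for repeatability

# Hypothetical population: 20 clusters (e.g. villages) of 50 units each.
clusters = {c: [f"c{c}-u{u}" for u in range(50)] for c in range(20)}

def one_stage(clusters, n_clusters, rng):
    """Sample some clusters, then take every unit in them."""
    chosen = rng.sample(sorted(clusters), k=n_clusters)
    return [unit for c in chosen for unit in clusters[c]]

def two_stage(clusters, n_clusters, n_per_cluster, rng):
    """Sample some clusters, then sample units within each chosen cluster."""
    chosen = rng.sample(sorted(clusters), k=n_clusters)
    return [unit for c in chosen
            for unit in rng.sample(clusters[c], k=n_per_cluster)]

print(len(one_stage(clusters, 4, rng)))      # 200
print(len(two_stage(clusters, 4, 10, rng)))  # 40
```

In both variants only the selected clusters need a full list of units, which is where the saving in frame preparation comes from.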
Cluster Sampling: Diagram
Cluster Sampling
One-Stage Cluster Sample: Fully observe some clusters
CLUSTER SAMPLING
• Cluster sampling is often carried out as 'two-stage sampling'.
• First stage: a sample of areas is chosen.
• Second stage: a sample of respondents within those areas is selected.
• Sampling units are groups rather than individuals.
• A sample of such clusters is then selected.
• In one-stage cluster sampling, all units from the selected clusters are studied.
CLUSTER SAMPLING…….
• Advantages:
– Cuts down on the cost of preparing a sampling frame.
– This can reduce travel and other administrative costs.
• Disadvantages:
– Sampling error is higher than for a simple random sample of the same size.
Multiple stage sampling
Principle
• repeated sampling
• example:
sampling unit = household
– 1st stage: sampling provinces
– 2nd stage: sampling villages/local areas
– 3rd stage: sampling households
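The three stages could be sketched as nested random draws. The frame below is entirely made up (5 provinces, 8 villages per province, 30 households per village), and the function name is hypothetical.

```python
import random

rng = random.Random(3)  # arbitrary seed, only for repeatability

# Hypothetical frame: province -> village -> list of households.
frame = {
    f"province-{p}": {
        f"village-{p}-{v}": [f"household-{p}-{v}-{h}" for h in range(30)]
        for v in range(8)
    }
    for p in range(5)
}

def multistage_sample(frame, n_provinces, n_villages, n_households, rng):
    """Sample provinces, then villages within them, then households."""
    households = []
    for province in rng.sample(sorted(frame), k=n_provinces):
        villages = frame[province]
        for village in rng.sample(sorted(villages), k=n_villages):
            households.extend(rng.sample(villages[village], k=n_households))
    return households

sample = multistage_sample(frame, 2, 3, 5, rng)
print(len(sample))  # 30
```

Every household has the same chance of selection here only because all provinces and villages in this made-up frame are the same size; with unequal sizes the selection probabilities differ and have to be accounted for in the analysis.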
Multistage Sampling
• Randomly select primary sampling units at the first stage:
– Specific communities
– Specific health facilities
• Within the primary sampling units, randomly select the final sampling units at the second stage:
– Drug use encounters
– Patients
– Households
• Sometimes in complex samples, additional stages are needed
Non-probability Sampling
Non-probability samples
• Probability of being chosen : unknown
• Any sampling method where some elements of the population have no chance of selection (these are sometimes referred to as 'out of coverage' or 'undercovered'), or where the probability of selection can't be accurately determined.
Non-probability samples (II)
• The unknown probability of selection means that the sampling error cannot be calculated.
• May be more convenient and less expensive to execute (sometimes this is the only feasible option) – does not require the identification of a sampling frame.
• The results cannot be generalized.
• Requires judgment and caution when interpreting the results.
Nonprobability Sampling
• Convenience sampling
• Quota sampling
• Snowball sampling
• Judgmental sampling
CONVENIENCE SAMPLING
• A type of nonprobability sampling which involves the sample being drawn from that part of the population which is close to hand. That is, readily available and convenient.
• Sometimes known as grab or opportunity sampling or accidental or haphazard sampling.
CONVENIENCE SAMPLING…….
– Use sampling units that are easy to get
CONVENIENCE SAMPLING
• The researcher using such a sample cannot scientifically make generalizations about the total population from this sample because it would not be representative enough.
• For example, if an interviewer conducted a survey at a shopping center early in the morning on a given day, the people he/she could interview would be limited to those present at that time. Their views would not represent those of other members of society in the area, as a survey conducted at different times of day and several times per week might.
• This type of sampling is most useful for pilot testing.
QUOTA SAMPLING
• The population is first segmented into mutually exclusive sub-groups, just as in stratified sampling.
• Then judgment is used to select subjects or units from each segment based on a specified proportion.
• For example, an interviewer may be told to sample 200 females and 300 males between the ages of 45 and 60.
• In quota sampling the selection of the sample is non-random.
Judgmental sampling or Purposive sampling
• The researchers choose the sample based on who they think would be appropriate for the study.
• This is used primarily when there is a limited number of people that have expertise in the area being researched
• For qualitative research
Voluntary sample
• A voluntary sample is made up of people who self-select into the survey. Often, these folks have a strong interest in the main topic of the survey.
Suppose, for example, that a news show asks viewers to participate in an on-line poll. This would be a voluntary sample. The sample is chosen by the viewers, not by the survey administrator.
Snowball Sampling
• In the snowball sampling technique, existing study subjects are used to recruit more subjects into the sample.
• Respondent-driven sampling: initial respondents refer others to the researcher
– Usually used with hard-to-discover populations
– Bias introduced by structured nature of affiliation
– Can be improved with incentives to subjects to recruit a certain number of new respondents
Nonprobability Sampling Critiques
–Limited generalizability—one cannot judge representativeness.
–Researchers should estimate who the sample represents . . . The sample at least represents populations that are similar to it.
–Why use nonprobability samples? Nonprobability does not mean “intentional attempt to get a sample that is not representative”:
1. Well-suited for exploratory and evaluation research
2. Sampling frames (lists from which samples are drawn) are at times inadequate or nonexistent
3. Quick, efficient
4. Can be effectively used to study and describe social and social psychological “processes”
5. Any research is limited, but not having research is worse.
6. Across samples, repeatedly finding the same results supports generalizability.
What sampling method do you recommend?
• Determining the proportion of undernourished five-year-olds in a village.
• Investigating nutritional status of preschool children.
• Selecting maternity records for the study of previous abortions or duration of postnatal stay.
What sampling method do you recommend?
• In estimation of immunization coverage in a province, data on seven children aged 12-23 months in 30 clusters are used to determine proportion of fully immunized children in the province.
• Give reasons why cluster sampling is used in this survey.