13 sampling
Research Methodology Dr. Nimit Chowdhary, Professor
Saturday, April 14, 2012 © Dr. Nimit Chowdhary, Research Methodology Workshop
In statistics and survey methodology, sampling is concerned with the selection of a subset of individuals from within a population to estimate characteristics of the whole population.
Conditions Favoring the Use of a Sample versus a Census

Factor                               Sample         Census
1. Budget                            Small          Large
2. Time available                    Short          Long
3. Population size                   Large          Small
4. Variance in the characteristic    Small          Large
5. Cost of sampling errors           Low            High
6. Cost of nonsampling errors        High           Low
7. Nature of measurement             Destructive    Nondestructive
8. Attention to individual cases     Yes            No
Fig. 11.1
Define the population
Determine the sampling frame
Select sampling technique(s)
Determine the sample size
Execute the sampling process
Target population: the population of interest
Sampling frame: the list or rule defining the population
Actual population: defined/listed by the sampling frame; the population to which generalizations are made
Target sample: the list of the target sample, drawn from the frame by the method of selection
Sample: the people actually studied, determined by the response rate
Generalization: from the sample to the actual population
Need to distinguish between the population of interest and actual population defined by sampling frame
Generalizations can be made only to ‘actual population’
Understand crucial role of the sampling frame
Sampling frame: the list or procedure defining the population (from which the sample will be drawn).
Distinguish the sampling frame from the sample.
Examples: telephone book, voter list, random digit dialling
Essential for probability sampling, but can be defined for non-probability sampling
Sampling
Non-probability: convenience, judgmental, quota, snowball
Probability: random sampling, stratified sampling (proportionate or disproportionate), cluster sampling, systematic
A probability sample is one in which each element of the population has a known non-zero probability of selection.
Not a probability sample if some elements of the population cannot be selected (have zero probability)
Not a probability sample if probabilities of selection are not known.
Cannot guarantee “representativeness” on all traits of interest
A sampling plan with known statistical properties
Permits statements like "there is a 99% chance that the true population correlation falls between 0.46 and 0.56."
If the sampling frame is a poor fit to the population of interest, random sampling from that frame cannot fix the problem
The sampling frame is non-randomly chosen. Elements not in the sampling frame have zero probability of selection.
Generalizations can be made ONLY to the actual population defined by the sampling frame
Simple random sampling: each element in the population has an equal probability of selection AND each combination of elements has an equal probability of selection
Names drawn out of a hat
Random numbers to select elements from an ordered list
http://faculty.elgin.edu/dkernler/statistics/ch01/4-1.html
A small catering business serves 9 reception centers. The owner wants to interview a sample of 4 clients in detail to find ways to improve services to his/her clients.
To avoid bias, the owner chooses a simple random sample of size 4.
Each reception center is assigned a numerical label 1-9.
1 - Darlene’s Wedding Center
2 - Magic Moments Reception Hall
3 - Rustic Realm Weddings
4 - Romance Gardens
5 - Classic Weddings
6 - Old Time Chapel
7 - Lovers Lane Weddings
8 - Accents-Modern Weddings
9 - Century Falls Reception Center
The owner decides to use a statistical software program to generate 4 numerical labels between 1 and 9 at random. The software returns the following numbers:
5, 8, 6, 4
Therefore, the simple random sample to be interviewed in detail will be:
Classic Weddings (5)
Accents-Modern Weddings (8)
Old Time Chapel (6)
Romance Gardens (4)
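The random draw in the catering example can be sketched in a few lines of Python. This is only an illustration, not the software the owner used: the function name `simple_random_sample` and the `seed` argument are assumptions, and a run will generally return labels other than 5, 8, 6, 4.

```python
import random

# Labels 1-9 for the nine reception centers in the example.
centers = {
    1: "Darlene's Wedding Center",
    2: "Magic Moments Reception Hall",
    3: "Rustic Realm Weddings",
    4: "Romance Gardens",
    5: "Classic Weddings",
    6: "Old Time Chapel",
    7: "Lovers Lane Weddings",
    8: "Accents-Modern Weddings",
    9: "Century Falls Reception Center",
}


def simple_random_sample(labels, n, seed=None):
    """Draw n distinct labels; every subset of size n is equally likely."""
    rng = random.Random(seed)  # seed is an assumption, used only for reproducibility
    return rng.sample(list(labels), n)


chosen = simple_random_sample(centers.keys(), 4, seed=2012)
for label in chosen:
    print(label, centers[label])
```

Sampling without replacement matters here: the owner wants four different clients, so no label may be drawn twice.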
Sometimes subpopulations within your entire population vary considerably. In this case, it is advantageous to divide your population into subpopulations called "strata" and then perform simple random sampling within each stratum. This is stratified sampling.
Divide population into groups that differ in important ways
Basis for grouping must be known before sampling
Select random sample from within each group
Imagine you would like to interview schools that contract with different vendors to bring food to their cafeteria. We would expect opinions about cafeteria food to vary widely from school to school. Therefore, it makes sense to create school strata to sample from. Suppose the schools are as follows:
School 1: 1050 students
School 2: 565 students
School 3: 1554 students
School 4: 306 students
Total students: 1050 + 565 + 1554 + 306 = 3475 students
The administrator wishes to take a sample of 150 students.
The first step is to find the total number of students (3475 above) and calculate the proportion of students in each stratum.
School 1: 1050 / 3475 = .30
School 2: 565 / 3475 = .16
School 3: 1554 / 3475 = .45
School 4: 306 / 3475 = .09
Next, to select a sample in proportion to the size of each stratum (in this case school), the following number of students should be randomly selected:
School 1: 150 x .30 = 45
School 2: 150 x .16 = 24
School 3: 150 x .45 ~ 67
School 4: 150 x .09 ~ 14
This tells us that our sample of 150 students should be composed of:
45 students randomly selected from School 1
24 students randomly selected from School 2
67 students randomly selected from School 3
14 students randomly selected from School 4
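The proportionate allocation can also be computed from the exact stratum shares rather than from proportions rounded to two decimals. The sketch below is an assumption about how one might do this, using the largest-remainder rule so that the allocations always sum to the requested sample size; the function name `proportional_allocation` is hypothetical. With exact shares the result is 45, 25, 67 and 13, which differs by one student for Schools 2 and 4 from the figures obtained with rounded proportions.

```python
def proportional_allocation(strata_sizes, sample_size):
    """Allocate sample_size across strata in proportion to stratum size.

    Uses integer arithmetic and the largest-remainder rule so the
    allocations sum exactly to sample_size.
    """
    total = sum(strata_sizes.values())
    # For each stratum: (whole units, remainder) of size * sample_size / total.
    quotas = {name: divmod(size * sample_size, total)
              for name, size in strata_sizes.items()}
    allocation = {name: whole for name, (whole, _) in quotas.items()}
    leftover = sample_size - sum(allocation.values())
    # Hand the leftover units to the strata with the largest remainders.
    by_remainder = sorted(quotas, key=lambda name: quotas[name][1], reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation


schools = {"School 1": 1050, "School 2": 565, "School 3": 1554, "School 4": 306}
allocation = proportional_allocation(schools, 150)
print(allocation)
```

Once the allocation is fixed, a simple random sample of the allotted size is drawn independently within each school.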
The primary advantage of stratified sampling over simple random sampling is that it improves the accuracy of estimation, provided you select a relevant stratification variable.
Systematic sampling is a statistical method involving the selection of elements from an ordered sampling frame. The most common form of systematic sampling is an equal-probability method, in which every kth element in the frame is selected, where k is the sampling interval (sometimes known as the skip).
Number the units in the population from 1 to N
Decide on the n (sample size) that you want or need
k = N/n = the interval size
Randomly select an integer between 1 and k
Then take every kth unit
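The steps above can be sketched as a short Python function. This is a minimal illustration under stated assumptions: the name `systematic_sample` is hypothetical, the interval is taken as the integer part of N/n, and the population of 100 numbered units is made up for the example.

```python
import random


def systematic_sample(units, n, seed=None):
    """Take every kth unit after a random start between 1 and k."""
    N = len(units)
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and the population size")
    k = N // n                  # sampling interval (integer part of N/n)
    rng = random.Random(seed)   # seed is only for reproducibility
    start = rng.randint(1, k)   # random start, 1..k inclusive
    # Units are numbered from 1, so position p is units[p - 1].
    positions = range(start, N + 1, k)
    return [units[p - 1] for p in positions][:n]


population = list(range(1, 101))   # hypothetical frame: units numbered 1..100
sample = systematic_sample(population, 10, seed=7)
print(sample)
```

Only the starting point is random; every later selection is determined by it, which is why the order of the list matters so much for this design.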
Has same error rate as simple random sample if the list is in random or haphazard order
Provides the benefits of implicit stratification if the list is grouped
Runs the risk of error if periodicity in the list matches the sampling interval
This is rare. In this example, every 4th element is red, and red never gets sampled. If j had been 4 or 8, ONLY reds would be sampled.
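The periodicity risk is easy to reproduce. The sketch below rests on two assumptions, since the original figure is not available: j is read as the random start, and the sampling interval is taken to be 8. The list of colors and the helper `every_kth` are hypothetical.

```python
# Hypothetical ordered list in which every 4th element is red.
colors = ["blue", "blue", "blue", "red"] * 8   # 32 elements


def every_kth(units, start, k):
    """Return the units at positions start, start + k, start + 2k, ... (1-based)."""
    return units[start - 1::k]


k = 8  # assumed sampling interval, a multiple of the period of 4
for j in range(1, k + 1):
    print(j, sorted(set(every_kth(colors, j, k))))

# Starts that hit red at all; for these starts the sample is entirely red.
red_starts = [j for j in range(1, k + 1) if "red" in every_kth(colors, j, k)]
print(red_starts)
```

Under these assumptions every start gives a sample of a single color, so the sample is either all red or contains no red at all, and neither reflects the one-in-four share of reds in the list.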
Cluster sampling: done correctly, this is a form of random sampling
Population is divided into groups, usually geographic or organizational
Some of the groups are randomly chosen
In pure cluster sampling, whole cluster is sampled.
In simple multistage cluster, there is random sampling within each randomly chosen cluster
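The difference between pure cluster sampling and simple multistage cluster sampling can be sketched as follows. The function name `cluster_sample`, the `within` parameter and the six made-up schools of 20 students each are all assumptions for illustration.

```python
import random


def cluster_sample(clusters, n_clusters, within=None, seed=None):
    """Randomly choose clusters, then take all or some of their members.

    within=None  -> pure cluster sampling (whole cluster is sampled)
    within=m     -> simple multistage: random sample of m members per chosen cluster
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # stage 1: pick clusters
    sample = []
    for name in chosen:
        members = clusters[name]
        if within is None:
            sample.extend(members)                      # take everyone
        else:
            sample.extend(rng.sample(members, min(within, len(members))))
    return chosen, sample


# Hypothetical population: 6 schools (clusters) of 20 students each.
schools = {f"school_{i}": [f"s{i}_{j}" for j in range(20)] for i in range(1, 7)}

pure_clusters, pure = cluster_sample(schools, 2, seed=1)
two_stage_clusters, two_stage = cluster_sample(schools, 2, within=5, seed=1)
print(len(pure), len(two_stage))
```

In both variants only members of the chosen clusters can appear in the sample, which is the source of the cost savings and of the extra sampling error.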
Population is divided into groups
Some of the groups are randomly selected
For a given sample size, a cluster sample has more error than a simple random sample
Cost savings of clustering may permit larger sample
Error is smaller if the clusters are similar to each other
Cluster sampling has very high error if the clusters are different from each other
Cluster sampling is NOT desirable if the clusters are different
It IS random sampling: you randomly choose the clusters
But you will tend to omit some kinds of subjects
Example:
Election forecast!
Reduce the error in cluster sampling by creating strata of clusters
Sample one cluster from each stratum
This combines the cost savings of clustering with the error reduction of stratification
STRATIFICATION
Divide population into groups different from each other: sexes, races, ages
Sample randomly from each group
Less error compared to simple random
More expensive to obtain stratification information before sampling
CLUSTERING
Divide population into comparable groups: schools, cities
Randomly sample some of the groups
More error compared to simple random
Reduces costs to sample only some areas or organizations
Combines elements of stratification and clustering
First you define the clusters
Then you group the clusters into strata of clusters, putting similar clusters together in a stratum
Then you randomly pick one (or more) cluster from each of the strata of clusters
Then you sample the subjects within the sampled clusters (either all the subjects, or a simple random sample of them)
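The stratified cluster procedure just described can be sketched in Python. Everything named here is hypothetical: the function `stratified_cluster_sample`, the "urban" and "rural" strata, and the made-up cities and villages with 30 subjects each.

```python
import random


def stratified_cluster_sample(strata, per_stratum=1, within=None, seed=None):
    """Pick per_stratum clusters from each stratum of clusters, then sample subjects.

    within=None takes all subjects in a picked cluster; within=m takes a
    simple random sample of m subjects from it.
    """
    rng = random.Random(seed)
    result = {}
    for stratum, clusters in strata.items():
        for name in rng.sample(sorted(clusters), per_stratum):
            members = clusters[name]
            if within is None:
                result[(stratum, name)] = list(members)
            else:
                result[(stratum, name)] = rng.sample(members, min(within, len(members)))
    return result


# Hypothetical strata of clusters: similar clusters are grouped together.
strata = {
    "urban": {f"city_{c}": [f"{c}{i}" for i in range(30)] for c in "AB"},
    "rural": {f"village_{c}": [f"{c}{i}" for i in range(30)] for c in "CDE"},
}

picked = stratified_cluster_sample(strata, per_stratum=1, within=5, seed=3)
for (stratum, cluster), subjects in picked.items():
    print(stratum, cluster, subjects)
```

Because one cluster is drawn from every stratum, no kind of cluster is left out entirely, which is the error-reduction benefit that plain cluster sampling lacks.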
Non-probability sampling techniques: convenience sampling, purposive sampling, quota sampling, snowball sampling
Convenience sampling: subjects are selected because it is easy to access them.
No reason tied to purposes of research.
Examples: students in your class, people on the street, friends
Purposive sampling: subjects are selected for a good reason tied to purposes of research
Small samples (< 30), not large enough for power of probability sampling
Nature of research requires small sample
Choose subjects with appropriate variability in what you are studying
Hard-to-get populations that cannot be found through screening the general population
Examples:
test markets
purchase engineers selected in industrial marketing research
bellwether precincts selected in voting behavior research
expert witnesses used in court
Pre-plan number of subjects in specified categories (e.g. 100 men, 100 women)
In uncontrolled quota sampling, the subjects chosen for those categories are a convenience sample, selected any way the interviewer chooses
In controlled quota sampling, restrictions are imposed to limit interviewer’s choice
No call-backs or other features to eliminate convenience factors in sample selection
In Stratified Sampling, selection of subject is random. Call-backs are used to get that particular subject.
Stratified sampling without call-backs may not, in practice, be much different from quota sampling.
In Quota Sampling, the interviewer selects the first available subject who meets the criteria; this is a convenience sample.
Highly controlled quota sampling uses probability sampling down to the last block or telephone exchange
In snowball sampling, an initial group of respondents is selected, usually at random.
After being interviewed, these respondents are asked to identify others who belong to the target population of interest.
Subsequent respondents are selected based on the referrals.
Heterogeneity: need larger sample to study more diverse population
Desired precision: need larger sample to get smaller error
Sampling design: smaller if stratified, larger if cluster
Nature of analysis: complex multivariate statistics need larger samples
Accuracy of sample depends upon sample size, not ratio of sample to population
Often a non-random selection of basic sampling frame (city, organization etc.)
Fit between sampling frame and research goals must be evaluated
Sampling frame as a concept is relevant to all kinds of research (including non-probability)
Non-probability sampling means you cannot generalize beyond the sample
Probability sampling means you can generalize to the population defined by the sampling frame
There are normally two cases:
Determining sample size for percents
Determining sample size for means
$e = \dfrac{Z\sigma}{\sqrt{n}}$
e = error that is acceptable
Z = Z score that is calculated on the basis of desired confidence
σ = estimated standard deviation of the population under study (from a past study or pilot study)
n = sample size to be determined
A fast food company wants to determine the average number of times that fast food users visit fast food restaurants per week. They have decided that their estimate needs to be accurate within plus or minus one-tenth of a visit, and they want to be 95% sure that their estimate does not differ from the true number of visits by more than one-tenth of a visit. Previous research has shown that the standard deviation is 0.7 visits. What is the required sample size?
Population standard deviation (σ): 0.7
Maximum acceptable difference (e): 0.1
Desired confidence level (%): 95, so Z = 1.96
$e = \dfrac{Z\sigma}{\sqrt{n}}$
$n = \dfrac{Z^2\sigma^2}{e^2}$
$n = \dfrac{(1.96)^2 (0.7)^2}{(0.1)^2} \approx 188$
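The sample size for a mean can be checked numerically. This is a minimal sketch: the function name `sample_size_for_mean` is an assumption, and it simply evaluates n = (Zσ/e)² for the fast food example (σ = 0.7, e = 0.1, Z = 1.96). The exact value is a little above 188; rounding up to the next whole respondent, a common conservative choice, gives 189.

```python
import math


def sample_size_for_mean(sigma, e, z=1.96):
    """Sample size needed to estimate a mean: n = (z * sigma / e) ** 2."""
    return (z * sigma / e) ** 2


n_mean = sample_size_for_mean(sigma=0.7, e=0.1)
print(round(n_mean, 2))    # unrounded requirement
print(math.ceil(n_mean))   # conservative choice: round up to a whole respondent
```

Halving the acceptable error e quadruples the required sample, because e enters the formula squared.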
A publishing company wants to know what percent of the population might be interested in a new magazine on making the most of your retirement. Secondary data (that is several years old) indicates that 22% of the population is retired. They are willing to accept an error rate of 5% and they want to be 95% certain that their finding does not differ from the true rate by more than 5%. What is the required sample size?
$e = \dfrac{Z\sigma}{\sqrt{n}}$
$n = \dfrac{Z^2\sigma^2}{e^2}$
In this case, $\sigma = \sqrt{pq}$
Population proportion (p) = 0.22; therefore, q = 0.78
Population standard deviation (σ) = √(pq) = √((0.22)(0.78))
Maximum acceptable difference (e): 0.05
Desired confidence level (%): 95, so Z = 1.96
$e = \dfrac{Z\sigma}{\sqrt{n}}$
$n = \dfrac{Z^2\sigma^2}{e^2}$
$n = \dfrac{Z^2 pq}{e^2}$
$n = \dfrac{(1.96)^2 (0.22)(0.78)}{(0.05)^2} \approx 263.7$
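The sample size for a proportion follows from the same formula with σ² replaced by pq. The sketch below is an illustration only; the function name `sample_size_for_proportion` is an assumption, and the inputs are those of the magazine example (p = 0.22, e = 0.05, Z = 1.96).

```python
import math


def sample_size_for_proportion(p, e, z=1.96):
    """Sample size needed to estimate a proportion: n = z^2 * p * q / e^2."""
    q = 1 - p
    return z ** 2 * p * q / e ** 2


n_prop = sample_size_for_proportion(p=0.22, e=0.05)
print(round(n_prop, 2))    # unrounded requirement
print(math.ceil(n_prop))   # conservative choice: round up to a whole respondent

# With no prior estimate of p, p = 0.5 maximises p * q and gives the largest n.
print(math.ceil(sample_size_for_proportion(p=0.5, e=0.05)))
```

Because the 22% figure comes from secondary data that is several years old, using p = 0.5 is a common cautious alternative when the prior estimate is in doubt.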