13 sampling

Research Methodology
Dr. Nimit Chowdhary, Professor

Saturday, April 14, 2012 © Dr. Nimit Chowdhary

In statistics and survey methodology, sampling is concerned with the selection of a subset of individuals from within a population to estimate characteristics of the whole population.

Saturday, April 14, 2012 © Dr. Nimit Chowdhary Research Methodology Workshop p. 2


Type of Study                         Conditions Favoring the Use of
                                      Sample         Census
1. Budget                             Small          Large
2. Time available                     Short          Long
3. Population size                    Large          Small
4. Variance in the characteristic     Small          Large
5. Cost of sampling errors            Low            High
6. Cost of nonsampling errors         High           Low
7. Nature of measurement              Destructive    Nondestructive
8. Attention to individual cases      Yes            No

Fig. 11.1

Define the population

Determine the sampling frame

Select sampling technique(s)

Determine the sample size

Execute the sampling process


Target population: the population of interest
Sampling frame: the list or rule defining the population
Actual population: defined/listed by the sampling frame; the population to which generalizations are made
Target sample: the list of the target sample, drawn from the sampling frame by the method of selection
Sample: the people actually studied, as determined by the response rate
Generalization: from the sample to the actual population

Need to distinguish between the population of interest and actual population defined by sampling frame

Generalizations can be made only to ‘actual population’

Understand crucial role of the sampling frame


Sampling frame: the list or procedure defining the population from which the sample will be drawn.

Distinguish the sampling frame from the sample. Examples of sampling frames: telephone book, voter list, random digit dialling

Essential for probability sampling, but can be defined for non-probability sampling


Sampling
  Non-probability
    Convenience
    Judgmental
    Quota
    Snowball
  Probability
    Random sampling
    Stratified sampling
      Proportionate
      Disproportionate
    Cluster sampling
    Systematic


A probability sample is one in which each element of the population has a known, non-zero probability of selection.

It is not a probability sample if some elements of the population cannot be selected (have zero probability).

It is not a probability sample if the probabilities of selection are not known.


Cannot guarantee “representativeness” on all traits of interest

A sampling plan with known statistical properties

Permits statements like: "There is a 99% chance that the true population correlation falls between 0.46 and 0.56."


If the sampling frame is a poor fit to the population of interest, random sampling from that frame cannot fix the problem

The sampling frame is non-randomly chosen. Elements not in the sampling frame have zero probability of selection.

Generalizations can be made ONLY to the actual population defined by the sampling frame


Simple random sampling: each element in the population has an equal probability of selection AND each combination of elements has an equal probability of selection

Names drawn out of a hat

Random numbers to select elements from an ordered list


http://faculty.elgin.edu/dkernler/statistics/ch01/4-1.html

A small catering business serves 9 reception centers. The owner wants to interview a sample of 4 clients in detail to find ways to improve services to his/her clients.

To avoid bias, the owner chooses a simple random sample of size 4.


Each reception center is assigned a numerical label 1-9.

1 - Darlene’s Wedding Center
2 - Magic Moments Reception Hall
3 - Rustic Realm Weddings
4 - Romance Gardens
5 - Classic Weddings
6 - Old Time Chapel
7 - Lovers Lane Weddings
8 - Accents-Modern Weddings
9 - Century Falls Reception Center


The owner decides to use a statistical software program to generate 4 numerical labels between 1 and 9 at random. The software returns the following numbers:

5, 8, 6, 4

Therefore, the simple random sample to be interviewed in detail will be:

Classic Weddings (5)
Accents-Modern Weddings (8)
Old Time Chapel (6)
Romance Gardens (4)
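The draw described in this example can be sketched in a few lines of Python. This is only an illustration, not the statistical software the owner used: the function name `simple_random_sample` and the `seed` argument are assumptions added for reproducibility, and the labels it returns will generally differ from 5, 8, 6, 4.

```python
import random

# Reception centers keyed by the numerical labels 1-9 from the example.
centers = {
    1: "Darlene's Wedding Center",
    2: "Magic Moments Reception Hall",
    3: "Rustic Realm Weddings",
    4: "Romance Gardens",
    5: "Classic Weddings",
    6: "Old Time Chapel",
    7: "Lovers Lane Weddings",
    8: "Accents-Modern Weddings",
    9: "Century Falls Reception Center",
}


def simple_random_sample(labels, n, seed=None):
    """Draw n distinct labels so that every subset of size n is equally likely."""
    rng = random.Random(seed)  # seed is an assumption, used only for reproducibility
    return rng.sample(sorted(labels), n)


chosen = simple_random_sample(centers, 4, seed=2012)
for label in chosen:
    print(f"{centers[label]} ({label})")
```

Sampling without replacement is what makes every combination of 4 centers equally likely, which is the defining property of a simple random sample.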


Sometimes subpopulations within your entire population vary considerably. In this case, it is advantageous to divide your population into subpopulations called "strata" and then perform simple random sampling within each stratum. This is stratified sampling.


Divide population into groups that differ in important ways

Basis for grouping must be known before sampling

Select random sample from within each group


Imagine you would like to interview schools that contract with different vendors to bring food to their cafeteria. We would expect opinions about cafeteria food to vary widely from school to school. Therefore, it makes sense to create school strata to sample from. Suppose the schools are as follows:

School 1: 1050 students
School 2: 565 students
School 3: 1554 students
School 4: 306 students


Total students: 1050 + 565 + 1554 + 306 = 3475 students

The administrator wishes to take a sample of 150 students.

The first step is to find the total number of students (3475 above) and calculate the percent of students in each stratum.

School 1: 1050 / 3475 = .30
School 2: 565 / 3475 = .16
School 3: 1554 / 3475 = .45
School 4: 306 / 3475 = .09


Next, to select a sample in proportion to the size of each stratum (in this case school), the following number of students should be randomly selected:

School 1: 150 x .30 = 45
School 2: 150 x .16 = 24
School 3: 150 x .45 ~ 67
School 4: 150 x .09 ~ 14

This tells us that our sample of 150 students should comprise:
45 students randomly selected from School 1
24 students randomly selected from School 2
67 students randomly selected from School 3
14 students randomly selected from School 4
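The proportionate allocation can also be computed programmatically. The sketch below is an illustration under stated assumptions: it works from the unrounded proportions and uses largest-remainder rounding (a choice made here, not the slide's method) so that the allocations always add up to exactly 150. Because of that, School 2 and School 4 come out as 25 and 13 rather than the hand-rounded 24 and 14. The function names and the use of index numbers to stand in for students are hypothetical.

```python
import random

school_sizes = {
    "School 1": 1050,
    "School 2": 565,
    "School 3": 1554,
    "School 4": 306,
}


def proportional_allocation(stratum_sizes, n):
    """Split a total sample of n across strata in proportion to stratum size."""
    total = sum(stratum_sizes.values())
    # Integer arithmetic: whole part and leftover numerator of n * size / total.
    alloc = {s: (n * size) // total for s, size in stratum_sizes.items()}
    remainders = {s: (n * size) % total for s, size in stratum_sizes.items()}
    shortfall = n - sum(alloc.values())
    # Give the leftover units to the strata with the largest fractional parts.
    for s in sorted(remainders, key=remainders.get, reverse=True)[:shortfall]:
        alloc[s] += 1
    return alloc


def stratified_sample(stratum_sizes, alloc, seed=None):
    """Simple random sample within each stratum; students are index numbers."""
    rng = random.Random(seed)
    return {s: rng.sample(range(stratum_sizes[s]), alloc[s]) for s in stratum_sizes}


allocation = proportional_allocation(school_sizes, 150)
sample = stratified_sample(school_sizes, allocation, seed=1)
print(allocation)
```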

The primary advantage of stratified sampling over simple random sampling is that it improves the accuracy of estimation if you select a relevant stratification variable.


Systematic sampling is a statistical method involving the selection of elements from an ordered sampling frame. The most common form of systematic sampling is an equal-probability method, in which every kth element in the frame is selected, where k is the sampling interval (sometimes known as the skip).

Number the units in the population from 1 to N

Decide on the n (sample size) that you want or need

k = N/n = the interval size

Randomly select an integer between 1 and k

Then take every kth unit
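The steps above can be sketched as a short Python function. It is a minimal illustration, assuming that k is taken as the integer part of N/n when N is not an exact multiple of n; the function name `systematic_sample` and the `seed` argument are hypothetical.

```python
import random


def systematic_sample(units, n, seed=None):
    """Select every kth unit after a random start, with k = N // n."""
    N = len(units)
    k = N // n                     # sampling interval (integer part of N/n, an assumption)
    rng = random.Random(seed)
    start = rng.randint(1, k)      # random integer between 1 and k, inclusive
    # Positions are 1-based as in the steps: start, start + k, start + 2k, ...
    positions = range(start, start + n * k, k)
    return [units[p - 1] for p in positions]


frame = list(range(1, 101))        # units numbered 1 to N = 100
sample = systematic_sample(frame, 10, seed=7)
print(sample)
```

Only the starting point is random; once it is fixed, the rest of the sample is fully determined by the interval.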


Has same error rate as simple random sample if the list is in random or haphazard order

Provides the benefits of implicit stratification if the list is grouped


Runs the risk of error if periodicity in the list matches the sampling interval

This is rare. In this example, every 4th element is red, and red never gets sampled. If j had been 4 or 8, ONLY reds would be sampled.
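The periodicity problem can be reproduced with a small simulation. The figure from the slide is not available, so the numbers here are assumptions chosen to be consistent with the text: a list of 40 elements, every 4th element red, a sampling interval of 8, and j read as the random starting position.

```python
def systematic_positions(N, k, j):
    """1-based positions j, j + k, j + 2k, ... up to N."""
    return list(range(j, N + 1, k))


N, k = 40, 8  # assumed list length and sampling interval
red = {p for p in range(1, N + 1) if p % 4 == 0}  # every 4th element is red

reds_by_start = {}
for j in range(1, k + 1):
    picked = systematic_positions(N, k, j)
    reds_by_start[j] = sum(p in red for p in picked)
    print(f"start j={j}: {reds_by_start[j]} of {len(picked)} sampled elements are red")
```

With these assumed values, starts of 4 and 8 select only red elements and every other start selects none, because the interval is a multiple of the period in the list.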


Cluster sampling: done correctly, this is a form of random sampling

Population is divided into groups, usually geographic or organizational


Some of the groups are randomly chosen

In pure cluster sampling, the whole cluster is sampled.

In simple multistage cluster sampling, there is random sampling within each randomly chosen cluster
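The difference between pure and simple multistage cluster sampling can be sketched as follows. The cluster names, sizes, and the function `cluster_sample` are hypothetical; the only point illustrated is that clusters are chosen at random first, and then either every member or a random subset of members is taken.

```python
import random


def cluster_sample(clusters, n_clusters, within=None, seed=None):
    """Randomly choose clusters, then take all members (pure cluster sampling)
    or a simple random sample of `within` members (simple multistage)."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    result = {}
    for name in chosen:
        members = clusters[name]
        if within is None:
            result[name] = list(members)
        else:
            result[name] = rng.sample(members, min(within, len(members)))
    return result


# Hypothetical population: 10 schools of 20 students each.
clusters = {
    f"School {i}": [f"S{i}-{j}" for j in range(1, 21)] for i in range(1, 11)
}

pure = cluster_sample(clusters, 3, seed=0)
two_stage = cluster_sample(clusters, 3, within=5, seed=0)
print({name: len(members) for name, members in pure.items()})
print({name: len(members) for name, members in two_stage.items()})
```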


Population is divided into groups

Some of the groups are randomly selected

For a given sample size, a cluster sample has more error than a simple random sample

Cost savings of clustering may permit larger sample

Error is smaller if the clusters are similar to each other


Cluster sampling has very high error if the clusters are different from each other

Cluster sampling is NOT desirable if the clusters are different

It IS random sampling: you randomly choose the clusters

But you will tend to omit some kinds of subjects


Example:

Election forecast!


Reduce the error in cluster sampling by creating strata of clusters

Sample one cluster from each stratum

This combines the cost savings of clustering with the error reduction of stratification


STRATIFICATION

Divide population into groups different from each other: sexes, races, ages

Sample randomly from each group

Less error compared to simple random

More expensive to obtain stratification information before sampling

CLUSTERING

Divide population into comparable groups: schools, cities

Randomly sample some of the groups

More error compared to simple random

Reduces costs to sample only some areas or organizations


Stratified cluster sampling combines elements of stratification and clustering

First you define the clusters

Then you group the clusters into strata of clusters, putting similar clusters together in a stratum


Then you randomly pick one (or more) cluster from each of the strata of clusters

Then you sample the subjects within the sampled clusters (either all the subjects, or a simple random sample of them)
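The stratified cluster procedure can be sketched as below. The strata ("urban", "rural"), the cluster names, and the function `stratified_cluster_sample` are hypothetical examples, not taken from the slides; the sketch only shows one cluster being picked at random from each stratum of similar clusters, followed by sampling of subjects within it.

```python
import random


def stratified_cluster_sample(strata, per_stratum=1, within=None, seed=None):
    """Pick `per_stratum` clusters at random from each stratum of clusters,
    then take all subjects or a simple random sample of `within` subjects."""
    rng = random.Random(seed)
    result = {}
    for stratum, clusters in strata.items():
        for name in rng.sample(sorted(clusters), per_stratum):
            members = clusters[name]
            if within is None:
                result[name] = list(members)
            else:
                result[name] = rng.sample(members, min(within, len(members)))
    return result


# Hypothetical strata of similar clusters, 30 subjects per cluster.
strata = {
    "urban": {
        "City A": [f"A{i}" for i in range(30)],
        "City B": [f"B{i}" for i in range(30)],
    },
    "rural": {
        "Village C": [f"C{i}" for i in range(30)],
        "Village D": [f"D{i}" for i in range(30)],
    },
}

picked = stratified_cluster_sample(strata, per_stratum=1, within=10, seed=3)
print({name: len(members) for name, members in picked.items()})
```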


Convenience sampling
Purposive sampling
Quota sampling
Snowball sampling


Convenience sampling: subjects selected because it is easy to access them.

No reason tied to purposes of research.

Examples: students in your class, people on the street, friends


Purposive sampling: subjects selected for a good reason tied to purposes of research

Small samples (< 30), not large enough for power of probability sampling

Nature of research requires small sample

Choose subjects with appropriate variability in what you are studying

Hard-to-get populations that cannot be found through screening the general population


Examples:
test markets
purchase engineers selected in industrial marketing research
bellwether precincts selected in voting behavior research
expert witnesses used in court


Quota sampling: pre-plan the number of subjects in specified categories (e.g. 100 men, 100 women)

In uncontrolled quota sampling, the subjects chosen for those categories are a convenience sample, selected any way the interviewer chooses


In controlled quota sampling, restrictions are imposed to limit interviewer’s choice

No call-backs or other features to eliminate convenience factors in sample selection


In Stratified Sampling, selection of subject is random. Call-backs are used to get that particular subject.

Stratified sampling without call-backs may not, in practice, be much different from quota sampling.

In Quota Sampling, the interviewer selects the first available subject who meets the criteria: this is a convenience sample.

Highly controlled quota sampling uses probability sampling down to the last block or telephone exchange


In snowball sampling, an initial group of respondents is selected, usually at random.

After being interviewed, these respondents are asked to identify others who belong to the target population of interest.

Subsequent respondents are selected based on the referrals.


Heterogeneity: need larger sample to study more diverse population

Desired precision: need larger sample to get smaller error

Sampling design: smaller if stratified, larger if cluster


Nature of analysis: complex multivariate statistics need larger samples

Accuracy of sample depends upon sample size, not ratio of sample to population
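The claim that accuracy depends on sample size rather than on the sampling ratio can be checked numerically. The sketch below assumes simple random sampling without replacement and uses the standard error of a proportion with the finite population correction; the population sizes and p = 0.5 are illustrative values, not figures from the slides.

```python
import math


def se_proportion(p, n, N):
    """Standard error of a sample proportion under simple random sampling
    without replacement, including the finite population correction."""
    fpc = math.sqrt((N - n) / (N - 1))
    return math.sqrt(p * (1 - p) / n) * fpc


p = 0.5  # illustrative proportion (assumption)
for n, N in [(1000, 100_000), (1000, 100_000_000), (4000, 100_000_000)]:
    print(f"n={n:>5}, N={N:>11,}: ratio={n / N:.6f}, SE={se_proportion(p, n, N):.5f}")
```

A sample of 1,000 gives nearly the same standard error whether the population is one hundred thousand or one hundred million, while quadrupling the sample roughly halves it.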


Often a non-random selection of basic sampling frame (city, organization etc.)

Fit between sampling frame and research goals must be evaluated

Sampling frame as a concept is relevant to all kinds of research (including non-probability)


Non-probability sampling means you cannot generalize beyond the sample

Probability sampling means you can generalize to the population defined by the sampling frame


There are normally two cases:
Determining sample size for percents
Determining sample size for means


$$e = \frac{Z\sigma}{\sqrt{n}}$$

e: error that is acceptable

Z: Z score calculated on the basis of the desired confidence

σ: estimated standard deviation of the population under study (from a past study or pilot study)

n: sample size to be determined


A fast food company wants to determine the average number of times that fast food users visit fast food restaurants per week. They have decided that their estimate needs to be accurate within plus or minus one-tenth of a visit, and they want to be 95% sure that their estimate does not differ from the true number of visits by more than one-tenth of a visit. Previous research has shown that the standard deviation is .7 visits. What is the required sample size?


Population standard deviation (σ): .7

Maximum acceptable difference (e): .1

Desired confidence level (%): 95, so Z = 1.96


$$e = \frac{Z\sigma}{\sqrt{n}}$$

$$n = \frac{Z^2\sigma^2}{e^2}$$

$$n = \frac{(1.96)^2 (0.7)^2}{(0.1)^2} \approx 188$$
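The sample size for a mean can be checked with a few lines of Python, using the fast food values (σ = 0.7, e = 0.1, Z = 1.96). The function names are hypothetical, and deriving Z from the standard normal distribution with `statistics.NormalDist` is an addition for illustration. The unrounded result is about 188.24, which is reported as 188 when truncated; rounding up to 189 is the conservative choice if the stated precision must be guaranteed.

```python
import math
from statistics import NormalDist


def z_for_confidence(confidence):
    """Two-sided Z score for a confidence level such as 0.95."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def sample_size_for_mean(sigma, e, z):
    """n = Z^2 * sigma^2 / e^2, returned unrounded."""
    return (z * sigma / e) ** 2


z = round(z_for_confidence(0.95), 2)   # 1.96, as used in the worked example
n_raw = sample_size_for_mean(sigma=0.7, e=0.1, z=z)
print(z, round(n_raw, 2), math.floor(n_raw), math.ceil(n_raw))
```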

A publishing company wants to know what percent of the population might be interested in a new magazine on making the most of your retirement. Secondary data (that is several years old) indicates that 22% of the population is retired. They are willing to accept an error rate of 5% and they want to be 95% certain that their finding does not differ from the true rate by more than 5%. What is the required sample size?


$$e = \frac{Z\sigma}{\sqrt{n}}$$

$$n = \frac{Z^2\sigma^2}{e^2}$$

In this case,

$$\sigma = \sqrt{pq}$$

Population proportion (p) = .22

Therefore, q = .78

Population standard deviation (σ) = √(pq) = √((.22)(.78))

Maximum acceptable difference (e): 0.05

Desired confidence level (%): 95, so Z = 1.96


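$$e = \frac{Z\sigma}{\sqrt{n}}$$

$$n = \frac{Z^2\sigma^2}{e^2} = \frac{Z^2 pq}{e^2}$$

$$n = \frac{(1.96)^2 (0.22)(0.78)}{(0.05)^2} \approx 263.7$$

So about 263 respondents are required (264 if rounded up).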
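The sample size for a percent can be verified the same way, using the magazine values (p = .22, e = .05, Z = 1.96). The function name `sample_size_for_proportion` is hypothetical. Note that the result of roughly 263 only follows when e = 0.05 is used in the denominator, as stated in the problem.

```python
import math


def sample_size_for_proportion(p, e, z):
    """n = Z^2 * p * q / e^2 with q = 1 - p, returned unrounded."""
    q = 1 - p
    return z ** 2 * p * q / e ** 2


n_raw = sample_size_for_proportion(p=0.22, e=0.05, z=1.96)
print(round(n_raw, 2), math.floor(n_raw), math.ceil(n_raw))

# With no prior estimate of p, p = 0.5 gives the largest (most conservative) n.
n_worst_case = sample_size_for_proportion(p=0.5, e=0.05, z=1.96)
print(math.ceil(n_worst_case))
```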