sampling for research studies - wordpress.com · 10/3/2016 · systematic sampling •is a type of...

10/10/2016

1

Sampling for Research Studies

Introduction to Biostatistics

Haleema Masud


Background on Sampling

• Concepts

– Definitions

– Sampling methods

– Choice of the right design

• Calculations (2nd semester and later in this semester)

– Sampling error

– Design effect

– Sample size

What

Why

Who

Where

When

Learning Approach

What Is Sampling?

Sampling is a process by which we select/study a small part of a population to make judgments about the entire population.

Sampling involves selecting a number of units from a defined population.

Important Statistical terms


Population

A set of entities that includes all measurements of interest to the researcher (the collection of all responses, measurements, or counts that are of interest)

Sample:

A subset of the population

A Representative Sample

A representative sample has all the important characteristics of the study population from which it is drawn.


What is a sample?

"A part of a population, or a subset from a set of units, which is provided by some process or

other, usually by deliberate selection with the object of investigating the properties of the

parent population or set.”

A Dictionary of Statistical Terms, by Maurice G. Kendall and William R. Buckland, International Statistical Institute, 1957.


Definitions

• Target population: the complete collection of objects we would like to say something about

• Sample: a subset of the population

• Sampled population: the population from which the sample was chosen

•Sampling frame

– List of sampling units

– Examples: households, street addresses, villages


Definitions

• Observation unit or element: we collect data on these objects or people

•Sampling unit

– Unit actually sampled (can be different from observation unit)

– For example

• Households as sampling units

• Individuals as observation units

SAMPLING…….


TARGET POPULATION

STUDY POPULATION

SAMPLE


SAMPLING BREAKDOWN

What makes a "good" sample?

A RUSsIaN sample:

–Representative

–Unbiased

–Sampling Error can be quantified

–Maximum Information for minimum cost

–Non-sampling error is corrected for as much as possible.


Representativeness

A sample is representative if what we are studying (characteristic) is present in the sampled population in the same proportion as in the target population


Unbiasedness

• Responses are not likely to systematically over- or underestimate the true population parameters

• Can be linked to representativeness


Sampling Error

• Expressed by standard error

– of mean, proportion, differences, etc

• No sample is the exact mirror image of the population

• Magnitude of error can be measured in probability samples
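As a rough illustration of how sampling error is expressed, the sketch below computes the standard error of a mean and of a proportion. It assumes a simple random sample and ignores the finite population correction; the prevalence of 0.30 and the sample sizes are hypothetical values chosen only for the example.

```python
import math

def se_mean(sample_sd, n):
    """Standard error of a sample mean: s / sqrt(n)."""
    return sample_sd / math.sqrt(n)

def se_proportion(p, n):
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

# Hypothetical prevalence of 0.30 estimated from samples of growing size.
for n in (100, 400, 1600):
    print(n, round(se_proportion(0.30, n), 4))
```

Quadrupling the sample size halves the standard error, which is why sampling error falls quickly at first and then more slowly as the sample grows.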


Non-Sampling Error

Due to problems in design or conduct

– Measurement Bias (systematic over- or underreporting)

– Selection Bias (part of target population not in sampled population)

– Questionnaire Design

– Interview Bias

– Processing Error

– Non-response

– Undercoverage


Sampling Error vs Sample Size


Sample Size, Sampling and Non-Sampling Error

Learning Approach

What

Why

How

When

Why sampling?

Get information about large populations

Lower cost

Less field time

More accuracy, i.e. can do a better job of data collection

When it’s impossible to study the whole population


Why do we use samples ?

Get information

– At minimal cost

– At maximum speed

– At increased accuracy


Why do we use samples?

• Large population

• Large area

• To reduce the costs/demands on personnel

• To reduce the length of the study

• To avoid response bias

• In case of a study with a destructive “test” (for example, autopsy on mice)

• Need to know uncertainty or variation in target population


When do we NOT use samples ?

• Very rare event/diseases

• Very detailed study for category or area

• Legal records required for every individual in target population

What

Why

How

When

Learning Approach

Sampling Process

1. Defining the population

2. Deciding the sampling and observation units

3. Specifying a sampling frame

4. Specifying a sampling method

5. Determining the sample size

6. Implementing the sampling plan

7. Sampling and data collection

8. Reviewing the sampling process


Types of sampling methods

• Non-probability samples

• Probability samples

PROBABILITY SAMPLING


• A probability sampling scheme is one in which every unit in the population has a chance (greater than zero) of being selected in the sample, and this probability can be accurately determined.

• When every element in the population does have the same probability of selection, this is known as an 'equal probability of selection' (EPS) design.

Probability samples

Simple random sampling

Systematic sampling

Stratified sampling

Cluster sampling

Multi-stage sampling

http://en.wikipedia.org/wiki/Sampling_(statistics)




Simple random sampling


• Principle

– Equal chance of drawing each sample of size n

– Random selection

• Procedure

– Enumerate all units

– Randomly draw units

Example: evaluate the prevalence of tooth decay among the 1200 children attending a school

• List of children attending the school

• Children numbered from 1 to 1200

• Sample size = 100 children

• Random sampling of 100 numbers between 1 and 1200
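The school example can be sketched in a few lines of Python. This is only an illustration: the seed value is arbitrary, and the numbers 1 to 1200 stand in for the enumerated list of children.

```python
import random

rng = random.Random(2016)         # arbitrary fixed seed so the draw can be repeated
population_ids = range(1, 1201)   # children numbered 1..1200

# Draw 100 distinct numbers; every set of 100 children is equally likely.
sample_ids = sorted(rng.sample(population_ids, k=100))
print(sample_ids[:10])
```

Because `random.sample` draws without replacement, no child can be selected twice.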

How to randomly select?




Lottery method

Table of random numbers

57172 42088 70098 11333 26902 29959 43909 49607

33883 87680 28923 15659 09839 45817 89405 70743

77950 67344 10609 87119 15859 74577 42791 75889

11607 11596 01796 24498 17009 67119 00614 49529

56149 55678 38169 47228 49931 94303 67448 31286

80719 65101 77729 83949 83358 75230 56624 27549

93809 19505 82000 79068 45552 86776 48980 56684

40950 86216 48161 17646 24164 35513 94057 51834

12182 59744 65695 83710 41125 14291 74773 66391

13382 48076 73151 48724 35670 38453 63154 58116

38629 94576 48859 75654 17152 66516 78796 73099

60728 32063 12431 23898 23683 10853 04038 75246

01881 99056 46747 08846 01331 88163 74462 14551

23094 29831 95387 23917 07421 97869 88092 72201

15243 21100 48125 05243 16181 39641 36970 99522

53501 58431 68149 25405 23463 49168 02048 31522

07698 24181 01161 01527 17046 31460 91507 16050

22921 25930 79579 43488 13211 71120 91715 49881

68127 00501 37484 99278 28751 80855 02035 10910

55309 10713 36439 65660 72554 77021 46279 22705

92034 90892 69853 06175 61221 76825 18239 47687

50612 84077 41387 54107 09190 74305 68196 75634

81415 98504 32168 17822 49946 37545 47201 85224

38461 44528 30953 08633 08049 68698 08759 45611

07556 24587 88753 71626 64864 54986 38964 83534

60557 50031 75829 05622 30237 77795 41870 26300

I need 100 numbers between 1 and 1200

1) I select one random starting point

2) I decide the direction (down) and where I want to search (the last 4 digits)

3) I look for numbers between 1 and 1200 in the last 4 digits
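The table procedure can be traced with a short sketch. It assumes, purely for illustration, that the random starting point happens to be the top of the first column of the table above; the function name `pick_from_table` is hypothetical.

```python
# First column of the random number table, kept as strings so leading zeros survive.
first_column = [
    "57172", "33883", "77950", "11607", "56149", "80719", "93809", "40950",
    "12182", "13382", "38629", "60728", "01881", "23094", "15243", "53501",
    "07698", "22921", "68127", "55309", "92034", "50612", "81415", "38461",
    "07556", "60557",
]

def pick_from_table(entries, population_size, digits=4):
    """Read down the entries, keeping the last `digits` digits when they
    fall between 1 and population_size and have not been picked already."""
    picks = []
    for entry in entries:
        number = int(entry[-digits:])
        if 1 <= number <= population_size and number not in picks:
            picks.append(number)
    return picks

picks = pick_from_table(first_column, 1200)
print(picks)
```

Only 5 of the 26 entries in this column qualify, so reaching 100 numbers means continuing down the following columns in the same way.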


Using Excel for Random Numbers

• =RANDBETWEEN(5,10) generates a random integer between 5 and 10 (inclusive)

• =RANDBETWEEN(1,N) generates a random integer between 1 and the total population size (N)

OR

• =TRUNC(RAND()*(b-a+1)+a,0)

where you want to generate a random integer greater than or equal to a, and less than or equal to b.
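For readers working outside Excel, a rough Python analogue of the two approaches is sketched below. The function names are hypothetical; the second one scales a uniform draw on [0, 1) by the number of integers in the range (b - a + 1) and shifts it by a, so that every integer from a to b is equally likely.

```python
import math
import random

def randbetween(a, b, rng=random):
    """Random integer n with a <= n <= b, similar to Excel's RANDBETWEEN."""
    return rng.randint(a, b)

def trunc_rand(a, b, rng=random):
    """Same range, built by scaling and truncating a uniform draw on [0, 1)."""
    return math.trunc(rng.random() * (b - a + 1) + a)

draws = [trunc_rand(5, 10) for _ in range(1000)]
print(min(draws), max(draws))
```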



• Advantages

– Simple (implementation, calculation)

– Sampling error easily measured

– Simple random sampling is always an EPS design, but not all EPS designs are simple random sampling.

• Disadvantages

– Need complete list of units

– Does not always achieve representativeness

– Selected units may be scattered in a large area (geographic problem)

– Minority subgroups of interest in population may not be present in sample in sufficient numbers for study.

Distributing 5 candies in the class


Systematic sampling

• is a type of probability sampling method in which sample members from a larger population are selected according to a random starting point and a fixed periodic interval.


Systematic Sampling with Equal Probability

• Numbered list of all possible units

• # units ÷ desired sample size = sampling interval

– For example, to select 20 health centers from a list of 46, the sampling interval is 46/20 = 2.3

• Random # × sampling interval = random start

– For example, if the random number is 0.183, calculate 0.183 × 2.3 = 0.421, which rounds upward to 1 (Facility 1)

• Round number up to choose sample unit

• Add sampling interval to random start for subsequent units

– 0.421 + 2.3 = 2.721, or Facility 3

– 2.721 + 2.3 = 5.021, or Facility 6

– 5.021 + 2.3 = 7.321, or Facility 8, and so forth
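The health center example (20 facilities from a list of 46, random number 0.183) can be reproduced with the sketch below. The function name is hypothetical, and the random number is assumed to be greater than 0 and below 1 so that rounding up never yields position 0 or runs past the end of the list.

```python
import math

def systematic_sample(n_units, sample_size, random_fraction):
    """Positions (1-based) chosen with a fractional sampling interval.

    interval = n_units / sample_size
    start    = random_fraction * interval
    Each running total is rounded up to give the selected position.
    """
    interval = n_units / sample_size
    start = random_fraction * interval
    return [math.ceil(start + k * interval) for k in range(sample_size)]

facilities = systematic_sample(46, 20, 0.183)
print(facilities)
```

The first positions are 1, 3, 6 and 8, matching the worked figures, and the list ends at facility 45.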

Sampling to Study Drug Use 41

Systematic Sampling with Probability Proportional To Size

• List where the units are sorted in decreasing order by some measure of size (like population or number of visits)

• Calculate the cumulative total

• Cumulative total ÷ sample size = sampling interval

• Random # × sampling interval = random start

• Choose the first unit whose cumulative total is greater than or equal to the random start

• Add sampling interval to previous total for subsequent units.
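A minimal sketch of the probability-proportional-to-size steps follows. The visit counts are made up for illustration, the function name is hypothetical, and the random number is assumed to lie strictly between 0 and 1. A unit whose size exceeds the sampling interval can be hit more than once; this sketch does not treat that case specially.

```python
from bisect import bisect_left
from itertools import accumulate

def pps_systematic(sizes, sample_size, random_fraction):
    """Return 0-based indices of units selected with probability
    proportional to size, using a running (cumulative) total."""
    cumulative = list(accumulate(sizes))
    interval = cumulative[-1] / sample_size
    start = random_fraction * interval
    targets = [start + k * interval for k in range(sample_size)]
    # First unit whose cumulative total reaches each target.
    return [bisect_left(cumulative, t) for t in targets]

# Hypothetical facilities sorted by number of visits, largest first.
visits = [500, 300, 200, 150, 100, 50]
selected = pps_systematic(visits, sample_size=3, random_fraction=0.5)
print(selected)
```

Here the cumulative totals are 500, 800, 1000, 1150, 1250 and 1300, the interval is about 433.3, and the targets of roughly 216.7, 650 and 1083.3 fall in the first, second and fourth facilities.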


Advantages of Systematic Sampling

• The sample is easy to select.

• Using systematic sampling the selected sampling units are likely to be more uniformly spread over the whole population and may therefore be more representative than a simple random sample.

• Under most conditions, simple random sampling formulae for parameter and variance estimates can be used with systematic sampling.


Disadvantages of Systematic Sampling

• The sample may be biased if a hidden periodicity in the population coincides with that of the selection.

• It is difficult to assess the precision of the estimate from one survey.

• If there is a periodic trend in the data that matches the periodicity of selection, the estimate of the standard error will be too small, since the observed sample will not reflect the periodic trend.

• If the population is ordered monotonically (i.e. in increasing or decreasing order) the estimated standard error will be too large.
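The periodicity problem can be seen in a toy example. The population below is hypothetical: 70 daily values that depend only on the day of the week. A sampling interval of 7 picks the same weekday every time, while an interval of 10 cycles through all weekdays.

```python
from statistics import mean

# Hypothetical population: 10 weeks of daily values with a weekly cycle (0..6).
population = [day % 7 for day in range(70)]

def systematic(values, start, interval):
    """Every `interval`-th value, beginning at index `start`."""
    return values[start::interval]

bad = systematic(population, 0, 7)    # interval matches the weekly cycle
good = systematic(population, 0, 10)  # interval does not match the cycle

print(mean(population), mean(bad), mean(good))
```

The sample with interval 7 contains ten identical values, so its mean is far from the population mean and its apparent variability is zero, which is the "standard error too small" situation described above.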

Bad systematic sample

Good systematic sample


Systematic sampling

Distributing 5 candies in the class


Stratified Sampling

• Used when the sampling frame contains clearly different categories (strata)

– For example:

• Urban and rural facilities

• Facilities with and without doctors

• Government and mission facilities

• Process:

– Organize the list of sampling units by stratum

– Select units within each stratum using a random method (simple random sampling or systematic sampling)
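The two-step process above might be sketched as follows, using simple random sampling inside each stratum and the same sampling fraction everywhere. The frame of 60 urban and 140 rural facilities, the 10% fraction, the seed and the function name are all assumptions made for the example.

```python
import random

def stratified_sample(frame, stratum_of, fraction, rng):
    """Group the frame by stratum, then draw a simple random sample
    of round(fraction * stratum size) units within each stratum."""
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    return {
        name: rng.sample(units, round(len(units) * fraction))
        for name, units in strata.items()
    }

# Hypothetical frame: 60 urban and 140 rural facilities.
frame = [("urban", i) for i in range(60)] + [("rural", i) for i in range(140)]
sample = stratified_sample(frame, lambda unit: unit[0], 0.10, random.Random(7))
print({name: len(units) for name, units in sample.items()})
```

With an equal fraction, each stratum appears in the sample in the same proportion as in the frame (6 urban, 14 rural); varying the fraction by stratum would over-sample small groups instead.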

STRATIFIED SAMPLING…….


Draw a sample from each stratum


Stratified sampling

• Principle:

– Classify population into internally homogeneous subgroups (strata)

– Draw sample in each stratum

– Combine results of all strata

• Assumptions

– Strata are internally homogeneous, and externally heterogeneous

– Variation between strata reflects population heterogeneity

– Variation within strata is low.


Stratified sampling


Advantages of Stratified Samples

• Using the same sampling fraction for all strata ensures proportionate representation in the sample of the characteristic being stratified.

• Adequate representation of minority subgroups of interest can be ensured by stratification and by varying the sampling fraction between strata as required.

• A stratified random sample may provide increased precision (i.e. narrower confidence intervals) over that which is possible with a simple random sample of the same size (i.e., DEFF<1).

• Information concerning estimates within each stratum is easily obtainable.

• For either administrative or logistic reasons, it may be easier to select a stratified sample than a simple random sample.


Disadvantages of Stratified Samples

• Stratified sampling requires advance knowledge of the characteristic in the population used for stratification.

• The sampling frame of the entire population has to be prepared separately for each stratum.

• Stratified sampling may not be less expensive than simple random sampling since detailed frames must be constructed for each stratum prior to sampling.

• Varying the sampling fraction between strata, to ensure selection of sufficient numbers in minority subgroups for study, affects the proportional representativeness of the subgroups in the sample as a whole, and makes the analysis of the survey more complex.

• Strata-level estimates may not have the desired level of precision, or the total sample size may be larger than needed


Example: Stratified sampling

Test Your Understanding

• An auto analyst is conducting a satisfaction survey, sampling from a list of 10,000 new car buyers. The list includes 2,500 Ford buyers, 2,500 GM buyers, 2,500 Honda buyers, and 2,500 Toyota buyers. The analyst selects a sample of 400 car buyers, by randomly sampling 100 buyers of each brand.

• Is this an example of a simple random sample?

(A) Yes, because each buyer in the sample was randomly sampled.

(B) Yes, because each buyer in the sample had an equal chance of being sampled.

(C) Yes, because car buyers of every brand were equally represented in the sample.

(D) No, because every possible 400-buyer sample did not have an equal chance of being chosen.

(E) No, because the population consisted of purchasers of four different brands of car.

Solution

• The correct answer is (D). A simple random sample requires that every sample of size n (in this problem, n is equal to 400) has an equal chance of being selected. In this problem, there was a 100 percent chance that the sample would include 100 purchasers of each brand of car. There was zero percent chance that the sample would include, for example, 99 Ford buyers, 101 Honda buyers, 100 Toyota buyers, and 100 GM buyers. Thus, all possible samples of size 400 did not have an equal chance of being selected; so this cannot be a simple random sample.

• The fact that each buyer in the sample was randomly sampled is a necessary condition for a simple random sample, but it is not sufficient. Similarly, the fact that each buyer in the sample had an equal chance of being selected is characteristic of a simple random sample, but it is not sufficient. The sampling method in this problem used random sampling and gave each buyer an equal chance of being selected; but the sampling method was actually stratified random sampling.
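The argument in the solution can be checked numerically. The sketch below compares the inclusion probability of a single buyer under the two designs and counts how many distinct 400-buyer samples each design can produce; the variable names are illustrative only.

```python
import math

# Chance that any one buyer ends up in the sample.
p_simple_random = 400 / 10_000   # simple random sample of 400 from 10,000
p_stratified = 100 / 2_500       # 100 from each brand list of 2,500

# Number of distinct samples each design can generate.
n_srs_samples = math.comb(10_000, 400)
n_stratified_samples = math.comb(2_500, 100) ** 4

print(p_simple_random == p_stratified)
print(n_stratified_samples < n_srs_samples)
```

Each buyer has the same 4% chance either way, yet the stratified design can only produce a subset of the samples that simple random sampling allows, which is why answer (D) holds.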

http://stattrek.com/Help/Glossary.aspx?Target=Simple random sampling

http://stattrek.com/Help/Glossary.aspx?Target=Sample

http://stattrek.com/Help/Glossary.aspx?Target=Stratified sampling


Cluster Sampling

• Population is divided into internally heterogeneous, but convenient, groups ("clusters").

• Some groups are sampled.

• One-Stage Cluster Sampling: every element of selected groups is sampled.

• Two-Stage Cluster Sampling: elements within selected groups are themselves randomly chosen.

• Multi-Stage Cluster Sampling…..
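The difference between one-stage and two-stage cluster sampling can be sketched as below. The ten villages of twenty households each, the seed and the function names are hypothetical.

```python
import random

def one_stage(clusters, n_clusters, rng):
    """Randomly pick clusters and keep every element in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

def two_stage(clusters, n_clusters, n_per_cluster, rng):
    """Randomly pick clusters, then randomly pick elements within each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen
            for unit in rng.sample(clusters[name], n_per_cluster)]

# Hypothetical frame: 10 villages (clusters) with 20 households each.
villages = {f"village_{v}": [f"v{v}_hh{h}" for h in range(20)] for v in range(10)}

rng = random.Random(11)
full_clusters = one_stage(villages, 3, rng)
subsampled = two_stage(villages, 3, 5, rng)
print(len(full_clusters), len(subsampled))
```

Only a list of clusters is needed up front; households have to be listed only inside the clusters that were actually selected.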


Cluster Sampling: Diagram


Cluster Sampling

One-Stage Cluster Sample: Fully observe some clusters

CLUSTER SAMPLING


• Cluster sampling is often carried out as 'two-stage sampling'.

• First stage: a sample of areas is chosen;

• Second stage: a sample of respondents within those areas is selected.

• Sampling units are groups rather than individuals.

• A sample of such clusters is then selected.

• In one-stage cluster sampling, all units from the selected clusters are studied.

CLUSTER SAMPLING…….


• Advantages:

– Cuts down on the cost of preparing a sampling frame.

– This can reduce travel and other administrative costs.

• Disadvantages:

– Sampling error is higher than for a simple random sample of the same size.
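How much higher the sampling error becomes is often summarised by the design effect. The sketch below uses a commonly quoted approximation for equal-sized clusters, DEFF = 1 + (m - 1) × ICC, where m is the number of units per cluster and ICC is the intracluster correlation. The ICC of 0.2 and the 30 clusters of 7 units are assumed values for illustration, not results from any survey.

```python
def design_effect(cluster_size, icc):
    """Approximate design effect for equal-sized clusters."""
    return 1 + (cluster_size - 1) * icc

def effective_sample_size(n, deff):
    """Size of the simple random sample that would give the same precision."""
    return n / deff

deff = design_effect(cluster_size=7, icc=0.2)   # assumed ICC
n_effective = effective_sample_size(30 * 7, deff)
print(round(deff, 2), round(n_effective, 1))
```

Under these assumptions, 210 clustered observations carry roughly the information of 95 independently sampled ones.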


Multiple stage sampling

Principle

• repeated sampling

• example :

sampling unit = household

– 1st stage: sampling provinces

– 2nd stage: sampling villages/local areas

– 3rd stage: sampling households
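The three stages can be expressed as one repeated step, as in the sketch below. The nested frame (4 provinces, 5 villages each, 10 households each), the sample sizes per stage, the seed and the function name are hypothetical.

```python
import random

def multistage_sample(node, sizes, rng):
    """Sample sizes[0] units at this stage, then repeat inside each one.

    `node` is a dict of sub-frames at the upper stages and a plain list
    of final sampling units (households) at the last stage.
    """
    if len(sizes) == 1:
        return rng.sample(node, sizes[0])
    chosen = rng.sample(sorted(node), sizes[0])
    return [unit for key in chosen
            for unit in multistage_sample(node[key], sizes[1:], rng)]

# Hypothetical frame: provinces -> villages -> households.
country = {
    f"province_{p}": {
        f"village_{p}_{v}": [f"hh_{p}_{v}_{h}" for h in range(10)]
        for v in range(5)
    }
    for p in range(4)
}

households = multistage_sample(country, [2, 2, 3], random.Random(3))
print(len(households))
```

Two provinces, two villages in each, and three households per village give 12 households in total.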

http://en.wikipedia.org/wiki/Cluster_sampling


Multistage Sampling

• Randomly select primary sampling units at the first stage:

– Specific communities

– Specific health facilities

• Within the primary sampling units, randomly select the final sampling units at the second stage:

– Drug use encounters

– Patients

– Households

• Sometimes in complex samples, additional stages are needed

Non-probability Sampling


Non-probability samples

• Probability of being chosen: unknown

• Any sampling method where some elements of the population have no chance of selection (these are sometimes referred to as 'out of coverage'/'undercovered'),

• or where the probability of selection can't be accurately determined.


Non-probability samples (II)

• The unknown probability of selection means that the sampling error cannot be calculated.

• May be more convenient and less expensive to execute (sometimes this is the only feasible option) – does not require the identification of a sampling frame.

• The results cannot be generalized.

• Requires judgment and caution when interpreting the results.

Nonprobability Sampling

• Convenience sampling

• Quota sampling

• Snowball sampling

• Judgmental sampling

CONVENIENCE SAMPLING


• A type of nonprobability sampling which involves the sample being drawn from that part of the population which is close to hand. That is, readily available and convenient.

• Sometimes known as grab or opportunity sampling or accidental or haphazard sampling.


CONVENIENCE SAMPLING…….

– Use sampling units that are easy to get


• The researcher using such a sample cannot scientifically make generalizations about the total population from this sample because it would not be representative enough.

• For example, if an interviewer were to conduct a survey at a shopping center early in the morning on a given day, the people he/she could interview would be limited to those present at that time. Their views would not represent those of other members of society in the area, who might have been reached had the survey been conducted at different times of day and several times per week.

• This type of sampling is most useful for pilot testing.

QUOTA SAMPLING


• The population is first segmented into mutually exclusive sub-groups, just as in stratified sampling.

• Then judgment is used to select subjects or units from each segment based on a specified proportion.

• For example, an interviewer may be told to sample 200 females and 300 males between the age of 45 and 60.

• In quota sampling the selection of the sample is non-random.
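To make the non-random nature of quota sampling concrete, the sketch below fills quotas in order of arrival. It is scaled down to 2 females and 3 males, assumes every arrival is already in the eligible age range, and uses a made-up stream of arrivals; the function name is hypothetical.

```python
def quota_sample(arrivals, quotas):
    """Take people in the order they turn up until each quota is full.
    No random selection is involved."""
    remaining = dict(quotas)
    chosen = []
    for person in arrivals:
        if remaining.get(person["sex"], 0) > 0:
            chosen.append(person)
            remaining[person["sex"]] -= 1
        if sum(remaining.values()) == 0:
            break
    return chosen

# Hypothetical stream of eligible people passing the interviewer.
arrivals = [{"id": i, "sex": "F" if i % 3 == 0 else "M"} for i in range(30)]
chosen = quota_sample(arrivals, {"F": 2, "M": 3})
print([person["id"] for person in chosen])
```

The quotas are met exactly, but only the first few arrivals ever had a chance of selection, so selection probabilities are unknown.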

Judgmental sampling or Purposive sampling


• The researchers choose the sample based on who they think would be appropriate for the study.

• This is used primarily when there is a limited number of people that have expertise in the area being researched

• For qualitative research

Voluntary sample

• A voluntary sample is made up of people who self-select into the survey. Often, these folks have a strong interest in the main topic of the survey.

Suppose, for example, that a news show asks viewers to participate in an on-line poll. This would be a volunteer sample. The sample is chosen by the viewers, not by the survey administrator.

Snowball Sampling

• In the snowball sampling technique, existing study subjects are used to recruit more subjects into the sample.

• In respondent-driven sampling, initial respondents refer others to the researcher

– Usually used with hard-to-discover populations

– Bias introduced by structured nature of affiliation

– Can be improved with incentives to subjects to recruit a certain number of new respondents
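Snowball recruitment can be pictured as following referral links outward from the initial subjects. The referral network, the seed subject and the function name below are invented for illustration.

```python
from collections import deque

def snowball(seeds, referrals, target_size):
    """Recruit in waves: each recruited subject refers their contacts
    until the target size is reached or referrals run out."""
    recruited = list(dict.fromkeys(seeds))   # keep order, drop duplicates
    seen = set(recruited)
    queue = deque(recruited)
    while queue and len(recruited) < target_size:
        person = queue.popleft()
        for contact in referrals.get(person, []):
            if contact not in seen and len(recruited) < target_size:
                seen.add(contact)
                recruited.append(contact)
                queue.append(contact)
    return recruited

# Hypothetical referral network; F and G are not connected to A's circle.
referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "F": ["G"]}
sample = snowball(["A"], referrals, target_size=10)
print(sample)
```

Subjects F and G are never reached because nobody in the starting network knows them, which illustrates the bias introduced by the structure of affiliation.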

http://en.wikipedia.org/wiki/Mutually_exclusive


http://en.wikipedia.org/wiki/Stratified_sampling

http://en.wikipedia.org/wiki/Random

http://en.wikipedia.org/wiki/Snowball_sampling


Nonprobability Sampling Critiques

–Limited generalizability—one cannot judge representativeness.

–Researchers should estimate who the sample represents . . . The sample at least represents populations that are similar to it.

–Why use nonprobability samples? Nonprobability does not mean, “intentional attempt to get a sample that is not representative:”

1. Well-suited for exploratory and evaluation research

2. Sampling frames (lists from which samples are drawn) are at times inadequate or nonexistent

3. Quick, efficient

4. Can be effectively used to study and describe social and social psychological “processes”

5. Any research is limited, but not having research is worse.

6. Across samples, repeatedly finding the same results supports generalizability.

What sampling method would you recommend?

• Determining proportion of undernourished five year olds in a village.

• Investigating nutritional status of preschool children.

• Selecting maternity records for the study of previous abortions or duration of postnatal stay.

What sampling method would you recommend?

• In estimation of immunization coverage in a province, data on seven children aged 12-23 months in 30 clusters are used to determine proportion of fully immunized children in the province.

• Give reasons why cluster sampling is used in this survey.
