sampling and its variability

Presentation by: Dr. Bhushan Kamble Moderator: Dr. Poornima Tiwari Professor, Department of Community Medicine, VMMC & SJH Sampling and sampling variability

Upload: drbhushan-kamble

Post on 27-Jan-2015


Page 1: Sampling  and its variability

Presentation by: Dr. Bhushan Kamble

Moderator: Dr. Poornima Tiwari

Professor,

Department of Community Medicine,

VMMC & SJH

Sampling and sampling variability

Page 2: Sampling  and its variability

Outline of presentation
• Definitions
• Need for sampling
• Types of sampling design
   Probability sampling
   Non probability sampling
• Factors affecting choice of sampling design
• Sample size
   Factors affecting sample size
   Calculation of sample size for descriptive studies and comparison studies
• Sampling variability
• Sampling errors
• References

Page 3: Sampling  and its variability

Definitions

Population: the target group to which the findings of a study would ultimately apply is called the population [1].

Or: population is the term statisticians use to describe a large set or collection of items that have something in common [2].

Sample: that part of the target population which is actually enquired upon or investigated [1].

Or: a sample is a subset of the population, selected in such a way that it is representative of the larger population [2].

1. Indrayan A., Satyanarayana L., Medical Biostatistics, third edition, 2009.
2. Last JM. Dictionary of Epidemiology, 3rd edition, 2000.

Page 4: Sampling  and its variability

Definitions (cont.)

Sampling: the process of selecting a small number of elements from a larger defined target group of elements, such that the information gathered from the small group will allow judgments to be made about the larger group.

Conclusions based on the sample results may be attributed only to the population sampled*.

*Dawson B., Trapp RG, Basic and Clinical Biostatistics, second edition, 1994

Page 5: Sampling  and its variability

Definitions (cont.)

Sampling unit: is the unit of selection

Unit of study or element: is the subject on which information is obtained.

Sampling frame: list of all sampling units in the target population is called a sampling frame.

Sample size: the number of units or subjects sampled for inclusion in the study is called sample size.

Sampling technique: Method of selecting sampling units from sampling frame

Page 6: Sampling  and its variability

Population vs. sample

[Figure: a sample drawn from the population of interest. A parameter describes the population; a statistic describes the sample.]

We measure the sample using statistics in order to draw inferences about the population and its parameters.

Page 7: Sampling  and its variability

Target population: the population you want to generalize results to.

Study population: the population you have access to for your study.

Sampling frame: how you get access to the study population.

Sample: the group the study is actually done on.

Page 8: Sampling  and its variability

Need for sampling

1. Complete enumeration may not be possible.

2. Resources: Lower cost, Lesser demand on personnel.

3. Speed: Faster results due to lesser coverage.

4. Reliable information: Due to small size - better trained personnel, more accurate methods, better supervision.

To draw conclusions about a population from a sample, the sample must meet two major requirements. First, the sample size should be adequately large. Second, the sample has to be selected appropriately so that it is representative of the population, that is, it should reflect the relevant characteristics of the population.

Page 9: Sampling  and its variability

Disadvantages of sampling

1. Sampling entails an argument from the fraction to the whole. Validity depends on representativeness of the sample.

2. Fails to provide precise information in case of small segments containing few individuals.

3. Not suitable for studies where complete enumeration is needed.

4. May cause a feeling of discrimination among the subjects who are not included in the study.

Page 10: Sampling  and its variability

Types of sampling

Probability sampling: the probability of selection of each individual is known and predetermined.
• Simple random sampling
• Systematic random sampling
• Stratified random sampling
• Cluster random sampling
• Multistage random sampling

Non probability sampling: the probability of selection of each individual is not known.
• Quota sampling
• Purposive/Judgmental sampling
• Snowball/Network sampling
• Convenience/Grab sampling (man in the street)

Page 11: Sampling  and its variability

Simple random sampling

• Equal probability of selection of units for inclusion in the study
• Requires a list of all sampling units (sampling frame)
• Each individual is chosen randomly

Methods:
• Lottery method (possible for a finite population)
• Random number tables
• Software that generates random numbers
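As an illustration of the software method, here is a minimal Python sketch of simple random sampling from a sampling frame. The frame of patient IDs 1 to 100, the function name and the `replace` flag are assumptions made for this example; they are not part of the original slides.

```python
import random

def simple_random_sample(frame, n, replace=False, seed=None):
    """Draw n units from a sampling frame, each with equal probability."""
    rng = random.Random(seed)
    if replace:
        # With replacement: the same unit may be drawn more than once.
        return [rng.choice(frame) for _ in range(n)]
    # Without replacement: every unit appears at most once.
    return rng.sample(frame, n)

# Hypothetical sampling frame: patient IDs 1..100.
frame = list(range(1, 101))
sample = simple_random_sample(frame, 20, seed=1)
print(sorted(sample))
```

Fixing the seed only makes the draw reproducible; leaving it as `None` gives a fresh random sample on each run.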

Page 12: Sampling  and its variability

Lottery method


Page 13: Sampling  and its variability

Random number table

76 58 30 83 64

47 56 91 29 34

10 80 21 38 84

00 95 01 31 76

07 28 37 07 61

Page 14: Sampling  and its variability
Page 15: Sampling  and its variability

Simple random sampling (contd.)

Simple random sampling can be done:
• With replacement
• Without replacement

Advantages:
• Very scientific method
• Equal chance of selection for all subjects

Disadvantage:
• Requires a sampling frame

Example: blood sampling for TLC and Hb estimation

Page 16: Sampling  and its variability

Stratified random sampling

Preferred method when the population is heterogeneous with respect to characteristic under study.

Population is divided into groups or strata on the basis of certain characteristics.

A simple random sample is selected from each stratum.

Ensures representation of the different strata/groups in the study population.

Can be done by selecting individuals from the different strata in certain fixed, predetermined proportions:
• Proportional stratified sampling
• Disproportionate stratified sampling

Page 17: Sampling  and its variability

Stratified random sampling(contd.)

For example, if we draw a simple random sample from a population, a sample of 100 may contain:
• 10 to 15 from the high socioeconomic group
• 20 to 25 from the middle socioeconomic group
• 70 to 75 from the low socioeconomic group

To get an adequately large representation of all three socioeconomic groups, we can stratify on socioeconomic class and select a simple random sample from each of the three strata.
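The difference between proportional and disproportionate allocation can be sketched in a few lines of Python. The population below (120 high, 230 middle and 650 low socioeconomic units) and the equal allocation of roughly 33 per stratum are hypothetical values chosen only to mirror the example; the function names are likewise assumptions.

```python
import random

def proportional_allocation(strata, n):
    """Split a total sample size n across strata in proportion to stratum size."""
    total = sum(len(units) for units in strata.values())
    # Rounding can make the sizes sum to slightly more or less than n in general.
    return {name: round(n * len(units) / total) for name, units in strata.items()}

def stratified_sample(strata, sizes, seed=None):
    """Draw a simple random sample (without replacement) from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}

# Hypothetical population of 1000 people, labelled by stratum.
strata = {
    "high": [f"H{i}" for i in range(120)],
    "middle": [f"M{i}" for i in range(230)],
    "low": [f"L{i}" for i in range(650)],
}

proportional = proportional_allocation(strata, 100)
equal = {"high": 33, "middle": 33, "low": 34}  # one disproportionate choice

sample = stratified_sample(strata, equal, seed=7)
print(proportional)
print({name: len(units) for name, units in sample.items()})
```

Proportional allocation reproduces the population mix, whereas the disproportionate allocation deliberately oversamples the small high socioeconomic stratum so that it is adequately represented.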

Page 18: Sampling  and its variability

[Figure: the population divided into low, middle and high socioeconomic strata]

Page 19: Sampling  and its variability

Stratified random sampling (contd.)

Advantages:
• All groups, however small, are adequately represented.
• Useful when we want to highlight a specific subgroup within the population; ensures presence of the subgroup.
• Allows existing relationships between two or more subgroups to be observed.
• Can representatively sample even the smallest and most inaccessible subgroups in the population, including the rare extremes.
• Higher statistical precision than simple random sampling (due to lesser variability within strata), so less time and money are needed.

Disadvantages:
• Requires a sampling frame for each stratum separately.
• Requires accurate information on the proportion of each stratum.

Page 20: Sampling  and its variability

Systematic random sampling

Systematic sampling is a commonly employed technique when a complete and up-to-date list of sampling units is available.

A systematic random sample is obtained by:
• Selecting the first unit on a random basis
• Then including the others on the basis of the sampling interval, I = N/n

Page 21: Sampling  and its variability

Systematic random sampling (contd.)

For example, suppose there are 100 patients (N) in a hospital and a sample of 20 patients (n) is to be selected by systematic random sampling.

Step 1: Write the names of the 100 patients in alphabetical order, or their roll numbers, one below the other.

Step 2: Sampling interval: divide N by n to get the sampling interval (k). In the example, k = 100/20 = 5.

Step 3: Randomly select a number between 1 and k, i.e. between 1 and 5. Suppose the number we select is 4.

Step 4: Patient number 4 is selected in the sample.

Step 5: Thereafter every kth patient is selected (4 + k = 9, 4 + 2k = 14, and so on) until we reach the end of the list.
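The steps of the patient example can be sketched in Python as follows. This is a minimal illustration that assumes N is an exact multiple of n; the function name and the optional `start` argument (used here to reproduce the random start of 4) are assumptions for the example.

```python
import random

def systematic_sample(frame, n, start=None, seed=None):
    """Select every k-th unit from an ordered frame after a random start."""
    N = len(frame)
    k = N // n  # sampling interval; assumes N is a multiple of n
    if start is None:
        # Random start between 1 and k, inclusive (1-based position).
        start = random.Random(seed).randint(1, k)
    # 1-based positions start, start + k, start + 2k, ...
    return [frame[pos - 1] for pos in range(start, N + 1, k)][:n]

patients = list(range(1, 101))          # patient numbers 1..100
selected = systematic_sample(patients, 20, start=4)
print(selected[:5], "...", selected[-1])
```

With a start of 4 and an interval of 5, the selected patients are 4, 9, 14 and so on up to 99, giving 20 patients in all.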

Page 22: Sampling  and its variability

Systematic random sampling(contd.)

Page 23: Sampling  and its variability

Systematic random sampling (contd.)

Advantages: easy to draw (simplicity); assurance that the population will be evenly sampled.

Disadvantage: requires a sampling frame.

E.g. random blinded rechecking of slides under RNTCP: slides are drawn from the register by systematic random sampling.

Page 24: Sampling  and its variability

Cluster sampling

The population is divided into subgroups (clusters), such as families. A simple random sample of the clusters is taken, and then all members of the selected clusters are surveyed.

Cluster sampling works best when each cluster is itself heterogeneous, so that it resembles the population in miniature.

Clusters are usually formed by grouping units on the basis of their geographical location.

Cluster sampling is a very useful method for field epidemiological research and for health administrators.

Page 25: Sampling  and its variability

Cluster sampling

[Figure: a population divided into five clusters (Cluster 1 to Cluster 5)]

Page 26: Sampling  and its variability

Cluster sampling (contd.)

Types:
• One stage: all units in the selected clusters are included.
• Two stage: only some units from each selected cluster are taken, using simple random or systematic random sampling.

Advantages:
• Simple, as a complete list of sampling units within the population is not required
• Low cost
• Can estimate characteristics of both the cluster and the population
• Less travel/resources required

Disadvantages:
• Members of the same cluster are more likely to be alike than members of different clusters (within-cluster homogeneity).
• Each stage in cluster sampling introduces sampling error; the more stages there are, the more error there tends to be.
• Usually less expensive than simple random sampling (SRS) but not as accurate.

Page 27: Sampling  and its variability

Cluster sampling (contd.)

A special form of cluster sampling, called "30 x 7 cluster sampling", has been recommended by the WHO for field studies assessing vaccination coverage.

• A list of all villages (clusters) in a given geographical area is made.
• 30 clusters are selected using probability proportional to size (PPS).
• From each of the selected clusters, 7 subjects are randomly chosen.
• Thus a total sample of 30 x 7 = 210 subjects is chosen.

The advantage of cluster sampling is that a sampling frame of individual subjects is not required; only a list of clusters is needed.

Page 28: Sampling  and its variability

Probability proportional to size (PPS)

Steps:
1. A list of all clusters (villages and sectors/wards) is made.
2. The population of each cluster is written against it.
3. The cumulative population is then written in serial order.
4. The sampling interval (SI) is calculated: SI = total cumulative population / 30.
5. Choose a random number between 1 and SI. This is the random start (RS). The first cluster to be sampled is the one whose cumulative population range contains RS.
6. Calculate the series RS; RS + SI; RS + 2SI; ...; RS + (d-1) x SI, where d is the number of clusters to be selected (here 30).
7. The clusters selected are those whose cumulative population range contains one of the numbers in this series.
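The PPS steps can be sketched in Python as below. The six village populations, the choice of 4 clusters instead of 30 and the random start of 90 are hypothetical values picked to keep the example small; rounding the sampling interval down to a whole number is a simplification made here so that the series never runs past the total population.

```python
import random
from bisect import bisect_left
from itertools import accumulate

def pps_select(populations, n_clusters, random_start=None, seed=None):
    """Return indices of clusters chosen by systematic PPS selection."""
    cumulative = list(accumulate(populations))   # cumulative population
    total = cumulative[-1]
    interval = total // n_clusters               # sampling interval (SI), rounded down
    if random_start is None:
        random_start = random.Random(seed).randint(1, interval)  # RS in 1..SI
    # Series RS, RS + SI, RS + 2SI, ..., RS + (d - 1) * SI
    series = [random_start + i * interval for i in range(n_clusters)]
    # A cluster is selected when its cumulative range contains a series value,
    # i.e. it is the first cluster whose cumulative population is >= that value.
    return [bisect_left(cumulative, value) for value in series]

villages = ["A", "B", "C", "D", "E", "F"]        # hypothetical clusters
populations = [120, 200, 80, 150, 250, 200]      # hypothetical populations

chosen = pps_select(populations, n_clusters=4, random_start=90)
print([villages[i] for i in chosen])
```

Here the cumulative populations are 120, 320, 400, 550, 800 and 1000, the interval is 250 and the series is 90, 340, 590, 840, so villages A, C, E and F are selected. A cluster larger than the sampling interval can be hit by more than one series value, in which case its index appears more than once.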

Page 29: Sampling  and its variability
Page 30: Sampling  and its variability

Multistage random sampling

Multistage sampling refers to sampling plans where the sampling is carried out in stages, using smaller and smaller sampling units at each stage.

Not all secondary units are sampled. It is normally used to overcome problems associated with a geographically dispersed population.

Page 31: Sampling  and its variability

Multistage random sampling

• In this method, the whole population is divided into first-stage sampling units, from which a random sample is selected.
• Each selected first-stage unit is then subdivided into second-stage units, from which another sample is selected.
• Third- and fourth-stage sampling is done in the same manner if necessary.

Example: NFHS data are collected by multistage sampling.
• Rural areas: 2-stage sampling. Villages are selected from a list by PPS, then households from each village.
• Urban areas: wards (PPS), then census enumeration blocks (CEB) (PPS), then 30 households from each CEB.

Page 32: Sampling  and its variability

[Figure: sampling stages in urban areas: ward, CEB, household]

Page 33: Sampling  and its variability

Non probability sampling

The probability of each case being selected from the total population is not known

Units of the sample are chosen on the basis of personal judgment or convenience

There are NO statistical techniques for measuring random sampling error in a non-probability sample. Therefore, generalizability is never statistically appropriate

Page 34: Sampling  and its variability

Non probability sampling (contd.)

• Involves non-random methods of sample selection
• Not all units have an equal chance of being selected
• Selection depends upon the situation
• Considerably less expensive
• Convenient
• Sample can be chosen in many ways

Page 35: Sampling  and its variability

Types of Non probability sampling

Convenience/Grab/Availability

Judgment/Purposive sampling

Quota sampling

Snowball/Network

Page 36: Sampling  and its variability

Convenience/Grab/Availability sampling

Subjects are selected because it is easy to access them, e.g. students in your class, people on the street, friends, etc.

Advantages:
• In pilot studies, a convenience sample is usually used to obtain basic data and trends.
• Useful in documenting that a particular quality of a substance or phenomenon occurs within a given sample.

Disadvantages:
• Not representative of the entire population; results may be skewed.
• Limits generalization and inference about the entire population (low external validity).

Page 37: Sampling  and its variability

Snowball/Network sampling

• Used when the subjects of interest are very rare or limited to a very small subgroup of the population.
• Works like a chain referral.
• The initial subject helps identify other people with a similar trait.

Advantages:
• Reaches rare and difficult-to-access populations.
• Cheap, cost-efficient.
• Less workforce and less planning needed.

Disadvantages:
• Little control over the sampling technique.
• Representativeness is not guaranteed.
• Sampling bias, because people tend to refer people they know, who are more likely to be similar to them.

Page 38: Sampling  and its variability
Page 39: Sampling  and its variability

Purposive or judgmental sampling

• Relies on the expertise of an authority to select a more representative sample.
• Knowledge of the research question is required.
• Subjects are selected for a good reason tied to the purposes of the research.

Advantages:
• Reaches hard-to-get populations that cannot be found through screening the general population.
• Usually used when a limited number of individuals possess the trait of interest.

Disadvantages:
• No way to evaluate the reliability of the expert or authority.
• Biased, since no randomization is used in obtaining the sample, so results cannot be generalised.

Page 40: Sampling  and its variability

Quota sampling

• The population is divided into cells on the basis of relevant control characteristics.

• A quota of sample units is established for each cell.
• A convenience sample is drawn for each cell until the quota is met.
• The number of subjects in specified categories is planned in advance (e.g. 100 men, 100 women).
• In uncontrolled quota sampling, the subjects chosen for those categories are a convenience sample.
• In controlled quota sampling, restrictions are imposed to limit the interviewer's choice.

Page 41: Sampling  and its variability

• To sample a subgroup that is of great interest to the study.
• To observe relationships between subgroups.
• Example: an interviewer may be told to sample 50 males and 50 females.

Advantages:
• Used when the research budget is limited
• Introduces some elements of stratification

Disadvantages:
• Variability and bias cannot be controlled or measured
• Time consuming

Page 42: Sampling  and its variability

Factors affecting choice of sampling designs

Heterogeneity: need a larger sample to study a more diverse population

Desired precision: need a larger sample to get a smaller error

Nature of analysis: complex multivariate statistics need larger samples

Accuracy of a sample depends upon the sample size, not the ratio of sample to population

Page 43: Sampling  and its variability

Sample size

Page 44: Sampling  and its variability

Factors affecting sample size

1. Study design: descriptive or comparison study

2. Sampling design: smaller if stratified, larger if cluster

3. Type and number of variables being studied.

4. Maximum tolerable probability of type I error.

5. Required power for a specified clinically important difference.

6. Specification of the magnitude of difference that would be considered significant.

7. The extent of variability among measurements( S.D.)

8. Whether underlying distribution is normal or skewed

9. Heterogeneity of population: need larger sample to study more diverse population

10. Desired precision: need larger sample to get smaller error

11. Nature of analysis: complex multivariate statistics need larger samples

12. Resources and time at hand

Page 45: Sampling  and its variability

Calculation of sample size

Page 46: Sampling  and its variability

SAMPLE SIZE FOR QUALITATIVE OUTCOME VARIABLE

$$n = \frac{4PQ}{L^2}$$

where
n = sample size
P = estimated prevalence
Q = 1 - P (that is, 100 - P when P is expressed in percent)
L = allowable error

Example: A survey is to estimate the prevalence of influenza virus infection in schoolchildren. Suppose the available evidence suggests that approximately 20% (P = 20) of the children will have antibodies to the virus. Assume the investigator wants to estimate the prevalence to within 6 percentage points of the true value (6% is called the allowable error, L).

The required sample size is:

$$n = \frac{4 \times 20 \times 80}{6 \times 6} = 177.78$$

Thus approximately 180 children would be needed for the survey.
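The influenza example can be checked with a short Python sketch. The function name is an assumption for this illustration; P and L are entered in percent, as in the example.

```python
import math

def sample_size_prevalence(p_percent, allowable_error_percent):
    """n = 4PQ / L^2 with P, Q and L all expressed in percent."""
    q_percent = 100 - p_percent
    return 4 * p_percent * q_percent / allowable_error_percent ** 2

n = sample_size_prevalence(20, 6)
print(round(n, 2))    # 177.78
print(math.ceil(n))   # 178, the smallest whole number of children
```

The slide rounds the result up further to about 180, a convenient figure in practice.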

Page 47: Sampling  and its variability

Sample size for estimation of mean

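$$n = \frac{z_{\alpha/2}^2 \, s^2}{l^2}$$

where
n = sample size
s = standard deviation
l = absolute precision
z = relative deviate
α = alpha error

$z_{\alpha/2} = 1.96$ for α = 0.05, so taking $z_{\alpha/2}^2 \approx 4$:

$$n \approx \frac{4 s^2}{l^2}$$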

Page 48: Sampling  and its variability

Example: Suppose that it is required to estimate diastolic blood pressure in a population to within ±2 mmHg (using a 95% confidence interval), and the standard deviation of diastolic blood pressure is known to be 15 mmHg.

s = 15, l = 2

$$n = \frac{z_{\alpha/2}^2 \, s^2}{l^2} = \frac{(1.96)^2 \times 225}{4} = 216.09$$

The next highest integer is taken, giving a requirement of 217 subjects. (The rounded formula $n = 4s^2/l^2$ gives the slightly larger value of 225.)
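A minimal Python sketch of the blood pressure calculation follows; the function name and the default of z = 1.96 are assumptions for the illustration. It also shows how the exact deviate and the rounded value of 2 lead to different answers.

```python
import math

def sample_size_mean(sd, precision, z=1.96):
    """n = z^2 * s^2 / l^2 for estimating a mean to within +/- precision."""
    return (z * sd / precision) ** 2

exact = sample_size_mean(15, 2)          # using z = 1.96
rounded = sample_size_mean(15, 2, z=2)   # using z ~ 2, i.e. n = 4 s^2 / l^2

print(round(exact, 2), math.ceil(exact))  # 216.09 217
print(rounded)                            # 225.0
```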

Page 49: Sampling  and its variability

Sample size for estimation of proportion

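$$n = \frac{z_{\alpha/2}^2 \, p(1-p)}{l^2}$$

where
n = sample size
p = anticipated value of the proportion in the population
l = absolute precision
z = relative deviate
α = alpha error

$z_{\alpha/2} = 1.96$ for α = 0.05, so taking $z_{\alpha/2}^2 \approx 4$:

$$n \approx \frac{4 p(1-p)}{l^2}$$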

Page 50: Sampling  and its variability

Example: Suppose it is thought that there are about 28% smokers in the population, and it is required to estimate the percentage of smokers to within ±3% (in absolute terms), using a 95% confidence interval.

p = 0.28, l = 0.03

$$n = \frac{z_{\alpha/2}^2 \, p(1-p)}{l^2} = \frac{(1.96)^2 \times 0.28 \times (1 - 0.28)}{(0.03)^2} = 860.5$$

so that a survey of 861 persons is required. (Using 4 in place of $(1.96)^2$ gives 896.)

Page 51: Sampling  and its variability

Sample size for estimation of rate

$$n = \frac{z_{\alpha/2}^2 \, r^2}{l^2} \approx \frac{4 r^2}{l^2}$$

where:
n = number of cases to be observed
r = estimated rate in the population
l = absolute precision

Example: Suppose that a rate is expected to be around 25 per million (per year) and it is required to estimate it with a 95% confidence interval to within ±5 per million. The number of cases required to achieve this level of precision is

$$n = \frac{(1.96)^2 \times (25)^2}{(5)^2} = 96.04$$

which means that 97 cases would have to be observed. (Using 4 in place of $(1.96)^2$ gives 100.)

Page 52: Sampling  and its variability

Sample size for estimation of difference between two population means

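$$n = \frac{z_{\alpha/2}^2 \, (s_1^2 + s_2^2)}{l^2}$$

where
n = sample size
s = standard deviation (subscripts 1, 2 refer to the two populations)
l = absolute precision
z = relative deviate
α = alpha error

$z_{\alpha/2} = 1.96$ for α = 0.05, so taking $z_{\alpha/2}^2 \approx 4$:

$$n \approx \frac{4 (s_1^2 + s_2^2)}{l^2}$$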

Page 53: Sampling  and its variability

Sample size for estimation of difference between two population proportions

$$n = \frac{z_{\alpha/2}^2 \, [\, p_1(1-p_1) + p_2(1-p_2) \,]}{l^2}$$

where
n = sample size
p = anticipated value of the proportion in the population (subscripts 1, 2 refer to the two populations)
l = absolute precision
z = relative deviate
α = alpha error

$z_{\alpha/2} = 1.96$ for α = 0.05, so taking $z_{\alpha/2}^2 \approx 4$:

$$n \approx \frac{4 \, [\, p_1(1-p_1) + p_2(1-p_2) \,]}{l^2}$$
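The two comparison formulas (difference between two means and difference between two proportions) can be sketched in Python as below. The slides give no worked example for them, so the input values (standard deviations of 15 in both groups with a precision of 5, and proportions of 0.30 and 0.20 with a precision of 0.10) are purely hypothetical. The result is read here as the sample size needed in each group, which is an assumption; the slides only call n the "sample size".

```python
import math

Z_95 = 1.96  # relative deviate for alpha = 0.05

def n_diff_means(s1, s2, precision, z=Z_95):
    """n = z^2 (s1^2 + s2^2) / l^2 for estimating a difference in means."""
    return z ** 2 * (s1 ** 2 + s2 ** 2) / precision ** 2

def n_diff_proportions(p1, p2, precision, z=Z_95):
    """n = z^2 [p1(1 - p1) + p2(1 - p2)] / l^2 for a difference in proportions."""
    return z ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / precision ** 2

# Hypothetical inputs, not taken from the slides.
n_means = n_diff_means(15, 15, 5)
n_props = n_diff_proportions(0.30, 0.20, 0.10)

print(math.ceil(n_means))   # 70
print(math.ceil(n_props))   # 143
```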

Page 54: Sampling  and its variability

Sampling variability

Sampling variability refers to the different values which a given function of the data takes when it is computed for two or more samples drawn from the same population.

Factors affecting sampling variability:
1. Inherent variation in the population
2. Sample size
3. Sampling distribution of the mean
4. Sampling error and bias

Page 55: Sampling  and its variability

E.g. a population of 7000 children and their birth weights (kg). The mean and standard deviation of this distribution are 3.36 and 0.56 respectively. Five samples of 10 birth weights each:

Obs.      Sample 1  Sample 2  Sample 3  Sample 4  Sample 5
1           3.09      4.28      4.09      2.34      4.29
2           3.74      2.82      2.96      3.06      2.87
3           2.56      3.80      3.09      3.35      3.43
4           3.63      1.89      3.14      3.30      3.40
5           2.96      4.04      3.14      4.36      3.58
6           2.76      2.39      4.38      3.99      3.96
7           3.98      3.41      3.87      4.62      3.18
8           3.76      3.95      4.34      3.18      3.07
9           2.66      5.83      3.81      2.80      2.70
10          3.16      3.30      4.16      3.14      3.21

n             10        10        10        10        10
Mean        3.23      3.57      3.70      3.41      3.37
SD          0.51      1.10      0.56      0.71      0.48
Minimum     2.56      1.89      2.96      2.34      2.70
Maximum     3.98      5.83      4.38      4.62      4.28
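The summary rows of the table can be recomputed from the raw observations. The Python sketch below uses the standard library `statistics` module and the sample standard deviation (divisor n - 1), which is an assumption about how the SD row was produced; the values are copied from the table as transcribed.

```python
from statistics import mean, stdev

# Birth weights (kg) for the five samples, copied from the table.
samples = {
    "Sample 1": [3.09, 3.74, 2.56, 3.63, 2.96, 2.76, 3.98, 3.76, 2.66, 3.16],
    "Sample 2": [4.28, 2.82, 3.80, 1.89, 4.04, 2.39, 3.41, 3.95, 5.83, 3.30],
    "Sample 3": [4.09, 2.96, 3.09, 3.14, 3.14, 4.38, 3.87, 4.34, 3.81, 4.16],
    "Sample 4": [2.34, 3.06, 3.35, 3.30, 4.36, 3.99, 4.62, 3.18, 2.80, 3.14],
    "Sample 5": [4.29, 2.87, 3.43, 3.40, 3.58, 3.96, 3.18, 3.07, 2.70, 3.21],
}

# Mean of each sample: the means differ from sample to sample by chance alone.
means = {name: round(mean(values), 2) for name, values in samples.items()}
sds = {name: round(stdev(values), 2) for name, values in samples.items()}

# Mean of all 50 observations pooled together.
pooled = [x for values in samples.values() for x in values]
pooled_mean = round(mean(pooled), 2)

for name in samples:
    print(name, means[name], sds[name])
print("Pooled mean of 50 observations:", pooled_mean)
```

Each sample mean differs from the population mean of 3.36 kg, and the pooled mean of all 50 observations comes out at 3.46 kg.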

Page 56: Sampling  and its variability

Irrespective of sample size , the sample means are expected to fluctuate evenly about the true population mean.

The variation in sample means exhibited in the table is an example of sampling variation due to chance.

If we pool all 50 observations, the mean is 3.46 kg, so the sampling error is 3.46 - 3.36 = 0.10 kg.

The means vary less (by chance) if the sample size is large; that is, the larger the sample, the smaller the sampling error.

Page 57: Sampling  and its variability

The distribution of sample means is more closely clustered around a middle value as the sample size increases.

The means do not systematically increase or decrease with increasing sample size, and they have more variability (larger SD) when the sample size is small.

The standard deviation of the means steadily decreases as the sample size increases, and decreases more quickly when the sample size is small.

Page 58: Sampling  and its variability

The sampling distribution of the mean

Page 59: Sampling  and its variability

A sampling experiment (based on the distribution of birth weights): what happens to the mean and variability of a sample mean when we keep doubling the sample size?

Mean of population values = 3.36 kg; SD of population values = 0.56 kg

n      Mean of sample means (kg)    SD of sample means (observed SE of mean; kg)
2              3.50                         0.40
4              3.51                         0.28
8              3.46                         0.19
16             3.45                         0.11
32             3.44                         0.08
64             3.46                         0.06
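The observed SDs of the sample means can be compared with the theoretical standard error of the mean, which is the population SD divided by the square root of the sample size. That formula is standard but is not stated on the slide; the Python sketch below uses it together with the population SD of 0.56 kg and the observed values from the table.

```python
import math

def standard_error(sd, n):
    """Theoretical standard error of the mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)

POP_SD = 0.56  # SD of population birth weights (kg)

# Observed SD of sample means from the sampling experiment, by sample size.
observed = {2: 0.40, 4: 0.28, 8: 0.19, 16: 0.11, 32: 0.08, 64: 0.06}

for n, obs in observed.items():
    theoretical = standard_error(POP_SD, n)
    print(f"n={n:>2}  theoretical SE={theoretical:.2f}  observed={obs:.2f}")
```

Doubling the sample size divides the theoretical standard error by about 1.41 rather than by 2, which is why the gain in precision is largest when the sample is still small.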

Page 60: Sampling  and its variability

Sampling error

Types of sampling error:
1. Sample error
2. Non sample error

Sample error: incurred when the statistical characteristics of a population are estimated from a subset, or sample, of that population.

For example, if one measures the height of a thousand individuals from a country of one million, the average height of the thousand is typically not the same as the average height of all one million people in the country.

Page 61: Sampling  and its variability

Sample error (random error)
• Error caused by the act of taking a sample
• Causes sample results to differ from the results of a census
• Size of the error can be measured in probability samples
• Expressed as the "standard error" of the mean, proportion, etc.
• We have no control over it
• Sample error depends upon:
   Size of the sample (the larger the sample, the smaller the error)
   Distribution of the character of interest in the population

Page 62: Sampling  and its variability

Non sample error

Non-response error: occurs when units selected as part of the sampling procedure do not respond, in whole or in part.

Response error: a response or data error is any systematic bias that occurs during data collection, analysis or interpretation.
• Respondent error (e.g. lying, forgetting, etc.)
• Interviewer bias
• Recording errors
• Poorly designed questionnaires

Page 63: Sampling  and its variability

References

1. Indrayan A., Satyanarayana L., Medical Biostatistics, third edition, 2009

2. Last JM. Dictionary of Epidemiology, 3rd edition, 2000.

3. Dawson B., Trapp RG, Basic and Clinical Biostatistics, second edition, 1994

4. Daly LE, Bourke GJ, Interpretation and uses of medical statistics, fifth edition, 2003

5. Detels R., Beaglehole R., Oxford Textbook of public health, fifth edition,2011.

Page 64: Sampling  and its variability

Thank You