population and sampling in health research
TRANSCRIPT
-
7/30/2019 Population and Sampling in Health Research
Chapter II: STEPS OF SCIENTIFIC RESEARCH
Population and Sampling
DEFINITION OF TERMS
Population: A collection of units sharing a common characteristic. It might be:
* Finite: all units can be counted; e.g. students in a school.
* Infinite: counting all units is not feasible; e.g. the RBCs of an individual.
Sample: A subset of a population obtained to investigate properties of the parent population.
Target population: The population to which the results of the study will be generalized.
Sampling population: The population from which the sample was taken. Ideally, the sampling and target populations should be the same.
Sampling unit: The population unit used for sampling. In a population of individuals, the unit is not always a person; e.g.
* In a study of patients' satisfaction, a random sample of two days of the week was taken, and all the patients attending the hospital during these two days were interviewed. The sampling unit here is not the individual patient but the day of the week.
* In a study of the prevalence of smoking among school children in the country, a sample of five provinces was selected. From each province, three cities were selected. From each city, four schools were selected. From each school, three classes were selected. All the students of the selected classes were interviewed. The sampling units here are, in turn, the province, the city, the school and the class, not the individual student.
Analytic unit: The population unit used in data analysis. In the above examples, the analysis of patients' satisfaction and that of the prevalence of smoking pertain to the individual; the analytic unit is the person. Thus, regardless of the sampling unit, the unit of analysis relates more closely to the study questions and objectives.
Sampling frame: A listing of all the units that compose the sampling population.
Sampling:
Sampling is the process of selecting a number of units from a defined study population. In some quite uncommon situations, the study population is very small, and all its units can be studied. In this case, there is no need for sampling, although statistical principles of inference cannot be applied, since they are based on sampling. More often, the population is infinite or too large to be totally included in the study. Hence the importance of sampling.
Apart from saving time, money and effort, sampling leads to increased precision of the collected data, since the resources are concentrated on a smaller number of units. More important is the possibility of inference or generalization from sample to population, which rests on randomness of selection of the sample.
Sampling involves the following steps:
1. Identification of study population
2. Determination of sampling population
3. Definition of sampling unit
4. Choice of sampling method
5. Estimation of sample size
1. Identification of study population.
As defined above, the study or target population is the one to which the results of the study will be generalized. It is crucial that the investigator defines the target population clearly, since it is the most important determinant of the sampling population. On the one hand, if the subject of the study could be affected by local socio-demographic characteristics, e.g. prevalence studies, the target population is limited to the locality of the study.
If, on the other hand, the research question pertains to biological processes with minimal effects of the local or socio-demographic characteristics, the process of generalization can be extended to a wider target population, e.g. clinical trials for drug therapy.
Example:
A. In a study of the prevalence of hypertension in city "X", the prevalence rate was found to be 30% among males above 25 years of age. What is the target or study population?
This prevalence rate of 30% is only generalizable to "males above 25 years of age residing in city X". It does not apply to other gender or age categories, or to any other locality. Thus the study population is well defined by age, gender and residence.
B. In a study of therapy of hypertension, drug "A" was found to be more efficacious in the treatment of mild hypertension. What is the target population?
If the study is sound, it can be assumed that its results are applicable to any patient with mild hypertension, regardless of age, gender, race, etc. Hence, the target population is defined only by the level of hypertension.
2. Determination of sampling population.
The sampling population is the one from which the sample is drawn. The definition of the sampling population by the investigator is governed by two factors:
Feasibility: this leads the investigator to select a reachable sampling population, for convenience; e.g. hospital-based studies of prevalence, where the sampling population, consisting of patients attending the hospital, is quite different from the target population, the community.
In this case, generalization from the sample to the target population is not possible, and the external validity of the study is jeopardized.
External validity: the ability to generalize from the study results to the target population. This leads the investigator to identify the proper sampling population that allows him to generalize his results. In this case, the sampling and the target populations are identical, and the external validity of the study is high.
3. Definition of sampling unit.
The sampling unit might be different from the analysis unit, as outlined above, but more often they are the same. The sampling unit is defined by setting:
Inclusion criteria: these specify the characteristics that make a unit eligible for inclusion in the study sample. These characteristics must be clear and well defined. Working definitions might be used, e.g. hypertension is BP > 140/90.
An example of inclusion criteria in a study of oral contraceptives (OC) and deep venous thrombosis (DVT) is as follows:
{Married} {Female} {Using OC for > 2 yrs}
Exclusion criteria: these are criteria that disqualify eligible units. They are not the opposite of inclusion criteria. They pertain to conditions or factors that might affect the subject of the study.
Thus, in the above example, the exclusion criteria are NOT:
{Unmarried} {Male} {Not using OC for > 2 yrs}
They could rather be:
{Bed-ridden} {History of major surgery}
since these two conditions are risk factors for DVT, and hence could affect the studied relationship between OC and DVT.
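The distinction between inclusion and exclusion criteria can be sketched in a few lines of Python. This is only an illustration of the OC/DVT example: the `Candidate` record and its field names are hypothetical, and the three candidates are made-up data.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical record layout, for illustration only.
    married: bool
    female: bool
    years_on_oc: float
    bed_ridden: bool
    major_surgery_history: bool


def meets_inclusion(c):
    # {Married} {Female} {Using OC for > 2 yrs}
    return c.married and c.female and c.years_on_oc > 2


def meets_exclusion(c):
    # {Bed-ridden} {History of major surgery}: independent risk factors for DVT
    return c.bed_ridden or c.major_surgery_history


def is_eligible(c):
    # A unit must satisfy the inclusion criteria and none of the exclusion criteria.
    return meets_inclusion(c) and not meets_exclusion(c)


candidates = [
    Candidate(True, True, 3.0, False, False),   # included and not excluded
    Candidate(True, True, 3.0, True, False),    # included, then disqualified
    Candidate(False, True, 3.0, False, False),  # fails inclusion, no exclusion needed
]
eligible = [c for c in candidates if is_eligible(c)]
```

Note that the third candidate is left out because she fails an inclusion criterion, not because of any exclusion criterion.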
4. Choice of sampling method.
There are two main types of sampling: non-probability and probability sampling. Non-probability sampling is not recommended in medical research, and hence will be discussed only briefly. The main emphasis will be on probability sampling.
4.1 Non-probability sampling.
In this type of sampling, there is no known probability of selection for each unit. Thus, generalization from study results is not possible, since representativeness of the sample cannot be assumed. There are two methods of non-probability sampling:
4.1.1 Convenience sampling: The investigator selects a convenient sample; e.g. to assess the opinion of patients about service, the investigator decides to interview all the patients coming to his office today. The assumption that these patients represent all the patients attending the hospital cannot hold. The investigator cannot generalize the results.
4.1.2 Quota sampling: In the above study, the investigator wants to ensure that all types of patients are represented in his sample.
He decides to interview 60 males and 60 females. Within each gender he will include 20 individuals below a lower age cut-off, 20 individuals above 60 yrs, and 20 in between. Thus, he has six categories of age and gender, with a quota of 20 individuals in each. Although this sample might be better than the previous one, there is still no known probability of selection, and generalization is still risky.
4.2 Probability sampling:
This is the more important sampling type. Here there is a known probability of selection for each sampling unit. However, this necessitates the presence of a sampling frame. There are various methods of probability sampling.
4.2.1 Simple random sampling.
- Most basic sampling method
- Requirements: complete sampling frame, numbering of sampling units and sample size
- Methods:
* lottery method (drawing numbers from a box)
* table of random numbers
* computer-generated random numbers
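As an illustration of the computer-generated option, the following minimal Python sketch draws a simple random sample from a numbered sampling frame. The frame of 1000 units, the sample size of 100 and the `seed` argument are assumptions made for the example.

```python
import random


def simple_random_sample(frame, n, seed=None):
    # Every unit in the frame has the same chance of selection;
    # units are drawn without replacement.
    rng = random.Random(seed)
    return rng.sample(frame, n)


# Hypothetical sampling frame: units numbered 1..1000.
frame = list(range(1, 1001))
sample = simple_random_sample(frame, 100, seed=42)
```

Fixing the seed only makes the draw reproducible; it does not change the equal probability of selection.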
4.2.2 Systematic random sampling.
- Also a basic sampling method
- Requirements:
* complete sampling frame in the form of a list
* numbering of sampling units
* sample size
- Methods:
* determination of the periodicity of sampling as follows:
period = population (N) / sample size (n)
e.g. N = 1000, n = 100; then every 1000/100 or 10th sampling unit will be selected
* determination of the starting point (in this example from 1 to 10) by simple random sampling
* compilation of the required sample according to the starting point, the periodicity and the sample size
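The three steps above can be sketched in Python as follows. This is a minimal illustration using the N = 1000, n = 100 example; the function name and the rounding down of the period when N is not a multiple of n are assumptions, not part of the original method description.

```python
import random


def systematic_sample(frame, n, seed=None):
    # Step 1: periodicity = N / n (rounded down if N is not a multiple of n).
    period = len(frame) // n
    # Step 2: random starting point between 1 and the period, inclusive.
    rng = random.Random(seed)
    start = rng.randint(1, period)
    # Step 3: take every period-th unit from the starting point.
    # Unit numbers are 1-based, list positions are 0-based.
    return [frame[start - 1 + i * period] for i in range(n)]


# Hypothetical sampling frame: units numbered 1..1000.
frame = list(range(1, 1001))
sample = systematic_sample(frame, 100, seed=5)
```

With a starting point of, say, 4, the selected units would be 4, 14, 24 and so on up to 994.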
4.2.3 Stratified random sampling.
- Most suitable to ensure representation of certain subcategories of the population in the sample. These subcategories gain importance whenever they are suspected to affect the research question, or the relationship under study.
Example:
In the study of the relationship between hypercholesterolemia and CAD, gender and smoking are important factors that might affect this association.
We might need to study it in various subcategories or strata of gender and smoking:
* male smokers
* female smokers
* male non-smokers
* female non-smokers
Stratified random sampling is thus a process of dividing the population into mutually exclusive strata, and sampling from these various strata.
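A minimal Python sketch of this process is given below. The population records are made up, and drawing the same number of units from every stratum is an assumption chosen for simplicity; the text does not prescribe an allocation rule.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    rng = random.Random(seed)
    # Divide the population into mutually exclusive strata.
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    # Draw a simple random sample within each stratum.
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}


# Hypothetical population of (id, gender, smoker) records.
population = [(i, "male" if i % 2 else "female", i % 3 == 0)
              for i in range(1, 1201)]
sample = stratified_sample(population, lambda u: (u[1], u[2]), 25, seed=1)
```

The result has one entry per stratum (gender by smoking status), so each of the four subcategories is guaranteed to be represented.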
4.2.4 Multi-stage random sample.
- Mostly used in surveys
- Requirements:
* sampling frame of the first population and of subsequently selected sampling units
* determination of the different stages of sampling
* determination of the required sample size in each stage
* compilation of the required sample by simple or systematic random methods
Example: determination of the prevalence of DM in the country
* sampling frames are prepared for the selected units only. In this example the needed sampling frames are for:
+ provinces of the country
+ cities of the selected provinces
+ districts of the selected cities
+ households of the selected districts
+ individuals of the selected households
Compare these frames to the frame required for simple random sampling: an enumeration of all the individual persons in the country.
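The saving in sampling frames can be seen in a small Python sketch of multi-stage selection. Everything concrete here is hypothetical: the `list_children` helper stands in for the field work of listing the cities of a province, the districts of a city and so on, and the stage sizes are arbitrary.

```python
import random


def multistage_sample(top_units, list_children, sizes, seed=None):
    # sizes[0] units are drawn from the first frame; at each later stage,
    # sizes[k] units are drawn from the frame of every selected unit.
    rng = random.Random(seed)
    selected = rng.sample(top_units, sizes[0])
    for n in sizes[1:]:
        next_selected = []
        for unit in selected:
            frame = list_children(unit)  # frame built for selected units only
            next_selected.extend(rng.sample(frame, n))
        selected = next_selected
    return selected


def list_children(unit):
    # Hypothetical frame builder: every unit has 10 sub-units.
    return [unit + (k,) for k in range(10)]


provinces = [(p,) for p in range(10)]
# Stages: provinces, cities, districts, households, individuals.
individuals = multistage_sample(provinces, list_children, [3, 2, 2, 5, 1], seed=7)
```

In this sketch, frames are listed only for the 3 selected provinces, 6 selected cities, 12 selected districts and 60 selected households, rather than for every person in the country.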
4.2.5 Cluster sampling.
In a study of the prevalence of schistosomiasis in a village, the sampling units were households. It was decided to select households by simple random sampling. The village had 10,000 households. The required sample of 300 households was found to be scattered over an area of 600 km². The time and resources would not allow the investigator to undertake this tedious field work. What would he do?
The investigator noticed that the village consisted of 100 hamlets, each containing roughly 80 to 120 households. All the hamlets were quite similar regarding socio-demographic characteristics and factors related to the disease. Each hamlet contained various categories of age, gender, occupation, social class, education, etc.
Each hamlet was considered as a cluster of households. A sampling frame of the 100 hamlets was prepared.
A simple random sample of "n" clusters was selected. All the households of the selected clusters were included in the sample.
The assumption of representativeness of the sample is based on:
* the similarity of, or homogeneity among, all the clusters
* the heterogeneity within each cluster
[note the difference from stratified random sampling]
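A minimal Python sketch of cluster sampling for the village example follows. The hamlet and household labels are invented, every hamlet is given exactly 100 households for simplicity, and the choice of 3 clusters (to reach about 300 households) is an assumption, since the text leaves "n" unspecified.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    rng = random.Random(seed)
    # The sampling frame lists clusters, not households.
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Every household of each selected cluster enters the sample.
    households = [h for name in chosen for h in clusters[name]]
    return chosen, households


# Hypothetical village: 100 hamlets with 100 households each.
hamlets = {f"hamlet_{i:03d}": [f"h{i:03d}_{j:03d}" for j in range(100)]
           for i in range(100)}
chosen, households = cluster_sample(hamlets, 3, seed=3)
```

The field work is now confined to 3 hamlets instead of 300 households scattered over the whole village.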
4.2.6 Area sampling.
Area sampling is very similar to cluster sampling. It is used whenever natural clusters are not present, and the investigator creates the required clusters. If the previous study was done in a city, with no hamlets for clustering, a map of the city could be used to divide it into sections or clusters for sampling. The only condition to fulfill is the similarity among the various sections. This kind of sampling is used in counting the hair of the scalp.
4.2.7 Multi-phase sampling.
In certain studies, the outcome might be assessed by two or more diagnostic tools. One of these tools might be inexpensive, rapid, harmless and acceptable, while the other could be costly and have potential side-effects. If, for example, the investigator wants to determine the prevalence of angina in the population of males above 30 yrs in a certain locality, his tools are the Rose questionnaire for angina and the exercise test. He might select a large random sample of the population for interviewing.
[Figure: MULTI-PHASE SAMPLING. POPULATION -> SAMPLE (Test 1) -> SUBSAMPLE (Test 2)]
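The two phases in the figure can be sketched in Python as below. The population size and the two sample sizes are made up, and drawing the subsample at random from the first-phase sample is an assumption; the slides do not state how the subsample is chosen.

```python
import random


def multiphase_sample(population, n_phase1, n_phase2, seed=None):
    rng = random.Random(seed)
    # Phase 1: large sample, assessed with the cheap tool (Test 1).
    phase1 = rng.sample(population, n_phase1)
    # Phase 2: subsample of phase 1, assessed with the costly tool (Test 2).
    phase2 = rng.sample(phase1, n_phase2)
    return phase1, phase2


# Hypothetical population of 5000 numbered individuals.
population = list(range(1, 5001))
phase1, phase2 = multiphase_sample(population, 1000, 100, seed=11)
```

In the angina example, Test 1 would be the questionnaire and Test 2 the exercise test.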
5. Estimation of sample size.
How many subjects (sampling units) should be studied? The answer to this question is often an empirical choice of a number. This number will erroneously depend only on feasibility, i.e. the time allowed for the study, the available resources, the frequency of cases, etc.
There is also a common belief that the larger the sample size, the better the study. This is a misconception since, like a small sample size, a large sample size can lead to methodologic and statistical problems. The major problem with a small sample size is its inability to show a significant difference when one is actually present.
The size of the sample will depend on the following factors:
1. Magnitude of the difference to be detected (effect size):
A large sample size is needed for detection of a minute difference, e.g. an investigator is comparing the effect of two antihypertensive drugs. If the expected difference is in the order of 1-2 mm Hg, he will need a much larger sample size than that required to detect a difference of 5-10 mm Hg. Here, the difference to be detected should be governed by its clinical significance.
The same applies to the magnitude of risk in etiologic studies. Thus, the sample size is inversely related to the effect size.
2. Variability of the measurement:
Variability of the measurement can be likened to background noise. The higher this noise is, the more difficult is the detection of the signal, the more effort is required, and the more subjects need to be studied. In the previous example, if the drugs are tested on homogeneous groups with low variability of blood pressure measurements, detection of the difference will be easier and will need a small number of subjects. The variability of measurements is reflected by the standard deviation or the variance. The higher the standard deviation, the larger the sample size required. Thus, sample size is directly related to the standard deviation.
3. Level of significance:
The significance level "α" pertains to the maximum risk or probability of rejecting a true null hypothesis. It is also known as α-error or type I error. Since it is an error, the investigator tends to keep it at a minimum. The maximum level of α has conventionally been set at 5% or 0.05. To be more confident of his results, the investigator might want to minimize his α-error to 0.01 or 0.001. However, this is not without cost. The price is an increase in sample size. Thus, sample size is inversely related to the level of α-error.
4. Power of the study:
The power of a study is the probability that it will yield a statistically significant result when a true difference exists. It is related to another type of error, type II or β-error. This error pertains to the risk of accepting the null hypothesis although it is false. The power is equal to (1 - β). The investigator tends to increase the power of his study by minimizing the level of β-error. There is no pre-set level of β-error, as is the case for α-error. However, studies with a β-error of up to 0.2 (power of 0.8 or 80%) are acceptable.
Thus, sample size is inversely related to β-error, or directly related to the desired power.
To summarize, sample size (n) is directly related to the standard deviation (s), and inversely related to the effect size (ES), α-error, and β-error.
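These four relationships can be checked numerically with a short Python sketch. The formula used is not given in the slides: it is the common normal-approximation formula for comparing two group means with a two-sided test, n per group = 2 (z(1 - α/2) + z(1 - β))² s² / ES², taken here as an assumption for illustration. The function name and the blood-pressure figures (s = 10 mm Hg) are also hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(sd, effect_size, alpha=0.05, power=0.80):
    # Normal-approximation sample size per group for comparing two means
    # (two-sided test); an assumed textbook formula, not from the slides.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)  # power = 1 - beta
    return ceil(2 * ((z_alpha + z_beta) * sd / effect_size) ** 2)


base = n_per_group(sd=10, effect_size=5)                      # reference case
smaller_effect = n_per_group(sd=10, effect_size=2)            # smaller ES, larger n
larger_sd = n_per_group(sd=15, effect_size=5)                 # larger s, larger n
stricter_alpha = n_per_group(sd=10, effect_size=5, alpha=0.01)  # smaller alpha, larger n
higher_power = n_per_group(sd=10, effect_size=5, power=0.90)    # smaller beta, larger n
```

Each variation from the reference case increases the required number of subjects, in line with the summary above.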