Population and Sampling in Health Research

Upload: drtaa62

Post on 04-Apr-2018

  • 7/30/2019 Population and Sampling in Health Research

    Chapter II: STEPS OF SCIENTIFIC RESEARCH

    Population and Sampling

    DEFINITION OF TERMS

    Population: Collection of units sharing a common characteristic. It might be:

    * Finite: possibility of counting all units; e.g. students in a school.

    * Infinite: counting all units is not feasible; e.g. RBCs of an individual.

    Sample: A subset of a population obtained to investigate properties of the parent population.

    Target population: Population upon which the results of the study will be generalized.

    Sampling population: Population from which the sample was taken. Ideally, the sampling and target populations should be the same.

    Sampling unit: Population unit used for sampling. In a population of individuals, the unit is not always a person; e.g.

    * In a study of patients' satisfaction, a random sample of two days of the week was taken, and all the patients attending the hospital during these two days were interviewed. The sampling unit here is not the individual patient but the day of the week.

    * In a study of the prevalence of smoking among school children in the country, a sample of five provinces was selected. From each province three cities were selected. From each city four schools were selected. From each school, three classes were selected. All the students of the selected classes were interviewed. The sampling units here are, in turn, the province, the city, the school and the class, not the individual student.

    Analytic unit: This is the population unit used in data analysis. In the above examples, the analysis of patients' satisfaction and that of the prevalence of smoking pertain to the individual; the analytic unit is the person. Thus, regardless of the sampling unit, the unit of analysis relates more closely to the study questions and objectives.

    Sampling frame: Listing of all the units that compose the sampling population.

    Sampling:

    Sampling is the process of selecting a number of units from a defined study population. In some, quite uncommon, situations the study population is very small and all its units can be studied. In this case there is no need for sampling, although inferential statistical principles cannot be applied since they are based on sampling. More often the population is infinite or too large to be totally included in the study. Hence the importance of sampling.

    Apart from saving time, money and effort, sampling leads to increased precision of the collected data since the resources are concentrated on a smaller number of units. More important is the possibility of inference, or generalization, from sample to population, which rests on randomness of selection of the sample.

    Sampling involves the following steps:

    1. Identification of study population
    2. Determination of sampling population
    3. Definition of sampling unit
    4. Choice of sampling method
    5. Estimation of sample size

    1. Identification of study population.

    As defined above, the study or target population is the one upon which the results of the study will be generalized. It is crucial that the investigator defines the target population clearly, since it is the most important determinant of the sampling population. On the one hand, if the subject of the study could be affected by local socio-demographic characteristics, e.g. prevalence studies, the target population is limited to the locality of the study.

    If, on the other hand, the research question pertains to biological processes with minimal effects of the local or socio-demographic characteristics, the process of generalization can be extended to a wider target population, e.g. clinical trials for drug therapy.

    Example:

    A. In a study of the prevalence of hypertension in city "X", the prevalence rate was found to be 30% among males above 25 years of age. What is the target or study population?

    This prevalence rate of 30% is only generalizable to "males above 25 years of age residing in city X". It does not apply to other gender or age categories, or any other locality. Thus the study population is well defined by age, gender and residence.

    B. In a study of therapy of hypertension, drug "A" was found to be more efficacious in the treatment of mild hypertension. What is the target population?

    If the study is sound, it can be assumed that its results are applicable to any patient with mild hypertension, regardless of age, gender, race, etc. Hence, the target population is defined only by the level of hypertension.

    2. Determination of sampling population.

    The sampling population is the one from which the sample is drawn. The definition of the sampling population by the investigator is governed by two factors:

    Feasibility: this leads the investigator to select a reachable sampling population, for convenience; e.g. hospital-based studies of prevalence, where the sampling population, consisting of patients attending the hospital, is quite different from the target population, the community.

    In this case, generalization from the sample to the target population is not possible, and the external validity of the study is jeopardized.

    External validity: the ability to generalize from the study results to the target population. This leads the investigator to identify the proper sampling population that allows him to generalize his results. In this case, the sampling and the target populations are identical, and the external validity of the study is high.

    3. Definition of sampling unit.

    The sampling unit might be different from the analytic unit, as outlined above, but more often they are the same. The sampling unit is defined by setting:

    Inclusion criteria: these specify the characteristics that make a unit eligible for inclusion in the study sample. These characteristics must be clear and well defined. Working definitions might be used, e.g. hypertension is BP > 140/90 mm Hg.

    An example of inclusion criteria in a study of oral contraceptives (OC) and deep venous thrombosis (DVT) is as follows:

    {Married} {Female} {Using OC for > 2 yrs}

    Exclusion criteria: these are criteria that disqualify eligible units. They are not the opposite of inclusion criteria. They pertain to conditions or factors that might affect the subject of the study.

    Thus, in the above example, the exclusion criteria are NOT:

    {Unmarried} {Male} {Not using OC for > 2 yrs}

    They could rather be:

    {Bed-ridden} {History of major surgery}

    since these two conditions are risk factors for DVT, and hence could affect the studied relationship between OC and DVT.
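
    The inclusion and exclusion criteria of this example can be read as a two-step eligibility check. The following is a minimal Python sketch, assuming a hypothetical `Candidate` record whose field names are invented here purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical screening record; field names are illustrative only."""
    sex: str              # "F" or "M"
    married: bool
    oc_years: float       # years of oral contraceptive (OC) use
    bed_ridden: bool
    major_surgery: bool   # history of major surgery


def is_eligible(c: Candidate) -> bool:
    # Inclusion criteria: married, female, using OC for more than 2 years.
    included = c.sex == "F" and c.married and c.oc_years > 2
    # Exclusion criteria: risk factors for DVT that could distort
    # the studied OC-DVT relationship.
    excluded = c.bed_ridden or c.major_surgery
    return included and not excluded
```

    Note that the exclusion test is applied only to units that already satisfy the inclusion criteria; it is not their negation.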

    4. Choice of sampling method.

    There are two main types of sampling: non-probability and probability sampling. Non-probability sampling is not recommended in medical research, and hence will be discussed briefly. The main emphasis will be on probability sampling.

    4.1 Non-probability sampling.

    In this type of sampling, there is no known probability of selection for each unit. Thus, generalization from study results is not possible since representativeness of the sample cannot be assumed. There are two methods of non-probability sampling:

    4.1.1 Convenience sampling: The investigator selects a convenient sample; e.g. to assess the opinion of patients about service, the investigator decides to interview all the patients coming to his office today. The assumption that these patients represent all the patients attending the hospital cannot hold. The investigator cannot generalize the results.

    4.1.2 Quota sampling: In the above study, the investigator wants to ensure that all types of patients are represented in his sample.

    He decides to interview 60 males and 60 females. Within each gender he will include 20 individuals below a lower age limit, 20 above 60 yrs, and 20 in between. Thus, he has six categories of age and gender, with a quota of 20 individuals in each. Although this sample might be better than the previous one, there is still no known probability of selection, and generalization is still risky.

    4.2 Probability sampling:

    This is the more important sampling type. Here there is a known probability of selection for each sampling unit. However, this necessitates the presence of a sampling frame. There are various methods of probability sampling.

    4.2.1 Simple random sampling.

    - Most basic sampling method
    - Requirements: complete sampling frame, numbering of sampling units and sample size

    - Methods:
    * lottery method (drawing numbers from a box)
    * table of random numbers
    * computer-generated random numbers
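
    The computer-generated option can be sketched in a few lines of Python. This is only an illustration, assuming the sampling frame is a numbered list of 1000 units; the function name and the optional `seed` parameter (used to make a draw repeatable) are not from the original text.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from a complete, numbered sampling frame."""
    rng = random.Random(seed)      # seed is optional, for repeatable draws
    return rng.sample(frame, n)    # sampling without replacement


# Hypothetical frame: units numbered 1..1000, sample size 100.
frame = list(range(1, 1001))
sample = simple_random_sample(frame, 100, seed=1)
```

    Every unit in the frame has the same known probability of selection, here 100/1000.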

    4.2.2 Systematic random sampling.

    - Also a basic sampling method
    - Requirements:
    * complete sampling frame in the form of a list
    * numbering of sampling units
    * sample size

    - Methods:
    * determination of the periodicity of sampling as follows:
      period = population (N) / sample size (n)
      e.g. N = 1000, n = 100; then every 1000/100 = 10th sampling unit will be selected
    * determination of the starting point (in this example from 1 to 10) by simple random sampling
    * compilation of the required sample according to the starting point, the periodicity and the sample size
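
    The three steps of systematic random sampling can be sketched as follows. This is a minimal Python illustration, assuming the frame is an ordered list and that N is a multiple of n (otherwise the period is rounded down); the function name is hypothetical.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Select every k-th unit of an ordered frame after a random start."""
    period = len(frame) // n              # k = N / n (rounded down)
    rng = random.Random(seed)
    start = rng.randint(1, period)        # random starting point, 1..k
    # Positions start, start + k, start + 2k, ... (1-based numbering).
    return [frame[start - 1 + i * period] for i in range(n)]


# The example from the text: N = 1000, n = 100, so every 10th unit.
frame = list(range(1, 1001))
sample = systematic_sample(frame, 100, seed=7)
```

    Only the starting point is random; all later selections are fixed by the period.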

    4.2.3 Stratified random sampling.

    - Most suitable to ensure representation of certain subcategories of the population in the sample. These subcategories gain importance whenever they are suspected to affect the research question, or the relationship under study.

    Example:

    In the study of the relationship between hypercholesterolemia and CAD, gender and smoking are important factors that might affect this association.

    We might need to study it in various subcategories or strata of gender and smoking:

    * male smokers
    * female smokers
    * male non-smokers
    * female non-smokers

    Stratified random sampling is thus a process of dividing the population into mutually exclusive strata, and sampling from these various strata.
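
    A minimal Python sketch of this process is given below. It assumes an equal number of units drawn from each stratum, which is one possible allocation and is not specified in the text; the toy population and all names are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Split units into mutually exclusive strata, then draw a simple
    random sample of n_per_stratum units from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}


# Toy population of 400 people, 100 in each gender-by-smoking stratum.
people = [
    {"id": i,
     "sex": "male" if i % 2 == 0 else "female",
     "smoker": (i // 2) % 2 == 0}
    for i in range(400)
]
sample = stratified_sample(people, lambda p: (p["sex"], p["smoker"]), 25, seed=3)
```

    The result holds one sample per stratum, so each of the four gender and smoking subcategories is guaranteed to be represented.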

    4.2.4 Multi-stage random sample.

    - Mostly used in surveys
    - Requirements:
    * sampling frame of the first population and of subsequently selected sampling units
    * determination of the different stages of sampling
    * determination of the required sample size in each stage
    * compilation of the required sample by simple or systematic random methods

    Example: determination of the prevalence of DM in the country

    * sampling frames are prepared for the selected units only. In this example the needed sampling frames are for:
    + provinces of the country
    + cities of the selected provinces
    + districts of the selected cities
    + households of the selected districts
    + individuals of the selected households

    Compare these frames to the frame required for simple random sampling: enumeration of all the individual persons in the country.
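
    Multi-stage selection can be sketched in Python as a recursive draw through nested frames. The example below is illustrative only: it uses a toy country with three stages (provinces, cities, individuals) instead of the five listed in the text, and the function name, frame layout and stage sizes are all assumptions.

```python
import random


def multistage_sample(frame, sizes, rng):
    """frame: nested dicts whose innermost values are lists of final units.
    sizes: number of units to draw at each stage, outermost first."""
    n, rest = sizes[0], sizes[1:]
    if isinstance(frame, dict):
        selected = []
        # Draw n units at this stage, then sample within each of them.
        for key in rng.sample(sorted(frame), n):
            selected.extend(multistage_sample(frame[key], rest, rng))
        return selected
    return rng.sample(frame, n)   # last stage: draw the final units


# Toy country: 4 provinces x 3 cities x 5 persons.
country = {
    f"province{p}": {
        f"city{p}.{c}": [f"person{p}.{c}.{i}" for i in range(5)]
        for c in range(3)
    }
    for p in range(4)
}
sample = multistage_sample(country, [2, 2, 3], random.Random(11))
```

    A frame is consulted only inside the units chosen at the previous stage, which is why no nationwide list of individuals is needed.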

    4.2.5 Cluster sampling.

    In a study of the prevalence of schistosomiasis in a village, the sampling units were households. It was decided to select households by simple random sampling. The village had 10,000 households. The required sample of 300 households was found to be scattered over an area of 600 km². Time and resources would not allow the investigator to undertake this tedious field work. What would he do?

    The investigator noticed that the village consisted of 100 hamlets with about 80 to 120 households each. All the hamlets were quite similar regarding socio-demographic characteristics and factors related to the disease. Each hamlet contained various categories of age, gender, occupation, social class, education, etc. Each hamlet was considered as a cluster of households. A sampling frame of the 100 hamlets was prepared.

    A simple random sample of "n" clusters was selected. All the households of the selected clusters were included in the sample.

    The assumption of representativeness of the sample is based on:
    * the similarity of, or homogeneity among, all the clusters
    * the heterogeneity within each cluster
    [ note the difference from stratified random sampling ]
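
    The cluster procedure can be sketched in Python as follows. The toy village below assumes exactly 100 households in each of the 100 hamlets, so that n = 3 clusters yields the required 300 households; the text does not state the value of n, and the names used are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Draw a simple random sample of clusters and include every
    unit of each selected cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for c in chosen for unit in clusters[c]]


# Toy village: hamlets 1..100, each holding 100 (hamlet, household) pairs.
village = {h: [(h, i) for i in range(100)] for h in range(1, 101)}
households = cluster_sample(village, 3, seed=5)
```

    The sampling frame lists only the 100 hamlets, and field work is confined to the three selected hamlets.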

    4.2.6 Area sampling.

    Area sampling is very similar to cluster sampling. It is used whenever natural clusters are not present, and the investigator creates the required clusters. If the previous study was done in a city, with no hamlets for clustering, a map of the city could be used to divide it into sections or clusters for sampling. The only condition to fulfill is the similarity among the various sections. This kind of sampling is used, for example, in counting the hair of the scalp.

    4.2.7 Multi-phase sampling.

    In certain studies, the outcome might be assessed by two or more diagnostic tools. One of these tools might be inexpensive, rapid, harmless and acceptable, while the other could be costly and have potential side-effects. If, for example, the investigator wants to determine the prevalence of angina in the population of males above 30 yrs in a certain locality, his tools are the Rose questionnaire for angina and the exercise test. He might select a large random sample of the population for interviewing.

    [Figure: MULTI-PHASE SAMPLING. POPULATION -> SAMPLE (Test 1) -> SUBSAMPLE (Test 2)]
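
    The two phases can be sketched in Python as nested draws. The text does not say how the subsample is chosen, so this illustration simply assumes a simple random subsample of the first-phase sample; the function name and the sizes are hypothetical.

```python
import random


def multiphase_sample(population, n_sample, n_subsample, seed=None):
    """Phase 1: large random sample that receives the cheap test (Test 1).
    Phase 2: random subsample of it that receives the costly test (Test 2)."""
    rng = random.Random(seed)
    sample = rng.sample(population, n_sample)
    subsample = rng.sample(sample, n_subsample)  # assumed random selection
    return sample, subsample


# Hypothetical sizes: 5000 men in the locality, 1000 interviewed,
# 200 of those referred for the exercise test.
sample, subsample = multiphase_sample(list(range(5000)), 1000, 200, seed=2)
```

    Every unit given the second test has also been given the first, so the two tools can be compared on the same individuals.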

    5. Estimation of sample size.

    How many subjects (sampling units) should be studied? The answer to this question is often an empirical choice of a number. This number is then, erroneously, made to depend only on feasibility, i.e. the time allowed for the study, the available resources, the frequency of cases, etc.

    There is also a common belief that the larger the sample size, the better the study. This is a misconception since, like a small sample size, a large sample size can lead to methodologic and statistical problems. The major problem with a small sample size is its inability to show a significant difference when one is actually present.

    The size of the sample will depend on the following factors:

    1. Magnitude of the difference to be detected (effect size): A large sample size is needed for detection of a minute difference, e.g. an investigator is comparing the effect of two antihypertensive drugs. If the expected difference is in the order of 1-2 mm Hg, he will need a much larger sample size than that required to detect a difference of 5-10 mm Hg. Here, the difference to be detected should be governed by its clinical significance.

    The same applies to the magnitude of risk in etiologic studies. Thus, the sample size is inversely related to the effect size.

    2. Variability of the measurement:

    Variability of the measurement can be likened to background noise. The higher this noise is, the more difficult is the detection of the signal, the more effort is required, and the more subjects need to be studied. In the previous example, if the drugs are tested on homogeneous groups with low variability of blood pressure measurements, detection of the difference will be easier and will need a small number of subjects. The variability of measurements is reflected by the standard deviation or the variance. The higher the standard deviation, the larger is the sample size required. Thus, sample size is directly related to the standard deviation.

    3. Level of significance:

    The significance level "α" pertains to the maximum risk or probability of rejecting a true null hypothesis. It is also known as α-error or type I error. Since it is an error, the investigator tends to keep it at a minimum. The maximum level of α has been set to 5% or 0.05. To be more confident of his results, the investigator might want to reduce his α-error to 0.01 or 0.001. However, this is not without cost. The price is an increase in sample size. Thus, sample size is inversely related to the level of α-error.

    4. Power of the study:

    The power of a study is the probability that it will yield a statistically significant result when a true difference exists. It is related to another type of error, type II or β-error. This error pertains to the risk of accepting the null hypothesis although it is false. The power is equal to (1 - β). The investigator tends to increase the power of his study by minimizing the level of β-error. There is no pre-set level of β-error as is the case for α-error. However, studies with a β-error of up to 0.2 (power of 0.8 or 80%) are acceptable.

    Thus, sample size is inversely related to β-error, or directly related to the desired power.

    To summarize, sample size (n) is directly related to the standard deviation (s), and inversely related to the effect size (ES), α-error, and β-error.
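
    These four relationships can be seen in one commonly used normal-approximation formula for comparing two means, n per group = 2 (z(1-α/2) + z(1-β))² s² / ES². The formula is not given in the slides; the Python sketch below is offered only as an illustration of the stated relationships, and the standard deviation of 10 mm Hg used in the example is an assumed value.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size, sd, alpha=0.05, power=0.80):
    """Approximate subjects needed per group to compare two means
    (two-sided test, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # from the alpha-error
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta
    n = 2 * ((z_alpha + z_beta) * sd / effect_size) ** 2
    return math.ceil(n)


# Two antihypertensive drugs, assumed SD of 10 mm Hg:
n_large_effect = sample_size_per_group(effect_size=5, sd=10)   # 63 per group
n_small_effect = sample_size_per_group(effect_size=2, sd=10)   # far more
```

    With these assumed inputs, detecting a 5 mm Hg difference needs 63 subjects per group, and the number rises when the effect size shrinks, the standard deviation grows, α is lowered, or the desired power is raised.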