sampling methods in research
8/2/2019 Sampling Methods in Research
http://slidepdf.com/reader/full/sampling-methods-in-research 1/4
Grace H. Encelan-Brizuela, MD, MSPH July 29, 2010
SAMPLING METHODS IN RESEARCH
INTRODUCTION
Challenge to every research protocol: It must specify a sample of subjects that:
can be studied at an acceptable cost in time and money
is large enough to control random error in generalizing the study findings to the population
is representative enough to control systematic error in these inferences
Basic terms and concepts…
Population – complete set of people with a specified set of characteristics
Sample – subset of the population, selected so as to be representative of the larger population (e.g., Population – Medicine Class 2013; Sample – 2013 Section B)
Target population – the large set of patients throughout the world to which the results will be generalized. Defined by clinical and demographic characteristics.
Accessible population – the subset of the target population that is available for the study. Defined by geographic and temporal characteristics.
Reasons for sampling
1. Samples can be studied more quickly than populations
2. A study of a sample is less expensive than studying an entire population
3. A study of an entire population is impossible in most situations (e.g., sex workers)
4. Sample results are often more accurate than results based on a population (more prone to mistakes if the population is larger)
5. If samples are properly selected, probability methods can be used to estimate the error in the resulting statistics (can also generalize)
6. Samples can be selected to reduce heterogeneity (i.e., use of inclusion and exclusion criteria)
[Figure: flow diagram from research question to study plan; diagram labels: Specification, Sampling]
RESEARCH QUESTION (Truth in the Universe)
STEP #1: Target Populations
Specify clinical and demographic characteristics
CRITERIA: Well suited to the research question
STEP #2: Accessible Population
Specify temporal and geographic characteristics
CRITERIA: Representative of target populations and easy to study
STUDY PLAN (Truth in the Study)
STEP #3: Intended Sample
Design an approach to selecting the sample
CRITERIA: Representative of accessible population and easy to do
Establishing Inclusion Criteria
Inclusion criteria – define the main characteristics of the target and accessible populations (the exclusion criteria are not simply the opposite of the inclusion criteria, and vice versa)
Considerations | Examples
Inclusion criteria: specifying the characteristics that define populations that are relevant to the research question and efficient for study | A 5-year trial of calcium supplementation for preventing osteoporosis might specify that the subjects be:
Target population (derived from literature): demographic characteristics | White females age 45 – 50
Target population (derived from literature): clinical characteristics | In good general health: no known life-threatening disease; not taking long-term corticosteroids
Accessible population: geographic characteristics | Patients attending the medical clinic at the investigator’s hospital
Accessible population: temporal characteristics | Between Jan 1 and Dec 31, 2006
Establishing Exclusion Criteria
Exclusion criteria – indicate subsets of individuals who meet the eligibility criteria, but are likely to interfere with the quality of the data or the interpretation of the findings
Considerations | Examples
Exclusion criteria: specifying subsets of the population that will not be studied because of: | A 5-year trial of calcium supplementation for preventing osteoporosis might exclude subjects who are:
A high likelihood of being lost to follow-up | Planning to move out of state
An inability to provide good data | Disoriented or having language barriers
Ethical barriers | Kidney stone formers
The subject’s refusal to participate | Unwilling to accept possibility of random allocation to placebo group
Choosing an accessible population
Clinic based samples – inexpensive and easy to recruit, but selection factors that determine who comes to the hospital or clinic may have an important effect
Population based samples – particularly useful for guiding public health and clinical practice in the whole community, but the chief disadvantage is the expense and difficulty involved
SAMPLING
1. Probability Sampling
uses a random process to guarantee that each unit of the population has a specified chance of selection
if there is no accurate listing of the target population, use non-probability sampling
researcher knows denominator and characteristics of the population
a. Simple Random sampling
Every subject has an equal probability of being selected for the study.
Recommended way is to use a table of random numbers or a computer-generated list of random numbers
Process of enumerating every unit of the accessible population, and then selecting the sample at random
Fishbowl sampling included here
What are needed:
Accurate listing of the population
Mechanism to find and enroll those who are chosen
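A computer-generated random draw can be sketched in a few lines of Python. This is only an illustration: the function name `simple_random_sample`, the sampling frame of 500 numbered units, the sample size of 50, and the seed are all assumptions made for the example, not values from the lecture.

```python
import random


def simple_random_sample(sampling_frame, n, seed=None):
    """Draw n units without replacement so every unit has an equal chance."""
    rng = random.Random(seed)  # seeding only makes the draw reproducible
    return rng.sample(sampling_frame, n)


# Hypothetical accessible population: every unit enumerated from 1 to 500
frame = list(range(1, 501))
sample = simple_random_sample(frame, 50, seed=2010)
print(sorted(sample))
```

The enumerated list plays the role of the accurate listing of the population; finding and enrolling the chosen units still has to be done separately.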
b. Systematic sampling
Involves selecting by a periodic process; starting point is chosen at random
Example: get 200 samples from a population of 3400
Procedure: Number all units 1 to 3400; divide the population by the number to be sampled (3400/200 = 17) to get the sampling interval k. Select any number from 1 to 17 at random as the starting point. Then select every 17th subject thereafter.
NOTE: should not be used when a cyclic repetition is inherent in the sampling frame.
e.g., not appropriate for selecting months of the year in a study of the frequency of different types of accidents, because some accidents occur most often at certain times of the year
e.g., selecting all even/odd numbers when males and females are seated alternately
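The 3400/200 example can be sketched in Python as follows. The function name `systematic_sample` and the seed are assumptions for illustration, and the sketch assumes the population size divides evenly by the sample size, as it does in the example.

```python
import random


def systematic_sample(population_size, sample_size, seed=None):
    """Pick a random start from 1..k, then take every k-th unit number."""
    k = population_size // sample_size  # sampling interval (3400 // 200 = 17)
    rng = random.Random(seed)
    start = rng.randint(1, k)  # randint includes both endpoints
    return [start + i * k for i in range(sample_size)]


units = systematic_sample(3400, 200, seed=1)
print(units[:5], "...", units[-1])
```

Because only the starting point is random, any cyclic pattern in the list with a period related to 17 would carry straight into the sample, which is the reason for the NOTE above.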
c. Stratified Random sampling
Involves dividing the population into subgroups according to characteristics and taking a random sample from each of these “strata”
Characteristics used to stratify should be related to the measurement of interest
In Medicine, commonly used strata include: age, gender, severity of disease
e.g., use of proportionate numbers in groups with differing population sizes
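A minimal Python sketch of proportionate stratified sampling is given below. The strata (disease severity with 600, 300, and 100 patients), the total sample of 100, and the function name are hypothetical values chosen for illustration.

```python
import random


def proportionate_stratified_sample(strata, total_n, seed=None):
    """Randomly sample each stratum in proportion to its share of the population.

    strata maps a stratum name to the list of units in that stratum.
    """
    rng = random.Random(seed)
    population_size = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        # Rounded share; with awkward proportions the total can differ from total_n
        n_stratum = round(total_n * len(units) / population_size)
        sample[name] = rng.sample(units, n_stratum)
    return sample


# Hypothetical strata by severity of disease
strata = {
    "mild": list(range(600)),
    "moderate": list(range(600, 900)),
    "severe": list(range(900, 1000)),
}
chosen = proportionate_stratified_sample(strata, 100, seed=7)
print({name: len(units) for name, units in chosen.items()})
```

Each stratum contributes the same fraction of its members, so the sample keeps the population's mix of mild, moderate, and severe cases.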
d. Cluster sampling
Process of taking a random sample of natural groupings of individuals in the population; very useful when the population is widely dispersed and it is impractical or costly to list and sample from all of its elements
Clusters are commonly based on geographic areas or districts, so this approach is used more often in epidemiologic research than in clinical research
e.g., different areas [Area 48 or 81] in Brgy Dona Imelda
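One way to picture cluster sampling is the Python sketch below, where whole areas are drawn at random and everyone in a chosen area is included. The ten areas, the 20 households per area, and the function name are hypothetical, and taking every member of a chosen cluster is only one possible design.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and keep every member of each one.

    clusters maps a cluster name to the list of its members.
    """
    rng = random.Random(seed)
    chosen_names = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen_names}


# Hypothetical barangay with 10 areas of 20 households each
areas = {
    f"Area {i}": [f"Area {i} household {j}" for j in range(1, 21)]
    for i in range(1, 11)
}
selected = cluster_sample(areas, 3, seed=48)
print(list(selected))
```

Only the list of areas is needed up front; individual households are listed just for the areas that are drawn.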
ADDITIONAL: Multi-Stage sampling
Combination of the above-mentioned sampling methods
Used in very big research studies (e.g., nationwide study)
e.g., nationwide → choose province → choose city/municipality → choose barangay, etc.
2. Nonprobability Sampling
Sampling method in which the probability that a subject is selected is unknown
E.g., studies involving abused children/women, sex workers, etc.
Denominator is unknown (total population number)
a. Consecutive Sampling
Involves taking every patient who meets the selection criteria over a specified time interval or number of patients; it amounts to taking the complete accessible population over the duration of the study
Taking an accessible population at the time of the study
Usually used by residents in their research (due to time constraints)
b. Convenience Sampling
Process of taking those members of the accessible population who are easily available.
Prone to many biases; sample is not homogeneous
c. Judgemental Sampling
Involves handpicking from the accessible population those individuals judged most appropriate for the study
E.g. accreditation
d. Snowball Sampling
Referral method
After recruiting one subject, ask that subject for other prospective subjects
SAMPLE SIZE COMPUTATION
Sample Size
Factors that affect the number of subjects required for a study:
1. Whether the research design involves paired or unpaired data
2. Whether beta error is considered in addition to alpha error
3. Whether a large or small variance is anticipated in the data set
4. Whether alpha level chosen is the usual (p value 0.05) or smaller
5. Whether the desired difference between means or proportions to be detected is fairly small or extremely small
Pre-test
Answer with LARGER or SMALLER:
What sample size would be needed if the investigator wants the answer to be very close to the true value (i.e., have a very narrow confidence interval or a very small p value)? - LARGER
What sample size would be needed if anticipated variance is small? - SMALLER
What sample size would be needed if the difference the investigator wants to detect is extremely small? - LARGER
Review of Basic Concepts and Terms
Effect size – difference you want to detect between one group and the other group (related to number 3)
Alpha level/Significance level – probability that a positive finding is due to chance alone (in medicine, usually set at 0.05, which corresponds to 95% confidence; Zα = 1.96 for a two-tailed test - constant)
Power – the probability that the effect will be detected if it truly exists; equal to 1 − beta error, not to beta error itself (in medicine, usually set at 80%, i.e., beta error of 20%; Zβ = 0.84 - constant)
Alpha error – type I error; error of finding something when in fact there is nothing (error of rejecting the null when it is true); z-value for alpha error is 1.96; p = 0.05 means that you are allowing yourself a 5% chance of committing type I error
Beta error – type II error; error of finding nothing when in fact there is something (error of accepting the null when it is false); z-value for beta error is 0.84
Recall… (Nice to Know)
$$t = \frac{d}{s_d / \sqrt{N}}$$
Where: d is the mean difference that was observed, s_d is the standard deviation of the differences (so s_d/√N is the standard error of the mean difference), and N is the sample size
To solve for N, rearrangements have to be done. The formula becomes (Need to Know)
$$N = \frac{(z_\alpha)^2 \cdot (s)^2}{(d)^2}$$
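The rearrangement can be sketched step by step as below. Here s stands for the standard deviation written as sd in the recall formula. The last step assumes that the critical t value is approximated by zα when planning a study; that substitution is an assumption of this sketch and is not spelled out in the notes.

```latex
\begin{align*}
t &= \frac{d}{s/\sqrt{N}} = \frac{d\,\sqrt{N}}{s} \\
\sqrt{N} &= \frac{t \cdot s}{d} \\
N &= \frac{t^2 \cdot s^2}{d^2} \;\approx\; \frac{(z_\alpha)^2 \cdot (s)^2}{(d)^2}
\end{align*}
```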
Derivation of the Basic Sample Size Formula
Formula for the Calculation of Sample Size for studies commonly pursued in Medical Research
Studies using the paired t test (e.g. before and after studies) and considering alpha (Type I) error only
$$N = \frac{(z_\alpha)^2 \cdot (s)^2}{(d)^2}$$
Use Paired T-test if:
Utilizing a before and after study (involves 1 group only)
Matching was employed (according to gender, age, etc.)
Subjects are twins
If none of these apply and the data are continuous, use the independent T-test
Study Characteristics | Assumptions made by Investigator
Type of Study | Before and after study of an anti-HPN drug
Data sets | Pre-treatment and post-treatment observations in the same group of subjects
Variable | Systolic blood pressure
Standard deviation (s) | 15 mmHg
Variance (s²) | 225 mmHg²
Data for alpha (zα) | p = 0.05; therefore, 95% confidence desired (two-tailed test); Zα = 1.96
Difference to be detected (d) | 10 mmHg or larger difference between pre- and post-treatment blood pressure values
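$$
\begin{aligned}
N &= \frac{(z_\alpha)^2 \cdot (s)^2}{(d)^2} \\
  &= \frac{(1.96)^2 \cdot (15)^2}{(10)^2} \\
  &= \frac{(3.84)(225)}{100} \\
  &= \frac{864}{100} = 8.64 \approx 9 \text{ subjects total}
\end{aligned}
$$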
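The same arithmetic can be checked with a short Python sketch. The function name `n_paired_alpha_only` is an assumption for illustration; the inputs are the values from the table above, and the result is rounded up because a fraction of a subject cannot be enrolled.

```python
import math


def n_paired_alpha_only(z_alpha, s, d):
    """Unrounded sample size for a paired design, alpha error only."""
    return (z_alpha ** 2) * (s ** 2) / (d ** 2)


n_raw = n_paired_alpha_only(z_alpha=1.96, s=15, d=10)
n_subjects = math.ceil(n_raw)  # round up to whole subjects
print(n_raw, n_subjects)
```

Doubling s to 30 mmHg or halving d to 5 mmHg each multiplies the required N by four, since both enter the formula squared.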
Studies using the Student’s t test (e.g. one experimental group and one control group) and considering alpha (Type I) error only
$$N = \frac{(z_\alpha)^2 \cdot 2 \cdot (s)^2}{(d)^2}$$
Study Characteristics | Assumptions made by Investigator
Type of Study | RCT of an anti-HPN drug
Data sets | Observations in one experimental group and one control group
Variable | Systolic blood pressure
Standard deviation (s) | 15 mmHg
Variance (s²) | 225 mmHg²
Data for alpha (zα) | p = 0.05; therefore, 95% confidence desired (two-tailed test); Zα = 1.96
Difference to be detected (d) | 10 mmHg or larger difference between mean blood pressure values of the experimental group and control group
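$$
\begin{aligned}
N &= \frac{(z_\alpha)^2 \cdot 2 \cdot (s)^2}{(d)^2} \\
  &= \frac{(1.96)^2 \cdot 2 \cdot (15)^2}{(10)^2} \\
  &= \frac{(3.84)(2)(225)}{100} \\
  &= \frac{1728}{100} = 17.28 \approx 18 \text{ subjects per group}
\end{aligned}
$$
18 subjects per group × 2 groups = 36 subjects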
Studies using the Student’s t test and considering alpha (Type I) error and beta (Type II) errors
$$N = \frac{(z_\alpha + z_\beta)^2 \cdot 2 \cdot (s)^2}{(d)^2}$$
Study Characteristics | Assumptions made by Investigator
Type of Study | RCT of an anti-HPN drug
Data sets | Observations in one experimental group and one control group
Variable | Systolic blood pressure
Standard deviation (s) | 15 mmHg
Variance (s²) | 225 mmHg²
Data for alpha (zα) | p = 0.05; therefore, 95% confidence desired (two-tailed test); Zα = 1.96
Data for beta (zβ) | 20% beta error; therefore, 80% power desired (one-tailed test); Zβ = 0.84
Difference to be detected (d) | 10 mmHg or larger difference between mean blood pressure values of the experimental group and control group
$$
\begin{aligned}
N &= \frac{(z_\alpha + z_\beta)^2 \cdot 2 \cdot (s)^2}{(d)^2} \\
  &= \frac{(1.96 + 0.84)^2 \cdot 2 \cdot (15)^2}{(10)^2} \\
  &= \frac{(7.84)(2)(225)}{100} \\
  &= \frac{3528}{100} = 35.28 \approx 36 \text{ subjects per group}
\end{aligned}
$$
36 subjects per group × 2 groups = 72 subjects
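Both two-group formulas (alpha error only, and alpha plus beta error) can be compared in one Python sketch. The function name `n_per_group` and the choice to model the alpha-only case as `z_beta=0.0` are assumptions made for illustration; the numeric inputs are those of the blood pressure example.

```python
import math


def n_per_group(z_alpha, s, d, z_beta=0.0):
    """Unrounded per-group sample size for comparing two independent means.

    Leaving z_beta at 0.0 reproduces the alpha-only formula.
    """
    return ((z_alpha + z_beta) ** 2) * 2 * (s ** 2) / (d ** 2)


alpha_only = math.ceil(n_per_group(1.96, 15, 10))
with_power = math.ceil(n_per_group(1.96, 15, 10, z_beta=0.84))
print(alpha_only, alpha_only * 2)
print(with_power, with_power * 2)
```

Asking for 80% power roughly doubles the requirement here, because the multiplier grows from 1.96² (about 3.84) to (1.96 + 0.84)² = 7.84.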
Studies using a test of differences in proportions and considering alpha (Type I) error and beta (Type II) errors
$$N = \frac{(z_\alpha + z_\beta)^2 \cdot 2 \cdot p(1 - p)}{(d)^2}$$
Study Characteristics | Assumptions made by Investigator
Type of Study | RCT of a drug to reduce the 5-yr mortality in patients with a particular form of cancer
Data sets | Observations in one experimental group and one control group
Variable | Success = 5-yr survival after Tx; Failure = death within 5 yrs of Tx
Variance, p(1 - p) | p = 0.55; therefore, (1 - p) = 0.45
Data for alpha (zα) | p = 0.05; therefore, 95% confidence desired (two-tailed test); Zα = 1.96
Data for beta (zβ) | 20% beta error; therefore, 80% power desired (one-tailed test); Zβ = 0.84
Difference to be detected (d) | 0.1 or larger difference between the success (survival) of the E grp and that of the C grp
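$$
\begin{aligned}
N &= \frac{(z_\alpha + z_\beta)^2 \cdot 2 \cdot p(1 - p)}{(d)^2} \\
  &= \frac{(1.96 + 0.84)^2 \cdot 2 \cdot (0.55)(0.45)}{(0.1)^2} \\
  &= \frac{(7.84)(2)(0.2475)}{0.01} \\
  &= \frac{3.88}{0.01} = 388 \text{ subjects per group}
\end{aligned}
$$
388 subjects per group × 2 groups = 776 subjects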
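A Python sketch of the proportions formula follows, with the hypothetical function name `n_per_group_proportions` and the inputs of the cancer trial example. Without intermediate rounding the result is about 388.08 per group, so strictly rounding up gives 389; the figure of 388 in the notes comes from rounding the numerator to 3.88 before dividing.

```python
import math


def n_per_group_proportions(z_alpha, z_beta, p, d):
    """Unrounded per-group sample size for a difference in proportions."""
    return ((z_alpha + z_beta) ** 2) * 2 * p * (1 - p) / (d ** 2)


n_raw = n_per_group_proportions(z_alpha=1.96, z_beta=0.84, p=0.55, d=0.1)
print(round(n_raw, 2), math.floor(n_raw), math.ceil(n_raw))
```

Either way the total is close to 776 subjects; the difference of one subject per group matters only if the protocol requires rounding up.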
Remember:
N = sample size
Zα = z-value for alpha error = 1.96 (constant)
Zβ = z-value for beta error = 0.84 (constant)
(s)² = variance (from literature)
p = mean proportion of success (from literature)
d = difference to be detected (researcher assigned)