Grace H. Encelan-Brizuela, MD, MSPH July 29, 2010

SAMPLING METHODS IN RESEARCH

INTRODUCTION

Challenge to every research protocol: it must specify a sample of subjects that:

can be studied at an acceptable cost in time and money

is large enough to control random error in generalizing the study findings to the population

is representative enough to control systematic error in these inferences

Basic terms and concepts

Population – complete set of people with a specified set of characteristics

Sample – subset of the population, selected so as to be representative of the larger population (e.g., Population – Medicine Class 2013; Sample – 2013 Section B)

Target population – the large set of patients throughout the world to which the results will be generalized. Defined by clinical and demographic characteristics.

Accessible population – the subset of the target population that is available for the study. Defined by geographic and temporal characteristics.

Reasons for sampling

1. Samples can be studied more quickly than populations

2. A study of a sample is less expensive than studying an entire population

3. A study of an entire population is impossible in most situations (e.g., sex workers)

4. Sample results are often more accurate than results based on a population (mistakes become more likely as the number of subjects grows)

5. If samples are properly selected, probability methods can be used to estimate the error in the resulting statistics (the results can also be generalized)

6. Samples can be selected to reduce heterogeneity (i.e., use of inclusion and exclusion criteria)

RESEARCH QUESTION (Truth in the Universe)

STEP # 1: Target Populations
Specify clinical and demographic characteristics
CRITERIA: Well suited to the research question

STEP # 2: Accessible Population
Specify temporal and geographic characteristics
CRITERIA: Representative of target populations and easy to study

STUDY PLAN (Truth in the Study)

STEP # 3: Intended Sample
Design an approach to selecting the sample
CRITERIA: Representative of accessible population and easy to do

(Figure labels: Specification, Sampling)

SPECIFICATION

Establishing Inclusion Criteria

Inclusion criteria – define the main characteristics of the target and accessible populations (the exclusion criteria are not simply the opposite of the inclusion criteria, and vice versa)

Considerations | Examples

Inclusion criteria: specifying the characteristics that define populations that are relevant to the research question and efficient for study | A 5-year trial of calcium supplementation for preventing osteoporosis might specify that the subjects be:

Target population (derived from literature)

Demographic characteristics | White females age 45 – 50

Clinical characteristics | In good general health: no known life-threatening disease; not taking long-term corticosteroids

Accessible population

Geographic characteristics | Patients attending the medical clinic at the investigator’s hospital

Temporal characteristics | Between Jan 1 and Dec 31, 2006

Establishing Exclusion Criteria

Exclusion criteria – indicate subsets of individuals who meet the eligibility criteria, but are likely to interfere with the quality of the data or the interpretation of the findings

Considerations | Examples

Exclusion criteria: specifying subsets of the population that will not be studied because of: | A 5-year trial of calcium supplementation for preventing osteoporosis might exclude subjects who are:

A high likelihood of being lost to follow-up | Planning to move out of state

An inability to provide good data | Disoriented or having language barriers

Ethical barriers | Kidney stone formers

The subject’s refusal to participate | Unwilling to accept possibility of random allocation to placebo group
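The calcium-trial criteria above can be pictured as a two-step screen: a subject must first meet every inclusion criterion and must then trigger none of the exclusion criteria. The sketch below is only an illustration; the record layout and every field name (sex, race, plans_to_move, and so on) are hypothetical and not part of the handout.

```python
from datetime import date

def is_eligible(p):
    """Screen one patient record: all inclusion criteria, no exclusion criteria."""
    included = (
        p["sex"] == "F"
        and p["race"] == "white"
        and 45 <= p["age"] <= 50
        and p["good_general_health"]
        and not p["long_term_corticosteroids"]
        # Temporal characteristic of the accessible population
        and date(2006, 1, 1) <= p["clinic_visit"] <= date(2006, 12, 31)
    )
    excluded = (
        p["plans_to_move"]                       # likely lost to follow-up
        or p["disoriented_or_language_barrier"]  # unable to provide good data
        or p["kidney_stone_former"]              # ethical barrier
        or p["refuses_random_allocation"]        # refusal to participate
    )
    return included and not excluded

# Hypothetical records for illustration only
candidate = {
    "sex": "F", "race": "white", "age": 47,
    "good_general_health": True, "long_term_corticosteroids": False,
    "clinic_visit": date(2006, 3, 15),
    "plans_to_move": False, "disoriented_or_language_barrier": False,
    "kidney_stone_former": False, "refuses_random_allocation": False,
}
mover = dict(candidate, plans_to_move=True)
too_old = dict(candidate, age=62)

print(is_eligible(candidate), is_eligible(mover), is_eligible(too_old))
```

Note that the mover meets every inclusion criterion and is still screened out, which is why exclusion criteria are not just the inverse of inclusion criteria.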

Choosing an accessible population

Clinic-based samples – inexpensive and easy to recruit, but selection factors that determine who comes to the hospital or clinic may have an important effect

Population-based samples – particularly useful for guiding public health and clinical practice in the whole community, but the chief disadvantage is the expense and difficulty involved



SAMPLING

1. Probability Sampling

Uses a random process to guarantee that each unit of the population has a specified chance of selection

If there is no accurate listing of the target population, use non-probability sampling

The researcher knows the denominator and the characteristics of the population

a. Simple Random Sampling

Every subject has an equal probability of being selected for the study.

The recommended way is to use a table of random numbers or a computer-generated list of random numbers

Process of enumerating every unit of the accessible population, and then selecting the sample at random

Fishbowl sampling is included here

What is needed:

An accurate listing of the population

A mechanism to find and enroll those who are chosen
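As a minimal sketch of the "computer-generated list of random numbers" approach, the snippet below draws a simple random sample without replacement from an enumerated list. The frame of 3400 numbered units and the sample size of 200 are assumptions chosen for illustration, and the function name is hypothetical.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; every unit has the same chance."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(frame, n)

# Assumed sampling frame: units numbered 1 to 3400
frame = list(range(1, 3401))
sample = simple_random_sample(frame, 200, seed=42)

print(len(sample))       # number of units drawn
print(len(set(sample)))  # same number, since no unit is drawn twice
```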

b. Systematic Sampling

Involves selecting by a periodic process; the starting point is chosen at random

Example: get 200 samples from a population of 3400

Procedure: Number all units 1 to 3400; divide the population size by the number to be sampled to get the sampling interval (3400/200 = 17). Select any number from 1 to 17 at random as the starting point, then select every 17th subject thereafter.

NOTE: should not be used when a cyclic repetition is inherent in the sampling frame.

e.g., not appropriate for selecting months of the year in a study of the frequency of different types of accidents, because some accidents occur most often at certain times of the year

e.g., selecting all even/odd numbers when males and females are seated alternately
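The 3400/200 procedure above can be sketched as follows. This is an illustrative implementation under the assumption that units are simply numbered 1 to N; the function name is hypothetical.

```python
import random

def systematic_sample(population_size, sample_size, seed=None):
    """Pick a random start within the first interval, then every k-th unit."""
    interval = population_size // sample_size  # 3400 // 200 = 17
    rng = random.Random(seed)
    start = rng.randint(1, interval)           # random start from 1 to 17
    units = range(start, population_size + 1, interval)
    return list(units)[:sample_size]

chosen = systematic_sample(3400, 200, seed=1)
print(chosen[:5])   # the first few selected unit numbers
print(len(chosen))  # 200
```

Because the selected units are always exactly 17 apart, any pattern in the list that repeats with the same period would be picked up every time, which is the reason for the caution about cyclic sampling frames.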

c. Stratified Random Sampling

Involves dividing the population into subgroups according to characteristics and taking a random sample from each of these “strata”

Characteristics used to stratify should be related to the measurement of interest

In Medicine, commonly used strata include: age, gender, severity of disease

e.g., use of proportionate numbers in groups with differing population sizes
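A minimal sketch of proportionate stratified sampling is shown below: the same sampling fraction is applied inside each stratum, so larger strata contribute proportionately more subjects. The population of 600 women and 400 men, the 10% fraction, and the function name are all assumptions for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=None):
    """Take a simple random sample of the same fraction from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for name in sorted(strata):                # fixed order for reproducibility
        members = strata[name]
        k = round(len(members) * fraction)     # proportionate allocation
        sample.extend(rng.sample(members, k))
    return sample

# Assumed population: (gender, id) pairs, 600 women and 400 men
population = [("F", i) for i in range(600)] + [("M", i) for i in range(400)]
sample = stratified_sample(population, lambda u: u[0], 0.10, seed=7)

n_female = sum(1 for g, _ in sample if g == "F")
n_male = sum(1 for g, _ in sample if g == "M")
print(n_female, n_male)  # 60 40
```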

d. Cluster Sampling

Process of taking a random sample of natural groupings of individuals in the population; very useful when the population is widely dispersed and it is impractical or costly to list and sample from all of its elements

Clusters are commonly based on geographic areas or districts, so this approach is used more often in epidemiologic research than in clinical research

e.g., Different areas [Area 48 or 81] in Brgy Dona Imelda
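In cluster sampling the random draw is made over the groupings, not over individuals, and everyone in a chosen cluster is studied. The sketch below illustrates this under assumed data: the area names and household identifiers are hypothetical placeholders.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and keep all of their members."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical clusters: area name -> households living there
areas = {
    "Area A": ["hh-01", "hh-02", "hh-03"],
    "Area B": ["hh-04", "hh-05"],
    "Area C": ["hh-06", "hh-07", "hh-08", "hh-09"],
    "Area D": ["hh-10"],
}
selected = cluster_sample(areas, 2, seed=3)
print(sorted(selected))  # names of the two areas that were drawn
```

Only a list of clusters is needed up front, not a list of every individual, which is what makes the method practical for widely dispersed populations.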

ADDITIONAL: Multi-Stage Sampling

Combination of the above-mentioned sampling methods

Used in very big research studies (e.g., nationwide study)

e.g., nationwide → choose province → choose city/municipality → choose barangay, etc.
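The province, city/municipality, barangay chain can be sketched as repeated random selection, one level at a time. The nested frame below uses hypothetical placeholder names, and drawing a single unit at each stage is a simplifying assumption; a real study would usually draw several units per stage.

```python
import random

def multistage_sample(frame, seed=None):
    """Pick one province, then one city within it, then one barangay."""
    rng = random.Random(seed)
    province = rng.choice(sorted(frame))
    city = rng.choice(sorted(frame[province]))
    barangay = rng.choice(sorted(frame[province][city]))
    return province, city, barangay

# Hypothetical nested sampling frame: province -> city -> barangays
frame = {
    "Province 1": {"City 1a": ["Brgy 1", "Brgy 2"], "City 1b": ["Brgy 3"]},
    "Province 2": {"City 2a": ["Brgy 4", "Brgy 5", "Brgy 6"]},
}
province, city, barangay = multistage_sample(frame, seed=11)
print(province, city, barangay)
```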

2. Nonprobability Sampling

Sampling method in which the probability that a subject is selected is unknown

E.g., studies involving abused children/women, sex workers, etc.

The denominator (total population number) is unknown

a. Consecutive Sampling

Involves taking every patient who meets the selection criteria over a specified time interval or number of patients; it amounts to taking the complete accessible population over the duration of the study

Taking an accessible population at the time of the study

Usually used by residents in their research (due to time constraints)

b. Convenience Sampling

Process of taking those members of the accessible population who are easily available.

Prone to many biases; the sample is not homogeneous

c. Judgmental Sampling

Involves handpicking from the accessible population those individuals judged most appropriate for the study

E.g., accreditation

d. Snowball Sampling

Referral method

After recruiting one subject, ask that subject for other prospective subjects

SAMPLE SIZE COMPUTATION

Sample Size

Factors that affect the number of subjects required for a study:

1. Whether the research design involves paired or unpaired data

2. Whether beta error is considered in addition to alpha error

3. Whether a large or small variance is anticipated in the data set

4. Whether the alpha level chosen is the usual (p value 0.05) or smaller

5. Whether the desired difference between means or proportions to be detected is fairly small or extremely small

Pre-test

Answer with LARGER or SMALLER:

What sample size would be needed if the investigator wants the answer to be very close to the true value (i.e., have a very narrow confidence interval or a very small p value)? - LARGER

What sample size would be needed if the anticipated variance is small? - SMALLER

What sample size would be needed if the difference the investigator wants to detect is extremely small? - LARGER



Review of Basic Concepts and Terms

Effect size – difference you want to detect between one group and the other group (related to pre-test question 3)

Alpha level/Significance level – probability that a positive finding is due to chance alone (in medicine, usually set at 0.05, i.e., 95% confidence, for which zα = 1.96 in a two-tailed test)

Power – the probability that the effect will be detected if it really exists; equal to 1 minus the beta error (in medicine, usually set at 80%, i.e., beta = 0.20, for which zβ = 0.84)

Alpha error – type I error; error of finding something when in fact there is nothing (error of rejecting the null when it is true); z-value for alpha error is 1.96; p = 0.05 means that you are allowing yourself a 5% chance of committing a type I error

Beta error – type II error; error of finding nothing when in fact there is something (error of failing to reject the null when it is false); z-value for beta error is 0.84
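The two constants quoted above come from the standard normal distribution. As a quick check, the sketch below recovers them with Python's standard library; the helper names z_alpha and z_beta are hypothetical, and the one-tailed treatment of beta follows the handout's worked examples.

```python
from statistics import NormalDist

def z_alpha(alpha=0.05, two_tailed=True):
    """z-value that leaves alpha (split over both tails if two-tailed) outside."""
    tail = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail)

def z_beta(power=0.80):
    """One-tailed z-value for the chosen power (power = 1 - beta)."""
    return NormalDist().inv_cdf(power)

print(round(z_alpha(), 2))  # 1.96
print(round(z_beta(), 2))   # 0.84
```

Choosing a smaller alpha or a higher power gives larger z-values, and therefore a larger required sample size.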

Recall… (Nice to Know)

$$t = \frac{d}{s_d / \sqrt{N}}$$

Where: d is the mean difference that was observed, s_d is the standard deviation of the differences (so s_d/√N is the standard error of the mean difference), and N is the sample size

To solve for N, the formula has to be rearranged. It becomes (Need to Know)

$$N = \frac{(z_\alpha)^2 \cdot (s)^2}{(d)^2}$$
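The rearrangement can be written out step by step. This is a sketch of the algebra only, writing s for the standard deviation and replacing the t value with the z-value for the chosen alpha level as the final step:

```latex
\begin{aligned}
t &= \frac{d}{s / \sqrt{N}} = \frac{d \sqrt{N}}{s} \\
\sqrt{N} &= \frac{t \cdot s}{d} \\
N &= \frac{t^2 \cdot s^2}{d^2} \\
N &= \frac{(z_\alpha)^2 \cdot (s)^2}{(d)^2} \quad \text{(using } z_\alpha \text{ in place of } t\text{)}
\end{aligned}
```

The result shows why the required sample size grows with the variance and shrinks as the difference to be detected gets larger.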

Derivation of the Basic Sample Size Formula

Formula for the Calculation of Sample Size for studies commonly pursued in Medical Research

Studies using the paired t test (e.g., before and after studies) and considering alpha (Type I) error only

$$N = \frac{(z_\alpha)^2 \cdot (s)^2}{(d)^2}$$

Use the paired t-test if:

Utilizing a before and after study (involves 1 group only)

Matching was employed (according to gender, age, etc.)

Subjects are twins

If none of these apply and the data are continuous, use the independent t-test

Study Characteristics | Assumptions made by Investigator

Type of Study | Before and after study of an antihypertensive (anti-HPN) drug

Data sets | Pre-treatment and post-treatment observations in the same group of subjects

Variable | Systolic blood pressure

Standard deviation (s) | 15 mmHg

Variance (s²) | 225 mmHg²

Data for alpha (zα) | p = 0.05; therefore, 95% confidence desired (two-tailed test); zα = 1.96

Difference to be detected (d) | 10 mmHg or larger difference between pre- and post-treatment blood pressure values

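$$
\begin{aligned}
N &= \frac{(z_\alpha)^2 \cdot (s)^2}{(d)^2} \\
  &= \frac{(1.96)^2 \cdot (15)^2}{(10)^2} \\
  &= \frac{(3.84)(225)}{100} \\
  &= \frac{864}{100} = 8.64 \approx 9 \text{ subjects total (rounded up)}
\end{aligned}
$$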
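The paired calculation can be checked with a few lines of code. This is a minimal sketch; the function name n_paired is hypothetical, and rounding up to the next whole subject follows the worked example.

```python
import math

def n_paired(z_alpha, s, d):
    """Total subjects for a paired design, considering alpha error only."""
    return math.ceil(z_alpha**2 * s**2 / d**2)

# Values from the before-and-after blood pressure example
print(n_paired(z_alpha=1.96, s=15, d=10))  # 9
```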

Studies using the Student’s t test (e.g., one experimental group and one control group) and considering alpha (Type I) error only

$$N = \frac{(z_\alpha)^2 \cdot 2 \cdot (s)^2}{(d)^2}$$

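Study Characteristics | Assumptions made by Investigator

Type of Study | RCT of an anti-HPN drug

Data sets | Observations in one experimental group and one control group

Variable | Systolic blood pressure

Standard deviation (s) | 15 mmHg

Variance (s²) | 225 mmHg²

Data for alpha (zα) | p = 0.05; therefore, 95% confidence desired (two-tailed test); zα = 1.96

Difference to be detected (d) | 10 mmHg or larger difference between mean blood pressure values of the experimental group and control group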

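$$
\begin{aligned}
N &= \frac{(z_\alpha)^2 \cdot 2 \cdot (s)^2}{(d)^2} \\
  &= \frac{(1.96)^2 \cdot 2 \cdot (15)^2}{(10)^2} \\
  &= \frac{(3.84) \cdot 2 \cdot (225)}{100} \\
  &= \frac{1728}{100} = 17.28 \approx 18 \text{ subjects per group (rounded up)}
\end{aligned}
$$

18 subjects per group * 2 groups = 36 subjects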
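For the two-group design the only change is the factor of 2 in the numerator, and the result is a per-group figure. A minimal sketch, with a hypothetical function name:

```python
import math

def n_per_group_alpha_only(z_alpha, s, d):
    """Subjects per group for two independent groups, alpha error only."""
    return math.ceil(z_alpha**2 * 2 * s**2 / d**2)

per_group = n_per_group_alpha_only(z_alpha=1.96, s=15, d=10)
print(per_group)      # 18 per group
print(per_group * 2)  # 36 in total
```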

Studies using the Student’s t test and considering alpha (Type I) and beta (Type II) errors

$$N = \frac{(z_\alpha + z_\beta)^2 \cdot 2 \cdot (s)^2}{(d)^2}$$

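Study Characteristics | Assumptions made by Investigator

Type of Study | RCT of an anti-HPN drug

Data sets | Observations in one experimental group and one control group

Variable | Systolic blood pressure

Standard deviation (s) | 15 mmHg

Variance (s²) | 225 mmHg²

Data for alpha (zα) | p = 0.05; therefore, 95% confidence desired (two-tailed test); zα = 1.96

Data for beta (zβ) | 20% beta error; therefore, 80% power desired (one-tailed test); zβ = 0.84

Difference to be detected (d) | 10 mmHg or larger difference between mean blood pressure values of the experimental group and control group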

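$$
\begin{aligned}
N &= \frac{(z_\alpha + z_\beta)^2 \cdot 2 \cdot (s)^2}{(d)^2} \\
  &= \frac{(1.96 + 0.84)^2 \cdot 2 \cdot (15)^2}{(10)^2} \\
  &= \frac{(7.84) \cdot 2 \cdot (225)}{100} \\
  &= \frac{3528}{100} = 35.28 \approx 36 \text{ subjects per group (rounded up)}
\end{aligned}
$$

36 subjects per group * 2 groups = 72 subjects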
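Adding beta error replaces the squared z-value for alpha with the squared sum of both z-values. The sketch below reproduces the calculation and compares it with the alpha-only case; the function name is hypothetical.

```python
import math

def n_per_group(z_alpha, z_beta, s, d):
    """Subjects per group for two independent groups, alpha and beta errors."""
    return math.ceil((z_alpha + z_beta) ** 2 * 2 * s**2 / d**2)

with_power = n_per_group(z_alpha=1.96, z_beta=0.84, s=15, d=10)
alpha_only = n_per_group(z_alpha=1.96, z_beta=0.0, s=15, d=10)

print(with_power, with_power * 2)  # 36 per group, 72 in total
print(alpha_only)                  # 18 per group when beta is ignored
```

Asking for 80% power doubles the required sample in this example, from 18 to 36 subjects per group.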

Studies using a test of differences in proportions and considering alpha (Type I) and beta (Type II) errors

$$N = \frac{(z_\alpha + z_\beta)^2 \cdot 2 \cdot p(1 - p)}{(d)^2}$$

Study Characteristics | Assumptions made by Investigator

Type of Study | RCT of a drug to reduce the 5-yr mortality in patients with a particular form of cancer

Variable | Success = 5-yr survival after Tx; Failure = death within 5 yrs of Tx

Variance, p(1-p) | p = 0.55; therefore, (1-p) = 0.45

Data for alpha (zα) | p = 0.05; therefore, 95% confidence desired (two-tailed test); zα = 1.96

Data for beta (zβ) | 20% beta error; therefore, 80% power desired (one-tailed test); zβ = 0.84

Difference to be detected (d) | 0.1 or larger difference between the success (survival) of the experimental group and that of the control group

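$$
\begin{aligned}
N &= \frac{(z_\alpha + z_\beta)^2 \cdot 2 \cdot p(1 - p)}{(d)^2} \\
  &= \frac{(1.96 + 0.84)^2 \cdot 2 \cdot (0.55)(0.45)}{(0.1)^2} \\
  &= \frac{(7.84) \cdot 2 \cdot (0.2475)}{0.01} \\
  &= \frac{3.88}{0.01} = 388 \text{ subjects per group}
\end{aligned}
$$

388 subjects per group * 2 groups = 776 subjects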
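The proportion formula can be checked the same way, with p(1 - p) taking the place of the variance. The sketch below uses a hypothetical function name and rounds up to the next whole subject. Without intermediate rounding the numerator is 3.8808, so the exact value is 388.08 and rounding up gives 389 per group; the handout's 388 comes from rounding the numerator to 3.88 before dividing.

```python
import math

def n_per_group_proportions(z_alpha, z_beta, p, d):
    """Subjects per group when comparing two proportions."""
    exact = (z_alpha + z_beta) ** 2 * 2 * p * (1 - p) / d**2
    return exact, math.ceil(exact)

exact, per_group = n_per_group_proportions(z_alpha=1.96, z_beta=0.84, p=0.55, d=0.1)
print(round(exact, 2))  # 388.08
print(per_group)        # 389 when rounded up
```

The difference of one subject per group is immaterial in practice, but it shows that intermediate rounding can change the final figure.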

Remember:

N = sample size

zα = z-value for alpha error = 1.96 (constant for alpha = 0.05, two-tailed)

zβ = z-value for beta error = 0.84 (constant for beta = 0.20, i.e., 80% power)

s² = variance (from literature)

p = mean proportion of success (from literature)

d = difference to be detected (researcher assigned)
