errors in epidemiological studies - · pdf fileepidemiological studies is ... different...
TRANSCRIPT
ERRORS IN ERRORS IN
EPIDEMIOLOGICAL EPIDEMIOLOGICAL STUDIESSTUDIES
PratapPratap SinghasivanonSinghasivanonDepartment of Tropical HygieneDepartment of Tropical Hygiene
An important goal of
epidemiological studies is to measure accurately the occurrence of exposure/risk factors and disease outcome
A false or mistaken result obtained in
a study or experiment
BIAS BIAS BIAS Fluctuation of and
estimate around thepopulation value
(RANDOM VARIABILITY)
Error due to factorsthatinherent in the
design, conduct and analysis
Result obtained in sample differs from result that would
be obtained if the entire population were studies
ERRORERRORERROR SYSTEMATIC ERRORSYSTEMATIC ERROR RANDOM ERRORRANDOM ERROR== ++
Page 3
ERRORERROR
Is defined as a false or mistaken result obtained in a study or experiment
Consists of 2 componentsSystematic error
Random errorRandom error
Page 4
RANDOM ERRORRANDOM ERROR
IsIs the divergence, due to chance alone, of
an observation on an sample from the true
population value
Page 5
RANDOM ERRORRANDOM ERROR
Refers to fluctuations around a true value because of Sampling variability
SYSTEMATIC ERRORSYSTEMATIC ERRORAny difference between the true value and that actually obtained that is the resultof all causes other than Sampling variability.
Page 6
Errors in epidemiological studies
Error
Study size
Source: Rothman, 2002
Systematic error (bias)
Random error (chance)
BiasBias occurs when an estimated association (RR, OR, difference in means etc.) deviates from the true measure of association
Consequence of bias systematic error in RR, OR etc.
Bias may be introduced at design, implementation or analysis phase of a study
SYSTEMATIC ERROR SYSTEMATIC ERROR ::
SELECTION BIAS
INFORMATION BIAS
CONFOUNDING
Page 9
Should I believe my measurement?
Smoking Lung cancerOR = 9.1
Random Error?Bias?Confounding?
True associationcausalnon-causal
IsIs the expression of the degree to which a
test is capable of measuring what it is intended to measure
AA study is valid if its results corresponds to the truth, no systematic error and random error should be as small as possible
VALIDITYVALIDITY
Page 11
AA high reliabilityhigh reliability means that in repeated measurements the results fall very close to each other;conversely,
AA low reliabilitylow reliability means that they are scattered.
Page 12
Different combinations of high and low Different combinations of high and low reliability and validityreliability and validity
Page 13
VALIDITYVALIDITYHighHigh
HighHigh
REL
IAB
ILIT
YR
ELIA
BIL
ITY
LowLow
LowLow
HighHigh
LowLow
Internal and Internal and EExternal Validityxternal Validity
ExternalExternalPopulationPopulation
TargetPopulation
StudySample
VALIDITYVALIDITYVALIDITY
INTINT.. EXTEXT..
Page 14
is a distorsion in the estimate of effect resulting from the manner in which subject are selected for the study population
MAJOR SOUREC OFMAJOR SOUREC OF SELECTION BIASSELECTION BIAS
1) flaws in the choice of groups to be compared2) choice of sampling frame3) loss to follow up or nonresponse during data
collection4) selective survival
SELECTION BIASSELECTION BIAS
Page 15
Selection Bias
Systematic error resulting from manner in which subjects are selected or retained in the study
Can occur when:• Characteristics of subjects selected for study
differ systematically from those in the target population
• Study and comparison groups are selected from different populations
Selection Bias
Distortions that arise from– Procedures used to select subjects– Factors that influence study participation– Factors that influence participant attrition
Systematic error in identifying or selecting subjects– Examples are…
Selection Bias
Example:If cases & controls or exposed & non-exposed individuals were selected in such a way that an association is observed even though exposure & disease are not associated
May result from withdrawal or losses to follow-up of study subjects
Selection Bias
Problematic
– Can result in over- or under- estimation of the true magnitude of the relationship between an exposure and an outcome
– May produce an apparent association when none exists
• OR/RR may be incorrect estimates ⇒ Invalid inferences about association of exposure & disease
– May conceal a real association
Selection Bias
To avoid it, ensure that:
– Subjects are representative of target population
– Study and comparison groups are similar except for variables being investigated
– Subject losses are kept to a minimum
INFORMATION BIAS INFORMATION BIAS is a distortion in the measurement error or misclassification of subject on one or more variables
MAJOR SOURCES OF INFORMATION BIASMAJOR SOURCES OF INFORMATION BIAS
1) invalid measurement2) incorrect diagnostic criteria3) omissions imprecisions4) other inadequacies in previously recorded data
Page 21
Information BiasInformation Bias
DefinitionDefinition::
A distortion in the measure of association A distortion in the measure of association caused by a lack of accurate caused by a lack of accurate measurements of the exposure (risk measurements of the exposure (risk factor) or disease statefactor) or disease state
Also known as Also known as Measurement biasMeasurement bias
Information biasInformation bias
Systematic error in the measurements of Systematic error in the measurements of information on exposure or outcomeinformation on exposure or outcome
Result in:Result in:
Differences in accuracyDifferences in accuracy of:of:-- exposure data between cases and controlsexposure data between cases and controls-- outcome data between different exposureoutcome data between different exposuregroupsgroups
Information biasInformation bias
Sources of information bias include:Sources of information bias include:-- Defects in the measurement instrumentsDefects in the measurement instruments-- Deficiencies in the questionnairesDeficiencies in the questionnaires-- Inaccurate diagnostic proceduresInaccurate diagnostic procedures-- AmbigiousAmbigious definition of exposuredefinition of exposure-- Poorly defined diagnostic criteria ofPoorly defined diagnostic criteria ofdiseasedisease
-- Incomplete or unreliable data sources Incomplete or unreliable data sources
Information BiasInformation Bias
CauseCause::Information bias arises when Information bias arises when study variables (exposure, disease, or study variables (exposure, disease, or confounders) are inaccurately measured confounders) are inaccurately measured or classified resulting in or classified resulting in MisclassificationMisclassification
MISCLASSIFICATION BIASMISCLASSIFICATION BIAS
Definition:Definition:
the erroneous classification of an individual, a the erroneous classification of an individual, a value, or an attribute into a category other value, or an attribute into a category other than that to which it should be assignedthan that to which it should be assigned
often results from an improper often results from an improper ““cutoff pointcutoff point”” in in disease diagnosis or exposure classificationdisease diagnosis or exposure classification
Hence errors are made in classifying to Hence errors are made in classifying to either disease or exposure statuseither disease or exposure status
MISCLASSIFICATION BIASMISCLASSIFICATION BIAS
Types of misclassification biasTypes of misclassification bias
Non differential (random)Non differential (random)
Differential (systematic)Differential (systematic)
NondifferentialNondifferential Misclassification Misclassification BiasBias
Occurs when there is equal misclassification of Occurs when there is equal misclassification of exposure between diseased and nonexposure between diseased and non--diseased diseased study subjectsstudy subjects
OROR
When there is equal misclassification of When there is equal misclassification of disease between exposed and nondisease between exposed and non--exposed exposed study subjectsstudy subjects
NonNon--differential Misclassification differential Misclassification BiasBias
If exposure or disease is dichotomous, If exposure or disease is dichotomous,
then,then,
NonNon--differential misclassification differential misclassification causes a bias of the RR or OR causes a bias of the RR or OR towards the nulltowards the null
NondifferentialNondifferential Misclassification Misclassification BiasBias
CasesCases ControlsControls TotalTotal
ExposedExposed 100100 5050 150150
NonexposedNonexposed 5050 5050 100100
150150 100100 250250
OR = ad/bc = 2.0; RR = a/(a+b)/c/(c+d) = 1.3
True Classification
CasesCases ControlsControls TotalTotal
ExposedExposed 110110 6060 170170
NonexposedNonexposed 4040 4040 8080
150150 100100 250250
OR = ad/bc = 1.8; RR = a/(a+b)/c/(c+d) = 1.3
Nondifferential misclassification Overestimate exposure in 10 cases, 10 controls
bias towards null
Non-Differential Misclassification
"True Situation"Cases Controls Total
Exp. 85 40 125Not Exp. 15 60 75Total 100 100 200
OR= 8.5
50% of exposed misclassified as unexposed
Cases ControlsExp. 43 20Not Exp. 15 + 42 60 + 20Total 100 100
OR=3.0
Bias towards the null (1.0)
Differential MisclassificationDifferential Misclassification
Occurs when misclassification of exposure is Occurs when misclassification of exposure is not equal between diseased and nonnot equal between diseased and non--diseased diseased study subjectsstudy subjects
OROR
When misclassification of disease is not equal When misclassification of disease is not equal between exposed and nonbetween exposed and non--exposed study exposed study subjectssubjects
Differential MisclassificationDifferential Misclassification
Causes a bias in the RR or OR Causes a bias in the RR or OR -- either towards or away from the null, either towards or away from the null, -- depending on the proportions of studydepending on the proportions of study
subjects misclassifiedsubjects misclassified
Differential MisclassificationDifferential Misclassification
Direction of bias is Direction of bias is towards the nulltowards the null ifif-- fewer cases are considered to be exposed orfewer cases are considered to be exposed or-- fewer exposed are considered to be diseasedfewer exposed are considered to be diseased
Direction of bias is Direction of bias is away from the nullaway from the null ifif-- more cases are considered to be exposed ormore cases are considered to be exposed or-- more exposed are considered to be diseasedmore exposed are considered to be diseased
Differential MisclassificationDifferential MisclassificationExampleExample
CaseCase--Control studyControl study: : If more cases are If more cases are mistakenlymistakenly classified as being classified as being exposed than controls exposed than controls overestimation of ORoverestimation of OR
Cohort studyCohort study: : If exposed group is more likely to be If exposed group is more likely to be mistakenlymistakenlyclassified as having developed the outcome than the classified as having developed the outcome than the unexposed group unexposed group overestimation of RRoverestimation of RR
Leads to Leads to overover-- or underor under-- estimationestimation of the true of the true magnitude of the measure of associationmagnitude of the measure of association
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
1 2 3 4 5+
Birth OrderBirth Order
Affe
cted
A
ffect
ed B
abie
sB
abie
spe
r 100
0 Li
ve B
irths
per 1
000
Live
Birt
hsPrevalence of Down syndrome at birth by birth orderPrevalence of Down syndrome at birth by birth order
0
1
2
3
4
5
6
7
8
9
<20 20-24 25-29 30-34 35-39 40+
Maternal AgeMaternal Age
Affe
cted
Bab
ies
per 1
000
Live
A
ffect
ed B
abie
s pe
r 100
0 Li
ve
Birt
hsB
irths
Prevalence of Down syndrome at Maternal AgePrevalence of Down syndrome at Maternal Age
Here’s what we’d like to assess :
ExposureExposure DiseaseDisease
Here’s where confounding acts :
ExposureExposure DiseaseDisease
ConfoundingConfounding
Where represents a causal relationshiprepresents a non-causal relationship
Why is confounding a problem?
Because…...
• the estimate of association between exposure and disease includes BOTH the contribution of the exposure AND the confounder
• the estimate of association which includes the confounder gives an incorrect estimate of the impact of the exposure on the disease outcome
Criteria for confounding
• must be an independent predictor of disease with or without exposure
• must be associated (correlated) with exposure but not caused by the exposure
• must not be an intermediate link in a causal pathway between exposure and outcome
Potential Confounder
If ...
E D
C
E = Exposure D = Disease C = Confounder
E D
C
NOT A POTENTIAL CONFOUNDER
C E Dpart of causal pathway
E C D
E D
C
E D
C
E and C are part of a causal pathway
Or
CONFOUNDING CONFOUNDING
MIXING OF EFFECTSMIXING OF EFFECTS
The estimate of the effect of The exposure
of interest is distorted because it is mixed
With the effect of an Extraneous factor
Page 45
COFFEE DRINKINGCOFFEE DRINKING,, CIGARETTE SMOKINGCIGARETTE SMOKINGAND CORONARY HEART DISEASEAND CORONARY HEART DISEASE
EXPOSURE(coffee drinking)
DISEASE(heart disease)
CONFOUNDINGVARIABLE
(cigarette smoking)
CONFOUNDING CONFOUNDING
Page 46
TThe distortion introduced by a confounding factor can lead to overestimation or under estimation of an effect depending on the direction of the association that the confounding factor has with exposureand disease.
CConfoundingonfounding can even change the apparent direction of an effect.
ExampleExample :: AlcoholAlcohol
Smoking Smoking
Oral cancerOral cancer
Page 47
E =E = E = E =
D D
D D
E =E = E = E =
D D
D D
EE EEEE
DD
EE
DD
E =E = E =E =
D D
D D
E E E E E E E E
D D
D D
Situation in which Situation in which FF is a is a confounderconfounder for a for a DD -- EE associationassociation..
Situation in which Situation in which FF is notis not a a confounderconfounder for a for a DD -- EE associationassociation..
EE
FFDD
EE
FFDD
EE
FFDD
EE
FFDD
EE
FFDD
EE
FFDD
EE
FFDD
Page 52
To be confoundingTo be confounding,, the extraneous variable must the extraneous variable must have the following characteristicshave the following characteristics
AA confounding variableconfounding variable must be a risk factor for the disease.
AA confounding variableconfounding variable must be associated with the exposure under study (in the population from which the case derive).
AA confounding variableconfounding variable must not be an intermediate step in the causal path between the exposure and the disease.
Page 53
TThe data-based criterion for establishing the presence or absence of confounding involve the comparison of a crude effect measurewith an adjusted effect measure that corrects for distortions due to extraneous variables.
CConfounding is acknowledged to be present when the crude and adjusted effect measured i f f e r i n v a l u e .
Page 54
- RESTRICTION
- MATCHING
- STRATIFICATION
- MATHEMATICAL MODEL
(Multivariate analysis)
DESIGNDESIGN
ANALYSISANALYSIS
CONTROL OF CONFOUNDINGCONTROL OF CONFOUNDING
Page 55
AGE *MI (%) CONTROLS(%) **OC USE (%)
25-29 3 16 29
30-34 9 14 10
35-39 16 20 8
40-44 30 21 4
45-49 42 18 3
*MI : MYOCARDIAL INFARCTION**OC : ORAL CONTRACEPTIVE
Relation of Relation of ConfounderConfounder to to Disease and Exposure Disease and Exposure
DISEASE EXPOSURE
Page 56
CRUDE RRCRUDE RR
D
D
+
-
E E-
RR = 4RR = 4
CRR1000
• Collapsed• Collapsed in 1 table without separation into subgroup.
Page 57
CRUDERR
4==
1000 2000
E =E =
D
D
EE EE
D
D
EE EE
D
D
Page 60
SMOKERSSMOKERSSMOKERS NON-SMOKERSNONNON--SMOKERSSMOKERS
6 29
871
-+
+
-
CIR = 1.86SM^
ADJUSTEDCIR = 1.13
^
CIR = 1.02SM^
EXPOSUREEXPOSURE
194 21
706 79
-+
DISEASEDISEASE
9494
+-
200200
800800
5050
950950
ALCALC
250
1750DISEASEDISEASE
1000 1000 2000
CRUDE CIR = 4.0^
EXPOSUREEXPOSURE
OCaOCa
+
- OCaOCa
IDENTIFYING A CONFOUNDER - an example
Calculate crude measure of association:
RR = 2.8 (2.21,3.63)d)c/(cb)a/(aRR
++
=
Smokers Non smokers TotalCHD 305 58 363NO CHD 345 292 637Total 650 350 1000
Calculate stratum-specific measures of association...
Smokers Non smokers TotalCHD 300 50 350No CHD 300 150 450Total 600 200 800
RR = 2.0 (1.55,2.58)
Smokers Non smokers TotalCHD 5 8 13No CHD 45 142 187Total 50 150 200
STRATUM 1: MEN
STRATUM 2: WOMEN
RR = 1.9 (0.64,5.47)
IS THERE A CONFOUNDER?
• CRUDE RR for smoking and CHD =2.8
• STRATUM-SPECIFIC RR for smoking and CHD with gender as a potential confounder...
MEN RR = 2.0WOMEN RR = 1.9
• Overlapping confidence intervals or 10% rule
• Gender confounds the association between smoking and CHD because the crude RR of 2.8 is NOT the same as the stratum-specific RR’s of approx. 2.0
roughly the same
IS THERE A CONFOUNDER?
• THE 10% RULE
• We assert that gender confounds the association between smoking and CHD because the crude RR of 2.8 is NOT the same as the stratum-specific RR’s of 2.0 or 1.9
• the 10% rule is a good rule of thumb for assessing whether there is confounding present
Is 2.0 and 1.9 more than 10% different from 2.8?
10% of 2.8 = .28 and the difference between our stratum specific RR’s and the crude RR is greater than .28 Not a replacement for a statistical test - simply a way to initially judge whether something is a potentially confounding factor
HOW TO CONTROL FOR CONFOUNDERS?
IN STUDY DESIGN…
- RESTRICTION of subjects according to potential confounders (i.e. simply don’t include confounder in study)
- RANDOM ALLOCATION of subjects to study groups to attempt to even out confounders
- MATCHING subjects on potential confounder thus assuring even distribution among study groups
HOW TO CONTROL FOR CONFOUNDERS?
IN DATA ANALYSIS…
- STRATIFIED ANALYSIS using the Mantel Haenszel method to adjust for confounders
- IMPLEMENT A MATCHED-DESIGN after you have collected data (frequency or group)
- RESTRICTION is still possible at the analysis stage but it means throwing away data
- MODEL FITTING using regression techniques
Using the Mantel Haenszel Method to Report Adjusted OR’s
• Can only use with confounders because we assume that ORs are constant across stratum
• General Formula : where i = strata and N = Total
iii
iiiMH Ncb
NdaOR∑∑
=
• This technique generates a summary measure across strata by removing the effect of the confounder
Example of the Mantel Haenszel Method ...
Stroke No Stroke TotalFm Hsty 16 47 63No Fm Hsty 2 7 9Total 18 54 72
STRATUM 1: Pre-Menopause OR = 1.19
STRATUM 2: Post-Menopause OR = 1.05
Stroke No Stroke TotalFm Hsty 24 13 37No Fm Hsty 58 33 91Total 82 46 128
1.29OR MH =
1281358
72472
1283324
72716
×+
×
×+
×
Mantel Haenszel RELATIVE RISK
∑
∑+
+
=
i
iii
i
iii
MH
N)b(ac
N)d(ca
RR
HOW TO REPORT DATA WITH CONFOUNDERS
IF YOU HAVE A CONFOUNDER….
• DO NOT report crude OR or RR (you know it’s wrong)
• GOOD: Report stratum-specific OR or RR
• BEST: Report summary measures such as a Mantel-Haenszel OR (this is like compiling stratum-specific OR’s)
THIRD VARIABLE SUMMARY
Are stratum-specific OR’s the same?
YES
YES
NO
NO
crude OR = stratum-specific? INTERACTION… reportstratum-specific OR or RR
CONFOUNDINGReport summaryMeasure (MH OR)
NO CONFOUNDING or INTERACTIONReport crude OR or RR
Page 72
Degree of ConfoundingDegree of Confounding
mmeasures the amount of confoundingrather than mere presence or absence
degree of confoundingdegree of confounding = crude measureadjusted measure
= 4.00 = 3.531.13
Crude = 1.68Adjusted = 3.97
d.c. = 1.68 = 0.423.97
over estimation
under estimation
AGE Recent Use of OC MI Controls OR
Yes 4 6225-29No 2 244
7.2
Yes 9 3330-34
No 12 3908.9
Yes 4 26 1.535-39
No 33 330
Yes 6 9 3.740-44
No 65 362
Yes 6 5 3.945-49
No 93 301
Yes 29 135 1.7TOTAL
No 205 160765
AGE Recent Use of OC MI Controls OR
Yes 4 6225-29No 2 244
7.2
Yes 9 3330-34
No 12 3908.9
Yes 4 26 1.535-39
No 33 330
Yes 6 9 3.740-44
No 65 362
Yes 6 5 3.945-49
No 93 301
Yes 29 135 1.7TOTAL
No 205 160765
97.3)MH(ORa =∧
44 fold risk of fold risk of MIMI among recent of among recent of OCOC users users as compared to nonas compared to non--usersusers..
Page 73
TYPES OF ASSOCIATIONTYPES OF ASSOCIATION
A. Not statistically associated (Independent)
B. Statistically associated
1.1. NoncausallyNoncausally associated associated (Secondarily)
2.2. Causally associated Causally associated a. Indirectly associatedb. Directly causal
Page 74
Example No.
Type of Confounding
Unadjusted Relative Risk
Adjusted Relative Risk
1 Positive 3.5 1.0
2 Positive 3.5 2.1
3 Positive 0.3 0.7
4 Negative 1.0 3.2
5 Negative 1.5 3.2
6 Negative 0.8 0.2
7 Qualitative 2.0 0.7
8 Qualitative 0.6 1.8
Hypothetical Examples of Unadjusted and Adjusted Relative Risks According to Type of confounding (Positive or Negative)