errors in epidemiological studies - · pdf fileepidemiological studies is ... different...

ERRORS IN ERRORS IN

EPIDEMIOLOGICAL EPIDEMIOLOGICAL STUDIESSTUDIES

PratapPratap SinghasivanonSinghasivanonDepartment of Tropical HygieneDepartment of Tropical Hygiene

An important goal of

epidemiological studies is to measure accurately the occurrence of exposure/risk factors and disease outcome

A false or mistaken result obtained in

a study or experiment

BIAS BIAS BIAS Fluctuation of and

estimate around thepopulation value

(RANDOM VARIABILITY)

Error due to factorsthatinherent in the

design, conduct and analysis

Result obtained in sample differs from result that would

be obtained if the entire population were studies

ERRORERRORERROR SYSTEMATIC ERRORSYSTEMATIC ERROR RANDOM ERRORRANDOM ERROR== ++

ERRORERROR

Is defined as a false or mistaken result obtained in a study or experiment

Consists of 2 componentsSystematic error

Random errorRandom error

RANDOM ERRORRANDOM ERROR

IsIs the divergence, due to chance alone, of

an observation on an sample from the true

population value

RANDOM ERRORRANDOM ERROR

Refers to fluctuations around a true value because of Sampling variability

SYSTEMATIC ERRORSYSTEMATIC ERRORAny difference between the true value and that actually obtained that is the resultof all causes other than Sampling variability.

Errors in epidemiological studies

Error

Study size

Source: Rothman, 2002

Systematic error (bias)

Random error (chance)

BiasBias occurs when an estimated association (RR, OR, difference in means etc.) deviates from the true measure of association

Consequence of bias systematic error in RR, OR etc.

Bias may be introduced at design, implementation or analysis phase of a study

SYSTEMATIC ERROR SYSTEMATIC ERROR ::

SELECTION BIAS

INFORMATION BIAS

CONFOUNDING

Should I believe my measurement?

Smoking Lung cancerOR = 9.1

Random Error?Bias?Confounding?

True associationcausalnon-causal

IsIs the expression of the degree to which a

test is capable of measuring what it is intended to measure

AA study is valid if its results corresponds to the truth, no systematic error and random error should be as small as possible

VALIDITYVALIDITY

AA high reliabilityhigh reliability means that in repeated measurements the results fall very close to each other;conversely,

AA low reliabilitylow reliability means that they are scattered.

Different combinations of high and low Different combinations of high and low reliability and validityreliability and validity

VALIDITYVALIDITYHighHigh

HighHigh

REL

IAB

ILIT

YR

ELIA

BIL

ITY

LowLow

LowLow

HighHigh

LowLow

Internal and Internal and EExternal Validityxternal Validity

ExternalExternalPopulationPopulation

TargetPopulation

StudySample

VALIDITYVALIDITYVALIDITY

INTINT.. EXTEXT..

is a distorsion in the estimate of effect resulting from the manner in which subject are selected for the study population

MAJOR SOUREC OFMAJOR SOUREC OF SELECTION BIASSELECTION BIAS

1) flaws in the choice of groups to be compared2) choice of sampling frame3) loss to follow up or nonresponse during data

collection4) selective survival

SELECTION BIASSELECTION BIAS

Selection Bias

Systematic error resulting from manner in which subjects are selected or retained in the study

Can occur when:• Characteristics of subjects selected for study

differ systematically from those in the target population

• Study and comparison groups are selected from different populations

Selection Bias

Distortions that arise from– Procedures used to select subjects– Factors that influence study participation– Factors that influence participant attrition

Systematic error in identifying or selecting subjects– Examples are…

Selection Bias

Example:If cases & controls or exposed & non-exposed individuals were selected in such a way that an association is observed even though exposure & disease are not associated

May result from withdrawal or losses to follow-up of study subjects

Selection Bias

Problematic

– Can result in over- or under- estimation of the true magnitude of the relationship between an exposure and an outcome

– May produce an apparent association when none exists

• OR/RR may be incorrect estimates ⇒ Invalid inferences about association of exposure & disease

– May conceal a real association

Selection Bias

To avoid it, ensure that:

– Subjects are representative of target population

– Study and comparison groups are similar except for variables being investigated

– Subject losses are kept to a minimum

INFORMATION BIAS INFORMATION BIAS is a distortion in the measurement error or misclassification of subject on one or more variables

MAJOR SOURCES OF INFORMATION BIASMAJOR SOURCES OF INFORMATION BIAS

1) invalid measurement2) incorrect diagnostic criteria3) omissions imprecisions4) other inadequacies in previously recorded data

Information BiasInformation Bias

DefinitionDefinition::

A distortion in the measure of association A distortion in the measure of association caused by a lack of accurate caused by a lack of accurate measurements of the exposure (risk measurements of the exposure (risk factor) or disease statefactor) or disease state

Also known as Also known as Measurement biasMeasurement bias

Information biasInformation bias

Systematic error in the measurements of Systematic error in the measurements of information on exposure or outcomeinformation on exposure or outcome

Result in:Result in:

Differences in accuracyDifferences in accuracy of:of:-- exposure data between cases and controlsexposure data between cases and controls-- outcome data between different exposureoutcome data between different exposuregroupsgroups

Information biasInformation bias

Sources of information bias include:Sources of information bias include:-- Defects in the measurement instrumentsDefects in the measurement instruments-- Deficiencies in the questionnairesDeficiencies in the questionnaires-- Inaccurate diagnostic proceduresInaccurate diagnostic procedures-- AmbigiousAmbigious definition of exposuredefinition of exposure-- Poorly defined diagnostic criteria ofPoorly defined diagnostic criteria ofdiseasedisease

-- Incomplete or unreliable data sources Incomplete or unreliable data sources

Information BiasInformation Bias

CauseCause::Information bias arises when Information bias arises when study variables (exposure, disease, or study variables (exposure, disease, or confounders) are inaccurately measured confounders) are inaccurately measured or classified resulting in or classified resulting in MisclassificationMisclassification

MISCLASSIFICATION BIASMISCLASSIFICATION BIAS

Definition:Definition:

the erroneous classification of an individual, a the erroneous classification of an individual, a value, or an attribute into a category other value, or an attribute into a category other than that to which it should be assignedthan that to which it should be assigned

often results from an improper often results from an improper ““cutoff pointcutoff point”” in in disease diagnosis or exposure classificationdisease diagnosis or exposure classification

Hence errors are made in classifying to Hence errors are made in classifying to either disease or exposure statuseither disease or exposure status

MISCLASSIFICATION BIASMISCLASSIFICATION BIAS

Types of misclassification biasTypes of misclassification bias

Non differential (random)Non differential (random)

Differential (systematic)Differential (systematic)

NondifferentialNondifferential Misclassification Misclassification BiasBias

Occurs when there is equal misclassification of Occurs when there is equal misclassification of exposure between diseased and nonexposure between diseased and non--diseased diseased study subjectsstudy subjects

OROR

When there is equal misclassification of When there is equal misclassification of disease between exposed and nondisease between exposed and non--exposed exposed study subjectsstudy subjects

NonNon--differential Misclassification differential Misclassification BiasBias

If exposure or disease is dichotomous, If exposure or disease is dichotomous,

then,then,

NonNon--differential misclassification differential misclassification causes a bias of the RR or OR causes a bias of the RR or OR towards the nulltowards the null

NondifferentialNondifferential Misclassification Misclassification BiasBias

CasesCases ControlsControls TotalTotal

ExposedExposed 100100 5050 150150

NonexposedNonexposed 5050 5050 100100

150150 100100 250250

OR = ad/bc = 2.0; RR = a/(a+b)/c/(c+d) = 1.3

True Classification

CasesCases ControlsControls TotalTotal

ExposedExposed 110110 6060 170170

NonexposedNonexposed 4040 4040 8080

150150 100100 250250

OR = ad/bc = 1.8; RR = a/(a+b)/c/(c+d) = 1.3

Nondifferential misclassification Overestimate exposure in 10 cases, 10 controls

bias towards null

Non-Differential Misclassification

"True Situation"Cases Controls Total

Exp. 85 40 125Not Exp. 15 60 75Total 100 100 200

OR= 8.5

50% of exposed misclassified as unexposed

Cases ControlsExp. 43 20Not Exp. 15 + 42 60 + 20Total 100 100

OR=3.0

Bias towards the null (1.0)

Differential MisclassificationDifferential Misclassification

Occurs when misclassification of exposure is Occurs when misclassification of exposure is not equal between diseased and nonnot equal between diseased and non--diseased diseased study subjectsstudy subjects

OROR

When misclassification of disease is not equal When misclassification of disease is not equal between exposed and nonbetween exposed and non--exposed study exposed study subjectssubjects


Causes a bias in the RR or OR Causes a bias in the RR or OR -- either towards or away from the null, either towards or away from the null, -- depending on the proportions of studydepending on the proportions of study

subjects misclassifiedsubjects misclassified


Direction of bias is Direction of bias is towards the nulltowards the null ifif-- fewer cases are considered to be exposed orfewer cases are considered to be exposed or-- fewer exposed are considered to be diseasedfewer exposed are considered to be diseased

Direction of bias is Direction of bias is away from the nullaway from the null ifif-- more cases are considered to be exposed ormore cases are considered to be exposed or-- more exposed are considered to be diseasedmore exposed are considered to be diseased

Differential MisclassificationDifferential MisclassificationExampleExample

CaseCase--Control studyControl study: : If more cases are If more cases are mistakenlymistakenly classified as being classified as being exposed than controls exposed than controls overestimation of ORoverestimation of OR

Cohort studyCohort study: : If exposed group is more likely to be If exposed group is more likely to be mistakenlymistakenlyclassified as having developed the outcome than the classified as having developed the outcome than the unexposed group unexposed group overestimation of RRoverestimation of RR

Leads to Leads to overover-- or underor under-- estimationestimation of the true of the true magnitude of the measure of associationmagnitude of the measure of association

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

1 2 3 4 5+

Birth OrderBirth Order

Affe

cted

A

ffect

ed B

abie

sB

abie

spe

r 100

0 Li

ve B

irths

per 1

000

Live

Birt

hsPrevalence of Down syndrome at birth by birth orderPrevalence of Down syndrome at birth by birth order

0

1

2

3

4

5

6

7

8

9

<20 20-24 25-29 30-34 35-39 40+

Maternal AgeMaternal Age

Affe

cted

Bab

ies

per 1

000

Live

A

ffect

ed B

abie

s pe

r 100

0 Li

ve

Birt

hsB

irths

Prevalence of Down syndrome at Maternal AgePrevalence of Down syndrome at Maternal Age

Here’s what we’d like to assess :

ExposureExposure DiseaseDisease

Here’s where confounding acts :

ExposureExposure DiseaseDisease

ConfoundingConfounding

Where represents a causal relationshiprepresents a non-causal relationship

Why is confounding a problem?

Because…...

• the estimate of association between exposure and disease includes BOTH the contribution of the exposure AND the confounder

• the estimate of association which includes the confounder gives an incorrect estimate of the impact of the exposure on the disease outcome

Criteria for confounding

• must be an independent predictor of disease with or without exposure

• must be associated (correlated) with exposure but not caused by the exposure

• must not be an intermediate link in a causal pathway between exposure and outcome

Potential Confounder

If ...

E D

C

E = Exposure D = Disease C = Confounder

E D

C

NOT A POTENTIAL CONFOUNDER

C E Dpart of causal pathway

E C D

E D

C

E D

C

E and C are part of a causal pathway

Or

CONFOUNDING CONFOUNDING

MIXING OF EFFECTSMIXING OF EFFECTS

The estimate of the effect of The exposure

of interest is distorted because it is mixed

With the effect of an Extraneous factor

COFFEE DRINKINGCOFFEE DRINKING,, CIGARETTE SMOKINGCIGARETTE SMOKINGAND CORONARY HEART DISEASEAND CORONARY HEART DISEASE

EXPOSURE(coffee drinking)

DISEASE(heart disease)

CONFOUNDINGVARIABLE

(cigarette smoking)

CONFOUNDING CONFOUNDING

TThe distortion introduced by a confounding factor can lead to overestimation or under estimation of an effect depending on the direction of the association that the confounding factor has with exposureand disease.

CConfoundingonfounding can even change the apparent direction of an effect.

ExampleExample :: AlcoholAlcohol

Smoking Smoking

Oral cancerOral cancer

E =E = E = E =

D D

D D

EE EEEE

DD

EE

DD

E =E = E =E =

D D

D D

E E E E E E E E

D D

D D

Situation in which Situation in which FF is a is a confounderconfounder for a for a DD -- EE associationassociation..

Situation in which Situation in which FF is notis not a a confounderconfounder for a for a DD -- EE associationassociation..

EE

FFDD

EE

FFDD

EE

FFDD

EE

FFDD

EE

FFDD

EE

FFDD

EE

FFDD

To be confoundingTo be confounding,, the extraneous variable must the extraneous variable must have the following characteristicshave the following characteristics

AA confounding variableconfounding variable must be a risk factor for the disease.

AA confounding variableconfounding variable must be associated with the exposure under study (in the population from which the case derive).

AA confounding variableconfounding variable must not be an intermediate step in the causal path between the exposure and the disease.

TThe data-based criterion for establishing the presence or absence of confounding involve the comparison of a crude effect measurewith an adjusted effect measure that corrects for distortions due to extraneous variables.

CConfounding is acknowledged to be present when the crude and adjusted effect measured i f f e r i n v a l u e .

- RESTRICTION

- MATCHING

- STRATIFICATION

- MATHEMATICAL MODEL

(Multivariate analysis)

DESIGNDESIGN

ANALYSISANALYSIS

CONTROL OF CONFOUNDINGCONTROL OF CONFOUNDING

AGE *MI (%) CONTROLS(%) **OC USE (%)

25-29 3 16 29

30-34 9 14 10

35-39 16 20 8

40-44 30 21 4

45-49 42 18 3

*MI : MYOCARDIAL INFARCTION**OC : ORAL CONTRACEPTIVE

Relation of Relation of ConfounderConfounder to to Disease and Exposure Disease and Exposure

DISEASE EXPOSURE

CRUDE RRCRUDE RR

D

D

+

-

E E-

RR = 4RR = 4

CRR1000

• Collapsed• Collapsed in 1 table without separation into subgroup.

CRUDERR

4==

1000 2000

E =E =

D

D

EE EE

D

D

EE EE

D

D

SMOKERSSMOKERSSMOKERS NON-SMOKERSNONNON--SMOKERSSMOKERS

6 29

871

-+

+

-

CIR = 1.86SM^

ADJUSTEDCIR = 1.13

^

CIR = 1.02SM^

EXPOSUREEXPOSURE

194 21

706 79

-+

DISEASEDISEASE

9494

+-

200200

800800

5050

950950

ALCALC

250

1750DISEASEDISEASE

1000 1000 2000

CRUDE CIR = 4.0^

EXPOSUREEXPOSURE

OCaOCa

+

- OCaOCa

IDENTIFYING A CONFOUNDER - an example

Calculate crude measure of association:

RR = 2.8 (2.21,3.63)d)c/(cb)a/(aRR

++

=

Smokers Non smokers TotalCHD 305 58 363NO CHD 345 292 637Total 650 350 1000

Calculate stratum-specific measures of association...

Smokers Non smokers TotalCHD 300 50 350No CHD 300 150 450Total 600 200 800

RR = 2.0 (1.55,2.58)

Smokers Non smokers TotalCHD 5 8 13No CHD 45 142 187Total 50 150 200

STRATUM 1: MEN

STRATUM 2: WOMEN

RR = 1.9 (0.64,5.47)

IS THERE A CONFOUNDER?

• CRUDE RR for smoking and CHD =2.8

• STRATUM-SPECIFIC RR for smoking and CHD with gender as a potential confounder...

MEN RR = 2.0WOMEN RR = 1.9

• Overlapping confidence intervals or 10% rule

• Gender confounds the association between smoking and CHD because the crude RR of 2.8 is NOT the same as the stratum-specific RR’s of approx. 2.0

roughly the same

IS THERE A CONFOUNDER?

• THE 10% RULE

• We assert that gender confounds the association between smoking and CHD because the crude RR of 2.8 is NOT the same as the stratum-specific RR’s of 2.0 or 1.9

• the 10% rule is a good rule of thumb for assessing whether there is confounding present

Is 2.0 and 1.9 more than 10% different from 2.8?

10% of 2.8 = .28 and the difference between our stratum specific RR’s and the crude RR is greater than .28 Not a replacement for a statistical test - simply a way to initially judge whether something is a potentially confounding factor

HOW TO CONTROL FOR CONFOUNDERS?

IN STUDY DESIGN…

- RESTRICTION of subjects according to potential confounders (i.e. simply don’t include confounder in study)

- RANDOM ALLOCATION of subjects to study groups to attempt to even out confounders

- MATCHING subjects on potential confounder thus assuring even distribution among study groups

HOW TO CONTROL FOR CONFOUNDERS?

IN DATA ANALYSIS…

- STRATIFIED ANALYSIS using the Mantel Haenszel method to adjust for confounders

- IMPLEMENT A MATCHED-DESIGN after you have collected data (frequency or group)

- RESTRICTION is still possible at the analysis stage but it means throwing away data

- MODEL FITTING using regression techniques

Using the Mantel Haenszel Method to Report Adjusted OR’s

• Can only use with confounders because we assume that ORs are constant across stratum

• General Formula : where i = strata and N = Total

iii

iiiMH Ncb

NdaOR∑∑

=

• This technique generates a summary measure across strata by removing the effect of the confounder

Example of the Mantel Haenszel Method ...

Stroke No Stroke TotalFm Hsty 16 47 63No Fm Hsty 2 7 9Total 18 54 72

STRATUM 1: Pre-Menopause OR = 1.19

STRATUM 2: Post-Menopause OR = 1.05

Stroke No Stroke TotalFm Hsty 24 13 37No Fm Hsty 58 33 91Total 82 46 128

1.29OR MH =

1281358

72472

1283324

72716

×+

×

×+

×

Mantel Haenszel RELATIVE RISK

∑

∑+

+

=

i

iii

i

iii

MH

N)b(ac

N)d(ca

RR

HOW TO REPORT DATA WITH CONFOUNDERS

IF YOU HAVE A CONFOUNDER….

• DO NOT report crude OR or RR (you know it’s wrong)

• GOOD: Report stratum-specific OR or RR

• BEST: Report summary measures such as a Mantel-Haenszel OR (this is like compiling stratum-specific OR’s)

THIRD VARIABLE SUMMARY

Are stratum-specific OR’s the same?

YES

YES

NO

NO

crude OR = stratum-specific? INTERACTION… reportstratum-specific OR or RR

CONFOUNDINGReport summaryMeasure (MH OR)

NO CONFOUNDING or INTERACTIONReport crude OR or RR

Degree of ConfoundingDegree of Confounding

mmeasures the amount of confoundingrather than mere presence or absence

degree of confoundingdegree of confounding = crude measureadjusted measure

= 4.00 = 3.531.13

Crude = 1.68Adjusted = 3.97

d.c. = 1.68 = 0.423.97

over estimation

under estimation

AGE Recent Use of OC MI Controls OR

Yes 4 6225-29No 2 244

7.2

Yes 9 3330-34

No 12 3908.9

Yes 4 26 1.535-39

No 33 330

Yes 6 9 3.740-44

No 65 362

Yes 6 5 3.945-49

No 93 301

Yes 29 135 1.7TOTAL

No 205 160765

AGE Recent Use of OC MI Controls OR

Yes 4 6225-29No 2 244

7.2

Yes 9 3330-34

No 12 3908.9

Yes 4 26 1.535-39

No 33 330

Yes 6 9 3.740-44

No 65 362

Yes 6 5 3.945-49

No 93 301

Yes 29 135 1.7TOTAL

No 205 160765

97.3)MH(ORa =∧

44 fold risk of fold risk of MIMI among recent of among recent of OCOC users users as compared to nonas compared to non--usersusers..

TYPES OF ASSOCIATIONTYPES OF ASSOCIATION

A. Not statistically associated (Independent)

B. Statistically associated

1.1. NoncausallyNoncausally associated associated (Secondarily)

2.2. Causally associated Causally associated a. Indirectly associatedb. Directly causal

Example No.

Type of Confounding

Unadjusted Relative Risk

Adjusted Relative Risk

1 Positive 3.5 1.0

2 Positive 3.5 2.1

3 Positive 0.3 0.7

4 Negative 1.0 3.2

5 Negative 1.5 3.2

6 Negative 0.8 0.2

7 Qualitative 2.0 0.7

8 Qualitative 0.6 1.8

Hypothetical Examples of Unadjusted and Adjusted Relative Risks According to Type of confounding (Positive or Negative)

errors in epidemiological studies - · pdf fileepidemiological studies is ... different...

Documents