epidemiologic methods- fall 2002. bias in clinical research: selection and measurement bias...

Epidemiologic Methods- Fall 2002

Where we have been:

Making, assessing, and using measurements

Lecture Title

1 Understanding Measurement: Reproducibility & Validity

2 Study Design

3 Measures of Disease Occurrence I

4 Measures of Disease Occurrence II

5 Measures of Disease Association I

6 Measures of Disease Association II

Where we are going:

Threats to validity in clinical research studies and how they can be prevented

Lecture Title

7 Bias in Clinical Research: Selection and Measurement Bias

8 Confounding and Interaction I: General Principles

9 Confounding and Interaction II: Assessing Interaction

10 Confounding and Interaction II: Stratified Analysis

11 Conceptual Approach to Multivariable Analysis I

12 Conceptual Approach to Multivariable Analysis II

Bias in Clinical Research: Selection and Measurement Bias

• Framework for threats to validity (bias)

• Selection bias

– by study design:
  • descriptive
  • case-control
  • cross-sectional
  • longitudinal studies (cohort or experimental)

• Measurement bias

– exposure vs. outcome

– non-differential vs. differential

Internal vs External Validity

• Validity
– before, for measurements:
  • accuracy of evaluation of individual traits or characteristics
– today, for entire studies:
  • accuracy of inferences about populations

• Internal validity
– Do the results obtained from the actual subjects accurately represent the target population?

• External validity (aka generalizability)
– Do the results obtained from the actual subjects pertain to persons outside of the target population?
– Internal validity is a prerequisite for external validity

[Figure: 2×2 table of Exposed (+/-) by Diseased (+/-). Labels: REFERENCE/TARGET/SOURCE POPULATION and STUDY SAMPLE (INTERNAL VALIDITY); OTHER POPULATIONS (EXTERNAL VALIDITY)]

• The goal of any study is to find the truth
• Ways of missing the truth (getting the wrong answer):
– Bias
  • Any systematic process that results in an incorrect estimate of:
    – a measure of disease (or exposure) occurrence in a descriptive study
    – a measure of association between exposure and disease in an analytic study
– Chance
  • Random error
    – type I
    – type II

Threats to Validity in Clinical Research

MetLife Is Settling Bias Lawsuit

BUSINESS/FINANCIAL DESK | August 30, 2002, Friday

MetLife said yesterday that it had reached a preliminary settlement of a class-action lawsuit accusing it of charging blacks more than whites for life insurance from 1901 to 1972.

MetLife, based in New York, did not say how much the settlement was worth but said it should be covered by the $250 million, before tax, that it set aside for the case in February.

“Bias” in Webster’s Dictionary

1 : a line diagonal to the grain of a fabric; especially : a line at a 45° angle to the selvage often utilized in the cutting of garments for smoother fit

2 a : a peculiarity in the shape of a bowl that causes it to swerve when rolled on the green
  b : the tendency of a bowl to swerve; also : the impulse causing this tendency
  c : the swerve of the bowl

3 a : bent or tendency
  b : an inclination of temperament or outlook; especially : a personal and sometimes unreasoned judgment : prejudice
  c : an instance of such prejudice
  d (1) : deviation of the expected value of a statistical estimate from the quantity it estimates
    (2) : systematic error introduced into sampling or testing by selecting or encouraging one outcome or answer over others

4 a : a voltage applied to a device (as a transistor control electrode) to establish a reference level for operation
  b : a high-frequency voltage combined with an audio signal to reduce distortion in tape recording

Classification Schemes for “Ways of Getting the Wrong Answer”

• Szklo and Nieto
– Bias
  • Selection Bias
  • Information/Measurement Bias
– Confounding
– Chance

• Other Common Approach
– Bias
  • Selection Bias
  • Information/Measurement Bias
  • Confounding Bias
– Chance

Selection Bias

• Technical definition
– Bias that is caused when individuals have different probabilities of being included in the study according to relevant study characteristics: namely, the exposure and the outcome of interest

• Plain definition
– Bias that is caused by some kind of problem in the process of selecting subjects initially or, in a longitudinal study, in the process that determines how long subjects participate in the study
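The technical definition can be illustrated numerically. The following is a minimal sketch, assuming a hypothetical reference population and hypothetical selection probabilities for each exposure-by-outcome cell; the function names and all numbers are illustrative and do not come from the lecture. When the four selection probabilities are not equal (more generally, when they do not cancel in the cross-product), the odds ratio in the study sample differs from the one in the reference population.

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for a 2x2 table.

    a = exposed & diseased,   b = exposed & not diseased,
    c = unexposed & diseased, d = unexposed & not diseased.
    """
    return (a * d) / (b * c)


def sampled_table(population, selection_probs):
    """Expected cell counts when each cell is sampled with its own probability."""
    return tuple(n * p for n, p in zip(population, selection_probs))


# Hypothetical reference population (a, b, c, d); its odds ratio is 4.0
population = (200, 800, 100, 1600)

# Same selection probability in every cell: no selection bias
unbiased = sampled_table(population, (0.1, 0.1, 0.1, 0.1))

# Exposed, non-diseased subjects are under-sampled: selection bias
biased = sampled_table(population, (0.1, 0.05, 0.1, 0.1))

print(f"reference population OR: {odds_ratio(*population):.1f}")
print(f"unbiased sample OR:      {odds_ratio(*unbiased):.1f}")
print(f"biased sample OR:        {odds_ratio(*biased):.1f}")
```

In this sketch the biased sample doubles the odds ratio, because only one of the four cells was sampled at a different rate.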

Selection Bias in a Descriptive Study

• Pre-election surveys re: 1948 Presidential Election
– various methods used to find subjects
– largest % favored Dewey

• General election results
– Truman beat Dewey

• Ushered in realization of the importance of representative (random) sampling

Leukemia Incidence Among Observers of a Nuclear Bomb Test

Caldwell et al. JAMA 1980

• Smoky Atomic Test in Nevada
• Outcome of 76% of troops at site was later found; occurrence of leukemia determined
– 82% contacted by the investigators
– 18% contacted the investigators on their own
  • this group had a 4.4 times greater risk of leukemia than those contacted by the investigators


[Figure: Descriptive Study: Unbiased Sampling (study sample)]

[Figure: Descriptive Study: Selection Bias (study sample)]

[Figure: Analytic Study: Unbiased Sampling. 2×2 table of Exposed (+/-) by Diseased (+/-); reference population and study sample]

[Figure: Analytic Study: Selection Bias. 2×2 table of Exposed (+/-) by Diseased (+/-); study sample]

Selection Bias in Case-Control Studies

Coffee and cancer of the pancreas MacMahon et al. N Eng J Med 1981; 304:630-3

Cases: patients with histologic diagnosis of pancreatic cancer in any of 11 large hospitals in Boston and Rhode Island between October 1974 and August 1979

What study base gave rise to these cases?

How should controls be selected?

Selection Bias in a Case-Control Study

Coffee and cancer of the pancreas MacMahon et al. N Eng J Med 1981; 304:630-3

Controls:

• Other patients under the care of the same physician as the cases with pancreatic cancer

• Patients with diseases known to be associated with smoking or alcohol consumption were excluded

Males

                       Case   Control   Total
Coffee: > 1 cup/day     207       275     482
No coffee                 9        32      41
Total                   216       307

OR = (207/9) / (275/32) = 2.7 (95% CI, 1.2-6.5)

Coffee and cancer of the pancreas (MacMahon et al., N Eng J Med 1981; 304:630-3)

Relative to the study base that gave rise to the cases, the controls were:

• Other patients under the care of the same physician at the time of an interview with a patient with pancreatic cancer
– Most of the MDs were gastroenterologists whose other patients were likely advised to stop using coffee

• Patients with diseases known to be associated with smoking or alcohol consumption were excluded
– Smoking and alcohol use are correlated with coffee use; therefore, the sample is relatively depleted of coffee users

[Figure: Case-control Study of Coffee and Pancreatic Cancer: Selection Bias. 2×2 table of coffee / no coffee by cancer / no cancer; study sample]

Selection Bias in a Cross-sectional Study

• Inclusion of prevalent cases causes all sorts of problems

• Finding a diseased person in a cross-sectional study requires 2 things:
– the disease occurred in the first place
– the case survived long enough to be sampled

• Any factor associated with a prevalent case of disease might be associated with disease development, survival with disease, or both

• Assuming goal is to find factors associated with disease development, bias in prevalence ratio occurs any time that exposure under study is associated with survival with disease

Selection Bias in a Cross-sectional Study

e.g. Smoking and emphysema

• Smoking is a cause of emphysema, but persons with emphysema who continue to smoke have shorter survival

• Hence, in any cross-section of persons with emphysema, those who smoke less are apt to be more greatly represented (because of the survival disadvantage of those who continue to smoke)

• Therefore, cross-sectional study of current smoking and emphysema will result in a prevalence ratio that underestimates the entity you are presumably really interested in: the incidence ratio
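The argument above can be made concrete with a small calculation. This is a minimal sketch under an assumption not stated in the lecture: for a rare disease in a steady state, prevalence is approximately incidence rate times mean disease duration, so the prevalence ratio is approximately the incidence ratio times the ratio of durations. All numbers are hypothetical.

```python
def prevalence(incidence_rate, mean_duration_years):
    """Approximate point prevalence of a rare disease in a steady state.

    Uses prevalence ~= incidence rate x mean duration, which is an
    assumption of this sketch rather than a formula from the lecture.
    """
    return incidence_rate * mean_duration_years


# Hypothetical values: smokers develop emphysema 5 times as often,
# but those who keep smoking survive half as long with the disease.
incidence_smokers, incidence_nonsmokers = 0.005, 0.001  # per person-year
duration_smokers, duration_nonsmokers = 5.0, 10.0       # years lived with disease

incidence_ratio = incidence_smokers / incidence_nonsmokers
prevalence_ratio = (
    prevalence(incidence_smokers, duration_smokers)
    / prevalence(incidence_nonsmokers, duration_nonsmokers)
)

print(f"incidence ratio:  {incidence_ratio:.1f}")
print(f"prevalence ratio: {prevalence_ratio:.1f}")
```

With these hypothetical inputs the prevalence ratio is half the incidence ratio, which is the direction of bias the smoking and emphysema example describes.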

[Figure: Cross-sectional study of smoking and emphysema. 2×2 table of Smoke (+/-) by Emphysema (+/-); reference/target population and study sample]

Selection Bias: Cohort Studies/RCTs

• Among initially selected subjects, selection bias much less likely to occur compared to case-control or cross-sectional studies

– Reason: study participants (exposed or unexposed; treatment vs placebo) are selected (theoretically) before the outcome occurs

[Figure: Cohort Study/RCT. 2×2 table of Exposed (E / not E) by Diseased (+/-); study sample]

Since disease has not occurred yet among initially selected subjects, there is no opportunity for disproportionate sampling with respect to exposure and disease

[Figure: Cohort Study/RCT. 2×2 table of Exposed (E / not E) by Diseased (+/-); study sample]

All that is sampled is exposure status

Even if disproportionate sampling occurs, it will not result in selection bias when forming measures of association

Selection Bias: Cohort Studies

• Selection bias can occur on the “front-end” of the cohort if diseased individuals are unknowingly entered into the cohort

• e.g.:

– Consider a cohort study of the effects of exercise on all-cause mortality among persons initially thought to be completely healthy.

– If some participants who were enrolled had undiagnosed cardiovascular disease and, as a consequence, were more likely to exercise less, what would the effect be on the measure of association?

[Figure: Cohort Study of Exercise and Survival. 2×2 table of exercise / no exercise by death / no death; study sample]

Selection bias will lead to spurious protective effect of exercise


• Most common form of selection bias does not occur with the process of initial selection of subjects

• Instead, selection bias is most commonly caused by forces that determine length of participation (who ultimately stays in the analysis), i.e., loss to follow-up

– When those lost to follow-up have a different probability of the outcome than those who remain (i.e., informative censoring), AND

– this probability is different across exposure groups,

– selection bias results


e.g., Cohort study of progression to AIDS: injection drug users (IDU) vs homosexual men

• In general, getting sicker is a common reason for loss to follow-up

• Therefore, persons who are lost to follow-up have different AIDS incidence than those who remain (i.e., informative censoring)

• In general, IDU are more likely to be lost to follow-up at any given level of feeling sick

• Therefore, the degree of informative censoring differs across exposure groups (IDU vs homosexual men)

• Results in selection bias: underestimates the incidence of AIDS in IDU relative to homosexual men
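The chain of reasoning above can be sketched with a simple calculation. This is a deliberately simplified illustration, assuming a fixed follow-up period and all-or-nothing loss to follow-up rather than a true time-to-event analysis; the function name and every number are hypothetical.

```python
def observed_risk(n, true_risk, p_lost_if_progressing, p_lost_if_not):
    """Risk of the outcome among subjects who remain under follow-up.

    Subjects lost to follow-up drop out of both the numerator and the
    denominator, so their outcomes are never observed.
    """
    progressing = n * true_risk
    not_progressing = n - progressing
    seen_events = progressing * (1 - p_lost_if_progressing)
    seen_nonevents = not_progressing * (1 - p_lost_if_not)
    return seen_events / (seen_events + seen_nonevents)


# Hypothetical: both groups have the same true 30% risk of AIDS.
TRUE_RISK = 0.30

# Getting sicker drives loss to follow-up in both groups (informative
# censoring), but more strongly among IDU than among homosexual men.
risk_idu = observed_risk(1000, TRUE_RISK, p_lost_if_progressing=0.50, p_lost_if_not=0.10)
risk_men = observed_risk(1000, TRUE_RISK, p_lost_if_progressing=0.20, p_lost_if_not=0.10)

print(f"observed risk, IDU:            {risk_idu:.3f}")
print(f"observed risk, homosexual men: {risk_men:.3f}")
print(f"observed risk ratio:           {risk_idu / risk_men:.2f}")
```

Both observed risks fall below the true 30%, but the IDU group falls further, so the observed risk ratio drops below 1 even though the true risks are identical.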

[Figure: Effect of Selection Bias in a Cohort Study. Labels: survival assuming no informative censoring and no difference between IDU and homosexual men; effect of informative censoring in IDU group; effect of informative censoring in homosexual male group]

[Figure: Cohort Study of HIV Risk Group and AIDS Progression. 2×2 table of IDU / homosexual men by AIDS / no AIDS; study sample]

Selection bias will lead to spurious underestimation of AIDS incidence in IDU group

Managing Selection Bias

• Prevention and avoidance are critical

• Unlike confounding, where there are solutions in the analysis of the data, once the subjects are selected there are usually no fixes for selection bias

• In case-control studies:
– Follow the study base principle

• In cross-sectional studies:
– Be aware of how exposure in question affects disease survival

• In longitudinal studies (cohorts/RCTs):
– Screen for occult disease at baseline
– Avoid losses to follow-up

Measurement Bias

• Definition
– bias that is caused when the information collected about or from subjects is inaccurate (invalid; erroneous)
  • any type of variable: exposure, outcome, or confounder
– aka: misclassification bias; information bias (text); identification bias
  • misclassification is the immediate result

Definition of Terms Related to Measurement Accuracy

• Sensitivity
– the ability of a test (measurement) to identify correctly those who have the characteristic (disease or exposure) of interest

• Specificity
– the ability of a test (measurement) to identify correctly those who do NOT have the characteristic of interest
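As a small illustration of the two definitions above, both quantities can be computed from a comparison of the measurement against a gold standard. The counts below are chosen only for illustration, and the function names are not from the lecture.

```python
def sensitivity(true_positives, false_negatives):
    """Proportion of those WITH the characteristic who are classified as having it."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Proportion of those WITHOUT the characteristic who are classified as lacking it."""
    return true_negatives / (true_negatives + false_positives)


# Illustrative counts: 50 truly exposed (45 classified exposed),
# 80 truly unexposed (64 classified unexposed).
sens = sensitivity(true_positives=45, false_negatives=5)
spec = specificity(true_negatives=64, false_positives=16)

print(f"sensitivity = {sens:.2f}")
print(f"specificity = {spec:.2f}")
```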

Causes for Misclassification

• Participant recall

• Ambiguous questions

• Under or overzealous interviewers

• Problems with biological specimens

• Faulty instruments

• Data management problems


[Figure: Non-Differential Misclassification of Exposure. 2×2 table of Exposed (+/-) by Diseased (+/-); study sample. Annotations: problems with sensitivity, independent of disease status; problems with specificity, independent of disease status]

Non-differential Misclassification of Exposure

Truth: No misclassification (100% sensitivity/specificity)

Exposure   Cases   Controls
Yes           50         20
No            50         80

OR = (50/50) / (20/80) = 4.0

Presence of 70% sensitivity in exposure classification

Exposure   Cases       Controls
Yes        50-15=35    20-6=14
No         50+15=65    80+6=86

OR = (35/65) / (14/86) = 3.3

Effect of non-differential misclassification of 2 exposure categories: biases the OR toward the null value of 1.0
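The arithmetic in the example above can be reproduced with a short script. This is a minimal sketch; the helper names `misclassify` and `odds_ratio` are illustrative and not part of the lecture. It applies the same sensitivity and specificity to cases and controls, which is what makes the misclassification non-differential.

```python
def misclassify(exposed, unexposed, sensitivity, specificity):
    """Apply imperfect exposure classification to true counts.

    Returns (classified exposed, classified unexposed).
    """
    classified_exposed = exposed * sensitivity + unexposed * (1 - specificity)
    classified_unexposed = exposed * (1 - sensitivity) + unexposed * specificity
    return classified_exposed, classified_unexposed


def odds_ratio(case_exp, case_unexp, ctrl_exp, ctrl_unexp):
    """Exposure odds in cases divided by exposure odds in controls."""
    return (case_exp / case_unexp) / (ctrl_exp / ctrl_unexp)


# True counts from the example: cases 50/50, controls 20/80
true_or = odds_ratio(50, 50, 20, 80)

# 70% sensitivity and 100% specificity in both cases and controls
cases = misclassify(50, 50, sensitivity=0.70, specificity=1.0)
controls = misclassify(20, 80, sensitivity=0.70, specificity=1.0)
observed_or = odds_ratio(*cases, *controls)

print(f"true OR = {true_or:.1f}, observed OR = {observed_or:.1f}")
```

Changing `specificity` to a value below 1.0 in both calls moves exposed counts in the other direction as well, and the observed OR shrinks further toward 1.0.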

[Figure: Non-Differential Misclassification of Exposure: Imperfect Sensitivity. 2×2 table of Exposed (+/-) by Diseased (+/-); study sample. Annotation: problems with sensitivity]

[Figure: Non-Differential Misclassification of Exposure. 2×2 table of Exposed (+/-) by Diseased (+/-); study sample. Annotations: problems with sensitivity, independent of disease status; problems with specificity, independent of disease status]

Non-Differential Misclassification of Exposure: Imperfect Sensitivity and Specificity

True distribution (gold standard):

Exposure   Cases   Controls
Yes           50         20
No            50         80

True OR = (50/50) / (20/80) = 4.0

Study distribution (sensitivity 0.90, specificity 0.80, in both cases and controls):

                           Cases                         Controls
                 True exp  True unexp  Total   True exp  True unexp  Total
Classified exp         45          10     55         18          16     34
Classified unexp        5          40     45          2          64     66

Observed distribution:

Exposure   Cases   Controls
Yes           55         34
No            45         66

Observed OR = (55/45) / (34/66) = 2.4

Non-differential Misclassification of Exposure: Magnitude of Bias on the Odds Ratio

Assume True OR=4.0

Sensitivity   Specificity   Prev of Exp in controls   Observed OR
0.90          0.85          0.20                      2.6
0.60          0.85          0.20                      1.9
0.90          0.95          0.20                      3.2
0.90          0.60          0.20                      1.9
0.90          0.90          0.368                     3.0
0.90          0.90          0.20                      2.8
0.90          0.90          0.077                     2.2
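The observed odds ratios in the table above follow from the true OR, the exposure prevalence in controls, and the sensitivity and specificity of the exposure measurement. The sketch below works in proportions rather than counts; the function name `observed_or` and the layout of `scenarios` are illustrative choices, not taken from the lecture.

```python
def observed_or(true_or, prev_controls, sensitivity, specificity):
    """Observed OR under non-differential misclassification of exposure."""
    # True exposure prevalence in cases, implied by the true OR
    case_odds = true_or * prev_controls / (1 - prev_controls)
    prev_cases = case_odds / (1 + case_odds)

    def classified(prev):
        # Proportion classified as exposed: true positives plus false positives
        return prev * sensitivity + (1 - prev) * (1 - specificity)

    p_cases, p_controls = classified(prev_cases), classified(prev_controls)
    return (p_cases / (1 - p_cases)) / (p_controls / (1 - p_controls))


# (sensitivity, specificity, prevalence of exposure in controls)
scenarios = [
    (0.90, 0.85, 0.20),
    (0.60, 0.85, 0.20),
    (0.90, 0.95, 0.20),
    (0.90, 0.60, 0.20),
    (0.90, 0.90, 0.368),
    (0.90, 0.90, 0.20),
    (0.90, 0.90, 0.077),
]

results = [round(observed_or(4.0, prev, sens, spec), 1) for sens, spec, prev in scenarios]
for (sens, spec, prev), value in zip(scenarios, results):
    print(f"sens={sens:.2f} spec={spec:.2f} prev={prev:.3f} observed OR={value}")
```

Every scenario yields an observed OR below the true value of 4.0, and the bias grows as the exposure becomes rarer in controls, because false positives then make up a larger share of those classified as exposed.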

[Figure: Non-Differential Misclassification of Outcome. 2×2 table of Exposed (+/-) by Diseased (+/-); study sample. Annotations: problems with sensitivity, independent of exposure status; problems with specificity, independent of exposure status]

Non-differential Misclassification of Outcome: Magnitude of Bias on the Odds Ratio

Assume True OR=4.0

Sensitivity   Specificity   Prev of Exp in controls   Observed OR
0.90          0.85          0.20                      2.8
0.60          0.85          0.20                      1.9
0.90          0.95          0.20                      3.2
0.90          0.60          0.20                      2.1

Special Situation In a Cohort or Cross-sectional Study

Misclassification of outcome

• If specificity of outcome measurement is 100%
• Any degree of imperfect sensitivity, if non-differential, will not bias the risk ratio or prevalence ratio
• e.g.

Truth:

            Disease   No Disease   Total
Exposed          20           80     100
Unexposed        10           90     100

Risk (or prevalence) ratio = (20/100) / (10/100) = 2.0

70% sensitivity:

            Disease    No Disease   Total
Exposed     20-6=14    80+6=86        100
Unexposed   10-3=7     90+3=93        100

Risk (or prevalence) ratio = (14/100) / (7/100) = 2.0

• Worth knowing about when choosing cutoff for continuous variables on ROC curves: choose most specific cutoff
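The special situation above can be checked numerically. This is a minimal sketch with an illustrative function name; it uses the counts from the example (20/80 exposed, 10/90 unexposed) and, for contrast, one hypothetical scenario with imperfect specificity that is not in the lecture.

```python
def risk_ratio(exposed, unexposed, sensitivity, specificity):
    """Risk ratio when the outcome is measured with error.

    exposed / unexposed are (diseased, non-diseased) true counts. The same
    sensitivity and specificity apply to both groups (non-differential).
    """
    def observed_risk(diseased, healthy):
        classified_diseased = diseased * sensitivity + healthy * (1 - specificity)
        return classified_diseased / (diseased + healthy)

    return observed_risk(*exposed) / observed_risk(*unexposed)


exposed, unexposed = (20, 80), (10, 90)

rr_truth = risk_ratio(exposed, unexposed, sensitivity=1.0, specificity=1.0)
rr_low_sens = risk_ratio(exposed, unexposed, sensitivity=0.7, specificity=1.0)
# Hypothetical contrast: perfect sensitivity but 90% specificity
rr_low_spec = risk_ratio(exposed, unexposed, sensitivity=1.0, specificity=0.9)

print(f"truth:                      {rr_truth:.2f}")
print(f"70% sensitivity, 100% spec: {rr_low_sens:.2f}")
print(f"100% sensitivity, 90% spec: {rr_low_spec:.2f}")
```

With perfect specificity, each observed risk is simply the true risk multiplied by the sensitivity, so the sensitivity cancels in the ratio. False positives do not cancel this way, which is why the third scenario is biased toward the null.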

Differential Misclassification of Exposure

Weinstock et al. AJE 1991

• Nested case-control study within the Nurses' Health Study

• Cases: women with new melanoma diagnoses

• Controls: women without melanoma, selected by incidence density sampling

• Measurements: questionnaire about “tanning ability”, administered shortly after melanoma development

                      Melanoma   No Melanoma
No tan to light tan         15            77
Med to dark tan             19           157

OR = (15 × 157) / (77 × 19) = 1.6

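• Question asked after diagnosis

                      Melanoma   No Melanoma
No tan to light tan         15            77
Med to dark tan             19           157

OR = (15 × 157) / (77 × 19) = 1.6

• Question asked before diagnosis (NHS baseline)

                      Melanoma   No Melanoma
No tan to light tan          9            79
Med to dark tan             25           155

OR = (9 × 155) / (79 × 25) = 0.7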

[Figure: “Tanning Ability” and Melanoma. 2×2 table of Exposed (+/-) by Diseased (+/-); study sample. Annotation: imperfect specificity, mostly in cases]

Differential Misclassification of Exposure: Magnitude of Bias on the Odds Ratio

Assume True OR=3.9; Prevalence of Exposure in Controls = 0.1

Exposure Classification

    Sensitivity          Specificity
Cases    Controls    Cases    Controls    OR
0.90     0.60        1.0      1.0         5.79
0.60     0.90        1.0      1.0         2.22
1.0      1.0         0.9      0.70        1.00
1.0      1.0         0.7      0.90        4.43
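The table above can be approximately reproduced by letting sensitivity and specificity differ between cases and controls. This sketch assumes a true exposure prevalence of 0.30 in cases; that value is not stated on the slide and is chosen here because, with a control prevalence of 0.1, it gives a true OR of about 3.9. Function and variable names are illustrative.

```python
def classified_prevalence(prev, sensitivity, specificity):
    """Proportion classified as exposed: true positives plus false positives."""
    return prev * sensitivity + (1 - prev) * (1 - specificity)


def odds(p):
    return p / (1 - p)


PREV_CONTROLS = 0.10  # from the slide
PREV_CASES = 0.30     # assumption: implies a true OR of about 3.9


def observed_or(sens_cases, sens_controls, spec_cases, spec_controls):
    """Observed OR when classification accuracy differs by disease status."""
    p_cases = classified_prevalence(PREV_CASES, sens_cases, spec_cases)
    p_controls = classified_prevalence(PREV_CONTROLS, sens_controls, spec_controls)
    return odds(p_cases) / odds(p_controls)


true_or = odds(PREV_CASES) / odds(PREV_CONTROLS)

# (sensitivity cases, sensitivity controls, specificity cases, specificity controls)
scenarios = [
    (0.90, 0.60, 1.0, 1.0),
    (0.60, 0.90, 1.0, 1.0),
    (1.0, 1.0, 0.9, 0.70),
    (1.0, 1.0, 0.7, 0.90),
]
results = [observed_or(*scenario) for scenario in scenarios]

print(f"true OR = {true_or:.2f}")
for scenario, value in zip(scenarios, results):
    print(scenario, f"observed OR = {value:.2f}")
```

Unlike the non-differential case, the observed OR here lands on either side of the true value, depending on which group is measured less accurately and in which direction.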

Misclassification: Summary of Effects

• Dichotomous exposure and outcome

Misclassification      Measure of Association
Non-differential
  Exposure             Towards null
  Outcome              Towards null*
Differential
  Exposure             Away or towards null
  Outcome              Away or towards null

*Exception: When specificity is 100%, no effect on risk ratio regardless of sensitivity

• Multi-level exposure and/or outcome
– more complicated and less predictable
– e.g. non-differential misclassification can lead to bias away from null

[Figure: panels labeled Poor Reproducibility, Poor Validity, Good Reproducibility, Good Validity]

Managing Measurement Bias

• Prevention and avoidance are critical

• If true sensitivity/specificity are known, complex back-calculation techniques exist that can be used in the analysis phase

• Optimize the reproducibility/validity of your measurements!

Selection Bias in a Clinical Trial

• Losses to follow-up are the big unknown in clinical trials and the major potential for selection bias

• If:

– a symptomatic side effect of a drug is more common in persons “sick” from disease

– occurrence of the side effect is associated with more losses to follow-up

• Then:

– drug treatment group would be selectively depleted of the sickest persons

– drug overall looks better
