TRANSCRIPT
Epidemiologic Methods - Fall 2011
Where we have been:
Designing studies, measuring disease occurrence, and estimating associations
Lecture Title
1 Study Design
2 Measures of Disease Occurrence I
3 Measures of Disease Occurrence II
4 Measures of Disease Association I
5 Measures of Disease Association II
Unifying theme of study design: sampling underlying cohorts
Design begets measures
Where we are going:
Threats to validity in clinical research and how they can be prevented
6 Selection Bias
7 Understanding Measurement: Reproducibility & Validity Journal Club
8 Measurement Bias
9 Confounding and Interaction I: General Principles
Journal Club
10 Confounding and Interaction II: Assessing Interaction
11 Confounding and Interaction III: Stratified Analysis Journal Club
12 Journal Club
Bias in Clinical Research: General Aspects and Focus on Selection Bias
• Framework for understanding error in clinical research
– systematic error, aka threats to internal validity or bias
– random error, aka sampling error or chance
• Selection bias (a type of systematic error)
– according to objective: descriptive or analytic
– by study design:
• cross-sectional
• case-control
• longitudinal studies (cohort: observational or experimental)
WARNING: SHIFTING GEARS
• Today: A lot of theory
– No equations or cook-book algorithms
• Why?
– Identifying (or preventing) bias is not a formulaic process
– Requires human intelligence
• sound knowledge of theory
A Framework for Classifying Error
Clinical Research: Sample, Measure, (Intervene), Analyze, Infer (make an inference)
• Inference
– Webster's: act of passing from sample data to generalizations, with unknown degree of certainty
– All we can do is make educated guesses about the soundness of our inferences
– Those who are more educated will make better guesses
• Anyone can get a numeric answer
• The challenge is to tell if it is correct
[Figure: 2×2 table of Exposure (+/-) by Disease (+/-), shown for the REFERENCE/TARGET/SOURCE POPULATION (aka STUDY BASE), the STUDY SAMPLE, and OTHER POPULATIONS]
Two types of inferences
[Figure: 2×2 table of Exposure (+/-) by Disease (+/-)]
San Franciscans, 20 to 65 years old
SAMPLE of San Franciscans, 20 to 65 yrs old
>65 years old in U.S.
20 to 65 year olds, in U.S., outside of San Francisco
20 to 65 year olds, in Europe
[Figure: 2×2 table of Exposure (+/-) by Disease (+/-), shown for the REFERENCE/TARGET/SOURCE POPULATION (aka STUDY BASE) and the STUDY SAMPLE]
Most important inference is the first one
Without an accurate first inference, there is little point considering the second inference
Attempts in study design to enhance the second inference are often in conflict with the goal of making a sound first inference
• The goal of any study is to make an accurate (true) inference, i.e.:
– measure of disease occurrence in a descriptive study
– measure of association between exposure and disease in an analytic study
• Ways of getting the wrong answer:
– systematic error; aka “threat to validity” or bias
• any systematic process in the conduct of a study that causes a distortion from the truth in a predictable direction
• captured in the validity of the inference
– random error; aka chance or sampling error
• occurs because we cannot study everyone (we must sample)
• direction is random and not predictable
• captured in the precision of the inference (e.g., SE and CI)
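The difference between systematic and random error can be illustrated with a small simulation. This is only a sketch under made-up assumptions, not part of the lecture: the true prevalence (0.20), the sample size, the number of repeated studies, and the idea that diseased persons are twice as likely to be sampled are all hypothetical values chosen for illustration.

```python
import random
import statistics

def simulate_studies(sample_prevalence, n_subjects, n_studies, rng):
    """Return the prevalence estimate from each of n_studies repeated samples."""
    estimates = []
    for _ in range(n_studies):
        diseased = sum(rng.random() < sample_prevalence for _ in range(n_subjects))
        estimates.append(diseased / n_subjects)
    return estimates

TRUE_PREVALENCE = 0.20   # hypothetical truth in the source population
SELECTION_RATIO = 2.0    # hypothetical: diseased are twice as likely to be sampled

# Expected prevalence among those sampled when selection depends on disease status.
biased_prevalence = (TRUE_PREVALENCE * SELECTION_RATIO) / (
    TRUE_PREVALENCE * SELECTION_RATIO + (1 - TRUE_PREVALENCE)
)

rng = random.Random(2011)  # fixed seed so the run is repeatable
unbiased = simulate_studies(TRUE_PREVALENCE, 200, 500, rng)
biased = simulate_studies(biased_prevalence, 200, 500, rng)

for label, estimates in (("unbiased sampling", unbiased), ("biased sampling", biased)):
    mean = statistics.mean(estimates)
    spread = statistics.stdev(estimates)
    print(f"{label}: mean estimate {mean:.3f} "
          f"(systematic error {mean - TRUE_PREVALENCE:+.3f}), "
          f"SD across studies {spread:.3f} (random error)")
```

The spread across repeated studies is similar in both scenarios, but only the biased scenario is centered away from the truth. A confidence interval would describe the spread, not the shift.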
Error in Clinical Research
Good Validity
Good Precision
Poor Validity
Poor Precision
Validity and Precision: Each Shot at Target Represents the ‘Answer’ from a Study Sample of the Same Sample Size of a Given Study Design
Validity and Precision
Poor Validity
Good Precision
Validity and Precision
A. Good validity; good precision
B. Good validity; poor precision
C. Poor validity; good precision
D. Poor validity; poor precision
Validity? Precision?
• Answer: Good validity; poor precision
Validity and Precision
Poor Validity
Good Precision
Good Validity
Poor Precision
Systematic error (bias)
Random error (chance)
Random error (chance)
No systematic error
Performing an Actual Study: You Only Have One Shot
Field of “statistics” can tell you the random error (precision) with formulae for confidence intervals
Only judgment can tell you about systematic error (validity)
Judgment requires substantive and methodologic knowledge
[Figure: 2×2 table of Exposure (+/-) by Disease (+/-)]
REFERENCE/TARGET/SOURCE POPULATION
? INTERNAL VALIDITY
OTHER POPULATIONS
? EXTERNAL VALIDITY (generalizability)
STUDY SAMPLE
Two Types of Inferences
Correspond to Two Types of Validity
Two Types of Inferences Correspond to Two Types of Validity
1. Internal validity
– Do the results obtained from the actual subjects accurately represent the target/reference/source population?
– Epidemiologic theory guides assessment
2. External validity (generalizability)
– Do the results obtained from the actual subjects pertain to persons outside of the source population?
– Internal validity is a prerequisite for external validity
– Always just a guess
• “Validity” typically means internal validity
– “Threat to validity” = threat to internal validity
– Identifying threats to validity is a critical aspect of research
Why Do We Need Valid Studies?
• The goal of any study is to make an accurate (true) inference, i.e.:
– measure of disease occurrence in a descriptive study
– measure of association between exposure and disease in an analytic study
• Ways of getting the wrong answer:
– Our focus: systematic error = threats to validity = bias
• a systematic process in the conduct of a study that causes a distortion from the truth in a predictable direction
• captured in the validity of the inference
– random error; aka chance or sampling error
• occurs because we cannot study everyone (we must sample)
• direction is random and not predictable
• captured in the precision of the inference (e.g., SE and CI)
Error in Clinical Research
MetLife Is Settling Bias Lawsuit
BUSINESS/FINANCIAL DESK August 30, 2002, Friday
MetLife said yesterday that it had reached a preliminary settlement of a class-action lawsuit accusing it of charging blacks more than whites for life insurance from 1901 to 1972.
MetLife, based in New York, did not say how much the settlement was worth but said it should be covered by the $250 million, before tax, that it set aside for the case in February.
“Bias” in Webster’s Dictionary
1 : a line diagonal to the grain of a fabric; especially : a line at a 45° angle to the selvage often utilized in the cutting of garments for smoother fit
2 a : a peculiarity in the shape of a bowl that causes it to swerve when rolled on the green b : the tendency of a bowl to swerve; also : the impulse causing this tendency c : the swerve of the bowl
3 a : bent or tendency b : an inclination of temperament or outlook; especially : a personal and sometimes unreasoned judgment : prejudice c : an instance of such prejudice d (1) : deviation of the expected value of a statistical estimate from the quantity it estimates (2) : systematic error introduced into sampling or testing
4 a : a voltage applied to a device (as a transistor control electrode) to establish a reference level for operation b : a high-frequency voltage combined with an audio signal to reduce distortion in tape recording
Bias of Priene (600 - 540 BC)
• One of the 7 sages of classical antiquity
• Consulted by Croesus, king of Lydia, about the best way to deploy warships against the Ionians
• Bias wished to avoid bloodshed, so he misled Croesus, falsely advising him that the Ionians were buying horses
• Bias later confessed to Croesus that he had lied.
• Croesus was pleased with the way that he had been deceived by Bias and made peace with the Ionians.
• Bias = deviation from truth
BMJ 2002;324:1071
Classification Schemes for Error
• Szklo and Nieto
– Bias (Systematic error)
• Selection Bias
• Information/Measurement Bias
– Confounding
– Chance (Random error)
• Other Common Approach
– Bias (Systematic error)
• Selection Bias
• Information/Measurement Bias
• Confounding Bias
– Chance (Random error)
Think of the “BIG 4” in all of your work
Sackett DL. Bias in analytic research. J Chron Dis 1979
selection bias, measurement bias, confounding bias
vs.
popularity bias, centripetal bias, referral filter bias, diagnostic access bias, diagnostic suspicion bias, unmasking bias, mimicry bias, previous opinion bias, admission bias, prevalence-incidence bias, diagnostic vogue bias, diagnostic purity bias, procedure selection bias, missing clinical data bias, non-contemporaneous control bias, starting time bias,
volunteer bias, contamination bias, withdrawal bias, compliance bias, therapeutic personality bias, bogus control bias, insensitive measure bias, underlying cause bias, end-digit preference bias, apprehension bias, unacceptability bias, obsequiousness bias, expectation bias, substitution game, family information bias, exposure suspicion bias, recall bias, etc.
Selection Bias
• Technical definition
– Bias that is caused when individuals have different probabilities of being included in the study according to relevant characteristics: namely, the exposure and the outcome of interest
• Easier definition
– Bias that is caused by some kind of systematic problem in the process of selecting subjects initially or, in a longitudinal study, in the process that determines which subjects drop out of the study
• Problem caused by:
– Investigators: Faulty study processes
– Participants: By choosing not to participate/ending participation or dying prior to event of interest
– (or both)
Unique to human subjects research
Selection Bias in a Descriptive Study
• Most fulminant: Surveys for 1948 Presidential election
– various cross-sectional studies used to find subjects
– largest % favored Dewey
• General election results
– Truman beat Dewey
• Explanation: Bad Study Design
• Ushered in realization of the importance of representative (random) sampling in all fields
The San Francisco Chronicle
Should Gov. Davis be recalled?
                     Sample (N = 894)   Actual vote
Yes, recall Davis    57%                4,717,006 (55%)
No, retain Davis     39%                3,809,090 (45%)
Undecided            4%
Based on a survey conducted in English and Spanish among random samples of people likely to vote in California’s Oct. 7 recall election
Election polls provide the opportunity to later look at the truth and evaluate bias in study design
This luxury rarely occurs in clinical research
SOURCE POPULATION
STUDY SAMPLE
Descriptive Study: Depiction of No Selection Bias (Unbiased Sampling)
Even dispersion of arrows
SOURCE POPULATION
STUDY SAMPLE
Descriptive Study: Depiction of Selection Bias (Biased Sampling)
Uneven dispersion of arrows
e.g., Dewey backers were over-represented
Leukemia Among Observers of a Nuclear Bomb Test
Caldwell et al. JAMA 1980
• Smoky Atomic Test in Nevada
• Outcome of 76% of observing troops at site was later found; occurrence of leukemia determined
– 82% contacted by the investigators
– 18% contacted the investigators on their own
• Those who contacted the investigators on their own had a 4.4-fold greater prevalence of leukemia than those contacted by the investigators
Explanation: Human nature (affected humans like to come forward)
[Figure: Proportion deceased (0.00 to 0.20) by time since initiation of antiretroviral therapy (0 to 3.5 years)]
Mortality following initiation of antiretroviral therapy in Uganda
In the presence of 39% loss to follow-up at 3 years
Geng et al. JAMA 2008
A. Assume all lost are dead
B. Match losses to nat’l death index
C. Consult a biostatistician
D. Hopeless; choose another project
E. Some other idea
[Figure: Proportion deceased (0.00 to 0.20) by time since initiation of antiretroviral therapy (0 to 3.5 years)]
Mortality following initiation of antiretroviral therapy in Uganda
In the presence of 39% loss to follow-up at 3 yrs
What else to do at this point?
• Answer: Some other idea (sampling the lost)
[Figure: Proportion deceased (0.00 to 0.20) by time since initiation of antiretroviral therapy (0 to 3.5 years)]
Mortality following initiation of antiretroviral therapy in Uganda
Accounting for losses to follow-up by tracking down vital status of a sample of the lost in the community
Curves: Naive estimate and Corrected estimate
Difference between the two curves: Selection bias (5-fold change)
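The logic of correcting for losses by tracing a sample of the lost can be sketched as a weighted average. Only the 39% loss figure comes from the slide; the two mortality proportions below are hypothetical placeholders, and the function name is invented for this illustration. A real analysis would use time-to-event methods rather than a single proportion.

```python
def corrected_mortality(p_dead_retained, p_dead_lost_sample, fraction_lost):
    """Weighted mortality across retained and lost patients.

    The lost are represented by the subsample whose vital status was traced.
    """
    if not 0.0 <= fraction_lost <= 1.0:
        raise ValueError("fraction_lost must be between 0 and 1")
    return (1.0 - fraction_lost) * p_dead_retained + fraction_lost * p_dead_lost_sample

fraction_lost = 0.39        # loss to follow-up at 3 years (from the slide)
p_dead_retained = 0.02      # hypothetical mortality among those still in care
p_dead_lost_sample = 0.20   # hypothetical mortality in the traced sample of the lost

naive = p_dead_retained     # naive estimate ignores the lost entirely
corrected = corrected_mortality(p_dead_retained, p_dead_lost_sample, fraction_lost)

print(f"naive estimate:     {naive:.3f}")
print(f"corrected estimate: {corrected:.3f}")
print(f"fold change:        {corrected / naive:.1f}")
```

The correction is only as good as the traced sample: it has to be a representative (ideally random) sample of everyone who was lost.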
Analytic Study: Depiction of No Selection Bias (Unbiased Sampling)
[Figure: 2×2 table of Exposure (+/-) by Disease (+/-); one arrow per cell from SOURCE POPULATION to STUDY SAMPLE]
Given that a person resides in one of the 4 cells in the source population, the selection probability is the probability he/she will be represented in that cell in the study sample.
Equal weighted arrows = Equal selection probability
Analytic Study: Depiction of No Selection Bias (Unbiased Sampling)
Equal selection probability in all 4 cells: No Selection Bias
For selection bias to occur, selection probabilities must differ according to both exposure and disease
              SOURCE POPULATION        Selection probability    STUDY SAMPLE
              Disease +   Disease -    Disease +   Disease -    Disease +   Disease -
Exposure +    40,000      10,000       1%          1%           400         100
Exposure -    10,000      40,000       1%          1%           100         400
PR (source population) = (40,000/50,000)/(10,000/50,000) = 4
PR (study sample) = (400/500)/(100/500) = 4
Analytic Study: Depiction of Selection Bias (Biased Sampling)
Unequal selection probability isolated to one cell: Underestimate of Exposure Effect
For selection bias to occur, selection probabilities must differ according to both exposure and disease
              SOURCE POPULATION        Selection probability    STUDY SAMPLE
              Disease +   Disease -    Disease +   Disease -    Disease +   Disease -
Exposure +    40,000      10,000       0.5%        1%           200         100
Exposure -    10,000      40,000       1%          1%           100         400
PR (source population) = (40,000/50,000)/(10,000/50,000) = 4
PR (study sample) = (200/300)/(100/500) = 3.3
Analytic Study: Depiction of Selection Bias (Biased Sampling)
[Figure: 2×2 table of Exposure (+/-) by Disease (+/-); unequally weighted arrows from SOURCE POPULATION to STUDY SAMPLE]
Unequal selection probability: Overestimate of Effect
For selection bias to occur, selection probabilities must differ according to both exposure and disease
Analytic Study: Depiction of Selection Bias (Biased Sampling)
[Figure: 2×2 table of Exposure (+/-) by Disease (+/-); unequally weighted arrows from SOURCE POPULATION to STUDY SAMPLE]
Unequal selection probability: Underestimate of Effect
For selection bias to occur, selection probabilities must differ according to both exposure and disease
Analytic Study: Depiction of No Selection Bias (Unbiased Sampling)
[Figure: 2×2 table of Exposure (+/-) by Disease (+/-); arrows from SOURCE POPULATION to STUDY SAMPLE]
Unequal selection probability but only according to exposure: No Selection Bias
For selection bias to occur, selection probabilities must differ according to both exposure and disease
Analytic Study: Depiction of No Selection Bias (Unbiased Sampling)
Unequal selection probability but only according to exposure: No Selection Bias
For selection bias to occur, selection probabilities must differ according to both exposure and disease
              SOURCE POPULATION        Selection probability    STUDY SAMPLE
              Disease +   Disease -    Disease +   Disease -    Disease +   Disease -
Exposure +    40,000      10,000       0.1%        0.1%         40          10
Exposure -    10,000      40,000       1%          1%           100         400
PR (source population) = (40,000/50,000)/(10,000/50,000) = 4
PR (study sample) = (40/50)/(100/500) = 4
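The three numeric scenarios from these slides (equal selection probabilities, one cell differing, and probabilities differing only by exposure) can be reproduced with a few lines of code. This is a minimal sketch: the function names and the cell ordering (exposed diseased, exposed non-diseased, unexposed diseased, unexposed non-diseased) are choices made here for illustration, with the cell assignment inferred from the PR formulas on the slides.

```python
def prevalence_ratio(cells):
    """Prevalence ratio from (exposed diseased, exposed non-diseased,
    unexposed diseased, unexposed non-diseased) counts."""
    a, b, c, d = cells
    return (a / (a + b)) / (c / (c + d))

def sample_cells(source, selection_probs):
    """Expected study-sample counts given cell-specific selection probabilities."""
    return tuple(n * p for n, p in zip(source, selection_probs))

SOURCE = (40_000, 10_000, 10_000, 40_000)  # source population from the slides

scenarios = {
    "equal in all 4 cells": (0.01, 0.01, 0.01, 0.01),
    "one cell differs": (0.005, 0.01, 0.01, 0.01),
    "differs by exposure only": (0.001, 0.001, 0.01, 0.01),
}

results = {
    name: prevalence_ratio(sample_cells(SOURCE, probs))
    for name, probs in scenarios.items()
}

print(f"source population PR: {prevalence_ratio(SOURCE):.2f}")
for name, pr in results.items():
    print(f"{name}: sample PR = {pr:.2f}")
```

Only the scenario in which selection depends jointly on exposure and disease moves the sample PR away from the source-population value of 4.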
Analytic Study: Depiction of No Selection Bias (Unbiased Sampling)
[Figure: 2×2 table of Exposure (+/-) by Disease (+/-); arrows from SOURCE POPULATION to STUDY SAMPLE]
Unequal selection probability but only according to disease: No Selection Bias
For selection bias to occur, selection probabilities must differ according to both exposure and disease
Selection Bias in a Cross-sectional Study: Presence of exposure and disease at outset invites selection bias
[Figure: 2×2 table of Exposure (+/-) by Disease (+/-); arrows from SOURCE POPULATION to STUDY SAMPLE, each labeled “?”]
Typically, in practice, you don’t know the selection probabilities
Mechanisms of Unequal Selection Probabilities in Cross-Sectional Studies
– “Non-response” (eligible subjects in accessible population refuse participation according to exposure & outcome)
– Exposure influences survival/drop-out among non-diseased
– Exposure influences survival/drop-out among diseased
Assuming that the goal is to identify determinants of disease development (etiologic research)
Selection Bias in a Cross-sectional Study: Effect of Non-Responders
Austin, AJE 1981. Survey of S. California adults
STUDY SAMPLE (Overall 83% Response)
                    History of Heart Attack +   History of Heart Attack -
Hyperlipidemia +    25                          347
Hyperlipidemia -    45                          2312
OR observed = 3.6
Response % in each of the 4 cells: ?
Selection Bias in a Cross-sectional Study: Effect of Non-Responders
Austin, AJE 1981. Survey of S. California adults
Investigators made the extra effort to track down and question the initial non-responders
CORRECTED STUDY SAMPLE (100% responding in each cell)
                    History of Heart Attack +   History of Heart Attack -
Hyperlipidemia +    30                          401
Hyperlipidemia -    63                          2807
OR true = 3.3
Selection Bias in a Cross-sectional Study: Effect of Non-Responders
Austin, AJE 1981. Survey of S. California adults
Investigators made the extra effort to track down and question the initial non-responders
STUDY SAMPLE (initial responders), with response % in each cell
                    History of Heart Attack +   History of Heart Attack -
Hyperlipidemia +    25 (83%)                    347 (87%)
Hyperlipidemia -    45 (72%)                    2312 (83%)
OR biased = 3.6
CORRECTED STUDY SAMPLE (100% responding in each cell)
                    History of Heart Attack +   History of Heart Attack -
Hyperlipidemia +    30                          401
Hyperlipidemia -    63                          2807
OR true = 3.3
Unequal response % across cells: Selection bias
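The link between the four response percentages and the biased odds ratio can be sketched with the usual cross-product of selection probabilities. The assignment of each percentage to a cell (83% and 87% in the hyperlipidemia row, 72% and 83% in the no-hyperlipidemia row) is a reading of the slide layout and should be treated as an assumption; the function name is also invented here.

```python
def selection_bias_factor(s_a, s_b, s_c, s_d):
    """Multiplicative bias in the odds ratio from cell-specific selection
    (response) probabilities.

    Cells: a = exposed diseased, b = exposed non-diseased,
           c = unexposed diseased, d = unexposed non-diseased.
    """
    return (s_a * s_d) / (s_b * s_c)

# Response percentages as read from the slide (cell assignment is an assumption).
factor = selection_bias_factor(s_a=0.83, s_b=0.87, s_c=0.72, s_d=0.83)

or_true = 3.3                     # odds ratio after tracking down non-responders
or_expected_biased = or_true * factor

print(f"bias factor: {factor:.3f}")
print(f"expected biased OR: {or_expected_biased:.1f}")
```

If response had differed only by exposure or only by disease, the factor would equal 1 and the odds ratio would be unaffected.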
Effect of unequal response probabilities in a cross-sectional study
Austin, AJE 1981. Survey of S. California adults
Group    Exposure              Outcome         Bias in OR due to non-response
Men      Family h/o MI         Heart failure   +63%
Men      Hypertension          Stroke          -32%
Women    Family h/o stroke     Stroke          +59%
Women    Family h/o diabetes   Stroke          -34%
Mechanism: Non-participation among some potential subjects (“Non-response bias”; Study design is fine)
Mechanisms of Unequal Selection Probabilities in Cross-Sectional Studies
– “Non-response” (eligible subjects in accessible population refuse participation according to exposure & outcome)
– Exposure influences survival/drop-out among non-diseased
– Exposure influences survival/drop-out among diseased
Assuming that the goal is to identify determinants of disease development (etiologic research)
Selection Bias in a Cross-Sectional Study
• Is glutathione S-transferase class deletion (GSTM1-null) polymorphism associated with increased risk of breast cancer?
• With prevalent breast cancer in cross-sectional study, an association with GSTM1-null is seen depending upon the no. of years since diagnosis
• But not with brand new incident diagnoses (via case-control study)
Kelsey et al. Canc Epi Bio Prev 1997
                                   Cancer   No cancer
(no time-since-diagnosis label)
  GSTM1-null                       119      115
  GSTM1-positive                   121      124
  OR = 1.08
Dx <4 yr
  GSTM1-null                       44       126
  GSTM1-positive                   43       119
  OR = 0.97
4 - 8 yr
  GSTM1-null                       52       126
  GSTM1-positive                   39       119
  OR = 1.3
>8 yr
  GSTM1-null                       44       126
  GSTM1-positive                   21       119
  OR = 2.0
GSTM1-null is associated with survival after breast cancer, but not with cancer development
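The odds ratios by time since diagnosis can be recomputed from the cell counts on the slide. The sketch below covers only the three strata that carry a years-since-diagnosis label; the function name and dictionary layout are illustrative choices.

```python
def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Cross-product odds ratio for a 2x2 table."""
    return (exposed_cases * unexposed_noncases) / (exposed_noncases * unexposed_cases)

# (GSTM1-null cancer, GSTM1-null no cancer, GSTM1-positive cancer, GSTM1-positive no cancer)
strata = {
    "Dx <4 yr": (44, 126, 43, 119),
    "4 - 8 yr": (52, 126, 39, 119),
    ">8 yr": (44, 126, 21, 119),
}

ors = {label: odds_ratio(*cells) for label, cells in strata.items()}

for label, value in ors.items():
    print(f"{label}: OR = {value:.2f}")
```

The comparison group is identical in all three strata, so the rising odds ratio is driven entirely by the shrinking number of GSTM1-positive women among the longer-surviving cases.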
Mechanisms of Unequal Selection Probabilities in Cross-Sectional Studies
– “Non-response” (eligible subjects refuse according to exposure & outcome)
– Exposure influences survival/drop-out among non-diseased
– Exposure influences survival/drop-out among diseased
Cross-sectional study of GSTM1 polymorphism and breast cancer
[Figure: 2×2 table of GSTM1 (null / pos.) by Breast Cancer (+/-); arrows from SOURCE POPULATION to STUDY SAMPLE]
Bias: overestimate effect of GSTM1-null polymorphism in causing breast cancer
Mechanisms of Unequal Selection Probabilities in Case-Control Studies
– Violation of the study base principle (choosing the wrong controls)
PLUS, since exposure and disease are present at the outset, all the same mechanisms seen in cross-sectional studies:
– “Non-response” (identified cases and controls refuse participation according to exposure & outcome)
– Exposure influences survival/drop-out among non-diseased
– Exposure influences survival/drop-out among diseased
Selection Bias in Case-Control Studies: Presence of Exposure & Disease at Outset Also Invites Selection Bias
Cases: patients with histologic diagnosis of pancreatic cancer in any of 11 large hospitals in Boston and Rhode Island between October 1974 and August 1979
What type of study base is this?
A. Primary Study Base
B. Secondary Study Base
Coffee and cancer of the pancreas. MacMahon et al. NEJM 1981
• Answer: Secondary study base
Selection Bias in Case-Control Studies: Presence of Exposure & Disease at Outset Also Invites Selection Bias
Cases: patients with histologic diagnosis of pancreatic cancer in any of 11 large hospitals in Boston and Rhode Island between October 1974 and August 1979
How should controls be chosen?
A. Random digit dialing in area
B. Neighbors of the cases
C. Appendicitis admissions to hospitals
D. Hopeless; choose another project
• Answer: None of these are quite right. Choosing controls in the face of a secondary study base is not easy. Random digit dialing is probably the best answer, although hopeless is also acceptable.
Selection Bias in a Case-Control Study
Coffee and cancer of the pancreas MacMahon et al. NEJM 1981
Controls:
• Other patients without pancreatic cancer under the care of the same physician of the cases with pancreatic cancer.
• Patients with diseases known to be associated with smoking or alcohol consumption were excluded
Coffee and cancer of the pancreas, MacMahon et al., NEJM 1981
                       Case   Control
Coffee: > 1 cup/day    207    275
No coffee              9      32
Total                  216    307
OR = (207/9) / (275/32) = 2.7 (95% CI, 1.2-6.5)
Biased?
Relative to the hypothetical study base that gave rise to the cases, the selected controls were depleted of coffee users
Selected controls were:
• Other patients under the care of the same physician at the time of an interview with a patient with pancreatic cancer
– Most of the MDs were gastroenterologists whose other patients were likely advised to stop using coffee
• Patients with diseases known to be associated with smoking or alcohol consumption were excluded
– Smoking and alcohol use are correlated with coffee use; therefore, sample is relatively depleted of coffee users
Conclusion: Controls vastly depleted of coffee users compared to true study base
Mechanisms of Bias
– Violation of the study base principle (choosing the wrong controls)
PLUS, since exposure and disease are present at the outset, all the same mechanisms seen in cross-sectional studies:
– “Non-response” (identified cases and controls refuse according to exposure & outcome)
– Exposure influences survival/drop-out among non-diseased
– Exposure influences survival/drop-out among diseased
Case-control Study of Coffee and Pancreatic Cancer: Depiction of Selection Bias
[Figure: 2×2 table of coffee (coffee / no coffee) by cancer (Cancer / No cancer); arrows from SOURCE POPULATION to STUDY SAMPLE]
Bias: overestimate effect of coffee in causing cancer
Coffee and cancer of the pancreas: Use of population-based controls
Gold et al. Cancer 1985
                       Case   Control
Coffee: > 1 cup/day    84     82
No coffee              10     14
OR = (84/10) / (82/14) = 1.4 (95% CI, 0.55 - 3.8)
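The contrast between the two coffee studies comes down to the exposure odds among the controls. A short sketch, using the counts shown on the slides (the function and variable names are illustrative):

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Exposure odds in cases divided by exposure odds in controls."""
    return (exposed_cases / unexposed_cases) / (exposed_controls / unexposed_controls)

# MacMahon et al., NEJM 1981: controls were other patients of the same physicians
or_hospital_controls = odds_ratio(207, 9, 275, 32)

# Gold et al., Cancer 1985: population-based controls
or_population_controls = odds_ratio(84, 10, 82, 14)

print(f"hospital-based controls:   OR = {or_hospital_controls:.1f}")
print(f"population-based controls: OR = {or_population_controls:.1f}")

# Exposure odds among the controls in the MacMahon study
print(f"coffee odds in MacMahon controls: {275 / 32:.1f}")
print(f"coffee odds in MacMahon cases:    {207 / 9:.1f}")
```

When the control group is depleted of coffee drinkers, the denominator of the odds ratio shrinks and the estimate is inflated, which is the direction of bias described on the slides.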
Mechanisms of Unequal Selection Probabilities in Cohort Studies
– Among initially selected subjects, selection bias “on the front end” less likely to occur compared to case-control or cross-sectional studies
– Reason: subjects (exposed or unexposed; treatment vs placebo) are selected before the outcome occurs
Cohort Study/RCT
[Figure: 2×2 table of Exposure (+/-) by Disease (+/-); SOURCE POPULATION and STUDY SAMPLE]
At the outset, since disease has not occurred yet among initially selected subjects, there is typically no opportunity for unequal sampling with respect to exposure and disease. (We cannot yet draw the 4 arrows)
Cohort Study/RCT
All that is sampled at the beginning is exposure status (the “margins”)
[Figure: 2×2 table of Exposure (+/-) by Disease (+/-) showing only the row margins: A + B and C + D in the SOURCE POPULATION; a + b and c + d in the STUDY SAMPLE]
Even if unequal sampling of exposed or unexposed groups occurs, it will not result in selection bias when forming measures of association
Selection Bias among Initially Enrolled: Cohort Studies
• Selection bias can occur on “front-end” of cohort if diseased persons:
– are unknowingly entered into the cohort;
– are unequally distributed across exposure; and
– the reason for the maldistribution is that disease causes exposure
• e.g.:
– Consider a cohort study of effect of exercise on all-cause mortality in persons initially thought to be completely healthy.
– If some enrolled participants had undiagnosed cardiovascular disease and, as a consequence, were more likely to exercise less, what would happen to the measure of association?
Cohort Study of Exercise and Survival
[Figure: 2×2 table of exercise (exercise / no exercise) by outcome (Death / No death); arrows from SOURCE POPULATION to STUDY SAMPLE]
Selection bias will lead to spurious protective effect of exercise (assuming truly no effect)
Mechanisms of Unequal Selection Probabilities in Cohort Studies
– Most common form of selection bias does not occur with the process of initial selection of subjects
– Instead, selection bias most commonly caused by forces that determine length of participation (i.e., who ultimately stays in the analysis; losses)
When do Losses Cause Selection Bias in Cohort Studies/RCTs?
• Selection bias caused by forces that determine length of participation (i.e., who ultimately stays in the analysis; losses):
– When losses have a different incidence of outcome than those who remain (i.e., informative censoring) in at least one of the exposure groups
AND
– Rate of informative censoring differs across exposure groups
• Selection bias results
Selection Bias: Cohort Studies
e.g., Cohort study of progression to cirrhosis in hepatitis C virus carriers: IDU vs transfusion recipients
All the ingredients are present for selection bias:
• Informative censoring is present
– getting sick with cirrhosis is a common reason for loss to follow-up
– persons who are lost to follow-up have greater cirrhosis incidence than those who remain (i.e., informative censoring)
• Informative censoring is differential across exposure groups
– IDU more likely to become lost to follow-up, at any level of feeling sick
– i.e., the magnitude of informative censoring differs across exposure groups (IDU vs transfusion recipients)
• Result: selection bias -- underestimates the incidence of cirrhosis in IDU relative to transfusion recipients
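A small calculation shows how differential informative censoring distorts a comparison even when the true risks are identical. Every number below is hypothetical (true risk, loss fractions, and the risk ratio between lost and retained subjects), and the function name is invented for this sketch; it uses simple cumulative risks rather than survival curves.

```python
def observed_risk(true_risk, fraction_lost, risk_ratio_lost_vs_retained):
    """Risk seen among subjects who remain under follow-up.

    Solves true_risk = (1 - f) * r + f * k * r for r, where f is the fraction
    lost and k is the risk in the lost relative to the risk in the retained.
    """
    f, k = fraction_lost, risk_ratio_lost_vs_retained
    return true_risk / ((1 - f) + f * k)

TRUE_RISK = 0.20  # hypothetical: same true cirrhosis risk in both groups
K = 3.0           # hypothetical: the lost have 3x the risk of the retained

# Hypothetical loss fractions: IDU are lost more often than transfusion recipients.
risk_idu = observed_risk(TRUE_RISK, fraction_lost=0.40, risk_ratio_lost_vs_retained=K)
risk_transfusion = observed_risk(TRUE_RISK, fraction_lost=0.10, risk_ratio_lost_vs_retained=K)

observed_rr = risk_idu / risk_transfusion

print(f"observed risk, IDU:         {risk_idu:.3f}")
print(f"observed risk, transfusion: {risk_transfusion:.3f}")
print(f"observed risk ratio (true value 1.0): {observed_rr:.2f}")
```

Both observed risks fall below the true value, and the group with heavier losses falls further, so the observed risk ratio drops below 1 despite no true difference.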
Effect of Selection Bias in a Cohort Study
[Figure: Probability of being cirrhosis-free by time. Curves: assuming no informative censoring and no difference between IDU and transfusion recipients (superimposed lines); effect of informative censoring in transfusion recipients; effect of informative censoring in IDU group. Label: Selection bias]
Cohort Study of Risk Group and Cirrhosis Progression: Depiction of Selection Bias
[Figure: 2×2 table of risk group (IDU / Transfusion recipients) by outcome (Cirrhosis / No Cirrhosis); arrows from SOURCE POPULATION to STUDY SAMPLE]
Selection bias will lead to spurious underestimation of cirrhosis incidence in both exposure groups, more so in IDU group
Mechanism: Sick subjects stopped coming
Effect of losses to follow-up in a cohort study
Determinants of survival after initiation of antiretroviral therapy in Africa
Bisson, PLoSOne, 2008
[Figure: paired panels comparing “Naively Ignoring Losses” with “Tracking Down Vital Status on Losses”; reference lines at 1.0. Label: Selection bias]
Selection Bias in a Randomized Clinical Trial
• If randomization is performed correctly, then selection bias on the “front-end” of the study (i.e., differential inclusion of diseased individuals between arms) is not possible (other than by chance)
– even if diseased individuals are unknowingly included, randomization typically ensures that this occurs evenly across treatment groups
Selection Bias in a Randomized Clinical Trial
• Losses to follow-up are the big unknown in clinical trials and the major potential cause of selection bias
• e.g., Assume that:
– a symptom-causing side effect of a drug is more common in persons “sick” from the disease under study
– occurrence of the side effect is associated with more losses to follow-up
• Then:
– Compared to placebo, drug treatment group would be selectively depleted of the sickest persons (i.e., informative censoring)
– Would make drug treatment group appear better
Effect of Selection Bias in an RCT
[Figure: Probability of non-disease by time. Curves: survival assuming no informative censoring and no difference between drug and placebo (superimposed curves); effect of informative censoring in drug group. Label: Selection bias]
Managing Selection Bias
• Prevention and avoidance are critical
– Unlike confounding where there are solutions in the analysis of the data, once the subjects are selected and their follow-up occurs, there are usually no easy fixes for selection bias
• In cross-sectional studies:
– Strive for high response percentages
– Be aware of how exposure affects survival in diseased/non-diseased
• In case-control studies:
– Follow the study base principle
– Mind the usual mechanisms of bias in cross-sectional studies
• In longitudinal studies (cohorts/RCTs):
– Carefully screen for disease at baseline (front end)
– Avoid losses to follow-up (back end)
– Consider approaches to tracking down the lost (or at least a sample)
Extra Slides
• Potential clicker questions:
• -- When analytic studies are introduced, guess the direction of the bias when one cell has unequal selection probability
• -- When cohort studies are introduced, ask whether cohort studies are at more, less, or the same risk for selection bias
Emerging Terminology: “Causal Research”
• Goal: Identify causal relationships
• 6 ways a statistical association can occur
1. Chance
2. Selection bias
3. Measurement bias
4. Confounding
5. Reverse causation
6. True causal relationship
• Process of causal research: rule out the first 5
Preventing and Managing Losses to Follow-up
Prevention
• Select those most willing to participate (internal validity before generalizability)
• Obtain comprehensive contact information
– SSN (critical for death index), DOB
– Middle initial, father’s surname
– Address
– Friends and family members
• Engage participants while in follow-up
Management
• When losses occur, contact:
– postal service for change of address
– DMV
– National death index
• Search for a sample of those lost
Time Permitting
[Figure: 2×2 table of Exposed (+/-) by Diseased (+/-), relating the STUDY SAMPLE and STUDY POPULATION to the REFERENCE/TARGET/SOURCE POPULATION (? INTERNAL VALIDITY) and to OTHER POPULATIONS (? EXTERNAL VALIDITY, generalizability)]
• Conditions for selection bias in a longitudinal study, I believe, turn out to be more complicated than on the prior slide (and it may matter whether one is talking about the ratio or the absolute difference scale). Keeping just to the ratio scale:
• Conditions seem to be:
– Informative censoring in at least one group but not the other: this automatically gives selection bias
– Informative censoring in both groups:
• If the degree of informative censoring is the same in both groups (i.e., those who are lost have twice the rate of those who stay), then bias can occur if the magnitude of the losses differs across groups
• If the degree of informative censoring is different in the groups, then ?? anything can happen. Even if one group has twice the event rate in those who are lost than the other group, this could be balanced in the other group if they lost a lot of people (need to think about this)
Selection Bias in a Cross-sectional Study
e.g., Smoking and emphysema
• Smoking is a cause of emphysema, but persons with emphysema who continue to smoke have shorter survival
• Hence, in any cross-section of persons with emphysema, those who smoke less are apt to be more greatly represented (because of the survival disadvantage of those who continue to smoke)
• Therefore, cross-sectional study of current smoking and emphysema will result in a prevalence ratio that underestimates the entity you are presumably interested in: the risk (incidence) ratio
Another Mechanism for Selection Bias in Cross-sectional Studies
• Finding a diseased person in a cross-sectional study requires 2 things:
– the disease occurred in the first place
– person survived long enough to be sampled
• Any factor found associated with a prevalent case of disease might be associated with disease development, survival with disease, or both
• Assuming goal is to find factors associated with disease development (etiologic research), bias in prevalence ratio occurs any time that exposure under study is associated with survival with disease
Selection Bias in Case-Control Studies: Exposure & Disease at Outset Invites Selection Bias
Coffee and cancer of the pancreas MacMahon et al. NEJM 1981
Cases: patients with histologic diagnosis of pancreatic cancer in any of 11 large hospitals in Boston and Rhode Island between October 1974 and August 1979
What type of study base is this?
[Clicker response options A through E, each labeled “Primary Study Base” in the source]