Issues in Causal Inference
Steven Goodman, MD, MHS, PhD
Johns Hopkins University Schools of Medicine and Public Health
IOM, June 24, 2009
NYT 10/23/2002, pp. A18-19
Scientists Debating Future of Hormone Replacement
Amino Acid May Not Predict Heart Attacks
Study is Unsure on Tainted Polio Vaccine’s Cancer Role
Nurse-Patient Ratio Linked to Death Rate
Questions assigned
What are the strengths and weaknesses of different types of evidence when making causal inferences?
How should evidence from epidemiological studies be weighed? What is the appropriate role of statistical significance in causal inference? How should causal inference be derived from statistical analysis?
How should interactions be considered in causal inference? Should they change the approach to statistical significance?
What is the appropriate role of biologic data when population-based studies are absent or conflicting?
3 Goodman IOM talk, 6/24/09
Questions assigned
How does causal inference differ when a strong biologic theory exists and when one does not?
Can causation be assessed without population-based studies? Are there special challenges associated with studying vaccine causation?
Estimated time required: 9 semesters
Time allotted: 30 minutes
A short research quiz
A study is done on sequelae of vaccination, based on a large database, and the authors state that a surprising association (i.e., one that they thought had no more than a 10% chance of being true before this study) has been observed between Hepatitis B immunization and pneumonia before age 2, OR= 3.0 (CI: 1.1 to 9.1), P =0.05.
The probability that this association is real is: a.) < 50% b.) 50% to 75% c.) 75+% to 94.99...% d.) ≥ 95%
Implications
P=0.05 isn’t very strong evidence.
We don’t know how to formally make use of the prior information about plausibility.
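For illustration only, here is one way the quiz's prior could be combined with its P-value. The sketch is not part of the talk: it assumes the minimum Bayes factor bound exp(-z^2/2) under a normal approximation (the most favorable reading of the data for the alternative) and takes the 10% prior at face value. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def min_bayes_factor(p_two_sided: float) -> float:
    """Smallest Bayes factor (null vs. best-supported alternative) for a
    two-sided P-value, using the bound exp(-z^2 / 2). This bound is an
    assumption of the sketch, not something stated on the slides."""
    z = NormalDist().inv_cdf(1 - p_two_sided / 2)
    return math.exp(-z * z / 2)


def max_posterior_probability(prior: float, p_two_sided: float) -> float:
    """Upper bound on the probability that the association is real."""
    prior_odds = prior / (1 - prior)  # odds in favor of a real effect
    posterior_odds = prior_odds / min_bayes_factor(p_two_sided)
    return posterior_odds / (1 + posterior_odds)


# Quiz inputs: prior of at most 10%, P = 0.05.
bound = max_posterior_probability(prior=0.10, p_two_sided=0.05)
print(f"Posterior probability is at most {bound:.2f}")
```

Under these assumptions the bound falls below 50%, which corresponds to answer (a) in the quiz.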
Quiz Message
There is no mathematical formula that will tell you, based on the data alone, the likelihood that a claim is true, i.e. that something causes something else.
Judgment enters into that assessment in two domains:
The “quality” of the study from which it came.
The plausibility of the relationship, based on prior evidence and biologic knowledge.
Things identified as cancer risks (Altman and Simon, JNCI, 1992)
Electric razors
Broken arms (only in women)
Fluorescent lights
Allergies
Breeding reindeer
Being a waiter
Owning a pet bird
Hot dogs
Being short
Being tall
Having a refrigerator
“We have no idea how or why the magnets work.”
“A real breakthrough…”
“…the [study] must be regarded as preliminary….”
“But…the early results were clear and... the treatment ought to be put to use immediately.”
Medical Inference
Hypothetical underlying illnesses: Illness A, Illness B, Illness C
Possible observed signs and symptoms: cough, fever, rash, angina, splenomegaly
DEDUCTION: from illness to signs and symptoms
INDUCTION: from signs and symptoms to illness
Statistical Inference
Possible underlying differences in cure rates: Hypothesis 1 (Δ=0%), Hypothesis 2 (Δ=5%), Hypothesis 3 (Δ=10%)
Possible observed difference in cure rates: -5%, 0%, 5%, 10%, 15%
DEDUCTION: from hypothesis to observed difference
INDUCTION: from observed difference to hypothesis
Statistical inference
There is only one formal, coherent calculus of inductive statistical inference: Bayes Theorem.
“Traditional” statistical rules of inference are a collection of principles and conventions to avoid errors over the long run. They do not tell us how likely our claims are to be true.
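As a minimal sketch of what a calculation with Bayes' theorem looks like, the snippet below updates three hypotheses about a difference in cure rates (Δ = 0%, 5%, 10%) using a normal likelihood. The observed difference (8%), its standard error (4%), and the equal prior weights are hypothetical values chosen only for illustration.

```python
from statistics import NormalDist

# Hypotheses about the true difference in cure rates.
hypotheses = [0.00, 0.05, 0.10]

# Hypothetical inputs, not taken from the talk.
priors = [1 / 3, 1 / 3, 1 / 3]
observed_diff = 0.08
standard_error = 0.04


def posterior(hypotheses, priors, observed, se):
    """Bayes' theorem: posterior is proportional to prior times likelihood."""
    likelihoods = [NormalDist(mu=h, sigma=se).pdf(observed) for h in hypotheses]
    weighted = [p * lik for p, lik in zip(priors, likelihoods)]
    total = sum(weighted)
    return [w / total for w in weighted]


post = posterior(hypotheses, priors, observed_diff, standard_error)
for h, p in zip(hypotheses, post):
    print(f"Delta = {h:.0%}: posterior probability {p:.2f}")
```

Unlike a P-value, the output is a direct statement about how probable each hypothesis is, but it depends on the priors supplied.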
The P-value is…
The probability of getting a result as or more extreme than the observed result, if the null hypothesis (of chance) were true.
Since the p-value is calculated assuming the null hypothesis to be true, it cannot represent the probability of the truth of the null hypothesis.
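To make the definition concrete, the sketch below computes a two-sided P-value as a tail area under the null, using the odds ratio and confidence interval from the quiz (OR = 3.0, CI 1.1 to 9.1). It assumes a normal approximation on the log odds ratio scale and recovers the standard error from the reported interval, treated here as a 95% interval. Because the quoted figures are rounded, the result should only roughly match the quoted P-value.

```python
import math
from statistics import NormalDist

# Reported in the quiz: OR = 3.0, CI 1.1 to 9.1 (assumed to be a 95% CI).
odds_ratio, ci_low, ci_high = 3.0, 1.1, 9.1

# Assumption: the interval is symmetric on the log scale (Wald interval).
z_95 = NormalDist().inv_cdf(0.975)
se_log_or = (math.log(ci_high) - math.log(ci_low)) / (2 * z_95)

# Test statistic: distance of the estimate from the null (OR = 1),
# measured in standard errors.
z = math.log(odds_ratio) / se_log_or

# P-value: probability of a result at least this extreme, in either
# direction, if the null hypothesis were true.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"z = {z:.2f}, two-sided P = {p_value:.3f}")
```

Every step conditions on the null being true, which is why the result cannot be read as the probability of the null.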
The P-value is not….
“The probability of the null hypothesis.”
“The probability that you will make a Type I error if you reject the null hypothesis.”
“The probability that the observed data occurred by chance.”
“The probability of the observed data under the null hypothesis.”
Almost anything sensible you can think of.
Austin Bradford Hill on Statistics
“No formal tests of significance can answer [causal] questions. Such tests can, and should, remind us of the effects that the play of chance can create… Beyond that they contribute nothing to the ‘proof’ of our [causal] hypothesis.”
“… too often I suspect we waste a great deal of time, we grasp the shadow and lose the substance, we weaken our capacity to interpret data and to take reasonable decisions whatever the value of P. And far too often we deduce ‘no difference’ from ‘no significant difference’. Like fire, the chi-square test is an excellent servant and bad master.”
Hill AB, “The Environment and Disease: Association or Causation?” Proceedings of the Royal Society of Medicine, 58:295-300, 1965.
What is a cause?
“Counterfactual” definition of cause
If B occurs in the presence of A, and does not occur in the absence of A, we say that “A causes B.”
Problems with “cause” in epidemiology
If you don’t smoke, can you avoid cancer? NO
Multiple causal pathways (cause not necessary)
If you do smoke, will you necessarily get cancer? NO
Multiple factors (“contributing causes”) needed to produce outcome (cause not sufficient).
Probabilistic definition of “cause”
For an individual, if Pr(Disease | Factor) > Pr(Disease | no Factor), all other things equal, then the Factor is a cause of the disease.
Foundational equations
Mathematics: e^{iπ} + 1 = 0
Physics: E = mc^2
Epidemiology: Pr(Outcome | X=x) = Pr(Outcome | Set(X=x))
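The epidemiology equation says that what we see among people who happen to have X = x matches what we would see if we set X = x for everyone. A small simulation can illustrate when that holds and when it fails. The sketch below is hypothetical throughout: the confounder, the probabilities, and the function name are invented for illustration, and X is given no effect on the outcome at all.

```python
import random


def simulate(n, randomized, seed=0):
    """Return Pr(outcome | X=1) - Pr(outcome | X=0) in a simulated population.

    A hidden confounder U raises the outcome risk. When `randomized` is
    False, U also makes exposure more likely (observing X). When True,
    exposure is assigned by coin flip (setting X). X never affects the outcome.
    """
    rng = random.Random(seed)
    counts = {0: [0, 0], 1: [0, 0]}  # x -> [people, outcomes]
    for _ in range(n):
        u = rng.random() < 0.5
        if randomized:
            x = int(rng.random() < 0.5)
        else:
            x = int(rng.random() < (0.8 if u else 0.2))
        outcome = rng.random() < (0.30 if u else 0.10)
        counts[x][0] += 1
        counts[x][1] += outcome
    risk = {x: outcomes / people for x, (people, outcomes) in counts.items()}
    return risk[1] - risk[0]


observed_gap = simulate(200_000, randomized=False)
set_gap = simulate(200_000, randomized=True)
print(f"Observed X: risk difference {observed_gap:+.3f}")
print(f"Set X:      risk difference {set_gap:+.3f}")
```

With the confounder at work, the observed risk difference is clearly positive even though X does nothing. When X is set by coin flip, the difference is close to zero.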
The unobservability of causal effects
[Figure: Persons A, B, and C are observed with the factor; each person's outcome without the factor is unobserved (?). The comparison is made instead between Average (A, B, C), with the factor, and Average (D, E, F), for other persons D, E, and F without the factor.]
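The slide's point can be made concrete with a small simulation in potential-outcome terms. In the hypothetical sketch below, every person has two potential outcomes but only one is ever observed, so no individual effect can be computed from the observed data. Comparing group averages under random assignment still recovers the average effect. All numbers and variable names are made up for illustration.

```python
import random

rng = random.Random(1)
n = 100_000

true_effects = []
treated_outcomes = []
untreated_outcomes = []

for _ in range(n):
    # Both potential outcomes exist only inside the simulation; a real
    # study never sees both for the same person.
    y_without = rng.gauss(10.0, 2.0)
    y_with = y_without + rng.gauss(1.5, 1.0)  # person-specific effect
    true_effects.append(y_with - y_without)

    # Random assignment decides which potential outcome is observed.
    if rng.random() < 0.5:
        treated_outcomes.append(y_with)
    else:
        untreated_outcomes.append(y_without)

average_true_effect = sum(true_effects) / n
estimated_effect = (
    sum(treated_outcomes) / len(treated_outcomes)
    - sum(untreated_outcomes) / len(untreated_outcomes)
)
print(f"Average of the unobservable individual effects: {average_true_effect:.2f}")
print(f"Difference in observed group averages:          {estimated_effect:.2f}")
```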
Effect of Random and Systematic Error
[Figure: Bias is the gap between the true effect and the average study effect. Random error is the gap between the average study effect and the observed study effect.]
Types of uncertainty
Random error produces stochastic uncertainty (reflected in CIs, the minimum uncertainty)
Potential for bias contributes to epistemic uncertainty, not reflected in formulae, but rather in sensitivity analyses and qualitative evidence rating scales.
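A small simulation can make the distinction concrete. In this hypothetical sketch, each observation is the true effect plus a fixed bias plus random noise. Confidence intervals built from the random error alone narrow as the study grows, yet they center on the biased value, so a larger sample does nothing about the epistemic part. The true effect, bias, and noise level are invented for illustration.

```python
import random
from statistics import mean, stdev

TRUE_EFFECT = 1.0  # hypothetical
BIAS = 0.5         # hypothetical systematic error shared by every study
NOISE_SD = 2.0     # hypothetical person-level variability


def run_study(n, rng):
    """Return (estimate, half-width of a 95% CI) for one simulated study."""
    data = [TRUE_EFFECT + BIAS + rng.gauss(0, NOISE_SD) for _ in range(n)]
    estimate = mean(data)
    half_width = 1.96 * stdev(data) / n ** 0.5
    return estimate, half_width


rng = random.Random(42)
results = {n: run_study(n, rng) for n in (50, 500, 50_000)}
for n, (estimate, half_width) in results.items():
    print(f"n={n:>6}: estimate {estimate:.2f} +/- {half_width:.2f}")
```

In the largest study the interval is very narrow but still excludes the true effect, because the interval reflects only stochastic uncertainty.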
Biologic knowledge
Relevant to inference in 2 ways:
Affects prior plausibility of relationship.
Degree of confidence in mechanism is reflected in confidence that we have identified all relevant confounding factors in observational studies.
Single-case inference
FAA plane crash investigations
Patient dies of peritonitis after bowel is inadvertently cut and not repaired during appendectomy.
Child w/undiagnosed immunodeficiency develops polio post-vaccination.
Child is diagnosed with autism 2 weeks after MMR vaccination.
Ladder of Evidential Strength
STRENGTH OF EVIDENCE (higher to lower)
Meta-analysis of individual patient data
Large, multi-center RCTs
Meta-analysis of group data
Smaller, single-site RCTs
Prospective cohort studies, CCTs
Case-control, retrospective cohort, or cross-sectional studies
Poorly controlled studies (historical controls)
Uncontrolled studies (case series or reports)
Sources of epistemic uncertainty
Prior empirical evidence
Evidence for biological mechanism
Confounders, measured and unmeasured
Analytic model: structure, covariates
Missing data
Ordinal uncertainty scales
| IOM | Legal | EPA 1986 | EPA 2005 | Probabilistic |
| Sufficient to infer… | Beyond a reasonable doubt | Human carcinogen | Human carcinogen | 99%+ |
| Suggestive but not sufficient | Clear and convincing | Probable… | Likely… | 90-99% |
| Inadequate | Preponderance of the evidence | Possible… | Suggestive… | 50% |
| | Reasonable suspicion | | | 25% |
| | Insufficient evidence | Not classifiable | Inadequate | ??? |
| Favors no causal relationship | Insufficient evidence | Non-carcinogen | Unlikely… | <25% |
EPA “narrative”
The framework provides a structure for organizing the facts upon which conclusions …rest. The purpose of using the framework is to make analysis transparent and to allow the reader to understand the facts and reasoning behind a conclusion.
The framework does not dictate an answer. The weight of evidence that is sufficient to support a decision about a mode of action may be less or more, depending on the purpose of the analysis, for example, screening, research needs identification, or full risk assessment. To make the reasoning transparent, the purpose of the analysis should be made apparent to the reader.
Generally, “sufficient” support is a matter of scientific judgment in the context of the requirements of the decision maker or in the context of science policy guidance regarding a certain mode of action.
Hill’s Causal Criteria
1.) Strength of association
2.) Consistency of effect in other settings and populations
3.) Cause before effect (temporality)
4.) Biologic gradient (dose-response)
5.) Plausibility / coherence / experimental evidence
6.) Analogy (similar effects of similar mechanisms)
Causal Conclusions Are…
[Figure: a scale from False (0%) to True (100%), with Uncertain in between. Labeled inputs: other studies, quality of design, quality of execution, strength of findings, biologic evidence.]
What causal inference is
Complex mix of inductive and deductive reasoning to make statements about causal relationships in nature.
Involves formulating and honestly testing competing non-causal hypotheses.
Requires a conceptual model for how the cause is exerting its effect. Key to explanation.
Causal conclusions cannot be made on the basis of the data that gave rise to the causal hypothesis.
Questions assigned
What are the strengths and weaknesses of different types of evidence when making causal inferences?
How should evidence from epidemiological studies be weighed? What is the appropriate role of statistical significance in causal inference? How should causal inference be derived from statistical analysis?
How should interactions be considered in causal inference? Should they change the approach to statistical significance?
What is the appropriate role of biologic data when population-based studies are absent or conflicting?
Questions assigned
How does causal inference differ when a strong biologic theory exists and when one does not?
Can causation be assessed without population-based studies? Are there special challenges associated with studying vaccine causation?
Estimated time required: 9 semesters
Time allotted: 30 minutes