A unification of mediation and
interaction: a four-way
decomposition
Tyler J. VanderWeele
Departments of Epidemiology and Biostatistics
Harvard School of Public Health
Plan of Presentation
(1) Questions of Mediation and Interaction
(2) A Unification of Mediation and Interaction
(3) Regression Approaches and Ratio Scales
(4) Application to Genetic Epidemiology
(5) Relation to Prior Decompositions
(6) Concluding Remarks
Mediation
In some research contexts we might be interested in the extent to
which the effect of some exposure A on some outcome Y is
mediated by an intermediate variable M and to what extent it is
direct
Stated another way, we are interested in the direct and indirect
effects of the exposure
In other research contexts we may be interested in whether A and
M interact in their effects, and how much of their effects are due
to interaction
[Figure: causal diagram relating exposure A, mediator M, and outcome Y]
Mediation
In some cases, we may be interested in both mediation and interaction
In 2008, GWAS found variants on 15q25.1 associated with lung cancer
(Thorgeirsson et al., 2008; Hung et al., 2008; Amos et al., 2008)
These same variants were known to be associated with smoking (average
cigarettes per day) (Saccone et al., 2007; Spitz et al., 2008)
The variants also increased vulnerability to the harmful effect of smoking, a
gene-environment interaction: e.g., carriers of the variant allele extract more
nicotine and toxins from each cigarette (Le Marchand, 2008)
The causal inference literature has developed methods that can assess
mediation in the presence of interaction to get direct and indirect effects
In this example from genetic epidemiology, most of the effect seemed
“direct” (94%) with respect to cigarettes per day (VanderWeele et al. 2012)
But this does not clarify the role of interaction itself
Notation
Let Y denote some outcome of interest for each individual
Let A denote some exposure or treatment of interest for
each individual
Let M denote some post-treatment intermediate(s) for each
individual (potentially on the pathway between A and Y)
Let C denote a set of covariates for each individual
Let Y_a be the counterfactual outcome (or potential outcome)
Y for each individual when intervening to set A to a
Let M_a be the counterfactual value of M for each individual
when intervening to set A to a
Let Y_am be the counterfactual outcome Y for each individual
when intervening to set A to a and M to m
A Unification of Mediation and
Interaction
We can in fact decompose a total effect, TE = Y_1 - Y_0, into four components
(VanderWeele, 2014) under the “composition” assumption that Y_a = Y_{aM_a}
(1) A controlled direct effect (CDE): the effect of A in the absence of M
(2) A reference interaction (INTref): the interaction that operates only if the
mediator is present in the absence of exposure
(3) A mediated interaction (INTmed): the interaction that operates only if the
exposure changes the mediator
(4) A pure indirect effect (PIE): the effect of the mediator in the absence of
the exposure times the effect of the exposure on the mediator
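For binary A and M, the four components can be written out at the individual level. The display below is a sketch in the counterfactual notation of the Notation slide; it assumes the composition assumption and takes M = 0 as the reference level of the mediator:

```latex
\begin{align*}
Y_1 - Y_0 &= \underbrace{(Y_{10} - Y_{00})}_{\text{CDE}}
  + \underbrace{(Y_{11} - Y_{10} - Y_{01} + Y_{00})\,M_0}_{\text{INT}_{\text{ref}}} \\
&\quad + \underbrace{(Y_{11} - Y_{10} - Y_{01} + Y_{00})(M_1 - M_0)}_{\text{INT}_{\text{med}}}
  + \underbrace{(Y_{01} - Y_{00})(M_1 - M_0)}_{\text{PIE}}
\end{align*}
```

Expanding Y_1 = Y_{10} + (Y_{11} - Y_{10}) M_1 and Y_0 = Y_{00} + (Y_{01} - Y_{00}) M_0 and collecting terms recovers the right-hand side.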
A Unification of Mediation and
Interaction
We can summarize the four components as:
(1) CDE: Neither mediation nor interaction
(2) INTref: Interaction but not mediation
(3) INTmed: Both mediation and interaction
(4) PIE: Mediation but not interaction
A Unification of Mediation and
Interaction
We cannot identify these effects for an individual but, under certain
confounding assumptions (next slides), we can identify them on average for
a population. If so, and we let p_am = P(Y=1|A=a,M=m), then we have:
We could calculate the proportions due to each of the components:
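A minimal numeric sketch of these population-average components and their proportions, assuming binary A and M, no covariates C, and that the confounding assumptions hold. The function name `four_way`, its argument names, and the input probabilities are illustrative assumptions, not values from the talk:

```python
def four_way(p, m0, m1):
    """Four-way decomposition on the difference scale for binary A and M.

    p  : dict mapping (a, m) -> P(Y=1 | A=a, M=m)   (p_am in the slides)
    m0 : P(M=1 | A=0)
    m1 : P(M=1 | A=1)
    """
    interaction = p[1, 1] - p[1, 0] - p[0, 1] + p[0, 0]  # additive interaction contrast
    comps = {
        "CDE": p[1, 0] - p[0, 0],                # effect of A with M fixed to 0
        "INTref": interaction * m0,              # interaction, mediator present without exposure
        "INTmed": interaction * (m1 - m0),       # interaction, exposure changes the mediator
        "PIE": (p[0, 1] - p[0, 0]) * (m1 - m0),  # mediator effect times exposure effect on mediator
    }
    total = sum(comps.values())                  # equals the total effect E[Y_1 - Y_0]
    props = {k: v / total for k, v in comps.items()}  # proportion due to each component
    return comps, total, props


# Hypothetical inputs, chosen only for illustration.
p = {(0, 0): 0.10, (0, 1): 0.20, (1, 0): 0.15, (1, 1): 0.40}
comps, total, props = four_way(p, m0=0.30, m1=0.50)
for name in comps:
    print(f"{name}: {comps[name]:.3f} ({props[name]:.1%})")
print(f"TE: {total:.3f}")
```

The four components sum to the total effect, so the proportions sum to one.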
A Unification of Mediation and
Interaction
The four components are:
We could add E[INTref] and E[INTmed] for the overall proportion due to interaction:
We could add E[PIE] and E[INTmed] for the overall proportion due to mediation:
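The two sums can be written compactly. The abbreviations PAI (proportion attributable to interaction) and PM (proportion mediated) are labels assumed here for convenience:

```latex
\begin{align*}
\text{PAI} &= \frac{E[\text{INT}_{\text{ref}}] + E[\text{INT}_{\text{med}}]}{E[\text{TE}]}, &
\text{PM}  &= \frac{E[\text{PIE}] + E[\text{INT}_{\text{med}}]}{E[\text{TE}]}
\end{align*}
```

Because E[INTmed] appears in both numerators, the two proportions overlap by E[INTmed]/E[TE] and should not be added together.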
Identification
The confounding assumptions are the same as those generally used in the
causal inference literature to identify direct and indirect effects:
(1) There are no unmeasured exposure-outcome confounders given C
(2) There are no unmeasured mediator-outcome confounders given (C,A)
(3) There are no unmeasured exposure-mediator confounders given C
(4) None of the mediator-outcome confounders are affected by exposure
For controlled direct effects,
only assumptions (1) and (2)
are needed
Note (1) and (3) are guaranteed
when treatment is randomized
[Figure: causal diagram with exposure A, mediator M, outcome Y, and confounders C1, C2, C3]
Identification
More formally, in counterfactual notation, these assumptions are:
(1) is Y_am ⫫ A | C
(2) is Y_am ⫫ M | C, A
(3) is M_a ⫫ A | C
(4) is Y_am ⫫ M_a* | C
Regression Approach
Under the confounding assumptions we can estimate each of the four
components in a straightforward way using regression models for Y and M:
Under these models, if our confounding assumptions hold, then the effects for a
change in the exposure from reference level a* to level a are given by:
Similar results hold if one or both of A and M are binary
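As a sketch of what such regression-based expressions look like, assume continuous Y and M, the linear models E[Y|a,m,c] = θ0 + θ1·a + θ2·m + θ3·a·m + θ4'c and E[M|a,c] = β0 + β1·a + β2'c, and a mediator reference level of 0 for the CDE. These model forms, the function name, and the coefficient values are assumptions made for illustration and are not taken from the slide:

```python
def four_way_linear(theta1, theta2, theta3, beta0, beta1, beta2_c, a, a_star):
    """Four-way components under linear models for Y and M (mediator reference M = 0).

    theta1, theta2, theta3 : coefficients of A, M, and A*M in the outcome model
    beta0, beta1           : intercept and coefficient of A in the mediator model
    beta2_c                : covariate contribution beta2'c at the chosen covariate values
    a, a_star              : exposure level and reference exposure level
    """
    d = a - a_star
    m_ref = beta0 + beta1 * a_star + beta2_c   # E[M | A = a*, C = c]
    return {
        "CDE": theta1 * d,
        "INTref": theta3 * m_ref * d,
        "INTmed": theta3 * beta1 * d * d,
        "PIE": (theta2 * beta1 + theta3 * beta1 * a_star) * d,
    }


# Hypothetical coefficients, for illustration only.
coef = dict(theta1=0.5, theta2=0.8, theta3=0.2, beta0=1.0, beta1=0.6, beta2_c=0.0)
comps = four_way_linear(**coef, a=1, a_star=0)
total = sum(comps.values())
print({k: round(v / total, 3) for k, v in comps.items()})
```

Under these assumed models, CDE + INTref gives the pure direct effect and PIE + INTmed gives the total indirect effect.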
Relation to Mediation Decompositions
Our basic four-way decomposition was: TE = CDE + INTref + INTmed + PIE
If we combine the CDE and INTref we obtain what is sometimes called the
“natural/pure direct effect”
If we combine the PIE and INTmed we obtain what is sometimes called the
“natural/total indirect effect” (Robins and Greenland, 1992; Pearl, 2001)
PDE = Pure direct effect (natural direct effect) = CDE + INTref
TIE = Total indirect effect (natural indirect effect) = PIE + INTmed
These are also sometimes called natural direct and indirect effects
This is the decomposition of Robins and Greenland (1992) and Pearl (2001)
This is essentially the decomposition used in epidemiology and the social
sciences when interaction is absent
Relation to Prior Decompositions
VanderWeele and Tchetgen Tchetgen (2014) also showed the total effect could be
divided into CDE, PIE and a proportion attributable to interaction; the 4-way
decomposition unites all the others. We can summarize in a figure:
Ratio Scale
A similar four-way decomposition also holds using a ratio scale,
where RR_am = p_am / p_00 and where κ = p_00 / p_{a=0} is a scaling factor
If we divide each component by the sum, then κ drops out:
We can estimate the components using logistic regression (SAS code is available)
We can also proceed with case-control data under a rare outcome assumption
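To see why κ drops out, the ratio-scale components for binary A and M can be sketched as below. Each one is the corresponding difference-scale component divided by P(Y_0 = 1) (written p_{a=0} on the slide), so κ multiplies every term and cancels in the proportions. The function name and the input probabilities are illustrative assumptions, and covariates are ignored:

```python
def four_way_ratio(p, m0, m1):
    """Ratio-scale four-way decomposition of RR_TE - 1 for binary A and M.

    p  : dict mapping (a, m) -> P(Y=1 | A=a, M=m)
    m0 : P(M=1 | A=0)
    m1 : P(M=1 | A=1)
    """
    rr = {am: p[am] / p[0, 0] for am in p}        # RR_am = p_am / p_00
    p_a0 = p[0, 1] * m0 + p[0, 0] * (1 - m0)      # P(Y_0 = 1)
    kappa = p[0, 0] / p_a0                        # scaling factor
    inter = rr[1, 1] - rr[1, 0] - rr[0, 1] + 1    # interaction contrast on the ratio scale
    comps = {
        "CDE": kappa * (rr[1, 0] - 1),
        "INTref": kappa * inter * m0,
        "INTmed": kappa * inter * (m1 - m0),
        "PIE": kappa * (rr[0, 1] - 1) * (m1 - m0),
    }
    excess = sum(comps.values())                  # equals RR_TE - 1
    props = {k: v / excess for k, v in comps.items()}  # kappa cancels in each ratio
    return comps, excess, props


# Hypothetical inputs, for illustration only.
p = {(0, 0): 0.10, (0, 1): 0.20, (1, 0): 0.15, (1, 1): 0.40}
comps, excess, props = four_way_ratio(p, m0=0.30, m1=0.50)
print(f"RR_TE = {1 + excess:.3f}")
print({k: round(v, 3) for k, v in props.items()})
```

With the same inputs, these proportions match the ones obtained on the difference scale.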
Genetic Epidemiology
In 2008, GWAS found variants on 15q25.1 associated with lung cancer
(Thorgeirsson et al., 2008; Hung et al., 2008; Amos et al., 2008)
These same variants were known to be associated with smoking (average
cigarettes per day) (Saccone et al., 2007; Spitz et al., 2008)
The variants also increased vulnerability to the harmful effect of smoking, a
gene-environment interaction e.g. carriers of the variant allele extract more
nicotine and toxins from each cigarette (Le Marchand, 2008)
When methods for direct and indirect effects were employed most of the
effect seemed “direct” with respect to cigarettes per day (VanderWeele et al.
2012)
But this did not fully capture the role of interaction; there was evidence for
such interaction (Li et al, 2010; Truong et al, 2010; VanderWeele et al, 2012)
Now we will examine what proportion of the effect is due (i) to just mediation,
(ii) to just interaction, (iii) to both and (iv) to neither
Genetic Epidemiology
The study sample consists of 1836 cases and 1452 controls from a
case-control study (cf. Miller et al., 2002) assessing the
molecular epidemiology of lung cancer, which began in 1992 at
the Massachusetts General Hospital (MGH)
Eligible cases included any person over the age of 18 years, with
a diagnosis of primary lung cancer that was further confirmed by
an MGH lung pathologist.
The controls were recruited from among the friends or spouses of
cancer patients or the friends or spouses of other surgery
patients in the same hospital.
Potential controls who carried a previous diagnosis of any cancer
(other than non-melanoma skin cancer) were excluded from
participation.
Genetic Epidemiology
Sample characteristics of cases and controls
_________________________________________________________________
                              Cases (N=1836)   Controls (N=1452)
_________________________________________________________________
Average Cigarettes per Day        25.42             13.97
Smoking Duration                  38.50             18.93
Age                               64.86             58.58
College Education                 31.3%             33.5%
Sex
  Male                            50.1%             56.1%
  Female                          49.9%             43.9%
rs8034191 C alleles
  0                               33.8%             43.3%
  1                               48.5%             43.7%
  2                               17.7%             13.0%
_________________________________________________________________
Assumptions About Confounding
To use our approach with the genetic variants we need to assume no
unmeasured confounding for the (1) exposure-outcome, (2) mediator-
outcome, and (3) exposure-mediator relationships
Assumptions (1) and (3) are probably plausible for the exposure (the
genetic variant) subject to no population stratification (the analysis was
restricted to Caucasians)
(2) No confounding may be less plausible for the smoking-lung
cancer association (e.g. SES / neighborhood)
We consider sensitivity analysis later
(4) Smoking duration may affect
cigarettes/day and lung cancer and
may be affected by the variant (though there
is not much evidence for this); results are similar
when duration is omitted
[Figure: causal diagram with nodes A, M, Y, C, and U]
Genetic Epidemiology
When we apply the four-way decomposition using logistic regression for lung
cancer and linear regression for the square root of average cigarettes per day
(this measure is more normally distributed), comparing 2 to 0 variant alleles,
we obtain a total effect risk ratio of RR=1.77 and:
Most of the direct effect (which is 94%) appears to be due to INTref, i.e. to be
due to interaction but not mediation; mediation is only about 6%
As suspected, the proportion due to interaction is substantial, but now we
can quantify this
Genetic Epidemiology
With the two-way decomposition (Robins and Greenland, 1992; Pearl, 2001;
VanderWeele et al., 2012) we obscure the role of interaction here because it
combines the CDE and INTref into the PDE
Study Summary
(1) Most of the effect seemed to be due to interaction, in the absence
of mediation (at least with respect to cigarettes per day)
(2) Both the mediated effect and the reference interaction may be
underestimated due to measurement error in the self-reported
cigarettes per day measure (Valeri et al., 2014)
(3) Other aspects of smoking (e.g. depth of inhalation) may mediate
more of the relationship
(4) The strong interaction might likewise depend on the smoking
variable used (cigarettes per day versus depth of inhalation)
(5) At least with respect to cigarettes per day, however, most of the
effect does not operate by increasing cigarettes per day (the variant
does this only by 1 CPD for smokers) but rather through the interaction
APOE and Memory
The same technique was applied to a similar genetic example with APOE
e4 alleles (Sajeev et al., 2015)
APOE e4 is associated with Alzheimer’s, cognitive decline, memory, etc.
To what extent is the effect of e4 alleles on memory mediated by
cerebrovascular disease markers e.g. microbleeds?
To what extent is it due to interaction?
Data come from 4121 participants in the population-based Age-
Gene/Environment Susceptibility (AGES) Study in Reykjavik, Iceland
We use techniques for the 4-way decomposition
All models adjusted for age, sex, education, diabetes, smoking status,
and midlife measures of physical activity, body mass index, systolic blood
pressure, and total cholesterol
Concluding Remarks
One further theoretical point is of interest
Sometimes a portion eliminated measure is proposed as being of more policy
relevance (Robins and Greenland, 1992; VanderWeele, 2013):
E[PE] = E[TE] - E[CDE] = E(Y_1 - Y_0) - E(Y_10 - Y_00)
i.e. what portion (or proportion) of the effect could we eliminate if we set M=0
The four-way decomposition gives a further causal interpretation of PE:
E[PE] = E[INTref] + E[INTmed] + E[PIE]
i.e. it is the proportion due to mediation or interaction or both
When we fix the mediator to 0, we eliminate both mediation and interaction
This is different from the portion mediated (PM) in that it includes INTref
In the example PM=6% but PE=61% because of the interaction
The CDE (and thus also the portion eliminated) is easier to identify from the
data (only confounding assumptions 1 and 2, not 3 and 4, are required)
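The contrast between the portion eliminated and the portion mediated follows directly from the four components. A small sketch, with hypothetical component values chosen only for illustration (they are not the study estimates):

```python
def portion_eliminated(comps):
    """Proportion of the total effect removed by fixing M = 0: (TE - CDE) / TE."""
    total = sum(comps.values())
    return (comps["INTref"] + comps["INTmed"] + comps["PIE"]) / total


def portion_mediated(comps):
    """Proportion of the total effect due to mediation: (PIE + INTmed) / TE."""
    total = sum(comps.values())
    return (comps["PIE"] + comps["INTmed"]) / total


# Hypothetical components on a common scale, for illustration only.
comps = {"CDE": 0.40, "INTref": 0.50, "INTmed": 0.04, "PIE": 0.06}
pe = portion_eliminated(comps)
pm = portion_mediated(comps)
print(f"PE = {pe:.2f}, PM = {pm:.2f}, difference (INTref share) = {pe - pm:.2f}")
```

The gap between the two measures is exactly the share of the effect due to INTref, which is why they diverge when the reference interaction is large.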
Concluding Remarks
(1) The four-way decomposition makes clear what proportion of an
effect is due (i) to just mediation, (ii) to just interaction, (iii) to both
and (iv) to neither
(2) It unites, within a single framework, prior decompositions for
mediation and prior decompositions for interaction
(3) It gives the most insight into both phenomena of mediation and
interaction (cf. VanderWeele, 2015)
(4) It is relatively straightforward to implement with SAS code
(5) Sensitivity analyses for measurement error and unmeasured
confounding are available for some mediation and interaction
measures; it would be good to extend these to cover each of the
four components
Explanation in Causal Inference: Methods for Mediation and Interaction
Oxford University Press │ 2015 │ Hardcover │ ISBN: 9780199325870
References
Amos, C.I., et al. Genome-wide association scan of tag SNPs identifies a
susceptibility locus for lung cancer at 15q25.1. Nat. Genet. 40, 616-622 (2008).
Baron RM, Kenny DA. The moderator-mediator variable distinction in social
psychological research: conceptual, strategic, and statistical considerations.
Journal of Personality and Social Psychology. 1986; 51:1173-1182.
Chanock, S.J. & Hunter, D.J. When the smoke clears… Nature 452, 537-538 (2008).
Hosmer, D.W., Lemeshow, S. (1992). Confidence interval estimation of interaction.
Epidemiology 3:452-56.
Hung, R., et al. A susceptibility locus for lung cancer maps to nicotinic acetylcholine
receptor subunit genes on 15q25. Nature 452, 633-637 (2008).
Imai, K., Keele, L., Yamamoto, T. (2010a). Identification, inference, and sensitivity
analysis for causal mediation effects. Statistical Science, 25:51-71.
Imai, K., Keele, L., Tingley, D. (2010b). A general approach to causal mediation
analysis. Psychological Methods, 15:309-334.
Imai, K., Keele, L., Tingley, D., Yamamoto, T. (2010c). Causal mediation analysis
using R. In: H.D. Vinod (ed.), Advances in Social Science Research Using R. New
York: Springer (Lecture Notes in Statistics), p.129-154.
Judd CM, Kenny DA. Process analysis: estimating mediation in treatment
evaluations. Eval Rev, 1981;5:602-619.
Lange, T., Vansteelandt, S., and Bekaert, M. (2012). A simple unified approach for
estimating natural direct and indirect effects. Am J Epidemiol, 176:190-195.
Le Marchand, L., et al. Smokers with the CHRNA lung cancer-associated variants
are exposed to higher levels of nicotine equivalents and a carcinogenic tobacco-
specific nitrosamine. Cancer Res. 68, 9137-9140 (2008).
Pearl J. (2001). Direct and indirect effects. In Proceedings of the Seventeenth
Conference on Uncertainty in Artificial Intelligence, 411-20. Morgan Kaufmann,
San Francisco.
Robins JM, Greenland S. (1992). Identifiability and exchangeability for direct and
indirect effects. Epidemiology 3, 143-155.
Rothman, K. J. Modern Epidemiology. 1st ed. Little, Brown and Company, Boston,
MA (1986).
Tchetgen Tchetgen, E.J. (2011). On causal mediation analysis with a survival
outcome. International Journal of Biostatistics, 7:Article 33, 1-38.
Tchetgen Tchetgen, E.J. and Shpitser, I. (2012). Semiparametric theory for causal
mediation analysis: efficiency bounds, multiple robustness, and sensitivity analysis.
Annals of Statistics, 40:1816-1845.
Valeri, L., Lin, X., and VanderWeele, T.J. (2014). Mediation analysis when a
continuous mediator is measured with error and the outcome follows a generalized
linear model. Statistics in Medicine.
Valeri, L. and VanderWeele, T.J. (2013). Mediation analysis allowing for exposure-mediator
interactions and causal interpretation: theoretical assumptions and implementation
with SAS and SPSS macros. Psychological Methods, 18:137-150.
VanderWeele, T.J. (2010). Bias formulas for sensitivity analysis for direct and indirect
effects. Epidemiology, 21:540-551.
VanderWeele, T.J. (2011). Causal mediation analysis with survival data. Epidemiology,
22:582-585.
VanderWeele, T.J. (2013). Policy-relevant proportions for direct effects.
Epidemiology, 24:175-176.
VanderWeele, T.J. (2013). A three-way decomposition of a total effect into direct,
indirect, and interactive effects. Epidemiology, 24:224-232.
VanderWeele, T.J. (2014). A unification of mediation and interaction: a four-way
decomposition. Epidemiology, 25:749-761.
VanderWeele, T.J. (2015). Explanation in Causal Inference: Methods for Mediation
and Interaction. Oxford University Press: New York, in press.
VanderWeele, T.J., Asomaning, K., Tchetgen Tchetgen, E.J., Han, Y., Spitz, M.R.,
Shete, S., Wu, X., Gaborieau, V., Wang, Y., McLaughlin, J., Hung, R.J., Brennan, P.,
Amos, C.I., Christiani, D.C. and Lin, X. (2012). Genetic variants on 15q25.1, smoking
and lung cancer: an assessment of mediation and interaction. American Journal of
Epidemiology, 175:1013-1020.
VanderWeele, T.J. and Tchetgen Tchetgen, E.J. (2014). Attributing effects to
interactions. Epidemiology, 25:711-722.
VanderWeele, T.J. and Vansteelandt, S. (2009). Conceptual issues concerning
mediation, interventions and composition. Statistics and Its Interface 2:457-468.
VanderWeele, T.J. and Vansteelandt, S. (2010). Odds ratios for mediation analysis
with a dichotomous outcome. American Journal of Epidemiology, 172:1339-1348.
VanderWeele, T.J. and Vansteelandt, S. (2013). Mediation analysis with multiple
mediators. Epidemiologic Methods, 2:95-115.