Summary of relationships between exchangeability, biasing paths and bias
William Dana Flanders • Ronald Curtis Eldridge
Received: 22 January 2014 / Accepted: 15 May 2014
© Springer Science+Business Media Dordrecht 2014
Abstract Definitions and conceptualizations of confounding and selection bias have evolved over the past several decades. An important advance occurred with development of the concept of exchangeability. For example, if exchangeability holds, risks of disease in an unexposed group can be compared with risks in an exposed group to estimate causal effects. Another advance occurred with the use of causal graphs to summarize causal relationships and facilitate identification of causal patterns that likely indicate bias, including confounding and selection bias. While closely related, exchangeability is defined in the counterfactual-model framework and confounding paths in the causal-graph framework. Moreover, the precise relationships between these concepts have not been fully described. Here, we summarize definitions and current views of these concepts. We show how bias, exchangeability and biasing paths interrelate and provide justification for key results. For example, we show that absence of a biasing path implies exchangeability but that the reverse implication need not hold without an additional assumption, such as faithfulness. The close links shown are expected. However, confounding, selection bias and exchangeability are basic concepts, so comprehensive summarization and definitive demonstration of links between them is important. Thus, this work facilitates and adds to our understanding of these important biases.
Keywords Confounding · Exchangeability · Bias · Biasing path · Causal effect · Directed acyclic graph
Introduction
Confounding and selection bias are important biases that
can affect observational studies [1]. Conceptually, con-
founding can be defined as a mixing of the effects of an
extraneous variable with those of the factor of interest so as
to distort the observed association [2] and selection bias as
distortion due to the way subjects are selected, enrolled or
participate. Our understanding of and ability to identify
these fundamental biases have evolved substantially [1, 3,
4].
A particularly important advance occurred with publi-
cation of a classic paper on exchangeability [5].
Exchangeability is couched in the causal language of
counterfactuals and defined using potential-outcome mod-
els [5–7]. It defines conditions under which the observed
risks in one or more substitute groups can be used to
replace risks in a population of interest (the target popu-
lation) under conditions other than those which actually
occurred in the target. If the substitute risks differ from
those in the target for the hypothesized conditions, bias is
present [5, 7]. Although less appreciated, exchangeability
also concerns selection bias [7].
Another important advance in our ability to identify
confounding and selection bias occurred with development
of causal diagrams as tools for representing causal rela-
tionships [8–10]. These diagrams, often directed acyclic
graphs (DAGs), provide a convenient description of
assumed causal relationships among exposure, disease and
covariates [11–13]. Rules for constructing and interpreting
Electronic supplementary material The online version of this article (doi:10.1007/s10654-014-9915-2) contains supplementary material, which is available to authorized users.
W. D. Flanders (corresponding author) · R. C. Eldridge
Atlanta, GA 30322, USA
e-mail: [email protected]
these diagrams provide links to causal models. The pre-
sence of certain paths, called biasing paths (defined below),
indicates that bias is likely and rules for identifying these
paths facilitate assessment of confounding and decisions
about whether to control for a covariate [10, 13].
Epidemiologists know well that exchangeability, con-
founding, selection bias and biasing paths in DAGs are
intimately related. For example, Greenland and Robins
described exchangeability assumptions that are part of
confounder adjustment methods [5]. Although many
reviews of causal models, biases and their inter-relation-
ships are available [6, 9, 10, 13–16] including those of
confounding paths and exchangeability [1, 5, 7], some
aspects of the relationships between exchangeability and
biasing paths are less well-documented.1
Our purpose is to review and summarize how
exchangeability, bias, and biasing paths inter-relate. When
multiple definitions are available (e.g., confounding), we
attempt to choose one that is commonly used by many, if
not all epidemiologists. Using these definitions, we provide
arguments about why each of the relationships summarized
in Table 1 should hold, and why certain invalid ‘‘conclu-
sions’’ (Table 2) should not. Our arguments also illustrate
representation of the same causal relationships in different
frameworks: the POM (used in defining exchangeability), the structural-equation (closely related to DAGs), and the DAG (in which biasing paths are defined) frameworks.
This manuscript is organized as follows. First, we briefly
review key terminology and concepts; short reviews of
causal and Markovian models appear in the Appendix 1 as
many excellent summaries are available [1, 6, 8, 10, 13, 14,
17, 18]. Second, we consider an example with 3 variables
to illustrate representation of the same causal relationships
in both the DAG and the POM framework, linking the two
by using structural equations [13, p. 27]. In the results
section, we provide arguments for each of the claims in
Table 1 and counterexamples showing that the premises in
Table 2 are insufficient for the listed ‘‘conclusion’’.
Finally, in the discussion, we note the implications of these
results.
Definitions and assumptions
Throughout the text, we assume that exposure (E) precedes
the dichotomous outcome of interest (D). We also assume
no misclassification and, since our interest is in bias, that
the population is large enough that population frequencies
differ negligibly from the corresponding probabilities.
Although other definitions are available, we choose rela-
tively standard ones and focus on our main goal—sum-
marizing the inter-relationships between the different
conceptualizations related to bias.
Potential-outcome model
A POM conceptualizes each individual as having an out-
come if she was exposed and another outcome if she had
Table 1 Key relationships between exchangeability, bias, biasing paths

Claim   Premise                                      Conclusion (follows from premise)
1A      Exchangeability + consistency                No bias
1B      Exchangeability + faithfulness               Absence of a biasing path
2A      Absence of a biasing path                    Exchangeability
2B      Absence of a biasing path + consistency      No bias
2C      No bias + consistency                        Exchangeability

Two additional assumptions—consistency and faithfulness—allow further implications
Table 2 Additional relationships that do not validly follow from the premise

Claim   Premise                                                        Invalid conclusion (additional assumption(s) needed)
3       Exchangeability                                                Absence of a biasing path (example)
4A      No bias                                                        Absence of a biasing path (example)
4B      Biasing path                                                   Bias (converse of 4A)
5       Partial exchangeability (p1 + p3 = q1 + q3) and consistency   No bias, when the target population is the entire cohort (Appendix 2 of Supplementary material)
Table 3 Potential outcomes [D(e)^a for D, dichotomous exposure E (1 denotes development of the outcome)]

Potential-outcome type   D(1): outcome D if exposed   D(0): outcome D if unexposed   Population frequency in exposed group   Population frequency in unexposed group
1                        1                            1                              p1                                      q1
2                        1                            0                              p2                                      q2
3                        0                            1                              p3                                      q3
4                        0                            0                              p4                                      q4

^a D(e) is the value of D for an individual, if we intervened to set E to e, for e = 0, 1
1 For example, a Google Scholar search (1/6/14) for "exchangeability" and "confounding path" identified 25 publications; none give conditions for which exchangeability implies no confounding path, or conversely.
been unexposed [6, 11, 13, 19]. Using Rubin’s notation
[19], D(e) denotes the outcome D for an individual, if we
intervened to set E to e, for e = 0, 1. This framework is
closely related to some definitions of causality [17] and is
often labeled ‘‘counterfactual’’ [6] since only one potential
outcome can actually be observed. For a dichotomous
exposure and disease, four potential-outcome patterns are
possible [5] (Table 3). The individual causal effect [20] is
D(1) - D(0), and the population effect is the correspond-
ing average over the target population [21] E[D(1)] -
E[D(0)]. We also assume that exposure of one individual
doesn’t affect the outcome of others [17, 22, 23] (no
interference; Appendix 1).
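As a concrete, purely illustrative sketch of this framework (not from the paper), the Python fragment below encodes the four potential-outcome types of Table 3 with assumed frequencies pi and qi, computes the causal risk difference among the exposed, and evaluates the two frequency conditions discussed later in the text (partial and "complete" exchangeability). The numeric values are our own assumptions.

# Illustrative sketch with assumed frequencies (not values from the paper)
types = {1: (1, 1), 2: (1, 0), 3: (0, 1), 4: (0, 0)}   # type i -> (D(1), D(0)), as in Table 3
p = {1: 0.10, 2: 0.20, 3: 0.05, 4: 0.65}               # assumed frequencies, exposed group
q = {1: 0.10, 2: 0.20, 3: 0.05, 4: 0.65}               # assumed frequencies, unexposed group

# Risk among the exposed if exposed (p1 + p2) and, counterfactually, if unexposed (p1 + p3)
risk_exposed_if_exposed = sum(p[i] for i, (d1, _) in types.items() if d1 == 1)
risk_exposed_if_unexposed = sum(p[i] for i, (_, d0) in types.items() if d0 == 1)
print(round(risk_exposed_if_exposed - risk_exposed_if_unexposed, 4))   # causal RD among the exposed: 0.15

partial = abs((p[1] + p[3]) - (q[1] + q[3])) < 1e-12                   # p1 + p3 = q1 + q3
complete = partial and abs((p[1] + p[2]) - (q[1] + q[2])) < 1e-12      # and p1 + p2 = q1 + q2
print(partial, complete)                                                # True True for these frequencies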
Directed acyclic graph (DAG)
Rules for constructing, using and interpreting DAGs are
reviewed in detail elsewhere [2, 6, 11, 13, 18]. Briefly, each
node or vertex (letter) in the DAG represents a character-
istic or event (variable) and each edge represents an effect,
with the arrow pointing from the causal to the affected
factor. A directed path between variables is a contiguous
sequence of arrows all pointing in the same direction; an
undirected path is a path wherein all arrows do not point in
the same direction. A DAG is acyclic because it contains
no loops, and it corresponds to a Markovian causal model
(see Appendix 1). A collider is a node where two arrow-
heads intersect. A path is blocked if it includes: either a
non-collider that is controlled analytically (e.g., using a
correctly specified model or stratification), or an uncon-
trolled collider whose descendants are uncontrolled. A path
is open if, along the path, all colliders or descendants of
colliders are controlled and no non-colliders are controlled.
If two or more variables are marginally associated, then the
DAG must contain an open path between them. Although
often not graphed, each node can have independent errors
or disturbances Ui that represent unmeasured or unknown
causes. Ancestors of a variable are all other factors in the
graph connected by a directed path ending with an arrow
into the variable. Parents are ancestors connected directly
to the variable.
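The path-blocking rules above can be made mechanical. The following self-contained Python sketch (our own illustration, not the authors' code) enumerates undirected paths between E and D in a small DAG in which C affects both E and D and E affects D (the structure of Fig. 1a below), and reports which open, undirected paths remain for a given set of controlled variables.

# Minimal sketch of the path-blocking rules just described (illustrative, not from the paper)
def descendants(dag, node):
    # All descendants of `node` in `dag` (dict mapping node -> list of children)
    seen, stack = set(), list(dag.get(node, []))
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(dag.get(n, []))
    return seen

def undirected_paths(dag, start, end):
    # Simple paths from start to end, ignoring edge direction
    nbrs = {}
    for a, kids in dag.items():
        for b in kids:
            nbrs.setdefault(a, set()).add(b)
            nbrs.setdefault(b, set()).add(a)
    def walk(node, path):
        if node == end:
            yield path
            return
        for nxt in sorted(nbrs.get(node, ())):
            if nxt not in path:
                yield from walk(nxt, path + [nxt])
    yield from walk(start, [start])

def is_open(dag, path, controlled):
    # Open path: every collider (or a descendant of it) is controlled,
    # and no non-collider on the path is controlled
    for i in range(1, len(path) - 1):
        prev, node, nxt = path[i - 1], path[i], path[i + 1]
        collider = node in dag.get(prev, []) and node in dag.get(nxt, [])
        if collider:
            if node not in controlled and not (descendants(dag, node) & controlled):
                return False
        elif node in controlled:
            return False
    return True

def is_directed(dag, path):
    # True if all arrows along the path point the same way (a directed, causal path)
    fwd = all(path[i + 1] in dag.get(path[i], []) for i in range(len(path) - 1))
    bwd = all(path[i] in dag.get(path[i + 1], []) for i in range(len(path) - 1))
    return fwd or bwd

dag = {"C": ["E", "D"], "E": ["D"]}          # C -> E, C -> D, E -> D
for controlled in (set(), {"C"}):
    open_undirected = [p for p in undirected_paths(dag, "E", "D")
                       if not is_directed(dag, p) and is_open(dag, p, controlled)]
    print(controlled, open_undirected)
# set()  [['E', 'C', 'D']]  -> the open, undirected (biasing) path through C
# {'C'}  []                 -> controlling C blocks it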
Bias
Bias, in general, refers to an expected difference between
the estimand (what is purportedly being estimated), and the
estimator used to estimate it. We define the observed RD
and risk ratio (RR) as unbiased [17] if and only if:

E[R1] - E[R0] = E[D(1)] - E[D(0)] and E[R1]/E[R0] = E[D(1)]/E[D(0)]    (1)

where R1 and R0 are the observed risks in the exposed and unexposed. Expectations are for the target, which here is the entire population, although other targets are possible. Equations (1) imply that the observed risk difference (RD) and ratio are unbiased for the causal effect measured on the difference and ratio scales, respectively. The RD and RR are conditionally unbiased if Eqs. (1) hold in each stratum of covariates C.
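As a tiny numeric sketch of Eq. (1), the fragment below checks the unbiasedness condition on both scales for assumed, illustrative values of the counterfactual and observed risks (the values are our own, not from the paper).

E_D1, E_D0 = 0.15, 0.10   # assumed E[D(1)], E[D(0)]
E_R1, E_R0 = 0.15, 0.10   # assumed E[R1], E[R0]
print(abs((E_R1 - E_R0) - (E_D1 - E_D0)) < 1e-12,   # difference scale
      abs(E_R1 / E_R0 - E_D1 / E_D0) < 1e-12)       # ratio scale -> True True: Eq. (1) holds here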
Collider bias
Analytic control for, or selection based on, a collider opens
the path connecting the parents of the collider and is
expected to change the association between the parents in
at least one category of the collider. Bias due to such
control or selection is called collider bias [1] (structural
selection bias; Appendix 1). When subjects are selected
based on a collider, collider bias is sometimes called
selection bias [1].
Biasing path
In a DAG [6, 8, 11, 17], a biasing path is an open (un-
blocked), undirected path (arrows in both directions)
between exposure and disease; a confounding path is a
biasing path ending with an arrow into disease [10, 18].
The DAG in Fig. 1a illustrates a simple biasing path (from
E to C to D). A biasing path, if not blocked (e.g., by control of C in Fig. 1a), suggests that bias must be suspected and that exchangeability is unlikely.
If exposure precedes disease, a biasing path is a con-
founding path or one that involves conditioning on a col-
lider. However, some overlap exists between confounding
paths and those representing selection bias and more gen-
erally collider bias (see Appendix 1) so we often use the
more generic ‘‘biasing paths’’.
A backdoor path is an undirected path between exposure
and disease, with an arrow into exposure [13]. Absent
conditioning, biasing paths, confounding paths and open backdoor paths coincide [1] (please see "backdoor criterion" below).

Fig. 1 a Causal graph illustrating a simple, confounding path from E to C to D. E represents an exposure, C a confounder, and D the outcome. b Causal graph illustrating an indirect effect of E, from E to C and from C to D, and a direct effect of E on D. E represents an exposure, C an intermediate factor, and D the outcome. In Fig. 1b, there is no biasing path
Confounding
Many definitions of confounding are available. Our main
focus concerns the relationships between exchangeability,
biasing paths and bias. However, for completeness we
define the exposure-disease relationship to be confounded
[1] if there is a confounding path between them, and
unconfounded otherwise. Similarly, we define confounding
[11] as: ‘‘Assuming that exposure precedes disease, con-
founding will be present if and only if exposure would
remain associated with disease even if all exposure effects
were removed, prevented, or blocked’’ (see Appendix 1).
Exchangeability
Exchangeability is defined in the POM framework and is
closely related to the concepts of confounding, bias and a
biasing pathway defined in the DAG framework. Green-
land and Robins [5] defined partial exchangeability to hold
if and only if the population frequencies of potential-out-
come types (Table 3) in the exposed (pi) and the unexposed
(qi) groups satisfy: p1 + p3 = q1 + q3.
Partial exchangeability and consistency imply that the
risk in the unexposed subpopulation equals what the risk in
the exposed subpopulation would have been if, contrary to
fact, the exposed had been unexposed [7] and that the
effect of exposure among the exposed is estimated by the
observed risk RD. However when the target is the full
population, unbiased estimation of causal effects (both
difference and ratio scales) requires an additional
assumption (Table 2; Appendix 2 of Supplementary
material). Thus, we use Hernan and Robins' [17, 24] definition of exchangeability:

D(e) ⊥ E,    (2)

meaning D(e) is statistically independent (⊥) of E. This definition, adopted throughout, coincides with a type of "complete" exchangeability [5]: p1 + p3 = q1 + q3 and p1 + p2 = q1 + q2 (Appendix 2 of Supplementary material). If exchangeability holds in each stratum of covariates C, we write D(e) ⊥ E | C.
Consistency
Consistency is a property typically assumed to hold for
potential outcome models. Consistency is said to hold if
D(e) = D if E = e. It provides a link between potential
outcomes D(e), at least one of which is unobservable, and
the observable outcome D.
Faithfulness
Faithfulness is a property, sometimes assumed, that relates
DAGs and statistical independencies. Faithfulness holds if
an open path between two factors in a DAG implies that
they must be associated [1, 13] conditional on any con-
trolled factors; the independencies are ‘‘stable’’ under
alternative parameters. However, real situations exist that
approximate independence due to near cancelation of
effects, so this assumption can be controversial [1].
Example: A simple DAG and structural equations
for three variables
We now consider a simple example that illustrates: (a) the
link between DAGs and a mathematical formulation of the
implied causal effects—structural equations (see Appendix
1); (b) the counterfactuals associated with the causal effects;
and, (c) a biasing path. This example also provides, with
modification, counter-examples for the ‘‘claims’’ in Table 2.
The DAG in Fig. 1a specifies a simple causal model (see
Appendix 1) for E, C and D wherein C affects both E and D
(i.e., a biasing, or more specifically a confounding path),
and E affects D. E might represent alcohol consumption
(drinkers vs. non-drinkers), C smoking, and D myocardial
infarction (MI). The biasing path suggests that smoking’s
effect on MI could distort the drinking-MI association.
Figure 2 explicitly includes independent disturbances or
error terms U1, U2 and U3, which can cause different
individuals to respond differently to the same combination
of measured factors.
We can also represent the causal model of Fig. 2 using
structural equations (Appendix 1):
D(e, c, u) = fD(e, c, u) for E = e, C = c, U1 = u;
E(c, v) = fE(c, v) for C = c, U2 = v;
C(w) = fC(w) for U3 = w    (3)
D(e, c, u) is an individual’s potential outcome if E were set
to e, C to c, and U1 to u, with corresponding definitions for
E(c, v) and C(w). Implied by the DAG, the population
distributions of U1, U2, and U3 are jointly independent.

Fig. 2 Causal graph illustrating a simple, confounding path from E to C to D, like that in Fig. 1a but with the independent errors now included (U1–U3). E represents an exposure, C a confounder, and D the outcome
Each effect in the DAG corresponds to a function, form
unspecified, that gives the potential outcomes for each
combination of parents. For example, the parents of D in
Fig. 2 are E, C and U1, so the function fD(.) that specifies
the potential outcomes for D across different combinations
of its parents depends on just E, C and U1.
Note 1
Exchangeability would imply that the MI-risk in non-drinkers represents the MI-risk that the drinkers would have had, had they not been drinkers.
Bias is absent if and only if the RD and RR comparing MI-risk among drinkers with that among non-drinkers equal the corresponding causal effects of drinking on MI-risk. In view of the biasing path, both exchangeability and no bias seem implausible if the DAG is correct. However, a biasing path does not necessarily imply bias (Table 2).
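To make this concrete, the simulation sketch below draws individuals from the structural equations of Eq. (3)/Fig. 2 under assumed functional forms and probabilities (the specific functions and numbers are our own illustrative choices, not the authors'), and contrasts the crude risk difference with the causal risk difference E[D(1)] - E[D(0)]. The gap between the two reflects the distortion produced by the E <- C -> D path.

import random

random.seed(1)
N = 500_000

def f_C(u3):                 # C (e.g., smoking): depends only on its own error
    return 1 if u3 < 0.4 else 0

def f_E(c, u2):              # E (e.g., drinking): assumed more likely if C = 1
    return 1 if u2 < (0.6 if c else 0.2) else 0

def f_D(e, c, u1):           # D (e.g., MI): assumed risk 0.05 + 0.05*e + 0.10*c
    return 1 if u1 < 0.05 + 0.05 * e + 0.10 * c else 0

observed = {0: [], 1: []}
potential = {0: [], 1: []}
for _ in range(N):
    u1, u2, u3 = random.random(), random.random(), random.random()
    c = f_C(u3)
    e = f_E(c, u2)
    observed[e].append(f_D(e, c, u1))     # factual outcome D
    potential[0].append(f_D(0, c, u1))    # potential outcome D(0)
    potential[1].append(f_D(1, c, u1))    # potential outcome D(1)

mean = lambda xs: sum(xs) / len(xs)
print("crude RD :", round(mean(observed[1]) - mean(observed[0]), 3))    # ~0.09: distorted by E <- C -> D
print("causal RD:", round(mean(potential[1]) - mean(potential[0]), 3))  # ~0.05 by construction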
Note 2
For this example, fD(e, c, u) = D(e), the potential D-out-
come if E were set to e, depends on c and u—the particular
individual’s values of C and U1. However, if E were a
cause of C (Fig. 1b), the G-computation algorithm and
composition [25] imply: D(e) = fD[e, c(e), u] where c(e) is
the potential outcome for C if E were set to e.
Results
Our main purpose in this section is to summarize and
justify the relationships between these conceptualizations
(Tables 1, 2). Most of the relationships are well-known or
expected; none are particularly surprising. Nevertheless, it
is instructive to justify the implications and identify what
additional assumptions may be needed for some.
Claim 1A Exchangeability and consistency imply no
bias.
Proof (See also Hernan and Robins [17]). Exchangeability implies P(D(1) = 1) = P(D(1) = 1|E = 1), and consistency implies P(D(1) = 1|E = 1) = P(D = 1|E = 1) = E[R1]. Similarly, P(D(0) = 1) = E[R0], which gives the condition for absence of bias.
Claim 1B Exchangeability and faithfulness imply
absence of a biasing path. The proof (Appendix 1) pro-
ceeds by showing that, if a biasing path did exist, then
either exchangeability or faithfulness could not hold.
Claim 2A Absence of a biasing path implies exchange-
ability. The proof (Appendix 1) uses the independence
between E and the parents of D that is implied by the
absence of a biasing path.
Note 3
Absence of a confounding path does not imply exchangeability. For example, under the DAG in Fig. 3, it is straightforward to choose parameters so that exchangeability does not hold, despite absence of a confounding path (although here, a biasing path is present).
Claim 2B Absence of a biasing path and consistency
imply no bias.
Proof We prove the contrapositive (bias and consistency imply a biasing path): by Claim 1A, bias and consistency imply non-exchangeability; by Claim 2A, non-exchangeability implies a biasing path.
Claim 2C No bias and consistency imply exchangeability.
Proof No bias implies E[D(1)] = P(D = 1|E = 1) (Appendix 2 of Supplementary material, Note S6). Consistency implies E[D(1)|E = 1] = P(D = 1|E = 1), but E[D(1)] = E[D(1)|E = 1]P(E = 1) + E[D(1)|E = 0]P(E = 0). Combining results gives: P(D = 1|E = 1) = P(D = 1|E = 1)P(E = 1) + E[D(1)|E = 0]P(E = 0), which implies E[D(1)|E = 0] = E[D(1)|E = 1], or D(1) ⊥ E. The corresponding result for D(0) establishes exchangeability.
Note 4
Hernan and Robins [17] state claim 2C (without proof), and
also define conditional exchangeability, under which exchangeability holds within each stratum of covariates.
With consistency, exchangeability and no bias are equiv-
alent (Claims 1A and 2C).
Claim 3 In a Markovian causal model, exchangeability
need not imply absence of a biasing path. The proof
(Appendix 1) is based on a counterexample wherein there
is a biasing path and exchangeability. The example illus-
trates how specific parameter choices can create indepen-
dence [e.g., P(D(1)|E = 1) = P(D(1)|E = 0)]. However,
the counterexample fails if we impose the condition of faithfulness, since the independence could be destroyed for other parameter choices (also, see Claim 1B).

Fig. 3 Causal graph illustrating a simple, biasing path from E to C to D that is not a confounding path; the path represents structural selection bias. E represents an exposure, B and C covariates (C is a collider), and D the outcome. This path would represent collider bias (structural selection bias), defined in Appendix 1. A box indicates control for the variable
Claim 4A Absence of bias does not imply absence of a
biasing path.
Claim 4B A biasing path does not imply bias. Claim 4B
is merely the contrapositive of 4A. We prove the claims
(Appendix 1) using a counterexample that has a biasing
path but no bias. The example is fine-tuned, so certain
effects cancel one another leaving no net bias. However,
small changes in parameterization would create bias so
faithfulness doesn’t hold.
Note 5
Absent stratification on any colliders, no confounding path
and consistency imply exchangeability. This claim follows
from claim 2A because, with no stratification, confounding
paths and biasing paths coincide. Conversely, exchange-
ability and faithfulness imply absence of a confounding
path by Claim 1B since all confounding paths are biasing
paths.
The Backdoor criterion is a condition potentially
describing a set of variables S. S meets the criterion if no
descendent of exposure is in S and if some member of S
intercepts every backdoor path. If S satisfies the backdoor
criterion, after conditioning on S no biasing path will
remain. Pearl [13, 26] shows that, with the backdoor cri-
terion, causal effects can be estimated (identified), condi-
tional on the variables in S. Even if not explicitly stated,
these conclusions presume no conditioning on variables
outside of S. The necessity of an additional assumption is
seen by considering a biasing path that is not a backdoor
path (as but one example, the top graph in Fig. 4). The
empty set satisfies the backdoor criterion as there are no
backdoor paths, yet a biasing path remains after condi-
tioning on S and the effect is not identifiable since bias is
expected due to conditioning on the collider C1.
If the backdoor criterion holds, then conditional on the variables in S and assuming no other conditioning, no biasing path exists, which implies no bias by Claim 1A. No bias, in turn, implies that the RD and RR can be estimated
by contrasts of observed, conditional risks [extension of
Eq. (1)], consistent with Pearl’s identifiability result.
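The identification result can be illustrated with a short standardization sketch: with S = {C} meeting the backdoor criterion for the DAG of Fig. 1a, the C-standardized risks sum_c P(D = 1|E = e, C = c)P(C = c) recover the counterfactual risks and hence the causal RD. The conditional probabilities below are assumed, illustrative values rather than values from the paper.

p_C = {0: 0.6, 1: 0.4}                                   # assumed P(C)
p_D1_given_EC = {(0, 0): 0.05, (0, 1): 0.15,             # assumed P(D = 1 | E, C)
                 (1, 0): 0.10, (1, 1): 0.20}

def standardized_risk(e):
    # Backdoor adjustment for C: sum over c of P(D = 1 | E = e, C = c) * P(C = c)
    return sum(p_D1_given_EC[(e, c)] * p_C[c] for c in p_C)

print(round(standardized_risk(1) - standardized_risk(0), 4))   # 0.05: the C-adjusted risk difference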
Discussion
We have considered the relationships between biasing
paths, exchangeability and bias, all well-known, important,
inter-related concepts. We showed that absence of a biasing
path, but not just absence of a confounding path, implies
exchangeability. We also showed that the converse doesn’t
hold (exchangeability doesn’t imply absence of a con-
founding path), although exchangeability and faithfulness
together imply absence of a biasing path. Further, presence
of a biasing path does not imply bias, although bias must be suspected unless certain effects cancel, a cancelation that is typically implausible. Such cancelation would likely be a violation of
faithfulness (e.g., as in Claim 4A).
If faithfulness holds, exchangeability is stronger than
absence of confounding in the sense that it implies absence of
confounding along with other biasing paths (e.g., selection
bias), whereas absence of confounding does not imply
exchangeability. These observations are consistent with the
original concept of exchangeability which referred indirectly
to selection bias, through the phrase ‘‘causal confounding’’ [7].
Because exchangeability is defined in the potential-
outcomes framework and biasing paths in the DAG
framework, we represented the same causal relationships in
both by using the well-known link, structural equations
[13]. These results illustrate, strengthen and further char-
acterize known links between exchangeability, confound-
ing and biasing paths. Understanding these links is
important because exchangeability and biasing paths are
two of the main ways to conceptualize, understand, identify
and even define bias. Indirectly, our results also illustrate
linking of concepts defined in the different frameworks by
considering the structural equations representation in the
DAG framework which we linked to potential-outcome
types and frequencies in the POM framework.
Construction of our causal DAGs and our POMs was
based on rules that link them to structural equations and to
population frequencies [13]. We stated our rules, attempt-
ing to use those that are common. However, different
definitions or rules for constructing and interpreting POMs
and DAGs could lead to different implications and links, so
clarity about these rules is vital. Furthermore, alternative
proofs are likely possible, perhaps using single world
intervention templates [27] or perhaps using other,
established mappings between graphs, potential outcomes and structural equations [13]. The relationships highlighted here aren't surprising and are perhaps known, but some may be less obvious or widely appreciated. Thus, this review provides a summary of interrelationships between key confounding and bias concepts, and a demonstration of the validity of the relationships or lack thereof in a single source.

Fig. 4 Three causal graphs illustrating different causal relationships and biasing paths. E represents an exposure, C1–C3 covariates, and D the outcome (also, see Appendix 1). A box indicates control for the variable
Linking the concepts of bias, exchangeability and con-
founding and biasing paths more tightly, while also
pointing to possible differences between them in the
absence of faithfulness, should provide greater insight into
each concept and allow the strengths of these concepts to
be used more completely together. Use of simple examples
illustrates these links. These ideas should facilitate teach-
ing and applied research since bias and confounding are
vital considerations in nearly every study. Clarity, discus-
sion and communication are facilitated through the ability
to relate concepts in one framework (e.g., POMs,
exchangeability) to corresponding concepts in the other
(e.g., DAGs, structural equations, biasing paths), and to
implications for bias; the relationships in Tables 1 and 2
should help that ability.
Acknowledgments We would like to thank and acknowledge Dr.
Sander Greenland (University of California Los Angeles) for his
helpful comments and correspondence in regards to this manuscript.
Appendix 1
In the Appendix 1, we define additional terms, state
assumptions and provide proofs of Claims 1B, 2A, 3, and 4.
Throughout, we assume that the causal model is Markovian
[13], defined below.
Causal models
Following Pearl [13, p. 27], a functional causal model, or
just causal model, is a set of structural equations that
determines the potential outcome xi of Xi for each depen-
dent variable by:
Xi(pai, u) = fi(pai, u), for i = 1, ..., n,    (4)
where: i indicates the variable, fi(.) is a function; pai is a
value of the variables PAi which are the parents of Xi (i.e.,
the immediate causes of Xi); and, u is a value of the error
term Ui and n is the number of variables. The errors,
sometimes called disturbances, are often unobserved and
could be viewed as representing omitted factors. In Eq. (3)
of the main text, Xi could represent D, and then PAD
consists of E and C, and Ui is U1.
The Eqs. (4) give the effect on each Xi that would result
from changing PAi or Ui from one value to another. They
are assumed to represent autonomous, causal mechanisms
or effects. When the form of the fi(.) is unspecified they
define a non-parametric structural-equations model
(NPSEM), which generalizes the linear structural-equa-
tions models with Gaussian errors often found in the
econometric and social literature [13]. The set of structural
equations provide formulas for determining potential out-
comes that would occur for actions of setting specific
combinations of the relevant parents [13].
Without some restriction, Di(e) could depend on the
exposure of other individuals as might, for example, be true
of communicable diseases. Rubin describes independence
(no interference), wherein exposure of one individual
doesn’t affect outcomes of others. Here, to avoid this
interference, we make a "stable unit treatment value assumption": Di(e) ⊥ Ei* for all i and all i* ≠ i.
Markovian causal model
If we draw an arrow from the direct causes of each variable
Xi (from each member of PAi) to Xi in a causal model, we
obtain a causal diagram for the model. Here, each node or
letter in a DAG represents a variable (Xi) and each arrow
represents a causal effect with the arrowhead pointing to
the effect. We call each variable (Xi) a node and each
arrow an effect. If the causal diagram contains no cycles
and the errors of each variable are jointly independent, we
call the causal model Markovian [13, p. 30]. Each Mar-
kovian causal model induces a compatible probability
distribution [13] which we use here. With the appropriate
distribution for Ui, we could write P(Xi = xi | PAi = pai) = P(Ui ∈ {u: fi(pai, u) = xi}). This model is a nonparametric structural model with independent errors (NPSEM-IE). It differs from the "finest fully randomized causally interpretable tree graph" (FFRCITG) of Richardson and Robins [24] because the errors are assumed independent. The NPSEM-IE is a special case of the FFRCITG.
Further note on DAGs
Other models, both causal and non-causal, can be repre-
sented by DAGs. Here, we use only the causal interpreta-
tion and independent errors (represent by a U) for DAGs so
that they represent Markovian causal models. With this link
and restrictions, we can use either the non-parametric
structural equations (Appendix Eq. 4) or the graphical
representation of a Markovian causal model. In a Mar-
kovian causal model, a compatible induced probability
distribution always exists and can be factored as: P(X1 = x1, ..., Xn = xn) = ∏i P(Xi = xi | PAi = pai) [13, p. 30].
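As a small illustration of this factorization (with assumed conditional probabilities, not values from the text), the sketch below builds the joint distribution of the Fig. 2 model C -> E, C -> D, E -> D as the product of parent-conditional terms and confirms it sums to one.

p_C1 = 0.4                                                        # assumed P(C = 1)
p_E1_given_C = {0: 0.2, 1: 0.6}                                   # assumed P(E = 1 | C)
p_D1_given_EC = {(0, 0): 0.05, (0, 1): 0.15, (1, 0): 0.10, (1, 1): 0.20}   # assumed P(D = 1 | E, C)

def joint(c, e, d):
    # P(C = c, E = e, D = d) = P(C = c) * P(E = e | C = c) * P(D = d | E = e, C = c)
    pc = p_C1 if c else 1 - p_C1
    pe = p_E1_given_C[c] if e else 1 - p_E1_given_C[c]
    pd = p_D1_given_EC[(e, c)] if d else 1 - p_D1_given_EC[(e, c)]
    return pc * pe * pd

total = sum(joint(c, e, d) for c in (0, 1) for e in (0, 1) for d in (0, 1))
print(round(total, 10))   # 1.0: the parent-conditional factors define a proper joint distribution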
Confounding
As noted in the main text, many definitions of confounding
(and other concepts) are available. Our main focus concerns
the relationships between exchangeability, bias and biasing
paths but for completeness, we provided one precise defi-
nition of confounding [11]. Although highly overlapping,
the presence of confounding under this definition does not
always coincide with presence of a confounded exposure-
disease association. For example, suppose that E precedes D
and consider the three DAGs in Fig. 4. The upper DAG
shows no confounding (under our definition [11] )—since E
and D would be unassociated if all effects of E were
removed, but the association is confounded since there is a
confounding path (conditioning on the collider C1, indi-
cated by the box, opens the path). The middle DAG shows
confounding—since E and D would be associated if all
effects of E were removed, but the association is not con-
founded since there is no confounding path (the biasing path
does not end with an arrow into D). This middle DAG
nevertheless illustrates a biasing path [1]; since the path involves conditioning on a common effect of exposure and disease, the resulting bias is referred to as structural selection bias.
Other definitions of confounding and of other biases,
while similar to those used here, are available and can
differ. We adopted one set of definitions from the literature
so we could illustrate and summarize the inter-relationships
between some of the different conceptualizations. Use of a
single set of definitions allowed us to focus on the inter-
relationships between conceptualizations in different
frameworks; we avoid some of the discussion about strengths, weaknesses and preferability of one option over another.
Nevertheless, it might be useful for the wider epidemiol-
ogic community to move towards adoption of a single set
of definitions.
Structural selection bias is defined as bias that results
from conditioning on a variable caused by two other
variables; one is exposure or a cause thereof, and the other
is disease or a cause thereof [17]. As such it’s a type of
collider bias. In the DAG framework, this situation would
be represented by a biasing path. However, the definition of
confounding path used here [1] includes some situations
that represent both a confounding path and structural
selection bias (e.g., the uppermost DAG of Fig. 4). Hence,
here we primarily refer to ‘‘biasing paths.’’ There are a few
biasing paths, perhaps unusual, that do not meet this defi-
nition of structural selection bias (but more inclusive def-
initions are available, see Appendix of Ref. [17] ) or a
confounding path (lowest DAG of Fig. 4).
We now sketch proofs of Claims 1B, 2A, 3, 4A and 4B.
Proof of Claim 1B (Exchangeability and faithfulness
imply absence of a biasing path). We prove a
contrapositive: a biasing path and faithfulness imply non-
exchangeability. Presence of a biasing path implies the
DAG includes an open, undirected path. An example is
illustrated in Fig. 5, where n and m are the number of
additional variables intermediate between E and C, and
between C and D, respectively, along the path. Additional
variables not on the path can also be present. Each factor in
the path is determined by its parents along the path plus
other parents not in the path, according to the structural
equations:

E(ym, paE, uE) = fE(ym, paE, uE);
Yi(yi-1, paYi, uYi) = fYi(yi-1, paYi, uYi) for i = 2, ..., m;
Y1(c, paY1, uY1) = fY1(c, paY1, uY1);
C(paC, uC) = fC(paC, uC);
Z1(c, paZ1, uZ1) = fZ1(c, paZ1, uZ1);
Zi(zi-1, paZi, uZi) = fZi(zi-1, paZi, uZi) for i = 2, ..., n;
D(zn, paD, uD) = fD(zn, paD, uD).
Fig. 5 Causal graph, illustrating a biasing path with m descendants of
C that are ancestors of exposure E, and n descendants of C that are
ancestors of outcome D (see Appendix 1)
Fig. 6 Causal graph, illustrating a biasing path with m - 1 descen-
dants of C that are parents of a descendant (Ym) of exposure E,
conditioning on collider (Ym) which opens the path between C and E,
and n descendants of C that are ancestors of outcome D (see
Appendix 1). A box indicates control for the variable
In these structural equations, E(ym, paE, uE) = fE(ym, paE, uE) is the potential value of E if Ym were set to ym, PAE to paE, and the independent, random error term UE to uE, with analogous statements for the other expressions; here PAX denotes the parents of X excluding UX and the factors explicitly included on the path shown. The path depicted is a special type of biasing path, a backdoor path, without any conditioning on colliders in the path and with an arrow into E. However, the n + m + 3 equations above can be modified to reflect other biasing paths; for each type we still have n + m + 3 equations for the variables on the path. [For example, the path in Fig. 6 is a biasing path with one controlled collider (Ym). We would need to modify two equations, setting E(paE, uE) = fE(paE, uE) and Ym(e, paYm, uYm) = fYm(e, paYm, uYm), to reflect this modification.]
We show that one can always choose parameters so that P(D(e)|E = 1) ≠ P(D(e)|E = 0), for e = 0, 1. The functions are unspecified, so we can define and parameterize each function (e.g., fD) so that it depends on the unmeasured terms (e.g., UD) and on the immediate parent on the path (e.g., Zn), but negligibly on other variables (e.g., PAD). We first consider the simplest case (Fig. 1a), where C affects both E and D directly, and the Yi and Zi aren't present. We can define:

fE(c, uE) = 1 if expit(a1·c + a2·uE) > 0.5, and 0 otherwise;
fC(uC) = 1 if expit(c·uC) > 0.5, and 0 otherwise;
fD(e, c, uD) = 1 if expit(b0·e + b1·c + b2·uD) > 0.5, and 0 otherwise;

where uE, uC and uD have independent, standard normal distributions. Then with a1 = b1 = 1,000, a2 = b2 = c = 1, and b0 = 0, E has no effect: D(1) = D(0). Also, D(e) will be 1, to a close approximation, if and only if C = 1, so: P(D(e) = 1|E = 1) ≈ P(C = 1|E = 1) ≈ 1 and P(D(e) = 1|E = 0) ≈ P(C = 1|E = 0) ≈ 0, implying P(D(e)|E = 1) ≠ P(D(e)|E = 0) for e = 0, 1 and, by consistency, P(D|E = 1) ≠ P(D|E = 0).
For more complicated situations (e.g., Fig. 5), one can show by induction on the number of equations that it is always possible to choose functions (fE, fYm, ..., fY1, fC, fZ1, ..., fZn, fD) and parameterizations for those functions such that P(D(e)|E = 1) ≠ P(D(e)|E = 0) for e = 0, 1.
Faithfulness now implies that exchangeability cannot hold: the induction argument implies that parameters can be chosen that would "destroy" the independence P(D(1)|E = 1) = P(D(1)|E = 0) required by exchangeability (see Note 2) and destroy P(D|E = 1) = P(D|E = 0), which is not consistent with faithfulness. (In other words, a probability distribution that implies P(D(e)|E = 1) = P(D(e)|E = 0) for e = 0, 1 under the structure implied by a biasing-path-containing graph could not be faithful.) Thus, if a biasing path is present and the distribution is faithful in this way, then exchangeability cannot hold, establishing Claim 1B.
Proof of Claim 2A (Absence of a biasing path implies exchangeability.) Absence of a biasing path implies that E ⊥ AD, where AD is the set of D's parents, including the U's that are implicit in the DAG for each node. If E ⊥ AD did not hold, the DAG would need to include an open path from E to some X ∈ AD, which would then be part of a biasing path from E to X and then to D, conditionally on controlled factors (if any). But, by G-computation or the do-calculus, the parents of D determine the counterfactual distribution of D under interventions setting e to 0 or 1. Since the distribution of D's parents is the same among the exposed and the unexposed by independence, the distribution of counterfactuals for D must be the same in the exposed and unexposed. In particular, D(e) ⊥ E. (A possible subtlety is that absence of an open path from E to X ∈ AD immediately implies that E and each X ∈ AD are pairwise independent, whereas the argument assumed that E is independent of AD. However, this last independence is implied, for example, by Theorem 1.2.5 of Pearl, since E and AD are d-separated [13], conditionally on controlled factors, if any.)
Proof of Claim 3 (In a Markovian causal model,
exchangeability need not imply absence of a biasing path).
We start with the DAG in Fig. 2 which has a confounding
path and the corresponding structural equations (Example
1) and show that exchangeability can hold for some
parameterization. We consider the following parameterization: UD has six categories with P(UD = u) > 0 for u = 1, ..., 6 and P(UD = 5) = P(UD = 6); fD(e, c, 1) = 0 for all e, c; fD(e, c, 2) = 1 if e = 0 and 0 otherwise; fD(e, c, 3) = 1 if e = 1 and 0 otherwise; fD(e, c, 4) = 1 for all e, c; fD(e, c, 5) = 1 if c = 0 and 0 otherwise; and fD(e, c, 6) = 1 if c = 1 and 0 otherwise.
With this parameterization, D(1) = 1 if and only if:
UD = 3; UD = 4; UD = 5 and C = 0; or, UD = 6 and
C = 1. These events are mutually exclusive so:
P(D(1) = 1|E = 1) = P(UD = 3) + P(UD = 4) + P(C = 1|E = 1)·P(UD = 6) + P(C = 0|E = 1)·P(UD = 5) = P(UD = 3) + P(UD = 4) + P(UD = 6), using P(UD = 5) = P(UD = 6).

Similarly, P(D(1) = 1|E = 0) = P(UD = 3) + P(UD = 4) + P(UD = 6), so D(1) ⊥ E; similar results show D(0) ⊥ E,
proving exchangeability. With the same parameterization, a stronger form of exchangeability, p̃ = q̃ (equality of all type frequencies, pi = qi), also holds. Thus, exchangeability (even in its stronger form) does not imply absence of a biasing path, as Fig. 2 does, in fact, contain a biasing (confounding) path.
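The algebra above can be checked numerically. The sketch below uses assumed, illustrative probabilities (our own choices, selected so that C and E are strongly associated and P(UD = 5) = P(UD = 6)) and confirms that each potential outcome has the same distribution among the exposed and the unexposed, even though the confounding path is present.

p_UD = {1: 0.20, 2: 0.15, 3: 0.10, 4: 0.25, 5: 0.15, 6: 0.15}   # assumed; P(UD = 5) = P(UD = 6)
p_C1_given_E = {1: 0.70, 0: 0.20}                               # assumed: C and E strongly associated

def f_D(e, c, u):
    # fD from the parameterization in the proof of Claim 3
    return int((u == 3 and e == 1) or (u == 2 and e == 0) or u == 4
               or (u == 5 and c == 0) or (u == 6 and c == 1))

def p_potential(e_set, e_obs):
    # P(D(e_set) = 1 | E = e_obs); UD is independent of (E, C) in the Markovian model
    pc1 = p_C1_given_E[e_obs]
    return sum(p_UD[u] * (pc1 * f_D(e_set, 1, u) + (1 - pc1) * f_D(e_set, 0, u))
               for u in p_UD)

for e_set in (0, 1):
    print(e_set, round(p_potential(e_set, 1), 4), round(p_potential(e_set, 0), 4))
# 0 0.55 0.55  and  1 0.5 0.5 : P(D(e) = 1 | E = 1) = P(D(e) = 1 | E = 0), so exchangeability holds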
Proof of Claims 4A and 4B (Absence of bias does not
imply absence of a biasing path, and a biasing path does
not imply bias.). We prove this claim through an Example
that has a Biasing path but no bias (and exchangeability).
We again use the causal relationships in Fig. 2, where there
is a biasing path. Appendix Table 4 parameterizes the
causal relationships, in terms of the structural equations.
We also assume that C has 3 categories with
P(C = 1) = 0.4, P(C = 2) = 0.3, P(E = 1|C = 1) = 0.4,
P(E = 1|C = 2) = 0.5 and P(E = 1|C = 3) = 0.1. The
latter three equations represent causal effects of C on E.
With this parameterization, E[D(1)] = P(D = 1|E = 1)
= 0.1375 and E[D(0)] = P(D = 1|E = 0) = 0.1225 and
so there is no bias. Similarly, exchangeability holds. In this
example, we have ‘‘fine-tuned’’ the parameters, so bias
would be absent even though C affects both E and D, a
common situation for confounding. Bias would be present
for most minor changes in the parameters and so the
absence of bias is unstable in some sense. The distribution
would technically be unfaithful since, for example, with most parameter changes D(e) would no longer be independent of E.
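The stated figures can be reproduced directly. The sketch below makes an interpretive assumption on our part: each row of Appendix Table 4 gives, for that (E, C) pair, the single UD value for which fD = 1, so the tabulated probability acts as P(D = 1|E, C). Together with the P(C) and P(E = 1|C) values stated in the text, it recovers E[D(1)] = P(D = 1|E = 1) = 0.1375 and E[D(0)] = P(D = 1|E = 0) = 0.1225.

p_C = {1: 0.4, 2: 0.3, 3: 0.3}                       # P(C), as stated in the text
p_E1_given_C = {1: 0.4, 2: 0.5, 3: 0.1}              # P(E = 1 | C), as stated in the text
p_D1_given_EC = {(1, 1): 0.100, (1, 2): 0.175, (1, 3): 0.150,   # read from Table 4 (see lead-in)
                 (0, 1): 0.130, (0, 2): 0.115, (0, 3): 0.120}

def observed_risk(e):
    # P(D = 1 | E = e), weighting strata by P(C = c | E = e)
    num = sum(p_C[c] * (p_E1_given_C[c] if e else 1 - p_E1_given_C[c]) * p_D1_given_EC[(e, c)]
              for c in p_C)
    den = sum(p_C[c] * (p_E1_given_C[c] if e else 1 - p_E1_given_C[c]) for c in p_C)
    return num / den

def counterfactual_risk(e):
    # E[D(e)]: UD is independent of (E, C), so average the stratum risks over P(C)
    return sum(p_C[c] * p_D1_given_EC[(e, c)] for c in p_C)

print(round(counterfactual_risk(1), 4), round(observed_risk(1), 4))  # 0.1375 0.1375
print(round(counterfactual_risk(0), 4), round(observed_risk(0), 4))  # 0.1225 0.1225 -> no bias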
Table 4 Structural equations for the example used in the proof (Appendix 1) of Claim 4

E   C   UD (u)   fD(E, C, UD)   P(UD = u)
1   1   1        1              0.100
1   1   ≠1       0
1   2   2        1              0.175
1   2   ≠2       0
1   3   3        1              0.150
1   3   ≠3       0
0   1   4        1              0.130
0   1   ≠4       0
0   2   5        1              0.115
0   2   ≠5       0
0   3   6        1              0.120
0   3   ≠6       0

UD has six categories, with P(UD = u) > 0 for u = 1, ..., 6

References

1. Rothman KJ, Greenland S, Lash TL. Modern epidemiology. 3rd ed. Philadelphia: Lippincott Williams & Wilkins; 2008.
2. Rothman KJ. Modern epidemiology. Boston: Little, Brown; 1986.
3. Greenland S, Robins J, Pearl J. Confounding and collapsibility in causal inference. Stat Sci. 1999;14:29–46.
4. Miettinen OS, Cook EF. Confounding: essence and detection. Am J Epidemiol. 1981;114:593–603.
5. Greenland S, Robins J. Identifiability, exchangeability, and epidemiologic confounding. Int J Epidemiol. 1986;15:413–9.
6. Greenland S, Brumback B. An overview of relations among causal modelling methods. Int J Epidemiol. 2002;31:1030–7.
7. Greenland S, Robins JM. Identifiability, exchangeability and confounding revisited. Epidemiol Perspect Innov. 2009;6:4. doi:10.1186/1742-5573-6-4.
8. Pearl J. Causal diagrams for empirical research (with discussion). Biometrika. 1995;82:669–710.
9. Pearl J. Some aspects of graphical models connected with causality. In: 49th session of the International Statistical Institute, Florence, Italy; 1993.
10. Glymour MM, Greenland S. Causal diagrams. In: Rothman KJ, Greenland S, Lash TL, editors. Modern epidemiology. 3rd ed. Philadelphia: Lippincott Williams & Wilkins; 2008. p. 183–209.
11. Greenland S, Pearl J, Robins J. Causal diagrams for epidemiologic research. Epidemiology. 1999;10:37–48.
12. Greenland S. Quantifying biases in causal models: classical confounding vs collider-stratification bias. Epidemiology. 2003;14:300–6.
13. Pearl J. Causality. 2nd ed. Cambridge: Cambridge University Press; 2009.
14. Greenland S, Pearl J. Adjustments and their consequences—
collapsibility analysis using graphical models. Int Stat Rev.
2011;79:401–26.
15. VanderWeele TJ, Robins JM. Directed acyclic graphs, sufficient
causes, and the properties of conditioning on a common effect.
Am J Epidemiol. 2007;166:1096–104.
16. Robins JM, Richardson T. Alternative graphical causal models
and the identification of direct effects. In: Shrout P, Keyes K,
Ornstein K, editors. Causality and psychopathology: finding the
determinants of disorders and their cures. Oxford: Oxford Uni-
versity Press; 2010. p. 103–58.
17. Hernan MA, Robins J. Causal inference. 2012 ed; 2012. http://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/. Accessed 1 Oct 2012.
18. Greenland S, Pearl J. Causal diagrams. In: Boslaugh S, editor.
Encyclopedia of epidemiology. Thousand Oaks: Sage; 2007.
p. 149–56.
19. Rubin DB. Estimating causal effects of treatments in randomized
and nonrandomized studies. J Educ Psychol. 1974;66:688–701.
20. Hernan MA, Robins J. A definition of causal effect for epide-
miology. J Epidemiol Community Health. 2004;58:265–71.
21. Maldonado G, Greenland S. Estimating causal effects. Int J Ep-
idemiol. 2002;31:422–9.
22. Rubin DB. Comment: Neyman (1923) and causal inference in
experiments and observational studies. Stat Sci. 1990;5:472–80.
23. Rubin DB. Direct and indirect causal effects via potential out-
comes. Scand J Stat. 2004;31:161–70.
24. Hernan MA, Robins JM. Estimating causal effects from epidemi-
ological data. J Epidemiol Community Health. 2006;60:578–86.
25. VanderWeele TJ. Causal mediation analysis with survival data.
Epidemiology (Cambridge, Mass). 2011;22:582.
26. Pearl J, Paz A. Confounding equivalence in causal inference.
J Causal Inference. 2012;2:75–93.
27. Richardson TS, Robins JM. Single world intervention graphs (SWIGs): a unification of the counterfactual and graphical approaches to causality. Working paper number 128. Center for Statistics and the Social Sciences, University of Washington; 2013. http://www.csss.washington.edu/Papers/wp128.pdf.