
Causal Effect Identification by Adjustment under Confounding and Selection Biases

Juan D. Correa
Purdue University
[email protected]

Elias Bareinboim
Purdue University
[email protected]

    Abstract

Controlling for selection and confounding biases are two of the most challenging problems in the empirical sciences as well as in artificial intelligence tasks. Covariate adjustment (or, Backdoor Adjustment) is the most pervasive technique used for controlling confounding bias, but the same is oblivious to issues of sampling selection. In this paper, we introduce a generalized version of covariate adjustment that simultaneously controls for both confounding and selection biases. We first derive a sufficient and necessary condition for recovering causal effects using covariate adjustment from an observational distribution collected under preferential selection. We then relax this setting to consider cases when additional, unbiased measurements over a set of covariates are available for use (e.g., the age and gender distribution obtained from census data). Finally, we present a complete algorithm with polynomial delay to find all sets of admissible covariates for adjustment when confounding and selection biases are simultaneously present and unbiased data is available.

    Introduction

One of the central challenges in data-driven fields is to compute the effect of interventions – for instance, how increasing the educational budget will affect violence rates in a city, whether treating patients with a certain drug will help their recovery, or how increasing the product price will change monthly sales. These questions are commonly referred to as the problem of identification of causal effects. There are two types of systematic bias that pose obstacles to this kind of inference, namely confounding bias and selection bias. The former refers to the presence of a set of factors that affect both the action (also known as treatment) and the outcome (Pearl 1993), while the latter arises when the action, outcome, or other factors differentially affect the inclusion of subjects in the data sample (Bareinboim and Pearl 2016).

The goal of our analysis is to produce an unbiased estimand of the causal effect, specifically, the probability distribution of the outcome when an action is performed by an autonomous agent (e.g., FDA, robot), regardless of how the decision would naturally occur (Pearl 2000, Ch. 1). For example, consider the graph in Fig. 1(a) in which X represents

Copyright © 2017, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

a treatment (e.g., taking or not a drug), Y represents an outcome (health status), and Z is a factor (e.g., gender, age) that affects both the propensity of being treated and the outcome. The edges (Z, X) and (Z, Y) may encode the facts "gender affects how the drug is being prescribed" and "gender affects recovery," respectively – for example, females may be more health conscious, so they seek treatment more frequently than their male counterparts and at the same time are less likely to develop large complications for the particular disease. Intuitively, the causal effect represents the variations of X that bring about change in Y regardless of the influence of Z on X, which is graphically represented in Fig. 1(b). Mutilation is the graphical operation of removing arrows representing a decision made by an autonomous agent of setting a variable to a certain value. The mathematical counterpart of mutilation is the do() operator, and the average causal effect of X on Y is usually written in terms of the do-distribution P(y | do(x)) (Pearl 2000, Ch. 1).

The gold standard for obtaining the do-distribution is through the use of randomization, where the treatment assignment is selected by a randomized device (e.g., a coin flip) regardless of any other set of covariates (Z). In fact, this operation physically transforms the reality of the underlying population (Fig. 1(a)) into the corresponding mutilated world (Fig. 1(b)). The effect of Z on X is neutralized once randomization is applied. Despite its effectiveness, randomized studies can be prohibitively expensive, and even unattainable in certain cases, either for technical or ethical reasons – e.g., one cannot randomize the cholesterol level of a patient and record whether it causes the heart to stop, when trying to assess the effect of cholesterol level on cardiac failure.

An alternative way of computing causal effects is trying to relate non-experimentally collected samples (drawn from P(z, x, y)) with the experimental distribution (P(y | do(x))). Non-experimental (often called observational) data relates to the model in Fig. 1(a), where subjects decide by themselves to take or not the drug (X) while influenced by other factors (Z). There are a number of techniques developed for this task, where the most general one is known as do-calculus (Pearl 1995). In practice, one particular strategy from do-calculus called adjustment is used the most. It consists of averaging the effect of X on Y over the different levels of Z, isolating
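To make this strategy concrete, the adjustment formula can be checked with exact arithmetic on a tiny discrete instance of Fig. 1(a) (Z → X, Z → Y, X → Y); the probability tables below are our own illustrative choices, not values from the paper.

```python
# Exact illustration of covariate adjustment for a Fig. 1(a)-style model
# (Z -> X, Z -> Y, X -> Y); all numeric parameters are illustrative.
P_z = {0: 0.4, 1: 0.6}
p_x1 = lambda z: 0.2 + 0.5 * z               # P(X=1 | z)
p_y1 = lambda x, z: 0.1 + 0.4 * x + 0.3 * z  # P(Y=1 | x, z)

def joint(z, x, y):
    """Observational joint P(z, x, y) induced by the model."""
    px = p_x1(z) if x else 1 - p_x1(z)
    py = p_y1(x, z) if y else 1 - p_y1(x, z)
    return P_z[z] * px * py

# Naive observational conditional P(Y=1 | X=1): confounded by Z.
naive = sum(joint(z, 1, 1) for z in (0, 1)) / \
        sum(joint(z, 1, y) for z in (0, 1) for y in (0, 1))

# Adjustment: sum_z P(Y=1 | X=1, z) P(z), using only observational terms.
adjusted = sum(joint(z, 1, 1) / (joint(z, 1, 0) + joint(z, 1, 1)) * P_z[z]
               for z in (0, 1))

# Ground truth P(Y=1 | do(X=1)) computed directly from the mechanism.
truth = sum(p_y1(1, z) * P_z[z] for z in (0, 1))
```

In this instance the naive conditional (0.752) overshoots the interventional quantity (0.68), while the adjusted estimate matches it exactly.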

TECHNICAL REPORT R-24-L
First version: January 2017
Revised: June 2017

the effect of interest from the effect induced by other factors. Controlling for confounding bias by adjustment is currently the standard method for inferring causal effects in data-driven fields, and different properties and enhancements have been studied in statistics (Rubin 1974; Robinson and Jewell 1991; Pirinen, Donnelly, and Spencer 2012; Mefford and Witte 2012) and AI (Pearl 1993; 1995; Pearl and Paz 2010; Shpitser, VanderWeele, and Robins 2010; Maathuis and Colombo 2015; van der Zander, Liskiewicz, and Textor 2014).

Orthogonal to confounding, sampling selection bias is induced by preferential selection of units for the dataset, which is usually governed by unknown factors including treatment, outcome, and their consequences. It cannot be removed by a randomized trial and may stay undetected during the data gathering process, the whole study, or simply never be detected¹. Consider Fig. 1(e), where X and Y represent again treatment and outcome, but S represents a binary variable that indicates if a subject is included in the pool (S=1 means that the unit is in the sample, S=0 otherwise). The effect of X on Y in the entire population (P(y | do(x))) is usually not the same as in the sample (P(y | do(x), S=1)). For instance, patients that went to the hospital and were sampled are perhaps more affluent and have better nutrition than the average person in the population, which can lead to a faster recovery. This preferential selection of samples challenges the validity of inferences in several tasks in AI (Cooper 1995; Cortes et al. 2008; Zadrozny 2004) and Statistics (Little and Rubin 1986; Kuroki and Cai 2006) as well as in the empirical sciences (Heckman 1979; Angrist 1997; Robins 2001).

The problem of selection bias can be addressed by removing the influence of the biased sampling mechanism on the outcome, as if a random sample of the population was taken. For the graph in Fig. 1(d), for example, the distribution P(y | do(x)) is equal to P(y | x, S=1) because there are no external factors that affect X, and the selection mechanism S is independent of the outcome Y when the effect is estimated for the treatment X. There exists a complete non-parametric² solution for the problem of estimating statistical quantities from selection-biased datasets (Bareinboim and Pearl 2012), and also sufficient and algorithmic conditions for recovering from selection in the context of causal inference (Bareinboim, Tian, and Pearl 2014; Bareinboim and Tian 2015).
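The Fig. 1(d) claim can likewise be verified by exact computation; in a model where S depends only on X (our illustrative parameters), conditioning on S=1 leaves P(y | x) untouched while it does distort the marginal of Y.

```python
# Fig. 1(d)-style model: X -> Y and X -> S, so selection depends only on
# the treatment. Numeric parameters are illustrative.
p_x1 = 0.5
p_y1 = lambda x: 0.3 + 0.4 * x      # P(Y=1 | x)
p_s1 = lambda x: 0.2 + 0.6 * x      # P(S=1 | x)

def joint(x, y, s):
    """Full joint P(x, y, s) induced by the model."""
    px = p_x1 if x else 1 - p_x1
    py = p_y1(x) if y else 1 - p_y1(x)
    ps = p_s1(x) if s else 1 - p_s1(x)
    return px * py * ps

# P(Y=1 | X=1, S=1), computable from the selection-biased data only:
biased_cond = joint(1, 1, 1) / (joint(1, 1, 1) + joint(1, 0, 1))
# Because (Y _||_ S | X), this equals P(Y=1 | X=1) = 0.7.

# The marginal P(Y=1 | S=1), by contrast, IS distorted by selection
# (the unbiased P(Y=1) here is 0.5):
py1_s1 = (joint(0, 1, 1) + joint(1, 1, 1)) / sum(
    joint(x, y, 1) for x in (0, 1) for y in (0, 1))
```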

Both confounding and selection biases carry extraneous "flow" of information between treatment and outcome, which is usually deemed "spurious correlation" since it does not correspond to the effect we want to compute. Despite all the progress made in controlling these biases separately, we show that estimating causal effects when both problems are present requires a more refined analysis. First, note that the effect of X on Y can be estimated by blocking confounding and controlling for selection, respectively,

¹(Zhang 2008) noticed some interesting cases where detection is feasible in a class of non-chordal graphs.

²No assumptions are made about the functions that relate the variables (i.e., linearity, monotonicity).

[Figure 1: six causal diagrams, panels (a)-(f), over the nodes X, Y, Z, U, and S.]

Figure 1: (a) and (d) give simple examples for confounding and selection bias, respectively. (b) represents the model in (a) after an intervention is performed on X. (c) and (e) present examples where confounding and selection bias cannot be removed, respectively. In (f) we can control for either confounding or selection bias, but not for both unless we have external data on P(z).

in Figs. 1(a) and (d). On the other hand, confounding cannot be removed in Fig. 1(c), nor can the effect be recovered from selection bias in Fig. 1(e). Perhaps surprisingly, Fig. 1(f) presents a scenario where either confounding or selection can be addressed separately (P(y | do(x)) = ∑_z P(y | x, z) P(z) and P(z, y | do(x)) = P(z, y | do(x), S=1)), but not simultaneously (without external data). As this example suggests, there is an intricate connection between these two biases that prevents the methods developed for these problems from being applied independently and then combined.
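A worked sketch of the Fig. 1(f) situation, assuming (as the example suggests) that Z confounds X and Y while selection S is driven by the treatment; the structure and all numeric parameters here are our own illustrative assumptions.

```python
# Fig. 1(f)-style model: Z -> X, Z -> Y, X -> Y, and X -> S (assumed).
# Illustrative parameters throughout.
P_z = {0: 0.4, 1: 0.6}
p_x1 = lambda z: 0.2 + 0.5 * z                 # P(X=1 | z)
p_y1 = lambda x, z: 0.1 + 0.4 * x + 0.3 * z    # P(Y=1 | x, z)
p_s1 = lambda x: 0.2 + 0.6 * x                 # P(S=1 | x)

truth = sum(p_y1(1, z) * P_z[z] for z in (0, 1))   # P(Y=1 | do(X=1)) = 0.68

# The covariate distribution observable under selection, P(z | S=1):
w = {z: P_z[z] * (p_x1(z) * p_s1(1) + (1 - p_x1(z)) * p_s1(0)) for z in (0, 1)}
P_z_biased = {z: w[z] / sum(w.values()) for z in (0, 1)}

# Here (Y _||_ S | X, Z), so P(y | x, z, S=1) = P(y | x, z); adjusting with
# the *biased* P(z | S=1) still misses the target:
biased_adj = sum(p_y1(1, z) * P_z_biased[z] for z in (0, 1))
# ...while plugging in the unbiased P(z) (e.g., from a census) recovers it:
recovered = sum(p_y1(1, z) * P_z[z] for z in (0, 1))
```

In this instance the adjustment built purely from biased data yields 0.7232 instead of the true 0.68, which external data on P(z) restores.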

In this paper, we study the problem of estimating causal effects from models with an arbitrary structure that involve both biases. We establish necessary and sufficient conditions that a set of variables should fulfill so as to guarantee that the target effect can be unbiasedly estimated by adjustment. We consider two settings: first, when only biased data is available, and then a more relaxed setting where additional unbiased samples of covariates are available for use (e.g., census data). Specifically, we solve the following problems:

1. Identification and recoverability without external data: The data is collected under selection bias, P(v | S=1). When does a set of covariates Z allow P(y | do(x)) to be estimated by adjusting for Z?

2. Identification and recoverability with external data: The data is collected under selection bias, P(v | S=1), and unbiased samples of P(t), T ⊆ V, are available. When does a set of covariates Z ⊆ T license the estimation of P(y | do(x)) by adjusting for Z?

3. Finding admissible adjustment sets with external data: How can we list all admissible sets Z capable of identifying and recovering P(y | do(x)), for Z ⊆ T ⊆ V?

    Preliminaries

The systematic analysis of confounding and selection biases requires a formal language where the characterization of the underlying data-generating model can be encoded explicitly.

We use the language of Structural Causal Models (SCM) (Pearl 2000, pp. 204-207). Formally, an SCM M is a 4-tuple ⟨U, V, F, P(u)⟩, where U is a set of exogenous (latent) variables and V is a set of endogenous (measured) variables. F represents a collection of functions F = {f_i} such that each endogenous variable V_i ∈ V is determined by a function f_i ∈ F, where f_i is a mapping from the respective domain of U_i ∪ Pa_i to V_i, U_i ⊆ U, Pa_i ⊆ V \ V_i (where Pa_i is the set of endogenous variables that are arguments of f_i), and the entire set F forms a mapping from U to V. The uncertainty is encoded through a probability distribution over the exogenous variables, P(u). Within the structural semantics, performing an action X=x is represented through the do-operator, do(X=x), which encodes the operation of replacing the original equation of X by the constant x and induces a submodel M_x. For a detailed discussion on the properties of structural models, we refer readers to (Pearl 2000, Ch. 7).
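The definition above can be rendered as a small executable sketch; the concrete model (X := U1, Y := X ⊕ U2) and all names are our own illustrative choices, not from the paper.

```python
# A minimal discrete SCM M = <U, V, F, P(u)> with U = {U1, U2} and
# V = {X, Y}; the model (X := U1, Y := X xor U2) is illustrative.
P_u = {(u1, u2): pu1 * pu2
       for u1, pu1 in ((0, 0.5), (1, 0.5))
       for u2, pu2 in ((0, 0.7), (1, 0.3))}
F = {
    "X": lambda u, v: u[0],           # X := f_X(U1)
    "Y": lambda u, v: v["X"] ^ u[1],  # Y := f_Y(X, U2)
}

def distribution(intervention={}):
    """Exact P(x, y) induced by the SCM, under do(intervention).
    do(X=x) replaces the equation of X by the constant x (submodel M_x)."""
    dist = {}
    for u, pu in P_u.items():
        v = {}
        for name in ("X", "Y"):  # evaluate equations in topological order
            v[name] = intervention[name] if name in intervention else F[name](u, v)
        key = (v["X"], v["Y"])
        dist[key] = dist.get(key, 0.0) + pu
    return dist

obs = distribution()            # observational P(x, y)
do_x1 = distribution({"X": 1})  # interventional P(x, y | do(X=1))
```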

We will represent sets of variables in bold. The causal effect of a set X, when it is assigned a set of values x, on a set Y, when it is instantiated as y, will be written as P(y | do(x)), which is a shorthand notation for P(Y=y | do(X=x)). Mainly, we will operate with P(v), P(v | do(x)), and P(v | S=1): respectively, the observational, experimental, and selection-biased distributions.

Formally, the task of estimating a probabilistic quantity from a selection-biased distribution is known as recovering from selection bias (Bareinboim and Pearl 2012). It is not uncommon for observations of a subset of the variables over the entire population (unbiased data) to be available for use. Therefore, we will consider two subsets of V, M, T ⊆ V, where M contains the variables for which data was collected under selection bias, and T encompasses the variables observed in the overall population, without bias. The absence of unbiased data is equivalent to having T = ∅.

    Selection Bias with Adjustment

The main justification for the validity of adjustment for confounding comes from a graphical condition called the "Backdoor Criterion" (Pearl 1993; 2000), as shown below:

Definition 1 (Backdoor Criterion (Pearl 2000)). A set of variables Z satisfies the Backdoor Criterion relative to a pair of variables (X, Y) in a directed acyclic graph G if:

(i) No node in Z is a descendant of X.
(ii) Z blocks every path between X and Y that contains an arrow into X.

The heart of the criterion lies in cond. (ii), where the set Z is required to block all the backdoor paths between X and Y that generate confounding bias. Furthermore, cond. (i) forbids the inclusion of descendants of X in Z, which intends to avoid opening new non-causal paths. For example, the empty set is admissible for adjustment in Fig. 1(e), but adding S would not be allowed since it is a descendant of X and opens the non-causal path X → S ← Y. On the other hand, even though S does not open any non-causal path in Fig. 1(f), the criterion does not allow it to be used for adjustment.
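Both conditions of Def. 1 are mechanical graph tests; the sketch below checks them with a standard moralization-based d-separation routine, reproducing the Fig. 1(a) and Fig. 1(e) observations just discussed (the graph encodings and helper names are ours).

```python
# Backdoor-criterion check via a moralization-based d-separation test.
def ancestors(edges, start):
    seen, stack = set(start), list(start)
    while stack:
        v = stack.pop()
        for a, b in edges:
            if b == v and a not in seen:
                seen.add(a); stack.append(a)
    return seen

def descendants(edges, start):
    seen, stack = set(start), list(start)
    while stack:
        v = stack.pop()
        for a, b in edges:
            if a == v and b not in seen:
                seen.add(b); stack.append(b)
    return seen

def d_separated(edges, A, B, Z):
    """True iff A and B are d-separated by Z (ancestral moral graph test)."""
    anc = ancestors(edges, A | B | Z)
    adj = {v: set() for v in anc}
    for v in anc:
        ps = {a for a, b in edges if b == v}  # parents of v (also ancestors)
        for p in ps:
            adj[p].add(v); adj[v].add(p)
            adj[p] |= ps - {p}                # "marry" co-parents
    seen, stack = set(A - Z), list(A - Z)
    while stack:
        v = stack.pop()
        if v in B:
            return False
        for w in adj[v] - Z - seen:
            seen.add(w); stack.append(w)
    return True

def backdoor(edges, x, y, Z):
    # (i) no node in Z is a descendant of x
    if Z & (descendants(edges, {x}) - {x}):
        return False
    # (ii) Z d-separates x and y once the edges *out of* x are deleted
    return d_separated({(a, b) for a, b in edges if a != x}, {x}, {y}, Z)

FIG_1A = {("Z", "X"), ("Z", "Y"), ("X", "Y")}
FIG_1E = {("X", "Y"), ("X", "S"), ("Y", "S")}  # X -> S <- Y, per the text
```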

Figure 2: A graph (over the nodes X, Z1, Z2, Y, and S) that does not satisfy the s-backdoor criterion (with respect to Z), but the adjustment formula is recoverable and corresponds to the desired causal effect.

(Bareinboim, Tian, and Pearl 2014) noticed that adjustment could be used for controlling for selection bias, in addition to confounding, which led to a sufficient graphical condition called the Selection-Backdoor criterion.

Definition 2 (Selection-Backdoor Criterion (Bareinboim and Tian 2015)). A set Z = Z+ ∪ Z−, with Z− ⊆ De_X and Z+ ⊆ V \ De_X (where De_X is the set of variables that are descendants of X in G), satisfies the selection-backdoor criterion (s-backdoor, for short) relative to X, Y and M, T in a directed acyclic graph G if:

(i) Z+ blocks all backdoor paths from X to Y;
(ii) X and Z+ block all paths between Z− and Y, namely, (Z− ⊥⊥ Y | X, Z+);
(iii) X and Z block all paths between S and Y, namely, (Y ⊥⊥ S | X, Z);
(iv) Z ∪ {X, Y} ⊆ M and Z ⊆ T.

The first two conditions echo the extended-backdoor (Pearl and Paz 2010)³, while conds. (iii) and (iv) guarantee that the resultant expression is estimable from the available datasets. If the s-backdoor criterion holds for Z relative to X, Y and M, T in G, then the effect P(y | do(x)) is identifiable, recoverable, and given by

P(y | do(x)) = ∑_z P(y | x, z, S=1) P(z)    (1)

We note here that the s-backdoor criterion is sufficient but not necessary for adjustment. To witness, consider the model in Fig. 2, where Z = {Z1, Z2}, M = {X, Y, Z1, Z2}, and T = {Z1, Z2}. Here, Z+ = ∅ and Z− = {Z1, Z2}. Condition (ii) in Def. 2 is violated, namely, (Z1, Z2 ⊥⊥ Y | X) does not hold. Perhaps surprisingly, the effect P(y | do(x)) is identifiable and recoverable, as follows:

P(y | do(x)) = P(y | x)                                          (2)
             = P(y | x) ∑_{z1} P(z1)                             (3)
             = ∑_{z1} P(y | x, z1) P(z1)                         (4)
             = ∑_{z1,z2} P(y | x, z1, z2) P(z2 | x, z1) P(z1)    (5)
             = ∑_{z1,z2} P(y | x, z1, z2) P(z2 | z1) P(z1)       (6)
             = ∑_{z1,z2} P(y | x, z1, z2, S=1) P(z1, z2)         (7)

³The extended-backdoor augments the backdoor criterion to allow for descendants of X that could be harmless in terms of bias.

(2) follows from the application of the second rule of do-calculus and the independence (X ⊥⊥ Y) in G_X̲. Equations (4), (6), and (7) use the independences (Y ⊥⊥ Z1 | X), (Z2 ⊥⊥ X | Z1), and (S ⊥⊥ Y | X, Z1, Z2), respectively. The final expression (7) is estimable from the available data.

Considering that Z = ∅ controls for confounding, adjusting for Z = {Z1, Z2} seems unnecessary. As it turns out, covariates irrelevant for confounding control can play a role when we compound this task with recovering from selection bias (where Y will need to be separated from S).

Generalized Adjustment without External Data

Let us consider the case when only biased data P(v | S=1) over V is measured. Our interest in this section is in conditions that allow P(y | do(x)) to be computed by adjustment without external measurements.

Consider the model G in Fig. 3(a). Note that Y and S are marginally independent in G_X̄ (the graph after an intervention on X, where all edges into X are not present). As for confounding, Z needs to be conditioned on, but doing so opens a path between Y and S, letting spurious correlation from the bias be included in our calculation. It turns out that, with a careful manipulation of the expression, both biases can be controlled as follows:

P(y | do(x)) = P(y | do(x), S=1)                             (8)
             = ∑_z P(y | do(x), z, S=1) P(z | do(x), S=1)    (9)
             = ∑_z P(y | x, z, S=1) P(z | S=1)               (10)

Eq. (8) follows from the independence (Y ⊥⊥ S | X) in the mutilated graph G_X̄. Next we condition on Z, and (10) is valid by the application of the second rule of do-calculus to the first term and the third rule to the second term in (9). Note that every term in (10) is estimable from the biased distribution.
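Since every term of (10) is a selection-biased quantity, the expression can be estimated by plain frequency counts over the biased sample; a sketch follows (the data layout and function name are ours, and its validity of course presumes Z is an admissible set for the model at hand).

```python
from collections import Counter

def type1_adjustment(samples, x_val, y_val):
    """Estimate sum_z P(y | x, z, S=1) P(z | S=1) from a selection-biased
    sample [(x, y, z), ...] in which every record was collected under S=1."""
    n = len(samples)
    z_counts = Counter(z for _, _, z in samples)
    est = 0.0
    for z, nz in z_counts.items():
        n_xz = sum(1 for x, _, zz in samples if x == x_val and zz == z)
        n_xyz = sum(1 for x, y, zz in samples
                    if x == x_val and y == y_val and zz == z)
        if n_xz:
            est += (n_xyz / n_xz) * (nz / n)   # P^(y|x,z,S=1) * P^(z|S=1)
    return est

# A toy biased sample: P^(y=1 | x=1, z=0) = 0.8 on 15/24 of the data and
# P^(y=1 | x=1, z=1) = 0.25 on the remaining 9/24.
biased = ([(1, 1, 0)] * 8 + [(1, 0, 0)] * 2 + [(0, 1, 0)] * 5 +
          [(1, 1, 1)] * 1 + [(1, 0, 1)] * 3 + [(0, 0, 1)] * 5)
estimate = type1_adjustment(biased, 1, 1)  # 0.8*(15/24) + 0.25*(9/24)
```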

Next we introduce a complete criterion to determine whether adjusting by a set of covariates is admissible to identify and recover the causal effect. Before that, we require the concept of a proper causal path.

Definition 3 (Proper Causal Path (Shpitser, VanderWeele, and Robins 2010)). Let X, Y be sets of nodes. A causal path from a node in X to a node in Y is called proper if it does not intersect X except at the end point.

Definition 4 (Generalized Adjustment Criterion Type 1). A set Z satisfies the generalized criterion relative to X, Y in a causal model with graph G augmented with the selection mechanism S if:

(a) No element of Z is a descendant in G_X̄ of any W ∉ X which lies on a proper causal path from X to Y.
(b) All non-causal paths between X and Y in G are blocked by Z and S.
(c) Y is d-separated from S given X under the intervention do(x), i.e., (Y ⊥⊥ S | X) in G_X̄.
(d) Every X ∈ X is either a non-ancestor of S or it is independent of Y in G_X̄(S), i.e., for every X ∈ X ∩ An_S, (X ⊥⊥ Y) in G_X̄(S).

Figure 3: Models (over the nodes X, Y, Z, and S) where Z satisfies Def. 4.

G_X̄(S) is the graph where all edges into X ∈ X \ An_S are removed, where An_V stands for the set of ancestors of a node or set V in G.

Theorem 1 (Generalized Adjustment Formula Type 1). Given disjoint sets of variables X, Y, and Z in a causal model with graph G, the effect P(y | do(x)) is given by

P(y | do(x)) = ∑_z P(y | x, z, S=1) P(z | S=1)    (11)

in every model inducing G if and only if they satisfy the generalized adjustment criterion type 1 (Def. 4).

The proof of Theorem 1 is presented in the supplemental material due to the length constraints of the paper. Conditions (a) and (b) echo the Extended Backdoor/Adjustment Criterion (Pearl and Paz 2010; Shpitser, VanderWeele, and Robins 2010) and guarantee that Z is admissible for adjustment in the unbiased distribution. Condition (c) requires the outcome Y to be independent of the selection mechanism S without observing any covariate Z. Intuitively, condition (d) ensures that the distributions of some covariates under selection bias remain viable to control for confounding.

The model in Fig. 3(b) also satisfies Def. 4; in this case the derivation can be performed by first applying the backdoor and then introducing S in both terms of the expression. In general, Def. 4 / Thm. 1 encapsulate all possible derivations that allow one to recover from both selection and confounding biases.

    Generalized Adjustment With External Data

A natural question that arises is whether additional measurements over the covariates at the population level can help identify and recover the desired causal effect. The following criterion relaxes the previous results by leveraging the unbiased data available.

Definition 5 (Generalized Adjustment Criterion Type 2). Let T be the set of variables measured without selection bias in the overall population. Also, let X, Y, Z be disjoint sets of variables in a causal model with diagram G, augmented with the selection mechanism S. Then, Z satisfies the generalized criterion relative to X, Y if:

(a) No element in Z is a descendant in G_X̄ of any W ∉ X which lies on a proper causal path from X to Y.
(b) All non-causal X-Y paths in G are blocked by Z.
(c) Y is independent of the selection mechanism S given Z and X, i.e., (Y ⊥⊥ S | X, Z).
(d) Z ⊆ T.

Theorem 2 (Generalized Adjustment Formula Type 2). Let T be the set of variables measured without selection bias.

Figure 4: Models where the set Z satisfies Def. 5 ((a) over the nodes X, Z1, Z2, Z3, Y, S; (b) over X1, X2, Z1, Z2, Z3, Y, S).

Given disjoint sets of variables X, Y, and Z ⊆ T and a causal diagram G, then, for every model inducing G, the effect P(y | do(x)) is given by

P(y | do(x)) = ∑_z P(y | x, z, S=1) P(z)    (12)

if and only if the set Z satisfies the generalized adjustment criterion type 2 relative to the pair X, Y.

Proof. Suppose the set Z satisfies the conditions relative to X, Y. Then, by conditions (a) and (b), for every model induced by G we have:

P(y | do(x)) = ∑_z P(y | x, z) P(z)

We note that S can be introduced into the first term by cond. (c), which entails Eq. (12). The necessity part of the proof is more involved and is provided in the supplemental material (Correa and Bareinboim 2016).

As in Def. 4, conditions (a) and (b) ensure Z is valid for adjustment without selection bias. Condition (c) requires that the influence of the selection mechanism on the outcome is nullified by conditioning on X and Z, and condition (d) simply guarantees that the adjustment expression can be estimated from the available data. Fig. 4 presents two causal models that satisfy the previous criterion if measurements over Z = {Z1, Z2, Z3} are available. To witness how the expression can be reached using do-calculus and probability axioms, consider Fig. 4(a):

P(y | do(x)) = ∑_{z3} P(y | do(x), z3) P(z3 | do(x))          (13)
             = ∑_{z3} P(y | x, z3) P(z3)                      (14)
             = ∑_{z1,z3} P(y | x, z1, z3) P(z1, z3)           (15)
             = ∑_z P(y | x, z) P(z2 | x, z1, z3) P(z1, z3)    (16)
             = ∑_z P(y | x, z, S=1) P(z)                      (17)

We start by conditioning on Z3 and removing do(x) using rules 2 and 3 of the do-calculus. Then summing over Z1 in the second term, pulling the new sum out, and introducing Z1 into the first term, using (Y ⊥⊥ Z1 | Z3, X), results in (15). Eq. (16) follows from conditioning the first term on Z2; finally, removing X in the second term using the independence (Z2 ⊥⊥ X | Z1, Z3), combining the last two distributions over the Z's, and introducing the selection bias term using the independence (Y ⊥⊥ S | X, Z) results in (17), which corresponds to (12).

The model in Fig. 4(b) also satisfies the type 2 criterion and illustrates how it can be applied to models where X and Y may be sets of variables.

Finding Admissible Sets for Generalized Adjustment

A natural extension to the problem is how to systematically list admissible sets for adjustment, using the criteria discussed in the previous sections. This is especially important in practice, where factors such as feasibility, cost, and statistical power bear on the choice of a covariate set.

In order to perform this kind of task efficiently, (van der Zander, Liskiewicz, and Textor 2014) introduced a transformation of the model called the Proper Backdoor Graph and formulated a criterion equivalent to the Adjustment Criterion:

Definition 6 (Proper Backdoor Graph). Let G = (V, E) be a DAG, and X, Y ⊆ V be pairwise disjoint subsets of variables. The proper backdoor graph, denoted as G^pbd_XY, is obtained from G by removing the first edge of every proper causal path from X to Y.

Definition 7 (Constructive Backdoor Criterion (CBD)). Let G = (V, E) be a DAG, and X, Y ⊆ V be pairwise disjoint subsets of variables. The set Z satisfies the Constructive Backdoor Criterion relative to (X, Y) in G if:

i) Z ⊆ V \ Dpcp(X, Y), and
ii) Z d-separates X and Y in the proper backdoor graph G^pbd_XY,

where Dpcp(X, Y) = De((De_X̄(X) \ X) ∩ An_X̲(Y)).
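Both objects just defined reduce to ancestor/descendant computations; below is a sketch on a small illustrative DAG (Z → X, X → W, W → Y, Z → Y), with our own encoding and helper names.

```python
# Computing the proper backdoor graph G^pbd_XY and the forbidden set
# Dpcp(X, Y) for an illustrative DAG (not one of the paper's figures).
EDGES = {("Z", "X"), ("X", "W"), ("W", "Y"), ("Z", "Y")}
X, Y = {"X"}, {"Y"}

def closure(edges, start, forward=True):
    """Descendants (forward=True) or ancestors (forward=False), inclusive."""
    seen, stack = set(start), list(start)
    while stack:
        v = stack.pop()
        for a, b in edges:
            if not forward:
                a, b = b, a
            if a == v and b not in seen:
                seen.add(b); stack.append(b)
    return seen

an_y = closure(EDGES, Y, forward=False)   # ancestors of Y: {Y, W, X, Z}

# G^pbd_XY: delete the first edge of every proper causal path from X to Y,
# i.e., every edge from X into a (proper) ancestor of Y.
pbd = EDGES - {(a, b) for a, b in EDGES
               if a in X and b not in X and b in an_y}

# Dpcp(X, Y): descendants of the non-X nodes on proper causal paths
# (descendants of X once edges into X are cut, intersected with An(Y)).
g_xbar = {(a, b) for a, b in EDGES if b not in X}
on_path = (closure(g_xbar, X) - X) & an_y
dpcp = closure(EDGES, on_path)
```

Here the only first edge to cut is X → W, and the forbidden set comes out as {W, Y}.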

The set Dpcp(X, Y) is exactly the set of nodes forbidden by the first condition in both our generalized criteria, and G^pbd_XY only contains the X-Y paths that need to be blocked.

Lemma 3 (Constructive Backdoor ⟹ Generalized Adjustment Type 2). Any set Z satisfying the CBD applied to G^pbd_(X∪S)Y and Dpcp(X ∪ S, Y) ∪ (V \ T), relative to X, Y in G, also satisfies the Generalized Adjustment Criterion Type 2.

Proof. By the equivalence between the CBD criterion and the adjustment criterion, we have that Dpcp(X, Y) is exactly the set of nodes forbidden by cond. (a) of the type 2 criterion, so

Dpcp(X ∪ S, Y) = De((De_X̄,S̄(X ∪ {S}) \ (X ∪ S)) ∩ An_X̲,S̲(Y))    (18)

Since S has no descendants, De_X̄,S̄(X ∪ {S}) = De_X̄(X) ∪ {S} and An_X̲,S̲(Y) = An_X̲(Y). As a consequence, Dpcp(X ∪ S, Y) = Dpcp(X, Y), implying cond. (a) of Def. 5.

G^pbd_(X∪S)Y has all non-causal paths from X to Y present in G^pbd_XY; therefore, if Z blocks all non-causal paths in the former, it will do so in the latter, satisfying condition (b).

Every S-Y path may or may not contain X. If not, Z should block it in G^pbd_(X∪S)Y. In the latter case, the subpath from X to Y is either causal or non-causal. If it is causal, Z will not block it, but the S-Y path will be blocked by X. If the subpath is non-causal, Z should block it, and therefore the larger path is blocked too. This argument implies condition (c). Since the CBD holds for Dpcp(X ∪ S, Y) ∪ (V \ T), every element in Z must belong to T, satisfying condition (d).

Lemma 4 (Generalized Adjustment Type 2 $\Rightarrow$ Constructive Backdoor). Any set $\mathbf{Z}$ satisfying the Generalized Adjustment Criterion Type 2 relative to $\mathbf{X}, \mathbf{Y}$ in $G$ also satisfies the Constructive Backdoor Criterion applied to $G^{pbd}_{(\mathbf{X} \cup S)\mathbf{Y}}$ and $Dpcp(\mathbf{X} \cup S, \mathbf{Y}) \cup (\mathbf{V} \setminus \mathbf{T})$.

Proof. By Lemma 3, $Dpcp(\mathbf{X} \cup S, \mathbf{Y}) = Dpcp(\mathbf{X}, \mathbf{Y})$, which combined with condition (d) implies condition (i) of the CBD.

By cond. (b), every non-causal path from $\mathbf{X}$ to $\mathbf{Y}$ is blocked by $\mathbf{Z}$, and all paths from $S$ to $\mathbf{Y}$ (which are always non-causal when $S$ is treated as a member of $\mathbf{X}$) are blocked by $\mathbf{Z}$ and $\mathbf{X}$ by cond. (c). Those two facts together imply cond. (ii) of the CBD. $\square$

Theorem 5 (Generalized Adjustment Type 2 $\Leftrightarrow$ Constructive Backdoor). A set $\mathbf{Z}$ satisfies the Generalized Adjustment Criterion Type 2 relative to $\mathbf{X}, \mathbf{Y}$ in $G$ if and only if it satisfies the CBD applied to $G^{pbd}_{(\mathbf{X} \cup S)\mathbf{Y}}$ and $Dpcp(\mathbf{X} \cup S, \mathbf{Y}) \cup (\mathbf{V} \setminus \mathbf{T})$.

Proof. It follows immediately from Lemmas 3 and 4. $\square$

Thm. 5 allows us to use the LISTSEP procedure (van der Zander, Liskiewicz, and Textor 2014) to list all valid sets for generalized adjustment Type 2. The algorithm guarantees $O(n(n+m))$ polynomial delay, where $n$ is the number of nodes and $m$ the number of edges in $G$ (see (Takata 2010)). That means that the time needed to output the first solution or report failure, and the time between the outputs of consecutive solutions, is $O(n(n+m))$.

To give the reader an intuition of how the algorithm works, consider the graph in Fig. 5(a) and its associated proper backdoor graph in (b). $W$ is a "forbidden node" in the sense that it cannot be used for adjustment, and in this example it is the only element of $Dpcp(\mathbf{X}, Y)$, assuming that unbiased measurements of the covariates $Z_1, Z_2$ and $Z_3$ are available (i.e., $\{Z_1, Z_2, Z_3\} \subseteq \mathbf{T}$). The algorithm LISTSEP will output every set of variables that d-separates $\mathbf{X} \cup S$ from $Y$ in the proper backdoor graph and does not contain any node in $Dpcp(\mathbf{X}, Y)$.
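The two graphical ingredients manipulated above — the proper backdoor graph of Def. 6 and d-separation tests — can be sketched directly. The following Python sketch is not the authors' implementation, and the example graph at the bottom is a hypothetical stand-in (not the exact Fig. 5 model): it removes the first edge of every proper causal path and then tests d-separation via the standard ancestral-moral-graph construction.

```python
from collections import defaultdict

def ancestors(edges, targets):
    """Nodes with a directed path into `targets` (targets included)."""
    parents = defaultdict(set)
    for u, v in edges:
        parents[v].add(u)
    seen, stack = set(targets), list(targets)
    while stack:
        n = stack.pop()
        for p in parents[n]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def proper_backdoor_edges(edges, X, Y):
    """Drop the first edge of every proper causal path from X to Y:
    an edge x -> v is removed when v can reach Y without re-entering X."""
    sub = [(u, v) for u, v in edges if u not in X and v not in X]
    reach_Y = ancestors(sub, Y)
    return [(u, v) for u, v in edges if not (u in X and v in reach_Y)]

def d_separated(edges, X, Y, Z):
    """d-separation test via the ancestral moral graph of X, Y, Z."""
    keep = ancestors(edges, X | Y | Z)
    sub = [(u, v) for u, v in edges if u in keep and v in keep]
    parents, adj = defaultdict(set), defaultdict(set)
    for u, v in sub:
        parents[v].add(u)
        adj[u].add(v)
        adj[v].add(u)
    # moralize: connect every pair of co-parents
    for ps in list(parents.values()):
        ps = list(ps)
        for i in range(len(ps)):
            for j in range(i + 1, len(ps)):
                adj[ps[i]].add(ps[j])
                adj[ps[j]].add(ps[i])
    # search for an X-Y connection avoiding Z in the moral graph
    seen, stack = set(X), list(X)
    while stack:
        n = stack.pop()
        if n in Y:
            return False
        for w in adj[n]:
            if w not in seen and w not in Z:
                seen.add(w)
                stack.append(w)
    return True

# hypothetical example: Z1 confounds X and Y, W mediates X -> Y,
# and the selection node S is a child of X; S is treated like an X
edges = [("Z1", "X"), ("Z1", "Y"), ("X", "W"), ("W", "Y"), ("X", "S")]
pbd = proper_backdoor_edges(edges, {"X", "S"}, {"Y"})
```

In the example, the edge $X \to W$ is removed (it starts a proper causal path to $Y$), $\{Z_1\}$ separates $\{X, S\}$ from $Y$ in the resulting graph, and the empty set does not.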

Conclusions

We provide necessary and sufficient conditions for identification and recoverability from selection bias of causal effects by adjustment, applicable to data-generating models with latent variables and arbitrary structure in nonparametric settings. Def. 8 and Thm. 1 provide a complete characterization of identification and recoverability by adjustment when no external information is available. Def. 9 and Thm. 3 provide a complete graphical condition for when external information on a set of covariates is available. Thm. 5 allowed us to list all sets that satisfy the last criterion in polynomial-delay time, effectively helping to decide which covariates need to be measured for recoverability. This is especially important when measuring a variable is associated with a particular cost or effort. Although adjustment is neither complete nor the only method to identify causal effects, it is in fact the most used tool in the empirical sciences. The methods developed in this paper should help to formalize and alleviate the problem of sampling selection and confounding biases in a broad range of data-intensive applications.

Figure 5: (a) shows a causal model and (b) the proper backdoor graph associated with it relative to $\mathbf{X} \cup S$ and $Y$. The gray nodes in (b) represent variables in $Dpcp$.

References

Angrist, J. D. 1997. Conditional independence in sample selection models. Economics Letters 54(2):103–112.
Bareinboim, E., and Pearl, J. 2012. Controlling selection bias in causal inference. In Lawrence, N., and Girolami, M., eds., Proceedings of the 15th International Conference on Artificial Intelligence and Statistics (AISTATS), 100–108. La Palma, Canary Islands: JMLR.
Bareinboim, E., and Pearl, J. 2016. Causal inference and the data-fusion problem. Proceedings of the National Academy of Sciences 113:7345–7352.
Bareinboim, E., and Tian, J. 2015. Recovering causal effects from selection bias. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 3475–3481.
Bareinboim, E.; Tian, J.; and Pearl, J. 2014. Recovering from selection bias in causal and statistical inference. In Brodley, C. E., and Stone, P., eds., Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, 2410–2416. Palo Alto, CA: AAAI Press.
Cooper, G. 1995. Causal discovery from data in the presence of selection bias. In Proceedings of the Fifth International Workshop on Artificial Intelligence and Statistics, 140–150.
Correa, J. D., and Bareinboim, E. 2016. Causal effect identification by adjustment under confounding and selection biases – supplemental material. Technical report, Purdue AI Lab, Department of Computer Science, Purdue University.
Cortes, C.; Mohri, M.; Riley, M.; and Rostamizadeh, A. 2008. Sample selection bias correction theory. In International Conference on Algorithmic Learning Theory, 38–53. Springer.
Heckman, J. J. 1979. Sample selection bias as a specification error. Econometrica 47(1):153–161.
Kuroki, M., and Cai, Z. 2006. On recovering a population covariance matrix in the presence of selection bias. Biometrika 93(3):601–611.
Little, R. J. A., and Rubin, D. B. 1986. Statistical Analysis with Missing Data. New York, NY: John Wiley & Sons.
Maathuis, M. H., and Colombo, D. 2015. A generalized back-door criterion. Annals of Statistics 43(3):1060–1088.
Mefford, J., and Witte, J. S. 2012. The covariate's dilemma. PLoS Genetics 8(11):e1003096.
Pearl, J., and Paz, A. 2010. Confounding equivalence in causal inference. In Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence, 433–441. Corvallis, OR: AUAI.
Pearl, J. 1993. Aspects of graphical models connected with causality. In Proceedings of the 49th Session of the International Statistical Institute, 391–401.
Pearl, J. 1995. Causal diagrams for empirical research. Biometrika 82(4):669–688.
Pearl, J. 2000. Causality: Models, Reasoning, and Inference. New York: Cambridge University Press. 2nd edition, 2009.
Pirinen, M.; Donnelly, P.; and Spencer, C. C. 2012. Including known covariates can reduce power to detect genetic effects in case-control studies. Nature Genetics 44(8):848–851.
Robins, J. M. 2001. Data, design, and background knowledge in etiologic inference. Epidemiology 12(3):313–320.
Robinson, L. D., and Jewell, N. P. 1991. Some surprising results about covariate adjustment in logistic regression models. International Statistical Review/Revue Internationale de Statistique 227–240.
Rubin, D. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66:688–701.
Shpitser, I.; VanderWeele, T. J.; and Robins, J. M. 2010. On the validity of covariate adjustment for estimating causal effects. In Proceedings of UAI 2010, 527–536.
Takata, K. 2010. Space-optimal, backtracking algorithms to list the minimal vertex separators of a graph. Discrete Applied Mathematics 158(15):1660–1667.
van der Zander, B.; Liskiewicz, M.; and Textor, J. 2014. Constructing separators and adjustment sets in ancestral graphs. In Proceedings of UAI 2014, 907–916.
Zadrozny, B. 2004. Learning and evaluating classifiers under sample selection bias. In Proceedings of the Twenty-First International Conference on Machine Learning, 114. ACM.
Zhang, J. 2008. On the completeness of orientation rules for causal discovery in the presence of latent confounders and selection bias. Artificial Intelligence 172:1873–1896.

Appendix

In order to prove the necessity of the criteria presented in the paper, it is imperative to construct structural causal models that serve as counter-examples to the identifiability or recoverability of the causal effect whenever the set of covariates $\mathbf{Z}$ fails to satisfy the conditions relative to the pair $\mathbf{X}, \mathbf{Y}$. The following lemmata will be useful to construct such models. The first one, Lemma 6, licenses the direct specification of the conditional distribution of any variable given its parents, in accordance with the causal diagram $G$.

Lemma 6 (Family Parametrization). Let $G$ be a causal diagram over a set $\mathbf{V}$ of $n$ variables. Consider also a set of conditional distributions $P(v_i \mid pa_{V_i})$, $1 \leq i \leq n$, such that $Pa_{V_i}$ is the set of nodes in $G$ from which there are outgoing edges pointing into $V_i$. Then, there exists a model $M$ compatible with $G$ that induces $P(\mathbf{v}) = \prod_{i=1}^{n} P(v_i \mid pa_{V_i})$.

Proof. (By construction) For every $V_i$, define any ordering on the values of its domain, and let $v^{(j)}_i$ refer to the $j$th value in that order. Also, define a continuous unobservable variable $U_i \sim U[0,1]$ (uniformly distributed in the interval $[0,1]$) for every variable $V_i \in \mathbf{V}$. Then, construct a structural causal model $M = \langle \mathbf{U}, \mathbf{V}, \mathcal{F}, P(\mathbf{u}) \rangle$ where:

• $\mathbf{V}$ is the same set of observables as in $G$,
• $\mathbf{U} = \bigcup_{i=1}^{n} \{U_i\}$,
• $\mathcal{F} = \big\{ f_i(pa_{V_i}, u_i) = v^{(j^*)}_i \text{ with } j^* = \inf\big\{ j : \sum_{k=1}^{j} P(v^{(k)}_i \mid pa_{V_i}) \geq u_i \big\},\ 1 \leq i \leq n \big\}$,
• $U_i \sim U[0,1]$, $1 \leq i \leq n$.

At every variable $V_i$, given a particular configuration of $Pa_{V_i}$, $M$ simulates its value using the distribution $P(v_i \mid pa_{V_i})$. By the Markov property, the joint distribution will be equal to the product of those distributions. $\square$
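The structural functions in the proof of Lemma 6 are inverse-CDF samplers. A minimal sketch, with illustrative (hypothetical) numbers for a binary variable with one binary parent; sweeping the exogenous $u$ over a fine grid recovers the specified conditional distribution:

```python
def f_i(cond_dist, pa, u):
    """Structural function from Lemma 6: return the first value whose
    cumulative conditional probability reaches the exogenous u in [0, 1]."""
    cum = 0.0
    values = sorted(cond_dist[pa])          # fixed ordering of the domain
    for v in values:
        cum += cond_dist[pa][v]
        if cum >= u:
            return v
    return values[-1]                       # guard against rounding

# P(V = v | Pa = pa) for binary V and Pa (illustrative numbers)
cond = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}

# deterministic grid over u in place of uniform sampling
N = 100000
freq = sum(f_i(cond, 1, (k + 0.5) / N) for k in range(N)) / N
# freq recovers P(V=1 | Pa=1) = 0.8
```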

The following lemma permits the construction of a structural causal model $M'$ compatible with a causal diagram $G'$, using another model compatible with a related, but different, causal diagram $G$ in which some arrows in a chain of variables have the reverse direction.

Lemma 7 (Chain Reversal). Let $G$ be a causal diagram containing a chain $T \to R_1 \to \cdots \to R_\ell$ in which every $R_i$ has no incoming edges other than the one from its predecessor in the chain, and let $G'$ be obtained from $G$ by reversing the edges along the chain. Then, for any model $M$ compatible with $G$, there exists a model $M'$ compatible with $G'$ that induces the same distribution $P(\mathbf{v})$.

Proof. (By construction) Given $M$ and the probability distribution $P(\mathbf{v})$ induced by it, compute the joint distribution $P(r_1, \ldots, r_\ell, t)$. Construct a new model $M'$ with the same set of observable variables and identical functions for all variables except $R_1, \ldots, R_\ell, T$. For those, assign the functions $f_{R_i}(r_{i+1}, U_{R_i})$, $1 \leq i \leq \ell - 1$, as in Lemma 6, using the conditionals derived from the computed joint; also, let $f_{R_\ell}(U_{R_\ell}) = U_{R_\ell}$ with $P(U_{R_\ell}) = P(r_\ell)$. By Lemma 6, the sub-models composed of $R_1, \ldots, R_\ell, T$ in $M'$ and $M$ produce the exact same distribution, and since the sets of parents and the functions for every other part of the model are exactly the same, the overall distribution is identical. $\square$
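The essence of Lemma 7 for a two-variable chain can be checked numerically: parametrizing the reversed chain with the marginal of the chain's last node and the Bayes-inverted conditional reproduces the same joint. A sketch with hypothetical numbers:

```python
# joint of a two-variable chain R1 -> R2 (illustrative parameters)
p_r1 = {0: 0.3, 1: 0.7}
p_r2_given_r1 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}
joint = {(a, b): p_r1[a] * p_r2_given_r1[a][b]
         for a in (0, 1) for b in (0, 1)}

# reversed chain R2 -> R1: marginal of R2 and conditional of R1 given R2
p_r2 = {b: sum(joint[(a, b)] for a in (0, 1)) for b in (0, 1)}
p_r1_given_r2 = {b: {a: joint[(a, b)] / p_r2[b] for a in (0, 1)}
                 for b in (0, 1)}
joint_rev = {(a, b): p_r2[b] * p_r1_given_r2[b][a]
             for a in (0, 1) for b in (0, 1)}
# joint_rev coincides with joint, as Lemma 7 requires
```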

Finally, the following lemma allows us to simplify the parametrization of an arbitrarily long chain of binary variables.

Lemma 8 (Collapsible Path Parametrization). Consider a causal diagram $G$ and a probability distribution $P(\mathbf{v})$ induced by any SCM compatible with $G$. Suppose $G$ contains a chain $W_0 \to W_1 \to \cdots \to W_k$, where each $W_i$ represents a binary random variable, for every $1 \leq i \leq k$ the only incoming edge into $W_i$ is from $W_{i-1}$, and every conditional distribution satisfies $P(w_i \mid w_{i-1}) = p$, $P(w_i \mid \bar{w}_{i-1}) = q$, for some $0 < p, q < 1$. Then, the conditional distribution is given by

\[ P(w_k \mid w_0) = \frac{q - (p-1)(p-q)^k}{q - p + 1}, \qquad P(w_k \mid \bar{w}_0) = \frac{q - q(p-q)^k}{q - p + 1}. \]

Proof. Since $W_0, \ldots, W_k$ is a chain, the value of $W_k$ is a function of $W_0$ when all other $W_1, \ldots, W_{k-1}$ are marginalized, and all $W_i$, $1 \leq i \leq k$, are independent of any other variable given $W_0$. Therefore, the distribution $P(w_k \mid w_0)$ is equal to $\sum_{w_1, \ldots, w_{k-1}} \prod_{i=1}^{k} P(w_i \mid w_{i-1})$, because any other variable can be removed from any product in this expression and summed out. This distribution can be calculated as the product of $2 \times 2$ matrices corresponding to the conditional distributions $P(w_i \mid w_{i-1})$, encoded as

\[ W_M = \begin{bmatrix} p & q \\ 1-p & 1-q \end{bmatrix}. \]

The product of $k$ such matrices is readily available if $W_M$ is decomposed using its eigenvalues $\{1, p-q\}$ and eigenvectors $\big\{\big[\tfrac{q}{1-p}, 1\big], [1, -1]\big\}$:

\[ \big[P(w_k \mid w_0)\big] = (W_M)^k \tag{19} \]
\[ = \begin{bmatrix} \dfrac{q-(p-1)(p-q)^k}{q-p+1} & \dfrac{q-q(p-q)^k}{q-p+1} \\[8pt] 1-\dfrac{q-(p-1)(p-q)^k}{q-p+1} & 1-\dfrac{q-q(p-q)^k}{q-p+1} \end{bmatrix} \tag{20} \]

$\square$
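The closed forms of Lemma 8 can be checked against a direct matrix power; a sketch with illustrative values of $p$, $q$ and $k$:

```python
import numpy as np

# transition matrix from Lemma 8; column j is the distribution of W_i
# given the j-th value of W_{i-1} (illustrative parameters)
p, q, k = 0.6, 0.4, 3
WM = np.array([[p, q],
               [1 - p, 1 - q]])

power = np.linalg.matrix_power(WM, k)

# closed forms for P(w_k | w_0) and P(w_k | \bar{w}_0)
pk_w0 = (q - (p - 1) * (p - q) ** k) / (q - p + 1)
pk_not_w0 = (q - q * (p - q) ** k) / (q - p + 1)
# power[0, 0] matches pk_w0 and power[0, 1] matches pk_not_w0
```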

Definition 8 (Generalized Adjustment Criterion Type 1). A set $\mathbf{Z}$ satisfies the generalized criterion relative to $\mathbf{X}, \mathbf{Y}$ in a causal model with graph $G$ augmented with the selection mechanism $S$ if:

(a) No element of $\mathbf{Z}$ is a descendant in $G_{\overline{\mathbf{X}}}$ of any $W \notin \mathbf{X}$ which lies on a proper causal path from $\mathbf{X}$ to $\mathbf{Y}$.
(b) All non-causal paths between $\mathbf{X}$ and $\mathbf{Y}$ in $G$ are blocked by $\mathbf{Z}$ and $S$.
(c) $\mathbf{Y}$ is d-separated from $S$ given $\mathbf{X}$ under the intervention $do(\mathbf{x})$, i.e., $(\mathbf{Y} \perp\!\!\!\perp S \mid \mathbf{X})_{G_{\overline{\mathbf{X}}}}$.
(d) Every $X \in \mathbf{X}$ is either a non-ancestor of $S$ or it is independent of $\mathbf{Y}$ in $G_{\underline{X}}$, i.e., $\forall X \in \mathbf{X} \cap An_S:\ (X \perp\!\!\!\perp \mathbf{Y})_{G_{\underline{X}}}$.

$G_{\overline{\mathbf{X}(S)}}$ is the graph where all edges into those $X \in \mathbf{X} \cap An_S$ are removed, where $An_{V}$ stands for the set of ancestors of a node or set $V$ in $G$.

Theorem 1 (Generalized Adjustment Formula Type 1). Given disjoint sets of variables $\mathbf{X}, \mathbf{Y}$ and $\mathbf{Z}$ in a causal model with graph $G$, the effect $P(\mathbf{y} \mid do(\mathbf{x}))$ is given by

\[ P(\mathbf{y} \mid do(\mathbf{x})) = \sum_{\mathbf{Z}} P(\mathbf{y} \mid \mathbf{x}, \mathbf{z}, S{=}1)\, P(\mathbf{z} \mid S{=}1) \tag{21} \]

in every model inducing $G$ if and only if $\mathbf{X}, \mathbf{Y}, \mathbf{Z}$ satisfy the Generalized Adjustment Criterion Type 1 (Def. 8).
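Eq. (21) can be exercised end-to-end on a small model. The graph below ($A \to X$, $A \to S$, $Z \to X$, $Z \to Y$, $X \to Y$) and all probability values are hypothetical choices, picked so that $\{Z\}$ satisfies Def. 8 (selection depends only on the pre-treatment variable $A$, which is not an ancestor of $Y$ except through $X$); the adjustment estimate computed purely from the selection-biased distribution then matches the interventional quantity obtained from the truncated factorization:

```python
from itertools import product

# hypothetical model: A -> X, A -> S, Z -> X, Z -> Y, X -> Y
pA = {1: 0.4, 0: 0.6}
pZ = {1: 0.3, 0: 0.7}
pX = lambda a, z: 0.1 + 0.4 * a + 0.3 * z      # P(X=1 | a, z)
pY = lambda x, z: 0.2 + 0.3 * x + 0.4 * z      # P(Y=1 | x, z)
pS = lambda a: 0.9 - 0.6 * a                   # P(S=1 | a)

def b(p1, v):                                  # P(V=v) from P(V=1)=p1
    return p1 if v == 1 else 1 - p1

# full joint P(a, z, x, y, s) by enumeration
joint = {}
for a, z, x, y, s in product((0, 1), repeat=5):
    joint[a, z, x, y, s] = (pA[a] * pZ[z] * b(pX(a, z), x)
                            * b(pY(x, z), y) * b(pS(a), s))

def P(pred, cond=lambda *_: True):
    num = sum(v for k, v in joint.items() if pred(*k) and cond(*k))
    den = sum(v for k, v in joint.items() if cond(*k))
    return num / den

# ground truth via the truncated factorization: P(Y=1 | do(X=1))
truth = sum(pZ[z] * pY(1, z) for z in (0, 1))

# Eq. (21): adjust for Z using only the S=1 (selection-biased) distribution
est = 0.0
for z0 in (0, 1):
    p_y = P(lambda a, z, x, y, s: y == 1,
            lambda a, z, x, y, s: x == 1 and z == z0 and s == 1)
    p_z = P(lambda a, z, x, y, s: z == z0,
            lambda a, z, x, y, s: s == 1)
    est += p_y * p_z
# est coincides with truth
```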

Lemma 2. Let $\mathbf{X}, \mathbf{Y}, \mathbf{Z}$ be three disjoint sets of variables in the graph $G$. If $\mathbf{Z}$ satisfies the conditions of the criterion Type 1 relative to the pair $\mathbf{X}, \mathbf{Y}$, then $\mathbf{Z}$ can be partitioned into sets of variables as follows:

• $\mathbf{Z}^{Y,1}_{nd} = \{Z \mid Z \in \mathbf{Z} \setminus De_{\mathbf{X}} \text{ and } (Z \perp\!\!\!\perp \mathbf{Y} \mid \mathbf{X}, S)_{G_{\overline{\mathbf{X}}}}\}$
• $\mathbf{Z}^{X,1}_{nd} = \{Z \mid Z \in \mathbf{Z} \setminus De_{\mathbf{X}} \setminus \mathbf{Z}^{Y,1}_{nd} \text{ and } (Z \perp\!\!\!\perp \mathbf{X} \mid \mathbf{Z}^{Y,1}_{nd}, S)_{G_{\overline{\mathbf{X}(S)}}}\}$
• $\mathbf{Z}^{Y}_{d} = \{Z \mid Z \in \mathbf{Z} \cap De_{\mathbf{X}} \text{ and } (Z \perp\!\!\!\perp \mathbf{Y} \mid \mathbf{X}, \mathbf{Z}^{Y,1}_{nd}, \mathbf{Z}^{X,1}_{nd}, S)_{G_{\overline{\mathbf{X}}}}\}$
• $\mathbf{Z}^{X}_{d} = (\mathbf{Z} \cap De_{\mathbf{X}}) \setminus \mathbf{Z}^{Y}_{d}$
• $\mathbf{Z}^{Y,2}_{nd} = \{Z \mid Z \in \mathbf{Z} \setminus De_{\mathbf{X}} \setminus \mathbf{Z}^{Y,1}_{nd} \setminus \mathbf{Z}^{X,1}_{nd} \text{ and } (Z \perp\!\!\!\perp \mathbf{Y} \mid \mathbf{X}, \mathbf{Z}^{Y,1}_{nd}, \mathbf{Z}^{X,1}_{nd}, \mathbf{Z}^{Y}_{d}, \mathbf{Z}^{X}_{d}, S)_{G_{\overline{\mathbf{X}}}}\}$
• $\mathbf{Z}^{X,2}_{nd} = \mathbf{Z} \setminus De_{\mathbf{X}} \setminus \mathbf{Z}^{Y,1}_{nd} \setminus \mathbf{Z}^{X,1}_{nd} \setminus \mathbf{Z}^{Y,2}_{nd}$

where $(\mathbf{Z}^{X}_{d} \perp\!\!\!\perp \mathbf{X} \mid \mathbf{Z}^{Y}_{d}, \mathbf{Z}^{Y,1}_{nd}, \mathbf{Z}^{X,1}_{nd}, S)_{G_{\overline{\mathbf{X}(\mathbf{Z}^{Y}_{d}, S)}}}$ and $(\mathbf{Z}^{X,2}_{nd} \perp\!\!\!\perp \mathbf{X} \mid \mathbf{Z} \setminus \mathbf{Z}^{X,2}_{nd}, S)_{G_{\overline{\mathbf{X}(\mathbf{Z}^{Y}_{d}, \mathbf{Z}^{X}_{d}, S)}}}$ hold.

Proof. First, to show $(\mathbf{Z}^{X,2}_{nd} \perp\!\!\!\perp \mathbf{X} \mid \mathbf{Z} \setminus \mathbf{Z}^{X,2}_{nd}, S)_{G_{\overline{\mathbf{X}(\mathbf{Z}^{Y}_{d}, \mathbf{Z}^{X}_{d}, S)}}}$, we will assume, for the sake of contradiction, that it is not the case. Then there exists a covariate $Z' \in \mathbf{Z} \setminus De_{\mathbf{X}} \setminus \mathbf{Z}^{Y,1}_{nd} \setminus \mathbf{Z}^{X,1}_{nd} \setminus \mathbf{Z}^{Y,2}_{nd}$ that is d-connected to some $X \in \mathbf{X}$ by a path $q$. For $Z'$ not to be in $\mathbf{Z}^{Y,1}_{nd}$, it must be d-connected to some $Y \in \mathbf{Y}$ by a path $p$ that does not contain any collider, except possibly $Z'$ itself. In particular, $p$ may not contain $S$, because of cond. (c). For $Z'$ not to be in $\mathbf{Z}^{Y,2}_{nd}$, $p$ does not contain any covariates in $\mathbf{Z}^{Y,1}_{nd}, \mathbf{Z}^{X,1}_{nd}, \mathbf{Z}^{Y}_{d}, \mathbf{Z}^{X}_{d}$ or $\mathbf{Z}^{Y,2}_{nd}$ because, as $p$ has no colliders, any of those would close it. If $X$ is not an ancestor of $S$, $q$ must have arrows going out of $X$. Since $Z'$ is a non-descendant of $X$, $q$ would contain a collider, but for $Z'$ not to be in $\mathbf{Z}^{X,1}_{nd}$ such a collider must be an ancestor of $S$, contradicting the assumption that $X$ is not an ancestor of $S$. Hence, $X$ must be an ancestor of $S$.

If the path $q$ has arrows into $X$, the junction of the paths $q$ and $p$ witnesses a violation of cond. (d) of the criterion, unless $Z'$ is a collider, in which case the same path is non-causal, proper, and open after all variables in $\mathbf{Z}$ are observed, contradicting cond. (b). If $q$ has arrows going out of $X$, and given that $Z'$ is not a descendant of $X$, $S$ must be a collider in $q$ for $Z'$ not to be in $\mathbf{Z}^{X,1}_{nd}$. But, for cond. (c) to hold, $Z'$ must be a descendant of a collider in $p$ itself, because $p$ has no other colliders. Then, observing $Z'$ will activate the path constituted by the concatenation of $q$ and $p$, contradicting cond. (b). As a consequence, such a $Z'$ cannot exist, proving the first claim.

Second, consider $(\mathbf{Z}^{X}_{d} \perp\!\!\!\perp \mathbf{X} \mid \mathbf{Z}^{Y}_{d}, \mathbf{Z}^{Y,1}_{nd}, \mathbf{Z}^{X,1}_{nd}, S)_{G_{\overline{\mathbf{X}(\mathbf{Z}^{Y}_{d}, S)}}}$ and note that the empty set satisfies it. For the case when $(\mathbf{Z} \cap De_{\mathbf{X}}) \setminus \mathbf{Z}^{Y}_{d} \neq \emptyset$, assume for the sake of contradiction that the independence does not hold. Then there exist $Z' \in \mathbf{Z}$ and some $X \in \mathbf{X}$ such that a directed path $q$ goes from $X$ to $Z'$ and is open when $\mathbf{X}$ and $S$ are observed (note that $q$ cannot contain any non-descendant of $X$). Since $Z' \notin \mathbf{Z}^{Y}_{d}$, there exists also a path $p$ from $Z'$ to some $Y \in \mathbf{Y}$ that is open when $\mathbf{X}, \mathbf{Z}^{Y,1}_{nd}, \mathbf{Z}^{X,1}_{nd}, \mathbf{Z}^{Y}_{d}, S$ are observed, and does not contain any node in $\mathbf{X}$.

The path $p$ must contain a collider (possibly $Z'$); otherwise, the junction of $q$ and $p$ is a proper causal path containing $Z'$, which contradicts cond. (a) of the criterion. If $Z'$ is the collider itself, cond. (b) will be violated unless a variable in $\mathbf{Z}^{Y,2}_{nd}$ or $\mathbf{Z}^{X,2}_{nd}$ closes $q$. But the former cannot do it because any variable in $q$ would not satisfy the definition of $\mathbf{Z}^{Y,2}_{nd}$, and similarly, no node in $q$ can be in $\mathbf{Z}^{X,2}_{nd}$. Therefore, $Z'$ cannot be the collider in $p$. This means that some variable $W \in \{S\} \cup \mathbf{Z}^{Y}_{d} \cup \mathbf{Z}^{Y,1}_{nd} \cup \mathbf{Z}^{X,1}_{nd}$ is the collider in $p$. However, $S$ cannot be the one, because that would violate cond. (c). Similarly, $W$ does not satisfy the definition of $\mathbf{Z}^{Y}_{d}$, because it is not independent of $Y$. And $W$ cannot be in $\mathbf{Z}^{Y,1}_{nd}$ or $\mathbf{Z}^{X,1}_{nd}$ because it is a descendant of $X$. As a consequence, such an active path $p$ is not possible, meaning that $Z' \in \mathbf{Z}^{Y}_{d}$ or does not exist. $\square$

Proof. (Of Theorem 1) (if) Suppose $\mathbf{Z}$ satisfies the conditions of the theorem relative to the pair $\mathbf{X}, \mathbf{Y}$. Then $\mathbf{Z}$ can be partitioned into several sets as in Lemma 2. The causal effect is derived as follows.

First, by condition (c), $(\mathbf{Y} \perp\!\!\!\perp S \mid \mathbf{X})_{G_{\overline{\mathbf{X}}}}$, and $S$ can be introduced into the expression:

\[ P(\mathbf{y} \mid do(\mathbf{x})) = P(\mathbf{y} \mid do(\mathbf{x}), S{=}1) \tag{22} \]

By definition, $\mathbf{Z}^{Y,1}_{nd}$ can be introduced in the factor along with a sum over that set:

\[ P(\mathbf{y} \mid do(\mathbf{x})) = \sum_{\mathbf{Z}^{Y,1}_{nd}} P(\mathbf{y} \mid do(\mathbf{x}), \mathbf{z}^{Y,1}_{nd}, S{=}1)\, P(\mathbf{z}^{Y,1}_{nd} \mid S{=}1) \tag{23} \]

Conditioning on $\mathbf{Z}^{X,1}_{nd}$, it becomes:

\[ P(\mathbf{y} \mid do(\mathbf{x})) = \sum_{\mathbf{Z}^{Y,1}_{nd}, \mathbf{Z}^{X,1}_{nd}} \Big[ P(\mathbf{y} \mid do(\mathbf{x}), \mathbf{z}^{Y,1}_{nd}, \mathbf{z}^{X,1}_{nd}, S{=}1)\, P(\mathbf{z}^{X,1}_{nd} \mid do(\mathbf{x}), \mathbf{z}^{Y,1}_{nd}, S{=}1)\, P(\mathbf{z}^{Y,1}_{nd} \mid S{=}1) \Big] \tag{24} \]

By definition of $\mathbf{Z}^{X,1}_{nd}$, the independence $(\mathbf{Z}^{X,1}_{nd} \perp\!\!\!\perp \mathbf{X} \mid \mathbf{Z}^{Y,1}_{nd}, S)_{G_{\overline{\mathbf{X}(S)}}}$ holds (note that $G_{\overline{\mathbf{X}(S)}} = G_{\overline{\mathbf{X}(\mathbf{Z}^{Y,1}_{nd}, S)}}$ since $\mathbf{Z}^{Y,1}_{nd}$ is not a descendant of $\mathbf{X}$), and rule 3 of the do-calculus allows the removal of $do(\mathbf{x})$ from the second term of the previous expression, which in turn allows joining the second and third factors together using the chain rule:

\[ P(\mathbf{y} \mid do(\mathbf{x})) = \sum_{\mathbf{Z}^{Y,1}_{nd}, \mathbf{Z}^{X,1}_{nd}} P(\mathbf{y} \mid do(\mathbf{x}), \mathbf{z}^{Y,1}_{nd}, \mathbf{z}^{X,1}_{nd}, S{=}1)\, P(\mathbf{z}^{Y,1}_{nd}, \mathbf{z}^{X,1}_{nd} \mid S{=}1) \tag{25} \]

The second factor can be summed over $\mathbf{Z}^{Y}_{d}$; then pull the sum out and introduce $\mathbf{Z}^{Y}_{d}$ in the first factor:

\[ P(\mathbf{y} \mid do(\mathbf{x})) = \sum_{\mathbf{Z}^{Y,1}_{nd}, \mathbf{Z}^{X,1}_{nd}, \mathbf{Z}^{Y}_{d}} P(\mathbf{y} \mid do(\mathbf{x}), \mathbf{z}^{Y,1}_{nd}, \mathbf{z}^{X,1}_{nd}, \mathbf{z}^{Y}_{d}, S{=}1)\, P(\mathbf{z}^{Y,1}_{nd}, \mathbf{z}^{X,1}_{nd}, \mathbf{z}^{Y}_{d} \mid S{=}1) \tag{26} \]

Conditioning the first factor on $\mathbf{Z}^{X}_{d}$ yields:

\[ P(\mathbf{y} \mid do(\mathbf{x})) = \sum_{\mathbf{Z}^{Y,1}_{nd}, \mathbf{Z}^{X,1}_{nd}, \mathbf{Z}^{Y}_{d}, \mathbf{Z}^{X}_{d}} \Big[ P(\mathbf{y} \mid do(\mathbf{x}), \mathbf{z}^{Y,1}_{nd}, \mathbf{z}^{X,1}_{nd}, \mathbf{z}^{Y}_{d}, \mathbf{z}^{X}_{d}, S{=}1)\, P(\mathbf{z}^{X}_{d} \mid do(\mathbf{x}), \mathbf{z}^{Y,1}_{nd}, \mathbf{z}^{X,1}_{nd}, \mathbf{z}^{Y}_{d}, S{=}1)\, P(\mathbf{z}^{Y,1}_{nd}, \mathbf{z}^{X,1}_{nd}, \mathbf{z}^{Y}_{d} \mid S{=}1) \Big] \tag{27} \]

By Lemma 2, the independence $(\mathbf{Z}^{X}_{d} \perp\!\!\!\perp \mathbf{X} \mid \mathbf{Z}^{Y}_{d}, \mathbf{Z}^{Y,1}_{nd}, \mathbf{Z}^{X,1}_{nd}, S)_{G_{\overline{\mathbf{X}(\mathbf{Z}^{Y}_{d}, S)}}}$ holds, and using rule 3 of the do-calculus, the $do(\mathbf{x})$ operator can be removed from the second factor, which allows joining it with the third factor:

\[ P(\mathbf{y} \mid do(\mathbf{x})) = \sum_{\mathbf{Z}^{Y,1}_{nd}, \mathbf{Z}^{X,1}_{nd}, \mathbf{Z}^{Y}_{d}, \mathbf{Z}^{X}_{d}} P(\mathbf{y} \mid do(\mathbf{x}), \mathbf{z}^{Y,1}_{nd}, \mathbf{z}^{X,1}_{nd}, \mathbf{z}^{Y}_{d}, \mathbf{z}^{X}_{d}, S{=}1)\, P(\mathbf{z}^{Y,1}_{nd}, \mathbf{z}^{X,1}_{nd}, \mathbf{z}^{Y}_{d}, \mathbf{z}^{X}_{d} \mid S{=}1) \tag{28} \]

Summing over $\mathbf{Z}^{Y,2}_{nd}$ in the second factor, pulling the summation out, and introducing the same term into the first factor using the independence $(\mathbf{Z}^{Y,2}_{nd} \perp\!\!\!\perp \mathbf{Y} \mid \mathbf{X}, \mathbf{Z}^{Y,1}_{nd}, \mathbf{Z}^{X,1}_{nd}, \mathbf{Z}^{Y}_{d}, \mathbf{Z}^{X}_{d}, S)_{G_{\overline{\mathbf{X}}}}$ yields:

\[ P(\mathbf{y} \mid do(\mathbf{x})) = \sum_{\mathbf{Z}^{Y,1}_{nd}, \mathbf{Z}^{X,1}_{nd}, \mathbf{Z}^{Y}_{d}, \mathbf{Z}^{X}_{d}, \mathbf{Z}^{Y,2}_{nd}} \Big[ P(\mathbf{y} \mid do(\mathbf{x}), \mathbf{z}^{Y,1}_{nd}, \mathbf{z}^{X,1}_{nd}, \mathbf{z}^{Y}_{d}, \mathbf{z}^{X}_{d}, \mathbf{z}^{Y,2}_{nd}, S{=}1)\, P(\mathbf{z}^{Y,1}_{nd}, \mathbf{z}^{X,1}_{nd}, \mathbf{z}^{Y}_{d}, \mathbf{z}^{X}_{d}, \mathbf{z}^{Y,2}_{nd} \mid S{=}1) \Big] \tag{29} \]

Conditioning on $\mathbf{Z}^{X,2}_{nd}$:

\[ P(\mathbf{y} \mid do(\mathbf{x})) = \sum_{\mathbf{Z}} \Big[ P(\mathbf{y} \mid do(\mathbf{x}), \mathbf{z}, S{=}1)\, P(\mathbf{z}^{X,2}_{nd} \mid do(\mathbf{x}), \mathbf{z}^{Y,1}_{nd}, \mathbf{z}^{X,1}_{nd}, \mathbf{z}^{Y}_{d}, \mathbf{z}^{X}_{d}, \mathbf{z}^{Y,2}_{nd}, S{=}1)\, P(\mathbf{z}^{Y,1}_{nd}, \mathbf{z}^{X,1}_{nd}, \mathbf{z}^{Y}_{d}, \mathbf{z}^{X}_{d}, \mathbf{z}^{Y,2}_{nd} \mid S{=}1) \Big] \tag{30} \]

By Lemma 2, the independence $(\mathbf{Z}^{X,2}_{nd} \perp\!\!\!\perp \mathbf{X} \mid \mathbf{Z} \setminus \mathbf{Z}^{X,2}_{nd}, S)_{G_{\overline{\mathbf{X}(\mathbf{Z}^{Y}_{d}, \mathbf{Z}^{X}_{d}, S)}}}$ holds, and using rule 3 of the do-calculus, the $do(\mathbf{x})$ operator can be removed from the second factor, which allows joining it with the third factor:

\[ P(\mathbf{y} \mid do(\mathbf{x})) = \sum_{\mathbf{Z}} P(\mathbf{y} \mid do(\mathbf{x}), \mathbf{z}, S{=}1)\, P(\mathbf{z} \mid S{=}1) \tag{31} \]

Conditions (a) and (b) imply $(\mathbf{Y} \perp\!\!\!\perp \mathbf{X} \mid \mathbf{Z}, S{=}1)_{G_{\underline{\mathbf{X}}}}$, which can be used together with rule 2 of the do-calculus to remove the $do()$ operator from the first factor, resulting in the adjustment formula:

\[ P(\mathbf{y} \mid do(\mathbf{x})) = \sum_{\mathbf{Z}} P(\mathbf{y} \mid \mathbf{x}, \mathbf{z}, S{=}1)\, P(\mathbf{z} \mid S{=}1) \tag{32} \]

(Only if) Note that condition (b) extends the second condition of the adjustment criterion by also requiring all non-causal paths to be blocked even when $S$ is observed. Suppose conditions (a) and (b) do not hold; then for any model compatible with $G_{\overline{S}}$, which is also compatible with $G$, the adjustment formula is equal to $\sum_{\mathbf{Z}} P(\mathbf{y} \mid \mathbf{x}, \mathbf{z}) P(\mathbf{z})$. But by the adjustment criterion (Shpitser, VanderWeele, and Robins 2010), this expression will not be equal to $P(\mathbf{y} \mid do(\mathbf{x}))$.

To show the necessity of the extension of condition (b), and of conditions (c) and (d), we construct counter-examples to the identifiability and recoverability of the causal effect. In every case, let $\mathbf{V}$ represent all variables in the graph except for the selection mechanism $S$, and let $Q$ refer to the adjustment formula as in Eq. (32). We construct two SCMs, $M_1$ and $M_2$, that induce probability distributions $P_1$ and $P_2$, respectively. $M_1$ and $M_2$ will be compatible with $G$ and agree on the probability distribution under selection bias,

\[ P_1(\mathbf{v} \mid S{=}1) = P_2(\mathbf{v} \mid S{=}1) \tag{33} \]

but $Q_1$ in $M_1$ will be a different distribution than $Q_2$ in $M_2$. Let $M_1$ be compatible with $G$ and $M_2$ with $G_{\overline{S}}$, making $S$ independent of $Pa_S$ in $M_2$ (i.e., $(\mathbf{V} \perp\!\!\!\perp S)_{P_2}$). Recoverability should hold for any parametrization; hence, without loss of generality, all variables are assumed to be binary. The construction parametrizes $P_1$ through its factors (as in Lemma 6) and then parametrizes $P_2$ to enforce (33). As a consequence, $P_2(\mathbf{v}) = P_2(\mathbf{v} \mid S{=}1)$.

Without loss of generality, our attention can be directed to the particular $Y' \in \mathbf{Y}$ not satisfying the condition, and to the causal effect on $Y'$. To do this, the constructed model will have every variable in $\mathbf{Y} \setminus \{Y'\}$ disconnected from the graph; more precisely, $(\mathbf{Y} \setminus \{Y'\} \perp\!\!\!\perp \mathbf{V})$ holds, so that:

\[ P(\mathbf{y} \mid do(\mathbf{x})) = \sum_{\mathbf{Z}} P(\mathbf{y} \mid \mathbf{x}, \mathbf{z})P(\mathbf{z}) = \sum_{\mathbf{Z}} \prod_{Y \in \mathbf{Y}} P(y \mid \mathbf{x}, \mathbf{z})\, P(\mathbf{z}) = \Big( \prod_{Y \in \mathbf{Y} \setminus \{Y'\}} P(y) \Big) \sum_{\mathbf{Z}} P(y' \mid \mathbf{x}, \mathbf{z})P(\mathbf{z}) = \gamma \sum_{\mathbf{Z}} P(y' \mid \mathbf{x}, \mathbf{z})P(\mathbf{z}) \]

where $\gamma$ represents the product of the marginal distributions of the remaining $\mathbf{Y} \setminus \{Y'\}$.

Figure 6: Non-causal path between $X$ and $Y$ activated when $S$ and $\mathbf{Z}$ are observed. Edges with dotted lines encode arbitrarily long chains of nodes.

Suppose that condition (b) is not satisfied because there is some non-causal path from $X \in \mathbf{X}$ to $Y \in \mathbf{Y}$ that is blocked by $\mathbf{Z}$ but open when $S$ is observed. The model in Fig. 6 contains a non-causal path that has arrows going out of $X$ and $Y$. Models with arrows coming into those variables belong to the same equivalence class as this one (Lemma 7 can be used to reverse the directionality along chains of variables). It also contains one collider before and one after $S$. Models with more colliders in the non-causal path can be constructed in a similar way. Following the general construction described before, the values of $Q_1$ and $Q_2$ are:

\[ Q_1 = \gamma \sum_{\mathbf{Z}} P_1(y' \mid x, \mathbf{z})P_1(\mathbf{z}) = \gamma \sum_{\mathbf{Z}} P_1(y')P_1(\mathbf{z}) = \gamma P_1(y') \]

\[ Q_2 = \gamma \sum_{\mathbf{Z}} P_2(y' \mid x, \mathbf{z})P_2(\mathbf{z}) = \gamma \sum_{\mathbf{Z}} P_1(y' \mid x, \mathbf{z}, S{=}1)P_1(\mathbf{z} \mid S{=}1) = \gamma \sum_{\mathbf{Z}} \frac{\sum_{L,N,R,Q,P,O} P_1(y', x, \mathbf{z}, l, n, r, q, p, o, S{=}1)}{\sum_{Y', L,N,R,Q,P,O} P_1(y', x, \mathbf{z}, l, n, r, q, p, o, S{=}1)}\, P_1(\mathbf{z} \mid S{=}1) \]

The numerator of the first term in the summation can be factorized as

\[ P_1(y', x, \mathbf{z}, l, n, r, q, p, o, S{=}1) = P_1(x)P_1(l \mid x)P_1(z^*_1 \mid l, n)P_1(n)P_1(r \mid n)P_1(S{=}1 \mid r, q)P_1(q \mid p)P_1(z^*_2 \mid p, o)P_1(p)P_1(o \mid y')P_1(y') \]

All the conditional distributions in the previous expression can be parametrized using Lemma 6 and Lemma 8 with parameters $p = 3/5$, $q = 2/5$, as follows: $P_1(x) = P_1(y') = P_1(n) = P_1(p) = 1/2$; $P_1(z^*_1 \mid l, n) = P_1(z^*_1 \mid \bar{l}, \bar{n}) = 3/4$, $P_1(z^*_1 \mid \bar{l}, n) = P_1(z^*_1 \mid l, \bar{n}) = 1/4$; $P_1(z^*_2 \mid p, o) = P_1(z^*_2 \mid \bar{p}, o) = P_1(z^*_2 \mid p, \bar{o}) = 1/2$, $P_1(z^*_2 \mid \bar{p}, \bar{o}) = 3/4$; $P_1(S{=}1 \mid r, q) = P_1(S{=}1 \mid \bar{r}, \bar{q}) = 1/2$, $P_1(S{=}1 \mid \bar{r}, q) = 1/4$, $P_1(S{=}1 \mid r, \bar{q}) = 3/4$; $P_1(l \mid x) = 1/2 + \epsilon_1/2$, $P_1(l \mid \bar{x}) = 1/2 - \epsilon_1/2$; $P_1(r \mid n) = 1/2 + \epsilon_2/2$, $P_1(r \mid \bar{n}) = 1/2 - \epsilon_2/2$; $P_1(q \mid p) = 1/2 + \epsilon_3/2$, $P_1(q \mid \bar{p}) = 1/2 - \epsilon_3/2$; $P_1(o \mid y') = 1/2 + \epsilon_4/2$, $P_1(o \mid \bar{y}') = 1/2 - \epsilon_4/2$; where $\epsilon_i = (1/5)^{k_i}$, and $k_1$ is the length of the path from $X$ to $L$, $k_2$ the length of the path from $N$ to $R$, $k_3$ the length of the path from $P$ to $Q$, and $k_4$ the length of the path from $Y'$ to $O$.

Calculating both $Q_1$ and $Q_2$ with this parametrization, we obtain $Q_1 = \gamma/2$ and

\[ Q_2 = \frac{\gamma}{2}\bigg( \frac{14\epsilon_3\epsilon_4}{(\epsilon_1(7\epsilon_2 + \epsilon_2\epsilon_3) + 56)(\epsilon_3 + 7)} - \frac{14\epsilon_3\epsilon_4}{(\epsilon_1(7\epsilon_2 + \epsilon_2\epsilon_3) - 56)(\epsilon_3 + 7)} + \frac{-2\epsilon_3 - \epsilon_3\epsilon_4 + \epsilon_3^2\epsilon_4 - \epsilon_3^2 + 63}{(\epsilon_3 + 7)(\epsilon_3 - 9)} - \frac{18\epsilon_3\epsilon_4}{(\epsilon_1(9\epsilon_2 - \epsilon_2\epsilon_3) - 72)(\epsilon_3 - 9)} + \frac{18\epsilon_3\epsilon_4}{(\epsilon_1(9\epsilon_2 - \epsilon_2\epsilon_3) + 72)(\epsilon_3 - 9)} \bigg) \]

$Q_1$ and $Q_2$ coincide only when one of the $\epsilon_i$, $i = 1, 2, 3, 4$, is equal to $0$, which is never possible in this parametrization, since every $\epsilon_i = (1/5)^{k_i}$ and every $k_i > 0$.

Suppose condition (c) does not hold; then there is an open path between $\mathbf{Y}$ and $S$ in $G_{\overline{\mathbf{X}}}$. The following are the cases in which $Y'$ may violate cond. (c); Figure 7 illustrates the structure of the cases graphically.

Figure 7: Cases considered for the necessity of condition (c) in the proof of Thm. 1. Dotted directed arrows indicate chains of arbitrary length in the graph.

case 1: $Y' \in Pa_S$.
Let $\mathbf{W}$ be the set of nodes connecting $X$ and $Y'$ with directed paths. Consider the induced subgraph $G'$ where all nodes in $\mathbf{V} \setminus \{X, \mathbf{W}, Y', S\}$ are disconnected from $\{X, \mathbf{W}, Y', S\}$. It must be the case that $\mathbf{Z}$ and $\mathbf{W}$ are disjoint, or else condition (a) is violated. Consequently, every $Z$ is disconnected, and $(\mathbf{Z} \perp\!\!\!\perp Y')_{P_1}$ holds. $M_1$ and $M_2$ are constructed from $G'$; the adjustment formula in the second model can be expressed as:

\[ Q_2 = \gamma \sum_{\mathbf{Z}} P_2(y' \mid x, \mathbf{z})P_2(\mathbf{z}) = \gamma \sum_{\mathbf{Z}} P_1(y' \mid x, \mathbf{z}, S{=}1)P_1(\mathbf{z} \mid S{=}1) = \gamma \sum_{\mathbf{Z}} P_1(y' \mid x, S{=}1)P_1(\mathbf{z} \mid S{=}1) = \gamma P_1(y' \mid x, S{=}1) \]
\[ = \gamma \frac{P_1(y', x, S{=}1)}{\sum_{Y'} P_1(y', x, S{=}1)} = \gamma \frac{P_1(S{=}1 \mid y')P_1(y' \mid x)}{P_1(S{=}1 \mid y')P_1(y' \mid x) + P_1(S{=}1 \mid \bar{y}')P_1(\bar{y}' \mid x)} \]

Using Lemma 6, let $P_1(S{=}1 \mid y') = \alpha$ and $P_1(S{=}1 \mid \bar{y}') = \beta$, with $0 < \alpha, \beta < 1$ and $\alpha \neq \beta$. Proceed with Lemma 8 ($p = q = 1/2$) to define $P_1(y' \mid x) = 1/2$. The previous expression becomes:

\[ Q_2 = \gamma \frac{\alpha}{\alpha + \beta} \]

Following a similar derivation, it can be established that $Q_1 = \gamma/2$, which is never equal to $Q_2$ in this parametrization.
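The case 1 computation reduces to Bayes' rule on the collapsed chain; with hypothetical values for $\alpha$ and $\beta$ (and taking $\gamma = 1$), the mismatch with $Q_1 = 1/2$ is easy to confirm:

```python
# case 1 sketch: Y' is a parent of S, with P(Y'=1 | x) = 1/2,
# P(S=1 | Y'=1) = alpha and P(S=1 | Y'=0) = beta (illustrative values)
alpha, beta = 0.8, 0.4

# P(Y'=1 | x, S=1) by Bayes' rule, as derived above
q2 = (alpha * 0.5) / (alpha * 0.5 + beta * 0.5)
# q2 equals alpha / (alpha + beta), which differs from Q1 = 1/2
# whenever alpha != beta
```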

case 2: There is a directed path $p$ from $Y'$ to $S$ without any $Z$.
Let $R$ be the parent of $S$ in such a path, and let $\mathbf{W}_1$ be the set of nodes between $X$ and $Y'$, as in the previous case. Similarly, let $\mathbf{W}_2$ be the variables in the path from $Y'$ to $R$. Now consider the graph $G'$ where all nodes except $\{X, \mathbf{W}_1, Y', \mathbf{W}_2, R, S\}$ are disconnected from $\{X, \mathbf{W}_1, Y', \mathbf{W}_2, R, S\}$. Proceeding as in the previous case, with the consideration that $\mathbf{Z}$ is disconnected from the rest of the graph, yields:

\[ Q_2 = \gamma P_1(y' \mid x, S{=}1) = \gamma \frac{P_1(y', x, S{=}1)}{P_1(x, S{=}1)} \]

The numerator can be rewritten as:

\[ P_1(y', x, S{=}1) = \sum_{R} P_1(y', x, r, S{=}1) = \sum_{R} P_1(x)P_1(y' \mid x)P_1(r \mid y')P_1(S{=}1 \mid r) \]

Factorizing the denominator analogously, the term $P_1(x)$ is the same and can be cancelled out; then $Q_2$ becomes:

\[ Q_2 = \gamma \frac{P_1(y' \mid x)\sum_{R} P_1(r \mid y')P_1(S{=}1 \mid r)}{\sum_{Y'} P_1(y' \mid x)\sum_{R} P_1(r \mid y')P_1(S{=}1 \mid r)} \]

Using Lemma 8 to set $P_1(r \mid y') = 1/2 + \epsilon/2$ and $P_1(r \mid \bar{y}') = 1/2 - \epsilon/2$, where $\epsilon = (1/5)^k$ (using $p = 3/5$, $q = 2/5$), and defining $P_1(S{=}1 \mid r) = 2/3$ and $P_1(S{=}1 \mid \bar{r}) = 1/2$, leads to $Q_2 = \gamma(1/2 + \epsilon/14)$ and $Q_1 = \gamma/2$, which are never equal.

case 3: There is a directed path $p$ from $Y'$ to $S$ that contains some $Z_0 \in \mathbf{Z}$.
Let $R$ be the parent of $S$ in such a path. It can be assured that $X$ and $Y'$ are not connected by any causal path; otherwise, $Z_0$ violates condition (a). Let $\mathbf{W}_1$ be the nodes in the subpath between $Y'$ and $Z_0$, and $\mathbf{W}_2$ those in between $Z_0$ and $R$. Consider the graph $G'$ where all nodes except $\{X, Y', \mathbf{W}_1, Z_0, \mathbf{W}_2, R, S\}$ are disconnected from $\{X, Y', \mathbf{W}_1, Z_0, \mathbf{W}_2, R, S\}$. Every $Z \in \mathbf{Z}' = \mathbf{Z} \setminus \{Z_0\}$ is disconnected from the rest of the graph; then:

\[ Q_2 = \gamma \sum_{\mathbf{Z}} P_1(y' \mid x, \mathbf{z}, S{=}1)P_1(\mathbf{z} \mid S{=}1) = \gamma \sum_{Z_0} \sum_{\mathbf{Z}'} P_1(y' \mid x, z_0, \mathbf{z}', S{=}1)P_1(z_0, \mathbf{z}' \mid S{=}1) \]
\[ = \gamma \sum_{Z_0} P_1(y' \mid x, z_0, S{=}1)P_1(z_0 \mid S{=}1) = \gamma \sum_{Z_0} P_1(y' \mid x, z_0)P_1(z_0 \mid S{=}1) = \gamma \sum_{Z_0} \frac{P_1(y', x, z_0)}{\sum_{Y'} P_1(y', x, z_0)}\, P_1(z_0 \mid S{=}1) \]

The numerator of the fraction in the last expression is equal to

\[ P_1(y', x, z_0) = P_1(x)P_1(y')P_1(z_0 \mid y'). \]

A similar factorization can be employed for the denominator, as well as for $Q_1$. The factor $P_1(x)$ appears in both parts of the fractions and can be cancelled:

\[ Q_1 = \gamma \sum_{Z_0} \frac{P_1(y')P_1(z_0 \mid y')}{\sum_{Y'} P_1(y')P_1(z_0 \mid y')}\, P_1(z_0), \qquad Q_2 = \gamma \sum_{Z_0} \frac{P_1(y')P_1(z_0 \mid y')}{\sum_{Y'} P_1(y')P_1(z_0 \mid y')}\, P_1(z_0 \mid S{=}1) \]

Now, $P_1(z_0)$ and $P_1(z_0 \mid S{=}1)$ are derived in similar terms:

\[ P_1(z_0) = \sum_{Y'} P_1(z_0, y') = \sum_{Y'} P_1(y')P_1(z_0 \mid y') \]
\[ P_1(z_0 \mid S{=}1) = \frac{P_1(z_0, S{=}1)}{P_1(S{=}1)} = \frac{P_1(S{=}1 \mid z_0)\sum_{Y'} P_1(y')P_1(z_0 \mid y')}{P_1(S{=}1)} \]

Replacing $P_1(z_0)$ and $P_1(z_0 \mid S{=}1)$ in $Q_1$ and $Q_2$, then simplifying:

\[ Q_1 = \gamma \sum_{Z_0} P_1(y')P_1(z_0 \mid y') = \gamma P_1(y') \]
\[ Q_2 = \gamma \frac{\sum_{Z_0} P_1(y')P_1(z_0 \mid y')P_1(S{=}1 \mid z_0)}{P_1(S{=}1)} = \gamma \frac{P_1(y', S{=}1)}{P_1(S{=}1)} = \gamma \frac{P_1(y')P_1(S{=}1 \mid y')}{\sum_{Y'} P_1(S{=}1 \mid y')P_1(y')} \]

The term $P_1(S{=}1 \mid y') = \sum_{R} P_1(S{=}1 \mid r)P_1(r \mid y')$. Lemma 8 can be employed exactly as in the previous case, and $P_1(y')$ can be defined directly since $Y'$ has no parents, for instance $P_1(y') = 1/2$; then the queries end up as:

\[ Q_1 = \gamma\,\tfrac{1}{2}, \qquad Q_2 = \gamma\big(\tfrac{1}{2} + \tfrac{\epsilon}{14}\big) \]

    Which are never equal.case 4: There is a path p connecting Y 0 and S that goes

through an ancestor of both, and does not contain any node in $\mathbf{Z}$. Let $N$ be the closest common ancestor of $Y'$ and $S$. Let $R$ be the parent of $S$ and $Q$ the parent of $Y'$ in the mentioned path. Let $W_1$ be the set of nodes between $X$ and $Y'$, and let $W_2$ and $W_3$ be the nodes in the paths from $N$ to $Q$ and from $N$ to $R$, respectively. Consider the graph $G'$ where the arrows in the subpath from $N$ to $Q$ are reversed and all nodes except for $\{X, W_1, Y', Q, W_2, N, W_3, R, S\}$ are disconnected from this set. Any model constructed for $G'$ can be translated to a model compatible with $G$ using Lemma 7. Following the same derivation as in case 2 (taking into account that $\mathbf{Z}$ is disconnected from the rest of the graph) yields:

\[
Q_2 = \beta\,\frac{P_1(y', x, S{=}1)}{\sum_{y'} P_1(y', x, S{=}1)}
\]

    The numerator of the last expression can be rewritten as:

\[
P_1(y', x, S{=}1) = \sum_{q} P_1(y', x, q, S{=}1)
    = \sum_{q} P_1(x)\,P_1(y' \mid x, q)\,P_1(q)\,P_1(S{=}1 \mid q)
\]

By rewriting the denominator similarly, the term $P_1(x)$ appearing in both vanishes; the queries then become:

\[
Q_1 = \beta\,\frac{\sum_{q} P_1(y' \mid x, q)\,P_1(q)}{\sum_{y', q} P_1(y' \mid x, q)\,P_1(q)}
\]
\[
Q_2 = \beta\,\frac{\sum_{q} P_1(y' \mid x, q)\,P_1(q)\,P_1(S{=}1 \mid q)}{\sum_{y', q} P_1(y' \mid x, q)\,P_1(q)\,P_1(S{=}1 \mid q)}
\]

Lemma 8 can be employed to set $P_1(r \mid q) = 1/2 + \epsilon/2$, $P_1(r \mid \bar q) = 1/2 - \epsilon/2$, where $\epsilon = (1/5)^k$ (using $p = 3/5$, $q = 2/5$). Define $P(S{=}1 \mid r) = 2/3$ and $P(S{=}1 \mid \bar r) = 1/2$. Calculate $P(S{=}1 \mid q)$ as $\sum_{r} P_1(r \mid q)\,P_1(S{=}1 \mid r)$. Also by Lemma 8 let $P_1(y' \mid q, x) = P_1(y' \mid \bar q, \bar x) = 3/4$, $P_1(y' \mid \bar q, x) = P_1(y' \mid q, \bar x) = 1/2$. It leads to:

\[
Q_1 = \beta\,\frac{3}{8} \qquad Q_2 = \beta\left(\frac{3}{8} + \frac{\epsilon}{56}\right)
\]

which are never equal.

case 5: There is a path $p$ connecting $Y'$ and $S$ that goes through an ancestor of both, and contains some $Z' \in \mathbf{Z}$. Let $N, Q, R$ be defined as in the previous case, and construct $G'$ the same way. Following the same derivation strategy as in case 3, the query expressions become:

\[
Q_1 = \beta \sum_{z'} \frac{\sum_{q} P_1(y' \mid x, q)\,P_1(q)\,P_1(z' \mid q)}{\sum_{y', q} P_1(y' \mid x, q)\,P_1(q)\,P_1(z' \mid q)}\,P_1(z')
\]
\[
Q_2 = \beta \sum_{z'} \frac{\sum_{q} P_1(y' \mid x, q)\,P_1(q)\,P_1(z' \mid q)}{\sum_{y', q} P_1(y' \mid x, q)\,P_1(q)\,P_1(z' \mid q)}\,P_1(z' \mid S{=}1)
\]

Now, $P_1(z')$ and $P_1(z' \mid S{=}1)$ are derived in similar terms:

\[
P_1(z') = \sum_{q} P_1(z', q) = \sum_{q} P_1(q)\,P_1(z' \mid q)
\]
\[
P_1(z' \mid S{=}1) = \frac{\sum_{r} P_1(S{=}1, z', r)}{\sum_{z', r} P_1(S{=}1, z', r)}
    = \frac{P_1(z') \sum_{r} P_1(r \mid z')\,P_1(S{=}1 \mid r)}{\sum_{z'} P_1(z') \sum_{r} P_1(r \mid z')\,P_1(S{=}1 \mid r)}
\]

Use Lemma 8 to parametrize $P_1(r \mid z') = 1/2 + \epsilon_1/2$, $P_1(r \mid \bar z') = 1/2 - \epsilon_1/2$, $P(z' \mid q) = 1/2 + \epsilon_2/2$, $P(z' \mid \bar q) = 1/2 - \epsilon_2/2$, where $\epsilon_i = (1/5)^{k_i}$, $i \in \{1, 2\}$ (using $p = 3/5$, $q = 2/5$ in both cases). Define $P(S{=}1 \mid r) = 2/3$ and $P(S{=}1 \mid \bar r) = 1/2$. Also by Lemma 8 let $P_1(y' \mid q, x) = P_1(y' \mid \bar q, \bar x) = 3/4$, $P_1(y' \mid \bar q, x) = P_1(y' \mid q, \bar x) = 1/2$. The queries end up as:

\[
Q_1 = \beta\,\frac{3}{8} \qquad Q_2 = \beta\left(\frac{3}{8} + \frac{\epsilon_1 \epsilon_2}{56}\right)
\]

which are never equal.

case 6: There is a confounding path between $Y'$ and $S$ consisting of unobservable variables. The models for this case can be constructed as in case 4, then moving the variables in the path from $Q$ to $R$ (included) from the set of observables to the set of unobservables.

Now, suppose condition (d) does not hold. It should be the case that there exists some $X \in \mathbf{X} \cap An(S)$ and a $Y' \in \mathbf{Y}$ that are connected through a back-door path $p$ that would usually be blocked by some $Z' \in \mathbf{Z}$. There are two possible cases (depicted in Fig. 8) not contradicting the previous conditions:

case 1: $Z'$ is an ancestor of $X$ and $Y'$, and $S$ is a descendant of $X$. Let $W_1$ be the nodes in the path between $Z'$ and $X$, $W_2$ those between $Z'$ and $Y'$, and $W_3$ those between $X$ and $S$. As in previous cases, consider the graph $G'$ where all nodes but $\{X, Z', Y', W_1, W_2, W_3\}$ are disconnected from this set. Also, suppose $X$ and $Y'$ are not connected by any path not going through $Z'$. The queries in the corresponding models can be expressed as:

\[
Q_1 = \beta \sum_{z'} P_1(y' \mid x, z')\,P_1(z')
    = \beta \sum_{z'} P_1(y' \mid z')\,P_1(z') = \beta\,P_1(y')
\]
\[
Q_2 = \beta \sum_{z'} P_1(y' \mid x, z', S{=}1)\,P_1(z' \mid S{=}1)
    = \beta \sum_{z'} P_1(y' \mid z')\,P_1(z' \mid S{=}1)
\]

The term $P_1(z' \mid S{=}1)$ is available as:

\[
P_1(z' \mid S{=}1) = \frac{P_1(z') \sum_{x} P_1(x \mid z')\,P(S{=}1 \mid x)}{\sum_{x, z'} P_1(z')\,P_1(x \mid z')\,P(S{=}1 \mid x)}
\]

Figure 8: Cases considered for the necessity of condition (d) in the proof for Thm. 1; (a) Case 1, (b) Case 2. Dotted directed arrows indicate chains of arbitrary length in the graph.

Let $P_1(z') = 1/2$, $P_1(y' \mid z') = 1/2 + \epsilon_1/2$, $P_1(y' \mid \bar z') = 1/2 - \epsilon_1/2$, $P_1(x \mid z') = 1/2 + \epsilon_2/2$, $P_1(x \mid \bar z') = 1/2 - \epsilon_2/2$. Also $P_1(S{=}1 \mid x) = 1/2 + \epsilon_3/2$, $P_1(S{=}1 \mid \bar x) = 1/2 - \epsilon_3/2$, where $\epsilon_i = (1/5)^{k_i}$, $i = 1, 2, 3$ (using Lemma 8 with $p = 3/5$, $q = 2/5$ in all cases):

\[
Q_1 = \beta\,\frac{1}{2} \qquad Q_2 = \beta\left(\frac{1}{2} + \frac{\epsilon_1 \epsilon_2 \epsilon_3}{2}\right)
\]

which are never equal.

case 2: $Z'$ is an ancestor of $X$ and a descendant of $Y'$, and $S$ is a descendant of $X$. In this case there are no causal paths between $X$ and $Y'$; otherwise $Z'$ violates condition (a). Lemma 7 can be used to change the direction of the edges in the path from $Y'$ to $Z'$ while staying in the same equivalence class; then the same parametrization from the previous case applies.
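Case 1 above admits a direct numerical check. The sketch below (ours) builds the binary model $Z' \to X$, $Z' \to Y'$, $X \to S$, taking all three Lemma-8 perturbations as $\epsilon_i/2$ steps, and verifies $Q_1 = 1/2$ and $Q_2 = 1/2 + \epsilon_1\epsilon_2\epsilon_3/2$ for one outcome value (the $\beta$ prefactor is dropped):

```python
# Binary model Z' -> X, Z' -> Y', X -> S (names ours).
e1, e2, e3 = 0.2, 0.04, 0.008  # any values in (0, 1) work

p_z = {1: 0.5, 0: 0.5}                        # P1(z')
p_y_z = {1: 0.5 + e1 / 2, 0: 0.5 - e1 / 2}    # P1(y'=1 | z')
p_x_z = {1: 0.5 + e2 / 2, 0: 0.5 - e2 / 2}    # P1(x=1 | z')
p_s_x = {1: 0.5 + e3 / 2, 0: 0.5 - e3 / 2}    # P1(S=1 | x)

# P1(z' | S=1) via the expression given in the text.
num = {z: p_z[z] * sum((p_x_z[z] if x else 1 - p_x_z[z]) * p_s_x[x]
                       for x in (0, 1))
       for z in (0, 1)}
p_z_s1 = {z: num[z] / sum(num.values()) for z in (0, 1)}

q1 = sum(p_y_z[z] * p_z[z] for z in (0, 1))     # unbiased adjustment = P1(y')
q2 = sum(p_y_z[z] * p_z_s1[z] for z in (0, 1))  # biased adjustment

assert abs(q1 - 0.5) < 1e-12
assert abs(q2 - (0.5 + e1 * e2 * e3 / 2)) < 1e-12
```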

Definition 9 (Generalized Adjustment Criterion Type 2). Let $\mathbf{T}$ be the set of variables measured without selection bias in the overall population. Also, let $\mathbf{X}, \mathbf{Y}, \mathbf{Z}$ be disjoint sets of variables in a causal model with diagram $G$, augmented with the selection mechanism $S$, where $\mathbf{Z} \subseteq \mathbf{T}$. Then, $\mathbf{Z}$ satisfies the generalized adjustment criterion type 2 relative to $\mathbf{X}, \mathbf{Y}$ if:

(a) No element in $\mathbf{Z}$ is a descendant in $G_{\overline{\mathbf{X}}}$ of any $W \notin \mathbf{X}$ which lies on a proper causal path from $\mathbf{X}$ to $\mathbf{Y}$.

(b) All non-causal $\mathbf{X}$-$\mathbf{Y}$ paths in $G$ are blocked by $\mathbf{Z}$.

(c) $\mathbf{Y}$ is independent of the selection mechanism $S$ given $\mathbf{Z}$ and $\mathbf{X}$, i.e., $(\mathbf{Y} \perp\!\!\!\perp S \mid \mathbf{X}, \mathbf{Z})$.

Theorem 3 (Generalized Adjustment Formula Type 2). Let $\mathbf{T}$ be the set of variables measured without selection bias. Given disjoint sets of variables $\mathbf{X}, \mathbf{Y}$ and $\mathbf{Z} \subseteq \mathbf{T}$, and a causal diagram $G$, then, for every model inducing $G$, the effect $P(\mathbf{y} \mid do(\mathbf{x}))$ is given by

\[
P(\mathbf{y} \mid do(\mathbf{x})) = \sum_{\mathbf{z}} P(\mathbf{y} \mid \mathbf{x}, \mathbf{z}, S{=}1)\,P(\mathbf{z}) \tag{34}
\]

if and only if the set $\mathbf{Z}$ satisfies the generalized adjustment criterion type 2 relative to the pair $\mathbf{X}, \mathbf{Y}$.

Proof. (if) Suppose the set $\mathbf{Z}$ satisfies the conditions relative to $\mathbf{X}, \mathbf{Y}$. Then, by conditions (a) and (b), for every model induced by $G$ we have:

\[
P(\mathbf{y} \mid do(\mathbf{x})) = \sum_{\mathbf{z}} P(\mathbf{y} \mid \mathbf{x}, \mathbf{z})\,P(\mathbf{z})
\]

We note that $S$ can be introduced into the first term by condition (c), which entails Eq. (34).
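Eq. (34) suggests a direct plug-in estimator: fit $P(\mathbf{y} \mid \mathbf{x}, \mathbf{z}, S{=}1)$ from the selection-biased sample and take $P(\mathbf{z})$ from the unbiased external source (e.g., census marginals). A minimal sketch (function and table names are ours; the toy numbers are arbitrary):

```python
# Plug-in evaluation of Eq. (34) for binary variables (a sketch, not the
# paper's implementation; all names and numbers are illustrative).
def adjust_type2(p_y_given_xz_s1, p_z, x, y):
    """Return sum_z P(y | x, z, S=1) * P(z)."""
    return sum(p_y_given_xz_s1[(y, x, z)] * p_z[z] for z in p_z)

# Unbiased covariate distribution, e.g. obtained externally.
p_z = {0: 0.6, 1: 0.4}
# Conditional estimated from the biased (S=1) sample, keyed by (y, x, z).
p_y_given_xz_s1 = {(1, 1, 0): 0.7, (0, 1, 0): 0.3,
                   (1, 1, 1): 0.2, (0, 1, 1): 0.8}

effect = adjust_type2(p_y_given_xz_s1, p_z, x=1, y=1)
# 0.7 * 0.6 + 0.2 * 0.4 = 0.5
assert abs(effect - 0.5) < 1e-12
```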

(Only if) We prove this by the contrapositive: in case conditions (a) or (b) do not hold, the same argument as in the proof of the previous type applies here.

To argue the necessity of condition (c), we show that the adjustment formula is not recoverable. Let $\mathbf{V}$ represent all variables in the graph except for the selection mechanism $S$. We construct two models with distributions $P_1$ and $P_2$, compatible with $G$, such that they agree on the probability distribution under selection bias,

\[
P_1(\mathbf{v} \mid S{=}1) = P_2(\mathbf{v} \mid S{=}1) \tag{35}
\]

and on the unbiased distribution over $\mathbf{Z}$,

\[
P_1(\mathbf{z}) = P_2(\mathbf{z}) \tag{36}
\]

but $Q_1$ in the first model yields a different distribution than $Q_2$ in the second model. Let $P_1$ be compatible with $G$ and $P_2$ be compatible with $G_S$ such that $(\mathbf{V} \perp\!\!\!\perp S)_{P_2}$. Recoverability should hold for any parametrization; hence, without loss of generality, we describe Markovian models and assume that every variable is binary. In every construction we parametrize $P_1$ through its factors and then parametrize $P_2$ to enforce (35) and (36). Moreover, by $(\mathbf{V} \perp\!\!\!\perp S)_{P_2}$, the quantity in (35) also equals $P_2(\mathbf{v})$.

Suppose condition (c) does not hold; then there is an open path between $\mathbf{Y}$ and $S$ that is not blocked when $\mathbf{Z}$ is observed. As in the proof for the previous type, we fix any $Y' \in \mathbf{Y}$ not satisfying the condition and consider the query

\[
Q = \beta \sum_{\mathbf{z}} P(y' \mid \mathbf{x}, \mathbf{z})\,P(\mathbf{z})
\]

where $\beta$ represents the product of the marginal distributions of the remaining $\mathbf{Y}$.

Now, let us consider every possible scenario in which condition (c) may be unsatisfied. Fig. 9 illustrates every case for easier reference.

case 1: $Y' \in Pa_S$. Proceed exactly as in case 1 of the proof for Thm. 1.

case 2: There is a directed path $p$ from $Y'$ to $S$. Consider the same naming convention as in case 2 of Thm. 1. It should be the case that $(\{R\} \cup \mathbf{H}) \cap \mathbf{Z} = \emptyset$ for condition (c) to be violated; therefore the same parametrization of the mentioned case applies here.

Figure 9: Ways in which $Y'$ and $S$ can be connected; unidirected dotted paths indicate that they may contain an arbitrary number of variables. (a) Case 1; (b) Case 2; (c) Case 3; (d) Case 4; (e) Case 5; (f) Case 6.

case 3: The path goes through a common ancestor of $Y'$ and $S$ with no colliders. This reduces to case 4 in the proof for Thm. 1.

case 4: Non-directed path from $Y'$ to $S$ with some $Z' \in \mathbf{Z}$ as a collider. $X$ must be disconnected from $Y'$; otherwise condition (a) is violated, since $Z'$ is a descendant of $Y'$, which is in every causal path from $X$ to $Y'$. Let $L$ be the common ancestor of $S$ and $Z'$; without loss of generality assume that $L$ is directly connected to $Z'$ (if this were not the case, we could apply Lemma 8 with $p = 1/2$, $q = 1/2$ and use the same parametrization below). In this model the causal effect $P(y' \mid do(x)) = P(y')$; however, the adjustment formula will not be equal to this effect in every model compatible with the graph. To see this, consider a model where every variable in $\mathbf{Z}$ but $Z'$ is also disconnected in the graph; we have:

\[
Q = \beta \sum_{\mathbf{z}} P(y' \mid x, \mathbf{z}, S{=}1)\,P(\mathbf{z})
  = \beta \sum_{z'} \sum_{\mathbf{z}''} P(y' \mid x, z', \mathbf{z}'', S{=}1)\,P(z', \mathbf{z}'')
\]
\[
  = \beta \sum_{z'} P(y' \mid x, z', S{=}1)\,P(z')
  = \beta \sum_{z'} \frac{P(y', x, z', S{=}1)}{\sum_{y'} P(y', x, z', S{=}1)}\,P(z')
\]

where $\mathbf{z}''$ ranges over $\mathbf{Z}'' = \mathbf{Z} \setminus \{Z'\}$.

We can write the numerator of the fraction in the last expression as:

\[
P(y', x, z', S{=}1) = \sum_{l} P(y', x, z', l, S{=}1)
  = P(x) \sum_{l} P(y')\,P(l)\,P(z' \mid y', l)\,P(S{=}1 \mid l)
\]

Let $\alpha_L(y') = P(y')\,P(l)$, $f(y', l, z') = P(z' \mid y', l)\,P(S{=}1 \mid l)$, and $g(y', l, z') = P(z' \mid y', l)$. We have:

\[
Q = \beta \sum_{z'} \frac{P(x) \sum_{l} \alpha_L(y')\,f(y', l, z')}{P(x) \sum_{l, y'} \alpha_L(y')\,f(y', l, z')}\,P(z')
  = \beta \sum_{z'} \frac{\sum_{l} \alpha_L(y')\,f(y', l, z')}{\sum_{l, y'} \alpha_L(y')\,f(y', l, z')}\,P(z')
\]

Consider the parametrization: $P_1(y') = P_1(l) = 1/2$, $P_1(z' \mid y', l) = 1/2 + \epsilon$, $P_1(z' \mid y', \bar l) = 1/2 - \epsilon$, $P_1(z' \mid \bar y', l) = 1/2$, $P_1(z' \mid \bar y', \bar l) = 1/2$, for $0 < \epsilon < 1/2$. Let $P_1(S{=}1 \mid l) = \alpha$, $P_1(S{=}1 \mid \bar l) = \beta$, and pick any $0 < \alpha, \beta < 1$. We can calculate $P(z')$ as:

\[
P(z') = \sum_{y', l} P(z' \mid y', l)\,P(y')\,P(l) = 1/2
\]

Computing the queries with the given parametrization, we obtain:

\[
P(y \mid do(x)) = \frac{\beta}{2}
\qquad
Q = \frac{\beta}{2}\,\frac{(\alpha + \beta)^2 - 2\epsilon^2 (\alpha - \beta)^2}{(\alpha + \beta)^2 - \epsilon^2 (\alpha - \beta)^2}
\]

$Q$ is always different from $P(y \mid do(x))$ for $\alpha \neq \beta$.

case 5: There is a path between $Y'$ and $S$ passing by an ancestor of $Y'$ having some $X' \in \mathbf{X}$ as a collider. Such a path would have arrows incoming to $X'$. But it is also an open non-causal path between $Y'$ and $X'$, violating condition (b).
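The case-4 computation can be replayed numerically. The sketch below (ours) enumerates the binary model $Y' \to Z' \leftarrow L \to S$ with $X$ disconnected, under one consistent reading of the bar assignments in the parametrization ($P(z' \mid y', l) = 1/2 + \epsilon$, $P(z' \mid y', \bar l) = 1/2 - \epsilon$, and $1/2$ otherwise), and checks $Q$ against both the closed form and the causal effect $P(y' \mid do(x)) = 1/2$ (the $\beta$ prefactor is dropped):

```python
# Binary roots Y', L; Z' <- {Y', L}; S <- L; X is disconnected and omitted.
eps, a, b = 0.3, 0.9, 0.4     # 0 < eps < 1/2; a, b stand for alpha, beta

def p_z_given(y, l, z1):
    # P(z' = z1 | y', l) under the parametrization in the text
    if y == 1 and l == 1:
        p = 0.5 + eps
    elif y == 1 and l == 0:
        p = 0.5 - eps
    else:
        p = 0.5
    return p if z1 == 1 else 1 - p

def joint(y, l, z, s1):
    # P(y', l, z', S = s1); P(y') = P(l) = 1/2, P(S=1 | l) = a or b
    ps = a if l == 1 else b
    return 0.25 * p_z_given(y, l, z) * (ps if s1 else 1 - ps)

# Q = sum_{z'} P(y'=1 | z', S=1) * P(z'), with P(z') = 1/2 by construction.
q = 0.0
for z in (0, 1):
    num = sum(joint(1, l, z, 1) for l in (0, 1))
    den = sum(joint(y, l, z, 1) for y in (0, 1) for l in (0, 1))
    q += num / den * 0.5

s, d = a + b, a - b
closed = 0.5 * (s**2 - 2 * eps**2 * d**2) / (s**2 - eps**2 * d**2)
assert abs(q - closed) < 1e-12
assert abs(q - 0.5) > 1e-4   # differs from P(y' | do(x)) = 1/2 when a != b
```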

case 6: There is a path $p$ between $Y'$ and $S$ passing by an ancestor $R$ of $Y'$ having some $Z' \in \mathbf{Z}$ as a collider. Let us consider the variables $X, Y', R, Z', L$, such that $L$ is a common ancestor of $Z'$ and $S$ that, together with $R$, has converging arrows into $Z'$. The path $p$ can be seen as the concatenation of four segments $p_1, \ldots, p_4$, such that $p_1$ is the segment $R, \ldots, Y'$; $p_2$ is the segment $R, \ldots, Z'$; $p_3$ is the segment $L, \ldots, Z'$; and $p_4$ the segment $L, \ldots, S$. Note that, by construction, there can exist only chains along each of these segments; without loss of generality we assume that those are segments of length one, but it is trivial to stretch those segments using Lemma 8 (with $p = q = 1/2$) as in previous cases. When we have multiple $Z'$'s in $p$, we will have the concatenation of several segments $p_2$ and $p_3$, and it will also be simple to extend the construction. We use a parametrization similar to the previous cases; here $\mathbf{Z}'' = \mathbf{Z} \setminus \{Z'\}$:

\[
Q_2 = \beta \sum_{\mathbf{z}} P_1(y' \mid x, \mathbf{z}, S{=}1)\,P_1(\mathbf{z})
    = \beta \sum_{z'} \sum_{\mathbf{z}''} P_1(y' \mid x, z', \mathbf{z}'', S{=}1)\,P_1(z', \mathbf{z}'')
\]
\[
    = \beta \sum_{z'} P_1(y' \mid x, z', S{=}1) \sum_{\mathbf{z}''} P_1(z', \mathbf{z}'')
    = \beta \sum_{z'} P_1(y' \mid x, z', S{=}1)\,P_1(z')
\]
\[
    = \beta \sum_{z'} \frac{P_1(y', x, z', S{=}1)}{\sum_{y'} P_1(y', x, z', S{=}1)}\,P_1(z')
\]

We can write the numerator of the fraction in the last expression as:

\[
P_1(y', x, z', S{=}1) = \sum_{r, l} P_1(y', x, z', r, l, S{=}1)
\]
\[
= \sum_{r, l} P_1(x)\,P_1(y' \mid r)\,P_1(r)\,P_1(z' \mid r, l)\,P_1(S{=}1 \mid l)\,P_1(l)
\]
\[
= P_1(x) \sum_{r} P_1(y' \mid r)\,P_1(r) \sum_{l} P_1(z' \mid r, l)\,P_1(S{=}1 \mid l)\,P_1(l)
\]

Let $\alpha_R(y') = P_1(y' \mid r)\,P_1(r)$, $f(r, z') = \sum_{l} P_1(z' \mid r, l)\,P_1(S{=}1 \mid l)\,P_1(l)$, and $g(r, z') = \sum_{l} P_1(z' \mid r, l)\,P_1(l)$. We have:

\[
Q_2 = \beta \sum_{z'} \frac{P_1(x) \sum_{r} \alpha_R(y')\,f(r, z')}{P_1(x) \sum_{r, y'} \alpha_R(y')\,f(r, z')}\,P_1(z')
    = \beta \sum_{z'} \frac{\sum_{r} \alpha_R(y')\,f(r, z')}{\sum_{r, y'} \alpha_R(y')\,f(r, z')}\,P_1(z')
\]

Consider the following parametrization: $P_1(r) = P_1(l) = 1/2$, $P_1(y' \mid r) = 1/2 + \epsilon$, $P_1(y' \mid \bar r) = 1/2 - \epsilon$, $P_1(z' \mid r, l) = 1/2 + \epsilon$, $P_1(z' \mid r, \bar l) = 1/2 - \epsilon$, $P_1(z' \mid \bar r, l) = 1/2$, $P_1(z' \mid \bar r, \bar l) = 1/2$, for $0 < \epsilon < 1/2$. Let $P_1(S{=}1 \mid l) = \alpha$, $P_1(S{=}1 \mid \bar l) = \beta$, and pick any $0 < \alpha, \beta < 1$. We can calculate $P_1(z')$ as:

\[
P_1(z') = \sum_{r, l} P_1(z' \mid r, l)\,P_1(r)\,P_1(l) = 1/2
\]

Computing the queries with the given parametrization, we obtain:

\[
Q_1 = \frac{\beta}{2}
\qquad
Q_2 = \frac{\beta}{2}\left(1 - \frac{2\epsilon^3 (\alpha - \beta)^2}{(\alpha + \beta)^2 - \epsilon^2 (\alpha - \beta)^2}\right)
\]

$Q_2$ is always different from $Q_1$ for $\alpha \neq \beta$ and $0 < \epsilon < 1/2$.
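As with case 4, the case-6 construction can be replayed numerically. The sketch below (ours) enumerates the binary model with roots $R, L$, edges $R \to Y'$, $\{R, L\} \to Z'$, $L \to S$, and $X$ disconnected, under one consistent reading of the bar assignments ($P_1(z' \mid r, l) = 1/2 + \epsilon$, $P_1(z' \mid r, \bar l) = 1/2 - \epsilon$, and $1/2$ otherwise), confirming the closed form for $Q_2$ and its gap from $Q_1 = \beta/2$ (the $\beta$ prefactor is dropped):

```python
# Binary roots R, L; Y' <- R; Z' <- {R, L}; S <- L; X is disconnected.
eps, a, b = 0.3, 0.9, 0.4     # 0 < eps < 1/2; a, b stand for alpha, beta

def p_y1(r):
    # P1(y'=1 | r)
    return 0.5 + eps if r == 1 else 0.5 - eps

def p_z1(r, l):
    # P1(z'=1 | r, l) under the parametrization in the text
    if r == 1 and l == 1:
        return 0.5 + eps
    if r == 1 and l == 0:
        return 0.5 - eps
    return 0.5

def joint(y, r, l, z):
    # P1(y', r, l, z', S=1); P1(r) = P1(l) = 1/2, P1(S=1 | l) = a or b
    py = p_y1(r) if y == 1 else 1 - p_y1(r)
    pz = p_z1(r, l) if z == 1 else 1 - p_z1(r, l)
    return 0.25 * py * pz * (a if l == 1 else b)

# Q2 = sum_{z'} P1(y'=1 | z', S=1) * P1(z'), with P1(z') = 1/2.
q2 = 0.0
for z in (0, 1):
    num = sum(joint(1, r, l, z) for r in (0, 1) for l in (0, 1))
    den = sum(joint(y, r, l, z) for y in (0, 1)
              for r in (0, 1) for l in (0, 1))
    q2 += num / den * 0.5

s, d = a + b, a - b
closed = 0.5 * (1 - 2 * eps**3 * d**2 / (s**2 - eps**2 * d**2))
assert abs(q2 - closed) < 1e-12
assert abs(q2 - 0.5) > 1e-5   # differs from Q1 = 1/2 whenever a != b
```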