

Sensitivity Analysis of Linear Structural Causal Models

Carlos Cinelli 1 Daniel Kumor 2 Bryant Chen 3 Judea Pearl 1 Elias Bareinboim 2

TECHNICAL REPORT R-41-L

May, 2019

Abstract

Causal inference requires assumptions about the data generating process, many of which are unverifiable from the data. Given that some causal assumptions might be uncertain or disputed, formal methods are needed to quantify how sensitive research conclusions are to violations of those assumptions. Although an extensive literature exists on the topic, most results are limited to specific model structures, while a general-purpose algorithmic framework for sensitivity analysis is still lacking. In this paper, we develop a formal, systematic approach to sensitivity analysis for arbitrary linear Structural Causal Models (SCMs). We start by formalizing sensitivity analysis as a constrained identification problem. We then develop an efficient, graph-based identification algorithm that exploits non-zero constraints on both directed and bidirected edges. This allows researchers to systematically derive sensitivity curves for a target causal quantity with an arbitrary set of path coefficients and error covariances as sensitivity parameters. These results can be used to display the degree to which violations of causal assumptions affect the target quantity of interest, and to judge, on scientific grounds, whether problematic degrees of violations are plausible.

1. Introduction

Randomized controlled trials (RCT) are considered the gold standard for identifying cause-effect relationships in data-intensive sciences (Giffin et al., 2010). In practice, however, direct randomization is often infeasible or unethical, requiring researchers to combine non-experimental observations

1 Depts. of Statistics and Computer Science, University of California, Los Angeles, California, USA. 2 Dept. of Computer Science, Purdue University, West Lafayette, IN, USA. 3 Brex, San Francisco, CA, USA. Most of this work was conducted while at IBM Research AI. Correspondence to: Carlos Cinelli <[email protected]>.

Proceedings of the 36th International Conference on Machine Learning, Long Beach, California, PMLR 97, 2019. Copyright 2019 by the author(s).

with assumptions about the data generating process in order to obtain causal claims. These assumptions are usually encoded as the absence of certain causal relationships, or as the absence of association between certain unobserved factors. Conclusions based on causal models are, therefore, provisional: they depend on the validity of causal assumptions, regardless of the sample size (Pearl, 2000; Spirtes et al., 2000; Bareinboim & Pearl, 2016).

In many real settings, it is not uncommon that these assumptions are subject to uncertainty or dispute. Scientists may posit alternative causal models that are equally compatible with the observed data; or, more mundanely, researchers can make identification assumptions for convenience, simply to proceed with estimation.1 Regardless of the motivation, the provisional character of causal inference behooves us to formally assess the extent to which causal conclusions are sensitive to violations of those assumptions.

The importance of such exercises is best illustrated with a real example, which directly impacted public policy. During the late 1950s and early 1960s, there was a fierce debate regarding the causal effect of cigarette smoking on lung cancer. One of its most notable skeptics was the influential statistician Ronald Fisher, who claimed that, without an experiment, one cannot rule out unobserved common causes (e.g. the individual's genotype) as being responsible for the observed association (Fisher, 1957; 1958). Technically speaking, Fisher's statement was accurate; data alone could not refute his hypothesis. Yet, although no RCT measuring the effect of cigarette smoking on lung cancer was performed, currently there exists a broad consensus around the issue. How could such a consensus emerge?

An important step towards the current state of affairs was a sensitivity analysis performed by Cornfield et al. (1959). Their investigation consisted of the following hypothetical question: if Fisher's hypothesis were true, how strong would the alleged confounder need to be to explain all the observed association between cigarette smoking and lung cancer? The analysis concluded that, since smokers had nine times the risk of nonsmokers for developing lung cancer, the latent confounder would need to be at least nine times

1 As noted by Joffe et al. (2010), "such assumptions are usually made casually, largely because they justify the use of available statistical methods and not because they are truly believed".


more common in smokers than in nonsmokers—something deemed implausible by experts at the time.

Cornfield's exercise reveals the fundamental steps of a sensitivity analysis. The analyst introduces a violation of a causal assumption of the current model, such as positing the presence of unobserved confounders that induce a non-zero association between two error terms. Crucially, however, we are willing to tolerate this violation up to a certain plausibility limit dictated by expert judgment (e.g., prior biological understanding, pilot studies). The task is, thus, to systematically quantify how different hypothetical "degrees" of violation (to be defined) affect the conclusions, and to judge whether expert knowledge can rule out problematic values.

The problem of sensitivity analysis has been studied throughout the sciences, ranging from statistics (Rosenbaum & Rubin, 1983; Small, 2007; Rosenbaum, 2010; Cinelli & Hazlett, 2018; Franks et al., 2019) to epidemiology (Brumback et al., 2004; VanderWeele & Arah, 2011; Ding & VanderWeele, 2016; Arah, 2017), sociology (Frank, 2000), psychology (Mauro, 1990), political science (Imai et al., 2010; Blackwell, 2013), and economics (Leamer, 1983; Imbens, 2003; Oster, 2017; Masten & Poirier, 2018). Notwithstanding all this attention, the current literature is still limited to specific models and solved on a case-by-case basis. Considering the ubiquity of causal questions in the sciences and artificial intelligence, a formal, algorithmic framework to deal with violations of causal assumptions is needed.

Causal modeling requires a formal language where the characterization of the data generating process can be encoded explicitly. Structural Causal Models (Pearl, 2000) provide such a language and, in many fields, including machine learning, the health and social sciences, linearity is a popular modeling choice. In this paper, we focus on the sensitivity analysis of linear acyclic semi-Markovian SCMs. We allow violations of exclusion and independence restrictions, such as (i) the absence or presence of unobserved common causes; and, (ii) the absence, presence or reversal of direct causal effects. Our contributions are the following:

1. We introduce a formal, algorithmic approach for sensitivity analysis in linear SCMs and show it can be reduced to a problem of identification with non-zero constraints, i.e., identification when certain parameter values are fixed to a known, but non-zero, number.

2. We develop a novel graphical procedure, called PUSHFORWARD, that reduces identification with a known error covariance to vanilla identification, for which a plethora of algorithms are available.

3. We develop an efficient graph-based constrained identification algorithm that takes as input a set of sensitivity parameters and returns a sensitivity curve for the effect estimate. The algorithm is theoretically sound and experimental results corroborate its generality, showing canonical sensitivity analysis examples are a small subset of the cases solved by our proposal.

This paper is structured as follows. Section 2 reviews basic terminology and definitions that will be used throughout the text. Section 3 shows how sensitivity analysis in the context of linear SCMs can be reduced to a constrained identification problem. In Section 4 we develop a novel approach that allows researchers to systematically incorporate constraints on error covariances of linear SCMs. Section 5 utilizes these results to construct a constrained identification algorithm for deriving sensitivity curves. Finally, Section 6 presents experimental results to evaluate our proposals.

2. Preliminaries

In this paper, we use the language of structural causal models as our basic semantic framework (Pearl, 2000). In particular, we consider linear semi-Markovian SCMs, consisting of a set of equations of the form V = ΛV + U, where V represents the endogenous variables, U the exogenous variables, and Λ a matrix containing the structural coefficients representing both the strength of causal relationships and lack of direct causation among variables (when λij = 0). The exogenous variables are usually assumed to be multivariate Gaussian with covariance matrix E, encoding independence between error terms (when εij = 0).2 We focus on acyclic models, where Λ can be arranged to be lower triangular.

The covariance matrix Σ of the endogenous variables induced by model M is given by Σ = (I − Λ)⁻¹E(I − Λ)⁻ᵀ. Without loss of generality, we assume model variables have been standardized to unit variance. For any three variables x, y and z, we denote σyx the covariance of x and y, σyx.z the partial covariance of y and x given z, and Ryx.z the regression coefficient of y on x adjusting for z. Causal quantities of interest in a linear SCM are usually entries of Λ (or functions of those entries), and identifiability reduces to checking whether they can be uniquely computed from the observed covariance matrix Σ.
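These definitions can be checked numerically. Below is a minimal sketch using a hypothetical standardized Markovian chain z → x → y; the coefficient values are made up for illustration and are not from the paper:

```python
import numpy as np

# Hypothetical chain model z -> x -> y; variable order is [z, x, y].
lam_zx, lam_xy = 0.5, 0.8

Lam = np.zeros((3, 3))
Lam[1, 0] = lam_zx          # x <- z
Lam[2, 1] = lam_xy          # y <- x

# Independent errors, with variances chosen so each endogenous
# variable is standardized to unit variance.
E = np.diag([1.0, 1 - lam_zx**2, 1 - lam_xy**2])

# Implied covariance matrix: Sigma = (I - Lam)^{-1} E (I - Lam)^{-T}
ImL = np.linalg.inv(np.eye(3) - Lam)
Sigma = ImL @ E @ ImL.T

# With unit variances, the regression coefficient R_{yx.z} is a
# function of Sigma alone; in this Markovian model it recovers
# lam_xy (single-door criterion).
s_zx, s_zy, s_xy = Sigma[0, 1], Sigma[0, 2], Sigma[1, 2]
R_yx_z = (s_xy - s_zx * s_zy) / (1 - s_zx**2)
```

Here identifiability of λxy amounts to R_yx_z being computable from Sigma alone, which the sketch confirms for this model.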

Causal graphs provide a parsimonious encoding of some of the substantive assumptions of a linear SCM. The causal graph (or path diagram) of model M is a graph G = (V, D, B), where V denotes the vertices (endogenous variables), D the set of directed edges (non-zero entries of Λ) and B the set of bidirected edges (non-zero entries of E). Missing directed edges represent exclusion restrictions—a variable is not a direct cause of the other. Missing bidirected edges denote independence restrictions, representing the fact that no latent common causes exist between two observed variables. When clear from context, we may

2 Gaussianity is not necessary for the results of the paper.


treat model coefficients and their corresponding edges on the graph interchangeably. We use standard graph notation, where Pa(y) denotes the parents, Ch(y) the children, Anc(y) the ancestors, and De(y) the descendants of node y.

3. Sensitivity analysis and identification

In this section we demonstrate the pervasiveness of identification problems in sensitivity analysis in the context of a simple example. Suppose a scientist hypothesizes model GO shown in Fig. 1a with the goal of estimating the direct effect of a treatment x on an outcome y (structural coefficient λxy). By the single-door criterion (Pearl, 2000), she verifies λxy is identifiable in GO and equal to the regression estimand Ryx.z, licensing her to proceed with estimation.

Another investigator, however, is suspicious of the bold assumption that no common causes (confounders) exist between z and x in GO. She goes on, therefore, and constructs an alternative model GA (Fig. 1b) such that the bidirected edge z ↔ x is included to account for that possibility. A question now naturally arises: how wrong could one be using Ryx.z to estimate λxy if the true causal model were given by graph GA? Answering this question requires defining a measure of "wrongness" of the estimand, and perhaps the simplest such measure is its bias in the additive scale.3

Definition 1 (Bias of ES with respect to Q). Let Q be a computable quantity given a fully specified linear structural causal model, and let ES be any estimand (a functional of the covariance matrix Σ). The bias of ES with respect to Q is the difference between the two quantities, B = ES − Q.

In our example, the proposed estimand is ES = Ryx.z, the target quantity is Q = λxy, and to compute the bias, B = Ryx.z − λxy, one needs to identify λxy. Computing the bias, thus, entails an identification problem (Prop. 1).

Proposition 1. The bias of estimand ES with respect to target quantity Q is identifiable iff Q is identifiable.

In GA, however, the presence of the bidirected edge x ↔ z renders λxy unidentifiable, and computation of B is not possible. How could one circumvent this impediment?

As in Cornfield et al. (1959), the impossibility of computing the exact bias of Ryx.z with respect to λxy calls for another strategy—expressing the bias as a function of the "strength" of the omitted confounders. In this way, the analyst can predict, for any hypothetical strength of the confounders, whether it would be enough to change the research conclusions. This allows the analyst to bring new substantive knowledge to bear, by submitting these quantitative results to a judgment of plausibility and ruling out some scenarios.

3 Note this refers to the bias of an estimand (not an estimator), and it is the difference between the proposed estimand and the desired (causal) target quantity in the population.

[Figure 1 shows three path diagrams over the variables z, x and y: (a) Model GO, (b) Model GA, (c) Model GB.]

Figure 1: Original model GO and two alternative models, GA and GB. In GA any of the remaining parameters (λzx, εzx or εzy) can be used as a sensitivity parameter for λxy, whereas GB rules out εzx as a sensitivity parameter. Adding a bidirected edge x ↔ y in GA does not prevent εzy from being a valid sensitivity parameter, whereas in GB it does.

Implementing this idea requires a precise definition of how to measure the "strength" of the omitted confounders. In our example, a possible candidate for measuring such strength is the structural parameter εzx of the added bidirected edge z ↔ x. The task then becomes: (i) to determine whether knowledge of εzx allows the identification of λxy; and, (ii) if so, to find a parameterized estimand for λxy in terms of εzx. This 2-step procedure can be seen as an identification problem with non-zero constraints (Def. 2).4

Definition 2 (θ-identifiability). Let M be a linear SCM and θ a set of parameters of M with known (non-zero) values. A causal quantity Q is said to be θ-identifiable if Q is uniquely computable from Σ and θ.

We call any functional of Σ and θ which identifies Q a θ-specific estimand (or sensitivity curve) for Q with respect to sensitivity parameters θ. These estimands are the workhorse for sensitivity analysis; they allow us to investigate how strong certain relationships must be (as parameterized by θ) in order to induce significant bias in our estimates. In other words, identifying a bias function in terms of θ (and the observed data) for sensitivity analysis is equivalent to the constrained identification problem of Def. 2 (Prop. 2).

Proposition 2. The bias of ES with respect to Q can be expressed as a function of θ (and Σ) iff Q is θ-identifiable.

Going back to GA, it is indeed possible to construct an εzx-specific estimand for λxy (see Sec. 4):

λxy(εzx) = (σxy − (σzx − εzx)σyz) / (1 − (σzx − εzx)σzx)    (1)

Eq. 1 allows one to compute the bias of Ryx.z with respect to the target quantity λxy, for any given hypothetical value of εzx, if the true model were given by GA. Similarly, it allows one to determine how strong the unobserved confounder would need to be (as parameterized by εzx) such that the association Ryx.z is completely explained by the unobserved confounder (i.e., the value of εzx such that λxy(εzx) = 0).

4 Note the relationship to z-ID (Bareinboim & Pearl, 2012), in which case constraints are imposed on experimental distributions in the non-parametric setting.

Still, what if the analyst has no knowledge to plausibly bound the strength of εzx? Even though the violation introduced in model GA was the addition of the bidirected edge x ↔ z, corresponding to εzx, there is no reason to limit our attention to that parameter, and any θ-specific estimand could be used for sensitivity analysis. In fact, the two remaining parameters of the model also yield valid θ-specific estimands (Sec. 5 provides an algorithmic solution),

λxy(λzx) = (σxy − λzxσyz) / (1 − λzxσzx)    (2)

λxy(εzy) = (σzy − εzy) / σzx    (3)
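The three sensitivity curves can be verified numerically. Below is a sketch assuming a hypothetical parameterization of GA (all coefficient values are made up, with error variances chosen so the variables are standardized to unit variance):

```python
import numpy as np

# Hypothetical true parameters for model GA (z -> x -> y, with
# bidirected edges z <-> x and z <-> y); values are illustrative only.
lam_zx, lam_xy, e_zx, e_zy = 0.5, 0.8, 0.2, 0.1

Lam = np.zeros((3, 3))
Lam[1, 0], Lam[2, 1] = lam_zx, lam_xy
E = np.array([                      # error covariance matrix
    [1.0,  e_zx, e_zy],
    [e_zx, 1 - lam_zx**2 - 2*lam_zx*e_zx, 0.0],
    [e_zy, 0.0,  1 - lam_xy**2 - 2*lam_xy*lam_zx*e_zy],
])
ImL = np.linalg.inv(np.eye(3) - Lam)
S = ImL @ E @ ImL.T                 # observed covariances, order [z, x, y]
s_zx, s_zy, s_xy = S[0, 1], S[0, 2], S[1, 2]

# Eqs. 1-3: three different sensitivity curves, each recovering lam_xy
# from Sigma plus one known parameter.
eq1 = (s_xy - (s_zx - e_zx) * s_zy) / (1 - (s_zx - e_zx) * s_zx)
eq2 = (s_xy - lam_zx * s_zy) / (1 - lam_zx * s_zx)
eq3 = (s_zy - e_zy) / s_zx
```

All three expressions return the same value of λxy, even though each uses a different single model parameter as its sensitivity parameter.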

Having a diverse set of sensitivity curves is important, because sensitivity analysis relies on plausibility judgments. One could argue, for instance, that assessing the plausibility of εzx could be hard because it involves judging the effect of confounders of unknown cardinality, and perhaps previous studies give plausible bounds on the direct causal effect of z on x (i.e., λzx), making a λzx-specific estimand more attractive. Regardless of the specific scenario, it is clear that the choice of sensitivity parameters should be guided by the availability of substantive knowledge.

Remarkably, several subtleties arise when deriving θ-specific estimands, even in simple models with three variables. For instance, a natural approach for tackling the problem in our example could be the re-expression of Ryx.z in terms of the covariance matrix implied by GA, yielding,

Ryx.z = λxy − ((σzx − λzx)εzy) / (1 − σ²zx) = λxy − (εzxεzy) / (1 − σ²zx)    (4)
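Eq. 4 can be checked against the path-tracing covariances of GA. A short sketch with hypothetical parameter values (made up for illustration); the point is that the bias of Ryx.z depends on the product εzxεzy, even though a single sensitivity parameter suffices for identification:

```python
# Hypothetical GA parameters (illustrative values only).
lam_zx, lam_xy, e_zx, e_zy = 0.5, 0.8, 0.2, 0.1

# Implied covariances of GA by path-tracing (standardized variables).
s_zx = lam_zx + e_zx
s_zy = lam_zx*lam_xy + e_zx*lam_xy + e_zy
s_xy = lam_xy + lam_zx*e_zy

# Bias of the single-door estimand R_{yx.z} with respect to lam_xy.
R_yx_z = (s_xy - s_zx*s_zy) / (1 - s_zx**2)
bias = R_yx_z - lam_xy

# Eq. 4 in its two equivalent forms (note s_zx - lam_zx = e_zx).
form1 = -(s_zx - lam_zx)*e_zy / (1 - s_zx**2)
form2 = -e_zx*e_zy / (1 - s_zx**2)
```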

One may surmise upon examination of such an expression that two sensitivity parameters are needed. As shown in Eqs. 1 to 3, this conclusion would be misleading.

These subtleties also appear when solving several variations of a model. Imagine the alternative model is now GB, instead of GA, as shown in Figure 1c. Is εzx an admissible sensitivity parameter in this case? Is the εzy-specific estimand derived in GA still valid if the model were GB? If we include another violation in both models, a bidirected arrow x ↔ y, would the previously obtained εzy-specific estimands still be valid? Despite the apparent similarity of both models, the answers to these questions reveal their sensitivity curves behave quite differently. The tools developed in this paper not only provide an algorithmic solution to these questions, but also allow researchers to swiftly answer them by simple inspection of the graph.

The above examples demonstrate several of the identification problems entailed by a sensitivity analysis. If these tasks are already complex in small models, once we move to models with more than three or four variables, an informal, case-by-case approach to sensitivity analysis is simply infeasible. Therefore, we need a formal framework and efficient algorithms to incorporate constraints in linear SCMs.

4. Incorporating constraints in linear SCMs

Existing methods for identification in linear SCMs, such as the QID algorithm from Chen et al. (2017), are able to incorporate constraints on directed edges and can be used to derive sensitivity curves such as the λzx-specific estimand of Eq. 2. The QID algorithm exploits a known edge λab by creating an auxiliary variable (AV) b* = b − λab·a (Chen et al., 2016). Subtracting out the direct effect of a on b in this way may help with the identification of other coefficients in the model. For instance, the λzx-specific estimand can be computed using AVs in the following way: (i) create x* = x − λzx·z; (ii) use x* as an instrument for λxy, resulting in λxy(λzx) = σyx*/σxx* = (σxy − λzxσyz)/(1 − λzxσzx).5
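The two AV steps above can be illustrated by simulation. A sketch with hypothetical coefficient values for GA (made up; the error variances keep the variables standardized):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500_000

# Simulate a hypothetical parameterization of GA; the direct effect
# lam_zx is treated as the known sensitivity parameter.
lam_zx, lam_xy, e_zx, e_zy = 0.5, 0.8, 0.2, 0.1
cov_u = np.array([[1.0,  e_zx, e_zy],
                  [e_zx, 0.55, 0.0 ],
                  [e_zy, 0.0,  0.28]])
u = rng.multivariate_normal(np.zeros(3), cov_u, size=n)
z = u[:, 0]
x = lam_zx * z + u[:, 1]
y = lam_xy * x + u[:, 2]

# Step (i): auxiliary variable x* = x - lam_zx * z subtracts out the
# known direct effect of z on x.
x_star = x - lam_zx * z

# Step (ii): x* is uncorrelated with y's error term, so it serves as
# an instrument for lam_xy: lam_xy = sigma_{y x*} / sigma_{x x*}.
lam_xy_hat = np.cov(y, x_star)[0, 1] / np.cov(x, x_star)[0, 1]
```

With a large sample, `lam_xy_hat` is close to the true λxy, matching the λzx-specific estimand of Eq. 2 up to sampling error.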

However, neither the εzx-specific nor the εzy-specific estimands can be derived using QID; in fact, there is no current identification algorithm that offers a principled and efficient way to exploit knowledge of bidirected edges.6 As this is critical for the derivation of sensitivity curves (see Sec. 6), one of the core contributions of this work is the development of a novel graphical procedure that allows one to systematically incorporate constraints on error covariances.

Conventional linear SCMs already impose one type of constraint on error covariances: the lack of a bidirected edge between two variables a and b encodes the assumption that the structural parameter εab is zero. The identification problem imposed by sensitivity analysis, nonetheless, sets a different type of constraint—the error covariance εab is fixed to a known but non-zero number. The essence of our method is to represent this knowledge in the graph.

Considering a graph G, covariance matrix Σ, and a known error covariance εab, our strategy consists of performing a "manageable" transformation of G such that the bidirected edge a ↔ b is removed from the graph. By "manageable" we mean the implied covariance matrix Σ′ of the transformed graph G′ can still be derived from Σ and the known value εab; otherwise, we would have no connection between G′ and the data, making inference in G′ impossible. Once this graphical transformation is applied, we can exploit any

5 The QID algorithm extends generalized instrumental sets (Brito & Pearl, 2002) using a bootstrapping procedure whereby complex models can be identified by iteratively identifying coefficients and using them to generate new auxiliary variables. It takes as inputs a graph G, covariance matrix Σ and known directed edges D, and it returns the new set of identified directed edges.

6 Methods from computer algebra offer a complete solution but are computationally intractable. See Sec. 6 and Supp. Materials.


existing graphical identification method on the modified model G′, and solutions in G′ can be transferred back to solutions in the original model G. In short, we manipulate the graph to reduce an identification problem with a non-zero constraint to a standard one.

The easiest way to introduce our method, which we call PUSHFORWARD, is via an example. Consider again graph GA in Fig. 1b, and assume εzy is known. Path-tracing (Wright, 1921) results in the following covariances, where εzy is the known parameter:

σzx = λzx + εzx    (5)
σzy = λzxλxy + εzxλxy + εzy    (6)
σxy = λxy + λzxεzy    (7)

Ideally, we could create an alternative model G*A where the bidirected edge z ↔ y is fully removed from the graph. For this to be useful, we need to be able to express the new implied covariance matrix Σ*A in terms of the original covariance matrix ΣA and the known error covariance εzy. While expressing σ*zy in terms of ΣA and εzy is straightforward (since, trivially, σ*zy = σzy − εzy), it is not immediately clear how to write σ*xy = σxy − λzxεzy = λxy in terms of ΣA and εzy, for this requires identifying either λxy or λzx in the original model.

Thus, rather than fully removing z ↔ y, we "push it forward" to the children of z, as shown in graph G′A of Fig. 2b. Note the bidirected edge is moved from being between z and y to being between x (a child of z) and y, with new structural parameter ε′xy = λzxεzy. Path-tracing of G′A shows its implied covariance matrix Σ′A is exactly the same as ΣA, except for σ′zy, which can be obtained by subtracting εzy from σzy,

σ′zx := σzx = λzx + εzx    (8)
σ′zy := σzy − εzy = λzxλxy + λxyεzx    (9)
σ′xy := σxy = λxy + λzxεzy    (10)

Since G′A has the same structural coefficients as GA and we know how to compute the covariance matrix induced by G′A from the known values Σ and εzy, we can use G′A to identify the coefficients in our original model. In this case, z is an instrument for λxy in G′A, resulting in the estimand λxy(εzy) = σ′zy/σ′zx = (σzy − εzy)/σzx of Eq. 3.

Applying the same logic to graph GB in Fig. 1c, assume εzy is known. Since z has no other descendants except y, pushing forward εzy simply removes the bidirected edge z ↔ y. This results in the modified graph G′B of Fig. 2d with the amortized covariance of z and y, σ′zy = σzy − εzy. Note εzy enters in no other covariances of the system. The graph G′B renders z single-door admissible for the identification of λxy, giving us the estimand λxy(εzy) = R′yx.z = (σyx − σxz(σzy − εzy))/(1 − σ²xz).
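Both estimands can be verified numerically. A sketch assuming hypothetical coefficient values for GA and GB (made up, chosen so the implied variances remain valid and the variables stay standardized):

```python
import numpy as np

def implied_cov(Lam, E):
    """Sigma = (I - Lam)^{-1} E (I - Lam)^{-T}."""
    ImL = np.linalg.inv(np.eye(len(E)) - Lam)
    return ImL @ E @ ImL.T

# --- GA: z -> x -> y, with z <-> x and z <-> y (order [z, x, y]).
lam_zx, lam_xy, e_zx, e_zy = 0.5, 0.8, 0.2, 0.1
Lam_A = np.zeros((3, 3)); Lam_A[1, 0] = lam_zx; Lam_A[2, 1] = lam_xy
E_A = np.array([[1.0,  e_zx, e_zy],
                [e_zx, 0.55, 0.0 ],
                [e_zy, 0.0,  0.28]])
S = implied_cov(Lam_A, E_A)
# In G'A, z is an instrument: lam_xy = (sigma_zy - e_zy) / sigma_zx.
est_A = (S[0, 2] - e_zy) / S[0, 1]

# --- GB: z -> y and x -> y, with z <-> x and z <-> y.
lam_zy, lam_xy_B = 0.3, 0.5
Lam_B = np.zeros((3, 3)); Lam_B[2, 0] = lam_zy; Lam_B[2, 1] = lam_xy_B
E_B = np.array([[1.0,  e_zx, e_zy],
                [e_zx, 1.0,  0.0 ],
                [e_zy, 0.0,  0.54]])
S = implied_cov(Lam_B, E_B)
# In G'B, z is single-door admissible after amortizing sigma_zy.
est_B = (S[1, 2] - S[0, 1] * (S[0, 2] - e_zy)) / (1 - S[0, 1]**2)
```

In both cases the εzy-specific estimand recovers the true direct effect of x on y under the chosen parameterization.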

[Figure 2 shows four path diagrams: (a) GA, (b) G′A with the pushed-forward bidirected edge x ↔ y carrying ε′xy = λzxεzy, (c) GB, (d) G′B.]

Figure 2: Pushing forward εzy in GA renders z a valid instrument in G′A. Pushing forward εzy in GB renders z single-door admissible in G′B.

This simple graphical manipulation also makes it clear why adding a bidirected edge x ↔ y as a further violation in the original graphs GA and GB has different consequences for the identification of λxy. In G′A, z still remains a valid instrument even if the original graph had x ↔ y; this would only change the value of the structural coefficient ε′xy, which would now read ε′xy = εxy + λzxεzy. In G′B, however, adding x ↔ y renders z inadmissible for single-door identification of λxy, since this backdoor path cannot be blocked.

Sometimes it might be necessary to prune variables from G′ to guarantee Σ′ is computable. Consider again GA and assume εzx is known. Pushing forward εzx results in Fig. 3b where, as before, we know σ′zx = σzx − εzx. However, path-tracing of Fig. 3b shows the covariance of z with y would also need adjustment, σ′zy = σzy − λxyεzx. Thus, we have two cases: (i) if λxy is known, the adjustment is feasible and we are done; (ii) if λxy is not (yet) known, the adjustment cannot be made; but, since y is a leaf node, it can be pruned from G′ (Tian & Pearl, 2003), avoiding this problem (Fig. 3c). In this case, note the pruned graph is still helpful—now λzx can be identified. As previously discussed, knowledge of λzx permits identification of λxy using AVs, giving us the εzx-specific estimand of Eq. 1.
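This two-step identification can be traced numerically with the same hypothetical GA parameterization used earlier (illustrative values only):

```python
# Hypothetical GA parameters (illustrative).
lam_zx, lam_xy, e_zx, e_zy = 0.5, 0.8, 0.2, 0.1

# Implied covariances of GA from path-tracing (Eqs. 5-7).
s_zx = lam_zx + e_zx
s_zy = lam_zx*lam_xy + e_zx*lam_xy + e_zy
s_xy = lam_xy + lam_zx*e_zy

# Step 1: after pushing e_zx forward and pruning y, the amortized
# covariance of z and x identifies lam_zx.
lam_zx_hat = s_zx - e_zx

# Step 2: with lam_zx now known, the AV estimand of Eq. 2 identifies
# lam_xy, reproducing the e_zx-specific estimand of Eq. 1.
lam_xy_hat = (s_xy - lam_zx_hat*s_zy) / (1 - lam_zx_hat*s_zx)
```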

[Figure 3 shows three path diagrams: (a) GA, (b) the graph after pushing forward εzx, (c) the graph after pruning y.]

Figure 3: Pushing forward εzx in GA requires adjusting σyz. If the adjustment is possible, y is kept in the graph as in Fig. 3b; if not, y is marginalized (pruned) as in Fig. 3c.

The graphical manipulation of PUSHFORWARD is general,and can be performed whenever we have knowledge of aknown error covariance. Theorem 1 formalizes the proce-dure to arbitrary models. Given any bidirected edge x↔ ywith known value εxy , we remove it from the graph and reg-ister the new amortized covariance σ′xy = σxy − εxy . Nextwe repair the covariances of the descendants of x with y by,



for every c ∈ Ch(x), adding (or modifying) the bidirected edge c ↔ y with the direct causal effect λxc times εxy. Finally, for any descendant z of y, we either (i) amortize its covariance with x, if all edges that compose the total causal effect δyz of y on its descendant z are known, or (ii) marginalize z out by pruning the graph. The final output is a modified model 〈G′,Σ′〉 where any graphical identification method can be applied; and estimands in terms of Σ′ can be converted back to estimands in terms of Σ and εxy.

Theorem 1 (PUSHFORWARD). Given a linear SCM with graph G, covariance matrix Σ, a set of known directed edges D, and a known bidirected edge εxy, let the pair 〈G′,Σ′〉 be constructed from G and Σ as follows:

1. x↔ y is removed and σ′xy = σxy − εxy;

2. ∀c ∈ Ch(x), c ≠ y, the bidirected edges c ↔ y are added if they do not exist, and ε′cy = εcy + λxcεxy;

3. ∀z ∈ De(y), z ≠ x, if there is an edge on any directed path from y to z that is not in D, then z is removed from G′. For the remaining z, σ′xz = σxz − εxyδyz, where δyz is the sum, over all directed paths from y to z, of the product of the coefficients along each path (the total effect of y on z);

4. All other parameters and covariances remain the same.

Then, if λab is identifiable in G′, it is (εxy,D)-identifiable in G.

We denote by PF(G,Σ,D, εxy, x) the function that returns the modified model 〈G′,Σ′〉 as per Theorem 1. Pseudocode for PF (which closely follows the steps of the theorem), as well as the proof, can be found in the supplementary material.
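As a concrete reference for Theorem 1, the sketch below is our own minimal Python rendering of the PF steps (it is not the authors' implementation). It assumes an acyclic graph, a dict-based model with directed edges keyed by ordered pairs and bidirected edges and covariances keyed by frozensets, and that every covariance entry it touches is present; the usage lines replay the GA example of Fig. 3 (pushing forward εzx with λxy unknown, so y is pruned).

```python
from math import prod

def directed_paths(edges, src, dst):
    """All directed paths from src to dst as lists of edges (DAG assumed)."""
    if src == dst:
        return [[]]
    out = []
    for (a, b) in edges:
        if a == src:
            for rest in directed_paths(edges, b, dst):
                out.append([(a, b)] + rest)
    return out

def push_forward(lam, eps, sigma, known, x, y):
    """One PUSHFORWARD step (Theorem 1) for the known bidirected edge x <-> y.

    lam:   {(a, b): coeff} for directed edges a -> b
    eps:   {frozenset((a, b)): value} for bidirected edges a <-> b
    sigma: {frozenset((a, b)): covariance} (only the entries used here)
    known: set of directed edges (a, b) whose coefficients are known
    """
    lam, eps, sigma = dict(lam), dict(eps), dict(sigma)
    e_xy = eps.pop(frozenset((x, y)))                  # step 1: remove x <-> y
    sigma[frozenset((x, y))] -= e_xy
    for (a, c) in list(lam):                           # step 2: children of x
        if a == x and c != y:
            k = frozenset((c, y))
            eps[k] = eps.get(k, 0.0) + lam[(x, c)] * e_xy
    nodes = {v for e in lam for v in e}
    for z in sorted(nodes - {x, y}):                   # step 3: descendants of y
        paths = [p for p in directed_paths(list(lam), y, z) if p]
        if not paths:
            continue
        if all(e in known for p in paths for e in p):  # (i) amortize sigma_xz
            delta = sum(prod(lam[e] for e in p) for p in paths)
            sigma[frozenset((x, z))] -= e_xy * delta
        else:                                          # (ii) prune z
            lam = {e: v for e, v in lam.items() if z not in e}
            eps = {e: v for e, v in eps.items() if z not in e}
            sigma = {k: v for k, v in sigma.items() if z not in k}
    return lam, eps, sigma

# Fig. 3 example: z -> x -> y with eps_zx known (values illustrative).
lam = {('z', 'x'): 0.7, ('x', 'y'): 0.5}
eps = {frozenset(('z', 'x')): 0.3, frozenset(('z', 'y')): 0.2}
sigma = {frozenset(('z', 'x')): 1.0,
         frozenset(('z', 'y')): 0.7,
         frozenset(('x', 'y')): 1.095}
lam2, eps2, sig2 = push_forward(lam, eps, sigma, known=set(), x='z', y='x')
# y is pruned (lambda_xy unknown) and sigma'_zx = 1.0 - 0.3 = 0.7
```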

5. Algorithmic derivation of sensitivity curves

In this section, we construct a graph-based constrained identification algorithm for linear SCMs which systematically and efficiently exploits knowledge of both path coefficients and error covariances. Our algorithm relies on the PUSHFORWARD method to incorporate constraints on bidirected edges, and on the AV technique (via the QID algorithm) to incorporate constraints on directed edges. This allows the algorithmic derivation of sensitivity curves for a target query λxy in arbitrary linear models, with an arbitrary set of directed and bidirected edges as sensitivity parameters.

Although the graphical modification of PUSHFORWARD is defined for one bidirected edge, the modified graph G′ is a valid model in which any graphical operation can be performed. We can thus extend PUSHFORWARD to handle multiple bidirected edges by iteratively applying it whenever a bidirected edge of the modified graph is still known; what remains to be decided is the order in which to perform these operations. Note that testing all possible orders of graphical

[Figure 4: graphs of the model x → z → y, x → y with error covariances εxy, εxz, εzy. Panels: (a) G; (b) G′z; (c) G′y; (d) G′′y.]

Figure 4: Pushing forward (PF) multiple edges in topological order.

manipulations can result in an algorithm with exponential computational complexity, even when initially pushing forward a single bidirected edge εxy. This happens because new bidirected edges are created for each c ∈ Ch(x) and, if all the λxc are identifiable, all subsets of those bidirected edges may be eligible to be pushed forward again. Thus, here we propose an efficient procedure using topological ordering, which performed as well as a brute-force approach in our computational experiments (Sec. 6).

Consider the example given in Fig. 4a. The task is to decide whether θ = (εxz, εxy, εzy) (in red) is an admissible set of sensitivity parameters for the target coefficient λxy (in blue) and, if so, to find the corresponding sensitivity curve. Our strategy consists of, for each node v, listing its ancestors a ∈ An(v) and, in topological order, iteratively pushing forward εav if it is still known in the modified graph. By performing operations in this way, we are guaranteed to visit each ancestor of v only once. Starting with node v = z, it has only one ancestor x and a single known bidirected edge to be removed, εxz. This can be handled with a one-step PUSHFORWARD operation (pruning y), resulting in the modified graph G′z of Fig. 4b, in which λxz can be trivially identified. Next, return to the original graph and consider v = y, with ancestors x and z. Following a topological order, we first push forward εxy, giving us the modified graph G′y of Fig. 4c with the new bidirected edge ε′zy = εzy + λxzεxy. Since all components of ε′zy are known, we can push forward ε′zy in G′y, obtaining the graph G′′y of Fig. 4d, in which λxy is identified with sensitivity curve R′′yx.z.
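This chain of push-forward operations can be verified numerically. The sketch below is our own population-level illustration (parameter values are made up): it builds the covariance matrix of a model compatible with Fig. 4a, reproduces the three adjustments, and recovers λxy as the coefficient of x in the regression of y on (x, z) computed from the twice-modified covariances.

```python
import numpy as np

# Model of Fig. 4a (illustrative values): x -> z -> y, x -> y, with all
# three error covariances eps_xy, eps_xz, eps_zy as sensitivity parameters.
l_xz, l_zy, l_xy = 0.8, 0.6, -0.4
e_xy, e_xz, e_zy = 0.2, 0.3, 0.1

Lam = np.zeros((3, 3))          # order (x, z, y); Lam[i, j]: i -> j
Lam[0, 1], Lam[1, 2], Lam[0, 2] = l_xz, l_zy, l_xy
Omega = np.eye(3)
Omega[0, 1] = Omega[1, 0] = e_xz
Omega[0, 2] = Omega[2, 0] = e_xy
Omega[1, 2] = Omega[2, 1] = e_zy
A = np.linalg.inv(np.eye(3) - Lam.T)
S = A @ Omega @ A.T             # population covariance matrix
x, z, y = 0, 1, 2

# v = z: push forward eps_xz (pruning y) and identify lambda_xz.
l_xz_hat = (S[x, z] - e_xz) / S[x, x]

# v = y: push forward eps_xy, then the now-known eps'_zy.
s_xy = S[x, y] - e_xy                # sigma'_xy
e_zy_new = e_zy + l_xz_hat * e_xy    # eps'_zy = eps_zy + lambda_xz * eps_xy
s_zy = S[z, y] - e_zy_new            # sigma''_zy

# In G''_y, y has no remaining bidirected edges, so lambda_xy is the
# x-coefficient of the regression of y on (x, z) under the modified model.
coefs = np.linalg.solve(S[np.ix_([x, z], [x, z])], np.array([s_xy, s_zy]))
l_xy_hat = coefs[0]
print(round(l_xy_hat, 10))  # -0.4
```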

[Figure 5: graphs over x, y, z, w with edges λxy, λxz, λyz, λzw and error covariances εxy, εyz, εxw, εyw, εzw. Panels: (a) Original Model; (b) Push forward εxw; (c) Push forward εyw.]

Figure 5: Instruments with ancestors and descendants.

In the previous example we demonstrated how to systematically deal with bidirected edges connected to ancestors



Algorithm 1 CID(G,Σ,D,B)

1: initialize VB ← Vertices(B)
2: repeat
3:   D ← D ∪ QID(G,Σ,D)
4:   for each v ∈ VB do
5:     〈G′,Σ′〉 ← 〈G,Σ〉
6:     for each a ∈ An(v) in topological order do
7:       if ε′av is known then
8:         〈G′,Σ′〉 ← PF(G′,Σ′,D, ε′av, a)
9:         D ← D ∪ QID(G′,Σ′,D)
10:      end if
11:    end for
12:  end for
13:  for each εab ∈ B do
14:    〈G′,Σ′〉 ← PF(G,Σ,D, εab, a)
15:    D ← D ∪ QID(G′,Σ′,D)
16:  end for
17: until all directed edges have been identified or no edge has been identified in the last iteration

of a node v; however, in linear models, descendants of v can also help with the identification of direct causal effects λav. Consider, for instance, Fig. 5a. The task is to find a sensitivity curve for λxy in terms of θ = (εxw, εyw). Start with node w and, as before, push forward εxw as in Fig. 5b. Here, λzw can be identified with x as an instrument. Returning to the original graph, now consider node y and push forward the bidirected edge εyw with its descendant w, as in Fig. 5c. Since λzw has been identified, we can create the AV w∗ = w − λzwz, which is a valid instrument for λxy.
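This two-stage use of a descendant can also be checked at the population level. The sketch below is our own illustration (parameter values are made up) for a model compatible with Fig. 5a: x first serves as an instrument for λzw after εxw is pushed forward, and the AV w* = w − λzw z then serves as an instrument for λxy after εyw is pushed forward.

```python
import numpy as np

# Model of Fig. 5a (illustrative values).
l_xy, l_xz, l_yz, l_zw = 0.5, 0.7, 0.4, 0.6
e_xw, e_yw = 0.2, 0.3                 # sensitivity parameters theta
e_xy, e_yz, e_zw = 0.25, 0.15, 0.1   # remaining error covariances

Lam = np.zeros((4, 4))               # order (x, y, z, w); Lam[i, j]: i -> j
Lam[0, 1], Lam[0, 2], Lam[1, 2], Lam[2, 3] = l_xy, l_xz, l_yz, l_zw
Omega = np.eye(4)
for (i, j), e in {(0, 1): e_xy, (1, 2): e_yz, (0, 3): e_xw,
                  (1, 3): e_yw, (2, 3): e_zw}.items():
    Omega[i, j] = Omega[j, i] = e
A = np.linalg.inv(np.eye(4) - Lam.T)
S = A @ Omega @ A.T                  # population covariance matrix
x, y, z, w = 0, 1, 2, 3

# Push forward eps_xw: x becomes a valid instrument for lambda_zw.
l_zw_hat = (S[x, w] - e_xw) / S[x, z]

# Push forward eps_yw, then use the AV w* = w - lambda_zw * z (equal to
# the error term of w) as an instrument for lambda_xy.
l_xy_hat = (S[w, y] - l_zw_hat * S[z, y] - e_yw) / \
           (S[w, x] - l_zw_hat * S[z, x])
print(round(l_zw_hat, 10), round(l_xy_hat, 10))  # 0.6 0.5
```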

These two cases illustrate our general procedure for handling multiple bidirected edges, which in combination with the QID algorithm forms our constrained identification algorithm CID, provided in Algorithm 1. Lines 4 to 12 perform PUSHFORWARD (PF) in topological ordering, each time applying QID in the modified model to verify whether new directed edges can be identified; lines 13 to 16 perform a single PUSHFORWARD operation on each bidirected edge, which may free descendants to be used as instruments, as in Fig. 5. Since newly identified edges can help both PUSHFORWARD and QID, this process is repeated until all directed edges are identified or no new directed edge was identified in the last iteration. The complexity of CID is dominated by QID, which is polynomial if the degree of each node is bounded (Chen et al., 2017).

An interesting 4-node example is shown in Fig. 6a, where εzw, a parameter related to neither x nor y, is an admissible sensitivity parameter for λxy! Our algorithm derives an εzw-specific estimand for λxy as follows. It first pushes forward εzw and runs QID in the modified graph, resulting in the identification of λzw. Next, the algorithm returns to the original graph and runs QID, which uses λzw to create the auxiliary variable w∗ = w − λzwz, enabling

[Figure 6: (a) graph over z, w, x, y with edges λzw (z → w), λzx (z → x), λxy (x → y) and error covariances εzx, εzy, εzw, εwy; (b) plot of the sensitivity curve of λxy in terms of εzw.]

Figure 6: In Fig. 6a note that, although connected to neither x nor y, εzw is an admissible sensitivity parameter for λxy. Fig. 6b shows the sensitivity curve of λxy in terms of εzw for a numerical simulation of the model in Fig. 6a.

the identification of λzx. Finally, still within QID, λxy is obtained using the auxiliary variable x∗ = x − λzxz.

As discussed in Sec. 3, the utility of θ-specific estimands is to show how sensitive the target quantity of interest is to different hypothetical values of the sensitivity parameters θ. These results can then be submitted to quantitative plausibility judgments, for instance, in the form of θ ∈ Θp, where Θp is a plausibility region. To illustrate how one could deploy this in practice, we provide a numerical example of the causal model in Fig. 6a. Our goal is to assess how different hypothetical values for εzw affect inference of λxy. In a real context, this needs to be estimated from finite samples, and here we use a maximum likelihood estimator. Fig. 6b shows the estimates for λxy (blue) for different values of the sensitivity parameter εzw, along with the corresponding 95% confidence interval (gray). If, for instance, we can plausibly bound εzw to be within 0.1 to 0.3, the plot reveals λxy can be safely judged to be between −0.6 and −0.2.
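The paper's example fits the model by maximum likelihood on finite samples; as a simpler population-level illustration (our own, with made-up parameter values), the sketch below derives the εzw-specific estimand of λxy exactly along the algorithm's steps (push forward εzw, then AVs w* and x*) and evaluates it on a grid of hypothetical εzw values.

```python
import numpy as np

# Model of Fig. 6a (illustrative values): z -> w, z -> x -> y, with
# error covariances eps_zx, eps_zy, eps_zw, eps_wy.
l_zw, l_zx, l_xy = 0.8, 0.7, -0.4
e_zy, e_zx, e_zw, e_wy = 0.3, 0.2, 0.25, 0.15

Lam = np.zeros((4, 4))               # order (z, w, x, y); Lam[i, j]: i -> j
Lam[0, 1], Lam[0, 2], Lam[2, 3] = l_zw, l_zx, l_xy
Omega = np.eye(4)
for (i, j), e in {(0, 1): e_zw, (0, 2): e_zx, (0, 3): e_zy,
                  (1, 3): e_wy}.items():
    Omega[i, j] = Omega[j, i] = e
A = np.linalg.inv(np.eye(4) - Lam.T)
S = A @ Omega @ A.T                  # population covariance matrix
z, w, x, y = 0, 1, 2, 3

def lambda_xy(eps_zw_hyp):
    """eps_zw-specific estimand for lambda_xy (undefined at eps_zw = 0)."""
    lzw = (S[z, w] - eps_zw_hyp) / S[z, z]                        # PF eps_zw
    lzx = (S[w, x] - lzw * S[z, x]) / (S[w, z] - lzw * S[z, z])   # AV w*
    return (S[x, y] - lzx * S[z, y]) / (S[x, x] - lzx * S[z, x])  # AV x*

# Sensitivity curve over a grid of hypothetical eps_zw values.
curve = {e: lambda_xy(e) for e in (0.05, 0.1, 0.15, 0.2, 0.25, 0.3)}
# The curve passes through the true lambda_xy at the true eps_zw:
print(round(curve[0.25], 10))  # -0.4
```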

6. Computational experiments

The identification problem in linear systems has not yet been efficiently solved. Although there exists a complete solution using computer algebra (García-Puente et al., 2010), these methods are computationally intractable, making them impractical for graphs larger than 4 or 5 nodes. Since we rely on existing identification algorithms that are polynomial but not complete (i.e., QID cannot find all identifiable parameters), we cannot expect the CID algorithm to find all sensitivity curves either. In this section, we report the results of an extensive set of experiments aimed at empirically verifying the generality of our approach. We have performed an exhaustive study of all possible queries in 3- and 4-node models, which are essentially the largest instances computer



                 |       3-NODE MODELS              |        4-NODE MODELS
ID ALGORITHM     | Directed   Bidirected  Both      | Directed      Bidirected    Both
QID (AVs only)   | 19 (100%)  – (0%)      68 (21%)  | 14,952 (95%)  – (0%)        170,304 (29%)
CID (AVs + PF)   | 19 (100%)  109 (100%)  320 (100%)| 14,952 (95%)  50,708 (97%)  555,758 (96%)
GROUND TRUTH     | 19         109         320       | 15,740        52,016        578,858

Table 1: Number of θ-identifiable sensitivity queries (only when θ ≠ ∅) per type of sensitivity parameters θ.

algebra methods can solve through brute force.7

A query consists of determining whether, in model G, a target parameter λxy is θ-identifiable given a set of sensitivity parameters θ. For 3-node models, we have 50 connected graphs with 720 possible queries; for 4-node models, we have 3,745 connected graphs and 1,059,156 possible queries.8 For each query, we used algebraic methods to determine ground-truth identification and checked it against the results of both QID and CID. Our interest lies in the queries that are θ-identifiable only when θ ≠ ∅.

The results are given in Table 1, where columns restrict sensitivity parameters θ to be: (i) subsets of directed edges; (ii) subsets of bidirected edges; and (iii) subsets of both directed and bidirected edges. The results show that our CID algorithm correctly identifies all possible sensitivity curves for 3-node models. Among 4-node models, our method solves 96% of all identifiable sensitivity queries.

These numbers reveal that, in the context of linear SCMs, canonical sensitivity analysis examples which have been addressed on a case-by-case basis in the literature (e.g., Fig. 7, target coefficient in blue and sensitivity parameters in red) are only a small subset of all possible sensitivity analysis exercises enabled by our proposal. When comparing CID's results to those of QID alone, it is also clear that systematically incorporating constraints on bidirected edges is essential for obtaining sensitivity curves.

A valid concern regarding CID's current implementation is that the proposed topological ordering for processing bidirected edges could be less capable than a general search over all possible valid graphical manipulations. With this in mind, we performed a thorough comparison of our proposal against other ordering methods for all queries in 3- and 4-node models. Topological ordering proved to perform as well as a brute-force search that recursively tests all possible subsets of bidirected edges that can be pushed forward.

Finally, the incompleteness of CID can stem from two sources: limitations of the graphical manipulations per-

7 We use Gröbner bases, which have a doubly-exponential computational complexity (Bardet, 2002). See the Supplementary Materials for details.

8 For 5-node models, these numbers reach 1 million graphs and 11 billion queries. Ground-truth computations in 5-node models using computer algebra can take hours for a single graph.

[Figure 7: three graphs. Panels: (a) Backdoor; (b) IV; (c) Mediation.]

Figure 7: Canonical sensitivity analyses: (a) backdoor violation with unobserved confounders independent of observed confounders (Carnegie et al., 2016); (b) putative instrumental variable, where both the exclusion and independence restrictions are suspected to be violated (Wang et al., 2018); (c) randomized trial in which treatment x has side-effect m, and unobserved mediator-outcome confounding cannot be ruled out (VanderWeele, 2010). For linear SCMs, these are special cases of the queries solved by our approach.

formed by PUSHFORWARD, and the incompleteness of the identification algorithm for directed edges, QID. Separating the two can help guide efforts for future research. To achieve that, we used algebraic methods to simulate how CID would have performed if it had access to a complete identification algorithm for directed edges instead of QID. We found that CID would have identified over 99.99% of 4-node sensitivity queries. This seems to suggest that: (i) the main bottleneck of CID is QID; and (ii) PUSHFORWARD with topological ordering can reap the benefits of improved identification algorithms for directed edges.

7. Conclusion

We introduced a general algorithmic framework for sensitivity analysis of linear SCMs. We reduced sensitivity analysis to a constrained identification problem and developed a novel graphical procedure to systematically incorporate constraints on bidirected edges. We then devised an efficient graph-based algorithm for deriving sensitivity curves. Exhaustive experiments corroborated the generality of our proposal. Such systematic tools can help analysts better navigate the model space and understand the trade-off between the plausibility of assumptions and the strength of conclusions. Extensions to other types of violations and to nonlinear models are promising directions for future work.



Acknowledgements

The authors thank anonymous reviewers for helpful comments. Cinelli and Pearl are supported in part by grants from IBM Research, the Defense Advanced Research Projects Agency [W911NF-16-057], the National Science Foundation [IIS-1302448, IIS-1527490, and IIS-1704932], and the Office of Naval Research [N00014-17-S-B001]. Bareinboim and Kumor are supported in part by grants from NSF IIS-1704352 and IIS-1750807 (CAREER), IBM Research, and Adobe Research. Most of the work by Chen was conducted while at IBM Research AI.

References

Arah, O. A. Bias analysis for uncontrolled confounding in the health sciences. Annual Review of Public Health, 38:23–38, 2017.

Bardet, M. On the complexity of a Gröbner basis algorithm. pp. 8, 2002.

Bareinboim, E. and Pearl, J. Causal inference by surrogate experiments: z-identifiability. arXiv preprint arXiv:1210.4842, 2012.

Bareinboim, E. and Pearl, J. Causal inference and the data-fusion problem. Proceedings of the National Academy of Sciences, 113(27):7345–7352, 2016.

Blackwell, M. A selection bias approach to sensitivity analysis for causal effects. Political Analysis, 22(2):169–182, 2013.

Brito, C. and Pearl, J. Generalized instrumental variables. In Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence, pp. 85–93. Morgan Kaufmann Publishers Inc., 2002.

Brumback, B. A., Hernán, M. A., Haneuse, S. J., and Robins, J. M. Sensitivity analyses for unmeasured confounding assuming a marginal structural model for repeated measures. Statistics in Medicine, 23(5):749–767, 2004.

Carnegie, N. B., Harada, M., and Hill, J. L. Assessing sensitivity to unmeasured confounding using a simulated potential confounder. Journal of Research on Educational Effectiveness, 9(3):395–420, 2016.

Chen, B., Pearl, J., and Bareinboim, E. Incorporating knowledge into structural equation models using auxiliary variables. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI 2016, pp. 3577–3583, 2016.

Chen, B., Kumor, D., and Bareinboim, E. Identification and model testing in linear structural equation models using auxiliary variables. In International Conference on Machine Learning, pp. 757–766, 2017.

Cinelli, C. and Hazlett, C. Making sense of sensitivity: Extending omitted variable bias, 2018.

Cornfield, J., Haenszel, W., Hammond, E. C., Lilienfeld, A. M., Shimkin, M. B., and Wynder, E. L. Smoking and lung cancer: recent evidence and a discussion of some questions. Journal of the National Cancer Institute, (23):173–203, 1959.

Cox, D., Little, J., and O'Shea, D. Ideals, Varieties, and Algorithms, volume 3. Springer, 1992.

Ding, P. and VanderWeele, T. J. Sensitivity analysis without assumptions. Epidemiology (Cambridge, Mass.), 27(3):368, 2016.

Fisher, R. Cigarettes, cancer, and statistics. The Centennial Review of Arts & Science, 2:151–166, 1958.

Fisher, R. A. Dangers of cigarette-smoking. British Medical Journal, 2(5039):297, 1957.

Foygel, R., Draisma, J., and Drton, M. Half-trek criterion for generic identifiability of linear structural equation models. The Annals of Statistics, pp. 1682–1713, 2012.

Frank, K. A. Impact of a confounding variable on a regression coefficient. Sociological Methods & Research, 29(2):147–194, 2000.

Franks, A., D'Amour, A., and Feller, A. Flexible sensitivity analysis for observational studies without observable implications. Journal of the American Statistical Association, (just-accepted):1–38, 2019.

García-Puente, L. D., Spielvogel, S., and Sullivant, S. Identifying causal effects with computer algebra. In Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (UAI). AUAI Press, 2010.

Giffin, R. B., Lebovitz, Y., English, R. A., et al. Transforming clinical research in the United States: challenges and opportunities: workshop summary. National Academies Press, 2010.

Imai, K., Keele, L., and Yamamoto, T. Identification, inference and sensitivity analysis for causal mediation effects. Statistical Science, pp. 51–71, 2010.

Imbens, G. W. Sensitivity to exogeneity assumptions in program evaluation. The American Economic Review, 93(2):126–132, 2003.

Joffe, M. M., Yang, W. P., and Feldman, H. I. Selective ignorability assumptions in causal inference. The International Journal of Biostatistics, 6(2), 2010.

Leamer, E. Let's take the con out of econometrics. The American Economic Review, 73(1):31–43, 1983.


Masten, M. A. and Poirier, A. Identification of treatment effects under conditional partial independence. Econometrica, 86(1):317–351, 2018.

Mauro, R. Understanding LOVE (left out variables error): A method for estimating the effects of omitted variables. Psychological Bulletin, 108(2):314, 1990.

Oster, E. Unobservable selection and coefficient stability: Theory and evidence. Journal of Business & Economic Statistics, pp. 1–18, 2017.

Pearl, J. Causality. Cambridge University Press, 2000.

Rosenbaum, P. R. Design of Observational Studies. Springer Series in Statistics, 2010.

Rosenbaum, P. R. and Rubin, D. B. Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcome. Journal of the Royal Statistical Society, Series B (Methodological), pp. 212–218, 1983.

Small, D. S. Sensitivity analysis for instrumental variables regression with overidentifying restrictions. Journal of the American Statistical Association, 102(479):1049–1058, 2007.

Spirtes, P., Glymour, C. N., Scheines, R., Heckerman, D., Meek, C., Cooper, G., and Richardson, T. Causation, Prediction, and Search. MIT Press, 2000.

The Sage Developers. SageMath, the Sage Mathematics Software System (Version 8.5), 2018. https://www.sagemath.org.

Tian, J. and Pearl, J. On the identification of causal effects. Technical Report R-290, Cognitive Systems Laboratory, UCLA, 2003.

VanderWeele, T. J. Bias formulas for sensitivity analysis for direct and indirect effects. Epidemiology (Cambridge, Mass.), 21(4):540, 2010.

VanderWeele, T. J. and Arah, O. A. Bias formulas for sensitivity analysis of unmeasured confounding for general outcomes, treatments, and confounders. Epidemiology (Cambridge, Mass.), 22(1):42–52, January 2011. ISSN 1531-5487.

Wang, X., Jiang, Y., Zhang, N. R., and Small, D. S. Sensitivity analysis and power for instrumental variable studies. Biometrics, 2018.

Wright, S. Correlation and causation. Journal of Agricultural Research, 20(7):557–585, 1921.

Appendix

A. Proof of Propositions 1 and 2

The propositions follow directly from the definitions, but we state the proofs here for completeness. For Proposition 1, first note that ES is a functional of the covariance matrix Σ and is by definition identifiable. Thus, if Q is identifiable, we can also uniquely compute Q from Σ and, since B = ES − Q and each of its components is identifiable, B can also be uniquely computed from Σ and is thus identifiable. Conversely, if B is identifiable, just write Q = ES − B, which means Q can be uniquely determined from Σ and is also identifiable.

Proposition 2 follows the same argument. First note that if Q is θ-identifiable, then we can write B(θ) = ES − Q(θ), which is uniquely determined by Σ and θ, giving us a bias function parameterized in terms of θ. Conversely, if there exists a function B(θ) which, by definition, gives us a unique bias in terms of θ (and the data Σ), we can write Q(θ) = ES − B(θ). This implies Q can be uniquely determined from Σ and θ, and it is thus θ-identifiable.

B. Proof and pseudocode for Theorem 1

Theorem 1 (PUSHFORWARD). Given a linear SCM with graph G, covariance matrix Σ, a set of known directed edges D, and a known bidirected edge εxy, let the pair 〈G′,Σ′〉 be constructed from G and Σ as follows:

1. x↔ y is removed and σ′xy = σxy − εxy;

2. ∀c ∈ Ch(x), c ≠ y, the bidirected edges c ↔ y are added if they do not exist, and ε′cy = εcy + λxcεxy;

3. ∀z ∈ De(y), z ≠ x, if there is an edge on any directed path from y to z that is not in D, then z is removed from G′. For the remaining z, σ′xz = σxz − εxyδyz, where δyz is the sum, over all directed paths from y to z, of the product of the coefficients along each path;

4. All other parameters and covariances remain the same.

Then, if λab is identifiable in G′, it is (εxy,D)-identifiable in G.

Before moving forward, we use a couple of definitions from the literature, which make reasoning about paths in the graph easier:

Definition 3. (Foygel et al., 2012) A path π from v to w is a trek if it has no colliding arrowheads, that is, π is of the form:

v ← ... ← a ↔ b → ... → w
v ← ... ← k → ... → w
v ← ... ← w
v → ... → w

Definition 4. (Foygel et al., 2012) A trek monomial π(Λ, E) for trek π is defined as the product of the structural parameters along the trek, multiplied by the trek's top error term covariance.

In particular, if π does not contain a bidirected edge9,

π(Λ, E) = ε²k ∏_{x→y ∈ π} λxy

where k is the node at the "top" of the trek (it has no incoming edges). If the trek contains a bidirected edge εab, then

π(Λ, E) = εab ∏_{x→y ∈ π} λxy

Lemma 1. (Foygel et al., 2012) The covariance σvw between v and w can be written as the sum of the trek monomials of all treks between v and w (Tvw):

σvw = ∑_{π ∈ Tvw} π(Λ, E)

At its core, identifiability of an edge λ in a linear Gaussian SCM can be reduced to the problem of finding whether there exists a unique solution for λ, in terms of the covariances, in the system of equations defined by the rules of path analysis (Foygel et al., 2012) and knowledge of existing directed and bidirected effects.
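Lemma 1 (Wright's path-tracing rule) can be checked on a two-node example. In the model x → y with x ↔ y, the treks between x and y are x → y (monomial λxy ε²x) and x ↔ y (monomial εxy), while the treks from y to y are the empty trek (ε²y), y ← x → y (λ²xy ε²x), and the two treks crossing εxy (each λxy εxy). The sketch below (our own, with illustrative values) confirms the trek sums against the covariance matrix implied by the matrix form of the SCM.

```python
import numpy as np

# Two-node model: x -> y with bidirected x <-> y (illustrative values).
l_xy, e_xy = 0.5, 0.3
vx, vy = 1.0, 1.0              # error variances eps_x^2, eps_y^2

Lam = np.array([[0.0, l_xy],   # Lam[i, j]: coefficient i -> j
                [0.0, 0.0]])
Omega = np.array([[vx, e_xy],
                  [e_xy, vy]])
A = np.linalg.inv(np.eye(2) - Lam.T)
S = A @ Omega @ A.T            # implied covariance matrix

# Trek sums (Lemma 1):
sigma_xy_treks = l_xy * vx + e_xy                      # x -> y and x <-> y
sigma_yy_treks = vy + l_xy**2 * vx + 2 * l_xy * e_xy   # empty, y<-x->y, and
                                                       # the two eps_xy treks

print(np.isclose(S[0, 1], sigma_xy_treks),
      np.isclose(S[1, 1], sigma_yy_treks))  # True True
```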

With this in mind, we can prove PUSHFORWARD.

Proof. Specified in the theorem is a covariance matrix Σ, a graph of the structural equations G, a set of known directed edges D, and a known bidirected edge εxy.

The system of equations constraining the values of the structural parameters is

σvw = ∑_{π ∈ Tvw} π(Λ, E)   ∀v, w ∈ G

We first look at σxy, and define a new known quantity σ′xy:

σxy = εxy + ∑_{π ∈ Txy\{εxy}} π(Λ, E)

σ′xy = σxy − εxy = ∑_{π ∈ Txy\{εxy}} π(Λ, E)

9 Note also that we can have a trek from v to v, including a trek that takes no edges at all, which would simply be ε²v.

Algorithm 2 PF - PUSHFORWARD

1: function PF(G,Σ,D, εxy, x)
2:   initialize 〈G′,Σ′〉 ← 〈G,Σ〉
3:   update ε′xy ← 0 in G′ and σ′xy ← σ′xy − εxy in Σ′
4:   for each c ∈ Ch(x) do
5:     update ε′cy ← ε′cy + λxcεxy in G′
6:   end for
7:   for each z ∈ De(y) do
8:     if Edges(δyz) ⊆ D then
9:       update σ′xz ← σxz − εxyδyz
10:    else
11:      remove z from G′
12:    end if
13:  end for
14:  return 〈G′,Σ′〉
15: end function

We also look at all descendants z of y where the directed paths from y to z (δyz) are made entirely of known edges (Edges(δyz) ⊆ D). We define

δab = (1/ε²a) ∑_{π ∈ T→ab} π(Λ, E)

where T→ab represents the set of treks taking only directed edges from a to b: a → ... → b.

For each such descendant z of y, we define the quantity σ′xz:

σxz = δyzεxy + ∑_{π ∈ Txz\T→εxy,yz} π(Λ, E)

σ′xz = σxz − δyzεxy = ∑_{π ∈ Txz\T→εxy,yz} π(Λ, E)

Here, we used T→εxy,yz to represent the treks starting from εxy and continuing from y to z (half-treks from x to z using εxy).

Finally, we define σ′ab = σab for all other covariances between nodes a and b, where both a and b are either non-descendants of y or have their paths to y known.

This gives us a new system of equations in the original variables. All that remains to be shown is that an identified quantity λ′ab in G′ which contains a "pushed-forward" bidirected edge guarantees that the above-generated system of equations can be solved for the corresponding variable λab.

As per the definition of G′, it is identical to G, except:

1. the bidirected edge x ↔ y is removed;

2. ∀c ∈ Ch(x), the edges c ↔ y are added;

3. descendants z of y for which not all edges of δyz are known are removed.

This new model G′, with parameters Λ′ and E′, has the system of equations:

σ′′vw = ∑_{π ∈ Tvw} π(Λ′, E′)   ∀v, w ∈ G′

We compare this new system of equations to the modified equations of G.

• For all non-descendants of x or y, all covariance equations are identical (both graphs have the same treks from non-descendants of x and y to all other nodes, and these covariances were not modified in the augmented equations).

• For all descendants of x, the modified equations for G have εxyλxc wherever G′ has ε′cy when G does not have the bidirected edge c ↔ y. If G already includes an εcy, then it has (εxyλxc + εcy) for each ε′cy. This can be seen by comparing the treks available in the two models. We can create a map of treks in G to treks in G′. Treks not crossing the added/removed bidirected edges are identical. All that remains are treks crossing εxy in G, and ε′cy in G′. Suppose we have a trek from a to b in G′, a ← ... ← c ↔ y → ... → b, crossing the bidirected edge ε′cy. The corresponding trek in G across εcy, if it exists, and the trek a ← ... ← c ← x ↔ y → ... → b both map to it. Since we have a map from treks in G to all treks in G′, which differs only in the specified spot, the equations are likewise identical save for the mapping difference.

• The covariances between x and the descendants of y (including y itself) have likewise identical equations. This is because the removed treks in the modified equations are the only possibilities including εxy, so all variables behave as if the bidirected edge did not exist. This can also be seen by recognizing that setting εxy = 0 would result in the same equation as removing all instances of the variable. Since the only treks from x which include εxy start by crossing x ↔ y, and continue on a directed path, removing all directed paths from y multiplied by εxy achieves the desired effect.

Finally, we notice that any algorithm for identifiability in this new model G′ certifies that the system of equations can be uniquely solved for a given parameter, and the answer can be written in terms of Σ′′.

We argue that the same parameter can be solved for using the augmented equations of the original model G, replacing Σ′′ with Σ′ in the estimand returned from the identifiability algorithm for G′. This would be directly true if the modified equations for G were really identical to the equations for G′. However, the equations of G differ in εxy and εcy as mentioned above. We show that this difference does not affect the solutions for any of the directed or bidirected edges except εcy.

Looking at the form of the structural equations, we get a sum of treks, which are themselves products of path coefficients. We exploit the mapping created above between treks in G and treks in G′ to get:

σ′′wv = ∑ (treks not passing ε′cy) + ∑ (treks passing ε′cy)

Each trek can pass an ε at most once, at the top of the trek. Looking at the treks of G′, we get:

∑ (treks passing ε′cy) = δ′cw ε′cy δ′yv + δ′cv ε′cy δ′yw

The corresponding equation in G is:

∑ (treks passing εxy) = δcw (εxyλxc + εcy) δyv + δcv (εxyλxc + εcy) δyw

With the fact that the δ are all identical in both G and G′ in terms of equation structure, we get:

∑ (treks passing εxy) = δcw ε′cy δyv + δcv ε′cy δyw

Our goal now is to replace the εxyλxc + εcy from the equations of G with a temporary variable εTc, giving

∑ (treks passing εxy) = δcw εTc δyv + δcv εTc δyw

With this new system of equations, the temporary variable εTc corresponds to ε′cy, and the two systems are identical in their unknowns, making any solution in G′ a solution for the modified G equations.

To achieve this, we need to show that the substitution is valid. The argument we are making is that if we have a system of equations:

K1 = x + (x + 5)
K2 = 3x

we can replace x + 5 with y, giving:

K1 = x + y
K2 = 3x
y = x + 5


If the equations

K1 = x + y
K2 = 3x

are sufficient for uniquely identifying the value of x, then, assuming consistency of the original equations, the same solution for x is valid for the full system, including the third equation.

This is exactly the situation we have for our trek equations. We define εTc = εxyλxc + εcy, and if the system of equations excluding the equation constraining εTc has a solution for a given parameter, the same solution will be valid for the full system.

Lastly, we notice that εcy can be obtained from the solutions to εTc, εxy, λxc using the same equation.

This completes the proof.

C. Identification, sensitivity analysis and Gröbner bases

Gröbner bases are a symbolic method of computer algebra used to solve systems of polynomial equations. García-Puente et al. (2010) have shown that the identification (ID) problem in linear SCMs can be reduced to solving a system of polynomial equations, and how Gröbner bases provide a complete solution.

In this section we take a practical approach, showing how to set up the ID problem so it can be solved with Gröbner bases. We also show how to extend this setup to include sensitivity parameters, solving the problem of θ-identification. Our approach is based on García-Puente et al. (2010). For a basic understanding of Gröbner bases, please refer to Cox et al. (1992).

Gröbner bases can be seen as an algorithm for variable elimination in complex systems of polynomial equations. Let us illustrate the variable elimination approach in the simple instrumental variable graph:

[Figure: the instrumental variable graph, with edges z → x (coefficient b), x → y (coefficient a), and the bidirected edge x ↔ y (error covariance εxy).]

We can write the (normalized) covariance equations induced by the graph as follows:

σxy = a + εxy
σzy = b × a
σzx = b

Given these equations, the goal is to solve for a in terms of the covariances of Σ only. Normally, one would approach this directly, by simply eliminating one variable at a time. For example, after eliminating b, we get:

σxy = a+ εxy

σzy = σzx × a

Next, we would eliminate εxy by putting it in terms of a:

εxy = σxy − a

Then, we have a final equation just in terms of a and the Σ. This equation can be solved for a, and depending on how many values of a satisfy the constraint, it gives us our identification result (here, only a = σzy/σzx is valid).
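As a numeric sanity check of this identification result (with hypothetical parameter values), the covariances implied by the instrumental variable graph indeed satisfy a = σzy/σzx:

```python
# Hypothetical structural parameters for the IV graph z -> x -> y, x <-> y.
a, b, eps_xy = 0.7, 0.5, 0.3

# Normalized covariances implied by the graph.
sigma_zx = b
sigma_zy = b * a
sigma_xy = a + eps_xy

# a is identified as sigma_zy / sigma_zx; eps_xy then follows from sigma_xy.
a_hat = sigma_zy / sigma_zx
assert abs(a_hat - a) < 1e-12
assert abs((sigma_xy - a_hat) - eps_xy) < 1e-12
```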

Gröbner bases perform an equivalent operation: they successively eliminate variables from the system of equations. In this situation, we want to eliminate εxy and b, leaving only a and the covariances. In SAGE (The Sage Developers, 2018), this reduces to the following code:

R.<a,b,epsilon_xy,sigma_zx,sigma_zy,sigma_xy> = PolynomialRing(QQ)
Ideal(sigma_xy - (a + epsilon_xy),
      sigma_zy - (b*a),
      sigma_zx - b
).elimination_ideal([epsilon_xy, b]).groebner_basis()
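For readers without SAGE, the same elimination can be sketched with sympy's Gröbner basis routine (a sketch assuming sympy is available; the symbol names mirror the SAGE code above):

```python
import sympy as sp

a, b, eps, szx, szy, sxy = sp.symbols(
    'a b epsilon_xy sigma_zx sigma_zy sigma_xy')

polys = [sxy - (a + eps), szy - b * a, szx - b]

# A lex Groebner basis with the eliminated symbols (eps, b) listed first
# has the elimination property: the basis elements free of eps and b
# generate the elimination ideal.
gb = sp.groebner(polys, eps, b, a, szx, szy, sxy, order='lex')
elim = [p for p in gb.exprs if not p.has(eps) and not p.has(b)]

# The elimination ideal is generated by a*sigma_zx - sigma_zy, which is
# first degree in a: a single solution, a = sigma_zy / sigma_zx.
assert any(sp.expand(p - (a * szx - szy)) == 0
           or sp.expand(p + (a * szx - szy)) == 0 for p in elim)
```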

If the result is a first degree polynomial in a, there is a single solution.

The extension of this method to the θ-identification problem entailed by sensitivity analysis is straightforward. As sensitivity parameters are treated like known variables, we simply do not eliminate them. In the above example, if we were to treat εxy as a sensitivity parameter, our code would be:

R.<a,b,epsilon_xy,sigma_zx,sigma_zy,sigma_xy> = PolynomialRing(QQ)
Ideal(sigma_xy - (a + epsilon_xy),
      sigma_zy - (b*a),
      sigma_zx - b
).elimination_ideal([b]).groebner_basis()

with an identical interpretation: if the resulting polynomials in a, Σ and εxy are linear in a, we conclude that knowing the givens is sufficient to identify a.
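In the same sympy sketch (again assuming sympy is available), treating εxy as a sensitivity parameter simply means leaving it out of the elimination list:

```python
import sympy as sp

a, b, eps, szx, szy, sxy = sp.symbols(
    'a b epsilon_xy sigma_zx sigma_zy sigma_xy')

polys = [sxy - (a + eps), szy - b * a, szx - b]

# Eliminate only b; eps (the sensitivity parameter) is kept as a given.
gb = sp.groebner(polys, b, a, eps, szx, szy, sxy, order='lex')
elim = [p for p in gb.exprs if not p.has(b)]

# Every remaining generator is linear in a, so a is identified from
# Sigma together with the sensitivity parameter epsilon_xy.
assert elim and all(sp.degree(p, a) <= 1 for p in elim)
```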

Unfortunately, despite the completeness of this approach, Gröbner basis computation is doubly exponential in the number of variables, and in this case each edge corresponds to a variable (Bardet, 2002). This limits the practical solvable graph size to 4 or 5 nodes (Foygel et al., 2012; García-Puente et al., 2010). Our own experiments hit upon the same limitation, with attempted computations on 5-node graphs sometimes taking several days for identifying single edges, despite using an optimized representation of the equations (Foygel et al., 2012).


D. Detailed description of computational experiments

In this section, we provide a detailed description of our computational experiments, including pseudocode and additional tests. Our computational experiments have two main goals.

First, they aim to empirically verify the generality of our constrained identification algorithm CID, by comparing our results to the ground truth obtained via computer algebra.

Second, note that CID has three separate components:

1. The QID algorithm (Chen et al., 2017), which we use both for the identification of directed edges, and for incorporating constraints on directed edges that can be used as sensitivity parameters;

2. The graphical manipulations performed by PUSHFORWARD, which we use to incorporate constraints on bidirected edges; and,

3. The order in which to perform the graphical manipulations of PUSHFORWARD. In CID we chose to perform a topological ordering, as described in Algorithm 1.

Thus, our computational experiments also aim to disentangle the contributions of each of those components to our results.

Solving all 3- and 4-node sensitivity queries

Our computational experiments rely on the ability to find ground-truth answers to the question of whether a target coefficient λab is θ-identifiable in a given graph G (this is defined to be a query). As explained in Section C of this supplementary material, these ground-truth answers can be obtained with algebraic methods, more precisely using Gröbner bases (García-Puente et al., 2010).

For 3-node models we have 50 connected graphs with 720 possible queries; for 4-node models, we have 3,745 connected graphs and 1,059,156 possible queries. Note that, for 5-node models, we have 1,016,317 connected graphs and 11,615,669,904 possible queries. As mentioned in Section C, ground-truth computations using computer algebra can take hours (or sometimes days) for a single 5-node graph, rendering an exhaustive study of sensitivity queries in 5-node models impractical.

We have thus performed an exhaustive computation of the ground-truth answer of all possible queries in 3- and 4-node models via computer algebra using SAGE (The Sage Developers, 2018). These results give us a list stating, for every graph G, every edge λab, and all possible subsets of directed and bidirected edges used as sensitivity parameters θ, whether λab can be uniquely computed from Σ and θ.

Our main interest lies in those queries that can be identified only when θ ≠ ∅ (we call this a sensitivity query); in other words, we do not consider those edges that can be identified from Σ alone, since in these cases the parameter is identifiable and a sensitivity analysis would not be needed. The ground-truth numbers of queries that are θ-identifiable only when θ ≠ ∅ are 320 for 3-node models and 578,858 for 4-node models.

Our exhaustive computations also allow us to see how many sensitivity queries can be solved using only subsets of directed edges or only subsets of bidirected edges as sensitivity parameters. The decomposition then becomes the following. For 3-node models, there are 19 sensitivity queries that can be solved using only subsets of directed edges as sensitivity parameters, 109 using only subsets of bidirected edges, and, as before, 320 total queries which are solvable using an arbitrary combination of both. For 4-node models, these numbers increase to 15,740, 52,016 and 578,858, respectively. These numbers reveal that incorporating constraints on bidirected edges is an essential step for deriving sensitivity curves.

Comparing QID and CID to ground-truth answers

Once we have obtained ground-truth answers to all queries in 3- and 4-node models, we run both the QID as well as the CID algorithm for each of those queries and check whether they can correctly decide whether θ is an admissible set of sensitivity parameters for λab in G (and thus able to provide a sensitivity curve). This comparison gives us the numbers we have presented in the main text in Table 1.

Alternative ordering methods for PUSHFORWARD

In the main text, the CID algorithm applies PUSHFORWARD in a topological ordering for processing multiple bidirected edges. The method does not perform all possible graphical manipulations, and as such, a valid concern is that it might be less capable than a more general search. Another interesting question is to check whether simpler methods would perform as well as the current CID implementation. To tackle these questions, we tested additional ordering methods for handling multiple bidirected edges.

For simplicity of exposition, the CID algorithm in the main text has the ordering method embedded in the pseudocode itself. For the purposes of this section, however, it is conceptually easier to create a meta algorithm that repeats the following process: (i) first it creates a collection of valid modified graphs G by applying PUSHFORWARD according to some ordering method; then, (ii) it applies an identification algorithm to each of those modified graphs. This is given in Algorithm 3, which we call CID*.

In Algorithm 3, the argument PFORDER represents a function that takes as inputs a graph G, a covariance matrix Σ, a set of known bidirected edges B and a set of known directed edges D. It then returns a collection G of valid modified models ⟨G′, Σ′⟩ by iteratively applying PUSHFORWARD following a particular ordering method (for example, topological ordering). The argument IDMETHOD refers to an identification method for directed edges (for instance, QID). It is a function that takes as inputs a graph G, a covariance matrix Σ and a set of known directed edges D, and returns the new set of known directed edges.

                        ----------- 3 NODES -----------   ----------- 4 NODES -----------
PF order    ID Alg.     Directed   Bidirected   Both      Directed   Bidirected   Both
none        QID         19         -            68        14,952     -            170,304
PFO         QID         19         101          304       14,952     43,526       505,076
PFS         QID         19         105          308       14,952     46,630       517,036
PFR         QID         19         109          320       14,952     50,708       555,758
PFT (CID)   QID         19         109          320       14,952     50,708       555,758
none        Complete    19         -            68        15,740     -            177,216
PFO         Complete    19         101          304       15,740     44,680       524,846
PFS         Complete    19         105          308       15,740     47,962       538,332
PFR         Complete    19         109          320       15,740     51,992       578,758
PFT         Complete    19         109          320       15,740     51,992       578,758
GROUND TRUTH            19         109          320       15,740     52,016       578,858

Table 2: Number of θ-identifiable queries (only when θ ≠ ∅) per type of sensitivity parameters θ, using different ordering methods for PUSHFORWARD and different ID algorithms for the directed edges. Ground truth is computed using Gröbner bases. The first column defines the ordering method of PUSHFORWARD used for incorporating constraints on bidirected edges; this is passed as the argument PFORDER in the general function CID*. The second column refers to the identification algorithm used for directed edges; this is passed as the argument IDMETHOD in the general function CID*. "Complete" means we used Gröbner bases to simulate a complete ID algorithm for directed edges running inside CID*. Note the first row corresponds to QID and the row marked (CID) corresponds to CID as presented in the main text, applying PUSHFORWARD in topological ordering. These two rows are the ones presented in Table 1 of the main text. A pseudocode for computing these numbers is given in Algorithm 4.

Algorithm 3 Meta constrained ID algorithm.
1: function CID*(G, Σ, B, D, PFORDER, IDMETHOD)
2:   repeat
3:     G ← PFORDER(G, Σ, B, D)
4:     for ⟨G′, Σ′⟩ ∈ G do
5:       D ← D ∪ IDMETHOD(G′, Σ′, D)
6:     end for
7:   until all directed edges have been identified or no edge has been identified in the last iteration
8:   return D
9: end function
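The CID* loop can be sketched in Python as follows (a sketch only: the pforder and idmethod callables, and the dictionary encoding of the graph, are hypothetical stand-ins for PFORDER, IDMETHOD, and the actual graph data structures):

```python
def cid_star(G, Sigma, B, D, pforder, idmethod):
    """Sketch of CID*: alternate PUSHFORWARD orderings and edge ID."""
    D = set(D)
    all_directed = set(G["directed"])
    while True:
        before = set(D)
        # (i) collect valid modified graphs; (ii) run the ID method on each
        for G_mod, Sigma_mod in pforder(G, Sigma, B, D):
            D |= idmethod(G_mod, Sigma_mod, D)
        # stop when everything is identified or no progress was made
        if D >= all_directed or D == before:
            return D

# Toy run with dummy callables: each pass identifies one more edge.
def pforder(G, Sigma, B, D):
    return [(G, Sigma)]

def idmethod(G, Sigma, D):
    for e in G["directed"]:
        if e not in D:
            return {e}
    return set()

G = {"directed": ["1->2", "2->3"]}
assert cid_star(G, None, set(), set(), pforder, idmethod) == {"1->2", "2->3"}
```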

We can now create different functions for different ordering methods. For instance, the function PFT, described in Algorithm 7, applies PUSHFORWARD in topological ordering (as embedded in Algorithm 1 of the main text) and returns all valid modified graphs. We now define three additional ordering methods.

• PFO, described in Algorithm 5. This function pushes forward each bidirected edge only once, considering the original graph. This method is the simplest application of PUSHFORWARD, and serves as a base of comparison to assess the gains of more elaborate methods.

• PFS, described in Algorithm 6. This function tries to apply PUSHFORWARD once to all subsets of bidirected edges connected to each end node. This procedure has exponential computational complexity.

• PFR, described in Algorithm 8. This function recursively tries every possible combination of applying PUSHFORWARD for each bidirected edge connected to the same end node (it tries each subset once, and of those that can be pushed forward again, tries each subset, and so on). This procedure has doubly exponential computational complexity.

All these functions return a collection G of valid modified graphs, and can be used as the PFORDER argument in the CID* function. Of these methods, PFR is arguably the most important for comparison with our current implementation of topological ordering. The results are shown in the first half of Table 2, which compares CID* using the same


ID method for directed edges (QID) but different ordering methods for applying PUSHFORWARD. Our preferred version, which was presented in the main text as CID, corresponds to the row with ordering method PFT and ID method QID. As we can see, topological ordering performs as well as the brute-force recursive search over all subsets performed by PFR, which has doubly exponential computational complexity.

Disentangling PF and QID

Finally, the incompleteness of CID can stem from two sources: limitations of the graphical manipulations performed by PUSHFORWARD, or the incompleteness of the identification algorithm for directed edges, QID. Separating the two can help guide efforts for future research. To achieve that, we used algebraic methods to simulate how CID would have performed if it had access to a complete identification algorithm for directed edges instead of QID.

More precisely, we use Gröbner bases as our ID algorithm for directed edges (IDMETHOD) in CID*, where, just like QID, the Gröbner bases only have access to constraints on bidirected edges via the graphical manipulation performed by PUSHFORWARD. That is, the Gröbner basis computation treats the problem as if it were a "vanilla" identification problem, without explicitly knowing that the bidirected edge is fixed. The results can be seen in the second half of Table 2. The last row indicates, for instance, that incorporating constraints on bidirected edges using PUSHFORWARD in topological order, in combination with a complete identification algorithm for directed edges, would have identified over 99.99% of 4-node sensitivity queries.

This suggests that: (i) the main bottleneck of the current implementation of CID is QID itself; and (ii) PUSHFORWARD with topological ordering is an efficient procedure for dealing with bidirected edges that can reap the benefits of improved identification algorithms.

E. The missed cases

As discussed in the previous section, PUSHFORWARD in topological order, in combination with a complete identification algorithm for directed edges, would have identified over 99.99% of all 4-node sensitivity queries. In this section we briefly discuss some of the missed cases, which may provide guidance for further improvements of the CID algorithm. We also provide all the missed cases for those interested in exploring them further (Tables 3 and 4).

When iterating over modified graphs, the CID algorithm feeds its next iteration only identification results for direct effects (single coefficients), not of path-specific effects or total effects (sums of products of coefficients), which may nevertheless be identified. Figure 8 shows two simple examples that illustrate how not exploiting the knowledge of known total effects can result in a failure of identification.

[Figure 8: two four-node models over x, y, z, w. In (a), the directed edges are λxy, λxz, λxw, λwz and the bidirected edges are εxy, εxz, εwz; (b) adds the directed edge λwy.]

Figure 8: Examples of missed cases using PUSHFORWARD with a complete identification algorithm of directed edges. In both examples, λxy is εzx-identifiable, but the algorithm fails due to lack of exploitation of identified total effects. In example 8a, it turns out a simple marginalization of w suffices for the εzx-identification of λxy using the current implementation of the CID algorithm. However, marginalization alone is often not enough, as shown in example 8b.

Let us start with Figure 8a. In this example, our task is to find a sensitivity curve for λxy in terms of εzx. First note that z is not a valid instrument for λxy, since it is a descendant of x. However, pushing forward εzx allows us to identify the total effect of x on z. This, in turn, permits the creation of the auxiliary variable z∗ = z − (λxz + λxw λwz)x, which is now a valid instrument for λxy. In the example of Figure 8a, it turns out a simpler solution would also suffice: marginalizing w. Note the marginalized DAG results in a simple three-node model which can be solved by the current implementation of CID. Nevertheless, marginalization by itself may not always be sufficient, as a simple variation of this very example shows (Figure 8b).
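The auxiliary-variable construction for Figure 8a can be checked numerically at the population level (a sketch with hypothetical parameter values, using the implied covariance matrix Σ = (I − Λ)⁻ᵀ Ω (I − Λ)⁻¹ of a linear SCM):

```python
import numpy as np

# Hypothetical coefficients for the graph of Figure 8a; node order: x, w, z, y.
lxy, lxz, lxw, lwz = 0.8, 0.5, 0.6, 0.4
exy, exz, ewz = 0.3, 0.2, 0.25

ix, iw, iz, iy = range(4)
L = np.zeros((4, 4))                       # L[i, j] = coefficient of i -> j
L[ix, iy], L[ix, iz], L[ix, iw], L[iw, iz] = lxy, lxz, lxw, lwz

O = np.eye(4)                              # error covariance matrix (Omega)
for i, j, v in [(ix, iy, exy), (ix, iz, exz), (iw, iz, ewz)]:
    O[i, j] = O[j, i] = v

Iinv = np.linalg.inv(np.eye(4) - L)
S = Iinv.T @ O @ Iinv                      # implied covariance matrix

# Total effect of x on z, assumed identified after pushing forward eps_zx.
t = lxz + lxw * lwz
# The auxiliary variable z* = z - t*x is then a valid instrument for lxy.
lxy_hat = (S[iz, iy] - t * S[ix, iy]) / (S[iz, ix] - t * S[ix, ix])
assert abs(lxy_hat - lxy) < 1e-10
```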

In sum, θ-identification in these cases requires systematically exploiting known total effects (for instance, creating AVs subtracting out total effects) or known path-specific effects,

[Figure 9: a four-node model over c, x, m, y with directed edges λxy, λxm, λmy, λcx, λcm and bidirected edges εxm, εcm, εcy.]

Figure 9: An interesting missed case example. Here λxy is εcm-identifiable. All examples can be found in Tables 3 and 4.


a task which still does not have a satisfactory solution in the literature. A final interesting (and challenging) example in which CID failed to find the sensitivity curve is shown in Fig. 9.

F. Utility of descendants

Here we show that not pruning descendants of variable y can be useful for identification. An example is given in Fig. 10.

[Figure 10: seven panels over y, x, w, z (and the auxiliary variable z∗): (a) Original, with λxy, εwy, εwz; (b) PF w ↔ z; (c) Know y → z; (d) Look at y; (e) PF w ↔ y; (f) z∗ = z − λyz y; (g) PF w ↔ y.]

Figure 10: Take the graph in 10a. The red bidirected edges (εwy, εwz) are assumed to be known, and the target quantity is λxy (blue). First, we can use knowledge of w ↔ z to use w as an instrument for y → z (10b, 10c). This knowledge, however, does not help in solving for x → y (10d), even when pushing forward w ↔ y (10e). The issue is that we pruned z when pushing forward w ↔ y, since it is a descendant of y. However, note we can create an AV z∗, which behaves as if it were not a descendant of y (10f). It turns out that z∗ is an instrument for x → y conditioned on w in 10g, solving the problem!

Algorithm 4 Pseudocode for checking performance of CID* with different PUSHFORWARD orders and different ID algorithms for directed edges. In the code, IDENTIFYDIRECTEDEDGES and ISIDENTIFIED are computed using computer algebra (Gröbner bases), and give the ground-truth values.

1: initialize Total ← 0
2: initialize PFtotal ← 0
3: S ← set of all possible connected DAGs, with all combinations of directed and bidirected edges
4: for each graph ⟨G, Σ⟩ ∈ S do
5:   IDedges ← IDENTIFYDIRECTEDEDGES(G)
6:   for all (x → y) ∈ G where (x → y) ∉ IDedges do
7:     SPS ← all subsets of directed and bidirected edges of G which do not contain (x → y)
8:     for each set ⟨D, B⟩ ∈ SPS do
9:       if ISIDENTIFIED(G, x → y, D, B) then
10:        Total ← Total + 1
11:        if (x → y) ∈ CID*(G, Σ, D, B, PFORDER, IDMETHOD) then
12:          PFtotal ← PFtotal + 1
13:        end if
14:      end if
15:    end for
16:  end for
17: end for
18: return PFtotal / Total

Algorithm 5 Push forward each bidirected edge once.

1: function PFO(G, Σ, B, D)
2:   let By represent the subset of B where all edges have y as an end point (By = {(x ↔ y) ∈ B, ∀x})
3:   G ← {(G, Σ)}
4:   for each node y ∈ G do
5:     for bidirected edge εxy ∈ By do
6:       if x ∉ DE(y) or δyx ∈ D then
7:         add PF(G, Σ, D, εxy, x) to G
8:       end if
9:     end for
10:  end for
11:  return G
12: end function
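Algorithm 5 can be sketched in Python as follows (a sketch only: the pf and descendants callables, and the tuple encodings of edges, are hypothetical stand-ins for PUSHFORWARD, DE(·), and the actual graph data structures):

```python
def pfo(G, Sigma, B, D, pf, descendants):
    """Push forward each bidirected edge once, on the original graph."""
    out = [(G, Sigma)]
    for y in G["nodes"]:
        By = [(x, yy) for (x, yy) in B if yy == y]  # bidirected edges at y
        for (x, _) in By:
            # eps_xy may be pushed toward x unless x is a descendant of y
            # whose total effect delta_yx is not among the known edges D
            if x not in descendants(G, y) or ("delta", y, x) in D:
                out.append(pf(G, Sigma, D, (x, y)))
    return out

# Toy run with dummy callables.
def pf(G, Sigma, D, edge):
    return (("pushed", edge), Sigma)

def descendants(G, y):
    return set()

G = {"nodes": ["1", "2"]}
result = pfo(G, None, [("1", "2")], set(), pf, descendants)
assert result == [(G, None), (("pushed", ("1", "2")), None)]
```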


Algorithm 6 Push forward all subsets once.

1: function PFS(G, Σ, B, D)
2:   let By represent the subset of B where all edges have y as an end point (By = {(x ↔ y) ∈ B, ∀x})
3:   G ← {(G, Σ)}
4:   for each node y do
5:     for each B′y ⊆ By do
6:       ⟨G′, Σ′⟩ ← ⟨G, Σ⟩
7:       for each εxy ∈ B′y do
8:         if x ∉ DE(y) or δyx ∈ D then
9:           ⟨G′, Σ′⟩ ← PF(G′, Σ′, D, εxy, x)
10:          for all z ∈ CH(x) do
11:            if λxz ∉ D then
12:              remove εzy from B′y if it was not yet processed
13:            end if
14:          end for
15:        end if
16:      end for
17:      add ⟨G′, Σ′⟩ to G
18:    end for
19:  end for
20:  return G
21: end function

Algorithm 7 Push forward in topological order.

1: function PFT(G, Σ, B, D)
2:   let By represent the subset of B where all edges have y as an end point (By = {(x ↔ y) ∈ B, ∀x})
3:   G ← {(G, Σ)}
4:   for each node y do
5:     ⟨G′, Σ′⟩ ← ⟨G, Σ⟩
6:     for each εxy ∈ By in topological order on x do
7:       if x ∉ DE(y) or δyx ∈ D then
8:         ⟨G′, Σ′⟩ ← PF(G′, Σ′, D, εxy, x)
9:         add ⟨G′, Σ′⟩ to G
10:        for all z ∈ CH(x) do
11:          if λxz ∉ D then
12:            remove εzy from By if it was not yet processed
13:          else
14:            add εzy to By
15:          end if
16:        end for
17:      end if
18:    end for
19:  end for
20:  return G ∪ PFO(G, Σ, B, D)
21: end function

Algorithm 8 Push forward all subsets recursively.

1: function PFR(G, Σ, B, D)
2:   let By represent the subset of B where all edges have y as an end point (By = {(x ↔ y) ∈ B, ∀x})
3:   initialize G ← {(G, Σ, ∅)}
4:   for each node y do
5:     PushSets ← {⟨G, Σ, B′y⟩ for all B′y ⊆ By}
6:     while PushSets not empty do
7:       pop ⟨G′, Σ′, B′y⟩ from PushSets
8:       PushAgain ← {}
9:       for each εxy ∈ B′y do
10:        if x ∉ DE(y) or δyx ∈ D then
11:          ⟨G′, Σ′⟩ ← PF(G′, Σ′, D, εxy, x)
12:          for all z ∈ CH(x) do
13:            if λxz ∉ D then
14:              remove εzy from B′y if it was not yet processed
15:            else
16:              add εzy to PushAgain
17:            end if
18:          end for
19:        end if
20:      end for
21:      add ⟨G′, Σ′⟩ to G
22:      for all B′′y ⊆ PushAgain do
23:        add ⟨G′, Σ′, B′′y⟩ to PushSets
24:      end for
25:    end while
26:  end for
27:  return G
28: end function


Graph | Target Quantity | Sensitivity Parameters
1: 1→2 1→3 1→4 2→4 1↔3 1↔4 2↔4 | 1→3 | 1↔4
2: 1→2 1→3 1→4 2→4 1↔3 1↔4 2↔4 | 1→3 | 1↔4 1→2
3: 1→2 1→3 1→4 2→4 1↔3 1↔4 2↔4 3↔4 | 1→3 | 1↔4 3↔4
4: 1→2 1→3 1→4 2→4 1↔3 1↔4 2↔4 3↔4 | 1→3 | 1↔4 3↔4 1→2
5: 1→2 1→3 1→4 2→4 3→4 1↔3 1↔4 2↔4 | 1→3 | 1↔4 3→4
6: 1→2 1→3 1→4 2→4 3→4 1↔3 1↔4 2↔4 | 1→3 | 1↔4 1→2 3→4
7: 1→2 1→3 1→4 2→4 3→4 1↔3 1↔4 2↔4 3↔4 | 1→3 | 1↔4 3↔4 3→4
8: 1→2 1→3 1→4 2→4 3→4 1↔3 1↔4 2↔4 3↔4 | 1→3 | 1↔4 3↔4 1→2 3→4
9: 1→2 1→3 2→3 1→4 2→4 1↔3 1↔4 2↔4 | 1→3 | 1↔4
10: 1→2 1→3 2→3 1→4 2→4 1↔3 1↔4 2↔4 | 1→3 | 1↔4 2→3
11: 1→2 1→3 2→3 1→4 2→4 1↔3 1↔4 2↔4 | 1→3 | 1↔4 1→2
12: 1→2 1→3 2→3 1→4 2→4 1↔3 1↔4 2↔4 | 1→3 | 1↔4 1→2 2→3
13: 1→2 1→3 2→3 1→4 2→4 1↔3 1↔4 2↔4 3↔4 | 1→3 | 1↔4 3↔4
14: 1→2 1→3 2→3 1→4 2→4 1↔3 1↔4 2↔4 3↔4 | 1→3 | 1↔4 3↔4 2→3
15: 1→2 1→3 2→3 1→4 2→4 1↔3 1↔4 2↔4 3↔4 | 1→3 | 1↔4 3↔4 1→2
16: 1→2 1→3 2→3 1→4 2→4 1↔3 1↔4 2↔4 3↔4 | 1→3 | 1↔4 3↔4 1→2 2→3
17: 1→2 1→3 2→3 1→4 2→4 3→4 1↔3 1↔4 2↔4 | 1→3 | 1↔4 3→4
18: 1→2 1→3 2→3 1→4 2→4 3→4 1↔3 1↔4 2↔4 | 1→3 | 1↔4 3→4 1→2
19: 1→2 1→3 2→3 1→4 2→4 3→4 1↔3 1↔4 2↔4 | 1→3 | 1↔4 2→3 3→4
20: 1→2 1→3 2→3 1→4 2→4 3→4 1↔3 1↔4 2↔4 | 1→3 | 1↔4 2→3 3→4 1→2
21: 1→2 1→3 2→3 1→4 2→4 3→4 1↔3 1↔4 2↔4 3↔4 | 1→3 | 1↔4 3↔4 3→4
22: 1→2 1→3 2→3 1→4 2→4 3→4 1↔3 1↔4 2↔4 3↔4 | 1→3 | 1↔4 3↔4 3→4 1→2
23: 1→2 1→3 2→3 1→4 2→4 3→4 1↔3 1↔4 2↔4 3↔4 | 1→3 | 1↔4 3↔4 2→3 3→4
24: 1→2 1→3 2→3 1→4 2→4 3→4 1↔3 1↔4 2↔4 3↔4 | 1→3 | 1↔4 3↔4 2→3 3→4 1→2
25: 1→2 1→3 2→3 2→4 3→4 1↔3 2↔3 1↔4 | 2→4 | 1↔3
26: 1→2 1→3 2→3 2→4 3→4 1↔3 2↔3 1↔4 | 2→4 | 1↔3 1→2
27: 1→2 1→3 2→3 2→4 3→4 1↔3 2↔3 1↔4 | 3→4 | 1↔3
28: 1→2 1→3 2→3 2→4 3→4 1↔3 2↔3 1↔4 | 3→4 | 1↔3 1→2
29: 1→2 1→3 2→3 2→4 3→4 1↔3 2↔3 1↔4 3↔4 | 2→4 | 1↔3 3↔4
30: 1→2 1→3 2→3 2→4 3→4 1↔3 2↔3 1↔4 3↔4 | 2→4 | 1↔3 3↔4 1→2
31: 1→2 1→3 2→3 2→4 3→4 1↔3 2↔3 1↔4 3↔4 | 3→4 | 1↔3 3↔4
32: 1→2 1→3 2→3 2→4 3→4 1↔3 2↔3 1↔4 3↔4 | 3→4 | 1↔3 3↔4 1→2
33: 1→2 1→3 2→3 1→4 1↔3 2↔3 1↔4 | 1→4 | 1↔3
34: 1→2 1→3 2→3 1→4 1↔3 2↔3 1↔4 | 1→4 | 1↔3 1→2
35: 1→2 1→3 2→3 1→4 1↔3 2↔3 1↔4 3↔4 | 1→4 | 1↔3 3↔4
36: 1→2 1→3 2→3 1→4 1↔3 2↔3 1↔4 3↔4 | 1→4 | 1↔3 3↔4 1→2
37: 1→2 1→3 2→3 1→4 3→4 1↔3 2↔3 1↔4 | 1→4 | 1↔3
38: 1→2 1→3 2→3 1→4 3→4 1↔3 2↔3 1↔4 | 1→4 | 1↔3 3→4
39: 1→2 1→3 2→3 1→4 3→4 1↔3 2↔3 1↔4 | 1→4 | 1↔3 1→2
40: 1→2 1→3 2→3 1→4 3→4 1↔3 2↔3 1↔4 | 1→4 | 1↔3 1→2 3→4
41: 1→2 1→3 2→3 1→4 3→4 1↔3 2↔3 1↔4 3↔4 | 1→4 | 1↔3 3↔4
42: 1→2 1→3 2→3 1→4 3→4 1↔3 2↔3 1↔4 3↔4 | 1→4 | 1↔3 3↔4 3→4
43: 1→2 1→3 2→3 1→4 3→4 1↔3 2↔3 1↔4 3↔4 | 1→4 | 1↔3 3↔4 1→2
44: 1→2 1→3 2→3 1→4 3→4 1↔3 2↔3 1↔4 3↔4 | 1→4 | 1↔3 3↔4 1→2 3→4
45: 1→2 1→3 2→3 1→4 2→4 1↔3 2↔3 1↔4 | 1→4 | 1↔3
46: 1→2 1→3 2→3 1→4 2→4 1↔3 2↔3 1↔4 | 1→4 | 1↔3 2→4
47: 1→2 1→3 2→3 1→4 2→4 1↔3 2↔3 1↔4 | 1→4 | 1↔3 1→2
48: 1→2 1→3 2→3 1→4 2→4 1↔3 2↔3 1↔4 | 1→4 | 1↔3 1→2 2→4
49: 1→2 1→3 2→3 1→4 2→4 1↔3 2↔3 1↔4 3↔4 | 1→4 | 1↔3 3↔4
50: 1→2 1→3 2→3 1→4 2→4 1↔3 2↔3 1↔4 3↔4 | 1→4 | 1↔3 3↔4 2→4

Table 3: Missed sensitivity queries of PUSHFORWARD in topological order, in combination with a complete identification algorithm for directed edges. Part 1.


Graph | Target Quantity | Sensitivity Parameters
51: 1→2 1→3 2→3 1→4 2→4 1↔3 2↔3 1↔4 3↔4 | 1→4 | 1↔3 3↔4 1→2
52: 1→2 1→3 2→3 1→4 2→4 1↔3 2↔3 1↔4 3↔4 | 1→4 | 1↔3 3↔4 1→2 2→4
53: 1→2 1→3 2→3 1→4 2→4 3→4 1↔3 2↔3 1↔4 | 1→4 | 1↔3 3→4
54: 1→2 1→3 2→3 1→4 2→4 3→4 1↔3 2↔3 1↔4 | 1→4 | 1↔3 3→4 1→2
55: 1→2 1→3 2→3 1→4 2→4 3→4 1↔3 2↔3 1↔4 | 1→4 | 1↔3 2→4
56: 1→2 1→3 2→3 1→4 2→4 3→4 1↔3 2↔3 1↔4 | 1→4 | 1↔3 1→2 2→4
57: 1→2 1→3 2→3 1→4 2→4 3→4 1↔3 2↔3 1↔4 | 1→4 | 1↔3 3→4 2→4
58: 1→2 1→3 2→3 1→4 2→4 3→4 1↔3 2↔3 1↔4 | 1→4 | 1↔3 3→4 1→2 2→4
59: 1→2 1→3 2→3 1→4 2→4 3→4 1↔3 2↔3 1↔4 | 1→4 | 1↔3 1↔4
60: 1→2 1→3 2→3 1→4 2→4 3→4 1↔3 2↔3 1↔4 | 1→4 | 1↔3 1↔4 1→2
61: 1→2 1→3 2→3 1→4 2→4 3→4 1↔3 2↔3 1↔4 | 2→4 | 1↔3 1→4
62: 1→2 1→3 2→3 1→4 2→4 3→4 1↔3 2↔3 1↔4 | 2→4 | 1↔3 1→2 1→4
63: 1→2 1→3 2→3 1→4 2→4 3→4 1↔3 2↔3 1↔4 | 2→4 | 1↔3 1↔4
64: 1→2 1→3 2→3 1→4 2→4 3→4 1↔3 2↔3 1↔4 | 2→4 | 1↔3 1↔4 1→2
65: 1→2 1→3 2→3 1→4 2→4 3→4 1↔3 2↔3 1↔4 | 3→4 | 1↔3 1→4
66: 1→2 1→3 2→3 1→4 2→4 3→4 1↔3 2↔3 1↔4 | 3→4 | 1↔3 1→2 1→4
67: 1→2 1→3 2→3 1→4 2→4 3→4 1↔3 2↔3 1↔4 | 3→4 | 1↔3 1↔4
68: 1→2 1→3 2→3 1→4 2→4 3→4 1↔3 2↔3 1↔4 | 3→4 | 1↔3 1↔4 1→2
69: 1→2 1→3 2→3 1→4 2→4 3→4 1↔3 2↔3 1↔4 3↔4 | 1→4 | 1↔3 3↔4 3→4
70: 1→2 1→3 2→3 1→4 2→4 3→4 1↔3 2↔3 1↔4 3↔4 | 1→4 | 1↔3 3↔4 3→4 1→2
71: 1→2 1→3 2→3 1→4 2→4 3→4 1↔3 2↔3 1↔4 3↔4 | 1→4 | 1↔3 3↔4 2→4
72: 1→2 1→3 2→3 1→4 2→4 3→4 1↔3 2↔3 1↔4 3↔4 | 1→4 | 1↔3 3↔4 1→2 2→4
73: 1→2 1→3 2→3 1→4 2→4 3→4 1↔3 2↔3 1↔4 3↔4 | 1→4 | 1↔3 3↔4 3→4 2→4
74: 1→2 1→3 2→3 1→4 2→4 3→4 1↔3 2↔3 1↔4 3↔4 | 1→4 | 1↔3 3↔4 3→4 1→2 2→4
75: 1→2 1→3 2→3 1→4 2→4 3→4 1↔3 2↔3 1↔4 3↔4 | 1→4 | 1↔3 1↔4 3↔4
76: 1→2 1→3 2→3 1→4 2→4 3→4 1↔3 2↔3 1↔4 3↔4 | 1→4 | 1↔3 1↔4 3↔4 1→2
77: 1→2 1→3 2→3 1→4 2→4 3→4 1↔3 2↔3 1↔4 3↔4 | 2→4 | 1↔3 3↔4 1→4
78: 1→2 1→3 2→3 1→4 2→4 3→4 1↔3 2↔3 1↔4 3↔4 | 2→4 | 1↔3 3↔4 1→2 1→4
79: 1→2 1→3 2→3 1→4 2→4 3→4 1↔3 2↔3 1↔4 3↔4 | 2→4 | 1↔3 1↔4 3↔4
80: 1→2 1→3 2→3 1→4 2→4 3→4 1↔3 2↔3 1↔4 3↔4 | 2→4 | 1↔3 1↔4 3↔4 1→2
81: 1→2 1→3 2→3 1→4 2→4 3→4 1↔3 2↔3 1↔4 3↔4 | 3→4 | 1↔3 3↔4 1→4
82: 1→2 1→3 2→3 1→4 2→4 3→4 1↔3 2↔3 1↔4 3↔4 | 3→4 | 1↔3 3↔4 1→2 1→4
83: 1→2 1→3 2→3 1→4 2→4 3→4 1↔3 2↔3 1↔4 3↔4 | 3→4 | 1↔3 1↔4 3↔4
84: 1→2 1→3 2→3 1→4 2→4 3→4 1↔3 2↔3 1↔4 3↔4 | 3→4 | 1↔3 1↔4 3↔4 1→2
85: 1→2 2→3 2→4 3→4 1↔2 1↔4 3↔4 | 1→2 | 1↔4
86: 1→2 2→3 2→4 3→4 1↔2 1↔4 3↔4 | 1→2 | 1↔4 2→3
87: 1→2 2→3 2→4 3→4 1↔2 1↔4 2↔4 3↔4 | 1→2 | 1↔4 2↔4
88: 1→2 2→3 2→4 3→4 1↔2 1↔4 2↔4 3↔4 | 1→2 | 1↔4 2↔4 2→3
89: 1→2 2→3 1→4 2→4 3→4 1↔2 1↔4 3↔4 | 1→2 | 1↔4 1→4
90: 1→2 2→3 1→4 2→4 3→4 1↔2 1↔4 3↔4 | 1→2 | 1↔4 2→3 1→4
91: 1→2 2→3 1→4 2→4 3→4 1↔2 1↔4 2↔4 3↔4 | 1→2 | 1↔4 2↔4 1→4
92: 1→2 2→3 1→4 2→4 3→4 1↔2 1↔4 2↔4 3↔4 | 1→2 | 1↔4 2↔4 2→3 1→4
93: 1→2 1→3 1→4 3→4 1↔2 1↔4 3↔4 | 1→2 | 1↔4
94: 1→2 1→3 1→4 3→4 1↔2 1↔4 3↔4 | 1→2 | 1↔4 1→3
95: 1→2 1→3 1→4 3→4 1↔2 1↔4 2↔4 3↔4 | 1→2 | 1↔4 2↔4
96: 1→2 1→3 1→4 3→4 1↔2 1↔4 2↔4 3↔4 | 1→2 | 1↔4 2↔4 1→3
97: 1→2 1→3 1→4 2→4 3→4 1↔2 1↔4 3↔4 | 1→2 | 1↔4 2→4
98: 1→2 1→3 1→4 2→4 3→4 1↔2 1↔4 3↔4 | 1→2 | 1↔4 1→3 2→4
99: 1→2 1→3 1→4 2→4 3→4 1↔2 1↔4 2↔4 3↔4 | 1→2 | 1↔4 2↔4 2→4
100: 1→2 1→3 1→4 2→4 3→4 1↔2 1↔4 2↔4 3↔4 | 1→2 | 1↔4 2↔4 1→3 2→4

Table 4: Missed sensitivity queries of PUSHFORWARD in topological order, in combination with a complete identification algorithm for directed edges. Part 2.