Causal Inference
Jonas Peters, Mateo Rojas-Carulla, Amar Shah
Machine Learning Group, University of Cambridge
23rd April 2015
This talk is based on work by ...
UCLA: Judea Pearl
CMU: Peter Spirtes, Clark Glymour, Richard Scheines
Harvard University: Donald Rubin, Jamie Robins
ETH Zurich: Peter Bühlmann, Nicolai Meinshausen
Max Planck Institute Tübingen: Dominik Janzing, Bernhard Schölkopf
University of Amsterdam: Joris Mooij
Patrik Hoyer
... many others
Charig et al.: “Comparison of treatment of renal calculi by open surgery, (...) ”, British Medical Journal, 1986
Simpson’s Paradox
Events C and E; F and ¬F describe subpopulations.
P(E | C) > P(E | ¬C)
P(E | C, F) < P(E | ¬C, F)
P(E | C, ¬F) < P(E | ¬C, ¬F)
Key distinction between seeing and doing.
E: recovery
C: treatment
F: size of stones
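For concreteness, a small Python sketch using the kidney-stone counts widely quoted from the Charig et al. study cited above; the exact figures are an assumption here, but they reproduce the reversal:

```python
# Kidney-stone counts as commonly quoted from Charig et al. (1986).
# C: percutaneous nephrolithotomy, notC: open surgery,
# E: recovery, F: small stones, notF: large stones.
# counts[(treatment, size)] = (recoveries, patients)
counts = {
    ("C", "F"): (234, 270), ("C", "notF"): (55, 80),
    ("notC", "F"): (81, 87), ("notC", "notF"): (192, 263),
}

def p_recovery(treatment, size=None):
    """P(E | treatment) or P(E | treatment, size) from the counts."""
    sizes = [size] if size else ["F", "notF"]
    rec = sum(counts[(treatment, s)][0] for s in sizes)
    tot = sum(counts[(treatment, s)][1] for s in sizes)
    return rec / tot

# Marginally, C looks better ...
assert p_recovery("C") > p_recovery("notC")                  # 0.83 > 0.78
# ... but within every stratum of stone size it is worse.
assert p_recovery("C", "F") < p_recovery("notC", "F")        # 0.87 < 0.93
assert p_recovery("C", "notF") < p_recovery("notC", "notF")  # 0.69 < 0.73
```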
Underlying ground truth (causal graph): size of stone → treatment, size of stone → recovery, treatment → recovery.
Causal graph: Season → Rain, Season → Sprinkler, Rain → Wet, Sprinkler → Wet, Wet → Slippery.
S1: "Turning the sprinkler on would not affect the rain."
S2: "The state of the sprinkler is independent of the state of the rain."
Pearl: "Causality: Models, Reasoning, and Inference"
We often work with i.i.d. samples from a joint distribution P. Claim:
Many times, we are interested in a different distribution P̃ ≠ P.
So to speak . . . CAUSALITY!
Pearl's do-calculus
Consider a Markovian model (a directed acyclic graph) in which all noise terms are jointly independent.

Theorem (Causal Markov Condition)
Any distribution generated by a Markovian model M can be factorized as
P(v1, . . . , vn) = ∏_i P(vi | pai),
where V1, . . . , Vn are the endogenous variables in M and pai are the values of the parents of Vi in the causal diagram associated with M.
Pearl: "Causality: Models, Reasoning, and Inference"
Corollary
For any Markovian model, the distribution generated by an intervention do(X = x0) on a set X of endogenous variables is given by the truncated factorization
P(v1, . . . , vk | do(x0)) = ∏_{i : Vi ∉ X} P(vi | pai) |_{X = x0},
where P(vi | pai) are the pre-intervention conditional probabilities.
Example: chain X → Y → Z.
Factorization: P(X, Y, Z) = P(Z | Y) P(Y | X) P(X)
Truncate: P(Y, Z | do(x0)) = P(Z | Y) P(Y | x0)
Marginalize: P(Y | do(x0)) = ∑_z P(Y, z | do(x0)) = P(Y | x0)
Coincides with the conditional probability!
Example: X → Y with a common cause Z (Z → X, Z → Y).
Factorization: P(X, Y, Z) = P(Y | X, Z) P(X | Z) P(Z)
Truncate: P(Y, Z | do(x0)) = P(Z) P(Y | x0, Z)
Marginalize: P(Y | do(x0)) = ∑_z P(Y, z | do(x0)) = ∑_z P(z) P(Y | x0, z)
This does not (in general) equal P(Y | x0) = ∑_z P(Y | x0, z) P(z | x0).
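A minimal numeric sketch of the two computations, with made-up probability tables for the confounded graph; it shows the truncated factorization and plain conditioning disagreeing:

```python
# Binary toy version of the confounded graph Z -> X, Z -> Y, X -> Y.
P_Z = {0: 0.7, 1: 0.3}
P_X_given_Z = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}              # P(X=x | Z=z)
P_Y1_given_XZ = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9}  # P(Y=1 | x, z)

def p_joint(x, y, z):
    """Factorization: P(x, y, z) = P(y | x, z) P(x | z) P(z)."""
    py = P_Y1_given_XZ[(x, z)] if y == 1 else 1 - P_Y1_given_XZ[(x, z)]
    return py * P_X_given_Z[z][x] * P_Z[z]

x0 = 1
# Truncated factorization: drop P(x | z), keep P(z), marginalize over z.
p_do = sum(P_Z[z] * P_Y1_given_XZ[(x0, z)] for z in (0, 1))
# Plain conditioning weights z by P(z | x0) instead of P(z).
p_x0 = sum(p_joint(x0, y, z) for y in (0, 1) for z in (0, 1))
p_cond = sum(p_joint(x0, 1, z) for z in (0, 1)) / p_x0
print(f"P(Y=1 | do(X={x0})) = {p_do:.3f}")    # 0.550
print(f"P(Y=1 |    X={x0})  = {p_cond:.3f}")  # 0.700: Z confounds X and Y
```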
P(X1, X2, X3) has been generated by a structural equation model if
X1 = f1(X3, N1)
X2 = f2(X1, X3, N2)
X3 = f3(N3)
• Ni jointly independent
• G0 has no cycles
Graph G0: X3 → X1, X3 → X2, X1 → X2. (In the kidney-stone example: size of stone → treatment, size of stone → recovery, treatment → recovery.)

Given: graph and P. We (Pearl et al.) can then compute the intervention distribution P̃: intervening on X1 replaces its equation, e.g. by X1 = f1(N1), cutting the dependence on X3.

IMPORTANT: modularity, autonomy: Aldrich 1989, Pearl 2009, Schölkopf et al. 2012, . . .
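A sketch of how modularity plays out computationally: an intervention re-uses every structural equation except the one it replaces. The functional forms and noise scales below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

def sample(intervene_x1=False):
    """Sample from the SEM above; do() replaces only X1's equation."""
    N1, N2, N3 = rng.normal(size=(3, n))
    X3 = N3                                  # X3 = f3(N3)
    if intervene_x1:
        X1 = N1                              # X1 = f1(N1): arrow X3 -> X1 cut
    else:
        X1 = 0.8 * X3 + N1                   # X1 = f1(X3, N1)
    X2 = X1 - X3 + N2                        # X2 = f2(X1, X3, N2), untouched
    return X1, X2, X3

X1, X2, X3 = sample()
X1i, X2i, X3i = sample(intervene_x1=True)
# Modularity: the mechanism P(X2 | X1, X3) is the same in both runs,
# while the dependence between X1 and X3 disappears under the intervention.
print("corr(X1, X3) before/after do:",
      np.corrcoef(X1, X3)[0, 1], np.corrcoef(X1i, X3i)[0, 1])
```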
Back to Simpson's paradox. Remember the inequalities:
P(E | C) > P(E | ¬C)
P(E | C, F) < P(E | ¬C, F)
P(E | C, ¬F) < P(E | ¬C, ¬F)
What does the do-calculus tell us?
P(E | do(C), F) < P(E | do(¬C), F)
P(E | do(C), ¬F) < P(E | do(¬C), ¬F)
Also: P(F | do(C)) = P(F | do(¬C)) = P(F). Thus:
P(E | do(C)) = P(E | do(C), F) P(F) + P(E | do(C), ¬F) P(¬F)
P(E | do(¬C)) = P(E | do(¬C), F) P(F) + P(E | do(¬C), ¬F) P(¬F)
This implies P(E | do(C)) < P(E | do(¬C)).
And the paradox is gone!
Pearl: "Causality: Models, Reasoning, and Inference"
Bottou et al.: "Counterfactual Reasoning and Learning Systems: The Example of Computational Advertising", JMLR 2013
Given: graph and P. We (RL, IPW [Horvitz-Thompson], . . . ) can then compute P̃.
X1 = f1(X3, N1)
X2 = f2(X1, N2)
X3 = f3(N3)
X4 = f4(X2, X3, N4)
• Ni jointly independent
• G0 has no cycles
Graph G0: X1 = user data, X2 = main line, X3 = user intention, X4 = click; edges X3 → X1, X1 → X2, X2 → X4, X3 → X4. An intervention replaces the mechanism of X2, e.g. a new placement rule X2 = f̃2(X1, Ñ2).

Ok, but what if there are hiddens?
Ok, but how do we find the graph?
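In the advertising setting, a counterfactual policy can be evaluated from logged data by reweighting, as in Bottou et al. (inverse propensity weighting / Horvitz-Thompson). A toy sketch with a made-up logging policy and click model:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Logged data: context x1, ad placement x2 drawn from the logging policy,
# click probability depending on both (all numbers made up).
x1 = rng.normal(size=n)
p_log = 1.0 / (1.0 + np.exp(-x1))          # logging policy: P(X2=1 | X1)
x2 = rng.random(n) < p_log
click = rng.random(n) < (0.1 + 0.2 * x2 + 0.05 * (x1 > 0))

# Counterfactual policy to evaluate offline: always place the ad (X2=1).
# IPW: keep logged samples whose action matches, reweighted by 1/propensity.
w = np.where(x2, 1.0 / p_log, 0.0)
print("IPW estimate:", np.mean(w * click))  # ground truth here is 0.325
```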
Independence-based Methods
Reichenbach's common cause principle: Assume that X and Y are dependent (X ⊥̸⊥ Y). Then
X causes Y,
Y causes X,
there is a hidden common cause, or
a combination of the above.
CM ⊥̸⊥ NL (childhood myopia and night-time ambient light are dependent)
Quinn et al.: "the strength of the association . . . does suggest that the absence of a daily period of darkness during childhood is a potential precipitating factor in the development of myopia"
Quinn, Shin, Maguire, Stone: "Myopia and ambient lighting at night", Nature 1999
Independence-based Methods
It turned out that NL ⊥⊥ CM | PM (parental myopia), and therefore the graph is
PM → NL, PM → CM.
Zadnik, Jones, Irvin, Kleinstein, Manny, Shin, Mutti: "Vision: Myopia and ambient night-time lighting", Nature 2000
Gwiazda, Ong, Held, Thorn: "Vision: Myopia and ambient night-time lighting", Nature 2000
Independence-based Methods
Example graph: W → X ← Y → Z.
Some definitions:
path: a sequence of edges, regardless of direction, e.g. W, X, Y, Z
collider: a node with more than one arrow pointing into it, e.g. X
unblocked path: a path not traversing a collider, e.g. X, Y, Z
Independence-based Methods
Nodes a and b are d-connected if there is an unblocked path between them, e.g. X and Z are; W and Y are not. W and Y are d-separated.
Nodes a and b are d-connected given a set of nodes C if there exists an unblocked path between a and b containing no node in C.
Conditioning on a collider, or one of its descendants, unblocks the paths that the collider previously blocked.
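A quick simulation of the last point: X and Z are independent, but conditioning on the collider Y (here crudely, by selecting a slice of Y) induces strong dependence. The toy model below is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000
x = rng.normal(size=n)
z = rng.normal(size=n)                 # independent of x
y = x + z + 0.1 * rng.normal(size=n)   # collider: x -> y <- z

print("corr(X, Z)       :", np.corrcoef(x, z)[0, 1])           # ~ 0
sel = np.abs(y) < 0.1                  # condition on a slice of the collider Y
print("corr(X, Z | Y~0) :", np.corrcoef(x[sel], z[sel])[0, 1])  # strongly negative
```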
Independence-based Methods
A and B are d-separated by nodes C ⟹ A ⊥⊥ B | C.
But... we want to learn the graph from conditional independence tests. So what can we say about the converse?
Faithfulness: a distribution is faithful to the graph if, whenever A and B are not d-separated by C, then A ⊥̸⊥ B | C.
Independence-based Methods
From P(X1, . . . , X4), independence tests yield:
X1 ⊥⊥ X2
X2 ⊥⊥ X3
X1 ⊥⊥ X4 | {X3}
X1 ⊥⊥ X2 | {X3}
X2 ⊥⊥ X3 | {X1}

Generating model (graph G):
X1 = f1(N1)
X2 = f2(N2)
X3 = f3(X1, N3)
X4 = f4(X3, N4)
Ni jointly independent

Reading independences off the graph (Markov) is trivial; going back requires faithfulness, and even then several graphs G, G′, G′′ may encode the same independences, so uniqueness is not guaranteed.

Method: IC (Pearl 2009); PC, FCI (Spirtes et al., 2000)
1. Find all (conditional) independences from the data. Be smart.
2. Select the DAG(s) that correspond to these independences.
Independence-based Methods
X → Y → Z
X ← Y ← Z
X ← Y → Z
Conditional independence is a symmetric quantity, so several DAGs may be valid. All of the above have X ⊥⊥ Z | Y.
The set of admissible DAGs based on the conditional independence tests is called a Markov equivalence class.
Independence-based Methods
X → Y → Z
X ← Y ← Z
X ← Y → Z
All of the above have X ⊥⊥ Z | Y.
The only X, Y, Z configuration where X ⊥̸⊥ Z | Y is when Y is a collider:
X → Y ← Z
Here X, Y, Z form a v-structure (a collider with independent causes).
Independence-based Methods
The PC algorithm: Step 1
Form the skeleton of the graph:
Start with a fully connected graph.
Remove edges X-Y where X ⊥⊥ Y.
Remove edges X-Y for which there is a neighbour Z ≠ Y of X such that X ⊥⊥ Y | Z.
Remove edges X-Y for which there are 2 neighbours Z1, Z2 ≠ Y of X such that X ⊥⊥ Y | Z1, Z2.
...
Independence-based Methods
The PC algorithm: Step 2
Find v-structures:
Search over triples X-Y-Z where X and Z are not connected.
Since X and Z are not connected, there is a separating set S s.t. X ⊥⊥ Z | S.
Is Y in the set S? If not, then X-Y-Z must form a v-structure.
Independence-based Methods
The PC algorithm: Step 3
Direct further edges:
We cannot have cycles and have found all v-structures. Use this knowledge to fill in other arrows.
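A compact sketch of Step 1 in Python; `indep(x, y, S)` stands for whichever conditional-independence test one plugs in (partial correlation, a kernel test, ...) and is assumed rather than implemented here:

```python
from itertools import combinations

def pc_skeleton(nodes, indep):
    """Step 1 of PC: prune a complete graph using conditional independence.

    `indep(x, y, S)` must return True iff X is independent of Y given S
    according to some statistical test (assumed, not implemented here).
    """
    adj = {v: set(nodes) - {v} for v in nodes}   # fully connected graph
    level = 0                                    # size of conditioning set
    while any(len(adj[x]) - 1 >= level for x in nodes):
        for x in list(nodes):
            for y in list(adj[x]):
                # condition only on current neighbours of x (excluding y)
                for S in combinations(adj[x] - {y}, level):
                    if indep(x, y, set(S)):
                        adj[x].discard(y)        # remove edge x-y
                        adj[y].discard(x)
                        break
        level += 1                               # |S| = 0, 1, 2, ...
    return adj
```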
Restricted Structural Equation Models
Mooij, JP, Janzing, Zscheischler, Schölkopf: "Distinguishing cause from effect using observational data: methods and benchmarks", submitted 2014
Restricted Structural Equation Models
Assume P(X1, . . . , X4) has been generated by
X1 = f1(X3, N1)
X2 = N2
X3 = f3(X2, N3)
X4 = f4(X2, X3, N4)
• Ni jointly independent
• G0 has no cycles
Graph G0: X2 → X3, X3 → X1, X2 → X4, X3 → X4.

Structural equation model. Can the DAG be recovered from P(X1, . . . , X4)? No.
JP, J. Mooij, D. Janzing and B. Schölkopf: Causal Discovery with Continuous Additive Noise Models, JMLR 2014
P. Bühlmann, JP, J. Ernest: CAM: Causal additive models, high-dimensional order search and penalized regression, Annals of Statistics 2014
Intuition: Y = f(X, N) with discrete N taking d values N1, . . . , Nd.
Graph: X → Y, N → Y (N unobserved).
The noise values could switch between d different mechanisms f1, . . . , fd!
Restricted Structural Equation Models
Assume P(X1, . . . , X4) has been generated by
X1 = f1(X3) + N1
X2 = N2
X3 = f3(X2) + N3
X4 = f4(X2, X3) + N4
• Ni ∼ N(0, σi²) jointly independent
• G0 has no cycles
Graph G0: X2 → X3, X3 → X1, X2 → X4, X3 → X4.

Additive noise model with Gaussian noise. Can the DAG be recovered from P(X1, . . . , X4)? Yes, iff the fi are nonlinear.
JP, J. Mooij, D. Janzing and B. Schölkopf: Causal Discovery with Continuous Additive Noise Models, JMLR 2014
P. Bühlmann, JP, J. Ernest: CAM: Causal additive models, high-dimensional order search and penalized regression, Annals of Statistics 2014
Restricted Structural Equation Models
Consider a distribution generated by
Y = f(X) + NY,   with NY ⊥⊥ X and NY, X Gaussian   (graph: X → Y).
Then, if f is nonlinear, there is no backward model
X = g(Y) + MX,   with MX ⊥⊥ Y and MX Gaussian   (graph: X ← Y).
JP, J. Mooij, D. Janzing and B. Schölkopf: Causal Discovery with Continuous Additive Noise Models, JMLR 2014
Restricted Structural Equation Models
Consider a distribution corresponding to
Y = X³ + NY,   with NY ⊥⊥ X   (graph: X → Y),
where X ∼ N(1, 0.5²) and NY ∼ N(0, 0.4²).
Restricted Structural Equation Models
[Figure: scatter plots of Y against X for data simulated from this model, together with the nonlinear forward fit; the forward residuals look independent of X.]
[Figure: Y plotted against the backward residuals, gam(X ~ s(Y))$residuals; the backward residuals are visibly dependent on Y.]
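A sketch of the resulting bivariate procedure: regress in both directions with a flexible fit and compare how dependent the residuals are on the input. The polynomial smoother, the bandwidth heuristic, and the small biased HSIC estimator below are ad-hoc stand-ins for the tools used in the papers.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
x = rng.normal(1, 0.5, n)
y = x**3 + rng.normal(0, 0.4, n)          # ground truth: X -> Y

def hsic(a, b):
    """Biased HSIC estimate with Gaussian kernels (median heuristic)."""
    def gram(v):
        d2 = (v[:, None] - v[None, :]) ** 2
        return np.exp(-d2 / (2 * np.median(d2) + 1e-12))
    m = len(a)
    H = np.eye(m) - 1.0 / m               # centering matrix
    K, L = gram(a), gram(b)
    return np.trace(K @ H @ L @ H) / m**2

def residual_dependence(inp, out, deg=7):
    """Flexible fit out = f(inp) + resid, then measure dependence of resid on inp."""
    resid = out - np.polyval(np.polyfit(inp, out, deg), inp)
    return hsic(inp, resid)

fwd = residual_dependence(x, y)   # fit Y = f(X) + N, test N vs X
bwd = residual_dependence(y, x)   # fit X = g(Y) + M, test M vs Y
print(f"HSIC forward {fwd:.2e}, backward {bwd:.2e}")
print("inferred direction:", "X -> Y" if fwd < bwd else "Y -> X")
```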
Restricted Structural Equation Models
Let P(X1, . . . , Xp) be generated by a . . .

model                       equations                             conditions       identifiable
structural equation model   Xi = fi(X_PAi, Ni)                    -                no
additive noise model        Xi = fi(X_PAi) + Ni                   nonlinear fct.   yes
causal additive model       Xi = ∑_{k∈PAi} fik(Xk) + Ni           nonlinear fct.   yes
linear Gaussian model       Xi = ∑_{k∈PAi} βik Xk + Ni            linear fct.      no

(results hold for Gaussian noise)
Restricted Structural Equation Models
Find causal order + variable selection: the RESIT algorithm.
But finding the best causal order is intractable for many nodes!
# DAGs with 13 nodes: 18676600744432035186664816926721
# orderings with 13 nodes: 6227020800
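Both counts can be checked in a few lines; the DAG count follows the standard inclusion-exclusion recursion for labeled DAGs (Robinson's recursion, OEIS A003024), which reproduces the numbers above:

```python
from functools import lru_cache
from math import comb, factorial

@lru_cache(maxsize=None)
def num_dags(n):
    """Number of labeled DAGs on n nodes (Robinson's recursion)."""
    if n == 0:
        return 1
    # Inclusion-exclusion over the k source nodes: each of the k sources may
    # point to any of the other n-k nodes, giving the 2^(k(n-k)) factor.
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
               for k in range(1, n + 1))

print(num_dags(13))   # 18676600744432035186664816926721
print(factorial(13))  # 6227020800 orderings
```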
Real Data: cause-effect pairs
[Figure: accuracy (%) against decision rate (%) on the cause-effect-pairs benchmark for IGCI, LiNGaM, Additive Noise, and PNL, with significant and not-significant decisions marked.]
Real Data: Time Series
Instrumental Variables
Back to basics... linear regression:
y = βx + u
Graph: X → Y, U → Y.
Assumption: no dependence between x and u.
Instrumental Variables
Consider regressing GDP (y) on years of schooling (x). The error term embodies all factors other than schooling, e.g. ability.
Suppose a person has a high u from high (unobserved) ability. This directly increases y, and possibly indirectly via x too!
Graph: X → Y, U → X, U → Y.
OLS regression will give a β which includes the dependence of y on x directly AND via u. This does not tell us what happens when we intervene on x.
Instrumental Variables
The problem is the endogeneity of x. We need to generate only exogenous variation in x.
Ideally, we would carry out experiments to control the variations. Often infeasible: e.g. it is hard to control for ability and many other factors.
This is where instrumental variables come in.
Instrumental Variables
z is an instrumental variable if (i) it is uncorrelated with u and (ii) it is correlated with x.
Graph: Z → X → Y, U → X, U → Y.
Condition (i) excludes z from being a regressor.
Instrumental Variables
Two stage least squares:
(1) Isolate the part of x uncorrelated with u by regressing x on z:
xi = π0 + π1 zi + εi
Compute x̂i = π̂0 + π̂1 zi.
(2) Now regress y on x̂:
yi = β x̂i + τi
β̂_TSLS is a consistent estimator of β.
Can be extended to multiple dimensions and to non-linear models.
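A self-contained sketch with simulated endogenous data; the data-generating coefficients are arbitrary, and β = 2 is the target:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
beta = 2.0

u = rng.normal(size=n)                              # unobserved "ability"
z = rng.normal(size=n)                              # instrument, independent of u
x = 0.5 + 1.0 * z + 0.8 * u + rng.normal(size=n)    # x is endogenous
y = beta * x + 1.5 * u + rng.normal(size=n)

def ols(X, y):
    """Least squares with intercept; returns [intercept, slope]."""
    X1 = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X1, y, rcond=None)[0]

# Naive OLS of y on x is biased: it picks up the path through u.
print("OLS  beta_hat:", ols(x, y)[1])

# Stage 1: regress x on z, keep the fitted values (exogenous part of x).
pi = ols(z, x)
x_hat = pi[0] + pi[1] * z
# Stage 2: regress y on x_hat.
print("2SLS beta_hat:", ols(x_hat, y)[1])   # consistent for beta = 2
```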
Instrumental Variables
Example 1:
Estimate the change in market demand (y) due to exogenous changes to market price (x).
Demand clearly depends on price. But price is not exogenously available. Suitable instrument?
z needs to directly affect price, but not demand... how about market supply?
Instrumental Variables
Example 2:
Estimate the change in GDP (y) due to an exogenous change in years of schooling (x).
A popular choice for z is proximity to a university. People living far from a university are likely to stay in school for a shorter time (condition (ii) satisfied).
Condition (i) is likely satisfied, but one can argue that people far from a university are more likely to be in lower-wage areas, which would affect y.
Instrumental Variables
Summary:
Instrumental variables are useful for measuring how y changes when you intervene on x.
Valid IVs are difficult to find.
β̂_TSLS can be very inefficient.
Observing data from different environments
Key idea: P(Y | PA_Y) remains invariant if the structural equation for Y does not change.
X1 = f1(X3, N1)
Y = f2(X1, N2)
X3 = f3(N3)
X4 = f4(Y, X3, N4)
• Ni jointly independent
• G0 has no cycles
Graph G0: X3 → X1, X1 → Y, Y → X4, X3 → X4.
Different environments may change the mechanisms of the covariates, e.g. X1 = f1(N1), X4 = f4(Y, X3, Ñ4), or X3 = f3(X1, X4, Ñ3) with X4 = f4(Y, Ñ4); as long as the equation for Y is untouched, P(Y | X1) stays the same.
Observing data from different environments
Observe a target Y and p covariates X in different "environments" e ∈ E:
(X^e, Y^e) ∼ P^e.
Idea:
1. Look for sets of predictors that lead to invariant prediction.
2. The variables that are in all of those sets are called S(E).
Observing data from different environments
Example 1: S∗ = {X1, X2}, E = {1}
Variables: X1 = chocolate, X2 = mountains, X3 = sleep, Y = mood; the causal parents of Y are X1 and X2.
e = 1: observational.
S(E) = ∅
Observing data from different environments
Example 2: S∗ = {X1, X2}, E = {1, 2}
e = 1: observational; e = 2: intervention on X1.
S(E) = {X1}
Observing data from different environments
Example 3: S∗ = {X1, X2}, E = {1, 2}
e = 1: observational; e = 2: intervention on X1.
S(E) = {X1, X2}
Observing data from different environments
Theorem
1. No mistakes: S(E) ⊆ S∗.
2. No chances with one environment: #E = 1 ⟹ S(E) = ∅.
3. Seeing more environments helps: S(E1) ⊆ S(E2) if E1 ⊆ E2.
4. Sufficient conditions for S(E) = S∗:
a) many "generic" interventions: on each node except Y, OR
b) a single "generic" intervention: on a "youngest" parent of Y that is directly connected to all other parents of Y.
JP, P. Bühlmann, N. Meinshausen: Causal inference using invariant prediction: identification and confidence intervals, arXiv:1501.01332
Observing data from different environments
Method for finite samples:
1. Test whether S leads to invariant prediction.
2. Ŝ(E) contains the variables that are in all those S.

Theorem
Assume that the probability of falsely rejecting S∗ is less than α. Then
P(Ŝ(E) ⊆ S∗) ≥ 1 − α.

Possible test 1: test whether the regression models are the same for group e and −e (Chow, 1960) (inversion of the covariance matrix of Res_e).
Possible test 2: linear regression on the pooled data; test whether the residuals Res_e of group e have the same mean and variance as Res_{−e}.
(Easy) tricks for computations and high dimensions.
JP, P. Bühlmann, N. Meinshausen: Causal inference using invariant prediction: identification and confidence intervals, arXiv:1501.01332
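A sketch of the finite-sample procedure using "possible test 2": pooled regression per candidate set S, then a mean and a variance comparison of the residuals across the two environments. scipy's t-test and Levene test stand in for the tests used in the paper, and the simulated system is made up.

```python
import numpy as np
from itertools import chain, combinations
from scipy import stats

rng = np.random.default_rng(5)

def sample_env(n, shift):
    x1 = rng.normal(shift, 1, n)          # intervened between environments
    y = 2.0 * x1 + rng.normal(0, 1, n)    # invariant mechanism for Y
    x2 = y + rng.normal(0, 1, n)          # effect of Y: not invariant
    return np.column_stack([x1, x2]), y

envs = [sample_env(500, 0.0), sample_env(500, 2.0)]
Xp = np.vstack([X for X, _ in envs])
yp = np.concatenate([y for _, y in envs])
idx = np.repeat([0, 1], 500)              # environment labels

def residuals(S):
    """Residuals of a pooled regression of Y on the candidate set S."""
    A = np.column_stack([np.ones(len(yp))] + [Xp[:, j] for j in S])
    coef, *_ = np.linalg.lstsq(A, yp, rcond=None)
    return yp - A @ coef

alpha, accepted = 0.05, []
for S in chain.from_iterable(combinations([0, 1], k) for k in range(3)):
    r = residuals(S)
    p_mean = stats.ttest_ind(r[idx == 0], r[idx == 1]).pvalue
    p_var = stats.levene(r[idx == 0], r[idx == 1]).pvalue
    if min(p_mean, p_var) > alpha / 2:    # crude Bonferroni over the two tests
        accepted.append(set(S))

S_hat = set.intersection(*accepted) if accepted else set()
print("accepted sets:", accepted, "->", S_hat)  # expect S_hat = {0}, i.e. {X1}
```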
Observing data from different environments
Real data: genetic perturbation experiments for yeast (Kemmeren et al., 2014)
p = 6170 genes
n_obs = 160 wild-types
n_int = 1479 gene deletions (targets known)
true hits: ≈ 9% of pairs
our method: E = {obs, int}
Observing data from different environments
[Figure, most significant pair: activity of gene 4710 against activity of gene 5954, shown for the observational training data, the interventional training data (interventions on genes other than 5954 and 4710), and the interventional test data point (intervention on gene 5954).]
[Figure, 2nd most significant pair: activity of gene 3730 against activity of gene 3729, same three panels; test point from an intervention on gene 3729.]
[Figure, 3rd most significant pair: activity of gene 1475 against activity of gene 3672, same three panels; test point from an intervention on gene 3672.]
Observing data from different environments
Number of true positives (out of 8) among the most significant predictions:
proposed method:            6
GIES:                       2
IDA:                        2
marginal corr. (observ.):   1
marginal corr. (pooled):    2
random guessing:            2 (95% quantile), 3 (99% quantile), 4 (99.9% quantile)
Summary:
1. additive noise (single environment)
2. independence-based methods (single environment)
3. invariant prediction; control of the family-wise error rate

Future directions (theory):
extend to estimation of graphs
combine with 1 or 2
nonlinear models
finite sample guarantees

Future work (methodology):
domain adaptation
instrumental variables (graph: I → X → Y, with hidden W → X and W → Y)

Thank you!