![Page 1: Discovering Cyclic Causal Models by Independent Components Analysis Gustavo Lacerda Peter Spirtes Joseph Ramsey Patrik O. Hoyer](https://reader035.vdocument.in/reader035/viewer/2022062515/56649c765503460f9492a1cf/html5/thumbnails/1.jpg)
Discovering Cyclic Causal Models by Independent Components Analysis
Gustavo Lacerda, Peter Spirtes, Joseph Ramsey, Patrik O. Hoyer
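Structural Equation Models (SEMs)
Graphical models that represent causal relationships.
M: $x_3 = f_3(x_1, x_2)$, $x_4 = f_4(x_3)$
Manipulating x3 to a fixed value k gives M(do(x3 = k)): $x_3 = k$, $x_4 = f_4(x_3)$
[Figure: the graph of M, with x1 → x3, x2 → x3, x3 → x4; and the graph of M(do(x3 = k)), in which the edges into x3 are removed]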
Structural Equation Models (SEMs)
Can be acyclic… or cyclic.
The data produced by cyclic models can be interpreted as equilibrium points of dynamical systems.
[Figure: example graph over x1, x2, x3, x4]
Linear Structural Equation Models (SEMs) (deterministic example)
The structural equations are linear, e.g.:
$x_3 = 1.2\,x_1 + 0.9\,x_2 - 3$
$x_4 = -5\,x_3 + 1$
Each edge weight is the corresponding coefficient.
[Figure: x1 → x3 (weight 1.2), x2 → x3 (weight 0.9), x3 → x4 (weight −5)]
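A minimal Python sketch of the manipulation, assuming (hypothetically) that f3 and f4 are the linear equations of this example: do(x3 = k) replaces x3’s own equation with the constant k and leaves every other equation unchanged. The function `simulate` is illustrative, not part of any library.

```python
from typing import Optional, Tuple


def simulate(x1: float, x2: float,
             do_x3: Optional[float] = None) -> Tuple[float, float, float, float]:
    """Evaluate the deterministic linear SEM, optionally under do(x3 = k)."""
    if do_x3 is None:
        x3 = 1.2 * x1 + 0.9 * x2 - 3  # structural equation for x3
    else:
        x3 = do_x3  # manipulation: x3's equation is replaced by the constant k
    x4 = -5 * x3 + 1  # x4's equation is left untouched
    return x1, x2, x3, x4


print(simulate(1.0, 2.0))              # observational: x3 depends on x1 and x2
print(simulate(1.0, 2.0, do_x3=0.5))   # x4 = -5 * 0.5 + 1 = -1.5
print(simulate(7.0, -4.0, do_x3=0.5))  # same x4: x1 and x2 no longer matter
```

Under the manipulation, x4 no longer depends on x1 or x2, which is what removing the edges into x3 expresses in the graph.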
Linear Structural Equation Models (SEMs) (with randomness)
Now, each variable has an additive noise term with non-zero variance.
$x_1 = e_1$
$x_2 = e_2$
$x_3 = 1.2\,x_1 + 0.9\,x_2 - 3 + e_3$
$x_4 = -5\,x_3 + 1 + e_4$
$x = Bx + e$
[Figure: the same graph, with weights 1.2, 0.9, −5, plus an error term ei pointing into each xi]
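Reading the coefficients off these equations gives the matrix B below, where entry $B_{ij}$ is the weight of the edge from $x_j$ to $x_i$. The constants −3 and +1 are not part of B; one way (an assumption of this sketch) to make them fit $x = Bx + e$ is to absorb them into $e_3$ and $e_4$, which then have non-zero means.

```latex
B = \begin{pmatrix}
0   & 0   & 0  & 0 \\
0   & 0   & 0  & 0 \\
1.2 & 0.9 & 0  & 0 \\
0   & 0   & -5 & 0
\end{pmatrix},
\qquad
x = Bx + \begin{pmatrix} 0 \\ 0 \\ -3 \\ 1 \end{pmatrix} + e
```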
Linear Structural Equation Models (SEMs) (with randomness)
$x = Bx + e$
Solving for x, we get: $x = (I - B)^{-1} e$
Let $A = (I - B)^{-1}$; then $x = Ae$.
A is called the “mixing matrix”.
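A small NumPy sketch that computes the mixing matrix for this example and checks that x = Ae solves x = Bx + e. The layout `B[i, j]` = coefficient of x_j in x_i’s equation is our convention, and the constants are dropped because they shift the means of the x’s without affecting A.

```python
import numpy as np

# B[i, j] is the coefficient of x_j in the equation for x_i.
B = np.array([
    [0.0, 0.0, 0.0, 0.0],   # x1 = e1
    [0.0, 0.0, 0.0, 0.0],   # x2 = e2
    [1.2, 0.9, 0.0, 0.0],   # x3 = 1.2 x1 + 0.9 x2 + e3
    [0.0, 0.0, -5.0, 0.0],  # x4 = -5 x3 + e4
])

# Mixing matrix A = (I - B)^{-1}: column j shows how e_j propagates to every x_i.
A = np.linalg.inv(np.eye(4) - B)
print(np.round(A, 6))

# Sanity check: x = A e satisfies the structural equations x = B x + e.
e = np.random.default_rng(0).normal(size=4)
x = A @ e
print(np.allclose(x, B @ x + e))
```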
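Linear Structural Equation Models (SEMs) (with randomness)
The “mixing matrix” shows how the noise propagates:
[Figure: each error term e1, …, e4 connected to every x it affects]
Computing it for the example:
$$A = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 1.2 & 0.9 & 1 & 0 \\ -6 & -4.5 & -5 & 1 \end{pmatrix}$$
(rows: x1, …, x4; columns: e1, …, e4). For example, e1 reaches x4 only through x3, so its coefficient on x4 is 1.2 × (−5) = −6.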
What can we learn from observational data alone?
Until recently, the best we could do was identify the d-separation equivalence class. We couldn’t tell the difference between:
M1: x1 → x2
M2: x2 → x1
Why not?
Because it was assumed that the error terms are Gaussian…
…and when they are Gaussian, these two graphs are distribution-equivalent.
Independent Components Analysis (ICA)
Cocktail party problem
You want to get back the original signals, but all you have are the mixtures. What can you do?
$x = Ae$
[Figure: source signals e1, e2 mixed into observed signals x1, x2]
Independent Components Analysis (ICA)
Cocktail party problem
$x = Ae$ has infinitely many solutions: for any invertible A, there is a corresponding e!
But if you assume that the signals are independent (and non-Gaussian), it is possible to estimate A and e from just x.
How?
Independent Components Analysis (ICA)
Cocktail party problem
Any choice of A implies a list of samples of e ($e = A^{-1} x$), and each such list has some degree of independence.
We want the A for which the implied e’s are maximally independent.
e’s maximally independent ↔ e’s maximally non-Gaussian
Intuition: Central Limit Theorem (a mixture of independent signals tends to be closer to Gaussian than the signals themselves)
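To make the idea concrete, here is a toy two-source sketch in Python. It is not FastICA or any algorithm from the talk: it measures non-Gaussianity with the absolute excess kurtosis and, after whitening, brute-forces the rotation angle, which only works for two sources. The mixing matrix `A_true` and the uniform sources are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Two independent, non-Gaussian (uniform) sources, e.g. "Alex" and "Bob".
e = rng.uniform(-1.0, 1.0, size=(2, n))
A_true = np.array([[1.0, 0.6],
                   [0.4, 1.0]])  # hypothetical mixing matrix
x = A_true @ e                   # all we observe are the mixtures


def excess_kurtosis(s):
    """Excess kurtosis: 0 for a Gaussian, negative for a uniform."""
    s = (s - s.mean()) / s.std()
    return np.mean(s ** 4) - 3.0


# Whitening: afterwards the remaining unknown is (up to sign/order) a rotation.
xc = x - x.mean(axis=1, keepdims=True)
eigval, eigvec = np.linalg.eigh(np.cov(xc))
V = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
z = V @ xc


def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


# Each angle implies a candidate e; keep the most non-Gaussian one.
angles = np.linspace(0.0, np.pi / 2, 500, endpoint=False)
scores = [sum(abs(excess_kurtosis(row)) for row in rotation(t) @ z) for t in angles]
W = rotation(angles[int(np.argmax(scores))]) @ V  # estimated unmixing matrix
e_hat = W @ xc                                    # recovered sources (order/scale unknown)
```

For two sources with equal kurtosis, the summed |kurtosis| of the outputs is proportional to cos⁴θ + sin⁴θ, where θ is the angle between a candidate rotation and the one that undoes the mixing: it peaks at θ = 0 and is lowest at 45°, the most "Gaussian-looking" mix. The recovered `e_hat` matches `e` only up to order and scale.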
Independent Components Analysis (ICA)
ICA has two indeterminacies:
Permutation: we don’t know which source signal is which, i.e. which is Alex and which is Bob.
Scaling: when used with SEMs, the variance of each error term is confounded with its coefficients on each x.
The LiNGAM approach (Shimizu et al., 2006)
What happens if we generate data from this linear SEM…
…and then run ICA?
[Figure: x1 → x2 (1.5), x1 → x3 (1.1), x2 → x4 (−2), with each xi receiving its own error term ei]
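The LiNGAM approach
We would expect to see:
$$A = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1.5 & 1 & 0 & 0 \\ 1.1 & 0 & 1 & 0 \\ -3 & -2 & 0 & 1 \end{pmatrix}$$
(rows: x1, …, x4; columns: e1, …, e4)
Except that ICA doesn’t know the scaling.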
The LiNGAM approach
So we should expect to see something like:
$$\begin{pmatrix} 2 & 0 & 0 & 0 \\ 3 & 1 & 0 & 0 \\ 2.2 & 0 & 1 & 0 \\ -6 & -2 & 0 & 1 \end{pmatrix}$$
…and we’d need to normalize by dividing all children of e1 (its column) by 2.
The LiNGAM approach
getting us:
$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 1.5 & 1 & 0 & 0 \\ 1.1 & 0 & 1 & 0 \\ -3 & -2 & 0 & 1 \end{pmatrix}$$
Except that ICA doesn’t know the order of the e’s, i.e. which e’s go with which x’s…
The LiNGAM approach
Really, ICA gives us something like the scaled matrix, but without knowing which column belongs to which error term (columns labeled e… e… e… e…).
So first we need to find the right permutation of the e’s, and then do the scaling.
Note that, since the model is a DAG, there is exactly one valid way to permute the error terms.
[Figure: the ICA output with unlabeled columns; the same columns matched to e1, …, e4; and the result after scaling, with off-diagonal entries 1.5, −3, 1.1, −2]
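The LiNGAM approach
After some matrix magic, we get back the original model:
$B = I - A^{-1}$, where A is the mixing matrix after permutation and rescaling.
[Figure: x1 → x2 (1.5), x1 → x3 (1.1), x2 → x4 (−2), with error terms e1, …, e4]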
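The “matrix magic” can be sketched in NumPy as follows (hypothetical function name `lingam_from_mixing`; a sketch, not the reference LiNGAM implementation). It takes ICA’s mixing matrix with shuffled and rescaled columns, inverts it to W = A⁻¹, picks the row order that keeps the diagonal as far from zero as possible (minimizing Σ 1/|W_ii|, as in Shimizu et al., but here by brute force rather than the Hungarian algorithm), divides each row by its diagonal entry, and returns B = I − W.

```python
import itertools

import numpy as np


def lingam_from_mixing(A_ica, eps=1e-12):
    """Recover B from an ICA mixing matrix whose columns are permuted and scaled.

    Brute-force search over row permutations of W = A^{-1}: feasible only for small n.
    """
    W = np.linalg.inv(A_ica)
    n = W.shape[0]
    cols = np.arange(n)
    # Choose the row order whose diagonal is "least zero": minimize sum 1/|W_ii|.
    best = min(itertools.permutations(range(n)),
               key=lambda p: np.sum(1.0 / np.maximum(np.abs(W[list(p), cols]), eps)))
    W_perm = W[list(best), :]
    W_norm = W_perm / np.diag(W_perm)[:, None]  # scaling: make the diagonal all 1s
    return np.eye(n) - W_norm                   # B = I - W


# The example DAG: x2 = 1.5 x1 + e2, x3 = 1.1 x1 + e3, x4 = -2 x2 + e4.
B_true = np.zeros((4, 4))
B_true[1, 0], B_true[2, 0], B_true[3, 1] = 1.5, 1.1, -2.0
A_true = np.linalg.inv(np.eye(4) - B_true)

# Simulate ICA's indeterminacies: shuffle and rescale the columns (the e's).
A_ica = A_true[:, [2, 0, 3, 1]] * np.array([2.0, -0.5, 3.0, 1.5])
print(np.round(lingam_from_mixing(A_ica), 6))
```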
The LiNGAM approach
Discovers the full structure of the DAG… by assuming causal sufficiency (i.e. independence of the error terms), in addition to linearity and non-Gaussian error terms.
“Causal sufficiency”: no latent variable is a cause of more than one observed variable.
In the linear case, causal sufficiency ↔ independence of the error terms.
In particular, now M1 and M2 can be distinguished!
The LiNGAM approach
[Figure panels: Gaussian | Uniform]
Images by Patrik Hoyer et al., used with permission from “Estimation of causal effects using linear non-Gaussian causal models with hidden variables”
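Why non-Gaussianity separates M1 from M2 can be seen with a two-variable toy in Python. This is not LiNGAM’s ICA-based procedure: it regresses in both directions and uses a hypothetical dependence score, |corr(r², x²)|, between the residual and the regressor. In the true direction the residual is the error term, which is independent of the regressor; in the wrong direction the residual is only uncorrelated with it. With Gaussian errors both directions look independent, matching the Gaussian panel.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10000


def residual(y, x):
    """Residual of the least-squares regression of y on x (with intercept)."""
    xc, yc = x - x.mean(), y - y.mean()
    return yc - (xc @ yc / (xc @ xc)) * xc


def dependence(r, x):
    """Hypothetical score |corr(r^2, x^2)|. Regression residuals are uncorrelated
    with the regressor by construction, so a nonlinear statistic is needed."""
    xc = x - x.mean()
    return abs(np.corrcoef(r ** 2, xc ** 2)[0, 1])


def both_directions(noise):
    x1 = noise(n)
    x2 = 0.8 * x1 + noise(n)                     # true model M1: x1 -> x2
    forward = dependence(residual(x2, x1), x1)   # assume M1, regress x2 on x1
    backward = dependence(residual(x1, x2), x2)  # assume M2, regress x1 on x2
    return forward, backward


uniform = both_directions(lambda m: rng.uniform(-1.0, 1.0, m))
gaussian = both_directions(lambda m: rng.normal(0.0, 1.0, m))
print("uniform  (forward, backward):", uniform)
print("gaussian (forward, backward):", gaussian)
```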
The LiNGAM approach
Note that, once the valid permutation was found, there were no left-pointing arrows. This is because the generating model was a DAG, and we wrote down the x’s in an order compatible with it.
But it is possible for ICA to return a matrix that does not satisfy the acyclicity assumption. LiNGAM will pretend the red edge is not there.
[Figure: x1, …, x4 and e1, …, e4, with one (red) edge violating acyclicity]
The LiNGAM approach
LiNGAM cannot discover cyclic models, because it assumes the data was generated by a DAG and therefore searches for a single valid permutation.
If we search for all valid permutations instead…
…then we can discover cyclic models too. That’s exactly what we did!
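A brute-force Python sketch of the search, assuming W = A⁻¹ has already been estimated and its zero entries pruned (hypothetical function name `ling_d`; Appendix 2 lists more efficient search methods): keep every row permutation of W with no zeros on the diagonal, and turn each into a candidate B.

```python
import itertools

import numpy as np


def ling_d(W, tol=1e-8):
    """Return one B = I - W' for every row permutation W' of W with a zeroless diagonal.

    Assumes W has already been pruned (entries that test as zero set to 0).
    Brute force over all n! permutations: a sketch for small n only.
    """
    n = W.shape[0]
    models = []
    for perm in itertools.permutations(range(n)):
        W_perm = W[list(perm), :]
        d = np.diag(W_perm)
        if np.any(np.abs(d) < tol):
            continue  # a zero on the diagonal: not a valid permutation
        models.append(np.eye(n) - W_perm / d[:, None])  # normalize, then B = I - W
    return models


# Acyclic example (x1 -> x2): exactly one valid permutation, as in LiNGAM.
print(len(ling_d(np.array([[1.0, 0.0], [-1.5, 1.0]]))))  # 1
# Cyclic example (x1 <-> x2): two valid permutations, so two models.
for B in ling_d(np.array([[1.0, -0.5], [-0.8, 1.0]])):
    print(np.round(B, 6))
```

In the cyclic example, the second model has coefficients 1/0.8 = 1.25 and 1/0.5 = 2: both models imply the same distribution, which is why the output is a set.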
The LiNG-DG approach
When the data looks acyclic, it works just like LiNGAM and returns a single model.
When the data looks cyclic, more than one permutation is considered valid, so it returns a distribution-equivalent set containing more than one model.
“Distribution-equivalent” means that every model in the set implies the same distribution over the observed variables, so you can’t do better, at least without experimental data or further assumptions.
The LiNG-DG approach
Let’s simulate using this model:
[Figure: the simulated model over x1, …, x5, each xi with its own error term ei; edge coefficients 3, 1.2, −0.3, 2, −1]
Error terms are generated by sampling from a Gaussian and squaring.
15000 data points.
We test which ICA coefficients are zero by using bootstrap sampling followed by a quantile test.
Ready?
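A sketch of a bootstrap zero test in Python. The talk applies the test to ICA coefficients; here a hypothetical stand-in estimator (a regression slope) keeps the example self-contained, and the percentile interval, `n_boot`, and `alpha` are our assumptions, not the settings used in the simulation. The error terms are squared Gaussians, as described above.

```python
import numpy as np


def bootstrap_is_zero(data, estimator, n_boot=1000, alpha=0.05, rng=None):
    """Hypothetical helper: resample the rows of `data` with replacement, re-run
    `estimator` on each resample, and report the coefficient as zero when 0 lies
    between the alpha/2 and 1 - alpha/2 quantiles of the bootstrap estimates."""
    rng = np.random.default_rng() if rng is None else rng
    n = data.shape[0]
    stats = np.array([estimator(data[rng.integers(0, n, size=n)])
                      for _ in range(n_boot)])
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return bool(lo <= 0.0 <= hi)


def slope(d):
    """Stand-in estimator: least-squares slope of column 1 on column 0."""
    return np.cov(d[:, 0], d[:, 1])[0, 1] / np.var(d[:, 0], ddof=1)


rng = np.random.default_rng(0)
m = 2000
x = rng.normal(size=m) ** 2                # squared-Gaussian error terms
y_dep = 2.0 * x + rng.normal(size=m) ** 2  # has a real non-zero coefficient on x
y_ind = rng.normal(size=m) ** 2            # independent of x
print(bootstrap_is_zero(np.column_stack([x, y_dep]), slope, n_boot=200, rng=rng))
print(bootstrap_is_zero(np.column_stack([x, y_ind]), slope, n_boot=200, rng=rng))
```

With an independent y the test should usually, but not always, report zero: at alpha = 0.05 it is expected to be wrong about 5% of the time.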
The LiNG-DG approach
LiNG-DG returns a set with 2 models:
[Figure: model #1 and model #2]
LiNG-DG + the stability assumption
Note that only one of these models is stable.
If our data is a set of equilibria, then the true model must be stable.
Under what conditions are we guaranteed to have a unique stable model?
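The talk does not spell out the stability criterion here; a common choice, assumed in this Python sketch, is that every eigenvalue of B has modulus below 1, so that iterating x ← Bx + e converges to the equilibrium (I − B)⁻¹e. `is_stable` and `iterate` are hypothetical helpers, and the two matrices are a hypothetical 2-cycle example whose cycle products are 0.4 and 1/0.4.

```python
import numpy as np


def is_stable(B):
    """Assumed criterion: every eigenvalue of B has modulus < 1, so the dynamics
    x <- B x + e converge to the equilibrium x = (I - B)^{-1} e."""
    return bool(np.max(np.abs(np.linalg.eigvals(B))) < 1.0)


def iterate(B, e, steps=200):
    """Run the dynamical system from x = 0 with the error terms held fixed."""
    x = np.zeros_like(e)
    for _ in range(steps):
        x = B @ x + e
    return x


B1 = np.array([[0.0, 0.5], [0.8, 0.0]])   # cycle product 0.4
B2 = np.array([[0.0, 1.25], [2.0, 0.0]])  # cycle product 2.5 = 1 / 0.4
e = np.array([1.0, -1.0])
print(is_stable(B1), iterate(B1, e), np.linalg.solve(np.eye(2) - B1, e))
print(is_stable(B2))
```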
LiNG-DG + the stability assumption
Theorem: if the true model is stable and its cycles don’t intersect, then only one model in the set is stable.
For simple cycle models, the two candidate models have inverted cycle products: $c_1 = 1/c_2$.
So if $|c_1| < 1$, then $|c_2| > 1$: at least one of the two models has a cycle product of modulus ≥ 1 and is therefore unstable.
When the cycles don’t intersect, each cycle works independently, and any valid permutation* will invert at least one cycle, creating an unstable model.
*except for the identity permutation
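A Python sketch checking the theorem on a hypothetical model with two disjoint 2-cycles. It repeats a brute-force enumeration of valid permutations and an eigenvalue-based stability test (both assumptions of this sketch, not the authors’ code), then prints each candidate’s two cycle products and whether it is stable.

```python
import itertools

import numpy as np


def ling_d(W, tol=1e-8):
    """All B = I - W' over row permutations W' of W with a zeroless diagonal."""
    n = W.shape[0]
    out = []
    for perm in itertools.permutations(range(n)):
        W_perm = W[list(perm), :]
        d = np.diag(W_perm)
        if np.all(np.abs(d) > tol):
            out.append(np.eye(n) - W_perm / d[:, None])
    return out


def is_stable(B):
    """Assumed criterion: spectral radius of B below 1."""
    return bool(np.max(np.abs(np.linalg.eigvals(B))) < 1.0)


# Two disjoint 2-cycles: x1 <-> x2 (product 0.5 * 0.8 = 0.4), x3 <-> x4 (0.6 * 0.5 = 0.3).
B_true = np.zeros((4, 4))
B_true[0, 1], B_true[1, 0] = 0.5, 0.8
B_true[2, 3], B_true[3, 2] = 0.6, 0.5

models = ling_d(np.eye(4) - B_true)
for B in models:
    print(round(B[0, 1] * B[1, 0], 6), round(B[2, 3] * B[3, 2], 6), is_stable(B))
```

Each valid permutation other than the identity inverts at least one cycle product (0.4 ↔ 2.5, 0.3 ↔ 3.33…), and only the original model is stable.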
What should one use?

| | non-Gaussian | Gaussian | unknown, both, or too little data |
|---|---|---|---|
| DAG | LiNGAM: unique model | Constraint-based methods, e.g. PC, CPC, SGS (or Geiger and Heckerman 1994 for a Bayesian alternative): d-separation equivalence class | Check out Hoyer, Hyvärinen, Glymour, Spirtes, Scheines, Ramsey, Lacerda, Shimizu (submitted) |
| DG | LiNG-DG, 2 cases: acyclic → unique model; cyclic → distribution-equivalence class | Richardson’s CCD: very large class, not even covariance equivalent | ? |
Appendix 1: self-loops
Equilibrium equations usually correspond to the dynamical equations.
EXCEPT that if a self-loop has coefficient 1, we will get the wrong structure, and the predicted results of intervention will be wrong!
Self-loop coefficients are underdetermined.
Our stability results only hold if we assume no self-loops.
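One way to see why self-loop coefficients are underdetermined: at equilibrium, a self-loop on $x_i$ can be folded into the other coefficients of the same equation (for $b_{ii} \ne 1$):

```latex
x_i = b_{ii}\,x_i + \sum_{j \ne i} b_{ij}\,x_j + e_i
\;\Longrightarrow\;
x_i = \sum_{j \ne i} \frac{b_{ij}}{1 - b_{ii}}\,x_j + \frac{e_i}{1 - b_{ii}}
```

The result has the form of an equation without a self-loop, with error term $e_i/(1 - b_{ii})$. Because ICA cannot recover the scale of $e_i$, any $b_{ii} \ne 1$, combined with suitably rescaled $b_{ij}$, fits the same equilibrium data; at $b_{ii} = 1$ the division is undefined.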
Appendix 2: search and pruning
Testing zeros: local vs. non-local methods.
To estimate the variance of the estimated coefficients, we use bootstrap sampling, carefully.
How to find row-permutations of W that have a zeroless diagonal:
Acyclic: Hungarian algorithm
General: k-best linear assignments, or constrained n-Rooks (put rooks on the non-zero entries)
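A backtracking sketch of the constrained n-Rooks search in Python (the function name is hypothetical; this does not implement the Hungarian algorithm or k-best assignments): place one rook per column on a non-zero entry of W, never two in the same row, and report each complete placement as a row order.

```python
import numpy as np


def zeroless_diagonal_perms(W, tol=1e-8):
    """Yield every row order `perm` with W[perm[i], i] != 0 for all i.

    Backtracking "n-rooks" search: one rook per column, each on a non-zero
    entry, and no two rooks in the same row.
    """
    n = W.shape[0]
    nonzero = np.abs(W) > tol
    perm, used = [], [False] * n

    def place(col):
        if col == n:
            yield tuple(perm)
            return
        for row in range(n):
            if not used[row] and nonzero[row, col]:
                used[row] = True
                perm.append(row)
                yield from place(col + 1)  # only valid partial placements are extended
                perm.pop()
                used[row] = False

    yield from place(0)


W_cyclic = np.array([[1.0, -0.5], [-0.8, 1.0]])
W_dag = np.eye(4)
W_dag[1, 0], W_dag[2, 0], W_dag[3, 1] = -1.5, -1.1, 2.0
print(list(zeroless_diagonal_perms(W_cyclic)))  # [(0, 1), (1, 0)]
print(list(zeroless_diagonal_perms(W_dag)))     # [(0, 1, 2, 3)]
```

Pruning happens because a branch is abandoned as soon as a column has no free non-zero row, so sparse W matrices are searched far faster than by enumerating all n! permutations.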