
Distinguishing between Cause and Effect: Estimation of Causal Graphs with Two Variables

Jonas Peters, ETH Zurich

Tutorial, NIPS 2013 Workshop on Causality, 9th December 2013

[Figure: chocolate consumption per capita vs. number of Nobel laureates per capita, by country]

F. H. Messerli: Chocolate Consumption, Cognitive Function, and Nobel Laureates, N Engl J Med 2012

Problem: Given P(X, Y), can we infer whether

X → Y or Y → X?

Difficulty: There is a lot of symmetry:

P(X) · P(Y | X) = P(X, Y) = P(X | Y) · P(Y)

We need assumptions! (E.g., Markov and faithfulness do not suffice.)

Surprise (for some assumptions): 2 variables ⇒ p variables, i.e., identifiability in the bivariate case carries over to the multivariate case.

J. Peters, J. Mooij, D. Janzing and B. Schölkopf: Causal Discovery with Continuous Additive Noise Models, arXiv:1309.6779


Idea No. 1: Linear Non-Gaussian Additive Models (LiNGAM)

Structural assumptions such as additive non-Gaussian noise break the symmetry:

Y = βX + N_Y,  N_Y ⊥⊥ X,

with N_Y non-Gaussian.

Asymmetry No. 1

Consider a distribution corresponding to

Y = βX + N_Y

• N_Y ⊥⊥ X
• N_Y non-Gaussian

(graph: X → Y)

Then there is no

X = φY + N_X

• N_X ⊥⊥ Y
• N_X non-Gaussian

(graph: Y → X)

S. Shimizu, P. O. Hoyer, A. Hyvärinen and A. J. Kerminen: A linear non-Gaussian acyclic model for causal discovery, JMLR 2006
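A minimal numerical illustration of this asymmetry (my own sketch, not code from the tutorial): simulate a linear model with uniform noise, fit least-squares lines in both directions, and compare how dependent the residual is on the regressor using a simple HSIC statistic. The helper names and the median-distance bandwidth are my choices.

```python
import numpy as np

def hsic(a, b):
    """Biased HSIC estimate between two 1-d samples (Gaussian kernels,
    median-distance bandwidth); larger values mean stronger dependence."""
    def gram(v):
        d2 = (v[:, None] - v[None, :]) ** 2
        return np.exp(-d2 / np.median(d2[d2 > 0]))
    n = len(a)
    H = np.eye(n) - np.ones((n, n)) / n              # centering matrix
    return np.trace(gram(a) @ H @ gram(b) @ H) / n ** 2

rng = np.random.default_rng(0)
n = 500
X = rng.uniform(-1, 1, n)                            # non-Gaussian cause
Y = 2.0 * X + rng.uniform(-1, 1, n)                  # non-Gaussian noise N_Y

def ols_residual(cause, effect):
    beta = np.cov(cause, effect)[0, 1] / np.var(cause, ddof=1)
    return effect - beta * cause

print("HSIC X -> Y:", hsic(X, ols_residual(X, Y)))   # small: residual ~ independent
print("HSIC Y -> X:", hsic(Y, ols_residual(Y, X)))   # larger: no backward LiNGAM model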


Idea No. 2: Additive Noise Models

Nonlinear functions are also fine:

Y = f(X) + N_Y,  N_Y ⊥⊥ X

P. Hoyer, D. Janzing, J. Mooij, J. Peters and B. Schölkopf: Nonlinear causal discovery with additive noise models, NIPS 2008

Asymmetry No. 2

Consider a distribution corresponding to

Y = f(X) + N_Y

with N_Y ⊥⊥ X.  (graph: X → Y)

Then for "most combinations" (f, P(X), P(N_Y)) there is no

X = g(Y) + M_X

with M_X ⊥⊥ Y.  (graph: Y → X)
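The same regress-both-ways recipe works in the nonlinear case. The sketch below is again my own illustration and reuses the hsic() helper defined after Asymmetry No. 1; the degree-5 polynomial is a stand-in for a proper nonparametric regressor (a GP or spline would be more principled).

```python
# Nonlinear additive-noise test; assumes hsic() from the previous sketch is in scope.
import numpy as np

rng = np.random.default_rng(1)
n = 500
X = rng.uniform(-2, 2, n)
Y = X ** 3 + X + rng.normal(0, 1, n)                  # Y = f(X) + N_Y

def poly_residual(cause, effect, deg=5):
    # polynomial fit as a cheap flexible regressor
    return effect - np.polyval(np.polyfit(cause, effect, deg), cause)

print("HSIC X -> Y:", hsic(X, poly_residual(X, Y)))   # ~ independent residuals
print("HSIC Y -> X:", hsic(Y, poly_residual(Y, X)))   # dependent residuals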


[Figures: simulated data from Y = f(X) + N_Y with N_Y ⊥⊥ X. Regressing Y on X leaves residuals independent of X; the backward fit X = g(Y) + N_X, N_X ⊥⊥ Y, fails — its residuals depend on Y.]

Idea No. 3: Gaussian Process Inference (GPI)

We can always write*

Y = f(X, N_Y),  N_Y ⊥⊥ X

and

X = g(Y, N_X),  N_X ⊥⊥ Y.

Which model is more "complex"? Use Bayesian model comparison.

J. M. Mooij, O. Stegle, D. Janzing, K. Zhang, B. Schölkopf: Probabilistic latent variable models for distinguishing between cause and effect, NIPS 2010

*E.g., J. Peters: Restricted Structural Equation Models for Causal Inference, PhD Thesis

Asymmetry No. 3

1. Fix the noise distribution to be N(0, 1).
2. Put a prior p(θ_X) on the input distribution p(x | θ_X) (complexity of X).
3. Put a prior p(θ_f) on the functions p(f | θ_f) (complexity of f).
4. Approximate the marginal likelihood for X → Y:

p(x, y) = p(x) · p(y | x) = [∫ p(x | θ_X) p(θ_X) dθ_X] · [∫ δ(y − f(x, e)) p(e) p(f) de df]

[Figure: graphical model — θ_X → X, θ_f → f, and (X, f, E) → Y]

5. Approximate the marginal likelihood for Y → X.
6. Compare.

J. M. Mooij, O. Stegle, D. Janzing, K. Zhang, B. Schölkopf: Probabilistic latent variable models for distinguishing between cause and effect, NIPS 2010
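A rough stand-in for this comparison (this is not the marginal likelihood of Mooij et al., just a simplified score I use for illustration): score each direction by a kernel density estimate of the cause plus the GP log marginal likelihood of the effect given the cause, with off-the-shelf SciPy and scikit-learn pieces. The direction_score helper is my own.

```python
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def direction_score(cause, effect):
    # "complexity of the input": KDE log-density of the cause
    marginal = gaussian_kde(cause).logpdf(cause).sum()
    # "complexity of the function": GP log marginal likelihood of effect | cause
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    gp.fit(cause.reshape(-1, 1), effect)
    return marginal + gp.log_marginal_likelihood_value_

rng = np.random.default_rng(2)
X = rng.uniform(-2, 2, 200)
Y = np.tanh(2 * X) + 0.1 * rng.normal(size=200)

print("score X -> Y:", direction_score(X, Y))
print("score Y -> X:", direction_score(Y, X))   # infer the direction with the higher score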


Idea No. 4: Information Geometric Causal Inference (IGCI)

Assume a deterministic relationship

Y = f(X)

and that f and P(X) are "independent".

[Figure: graph of f(x), with the input density p(x) on the x-axis mapped to the output density p(y) on the y-axis]

D. Janzing, J. M. Mooij, K. Zhang, J. Lemeire, J. Zscheischler, P. Daniušis, B. Steudel, B. Schölkopf: Information-geometric approach to inferring causal directions, Artificial Intelligence 2012


Asymmetry No. 4

Consider Y = f(X) with invertible f : [0, 1] → [0, 1], f ≠ id, and X = g(Y) (so g = f⁻¹). If

"cov"(log f′, p_X) = ∫ log(f′(x)) p_X(x) dx − ∫ log f′(x) dx = 0

then

"cov"(log g′, p_Y) = ∫ log(g′(y)) p_Y(y) dy − ∫ log g′(y) dy > 0

D. Janzing, J. M. Mooij, K. Zhang, J. Lemeire, J. Zscheischler, P. Daniušis, B. Steudel, B. Schölkopf: Information-geometric approach to inferring causal directions, Artificial Intelligence 2012
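A quick Monte Carlo check of this statement (my own example, not from the tutorial): with X uniform on [0, 1] the forward "covariance" vanishes by construction for any f, and for f(x) = x³ the backward quantity comes out to roughly 4/3 > 0.

```python
# Monte Carlo check: X ~ U[0,1] makes "cov"(log f', p_X) = 0 for any f.
# With f(x) = x**3: g = f^{-1}, g'(y) = (1/3) * y**(-2/3), and the
# backward "covariance" is positive (analytically 4/3).
import numpy as np

rng = np.random.default_rng(3)
m = 1_000_000
x = rng.uniform(1e-9, 1, m)                      # bounded away from 0 for finite logs
y = x ** 3                                       # samples from p_Y
u = rng.uniform(1e-9, 1, m)                      # uniform reference measure on [0,1]

def log_gprime(t):
    return np.log(1 / 3) - (2 / 3) * np.log(t)

cov_backward = log_gprime(y).mean() - log_gprime(u).mean()
print("'cov'(log g', p_Y) ~", cov_backward)      # ~ 1.333 > 0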

Open Questions 1: Quantifying Identifiability

Proposition

Assume P(X, Y) is generated by

Y = f(X) + N_Y

with independent X and N_Y. Then

inf_{Q : Y → X} KL(P ‖ Q) = ?

Answering this would give us:

• first steps to understand the geometry
• finite-sample guarantees

Open Questions 2: Robustness

What happens if assumptions are violated? E.g., in the case of confounding:

X → Y, with a hidden common cause Z (Z → X, Z → Y)

Can we still infer X → Y? How useful is this?

Conclusions

In theory, we can break the symmetry between cause and effect:

• restricted structural equation models:
  - linear functions, additive non-Gaussian noise
  - nonlinear functions, additive noise
• complexity measures on functions and distributions
• "independence" between function and input distribution
• ... principles behind new methods from the challenge?

Causal inference problem of climate change is solved! Fight the cause! Don't fly (Zurich-SFO: 5.4 t CO2)! Compensate!


IGCI

It turns out that if X → Y,

∫ log |f′(x)| p(x) dx < ∫ log |g′(y)| p(y) dy

Estimator (with the pairs (x_j, y_j) sorted so that x_1 ≤ ... ≤ x_m):

C_{X→Y} := (1/(m−1)) Σ_{j=1}^{m−1} log |(y_{j+1} − y_j) / (x_{j+1} − x_j)| ≈ ∫ log |f′(x)| p(x) dx

Infer X → Y if C_{X→Y} < C_{Y→X}.
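The estimator translates almost line by line into code. A minimal sketch of my own, assuming both variables have already been scaled to [0, 1] (the uniform-reference-measure variant of IGCI):

```python
import numpy as np

def igci_score(x, y):
    """C_{X->Y}: mean log absolute slope, with the sample sorted by x."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    dx, dy = np.diff(xs), np.diff(ys)
    keep = (dx != 0) & (dy != 0)                 # skip ties to avoid log(0)
    return np.mean(np.log(np.abs(dy[keep] / dx[keep])))

rng = np.random.default_rng(4)
x = rng.uniform(0, 1, 1000)
y = x ** 3                                       # deterministic, invertible f on [0, 1]

c_xy, c_yx = igci_score(x, y), igci_score(y, x)
print("infer X -> Y" if c_xy < c_yx else "infer Y -> X", c_xy, c_yx)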

Y = βX + N_Y,  N_Y ⊥⊥ X,  N_Y non-Gaussian

[Figure: scatter plot of simulated (X, Y) on [0, 1] with the forward regression fit]

X = φY + N_X,  N_X ⊥⊥ Y,  N_X non-Gaussian

[Figure: scatter plot of the same data with the backward regression fit]

Does X cause Y or vice versa? Real Data

No (not enough) data for chocolate... but we have data for coffee!

Does X cause Y or vice versa?

[Figure: scatter plot of coffee consumption per capita (kg) against the number of Nobel laureates per 10 million inhabitants]

Correlation: 0.698, p-value < 2.2 · 10⁻¹⁶.

Nobel Prize → Coffee: dependent residuals (p-value of 0).
Coffee → Nobel Prize: dependent residuals (p-value of 0).

⇒ Model class too small? Causally insufficient?


The linear Gaussian case

Y = βX + N_Y

with independent

X ∼ N(0, σ²_X) and N_Y ∼ N(0, σ²_{N_Y}).

Then there is a backward linear SEM

X = αY + M_X

with M_X ⊥⊥ Y. How can we find α and M_X?

[Figure: scatter plot of (X, Y) with the forward line Y = βX and the L2-optimal backward fit]
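A numerical check of this non-identifiability (my own illustration): choosing α = β σ²_X / (β² σ²_X + σ²_{N_Y}) = β σ²_X / σ²_Y makes M_X = X − αY uncorrelated with Y, and since (M_X, Y) is jointly Gaussian, uncorrelated means independent — so both directions admit valid linear SEMs here.

```python
# The linear Gaussian reversal: the backward residual M_X = X - alpha*Y
# is uncorrelated with Y, hence independent of it in the Gaussian case,
# so the causal direction is NOT identifiable from P(X, Y) alone.
import numpy as np

rng = np.random.default_rng(5)
n, beta = 100_000, 1.5
X = rng.normal(0, 1.0, n)                        # sigma_X = 1
Y = beta * X + rng.normal(0, 0.5, n)             # sigma_{N_Y} = 0.5

alpha = beta * np.var(X) / np.var(Y)             # population value: beta*s2_X / (beta^2*s2_X + s2_NY)
M_X = X - alpha * Y

print("alpha:", alpha)
print("corr(M_X, Y):", np.corrcoef(M_X, Y)[0, 1])   # ~ 0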
