a stability upgrade for data-driven models

70
A stability upgrade for data-driven models Y X 1 X 4 X 3 X 2 Jonas Peters University of Copenhagen AI Seminar, 14th June 2019

Upload: others

Post on 14-Apr-2022

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A stability upgrade for data-driven models

A stability upgrade for data-driven models

Y

X1 X4

X3X2

Jonas PetersUniversity of Copenhagen

AI Seminar, 14th June 2019

Page 2: A stability upgrade for data-driven models

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 3: A stability upgrade for data-driven models

Stefan Bauer Peter Buhlmann Rune Christiansen Martin Jakobsen

Nicolai Meinshausen Niklas PfisterDominik

Rothenhausler Mads Thøisen

asd

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 4: A stability upgrade for data-driven models

Given yeast data...

this is the predictor with the strongest correlation:

data from: Kemmeren et al. 2014

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 5: A stability upgrade for data-driven models

Given yeast data... this is the predictor with the strongest correlation:

data from: Kemmeren et al. 2014

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 6: A stability upgrade for data-driven models

After an intervention on that predictor: (it’s a non-causal predictor!)

data from: Kemmeren et al. 2014

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 7: A stability upgrade for data-driven models

After an intervention on that predictor: (it’s a non-causal predictor!)

data from: Kemmeren et al. 2014

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 8: A stability upgrade for data-driven models

The Problem of Causal Discovery:

observed data

Y X2 X3 X4 X5

3.4 −0.3 5.8 −2.1 2.21.7 −0.2 7.0 −1.2 0.4−2.4 −0.1 4.3 −0.7 3.5

2.3 −0.3 5.5 −1.1 −4.43.5 −0.2 3.9 −0.9 −3.9...

......

......

?−→ causal model

Y

target

X2 X3

X4 X5

Here: Find the direct causes of Y !

(Different from: variable adjustment, propens. score matching, double robust methods, IVs, ...)

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 9: A stability upgrade for data-driven models

The Problem of Causal Discovery:

observed data

Y X2 X3 X4 X5

3.4 −0.3 5.8 −2.1 2.21.7 −0.2 7.0 −1.2 0.4−2.4 −0.1 4.3 −0.7 3.5

2.3 −0.3 5.5 −1.1 −4.43.5 −0.2 3.9 −0.9 −3.9...

......

......

?−→ causal model

Y

target

X2 X3

X4 X5

Here: Find the direct causes of Y !

(Different from: variable adjustment, propens. score matching, double robust methods, IVs, ...)

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 10: A stability upgrade for data-driven models

The Problem of Causal Discovery:

observed data

Y X2 X3 X4 X5

3.4 −0.3 5.8 −2.1 2.21.7 −0.2 7.0 −1.2 0.4−2.4 −0.1 4.3 −0.7 3.5

2.3 −0.3 5.5 −1.1 −4.43.5 −0.2 3.9 −0.9 −3.9...

......

......

?−→ causal model

Y

target

X2 X3

X4 X5

Here: Find the direct causes of Y !

(Different from: variable adjustment, propens. score matching, double robust methods, IVs, ...)

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 11: A stability upgrade for data-driven models

The Problem of Causal Discovery:

observed data

Y X2 X3 X4 X5

3.4 −0.3 5.8 −2.1 2.21.7 −0.2 7.0 −1.2 0.4−2.4 −0.1 4.3 −0.7 3.5

2.3 −0.3 5.5 −1.1 −4.43.5 −0.2 3.9 −0.9 −3.9...

......

......

?−→ causal model

Y

target

X2 X3

X4 X5

Here: Find the direct causes of Y !

(Different from: variable adjustment, propens. score matching, double robust methods, IVs, ...)

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 12: A stability upgrade for data-driven models

The Problem of Causal Discovery:

observed data

Y X2 X3 X4 X5

3.4 −0.3 5.8 −2.1 2.21.7 −0.2 7.0 −1.2 0.4−2.4 −0.1 4.3 −0.7 3.5

2.3 −0.3 5.5 −1.1 −4.43.5 −0.2 3.9 −0.9 −3.9...

......

......

?−→ causal model

Y

target

X2 X3

X4 X5

Here: Find the direct causes of Y !

(Different from: variable adjustment, propens. score matching, double robust methods, IVs, ...)

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 13: A stability upgrade for data-driven models

Fundamental assumption: X1,X4 → Y is invariant under interventions.

Y

X2 X3

X4 X5

Y

X2 X3

X4 X5

cf. modularity, autonomy, Haavelmo 1944, Aldrich 1989, Pearl 2009, . . .

Key idea: Use and data and search for invariant models.

set ∅ {2} {3} {4} · · · {2, 3} {2, 4} · · · {2, 3, 4}invariance 7 7 7 7 · · · 3 7 · · · 3

S :=⋂

S invariant

S = {1, 4}

JP, Buhlmann, Meinshausen, JRSS-B 2016 (with discussion): P(S ⊆ S∗) ≥ 1− α.

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 14: A stability upgrade for data-driven models

Fundamental assumption: X1,X4 → Y is invariant under interventions.

Y

X2 X3

X4 X5

Y

X2 X3

X4 X5

cf. modularity, autonomy, Haavelmo 1944, Aldrich 1989, Pearl 2009, . . .

Key idea: Use and data and search for invariant models.

set ∅ {2} {3} {4} · · · {2, 3} {2, 4} · · · {2, 3, 4}invariance 7 7 7 7 · · · 3 7 · · · 3

S :=⋂

S invariant

S = {1, 4}

JP, Buhlmann, Meinshausen, JRSS-B 2016 (with discussion): P(S ⊆ S∗) ≥ 1− α.

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 15: A stability upgrade for data-driven models

Fundamental assumption: X1,X4 → Y is invariant under interventions.

Y

X2 X3

X4 X5

Y

X2 X3

X4 X5

cf. modularity, autonomy, Haavelmo 1944, Aldrich 1989, Pearl 2009, . . .

Key idea: Use and data and search for invariant models.

set ∅ {2} {3} {4} · · · {2, 3} {2, 4} · · · {2, 3, 4}invariance 7 7 7 7 · · · 3 7 · · · 3

S :=⋂

S invariant

S = {1, 4}

JP, Buhlmann, Meinshausen, JRSS-B 2016 (with discussion): P(S ⊆ S∗) ≥ 1− α.

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 16: A stability upgrade for data-driven models

X4

Y

data from environment A = 1

data from environment A = −1

non−invariant set

−2.0 −1.5 −1.0 −0.5 0.0 0.5 1.0

−0.

50.

00.

5

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 17: A stability upgrade for data-driven models

X4

Y

data from environment A = 1

data from environment A = −1

non−invariant set

−2.0 −1.5 −1.0 −0.5 0.0 0.5 1.0

−0.

50.

00.

5

A stands for anchor.

EA(Y − X 4b) 6= 0

Thus: X 4 is a non-invariant model.

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 18: A stability upgrade for data-driven models

X4

Y

data from environment A = 1

data from environment A = −1

non−invariant set

−2.0 −1.5 −1.0 −0.5 0.0 0.5 1.0

−0.

50.

00.

5

A stands for anchor.

EA(Y − X 4b) 6= 0

Thus: X 4 is a non-invariant model.

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 19: A stability upgrade for data-driven models

Predictors that are inferred to be causal by ICP...

YMR104C

YM

R10

3C

−0.5 0.0 0.5 1.0 1.5

0.0

0.5

1.0

1.5

xxxxxxx

xx

xx x x

xxxxxxxxxxxxxxx

xxxxxx

x xxxxxxxxxx xxxxxxxxxxxx

x xxxx

xx xxxxxxxxx

xx xxxxxx

xxx xxxxxxx xxxxx

x xxxxxx x

xxx

x

xxxxxxx

xx xxx

xxxxxx xxxx

xxxx x

xxx xx

xxxxxxxxx xx

xxxx

xxxx

x

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 20: A stability upgrade for data-driven models

...and the corresponding interventions on the predictor

YMR104C

YM

R10

3C

−1.5 −1.0 −0.5 0.0 0.5 1.0 1.5

−1.

0−

0.5

0.0

0.5

1.0

1.5

xxxxxxxxx

xxxx

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxx xxxx

xx xxxxxxxxxxx xxx

xxxxxxxxxxxxxxxxx

xx xxxxxxxxxx

x

xxxxxxxx

xxxxxx

xxxx xxxxxx

xxxx

xxxxxxxxxxxxxxx

xxxx

xxx xx

YPL273W

YM

R32

1C

−4 −3 −2 −1 0

−4

−3

−2

−1

0 xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxx

xxxx

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxx xxxxxxxxxxxxxxx

YCL040W

YC

L042

W

−3 −2 −1 0 1 2 3

−2

−1

01

2

x

xxx

x

xx

xx

x

x

x

xx

xxxx

xx

x

xxx

xx

xx xx

xx xxx

xx

x

xxxxx

xxx

xxxx

xxx

xxx

x

x

xxx

xxx

xxxx

xx

x

xxx

x

xxx xx

x

x

xxxxx

xx

xxx

x

xxx

x

xxxx

xx

xxxx

xxxx

xxxxx

xxxxx xxx

xx

x

xx

xxx

x

xxx

xxxx

xxx

xx

xx

x

x

x

x

xx

x

xx

xx

x x

YLL019C

YLL

020C

−4 −3 −2 −1 0 1

−2.

0−

1.0

0.0

0.5

1.0

xxxxxxxxxx

xx xxxxx

xxxx

xxxx xxxx x

xxxxxxxx

xxx

xxxxxxxxx

x

xxxxxxxxxx

xxx xxxxx

xxxxx

xxxxxxx

xx

xxxxxxxxx

xx

xxx

xxxxxx xxx

x

xx

xxxx x

x

xx

xxxxxxxxxxx

x

xx

xxxx

x

xx

xx xxx

xxxx

xx x

xxxx

x xx

xxx

Kemmeren et al. 2014

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 21: A stability upgrade for data-driven models

...and the corresponding interventions on the predictor

YMR104C

YM

R10

3C

−1.5 −1.0 −0.5 0.0 0.5 1.0 1.5

−1.

0−

0.5

0.0

0.5

1.0

1.5

xxxxxxxxx

xxxx

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxx xxxx

xx xxxxxxxxxxx xxx

xxxxxxxxxxxxxxxxx

xx xxxxxxxxxx

x

xxxxxxxx

xxxxxx

xxxx xxxxxx

xxxx

xxxxxxxxxxxxxxx

xxxx

xxx xx

YPL273W

YM

R32

1C

−4 −3 −2 −1 0

−4

−3

−2

−1

0 xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxx

xxxx

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxx xxxxxxxxxxxxxxx

YCL040W

YC

L042

W

−3 −2 −1 0 1 2 3

−2

−1

01

2

x

xxx

x

xx

xx

x

x

x

xx

xxxx

xx

x

xxx

xx

xx xx

xx xxx

xx

x

xxxxx

xxx

xxxx

xxx

xxx

x

x

xxx

xxx

xxxx

xx

x

xxx

x

xxx xx

x

x

xxxxx

xx

xxx

x

xxx

x

xxxx

xx

xxxx

xxxx

xxxxx

xxxxx xxx

xx

x

xx

xxx

x

xxx

xxxx

xxx

xx

xx

x

x

x

x

xx

x

xx

xx

x x

YLL019C

YLL

020C

−4 −3 −2 −1 0 1

−2.

0−

1.0

0.0

0.5

1.0

xxxxxxxxxx

xx xxxxx

xxxx

xxxx xxxx x

xxxxxxxx

xxx

xxxxxxxxx

x

xxxxxxxxxx

xxx xxxxx

xxxxx

xxxxxxx

xx

xxxxxxxxx

xx

xxx

xxxxxx xxx

x

xx

xxxx x

x

xx

xxxxxxxxxxx

x

xx

xxxx

x

xx

xx xxx

xxxx

xx x

xxxx

x xx

xxx

Kemmeren et al. 2014

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 22: A stability upgrade for data-driven models

Discrete environments (gene data):JP, Meinshausen, Buhlmann: Causal inference using invariant prediction: identification and confidence intervals, JRSSB 2016

No environments (finance data): = timePfister, Buhlmann, JP: Invariant causal pred. for seq. data, JASA 2018

Nonlinear relations (fertility data):Heinze-Deml, JP, Meinshausen: Invariant Causal Prediction for Nonlinear Models, Journal of Causal Inference 2018

Discrete hidden variables (Earth system data):Christiansen, JP: Invariance-based Causal Discovery in the Presence of Discrete Hidden Variables, arXiv:1808.05541

20

30

40

50

60

−125 −100 −75 −50

longitude

latit

ude

ENF CRO

IGBP land cover classification

20

30

40

50

60

−125 −100 −75 −50

longitude

latit

ude

H = 1 H = 2

Classification based on reconstruction of H

−1

0

1

2

3

0 5 10 15

X2

Y

H = 1 H = 2

Fluorescence yield

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 23: A stability upgrade for data-driven models

Discrete environments (gene data):JP, Meinshausen, Buhlmann: Causal inference using invariant prediction: identification and confidence intervals, JRSSB 2016

No environments (finance data): = timePfister, Buhlmann, JP: Invariant causal pred. for seq. data, JASA 2018

Nonlinear relations (fertility data):Heinze-Deml, JP, Meinshausen: Invariant Causal Prediction for Nonlinear Models, Journal of Causal Inference 2018

Discrete hidden variables (Earth system data):Christiansen, JP: Invariance-based Causal Discovery in the Presence of Discrete Hidden Variables, arXiv:1808.05541

20

30

40

50

60

−125 −100 −75 −50

longitude

latit

ude

ENF CRO

IGBP land cover classification

20

30

40

50

60

−125 −100 −75 −50

longitude

latit

ude

H = 1 H = 2

Classification based on reconstruction of H

−1

0

1

2

3

0 5 10 15

X2

Y

H = 1 H = 2

Fluorescence yield

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 24: A stability upgrade for data-driven models

Discrete environments (gene data):JP, Meinshausen, Buhlmann: Causal inference using invariant prediction: identification and confidence intervals, JRSSB 2016

No environments (finance data): = timePfister, Buhlmann, JP: Invariant causal pred. for seq. data, JASA 2018

Nonlinear relations (fertility data):Heinze-Deml, JP, Meinshausen: Invariant Causal Prediction for Nonlinear Models, Journal of Causal Inference 2018

Discrete hidden variables (Earth system data):Christiansen, JP: Invariance-based Causal Discovery in the Presence of Discrete Hidden Variables, arXiv:1808.05541

20

30

40

50

60

−125 −100 −75 −50

longitude

latit

ude

ENF CRO

IGBP land cover classification

20

30

40

50

60

−125 −100 −75 −50

longitude

latit

ude

H = 1 H = 2

Classification based on reconstruction of H

−1

0

1

2

3

0 5 10 15

X2

Y

H = 1 H = 2

Fluorescence yield

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 25: A stability upgrade for data-driven models

Discrete environments (gene data):JP, Meinshausen, Buhlmann: Causal inference using invariant prediction: identification and confidence intervals, JRSSB 2016

No environments (finance data): = timePfister, Buhlmann, JP: Invariant causal pred. for seq. data, JASA 2018

Nonlinear relations (fertility data):Heinze-Deml, JP, Meinshausen: Invariant Causal Prediction for Nonlinear Models, Journal of Causal Inference 2018

Discrete hidden variables (Earth system data):Christiansen, JP: Invariance-based Causal Discovery in the Presence of Discrete Hidden Variables, arXiv:1808.05541

20

30

40

50

60

−125 −100 −75 −50

longitude

latit

ude

ENF CRO

IGBP land cover classification

20

30

40

50

60

−125 −100 −75 −50

longitude

latit

ude

H = 1 H = 2

Classification based on reconstruction of H

−1

0

1

2

3

0 5 10 15

X2

Y

H = 1 H = 2

Fluorescence yield

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 26: A stability upgrade for data-driven models

Discrete environments (gene data):JP, Meinshausen, Buhlmann: Causal inference using invariant prediction: identification and confidence intervals, JRSSB 2016

No environments (finance data): = timePfister, Buhlmann, JP: Invariant causal pred. for seq. data, JASA 2018

Nonlinear relations (fertility data):Heinze-Deml, JP, Meinshausen: Invariant Causal Prediction for Nonlinear Models, Journal of Causal Inference 2018

Discrete hidden variables (Earth system data):Christiansen, JP: Invariance-based Causal Discovery in the Presence of Discrete Hidden Variables, arXiv:1808.05541

20

30

40

50

60

−125 −100 −75 −50

longitude

latit

ude

ENF CRO

IGBP land cover classification

20

30

40

50

60

−125 −100 −75 −50

longitudela

titud

e

H = 1 H = 2

Classification based on reconstruction of H

−1

0

1

2

3

0 5 10 15

X2

Y

H = 1 H = 2

Fluorescence yield

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 27: A stability upgrade for data-driven models

Discrete environments (gene data):JP, Meinshausen, Buhlmann: Causal inference using invariant prediction: identification and confidence intervals, JRSSB 2016

No environments (finance data): = timePfister, Buhlmann, JP: Invariant causal pred. for seq. data, JASA 2018

Nonlinear relations (fertility data):Heinze-Deml, JP, Meinshausen: Invariant Causal Prediction for Nonlinear Models, Journal of Causal Inference 2018

Discrete hidden variables (Earth system data):Christiansen, JP: Invariance-based Causal Discovery in the Presence of Discrete Hidden Variables, arXiv:1808.05541

Survival data (registry data):Laksafoss, JP: Causal Methods for Survival Analysis, work in progress

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 28: A stability upgrade for data-driven models

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 29: A stability upgrade for data-driven models

invariance prediction

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 30: A stability upgrade for data-driven models

invariance prediction

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 31: A stability upgrade for data-driven models

invariance prediction

Find a trade-off between

• invariance with respect to

AND • predictive power

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 32: A stability upgrade for data-driven models

invariance prediction

Find a trade-off between

• invariance with respect to

AND • predictive power

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 33: A stability upgrade for data-driven models

Idea 1: Anchor regressionY ∈ R1: targetX ∈ R1×d : predictorsA ∈ R1×q: anchors, EAtA = Id

bγ := argminb

E(Y − Xb)2︸ ︷︷ ︸prediction

+ γ ‖EAt(Y − Xb)‖22︸ ︷︷ ︸

invariance

γ → 0: OLSγ →∞: IV solution (if identifiable)γ →∞: best invariant predictor (if not identifiable)

Anchor regression minimizes worst case prediction error under shiftinterventions. Rothenhausler, Buhlmann, Meinshausen, JP (arXiv:1801.06229)

The finite sample estimator is known as a k-class estimator for IVsolution. Theil (1958), Nagar (1959) – joint work with Martin Emil Jakobsen

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 34: A stability upgrade for data-driven models

Idea 1: Anchor regressionY ∈ R1: targetX ∈ R1×d : predictorsA ∈ R1×q: anchors, EAtA = Id

bγ := argminb

E(Y − Xb)2︸ ︷︷ ︸prediction

+ γ ‖EAt(Y − Xb)‖22︸ ︷︷ ︸

invariance

γ → 0: OLSγ →∞: IV solution (if identifiable)γ →∞: best invariant predictor (if not identifiable)

Anchor regression minimizes worst case prediction error under shiftinterventions. Rothenhausler, Buhlmann, Meinshausen, JP (arXiv:1801.06229)

The finite sample estimator is known as a k-class estimator for IVsolution. Theil (1958), Nagar (1959) – joint work with Martin Emil Jakobsen

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 35: A stability upgrade for data-driven models

Idea 1: Anchor regressionY ∈ R1: targetX ∈ R1×d : predictorsA ∈ R1×q: anchors, EAtA = Id

bγ := argminb

E(Y − Xb)2︸ ︷︷ ︸prediction

+ γ ‖EAt(Y − Xb)‖22︸ ︷︷ ︸

invariance

γ → 0: OLSγ →∞: IV solution (if identifiable)γ →∞: best invariant predictor (if not identifiable)

Anchor regression minimizes worst case prediction error under shiftinterventions. Rothenhausler, Buhlmann, Meinshausen, JP (arXiv:1801.06229)

The finite sample estimator is known as a k-class estimator for IVsolution. Theil (1958), Nagar (1959) – joint work with Martin Emil Jakobsen

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 36: A stability upgrade for data-driven models

Idea 1: Anchor regressionY ∈ R1: targetX ∈ R1×d : predictorsA ∈ R1×q: anchors, EAtA = Id

bγ := argminb

E(Y − Xb)2︸ ︷︷ ︸prediction

+ γ ‖EAt(Y − Xb)‖22︸ ︷︷ ︸

invariance

γ → 0: OLSγ →∞: IV solution (if identifiable)γ →∞: best invariant predictor (if not identifiable)

Anchor regression minimizes worst case prediction error under shiftinterventions. Rothenhausler, Buhlmann, Meinshausen, JP (arXiv:1801.06229)

The finite sample estimator is known as a k-class estimator for IVsolution. Theil (1958), Nagar (1959) – joint work with Martin Emil Jakobsen

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 37: A stability upgrade for data-driven models

Idea 2:

N. Pfister, S. Bauer, JP:

Learning stable structures in kinetic systems: benefits of a causal approach

arXiv:1810.11776

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 38: A stability upgrade for data-driven models

Example: Maillard reaction Glu, Mel, C5, ForAc, Triose, Cn, AcAc, Amad, lysR, Fru, AMP

d

dt[Glu]t = −θ1[Glu]t + θ2[Fru]t − θ3[lysR]t [Glu]t

d

dt[Mel]t = θ4[AMP]t . . .

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 39: A stability upgrade for data-driven models

Example: Maillard reaction Glu, Mel, C5, ForAc, Triose, Cn, AcAc, Amad, lysR, Fru, AMP

d

dt[Glu]t = −θ1[Glu]t + θ2[Fru]t − θ3[lysR]t [Glu]t

d

dt[Mel]t = θ4[AMP]t . . .

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 40: A stability upgrade for data-driven models

Example: Maillard reaction Glu, Mel, C5, ForAc, Triose, Cn, AcAc, Amad, lysR, Fru, AMP

http://www.srfcdn.ch/radio/modules/dynimages/624/srf-1/2015/01/diverses/264377.150114_raclette_key.jpg

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 41: A stability upgrade for data-driven models

Example: Maillard reaction Glu, Mel, C5, ForAc, Triose, Cn, AcAc, Amad, lysR, Fru, AMP

d

dt[Glu]t = −θ1[Glu]t + θ2[Fru]t − θ3[lysR]t [Glu]t

d

dt[Mel]t = θ4[AMP]t . . .

If the network structure is unknown, can we recover it from data?

WANTED: system of ODEs: Xt = gθ(Xt , t), X0 = x0.

adding noise: Xt := Xt + εt .

GIVEN: discrete time measurements (w/ repetitions): Xt1 , . . . , Xtm .

Brands et al. (2002): “Kinetic model. of reactions in heated monosaccharide-casein systems.”, J. of agr. and food chemistry.

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 42: A stability upgrade for data-driven models

Example: Maillard reaction Glu, Mel, C5, ForAc, Triose, Cn, AcAc, Amad, lysR, Fru, AMP

d

dt[Glu]t = −θ1[Glu]t + θ2[Fru]t − θ3[lysR]t [Glu]t

d

dt[Mel]t = θ4[AMP]t . . .

If the network structure is unknown, can we recover it from data?

WANTED: system of ODEs: Xt = gθ(Xt , t), X0 = x0.

adding noise: Xt := Xt + εt .

GIVEN: discrete time measurements (w/ repetitions): Xt1 , . . . , Xtm .

Brands et al. (2002): “Kinetic model. of reactions in heated monosaccharide-casein systems.”, J. of agr. and food chemistry.

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 43: A stability upgrade for data-driven models

Example: Maillard reaction Glu, Mel, C5, ForAc, Triose, Cn, AcAc, Amad, lysR, Fru, AMP

d

dt[Glu]t = −θ1[Glu]t + θ2[Fru]t − θ3[lysR]t [Glu]t

d

dt[Mel]t = θ4[AMP]t . . .

If the network structure is unknown, can we recover it from data?

WANTED: system of ODEs: Xt = gθ(Xt , t), X0 = x0.

adding noise: Xt := Xt + εt .

GIVEN: discrete time measurements (w/ repetitions): Xt1 , . . . , Xtm .

Brands et al. (2002): “Kinetic model. of reactions in heated monosaccharide-casein systems.”, J. of agr. and food chemistry.

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 44: A stability upgrade for data-driven models

Example: Maillard reaction Glu, Mel, C5, ForAc, Triose, Cn, AcAc, Amad, lysR, Fru, AMP

d

dt[Glu]t = −θ1[Glu]t + θ2[Fru]t − θ3[lysR]t [Glu]t

d

dt[Mel]t = θ4[AMP]t . . .

If the network structure is unknown, can we recover it from data?

WANTED: system of ODEs: Xt = gθ(Xt , t), X0 = x0.

adding noise: Xt := Xt + εt .

GIVEN: discrete time measurements (w/ repetitions): Xt1 , . . . , Xtm .

Brands et al. (2002): “Kinetic model. of reactions in heated monosaccharide-casein systems.”, J. of agr. and food chemistry.

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 45: A stability upgrade for data-driven models

Example: Maillard reaction Glu, Mel, C5, ForAc, Triose, Cn, AcAc, Amad, lysR, Fru, AMP

d

dt[Glu]t = −θ1[Glu]t + θ2[Fru]t − θ3[lysR]t [Glu]t

d

dt[Mel]t = θ4[AMP]t . . .

If the network structure is unknown, can we recover it from data?

WANTED: system of ODEs: Xt = gθ(Xt , t), X0 = x0.

adding noise: Xt := Xt + εt .

GIVEN: discrete time measurements (w/ repetitions): Xt1 , . . . , Xtm .

Brands et al. (2002): “Kinetic model. of reactions in heated monosaccharide-casein systems.”, J. of agr. and food chemistry.

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 46: A stability upgrade for data-driven models

Example: Maillard reaction Glu, Mel, C5, ForAc, Triose, Cn, AcAc, Amad, lysR, Fru, AMP

d

dt[Glu]t = −θ1[Glu]t + θ2[Fru]t − θ3[lysR]t [Glu]t

d

dt[Mel]t = θ4[AMP]t . . .

If the network structure is unknown, can we recover it from data?

WANTED: system of ODEs: Xt = gθ(Xt , t), X0 = x0.

adding noise: Xt := Xt + εt .

GIVEN: discrete time measurements (w/ repetitions): Xt1 , . . . , Xtm .

Brands et al. (2002): “Kinetic model. of reactions in heated monosaccharide-casein systems.”, J. of agr. and food chemistry.

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 47: A stability upgrade for data-driven models

King et al, Nature 2004, Bongard et al, PNAS 2007, Calderhead et al, NIPS 2009, Schmidt et al, Science 2009, Oates,Mukherjee, AoAS 2012, Babtie et al, PNAS 2014, Hill et al, Nature Methods 2016, Chen et al, JASA 2016,

Rudy et al, Science Advances 2017, Brunten et al, Nature Comm 2017, Mikkelsen, Hansen, arXiv 1710.09308, many more...

nonlinear least squares:

argminθ

∑t∈{t1,...,tm}

‖Xt − Xt‖22

Xt = gθ(Xt , t), X0 = x0

causality prediction

Yt = gθ(Xt , t), X0 = x0

causality prediction

causal models are invariantHaavelmo 1944, Aldrich 1989, Pearl 2009, ...

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 48: A stability upgrade for data-driven models

King et al, Nature 2004, Bongard et al, PNAS 2007, Calderhead et al, NIPS 2009, Schmidt et al, Science 2009, Oates,Mukherjee, AoAS 2012, Babtie et al, PNAS 2014, Hill et al, Nature Methods 2016, Chen et al, JASA 2016,

Rudy et al, Science Advances 2017, Brunten et al, Nature Comm 2017, Mikkelsen, Hansen, arXiv 1710.09308, many more...

nonlinear least squares:

argminθ

∑t∈{t1,...,tm}

‖Xt − Xt‖22

Xt = gθ(Xt , t), X0 = x0

causality prediction

Yt = gθ(Xt , t), X0 = x0

causality prediction

causal models are invariantHaavelmo 1944, Aldrich 1989, Pearl 2009, ...

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 49: A stability upgrade for data-driven models

King et al, Nature 2004, Bongard et al, PNAS 2007, Calderhead et al, NIPS 2009, Schmidt et al, Science 2009, Oates,Mukherjee, AoAS 2012, Babtie et al, PNAS 2014, Hill et al, Nature Methods 2016, Chen et al, JASA 2016,

Rudy et al, Science Advances 2017, Brunten et al, Nature Comm 2017, Mikkelsen, Hansen, arXiv 1710.09308, many more...

nonlinear least squares:

argminθ

∑t∈{t1,...,tm}

‖Xt − Xt‖22

Xt = gθ(Xt , t), X0 = x0

causality prediction

Yt = gθ(Xt , t), X0 = x0

causality prediction

causal models are invariantHaavelmo 1944, Aldrich 1989, Pearl 2009, ...

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 50: A stability upgrade for data-driven models

King et al, Nature 2004, Bongard et al, PNAS 2007, Calderhead et al, NIPS 2009, Schmidt et al, Science 2009, Oates,Mukherjee, AoAS 2012, Babtie et al, PNAS 2014, Hill et al, Nature Methods 2016, Chen et al, JASA 2016,

Rudy et al, Science Advances 2017, Brunten et al, Nature Comm 2017, Mikkelsen, Hansen, arXiv 1710.09308, many more...

nonlinear least squares:

argminθ

∑t∈{t1,...,tm}

‖Xt − Xt‖22

Xt = gθ(Xt , t), X0 = x0

causality prediction

Yt = gθ(Xt , t), X0 = x0

causality prediction

causal models are invariantHaavelmo 1944, Aldrich 1989, Pearl 2009, ...

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 51: A stability upgrade for data-driven models

King et al, Nature 2004, Bongard et al, PNAS 2007, Calderhead et al, NIPS 2009, Schmidt et al, Science 2009, Oates,Mukherjee, AoAS 2012, Babtie et al, PNAS 2014, Hill et al, Nature Methods 2016, Chen et al, JASA 2016,

Rudy et al, Science Advances 2017, Brunten et al, Nature Comm 2017, Mikkelsen, Hansen, arXiv 1710.09308, many more...

nonlinear least squares:

argminθ

∑t∈{t1,...,tm}

‖Xt − Xt‖22

Xt = gθ(Xt , t), X0 = x0

causality prediction

Yt = gθ(Xt , t), X0 = x0

causality prediction

causal models are invariantHaavelmo 1944, Aldrich 1989, Pearl 2009, ...

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 52: A stability upgrade for data-driven models

N. Pfister, S. Bauer, JP: Identifying Causal Structure in Large-Scale Kinetic Systems, arXiv:1810.11776

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 53: A stability upgrade for data-driven models

How invariant isYt = θ1X

8t ?

1. For each repetition i , smooth target trajectory. y (i)

2. Obtain fitted values for target derivatives. ξ(i)t1, . . . , ξ

(i)tm

3. Smooth target trajectory, with constraints on derivatives. y (i)

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 54: A stability upgrade for data-driven models

How invariant isYt = θ1X

8t ?

1. For each repetition i , smooth target trajectory. y (i)

2. Obtain fitted values for target derivatives. ξ(i)t1, . . . , ξ

(i)tm

3. Smooth target trajectory, with constraints on derivatives. y (i)

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 55: A stability upgrade for data-driven models

How invariant isYt = θ1X

8t ?

1. For each repetition i , smooth target trajectory. y (i)

2. Obtain fitted values for target derivatives. ξ(i)t1, . . . , ξ

(i)tm

3. Smooth target trajectory, with constraints on derivatives. y (i)

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 56: A stability upgrade for data-driven models

How invariant isYt = θ1X

8t ?

1. For each repetition i , smooth target trajectory. y (i)

2. Obtain fitted values for target derivatives. ξ(i)t1, . . . , ξ

(i)tm

3. Smooth target trajectory, with constraints on derivatives. y (i)

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 57: A stability upgrade for data-driven models

0

3

6

9

0 25 50 75 100

time t

Yt

a

−0.5

0.0

0.5

1.0

1.5

2.0

0 5 10 15Xt

8

dYt/

dt

b

N. Pfister, S. Bauer, JP: Identifying Causal Structure in Large-Scale Kinetic Systems, arXiv:1810.11776

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 58: A stability upgrade for data-driven models

How invariant isYt = θ1X

8t ?

1. For each repetition i , smooth target trajectory. y (i)

2. Obtain fitted values for target derivatives. ξ(i)t1, . . . , ξ

(i)tm

3. Smooth target trajectory, with constraints on derivatives. y (i)

4. Score for model ranking

n∑i=1

[RSS(i) − RSS(i)

]/[RSS(i)

],

where RSS(i) :=∑

`

(y

(i)t` − Y

(i)t`

)2.

5. Turn the score for models into a score for variables/complexes.

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 59: A stability upgrade for data-driven models

How invariant isYt = θ1X

8t ?

1. For each repetition i , smooth target trajectory. y (i)

2. Obtain fitted values for target derivatives. ξ(i)t1, . . . , ξ

(i)tm

3. Smooth target trajectory, with constraints on derivatives. y (i)

4. Score for model ranking

n∑i=1

[RSS(i) − RSS(i)

]/[RSS(i)

],

where RSS(i) :=∑

`

(y

(i)t` − Y

(i)t`

)2.

5. Turn the score for models into a score for variables/complexes.

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 60: A stability upgrade for data-driven models

How invariant isYt = θ1X

8t ?

1. For each repetition i , smooth target trajectory. y (i)

2. Obtain fitted values for target derivatives. ξ(i)t1, . . . , ξ

(i)tm

3. Smooth target trajectory, with constraints on derivatives. y (i)

4. Score for model ranking

n∑i=1

[RSS(i) − RSS(i)

]/[RSS(i)

],

where RSS(i) :=∑

`

(y

(i)t` − Y

(i)t`

)2.

5. Turn the score for models into a score for variables/complexes.

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 61: A stability upgrade for data-driven models

0.00

0.25

0.50

0.75

1.00

0.00 0.25 0.50 0.75 1.00

false positive rate

true

pos

itiv

e ra

te Method

CausalKinetiX

DM

GM

random

a

CausalKinetiX DM GM random

0 1 2 3 4 5+ 0 1 2 3 4 5+ 0 1 2 3 4 5+ 0 1 2 3 4 5+0.0

0.2

0.4

0.6

number of false discoveries before complete model recovery

freq

uenc

y

b

N. Pfister, S. Bauer, JP: Identifying Causal Structure in Large-Scale Kinetic Systems, arXiv:1810.11776

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 62: A stability upgrade for data-driven models

a) In-sample plot

N. Pfister, S. Bauer, JP: Identifying Causal Structure in Large-Scale Kinetic Systems, arXiv:1810.11776

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 63: A stability upgrade for data-driven models

b) Out-of-sample plot

N. Pfister, S. Bauer, JP: Identifying Causal Structure in Large-Scale Kinetic Systems, arXiv:1810.11776

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 64: A stability upgrade for data-driven models

Idea 3: Exploiting Invariance in Reinforcement Learning:joint work with Mads Thøisen

1. Q-learning: 1st epoch Sutton and Barto (2015)

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 65: A stability upgrade for data-driven models

Idea 3: Exploiting Invariance in Reinforcement Learning:joint work with Mads Thøisen

2. Q-learning: 20,000th epoch Sutton and Barto (2015)

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 66: A stability upgrade for data-driven models

Idea 3: Exploiting Invariance in Reinforcement Learning:joint work with Mads Thøisen

3. Learning deadlocks from smaller levels.

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 67: A stability upgrade for data-driven models

Idea 3: Exploiting Invariance in Reinforcement Learning:joint work with Mads Thøisen

4. Q-learning: 20,000th epoch – exploiting deadlocks

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 68: A stability upgrade for data-driven models

Conclusions

Causal models 6= predictive models.

Causal models are invariant causal discovery.

Combining invariance and predictability distributional robustness.(anchor regression, metabolic networks, reinforcement learning)

invariance prediction

Tusind tak!

Book: JP, D. Janzing, B. Scholkopf: Elements of Causal Inference: Foundations and Learning Algorithms, MIT Press 2017

JP, P. Buhlmann, N. Meinshausen: Causal inference using inv. pred.: identif. and conf. interv. (w/ discussion), JRSSB 2016

R. Christiansen and JP: Switching Regression Models and Causal Inference in the Presence of Latent Variables, arXiv:1808.05541

N. Pfister, P. Buhlmann, JP: Invariant causal prediction for sequential data, JASA 2018

D. Rothenhaussler, P. Buhlmann, N. Meinshausen, JP: Anchor regression: heterogeneous data meets causality, arXiv:1801.06229

N. Pfister, S. Bauer, JP: Identifying Causal Structure in Large-Scale Kinetic Systems, arXiv:1810.11776

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 69: A stability upgrade for data-driven models

Conclusions

Causal models 6= predictive models.

Causal models are invariant causal discovery.

Combining invariance and predictability distributional robustness.(anchor regression, metabolic networks, reinforcement learning)

invariance prediction

Tusind tak!

Book: JP, D. Janzing, B. Scholkopf: Elements of Causal Inference: Foundations and Learning Algorithms, MIT Press 2017

JP, P. Buhlmann, N. Meinshausen: Causal inference using inv. pred.: identif. and conf. interv. (w/ discussion), JRSSB 2016

R. Christiansen and JP: Switching Regression Models and Causal Inference in the Presence of Latent Variables, arXiv:1808.05541

N. Pfister, P. Buhlmann, JP: Invariant causal prediction for sequential data, JASA 2018

D. Rothenhaussler, P. Buhlmann, N. Meinshausen, JP: Anchor regression: heterogeneous data meets causality, arXiv:1801.06229

N. Pfister, S. Bauer, JP: Identifying Causal Structure in Large-Scale Kinetic Systems, arXiv:1810.11776

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019

Page 70: A stability upgrade for data-driven models

Conclusions

Causal models 6= predictive models.

Causal models are invariant causal discovery.

Combining invariance and predictability distributional robustness.(anchor regression, metabolic networks, reinforcement learning)

invariance prediction

Tusind tak!

Book: JP, D. Janzing, B. Scholkopf: Elements of Causal Inference: Foundations and Learning Algorithms, MIT Press 2017

JP, P. Buhlmann, N. Meinshausen: Causal inference using inv. pred.: identif. and conf. interv. (w/ discussion), JRSSB 2016

R. Christiansen and JP: Switching Regression Models and Causal Inference in the Presence of Latent Variables, arXiv:1808.05541

N. Pfister, P. Buhlmann, JP: Invariant causal prediction for sequential data, JASA 2018

D. Rothenhaussler, P. Buhlmann, N. Meinshausen, JP: Anchor regression: heterogeneous data meets causality, arXiv:1801.06229

N. Pfister, S. Bauer, JP: Identifying Causal Structure in Large-Scale Kinetic Systems, arXiv:1810.11776

Jonas Peters (Univ. of Copenhagen) A stability upgrade for data-driven models 14 June 2019