
Page 1: Lecture 7: Probabilistic Graphical Models · 2021. 1. 3. · Announcements Homework 2 is due today! Project midterm report is due next week! Homework 3 is out, the deadline is December

Lecture 7: Probabilistic Graphical Models

Andre Martins

Deep Structured Learning Course, Fall 2020

Andre Martins (IST) Lecture 7: Probabilistic Graphical Models IST, Fall 2020 1 / 66


Announcements

• Homework 2 is due today!

• Project midterm report is due next week!

• Homework 3 is out; the deadline is December 9. Start early!


Slide Credits

• Vlad Niculae (co-instructor of DSL last year)


Graphical Models

In this unit, we will formalize & extend these graphical representations encountered in previous lectures.

(Figure. Directed: a chain . . . Yi−1, Yi, Yi+1 . . ., with nodes Xi−1, Xi, Xi+1 attached to the corresponding Y's. Undirected: a chain . . . Yi−1, Yi, Yi+1 . . .)


Outline

1 Directed Models

Bayes networks

Conditional independence and D-separation

Causal graphs & the do operator

2 Undirected Models

Markov random fields

Factor graphs


Bayes (belief) networks

• Common task: characterize how some related events co-occur, specifically in terms of probabilities!

• A car alarm is going off. Was there a break-in?

(Figure: nodes Break-in, Wind, Alarm, Barometer.)

P(B)   B=yes   B=no
       .05     .95

P(A | B)   A=on   A=off
B=yes      .99    .01
B=no       .10    .90

• P(B | A) = ?

Can we observe wind? P(B | A, W) = ?
Maybe we're in the basement, but have a barometer.


Bayes (belief) networks


(Figure: Break-in → Alarm ← Wind, plus a Barometer node.)


P(A | B, W)     A=on   A=off
B=yes  W=lo     .99    .01
B=yes  W=med    .99    .01
B=yes  W=hi     .999   .001
B=no   W=lo     .01    .99
B=no   W=med    .05    .95
B=no   W=hi     .25    .75



Bayes networks

A toolkit for encoding knowledge about how random variables (rv's) interact.

(Figure: Break-in → Alarm ← Wind → Barometer.)

Directed acyclic graph (DAG). Nodes = variables. Arrows = statistical dependencies.

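In general:

$$P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{parents}(X_i))$$

For example:

$$P(\text{Break-in}, \text{Wind}, \text{Alarm}, \text{Barometer}) = P(\text{Break-in})\,P(\text{Wind})\,P(\text{Alarm} \mid \text{Break-in}, \text{Wind})\,P(\text{Barometer} \mid \text{Wind})$$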
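As a minimal sketch (not code from the lecture), the product over P(X_i | parents(X_i)) can be evaluated by looking up one CPT entry per variable. The CPT numbers are the ones given later in this lecture for this network; the value names (lo/med/hi) follow the full joint table, and the names `PARENTS`, `CPT` and `joint_prob` are illustrative assumptions.

```python
# Evaluate P(X_1..X_n) = prod_i P(X_i | parents(X_i)) for the
# Break-in / Wind / Alarm / Barometer network.
# Variable names, value names and the dict layout are illustrative.

PARENTS = {"Br": (), "W": (), "A": ("Br", "W"), "Ba": ("W",)}

# CPT[var][tuple of parent values] -> {value: probability}
CPT = {
    "Br": {(): {"yes": 0.05, "no": 0.95}},
    "W": {(): {"lo": 0.5, "med": 0.3, "hi": 0.2}},
    "A": {
        ("yes", "lo"): {"on": 0.99, "off": 0.01},
        ("yes", "med"): {"on": 0.99, "off": 0.01},
        ("yes", "hi"): {"on": 0.999, "off": 0.001},
        ("no", "lo"): {"on": 0.01, "off": 0.99},
        ("no", "med"): {"on": 0.05, "off": 0.95},
        ("no", "hi"): {"on": 0.25, "off": 0.75},
    },
    "Ba": {
        ("lo",): {"lo": 0.98, "med": 0.01, "hi": 0.01},
        ("med",): {"lo": 0.01, "med": 0.98, "hi": 0.01},
        ("hi",): {"lo": 0.01, "med": 0.01, "hi": 0.98},
    },
}


def joint_prob(assignment):
    """Multiply one local conditional per variable, given its parents' values."""
    p = 1.0
    for var, pars in PARENTS.items():
        parent_vals = tuple(assignment[q] for q in pars)
        p *= CPT[var][parent_vals][assignment[var]]
    return p


print(round(joint_prob({"Br": "yes", "W": "lo", "A": "on", "Ba": "lo"}), 4))  # → 0.0243
```

The printed value matches the first row of the full joint table: 0.05 × 0.5 × 0.99 × 0.98. Only 15 numbers are stored, yet any of the 36 joint entries can be produced on demand.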


Without any structure, P(Break-in, Wind, Alarm, Barometer) would have to be stored and estimated as one full table over all 2 × 3 × 2 × 3 = 36 configurations:

Brk.  Wind  Alarm  Bar.  P          Brk.  Wind  Alarm  Bar.  P
yes   lo    on     lo    0.0243     no    lo    on     lo    0.0047
yes   lo    on     med   0.0002     no    lo    on     med   4.75e-05
yes   lo    on     hi    0.0002     no    lo    on     hi    4.75e-05
yes   lo    off    lo    0.0002     no    lo    off    lo    0.4608
yes   lo    off    med   2.50e-06   no    lo    off    med   0.0047
yes   lo    off    hi    2.50e-06   no    lo    off    hi    0.0047
yes   med   on     lo    0.0001     no    med   on     lo    0.0001
yes   med   on     med   0.0146     no    med   on     med   0.0140
yes   med   on     hi    0.0001     no    med   on     hi    0.0001
yes   med   off    lo    1.50e-06   no    med   off    lo    0.0027
yes   med   off    med   0.0001     no    med   off    med   0.2653
yes   med   off    hi    1.50e-06   no    med   off    hi    0.0027
yes   hi    on     lo    9.99e-05   no    hi    on     lo    0.0005
yes   hi    on     med   9.99e-05   no    hi    on     med   0.0005
yes   hi    on     hi    0.0098     no    hi    on     hi    0.0466
yes   hi    off    lo    1.00e-07   no    hi    off    lo    0.0014
yes   hi    off    med   1.00e-07   no    hi    off    med   0.0014
yes   hi    off    hi    9.80e-06   no    hi    off    hi    0.1397

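Summing the table over Wind and Barometer:

P(Break-in=yes, Alarm=on) = 0.0496

P(Break-in=no, Alarm=on) = 0.0665

$$P(\text{Break-in}=\text{yes} \mid \text{Alarm}=\text{on}) = \frac{P(\text{Break-in}=\text{yes}, \text{Alarm}=\text{on})}{\sum_b P(\text{Break-in}=b, \text{Alarm}=\text{on})} = \frac{0.0496}{0.0496 + 0.0665} \approx .427$$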
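The computation above is inference by enumeration: build the full table from the factorization, then sum out the variables we don't care about. The sketch below is illustrative, assumes the CPT values given later in this lecture, and uses hypothetical names (`P_BR`, `P_A_ON`, `p_br_and_alarm`, ...). Note the full table grows exponentially with the number of variables.

```python
# Build the 36-entry joint from the factorization, then answer
# P(Break-in = yes | Alarm = on) by summing out Wind and Barometer.
from itertools import product

P_BR = {"yes": 0.05, "no": 0.95}
P_W = {"lo": 0.5, "med": 0.3, "hi": 0.2}
P_A_ON = {  # P(A = on | Br, W); P(A = off | Br, W) = 1 - this value
    ("yes", "lo"): 0.99, ("yes", "med"): 0.99, ("yes", "hi"): 0.999,
    ("no", "lo"): 0.01, ("no", "med"): 0.05, ("no", "hi"): 0.25,
}
# P(Ba = b | W = w): 0.98 if the reading matches the true wind, else 0.01
P_BA = {(w, b): (0.98 if w == b else 0.01) for w in P_W for b in P_W}

joint = {}
for br, w, a, ba in product(P_BR, P_W, ("on", "off"), P_W):
    p_a = P_A_ON[(br, w)] if a == "on" else 1.0 - P_A_ON[(br, w)]
    joint[(br, w, a, ba)] = P_BR[br] * P_W[w] * p_a * P_BA[(w, ba)]


def p_br_and_alarm(br, a="on"):
    """Marginal P(Br = br, A = a): sum the joint over Wind and Barometer."""
    return sum(p for (b, _, al, _), p in joint.items() if b == br and al == a)


num = p_br_and_alarm("yes")
den = sum(p_br_and_alarm(b) for b in P_BR)
print(len(joint), round(num, 4), round(den - num, 4), round(num / den, 3))
# → 36 0.0496 0.0665 0.427
```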


Knowing the model structure (statistical dependencies) makes complicated models manageable.

(Figure: Br → A ← W → Ba.)

$$P(\mathrm{Br}, W, A, \mathrm{Ba}) = P(\mathrm{Br})\,P(W)\,P(A \mid \mathrm{Br}, W)\,P(\mathrm{Ba} \mid W)$$

• Can estimate parts in isolation, e.g. P(Wind) from weather history.

• Can sample by following the graph from roots to leaves.

P(Br)   yes   no
        .05   .95

P(W)    lo    med   hi
        .5    .3    .2

P(A | Br, W)      on     off
Br=yes  W=lo      .99    .01
Br=yes  W=med     .99    .01
Br=yes  W=hi      .999   .001
Br=no   W=lo      .01    .99
Br=no   W=med     .05    .95
Br=no   W=hi      .25    .75

P(Ba | W)   lo    med   hi
W=lo        .98   .01   .01
W=med       .01   .98   .01
W=hi        .01   .01   .98
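"Sample by following the graph from roots to leaves" is ancestral (forward) sampling. The sketch below is one possible implementation, assuming the CPTs in the tables above; it gets a roots-first order from the standard library's `graphlib.TopologicalSorter` (Python 3.9+), and the names `cpt`, `sample` and `ORDER` are illustrative. Keeping only the samples with Alarm = on (rejection sampling) gives a Monte Carlo estimate of P(Br = yes | A = on).

```python
# Ancestral sampling for Br -> A <- W -> Ba: visit variables in a
# topological order (parents first) and draw each given its sampled parents.
import random
from graphlib import TopologicalSorter  # standard library, Python 3.9+

PARENTS = {"Br": (), "W": (), "A": ("Br", "W"), "Ba": ("W",)}
P_A_ON = {("yes", "lo"): 0.99, ("yes", "med"): 0.99, ("yes", "hi"): 0.999,
          ("no", "lo"): 0.01, ("no", "med"): 0.05, ("no", "hi"): 0.25}


def cpt(var, pv):
    """Return {value: probability} for `var` given parent values `pv`."""
    if var == "Br":
        return {"yes": 0.05, "no": 0.95}
    if var == "W":
        return {"lo": 0.5, "med": 0.3, "hi": 0.2}
    if var == "A":
        on = P_A_ON[(pv["Br"], pv["W"])]
        return {"on": on, "off": 1.0 - on}
    # Barometer: shows the true wind level with probability 0.98
    return {b: (0.98 if b == pv["W"] else 0.01) for b in ("lo", "med", "hi")}


ORDER = list(TopologicalSorter(PARENTS).static_order())  # parents before children


def sample(rng):
    x = {}
    for var in ORDER:
        dist = cpt(var, {p: x[p] for p in PARENTS[var]})
        values, probs = zip(*dist.items())
        x[var] = rng.choices(values, weights=probs)[0]
    return x


rng = random.Random(0)
draws = [sample(rng) for _ in range(100_000)]

# Condition on Alarm = on by discarding non-matching samples (rejection sampling).
alarm_on = [d for d in draws if d["A"] == "on"]
est = sum(d["Br"] == "yes" for d in alarm_on) / len(alarm_on)
print(round(est, 3))  # Monte Carlo estimate; should be near the exact .427
```

Rejection sampling wastes the roughly 88% of draws where the alarm is off; it is shown here only because it follows directly from the sampling procedure on the slide.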


Bayes Nets:

reduce number of parameters & aid estimation

let us reason about independencies in a model

are a building block for modeling causality
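To make the first point concrete for the alarm network, the sketch below counts free parameters, assuming every conditional is stored as a table (a table over k values has k - 1 free numbers per configuration of the parents). The helper names are illustrative.

```python
# Count free parameters: full joint table vs. the Bayes-net factorization.
from math import prod

CARD = {"Br": 2, "W": 3, "A": 2, "Ba": 3}  # number of values per variable
PARENTS = {"Br": (), "W": (), "A": ("Br", "W"), "Ba": ("W",)}


def full_joint_params(card):
    """One table over all configurations; entries must sum to 1."""
    return prod(card.values()) - 1


def bayes_net_params(card, parents):
    """Each CPT row has card[v] - 1 free entries; one row per parent config."""
    return sum((card[v] - 1) * prod(card[p] for p in parents[v]) for v in card)


print(full_joint_params(CARD), bayes_net_params(CARD, PARENTS))  # → 35 15
```

The gap widens quickly: the full table grows exponentially in the number of variables, while the factorized count grows with the size of each variable's parent set.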


Bayes Nets:

are not neural network diagrams

encode structure, not parametrization

are non-unique for a distribution

encode independence requirements, not necessarily all


BNs are not neural net diagrams

Recall the RNN language model:

• In statistical terms, what are we modeling?

$$P(X_1, \dots, X_n) = P(X_1)\,P(X_2 \mid X_1)\,P(X_3 \mid X_1, X_2) \cdots$$

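• Bayes net: X1 → X2 → X3 → X4 → . . ., with an arrow from every earlier variable to every later one.

• Not useful! Every variable depends on all the previous ones, so the graph encodes no independence assumptions (more later).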
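The claim that this fully connected graph encodes no independences can be checked numerically: the chain rule reproduces any joint exactly. The sketch below uses a randomly generated joint over three binary variables (an arbitrary illustration, not data from the lecture).

```python
# The chain-rule factorization P(X1) P(X2 | X1) P(X3 | X1, X2) reproduces
# *any* joint over three binary variables, so the complete DAG imposes
# no independence assumptions.
import itertools
import random

rng = random.Random(0)
states = list(itertools.product((0, 1), repeat=3))
weights = [rng.random() for _ in states]
joint = {s: w / sum(weights) for s, w in zip(states, weights)}  # arbitrary joint


def marg(prefix):
    """P(X1..Xk = prefix): sum the joint over the remaining variables."""
    return sum(p for s, p in joint.items() if s[: len(prefix)] == prefix)


def chain_rule(s):
    p = marg(s[:1])                  # P(X1)
    p *= marg(s[:2]) / marg(s[:1])   # P(X2 | X1)
    p *= marg(s[:3]) / marg(s[:2])   # P(X3 | X1, X2)
    return p


max_err = max(abs(chain_rule(s) - joint[s]) for s in states)
print(max_err < 1e-12)  # → True
```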


Neural net diagrams (and computation graphs) show how to compute something.

(Figure: nodes X1, X2, X3, X4.)

Bayes networks show how a distribution factorizes (what is assumed independent).


BNs encode structure, not parametrization

A BN tells us: how the distribution decomposes.
A BN can't tell us: what the probabilities are!

Example: $X \in \mathcal{X}$ = all English sentences, $Y \in \mathcal{Y} = \{\text{sports}, \text{music}, \dots\}$.

BN for a generative model: Y → X

We must posit what P(Y) and P(X | Y) are. Many possible options!

P(Y): uniform, $P(Y = \text{sports}) = P(Y = \text{music}) = 1/|\mathcal{Y}|$, or estimated from data.

P(X | Y) (remember: values of X are sentences $x_1, \dots, x_L$ of length L):

Naive Bayes: $P(X \mid Y) = \prod_{j=1}^{L} P(X_j \mid Y)$

Per-class Markov language model: $P(X \mid Y) = \prod_{j=1}^{L} P(X_j \mid X_{j-1}, Y)$

Per-class recurrent NN language model: $P(X \mid Y) = \mathrm{LSTM}(x_1, \dots, x_L; w_y)$

P(X | Y) need not be parametrized as a table.

rv's need not be discrete! Mixture of Gaussians: $X \mid Y = y \sim \mathcal{N}(\mu_y, \Sigma_y)$.
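A minimal sketch of the naive Bayes choice for P(X | Y), combined with Bayes' rule to classify a sentence. All probabilities are made-up illustrative numbers, not estimates from data; `FLOOR` is a hypothetical stand-in for smoothing of unseen words, and the other names are illustrative too.

```python
# Naive Bayes: log P(Y = y) + sum_j log P(X_j = x_j | Y = y), then
# normalize over y to get P(Y | X). All numbers are hypothetical.
import math

P_Y = {"sports": 0.5, "music": 0.5}  # uniform prior
P_WORD = {                           # hypothetical P(word | Y)
    "sports": {"the": 0.05, "match": 0.02, "goal": 0.02, "song": 0.001},
    "music": {"the": 0.05, "match": 0.001, "goal": 0.001, "song": 0.03},
}
FLOOR = 1e-6  # hypothetical probability for words missing from the table


def log_joint(words, y):
    """log P(Y = y) + sum over positions of log P(word | Y = y)."""
    return math.log(P_Y[y]) + sum(math.log(P_WORD[y].get(w, FLOOR)) for w in words)


def posterior(words):
    """P(Y | X) via Bayes' rule, normalizing in log space for stability."""
    scores = {y: log_joint(words, y) for y in P_Y}
    m = max(scores.values())
    z = sum(math.exp(s - m) for s in scores.values())
    return {y: math.exp(s - m) / z for y, s in scores.items()}


post = posterior("the match ended with a goal".split())
print(max(post, key=post.get))  # → sports
```

Swapping in the per-class Markov or LSTM language model would only change `log_joint`; the graph Y → X and the Bayes-rule step stay the same, which is the point of "structure, not parametrization".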


Equivalent factorizations

There are many possible factorizations! P(X, Y) =

• X → Y: P(X) P(Y | X)

• Y → X: P(Y) P(X | Y)

• X, Y with no edge: P(X) P(Y)

The first two are valid Bayes nets for any P(X, Y)!

In fact, recall generative vs discriminative classifiers!

• Generative (e.g. naïve Bayes): Y → X

To classify, we would compute P(Y | X) via Bayes' rule.

• Discriminative (e.g. logistic regression): X → Y, with X shaded

In LR, we don't model P(X); we assume X is always observed (shaded gray).

Some arrow direction choices are harder to estimate.

Some make more sense (why?): Wind → Barometer vs. Barometer → Wind.
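A small numerical check of the three factorizations, using a made-up 2 × 2 joint (illustrative values, not from the lecture): both directed factorizations rebuild the joint exactly, the no-edge one does not (these X and Y are dependent), and the generative route recovers P(Y | X) via Bayes' rule.

```python
# Compare P(X)P(Y|X), P(Y)P(X|Y) and P(X)P(Y) against a given joint table.
JOINT = {("a", 0): 0.30, ("a", 1): 0.10, ("b", 0): 0.15, ("b", 1): 0.45}
XS = sorted({x for x, _ in JOINT})
YS = sorted({y for _, y in JOINT})

p_x = {x: sum(JOINT[x, y] for y in YS) for x in XS}
p_y = {y: sum(JOINT[x, y] for x in XS) for y in YS}
p_y_given_x = {(x, y): JOINT[x, y] / p_x[x] for x in XS for y in YS}
p_x_given_y = {(x, y): JOINT[x, y] / p_y[y] for x in XS for y in YS}


def max_err(factorized):
    """Largest absolute gap between a factorization and the true joint."""
    return max(abs(factorized(x, y) - JOINT[x, y]) for x in XS for y in YS)


print(max_err(lambda x, y: p_x[x] * p_y_given_x[x, y]) < 1e-12)  # → True
print(max_err(lambda x, y: p_y[y] * p_x_given_y[x, y]) < 1e-12)  # → True
print(max_err(lambda x, y: p_x[x] * p_y[y]) < 1e-12)             # → False

# Generative route to classification: Bayes' rule gives P(Y | X = "b").
print({y: round(p_y[y] * p_x_given_y["b", y] / p_x["b"], 3) for y in YS})
# → {0: 0.25, 1: 0.75}
```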


Minimal independence assumptions

Recall, we say X ⊥⊥ Y iff P(X, Y) = P(X)P(Y). Let X = grade in DSL, Y = month you were born.

Bayes net (1): X and Y, with no edge between them

Example parametrization:

P(X)    A+    A     B     ...
        .01   .02   .04

P(Y)    Jan   Feb   Mar   ...
        .10   .12   .09

BN (1) imposes X ⊥⊥ Y in any parametrization.

Bayes net (2): Y → X

Does it necessarily mean not X ⊥⊥ Y? NO!

P(Y)       Jan   Feb   Mar   ...
           .10   .12   .09

P(X | Y)   20    19    18    ...
Y = Jan    .01   .02   .04
Y = Feb    .01   .02   .04
Y = Mar    .01   .02   .04
...

A BN expresses which independences must hold, but a particular parametrization can satisfy additional ones. Here every row of P(X | Y) is identical, so P(X | Y) = P(X) and X ⊥⊥ Y holds despite the edge.
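The identical-rows argument can be checked mechanically. The following is a minimal Python sketch that builds the joint P(X, Y) = P(Y) P(X | Y) for the net Y → X and then tests whether P(x, y) = P(x) P(y) holds in every cell. The slide's tables are truncated, so the fully normalized tables below are hypothetical, and `joint_from_bn` / `is_independent` are illustrative names.

```python
import math
from itertools import product


def joint_from_bn(p_y, p_x_given_y):
    """Joint table P(X = x, Y = y) = P(Y = y) P(X = x | Y = y) for the net Y -> X."""
    return {(x, y): p_y[y] * p_x_given_y[y][x]
            for y in p_y for x in p_x_given_y[y]}


def is_independent(joint, tol=1e-12):
    """True iff P(x, y) = P(x) P(y) for every cell of the table (up to `tol`)."""
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    p_x = {x: sum(joint[(x, y)] for y in ys) for x in xs}
    p_y = {y: sum(joint[(x, y)] for x in xs) for y in ys}
    return all(math.isclose(joint[(x, y)], p_x[x] * p_y[y], abs_tol=tol)
               for x, y in product(xs, ys))


# Hypothetical, fully normalized tables
p_month = {"Jan": 0.3, "Feb": 0.4, "Mar": 0.3}
same_row = {20: 0.2, 19: 0.3, 18: 0.5}
p_grade_given_month = {m: same_row for m in p_month}              # identical rows
p_grade_given_month_dep = dict(p_grade_given_month,
                               Jan={20: 0.5, 19: 0.3, 18: 0.2})  # Jan row differs

print(is_independent(joint_from_bn(p_month, p_grade_given_month)))      # True
print(is_independent(joint_from_bn(p_month, p_grade_given_month_dep)))  # False
```

The edge Y → X only *permits* dependence. Whether X and Y are actually dependent is decided by the numbers in the table.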

Andre Martins (IST) Lecture 7: Probabilistic Graphical Models IST, Fall 2020 17 / 66

Outline

1 Directed Models

Bayes networks

Conditional independence and d-separation

Causal graphs & the do operator

2 Undirected Models

Markov random fields

Factor graphs

Andre Martins (IST) Lecture 7: Probabilistic Graphical Models IST, Fall 2020 18 / 66

Conditional independence in Bayes nets

Identifying independences in a distribution is generally hard.

Bayes nets let us reason about them via graph algorithms!

Definition (conditional independence)

A is independent of B given a set of variables C = {C1, . . . , Cn}, denoted

A ⊥⊥ B | C,

iff P(A, B | C1, . . . , Cn) = P(A | C1, . . . , Cn) P(B | C1, . . . , Cn).

Note. Equivalently, P(A | B, C1, . . . , Cn) = P(A | C1, . . . , Cn), wherever the conditioning event has positive probability.

Intuitively: if we observe C, does also observing B bring us more information about A?

Break-in → Alarm → Andre wakes up

You want to assess whether I’m awake. Does it matter whether there really was a break-in?
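The definition can be checked numerically on this chain. The following is a minimal Python sketch that builds the joint P(B, A, W) = P(B) P(A | B) P(W | A) by enumeration and compares conditional probabilities. All CPT values are hypothetical, and `cond_prob` is an illustrative helper, not a library function.

```python
from itertools import product

# Hypothetical CPTs for the chain Break-in (B) -> Alarm (A) -> Andre wakes up (W)
P_B = {1: 0.01, 0: 0.99}
P_A_given_B = {1: {1: 0.95, 0: 0.05}, 0: {1: 0.02, 0: 0.98}}  # P_A_given_B[b][a]
P_W_given_A = {1: {1: 0.90, 0: 0.10}, 0: {1: 0.05, 0: 0.95}}  # P_W_given_A[a][w]

# Joint P(B, A, W) = P(B) P(A | B) P(W | A), the Bayes-net factorization
joint = {
    (b, a, w): P_B[b] * P_A_given_B[b][a] * P_W_given_A[a][w]
    for b, a, w in product((0, 1), repeat=3)
}
NAMES = ("B", "A", "W")


def cond_prob(query, evidence):
    """P(query | evidence); both are dicts over the variable names 'B', 'A', 'W'."""
    def matches(assignment, constraints):
        return all(assignment[NAMES.index(k)] == v for k, v in constraints.items())
    num = sum(p for key, p in joint.items() if matches(key, {**evidence, **query}))
    den = sum(p for key, p in joint.items() if matches(key, evidence))
    return num / den


# Once the alarm is known, learning about the break-in changes nothing about W
print(cond_prob({"W": 1}, {"A": 1}))
print(cond_prob({"W": 1}, {"A": 1, "B": 1}))
print(cond_prob({"W": 1}, {"A": 1, "B": 0}))
# Without observing the alarm, the break-in is informative about W
print(cond_prob({"W": 1}, {"B": 1}), cond_prob({"W": 1}, {"B": 0}))
```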

Andre Martins (IST) Lecture 7: Probabilistic Graphical Models IST, Fall 2020 19 / 66

Three fundamental relationships in BN

The Fork: A ← C → B

A ⊥⊥ B | C: given C, A and B are independent.

Example: Alarm ← Wind → Barometer

The Chain: A → C → B

A ⊥⊥ B | C: after observing C, further observing A would not tell us anything more about B.

Example: Burglary → Alarm → Andre wakes up

The Collider: A → C ← B

Surprisingly, A ⊥⊥ B, but not A ⊥⊥ B | C!

Example: Burglary → Alarm ← Wind. Burglaries occur regardless of how windy it is.

But if the alarm rings, hearing the wind makes a burglary less likely! The burglary is “explained away” by the wind.
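Explaining away is easy to see with numbers. The following is a minimal Python sketch for the collider Burglary → Alarm ← Wind that computes P(Burglary | evidence) by enumeration. The CPT values are hypothetical, and `p_burglary` is an illustrative helper.

```python
from itertools import product

# Hypothetical CPTs for the collider Burglary (B) -> Alarm (A) <- Wind (W)
P_B = {1: 0.01, 0: 0.99}
P_W = {1: 0.20, 0: 0.80}
P_A1 = {(0, 0): 0.01, (0, 1): 0.50, (1, 0): 0.95, (1, 1): 0.97}  # P(A=1 | B=b, W=w)


def joint(b, a, w):
    """P(B=b, A=a, W=w) = P(B) P(W) P(A | B, W)."""
    p_a = P_A1[(b, w)] if a == 1 else 1.0 - P_A1[(b, w)]
    return P_B[b] * P_W[w] * p_a


def p_burglary(**evidence):
    """P(B=1 | evidence), where evidence may fix A and/or W (e.g. A=1, W=1)."""
    def consistent(a, w):
        return evidence.get("A", a) == a and evidence.get("W", w) == w
    num = sum(joint(1, a, w) for a, w in product((0, 1), repeat=2) if consistent(a, w))
    den = sum(joint(b, a, w) for b, a, w in product((0, 1), repeat=3) if consistent(a, w))
    return num / den


print(p_burglary())          # prior P(B=1)
print(p_burglary(W=1))       # unchanged: B and W are marginally independent
print(p_burglary(A=1))       # the alarm rang: a burglary becomes much more likely
print(p_burglary(A=1, W=1))  # ...but hearing the wind explains the alarm away
```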

Andre Martins (IST) Lecture 7: Probabilistic Graphical Models IST, Fall 2020 20 / 66

Detecting independence: d-separation

Definition: A and B are d-separated given a set C if, for every path P from A to B (ignoring edge directions), at least one of the following holds:

1. P includes a fork with an observed parent:

X ← Z → Y (with Z ∈ C)

2. P includes a chain with an observed middle node:

X → Z → Y or X ← Z ← Y (with Z ∈ C)

3. P includes a collider with no observed descendants:

X → Z ← Y (with neither Z nor any of its descendants ∈ C)

Theorem: A and B d-separated given C ⇒ A ⊥⊥ B | C, in every distribution that factorizes according to the graph.
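Checking every path by hand gets tedious. The following is a minimal Python sketch of a standard linear-time reachability traversal over (node, direction) pairs, closely related to the Bayes-ball algorithm. It encodes the three rules above. The dict-of-parents graph encoding and the name `d_separated` are our own choices. A result of `True` means d-separated, so independence is guaranteed. A result of `False` only means the graph does not guarantee independence.

```python
from collections import deque


def d_separated(parents, source, target, observed):
    """Return True if `source` and `target` are d-separated given `observed`.

    parents: dict mapping every node of the DAG to a list of its parents.
    """
    observed = set(observed)
    children = {v: [] for v in parents}
    for v, ps in parents.items():
        for p in ps:
            children[p].append(v)

    # Phase 1: observed nodes plus all their ancestors (these activate colliders)
    anc_of_obs, stack = set(), list(observed)
    while stack:
        v = stack.pop()
        if v not in anc_of_obs:
            anc_of_obs.add(v)
            stack.extend(parents[v])

    # Phase 2: BFS; "up" = arrived from a child, "down" = arrived from a parent
    queue, visited, reachable = deque([(source, "up")]), set(), set()
    while queue:
        v, direction = queue.popleft()
        if (v, direction) in visited:
            continue
        visited.add((v, direction))
        if v not in observed:
            reachable.add(v)
        if direction == "up" and v not in observed:
            # unobserved chain/fork node: the trail may continue either way
            queue.extend((p, "up") for p in parents[v])
            queue.extend((c, "down") for c in children[v])
        elif direction == "down":
            if v not in observed:
                # unobserved chain node: keep going down
                queue.extend((c, "down") for c in children[v])
            if v in anc_of_obs:
                # collider with itself or a descendant observed: trail is active
                queue.extend((p, "up") for p in parents[v])
    return target not in reachable


alarm_net = {"Break-in": [], "Wind": [], "Alarm": ["Break-in", "Wind"],
             "Barometer": ["Wind"]}
queries = [("Wind", "Barometer", set()),
           ("Break-in", "Wind", set()),
           ("Break-in", "Barometer", set()),
           ("Break-in", "Barometer", {"Alarm"}),
           ("Break-in", "Barometer", {"Alarm", "Wind"})]
for a, b, given in queries:
    print(a, b, sorted(given), d_separated(alarm_net, a, b, given))
# Expected: False, True, True, False, True
```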

Andre Martins (IST) Lecture 7: Probabilistic Graphical Models IST, Fall 2020 21 / 66

Examples

Break-in → Alarm ← Wind → Barometer

Wind ⊥⊥ Barometer? No

Break-in ⊥⊥ Wind? Yes

Break-in ⊥⊥ Barometer? Yes

Break-in ⊥⊥ Barometer | Alarm? No

Break-in ⊥⊥ Barometer | Alarm, Wind? Yes

. . . → Yi−1 → Yi → Yi+1 → . . . , with emissions Yi−1 → Xi−1, Yi → Xi, Yi+1 → Xi+1

Yi+1 ⊥⊥ Yi−1? No

Yi+1 ⊥⊥ Yi−1 | Yi? Yes

Yi+1 ⊥⊥ Xi? No

Yi+1 ⊥⊥ Xi | Yi? Yes

Andre Martins (IST) Lecture 7: Probabilistic Graphical Models IST, Fall 2020 22 / 66

Generative stories and plate notation

In papers, you’ll see statistical models defined through generative stories:

µ ∼ Uniform([−1, 1])

σ ∼ Uniform([1, 2])

X | µ, σ ∼ Normal(µ, σ)

Graph: µ → X ← σ

Plate notation is a way to denote repetition of templates:

µ ∼ Uniform([−1, 1])

σ ∼ Uniform([1, 2])

Xn | µ, σ ∼ Normal(µ, σ), n = 1, . . . , N

Graph: µ → Xn ← σ, with Xn inside a plate labeled N (repeated N times)
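A generative story can be read directly as a sampling procedure: draw each variable after its parents (ancestral sampling). The following is a minimal Python sketch of the plated model above. It assumes that Normal(µ, σ) is parameterized by the standard deviation σ, which is how `random.gauss` interprets its second argument. The function name `sample_story` is illustrative.

```python
import random


def sample_story(n, seed=None):
    """Ancestral sampling from the plated generative story.

    mu ~ Uniform([-1, 1]) and sigma ~ Uniform([1, 2]) are drawn once and shared;
    the plate then draws X_1, ..., X_n i.i.d. given (mu, sigma).
    """
    rng = random.Random(seed)
    mu = rng.uniform(-1.0, 1.0)
    sigma = rng.uniform(1.0, 2.0)
    # Assumption: sigma is the standard deviation (as in random.gauss)
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    return mu, sigma, xs


mu, sigma, xs = sample_story(5, seed=0)
print(mu, sigma, xs)
```

The plate corresponds to the list comprehension. The variables outside the plate (µ, σ) are sampled once, and the variables inside it are sampled N times.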

Andre Martins (IST) Lecture 7: Probabilistic Graphical Models IST, Fall 2020 23 / 66

Correlation does not imply causation; but then, what does?

Andre Martins (IST) Lecture 7: Probabilistic Graphical Models IST, Fall 2020 25 / 66

Seeing versus doing

Bayes nets only model independence assumptions.

The correlation between a barometer reading B and wind strength W can be represented either way:

W → B vs. B → W

Seeing that the barometer reading is high, we can forecast wind.

P(W | B)   W = lo   W = mid   W = hi
B = lo     .98      .01       .01
B = mid    .01      .98       .01
B = hi     .01      .01       .98

But setting the barometer needle to high manually won’t cause wind!

We write: P(W | do(B = hi)) = ?

Andre Martins (IST) Lecture 7: Probabilistic Graphical Models IST, Fall 2020 26 / 66

Seeing versus doing

Setting the barometer needle to high manually won’t cause wind!

Two reasons why doing ≠ seeing:

• the arrow direction does not express a causal relationship

• we missed some confounding factor

If we created wind with a ceiling fan, would it alter the barometer?

No! Pressure is a confounding factor.

W ← P → B

Andre Martins (IST) Lecture 7: Probabilistic Graphical Models IST, Fall 2020 27 / 66

Causal models

Definition (Pearl 2000)

A causal model is a DAG G with vertices X1, . . . , XN representing events. It is almost like a BN; however, paths are causal.

• A causes B only if A is an ancestor of B in G.

• A → B means A is a direct cause of B.

A good model is essential.

Wrong causal assumptions ⇒ wrong conclusions.

(We won’t cover how to assess whether the model is right. This is a bit chicken-and-egg, but domain knowledge helps, and we are allowed to reason about unobserved causes.)

Andre Martins (IST) Lecture 7: Probabilistic Graphical Models IST, Fall 2020 28 / 66

Seeing versus doing, more rigorously

Seeing (observational): P(W | B = hi)

Measure the world for a while (or call IPMA)

Date         Pressure   Wind   Barometer
1977-01-01   hi         hi     hi
1977-01-02   hi         mid    hi
1977-01-02   mid        mid    mid
. . .
2019-11-03   hi         hi     hi

gives:

P(W | B)   W = lo   W = mid   W = hi
B = hi     .01      .01       .98

W ← P → B

Doing (interventional): P(W | do(B = hi))

Set the needle to high, breaking its inbound arrows; re-generate new data in this new DAG (or estimate what that would give).

P(W | do(B = hi)) = P(W)

P → W, with the arrow P → B cut; B is fixed by do(B = hi)
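The difference between the two quantities can be computed exactly on the confounded graph W ← P → B. The following is a minimal Python sketch with binary variables (1 = "hi"). All CPT values are hypothetical. "Doing" is implemented as graph surgery: P(B | P) is replaced by a point mass at B = hi, and the mutilated model is then conditioned on B = hi.

```python
# Hypothetical CPTs for the confounded model  W <- P -> B  (1 = "hi", 0 = not hi)
P_P = {1: 0.4, 0: 0.6}            # P(P = p)
P_W1_given_P = {1: 0.8, 0: 0.1}   # P(W = hi | P = p)
P_B1_given_P = {1: 0.9, 0: 0.05}  # P(B = hi | P = p)


def p_wind_seeing_b_hi():
    """Observational P(W = hi | B = hi): condition on B in the original DAG."""
    num = sum(P_P[p] * P_B1_given_P[p] * P_W1_given_P[p] for p in P_P)
    den = sum(P_P[p] * P_B1_given_P[p] for p in P_P)
    return num / den


def p_wind_doing_b_hi():
    """Interventional P(W = hi | do(B = hi)): cut P -> B, set B = hi with prob. 1."""
    p_b1_do = 1.0  # B no longer depends on P in the mutilated DAG
    num = sum(P_P[p] * p_b1_do * P_W1_given_P[p] for p in P_P)
    den = sum(P_P[p] * p_b1_do for p in P_P)
    return num / den


p_wind_marginal = sum(P_P[p] * P_W1_given_P[p] for p in P_P)
print(p_wind_seeing_b_hi(), p_wind_doing_b_hi(), p_wind_marginal)
```

Seeing B = hi shifts belief toward high pressure and therefore toward wind. Forcing the needle carries no information about P, so P(W | do(B = hi)) collapses to the marginal P(W).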

Andre Martins (IST) Lecture 7: Probabilistic Graphical Models IST, Fall 2020 29 / 66

Page 91: Lecture 7: Probabilistic Graphical Models · 2021. 1. 3. · Announcements Homework 2 is due today! Project midterm report is due next week! Homework 3 is out, the deadline is December

Seeing versus doing, more rigorouslySeeing (observational): P(W | B = hi)

Measure the world for a while (or call IPMA)

Date Pressure Wind Barometer

1977-01-01 hi hi hi1977-01-02 hi mid hi1977-01-02 mid mid mid. . .2019-11-03 hi hi hi

gives:P(W | B) lo mid hi

B = hi .01 .01 .98

W B

P

Doing (interventional): P(W | do(B = hi))

Set the needle to high, breaking inbound arrows;re-generate new data in this new DAG(or estimate what that would give.)

P(W | do(B = hi)) = P(W ) W B

P

do

Andre Martins (IST) Lecture 7: Probabilistic Graphical Models IST, Fall 2020 29 / 66

Page 92: Lecture 7: Probabilistic Graphical Models · 2021. 1. 3. · Announcements Homework 2 is due today! Project midterm report is due next week! Homework 3 is out, the deadline is December

Randomized controlled trials

Try to actually implement the do operator in real life.

[Figure: causal DAG over Treatment, Disease and unobserved Genetics (?), with an intervention do(Treatment)]

Patient  Treatment  Genetics  Disease
#42      real       ?         cured
#68      placebo    ?         not cured
. . .

No need to be able to measure genetics, as long as we can sample A LOT OF test subjects with no/little bias.

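A simulated trial illustrates why randomization removes the need to measure genetics. Everything here is hypothetical: a hidden binary genetics variable G, the rule that G sways who takes the treatment in observational data, and cure probabilities under which the true effect of treatment is +0.2. Comparing treated and untreated cure rates in observational data gives about 0.44 (treatment is credited with the effect of good genes), while a coin-flip assignment gives about 0.2.

```python
import random


def cure_prob(t, g):
    """Hypothetical P(D = cured | T = t, G = g): treatment adds 0.2, good genes add 0.4."""
    return 0.3 + 0.2 * t + 0.4 * g


def effect_estimate(randomized, n=100_000, seed=0):
    """Cure rate among treated minus cure rate among untreated subjects."""
    rng = random.Random(seed)
    cured = {0: 0, 1: 0}
    count = {0: 0, 1: 0}
    for _ in range(n):
        g = int(rng.random() < 0.5)          # hidden genetics: never recorded
        if randomized:
            t = int(rng.random() < 0.5)      # coin flip plays the role of do(T)
        else:
            t = int(rng.random() < (0.8 if g else 0.2))  # genes sway who gets treated
        count[t] += 1
        cured[t] += int(rng.random() < cure_prob(t, g))
    return cured[1] / count[1] - cured[0] / count[0]


print(f"observational: {effect_estimate(False):.2f}  randomized: {effect_estimate(True):.2f}")
```

The randomized estimate never looks at `g`; it is unbiased only because the coin flip is independent of everything else, which is exactly what the "no/little bias" sampling requirement asks for.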

Do-calculus

RCTs are powerful, but often unethical, always expensive.

Do-calculus: use the causal DAG assumptions to draw causal conclusions from observational data.

• Apply transformations to P(X | do(Y)) until the “do” goes away. (Not always possible!)

• Quantities without “do” can be estimated observationally.

• Transformation: 3 rules.


Pearl’s 3 rules

Notation:

X, Y, Z, W: disjoint sets of events (sets of nodes); may be empty.
$G_{\overline{X}}$: the graph with all edges into X removed.
$G_{\underline{X}}$: the graph with all edges out of X removed.
Z(X): the subset of nodes in Z which are not ancestors of X.
y; do(x): shorthand for Y = y; respectively do(X = x).

1 Ignoring observations:

$P(y \mid do(x), z, w) = P(y \mid do(x), w)$ if $(Y \perp\!\!\!\perp Z \mid X, W)_{G_{\overline{X}}}$

2 Action/observation exchange: the back-door criterion

$P(y \mid do(x), do(z), w) = P(y \mid do(x), z, w)$ if $(Y \perp\!\!\!\perp Z \mid X, W)_{G_{\overline{X}, \underline{Z}}}$

3 Ignoring actions:

$P(y \mid do(x), do(z), w) = P(y \mid do(x), w)$ if $(Y \perp\!\!\!\perp Z \mid X, W)_{G_{\overline{X}, \overline{Z(W)}}}$


Examples 1,2: Pressure and barometer

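[Figure: P → B, with an intervention on B]

Rule 3: $P(P = \text{hi} \mid do(B = \text{hi})) = P(P = \text{hi})$ since $(P \perp\!\!\!\perp B)_{G_{\overline{B}}}$

[Figure: P → B, with an intervention on P]

Rule 2: $P(B = \text{hi} \mid do(P = \text{lo})) = P(B = \text{hi} \mid P = \text{lo})$ since $(B \perp\!\!\!\perp P)_{G_{\underline{P}}}$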

Good check: we get the intuitively correct results.

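The graph conditions in Pearl's rules, such as the two checked by hand in these examples, can also be tested mechanically. A minimal sketch, assuming a DAG stored as a set of (parent, child) pairs: "graph surgery" deletes arrows into or out of a set of nodes, and d-separation is decided with the moralized-ancestral-graph criterion (keep the ancestors of X ∪ Y ∪ Z, link co-parents, drop directions, delete Z, and look for a path), which is equivalent to the path-based definition. The function names are our own, and the pressure/barometer edges P → W, P → B are an assumption about the slide's DAG.

```python
from itertools import combinations


def remove_edges(edges, into=(), out_of=()):
    """Graph surgery: drop arrows pointing into `into` and arrows leaving `out_of`."""
    return {(u, v) for (u, v) in edges if v not in into and u not in out_of}


def ancestors(edges, nodes):
    """`nodes` together with every node that has a directed path into them."""
    result, frontier = set(nodes), list(nodes)
    while frontier:
        v = frontier.pop()
        for (a, b) in edges:
            if b == v and a not in result:
                result.add(a)
                frontier.append(a)
    return result


def d_separated(edges, xs, ys, zs):
    """Test (X ⊥⊥ Y | Z) in the DAG via the moralized ancestral graph."""
    keep = ancestors(edges, set(xs) | set(ys) | set(zs))
    sub = {(u, v) for (u, v) in edges if u in keep and v in keep}
    # Moralize: link every pair of co-parents, then forget directions.
    links = {frozenset(e) for e in sub}
    for child in keep:
        parents = [u for (u, v) in sub if v == child]
        links.update(frozenset(pair) for pair in combinations(parents, 2))
    # Delete Z and look for any undirected path from X to Y.
    seen, frontier = set(xs), list(xs)
    while frontier:
        v = frontier.pop()
        for link in links:
            if v in link:
                (w,) = link - {v}
                if w in zs or w in seen:
                    continue
                if w in ys:
                    return False
                seen.add(w)
                frontier.append(w)
    return True


# Assumed pressure/barometer DAG: P -> W and P -> B.
barometer = {("P", "W"), ("P", "B")}
print(d_separated(remove_edges(barometer, into={"B"}), {"P"}, {"B"}, set()))   # Rule 3 check
print(d_separated(remove_edges(barometer, out_of={"P"}), {"B"}, {"P"}, set()))  # Rule 2 check
```

Both checks print True, matching the two examples; on the unmodified graph, `d_separated(barometer, {"P"}, {"B"}, set())` is False, as it should be.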

Example 3: Measurable confounder

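T: treatment, D: disease. The confounder is W: wealth.

[Figure: T → D, with W a common cause of T and D; intervention on T]

Condition on wealth (which thus needs to be measurable):

$$
\begin{aligned}
P(D = \text{cured} \mid do(T = y))
&= P(D = \text{cured} \mid do(T = y), W = y)\, P(W = y \mid do(T = y)) \\
&\quad + P(D = \text{cured} \mid do(T = y), W = n)\, P(W = n \mid do(T = y)) \\
&= P(D = \text{cured} \mid do(T = y), W = y)\, P(W = y) \\
&\quad + P(D = \text{cured} \mid do(T = y), W = n)\, P(W = n) && \text{(R3)} \\
&= P(D = \text{cured} \mid T = y, W = y)\, P(W = y) \\
&\quad + P(D = \text{cured} \mid T = y, W = n)\, P(W = n) && \text{(R2)}
\end{aligned}
$$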

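The adjustment formula can be checked numerically. The sketch below assumes made-up probabilities for the DAG W → T, W → D, T → D, builds the observational joint P(t, w, d) from them, and then uses only that joint to compute both the naive conditional P(D = cured | T = y) and the adjusted quantity from the last line of the derivation. With these numbers the true interventional probability is 0.3 · 0.9 + 0.7 · 0.6 = 0.69: the adjusted estimate recovers it, while the naive one (about 0.80) overstates it because wealthy patients are both more often treated and more often cured.

```python
from itertools import product

# Illustrative numbers for the DAG W -> T, W -> D, T -> D (not from the slide).
P_W = {"y": 0.3, "n": 0.7}                    # P(W = w): wealthy or not
P_T_GIVEN_W = {"y": 0.9, "n": 0.2}            # P(T = y | W = w)
P_CURED = {("y", "y"): 0.9, ("y", "n"): 0.6,  # P(D = cured | T = t, W = w)
           ("n", "y"): 0.7, ("n", "n"): 0.3}


def joint():
    """Observational joint P(t, w, d): all an observer could ever estimate."""
    table = {}
    for t, w, d in product("yn", "yn", ("cured", "not")):
        p_t = P_T_GIVEN_W[w] if t == "y" else 1 - P_T_GIVEN_W[w]
        p_d = P_CURED[(t, w)] if d == "cured" else 1 - P_CURED[(t, w)]
        table[(t, w, d)] = P_W[w] * p_t * p_d
    return table


def naive(table, t="y"):
    """P(D = cured | T = t), read straight off the observational joint."""
    num = sum(p for (tt, _, d), p in table.items() if tt == t and d == "cured")
    den = sum(p for (tt, _, _), p in table.items() if tt == t)
    return num / den


def adjusted(table, t="y"):
    """Adjustment from the derivation: sum_w P(D = cured | T = t, W = w) P(W = w)."""
    total = 0.0
    for w in "yn":
        p_w = sum(p for (_, ww, _), p in table.items() if ww == w)
        p_tw = sum(p for (tt, ww, _), p in table.items() if tt == t and ww == w)
        total += table[(t, w, "cured")] / p_tw * p_w
    return total


obs = joint()
print(f"naive: {naive(obs):.3f}  adjusted: {adjusted(obs):.3f}")
```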

Example 3: an impossible one

T: treatment, D: disease.

The confounder is G: genetics (impractical to measure and estimate).

[Figure: T → D, with hidden G a common cause of T and D; intervention on T]

Without more info or more assumptions, we’re stuck!


Example 4: a surprisingly possible one

T: treatment, D: disease, B: blood cell count.

The confounder is G: genetics (still hidden).

[Figure: T → B → D, with hidden G a common cause of T and D; intervention on T]

“The front-door criterion”: conditioning on B lets us remove the dos!

(I won’t show you how; the derivation is a bit longer. Try it at home.)

$$P(D = \text{cured} \mid do(T = y)) = \sum_b P(B = b \mid T = y) \sum_t P(D = \text{cured} \mid T = t, B = b)\, P(T = t)$$

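The front-door formula can be sanity-checked on a toy model in which the hidden confounder is known to us but not to the "observer". This is a sketch under assumed numbers and an assumed structure G → T, T → B, B → D, G → D with binary variables coded 0/1 (1 standing for y or cured). `observed_joint` sums G out, `front_door` evaluates the formula above from that observed joint only, and `truth` computes P(D = 1 | do(T = 1)) with G included; under these numbers both should come out at 0.75.

```python
from itertools import product

# Illustrative binary model (not from the slide): G -> T, T -> B, B -> D, G -> D.
# G is hidden; an observer only ever sees (T, B, D).
P_G1 = 0.5
P_T1_GIVEN_G = {0: 0.2, 1: 0.8}
P_B1_GIVEN_T = {0: 0.1, 1: 0.9}   # B depends on T only


def p_d1(b, g):
    """P(D = 1 | B = b, G = g)."""
    return 0.2 + 0.5 * b + 0.2 * g


def bern(p1, v):
    """P(V = v) for a binary V with P(V = 1) = p1."""
    return p1 if v == 1 else 1 - p1


def observed_joint():
    """P(t, b, d) with the hidden G summed out."""
    joint = {}
    for g, t, b, d in product((0, 1), repeat=4):
        p = (bern(P_G1, g) * bern(P_T1_GIVEN_G[g], t)
             * bern(P_B1_GIVEN_T[t], b) * bern(p_d1(b, g), d))
        joint[(t, b, d)] = joint.get((t, b, d), 0.0) + p
    return joint


def front_door(joint, t_do=1):
    """sum_b P(b | t_do) sum_t P(D = 1 | t, b) P(t), from observed marginals only."""
    def mass(pred):
        return sum(p for key, p in joint.items() if pred(*key))

    total = 0.0
    for b in (0, 1):
        p_b = mass(lambda t, bb, d: t == t_do and bb == b) / mass(lambda t, bb, d: t == t_do)
        inner = 0.0
        for t in (0, 1):
            p_t = mass(lambda tt, bb, d: tt == t)
            p_d_tb = (mass(lambda tt, bb, d: tt == t and bb == b and d == 1)
                      / mass(lambda tt, bb, d: tt == t and bb == b))
            inner += p_d_tb * p_t
        total += p_b * inner
    return total


def truth(t_do=1):
    """P(D = 1 | do(T = t_do)) using the full model, hidden G included."""
    return sum(bern(P_G1, g) * bern(P_B1_GIVEN_T[t_do], b) * p_d1(b, g)
               for g, b in product((0, 1), repeat=2))


print(f"front-door: {front_door(observed_joint()):.3f}  truth: {truth():.3f}")
```

The lambdas are evaluated immediately inside each loop iteration, so capturing the loop variables `b` and `t` is safe here.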

Directed models: summary

• Bayes nets: specify & estimate fine-grained distributions over interdependent events.

• Under a specified model, an algorithm to decide conditional independence: d-separation.

• Bestowing a DAG with causal assumptions lets us reason about interventions.

Further reading: (Pearl, 1988; Koller and Friedman, 2009; Pearl, 2000, 2012; Dawid, 2010)

Slides on causal inference and learning causal structure (links):

• Sanna Tyrvainen, Introduction to Causal Calculus

• Ricardo Silva, Causality

• Dominik Janzing & Bernhard Scholkopf, Causality

Highly recommended online course: https://www.bradyneal.com/causal-inference-course


Outline

1 Directed Models

Bayes networks

Conditional independence and D-separation

Causal graphs & the do operator

2 Undirected Models

Markov random fields

Factor graphs


Modeling friendships

• Four students: An, Bo, Chris, Dee are voting on a Yes/No ballot.

• Friendship pairs: An–Bo, Bo–Chris, Chris–Dee, Dee–An.

• Friends are 100x more likely to vote the same way.

[Figure: undirected cycle A − B − C − D − A]

• An’s vote is a random variable A with values a ∈ {Y, N}, and so on.

P(a, b, c, d) ∝ f(a, b) · f(b, c) · f(c, d) · f(d, a)

For any X, Y ∈ {A, B, C, D}, f is the compatibility function:

$$f(x, y) = \begin{cases} 100 & \text{if } x = y = \text{Yes or } x = y = \text{No} \\ 1 & \text{otherwise.} \end{cases}$$

• Can we represent this exact factorization in a Bayes net?

No!

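Because there are only 2^4 = 16 joint votes, this MRF can be handled by brute force. The sketch below (names are our own) multiplies the four pairwise factors for every assignment, sums them into the normalizer Z, and exposes a `prob` helper for marginals. It can be used to confirm that A ⊥⊥ C | B, D holds in this model, while A and C are far from independent marginally.

```python
from itertools import product

NAMES = ("A", "B", "C", "D")
VALUES = ("Y", "N")
EDGES = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]   # friendship pairs


def f(x, y):
    """Compatibility function from the slide: agreeing friends get weight 100."""
    return 100 if x == y else 1


def unnormalized(x):
    """f(a, b) f(b, c) f(c, d) f(d, a) for a full assignment x (a dict)."""
    score = 1
    for u, v in EDGES:
        score *= f(x[u], x[v])
    return score


ASSIGNMENTS = [dict(zip(NAMES, vals)) for vals in product(VALUES, repeat=4)]
Z = sum(unnormalized(x) for x in ASSIGNMENTS)   # partition function


def prob(**fixed):
    """Probability that the named variables take the given values."""
    return sum(unnormalized(x) for x in ASSIGNMENTS
               if all(x[k] == v for k, v in fixed.items())) / Z


print(prob(A="Y", B="Y", C="Y", D="Y"))   # just under 0.5
```

Unanimous votes dominate: each unanimous outcome has weight 100^4, and Z = 2 · 100^4 + 12 · 100^2 + 2, since on a 4-cycle the number of disagreeing pairs is always even.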

Markov random fields

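[Figure: the friendship cycle A − B − C − D − A]

Definition

Let G be an undirected graph with nodes corresponding to random variables X1, . . . , XN. Let C(G) denote the set of cliques (fully connected subgraphs) of G. A MRF is a distribution of the form

$$P(x_1, \ldots, x_N) = \frac{1}{Z} \prod_{c \in \mathcal{C}(G)} f_c(x_c)$$

where for each clique c, f_c is a non-negative compatibility function, and the normalizer $Z = \sum_{x} \prod_{c \in \mathcal{C}(G)} f_c(x_c)$ makes the probabilities sum to 1.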


Any BN can be encoded in a MRF

1 First, add edge A− C for any collider structure A→ B ← C ;

2 Convert all arcs A→ B or A← B into undirected edges A− B.

[Figure: a two-node BN B → A and the corresponding MRF A − B]

A B P(a | b)
Y Y .9
N Y .1
Y N .1
N N .9

B P(b)
Y .75
N .25

A B f(a, b) = P(a | b) P(b)
Y Y .9 · .75
N Y .1 · .75
Y N .1 · .25
N N .9 · .25


[Figure: BN-to-MRF conversion examples on two nodes A, B and on three nodes A, B, C]

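The two conversion steps (the usual name is "moralization") can be sketched in a few lines, assuming a DAG given as a dictionary from each child to its list of parents. Linking every pair of co-parents is the same as adding A − C for each collider A → B ← C. The second half rebuilds the two-node factor table from the CPTs above, where f(a, b) = P(a | b) P(b) already sums to 1, so this MRF needs no extra normalization. Names are our own.

```python
from itertools import combinations


def moralize(parents):
    """Undirected MRF edges for a DAG given as {child: [parents]}.

    Step 1: link every pair of co-parents (each collider A -> B <- C gains A - C).
    Step 2: drop the direction of every arc.
    """
    edges = set()
    for child, pars in parents.items():
        for p in pars:
            edges.add(frozenset((p, child)))
        for p, q in combinations(pars, 2):
            edges.add(frozenset((p, q)))
    return edges


# Two-node example B -> A with the CPTs from the slide; keys are (a, b).
P_B = {"Y": 0.75, "N": 0.25}
P_A_GIVEN_B = {("Y", "Y"): 0.9, ("N", "Y"): 0.1,
               ("Y", "N"): 0.1, ("N", "N"): 0.9}

# A single clique factor absorbs both conditionals: f(a, b) = P(a | b) P(b).
f = {(a, b): P_A_GIVEN_B[(a, b)] * P_B[b] for (a, b) in P_A_GIVEN_B}

print(sorted(tuple(sorted(e)) for e in moralize({"A": [], "C": [], "B": ["A", "C"]})))
print(sum(f.values()))   # sums to 1 up to rounding: no extra normalization needed
```

For a chain A → B → C, `moralize` returns only A − B and B − C; the extra edge appears only when two arrows meet head to head.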

Loose conversion

Similarly, we can convert a MRF to a BN (we won’t cover it).

However, independences may be lost in either direction.

From:

• the MRF cycle A − B − C − D − A, which encodes A ⊥⊥ C | B, D and B ⊥⊥ D | A, C;

• the BN collider A → C ← B, which encodes A ⊥⊥ B.

To: a BN for the cycle, or a MRF for the collider, where these independences can no longer all be represented. For instance, converting the collider adds the edge A − B, so the graph no longer implies A ⊥⊥ B.

Bayes vs Markov

Bayes network

• Factors are conditionals (normalized)

• Easy to sample

• Can be made causal

• Can easily find P(x1, . . . , xn).

[Figure: DAG with A → B → C, A → D and C → D]

P(a, b, c, d) = P(a)P(b | a)P(c | b)P(d | a, c)

Markov networks

• Factors are cliques (unnormalized)

• No directional ambiguity

• Often more compact

• More symmetric notation

[Figure: undirected cycle A − B − C − D − A]

P(a, b, c, d) = 1/Z f1(a, b) f2(b, c) f3(c, d) f4(d, a)

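"Easy to sample" and "can easily find P(x1, . . . , xn)" both follow from the factors being normalized conditionals. A sketch with hypothetical CPTs for the factorization P(a)P(b | a)P(c | b)P(d | a, c) over 0/1 variables: the joint is a plain product with no partition function, and ancestral sampling draws each variable after its parents. The MRF on the right has no such shortcut, since its product of factors must first be divided by Z.

```python
import random
from itertools import product

# Hypothetical CPTs for P(a) P(b | a) P(c | b) P(d | a, c); values are 0/1.
P_A1 = 0.6
P_B1_GIVEN_A = {0: 0.3, 1: 0.7}
P_C1_GIVEN_B = {0: 0.2, 1: 0.9}
P_D1_GIVEN_AC = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.8}


def bern(p1, v):
    """P(V = v) for a binary V with P(V = 1) = p1."""
    return p1 if v == 1 else 1 - p1


def joint(a, b, c, d):
    """Each factor is a normalized conditional, so no partition function is needed."""
    return (bern(P_A1, a) * bern(P_B1_GIVEN_A[a], b)
            * bern(P_C1_GIVEN_B[b], c) * bern(P_D1_GIVEN_AC[(a, c)], d))


def total_probability():
    """Sum of the joint over all 16 assignments (should be 1)."""
    return sum(joint(*x) for x in product((0, 1), repeat=4))


def ancestral_sample(rng):
    """Sample each variable after its parents (topological order A, B, C, D)."""
    a = int(rng.random() < P_A1)
    b = int(rng.random() < P_B1_GIVEN_A[a])
    c = int(rng.random() < P_C1_GIVEN_B[b])
    d = int(rng.random() < P_D1_GIVEN_AC[(a, c)])
    return a, b, c, d


rng = random.Random(0)
print(total_probability(), [ancestral_sample(rng) for _ in range(3)])
```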

What are the factors in a MRF?

[Figure: triangle A − B − C]

Single clique: {A, B, C}, so P(a, b, c) = 1/Z f(a, b, c).

No way to represent P(a, b, c) = 1/Z f1(a, b) f2(b, c) f3(c, a).

Pairwise MRF: like a MRF, but factors are edges rather than cliques.

But what if we want to mix them?

[Figure: graph over A, B, C, D, E: the triangle A − B − C plus the clique {B, D, E}]

P(a, b, c, d, e) = 1/Z f1(a, b) f2(b, c) f3(c, a) f4(b, d, e)


Factor graphs

Explicitly represent factors in the graph to remove ambiguity.

P(a, b, c, d, e) = 1/Z f1(a, b) f2(b, c) f3(c, a) f4(b, d, e)

[Figure: factor graph with variable nodes A, B, C, D, E and factor nodes f1 (connected to A, B), f2 (B, C), f3 (C, A) and f4 (B, D, E)]

Definition (Factor graph)

A FG is a bipartite graph G with vertices in V ∪ F, where X1, . . . , Xn ∈ V are random variables and α ∈ F are factors, inducing a distribution

$$P(x_1, \ldots, x_n) = \frac{1}{Z} \prod_{\alpha \in F} f_\alpha(x_\alpha)$$

where fα ≥ 0, and Xα is the set of variables with an edge to factor α.

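A factor graph can be represented directly as a list of (scope, function) pairs, which makes the definition executable for small examples. The sketch below, with names of our own, computes Z and the normalized joint by enumerating every assignment, which is exponential in the number of variables and only meant for toy graphs. The factor values for f1, …, f4 are hypothetical (agreeing neighbours get weight 2; f4 gives weight 3 when B, D, E all agree); with them, Z = 28 · 6 = 168.

```python
from itertools import product


def brute_force(variables, factors, domain=(0, 1)):
    """Return Z and the normalized joint P(x) = (1/Z) prod_alpha f_alpha(x_alpha).

    Each factor is a pair (scope, fn): scope names the variables X_alpha the
    factor touches, and fn maps their values to a non-negative number.
    """
    scores = {}
    for values in product(domain, repeat=len(variables)):
        x = dict(zip(variables, values))
        score = 1.0
        for scope, fn in factors:
            score *= fn(*(x[v] for v in scope))
        scores[values] = score
    z = sum(scores.values())
    return z, {values: s / z for values, s in scores.items()}


def agree(u, v):
    """Hypothetical pairwise factor: neighbours that agree get weight 2."""
    return 2.0 if u == v else 1.0


def all_three_agree(b, d, e):
    """Hypothetical triple factor f4(b, d, e)."""
    return 3.0 if b == d == e else 1.0


variables = ("A", "B", "C", "D", "E")
factors = [(("A", "B"), agree), (("B", "C"), agree),
           (("C", "A"), agree), (("B", "D", "E"), all_three_agree)]
z, p = brute_force(variables, factors)
print(z)
```

Unlike the clique-based MRF, the triangle's three pairwise factors and the three-way factor f4 are explicit here, so there is no ambiguity about which factorization is meant.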

Factor graphs

• Any MRF can be mapped exactly to a FG (clique → factor).

• Any pairwise MRF can be mapped exactly to a FG (edge → factor).

• FGs are more general / more fine-grained.


Algorithms

• Inference: Given a FG with compatibility functions, answer queries.

• Maximization: Find the most likely assignment x1, . . . , xN (possibly given evidence xi : i ∈ E):

$$\arg\max_{x_1, \ldots, x_N} P(x_1, \ldots, x_N \mid x_E)$$

• Marginalization: Find the marginal probability of some partial assignment over xj : j ∈ M (possibly given evidence xi : i ∈ E):

$$P(x_M \mid x_E)$$

• NP-hard / #P-hard in general, but doable for tree-shaped graphs with dynamic programming.

• Learning: Given a dataset, estimate the compatibility tables (or, in general, a model that produces them).

Since BN → MRF → FG, it suffices to study inference algorithms for FG.¹

¹ But not learning, since we cannot map back to a BN losslessly!


Multiplying factors

A core operation: combining factors by multiplying them.

[Figure: factor graph A − f1 − B − f2 − C becomes a single factor g connected to A, B, C]

A B f1(a, b)
0 0 3
0 1 1
1 0 2
1 1 8

B C f2(b, c)
0 0 5
0 1 4
1 0 1
1 1 1

A B C g(a, b, c)
0 0 0 3 · 5 = 15
0 0 1 3 · 4 = 12
0 1 0 1 · 1 = 1
0 1 1 1 · 1 = 1
1 0 0 2 · 5 = 10
1 0 1 2 · 4 = 8
1 1 0 8 · 1 = 8
1 1 1 8 · 1 = 8

Distribution is preserved:

f1(a, b) · f2(b, c) · f3(. . . ) · . . . = g(a, b, c) · f3(. . . ) · . . .

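Factor multiplication is easy to write down once a factor is stored as a scope plus a table. In this sketch, our own representation, a factor is a tuple of variable names together with a dict from value tuples (in scope order) to non-negative numbers, and all variables are binary. The product's scope is the union of the two scopes, and each entry multiplies the matching entries of f1 and f2; run on the slide's tables, it reproduces the table for g(a, b, c).

```python
from itertools import product


def multiply(scope1, table1, scope2, table2):
    """Pointwise product of two factors over binary variables.

    A factor is a scope (tuple of variable names) plus a dict mapping value
    tuples, in scope order, to non-negative numbers.
    """
    scope = scope1 + tuple(v for v in scope2 if v not in scope1)
    table = {}
    for values in product((0, 1), repeat=len(scope)):
        x = dict(zip(scope, values))
        table[values] = (table1[tuple(x[v] for v in scope1)]
                         * table2[tuple(x[v] for v in scope2)])
    return scope, table


f1 = {(0, 0): 3, (0, 1): 1, (1, 0): 2, (1, 1): 8}   # f1(a, b) from the slide
f2 = {(0, 0): 5, (0, 1): 4, (1, 0): 1, (1, 1): 1}   # f2(b, c) from the slide
scope_g, g = multiply(("A", "B"), f1, ("B", "C"), f2)
print(scope_g, g)
```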

Maximizing over a variable

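[Figure: a factor f connected to A, B, C becomes a factor g connected to B, C]

A B C f(a, b, c)
0 0 0 15
0 0 1 12
0 1 0 1
0 1 1 1
1 0 0 10
1 0 1 8
1 1 0 8
1 1 1 8

Maximizing over A gives:

B C g(b, c)
0 0 15
0 1 12
1 0 8
1 1 8

$$\max_a \Big[ f(a, b, c) \cdot \underbrace{f_4(\ldots) \cdot \ldots}_{A\text{-free}} \Big] = g(b, c) \cdot f_4(\ldots) \cdot \ldots$$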

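Maximizing a variable out of a table factor can be sketched in the same scope-plus-dict representation, which is our own choice. For every setting of the remaining variables, the function keeps the largest entry and also records which value of the eliminated variable achieved it; those back-pointers are what a later backtrace needs to recover the argmax. Applied to f(a, b, c) above, it returns the slide's g(b, c).

```python
def max_out(scope, table, var):
    """Eliminate `var` by maximization; also record the maximizing value."""
    i = scope.index(var)
    new_scope = scope[:i] + scope[i + 1:]
    best, argbest = {}, {}
    for values, score in table.items():
        key = values[:i] + values[i + 1:]
        if key not in best or score > best[key]:
            best[key], argbest[key] = score, values[i]
    return new_scope, best, argbest


# f(a, b, c) from the table above.
f = {(0, 0, 0): 15, (0, 0, 1): 12, (0, 1, 0): 1, (0, 1, 1): 1,
     (1, 0, 0): 10, (1, 0, 1): 8, (1, 1, 0): 8, (1, 1, 1): 8}
scope_g, g, arg_a = max_out(("A", "B", "C"), f, "A")
print(scope_g, g, arg_a)
```

Pulling the max inside the product is valid only because the other factors do not mention A and are non-negative.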

Marginalizing over a variable

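[Figure: a factor f connected to A, B, C becomes a factor g connected to B, C]

A B C f(a, b, c)
0 0 0 15
0 0 1 12
0 1 0 1
0 1 1 1
1 0 0 10
1 0 1 8
1 1 0 8
1 1 1 8

Summing over A gives:

B C g(b, c)
0 0 25
0 1 20
1 0 9
1 1 9

$$\sum_a \Big[ f(a, b, c) \cdot \underbrace{f_4(\ldots) \cdot \ldots}_{A\text{-free}} \Big] = g(b, c) \cdot f_4(\ldots) \cdot \ldots$$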

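Marginalization has the same shape as maximization with the max replaced by a sum; this sketch again uses our own scope-plus-dict factor representation. Entries that agree on every variable except the eliminated one are added together, so applied to f(a, b, c) above it returns the slide's g(b, c), and the total mass (63 here) is unchanged.

```python
from collections import defaultdict


def sum_out(scope, table, var):
    """Eliminate `var` by summation (marginalization)."""
    i = scope.index(var)
    out = defaultdict(int)
    for values, score in table.items():
        out[values[:i] + values[i + 1:]] += score
    return scope[:i] + scope[i + 1:], dict(out)


# f(a, b, c) from the table above.
f = {(0, 0, 0): 15, (0, 0, 1): 12, (0, 1, 0): 1, (0, 1, 1): 1,
     (1, 0, 0): 10, (1, 0, 1): 8, (1, 1, 0): 8, (1, 1, 1): 8}
scope_g, g = sum_out(("A", "B", "C"), f, "A")
print(scope_g, g)
```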

Variable elimination

[Figure: chain factor graph A − fAB − B − fBC − C − fCD − D; eliminating D, C, B, A in turn produces the intermediate factors gC, hBC, gB, hAB, gA]

Query: $\max_{a,b,c,d} P(a, b, c, d) = ?$

A B fAB(a, b)
0 0 10
0 1 2
1 0 3
1 1 9

B C fBC(b, c)
0 0 1
0 1 3
1 0 1
1 1 2

C D fCD(c, d)
0 0 4
0 1 2
1 0 1
1 1 3

1 Pick order: D, C, B, A

2 Maximize over D (fCD → gC)

3 Multiply fBC with gC giving hBC

4 Maximize over C (hBC → gB)

5 Multiply fAB with gB giving hAB

6 Maximize over B (hAB → gA)

7 Maximize over A (gA → ∅)

8 Just like Viterbi!

C gC(c) argmax
0 4 D=0
1 3 D=1

B C hBC(b, c) argmax
0 0 1 · 4 = 4 D=0
0 1 3 · 3 = 9 D=1
1 0 1 · 4 = 4 D=0
1 1 2 · 3 = 6 D=1

B gB(b) argmax
0 9 C=1
1 6 C=1

A B hAB(a, b) argmax
0 0 10 · 9 = 90 C=1
0 1 2 · 6 = 12 C=1
1 0 3 · 9 = 27 C=1
1 1 9 · 6 = 54 C=1

A gA(a) argmax
0 90 B=0
1 54 B=1

The max is 90/Z.

Backtrace to get the argmax: (0, 0, 1, 1).

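The elimination steps above can be sketched as a max-product pass along the chain followed by a backtrace, much like Viterbi. This is our own minimal version, specialized to pairwise factors over binary variables: each call to `eliminate_right` forms an h table (pair factor times incoming message), maximizes out the right-hand variable to get the next g table, and stores back-pointers. With the slide's tables it returns the unnormalized maximum 90 and the assignment (0, 0, 1, 1).

```python
# Chain factors from the slide, keyed by (left value, right value).
f_AB = {(0, 0): 10, (0, 1): 2, (1, 0): 3, (1, 1): 9}
f_BC = {(0, 0): 1, (0, 1): 3, (1, 0): 1, (1, 1): 2}
f_CD = {(0, 0): 4, (0, 1): 2, (1, 0): 1, (1, 1): 3}


def eliminate_right(pair_factor, message):
    """Return max_r pair_factor(l, r) * message(r) for each l, plus back-pointers.

    The product pair_factor * message is the slide's h table; the maximum over
    r is the new g table, and the maximizing r is kept for the backtrace.
    """
    new_message, back = {}, {}
    for left in (0, 1):
        scores = {r: pair_factor[(left, r)] * message[r] for r in (0, 1)}
        back[left] = max(scores, key=scores.get)
        new_message[left] = scores[back[left]]
    return new_message, back


def max_product_chain():
    """Eliminate D, C, B, A in turn, then backtrace the best assignment."""
    g_C, back_D = eliminate_right(f_CD, {0: 1, 1: 1})   # max over D
    g_B, back_C = eliminate_right(f_BC, g_C)           # h_BC = f_BC * g_C, max over C
    g_A, back_B = eliminate_right(f_AB, g_B)           # h_AB = f_AB * g_B, max over B
    a = max(g_A, key=g_A.get)                          # max over A
    b = back_B[a]
    c = back_C[b]
    d = back_D[c]
    return g_A[a], (a, b, c, d)


print(max_product_chain())   # (90, (0, 0, 1, 1))
```

Dividing the returned 90 by Z gives the maximal probability; Z itself would come from the same pass with sums in place of maxes.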

Page 138: Lecture 7: Probabilistic Graphical Models · 2021. 1. 3. · Announcements Homework 2 is due today! Project midterm report is due next week! Homework 3 is out, the deadline is December

Variable elimination

A B C D

fAB fBC fCD

gCgB

gA

hBC

hAB

Query: maxa,b,c,d P(a, b, c, d) =?

A B fAB(a, b)

0 0 100 1 21 0 31 1 9

B C fBC (b, c)

0 0 10 1 31 0 11 1 2

C D fCD(c, d)

0 0 40 1 21 0 11 1 3

A gA(a)

0 90B=0

1 54B=1

B gB(b)

0 9C=1

1 6C=1

C gC (c)

0 4D=0

1 3D=1

A B hAB(a, b)

0 0 10 · 9 = 90C=1

0 1 2 · 6 = 12C=1

1 0 3 · 9 = 27C=1

1 1 9 · 6 = 54C=1

B C hBC (b, c)

0 0 1 · 4 = 4D=0

0 1 3 · 3 = 9D=1

1 0 1 · 4 = 4D=0

1 1 2 · 3 = 6D=1

1 Pick order: D, C, B, A

2 Maximize over D (fCD → gC )

3 Multiply fBC with gC givinghBC

4 Maximize over C (hBC → gB)

5 Multiply fAB with gB givinghAB

6 Maximize over B (hAB → gA)

7 Maximize over A (gA → ∅)8 Just like Viterbi!

The max is 90/Z.

Backtrace to getargmax : (0, 0, 1, 1).

Andre Martins (IST) Lecture 7: Probabilistic Graphical Models IST, Fall 2020 54 / 66

Page 139: Lecture 7: Probabilistic Graphical Models · 2021. 1. 3. · Announcements Homework 2 is due today! Project midterm report is due next week! Homework 3 is out, the deadline is December

Variable elimination

A B C D

fAB fBC fCD

gCgB

gA

hBC

hAB

Query: maxa,b,c,d P(a, b, c, d) =?

A B fAB(a, b)

0 0 100 1 21 0 31 1 9

B C fBC (b, c)

0 0 10 1 31 0 11 1 2

C D fCD(c, d)

0 0 40 1 21 0 11 1 3

A gA(a)

0 90B=0

1 54B=1

B gB(b)

0 9C=1

1 6C=1

C gC (c)

0 4D=0

1 3D=1

A B hAB(a, b)

0 0 10 · 9 = 90C=1

0 1 2 · 6 = 12C=1

1 0 3 · 9 = 27C=1

1 1 9 · 6 = 54C=1

B C hBC (b, c)

0 0 1 · 4 = 4D=0

0 1 3 · 3 = 9D=1

1 0 1 · 4 = 4D=0

1 1 2 · 3 = 6D=1

1 Pick order: D, C, B, A

2 Maximize over D (fCD → gC )

3 Multiply fBC with gC givinghBC

4 Maximize over C (hBC → gB)

5 Multiply fAB with gB givinghAB

6 Maximize over B (hAB → gA)

7 Maximize over A (gA → ∅)8 Just like Viterbi!

The max is 90/Z.

Backtrace to getargmax : (0, 0, 1, 1).

Andre Martins (IST) Lecture 7: Probabilistic Graphical Models IST, Fall 2020 54 / 66

Page 140: Lecture 7: Probabilistic Graphical Models · 2021. 1. 3. · Announcements Homework 2 is due today! Project midterm report is due next week! Homework 3 is out, the deadline is December

Variable elimination

A B C D

fAB fBC fCD

gCgB

gA

hBChAB

Query: maxa,b,c,d P(a, b, c, d) =?

A B fAB(a, b)

0 0 100 1 21 0 31 1 9

B C fBC (b, c)

0 0 10 1 31 0 11 1 2

C D fCD(c, d)

0 0 40 1 21 0 11 1 3

A gA(a)

0 90B=0

1 54B=1

B gB(b)

0 9C=1

1 6C=1

C gC (c)

0 4D=0

1 3D=1

A B hAB(a, b)

0 0 10 · 9 = 90C=1

0 1 2 · 6 = 12C=1

1 0 3 · 9 = 27C=1

1 1 9 · 6 = 54C=1

B C hBC (b, c)

0 0 1 · 4 = 4D=0

0 1 3 · 3 = 9D=1

1 0 1 · 4 = 4D=0

1 1 2 · 3 = 6D=1

1 Pick order: D, C, B, A

2 Maximize over D (fCD → gC )

3 Multiply fBC with gC givinghBC

4 Maximize over C (hBC → gB)

5 Multiply fAB with gB givinghAB

6 Maximize over B (hAB → gA)

7 Maximize over A (gA → ∅)8 Just like Viterbi!

The max is 90/Z.

Backtrace to getargmax : (0, 0, 1, 1).

Andre Martins (IST) Lecture 7: Probabilistic Graphical Models IST, Fall 2020 54 / 66

Page 141: Lecture 7: Probabilistic Graphical Models · 2021. 1. 3. · Announcements Homework 2 is due today! Project midterm report is due next week! Homework 3 is out, the deadline is December

Variable elimination

A B C D

fAB fBC fCD

gCgB

gA

hBChAB

Query: maxa,b,c,d P(a, b, c, d) =?

A B fAB(a, b)

0 0 100 1 21 0 31 1 9

B C fBC (b, c)

0 0 10 1 31 0 11 1 2

C D fCD(c, d)

0 0 40 1 21 0 11 1 3

A gA(a)

0 90B=0

1 54B=1

B gB(b)

0 9C=1

1 6C=1

C gC (c)

0 4D=0

1 3D=1

A B hAB(a, b)

0 0 10 · 9 = 90C=1

0 1 2 · 6 = 12C=1

1 0 3 · 9 = 27C=1

1 1 9 · 6 = 54C=1

B C hBC (b, c)

0 0 1 · 4 = 4D=0

0 1 3 · 3 = 9D=1

1 0 1 · 4 = 4D=0

1 1 2 · 3 = 6D=1

1 Pick order: D, C, B, A

2 Maximize over D (fCD → gC )

3 Multiply fBC with gC givinghBC

4 Maximize over C (hBC → gB)

5 Multiply fAB with gB givinghAB

6 Maximize over B (hAB → gA)

7 Maximize over A (gA → ∅)8 Just like Viterbi!

The max is 90/Z.

Backtrace to getargmax : (0, 0, 1, 1).

Andre Martins (IST) Lecture 7: Probabilistic Graphical Models IST, Fall 2020 54 / 66

Page 142: Lecture 7: Probabilistic Graphical Models · 2021. 1. 3. · Announcements Homework 2 is due today! Project midterm report is due next week! Homework 3 is out, the deadline is December

Variable elimination

A B C D

fAB fBC fCD

gCgBgAhBChAB

Query: maxa,b,c,d P(a, b, c, d) =?

A B fAB(a, b)

0 0 100 1 21 0 31 1 9

B C fBC (b, c)

0 0 10 1 31 0 11 1 2

C D fCD(c, d)

0 0 40 1 21 0 11 1 3

A gA(a)

0 90B=0

1 54B=1

B gB(b)

0 9C=1

1 6C=1

C gC (c)

0 4D=0

1 3D=1

A B hAB(a, b)

0 0 10 · 9 = 90C=1

0 1 2 · 6 = 12C=1

1 0 3 · 9 = 27C=1

1 1 9 · 6 = 54C=1

B C hBC (b, c)

0 0 1 · 4 = 4D=0

0 1 3 · 3 = 9D=1

1 0 1 · 4 = 4D=0

1 1 2 · 3 = 6D=1

1 Pick order: D, C, B, A

2 Maximize over D (fCD → gC )

3 Multiply fBC with gC givinghBC

4 Maximize over C (hBC → gB)

5 Multiply fAB with gB givinghAB

6 Maximize over B (hAB → gA)

7 Maximize over A (gA → ∅)8 Just like Viterbi!

The max is 90/Z.

Backtrace to getargmax : (0, 0, 1, 1).

Andre Martins (IST) Lecture 7: Probabilistic Graphical Models IST, Fall 2020 54 / 66

Page 143: Lecture 7: Probabilistic Graphical Models · 2021. 1. 3. · Announcements Homework 2 is due today! Project midterm report is due next week! Homework 3 is out, the deadline is December

Variable elimination

A B C D

fAB fBC fCD

gCgBgAhBChAB

Query: maxa,b,c,d P(a, b, c, d) =?

A B fAB(a, b)

0 0 100 1 21 0 31 1 9

B C fBC (b, c)

0 0 10 1 31 0 11 1 2

C D fCD(c, d)

0 0 40 1 21 0 11 1 3

A gA(a)

0 90B=0

1 54B=1

B gB(b)

0 9C=1

1 6C=1

C gC (c)

0 4D=0

1 3D=1

A B hAB(a, b)

0 0 10 · 9 = 90C=1

0 1 2 · 6 = 12C=1

1 0 3 · 9 = 27C=1

1 1 9 · 6 = 54C=1

B C hBC (b, c)

0 0 1 · 4 = 4D=0

0 1 3 · 3 = 9D=1

1 0 1 · 4 = 4D=0

1 1 2 · 3 = 6D=1

1 Pick order: D, C, B, A

2 Maximize over D (fCD → gC )

3 Multiply fBC with gC givinghBC

4 Maximize over C (hBC → gB)

5 Multiply fAB with gB givinghAB

6 Maximize over B (hAB → gA)

7 Maximize over A (gA → ∅)

8 Just like Viterbi!The max is 90/Z.

Backtrace to getargmax : (0, 0, 1, 1).

Andre Martins (IST) Lecture 7: Probabilistic Graphical Models IST, Fall 2020 54 / 66

Page 144: Lecture 7: Probabilistic Graphical Models · 2021. 1. 3. · Announcements Homework 2 is due today! Project midterm report is due next week! Homework 3 is out, the deadline is December

Variable elimination

A B C D

fAB fBC fCD

gCgBgAhBChAB

Query: maxa,b,c,d P(a, b, c, d) =?

A B fAB(a, b)

0 0 100 1 21 0 31 1 9

B C fBC (b, c)

0 0 10 1 31 0 11 1 2

C D fCD(c, d)

0 0 40 1 21 0 11 1 3

A gA(a)

0 90B=0

1 54B=1

B gB(b)

0 9C=1

1 6C=1

C gC (c)

0 4D=0

1 3D=1

A B hAB(a, b)

0 0 10 · 9 = 90C=1

0 1 2 · 6 = 12C=1

1 0 3 · 9 = 27C=1

1 1 9 · 6 = 54C=1

B C hBC (b, c)

0 0 1 · 4 = 4D=0

0 1 3 · 3 = 9D=1

1 0 1 · 4 = 4D=0

1 1 2 · 3 = 6D=1

1 Pick order: D, C, B, A

2 Maximize over D (fCD → gC )

3 Multiply fBC with gC givinghBC

4 Maximize over C (hBC → gB)

5 Multiply fAB with gB givinghAB

6 Maximize over B (hAB → gA)

7 Maximize over A (gA → ∅)8 Just like Viterbi!

The max is 90/Z.

Backtrace to getargmax : (0, 0, 1, 1).

Andre Martins (IST) Lecture 7: Probabilistic Graphical Models IST, Fall 2020 54 / 66

Page 145: Lecture 7: Probabilistic Graphical Models · 2021. 1. 3. · Announcements Homework 2 is due today! Project midterm report is due next week! Homework 3 is out, the deadline is December

Variable elimination: sum

[Figure: chain factor graph A - B - C - D with factors fAB, fBC, fCD; elimination produces gC, hBC, gB, hAB, gA.]

Query: $Z = \sum_{a,b,c,d} f(a, b, c, d) = ?$

A B fAB(a, b)
0 0 10
0 1 2
1 0 3
1 1 9

B C fBC(b, c)
0 0 1
0 1 3
1 0 1
1 1 2

C D fCD(c, d)
0 0 4
0 1 2
1 0 1
1 1 3

A gA(a)
0 208
1 180

B gB(b)
0 18
1 14

C gC(c)
0 6
1 4

A B hAB(a, b)
0 0 10 · 18 = 180
0 1 2 · 14 = 28
1 0 3 · 18 = 54
1 1 9 · 14 = 126

B C hBC(b, c)
0 0 1 · 6 = 6
0 1 3 · 4 = 12
1 0 1 · 6 = 6
1 1 2 · 4 = 8

1. Pick order: D, C, B, A.
2. Sum over D (fCD → gC).
3. Multiply fBC with gC, giving hBC.
4. Sum over C (hBC → gB).
5. Multiply fAB with gB, giving hAB.
6. Sum over B (hAB → gA).
7. Sum over A (gA → ∅).
8. Just like the Forward algorithm!

Z = 388, so P(0, 0, 1, 1) = 90/Z ≈ .23.
For free: P(A = 0) = 208/388 ≈ .54.

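Swapping max for sum in the elimination steps gives the partition function. A minimal Python sketch under the same assumptions as the slide (binary variables, tables fAB, fBC, fCD), with a hypothetical dictionary encoding and illustrative helper names `sum_out_right` and `multiply_right`:

```python
# Hypothetical dictionary encoding of the slide's factor tables.
f_AB = {(0, 0): 10, (0, 1): 2, (1, 0): 3, (1, 1): 9}
f_BC = {(0, 0): 1, (0, 1): 3, (1, 0): 1, (1, 1): 2}
f_CD = {(0, 0): 4, (0, 1): 2, (1, 0): 1, (1, 1): 3}
VALUES = (0, 1)


def sum_out_right(table):
    """Return g[left] = sum over right of table[left, right]."""
    return {left: sum(table[left, r] for r in VALUES) for left in VALUES}


def multiply_right(table, g):
    """Return h[left, right] = table[left, right] * g[right]."""
    return {(left, right): v * g[right] for (left, right), v in table.items()}


g_C = sum_out_right(f_CD)                       # sum over D: {0: 6, 1: 4}
g_B = sum_out_right(multiply_right(f_BC, g_C))  # h_BC -> g_B: {0: 18, 1: 14}
g_A = sum_out_right(multiply_right(f_AB, g_B))  # h_AB -> g_A: {0: 208, 1: 180}
Z = sum(g_A.values())                           # sum over A: 388

p_map = f_AB[0, 0] * f_BC[0, 1] * f_CD[1, 1] / Z  # P(0, 0, 1, 1) = 90 / Z
p_a0 = g_A[0] / Z                                 # P(A = 0): g_A(a) already sums out B, C, D
print(Z, round(p_map, 2), round(p_a0, 2))  # 388 0.23 0.54
```

The marginal P(A = 0) comes "for free" because gA(a) is exactly the sum of the unnormalized product over b, c, d with a held fixed.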
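Variable elimination: more complicated example

[Figure: chain factor graph A - B - C - D with factors fAB, fBC, fCD and evidence factor vD; elimination produces hCD, gC, hBC, hABC, gAC.]

Query: $P(a, c \mid D = 1) = ?$

A B fAB(a, b)
0 0 10
0 1 2
1 0 3
1 1 9

B C fBC(b, c)
0 0 1
0 1 3
1 0 1
1 1 2

C D fCD(c, d)
0 0 4
0 1 2
1 0 1
1 1 3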

A C gAC(a, c)
0 0 24
0 1 102
1 0 24
1 1 81

C gC(c)
0 2
1 3

D vD(d)
0 0
1 1

A B C hABC(a, b, c)
0 0 0 20
0 0 1 90
0 1 0 4
0 1 1 12
1 0 0 6
1 0 1 27
1 1 0 18
1 1 1 54

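B C hBC(b, c)
0 0 2
0 1 9
1 0 2
1 1 6

1. Introduce evidence: vD(d) = 1 if d = 1, else 0.
2. Pick order: D, C, B, A. Only D and B are summed out; A and C are the query variables and stay in the result.
3. Multiply all D factors (fCD · vD → hCD).
4. Sum over D (hCD → gC).
5. Multiply all C factors (fBC · gC → hBC).
6. Multiply all B factors (fAB · hBC → hABC).
7. Sum over B (hABC → gAC).

C D hCD(c, d)
0 0 0
0 1 2
1 0 0
1 1 3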

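A minimal Python sketch of conditioning through an evidence factor, assuming the same binary tables encoded as dictionaries; `condition_on_d` is a hypothetical helper, not from the lecture. The slide's steps stop at gAC; the final division by the sum of gAC, which turns it into P(a, c | D = 1), is added here as an assumed last step. With these factors, hABC(1, 0, 1) = fAB(1, 0) · hBC(0, 1) = 3 · 9 = 27, so gAC(1, 1) = 27 + 54 = 81.

```python
# Hypothetical dictionary encoding of the slide's factor tables.
f_AB = {(0, 0): 10, (0, 1): 2, (1, 0): 3, (1, 1): 9}
f_BC = {(0, 0): 1, (0, 1): 3, (1, 0): 1, (1, 1): 2}
f_CD = {(0, 0): 4, (0, 1): 2, (1, 0): 1, (1, 1): 3}
VALUES = (0, 1)


def condition_on_d(observed_d):
    """Return (g_AC, P(a, c | D = observed_d)) by variable elimination."""
    # Evidence factor v_D(d): 1 for the observed value, 0 otherwise.
    v_D = {d: 1 if d == observed_d else 0 for d in VALUES}
    # Multiply all D factors: h_CD(c, d) = f_CD(c, d) * v_D(d).
    h_CD = {(c, d): val * v_D[d] for (c, d), val in f_CD.items()}
    # Sum over D: g_C(c) = sum_d h_CD(c, d).
    g_C = {c: sum(h_CD[c, d] for d in VALUES) for c in VALUES}
    # Multiply all C factors: h_BC(b, c) = f_BC(b, c) * g_C(c).
    h_BC = {(b, c): val * g_C[c] for (b, c), val in f_BC.items()}
    # Multiply all B factors: h_ABC(a, b, c) = f_AB(a, b) * h_BC(b, c).
    h_ABC = {(a, b, c): f_AB[a, b] * h_BC[b, c]
             for a in VALUES for b in VALUES for c in VALUES}
    # Sum over B: g_AC(a, c) = sum_b h_ABC(a, b, c).
    g_AC = {(a, c): sum(h_ABC[a, b, c] for b in VALUES)
            for a in VALUES for c in VALUES}
    # Assumed final step: normalize over (a, c) to get a conditional distribution.
    total = sum(g_AC.values())
    return g_AC, {ac: val / total for ac, val in g_AC.items()}


g_AC, posterior = condition_on_d(1)
print(g_AC)  # {(0, 0): 24, (0, 1): 102, (1, 0): 24, (1, 1): 81}
print(round(posterior[0, 1], 2))  # 0.44
```

The four entries of gAC sum to 231, so, for example, P(A = 0, C = 1 | D = 1) = 102/231 ≈ 0.44. Zeroing out the D = 0 rows through vD is what makes the evidence act like a restriction of the sum.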
Andre Martins (IST) Lecture 7: Probabilistic Graphical Models IST, Fall 2020 56 / 66

Page 160: Lecture 7: Probabilistic Graphical Models · 2021. 1. 3. · Announcements Homework 2 is due today! Project midterm report is due next week! Homework 3 is out, the deadline is December

Variable elimination: more complicated example

A B C D

fAB fBC fCD

gC

vD

hCDhBChABC

gAC

Query: P(a, c | D = 1) =?

A B fAB(a, b)

0 0 100 1 21 0 31 1 9

B C fBC (b, c)

0 0 10 1 31 0 11 1 2

C D fCD(c, d)

0 0 40 1 21 0 11 1 3

A C gAC (a, c)

0 0 240 1 1021 0 241 1 72

C gC (c)

0 21 3

D vD(d)

0 01 1

A B C hABC (a, b, c)

0 0 0 200 0 1 900 1 0 40 1 1 121 0 0 61 0 1 181 1 0 181 1 1 54

B C hBC (b, c)

0 0 20 1 91 0 21 1 6

1 Introduce evidence!

2 Pick order: D, C, B, A

3 Multiply all D factors

4 Sum over D (hCD → gC )

5 Multiply all C factors

6 Multiply all B factors

7 Sum over B.

C D hCD(c, d)

0 0 00 1 21 0 01 1 3

Andre Martins (IST) Lecture 7: Probabilistic Graphical Models IST, Fall 2020 56 / 66

Page 161: Lecture 7: Probabilistic Graphical Models · 2021. 1. 3. · Announcements Homework 2 is due today! Project midterm report is due next week! Homework 3 is out, the deadline is December

Variable elimination: more complicated example

A B C D

fAB fBC fCD

gC

vD

hCDhBChABC

gAC

Query: P(a, c | D = 1) =?

A B fAB(a, b)

0 0 100 1 21 0 31 1 9

B C fBC (b, c)

0 0 10 1 31 0 11 1 2

C D fCD(c, d)

0 0 40 1 21 0 11 1 3

A C gAC (a, c)

0 0 240 1 1021 0 241 1 72

C gC (c)

0 21 3

D vD(d)

0 01 1

A B C hABC (a, b, c)

0 0 0 200 0 1 900 1 0 40 1 1 121 0 0 61 0 1 181 1 0 181 1 1 54

B C hBC (b, c)

0 0 20 1 91 0 21 1 6

1 Introduce evidence!

2 Pick order: D, C, B, A

3 Multiply all D factors

4 Sum over D (hCD → gC )

5 Multiply all C factors

6 Multiply all B factors

7 Sum over B.

C D hCD(c, d)

0 0 00 1 21 0 01 1 3

Andre Martins (IST) Lecture 7: Probabilistic Graphical Models IST, Fall 2020 56 / 66

Page 162: Lecture 7: Probabilistic Graphical Models · 2021. 1. 3. · Announcements Homework 2 is due today! Project midterm report is due next week! Homework 3 is out, the deadline is December

Variable elimination: more complicated example

A B C D

fAB fBC fCD

gC

vD

hCDhBChABC

gAC

Query: P(a, c | D = 1) =?

A B fAB(a, b)

0 0 100 1 21 0 31 1 9

B C fBC (b, c)

0 0 10 1 31 0 11 1 2

C D fCD(c, d)

0 0 40 1 21 0 11 1 3

A C gAC (a, c)

0 0 240 1 1021 0 241 1 72

C gC (c)

0 21 3

D vD(d)

0 01 1

A B C hABC (a, b, c)

0 0 0 200 0 1 900 1 0 40 1 1 121 0 0 61 0 1 181 1 0 181 1 1 54

B C hBC (b, c)

0 0 20 1 91 0 21 1 6

1 Introduce evidence!

2 Pick order: D, C, B, A

3 Multiply all D factors

4 Sum over D (hCD → gC )

5 Multiply all C factors

6 Multiply all B factors

7 Sum over B.

C D hCD(c, d)

0 0 00 1 21 0 01 1 3

Andre Martins (IST) Lecture 7: Probabilistic Graphical Models IST, Fall 2020 56 / 66

Page 163: Lecture 7: Probabilistic Graphical Models · 2021. 1. 3. · Announcements Homework 2 is due today! Project midterm report is due next week! Homework 3 is out, the deadline is December

Variable elimination: more complicated example

A B C D

fAB fBC fCD

gC

vDhCD

hBChABC

gAC

Query: P(a, c | D = 1) =?

A B fAB(a, b)

0 0 100 1 21 0 31 1 9

B C fBC (b, c)

0 0 10 1 31 0 11 1 2

C D fCD(c, d)

0 0 40 1 21 0 11 1 3

A C gAC (a, c)

0 0 240 1 1021 0 241 1 72

C gC (c)

0 21 3

D vD(d)

0 01 1

A B C hABC (a, b, c)

0 0 0 200 0 1 900 1 0 40 1 1 121 0 0 61 0 1 181 1 0 181 1 1 54

B C hBC (b, c)

0 0 20 1 91 0 21 1 6

1 Introduce evidence!

2 Pick order: D, C, B, A

3 Multiply all D factors

4 Sum over D (hCD → gC )

5 Multiply all C factors

6 Multiply all B factors

7 Sum over B.

C D hCD(c, d)

0 0 00 1 21 0 01 1 3

Andre Martins (IST) Lecture 7: Probabilistic Graphical Models IST, Fall 2020 56 / 66

Page 164: Lecture 7: Probabilistic Graphical Models · 2021. 1. 3. · Announcements Homework 2 is due today! Project midterm report is due next week! Homework 3 is out, the deadline is December

Variable elimination: more complicated example

A B C D

fAB fBC fCD

gC

vDhCD

hBChABC

gAC

Query: P(a, c | D = 1) =?

A B fAB(a, b)

0 0 100 1 21 0 31 1 9

B C fBC (b, c)

0 0 10 1 31 0 11 1 2

C D fCD(c, d)

0 0 40 1 21 0 11 1 3

A C gAC (a, c)

0 0 240 1 1021 0 241 1 72

C gC (c)

0 21 3

D vD(d)

0 01 1

A B C hABC (a, b, c)

0 0 0 200 0 1 900 1 0 40 1 1 121 0 0 61 0 1 181 1 0 181 1 1 54

B C hBC (b, c)

0 0 20 1 91 0 21 1 6

1 Introduce evidence!

2 Pick order: D, C, B, A

3 Multiply all D factors

4 Sum over D (hCD → gC )

5 Multiply all C factors

6 Multiply all B factors

7 Sum over B.

C D hCD(c, d)

0 0 00 1 21 0 01 1 3

Andre Martins (IST) Lecture 7: Probabilistic Graphical Models IST, Fall 2020 56 / 66

Page 165: Lecture 7: Probabilistic Graphical Models · 2021. 1. 3. · Announcements Homework 2 is due today! Project midterm report is due next week! Homework 3 is out, the deadline is December

Variable elimination: more complicated example

A B C D

fAB fBC fCD

gC vDhCD

hBChABC

gAC

Query: P(a, c | D = 1) =?

A B fAB(a, b)

0 0 100 1 21 0 31 1 9

B C fBC (b, c)

0 0 10 1 31 0 11 1 2

C D fCD(c, d)

0 0 40 1 21 0 11 1 3

A C gAC (a, c)

0 0 240 1 1021 0 241 1 72

C gC (c)

0 21 3

D vD(d)

0 01 1

A B C hABC (a, b, c)

0 0 0 200 0 1 900 1 0 40 1 1 121 0 0 61 0 1 181 1 0 181 1 1 54

B C hBC (b, c)

0 0 20 1 91 0 21 1 6

1 Introduce evidence!

2 Pick order: D, C, B, A

3 Multiply all D factors

4 Sum over D (hCD → gC )

5 Multiply all C factors

6 Multiply all B factors

7 Sum over B.

C D hCD(c, d)

0 0 00 1 21 0 01 1 3

Andre Martins (IST) Lecture 7: Probabilistic Graphical Models IST, Fall 2020 56 / 66

Page 166: Lecture 7: Probabilistic Graphical Models · 2021. 1. 3. · Announcements Homework 2 is due today! Project midterm report is due next week! Homework 3 is out, the deadline is December

Variable elimination: more complicated example

A B C D

fAB fBC fCD

gC vDhCD

hBChABC

gAC

Query: P(a, c | D = 1) =?

A B fAB(a, b)

0 0 100 1 21 0 31 1 9

B C fBC (b, c)

0 0 10 1 31 0 11 1 2

C D fCD(c, d)

0 0 40 1 21 0 11 1 3

A C gAC (a, c)

0 0 240 1 1021 0 241 1 72

C gC (c)

0 21 3

D vD(d)

0 01 1

A B C hABC (a, b, c)

0 0 0 200 0 1 900 1 0 40 1 1 121 0 0 61 0 1 181 1 0 181 1 1 54

B C hBC (b, c)

0 0 20 1 91 0 21 1 6

1 Introduce evidence!

2 Pick order: D, C, B, A

3 Multiply all D factors

4 Sum over D (hCD → gC )

5 Multiply all C factors

6 Multiply all B factors

7 Sum over B.

C D hCD(c, d)

0 0 00 1 21 0 01 1 3


Variable elimination

• Answers any query involving maximization, marginalization, or evidence!

• Complexity depends on the elimination order: $O(n\,k^M)$, where n = number of variables, k = number of values per variable, and M = number of variables in the largest intermediate factor.

• Example: in a chain, the intuitive order (from one end) has M = 2; eliminating from the middle of the chain gives M = 3.

• An extreme example is a star graph: best case M = 2 (leaves first), worst case M = N (center first).

[Figure: star graph with center A and leaves B1, B2, B3, ..., Bj.]

• In chains and trees, the optimal order is easy to find; in general graphs it is not.

• Each new query requires rerunning the algorithm from scratch!
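The effect of the elimination order on M can be checked by simulating elimination on the graph structure alone. A minimal sketch, using a hypothetical helper `max_factor_size` (not from the lecture): eliminating X creates an intermediate factor over X and its current neighbors, and summing X out links those neighbors to each other. The function returns the number of variables in the largest such factor.

```python
def max_factor_size(edges, order):
    """Number of variables in the largest intermediate factor created when the
    variables of an undirected graph are eliminated in the given order."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    largest = 0
    for x in order:
        nbrs = neighbors.pop(x, set())
        # Multiplying all factors that mention x gives a factor over x and its neighbors.
        largest = max(largest, len(nbrs) + 1)
        # Summing x out leaves a factor over the neighbors, which become mutually linked.
        for u in nbrs:
            neighbors[u].discard(x)
            neighbors[u].update(nbrs - {u})
    return largest

chain = [("A", "B"), ("B", "C"), ("C", "D")]
print(max_factor_size(chain, ["A", "B", "C", "D"]))  # 2: eliminate from one end
print(max_factor_size(chain, ["B", "C", "A", "D"]))  # 3: start in the middle

leaves = [f"B{j}" for j in range(1, 6)]
star = [("A", leaf) for leaf in leaves]
print(max_factor_size(star, leaves + ["A"]))  # 2: leaves first
print(max_factor_size(star, ["A"] + leaves))  # 6: hub first, so M = N
```

Here the star has N = 6 variables (the hub plus five leaves), so eliminating the hub first produces one factor over every variable.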


Variable elimination as message passing

[Figure: chain A - B - C - D, annotated with P(b) and P(c).]

• Optimal order: A, D, C (or D, C, A)

• At each step, we eliminate a variable Y by multiplying (at most) two factors² and summing over Y:

$$g_{Y \to X}(x) = \sum_{y} f_{XY}(x, y)\, g_Y(y)$$

• These intermediate results (“messages”) are shared across all queries, so let’s compute all messages up front!

² At most two, because the graph is a tree.
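A minimal sketch of computing all messages up front on a chain, assuming binary variables and reusing the pairwise tables fAB, fBC, fCD from the elimination example purely as illustrative values (no evidence here). Each message follows gY→X(x) = Σy fXY(x, y) gY(y); one left-to-right and one right-to-left sweep produce every message once, and each marginal is then the normalized product of the two messages arriving at that variable.

```python
from itertools import product

K = (0, 1)  # binary domain (assumed)
# Pairwise factors of the chain A - B - C - D; f[i][u, v] links variable i (value u)
# and variable i + 1 (value v). Values reused from the elimination example.
f = [
    {(0, 0): 10, (0, 1): 2, (1, 0): 3, (1, 1): 9},  # f_AB
    {(0, 0): 1, (0, 1): 3, (1, 0): 1, (1, 1): 2},   # f_BC
    {(0, 0): 4, (0, 1): 2, (1, 0): 1, (1, 1): 3},   # f_CD
]
n = len(f) + 1  # number of variables in the chain

# fwd[i]: message reaching variable i from the left (all variables before i summed out).
fwd = [{x: 1 for x in K}]
for i in range(n - 1):
    fwd.append({x: sum(f[i][y, x] * fwd[i][y] for y in K) for x in K})

# bwd[i]: message reaching variable i from the right (all variables after i summed out).
bwd = [{x: 1 for x in K}]
for i in reversed(range(n - 1)):
    bwd.insert(0, {x: sum(f[i][x, y] * bwd[0][y] for y in K) for x in K})

def marginal(i):
    """P(x_i) from the two precomputed messages arriving at variable i."""
    unnorm = {x: fwd[i][x] * bwd[i][x] for x in K}
    z = sum(unnorm.values())
    return {x: v / z for x, v in unnorm.items()}

# Brute-force check of every marginal against the full joint over all 2^n assignments.
joint = {xs: f[0][xs[0], xs[1]] * f[1][xs[1], xs[2]] * f[2][xs[2], xs[3]]
         for xs in product(K, repeat=n)}
Z = sum(joint.values())
max_error = max(abs(marginal(i)[x] - sum(w for xs, w in joint.items() if xs[i] == x) / Z)
                for i in range(n) for x in K)
print(marginal(1), marginal(2))  # P(b) and P(c), from the same set of messages
```

The 2(n - 1) messages cost the same as two runs of variable elimination, yet answer all n single-variable queries.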


Motivating Example: Counting Soldiers

(Adapted from MacKay 2003 and Gormley & Eisner ACL’14 tutorial.)


Message passing in a tree FG

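• Messages from variable X to factor α aggregate the beliefs about X coming from all other neighboring factors (for leaves, this message is 1):

$$\nu_{X \to \alpha}(x) = \prod_{\beta \in N(X) \setminus \{\alpha\}} \mu_{\beta \to X}(x)$$

• Messages from factor α to variable X marginalize over all assignments $y_1, \ldots, y_k$ of the other variables $Y_1, \ldots, Y_k$ neighboring α:

$$\mu_{\alpha \to X}(x) = \sum_{y_1, \ldots, y_k} f_\alpha(x, y_1, \ldots, y_k) \prod_{Y_i \in N(\alpha) \setminus \{X\}} \nu_{Y_i \to \alpha}(y_i), \qquad \{Y_1, \ldots, Y_k\} = N(\alpha) \setminus \{X\}$$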

• A message is sent once all messages it depends on have been received.

• For a chain, this is forward-backward! For a tree, pass messages from the leaves to the root and back.

• If new evidence is added, many messages don’t change.

• Replace sum with max for maximization.
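The two message equations can be sketched as a small recursive program. This assumes a tree-structured factor graph (on a loopy graph the recursion below would never terminate), small discrete domains, and factor tables given as Python dicts; `tree_bp` and its arguments are hypothetical names. Memoization ensures each message is computed once, only after the messages it depends on, and passing `reduce=max` swaps the sum for a max to get max-marginals.

```python
from functools import lru_cache
from itertools import product
from math import prod

def tree_bp(domains, factors, reduce=sum):
    """Belief propagation on a tree-structured factor graph.

    domains: {variable: tuple of values}
    factors: {name: (scope, table)}; table maps a tuple of values, ordered as
             in scope, to a non-negative factor value.
    reduce:  sum for sum-product (marginals), max for max-product (max-marginals).
    Returns normalized variable beliefs {variable: {value: belief}}.
    """
    neighbors = {X: [a for a, (scope, _) in factors.items() if X in scope]
                 for X in domains}

    @lru_cache(maxsize=None)
    def nu(X, alpha):
        # Variable -> factor: product of messages from all other factors of X.
        # A leaf variable has no other factors, so the empty product gives 1.
        return {x: prod(mu(beta, X)[x] for beta in neighbors[X] if beta != alpha)
                for x in domains[X]}

    @lru_cache(maxsize=None)
    def mu(alpha, X):
        # Factor -> variable: reduce over assignments to alpha's other variables.
        scope, table = factors[alpha]
        pos = scope.index(X)
        msg = {}
        for x in domains[X]:
            terms = []
            for values in product(*(domains[Y] for Y in scope)):
                if values[pos] != x:
                    continue
                weight = table[values]
                for Y, y in zip(scope, values):
                    if Y != X:
                        weight *= nu(Y, alpha)[y]
                terms.append(weight)
            msg[x] = reduce(terms)
        return msg

    beliefs = {}
    for X in domains:
        b = {x: prod(mu(alpha, X)[x] for alpha in neighbors[X]) for x in domains[X]}
        z = sum(b.values())
        beliefs[X] = {x: v / z for x, v in b.items()}
    return beliefs

# The chain A - B - C - D from the elimination example, with evidence D = 1.
domains = {V: (0, 1) for V in "ABCD"}
factors = {
    "fAB": (("A", "B"), {(0, 0): 10, (0, 1): 2, (1, 0): 3, (1, 1): 9}),
    "fBC": (("B", "C"), {(0, 0): 1, (0, 1): 3, (1, 0): 1, (1, 1): 2}),
    "fCD": (("C", "D"), {(0, 0): 4, (0, 1): 2, (1, 0): 1, (1, 1): 3}),
    "vD": (("D",), {(0,): 0, (1,): 1}),  # evidence indicator
}
marginals = tree_bp(domains, factors)                  # sum-product
max_marginals = tree_bp(domains, factors, reduce=max)  # max-product
```

On this chain the sum-product belief for C = 0 should be 48/231, and taking the argmax of each max-marginal recovers the single most likely assignment A = 0, B = 0, C = 1, D = 1.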


From messages to beliefs

• Once all messages have been collected, we can compute local beliefs.

• Variable beliefs:

$$p_X(x) \propto \prod_{\alpha \in N(X)} \mu_{\alpha \to X}(x)$$

• Factor beliefs:

$$p_\alpha(x_1, \ldots, x_k) \propto f_\alpha(x_1, \ldots, x_k) \prod_{X_i \in N(\alpha)} \nu_{X_i \to \alpha}(x_i)$$

• If there are no cycles, then once all messages are passed, the beliefs are the true marginals:

$$p_X(x) = P(x), \qquad p_\alpha(x_1, \ldots, x_k) = P(x_1, \ldots, x_k).$$

• What to do if there are cycles?
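For the tree case, the factor-belief formula can be checked by hand on the smallest case with an interior variable: the chain A - fAB - B - fBC - C with a unary factor uC on C. A minimal sketch, assuming binary variables; uC is set to gC = (2, 3) from the elimination example, so the belief of fAB should equal P(a, b | D = 1) in that example. Only the messages flowing into fAB are needed.

```python
from itertools import product

K = (0, 1)
f_AB = {(0, 0): 10, (0, 1): 2, (1, 0): 3, (1, 1): 9}
f_BC = {(0, 0): 1, (0, 1): 3, (1, 0): 1, (1, 1): 2}
u_C = {0: 2, 1: 3}  # unary factor on C, set to g_C from the elimination example

# Messages flowing into f_AB on the tree  A - f_AB - B - f_BC - C - u_C.
nu_C_to_fBC = dict(u_C)                      # C's only other factor is u_C
mu_fBC_to_B = {b: sum(f_BC[b, c] * nu_C_to_fBC[c] for c in K) for b in K}
nu_B_to_fAB = mu_fBC_to_B                    # B's only other factor is f_BC
nu_A_to_fAB = {a: 1 for a in K}              # A is a leaf: message is 1

# Factor belief: f_AB times its incoming variable-to-factor messages, normalized.
unnorm = {(a, b): f_AB[a, b] * nu_A_to_fAB[a] * nu_B_to_fAB[b] for a, b in product(K, K)}
z = sum(unnorm.values())
belief_AB = {ab: v / z for ab, v in unnorm.items()}

# Exact pairwise marginal P(a, b) by brute-force enumeration, for comparison.
joint = {(a, b, c): f_AB[a, b] * f_BC[b, c] * u_C[c] for a, b, c in product(K, K, K)}
Z = sum(joint.values())
exact_AB = {(a, b): sum(joint[a, b, c] for c in K) / Z for a, b in product(K, K)}
print(belief_AB[0, 0], exact_AB[0, 0])  # both equal 110/231 ≈ 0.476
```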


Counting Soldiers with Loops


Inference in loopy graphs

• Exact solution: the Junction Tree algorithm:
  • convert the graph into a tree by merging cliques!
  • Complexity: like variable elimination. Finding the best tree is NP-hard (it corresponds to finding the best ordering for variable elimination).
  • Better than VE because we get all marginals at once.

• Approximate solution: Loopy Belief Propagation:
  • initialize all messages;
  • pass messages in some order until convergence;
  • (it may not converge, and the result is not guaranteed to be correct, but it often works well in practice.)

• Many recent algorithms (early 2010s).
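Loopy belief propagation can be sketched on the smallest loopy graph, a triangle. This minimal version assumes a pairwise MRF over binary variables and uses the equivalent variable-to-variable message form, in which each pairwise factor's message is folded into its neighbor's; the function name, potentials, and stopping rule are illustrative. Messages start uniform, are updated sequentially in a fixed order, and are normalized at every step to avoid underflow; the loop stops once no message changes by more than `tol`, or gives up after `max_iters` sweeps, since convergence is not guaranteed in general.

```python
import math
from itertools import product

def loopy_bp(unary, pairwise, edges, max_iters=100, tol=1e-10):
    """Loopy BP on a pairwise MRF with binary variables.

    unary:    {i: [psi_i(0), psi_i(1)]}
    pairwise: {(i, j): psi_ij as a 2x2 nested list, psi_ij[x_i][x_j]}, keyed by edges
    Returns (beliefs, converged).
    """
    nbrs = {i: set() for i in unary}
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)

    def psi(i, j, xi, xj):
        # Pairwise potential looked up in whichever orientation the edge was given.
        return pairwise[i, j][xi][xj] if (i, j) in pairwise else pairwise[j, i][xj][xi]

    # Initialize all messages m[i, j] (from i to j) to uniform.
    msgs = {(i, j): [0.5, 0.5] for i in nbrs for j in nbrs[i]}
    converged = False
    for _ in range(max_iters):
        delta = 0.0
        for i, j in msgs:  # one sweep of sequential updates in a fixed order
            new = [sum(unary[i][xi] * psi(i, j, xi, xj)
                       * math.prod(msgs[k, i][xi] for k in nbrs[i] - {j})
                       for xi in (0, 1))
                   for xj in (0, 1)]
            z = sum(new)
            new = [v / z for v in new]
            delta = max(delta, max(abs(a - b) for a, b in zip(new, msgs[i, j])))
            msgs[i, j] = new
        if delta < tol:
            converged = True
            break

    beliefs = {}
    for i in nbrs:
        b = [unary[i][x] * math.prod(msgs[k, i][x] for k in nbrs[i]) for x in (0, 1)]
        beliefs[i] = [v / sum(b) for v in b]
    return beliefs, converged

# A triangle (one cycle) with couplings that favor agreement.
unary = {0: [1.0, 2.0], 1: [1.0, 1.0], 2: [3.0, 1.0]}
edges = [(0, 1), (1, 2), (2, 0)]
pairwise = {e: [[2.0, 1.0], [1.0, 2.0]] for e in edges}
beliefs, converged = loopy_bp(unary, pairwise, edges)

# Exact marginals by enumerating all 2^3 assignments, for comparison.
exact = {i: [0.0, 0.0] for i in unary}
for x in product((0, 1), repeat=3):
    w = math.prod(unary[i][x[i]] for i in unary)
    w *= math.prod(pairwise[i, j][x[i]][x[j]] for i, j in edges)
    for i in unary:
        exact[i][x[i]] += w
exact = {i: [v / sum(p) for v in p] for i, p in exact.items()}
```

On this single-cycle example the beliefs should come out close to, but generally not equal to, the exact marginals obtained by enumeration.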


CRFs for any factor graph

So far we took the factor scores as given. We can instead learn to model them:

Use some model (neural or feature-based) to produce unary scores:

$$f_A(y) = \exp s_{A,y} = \exp\left(\mathbf{w}_{A,y} \cdot \phi_A(x)\right) \quad \text{(for example)}$$

and pairwise scores:

$$f_{AB}(y, y') = \exp s_{AB,y,y'} = \exp\left(\mathbf{w}_{A,B,y,y'} \cdot \phi_{A,B}(x)\right) \quad \text{(for example)}$$

(In general, factor scores are $f_\alpha(y_\alpha) = \exp s_{\alpha, y_\alpha}$.)

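The probability of an entire labeling y is then

$$P(y \mid x) = \frac{\prod_\alpha f_\alpha(y_\alpha)}{Z}, \quad \text{meaning} \quad \log P(y \mid x) = \sum_\alpha s_{\alpha, y_\alpha} - \log Z$$

Gradient updates with respect to a factor's scores:

$$\frac{\partial \log P(y \mid x)}{\partial s_{\alpha, y_\alpha}} = [\![ y_\alpha = y_\alpha^{\text{true}} ]\!] - P(y_\alpha \mid x)$$

The updates use the factor beliefs $P(y_\alpha \mid x) = p_\alpha(y_\alpha)$ for each factor!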
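The gradient formula can be checked numerically on the smallest interesting factor graph: two variables A and B with unary and pairwise factors. This sketch assumes three labels and arbitrary illustrative score values, and computes log Z by brute-force enumeration rather than by message passing; `log_prob` and `pairwise_gradient` are hypothetical helper names. Each pairwise-score gradient, the indicator minus the factor belief, should match a central finite-difference estimate.

```python
import copy
import math
from itertools import product

LABELS = (0, 1, 2)  # assumed label set for both variables

def log_prob(scores, y):
    """log P(y | x) = sum_alpha s_{alpha, y_alpha} - log Z, with Z by brute force.
    scores: {"A": [...], "B": [...], "AB": [[...], ...]}, y = (y_A, y_B)."""
    def total(ya, yb):
        return scores["A"][ya] + scores["B"][yb] + scores["AB"][ya][yb]
    log_z = math.log(sum(math.exp(total(ya, yb)) for ya, yb in product(LABELS, LABELS)))
    return total(*y) - log_z

def pairwise_gradient(scores, y_true):
    """d log P(y_true | x) / d s_{AB, y, y'} = [[(y, y') == y_true]] - P(y, y' | x).
    With only two variables, the factor belief P(y, y' | x) is the full joint."""
    return [[float((ya, yb) == y_true) - math.exp(log_prob(scores, (ya, yb)))
             for yb in LABELS] for ya in LABELS]

# Illustrative scores, e.g. as produced by some neural or feature-based model.
scores = {"A": [0.5, -1.0, 0.2], "B": [0.0, 0.3, -0.4],
          "AB": [[1.0, -0.5, 0.0], [0.2, 0.8, -1.0], [0.0, 0.1, 0.4]]}
y_true = (1, 1)
grad = pairwise_gradient(scores, y_true)

# Compare with a central finite-difference estimate of each partial derivative.
eps = 1e-5
max_error = 0.0
for ya, yb in product(LABELS, LABELS):
    plus, minus = copy.deepcopy(scores), copy.deepcopy(scores)
    plus["AB"][ya][yb] += eps
    minus["AB"][ya][yb] -= eps
    numeric = (log_prob(plus, y_true) - log_prob(minus, y_true)) / (2 * eps)
    max_error = max(max_error, abs(numeric - grad[ya][yb]))
print(max_error)  # tiny: analytic and numeric gradients agree
```

The gradient is positive only at the true configuration and negative elsewhere, and its entries sum to zero, since both the indicator and the factor belief sum to one.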


Undirected models: summary

• MRFs and pairwise MRFs are both special cases of factor graphs (FGs).

• Powerful and expressive; widely used for discriminative modelling.

• Exact inference is possible when the graph is not loopy.

• We’ve seen some ideas of what to do when it is loopy.

• We did not cover more advanced approaches relating message passing and dual decomposition (Martins et al., 2015; Kolmogorov, 2006; Komodakis et al., 2007; Globerson and Jaakkola, 2007).

• For learning: a generalization of linear-chain CRFs


References I

Dawid, A. P. (2010). Beware of the DAG! In Causality: objectives and assessment, pages 59–86.

Globerson, A. and Jaakkola, T. (2007). Fixing Max-Product: Convergent message passing algorithms for MAP LP-relaxations. In Proc. of NeurIPS.

Koller, D. and Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques. The MIT Press.

Kolmogorov, V. (2006). Convergent Tree-Reweighted Message Passing for energy minimization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(10):1568–1583.

Komodakis, N., Paragios, N., and Tziritas, G. (2007). MRF optimization via dual decomposition: Message-Passing revisited. In Proc. of ICCV.

MacKay, D. (2003). Information Theory, Inference, and Learning Algorithms, volume 7. Cambridge University Press.

Martins, A. F., Figueiredo, M. A., Aguiar, P. M., Smith, N. A., and Xing, E. P. (2015). AD3: Alternating directions dual decomposition for MAP inference in graphical models. JMLR, 16(1):495–545.

Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann.

Pearl, J. (2000). Causality: models, reasoning and inference, volume 29. Springer.

Pearl, J. (2012). The do-calculus revisited. arXiv preprint arXiv:1210.4852.
