MIT OpenCourseWare http://ocw.mit.edu
6.047 / 6.878 Computational Biology: Genomes, Networks, Evolution — Fall 2008
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
Introduction to Bayesian Networks
6.047/6.878 Computational Biology: Genomes, Networks, Evolution
Overview
• We have looked at a number of graphical representations of probability distributions
  – DAG example: HMM
  – Undirected graph example: CRF
• Today we will look at a very general graphical model representation – Bayesian Networks
• One application – modeling gene expression
• Aviv Regev guest lecture – an extension of this basic idea
Probabilistic Reconstruction
• Expression data gives us information about what genes tend to be expressed with others
• In probability terms, information about the joint distribution over gene states X:
P(X)=P(X1,X2,X3,X4,…,Xm)
Can we model this joint distribution?
Bayesian Networks
• Directed graph encoding the joint distribution over variables X

P(X) = P(X1, X2, X3, …, XN)

• Captures information about the dependency structure of P(X)
• Learning approaches
• Inference algorithms
Example 1 – Icy Roads
[Figure: I (Icy Roads) → W (Watson Crashes), I → H (Holmes Crashes); the arrows denote causal impact]

Assume we learn that Watson has crashed. Given this causal network, one might fear Holmes has crashed too. Why?
Example 1 – Icy Roads
[Figure: I (Icy Roads) → W (Watson Crashes), I → H (Holmes Crashes)]

Now imagine we have learned that the roads are not icy. We would no longer have an increased fear that Holmes has crashed.
Conditional Independence
[Figure: I (Icy Roads) → W (Watson Crashes), I → H (Holmes Crashes)]

If we know nothing about I, W and H are dependent. If we know I, W and H are conditionally independent.
Conditional Independence

• Independence of two random variables:

X ⊥ Y ⇔ P(X,Y) = P(X) P(Y)

• Conditional independence given a third:

X ⊥ Y | Z ⇔ P(X,Y|Z) = P(X|Z) P(Y|Z)
but not necessarily P(X,Y) = P(X) P(Y)
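These two definitions can be checked numerically on the Icy Roads network. A minimal sketch in Python — the CPD numbers here are illustrative assumptions, not from the slides:

```python
# Icy Roads: I -> W, I -> H. Both crash CPDs are identical for simplicity.
P_I = {True: 0.7, False: 0.3}        # P(Icy)  (assumed)
P_crash = {True: 0.8, False: 0.1}    # P(crash | Icy)  (assumed)

def joint(i, w, h):
    # P(I, W, H) = P(I) P(W|I) P(H|I) by the network structure
    pw = P_crash[i] if w else 1 - P_crash[i]
    ph = P_crash[i] if h else 1 - P_crash[i]
    return P_I[i] * pw * ph

def p_wh(w, h):                      # marginal P(W, H)
    return sum(joint(i, w, h) for i in (True, False))

def p_w(w):                          # marginal P(W) (= P(H) by symmetry)
    return sum(p_wh(w, h) for h in (True, False))

# W and H are NOT marginally independent: P(W,H) != P(W)P(H)
assert abs(p_wh(True, True) - p_w(True) * p_w(True)) > 1e-3

# But they ARE conditionally independent given I: P(W,H|i) = P(W|i)P(H|i)
for i in (True, False):
    assert abs(joint(i, True, True) / P_I[i] - P_crash[i] * P_crash[i]) < 1e-12
```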
Example 2 – Rain/Sprinkler
[Figure: R (Rain) → W (Watson Wet), R → H (Holmes Wet), S (Sprinkler) → H]

Holmes discovers his house is wet. Could be rain or his sprinkler.
Example 2 – Rain/Sprinkler

[Figure: R (Rain) → W (Watson Wet), R → H (Holmes Wet), S (Sprinkler) → H]

Now imagine Holmes sees Watson's grass is wet. Now we conclude it was probably rain, and probably not his sprinkler.
Explaining Away
[Figure: R (Rain) → W (Watson Wet), R → H (Holmes Wet), S (Sprinkler) → H]

Initially we had two explanations for Holmes' wet grass. But once we have more evidence for R, this explains away H, so there is no reason for an increase in S.
Conditional Dependence
[Figure: R (Rain) → W (Watson Wet), R → H (Holmes Wet), S (Sprinkler) → H]

If we don't know H, R and S are independent.

But if we know H, R and S are conditionally dependent.
Graph Semantics
Three basic building blocks, each implying a particular independence relationship:

[Figure: Serial X → Y → Z; Diverging X ← Y → Z; Converging X → Y ← Z]
Chain/Linear
X → Y → Z

X ⊥ Z | ∅ ?  No (marginally dependent)
X ⊥ Z | Y ?  Yes

Conditional Independence
Diverging
X ← Y → Z

X ⊥ Z | ∅ ?  No (marginally dependent)
X ⊥ Z | Y ?  Yes

Conditional Independence
Converging
X → Y ← Z

X ⊥ Z | ∅ ?  Yes (marginally independent)
X ⊥ Z | Y ?  No

Conditional Dependence – Explaining Away
Graph Semantics
Three basic building blocks

[Figure: converging connection X → Y ← Z, with a chain of descendants Y → A → B → …]
D-Separation
Three semantics combine in the concept of d-separation.

Definition: Two nodes A and B are d-separated (or blocked) if for every path p between A and B there is a node V on p such that either:

1. The connection is serial or diverging and V is known
2. The connection is converging and V and all of its descendants are unknown

If A and B are not d-separated, they are d-connected.
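The definition can be implemented directly by enumerating paths. A minimal sketch for small DAGs (fine for toy networks, too slow for large ones):

```python
def d_separated(dag, a, b, known):
    """dag maps each node to a list of its children; known is the set of observed nodes."""
    parents = {}
    for p, children in dag.items():
        for c in children:
            parents.setdefault(c, set()).add(p)

    def descendants(v):
        out, stack = set(), [v]
        while stack:
            for c in dag.get(stack.pop(), []):
                if c not in out:
                    out.add(c)
                    stack.append(c)
        return out

    def neighbors(v):
        return set(dag.get(v, [])) | parents.get(v, set())

    def paths(cur, seen):                 # all simple undirected paths to b
        if cur == b:
            yield seen
            return
        for n in neighbors(cur):
            if n not in seen:
                yield from paths(n, seen + [n])

    def blocked(path):
        for x, v, z in zip(path, path[1:], path[2:]):
            converging = x in parents.get(v, set()) and z in parents.get(v, set())
            if converging:
                if v not in known and not (descendants(v) & known):
                    return True           # rule 2: converging, V and descendants unknown
            elif v in known:
                return True               # rule 1: serial/diverging, V known
        return False

    return all(blocked(p) for p in paths(a, [a]))

# Rain/Sprinkler network: R -> W, R -> H, S -> H
dag = {'R': ['W', 'H'], 'S': ['H']}
assert d_separated(dag, 'R', 'S', set())        # marginally independent
assert not d_separated(dag, 'R', 'S', {'H'})    # explaining away: H observed
assert not d_separated(dag, 'W', 'S', {'H'})    # evidence at H also connects W and S
```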
Equivalence of Networks

Two structures are equivalent if they represent the same independence relationships – they encode the same space of probability distributions.

Example: the serial networks X → Y → Z and X ← Y ← Z and the diverging network X ← Y → Z all encode X ⊥ Z | Y.

Will return to this when we consider causal vs probabilistic networks.
Bayesian Networks
A Bayesian network (BN) for X = {X1, X2, X3, …, Xn} consists of:

• A network structure S
  – Directed acyclic graph (DAG)
  – Nodes ⇒ random variables X
  – Encodes graph independence semantics
• A set of probability distributions P
  – Conditional Probability Distributions (CPDs)
  – Local distributions for X
Example Bayesian Network
[Figure: R (Rain) → W (Watson Wet), R → H (Holmes Wet), S (Sprinkler) → H]

P(X) = P(R) P(S|R) P(W|R,S) P(H|R,S,W)
     = P(R) P(S) P(W|R) P(H|S,R)
BN Probability Distribution
Only need distributions over nodes and their parents:

P(X) = P(X1, …, Xn)
     = ∏_{i=1..n} P(Xi | X1, …, Xi−1)
     = ∏_{i=1..n} P(Xi | pa(Xi))
BNs are Compact Descriptions of P(X)

Independencies allow us to factorize the distribution.

Example – assume 2 states per node, with DAG X1 → X2, X1 → X3, X2 → X4, X3 → X5.

Chain rule (no independencies):
P(X) = P(X1) P(X2|X1) P(X3|X2,X1) P(X4|X3,X2,X1) P(X5|X4,X3,X2,X1)
⇒ 2 + 4 + 8 + 16 + 32 = 62 entries

BN factorization:
P(X) = ∏_{i=1..n} P(Xi | pa(Xi)) = P(X1) P(X2|X1) P(X3|X1) P(X4|X2) P(X5|X3)
⇒ 2 + 4 + 4 + 4 + 4 = 18 entries
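The entry counts above follow mechanically from the parent sets: a node with k binary parents needs a table of 2^(k+1) entries. A small sketch reproducing the slide's numbers:

```python
def table_entries(parents_of, states=2):
    """Total CPD table entries for a network given each node's parent list."""
    return sum(states ** (len(pa) + 1) for pa in parents_of.values())

# Full chain-rule expansion: every node conditions on all earlier nodes.
full = {'X1': [], 'X2': ['X1'], 'X3': ['X1', 'X2'],
        'X4': ['X1', 'X2', 'X3'], 'X5': ['X1', 'X2', 'X3', 'X4']}
# BN factorization from the slide's DAG.
bn = {'X1': [], 'X2': ['X1'], 'X3': ['X1'], 'X4': ['X2'], 'X5': ['X3']}

assert table_entries(full) == 62
assert table_entries(bn) == 18
```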
Recall from HMM/CRF Lecture

Directed graph semantics give the HMM factorization:

P(X,Y) = ∏_{all nodes v} P(v | parents(v)) = ∏_i P(Yi | Yi−1) P(Xi | Yi)

[Figure: HMM – hidden chain Y1 → Y2 → Y3 → Y4 with emissions Yi → Xi]

A Markov random field factorizes over potential functions on cliques (conditioned on X), giving the CRF factorization:

P(Y|X) = ∏_{all nodes v} P(v | clique(v), X) = ∏_i P(Yi | Yi−1, X)

[Figure: CRF – undirected chain Y1 – Y2 – Y3 – Y4, conditioned on the observations X]
CPDs
[Figure: network R → W, R → H, S → H]

Discrete CPD, e.g. a table for P(H|R,S):

R  S  H  P(H|R,S)
0  0  1  0.1
0  0  0  0.9
0  1  1  0.8
0  1  0  0.2
1  0  1  0.9
1  0  0  0.1
1  1  1  0.1
1  1  0  0.9

CPDs can also be continuous.
Bayesian Networks for Inference
Observational inference: observe the values of (evidence on) a set of nodes, and predict the states of the other nodes.

• Exact inference
  – Junction Tree Algorithm
• Approximate inference
  – Variational approaches, Monte Carlo sampling
Walking Through an Example
[Figure: network R → W, R → H, S → H]

P0(R):  R=y: 0.2   R=n: 0.8
P0(S):  S=y: 0.1   S=n: 0.9

P0(W|R):
        R=y   R=n
W=y     1     0.2
W=n     0     0.8

P0(H|R,S), entries (H=y, H=n):
        R=y     R=n
S=y     1, 0    0.9, 0.1
S=n     1, 0    0, 1
Walking Through an Example
We define two clusters, WR and RHS, joined by the separator R:

WR —R— RHS

The key idea: the clusters only communicate through R. If they agree on R, all is good.
Walking Through an Example
We will find it easier to work on this cluster representation:

WR —R— RHS

We then need P(WR) and P(RHS):

P(WR) = P(R) P(W|R)
P(RHS) = P(R) P(S) P(H|R,S)
Walking Through an Example
P(WR) = P(R) P(W|R) and P(RHS) = P(R) P(S) P(H|R,S) give:

P0(W,R):
        R=y    R=n
W=y     0.2    0.16
W=n     0      0.64

P0(R,H,S), entries (H=y, H=n):
        R=y       R=n
S=y     0.02, 0   0.072, 0.008
S=n     0.18, 0   0, 0.72

Separator: P0(R):  R=y: 0.2   R=n: 0.8
Walking Through an Example
Note that by marginalizing out W from P0(W,R) we get

P0(W) = (0.36, 0.64)

This is our initial belief in Watson's grass being (wet, not wet).
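The cluster tables can be reproduced from the CPDs. A sketch in Python — note the P(H=y|R=y,S=n) = 1 value is inferred from the slides' joint table P0(R,H,S):

```python
# CPDs from the slides, states 'y'/'n'.
P_R = {'y': 0.2, 'n': 0.8}
P_S = {'y': 0.1, 'n': 0.9}
P_W_given_R = {('y', 'y'): 1.0, ('y', 'n'): 0.2,     # keys are (W, R)
               ('n', 'y'): 0.0, ('n', 'n'): 0.8}
P_Hy_given_RS = {('y', 'y'): 1.0, ('n', 'y'): 0.9,   # keys are (R, S) -> P(H=y|R,S)
                 ('y', 'n'): 1.0, ('n', 'n'): 0.0}   # ('y','n') entry inferred

def p_h(h, r, s):
    p = P_Hy_given_RS[(r, s)]
    return p if h == 'y' else 1.0 - p

# Cluster potentials: P(WR) = P(R)P(W|R), P(RHS) = P(R)P(S)P(H|R,S)
P0_WR = {(w, r): P_R[r] * P_W_given_R[(w, r)] for w in 'yn' for r in 'yn'}
P0_RHS = {(r, h, s): P_R[r] * P_S[s] * p_h(h, r, s)
          for r in 'yn' for h in 'yn' for s in 'yn'}

# Matches the slides' tables:
assert abs(P0_WR[('y', 'n')] - 0.16) < 1e-9
assert abs(P0_RHS[('n', 'y', 'y')] - 0.072) < 1e-9
# Initial belief that Watson's grass is wet:
assert abs(P0_WR[('y', 'y')] + P0_WR[('y', 'n')] - 0.36) < 1e-9
```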
Walking Through an Example
Now we observe H = y. We need to do three things:

1. Update RHS with this info
2. Calculate a new P1(R)
3. Transmit P1(R) to update WR
Walking Through an Example
Updating RHS with H = y: we can simply zero out all entries in RHS where H = n:

P0(R,H,S) with evidence, entries (H=y, H=n):
        R=y       R=n
S=y     0.02, 0   0.072, 0
S=n     0.18, 0   0, 0
Walking Through an Example
Updating RHS with H = y: zero out all entries where H = n, then re-normalize:

P1(R,H,S), entries (H=y, H=n):
        R=y        R=n
S=y     0.074, 0   0.264, 0
S=n     0.662, 0   0, 0

But you can see that this changes P(R) from the perspective of RHS.
Walking Through an Example
2. Calculate the new P1(R): marginalize H and S out of RHS:

P1(R) = (0.736, 0.264)   (was P0(R) = (0.2, 0.8))

Note also:

P1(S) = (0.339, 0.661)   (was P0(S) = (0.1, 0.9))
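The zero-out-and-renormalize step is a one-liner over the joint table. A sketch using the slides' P0(R,H,S) values, keyed (R, H, S):

```python
P0 = {('y', 'y', 'y'): 0.02, ('n', 'y', 'y'): 0.072, ('n', 'n', 'y'): 0.008,
      ('y', 'y', 'n'): 0.18, ('n', 'n', 'n'): 0.72,
      ('y', 'n', 'y'): 0.0,  ('y', 'n', 'n'): 0.0,   ('n', 'y', 'n'): 0.0}

# Absorb the evidence H = y: zero out H = n, then renormalize.
P1 = {k: (v if k[1] == 'y' else 0.0) for k, v in P0.items()}
z = sum(P1.values())
P1 = {k: v / z for k, v in P1.items()}

# Marginals match the slides (to their rounding):
P1_R_y = sum(v for k, v in P1.items() if k[0] == 'y')
P1_S_y = sum(v for k, v in P1.items() if k[2] == 'y')
assert abs(P1_R_y - 0.736) < 2e-3   # slide: P1(R) = (0.736, 0.264)
assert abs(P1_S_y - 0.339) < 2e-3   # slide: P1(S) = (0.339, 0.661)
```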
Walking Through an Example

3. Transmit P1(R) to update WR:

P1(W,R) = P(W|R) P1(R) = P0(W,R) · P1(R)/P0(R)

P1(R):  R=y: 0.736   R=n: 0.264
Walking Through an Example

3. Transmit P1(R) to update WR:

P1(W,R) = P(W|R) P1(R) = P0(W,R) · P1(R)/P0(R)

P1(W,R):
        R=y     R=n
W=y     0.736   0.052
W=n     0       0.211
Walking Through an Example

The update is carried by the ratio message P1(R)/P0(R) passed across the separator.
Walking Through an Example

Updating WR with the ratio P1(R)/P0(R) gives the new belief in Watson's grass being wet:

P1(W=y) = 0.788   (was P0(W=y) = 0.36)
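The ratio-message update can be verified directly: scale each column of the WR table by P1(R)/P0(R). A sketch with the slides' numbers:

```python
# WR cluster and separator tables from the slides, states 'y'/'n'.
P0_WR = {('y', 'y'): 0.2, ('y', 'n'): 0.16,     # keys are (W, R)
         ('n', 'y'): 0.0, ('n', 'n'): 0.64}
P0_R = {'y': 0.2, 'n': 0.8}
P1_R = {'y': 0.2 / 0.272, 'n': 0.072 / 0.272}   # from absorbing H=y in RHS

# Message passing: rescale the WR cluster by the ratio P1(R)/P0(R).
P1_WR = {(w, r): p * P1_R[r] / P0_R[r] for (w, r), p in P0_WR.items()}

# Belief in Watson's grass being wet rises from 0.36 to ~0.788:
P1_W_y = P1_WR[('y', 'y')] + P1_WR[('y', 'n')]
assert abs(P1_W_y - 0.788) < 2e-3
```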
Walking Through an Example

Now we observe W = y:

1. Update WR with this info
2. Calculate a new P2(R)
3. Transmit P2(R) to update RHS
Walking Through an Example

P2(R):  R=y: 0.93   R=n: 0.07

P2(W,R):
        R=y    R=n
W=y     0.93   0.07
W=n     0      0

P2(R,H,S), entries (H=y, H=n):
        R=y        R=n
S=y     0.094, 0   0.067, 0
S=n     0.839, 0   0, 0

P2(S=y) = 0.161,  P1(S=y) = 0.339,  P0(S=y) = 0.1

– R is almost certain
– We have explained away H = y
– S goes low again
Message Passing in Junction Trees

[Figure: clusters X and Y joined by separator S; evidence e enters at Y; the message P*(S)/P(S) flows to X]

• The state of the separator S is the information shared between X and Y
• When Y is updated, it sends a message to X
• The message has the information needed to update X to agree with Y on the state of S
Message Passing in Junction Trees

• A node can send one message to a neighbor, only after receiving all messages from each other neighbor
• When messages have been passed both ways along a link, the link is consistent
• Passing continues until all links are consistent

[Figure: a junction tree with clusters A–H and evidence e; numbers on the links show a valid message ordering]
HUGIN Algorithm

A simple algorithm for coordinated message passing in junction trees:

• Select one node, V, as root
• Call CollectEvidence(V):
  – Ask all neighbors of V to send a message to V
  – If not possible, they recursively request messages from all neighbors but the caller
• Call DistributeEvidence(V):
  – Send a message to all neighbors of V
  – They recursively send messages to all neighbors but the sender
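The two recursions produce a schedule in which every link carries a message both ways. A sketch of just the scheduling (not the potential arithmetic), with the tree as an adjacency dict:

```python
def collect(tree, v, caller=None, order=None):
    """CollectEvidence: post-order; v sends to its caller after hearing from all others."""
    if order is None:
        order = []
    for n in tree[v]:
        if n != caller:
            collect(tree, n, v, order)
    if caller is not None:
        order.append((v, caller))      # message v -> caller
    return order

def distribute(tree, v, caller=None, order=None):
    """DistributeEvidence: pre-order; v sends to all neighbors except the one it heard from."""
    if order is None:
        order = []
    for n in tree[v]:
        if n != caller:
            order.append((v, n))       # message v -> n
            distribute(tree, n, v, order)
    return order

# The two-cluster junction tree from the walkthrough, rooted at RHS:
tree = {'WR': ['RHS'], 'RHS': ['WR']}
schedule = collect(tree, 'RHS') + distribute(tree, 'RHS')
assert schedule == [('WR', 'RHS'), ('RHS', 'WR')]   # both directions -> link consistent
```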
BN to Junction Tree

A topic by itself. In summary:

• Moral graph – undirected graph with links added between all parents of each node
• Triangulate – add links so that all cycles of length > 3 have a chord
• The cliques become the nodes of the junction tree

Example: R → W, R → H, S → H becomes the junction tree WR —R— RHS
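The moralization step ("marry the parents, drop the directions") is easy to sketch; triangulation is omitted here, since the example graph is already triangulated:

```python
def moralize(dag):
    """dag maps each node to a list of its children; returns undirected edges."""
    parents = {}
    for p, children in dag.items():
        for c in children:
            parents.setdefault(c, set()).add(p)
    # Drop directions on existing edges.
    edges = {frozenset((p, c)) for p, children in dag.items() for c in children}
    # Marry all pairs of parents of each node.
    for ps in parents.values():
        ps = sorted(ps)
        for i in range(len(ps)):
            for j in range(i + 1, len(ps)):
                edges.add(frozenset((ps[i], ps[j])))
    return edges

# Rain/Sprinkler: R -> W, R -> H, S -> H. Moralizing marries R and S.
dag = {'R': ['W', 'H'], 'S': ['H']}
moral = moralize(dag)
assert frozenset(('R', 'S')) in moral    # married parents of H
assert len(moral) == 4                   # R-W, R-H, S-H, R-S -> cliques WR and RHS
```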
Recall from HMM/CRF Lecture

Directed graph semantics give the HMM factorization:

P(X,Y) = ∏_{all nodes v} P(v | parents(v)) = ∏_i P(Yi | Yi−1) P(Xi | Yi)

[Figure: HMM – hidden chain Y1 → Y2 → Y3 → Y4 with emissions Yi → Xi]

A Markov random field factorizes over potential functions on cliques (conditioned on X), giving the CRF factorization:

P(Y|X) = ∏_{all nodes v} P(v | clique(v), X) = ∏_i P(Yi | Yi−1, X)

[Figure: CRF – undirected chain Y1 – Y2 – Y3 – Y4, conditioned on the observations X]
Applications
[Figure: Bayesian network over genes G1–G10]

Measure some gene expression – predict the rest.
Latent Variables
[Figure: gene network G1–G10 with latent drug nodes D1–D4 as additional parents]
Latent Variables
[Figure: gene network G1–G10 with latent drug nodes D1–D4 and environment nodes (pH, O2)]
Latent Variables
[Figure: gene network G1–G10 with latent drugs D1–D4, environment nodes (pH, O2), and metabolites M1–M4]
Observation vs Intervention
• Arrows are not necessarily causal
  – A BN models probability and correlation between variables
• For the applications so far, we observe evidence and want to know the states of the other nodes most likely to go with the observation
• What about interventions?
Example – Sprinkler
[Figure: Season → Rain, Season → Sprinkler; Rain → W, Rain → H, Sprinkler → H]

• If we observe the sprinkler on
• Holmes' grass is more likely wet
• And it is more likely summer
• But what if we force the sprinkler on?

Intervention – cut the arrows from the parents.
Causal vs Probabilistic
• This depends on getting the arrows correct!
• Flipping all arrows (e.g. X → Y → Z vs X ← Y ← Z) does not change the independence relationships
• But it changes causality for interventions
Learning Bayesian Networks
Given a set of observations D (i.e. an expression data set) on X, we want to find:

1. A network structure S
2. Parameters Θ for the probability distributions on each node, given S   (relatively easy)
Learning Θ

• Given S, we can choose the maximum likelihood parameters Θ:

Θ_ML = argmax_Θ P(D | Θ, S),  where  P(X | Θ, S) = ∏_{i=1..n} P(Xi | pa(Xi), θi)

• We can also choose to include prior information P(Θ) in a Bayesian approach:

P(Θ | S, D) ∝ P(D | Θ, S) P(Θ)

P_bayes(X | S, D) = ∫ P(X | Θ, S) P(Θ | S, D) dΘ
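With complete discrete data, the maximum likelihood CPD entries are just conditional frequencies: count how often each node value co-occurs with each parent configuration. A sketch on hypothetical data for a single edge R → W:

```python
from collections import Counter

# Hypothetical observations of (R, W) from a data set D.
data = [('y', 'y'), ('y', 'y'), ('n', 'n'), ('n', 'n'), ('n', 'y'), ('n', 'n')]
counts = Counter(data)

def mle(w, r):
    """ML estimate of P(W=w | R=r): conditional frequency in the data."""
    total = sum(c for (rr, _), c in counts.items() if rr == r)
    return counts[(r, w)] / total

assert mle('y', 'y') == 1.0                 # both R=y observations had W=y
assert abs(mle('y', 'n') - 0.25) < 1e-12    # 1 of 4 R=n observations had W=y
```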
Learning Bayesian Networks
Given a set of observations D (i.e. an expression data set) on X, we want to find:

1. A network structure S   (NP-hard)
2. Parameters Θ for the probability distributions on each node, given S
Learning S
Find the optimal structure S given D:

P(S | D) ∝ P(D | S) P(S)

P(D | S) = ∫ P(D | Θ, S) P(Θ | S) dΘ

In special circumstances the integral is analytically tractable (e.g. no missing data, multinomial likelihoods, Dirichlet priors).
Learning S – Heuristic Search
• Compute P(S|D) for all networks S
• Select S* that maximizes P(S|D)

Problem 1: the number of structures S grows super-exponentially with the number of nodes – exhaustive search is impossible, so use hill climbing, etc.

Problem 2: sparse data and overfitting

[Figure: P(S|D) vs S – with lots of data the posterior is sharply peaked around S*; with sparse data it is flat, with many structures scoring nearly as well as S*]
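Hill climbing over structures can be sketched generically: repeatedly move to the best-scoring neighbor until no neighbor improves. Here `score` and `neighbors` are stand-ins (assumptions) for log P(S|D) and single-edge add/delete moves:

```python
def hill_climb(s0, neighbors, score):
    """Greedy local search: stop when no neighbor improves the score."""
    s, best = s0, score(s0)
    while True:
        cand = max(neighbors(s), key=score, default=None)
        if cand is None or score(cand) <= best:
            return s
        s, best = cand, score(cand)

# Toy example: structures are frozensets of directed edges over 3 nodes.
ALL_EDGES = {('A', 'B'), ('B', 'C'), ('A', 'C')}

def neighbors(s):
    # Add any missing edge or delete any present edge (acyclicity ignored in this toy).
    return [s | {e} for e in ALL_EDGES - s] + [s - {e} for e in s]

def score(s):
    # Hypothetical score: rewards the edge A->B, penalizes complexity.
    return len(s & {('A', 'B')}) - 0.1 * len(s)

best = hill_climb(frozenset(), neighbors, score)
assert ('A', 'B') in best
```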
Model Averaging
• Rather than select only one model as the result of the search, we can draw many samples from P(S|D) and model average. For example, for an edge E_xy:

P(E_xy | D) = Σ_S P(E_xy | D, S) P(S | D) ≈ (1/N) Σ_{sampled S} 1_xy(S)

where 1_xy(S) indicates whether S contains the edge.

How do we sample…?
Sampling Models - MCMC

Markov Chain Monte Carlo method. We want to sample from

P(S_k | D) = P(D | S_k) P(S_k) / Σ_k P(D | S_k) P(S_k)

but the direct approach is intractable due to the partition function (the sum over all structures).

MCMC:
• Propose – given S_old, propose a new S_new with probability Q(S_new | S_old)
• Accept/Reject – accept S_new as a sample with probability

p = min{ 1, [ P(D | S_new) P(S_new) Q(S_old | S_new) ] / [ P(D | S_old) P(S_old) Q(S_new | S_old) ] }
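The accept/reject step needs only score ratios, so the partition function cancels. A sketch of Metropolis-Hastings on a toy state space: `log_score` stands in for log[P(D|S)P(S)], and the proposal is symmetric so the Q terms cancel too.

```python
import math
import random
from collections import Counter

def mh_step(s_old, propose, log_score):
    """One Metropolis-Hastings step with a symmetric proposal."""
    s_new = propose(s_old)
    p = min(1.0, math.exp(log_score(s_new) - log_score(s_old)))
    return s_new if random.random() < p else s_old

# Toy stand-in for structures: three states, state 1 scores highest.
log_scores = {0: 0.0, 1: 2.0, 2: 0.0}
propose = lambda s: (s + random.choice([-1, 1])) % 3   # symmetric random walk

random.seed(0)
s, hist = 0, Counter()
for _ in range(20000):
    s = mh_step(s, propose, log_scores.__getitem__)
    hist[s] += 1

# The chain visits the high-scoring state most often, as sampling P(S|D) requires.
assert hist[1] > hist[0] and hist[1] > hist[2]
```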