1
Chapter 5 Belief Updating in Bayesian Networks
Bayesian Networks and Decision Graphs, Finn V. Jensen
Qunyuan Zhang, Division of Statistical Genomics, CGS
Statistical Genetics Forum, May 7, 2007
2
Contents of the Book
I A Practical Guide to Normative Systems
1 Causal and Bayesian Networks
2 Building Models
3 Learning, Adaptation, and Tuning
4 Decision Graphs
II Algorithms for Normative Systems
5 Belief Updating in Bayesian Networks
6 Bayesian Network Analysis Tools
7 Algorithms for Influence Diagrams
3
Structure of the Book
1 Causal and Bayesian Networks
2 Building Models; 3 Learning, Adaptation, and Tuning
5 Belief Updating in Bayesian Networks; 6 Bayesian Network Analysis Tools
4 Decision Graphs; 7 Algorithms for Influence Diagrams
I. What is a BN?
II. How to create a BN?
III. What can we use a BN to do, and how?
[to know something]
Prob.(a single variable | BN)
Joint Prob.(a set of variables | BN)
Importance of variables
evidence sensitivity
parameter sensitivity
Data conflict analysis
[to make decisions]
Optimal decision (cost & gain)
4
BN & Decision Tree
[Figure: a graph over the nodes A, B, C, T, D1, D2, V1, V2 with U = V1 + V2, next to the corresponding decision tree branching on D1, T and D2, whose leaves are labelled A×C and V1 + V2]
P(A,C|D1,T,D2)
5
“BN” of the Book
Concept of BN
Model Building
(known part of structure)
BN Learning
(uncertain part of structure)
BN
(structure & parameters)
Rules & Theories Data & Algorithms
Probability Calculation
Knowing, Understanding & Explaining
Decisions
Actions Cost & Gain
Changes
6
Chapter 5 Belief Updating in Bayesian Networks
Belief = Probability
Belief updating = Probability calculating based on a BN
(model, parameters and/or evidence)
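Linear Model
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + e$$
Logistic Model
$$P(y = 1 \mid x_1, x_2, x_3) = \frac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3}}{1 + e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3}}$$
[Figure: regression graph over X1, X2, X3, e and Y]
Conditional Probability: P(Y | X1, X2, X3)
BN
[Figure: example BN graphs over the nodes X1, X2, X3, Y and A, B, C, D, E, F]
Marginal Probability: P(Y) = ∑[-Y] Πφ (sum over all variables except Y of the product of the potentials φ)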
7
Marginal Probability Calculation in BN
I. Simplification (5.5)
II. Marginalization (5.2),(5.3),(5.4),(5.6)
III. Simulation (5.7)
8
I. Simplifications
Graph-theoretic Representation
Definitions, Propositions & Theorems
Barren Nodes
[Figure: a sequence of graphs over the nodes A, B, C, D, E, F, G with evidence e; G no longer appears in the later panels]
d-separation
By excluding the non-informative nodes (white nodes)
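The pruning idea can be sketched as follows. This is a minimal illustration that assumes the usual definition of a barren node (a node that is neither queried nor observed and has no remaining children). The graph, the function name `prune_barren` and the choice of query and evidence nodes are hypothetical, not taken from the figure.

```python
def prune_barren(parents, keep):
    """Return the nodes left after repeatedly removing barren nodes.

    parents: dict mapping each node to the list of its parents.
    keep: set of query and evidence nodes, which are never removed.
    """
    nodes = set(parents)
    while True:
        # Nodes that still have at least one child among the remaining nodes.
        has_child = {p for n in nodes for p in parents[n] if p in nodes}
        barren = {n for n in nodes if n not in has_child and n not in keep}
        if not barren:
            return nodes
        nodes -= barren


# Hypothetical BN: A -> B -> D -> F and A -> C -> E -> G
parents = {"A": [], "B": ["A"], "C": ["A"], "D": ["B"],
           "E": ["C"], "F": ["D"], "G": ["E"]}
# Assumed query node A and evidence node E.
remaining = prune_barren(parents, keep={"A", "E"})
print(sorted(remaining))
```

Only the remaining nodes need to enter the probability calculation; the removed ones cannot change the result for the chosen query.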
9
II. Marginalization
Calculating sums of products of potentials by eliminating variables repeatedly
10
Marginal Probabilities
Joint probabilities P(A, B) in the cells p1 to p4, marginals in the last row and column:

        A1               A2               P(B)
B1      p1               p2               p1+p2 = P(B1)
B2      p3               p4               p3+p4 = P(B2)
P(A)    p1+p3 = P(A1)    p2+p4 = P(A2)
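A small sketch of the same row and column sums in code. The four numbers are made-up stand-ins for p1 to p4, chosen only so that they sum to 1.

```python
# Hypothetical joint probabilities P(A, B); the values play the roles of p1..p4.
joint = {("A1", "B1"): 0.10, ("A2", "B1"): 0.30,   # p1, p2
         ("A1", "B2"): 0.20, ("A2", "B2"): 0.40}   # p3, p4

p_a, p_b = {}, {}
for (a, b), p in joint.items():
    p_a[a] = p_a.get(a, 0.0) + p   # marginalize B out: column sums
    p_b[b] = p_b.get(b, 0.0) + p   # marginalize A out: row sums

print(p_a, p_b)
```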
11
An Example of Marginalization/Elimination
BN parameters (potentials) :
φ1=P (A1) , φ2=P (A2|A1) , φ3=P (A3|A1), φ4=P (A4|A2)
φ5=P (A5|A2, A3), φ6=P (A6|A3)
P(A4)=?
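[Figure: BN over A1, ..., A6 with links A1→A2, A1→A3, A2→A4, A2→A5, A3→A5, A3→A6]
$$P(A_4) = \sum_{A_1,A_2,A_3,A_5,A_6} P(U) = \sum_{A_1,A_2,A_3,A_5,A_6} \phi_1 \phi_2 \phi_3 \phi_4 \phi_5 \phi_6$$
$$= \sum_{A_1} \phi_1(A_1) \sum_{A_2} \phi_2(A_2,A_1)\, \phi_4(A_4,A_2) \sum_{A_3} \phi_3(A_3,A_1) \sum_{A_5} \phi_5(A_5,A_2,A_3) \sum_{A_6} \phi_6(A_6,A_3)$$
Distributive Law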
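The effect of the distributive law can be checked numerically. The sketch below assumes binary variables and fills the six potentials with random conditional probabilities; the helper names (`cpt`, `brute_force`, `nested`) are illustrative only. It compares the sum over the full joint with the nested form in which each potential is moved inside the sums it does not depend on.

```python
import itertools
import random

random.seed(0)
S = (0, 1)  # every variable is assumed binary


def cpt(n_parents):
    """Random table P(child | parents), keyed by (child, *parents)."""
    table = {}
    for pa in itertools.product(S, repeat=n_parents):
        p = random.random()
        table[(0, *pa)] = p
        table[(1, *pa)] = 1.0 - p
    return table


f1, f2, f3 = cpt(0), cpt(1), cpt(1)   # P(A1), P(A2|A1), P(A3|A1)
f4, f5, f6 = cpt(1), cpt(2), cpt(1)   # P(A4|A2), P(A5|A2,A3), P(A6|A3)


def brute_force(a4):
    """Sum the full joint P(U) over A1, A2, A3, A5, A6."""
    return sum(f1[(a1,)] * f2[(a2, a1)] * f3[(a3, a1)] * f4[(a4, a2)]
               * f5[(a5, a2, a3)] * f6[(a6, a3)]
               for a1, a2, a3, a5, a6 in itertools.product(S, repeat=5))


def nested(a4):
    """Same sum, with the sums over A5 and A6 pushed innermost."""
    return sum(
        f1[(a1,)] * sum(
            f2[(a2, a1)] * f4[(a4, a2)] * sum(
                f3[(a3, a1)]
                * sum(f5[(a5, a2, a3)] for a5 in S)
                * sum(f6[(a6, a3)] for a6 in S)
                for a3 in S)
            for a2 in S)
        for a1 in S)


for a4 in S:
    print(a4, brute_force(a4), nested(a4))
```

Both forms give the same P(A4); the nested form never builds a table over all six variables at once.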
12
Marginalization/Elimination Order
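$$P(A_4) = \sum_{A_1} \sum_{A_2} \sum_{A_3} \sum_{A_5} \sum_{A_6} \phi_1(A_1)\, \phi_2(A_2,A_1)\, \phi_4(A_4,A_2)\, \phi_3(A_3,A_1)\, \phi_5(A_5,A_2,A_3)\, \phi_6(A_6,A_3)$$
$$= \sum_{A_1} \sum_{A_2} \sum_{A_3} \sum_{A_5} \phi_1(A_1)\, \phi_2(A_2,A_1)\, \phi_4(A_4,A_2)\, \phi_3(A_3,A_1)\, \phi_5(A_5,A_2,A_3)\, \phi_6'(A_3)$$
$$= \sum_{A_1} \sum_{A_2} \sum_{A_3} \phi_1(A_1)\, \phi_2(A_2,A_1)\, \phi_4(A_4,A_2)\, \phi_3(A_3,A_1)\, \phi_5'(A_2,A_3)\, \phi_6'(A_3)$$
$$= \sum_{A_1} \sum_{A_2} \phi_1(A_1)\, \phi_2(A_2,A_1)\, \phi_4(A_4,A_2)\, \phi_3'(A_1,A_2)$$
$$= \sum_{A_1} \phi_1(A_1)\, \phi_2'(A_1,A_4)$$
$$= \phi_1'(A_4)$$
[Figure: the same BN over A1, ..., A6]
Variable Elimination Order: A6, A5, A3, A2, A1 => P(A4)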
13
Marginalization/Elimination
Domain: a set of variables in BN
Potential: a real-valued probabilistic table over a domain
φ1=P (A1) , φ2=P (A2|A1) , φ3=P (A3|A1), φ4=P (A4|A2)
φ5=P (A5|A2, A3), φ6=P (A6|A3)
[Figure: the BN over A1, ..., A6]
Definition 5.1 (Elimination): Let Φ be a set of potentials, and let X be a variable. X is eliminated from Φ by:
1. Remove all potentials in Φ with X in their domains. Call the removed set ΦX.
   e.g. X = A3 => ΦX = (φ3, φ5, φ6), Φ = (φ1, φ2, φ4)
2. Calculate φ-X = ∑X ΠΦX, e.g. φ-A3 = ∑A3 φ3φ5φ6
3. Add φ-X to Φ. Call the resulting set Φ-X, e.g. Φ-A3 = (φ1, φ2, φ4, φ-A3)
P(Y) is calculated by repeatedly eliminating all variables except Y.
Question: how to find an efficient/optimal elimination order?
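The three steps of Definition 5.1 can be sketched as below. The representation is an assumption made for illustration: each potential is a pair (domain, table), all variables are binary, and the two-node example with its numbers is hypothetical.

```python
import itertools

S = (0, 1)  # all variables are assumed binary


def eliminate(potentials, x):
    """Eliminate variable x from a list of (domain, table) potentials.

    domain is a tuple of variable names; table maps a tuple of states
    (one per domain variable, in the same order) to a real number.
    """
    # Step 1: remove the potentials that have x in their domain.
    with_x = [p for p in potentials if x in p[0]]
    rest = [p for p in potentials if x not in p[0]]
    # Step 2: multiply the removed potentials and sum x out.
    new_dom = tuple(sorted({v for dom, _ in with_x for v in dom} - {x}))
    new_tab = {}
    for states in itertools.product(S, repeat=len(new_dom)):
        assign = dict(zip(new_dom, states))
        total = 0.0
        for xs in S:
            assign[x] = xs
            prod = 1.0
            for dom, tab in with_x:
                prod *= tab[tuple(assign[v] for v in dom)]
            total += prod
        new_tab[states] = total
    # Step 3: add the new potential to the remaining ones.
    return rest + [(new_dom, new_tab)]


# Hypothetical two-node BN A -> B with potentials P(A) and P(B | A).
phi_a = (("A",), {(0,): 0.6, (1,): 0.4})
phi_b = (("B", "A"), {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.2, (1, 1): 0.8})
result = eliminate([phi_a, phi_b], "A")
print(result)  # a single potential over B, i.e. P(B)
```

Calling `eliminate` once per variable other than Y leaves potentials over Y only, whose product is P(Y).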
14
Domain Graphs
BN graph and domain graph over the same 6 domains:
φ1 (A1), φ2 (A2,A1), φ3 (A3,A1), φ4 (A4,A2), φ5 (A5,A2,A3), φ6 (A6,A3)
Domain graph: undirected graph with the variables as nodes and a link between every pair of variables that share a domain
[Figure: the BN graph and the domain graph over A1, ..., A6]
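A short sketch of building the domain graph from the six domains listed on this slide. The dictionary-of-sets representation and the function name `domain_graph` are choices made here for illustration.

```python
from itertools import combinations

# Domains of the potentials phi1..phi6.
domains = [("A1",), ("A2", "A1"), ("A3", "A1"), ("A4", "A2"),
           ("A5", "A2", "A3"), ("A6", "A3")]


def domain_graph(domains):
    """Undirected graph as a dict: node -> set of neighbors."""
    adj = {v: set() for dom in domains for v in dom}
    for dom in domains:
        # Link every pair of variables that appear in the same domain.
        for u, v in combinations(dom, 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj


graph = domain_graph(domains)
for node in sorted(graph):
    print(node, sorted(graph[node]))
```

Note the link between A2 and A3: it is not a link of the BN graph but appears because both variables are in the domain of φ5.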
15
Perfect Elimination Sequence
Fill-ins (red links): links added between neighbors of an eliminated node that were not already linked
Perfect Elimination Sequence
An elimination sequence that introduces no fill-ins.
e.g.
A6, A5, A3, A1, A2 down to A4 => P(A4)
A5, A6, A3, A1, A2 down to A4 => P(A4)
A1, A5, A6, A3, A2 down to A4 => P(A4)
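[Figure: the domain graph over A1, ..., A6, and the graph over A1, A2, A4, A5, A6 that remains after A3 is eliminated first, with the fill-ins drawn as red links]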
16
Domain Set of Elimination Sequence
The domain set of an elimination sequence is the set of domains of the potentials produced during the elimination, with domains that are subsets of other domains removed.
For the sequence
A6, A5, A3, A1, A2 down to A4 => P(A4)
the domains met are (A6,A3), (A2,A3,A5), (A1,A2,A3), (A1,A2), (A2,A4). Since (A1,A2) is a subset of (A1,A2,A3), the domain set is
{(A6,A3),(A2,A3,A5),(A1,A2,A3),(A2,A4)}
Domain set reflects the complexity of an elimination sequence.
Question: how to find the smallest domain set ?
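As a sketch of how fill-ins and the domain set can be computed for a given sequence, the function below eliminates nodes from the domain graph of the A1, ..., A6 example. The function name `run_sequence` and the graph encoding are illustrative assumptions.

```python
from itertools import combinations

# Domain graph of the example: node -> set of neighbors.
graph = {"A1": {"A2", "A3"}, "A2": {"A1", "A3", "A4", "A5"},
         "A3": {"A1", "A2", "A5", "A6"}, "A4": {"A2"},
         "A5": {"A2", "A3"}, "A6": {"A3"}}


def run_sequence(adj, order):
    """Eliminate the nodes in `order` from an undirected graph.

    Returns (fill_ins, domain_set): the links added along the way, and the
    domains (eliminated node plus its neighbors) with subsets removed.
    """
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    fill_ins, domains = [], []
    for x in order:
        nbrs = adj.pop(x)
        domains.append(frozenset(nbrs | {x}))
        # Link all pairs of neighbors that are not linked yet (fill-ins).
        for u, v in combinations(sorted(nbrs), 2):
            if v not in adj[u]:
                adj[u].add(v)
                adj[v].add(u)
                fill_ins.append((u, v))
        for n in nbrs:
            adj[n].discard(x)
    domain_set = {d for d in domains if not any(d < e for e in domains)}
    return fill_ins, domain_set


fill, dom_set = run_sequence(graph, ["A6", "A5", "A3", "A1", "A2"])
print(fill, sorted(sorted(d) for d in dom_set))

fill_bad, _ = run_sequence(graph, ["A3", "A6", "A5", "A1", "A2"])
print(fill_bad)  # eliminating A3 first links its unconnected neighbors
```

Comparing the two calls shows why the order matters: the first sequence adds no links, while starting with A3 creates a domain over five variables.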
17
Set of Cliques
All perfect elimination sequences produce the same domain set, namely the set of cliques of the domain graph.
e.g.
all the sequences
A6, A5, A3, A1, A2 down to A4
A5, A6, A3, A1, A2 down to A4
A1, A5, A6, A3, A2 down to A4
produce the domain set
{(A6,A3),(A2,A3,A5),(A1,A2,A3),(A2,A4)}
which contains 4 domains / cliques; (A1,A2) is dropped because it is a subset of (A1,A2,A3)
Any perfect elimination sequence is optimal.
The cliques are the domains produced by perfect elimination sequences.
The clique set is the optimal set of domains.
Question: how to determine the set of cliques?
18
Triangulated Graphs
An undirected graph with a perfect elimination sequence is called a triangulated graph.
A triangulated graph: has a perfect elimination sequence, e.g. A5, A2, A4, A3 down to A1
A nontriangulated graph: no perfect elimination sequence
[Figure: two undirected graphs over A1, ..., A5, one triangulated and one nontriangulated]
19
Cliques in Triangulated Graphs
X : a node in domain graph
Fx : the set of neighbor nodes of X plus X
Simplicial: nodes with a complete neighbor set are called simplicial
To determine the set of cliques in a triangulated graph
1. Eliminate a simplicial node X. Fx is a clique candidate.
2. If Fx does not include all remaining nodes, go to 1.
3. Prune the set of clique candidates by removing sets that are subsets of other clique candidates.
4. The resulting set is the set of cliques.
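The four steps above can be sketched as follows, here applied to the domain graph of the A1, ..., A6 example. The function name `find_cliques` and the graph encoding are illustrative; the sketch assumes the input graph is triangulated, so a simplicial node always exists.

```python
graph = {"A1": {"A2", "A3"}, "A2": {"A1", "A3", "A4", "A5"},
         "A3": {"A1", "A2", "A5", "A6"}, "A4": {"A2"},
         "A5": {"A2", "A3"}, "A6": {"A3"}}


def find_cliques(adj):
    """Cliques of a triangulated graph, found by eliminating simplicial nodes.

    Raises StopIteration if no simplicial node exists, i.e. if the graph
    is not triangulated.
    """
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    candidates = []
    while adj:
        # Step 1: pick a simplicial node (all its neighbors pairwise linked).
        x = next(v for v in sorted(adj)
                 if all(b in adj[a] for a in adj[v] for b in adj[v] if a != b))
        fx = adj[x] | {x}
        candidates.append(frozenset(fx))
        # Step 2: stop once F_X covers all remaining nodes.
        if fx == set(adj):
            break
        for n in adj.pop(x):
            adj[n].discard(x)
    # Step 3: drop candidates that are subsets of other candidates.
    return {c for c in candidates if not any(c < d for d in candidates)}


cliques = find_cliques(graph)
print(sorted(sorted(c) for c in cliques))
```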
Question: given a set of cliques, how to determine the perfect elimination order?
[Figure: example graph over the nodes A, B, C, D, E and X]
20
Join Tree
A tree of cliques in which, for any pair of cliques V and W, all nodes on the path between V and W contain the intersection of V and W.
[Figure: a domain graph over the nodes A, B, C, D, E, F, G, H, I, J]
Elimination sequence: A, F, I, H, J, G, B, C, D down to E
Cliques (V) and Separators (S):
V1: ABCD, S1: BCD
V3: DEFI, S3: DE
V5: CGHJ, S5: CG
V6: BCDG, S6: BCD
V10: BCDE
[Figure: two trees over the cliques ABCD, CGHJ, BCDE, BCDG, DEFI; one is a join tree, the other is not a join tree]
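A sketch of checking the join tree property for the five cliques above. The two link layouts below are assumptions made for illustration, since the drawings are not available here. The check uses an equivalent formulation of the path condition: for every variable, the cliques containing it must form a connected part of the tree. It assumes the links already form a tree.

```python
def is_join_tree(cliques, edges):
    """Check the join tree property; `edges` is assumed to form a tree."""
    variables = set().union(*cliques.values())
    for var in variables:
        holders = {name for name, c in cliques.items() if var in c}
        # Walk from one holder, moving only through cliques that contain var.
        seen, stack = set(), [next(iter(holders))]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            for u, v in edges:
                if node in (u, v):
                    other = v if node == u else u
                    if other in holders:
                        stack.append(other)
        if seen != holders:
            return False
    return True


cliques = {name: set(name) for name in ["ABCD", "CGHJ", "BCDE", "BCDG", "DEFI"]}
# Hypothetical layouts of the links between the cliques.
tree_ok = [("ABCD", "BCDE"), ("DEFI", "BCDE"), ("BCDG", "BCDE"), ("CGHJ", "BCDG")]
tree_bad = [("ABCD", "BCDE"), ("DEFI", "BCDE"), ("BCDG", "BCDE"), ("CGHJ", "DEFI")]
print(is_join_tree(cliques, tree_ok), is_join_tree(cliques, tree_bad))
```

In the second layout CGHJ hangs off DEFI, so the path from CGHJ to BCDG passes through DEFI, which contains neither C nor G.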
21
Propagation in Junction Trees
A junction tree is a join tree with the following structure:
1. Each potential is attached to a clique containing the domain of this potential (cliques)
2. Each link has the appropriate separator attached (separable)
3. Each separator contains two “mailboxes”, one for each direction (mutual communication)
V4: A1, A2, A3 (φ1, φ2, φ3)
V6: A2, A4 (φ4)
V2: A2, A3, A5 (φ5)
V1: A3, A6 (φ6)
Separators, each with two mailboxes (↑ ↓):
S4: A2
S2: A2, A3
S1: A3
Collect evidence to V6
Distribute evidence from V6
Junction trees provide a general framework for finding an optimal elimination sequence for triangulated graphs.
Question: what if a graph is non-triangulated?
22
Triangulations
Convert a non-triangulated graph into a triangulated one by adding new link(s)
[Figure: three graphs over the nodes A, B, C, D, E, F, G, H, I, J: the BN, its non-triangulated graph, and the triangulated graph]
Optimal triangulation? Minimal fill-in size?
Heuristic approach: repeatedly eliminate a simplicial node; if there is none, eliminate a node X with minimal size of Fx.
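A minimal sketch of this heuristic. It assumes that the size of Fx is measured as the number of nodes in it (other measures, such as the size of the joint state space, are possible), and it uses a hypothetical 4-cycle as input; `triangulate` and `missing_links` are illustrative names.

```python
from itertools import combinations


def missing_links(adj, v):
    """Pairs of neighbors of v that are not linked (the fill-ins v would need)."""
    return [(a, b) for a, b in combinations(sorted(adj[v]), 2)
            if b not in adj[a]]


def triangulate(adj):
    """Return (order, fill_ins) produced by the elimination heuristic."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    order, fill_ins = [], []
    while adj:
        simplicial = [v for v in sorted(adj) if not missing_links(adj, v)]
        if simplicial:
            x = simplicial[0]
        else:
            # No simplicial node: take a node with the smallest F_X.
            x = min(sorted(adj), key=lambda v: len(adj[v]))
        for a, b in missing_links(adj, x):
            adj[a].add(b)
            adj[b].add(a)
            fill_ins.append((a, b))
        for n in adj.pop(x):
            adj[n].discard(x)
        order.append(x)
    return order, fill_ins


# Hypothetical non-triangulated graph: the cycle A - B - C - D - A.
cycle = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"A", "C"}}
order, fill_ins = triangulate(cycle)
print(order, fill_ins)
```

Adding the returned fill-ins to the original graph gives a triangulated graph for which the returned order is a perfect elimination sequence.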
23
III. Stochastic Simulations
Forward Sampling
1. P(A) => A
2. P(B|A)=>B, P(C|A)=>C
3. P(D|B)=>D
4. P(E|C,D)=>E
5. Repeat steps 1 to 4
[Figure: BN with links A→B, A→C, B→D, C→E, D→E]
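The sampling steps can be sketched as below for binary variables (True standing for one state, False for the other). All probabilities in the code are made-up values for illustration; only the sampling order follows the slide.

```python
import random


def forward_sample(rng):
    """Draw one configuration (a, b, c, d, e) by sampling parents before children."""
    a = rng.random() < 0.3                        # 1. A from P(A)
    b = rng.random() < (0.8 if a else 0.1)        # 2. B from P(B | A)
    c = rng.random() < (0.6 if a else 0.2)        #    C from P(C | A)
    d = rng.random() < (0.7 if b else 0.05)       # 3. D from P(D | B)
    p_e = 0.9 if (c and d) else 0.5 if (c or d) else 0.1
    e = rng.random() < p_e                        # 4. E from P(E | C, D)
    return a, b, c, d, e


rng = random.Random(0)
n = 20000
samples = [forward_sample(rng) for _ in range(n)]   # 5. repeat
p_a_est = sum(s[0] for s in samples) / n             # estimate of P(A = True)
p_e_est = sum(s[4] for s in samples) / n             # estimate of P(E = True)
print(p_a_est, p_e_est)
```

Marginals are estimated as relative frequencies; with evidence, only the samples that agree with the evidence can be used, which becomes wasteful when the evidence is improbable.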
Gibbs Sampling
Evidence: B=n, E=n; P(B=n,E=n) is small, so forward sampling would discard most samples
P(A | B=n,E=n)=?
P(C| B=n,E=n, A=a0, D=d0) => c1
P(D| B=n,E=n, C=c1, A=a0) => d1
P(A| B=n,E=n, D=d1, C=c1) => a1
P(C| B=n,E=n, A=a1, D=d1) => c2
...
(discard the early samples)
P(C| B=n,E=n, A=a(t-1), D=d(t-1)) => ct
...
(collect the samples from here on)
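A sketch of the Gibbs scheme above for the same five-node network, with binary variables (False standing for state n). The conditional probabilities are made-up values, so the evidence is not actually rare in this toy version. Each free variable is resampled from its distribution given all the others, obtained here by evaluating the joint at both of its states and normalizing.

```python
import random


def joint(a, b, c, d, e):
    """Joint probability of one configuration, with hypothetical CPT values."""
    def p(prob_true, value):
        return prob_true if value else 1.0 - prob_true
    p_e = 0.9 if (c and d) else 0.5 if (c or d) else 0.1
    return (p(0.3, a) * p(0.8 if a else 0.1, b) * p(0.6 if a else 0.2, c)
            * p(0.7 if b else 0.05, d) * p(p_e, e))


def gibbs(n_samples, burn_in, rng):
    """Estimate P(A = True | B = n, E = n) by Gibbs sampling."""
    # Start configuration (a0, c0, d0 arbitrary); B and E stay at the evidence.
    state = {"a": True, "b": False, "c": True, "d": True, "e": False}
    count_a = 0
    for t in range(burn_in + n_samples):
        for var in ("c", "d", "a"):   # update order used on the slide
            w_true = joint(**{**state, var: True})
            w_false = joint(**{**state, var: False})
            state[var] = rng.random() < w_true / (w_true + w_false)
        if t >= burn_in:              # discard the early samples, then collect
            count_a += state["a"]
    return count_a / n_samples


estimate = gibbs(20000, 1000, random.Random(0))
print(estimate)
```

The burn-in length and the number of collected samples are arbitrary choices here; in practice both depend on how quickly the chain forgets its start configuration.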