BAYESIAN NETWORKS CHAPTER 4
Book: Modeling and Reasoning with Bayesian Networks
Author: Adnan Darwiche
Publisher: Cambridge University Press, 2009
Introduction
A Joint Probability Distribution (JPD) can be used to model uncertain beliefs and to revise them in the face of hard and soft evidence.
The problem with a JPD is that its size grows exponentially with the number of variables, which introduces both modeling and computational difficulties.
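As a quick illustration of that growth, assuming every variable is binary, a JPD needs one entry per instantiation, i.e. 2^n entries for n variables. A tiny Python sketch (`jpd_size` is a hypothetical helper, not from the book):

```python
from math import prod

def jpd_size(cardinalities):
    """Entries in a joint distribution: one per instantiation, i.e. the
    product of the variables' cardinalities."""
    return prod(cardinalities)

# Binary variables: 5 (as in the alarm example below), then larger models.
for n in (5, 10, 20, 30):
    print(n, jpd_size([2] * n))
# prints 5 32, 10 1024, 20 1048576, 30 1073741824 (one pair per line)
```

Thirty binary variables already require over a billion entries, which is what makes direct elicitation of a JPD impractical.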
Need for BN
A Bayesian network (BN) is a graphical modeling tool for compactly specifying a JPD.
A BN relies on the basic insights that “independence forms a significant aspect of belief” and that “elicitation is relatively easy using the language of graphs”.
Example
[Figure: DAG over Earthquake (E), Burglary (B), Alarm (A), Radio (R) and Call (C), with edges E → A, B → A, E → R and A → C]
A BN is a Directed Acyclic Graph (DAG)
Nodes are Propositional Variables
Edges are Direct Causal Influences
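A minimal Python sketch of the example DAG, assuming the structure E → A, B → A, E → R, A → C and storing each node's parents in a dictionary (`PARENTS` and `is_acyclic` are hypothetical names). The acyclicity check uses Kahn's topological-sort algorithm:

```python
# Example DAG, stored as a mapping from each node to its parents.
PARENTS = {
    "E": [],          # Earthquake
    "B": [],          # Burglary
    "A": ["B", "E"],  # Alarm: directly caused by burglary or earthquake
    "R": ["E"],       # Radio report: directly caused by earthquake
    "C": ["A"],       # Call from a neighbor: directly caused by the alarm
}

def is_acyclic(parents):
    """Return True if the graph has no directed cycle (Kahn's algorithm)."""
    # In-degree of a node = number of its parents in this encoding.
    indegree = {v: len(ps) for v, ps in parents.items()}
    children = {v: [] for v in parents}
    for v, ps in parents.items():
        for p in ps:
            children[p].append(v)
    ready = [v for v, d in indegree.items() if d == 0]
    visited = 0
    while ready:
        v = ready.pop()
        visited += 1
        for c in children[v]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    # Every node is removed exactly when no cycle blocks it.
    return visited == len(parents)

print(is_acyclic(PARENTS))  # True
```

Storing parents rather than children matches how CPTs are indexed later: each node is quantified given its parents.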
Example
We would expect our belief in C to be influenced by evidence on R.
For example, if we get a radio report that an earthquake took place, then our belief in the alarm triggering would increase, which in turn would increase our belief in receiving a call from a neighbor.
However, we would not change our belief in C if we knew for sure that the alarm did not trigger.
Thus C would be independent of R given ¬A.
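The same intuition can be checked numerically. The Python sketch below assumes the structure E → A, B → A, E → R, A → C and uses made-up CPT numbers (`P_E`, `P_A`, `prob`, etc. are hypothetical); it builds each joint probability as the product of CPT entries and compares conditional probabilities:

```python
from itertools import product

# Hypothetical CPT entries (illustrative only): Pr(variable = True | parents).
P_E = 0.01                                          # Pr(e)
P_B = 0.02                                          # Pr(b)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # Pr(a | b, e), key (b, e)
P_R = {True: 0.9, False: 0.0001}                    # Pr(r | e), key e
P_C = {True: 0.7, False: 0.05}                      # Pr(c | a), key a

def bern(p_true, value):
    """Pr(V = value) for a binary variable with Pr(V = True) = p_true."""
    return p_true if value else 1 - p_true

def joint(e, b, a, r, c):
    """Pr(e, b, a, r, c) as the product of the CPT entries."""
    return (bern(P_E, e) * bern(P_B, b) * bern(P_A[(b, e)], a)
            * bern(P_R[e], r) * bern(P_C[a], c))

def prob(**fixed):
    """Probability of a partial instantiation, e.g. prob(c=True, a=False)."""
    total = 0.0
    for e, b, a, r, c in product([True, False], repeat=5):
        world = {"e": e, "b": b, "a": a, "r": r, "c": c}
        if all(world[k] == v for k, v in fixed.items()):
            total += joint(e, b, a, r, c)
    return total

# Without knowledge of the alarm, a radio report raises belief in a call ...
print(prob(c=True, r=True) / prob(r=True), prob(c=True))
# ... but given that the alarm did not trigger, the report changes nothing.
print(prob(c=True, a=False, r=True) / prob(a=False, r=True),
      prob(c=True, a=False) / prob(a=False))
```

With these numbers the first line shows Pr(c | r) above Pr(c), while the second shows Pr(c | ¬a, r) equal to Pr(c | ¬a), as the argument above predicts.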
Formal Representation of Independence
Given a variable V in a DAG G:
Parents(V) are the parents of V [Direct Causes of V]
Descendants(V) is the set of variables N with a directed path from V to N [Effects of V]
Non_Descendants(V) are all variables other than V, its Parents and its Descendants
Independence Statement / Markovian Assumption
$I(V, \mathrm{Parents}(V), \mathrm{Non\_Descendants}(V))$    (4.1)
That is, every variable is conditionally independent of its non-descendants given its parents. These statements are known as the Markovian assumptions and are denoted by Markov(G).
Equation 4.1 can also be read as: given the direct causes of a variable, our beliefs in that variable will no longer be influenced by any other variable except possibly by its effects.
Examples of Independence Statements
$I(C, A, \{B, E, R\})$
$I(R, E, \{A, B, C\})$
$I(A, \{B, E\}, R)$
$I(B, \emptyset, \{E, R\})$
$I(E, \emptyset, B)$
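The statements above can be generated mechanically from the DAG. A minimal Python sketch, assuming the structure E → A, B → A, E → R, A → C (the helpers `descendants` and `markov` are hypothetical names), computes Parents, Descendants and Non_Descendants for each node and emits one statement per node:

```python
# Example DAG: each node mapped to its parents.
PARENTS = {"E": [], "B": [], "A": ["B", "E"], "R": ["E"], "C": ["A"]}

def descendants(parents, v):
    """All nodes N with a directed path from v to N."""
    found, stack = set(), [v]
    while stack:
        u = stack.pop()
        for w, ps in parents.items():
            if u in ps and w not in found:  # w is a child of u
                found.add(w)
                stack.append(w)
    return found

def markov(parents):
    """Markov(G): one statement I(V, Parents(V), Non_Descendants(V)) per node."""
    statements = []
    for v, ps in parents.items():
        # Non_Descendants(V): everything except V, its parents, its descendants.
        nd = set(parents) - {v} - set(ps) - descendants(parents, v)
        statements.append((v, frozenset(ps), frozenset(nd)))
    return statements

for v, ps, nd in markov(PARENTS):
    print(f"I({v}, {sorted(ps)}, {sorted(nd)})")
```

Running it reproduces the five statements listed above, e.g. `I(C, ['A'], ['B', 'E', 'R'])`.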
Parameterizing the Independence Structure
Parameterizing means quantifying the dependencies between nodes and their parents.
In other words, it means constructing a conditional probability table (CPT) for each variable.
For every variable X in the DAG G and its parents U, we need to provide the probability Pr(x|u) for every value x of variable X and every instantiation u of parents U.
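A minimal Python sketch of one CPT, assuming binary variables and made-up numbers for the Alarm node of the earthquake example (`P_ALARM_TRUE`, `cpt_alarm` and `is_valid_cpt` are hypothetical names). It checks that, for each parent instantiation u, the entries Pr(x|u) sum to 1, and counts how many CPT entries the whole network needs compared with its JPD:

```python
from itertools import product

# Hypothetical Pr(a = True | b, e) for the Alarm node; values are illustrative.
P_ALARM_TRUE = {(True, True): 0.95, (True, False): 0.94,
                (False, True): 0.29, (False, False): 0.001}

# Full CPT for A given parents (B, E): key (a, b, e) -> Pr(a | b, e).
cpt_alarm = {(a, b, e): (p if a else 1 - p)
             for (b, e), p in P_ALARM_TRUE.items()
             for a in (True, False)}

def is_valid_cpt(cpt, parent_count):
    """For every parent instantiation u, the entries Pr(x | u) must sum to 1."""
    for u in product([True, False], repeat=parent_count):
        if abs(sum(cpt[(x, *u)] for x in (True, False)) - 1) > 1e-9:
            return False
    return True

# Number of parents per node in the earthquake example (binary variables).
parent_counts = {"E": 0, "B": 0, "A": 2, "R": 1, "C": 1}
# Each CPT has 2 values of X times 2^k parent instantiations.
cpt_entries = sum(2 ** (k + 1) for k in parent_counts.values())
print(is_valid_cpt(cpt_alarm, 2), cpt_entries, 2 ** len(parent_counts))
# prints: True 20 32
```

For this small network the saving is modest (20 entries versus 32), but the CPT total grows only linearly with the number of variables as long as each node has few parents.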
Formal Definition of Bayesian Network
A Bayesian Network for variables Z is a pair $(G, \Theta)$ where:
G is a directed acyclic graph over variables Z, called the network structure
$\Theta$ is a set of CPTs, one for each variable in Z, called the network parameterization
$\Theta_{X|U}$ denotes the CPT for variable X and its parents U, and the set XU is referred to as a network family.
Definition (continued)
$\theta_{x|u}$ denotes the value assigned by the CPT to the conditional probability Pr(x|u) and is called a network parameter.
An instantiation of all network variables is called a network instantiation.
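Example (a network over A, B, C, D, E in which B and C have parent A, D has parents B and C, and E has parent C):
Network parameter          Network instantiation
$\theta_{a}$               a
$\theta_{b|a}$             b
$\theta_{\neg c|a}$        ¬c
$\theta_{d|b,\neg c}$      d
$\theta_{\neg e|\neg c}$   ¬e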
Chain Rule for Bayesian Networks
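The probability of a network instantiation z is simply the product of all network parameters compatible with z:
$\Pr(z) = \prod_{\theta_{x|u} \sim z} \theta_{x|u}$, where $\sim$ denotes compatibility.
For example, $\Pr(a, b, \neg c, d, \neg e) = \theta_{a}\,\theta_{b|a}\,\theta_{\neg c|a}\,\theta_{d|b,\neg c}\,\theta_{\neg e|\neg c}$.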
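A minimal Python sketch of the chain rule on the five-variable network from the parameter table (structure read off the parameter subscripts; all CPT numbers and the names `P_TRUE`, `theta`, `chain_rule` are hypothetical). Each factor is looked up from the CPT row selected by z, and the product over all variables gives Pr(z):

```python
from itertools import product

# Structure read off the parameter table: B and C depend on A, D on (B, C), E on C.
PARENTS = {"A": (), "B": ("A",), "C": ("A",), "D": ("B", "C"), "E": ("C",)}

# Hypothetical Pr(X = True | parent values); the numbers are illustrative only.
P_TRUE = {
    "A": {(): 0.6},
    "B": {(True,): 0.2, (False,): 0.75},
    "C": {(True,): 0.8, (False,): 0.1},
    "D": {(True, True): 0.95, (True, False): 0.9,
          (False, True): 0.8, (False, False): 0.0},
    "E": {(True,): 0.7, (False,): 0.0},
}

def theta(x, z):
    """The network parameter theta_{x|u} compatible with the instantiation z."""
    u = tuple(z[p] for p in PARENTS[x])
    p_true = P_TRUE[x][u]
    return p_true if z[x] else 1 - p_true

def chain_rule(z):
    """Pr(z) as the product of all network parameters compatible with z."""
    result = 1.0
    for x in PARENTS:
        result *= theta(x, z)
    return result

# Pr(a, b, ¬c, d, ¬e) = theta_a * theta_{b|a} * theta_{¬c|a} * theta_{d|b,¬c} * theta_{¬e|¬c}
z = {"A": True, "B": True, "C": False, "D": True, "E": False}
print(chain_rule(z))

# Sanity check: summing Pr(z) over all 32 network instantiations gives 1.
total = sum(chain_rule(dict(zip(PARENTS, vals)))
            for vals in product([True, False], repeat=len(PARENTS)))
print(total)
```

Because every CPT row sums to 1, the products over all instantiations form a valid distribution, which the final sum illustrates.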
Properties of Probabilistic Independence
Recall: $I_{\Pr}(X, Z, Y)$ if and only if $\Pr(x \mid z, y) = \Pr(x \mid z)$ or $\Pr(y, z) = 0$, for all instantiations x, y, z.
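The definition can be turned into a brute-force check over a joint distribution stored as a table. A minimal Python sketch, assuming binary variables and a joint given as a dict from value tuples to probabilities (`marginal` and `independent` are hypothetical helpers):

```python
from itertools import product

def marginal(joint, names, fixed):
    """Probability of a partial assignment `fixed` (variable name -> value)."""
    return sum(p for world, p in joint.items()
               if all(world[names.index(k)] == v for k, v in fixed.items()))

def independent(joint, names, X, Z, Y):
    """Check I(X, Z, Y): Pr(x|z,y) = Pr(x|z) or Pr(y,z) = 0 for all x, y, z.

    `joint` maps tuples of binary values, ordered as in `names`, to probabilities.
    """
    for xs in product([True, False], repeat=len(X)):
        for zs in product([True, False], repeat=len(Z)):
            for ys in product([True, False], repeat=len(Y)):
                x, z, y = dict(zip(X, xs)), dict(zip(Z, zs)), dict(zip(Y, ys))
                p_yz = marginal(joint, names, {**y, **z})
                if p_yz == 0:
                    continue  # Pr(x|z,y) is undefined; the condition holds trivially
                lhs = marginal(joint, names, {**x, **y, **z}) / p_yz
                rhs = marginal(joint, names, {**x, **z}) / marginal(joint, names, z)
                if abs(lhs - rhs) > 1e-9:
                    return False
    return True

# Two binary variables: independent fair coins vs. Y always equal to X.
names = ["X", "Y"]
coins = {world: 0.25 for world in product([True, False], repeat=2)}
same = {(True, True): 0.5, (False, False): 0.5, (True, False): 0.0, (False, True): 0.0}
print(independent(coins, names, ["X"], [], ["Y"]))  # True
print(independent(same, names, ["X"], [], ["Y"]))   # False
```

Enumerating every instantiation is exponential, so this only suits tiny examples; it mirrors the definition rather than any efficient test.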
Graphoid Axioms: Symmetry, Weak Union, Decomposition, Contraction
Symmetry
$I_{\Pr}(X, Z, Y)$ if and only if $I_{\Pr}(Y, Z, X)$
If learning y does not influence our belief in x, then learning x does not influence our belief in y.
By Markov(G) we know that $I(A, \{B, E\}, R)$. Using symmetry: $I(R, \{B, E\}, A)$.
Decomposition
$I_{\Pr}(X, Z, Y \cup W)$ only if $I_{\Pr}(X, Z, Y)$ and $I_{\Pr}(X, Z, W)$
If learning yw does not influence our belief in x, then learning y alone or learning w alone does not influence our belief in x.
Weak Union
$I_{\Pr}(X, Z, Y \cup W)$ only if $I_{\Pr}(X, Z \cup Y, W)$
If the information yw is not relevant to our belief in x, then learning the partial information y will not make the rest of the information, w, relevant.
Contraction
$I_{\Pr}(X, Z, Y)$ and $I_{\Pr}(X, Z \cup Y, W)$ only if $I_{\Pr}(X, Z, Y \cup W)$
If, after learning the irrelevant information y, the information w is found to be irrelevant to our belief in x, then the combined information yw must have been irrelevant from the beginning.
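The four properties can be illustrated (not proved) on a single distribution. The Python sketch below builds a hypothetical joint over X, Z, Y, W as Pr(z) Pr(x|z) Pr(y,w|z), so that $I(X, Z, Y \cup W)$ holds by construction while Y and W stay dependent, and then evaluates each axiom as an implication. All numbers and names (`JOINT`, `independent`, `implies`) are assumptions for illustration:

```python
from itertools import product

NAMES = ["X", "Z", "Y", "W"]

# Hypothetical distribution Pr(z) Pr(x|z) Pr(y,w|z): I(X, Z, {Y, W}) holds by
# construction, while Y and W remain dependent given Z.
P_Z = {True: 0.3, False: 0.7}
P_X_GIVEN_Z = {True: 0.9, False: 0.2}  # Pr(x = True | z)
P_YW_GIVEN_Z = {
    True: {(True, True): 0.1, (True, False): 0.2, (False, True): 0.3, (False, False): 0.4},
    False: {(True, True): 0.25, (True, False): 0.05, (False, True): 0.5, (False, False): 0.2},
}
JOINT = {}
for x, z, y, w in product([True, False], repeat=4):
    px = P_X_GIVEN_Z[z] if x else 1 - P_X_GIVEN_Z[z]
    JOINT[(x, z, y, w)] = P_Z[z] * px * P_YW_GIVEN_Z[z][(y, w)]

def marginal(fixed):
    """Probability of a partial assignment such as {"X": True, "Z": False}."""
    return sum(p for world, p in JOINT.items()
               if all(world[NAMES.index(k)] == v for k, v in fixed.items()))

def independent(X, Z, Y):
    """I(X, Z, Y): Pr(x|z,y) = Pr(x|z) or Pr(y,z) = 0, for all x, y, z."""
    for vals in product([True, False], repeat=len(X) + len(Z) + len(Y)):
        x = dict(zip(X, vals[:len(X)]))
        z = dict(zip(Z, vals[len(X):len(X) + len(Z)]))
        y = dict(zip(Y, vals[len(X) + len(Z):]))
        p_yz = marginal({**y, **z})
        if p_yz == 0:
            continue
        lhs = marginal({**x, **y, **z}) / p_yz
        rhs = marginal({**x, **z}) / marginal(z)
        if abs(lhs - rhs) > 1e-9:
            return False
    return True

def implies(premise, conclusion):
    return (not premise) or conclusion

i_x_z_yw = independent(["X"], ["Z"], ["Y", "W"])   # I(X, Z, Y ∪ W)
i_x_z_y = independent(["X"], ["Z"], ["Y"])         # I(X, Z, Y)
i_x_z_w = independent(["X"], ["Z"], ["W"])         # I(X, Z, W)
i_x_zy_w = independent(["X"], ["Z", "Y"], ["W"])   # I(X, Z ∪ Y, W)

axioms = {
    "symmetry": implies(i_x_z_yw, independent(["Y", "W"], ["Z"], ["X"])),
    "decomposition": implies(i_x_z_yw, i_x_z_y and i_x_z_w),
    "weak union": implies(i_x_z_yw, i_x_zy_w),
    "contraction": implies(i_x_z_y and i_x_zy_w, i_x_z_yw),
}
print(i_x_z_yw, axioms)
```

Since the premises hold here, each implication is checked non-vacuously on this one distribution; a proof of the axioms still requires the general argument.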
Questions?