bayesian networks chapter#4 book: modeling and reasoning with bayesian networks author : adnan...

BAYESIAN NETWORKS CHAPTER#4

Book: Modeling and Reasoning with Bayesian Networks

Author : Adnan Darwiche

Publisher: Cambridge University Press, 2009

Introduction

A Joint Probability Distribution (JPD) can be used to model uncertain beliefs and to update them in the face of Hard and Soft Evidence.

The problem with a JPD is that its size grows exponentially with the number of variables, which introduces both modeling and computational difficulties.
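The growth can be made concrete with a few lines of Python. This is only an illustrative sketch: the helper name joint_table_size is made up here, and it assumes every variable takes the same number of values.

```python
def joint_table_size(num_vars, values_per_var=2):
    """Number of entries in an explicit joint probability table.

    Hypothetical helper: assumes every variable has `values_per_var` values.
    """
    return values_per_var ** num_vars


if __name__ == "__main__":
    # The table doubles with every additional binary variable.
    for n in (5, 10, 20, 30):
        print(f"{n} binary variables -> {joint_table_size(n)} entries")
```

Even at 30 binary variables the explicit table has over a billion entries, which is why a more compact specification is needed.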

Need for BN

A BN is a graphical modeling tool for compactly specifying a JPD

A BN relies on two basic insights: “independence forms a significant aspect of belief” and “elicitation is relatively easy using the language of graphs”

Example

[Figure: DAG over Earthquake (E), Burglary (B), Alarm (A), Radio (R), Call (C), with edges E → R, E → A, B → A, A → C]

A BN is a Directed Acyclic Graph (DAG)

Nodes are Propositional Variables

Edges are Direct Causal Influences

Example

We would expect our belief in C to be influenced by evidence on R

For example, if we get a Radio report that an Earthquake took place, then our belief that the Alarm triggered would increase, which in turn would increase our belief in receiving a Call from a neighbor

However, evidence on R would not change our belief in C if we knew for sure that the Alarm did not trigger

Thus C would be independent of R given ¬A

Formal Representation of Independence

Given a variable V in a DAG G:

Parents (V) are the parents of V [Direct Causes of V]

Descendants(V) is the set of variables N with a directed path from V to N [Effects of V]

Non_Descendants(V) are all variables other than V, Parents(V) and Descendants(V)

Independence Statement / Markovian Assumption

I(V, Parents(V), Non_Descendants(V))    (4.1)

That is, every variable is conditionally independent of its non-descendants given its parents. This is known as the Markovian Assumption and is denoted by Markov(G)

(4.1) can also be read as: given the direct causes of a variable, our beliefs in that variable will no longer be influenced by any other variable, except possibly by its effects

Examples of Independence Statements

I(C, A, {B,E,R})
I(R, E, {A,B,C})
I(A, {B,E}, R)
I(B, ∅, {E,R})
I(E, ∅, B)
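The statements above can be derived mechanically from the graph. The following is a minimal sketch, assuming the edge set E → R, E → A, B → A, A → C (read off from the parent sets in the statements) and a dictionary-of-parents encoding of the DAG; the function names are chosen here for illustration only.

```python
# DAG encoded as: variable -> set of its parents.
# Edges assumed from the independence statements: E->R, E->A, B->A, A->C.
PARENTS = {
    "E": set(),
    "B": set(),
    "A": {"B", "E"},
    "R": {"E"},
    "C": {"A"},
}


def children(dag, v):
    """Variables that have v as a parent."""
    return {x for x, ps in dag.items() if v in ps}


def descendants(dag, v):
    """All variables reachable from v along directed edges."""
    found = set()
    stack = [v]
    while stack:
        node = stack.pop()
        for child in children(dag, node):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def non_descendants(dag, v):
    """All variables other than v, its parents and its descendants."""
    return set(dag) - {v} - dag[v] - descendants(dag, v)


def markov(dag):
    """Map each variable V to the pair (Parents(V), Non_Descendants(V))."""
    return {v: (set(dag[v]), non_descendants(dag, v)) for v in dag}


if __name__ == "__main__":
    for v, (ps, nd) in markov(PARENTS).items():
        print(f"I({v}, {sorted(ps)}, {sorted(nd)})")
```

Running the script prints one statement of the form (4.1) per variable, which can be compared against the five statements listed above.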

Parameterizing the Independence Structure

Parameterizing means quantifying the dependencies between nodes and their parents

In other words, constructing a Conditional Probability Table (CPT) for each node

For every variable X in the DAG G and its parents U, we need to provide the probability Pr(x|u) for every value x of variable X and every instantiation u of parents U
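As a rough illustration, a CPT for Alarm (A) with parents Burglary (B) and Earthquake (E) could be stored as a dictionary. All numbers below are hypothetical and chosen only for the example, and the helper names are not from the book. The sketch also counts the CPT entries needed for the five-variable example, assuming binary variables and the parent sets E: none, B: none, A: {B, E}, R: {E}, C: {A}.

```python
# Hypothetical values of Pr(A = true | b, e), keyed by the parent instantiation (b, e).
PR_A_TRUE = {
    (True, True): 0.95,
    (True, False): 0.90,
    (False, True): 0.30,
    (False, False): 0.01,
}


def alarm_cpt():
    """Expand to a full CPT with one entry Pr(a | b, e) per triple (a, b, e)."""
    cpt = {}
    for (b, e), p in PR_A_TRUE.items():
        cpt[(True, b, e)] = p
        cpt[(False, b, e)] = 1.0 - p  # entries for a fixed (b, e) sum to 1
    return cpt


def cpt_size(num_parents, values_per_var=2):
    """Entries in the CPT of a variable with `num_parents` parents."""
    return values_per_var ** (num_parents + 1)


# Number of parents of each variable in the example network (assumed structure).
PARENT_COUNTS = {"E": 0, "B": 0, "A": 2, "R": 1, "C": 1}


def total_cpt_entries():
    """Total number of CPT entries over all five variables."""
    return sum(cpt_size(k) for k in PARENT_COUNTS.values())


if __name__ == "__main__":
    print(len(alarm_cpt()), "entries in the CPT for A")
    print(total_cpt_entries(), "CPT entries in total, versus", 2 ** 5, "joint entries")
```

For this small network the saving is modest, but the CPT sizes depend only on the number of parents per variable, not on the total number of variables.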

Formal Definition of Bayesian Network

A Bayesian Network for variables Z is a pair (G, Θ) where:

G is a directed acyclic graph over variables Z, called the Network Structure

Θ is a set of CPTs, one for each variable in Z, called the Network Parameterization

Θ_{X|U} is used to denote the CPT for variable X and its parents U, and the set XU is referred to as a Network Family.

Definition (continued)

θ_{x|u} denotes the value assigned by the CPT Θ_{X|U} to the conditional probability Pr(x|u) and is called a Network Parameter

An instantiation of all the network variables is called a Network Instantiation

Network parameter    Network instantiation
θ_a                  a
θ_{b|a}              b
θ_{¬c|a}             ¬c
θ_{d|b,¬c}           d
θ_{¬e|¬c}            ¬e

Chain Rule for Bayesian Networks

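The probability of a network instantiation z is simply the product of all network parameters compatible with z:

Pr(z) = ∏ θ_{x|u}, where the product ranges over all network parameters θ_{x|u} compatible with z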
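A small sketch of the chain rule follows. It assumes the network suggested by the parameter table above (A with no parents, B and C with parent A, D with parents B and C, E with parent C) and uses hypothetical CPT values throughout; the names P_TRUE, theta and pr are illustrative only.

```python
from itertools import product

VARS = ["A", "B", "C", "D", "E"]

# Parent lists read off the parameter table; the order matters for CPT keys.
PARENTS = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"], "E": ["C"]}

# Hypothetical values of Pr(X = true | u), keyed by the parent instantiation u.
P_TRUE = {
    "A": {(): 0.6},
    "B": {(True,): 0.2, (False,): 0.75},
    "C": {(True,): 0.8, (False,): 0.1},
    "D": {(True, True): 0.95, (True, False): 0.9,
          (False, True): 0.8, (False, False): 0.05},
    "E": {(True,): 0.7, (False,): 0.1},
}


def theta(var, z):
    """Network parameter for `var` that is compatible with instantiation z."""
    u = tuple(z[p] for p in PARENTS[var])
    p_true = P_TRUE[var][u]
    return p_true if z[var] else 1.0 - p_true


def pr(z):
    """Chain rule: product of all network parameters compatible with z."""
    result = 1.0
    for var in VARS:
        result *= theta(var, z)
    return result


def all_instantiations():
    """Every network instantiation, as a dict from variable to truth value."""
    for values in product([True, False], repeat=len(VARS)):
        yield dict(zip(VARS, values))


if __name__ == "__main__":
    z = {"A": True, "B": True, "C": False, "D": True, "E": False}
    print("Pr(a, b, not c, d, not e) =", pr(z))
    print("Sum over all instantiations =", sum(pr(w) for w in all_instantiations()))
```

Summing pr over all network instantiations should give 1 up to floating-point error, which is a quick sanity check that the CPTs define a proper distribution.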

Properties of Probabilistic Independence

Recall: I_Pr(X, Z, Y) if and only if Pr(x|z,y) = Pr(x|z) or Pr(y,z) = 0 for all instantiations x, y, z

Graphoid Axioms: Symmetry, Decomposition, Weak Union, Contraction

Symmetry

I_Pr(X, Z, Y) if and only if I_Pr(Y, Z, X)

If learning y does not influence our belief in x, then learning x does not influence our belief in y

By Markov(G) we know that I(A, {B,E}, R). Using Symmetry: I(R, {B,E}, A)

Decomposition

I_Pr(X, Z, Y ∪ W) only if I_Pr(X, Z, Y) and I_Pr(X, Z, W)

If learning yw does not influence our belief in x, then learning y alone or learning w alone does not influence our belief in x

Weak Union

I_Pr(X, Z, Y ∪ W) only if I_Pr(X, Z ∪ Y, W)

If the information yw is not relevant to our belief in x, then the partial information y will not make the rest of the information, w, relevant

Contraction

I_Pr(X, Z, Y) and I_Pr(X, Z ∪ Y, W) only if I_Pr(X, Z, Y ∪ W)

If, after learning the irrelevant information y, the information w is found to be irrelevant to our belief in x, then the combined information yw must have been irrelevant from the beginning
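The axioms can be checked numerically on a small example. The sketch below is an assumption-laden illustration, not a proof: it builds a joint distribution over four binary variables X, Z, Y, W from hypothetical numbers in which X is independent of {Y, W} given Z by construction, and tests independence by brute force, requiring Pr(x|z,y) = Pr(x|z) whenever Pr(y,z) > 0.

```python
from itertools import product

VARS = ["X", "Z", "Y", "W"]

# Hypothetical numbers: the joint is Pr(z) * Pr(x|z) * Pr(y,w|z),
# so X is independent of {Y, W} given Z by construction.
PR_Z = {True: 0.3, False: 0.7}
PR_X_TRUE_GIVEN_Z = {True: 0.9, False: 0.2}
PR_YW_GIVEN_Z = {
    True: {(True, True): 0.4, (True, False): 0.1,
           (False, True): 0.2, (False, False): 0.3},
    False: {(True, True): 0.1, (True, False): 0.5,
            (False, True): 0.3, (False, False): 0.1},
}


def build_joint():
    """Joint distribution as a dict keyed by tuples (x, z, y, w)."""
    joint = {}
    for x, z, y, w in product([True, False], repeat=4):
        px = PR_X_TRUE_GIVEN_Z[z] if x else 1.0 - PR_X_TRUE_GIVEN_Z[z]
        joint[(x, z, y, w)] = PR_Z[z] * px * PR_YW_GIVEN_Z[z][(y, w)]
    return joint


def marginal(joint, fixed):
    """Probability of a partial instantiation given as {variable: value}."""
    return sum(
        p for world, p in joint.items()
        if all(world[VARS.index(v)] == val for v, val in fixed.items())
    )


def independent(joint, xs, zs, ys, tol=1e-9):
    """Brute-force test of I(X, Z, Y) for lists of variable names."""
    names = list(xs) + list(zs) + list(ys)
    for values in product([True, False], repeat=len(names)):
        inst = dict(zip(names, values))
        x = {v: inst[v] for v in xs}
        z = {v: inst[v] for v in zs}
        y = {v: inst[v] for v in ys}
        p_yz = marginal(joint, {**y, **z})
        if p_yz <= tol:
            continue  # the condition "Pr(y,z) = 0" holds for this instantiation
        lhs = marginal(joint, {**x, **y, **z}) / p_yz      # Pr(x | z, y)
        rhs = marginal(joint, {**x, **z}) / marginal(joint, z)  # Pr(x | z)
        if abs(lhs - rhs) > tol:
            return False
    return True


if __name__ == "__main__":
    joint = build_joint()
    print("I(X, Z, YW):", independent(joint, ["X"], ["Z"], ["Y", "W"]))
    print("Symmetry, I(YW, Z, X):", independent(joint, ["Y", "W"], ["Z"], ["X"]))
    print("Decomposition, I(X, Z, Y):", independent(joint, ["X"], ["Z"], ["Y"]))
    print("Decomposition, I(X, Z, W):", independent(joint, ["X"], ["Z"], ["W"]))
    print("Weak Union, I(X, ZY, W):", independent(joint, ["X"], ["Z", "Y"], ["W"]))
    print("Not implied, I(Y, Z, W):", independent(joint, ["Y"], ["Z"], ["W"]))
```

The last check is included as a contrast: Y and W were deliberately made dependent given Z, so the test is expected to fail there while the consequences of the axioms hold.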

Questions ???
