Source: users.cecs.anu.edu.au/~jinbo/pr/lecture01.pdf
TRANSCRIPT
Reasoning with Bayesian Networks
Lecture 1: Probability Calculus, Bayesian Networks
Jinbo Huang
NICTA and ANU
Jinbo Huang Reasoning with Bayesian Networks
Overview of the Course
- Probability calculus, Bayesian networks
- Inference by variable elimination, factor elimination, conditioning
- Models for graph decomposition
- Most likely instantiations
- Complexity of probabilistic inference
- Compiling Bayesian networks
- Inference with local structure
- Applications
Probabilities of Worlds
Worlds are modeled with propositional variables
A world is a complete instantiation of the variables
The probabilities of all worlds add up to 1
An event is a set of worlds
Probabilities of Events
Pr(α) def= ∑_{ω∈α} Pr(ω)

- Pr(Earthquake) = Pr(ω1) + Pr(ω2) + Pr(ω3) + Pr(ω4) = .1
- Pr(¬Burglary) = .8
- Pr(Alarm) = .2442
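The summation over worlds can be checked concretely. A minimal sketch in Python: the per-world probabilities below are not in the transcript (the slide's table was lost in extraction) and are a reconstruction chosen to be consistent with the figures quoted in the lecture (.1, .8, .2442, and the later conditionals).

```python
# Enumerating worlds to compute event probabilities.
# NOTE: the per-world probabilities are a reconstruction (assumption),
# chosen to match every number quoted on the slides.
worlds = {
    # (Earthquake, Burglary, Alarm): Pr(world)
    (True,  True,  True):  0.0190,
    (True,  True,  False): 0.0010,
    (True,  False, True):  0.0560,
    (True,  False, False): 0.0240,
    (False, True,  True):  0.1620,
    (False, True,  False): 0.0180,
    (False, False, True):  0.0072,
    (False, False, False): 0.7128,
}

def pr(event):
    """Pr(alpha): sum Pr(omega) over worlds omega in the event alpha."""
    return sum(p for w, p in worlds.items() if event(w))

print(round(pr(lambda w: w[0]), 4))      # Pr(Earthquake) -> 0.1
print(round(pr(lambda w: not w[1]), 4))  # Pr(-Burglary)  -> 0.8
print(round(pr(lambda w: w[2]), 4))      # Pr(Alarm)      -> 0.2442
```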
Propositional Sentences as Events
Pr((Earthquake ∨ Burglary) ∧ ¬Alarm) = ?

Interpret world ω as a model (satisfying assignment) for propositional sentence α

Pr(α) def= ∑_{ω|=α} Pr(ω)
Updating Beliefs
Suppose we’re given evidence β that Alarm = true
Need to update Pr(.) to Pr(.|β)
Updating Beliefs

Pr(β|β) = 1, Pr(¬β|β) = 0

Partition worlds into those |= β and those |= ¬β
- Pr(ω|β) = 0, for all ω |= ¬β
- Relative beliefs stay put, for ω |= β
- ∑_{ω|=β} Pr(ω) = Pr(β)
- ∑_{ω|=β} Pr(ω|β) = 1
- The above imply we normalize by the constant Pr(β)

Pr(ω|β) def= 0, if ω |= ¬β; Pr(ω)/Pr(β), if ω |= β
Bayes Conditioning
Pr(α|β) = ∑_{ω|=α} Pr(ω|β)
        = ∑_{ω|=α, ω|=β} Pr(ω|β) + ∑_{ω|=α, ω|=¬β} Pr(ω|β)
        = ∑_{ω|=α, ω|=β} Pr(ω|β) = ∑_{ω|=α∧β} Pr(ω|β)
        = ∑_{ω|=α∧β} Pr(ω)/Pr(β) = (1/Pr(β)) ∑_{ω|=α∧β} Pr(ω)
        = Pr(α ∧ β) / Pr(β)
Bayes Conditioning
Pr(Burglary) = .2
Pr(Burglary | Alarm) ≈ .741 ↑
Pr(Burglary | Alarm ∧ Earthquake) ≈ .253 ↓
Pr(Burglary | Alarm ∧ ¬Earthquake) ≈ .957 ↑
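These conditionals follow mechanically from the definition Pr(α|β) = Pr(α ∧ β)/Pr(β). A sketch, again using a world table reconstructed to match the lecture's quoted figures (the original table is not in the transcript):

```python
# Bayes conditioning evaluated by summing over worlds.
# World probabilities are a reconstruction (assumption), consistent
# with the numbers quoted on the slides.
worlds = {
    (True,  True,  True):  0.0190, (True,  True,  False): 0.0010,
    (True,  False, True):  0.0560, (True,  False, False): 0.0240,
    (False, True,  True):  0.1620, (False, True,  False): 0.0180,
    (False, False, True):  0.0072, (False, False, False): 0.7128,
}  # keys: (Earthquake, Burglary, Alarm)

def pr(event, given=lambda w: True):
    """Pr(alpha | beta) = Pr(alpha & beta) / Pr(beta)."""
    joint = sum(p for w, p in worlds.items() if event(w) and given(w))
    return joint / sum(p for w, p in worlds.items() if given(w))

E, B, A = (lambda w: w[0]), (lambda w: w[1]), (lambda w: w[2])
print(round(pr(B), 3))                                     # 0.2
print(round(pr(B, given=A), 3))                            # 0.741
print(round(pr(B, given=lambda w: A(w) and E(w)), 3))      # 0.253
print(round(pr(B, given=lambda w: A(w) and not E(w)), 3))  # 0.957
```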
Independence
Pr(Earthquake) = .1
Pr(Earthquake | Burglary) = .1

Event α is independent of event β if

Pr(α|β) = Pr(α) or Pr(β) = 0

Or equivalently

Pr(α ∧ β) = Pr(α) Pr(β)
Independence is symmetric
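Both formulations can be checked numerically on the running example. A sketch (world probabilities are, as before, a reconstruction matching the lecture's quoted numbers, not the lost table itself):

```python
# Independence check: Pr(E & B) = Pr(E) * Pr(B), and Pr(E|B) = Pr(E).
# World probabilities are a reconstruction (assumption).
worlds = {
    (True,  True,  True):  0.0190, (True,  True,  False): 0.0010,
    (True,  False, True):  0.0560, (True,  False, False): 0.0240,
    (False, True,  True):  0.1620, (False, True,  False): 0.0180,
    (False, False, True):  0.0072, (False, False, False): 0.7128,
}  # keys: (Earthquake, Burglary, Alarm)

pr = lambda ev: sum(p for w, p in worlds.items() if ev(w))
E, B = (lambda w: w[0]), (lambda w: w[1])

print(abs(pr(lambda w: E(w) and B(w)) - pr(E) * pr(B)) < 1e-12)  # True
print(round(pr(lambda w: E(w) and B(w)) / pr(B), 3))             # 0.1
```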
Conditional Independence
Independence is a dynamic notion
Burglary is independent of Earthquake, but not after accepting evidence Alarm

Pr(Burglary | Alarm) ≈ .741
Pr(Burglary | Alarm ∧ Earthquake) ≈ .253
α independent of β given γ if
Pr(α ∧ β|γ) = Pr(α|γ) Pr(β|γ) or Pr(γ) = 0
Variable Independence
For disjoint sets of variables X, Y, and Z
IPr(X, Z, Y): X independent of Y given Z in Pr

Means x is independent of y given z for all instantiations x, y, z of X, Y, Z
Properties of Beliefs: Chain Rule
Pr(α1 ∧ α2 ∧ ... ∧ αn) = Pr(α1 | α2 ∧ ... ∧ αn) Pr(α2 | α3 ∧ ... ∧ αn) ... Pr(αn)
Properties of Beliefs: Case Analysis
Pr(α) = ∑_{i=1}^{n} Pr(α ∧ βi)

Or equivalently

Pr(α) = ∑_{i=1}^{n} Pr(α | βi) Pr(βi)

where the events βi are mutually exclusive and collectively exhaustive
Properties of Beliefs: Case Analysis
Special case of n = 2
Pr(α) = Pr(α ∧ β) + Pr(α ∧ ¬β)
Or equivalently
Pr(α) = Pr(α|β) Pr(β) + Pr(α|¬β) Pr(¬β)
Properties of Beliefs: Bayes Rule
Pr(α|β) = Pr(β|α) Pr(α) / Pr(β)
Properties of Beliefs: Bayes Rule
Example: patient tested positive for disease

- D: patient has disease
- T: test comes out positive
- Would like to know Pr(D|T)

We know

- Pr(D) = 1/1000
- Pr(T|¬D) = 2/100 (false positive)
- Pr(¬T|D) = 5/100 (false negative)

Using Bayes rule: Pr(D|T) = Pr(T|D) Pr(D) / Pr(T)
Properties of Beliefs: Bayes Rule
Computing Pr(T): case analysis

Pr(T) = Pr(T|D) Pr(D) + Pr(T|¬D) Pr(¬D)
      = (95/100)(1/1000) + (2/100)(999/1000)
      = 2093/100000

Pr(D|T) = 95/2093 ≈ 4.5%
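The same computation carries through with exact fractions, following the slide's case analysis and Bayes rule:

```python
# Disease-test example computed exactly with fractions.
from fractions import Fraction

p_d = Fraction(1, 1000)             # Pr(D)
p_t_given_d = 1 - Fraction(5, 100)  # Pr(T|D), from the false negative rate
p_t_given_nd = Fraction(2, 100)     # Pr(T|-D), the false positive rate

p_t = p_t_given_d * p_d + p_t_given_nd * (1 - p_d)  # case analysis
p_d_given_t = p_t_given_d * p_d / p_t               # Bayes rule

print(p_t)                           # 2093/100000
print(p_d_given_t)                   # 95/2093
print(round(float(p_d_given_t), 3))  # 0.045
```

Despite the positive test, the posterior stays below 5% because the disease is rare: the prior Pr(D) dominates.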
Soft Evidence
So far we’ve considered hard evidence, that some event hasoccurred
Soft evidence: neighbor with hearing problem calls to tell us theyheard alarm
May not confirm Alarm, but can still increase our belief in Alarm
Two main methods to specify strength of soft evidence
Soft Evidence: All Things Considered
“After receiving my neighbor’s call, my belief in Alarm is now .85”
Soft evidence as constraint Pr′(β) = q
Not a statement about the strength of evidence per se, but the result of its integration with our initial beliefs

Scale beliefs in worlds |= β so they add up to q
Scale beliefs in worlds |= ¬β so they add up to 1 − q
Soft Evidence: All Things Considered
For worlds

Pr′(ω) def= (q / Pr(β)) Pr(ω), if ω |= β; ((1 − q) / Pr(¬β)) Pr(ω), if ω |= ¬β

For events (Jeffrey's rule)

Pr′(α) = q Pr(α|β) + (1 − q) Pr(α|¬β)

Bayes conditioning is the special case q = 1
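The world-level scaling and Jeffrey's event-level rule agree, which a short sketch can confirm. The world table is the same reconstruction used earlier (an assumption, consistent with the lecture's figures), and q = .85 is the value from the neighbor's-call example:

```python
# Jeffrey's rule: soft evidence sets Pr'(Alarm) = 0.85.
# World probabilities are a reconstruction (assumption).
worlds = {
    (True,  True,  True):  0.0190, (True,  True,  False): 0.0010,
    (True,  False, True):  0.0560, (True,  False, False): 0.0240,
    (False, True,  True):  0.1620, (False, True,  False): 0.0180,
    (False, False, True):  0.0072, (False, False, False): 0.7128,
}  # keys: (Earthquake, Burglary, Alarm)

q = 0.85
beta = lambda w: w[2]  # the evidence event: Alarm
p_beta = sum(p for w, p in worlds.items() if beta(w))

# Scale worlds |= beta to total q, worlds |= -beta to total 1 - q
new = {w: p * (q / p_beta if beta(w) else (1 - q) / (1 - p_beta))
       for w, p in worlds.items()}

pr2 = lambda ev: sum(p for w, p in new.items() if ev(w))
print(round(pr2(beta), 2))            # 0.85: the constraint holds
print(round(pr2(lambda w: w[1]), 3))  # updated belief in Burglary
```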
Soft Evidence: Nothing Else Considered
Would like to declare the strength of evidence independently of current beliefs

Need the notion of odds of an event β: O(β) def= Pr(β) / Pr(¬β)

Declare evidence on β by a Bayes factor k = O′(β) / O(β)

"My neighbor's call increases the odds of Alarm by a factor of 4"

Compute Pr′(β) and fall back on Jeffrey's rule to update beliefs
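Recovering Pr′(β) from a Bayes factor is a one-line odds conversion. A sketch; the helper name and the prior value 0.2442 (the earlier Pr(Alarm)) are our own choices, not from the slides:

```python
# "Nothing else considered": evidence declared as a Bayes factor k on
# the odds of beta, independent of current beliefs.
def posterior_from_bayes_factor(p_beta, k):
    odds = p_beta / (1 - p_beta)      # O(beta)
    new_odds = k * odds               # O'(beta) = k * O(beta)
    return new_odds / (1 + new_odds)  # back to Pr'(beta)

p_alarm = 0.2442                      # prior belief in Alarm
p_new = posterior_from_bayes_factor(p_alarm, 4)  # "factor of 4"
print(round(p_new, 3))
```

Once Pr′(β) is in hand, Jeffrey's rule from the previous slide completes the update.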
Probabilistic Reasoning
Basic task: belief updating in face of hard and soft evidence
Can be done in principle using definitions we’ve seen
Inefficient, requires joint probability tables (exponential)
Also a modeling difficulty
- Specifying joint probabilities is tedious
- Beliefs (e.g., independencies) held by human experts are not easily enforced
Capturing Independence Graphically
Evidence R could increase Pr(A), Pr(C)

But not if we know ¬A

C is independent of R given ¬A
Capturing Independence Graphically
Parents(V): variables with an edge to V

Descendants(V): variables to which ∃ a directed path from V

Nondescendants(V): all except V, Parents(V), and Descendants(V)
Markovian Assumptions
Every variable is conditionally independent of its nondescendants given its parents

I(V, Parents(V), Nondescendants(V))

Given the direct causes of a variable, our beliefs in that variable are no longer influenced by any other variable, except possibly by its effects
Capturing Independence Graphically
I(C, A, {B, E, R})
I(R, E, {A, B, C})
I(A, {B, E}, R)
I(B, ∅, {E, R})
I(E, ∅, B)
Conditional Probability Tables
DAG G together with Markov(G) declares a set of independencies, but does not define Pr

Pr is uniquely defined if a conditional probability table (CPT) is provided for each variable X

CPT: Pr(x|u) for every value x of variable X and every instantiation u of its parents U
Conditional Probability Tables
Pr(c|a)
Pr(r|e)
Pr(a|b, e)
Pr(e)
Pr(b)
Conditional Probability Tables
Half of the probabilities are redundant

Only need 10 probabilities for the whole DAG
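The count of 10 can be verified directly for the binary network with CPTs Pr(b), Pr(e), Pr(a|b,e), Pr(c|a), Pr(r|e): each CPT row sums to 1, so a binary variable needs only one free parameter per parent instantiation.

```python
# Parameter counting for the five-variable binary network.
parents = {"B": [], "E": [], "A": ["B", "E"], "C": ["A"], "R": ["E"]}

full = sum(2 ** (len(ps) + 1) for ps in parents.values())  # every CPT entry
free = sum(2 ** len(ps) for ps in parents.values())        # non-redundant

print(full, free)  # 20 10
```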
Bayesian Network
A pair (G, Θ) where

- G is a DAG over variables Z, called the network structure
- Θ is a set of CPTs, one for each variable in Z, called the network parametrization
Bayesian Network
Θ_{X|U}: CPT for X and its parents U

XU: network family

θ_{x|u} = Pr(x|u): network parameter

∑_x θ_{x|u} = 1 for every parent instantiation u
Bayesian Network
Network instantiation: e.g., a, b, c, d, e

θ_{x|u} ∼ z: instantiations xu & z are compatible

θ_a, θ_{b|a}, θ_{c|a}, θ_{d|b,c}, θ_{e|c} all ∼ z above
Bayesian Network
The independence constraints imposed by the network structure (DAG) and the numeric constraints imposed by the network parametrization (CPTs) are satisfied by a unique Pr

Pr(z) def= ∏_{θx|u ∼ z} θ_{x|u} (chain rule)

A Bayesian network is an implicit representation of this probability distribution
Chain Rule Example
Pr(z) def= ∏_{θx|u ∼ z} θ_{x|u} (chain rule)

Pr(a, b, c, d, e) = θ_a θ_{b|a} θ_{c|a} θ_{d|b,c} θ_{e|c}
                  = (.6)(.2)(.2)(.9)(.1)
                  = .00216
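The product is trivial to check. A sketch using only the five parameter values read off the slide (the rest of the CPTs are not shown in the transcript):

```python
# Chain rule for a Bayesian network: Pr(z) is the product of the
# network parameters compatible with instantiation z = a,b,c,d,e.
from math import prod

theta = {"a": 0.6, "b|a": 0.2, "c|a": 0.2, "d|b,c": 0.9, "e|c": 0.1}

pr_z = prod(theta.values())
print(round(pr_z, 6))  # 0.00216
```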
Compactness of Bayesian Networks
Let d be the max variable domain size, k the max number of parents

Size of any CPT = O(d^{k+1})

Total number of network parameters = O(n · d^{k+1}) for an n-variable network

Compact as long as k is relatively small, compared with the joint probability table (up to d^n entries)
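The gap is easy to see numerically. A sketch; the sizes n = 30, k = 3 are hypothetical, chosen only to illustrate the bound:

```python
# O(n * d^(k+1)) network parameters vs up to d^n joint-table entries.
n, d, k = 30, 2, 3  # 30 binary variables, at most 3 parents each

cpt_bound = d ** (k + 1)        # entries per CPT: 16
network_params = n * cpt_bound  # 480
joint_entries = d ** n          # 1073741824

print(network_params, joint_entries)  # 480 1073741824
```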
Properties of Probabilistic Independence
The distribution Pr represented by a Bayesian network (G, Θ) is guaranteed to satisfy every independence assumption in Markov(G)

IPr(X, Parents(X), Nondescendants(X))

It may satisfy more

Derive further independencies by the graphoid axioms
Graphoid Axioms: Symmetry
IPr(X, Z, Y) ↔ IPr(Y, Z, X)

If learning y does not influence our belief in x, then learning x does not influence our belief in y either
Graphoid Axioms: Decomposition
IPr(X,Z,Y ∪W)→ IPr(X,Z,Y) ∧ IPr(X,Z,W)
If learning yw does not influence our belief in x, then learning y alone, or w alone, does not influence our belief in x either

If some information is irrelevant, then any part of it is also irrelevant

The reverse does not hold in general: two pieces of information may each be irrelevant on their own, yet their combination may be relevant
Graphoid Axioms: Decomposition
IPr(X,Z,Y ∪W)→ IPr(X,Z,Y) ∧ IPr(X,Z,W)
In particular, we can strengthen Markov(G):

IPr(X, Parents(X), W) for W ⊆ Nondescendants(X)

Useful for proving the chain rule for Bayesian networks from the chain rule of probability calculus
Graphoid Axioms: Weak Union
IPr(X,Z,Y ∪W)→ IPr(X,Z ∪ Y,W)
If information yw is not relevant to our belief in x, then partial information y will not make the rest of the information, w, relevant

In particular, we can strengthen Markov(G):

IPr(X, Parents(X) ∪ W, Nondescendants(X) \ W) for W ⊆ Nondescendants(X)
Graphoid Axioms: Contraction
IPr(X,Z,Y) ∧ IPr(X,Z ∪ Y,W)→ IPr(X,Z,Y ∪W)
If, after learning irrelevant information y, information w is found to be irrelevant to our belief in x, then the combined information yw must have been irrelevant to start with
Intersection Axiom
IPr(X,Z ∪W,Y) ∧ IPr(X,Z ∪ Y,W)→ IPr(X,Z,Y ∪W)
for strictly positive distributions Pr

If information w is irrelevant given y, and information y is irrelevant given w, then their combination is irrelevant to start with

Not true in general when there are zero probabilities (determinism)
Graphical Test of Independence
Graphoid axioms produce independencies beyond Markov(G )
Deriving independencies from the axioms is nontrivial; need an efficient method

Linear-time graphical test: d-separation
d-separation
dsepG(X, Z, Y): every path between X & Y is blocked by Z

dsepG(X, Z, Y) → IPr(X, Z, Y) for every Pr induced by G (soundness)

Also enjoys weak completeness (discussed later)
d-separation: Paths and Valves

(figure omitted)
d-separation: Path Blocking
A path is blocked if any valve on it is closed

Sequential valve →W→ closed iff W ∈ Z

Divergent valve ←W→ closed iff W ∈ Z

Convergent valve →W← closed iff W ∉ Z and none of Descendants(W) appears in Z
d-separation: Complexity
Need not enumerate paths between X & Y

dsepG(X, Z, Y) iff X and Y are disconnected in the new DAG G′ obtained by pruning DAG G as follows

- Repeatedly delete all leaf nodes W ∉ X ∪ Y ∪ Z
- Delete all edges outgoing from Z

Linear in the size of G
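The pruning test above can be sketched directly. The function name and edge encoding are our own; the algorithm follows the two pruning steps, then checks connectivity ignoring edge direction:

```python
# d-separation by pruning: delete leaves outside X|Y|Z, cut edges
# leaving Z, then test undirected reachability from X to Y.
def d_separated(edges, X, Y, Z):
    edges = set(edges)                      # set of (parent, child) pairs
    keep = set(X) | set(Y) | set(Z)
    nodes = {v for e in edges for v in e} | keep
    while True:                             # repeatedly delete leaves not in keep
        leaves = {v for v in nodes
                  if v not in keep and not any(p == v for p, _ in edges)}
        if not leaves:
            break
        nodes -= leaves
        edges = {(p, c) for p, c in edges
                 if p not in leaves and c not in leaves}
    edges = {(p, c) for p, c in edges if p not in Z}  # cut edges leaving Z
    frontier, seen = set(X), set(X)         # undirected reachability from X
    while frontier:
        v = frontier.pop()
        for p, c in edges:
            for nb in ((c,) if p == v else ()) + ((p,) if c == v else ()):
                if nb not in seen:
                    seen.add(nb)
                    frontier.add(nb)
    return not (seen & set(Y))

chain = {("X", "Y"), ("Y", "Z")}                      # sequential valve
print(d_separated(chain, {"X"}, {"Z"}, {"Y"}))        # True
collider = {("A", "C"), ("B", "C")}                   # convergent valve
print(d_separated(collider, {"A"}, {"B"}, set()))     # True
print(d_separated(collider, {"A"}, {"B"}, {"C"}))     # False
```

The three calls exercise the valve rules: conditioning closes a sequential valve but opens a convergent one.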
d-separation: Soundness
dsepG (X,Z,Y)→ IPr(X,Z,Y) for Pr induced by G
Proved by showing every independence claimed by d-separation can be derived from the graphoid axioms
d-separation: Completeness?
dsepG (X,Z,Y)← IPr(X,Z,Y)?
Does not hold
- Consider X → Y → Z
- X is not d-separated from Z (given ∅)
- However, if θ_{y|x} = θ_{y|¬x}, then X & Y are independent, and so are X & Z
Not surprising as d-separation is based on structure alone
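The counterexample can be verified numerically: parametrize the chain so that Y's CPT ignores X, and X ⊥ Z holds even though they are connected. A sketch with arbitrary illustrative parameter values (assumptions, not from the slides):

```python
# Chain X -> Y -> Z with theta_{y|x} = theta_{y|-x}: not d-separated,
# yet X and Z come out independent in the induced Pr.
from itertools import product

p_x = 0.3
p_y_given = {True: 0.6, False: 0.6}  # same row for x and -x (the point)
p_z_given = {True: 0.8, False: 0.1}

joint = {}
for x, y, z in product([True, False], repeat=3):
    p = p_x if x else 1 - p_x
    p *= p_y_given[x] if y else 1 - p_y_given[x]
    p *= p_z_given[y] if z else 1 - p_z_given[y]
    joint[(x, y, z)] = p

pr = lambda ev: sum(p for w, p in joint.items() if ev(w))
lhs = pr(lambda w: w[0] and w[2])               # Pr(X & Z)
rhs = pr(lambda w: w[0]) * pr(lambda w: w[2])   # Pr(X) Pr(Z)
print(abs(lhs - rhs) < 1e-12)  # True: X independent of Z
```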
d-separation: Weak Completeness
For every DAG G , ∃ parametrization Θ such that
dsepG (X,Z,Y)↔ IPr(X,Z,Y)
where Pr is induced by Bayesian network (G ,Θ)
Implies that one cannot improve on d-separation
Summary
Probabilities of worlds & events, Bayes conditioning, independence, chain rule, case analysis, Bayes rule, two methods for soft evidence

Capturing independence graphically, Markovian assumptions, conditional probability tables, Bayesian networks, chain rule, graphoid axioms, d-separation