
Chapter 8: Reasoning under Uncertainty

© D. Poole, A. Mackworth 2010, W. Menzel 2015, Artificial Intelligence, Chapter 8 (uni-hamburg.de)


“The mind is a neural computer, fitted by natural selection with combinatorial algorithms for causal and probabilistic reasoning about plants, animals, objects, and people.

“In a universe with any regularities at all, decisions informed about the past are better than decisions made at random. That has always been true, and we would expect organisms, especially informavores such as humans, to have evolved acute intuitions about probability. The founders of probability, like the founders of logic, assumed they were just formalizing common sense.”

Steven Pinker, How the Mind Works, 1997, pp. 524, 343.


Using Uncertain Knowledge

Agents don’t have complete knowledge about the world.

Agents need to make decisions based on their uncertainty.

It isn’t enough to assume what the world is like. Example: wearing a seat belt.

An agent needs to reason about its uncertainty.


Why Probability?

There is lots of uncertainty about the world, but agents still need to act.

Predictions are needed to decide what to do:
- definitive predictions: you will be run over tomorrow
- point probabilities: the probability you will be run over tomorrow is 0.002
- probability ranges: you will be run over with probability in the range [0.001, 0.34]

Acting is gambling: agents who don’t use probabilities will lose to those who do — Dutch books.

Probabilities can be learned from data. Bayes’ rule specifies how to combine data and prior knowledge.


Probability

Probability is an agent’s measure of belief in some proposition — subjective probability.

An agent’s belief depends on its prior assumptions and what the agent observes.


Numerical Measures of Belief

Belief in a proposition, f, can be measured in terms of a number between 0 and 1 — this is the probability of f.

- The probability of f being 0 means that f is believed to be definitely false.
- The probability of f being 1 means that f is believed to be definitely true.

Using 0 and 1 is purely a convention.

A probability of f strictly between 0 and 1 means the agent is ignorant of f’s truth value.

Probability is a measure of an agent’s ignorance.

Probability is not a measure of degree of truth.


Random Variables

A random variable is a term in a language that can take one of a number of different values.

The range of a variable X, written range(X), is the set of values X can take.

A tuple of random variables ⟨X1, . . . , Xn⟩ is a complex random variable with range range(X1) × · · · × range(Xn). Often the tuple is written as X1, . . . , Xn.

Assignment X = x means variable X has value x.

A proposition is a Boolean formula made from assignments of values to variables.


Possible World Semantics

A possible world specifies an assignment of one value to each random variable.

A random variable is a function from possible worlds into the range of the random variable.

ω ⊨ X = x means variable X is assigned value x in world ω.

Logical connectives have their standard meaning:

ω ⊨ α ∧ β if ω ⊨ α and ω ⊨ β

ω ⊨ α ∨ β if ω ⊨ α or ω ⊨ β

ω ⊨ ¬α if ω ⊭ α

Let Ω be the set of all possible worlds.


Semantics of Probability

For a finite number of possible worlds:

Assign a nonnegative measure µ(ω) to each world ω so that the measures of the possible worlds sum to 1.

The probability of proposition f is defined by:

P(f) = Σ_{ω ⊨ f} µ(ω).
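This finite-world semantics can be sketched directly in code. The variables and the measure below are made up for illustration; `P` simply sums µ over the worlds satisfying a proposition (given as a predicate on worlds):

```python
from itertools import product

# A sketch of the finite-case semantics: a possible world assigns a value
# to each random variable, µ is a nonnegative measure summing to 1, and
# P(f) = Σ_{ω ⊨ f} µ(ω).  Variables and numbers are illustrative only.
variables = ['A', 'B']
worlds = [dict(zip(variables, vals))
          for vals in product([True, False], repeat=len(variables))]
mu = [0.4, 0.1, 0.3, 0.2]          # one measure per world; sums to 1

def P(f):
    """Probability of proposition f, a predicate on worlds."""
    return sum(m for w, m in zip(worlds, mu) if f(w))

print(P(lambda w: w['A'] or w['B']))
```

Any proposition built from assignments can be passed in the same way, e.g. `P(lambda w: w['A'] and not w['B'])`.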


Axioms of Probability: finite case

Three axioms define what follows from a set of probabilities:

Axiom 1 0 ≤ P(a) for any proposition a.

Axiom 2 P(true) = 1

Axiom 3 P(a ∨ b) = P(a) + P(b) if a and b cannot both be true.

These axioms are sound and complete with respect to the semantics.


Consequences

1. Negation of a proposition: P(¬α) = 1− P(α).

The propositions α ∨ ¬α and ¬(α ∧ ¬α) are tautologies. Therefore, P(α ∨ ¬α) = P(α) + P(¬α) = 1.


Consequences

2. Logically equivalent propositions have the same probability: α ↔ β implies P(α) = P(β).

If α ↔ β, then α ∨ ¬β is a tautology and P(α ∨ ¬β) = 1. Also, α and ¬β cannot both be true, so by Axiom 3

P(α ∨ ¬β) = P(α) + P(¬β) = 1.

Since P(¬β) = 1 − P(β),

P(α) + 1 − P(β) = 1,

and therefore P(α) = P(β).


Consequences

3. Reasoning by cases: P(α) = P(α ∧ β) + P(α ∧ ¬β).

The propositions α ↔ ((α ∧ β) ∨ (α ∧ ¬β)) and ¬((α ∧ β) ∧ (α ∧ ¬β)) are tautologies. Thus,

P(α) = P((α ∧ β) ∨ (α ∧ ¬β)) = P(α ∧ β) + P(α ∧ ¬β).

4. Reasoning by cases, generalized:

If V is a random variable with domain D, then, for all propositions α,

P(α) = Σ_{d ∈ D} P(α ∧ V = d).


Consequences

5. Disjunction for non-exclusive propositions:

P(α ∨ β) = P(α) + P(β) − P(α ∧ β).

(α ∨ β) ↔ ((α ∧ ¬β) ∨ β) is a tautology. Thus,

P(α ∨ β) = P((α ∧ ¬β) ∨ β) = P(α ∧ ¬β) + P(β).

Since P(α ∧ ¬β) = P(α) − P(α ∧ β) (reasoning by cases),

P(α ∨ β) = P(α) − P(α ∧ β) + P(β).


Semantics of Probability: general case

In the general case, probability defines a measure on sets of possible worlds. We define µ(S) for some sets S ⊆ Ω satisfying:

µ(S) ≥ 0

µ(Ω) = 1

µ(S1 ∪ S2) = µ(S1) + µ(S2) if S1 ∩ S2 = ∅.

Or sometimes σ-additivity:

µ(⋃_i Si) = Σ_i µ(Si) if Si ∩ Sj = ∅ for i ≠ j

Then P(α) = µ({ω | ω ⊨ α}).


Probability Distributions

A probability distribution on a random variable X is a function range(X) → [0, 1] such that

x ↦ P(X = x).

This is written as P(X).

This also includes the case where we have tuples of variables. E.g., P(X, Y, Z) means P(⟨X, Y, Z⟩).

When range(X) is infinite, sometimes we need a probability density function...


Conditioning

Probabilistic conditioning specifies how to revise beliefs based on new information.

An agent builds a probabilistic model taking all background information into account. This gives the prior probability.

All other information must be conditioned on.

If evidence e is all the information obtained subsequently, the conditional probability P(h|e) of h given e is the posterior probability of h.


Semantics of Conditional Probability

Evidence e rules out possible worlds incompatible with e.

Evidence e induces a new measure, µe, over possible worlds:

µe(S) = c × µ(S)  if ω ⊨ e for all ω ∈ S
µe(S) = 0         if ω ⊭ e for some ω ∈ S

We can show that c = 1/P(e).

The conditional probability of formula h given evidence e is

P(h|e) = µe({ω | ω ⊨ h}) = P(h ∧ e) / P(e)


Conditioning

Possible Worlds: [figure omitted]

Observe Color = orange: [figure omitted]


Exercise

Flu    Sneeze  Snore   µ
true   true    true    0.064
true   true    false   0.096
true   false   true    0.016
true   false   false   0.024
false  true    true    0.096
false  true    false   0.144
false  false   true    0.224
false  false   false   0.336

What is:

(a) P(flu ∧ sneeze)

(b) P(flu ∧ ¬sneeze)

(c) P(flu)

(d) P(sneeze | flu)

(e) P(¬flu ∧ sneeze)

(f) P(flu | sneeze)

(g) P(sneeze | flu∧snore)

(h) P(flu | sneeze∧snore)
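The queries above can be checked mechanically against the table, using the possible-world semantics and the definition of conditional probability. A small sketch (the helper names are ours, not the book’s):

```python
# Joint measure over worlds (Flu, Sneeze, Snore) from the exercise table.
joint = {
    (True, True, True): 0.064,   (True, True, False): 0.096,
    (True, False, True): 0.016,  (True, False, False): 0.024,
    (False, True, True): 0.096,  (False, True, False): 0.144,
    (False, False, True): 0.224, (False, False, False): 0.336,
}

def p(event):
    """P(event): sum the measures of the worlds satisfying the event."""
    return sum(m for w, m in joint.items() if event(w))

def cond(h, e):
    """P(h | e) = P(h ∧ e) / P(e)."""
    return p(lambda w: h(w) and e(w)) / p(e)

flu    = lambda w: w[0]
sneeze = lambda w: w[1]
snore  = lambda w: w[2]

print(p(lambda w: flu(w) and sneeze(w)))   # query (a)
print(cond(sneeze, flu))                   # query (d)
```

The remaining queries follow the same pattern, e.g. `cond(flu, lambda w: sneeze(w) and snore(w))` for (h).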


Generalized conditional probability

Computation of a conditional probability from given joint probabilities:

P(fn | f1 ∧ · · · ∧ fn−1) = P(f1 ∧ · · · ∧ fn−1 ∧ fn) / P(f1 ∧ · · · ∧ fn−1)


Chain Rule

Inverse of the generalized conditional probability: computation of a joint probability distribution from given conditional probabilities

P(f1 ∧ f2 ∧ . . . ∧ fn)
= P(fn | f1 ∧ · · · ∧ fn−1) × P(f1 ∧ · · · ∧ fn−1)
= P(fn | f1 ∧ · · · ∧ fn−1) × P(fn−1 | f1 ∧ · · · ∧ fn−2) × P(f1 ∧ · · · ∧ fn−2)
= P(fn | f1 ∧ · · · ∧ fn−1) × P(fn−1 | f1 ∧ · · · ∧ fn−2) × · · · × P(f3 | f1 ∧ f2) × P(f2 | f1) × P(f1)
= ∏_{i=1}^{n} P(fi | f1 ∧ · · · ∧ fi−1)
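The chain rule can be verified numerically on the (Flu, Sneeze, Snore) joint from the exercise slide; a sketch with our own helper names:

```python
# Verify P(f1 ∧ f2 ∧ f3) = P(f3 | f1 ∧ f2) × P(f2 | f1) × P(f1)
# on the (Flu, Sneeze, Snore) joint from the exercise slide.
joint = {
    (True, True, True): 0.064,   (True, True, False): 0.096,
    (True, False, True): 0.016,  (True, False, False): 0.024,
    (False, True, True): 0.096,  (False, True, False): 0.144,
    (False, False, True): 0.224, (False, False, False): 0.336,
}

def p(event):
    return sum(m for w, m in joint.items() if event(w))

f1   = lambda w: w[0]                       # flu
f12  = lambda w: w[0] and w[1]              # flu ∧ sneeze
f123 = lambda w: w[0] and w[1] and w[2]     # flu ∧ sneeze ∧ snore

lhs = p(f123)
rhs = (p(f123) / p(f12)) * (p(f12) / p(f1)) * p(f1)
print(abs(lhs - rhs) < 1e-12)  # the conditional probabilities telescope
```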


Bayes’ theorem

The chain rule and commutativity of conjunction (h ∧ e is equivalent to e ∧ h) gives us:

P(h ∧ e) = P(h|e) × P(e)
         = P(e|h) × P(h).

If P(e) ≠ 0, divide the right-hand sides by P(e):

P(h|e) = P(e|h) × P(h) / P(e).

This is Bayes’ theorem.
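A quick numeric illustration with made-up numbers (the prior and likelihoods below are assumptions, not from the course): given P(h), P(e|h), and P(e|¬h), reasoning by cases gives P(e), and Bayes’ theorem gives the posterior.

```python
# Bayes' theorem with illustrative (made-up) numbers.
p_h    = 0.01    # prior P(disease)        (assumed)
p_e_h  = 0.9     # P(symptom | disease)    (assumed)
p_e_nh = 0.05    # P(symptom | ¬disease)   (assumed)

# Reasoning by cases: P(e) = P(e ∧ h) + P(e ∧ ¬h).
p_e = p_e_h * p_h + p_e_nh * (1 - p_h)

# Bayes' theorem: P(h|e) = P(e|h) × P(h) / P(e).
posterior = p_e_h * p_h / p_e
print(posterior)
```

Note how a strong likelihood ratio still leaves the posterior modest when the prior is small.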


Why is Bayes’ theorem interesting?

Often you have causal knowledge:
- P(symptom | disease)
- P(light is off | status of switches and switch positions)
- P(alarm | fire)
- P(image looks like … | a tree is in front of a car)

and want to do evidential reasoning:
- P(disease | symptom)
- P(status of switches | light is off and switch positions)
- P(fire | alarm)
- P(a tree is in front of a car | image looks like …)


Conditional independence

Random variable X is independent of random variable Y given random variable Z if, for all xi ∈ dom(X), yj ∈ dom(Y), yk ∈ dom(Y) and zm ∈ dom(Z),

P(X = xi | Y = yj ∧ Z = zm)
= P(X = xi | Y = yk ∧ Z = zm)
= P(X = xi | Z = zm).

That is, knowledge of Y’s value doesn’t affect your belief in the value of X, given a value of Z.
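In the (Flu, Sneeze, Snore) joint from the exercise slide, Snore turns out to be independent of Sneeze given Flu; the definition can be checked numerically (helper names are ours):

```python
# Check: P(Snore | Sneeze = yj ∧ Flu = zm) is the same for every yj
# and equals P(Snore | Flu = zm), for both values of Flu.
joint = {
    (True, True, True): 0.064,   (True, True, False): 0.096,
    (True, False, True): 0.016,  (True, False, False): 0.024,
    (False, True, True): 0.096,  (False, True, False): 0.144,
    (False, False, True): 0.224, (False, False, False): 0.336,
}

def p(event):
    return sum(m for w, m in joint.items() if event(w))

def cond(h, e):
    return p(lambda w: h(w) and e(w)) / p(e)

snore = lambda w: w[2]
for flu_val in (True, False):
    given_both = [cond(snore, lambda w: w[0] == flu_val and w[1] == sz)
                  for sz in (True, False)]
    given_flu = cond(snore, lambda w: w[0] == flu_val)
    print(flu_val, given_both, given_flu)  # all three values agree
```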


Example domain (diagnostic assistant)

[figure: electrical wiring diagram with outside power, circuit breakers cb1 and cb2, wires w0–w6, switches s1–s3 (including a two-way switch, each on or off), power outlets p1 and p2, and lights l1 and l2]


Examples of conditional independence

The identity of the queen of Canada is independent of whether light l1 is lit, given whether there is outside power.

Whether there is someone in a room is independent of whether a light l2 is lit, given the position of switch s3.

Whether light l1 is lit is independent of the position of light switch s2, given whether there is power in wire w0.

Every other variable may be independent of whether light l1 is lit, given whether there is power in wire w0 and the status of light l1 (if it’s ok, or if not, how it’s broken).


Idea of belief networks

Whether l1 is lit (L1_lit) depends only on the status of the light (L1_st) and whether there is power in wire w0. Thus, L1_lit is independent of the other variables given L1_st and W0. In a belief network, W0 and L1_st are parents of L1_lit.

[figure: network fragment with nodes w1, w2, s2_pos, s2_st, w0, l1_st, l1_lit]

Similarly, W0 depends only on whether there is power in w1, whether there is power in w2, the position of switch s2 (S2_pos), and the status of switch s2 (S2_st).


Belief networks

A belief network is a graph: the nodes are random variables; there is an arc from the parents of each node into that node.

Suppose X1, . . . , Xn are the variables of interest.

Totally order the variables of interest: X1, . . . , Xn.

Theorem of probability theory (chain rule):
P(X1, . . . , Xn) = ∏_{i=1}^{n} P(Xi | X1, . . . , Xi−1)

The parents parents(Xi) of Xi are those predecessors of Xi that render Xi independent of the other predecessors. That is, parents(Xi) ⊆ {X1, . . . , Xi−1} and P(Xi | parents(Xi)) = P(Xi | X1, . . . , Xi−1).

So P(X1, . . . , Xn) = ∏_{i=1}^{n} P(Xi | parents(Xi))
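As a minimal sketch of this factorization (the network and CPT numbers below are made up), the joint is just the product of the per-variable conditional probabilities:

```python
# Tiny network A → B: joint(a, b) = P(A = a) × P(B = b | A = a).
# CPT numbers are made up for illustration.
p_a = {True: 0.3, False: 0.7}                    # P(A)
p_b_given_a = {True:  {True: 0.9, False: 0.1},   # P(B | A = true)
               False: {True: 0.2, False: 0.8}}   # P(B | A = false)

def joint(a, b):
    return p_a[a] * p_b_given_a[a][b]

# The product of the CPTs defines a proper joint: entries sum to 1.
total = sum(joint(a, b) for a in (True, False) for b in (True, False))
print(total)
```

The same pattern scales to any DAG: one CPT per node, one factor per CPT.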


Components of a belief network

A belief network consists of:

a directed acyclic graph with nodes labeled with random variables

a domain for each random variable

a conditional probability table for each variable given its parents (including prior probabilities for nodes with no parents).


Example belief network

[figure: belief network for the electrical domain, with nodes Outside_power, Cb1_st, Cb2_st, W0–W6, S1_pos, S1_st, S2_pos, S2_st, S3_pos, S3_st, P1, P2, L1_st, L1_lit, L2_st, L2_lit]


Example belief network (continued)

The belief network also specifies:

The domain of the variables:
- W0, . . . , W6 have domain {live, dead}
- S1_pos, S2_pos, and S3_pos have domain {up, down}
- S1_st has domain {ok, upside_down, short, intermittent, broken}

Conditional probabilities, including:
- P(W1 = live | S1_pos = up ∧ S1_st = ok ∧ W3 = live)
- P(W1 = live | S1_pos = up ∧ S1_st = ok ∧ W3 = dead)
- P(S1_pos = up)
- P(S1_st = upside_down)


Belief network summary

A belief network is automatically acyclic by construction.

A belief network is a directed acyclic graph (DAG) where nodes are random variables.

The parents of a node n are those variables on which n directly depends.

A belief network is a graphical representation of dependence and independence:
- A variable is independent of its non-descendants given its parents.


Constructing belief networks

To represent a domain in a belief network, you need to consider:

What are the relevant variables?
- What will you observe?
- What would you like to find out (query)?
- What other features make the model simpler?

What values should these variables take?

What is the relationship between them? This should be expressed in terms of local influence.

How does the value of each variable depend on its parents? This is expressed in terms of the conditional probabilities.


Using belief networks

The power network can be used in a number of ways:

Conditioning on the status of the switches and circuit breakers, whether there is outside power, and the position of the switches, you can simulate the lighting.

Given values for the switches, the outside power, and whether the lights are lit, you can determine the posterior probability that each switch or circuit breaker is ok or not.

Given some switch positions, some outputs, and some intermediate values, you can determine the probability of any other variable in the network.


What variables are affected by observing?

If you observe variable Y, the variables whose posterior probability is different from their prior are:
- the ancestors of Y and
- their descendants.

Intuitively (if you have a causal belief network):
- you do abduction to possible causes and
- prediction from the causes.


Common descendants

[figure: tampering → alarm ← fire]

- tampering and fire are independent
- tampering and fire are dependent given alarm

Intuitively, tampering can explain away fire.


Common ancestors

[figure: smoke ← fire → alarm]

- alarm and smoke are dependent
- alarm and smoke are independent given fire

Intuitively, fire can explain alarm and smoke; learning one can affect the other by changing your belief in fire.


Chain

[figure: alarm → leaving → report]

- alarm and report are dependent
- alarm and report are independent given leaving

Intuitively, the only way that the alarm affects report is by affecting leaving.


Pruning Irrelevant Variables

Suppose you want to compute P(X | e1 ∧ . . . ∧ ek):

Prune any variables that have no observed or queried descendants.

Connect the parents of any observed variable.

Remove arc directions.

Remove observed variables.

Remove any variables not connected to X in the resulting(undirected) graph.


Belief network inference

Four main approaches to determine posterior distributions in belief networks:

Variable elimination: exploit the structure of the network to eliminate (sum out) the non-observed, non-query variables one at a time.

Search-based approaches: enumerate some of the possible worlds, and estimate posterior probabilities from the worlds generated.

Stochastic simulation: random cases are generated according to the probability distributions.

Variational methods: find the closest tractable distribution to the (posterior) distribution we are interested in.


Factors

A factor is a representation of a function from a tuple of random variables into a number. We will write a factor f on variables X1, . . . , Xj as f(X1, . . . , Xj). We can assign some or all of the variables of a factor:

- f(X1 = v1, X2, . . . , Xj), where v1 ∈ dom(X1), is a factor on X2, . . . , Xj.
- f(X1 = v1, X2 = v2, . . . , Xj = vj) is a number that is the value of f when each Xi has value vi.

The former is also written as f(X1, X2, . . . , Xj)_{X1 = v1}, etc.


Example factors

r(X, Y, Z):
X Y Z  val
t t t  0.1
t t f  0.9
t f t  0.2
t f f  0.8
f t t  0.4
f t f  0.6
f f t  0.3
f f f  0.7

r(X=t, Y, Z):
Y Z  val
t t  0.1
t f  0.9
f t  0.2
f f  0.8

r(X=t, Y, Z=f):
Y  val
t  0.9
f  0.8

r(X=t, Y=f, Z=f) = 0.8
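A factor can be sketched as a table from value tuples to numbers; assigning X = t just keeps the matching rows and drops the X column (the representation and function name are ours):

```python
# The factor r(X, Y, Z) from the slide, as a dict from value tuples.
r = {('t','t','t'): 0.1, ('t','t','f'): 0.9,
     ('t','f','t'): 0.2, ('t','f','f'): 0.8,
     ('f','t','t'): 0.4, ('f','t','f'): 0.6,
     ('f','f','t'): 0.3, ('f','f','f'): 0.7}

def assign_first(factor, value):
    """r(X = value, ...): keep rows where X = value, drop the X column."""
    return {key[1:]: v for key, v in factor.items() if key[0] == value}

r_xt = assign_first(r, 't')      # r(X=t, Y, Z), a factor on Y, Z
print(r_xt[('f', 'f')])          # r(X=t, Y=f, Z=f) → 0.8
```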


Multiplying factors

The product of factor f1(X, Y) and f2(Y, Z), where Y are the variables in common, is the factor (f1 × f2)(X, Y, Z) defined by:

(f1 × f2)(X, Y, Z) = f1(X, Y) × f2(Y, Z).


Multiplying factors example

f1:
A B  val
t t  0.1
t f  0.9
f t  0.2
f f  0.8

f2:
B C  val
t t  0.3
t f  0.7
f t  0.6
f f  0.4

f1 × f2:
A B C  val
t t t  0.03
t t f  0.07
t f t  0.54
t f f  0.36
f t t  0.06
f t f  0.14
f f t  0.48
f f f  0.32
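The product table above can be reproduced in code: multiply exactly those entries that agree on the shared variable B (a sketch with our own names):

```python
# f1(A, B) × f2(B, C): multiply entries that agree on the shared B.
f1 = {('t','t'): 0.1, ('t','f'): 0.9, ('f','t'): 0.2, ('f','f'): 0.8}
f2 = {('t','t'): 0.3, ('t','f'): 0.7, ('f','t'): 0.6, ('f','f'): 0.4}

def multiply(f, g):
    return {(a, b, c): f[(a, b)] * g[(b2, c)]
            for (a, b) in f for (b2, c) in g if b == b2}

prod = multiply(f1, f2)
print(prod[('t', 'f', 't')])    # matches the 0.54 entry in the table
```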


Summing out variables

We can sum out a variable, say X1 with domain {v1, . . . , vk}, from factor f(X1, . . . , Xj), resulting in a factor on X2, . . . , Xj defined by:

(Σ_{X1} f)(X2, . . . , Xj) = f(X1 = v1, . . . , Xj) + · · · + f(X1 = vk, . . . , Xj)


Summing out a variable example

f3:
A B C  val
t t t  0.03
t t f  0.07
t f t  0.54
t f f  0.36
f t t  0.06
f t f  0.14
f f t  0.48
f f f  0.32

Σ_B f3:
A C  val
t t  0.57
t f  0.43
f t  0.54
f f  0.46
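Summing out B is just adding the entries that differ only in B; a sketch reproducing the table above:

```python
# Summing out B from f3(A, B, C): add entries that differ only in B.
f3 = {('t','t','t'): 0.03, ('t','t','f'): 0.07, ('t','f','t'): 0.54,
      ('t','f','f'): 0.36, ('f','t','t'): 0.06, ('f','t','f'): 0.14,
      ('f','f','t'): 0.48, ('f','f','f'): 0.32}

def sum_out_b(f):
    out = {}
    for (a, b, c), v in f.items():
        out[(a, c)] = out.get((a, c), 0.0) + v
    return out

print(sum_out_b(f3))   # the Σ_B f3 table from the slide
```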


Evidence

If we want to compute the posterior probability of Z given evidence Y1 = v1 ∧ . . . ∧ Yj = vj:

P(Z | Y1 = v1, . . . , Yj = vj)
= P(Z, Y1 = v1, . . . , Yj = vj) / P(Y1 = v1, . . . , Yj = vj)
= P(Z, Y1 = v1, . . . , Yj = vj) / Σ_Z P(Z, Y1 = v1, . . . , Yj = vj)

So the computation reduces to computing P(Z, Y1 = v1, . . . , Yj = vj). We normalize at the end.


Probability of a conjunction

Suppose the variables of the belief network are X1, . . . , Xn. To compute P(Z, Y1 = v1, . . . , Yj = vj), we sum out the other variables, {Z1, . . . , Zk} = {X1, . . . , Xn} − {Z} − {Y1, . . . , Yj}. We order the Zi into an elimination ordering.

P(Z, Y1 = v1, . . . , Yj = vj)
= Σ_{Zk} · · · Σ_{Z1} P(X1, . . . , Xn)_{Y1 = v1, . . . , Yj = vj}
= Σ_{Zk} · · · Σ_{Z1} ∏_{i=1}^{n} P(Xi | parents(Xi))_{Y1 = v1, . . . , Yj = vj}


Computing sums of products

Computation in belief networks reduces to computing sums of products.

How can we compute ab + ac efficiently?
- Distribute out the a, giving a(b + c).

How can we compute Σ_{Z1} ∏_{i=1}^{n} P(Xi | parents(Xi)) efficiently?
- Distribute out those factors that don’t involve Z1.


Variable elimination algorithm

To compute P(Z |Y1 = v1 ∧ . . . ∧ Yj = vj ):

Construct a factor for each conditional probability.

Set the observed variables to their observed values.

Sum out each of the other variables (the Z1, . . . , Zk) according to some elimination ordering.

Multiply the remaining factors. Normalize by dividing the resulting factor f(Z) by ∑_Z f(Z).

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 67

Page 68: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Summing out a variable

To sum out a variable Zj from a product f1, . . . , fk of factors:

Partition the factors into
I those that don't contain Zj, say f1, . . . , fi,
I those that contain Zj, say fi+1, . . . , fk.

We know:

∑_Zj f1 × · · · × fk = f1 × · · · × fi × ( ∑_Zj fi+1 × · · · × fk ).

Explicitly construct a representation of the rightmost factor. Replace the factors fi+1, . . . , fk by the new factor.

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 68
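The summing-out step can be sketched with a small dict-based factor representation (a minimal sketch of my own — the `multiply`/`sum_out` names and the `(vars, table)` encoding are assumptions, not from the slides):

```python
from itertools import product

# A factor is (vars, table): vars is a list of variable names, and
# table maps a tuple of boolean values (one per variable) to a number.

def multiply(f1, f2):
    """Pointwise product of two factors over the union of their variables."""
    (v1, t1), (v2, t2) = f1, f2
    vs = v1 + [v for v in v2 if v not in v1]
    table = {}
    for vals in product((True, False), repeat=len(vs)):
        a = dict(zip(vs, vals))
        table[vals] = (t1[tuple(a[v] for v in v1)]
                       * t2[tuple(a[v] for v in v2)])
    return vs, table

def sum_out(f, var):
    """Sum a variable out of a factor, as in the identity above."""
    vs, t = f
    i = vs.index(var)
    out_vs = vs[:i] + vs[i + 1:]
    out = {}
    for vals, p in t.items():
        key = vals[:i] + vals[i + 1:]
        out[key] = out.get(key, 0.0) + p
    return out_vs, out
```

For example, summing B out of a factor on (A, B) leaves a factor on A whose entries are the row sums over B.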

Page 69: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Variable elimination example

tampering

alarm

fire

leaving

report

smoke

P(Tampering, Fire, Alarm, Smoke, Leaving, Report)
= P(Tampering) × P(Fire) × P(Alarm | Fire, Tampering)
× P(Smoke | Fire) × P(Leaving | Alarm) × P(Report | Leaving)

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 69

Page 70: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Variable elimination example

tampering

alarm

fire

leaving

report

smoke

P(tampering) = 0.02
P(fire) = 0.01
P(alarm | fire ∧ tampering) = 0.5
P(alarm | fire ∧ ¬tampering) = 0.99
P(alarm | ¬fire ∧ tampering) = 0.85
P(alarm | ¬fire ∧ ¬tampering) = 0.0001
P(smoke | fire) = 0.9
P(smoke | ¬fire) = 0.01
P(leaving | alarm) = 0.88
P(leaving | ¬alarm) = 0.001
P(report | leaving) = 0.75
P(report | ¬leaving) = 0.01

Query: P(Tampering |Smoke = true ∧ Report = true).

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 70

Page 71: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Variable elimination example

Conditional probabilities and factors

P(Tampering) → f0(Tampering):
  true  0.02
  false 0.98

P(Fire) → f1(Fire):
  true  0.01
  false 0.99

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 71

Page 72: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Variable elimination example

Conditional probabilities and factors

P(Alarm|Tampering ,Fire)

→ f2(Tampering, Fire, Alarm):
  true  true  true  0.5
  true  true  false 0.5
  true  false true  0.85
  true  false false 0.15
  false true  true  0.99
  false true  false 0.01
  false false true  0.0001
  false false false 0.9999

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 72

Page 73: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Variable elimination example

Conditional probabilities and factors

P(Smoke | Fire) → f3(Fire, Smoke):
  true  true  0.9
  true  false 0.1
  false true  0.01
  false false 0.99

P(Leaving | Alarm) → f4(Alarm, Leaving):
  true  true  0.88
  true  false 0.12
  false true  0.001
  false false 0.999

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 73

Page 74: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Variable elimination example

Conditional probabilities and factors

P(Report | Leaving) → f5(Leaving, Report):
  true  true  0.75
  true  false 0.25
  false true  0.01
  false false 0.99

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 74

Page 75: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Variable elimination example

variables: Tampering, Fire, Alarm, Smoke, Leaving, Report

query: P(Tampering | Smoke = true ∧ Report = true)

to eliminate: Fire, Alarm, Smoke, Leaving, Report

distributions: P(Alarm|Tampering ,Fire)

P(Smoke|Fire)

P(Leaving |Alarm)

P(Report|Leaving)

P(Tampering)

P(Fire)

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 75


Page 79: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Variable elimination example

Eliminate the observed variable Smoke

f3(Fire, Smoke):
  true  true  0.9
  true  false 0.1
  false true  0.01
  false false 0.99

P(Smoke = true | Fire) → f′3(Fire):
  true  0.9
  false 0.01

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 79

Page 80: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Variable elimination example

Eliminate the observed variable Report

f5(Leaving, Report):
  true  true  0.75
  true  false 0.25
  false true  0.01
  false false 0.99

P(Report = true | Leaving) → f′5(Leaving):
  true  0.75
  false 0.01

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 80

Page 81: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Variable elimination example

Select e.g. Fire to be eliminated next

Collect all the factors containing Fire:

f1(Fire):
  true  0.01
  false 0.99

f′3(Fire):
  true  0.9
  false 0.01

f2(Tampering, Fire, Alarm):
  true  true  true  0.5
  true  true  false 0.5
  true  false true  0.85
  true  false false 0.15
  false true  true  0.99
  false true  false 0.01
  false false true  0.0001
  false false false 0.9999

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 81

Page 82: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Variable elimination example

Compute a new factor for them, eliminating Fire

f6(Tampering, Alarm)
= ∑_Fire f1(Fire) × f2(Tampering, Fire, Alarm) × f′3(Fire)
=
  true  true  0.01292
  true  false 0.00599
  false true  0.00891
  false false 0.00999

remaining factors: f0(Tampering), f4(Alarm, Leaving), f′5(Leaving), f6(Tampering, Alarm)

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 82

Page 83: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Variable elimination example

Select e.g. Alarm to be eliminated next.

Collect the factors containing Alarm

f4(Alarm, Leaving):
  true  true  0.88
  true  false 0.12
  false true  0.001
  false false 0.999

f6(Tampering, Alarm):
  true  true  0.0129
  true  false 0.006
  false true  0.0089
  false false 0.01

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 83

Page 84: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Variable elimination example

Compute a new factor for them, eliminating Alarm

f7(Tampering, Leaving)
= ∑_Alarm f6(Tampering, Alarm) × f4(Alarm, Leaving)
=
  true  true  0.01137
  true  false 0.00753
  false true  0.00785
  false false 0.01105

remaining factors: f0(Tampering), f′5(Leaving), f7(Tampering, Leaving)

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 84

Page 85: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Variable elimination example

Select Leaving to be eliminated next

Collect the factors containing Leaving

f′5(Leaving):
  true  0.75
  false 0.01

f7(Tampering, Leaving):
  true  true  0.01137
  true  false 0.00753
  false true  0.00785
  false false 0.01105

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 85

Page 86: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Variable elimination example

Compute a new factor for them, eliminating Leaving:

f8(Tampering)
= ∑_Leaving f′5(Leaving) × f7(Tampering, Leaving)
=
  true  0.0086
  false 0.006

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 86

Page 87: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Variable elimination example

Multiply the remaining factors for Tampering

f0(Tampering):
  true  0.02
  false 0.98

f8(Tampering):
  true  0.0086
  false 0.006

f9(Tampering) = f0(Tampering) × f8(Tampering):
  true  0.00017
  false 0.00588

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 87

Page 88: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Variable elimination example

posterior distribution for Tampering

P(Tampering | Report, Smoke)
= f9(Tampering) / ∑_Tampering f9(Tampering)
=
  true  0.02844
  false 0.97156

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 88
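The whole elimination above can be replayed with plain arithmetic. The CPTs and factor names are taken from the slides; the helper `p` and the dictionary encoding are my own minimal sketch:

```python
f0 = {True: 0.02, False: 0.98}        # P(Tampering)
f1 = {True: 0.01, False: 0.99}        # P(Fire)
# P(Alarm = true | Tampering, Fire), keyed (tampering, fire)
alarm = {(True, True): 0.5, (True, False): 0.85,
         (False, True): 0.99, (False, False): 0.0001}
f3p = {True: 0.9, False: 0.01}        # f'3: P(Smoke = true | Fire)
leave = {True: 0.88, False: 0.001}    # P(Leaving = true | Alarm)
f5p = {True: 0.75, False: 0.01}       # f'5: P(Report = true | Leaving)

B = (True, False)

def p(table, key, value):             # P(X = value | key) for binary X
    return table[key] if value else 1 - table[key]

# eliminate Fire:    f6(Ta, Al) = sum_Fi f1(Fi) f2(Ta, Fi, Al) f'3(Fi)
f6 = {(ta, al): sum(f1[fi] * p(alarm, (ta, fi), al) * f3p[fi] for fi in B)
      for ta in B for al in B}
# eliminate Alarm:   f7(Ta, Le) = sum_Al f6(Ta, Al) f4(Al, Le)
f7 = {(ta, le): sum(f6[ta, al] * p(leave, al, le) for al in B)
      for ta in B for le in B}
# eliminate Leaving: f8(Ta) = sum_Le f'5(Le) f7(Ta, Le)
f8 = {ta: sum(f5p[le] * f7[ta, le] for le in B) for ta in B}
# multiply the remaining factors and normalize
f9 = {ta: f0[ta] * f8[ta] for ta in B}
z = sum(f9.values())
posterior = {ta: f9[ta] / z for ta in B}
print(round(posterior[True], 5))      # 0.02844
```

Each intermediate factor reproduces the table on the corresponding slide (f6, f7, f8, f9) up to rounding.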


Page 100: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Variable elimination example

[Diagram: the sequence of factor operations]
Smoke = true applied to f3(Fire, Smoke) yields f′3(Fire)
Report = true applied to f5(Leaving, Report) yields f′5(Leaving)
f1(Fire), f′3(Fire), f2(Tampering, Fire, Alarm) → f6(Tampering, Alarm)
f6(Tampering, Alarm), f4(Alarm, Leaving) → f7(Tampering, Leaving)
f′5(Leaving), f7(Tampering, Leaving) → f8(Tampering)
f0(Tampering), f8(Tampering) → f9(Tampering)

Normalization

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 100

Page 101: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Variable Elimination example

[Network: A → B → C → D → E → F, with G a child of C and H a child of E]

Query: P(G | f); elimination ordering: A, H, E, D, B, C

P(G | f) ∝ ∑_C ∑_B ∑_D ∑_E ∑_H ∑_A P(A) P(B|A) P(C|B) P(D|C) P(E|D) P(f|E) P(G|C) P(H|E)

= ∑_C ( ∑_B ( ∑_A P(A) P(B|A) ) P(C|B) ) P(G|C) ( ∑_D P(D|C) ( ∑_E P(E|D) P(f|E) ∑_H P(H|E) ) )

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 101


Page 104: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Stochastic Simulation

Idea: probabilities ↔ samples

Get probabilities from samples:

  X     count          X     probability
  x1    n1             x1    n1/m
  . . .                . . .
  xk    nk             xk    nk/m
  total m

If we could sample from a variable's (posterior) probability, we could estimate its (posterior) probability.

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 104

Page 105: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Generating samples from a distribution

For a variable X with a discrete domain or a (one-dimensional) real domain:

Totally order the values of the domain of X.

Generate the cumulative probability distribution: f(x) = P(X ≤ x).

Select a value y uniformly in the range [0, 1].

Select the smallest x such that f(x) ≥ y.

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 105
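The four steps can be sketched directly (a minimal sketch; `sample_discrete` and the pair-list encoding are assumed names, not from the slides):

```python
import random

def sample_discrete(dist):
    """Inverse-CDF sampling. dist is a list of (value, probability)
    pairs in a fixed order; the probabilities sum to 1."""
    y = random.random()            # y uniform in [0, 1)
    cum = 0.0
    for value, prob in dist:
        cum += prob                # cumulative probability f(x)
        if y < cum:                # first value with f(x) > y
            return value
    return dist[-1][0]             # guard against rounding error
```

Over many draws the sample frequencies approach the given probabilities, which is exactly the counts-to-probabilities idea of the previous slide.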

Page 106: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Cumulative Distribution

[Figure: a discrete distribution P(X) over values v1, . . . , v4 (left) and its cumulative distribution f(X) (right); the vertical axes run from 0 to 1]

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 106

Page 107: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Forward sampling in a belief network

Sample the variables one at a time; sample the parents of X before sampling X.

Given values for the parents of X, sample from the probability of X given its parents.

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 107

Page 108: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Rejection Sampling

To estimate a posterior probability given evidence Y1 = v1 ∧ . . . ∧ Yj = vj:

Reject any sample that assigns some Yi to a value other than vi.

The non-rejected samples are distributed according to the posterior probability:

P(α | evidence) ≈ (∑_{sample ⊨ α} 1) / (∑_{sample} 1)

where we consider only samples consistent with the evidence.

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 108

Page 109: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Rejection Sampling Example: P(ta|sm, re)

[Network: Ta and Fi are parents of Al; Fi is the parent of Sm; Al → Le → Re]

Observe Sm = true, Re = true

        Ta     Fi     Al     Sm     Le     Re
s1      false  true   false  true   false  false   rejected
s2      false  true   true   true   true   true    used
s3      true   false  true   false  —      —       rejected
s4      true   true   true   true   true   true    used
. . .
s1000   false  false  false  false  —      —       rejected

P(sm) = 0.02
P(re | sm) = 0.32
How many samples are rejected?
How many samples are used? (On average P(sm) · P(re | sm) · 1000 = 0.02 · 0.32 · 1000 ≈ 6 of the 1000.)

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 109
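Forward sampling plus rejection for this query can be sketched as follows (the sampler structure and names are my own; the CPTs are from the slides):

```python
import random

def flip(p):
    return random.random() < p

def forward_sample():
    """One forward sample of the fire-alarm network."""
    ta = flip(0.02)
    fi = flip(0.01)
    al = flip({(True, True): 0.5, (True, False): 0.99,
               (False, True): 0.85, (False, False): 0.0001}[(fi, ta)])
    sm = flip(0.9 if fi else 0.01)
    le = flip(0.88 if al else 0.001)
    re = flip(0.75 if le else 0.01)
    return ta, sm, re

random.seed(0)
n = 200000
used = tampering = 0
for _ in range(n):
    ta, sm, re = forward_sample()
    if sm and re:              # keep only samples consistent with evidence
        used += 1
        tampering += ta
print(used / n)                # ≈ 0.006, i.e. P(sm ∧ re)
print(tampering / used)        # ≈ 0.028, i.e. P(ta | sm, re)
```

Only about 6 in 1000 samples survive rejection here, which is why the estimate of P(ta | sm, re) is so noisy compared with exact variable elimination.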


Page 115: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Importance Sampling

Samples have weights: a real number associated with each sample that takes the evidence into account.

The probability of a proposition is the weighted average over samples:

P(α | evidence) ≈ (∑_{sample ⊨ α} weight(sample)) / (∑_{sample} weight(sample))

Don't sample all of the variables; instead weight each sample according to a proposal distribution P(evidence | sample).

Sum out the variables which are neither observed nor sampled (exact inference).

The proposal distribution should be as close as possible to the posterior distribution (which is unknown at sampling time).

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 115
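A common concrete instance of this idea is likelihood weighting: sample only the unobserved variables from their CPTs and weight each sample by P(evidence | sample). A sketch for the same query as before (CPTs from the slides, code structure my own):

```python
import random

def flip(p):
    return random.random() < p

def weighted_sample():
    """Sample the unobserved variables; the weight is
    P(sm = true | Fi) * P(re = true | Le), the evidence likelihood."""
    ta = flip(0.02)
    fi = flip(0.01)
    al = flip({(True, True): 0.5, (True, False): 0.99,
               (False, True): 0.85, (False, False): 0.0001}[(fi, ta)])
    le = flip(0.88 if al else 0.001)
    weight = (0.9 if fi else 0.01) * (0.75 if le else 0.01)
    return ta, weight

random.seed(0)
num = den = 0.0
for _ in range(200000):
    ta, w = weighted_sample()
    den += w
    if ta:
        num += w
print(num / den)               # ≈ 0.028, i.e. P(ta | sm, re)
```

Unlike rejection sampling, no sample is thrown away; samples that make the evidence unlikely simply contribute a small weight.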

Page 116: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Applications of Bayesian Networks

modelling human multimodal perception
I human sensor data fusion
I top-down influences in human perception

multimodal human-computer interaction

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 116

Page 117: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Human Sensor Data Fusion

two general strategies (Ernst and Bülthoff, 2004)
I sensory combination: maximize the information delivered from the different sensory modalities
I sensory integration: reduce the variance in the sensory estimate to increase its reliability

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 117

Page 118: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Sensor Data Fusion

sensory integration has to produce a coherent percept

Which modality is the dominating one?
I visual capture: e.g. vision dominates haptic perception
I auditory capture: e.g. number of auditory beeps vs. number of visual flashes

modality precision, modality appropriateness, estimate precision: the most precise modality wins

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 118

Page 119: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Sensor Data Fusion

two possible explanations:
I maximum likelihood estimation:
  I weighted sum of the individual estimates
  I all cues contribute to the percept
I cue switching:
  I the most precise cue takes over
  I the less precise cues have no influence

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 119

Page 120: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Sensor Data Fusion

maximum likelihood estimate:
I weighted sum of the individual estimates
I weights are proportional to their inverse variance

s = ∑_i wi si  with  ∑_i wi = 1,   wi = (1/σi²) / ∑_j (1/σj²)

I the most reliable unbiased estimate possible (the estimate with minimal variance)
I optimality not really required; a good approximation might be good enough

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 120
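The formula can be computed directly (a minimal sketch; the function name and the example numbers are illustrative, not from the slides):

```python
def fuse(estimates, variances):
    """Inverse-variance (maximum-likelihood) cue combination."""
    inv = [1.0 / v for v in variances]
    total = sum(inv)
    weights = [w / total for w in inv]     # the wi, summing to 1
    s = sum(w * e for w, e in zip(weights, estimates))
    return s, 1.0 / total                  # fused estimate and its variance

# e.g. a reliable visual cue (variance 1) and a noisy haptic cue (variance 4)
s, var = fuse([10.0, 12.0], [1.0, 4.0])
print(round(s, 6), round(var, 6))          # 10.4 0.8
```

The fused estimate is pulled toward the more reliable cue, and its variance (0.8) is smaller than that of either cue alone, which is the point of the integration strategy.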

Page 121: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Sensor Data Fusion

overwhelming evidence for the role of estimate precision . . .

weighting within modalities
I visual depth perception: motion + disparity, texture + disparity
I visual perception of slant
I visual perception of distance
I haptic shape perception: force + position

cross-modal weighting:
I vision + audition
I vision + haptics
I vision + proprioception

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 121

Page 122: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Sensor Data Fusion

. . . but no really conclusive evidence for the reliability hypothesis

problem: estimating the variance of a stimulus
I requires an independence assumption
I difficult to achieve in a unimodal task
I cues within one modality are correlated
I → multi-modal experiments

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 122

Page 123: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Sensor Data Fusion

Ernst and Banks (2002): vision-haptic integration
I modifying the visual reliability by adding noise to the visual channel
I two extreme cases:
  I vision dominates (little noise)
  I haptics dominate (high noise)

→ perception requires dynamic adjustment of weights
→ the nervous system has online access to sensory reliabilities

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 123

Page 124: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Sensor Data Fusion

open question: Where do the estimates come from?

prior experience vs. on-line estimation during perception

on-line is more likely: observing the fluctuations of responses to a signal
I over some period of time
I across a population of independent neurons (population codes)

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 124

Page 125: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Top-Down Influence

perception is modulated by contextual factors, e.g. scene or object properties

How to model top-down influence?
I can be captured by prior probabilities
I prior probabilities can be integrated by means of Bayes' rule
→ Bayesian reasoning

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 125

Page 126: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Top-Down Influence

[Figure: Kersten and Yuille (2003)]

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 126


Page 130: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Multimodal Human-Computer Interaction

Socher, Sagerer, Perona (2000), Wachsmuth, Sagerer (2002)
I multi-modal human-machine interaction using
  I speech
  I vision
  I (pointing gestures)

data fusion from different reference systems
I spatial (vision) vs. temporal (speech)
I language-based instruction: fusion on the level of concepts

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 130

Page 131: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Multimodal Human-Computer Interaction

noisy and partial interpretation of the sensory signals

dealing with referential uncertainty

goal: cross modal synergy

sensory data: properties (color) and (spatial) relationships: degree-of-membership representation (fuzziness)

combination using Bayesian Networks

estimating the probabilities by means of psycholinguistic experiments
I how do humans categorize objects and verbalize object descriptions?

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 131

Page 132: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Multimodal Human-Computer Interaction

[Network diagram: an "identified object" node (23 values: 3-holed bar, 5-holed bar, 7-holed bar, red cube, blue cube, . . .) carrying the likelihood of the object being the intended one; a "scene" node (probability of being part of the scene); per-object nodes object1 . . . objectn (probability of the categorization), each with type and color children; and an "instruction" node (probability of having been mentioned) with type, color, size and shape children (e.g. type: object, bar, cube, 3-holed bar; size: small, big, short, long; shape: round, angular, elongated, hexagonal)]

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 132

Page 133: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Multimodal Human-Computer Interaction

more sophisticated fusion model (Wachsmuth, Sagerer 2002)
I solution to the correspondence problem using selection variables

more adequate modelling of naming habits

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 133

Page 134: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Multimodal Human-Computer Interaction

results for object identification

                                  correct  noisy   noisy   noisy
                                  input    speech  vision  input
recognition error rates           –        15%     20%     15%+20%
identification rates              0.85     0.81    0.79    0.76
decrease of identification rates  –        5%      7%      11%

synergy between vision and speech

higher robustness due to redundancy between modalities

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 134

Page 135: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Bayesian Models for Sequences

the world is dynamic
I old information becomes obsolete
I new information is available
I the decisions an agent takes need to reflect these changes

the dynamics of the world can be captured by means of state-based models

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 135

Page 136: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Bayesian Models for Sequences

changes in the world are modelled as transitions between subsequent states

state transitions can be
I clocked, e.g.
  I speech: every 10 ms
  I vision: every 40 ms
  I stock market trends: every 24 hours
I triggered by external events
  I language: every other word
  I travel planning: potential transfer points

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 136

Page 137: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Bayesian Models for Sequences

main purpose:
I predicting the probability of the next event
I computing the probability of a (sub-)sequence

important application areas:
I speech and language processing, genome analysis, time series predictions (stock market, natural disasters, . . .)

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 137

Page 138: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Markov chain

Markov chain: a special sort of belief network for sequential observations

S0 S1 S2 S3 S4

Thus, P(St+1|S0, . . . , St) = P(St+1|St).

Intuitively, St conveys all of the information about the history that can affect the future states.

“The future is independent of the past given the present.”

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 138

Page 139: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Stationary Markov chain

A stationary Markov chain is one in which, for all t > 0 and t′ > 0, P(St+1 | St) = P(St′+1 | St′).

We specify P(S0) and P(St+1 | St).
I Simple model, easy to specify
I Often the natural model
I The network can extend indefinitely

c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 139
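A stationary chain is fully specified by P(S0) and one shared transition table; a sketch (the states and numbers are illustrative, not from the slides):

```python
# A stationary Markov chain over a two-state weather domain:
# one prior P(S0) and one transition table P(S_{t+1} | S_t)
# shared by every time step.
P0 = {"sun": 0.6, "rain": 0.4}
T = {"sun": {"sun": 0.8, "rain": 0.2},
     "rain": {"sun": 0.3, "rain": 0.7}}

def sequence_prob(seq):
    """P(S0 = seq[0], ..., Sn = seq[n]) = P(S0) * prod P(S_{t+1} | S_t)."""
    p = P0[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        p *= T[prev][cur]
    return p

print(sequence_prob(["sun", "sun", "rain"]))   # 0.6 * 0.8 * 0.2 ≈ 0.096
```

Because the same table is reused at every step, the network can extend indefinitely without growing the specification.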

Page 140: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de

Higher-order Markov Models

modelling dependencies of various lengths

bigrams:

S0 → S1 → S2 → S3 → S4

trigrams (each state additionally depends on its second-to-last predecessor):

S0 → S1 → S2 → S3 → S4

  - three different kinds of time slices have to be modelled
    - for S0: P(S0)
    - for S1: P(S1|S0)
    - for all others: P(Si|Si−2, Si−1)

Higher-order Markov Models

quadrograms: P(Si|Si−3, Si−2, Si−1)

S0 → S1 → S2 → S3 → S4 (each state depends on its three predecessors)

four different kinds of time slices required

Markov models can be applied to predict the probability of the next event

e.g. for speech and language processing, genome analysis, time series predictions (stock market, natural disasters, ...)

Markov Models

examples of Markov chains for German letter sequences

unigrams: aiobnin*tarsfneonlpiitdregedcoa*ds*e*dbieastnreleeucdkeaitb*dnurlarsls*omn*keu**svdleeoieei* . . .

bigrams: er*agepteprteiningeit*gerelen*re*unk*ves*mterone*hin*d*an*nzerurbom* . . .

trigrams: billunten*zugen*die*hin*se*sch*wel*war*gen*man*nicheleblant*diertunderstim* . . .

quadrograms: eist*des*nich*in*den*plassen*kann*tragen*was*wiese*zufahr* . . .
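Such letter sequences are produced by estimating n-gram counts from a corpus and then sampling from the estimated model. A minimal bigram sketch in Python; the tiny English training string is invented, so the output only illustrates the idea, not the German examples above:

```python
import random
from collections import defaultdict

# Invented toy corpus; '*' marks word boundaries, as in the examples above.
corpus = "the*quick*brown*fox*jumps*over*the*lazy*dog*"

# Estimate bigram successor counts.
counts = defaultdict(lambda: defaultdict(int))
for a, b in zip(corpus, corpus[1:]):
    counts[a][b] += 1

def sample(n, seed=0):
    """Sample a letter sequence of length n from the bigram model."""
    rng = random.Random(seed)
    c = corpus[0]
    out = [c]
    for _ in range(n - 1):
        succ = counts[c]
        letters = list(succ)
        weights = [succ[l] for l in letters]
        c = rng.choices(letters, weights=weights)[0]
        out.append(c)
    return "".join(out)

print(sample(30))
```

Replacing the bigram counts by trigram or quadrogram counts yields the increasingly word-like sequences shown above.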

Hidden Markov Model

Often the observation does not deterministically depend on the state of the model

This can be captured by a Hidden Markov Model (HMM)

... even if the state transitions are not directly observable

Hidden Markov Model

An HMM is a belief network in which states and observations are separated

[network: S0 → S1 → S2 → S3 → S4; each St emits Ot]

P(S0) specifies the initial conditions

P(St+1|St) specifies the dynamics

P(Ot|St) specifies the sensor model

Hidden Markov Models

A Hidden Markov Model consists of

S: a finite set of states si

O: a finite set of observations oi

transition probabilities T : S × S → R+

emission probabilities E : S × O → R+

prior probabilities (for the initial states): π : S → R+

HMMs are a special case of belief networks
→ arbitrary distributions can be computed by means of variable elimination

HMMs make strong assumptions about the model topology
→ special algorithms for inference are available

Inference with Hidden Markov Models

Evaluation:
What's the probability of an HMM λ having produced an observation sequence O = (o1, o2, . . . , ot)?
problem: the hidden state sequence S = (s1, s2, . . . , st) is not known

P(O|λ) = Σ_∀S P(O|S, λ) P(S|λ)

with

P(O|S, λ) = ∏_{i=1..t} P(Oi = oi | Si = si) = Es1,o1 Es2,o2 . . . Est,ot

P(S|λ) = P(S1 = s1) ∏_{i=2..t} P(Si = si | Si−1 = si−1) = πs1 Ts1,s2 Ts2,s3 . . . Tst−1,st

Inference with Hidden Markov Models

Forward algorithm

there are exponentially many state sequences
→ naive computation requires exponentially many multiplications: O(t · |S|^t)

efficient calculation using a recursive reformulation based on the Markov property

forward coefficients: αk(s) = P(O1:k, Sk = s | λ)

Inference with Hidden Markov Models

Forward algorithm

initialize: α1(s) = πs Es,o1  ∀s ∈ S

repeat, for k = 1 to k = t − 1 and for all s ∈ S:

αk+1(s) = Es,ok+1 Σ_{q∈S} αk(q) Tq,s

aggregate:

P(O|λ) = Σ_{s∈S} αt(s)

complexity reduced to O(t · |S|²)
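The recursion can be sketched directly. The 2-state HMM below is a made-up toy model; the brute-force sum over all |S|^t state sequences is included only to check the result:

```python
from itertools import product

# Invented toy HMM: pi = priors, T[q][s] = transition, E[s][o] = emission.
pi = [0.6, 0.4]
T = [[0.7, 0.3], [0.4, 0.6]]
E = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 1, 1, 0]

def forward(obs):
    """Forward algorithm: P(O | lambda) in O(t * |S|^2)."""
    alpha = [pi[s] * E[s][obs[0]] for s in range(len(pi))]
    for o in obs[1:]:
        alpha = [E[s][o] * sum(alpha[q] * T[q][s] for q in range(len(pi)))
                 for s in range(len(pi))]
    return sum(alpha)

def brute_force(obs):
    """Naive evaluation: sum over all |S|^t state sequences."""
    total = 0.0
    for seq in product(range(len(pi)), repeat=len(obs)):
        p = pi[seq[0]] * E[seq[0]][obs[0]]
        for i in range(1, len(obs)):
            p *= T[seq[i - 1]][seq[i]] * E[seq[i]][obs[i]]
        total += p
    return total

assert abs(forward(obs) - brute_force(obs)) < 1e-12
```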

Inference with Hidden Markov Models

Backward algorithm

initialize: βt(s) = 1  ∀s ∈ S

repeat, for k = t − 1 down to k = 1 and for all s ∈ S:

βk(s) = Σ_{q∈S} βk+1(q) Ts,q Eq,ok+1

aggregate:

P(O|λ) = Σ_{s∈S} πs β1(s) Es,o1

complexity reduced to O(t · |S|²)
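A sketch of the backward recursion on the same kind of invented toy HMM; since both algorithms compute P(O|λ), the forward result serves as a cross-check:

```python
# Invented toy HMM, as in the forward example.
pi = [0.6, 0.4]
T = [[0.7, 0.3], [0.4, 0.6]]
E = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 1, 1, 0]
S = range(len(pi))

def backward(obs):
    """Backward algorithm: P(O | lambda) via beta coefficients."""
    beta = [1.0 for _ in S]                     # beta_t(s) = 1
    for o in reversed(obs[1:]):                 # k = t-1 down to 1
        beta = [sum(beta[q] * T[s][q] * E[q][o] for q in S) for s in S]
    return sum(pi[s] * beta[s] * E[s][obs[0]] for s in S)

def forward(obs):
    alpha = [pi[s] * E[s][obs[0]] for s in S]
    for o in obs[1:]:
        alpha = [E[s][o] * sum(alpha[q] * T[q][s] for q in S) for s in S]
    return sum(alpha)

assert abs(backward(obs) - forward(obs)) < 1e-12
```

The backward coefficients become useful together with the forward ones, e.g. for smoothing.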

Inference with Hidden Markov Models

Explanation

Filtering: P(Sk|O1:k)
given an observation sequence O1:k = (o1, o2, . . . , ok), compute the distribution of Sk

Smoothing: P(Sj|O1:k), j < k
given an observation sequence O1:k = (o1, o2, . . . , ok), compute the distribution of Sj, j < k

Prediction: P(Sj|O1:k), j > k
given an observation sequence O1:k = (o1, o2, . . . , ok), compute the distribution of Sj, j > k

Decoding: Ŝ1:k = arg max_{S1:k} P(S1:k|O1:k)
given an observation sequence O1:k = (o1, o2, . . . , ok), compute the most likely state sequence S1:k

Filtering

Filtering:

P(Si|o1, . . . , oi)

What is the current belief state based on the observation history?

P(Si|o1, . . . , oi) ∝ P(Si, o1, . . . , oi)
  = P(oi|Si) P(Si, o1, . . . , oi−1)
  = P(oi|Si) Σ_{Si−1} P(Si, Si−1, o1, . . . , oi−1)
  = P(oi|Si) Σ_{Si−1} P(Si|Si−1) P(Si−1, o1, . . . , oi−1)
  ∝ P(oi|Si) Σ_{Si−1} P(Si|Si−1) P(Si−1|o1, . . . , oi−1)
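Normalizing the forward update after every observation yields exactly this recursion. A sketch on an invented two-state model:

```python
# Invented toy model: pi = priors, T[q][s] = transition, E[s][o] = emission.
pi = [0.6, 0.4]
T = [[0.7, 0.3], [0.4, 0.6]]
E = [[0.9, 0.1], [0.2, 0.8]]
S = range(2)

def filtering(obs):
    """Return the belief state P(S_i | o_1..o_i) after the last observation."""
    belief = [pi[s] * E[s][obs[0]] for s in S]
    z = sum(belief)
    belief = [b / z for b in belief]
    for o in obs[1:]:
        # P(o_i|S_i) * sum over S_{i-1} of P(S_i|S_{i-1}) P(S_{i-1}|o_1..o_{i-1})
        belief = [E[s][o] * sum(T[q][s] * belief[q] for q in S) for s in S]
        z = sum(belief)
        belief = [b / z for b in belief]
    return belief

b = filtering([0, 0, 1])
assert abs(sum(b) - 1.0) < 1e-12
```

The normalization step implements the proportionality in the last line of the derivation.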

Example (1): robot localization

Suppose a robot wants to determine its location based on its actions and its sensor readings: Localization

This can be represented by the augmented HMM:

[network: Loc0 → Loc1 → · · · → Loc4; each Loct emits Obst; each Actt influences Loct+1]

Example localization domain

Circular corridor, with 16 locations:

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Doors at positions: 2, 4, 7, 11.

Noisy Sensors

Stochastic Dynamics

Robot starts at an unknown location and must determinewhere it is.

Example Sensor Model

P(Observe Door | At Door) = 0.8

P(Observe Door | Not At Door) = 0.1

Example Dynamics Model

P(loct+1 = L | actiont = goRight ∧ loct = L) = 0.1

P(loct+1 = L+1 | actiont = goRight ∧ loct = L) = 0.8

P(loct+1 = L+2 | actiont = goRight ∧ loct = L) = 0.074

P(loct+1 = L′ | actiont = goRight ∧ loct = L) = 0.002 for any other location L′
  - All location arithmetic is modulo 16.
  - The action goLeft works the same, but to the left.
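The corridor example can be sketched as one prediction step (goRight dynamics) followed by one sensor update (door model), using the numbers from the slides; the uniform initial belief reflects the unknown starting location:

```python
# 16-location circular corridor with doors at 2, 4, 7, 11 (from the slides).
N = 16
DOORS = {2, 4, 7, 11}

def go_right(bel):
    """Apply the goRight dynamics P(loc_{t+1} | goRight, loc_t)."""
    new = [0.0] * N
    for L, p in enumerate(bel):
        for d in range(N):
            if d == 0:
                prob = 0.1
            elif d == 1:
                prob = 0.8
            elif d == 2:
                prob = 0.074
            else:
                prob = 0.002          # 13 remaining locations
            new[(L + d) % N] += p * prob
    return new

def observe_door(bel, saw_door=True):
    """Weight the belief by the sensor model and renormalize."""
    like = [0.8 if i in DOORS else 0.1 for i in range(N)]
    if not saw_door:
        like = [1 - l for l in like]
    post = [b * l for b, l in zip(bel, like)]
    z = sum(post)
    return [p / z for p in post]

bel = [1.0 / N] * N                   # unknown starting location
bel = observe_door(go_right(bel), saw_door=True)
assert abs(sum(bel) - 1.0) < 1e-9
assert bel[2] > bel[0]                # door locations become more likely
```

Repeating move/observe steps concentrates the belief on the true location.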

Combining sensor information

the robot can have many (noisy) sensors for signals from the environment

e.g. a light sensor in addition to the door sensor

Sensor Fusion: combining information from different sources

[network: Loc0 → · · · → Loc4 with actions Act0 . . . Act3; each Loct emits a door sensor value Dt and a light sensor value Lt]

Example (2) Medical diagnosis

milk infection test (Jensen and Nielsen 2007)
  - having test data for a certain period of time available, what's the probability that a cow is currently infected?

the probability of the test outcome depends on the cow being infected or not

[network: Infected? → Test]

the probability of the cow being infected also depends on the cow being infected on the previous day
  - first order Markov model

[network: Inf1 → Inf2 → Inf3 → Inf4 → Inf5, each Inft emitting Testt]

Example dynamics

the probability of the cow being infected depends on the cow being infected on the two previous days
  - incubation and infection periods of more than one day
  - second order Markov model

[network: Inf1 . . . Inf5 with first- and second-order transition arcs, each Inft emitting Testt]

  - assumes only random test errors

weaker independence assumptions
  - more powerful model
  - more data required for training

Refined models of the dynamics

the probability of the test outcome also depends on the cow's health and the test outcome on the previous day
  - can also capture systematic test errors
  - second order Markov model for the infection
  - first order Markov model for the test results

[network: as before, with additional arcs Testt → Testt+1]

Decoding

What's the state sequence which most likely produced the observation?

filtering and smoothing produce probability distributions for the values of a state variable

choosing the value with the highest probability gives only a pointwise best estimate
  - a sequence of pointwise best estimates need not be the best state sequence
  - the model need not even be able to produce the pointwise best sequence

Decoding

Viterbi coefficients:

δk(s) = max_{S1:k−1} P(S1:k = (S1:k−1, s), O1:k | λ)

  - δk(s) is the probability of the most likely path ending in state s and generating the observation sequence O1:k

because of the Markov property the computation simplifies to

δk+1(s) = max_{q} (δk(q) Tq,s) Es,ok+1

this corresponds to the principle of dynamic programming

first the deltas are computed in a forward pass

afterwards the best path is reconstructed by means of pointers pred from each state qk to its most likely predecessor qk−1

Decoding

Viterbi algorithm

initialize for all s ∈ S
  - δ1(s) = πs Es,o1
  - pred1(s) = null

repeat recursively
  - δk+1(s) = max_{q} (δk(q) Tq,s) Es,ok+1
  - predk+1(s) = arg max_{q} (δk(q) Tq,s)

select the most likely terminal state st = arg max_s δt(s), with p = δt(st) being the probability of the most likely path

reconstruct the most likely path backwards: qk = predk+1(qk+1)
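A sketch of the algorithm on an invented two-state HMM; delta and pred follow the recursion above, and the path is reconstructed backwards from the most likely terminal state:

```python
# Invented toy HMM: pi = priors, T[q][s] = transition, E[s][o] = emission.
pi = [0.6, 0.4]
T = [[0.7, 0.3], [0.4, 0.6]]
E = [[0.9, 0.1], [0.2, 0.8]]
S = range(2)

def viterbi(obs):
    """Return the most likely state sequence and its probability."""
    delta = [pi[s] * E[s][obs[0]] for s in S]
    preds = []
    for o in obs[1:]:
        # best predecessor q for each state s, then the delta update
        pred = [max(S, key=lambda q: delta[q] * T[q][s]) for s in S]
        delta = [delta[pred[s]] * T[pred[s]][s] * E[s][o] for s in S]
        preds.append(pred)
    # most likely terminal state, then follow the pred pointers backwards
    s = max(S, key=lambda q: delta[q])
    p = delta[s]
    path = [s]
    for pred in reversed(preds):
        s = pred[s]
        path.append(s)
    return list(reversed(path)), p

path, p = viterbi([0, 0, 1, 1])
assert path == [0, 0, 1, 1]   # state 0 favours obs 0, state 1 favours obs 1
```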

Decoding

Viterbi algorithm

is similar to the forward algorithm

uses maximization instead of summation

has many applications in signal processing, pattern recognition, biocomputing, natural language processing, etc.
  - message reconstruction for noisy wireless communication
  - speech recognition / speech synthesis
  - machine translation
  - swype keyboards
  - intron/exon detection
  - ...

Example (3) Part-of-Speech tagging

elementary procedure for Natural Language Processing

annotating the word forms in a sentence with part-of-speech information
Yesterday/RB the/DT school/NN was/VBD closed/VBN

topic areas: He did some field work.
field/military, field/agriculture, field/physics, field/social sci., field/optics, ...

semantic roles:
The winner/Beneficiary received the trophy/Theme at the town hall/Location

Example (3): Part-of-Speech tagging

sequence labelling problem
  - the label depends on the current state and the most recent history

one-to-one correspondence between states, tags, and word forms

Example dynamics

causal (generative) model of the sentence generation process
  - tags are assigned to states
  - the underlying state (tag) sequence produces the observations (word forms)

typical independence assumptions
  - trigram probabilities for the state transitions
  - word form probabilities depend only on the current state

[network: Tag1 → Tag2 → · · · → Tag5; each Tagt emits Wordt]

Example dynamics

weaker independence assumption (stronger model):
  - the probability of a word form also depends on the previous and subsequent state

[network: as above, but each Wordt is additionally conditioned on Tagt−1 and Tagt+1]

Two alternative graphical representations

influence diagrams, belief networks, Bayesian networks, causal networks, graphical models, ...
state transition diagrams (probabilistic finite state machines)

                     Bayesian networks                  State transition diagrams
state nodes          variables with states as values    states
edges into           causal influence and               possible state transitions
state nodes          their probabilities
# state nodes        length of the observation          # model states
                     sequence
observation nodes    variables with observations        observation values
                     as values
edges into           conditional probability tables     conditional probabilities
observ. nodes

Two alternative graphical representations

Bigram-tagging as a Bayesian network

[network: Tag1 → Tag2 → · · · → Tag5; each Tagt emits Wordt]

possible state transitions are not directly visible
  - indirectly encoded in the conditional probability tables

sometimes state transition diagrams are better suited to illustrate the model topology

Two alternative graphical representations

Bigram-tagging as a state transition diagram (can only be depicted for bigram models)

[diagram: states t1 . . . t4, fully interconnected, each emitting word forms w1 . . . wn]

ergodic model: full connectivity between all states

Example (4): Speech Recognition

similar problem: Swype gesture recognition

observation subsequences of unknown length are mapped to one label
→ alignment problem

full connectivity is not needed

a phone/syllable/word realization cannot be reversed

Example dynamics

possible model topologies for phones (only transitions depicted)

P(1|0) P(1|1)   0      0    0
P(2|0) P(2|1) P(2|2)   0    0
  0    P(3|1) P(3|2) P(3|3) 0
  0      0    P(4|2) P(4|3) 0

P(1|0) P(1|1)   0      0    0
  0    P(2|1) P(2|2)   0    0
  0    P(3|1) P(3|2) P(3|3) 0
  0      0      0    P(4|3) 0

P(1|0) P(1|1)   0      0    0
  0    P(2|1) P(2|2)   0    0
  0      0    P(3|2) P(3|3) 0
  0      0      0    P(4|3) 0

the more data is available, the more sophisticated (and powerful) models can be trained

Model composition

composition of sub-models on multiple levels
  - phone models have to be concatenated into word models
  - word models are concatenated into utterance models

[ f ] [ a ] [ n ]

[ f a n ]

Dynamic Bayesian Networks

using complex state descriptions, encoded by means of features
  - the model can be in "different states" at the same time

more efficient implementation of state transitions

modelling of transitions between sub-models

factoring out different influences on the outcome
  - interplay of several actuators (muscles, motors, ...)

modelling partly asynchronized processes
  - coordinated movement of different body parts (e.g. sign language)
  - synchronization between speech sounds and lip movements
  - synchronization between speech and gesture
  - ...

Dynamic Bayesian Networks

problem: state-transition probability tables are sparse
  - they contain a large number of zero probabilities

alternative model structure: separation of state and transition variables

[network layers: deterministic state variables, stochastic transition variables, observation variables]

causal links can be stochastic or deterministic
  - stochastic: conditional probabilities to be estimated
  - deterministic: to be specified manually (decision trees)

Dynamic Bayesian Networks

state variables
  - distinct values for each state of the corresponding HMM
  - the value at slice t + 1 is a deterministic function of the state and the transition of slice t

transition variables
  - probability distribution over which arc to take to leave a state of the corresponding HMM
  - the number of values is the outdegree of the corresponding state in the HMM

using transition variables is more efficient than using stochastic state variables with zero probabilities for the impossible state transitions

Dynamic Bayesian Networks

composite models: some applications require a model to be composed out of sub-models
  - speech: phones → syllables → words → utterances
  - vision: sub-parts → parts → composites
  - genomics: nucleotides → amino acids → proteins

Dynamic Bayesian Networks

composite models:
  - the length of the sub-segments is not known in advance
  - naive concatenation would require generating all possible segmentations of the input sequence

[two concatenated sub-models, for /n/ and /ow/: evolution of articulation over acoustic emissions]

which sub-model to choose next?

Dynamic Bayesian Networks

additional sub-model variables select the next sub-model to choose

[network layers: sub-model index variables, stochastic transition variables, sub-model state variables, observation variables]

sub-model index variables: which sub-model to use at each point in time

sub-model index and transition variables model legal sequences of sub-models (control layer)

several control layers can be combined

Dynamic Bayesian Networks

factored models (1): factoring out different influences on the observation

e.g. articulation:
  - asynchronous movement of the articulators (lips, tongue, jaw, ...)

[network: state → articulators → observation]

if the data is drawn from a factored source, full DBNs are superior to the special case of HMMs

Dynamic Bayesian Networks

factored models (2): coupling of different input channels
  - e.g. acoustic and visual information in speech processing

naïve approach (1): data level fusion

[network: state → mixtures → a single fused observation stream]

too strong synchronisation constraints

Dynamic Bayesian Networks

naïve approach (2): independent input streams

[two separate models: one for the acoustic channel, one for the visual channel]

no synchronisation at all

Dynamic Bayesian Networks

product model

[network: one state layer over the product space, mixtures feeding the visual and acoustic channels]

state values are taken from the cross product of acoustic and visual states

large probability distributions have to be trained

Dynamic Bayesian Networks

factorial model (Nefian et al. 2002)

[network: two independent hidden state layers (factor 1 and factor 2), shared mixtures, visual and acoustic channels]

independent (hidden) states

indirect influence by means of the "explaining away" effect

loose coupling of input channels

Dynamic Bayesian Networks

inference is extremely expensive
  - nodes are connected across slices
  - domains are not locally restricted
  - cliques become intractably large

but: the joint distribution usually need not be computed
  - only maximum detection required
  - finding the optimal path through a lattice
  - dynamic programming can be applied (Viterbi algorithm)

Learning of Bayesian Networks

estimating the probabilities for a given structure
  - for complete data:
    - maximum likelihood estimation
    - Bayesian estimation
  - for incomplete data:
    - expectation maximization
    - gradient descent methods

learning the network structure

Maximum Likelihood Estimation

likelihood of the model M given the (training) data D

L(M|D) = ∏_{d∈D} P(d|M)

log-likelihood

LL(M|D) = Σ_{d∈D} log2 P(d|M)

choose among several possible models for describing the data according to the principle of maximum likelihood

Θ̂ = arg max_Θ L(MΘ|D) = arg max_Θ LL(MΘ|D)

the models only differ in the set of parameters Θ

Maximum Likelihood Estimation

complete data: estimating the parameters by counting

P(A = a) = N(A = a) / Σ_{v∈dom(A)} N(A = v)

P(A = a | B = b, C = c) = N(A = a, B = b, C = c) / N(B = b, C = c)
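Counting estimates can be sketched in a few lines; the (A, B) observations below are invented:

```python
from collections import Counter

# Invented complete data set of (A, B) observations.
data = [("a1", "b1"), ("a1", "b1"), ("a1", "b2"), ("a2", "b1")]

def p_A(a):
    """Maximum likelihood estimate P(A = a) by counting."""
    n = Counter(x for x, _ in data)
    return n[a] / sum(n.values())

def p_A_given_B(a, b):
    """Maximum likelihood estimate P(A = a | B = b) by counting."""
    joint = sum(1 for x, y in data if x == a and y == b)
    marg = sum(1 for _, y in data if y == b)
    return joint / marg

assert abs(p_A("a1") - 3 / 4) < 1e-12
assert abs(p_A_given_B("a1", "b1") - 2 / 3) < 1e-12
```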

Rare events

sparse data results in pessimistic estimates for unseen events
  - if the count for an event in the database is 0, the event is considered impossible by the model
  - in many applications most events will never be observed, irrespective of the sample size

Rare events

Bayesian estimation: using an estimate of the prior probability as the starting point for counting
  - estimation of maximum a posteriori parameters
  - no zero counts can occur
  - if nothing else is available, use a uniform distribution as the prior
  - Bayesian estimate in the binary case with a uniform prior:

    P(yes) = (n + 1) / (n + m + 2)

    n: counts for yes, m: counts for no
  - effectively adds virtual counts to the estimate
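The binary-case estimate can be sketched directly (this is Laplace smoothing: one virtual count per outcome):

```python
def p_yes(n, m):
    """Bayesian estimate of P(yes) with a uniform prior.
    n: counts for yes, m: counts for no."""
    return (n + 1) / (n + m + 2)

assert p_yes(0, 0) == 0.5          # no data: fall back to the prior
assert p_yes(0, 10) > 0.0          # unseen events keep nonzero probability
assert abs(p_yes(1000, 0) - 1.0) < 0.01   # the data dominates eventually
```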

Rare events

alternative: smoothing as a post-processing step

remove probability mass from the frequent observations ...

... and distribute it to the unobserved ones
  - floor method
  - discounting
  - ...

Rare events

Backoff

interpolate with the estimates of a less sophisticated model, e.g. combine trigram probabilities with bigram or unigram probabilities

P̂(on|on−2, on−1) = c3 P(on|on−2, on−1) + c2 P(on|on−1) + c1 P(on)

good/acceptable coefficients ci can be estimated on held-out data
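A sketch of the interpolation for a letter trigram model; the corpus and the coefficients c1..c3 are invented (in practice the ci would be tuned on held-out data), and the bigram context counts are approximated by unigram counts:

```python
from collections import Counter

# Invented corpus and interpolation coefficients.
corpus = "abracadabra"
c3, c2, c1 = 0.6, 0.3, 0.1

uni = Counter(corpus)
bi = Counter(zip(corpus, corpus[1:]))
tri = Counter(zip(corpus, corpus[1:], corpus[2:]))

def p_interp(o, o1, o2):
    """Interpolated P(o_n | o_{n-2} = o1, o_{n-1} = o2)."""
    p3 = tri[(o1, o2, o)] / bi[(o1, o2)] if bi[(o1, o2)] else 0.0
    p2 = bi[(o2, o)] / uni[o2] if uni[o2] else 0.0
    p1 = uni[o] / len(corpus)
    return c3 * p3 + c2 * p2 + c1 * p1

# "r" always follows "ab" in the corpus, so the estimate is close to 1;
# an unseen letter still gets whatever the unigram term contributes.
assert p_interp("r", "a", "b") > p_interp("z", "a", "b")
```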

Incomplete Data

missing at random (MAR):
  - the probability that a value is missing depends only on the observed values
  - e.g. confirmation measurements: values are available only if the preceding measurement was positive/negative

missing completely at random (MCAR):
  - the probability that a value is missing is also independent of the value itself
  - e.g. stochastic failures of the measurement equipment
  - e.g. hidden/latent variables (mixture coefficients of a Gaussian mixture distribution)

nonignorable:
  - neither MAR nor MCAR
  - the probability depends on the unseen values, e.g. exit polls for extremist parties

Expectation Maximization

estimating the underlying distribution of variables that are not directly observable

expectation:
  - "complete" the data set using the current estimate h = Θ to calculate expectations for the missing values
  - applies the model to be learned (Bayesian inference)

maximization:
  - use the "completed" data set to find a new maximum likelihood estimate h′ = Θ′


Expectation Maximization

full data consists of tuples ⟨x_i1, ..., x_ik, z_i1, ..., z_il⟩; only the ~x_i can be observed

training data: X = ~x_1, ..., ~x_m
hidden information: Z = ~z_1, ..., ~z_m
parameters of the distribution to be estimated: Θ

Z can be treated as a random variable with p(Z) = f(Θ, X)

full data: Y = {~y | ~y = ~x_i || ~z_i}
hypothesis: h of Θ, needs to be revised into h′


Expectation Maximization

goal of EM: h′ = arg max_{h′} E(log2 p(Y | h′))

define a function Q(h′ | h) = E(log2 p(Y | h′) | h, X)

Estimation (E) step:
calculate Q(h′ | h) using the current hypothesis h and the observed data X to estimate the probability distribution over Y

Q(h′ | h) ← E(log2 p(Y | h′) | h, X)

Maximization (M) step:
replace hypothesis h by the h′ that maximizes the function Q

h ← arg max_{h′} Q(h′ | h)


Expectation Maximization

the expectation step requires applying the model to be learned
- Bayesian inference

gradient ascent / hill climbing search:
- converges to the next local optimum
- the global optimum is not guaranteed
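As a concrete illustration of the E/M loop, here is a minimal sketch for the classic two-biased-coins mixture (hidden variable z_i: which coin produced session i); the fixed mixing weights of 0.5 and the starting values are simplifying assumptions, not part of the slides:

```python
def em_two_coins(head_counts, n_flips, theta=(0.6, 0.5), iters=25):
    """EM for a mixture of two biased coins with unknown biases.
    head_counts[i] = heads observed in session i of n_flips tosses;
    which coin was used in each session is hidden.  Mixing weights
    are fixed at 0.5 to keep the sketch short."""
    tA, tB = theta
    for _ in range(iters):
        eA_heads = eA_total = eB_heads = eB_total = 0.0
        for h in head_counts:
            t = n_flips - h
            # E step: posterior responsibility of coin A for this session
            lA = tA ** h * (1 - tA) ** t
            lB = tB ** h * (1 - tB) ** t
            rA = lA / (lA + lB)
            # accumulate expected head/total counts for the M step
            eA_heads += rA * h
            eA_total += rA * n_flips
            eB_heads += (1 - rA) * h
            eB_total += (1 - rA) * n_flips
        # M step: maximum-likelihood re-estimate from expected counts
        tA, tB = eA_heads / eA_total, eB_heads / eB_total
    return tA, tB
```

Starting from θ = (0.6, 0.5) on the sessions [5, 9, 8, 4, 7] of 10 tosses each, the two estimates separate into roughly 0.8 and 0.5 — a local optimum; a different start can converge to a different (e.g. label-swapped) solution.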


Expectation Maximization


Q(h′ | h) ← E(ln p(Y | h′) | h, X)

h ← arg max_{h′} Q(h′ | h)

If Q is continuous, EM converges to a local maximum of the likelihood function P(Y | h′)


Learning the Network Structure


the space of possible networks is extremely large (> O(2^n))

a Bayesian network over a complete graph is always a possible answer, but not an interesting one (no modelling of independencies)

problem of overfitting

two approaches:
- constraint-based learning
- (score-based learning)


Constraint-based Structure Learning

estimate the pairwise degree of independence using conditional mutual information

determine the direction of the arcs between non-independent nodes


Estimating Independence

conditional mutual information:

CMI(A, B | X) = Σ_X P(X) Σ_{A,B} P(A, B | X) log2 [ P(A, B | X) / (P(A | X) P(B | X)) ]

two nodes are independent if CMI(A, B | X) = 0

choose as non-independent all pairs of nodes where the significance of a χ²-test on the hypothesis CMI(A, B | X) = 0 is above a certain user-defined threshold

high minimal significance level: more links are established

the result is a skeleton of possible candidates for causal influence
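A direct sketch of estimating CMI(A, B | X) from a sample of (a, b, x) triples, plugging empirical relative frequencies into the formula above:

```python
import math
from collections import Counter

def cmi(samples):
    """Empirical CMI(A, B | X) in bits from (a, b, x) observations."""
    n = len(samples)
    cx = Counter(x for _, _, x in samples)
    cax = Counter((a, x) for a, _, x in samples)
    cbx = Counter((b, x) for _, b, x in samples)
    cabx = Counter(samples)
    total = 0.0
    for (a, b, x), nabx in cabx.items():
        p_x = cx[x] / n              # P(X = x)
        p_ab = nabx / cx[x]          # P(A = a, B = b | x)
        p_a = cax[(a, x)] / cx[x]    # P(A = a | x)
        p_b = cbx[(b, x)] / cx[x]    # P(B = b | x)
        total += p_x * p_ab * math.log2(p_ab / (p_a * p_b))
    return total
```

On a finite empirical sample the estimate is rarely exactly 0 even for independent variables, which is why the slides pair it with a χ² significance test.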


Determining Causal Influence

Rule 1 (introduction of v-structures): given A − C and B − C but not A − B, introduce a v-structure A → C ← B if there exists a set of nodes X such that A is d-separated from B given X

[diagram: the undirected triple A − C − B is oriented into the v-structure A → C ← B]


Determining Causal Influence

Rule 2 (avoid new v-structures): when Rule 1 has been exhausted and there is a structure A → C − B but not A − B, then direct C → B

Rule 3 (avoid cycles): if A → B introduces a cycle in the graph, do A ← B

Rule 4 (choose randomly): if no other rule can be applied to the graph, choose an undirected link and give it an arbitrary direction
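Rule 1 can be sketched as a search over the skeleton for unshielded triples; the `d_separated` oracle below is a hypothetical stand-in for the independence tests (it would report whether a separating set for A and B excluding C was found):

```python
def find_v_structures(skeleton, d_separated):
    """Rule 1 sketch: for every A - C - B in the undirected skeleton
    with no A - B link, orient A -> C <- B when the (hypothetical)
    oracle reports a separating set for A, B that excludes C.
    skeleton: set of frozenset({u, v}) edges."""
    nodes = sorted(set().union(*skeleton)) if skeleton else []
    arcs = set()
    for c in nodes:
        neigh = [v for v in nodes if frozenset((v, c)) in skeleton]
        for i, a in enumerate(neigh):
            for b in neigh[i + 1:]:
                if frozenset((a, b)) not in skeleton and d_separated(a, b, c):
                    arcs.add((a, c))   # a -> c
                    arcs.add((b, c))   # b -> c
    return arcs
```

Rules 2-4 would then run on the partially directed result until every link is oriented.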


Determining Causal Influence

[diagram: example skeleton over nodes A, B, C, D, E, F, G; the arcs are oriented step by step by successive applications of Rule 1, Rule 2, Rule 4, Rule 2, Rule 2, and Rule 4]


Determining Causal Influence

independence/non-independence candidates might contradict each other

¬I(A, B), ¬I(A, C), ¬I(B, C), but I(A, B | C), I(A, C | B) and I(B, C | A)
- remove a link and build a chain out of the remaining ones

[diagram: triangle over A, B, C reduced to a chain by dropping one link]

- uncertain region: different heuristics might lead to different structures


Determining Causal Influence

I(A, C), I(A, D), I(B, D)

[diagram: skeleton over A, B, C, D, extended by a hidden variable E]

- the problem might be caused by a hidden variable: E → B, E → C, A → B, D → C


Constraint-based Structure Learning

useful results can only be expected if
- the data is complete
- no (unrecognized) hidden variables obscure the induced influence links
- the observations are a faithful sample of an underlying Bayesian network
  - the distribution of cases in D reflects the distribution determined by the underlying network
  - the estimated probability distribution is very close to the underlying one
- the underlying distribution is recoverable from the observations


Constraint-based Structure Learning

example of an unrecoverable distribution:
- two switches: P(A = up) = P(B = up) = 0.5
- P(C = on) = 1 if val(A) = val(B), otherwise 0
- → I(A, C), I(B, C)

[diagram: nodes A, B, C with no recoverable links]

problem: independence decisions are taken on individual links (CMI), not on complete link configurations

P(C = on | A, B):
              B = up   B = down
    A = up       1        0
    A = down     0        1
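The two-switch example can be checked numerically: over the four equally likely (A, B) configurations, the marginal mutual information between C and each single switch is exactly 0 bits, while both switches together carry 1 bit about C — a sketch assuming up/down encoded as 1/0:

```python
import math
from collections import Counter

def mutual_information(pairs):
    """I(U, V) in bits, estimated from a list of (u, v) samples."""
    n = len(pairs)
    cu = Counter(u for u, _ in pairs)
    cv = Counter(v for _, v in pairs)
    cuv = Counter(pairs)
    return sum((k / n) * math.log2((k / n) / ((cu[u] / n) * (cv[v] / n)))
               for (u, v), k in cuv.items())

# the two switches: C is on exactly when A and B agree
samples = [(a, b, int(a == b)) for a in (0, 1) for b in (0, 1)]
i_ac = mutual_information([(a, c) for a, b, c in samples])
i_bc = mutual_information([(b, c) for a, b, c in samples])
i_abc = mutual_information([((a, b), c) for a, b, c in samples])
```

Pairwise tests see no dependence at all, so the A − C and B − C links are never proposed, even though C is a deterministic function of (A, B).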