# TDT4171 Artificial Intelligence Methods: Probabilistic Reasoning Over Time


• TDT4171 Artificial Intelligence Methods Lecture 3 & 4 – Probabilistic Reasoning Over Time

Norwegian University of Science and Technology

Helge Langseth IT-VEST 310

1 TDT4171 Artificial Intelligence Methods

• Outline

1. Leftovers from last time
   - Inference
2. Probabilistic Reasoning over Time
   - Set-up
   - Basic speech recognition
   - Inference: Filtering, prediction, smoothing
   - Inference for Hidden Markov models
   - Kalman Filters
   - Dynamic Bayesian networks
   - Summary
3. Speech recognition
   - Speech as probabilistic inference
   - Speech sounds
   - Word sequences


• Leftovers from last time

Summary from last time

Bayes nets provide a natural representation for (causally induced) conditional independence

Topology + CPTs = compact representation of joint distribution

Generally easy to construct – also for non-experts

Canonical distributions (e.g., noisy-OR) = compact representation of CPTs

Announcements

The first assignment is due next Friday

Deliver it using It’s Learning

There will be no lecture next week!


• Leftovers from last time Inference

Simple queries: compute the posterior marginal P(X_i | E = e), e.g., P(NoGas | Gauge = empty, Lights = on, Starts = false)

Conjunctive queries: P(X_i, X_j | E = e) = P(X_i | E = e) P(X_j | X_i, E = e)

Optimal decisions: decision networks include utility information; probabilistic inference is required for P(outcome | action, evidence)

Value of information: which evidence to seek next?

Sensitivity analysis: which probability values are most critical?

Explanation: why do I need a new starter motor?



• Leftovers from last time Inference

Inference tasks – Inference by enumeration

Slightly intelligent way to sum out variables from the joint without actually constructing its explicit representation.

Simple query on the burglary network:

[Figure: the burglary network with nodes B, E, A, J, M.]

P(B | j, m) = P(B, j, m) / P(j, m) = α P(B, j, m) = α Σ_e Σ_a P(B, e, a, j, m)

Rewrite full joint entries using products of CPT entries:

P(B | j, m) = α Σ_e Σ_a P(B) P(e) P(a | B, e) P(j | a) P(m | a)
            = α P(B) Σ_e P(e) Σ_a P(a | B, e) P(j | a) P(m | a)

Recursive depth-first enumeration: O(n) space, O(dⁿ) time (evaluating the unfactored sum naively costs O(n · dⁿ)).
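The factored sum above can be evaluated directly. The sketch below uses the standard AIMA burglary-network CPT values; only some of them (e.g., P(a | b, e) = .95, P(j | a) = .90) appear on the slides, the rest are assumed here.

```python
# Factored enumeration for P(B | j, m) on the burglary network.
# CPT values follow the standard AIMA burglary example; not all of them
# are shown on the slides, so treat them as assumed.

P_b = 0.001                      # P(B = true)
P_e = 0.002                      # P(E = true)
P_a = {(True, True): 0.95, (True, False): 0.94,   # P(A = true | B, E)
       (False, True): 0.29, (False, False): 0.001}
P_j = {True: 0.90, False: 0.05}  # P(J = true | A)
P_m = {True: 0.70, False: 0.01}  # P(M = true | A)

def unnormalized(b):
    """P(b) * sum_e P(e) * sum_a P(a | b, e) P(j | a) P(m | a), without alpha."""
    pb = P_b if b else 1 - P_b
    total = 0.0
    for e in (True, False):
        pe = P_e if e else 1 - P_e
        inner = 0.0
        for a in (True, False):
            pa = P_a[(b, e)] if a else 1 - P_a[(b, e)]
            inner += pa * P_j[a] * P_m[a]
        total += pe * inner
    return pb * total

scores = {b: unnormalized(b) for b in (True, False)}
alpha = 1.0 / sum(scores.values())
posterior = {b: alpha * s for b, s in scores.items()}
print(round(posterior[True], 3))   # prints 0.284
```

With these numbers the posterior is P(b | j, m) ≈ 0.284, the value usually quoted for this network.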

• Leftovers from last time Inference

Enumeration algorithm

function Enumeration-Ask(X, e, bn) returns a distribution over X
    inputs: X, the query variable
            e, observed values for variables E
            bn, a Bayesian network with variables {X} ∪ E ∪ Y

    Q(X) ← a distribution over X, initially empty
    for each value x_i of X do
        extend e with value x_i for X
        Q(x_i) ← Enum-All(Vars[bn], e)
    return Normalize(Q(X))

function Enum-All(vars, e) returns a real number
    if Empty?(vars) then return 1.0
    Y ← First(vars)
    if Y has value y in e
        then return P(y | Pa(Y)) × Enum-All(Rest(vars), e)
        else return Σ_y P(y | Pa(Y)) × Enum-All(Rest(vars), e_y)
             where e_y is e extended with Y = y

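The pseudocode above translates almost line for line into Python. The network representation below (a topologically ordered variable list, per-variable domains, and a CPT lookup function) is an assumption made for this sketch, not part of the slides.

```python
# A direct Python transcription of Enumeration-Ask / Enum-All.
# bn = (variables in topological order, domains dict, cpt function),
# where cpt(var, value, e) returns P(var = value | parents as set in e).

def enumeration_ask(X, e, bn):
    """Return the posterior distribution over query variable X given evidence e."""
    variables, domains, cpt = bn
    Q = {}
    for xi in domains[X]:
        Q[xi] = enum_all(variables, {**e, X: xi}, domains, cpt)
    norm = sum(Q.values())
    return {xi: q / norm for xi, q in Q.items()}   # Normalize(Q(X))

def enum_all(variables, e, domains, cpt):
    if not variables:
        return 1.0
    Y, rest = variables[0], variables[1:]
    if Y in e:   # Y is observed (or already assigned): multiply, don't sum
        return cpt(Y, e[Y], e) * enum_all(rest, e, domains, cpt)
    return sum(cpt(Y, y, e) * enum_all(rest, {**e, Y: y}, domains, cpt)
               for y in domains[Y])

# Usage on the burglary network (standard AIMA CPT values, assumed here):
def burglary_cpt(var, value, e):
    if var == 'B':
        p = 0.001
    elif var == 'E':
        p = 0.002
    elif var == 'A':
        p = {(True, True): 0.95, (True, False): 0.94,
             (False, True): 0.29, (False, False): 0.001}[(e['B'], e['E'])]
    elif var == 'J':
        p = 0.90 if e['A'] else 0.05
    else:  # 'M'
        p = 0.70 if e['A'] else 0.01
    return p if value else 1 - p

bn = (['B', 'E', 'A', 'J', 'M'],
      {v: [True, False] for v in 'BEAJM'},
      burglary_cpt)
posterior = enumeration_ask('B', {'J': True, 'M': True}, bn)
print(round(posterior[True], 3))   # prints 0.284
```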

• Leftovers from last time Inference

Evaluation tree

[Figure: evaluation tree for the burglary query. Under P(b) = .001 the tree branches on E (P(e) = .002, P(¬e) = .998), then on A (P(a | b, e) = .95, P(¬a | b, e) = .05, P(a | b, ¬e) = .94, P(¬a | b, ¬e) = .06), with leaves P(j | a) = .90, P(j | ¬a) = .05, P(m | a) = .70, P(m | ¬a) = .01.]

Enumeration is inefficient: the same subexpressions, e.g., P(j | a) P(m | a), are recomputed for each value of e. ⇒ Nice to know that better methods are available. . .


• Leftovers from last time Inference

Summary of Chapter 14

Bayes nets provide a natural representation for (causally induced) conditional independence

Topology + CPTs = compact representation of joint

Generally easy to construct – also for non-experts

Canonical distributions (e.g., noisy-OR) = compact representation of CPTs

Efficient inference calculations are available (but the good ones are outside the scope of this course)

What you should know:

How to build models (and verify them using Conditional Independence and Causality)

What you should know — what drives the . . .

- model building burden
- complexity of inference


• Probabilistic Reasoning over Time Set-up

Time and uncertainty

Motivation: The world changes; we need to track and predict it. Static problems (vehicle diagnosis) vs. dynamic problems (diabetes management).

Basic idea: copy state and evidence variables for each time step

Rain_t = Does it rain at time t?

This assumes discrete time; the step size depends on the problem. Here: a time step is one day, I guess (?)


• Probabilistic Reasoning over Time Set-up

Markov processes (Markov chains)

If we want to construct a Bayes net from these variables, then what are the parents?

Assume we have observations of Rain_0, Rain_1, . . . , Rain_t and want to predict whether or not it rains on day t + 1: P(Rain_{t+1} | Rain_0, Rain_1, . . . , Rain_t). Try to build a BN over Rain_0, Rain_1, . . . , Rain_{t+1}:

P(Rain_{t+1}) ≠ P(Rain_{t+1} | Rain_t); base on Rain_t. P(Rain_{t+1} | Rain_t) ≈ P(Rain_{t+1} | Rain_t, Rain_{t−1}) (Do you agree?)


• Probabilistic Reasoning over Time Set-up

Markov processes (Markov chains)

First-order Markov process: P(Rain_{t+1} | Rain_0, . . . , Rain_t) = P(Rain_{t+1} | Rain_t). “Future is conditionally independent of Past given Present.”

k’th-order Markov process: P(Rain_{t+1} | Rain_0, . . . , Rain_t) = P(Rain_{t+1} | Rain_{t−k+1}, . . . , Rain_t)

• Probabilistic Reasoning over Time Set-up

Markov processes as Bayesian networks

If we want to construct a Bayes net from these variables, then what are the parents?

Markov assumption: Xt depends on bounded subset of X0:t−1

First-order Markov process: P(X_t | X_{0:t−1}) = P(X_t | X_{t−1})

Second-order Markov process: P(X_t | X_{0:t−1}) = P(X_t | X_{t−2}, X_{t−1})

[Figure: two chain diagrams over X_{t−2}, X_{t−1}, X_t, X_{t+1}, X_{t+2}. In the first-order chain each node has one parent (its predecessor); in the second-order chain each node has two parents (its two predecessors).]

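Prediction in a first-order chain only needs the one-step transition model. A minimal sketch, where the transition probability (rain persists with probability 0.7) is an assumed illustration rather than a value from the slides:

```python
# Prediction in a first-order Markov chain over Rain_t:
# propagate P(Rain_t) forward using only P(Rain_{t+1} | Rain_t).
# The transition probabilities are assumed for illustration.

P_RAIN_GIVEN_RAIN = 0.7   # P(Rain_{t+1} = true | Rain_t = true)
P_RAIN_GIVEN_DRY = 0.3    # P(Rain_{t+1} = true | Rain_t = false)

def predict(p_rain_today, steps):
    """Belief that it rains `steps` days ahead, given today's belief."""
    p = p_rain_today
    for _ in range(steps):
        p = p * P_RAIN_GIVEN_RAIN + (1 - p) * P_RAIN_GIVEN_DRY
    return p

print(round(predict(0.9, 1), 3))    # prints 0.66
print(round(predict(0.9, 50), 3))   # prints 0.5
```

Note how predictions converge to the stationary distribution (0.5 for this symmetric transition model) regardless of today's belief: with no new evidence, a Markov chain forgets its starting point.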

• Probabilistic Reasoning over Time Set-up

Is a first-order Markov process suitable?

First-order Markov assumption not exactly true in real world!

Possible fixes:
1. Increase the order of the Markov process
2. Augment the state, e.g., add Temp_t, Pressure_t

State augmentation is enough!

Any k’th-order Markov process can be expressed as a first-order Markov process – focus on first-order processes from now on.

“Proof”:
1. Assume for simplicity that the process contains only the variable X, and that we have a second-order Markov process.
2. Create a new variable X′_t identical to X_{t−1}.
3. Let X_{t+1} have both X_t and X′_t as parents.
4. Do this for all t. The augmented model is a first-order Markov process.
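The same construction can be written as code: pair each value with its predecessor, so the augmented state carries all the history the second-order model needs. The second-order transition numbers below are made up purely for illustration.

```python
# "Proof" sketch as code: convert a second-order chain over a binary X
# into a first-order chain over the augmented state Y_t = (X_{t-1}, X_t).
# The second-order transition probabilities are invented for illustration.

from itertools import product

# P(X_{t+1} = True | X_{t-1}, X_t)
second_order = {(False, False): 0.1, (False, True): 0.6,
                (True, False): 0.4, (True, True): 0.9}

def augmented_transition(prev_state, next_state):
    """P(Y_{t+1} = next_state | Y_t = prev_state), a first-order transition."""
    x_prev, x_now = prev_state
    y_now, y_next = next_state
    if y_now != x_now:            # the shared component must carry over
        return 0.0
    p_true = second_order[(x_prev, x_now)]
    return p_true if y_next else 1 - p_true

# Each augmented state has a proper conditional distribution over successors,
# so the augmented model is a genuine first-order Markov process:
for state in product([True, False], repeat=2):
    total = sum(augmented_transition(state, nxt)
                for nxt in product([True, False], repeat=2))
    assert abs(total - 1.0) < 1e-12
```

The augmented chain has a larger state space (here 4 states instead of 2), which is the price paid for reducing the order.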

• Probabilistic Reasoning over Time Basic speech recognition

Speech as probabilistic inference

How can we recognize speech?

Speech signals are noisy, variable, ambiguous

What is the most likely word sequence, given the speech signal?

Why not choose Words to maximize P(Words | signal)? Use Bayes’ rule:

P(Words | signal) = α P(signal | Words) P(Words)

I.e., the problem decomposes into an acoustic model and a language model

Need to be able to do the required calculations!!
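The Bayes-rule decomposition above can be made concrete with a toy decoder. Both the candidate words and every probability below are invented for illustration; real systems score whole word sequences with far richer models.

```python
# Toy word decoder for P(Words | signal) = alpha * P(signal | Words) * P(Words).
# All words and probabilities are made up for illustration.

acoustic = {"wreck": 0.30, "recognize": 0.25}   # P(signal | Words), acoustic model
language = {"wreck": 0.02, "recognize": 0.10}   # P(Words), language model

scores = {w: acoustic[w] * language[w] for w in acoustic}
alpha = 1.0 / sum(scores.values())
posterior = {w: alpha * s for w, s in scores.items()}

best = max(posterior, key=posterior.get)
print(best)   # prints recognize
```

Here "wreck" fits the acoustics slightly better, but the language model makes "recognize" far more probable a priori, and the product flips the decision, which is exactly why the decomposition matters.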
