TDT4171 Artificial Intelligence Methods Lecture 3 & 4 – Probabilistic Reasoning Over Time
Norwegian University of Science and Technology
Helge Langseth IT-VEST 310
Outline
1 Leftovers from last time
  Inference
2 Probabilistic Reasoning over Time
  Set-up
  Basic speech recognition
  Inference: Filtering, prediction, smoothing
  Inference for Hidden Markov models
  Kalman Filters
  Dynamic Bayesian networks
  Summary
3 Speech recognition
  Speech as probabilistic inference
  Speech sounds
  Word sequences
Leftovers from last time
Summary from last time
Bayes nets provide a natural representation for (causally induced) conditional independence
Topology + CPTs = compact representation of joint distribution
Generally easy to construct – also for non-experts
Canonical distributions (e.g., noisy-OR) = compact representation of CPTs
Announcements
The first assignment is due next Friday
Deliver it using It’s Learning
There will be no lecture next week!
Leftovers from last time Inference
Inference tasks
Simple queries: compute the posterior marginal P(X_i | E = e), e.g., P(NoGas | Gauge = empty, Lights = on, Starts = false)
Conjunctive queries: P(X_i, X_j | E = e) = P(X_i | E = e) P(X_j | X_i, E = e) (see the sketch after this list)
Optimal decisions: decision networks include utility information; probabilistic inference required for P(outcome | action, evidence)
Value of information: which evidence to seek next?
Sensitivity analysis: which probability values are most critical?
Explanation: why do I need a new starter motor?
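As a small worked illustration of the first two query types, the following Python sketch computes a posterior marginal and a conjunctive query by summing out a tiny hand-made joint distribution (the variables X, Y, Z and all numbers are invented for the example, not taken from the slides):

from itertools import product

# Illustrative joint distribution over three Boolean variables (X, Y, Z);
# the numbers are made up and just need to sum to 1.
joint = {}
p_x, p_y_given, p_z_given = 0.3, {True: 0.8, False: 0.1}, {True: 0.7, False: 0.2}
for x, y, z in product([True, False], repeat=3):
    px = p_x if x else 1 - p_x
    py = p_y_given[x] if y else 1 - p_y_given[x]
    pz = p_z_given[x] if z else 1 - p_z_given[x]
    joint[(x, y, z)] = px * py * pz

def simple_query(evidence_z):
    """Posterior marginal P(X | Z = evidence_z): sum out Y, then normalize."""
    scores = {x: sum(joint[(x, y, evidence_z)] for y in [True, False]) for x in [True, False]}
    norm = sum(scores.values())
    return {x: s / norm for x, s in scores.items()}

def conjunctive_query(evidence_z):
    """Conjunctive query P(X, Y | Z = evidence_z), i.e. P(X | e) * P(Y | X, e)."""
    scores = {(x, y): joint[(x, y, evidence_z)] for x in [True, False] for y in [True, False]}
    norm = sum(scores.values())
    return {xy: s / norm for xy, s in scores.items()}

print(simple_query(True))
print(conjunctive_query(True))

Both answers come straight from the joint; the point of the algorithms below is to get the same answers without ever constructing that joint explicitly.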
Leftovers from last time Inference
Inference tasks – Inference by enumeration
Slightly intelligent way to sum out variables from the joint without actually constructing its explicit representation.
Simple query on the burglary network:
P(B | j, m) = P(B, j, m) / P(j, m) = α P(B, j, m)
= α Σ_e Σ_a P(B, e, a, j, m)
[Figure: the burglary network — B and E are parents of A, which is the parent of J and M]
Rewrite full joint entries using products of CPT entries:
P(B | j, m) = α Σ_e Σ_a P(B) P(e) P(a | B, e) P(j | a) P(m | a)
= α P(B) Σ_e P(e) Σ_a P(a | B, e) P(j | a) P(m | a)
Recursive depth-first enumeration: O(n) space, O(d^n) time (naive summation over the full joint, without the factoring above, would be O(n · d^n))
Leftovers from last time Inference
Enumeration algorithm
function Enumeration-Ask(X, e, bn) returns a distribution over X
   inputs: X, the query variable
           e, observed values for variables E
           bn, a Bayesian network with variables {X} ∪ E ∪ Y
   Q(X) ← a distribution over X, initially empty
   for each value x_i of X do
       extend e with value x_i for X
       Q(x_i) ← Enum-All(Vars[bn], e)
   return Normalize(Q(X))

function Enum-All(vars, e) returns a real number
   if Empty?(vars) then return 1.0
   Y ← First(vars)
   if Y has value y in e
       then return P(y | Pa(Y)) × Enum-All(Rest(vars), e)
       else return Σ_y P(y | Pa(Y)) × Enum-All(Rest(vars), e_y)
            where e_y is e extended with Y = y
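To make the pseudocode concrete, here is a minimal runnable Python sketch of Enumeration-Ask on the burglary network; the dictionary-based representation and function names are our own, and the CPT numbers are the usual textbook values for this example:

# Inference by enumeration on the burglary network (textbook parameters).
# Variables are Booleans; each CPT maps parent-value tuples to P(variable = True).
cpts = {
    "B": {(): 0.001},
    "E": {(): 0.002},
    "A": {(True, True): 0.95, (True, False): 0.94,
          (False, True): 0.29, (False, False): 0.001},
    "J": {(True,): 0.90, (False,): 0.05},
    "M": {(True,): 0.70, (False,): 0.01},
}
parents = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}
order = ["B", "E", "A", "J", "M"]          # topological order of the network

def prob(var, value, evidence):
    """P(var = value | parents(var)), read from the CPT."""
    p_true = cpts[var][tuple(evidence[p] for p in parents[var])]
    return p_true if value else 1.0 - p_true

def enum_all(variables, evidence):
    """Sum over all unobserved variables, depth first."""
    if not variables:
        return 1.0
    first, rest = variables[0], variables[1:]
    if first in evidence:
        return prob(first, evidence[first], evidence) * enum_all(rest, evidence)
    return sum(prob(first, v, evidence) * enum_all(rest, {**evidence, first: v})
               for v in (True, False))

def enumeration_ask(query, evidence):
    """Posterior distribution over the Boolean query variable."""
    q = {v: enum_all(order, {**evidence, query: v}) for v in (True, False)}
    norm = sum(q.values())
    return {v: p / norm for v, p in q.items()}

# P(Burglary | JohnCalls = true, MaryCalls = true) ≈ {True: 0.284, False: 0.716}
print(enumeration_ask("B", {"J": True, "M": True}))

The recursion in enum_all mirrors Enum-All: an observed variable contributes a single CPT factor, while an unobserved one is summed out depth-first.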
Leftovers from last time Inference
Evaluation tree
[Figure: evaluation tree for P(b | j, m) — branching first on e, then on a, with the CPT entries P(b) = .001, P(e) = .002, P(¬e) = .998, P(a | b, e) = .95, P(¬a | b, e) = .05, P(a | b, ¬e) = .94, P(¬a | b, ¬e) = .06, P(j | a) = .90, P(j | ¬a) = .05, P(m | a) = .70, P(m | ¬a) = .01 multiplied along each path]
Enumeration is inefficient, as we have repeated computation of e.g., P (j|a)P (m|a) for each value of e. ⇒ Nice to know that better methods are available. . .
Leftovers from last time Inference
Summary of Chapter 14
Bayes nets provide a natural representation for (causally induced) conditional independence
Topology + CPTs = compact representation of joint
Generally easy to construct – also for non-experts
Canonical distributions (e.g., noisy-OR) = compact representation of CPTs
Efficient inference calculations are available (but the good ones are outside the scope of this course)
What you should know:
How to build models (and verify them using Conditional Independence and Causality)
What drives the…
  model building burden
  complexity of inference
Probabilistic Reasoning over Time Set-up
Time and uncertainty
Motivation: The world changes; we need to track and predict it
Static (vehicle diagnosis) vs. dynamic (diabetes management)
Basic idea: copy state and evidence variables for each time step
Rain_t = does it rain at time t?
This assumes discrete time; the step size depends on the problem
Here: a time step is one day, I guess (?)
Probabilistic Reasoning over Time Set-up
Markov processes (Markov chains)
If we want to construct a Bayes net from these variables, then what are the parents?
Assume we have observations of Rain_0, Rain_1, …, Rain_t and want to predict whether or not it rains on day t + 1: P(Rain_{t+1} | Rain_0, Rain_1, …, Rain_t). Try to build a BN over Rain_0, Rain_1, …, Rain_{t+1}:
  P(Rain_{t+1}) ≠ P(Rain_{t+1} | Rain_t); so base the prediction on Rain_t.
  P(Rain_{t+1} | Rain_t) ≈ P(Rain_{t+1} | Rain_t, Rain_{t−1}) (Do you agree?)
First-order Markov process:
  P(Rain_{t+1} | Rain_0, …, Rain_t) = P(Rain_{t+1} | Rain_t)
  “Future is conditionally independent of Past given Present”
k’th-order Markov process:
  P(Rain_{t+1} | Rain_0, …, Rain_t) = P(Rain_{t+1} | Rain_{t−k+1}, …, Rain_t)
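As a small illustration of a first-order Markov chain, here is a minimal Python sketch with an assumed transition model for Rain (the 0.7/0.3 persistence probabilities are the familiar umbrella-world values, used here purely as an example):

import random

# First-order Markov chain for Rain_t.
# transition[prev] = P(Rain_t = True | Rain_{t-1} = prev); illustrative values.
transition = {True: 0.7, False: 0.3}
p_rain0 = 0.8                       # illustrative prior P(Rain_0 = True)

def predict(p_rain, steps):
    """Propagate P(Rain = True) forward 'steps' days, using only the previous day."""
    p = p_rain
    for _ in range(steps):
        p = p * transition[True] + (1 - p) * transition[False]
    return p

def sample(days):
    """Sample one weather trajectory of the given length."""
    rain = random.random() < p_rain0
    traj = [rain]
    for _ in range(days - 1):
        rain = random.random() < transition[rain]
        traj.append(rain)
    return traj

print(predict(p_rain0, 5))   # converges towards the stationary value 0.5
print(sample(10))

Note that both prediction and sampling only ever look at the previous day, which is exactly the first-order Markov assumption.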
Probabilistic Reasoning over Time Set-up
Markov processes as Bayesian networks
If we want to construct a Bayes net from these variables, then what are the parents?
Markov assumption: X_t depends on a bounded subset of X_{0:t−1}
First-order Markov process: P(X_t | X_{0:t−1}) = P(X_t | X_{t−1})
Second-order Markov process: P(X_t | X_{0:t−1}) = P(X_t | X_{t−2}, X_{t−1})
[Figure: chain-structured Bayesian networks over X_{t−2}, X_{t−1}, X_t, X_{t+1}, X_{t+2}; in the first-order network each X_t has parent X_{t−1}, in the second-order network it has parents X_{t−2} and X_{t−1}]
Probabilistic Reasoning over Time Set-up
Is a first-order Markov process suitable?
First-order Markov assumption not exactly true in real world!
Possible fixes:
  1 Increase the order of the Markov process
  2 Augment the state, e.g., add Temp_t, Pressure_t
State augmentation is enough!
Any k’th-order Markov process can be expressed as a first-order Markov process – focus on first-order processes from now on.
“Proof”:
  1 Assume for simplicity that the process contains only the variable X, and that we have a second-order Markov process.
  2 Create a new variable X′_t identical to X_{t−1}.
  3 Let X_{t+1} have both X_t and X′_t as parents.
  4 Do this for all t. The augmented model is a first-order Markov process (see the sketch after this list).
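A minimal Python sketch of this construction, assuming a second-order chain over a single Boolean variable X with an invented transition table (all names and numbers are illustrative):

import random

# Second-order transition model: P(X_t = True | X_{t-1}, X_{t-2}); illustrative numbers.
second_order = {(True, True): 0.9, (True, False): 0.6,
                (False, True): 0.4, (False, False): 0.1}

# State augmentation: the new state is the pair S_t = (X_t, X'_t) with X'_t = X_{t-1}.
# The augmented chain is first-order: S_{t+1} depends only on S_t.
def step(state):
    x_t, x_prev = state                       # S_t = (X_t, X_{t-1})
    p_true = second_order[(x_t, x_prev)]
    x_next = random.random() < p_true
    return (x_next, x_t)                      # S_{t+1} = (X_{t+1}, X_t)

state = (True, False)                         # some initial pair (X_1, X_0)
trajectory = [state]
for _ in range(10):
    state = step(state)
    trajectory.append(state)
print([x for x, _ in trajectory])             # the sampled X_t values

The augmented state S_t = (X_t, X′_t) carries the extra history explicitly, so the chain over S_t is first-order even though the chain over X_t alone was second-order.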
Probabilistic Reasoning over Time Basic speech recognition
Speech as probabilistic inference
How can we recognize speech?
Speech signals are noisy, variable, ambiguous
What is the most likely word sequence, given the speech signal?
Why not choose Words to maximize P(Words|signal)?? Use Bayes’ rule:
P(Words | signal) = α P(signal | Words) P(Words)
I.e., decomposes into acoustic model + language model
Need to be able to do the required calculations!!
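As an illustration of the decomposition, the following Python sketch scores two candidate transcriptions by combining acoustic-model and language-model probabilities via Bayes’ rule; the candidate sentences and all probabilities are invented for the example:

# Candidate transcriptions for one utterance, with illustrative model scores.
# acoustic[w] ~ P(signal | Words = w)   (how well the words explain the audio)
# language[w] ~ P(Words = w)            (how plausible the word sequence is a priori)
acoustic = {"recognize speech": 0.00020, "wreck a nice beach": 0.00025}
language = {"recognize speech": 0.00100, "wreck a nice beach": 0.00001}

def posterior(candidates):
    """P(Words | signal) ∝ P(signal | Words) P(Words), normalized over the candidates."""
    scores = {w: acoustic[w] * language[w] for w in candidates}
    norm = sum(scores.values())
    return {w: s / norm for w, s in scores.items()}

post = posterior(acoustic)
print(post)                       # the language model favours "recognize speech"
print(max(post, key=post.get))    # most likely word sequence

Even when the acoustic scores are close, the language model P(Words) can decide between acoustically similar hypotheses.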