Lecture 11: Dynamic Bayesian Networks and Hidden Markov Models (DM533 slides, dm533-lec11.pdf, 2009-12-08)

TRANSCRIPT
Lecture 11
Dynamic Bayesian Networks and Hidden Markov Models
Decision Trees

Marco Chiarandini
Department of Mathematics & Computer Science
University of Southern Denmark

Slides by Stuart Russell and Peter Norvig
Course Overview
✔ Introduction
  ✔ Artificial Intelligence
  ✔ Intelligent Agents
✔ Search
  ✔ Uninformed Search
  ✔ Heuristic Search
✔ Adversarial Search
  ✔ Minimax search
  ✔ Alpha-beta pruning
✔ Knowledge representation and Reasoning
  ✔ Propositional logic
  ✔ First order logic
  ✔ Inference
✔ Uncertain knowledge and Reasoning
  ✔ Probability and Bayesian approach
  ✔ Bayesian Networks
  Hidden Markov Chains
  Kalman Filters
Learning
  Decision Trees
  Maximum Likelihood
  EM Algorithm
  Learning Bayesian Networks
  Neural Networks
  Support vector machines
Performance of approximation algorithms

Absolute approximation: |P(X|e) − P̂(X|e)| ≤ ε

Relative approximation: |P(X|e) − P̂(X|e)| / P(X|e) ≤ ε

Relative ⇒ absolute, since 0 ≤ P ≤ 1 (but P may be O(2^−n))

Randomized algorithms may fail with probability at most δ

Polytime approximation: poly(n, ε^−1, log δ^−1)

Theorem (Dagum and Luby, 1993): both absolute and relative approximation, for either deterministic or randomized algorithms, are NP-hard for any ε, δ < 0.5.
(Absolute approximation is polytime with no evidence, via Chernoff bounds.)
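The gap between the two error notions is easy to see numerically; a minimal sketch with made-up values (p and p_hat are hypothetical posteriors, not taken from the slides):

```python
# Hypothetical true posterior p and estimate p_hat, chosen to show that a
# tiny absolute error can still be a 100% relative error when p is small.
# This is why relative approximation implies absolute, but not conversely.
p, p_hat = 1e-6, 2e-6
abs_err = abs(p - p_hat)        # 1e-06, well under any reasonable epsilon
rel_err = abs(p - p_hat) / p    # 1.0, i.e. a 100% relative error
print(abs_err, rel_err)
```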
Summary

Exact inference by variable elimination:
– polytime on polytrees, NP-hard on general graphs
– space = time, very sensitive to topology

Approximate inference by Likelihood Weighting (LW) and Markov Chain Monte Carlo (MCMC):
– PriorSampling and RejectionSampling unusable as evidence grows
– LW does poorly when there is lots of (late-in-the-order) evidence
– LW and MCMC generally insensitive to topology
– convergence can be very slow with probabilities close to 1 or 0
– can handle arbitrary combinations of discrete and continuous variables
Outline
1. Exercise
2. Uncertainty over Time
3. Speech Recognition
4. Learning
Wumpus World

[Figure: 4×4 wumpus-world grid; squares [1,1], [1,2], [2,1] are marked OK, and breezes (B) are observed in [1,2] and [2,1]]

Pij = true iff [i,j] contains a pit
Bij = true iff [i,j] is breezy
Include only B1,1, B1,2, B2,1 in the probability model
Specifying the probability model

The full joint distribution is P(P1,1, ..., P4,4, B1,1, B1,2, B2,1)

Apply the product rule: P(B1,1, B1,2, B2,1 | P1,1, ..., P4,4) P(P1,1, ..., P4,4)

(Do it this way to get P(Effect | Cause).)

First term: 1 if pits are adjacent to breezes, 0 otherwise
Second term: pits are placed randomly, probability 0.2 per square:

P(P1,1, ..., P4,4) = ∏_{i,j = 1,1}^{4,4} P(Pi,j) = 0.2^n × 0.8^(16−n)

for n pits.
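As a quick sanity check on the second term: because the pit variables are independent, the probabilities of all 2^16 configurations must sum to 1. A small sketch:

```python
from math import comb

# For a 4x4 board there are comb(16, n) configurations with exactly n pits,
# each with prior probability 0.2**n * 0.8**(16 - n); by the binomial
# theorem these sum to (0.2 + 0.8)**16 = 1.
total = sum(comb(16, n) * 0.2**n * 0.8**(16 - n) for n in range(17))
assert abs(total - 1.0) < 1e-12
print(total)
```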
Observations and query

We know the following facts:
b = ¬b1,1 ∧ b1,2 ∧ b2,1
known = ¬p1,1 ∧ ¬p1,2 ∧ ¬p2,1

Query is P(P1,3 | known, b)

Define Unknown = the Pij's other than P1,3 and Known

For inference by enumeration, we have

P(P1,3 | known, b) = α Σ_unknown P(P1,3, unknown, known, b)

Grows exponentially with the number of squares!
Using conditional independence

Basic insight: observations are conditionally independent of other hidden squares given neighbouring hidden squares

[Figure: 4×4 grid partitioned into Known, Fringe, Query (square [1,3]), and Other squares]

Define Unknown = Fringe ∪ Other
P(b | P1,3, Known, Unknown) = P(b | P1,3, Known, Fringe)
Manipulate the query into a form where we can use this!
Using conditional independence contd.

P(P1,3 | known, b)
  = α Σ_unknown P(P1,3, unknown, known, b)
  = α Σ_unknown P(b | P1,3, known, unknown) P(P1,3, known, unknown)
  = α Σ_fringe Σ_other P(b | known, P1,3, fringe, other) P(P1,3, known, fringe, other)
  = α Σ_fringe Σ_other P(b | known, P1,3, fringe) P(P1,3, known, fringe, other)
  = α Σ_fringe P(b | known, P1,3, fringe) Σ_other P(P1,3, known, fringe, other)
  = α Σ_fringe P(b | known, P1,3, fringe) Σ_other P(P1,3) P(known) P(fringe) P(other)
  = α P(known) P(P1,3) Σ_fringe P(b | known, P1,3, fringe) P(fringe) Σ_other P(other)
  = α′ P(P1,3) Σ_fringe P(b | known, P1,3, fringe) P(fringe)
Using conditional independence contd.

[Figure: the three fringe models consistent with P1,3 = true, with probabilities 0.2 × 0.2 = 0.04, 0.2 × 0.8 = 0.16, and 0.8 × 0.2 = 0.16, and the two fringe models consistent with P1,3 = false, with probabilities 0.2 × 0.2 = 0.04 and 0.2 × 0.8 = 0.16]

P(P1,3 | known, b) = α′ ⟨0.2 (0.04 + 0.16 + 0.16), 0.8 (0.04 + 0.16)⟩ ≈ ⟨0.31, 0.69⟩

P(P2,2 | known, b) ≈ ⟨0.86, 0.14⟩
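The ⟨0.31, 0.69⟩ result can be reproduced by brute-force enumeration over the three relevant unknown squares; a sketch under the slides' conventions (squares indexed [column, row], fringe = {[2,2], [3,1]}; the function and variable names are mine):

```python
from itertools import product

PIT = 0.2  # prior probability of a pit in any square

def consistent(p13, p22, p31):
    """Do these pit hypotheses agree with b = not b1,1 and b1,2 and b2,1,
    given that [1,1], [1,2], [2,1] are known to be pit-free?"""
    b11 = False        # neighbours [1,2] and [2,1] are known pit-free
    b12 = p22 or p13   # unknown neighbours of [1,2]
    b21 = p22 or p31   # unknown neighbours of [2,1]
    return (not b11) and b12 and b21

posterior = {True: 0.0, False: 0.0}
for p13, p22, p31 in product([True, False], repeat=3):
    if consistent(p13, p22, p31):
        weight = 1.0
        for pit in (p13, p22, p31):
            weight *= PIT if pit else 1 - PIT
        posterior[p13] += weight

z = posterior[True] + posterior[False]
print(round(posterior[True] / z, 2), round(posterior[False] / z, 2))  # 0.31 0.69
```

The consistent fringe weights match the figure: 0.36 when P1,3 is true, 0.2 when it is false, giving 0.2 × 0.36 versus 0.8 × 0.2 before normalisation.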
Outline
1. Exercise
2. Uncertainty over Time
3. Speech Recognition
4. Learning
Outline

♦ Time and uncertainty
♦ Inference: filtering, prediction, smoothing
♦ Hidden Markov models
♦ Kalman filters (a brief mention)
♦ Dynamic Bayesian networks (an even briefer mention)
Time and uncertainty

The world changes; we need to track and predict it

Diabetes management vs vehicle diagnosis

Basic idea: copy state and evidence variables for each time step
Xt = set of unobservable state variables at time t
  e.g., BloodSugar_t, StomachContents_t, etc.
Et = set of observable evidence variables at time t
  e.g., MeasuredBloodSugar_t, PulseRate_t, FoodEaten_t

This assumes discrete time; the step size depends on the problem

Notation: Xa:b = Xa, Xa+1, ..., Xb−1, Xb
Markov processes (Markov chains)

Construct a Bayes net from these variables. Problems:
– unbounded number of conditional probability tables
– unbounded number of parents

Markov assumption: Xt depends on a bounded subset of X0:t−1
First-order Markov process: P(Xt | X0:t−1) = P(Xt | Xt−1)
Second-order Markov process: P(Xt | X0:t−1) = P(Xt | Xt−2, Xt−1)

[Figure: first-order chain Xt−2 → Xt−1 → Xt → Xt+1 → Xt+2, and second-order chain with additional arcs skipping one step]

Sensor Markov assumption: P(Et | X0:t, E0:t−1) = P(Et | Xt)

Stationary process: transition model P(Xt | Xt−1) and sensor model P(Et | Xt) fixed for all t
Example

[Figure: umbrella DBN: Rain_t−1 → Rain_t → Rain_t+1, with each Rain_t generating an observed Umbrella_t]

Transition model P(Rt = t | Rt−1):    Sensor model P(Ut = t | Rt):
  Rt−1 = t : 0.7                        Rt = t : 0.9
  Rt−1 = f : 0.3                        Rt = f : 0.2

First-order Markov assumption not exactly true in the real world!
Possible fixes:
1. Increase the order of the Markov process
2. Augment the state, e.g., add Temp_t, Pressure_t

Example: robot motion.
Augment position and velocity with Battery_t
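Because the process is stationary, the two tables above are all that is needed to generate behaviour at every time step; a minimal sampling sketch (function and variable names are mine):

```python
import random

# Stationary models from the tables above: the same two distributions
# are reused at every time step t.
P_RAIN_GIVEN_PREV = {True: 0.7, False: 0.3}      # P(Rain_t = true | Rain_t-1)
P_UMBRELLA_GIVEN_RAIN = {True: 0.9, False: 0.2}  # P(Umbrella_t = true | Rain_t)

def sample_chain(steps, rain0=True, seed=0):
    """Sample a (rain, umbrella) trajectory from the umbrella DBN."""
    rng = random.Random(seed)
    rain, trajectory = rain0, []
    for _ in range(steps):
        rain = rng.random() < P_RAIN_GIVEN_PREV[rain]
        umbrella = rng.random() < P_UMBRELLA_GIVEN_RAIN[rain]
        trajectory.append((rain, umbrella))
    return trajectory

print(sample_chain(5))
```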
Inference tasks

1. Filtering: P(Xt | e1:t)
   the belief state, input to the decision process of a rational agent
2. Prediction: P(Xt+k | e1:t) for k > 0
   evaluation of possible action sequences; like filtering without the evidence
3. Smoothing: P(Xk | e1:t) for 0 ≤ k < t
   better estimate of past states, essential for learning
4. Most likely explanation: argmax_{x1:t} P(x1:t | e1:t)
   speech recognition, decoding with a noisy channel
Filtering

Aim: devise a recursive state estimation algorithm:

  P(Xt+1 | e1:t+1) = f(et+1, P(Xt | e1:t))

  P(Xt+1 | e1:t+1) = P(Xt+1 | e1:t, et+1)
    = α P(et+1 | Xt+1, e1:t) P(Xt+1 | e1:t)
    = α P(et+1 | Xt+1) P(Xt+1 | e1:t)

I.e., prediction + estimation. Prediction by summing out Xt:

  P(Xt+1 | e1:t+1) = α P(et+1 | Xt+1) Σxt P(Xt+1 | xt, e1:t) P(xt | e1:t)
    = α P(et+1 | Xt+1) Σxt P(Xt+1 | xt) P(xt | e1:t)

f1:t+1 = FORWARD(f1:t, et+1), where f1:t = P(Xt | e1:t)
Time and space per update are constant (independent of t) by keeping track of f
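The recursive update above can be sketched in Python for the umbrella world (the dictionary representation is my own convention; the numbers reproduce the filtering example on the next slide):

```python
def normalize(d):
    """Scale a distribution so its values sum to 1 (the alpha in the slides)."""
    z = sum(d.values())
    return {k: v / z for k, v in d.items()}

def forward(f, evidence, transition, sensor):
    """One step: f_{1:t+1} = alpha * O_{t+1} * sum_xt P(X_{t+1}|x_t) f_{1:t}(x_t)."""
    states = f.keys()
    # Prediction: sum out the previous state x_t.
    predicted = {x1: sum(transition[x0][x1] * f[x0] for x0 in states)
                 for x1 in states}
    # Update: weight by the sensor model for the new evidence, then normalize.
    return normalize({x1: sensor[x1][evidence] * predicted[x1]
                      for x1 in states})

# Umbrella world: P(Rain_t | Rain_{t-1}) and P(Umbrella_t | Rain_t).
T = {True: {True: 0.7, False: 0.3}, False: {True: 0.3, False: 0.7}}
O = {True: {True: 0.9, False: 0.1}, False: {True: 0.2, False: 0.8}}

f = {True: 0.5, False: 0.5}          # prior P(Rain_0)
f = forward(f, True, T, O)           # umbrella observed on day 1
print(round(f[True], 3))             # 0.818
f = forward(f, True, T, O)           # umbrella observed on day 2
print(round(f[True], 3))             # 0.883
```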
Filtering example

(Figure: umbrella network with transition model P(Rt = t | Rt−1 = t) = 0.7, P(Rt = t | Rt−1 = f) = 0.3 and sensor model P(Ut = t | Rt = t) = 0.9, P(Ut = t | Rt = f) = 0.2, unrolled for days 0–2.)

With prior P(R0) = ⟨0.500, 0.500⟩ and the umbrella observed on days 1 and 2 (values for ⟨true, false⟩):

  P(R1 | u1) = ⟨0.818, 0.182⟩
  prediction P(R2 | u1) = ⟨0.627, 0.373⟩
  P(R2 | u1:2) = ⟨0.883, 0.117⟩
Smoothing

(Figure: computing P(Xk | e1:t) for a past state Xk in the chain X0, X1, ..., Xk, ..., Xt with evidence E1, ..., Ek, ..., Et.)

Divide evidence e1:t into e1:k, ek+1:t:

  P(Xk | e1:t) = P(Xk | e1:k, ek+1:t)
    = α P(Xk | e1:k) P(ek+1:t | Xk, e1:k)
    = α P(Xk | e1:k) P(ek+1:t | Xk)
    = α f1:k bk+1:t

Backward message computed by a backwards recursion:

  P(ek+1:t | Xk) = Σxk+1 P(ek+1:t | Xk, xk+1) P(xk+1 | Xk)
    = Σxk+1 P(ek+1:t | xk+1) P(xk+1 | Xk)
    = Σxk+1 P(ek+1 | xk+1) P(ek+2:t | xk+1) P(xk+1 | Xk)
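A minimal sketch of the backward recursion and the f × b combination, again for the umbrella world (the dictionary encoding is my own; the numbers reproduce the smoothing example on the next slide):

```python
def backward(b, evidence, transition, sensor):
    """One step: b(x_k) = sum_{x_{k+1}} P(e_{k+1}|x_{k+1}) b(x_{k+1}) P(x_{k+1}|x_k)."""
    states = b.keys()
    return {x0: sum(sensor[x1][evidence] * b[x1] * transition[x0][x1]
                    for x1 in states)
            for x0 in states}

T = {True: {True: 0.7, False: 0.3}, False: {True: 0.3, False: 0.7}}
O = {True: {True: 0.9, False: 0.1}, False: {True: 0.2, False: 0.8}}

b = {True: 1.0, False: 1.0}          # b_{t+1:t} is all ones
b = backward(b, True, T, O)          # fold in evidence e_2 = umbrella
print(round(b[True], 2), round(b[False], 2))   # 0.69 0.41

# Smoothed P(Rain_1 | u_1, u_2) = alpha * f_{1:1} * b_{2:2}
f = {True: 0.818, False: 0.182}      # filtered estimate for day 1
s = {k: f[k] * b[k] for k in f}
z = sum(s.values())
print(round(s[True] / z, 3))         # 0.883
```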
Smoothing example

Forward messages (from filtering, values for ⟨true, false⟩):
  f1:1 = ⟨0.818, 0.182⟩, f1:2 = ⟨0.883, 0.117⟩
Backward messages:
  b2:2 = ⟨0.690, 0.410⟩, b3:2 = ⟨1.000, 1.000⟩
Smoothed estimate for day 1:
  P(R1 | u1:2) = α f1:1 × b2:2 = ⟨0.883, 0.117⟩
If we want to smooth the whole sequence:
Forward-backward algorithm: cache the forward messages along the way.
Time linear in t (polytree inference), space O(t |f|)
Most likely explanation

Most likely sequence ≠ sequence of most likely states (it is a property of the joint distribution)!

Most likely path to each xt+1 = most likely path to some xt plus one more step:

  max_{x1...xt} P(x1, ..., xt, Xt+1 | e1:t+1)
    = P(et+1 | Xt+1) max_{xt} ( P(Xt+1 | xt) max_{x1...xt−1} P(x1, ..., xt−1, xt | e1:t) )

Identical to filtering, except f1:t is replaced by

  m1:t = max_{x1...xt−1} P(x1, ..., xt−1, Xt | e1:t),

i.e., m1:t(i) gives the probability of the most likely path to state i.

Update has the sum replaced by max, giving the Viterbi algorithm:

  m1:t+1 = P(et+1 | Xt+1) max_{xt} ( P(Xt+1 | xt) m1:t )
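A sketch of the Viterbi update with backpointers for path recovery (the representation is my own; the initial message is normalized so the numbers match the Viterbi example on the next slide):

```python
def viterbi(observations, prior, transition, sensor):
    """Most likely state sequence: max-product messages m plus backpointers."""
    states = list(prior)
    # m_{1:1} = alpha * P(e_1 | x) * P(X_1 = x), normalized like f_{1:1}.
    m = {x: sensor[x][observations[0]] * prior[x] for x in states}
    z = sum(m.values())
    m = {x: v / z for x, v in m.items()}
    back = []
    for e in observations[1:]:
        prev_m = m
        # For each next state, remember which previous state maximizes the path.
        choice = {x1: max(states, key=lambda x0: transition[x0][x1] * prev_m[x0])
                  for x1 in states}
        m = {x1: sensor[x1][e] * transition[choice[x1]][x1] * prev_m[choice[x1]]
             for x1 in states}
        back.append(choice)
    # Recover the path by following backpointers from the best final state.
    best = max(states, key=lambda x: m[x])
    path = [best]
    for choice in reversed(back):
        path.append(choice[path[-1]])
    return list(reversed(path)), m

T = {True: {True: 0.7, False: 0.3}, False: {True: 0.3, False: 0.7}}
O = {True: {True: 0.9, False: 0.1}, False: {True: 0.2, False: 0.8}}
prior = {True: 0.5, False: 0.5}      # predicted P(Rain_1) from P(Rain_0)
umbrella = [True, True, False, True, True]
path, m = viterbi(umbrella, prior, T, O)
print(path)                          # [True, True, False, True, True]
print(round(m[True], 4))             # 0.021
```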
Viterbi example

Observation sequence: umbrella = true, true, false, true, true.
Messages m1:1 ... m1:5 (values for Rain = ⟨true, false⟩):

  m1:1 = ⟨.8182, .1818⟩
  m1:2 = ⟨.5155, .0491⟩
  m1:3 = ⟨.0361, .1237⟩
  m1:4 = ⟨.0334, .0173⟩
  m1:5 = ⟨.0210, .0024⟩

(Figure: state-space paths through Rain1 ... Rain5, with bold arrows marking the most likely path leading to each state.)
Hidden Markov models

Xt is a single, discrete variable (usually Et is too). Domain of Xt is {1, ..., S}.

Transition matrix Tij = P(Xt = j | Xt−1 = i), e.g.,

  ( 0.7  0.3 )
  ( 0.3  0.7 )

Sensor matrix Ot for each time step, diagonal elements P(et | Xt = i); e.g., with U1 = true,

  O1 = ( 0.9  0.0 )
       ( 0.0  0.2 )

Forward and backward messages as column vectors:

  f1:t+1 = α Ot+1 T⊤ f1:t
  bk+1:t = T Ok+1 bk+2:t

Forward-backward algorithm needs time O(S²t) and space O(St)
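In matrix form the forward update is a pair of matrix-vector products. A sketch using plain Python lists rather than a linear-algebra library (the state ordering rain/not-rain is my choice):

```python
# Matrix form of the umbrella HMM (states ordered: rain, not-rain).
T = [[0.7, 0.3],
     [0.3, 0.7]]                     # T[i][j] = P(X_t = j | X_{t-1} = i)

def O_matrix(umbrella: bool):
    """Diagonal sensor matrix O_t for the observed evidence."""
    p = [0.9, 0.2] if umbrella else [0.1, 0.8]
    return [[p[0], 0.0], [0.0, p[1]]]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def transpose(M):
    return [list(row) for row in zip(*M)]

def forward_step(f, umbrella):
    """f_{1:t+1} = alpha * O_{t+1} * T^T * f_{1:t}"""
    g = matvec(O_matrix(umbrella), matvec(transpose(T), f))
    z = sum(g)
    return [x / z for x in g]

f = [0.5, 0.5]                       # prior P(Rain_0)
for u in [True, True]:               # umbrella on days 1 and 2
    f = forward_step(f, u)
print([round(x, 3) for x in f])      # [0.883, 0.117]
```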
Kalman filters

Modelling systems described by a set of continuous variables, e.g., tracking a bird flying: Xt = X, Y, Z, Ẋ, Ẏ, Ż. Also airplanes, robots, ecosystems, economies, chemical plants, planets, ...

(Figure: network with position Xt, velocity Ẋt, and observation Zt, replicated over time.)

Gaussian prior, linear Gaussian transition model and sensor model
Updating Gaussian distributions

Prediction step: if P(Xt | e1:t) is Gaussian, then the prediction

  P(Xt+1 | e1:t) = ∫ P(Xt+1 | xt) P(xt | e1:t) dxt

is Gaussian. If P(Xt+1 | e1:t) is Gaussian, then the updated distribution

  P(Xt+1 | e1:t+1) = α P(et+1 | Xt+1) P(Xt+1 | e1:t)

is Gaussian.

Hence P(Xt | e1:t) is multivariate Gaussian N(µt, Σt) for all t.

General (nonlinear, non-Gaussian) process: the description of the posterior grows unboundedly as t → ∞
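A minimal 1-D instance of these Gaussian updates: a random-walk transition with noise variance var_x and a direct observation with noise variance var_z (the specific numbers are made up for illustration):

```python
def kalman_1d(mu, var, z, var_x, var_z):
    """One predict+update step for a 1-D random-walk Kalman filter.
    Transition: X_{t+1} = X_t + N(0, var_x); sensor: Z_t = X_t + N(0, var_z)."""
    # Prediction: mean unchanged, variance grows by the transition noise.
    var_pred = var + var_x
    # Update: precision-weighted average of prediction and observation.
    k = var_pred / (var_pred + var_z)          # Kalman gain
    mu_new = mu + k * (z - mu)
    var_new = (1 - k) * var_pred
    return mu_new, var_new                     # still a Gaussian

mu, var = 0.0, 1.0                   # Gaussian prior N(0, 1)
mu, var = kalman_1d(mu, var, z=2.5, var_x=1.0, var_z=1.0)
print(round(mu, 2), round(var, 2))   # 1.67 0.67
```

Note that the posterior variance (0.67) is smaller than both the predicted variance (2.0) and the sensor variance (1.0): the Gaussian family is closed under both steps, which is why the filter state never grows.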
2-D tracking example: filtering

(Figure: X–Y plot of a 2-D tracking run, showing the true, observed, and filtered trajectories.)
2-D tracking example: smoothing

(Figure: X–Y plot of the same 2-D tracking run, showing the true, observed, and smoothed trajectories.)
Where it breaks

Cannot be applied if the transition model is nonlinear.
The Extended Kalman Filter models the transition as locally linear around xt = µt.
Fails if the system is locally unsmooth.
Dynamic Bayesian networks

Xt, Et contain arbitrarily many variables in a replicated Bayes net.

(Figure: umbrella DBN with prior P(R0) = 0.7, transition CPT P(R1 | R0) = 0.7 / 0.3, and sensor CPT P(U1 | R1) = 0.9 / 0.2; robot DBN with Battery0 → Battery1, X0 → X1, and observations BMeter1, Z1.)
DBNs vs. HMMs

Every HMM is a single-variable DBN; every discrete DBN is an HMM.

(Figure: DBN with state variables Xt, Yt, Zt and their couplings over one time step.)

Sparse dependencies ⇒ exponentially fewer parameters; e.g., with 20 Boolean state variables, three parents each, the DBN has 20 × 2³ = 160 parameters, the HMM has 2²⁰ × 2²⁰ ≈ 10¹²
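The parameter counts above follow directly from the state encoding; a two-line check:

```python
# Parameter-count comparison from the slide: 20 Boolean state variables,
# three parents each.
n_vars, parents = 20, 3
dbn_params = n_vars * 2 ** parents           # one CPT row per parent assignment
hmm_states = 2 ** n_vars                     # joint state space of the HMM
hmm_params = hmm_states * hmm_states         # full transition matrix
print(dbn_params)                            # 160
print(hmm_params)                            # 1099511627776, about 10^12
```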
DBNs vs Kalman filters

Every Kalman filter model is a DBN, but few DBNs are KFs; the real world requires non-Gaussian posteriors
Summary
Temporal models use state and sensor variables replicated over time
Markov assumptions and stationarity assumption, so we need
– transition model P(Xt | Xt−1)
– sensor model P(Et | Xt)

Tasks are filtering, prediction, smoothing, most likely sequence; all done recursively with constant cost per time step

Hidden Markov models have a single discrete state variable; used for speech recognition

Kalman filters allow n state variables, linear Gaussian, O(n³) update

Dynamic Bayes nets subsume HMMs, Kalman filters; exact update intractable
Outline
1. Exercise
2. Uncertainty over Time
3. Speech Recognition
4. Learning
Outline

♦ Speech as probabilistic inference
♦ Speech sounds
♦ Word pronunciation
♦ Word sequences
Speech as probabilistic inference

Speech signals are noisy, variable, ambiguous.

What is the most likely word sequence, given the speech signal? I.e., choose Words to maximize P(Words | signal).

Use Bayes' rule:

  P(Words | signal) = α P(signal | Words) P(Words)

I.e., the problem decomposes into an acoustic model + a language model.

Words are the hidden state sequence, the signal is the observation sequence
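Since α does not depend on Words, the argmax can be taken over the unnormalized product of the two models. A toy rescoring sketch (the candidate strings and all probabilities are made up for illustration):

```python
# Rescoring with the decomposition P(Words|signal) = alpha * P(signal|Words) * P(Words).
# The acoustically better hypothesis loses because its language-model prior is tiny.
candidates = {
    "recognize speech": {"acoustic": 0.30, "language": 0.010},
    "wreck a nice beach": {"acoustic": 0.35, "language": 0.0001},
}

def score(c):
    m = candidates[c]
    return m["acoustic"] * m["language"]     # alpha cancels in the argmax

best = max(candidates, key=score)
print(best)                                  # recognize speech
```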
Phones

All human speech is composed from 40-50 phones, determined by the configuration of articulators (lips, teeth, tongue, vocal cords, air flow). Phones form an intermediate level of hidden states between words and signal
⇒ acoustic model = pronunciation model + phone model

ARPAbet designed for American English:

  [iy] beat     [b]  bet    [p]  pet
  [ih] bit      [ch] Chet   [r]  rat
  [ey] bet      [d]  debt   [s]  set
  [ao] bought   [hh] hat    [th] thick
  [ow] boat     [hv] high   [dh] that
  [er] Bert     [l]  let    [w]  wet
  [ix] roses    [ng] sing   [en] button
  ...           ...         ...

E.g., “ceiling” is [s iy l ih ng] / [s iy l ix ng] / [s iy l en]
Word pronunciation models

Each word is described as a distribution over phone sequences. The distribution is represented as an HMM transition model.

(Figure: pronunciation HMM for “tomato”: from [t] the model branches to [ow] with probability 0.2 or [ah] with probability 0.8, then [m], then branches to [ey] or [aa] with probability 0.5 each, then [t], [ow]; all remaining arcs have probability 1.0.)

  P([towmeytow] | “tomato”) = P([towmaatow] | “tomato”) = 0.1
  P([tahmeytow] | “tomato”) = P([tahmaatow] | “tomato”) = 0.4

Structure is created manually, transition probabilities learned from data
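Each pronunciation probability is just the product of branch probabilities along the path. A sketch (the 0.2/0.8 and 0.5/0.5 splits are the values implied by the four probabilities on this slide):

```python
# Path probabilities in the "tomato" pronunciation HMM.
first = {"ow": 0.2, "ah": 0.8}       # [t] -> first vowel
second = {"ey": 0.5, "aa": 0.5}      # [m] -> second vowel

def p_pron(v1, v2):
    """P([t v1 m v2 t ow] | "tomato"); all other arcs have probability 1."""
    return first[v1] * second[v2]

print(p_pron("ow", "ey"))            # 0.1, i.e. P([towmeytow] | "tomato")
print(p_pron("ah", "aa"))            # 0.4, i.e. P([tahmaatow] | "tomato")
```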
Isolated words

Phone models + word models fix the likelihood P(e1:t | word) for an isolated word:

  P(word | e1:t) = α P(e1:t | word) P(word)

Prior probability P(word) obtained simply by counting word frequencies.
P(e1:t | word) can be computed recursively: define

  ℓ1:t = P(Xt, e1:t)

and use the recursive update

  ℓ1:t+1 = FORWARD(ℓ1:t, et+1)

and then P(e1:t | word) = Σxt ℓ1:t(xt)

Isolated-word dictation systems with training reach 95–99% accuracy
Continuous speech

Not just a sequence of isolated-word recognition problems!
– Adjacent words are highly correlated
– Sequence of most likely words ≠ most likely sequence of words
– Segmentation: there are few gaps in speech
– Cross-word coarticulation, e.g., “next thing”
Continuous speech systems manage 60–80% accuracy on a good day
Language model

Prior probability of a word sequence is given by the chain rule:

  P(w1 ⋯ wn) = ∏_{i=1}^{n} P(wi | w1 ⋯ wi−1)

Bigram model:

  P(wi | w1 ⋯ wi−1) ≈ P(wi | wi−1)

Train by counting all word pairs in a large text corpus.
More sophisticated models (trigrams, grammars, etc.) help a little bit
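The counting step can be sketched directly (the tiny corpus is made up; a real system would also need smoothing for unseen pairs):

```python
from collections import Counter

def bigram_model(corpus):
    """Estimate P(w_i | w_{i-1}) by counting adjacent word pairs."""
    words = corpus.split()
    pair_counts = Counter(zip(words, words[1:]))
    # Count each word's occurrences as the *first* element of a pair.
    unigram_counts = Counter(words[:-1])
    return {(w0, w1): c / unigram_counts[w0]
            for (w0, w1), c in pair_counts.items()}

# Tiny made-up corpus, just to show the counting.
model = bigram_model("the cat sat on the mat the cat ran")
print(round(model[("the", "cat")], 3))       # 0.667  ("the" is followed by "cat" 2 of 3 times)
print(model[("cat", "sat")])                 # 0.5
```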
Combined HMM

States of the combined language+word+phone model are labelled by the word we're in + the phone in that word + the phone state in that phone

Viterbi algorithm finds the most likely phone state sequence

Does segmentation by considering all possible word sequences and boundaries

Doesn't always give the most likely word sequence, because each word sequence is the sum over many state sequences

Jelinek invented (in 1969) what is essentially A∗ search as a way to find the most likely word sequence, where the “step cost” is − log P(wi | wi−1)
Outline
1. Exercise
2. Uncertainty over Time
3. Speech Recognition
4. Learning
Outline

♦ Learning agents
♦ Inductive learning
♦ Decision tree learning
♦ Measuring learning performance
Learning

Back to Turing's article:
– child mind program
– education
– reward & punishment

Learning is essential for unknown environments, i.e., when the designer lacks omniscience.

Learning is useful as a system construction method, i.e., expose the agent to reality rather than trying to write it down.

Learning modifies the agent's decision mechanisms to improve performance.
Learning agents

(Figure: learning agent architecture. Within the agent, the performance element selects actions from percepts; the critic, applying a fixed performance standard, provides feedback to the learning element; the learning element makes changes to the performance element and sets learning goals; the problem generator suggests exploratory experiments. Sensors and effectors connect the agent to the environment.)
Learning element

Design of the learning element is dictated by
♦ what type of performance element is used
♦ which functional component is to be learned
♦ how that functional component is represented
♦ what kind of feedback is available

Example scenarios:

  Performance element   Component           Representation             Feedback
  Alpha-beta search     Eval. fn.           Weighted linear function   Win/loss
  Logical agent         Transition model    Successor-state axioms     Outcome
  Utility-based agent   Transition model    Dynamic Bayes net          Outcome
  Simple reflex agent   Percept-action fn   Neural net                 Correct action

Supervised learning: correct answers for each instance
Reinforcement learning: occasional rewards