Hidden Markov Models - courses.cs.washington.edu
TRANSCRIPT
Mausam
(Slides based on Dan Klein, Luke Zettlemoyer, Alex
Simma, Erik Sudderth, David Fernandez-Baca,
Drena Dobbs, Serafim Batzoglou, William Cohen,
Andrew McCallum, Dan Weld)
Hidden Markov Models (Chapter 15)
Temporal Models
• Graphical models with a temporal component
• St/Xt = set of unobservable variables at time t
• Wt/Yt = set of evidence variables at time t
• Notation Xa:b = Xa, Xa+1, …, Xb
Target Tracking
• Estimate motion of targets in 3D world from indirect, potentially noisy measurements
Radar-based tracking of multiple targets
Visual tracking of articulated objects (L. Sigal et al., 2006)
Financial Forecasting
• Predict future market behavior from historical data, news reports, expert opinions, …
http://www.steadfastinvestor.com/
Biological Sequence Analysis
• Temporal models can be adapted to exploit more general forms of sequential structure, like those arising in DNA sequences
(E. Birney, 2001)
Speech Recognition
• Given an audio waveform, we would like to robustly extract & recognize any spoken words
• Statistical models can be used to:
– Provide greater robustness to noise
– Adapt to accents of different speakers
– Learn from training data
(S. Roweis, 2004)
Markov Chain
• Set of states
– Initial probabilities
– Transition probabilities
A Markov Chain models system dynamics
Markov Chains: Graphical Models
[Figure: example Markov chain; edges carry transition probabilities 0.5, 0.3, 0.2, 0.1, 0.9, 0.6, 0.4, 0.0, 0.0]
Difference from a Markov Decision Process?
It is a system that transitions by itself; no agent chooses actions.
Hidden Markov Model
• Set of states
– Initial probabilities
– Transition probabilities
• Set of potential observations
– Emission/Observation probabilities
HMM generates observation sequence
o1 o2 o3 o4 o5
Hidden Markov Models (HMMs)
Finite state machine: the hidden state sequence generates the observation sequence o1 o2 o3 o4 o5 o6 o7 o8

Graphical Model:
Hidden states: … Xt-2 → Xt-1 → Xt → …
Observations: yt-2, yt-1, yt, …
Random variable Xt takes values from {s1, s2, s3, s4}
Random variable yt takes values from {o1, o2, o3, o4, o5, …}
HMM
Graphical Model:
Hidden states: … Xt-2 → Xt-1 → Xt → …
Observations: yt-2, yt-1, yt, …
Random variable xt takes values from {s1, s2, s3, s4}
Random variable yt takes values from {o1, o2, o3, o4, o5, …}
Need parameters:
Start state probabilities: P(x1 = sk)
Transition probabilities: P(xt = si | xt-1 = sk)
Observation probabilities: P(yt = oj | xt = sk)
Hidden Markov Models
• Just another graphical model…
“Conditioned on the present, the past & future are independent”
hidden states / observed vars
Transition Distribution / Observation Distribution
Hidden states
hidden states / observed process

• Given the current hidden state, earlier observations provide no additional information about the future.
HMM Generative Process
We can easily sample sequence pairs X0:n, Y0:n = S0:n, W0:n:

Sample initial state s0 from P(s0)
For i = 1 ... n:
Sample si from the distribution P(si | si-1)
Sample wi from the distribution P(wi | si)
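As a sketch, this generative process can be written out in a few lines of Python. The model here is a made-up two-state HMM (states A/B, observations x/y); the function name `sample_hmm` and all parameter values are illustrative, not from the slides.

```python
import random

def sample_hmm(init, trans, emit, n, rng=None):
    """Sample a (state, observation) pair of sequences of length n+1."""
    rng = rng or random.Random(0)

    def draw(dist):
        # draw one outcome from a {outcome: probability} dict
        r, acc = rng.random(), 0.0
        for outcome, p in dist.items():
            acc += p
            if r < acc:
                return outcome
        return outcome  # guard against floating-point rounding

    s = draw(init)                 # sample initial state s0 ~ P(s0)
    states, obs = [], []
    for _ in range(n + 1):
        states.append(s)
        obs.append(draw(emit[s]))  # sample w_i ~ P(w_i | s_i)
        s = draw(trans[s])         # sample s_{i+1} ~ P(s_{i+1} | s_i)
    return states, obs

# Illustrative two-state model (made-up numbers):
init = {"A": 0.6, "B": 0.4}
trans = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
emit = {"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.2, "y": 0.8}}
states, obs = sample_hmm(init, trans, emit, n=4)
print(states, obs)  # two aligned length-5 sequences
```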
Example: POS Tagging
• Useful as a pre-processing step
DT NN IN NN VBD NNS VBD
The average of interbank offered rates plummeted …
DT NNP NN VBD VBN RP NN NNS
The Georgia branch had taken on loan commitments …
Setup:
states S = {DT, NNP, NN, ... } are the POS tags
Observations W = V are words
Transition dist’n P(si|si-1) models the tag sequences
Observation dist’n P(wi|si) models words given their POS
Example: Chunking
• Find spans of text with certain properties
• For example: named entities with types
– (PER, ORG, or LOC)
• Germany ’s representative to the European Union’s veterinary committee Werner Zwingman said on Wednesday consumers should ...
• [Germany]LOC ’s representative to the [European Union]ORG ‘s veterinary committee [Werner Zwingman]PER said on Wednesday consumers should ...
Example: Chunking
• [Germany]LOC ’s representative to the [European Union]ORG ‘s veterinary committee [Werner Zwingman]PER said on Wednesday consumers should ...
• Germany/BL ’s/NA representative/NA to/NA the/NA European/BO Union/CO ‘s/NA veterinary/NA committee/NA Werner/BP Zwingman/CP said/NA on/NA Wednesday/NA consumers/NA should/NA ...
• HMM Model:
– States S = {NA, BL, CL, BO, CO, BP, CP} represent beginnings (BL, BO, BP) and continuations (CL, CO, CP) of chunks, plus other (NA)
– Observations W = V are words
– Transition dist’n P(si | si-1) models the tag sequences
– Observation dist’n P(wi | si) models words given their type
Example: The Occasionally Dishonest Casino
A casino has two dice:
• Fair die: P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6
• Loaded die: P(1) = P(2) = P(3) = P(4) = P(5) = 1/10; P(6) = 1/2
• Dealer switches between dice:
– Prob(Fair → Loaded) = 0.01
– Prob(Loaded → Fair) = 0.2
– Transitions between dice obey a Markov process
Game:
1. You bet $1
2. You roll (always with a fair die)
3. Casino player rolls (maybe with fair die, maybe with loaded die)
4. Highest number wins $2
An HMM for the occasionally dishonest casino
P(1|F) = 1/6
P(2|F) = 1/6
P(3|F) = 1/6
P(4|F) = 1/6
P(5|F) = 1/6
P(6|F) = 1/6
P(1|L) = 1/10
P(2|L) = 1/10
P(3|L) = 1/10
P(4|L) = 1/10
P(5|L) = 1/10
P(6|L) = 1/2
Question # 1 – Evaluation
GIVEN
A sequence of rolls by the casino player
124552646214614613613666166466163661636616361…
QUESTION
How likely is this sequence, given our model of how the casino works?
This is the EVALUATION problem in HMMs
Question # 2 – Decoding
GIVEN
A sequence of rolls by the casino player
1245526462146146136136661664661636616366163…
QUESTION
What portion of the sequence was generated with the fair die, and what portion with the loaded die?
This is the DECODING question in HMMs
Question # 3 – Learning
GIVEN
A sequence of rolls by the casino player
124552646214614613613666166466163661636616361651…
QUESTION
How “loaded” is the loaded die? How “fair” is the fair die? How often does the casino player change from fair to loaded, and back?
This is the LEARNING question in HMMs
HMM Inference
• Evaluation: prob. of observing an obs. sequence
– Forward Algorithm (very similar to Viterbi)
• Decoding: most likely sequence of hidden states
– Viterbi algorithm
• Marginal distribution: prob. of a particular state
– Forward-Backward
Decoding Problem
Given w = w1 … wn and an HMM θ, what is the “best” parse s1 … sn?
Several possible meanings of ‘solution’:
1. States which are individually most likely
2. Single best state sequence
We want the sequence s1 … sn such that P(s|w) is maximized:
s* = argmaxs P(s|w)
[Figure: state lattice with states 1, 2, …, K at each of the observation positions o1, o2, o3, …, oT]
Most Likely Sequence
• Problem: find the most likely (Viterbi) sequence under the model
Fed raises interest rates 0.5 percent .
NNP VBZ NN NNS CD NN .

Given model parameters, we can score any sequence pair:
P(s0:n, w0:n) = P(NNP|) P(Fed|NNP) P(VBZ|NNP) P(raises|VBZ) P(NN|NNP) …

Candidate tag sequences:
NNP VBZ NN NNS CD NN → logP = -23
NNP NNS NN NNS CD NN → logP = -29
NNP VBZ VB NNS CD NN → logP = -27

In principle, we’re done – list all possible tag sequences, score each one, pick the best one (the Viterbi state sequence).
But that takes 2n multiplications per sequence, and there are |S|^n state sequences!
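To make the blow-up concrete, here is a naïve-enumeration sketch on the dishonest-casino HMM from the earlier slides (the function names are mine): each candidate sequence costs about 2n multiplications, and there are |S|^n candidates to enumerate.

```python
from itertools import product

# Casino parameters from the earlier slides
init  = {"F": 0.5, "L": 0.5}
trans = {"F": {"F": 0.99, "L": 0.01}, "L": {"F": 0.2, "L": 0.8}}
emit  = {"F": {k: 1/6 for k in range(1, 7)},
         "L": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}}

def joint_prob(seq, obs):
    """P(s, w): roughly 2n multiplications for one candidate sequence."""
    p = init[seq[0]] * emit[seq[0]][obs[0]]
    for prev, cur, w in zip(seq, seq[1:], obs[1:]):
        p *= trans[prev][cur] * emit[cur][w]
    return p

def brute_force_decode(obs):
    """Enumerate all |S|^n state sequences -- exponential, only for tiny n."""
    return max(product("FL", repeat=len(obs)),
               key=lambda seq: joint_prob(seq, obs))

print(brute_force_decode([6, 2, 6]))  # ('L', 'L', 'L')
```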
The occasionally dishonest casino
• Known:
– The structure of the model
– The transition probabilities
• Hidden: what the casino did
– FFFFFLLLLLLLFFFF...
• Observable: the series of die tosses
– 3415256664666153...
• What we must infer:
– When was a fair die used?
– When was a loaded one used?
• The answer is a sequence:
FFFFFFFLLLLLLFFF...
The occasionally dishonest casino
w = w1, w2, w3 = 6, 2, 6

s(1) = FFF:
Pr(w, s(1)) = p(F|0) p(6|F) p(F|F) p(2|F) p(F|F) p(6|F)
= 0.5 × 1/6 × 0.99 × 1/6 × 0.99 × 1/6 ≈ 0.00227

s(2) = LLL:
Pr(w, s(2)) = p(L|0) p(6|L) p(L|L) p(2|L) p(L|L) p(6|L)
= 0.5 × 0.5 × 0.8 × 0.1 × 0.8 × 0.5 = 0.008

s(3) = LFL:
Pr(w, s(3)) = p(L|0) p(6|L) p(F|L) p(2|F) p(L|F) p(6|L)
= 0.5 × 0.5 × 0.2 × 1/6 × 0.01 × 0.5 ≈ 0.0000417
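These three joint probabilities can be checked directly with a quick arithmetic sketch:

```python
# Joint probabilities for w = 6, 2, 6 under the three candidate state sequences
p_fff = 0.5 * (1/6) * 0.99 * (1/6) * 0.99 * (1/6)   # s = FFF
p_lll = 0.5 * 0.5 * 0.8 * 0.1 * 0.8 * 0.5           # s = LLL
p_lfl = 0.5 * 0.5 * 0.2 * (1/6) * 0.01 * 0.5        # s = LFL

print(f"{p_fff:.5f} {p_lll:.3f} {p_lfl:.7f}")  # 0.00227 0.008 0.0000417
```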
Finding the Best Trajectory
• Too many trajectories (state sequences) to list
• Option 1: Beam Search
– A beam is a set of partial hypotheses
– Start with just the single empty trajectory
– At each derivation step:
• Consider all continuations of previous hypotheses
• Discard most, keep top k

[Figure: the empty hypothesis <> extends to Fed:N, Fed:V, Fed:J; surviving hypotheses extend to raises:N, raises:V, …]

Beam search works OK in practice … but sometimes you want the optimal answer
… and there’s usually a better option than naïve beams
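A minimal beam-search sketch over the casino HMM from the earlier slides (function and variable names are mine; a real tagger would score in log space):

```python
# Casino parameters from the earlier slides
init  = {"F": 0.5, "L": 0.5}
trans = {"F": {"F": 0.99, "L": 0.01}, "L": {"F": 0.2, "L": 0.8}}
emit  = {"F": {k: 1/6 for k in range(1, 7)},
         "L": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}}

def beam_search(obs, states="FL", k=2):
    """Keep only the k highest-probability partial trajectories at each step.

    Approximate: the optimal trajectory can fall off the beam when k is small.
    """
    # Each hypothesis: (probability so far, partial state sequence)
    beam = [(init[s] * emit[s][obs[0]], (s,)) for s in states]
    beam = sorted(beam, reverse=True)[:k]
    for w in obs[1:]:
        # consider all one-state continuations of surviving hypotheses ...
        cand = [(p * trans[seq[-1]][s] * emit[s][w], seq + (s,))
                for p, seq in beam for s in states]
        # ... then discard most, keeping the top k
        beam = sorted(cand, reverse=True)[:k]
    return beam[0][1]

print(beam_search([6, 2, 6]))  # ('L', 'L', 'L')
```

With |S| = 2 states and k = 2 the beam happens to find the optimum here; with larger state sets and small k it can prune away the best trajectory.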
The State Lattice / Trellis
[Figure: trellis with states {^, N, V, J, D, $} at each position of “START Fed raises interest rates END”]
Dynamic Programming
δi(s): probability of the most likely state sequence ending with state s, given observations w1, …, wi
δi(s) = p(wi | s) · maxs’ P(s | s’) · δi-1(s’)
The Viterbi Algorithm
[Figure: trellis over w1 w2 … wi-1 wi … wN with states 1 … K; cell (i, s) holds δi(s)]
δi(s) = maxs’ δi-1(s’) · Ptrans · Pobs
Remember: δi(s) = probability of most likely state seq ending with s at time i
Terminating Viterbi
[Figure: the final column of the trellis, over w1 w2 … wN, holds δN(s) for each state; choose the max]
Terminating Viterbi
Compute δ* = maxs’ δN-1(s’) · Ptrans · Pobs, the max over final states.
How did we compute δ*? Now backchain to find the final sequence.

Time: O(|S|^2 N); Space: O(|S| N), linear in the length of the sequence
Viterbi: Example
Observations: w1, w2, w3 = 6, 2, 6. Start in begin state B with δ0(B) = 1; δ0(F) = δ0(L) = 0.

δi(s) = p(wi | s) · maxs’ P(s | s’) · δi-1(s’)

δ1(F) = (1/6)(1/2) = 1/12
δ1(L) = (1/2)(1/2) = 1/4

δ2(F) = (1/6) · max{(1/12)(0.99), (1/4)(0.2)} = 0.01375
δ2(L) = (1/10) · max{(1/12)(0.01), (1/4)(0.8)} = 0.02

δ3(F) = (1/6) · max{(0.01375)(0.99), (0.02)(0.2)} = 0.00226875
δ3(L) = (1/2) · max{(0.01375)(0.01), (0.02)(0.8)} = 0.008
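The recurrence and backchaining can be sketched as follows, reusing the casino parameters (function names are mine); the δ values match those on this slide.

```python
# Casino parameters from the earlier slides
init  = {"F": 0.5, "L": 0.5}
trans = {"F": {"F": 0.99, "L": 0.01}, "L": {"F": 0.2, "L": 0.8}}
emit  = {"F": {k: 1/6 for k in range(1, 7)},
         "L": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}}

def viterbi(obs, states="FL"):
    """delta[i][s]: prob. of the best state sequence ending in s after obs[:i+1]."""
    delta = [{s: init[s] * emit[s][obs[0]] for s in states}]
    back = []                          # back[i][s]: best predecessor of s
    for w in obs[1:]:
        d, b = {}, {}
        for s in states:
            # max over predecessors s' of delta_{i-1}(s') * P(s | s')
            best = max(states, key=lambda sp: delta[-1][sp] * trans[sp][s])
            d[s] = emit[s][w] * delta[-1][best] * trans[best][s]
            b[s] = best
        delta.append(d)
        back.append(b)
    # backchain from the best final state to recover the sequence
    s = max(states, key=lambda x: delta[-1][x])
    path = [s]
    for b in reversed(back):
        s = b[s]
        path.append(s)
    return delta, path[::-1]

delta, path = viterbi([6, 2, 6])
print(path)                     # ['L', 'L', 'L']
print(round(delta[2]["F"], 8))  # 0.00226875
```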
Viterbi gets it right more often than not
Computing Marginals
• Problem: find the marginal distribution of each hidden state given the observations
In principle, we’re done – list all possible tag sequences,
score each one, sum up the values
Fed raises interest rates 0.5 percent .
NNP VBZ NN NNS CD NN .
P(s0:n, w0:n) = P(NNP|) P(Fed|NNP) P(VBZ|NNP) P(raises|VBZ) P(NN|NNP) …
Given model parameters, we can score any tag sequence
The State Lattice / Trellis
[Figure: trellis with states {^, N, V, J, D, $} at each position of “START Fed raises interest rates END”]
The Forward Backward Algorithm
P(si, w0:n) = P(si, w0:i, wi+1:n)
= P(si, w0:i) · P(wi+1:n | si, w0:i)
= P(si, w0:i) · P(wi+1:n | si)
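A sketch of the resulting forward-backward computation, with αi(s) = P(si = s, w0:i) from a forward pass and βi(s) = P(wi+1:n | si = s) from a backward pass (casino parameters reused; function names are mine):

```python
# Casino parameters from the earlier slides
init  = {"F": 0.5, "L": 0.5}
trans = {"F": {"F": 0.99, "L": 0.01}, "L": {"F": 0.2, "L": 0.8}}
emit  = {"F": {k: 1/6 for k in range(1, 7)},
         "L": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}}

def forward_backward(obs, states="FL"):
    """Marginals P(s_i | w_0:n) from alpha_i(s) = P(s_i=s, w_0:i)
    and beta_i(s) = P(w_i+1:n | s_i=s)."""
    # Forward pass
    alpha = [{s: init[s] * emit[s][obs[0]] for s in states}]
    for w in obs[1:]:
        alpha.append({s: emit[s][w] * sum(alpha[-1][sp] * trans[sp][s]
                                          for sp in states)
                      for s in states})
    # Backward pass
    beta = [{s: 1.0 for s in states}]
    for w in reversed(obs[1:]):
        beta.insert(0, {s: sum(trans[s][sn] * emit[sn][w] * beta[0][sn]
                               for sn in states)
                        for s in states})
    # P(w_0:n): this normalizer is also the answer to the evaluation problem
    z = sum(alpha[-1][s] for s in states)
    return [{s: alpha[i][s] * beta[i][s] / z for s in states}
            for i in range(len(obs))]

marginals = forward_backward([6, 2, 6])
print(all(abs(sum(m.values()) - 1.0) < 1e-9 for m in marginals))  # True
```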
The Forward Backward Algorithm
Sum over all paths, on both sides: the forward pass gives αi(s) = P(si = s, w0:i), and the backward pass gives βi(s) = P(wi+1:n | si = s).
HMM Learning
• Learning from data D
– Supervised
• D = {(s0:n,w0:n)i | i = 1 ... m}
– Unsupervised
• D = {(w0:n)i | i = 1 ... m}
• We won’t do this case!
• EM over the hidden variables
– Also called the Baum-Welch algorithm
Supervised Learning
– Given data D = {Xi | i = 1 ... m } where Xi=(s0:n,w0:n) is a state, observation sequence pair
– Define the parameters Θ to include:
• For every pair of states: P(s’ | s)
• For every state, obs. pair: P(w | s)
– Then the data likelihood is the product over the m sequence pairs of their joint probabilities under Θ
And the maximum likelihood solution is given by normalized counts
Final ML Estimates (as in BNs)
– c(s,s’) and c(s,w) are the empirical counts of transitions and observations in the data D
– The final, intuitive estimates:
P(s’ | s) = c(s, s’) / Σs’’ c(s, s’’)
P(w | s) = c(s, w) / Σw’ c(s, w’)
Just as with BNs, the counts can be zero: use smoothing techniques!
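A sketch of these count-and-normalize estimates on a tiny made-up tagged corpus (unsmoothed; all names and data are illustrative):

```python
from collections import Counter, defaultdict

def mle_estimate(tagged_seqs):
    """Supervised ML estimates from (state, word) sequences:
    P(s' | s) = c(s, s') / sum_s'' c(s, s'')
    P(w  | s) = c(s, w)  / sum_w' c(s, w')
    Zero counts stay zero -- a real system would smooth these.
    """
    trans_c = defaultdict(Counter)   # c(s, s')
    emit_c  = defaultdict(Counter)   # c(s, w)
    for seq in tagged_seqs:
        for s, w in seq:
            emit_c[s][w] += 1
        for (s, _), (s2, _) in zip(seq, seq[1:]):
            trans_c[s][s2] += 1
    trans = {s: {s2: c / sum(cs.values()) for s2, c in cs.items()}
             for s, cs in trans_c.items()}
    emit = {s: {w: c / sum(cs.values()) for w, c in cs.items()}
            for s, cs in emit_c.items()}
    return trans, emit

# Tiny hypothetical corpus of (POS tag, word) pairs:
data = [[("DT", "the"), ("NN", "dog"), ("VBD", "ran")],
        [("DT", "the"), ("NN", "cat"), ("VBD", "slept")]]
trans, emit = mle_estimate(data)
print(trans["DT"])   # {'NN': 1.0}
print(emit["NN"])    # {'dog': 0.5, 'cat': 0.5}
```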
The Problem with HMMs
• We want more than an Atomic View of Words
• We want many arbitrary, overlapping features of words
identity of word
ends in “-ly”, “-ed”, “-ing”
is capitalized
appears in a name database/Wordnet
…
[Figure: linear-chain graphical model with states … xt-1, xt, xt+1 … and observations … wt-1, wt, wt+1 …]
Use discriminative models instead of generative ones(e.g., Conditional Random Fields)
48
![Page 49: Hidden Markov Models - courses.cs.washington.eduSimma, Erik Sudderth, David Fernandez-Baca, Drena Dobbs, Serafim Batzoglou, William Cohen, Andrew McCallum, Dan Weld) Hidden Markov](https://reader034.vdocument.in/reader034/viewer/2022042018/5e75d370eb49c67250311763/html5/thumbnails/49.jpg)
Finite State Models
[Figure: Naïve Bayes → (sequence) → HMMs → (general graphs) → generative directed models; conditioning each gives Logistic Regression → (sequence) → Linear-chain CRFs → (general graphs) → General CRFs]
Temporal Models
• Full Bayesian Networks have dynamic versions too
– Dynamic Bayesian Networks (Chapter 15.5)
– HMM is a special case
• HMMs with continuous variables often useful for filtering (estimating current state)
– Kalman filters (Chapter 15.4)