Hidden Markov Models (HMMs)
![Page 1: Hidden Markov Models (HMMs)](https://reader035.vdocument.in/reader035/viewer/2022062222/568149ee550346895db7203a/html5/thumbnails/1.jpg)
1
Hidden Markov Models (HMMs)
• Probabilistic Automata
• Ubiquitous in Speech/Speaker Recognition/Verification
• Suitable for modelling phenomena which are dynamic in nature
• Can be used for handwriting, keystroke biometrics
![Page 2: Hidden Markov Models (HMMs)](https://reader035.vdocument.in/reader035/viewer/2022062222/568149ee550346895db7203a/html5/thumbnails/2.jpg)
2
Classification with Static Features
• Simpler than dynamic problem
• Can use, for example, MLPs
• E.g. In two dimensional space:
[Figure: scatter plot of two classes, marked x and o, forming separate clusters in a two-dimensional feature space]
![Page 3: Hidden Markov Models (HMMs)](https://reader035.vdocument.in/reader035/viewer/2022062222/568149ee550346895db7203a/html5/thumbnails/3.jpg)
3
Hidden Markov Models (HMMs)
• First: Visible Markov Models (VMMs)
– Formal Definition
– Recognition
– Training
• HMMs
– Formal Definition
– Recognition
– Training
• Trellis Algorithms
– Forward-Backward
– Viterbi
![Page 4: Hidden Markov Models (HMMs)](https://reader035.vdocument.in/reader035/viewer/2022062222/568149ee550346895db7203a/html5/thumbnails/4.jpg)
4
Visible Markov Models
• Probabilistic Automaton
• N distinct states S = {s1, …, sN}
• M-element output alphabet K = {k1, …, kM}
• Initial state probabilities Π = {πi}, i ∈ S
• State transition at t = 1, 2, …
• State trans. probabilities A = {aij}, i, j ∈ S
• State sequence X = {X1, …, XT}, Xt ∈ S
• Output seq. O = {o1, …, oT}, ot ∈ K
![Page 5: Hidden Markov Models (HMMs)](https://reader035.vdocument.in/reader035/viewer/2022062222/568149ee550346895db7203a/html5/thumbnails/5.jpg)
5
VMM: Weather Example
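[Figure: state-transition diagram of a three-state weather model (states 1, 2, 3); the only surviving probability labels appear to be 0.2, 0.5 and 0.3]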
![Page 6: Hidden Markov Models (HMMs)](https://reader035.vdocument.in/reader035/viewer/2022062222/568149ee550346895db7203a/html5/thumbnails/6.jpg)
6
Generative VMM
• We choose the state sequence probabilistically…
• We could try this using:
– the numbers 1-10
– drawing from a hat
– an ad-hoc assignment scheme
![Page 7: Hidden Markov Models (HMMs)](https://reader035.vdocument.in/reader035/viewer/2022062222/568149ee550346895db7203a/html5/thumbnails/7.jpg)
7
2 Questions
• Training problem
– Given an observation sequence O and a “space” of possible models which spans possible values for model parameters w = {A, Π}, how do we find the model that best explains the observed data?
• Recognition (decoding) problem
– Given a model wi = {A, Π}, how do we compute how likely a certain observation is, i.e. P(O | wi)?
![Page 8: Hidden Markov Models (HMMs)](https://reader035.vdocument.in/reader035/viewer/2022062222/568149ee550346895db7203a/html5/thumbnails/8.jpg)
8
Training VMMs
• Given observation sequences Os, we want to find model parameters w = {A, Π} which best explain the observations
• I.e. we want to find the values of w = {A, Π} that maximise P(O | w)
• {A, Π}chosen = argmax_{A, Π} P(O | {A, Π})
![Page 9: Hidden Markov Models (HMMs)](https://reader035.vdocument.in/reader035/viewer/2022062222/568149ee550346895db7203a/html5/thumbnails/9.jpg)
9
Training VMMs
• Straightforward for VMMs
• $\pi_i$ = frequency in state i at time t = 1
• Transition probabilities:

$$a_{ij} = \frac{\text{number of transitions from state } i \text{ to state } j}{\text{number of transitions from state } i} = \frac{\text{number of transitions from state } i \text{ to state } j}{\text{number of times in state } i}$$
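The counting estimates for a visible Markov model can be sketched in a few lines. This is a minimal illustration, not the presenter's code: the function name `train_vmm`, the 0-based state indices and the toy sequences are all assumptions made for the example. The denominator used here is the number of transitions out of state i, the first of the two forms on the slide.

```python
from collections import Counter

def train_vmm(sequences, n_states):
    """Estimate (pi, A) from fully observed state sequences by counting."""
    start_counts = Counter(seq[0] for seq in sequences)
    trans_counts = Counter()   # (i, j) -> number of transitions i -> j
    from_counts = Counter()    # i -> number of transitions leaving i
    for seq in sequences:
        for i, j in zip(seq, seq[1:]):
            trans_counts[(i, j)] += 1
            from_counts[i] += 1
    # pi_i: relative frequency of starting in state i
    pi = [start_counts[i] / len(sequences) for i in range(n_states)]
    # a_ij: transitions i -> j divided by transitions out of i
    A = [[trans_counts[(i, j)] / from_counts[i] if from_counts[i] else 0.0
          for j in range(n_states)]
         for i in range(n_states)]
    return pi, A

# Hypothetical toy data: 0 = sunny, 1 = rainy
sequences = [[0, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 0]]
pi, A = train_vmm(sequences, n_states=2)
print(pi)
print(A)
```

States that are never left get an all-zero row in this sketch; a real system would need some smoothing or a convention for such states.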
![Page 10: Hidden Markov Models (HMMs)](https://reader035.vdocument.in/reader035/viewer/2022062222/568149ee550346895db7203a/html5/thumbnails/10.jpg)
10
Recognition
• We need to calculate P(O | wi)
• P(O | wi) is handy for calculating P(wi|O)
• If we have a set of models L = {w1,w2,…,wV} then if we can calculate P(wi|O) we can choose the model which returns the highest probability, i.e.
wchosen = argmax_{wi ∈ L} P(wi | O)
![Page 11: Hidden Markov Models (HMMs)](https://reader035.vdocument.in/reader035/viewer/2022062222/568149ee550346895db7203a/html5/thumbnails/11.jpg)
11
Recognition
• Why is P(O | wi) of use?
• Let’s revisit speech for a moment.
• In speech we are given a sequence of observations, e.g. a series of MFCC vectors
– E.g. MFCCs taken from frames of length 20-40 ms, every 10-20 ms
• If we have a set of models L = {w1, w2, …, wV} and if we can calculate P(wi | O) we can choose the model which returns the highest probability, i.e. wchosen = argmax_{wi ∈ L} P(wi | O)
![Page 12: Hidden Markov Models (HMMs)](https://reader035.vdocument.in/reader035/viewer/2022062222/568149ee550346895db7203a/html5/thumbnails/12.jpg)
12
wchosen = argmax_{wi ∈ L} P(wi | O)
• P(wi | O) is difficult to calculate, as we would have to have a model for every possible observation sequence O
• Use Bayes’ rule: P(x | y) = P(y | x) P(x) / P(y)
• So now we have wchosen = argmax_{wi ∈ L} P(O | wi) P(wi) / P(O)
• P(wi) can be easily calculated
• P(O) is the same for each calculation and so can be ignored
• So P(O | wi) is the key!
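The decision rule can be sketched as follows, working in the log domain so that P(O) drops out as a constant. The function name `choose_model` and all the numbers are hypothetical, chosen only so that the prior changes the decision.

```python
import math

def choose_model(log_likelihoods, priors):
    """Return the index i that maximises log P(O | w_i) + log P(w_i)."""
    scores = [ll + math.log(p) for ll, p in zip(log_likelihoods, priors)]
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical values of P(O | w_i) for three models, and their priors P(w_i)
log_likelihoods = [math.log(0.002), math.log(0.004), math.log(0.001)]
priors = [0.7, 0.2, 0.1]

chosen = choose_model(log_likelihoods, priors)
print(chosen)
```

In this toy case the second model has the highest likelihood, but the first model wins once the priors are taken into account (0.002 × 0.7 against 0.004 × 0.2).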
![Page 13: Hidden Markov Models (HMMs)](https://reader035.vdocument.in/reader035/viewer/2022062222/568149ee550346895db7203a/html5/thumbnails/13.jpg)
13
Hidden Markov Models
• Probabilistic Automaton
• N distinct states S = {s1, …, sN}
• M-element output alphabet K = {k1, …, kM}
• Initial state probabilities Π = {πi}, i ∈ S
• State transition at t = 1, 2, …
• State trans. probabilities A = {aij}, i, j ∈ S
• Symbol emission probabilities B = {bik}, i ∈ S, k ∈ K
• State sequence X = {X1, …, XT}, Xt ∈ S
• Output sequence O = {o1, …, oT}, ot ∈ K
![Page 14: Hidden Markov Models (HMMs)](https://reader035.vdocument.in/reader035/viewer/2022062222/568149ee550346895db7203a/html5/thumbnails/14.jpg)
14
HMM: Weather Example
![Page 15: Hidden Markov Models (HMMs)](https://reader035.vdocument.in/reader035/viewer/2022062222/568149ee550346895db7203a/html5/thumbnails/15.jpg)
15
State Emission Distributions
[Figure: discrete probability distribution over the symbols sunny, rainy and cloudy; vertical axis from 0 to 0.5]
![Page 16: Hidden Markov Models (HMMs)](https://reader035.vdocument.in/reader035/viewer/2022062222/568149ee550346895db7203a/html5/thumbnails/16.jpg)
16
State Emission Distributions
[Figure: continuous probability distribution; horizontal axis Temperature (C) from 5 to 30, vertical axis from 0 to 0.14]
![Page 17: Hidden Markov Models (HMMs)](https://reader035.vdocument.in/reader035/viewer/2022062222/568149ee550346895db7203a/html5/thumbnails/17.jpg)
17
Generative HMM
• Now we not only choose the state sequence probabilistically…
• …but also the state emissions
• Try this yourself using the numbers 1-10 and drawing from a hat...
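The "drawing from a hat" exercise can be imitated with a random number generator. The sketch below is an illustration under assumptions: the function name `sample_hmm`, the two hidden states and every probability are invented, and only the three output symbols (sunny, rainy, cloudy) echo the weather example.

```python
import random

def sample_hmm(pi, A, B, T, rng):
    """Draw a state sequence X and an output sequence O of length T."""
    states = range(len(pi))
    symbols = range(len(B[0]))
    X, O = [], []
    x = rng.choices(states, weights=pi)[0]                # initial state from Pi
    for _ in range(T):
        X.append(x)
        O.append(rng.choices(symbols, weights=B[x])[0])   # emission from state x
        x = rng.choices(states, weights=A[x])[0]          # transition out of x
    return X, O

# Hypothetical two-state model; symbols 0 = sunny, 1 = rainy, 2 = cloudy
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]

X, O = sample_hmm(pi, A, B, T=5, rng=random.Random(0))
print(X, O)
```

In the hidden setting only `O` would be shown to an observer; `X` is exactly what the recognition and decoding problems try to reason about.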
![Page 18: Hidden Markov Models (HMMs)](https://reader035.vdocument.in/reader035/viewer/2022062222/568149ee550346895db7203a/html5/thumbnails/18.jpg)
18
3 Questions
• Recognition (decoding) problem
– Given a model wi = {A, B, Π}, how do we compute how likely a certain observation is, i.e. P(O | wi)?
• State sequence?
– Given the observation sequence and a model, how do we choose a state sequence X = {X1, …, XT} that best explains the observations?
• Training problem
– Given an observation sequence O and a “space” of possible models which spans possible values for model parameters w = {A, B, Π}, how do we find the model that best explains the observed data?
![Page 19: Hidden Markov Models (HMMs)](https://reader035.vdocument.in/reader035/viewer/2022062222/568149ee550346895db7203a/html5/thumbnails/19.jpg)
19
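Computing P(O | w)
• For any particular state sequence X = {X1, …, XT} we have

$$P(O \mid X, w) = \prod_{t=1}^{T} P(o_t \mid X_t, w) = b_{X_1 o_1}\, b_{X_2 o_2} \cdots b_{X_T o_T}$$

and

$$P(X \mid w) = \pi_{X_1}\, a_{X_1 X_2}\, a_{X_2 X_3} \cdots a_{X_{T-1} X_T}$$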
![Page 20: Hidden Markov Models (HMMs)](https://reader035.vdocument.in/reader035/viewer/2022062222/568149ee550346895db7203a/html5/thumbnails/20.jpg)
20
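Computing P(O | w)

$$P(O, X \mid w) = P(O \mid X, w)\, P(X \mid w)$$

$$P(O \mid w) = \sum_{X} P(O \mid X, w)\, P(X \mid w) = \sum_{X_1 \cdots X_T} \pi_{X_1} b_{X_1 o_1} \prod_{t=2}^{T} a_{X_{t-1} X_t}\, b_{X_t o_t}$$

• This requires $(2T) \cdot N^T$ multiplications
• Very inefficient!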
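The sum over all state sequences can be written down directly, which makes the N^T cost visible. This is a sketch under assumptions: the function name `brute_force_likelihood` and the toy parameters are invented for illustration, and indices are 0-based.

```python
from itertools import product

def brute_force_likelihood(pi, A, B, O):
    """Sum P(O, X | w) over every one of the N**T state sequences X."""
    N, T = len(pi), len(O)
    total = 0.0
    for X in product(range(N), repeat=T):
        p = pi[X[0]] * B[X[0]][O[0]]                     # pi_{X1} * b_{X1 o1}
        for t in range(1, T):
            p *= A[X[t - 1]][X[t]] * B[X[t]][O[t]]       # a_{X(t-1) Xt} * b_{Xt ot}
        total += p
    return total

# Hypothetical two-state, three-symbol model and a length-3 observation
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
O = [0, 1, 2]

likelihood = brute_force_likelihood(pi, A, B, O)
print(likelihood)
```

With N = 2 and T = 3 the loop visits only 8 sequences, but the count grows exponentially in T, which is the inefficiency the trellis algorithms avoid.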
![Page 21: Hidden Markov Models (HMMs)](https://reader035.vdocument.in/reader035/viewer/2022062222/568149ee550346895db7203a/html5/thumbnails/21.jpg)
21
Trellis Algorithms
• Array of states vs. time
![Page 22: Hidden Markov Models (HMMs)](https://reader035.vdocument.in/reader035/viewer/2022062222/568149ee550346895db7203a/html5/thumbnails/22.jpg)
22
Trellis Algorithms
• Overlap in paths implies repetition of the same calculations
• Harness the overlap to make calculations efficient
• A node at (si, t) stores info about state sequences that contain Xt = si
![Page 23: Hidden Markov Models (HMMs)](https://reader035.vdocument.in/reader035/viewer/2022062222/568149ee550346895db7203a/html5/thumbnails/23.jpg)
23
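Trellis Algorithms
• Consider 2 states and 3 time points:

$$P(O \mid w) = \sum_{X} P(O \mid X, w)\, P(X \mid w) = \sum_{X_1 \cdots X_T} \pi_{X_1} b_{X_1 o_1} \prod_{t=2}^{T} a_{X_{t-1} X_t}\, b_{X_t o_t}$$

$$= \pi_{s_1} b_{s_1 o_1}\, a_{s_1 s_1} b_{s_1 o_2}\, a_{s_1 s_1} b_{s_1 o_3} + \pi_{s_1} b_{s_1 o_1}\, a_{s_1 s_1} b_{s_1 o_2}\, a_{s_1 s_2} b_{s_2 o_3} + \cdots$$

(one term for each of the 8 possible state sequences)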
![Page 24: Hidden Markov Models (HMMs)](https://reader035.vdocument.in/reader035/viewer/2022062222/568149ee550346895db7203a/html5/thumbnails/24.jpg)
24
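Forward Algorithm
• A node at (si, t) stores info about state sequences up to time t that arrive at si
[Figure: trellis fragment in which nodes s1 and s2 at time t, carrying $\alpha_{s_1}(t)$ and $\alpha_{s_2}(t)$, feed node sj at time t+1, carrying $\alpha_{s_j}(t+1)$, through the transitions $a_{s_1 s_j}$ and $a_{s_2 s_j}$]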
![Page 25: Hidden Markov Models (HMMs)](https://reader035.vdocument.in/reader035/viewer/2022062222/568149ee550346895db7203a/html5/thumbnails/25.jpg)
25
Forward Algorithm
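Definition:

$$\alpha_{s_i}(t) = P(o_1 o_2 \cdots o_t,\ X_t = s_i \mid w)$$

Initialisation:

$$\alpha_{s_i}(1) = \pi_{s_i}\, b_{s_i o_1}, \quad 1 \le i \le N$$

Induction:

$$\alpha_{s_j}(t+1) = \left[ \sum_{i=1}^{N} \alpha_{s_i}(t)\, a_{s_i s_j} \right] b_{s_j o_{t+1}}, \quad 1 \le j \le N,\ 1 \le t \le T-1$$

Termination:

$$P(O \mid w) = \sum_{i=1}^{N} \alpha_{s_i}(T)$$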
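The forward recursion (initialisation, induction, termination) can be sketched in plain Python. The function name `forward`, the 0-based indexing and the toy model are assumptions for illustration only.

```python
def forward(pi, A, B, O):
    """Return the forward trellis alpha[t][i] and P(O | w)."""
    N = len(pi)
    # Initialisation: alpha_i(1) = pi_i * b_{i, o1}
    alpha = [[pi[i] * B[i][O[0]] for i in range(N)]]
    # Induction: sum over all incoming paths, then emit the next symbol
    for o in O[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(N)) * B[j][o]
                      for j in range(N)])
    # Termination: sum the last column of the trellis
    return alpha, sum(alpha[-1])

# Hypothetical two-state, three-symbol model and a length-3 observation
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
O = [0, 1, 2]

alpha, likelihood = forward(pi, A, B, O)
print(likelihood)
```

Each trellis column costs on the order of N² multiplications, so the whole pass is roughly N²·T rather than exponential in T. For long sequences the products underflow; scaling or log-domain arithmetic is the usual remedy and is left out of this sketch.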
![Page 26: Hidden Markov Models (HMMs)](https://reader035.vdocument.in/reader035/viewer/2022062222/568149ee550346895db7203a/html5/thumbnails/26.jpg)
26
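Backward Algorithm
• A node at (si, t) stores info about state sequences from time t that evolve from si
[Figure: trellis fragment in which node si at time t, carrying $\beta_{s_i}(t)$, feeds nodes s1 and s2 at time t+1, carrying $\beta_{s_1}(t+1)$ and $\beta_{s_2}(t+1)$, through the weights $a_{s_i s_1} b_{s_1 o_{t+1}}$ and $a_{s_i s_2} b_{s_2 o_{t+1}}$]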
![Page 27: Hidden Markov Models (HMMs)](https://reader035.vdocument.in/reader035/viewer/2022062222/568149ee550346895db7203a/html5/thumbnails/27.jpg)
27
Backward Algorithm
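Definition:

$$\beta_{s_i}(t) = P(o_{t+1} o_{t+2} \cdots o_T \mid X_t = s_i,\ w)$$

Initialisation:

$$\beta_{s_i}(T) = 1, \quad 1 \le i \le N$$

Induction:

$$\beta_{s_i}(t) = \sum_{j=1}^{N} a_{s_i s_j}\, b_{s_j o_{t+1}}\, \beta_{s_j}(t+1), \quad 1 \le i \le N,\ t = T-1, \ldots, 1$$

Termination:

$$P(O \mid w) = \sum_{i=1}^{N} \pi_{s_i}\, b_{s_i o_1}\, \beta_{s_i}(1)$$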
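A matching sketch of the backward recursion follows, again with an invented function name (`backward`), 0-based indices and a hypothetical toy model. It fills the trellis from the last time step towards the first.

```python
def backward(pi, A, B, O):
    """Return the backward trellis beta[t][i] and P(O | w)."""
    N, T = len(pi), len(O)
    # Initialisation: beta_i(T) = 1
    beta = [[1.0] * N for _ in range(T)]
    # Induction: work backwards from t = T-1 down to t = 1 (0-based: T-2 .. 0)
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][O[t + 1]] * beta[t + 1][j]
                             for j in range(N))
    # Termination: fold in the initial probabilities and the first emission
    likelihood = sum(pi[i] * B[i][O[0]] * beta[0][i] for i in range(N))
    return beta, likelihood

# Hypothetical two-state, three-symbol model and a length-3 observation
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
O = [0, 1, 2]

beta, likelihood = backward(pi, A, B, O)
print(likelihood)
```

On this toy model the result agrees with a forward pass over the same parameters, as the two algorithms compute the same quantity.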
![Page 28: Hidden Markov Models (HMMs)](https://reader035.vdocument.in/reader035/viewer/2022062222/568149ee550346895db7203a/html5/thumbnails/28.jpg)
28
Forward & Backward Algorithms
• P(O | w) as calculated from the forward and backward algorithms should be the same
• FB algorithm usually used in training
• FB algorithm not suited to recognition as it considers all possible state sequences
• In reality, we would like to only consider the “best” state sequence (HMM problem 2)
![Page 29: Hidden Markov Models (HMMs)](https://reader035.vdocument.in/reader035/viewer/2022062222/568149ee550346895db7203a/html5/thumbnails/29.jpg)
29
“Best” State Sequence
• How is “best” defined?
• We could choose most likely individual state at each time t:
$$\arg\max_{1 \le i \le N} P(X_t = s_i \mid O, w)$$
![Page 30: Hidden Markov Models (HMMs)](https://reader035.vdocument.in/reader035/viewer/2022062222/568149ee550346895db7203a/html5/thumbnails/30.jpg)
30
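$$\arg\max_{1 \le i \le N} P(X_t = s_i \mid O, w)$$

• Define

$$\gamma_{s_i}(t) = P(X_t = s_i \mid O, w) = \frac{P(O, X_t = s_i \mid w)}{P(O \mid w)} = \frac{P(O, X_t = s_i \mid w)}{\sum_{i=1}^{N} P(O, X_t = s_i \mid w)} = \frac{\alpha_{s_i}(t)\, \beta_{s_i}(t)}{\sum_{i=1}^{N} \alpha_{s_i}(t)\, \beta_{s_i}(t)}$$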
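The per-time-step posterior can be sketched by combining a forward and a backward pass. The function name `state_posteriors` and the toy model are assumptions for illustration, and indices are 0-based.

```python
def state_posteriors(pi, A, B, O):
    """Return gamma[t][i] = P(X_t = s_i | O, w) from forward and backward passes."""
    N, T = len(pi), len(O)
    # Forward pass
    alpha = [[pi[i] * B[i][O[0]] for i in range(N)]]
    for t in range(1, T):
        alpha.append([sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * B[j][O[t]]
                      for j in range(N)])
    # Backward pass
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][O[t + 1]] * beta[t + 1][j] for j in range(N))
                   for i in range(N)]
    # gamma_i(t) = alpha_i(t) * beta_i(t), normalised over the states
    gamma = []
    for t in range(T):
        joint = [alpha[t][i] * beta[t][i] for i in range(N)]
        total = sum(joint)
        gamma.append([p / total for p in joint])
    return gamma

# Hypothetical two-state, three-symbol model and a length-3 observation
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
O = [0, 1, 2]

gamma = state_posteriors(pi, A, B, O)
# Most likely individual state at each time step
best = [max(range(len(row)), key=row.__getitem__) for row in gamma]
print(best)
```

Picking the maximum of each row independently ignores the transition probabilities between the chosen states, so with a sparser transition matrix the resulting sequence could contain a transition of probability zero.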
![Page 31: Hidden Markov Models (HMMs)](https://reader035.vdocument.in/reader035/viewer/2022062222/568149ee550346895db7203a/html5/thumbnails/31.jpg)
31
Viterbi Algorithm
$$\arg\max_{1 \le i \le N} P(X_t = s_i \mid O, w)$$

…may produce an unlikely or even invalid state sequence
• One solution is to choose the most likely state sequence:

$$\arg\max_{X} P(X \mid O, w) = \arg\max_{X} P(X, O \mid w)$$
![Page 32: Hidden Markov Models (HMMs)](https://reader035.vdocument.in/reader035/viewer/2022062222/568149ee550346895db7203a/html5/thumbnails/32.jpg)
32
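Viterbi Algorithm
• Define

$$\delta_{s_i}(t) = \max_{X_1 X_2 \cdots X_{t-1}} P(X_1 X_2 \cdots X_{t-1},\ X_t = s_i,\ o_1 o_2 \cdots o_t \mid w)$$

$\delta_{s_i}(t)$ is the best score along a single path, at time t, which accounts for the first t observations and ends in state si

By induction we have:

$$\delta_{s_j}(t+1) = \left[ \max_{i} \delta_{s_i}(t)\, a_{s_i s_j} \right] b_{s_j o_{t+1}}$$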
![Page 33: Hidden Markov Models (HMMs)](https://reader035.vdocument.in/reader035/viewer/2022062222/568149ee550346895db7203a/html5/thumbnails/33.jpg)
33
Viterbi Algorithm
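Initialisation:

$$\delta_{s_i}(1) = \pi_{s_i}\, b_{s_i o_1}, \quad 1 \le i \le N$$

$$\psi_{s_i}(1) = 0$$

Recursion:

$$\delta_{s_j}(t) = \max_{1 \le i \le N} \left[ \delta_{s_i}(t-1)\, a_{s_i s_j} \right] b_{s_j o_t}, \quad 2 \le t \le T,\ 1 \le j \le N$$

$$\psi_{s_j}(t) = \arg\max_{1 \le i \le N} \left[ \delta_{s_i}(t-1)\, a_{s_i s_j} \right], \quad 2 \le t \le T,\ 1 \le j \le N$$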
![Page 34: Hidden Markov Models (HMMs)](https://reader035.vdocument.in/reader035/viewer/2022062222/568149ee550346895db7203a/html5/thumbnails/34.jpg)
34
Viterbi Algorithm
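Termination:

$$P^{*} = \max_{1 \le i \le N} \left[ \delta_{s_i}(T) \right]$$

$$X_T^{*} = \arg\max_{1 \le i \le N} \left[ \delta_{s_i}(T) \right]$$

Path by backtracking:

$$X_t^{*} = \psi_{X_{t+1}^{*}}(t+1), \quad t = T-1, T-2, \ldots, 1$$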
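The initialisation, recursion, termination and backtracking steps can be sketched together. The function name `viterbi`, the 0-based indexing and the toy model are assumptions made for illustration.

```python
def viterbi(pi, A, B, O):
    """Return the most likely state sequence and its probability P*."""
    N, T = len(pi), len(O)
    # Initialisation: delta_i(1) = pi_i * b_{i, o1}, psi_i(1) = 0
    delta = [[pi[i] * B[i][O[0]] for i in range(N)]]
    psi = [[0] * N]
    # Recursion: keep only the best incoming path for each state
    for t in range(1, T):
        d_row, p_row = [], []
        for j in range(N):
            best_i = max(range(N), key=lambda i: delta[t - 1][i] * A[i][j])
            p_row.append(best_i)
            d_row.append(delta[t - 1][best_i] * A[best_i][j] * B[j][O[t]])
        delta.append(d_row)
        psi.append(p_row)
    # Termination: best final state
    last = max(range(N), key=lambda i: delta[-1][i])
    # Backtracking: follow the stored back-pointers from T down to 1
    path = [last]
    for t in range(T - 1, 0, -1):
        path.append(psi[t][path[-1]])
    path.reverse()
    return path, delta[-1][last]

# Hypothetical two-state, three-symbol model and a length-3 observation
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
O = [0, 1, 2]

path, p_star = viterbi(pi, A, B, O)
print(path, p_star)
```

The only structural difference from a forward pass is that the sum over incoming paths is replaced by a maximum, plus the back-pointer table needed to recover the path.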
![Page 35: Hidden Markov Models (HMMs)](https://reader035.vdocument.in/reader035/viewer/2022062222/568149ee550346895db7203a/html5/thumbnails/35.jpg)
35
Viterbi vs. Forward Algorithm
• Similar in implementation
– Forward sums over all incoming paths
– Viterbi maximises
• Viterbi probability ≤ Forward probability
• Both efficiently implemented using a trellis structure