CS 3710: Hidden Markov Models (HMMs)
Presented by Paul Hoffmann, Nov 30th, 2005
Overview (Chapter 12)
- HMMs vs MMs (Mixture Models)
- Full joint, parameterization of HMM
- Inference problems
- Inference algorithms
- Parameter estimation (EM)
HMMs vs. MMs
HMMs:
- generalization of MMs
- "dynamic" MMs
- mixture components are called states
[Figure: graphical models of an HMM and an MM, side by side]
MMs: Generation Example
Let $y_t = (y_t^1, y_t^2)$.
1) Randomly select a mixture component according to $P(q_t)$
2) Randomly select $y_t$ according to $P(y_t \mid q_t)$
[Figure: scatter plot of sampled points in the $(y^1, y^2)$ plane]
HMMs: Generation Example
Let $y_t = (y_t^1, y_t^2)$.
1) Randomly select the next state according to $P(q_{t+1} \mid q_t)$
2) Randomly select $y_t$ according to $P(y_t \mid q_t)$
[Figure: scatter plot of sampled points in the $(y^1, y^2)$ plane]
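The two generation procedures can be compared in a short Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the slides: the three cluster centres in `MEANS`, the shared standard deviation `STD`, the choice of an isotropic Gaussian for $P(y_t \mid q_t)$, and the function names.

```python
import random

# Hypothetical cluster centres, one per mixture component / state.
MEANS = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
STD = 0.5  # hypothetical shared standard deviation


def emit(q, rng):
    """Draw y_t = (y_t^1, y_t^2) from P(y_t | q_t), assumed Gaussian here."""
    return tuple(rng.gauss(mean, STD) for mean in MEANS[q])


def sample_mixture(prior, n, rng):
    """Mixture model: every q_t is drawn independently from P(q_t)."""
    qs = rng.choices(range(len(prior)), weights=prior, k=n)
    return qs, [emit(q, rng) for q in qs]


def sample_hmm(prior, A, n, rng):
    """HMM: q_0 comes from the prior, then q_{t+1} from row q_t of A."""
    qs = [rng.choices(range(len(prior)), weights=prior)[0]]
    for _ in range(n - 1):
        qs.append(rng.choices(range(len(prior)), weights=A[qs[-1]])[0])
    return qs, [emit(q, rng) for q in qs]


rng = random.Random(0)
prior = [0.2, 0.3, 0.5]
sticky = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]]
mm_states, mm_points = sample_mixture(prior, 10, rng)
hmm_states, hmm_points = sample_hmm(prior, sticky, 10, rng)
```

With a "sticky" transition matrix such as the one above, consecutive HMM samples tend to stay in the same cluster, whereas the mixture model jumps between clusters independently at every step.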
Some Notation
Variable    Meaning
t           time
q_t         state at time t (from a multinomial distribution)
q_t^i       ith component of state q_t (1 or 0)
y_t         observable output
A           state transition matrix
i, j        index of a state component; integer from 1 to M
M           number of components of the state
T           index of the last time step (the chain q_0, ..., q_T has T + 1 states)
Graphical Model
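[Figure: HMM graphical model. State chain Q0 → … → Qt → Qt+1 → … → QT; each state Qt emits an output Yt.]
Chain Rule for BBNs:
$$p(\vec{q}, \vec{y}) = p(q_0) \prod_{t=0}^{T-1} p(q_{t+1} \mid q_t) \prod_{t=0}^{T} p(y_t \mid q_t)$$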
CPT for Q0
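[Table: CPT for Q0 over the values 1, …, i, …, M, with entries $\pi_i = P(Q_0 = i)$]
$$\pi_{q_0} \equiv \prod_{i=1}^{M} [\pi_i]^{q_0^i}, \qquad q_t^i = 1 \text{ if } q_t = i, \text{ 0 otherwise}$$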
CPT for Qt+1
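[Table: CPT for Qt+1 with rows Qt = 1, …, i, …, M and columns Qt+1 = 1, …, j, …, M, with entries $a_{i,j} = P(Q_{t+1} = j \mid Q_t = i)$]
$$a_{q_t, q_{t+1}} \equiv \prod_{i=1}^{M} \prod_{j=1}^{M} [a_{i,j}]^{q_t^i q_{t+1}^j}, \qquad q_t^i = 1 \text{ if } q_t = i, \text{ 0 otherwise}$$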
CPT for Yt
[Figure: edge Qt → Yt. The CPT $P(y_t \mid q_t)$ is shown as "?".]
Graphical Model
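[Figure: HMM graphical model. State chain Q0 → … → Qt → Qt+1 → … → QT; each state Qt emits an output Yt.]
Chain Rule for BBNs:
$$
\begin{aligned}
p(\vec{q}, \vec{y}) &= p(q_0) \prod_{t=0}^{T-1} p(q_{t+1} \mid q_t) \prod_{t=0}^{T} p(y_t \mid q_t) \\
&= \pi_{q_0} \prod_{t=0}^{T-1} a_{q_t, q_{t+1}} \prod_{t=0}^{T} p(y_t \mid q_t)
\end{aligned}
$$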
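As a small illustration of the factorized joint, the following Python sketch evaluates $\log p(\vec{q}, \vec{y})$ for one state sequence. The toy parameters `pi`, `A`, `B` and the assumption of discrete outputs (with `B[i][k]` standing for $P(y_t = k \mid q_t = i)$) are illustrative choices, not values from the slides; states and outputs are indexed from 0 in the code.

```python
import math


def log_joint(q, y, pi, A, B):
    """log p(q, y) = log pi[q_0] + sum_t log A[q_t][q_{t+1}] + sum_t log B[q_t][y_t]."""
    lp = math.log(pi[q[0]])
    for t in range(len(q) - 1):
        lp += math.log(A[q[t]][q[t + 1]])
    for t in range(len(q)):
        lp += math.log(B[q[t]][y[t]])
    return lp


# Hypothetical 2-state, 2-symbol model.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]

joint = math.exp(log_joint([0, 0, 1], [0, 0, 1], pi, A, B))
# joint = 0.6 * 0.7 * 0.3 * 0.9 * 0.9 * 0.8
```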
Inference
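Posteriors of interest: $P(\vec{q} \mid \vec{y})$ and $P(q_t \mid \vec{y})$
Filtering: $P(q_z \mid y_0, \ldots, y_w)$ with $z = w$
Prediction: $P(q_z \mid y_0, \ldots, y_w)$ with $z > w$
Smoothing: $P(q_z \mid y_0, \ldots, y_w)$ with $z < w$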
Computing the Posterior
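$$P(\vec{q} \mid \vec{y}) = \frac{P(\vec{q}, \vec{y})}{P(\vec{y})}$$
$$P(\vec{q}, \vec{y}) = \pi_{q_0} \prod_{t=0}^{T-1} a_{q_t, q_{t+1}} \prod_{t=0}^{T} P(y_t \mid q_t)$$
$$P(\vec{y}) = \sum_{q_0} \sum_{q_1} \cdots \sum_{q_T} \pi_{q_0} \prod_{t=0}^{T-1} a_{q_t, q_{t+1}} \prod_{t=0}^{T} P(y_t \mid q_t, \eta)$$
Here $\eta$ denotes the parameters of the output distribution.
T+1 summations, each over the M values of one state variable, so the naive sum has $M^{T+1}$ terms.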
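To make the cost of the naive sum concrete, here is a brute-force Python sketch that enumerates every state sequence. The toy parameters and the discrete-output assumption (`B[i][k]` for $P(y_t = k \mid q_t = i)$) are illustrative, not from the slides. It is only usable for tiny models, which is the point.

```python
from itertools import product


def brute_force_likelihood(y, pi, A, B):
    """P(y) as an explicit sum over all M**(T+1) state sequences."""
    M = len(pi)
    total = 0.0
    for q in product(range(M), repeat=len(y)):
        p = pi[q[0]] * B[q[0]][y[0]]
        for t in range(1, len(y)):
            p *= A[q[t - 1]][q[t]] * B[q[t]][y[t]]
        total += p
    return total


# Hypothetical 2-state, 2-symbol model.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]

p_y = brute_force_likelihood([0, 1, 0], pi, A, B)
```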
Computing P(qt|y)
Goal: compute $P(q_t \mid \vec{y})$
Problem: conditioning on $\vec{y}$ in an HMM doesn't result in conditional independences
Solution: use Bayes' rule so that we can condition on $q_t$
$$P(q_t \mid \vec{y}) = \frac{P(\vec{y} \mid q_t) P(q_t)}{P(\vec{y})}$$
Computing P(qt|y)
Goal: compute $P(q_t, \vec{y})$ and $P(\vec{y})$
Step 1: obtain $P(\vec{y})$ from $P(q_t, \vec{y})$
$$P(\vec{y}) = \sum_{q_t} P(q_t, \vec{y})$$
Step 2: obtain $P(q_t, \vec{y})$ by using conditional independences and recursion
- recursion allows us to use dynamic programming
α-β Recursion
Computing P(qt,y)
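Goal: use conditional independences to split up $P(q_t, \vec{y})$
Solution: use $q_t$ to split up $\vec{y}$
$$P(\vec{y} \mid q_t) P(q_t) = P(y_0, \ldots, y_t \mid q_t) \, P(y_{t+1}, \ldots, y_T \mid q_t) \, P(q_t)$$
[Figure: HMM graphical model (chain Q0 … QT with outputs Y0 … YT)]
Solution: regroup and combine terms, use recursion on each result
$$P(\vec{y} \mid q_t) P(q_t) = \alpha(q_t) \beta(q_t)$$
where
$$\alpha(q_t) = P(y_0, \ldots, y_t, q_t), \qquad \beta(q_t) = P(y_{t+1}, \ldots, y_T \mid q_t)$$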
α Recursion
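Goal: define α as a recursive function
[Figure: HMM graphical model (chain Q0 … QT with outputs Y0 … YT)]
$$
\begin{aligned}
\alpha(q_{t+1}) &= P(y_0, \ldots, y_{t+1}, q_{t+1}) \\
&= P(y_0, \ldots, y_{t+1} \mid q_{t+1}) P(q_{t+1}) && \text{use chain rule so can condition on } q_{t+1} \\
&= P(y_0, \ldots, y_t \mid q_{t+1}) P(y_{t+1} \mid q_{t+1}) P(q_{t+1}) && \text{use CIs from HMM graph} \\
&= P(y_0, \ldots, y_t, q_{t+1}) P(y_{t+1} \mid q_{t+1}) && \text{regroup and combine} \\
&= \sum_{q_t} P(y_0, \ldots, y_t, q_t, q_{t+1}) P(y_{t+1} \mid q_{t+1}) && \text{introduce } q_t \text{ so can have } \alpha(q_t) \text{ term} \\
&= \sum_{q_t} P(y_0, \ldots, y_t \mid q_t, q_{t+1}) P(q_{t+1} \mid q_t) P(q_t) P(y_{t+1} \mid q_{t+1}) && \text{use chain rule} \\
&= \sum_{q_t} P(y_0, \ldots, y_t \mid q_t) P(q_{t+1} \mid q_t) P(q_t) P(y_{t+1} \mid q_{t+1}) && \text{use CIs from HMM graph} \\
&= \sum_{q_t} P(y_0, \ldots, y_t, q_t) P(q_{t+1} \mid q_t) P(y_{t+1} \mid q_{t+1}) && \text{regroup and combine} \\
&= \sum_{q_t} \alpha(q_t) \, a_{q_t, q_{t+1}} P(y_{t+1} \mid q_{t+1}) && \text{definitions of } \alpha, a
\end{aligned}
$$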
α Recursion
Base Case:
$$\alpha(q_0) = P(y_0, q_0) = P(y_0 \mid q_0) P(q_0) = P(y_0 \mid q_0) \, \pi_{q_0}$$
Computational Complexity of Each Step: $O(M^2)$
$$\alpha(q_{t+1}) = \sum_{q_t} \alpha(q_t) \, a_{q_t, q_{t+1}} P(y_{t+1} \mid q_{t+1})$$
- $q_{t+1}$ takes on M different values
- 2M multiplications for each value
Need to do these computations for t = 1 to t = T.
Computational complexity: $O(M^2 T)$
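The α recursion and its base case translate almost line by line into code. The sketch below is a minimal pure-Python version under stated assumptions: outputs are discrete with `B[i][k]` standing for $P(y_t = k \mid q_t = i)$, the toy parameters are invented for illustration, and no rescaling is applied, so it would underflow on long sequences.

```python
def forward(y, pi, A, B):
    """Return alpha with alpha[t][i] = P(y_0, ..., y_t, q_t = i)."""
    M = len(pi)
    # Base case: alpha(q_0) = pi_{q_0} * P(y_0 | q_0)
    alpha = [[pi[i] * B[i][y[0]] for i in range(M)]]
    for t in range(1, len(y)):
        prev = alpha[-1]
        # M values of q_t, each needing a sum over the M values of q_{t-1}
        alpha.append([
            sum(prev[i] * A[i][j] for i in range(M)) * B[j][y[t]]
            for j in range(M)
        ])
    return alpha


# Hypothetical 2-state, 2-symbol model.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]

alpha = forward([0, 1, 0], pi, A, B)
likelihood = sum(alpha[-1])  # P(y) = sum over q_T of alpha(q_T)
```

Summing the last row gives $P(\vec{y})$ in $O(M^2 T)$ operations instead of the $M^{T+1}$ terms of the naive sum.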
β Recursion
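Goal: define β as a recursive function
[Figure: HMM graphical model (chain Q0 … QT with outputs Y0 … YT)]
$$
\begin{aligned}
\beta(q_t) &= P(y_{t+1}, \ldots, y_T \mid q_t) \\
&= \sum_{q_{t+1}} P(y_{t+1}, \ldots, y_T, q_{t+1} \mid q_t) && \text{introduce } q_{t+1} \text{ so can have } \beta(q_{t+1}) \text{ term} \\
&= \sum_{q_{t+1}} P(y_{t+1}, \ldots, y_T \mid q_{t+1}, q_t) P(q_{t+1} \mid q_t) && \text{use chain rule} \\
&= \sum_{q_{t+1}} P(y_{t+2}, \ldots, y_T \mid q_{t+1}) P(y_{t+1} \mid q_{t+1}) P(q_{t+1} \mid q_t) && \text{use CIs from HMM graph} \\
&= \sum_{q_{t+1}} \beta(q_{t+1}) \, a_{q_t, q_{t+1}} P(y_{t+1} \mid q_{t+1}) && \text{definitions of } \beta, a
\end{aligned}
$$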
β Recursion: Base Case
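$$
\begin{aligned}
\beta(q_{T-1}) &= P(y_T \mid q_{T-1}) \\
&= \sum_{q_T} P(y_T, q_T \mid q_{T-1}) && \text{introduce } q_T \\
&= \sum_{q_T} P(y_T \mid q_T, q_{T-1}) P(q_T \mid q_{T-1}) && \text{use chain rule} \\
&= \sum_{q_T} P(y_T \mid q_T) P(q_T \mid q_{T-1}) && \text{use CIs from HMM graph} \\
&= \sum_{q_T} 1 \cdot a_{q_{T-1}, q_T} P(y_T \mid q_T) && \text{definition of } a
\end{aligned}
$$
Compare with the general recursion:
$$\beta(q_t) = \sum_{q_{t+1}} \beta(q_{t+1}) \, a_{q_t, q_{t+1}} P(y_{t+1} \mid q_{t+1})$$
$\beta(q_{T-1})$ has the same form as the other $\beta(q_t)$'s if $\beta(q_T)$ is set to 1.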
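A matching sketch of the β recursion, again in plain Python and under the same illustrative assumptions (discrete outputs, `B[i][k]` for $P(y_t = k \mid q_t = i)$, invented toy parameters, no rescaling). The base case $\beta(q_T) = 1$ is the initial row.

```python
def backward(y, A, B):
    """Return beta with beta[t][i] = P(y_{t+1}, ..., y_T | q_t = i)."""
    M = len(A)
    last = len(y) - 1
    beta = [[1.0] * M for _ in y]  # beta(q_T) = 1; earlier rows are overwritten below
    for t in range(last - 1, -1, -1):
        beta[t] = [
            sum(A[i][j] * B[j][y[t + 1]] * beta[t + 1][j] for j in range(M))
            for i in range(M)
        ]
    return beta


# Hypothetical 2-state, 2-symbol model.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
y = [0, 1, 0]

beta = backward(y, A, B)
# P(y) = sum over q_0 of alpha(q_0) * beta(q_0), with alpha(q_0) = pi * P(y_0 | q_0)
likelihood = sum(pi[i] * B[i][y[0]] * beta[0][i] for i in range(len(pi)))
```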
α-γ Recursion
Do alpha recursion, then gamma recursion.
Gamma function definition:
$$\gamma(q_t) \equiv P(q_t \mid \vec{y}) = \frac{\alpha(q_t) \beta(q_t)}{P(\vec{y})}$$
γ Recursion
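Goal: define γ as a recursive function
[Figure: HMM graphical model (chain Q0 … QT with outputs Y0 … YT)]
$$
\begin{aligned}
\gamma(q_t) &= P(q_t \mid y_0, \ldots, y_T) \\
&= \sum_{q_{t+1}} P(q_t, q_{t+1} \mid y_0, \ldots, y_T) && \text{introduce } q_{t+1} \text{ so can have } \gamma(q_{t+1}) \text{ term} \\
&= \sum_{q_{t+1}} P(q_t \mid q_{t+1}, y_0, \ldots, y_T) P(q_{t+1} \mid y_0, \ldots, y_T) && \text{use chain rule to get } \gamma(q_{t+1}) \text{ term} \\
&= \sum_{q_{t+1}} P(q_t \mid q_{t+1}, y_0, \ldots, y_t) P(q_{t+1} \mid y_0, \ldots, y_T) && \text{use CIs from HMM graph} \\
&= \sum_{q_{t+1}} \frac{P(q_t, q_{t+1}, y_0, \ldots, y_t)}{P(q_{t+1}, y_0, \ldots, y_t)} P(q_{t+1} \mid y_0, \ldots, y_T) && \text{use definition for } P(X \mid Y) \\
&= \sum_{q_{t+1}} \frac{P(q_t, y_0, \ldots, y_t) P(q_{t+1} \mid q_t)}{\sum_{q_t} P(q_t, y_0, \ldots, y_t) P(q_{t+1} \mid q_t)} P(q_{t+1} \mid y_0, \ldots, y_T) && \text{split using CIs from HMM graph} \\
&= \sum_{q_{t+1}} \frac{\alpha(q_t) \, a_{q_t, q_{t+1}}}{\sum_{q_t} \alpha(q_t) \, a_{q_t, q_{t+1}}} \gamma(q_{t+1}) && \text{definitions of } \alpha, a, \gamma
\end{aligned}
$$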
γ Recursion: Base Case
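$$
\begin{aligned}
\gamma(q_{T-1}) &= P(q_{T-1} \mid y_0, \ldots, y_T) \\
&= \sum_{q_T} P(q_{T-1}, q_T \mid y_0, \ldots, y_T) && \text{introduce } q_T \\
&= \sum_{q_T} P(q_{T-1} \mid q_T, y_0, \ldots, y_T) P(q_T \mid y_0, \ldots, y_T) && \text{use chain rule} \\
&= \sum_{q_T} P(q_{T-1} \mid q_T, y_0, \ldots, y_{T-1}) P(q_T \mid y_0, \ldots, y_T) && \text{use CIs from HMM graph} \\
&= \sum_{q_T} \frac{P(q_{T-1}, q_T, y_0, \ldots, y_{T-1})}{P(q_T, y_0, \ldots, y_{T-1})} P(q_T \mid y_0, \ldots, y_T) && \text{use definition for } P(X \mid Y) \\
&= \sum_{q_T} \frac{P(q_{T-1}, y_0, \ldots, y_{T-1}) P(q_T \mid q_{T-1})}{\sum_{q_{T-1}} P(q_{T-1}, y_0, \ldots, y_{T-1}) P(q_T \mid q_{T-1})} P(q_T \mid y_0, \ldots, y_T) && \text{split using CIs from HMM graph} \\
&= \sum_{q_T} \frac{\alpha(q_{T-1}) \, a_{q_{T-1}, q_T}}{\sum_{q_{T-1}} \alpha(q_{T-1}) \, a_{q_{T-1}, q_T}} \cdot \frac{\alpha(q_T)}{\sum_{q_T} \alpha(q_T)} && \text{definitions of } \alpha, a
\end{aligned}
$$
The last step uses $P(q_T \mid y_0, \ldots, y_T) = \alpha(q_T) / P(\vec{y})$ with $P(\vec{y}) = \sum_{q_T} \alpha(q_T)$.
General recursion:
$$\gamma(q_t) = \sum_{q_{t+1}} \frac{\alpha(q_t) \, a_{q_t, q_{t+1}}}{\sum_{q_t} \alpha(q_t) \, a_{q_t, q_{t+1}}} \gamma(q_{t+1})$$
Base case:
$$\gamma(q_T) = \frac{\alpha(q_T)}{\sum_{q_T} \alpha(q_T)}$$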
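The α-γ scheme can be sketched as a forward pass followed by a backward pass that touches only α and the transition matrix, never the outputs. As before, the discrete-output assumption, the toy parameters and the function names are illustrative only, and the base case used is $\gamma(q_T) = \alpha(q_T) / \sum_{q_T} \alpha(q_T)$.

```python
def forward(y, pi, A, B):
    """alpha[t][i] = P(y_0, ..., y_t, q_t = i)."""
    M = len(pi)
    alpha = [[pi[i] * B[i][y[0]] for i in range(M)]]
    for t in range(1, len(y)):
        prev = alpha[-1]
        alpha.append([
            sum(prev[i] * A[i][j] for i in range(M)) * B[j][y[t]]
            for j in range(M)
        ])
    return alpha


def gamma_pass(alpha, A):
    """gamma[t][i] = P(q_t = i | y_0, ..., y_T), computed from alpha and A only."""
    M = len(A)
    last = len(alpha) - 1
    norm = sum(alpha[last])
    gamma = [None] * (last + 1)
    gamma[last] = [a / norm for a in alpha[last]]  # base case
    for t in range(last - 1, -1, -1):
        # Denominator: sum over q_t of alpha(q_t) * a_{q_t, q_{t+1}}, one per q_{t+1}
        denom = [sum(alpha[t][i] * A[i][j] for i in range(M)) for j in range(M)]
        gamma[t] = [
            sum(alpha[t][i] * A[i][j] / denom[j] * gamma[t + 1][j] for j in range(M))
            for i in range(M)
        ]
    return gamma


# Hypothetical 2-state, 2-symbol model.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]

gamma = gamma_pass(forward([0, 1, 0], pi, A, B), A)
```

Each row of `gamma` is a posterior distribution over states and therefore sums to 1, which is a convenient sanity check.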
EM
Initialize parameters θ
do
    Set θ' = θ
    1) Expectation: complete all hidden and missing values with their expectations given the current set of parameters θ'
    2) Maximization: use the completed data to compute new estimates for θ
while improvement possible
EM
For this example, it is assumed that the outputs $y_t$ are multinomial:
$$P(y_t \mid q_t, \eta) = \prod_{i=1}^{M} \prod_{j=1}^{N} [\eta_{i,j}]^{q_t^i y_t^j}$$
EM
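log likelihood
$$
\begin{aligned}
\log p(\vec{q}, \vec{y}) &= \log \Big\{ \pi_{q_0} \prod_{t=0}^{T-1} a_{q_t, q_{t+1}} \prod_{t=0}^{T} p(y_t \mid q_t, \eta) \Big\} \\
&= \log \Big\{ \prod_{i=1}^{M} [\pi_i]^{q_0^i} \prod_{t=0}^{T-1} \prod_{i=1}^{M} \prod_{j=1}^{M} [a_{i,j}]^{q_t^i q_{t+1}^j} \prod_{t=0}^{T} \prod_{i=1}^{M} \prod_{j=1}^{N} [\eta_{i,j}]^{q_t^i y_t^j} \Big\} \\
&= \sum_{i=1}^{M} q_0^i \log \pi_i + \sum_{t=0}^{T-1} \sum_{i=1}^{M} \sum_{j=1}^{M} q_t^i q_{t+1}^j \log a_{i,j} + \sum_{t=0}^{T} \sum_{i=1}^{M} \sum_{j=1}^{N} q_t^i y_t^j \log \eta_{i,j}
\end{aligned}
$$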
EM
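Sufficient statistics:
$$\pi_i: \; q_0^i \qquad a_{i,j}: \; m_{i,j} \equiv \sum_{t=0}^{T-1} q_t^i q_{t+1}^j \qquad \eta_{i,j}: \; n_{i,j} \equiv \sum_{t=0}^{T} q_t^i y_t^j$$
Maximum likelihood estimates:
$$\hat{\pi}_i = q_0^i \qquad \hat{a}_{i,j} = \frac{m_{i,j}}{\sum_{k=1}^{M} m_{i,k}} \qquad \hat{\eta}_{i,j} = \frac{n_{i,j}}{\sum_{k=1}^{N} n_{i,k}}$$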
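When the state sequence is fully observed, the sufficient statistics are plain counts. The following Python sketch computes them and the resulting estimates for a single sequence; the function name, the 0-based integer encoding of states and outputs, and the convention of returning an all-zero row for a state that never occurs are assumptions made for this illustration.

```python
def ml_estimates(q, y, M, N):
    """Complete-data ML estimates from one state sequence q and output sequence y."""
    m = [[0] * M for _ in range(M)]  # m[i][j]: number of transitions i -> j
    n = [[0] * N for _ in range(M)]  # n[i][j]: number of times state i emits symbol j
    for t in range(len(q) - 1):
        m[q[t]][q[t + 1]] += 1
    for t in range(len(q)):
        n[q[t]][y[t]] += 1

    def normalize(row):
        total = sum(row)
        # Assumption: rows with no counts are returned as zeros.
        return [c / total if total else 0.0 for c in row]

    pi_hat = [1.0 if i == q[0] else 0.0 for i in range(M)]
    a_hat = [normalize(row) for row in m]
    eta_hat = [normalize(row) for row in n]
    return pi_hat, a_hat, eta_hat


pi_hat, a_hat, eta_hat = ml_estimates([0, 0, 1, 1, 0], [0, 0, 1, 1, 0], M=2, N=2)
```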
Expectation
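$$E(n_{i,j} \mid \vec{y}, \theta^{(p)}) = \sum_{t=0}^{T} \gamma_t^i \, y_t^j$$
$$E(m_{i,j} \mid \vec{y}, \theta^{(p)}) = \sum_{t=0}^{T-1} \xi_{t,t+1}^{i,j}$$
where $\gamma_t^i = P(q_t^i = 1 \mid \vec{y}, \theta^{(p)})$ and $\xi_{t,t+1}^{i,j} = P(q_t^i = 1, q_{t+1}^j = 1 \mid \vec{y}, \theta^{(p)})$.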
Maximization (Baum-Welch updates)
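$$\hat{\eta}_{i,j}^{(p+1)} = \frac{\sum_{t=0}^{T} \gamma_t^i \, y_t^j}{\sum_{k=1}^{N} \sum_{t=0}^{T} \gamma_t^i \, y_t^k} = \frac{\sum_{t=0}^{T} \gamma_t^i \, y_t^j}{\sum_{t=0}^{T} \gamma_t^i}$$
$$\hat{a}_{i,j}^{(p+1)} = \frac{\sum_{t=0}^{T-1} \xi_{t,t+1}^{i,j}}{\sum_{k=1}^{M} \sum_{t=0}^{T-1} \xi_{t,t+1}^{i,k}} = \frac{\sum_{t=0}^{T-1} \xi_{t,t+1}^{i,j}}{\sum_{t=0}^{T-1} \gamma_t^i}$$
$$\hat{\pi}_i^{(p+1)} = \gamma_0^i$$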
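One EM iteration with these updates can be sketched as follows. This is a minimal single-sequence version under stated assumptions: outputs are integer symbols (so $y_t^j = 1$ becomes `y[t] == j`), γ and ξ are computed from unscaled α and β using the standard identity $\xi_{t,t+1}^{i,j} = \alpha(q_t^i) \, a_{i,j} \, P(y_{t+1} \mid q_{t+1}^j) \, \beta(q_{t+1}^j) / P(\vec{y})$, and the toy parameters and observation sequence are invented.

```python
def forward(y, pi, A, B):
    M = len(pi)
    alpha = [[pi[i] * B[i][y[0]] for i in range(M)]]
    for t in range(1, len(y)):
        alpha.append([sum(alpha[t - 1][i] * A[i][j] for i in range(M)) * B[j][y[t]]
                      for j in range(M)])
    return alpha


def backward(y, A, B):
    M, last = len(A), len(y) - 1
    beta = [[1.0] * M for _ in y]
    for t in range(last - 1, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][y[t + 1]] * beta[t + 1][j] for j in range(M))
                   for i in range(M)]
    return beta


def baum_welch_step(y, pi, A, B):
    """One E step plus one M step; also returns P(y) under the input parameters."""
    M, N, last = len(pi), len(B[0]), len(y) - 1
    alpha, beta = forward(y, pi, A, B), backward(y, A, B)
    p_y = sum(alpha[last])
    # E step: posterior state and transition marginals
    gamma = [[alpha[t][i] * beta[t][i] / p_y for i in range(M)]
             for t in range(last + 1)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][y[t + 1]] * beta[t + 1][j] / p_y
            for j in range(M)] for i in range(M)] for t in range(last)]
    # M step: Baum-Welch updates
    new_pi = list(gamma[0])
    new_A = [[sum(xi[t][i][j] for t in range(last))
              / sum(gamma[t][i] for t in range(last))
              for j in range(M)] for i in range(M)]
    new_B = [[sum(gamma[t][i] for t in range(last + 1) if y[t] == k)
              / sum(gamma[t][i] for t in range(last + 1))
              for k in range(N)] for i in range(M)]
    return new_pi, new_A, new_B, p_y


# Hypothetical starting point and observation sequence.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
y = [0, 1, 0, 0, 1, 1, 0]

likelihoods = []
for _ in range(5):
    pi, A, B, p_y = baum_welch_step(y, pi, A, B)
    likelihoods.append(p_y)
```

The recorded likelihoods should never decrease from one iteration to the next, which is the usual check that an EM implementation is wired correctly. A practical version would rescale α and β or work in log space.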
Overview (Chapter 18)
- HMMs and Junction tree algorithm
- Linear Gaussian Models
HMM
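[Figure: HMM graphical model. State chain q0 → q1 → … → qt → qt+1 → … → qT; each state qt emits an output yt.]
Assume, for illustration purposes, that $y_t$ is a multinomial node:
$$b_{i,j} \equiv P(y_t^j = 1 \mid q_t^i = 1)$$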
Creating Junction Tree
1) Moralize graph (no change: every node has at most 1 parent, so there are no parents to join)
2) Triangulate graph (no change: there are no cycles in the graph)
[Figure: HMM graphical model, unchanged by moralization and triangulation]
Creating Junction Tree
3) Create maximal spanning tree
[Figure: junction tree. Cliques (q0,y0), (q0,q1), …, (qt-1,qt), (qt,qt+1), …, (qT-1,qT) and output cliques (q1,y1), …, (qt,yt), (qt+1,yt+1), …, (qT,yT).]
Creating Junction Tree
4) Make the separator sets explicit
5) Assign local CPTs to potentials
6) Initialize separator potentials to 1
[Figure: junction tree with clique potentials ψ and separator potentials φ(qt), ζ(qt)]
Clique potentials: ψ(q0,y0) = P(q0)P(y0|q0); ψ(qt,qt+1) = P(qt+1|qt); ψ(qt,yt) = P(yt|qt) for t ≥ 1
Separator potentials: φ(qt) = 1, ζ(qt) = 1
Junction Tree, No Evidence
[Figure: junction tree with the root marked. Messages are passed toward the root: the separators φ(q0), …, φ(qt) receive P(q0), …, P(qt), the separators ζ(qt) stay at 1, the chain cliques become P(qt,qt+1), and the output cliques still hold P(yt|qt).]
$$\sum_{y_0} P(y_0 \mid q_0) P(q_0) = P(q_0)$$
$$\sum_{y_t} P(y_t \mid q_t) = 1$$
$$\sum_{q_t} P(q_{t+1} \mid q_t) P(q_t) = P(q_{t+1})$$
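Without evidence, the message sent along the chain is just the state marginal, updated by $\sum_{q_t} P(q_{t+1} \mid q_t) P(q_t) = P(q_{t+1})$. A minimal Python sketch of that propagation, with an invented 2-state model and a hypothetical function name:

```python
def state_marginals(pi, A, num_transitions):
    """Return [P(q_0), P(q_1), ...] by repeatedly applying the transition matrix."""
    M = len(pi)
    marginals = [list(pi)]
    for _ in range(num_transitions):
        prev = marginals[-1]
        marginals.append([sum(prev[i] * A[i][j] for i in range(M)) for j in range(M)])
    return marginals


# Hypothetical 2-state model.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]

marginals = state_marginals(pi, A, 3)
# marginals[1] is P(q_1) = [0.6*0.7 + 0.4*0.4, 0.6*0.3 + 0.4*0.6]
```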
Junction Tree, No Evidence
[Figure: junction tree after messages have been passed back from the root. The chain cliques hold P(qt,qt+1), the output cliques hold P(yt,qt), and the separators hold the marginals P(q0), P(q1), …, P(qt), P(qt+1), …, P(qT).]
Junction Tree, With Evidence
[Figure: junction tree with the root marked. With the outputs observed, the separators ζ(qt) carry P(yt|qt) and the separators φ(q0), …, φ(qt) carry α(q0), …, α(qt).]
$$\zeta^*(q_i) = P(y_i \mid q_i)$$
$$P(y_0 \mid q_0) P(q_0) = \alpha(q_0)$$
$$
\begin{aligned}
\varphi^*(q_{t+1}) &= \sum_{q_t} \psi^*(q_t, q_{t+1}) \\
&= \sum_{q_t} \varphi^*(q_t) \, a_{q_t, q_{t+1}} P(y_{t+1} \mid q_{t+1})
\end{aligned}
$$
Junction Tree, With Evidence
[Figure: junction tree after messages have been passed back from the root. The separators φ(q0), …, φ(qt) now hold γ(q0), …, γ(qt).]
Other algorithms
Derived α-γ from junction tree, can also derive α-β and ρ-ζ algorithms
Linear Gaussian Models (LG-HMMs)
- Same graphical structure as HMM
- Different node type and parameterization than HMMs
    - Nodes are linear-Gaussians
- Junction tree is the same except use linear-Gaussian potentials
LG-HMMs
Gaussian CPDs (using moments)
$$P(x_{t+1} \mid x_t) \sim N(A x_t, \; G Q G^T), \qquad P(y_t \mid x_t) \sim N(C x_t, \; R)$$
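To see what these CPDs generate, here is a scalar sampling sketch in Python. It assumes a one-dimensional state and output and takes G = 1, so the transition variance is simply `q_var` and the output variance is `r_var`; the function name, the fixed initial state `x0` and the parameter values are invented for illustration.

```python
import random


def sample_lg_hmm(a, q_var, c, r_var, x0, num_steps, rng):
    """Scalar LG-HMM: x_{t+1} ~ N(a * x_t, q_var), y_t ~ N(c * x_t, r_var)."""
    xs, ys = [], []
    x = x0
    for _ in range(num_steps):
        xs.append(x)
        ys.append(rng.gauss(c * x, r_var ** 0.5))  # output given current state
        x = rng.gauss(a * x, q_var ** 0.5)         # next state given current state
    return xs, ys


rng = random.Random(0)
xs, ys = sample_lg_hmm(a=0.9, q_var=0.1, c=1.0, r_var=0.5, x0=1.0, num_steps=20, rng=rng)
```

Setting both variances to zero reduces the model to the deterministic recursion $x_{t+1} = a x_t$, $y_t = c x_t$, which is a quick way to check the wiring.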
Junction Tree, No Evidence
[Figure: junction tree for the LG-HMM with the root marked. Same structure as for the HMM, with states xt in place of qt: the separators φ(x0), …, φ(xt) receive P(x0), …, P(xt) and the separators ζ(xt) stay at 1.]
$$\int_{y_0} P(y_0 \mid x_0) P(x_0) = P(x_0)$$
$$\int_{y_t} P(y_t \mid x_t) = 1$$
$$\int_{x_t} P(x_{t+1} \mid x_t) P(x_t) = P(x_{t+1})$$
Junction Tree, With Evidence
[Figure: junction tree for the LG-HMM after messages have been passed back from the root. The separators φ(x0), …, φ(xt) hold γ(x0), …, γ(xt).]