1
Hidden Markov Models
Hsin-Min Wang, [email protected]
References:
1. L. R. Rabiner and B. H. Juang (1993), Fundamentals of Speech Recognition, Chapter 6
2. X. Huang et al. (2001), Spoken Language Processing, Chapter 8
3. L. R. Rabiner (1989), "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proceedings of the IEEE, vol. 77, no. 2, February 1989
2
Hidden Markov Model (HMM)
History
– Published in Baum's papers in the late 1960s and early 1970s
– Introduced to speech processing by Baker (CMU) and Jelinek (IBM) in the 1970s
– Introduced to computational biology in the late 1980s
• Lander and Green (1987) used HMMs in the construction of genetic linkage maps
• Churchill (1989) employed HMMs to distinguish coding from noncoding regions in DNA
3
Hidden Markov Model (HMM)
Assumption
– A speech signal (or DNA sequence) can be characterized as a parametric random process
– The parameters can be estimated in a precise, well-defined manner

Three fundamental problems
– Evaluation of the probability (likelihood) of a sequence of observations given a specific HMM
– Determination of the best sequence of model states
– Adjustment of the model parameters so as to best account for the observed signal/sequence
4
Hidden Markov Model (HMM)
A 3-state ergodic HMM over the symbols {A, B, C}, with a near-uniform initial model:

π = (0.34, 0.33, 0.33)

A = | 0.34 0.33 0.33 |
    | 0.33 0.34 0.33 |
    | 0.33 0.33 0.34 |

b1 = {A:.34, B:.33, C:.33}, b2 = {A:.33, B:.34, C:.33}, b3 = {A:.33, B:.33, C:.34}
Given an initial model as above, we can train HMMs for the following two classes using their training data, respectively.

Training set for class 1:
1. ABBCABCAABC 2. ABCABC 3. ABCA ABC 4. BBABCAB 5. BCAABCCAB 6. CACCABCA 7. CABCABCA 8. CABCA 9. CABCA

Training set for class 2:
1. BBBCCBC 2. CCBABB 3. AACCBBB 4. BBABBAC 5. CCAABBAB 6. BBBCCBAA 7. ABBBBABA 8. CCCCC 9. BBAAA

We can then decide which class the following testing sequences belong to: ABCABCCAB and AABABCCCCBBB.
5
Probability Theory
Consider the simple scenario of rolling two dice, labeled die 1 and die 2. Define the following three events:
A: Die 1 lands on 3. B: Die 2 lands on 1. C: The dice sum to 8.
Prior probability: P(A)=P(B)=1/6, P(C)=5/36.
Joint probability: P(A,B) (or P(A∩B)) = 1/36. Two events A and B are statistically independent if and only if P(A,B) = P(A)×P(B).
P(B,C) = 0. Two events B and C are mutually exclusive if and only if B∩C = ∅, i.e., P(B∩C) = 0.
Conditional probability:
P(C|A) = P(C,A)/P(A) = (1/36)/(1/6) = 1/6; P(B|A) = P(B); P(C|B) = 0

(Here C = {(2,6), (3,5), (4,4), (5,3), (6,2)}, A∩B = {(3,1)}, and B∩C = ∅.)

Bayes' rule gives the posterior probability P(λ|O) = P(O|λ)P(λ)/P(O), and the maximum likelihood principle is

λ̂ = argmax_λ P(λ|O) = argmax_λ P(O|λ)P(λ)/P(O) = argmax_λ P(O|λ)P(λ)
6
The Markov Chain
By the chain rule of probability (P(A,B) = P(B|A)P(A)),

P(X_1, X_2, ..., X_n)
= P(X_1) P(X_2|X_1) P(X_3|X_1,X_2) ... P(X_n|X_1,X_2,...,X_{n-1})
= P(X_1) ∏_{i=2}^{n} P(X_i|X_1,X_2,...,X_{i-1})

A first-order Markov chain assumes that P(X_i|X_1,...,X_{i-1}) = P(X_i|X_{i-1}), so

P(X_1, X_2, ..., X_n) = P(X_1) ∏_{i=2}^{n} P(X_i|X_{i-1})
7
The parameters of a Markov chain, with N states labeled by {1,…,N} and the state at time t in the Markov chain denoted as qt, can be described as
a_ij = P(q_t = j | q_{t-1} = i), 1≤i,j≤N
π_i = P(q_1 = i), 1≤i≤N

with the constraints Σ_{j=1}^{N} a_ij = 1 (1≤i≤N) and Σ_{i=1}^{N} π_i = 1.

This is an observable Markov model: the output of the process is the set of states at each time instant t, where each state corresponds to an observable event X_i, so there is a one-to-one correspondence between the observable sequence and the Markov chain state sequence.
(Rabiner 1989)
8
The Markov Chain – Ex 1
A 3-state Markov chain
– State 1 generates symbol A only, state 2 generates symbol B only, state 3 generates symbol C only
– Given a sequence of observed symbols O={CABBCABC}, the only corresponding state sequence is Q={S3 S1 S2 S2 S3 S1 S2 S3}, and the corresponding probability is

P(O|λ) = P(CABBCABC|λ) = P(Q|λ) = P(S3 S1 S2 S2 S3 S1 S2 S3|λ)
= π(S3) P(S1|S3) P(S2|S1) P(S2|S2) P(S3|S2) P(S1|S3) P(S2|S1) P(S3|S2)
= 0.1×0.3×0.3×0.7×0.2×0.3×0.3×0.2 = 0.00002268

Model parameters:

π = (0.4, 0.5, 0.1)

A = | 0.6 0.3 0.1 |
    | 0.1 0.7 0.2 |
    | 0.3 0.2 0.5 |

(State diagram: S1 emits A, S2 emits B, S3 emits C, with the transition probabilities above.)
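The product above can be checked with a short script. This is a minimal Python sketch, with π and A as recovered from the slide's figure and the symbol-to-state map following the slide's deterministic emissions:

```python
# Observable Markov chain from the slide: state 1 emits A, 2 emits B, 3 emits C.
pi = [0.4, 0.5, 0.1]                      # initial state distribution
A = [[0.6, 0.3, 0.1],
     [0.1, 0.7, 0.2],
     [0.3, 0.2, 0.5]]                     # A[i][j] = P(q_t = j+1 | q_{t-1} = i+1)
state_of = {'A': 0, 'B': 1, 'C': 2}       # one-to-one symbol -> state map

def chain_probability(observations):
    """P(O|lambda) for an observable Markov model: one path per observation."""
    q = [state_of[o] for o in observations]
    p = pi[q[0]]
    for prev, cur in zip(q, q[1:]):
        p *= A[prev][cur]
    return p

print(chain_probability("CABBCABC"))   # 0.1*0.3*0.3*0.7*0.2*0.3*0.3*0.2
```

Because the emissions are deterministic, the observation sequence pins down a single state path, so no summation is needed.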
9
The Markov Chain – Ex 2
A three-state Markov chain for the Dow Jones Industrial average (state 1 = up, state 2 = down, state 3 = unchanged), with π = (0.5, 0.2, 0.3).

The probability of 5 consecutive up days:

P(up, up, up, up, up) = P(S1, S1, S1, S1, S1) = π_1 a_11 a_11 a_11 a_11 = 0.5 × 0.6^4 = 0.0648
(Huang et al., 2001)
10
Extension to Hidden Markov Models
HMM: an extended version of the observable Markov model
– The observation is a probabilistic function (discrete or continuous) of a state, instead of a one-to-one correspondence with a state
– The model is a doubly embedded stochastic process with an underlying stochastic process that is not directly observable (hidden)
• What is hidden? The state sequence! Given the observation sequence, we are not sure which state sequence generated it!
11
Hidden Markov Models – Ex 1
A 3-state discrete HMM
– Given an observation sequence O={ABC}, there are 27 possible corresponding state sequences, and therefore the probability P(O|λ) is

P(O|λ) = Σ_{i=1}^{27} P(O, Q_i|λ) = Σ_{i=1}^{27} P(O|Q_i, λ) P(Q_i|λ),  Q_i: a state sequence

e.g. when Q_i = {S2, S2, S3}:
P(O|Q_i, λ) = b2(A) b2(B) b3(C) = 0.7 × 0.1 × 0.1 = 0.007
P(Q_i|λ) = π(S2) P(S2|S2) P(S3|S2) = 0.5 × 0.7 × 0.2 = 0.07

Model parameters (initial model):

π = (0.4, 0.5, 0.1)

A = | 0.6 0.3 0.1 |
    | 0.1 0.7 0.2 |
    | 0.3 0.2 0.5 |

b1 = {A:.3, B:.2, C:.5}, b2 = {A:.7, B:.1, C:.2}, b3 = {A:.3, B:.6, C:.1}
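The 27-term sum can be evaluated by brute force, exactly as defined on this slide. A minimal Python sketch, using the model parameters recovered above (the helper name direct_evaluation is ours):

```python
from itertools import product

# Discrete HMM from the slide (states S1..S3, symbols A/B/C)
pi = [0.4, 0.5, 0.1]
A = [[0.6, 0.3, 0.1],
     [0.1, 0.7, 0.2],
     [0.3, 0.2, 0.5]]
B = [{'A': 0.3, 'B': 0.2, 'C': 0.5},   # b_1
     {'A': 0.7, 'B': 0.1, 'C': 0.2},   # b_2
     {'A': 0.3, 'B': 0.6, 'C': 0.1}]   # b_3

def direct_evaluation(O):
    """Sum P(O|Q,lambda)P(Q|lambda) over all N^T state sequences."""
    total = 0.0
    for Q in product(range(3), repeat=len(O)):
        p = pi[Q[0]] * B[Q[0]][O[0]]           # pi_{q1} b_{q1}(o_1)
        for t in range(1, len(O)):
            p *= A[Q[t-1]][Q[t]] * B[Q[t]][O[t]]
        total += p
    return total

print(direct_evaluation("ABC"))   # sums all 27 path probabilities
```

The single path Q = {S2, S2, S3} contributes 0.07 × 0.007 = 0.00049 to this total, as in the slide's example.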
12
Hidden Markov Models – Ex 2
(Huang et al., 2001)
Given a three-state Hidden Markov Model for the Dow Jones Industrial average as follows:

How do we find the probability P(up, up, up, up, up|λ)? How do we find the optimal state sequence of the model that generates the observation sequence "up, up, up, up, up"?

cf. the Markov chain (3^5 = 243 state sequences can generate "up, up, up, up, up".)
13
Elements of an HMM
An HMM is characterized by the following:
1. N, the number of states in the model
2. M, the number of distinct observation symbols per state
3. The state transition probability distribution A = {a_ij}, where a_ij = P[q_{t+1} = j | q_t = i], 1≤i,j≤N
4. The observation symbol probability distribution in state j, B = {b_j(v_k)}, where b_j(v_k) = P[o_t = v_k | q_t = j], 1≤j≤N, 1≤k≤M
5. The initial state distribution π = {π_i}, where π_i = P[q_1 = i], 1≤i≤N

For convenience, we usually use the compact notation λ = (A, B, π) to indicate the complete parameter set of an HMM; this requires specification of two model parameters (N and M).
14
Two Major Assumptions for HMM
First-order Markov assumption
– The state transition depends only on the origin and destination states
– The state transition probability is time invariant:
a_ij = P(q_{t+1} = j | q_t = i), 1≤i,j≤N

P(Q|λ) = P(q_1, q_2, ..., q_T|λ) = P(q_1) ∏_{t=2}^{T} P(q_t|q_{t-1}) = π_{q_1} ∏_{t=2}^{T} a_{q_{t-1} q_t}

Output-independent assumption
– The observation depends only on the state that generates it, not on its neighboring observations:

P(O|Q, λ) = P(o_1, ..., o_T | q_1, ..., q_T, λ) = ∏_{t=1}^{T} P(o_t|q_t) = ∏_{t=1}^{T} b_{q_t}(o_t)
15
Three Basic Problems for HMMs
Given an observation sequence O = (o_1, o_2, ..., o_T) and an HMM λ = (A, B, π):

– Problem 1 (Evaluation): How do we compute P(O|λ) efficiently? e.g., P(up, up, up, up, up|λ)

– Problem 2 (Decoding): How do we choose an optimal state sequence Q = (q_1, q_2, ..., q_T) that best explains the observations?
Q* = argmax_Q P(Q, O|λ)

– Problem 3 (Learning/Training): How do we adjust the model parameters λ = (A, B, π) to maximize P(O|λ)?
λ* = argmax_λ P(O|λ)
16
Solution to Problem 1
17
Solution to Problem 1 - Direct Evaluation
Given O and λ, find P(O|λ) = Pr{observing O given λ} by evaluating all possible state sequences of length T that generate the observation sequence O:

P(O|λ) = Σ_{all Q} P(O, Q|λ) = Σ_{all Q} P(O|Q, λ) P(Q|λ)

P(Q|λ): the probability of the path Q. By the first-order Markov assumption,
P(Q|λ) = P(q_1) ∏_{t=2}^{T} P(q_t|q_{t-1}) = π_{q_1} a_{q_1 q_2} a_{q_2 q_3} ... a_{q_{T-1} q_T}

P(O|Q, λ): the joint output probability along the path Q. By the output-independent assumption,
P(O|Q, λ) = ∏_{t=1}^{T} P(o_t|q_t) = ∏_{t=1}^{T} b_{q_t}(o_t)
18
Solution to Problem 1 - Direct Evaluation (cont’d)
(Trellis diagram: states S1, S2, S3 at times 1, 2, 3, ..., T-1, T over observations o_1, o_2, o_3, ..., o_{T-1}, o_T. A shaded S_j means b_j(o_t) has been computed; a marked a_ij means a_ij has been computed. One example path contributes π_3 b_3(o_1) · a_32 b_2(o_2) · a_23 b_3(o_3) · ... · a_21 b_1(o_T).)
19
Solution to Problem 1 - Direct Evaluation (cont’d)
– A huge computation requirement: there are N^T state sequences, so the complexity is exponential, O(N^T)

P(O|λ) = Σ_{all Q} P(O|Q, λ) P(Q|λ)
= Σ_{q_1,q_2,...,q_T} π_{q_1} a_{q_1 q_2} a_{q_2 q_3} ... a_{q_{T-1} q_T} · b_{q_1}(o_1) b_{q_2}(o_2) ... b_{q_T}(o_T)
= Σ_{q_1,q_2,...,q_T} π_{q_1} b_{q_1}(o_1) a_{q_1 q_2} b_{q_2}(o_2) ... a_{q_{T-1} q_T} b_{q_T}(o_T)

Complexity: MUL: (2T-1)N^T; ADD: N^T - 1

A more efficient algorithm can be used to evaluate P(O|λ): the forward procedure/algorithm.
20
Solution to Problem 1 - The Forward Procedure
Based on the HMM assumptions, the calculation of P(q_t|q_{t-1}, λ) and P(o_t|q_t, λ) involves only q_{t-1}, q_t, and o_t, so it is possible to compute the likelihood P(O|λ) with recursion on t.

Forward variable:
α_t(i) = P(o_1, o_2, ..., o_t, q_t = i|λ)
– The probability of the joint event that o_1, o_2, ..., o_t are observed and the state at time t is i, given the model λ

α_{t+1}(j) = P(o_1, o_2, ..., o_t, o_{t+1}, q_{t+1} = j|λ) = [Σ_{i=1}^{N} α_t(i) a_ij] b_j(o_{t+1})
21
Solution to Problem 1 - The Forward Procedure (cont'd)

α_{t+1}(j) = P(o_1, o_2, ..., o_t, o_{t+1}, q_{t+1} = j|λ)
= P(o_1, ..., o_t, q_{t+1} = j|λ) P(o_{t+1}|o_1, ..., o_t, q_{t+1} = j, λ)
= P(o_1, ..., o_t, q_{t+1} = j|λ) P(o_{t+1}|q_{t+1} = j, λ)       (output-independent assumption)
= [Σ_{i=1}^{N} P(o_1, ..., o_t, q_t = i, q_{t+1} = j|λ)] b_j(o_{t+1})
= [Σ_{i=1}^{N} P(o_1, ..., o_t, q_t = i|λ) P(q_{t+1} = j|o_1, ..., o_t, q_t = i, λ)] b_j(o_{t+1})
= [Σ_{i=1}^{N} P(o_1, ..., o_t, q_t = i|λ) P(q_{t+1} = j|q_t = i, λ)] b_j(o_{t+1})   (first-order Markov assumption)
= [Σ_{i=1}^{N} α_t(i) a_ij] b_j(o_{t+1})

The derivation uses P(A, B) = P(A|B) P(B) = P(B|A) P(A) and the marginalization P(A) = Σ_{all B} P(A, B), together with the output-independent assumption (P(o_{t+1}|o_1,...,o_t,q_{t+1}=j,λ) = P(o_{t+1}|q_{t+1}=j,λ) = b_j(o_{t+1})) and the first-order Markov assumption (P(q_{t+1}=j|o_1,...,o_t,q_t=i,λ) = P(q_{t+1}=j|q_t=i,λ) = a_ij).
22
Solution to Problem 1 - The Forward Procedure (cont’d)
α_3(2) = P(o_1, o_2, o_3, q_3 = 2|λ)
= [α_2(1)·a_12 + α_2(2)·a_22 + α_2(3)·a_32] b_2(o_3)

(Trellis diagram: the forward variables α_2(1), α_2(2), α_2(3) at time 2 are combined through a_12, a_22, a_32 and scaled by b_2(o_3) to give α_3(2); the time index runs left to right and the state index top to bottom. A shaded S_j means b_j(o_t) has been computed; a marked a_ij means a_ij has been computed.)
23
Solution to Problem 1 - The Forward Procedure (cont’d)
Algorithm

1. Initialization: α_1(i) = P(o_1, q_1 = i|λ) = π_i b_i(o_1), 1≤i≤N
2. Induction: α_{t+1}(j) = [Σ_{i=1}^{N} α_t(i) a_ij] b_j(o_{t+1}), 1≤t≤T-1, 1≤j≤N
3. Termination: P(O|λ) = Σ_{i=1}^{N} α_T(i)

– Complexity: O(N^2 T) (MUL: N(N+1)(T-1)+N; ADD: N(N-1)(T-1)+N-1), cf. O(N^T) for direct evaluation

Based on the lattice (trellis) structure:
– Computed in a time-synchronous fashion from left to right, where each cell for time t is completely computed before proceeding to time t+1
– All state sequences, regardless of how long previously, merge to N nodes (states) at each time instant t
24
Solution to Problem 1 - The Forward Procedure (cont’d)
A three-state Hidden Markov Model for the Dow Jones Industrial average (Huang et al., 2001):

π_1 = 0.5, π_2 = 0.2, π_3 = 0.3
b_1(up) = 0.7, b_2(up) = 0.1, b_3(up) = 0.3
a_11 = 0.6, a_12 = 0.2, a_13 = 0.2
a_21 = 0.5, a_22 = 0.3, a_23 = 0.2
a_31 = 0.4, a_32 = 0.1, a_33 = 0.5

α_1(1) = 0.5×0.7 = 0.35
α_1(2) = 0.2×0.1 = 0.02
α_1(3) = 0.3×0.3 = 0.09

α_2(1) = (0.35×0.6 + 0.02×0.5 + 0.09×0.4)×0.7 = 0.1792
α_2(2) = (0.35×0.2 + 0.02×0.3 + 0.09×0.1)×0.1 = 0.0085
α_2(3) = (0.35×0.2 + 0.02×0.2 + 0.09×0.5)×0.3 = 0.0357

P(up, up|λ) = α_2(1) + α_2(2) + α_2(3) = 0.2234
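The forward procedure on this slide's model can be sketched in a few lines of Python (the function name forward and the one-symbol reduction of B are our conventions, not from the slides):

```python
def forward(pi, A, B, O):
    """Forward procedure: alpha[t][i] = P(o_1..o_t, q_t=i | lambda). O(N^2 T)."""
    N = len(pi)
    alpha = [[pi[i] * B[i][O[0]] for i in range(N)]]          # initialization
    for t in range(1, len(O)):                                # induction
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(N)) * B[j][O[t]]
                      for j in range(N)])
    return alpha, sum(alpha[-1])                              # termination

# Dow Jones model from the slides; observations are "up" only here,
# so B is reduced to the b_i(up) column.
pi = [0.5, 0.2, 0.3]
A = [[0.6, 0.2, 0.2],
     [0.5, 0.3, 0.2],
     [0.4, 0.1, 0.5]]
B = [{'up': 0.7}, {'up': 0.1}, {'up': 0.3}]

alpha, p = forward(pi, A, B, ['up', 'up'])
print(alpha[0])   # alpha_1 = (0.35, 0.02, 0.09), as on the slide
print(p)          # P(up, up | lambda)
```

Each time step touches all N×N transitions once, which is where the O(N^2 T) complexity comes from.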
25
Solution to Problem 2
26
Solution to Problem 2 - The Viterbi Algorithm
The Viterbi algorithm can be regarded as dynamic programming applied to the HMM, or as a modified forward algorithm:
– Instead of summing probabilities from different paths coming to the same destination state, the Viterbi algorithm picks and remembers the best path
• Find the single optimal state sequence Q* = argmax_Q P(Q, O|λ)
– The Viterbi algorithm can also be illustrated in a trellis framework similar to the one for the forward algorithm
27
Solution to Problem 2 - The Viterbi Algorithm (cont’d)
(Trellis diagram: states S1, S2, S3 at times 1, 2, 3, ..., T-1, T over observations o_1, o_2, o_3, ..., o_{T-1}, o_T; only the best path into each state at each time is kept.)
28
Solution to Problem 2 - The Viterbi Algorithm (cont’d)
1. Initialization:
δ_1(i) = π_i b_i(o_1), 1≤i≤N
ψ_1(i) = 0, 1≤i≤N

2. Induction:
δ_{t+1}(j) = max_{1≤i≤N} [δ_t(i) a_ij] b_j(o_{t+1}), 1≤t≤T-1, 1≤j≤N
ψ_{t+1}(j) = argmax_{1≤i≤N} [δ_t(i) a_ij], 1≤t≤T-1, 1≤j≤N

3. Termination:
P* = max_{1≤i≤N} δ_T(i)
q_T* = argmax_{1≤i≤N} δ_T(i)

4. Backtracking:
q_t* = ψ_{t+1}(q_{t+1}*), t = T-1, T-2, ..., 1
Q* = (q_1*, q_2*, ..., q_T*) is the best state sequence

Complexity: O(N^2 T)

cf. the forward recursion α_{t+1}(j) = [Σ_{i=1}^{N} α_t(i) a_ij] b_j(o_{t+1}) and P(O|λ) = Σ_{i=1}^{N} α_T(i)
29
Solution to Problem 2 - The Viterbi Algorithm (cont'd)

A three-state Hidden Markov Model for the Dow Jones Industrial average (Huang et al., 2001):

π_1 = 0.5, π_2 = 0.2, π_3 = 0.3
b_1(up) = 0.7, b_2(up) = 0.1, b_3(up) = 0.3
a_11 = 0.6, a_12 = 0.2, a_13 = 0.2
a_21 = 0.5, a_22 = 0.3, a_23 = 0.2
a_31 = 0.4, a_32 = 0.1, a_33 = 0.5

δ_1(1) = 0.5×0.7 = 0.35
δ_1(2) = 0.2×0.1 = 0.02
δ_1(3) = 0.3×0.3 = 0.09

δ_2(1) = max(0.35×0.6, 0.02×0.5, 0.09×0.4)×0.7 = 0.35×0.6×0.7 = 0.147, ψ_2(1) = 1
δ_2(2) = max(0.35×0.2, 0.02×0.3, 0.09×0.1)×0.1 = 0.35×0.2×0.1 = 0.007, ψ_2(2) = 1
δ_2(3) = max(0.35×0.2, 0.02×0.2, 0.09×0.5)×0.3 = 0.35×0.2×0.3 = 0.021, ψ_2(3) = 1

q_2* = argmax_{1≤i≤3} δ_2(i) = 1, and q_1* = ψ_2(q_2*) = ψ_2(1) = 1

The most likely state sequence that generates "up up" is therefore 1 1.
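The same decoding can be sketched in Python (a minimal sketch; states are 0-indexed in the code, so the slide's "1 1" appears as [0, 0]):

```python
def viterbi(pi, A, B, O):
    """Viterbi decoding: best path maximizing P(Q, O | lambda). O(N^2 T)."""
    N = len(pi)
    delta = [pi[i] * B[i][O[0]] for i in range(N)]   # initialization
    psi = []                                         # back-pointers per time step
    for t in range(1, len(O)):                       # induction
        step, back = [], []
        for j in range(N):
            i_best = max(range(N), key=lambda i: delta[i] * A[i][j])
            back.append(i_best)
            step.append(delta[i_best] * A[i_best][j] * B[j][O[t]])
        delta, psi = step, psi + [back]
    q = [max(range(N), key=lambda i: delta[i])]      # termination
    for back in reversed(psi):                       # backtracking
        q.append(back[q[-1]])
    return list(reversed(q)), max(delta)

# Dow Jones model from the slide
pi = [0.5, 0.2, 0.3]
A = [[0.6, 0.2, 0.2], [0.5, 0.3, 0.2], [0.4, 0.1, 0.5]]
B = [{'up': 0.7}, {'up': 0.1}, {'up': 0.3}]

path, p = viterbi(pi, A, B, ['up', 'up'])
print(path, p)   # [0, 0] corresponds to the slide's state sequence "1 1"
```

Replacing the forward algorithm's sum with a max (plus back-pointers) is the only structural change, so the cost stays O(N^2 T).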
30
Some Examples
31
Isolated Digit Recognition
(Trellis diagram: for isolated digit recognition, one left-to-right HMM is trained per digit, e.g. λ_0 and λ_1. A test utterance O = o_1 o_2 ... o_T is scored against each model, e.g. P(O|λ_0) = α_T(3) and P(O|λ_1) = α_T(3) computed in the respective three-state models, and the digit whose model gives the highest probability is chosen.)
32
Continuous Digit Recognition
(Trellis diagram: for continuous digit recognition, the per-digit models, e.g. states S1-S3 for one digit and S4-S6 for another, are connected so that the end of one digit model can transition into the start of another. Paths may end in either model, so both α_T(3) and α_T(6) appear at the final time.)
33
Continuous Digit Recognition (cont’d)
(Trellis diagram over times 1-9 for two connected 3-state digit models, states S1-S3 and S4-S6. Backtracking from the best final state, e.g. δ_8(6), yields the best state sequence, such as S1 S1 S2 S3 S3 S4 S5 S5 S6, which also segments the utterance into the two digits.)
34
CpG Islands
Two questions:
Q1: Given a short sequence, does it come from a CpG island?
Q2: Given a long sequence, how would we find the CpG islands in it?
35
CpG Islands - Answer to Q1:
– Given sequence x, probabilistic model M1 of CpG islands, and probabilistic model M2 for non-CpG island regions
– Compute p1 = P(x|M1) and p2 = P(x|M2)
– If p1 > p2, then x comes from a CpG island (CpG+)
– If p2 > p1, then x does not come from a CpG island (CpG-)
S1:A S2:C
S3:T S4:G
CpG+ A C G T
A 0.180 0.274 0.426 0.120
C 0.171 0.368 0.274 0.188
G 0.161 0.339 0.375 0.125
T 0.079 0.355 0.384 0.182
CpG- A C G T
A 0.300 0.205 0.285 0.210
C 0.322 0.298 0.078 0.302
G 0.248 0.246 0.298 0.208
T 0.177 0.239 0.292 0.292
Large CG transition probability
vs.
Small CG transition probability
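The Q1 test can be sketched as a pair of first-order Markov models built from the two transition tables above. This is a minimal Python sketch; the slide gives no initial probabilities, so it scores transitions only (an assumption we make here), and the function names are ours:

```python
import math

# Transition tables from the slide: row = current base, column = next base
CPG_PLUS = {'A': {'A': 0.180, 'C': 0.274, 'G': 0.426, 'T': 0.120},
            'C': {'A': 0.171, 'C': 0.368, 'G': 0.274, 'T': 0.188},
            'G': {'A': 0.161, 'C': 0.339, 'G': 0.375, 'T': 0.125},
            'T': {'A': 0.079, 'C': 0.355, 'G': 0.384, 'T': 0.182}}
CPG_MINUS = {'A': {'A': 0.300, 'C': 0.205, 'G': 0.285, 'T': 0.210},
             'C': {'A': 0.322, 'C': 0.298, 'G': 0.078, 'T': 0.302},
             'G': {'A': 0.248, 'C': 0.246, 'G': 0.298, 'T': 0.208},
             'T': {'A': 0.177, 'C': 0.239, 'G': 0.292, 'T': 0.292}}

def log_likelihood(x, M):
    """Sum of log transition probabilities of sequence x under model M."""
    return sum(math.log(M[a][b]) for a, b in zip(x, x[1:]))

def is_cpg_island(x):
    """p1 > p2 decision rule from the slide, in log space."""
    return log_likelihood(x, CPG_PLUS) > log_likelihood(x, CPG_MINUS)

print(is_cpg_island("CGCGCG"))   # CG-rich, so the CpG+ model wins: True
```

Working in log space avoids underflow on long sequences; the comparison is unchanged because log is monotonic.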
36
CpG Islands - Answer to Q2:
Use a two-state HMM whose hidden states mark CpG- and CpG+ regions:

S1: A: 0.3, C: 0.2, G: 0.2, T: 0.3
S2: A: 0.2, C: 0.3, G: 0.3, T: 0.2
p11 = 0.99999, p12 = 0.00001, p21 = 0.0001, p22 = 0.9999

(S2, with the higher C/G emission probabilities, plays the CpG+ role; S1 plays the CpG- role.)

Observable: … A C T C G A G T A …
Hidden:     … S1 S1 S1 S1 S2 S2 S2 S2 S1 …

Decoding the hidden state sequence locates the CpG islands in a long sequence.
37
A Toy Example: 5’ Splice Site Recognition
The 5' splice site indicates the "switch" from an exon to an intron.

Assumptions:
– Uniform base composition on average in exons (25% each base)
– Introns are A/T rich (40% each for A/T, 10% each for C/G)
– The 5'SS consensus nucleotide is almost always a G (say, 95% G and 5% A)
From “What is a hidden Markov Model?”, by Sean R. Eddy
38
A Toy Example: 5’ Splice Site Recognition
39
Solution to Problem 3
40
Solution to Problem 3 – Maximum Likelihood Estimation of Model Parameters
How to adjust (re-estimate) the model parameters λ = (A, B, π) to maximize P(O|λ)?
– This is the most difficult of the three problems, because there is no known analytical method that maximizes the joint probability of the training data in closed form
• The data is incomplete because of the hidden state sequence
– The problem can be solved by the iterative Baum-Welch algorithm, also known as the forward-backward algorithm
• The EM (Expectation-Maximization) algorithm is perfectly suitable for this problem
– Alternatively, it can be solved by the iterative segmental K-means algorithm
• The model parameters are adjusted to maximize P(O, Q*|λ), where Q* is the state sequence given by the Viterbi algorithm
• This provides a good initialization for Baum-Welch training
41
Solution to Problem 3 – The Segmental K-means Algorithm
Assume that we have a training set of observations and an initial estimate of the model parameters.

– Step 1: Segment the training data. The set of training observation sequences is segmented into states, based on the current model, by the Viterbi algorithm.

– Step 2: Re-estimate the model parameters by counting:

π_i = (number of times q_1 = i) / (number of training sequences)
a_ij = (number of transitions from state i to state j) / (number of transitions from state i)
b_j(k) = (number of observations of symbol "k" in state j) / (number of observations in state j)

– Step 3: Evaluate the model. If the difference between the new and current model scores exceeds a threshold, go back to Step 1; otherwise, stop.
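Step 2's counting formulae can be sketched directly. A minimal Python sketch; the function name and the tiny labeled data set are hypothetical, standing in for the output of Viterbi segmentation in Step 1:

```python
from collections import Counter

def reestimate(labeled):
    """Count-based re-estimation: `labeled` is a list of
    (observations, state_sequence) pairs from Viterbi segmentation."""
    init, trans, emit, occ = Counter(), Counter(), Counter(), Counter()
    for obs, states in labeled:
        init[states[0]] += 1                       # q_1 counts for pi
        for s, s_next in zip(states, states[1:]):
            trans[(s, s_next)] += 1                # transition counts for A
        for o, s in zip(obs, states):
            emit[(s, o)] += 1                      # emission counts for B
            occ[s] += 1
    pi = {s: c / len(labeled) for s, c in init.items()}
    from_s = Counter()
    for (s, _), c in trans.items():
        from_s[s] += c
    A = {sp: c / from_s[sp[0]] for sp, c in trans.items()}
    B = {sk: c / occ[sk[0]] for sk, c in emit.items()}
    return pi, A, B

# hypothetical segmented training data (two sequences over states 1..3)
data = [("AABB", [1, 1, 2, 2]), ("ABBA", [1, 2, 2, 3])]
pi, A, B = reestimate(data)
print(pi, A[(1, 2)], B[(2, 'B')])
```

With labeled training data (the slide's closing question), the same counting applies directly with no Viterbi segmentation step.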
42
Solution to Problem 3 – The Segmental K-means Algorithm (cont’d)
3 states and 2 codewords (A, B)

(Figure: ten training observations O_1 ... O_10 over times 1-10, segmented into states s1, s2, s3 by the Viterbi algorithm.)

Re-estimated parameters:
π_1 = 1, π_2 = π_3 = 0
a_11 = 3/4, a_12 = 1/4
a_22 = 2/3, a_23 = 1/3
a_33 = 1
b_1(A) = 3/4, b_1(B) = 1/4
b_2(A) = 1/3, b_2(B) = 2/3
b_3(A) = 2/3, b_3(B) = 1/3

What if the training data is labeled?
43
Solution to Problem 3 – The Backward Procedure
Backward variable:
β_t(i) = P(o_{t+1}, o_{t+2}, ..., o_T | q_t = i, λ)
– The probability of the partial observation sequence o_{t+1}, o_{t+2}, ..., o_T, given state i at time t and the model λ

e.g. β_2(3) = P(o_3, o_4, ..., o_T | q_2 = 3, λ)
= a_31 b_1(o_3) β_3(1) + a_32 b_2(o_3) β_3(2) + a_33 b_3(o_3) β_3(3)

(Trellis diagram: β_2(3) collects the terms a_3j b_j(o_3) β_3(j) from time 3 onward.)
44
Solution to Problem 3 – The Backward Procedure (cont’d)
Algorithm

1. Initialization: β_T(i) = 1, 1≤i≤N
2. Induction: β_t(i) = Σ_{j=1}^{N} a_ij b_j(o_{t+1}) β_{t+1}(j), t = T-1, T-2, ..., 1, 1≤i≤N

– Complexity: MUL: 2N^2(T-1); ADD: N(N-1)(T-1), i.e., O(N^2 T)

P(O|λ) can also be obtained from the backward variables:

P(O|λ) = Σ_{i=1}^{N} P(o_1, o_2, ..., o_T, q_1 = i|λ)
= Σ_{i=1}^{N} P(q_1 = i|λ) P(o_1|q_1 = i, λ) P(o_2, o_3, ..., o_T|q_1 = i, λ)
= Σ_{i=1}^{N} π_i b_i(o_1) β_1(i)

cf. P(O|λ) = Σ_{i=1}^{N} α_T(i)
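The backward procedure mirrors the forward one; a minimal Python sketch, again on the Dow Jones model from the earlier slides (the function name backward is ours):

```python
def backward(pi, A, B, O):
    """Backward procedure: beta[t][i] = P(o_{t+1}..o_T | q_t=i, lambda)."""
    N = len(pi)
    beta = [[1.0] * N]                          # initialization: beta_T(i) = 1
    for t in range(len(O) - 2, -1, -1):         # induction, t = T-1 .. 1
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][O[t + 1]] * nxt[j] for j in range(N))
                        for i in range(N)])
    # P(O|lambda) = sum_i pi_i b_i(o_1) beta_1(i)
    p = sum(pi[i] * B[i][O[0]] * beta[0][i] for i in range(N))
    return beta, p

# Dow Jones model from the earlier slides
pi = [0.5, 0.2, 0.3]
A = [[0.6, 0.2, 0.2], [0.5, 0.3, 0.2], [0.4, 0.1, 0.5]]
B = [{'up': 0.7}, {'up': 0.1}, {'up': 0.3}]

beta, p = backward(pi, A, B, ['up', 'up'])
print(p)   # matches the forward result for P(up, up | lambda)
```

That the forward and backward routes give the same P(O|λ) is exactly the "cf." identity above.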
45
Solution to Problem 3 – The Forward-Backward Algorithm
Relation between the forward and backward variables:

α_t(i) = P(o_1, o_2, ..., o_t, q_t = i|λ), with α_{t+1}(j) = [Σ_{i=1}^{N} α_t(i) a_ij] b_j(o_{t+1})
β_t(i) = P(o_{t+1}, ..., o_T | q_t = i, λ), with β_t(i) = Σ_{j=1}^{N} a_ij b_j(o_{t+1}) β_{t+1}(j)

α_t(i) β_t(i) = P(O, q_t = i|λ)

P(O|λ) = Σ_{i=1}^{N} α_t(i) β_t(i), for any t

(Huang et al., 2001)
46
Solution to Problem 3 – The Forward-Backward Algorithm (cont’d)
α_t(i) β_t(i)
= P(o_1, o_2, ..., o_t, q_t = i|λ) P(o_{t+1}, ..., o_T | q_t = i, λ)
= P(o_1, o_2, ..., o_t | q_t = i, λ) P(q_t = i|λ) P(o_{t+1}, ..., o_T | q_t = i, λ)
= P(o_1, o_2, ..., o_T | q_t = i, λ) P(q_t = i|λ)       (output-independent assumption)
= P(o_1, o_2, ..., o_T, q_t = i|λ)
= P(O, q_t = i|λ)

Hence P(O|λ) = Σ_{i=1}^{N} P(O, q_t = i|λ) = Σ_{i=1}^{N} α_t(i) β_t(i).
47
Solution to Problem 3 – The Intuitive View
Define two new variables:

γ_t(i) = P(q_t = i | O, λ): the probability of being in state i at time t, given O and λ

γ_t(i) = P(q_t = i, O|λ) / P(O|λ) = α_t(i) β_t(i) / P(O|λ) = α_t(i) β_t(i) / Σ_{i=1}^{N} α_t(i) β_t(i)

ξ_t(i, j) = P(q_t = i, q_{t+1} = j | O, λ): the probability of being in state i at time t and state j at time t+1, given O and λ

ξ_t(i, j) = P(q_t = i, q_{t+1} = j, O|λ) / P(O|λ)
= α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j) / Σ_{m=1}^{N} Σ_{n=1}^{N} α_t(m) a_mn b_n(o_{t+1}) β_{t+1}(n)

Note that γ_t(i) = Σ_{j=1}^{N} ξ_t(i, j).
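The two definitions and the identity γ_t(i) = Σ_j ξ_t(i, j) can be checked numerically. A minimal Python sketch on the Dow Jones model, observing "up" three times (variable names are ours):

```python
# Dow Jones model from the earlier slides; "up" observations only
pi = [0.5, 0.2, 0.3]
A = [[0.6, 0.2, 0.2], [0.5, 0.3, 0.2], [0.4, 0.1, 0.5]]
b_up = [0.7, 0.1, 0.3]
O = ['up', 'up', 'up']
N, T = 3, len(O)

# forward variables alpha[t][i]
alpha = [[pi[i] * b_up[i] for i in range(N)]]
for t in range(1, T):
    alpha.append([sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * b_up[j]
                  for j in range(N)])

# backward variables beta[t][i]
beta = [[1.0] * N for _ in range(T)]
for t in range(T - 2, -1, -1):
    beta[t] = [sum(A[i][j] * b_up[j] * beta[t + 1][j] for j in range(N))
               for i in range(N)]

p = sum(alpha[T - 1])   # P(O|lambda)
gamma = [[alpha[t][i] * beta[t][i] / p for i in range(N)] for t in range(T)]
xi = [[[alpha[t][i] * A[i][j] * b_up[j] * beta[t + 1][j] / p
        for j in range(N)] for i in range(N)] for t in range(T - 1)]

print(gamma[0])   # posterior state probabilities at t=1; each row sums to 1
```

Each γ_t is a proper distribution over states, and summing ξ_t(i, j) over j recovers γ_t(i), exactly as stated above.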
48
Solution to Problem 3 – The Intuitive View (cont’d)
P(q_3 = 1, O|λ) = α_3(1) β_3(1)

(Trellis diagram: α_3(1) covers all partial paths up to time 3 that end in state 1; β_3(1) covers all partial paths from time 3 onward that start in state 1.)
49
Solution to Problem 3 – The Intuitive View (cont’d)
P(q_3 = 1, q_4 = 3, O|λ) = α_3(1) a_13 b_3(o_4) β_4(3)

(Trellis diagram: α_3(1) covers the paths up to time 3 ending in state 1; the transition a_13 and emission b_3(o_4) link it to β_4(3), which covers the paths from time 4 onward starting in state 3.)
50
Solution to Problem 3 – The Intuitive View (cont’d)
ξ_t(i, j) = P(q_t = i, q_{t+1} = j | O, λ)
γ_t(i) = P(q_t = i | O, λ)

Σ_{t=1}^{T-1} ξ_t(i, j) = expected number of transitions from state i to state j in O

Σ_{t=1}^{T-1} γ_t(i) = expected number of transitions from state i in O
51
Solution to Problem 3 – The Intuitive View (cont’d)
Re-estimation formulae for π, A, and B:

π̄_i = expected frequency (number of times) in state i at time t=1 = γ_1(i)

ā_ij = (expected number of transitions from state i to state j) / (expected number of transitions from state i)
= Σ_{t=1}^{T-1} ξ_t(i, j) / Σ_{t=1}^{T-1} γ_t(i)

b̄_j(v_k) = (expected number of times in state j observing symbol v_k) / (expected number of times in state j)
= Σ_{t=1, s.t. o_t = v_k}^{T} γ_t(j) / Σ_{t=1}^{T} γ_t(j)
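One full Baum-Welch iteration combines the forward-backward pass with these three formulae. A minimal Python sketch on the Dow Jones model; the slides give only b_i(up), so the "down" emission probabilities here are made-up values that complete B, and the function name is ours:

```python
def baum_welch_step(pi, A, B, O):
    """One Baum-Welch re-estimation step for a single observation sequence."""
    N, T = len(pi), len(O)
    # forward pass
    alpha = [[pi[i] * B[i][O[0]] for i in range(N)]]
    for t in range(1, T):
        alpha.append([sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * B[j][O[t]]
                      for j in range(N)])
    # backward pass
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][O[t + 1]] * beta[t + 1][j] for j in range(N))
                   for i in range(N)]
    p = sum(alpha[-1])
    gamma = [[alpha[t][i] * beta[t][i] / p for i in range(N)] for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][O[t + 1]] * beta[t + 1][j] / p
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    # re-estimation formulae from the slide
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1)) for j in range(N)]
             for i in range(N)]
    new_B = [{v: sum(gamma[t][j] for t in range(T) if O[t] == v) /
                 sum(gamma[t][j] for t in range(T)) for v in set(O)}
             for j in range(N)]
    return new_pi, new_A, new_B, p

pi = [0.5, 0.2, 0.3]
A = [[0.6, 0.2, 0.2], [0.5, 0.3, 0.2], [0.4, 0.1, 0.5]]
# b_i(up) from the slides; b_i(down) filled in hypothetically to complete B
B = [{'up': 0.7, 'down': 0.3}, {'up': 0.1, 'down': 0.9}, {'up': 0.3, 'down': 0.7}]
O = ['up', 'down', 'up', 'up']

new_pi, new_A, new_B, p1 = baum_welch_step(pi, A, B, O)
print(p1)   # P(O|lambda) under the current model
```

Because Baum-Welch is an EM algorithm, iterating the step never decreases P(O|λ), and the re-estimated rows of A and B remain proper distributions.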