Hidden Markov Models
End of the Semester Levity
JOKE 1: A mathematician, an engineer, and a physicist are being interviewed for a job. In each case, the interview goes along famously until the last question is asked: "How much is one plus one?"
Each of them suspects a trap and is hesitant to answer.
The mathematician thinks for a moment and says, "I'm not sure, but I think it converges."
The physicist says, "I'm not sure, but I think it's on the order of one."
The engineer gets up, closes the door to the office, and says, "How much do you want it to be?"
JOKE 2: A biologist, a statistician, a mathematician, and a computer scientist are on a photo safari in Africa. They drive out into the savannah in their jeep, stop, and scour the horizon with their binoculars.
The biologist: "Look! There's a herd of zebras! And there, in the middle: a white zebra! It's fantastic! There are white zebras! We'll be famous!"
The statistician: "It's not significant. We only know there's one white zebra."
The mathematician: "Actually, we know there exists a zebra which is white on one side."
The computer scientist: "Oh no! A special case!"
Time Series Data
• You are given a collection of labelled points in some order: { (y_1, x_1), (y_2, x_2), ..., (y_N, x_N) }, where the x_i are category labels.
• In time series data, independence is violated.
• In time series data, order matters.
p(y_1, y_2, ..., y_n, x_1, x_2, ..., x_n) ≠ p(y_1 | x_1) p(y_2 | x_2) ⋯ p(y_n | x_n)

Basic decomposition for any distribution:
p(y_1, x_1, y_2, x_2, ..., y_n, x_n) = p(y_n, x_n | x_{n-1}, y_{n-1}, ..., y_1, x_1) · p(x_{n-1}, y_{n-1} | x_{n-2}, y_{n-2}, ..., y_1, x_1) ⋯ p(y_1 | x_1)

Markov Assumption:
p(y_1, x_1, y_2, x_2, ..., y_n, x_n) = p(y_n, x_n | x_{n-1}, y_{n-1}) p(x_{n-1}, y_{n-1} | x_{n-2}, y_{n-2}) ⋯ p(y_1 | x_1)
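To make the factorization concrete, here is a minimal sketch in Python/NumPy that evaluates the joint probability of a labelled sequence under the Markov assumption, further specialized (as in an HMM) so that p(y_t, x_t | x_{t-1}, y_{t-1}) = p(x_t | x_{t-1}) p(y_t | x_t). The toy numbers and the names A, B, pi are illustrative, not from the slides.

```python
import numpy as np

# Toy two-state model (made-up numbers): states x_t in {0, 1}, observations y_t in {0, 1, 2}.
A  = np.array([[0.9, 0.1],          # A[i, j] = p(x_t = j | x_{t-1} = i)
               [0.2, 0.8]])
B  = np.array([[0.5, 0.4, 0.1],     # B[i, k] = p(y_t = k | x_t = i)
               [0.1, 0.3, 0.6]])
pi = np.array([0.5, 0.5])           # p(x_1)

def joint_prob(x, y):
    """p(y_1, x_1, ..., y_n, x_n) under the Markov factorization:
    p(x_1) p(y_1 | x_1) * prod_t p(x_t | x_{t-1}) p(y_t | x_t)."""
    p = pi[x[0]] * B[x[0], y[0]]
    for t in range(1, len(x)):
        p *= A[x[t - 1], x[t]] * B[x[t], y[t]]
    return p

x = [0, 0, 1, 1]    # a state (label) sequence
y = [0, 1, 2, 2]    # the corresponding observations
print(joint_prob(x, y))
```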
Time Series Data
[Figure: two time-series plots of amplitude vs. time: Composite Index and Utility Index.]
ECG Data
[Figure: ECG time series, amplitude vs. time (units of 1/180 sec).]
Time Series Data with Hidden State
[Figure: the same Composite Index and Utility Index time series, now annotated with a hidden-state interval labelled "Holiday Season".]
ECG Data with Hidden State
[Figure: the same ECG time series (amplitude vs. time, 1/180 sec), now annotated with a hidden-state interval labelled "Nicotine Inhalation".]
Example: The Dishonest Casino
A casino has two dice:
• Fair die: P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6
• Loaded die: P(1) = P(2) = P(3) = P(4) = P(5) = 1/10, P(6) = 1/2
The casino player switches back and forth between the fair and the loaded die once every 20 turns.
Game:
1. You bet $1
2. You roll (always with a fair die)
3. The casino player rolls (maybe with the fair die, maybe with the loaded die)
4. Highest number wins $2
Problem 1: Evaluation
GIVEN: a sequence of rolls by the casino player
1245526462146146136136661664661636616366163616515615115146123562344
QUESTION: How likely is this sequence, given our model of how the casino works?
This is the EVALUATION problem in HMMs.
Problem 2 – Decoding
GIVEN: a sequence of rolls by the casino player
1245526462146146136136661664661636616366163616515615115146123562344
QUESTION: What portion of the sequence was generated with the fair die, and what portion with the loaded die?
This is the DECODING question in HMMs.
Problem 3 – Learning
GIVEN: a sequence of rolls by the casino player
1245526462146146136136661664661636616366163616515615115146123562344
QUESTION: How "loaded" is the loaded die? How "fair" is the fair die? How often does the casino player change from fair to loaded, and back?
This is the LEARNING question in HMMs.
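As a rough illustration of what "learning" would mean here, the Python sketch below treats the easy, fully observed case: if we also knew which die produced each roll, the maximum-likelihood estimates of the emission and transition probabilities are just normalized counts. The data in the sketch are made up. In the actual problem the die sequence is hidden, so these counts must be replaced by expected counts (the Baum-Welch / EM algorithm), which is not shown here.

```python
import numpy as np

# Hypothetical fully labelled data: the rolls (1-6) and which die produced each roll.
rolls = np.array([1, 6, 6, 3, 6, 2, 6, 6, 5, 6])   # observations
dies  = np.array([0, 0, 1, 1, 1, 1, 1, 1, 0, 0])   # 0 = fair, 1 = loaded (hidden in reality)

n_states, n_symbols = 2, 6

# Emission probabilities: how often each die shows each face.
emit = np.zeros((n_states, n_symbols))
for s, r in zip(dies, rolls):
    emit[s, r - 1] += 1
emit /= emit.sum(axis=1, keepdims=True)

# Transition probabilities: how often the player switches dice.
trans = np.zeros((n_states, n_states))
for s_prev, s_next in zip(dies[:-1], dies[1:]):
    trans[s_prev, s_next] += 1
trans /= trans.sum(axis=1, keepdims=True)

print("estimated emission probs:\n", emit)
print("estimated transition probs:\n", trans)
```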
The dishonest casino model

Two states, FAIR and LOADED, with transition probabilities:
P(FAIR → FAIR) = 0.95, P(FAIR → LOADED) = 0.05
P(LOADED → LOADED) = 0.95, P(LOADED → FAIR) = 0.05

Emission probabilities:
Fair die:   P(1|F) = P(2|F) = P(3|F) = P(4|F) = P(5|F) = P(6|F) = 1/6
Loaded die: P(1|L) = P(2|L) = P(3|L) = P(4|L) = P(5|L) = 1/10, P(6|L) = 1/2
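Written out as data structures, with a small sampler, the model might look like the following Python/NumPy sketch. The function and variable names are mine, and the uniform initial-state distribution is an assumption the slides do not state.

```python
import numpy as np

rng = np.random.default_rng(0)

states = ["FAIR", "LOADED"]
A  = np.array([[0.95, 0.05],           # transition probs: rows = current state,
               [0.05, 0.95]])          # columns = next state
B  = np.array([[1/6] * 6,              # emission probs for faces 1..6, fair die
               [1/10] * 5 + [1/2]])    # loaded die
pi = np.array([0.5, 0.5])              # initial state distribution (assumed uniform)

def sample(T):
    """Generate T rolls from the dishonest-casino HMM."""
    x = np.empty(T, dtype=int)         # hidden states
    y = np.empty(T, dtype=int)         # observed rolls (1..6)
    x[0] = rng.choice(2, p=pi)
    for t in range(T):
        if t > 0:
            x[t] = rng.choice(2, p=A[x[t - 1]])
        y[t] = rng.choice(6, p=B[x[t]]) + 1
    return x, y

hidden, rolls = sample(100)
print("".join(map(str, rolls)))                # what the player sees
print("".join("FL"[s] for s in hidden))        # which die was actually used
```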
The three main questions on HMMs

1. Evaluation
GIVEN an HMM M and a sequence x,
FIND Prob( x | M )

2. Decoding
GIVEN an HMM M and a sequence x,
FIND the sequence q of states that maximizes P( x, q | M )

3. Learning
GIVEN an HMM M with unspecified transition/emission probabilities, and a sequence x,
FIND the parameters θ = (e_i(.), a_ij) that maximize P( x | θ )
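For the decoding question, the standard tool is the Viterbi algorithm. The following is a minimal log-space sketch in Python/NumPy; the casino parameters are repeated from above, the initial distribution is assumed uniform, and the function name and example roll sequence are mine.

```python
import numpy as np

A  = np.array([[0.95, 0.05], [0.05, 0.95]])      # transitions (FAIR, LOADED)
B  = np.array([[1/6] * 6, [1/10] * 5 + [1/2]])   # emissions for faces 1..6
pi = np.array([0.5, 0.5])                        # assumed initial distribution

def viterbi(rolls):
    """Most probable hidden-state path for observed rolls (values 1..6)."""
    obs = np.asarray(rolls) - 1
    T, S = len(obs), len(pi)
    logA, logB, logpi = np.log(A), np.log(B), np.log(pi)

    delta = np.zeros((T, S))               # best log-prob of any path ending in each state
    psi   = np.zeros((T, S), dtype=int)    # back-pointers
    delta[0] = logpi + logB[:, obs[0]]
    for t in range(1, T):
        scores   = delta[t - 1][:, None] + logA    # scores[i, j]: best path ...i -> j
        psi[t]   = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + logB[:, obs[t]]

    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path

rolls = [1, 2, 4, 5, 6, 6, 6, 6, 3, 6, 6, 6, 1, 2, 3]
print("".join("FL"[s] for s in viterbi(rolls)))
```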
Let's not be confused by notation

P( x | M ): the probability that sequence x was generated by the model.
The model is: architecture (number of states, etc.) + parameters θ = (a_ij, e_i(.)).
So P( x | θ ) and P( x ) are the same, when the architecture and the entire model, respectively, are implied.
Similarly, P( x, q | M ) and P( x, q ) are the same.
In the LEARNING problem we always write P( x | θ ) to emphasize that we are seeking the θ that maximizes P( x | θ ).
Message passing view of the forwards algorithm

[Figure: HMM graphical model with hidden states X_{t-1}, X_t, X_{t+1} and observations Y_{t-1}, Y_t, Y_{t+1}, annotated with the messages α_{t|t-1}, b_t, b_{t+1}.]
Forwards-backwards algorithm

[Figure: the same HMM graphical model (hidden states X_{t-1}, X_t, X_{t+1}, observations Y_{t-1}, Y_t, Y_{t+1}), annotated with the messages α_{t|t-1} and b_t.]

Discrete analog of the RTS (Rauch-Tung-Striebel) smoother.
Forwards algorithm for HMMs

Predict:  α_{t|t-1}(j) = Σ_i α_{t-1}(i) a_{ij}
Update:   α_t(j) ∝ p(y_t | X_t = j) α_{t|t-1}(j)

O(T S²) time using dynamic programming.
Discrete-state analog of the Kalman filter.
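A direct transcription of that predict/update recursion into code might look like the following Python/NumPy sketch (the function and variable names are mine). It normalizes α at each step and accumulates the log of the normalizers, so the log-likelihood log P(x | M) needed for the evaluation problem falls out as a by-product.

```python
import numpy as np

def forward(obs, A, B, pi):
    """Forward (filtering) pass for a discrete HMM.

    obs: observation indices (0-based); A: transitions; B: emissions; pi: initial dist.
    Returns the filtered state probabilities alpha[t] = p(x_t | y_1..y_t)
    and log p(y_1..y_T | model), accumulated from the per-step normalizers.
    """
    T, S = len(obs), len(pi)
    alpha = np.zeros((T, S))
    loglik = 0.0
    pred = pi                              # predicted belief before seeing y_1
    for t in range(T):
        if t > 0:
            pred = alpha[t - 1] @ A        # Predict: sum_i alpha[t-1, i] * a_ij
        unnorm = B[:, obs[t]] * pred       # Update: multiply in the evidence p(y_t | x_t)
        z = unnorm.sum()                   # normalizer = p(y_t | y_1..y_{t-1})
        alpha[t] = unnorm / z
        loglik += np.log(z)
    return alpha, loglik

# Example with the dishonest-casino parameters from above:
A  = np.array([[0.95, 0.05], [0.05, 0.95]])
B  = np.array([[1/6] * 6, [1/10] * 5 + [1/2]])
pi = np.array([0.5, 0.5])
rolls = [1, 2, 4, 5, 6, 6, 6, 6, 3, 6]
alpha, loglik = forward(np.array(rolls) - 1, A, B, pi)
print("log P(x | M) =", loglik)
print("P(loaded | rolls so far) at each step:", np.round(alpha[:, 1], 2))
```

Each step costs O(S²) for the predict matrix-vector product, giving the O(T S²) total noted above.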
Evaluation and Model Selection

Assuming a family of Markov transition models M, what is the probability of the data given a particular model?
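One simple, hypothetical way to act on that question is to score competing models by the log-likelihood the forward pass assigns to the observed rolls. The sketch below reuses the forward function and the casino parameters A, B, pi from the previous sketch, and compares the dishonest-casino model against an "honest casino" that always uses the fair die; the roll sequence is made up.

```python
import numpy as np

# Assumes forward(), and the casino parameters A, B, pi, from the previous sketch are in scope.

# Competing model: an "honest casino" that never switches to the loaded die.
A_honest  = np.array([[1.0, 0.0], [0.0, 1.0]])
pi_honest = np.array([1.0, 0.0])

rolls = np.array([1, 2, 4, 5, 6, 6, 6, 6, 3, 6, 6, 6, 1, 2, 3, 6, 6, 6, 5, 6]) - 1

_, ll_dishonest = forward(rolls, A, B, pi)
_, ll_honest    = forward(rolls, A_honest, B, pi_honest)

# The model with the higher log-likelihood explains the rolls better.
# (For models of different complexity one would also penalize the number of
# parameters, e.g. with BIC, but that is beyond this sketch.)
print("log P(x | dishonest) =", ll_dishonest)
print("log P(x | honest)    =", ll_honest)
```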