Hidden Markov Models
End of the Semester Levity
JOKE 1: A mathematician, an engineer, and a physicist are being interviewed for a job. In each case, the interview goes along famously until the last question is asked: "How much is one plus one?"
Each of them suspects a trap and is hesitant to answer.
The mathematician thinks for a moment and says, "I'm not sure, but I think it converges."
The physicist says, "I'm not sure, but I think it's on the order of one."
The engineer gets up, closes the door to the office, and says, "How much do you want it to be?"
JOKE 2: A biologist, a statistician, a mathematician, and a computer scientist are on a photo safari in Africa. They drive out into the savannah in their jeep, stop, and scour the horizon with their binoculars.
The biologist: "Look! There's a herd of zebras! And there, in the middle: a white zebra! It's fantastic! There are white zebras! We'll be famous!"
The statistician: "It's not significant. We only know there's one white zebra."
The mathematician: "Actually, we know there exists a zebra which is white on one side."
The computer scientist: "Oh no! A special case!"
Time Series Data
• You are given a collection of labelled points in some order: { (y_1, x_1), (y_2, x_2), ..., (y_N, x_N) }, where the x_i are category labels.
• In time series data, independence is violated.
• In time series data, order matters.
p(y_1, y_2, ..., y_n, x_1, x_2, ..., x_n) ≠ p(y_1 | x_1) p(y_2 | x_2) ⋯ p(y_n | x_n)

Basic decomposition for any distribution:
p(y_1, x_1, y_2, x_2, ..., y_n, x_n) = p(y_n, x_n | x_{n-1}, y_{n-1}, ..., y_1, x_1) · p(x_{n-1}, y_{n-1} | x_{n-2}, y_{n-2}, ..., y_1, x_1) ⋯ p(y_1 | x_1)

Markov Assumption:
p(y_1, x_1, y_2, x_2, ..., y_n, x_n) = p(y_n, x_n | x_{n-1}, y_{n-1}) p(x_{n-1}, y_{n-1} | x_{n-2}, y_{n-2}) ⋯ p(y_1 | x_1)
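To make the factorization concrete, here is a minimal sketch in Python/NumPy that evaluates the joint probability of a labelled sequence under the Markov assumption, further specialized (as in an HMM) so that p(y_t, x_t | x_{t-1}, y_{t-1}) = p(x_t | x_{t-1}) p(y_t | x_t). The toy numbers and the names A, B, pi are illustrative, not from the slides.

```python
import numpy as np

# Toy two-state model (made-up numbers): states x_t in {0, 1}, observations y_t in {0, 1, 2}.
A  = np.array([[0.9, 0.1],          # A[i, j] = p(x_t = j | x_{t-1} = i)
               [0.2, 0.8]])
B  = np.array([[0.5, 0.4, 0.1],     # B[i, k] = p(y_t = k | x_t = i)
               [0.1, 0.3, 0.6]])
pi = np.array([0.5, 0.5])           # p(x_1)

def joint_prob(x, y):
    """p(y_1, x_1, ..., y_n, x_n) under the Markov factorization:
    p(x_1) p(y_1 | x_1) * prod_t p(x_t | x_{t-1}) p(y_t | x_t)."""
    p = pi[x[0]] * B[x[0], y[0]]
    for t in range(1, len(x)):
        p *= A[x[t - 1], x[t]] * B[x[t], y[t]]
    return p

x = [0, 0, 1, 1]    # a state (label) sequence
y = [0, 1, 2, 2]    # the corresponding observations
print(joint_prob(x, y))
```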
Time Series Data
[Figure: two time-series plots of amplitude vs. time: Composite Index and Utility Index.]
ECG Data
[Figure: ECG time series, amplitude vs. time (units of 1/180 sec).]
Time Series Data with Hidden State
[Figure: the same Composite Index and Utility Index time series, now annotated with a hidden-state interval labelled "Holiday Season".]
ECG Data with Hidden State
[Figure: the same ECG time series (amplitude vs. time, 1/180 sec), now annotated with a hidden-state interval labelled "Nicotine Inhalation".]
Example: The Dishonest Casino
A casino has two dice:
• Fair die: P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6
• Loaded die: P(1) = P(2) = P(3) = P(4) = P(5) = 1/10, P(6) = 1/2
The casino player switches back and forth between the fair and the loaded die once every 20 turns.
Game:
1. You bet $1
2. You roll (always with a fair die)
3. The casino player rolls (maybe with the fair die, maybe with the loaded die)
4. Highest number wins $2
Problem 1: Evaluation
GIVEN: a sequence of rolls by the casino player
1245526462146146136136661664661636616366163616515615115146123562344
QUESTION: How likely is this sequence, given our model of how the casino works?
This is the EVALUATION problem in HMMs.
Problem 2 – Decoding
GIVEN: a sequence of rolls by the casino player
1245526462146146136136661664661636616366163616515615115146123562344
QUESTION: What portion of the sequence was generated with the fair die, and what portion with the loaded die?
This is the DECODING question in HMMs.
Problem 3 – Learning
GIVEN: a sequence of rolls by the casino player
1245526462146146136136661664661636616366163616515615115146123562344
QUESTION: How "loaded" is the loaded die? How "fair" is the fair die? How often does the casino player change from fair to loaded, and back?
This is the LEARNING question in HMMs.
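As a rough illustration of what "learning" would mean here, the Python sketch below treats the easy, fully observed case: if we also knew which die produced each roll, the maximum-likelihood estimates of the emission and transition probabilities are just normalized counts. The data in the sketch are made up. In the actual problem the die sequence is hidden, so these counts must be replaced by expected counts (the Baum-Welch / EM algorithm), which is not shown here.

```python
import numpy as np

# Hypothetical fully labelled data: the rolls (1-6) and which die produced each roll.
rolls = np.array([1, 6, 6, 3, 6, 2, 6, 6, 5, 6])   # observations
dies  = np.array([0, 0, 1, 1, 1, 1, 1, 1, 0, 0])   # 0 = fair, 1 = loaded (hidden in reality)

n_states, n_symbols = 2, 6

# Emission probabilities: how often each die shows each face.
emit = np.zeros((n_states, n_symbols))
for s, r in zip(dies, rolls):
    emit[s, r - 1] += 1
emit /= emit.sum(axis=1, keepdims=True)

# Transition probabilities: how often the player switches dice.
trans = np.zeros((n_states, n_states))
for s_prev, s_next in zip(dies[:-1], dies[1:]):
    trans[s_prev, s_next] += 1
trans /= trans.sum(axis=1, keepdims=True)

print("estimated emission probs:\n", emit)
print("estimated transition probs:\n", trans)
```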
The dishonest casino model

Two states, FAIR and LOADED, with transition probabilities:
P(FAIR → FAIR) = 0.95, P(FAIR → LOADED) = 0.05
P(LOADED → LOADED) = 0.95, P(LOADED → FAIR) = 0.05

Emission probabilities:
Fair die:   P(1|F) = P(2|F) = P(3|F) = P(4|F) = P(5|F) = P(6|F) = 1/6
Loaded die: P(1|L) = P(2|L) = P(3|L) = P(4|L) = P(5|L) = 1/10, P(6|L) = 1/2
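Written out as data structures, with a small sampler, the model might look like the following Python/NumPy sketch. The function and variable names are mine, and the uniform initial-state distribution is an assumption the slides do not state.

```python
import numpy as np

rng = np.random.default_rng(0)

states = ["FAIR", "LOADED"]
A  = np.array([[0.95, 0.05],           # transition probs: rows = current state,
               [0.05, 0.95]])          # columns = next state
B  = np.array([[1/6] * 6,              # emission probs for faces 1..6, fair die
               [1/10] * 5 + [1/2]])    # loaded die
pi = np.array([0.5, 0.5])              # initial state distribution (assumed uniform)

def sample(T):
    """Generate T rolls from the dishonest-casino HMM."""
    x = np.empty(T, dtype=int)         # hidden states
    y = np.empty(T, dtype=int)         # observed rolls (1..6)
    x[0] = rng.choice(2, p=pi)
    for t in range(T):
        if t > 0:
            x[t] = rng.choice(2, p=A[x[t - 1]])
        y[t] = rng.choice(6, p=B[x[t]]) + 1
    return x, y

hidden, rolls = sample(100)
print("".join(map(str, rolls)))                # what the player sees
print("".join("FL"[s] for s in hidden))        # which die was actually used
```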
The three main questions on HMMs

1. Evaluation
GIVEN an HMM M and a sequence x,
FIND Prob( x | M )

2. Decoding
GIVEN an HMM M and a sequence x,
FIND the sequence q of states that maximizes P( x, q | M )

3. Learning
GIVEN an HMM M with unspecified transition/emission probabilities, and a sequence x,
FIND the parameters θ = (e_i(.), a_ij) that maximize P( x | θ )
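For the decoding question, the standard tool is the Viterbi algorithm. The following is a minimal log-space sketch in Python/NumPy; the casino parameters are repeated from above, the initial distribution is assumed uniform, and the function name and example roll sequence are mine.

```python
import numpy as np

A  = np.array([[0.95, 0.05], [0.05, 0.95]])      # transitions (FAIR, LOADED)
B  = np.array([[1/6] * 6, [1/10] * 5 + [1/2]])   # emissions for faces 1..6
pi = np.array([0.5, 0.5])                        # assumed initial distribution

def viterbi(rolls):
    """Most probable hidden-state path for observed rolls (values 1..6)."""
    obs = np.asarray(rolls) - 1
    T, S = len(obs), len(pi)
    logA, logB, logpi = np.log(A), np.log(B), np.log(pi)

    delta = np.zeros((T, S))               # best log-prob of any path ending in each state
    psi   = np.zeros((T, S), dtype=int)    # back-pointers
    delta[0] = logpi + logB[:, obs[0]]
    for t in range(1, T):
        scores   = delta[t - 1][:, None] + logA    # scores[i, j]: best path ...i -> j
        psi[t]   = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + logB[:, obs[t]]

    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path

rolls = [1, 2, 4, 5, 6, 6, 6, 6, 3, 6, 6, 6, 1, 2, 3]
print("".join("FL"[s] for s in viterbi(rolls)))
```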
Let's not be confused by notation

P( x | M ): the probability that sequence x was generated by the model.
The model is: architecture (number of states, etc.) + parameters θ = (a_ij, e_i(.)).
So P( x | θ ) and P( x ) are the same, when the architecture and the entire model, respectively, are implied.
Similarly, P( x, q | M ) and P( x, q ) are the same.
In the LEARNING problem we always write P( x | θ ) to emphasize that we are seeking the θ that maximizes P( x | θ ).
Message passing view of the forwards algorithm

[Figure: HMM graphical model with hidden states X_{t-1}, X_t, X_{t+1} and observations Y_{t-1}, Y_t, Y_{t+1}, annotated with the messages α_{t|t-1}, b_t, b_{t+1}.]
Forwards-backwards algorithm

[Figure: the same HMM graphical model (hidden states X_{t-1}, X_t, X_{t+1}, observations Y_{t-1}, Y_t, Y_{t+1}), annotated with the messages α_{t|t-1} and b_t.]

Discrete analog of the RTS (Rauch-Tung-Striebel) smoother.
Forwards algorithm for HMMs

Predict:  α_{t|t-1}(j) = Σ_i α_{t-1}(i) a_{ij}
Update:   α_t(j) ∝ p(y_t | X_t = j) α_{t|t-1}(j)

O(T S²) time using dynamic programming.
Discrete-state analog of the Kalman filter.
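A direct transcription of that predict/update recursion into code might look like the following Python/NumPy sketch (the function and variable names are mine). It normalizes α at each step and accumulates the log of the normalizers, so the log-likelihood log P(x | M) needed for the evaluation problem falls out as a by-product.

```python
import numpy as np

def forward(obs, A, B, pi):
    """Forward (filtering) pass for a discrete HMM.

    obs: observation indices (0-based); A: transitions; B: emissions; pi: initial dist.
    Returns the filtered state probabilities alpha[t] = p(x_t | y_1..y_t)
    and log p(y_1..y_T | model), accumulated from the per-step normalizers.
    """
    T, S = len(obs), len(pi)
    alpha = np.zeros((T, S))
    loglik = 0.0
    pred = pi                              # predicted belief before seeing y_1
    for t in range(T):
        if t > 0:
            pred = alpha[t - 1] @ A        # Predict: sum_i alpha[t-1, i] * a_ij
        unnorm = B[:, obs[t]] * pred       # Update: multiply in the evidence p(y_t | x_t)
        z = unnorm.sum()                   # normalizer = p(y_t | y_1..y_{t-1})
        alpha[t] = unnorm / z
        loglik += np.log(z)
    return alpha, loglik

# Example with the dishonest-casino parameters from above:
A  = np.array([[0.95, 0.05], [0.05, 0.95]])
B  = np.array([[1/6] * 6, [1/10] * 5 + [1/2]])
pi = np.array([0.5, 0.5])
rolls = [1, 2, 4, 5, 6, 6, 6, 6, 3, 6]
alpha, loglik = forward(np.array(rolls) - 1, A, B, pi)
print("log P(x | M) =", loglik)
print("P(loaded | rolls so far) at each step:", np.round(alpha[:, 1], 2))
```

Each step costs O(S²) for the predict matrix-vector product, giving the O(T S²) total noted above.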
Evaluation and Model Selection

Assuming a family of Markov transition models M, what is the probability of the data given a particular model?
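One simple, hypothetical way to act on that question is to score competing models by the log-likelihood the forward pass assigns to the observed rolls. The sketch below reuses the forward function and the casino parameters A, B, pi from the previous sketch, and compares the dishonest-casino model against an "honest casino" that always uses the fair die; the roll sequence is made up.

```python
import numpy as np

# Assumes forward(), and the casino parameters A, B, pi, from the previous sketch are in scope.

# Competing model: an "honest casino" that never switches to the loaded die.
A_honest  = np.array([[1.0, 0.0], [0.0, 1.0]])
pi_honest = np.array([1.0, 0.0])

rolls = np.array([1, 2, 4, 5, 6, 6, 6, 6, 3, 6, 6, 6, 1, 2, 3, 6, 6, 6, 5, 6]) - 1

_, ll_dishonest = forward(rolls, A, B, pi)
_, ll_honest    = forward(rolls, A_honest, B, pi_honest)

# The model with the higher log-likelihood explains the rolls better.
# (For models of different complexity one would also penalize the number of
# parameters, e.g. with BIC, but that is beyond this sketch.)
print("log P(x | dishonest) =", ll_dishonest)
print("log P(x | honest)    =", ll_honest)
```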