Rolling Dice Data Analysis - Hidden Markov Model Danielle Tan Haolin Zhu

Upload: heather-bell

Posted on 17-Dec-2015

Rolling Dice Data Analysis - Hidden Markov Model

Danielle Tan

Haolin Zhu

Observations: Histogram 1

[Figure: Histogram of All Dice Rolls. x-axis: die face, 1 to 6; y-axis: number of rolls, 0 to 2500.]

Observations: Histogram 2

[Figure: three histograms of die faces 1 to 6: "Histogram of Dice Rolls #1 to #1000" (y-axis 0 to 200), "Histogram of Dice Rolls #1001 to #2000" (y-axis 0 to 350), and "Histogram of Dice Rolls #7001 to #8000" (y-axis 0 to 400).]

#1-1000: fair die?

#1001-2000: loaded die 1?

#7001-8000: loaded die 2?

Observations: Cumulative Sum

[Figure: Cumulative Sum of Dice Roll Values against roll number (x-axis 0 to 9000; y-axis 0 to 3.5 × 10^4), comparing the Actual Data with a Fair Dice reference line. Roll numbers marked on the plot: 1500, 1980, 3690, 3790, 4250, 4660, 5700, 6500, 7700.]

The fair regions have slope 3.5, the expected value of one fair roll; the loaded regions all have approximately the same slope, 4.5.
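The two slopes are simply the expected value of a single roll. A quick check, assuming the loaded die gives a 6 with probability 1/2 and each other face with probability 1/10 (the values guessed later in this deck):

```python
from fractions import Fraction

# Per-face probabilities for faces 1..6.
fair = [Fraction(1, 6)] * 6
# Assumed loaded die: P(6) = 1/2, every other face 1/10.
loaded = [Fraction(1, 10)] * 5 + [Fraction(1, 2)]

def expected_value(probs):
    """Mean face value, i.e. the expected slope of the cumulative sum."""
    return sum(face * p for face, p in zip(range(1, 7), probs))

fair_slope = expected_value(fair)
loaded_slope = expected_value(loaded)
print(float(fair_slope), float(loaded_slope))  # 3.5 4.5
```

Exact fractions avoid any rounding question: 21/6 = 3.5 for the fair die and 15/10 + 6/2 = 4.5 for the loaded one.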

Observations: Histogram 3

[Figure: histograms of die faces 1 to 6 for the 1st, 3rd, and 5th loaded regions. Surviving panel titles: "Histogram of Dice Rolls #4250 to #4660" (y-axis 0 to 200) and "Histogram of Dice Rolls #7000 to #7780" (y-axis 0 to 350).]

Observations: there are 2 dice; one is fair and one is loaded.

The loaded regions are:

#1500-1980, #3690-3790, #4250-4660, #5700-6500, and #7000-7700

The probability of rolling a 6 with the loaded die is approximately ½.

Once either die is picked up, it stays in use for a long run of rolls.
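A rough back-of-the-envelope sketch of what "stays in use for a long run" implies, treating the region boundaries above as exact and assuming run lengths are geometric (as in a two-state Markov chain, where the mean run length is 1 divided by the probability of switching). This is an illustration, not part of the original analysis:

```python
# Loaded regions read off the cumulative-sum plot: (first roll, last roll).
loaded_regions = [(1500, 1980), (3690, 3790), (4250, 4660), (5700, 6500), (7000, 7700)]

lengths = [end - start for start, end in loaded_regions]
total_loaded = sum(lengths)
mean_run = total_loaded / len(lengths)

# Geometric run lengths: P(leave the loaded state) is about 1 / mean run length.
p_leave_loaded = 1 / mean_run
print(lengths, total_loaded, mean_run, round(p_leave_loaded, 4))
```

A switching probability of roughly 0.002 per roll is far smaller than a casual guess such as 0.05 would suggest.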

Hidden Markov Model

Known information: a sequence of observations, each an integer from 1 to 6.

Questions of interest:

How was this data set generated?

What portion of the data was generated by the fair die and by the loaded die, respectively?

What are the probabilities of transition between the two dice?

What is the probability of rolling a 6 with the loaded die?

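Hidden Markov Model

Define two states: Fair and Loaded.

Probabilities of the transition between the two states: each state is kept with probability 0.95 and left for the other with probability 0.05.

Transition matrix (rows and columns ordered Fair, Loaded):

$$A = \begin{pmatrix} 0.95 & 0.05 \\ 0.05 & 0.95 \end{pmatrix}$$

A guess from observation!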
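To see what this guess implies, a small NumPy sketch (illustrative, not from the slides) computing the expected run length with each die, 1 / P(switch), and the long-run share of rolls in each state, i.e. the stationary distribution of the chain:

```python
import numpy as np

# Guessed transition matrix; rows/columns ordered (Fair, Loaded).
A = np.array([[0.95, 0.05],
              [0.05, 0.95]])

# Expected number of consecutive rolls with the same die: 1 / P(switch).
mean_run = 1 / (1 - np.diag(A))

# Stationary distribution: left eigenvector of A for eigenvalue 1, normalised.
eigvals, eigvecs = np.linalg.eig(A.T)
pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1))])
pi = pi / pi.sum()
print(mean_run, pi)
```

Runs of about 20 rolls and a 50/50 split are only a starting point; Baum-Welch re-estimation is what corrects them.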

Hidden Markov Model

In each state, there are 6 possible outputs:

Output   Fair   Loaded
1        1/6    1/10
2        1/6    1/10
3        1/6    1/10
4        1/6    1/10
5        1/6    1/10
6        1/6    1/2

Emission matrix (row 1: Fair, row 2: Loaded; columns: outputs 1 to 6):

$$b = \begin{pmatrix} 1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 \\ 1/10 & 1/10 & 1/10 & 1/10 & 1/10 & 1/2 \end{pmatrix}$$

Again a guess!
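One way to see why the 6s carry most of the information: compare how likely each face is under the two guessed rows of b. This sketch (illustrative, not from the slides) checks that both rows are probability distributions and computes the per-roll likelihood ratio Loaded/Fair:

```python
from fractions import Fraction

# Guessed emission probabilities for faces 1..6.
b_fair = [Fraction(1, 6)] * 6
b_loaded = [Fraction(1, 10)] * 5 + [Fraction(1, 2)]

# Each row of the emission matrix must sum to 1.
assert sum(b_fair) == 1 and sum(b_loaded) == 1

# Likelihood ratio P(face | Loaded) / P(face | Fair) for a single roll.
ratios = [l / f for l, f in zip(b_loaded, b_fair)]
# A 6 favours Loaded by a factor of 3; every other face favours Fair (ratio 3/5).
print(ratios)
```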

Hidden Markov Model

A set of observations: $y = (y_1, y_2, \ldots, y_N)$

The states are hidden: $s = (s_1, s_2, \ldots, s_N)$. For example: s = (FFFFFFFFLLLFFFLL…)

Given the output sequence y, we need to find the most likely set of state-transition and output probabilities. In other words, we derive the maximum likelihood estimate of the HMM's parameters (transition and emission probabilities) given the observed output sequence.
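To make "maximum likelihood" concrete: the likelihood of the parameters is P(y) summed over every possible hidden path s. The sketch below does that literally for a short, made-up sequence, using the guessed A and b and an assumed uniform initial distribution π = (1/2, 1/2), which the slides do not specify. With 2^N paths this is hopeless for thousands of rolls, which is why a dynamic-programming method such as the forward-backward algorithm is used instead.

```python
import itertools
import numpy as np

A = np.array([[0.95, 0.05], [0.05, 0.95]])      # guessed transitions (Fair, Loaded)
b = np.array([[1/6] * 6, [1/10] * 5 + [1/2]])   # guessed emissions, faces 1..6
pi = np.array([0.5, 0.5])                       # assumed initial distribution

def brute_force_likelihood(y, A, b, pi):
    """P(y | A, b, pi) by summing over every hidden path s (2**N terms)."""
    total = 0.0
    for s in itertools.product(range(len(pi)), repeat=len(y)):
        p = pi[s[0]] * b[s[0], y[0] - 1]
        for t in range(1, len(y)):
            p *= A[s[t - 1], s[t]] * b[s[t], y[t] - 1]
        total += p
    return total

y = [6, 6, 2, 6, 5]   # hypothetical short sequence
print(brute_force_likelihood(y, A, b, pi))
```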

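Forward-Backward algorithm

What is the probability that the actual state of the system is i at time t?

The probability of the observed data up to time t, ending in state j:

$$\alpha_1(j) = \pi_j\, b_j(y_1), \qquad \alpha_t(j) = \sum_{i=1}^{M} \alpha_{t-1}(i)\, A_{ij}\, b_j(y_t)$$

The probability of the observed data after time t, given state i at time t:

$$\beta_N(i) = 1, \qquad \beta_t(i) = \sum_{j=1}^{M} A_{ij}\, b_j(y_{t+1})\, \beta_{t+1}(j)$$

Then:

$$P_t(i) = P(s_t = i \mid y) = \frac{\alpha_t(i)\, \beta_t(i)}{\sum_{k=1}^{M} \alpha_t(k)\, \beta_t(k)}$$

Here M = 2 is the number of states, $A_{ij}$ is the probability of moving from state i to state j, $b_j(y_t)$ is the probability that state j emits $y_t$, and π is the initial state distribution.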
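A minimal NumPy sketch of the forward-backward pass, assuming states are indexed 0 = Fair, 1 = Loaded and faces are stored as 1 to 6. It rescales α at every step (a standard numerical trick not mentioned on the slide) so that thousands of rolls do not underflow; the scale factors also give log P(y). The function name `forward_backward` and its signature are hypothetical.

```python
import numpy as np

def forward_backward(y, A, b, pi):
    """Scaled forward-backward pass.

    y  : observed faces, integers 1..6
    A  : (M, M) transition matrix, A[i, j] = P(s_{t+1} = j | s_t = i)
    b  : (M, 6) emission matrix, b[i, k] = P(face k+1 | state i)
    pi : (M,) initial state distribution
    Returns P_t(i) = P(s_t = i | y) as an (N, M) array, and log P(y).
    """
    obs = np.asarray(y) - 1          # faces 1..6 -> column indices 0..5
    N, M = len(obs), len(pi)
    alpha = np.zeros((N, M))
    beta = np.ones((N, M))           # beta_N(i) = 1
    c = np.zeros(N)                  # per-step scale factors

    alpha[0] = pi * b[:, obs[0]]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, N):
        # alpha_t(j) = sum_i alpha_{t-1}(i) A_ij b_j(y_t), then rescale
        alpha[t] = (alpha[t - 1] @ A) * b[:, obs[t]]
        c[t] = alpha[t].sum()
        alpha[t] /= c[t]

    for t in range(N - 2, -1, -1):
        # beta_t(i) = sum_j A_ij b_j(y_{t+1}) beta_{t+1}(j), with matching scaling
        beta[t] = A @ (b[:, obs[t + 1]] * beta[t + 1]) / c[t + 1]

    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    return gamma, np.log(c).sum()

# Usage with the guessed parameters and an assumed uniform start.
A = np.array([[0.95, 0.05], [0.05, 0.95]])
b = np.array([[1/6] * 6, [1/10] * 5 + [1/2]])
pi = np.array([0.5, 0.5])
gamma, loglik = forward_backward([1, 3, 6, 6, 6, 6, 2, 4], A, b, pi)
```

Column 1 of `gamma` is P_t(Loaded) for each roll.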

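Baum-Welch re-estimation

Notice that the forward-backward quantities are computed from a guess of the transition matrix and the emission matrix!

Re-estimation of A and b:

$$A'_{ij} = \frac{\sum_{t=1}^{N-1} \alpha_t(i)\, A_{ij}\, b_j(y_{t+1})\, \beta_{t+1}(j)}{\sum_{t=1}^{N-1} \alpha_t(i)\, \beta_t(i)}$$

$$b'_i(k) = \frac{\sum_{t=1}^{N} \delta(y_t, k)\, \alpha_t(i)\, \beta_t(i)}{\sum_{t=1}^{N} \alpha_t(i)\, \beta_t(i)}$$

where δ(y_t, k) = 1 if y_t = k and 0 otherwise.

We then iterate until convergence: we keep track of the probability of the whole data set under the current parameters, which never decreases from one iteration to the next, and stop when it converges to a (possibly local) maximum.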
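A compact sketch of the re-estimation loop. The names `baum_welch`, `tol` and `max_iter` are hypothetical, and the initial distribution `pi` is assumed fixed. Each iteration runs a scaled forward-backward pass, records log P(y), and applies the two update formulas; in scaled form both sums are already divided by P(y), which cancels in each ratio.

```python
import numpy as np

def baum_welch(y, A, b, pi, tol=1e-6, max_iter=500):
    """Re-estimate A and b from faces y (1..6), starting from guesses A, b.

    pi is held fixed. Returns the new A, b and the log-likelihood
    recorded at the start of each iteration.
    """
    obs = np.asarray(y) - 1
    N, M = len(obs), len(pi)
    A, b = A.astype(float), b.astype(float)   # work on copies
    history = []
    for _ in range(max_iter):
        # Scaled forward pass.
        alpha = np.zeros((N, M))
        c = np.zeros(N)
        alpha[0] = pi * b[:, obs[0]]
        c[0] = alpha[0].sum()
        alpha[0] /= c[0]
        for t in range(1, N):
            alpha[t] = (alpha[t - 1] @ A) * b[:, obs[t]]
            c[t] = alpha[t].sum()
            alpha[t] /= c[t]
        # Scaled backward pass, beta_N(i) = 1.
        beta = np.ones((N, M))
        for t in range(N - 2, -1, -1):
            beta[t] = A @ (b[:, obs[t + 1]] * beta[t + 1]) / c[t + 1]
        history.append(np.log(c).sum())        # log P(y | A, b)

        # gamma_t(i) = alpha_t(i) beta_t(i) / P(y)
        gamma = alpha * beta
        # xi_t(i, j) = alpha_t(i) A_ij b_j(y_{t+1}) beta_{t+1}(j) / P(y)
        xi = (alpha[:-1, :, None] * A[None, :, :]
              * (b[:, obs[1:]].T * beta[1:])[:, None, :] / c[1:, None, None])

        # Re-estimation formulas for A and b.
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        for k in range(b.shape[1]):
            b[:, k] = gamma[obs == k].sum(axis=0)
        b /= gamma.sum(axis=0)[:, None]

        if len(history) > 1 and history[-1] - history[-2] < tol:
            break
    return A, b, history
```

Usage would look like `A_hat, b_hat, history = baum_welch(rolls, A0, b0, pi0)`, where `rolls` is the observed sequence and `A0`, `b0` are the guessed matrices; `history` is the log-likelihood trace to monitor for convergence.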

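Results

Transition matrix (rows and columns ordered Fair, Loaded):

$$A = \begin{pmatrix} 0.9982 & 0.0018 \\ 0.0036 & 0.9964 \end{pmatrix}$$

Emission matrix (row 1: Fair, row 2: Loaded; columns: outputs 1 to 6):

$$b = \begin{pmatrix} 0.1696 & 0.1766 & 0.1583 & 0.1661 & 0.1649 & 0.1645 \\ 0.0985 & 0.0973 & 0.1019 & 0.0951 & 0.1051 & 0.5022 \end{pmatrix}$$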
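What the fitted transition matrix implies, as a small illustrative calculation (not from the slides): mean run lengths are 1 / P(switch), and for a two-state chain the long-run share of loaded rolls is A_FL / (A_FL + A_LF).

```python
import numpy as np

# Fitted transition matrix from the Results slide; order (Fair, Loaded).
A = np.array([[0.9982, 0.0018],
              [0.0036, 0.9964]])

# Mean length of a run with the same die: 1 / P(switch away).
mean_run_fair = 1 / A[0, 1]     # about 556 rolls
mean_run_loaded = 1 / A[1, 0]   # about 278 rolls

# Long-run share of rolls made with the loaded die (two-state chain).
share_loaded = A[0, 1] / (A[0, 1] + A[1, 0])
print(round(mean_run_fair), round(mean_run_loaded), round(share_loaded, 3))  # 556 278 0.333
```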

Results

Times when the loaded die was used:
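The slides do not say how the loaded periods were read off the model. One plausible sketch, assuming a 0.5 threshold on the posterior P_t(Loaded) from the forward-backward step, turns the per-roll posterior into 1-based (start, end) roll ranges comparable with the loaded regions listed under Observations. The function name `loaded_intervals` and the threshold are assumptions.

```python
import numpy as np

def loaded_intervals(p_loaded, threshold=0.5):
    """Turn per-roll posteriors P(s_t = Loaded | y) into 1-based (start, end) runs.

    The 0.5 threshold is an assumption; the slides do not state how the
    loaded periods were extracted.
    """
    flags = np.asarray(p_loaded) > threshold
    # Pad with False on both sides so every run has a rising and a falling edge.
    edges = np.diff(np.concatenate(([False], flags, [False])).astype(int))
    starts = np.flatnonzero(edges == 1) + 1   # first roll of each run, 1-based
    ends = np.flatnonzero(edges == -1)        # last roll of each run, 1-based
    return list(zip(starts.tolist(), ends.tolist()))
```

For example, `loaded_intervals([0.1, 0.9, 0.8, 0.2, 0.7])` returns `[(2, 3), (5, 5)]`.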

Results

Histogram of the data generated by the Hidden Markov Model:

Results

Cumulative sum of the data generated by the Hidden Markov Model:
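A sketch of how data like these could be generated from the fitted model: sample a hidden state path from A, then a face from the matching row of b, and take the histogram and cumulative sum. Assumptions not in the slides: the chain starts in the Fair state, the sequence is 9000 rolls long (roughly the extent of the plots' x-axis), and the seed is arbitrary. The rounded rows of b are renormalised because `Generator.choice` requires probabilities that sum to 1.

```python
import numpy as np

def simulate_hmm(A, b, pi, n, seed=0):
    """Draw n rolls from the HMM; returns (faces 1..6, hidden states 0/1)."""
    rng = np.random.default_rng(seed)
    states = np.empty(n, dtype=int)
    faces = np.empty(n, dtype=int)
    s = rng.choice(len(pi), p=pi)
    for t in range(n):
        if t > 0:
            s = rng.choice(len(pi), p=A[s])    # move according to row s of A
        states[t] = s
        faces[t] = rng.choice(6, p=b[s]) + 1   # emit a face from row s of b
    return faces, states

# Fitted parameters from the Results slide; order (Fair, Loaded).
A = np.array([[0.9982, 0.0018], [0.0036, 0.9964]])
b = np.array([[0.1696, 0.1766, 0.1583, 0.1661, 0.1649, 0.1645],
              [0.0985, 0.0973, 0.1019, 0.0951, 0.1051, 0.5022]])
b = b / b.sum(axis=1, keepdims=True)  # rounded rows may not sum exactly to 1
pi = np.array([1.0, 0.0])             # assumption: start with the fair die

faces, states = simulate_hmm(A, b, pi, n=9000)
counts = np.bincount(faces, minlength=7)[1:]   # histogram of faces 1..6
cumsum = np.cumsum(faces)                      # cumulative sum of roll values
```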

Results

Log-likelihood: