rolling dice data analysis - hidden markov model danielle tan haolin zhu

Rolling Dice Data Analysis - Hidden Markov Model

Danielle Tan

Haolin Zhu

Observations-Histogram 1

[Figure: Histogram of All Dice Rolls. x-axis: die face (1 to 6); y-axis: count (0 to 2500).]

Observations-Histogram 2

[Figure: three histogram panels of dice rolls. x-axis: die face (1 to 6); y-axis: count. Only the first panel title survives: "Histogram of Dice Rolls #1 to #1000".]


#1-1000: fair die?

#1001-2000: loaded die 1?

#7001-8000: loaded die 2?

Observations-Cumulative Sum

[Figure: Cumulative Sum of Dice Roll Values. x-axis: roll index (0 to 9000); y-axis: cumulative sum (0 to 3.5 x 10^4); legend: Actual Data, Fair Dice; marked roll indices: 1500, 1980, 3690, 3790, 4250, 4660, 5700, 6500, 7700.]

Fair region’s slope = 3.5; Loaded regions have approx. same slope of 4.5
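The two slopes can be checked against the expected value of a single roll, since the cumulative sum grows on average by that amount per roll. The sketch below assumes the loaded-die distribution proposed elsewhere in these slides (probability 1/2 for a 6 and 1/10 for each other face); the names `fair`, `loaded`, and `expected_roll` are illustrative only.

```python
from fractions import Fraction

faces = range(1, 7)

# Fair die: every face has probability 1/6.
fair = {k: Fraction(1, 6) for k in faces}

# Loaded die (assumed): 1/10 for faces 1 to 5, 1/2 for face 6.
loaded = {k: Fraction(1, 10) for k in faces}
loaded[6] = Fraction(1, 2)

def expected_roll(pmf):
    """Expected value of one roll for a face -> probability mapping."""
    return sum(face * p for face, p in pmf.items())

fair_slope = expected_roll(fair)
loaded_slope = expected_roll(loaded)
print(float(fair_slope), float(loaded_slope))  # 3.5 4.5
```

Under this assumption the loaded mean is exactly 4.5, which agrees with the slope read off the cumulative-sum plot.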

Observations- Histogram 3

[Figure: three histogram panels titled "1st Loaded Region", "3rd Loaded Region", and "5th Loaded Region". x-axis: die face (1 to 6); y-axis: count.]


Observations

2 dice: one is fair, one is loaded.

Loaded regions are:

#1500-1980, #3690-3790, #4250-4660, #5700-6500 and #7000-7700

Probability of 6 on the loaded die is ½.

Once either die is in use, it tends to stay in use for many consecutive rolls.

Hidden Markov Model

Known information:

A sequence of observations with integers between 1-6.

Questions of interest:

How was this data set generated?

What portion of the data was generated by the fair die and by the loaded die, respectively?

What are the probabilities of the transition between the dice?

What is the probability of generating 6 using the loaded die?

Hidden-Markov Model

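Define two states: Fair and Loaded.

Probabilities of the transition between the two states:

Fair to Fair: 0.95; Fair to Loaded: 0.05; Loaded to Fair: 0.05; Loaded to Loaded: 0.95

Transition Matrix (rows and columns ordered Fair, Loaded):

$$A = \begin{pmatrix} 0.95 & 0.05 \\ 0.05 & 0.95 \end{pmatrix}$$

A guess from observation!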

Hidden-Markov Model

In each state, there are 6 possible outputs:

Output   Fair   Loaded
1        1/6    1/10
2        1/6    1/10
3        1/6    1/10
4        1/6    1/10
5        1/6    1/10
6        1/6    1/2

Emission Matrix (row 1: Fair, row 2: Loaded):

$$b = \begin{pmatrix} 1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 \\ 1/10 & 1/10 & 1/10 & 1/10 & 1/10 & 1/2 \end{pmatrix}$$

Again a guess!
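To make the guessed model concrete, here is a minimal sketch of how a sequence could be generated from it: at each roll the current die emits a face, then the next die is chosen from the transition probabilities. The starting state (fair), the seed, and the names `sample_hmm`, `A`, and `B` are assumptions for illustration; the slides do not say how the original data were produced.

```python
import random

STATES = ("F", "L")  # F = fair, L = loaded

# Guessed transition probabilities: A[current][next].
A = {"F": {"F": 0.95, "L": 0.05},
     "L": {"F": 0.05, "L": 0.95}}

# Guessed emission probabilities for faces 1..6.
B = {"F": [1 / 6] * 6,
     "L": [1 / 10] * 5 + [1 / 2]}

def sample_hmm(n_rolls, start="F", seed=0):
    """Return (state string, list of rolls) drawn from the guessed model."""
    rng = random.Random(seed)
    state, states, rolls = start, [], []
    for _ in range(n_rolls):
        states.append(state)
        # Emit a face from the die currently in use.
        rolls.append(rng.choices(range(1, 7), weights=B[state])[0])
        # Choose the die for the next roll from the current row of A.
        state = rng.choices(STATES, weights=[A[state][s] for s in STATES])[0]
    return "".join(states), rolls

states, rolls = sample_hmm(40)
print(states)
print(rolls)
```

Only `rolls` would be visible to an observer; `states` plays the role of the hidden sequence.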

Hidden-Markov Model

A set of observations:

$$y = (y_1, y_2, \ldots, y_N)$$

The states are hidden:

$$s = (s_1, s_2, \ldots, s_N)$$

For example: s = (FFFFFFFFLLLFFFLL…)

Given the output sequence y, we need to find the most likely set of state transition and output probabilities. In other words, we derive the maximum likelihood estimate of the HMM parameters (the transition matrix A and the emission matrix b) given the observed output sequence.

Forward-Backward algorithm

What is the probability that the actual state of the system is i at time t?

$$P_t(i) = P(s_t = i \mid y)$$

The probability of the observed data up to time t:

$$\alpha_1(j) = \pi_j\, b_j(y_1); \qquad \alpha_{t+1}(j) = \sum_{i=1}^{M} \alpha_t(i)\, A_{ij}\, b_j(y_{t+1})$$

The probability of the observed data after time t:

$$\beta_t(i) = \sum_{j=1}^{M} A_{ij}\, b_j(y_{t+1})\, \beta_{t+1}(j)$$

Then:

$$P_t(i) = \frac{\alpha_t(i)\, \beta_t(i)}{\sum_{j=1}^{M} \alpha_t(j)\, \beta_t(j)}$$

Here M is the number of states (M = 2) and \pi_j is the probability of starting in state j.
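The posterior P_t(i) can be computed with a forward pass and a backward pass over the rolls. The sketch below is one way to do this, not necessarily the authors' implementation: it rescales the forward variables at every step to avoid numerical underflow on long sequences, takes the backward variable at the last roll to be 1, and assumes a uniform initial distribution `pi`. The function name, the argument layout, and the short test sequence are illustrative.

```python
def forward_backward(y, A, b, pi):
    """Return P_t(i) = P(s_t = i | y) for every t, using scaled recursions.

    y: observed faces (1..6); A[i][j]: transition probabilities;
    b[i][k-1]: probability of face k in state i; pi[i]: initial probabilities.
    """
    M, N = len(pi), len(y)
    alpha = [[0.0] * M for _ in range(N)]
    beta = [[1.0] * M for _ in range(N)]   # backward variable is 1 at the last roll
    scale = [0.0] * N

    for t in range(N):
        for j in range(M):
            prev = pi[j] if t == 0 else sum(alpha[t - 1][i] * A[i][j] for i in range(M))
            alpha[t][j] = prev * b[j][y[t] - 1]
        # Rescale so the forward variables at time t sum to 1.
        scale[t] = sum(alpha[t])
        alpha[t] = [a / scale[t] for a in alpha[t]]

    for t in range(N - 2, -1, -1):
        for i in range(M):
            beta[t][i] = sum(A[i][j] * b[j][y[t + 1] - 1] * beta[t + 1][j]
                             for j in range(M)) / scale[t + 1]

    posterior = []
    for t in range(N):
        w = [alpha[t][i] * beta[t][i] for i in range(M)]
        total = sum(w)
        posterior.append([x / total for x in w])
    return posterior

A = [[0.95, 0.05], [0.05, 0.95]]        # state 0 = fair, state 1 = loaded
b = [[1 / 6] * 6, [1 / 10] * 5 + [1 / 2]]
pi = [0.5, 0.5]                          # assumed initial distribution
y = [1, 3, 2, 5, 4, 6, 6, 6, 2, 6, 6, 6, 6]
post = forward_backward(y, A, b, pi)
for roll, p in zip(y, post):
    print(roll, round(p[1], 3))          # posterior probability of "loaded"
```

The rescaling changes the forward and backward variables by factors that cancel in the ratio, so the posterior is the same as with the unscaled recursions.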

Baum-Welch re-estimation

Notice that we are using a guess of the transition matrix and the emission matrix!

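Re-estimation of A and b:

$$A'_{ij} = \frac{\sum_t \alpha_t(i)\, A_{ij}\, b_j(y_{t+1})\, \beta_{t+1}(j)}{\sum_t \alpha_t(i)\, \beta_t(i)}$$

$$b'_i(k) = \frac{\sum_t \delta(y_t, k)\, \alpha_t(i)\, \beta_t(i)}{\sum_t \alpha_t(i)\, \beta_t(i)}$$

where \delta(y_t, k) = 1 if y_t = k and 0 otherwise.

Then we iterate: after each re-estimation we recompute the probability of the whole data set under the current parameters, and stop when it converges to a maximum (in general a local one).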
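One re-estimation pass can be sketched as follows. This is a minimal illustration under several assumptions rather than the authors' code: the forward and backward variables are rescaled at each step, the initial distribution `pi` is held fixed at (0.5, 0.5), and the data are a synthetic stand-in (300 fair rolls followed by 300 loaded rolls), not the data set analysed in the slides. The log-likelihood returned by each pass is the one for the parameters passed in.

```python
import math
import random

def baum_welch_step(y, A, b, pi):
    """One re-estimation pass; returns (A_new, b_new, log-likelihood of inputs)."""
    M, N, K = len(A), len(y), len(b[0])
    alpha, beta, c = [], [[1.0] * M for _ in range(N)], []
    for t in range(N):                      # scaled forward pass
        row = [(pi[j] if t == 0 else sum(alpha[-1][i] * A[i][j] for i in range(M)))
               * b[j][y[t] - 1] for j in range(M)]
        c.append(sum(row))
        alpha.append([a / c[t] for a in row])
    for t in range(N - 2, -1, -1):          # scaled backward pass
        beta[t] = [sum(A[i][j] * b[j][y[t + 1] - 1] * beta[t + 1][j]
                       for j in range(M)) / c[t + 1] for i in range(M)]
    # gamma[t][i] is the posterior probability of state i at time t.
    gamma = [[alpha[t][i] * beta[t][i] for i in range(M)] for t in range(N)]

    A_new = []
    for i in range(M):
        # Expected i -> j transitions divided by expected visits to i.
        num = [sum(alpha[t][i] * A[i][j] * b[j][y[t + 1] - 1] * beta[t + 1][j] / c[t + 1]
                   for t in range(N - 1)) for j in range(M)]
        den = sum(gamma[t][i] for t in range(N - 1))
        A_new.append([n / den for n in num])
    b_new = []
    for i in range(M):
        # Expected emissions of face k from state i divided by expected visits to i.
        den = sum(gamma[t][i] for t in range(N))
        b_new.append([sum(gamma[t][i] for t in range(N) if y[t] == k + 1) / den
                      for k in range(K)])
    return A_new, b_new, sum(math.log(x) for x in c)

rng = random.Random(0)
# Synthetic stand-in data: 300 fair rolls, then 300 rolls favouring 6.
y = (rng.choices(range(1, 7), k=300)
     + rng.choices(range(1, 7), weights=[1, 1, 1, 1, 1, 5], k=300))
A = [[0.95, 0.05], [0.05, 0.95]]
b = [[1 / 6] * 6, [1 / 10] * 5 + [1 / 2]]
pi = [0.5, 0.5]
history = []
for _ in range(20):
    A, b, loglik = baum_welch_step(y, A, b, pi)
    history.append(loglik)
print([round(v, 2) for v in history[:5]])
```

The recorded log-likelihoods should never decrease from one pass to the next, which gives a simple convergence check: stop once the increase falls below a chosen tolerance.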

Results

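Transition matrix:

$$A = \begin{pmatrix} 0.9982 & 0.0018 \\ 0.0036 & 0.9964 \end{pmatrix}$$

Emission matrix:

$$b = \begin{pmatrix} 0.1696 & 0.1766 & 0.1583 & 0.1661 & 0.1649 & 0.1645 \\ 0.0985 & 0.0973 & 0.1019 & 0.0951 & 0.1051 & 0.5022 \end{pmatrix}$$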
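The estimated transition matrix can be read in two ways that relate back to the questions of interest: the long-run share of rolls from each die, and the typical length of a run with one die. The sketch below assumes the first row and column of A correspond to the fair die (matching the near-uniform first row of the emission matrix); the variable names are illustrative.

```python
A = [[0.9982, 0.0018],
     [0.0036, 0.9964]]   # rows/columns ordered fair, loaded (assumed ordering)

p_fl, p_lf = A[0][1], A[1][0]

# Stationary distribution of a two-state chain:
# pi_fair * p_fl = pi_loaded * p_lf, with pi_fair + pi_loaded = 1.
pi_fair = p_lf / (p_fl + p_lf)
pi_loaded = p_fl / (p_fl + p_lf)

# Expected run length in a state is 1 / (probability of leaving it).
run_fair = 1 / p_fl
run_loaded = 1 / p_lf

print(round(pi_fair, 3), round(pi_loaded, 3))   # 0.667 0.333
print(round(run_fair), round(run_loaded))       # 556 278
```

Under the fitted model, the loaded die would account for about one third of the rolls in the long run, with loaded runs lasting a few hundred rolls on average.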

Results

[Figure: time when the loaded die was used]

Results

[Figure: histogram of the data generated by the Hidden-Markov model]

Results

[Figure: cumulative sum of the data generated by the Hidden-Markov model]

Results

[Figure: log of the likelihood]
