Machine Learning 10-701 Tom M. Mitchell
Machine Learning Department Carnegie Mellon University
March 22, 2011
Today:
• Time series data
• Markov Models
• Hidden Markov Models
• Dynamic Bayes Nets

Reading:
• Bishop, Chapter 13 (very thorough)
Thanks to Professors Venu Govindaraju, Carlos Guestrin, Aarti Singh, and Eric Xing for access to slides on which some of these are based.
Sequential Data
• stock market prediction
• speech recognition
• gene data analysis

O1, O2, …, OT (a sequence of observations over time)

How shall we represent and learn P(O1, O2, …, OT)?
Markov Model
O1 → O2 → … → OT

How shall we represent and learn P(O1, O2, …, OT)?

Use a Bayes net (chain rule):
   P(O1, O2, …, OT) = ∏_{t=1}^{T} P(Ot | O1, …, Ot−1)

Markov model (first order): assume P(Ot | O1, …, Ot−1) = P(Ot | Ot−1), so
   P(O1, O2, …, OT) = P(O1) ∏_{t=2}^{T} P(Ot | Ot−1)

nth-order Markov model: assume P(Ot | O1, …, Ot−1) = P(Ot | Ot−1, Ot−2, …, Ot−n)
If Ot is real-valued and we assume P(Ot) ~ N(f(Ot−1, Ot−2, …, Ot−n), σ), where f is some linear function, this is called an nth-order autoregressive (AR) model.
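To make the AR idea concrete, here is a minimal sketch (not from the slides) that samples a short series from a hypothetical 2nd-order AR model; the coefficients a1, a2 and the noise level sigma are made-up illustrative values.

```python
import numpy as np

# Minimal sketch of a 2nd-order autoregressive (AR) model:
#   O_t ~ N(a1 * O_{t-1} + a2 * O_{t-2}, sigma^2)
# Coefficients and noise level are illustrative, not from the slides.
a1, a2, sigma = 0.6, 0.3, 0.1
rng = np.random.default_rng(0)

T = 100
o = np.zeros(T)
o[0], o[1] = rng.normal(0.0, sigma, size=2)   # arbitrary starting values
for t in range(2, T):
    mean = a1 * o[t - 1] + a2 * o[t - 2]      # linear function f of the last n=2 values
    o[t] = rng.normal(mean, sigma)            # Gaussian noise around the AR prediction
```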
Hidden Markov Models: Example
An experience in a casino
Game:
1. You bet $1
2. You roll (always with a fair die)
3. Casino player rolls (sometimes with a fair die, sometimes with a loaded die)
4. Highest number wins $2
Here is his sequence of die rolls:
1245526462146146136136661664661636616366163616515615115146123562344
Which die is being used in each play?
The Dishonest Casino Model

Two hidden states: FAIR and LOADED.
Transitions: P(stay in the same state) = 0.95, P(switch) = 0.05.
Emissions:
   Fair die:   P(1|F) = P(2|F) = P(3|F) = P(4|F) = P(5|F) = P(6|F) = 1/6
   Loaded die: P(1|L) = P(2|L) = P(3|L) = P(4|L) = P(5|L) = 1/10, P(6|L) = 1/2
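As a concrete reference for the later algorithm sketches, the model above can be written down as three small arrays. This is a minimal sketch assuming numpy; the variable names (init_dist, trans, emit) are illustrative, and the uniform initial distribution is an assumption, since the slide does not specify one.

```python
import numpy as np

# The dishonest-casino HMM as parameter arrays (names are illustrative).
states = ["FAIR", "LOADED"]            # hidden states X_t
symbols = [1, 2, 3, 4, 5, 6]           # observed die faces O_t (stored as indices 0..5 below)

# Initial state distribution P(X_1): assumed uniform (not given on the slide).
init_dist = np.array([0.5, 0.5])

# Transition distribution P(X_{t+1} | X_t): stay with prob 0.95, switch with prob 0.05.
trans = np.array([[0.95, 0.05],
                  [0.05, 0.95]])

# Emission distribution P(O_t | X_t): row 0 = FAIR, row 1 = LOADED.
emit = np.array([[1/6] * 6,
                 [1/10] * 5 + [1/2]])
```

The later sketches (sampling, forward, backward, Viterbi) assume parameter arrays of exactly this shape, with observations encoded as integer indices 0 to 5.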
Puzzles Regarding the Dishonest Casino
GIVEN: A sequence of rolls by the casino player
1245526462146146136136661664661636616366163616515615115146123562344
QUESTIONS
• How likely is this sequence, given our model of how the casino works?
   – This is the EVALUATION problem
• What portion of the sequence was generated with the fair die, and what portion with the loaded die?
   – This is the DECODING question
• How “loaded” is the loaded die? How “fair” is the fair die? How often does the casino player change from fair to loaded, and back?
   – This is the LEARNING question
Definition of HMM

[Graphical model: a chain of hidden states x1 → x2 → … → xT, with the same transition matrix A on every edge; each hidden state xt emits an observation ot.]
[State automaton view: the set of hidden states {1, 2, …, N} with transitions among them.]
1. N: the number of hidden states; Xt can take on values in X = {1, 2, …, N}
2. O: the set of values Ot can take on
3. Initial state distribution: P(X1 = k), for k = 1, 2, …, N
4. State transition distribution: P(Xt+1 = k | Xt = i), for k, i = 1, 2, …, N
5. Emission distribution: P(Ot | Xt)
Handwriting recognition
Character recognition, e.g., logistic regression, Naïve Bayes
Example of a hidden Markov model (HMM)
Understanding HMM Semantics
X1, X2, X3, X4, X5 ∈ {a, …, z}   (hidden states: the letters being written)
O1, O2, O3, O4, O5 = the observed images of the handwritten characters
HMM semantics: Details

Just 3 distributions:
• Initial state distribution: P(X1 = i)
• State transition distribution: P(Xt+1 = k | Xt = i)
• Emission distribution: P(Ot | Xt)
How do we generate a random output sequence following the HMM?
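One way to make the generative semantics concrete is ancestral sampling: draw X1 from the initial distribution, then alternately emit Ot and transition to Xt+1. The sketch below assumes casino-style parameter arrays like the ones defined earlier; all names are illustrative.

```python
import numpy as np

def sample_hmm(init_dist, trans, emit, T, rng=None):
    """Ancestral sampling: returns hidden states x[0..T-1] and observations o[0..T-1]."""
    rng = rng or np.random.default_rng()
    N, M = emit.shape                                # N hidden states, M observation symbols
    x = np.zeros(T, dtype=int)
    o = np.zeros(T, dtype=int)
    x[0] = rng.choice(N, p=init_dist)                # X_1 ~ P(X_1)
    for t in range(T):
        o[t] = rng.choice(M, p=emit[x[t]])           # O_t ~ P(O_t | X_t)
        if t + 1 < T:
            x[t + 1] = rng.choice(N, p=trans[x[t]])  # X_{t+1} ~ P(X_{t+1} | X_t)
    return x, o
```

For example, sample_hmm(init_dist, trans, emit, T=50) would generate a 50-roll sequence from the dishonest-casino model sketched earlier.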
Using and Learning HMMs
Core HMM questions:
1. How do we calculate P(o1, o2, …, on)?

2. How do we calculate argmax over x1, x2, …, xn of P(x1, x2, …, xn | o1, o2, …, on)?
3. How do we train the HMM, given its structure and:
   3a. fully observed training examples: < x1, …, xn, o1, …, on >
   3b. partially observed training examples: < o1, …, on >
How do we compute P(o1, o2, …, oT)?

1. Brute force: sum over all possible hidden state sequences,
   P(o1, …, oT) = Σ_{x1, …, xT} P(x1, …, xT, o1, …, oT)
   which requires summing N^T terms.
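A brute-force sketch (feasible only for very short sequences, since the sum has N^T terms); it assumes the same illustrative parameter-array layout as the earlier casino sketch, with observations given as integer symbol indices.

```python
import itertools
import numpy as np

def brute_force_likelihood(init_dist, trans, emit, obs):
    """P(o_1..o_T) by summing the joint P(x, o) over all N^T hidden state sequences."""
    N, T = len(init_dist), len(obs)
    total = 0.0
    for xs in itertools.product(range(N), repeat=T):     # every possible state sequence
        p = init_dist[xs[0]] * emit[xs[0], obs[0]]
        for t in range(1, T):
            p *= trans[xs[t - 1], xs[t]] * emit[xs[t], obs[t]]
        total += p
    return total
```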
2. Forward algorithm (dynamic programming / variable elimination): define
   α_t(k) = P(o1, …, ot, Xt = k)
   Base case: α_1(k) = P(X1 = k) P(o1 | X1 = k)
   Recursion: α_{t+1}(k) = P(o_{t+1} | X_{t+1} = k) Σ_i α_t(i) P(X_{t+1} = k | Xt = i)
   Then P(o1, …, oT) = Σ_k α_T(k), computed in O(T N²) time.
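A compact sketch of the forward recursion over the same illustrative parameter arrays. It keeps raw probabilities rather than working in log space or with scaling, so it will underflow on long sequences; that is fine for a short example.

```python
import numpy as np

def forward(init_dist, trans, emit, obs):
    """alpha[t, k] = P(o_1..o_t, X_t = k); returns alpha and the likelihood P(o_1..o_T)."""
    N, T = len(init_dist), len(obs)
    alpha = np.zeros((T, N))
    alpha[0] = init_dist * emit[:, obs[0]]                   # base case
    for t in range(1, T):
        alpha[t] = emit[:, obs[t]] * (alpha[t - 1] @ trans)  # sum over previous states
    return alpha, alpha[-1].sum()                            # P(o_1..o_T) = sum_k alpha[T-1, k]
```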
How do we compute P(Xt = k | o1, …, oT)?

Backward algorithm (dynamic programming / variable elimination): define
   β_t(k) = P(o_{t+1}, …, oT | Xt = k)
   Base case: β_T(k) = 1
   Recursion: β_t(k) = Σ_i P(X_{t+1} = i | Xt = k) P(o_{t+1} | X_{t+1} = i) β_{t+1}(i)
   Then P(Xt = k | o1, …, oT) = α_t(k) β_t(k) / P(o1, …, oT).
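A matching sketch of the backward recursion, again unscaled and over the same illustrative arrays; combined with the forward values it gives the smoothed posterior P(Xt = k | o1, …, oT).

```python
import numpy as np

def backward(trans, emit, obs):
    """beta[t, k] = P(o_{t+1}..o_T | X_t = k)."""
    N, T = trans.shape[0], len(obs)
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                             # base case: empty future
    for t in range(T - 2, -1, -1):
        beta[t] = trans @ (emit[:, obs[t + 1]] * beta[t + 1])  # sum over next states
    return beta

# Smoothed posterior: gamma[t, k] = alpha[t, k] * beta[t, k] / P(o_1..o_T)
```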
How do we compute argmax_{x1, …, xT} P(x1, …, xT | o1, …, oT)?

Viterbi algorithm, based on recursive computation of
   δ_t(k) = max_{x1, …, x_{t−1}} P(x1, …, x_{t−1}, Xt = k, o1, …, ot)
with recursion δ_{t+1}(k) = P(o_{t+1} | X_{t+1} = k) max_i δ_t(i) P(X_{t+1} = k | Xt = i); the most probable state sequence is recovered by tracing back the maximizing choices.
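A log-space sketch of the Viterbi recursion with back-pointers, using the same illustrative parameter arrays; it returns the most probable hidden state sequence for an observation sequence of integer symbol indices.

```python
import numpy as np

def viterbi(init_dist, trans, emit, obs):
    """Most probable hidden state path argmax_x P(x | o), computed in log space."""
    N, T = len(init_dist), len(obs)
    log_delta = np.zeros((T, N))          # best log-prob of any path ending in state k at time t
    backptr = np.zeros((T, N), dtype=int)
    log_delta[0] = np.log(init_dist) + np.log(emit[:, obs[0]])
    for t in range(1, T):
        scores = log_delta[t - 1][:, None] + np.log(trans)   # scores[i, k]: come from i, move to k
        backptr[t] = scores.argmax(axis=0)
        log_delta[t] = scores.max(axis=0) + np.log(emit[:, obs[t]])
    path = np.zeros(T, dtype=int)         # trace back the best path
    path[-1] = log_delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = backptr[t + 1, path[t + 1]]
    return path
```

Running this on the casino roll sequence (with faces encoded as 0 to 5) would give a guess of which rolls came from the loaded die, i.e., the DECODING question from earlier.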
Learning HMMs from fully observed data: easy

Learn the 3 distributions by maximum likelihood, i.e., relative frequencies (counts) over the training sequences:
• Initial state distribution P(X1 = k): fraction of sequences that start in state k
• Transition distribution P(Xt+1 = k | Xt = i): #(transitions i → k) / #(times in state i, excluding the final step)
• Emission distribution P(Ot = o | Xt = k): #(times state k emits o) / #(times in state k)
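A count-based sketch of these maximum-likelihood estimates for fully observed training sequences; the function name and the optional smoothing (pseudo-count) argument are illustrative additions, not from the slides.

```python
import numpy as np

def mle_from_observed(seqs, N, M, smoothing=0.0):
    """Estimate the 3 HMM distributions from fully observed (x, o) sequences.

    seqs: list of (x, o) pairs, where x and o are equal-length integer arrays.
    N, M: number of hidden states and number of observation symbols.
    """
    init_counts = np.full(N, smoothing)
    trans_counts = np.full((N, N), smoothing)
    emit_counts = np.full((N, M), smoothing)
    for x, o in seqs:
        init_counts[x[0]] += 1
        for t in range(len(x) - 1):
            trans_counts[x[t], x[t + 1]] += 1
        for t in range(len(x)):
            emit_counts[x[t], o[t]] += 1
    # Normalize counts into probability distributions.
    init_dist = init_counts / init_counts.sum()
    trans = trans_counts / trans_counts.sum(axis=1, keepdims=True)
    emit = emit_counts / emit_counts.sum(axis=1, keepdims=True)
    return init_dist, trans, emit
```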
Learning HMMs when we observe only o1, …, oT
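When the hidden states are not observed, the standard approach is EM, known for HMMs as the Baum-Welch algorithm: the E-step uses forward-backward to compute expected state occupancies and expected transitions, and the M-step re-estimates the three distributions from those expected counts. Below is a minimal, unscaled single-sequence sketch of one EM iteration (it will underflow on long sequences; the helper names are illustrative).

```python
import numpy as np

def forward_backward(init_dist, trans, emit, obs):
    """Unscaled alpha, beta, and the sequence likelihood P(o_1..o_T)."""
    N, T = len(init_dist), len(obs)
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))
    alpha[0] = init_dist * emit[:, obs[0]]
    for t in range(1, T):
        alpha[t] = emit[:, obs[t]] * (alpha[t - 1] @ trans)
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = trans @ (emit[:, obs[t + 1]] * beta[t + 1])
    return alpha, beta, alpha[-1].sum()

def baum_welch_step(init_dist, trans, emit, obs):
    """One EM iteration on a single observation sequence of integer symbol indices."""
    N, M = emit.shape
    T = len(obs)
    obs = np.asarray(obs)
    alpha, beta, likelihood = forward_backward(init_dist, trans, emit, obs)
    # E-step: expected state occupancies (gamma) and expected transitions (xi).
    gamma = alpha * beta / likelihood                      # gamma[t, k] = P(X_t = k | o)
    xi = np.zeros((T - 1, N, N))                           # xi[t, i, k] = P(X_t = i, X_{t+1} = k | o)
    for t in range(T - 1):
        xi[t] = (alpha[t][:, None] * trans *
                 (emit[:, obs[t + 1]] * beta[t + 1])[None, :]) / likelihood
    # M-step: re-estimate the 3 distributions from the expected counts.
    new_init = gamma[0]
    new_trans = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_emit = np.zeros((N, M))
    for sym in range(M):
        new_emit[:, sym] = gamma[obs == sym].sum(axis=0)
    new_emit /= gamma.sum(axis=0)[:, None]
    return new_init, new_trans, new_emit
```

Iterating baum_welch_step until the likelihood stops improving gives the usual EM fit; each iteration is guaranteed not to decrease the marginal likelihood of the observations.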
Additional Time Series Models
What you need to know
• Hidden Markov models (HMMs)
   – Very useful, very powerful!
   – Speech, OCR, time series, …
   – Parameter sharing, only learn 3 distributions
   – Dynamic programming (variable elimination) reduces inference complexity
   – Special case of Bayes net
   – Dynamic Bayesian Networks