
Page 1:

A Review of Hidden Markov Models for Context-Based Classification

ICML’01 Workshop on Temporal and Spatial Learning

Williams College, June 28th 2001

Padhraic Smyth
Information and Computer Science

University of California, Irvine

www.datalab.uci.edu

Page 2:

Outline

• Context in classification

• Brief review of hidden Markov models

• Hidden Markov models for classification

• Simulation results: how useful is context? (with Dasha Chudova, UCI)

Page 3:

Historical Note

• “Classification in Context” was well-studied in pattern recognition in the 60’s and 70’s
  – e.g., recursive Markov-based algorithms were proposed before hidden Markov algorithms and models were fully understood

• Applications in:
  – OCR for word-level recognition
  – remote-sensing pixel classification

Page 4:

Papers of Note

Raviv, J., “Decision-making in Markov chains applied to the problem of pattern recognition,” IEEE Trans. Information Theory, 13(4), 1967.

Hanson, Riseman, and Fisher, “Context in word recognition,” Pattern Recognition, 1976.

Toussaint, G., “The use of context in pattern recognition,” Pattern Recognition, 10, 1978.

Mohn, Hjort, and Storvik, “A simulation study of some contextual classification methods for remotely sensed data,” IEEE Trans. Geoscience and Remote Sensing, 25(6), 1987.

Page 5:

Context-Based Classification Problems

• Medical Diagnosis
  – classification of a patient’s state over time

• Fraud Detection
  – detection of stolen credit cards

• Electronic Nose
  – detection of landmines

• Remote Sensing
  – classification of pixels into ground cover

Page 6:

Modeling Context

• Common Theme = Context
  – class labels (and features) are “persistent” in time/space

Page 7:

Modeling Context

• Common Theme = Context– class labels (and features) are “persistent” in

time/space

C1 → C2 → C3 → - - - → CT        Class (hidden)
 |    |    |            |
 X1   X2   X3           XT       Features (observed)

(time runs left to right)

Page 8:

Feature Windows

• Predict Ct using a window, e.g., f(Xt, Xt-1, Xt-2)
  – e.g., the NETtalk application
  (a small sketch of windowed features follows the diagram below)

[Same chain diagram as above: hidden classes C1 … CT, observed features X1 … XT]
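A minimal sketch (mine, not from the talk) of the windowed-feature idea referenced above: stack each observation with its predecessors so an ordinary per-position classifier can predict Ct. Zero-padding at the start of the sequence is an arbitrary choice here.

    import numpy as np

    def window_features(X, width=3):
        """Row t is the concatenation (X[t-width+1], ..., X[t]), zero-padded
        at the start, so a per-position classifier sees f(Xt, Xt-1, Xt-2)."""
        T, d = X.shape
        padded = np.vstack([np.zeros((width - 1, d)), X])
        return np.hstack([padded[i:i + T] for i in range(width)])

    # Usage: feed window_features(X) and the labels C to any classifier
    # (logistic regression, a neural net, ...) to predict each C_t.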

Page 9:

Alternative: Probabilistic Modeling

• E.g., assume p(Ct | history) = p(Ct | Ct-1)
  – a first-order Markov assumption on the classes

[Same chain diagram as above: hidden classes C1 … CT, observed features X1 … XT]

Page 10:

Brief review of hidden Markov models (HMMs)

Page 11:

Graphical Models

• Basic Idea: p(U) <=> an annotated graph

– Let U be a set of random variables of interest

– 1-1 mapping from U to nodes in a graph

– graph encodes “independence structure” of model

– numerical specifications of p(U) are stored locally at the nodes

Page 12:

Acyclic Directed Graphical Models (aka belief/Bayesian networks)

[Diagram: A → C ← B]

In general,

p(X1, X2, ..., XN) = Π_i p(Xi | parents(Xi))

Here: p(A,B,C) = p(C|A,B) p(A) p(B)
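As a tiny numerical illustration (my own, with made-up probability tables, not from the talk), the factorization for this three-node graph can be checked directly:

    import numpy as np

    pA = np.array([0.7, 0.3])          # p(A), binary
    pB = np.array([0.6, 0.4])          # p(B), binary
    pC_AB = np.array([[[0.9, 0.1],     # p(C | A=0, B=0)
                       [0.4, 0.6]],    # p(C | A=0, B=1)
                      [[0.2, 0.8],     # p(C | A=1, B=0)
                       [0.5, 0.5]]])   # p(C | A=1, B=1)

    # joint[a, b, c] = p(A=a) p(B=b) p(C=c | A=a, B=b)
    joint = pA[:, None, None] * pB[None, :, None] * pC_AB
    assert np.isclose(joint.sum(), 1.0)   # the factors define a valid joint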

Page 13:

Undirected Graphical Models (UGs)

• Undirected edges reflect correlational dependencies
  – e.g., particles in physical systems, pixels in an image

• Also known as Markov random fields, Boltzmann machines, etc.

p(X1, X2, ..., XN) = (1/Z) Π_i potential(clique i)

Page 14:

Examples of 3-way Graphical Models

A → B → C      Markov chain: p(A,B,C) = p(C|B) p(B|A) p(A)

Page 15:

Examples of 3-way Graphical Models

A → B → C      Markov chain: p(A,B,C) = p(C|B) p(B|A) p(A)

A → C ← B      Independent causes: p(A,B,C) = p(C|A,B) p(A) p(B)

Page 16:

Hidden Markov Graphical Model

• Assumption 1:
  – p(Ct | history) = p(Ct | Ct-1)
  – a first-order Markov assumption on the classes

• Assumption 2:
  – p(Xt | history, Ct) = p(Xt | Ct)
  – Xt only depends on the current class Ct
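These two assumptions fully specify a generative model. A minimal numpy sketch of sampling from it (mine, assuming one-dimensional Gaussian class-conditional densities, the case used in the simulations later in the talk):

    import numpy as np

    def sample_hmm(pi, A, means, sigma, T, rng):
        """pi[k] = p(C1 = k); A[i, j] = p(Ct = j | Ct-1 = i);
        p(Xt | Ct = k) = Normal(means[k], sigma**2)."""
        C = np.empty(T, dtype=int)
        C[0] = rng.choice(len(pi), p=pi)
        for t in range(1, T):                      # Assumption 1: Markov classes
            C[t] = rng.choice(len(pi), p=A[C[t - 1]])
        X = rng.normal(means[C], sigma)            # Assumption 2: Xt depends on Ct only
        return C, X

    # Usage:
    # rng = np.random.default_rng(0)
    # C, X = sample_hmm(np.array([0.5, 0.5]),
    #                   np.array([[0.9, 0.1], [0.1, 0.9]]),
    #                   np.array([0.0, 1.0]), sigma=1.0, T=100, rng=rng)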

Page 17:

Hidden Markov Graphical Model

[Same chain diagram as above: hidden classes C1 … CT, observed features X1 … XT]

Notes:
 – all temporal dependence is modeled through the class variable C
 – this is the simplest possible model
 – it avoids modeling p(X | other X’s)

Page 18:

Generalizations of HMMs

[Diagram: hidden “weather state” chain C1 → C2 → - - - → CT; each state Ct is linked to observed spatial rainfall Rt and observed atmospheric measurements At]

Hidden state model relating atmospheric measurements to local rainfall.

“Weather state” couples multiple variables in time and space.

(Hughes and Guttorp, 1996)

Graphical models = a language for spatio-temporal modeling

Page 19:

Exact Probability Propagation (PP) Algorithms

• Basic PP Algorithm
  – Pearl, 1988; Lauritzen and Spiegelhalter, 1988
  – assume the graph has no loops
  – declare 1 node (any node) to be the root
  – schedule two phases of message-passing:
    • nodes pass messages up to the root
    • messages are distributed back down to the leaves
  – (if there are loops, convert the loopy graph to an equivalent tree)

Page 20:

Properties of the PP Algorithm

• Exact
  – p(node | all data) is recoverable at each node
    • i.e., we get exact posteriors from local message-passing
  – modification: MPE = most likely instantiation of all nodes jointly

• Efficient
  – complexity: exponential in the size of the largest clique
  – brute force: exponential in all variables

Page 21:

Hidden Markov Graphical Model

[Same chain diagram as above: hidden classes C1 … CT, observed features X1 … XT]

Page 22:

PP Algorithm for a HMM

[Same chain diagram: hidden classes C1 … CT, observed features X1 … XT]

Let CT be the root

Page 23:

PP Algorithm for a HMM

[Same chain diagram: hidden classes C1 … CT, observed features X1 … XT]

Let CT be the root

Absorb evidence from X’s (which are fixed)

Page 24:

PP Algorithm for a HMM

[Same chain diagram: hidden classes C1 … CT, observed features X1 … XT]

Let CT be the root

Absorb evidence from X’s (which are fixed)

Forward pass: pass evidence forward from C1

Page 25:

PP Algorithm for a HMM

[Same chain diagram: hidden classes C1 … CT, observed features X1 … XT]

Let CT be the root

Absorb evidence from X’s (which are fixed)

Forward pass: pass evidence forward from C1

Backward pass: pass evidence backward from CT

(This is the celebrated “forward-backward” algorithm for HMMs)
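A minimal numpy sketch of those two passes (my own, not the talk’s code), with the standard per-step rescaling to avoid underflow. B[t, k] plays the role of the absorbed evidence p(Xt | Ct = k):

    import numpy as np

    def forward_backward(pi, A, B):
        """Returns gamma[t, k] = p(Ct = k | X1..XT)."""
        T, K = B.shape
        alpha = np.empty((T, K)); beta = np.empty((T, K)); c = np.empty(T)
        alpha[0] = pi * B[0]                       # forward pass from C1
        c[0] = alpha[0].sum(); alpha[0] /= c[0]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[t]
            c[t] = alpha[t].sum(); alpha[t] /= c[t]
        beta[T - 1] = 1.0                          # backward pass from CT
        for t in range(T - 2, -1, -1):
            beta[t] = (A @ (B[t + 1] * beta[t + 1])) / c[t + 1]
        return alpha * beta                        # rows already sum to 1

The scaling factors c[t] are the one-step predictive likelihoods, so their logs also give the sequence log-likelihood for free.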

Page 26:

Comments on F-B Algorithm

• Complexity = O(T m²), for T time steps and m classes

• Has been reinvented several times
  – e.g., the BCJR algorithm for error-correcting codes

• Real-time recursive version
  – run the algorithm forward to the current time t
  – can propagate backwards to “revise” history

Page 27:

HMMs and Classification

Page 28:

Forward-Backward Algorithm

• Classification
  – the algorithm produces p(Ct | all other data) at each node
  – to minimize 0-1 loss, choose the most likely class at each t

• Most likely class sequence?
  – not the same as the sequence of most likely classes
  – can be found instead with Viterbi/dynamic programming:
    replace the sums in F-B with “max”
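A minimal sketch of that max-product variant (mine; working in log space is an implementation choice, not part of the slide):

    import numpy as np

    def viterbi(pi, A, B):
        """Most likely class sequence under the HMM; B[t, k] = p(Xt | Ct = k)."""
        T, K = B.shape
        logA = np.log(A)
        delta = np.log(pi) + np.log(B[0])
        back = np.zeros((T, K), dtype=int)         # back[0] is unused
        for t in range(1, T):
            scores = delta[:, None] + logA         # scores[i, j]: state i -> j
            back[t] = scores.argmax(axis=0)        # best predecessor of each j
            delta = scores.max(axis=0) + np.log(B[t])
        path = np.empty(T, dtype=int)
        path[-1] = delta.argmax()
        for t in range(T - 2, -1, -1):             # trace the argmaxes back
            path[t] = back[t + 1, path[t + 1]]
        return path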

Page 29:

Supervised HMM learning

• Use your favorite classifier to learn p(C|X)
  – i.e., ignore the temporal aspect of the problem (temporarily)

• Now estimate p(Ct | Ct-1) from labeled training data

• We have a fully operational HMM
  – no need to use EM for learning if class labels are provided
    (i.e., do “supervised HMM learning”)
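A minimal sketch of the transition-estimation step, by counting labeled transitions; the add-one (Laplace) smoothing is my assumption, not something stated in the talk:

    import numpy as np

    def estimate_transitions(label_seqs, K):
        """label_seqs: iterable of integer class-label sequences; K classes.
        Returns A with A[i, j] ~ p(Ct = j | Ct-1 = i)."""
        counts = np.ones((K, K))                   # add-one smoothing (assumption)
        for seq in label_seqs:
            for prev, cur in zip(seq[:-1], seq[1:]):
                counts[prev, cur] += 1
        return counts / counts.sum(axis=1, keepdims=True)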

Page 30:

Fault Diagnosis Application (Smyth, Pattern Recognition, 1994)

[Same chain diagram: hidden fault classes C1 … CT, observed features X1 … XT]

Fault Detection in 34m Antenna Systems:

Classes: {normal, short-circuit, tacho problem, ..}

Features: AR coefficients measured every 2 seconds

Classes are persistent over time

Page 31:

Approach and Results

• Classifiers
  – Gaussian model and neural network
  – trained on labeled “instantaneous window” data

• Markov component
  – transition probabilities estimated from MTBF data

• Results
  – the discriminative neural net was much better than the Gaussian model
  – the Markov component reduced the error rate (all false alarms) from 2% to 0%

Page 32:

Classification with and without the Markov context

We will compare what happens when

(a) we just make decisions based on p(Ct | Xt) (“ignore context”)

(b) we use the full Markov context (i.e., use forward-backward to “integrate” temporal information)

[Same chain diagram: hidden classes C1 … CT, observed features X1 … XT]

Page 33:

[Figure: left panel — the two class-conditional densities p(x) (Component 1, Component 2); right panel — the resulting mixture model p(x)]

Page 34:

[Figure: “Gaussian vs HMM Classification” — the observation sequence plotted over t = 0…100]

Page 35:

[Figure: as above, with the true states overlaid on the observations]

Page 36:

[Figure: adds a second panel showing posterior class probabilities over t = 0…100]

Page 37:

[Figure: the posterior panel now compares the HMM posterior with the context-free Gaussian posterior]

Page 38:

[Figure: adds a panel with the HMM decoding (most likely class at each t)]

Page 39:

[Figure: adds a final panel with the Gaussian (context-free) decoding, for comparison with the HMM decoding]

Page 40:

Simulation Experiments

Page 41:

Systematic Simulations

Simulation Setup

1. Two Gaussian classes, at mean 0 and mean 1
   => vary the “separation” = distance between the means in units of sigma
      (i.e., vary the sigma of the Gaussians)

2. Markov dependence: A = [p 1-p ; 1-p p]
   => vary p (the self-transition probability) = “strength of context”

Look at the Bayes error with and without context
(a code sketch of this setup follows the diagram below)

[Same chain diagram: hidden classes C1 … CT, observed features X1 … XT]
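A hedged re-creation of this setup (the slide does not give exact experimental details), reusing sample_hmm and forward_backward from the sketches above. Here “separation” is read as the distance between the means in sigma units, so with means fixed at 0 and 1, sigma = 1/separation:

    import numpy as np

    def error_rates(p, separation, T=100_000, seed=1):
        rng = np.random.default_rng(seed)
        sigma = 1.0 / separation
        A = np.array([[p, 1 - p], [1 - p, p]])     # A = [p 1-p ; 1-p p]
        pi = np.array([0.5, 0.5])
        means = np.array([0.0, 1.0])
        C, X = sample_hmm(pi, A, means, sigma, T, rng)
        # class-conditional likelihoods; the shared Gaussian normalizing
        # constant cancels both in the argmax and inside forward_backward
        B = np.exp(-0.5 * ((X[:, None] - means) / sigma) ** 2)
        no_context = B.argmax(axis=1)              # decide from Xt alone
        with_context = forward_backward(pi, A, B).argmax(axis=1)
        return (no_context != C).mean(), (with_context != C).mean()

With T large, these empirical error rates approximate the two error rates being compared.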

Page 42:

[Figure: class-conditional densities p(x) for Class 1 and Class 2 at separation = 3 sigma; Bayes error = 0.08]

Page 43:

[Figure: class-conditional densities p(x) for Class 1 and Class 2 at separation = 1 sigma; Bayes error = 0.31]

Page 44:

[Figure: “Bayes Error vs. Markov Probability” — Bayes error rate vs. self-transition probability (0.5 to 1.0), for separations 0.1, 1, 2, and 4]

Page 45:

[Figure: “Bayes Error vs. Gaussian Separation” — Bayes error rate vs. separation (0 to 4), for self-transition probabilities 0.5, 0.9, 0.94, and 0.99]

Page 46:

[Figure: “% Reduction in Bayes Error vs. Gaussian Separation” — percent decrease in Bayes error vs. separation, for self-transition probabilities 0.5, 0.9, 0.94, and 0.99]

Page 47:

In summary….

• Context reduces error
  – greater Markov dependence => greater reduction

• Reduction is dramatic for p > 0.9
  – e.g., even with minimal Gaussian separation, the Bayes error can be reduced to zero!

Page 48:

Approximate Methods

• Forward-only
  – necessary in many applications

• “Two nearest-neighbors”
  – only use information from C(t-1) and C(t+1)

• How suboptimal are these methods?
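A minimal sketch of the forward-only approximation (mine): it is just the forward recursion, normalized at each step, so p(Ct | X1..Xt) is available online without the backward sweep:

    import numpy as np

    def forward_only(pi, A, B):
        """Returns alpha[t, k] = p(Ct = k | X1..Xt) (filtered posterior)."""
        T, K = B.shape
        alpha = np.empty((T, K))
        alpha[0] = pi * B[0]; alpha[0] /= alpha[0].sum()
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[t]
            alpha[t] /= alpha[t].sum()
        return alpha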

Page 49:

[Figure: “Bayes Error vs. Markov Probability” — Bayes error rate vs. log-odds of the self-transition probability at separation = 1, comparing FwBw, Fw, and NN2]

Page 50:

[Figure: the same comparison at separation = 0.25 (FwBw, Fw, NN2)]

Page 51:

[Figure: “Bayes Error vs. Gaussian Separation” at self-transition = 0.99, comparing FwBw, Fw, NN2, and a Bayes error baseline]

Page 52:

[Figure: the same comparison at self-transition = 0.9 (FwBw, Fw, NN2, Bayes error baseline)]

Page 53:

In summary (for approximations)….

• Forward only:
  – “tracks” the forward-backward reductions
  – generally gets much more than 50% of the gap between F-B and the context-free Bayes error

• 2-neighbors:
  – typically worse than forward-only
  – much worse for small separation
  – much worse for very high transition probabilities
    • does not converge to zero Bayes error

Page 54:

Extensions to “Simple” HMMs

Semi-Markov models
  – the duration in each state need not be geometric

Segmental Markov models
  – outputs within each state have a non-constant mean, e.g., a regression function

Dynamic belief networks
  – allow arbitrary dependencies among classes and features

Stochastic grammars, spatial landmark models, etc.

[See the afternoon talks at this workshop for other approaches]

Page 55:

Conclusions

• Context is increasingly important in many classification applications

• Graphical models
  – HMMs are a simple and practical approach
  – graphical models provide a general-purpose language for context

• Theory/Simulation
  – the effect of context on the error rate can be dramatic

Page 56:

[Figure: “Absolute Reduction in Bayes Error vs. Gaussian Separation,” for self-transition probabilities 0.5, 0.9, 0.94, and 0.99]

Page 57:

[Figure: “Bayes Error vs. Markov Probability” — Bayes error rate vs. log-odds of the self-transition probability at separation = 3, comparing FwBw, Fw, and NN2]

Page 58:

[Figure: “Bayes Error vs. Gaussian Separation” at self-transition = 0.7, comparing FwBw, Fw, NN2, and a Bayes error baseline]

Page 59:

[Figure: “Absolute Reduction in Bayes Error vs. Gaussian Separation” at self-transition = 0.99, comparing FwBw, Fw, and NN2]

Page 60:

[Figure: “Percent Decrease in Bayes Error vs. Gaussian Separation” at self-transition = 0.99, comparing FwBw, Fw, and NN2]

Page 61:

Sketch of the PP algorithm in action

[Build animation across Pages 61–66: messages 1, 2, 3, and 4 are passed one at a time through the tree — up toward the root, then back out to the leaves]