S. Maarschalkerweerd & A. Tjhang
Parameter estimation for HMMs, Baum-Welch algorithm, Model topology, Numerical stability
Chapter 3.3-3.7
Overview last lecture
Hidden Markov Models
Different algorithms:
– Viterbi
– Forward
– Backward
Overview today
Parameter estimation for HMMs
– Baum-Welch algorithm
HMM model structure
More complex Markov chains
Numerical stability of HMM algorithms
Specifying a HMM model
Most difficult problem using HMMs is specifying the model
– Design of the structure
– Assignment of parameter values
Parameter estimation for HMMs
Estimate the transition and emission probabilities a_kl and e_k(b)
Two ways of learning:
– Estimation when the state sequence is known
– Estimation when the paths are unknown
Assume that we have a set of example sequences (training sequences x^1, …, x^n)
Parameter estimation for HMMs
Assume that x^1, …, x^n are independent, so
P(x^1, …, x^n | θ) = ∏_{j=1}^{n} P(x^j | θ)
Since log(ab) = log a + log b, the log likelihood is a sum:
log P(x^1, …, x^n | θ) = Σ_{j=1}^{n} log P(x^j | θ)
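A small numerical illustration of why the sum of logs is used instead of the product (the per-sequence probabilities below are made up for illustration):

```python
import math

# Hypothetical per-sequence probabilities P(x^j | theta) for three
# independent training sequences (invented values for illustration).
probs = [1e-120, 1e-150, 1e-100]

# Multiplying the raw probabilities underflows to 0.0 in double precision:
product = math.prod(probs)

# Summing the logs keeps the joint log likelihood finite:
log_likelihood = sum(math.log(p) for p in probs)

print(product)         # 0.0 (underflow)
print(log_likelihood)  # about -851.96
```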
Estimation when state sequence is known
Easier than estimation when the paths are unknown
A_kl = number of transitions k→l in the training data + r_kl
E_k(b) = number of emissions of b from k in the training data + r_k(b)
(the r terms are pseudocounts)
The ML estimates are then a_kl = A_kl / Σ_{l'} A_{kl'} and e_k(b) = E_k(b) / Σ_{b'} E_k(b')
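As a minimal sketch of this counting step, assuming a toy two-state model over coin symbols (the states "FL", alphabet "HT", and pseudocount value are invented for the example):

```python
def estimate_parameters(sequences, paths, states, alphabet, r=1.0):
    """ML estimation of a_kl and e_k(b) when the state sequence is known.

    sequences: symbol strings; paths: matching state strings.
    r is a pseudocount added to every count, so transitions and
    emissions never seen in training still get nonzero probability.
    """
    A = {k: {l: r for l in states} for k in states}
    E = {k: {b: r for b in alphabet} for k in states}
    for x, pi in zip(sequences, paths):
        for i, (sym, st) in enumerate(zip(x, pi)):
            E[st][sym] += 1                 # emission of sym from state st
            if i + 1 < len(pi):
                A[st][pi[i + 1]] += 1       # transition st -> next state
    # Normalise: a_kl = A_kl / sum_l' A_kl', e_k(b) = E_k(b) / sum_b' E_k(b')
    a = {k: {l: A[k][l] / sum(A[k].values()) for l in states} for k in states}
    e = {k: {b: E[k][b] / sum(E[k].values()) for b in alphabet} for k in states}
    return a, e
```

For example, `estimate_parameters(["HHTT"], ["FFLL"], "FL", "HT", r=0.5)` gives a["F"]["F"] = 0.5, since the path contains one F→F and one F→L transition and both get the same pseudocount.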
Estimation when paths are unknown
More complex than when the paths are known
We can't use maximum likelihood estimators directly
Instead, an iterative algorithm is used:
– Baum-Welch
The Baum-Welch algorithm
We don't know the real values of A_kl and E_k(b)
1. Estimate A_kl and E_k(b)
2. Update a_kl and e_k(b)
3. Repeat with the new model parameters a_kl and e_k(b)
Baum-Welch algorithm
The probability that the transition k→l is used at position i of sequence x combines the forward value f_k(i) and the backward value b_l(i+1):
P(π_i = k, π_{i+1} = l | x, θ) = f_k(i) a_kl e_l(x_{i+1}) b_l(i+1) / P(x)
Summing over all positions i and training sequences x^j gives the expected counts:
A_kl = Σ_j (1/P(x^j)) Σ_i f_k^j(i) a_kl e_l(x^j_{i+1}) b_l^j(i+1)
E_k(b) = Σ_j (1/P(x^j)) Σ_{i: x^j_i = b} f_k^j(i) b_k^j(i)
Baum-Welch algorithm
Now that we have estimated A_kl and E_k(b), use maximum likelihood estimators to compute a_kl and e_k(b)
We use these values to estimate A_kl and E_k(b) in the next iteration
Continue iterating until the change is very small or the maximum number of iterations is exceeded
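The iteration described above can be sketched in Python. The initial state distribution pi0, the integer-coded symbols, and the fixed iteration count are assumptions of this sketch (the slides leave initialisation open), and no scaling is applied, so it only suits short sequences:

```python
def forward(x, a, e, pi0):
    # f[i][k] = P(x_0..x_i, state at position i is k)
    n, S = len(x), len(a)
    f = [[0.0] * S for _ in range(n)]
    for k in range(S):
        f[0][k] = pi0[k] * e[k][x[0]]
    for i in range(1, n):
        for l in range(S):
            f[i][l] = e[l][x[i]] * sum(f[i-1][k] * a[k][l] for k in range(S))
    return f, sum(f[n-1])          # forward table and P(x)

def backward(x, a, e):
    # b[i][k] = P(x_{i+1}..x_{n-1} | state at position i is k)
    n, S = len(x), len(a)
    b = [[1.0] * S for _ in range(n)]
    for i in range(n - 2, -1, -1):
        for k in range(S):
            b[i][k] = sum(a[k][l] * e[l][x[i+1]] * b[i+1][l] for l in range(S))
    return b

def baum_welch(seqs, a, e, pi0, n_iter=30):
    S, M = len(a), len(e[0])
    for _ in range(n_iter):
        A = [[0.0] * S for _ in range(S)]   # expected transition counts
        E = [[0.0] * M for _ in range(S)]   # expected emission counts
        for x in seqs:
            f, px = forward(x, a, e, pi0)
            b = backward(x, a, e)
            for i in range(len(x)):
                for k in range(S):
                    E[k][x[i]] += f[i][k] * b[i][k] / px
                    if i + 1 < len(x):
                        for l in range(S):
                            A[k][l] += f[i][k] * a[k][l] * e[l][x[i+1]] * b[i+1][l] / px
        # Update a_kl and e_k(b) by maximum likelihood on the expected counts
        a = [[A[k][l] / sum(A[k]) for l in range(S)] for k in range(S)]
        e = [[E[k][c] / sum(E[k]) for c in range(M)] for k in range(S)]
    return a, e
```

Each iteration is guaranteed not to decrease the total likelihood of the training data, which is why the stopping rule can simply watch for a very small change.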
Drawbacks
ML estimators
– Vulnerable to overfitting if there is not enough data
– Estimates can be undefined for states never used in the training set (use pseudocounts)
Baum-Welch
– A local maximum instead of the global maximum can be found, depending on the starting values of the parameters
– This problem is worse for large HMMs
Modelling of labelled sequences
Only -- and ++ are calculated
Better than using ML estimators when many different classes are present
Specifying a HMM model
Most difficult problem using HMMs is specifying the model
– Design of the structure
– Assignment of parameter values
Design of the structure
Design: how to connect states by transitions
A good HMM is based on knowledge about the problem under investigation
Local maxima are the biggest disadvantage in models that are fully connected
After deleting a transition from the model, Baum-Welch will still work: set the transition probability to zero
Example 2
Model a distribution of lengths between 2 and 10
Silent states
States that do not emit symbols (such as the begin state B)
Silent states can also occur in other places in an HMM
Silent states
Advantage:
– Fewer transition probabilities need to be estimated
Drawback:
– Limits the possibilities for defining a model
More complex Markov chains
So far, we assumed that the probability of a symbol in a sequence depends only on the previous symbol
More complex:
– High order Markov chains
– Inhomogeneous Markov chains
High order Markov chains
In an nth order Markov process, the probability of a symbol in a sequence depends on the previous n symbols
An nth order Markov chain over some alphabet A is equivalent to a first order Markov chain over the alphabet A^n of n-tuples, because the transition probability between overlapping n-tuples equals the conditional probability of the newest symbol given the previous n symbols
Example
A second order Markov chain with two different symbols {A, B}
This can be translated into a first order Markov chain over the 2-tuples {AA, AB, BA, BB}
Sometimes the framework of the high order model is more convenient
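The tuple translation above can be sketched as follows; the toy training sequence, the two-symbol alphabet, and the pseudocount are invented for the example:

```python
from collections import Counter
from itertools import product

def second_order_as_first(seqs, alphabet="AB", pseudo=1.0):
    """Estimate a 2nd order chain and express it as a 1st order chain
    over 2-tuples. A tuple transition (xy) -> (yz) carries probability
    P(z | xy); tuples that do not overlap cannot follow each other."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - 2):
            counts[(s[i:i+2], s[i+2])] += 1   # context xy followed by z
    table = {}
    for ctx in ("".join(p) for p in product(alphabet, repeat=2)):
        total = sum(counts[(ctx, z)] + pseudo for z in alphabet)
        for z in alphabet:
            table[(ctx, ctx[1] + z)] = (counts[(ctx, z)] + pseudo) / total
    return table
```

For the training sequence "ABABAB", the context AB is followed by A twice and by B never, so with pseudocounts the tuple transition AB→BA gets probability 3/4.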
Finding prokaryotic genes
Gene candidates in DNA:
– a sequence of triplets of nucleotides: startcodon, a number of non-stopcodons, stopcodon
– an open reading frame (ORF)
An ORF can be either a gene or a non-coding ORF (NORF)
Finding prokaryotic genes
Experiment:
– DNA from the bacterium E. coli
– Dataset contains 1100 genes (900 used for training, 200 for testing)
Two models:
– Normal model with first order Markov chains
– Also first order Markov chains, but codons instead of nucleotides are used as symbols
Inhomogeneous Markov chains
Using the position information within the codon
– Three models, for codon positions 1, 2 and 3
Example sequence: CAT GCA (codon positions 1 2 3 1 2 3)
Homogeneous: P(C) a_CA a_AT a_TG a_GC a_CA
Inhomogeneous: P(C) a²_CA a³_AT a¹_TG a²_GC a³_CA
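The cycling of the three position-specific matrices can be sketched as below; the two-symbol alphabet and the uniform matrices in the usage example are invented for illustration:

```python
import math

def inhomogeneous_log_prob(seq, p0, a):
    """Log probability of a sequence under an inhomogeneous chain with
    one transition matrix per codon position.

    p0: initial symbol distribution; a[p] is the matrix used for the
    symbol at codon position p+1 (0-based index p = i % 3), so the
    matrices cycle 1, 2, 3, 1, 2, 3, ... along the sequence.
    """
    lp = math.log(p0[seq[0]])
    for i in range(1, len(seq)):
        # The transition into the symbol at index i uses that symbol's
        # position-specific matrix, as in the CAT GCA example above.
        lp += math.log(a[i % 3][(seq[i - 1], seq[i])])
    return lp
```

With all three matrices uniform over a two-letter alphabet, every transition contributes log(0.5), which makes the function easy to sanity-check by hand.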
Numerical stability of HMM algorithms
Multiplying many probabilities can cause numerical problems:
– Underflow errors
– Wrong numbers are calculated
Solutions:
– Log transformation
– Scaling of probabilities
The log transformation
Compute log probabilities
– log₁₀(10^-100000) = -100000
– The underflow problem is essentially solved
The sum operation is often faster than the product operation
In the Viterbi algorithm the recursion becomes:
v_l(i+1) = log e_l(x_{i+1}) + max_k [ v_k(i) + log a_kl ]
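The log-space Viterbi recursion can be sketched as follows; the integer-coded symbols, the initial distribution, and the sticky two-state example are assumptions of this sketch:

```python
import math

def viterbi_log(x, log_pi0, log_a, log_e):
    """Viterbi in log space: v_l(i+1) = log e_l(x_{i+1})
    + max_k [v_k(i) + log a_kl]. Sums of logs cannot underflow the
    way products of probabilities do."""
    S, n = len(log_a), len(x)
    v = [[0.0] * S for _ in range(n)]
    ptr = [[0] * S for _ in range(n)]
    for k in range(S):
        v[0][k] = log_pi0[k] + log_e[k][x[0]]
    for i in range(1, n):
        for l in range(S):
            best = max(range(S), key=lambda k: v[i-1][k] + log_a[k][l])
            v[i][l] = log_e[l][x[i]] + v[i-1][best] + log_a[best][l]
            ptr[i][l] = best
    # Traceback of the most probable state path
    path = [max(range(S), key=lambda k: v[n-1][k])]
    for i in range(n - 1, 0, -1):
        path.append(ptr[i][path[-1]])
    return path[::-1]
```

With a model in which each state strongly prefers both to persist and to emit its own symbol, the decoded path simply follows the observed symbols.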
Scaling of probabilities
Scale the f and b variables
Forward variable:
– For each i a scaling variable s_i is defined
– New f variables are defined: f̃_l(i) = f_l(i) / ∏_{j=1..i} s_j
– New forward recursion: f̃_l(i+1) = (1/s_{i+1}) e_l(x_{i+1}) Σ_k f̃_k(i) a_kl
Scaling of probabilities
Backward variable:
– Scaling has to use the same numbers s_i as the forward variable
– New backward recursion: b̃_k(i) = (1/s_{i+1}) Σ_l a_kl e_l(x_{i+1}) b̃_l(i+1)
This normally works well; however, underflow errors can still occur in models with many silent states (chapter 5)
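The scaled forward recursion can be sketched as below, assuming the common choice s_i = Σ_l f_l(i) so that the scaled variables sum to 1 at every position (the slides do not pin down the choice of s_i), with log P(x) recovered as the sum of the log s_i:

```python
import math

def scaled_forward(x, a, e, pi0):
    """Forward algorithm with scaling. At each position the f variables
    are divided by s_i = sum_l f_l(i), so they stay in [0, 1]; log P(x)
    is recovered as the sum of the log s_i."""
    S = len(a)
    f = [pi0[k] * e[k][x[0]] for k in range(S)]
    s = sum(f)
    f = [v / s for v in f]
    log_px = math.log(s)
    for i in range(1, len(x)):
        f_new = [e[l][x[i]] * sum(f[k] * a[k][l] for k in range(S))
                 for l in range(S)]
        s = sum(f_new)
        f = [v / s for v in f_new]   # scaled variables sum to 1
        log_px += math.log(s)
    return f, log_px
```

For a fully uniform two-state model every length-2 sequence has probability 0.25, which gives a quick hand-checkable test of the returned log P(x).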