Hidden Markov Models
Post on 19-Dec-2015
Two learning scenarios
1. Estimation when the “right answer” is known
Examples:
GIVEN: a genomic region $x = x_1 \dots x_{1{,}000{,}000}$ where we have good (experimental) annotations of the CpG islands
GIVEN: the casino player allows us to observe him one evening, as he changes dice and produces 10,000 rolls
2. Estimation when the “right answer” is unknown
Examples:
GIVEN: the porcupine genome; we don’t know how frequent the CpG islands are there, nor do we know their composition
GIVEN: 10,000 rolls of the casino player, but we don’t see when he changes dice
QUESTION: Update the parameters $\theta$ of the model to maximize $P(x \mid \theta)$
1. When the right answer is known
Given $x = x_1 \dots x_N$
for which the true path $\pi = \pi_1 \dots \pi_N$ is known,
Define:
$A_{kl}$ = # times the $k \to l$ transition occurs in $\pi$
$E_k(b)$ = # times state $k$ in $\pi$ emits $b$ in $x$
We can show that the maximum likelihood parameters $\theta$ are:
$$a_{kl} = \frac{A_{kl}}{\sum_i A_{ki}}, \qquad e_k(b) = \frac{E_k(b)}{\sum_c E_k(c)}$$
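The counting estimate above can be sketched in a few lines of Python. The function name `estimate_parameters`, the optional `pseudocount` argument, and the toy casino data (F = fair die, L = loaded die) are illustrative assumptions, not part of the slides.

```python
from collections import Counter

def estimate_parameters(x, pi, states, alphabet, pseudocount=0.0):
    """Maximum-likelihood HMM parameters from a sequence x with known path pi."""
    A = Counter(zip(pi, pi[1:]))  # A[k, l]: number of k -> l transitions in pi
    E = Counter(zip(pi, x))       # E[k, b]: number of times state k emits b
    a, e = {}, {}
    for k in states:
        trans_total = sum(A[k, l] + pseudocount for l in states)
        emit_total = sum(E[k, c] + pseudocount for c in alphabet)
        for l in states:
            # a_kl = A_kl / sum_i A_ki (0 if state k was never left or seen)
            a[k, l] = (A[k, l] + pseudocount) / trans_total if trans_total else 0.0
        for b in alphabet:
            # e_k(b) = E_k(b) / sum_c E_k(c)
            e[k, b] = (E[k, b] + pseudocount) / emit_total if emit_total else 0.0
    return a, e

# Toy labelled data (made up for illustration).
rolls = "1662665"
path = "FFLLLLF"
a, e = estimate_parameters(rolls, path, states="FL", alphabet="123456")
print(a["L", "L"], e["L", "6"])
```

With so few observations many counts are zero, which is why a small pseudocount is commonly added in practice.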
2. When the right answer is unknown
We don’t know the true Akl, Ek(b)
Idea:
• We estimate our “best guess” on what Akl, Ek(b) are
• We update the parameters of the model, based on our guess
• We repeat
Starting with our best guess of a model M with parameters $\theta$:
Given $x = x_1 \dots x_N$
for which the true path $\pi = \pi_1 \dots \pi_N$ is unknown,
We can get to a provably more likely parameter set $\theta$
Principle: EXPECTATION MAXIMIZATION
1. Estimate Akl, Ek(b) in the training data
2. Update $\theta$ according to $A_{kl}$, $E_k(b)$
3. Repeat 1 & 2, until convergence
Estimating new parameters
To estimate Akl:
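At each position $i$ of sequence $x$,
Find the probability that transition $k \to l$ is used:
$$P(\pi_i = k, \pi_{i+1} = l \mid x) = \frac{1}{P(x)}\, P(\pi_i = k, \pi_{i+1} = l, x_1 \dots x_N) = \frac{Q}{P(x)}$$
where
$$\begin{aligned}
Q &= P(x_1 \dots x_i, \pi_i = k, \pi_{i+1} = l, x_{i+1} \dots x_N) \\
  &= P(\pi_{i+1} = l, x_{i+1} \dots x_N \mid \pi_i = k)\, P(x_1 \dots x_i, \pi_i = k) \\
  &= P(\pi_{i+1} = l, x_{i+1} x_{i+2} \dots x_N \mid \pi_i = k)\, f_k(i) \\
  &= P(x_{i+2} \dots x_N \mid \pi_{i+1} = l)\, P(x_{i+1} \mid \pi_{i+1} = l)\, P(\pi_{i+1} = l \mid \pi_i = k)\, f_k(i) \\
  &= b_l(i+1)\, e_l(x_{i+1})\, a_{kl}\, f_k(i)
\end{aligned}$$
Here $f_k(i) = P(x_1 \dots x_i, \pi_i = k)$ is the forward probability and $b_l(i+1) = P(x_{i+2} \dots x_N \mid \pi_{i+1} = l)$ is the backward probability.
So:
$$P(\pi_i = k, \pi_{i+1} = l \mid x, \theta) = \frac{f_k(i)\, a_{kl}\, e_l(x_{i+1})\, b_l(i+1)}{P(x \mid \theta)}$$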
• So,
$$A_{kl} = \sum_i P(\pi_i = k, \pi_{i+1} = l \mid x, \theta) = \sum_i \frac{f_k(i)\, a_{kl}\, e_l(x_{i+1})\, b_l(i+1)}{P(x \mid \theta)}$$
• Similarly,
$$E_k(b) = \frac{1}{P(x)} \sum_{\{i \mid x_i = b\}} f_k(i)\, b_k(i)$$
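[Figure: state k at position i and state l at position i+1. The forward term $f_k(i)$ covers $x_1 \dots x_i$, the transition $a_{kl}$ and the emission $e_l(x_{i+1})$ link the two states, and the backward term $b_l(i+1)$ covers $x_{i+2} \dots x_N$.]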
The Baum-Welch Algorithm
Initialization:
Pick the best-guess for model parameters
(or arbitrary)
Iteration:
1. Forward
2. Backward
3. Calculate $A_{kl}$, $E_k(b)$
4. Calculate new model parameters $a_{kl}$, $e_k(b)$
5. Calculate new log-likelihood $\log P(x \mid \theta)$
GUARANTEED TO BE HIGHER (OR EQUAL) BY EXPECTATION-MAXIMIZATION
Until $\log P(x \mid \theta)$ does not change much
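A minimal sketch of the loop above in plain Python, for a single short training sequence. Several things here are assumptions not stated in the slides: symbols are encoded as integers 0..M-1, the initial state distribution `start` is held fixed, there is no explicit end state, and no scaling is used (so it will underflow on long sequences). The function names and the toy casino numbers are illustrative only.

```python
import math

def forward(x, a, e, start):
    K = len(a)
    f = [[start[k] * e[k][x[0]] for k in range(K)]]
    for i in range(1, len(x)):
        f.append([e[l][x[i]] * sum(f[i - 1][k] * a[k][l] for k in range(K))
                  for l in range(K)])
    return f

def backward(x, a, e):
    K, N = len(a), len(x)
    b = [[1.0] * K for _ in range(N)]
    for i in range(N - 2, -1, -1):
        b[i] = [sum(a[k][l] * e[l][x[i + 1]] * b[i + 1][l] for l in range(K))
                for k in range(K)]
    return b

def baum_welch_step(x, a, e, start):
    K, N, M = len(a), len(x), len(e[0])
    f, b = forward(x, a, e, start), backward(x, a, e)   # steps 1 and 2
    px = sum(f[N - 1])                                  # P(x | theta)
    # Step 3: expected transition and emission counts.
    A = [[sum(f[i][k] * a[k][l] * e[l][x[i + 1]] * b[i + 1][l]
              for i in range(N - 1)) / px
          for l in range(K)] for k in range(K)]
    E = [[sum(f[i][k] * b[i][k] for i in range(N) if x[i] == c) / px
          for c in range(M)] for k in range(K)]
    # Step 4: new parameters from the expected counts.
    new_a = [[A[k][l] / sum(A[k]) for l in range(K)] for k in range(K)]
    new_e = [[E[k][c] / sum(E[k]) for c in range(M)] for k in range(K)]
    # The log-likelihood returned is that of the parameters passed in.
    return new_a, new_e, math.log(px)

def baum_welch(x, a, e, start, tol=1e-6, max_iter=200):
    prev, history = -math.inf, []
    for _ in range(max_iter):
        a, e, ll = baum_welch_step(x, a, e, start)
        history.append(ll)
        if ll - prev < tol:   # step 5: stop when the change is small
            break
        prev = ll
    return a, e, history

# Toy run: die faces 1..6 encoded as 0..5; state 0 = fair, state 1 = loaded.
x = [0, 3, 5, 5, 5, 1, 5, 5, 2, 4, 0, 5, 5, 5]
a0 = [[0.9, 0.1], [0.2, 0.8]]
e0 = [[1 / 6] * 6, [0.1] * 5 + [0.5]]
a_hat, e_hat, history = baum_welch(x, a0, e0, start=[0.5, 0.5])
```

`history` holds the log-likelihood seen at each iteration, which makes the non-decreasing behaviour easy to inspect.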
Time Complexity:
# iterations × $O(K^2 N)$
• Guaranteed to increase the log likelihood of the model
$$P(\theta \mid x) = \frac{P(x, \theta)}{P(x)} = \frac{P(x \mid \theta)\, P(\theta)}{P(x)}$$
• Not guaranteed to find globally best parameters
Converges to local optimum, depending on initial conditions
• Too many parameters / too large model: Overtraining
Alternative: Viterbi Training
Initialization: Same
Iteration:
1. Perform Viterbi, to find $\pi^*$
2. Calculate $A_{kl}$, $E_k(b)$ according to $\pi^*$ + pseudocounts
3. Calculate the new parameters $a_{kl}$, $e_k(b)$
Until convergence
Notes:
• Convergence is guaranteed – Why?
• Does not maximize $P(x \mid \theta)$
• In general, worse performance than Baum-Welch
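Viterbi training can be sketched as follows. This is an illustration under assumptions: integer-encoded symbols, a fixed initial distribution `start`, a pseudocount of 1, and "convergence" taken to mean that the Viterbi path stops changing. The names `viterbi` and `viterbi_training` and the toy data are hypothetical.

```python
import math

def viterbi(x, a, e, start):
    """Most probable state path pi*, computed in log space."""
    K = len(a)
    v = [[math.log(start[k]) + math.log(e[k][x[0]]) for k in range(K)]]
    back = []
    for i in range(1, len(x)):
        row, ptr = [], []
        for l in range(K):
            best = max(range(K), key=lambda k: v[-1][k] + math.log(a[k][l]))
            row.append(v[-1][best] + math.log(a[best][l]) + math.log(e[l][x[i]]))
            ptr.append(best)
        v.append(row)
        back.append(ptr)
    path = [max(range(K), key=lambda k: v[-1][k])]
    for ptr in reversed(back):          # follow back-pointers to position 0
        path.append(ptr[path[-1]])
    return path[::-1]

def viterbi_training(x, a, e, start, n_symbols, pseudo=1.0, max_iter=100):
    K = len(a)
    prev_path = None
    for _ in range(max_iter):
        path = viterbi(x, a, e, start)              # 1. find pi*
        if path == prev_path:                       # same path, same counts: done
            break
        prev_path = path
        A = [[pseudo] * K for _ in range(K)]        # 2. counts along pi* + pseudocounts
        E = [[pseudo] * n_symbols for _ in range(K)]
        for k, l in zip(path, path[1:]):
            A[k][l] += 1
        for k, c in zip(path, x):
            E[k][c] += 1
        a = [[A[k][l] / sum(A[k]) for l in range(K)] for k in range(K)]  # 3. new parameters
        e = [[E[k][c] / sum(E[k]) for c in range(n_symbols)] for k in range(K)]
    return a, e, path

x = [0, 3, 5, 5, 5, 1, 5, 5, 2, 4, 0, 5, 5, 5]
a0 = [[0.9, 0.1], [0.2, 0.8]]
e0 = [[1 / 6] * 6, [0.1] * 5 + [0.5]]
a_vt, e_vt, path = viterbi_training(x, a0, e0, start=[0.5, 0.5], n_symbols=6)
```

The pseudocounts keep every parameter strictly positive, so the logarithms in `viterbi` are always defined.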
Higher-order HMMs
The Genetic Code
3 nucleotides make 1 amino acid
Statistical dependencies in triplets
Question:
Recognize protein-coding segments with a HMM
One way to model protein regions
$P(x_i x_{i+1} x_{i+2} \mid x_{i-1} x_i x_{i+1})$
Every state of the HMM emits 3 nucleotides
Transition probabilities:
Probability of one triplet, given previous triplet: $P(\pi_i \mid \pi_{i-1})$
Emission probabilities:
$P(x_i x_{i-1} x_{i-2} \mid \pi_i) = 1$ or $0$
$P(x_{i-1} x_{i-2} x_{i-3} \mid \pi_{i-1}) = 1$ or $0$
[Figure: one state per triplet: AAA, AAC, AAT, …, TTT]
A more elegant way
Every state of the HMM emits 1 nucleotide
Transition probabilities:
Probability of one state, given the previous 3 states:
$P(\pi_i \mid \pi_{i-1}, \pi_{i-2}, \pi_{i-3})$
Emission probabilities:
$P(x_i \mid \pi_i)$
Algorithms extend with small modifications
[Figure: four states, A, C, G, T]
Modeling the Duration of States
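Length distribution of region X:
$P(l_X = d) = p^{d-1}(1-p)$, so $E[l_X] = 1/(1-p)$
• Geometric distribution, with mean $1/(1-p)$
This is a significant disadvantage of HMMs: the duration of a state is forced to be geometric
Several solutions exist for modeling different length distributions
[Figure: two states X and Y. X has self-transition p and moves to Y with probability 1-p; Y has self-transition q and moves to X with probability 1-q.]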
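As a quick numerical check of the geometric length distribution, the sketch below sums the probability mass function directly. The function name and the choice p = 0.9 are illustrative assumptions.

```python
def geometric_duration_pmf(p, d):
    """P(l_X = d): stay in X for d - 1 self-transitions, then leave."""
    return p ** (d - 1) * (1 - p)

p = 0.9
# Truncated sum; the tail beyond d = 2000 is negligible for p = 0.9.
mean = sum(d * geometric_duration_pmf(p, d) for d in range(1, 2000))
print(mean, 1 / (1 - p))
```

Note that the most likely duration is always d = 1, whatever the value of p, which is rarely a good fit for biological regions.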
Sol’n 1: Chain several states
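[Figure: state X replaced by a chain of several X states leading to Y; self-transitions p and q, exits 1-p and 1-q.]
Disadvantage: Still very inflexible
$l_X = C$ + geometric with mean $1/(1-p)$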
Sol’n 2: Negative binomial distribution
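Duration in X: m turns, where
During the first m – 1 turns, exactly n – 1 arrows to the next state are followed
During the mth turn, an arrow to the next state is followed
$$P(l_X = m) = \binom{m-1}{n-1} (1-p)^{(n-1)+1}\, p^{(m-1)-(n-1)} = \binom{m-1}{n-1} (1-p)^{n}\, p^{m-n}$$
[Figure: a chain of n copies of state X, each with self-transition p and a transition 1 – p to the next copy; the last copy moves to Y with probability 1 – p.]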
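The formula above is easy to sanity-check numerically. In the sketch below the function name and the values p = 0.9, n = 3 are assumptions for illustration; the check that the mean equals n/(1-p) uses the standard mean of a negative binomial counted in trials.

```python
from math import comb

def negative_binomial_duration_pmf(m, n, p):
    """P(l_X = m) for a chain of n copies of X, each with self-transition p."""
    if m < n:
        return 0.0  # at least one turn is spent in each copy
    return comb(m - 1, n - 1) * (1 - p) ** n * p ** (m - n)

p, n = 0.9, 3
support = range(n, 3000)  # truncated; the remaining tail is negligible here
total = sum(negative_binomial_duration_pmf(m, n, p) for m in support)
mean = sum(m * negative_binomial_duration_pmf(m, n, p) for m in support)
print(total, mean, n / (1 - p))
```

With n = 1 the formula reduces to the geometric case; larger n moves the mode away from the minimum duration.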
Example: genes in prokaryotes
• EasyGene: prokaryotic gene-finder (Larsen TS, Krogh A)
• Negative binomial with n = 3
Solution 3: Duration modeling
Upon entering a state:
1. Choose duration d, according to probability distribution
2. Generate d letters according to emission probs
3. Take a transition to next state according to transition probs
Disadvantage: Increase in complexity:
Time: $O(D^2)$
Space: $O(D)$
where D = maximum duration of state
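The three generative steps can be sketched as a small sampler. Everything named here is hypothetical: the function `sample_duration_hmm`, the dictionary layout of the distributions, and the two-state toy model with made-up probabilities. Self-transitions are left out because the duration distribution replaces them.

```python
import random
from itertools import groupby

def sample_duration_hmm(n_segments, state, duration_dist, emit, trans, rng):
    letters, labels = [], []
    for _ in range(n_segments):
        # 1. Choose duration d from the state's duration distribution.
        lengths, weights = zip(*duration_dist[state].items())
        d = rng.choices(lengths, weights)[0]
        # 2. Generate d letters from the state's emission distribution.
        symbols, probs = zip(*emit[state].items())
        letters.extend(rng.choices(symbols, probs, k=d))
        labels.extend([state] * d)
        # 3. Take a transition to the next state.
        targets, tprobs = zip(*trans[state].items())
        state = rng.choices(targets, tprobs)[0]
    return "".join(letters), "".join(labels)

# Toy model (all numbers invented for illustration).
duration_dist = {"X": {3: 0.5, 5: 0.5}, "Y": {2: 1.0}}
emit = {"X": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
        "Y": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}}
trans = {"X": {"Y": 1.0}, "Y": {"X": 1.0}}

seq, labels = sample_duration_hmm(6, "X", duration_dist, emit, trans, random.Random(0))
runs = [(s, len(list(g))) for s, g in groupby(labels)]  # (state, run length) pairs
print(seq)
print(labels)
```

Every run length in `runs` comes from the chosen duration distribution, which is exactly the flexibility a plain self-transition cannot give.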
A state model for alignment
-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
TAG-CTATCAC--GACCGC-GGTCGATTTGCCCGACC
IMMJMMMMMMMJJMMMMMMJMMMMMMMIIMMMMMIII
M(+1,+1)
I(+1, 0)
J(0, +1)
Alignments correspond 1-to-1 with sequences of states M, I, J
Let’s score the transitions
[Figure: the same state diagram with transition scores. Every transition into M scores $s(x_i, y_j)$; M → I and M → J score $-d$; I → I, J → J, I → J, and J → I score $-e$.]
How do we find optimal alignment according to this model?
Dynamic Programming:
M(i, j): Optimal alignment of x1…xi to y1…yj ending in M
I(i, j): Optimal alignment of x1…xi to y1…yj ending in I
J(i, j): Optimal alignment of x1…xi to y1…yj ending in J
The score is additive, therefore we can apply DP recurrence formulas
Needleman Wunsch with affine gaps – state version
Initialization:
$M(0,0) = 0$; $M(i,0) = M(0,j) = -\infty$, for $i, j > 0$
$I(i,0) = -d - (i-1)e$; $J(0,j) = -d - (j-1)e$; $I(0,j) = J(i,0) = -\infty$
Iteration:
$$M(i,j) = s(x_i, y_j) + \max\{\, M(i-1,j-1),\; I(i-1,j-1),\; J(i-1,j-1) \,\}$$
$$I(i,j) = \max\{\, M(i-1,j) - d,\; I(i-1,j) - e,\; J(i-1,j) - e \,\}$$
$$J(i,j) = \max\{\, M(i,j-1) - d,\; I(i,j-1) - e,\; J(i,j-1) - e \,\}$$
Termination:
Optimal alignment score given by $\max\{\, M(m,n),\; I(m,n),\; J(m,n) \,\}$, where m and n are the lengths of x and y
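A compact sketch of the three-matrix dynamic program, computing the optimal score only (no traceback). It assumes the state moves M(+1,+1), I(+1,0), J(0,+1), with gap-open penalty d charged on entering I or J from M and penalty e charged on every other move into I or J. The function name, the match/mismatch scoring function `s`, and the penalty values are illustrative assumptions.

```python
import math

def affine_alignment_score(x, y, s, d, e):
    m, n = len(x), len(y)
    NEG = -math.inf
    # M, I, J: best score of aligning x[:i] with y[:j] ending in that state.
    M = [[NEG] * (n + 1) for _ in range(m + 1)]
    I = [[NEG] * (n + 1) for _ in range(m + 1)]
    J = [[NEG] * (n + 1) for _ in range(m + 1)]
    M[0][0] = 0.0
    for i in range(m + 1):
        for j in range(n + 1):
            if i > 0 and j > 0:   # M consumes one letter of x and one of y
                M[i][j] = s(x[i - 1], y[j - 1]) + max(
                    M[i - 1][j - 1], I[i - 1][j - 1], J[i - 1][j - 1])
            if i > 0:             # I consumes one letter of x only
                I[i][j] = max(M[i - 1][j] - d, I[i - 1][j] - e, J[i - 1][j] - e)
            if j > 0:             # J consumes one letter of y only
                J[i][j] = max(M[i][j - 1] - d, I[i][j - 1] - e, J[i][j - 1] - e)
    return max(M[m][n], I[m][n], J[m][n])

def s(a, b):
    """Toy substitution score: +1 for a match, -1 for a mismatch."""
    return 1 if a == b else -1

print(affine_alignment_score("ACGT", "AT", s, d=3, e=1))
```

The border values I(i,0) and J(0,j) are not set separately here; they fall out of the same recurrences starting from M(0,0) = 0. Recovering the alignment itself would additionally need back-pointers recording which state achieved each maximum.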