Multiple alignment using hidden Markov models
November 21, 2001
Kim Hye Jin
Intelligent Multimedia Lab
Outline
• Introduction
• Methods and algorithm
• Results
• Discussion
IM lab
Introduction
• Why HMM?
  – Mathematically consistent description of insertions and deletions
  – Theoretical insight into the difficulties of combining disparate forms of information (e.g. sequences / 3D structures)
  – Possible to train models from initially unaligned sequences
Methods and algorithms
• State transition
  – State sequence is a 1st-order Markov chain
  – Each state is hidden
  – Match/Insert/Delete states
• Symbol emission
[Figure: HMM architecture with Match, Insert, and Delete states, showing state transitions and symbol emissions]
• Replacing arbitrary scores with probabilities relative to a consensus
• Model M consists of N states S1 … SN
• Observed sequence O consists of T symbols O1 … OT from an alphabet
• aij: transition probability from Si to Sj
• bj(x): emission probability of symbol x from state Sj
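As a sketch, the notation above maps directly onto plain data structures. The two-state model and all its probabilities below are illustrative assumptions, not the paper's profile HMM:

```python
# Toy model for illustration only (not the paper's profile HMM).
states = ["S1", "S2"]                # N = 2 states
alphabet = ["A", "C", "G", "T"]

# a[i][j]: transition probability from state Si to state Sj
a = {
    "S1": {"S1": 0.9, "S2": 0.1},
    "S2": {"S1": 0.2, "S2": 0.8},
}

# b[j][x]: probability that state Sj emits symbol x
b = {
    "S1": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
    "S2": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
}

# Each row of a and b is a probability distribution and must sum to 1.
for s in states:
    assert abs(sum(a[s].values()) - 1.0) < 1e-9
    assert abs(sum(b[s].values()) - 1.0) < 1e-9
```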
• HMM model: an example with the sequence ACCY
• Forward algorithm
  – a sum rather than a maximum: totals the probability over all paths through the model
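A minimal sketch of the forward recursion on a toy two-state model (the names and numbers are illustrative assumptions, not the paper's profile HMM). The point from the slide is the sum over predecessor states where Viterbi would take a maximum:

```python
def forward(obs, states, start, a, b):
    """Return P(obs | model), summing over all state paths."""
    # alpha[i] = P(emitting obs[:t+1] and being in state i at step t)
    alpha = {i: start[i] * b[i][obs[0]] for i in states}
    for x in obs[1:]:
        # A sum over predecessors, rather than a maximum (Viterbi).
        alpha = {j: sum(alpha[i] * a[i][j] for i in states) * b[j][x]
                 for j in states}
    return sum(alpha.values())

# Toy model (illustrative values).
states = ["S1", "S2"]
start = {"S1": 0.5, "S2": 0.5}
a = {"S1": {"S1": 0.9, "S2": 0.1}, "S2": {"S1": 0.2, "S2": 0.8}}
b = {"S1": {"A": 0.6, "C": 0.4}, "S2": {"A": 0.3, "C": 0.7}}

p = forward("ACC", states, start, a, b)
```

The result equals the brute-force sum of the probabilities of all 2^3 state paths for "ACC".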
• Viterbi algorithm
  – the most likely path through the model
  – recovered by following the back pointers
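The two bullets above can be sketched as follows: the same recursion as the forward algorithm, but with a maximum instead of a sum, plus back pointers to recover the best path. The toy model is an illustrative assumption, not the paper's profile HMM:

```python
def viterbi(obs, states, start, a, b):
    """Return (most likely state path, its probability)."""
    delta = {i: start[i] * b[i][obs[0]] for i in states}
    back = []  # back[t][j] = best predecessor of state j at step t+1
    for x in obs[1:]:
        ptr, new = {}, {}
        for j in states:
            # A maximum over predecessors, rather than a sum (forward).
            best = max(states, key=lambda i: delta[i] * a[i][j])
            ptr[j] = best
            new[j] = delta[best] * a[best][j] * b[j][x]
        back.append(ptr)
        delta = new
    # Follow the back pointers from the best final state.
    last = max(states, key=lambda j: delta[j])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1], delta[last]

# Toy model (illustrative values).
states = ["S1", "S2"]
start = {"S1": 0.5, "S2": 0.5}
a = {"S1": {"S1": 0.9, "S2": 0.1}, "S2": {"S1": 0.2, "S2": 0.8}}
b = {"S1": {"A": 0.6, "C": 0.4}, "S2": {"A": 0.3, "C": 0.7}}

path, prob = viterbi("ACC", states, start, a, b)
```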
• Baum-Welch algorithm
  – A variation of the forward algorithm
  – Starts from a reasonable guess at an initial model, then calculates a score for each sequence in the training set using the EM algorithm
• Local optima problem:
  – forward algorithm / Viterbi algorithm
  – Baum-Welch algorithm
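One EM re-estimation step can be sketched as below. For clarity this toy version computes path posteriors by enumerating every state path outright instead of using the forward-backward recursions, so it is only feasible on tiny examples; the model and training sequences are illustrative assumptions:

```python
from itertools import product
from collections import defaultdict

def em_step(seqs, states, start, a, b):
    """One Baum-Welch-style EM iteration (brute-force E-step)."""
    ta = defaultdict(float)  # expected transition counts
    tb = defaultdict(float)  # expected emission counts
    for obs in seqs:
        # E-step: weight each path by its posterior P(path | obs).
        joint = {}
        for path in product(states, repeat=len(obs)):
            p = start[path[0]] * b[path[0]][obs[0]]
            for t in range(1, len(obs)):
                p *= a[path[t - 1]][path[t]] * b[path[t]][obs[t]]
            joint[path] = p
        total = sum(joint.values())
        for path, p in joint.items():
            w = p / total
            for t in range(1, len(obs)):
                ta[(path[t - 1], path[t])] += w
            for t in range(len(obs)):
                tb[(path[t], obs[t])] += w
    # M-step: renormalize expected counts into new probabilities.
    new_a, new_b = {}, {}
    for i in states:
        sa = sum(ta[(i, j)] for j in states) or 1.0
        sb = sum(tb[(i, x)] for x in b[i]) or 1.0
        new_a[i] = {j: ta[(i, j)] / sa for j in states}
        new_b[i] = {x: tb[(i, x)] / sb for x in b[i]}
    return new_a, new_b

# Toy model and training set (illustrative values).
states = ["S1", "S2"]
start = {"S1": 0.5, "S2": 0.5}
a = {"S1": {"S1": 0.9, "S2": 0.1}, "S2": {"S1": 0.2, "S2": 0.8}}
b = {"S1": {"A": 0.6, "C": 0.4}, "S2": {"A": 0.3, "C": 0.7}}

new_a, new_b = em_step(["AC", "CA"], states, start, a, b)
```

Each iteration of EM can never decrease the training-set likelihood, but it can converge to a local optimum, which is the problem the slide names.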
• Simulated annealing
  – samples suboptimal paths to escape local optima
  – kT = 0: standard Viterbi training procedure
  – kT is lowered gradually during training
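The annealing idea can be sketched by sampling paths with probability proportional to P(path)^(1/kT): at high kT sampling is nearly uniform, and as kT approaches 0 it collapses onto the single best (Viterbi) path. The path set and probabilities below are illustrative assumptions:

```python
import math
import random

def sample_path(path_probs, kT, rng=None):
    """Sample a path with probability proportional to P(path)**(1/kT)."""
    rng = rng or random.Random(0)
    weights = {p: math.pow(pr, 1.0 / kT) for p, pr in path_probs.items()}
    z = sum(weights.values())
    r = rng.random() * z
    acc = 0.0
    for p, w in weights.items():
        acc += w
        if acc >= r:
            return p
    return p  # numerical safety net

# Illustrative path probabilities (in practice these come from the model).
paths = {"AAB": 0.5, "ABB": 0.3, "BBB": 0.2}

# Near kT = 0 the weights concentrate on the single best (Viterbi) path.
best = sample_path(paths, kT=0.01)
```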
ClustalW
ClustalX
Results
• len: consensus length of the alignment
• ali: the number of structurally aligned sequences
• %id: the percentage sequence identity
• homo: the number of homologues identified in and extracted from SwissProt 30
• %id: the average percentage sequence identity in the set of homologues
Discussion
• HMM
  – a consistent theory for insertion and deletion penalties
  – EGF: fairly difficult alignments are done well
• ClustalW
  – progressive alignment
  – Disparities between the sequence identity of the structures and the sequence identity of the homologues
  – Large non-correlation between score and quality
• The ability of HMMs to perform sensitive fold recognition is apparent