Dynamic Programming & Smith-Waterman algorithm
Seminar: Classical Papers in Bioinformatics
Yvonne Herrmann
May 3rd, 2010
Yvonne Herrmann Dynamic Programming & Smith-Waterman algorithm
Overview
1 Dynamic Programming
2 Sequence comparison
3 Smith-Waterman algorithm
Dynamic Programming: Introduction
Definition
Dynamic programming is a method of solving problems by breaking them down into simpler steps.
The problem needs to contain overlapping subproblems and should have an optimal substructure.
The method is used in mathematical optimization and computer programming.
Dynamic Programming: Introduction
Divide & Conquer
Divide & Conquer is used when all subproblems are independent.
Calculate the partitions and combine their solutions to solve the entire problem.
vs.
Dynamic Programming
Dynamic Programming is used when subproblems are dependent.
There are no partitions, since the subproblems overlap.
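The effect of overlapping subproblems can be sketched with a small Python example. The Fibonacci numbers are not part of the slides; they are only an assumed toy problem, and the function names are hypothetical. The naive recursion solves the same subproblems again and again, while the table-based version solves each one once.

```python
def fib_naive(n, counter):
    """Plain recursion: overlapping subproblems are recomputed many times."""
    counter[0] += 1  # count how often the function is entered
    if n < 2:
        return n
    return fib_naive(n - 1, counter) + fib_naive(n - 2, counter)


def fib_dp(n):
    """Bottom-up dynamic programming: every subproblem is solved exactly once."""
    table = [0, 1]
    for i in range(2, n + 1):
        table.append(table[i - 1] + table[i - 2])
    return table[n]


calls = [0]
naive_result = fib_naive(20, calls)
dp_result = fib_dp(20)
print(naive_result, dp_result, calls[0])
```

Both functions return the same value, but the call counter shows how much repeated work the naive version does for the same input.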
Dynamic Programming: The Principle of Optimality
"An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision." [a]
[a] Bellman, R.E. 1957. Dynamic Programming, Chap. III.3, Princeton University Press
Dynamic Programming: The Principle of Optimality - Example
Shortest path
Find the shortest way by car from Bielefeld to Cologne.
It has to pass through Hamm (Westf) and Dortmund.
The shortest route from Hamm (Westf) to Cologne also needs to go through Dortmund.
The second problem is contained in the first one.
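A minimal Python sketch of this example follows. The road graph and all distances are made-up illustration values, not real road distances, and the names `ROADS` and `shortest` are hypothetical. The sketch assumes a graph without cycles, so a memoized recursion is enough.

```python
from functools import lru_cache

# Hypothetical road graph; the distances are invented for illustration only.
ROADS = {
    "Bielefeld": {"Hamm": 70, "Dortmund": 120},
    "Hamm": {"Dortmund": 35, "Cologne": 140},
    "Dortmund": {"Cologne": 95},
    "Cologne": {},
}


@lru_cache(maxsize=None)
def shortest(start, goal="Cologne"):
    """Return (distance, route) of the shortest path from start to goal."""
    if start == goal:
        return 0, (goal,)
    best = (float("inf"), ())
    for nxt, dist in ROADS[start].items():
        # The best route via nxt continues with the best route from nxt.
        sub_dist, sub_route = shortest(nxt, goal)
        best = min(best, (dist + sub_dist, (start,) + sub_route))
    return best


full_dist, full_route = shortest("Bielefeld")
hamm_dist, hamm_route = shortest("Hamm")
print(full_route, full_dist)
print(hamm_route, hamm_dist)
```

With these assumed distances, the tail of the route from Bielefeld is exactly the optimal route from Hamm, which is what the principle of optimality states.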
Dynamic Programming: Algorithms
Dynamic Programming is used by...
Floyd-Warshall algorithm (shortest path algorithm)
Needleman-Wunsch algorithm
Smith-Waterman algorithm
Bellman-Ford algorithm, etc.
Sequence comparison: Intentions
Why compare sequences?
Quantify the similarity or dissimilarity between two or more sequences and find out where they are similar or different.
Sequence comparison: Why compare sequences?
This analysis can help to determine:
if genes from two different organisms are related
if similar nucleotide sequences lead to similar protein structures
which species are likely to be more closely related to one another
what kind of changes happened during evolution (mutations, insertions and deletions of genes or, more specifically, in the amino acid sequence itself)
Alignments: How to compare sequences?
Sequence alignment
A method of arranging sequences of DNA, RNA or the amino acids of proteins to find regions of similarity, which might be a consequence of functional, structural or evolutionary relationships between the sequences.
Conditions an alignment has to fulfill
all symbols have to be in the same order as they appear in the given sequences
a symbol can be aligned with a blank (-)
two blanks cannot be aligned
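The conditions above can be sketched as a small Python check. The function name `is_valid_alignment` is hypothetical, and the requirement that both rows have the same length is an added assumption that the conditions only imply.

```python
def is_valid_alignment(row_s, row_t, s, t, blank="-"):
    """Check two aligned rows against the alignment conditions."""
    # Assumption: both rows must have the same number of columns.
    if len(row_s) != len(row_t):
        return False
    # Removing the blanks must give back the original sequences in order.
    if row_s.replace(blank, "") != s or row_t.replace(blank, "") != t:
        return False
    # No column may align a blank with a blank.
    return all(not (x == blank and y == blank) for x, y in zip(row_s, row_t))


print(is_valid_alignment("ACT-GA-ACTG", "A-TGGAC-CTG", "ACTGAACTG", "ATGGACCTG"))
print(is_valid_alignment("A-C", "A-C", "AC", "AC"))
```

The first call uses the sequences s and t from the slides; the second call fails because a blank is aligned with a blank.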
Example
Sequences s and t are given:
s: A C T G A A C T G
t: A T G G A C C T G
A possible alignment is:
A C T - G A - A C T G
A - T G G A C - C T G
Local vs. global alignment: What's the difference?
global alignment
The sequences must be aligned from start to end.
local alignment
Local alignments identify regions of high similarity within sequences which are often widely different overall.
The Smith-Waterman algorithm calculates the optimal local alignment!
Smith-Waterman algorithm: A little history
The algorithm was proposed in 1981 by Temple F. Smith and Michael S. Waterman.
The algorithm uses dynamic programming and is a variation of the Needleman-Wunsch algorithm.
Smith-Waterman algorithm: What's the goal of this algorithm?
The Smith-Waterman algorithm calculates the local alignment of two given sequences.
It is used to identify similar DNA, RNA and protein segments.
Alignments of any possible length, starting and ending at any position in the two sequences, are compared to obtain the optimal local alignment.
It is guaranteed to find the optimal local alignment with respect to the given scoring system.
The scoring system includes a substitution matrix and a gap-scoring scheme.
The scores take into account matches, mismatches (substitutions) and insertions/deletions.
The main difference to the Needleman-Wunsch algorithm: negative scores are set to zero.
Smith-Waterman algorithm: The algorithm
Starting conditions
two molecular sequences A = a_1 a_2 ... a_n and B = b_1 b_2 ... b_m
a scoring scheme
Course of events
first: set up the matrix H with H_{k0} = H_{0l} = 0 (for 0 ≤ k ≤ n and 0 ≤ l ≤ m)
next: calculate the score for each cell
last: backtrace the path to obtain the optimal alignment
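How to calculate the score for each cell?
Individual pairwise comparisons between the characters:
\[
H_{ij} = \max
\begin{cases}
H_{i-1,j-1} + s(a_i, b_j), \\
\max_{k \ge 1} \{ H_{i-k,j} - W_k \}, \\
\max_{l \ge 1} \{ H_{i,j-l} - W_l \}, \\
0
\end{cases}
\]
k = deletion of length k
l = deletion of length l
W_k and W_l are the values of the gap cost function for deletions of length k and l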
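The cell update above can be sketched in Python as follows. This is a direct, unoptimized reading of the recurrence, not a reference implementation: the function name `fill_matrix` and the callables `s` (similarity of two characters) and `w` (cost of a gap of length k) are hypothetical names chosen for illustration. Because every cell looks back along its whole row and column, this version needs more than O(nm) time.

```python
def fill_matrix(a, b, s, w):
    """Fill the matrix H for sequences a and b.

    s(x, y) scores a pair of characters, w(k) is the cost of a gap of
    length k. Returns H with len(a) + 1 rows and len(b) + 1 columns.
    """
    n, m = len(a), len(b)
    # First row and first column stay 0 (H_k0 = H_0l = 0).
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diagonal = H[i - 1][j - 1] + s(a[i - 1], b[j - 1])
            # a_i ends a deletion of length k (look up the column).
            vertical = max(H[i - k][j] - w(k) for k in range(1, i + 1))
            # b_j ends a deletion of length l (look left along the row).
            horizontal = max(H[i][j - l] - w(l) for l in range(1, j + 1))
            # Negative scores are replaced by 0.
            H[i][j] = max(diagonal, vertical, horizontal, 0.0)
    return H


def score(x, y):
    return 1.0 if x == y else -1.0 / 3.0


def gap_cost(k):
    return 1.0 + k / 3.0


H = fill_matrix("AGCTT", "AGACT", score, gap_cost)
print(max(max(row) for row in H))
```

The usage lines apply the scoring of the Smith-Waterman example from the slides (match +1, mismatch -1/3, W_k = 1 + k/3); the highest entry of H is the score of the best local alignment.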
Smith-Waterman algorithm: Definition
backtracing
While filling the matrix H, backpointers have to be stored to record from which cell each score was derived.
Once the highest score in the matrix H has been found, the path can be backtraced from that cell until a cell with score 0 is reached, which yields the optimal alignment.
legend of the backpointers:
Deletion
Insertion
Substitution
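Filling and backtracing can be sketched together in Python. This is a minimal illustration under assumptions, not the original authors' code: `smith_waterman`, `s` and `w` are hypothetical names, ties are broken in favour of the first candidate, and only one optimal alignment is reported.

```python
def smith_waterman(a, b, s, w):
    """Return (score, aligned_a, aligned_b) of one optimal local alignment."""
    n, m = len(a), len(b)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    # back[i][j] is the cell the score came from, or None where H is 0.
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best, best_cell = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Each candidate score is paired with its source cell.
            cands = [(0.0, None),
                     (H[i - 1][j - 1] + s(a[i - 1], b[j - 1]), (i - 1, j - 1))]
            cands += [(H[i - k][j] - w(k), (i - k, j)) for k in range(1, i + 1)]
            cands += [(H[i][j - l] - w(l), (i, j - l)) for l in range(1, j + 1)]
            H[i][j], back[i][j] = max(cands, key=lambda c: c[0])
            if H[i][j] > best:
                best, best_cell = H[i][j], (i, j)
    # Follow the backpointers from the highest score to a cell with score 0.
    row_a, row_b = [], []
    i, j = best_cell
    while back[i][j] is not None:
        pi, pj = back[i][j]
        if pi < i and pj < j:      # diagonal step: match or substitution
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
        elif pj == j:              # a[pi:i] is aligned with blanks
            row_a.append(a[pi:i][::-1])
            row_b.append("-" * (i - pi))
        else:                      # b[pj:j] is aligned with blanks
            row_a.append("-" * (j - pj))
            row_b.append(b[pj:j][::-1])
        i, j = pi, pj
    # The rows were collected back to front, so reverse them once.
    return best, "".join(row_a)[::-1], "".join(row_b)[::-1]


def score(x, y):
    return 1.0 if x == y else -1.0 / 3.0


def gap_cost(k):
    return 1.0 + k / 3.0


print(smith_waterman("AGCTT", "AGACT", score, gap_cost))
```

Storing the source cell directly, instead of a direction symbol, keeps the backtrace loop short and also covers gaps longer than one character.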
Smith-Waterman algorithm: Smith-Waterman - Example
Example
Sequences A and B are given:
A: A G C T T and B: A G A C T
scoring scheme:
match = +1
mismatch = -1/3
W_k = 1 + (1/3) k
Figure: Filled matrix H
optimal local alignment:
A G A C T
A G - C T
Figure: Filled matrix H and backtracing path
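As a quick check of this alignment, its score can be worked out by hand, assuming the example's scoring scheme (match = +1, W_k = 1 + k/3). The alignment has four matches and one gap of length 1:

```latex
\[
\text{score} = 4 \cdot (+1) - W_1
             = 4 - \left(1 + \tfrac{1}{3} \cdot 1\right)
             = \tfrac{8}{3} \approx 2.67
\]
```

An ungapped alignment of the same region would need two mismatches instead of the gap and reaches only 3 - 2/3 = 7/3, so the gapped alignment scores higher.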
Smith-Waterman algorithm: Smith-Waterman - Example 2
The optimal local alignment can be anywhere in the sequences.
Find the highest score in the matrix H and use it as the starting point for backtracing.
Figure: Example from the original paper
optimal local alignment:
G C A U U G
G C - U C G
Figure: Example from the original paper
Smith-Waterman algorithm: Complexity of the algorithm
running time: O(nm) (with linear or affine gap costs; evaluating a general gap function W_k for every cell takes O(nm(n+m)))
The algorithm is exact, but very time-consuming. FASTA is a heuristic approximation and is mostly used today.
space requirement: O(nm)
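The O(nm) space is needed for backtracing. If only the best score is required, two rows of the matrix are enough, as the following Python sketch shows. It assumes a linear gap cost (every gap position costs the same), which is a simplification of the general gap function; the function name and the default parameter values are hypothetical.

```python
def best_local_score(a, b, match=1, mismatch=-1, gap=2):
    """Best local alignment score using two rows, i.e. O(len(b)) memory.

    Assumes a linear gap cost: each gap position costs `gap`.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        curr = [0]  # first column is always 0
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            # Negative scores are replaced by 0, as in the full matrix.
            curr.append(max(0, diag, prev[j] - gap, curr[j - 1] - gap))
            best = max(best, curr[j])
        prev = curr
    return best


print(best_local_score("AGCTT", "AGACT"))
```

The running time stays O(nm); only the memory drops, at the price of not being able to reconstruct the alignment itself.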
Smith-Waterman algorithm: Disadvantages
Time and space costs are very high.
It finds the alignment with the maximal score, but not the one with the maximal percentage of matches.
The algorithm produces mosaics of well-conserved fragments connected by poorly conserved fragments.
Solution: length-normalized local alignment, which obtains the region with the maximum degree of similarity.
Smith-Waterman algorithm: Applications
JAligner
SSEARCH (in the FASTA package)
Live demo of the Smith-Waterman algorithm: http://baba.sourceforge.net/
Bibliography
[1] Alison Cawsey, Dynamic Programming, http://www.macs.hw.ac.uk/alison/ds98/node122.html, 1998
[2] Temple F. Smith and Michael S. Waterman, Identification of Common Molecular Subsequences, J. Mol. Biol., 147(1):195-197, March 1981
[3] Script: Sequence Analysis I+II, Lecture notes, Faculty of Technology, Bielefeld University, Winter 2008/09 and Summer 2009
[4] Norman Casagrande, Basic-Algorithms of Bioinformatics Applet, http://baba.sourceforge.net/, 2003
[5] University of Southern California, University Professor, http://www.cmb.usc.edu/people/msw/Waterman.html, 2005
The End
Thank you for your attention!