DP and Smith-Waterman

Upload: pooja

Post on 13-Jan-2016

DESCRIPTION

DNA comparison

TRANSCRIPT

  • Dynamic Programming & Smith-Waterman algorithm

    Seminar: Classical Papers in Bioinformatics

    Yvonne Herrmann

    May 3rd, 2010

    Yvonne Herrmann Dynamic Programming & Smith-Waterman algorithm

  • Overview

    1 Dynamic Programming

    2 Sequence comparison

    3 Smith-Waterman algorithm

  • Dynamic Programming: Introduction

    Definition

    Dynamic Programming is a method of solving problems by breaking them down into simpler steps.

    The problem needs to contain overlapping subproblems and should have an optimal substructure.

    The method is used in mathematical optimization and computer programming.


  • Dynamic Programming: Introduction

    Divide & Conquer

    Divide & Conquer is used when all subproblems are independent.

    Solve the partitions separately and combine their solutions to solve the entire problem.

    vs.

    Dynamic Programming

    Dynamic Programming is used when subproblems are dependent.

    There are no partitions, since the subproblems overlap.
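
A minimal sketch of what "overlapping subproblems" means in practice. The Fibonacci recursion is not from the slides; it is only an illustrative choice, and the names `fib_naive`, `fib_memo`, and `calls` are made up for this example. The plain recursion re-solves the same subproblems many times, while the memoized version solves each one once.

```python
from functools import lru_cache

calls = {"naive": 0, "memo": 0}

def fib_naive(n):
    """Plain recursion: the same subproblems are solved again and again."""
    calls["naive"] += 1
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)

@lru_cache(maxsize=None)
def fib_memo(n):
    """Dynamic programming by memoization: each subproblem is solved once."""
    calls["memo"] += 1
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)

print(fib_naive(20), calls["naive"])  # 6765 21891
print(fib_memo(20), calls["memo"])    # 6765 21
```

Both functions return the same value; only the number of evaluated subproblems differs.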


  • Dynamic Programming: The Principle of Optimality

    The Principle of Optimality

    "An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision." [a]

    [a] Bellman, R.E. 1957. Dynamic Programming, Chap. III.3, Princeton University Press

  • Dynamic Programming: The Principle of Optimality - Example

    Shortest path

    Find the shortest way by car from Bielefeld to Cologne.

    The route has to pass through Hamm (Westf) and Dortmund.

    The shortest route from Hamm (Westf) to Cologne then also needs to go through Dortmund.

    The second problem is contained in the first one.
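
The route example can be sketched as a small dynamic program. Everything numeric here is an assumption: the distances in `ROADS` are invented for illustration, and Paderborn is a hypothetical alternative stop that does not appear on the slide. The point is only that the optimal route from Bielefeld contains the optimal route from Hamm as its tail.

```python
from functools import lru_cache

# Hypothetical road distances in km; the numbers are made up for illustration.
ROADS = {
    "Bielefeld": {"Hamm": 80, "Paderborn": 45},
    "Paderborn": {"Dortmund": 105},
    "Hamm": {"Dortmund": 35},
    "Dortmund": {"Cologne": 95},
    "Cologne": {},
}

@lru_cache(maxsize=None)
def shortest(city, goal="Cologne"):
    """Return (distance, route) of the shortest route from city to goal."""
    if city == goal:
        return 0, (goal,)
    best = (float("inf"), ())
    for nxt, km in ROADS[city].items():
        dist, route = shortest(nxt, goal)
        if km + dist < best[0]:
            best = (km + dist, (city,) + route)
    return best

full = shortest("Bielefeld")
tail = shortest("Hamm")
print(full)                   # (210, ('Bielefeld', 'Hamm', 'Dortmund', 'Cologne'))
print(full[1][1:] == tail[1])  # True
```

The recursion assumes the road graph has no cycles; a general graph would need an algorithm such as Bellman-Ford or Floyd-Warshall.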

  • Dynamic Programming: Algorithms

    Dynamic Programming is used by...

    Floyd-Warshall algorithm (shortest path algorithm)

    Needleman-Wunsch algorithm

    Smith-Waterman algorithm

    Bellman-Ford algorithm, etc.


  • Sequence comparison: Intentions

    Why compare sequences?

    Quantify the similarity or dissimilarity between two or more sequences and find out where they are similar or different.

  • Sequence comparison: Why compare sequences?

    The analysis can help to determine:

    if genes from two different organisms are related

    if similar nucleotide sequences lead to similar protein structures

    which species are likely to be more closely related to one another

    what kind of changes happened during evolution (mutations, insertions, and deletions of genes or, more specifically, in the amino acid sequence itself)

  • Alignments: How to compare sequences?

    Sequence alignment

    A method of arranging sequences of DNA, RNA, or the amino acids of proteins to find regions of similarity, which might be a consequence of functional, structural, or evolutionary relationships between the sequences.

  • Alignments: How to compare sequences?

    Conditions an alignment has to fulfill

    all symbols have to be in the same order in which they appear in the given sequences

    a symbol can be aligned with a blank (-)

    two blanks cannot be aligned with each other

  • Alignments: How to compare sequences?

    Example

    Sequences s and t are given:

    s: A C T G A A C T G
    t: A T G G A C C T G

    A possible alignment is:

    A C T - G A - A C T G
    A - T G G A C - C T G
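
The conditions an alignment has to fulfill can be checked mechanically. The following is a small sketch, not part of the slides; the function name `is_valid_alignment` and the choice of "-" as the blank symbol are assumptions made for this example.

```python
def is_valid_alignment(row_s, row_t, s, t, gap="-"):
    """Check the alignment conditions for two gapped rows."""
    if len(row_s) != len(row_t):
        return False
    # Removing the blanks must give back the original sequences,
    # so every symbol appears and the order is unchanged.
    if row_s.replace(gap, "") != s or row_t.replace(gap, "") != t:
        return False
    # A symbol may face a blank, but two blanks may not face each other.
    return all(not (a == gap and b == gap) for a, b in zip(row_s, row_t))

s, t = "ACTGAACTG", "ATGGACCTG"
print(is_valid_alignment("ACT-GA-ACTG", "A-TGGAC-CTG", s, t))  # True
print(is_valid_alignment("ACT--GAACTG", "A-T-GGACCTG", s, t))  # False
```

The second call fails because its fourth column aligns two blanks.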


  • Local vs. global alignment: What's the difference?

    Global alignment

    The sequences must be aligned from start to end.

    Local alignment

    Local alignments identify regions of high similarity within sequences that are often widely different overall.

    The Smith-Waterman algorithm calculates the optimal local alignment!


  • Smith-Waterman algorithm: A little history

    The algorithm was proposed in 1981 by Temple F. Smith and Michael S. Waterman.

    It uses dynamic programming and is a variation of the Needleman-Wunsch algorithm.

  • Smith-Waterman algorithm: What's the goal of this algorithm?

    The Smith-Waterman algorithm calculates the local alignment of two given sequences.

    It is used to identify similar DNA, RNA, and protein segments.

    Alignments of any possible length, starting and ending at any position in the two sequences, are compared to obtain the optimal local alignment.

  • Smith-Waterman algorithm: What's the goal of this algorithm?

    It is guaranteed to find the optimal local alignment with respect to the given scoring system.

    The scoring system includes a substitution matrix and a gap-scoring scheme.

    Scores account for matches, mismatches, substitutions, and insertions/deletions.

    The main difference from the Needleman-Wunsch algorithm: negative scores are set to zero.

  • Smith-Waterman algorithm: The algorithm

    Starting conditions

    two molecular sequences $A = a_1 a_2 \dots a_n$ and $B = b_1 b_2 \dots b_m$

    a scoring scheme

    Course of events

    first: set up the matrix H with $H_{k0} = H_{0l} = 0$ (for $0 \le k \le n$ and $0 \le l \le m$)

    next: calculate the score for each cell

    last: backtrace the path to obtain the optimal alignment

  • Smith-Waterman algorithm: The algorithm

    How to calculate the score for each cell?

    Individual pair-wise comparisons between the characters are scored as:

    \[
    H_{ij} = \max \begin{cases}
      H_{i-1,j-1} + s(a_i, b_j), \\
      \max_{k \ge 1} \{ H_{i-k,j} - W_k \}, \\
      \max_{l \ge 1} \{ H_{i,j-l} - W_l \}, \\
      0
    \end{cases}
    \]

    $k$ = deletion of length $k$, $l$ = deletion of length $l$; $W_k$ and $W_l$ are values of the gap cost function.
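
A minimal sketch of the matrix fill described by the recurrence, written directly from the four cases. The function name `sw_matrix` and the helper names `score` and `gap` are hypothetical. The scoring values (match +1, mismatch -1/3, W_k = 1 + k/3) are an assumption for the demonstration, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def sw_matrix(a, b, s, w):
    """Fill the Smith-Waterman matrix H for sequences a and b.

    s(x, y) is the substitution score, w(k) the cost of a gap of length k.
    """
    n, m = len(a), len(b)
    # Row 0 and column 0 stay at zero (the starting condition).
    H = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = H[i - 1][j - 1] + s(a[i - 1], b[j - 1])
            up = max(H[i - k][j] - w(k) for k in range(1, i + 1))
            left = max(H[i][j - l] - w(l) for l in range(1, j + 1))
            H[i][j] = max(diag, up, left, 0)
    return H

def score(x, y):
    # Assumed scoring: match +1, mismatch -1/3.
    return Fraction(1) if x == y else Fraction(-1, 3)

def gap(k):
    # Assumed gap cost function W_k = 1 + k/3.
    return 1 + Fraction(k, 3)

H = sw_matrix("AGCTT", "AGACT", score, gap)
print(max(max(row) for row in H))  # 8/3
```

Because every cell scans a whole row and a whole column for the best gap, this direct version does more work per cell than a variant with a fixed per-position gap cost.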

  • Smith-Waterman algorithm: Definition

    Backtracing

    While filling the matrix H, you have to store backpointers that record from which cell each score was derived.

    Once you have found the highest score in the matrix H, you can backtrace the path from that cell and obtain the optimal alignment.

    Legend of the backpointers:

    Deletion

    Insertion

    Substitution
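
Backtracing can be sketched as follows. This is one possible implementation under stated assumptions, not the presenter's code: `smith_waterman`, `score`, and `gap` are hypothetical names, each backpointer is stored as the coordinates of the source cell, and the scoring values (match +1, mismatch -1/3, W_k = 1 + k/3) are assumed for the demonstration.

```python
from fractions import Fraction

def smith_waterman(a, b, s, w):
    """Return (score, aligned_a, aligned_b) for one optimal local alignment."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    # back[i][j] is the cell the score came from; None means "start here".
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best, best_cell = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = [(0, None),
                       (H[i - 1][j - 1] + s(a[i - 1], b[j - 1]), (i - 1, j - 1))]
            options += [(H[i - k][j] - w(k), (i - k, j)) for k in range(1, i + 1)]
            options += [(H[i][j - l] - w(l), (i, j - l)) for l in range(1, j + 1)]
            H[i][j], back[i][j] = max(options, key=lambda o: o[0])
            if H[i][j] > best:
                best, best_cell = H[i][j], (i, j)
    # Follow the backpointers from the highest score until a zero cell is hit.
    out_a, out_b = [], []
    i, j = best_cell
    while back[i][j] is not None:
        pi, pj = back[i][j]
        # An empty slice means this sequence contributes blanks in this step.
        out_a.append(a[pi:i] or "-" * (j - pj))
        out_b.append(b[pj:j] or "-" * (i - pi))
        i, j = pi, pj
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))

def score(x, y):
    return Fraction(1) if x == y else Fraction(-1, 3)  # assumed scoring

def gap(k):
    return 1 + Fraction(k, 3)  # assumed gap cost W_k = 1 + k/3

print(smith_waterman("AGCTT", "AGACT", score, gap))
# (Fraction(8, 3), 'AG-CT', 'AGACT')
```

When several cells tie for the same score, this sketch keeps the first option it finds, so it reports one optimal alignment rather than all of them.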

  • Smith-Waterman algorithm: Example

    Example

    Sequences A and B are given:

    A: A G C T T and B: A G A C T

    Scoring scheme:

    match = +1

    mismatch = -1/3

    \[ W_k = 1 + \tfrac{1}{3} k \]

  • Smith-Waterman algorithm: Example

    Example

    Sequences A and B are given: A: A G C T T and B: A G A C T

    Figure: Filled matrix H

  • Smith-Waterman algorithm: Example

    Example

    Optimal local alignment:

    A G A C T
    A G - C T

    Figure: Filled matrix H and backtracing path
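
The score of this alignment can be checked by hand or with a few lines of code. The sketch below assumes the scoring of the example is match +1, mismatch -1/3, and W_k = 1 + k/3; the function name `alignment_score` is hypothetical. Four matches minus one gap of length 1 give 4 - 4/3 = 8/3.

```python
from fractions import Fraction

def alignment_score(top, bottom, match, mismatch, gap_cost, gap="-"):
    """Score a gapped alignment; each run of blanks in one row is charged once."""
    total, run, run_row = Fraction(0), 0, None
    for x, y in zip(top, bottom):
        row = "top" if x == gap else "bottom" if y == gap else None
        if row != run_row and run:
            total -= gap_cost(run)  # the previous gap run has ended
            run = 0
        run_row = row
        if row is None:
            total += match if x == y else mismatch
        else:
            run += 1
    if run:
        total -= gap_cost(run)
    return total

result = alignment_score(
    "AGACT", "AG-CT",
    match=Fraction(1), mismatch=Fraction(-1, 3),
    gap_cost=lambda k: 1 + Fraction(k, 3),  # assumed W_k = 1 + k/3
)
print(result)  # 8/3
```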

  • Smith-Waterman algorithm: Example 2

    The optimal local alignment can be anywhere in the sequences.

    Find the highest score in matrix H and use it as the starting point for backtracing.

    Figure: Example from the original paper

  • Smith-Waterman algorithm: Example 2

    Optimal local alignment:

    G C A U U G
    G C - U C G

    Figure: Example from the original paper

  • Smith-Waterman algorithm: Complexity of the algorithm

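    Complexity of the algorithm

    Running time: O(nm) with a linear or affine gap cost; evaluating a general gap function W_k in every cell, as in the recurrence, costs O(nm(n+m)).

    The algorithm is exact, but very time consuming. FASTA is a heuristic approximation and is mostly used today.

    Space requirement: O(nm)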
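
If only the best score is needed and the gap cost is linear (a fixed cost per blank), two rows of the matrix are enough. This is a sketch of that well-known space saving, not something the slides describe; the function name `sw_score_linear` and the default scores are assumptions for illustration.

```python
def sw_score_linear(a, b, match=1, mismatch=-1, gap=2):
    """Best local alignment score with a linear gap cost, keeping two rows only."""
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        cur = [0]  # column 0 is always zero
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            cur.append(max(diag, prev[j] - gap, cur[j - 1] - gap, 0))
        best = max(best, max(cur))
        prev = cur
    return best

print(sw_score_linear("AGCTT", "AGACT"))  # 2
```

The running time stays O(nm), but the memory drops to O(m). Recovering the alignment itself still needs the backpointers, so this trick alone does not remove the O(nm) space needed for backtracing.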

  • Smith-Waterman algorithm: Disadvantages

    Time and space costs are very high.

    It finds the alignment with the maximal score, but not the one with the maximal percentage of matches.

    The algorithm produces mosaics of well-conserved fragments connected by poorly conserved fragments.

    Solution: length-normalized local alignment, which obtains the region with the maximum degree of similarity.

  • Smith-Waterman algorithm: Applications

    JAligner

    SSEARCH (in FASTA package)

    Live demo of the Smith-Waterman algorithm: http://baba.sourceforge.net/

  • Bibliography

    [1] Alison Cawsey, Dynamic Programming, http://www.macs.hw.ac.uk/alison/ds98/node122.html, 1998

    [2] Temple F. Smith and Michael S. Waterman, Identification of Common Molecular Subsequences, J. Mol. Biol., 147(1):195-197, March 1981

    [3] Script: Sequence Analysis I+II, Lecture notes, Faculty of Technology, Bielefeld University, Winter 2008/09 and Summer 2009

    [4] Norman Casagrande, Basic-Algorithms of Bioinformatics Applet, http://baba.sourceforge.net/, 2003

    [5] University of Southern California, University Professor, http://www.cmb.usc.edu/people/msw/Waterman.html, 2005

  • Thank you!

    The End

    Thank you for your attention!
