DNA Sequence Alignment
A dynamic programming algorithm
Some ideas stolen from the Winter 1996 offering of 590BI at http://www/education/courses/590bi/98wi/
See Lecture 2 by Prof. Ruzzo. Or try the current quarter of CSE 527.
Those slides are more detailed and biologically accurate.
DNA Sequence Alignment (aka “Longest Common Subsequence”)
• The problem
  – What is a DNA sequence?
  – DNA similarity
  – What is DNA sequence alignment?
  – Using English words
• The Naïve algorithm
• The Dynamic Programming algorithm
• Idea of Dynamic Programming
What is a DNA sequence?
• DNA: string using letters A,C,G,T
  – Letter = DNA “base”
  – e.g. AGATGGGCAAGATA
• DNA makes up your “genetic code”
DNA similarity
• DNA can mutate.
  – Change a letter
    • AACCGGTT → ATCCGGTT
  – Insert a letter
    • AACCGGTT → ATAACCGGTT
  – Delete a letter
    • AACCGGTT → ACCGGTT
• A few mutations make sequences different, but “similar”
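The three mutations above can be sketched as plain string operations. This is only an illustration: the function names are made up for this example, and the positions are chosen to reproduce the strings on the slide.

```python
def change_letter(seq: str, i: int, base: str) -> str:
    """Replace the letter at index i with `base`."""
    return seq[:i] + base + seq[i + 1:]


def insert_letters(seq: str, i: int, bases: str) -> str:
    """Insert `bases` before index i."""
    return seq[:i] + bases + seq[i:]


def delete_letter(seq: str, i: int) -> str:
    """Remove the letter at index i."""
    return seq[:i] + seq[i + 1:]


original = "AACCGGTT"
print(change_letter(original, 1, "T"))    # ATCCGGTT
print(insert_letters(original, 0, "AT"))  # ATAACCGGTT
print(delete_letter(original, 0))         # ACCGGTT
```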
Why is DNA similarity important?
• New sequences compared to existing sequences
• Similar sequences often have similar function
• Most widely used algorithm in computational biology tools
  – e.g. BLAST at http://www.ncbi.nlm.nih.gov/BLAST/
What is DNA sequence alignment?
• Match 2 sequences, with underscore ( _ ) wildcards.
• Best alignment = minimum underscores (slight simplification, but okay for 326)
• e.g.
ACCCGTTT
TCCCTTT
Best alignment (3 underscores):
A_CCCGTTT
_TCCC_TTT
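A small checker makes the definition concrete. It reads the example as two rows, A_CCCGTTT over _TCCC_TTT, for the sequences ACCCGTTT and TCCCTTT. The rules it enforces (equal row lengths, each column either two equal letters or one letter against one underscore) are an assumed reading of "match with underscore wildcards", and the function name is hypothetical.

```python
GAP = "_"


def count_underscores(row1: str, row2: str) -> int:
    """Return the number of underscores in an alignment, or raise ValueError
    if the two rows do not form a valid alignment."""
    if len(row1) != len(row2):
        raise ValueError("rows must have the same length")
    for a, b in zip(row1, row2):
        if a == GAP and b == GAP:
            raise ValueError("an underscore cannot face another underscore")
        if a != GAP and b != GAP and a != b:
            raise ValueError(f"letters {a!r} and {b!r} do not match")
    return row1.count(GAP) + row2.count(GAP)


top, bottom = "A_CCCGTTT", "_TCCC_TTT"
print(count_underscores(top, bottom))  # 3
# Removing the underscores gives back the two original sequences.
print(top.replace(GAP, ""), bottom.replace(GAP, ""))  # ACCCGTTT TCCCTTT
```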
Moving to English words
zasha
ashes

zash__a
_ashes_
Naïve algorithm
• Try every way to put in underscores
• If it works, and is best so far, record it.
• At end, return best solution.
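One possible reading of the naïve algorithm, as a sketch: pick which letters of each string get matched (every other letter faces an underscore), check whether the matched letters agree, and record the best count so far. The enumeration strategy and the function name are assumptions for illustration, and this is only usable on very short strings.

```python
from itertools import combinations


def naive_best_alignment(x: str, y: str) -> int:
    """Fewest underscores needed to align x and y, by brute force."""
    # Facing every letter with an underscore always works.
    best = len(x) + len(y)
    for k in range(1, min(len(x), len(y)) + 1):
        # Choose k letters of x to match, in order ...
        for keep_x in combinations(range(len(x)), k):
            matched = [x[i] for i in keep_x]
            # ... and k letters of y to match them against.
            for keep_y in combinations(range(len(y)), k):
                if [y[j] for j in keep_y] == matched:
                    # It works: all unmatched letters face an underscore.
                    underscores = len(x) + len(y) - 2 * k
                    best = min(best, underscores)
    return best


print(naive_best_alignment("zasha", "ashes"))  # 4
```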
Naïve Algorithm – Running Time
• Strings of size M, N: exponential, on the order of 2^(M+N)
Dynamic Approach – A table
• Table(x,y): best alignment for first x letters of string 1, and first y letters of string 2
• Decide what to do with the end of string, then look up best alignment of remainder in Table.
e.g. ‘a’ vs. ‘s’
• “zasha” vs. “ashes”. 2 possibilities for last letters:
  – (1) match ‘a’ with ‘_’:
    • best_alignment(“zash”,“ashes”)+1
  – (2) match ‘s’ with ‘_’:
    • best_alignment(“zasha”,“ashe”)+1
best_alignment(“zasha”,“ashes”) = min(best_alignment(“zash”,“ashes”)+1, best_alignment(“zasha”,“ashe”)+1)
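The rule above can be written directly as a recursive function, before any table is involved. This is a sketch: the base case (an empty string must face one underscore per remaining letter) is an assumption, and the equal-last-letters case is taken from the pseudocode later in these slides.

```python
def best_alignment(s1: str, s2: str) -> int:
    """Fewest underscores needed to align s1 and s2 (no table, plain recursion)."""
    if not s1 or not s2:
        # Assumed base case: every remaining letter faces an underscore.
        return len(s1) + len(s2)
    if s1[-1] == s2[-1]:
        # Last letters match: no underscore needed for this column.
        return best_alignment(s1[:-1], s2[:-1])
    return min(
        best_alignment(s1[:-1], s2) + 1,  # (1) last letter of s1 with '_'
        best_alignment(s1, s2[:-1]) + 1,  # (2) last letter of s2 with '_'
    )


print(best_alignment("zasha", "ashes"))  # 4
```

Written this way, the same pair of prefixes is solved again and again, which is the motivation for storing results in a table.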
An example
(empty) Z A S H A
(empty)
A
S
H
E
S
Example with solution
         (empty)  Z  A  S  H  A
(empty)     0     1  2  3  4  5
A           1     2  1  2  3  4
S           2     3  2  1  2  3
H           3     4  3  2  1  2
E           4     5  4  3  2  3
S           5     6  5  4  3  4
zasha__
_ash_es
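The slides do not spell out how an alignment is read back from the filled table. One common way, sketched here as an assumption, is to walk from the bottom-right cell back to the top-left and undo one decision per step. The table literal is copied from the example (rows: "ashes", columns: "zasha"), and the function name is hypothetical.

```python
TABLE = [
    [0, 1, 2, 3, 4, 5],
    [1, 2, 1, 2, 3, 4],
    [2, 3, 2, 1, 2, 3],
    [3, 4, 3, 2, 1, 2],
    [4, 5, 4, 3, 2, 3],
    [5, 6, 5, 4, 3, 4],
]


def traceback(table, row_word, col_word):
    """Recover one best alignment from a filled table.

    Returns (top, bottom): col_word and row_word with underscores inserted."""
    r, c = len(row_word), len(col_word)
    top, bottom = [], []
    while r > 0 or c > 0:
        if r > 0 and c > 0 and row_word[r - 1] == col_word[c - 1]:
            # Letters match: this column came from the diagonal cell.
            top.append(col_word[c - 1])
            bottom.append(row_word[r - 1])
            r, c = r - 1, c - 1
        elif r == 0 or (c > 0 and table[r][c - 1] <= table[r - 1][c]):
            # Cheaper (or forced) to face the column letter with '_'.
            top.append(col_word[c - 1])
            bottom.append("_")
            c -= 1
        else:
            # Otherwise face the row letter with '_'.
            top.append("_")
            bottom.append(row_word[r - 1])
            r -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


top, bottom = traceback(TABLE, "ashes", "zasha")
print(top)
print(bottom)
```

Ties between cells can be broken either way, so the alignment printed may differ from the one shown on the slide while still using 4 underscores.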
Pseudocode (bottom-up)
Given: Strings X,Y , Table[0..x,0..y]
// first column and first row: an empty prefix needs one underscore per letter
For i=0 to x do
    Table[i,0]=i
For j=0 to y do
    Table[0,j]=j
i=1, j=1
While i<=x and j<=y
    If X[i]=Y[j] Then
        // matches – no underscores
        Table[i,j]=Table[i-1,j-1]
    Else
        Table[i,j]=min(Table[i-1,j],Table[i,j-1])+1
    End If
    i=i+1
    If i>x Then
        i=1
        j=j+1
    End If
End While
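A Python rendering of the bottom-up idea, as a sketch. It uses two nested for-loops instead of the single while-loop, and 0-based string indexing, so X[i] in the pseudocode becomes x_word[i - 1] here. The function name is an assumption.

```python
def fill_table(x_word: str, y_word: str) -> list:
    """Return the table where table[i][j] is the fewest underscores needed to
    align the first i letters of x_word with the first j letters of y_word."""
    x, y = len(x_word), len(y_word)
    table = [[0] * (y + 1) for _ in range(x + 1)]
    for i in range(x + 1):
        table[i][0] = i  # empty prefix of y_word: i underscores
    for j in range(y + 1):
        table[0][j] = j  # empty prefix of x_word: j underscores
    for j in range(1, y + 1):
        for i in range(1, x + 1):
            if x_word[i - 1] == y_word[j - 1]:
                # Matches: no underscores.
                table[i][j] = table[i - 1][j - 1]
            else:
                table[i][j] = min(table[i - 1][j], table[i][j - 1]) + 1
    return table


# Rows are "ashes" and columns are "zasha", as in the example table.
for row in fill_table("ashes", "zasha"):
    print(row)
```

The bottom-right entry of the printed table is the answer for the full strings.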
Pseudocode (top-down)
Given: Strings X,Y , Table[0..x,0..y]
BestAlignment (x,y)
    If x=0 Then
        // empty prefix: one underscore per remaining letter
        Table[x,y]=y
    Else If y=0 Then
        Table[x,y]=x
    Else
        Compute Table[x-1,y] if necessary
        Compute Table[x,y-1] if necessary
        Compute Table[x-1,y-1] if necessary
        If X[x]=Y[y] Then
            // matches – no underscores
            Table[x,y]=Table[x-1,y-1]
        Else
            Table[x,y]=min(Table[x-1,y],Table[x,y-1])+1
        End If
    End If
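The top-down version can be sketched in Python with a dictionary standing in for the table. "Compute if necessary" becomes "look in the dictionary first". Unlike the pseudocode, this sketch only computes the neighbours it actually needs. The base cases for empty prefixes and all names are assumptions.

```python
def best_alignment_top_down(x_word: str, y_word: str):
    """Return (fewest underscores, number of table cells that were filled in)."""
    table = {}  # (x, y) -> best alignment of the first x and first y letters

    def solve(x: int, y: int) -> int:
        if (x, y) in table:
            return table[(x, y)]  # already computed: just look it up
        if x == 0:
            result = y  # assumed base case: y underscores
        elif y == 0:
            result = x  # assumed base case: x underscores
        elif x_word[x - 1] == y_word[y - 1]:
            result = solve(x - 1, y - 1)  # matches: no underscores
        else:
            result = min(solve(x - 1, y), solve(x, y - 1)) + 1
        table[(x, y)] = result
        return result

    answer = solve(len(x_word), len(y_word))
    return answer, len(table)


answer, cells = best_alignment_top_down("zasha", "ashes")
print(answer)  # 4
print(cells, "cells filled, out of", 6 * 6)
```

Python's recursion limit makes this form suitable only for short strings. Long inputs would need the bottom-up loop.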
Running time
• Every square in table is filled in once
• Filling it in is constant time
• Θ(n²) squares ⇒ alg is Θ(n²)
Idea of dynamic programming
• Re-use expensive computations
  – Identify critical input to problem (e.g. best alignment of prefixes of strings)
  – Store results in table, indexed by critical input
  – Solve cells in table in terms of other cells
• Top-down often easier to program
Albert Q. Dynamic at Whistler mountain
Picture from PhotoDisc.com