dna sequence alignment

DNA Sequence Alignment: a dynamic programming algorithm. Some ideas stolen from Winter 1996 offering of 590BI at http://www/education/courses/590bi/98wi/ See Lecture 2 by Prof. Ruzzo. Or try current quarter of CSE 527. Those slides are more detailed and biologically accurate.

Upload: benedict-buckner

Post on 30-Dec-2015


TRANSCRIPT

Page 1: DNA Sequence Alignment

DNA Sequence Alignment

A dynamic programming algorithm

Some ideas stolen from Winter 1996 offering of 590BI at http://www/education/courses/590bi/98wi/
See Lecture 2 by Prof. Ruzzo. Or try current quarter of CSE 527.
Those slides are more detailed and biologically accurate.

Page 2: DNA Sequence Alignment

DNA Sequence Alignment (aka “Longest Common Subsequence”)

• The problem
– What is a DNA sequence?
– DNA similarity
– What is DNA sequence alignment?
– Using English words
• The Naïve algorithm
• The Dynamic Programming algorithm
• Idea of Dynamic Programming

Page 3: DNA Sequence Alignment

What is a DNA sequence

• DNA: string using letters A,C,G,T
– Letter = DNA “base”
– e.g. AGATGGGCAAGATA

• DNA makes up your “genetic code”

Page 4: DNA Sequence Alignment

DNA similarity

• DNA can mutate.
– Change a letter
  • AACCGGTT → ATCCGGTT
– Insert a letter
  • AACCGGTT → ATAACCGGTT
– Delete a letter
  • AACCGGTT → ACCGGTT
• A few mutations make sequences different, but “similar”
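The three kinds of mutation can be sketched as plain string operations. The function names and the position arguments are illustrative assumptions, not something the slides define; the sample calls reproduce the slide's own examples.

```python
def substitute(seq, pos, base):
    """Change the letter at index pos to base."""
    return seq[:pos] + base + seq[pos + 1:]


def insert(seq, pos, bases):
    """Insert one or more letters before index pos."""
    return seq[:pos] + bases + seq[pos:]


def delete(seq, pos):
    """Delete the letter at index pos."""
    return seq[:pos] + seq[pos + 1:]


original = "AACCGGTT"
changed = substitute(original, 1, "T")   # change a letter
inserted = insert(original, 0, "AT")     # insert at the front
deleted = delete(original, 0)            # delete the first letter
```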

Page 5: DNA Sequence Alignment

Why is DNA similarity important

• New sequences compared to existing sequences

• Similar sequences often have similar function

• Most widely used algorithm in computational biology tools
– e.g. BLAST at http://www.ncbi.nlm.nih.gov/BLAST/

Page 6: DNA Sequence Alignment

What is DNA sequence alignment?

• Match 2 sequences, with underscore ( _ ) wildcards.

• Best alignment = minimum number of underscores (slight simplification, but okay for 326)

• e.g. ACCCGTTT vs. TCCCTTT

Best alignment: (3 underscores)

A_CCCGTTT
_TCCC_TTT
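To make the definition concrete, here is a small sketch that checks whether two rows form a legal alignment and counts the underscores. The function name `count_underscores` and the rule that a column may not hold two underscores are assumptions made for illustration; the sample call uses the slide's example alignment, split into its two rows.

```python
def count_underscores(row1, row2):
    """Return the number of underscores in an alignment, or raise if it is invalid."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for a, b in zip(row1, row2):
        if a == "_" and b == "_":
            raise ValueError("a column cannot hold two underscores")
        if a == "_" or b == "_":
            total += 1          # a letter matched against a wildcard
        elif a != b:
            raise ValueError(f"letters {a!r} and {b!r} do not match")
    return total


underscores = count_underscores("A_CCCGTTT", "_TCCC_TTT")
```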

Page 7: DNA Sequence Alignment

Moving to English words

zasha
ashes

zash__a
_ashes_

Page 8: DNA Sequence Alignment

Naïve algorithm

• Try every way to put in underscores

• If it works, and is best so far, record it.

• At end, return best solution.
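One possible reading of the naïve algorithm, as a sketch: choose every subset of letters of the first string to keep (all the others get an underscore), check whether the kept letters can be matched in order inside the second string, and record the cheapest choice that works. The names `is_subsequence` and `naive_best_alignment` are hypothetical, and the slides do not say exactly how the underscores are enumerated.

```python
from itertools import combinations


def is_subsequence(small, big):
    """True if the letters of small appear in big in the same order."""
    remaining = iter(big)
    return all(ch in remaining for ch in small)


def naive_best_alignment(s1, s2):
    """Minimum number of underscores, found by brute force over subsets of s1."""
    best = len(s1) + len(s2)              # align every letter with an underscore
    for keep in range(len(s1) + 1):
        for idx in combinations(range(len(s1)), keep):
            kept = "".join(s1[i] for i in idx)
            if is_subsequence(kept, s2):  # "if it works"
                cost = len(s1) + len(s2) - 2 * keep
                if cost < best:           # "and is best so far, record it"
                    best = cost
    return best
```

The outer loops visit all 2^M subsets of the first string, which is why this approach only works for very short inputs.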

Page 9: DNA Sequence Alignment

Naïve Algorithm – Running Time

• Strings of size M, N: Θ(2^(N+M))

Page 10: DNA Sequence Alignment

Dynamic Approach – A table

• Table(x,y): best alignment for first x letters of string 1, and first y letters of string 2

• Decide what to do with the end of string, then look up best alignment of remainder in Table.

Page 11: DNA Sequence Alignment

e.g. ‘a’ vs. ‘s’

• “zasha” vs. “ashes”. 2 possibilities for last letters:
– (1) match ‘a’ with ‘_’:
  • best_alignment(“zash”,“ashes”)+1
– (2) match ‘s’ with ‘_’:
  • best_alignment(“zasha”,“ashe”)+1

best_alignment(“zasha”,“ashes”) = min(best_alignment(“zash”,“ashes”)+1, best_alignment(“zasha”,“ashe”)+1)
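The recurrence can be written directly as a recursive function, shown here as a sketch without any table. The base case (one string empty) and the matching-letters case are filled in the same way as in the pseudocode slides later in the deck. Because the same prefixes are solved over and over, this version takes exponential time.

```python
def best_alignment(s1, s2):
    """Minimum number of underscores needed to align s1 with s2 (no table)."""
    if not s1 or not s2:
        # One string is used up: every remaining letter needs an underscore.
        return len(s1) + len(s2)
    if s1[-1] == s2[-1]:
        # Last letters match: no underscore needed for this column.
        return best_alignment(s1[:-1], s2[:-1])
    return min(
        best_alignment(s1[:-1], s2) + 1,   # (1) last letter of s1 with '_'
        best_alignment(s1, s2[:-1]) + 1,   # (2) last letter of s2 with '_'
    )
```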

Page 12: DNA Sequence Alignment

An example

         (empty)  Z  A  S  H  A
(empty)
A
S
H
E
S

Page 13: DNA Sequence Alignment

Example with solution

         (empty)  Z  A  S  H  A
(empty)     0     1  2  3  4  5
A           1     2  1  2  3  4
S           2     3  2  1  2  3
H           3     4  3  2  1  2
E           4     5  4  3  2  3
S           5     6  5  4  3  4

zasha__
_ash_es
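The slides show the finished alignment but not how to read it out of the table. One common approach, sketched here as an assumption rather than the lecture's own method, is to walk back from the bottom-right cell: move diagonally when the letters match, otherwise move to whichever neighbour holds the smaller value. `TABLE` is the filled-in table from the slide; `traceback` and the tie-breaking rule (prefer the left cell) are illustrative choices, so another equally good alignment may come out than the one printed on the slide.

```python
# Filled table from the slide: rows follow "ashes", columns follow "zasha".
TABLE = [
    [0, 1, 2, 3, 4, 5],
    [1, 2, 1, 2, 3, 4],
    [2, 3, 2, 1, 2, 3],
    [3, 4, 3, 2, 1, 2],
    [4, 5, 4, 3, 2, 3],
    [5, 6, 5, 4, 3, 4],
]


def traceback(col_str, row_str, table):
    """Rebuild one best alignment by walking from the last cell to the first."""
    r, c = len(row_str), len(col_str)
    top, bottom = [], []
    while r > 0 or c > 0:
        if r > 0 and c > 0 and col_str[c - 1] == row_str[r - 1]:
            top.append(col_str[c - 1])        # letters match: diagonal step
            bottom.append(row_str[r - 1])
            r, c = r - 1, c - 1
        elif c > 0 and (r == 0 or table[r][c - 1] <= table[r - 1][c]):
            top.append(col_str[c - 1])        # step left: underscore in bottom row
            bottom.append("_")
            c -= 1
        else:
            top.append("_")                   # step up: underscore in top row
            bottom.append(row_str[r - 1])
            r -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


top_row, bottom_row = traceback("zasha", "ashes", TABLE)
```

The total number of underscores in the two returned rows equals the bottom-right table entry.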

Page 14: DNA Sequence Alignment

Pseudocode (bottom-up)
Given: Strings X,Y of lengths x,y , Table[0..x,0..y]

Table[0,0]=0
For i=1 to x do
  Table[i,0]=i
For j=1 to y do
  Table[0,j]=j

i=1, j=1
While i<=x and j<=y
  If X[i]=Y[j] Then
    // matches – no underscores
    Table[i,j]=Table[i-1,j-1]
  Else
    Table[i,j]=min(Table[i-1,j],Table[i,j-1])+1
  End If
  i=i+1
  If i>x Then
    i=1
    j=j+1
  End If
End While
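A runnable sketch of the bottom-up approach in Python. The function name `fill_table` is an assumption; the table is indexed as `table[i][j]` for the first `i` letters of `X` and the first `j` letters of `Y`, and two nested loops replace the single `While` loop with its manual index reset.

```python
def fill_table(X, Y):
    """Fill the whole alignment table bottom-up and return it."""
    x, y = len(X), len(Y)
    table = [[0] * (y + 1) for _ in range(x + 1)]
    for i in range(1, x + 1):
        table[i][0] = i                  # i letters against the empty string
    for j in range(1, y + 1):
        table[0][j] = j                  # j letters against the empty string
    for j in range(1, y + 1):
        for i in range(1, x + 1):
            if X[i - 1] == Y[j - 1]:     # Python strings are 0-indexed
                # matches: no underscores
                table[i][j] = table[i - 1][j - 1]
            else:
                table[i][j] = min(table[i - 1][j], table[i][j - 1]) + 1
    return table


table = fill_table("zasha", "ashes")
best = table[len("zasha")][len("ashes")]
```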

Page 15: DNA Sequence Alignment

Pseudocode (top-down)
Given: Strings X,Y , Table[0..x,0..y]

BestAlignment (x,y)
  If x=0 or y=0 Then
    // one prefix is empty – every remaining letter needs an underscore
    Table[x,y]=x+y
  Else
    Compute Table[x-1,y] if necessary
    Compute Table[x,y-1] if necessary
    Compute Table[x-1,y-1] if necessary
    If X[x]=Y[y] Then
      // matches – no underscores
      Table[x,y]=Table[x-1,y-1]
    Else
      Table[x,y]=min(Table[x-1,y],Table[x,y-1])+1
    End If
  End If
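A top-down sketch in Python, using a dictionary as the table so that "if necessary" becomes a simple lookup. The names `best_alignment` and `solve` are assumptions. Unlike the pseudocode, this version only computes the neighbouring cells it actually needs for the current case.

```python
def best_alignment(X, Y):
    """Return (minimum underscores, table of the cells that were computed)."""
    table = {}

    def solve(x, y):
        if (x, y) in table:              # already computed: re-use it
            return table[(x, y)]
        if x == 0 or y == 0:
            result = x + y               # an empty prefix needs all underscores
        elif X[x - 1] == Y[y - 1]:       # Python strings are 0-indexed
            # matches: no underscores
            result = solve(x - 1, y - 1)
        else:
            result = min(solve(x - 1, y), solve(x, y - 1)) + 1
        table[(x, y)] = result
        return result

    return solve(len(X), len(Y)), table


cost, cells = best_alignment("zasha", "ashes")
```

Each cell is stored the first time it is solved, so no cell is computed twice. For long strings the recursion depth can become a concern, which is one reason the bottom-up form is sometimes preferred.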

Page 16: DNA Sequence Alignment

Running time

• Every square in table is filled in once

• Filling it in is constant time

• Θ(n^2) squares, so the algorithm is Θ(n^2)

Page 17: DNA Sequence Alignment

Idea of dynamic programming

• Re-use expensive computations
– Identify critical input to problem (e.g. best alignment of prefixes of strings)
– Store results in table, indexed by critical input
– Solve cells in table in terms of other cells

• Top-down often easier to program

Albert Q. Dynamic at Whistler Mountain

Picture from PhotoDisc.com