brandon andrews. longest common subsequences global sequence alignment scoring alignments local...

Dynamic Programming 6.5-6.9 Brandon Andrews

Upload: christal-taylor

Post on 25-Dec-2015


Page 1: Brandon Andrews.  Longest Common Subsequences  Global Sequence Alignment  Scoring Alignments  Local Sequence Alignment  Alignment with Gap Penalties

Dynamic Programming 6.5-6.9

Brandon Andrews


Topics
• Longest Common Subsequences
• Global Sequence Alignment
• Scoring Alignments
• Local Sequence Alignment
• Alignment with Gap Penalties
• Questions


Longest Common Subsequences (LCS)

• Goal: find the similarity between two sequences.
• The two sequences may differ in length.
  • The sequences are denoted v and w and are viewed as strings of characters, e.g. v = ATTGCTA.


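Subsequences
• A subsequence is an ordered sequence of characters taken from v or w; the characters need not be consecutive.
• For example, if v = ATTGCTA, then AGCA and ATTA are subsequences of v.
  • AGCA: characters 1, 4, 5, 7 of ATTGCTA
  • ATTA: for example, characters 1, 2, 3, 7 of ATTGCTA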
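
The subsequence examples above can be checked mechanically. The following is a minimal sketch, not taken from the slides; the function name `is_subsequence` is a hypothetical helper introduced only for illustration.

```python
def is_subsequence(candidate: str, text: str) -> bool:
    """Return True if `candidate` can be read off `text` from left to right,
    skipping characters but never reordering them."""
    remaining = iter(text)
    # `ch in remaining` advances the iterator until it finds ch, so every
    # character of `text` is consumed at most once and order is preserved.
    return all(ch in remaining for ch in candidate)


v = "ATTGCTA"
print(is_subsequence("AGCA", v))  # True
print(is_subsequence("ATTA", v))  # True
print(is_subsequence("GAT", v))   # False: no T follows the A that comes after G
```

The check runs in time linear in the length of `text`, because the iterator never moves backwards.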


Operations
• The only operations we can perform are insertion and deletion.
  • Insertion: ATCTGAT -> A-TCTGAT (the hyphen marks a position where anything may be inserted).
  • Deletion: an insertion into the other sequence, which offsets its characters so that the longest common subsequence lines up.
• v = AT-C-TGAT
• w = -TGCAT-A-
• How do we find TCTA using dynamic programming?
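
To see where TCTA comes from in the two gapped rows above, one can read off the columns in which both rows carry the same character. This is a small illustrative sketch; `matched_columns` is a hypothetical helper name, not something defined in the slides.

```python
def matched_columns(v_row: str, w_row: str) -> str:
    """Collect the characters of the columns where both alignment rows agree."""
    if len(v_row) != len(w_row):
        raise ValueError("alignment rows must have equal length")
    # A column counts as a match only if it holds the same non-gap character.
    return "".join(a for a, b in zip(v_row, w_row) if a == b and a != "-")


v_row = "AT-C-TGAT"
w_row = "-TGCAT-A-"
print(matched_columns(v_row, w_row))  # TCTA
print(v_row.replace("-", ""))         # ATCTGAT, the original v
print(w_row.replace("-", ""))         # TGCATA, the original w
```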


Review: Edit Distance
• Edit distance: turning one sequence into another with the fewest operations.
  • Allowed operations: insertion, deletion, and substitution.
• The longest common subsequence problem is essentially the same, except that only insertion and deletion are allowed and the edge weights in the grid are 1 for a match and 0 for a non-match (essentially the Manhattan Tourist problem with fixed weights).
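
The grid formulation above translates into a short dynamic programming routine. The sketch below is one standard way to write it, under the assumption that a diagonal (match) edge has weight 1 and horizontal and vertical (insertion and deletion) edges have weight 0; the name `lcs` and the tie-breaking rule in the backtracking step are choices made here for illustration.

```python
def lcs(v: str, w: str) -> str:
    """Return one longest common subsequence of v and w."""
    n, m = len(v), len(w)
    # s[i][j] = length of a longest common subsequence of v[:i] and w[:j]
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if v[i - 1] == w[j - 1]:
                s[i][j] = s[i - 1][j - 1] + 1             # diagonal edge, weight 1
            else:
                s[i][j] = max(s[i - 1][j], s[i][j - 1])   # weight-0 edges
    # Backtrack from the sink (n, m) to the source (0, 0).
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if v[i - 1] == w[j - 1]:
            out.append(v[i - 1])
            i -= 1
            j -= 1
        elif s[i - 1][j] >= s[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


result = lcs("ATCTGAT", "TGCATA")
print(len(result))  # 4
```

For v = ATCTGAT and w = TGCATA there are several common subsequences of length 4 (TCTA is one of them), so which one is returned depends on how ties are broken during backtracking. With only insertions and deletions allowed, the number of operations needed is len(v) + len(w) - 2 * len(result), which is 5 here.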


Example
• Example: see other slides
  • Chapter 6: Edit Distance, Slides 54-58


Global Sequence Alignment

Chapter 6: Alignment


Scoring Alignments
• Scoring matrices are based on biological evidence.
  • Certain amino acid mutations are more common than others.
  • For instance, Asn, Asp, Glu, and Ser are the most mutable amino acids.
  • Ser is approximately three times as likely to mutate into Phe as Trp is.
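
As a rough illustration of how a scoring matrix is applied, the sketch below sums per-column scores over an alignment. Everything numeric here is an assumption: `toy_delta` is a made-up table over the three letters S (Ser), F (Phe), and W (Trp), not a real PAM or BLOSUM matrix, and the gap score of -1 is equally arbitrary. The only idea borrowed from the slide is that a more common substitution (Ser/Phe) should score higher than a rarer one (Trp/Phe).

```python
def score_alignment(v_row: str, w_row: str, delta: dict, gap: int = -1) -> int:
    """Sum column scores: delta[(a, b)] for aligned letters, `gap` for a gap."""
    if len(v_row) != len(w_row):
        raise ValueError("alignment rows must have equal length")
    total = 0
    for a, b in zip(v_row, w_row):
        if a == "-" or b == "-":
            total += gap
        else:
            total += delta[(a, b)]
    return total


# Hypothetical toy scores: 2 for identity, -1 for a substitution ...
toy_delta = {(a, b): (2 if a == b else -1) for a in "SFW" for b in "SFW"}
# ... except Ser/Phe, which we pretend is common enough to score 0.
toy_delta[("S", "F")] = toy_delta[("F", "S")] = 0

print(score_alignment("SFW-", "FFWS", toy_delta))  # 0 + 2 + 2 - 1 = 3
```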


PAM
• 1 PAM unit: 1 mutation for every 100 amino acids.
• This is a required condition that ensures the proteins being analyzed are closely related.
  • The scoring matrix uses probabilities that can change if the proteins are not closely related.
• The probability that one amino acid mutates into another differs from pair to pair.
• Essentially, 1 PAM is the average time it takes the “average” protein to mutate 1% of its amino acids.
• You end up with scoring matrices of the type PAM 1, PAM 2, and so on.
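
As background not spelled out on the slide: the mutation probabilities for n PAM units are usually obtained by multiplying the PAM 1 probability matrix by itself n times. The sketch below shows that step only, on a made-up three-letter alphabet. `toy_pam1` is a hypothetical matrix chosen so that each letter stays unchanged with probability 0.99; it is not real amino acid data, and the conversion from probabilities to log-odds scores is omitted.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def pam(pam1, n):
    """Return the n-th power of the PAM 1 probability matrix (n >= 1)."""
    result = [row[:] for row in pam1]
    for _ in range(n - 1):
        result = mat_mul(result, pam1)
    return result


# Hypothetical PAM 1 matrix: entry [i][j] is the probability that letter i
# becomes letter j within one PAM unit. Each row sums to 1.
toy_pam1 = [
    [0.990, 0.006, 0.004],
    [0.005, 0.990, 0.005],
    [0.002, 0.008, 0.990],
]

pam2 = pam(toy_pam1, 2)
for row in pam2:
    print([round(p, 6) for p in row])
```

Each row of the result still sums to 1, and the diagonal entries shrink as n grows, reflecting that more distant proteins have had more time to change.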


Local Sequence Alignment
• Global alignment compares two entire strings.
• Local alignment looks only for regions that align well locally.
  • That is, it looks for short stretches that are similar within larger sequences.


Smith-Waterman Local Alignment Algorithm

Add an edge of weight 0 from the source to every other vertex.
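
In the recurrence, the weight-0 edges from the source show up as an extra 0 inside the max: any cell may start a fresh alignment at no cost. The following is a minimal sketch of that idea. The scores (match = 1, mismatch = -1, indel = -1) are hypothetical defaults chosen for illustration, and the function returns only the best score and the cell where it ends, not the alignment itself.

```python
def local_alignment_score(v: str, w: str, match: int = 1,
                          mismatch: int = -1, indel: int = -1):
    """Return (best local score, (i, j)) where (i, j) is the end cell."""
    n, m = len(v), len(w)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_end = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = s[i - 1][j - 1] + (match if v[i - 1] == w[j - 1] else mismatch)
            # The leading 0 is the free edge from the source to this vertex.
            s[i][j] = max(0, diag, s[i - 1][j] + indel, s[i][j - 1] + indel)
            if s[i][j] > best:
                best, best_end = s[i][j], (i, j)
    return best, best_end


score, end = local_alignment_score("CCCGATTACACCC", "TTGATTACATT")
print(score, end)  # 7 (10, 9): the shared stretch GATTACA
```

Taking the maximum over all cells, rather than reading the score at the bottom-right corner, plays the role of a free ride from any vertex to the sink.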


Alignment with Gap Penalties

• Gaps are expected in the sequences.
  • However, very small gaps could indicate dissimilarity, so a penalty is given for gaps that meet a given criterion.
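
The slide does not say which penalty scheme is meant. One common choice, assumed here purely for illustration, is the affine gap penalty, where a run of x consecutive gap characters costs opening + extension * x. The helper names and the parameter values (2.0 and 0.5) below are hypothetical.

```python
from itertools import groupby


def gap_lengths(row: str) -> list:
    """Lengths of the maximal runs of '-' in one row of an alignment."""
    return [len(list(run)) for ch, run in groupby(row) if ch == "-"]


def affine_gap_penalty(row: str, opening: float = 2.0,
                       extension: float = 0.5) -> float:
    """Total penalty: each gap of length x costs opening + extension * x."""
    return sum(opening + extension * x for x in gap_lengths(row))


print(affine_gap_penalty("A---TCGA"))  # 3.5: one gap of length 3
print(affine_gap_penalty("AT--C-GA"))  # 5.5: gaps of length 2 and 1
```

Both rows contain three gap characters, but under this scheme the row with two separate gaps is penalized more than the row with one long gap.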


References
• An Introduction to Bioinformatics Algorithms
• Related Slides