Dynamic Programming 6.5-6.9
Brandon Andrews
Topics
• Longest Common Subsequences
• Global Sequence Alignment
• Scoring Alignments
• Local Sequence Alignment
• Alignment with Gap Penalties
• Questions
Longest Common Subsequences (LCS)
Goal: find the similarity between two sequences.
The two sequences can differ in length.
• The sequences are denoted v and w and are viewed as strings of characters.
• v = ATTGCTA
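Subsequences
A subsequence is an ordered sequence of characters taken from v or w. The characters keep their original order but need not be adjacent.
For example, if v = ATTGCTA, then AGCA and ATTA are subsequences of v.
• AGCA: ATTGCTA (positions 1, 4, 5, 7)
• ATTA: ATTGCTA (for example, positions 1, 2, 3, 7)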
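The definition above can be checked mechanically. A minimal sketch in Python, where the function name is_subsequence is a hypothetical helper chosen for illustration and is not from the slides:

```python
def is_subsequence(candidate: str, v: str) -> bool:
    """Return True if candidate can be read off v left to right,
    skipping characters of v but never reordering them."""
    pos = 0  # index of the next character of candidate still to be matched
    for ch in v:
        if pos < len(candidate) and ch == candidate[pos]:
            pos += 1
    return pos == len(candidate)


v = "ATTGCTA"
print(is_subsequence("AGCA", v))  # example from the slide
print(is_subsequence("ATTA", v))  # example from the slide
print(is_subsequence("CAG", v))   # order matters, so this one fails
```

The scan is greedy: each character of the candidate is matched at the earliest possible position in v, which is enough to decide whether any valid choice of positions exists.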
Operations
The only operations we can perform are insertion and deletion.
• Insertion: ATCTGAT -> A-TCTGAT. The hyphen marks the position where a character is inserted.
• Deletion: an insertion into the other sequence.
Both operations offset the characters so that the longest common subsequence lines up:
• v = AT-C-TGAT
• w = -TGCAT-A-
• How do we find TCTA using dynamic programming?
Review: Edit Distance
Turning one sequence into another with the least number of operations.
• Allowed operations: insertion, deletion, and substitution.
The longest common subsequence problem is nearly identical, but only insertion and deletion are allowed. In the grid, the weights are 0 for a non-match and 1 for a match (essentially the Manhattan Tourist problem with fixed weights).
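The weighting just described (1 for a match, 0 otherwise) leads to a short dynamic program. The following is a sketch only, not the slides' own code. The function name lcs and the table name s are assumptions made for illustration.

```python
def lcs(v: str, w: str) -> str:
    """Return one longest common subsequence of v and w."""
    n, m = len(v), len(w)
    # s[i][j] = length of an LCS of v[:i] and w[:j]
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if v[i - 1] == w[j - 1]:
                s[i][j] = s[i - 1][j - 1] + 1            # diagonal edge, weight 1
            else:
                s[i][j] = max(s[i - 1][j], s[i][j - 1])  # indel edges, weight 0

    # Walk back from the sink to the source to recover the characters.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if v[i - 1] == w[j - 1]:
            out.append(v[i - 1])
            i, j = i - 1, j - 1
        elif s[i - 1][j] >= s[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


result = lcs("ATCTGAT", "TGCATA")
print(result, len(result))  # the length is 4, the same as TCTA
```

Several subsequences of length 4 are common to these two strings, so the walk back may return a different one than TCTA. Only the length is guaranteed.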
Example
See other slides:
• Chapter 6: Edit Distance, Slides 54-58
Global Sequence Alignment
Chapter 6: Alignment
Scoring Alignments
Scoring matrices are based on biological evidence.
• Certain amino acid mutations are more common than others.
• For instance, Asn, Asp, Glu, and Ser are the most mutable amino acids.
• Ser is approximately three times as likely to mutate into Phe as Trp is.
PAM
1 PAM = 1 mutation for every 100 amino acids.
This is a required condition that ensures the proteins being analyzed are closely related.
• The scoring matrix uses probabilities that can change if the proteins are not closely related.
The probability that one amino acid mutates into another differs from pair to pair.
Essentially, 1 PAM is the average time for the “average” protein to mutate 1% of its amino acids.
You end up with scoring matrices such as PAM 1, PAM 2, and so on.
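One way to see how PAM 1 relates to PAM 2 and later matrices: in the textbook's treatment, the mutation probabilities for n PAM units are obtained by multiplying the PAM 1 probability matrix by itself n times. The sketch below illustrates this on a made-up three-letter alphabet. The matrix PAM1_TOY and its values are hypothetical and are not real amino acid data.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def pam_n(pam1, n):
    """Return pam1 raised to the n-th power (n >= 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    result = [row[:] for row in pam1]
    for _ in range(n - 1):
        result = mat_mul(result, pam1)
    return result


# Hypothetical toy matrix: entry [i][j] is the probability that letter i
# becomes letter j after 1 PAM. Each row sums to 1 and 99% stays unchanged.
PAM1_TOY = [
    [0.990, 0.006, 0.004],
    [0.005, 0.990, 0.005],
    [0.002, 0.008, 0.990],
]

pam2 = pam_n(PAM1_TOY, 2)
for row in pam2:
    print([round(p, 6) for p in row])
```

The rows of the result still sum to 1, and the diagonal entries shrink as n grows, reflecting more accumulated mutation. Turning these probabilities into alignment scores takes a further step that is not shown here.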
Local Sequence Alignment
Global alignment looks at two entire strings.
Local alignment looks only for local alignments.
• That is, it looks for short regions that are similar within larger sequences.
Smith-Waterman Local Alignment Algorithm
Add an edge of weight 0 from the source to every other vertex, so that an alignment can start at any position at no cost.
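In the recurrence, the zero-weight edges from the source show up as an extra 0 inside the max. The following is a sketch under assumed scores: match = +1, mismatch = -1, and indel = -1 are illustrative values, not values from the slides, and the function name local_alignment is hypothetical.

```python
def local_alignment(v, w, match=1, mismatch=-1, indel=-1):
    """Return (best score, aligned piece of v, aligned piece of w)."""
    n, m = len(v), len(w)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = s[i - 1][j - 1] + (match if v[i - 1] == w[j - 1] else mismatch)
            # The leading 0 is the free edge from the source to this vertex.
            s[i][j] = max(0, diag, s[i - 1][j] + indel, s[i][j - 1] + indel)
            if s[i][j] > best:
                best, best_pos = s[i][j], (i, j)

    # Trace back from the best cell until the score drops to 0.
    i, j = best_pos
    av, aw = [], []
    while i > 0 and j > 0 and s[i][j] > 0:
        diag = s[i - 1][j - 1] + (match if v[i - 1] == w[j - 1] else mismatch)
        if s[i][j] == diag:
            av.append(v[i - 1]); aw.append(w[j - 1]); i -= 1; j -= 1
        elif s[i][j] == s[i - 1][j] + indel:
            av.append(v[i - 1]); aw.append("-"); i -= 1
        else:
            av.append("-"); aw.append(w[j - 1]); j -= 1
    return best, "".join(reversed(av)), "".join(reversed(aw))


print(local_alignment("GGGACGTTT", "CCCACGTAAA"))  # (4, 'ACGT', 'ACGT')
```

The two strings share only the region ACGT, and the surrounding characters are ignored because the score is never allowed to fall below 0.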
Alignment with Gap Penalties
Gaps are expected in the sequences.
• However, very small gaps could indicate dissimilarity, so a penalty is given for gaps that meet a criterion.
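The slide does not say which criterion is used. One common choice, assumed here, is the affine gap penalty: a gap of length x costs gap_open + gap_extend * x, so several short gaps cost more than one long gap of the same total length. The sketch below scores an alignment that has already been built. The function name affine_score and all score values are hypothetical.

```python
def affine_score(av, aw, match=1, mismatch=-1, gap_open=2, gap_extend=1):
    """Score two aligned strings of equal length, with '-' marking gaps."""
    if len(av) != len(aw):
        raise ValueError("aligned strings must have the same length")
    score = 0
    prev_gap = None  # row holding the gap in the previous column: 'v', 'w', or None
    for a, b in zip(av, aw):
        if a == "-" and b == "-":
            raise ValueError("a column cannot be a gap in both rows")
        if a == "-" or b == "-":
            row = "v" if a == "-" else "w"
            if row != prev_gap:
                score -= gap_open    # charged once, when a new gap starts
            score -= gap_extend      # charged for every gap column
            prev_gap = row
        else:
            score += match if a == b else mismatch
            prev_gap = None
    return score


print(affine_score("AC--GT", "ACTTGT"))  # one gap of length 2 -> 0
print(affine_score("A-C-GT", "ATCTGT"))  # two gaps of length 1 -> -2
```

Both alignments have four matches and two gap columns, but the second opens two separate gaps and so scores lower.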
References
• An Introduction to Bioinformatics Algorithms
• Related slides