2015 bioinformatics alignments_wim_vancriekinge
![Page 1: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/1.jpg)
![Page 2: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/2.jpg)
FBW
20-10-2015
Wim Van Criekinge
![Page 3: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/3.jpg)
![Page 4: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/4.jpg)
Rat versus mouse RBP; rat versus bacterial lipocalin
![Page 5: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/5.jpg)
– Henikoff and Henikoff compared the BLOSUM matrices with PAM by evaluating how effectively each matrix detects known members of a protein family in a database search with the ungapped local alignment program BLAST. They conclude that, overall, the BLOSUM 62 matrix is the most effective.
• However, each of the other substitution matrices investigated performs better than BLOSUM 62 for some of the families. This suggests that no single matrix is the complete answer for all sequence comparisons.
• It is probably best to complement the BLOSUM 62 matrix with comparisons using PAM 250 and the structurally derived matrices of Overington.
– As more three-dimensional protein structures are determined, substitution tables derived from structure comparison seem likely to give the most reliable data.
Overview
![Page 6: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/6.jpg)
Available Dot Plot Programs
Dotlet (Java Applet): http://www.isrec.isb-sib.ch/java/dotlet/Dotlet.html
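To make the dot plot idea concrete, here is a minimal Python sketch (it is not Dotlet, only an illustration). It assumes exact identity matching; `window` and `threshold` are hypothetical parameters for the usual noise filter, where a dot is drawn only if at least `threshold` of the `window` residue pairs starting at that cell are identical.

```python
def dot_plot(seq1, seq2, window=1, threshold=1):
    """Return the dot plot as a list of strings (rows follow seq2, columns seq1).

    A cell is '*' when at least `threshold` of the `window` residue pairs
    starting at seq2[i] and seq1[j] are identical, and '.' otherwise.
    """
    rows = []
    for i in range(len(seq2) - window + 1):
        row = []
        for j in range(len(seq1) - window + 1):
            identical = sum(seq2[i + k] == seq1[j + k] for k in range(window))
            row.append("*" if identical >= threshold else ".")
        rows.append("".join(row))
    return rows


if __name__ == "__main__":
    seq1, seq2 = "CKHVFCRVCI", "CKKCFCKCV"
    print("  " + seq1)
    for residue, row in zip(seq2, dot_plot(seq1, seq2)):
        print(residue, row)
```

With `window=1` every identical residue pair becomes a dot; a longer window with a high threshold keeps mainly diagonal runs, which is how similar regions show up as lines.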
![Page 7: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/7.jpg)
Sequence Alignments
Introduction
Algorithms
What?
Examples
Properties
Dynamic Programming for Pairwise Alignment
Concept
Example
Needleman-Wunsch(.pl)
Smith-Waterman(.pl)
Multiple Alignment
MSA
Hierarchical Pairwise Alignment
ClustalW, PileUp
Formatting
Interpretation
Alternative Methods
SIM
Blast2
Dali
![Page 8: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/8.jpg)
Global and local alignment
Pairwise sequence alignment can be global or local
Global: the sequences are completely aligned
(Needleman and Wunsch, 1970)
Local: only the best sub-regions are aligned
(Smith and Waterman, 1981). BLAST
uses local alignment.
![Page 9: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/9.jpg)
– To characterize protein families by identifying shared regions of homology in a multiple sequence alignment (typically after a sequence search has revealed homology to several sequences)
– To determine the consensus sequence of several aligned sequences
– To help predict the secondary and tertiary structures of new sequences
– As a preliminary step in molecular evolution analysis, using phylogenetic methods to construct phylogenetic trees
– Garbage in, garbage out
– Chicken/egg
Why we do multiple alignments?
![Page 10: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/10.jpg)
Why we do multiple alignments?
• To find conserved regions
– Local multiple alignment reveals conserved regions
– Conserved regions usually are key functional regions
– These regions are prime targets for drug development
• To do phylogenetic analysis
– Same protein from different species
– Optimal multiple alignment probably implies history
– Discover irregularities, such as the Cystic Fibrosis gene
![Page 11: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/11.jpg)
VTISCTGSSSNIGAG-NHVKWYQQLPG
VTISCTGTSSNIGS--ITVNWYQQLPG
LRLSCSSSGFIFSS--YAMYWVRQAPG
LSLTCTVSGTSFDD--YYSTWVRQPPG
PEVTCVVVDVSHEDPQVKFNWYVDG--
ATLVCLISDFYPGA--VTVAWKADS--
AALGCLVKDYFPEP--VTVSWNSG---
VSLTCLVKGFYPSD--IAVEWWSNG--
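A consensus sequence and the fully conserved columns can be read off an alignment like the one above. The Python sketch below assumes a simple majority rule (a residue must occur in more than half of the sequences, otherwise `x` is written, and gaps are never reported); real tools use other thresholds and may weight sequences.

```python
from collections import Counter

ALIGNMENT = [
    "VTISCTGSSSNIGAG-NHVKWYQQLPG",
    "VTISCTGTSSNIGS--ITVNWYQQLPG",
    "LRLSCSSSGFIFSS--YAMYWVRQAPG",
    "LSLTCTVSGTSFDD--YYSTWVRQPPG",
    "PEVTCVVVDVSHEDPQVKFNWYVDG--",
    "ATLVCLISDFYPGA--VTVAWKADS--",
    "AALGCLVKDYFPEP--VTVSWNSG---",
    "VSLTCLVKGFYPSD--IAVEWWSNG--",
]


def consensus(alignment, min_fraction=0.5):
    """Majority-rule consensus; 'x' marks columns without a clear majority."""
    n_seqs = len(alignment)
    result = []
    for column in zip(*alignment):
        counts = Counter(c for c in column if c != "-")  # ignore gaps
        residue, count = counts.most_common(1)[0] if counts else ("x", 0)
        result.append(residue if count > min_fraction * n_seqs else "x")
    return "".join(result)


def fully_conserved(alignment):
    """0-based indices of columns in which every sequence has the same residue."""
    return [i for i, column in enumerate(zip(*alignment))
            if len(set(column)) == 1 and column[0] != "-"]


if __name__ == "__main__":
    print(consensus(ALIGNMENT))
    print("fully conserved columns (1-based):",
          [i + 1 for i in fully_conserved(ALIGNMENT)])
```

On this alignment only two columns are identical in all eight sequences: the cysteine in column 5 and the tryptophan in column 21.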
![Page 12: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/12.jpg)
![Page 13: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/13.jpg)
Algorithms and Programs
• Algorithm: a method or a process followed to solve a problem.
– A recipe.
• An algorithm takes the input to a problem (function) and transforms it into the output.
– A mapping of input to output.
• A problem can have many algorithms.
![Page 14: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/14.jpg)
![Page 15: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/15.jpg)
Aryabhata-Euclid's algorithm: how to find gcd(a,b),
the greatest common divisor of a and b
Based on a single observation: if a = b*q + r, then
any divisor of a and b is also a divisor of r, and any divisor
of b and r is also a divisor of a, so gcd(a,b) = gcd(b,r)
Euclid's algorithm: use the division algorithm repeatedly
to reduce the problem to one you can solve.
Example: gcd(55,35)
55 = 35*1 + 20 so gcd(55,35) = gcd(35,20)
35 = 20*1 + 15 so gcd(35,20) = gcd(20,15)
20 = 15*1 + 5 so gcd(20,15) = gcd(15,5)
15 = 5*3 + 0 so gcd(15,5) = gcd(5,0) = 5, hence gcd(55,35) = 5
![Page 16: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/16.jpg)
Pseudocode
![Page 17: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/17.jpg)
GGD.py
def gcd(a, b):
    # Euclid's algorithm: replace (a, b) by (b mod a, a) until a reaches 0
    while a != 0:
        a, b = b % a, a  # parallel assignment
    return b

print(gcd(55, 35))  # prints 5
![Page 18: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/18.jpg)
Bubble Sort Algorithm
One of the simplest sorting algorithms proceeds by walking down the list, comparing adjacent elements, and swapping them if they are in the wrong order. The process is continued until the list is sorted.
Each pass "bubbles" the largest element in the unsorted part of the list to its correct location.
More formally:
1. Initialize the size of the list to be sorted to be the actual size of the list.
2. Loop through the list until no element needs to be exchanged with another to reach its correct position.
2.1 Loop (i) from 0 to size of the list to be sorted - 2.
2.1.1 Compare the ith and (i + 1)st elements in the unsorted list.
2.1.2 Swap the ith and (i + 1)st elements if not in order (ascending or descending as desired).
2.2 Decrease the size of the list to be sorted by 1.
Example list A: 13 7 43 5 3 19 2 23 29
![Page 19: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/19.jpg)
Bubble Sort Implementation
Here is an ascending-order implementation of the bubblesort algorithm for integer arrays:
void BubbleSort(int List[], int Size) {
    int tempInt;  // temp variable for swapping list elems
    for (int Stop = Size - 1; Stop > 0; Stop--) {
        for (int Check = 0; Check < Stop; Check++) {  // make a pass
            if (List[Check] > List[Check + 1]) {      // compare elems
                tempInt = List[Check];                 // swap if in the
                List[Check] = List[Check + 1];         // wrong order
                List[Check + 1] = tempInt;
            }
        }
    }
}
Bubblesort compares and swaps adjacent elements; simple but not very efficient.
Efficiency note: the outer loop could be modified to exit if the list is already sorted.
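The efficiency note can be made concrete. Below is a Python sketch (not a translation of a specific course file) of the same ascending bubble sort with a `swapped` flag: if a full pass makes no swap, the list is already sorted and the outer loop stops. It also returns the number of passes so the early exit is visible.

```python
def bubble_sort(values):
    """Ascending bubble sort with early exit; returns (sorted_copy, passes)."""
    items = list(values)          # sort a copy, leave the input untouched
    passes = 0
    stop = len(items) - 1
    while stop > 0:
        passes += 1
        swapped = False
        for check in range(stop):                  # make a pass
            if items[check] > items[check + 1]:    # compare neighbours
                items[check], items[check + 1] = items[check + 1], items[check]
                swapped = True
        if not swapped:           # no swap: the list is sorted, exit early
            break
        stop -= 1                 # largest unsorted element is now in place
    return items, passes


if __name__ == "__main__":
    print(bubble_sort([13, 7, 43, 5, 3, 19, 2, 23, 29]))
```

On already sorted input only one pass is made, so the best case drops to O(n); the worst case remains O(n²).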
![Page 20: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/20.jpg)
"Great algorithms are the poetry of computation"
![Page 21: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/21.jpg)
"Great algorithms are the poetry of computation"
1946: The Metropolis Algorithm for Monte Carlo. Through the use of random processes, this algorithm offers an efficient way to stumble toward answers to problems that are too complicated to solve exactly.
1947: Simplex Method for Linear Programming. An elegant solution to a common problem in planning and decision-making.
1950: Krylov Subspace Iteration Method. A technique for rapidly solving the linear equations that abound in scientific computation.
1951: The Decompositional Approach to Matrix Computations. A suite of techniques for numerical linear algebra.
1957: The Fortran Optimizing Compiler. Turns high-level code into efficient computer-readable code.
1959: QR Algorithm for Computing Eigenvalues. Another crucial matrix operation made swift and practical.
1962: Quicksort Algorithms for Sorting. For the efficient handling of large databases.
1965: Fast Fourier Transform. Perhaps the most ubiquitous algorithm in use today, it breaks down waveforms (like sound) into periodic components.
1977: Integer Relation Detection. A fast method for spotting simple equations satisfied by collections of seemingly unrelated numbers.
1987: Fast Multipole Method. A breakthrough in dealing with the complexity of n-body calculations, applied in problems ranging from celestial mechanics to protein folding.
From Random Samples, Science page 799, February 4, 2000.
![Page 22: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/22.jpg)
Algorithm Properties
• An algorithm possesses the following properties:
– It must be correct.
– It must be composed of a series of concrete steps.
– There can be no ambiguity as to which step will be performed next.
– It must be composed of a finite number of steps.
– It must terminate.
• A computer program is an instance, or concrete representation, of an algorithm in some programming language.
![Page 23: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/23.jpg)
Measuring Algorithm Efficiency
• Types of complexity
– Space complexity
– Time complexity
• Analysis of algorithms
– The measuring of the complexity of an algorithm
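• The actual running time cannot be computed in general, because it depends on the computer, the implementation and the input
– We usually measure worst-case time, expressed as a number of operations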
![Page 24: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/24.jpg)
Measuring Algorithm Efficiency
Three algorithms for computing
1 + 2 + … + n for an integer n > 0
![Page 25: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/25.jpg)
Measuring Algorithm Efficiency
The number of operations required by the algorithms
![Page 26: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/26.jpg)
Measuring Algorithm Efficiency
The number of operations required by the algorithms as a
function of n
![Page 27: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/27.jpg)
Big Oh Notation
• To say "Algorithm A has a worst-case time
requirement proportional to n"
– We say A is O(n)
– Read "Big Oh of n"
• For the other two algorithms
– Algorithm B is O(n²)
– Algorithm C is O(1)
• O is derived from order (magnitude)
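The three algorithms appear only as images on the slides. Assuming they are the usual textbook versions (a single loop, a nested loop that adds 1 repeatedly, and Gauss' closed formula), they can be sketched in Python as follows; the docstrings give the number of additions each one performs.

```python
def sum_a(n):
    """Algorithm A: one loop, n additions -> O(n)."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def sum_b(n):
    """Algorithm B: nested loop adding 1 at a time, n(n+1)/2 additions -> O(n^2)."""
    total = 0
    for i in range(1, n + 1):
        for _ in range(i):
            total += 1
    return total


def sum_c(n):
    """Algorithm C: closed formula n(n+1)/2, a fixed number of operations -> O(1)."""
    return n * (n + 1) // 2


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, sum_a(n), sum_b(n), sum_c(n))
```

All three return the same value; only the amount of work grows differently with n.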
![Page 28: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/28.jpg)
Picturing Efficiency
O(n) algorithm
![Page 29: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/29.jpg)
Picturing Efficiency
An O(n²) algorithm.
![Page 30: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/30.jpg)
Picturing Efficiency
Another O(n²) algorithm.
![Page 31: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/31.jpg)
![Page 32: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/32.jpg)
The best alignment:
The one with the maximum total
score
![Page 33: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/33.jpg)
• Exhaustive …
– All combinations:
• Algorithm
– Dynamic programming (much faster)
– Needleman-Wunsch for global alignments
(Journal of Molecular Biology, 1970)
– Later adapted by Smith-Waterman for local alignment
• Heuristics
Overview
![Page 34: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/34.jpg)
![Page 35: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/35.jpg)
• Score of an alignment: reward matches and penalize mismatches and spaces.
– e.g., each column gets a (different) value for:
• a match: +1 (both have the same character);
• a mismatch: -1 (both have different characters); and
• a space in a column: -2.
– The total score of an alignment is the sum of the values assigned to its columns.
![Page 36: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/36.jpg)
A metric …
GACGGATTAG, GATCGGAATAG
GA-CGGATTAG
GATCGGAATAG
+1 (a match), -1 (a mismatch), -2 (a gap)
9*1 + 1*(-1) + 1*(-2) = 6
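The column-by-column sum is easy to check in code. This Python sketch scores an existing alignment with the scheme above (+1, -1, -2); the function name `alignment_score` is illustrative only.

```python
def alignment_score(top, bottom, match=1, mismatch=-1, gap=-2):
    """Sum of column scores for two already-aligned sequences ('-' = space)."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            score += gap          # a space in the column
        elif x == y:
            score += match
        else:
            score += mismatch
    return score


print(alignment_score("GA-CGGATTAG", "GATCGGAATAG"))  # prints 6
```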
![Page 37: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/37.jpg)
Dynamic programming
Reduce the problem:
the solution to a large problem is simpler to find if we first know the solution to a smaller problem that is a subset of the larger problem
Overview
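[Diagram: problem P is broken into smaller subproblems P1, P2 and P3, whose solutions are combined into the solution of P]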
![Page 38: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/38.jpg)
Dynamic Programming
• Finds the optimal solution to a search problem
• Computes the solution recursively
• Fundamental principle: produce optimal solutions to smaller pieces of the problem first and then glue them together
• More efficient than plain divide-and-conquer because it works bottom-up and uses a look-up table instead of recomputing optimal solutions to sub-problems
![Page 39: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/39.jpg)
the best alignment between
• a zinc-finger core sequence:
– CKHVFCRVCI
• and a sequence fragment
from a viral polyprotein:
– CKKCFCKCV
![Page 40: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/40.jpg)
C K H V F C R V C I
+--------------------
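C | 1 . . . . 1 . . 1 .
K | . 1 . . . . . . . .
K | . 1 . . . . . . . .
C | 1 . . . . 1 . . 1 .
F | . . . . 1 . . . . .
C | 1 . . . . 1 . . 1 .
K | . 1 . . . . . . . .
C | 1 . . . . 1 . . 1 .
V | . . . 1 . . . 1 . .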
Dynamic Programming
![Page 41: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/41.jpg)
Dynamic Programming
![Page 42: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/42.jpg)
C K H V F C R V C I
+--------------------
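C | 1 . . . . 1 . . 1 0
K | . 1 . . . . . . . 0
K | . 1 . . . . . . . 0
C | 1 . . . . 1 . . 1 0
F | . . . . 1 . . . . 0
C | 1 . . . . 1 . . 1 0
K | . 1 . . . . . . . 0
C | 1 . . . . 1 . . 1 0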
V | 0 0 0 1 0 0 0 1 0 0
Dynamic Programming
![Page 43: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/43.jpg)
C K H V F C R V C I
+--------------------
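C | 1 . . . . 1 . . 1 0
K | . 1 . . . . . . . 0
K | . 1 . . . . . . . 0
C | 1 . . . . 1 . . 1 0
F | . . . . 1 . . . . 0
C | 1 . . . . 1 . . 1 0
K | . 1 . . . . . . . 0
C | 2 . . . . 1 . . 1 0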
V | 0 0 0 1 0 0 0 1 0 0
Dynamic Programming
![Page 44: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/44.jpg)
C K H V F C R V C I
+--------------------
C | 1 . . . . 1 . . 1 0
K | . 1 . . . . . . 0 0
K | . 1 . . . . . . 0 0
C | 1 . . . . 1 . . 1 0
F | . . . . 1 . . . 0 0
C | 1 . . . . 1 . . 1 0
K | . 1 . . . . . . 0 0
C | 2 1 1 1 1 2 1 0 1 0
V | 0 0 0 1 0 0 0 1 0 0
Dynamic Programming
![Page 45: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/45.jpg)
C K H V F C R V C I
+--------------------
C | 1 . . . . 1 . 1 1 0
K | . 1 . . . . . 1 0 0
K | . 1 . . . . . 1 0 0
C | 1 . . . . 1 . 1 1 0
F | . . . . 1 . . 1 0 0
C | 1 . . . . 1 . 1 1 0
K | 2 3 2 2 2 1 1 1 0 0
C | 2 1 1 1 1 2 1 0 1 0
V | 0 0 0 1 0 0 0 1 0 0
Dynamic Programming
![Page 46: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/46.jpg)
C K H V F C R V C I
+--------------------
C | 1 . . . . 1 1 1 1 0
K | . 1 . . . . 1 1 0 0
K | . 1 . . . . 1 1 0 0
C | 1 . . . . 1 1 1 1 0
F | . . . . 1 . 1 1 0 0
C | 4 2 2 2 2 2 1 1 1 0
K | 2 3 2 2 2 1 1 1 0 0
C | 2 1 1 1 1 2 1 0 1 0
V | 0 0 0 1 0 0 0 1 0 0
Dynamic Programming
![Page 47: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/47.jpg)
C K H V F C R V C I
+--------------------
C | 1 . . . . 2 1 1 1 0
K | . 1 . . . 1 1 1 0 0
K | . 1 . . . 1 1 1 0 0
C | 1 . . . . 2 1 1 1 0
F | 2 2 2 2 3 1 1 1 0 0
C | 4 2 2 2 2 2 1 1 1 0
K | 2 3 2 2 2 1 1 1 0 0
C | 2 1 1 1 1 2 1 0 1 0
V | 0 0 0 1 0 0 0 1 0 0
Dynamic Programming
![Page 48: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/48.jpg)
C K H V F C R V C I
+--------------------
C | 1 . . . 2 2 1 1 1 0
K | . 1 . . 2 1 1 1 0 0
K | . 1 . . 2 1 1 1 0 0
C | 4 3 3 3 2 2 1 1 1 0
F | 2 2 2 2 3 1 1 1 0 0
C | 4 2 2 2 2 2 1 1 1 0
K | 2 3 2 2 2 1 1 1 0 0
C | 2 1 1 1 1 2 1 0 1 0
V | 0 0 0 1 0 0 0 1 0 0
Dynamic Programming
![Page 49: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/49.jpg)
C K H V F C R V C I
+--------------------
C | 1 . . 3 2 2 1 1 1 0
K | . 1 . 3 2 1 1 1 0 0
K | 3 4 3 3 2 1 1 1 0 0
C | 4 3 3 3 2 2 1 1 1 0
F | 2 2 2 2 3 1 1 1 0 0
C | 4 2 2 2 2 2 1 1 1 0
K | 2 3 2 2 2 1 1 1 0 0
C | 2 1 1 1 1 2 1 0 1 0
V | 0 0 0 1 0 0 0 1 0 0
Dynamic Programming
![Page 50: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/50.jpg)
C K H V F C R V C I
+--------------------
C | 1 . 3 3 2 2 1 1 1 0
K | 4 4 3 3 2 1 1 1 0 0
K | 3 4 3 3 2 1 1 1 0 0
C | 4 3 3 3 2 2 1 1 1 0
F | 2 2 2 2 3 1 1 1 0 0
C | 4 2 2 2 2 2 1 1 1 0
K | 2 3 2 2 2 1 1 1 0 0
C | 2 1 1 1 1 2 1 0 1 0
V | 0 0 0 1 0 0 0 1 0 0
Dynamic Programming
![Page 51: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/51.jpg)
C K H V F C R V C I
+--------------------
C | 5 3 3 3 2 2 1 1 1 0
K | 4 4 3 3 2 1 1 1 0 0
K | 3 4 3 3 2 1 1 1 0 0
C | 4 3 3 3 2 2 1 1 1 0
F | 2 2 2 2 3 1 1 1 0 0
C | 4 2 2 2 2 2 1 1 1 0
K | 2 3 2 2 2 1 1 1 0 0
C | 2 1 1 1 1 2 1 0 1 0
V | 0 0 0 1 0 0 0 1 0 0
Dynamic Programming
![Page 52: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/52.jpg)
C K H V F C R V C I
+--------------------
C | 5 3 3 3 2 2 1 1 1 0
K | 4 4 3 3 2 1 1 1 0 0
K | 3 4 3 3 2 1 1 1 0 0
C | 4 3 3 3 2 2 1 1 1 0
F | 3 2 2 2 3 1 1 1 0 0
C | 4 2 2 2 2 2 1 1 1 0
K | 2 3 2 2 2 1 1 1 0 0
C | 2 1 1 1 1 2 1 0 1 0
V | 0 0 0 1 0 0 0 1 0 0
Dynamic Programming
![Page 53: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/53.jpg)
![Page 54: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/54.jpg)
![Page 55: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/55.jpg)
![Page 56: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/56.jpg)
![Page 57: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/57.jpg)
![Page 58: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/58.jpg)
![Page 59: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/59.jpg)
![Page 60: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/60.jpg)
![Page 61: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/61.jpg)
C K H V F C R V C I
C K K C F C - K C V
C K H V F C R V C I
C K K C F C K - C V
C - K H V F C R V C I
C K K C - F C - K C V
C K H - V F C R V C I
C K K C - F C - K C V
Dynamic Programming
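The numbers in these frames follow one rule: working from the bottom-right corner, each cell gets 1 if its two residues are identical (0 otherwise) plus the largest value found anywhere below and to the right of it; no gap penalty is used. The top-left cell (5) is then the maximum number of matches, and each alignment listed above contains 5 matched pairs. The Python sketch below rebuilds the completed matrix under that reading of the slides (one frame shows 2 in the first cell of the F row; the rule gives 3, as in the following frames). The helper table `best` is an implementation choice that avoids rescanning the lower-right block for every cell.

```python
def summation_matrix(seq1, seq2, match=1, mismatch=0):
    """Fill the matrix from the bottom-right corner (rows: seq2, columns: seq1).

    Each cell = its own identity score + the largest value found anywhere
    strictly below and to the right of it. No gap penalty is applied.
    """
    rows, cols = len(seq2), len(seq1)
    M = [[0] * cols for _ in range(rows)]
    # best[i][j] = largest M value in the block of rows >= i and columns >= j;
    # the extra zero row and column stand for "nothing below/right".
    best = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows - 1, -1, -1):
        for j in range(cols - 1, -1, -1):
            s = match if seq2[i] == seq1[j] else mismatch
            M[i][j] = s + best[i + 1][j + 1]
            best[i][j] = max(M[i][j], best[i + 1][j], best[i][j + 1])
    return M


if __name__ == "__main__":
    seq1, seq2 = "CKHVFCRVCI", "CKKCFCKCV"
    print("    " + " ".join(seq1))
    for residue, row in zip(seq2, summation_matrix(seq1, seq2)):
        print(residue, "|", " ".join(str(v) for v in row))
```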
![Page 62: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/62.jpg)
![Page 63: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/63.jpg)
Needleman-Wunsch-Simple.py
![Page 64: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/64.jpg)
Needleman-Wunsch-Simple.py
The Score Matrix
----------------
Seq1 (j)           1   2   3   4   5   6   7
Seq2 (i)       *   C   K   H   V   F   C   R
           *   0  -1  -2  -3  -4  -5  -6  -7
       1   C  -1   1   0  -1  -2  -3  -4  -5
       2   K  -2   0   2   1   0  -1  -2  -3
       3   K  -3  -1   1   1   0  -1  -2  -3
       4   C  -4  -2   0   0   0  -1   0  -1
       5   F  -5  -3  -1  -1  -1   1   0  -1
       6   C  -6  -4  -2  -2  -2   0   2   1
       7   K  -7  -5  -3  -3  -3  -1   1   1
       8   C  -8  -6  -4  -4  -4  -2   0   0
       9   V  -9  -7  -5  -5  -3  -3  -1  -1
(MATCH = +1, MISMATCH = -1, GAP = -1; only the first seven columns of Seq1 are shown)
![Page 65: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/65.jpg)
Needleman-Wunsch-Simple.py
![Page 66: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/66.jpg)
Each cell takes the maximum of three candidate scores:
A: diagonal_score = matrix(i-1,j-1) + MATCH if seq1[j-1] == seq2[i-1], otherwise matrix(i-1,j-1) + MISMATCH
B: up_score = matrix(i-1,j) + GAP
C: left_score = matrix(i,j-1) + GAP
matrix(i,j) = max(A, B, C)
Needleman-Wunsch-Simple.py
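The recurrence can be written out in full. The Python sketch below is not the course's Needleman-Wunsch-Simple.py (its source appears only as images); it assumes match = +1, mismatch = -1 and a linear gap penalty of -1, the values consistent with the score matrix shown. Rows follow Seq2 and columns follow Seq1, as in that matrix. Ties in the traceback are broken in the order diagonal, up, left, so the alignment returned may differ from the one on the slide while having the same score.

```python
def needleman_wunsch(seq1, seq2, match=1, mismatch=-1, gap=-1):
    """Global alignment with a linear gap penalty; returns (aln1, aln2, score)."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    matrix = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        matrix[i][0] = i * gap            # leading gaps in seq1
    for j in range(1, cols):
        matrix[0][j] = j * gap            # leading gaps in seq2

    def pair(i, j):
        return match if seq1[j - 1] == seq2[i - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            matrix[i][j] = max(matrix[i - 1][j - 1] + pair(i, j),  # A: diagonal
                               matrix[i - 1][j] + gap,             # B: up
                               matrix[i][j - 1] + gap)             # C: left

    # Traceback from the bottom-right cell: diagonal first, then up, then left.
    top, bottom = [], []
    i, j = rows - 1, cols - 1
    while i > 0 or j > 0:
        if i > 0 and j > 0 and matrix[i][j] == matrix[i - 1][j - 1] + pair(i, j):
            top.append(seq1[j - 1])
            bottom.append(seq2[i - 1])
            i, j = i - 1, j - 1
        elif i > 0 and matrix[i][j] == matrix[i - 1][j] + gap:
            top.append("-")               # gap in seq1
            bottom.append(seq2[i - 1])
            i -= 1
        else:
            top.append(seq1[j - 1])
            bottom.append("-")            # gap in seq2
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), matrix[-1][-1]


if __name__ == "__main__":
    a, b, score = needleman_wunsch("CKHVFCRVCI", "CKKCFCKCV")
    print("Seq1:" + a)
    print("Seq2:" + b)
    print("score =", score)
```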
![Page 67: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/67.jpg)
Needleman-Wunsch-Simple.py
![Page 68: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/68.jpg)
Needleman-Wunsch-Simple.py
![Page 69: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/69.jpg)
Seq1:CKHVFCRVCI
Seq2:CKKCFC-KCV
++--++--+- score = 0
Needleman-Wunsch-Simple.py
![Page 70: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/70.jpg)
![Page 71: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/71.jpg)
Extensions to the basic dynamic programming method:
• use gap penalties
– a constant gap penalty, regardless of gap length
– a gap penalty that grows with gap size
• one penalty for starting a gap (gap opening penalty)
• a different (lower) penalty for adding to a gap (gap extension penalty)
• use BLOSUM62 instead of MATCH and MISMATCH
Dynamic Programming: Needleman-Wunsch-Complete.py
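With separate opening and extension penalties the simple three-way recurrence is no longer enough, because the cost of a gap position depends on whether the previous column was already a gap. A standard way to handle this is Gotoh's three-matrix formulation. The Python sketch below computes only the score; the penalty values (open -3, extend -1) are hypothetical, and a plain match/mismatch score stands in for BLOSUM62 to keep it self-contained. Here `gap_open` is the cost of the first position of a gap and `gap_extend` the cost of each further position.

```python
NEG_INF = float("-inf")


def gotoh_score(a, b, match=1, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Global alignment score with affine gap penalties (Gotoh's recurrences).

    M[i][j]: best score ending with a[i-1] aligned to b[j-1]
    X[i][j]: best score ending with a[i-1] aligned to a gap
    Y[i][j]: best score ending with b[j-1] aligned to a gap
    """
    n, m = len(a), len(b)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] + gap_open,    # open a new gap
                          X[i - 1][j] + gap_extend,  # extend the same gap
                          Y[i - 1][j] + gap_open)
            Y[i][j] = max(M[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open)
    return max(M[n][m], X[n][m], Y[n][m])


if __name__ == "__main__":
    print(gotoh_score("AAAA", "AA"))  # prints -2: 2 matches, one 2-residue gap (-3 - 1)
```

Setting `gap_open` equal to `gap_extend` turns this back into the linear-gap scheme of the simple version.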
![Page 72: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/72.jpg)
Needleman-Wunsch-Complete.py
![Page 73: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/73.jpg)
Needleman-Wunsch-Complete.py
![Page 74: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/74.jpg)
Needleman-Wunsch-Complete.py
![Page 75: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/75.jpg)
Needleman-Wunsch-Complete.py
![Page 76: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/76.jpg)
Needleman-Wunsch-Complete.py
![Page 77: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/77.jpg)
Needleman-Wunsch-Complete.py
![Page 78: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/78.jpg)
Uses of Needleman-Wunsch-Complete.py
• Time Complexity
• Use random proteins to generate
histogram of scores from aligned
random sequences
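The random-sequence idea can be sketched as follows. This is not the course script: it assumes uniform amino acid composition (real proteins are not uniform), simple +1/-1/-1 scores instead of BLOSUM62, and hypothetical defaults of 200 pairs of length 50. The resulting histogram shows what scores unrelated sequences reach by chance, so a real alignment score can be judged against it.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 residues, drawn with equal probability


def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global score (linear gap), keeping only two rows."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


def random_protein(length, rng):
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def random_score_histogram(length=50, trials=200, seed=1):
    """Counter mapping score -> number of random sequence pairs with that score."""
    rng = random.Random(seed)  # fixed seed: reproducible histogram
    scores = [nw_score(random_protein(length, rng), random_protein(length, rng))
              for _ in range(trials)]
    return Counter(scores)


if __name__ == "__main__":
    histogram = random_score_histogram()
    for score in sorted(histogram):
        print(f"{score:5d} {'#' * histogram[score]}")
```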
![Page 79: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/79.jpg)
Time complexity with Needleman-Wunsch-Complete.py
Sequence length (aa)    Execution time (h:mm:ss)
10        0:00:00.001500
25        0:00:00.005340
50        0:00:00.020112
100       0:00:00.081580
500       0:00:01.960721
1000      0:00:07.720884
10000     0:11:36.344549
100000    Memory could not be written
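The measured times grow roughly with the square of the sequence length, as expected for a method that fills an n × n matrix. The short Python sketch below estimates the local exponent p in time ~ c * n**p between consecutive rows of the timing table (the values are copied from it). The same quadratic growth explains the last row: a full matrix for two 100 000-residue sequences has 10^10 cells, about 80 GB even at 8 bytes per cell.

```python
import math

# (length in aa, execution time as h:mm:ss.ffffff), copied from the timing table
TIMINGS = [
    (10, "0:00:00.001500"),
    (25, "0:00:00.005340"),
    (50, "0:00:00.020112"),
    (100, "0:00:00.081580"),
    (500, "0:00:01.960721"),
    (1000, "0:00:07.720884"),
    (10000, "0:11:36.344549"),
]


def to_seconds(stamp):
    """Convert an 'h:mm:ss.ffffff' string to seconds."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def growth_exponents(timings):
    """Exponent p with time ~ c * n**p between consecutive measurements."""
    result = []
    for (n1, t1), (n2, t2) in zip(timings, timings[1:]):
        p = math.log(to_seconds(t2) / to_seconds(t1)) / math.log(n2 / n1)
        result.append((n1, n2, round(p, 2)))
    return result


if __name__ == "__main__":
    for n1, n2, p in growth_exponents(TIMINGS):
        print(f"{n1:>6} -> {n2:>6}: p = {p}")
```

From 50 residues upward the exponent stays close to 2; for the shortest sequences fixed overheads probably dominate the measurement.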
![Page 80: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/80.jpg)
Simple version (Match/Mismatch) – no gap extension
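The actual script is only visible in the screenshots. As a hedged stand-in, here is a minimal Needleman-Wunsch with the simple scoring named in the slide title: match/mismatch scores and one fixed penalty per gap position, with no separate extension penalty. The parameter values are assumptions.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment with match/mismatch scoring and a linear gap penalty."""
    def sub(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned against nothing: i gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # (mis)match
                              score[i - 1][j] + gap,                           # gap in b
                              score[i][j - 1] + gap)                           # gap in a
    # Trace back from the bottom-right corner to the top-left corner
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))

if __name__ == "__main__":
    total, top, bottom = needleman_wunsch("GATTACA", "GCATGCU")
    print(total)
    print(top)
    print(bottom)
```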
![Page 81: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/81.jpg)
Complete version !
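What the "complete" script adds is only visible in the screenshots. One standard way to add the gap extension that the simple version lacks is Gotoh's three-matrix recurrence with an affine penalty. In this sketch, a gap of length L scores `gap_open + (L - 1) * gap_extend`. It returns the score only, and all parameter values are assumptions.

```python
NEG = float("-inf")

def affine_global_score(a, b, match=2, mismatch=-1, gap_open=-4, gap_extend=-1):
    """Global alignment score with affine gaps (Gotoh-style recurrence)."""
    n, m = len(a), len(b)
    M = [[NEG] * (m + 1) for _ in range(n + 1)]  # ends with a[i-1] aligned to b[j-1]
    X = [[NEG] * (m + 1) for _ in range(n + 1)]  # ends with a[i-1] aligned to a gap
    Y = [[NEG] * (m + 1) for _ in range(n + 1)]  # ends with b[j-1] aligned to a gap
    M[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                s = match if a[i - 1] == b[j - 1] else mismatch
                M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            if i > 0:
                # Opening a gap costs gap_open; continuing one costs gap_extend
                X[i][j] = max(M[i - 1][j] + gap_open, X[i - 1][j] + gap_extend,
                              Y[i - 1][j] + gap_open)
            if j > 0:
                Y[i][j] = max(M[i][j - 1] + gap_open, Y[i][j - 1] + gap_extend,
                              X[i][j - 1] + gap_open)
    return max(M[n][m], X[n][m], Y[n][m])

if __name__ == "__main__":
    print(affine_global_score("AAAA", "AA"))
```

With these values, one gap of length 2 (-4 - 1 = -5) is cheaper than two separate gaps (-8), so the recurrence prefers a single contiguous indel.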
![Page 82: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/82.jpg)
| | Homologous sequences | Non-homologous sequences |
|---|---|---|
| Sequences reported as related | True positives | False positives |
| Sequences reported as unrelated | False negatives | True negatives |

Sensitivity: ability to find true positives, TP / (TP + FN)
Specificity: ability to minimize false positives, TN / (TN + FP)
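Sensitivity and specificity depend on the score threshold above which a pair is reported as related. The sketch below counts the four cells of the table at a given threshold. The scores and labels are hypothetical, made up for illustration.

```python
def confusion_counts(scores, is_homologous, threshold):
    """Count (TP, FP, FN, TN) when pairs scoring >= threshold are reported as related."""
    tp = fp = fn = tn = 0
    for score, homologous in zip(scores, is_homologous):
        reported_related = score >= threshold
        if reported_related and homologous:
            tp += 1
        elif reported_related:
            fp += 1
        elif homologous:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def sensitivity(tp, fn):
    return tp / (tp + fn)

def specificity(tn, fp):
    return tn / (tn + fp)

# Hypothetical alignment scores: the first four pairs are homologous, the last four are not
SCORES = [95, 80, 60, 40, 55, 30, 20, 10]
LABELS = [True] * 4 + [False] * 4

if __name__ == "__main__":
    for threshold in (50, 70):
        tp, fp, fn, tn = confusion_counts(SCORES, LABELS, threshold)
        print(threshold, sensitivity(tp, fn), specificity(tn, fp))
```

Raising the threshold from 50 to 70 trades sensitivity (0.75 down to 0.5) for specificity (0.75 up to 1.0).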
![Page 83: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/83.jpg)
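If the sequences are similar, the path of the best alignment should be very close to the main diagonal.
Therefore, we may not need to fill the entire matrix; instead, we fill a narrow band of entries around the main diagonal.
A banded algorithm fills in only a band of width 2k+1 around the main diagonal (the cells with |i - j| <= k), which reduces the work from O(nm) to roughly O(kn). The result is guaranteed to be optimal only if the best path stays inside the band.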
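A minimal sketch of banded global alignment, assuming the same match/mismatch/linear-gap scoring as the simple Needleman-Wunsch version (parameter values are assumptions). For clarity it still allocates the full matrix but only computes cells with |i - j| <= k. A memory-saving version would store only the 2k+1 diagonals.

```python
def banded_nw_score(a, b, k, match=1, mismatch=-1, gap=-1):
    """Global alignment score restricted to cells with |i - j| <= k."""
    n, m = len(a), len(b)
    if abs(n - m) > k:
        raise ValueError("band too narrow: the end cell (n, m) lies outside it")
    NEG = float("-inf")  # cells outside the band stay unreachable
    score = [[NEG] * (m + 1) for _ in range(n + 1)]
    score[0][0] = 0
    for i in range(n + 1):
        for j in range(max(0, i - k), min(m, i + k) + 1):
            if i == 0 and j == 0:
                continue
            best = NEG
            if i > 0 and j > 0:
                best = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            if i > 0:
                best = max(best, score[i - 1][j] + gap)
            if j > 0:
                best = max(best, score[i][j - 1] + gap)
            score[i][j] = best
    return score[n][m]

if __name__ == "__main__":
    for k in (1, 2, 7):
        print(k, banded_nw_score("GATTACA", "GCATGCU", k))
```

A band at least as wide as the longer sequence reproduces the full Needleman-Wunsch score. A narrower band can only give the same or a lower score.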
![Page 84: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/84.jpg)
Local alignment
• The concept of ‘local alignment’ was introduced by Smith & Waterman in 1981
• A local alignment of 2 sequences is an alignment between parts of the 2 sequences
Two proteins may share only one stretch of high sequence similarity but be very dissimilar outside that region.
A global (N-W) alignment of such sequences would have:
(i) lots of matches in the region of high sequence similarity
(ii) lots of mismatches & gaps (insertions/deletions) outside the region of similarity
It makes sense to find the best local alignment instead.
![Page 85: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/85.jpg)
Smith-Waterman.py
• Three changes relative to Needleman-Wunsch
– The edges of the matrix are initialized to 0 instead of increasing gap penalties
– A cell's score is never allowed to fall below 0, and no pointer is recorded unless the score is greater than 0
– The trace-back starts from the highest score in the matrix (rather than at the end of the matrix) and ends at a score of 0 (rather than at the start of the matrix)
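The three changes can be seen in a minimal Smith-Waterman sketch. The scoring values are assumptions, and this is not the code of Smith-Waterman.py. Instead of storing pointers, it recomputes the move during the trace-back, which gives the same path.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment: returns (score, aligned part of a, aligned part of b)."""
    n, m = len(a), len(b)
    # Change 1: first row and column stay 0 instead of cumulative gap penalties
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Change 2: a cell never drops below 0 (a fresh local alignment starts)
            score[i][j] = max(0, score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Change 3: trace back from the highest-scoring cell until a 0 is reached
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + s:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))

if __name__ == "__main__":
    print(smith_waterman("TTTACGTAAA", "GGACGTGG"))
```

Only the shared stretch `ACGT` is reported. The dissimilar flanks, which would drag down a global alignment score, are simply left out.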
![Page 86: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/86.jpg)
Smith-Waterman.py
![Page 87: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/87.jpg)
Smith-Waterman.py
![Page 88: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/88.jpg)
![Page 89: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/89.jpg)
Sequence Alignments
Introduction
Algorithms
What ?
Examples
Properties
Dynamic Programming for Pairwise Alignment
Concept
Example
Needleman-Wunsch(.py)
Smith-Waterman(.py)
Multiple Alignment
MSA
Hierarchical Pairwise Alignment
ClustalW, PileUp
Formatting
Interpretation
Alternative Methods
SIM
Blast2
Dali
![Page 90: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/90.jpg)
The best alignment: the one with the maximum total score
Multiple alignment: n > 2 sequences
![Page 91: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/91.jpg)
From 2 to 3 sequences: the alignment matrix becomes a 3-D lattice (hyperlattice)
![Page 92: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/92.jpg)
On its top-left side, the cube is
"covered" by the polyhedron. The
edges 1, 2, 3, 6 and 7 are coming
from the inside, and edges 4 and 5
can be ignored (and are therefore
not labeled in the figure).
![Page 93: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/93.jpg)
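• Each node in the k-dimensional hyperlattice is visited once, and therefore the running time must be at least proportional to the number of nodes in the lattice.
– This number is the product of the lengths of the sequences, i.e. about n^k for k sequences of length n.
– Each node also examines up to 2^k - 1 predecessor nodes, so the running time grows roughly as 2^k · n^k.
– e.g. the 3-dimensional lattice as visualized, where each node has 7 incoming edges.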
Computational Complexity of MA by standard Dynamic Programming
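To make the hyperlattice concrete, here is a score-only sketch for 3 sequences that visits every node once and tries all 2^3 - 1 = 7 incoming edges. It assumes sum-of-pairs column scoring (match +1, mismatch -1, residue-gap -2, gap-gap 0) and a linear gap model. The slides do not state which scoring the figure uses.

```python
from itertools import product

def sp_column_score(column, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of one alignment column; gap-gap pairs score 0."""
    total = 0
    for p in range(len(column)):
        for q in range(p + 1, len(column)):
            x, y = column[p], column[q]
            if x == "-" and y == "-":
                continue
            if x == "-" or y == "-":
                total += gap
            else:
                total += match if x == y else mismatch
    return total

def align3_score(a, b, c):
    """Optimal sum-of-pairs score of three sequences by DP on the 3-D lattice."""
    seqs = (a, b, c)
    # Each move advances a subset of the sequences: 2**3 - 1 = 7 incoming edges
    moves = [move for move in product((0, 1), repeat=3) if any(move)]
    scores = {(0, 0, 0): 0}
    # product() walks the lattice in lexicographic order, so predecessors come first
    for node in product(range(len(a) + 1), range(len(b) + 1), range(len(c) + 1)):
        if node == (0, 0, 0):
            continue
        best = float("-inf")
        for move in moves:
            prev = tuple(x - d for x, d in zip(node, move))
            if min(prev) < 0:
                continue
            column = [s[x - 1] if d else "-" for s, x, d in zip(seqs, node, move)]
            best = max(best, scores[prev] + sp_column_score(column))
        scores[node] = best
    return scores[(len(a), len(b), len(c))]

if __name__ == "__main__":
    print(align3_score("ACG", "AG", "ACG"))
```

The `scores` dictionary holds one entry per lattice node, (len(a)+1)·(len(b)+1)·(len(c)+1) in total, which is why adding sequences quickly becomes infeasible.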
![Page 94: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/94.jpg)
• The memory requirement is even worse. To trace back the alignment, we need to store the whole lattice, a data structure the size of a multidimensional skyscraper.
– In fact, space is the No. 1 problem here, bogging down multiple alignment methods that try to achieve optimality.
– Furthermore, incorporating a realistic gap model increases the demands on space and running time even further.
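A back-of-the-envelope estimate shows why space dominates. The lattice has one node per combination of prefix lengths. The sketch assumes 4 bytes per stored cell (one 32-bit score) and ignores traceback pointers and the extra matrices a realistic gap model needs. The example sizes are illustrative. The 8 x 2000 case mirrors the "2000^8" limit quoted for MSA.

```python
from math import prod

def lattice_cells(lengths):
    """Number of nodes in the DP hyperlattice: product of (length + 1)."""
    return prod(n + 1 for n in lengths)

def lattice_bytes(lengths, bytes_per_cell=4):
    # bytes_per_cell = 4 (one 32-bit score per node) is an assumption
    return lattice_cells(lengths) * bytes_per_cell

if __name__ == "__main__":
    print(f"3 sequences x 300 aa: {lattice_bytes([300] * 3) / 1e6:.0f} MB")
    print(f"8 sequences x 2000 aa: {lattice_bytes([2000] * 8):.2e} bytes")
```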
![Page 95: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/95.jpg)
Size/Time limits…
![Page 96: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/96.jpg)
• The most practical and widely used method in multiple sequence alignment is the hierarchical extension of pairwise alignment methods.
• The principle is that a multiple alignment is achieved by successive application of pairwise methods.
– First do all pairwise alignments (not just one sequence with all others)
– Then combine the pairwise alignments to generate the overall alignment
Multiple Alignment Method
![Page 97: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/97.jpg)
• The steps are summarized as follows:
– Compare all sequences pairwise.
– Perform cluster analysis on the pairwise data to generate a hierarchy for alignment. This may be in the form of a binary tree or a simple ordering.
– Build the multiple alignment by first aligning the most similar pair of sequences, then the next most similar pair, and so on. Once two sequences have been aligned, that alignment is fixed. Thus, for a set of sequences A, B, C, D, having aligned A with C and B with D, the alignment of A, B, C, D is obtained by comparing the alignment of A and C with that of B and D, using averaged scores at each aligned position.
Multiple Alignment Method
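The cluster-analysis step can be sketched with UPGMA, one of the simplest hierarchical clustering methods. This is a swapped-in technique: ClustalW itself builds its guide tree with neighbour-joining. The distances below are hypothetical (for example 1 - fraction identity from the pairwise alignments), chosen so that A pairs with C and B with D, as in the example above. The profile-alignment step that follows the tree is not shown.

```python
def upgma(names, dist):
    """Build a UPGMA guide tree.

    names: sequence names; dist: square list-of-lists of pairwise distances.
    Returns (tree, merges): the tree as nested tuples and the joins in order.
    """
    clusters = [(name, 1) for name in names]  # (subtree, number of leaves)
    d = [row[:] for row in dist]
    merges = []
    while len(clusters) > 1:
        # Closest pair of current clusters (ties go to the first pair found)
        i, j = min(((p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))),
                   key=lambda pq: d[pq[0]][pq[1]])
        (ti, ni), (tj, nj) = clusters[i], clusters[j]
        merges.append((ti, tj))
        # UPGMA: distance to the merged cluster is the leaf-count-weighted average
        new_row = [(d[i][k] * ni + d[j][k] * nj) / (ni + nj) for k in range(len(clusters))]
        keep = [k for k in range(len(clusters)) if k not in (i, j)]
        d = ([[d[r][c] for c in keep] + [new_row[r]] for r in keep]
             + [[new_row[c] for c in keep] + [0.0]])
        clusters = [clusters[k] for k in keep] + [((ti, tj), ni + nj)]
    return clusters[0][0], merges

# Hypothetical distances, made up for illustration
NAMES = ["A", "B", "C", "D"]
DIST = [[0.0, 0.6, 0.1, 0.7],
        [0.6, 0.0, 0.65, 0.2],
        [0.1, 0.65, 0.0, 0.75],
        [0.7, 0.2, 0.75, 0.0]]

if __name__ == "__main__":
    tree, merges = upgma(NAMES, DIST)
    print(tree)
    for step, (left, right) in enumerate(merges, start=1):
        print(step, left, "+", right)
```

The merge order is the order in which alignments are built: A with C, then B with D, and finally the two sub-alignments with each other.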
![Page 98: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/98.jpg)
Multiple Alignment Method
![Page 99: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/99.jpg)
Multiple Alignment Method
![Page 100: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/100.jpg)
• Automatic multiple alignment
– extend dynamic programming (MSA - Lipman)
• limit: computing power: length and number of sequences (e.g. 2000^8)
– progressive alignment (Feng & Doolittle)
• use a “guide tree” (PileUp, ClustalW etc.)
• Dedicated alignment editing programs
– Boxshade
– SeaView
– SeqPup (Java)
• Combination (Biology – Computation)
Multiple Sequence Alignment programs
![Page 101: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/101.jpg)
• ClustalW is a general-purpose multiple alignment program for DNA or proteins.
• ClustalW was produced by Julie D. Thompson and Toby Gibson of the European Molecular Biology Laboratory, Germany, and Desmond Higgins of the European Bioinformatics Institute, Cambridge, UK.
• Algorithmic contribution: it improves the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice (Thompson, Higgins & Gibson, Nucleic Acids Research, 22:4673-4680, 1994).
ClustalW
![Page 102: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/102.jpg)
****** MULTIPLE ALIGNMENT MENU ******
1. Do complete multiple alignment now (Slow/Accurate)
2. Produce guide tree file only
3. Do alignment using old guide tree file
4. Toggle Slow/Fast pairwise alignments = SLOW
5. Pairwise alignment parameters
6. Multiple alignment parameters
7. Reset gaps between alignments? = OFF
8. Toggle screen display = ON
9. Output format options
S. Execute a system command
H. HELP
or press [RETURN] to go back to main menu
Your choice:
Running ClustalW
![Page 103: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/103.jpg)
• The final product of a PILEUP run is a set of aligned
sequences, which are stored in a Multiple Sequence File (called .msf by GCG).
This msf file is a text file that can be formatted with
a text editor, but GCG has some dedicated tools for
improving the looks of msf files for easier
interpretation and for publication.
• Consensus sequences can be calculated and the
relationship of each character of each sequence to
the consensus can be highlighted using the
program PRETTY
Formatting Multiple Alignments
![Page 104: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/104.jpg)
• Shading of regions of high homology can be created using
the programs BOXSHADE and PRETTYBOX , but that
goes beyond the scope of this tutorial. (Boxshade:
http://www.ch.embnet.org/software/BOX_form.html)
• In addition to these programs that run on the Alpha, the
output of PILEUP (or CLUSTAL) can be moved by FTP
from your RCR account to a local Mac or PC.
• Since this output is a plain text file, it can be edited with
any word processing program, or imported into any
drawing program to add boldface text, underlining,
shading, boxes, arrows, etc
Formatting Multiple Alignments
![Page 105: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/105.jpg)
http://dot.imgen.bcm.tmc.edu:9331/multi-align/multi-align.html
![Page 106: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/106.jpg)
VTISCTGSSSNIGAG-NHVKWYQQLPG
VTISCTGTSSNIGS--ITVNWYQQLPG
LRLSCSSSGFIFSS--YAMYWVRQAPG
LSLTCTVSGTSFDD--YYSTWVRQPPG
PEVTCVVVDVSHEDPQVKFNWYVDG--
ATLVCLISDFYPGA--VTVAWKADS--
AALGCLVKDYFPEP--VTVSWNSG---
VSLTCLVKGFYPSD--IAVEWWSNG--
An example of Multiple Alignment … immunoglobulin
![Page 107: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/107.jpg)
• The alignment highlights conserved residues (notably one of the cysteines forming the disulphide bridges, and the tryptophan),
• conserved regions (in particular, "Q.PG" at the end of the first 4 sequences), and more sophisticated patterns, like the dominance of hydrophobic residues at fragment positions 1 and 3.
• The alternating hydrophobicity pattern is typical of the surface beta-strand at the beginning of each fragment. Indeed, multiple alignments are helpful for protein structure prediction.
An example of Multiple Alignment … immunoglobulin
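The observations above can be checked directly on the eight immunoglobulin fragments. This sketch reports fully conserved, gap-free columns and the fraction of hydrophobic residues in a given column. The hydrophobic set `AVILMFWC` is an assumption; other hydrophobicity scales would classify some residues differently.

```python
IG_FRAGMENTS = [
    "VTISCTGSSSNIGAG-NHVKWYQQLPG",
    "VTISCTGTSSNIGS--ITVNWYQQLPG",
    "LRLSCSSSGFIFSS--YAMYWVRQAPG",
    "LSLTCTVSGTSFDD--YYSTWVRQPPG",
    "PEVTCVVVDVSHEDPQVKFNWYVDG--",
    "ATLVCLISDFYPGA--VTVAWKADS--",
    "AALGCLVKDYFPEP--VTVSWNSG---",
    "VSLTCLVKGFYPSD--IAVEWWSNG--",
]
HYDROPHOBIC = set("AVILMFWC")  # assumed hydrophobic set

def conserved_columns(alignment):
    """(1-based position, residue) for columns where every sequence has the same residue."""
    return [(pos, col[0]) for pos, col in enumerate(zip(*alignment), start=1)
            if col[0] != "-" and all(aa == col[0] for aa in col)]

def hydrophobic_fraction(alignment, pos):
    """Fraction of sequences with a hydrophobic residue at 1-based position pos."""
    column = [seq[pos - 1] for seq in alignment]
    return sum(aa in HYDROPHOBIC for aa in column) / len(column)

if __name__ == "__main__":
    print(conserved_columns(IG_FRAGMENTS))
    print(hydrophobic_fraction(IG_FRAGMENTS, 1), hydrophobic_fraction(IG_FRAGMENTS, 3))
```

The only fully conserved columns are the cysteine at position 5 and the tryptophan at position 21. Positions 1 and 3 are hydrophobic in 7 of 8 and 8 of 8 fragments respectively under this residue set.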
![Page 108: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/108.jpg)
• Provided the alignment is accurate, the following may be inferred about the secondary structure from a multiple sequence alignment:
The positions of insertions and deletions (INDELS) suggest regions where surface loops exist.
Conserved glycine or proline suggests a beta-turn.
A Practical Approach: Interpretation
![Page 109: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/109.jpg)
• Residues with hydrophobic properties conserved at i, i+2, i+4, separated by unconserved or hydrophilic residues, suggest surface beta-strands.
A short run of hydrophobic amino acids (4 residues) suggests a buried beta-strand.
Pairs of conserved hydrophobic amino acids separated by pairs of unconserved or hydrophilic residues suggest an alpha-helix with one face packing in the protein core. Likewise, an i, i+3, i+4, i+7 pattern of conserved hydrophobic residues.
A Practical Approach: Interpretation
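The periodicity rules above can be encoded as simple patterns. This is a toy heuristic that scans one sequence, or a consensus in which each column has already been reduced to one residue. The hydrophobic set and the pattern notation (`h` hydrophobic, `p` not hydrophobic, `.` anything) are illustrative assumptions. In a single sequence, "not hydrophobic" has to stand in for "unconserved or hydrophilic".

```python
HYDROPHOBIC = set("AVILMFWC")  # assumed hydrophobic set; other scales exist

# Illustrative notation: 'h' hydrophobic, 'p' not hydrophobic, '.' anything
PATTERNS = {
    "surface beta-strand (i, i+2, i+4)": "hphph",
    "buried beta-strand (run of 4)": "hhhh",
    "amphipathic helix (pairs)": "hhpphh",
    "amphipathic helix (i, i+3, i+4, i+7)": "h..hh..h",
}

def find_motif(segment, motif):
    """0-based start positions where every pattern code matches the segment."""
    def matches(aa, code):
        if code == "h":
            return aa in HYDROPHOBIC
        if code == "p":
            return aa not in HYDROPHOBIC
        return True  # '.' matches anything
    return [i for i in range(len(segment) - len(motif) + 1)
            if all(matches(aa, code) for aa, code in zip(segment[i:], motif))]

if __name__ == "__main__":
    for name, motif in PATTERNS.items():
        print(name, find_motif("VTVSVLSELLKKL", motif))
```

A hit is only a hint. As the slide stresses, these inferences rely on an accurate alignment and on conservation across many sequences.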
![Page 110: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/110.jpg)
• Take out noise (GAPS)
• Extra information (structure - function)
• Recursive selection
– first align the most similar sequences to get an idea of the conserved regions
– then manually scan more distant members for these regions and include them
A Practical Approach: Which sequences to use ?
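One possible reading of "Take out noise (GAPS)" is to drop alignment columns dominated by gaps before further analysis. A minimal sketch of that reading follows. The 50% threshold and the function name are assumptions.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.5):
    """Keep only columns whose fraction of gap characters is <= max_gap_fraction."""
    keep = [i for i, col in enumerate(zip(*alignment))
            if col.count("-") / len(col) <= max_gap_fraction]
    return ["".join(seq[i] for i in keep) for seq in alignment]

if __name__ == "__main__":
    for row in drop_gappy_columns(["AC-GT", "A--GT", "ACTGT"]):
        print(row)
```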
![Page 111: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/111.jpg)
Sequence Alignments
Introduction
Algorithms
What ?
Examples
Properties
Dynamic Programming for Pairwise Alignment
Concept
Example
Needleman-Wunsch(.py)
Smith-Waterman(.py)
Multiple Alignment
MSA
Hierarchical Pairwise Alignment
ClustalW, PileUp
Formatting
Interpretation
Alternative Methods
SIM
Blast2
Dali
![Page 112: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/112.jpg)
L-align (2 sequences)
SIM (www.expasy.ch)
LALNVIEW is available for UNIX, Mac
and PC on the ExPASy anonymous
FTP server.
very nice TWEAKING tool (70% criteria)
![Page 113: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/113.jpg)
SIM (screenshot annotations: Length, P-value)
![Page 114: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/114.jpg)
SIM
![Page 115: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/115.jpg)
SIM
![Page 116: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/116.jpg)
How can I use NCBI to compare two sequences?
Answer: use the “BLAST 2 Sequences” program
![Page 117: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/117.jpg)
• Go to http://www.ncbi.nlm.nih.gov/BLAST
• Choose BLAST 2 sequences
• In the program,
[1] choose blastp (protein search) or blastn (for DNA)
[2] paste in your accession numbers
(or use FASTA format)
[3] select optional parameters, such as
--BLOSUM62 matrix is the default for proteins
try PAM250 for distantly related proteins
--gap creation and extension penalties
[4] click “align”
Practical guide to pairwise alignment:
the “BLAST 2 sequences” website
![Page 118: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/118.jpg)
![Page 119: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/119.jpg)
![Page 120: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/120.jpg)
Question #2: How can I use NCBI to compare a sequence to an entire database?
BLAST!
![Page 121: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/121.jpg)
![Page 122: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/122.jpg)
![Page 123: 2015 bioinformatics alignments_wim_vancriekinge](https://reader034.vdocument.in/reader034/viewer/2022050614/58757c4a1a28ab78498b62df/html5/thumbnails/123.jpg)
Weblems
W4.1: Align the amino acid sequences of the acetylcholine receptor from human, rat, mouse and dog with
ClustalW
T-Coffee
Dali
MSA
W4.2: Use BoxShade to create a Word file indicating the different conserved residues in colours
W4.3: Perform a local alignment using SIM and Lalign on the same sequences, and compare with Blast2
W4.4: Do the different methods give different results? What are the default settings they use?
W4.5: How would you identify critical residues for catalytic activity ?