b.sc biochem i bobi u 3.1 sequence alignment
![Page 1: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/1.jpg)
Course: B.Sc Biochemistry
Subject: Basic of Bioinformatics
Unit: III
![Page 2: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/2.jpg)
OUTLINE
– Sequence Alignment
– Scoring Alignments and Substitution Matrices
– Inserting Gaps
– Dynamic Programming
– Database Searches
![Page 3: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/3.jpg)
Sequence Alignment
Comparing sequences for
– Similarity
– Homology
Prediction of function of genes and proteins
Construction of phylogeny
Finding motifs
![Page 4: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/4.jpg)
Sequence Alignment - HOMOLOGY
Orthologues: a pair of genes whose most recent common ancestor node is a speciation event. Orthologues often have similar functions.
Paralogues: a pair of genes whose most recent common ancestor node is a duplication event. Paralogues tend to have different functions.
![Page 5: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/5.jpg)
Sequence Alignment - HOMOLOGY
1.
![Page 6: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/6.jpg)
Sequence Alignment - HOMOLOGY
2.
![Page 7: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/7.jpg)
Sequence Alignment - PHYLOGENY
3.
![Page 8: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/8.jpg)
Sequence Alignment – PROTEIN FUNCTIONS
4.
![Page 9: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/9.jpg)
Scoring Alignments and Substitution Matrices
The quality of an alignment is measured by giving it a quantitative score
The simplest way of quantifying similarity between two sequences is percentage identity.
– Measured by counting the number of identical bases or amino acids matched between the aligned sequences.
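Counting identical columns can be sketched in a few lines. This is a minimal illustration, not a standard tool: the function name `percent_identity`, the `-` gap character, and the choice of the alignment length as the denominator are all assumptions (other conventions divide by the shorter sequence length or by the number of ungapped columns).

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of alignment columns in which both rows carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Columns in which both rows are gaps carry no information and are skipped.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        return 0.0
    # A gap never counts as a match.
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


# 10 identical columns out of 15 aligned columns.
print(round(percent_identity("THISISASEQUENCE", "THAT---SEQUENCE"), 1))
```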
![Page 10: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/10.jpg)
Scoring Alignments and Substitution Matrices
The dot-plot gives a visual assessment of similarity based on identity.
[“Understanding Bioinformatics”, M. Zvelebil, J. O. Baum]
5.
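A text-only dot-plot can be sketched as follows. It is a simplified illustration that marks every identical residue pair, with no window or filtering; the function name `dot_plot` and the `*` / `.` symbols are assumptions made for this example.

```python
def dot_plot(seq_x, seq_y):
    """Return one text row per residue of seq_x; '*' marks identity with seq_y."""
    rows = []
    for a in seq_x:
        rows.append("".join("*" if a == b else "." for b in seq_y))
    return rows


# Similar regions show up as diagonal runs of '*'.
for row in dot_plot("THATSEQ", "THISSEQ"):
    print(row)
```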
![Page 11: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/11.jpg)
Scoring Alignments and Substitution Matrices
Percentage identity is a relatively crude measure and does not give a complete picture of the degree of similarity of two sequences.
Scoring identical matches as 1 and mismatches as 0 ignores the fact that the type of amino acids involved is highly significant.
![Page 12: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/12.jpg)
Scoring Alignments and Substitution Matrices
Genuine matches may not be identical:
Seq1: T H I S I S A S E Q U E N C E
Seq2: T H A T _ _ _ S E Q U E N C E
Isoleucine – Alanine: both hydrophobic
Serine – Threonine : both polar
![Page 13: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/13.jpg)
Scoring Alignments and Substitution Matrices
Scoring pairs of amino acids:
– With similar properties → higher scores
– With different properties → lower scores
![Page 14: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/14.jpg)
Scoring Alignments and Substitution Matrices
To assign scores for alignments, use SUBSTITUTION MATRICES
[“Understanding Bioinformatics”, M. Zvelebil, J. O. Baum]
5.
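How a substitution matrix turns an ungapped alignment into a score can be sketched with a small lookup table. The handful of entries below were typed by hand and are meant to mirror BLOSUM62 values; treat them as illustrative and check them against a published table before relying on them. The names `SCORES`, `pair_score` and `ungapped_score` are assumptions.

```python
# A few symmetric substitution scores (illustrative, intended to mirror BLOSUM62).
SCORES = {
    ("I", "I"): 4, ("A", "A"): 4, ("S", "S"): 4, ("T", "T"): 5, ("H", "H"): 8,
    ("I", "A"): -1, ("S", "T"): 1, ("I", "T"): -1, ("I", "H"): -3,
}


def pair_score(a, b):
    """Look up the score of a residue pair; the table is symmetric."""
    if (a, b) in SCORES:
        return SCORES[(a, b)]
    return SCORES[(b, a)]


def ungapped_score(x, y):
    """Sum of pair scores over the columns of an ungapped alignment."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    return sum(pair_score(a, b) for a, b in zip(x, y))


# T/T + H/H + I/A + S/T = 5 + 8 - 1 + 1
print(ungapped_score("THIS", "THAT"))
```

With a table like this, a conservative pair such as serine and threonine contributes a positive score even though it is not an identity.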
![Page 15: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/15.jpg)
Scoring Alignments and Substitution Matrices
Different types of substitution matrices are used, based on:
– The number of mutations required to convert one amino acid into the other
– Similarities in physicochemical properties
![Page 16: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/16.jpg)
Scoring Alignments and Substitution Matrices
PAM substitution matrices:
– Use closely related protein sequences to derive substitution frequencies
– PAM: accepted point mutations per 100 residues
– PAM 250 → 250 accepted point mutations per 100 residues
![Page 17: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/17.jpg)
Scoring Alignments and Substitution Matrices
BLOSUM substitution matrices:
– BLOcks of Amino Acid SUbstitution Matrix
– Use mutation data from highly conserved local regions
– BLOSUM 62 → built from sequence blocks clustered at 62% identity
![Page 18: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/18.jpg)
Scoring Alignments and Substitution Matrices
Which matrix to use?
– Depends on the properties of the problem
– Distantly related sequences: PAM 250, BLOSUM 50
– Closely related sequences: PAM 120, BLOSUM 80
![Page 19: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/19.jpg)
Scoring Alignments and Substitution Matrices
Which matrix to use?
– Some matrices are special purpose (SLIM and PHAT are designed for membrane proteins)
– The length of the sequence is important
  Short sequences → PAM 40 or BLOSUM 80
  Long sequences → PAM 250 or BLOSUM 50
![Page 20: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/20.jpg)
Scoring Alignments and Substitution Matrices
BLOSUM 62 and PAM 120
[“Understanding Bioinformatics”, M. Zvelebil, J. O. Baum]
6.
![Page 21: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/21.jpg)
Inserting Gaps
Gap insertion requires a scoring penalty (gap penalty).
To achieve correct matches, gaps are required.
Alignment programs use gap penalties to limit the introduction of gaps in the alignments
![Page 22: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/22.jpg)
Inserting Gaps
Insertions tend to be several residues long rather than just a single residue long
– Fewer insertions and deletions occur in sequences of structural importance
– Smaller penalty for lengthening an existing gap (gap extension penalty) than for introducing a new gap
– If the gap penalty is high, the number of gaps will decrease
– If the gap penalty is low, more and larger gaps will be inserted
![Page 23: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/23.jpg)
Inserting Gaps
Choosing gap penalties:
– Linear
– Affine
  Gap open penalty
  Gap extension penalty
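The two penalty schemes can be compared with a short sketch. The function names and the exact affine convention (the opening cost covers the first gap position, each further position pays the extension cost) are assumptions; some programs instead charge the extension cost for every position, including the first.

```python
def linear_gap_penalty(length, per_residue):
    """Linear scheme: every gap position costs the same."""
    return per_residue * length


def affine_gap_penalty(length, gap_open, gap_extend):
    """Affine scheme: opening a gap is expensive, lengthening it is cheaper."""
    if length <= 0:
        return 0
    return gap_open + gap_extend * (length - 1)


# One gap of length 3 under each scheme.
print(linear_gap_penalty(3, 8))      # 24
print(affine_gap_penalty(3, 10, 1))  # 12
```

Under the affine scheme a single gap of three residues costs less than three separate gaps of one residue, which matches the observation that insertions tend to be several residues long.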
![Page 24: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/24.jpg)
Dynamic Programming
Global and Local alignments
Pairwise and Multiple alignments
[“Understanding Bioinformatics”, M. Zvelebil, J. O. Baum]
7.
![Page 25: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/25.jpg)
Dynamic Programming
For a pair of sequences there is a large number of possible alignments.
Two sequences of length 1000 have approximately 10^600 different alignments.
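The order of magnitude can be checked numerically. This sketch assumes the count intended is the central binomial coefficient C(2n, n), one common way of counting global alignments of two sequences of length n; the slide does not say which counting convention it uses, and the function name `alignment_count` is hypothetical.

```python
import math


def alignment_count(n):
    """Central binomial coefficient C(2n, n), used here as the alignment count."""
    return math.comb(2 * n, n)


# The number of decimal digits gives the order of magnitude.
digits = len(str(alignment_count(1000)))
print(digits)
```

A number with roughly 600 digits rules out enumerating alignments one by one, which is the motivation for dynamic programming.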
![Page 26: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/26.jpg)
Dynamic Programming
Dynamic programming:
– The problem can be divided into many smaller parts.
– An optimal alignment will not contain parts that are not themselves optimal.
– Start from sufficiently short sub-sequences.
– Alignment is additive:
![Page 27: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/27.jpg)
Dynamic Programming
Needleman and Wunsch were the first to propose this method.
Finds optimal global alignments.
Align sequences:
– Seq1: x (x1 x2 x3 … xm)
– Seq2: y (y1 y2 y3 … yn)
![Page 28: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/28.jpg)
Dynamic Programming
s(a,b) = score of aligning a and b
F(i,j) = optimal similarity of X(1:i) and Y(1:j)
Recurrence relation:

$$F(i,0) = \sum_{k=1}^{i} s(X(k), \text{gap})$$

$$F(0,j) = \sum_{k=1}^{j} s(\text{gap}, Y(k))$$

$$F(i,j) = \max \begin{cases} F(i,j-1) + s(\text{gap}, Y(j)) \\ F(i-1,j) + s(X(i), \text{gap}) \\ F(i-1,j-1) + s(X(i), Y(j)) \end{cases}$$

– Assume linear gap penalty
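The global-alignment recurrence can be sketched directly in code. This is a minimal illustration under stated assumptions: a linear gap penalty passed as a negative per-position score `gap`, a toy `simple_score` function (+1 match, -1 mismatch) standing in for a substitution matrix, and a traceback that returns just one of the possibly several optimal alignments.

```python
def simple_score(a, b):
    """Toy scoring function (assumption): +1 for identity, -1 otherwise."""
    return 1 if a == b else -1


def needleman_wunsch(x, y, score=simple_score, gap=-2):
    """Global alignment with a linear gap penalty; returns (score, row_x, row_y)."""
    m, n = len(x), len(y)
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):          # F(i,0): prefix of x aligned to gaps
        F[i][0] = F[i - 1][0] + gap
    for j in range(1, n + 1):          # F(0,j): prefix of y aligned to gaps
        F[0][j] = F[0][j - 1] + gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(
                F[i][j - 1] + gap,                              # gap in x
                F[i - 1][j] + gap,                              # gap in y
                F[i - 1][j - 1] + score(x[i - 1], y[j - 1]),    # align x_i with y_j
            )
    # Traceback from the bottom-right cell to the origin.
    row_x, row_y = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(x[i - 1], y[j - 1]):
            row_x.append(x[i - 1]); row_y.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            row_x.append(x[i - 1]); row_y.append("-"); i -= 1
        else:
            row_x.append("-"); row_y.append(y[j - 1]); j -= 1
    return F[m][n], "".join(reversed(row_x)), "".join(reversed(row_y))


print(needleman_wunsch("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```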
![Page 29: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/29.jpg)
Dynamic Programming
Matrix S of optimal scores of sub-sequence alignments.
[“Understanding Bioinformatics”, M. Zvelebil, J. O. Baum]
9.
![Page 30: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/30.jpg)
Dynamic Programming
S(I, T) = -1,
10.
![Page 31: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/31.jpg)
Dynamic Programming
S(I, H) = -3,
S(I, gap) = -8,
S(gap, H) = -8
Recurrence relation:

$$F(i,j) = \max \begin{cases} F(i,j-1) + s(\text{gap}, Y(j)) \\ F(i-1,j) + s(X(i), \text{gap}) \\ F(i-1,j-1) + s(X(i), Y(j)) \end{cases}$$
[“Understanding Bioinformatics”, M. Zvelebil, J. O. Baum]
11.
![Page 32: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/32.jpg)
Dynamic Programming
[“Understanding Bioinformatics”, M. Zvelebil, J. O. Baum]
12.
![Page 33: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/33.jpg)
Dynamic Programming
–Linear gap penalty (E=4)
[“Understanding Bioinformatics”, M. Zvelebil, J. O. Baum]
13.
![Page 34: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/34.jpg)
Dynamic Programming
Semi-global alignment:
– Terminal gaps are treated differently from internal gaps
– How can dynamic programming be modified to produce a semi-global alignment?
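One common answer to this question, sketched here as an assumption rather than as the method the slides have in mind, is to make terminal gaps free: the first row and column are initialised to 0, and the final score is the best value found in the last row or last column instead of the bottom-right cell. The toy `simple_score` function and the function name are hypothetical.

```python
def simple_score(a, b):
    """Toy scoring function (assumption): +1 for identity, -1 otherwise."""
    return 1 if a == b else -1


def semi_global_score(x, y, score=simple_score, gap=-2):
    """Score of the best alignment in which leading and trailing gaps are free."""
    m, n = len(x), len(y)
    # Row 0 and column 0 stay at 0, so leading gaps cost nothing.
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(
                F[i][j - 1] + gap,
                F[i - 1][j] + gap,
                F[i - 1][j - 1] + score(x[i - 1], y[j - 1]),
            )
    # Trailing gaps are free: take the best cell of the last row or last column.
    return max(max(F[m]), max(row[n] for row in F))


# The short sequence is found inside the longer one without paying for the overhangs.
print(semi_global_score("ACGT", "TTACGTTT"))  # 4
```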
![Page 35: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/35.jpg)
Dynamic Programming
Local alignment:
– Used when we compare a sequence to a whole genome
– Find the sub-strings whose optimal global alignment value is maximum
![Page 36: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/36.jpg)
Dynamic Programming
What is the difference between global and local alignment?
Can we define the recurrence relation of local alignment in a similar way to that of global alignment?
![Page 37: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/37.jpg)
Dynamic Programming
Recurrence relation of GLOBAL ALIGNMENT (Needleman & Wunsch):

$$F(i,0) = \sum_{k=1}^{i} s(X(k), \text{gap})$$

$$F(0,j) = \sum_{k=1}^{j} s(\text{gap}, Y(k))$$

$$F(i,j) = \max \begin{cases} F(i,j-1) + s(\text{gap}, Y(j)) \\ F(i-1,j) + s(X(i), \text{gap}) \\ F(i-1,j-1) + s(X(i), Y(j)) \end{cases}$$
![Page 38: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/38.jpg)
Dynamic Programming
Recurrence relation of LOCAL ALIGNMENT (Smith-Waterman):

$$F(i,0) = 0$$

$$F(0,j) = 0$$

$$F(i,j) = \max \begin{cases} 0 \\ F(i,j-1) + s(\text{gap}, Y(j)) \\ F(i-1,j) + s(X(i), \text{gap}) \\ F(i-1,j-1) + s(X(i), Y(j)) \end{cases}$$
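The local-alignment recurrence differs from the global one only in the zero boundary, the extra 0 option, and where the traceback starts and stops. A minimal sketch, assuming a linear gap penalty and a toy `simple_score` function (+2 match, -1 mismatch) in place of a substitution matrix:

```python
def simple_score(a, b):
    """Toy scoring function (assumption): +2 for identity, -1 otherwise."""
    return 2 if a == b else -1


def smith_waterman(x, y, score=simple_score, gap=-2):
    """Best local alignment; returns (score, aligned part of x, aligned part of y)."""
    m, n = len(x), len(y)
    F = [[0] * (n + 1) for _ in range(m + 1)]   # F(i,0) = F(0,j) = 0
    best, best_pos = 0, (0, 0)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(
                0,                                              # start a new alignment here
                F[i][j - 1] + gap,
                F[i - 1][j] + gap,
                F[i - 1][j - 1] + score(x[i - 1], y[j - 1]),
            )
            if F[i][j] > best:
                best, best_pos = F[i][j], (i, j)
    # Traceback from the highest-scoring cell until a cell with score 0 is reached.
    row_x, row_y = [], []
    i, j = best_pos
    while i > 0 and j > 0 and F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + score(x[i - 1], y[j - 1]):
            row_x.append(x[i - 1]); row_y.append(y[j - 1]); i -= 1; j -= 1
        elif F[i][j] == F[i - 1][j] + gap:
            row_x.append(x[i - 1]); row_y.append("-"); i -= 1
        else:
            row_x.append("-"); row_y.append(y[j - 1]); j -= 1
    return best, "".join(reversed(row_x)), "".join(reversed(row_y))


print(smith_waterman("GGTACGTCC", "AATACGTAA"))  # (10, 'TACGT', 'TACGT')
```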
![Page 39: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/39.jpg)
Database Searches
FASTA and BLAST
– Use some heuristics
Dynamic programming complexity
– Time O(n*m)
– Space O(n*m)
![Page 40: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/40.jpg)
Database Searches FASTA
A good local alignment should contain some exactly matching subsequence.
Find all k-tuples (k = 1-2 for proteins, 3-6 for DNA sequences).
– Protein k-tuples: nc, sp, … (k = 2)
– Nucleotide k-tuples: TAAA, CTCC, … (k = 4)
![Page 41: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/41.jpg)
Database Searches FASTA
If k = 3 for nucleotide sequences:
– There will be 64 possible k-tuples
– Assign a number e( ) to each base:
  e(A) = 0, e(C) = 1, e(G) = 2, e(T) = 3
Each 3-tuple is represented as $x_i x_{i+1} x_{i+2}$.
Assign a number to each 3-tuple:

$$C_i = e(x_i) \cdot 4^2 + e(x_{i+1}) \cdot 4^1 + e(x_{i+2}) \cdot 4^0$$

– For example:
  AAA → $0 \cdot 4^2 + 0 \cdot 4^1 + 0 \cdot 4^0 = 0$
  CAA → $1 \cdot 4^2 + 0 \cdot 4^1 + 0 \cdot 4^0 = 16$
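The numbering scheme reads each k-tuple as a base-4 number. A short sketch, using the e( ) values given on the slide; the names `E` and `encode_ktuple` are assumptions:

```python
# Base codes as given on the slide.
E = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_ktuple(word):
    """Interpret a nucleotide k-tuple as a base-4 number, first base most significant."""
    code = 0
    for base in word:
        code = code * 4 + E[base]
    return code


print(encode_ktuple("AAA"))  # 0
print(encode_ktuple("CAA"))  # 16
print(encode_ktuple("TTT"))  # 63, the largest of the 64 possible codes
```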
![Page 42: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/42.jpg)
Database Searches FASTA
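Find each occurrence of the k-tuples in the sequences.
Chaining Look-Up Tables
Consider TAAAACTCTAAC (k = 3):

| 3-tuple (code) | Positions |
|---|---|
| AAA (0) | 2, 3 |
| AAC (1) | 4, 10 |
| AAG (2) | 0 |
| AAT (3) | 0 |
| … | … |

Positions are 1-based; an entry of 0 means the 3-tuple does not occur in the sequence.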
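Building such a look-up table takes a single pass over the sequence. This sketch uses a Python dictionary in place of the hashing and chaining structure a real implementation would use, reports 1-based positions, and simply omits k-tuples that never occur; the function name `build_lookup` is an assumption.

```python
from collections import defaultdict


def build_lookup(seq, k):
    """Map every k-tuple occurring in seq to the list of its 1-based start positions."""
    table = defaultdict(list)
    for start in range(len(seq) - k + 1):
        table[seq[start:start + k]].append(start + 1)
    return dict(table)


table = build_lookup("TAAAACTCTAAC", 3)
print(table["AAA"])    # [2, 3]
print(table["AAC"])    # [4, 10]
print("AAG" in table)  # False: this 3-tuple does not occur
```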
![Page 43: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/43.jpg)
Database Searches BLAST
Uses short words to search the database sequence.
Searches for k-mers that will score above a threshold value (T) when aligned with a query k-mer (remember that FASTA looks for k-tuples that are identical).
Uses a scheme based on finite state automata (remember that FASTA uses hashing and chaining for rapid identification of k-tuples).
![Page 44: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/44.jpg)
Database Searches BLAST
From the query sequence, create query words (for protein sequences the word size is 3).
![Page 45: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/45.jpg)
Database Searches BLAST
BLAST uses a list of high-scoring words created from words similar to the query words. It considers only the words with a score bigger than a threshold value.
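The idea of a high-scoring word list can be sketched by brute force. Everything specific here is an assumption made for illustration: the four-letter alphabet, the `toy_score` values (not a published matrix), the inclusive `>=` comparison against the threshold, and exhaustive enumeration (real implementations use far more efficient data structures).

```python
from itertools import product

ALPHABET = "ILVK"            # tiny illustrative alphabet, not the full 20 amino acids
HYDROPHOBIC = set("ILV")


def toy_score(a, b):
    """Illustrative scores: identity 4, hydrophobic-for-hydrophobic 2, otherwise -3."""
    if a == b:
        return 4
    if a in HYDROPHOBIC and b in HYDROPHOBIC:
        return 2
    return -3


def neighborhood(word, threshold, alphabet=ALPHABET, score=toy_score):
    """All words of the same length that score at least `threshold` against `word`."""
    hits = []
    for candidate in product(alphabet, repeat=len(word)):
        total = sum(score(q, c) for q, c in zip(word, candidate))
        if total >= threshold:
            hits.append("".join(candidate))
    return hits


words = neighborhood("ILK", threshold=8)
print(len(words), words)
```

Words such as VVK enter the list even though they share only one identical residue with the query word, which is what lets BLAST find hits that an exact k-tuple match would miss.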
![Page 46: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/46.jpg)
Database Searches BLAST
Scan each database sequence for an exact match to the list of words.
Word hits are then extended in either direction in an attempt to generate an alignment with a score exceeding the threshold of "S".
![Page 47: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/47.jpg)
Database Searches BLAST
Keep only the extended matches that have a score of at least S.
Determine statistical significance of each remaining match.
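The extension step can be sketched as an ungapped walk in both directions from a word hit, keeping the best running total on each side. This is a simplification under stated assumptions: no gaps, no drop-off cut-off, a toy `match_score` function, and hypothetical function names; real implementations stop extending once the running score falls too far below the best seen.

```python
def match_score(a, b):
    """Toy scoring function (assumption): +1 for identity, -1 otherwise."""
    return 1 if a == b else -1


def best_extension(q_part, s_part, score):
    """Best cumulative score over all prefixes of the paired residues, with its length."""
    best = running = best_len = 0
    for length, (a, b) in enumerate(zip(q_part, s_part), start=1):
        running += score(a, b)
        if running > best:
            best, best_len = running, length
    return best, best_len


def extend_hit(query, subject, q_pos, s_pos, k, score=match_score):
    """Extend a k-long word hit left and right; returns (score, q_start, q_end)."""
    seed = sum(score(query[q_pos + t], subject[s_pos + t]) for t in range(k))
    right, r_len = best_extension(query[q_pos + k:], subject[s_pos + k:], score)
    # Reverse the flanks so the left extension also walks away from the seed.
    left, l_len = best_extension(query[:q_pos][::-1], subject[:s_pos][::-1], score)
    return seed + left + right, q_pos - l_len, q_pos + k + r_len


total, start, end = extend_hit("TTACGTACCA", "GGACGTACTT", q_pos=4, s_pos=4, k=3)
print(total, start, end)   # 6 2 8
S = 5                      # hypothetical score threshold
print(total >= S)          # True: this extended match would be kept
```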
![Page 48: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/48.jpg)
Database Searches BLAST
http://blast.ncbi.nlm.nih.gov/Blast.cgi
1.
14.
![Page 49: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/49.jpg)
Database Searches BLAST
15.
![Page 50: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/50.jpg)
Database Searches BLAST
16.
![Page 51: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/51.jpg)
Database Searches BLAST
17.
![Page 52: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/52.jpg)
Database Searches BLAST
18.
![Page 53: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/53.jpg)
Database Searches HISTORY
1970: Needleman-Wunsch (NW)
1981: Smith-Waterman (SW)
1985: FASTP (extended to FASTA in 1988)
1990: BLAST
![Page 54: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/54.jpg)
Books and Web References
Books:
1. Introduction To Bioinformatics by T. K. Attwood
2. BioInformatics by Sangita
3. Basic Bioinformatics by S.Ignacimuthu, s.j.
Web and further references:
– http://en.wikipedia.org/wiki/Sequence_alignment
– http://pages.cs.wisc.edu/~bsettles/ibs08/lectures/02-alignment.pdf
– http://www.ks.uiuc.edu/Training/Tutorials/science/bioinformatics-tutorial/bioinformatics.pdf
– M. Zvelebil, J. O. Baum, “Understanding Bioinformatics”, 2008, Garland Science
– Andreas D. Baxevanis, B.F. Francis Ouellette, “Bioinformatics: A practical guide to the analysis of genes and proteins”, 2001, Wiley
![Page 55: B.sc biochem i bobi u 3.1 sequence alignment](https://reader035.vdocument.in/reader035/viewer/2022062313/55c76e4cbb61eb83578b47dd/html5/thumbnails/55.jpg)
Images References
1. http://gorbi.irb.hr/files/5712/7497/9729/Slide09.jpg
2. http://www.ensembl.org/info/genome/compara/tree_example1.png
3. http://www.nature.com/nature/journal/v496/n7445/images/nature12027-f1.2.jpg
4. http://upload.wikimedia.org/wikipedia/commons/e/e6/Spombe_Pop2p_protein_structure_rainbow.png
5. & 6. Book: Basic Bioinformatics by S. Ignacimuthu, s.j.
7. to 13. Book: Basic Bioinformatics by S. Ignacimuthu, s.j.
14. to 18. http://blast.ncbi.nlm.nih.gov/Blast.cgi