![Page 1: Pairwise Sequence Alignments. Comparison methods Global alignment Local alignment Topics to be Covered](https://reader035.vdocument.in/reader035/viewer/2022062300/56649de55503460f94adcf36/html5/thumbnails/1.jpg)
Pairwise Sequence Alignments
Topics to be Covered
• Comparison methods
• Global alignment
• Local alignment
Introduction to Alignment
Analyze the similarities and differences at the individual base level or amino acid level
Aim is to infer structural, functional and evolutionary relationships among sequences
Sequence Alignment
982 TGTTTGCTAAAGCTTCAGCTATCCACAACCCAATTGACCTCTAC 1022
    | ||||||||  |  |  | ||| ||||||||||     |||||
961 TCTTTGCTAAGACCGCCTCCATCTACAACCCAATCA---TCTAC 1001
• Two sequences are written out, one on top of the other.
• Identical or similar characters are placed in the same column.
• Nonidentical characters are either placed in the same column as a mismatch or placed opposite a gap in the other sequence.
• The overall quality of the alignment is then evaluated with a formula that counts the number of identical (or similar) pairs minus the number of mismatches and gaps.
Pairwise Sequence Alignments: Why Compare?
Similarity search is necessary for:
• Family assignment
• Sequence annotation
• Construction of phylogenetic trees
• Learn about evolutionary relationships
• Classify sequences
• Identify functions
• Homology Modeling
Essential Elements of an Alignment Algorithm
• Defining the problem (Global, local alignment)
• Scoring scheme (Gap penalties)
• Distance Matrix (PAM, BLOSUM series)
Global and Local Alignments
• Global: an attempt is made to align the entire sequence, using as many characters as possible, up to both ends of the sequences.
• Local: stretches of sequence with the highest density of matches are aligned.
Global Alignment
L G P S S K Q T G K G S - S R I W D N
|         |     | | |       |     |
L N - I T K S A G K G A I M R L G D A

Local Alignment
- - - - - - - T G K G - - - - - -
                | | |
- - - - - - - A G K G - - - - - -
Local vs. Global Alignment (cont’d)
• Global Alignment
• Local Alignment: better for finding a conserved segment
Global:
--T--CC-C-AGT--TATGT-CAGGGGACACG--A-GCATGCAGA-GAC
  |  || |  ||  | | | |||    || |  | |  | ||||   |
AATTGCCGCC-GTCGT-T-TTCAG----CA-GTTATG--T-CAGAT--C
Local:
                  TCCCAGTTATGTCAGGGGACACGAGCATGCAGAGAC
                     ||||||||||||
AATTGCCGCCGTCGTTTTCAGCAGTTATGTCAGATC
Global and Local Alignments
• Global: used when the two sequences are of approximately equal length. Here, the goal is to obtain the maximum score by aligning them completely.
• Local: used when one sequence is a substring of the other, or when the goal is to get the maximum local score, e.g. for protein motif searches in a database.
Dynamic programming algorithm
• Dynamic programming: build up the optimal alignment using previously computed optimal alignments of subsequences
Aligning Sequences without Insertions and Deletions: Hamming Distance
Given two DNA sequences v and w:
v : A T A T A T A T
w : T A T A T A T A
• The Hamming distance dH(v, w) = 8 is large, but the sequences are very similar.
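A minimal Python sketch of the Hamming distance (the name `hamming_distance` is illustrative, not from the slides). It only ever compares position i of v with position i of w, which is why it cannot notice that w is v shifted by one position:

```python
def hamming_distance(v: str, w: str) -> int:
    """Count the positions i where v[i] != w[i]; both sequences must be equally long."""
    if len(v) != len(w):
        raise ValueError("Hamming distance is defined only for equal-length sequences")
    return sum(a != b for a, b in zip(v, w))


print(hamming_distance("ATATATAT", "TATATATA"))  # 8
```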
Aligning Sequences with Insertions and Deletions
By shifting one sequence over one position:
v : A T A T A T A T -
w : - T A T A T A T A
• The edit distance d(v, w) = 2.
• Hamming distance neglects insertions and deletions in DNA.
Edit Distance
Levenshtein (1966) introduced edit distance between two strings as the minimum number of elementary operations (insertions, deletions, and substitutions) to transform one string into the other
d(v, w) = minimum number of elementary operations to transform v into w
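A minimal Python sketch of Levenshtein's edit distance, assuming a cost of 1 for every insertion, deletion and substitution (the slides define the operations but give no algorithm; `edit_distance` is an illustrative name). The table entry d[i][j] holds the distance between the prefixes v[:i] and w[:j], so each entry needs only its three neighbors:

```python
def edit_distance(v: str, w: str) -> int:
    """Minimum number of unit-cost insertions, deletions and substitutions turning v into w."""
    n, m = len(v), len(w)
    # d[i][j] = edit distance between the prefixes v[:i] and w[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete all i letters of v[:i]
    for j in range(m + 1):
        d[0][j] = j  # insert all j letters of w[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + 1,                           # delete v[i-1]
                d[i][j - 1] + 1,                           # insert w[j-1]
                d[i - 1][j - 1] + (v[i - 1] != w[j - 1]),  # match (0) or substitution (1)
            )
    return d[n][m]


print(edit_distance("ATATATAT", "TATATATA"))  # 2
print(edit_distance("TGCATAT", "ATCCGAT"))    # 4
```

Under these unit costs the sketch reports 4 for TGCATAT and ATCCGAT, so the 4-step transformation in the example that follows cannot be shortened to 3 steps.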
Edit Distance vs Hamming Distance
V = ATATATAT
W = TATATATA
Hamming distance always compares the i-th letter of v with the i-th letter of w.
Hamming distance: d(v, w) = 8. Computing Hamming distance is a trivial task.
Edit distance may compare the i-th letter of v with the j-th letter of w. Just one shift makes it all line up:
V = -ATATATAT
W = TATATATA-
Edit distance: d(v, w) = 2 (one insertion and one deletion). Computing edit distance is a non-trivial task.
How do we find which j goes with which i?
Edit Distance: Example
TGCATAT → ATCCGAT in 5 steps
TGCATAT (delete last T)
TGCATA (delete last A)
TGCAT (insert A at front)
ATGCAT (substitute C for 3rd G)
ATCCAT (insert G before last A)
ATCCGAT (Done)
What is the edit distance? 5?
Edit Distance: Example (cont’d)
TGCATAT → ATCCGAT in 4 steps
TGCATAT (insert A at front)
ATGCATAT (delete 6th T)
ATGCAAT (substitute G for 5th A)
ATGCGAT (substitute C for 3rd G)
ATCCGAT (Done)
Can it be done in 3 steps?
The Alignment Grid
– Every alignment path is from source to sink
Alignment as a Path in the Edit Graph
   0 1 2 2 3 4 5 6 7 7
v: A T _ G T T A T _
w: A T C G T _ A _ C
   0 1 2 3 4 5 5 6 6 7
Path: (0,0), (1,1), (2,2), (2,3), (3,4), (4,5), (5,5), (6,6), (7,6), (7,7)
[Figure: corresponding path in the edit graph, with v = ATGTTAT down the rows and w = ATCGTAC across the columns, indices 0 to 7]
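To make the correspondence concrete, here is a small Python sketch (the name `alignment_to_path` and the `_` gap symbol are assumptions chosen to match the example) that walks an alignment column by column and records the edit-graph node reached after each column:

```python
def alignment_to_path(v_row, w_row, gap="_"):
    """Convert a two-row alignment into its path through the edit graph.

    A match/mismatch column moves (i, j) -> (i + 1, j + 1), a gap in w moves
    (i, j) -> (i + 1, j), and a gap in v moves (i, j) -> (i, j + 1).
    """
    i = j = 0
    path = [(0, 0)]
    for a, b in zip(v_row, w_row):
        if a != gap:
            i += 1
        if b != gap:
            j += 1
        path.append((i, j))
    return path


print(alignment_to_path("AT_GTTAT_", "ATCGT_A_C"))
# [(0, 0), (1, 1), (2, 2), (2, 3), (3, 4), (4, 5), (5, 5), (6, 6), (7, 6), (7, 7)]
```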
Alignment as a Path in the Edit Graph
Every path in the edit graph corresponds to an alignment:
[Figure: edit graph with v = ATGTTAT down the rows and w = ATCGTAC across the columns, indices 0 to 7]
Alignment as a Path in the Edit Graph
[Figure: edit graph with v = ATGTTAT down the rows and w = ATCGTAC across the columns, showing the old and the new path]
Old Alignment
   0 1 2 2 3 4 5 6 7 7
v= A T _ G T T A T _
w= A T C G T _ A _ C
   0 1 2 3 4 5 5 6 6 7
New Alignment
   0 1 2 2 3 4 5 6 7 7
v= A T _ G T T A T _
w= A T C G _ T A _ C
   0 1 2 3 4 4 5 6 6 7
From LCS to Alignment: Change up the Scoring
• The Longest Common Subsequence (LCS) problem, the simplest form of sequence alignment, allows only insertions and deletions (no mismatches).
• In the LCS Problem, we scored 1 for matches and 0 for indels.
• Consider penalizing indels and mismatches with negative scores.
• Simplest scoring schema:
  +1 : match premium
  -μ : mismatch penalty
  -σ : indel penalty
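A sketch of the LCS recurrence in Python, assuming exactly the LCS scoring described above (+1 per match, 0 per indel, mismatches not allowed); `lcs_length` is an illustrative name:

```python
def lcs_length(v, w):
    """Length of the longest common subsequence of v and w."""
    s = [[0] * (len(w) + 1) for _ in range(len(v) + 1)]
    for i in range(1, len(v) + 1):
        for j in range(1, len(w) + 1):
            if v[i - 1] == w[j - 1]:
                s[i][j] = s[i - 1][j - 1] + 1            # extend with a match (+1)
            else:
                s[i][j] = max(s[i - 1][j], s[i][j - 1])  # an indel in either sequence (0)
    return s[len(v)][len(w)]


print(lcs_length("ATGTTAT", "ATCGTAC"))  # 5
```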
Simple Scoring
• When mismatches are penalized by –μ, indels are penalized by –σ, and matches are rewarded with +1, the resulting score is:
#matches – μ(#mismatches) – σ(#indels)
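The formula can be checked on any given alignment with a few lines of Python (a hypothetical helper; `_` marks a gap, as in the edit-graph example):

```python
def simple_score(v_row, w_row, mu, sigma, gap="_"):
    """#matches - mu * #mismatches - sigma * #indels for a two-row alignment."""
    matches = mismatches = indels = 0
    for a, b in zip(v_row, w_row):
        if a == gap or b == gap:
            indels += 1
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    return matches - mu * mismatches - sigma * indels


# 5 matches, 0 mismatches, 4 indels
print(simple_score("AT_GTTAT_", "ATCGT_A_C", mu=1, sigma=1))  # 1
```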
Dynamic programming algorithm
• Define a matrix F(i,j): F(i,j) is the score of the optimal alignment of the subsequences A1...i and B1...j
• Iterative build-up: F(0,0) = 0
• Define each element (i,j) from:
(i-1, j): Ai aligned to a gap in sequence B
(i, j-1): Bj aligned to a gap in sequence A
(i-1, j-1): alignment of Ai to Bj
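A Python sketch of this recurrence for global alignment (Needleman-Wunsch style) under the simple +1 / -μ / -σ scoring. It returns only the optimal score; the function name and default values are illustrative assumptions:

```python
def global_alignment_score(a, b, mu=1, sigma=1):
    """Optimal global score with +1 per match, -mu per mismatch, -sigma per indel.

    F[i][j] holds the best score for aligning the prefixes a[:i] and b[:j].
    """
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = -sigma * i  # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        F[0][j] = -sigma * j  # b[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = F[i - 1][j - 1] + (1 if a[i - 1] == b[j - 1] else -mu)
            F[i][j] = max(
                diag,                 # (i-1, j-1): align A_i with B_j
                F[i - 1][j] - sigma,  # (i-1, j): A_i against a gap in B
                F[i][j - 1] - sigma,  # (i, j-1): B_j against a gap in A
            )
    return F[n][m]


print(global_alignment_score("ATATATAT", "TATATATA"))  # 5 (7 matches, 2 indels)
```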
Dynamic programming
Sequence Comparison Scoring Matrices
• The choice of a scoring matrix can strongly influence the outcome of sequence analysis
• Scoring matrices implicitly represent a particular theory of evolution
• Elements of the matrices specify the similarity or the distance of replacing one residue (base) by another
• Distance and similarity matrices are inter-convertible by some mathematical transformation.
Protein Scoring Matrices
• The two most popular matrices are the PAM and the BLOSUM matrix
Scoring Insertions and Deletions
  A T G T A A T G C A
T A T G T G G A A T G A

T A T G T G G A A T G A
  A T G T - - A A T G C A
insertion / deletion
The creation of a gap is penalized with a negative score value.
Why Gap Penalties?
• The optimal alignment of two similar sequences is usually the one that maximizes the number of matches and minimizes the number of gaps.
• Permitting the insertion of arbitrarily many gaps can lead to high-scoring alignments of non-homologous sequences.
• Penalizing gaps forces alignments to have relatively few gaps.
Why Gap Penalties?
Match = 5, Mismatch = -4

Gaps not permitted. Score: 10
1 GTGATAGACACAGACCGGTGGCATTGTGG 29
  |||   |  | |||    |   || || |
1 GTGTCGGGAAGAGATAACTCCGATGGTTG 29

Gaps allowed but not penalized. Score: 88
1 GTG.ATAG.ACACAGA..CCGGT..GGCATTGTGG 29
  ||| || | | | |||  ||  |  |  || || |
1 GTGTAT.GGA.AGAGATACC..TCCG..ATGGTTG 29
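The two scores can be recomputed with a short Python check that applies Match = 5 and Mismatch = -4 and, as in the gapped example, gives gap columns a score of 0 (`.` marks a gap; the helper name is hypothetical):

```python
def score_ignoring_gaps(top, bottom, match=5, mismatch=-4, gap="."):
    """Sum match/mismatch scores over columns; gap columns contribute 0."""
    score = 0
    for a, b in zip(top, bottom):
        if a == gap or b == gap:
            continue  # gaps are allowed but not penalized
        score += match if a == b else mismatch
    return score


ungapped = ("GTGATAGACACAGACCGGTGGCATTGTGG", "GTGTCGGGAAGAGATAACTCCGATGGTTG")
gapped = ("GTG.ATAG.ACACAGA..CCGGT..GGCATTGTGG", "GTGTAT.GGA.AGAGATACC..TCCG..ATGGTTG")
print(score_ignoring_gaps(*ungapped))  # 10 (14 matches, 15 mismatches)
print(score_ignoring_gaps(*gapped))    # 88 (20 matches, 3 mismatches, 12 free gaps)
```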
Gap Penalties
Linear gap penalty score:
γ(g) = -gd
Affine gap penalty score:
γ(g) = -d - (g - 1)e
where:
γ(g) = gap penalty score of a gap of length g
d = gap opening penalty
e = gap extension penalty
g = gap length
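Both penalty functions translate directly into code. A minimal Python sketch; the parameter values in the loop are illustrative only:

```python
def linear_gap(g, d):
    """Linear gap penalty: gamma(g) = -g * d."""
    return -g * d


def affine_gap(g, d, e):
    """Affine gap penalty: gamma(g) = -d - (g - 1) * e (d = opening, e = extension)."""
    if g < 1:
        raise ValueError("gap length must be at least 1")
    return -d - (g - 1) * e


for g in (1, 2, 10, 100):
    print(g, linear_gap(g, d=3), round(affine_gap(g, d=3, e=0.1), 2))
```

For long gaps the linear penalty keeps growing by d per position, while the affine penalty grows only by e after the first position; with e = d the two coincide.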
Scoring Indels: Naive Approach
• A fixed penalty σ is given to every indel:
  – -σ for 1 indel
  – -2σ for 2 consecutive indels
  – -3σ for 3 consecutive indels, etc.
• This can be too severe a penalty for a series of 100 consecutive indels.
Affine Gap Penalties
• In nature, a series of k indels often comes as a single event rather than as a series of k single-nucleotide events:
ATA__GC
ATATTGC
This is more likely.

ATAG_GC
AT_GTGC
This is less likely.

Normal scoring would give the same score for both alignments.
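To see the difference, the sketch below scores both alignments under a naive per-indel penalty and under an affine penalty. Match = 1, mismatch = -1, σ = 1, d = 3 and e = 0.1 are assumed values, not taken from the slides, and the helper names are illustrative:

```python
from itertools import groupby

GAP = "_"


def gap_run_lengths(row):
    """Lengths of the maximal runs of GAP characters in one alignment row."""
    return [len(list(run)) for is_gap, run in groupby(row, key=lambda c: c == GAP) if is_gap]


def alignment_score(top, bottom, gap_penalty, match=1, mismatch=-1):
    """Score the residue columns, then add gap_penalty(length) once per gap run."""
    score = sum(match if a == b else mismatch
                for a, b in zip(top, bottom) if GAP not in (a, b))
    for row in (top, bottom):
        score += sum(gap_penalty(g) for g in gap_run_lengths(row))
    return score


def linear_penalty(g, sigma=1.0):
    """Naive scoring: every indel costs sigma."""
    return -sigma * g


def affine_penalty(g, d=3.0, e=0.1):
    """Affine scoring: gamma(g) = -d - (g - 1) * e."""
    return -d - (g - 1) * e


for top, bottom in [("ATA__GC", "ATATTGC"), ("ATAG_GC", "AT_GTGC")]:
    print(top, bottom,
          alignment_score(top, bottom, linear_penalty),
          alignment_score(top, bottom, affine_penalty))
```

With the per-indel penalty both alignments score the same; the affine penalty favors the single gap of length 2 over two separate gaps of length 1.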
Accounting for Gaps
• Gaps: contiguous sequences of spaces in one of the rows
• Score for a gap of length x is -(ρ + σx), where ρ > 0 is the penalty for introducing a gap (gap opening penalty) and σ is the gap extension penalty. ρ will be large relative to σ because you do not want to add too much of a penalty for extending the gap.
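Assuming this slide and the earlier affine formula γ(g) = -d - (g - 1)e describe the same model, a one-line rearrangement relates the two parameterizations:

$$
-(\rho + \sigma x) = -(\rho + \sigma) - (x - 1)\,\sigma
\quad\Longrightarrow\quad
d = \rho + \sigma,\qquad e = \sigma .
$$

So the opening penalty d of the earlier formula already includes the cost of the first gap position, while the extension penalty e plays the role of σ.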
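Affine Gap Penalty Recurrences
$$
\begin{aligned}
s^{\downarrow}_{i,j} &= \max\begin{cases}
s^{\downarrow}_{i-1,j} - \sigma & \text{continue gap in } w \text{ (deletion)}\\
s_{i-1,j} - (\rho + \sigma) & \text{start gap in } w \text{ (deletion): from middle}
\end{cases}\\[4pt]
s^{\rightarrow}_{i,j} &= \max\begin{cases}
s^{\rightarrow}_{i,j-1} - \sigma & \text{continue gap in } v \text{ (insertion)}\\
s_{i,j-1} - (\rho + \sigma) & \text{start gap in } v \text{ (insertion): from middle}
\end{cases}\\[4pt]
s_{i,j} &= \max\begin{cases}
s_{i-1,j-1} + \delta(v_i, w_j) & \text{match or mismatch}\\
s^{\downarrow}_{i,j} & \text{end deletion: from top}\\
s^{\rightarrow}_{i,j} & \text{end insertion: from bottom}
\end{cases}
\end{aligned}
$$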
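A Python sketch of these three-level recurrences (the approach is often attributed to Gotoh), returning only the optimal global score. Match = +1, mismatch = -1 and the gap parameters in the example call are assumptions; `deletion`, `insertion` and `middle` are illustrative names for the three tables:

```python
NEG_INF = float("-inf")


def affine_global_score(v, w, rho, sigma, match=1.0, mismatch=-1.0):
    """Global alignment score where a gap of length x costs -(rho + sigma * x).

    deletion[i][j]:  best score whose last column is v[i-1] against a gap in w
    insertion[i][j]: best score whose last column is w[j-1] against a gap in v
    middle[i][j]:    best score of any alignment of v[:i] and w[:j]
    """
    n, m = len(v), len(w)
    deletion = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    insertion = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    middle = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    middle[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0:  # continue a gap in w, or start one from the middle level
                deletion[i][j] = max(deletion[i - 1][j] - sigma,
                                     middle[i - 1][j] - (rho + sigma))
            if j > 0:  # continue a gap in v, or start one from the middle level
                insertion[i][j] = max(insertion[i][j - 1] - sigma,
                                      middle[i][j - 1] - (rho + sigma))
            if i == 0 and j == 0:
                continue
            diag = NEG_INF
            if i > 0 and j > 0:  # match or mismatch
                diag = middle[i - 1][j - 1] + (match if v[i - 1] == w[j - 1] else mismatch)
            middle[i][j] = max(diag, deletion[i][j], insertion[i][j])  # or end a gap
    return middle[n][m]


print(affine_global_score("AAATTTCCC", "AAACCC", rho=3.0, sigma=0.1))
```

In the example call the best alignment places one gap of length 3 opposite TTT, scoring 6 - (3 + 3 × 0.1), about 2.7. With rho = 0 the sketch reduces to the linear scheme with indel penalty sigma.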
Scoring Insertions and Deletions
match = 1, mismatch = 0
Gap parameters:
d = 3 (gap opening)
e = 0.1 (gap extension)
g = 3 (gap length)
γ(g) = -3 - (3 - 1) 0.1 = -3.2

  A T G T T A T A C
T A T G T G C G T A T A
Total Score: 4

T A T G T G C G T A T A
  A T G T - - - T A T A C
insertion / deletion
Total Score: 8 - 3.2 = 4.8
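The arithmetic of this example can be reproduced with a small Python sketch. It assumes, as the totals on the slide imply, that the unpaired overhangs at either end are not scored; the helper name and the use of spaces to mark overhangs are illustrative conventions:

```python
def score_with_affine_gaps(top, bottom, d, e, match=1, mismatch=0):
    """Score a pairwise alignment where ' ' marks an unscored overhang and '-' a gap.

    Each maximal run of g gap characters costs d + (g - 1) * e.
    """
    score = 0.0
    for row in (top, bottom):
        run = 0
        for c in row + " ":  # trailing sentinel closes a run at the end of the row
            if c == "-":
                run += 1
            elif run:
                score -= d + (run - 1) * e
                run = 0
    for a, b in zip(top, bottom):
        if " " in (a, b) or "-" in (a, b):
            continue  # overhangs and gap columns were handled above
        score += match if a == b else mismatch
    return score


print(score_with_affine_gaps(" ATGTTATAC  ", "TATGTGCGTATA", d=3, e=0.1))   # no gaps
print(score_with_affine_gaps("TATGTGCGTATA ", " ATGT---TATAC", d=3, e=0.1))  # one gap of length 3
```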
Modification of Gap Penalties
Score Matrix: BLOSUM62

gap opening penalty = 0, gap extension penalty = 0.1, score = 11.3
1 V...LSPADKFLTNV 12
  |   |||| |  | |
1 VFTELSPA.K..T.V 11

gap opening penalty = 3, gap extension penalty = 0.1, score = 6.3
1 ...VLSPADKFLTNV 12
      ||||
1 VFTELSPAKTV.... 11
Pairwise Sequence Alignment
• Local Alignment
• Semi-Global Alignment
Local Alignment
A local alignment between sequence s and sequence t is an alignment with maximum similarity between a substring of s and a substring of t.
T. F. Smith & M. S. Waterman, “Identification of Common Molecular Subsequences”, J. Mol. Biol., 147:195-197 (1981)
Why choose a local alignment algorithm?
• More meaningful – point out conserved regions between two sequences
• Allows two sequences of different lengths to be matched
• Aligns two partially overlapping sequences
• Aligns two sequences where one is a subsequence of another
Dynamic Programming: Local Alignment
$$
S_{i,j} = \max\begin{cases}
S_{i-1,j-1} + s(a_i, b_j) & \text{match/mismatch in the diagonal}\\
S_{i,j-1} + w & \text{gap in sequence \#1}\\
S_{i-1,j} + w & \text{gap in sequence \#2}\\
0
\end{cases}
$$
Initialization Step
Matrix Fill Step
Traceback Step
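The three steps (initialization, matrix fill, traceback) can be sketched in Python as follows. The slides give the recurrence but not its parameters, so match = 2, mismatch = -1 and a linear gap score w = -1 are assumptions, and `smith_waterman` is an illustrative name:

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment of a and b: returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # Initialization step: first row and first column are all 0
    S = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    # Matrix fill step: the extra 0 option lets a local alignment start anywhere
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + s, S[i][j - 1] + gap, S[i - 1][j] + gap, 0)
            if S[i][j] > best:
                best, best_pos = S[i][j], (i, j)
    # Traceback step: start at the highest cell, stop at the first 0
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and S[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if S[i][j] == S[i - 1][j - 1] + s:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif S[i][j] == S[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("ACGT", "TTACGTTT"))  # (8, 'ACGT', 'ACGT')
```

Clamping scores at 0 during the fill and stopping the traceback at the first 0 is what makes the result local rather than global.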
An Introduction To Multiple Sequence Alignment (MSA)
Topics To Be Discussed
• Motivation for MSA
• What is MSA
• Extension of Dynamic Programming
• The STAR Method
• Progressive Alignment
• Scoring Multiple Alignments
Multiple Alignment versus Pairwise Alignment
• Up until now we have only tried to align two sequences.
• What about more than two? And what for?
• A faint similarity between two sequences becomes significant if it is present in many sequences.
• Multiple alignments can reveal subtle similarities that pairwise alignments do not reveal.
Motivation For MSA
• A natural extension of pairwise sequence alignment
• MSA gives biologists the ability to extract biologically important, but perhaps widely dispersed, sequence similarities that can give hints about the evolutionary history of certain sequences.
• In pairwise alignment, when two sequences align, it is concluded that there is probably a functional relationship between the two sequences. For MSA, by contrast, if it is known that a number of sequences are functionally similar, we can use MSA to find out where the similarity comes from.
What is MSA?
• MSA is the simultaneous alignment of N sequences (protein or nucleotide), where N > 2.
• Let Si denote a sequence. Then the global multiple sequence alignment of N > 2 sequences S = { S1, …, SN } is obtained by inserting gaps, denoted by "-", at any position, possibly including the beginning or end.
• The new set of N sequences, denoted by S' = { S1', …, SN' }, will all have the same length L.
Ovar STCVLSAYWKD-LNNYH
Bota STCVLSAYWKD-LNNYH
Susc STCVLSAYWRNELNNFH
Hosa STCMLGTY-QD-FNKFH
Rano STCMLGTY-QD-LNKFH
Sasa STCVLGKLSQE-LHKLQ
Interpretation of positions
• Generally there are two interpretations of a position in a multiple sequence alignment:
• Evolutionary/historical
• Functional/structural
In many cases these are the same, but they may not be.
Multiple sequence alignment algorithm
• Ideal approach to multiple sequence alignment is to extend dynamic programming.
• Instead of aligning two sequences (two dimensional grid) we align k sequences (k dimensional grid)
• Extension is relatively straightforward
Dynamic programming for sequence alignment
• Recurrence relation
• Tabular computation
• Traceback
• Pairwise recurrence relation
S(i,j) = max[S(i-1, j-1) + m(i,j), S(i-1, j) + g, S(i, j-1) + g]
m(i,j) = similarity matrix score, e.g. BLOSUM
g = gap penalty
Aligning Three Sequences
• Same strategy as aligning two sequences
• Use a 3-D “Manhattan Cube”, with each axis representing a sequence to align
• For global alignments, go from source to sink
3-D Alignment Cell versus 2-D Alignment Cell
In 3-D, 7 edges in each unit cube
In 2-D, 3 edges in each unit square
Architecture of 3-D Alignment Cell
The cell (i,j,k) is entered from seven neighboring cells:
(i-1,j-1,k-1), (i-1,j-1,k), (i-1,j,k-1), (i,j-1,k-1), (i-1,j,k), (i,j-1,k), (i,j,k-1)
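The predecessor structure generalizes mechanically: each predecessor corresponds to a nonzero 0/1 vector saying which sequences contribute a residue to the last column, which gives 2^k - 1 moves for k sequences (3 in 2-D, 7 in 3-D). A minimal Python sketch; the function names are illustrative:

```python
from itertools import product


def predecessor_offsets(k):
    """Nonzero 0/1 vectors: 1 = that sequence contributes a residue, 0 = it gets a gap."""
    return [delta for delta in product((0, 1), repeat=k) if any(delta)]


def predecessors(cell):
    """Cells from which `cell` can be entered in a k-dimensional alignment grid."""
    return [tuple(c - d for c, d in zip(cell, delta))
            for delta in predecessor_offsets(len(cell))]


print(len(predecessor_offsets(2)), len(predecessor_offsets(3)))  # 3 7
print(predecessors((5, 5, 5)))
```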
Multiple Alignment: Dynamic Programming
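• δ(x, y, z) is an entry in the 3-D scoring matrix; "_" denotes a gap.
$$
s_{i,j,k} = \max\begin{cases}
s_{i-1,j-1,k-1} + \delta(v_i, w_j, u_k) & \text{cube diagonal: no indels}\\
s_{i-1,j-1,k} + \delta(v_i, w_j, \_) & \text{face diagonal: one indel}\\
s_{i-1,j,k-1} + \delta(v_i, \_, u_k) & \text{face diagonal: one indel}\\
s_{i,j-1,k-1} + \delta(\_, w_j, u_k) & \text{face diagonal: one indel}\\
s_{i-1,j,k} + \delta(v_i, \_, \_) & \text{edge diagonal: two indels}\\
s_{i,j-1,k} + \delta(\_, w_j, \_) & \text{edge diagonal: two indels}\\
s_{i,j,k-1} + \delta(\_, \_, u_k) & \text{edge diagonal: two indels}
\end{cases}
$$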
Extending dynamic programming
• Based on the extrapolation from two to three sequences, we can define the recurrence relation for any number of sequences in the same way
• The other steps - tabular computation and traceback - are done in the same way as for pairwise alignment
There are seven cases when aligning three sequences ("-" marks a gap):
Case  1 2 3 4 5 6 7
      I I I - I - -
      J J - J - J -
      K - K K - - K
There are 2^3 - 1 = 7 cases from which to choose the maximum similarity.
Three sequence recurrence relation
$$
S(i,j,k) = \max\begin{cases}
S(i-1,j-1,k-1) + m(i,j) + m(i,k) + m(j,k)\\
S(i-1,j-1,k) + m(i,j) + g\\
S(i-1,j,k-1) + m(i,k) + g\\
S(i,j-1,k-1) + m(j,k) + g\\
S(i-1,j,k) + g + g\\
S(i,j-1,k) + g + g\\
S(i,j,k-1) + g + g
\end{cases}
$$
m(i,j) = similarity matrix score, e.g. BLOSUM
g = gap penalty
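A direct Python transcription of this recurrence, written for clarity rather than speed (it fills the full (|a|+1)(|b|+1)(|c|+1) table). As in the relation above, every pair of residues in a column adds m and every gap character adds g. The toy similarity function `m` and the gap value are assumptions for illustration; in practice m would come from a matrix such as BLOSUM:

```python
from itertools import combinations, product


def three_way_score(a, b, c, m, g):
    """Best global score for three sequences under the three-sequence recurrence."""
    seqs = (a, b, c)
    S = {(0, 0, 0): 0}
    # Lexicographic order visits every predecessor before the cell itself.
    for cell in product(*(range(len(s) + 1) for s in seqs)):
        if cell == (0, 0, 0):
            continue
        best = float("-inf")
        for delta in product((0, 1), repeat=3):  # the 7 nonzero moves, plus (0,0,0)
            prev = tuple(x - d for x, d in zip(cell, delta))
            if not any(delta) or min(prev) < 0:
                continue
            # residues placed in this column; sequences with delta 0 receive a gap
            residues = [seqs[t][cell[t] - 1] for t in range(3) if delta[t]]
            column = sum(m(x, y) for x, y in combinations(residues, 2)) + g * (3 - len(residues))
            best = max(best, S[prev] + column)
        S[cell] = best
    return S[(len(a), len(b), len(c))]


def m(x, y):
    """Toy similarity: +1 for identical residues, -1 otherwise (illustrative)."""
    return 1 if x == y else -1


print(three_way_score("AC", "AC", "AC", m, g=-2))  # 6
```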
Dynamic programming time increases exponentially
• Time taken for alignment by dynamic programming is O(n * m) for two sequences n, m characters long.
• Time taken for alignment by dynamic programming is O(n * m * p) for three sequences n, m, p characters long.
Dynamic programming time increases exponentially
• Clearly, for N sequences, each sequence L_i characters long, the time required will be
$$O\left(\prod_{i=1}^{N} L_i\right)$$
• This is exponential, $O(L^N)$ for N sequences of length L, because we need to fill out each ‘box’ in the grid.
Pairwise Dynamic Programming: Comparing Similar Sequences
• Faster algorithm for aligning similar sequences.
• If two sequences are similar, the best alignments have their paths near the main diagonal of the dynamic programming matrix.
• To compute the optimal score and alignment, it is not necessary to fill in the entire matrix.
• A narrow band around the main diagonal should suffice
Global Alignment: Comparing Similar Sequences
Match = 5, Mismatch = -4, Gap w= -7, K=2
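A Python sketch of banded global alignment, assuming K is the half-width of the band (only cells with |i - j| ≤ K are filled) and using the slide's scores (match = 5, mismatch = -4, gap w = -7). For clarity it still allocates the full matrix; a memory-saving version would store only the band:

```python
def banded_global_score(v, w, k, match=5, mismatch=-4, gap=-7):
    """Global alignment score computed only on cells with |i - j| <= k.

    Cells outside the band stay at -inf, so no path can leave the band.
    """
    n, m = len(v), len(w)
    if abs(n - m) > k:
        raise ValueError("the band must contain the sink (n, m)")
    NEG_INF = float("-inf")
    F = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    F[0][0] = 0
    for i in range(n + 1):
        for j in range(max(0, i - k), min(m, i + k) + 1):
            if i == 0 and j == 0:
                continue
            best = NEG_INF
            if i > 0 and j > 0:
                best = F[i - 1][j - 1] + (match if v[i - 1] == w[j - 1] else mismatch)
            if i > 0:
                best = max(best, F[i - 1][j] + gap)  # v[i-1] against a gap
            if j > 0:
                best = max(best, F[i][j - 1] + gap)  # w[j-1] against a gap
            F[i][j] = best
    return F[n][m]


print(banded_global_score("ACGTACGT", "ACGTTCGT", k=2))  # 31
print(banded_global_score("ACGTACGT", "ACGACGT", k=2))   # 28
```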
Heuristic multiple sequence alignment
• Currently, most practical methods are hierarchical methods
• For example: compute pairwise alignments, define a hierarchy from them, then progressively add sequences to the alignment
Multiple Alignment Induces Pairwise Alignments
Every multiple alignment induces pairwise alignments
x: AC-GCGG-C
y: AC-GC-GAG
z: GCCGC-GAG
Induces:
x: ACGCGG-C     x: AC-GCGG-C     y: AC-GCGAG
y: ACGC-GAG     z: GCCGC-GAG     z: GCCGCGAG
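Projecting a multiple alignment onto two of its rows is mechanical: keep those two rows and drop every column in which both have a gap. A minimal Python sketch (`induced_pair` is a hypothetical name) that reproduces the three induced alignments listed above:

```python
def induced_pair(msa, i, j):
    """Pairwise alignment of rows i and j induced by a multiple alignment.

    Columns where both rows have a gap carry no information and are dropped.
    """
    kept = [(a, b) for a, b in zip(msa[i], msa[j]) if not (a == "-" and b == "-")]
    return "".join(a for a, _ in kept), "".join(b for _, b in kept)


msa = ["AC-GCGG-C", "AC-GC-GAG", "GCCGC-GAG"]  # rows x, y, z
for i, j in [(0, 1), (0, 2), (1, 2)]:
    print(*induced_pair(msa, i, j), sep="\n", end="\n\n")
```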
Reverse Problem: Constructing Multiple Alignment from Pairwise Alignments
Given 3 arbitrary pairwise alignments:
x: ACGCTGG-C     x: AC-GCTGG-C     y: AC-GC-GAG
y: ACGC--GAC     z: GCCGCA-GAG     z: GCCGCAGAG
Can we construct a multiple alignment that induces them?
NOT ALWAYS: pairwise alignments may be inconsistent.
Inferring Multiple Alignment from Pairwise Alignments
• From an optimal multiple alignment, we can infer pairwise alignments between all pairs of sequences, but they are not necessarily optimal
• It is difficult to infer a "good" multiple alignment from optimal pairwise alignments between all sequences
Combining Optimal Pairwise Alignments into Multiple Alignment
Can combine pairwise alignments into multiple alignment
Can not combine pairwise alignments into multiple alignment
The STAR Alignment Method
• Using a pairwise alignment method (DP, etc.), find the sequence that is most similar to all the other sequences.
• Using this "best" sequence as the center (of a star, hence the name), align the other sequences to it following the "once a gap, always a gap" rule.
• For example, consider the following set of sequences:
S1 A T T G C C A T T
S2 A T G G C C A T T
S3 A T C C A A T T T T
S4 A T C T T C T T
S5 A C T G A C C
STAR Alignment - 2
• Now consider the following similarity matrix obtained from the pairwise comparisons of the sequences:
      S1    S2    S3    S4    S5    Sum of sim(Si, Sj) over j ≠ i
S1     -     7    -2     0    -3      2
S2     7     -    -2     0    -4      1
S3    -2    -2     -     0    -7    -11
S4     0     0     0     -    -3     -3
S5    -3    -4    -7    -3     -    -17
For this example S1 has the largest sum, so S1 is the center of the star.
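Choosing the center is just a row sum over the similarity matrix. A small Python sketch (`star_center` is a hypothetical name; the diagonal entries, shown as "-" in the table, are stored as 0 and skipped):

```python
def star_center(sim):
    """Pick the STAR center: the row with the largest sum of similarities
    to all other sequences (diagonal entries are skipped)."""
    n = len(sim)
    sums = [sum(sim[i][j] for j in range(n) if j != i) for i in range(n)]
    best = max(range(n), key=lambda i: sums[i])
    return best, sums


sim = [  # rows/columns S1..S5; diagonal stored as 0 and ignored
    [0, 7, -2, 0, -3],
    [7, 0, -2, 0, -4],
    [-2, -2, 0, 0, -7],
    [0, 0, 0, 0, -3],
    [-3, -4, -7, -3, 0],
]
center, sums = star_center(sim)
print(f"center = S{center + 1}, sums = {sums}")  # center = S1, sums = [2, 1, -11, -3, -17]
```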
STAR Alignment - 3
• Next we get the best alignment between S1 and the other sequences as follows:
S1 | A T T G C C A T T
S2 | A T G G C C A T T

S1 | A T T G C C A T T - -
S3 | A T C - C A A T T T T

S1 | A T T G C C A T T
S4 | A T C T T C - T T

S1 | A T T G C C A T T
S5 | A C T G A C C - -
STAR Alignment 4
• Next, to build the MSA we start with S1 and S2:
A T T G C C A T T
A T G G C C A T T
Adding S3 using the "once a gap, always a gap" rule gives:
A T T G C C A T T - -
A T G G C C A T T - -
A T C - C A A T T T T
Continuing in this fashion, we obtain the following MSA of all the sequences:
Star Alignment 5
A T T G C C A T T - -
A T G G C C A T T - -
A T C - C A A T T T T
A T C T T C - T T - -
A C T G A C C - - - -
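The merge step can be sketched in Python. This is a minimal sketch, assuming each pairwise alignment is supplied as a (center row, other row) pair of strings taken from the S1 alignments listed earlier; `merge_into_star` and `star_msa` are hypothetical names. When both the existing MSA center and the new pairwise center have a gap at the same point, the sketch reuses that column, which is one reasonable choice rather than a rule stated in the slides.

```python
def merge_into_star(msa, center_aln, other_aln):
    """Add one pairwise alignment (center vs. another sequence) to a star MSA.

    msa[0] is the center row, possibly already containing gaps. Under
    'once a gap, always a gap', existing MSA gaps are never removed, and a
    gap that the new pairwise alignment puts in the center opens a new column.
    """
    width, length = len(msa[0]), len(center_aln)
    rows = [[] for _ in range(len(msa) + 1)]
    i = j = 0
    while i < width or j < length:
        m = msa[0][i] if i < width else None
        c = center_aln[j] if j < length else None
        if m is not None and m == c:
            # Same center position (or a gap in both): reuse the column
            for r, row in enumerate(msa):
                rows[r].append(row[i])
            rows[-1].append(other_aln[j])
            i += 1
            j += 1
        elif m == "-":
            # Gap already in the MSA center: the new sequence gets a gap too
            for r, row in enumerate(msa):
                rows[r].append(row[i])
            rows[-1].append("-")
            i += 1
        else:
            # Gap only in the pairwise center: insert a new column of gaps
            assert c == "-", "center sequence differs between msa and center_aln"
            for r in range(len(msa)):
                rows[r].append("-")
            rows[-1].append(other_aln[j])
            j += 1
    return ["".join(r) for r in rows]


def star_msa(center, pairwise):
    """Merge every (center_aln, other_aln) pair into one MSA, center row first."""
    msa = [center]
    for center_aln, other_aln in pairwise:
        msa = merge_into_star(msa, center_aln, other_aln)
    return msa


pairs = [
    ("ATTGCCATT", "ATGGCCATT"),      # S1 vs S2
    ("ATTGCCATT--", "ATC-CAATTTT"),  # S1 vs S3
    ("ATTGCCATT", "ATCTTC-TT"),      # S1 vs S4
    ("ATTGCCATT", "ACTGACC--"),      # S1 vs S5
]
msa = star_msa("ATTGCCATT", pairs)
for row in msa:
    print(" ".join(row))
```

Run on the four S1 alignments, this reproduces the five-row MSA shown above.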
• Clearly, using the STAR method the time complexity is dominated by computing the pairwise alignments: for N sequences there are $O(N^2)$ pairs, and we take each pairwise alignment to cost $O(L^2)$ time, where again L is the length of each sequence.
STAR Alignment - 6
• Thus the time complexity for computing all pairwise alignments will be $O((NL)^2)$.
• We still have to consider the time it takes to merge the sequences into an MSA. If $L_{max}$ is an upper bound on the alignment length, then it will take $O(N^2 L_{max})$ time to merge the sequences into an MSA.
• Thus the time complexity for STAR is $O(N^2 L^2 + N^2 L_{max})$.
• Clearly, for large N and L this is far less than the time complexity for SP, which is $O((2L)^N N^2)$.
• Recall that exact alignment under the SP score is optimal whereas STAR is not; thus there is a trade-off between optimality and practicality.
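To get a feel for the gap between the two bounds, here is a Python sketch that evaluates both expressions with constants dropped. The function names are hypothetical, and the sequence and alignment lengths in the usage are arbitrary illustrative values.

```python
def star_cost(n, L, L_max):
    """STAR bound with constants dropped: N^2 L^2 + N^2 L_max."""
    return n**2 * L**2 + n**2 * L_max


def sp_dp_cost(n, L):
    """Exact SP dynamic-programming bound with constants dropped: (2L)^N N^2."""
    return (2 * L) ** n * n**2


if __name__ == "__main__":
    # Sequences of length 300, alignment length at most 330 (illustrative)
    for n in (3, 5, 10):
        print(f"N={n}: STAR ~ {star_cost(n, 300, 330):.2e}, SP ~ {sp_dp_cost(n, 300):.2e}")
```

Already at N = 10 the SP bound is tens of orders of magnitude larger, which is the practical reason heuristic methods dominate.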
Profile Representation of Multiple Alignment
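- A G G C T A T C A C C T G
T A G - C T A C C A - - - G
C A G - C T A C C A - - - G
C A G - C T A T C A C - G G
C A G - C T A T C G C - G G

     1   2   3   4   5   6   7   8   9  10  11  12  13  14
A    0   1   0   0   0   0   1   0   0  .8   0   0   0   0
C   .6   0   0   0   1   0   0  .4   1   0  .6  .2   0   0
G    0   0   1  .2   0   0   0   0   0  .2   0   0  .4   1
T   .2   0   0   0   0   1   0  .6   0   0   0   0  .2   0
-   .2   0   0  .8   0   0   0   0   0   0  .4  .8  .4   0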
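A profile is simply the per-column frequency of each symbol, gaps included. A minimal Python sketch (the function name `profile` is hypothetical) that rebuilds the table above from the five rows:

```python
from collections import Counter


def profile(rows, alphabet="ACGT-"):
    """Frequency of each symbol in each column of an alignment."""
    n = len(rows)
    counts = [Counter(col) for col in zip(*rows)]  # one Counter per column
    return {x: [c[x] / n for c in counts] for x in alphabet}


rows = [
    "-AGGCTATCACCTG",
    "TAG-CTACCA---G",
    "CAG-CTACCA---G",
    "CAG-CTATCAC-GG",
    "CAG-CTATCGC-GG",
]
P = profile(rows)
for symbol, freqs in P.items():
    print(symbol, " ".join(f"{f:.1f}" for f in freqs))
```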
So far we have aligned a sequence against a sequence.
Can we align a sequence against a profile?
Can we align a profile against a profile?
Aligning alignments
• Given two alignments, can we align them?
Alignment 1:
x GGGCACTGCAT
y GGTTACGTC--
z GGGAACTGCAG
Alignment 2:
w GGACGTACC--
v GGACCT-----
• Hint: use alignment of corresponding profiles
Combined alignment:
x GGGCACTGCAT
y GGTTACGTC--
z GGGAACTGCAG
w GGACGTACC--
v GGACCT-----
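Following the hint, two alignments can be aligned by running Needleman-Wunsch over their profile columns. This is a minimal sketch under stated assumptions: the column score is the expected pair score $\sum_a \sum_b p(a)\,q(b)\,s(a,b)$ with illustrative values match = +1, mismatch = -1, gap = -1 (not ClustalW's actual scheme), and all function names are hypothetical. Whole columns move together during traceback, so each input alignment is kept intact and only gains all-gap columns.

```python
from collections import Counter

GAP_COL = {"-": 1.0}  # profile of an inserted all-gap column


def column_profile(rows):
    """Per-column symbol frequencies of an alignment (equal-length rows)."""
    n = len(rows)
    return [{ch: k / n for ch, k in Counter(col).items()} for col in zip(*rows)]


def pair_score(a, b, match=1, mismatch=-1, gap=-1):
    # Illustrative scores (assumption); gap/gap pairs cost nothing
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def col_score(p, q):
    # Expected pair score of two profile columns: sum_a sum_b p(a) q(b) s(a, b)
    return sum(pa * qb * pair_score(a, b) for a, pa in p.items() for b, qb in q.items())


def align_alignments(A, B):
    """Global alignment of two alignments via their profiles."""
    P, Q = column_profile(A), column_profile(B)
    n, m = len(P), len(Q)
    S = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = S[i - 1][0] + col_score(P[i - 1], GAP_COL)
    for j in range(1, m + 1):
        S[0][j] = S[0][j - 1] + col_score(GAP_COL, Q[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(
                S[i - 1][j - 1] + col_score(P[i - 1], Q[j - 1]),
                S[i - 1][j] + col_score(P[i - 1], GAP_COL),
                S[i][j - 1] + col_score(GAP_COL, Q[j - 1]),
            )
    # Traceback: emit whole columns, padding the other alignment with gaps
    cols, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + col_score(P[i - 1], Q[j - 1]):
            cols.append([r[i - 1] for r in A] + [r[j - 1] for r in B])
            i, j = i - 1, j - 1
        elif i > 0 and S[i][j] == S[i - 1][j] + col_score(P[i - 1], GAP_COL):
            cols.append([r[i - 1] for r in A] + ["-"] * len(B))
            i -= 1
        else:
            cols.append(["-"] * len(A) + [r[j - 1] for r in B])
            j -= 1
    cols.reverse()
    return ["".join(col[k] for col in cols) for k in range(len(A) + len(B))]


A = ["GGGCACTGCAT", "GGTTACGTC--", "GGGAACTGCAG"]  # x, y, z
B = ["GGACGTACC--", "GGACCT-----"]                # w, v
for row in align_alignments(A, B):
    print(row)
```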
Multiple Alignment: Greedy Approach
• Choose the most similar pair of strings and combine them into a profile, thereby reducing the alignment of k sequences to an alignment of k-1 sequences/profiles. Repeat.
• This is a heuristic greedy method:
k sequences:
u1 = ACGTACGTACGT…
u2 = TTAATTAATTAA…
u3 = ACTACTACTACT…
…
uk = CCGGCCGGCCGG
k-1 sequences/profiles (u1 and u3 merged into a profile):
u1 = ACg/tTACg/tTACg/cT…
u2 = TTAATTAATTAA…
…
uk = CCGGCCGGCCGG…
Greedy Approach: Example
• Consider these 4 sequences:
s1 GATTCA
s2 GTCTGA
s3 GATATT
s4 GTCAGC
Greedy Approach: Example (cont’d)
• There are $\binom{4}{2} = 6$ possible pairwise alignments:
s2 GTCTGA
s4 GTCAGC (score = 2)

s1 GAT-TCA
s2 G-TCTGA (score = 1)

s1 GAT-TCA
s3 GATAT-T (score = 1)

s1 GATTCA--
s4 G-T-CAGC (score = 0)

s2 G-TCTGA
s3 GATAT-T (score = -1)

s3 GAT-ATT
s4 G-TCAGC (score = -1)
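The scores listed above can be reproduced with a short Python sketch, assuming match = +1, mismatch = -1 and gap = -1. These values are inferred because they reproduce every score on the slide; the slide does not state them. The sketch scores the alignments exactly as given rather than recomputing optimal ones, then picks the closest pair; `aln_score` is a hypothetical name.

```python
def aln_score(a, b, match=1, mismatch=-1, gap=-1):
    """Score an existing pairwise alignment column by column."""
    total = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            total += gap
        else:
            total += match if x == y else mismatch
    return total


alignments = {
    ("s2", "s4"): ("GTCTGA", "GTCAGC"),
    ("s1", "s2"): ("GAT-TCA", "G-TCTGA"),
    ("s1", "s3"): ("GAT-TCA", "GATAT-T"),
    ("s1", "s4"): ("GATTCA--", "G-T-CAGC"),
    ("s2", "s3"): ("G-TCTGA", "GATAT-T"),
    ("s3", "s4"): ("GAT-ATT", "G-TCAGC"),
}
scores = {pair: aln_score(a, b) for pair, (a, b) in alignments.items()}
closest = max(scores, key=scores.get)  # the greedy step merges this pair first
print(scores)
print("closest pair:", closest)  # closest pair: ('s2', 's4')
```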
Greedy Approach: Example (cont’d)
s2 and s4 are closest; combine them:
s2 GTCTGA
s4 GTCAGC
s2,4 GTCt/aGa/c (profile)
New set of 3 sequences:
s1 GATTCA
s3 GATATT
s2,4 GTCt/aGa/c
Progressive Alignment
• Progressive alignment is a variation of the greedy algorithm with a somewhat more intelligent strategy for choosing the order of alignments.
• Progressive alignment works well for close sequences, but deteriorates for distant sequences
• Gaps in the consensus string are permanent
• Profiles are used to compare sequences
ClustalW
• Popular multiple alignment tool today
• ‘W’ stands for ‘weighted’ (different parts of alignment are weighted differently).
• Three-step process
1.) Construct pairwise alignments
2.) Build Guide Tree
3.) Progressive Alignment guided by the tree
The CLUSTALW Algorithm
• Step 1: Determine all pairwise alignments between sequences and the degree of similarity between each pair.
• Step 2: Construct a similarity tree*.
• Step 3: Combine the alignments, starting from the most closely related groups and ending with the most distantly related groups; as in STAR, we use the "once a gap, always a gap" rule.
• * The PILEUP program is similar to CLUSTALW but uses a different method for producing the similarity tree.
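Step 2 can be illustrated with a small Python sketch that builds a guide tree by repeatedly joining the two closest groups. Note the swap: ClustalW builds its guide tree with neighbor-joining, whereas this sketch uses simpler average-linkage (UPGMA-style) clustering. `guide_tree` and the distance values in the usage are hypothetical. The nesting of the returned tuple gives the order in which progressive alignment would merge groups.

```python
def guide_tree(labels, dist):
    """Average-linkage clustering over a symmetric distance matrix.

    Returns a nested tuple; inner tuples are groups joined earlier.
    """
    clusters = {i: (name, 1) for i, name in enumerate(labels)}  # id -> (tree, size)
    d = {(i, j): dist[i][j] for i in range(len(labels)) for j in range(i + 1, len(labels))}
    next_id = len(labels)
    while len(clusters) > 1:
        a, b = min(d, key=d.get)  # closest pair of clusters, a < b
        (ta, na), (tb, nb) = clusters.pop(a), clusters.pop(b)
        del d[(a, b)]
        for c in clusters:
            # Distance to the new group: size-weighted average of its parts
            dac = d.pop((min(a, c), max(a, c)))
            dbc = d.pop((min(b, c), max(b, c)))
            d[(c, next_id)] = (na * dac + nb * dbc) / (na + nb)
        clusters[next_id] = ((ta, tb), na + nb)
        next_id += 1
    return next(iter(clusters.values()))[0]


D = [  # hypothetical pairwise distances, e.g. derived from Step 1 similarities
    [0, 1, 4, 5],
    [1, 0, 4, 5],
    [4, 4, 0, 2],
    [5, 5, 2, 0],
]
print(guide_tree(["a", "b", "c", "d"], D))  # (('a', 'b'), ('c', 'd'))
```

Here a and b are aligned first, then c and d, and finally the two resulting alignments are aligned to each other.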
Heuristic Multiple Alignment Methods
Clustal W progressive multiple alignment
• Align two sequences to each other
• Align a sequence to an existing alignment
• Align two alignments to each other
Multiple Alignments: Scoring
• As in the pairwise case, not all MSAs are equally good.
• We need a scoring method for determining when one MSA is better than another.
• Number of matches (multiple longest common subsequence score)
• Entropy score
• Sum of pairs (SP-Score)
Multiple LCS Score
• A column is a "match" if all the letters in the column are the same
• Only good for very similar sequences
A A A
A A A
A A T
A T C
(Here only column 1 is a match, so the score is 1.)
Entropy
• Define frequencies for the occurrence of each letter in each column of the multiple alignment
• pA = 1, pT = pG = pC = 0 (1st column)
• pA = 0.75, pT = 0.25, pG = pC = 0 (2nd column)
• pA = 0.50, pT = 0.25, pC = 0.25, pG = 0 (3rd column)
• Compute the entropy of each column
A A A
A A A
A A T
A T C
Entropy: Example
Best case
Worst case
Multiple Alignment: Entropy Score
Entropy for a multiple alignment is the sum of the entropies of its columns:
$-\sum_{\text{columns}} \sum_{X \in \{A,T,G,C\}} p_X \log p_X$
Entropy of an Alignment: Example
Column entropy (logarithms are base 2, and 0·log 0 is taken as 0): -( pA log pA + pC log pC + pG log pG + pT log pT )
• Column 1 = -[1·log(1) + 0·log 0 + 0·log 0 + 0·log 0] = 0
• Column 2 = -[(1/4)·log(1/4) + (3/4)·log(3/4) + 0·log 0 + 0·log 0] = -[(1/4)·(-2) + (3/4)·(-0.415)] = +0.811
• Column 3 = -[(1/4)·log(1/4) + (1/4)·log(1/4) + (1/4)·log(1/4) + (1/4)·log(1/4)] = -4·[(1/4)·(-2)] = +2.0
• Alignment entropy = 0 + 0.811 + 2.0 = +2.811
A A A
A C C
A C G
A C T
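The entropy score can be checked with a short Python sketch using base-2 logarithms; symbols absent from a column simply do not appear in the sum, which implements the 0·log 0 = 0 convention. Unlike the formula, which sums over X = A, T, G, C, this sketch counts whatever symbols occur, so a gap character would be treated as a fifth letter (an assumption). Function names are hypothetical.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Entropy -sum_X p_X log2 p_X of one alignment column."""
    n = len(column)
    probs = [k / n for k in Counter(column).values()]  # only letters that occur
    return sum(-p * log2(p) for p in probs)


def alignment_entropy(rows):
    """Sum of column entropies; zip(*rows) walks the alignment column by column."""
    return sum(column_entropy(col) for col in zip(*rows))


rows = ["AAA", "ACC", "ACG", "ACT"]
print([round(column_entropy(c), 3) for c in zip(*rows)])  # [0.0, 0.811, 2.0]
print(round(alignment_entropy(rows), 3))                  # 2.811
```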
Sum of Pairs Score (SP-Score)
• Consider the pairwise alignment of sequences $a_i$ and $a_j$ imposed by a multiple alignment of k sequences
• Denote the score of this induced (not necessarily optimal) pairwise alignment as $s^*(a_i, a_j)$
• Sum up the pairwise scores for a multiple alignment:
$s(a_1, \dots, a_k) = \sum_{i<j} s^*(a_i, a_j)$
Computing SP-Score
Aligning 4 sequences: 6 pairwise alignments
Given a1,a2,a3,a4:
s(a1…a4) = Σs*(ai,aj) = s*(a1,a2) + s*(a1,a3) + s*(a1,a4) + s*(a2,a3) + s*(a2,a4) + s*(a3,a4)
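SP-Score: Example
Alignment a1 … ak (here k = 3):
A T G - C - A A T
A - G - C A T A T
A T C C C A T T T
To calculate each column, sum the scores of all pairs of sequences in that column (match = 1, mismatch = -μ):
Column 1 (A, A, A): pairs score 1 + 1 + 1, so Score = 3
Column 3 (G, G, C): pairs score 1 + (-μ) + (-μ), so Score = 1 - 2μ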
SP-Score: Example
• Consider aligning the following 4 protein sequences:
S1 = AQPILLLV
S2 = ALRLL
S3 = AKILLL
S4 = CPPVLILV
• Next consider the following MSA matrix M:
A Q P I L L L V
A L R - L L - -
A K - I L L L -
C P P V L I L V
• Assume s(match) = 1 , s(mismatch) = -1 , and s(gap) = -2 ,
also assume s(-, -) = 0 to prevent the double counting of gaps.
• Then the SP score for the 4th column of M would be
SP(m4) = SP(I, -, I, V)
= s(I,-) + s(I,I) + s(I,V) + s(-,I) + s(-, V) + s(I,V)
= -2 + 1 + (-1) + (-2) + (-2) +(-1)
= -7
• To find SP(M) we would find the score of each column mi and then sum all the SP(mi) scores to get the score of M.
• To find the optimal score using this method we need to consider
all possible MSA matrices. We say more about this later.
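The SP computation above is easy to mechanize. A Python sketch using the stated scores (match = 1, mismatch = -1, gap = -2, s(-, -) = 0); the function names are hypothetical:

```python
from itertools import combinations


def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """s(x, y) as defined above, with s(-, -) = 0 so double gaps are not counted."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def sp_column(column):
    """SP score of one column: sum of s over all unordered pairs of rows."""
    return sum(pair_score(a, b) for a, b in combinations(column, 2))


def sp_score(rows):
    """SP(M): sum of the column scores SP(m_i)."""
    return sum(sp_column(col) for col in zip(*rows))


M = ["AQPILLLV", "ALR-LL--", "AK-ILLL-", "CPPVLILV"]
print(sp_column([row[3] for row in M]))  # -7, the 4th column (I, -, I, V)
print(sp_score(M))                       # -24
```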
Some Problems with the SP Score
• Consider column 1 of our example, i.e. A, A, A, C. For this column we get
SP(m1) = SP(A,A,A,C)
= 1 + 1 + (-1) + 1 + (-1) + (-1)
= 0
whereas if we had A, A, A, A we would get a score of
SP(A,A,A,A) = 1+1+1+1+1+1 = 6; thus we get a difference of 6 for what could be explained by a single mutation.
• The SP method tends to overweight the influence of mutations
• The major problem with the SP method is that finding the optimal MSA is very time consuming.