Recent Progress in Multiple Sequence Alignments: A Survey
Cédric Notredame (21/04/23)
Recent Progress in Multiple Sequence Alignments: A Survey
Cédric Notredame
Our Scope
What Are the Existing Methods?
How Do They Work?
-Assembly Algorithms
-Weighting Schemes
When Do They Work?
Which Future?
Outline
-Introduction
-A taxonomy of the existing Packages
-A few algorithms…
-Performance Comparison using BaliBase
Introduction
What Is A Multiple Sequence Alignment?
An MSA is a MODEL
It Indicates the RELATIONSHIP between residues of different sequences.
It REVEALS
-Similarities
-Inconsistencies
LIKE ANY MODEL
chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP

chite AATAKQNYIRALQEYERNGG-
wheat ANKLKGEYNKAIAAYNKGESA
trybr AEKDKERYKREM---------
mouse AKDDRIRYDNEMKSWEEQMAE
How Can I Use A Multiple Sequence Alignment?
Extrapolation
Motifs/Patterns
Phylogeny
Profiles
Structure Prediction
Multiple Alignments Are CENTRAL to MOST Bioinformatics Techniques.
How Can I Use A Multiple Sequence Alignment?
Multiple Alignment Is the Most INTEGRATIVE Method Available Today.
We Need MSA to INCORPORATE existing DATA
Why Is It Difficult To Compute A multiple Sequence Alignment?
A CROSSROAD PROBLEM
BIOLOGY: What Is A Good Alignment?
COMPUTATION: What Is THE Good Alignment?
chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
Why Is It Difficult To Compute A multiple Sequence Alignment ?
BIOLOGY
CIRCULAR PROBLEM....
Good Sequences
Good Alignment
COMPUTATION
A Taxonomy of Multiple Sequence Alignment Methods
Grouping According to the assembly Algorithm
Simultaneous As Opposed to Progressive
Exact As Opposed to Heuristic
Stochastic As Opposed to Deterministic
Iterative As Opposed to Non Iterative
[Simultaneous: they simultaneously use all the information]
[Heuristics: cut corners, like Blast vs SW (Smith-Waterman)]
[Heuristics: do not guarantee an optimal solution]
[Stochastic: contain an element of randomness]
[Stochastic: Example of a Monte Carlo Surface estimation ]
[Iterative: Most stochastic methods are iterative]
[Iterative: run the same algorithm many times]
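The Monte Carlo surface estimation mentioned above can be sketched in a few lines of Python. This is only an illustration of what "an element of randomness" means: the unit disc, the function name and the seed are assumptions, not something taken from the slides.

```python
import math
import random


def estimate_disc_area(n_points: int, seed: int = 0) -> float:
    """Estimate the area of the unit disc by random sampling.

    Points are drawn uniformly in the square [-1, 1] x [-1, 1] (area 4);
    the fraction that falls inside the disc approximates area / 4.
    """
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_points):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_points


if __name__ == "__main__":
    # More points give an estimate closer to pi, but never a guarantee.
    for n in (100, 10_000, 200_000):
        print(n, estimate_disc_area(n), "target:", math.pi)
```

Running the same estimator again with more points (or another seed) is also a small example of an iterative, stochastic procedure: the answer improves on average but is never guaranteed to be exact.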
[Figure: methods grouped by assembly algorithm. Group labels: Iterative, Progressive, Simultaneous, Non tree based, GAs, HMMs. Methods shown: Iteralign, Prrp, SAM, HMMer, SAGA, GA, Clustal, Dialign, T-Coffee, MSA, POA, OMA, Praline, MAFFT, DCA, Combalign]
[Figure: methods grouped by assembly algorithm, with a Stochastic label next to SAGA. Group labels: Iterative, Progressive, Simultaneous, Stochastic. Methods shown: Iteralign, Prrp, SAM, HMMer, GA, SAGA, Clustal, Dialign, T-Coffee, MSA, POA, OMA, Praline, MAFFT, DCA, Combalign]
NEARLY EVERY OPTIMISATION ALGORITHM HAS BEEN APPLIED TO THE MSA PROBLEM!!!
Grouping According to the Objective Function
Scoring an Alignment: Evolutionary Based Methods
BIOLOGY: How many events separate my sequences?
Such an evaluation relies on a biological model.
COMPUTATION: Every position must be independent
REAL Tree
Model: ALL the sequences evolved from the same ancestor
[Figure: tree with leaves A, A, A, C, C (column AAACC)]
Tree: Cost=1
PROBLEM: We do not know the true tree
STAR Tree
Model: ALL the sequences have the same ancestor
[Figure: star tree with leaves A, A, A, C, C (column AAACC) around a central A]
Star Tree: Cost=2
PROBLEM: the star tree is phylogenetically wrong
Sums of Pairs
Model: Every sequence is the ancestor of every other sequence
[Figure: every pair among the residues A, A, A, C, C (column AAACC) is connected]
Sums of Pairs: Cost=6
PROBLEM:
-Over-estimation of the mutation costs
-Requires a weighting scheme
$S(m_i) = \sum_{k<l} s(m_i^k, m_i^l)$
[s(a,b): matrix]
[i: column i]
[k, l: seq index]
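The three costs quoted for the column AAACC (tree: 1, star tree: 2, sums of pairs: 6) can be checked with a small Python sketch. It assumes a unit cost per mismatch; the function names are illustrative and not taken from any package.

```python
from collections import Counter
from itertools import combinations


def best_tree_cost(column: str) -> int:
    """Fewest changes on a tree that groups identical residues together:
    one change per additional distinct residue."""
    return len(set(column)) - 1


def star_cost(column: str) -> int:
    """Star tree: every residue derives directly from one central residue.
    The cheapest centre is the most frequent residue of the column."""
    most_frequent = Counter(column).most_common(1)[0][1]
    return len(column) - most_frequent


def sum_of_pairs_cost(column: str) -> int:
    """Sums of pairs: every pair of sequences is compared,
    and each mismatching pair costs 1."""
    return sum(1 for a, b in combinations(column, 2) if a != b)


if __name__ == "__main__":
    column = "AAACC"
    print(best_tree_cost(column), star_cost(column), sum_of_pairs_cost(column))
```

The single A to C event is counted once on the tree, twice on the star tree and six times (3 x 2 pairs) by the sums of pairs, which is the over-estimation the slide refers to.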
Sums of Pairs: Some of its Limitations (Durbin, p140)
[Column of N Leucines: L L L L L]
Cost = 5*N*(N-1)/2 [5: Leucine vs Leucine with Blosum50]
[Same column with one Leucine replaced by a Glycine: glycine effect]
Cost = 5*N*(N-1)/2 - (5)*(N-1) + (-4)*(N-1)
Cost = 5*N*(N-1)/2 - (9)*(N-1)
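Sums of Pairs: Some of its Limitations (Durbin, p140)
[Column of N Leucines with one Glycine]
Delta = 9*(N-1)
$\frac{\Delta}{\mathrm{Cost}} = \frac{2 \cdot 9 \cdot (N-1)}{5 \cdot N \cdot (N-1)} = \frac{2 \cdot 9}{5 \cdot N}$
[Plot: relative Delta as a function of N]
Conclusion: The more Leucines, the less expensive it gets to add a Glycine to the column...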
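The glycine effect can be tabulated directly. The sketch below uses the two Blosum50 values quoted on these slides (Leucine vs Leucine = 5, Leucine vs Glycine = -4); the function names are illustrative.

```python
LEU_LEU = 5   # Blosum50 score, Leucine vs Leucine
LEU_GLY = -4  # Blosum50 score, Leucine vs Glycine


def all_leucine_score(n: int) -> int:
    """Sums-of-pairs score of a column of n Leucines."""
    return LEU_LEU * n * (n - 1) // 2


def one_glycine_score(n: int) -> int:
    """Same column with one Leucine replaced by a Glycine:
    (n - 1) Leu/Leu pairs are lost and (n - 1) Leu/Gly pairs appear."""
    return all_leucine_score(n) - LEU_LEU * (n - 1) + LEU_GLY * (n - 1)


def relative_drop(n: int) -> float:
    """Score lost to the Glycine, relative to the all-Leucine score."""
    full = all_leucine_score(n)
    return (full - one_glycine_score(n)) / full


if __name__ == "__main__":
    for n in (3, 5, 10, 50, 100):
        print(n, all_leucine_score(n), one_glycine_score(n), round(relative_drop(n), 3))
```

The relative drop equals 18 / (5 * N), so it shrinks as the column grows, even though intuition says a single Glycine among many Leucines should look worse, not better.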
Entropy Based Functions
Model: Minimize the entropy (variety) in each Column
AAACC
PROBLEM:
-Requires a simultaneous alignment
-Assumes independent sequences
$c_{ia} = \sum_j \delta(m_i^j, a)$ [number of Alanine (a) in column i; $\delta = 1$ when the residue of sequence j in column i is a]
$S(m_i) = -\sum_a c_{ia} \log P_{ia}$ [Score of column i] [a: alphabet]
[P can incorporate pseudocounts]
S=0 if the column is conserved
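A minimal Python sketch of an entropy column score, reading the score as minus the sum over residues a of (count of a) * log(frequency of a). The pseudocount handling and the default alphabet are assumptions made for the example.

```python
import math
from collections import Counter


def column_entropy_score(column: str, pseudocount: float = 0.0,
                         alphabet: str = "ACGT") -> float:
    """Entropy score of one alignment column (lower is better).

    c[a] is the number of times residue a occurs in the column and
    P[a] its (optionally pseudocount-smoothed) frequency.
    """
    counts = Counter(column)
    symbols = set(alphabet) | set(counts)
    total = len(column) + pseudocount * len(symbols)
    score = 0.0
    for a in symbols:
        c = counts.get(a, 0)
        if c == 0:
            continue  # residues absent from the column contribute nothing
        p = (c + pseudocount) / total
        score -= c * math.log(p)
    return score


if __name__ == "__main__":
    for col in ("AAAAA", "AAAAC", "AAACC"):
        print(col, round(column_entropy_score(col), 3))
```

Without pseudocounts a fully conserved column scores exactly 0, and the score grows as the column becomes more varied.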
Consistency based Functions
Model: Maximise the consistency (agreement) with a list of constraints (alignments)
PROBLEM:
-Requires a list of constraints
AAACC
$S(m_i) = \sum_{k<l} \mathrm{Exists}(m_i^k, m_i^l)$ [k and l are sequences, i is a column]
$\mathrm{Exists}(m_i^k, m_i^l) = 1$ [the two residues are found aligned in the list of constraints]
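A minimal Python sketch of a consistency score for one column. The data layout is an assumption made for the example: a residue is a (sequence name, position) tuple, a gap is None, and the list of constraints is a set of unordered residue pairs.

```python
from itertools import combinations


def consistency_score(column, constraints) -> int:
    """Count the residue pairs of a column that are also found aligned
    in the list of constraints."""
    score = 0
    for r1, r2 in combinations(column, 2):
        if r1 is None or r2 is None:
            continue  # gaps are never part of a constraint
        if frozenset((r1, r2)) in constraints:
            score += 1
    return score


if __name__ == "__main__":
    # Hypothetical constraints, e.g. collected from pairwise alignments.
    constraints = {
        frozenset({("s1", 0), ("s2", 0)}),
        frozenset({("s1", 0), ("s3", 1)}),
    }
    print(consistency_score([("s1", 0), ("s2", 0), ("s3", 0)], constraints))
    print(consistency_score([("s1", 0), ("s2", 0), ("s3", 1)], constraints))
```

The second column agrees with more of the constraints than the first, so a consistency based method would prefer it.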
[Figure: methods grouped by objective function]
Consistency Based: Iteralign, Dialign, T-Coffee, Praline, Combalign
Weighted Sums of Pairs: Prrp, Clustal, POA, MSA, MAFFT, OMA, DCA, SAGA
Entropy: SAM, HMMer, GIBBS
A few Multiple Sequence Alignment Algorithms
A Few Algorithms
MSA and DCA
ClustalW
Dialign II
Prrp
SAGA
GIBBS Sampler
MAFFT
POA
Simultaneous: MSA and DCA
Simultaneous Alignments : MSA
1) Set Bounds on each pair of sequences (Carrillo and Lipman)
2) Compute the MALN (multiple alignment) within the Hyperspace
-Few Small Closely Related Sequences.
-Do Well When They Can Run.
-Memory and CPU hungry
MSA: the Carrillo and Lipman bounds
S(
chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
) =
S(
chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
) +
S(
chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
) + …
[Pairwise projection of sequences k and l]
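MSA: the Carrillo and Lipman bounds
$a(k,l)$ = score of the projection of k and l in the optimal MSA
$\hat{a}(k,l)$ = score of the optimal alignment of k and l
$\sum_{x<y} a(x,y)$ = score of the complete multiple alignment
$a(k,l) \le \hat{a}(k,l)$, $a(k,m) \le \hat{a}(k,m)$
[Figure: each $a(k,l)$ lies between a Lower bound (?) and the Upper bound $\hat{a}(k,l)$]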
MSA: the Carrillo and Lipman bounds
LM: a lower bound for the complete MSA
$a(k,l) \ge LM + \hat{a}(k,l) - \sum_{x<y} \hat{a}(x,y)$
$LM \le \sum_{x<y} \hat{a}(x,y) - (\hat{a}(k,l) - a(k,l))$
$LM + \hat{a}(k,l) - \sum_{x<y} \hat{a}(x,y) \le a(k,l) \le \hat{a}(k,l)$
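MSA: the Carrillo and Lipman bounds
LM: can be measured on ANY heuristic alignment
$LM = \sum_{x<y} \ddot{a}(x,y)$ [$\ddot{a}(k,l)$: score of the projection of k and l in the heuristic alignment]
$LM + \hat{a}(k,l) - \sum_{x<y} \hat{a}(x,y) \le a(k,l) \le \hat{a}(k,l)$
The better LM, the tighter the bounds…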
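A small Python sketch of how the lower bound on each pairwise projection follows from a heuristic alignment. The pair scores in the example are made-up numbers, and the function name is illustrative; the only relation used is the bound stated on these slides (LM plus the optimal score of the pair, minus the sum of all optimal pairwise scores).

```python
def carrillo_lipman_lower_bounds(optimal_pair_scores, heuristic_pair_scores):
    """Return, for every pair (k, l), the lowest score its projection can
    have in the optimal multiple alignment.

    optimal_pair_scores:   score of the optimal pairwise alignment of each pair
    heuristic_pair_scores: score of each pairwise projection of a heuristic MSA
    """
    lm = sum(heuristic_pair_scores.values())      # LM: heuristic MSA score
    total_optimal = sum(optimal_pair_scores.values())
    return {
        pair: lm + best - total_optimal
        for pair, best in optimal_pair_scores.items()
    }


if __name__ == "__main__":
    # Hypothetical scores for three sequences A, B, C.
    optimal = {("A", "B"): 10, ("A", "C"): 8, ("B", "C"): 9}
    heuristic = {("A", "B"): 9, ("A", "C"): 6, ("B", "C"): 8}
    for pair, low in carrillo_lipman_lower_bounds(optimal, heuristic).items():
        print(pair, "projection score must lie in", (low, optimal[pair]))
```

A better heuristic alignment raises LM, which raises every lower bound and shrinks the region of the hyperspace that has to be explored.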
MSA: the Carrillo and Lipman bounds
[Figure: two M x N dynamic programming matrices, Forward and Backward]
Forward: Best(0-i, 0-j)
Backward: Best(M-i, N-j)
Best(0-i, 0-j) + Best(M-i, N-j)
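The Forward and Backward matrices named on this slide can be combined to score the best pairwise alignment that is forced through a given cell (i, j). The Python sketch below assumes a simple match/mismatch/linear-gap scheme; the scores and function names are illustrative and are not those of the MSA program.

```python
def forward_matrix(x: str, y: str, match=1, mismatch=-1, gap=-2):
    """f[i][j] = best global alignment score of x[:i] with y[:j]."""
    m, n = len(x), len(y)
    f = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        f[i][0] = i * gap
    for j in range(1, n + 1):
        f[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            f[i][j] = max(f[i - 1][j - 1] + s, f[i - 1][j] + gap, f[i][j - 1] + gap)
    return f


def through_scores(x: str, y: str):
    """t[i][j] = forward score up to (i, j) + backward score from (i, j),
    i.e. the best score of any alignment passing through cell (i, j)."""
    m, n = len(x), len(y)
    fwd = forward_matrix(x, y)
    # The backward matrix is a forward matrix computed on reversed sequences.
    bwd = forward_matrix(x[::-1], y[::-1])
    return [[fwd[i][j] + bwd[m - i][n - j] for j in range(n + 1)]
            for i in range(m + 1)]


if __name__ == "__main__":
    t = through_scores("GATTACA", "GCATGCA")
    best = t[0][0]  # every alignment passes through the two corner cells
    lower_bound = best - 2  # hypothetical bound for the example
    kept = sum(score >= lower_bound for row in t for score in row)
    print("optimal score:", best, "cells kept:", kept, "of", len(t) * len(t[0]))
```

Cells whose through-score falls below the lower bound of the pair can be discarded, which is how the bounds restrict the search.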
Simultaneous Alignments : MSA
1) Set Bounds on each pair of sequences (Carrillo and Lipman)
2) Compute the MALN (multiple alignment) within the Hyperspace
-Few Small Closely Related Sequences.
-Do Well When They Can Run.
-Memory and CPU hungry
Simultaneous Alignments : DCA
-Few Small Closely Related Sequences, but less limited than MSA
-Does Well When It Can Run.
-Memory and CPU hungry, but less than MSA
Simultaneous With a New Sequence Representation:
POA - Partial Order Graph
POA
POA makes it possible to represent complex relationships:
-domain deletion
-domain inversions
Progressive: ClustalW
Progressive Alignment: ClustalW
Feng and Doolittle, 1988; Taylor 198ç
Clustering
Dynamic Programming Using A Substitution Matrix
Progressive Alignment: ClustalW
Tree based Alignment : Recursive Algorithm
Align (Node N)
{
    if (N->left_child is a Node) A1 = Align (N->left_child)
    else if (N->left_child is a Sequence) A1 = N->left_child
    if (N->right_child is a Node) A2 = Align (N->right_child)
    else if (N->right_child is a Sequence) A2 = N->right_child
    return dp_alignment (A1, A2)
}
[Figure: guide tree with leaves A, B, C, D, E, F, G]
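The recursive procedure above can be turned into a short runnable Python sketch. The `Node` class and the scoring scheme inside `dp_alignment` (match, mismatch and gap values, sums of pairs between columns) are assumptions chosen to keep the example small; they are not ClustalW's actual profile scoring.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Node:
    left: Union["Node", str]    # a sub-tree or a raw sequence
    right: Union["Node", str]


def column_score(col_a, col_b, match=1, mismatch=-1, gap=-1) -> int:
    """Sums-of-pairs score between two alignment columns."""
    total = 0
    for a in col_a:
        for b in col_b:
            if a == "-" and b == "-":
                continue
            total += gap if "-" in (a, b) else (match if a == b else mismatch)
    return total


def dp_alignment(a1: List[str], a2: List[str], gap: int = -2) -> List[str]:
    """Align two alignments column by column with dynamic programming."""
    cols1, cols2 = list(zip(*a1)), list(zip(*a2))
    m, n = len(cols1), len(cols2)
    gap_cost = gap * len(a1) * len(a2)  # a whole column facing a gap column
    f = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        f[i][0] = i * gap_cost
    for j in range(1, n + 1):
        f[0][j] = j * gap_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            f[i][j] = max(f[i - 1][j - 1] + column_score(cols1[i - 1], cols2[j - 1]),
                          f[i - 1][j] + gap_cost,
                          f[i][j - 1] + gap_cost)
    # Traceback from the bottom-right corner.
    out1, out2, i, j = [], [], m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and f[i][j] == f[i - 1][j - 1] + column_score(cols1[i - 1], cols2[j - 1]):
            out1.append(cols1[i - 1]); out2.append(cols2[j - 1]); i -= 1; j -= 1
        elif i > 0 and f[i][j] == f[i - 1][j] + gap_cost:
            out1.append(cols1[i - 1]); out2.append(("-",) * len(a2)); i -= 1
        else:
            out1.append(("-",) * len(a1)); out2.append(cols2[j - 1]); j -= 1
    columns = [c1 + c2 for c1, c2 in zip(reversed(out1), reversed(out2))]
    return ["".join(row) for row in zip(*columns)]


def align(node: Union[Node, str]) -> List[str]:
    """Recursive tree-based alignment: leaves are sequences, every internal
    node aligns the alignments of its two children."""
    if isinstance(node, str):
        return [node]
    return dp_alignment(align(node.left), align(node.right))


if __name__ == "__main__":
    guide_tree = Node(Node("GATTACA", "GATACA"), "GCTTACA")
    print("\n".join(align(guide_tree)))
```

Gaps introduced at a lower node are never revisited higher up in the tree, which is why the result depends on the order imposed by the guide tree.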
Progressive Alignment : ClustalW
-Depends on the ORDER of the sequences (Tree).
-Depends on the CHOICE of the sequences.
-Depends on the PARAMETERS:
•Substitution Matrix.
•Penalties (Gop, Gep).
•Sequence Weight.
•Tree making Algorithm.
Progressive Alignment : ClustalW Weighting
Weighting Within ClustalW
Progressive Alignment : ClustalW GOP
Position Specific GOP
Progressive Alignment : ClustalW
ClustalW is the most Popular Method
-Fast
-Greedy Heuristic (No Guarantee).
-Scales Well: $N^3$, $N^2 L^2$
Progressive Alignment With a Heuristic DP:
MAFFT
Progressive and Consistency Based: Dialign II
Dialign II
1) Identify the best chain of segments on each pair of sequences. Assign a P-value to each Segment Pair.
2) Re-evaluate each segment pair according to its consistency with the others.
3) Assemble the alignment according to the segment pairs.
Dialign II
-May Align Too Few Residues
-No Gap Penalty
-Does well with ESTs
Progressive and Consistency Based: T-COFFEE
Mixing Local and Global Alignments
Local Alignment Global Alignment
Extension
Multiple Sequence Alignment
What is a library?
Extension + T-Coffee

2
Seq1 MySeq
Seq2 MyotherSeq
#1 2
1 1 25
3 8 70
….

3
Seq1 anotherseq
Seq2 atsecondone
Seq3 athirdone
#1 2
1 1 25
#1 3
3 8 70
….
Iterative
Iterative Methods
-HMMs, HMMER, SAM.
-Slow, Sometimes Inaccurate
-Good Profile Generators
Iterative Methods : Prrp
[Flowchart]
Initial Alignment
Outer Iteration: Tree and weights computation
Weights converged? YES: End. NO: Inner Iteration
Inner Iteration: Realign two sub-groups
Alignment converged? YES: back to the Outer Iteration. NO: Realign two sub-groups again
Iterative Stochastic: SAGA, The Genetic Algorithm
Automatic scheduling of the operators
Weighting Schemes
The Problem
The sequences Contain Correlated Information
Most scoring Schemes Ignore this Correlation
Weighting Sequence Pairs with a Tree: Carrillo and Lipman, Rationale I
[Figure: tree with leaves A, B, C, D, E, F, G]
QUESTION: Which Weight for a Pair of Sequences?
E = Edge
P = Evolutionary Path from A to X
E must contribute the same weight to every path P that goes through it.
All the weights using E must sum to 1: $\sum_P W_{P,E} = 1$.
$W_P = \prod_{k \in P} \frac{1}{N_k - 1}$
$N_k$: Number of Edges meeting on Node k.
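A tiny Python sketch of the pair weight, reading the slide's formula as a product of 1 / (Nk - 1) over the internal nodes k crossed by the path P. This reading is an assumption, since part of the formula is lost in the transcript; the function name is illustrative.

```python
from math import prod


def pair_weight(edges_meeting_at_nodes) -> float:
    """Weight of the path between two sequences.

    edges_meeting_at_nodes lists, for every internal node on the path,
    the number of edges meeting at that node (Nk).
    """
    return prod(1.0 / (n_k - 1) for n_k in edges_meeting_at_nodes)


if __name__ == "__main__":
    # Three sequences joined at a single node where 3 edges meet:
    # every pair crosses that one node.
    print(pair_weight([3]))
    # A path that crosses two such nodes in a larger binary tree.
    print(pair_weight([3, 3]))
```

Only the number of edges at each node enters the computation, so two trees with the same topology give the same pair weights whatever their branch lengths are.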
USAGE
$\mathrm{Score}(R_A^x, R_B^y) = W_{AB} \cdot \mathrm{Mat}[R_A^x][R_B^y]$
PROBLEM: Weight Depends only on the Tree topology
[Figure: two trees over A, B and C, each labelled with the same pair weights]
AB: 0.5, AC: 0.5, BC: 0.5
AB: 0.5, AC: 0.5, BC: 0.5
Weighting Sequences with a Tree: ClustalW Weights
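[Figure: tree with leaves A, B, C, D, E, F, G; the edges leading to G are labelled with their contributions]
QUESTION: Which Weight for Sequences?
W=Length *1/4
W=Length *1/2
W=Length *1
$W_G = \sum W$
$W_{seq} = \sum \frac{\text{Edge Length}}{\text{Number of Sequences Sharing the Edge}}$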
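A short Python sketch of this tree-based sequence weighting, reading the formula as: every edge between the root and a sequence contributes its length divided by the number of sequences that share it. The tree encoding (a leaf is a name, an internal node is a list of (child, branch length) pairs) and the example branch lengths are assumptions made for the illustration.

```python
def leaves(tree):
    """Names of the sequences below a node."""
    if isinstance(tree, str):
        return [tree]
    return [name for child, _ in tree for name in leaves(child)]


def tree_weights(tree, inherited=0.0, weights=None):
    """Weight of each sequence: sum, over the edges leading to it, of
    edge length / number of sequences sharing that edge."""
    if weights is None:
        weights = {}
    if isinstance(tree, str):
        weights[tree] = inherited
        return weights
    for child, length in tree:
        share = length / len(leaves(child))
        tree_weights(child, inherited + share, weights)
    return weights


if __name__ == "__main__":
    # Hypothetical tree: A and B are close relatives, C is distant.
    tree = [("C", 1.0), ([("A", 0.2), ("B", 0.2)], 0.4)]
    print(tree_weights(tree))
```

In this example C receives a weight of 1.0 while A and B receive about 0.4 each: closely related sequences share part of their weight, and an isolated, distant sequence gets a large one.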
USAGE
$\mathrm{Score}(R_A^x, R_B^y) = W_A \cdot W_B \cdot \mathrm{Mat}[R_A^x][R_B^y]$
PROBLEM: Overweight of distant sequences
[Figure: tree with leaves D, E, F, G and C]
-C Will Dominate the Alignment
-C Will Be Very Difficult to Align
Performance Comparison Using Collections of Reference Alignments: BaliBase and Ribosomal RNA
What Is BaliBase?
BaliBase is a collection of reference Multiple Alignments
The Structures of the Sequences are known and were used to assemble the MALN.
Evaluation is carried out by Comparing the Structure Based Reference Alignment With its Sequence Based Counterpart
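One common way to carry out such a comparison is to count how many residue pairs aligned in the reference are also aligned by the method under test. The Python sketch below illustrates that idea only; it is not the official BaliBase scoring program, and the function names are illustrative.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Set of residue pairs placed in the same column.

    alignment: list of gapped sequences of equal length, same order in
    every alignment being compared. A residue is (sequence index, position
    in the ungapped sequence).
    """
    positions = [0] * len(alignment)
    pairs = set()
    for column in zip(*alignment):
        residues = []
        for seq_index, char in enumerate(column):
            if char != "-":
                residues.append((seq_index, positions[seq_index]))
                positions[seq_index] += 1
        pairs.update(combinations(residues, 2))
    return pairs


def pair_accuracy(reference, test) -> float:
    """Fraction of the reference residue pairs recovered by the test alignment."""
    ref_pairs = aligned_pairs(reference)
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)


if __name__ == "__main__":
    reference = ["AC-G", "ACTG"]   # e.g. structure based
    method_x = ["ACG-", "ACTG"]    # e.g. sequence based
    print(pair_accuracy(reference, method_x))
```

Here two of the three reference pairs are recovered, because the last residue of the first sequence is aligned to a different residue than in the reference.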
What Is BaliBase?
DALI, Sap …
Method X
Comparison
What Is BaliBase?
PROBLEM: Description
-Even Phylogenetic Spread
-One Outlier Sequence
-Two Distantly Related Groups
-Long Internal Indel
-Long Terminal Indel
Source: BaliBase, Thompson et al, NAR, 1999
Choosing The Right Method
Choosing The Right Method (POA Evaluation)
Choosing The Right Method (POA Evaluation)
Choosing The Right Method (MAFFT evaluation)
Choosing The Right Method (MAFFT evaluation)
Choosing The Right Method (MAFFT evaluation)
Conclusion
Which Method?
PROBLEM: Strategy
Source: BaliBase, Thompson et al, NAR, 1999
ClustalW, T-Coffee, MSA, DCA
PrrP, T-Coffee
Dialign
T-Coffee
T-Coffee
Dialign
T-Coffee
Methods / Situations
1-Carrillo and Lipman:
-MSA, DCA.
-Few Small Closely Related Sequences.
-Do Well When They Can Run.
2-Segment Based:
-DIALIGN, MACAW.
-May Align Too Few Residues
-Good For Long Indels
3-Iterative:
-HMMs, HMMER, SAM.
-Slow, Sometimes Inaccurate
-Good Profile Generators
4-Progressive:
-ClustalW, Pileup, Multalign…
-Fast and Sensitive
Addresses
MAFFT: Progressive, www.biophys.kyoto-u.jp/katoh
POA: Progressive/Simultaneous, www.bioinformatics.ucla.edu/poa
MUSCLE: Progressive/Iterative, www.drive5.com/muscle/