3D-COFFEE Mixing Sequences and Structures
Cédric Notredame
chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
      ***. ::: .: .. . : . . * . *: *

chite AATAKQNYIRALQEYERNGG-
wheat ANKLKGEYNKAIAAYNKGESA
trybr AEKDKERYKREM---------
mouse AKDDRIRYDNEMKSWEEQMAE
      * : .* . :
Potential Uses of A Multiple Sequence Alignment?
Extrapolation
Motifs/Patterns
Phylogeny
Profiles
Struc. Prediction
Multiple Alignments Are CENTRAL to MOST Bioinformatics Techniques.
Why Is It Difficult To Compute A multiple Sequence Alignment?
A CROSSROAD PROBLEM
BIOLOGY: What is A Good Alignment
COMPUTATION: What is THE Good Alignment
BIOLOGY
CIRCULAR PROBLEM....
Good Sequences
Good Alignment
COMPUTATION
The T-Coffee Algorithm
Local Alignment Global Alignment
Extension
Multiple Sequence Alignment
Mixing Local and Global Alignments
What is a library?
Extension+T-Coffee
Library Based Multiple Sequence Alignment
2
Seq1 MySeq
Seq2 MyotherSeq
#1 2
1 1 25
3 8 70
….

3
Seq1 anotherseq
Seq2 atsecondone
Seq3 athirdone
#1 2
1 1 25
#1 3
3 8 70
….
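As a rough illustration, a library in the simplified layout shown on this slide (a sequence count, one "name sequence" line per sequence, then "#i j" headers each followed by "residue_i residue_j weight" lines) could be read as follows. The function name `parse_library` and the returned data structure are assumptions made for this sketch, and real T-Coffee library files may carry additional header fields that the slide omits.

```python
def parse_library(text):
    """Parse the simplified library layout from the slide (an assumed format)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    n_seq = int(lines[0])
    # One "name sequence" line per sequence.
    sequences = [tuple(ln.split()[:2]) for ln in lines[1:1 + n_seq]]
    constraints = {}  # (seq_i, seq_j) -> list of (res_i, res_j, weight)
    current = None
    for ln in lines[1 + n_seq:]:
        if ln.startswith("#"):
            # "#i j" opens the block of constraints between sequences i and j.
            i, j = ln[1:].split()
            current = (int(i), int(j))
            constraints.setdefault(current, [])
        else:
            res_i, res_j, weight = (int(x) for x in ln.split())
            constraints[current].append((res_i, res_j, weight))
    return sequences, constraints


example = """3
Seq1 anotherseq
Seq2 atsecondone
Seq3 athirdone
#1 2
1 1 25
#1 3
3 8 70
"""
sequences, constraints = parse_library(example)
print(constraints)
```

Each constraint reads as "residue `res_i` of the first sequence aligned with residue `res_j` of the second, with the given weight".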
The Triplet Assumption
[Diagram: residues X, Y, Z; SEQ A, SEQ B]
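A minimal sketch of how a triplet could reinforce a pairwise constraint, assuming the extension rule described in the T-Coffee paper: when residue X of SEQ A and residue Y of SEQ B are both aligned to the same residue Z of a third sequence, the pair (X, Y) gains the smaller of the two weights. The dictionary layout and the name `extend_library` are hypothetical, and the weights are made up.

```python
from collections import defaultdict


def extend_library(primary):
    """primary maps frozenset({(seq, pos), (seq, pos)}) -> weight."""
    # Index the aligned partners of every residue.
    partners = defaultdict(dict)
    for pair, weight in primary.items():
        a, b = tuple(pair)
        partners[a][b] = weight
        partners[b][a] = weight
    extended = dict(primary)
    # Every residue z acts as a bridge between each pair of its partners.
    for z, linked in partners.items():
        residues = list(linked)
        for idx, x in enumerate(residues):
            for y in residues[idx + 1:]:
                if x[0] == y[0]:
                    continue  # same sequence: not an alignable pair
                key = frozenset((x, y))
                extended[key] = extended.get(key, 0) + min(linked[x], linked[y])
    return extended


primary = {
    frozenset({("A", 1), ("B", 1)}): 88,
    frozenset({("A", 1), ("C", 1)}): 77,
    frozenset({("C", 1), ("B", 1)}): 100,
}
extended = extend_library(primary)
print(extended[frozenset({("A", 1), ("B", 1)})])
```

Here the A/B pair keeps its direct weight of 88 and gains min(77, 100) = 77 through sequence C.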
Consistency Consensus
ClustalW T-Coffee
Dynamic Programming Using An Extended Library
Progressive Alignment
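The dynamic programming step can be sketched for a single pair of sequences as follows. This assumes that library weights play the role of the substitution scores and that gaps carry no penalty; the name `align_with_library` and the toy weights are hypothetical. A progressive alignment would repeat this kind of step along a guide tree, which is not shown here.

```python
def align_with_library(len_a, len_b, weight):
    """Global DP maximising summed library weights; gaps cost nothing here.

    weight(i, j) returns the library weight for aligning residue i of A
    with residue j of B (1-based), or 0 when the pair is absent.
    """
    score = [[0] * (len_b + 1) for _ in range(len_a + 1)]
    for i in range(1, len_a + 1):
        for j in range(1, len_b + 1):
            score[i][j] = max(score[i - 1][j - 1] + weight(i, j),
                              score[i - 1][j],
                              score[i][j - 1])
    # Traceback: recover the matched residue pairs.
    pairs, i, j = [], len_a, len_b
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j]:
            i -= 1
        elif score[i][j] == score[i][j - 1]:
            j -= 1
        else:
            pairs.append((i, j))
            i, j = i - 1, j - 1
    return score[len_a][len_b], pairs[::-1]


library = {(1, 1): 165, (2, 3): 90, (3, 2): 40}
result = align_with_library(3, 3, lambda i, j: library.get((i, j), 0))
print(result)
```

The pairs (2, 3) and (3, 2) cross each other, so only one of them can be used; the DP keeps the heavier one.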
What Is BaliBase
How Good is T-Coffee ???
Best Performing Method on MSA benchmark Datasets
BaliBase -Notredame-Sonnhammer
Ribosomal RNA-Katoh (Mafft)
Homstrad-Notredame
OxBench-Barton
Mixing Heterogeneous Data With T-Coffee
Local Alignment Global Alignment
Multiple Sequence Alignment
Multiple Alignment
Structural Specialist
Mixing Sequences and Structures
Why Do We Want To Mix Sequences and Structures?
1-Predicting Sequence Structures
STRUCTURE FUNCTION
•Sequences are Cheap and Common.
•Structures are Expensive and Rare.
Cheapest Structure determination:
Sequence-Structure Alignment
THREAD Or ALIGN
ADKPRRP---LS-YMLWLN
ADKPKRPKPRLSAYMLWLN
Convincing Alignment
Same Fold
Distant sequences are hard to align
Multiple Sequence Alignments Help
Exploring the Twilight Zone
1-Predicting Sequence Structures
2-Produce Better Alignments
ALIGN
ADKPRRP---LS-YMLWLN
ADKPKRPKPRLSAYMLWLN
Unreliable alignment if %ID < 30%
Alignment Insensitive to %ID
ADKPRRP---LS-YMLWLN
ADKPKRPKPRLSAYMLWLN
Struc. Superposition
Folds evolve Slower than Sequences
Structure Superposition
How To Mix Sequences and Structures
Struct Vs Struct: Superpose
Seq Vs Struct: Thread
Seq Vs Seq: Local, Global
Evaluation on HOMSTRAD
Mixing Sequences and Structures with T-Coffee
The 3D-Coffee Libraries
Methods
•Global: Needleman and Wunsch
•Local: Sim (lalign)
•Threading: Fugue
•Superposition: SAP
•Threading: Fugue
1-Turn Sequence into a profile:
-lower penalties in loops
-Structure specific matrix
2-Align Profile with Sequence
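To illustrate the idea of lower gap penalties in loops, the sketch below derives one gap-opening penalty per position from a secondary-structure string. It assumes that "H" marks helix, "E" marks strand, and any other character is treated as loop. The function name and the penalty values are arbitrary placeholders, not Fugue's actual parameters.

```python
def gap_open_penalties(secondary_structure, core_penalty=12.0, loop_penalty=4.0):
    """Return one gap-opening penalty per structure position.

    Gaps are made cheaper in loops, where insertions and deletions are
    better tolerated than inside helices or strands.
    """
    return [core_penalty if state in "HE" else loop_penalty
            for state in secondary_structure]


penalties = gap_open_penalties("HHHHLLLEEE")
print(penalties)
```

An aligner using these values would prefer to place gaps in the three loop positions rather than inside the helix or the strand.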
Evaluating Fugue
1-Select 967 pairs of sequences in HOMSTRAD
2-Align each pair with T-Coffee and Fugue.
3-Compare the Two Alignments
[Plot regions: TCdef wins, Fugue wins]
TCdef: 58.81%
Fugue: 61.81%
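The slides do not state which accuracy measure produced these percentages. One common way to compare a pairwise alignment with a structure-based reference is the fraction of reference residue pairs that the test alignment reproduces, sketched below under that assumption. The function names and the toy alignments are hypothetical.

```python
def aligned_pairs(row_a, row_b):
    """Set of (index_in_a, index_in_b) for columns where both rows hold a residue."""
    pairs, i, j = set(), 0, 0
    for char_a, char_b in zip(row_a, row_b):
        if char_a != "-" and char_b != "-":
            pairs.add((i, j))
        # Advance each residue counter only when that row is not gapped.
        i += char_a != "-"
        j += char_b != "-"
    return pairs


def pair_accuracy(test, reference):
    """Percentage of reference residue pairs reproduced by the test alignment."""
    ref_pairs = aligned_pairs(*reference)
    return 100.0 * len(ref_pairs & aligned_pairs(*test)) / len(ref_pairs)


reference = ("AC-GT", "ACAGT")
test = ("ACG-T", "ACAGT")
accuracy = pair_accuracy(test, reference)
print(accuracy)
```

In the toy case the test alignment misplaces one of the four reference pairs, so three of four are recovered.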
•Superposition: SAP
1-High Level Dynamic Programming
Substitution Matrix when doing regular Alignments
2-Low Level DP. Forcing the alignment of two residues
Evaluate Every Pair
3-Rigid Body Superposition
RMSD
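The RMSD mentioned in the superposition step can be computed as below. This is only a sketch: it assumes the two coordinate lists hold equivalent atoms in the same order and that the rigid-body superposition has already been applied, so it does not search for the optimal rotation or translation. The function name and the toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) points, already superposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


structure_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
deviation = rmsd(structure_a, structure_b)
print(deviation)
```

Both toy atoms are displaced by one unit along z, so the deviation is one unit.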
Structure Based Sequence Alignment
Make a DP on the accumulated traces
Use Traces like a Substitution Matrix
1-Select 967 pairs of sequences in HOMSTRAD
2-Align each pair with T-Coffee and SAP.
3-Compare the Two Alignments
TCdef: 58.81%
SAP: 86.31%
•SAP •Fugue
TCdef: 58.81%
Fugue: 61.81%
SAP: 86.31%
Sequences and Structures:
How Good is The Mixture ???
Our Benchmark:
HOM39
-HOMSTRAD: Structure based MSAs that can be used as References.
-COMPACT and DEMANDING
-HOM39: The 39 Most difficult datasets (percent ID lower than 25).
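The selection criterion can be illustrated with a percent identity filter. The slides do not say how percent ID was computed for HOM39, so the sketch below assumes one common convention: identical residues divided by the columns where both sequences hold a residue. The function name and the toy datasets are hypothetical, and the filter is applied to single pairs only for illustration.

```python
def percent_identity(row_a, row_b):
    """Percent identical residues over columns where both rows hold a residue."""
    aligned = [(a, b) for a, b in zip(row_a, row_b) if a != "-" and b != "-"]
    if not aligned:
        return 0.0
    return 100.0 * sum(a == b for a, b in aligned) / len(aligned)


# Pair taken from the slides, followed by two made-up toy datasets.
slide_pid = percent_identity("ADKPRRP---LS-YMLWLN", "ADKPKRPKPRLSAYMLWLN")

datasets = {"toy_easy": ("ACDEF", "ACDEF"), "toy_hard": ("ACDEF", "WYDHK")}
hard = [name for name, (a, b) in datasets.items()
        if percent_identity(a, b) < 25]
print(round(slide_pid, 2), hard)
```

Only datasets below the 25 percent threshold would be kept as difficult cases.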
Our Benchmark:
Using HOM39
BENCHMARKING Strategy:
-re-align HOM39 without using ALL the structures
-Compare the result with the reference
Evaluating 3D-Coffee
1- Can a SINGLE structure Help ?
Seq Vs Struct: Thread
Seq Vs Seq: Local, Global
Evaluation on HOM39
Using ONE structure with 3D-Coffee
HOM39 with ONE Structure per MSA
2- Does it benefit ALL the Sequences?
Is EVERYONE Happier if there is a STAR in the team…
BaliBase
HOM39 TC-Fugue
+
Remove Provided Structure(s)
Comparison
3- Can We Use Two or More Structures?
Seq Vs Struct: Fugue
Seq Vs Seq: Local, Global
Evaluation on HOMSTRAD
Mixing Sequences and Structures with 3D-Coffee
HOM39 with TWO Structures/MSA
Struct Vs Struct: SAP, LSQ
Indirect Improvement
Direct Improvement
4- Relation Accuracy / N-structures ???
Seq Vs Struct: Fugue
Seq Vs Seq: Local, Global
Evaluation on HOMSTRAD
Mixing Sequences and Structures with T-Coffee
HOM39 with 1-N Structures per MSA
Struct Vs Struct: SAP
Induced Improvement
Conclusion
-Structures Help
BUT NOT SO MUCH
The More Structures The Merrier
Credits
Orla O’Sullivan: University College, Cork, Ireland
Des Higgins: University College, Cork, Ireland
Karsten Suhre: IGS-CNRS, Marseille, France