![Page 1: Consensus Fold Recognition Methods Dongbo Bu School of Computer Science University of Waterloo Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li Nov](https://reader036.vdocument.in/reader036/viewer/2022062417/551b3d265503465c7e8b4f8e/html5/thumbnails/1.jpg)
Consensus Fold Recognition Methods
Dongbo Bu
School of Computer Science
University of Waterloo
Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li
Nov. 2006
![Page 2: Consensus Fold Recognition Methods Dongbo Bu School of Computer Science University of Waterloo Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li Nov](https://reader036.vdocument.in/reader036/viewer/2022062417/551b3d265503465c7e8b4f8e/html5/thumbnails/2.jpg)
Outline
• Background
• Consensus Prediction Methods
• ACE7: consensus method by identifying latent servers
• Experimental Results
• Future Work
![Page 3: Consensus Fold Recognition Methods Dongbo Bu School of Computer Science University of Waterloo Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li Nov](https://reader036.vdocument.in/reader036/viewer/2022062417/551b3d265503465c7e8b4f8e/html5/thumbnails/3.jpg)
Background
![Page 4: Consensus Fold Recognition Methods Dongbo Bu School of Computer Science University of Waterloo Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li Nov](https://reader036.vdocument.in/reader036/viewer/2022062417/551b3d265503465c7e8b4f8e/html5/thumbnails/4.jpg)
From sequence to structure
• The Rate Gap (motivation)
  – Gene prediction is fast,
  – but experimental structure determination is slow.
• The First Principle (possibility)
  – Sequence almost determines structure.
• CASP Competition (benchmark)
  – A fair and objective examination
Computational Methods
![Page 5: Consensus Fold Recognition Methods Dongbo Bu School of Computer Science University of Waterloo Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li Nov](https://reader036.vdocument.in/reader036/viewer/2022062417/551b3d265503465c7e8b4f8e/html5/thumbnails/5.jpg)
Homology Modeling --- sequence-sequence alignment
![Page 6: Consensus Fold Recognition Methods Dongbo Bu School of Computer Science University of Waterloo Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li Nov](https://reader036.vdocument.in/reader036/viewer/2022062417/551b3d265503465c7e8b4f8e/html5/thumbnails/6.jpg)
Threading --- sequence-structure alignment
![Page 7: Consensus Fold Recognition Methods Dongbo Bu School of Computer Science University of Waterloo Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li Nov](https://reader036.vdocument.in/reader036/viewer/2022062417/551b3d265503465c7e8b4f8e/html5/thumbnails/7.jpg)
Ab initio --- database independent
![Page 8: Consensus Fold Recognition Methods Dongbo Bu School of Computer Science University of Waterloo Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li Nov](https://reader036.vdocument.in/reader036/viewer/2022062417/551b3d265503465c7e8b4f8e/html5/thumbnails/8.jpg)
Why Consensus?
• Observation:
  – No single server can reliably predict the best models for all the targets.
  – A particular structure prediction server may perform well on some targets, but badly on others.
• A natural idea to solve this issue:
  – Combine the strengths of different prediction methods to obtain better structural models.
![Page 9: Consensus Fold Recognition Methods Dongbo Bu School of Computer Science University of Waterloo Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li Nov](https://reader036.vdocument.in/reader036/viewer/2022062417/551b3d265503465c7e8b4f8e/html5/thumbnails/9.jpg)
What is a Consensus Method?
![Page 10: Consensus Fold Recognition Methods Dongbo Bu School of Computer Science University of Waterloo Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li Nov](https://reader036.vdocument.in/reader036/viewer/2022062417/551b3d265503465c7e8b4f8e/html5/thumbnails/10.jpg)
Formal Description
• Notations:
  – Target: the query protein sequence
  – Server: an implementation of a prediction method
  – Model: a predicted structure
![Page 11: Consensus Fold Recognition Methods Dongbo Bu School of Computer Science University of Waterloo Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li Nov](https://reader036.vdocument.in/reader036/viewer/2022062417/551b3d265503465c7e8b4f8e/html5/thumbnails/11.jpg)
Classical Consensus Methods
![Page 12: Consensus Fold Recognition Methods Dongbo Bu School of Computer Science University of Waterloo Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li Nov](https://reader036.vdocument.in/reader036/viewer/2022062417/551b3d265503465c7e8b4f8e/html5/thumbnails/12.jpg)
Research History
• Early exploration of the consensus idea:
  – Combine many methods in one server.
  – INBGU (SHGU): D. Fischer, 2000
  – 3D-PSSM (Phyre): L. Kelley, 2000
• The first consensus server:
  – CAFASP-CONSENSUS: D. Fischer, 2001
• Successors:
  – Pcons/Pmodeller: J. Lundström, A. Elofsson, 2001
  – 3D-Jury: K. Ginalski, A. Elofsson, 2003
  – 3D-Shotgun: D. Fischer, 2003
  – ACE: L. Yu, J. Xu, M. Li, 2004
![Page 13: Consensus Fold Recognition Methods Dongbo Bu School of Computer Science University of Waterloo Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li Nov](https://reader036.vdocument.in/reader036/viewer/2022062417/551b3d265503465c7e8b4f8e/html5/thumbnails/13.jpg)
Three-step Process
• Step 1: Model Comparison
  – Determine model similarities.
• Step 2: Feature Extraction
  – Formal description of a model.
• Step 3: Model Selection
  – Select a model, or part of it.
• Many machine learning techniques were introduced in the 3rd step.
![Page 14: Consensus Fold Recognition Methods Dongbo Bu School of Computer Science University of Waterloo Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li Nov](https://reader036.vdocument.in/reader036/viewer/2022062417/551b3d265503465c7e8b4f8e/html5/thumbnails/14.jpg)
3D-Shotgun: Majority Voting
• Basic Idea:
  – Reminiscent of “cooperative algorithms”
• Five Input Servers:
  – GONP, GONPM, PRFSEQ, SEQPPRF, SEQPMPRF
• Step 1. Model Comparison
  – For each initial model, find models with LOCAL similarity.
![Page 15: Consensus Fold Recognition Methods Dongbo Bu School of Computer Science University of Waterloo Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li Nov](https://reader036.vdocument.in/reader036/viewer/2022062417/551b3d265503465c7e8b4f8e/html5/thumbnails/15.jpg)
3D-Shotgun (cont)
• Step 2. Feature Extraction
  – For each model M, superimpose similar models upon M,
  – using the shared similarity to compute the transformation.
  – Build a multiple structure alignment A(M) as a result.
  – Feature:
    • the number of models that share a structural element with A(M).
![Page 16: Consensus Fold Recognition Methods Dongbo Bu School of Computer Science University of Waterloo Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li Nov](https://reader036.vdocument.in/reader036/viewer/2022062417/551b3d265503465c7e8b4f8e/html5/thumbnails/16.jpg)
3D-Shotgun (cont)
• Step 3. Selection
  – Majority Voting
  – Choose the structural element with the highest count.
  – The underlying rationale:
    • The recurring structural elements are most likely to be correct.
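The voting rule in Steps 2 and 3 can be sketched in a few lines of Python. This is a minimal illustration, not the 3D-Shotgun implementation: it assumes each model has already been reduced to one hypothetical label per region (`r1`, `helix_a` and the other names are invented), whereas the real method works on superimposed 3D coordinates.

```python
from collections import Counter

def majority_vote(elements_per_model):
    """For each region, keep the structural element shared by the most models.

    elements_per_model: list of dicts mapping a region name to a hashable
    label for the local structure predicted there (hypothetical encoding).
    Returns {region: (winning_label, vote_count)}.
    """
    regions = sorted({region for model in elements_per_model for region in model})
    consensus = {}
    for region in regions:
        # Count how many models propose each element for this region.
        votes = Counter(m[region] for m in elements_per_model if region in m)
        label, count = votes.most_common(1)[0]
        consensus[region] = (label, count)
    return consensus

# Three toy models covering two regions.
models = [
    {"r1": "helix_a", "r2": "strand_x"},
    {"r1": "helix_a", "r2": "strand_y"},
    {"r1": "helix_b", "r2": "strand_y"},
]
result = majority_vote(models)
print(result)
```

The assembled model can combine elements that never co-occur in a single input model, which is the point of voting per structural element rather than per model.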
![Page 17: Consensus Fold Recognition Methods Dongbo Bu School of Computer Science University of Waterloo Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li Nov](https://reader036.vdocument.in/reader036/viewer/2022062417/551b3d265503465c7e8b4f8e/html5/thumbnails/17.jpg)
Confidence Assignment
• For each assembled model M’, a confidence score S’ is given as follows:
• Here,
  – k, l run over all the input models
  – S_{k,l} is the confidence score given by the individual server
  – Sim() adopts MaxSub.
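The equation for S’ is not preserved in this transcript. One form consistent with the definitions above is a similarity-weighted sum of the input scores, S’ = Σ_{k,l} S_{k,l} · Sim(M’, M_{k,l}). The sketch below assumes that form; it is an assumption, not a quotation from the slide, and the server names and numbers are invented.

```python
def assembled_confidence(similarity, server_scores):
    """Similarity-weighted sum of input-model scores (assumed form of S').

    similarity[(k, l)]    : Sim(M', M_{k,l}), e.g. a MaxSub value in [0, 1]
    server_scores[(k, l)] : S_{k,l}, the score server k gave its l-th model
    """
    return sum(server_scores[key] * similarity[key] for key in server_scores)

# Hypothetical inputs: two servers, three input models in total.
server_scores = {("s1", 1): 10.0, ("s1", 2): 4.0, ("s2", 1): 6.0}
similarity = {("s1", 1): 0.5, ("s1", 2): 0.0, ("s2", 1): 1.0}

confidence = assembled_confidence(similarity, server_scores)
print(confidence)
```

Under this form, an assembled model earns confidence only from input models that resemble it, so a high score from a server whose model is dissimilar contributes nothing.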
![Page 18: Consensus Fold Recognition Methods Dongbo Bu School of Computer Science University of Waterloo Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li Nov](https://reader036.vdocument.in/reader036/viewer/2022062417/551b3d265503465c7e8b4f8e/html5/thumbnails/18.jpg)
Performance of 3D-Shotgun
![Page 19: Consensus Fold Recognition Methods Dongbo Bu School of Computer Science University of Waterloo Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li Nov](https://reader036.vdocument.in/reader036/viewer/2022062417/551b3d265503465c7e8b4f8e/html5/thumbnails/19.jpg)
CAFASP-Consensus and Pcons: Neural Network
• Step 1. Model Comparison
  – CAFASP-Consensus: check SCOP id, or run MaxSub
  – Pcons: LGScore2 to detect similarity
• Step 2. Feature Extraction
  – CAFASP-Consensus: number of similar models
  – Pcons:
    • ratio of similar models
    • weighted f1
    • ratio of similar first-ranked models
![Page 20: Consensus Fold Recognition Methods Dongbo Bu School of Computer Science University of Waterloo Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li Nov](https://reader036.vdocument.in/reader036/viewer/2022062417/551b3d265503465c7e8b4f8e/html5/thumbnails/20.jpg)
CAFASP-Consensus and Pcons: (cont)
• Step 3. Model Selection
  – Formulated as a machine learning problem
  – Attribute:
    • Log(LGScore2), significantly better than LGScore2.
![Page 21: Consensus Fold Recognition Methods Dongbo Bu School of Computer Science University of Waterloo Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li Nov](https://reader036.vdocument.in/reader036/viewer/2022062417/551b3d265503465c7e8b4f8e/html5/thumbnails/21.jpg)
Pmodeller = Pcons + ProQ
• ProQ:
  – a neural network package that measures the quality of a structure
• Pmodeller has an advantage over Pcons because a number of high-scoring but false-positive models are eliminated.
![Page 22: Consensus Fold Recognition Methods Dongbo Bu School of Computer Science University of Waterloo Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li Nov](https://reader036.vdocument.in/reader036/viewer/2022062417/551b3d265503465c7e8b4f8e/html5/thumbnails/22.jpg)
Performance of Pcons/Pmodeller
![Page 23: Consensus Fold Recognition Methods Dongbo Bu School of Computer Science University of Waterloo Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li Nov](https://reader036.vdocument.in/reader036/viewer/2022062417/551b3d265503465c7e8b4f8e/html5/thumbnails/23.jpg)
ACE: SVM Regression
• Step 1. Model Comparison
  – MaxSub
• Step 2. Feature Extraction
  – f1: the normalized similarity with all the other models
  – f2: the normalized similarity with the most similar one
  – f3: for each target, a measure of the divergence of server predictions.
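As a rough illustration of these three features, the sketch below computes them from a pairwise similarity matrix. The exact normalization and the definition of f3 are not given on the slide, so the choices here (mean over the other models for f1, maximum for f2, one minus the mean pairwise similarity for f3) are assumptions, and the matrix values are invented.

```python
def ace_like_features(sim):
    """Per-model (f1, f2) and a per-target divergence f3.

    sim: symmetric matrix of pairwise model similarities in [0, 1]
         (e.g. MaxSub-style scores); sim[i][i] is ignored.
    """
    n = len(sim)
    per_model = []
    for i in range(n):
        others = [sim[i][j] for j in range(n) if j != i]
        f1 = sum(others) / len(others)   # average similarity to all other models
        f2 = max(others)                 # similarity to the most similar model
        per_model.append((f1, f2))
    # f3 (assumed definition): how much the predictions for this target disagree.
    pairs = [sim[i][j] for i in range(n) for j in range(i + 1, n)]
    f3 = 1.0 - sum(pairs) / len(pairs)
    return per_model, f3

sim = [
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.4],
    [0.2, 0.4, 1.0],
]
features, divergence = ace_like_features(sim)
print(features, divergence)
```

A feature like f3 lets a learner treat "everyone agrees" targets differently from targets where the servers scatter, which is when raw similarity counts are least reliable.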
![Page 24: Consensus Fold Recognition Methods Dongbo Bu School of Computer Science University of Waterloo Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li Nov](https://reader036.vdocument.in/reader036/viewer/2022062417/551b3d265503465c7e8b4f8e/html5/thumbnails/24.jpg)
ACE (cont)
• Step 3. Selection
  – SVM Regression: predict the model quality
  – Attribute:
    • MaxSub with the native structure
![Page 25: Consensus Fold Recognition Methods Dongbo Bu School of Computer Science University of Waterloo Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li Nov](https://reader036.vdocument.in/reader036/viewer/2022062417/551b3d265503465c7e8b4f8e/html5/thumbnails/25.jpg)
Performance of ACE
• In CASP6, ACE was ranked 2nd among 87 automatic servers.
• On the LiveBench test set:
![Page 26: Consensus Fold Recognition Methods Dongbo Bu School of Computer Science University of Waterloo Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li Nov](https://reader036.vdocument.in/reader036/viewer/2022062417/551b3d265503465c7e8b4f8e/html5/thumbnails/26.jpg)
Other techniques
• 3D-Jury:
  – Rationale: the average of low-energy conformations is similar to the native structure.
  – Basic Idea: mimic the averaging step with the following scoring function:
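The scoring function itself is missing from this transcript. The sketch below follows the commonly described form of the 3D-Jury score: for each model, sum its similarity to every other model whose similarity exceeds a threshold, then divide by one plus the number of pairs counted. The threshold of 40 and the (1 + N) normalization are stated here as assumptions, and the similarity values are invented.

```python
def jury_score(sim, index, threshold=40):
    """Thresholded average similarity of one model to all the others.

    sim: symmetric matrix of pairwise similarity scores (e.g. number of
         superimposable residues); the diagonal is ignored.
    """
    counted = [
        sim[index][j]
        for j in range(len(sim))
        if j != index and sim[index][j] > threshold
    ]
    # The +1 keeps the score finite (and zero) for a model with no supporters.
    return sum(counted) / (1 + len(counted))

sim = [
    [0, 90, 80, 10],
    [90, 0, 70, 20],
    [80, 70, 0, 30],
    [10, 20, 30, 0],
]
scores = [jury_score(sim, i) for i in range(len(sim))]
best = max(range(len(sim)), key=lambda i: scores[i])
print(scores, best)
```

The model closest to the "centre" of the cluster of predictions wins, which mimics averaging without ever building an averaged structure.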
![Page 27: Consensus Fold Recognition Methods Dongbo Bu School of Computer Science University of Waterloo Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li Nov](https://reader036.vdocument.in/reader036/viewer/2022062417/551b3d265503465c7e8b4f8e/html5/thumbnails/27.jpg)
Other techniques (cont)
• Robetta:
  – For each fragment, choose a local structure from a set, and assemble them to minimize an energy function.
• BPROMPT:
  – Bayesian Belief Network
• JPred:
  – Decision Tree
![Page 28: Consensus Fold Recognition Methods Dongbo Bu School of Computer Science University of Waterloo Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li Nov](https://reader036.vdocument.in/reader036/viewer/2022062417/551b3d265503465c7e8b4f8e/html5/thumbnails/28.jpg)
CASP7 Performance
![Page 29: Consensus Fold Recognition Methods Dongbo Bu School of Computer Science University of Waterloo Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li Nov](https://reader036.vdocument.in/reader036/viewer/2022062417/551b3d265503465c7e8b4f8e/html5/thumbnails/29.jpg)
ACE7: A Consensus Method by Identifying Latent Servers
![Page 30: Consensus Fold Recognition Methods Dongbo Bu School of Computer Science University of Waterloo Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li Nov](https://reader036.vdocument.in/reader036/viewer/2022062417/551b3d265503465c7e8b4f8e/html5/thumbnails/30.jpg)
Motivation
• Server Correlation:
  – Although consensus servers assume that each individual server is independent of the others, CASP6 results show that different servers are correlated to some degree.
• Negative Effect:
  – This kind of correlation sometimes makes a native-like model receive less support than the incorrect models.
![Page 31: Consensus Fold Recognition Methods Dongbo Bu School of Computer Science University of Waterloo Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li Nov](https://reader036.vdocument.in/reader036/viewer/2022062417/551b3d265503465c7e8b4f8e/html5/thumbnails/31.jpg)
Examination of ACE on CASP6 Dataset
• Observation:
  – If a native-like model receives support from only 1 or 2 servers, it is difficult to select it.
![Page 32: Consensus Fold Recognition Methods Dongbo Bu School of Computer Science University of Waterloo Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li Nov](https://reader036.vdocument.in/reader036/viewer/2022062417/551b3d265503465c7e8b4f8e/html5/thumbnails/32.jpg)
Source of Server Correlation
• Server Correlation:
  – Some servers tend to generate similar results.
• Reason:
  – Roughly speaking, the correlations arise because these servers adopt similar techniques, including sequence alignment tools, secondary structure prediction methods, scoring functions, etc.
• Latent Servers:
  – Here, we use independent latent servers to represent the common features shared by these servers.
![Page 33: Consensus Fold Recognition Methods Dongbo Bu School of Computer Science University of Waterloo Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li Nov](https://reader036.vdocument.in/reader036/viewer/2022062417/551b3d265503465c7e8b4f8e/html5/thumbnails/33.jpg)
ACE7: to reduce the server correlation
• Step 1. Use maximum likelihood to estimate the server correlation.
• Step 2. Apply Principal Component Analysis to derive the latent servers.
• Step 3. Use an ILP model to weight the latent servers.
![Page 34: Consensus Fold Recognition Methods Dongbo Bu School of Computer Science University of Waterloo Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li Nov](https://reader036.vdocument.in/reader036/viewer/2022062417/551b3d265503465c7e8b4f8e/html5/thumbnails/34.jpg)
Two Assumptions of ACE7
• Assumption 1:
– Here, we approximate Ci,m by:
• Assumption 2:
![Page 35: Consensus Fold Recognition Methods Dongbo Bu School of Computer Science University of Waterloo Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li Nov](https://reader036.vdocument.in/reader036/viewer/2022062417/551b3d265503465c7e8b4f8e/html5/thumbnails/35.jpg)
Maximum Likelihood Estimation of Server Correlation
Here,
![Page 36: Consensus Fold Recognition Methods Dongbo Bu School of Computer Science University of Waterloo Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li Nov](https://reader036.vdocument.in/reader036/viewer/2022062417/551b3d265503465c7e8b4f8e/html5/thumbnails/36.jpg)
Server Correlation
• Observation:
  – The server correlation is significant, given that there are thousands of candidate models.
  – Some servers are correlated more tightly than others.
    • mGenThreader and RAPTOR (0.383) vs. FUGUE3 and Prospect (0.182).
• Implication:
  – The individual servers may be clustered into cliques according to their correlations;
  – the servers in a small clique may be underestimated under the simple “majority voting” rule.
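To make the idea of pairwise server correlation concrete, the sketch below uses a plain Pearson correlation over per-target quality scores. This is a stand-in chosen for brevity, not the maximum likelihood estimator from the previous slide; the server names `A`, `B`, `C` and all scores are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical quality of each server's top model on four targets.
scores = {
    "A": [0.9, 0.2, 0.7, 0.4],
    "B": [0.8, 0.3, 0.6, 0.5],   # behaves much like A
    "C": [0.3, 0.8, 0.4, 0.6],   # succeeds where A and B fail
}

corr_ab = pearson(scores["A"], scores["B"])
corr_ac = pearson(scores["A"], scores["C"])
print(round(corr_ab, 3), round(corr_ac, 3))
```

In this toy data A and B form a clique: under one-server-one-vote they outvote C even on targets where C alone is right, which is the underestimation described above.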
![Page 37: Consensus Fold Recognition Methods Dongbo Bu School of Computer Science University of Waterloo Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li Nov](https://reader036.vdocument.in/reader036/viewer/2022062417/551b3d265503465c7e8b4f8e/html5/thumbnails/37.jpg)
Uncovering the Latent Servers
![Page 38: Consensus Fold Recognition Methods Dongbo Bu School of Computer Science University of Waterloo Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li Nov](https://reader036.vdocument.in/reader036/viewer/2022062417/551b3d265503465c7e8b4f8e/html5/thumbnails/38.jpg)
Uncovering the Latent Servers (cont)
• Using the PCA technique, the latent servers can be estimated as:
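The estimation formula is not preserved here. As a hedged sketch of the underlying step, PCA on a server correlation matrix amounts to finding its eigenvectors; the code below extracts only the leading one by power iteration. The 3×3 matrix is invented for illustration and is not the CASP6 estimate.

```python
import math

def leading_component(matrix, iterations=200):
    """Largest eigenvalue and eigenvector of a symmetric matrix (power iteration)."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue for the converged unit vector.
    mv = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(a * b for a, b in zip(v, mv))
    return eigenvalue, v

# Hypothetical correlations: servers 0 and 1 are tightly coupled, server 2 is not.
corr = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
eigenvalue, loadings = leading_component(corr)
print(round(eigenvalue, 4), [round(x, 3) for x in loadings])
```

The leading component loads almost equally on the two correlated servers and only weakly on the third, so it can be read as one latent server standing in for that pair.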
![Page 39: Consensus Fold Recognition Methods Dongbo Bu School of Computer Science University of Waterloo Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li Nov](https://reader036.vdocument.in/reader036/viewer/2022062417/551b3d265503465c7e8b4f8e/html5/thumbnails/39.jpg)
Explanation of Latent Servers
• Observation:
  – H1: represents MGTH and RAPT
  – H2: SPKS
  – H3: FUG3
  – H4: ST02
  – H5: PROS
  – H6: no preference
![Page 40: Consensus Fold Recognition Methods Dongbo Bu School of Computer Science University of Waterloo Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li Nov](https://reader036.vdocument.in/reader036/viewer/2022062417/551b3d265503465c7e8b4f8e/html5/thumbnails/40.jpg)
Construct a More Accurate Server
• Since latent servers are mutually independent, it is reasonable to assume:
• Key Point:
  – How to set the weight of each latent server?
  – An ILP model:
    • Maximize the gap between the scores of the native-like models and the incorrect models.
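The weighting objective can be illustrated with a toy search. The sketch below enumerates small integer weights and keeps the vector that maximizes the worst-case gap between a native-like model and the best incorrect model. Exhaustive enumeration stands in for a real ILP solver, the soft-margin slack variables are omitted, and the vote vectors and the normalization by total weight are assumptions made for the example.

```python
from itertools import product

def score(weights, votes):
    """Weighted sum of latent-server support for one model."""
    return sum(w * v for w, v in zip(weights, votes))

def best_integer_weights(targets, max_weight=3):
    """targets: list of (native_votes, [incorrect_votes, ...]) per target."""
    n = len(targets[0][0])
    best_gap, best_w = None, None
    for w in product(range(max_weight + 1), repeat=n):
        if sum(w) == 0:
            continue  # the all-zero vector cannot rank anything
        gap = min(
            score(w, native) - max(score(w, bad) for bad in incorrect)
            for native, incorrect in targets
        )
        gap /= sum(w)  # normalize so scaling all weights does not inflate the gap
        if best_gap is None or gap > best_gap:
            best_gap, best_w = gap, w
    return best_w, best_gap

# Hypothetical support from three latent servers on two targets.
targets = [
    ((1, 0, 1), [(0, 1, 0)]),   # native-like model backed by latent servers 1 and 3
    ((0, 0, 1), [(1, 1, 0)]),   # native-like model backed by latent server 3 only
]
weights, gap = best_integer_weights(targets)
print(weights, gap)
```

With equal weights the second target is lost (one vote against two); the search instead puts all weight on the latent server that consistently backs the native-like model.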
![Page 41: Consensus Fold Recognition Methods Dongbo Bu School of Computer Science University of Waterloo Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li Nov](https://reader036.vdocument.in/reader036/viewer/2022062417/551b3d265503465c7e8b4f8e/html5/thumbnails/41.jpg)
ILP Model (soft-margin idea)
![Page 42: Consensus Fold Recognition Methods Dongbo Bu School of Computer Science University of Waterloo Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li Nov](https://reader036.vdocument.in/reader036/viewer/2022062417/551b3d265503465c7e8b4f8e/html5/thumbnails/42.jpg)
Experiment on CASP7 Dataset
• Observation:
  – For T0363, ACE7 succeeds even though only one server votes for the native-like model.
![Page 43: Consensus Fold Recognition Methods Dongbo Bu School of Computer Science University of Waterloo Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li Nov](https://reader036.vdocument.in/reader036/viewer/2022062417/551b3d265503465c7e8b4f8e/html5/thumbnails/43.jpg)
Sensitivity of ACE7
• Observation:
  – ACE7 has a higher sensitivity than any individual server.
![Page 44: Consensus Fold Recognition Methods Dongbo Bu School of Computer Science University of Waterloo Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li Nov](https://reader036.vdocument.in/reader036/viewer/2022062417/551b3d265503465c7e8b4f8e/html5/thumbnails/44.jpg)
Future Work
![Page 45: Consensus Fold Recognition Methods Dongbo Bu School of Computer Science University of Waterloo Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li Nov](https://reader036.vdocument.in/reader036/viewer/2022062417/551b3d265503465c7e8b4f8e/html5/thumbnails/45.jpg)
Conclusion
• Although consensus methods rely on the structure clustering property, server correlation also has a negative effect.
![Page 46: Consensus Fold Recognition Methods Dongbo Bu School of Computer Science University of Waterloo Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li Nov](https://reader036.vdocument.in/reader036/viewer/2022062417/551b3d265503465c7e8b4f8e/html5/thumbnails/46.jpg)
Future Work
• Find a better approximation of C_{i,m}.
• Use MaxSub instead of GDT.
• RAPTOR performs well at choosing the top 5 models, but has difficulty choosing the top 1 model.
• Choosing the best of the top 5 models remains an open problem that we are trying to address.
![Page 47: Consensus Fold Recognition Methods Dongbo Bu School of Computer Science University of Waterloo Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li Nov](https://reader036.vdocument.in/reader036/viewer/2022062417/551b3d265503465c7e8b4f8e/html5/thumbnails/47.jpg)
Thanks.