Consensus Fold Recognition Methods
DESCRIPTION
Consensus Fold Recognition Methods. Dongbo Bu, School of Computer Science, University of Waterloo. Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li. Nov. 2006. Outline: Background; Consensus Prediction Methods; ACE7: a consensus method by identifying latent servers; Experimental Results.
TRANSCRIPT
Consensus Fold Recognition Methods
Dongbo Bu
School of Computer Science
University of Waterloo
Joint work with S.C. Li, X. Gao, L. Yu, J. Xu, M. Li
Nov. 2006
Outline
• Background
• Consensus Prediction Methods
• ACE7: consensus method by identifying latent servers
• Experimental Results
• Future Work
Background
From sequence to structure
• The Rate Gap (motivation for computational methods): gene prediction is fast, but experimental structure determination is slow.
• The First Principle (possibility): sequence almost determines structure.
• CASP Competition (benchmark): a fair and objective examination.
Homology Modeling --- sequence-sequence alignment
Threading --- sequence-structure alignment
Ab initio --- database independent
Why Consensus?
• Observation:
 – No single server can reliably predict the best models for all the targets.
 – A particular structure prediction server may perform well on some targets, but badly on others.
• A natural idea to solve this issue:
 – Combine the strengths of different prediction methods to obtain better structural models.
What is a Consensus Method?
Formal Description
• Notations:
 – Target: the query protein sequence
 – Server: an implementation of a prediction method
 – Model: a predicted structure
Classical Consensus Methods
Research History
• Early exploration of the consensus idea (many methods in one server):
 – INBGU (SHGU), D. Fischer, 2000
 – 3D-PSSM (Phyre), L. Kelly, 2000
• The first consensus server:
 – CAFASP-CONSENSUS, D. Fischer, 2001
• Successors:
 – Pcons/Pmodeller, J. Lundstrom, A. Elofsson, 2001
 – 3D-Jury, K. Ginalski, A. Elofsson, 2003
 – 3D-Shotgun, D. Fischer, 2003
 – ACE, L. Yu, J. Xu, M. Li, 2004
Three-step Process
• Step 1: Model Comparison – determine model similarities
• Step 2: Feature Extraction – formal description of a model
• Step 3: Model Selection – select a model, or part of it
• Many machine learning techniques have been introduced in the third step.
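The three-step process above can be sketched in pure Python. This is a minimal illustration, not any particular server: `similarity` is a toy label-matching stand-in for a real structural score such as MaxSub, and the single feature used here (total similarity to the other models) is just one of the features real methods extract.

```python
# Minimal sketch of the three-step consensus pipeline.
# Models are represented as secondary-structure strings; `similarity`
# is a toy stand-in for a real structural score such as MaxSub.

def similarity(model_a, model_b):
    # Step 1: model comparison (fraction of matching positions).
    matches = sum(a == b for a, b in zip(model_a, model_b))
    return matches / max(len(model_a), len(model_b))

def extract_features(models):
    # Step 2: one feature per model, here its total similarity
    # to all the other models (the structure-clustering signal).
    return [sum(similarity(m, other) for j, other in enumerate(models) if j != i)
            for i, m in enumerate(models)]

def select_model(models):
    # Step 3: select the model with the strongest consensus support.
    feats = extract_features(models)
    return max(range(len(models)), key=feats.__getitem__)

models = ["HHEEC", "HHEEE", "HHEEC", "CCCCC"]
best = select_model(models)  # models 0 and 2 agree, so index 0 wins
```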
3D-Shotgun: Majority Voting
• Basic Idea: reminiscent of “cooperative algorithms”
• Five Input Servers: GONP, GONPM, PRFSEQ, SEQPPRF, SEQPMPRF
• Step 1. Model Comparison: for each initial model, find the models with LOCAL similarity.
3D-Shotgun (cont)
• Step 2. Feature Extraction:
 – For each model M, superimpose the similar models upon M, using the shared similarity to compute the transformation.
 – Build a multiple structure alignment A(M) as a result.
 – Feature: the number of models that share a structural element with A(M).
3D-Shotgun (cont)
• Step 3. Selection:
 – Majority voting: choose the structural element with the highest count.
 – The underlying rationale: recurring structural elements are most likely to be correct.
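The voting step can be illustrated with a simplified per-position vote. This is only a cartoon of the idea: the real 3D-Shotgun splices aligned structural fragments, not per-residue labels.

```python
from collections import Counter

def majority_vote(models):
    # Keep, at each position, the structural element that recurs most
    # often across the models (Counter breaks ties by first occurrence).
    assembled = []
    for column in zip(*models):
        element, count = Counter(column).most_common(1)[0]
        assembled.append(element)
    return "".join(assembled)

# Three toy models of equal length; the recurring elements win.
consensus = majority_vote(["HHEEC", "HHEEE", "HHHEC"])
```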
Confidence Assignment
• For each assembled model M’, a confidence score S’ is given by
 S’ = Σ_{k,l} S_{k,l} · Sim(M’, M_{k,l})
• Here:
 – k, l run over all the input models
 – S_{k,l} is the confidence score given by the individual server
 – Sim() adopts MaxSub
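The scoring formula itself survives only as an image in the transcript; assuming the natural reading of the bullet definitions (a sum, over all input models, of the server's own confidence times the structural similarity to M'), the score can be computed as below, with a toy label-matching `sim` standing in for MaxSub.

```python
def confidence(assembled, models, server_scores, sim):
    # S' = sum over input models (k, l) of S_{k,l} * Sim(M', M_{k,l}).
    return sum(s * sim(assembled, m) for m, s in zip(models, server_scores))

def sim(a, b):
    # Toy stand-in for MaxSub: fraction of matching positions.
    return sum(x == y for x, y in zip(a, b)) / len(a)

# One perfect match scored 0.9 by its server, one poor match scored 0.5.
score = confidence("HHEEC", ["HHEEC", "CCCCC"], [0.9, 0.5], sim)
```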
Performance of 3D-Shotgun
CAFASP-Consensus and Pcons: Neural Networks
• Step 1. Model Comparison:
 – CAFASP-Consensus: check the SCOP id, or run MaxSub
 – Pcons: LGScore2 to detect similarity
• Step 2. Feature Extraction:
 – CAFASP-Consensus: the number of similar models
 – Pcons: the ratio of similar models; weighted f1; the ratio of similar 1st models
CAFASP-Consensus and Pcons (cont)
• Step 3. Model Selection:
 – Formulated as a machine learning problem.
 – Attribute: log(LGScore2), significantly better than LGScore2.
Pmodeller = Pcons + ProQ
• ProQ: a neural network package to measure the quality of a structure.
• Pmodeller has an advantage over Pcons because a number of high-scoring but false-positive models are eliminated.
Performance of Pcons/Pmod
ACE: SVM Regression
• Step 1. Model Comparison: MaxSub
• Step 2. Feature Extraction:
 – f1: the normalized similarity with all the other models
 – f2: the normalized similarity with the most similar model
 – f3: for each target, a measure of the divergence of the server predictions
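A hedged sketch of the three features: the exact normalizations used by ACE are not given in the slides, so the averages below are plausible stand-ins, and `sim` is again a toy substitute for MaxSub.

```python
def ace_features(i, models, sim):
    # f1: average similarity of model i to all the other models.
    # f2: similarity of model i to its most similar model.
    # f3: divergence proxy for the whole target, here the mean pairwise
    #     similarity over all model pairs (low value = divergent servers).
    others = [sim(models[i], m) for j, m in enumerate(models) if j != i]
    f1 = sum(others) / len(others)
    f2 = max(others)
    n = len(models)
    pairs = [sim(models[a], models[b]) for a in range(n) for b in range(a + 1, n)]
    f3 = sum(pairs) / len(pairs)
    return f1, f2, f3

def sim(a, b):
    # Toy stand-in for MaxSub: fraction of matching positions.
    return sum(x == y for x, y in zip(a, b)) / len(a)

f1, f2, f3 = ace_features(0, ["HHEEC", "HHEEE", "CCCCC"], sim)
```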
ACE (cont)
• Step 3: Selection
 – SVM regression to predict the model quality.
 – Regression target: the MaxSub score with the native structure.
Performance of ACE
• In CASP6, ACE was ranked 2nd among 87 automatic servers.
• On the LiveBench test set:
Other techniques
• 3D-Jury:
 – Rationale: the average of the lower-energy conformations is similar to the native structure.
 – Basic Idea: mimic the averaging step with the following scoring function:
Other techniques (cont)
• Robetta: for each fragment, choose a local structure from a set, and assemble the fragments to minimize an energy function.
• BPROMPT: Bayesian Belief Network
• JPred: Decision Tree
CASP7 Performance
ACE7: A Consensus Method by Identifying Latent Servers
Motivation
• Server Correlation: although consensus servers assume that each individual server is independent of the others, the CASP6 results show that different servers are correlated to some degree.
• Negative Effect: this correlation sometimes makes a native-like model receive less support than incorrect models.
Examination of ACE on the CASP6 Dataset
• Observation: if a native-like model receives support from only 1 or 2 servers, selecting it is difficult.
Source of Server Correlation
• Server Correlation: some servers tend to generate similar results.
• Reason: roughly speaking, the correlations arise because these servers adopt similar techniques, including sequence alignment tools, secondary structure prediction methods, and scoring functions.
• Latent Servers: here, we use independent latent servers to represent the common features shared by the individual servers.
ACE7: reducing the server correlation
• Step 1. Adopt maximum likelihood to estimate the server correlation.
• Step 2. Employ Principal Component Analysis to derive the latent servers.
• Step 3. Use an ILP model to weight the latent servers.
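The PCA step can be illustrated without any numerical library: power iteration on a toy correlation matrix recovers the dominant principal direction, i.e. the strongest latent server expressed as a weighted mix of the observed servers. The matrix values below are illustrative, not the estimated CASP6 correlations.

```python
def leading_component(corr, iters=200):
    # Power iteration: repeatedly multiply a vector by the (symmetric)
    # correlation matrix and renormalize; the vector converges to the
    # top eigenvector, i.e. the dominant latent server.
    n = len(corr)
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

# Toy matrix: servers 0 and 1 are tightly correlated, server 2 is
# nearly independent (cf. mGenThreader/RAPTOR vs. the other servers).
corr = [[1.0, 0.9, 0.1],
        [0.9, 1.0, 0.1],
        [0.1, 0.1, 1.0]]
h1 = leading_component(corr)  # near-equal weight on servers 0 and 1
```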
Two Assumptions of ACE7
• Assumption 1:
 – Here, we approximate C_{i,m} by:
• Assumption 2:
Maximum Likelihood Estimation of Server Correlation
Here,
Server Correlation
• Observation:
 – The server correlation is significant, given that there are thousands of candidate models.
 – Some servers are correlated more tightly than others: mGenThreader and RAPTOR (0.383) vs. FUGUE3 and Prospect (0.182).
• Implication:
 – The individual servers may be clustered into cliques according to their correlations.
 – The servers in a small clique may be underestimated under the simple “majority voting” rule.
Uncovering the Latent Servers
Uncovering the Latent Servers (cont)
• Using the PCA technique, the latent servers can be estimated as:
Explanation of Latent Servers
• Observation:
 – H1: represents MGTH and RAPT
 – H2: SPKS
 – H3: FUG3
 – H4: ST02
 – H5: PROS
 – H6: no preference
Construct a More Accurate Server
• Since the latent servers are mutually independent, it is reasonable to assume:
• Key Point: how to set the weight of each latent server?
 – An ILP model: maximize the gap between the scores of the native-like models and the incorrect models.
ILP Model (soft-margin idea)
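The slide's formulation survives only as an image. Under the stated soft-margin goal (maximize the gap between native-like and incorrect models' scores), one plausible reconstruction is the following; all symbols here (w_j, s_j(m), N_t, D_t, δ, ξ_t, C) are assumptions, not taken from the slides.

```latex
% Hypothetical reconstruction of the soft-margin ILP.
% w_j      : weight of latent server H_j
% s_j(m)   : score latent server H_j assigns to model m
% N_t, D_t : native-like / incorrect models of training target t
% \delta   : margin to maximize; \xi_t : per-target slack; C : penalty
\begin{aligned}
\max_{w,\,\delta,\,\xi}\quad & \delta \;-\; C \sum_t \xi_t \\
\text{s.t.}\quad & \sum_j w_j\, s_j(m) \;-\; \sum_j w_j\, s_j(m') \;\ge\; \delta - \xi_t,
 \qquad \forall t,\; m \in N_t,\; m' \in D_t, \\
 & \sum_j w_j = 1, \qquad w_j \ge 0, \qquad \xi_t \ge 0 .
\end{aligned}
```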
Experiment on the CASP7 Dataset
• Observation: for T0363, ACE7 succeeds even though only one server votes for the native-like model.
Sensitivity of ACE7
• Observation: ACE7 has a higher sensitivity than any individual server.
Future Work
Conclusion
• Though consensus methods rely on the structure-clustering property, server correlation also brings a negative effect.
Future Work
• Find a better approximation of C_{i,m}.
• Use MaxSub instead of GDT.
• RAPTOR performs well at choosing the top 5 models, but often struggles to choose the top 1 model.
• Helping to choose the best of the top 5 models remains an open problem.
Thanks.