Post on 21-Dec-2015
Molecular modelling / structure prediction
(A computational approach to protein structure)
Today:
• Why bother about proteins/prediction
• Concepts of molecular modelling
– The physicist’s approach
– The biologist’s approach
• Get a feel for usefulness/uselessness
• Where is the future going?
Thomas Huber
Department of Mathematics
Room 724, Priestley building [email protected]
Why do we care about Protein Structures/Prediction?
• Academic curiosity?
– Understanding how nature works
• Drug & Ligand design
– Need protein structure to design molecules which inhibit/excite
• cure all sorts of diseases
• Protein design
– making better proteins
• sensor proteins
• industrial catalysts (washing powder, synthetic reactions, …)
• Urgency of prediction
– 10^4 structures are determined
• insignificant compared to all proteins
– sequencing = fast & cheap
– structure determination = hard & expensive
Three basic choices in molecular modelling
• Representation
– Which degrees of freedom are treated explicitly
• Scoring
– Which scoring function (force field)
• Searching
– Which method to search or sample conformational space
The physicist’s approach: Folding by 1st principles
• Representation: atomic level
• Scoring: physical force field
• Searching: Newton’s equations of motion
Concept: Doing what nature does
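As a rough illustration of "searching with Newton's equations of motion", the sketch below integrates a single particle in one dimension with the velocity Verlet scheme, a standard integrator in molecular dynamics. The harmonic force, the unit mass, the time step and all function names are toy assumptions made for this example; a real simulation would use a full atomic force field and many particles.

```python
def velocity_verlet(position, velocity, force, mass, dt, n_steps):
    """Integrate Newton's equations of motion for one particle in 1D."""
    trajectory = [(position, velocity)]
    f = force(position)
    for _ in range(n_steps):
        # Half-step velocity update with the current force
        velocity_half = velocity + 0.5 * dt * f / mass
        # Full-step position update
        position = position + dt * velocity_half
        # Recompute the force at the new position, then finish the velocity update
        f = force(position)
        velocity = velocity_half + 0.5 * dt * f / mass
        trajectory.append((position, velocity))
    return trajectory


SPRING_CONSTANT = 1.0  # toy value, stands in for a real force field


def harmonic_force(x):
    """Force of a harmonic 'bond' with its minimum at x = 0 (toy assumption)."""
    return -SPRING_CONSTANT * x


def total_energy(x, v, mass=1.0):
    """Kinetic plus potential energy of the toy oscillator."""
    return 0.5 * mass * v * v + 0.5 * SPRING_CONSTANT * x * x


trajectory = velocity_verlet(1.0, 0.0, harmonic_force, mass=1.0, dt=0.01, n_steps=1000)
print("energy at start:", total_energy(*trajectory[0]))
print("energy at end:  ", total_energy(*trajectory[-1]))
```

The total energy should stay nearly constant over the run, which is a common sanity check for this kind of integrator.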
Naïve idea?
• Levinthal’s paradox (1968)
– 3 possible rotamers per dihedral angle gives an astronomical number of conformations
• Golf course scenario
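The arithmetic behind the paradox can be sketched in a few lines. Only the 3 rotamers per dihedral angle comes from the slide; the chain length of 100 residues, the 2 backbone dihedrals per residue and the sampling rate of 10^13 conformations per second are illustrative assumptions.

```python
def levinthal_estimate(n_residues, rotamers_per_dihedral=3,
                       dihedrals_per_residue=2, samples_per_second=1e13):
    """Return (number of conformations, years needed to enumerate them all).

    All defaults except rotamers_per_dihedral are illustrative assumptions.
    """
    n_dihedrals = n_residues * dihedrals_per_residue
    n_conformations = rotamers_per_dihedral ** n_dihedrals
    seconds = n_conformations / samples_per_second
    years = seconds / (3600 * 24 * 365)
    return n_conformations, years


conformations, years = levinthal_estimate(100)
print(f"{conformations:.2e} conformations, about {years:.2e} years to enumerate")
```

For 100 residues this is 3^200, on the order of 10^95 conformations, so an exhaustive random search is hopeless under these assumptions.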
Levinthal’s paradox is irrelevant
• Folding is not a random process
• Bumpy bowl scenario
Why are folding simulations still unsuccessful?
• Simulations are computationally expensive
• Force fields are not good
• Gross approximations in simulations
• Nature uses tricks
• Posttranslational processing
• Chaperones
• Environment change
Is a physical approach useless?
• No!
• Very useful aid to structure determination / refinement
– Experimentally observed structural data very incomplete
• NMR: only distances < 6Å
• X-ray crystallography: only 50% of data can be measured (phase information missing)
– Physico-chemical information can complement experimental data
• Give dynamical picture of structure
Biologist’s approach: Prediction by induction
• Representation: amino acid sequence
• Scoring: sequence similarity (identity)
• Searching: optimal string matching (with gaps and insertions)
Concept: Homologous sequences fold into similar structures
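"Optimal string matching with gaps" is usually done by dynamic programming. The sketch below computes a global alignment score in the style of Needleman-Wunsch. The scoring values (match +1, mismatch -1, gap -2) and the function name are assumptions chosen for illustration; practical tools typically use substitution matrices and affine gap penalties instead.

```python
def global_alignment_score(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of two sequences (toy scoring scheme)."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per residue
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align the two residues
                score[i - 1][j] + gap,       # gap in seq_b
                score[i][j - 1] + gap,       # gap in seq_a
            )
    return score[-1][-1]


print(global_alignment_score("ACDE", "ACDE"))
print(global_alignment_score("ACDE", "ACE"))
```

The second call has to place one gap opposite the unmatched residue, so its score is lower than that of the identical pair.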
Validation of concept (Rost, 1999)
• >10^6 sequence alignments between protein pairs
• Optimal discrimination between similar and dissimilar structure
Is it useful?
• PDB statistics: 10^4 protein structures determined
– <10^3 protein folds
8 Modelling steps
• Template recognition
• Alignment
• Alignment correction
• Backbone generation
• Loop building
• Side chain generation
• Overall model refinement
• Model verification
– Comparison with Experimental results
– Steric overlap
– Ramachandran plot
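Of the verification checks listed, steric overlap is the simplest to sketch: flag any pair of atoms that sit closer together than some minimum distance. The 2.0 Å cutoff, the atom labels and the function name below are illustrative assumptions. A real check would also skip covalently bonded pairs and use element-specific radii.

```python
import math
from itertools import combinations


def find_steric_overlaps(atoms, min_distance=2.0):
    """Return (label_a, label_b, distance) for atom pairs closer than min_distance.

    atoms is a list of (label, (x, y, z)) tuples with coordinates in angstroms.
    The default cutoff is an illustrative assumption, not a standard value.
    """
    clashes = []
    for (label_a, xyz_a), (label_b, xyz_b) in combinations(atoms, 2):
        distance = math.dist(xyz_a, xyz_b)
        if distance < min_distance:
            clashes.append((label_a, label_b, distance))
    return clashes


# Hypothetical coordinates for three atoms
model_atoms = [
    ("A", (0.0, 0.0, 0.0)),
    ("B", (1.0, 0.0, 0.0)),
    ("C", (5.0, 0.0, 0.0)),
]
overlaps = find_steric_overlaps(model_atoms)
print(overlaps)
```

Comparing all pairs scales quadratically with the number of atoms, which is acceptable for a sketch but would normally be replaced by a grid or neighbour list for whole proteins.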
[Figure labels: Sequence score; Force field]
Limiting factors
How good are homology models?
• G.V. Vried 1998: 34 homologous protein pairs
What about side chains?
• Biology happens in side chains
• Packing side chains in protein core is not a trivial problem
– Many alternative arrangements
– High energy barriers
Accuracy of modelled side chains
• Dunbrack SCWRL results
– 299 monomeric proteins
– 40263 side chains
The Next Step: Computational Proteomics
• Mass scale homology modelling of entire genomes
– Lots of sequence data
– First pick the easy cases
– Computers are cheap and work 24/7
Prediction of Protein Structure
How to detect remote homologues
• Fold recognition using threading
– Combine the concepts of the physicist and the biologist
• Predicting secondary structure
• More about that in BIOL3004
– Structural biology elective
• Tue 8/5 10am
• Thu 10/5 10am
– Database mining elective
• L10
Take home messages
• Computational approaches are
– Not perfect
– Yet indispensable
• Molecular modelling has huge potential in structural biology
– Currently 10^4 structures in PDB
– For every sequence in the Swissprot database with homology to a structure in the PDB, models are available!!
– Vast amount of data still to come
• Levinthal paradox
– Is true
– BUT not relevant
• Different aims need different approaches (3 choices of MM!)
– modelling enzyme reactions
– modelling protein folding
– weather forecast
Clever approaches more important than bigger computers