Molecular modelling / structure prediction

(A computational approach to protein structure)

Today:

• Why bother about proteins/prediction

• Concepts of molecular modelling

– The physicist’s approach

– The biologist’s approach

• Get a feel for usefulness/uselessness

• Where is the future going?

Thomas Huber

Department of Mathematics

Room 724, Priestley building

[email protected]

Why do we care about Protein Structures/Prediction?

• Academic curiosity?

– Understanding how nature works

• Drug & Ligand design

– Need protein structure to design molecules which inhibit/excite

• cure all sorts of diseases

• Protein design

– making better proteins

• sensor proteins

• industrial catalysts (washing powder, synthetic reactions, …)

• Urgency of prediction

– About 10^4 structures have been determined

• insignificant compared to all proteins

– sequencing = fast & cheap

– structure determination = hard & expensive

Three basic choices in molecular modelling

• Representation

– Which degrees of freedom are treated explicitly

• Scoring

– Which scoring function (force field)

• Searching

– Which method to search or sample conformational space

The physicist’s approach: Folding by 1st principles

• Representation: atomic level

• Scoring: physical force field

• Searching: Newton’s equations of motion

Concept: Doing what nature does
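To make "searching with Newton’s equations of motion" concrete, here is a minimal sketch for a single particle. The velocity Verlet scheme, the harmonic force standing in for a physical force field, and all parameter values are assumptions chosen for illustration; they are not taken from the slides.

```python
def harmonic_force(x, k=1.0):
    """Toy stand-in for a force field: F = -k * x (assumed, not a real force field)."""
    return -k * x


def velocity_verlet(x, v, force, mass=1.0, dt=0.01, n_steps=1000):
    """Integrate Newton's equation m * a = F(x) with the velocity Verlet scheme."""
    f = force(x)
    trajectory = [x]
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # second half-step velocity update
        trajectory.append(x)
    return x, v, trajectory


def total_energy(x, v, k=1.0, mass=1.0):
    """Kinetic plus potential energy of the toy harmonic system."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


x0, v0 = 1.0, 0.0
x1, v1, traj = velocity_verlet(x0, v0, harmonic_force)
print("energy at start:", total_energy(x0, v0))
print("energy at end:  ", total_energy(x1, v1))
```

A real folding simulation applies the same update to every atom, with forces from an atomic force field and time steps of the order of femtoseconds, which is where the computational cost comes from.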

Naïve idea?

• Levinthal’s paradox (1968)

– 3 possible rotamers per dihedral angle

– Astronomical number of conformations

• Golf course scenario
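The arithmetic behind the paradox can be sketched in a few lines. The 3 states per dihedral angle come from the slide; the chain length of 100 residues, 2 backbone dihedrals per residue, and the sampling rate of 10^13 conformations per second are assumptions picked only to show the order of magnitude.

```python
import math


def levinthal_count(n_residues, dihedrals_per_residue=2, states_per_dihedral=3):
    """Number of conformations if every dihedral angle is chosen independently."""
    return states_per_dihedral ** (n_residues * dihedrals_per_residue)


def exhaustive_search_years(n_conformations, rate_per_second=1e13):
    """Years needed to visit every conformation once at the assumed sampling rate."""
    seconds_per_year = 365.25 * 24 * 3600
    return n_conformations / rate_per_second / seconds_per_year


n = levinthal_count(100)  # assumed 100-residue protein
print(f"conformations: about 10^{math.log10(n):.0f}")
print(f"search time:   about 10^{math.log10(exhaustive_search_years(n)):.0f} years")
```

Changing the assumed chain length or sampling rate shifts the exponent, but not the conclusion that exhaustive enumeration is out of reach.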

Levinthal’s paradox is irrelevant

• Folding is not a random process

• Bumpy bowl scenario
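The contrast between the two scenarios can be illustrated with a toy search. This is a sketch under loud assumptions: both one-dimensional energy functions are invented for illustration, and Metropolis Monte Carlo is used as a generic search method, not as a model of real folding.

```python
import math
import random


def bumpy_bowl(x):
    """Hypothetical landscape: a broad bowl (x**2) with small bumps on top."""
    return x * x + 0.3 * math.sin(15.0 * x)


def golf_course(x):
    """Hypothetical landscape: flat everywhere except a narrow hole near x = 0."""
    return -1.0 if abs(x) < 0.01 else 0.0


def metropolis(energy, x0, n_steps=20000, step=0.1, temperature=0.2, seed=0):
    """Metropolis Monte Carlo; returns the lowest-energy point that was visited."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)
        e_new = energy(x_new)
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / temperature):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


for name, landscape in (("bumpy bowl", bumpy_bowl), ("golf course", golf_course)):
    best_x, best_e = metropolis(landscape, x0=3.0)
    print(f"{name}: best x = {best_x:.3f}, best energy = {best_e:.3f}")
```

On the bowl, the overall slope guides the walker towards the minimum despite the bumps. On the flat landscape, nothing guides it, so finding the hole is a matter of chance.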

Why are folding simulations still unsuccessful?

• Simulations are computationally expensive

• Force fields are not good enough

• Gross approximations in simulations

• Nature uses tricks

• Posttranslational processing

• Chaperones

• Environment change

Is a physical approach useless?

• No!

• Very useful aid to structure determination / refinement

– Experimentally observed structural data very incomplete

• NMR: only distances < 6Å

• X-ray crystallography: only 50% of data can be measured (phase information missing)

– Physico-chemical information can complement experimental data

• Give dynamical picture of structure

Biologist’s approach: Prediction by induction

• Representation: amino acid sequence

• Scoring: sequence similarity (identity)

• Searching: optimal string matching (with gaps and insertions)

Concept: Homologous sequences fold into similar structures
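"Optimal string matching with gaps" can be sketched as a global alignment by dynamic programming. The Needleman-Wunsch algorithm is used here as one standard choice; the slides do not name a specific method. The match, mismatch and gap scores are arbitrary assumptions; real tools use substitution matrices and more elaborate gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of strings a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Identical, non-gap positions as a percentage of the alignment length."""
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)


# Two short made-up sequences, for illustration only.
s, top, bottom = needleman_wunsch("HEAGAWGHEE", "PAWHEAE")
print(top)
print(bottom)
print(f"score = {s}, identity = {percent_identity(top, bottom):.1f}%")
```

The percent identity of the resulting alignment is the kind of quantity used to decide whether a known structure is a plausible template for a new sequence.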

Validation of concept (Rost, 1999)

• >10^6 sequence alignments between protein pairs

• Optimal discrimination between similar and dissimilar structures

Is it useful?

• PDB statistics: 10^4 protein structures determined

– <10^3 protein folds

8 Modelling steps

• Template recognition

• Alignment

• Alignment correction

• Backbone generation

• Loop building

• Side chain generation

• Overall model refinement

• Model verification

– Comparison with Experimental results

– Steric overlap

– Ramachandran plot
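Two of the verification checks above reduce to simple geometry. The sketch below computes a dihedral angle from four atom positions (the quantity plotted for phi and psi in a Ramachandran plot) and counts atom pairs that sit too close together. The function names and the 2.0 Å cutoff are hypothetical choices for illustration; a real check would exclude bonded atoms and use atom-type-specific distances.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees around the p1-p2 bond (0 = cis, 180 = trans)."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b0, b1), _cross(b1, b2)   # normals of the two planes
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def count_clashes(coords, min_dist=2.0):
    """Count atom pairs closer than min_dist (assumed cutoff, in Angstrom)."""
    clashes = 0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) < min_dist:
                clashes += 1
    return clashes


# Made-up coordinates, for illustration only.
atoms = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.5), (0.0, 1.0, 1.5)]
print("dihedral:", dihedral(*atoms))
print("close pairs:", count_clashes(atoms))
```

For a protein backbone, phi is the dihedral over the atoms C(i-1), N(i), CA(i), C(i) and psi over N(i), CA(i), C(i), N(i+1).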

[Figure labels: Sequence score; Force field; Limiting factors]

How good are homology models?

• G. Vriend (1998): 34 homologous protein pairs

What about side chains?

• Biology happens in side chains

• Packing side chains in protein core is not a trivial problem

– Many alternative arrangements

– High energy barriers

Accuracy of modelled side chains

• Dunbrack SCWRL results

– 299 monomeric proteins

– 40263 side chains

The Next Step: Computational Proteomics

• Mass scale homology modelling of entire genomes

– Lots of sequence data

– First pick the easy cases

– Computers are cheap and work 24/7

Prediction of Protein Structure

How to detect remote homologues

• Fold recognition using threading

– Combines concepts of the physicist’s and the biologist’s approach

• Predicting secondary structure

• More about that in BIOL3004

– Structural biology elective

• Tue 8/5 10am

• Thu 10/5 10am

– Database mining elective

• L10

Take home messages

• Computational approaches are

– Not perfect

– Yet indispensable

• Molecular modelling has huge potential in structural biology

– Currently 10^4 structures in PDB

– For every sequence in the Swissprot database with homology to a structure in the PDB, models are available!!

– Vast amount of data still to come

• Levinthal paradox

– Is true

– BUT not relevant

• Different aims need different approaches (3 choices of MM!)

– modelling enzyme reactions

– modelling protein folding

– weather forecast

Clever approaches are more important than bigger computers
