Representations of Molecular Structure: Bonds Only

Upload: paula-dean

Post on 30-Dec-2015

Page 1: Representations of Molecular Structure: Bonds Only

Representations of Molecular Structure: Bonds Only

Representations of Molecular Structure: Atoms Only

Representations of Molecular Structure: Atoms and Bonds

Representations of Molecular Structure: Ribbons

Representations of Molecular Structure: Mixed

Representations of Molecular Structure: van der Waals Surface

Representations of Molecular Structure: Solvent Excluded Surface

Protein Structure Prediction

Protein folding is different from structure prediction

• Folding is concerned with the process by which the chain takes on its 3D shape, usually based on physical principles.

• Prediction uses any statistical, theoretical or empirical data to try to get at the end result.

Protein Structure Prediction

• A bit of history: Asilomar, 1994, 1996, 1998, 2000, 2002, & 2004 (pending)

• Three approaches to structure prediction:

a. Homology modeling

b. Sequence-structure threading

c. Ab initio prediction

Asilomar

• Experimentalists with structures expected to be solved before the date of the CASP meeting submitted the sequences of the unknowns to a central repository.

• Predictors could download the sequence and minimal information about the protein (its name), and could enter one of three categories.

• Assessors used automatic analysis programs, in addition to their own expertise, to evaluate the quality of predictions.

CASP6 in Numbers

• Human expert groups registered: 228

• Prediction servers registered: 65

• Targets released: 87

• Targets canceled: 11

• Valid targets: 76

• Targets for human expert prediction: 76

• Targets for server prediction: 76

CASP6: Accepted Predictions

Prediction format           No. groups      No. 1 models    All models
3D coordinates              166             8276            27472
Alignments to PDB           37              1726            5250
Residue-residue contacts    16              983             1664
Domain assignments          24              1230            1546
Disordered regions          20              1365            1695
Function prediction         25              1033            1179
All                         228 (unique)    14613           38806

Asilomar Categories

• Homology Modeling (sequences with high homology to sequences of known structures)

Given a sequence with > 25-30% sequence identity to a known structure in the PDB, use the known structure as the starting point for a model of the 3D structure of the sequence.

Takes advantage of knowledge of a closely related protein. Use sequence alignment techniques to establish correspondences between the known “template” and the unknown.
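
To make the 25-30% threshold concrete, here is a minimal sketch of how percent identity might be computed from a pairwise alignment. The function name `percent_identity`, the gap character `-`, and the choice to divide by the number of gap-free columns are assumptions for illustration; other denominators are also in common use. The toy sequences are made up.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of gap-free alignment columns whose residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Toy alignment (hypothetical sequences, not taken from the slides).
target = "MLDTNMKT-QLKAY"
template = "MLETNMRTAQLKGY"
print(f"{percent_identity(target, template):.1f}% identity")
```

A target scoring above the threshold against some PDB entry would be a candidate for homology modeling with that entry as template.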

Asilomar Categories

• Fold recognition (sequences with no significant sequence identity (<= 30%) to sequences of known structure)

Given the sequence, and a set of folds observed in PDB, see if the sequence could adopt one of the known folds.

Takes advantage of knowledge of existing structures, and principles by which they are stabilized (favorable interactions).

Fold Recognition

• New sequence:

MLDTNMKTQLKAYLEKLTKPVELIATLDDSAKSAEIKELL…

• Library of known folds:

Asilomar Categories

• Ab initio prediction (no known homology with any sequence of known structure)

Given only the sequence, predict the 3D structure from “first principles”, based on energetic or statistical principles.

Secondary structure prediction and multiple alignment techniques are used to predict features of these molecules. Some method is then needed to assemble the 3D structure.

Ab initio prediction

• New sequence:

MLDTNMKTQLKAYLEKLTKPVELIATLDDSAKSAEIKELL…

• Predict secondary structure:

MLDTNMKTQLKAYLEKLTKPVELIATLDDSAKSAEIKELL…
HHHHHCCCCCHHHHHHHHHHCCCCBBBBBBBCCBBBB…

• Predict 3D structure entirely:

Asilomar Results

How to evaluate predictions?

• RMSD

• Overall identification and topology of secondary structures

• Energy considerations (contacts, H-bonds)

• Similarity of hydrophobic core

• Sequence alignment quality (and systematic shift)
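
As an illustration of the first criterion, RMSD between a model and the experimental structure can be sketched as below. This assumes the two coordinate sets list the same atoms in the same order and have already been superposed; a full evaluation would first find the optimal superposition, which is omitted here. The function name `rmsd` and the toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z).

    Assumes the structures are already superposed and atoms are paired by index.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy example: the model is the experimental trace shifted 1 Å along x.
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(x + 1.0, y, z) for x, y, z in experimental]
print(rmsd(model, experimental))
```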

Homology Modeling

• When sequence homology is > 70%, high resolution models are possible (< 3 Å RMSD).

• Sophisticated energy minimization techniques do not dramatically improve upon initial guess.

• Rigorous evaluation criteria are applied, such as torsion angles, van der Waals violations, RMSD.

Homology Modeling Samples

Thick backbone shows known structure. Thin lines show modeled structures. Some sidechains are not positioned correctly, but backbone and other sidechains look quite good.

Homology Modeling Mistakes

• a. Sidechain mistakes

• b. Shifts with correct alignment

• c. No template

• d. Misalignment

• e. Incorrect template

Limitations of Homology Modeling

Useful Conclusions from CASP

• Use of sensitive multiple alignment techniques helped get best alignments.

• Side chain modeling uses libraries of known amino acid conformations. Success ranged from 45% to 80% correct (correct = angles within 30° of the experimental structure).

• Energy-based refinement is still not improving the structures.
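
The "within 30°" criterion for side chains can be sketched as follows. The angles are assumed here to be side-chain dihedral (chi) angles in degrees; the slide only says "angles", so that reading, the function names, and the toy values are assumptions. The one subtlety is that angles wrap around, so 175° and -175° differ by 10°, not 350°.

```python
def angle_difference(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def fraction_correct(predicted, experimental, tolerance: float = 30.0) -> float:
    """Fraction of predicted angles within `tolerance` degrees of experiment."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty angle lists of equal length")
    hits = sum(
        1 for p, e in zip(predicted, experimental)
        if angle_difference(p, e) <= tolerance
    )
    return hits / len(predicted)


# Toy values (hypothetical): two of the four angles fall within 30 degrees.
predicted_chi = [60.0, 175.0, -60.0, 180.0]
experimental_chi = [65.0, -175.0, 60.0, 100.0]
print(fraction_correct(predicted_chi, experimental_chi))
```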

Ab Initio Predictions – From Primary to Secondary

• Range of accuracy from 66% to 77% (3 state labeling: helix, coil or beta).

• Human hand editing improves the accuracy.

• Multiple sequence alignments improve the performance of secondary structure prediction.
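
The accuracy figures above are per-residue agreement over three states. A minimal sketch of that measure, using the H (helix), C (coil) and B (beta) labels from the earlier slide; the function name `q3_accuracy` and the toy strings are assumptions.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Percent of residues whose predicted 3-state label matches the observed one."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty label strings of equal length")
    correct = sum(1 for p, o in zip(predicted, observed) if p == o)
    return 100.0 * correct / len(predicted)


# Toy labels (hypothetical): one of eight residues is mislabeled.
print(q3_accuracy("HHHHCCBB", "HHHCCCBB"))
```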

Ab Initio Predictions – From Secondary to Tertiary

• Sensitive to errors in secondary structure

• Predictors were more likely to predict previously known structures.

Ab Initio Predictions – From Primary to Tertiary

• Predict interresidue contacts and then compute structure (mild success)

• Simplified energy term + reduced search space (phi/psi or lattice) (moderate success)

• Creative ways to memorize sequence <-> structure correlations in short segments from the PDB, and use these to model new structures: the database method (moderate success)

Ab Initio Predictions – Tertiary (1 to 3): Good Methods

• Associate the sequence of the unknown with a library of known 3D structures, and then optimize the contact frequency of amino acids, as measured in the PDB (Baker et al).

• Generate all folds on lattice and then filter the bad ones out (Samudrala et al)

• Combine multiple sequence alignment, secondary structure prediction and lattice. (Skolnick et al)
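
The "generate all folds on a lattice, then filter" idea can be illustrated on a very small scale. The sketch below enumerates every self-avoiding chain on a 2D square lattice and keeps the most compact ones. The 2D lattice, the contact-count filter, and all names are assumptions chosen for brevity; this is not the published method of any group named above.

```python
def enumerate_walks(n_residues: int):
    """Yield every self-avoiding chain of n_residues sites on a 2D square lattice.

    Chains start at the origin; rotations and reflections are not removed.
    """
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]

    def extend(path, occupied):
        if len(path) == n_residues:
            yield list(path)
            return
        x, y = path[-1]
        for dx, dy in moves:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:
                path.append(nxt)
                occupied.add(nxt)
                yield from extend(path, occupied)
                path.pop()
                occupied.remove(nxt)

    yield from extend([(0, 0)], {(0, 0)})


def contacts(path) -> int:
    """Count lattice-adjacent residue pairs that are not neighbors along the chain."""
    index = {site: i for i, site in enumerate(path)}
    count = 0
    for i, (x, y) in enumerate(path):
        # Looking only in +x and +y counts each adjacent pair exactly once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1:
                count += 1
    return count


# Generate all 5-residue chains, then filter out everything but the most compact.
all_folds = list(enumerate_walks(5))
best = max(contacts(fold) for fold in all_folds)
compact = [fold for fold in all_folds if contacts(fold) == best]
print(len(all_folds), len(compact))
```

The number of conformations grows exponentially with chain length, which is why real lattice methods restrict the search rather than enumerate exhaustively.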

Lattice Model: Overcoming Entropic Barriers

Substructure/Fragment Model: Overcoming Entropic Barriers

• Break target into fragments of 9 amino acids

• Search for similar PDB sequences based on sequence similarity

• Start with extended chain, and evaluate the effect of introducing the fragments into the chain.
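
The first two steps can be sketched as below. Whether the 9-residue fragments overlap is not stated on the slide; overlapping windows are assumed here. Similarity is reduced to a count of identical positions, and the fragment library is a made-up list of (sequence, label) pairs standing in for PDB-derived fragments.

```python
def windows(sequence: str, size: int = 9):
    """Return (start, fragment) for every overlapping window of `size` residues."""
    return [(i, sequence[i:i + size]) for i in range(len(sequence) - size + 1)]


def similarity(a: str, b: str) -> int:
    """Number of positions at which two equal-length fragments are identical."""
    return sum(1 for x, y in zip(a, b) if x == y)


def best_fragments(window: str, library, top_n: int = 3):
    """Return the top_n library entries ranked by similarity to `window`."""
    ranked = sorted(library, key=lambda entry: similarity(window, entry[0]), reverse=True)
    return ranked[:top_n]


# Target from the earlier slide; the library entries are hypothetical.
target = "MLDTNMKTQLKAYLEKLTKPVELIATLDDSAKSAEIKELL"
toy_library = [("MLDTNAAAA", "frag_x"), ("MLDTNMKTA", "frag_y"), ("AAAAAAAAA", "frag_z")]

start, first_window = windows(target)[0]
print(first_window, best_fragments(first_window, toy_library, top_n=1))
```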

Substructure/Fragment Model: Overcoming Entropic Barriers

• Use Metropolis-type algorithm for optimization, using following terms:

– hydrophobic burial

– polar side-chain interactions

– hydrogen bonding between beta-strands

– hard sphere repulsion (van der Waals)

• Create 1000 structures, cluster them.

• Choose one representative from each cluster as possible prediction…
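
A Metropolis-type optimization can be sketched generically as below: downhill moves are always accepted and uphill moves are accepted with probability exp(-ΔE/kT). The energy function and move generator here are toy stand-ins (a 1-D quadratic and a random step), not the scoring terms or fragment insertions listed above; all names and parameter values are assumptions.

```python
import math
import random


def metropolis_accept(delta_e: float, kt: float, rng: random.Random) -> bool:
    """Metropolis criterion: always accept downhill, sometimes accept uphill."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / kt)


def minimize(initial, propose, energy, steps: int, kt: float, seed: int = 0):
    """Run a Metropolis search and return the lowest-energy state visited."""
    rng = random.Random(seed)
    current, e_current = initial, energy(initial)
    best, e_best = current, e_current
    for _ in range(steps):
        candidate = propose(current, rng)
        e_candidate = energy(candidate)
        if metropolis_accept(e_candidate - e_current, kt, rng):
            current, e_current = candidate, e_candidate
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best


def toy_energy(x: float) -> float:
    """Hypothetical stand-in for the scoring terms: minimum at x = 3."""
    return (x - 3.0) ** 2


def toy_move(x: float, rng: random.Random) -> float:
    """Hypothetical stand-in for a fragment insertion: a small random step."""
    return x + rng.uniform(-0.5, 0.5)


best_state, best_energy = minimize(0.0, toy_move, toy_energy, steps=2000, kt=0.1)
print(best_state, best_energy)
```

Repeating such a run from many random seeds would give the pool of structures that the slide then clusters.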

Success Stories of Rosetta

Fold Recognition Becoming More Important

• CASP1: Of 21 target proteins, 11 wound up having folds that were previously known.

• CASP2: Of 22 targets, 15 with available folds

• CASP3: Of 43 targets, 36 with available folds

• …

Fold Recognition

• Every predictor does well on something.

• Common folds (more examples) are easier to recognize.

• Fold recognition was the surprise performer at the first competition. Incremental progress at second, third, fourth …

Fold Recognition

• Not “all or none”. List of top N hits much better than top hit.

• Common folds easier to recognize.

• Quality of alignments that result is NOT good.

• Potentials include: residue pair contact terms, hydrophobicity, polarity, H-bonds, local structure terms.

1 = target, 2 = Fold in PDB

Elements of a fold recognition algorithm

• Library of protein structures, suitably processed

– All structures

– Representative subset

– Structures with loops removed

• Scoring function

– contact potential

– environmental evaluation function

• Method for generating initial alignments and/or searching for better alignments.

Scoring: Contact Potential

• Instead of modeling energies from first physical principles, simplify the problem by positioning only amino acids, and compute empirical energies from the observed associations of amino acids.

• “GLU is attracted to LYS” = E(glu, lys)

Scoring: Contact Potential

• Create energy terms between amino acids:

$E(\text{interaction}) = -kT \ln[\text{frequency of interaction}]$

• Frequency of interaction is measured in database of known structures. Higher frequency, more favorable interaction.
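
A minimal sketch of turning observed contacts into energies with this formula. Frequencies are taken here as simple relative frequencies among all observed contacts, with kT set to 1; both are assumptions, as are the function name and the toy contact list. No reference state is subtracted in this sketch.

```python
import math
from collections import Counter


def contact_energies(observed_pairs, kt: float = 1.0):
    """Map each unordered residue pair to -kT ln(relative contact frequency)."""
    counts = Counter(tuple(sorted(pair)) for pair in observed_pairs)
    total = sum(counts.values())
    return {pair: -kt * math.log(n / total) for pair, n in counts.items()}


# Toy contact list (hypothetical counts, not measured from any database).
observed = [("GLU", "LYS")] * 6 + [("LYS", "GLU")] * 2 + [("GLU", "GLU")] * 2
energies = contact_energies(observed)
for pair, e in sorted(energies.items(), key=lambda item: item[1]):
    print(pair, round(e, 3))
```

The more frequently observed GLU-LYS pair receives the lower (more favorable) energy.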

Sippl Contact Potential

Given:

a = amino acid type a (ALA, VAL, etc.)

b = amino acid type b

s = separation in sequence

r = distance between the two amino acids in 3D

$\Delta E^{ab}_s(r) = E^{ab}_s(r) - E_s(r)$

Energy of interaction between a and b minus average energy at that separation equals the energy difference that contributes to stability.

Sippl Contact Potential

Thus we have:

$\Delta E^{ab}_s(r) = -kT \ln\left[ f^{ab}_s(r) / f_s(r) \right]$

• For any given sequence in 3D, compute distances between all pairs of amino acids (usually up to r = 10-15 Å), and sum.

• $\Delta E_{tot} = \sum_{\text{all } a,b \text{ pairs}} \Delta E^{ab}_s(r)$
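
The summation can be sketched as below. The frequency tables are assumed to be dictionaries keyed by distance bin: `pair_freq[(a, b, s, r_bin)]` for the pair-specific frequency and `ref_freq[(s, r_bin)]` for the average over all pairs. The 1 Å bin width, the 10 Å cutoff, kT = 1, and the rule that bins without statistics contribute nothing are all assumptions, and the toy numbers are made up.

```python
import math


def delta_e(f_abs: float, f_s: float, kt: float = 1.0) -> float:
    """Energy difference -kT ln(f_abs / f_s) for one (a, b, s, r) bin."""
    return -kt * math.log(f_abs / f_s)


def total_delta_e(residues, coords, pair_freq, ref_freq,
                  cutoff: float = 10.0, bin_width: float = 1.0, kt: float = 1.0) -> float:
    """Sum delta_e over all residue pairs closer than `cutoff`."""
    total = 0.0
    n = len(residues)
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(coords[i], coords[j])
            if r > cutoff:
                continue
            s = j - i                      # separation in sequence
            r_bin = int(r // bin_width)    # distance bin index
            a, b = sorted((residues[i], residues[j]))
            key = (a, b, s, r_bin)
            # Assumption: bins with no statistics contribute nothing.
            if key in pair_freq and (s, r_bin) in ref_freq:
                total += delta_e(pair_freq[key], ref_freq[(s, r_bin)], kt)
    return total


# Toy input (hypothetical): only the GLU-LYS pair is within the cutoff.
residues = ["GLU", "LYS", "ALA"]
coords = [(0.0, 0.0, 0.0), (4.5, 0.0, 0.0), (30.0, 0.0, 0.0)]
pair_freq = {("GLU", "LYS", 1, 4): 0.2}
ref_freq = {(1, 4): 0.1}
print(total_delta_e(residues, coords, pair_freq, ref_freq))
```

Here the pair is seen twice as often as the average pair at that separation and distance, so it contributes a favorable -kT ln 2.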

Using Contact Potential

• Given 3D structure, need to mount the sequence on the structure.

– dynamic programming (okay)

– exhaustive enumeration (too expensive) (recent paper shows that this is NP-hard)

– heuristic enumeration: limit on gap lengths, loop lengths (heuristic)

• Evaluate the contact potential for the alignment.

• [Optional] Locally optimize the potential score.

• Compare potential with random shuffle of sequence, and with other sequences to approximate z-score.
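
The shuffle comparison in the last step can be sketched as follows: score the real sequence on the structure, score many shuffled versions of it, and express the real score in standard deviations from the shuffled mean. The scoring function below is a hypothetical stand-in that rewards hydrophobic residues at positions arbitrarily marked as buried; in practice it would be the contact potential of the threaded sequence. The number of shuffles and all names are assumptions.

```python
import random
import statistics


def z_score(sequence: str, score, n_shuffles: int = 200, seed: int = 0) -> float:
    """Z-score of score(sequence) against scores of shuffled copies of the sequence."""
    rng = random.Random(seed)
    native = score(sequence)
    letters = list(sequence)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # same composition, random order
        shuffled_scores.append(score("".join(letters)))
    mean = statistics.mean(shuffled_scores)
    spread = statistics.pstdev(shuffled_scores)
    if spread == 0.0:
        raise ValueError("shuffled scores have zero spread; z-score undefined")
    return (native - mean) / spread


# Hypothetical scoring function: lower is better, as with an energy.
BURIED_POSITIONS = {0, 1, 2, 3}
HYDROPHOBIC = set("AVLIMFWC")


def toy_score(seq: str) -> int:
    """Minus the number of hydrophobic residues sitting at buried positions."""
    return -sum(1 for i, aa in enumerate(seq) if i in BURIED_POSITIONS and aa in HYDROPHOBIC)


print(z_score("LLLLKKKKKKKK", toy_score))
```

A strongly negative z-score suggests the sequence fits the structure better than its own composition alone would explain.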

Future of Structure Predictions

• Protein fold recognition will get asymptotically better, as we get more folds.

• Best ab initio methods use knowledge of database, and will thus also improve.

• Estimates are that we now know between 30% and 50% of the folds that occur.

• Given a fold, we need to improve refinement with homology modeling techniques.