2d 3d structure
TRANSCRIPT
-
8/3/2019 2d 3d Structure
1/38
1
Structure prediction methods
(2D and 3D)
Much of the text in the slides that follow is drawn either verbatim or
paraphrased from the following texts:
Bioinformatics (Baxevanis and Ouellette)
Chapter 8: Predictive methods using protein sequences
(Ofran and Rost) 198-219
Chapter 9: Protein structure prediction and analysis
(Wishart) 224-247
Chapter 12: Creation and analysis of protein multiple sequence alignments
(Barton) 333-336
Proteins: Structures and molecular properties
(Thomas Creighton)
Topics Covered
Overview of protein structure: primary, secondary, tertiary, and quaternary
Overview of protein folding
Secondary structure prediction methods
Solvent accessibility prediction
3D fold prediction
Ab initio protein structure prediction
Threading methods
Community evaluation of protein structure prediction
Critical Assessment of protein Structure Prediction (CASP) http://predictioncenter.org/
EVA (real-time continuous evaluation of protein fold prediction methods) http://cubic.bioc.columbia.edu/eva/
Methods for solving protein structures experimentally
The importance of protein structure
Bioinformatics is much more than just sequence analysis; many
of the most interesting and exciting applications in
bioinformatics today actually are concerned with structure
analysis.
The origins of bioinformatics actually lie in the field of structural
biology
Proteins are perhaps the most complex chemical entities in nature.
No other class of molecule exhibits the variety and
irregularity in shape, size, texture and mobility that can be
found in proteins.
Baxevanis & Ouellette (Ch. 9, p.224, Wishart)
Hierarchical descriptions of proteins
(follows the folding process)
Primary structure: the amino acid sequence
Secondary structure: regular local structure of linear segments of polypeptide chains (Creighton)
Helices (~35% of residues)
Beta sheet (~25% of residues)
Both types predicted by Linus Pauling (Corey and Pauling, 1953)
Other less common structures:
Beta turns
3/10 helices
loops
Remaining unclassifiable regions termed random coil or unstructured regions
http://www.chembio.uoguelph.ca/educmat/phy456/456lec01.htm
Tertiary structure: Overall topology of the folded polypeptide chain (Creighton)
Mediated by hydrophobic interactions between distant parts of protein
Quaternary structure: Aggregation of the separate polypeptide chains of a protein (Creighton)
Baxevanis & Ouellette (Ch. 9, p.224, Wishart)
Protein folding
Folded conformations of globular
proteins
Most proteins are globular: natural proteins in solution are much smaller in their dimensions than comparable polypeptides with random or repetitive conformations and have roughly spherical shapes
Denaturation: Most proteins are robust to changes in their environment, until they (somewhat literally) fall apart
Most proteins are robust to changes in temperature, pH and pressure, exhibiting little or no change until a point is reached at which there is a sudden change and loss of biological function
Denaturing proteins has been used to explore folding pathways
(e.g., Understanding how proteins fold: the lysozyme story so far. Dobson
CM, Evans PA, Radford SE. Trends Biochem Sci. 1994)
Creighton, Proteins Ch. 6
Structural domains
Folded structures of most small proteins are roughly spherical and remarkably compact
Proteins with >200 aa tend to consist of two or more structural units, called domains
Domains interact to varying extents, but less extensively than do structural elements within domains
Some domain detection tools make use of this pattern, looking for covariation between positions as evidence of interaction
Nagarajan and Yona, Automatic prediction of protein domains from sequence information using a hybrid learning system. Bioinformatics 2004
Domains may not always be well segregated; some proteins have multiple domains with two or three polypeptide connections between domains
See, for example, the SCOP interleaved domains
Creighton, Proteins Ch. 6
Structural domains (contd)
Definition of domain is a subjective process done in different ways by different people
Domains are most evident by their compactness
Expressed quantitatively as the ratio of the surface area of a domain to the surface area of a sphere with the same volume
Observed values are 1.65 +/- 0.08
Course of polypeptide backbone through domain is irregular, but generally follows moderately straight course through the domain and then makes a U-turn to recross the domain
Overall impression: segments of somewhat stiff polypeptide chain interspersed with relatively tight turns or bends (almost always on the molecule's surface)
Compared to behavior of a fire hose dropped in one spot
Creighton, Proteins Ch. 6
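The compactness ratio described above can be worked through numerically. The sketch below assumes the domain's surface area and volume are already available from some other tool; the function name and the sphere and cube test shapes are illustrative only.

```python
import math

def compactness_ratio(surface_area: float, volume: float) -> float:
    """Ratio of a domain's surface area to that of a sphere of equal volume."""
    # A sphere of volume V has radius r = (3V / 4*pi)^(1/3) and area 4*pi*r^2.
    radius = (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)
    sphere_area = 4.0 * math.pi * radius ** 2
    return surface_area / sphere_area

# A perfect sphere gives 1.0; any other shape gives a larger value.
r = 10.0
sphere = compactness_ratio(4 * math.pi * r**2, 4 / 3 * math.pi * r**3)
# A cube of side 10 has area 600 and volume 1000.
cube = compactness_ratio(600.0, 1000.0)
```

On this scale a cube scores about 1.24, so the observed value of about 1.65 indicates a surface that is noticeably rougher than a smooth convex solid while still being compact.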
Structural domains (contd)
[Figure 6.13 from Creighton, Proteins Ch. 6]
Driving forces in protein folding
Complex combination of local and global forces
Local forces drive secondary structure formation
Repulsion between hydrophobic side chains of some amino acids and hydrophilic backbone of protein chain (intra-molecular)
Interaction between side chains and surrounding solvent
Subcellular environment (e.g., membrane, secreted, etc.)
Pauling et al 1951
Baxevanis & Ouellette (Ch. 9, Wishart)
More driving forces in protein folding
Hydrophobicity
Hydrophobic residues need to be shielded from solvent
Polar residues to the outside, hydrophobic to the inside
Stronger interactions
Hydrogen bonds, disulfide bridges
Weak interactions
Van der Waals, electrostatic, etc
Recommended reading: Proteins (Thomas Creighton).
Global effects on protein fold
Long-range interactions (repulsive or
attractive) between distant parts of structure
These can override local effects
E.g., chameleon protein:
11 amino acids adopt helical structure in one region,
and the same 11 amino acids adopt beta strand in
another.
Minor & Kim, 1996
Baxevanis & Ouellette (Ch. 9, Wishart)
Ligands and co-factors
http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/E/Enzymes.html#coenzymes
Information required for folding is (mostly)
contained in the primary sequence
Early on, proteins were shown to fold into their native
structures in isolation
This led to the belief that structure is determined by
sequence alone (Anfinsen, 1973)
Over the last decade, a significant number of proteins have
been shown to not fold properly in the test tube (e.g.,
requiring the assistance of chaperonins)
Nevertheless, the native 3D structure is assumed to be in
some energetic minimum
This led to the development of ab initio folding methods
Baxevanis & Ouellette (Ch. 9, Wishart)
Folding pathways
Evidence that local structure segments form first,
and then pack against each other to form 3D fold
Exploited in protein fold prediction, Rosetta method
Simons, Bonneau, Ruczinski & Baker (1999). Ab initio
Protein Structure Prediction of CASP III Targets Using
ROSETTA. Proteins
Semi-stable structural intermediates on folding pathway to lowest-energy conformation
Prof. Susan Marqusee, Berkeley
Baxevanis & Ouellette (Ch. 9, Wishart)
Secondary structure
Alpha helix structure
http://www.web-books.com/MoBio/Free/Ch2C4.htm
Amphipathic alpha helix
http://www.web-books.com/MoBio/Free/Ch2C4.htm
Beta strand
http://www.web-books.com/MoBio/Free/Ch2C4.htm
Beta sheet
http://www.web-books.com/MoBio/Free/Ch2C4.htm
Secondary Structure Prediction
Why is secondary structure
prediction important?
Secondary structure diverges less rapidly
than primary sequence
Knowledge or prediction of 2ary structure
improves detection and alignment of remote
homologs
3d-pssm, SAM T02 (fold prediction servers)
Baxevanis & Ouellette (Ch. 9, Wishart)
Focusing on single residues
Early structure prediction methods focused on the structural characteristics of individual residues
This enabled the larger problem to be decomposed into smaller easier-to-solve problems (enabling the combination of solutions to sub-problems to form a global solution)
This also enabled methods to focus on detecting transmembrane regions, solvent-accessible residues, and other important features of molecules
Baxevanis & Ouellette (Ch. 9, Wishart)
Secondary structure prediction
using MSA information?
Labeling residues in a sequence as α-helix, β-sheet
or turn/coil (3-state prediction).
Accuracy of prediction enhanced by ~6% when
multiple sequence alignments are used vs the use
of a single sequence (Cuff & Barton, 1999)
State of the art methods, PSIPRED (Jones 1999)
and JNET (Cuff & Barton, unpublished), have >76% accuracy for 3-state prediction.
Baxevanis & Ouellette (Ch. 12, Barton)
Amino acid patterns indicative of β-strand structures
Short runs of conserved hydrophobic residues suggest a buried β-strand.
An i, i+2, i+4 pattern of conserved hydrophobic residues suggests a surface β-strand.
Conserved residues sharing the same physicochemical properties are likely to form one face of a strand.
Baxevanis & Ouellette (Ch. 12, Barton)
Amino acid patterns indicative of α-helical structures
Conservation patterns of i, i+3, i+4, i+7 and variations (e.g., i, i+4, i+7) suggest an alpha helix
Amphiphilic/conservation patterns (alternating hydrophobic and polar residues) following an i, i+3, i+4, i+7 pattern (and variations, e.g., i, i+4, i+7) are likely to represent surface helices
Baxevanis & Ouellette (Ch. 12, Barton)
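The periodicity patterns for strands (i, i+2, i+4) and helices (i, i+3, i+4, i+7) can be scanned for mechanically. The sketch below is a simplification: it looks at hydrophobic residues in a single sequence rather than at conserved columns of an alignment, and the hydrophobic residue set, function name and example sequences are assumptions made for illustration.

```python
HYDROPHOBIC = set("AVLIMFWC")  # assumed hydrophobic set; other definitions exist

def pattern_hits(sequence, offsets, hydrophobic=HYDROPHOBIC):
    """Return start positions i where every residue at i+offset is hydrophobic."""
    span = max(offsets)
    hits = []
    for i in range(len(sequence) - span):
        if all(sequence[i + k] in hydrophobic for k in offsets):
            hits.append(i)
    return hits

HELIX_OFFSETS = (0, 3, 4, 7)   # i, i+3, i+4, i+7
STRAND_OFFSETS = (0, 2, 4)     # i, i+2, i+4

# Made-up sequences designed to show each periodicity.
helix_like = "LKELLKKLLEELLKK"
strand_like = "VKVTVEV"
helix_hits = pattern_hits(helix_like, HELIX_OFFSETS)
strand_hits = pattern_hits(strand_like, STRAND_OFFSETS)
```

A hit only flags a candidate; applying the same test to conserved hydrophobic columns of an MSA would be closer to the rule as stated on the slides.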
Identifying loop regions
Insertions and deletions are not well tolerated in the hydrophobic core.
Regions of an MSA that include many gap characters are likely to indicate surface loops.
Glycine and proline residues can be found in any secondary structure.
However, conserved glycine/proline residues are strongly suggestive of loops.
Baxevanis & Ouellette (Ch. 12, Barton)
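The gap-based loop heuristic above amounts to counting gap characters per alignment column. A minimal sketch follows, assuming the alignment is a list of equal-length strings; the 0.5 threshold and the toy alignment are illustrative choices, not values from the source.

```python
def gap_fractions(alignment, gap_chars="-."):
    """Fraction of gap characters in each column of an aligned sequence list."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [
        sum(row[col] in gap_chars for row in alignment) / len(alignment)
        for col in range(length)
    ]

def candidate_loop_columns(alignment, min_gap_fraction=0.5):
    """Columns whose gap fraction reaches the (assumed) threshold."""
    fractions = gap_fractions(alignment)
    return [i for i, f in enumerate(fractions) if f >= min_gap_fraction]

# Toy alignment, made up for illustration.
toy_msa = [
    "MKV--LAG",
    "MKVG-LAG",
    "MRV--IAG",
    "MKVGDLSG",
]
loop_columns = candidate_loop_columns(toy_msa)
```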
Early schemes used observed preferences
Various schemes give the amino acids numerical weights or rankings for their preferences, and several computer programs can predict the secondary structure from the given sequence.
The simplest such scheme of Chou and Fasman, Ann. Rev Biochem. (1978), examined the statistical distribution of amino acids in alpha helix, beta sheet and turns or loops, using a set of known protein structures from the Protein Data Bank.
A novel sequence can then be scanned, and the tendency of each portion of the sequence to form secondary structure is assessed.
http://www.chembio.uoguelph.ca/educmat/phy456/456lec01.htm
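A statistical preference scheme of this kind can be sketched in two steps: derive a propensity for each amino acid from labelled structures, then scan a new sequence with a window. The code below is a minimal sketch in the spirit of Chou and Fasman, not their published parameter tables; the training sequences and labels are made up, and the window width of 6 is an assumption.

```python
from collections import Counter

def propensities(sequences, labels, state):
    """Propensity of each residue type for one state.

    P(aa) = (fraction of aa occurrences found in `state`)
            / (fraction of all residues found in `state`)
    """
    total = Counter()
    in_state = Counter()
    for seq, lab in zip(sequences, labels):
        for aa, s in zip(seq, lab):
            total[aa] += 1
            if s == state:
                in_state[aa] += 1
    n_state = sum(in_state.values())
    if n_state == 0:
        raise ValueError("state never observed in the training labels")
    background = n_state / sum(total.values())
    return {aa: (in_state[aa] / total[aa]) / background for aa in total}

def window_scores(sequence, table, width=6):
    """Average propensity over each window of `width` residues."""
    return [
        sum(table.get(aa, 1.0) for aa in sequence[i:i + width]) / width
        for i in range(len(sequence) - width + 1)
    ]

# Toy training data (made up): H = helix, E = strand, C = coil.
train_seqs = ["AEAAKGVVVG", "AAEKGGVTVA"]
train_labs = ["HHHHHCEEEC", "HHHHCCEEEC"]
helix_table = propensities(train_seqs, train_labs, "H")
scores = window_scores("AEAAKGVVVG", helix_table)
```

Values above 1.0 mark residues seen in the state more often than expected by chance; windows with a high average are candidate regions for that state.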
Improving secondary structure prediction
Peer pressure (pressure from the neighbors): A minimum of 4 amino acids out of 6 should show alpha preference, or 3 out of 5 beta preference, or clusters of 2-3 breakers in a sequence of 4 are needed to set the secondary structure in any region, and individual misfits adopt the secondary structure of their neighbours.
Learning secondary structure preferences from expanded data sets: More recent prediction schemes take advantage of larger data sets to examine amino acid preference for different regions in a helix or different positions in a tight turn.
Up-weighting conserved residues: In addition, sequences of homologous proteins may be compared. The rationale is that highly conserved amino acids contribute more to the three-dimensional structure than unconserved, and different weightings can be introduced to the statistical analysis.
Improved accuracy: The accuracy of prediction has risen from about 55% using the simple Chou-Fasman method, where the tendency is to overpredict, to about 80% using current methods.
http://www.chembio.uoguelph.ca/educmat/phy456/456lec01.htm
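The "4 out of 6" and "3 out of 5" neighbour rules can be expressed as a window count. The sketch below marks every residue covered by a qualifying window, so misfits inside the window take on their neighbours' state. The former sets are illustrative assumptions rather than published tables, and no attempt is made to resolve regions claimed by both helix and strand.

```python
def nucleation_windows(sequence, formers, window, min_formers):
    """Start indices of windows containing at least `min_formers` formers."""
    starts = []
    for i in range(len(sequence) - window + 1):
        count = sum(aa in formers for aa in sequence[i:i + window])
        if count >= min_formers:
            starts.append(i)
    return starts

def assign_state(sequence, formers, window, min_formers, symbol):
    """Mark every residue covered by a qualifying window with `symbol`."""
    states = ["-"] * len(sequence)
    for start in nucleation_windows(sequence, formers, window, min_formers):
        for j in range(start, start + window):
            states[j] = symbol  # misfits inside the window follow their neighbours
    return "".join(states)

# Illustrative former sets (assumptions, not the published tables).
HELIX_FORMERS = set("EMALKQ")
STRAND_FORMERS = set("VIYFWT")

seq = "GEAMLGKAPGVIYTG"
helix = assign_state(seq, HELIX_FORMERS, window=6, min_formers=4, symbol="H")
strand = assign_state(seq, STRAND_FORMERS, window=5, min_formers=3, symbol="E")
```

In this toy sequence one position is claimed by both states, which shows why a full method needs a further rule to settle overlaps.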
Basic types of secondary structure
Helices (α and others)
α is most common; 3.6 residues/turn
Side chains project outward
Structure is stabilized by hydrogen bonds between the carbonyl (CO) group of one amino acid and the amino (NH) group of the amino acid that is 4 positions C-terminal to it
β-Strands (two or more strands interact to form a β-sheet)
Other (sometimes called loop, coil, or non-regular)
Baxevanis & Ouellette (Ch. 9, Wishart)
The new generation of secondary structure prediction
PHDsec (Rost et al 1994, Rost et al 1996)
Based on machine learning concepts
Training set: learn implicit rules, principles and model
parameters from labelled data (sequences whose
secondary structures are known for each position)
Test set: sequences of unknown structure
Baxevanis & Ouellette Ch 8 (Ofran and Rost)
Key to success
The success of machine learning algorithms depends on the careful choice of the biologically based features used for training and a sufficiently large and accurate training set
To enhance prediction accuracy on novel data, training data diversity is also critical
Exploit knowledge that local environment is important: to predict 2ary structure of residue i,
consider all residues in a window around i: i-n, ..., i, ..., i+n.
Baxevanis & Ouellette Ch 8 (Ofran and Rost)
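The window idea (residues i-n through i+n as input for predicting residue i) is commonly turned into a fixed-length feature vector. The sketch below uses a one-hot encoding with an extra slot for padding beyond the sequence ends; the encoding choice, alphabet order and function name are assumptions, not the input format of any particular published method.

```python
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

def window_features(sequence, n):
    """One-hot encode the residues i-n .. i+n around every position i."""
    slots = len(ALPHABET) + 1           # 20 amino acids + 1 padding/unknown slot
    index = {aa: k for k, aa in enumerate(ALPHABET)}
    padded = "-" * n + sequence + "-" * n
    features = []
    for i in range(len(sequence)):
        vector = []
        for aa in padded[i:i + 2 * n + 1]:
            one_hot = [0] * slots
            one_hot[index.get(aa, slots - 1)] = 1
            vector.extend(one_hot)
        features.append(vector)
    return features

feats = window_features("MKVLA", n=2)
```

Each residue yields a vector of (2n+1) x 21 values, which could be fed to a classifier trained on positions of known secondary structure.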
PHDsec
Employs homology detection and a feed-forward artificial neural network
Step 1: homolog search and MSA construction
Step 2: label each position with conservation signal (across MSA) and observed substitutions
Step 3: submit representative annotated sequence to a system of neural networks.
Output is a prediction of the most likely secondary structure at each position, with the estimated confidence in that prediction
Baxevanis & Ouellette Ch 8 (Ofran and Rost)
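Step 2 above (labelling each position with a conservation signal and the observed substitutions) can be illustrated with a per-column residue profile and an entropy-based conservation score. This is a generic sketch, not the PHDsec implementation; the scoring formula, function names and toy alignment are assumptions.

```python
import math
from collections import Counter

def column_profile(alignment, col):
    """Relative frequency of each non-gap residue in one alignment column."""
    residues = [row[col] for row in alignment if row[col] != "-"]
    counts = Counter(residues)
    return {aa: n / len(residues) for aa, n in counts.items()} if residues else {}

def conservation(profile, alphabet_size=20):
    """1 - normalized Shannon entropy: 1.0 = invariant, 0.0 = uniform."""
    if not profile:
        return 0.0
    entropy = -sum(p * math.log2(p) for p in profile.values())
    return 1.0 - entropy / math.log2(alphabet_size)

# Toy alignment, made up for illustration.
toy_msa = ["MKVLA", "MKILA", "MRVLS", "MKLLA"]
scores = [conservation(column_profile(toy_msa, c)) for c in range(len(toy_msa[0]))]
```

The profile captures which substitutions were observed at a position, and the score summarizes how strongly the position is conserved; both could be attached to each residue before it is passed to a learning method.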
Assessing performance evaluations
Overall, the correct evaluation of performance
for prediction methods is an art in itself; only a
handful of methods turned out over time to not
have been overestimated by their developers.
Evaluation must be performed on a standard dataset
Training and test data should be rigorously kept
separate
Standard deviations of estimates should be provided
Baxevanis & Ouellette Ch 8 (Ofran and Rost)
Other problems with comparing
different methods
Performance reported in literature can take different forms
Accuracy and coverage
Positive (or negative) predictive power
Sensitivity and specificity
Machine learning terms (e.g., Matthews coefficients)
Wilcoxon paired score signed rank tests
Or be based on different criteria for success
per residue
per secondary structure element
per protein
Others measure performance only in cases where a prediction has high confidence (with a likelihood of a lower FP rate)
Baxevanis & Ouellette Ch 8 (Ofran and Rost)
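Several of the measures listed above can be computed from the same pair of predicted and observed state strings, which makes the differences between them concrete. The sketch below computes per-residue three-state accuracy plus sensitivity, specificity and the Matthews coefficient for one state; the function names and example strings are illustrative.

```python
import math

def q3(predicted, observed):
    """Per-residue three-state accuracy: fraction of identical labels."""
    if len(predicted) != len(observed):
        raise ValueError("strings must have the same length")
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)

def state_metrics(predicted, observed, state):
    """Sensitivity, specificity and Matthews coefficient for one state."""
    pairs = list(zip(predicted, observed))
    tp = sum(p == state and o == state for p, o in pairs)
    tn = sum(p != state and o != state for p, o in pairs)
    fp = sum(p == state and o != state for p, o in pairs)
    fn = sum(p != state and o == state for p, o in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc

# Made-up example: H = helix, E = strand, C = coil.
observed = "HHHHCCEEEC"
predicted = "HHHCCCEEHC"
accuracy = q3(predicted, observed)
sens, spec, mcc = state_metrics(predicted, observed, "H")
```

The same prediction scores 0.8 on per-residue accuracy but noticeably lower on the helix Matthews coefficient, which is one reason figures reported with different measures are hard to compare.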
The EVA server
Continuous assessment of the predictions of automatic servers using the same measurements, the same standards, and the same sequences for all methods
New structures (pre-release to PDB) given to EVA by participating structural biologists. EVA submits the amino acid sequences to online servers.
Predictions stored until release of 3D coordinates to PDB. Then the predicted (2D or 3D) structures can be compared against the solved structures, and given various scores.
Approach enables the community to compare methods, and gives developers concrete feedback that is critical for method improvement.
Baxevanis & Ouellette Ch 8 (Ofran and Rost)
How do the methods compare?
Best methods now reach 76% accuracy at 3-state prediction (helix, strand, random coil)
Rost 2001
See EVA website for detailed comparisons
Metaservers:
Consensus approaches combining weighted predictions from different servers
These almost always outperform individual methods
Shown in both CASP and EVA
Baxevanis & Ouellette Ch 8 (Ofran and Rost)
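A consensus over weighted predictions can be sketched as a per-position weighted vote. The server names, outputs and weights below are hypothetical, and real metaservers may combine predictions in more elaborate ways (for example using per-residue confidence values).

```python
from collections import defaultdict

def weighted_consensus(predictions, weights):
    """Combine per-residue state strings by weighted vote at each position."""
    names = list(predictions)
    length = len(predictions[names[0]])
    if any(len(predictions[n]) != length for n in names):
        raise ValueError("all predictions must cover the same residues")
    consensus = []
    for i in range(length):
        votes = defaultdict(float)
        for name in names:
            votes[predictions[name][i]] += weights[name]
        # Ties are broken by alphabetical order of the state label.
        best = max(sorted(votes), key=lambda s: votes[s])
        consensus.append(best)
    return "".join(consensus)

# Hypothetical server names, outputs and weights, for illustration only.
preds = {
    "server_a": "HHHCCEEEC",
    "server_b": "HHCCCEEEC",
    "server_c": "HHHHCCEEC",
}
w = {"server_a": 1.0, "server_b": 0.8, "server_c": 0.6}
result = weighted_consensus(preds, w)
```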
Caveats
Even when an experimental structure is available, it is sometimes unclear where one secondary structure element ends and another begins
Low-confidence predictions (and regions of disagreement across servers) can correspond to structurally ambiguous regions
Real-life example: Prion protein (involved in bovine spongiform encephalopathy, Creutzfeldt-Jakob disease, etc).
Region assumed to be responsible for aggregation believed to flip from experimentally determined helical structure to (predicted) strand in diseased individuals
All the best secondary structure prediction methods predict this region to be beta (incorrect)
Baxevanis & Ouellette Ch 8 (Ofran and Rost)
Secondary structure prediction programs
PSI-PRED
JNET (Cuff & Barton)
PHD (Rost & Sander)
Baxevanis & Ouellette Ch 8 (Ofran and Rost)
PSIPRED
Solvent accessibility
Solvent accessibility is the area of a protein's surface that is exposed to surrounding solvent.
This information is critical for facilitating the detection of functionally (as opposed to structurally) critical residues
Solvent-exposed positions have the potential to interact with other molecules, metal atoms or ions
Entirely buried residues may help stabilize a protein's 3D fold, but cannot participate in
an enzyme active site,
binding site in a DNA-binding protein, or
an interaction site in a signal transduction component
all of which require spatial accessibility of the residue to solvent
Baxevanis & Ouellette Ch 8 (Ofran and Rost)
Measuring solvent accessibility
Measured in square Angstroms
Values range from 0 (entirely buried) to 300 (on surface)
Two entirely exposed residues can have very different accessible areas
Residues with long side chains expose a larger area to solvent than residues with short side chains
Values typically normalized by the maximum possible for an amino acid, to measure the percentage of the residue that is accessible to solvent.
Baxevanis & Ouellette Ch 8 (Ofran and Rost)
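The normalization step can be shown with a short calculation. In the sketch below the maximum areas are placeholders, not a published scale, and the 10% and 60% cutoffs used for the coarse classes are adjustable assumptions.

```python
def relative_accessibility(asa, max_asa):
    """Percentage of a residue's maximum possible area that is exposed."""
    if max_asa <= 0:
        raise ValueError("max_asa must be positive")
    return min(100.0, 100.0 * asa / max_asa)

def classify(rsa_percent, buried_max=10.0, exposed_min=60.0):
    """Three coarse classes; the cutoffs are adjustable assumptions."""
    if rsa_percent <= buried_max:
        return "buried"
    if rsa_percent >= exposed_min:
        return "exposed"
    return "intermediate"

# Placeholder maxima in square Angstroms (not a published scale).
MAX_ASA = {"G": 100.0, "W": 250.0}

# The same absolute exposure means different things for a small and a large residue.
gly = relative_accessibility(80.0, MAX_ASA["G"])
trp = relative_accessibility(80.0, MAX_ASA["W"])
```

With these placeholder maxima, 80 square Angstroms leaves the small residue mostly exposed but the large one only partly exposed, which is why relative values are preferred for comparison.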
Conservation of solvent accessibility
Homologous proteins with similar folds
tend to conserve solvent accessibility values
at buried positions (i.e., solvent accessibility
between 0-10%);
Exposed positions (values between 60-100%) show less conservation of solvent accessibility between homologs.
Rost and Sander, 1994
Baxevanis & Ouellette Ch 8 (Ofran and Rost)
Prediction methods: PHDacc and PROFacc
Part of the PredictProtein service at Columbia
U. (Burkhard Rost lab)
Sequence alignment and profile construction
using MaxHom
Per-residue 10-state scheme, corresponding to
predicted percentage of residue that is
accessible (1=0-1%; 2=2-4%; etc)
Baxevanis & Ouellette Ch 8 (Ofran and Rost)
Prediction methods: Jpred
Cuff & Barton, 2000
Prediction server predicting 2ary structure and solvent accessibility
Sequence alignment and profile construction using PSI-BLAST and HMM methods
Per-residue 3-state scheme, corresponding to predicted percentage of residue that is accessible (0%, 5%, 25%)
Prediction outputs from two neural networks are combined to give an average relative solvent accessibility.
Baxevanis & Ouellette Ch 8 (Ofran and Rost)
Solvent accessibility: Method performance
No large-scale continuous system for evaluation is available (unlike the case for 2D and 3D structure prediction)
Local sequence information is insufficient
Accessibility to solvent appears to be influenced by nonlocal effects
For two-state prediction (buried vs exposed) accuracy is between 75-85% for both PHDacc and PROFacc
For more detailed definitions (e.g., percentage of exposure), accuracy is more difficult to measure.
Correlation coefficient between predicted and measured solvent accessibility for PHDacc is 0.53
Random guess would yield a correlation coefficient of zero
Superior results require a homology model construction
Baxevanis & Ouellette Ch 8 (Ofran and Rost)
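The correlation coefficient quoted above is the usual way to compare predicted and measured accessibility when the values are continuous. A minimal sketch of the Pearson coefficient follows; the accessibility values are made up and are not results from any method.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Made-up relative accessibilities (%), for illustration only.
measured = [5.0, 12.0, 40.0, 75.0, 90.0]
predicted = [10.0, 8.0, 35.0, 60.0, 95.0]
r = pearson(measured, predicted)
```

A value of 1 means perfect linear agreement and 0 means none, so a coefficient of 0.53 sits roughly midway between a random guess and a perfect prediction.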
3D-structure prediction
Basic premise: The function and structure of
a protein are encoded in its primary sequence
The amino acid sequence determines a protein's 3D
structure, subcellular localization, intermolecular
interactions, biochemical physiological tasks, and
(eventually) how and when it will be broken down into
its component building blocks.
Paraphrased from class text (Ofran and Rost), p 198
How many unique protein folds are there?
Many structural biologists believe that all protein domains will eventually be classified into only 1000 different fold classes
Koonin et al 2002
Structural Genomics Initiative is designed to populate that fold space
Even with attempts to solve novel structures, upon examination of
new structures, many are clearly members of existing structural
classes
Baxevanis & Ouellette (Ch. 9, Wishart)
3D structure classification schemes All alpha (>50% helix; 30% beta sheet;
Threading
Limited to generating approximate models or suggesting approximate folds
>5 Angstroms for 3D threading
>3 Angstroms for 2D threading
Name based on threading a tube (called a snake) through a plumbing system.
Each unique threading of a sequence through the 3D model can be evaluated using an empirically derived energy function or measure of packing efficiency
Sequences can be scored based on how well they fit the model (i.e., the best score achievable)
Baxevanis & Ouellette Ch 9 (Wishart)
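The idea of scoring each threading of a sequence through a template and keeping the best score can be shown with a toy example. The sketch below is heavily simplified: it considers only ungapped placements, whereas real threading methods allow gaps, and the contact list and pair potential are made up for illustration.

```python
def threading_score(sequence, offset, contacts, potential):
    """Sum pair potentials over template contacts for one ungapped threading.

    `contacts` lists template position pairs (i, j) that are in contact;
    residue sequence[i - offset] is placed at template position i.
    """
    score = 0.0
    for i, j in contacts:
        a, b = i - offset, j - offset
        if 0 <= a < len(sequence) and 0 <= b < len(sequence):
            pair = tuple(sorted((sequence[a], sequence[b])))
            score += potential.get(pair, 0.0)
    return score

def best_threading(sequence, template_length, contacts, potential):
    """Try every ungapped placement; lower pseudo-energy is better."""
    offsets = range(template_length - len(sequence) + 1)
    return min(
        (threading_score(sequence, o, contacts, potential), o) for o in offsets
    )

# Toy template contacts and a made-up potential (negative = favourable).
contacts = [(0, 3), (2, 5), (4, 7)]
potential = {("L", "L"): -1.0, ("K", "L"): 0.5, ("K", "K"): 1.0}
score, offset = best_threading("LKKL", 8, contacts, potential)
```

The best score over all placements is the sequence's score against that template; repeating this over a library of templates would rank candidate folds.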
Three-dimensional threading
First described by Novotny et al (1984)
Rediscovered in early 1990s
Jones et al 1992; Sippl & Weitckus 1992; Bryant & Lawrence 1993
Based largely on heuristic contact potentials (interactions between pairs of residues)
3D coordinates of theoretical structure (based on threading of sequence through PDB structure model) used to evaluate predicted contacts and derive a fitness score based on a pseudoenergy function
Powerful for predicting 3D structure of unknown proteins,
and for evaluating structure of known proteins
Limitations found in this method:
interactions are not always conserved between distant homologs
Computational complexity (very slow)
Modest accuracy (early methods ignored amino acid information; model accuracy >5 Angstroms)
Baxevanis & Ouellette Ch 9 (Wishart)
Contact maps
2D plots of distances between C-alpha atoms of all pairs of residues
Observed interactions between amino acids used to form contact potentials for 3D threading methods
Creighton, Proteins Ch. 6
Figure 6.14
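A contact map of this kind can be computed directly from C-alpha coordinates. The sketch below assumes coordinates are given as (x, y, z) tuples in Angstroms; the 8 Angstrom cutoff and the minimum sequence separation are common choices but are assumptions here, and the coordinates are made up.

```python
import math

def distance_matrix(ca_coords):
    """Pairwise distances between C-alpha coordinates (x, y, z)."""
    return [[math.dist(a, b) for b in ca_coords] for a in ca_coords]

def contact_map(ca_coords, cutoff=8.0, min_separation=3):
    """Boolean matrix: True where two residues lie within `cutoff` Angstroms.

    Pairs closer than `min_separation` in sequence are skipped, since
    chain neighbours are always near each other.
    """
    dist = distance_matrix(ca_coords)
    n = len(ca_coords)
    return [
        [abs(i - j) >= min_separation and dist[i][j] <= cutoff for j in range(n)]
        for i in range(n)
    ]

# Made-up coordinates tracing a hairpin-like path, 3.8 Angstroms per step.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
          (7.6, 3.8, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
cmap = contact_map(coords)
```

Counting which residue types appear at the True positions across many solved structures is one way the observed interactions could be turned into contact potentials.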
Two-dimensional threading
Sequence-profile methods; combine 2ary structure prediction (and possibly solvent accessibility) with standard profile methods to score and align proteins
Improved accuracy through combined use of 2ary structure and amino acid similarity
Much faster than standard 3D threading
Model accuracy good but not excellent (RMSD >3 Angstroms)
However, for model construction for proteins with no close homologs with solved structure, these methods are among the best
Examples: UCSC SAM-T99 (two-track HMMs), 3d-pssm, FUGUE
Judged best by EVA
Baxevanis & Ouellette Ch 9 (Wishart)
Rosetta
Hybrid ab initio and homology-based
structure prediction
David Baker
The HMMSTR-Rosetta server
http://www.bioinfo.rpi.edu/~bystrc/hmmstr/server.php
Assessing method performance
Astral benchmark datasets
Park et al
CASP experiments
EVA and Livebench
Continuous evaluation of webservers
Experimental methods for solving
protein 3D structure
Experimental determination of protein structure
X-ray crystallography
NMR spectroscopy
X-ray crystallography
Most accurate; can be applied to larger proteins
Oldest method; first structure (myoglobin) determined in late 1950s (Kendrew et al 1958). More than 20K structures solved to date
Method: Small protein crystals (measuring
NMR spectroscopy
Much newer: first NMR structure in 1983
Allows biologists to study structure and dynamics of molecules in liquid state
(or near-physiological environment)
Structures solved by measuring how radio waves are absorbed by atomic nuclei
Absorption measurement allows the determination of how much nuclear magnetism is transferred from one atom (or nucleus) to another
Magnetization transfer measured through chemical shifts, J-couplings and nuclear Overhauser effects
Measured parameters define a set of approximate structural constraints that are fed into a constraint minimization calculation (distance geometry or simulated annealing)
Result is an ensemble of 15-50 structures that satisfy the experimental constraints
These multiple structures are overlaid/superimposed on each other to produce blurrograms
NMR result is potentially more reflective of true solution behavior of proteins; most proteins seem to exist in an ensemble of slightly different configurations
Baxevanis & Ouellette (Ch. 9, Wishart)
Limitations of NMR spectroscopy
Size limitations: maximum of 30kD (~250aa)
Solubility of molecule
cannot be applied to membrane proteins
Expensive: requires special isotopically labeled molecules
Inherently less precise
Baxevanis & Ouellette (Ch. 9, Wishart)
Storing and retrieving protein structures
The Protein Data Bank (PDB)
First electronic database in bioinformatics
Set up at Brookhaven National Laboratory by Walter Hamilton in 1971
7 protein structures at database initiation
Coordinates stored and distributed on punch cards and computer tape
Currently
22K structures (as of October 23, 2005)
Coordinate distribution and deposition is electronic (via the World Wide Web)
Moved to the Research Collaboratory for Structural Bioinformatics (RCSB) in 1998
Primary archival center for experimentally determined 3D structures of proteins, nucleic acids, carbohydrates and complexes
Separate repository for theoretical models
Baxevanis & Ouellette (Ch. 9, Wishart)
http://www.usm.maine.edu/~rhodes/ModQual/index.html
http://www.usm.maine.edu/~rhodes/ModQual/index.html
Summary
Experimental determination of protein structure is
expensive and not always straightforward
Predictive methods are relied upon to obtain clues to protein fold (and function)
Knowing what (which parts of a protein structure) you can believe and what you can't is critical for both experimental and predicted structures
Summary (contd)
Ab initio methods of protein fold prediction use physics-based energy minimization to simulate the process of protein folding
These methods are generally less successful than homology-based fold prediction (limited to short peptides/small proteins)
Exception: Rosetta/I-sites methods (Baker group) which employ both types of approach
Threading methods fall into the homology-based class of approaches.
2D profiles use 2ary structure (prediction/knowledge) as well as sequence information (and perhaps additional information).
3D profiles use 3D models and assign scores to proteins based on inter-residue contacts, using the observed contacts in the original structure template and contact potentials derived from other structures
Summary (contd)
Community assessment of 2D and 3D structure prediction uses various approaches
EVA and LiveBench (continuous real-time assessment of methods)
CASP (Critical Assessment of Protein Structure Prediction)
Benchmark datasets (e.g., Astral PDB40 for fold recognition)
Reported accuracy of 2D structure prediction between 75-77% (for best methods)
Reported accuracy of comparative models derived by 3D structure prediction servers is harder to assess.
Fold prediction (ignoring the comparative model construction) is fairly accurate for the best servers, provided
A homologous structure has already been deposited in the PDB
That structure can be detected with a significant E-value using sequence information alone (e.g., by PSI-BLAST)
The inclusion of 2ary structure prediction (e.g., in 2D profiles) can improve the alignment and give a modest boost to fold recognition accuracy when %ID is very low, but can also yield errors in prediction