yoonjoo choi , sumeet agarwal and charlotte m. deaneafter superimposing anchor structures. modeller...
TRANSCRIPT
-
Submitted 13 November 2012Accepted 21 November 2012Published 12 February 2013
Corresponding authorCharlotte M. Deane,[email protected]
Academic editorWalter de Azevedo, Jr
Additional Information andDeclarations can be found onpage 12
DOI 10.7717/peerj.1
Copyright2013 Choi et al.
Distributed underCreative Commons CC-BY 3.0
OPEN ACCESS
How long is a piece of loop?
Yoonjoo Choi1, Sumeet Agarwal2,3,4 and Charlotte M. Deane5
1 Department of Computer Science, Dartmouth College, Hanover, NH, USA2 Department of Systems Biology Doctoral Training Centre, University of Oxford, Oxford,
United Kingdom3 Department of Physics, University of Oxford, Oxford, United Kingdom4 Department of Electrical Engineering, Indian Institute of Technology Delhi, Delhi, India5 Department of Statistics, University of Oxford, United Kingdom
ABSTRACTLoops are irregular structures which connect two secondary structure elements inproteins. They often play important roles in function, including enzyme reactionsand ligand binding. Despite their importance, their structure remains difficultto predict. Most protein loop structure prediction methods sample local loopsegments and score them. In particular protein loop classifications and databasesearch methods depend heavily on local properties of loops. Here we examine thedistance between a loop’s end points (span). We find that the distribution of loopspan appears to be independent of the number of residues in the loop, in other wordsthe separation between the anchors of a loop does not increase with an increasein the number of loop residues. Loop span is also unaffected by the secondarystructures at the end points, unless the two anchors are part of an anti-parallelbeta sheet. As loop span appears to be independent of global properties of theprotein we suggest that its distribution can be described by a random fluctuationmodel based on the Maxwell–Boltzmann distribution. It is believed that theprimary difficulty in protein loop structure prediction comes from the numberof residues in the loop. Following the idea that loop span is an independent localproperty, we investigate its effect on protein loop structure prediction and showhow normalised span (loop stretch) is related to the structural complexity ofloops. Highly contracted loops are more difficult to predict than stretched loops.
Subjects Biochemistry, Bioinformatics, Biophysics, Computational BiologyKeywords Protein structure, Protein loop, Protein structure prediction, Protein loop structure,Protein loop structure prediction, Protein, Loop stretch, Loop span
INTRODUCTIONProtein loops are patternless regions which connect two regular secondary structures.
They are generally located on the protein’s surface in solvent exposed areas and often play
important roles, such as interacting with other biological objects.
Despite the lack of patterns, loops are not completely random structures. Early studies
of short turns and hairpins showed that these peptide fragments could be clustered
into structural classes (Richardson, 1981; Sibanda & Thorton, 1985). Such classifications
have also been made across all loops (Burke, Deane & Blundell, 2000; Chothia & Lesk,
1987; Donate et al., 1996; Espadaler et al., 2004; Oliva et al., 1997; Vanhee et al., 2011)
How to cite this article Choi et al. (2013), How long is a piece of loop? PeerJ 1:e1; DOI 10.7717/peerj.1
mailto:[email protected]:[email protected]://peerj.com/academic-boards/editors/https://peerj.com/academic-boards/editors/http://dx.doi.org/10.7717/peerj.1http://dx.doi.org/10.7717/peerj.1http://creativecommons.org/licenses/by/3.0/http://creativecommons.org/licenses/by/3.0/https://peerj.comhttp://dx.doi.org/10.7717/peerj.1
-
or within specific protein families such as antibody complementarity determining
regions (CDRs) (Al-Lazikani, Lesk & Chothia, 1998; Chothia & Lesk, 1987; Chothia et
al., 1989). Loop classifications are generally based on local properties such as sequence,
the secondary structures from which the loop starts and finishes (anchor region), the
distance between the anchors, and the geometrical shape along the loop structure
(Kwasigroch, Chomilier & Mornon, 1996; Leszczynski & Rose, 1986; Ring et al., 1992; Wojcik,
Mornon & Chomilier, 1999).
Loops can also be classified in terms of function. There is some evidence that a loop
can have local functionality. Experiments have been carried out which support the idea
that swapping a local loop sequence for a different functional loop sequence enables the
new function to be taken on (Pardon et al., 1995; Toma et al., 1991; Wolfson et al., 1991).
One important example of functional loop exchange is in the development of humanised
antibodies (Queen et al., 1989; Riechmann et al., 1988).
Accurate protein loop structure prediction remains an open question. Protein loop
predictors have dealt with the problem as a case of local protein structure prediction.
Protein structures are hypothesised to be in thermodynamic equilibrium with their
environment (Anfinsen, 1973). Thus the primary determinant of a protein structure
is considered to be its atomic interactions, i.e. its amino acid sequence. An analogous
conjecture has arisen at the local scale where environment other than loop structure is
fixed. Thus the modelling of protein loops is often considered a mini protein folding
problem (Fiser, Do & Sali, 2000; Nagi & Regan, 1997). Although most loop structure
prediction methods are based on this conjecture, loop sequence alone is not the complete
determinant of the loop structure as even identical loop sequences can take multiple
structural conformations depending on external environmental factors such as solvent
and ligand binding (Fernandez-Fuentes & Fiser, 2006). Quintessential examples of such
multiple loop structure conformations can be found in antibody CDR loops upon antigen
binding (Choi & Deane, 2011).
Database search methods have been successful in the realm of loop structure prediction
(Verschueren et al., 2011). They depend upon the assumption that similarity between local
properties may suggest similar local structures. All database search methods work in an
analogous fashion using either a complete set or a classified set of loops and selecting
predictions using local features including sequence similarity and anchor geometry
(Choi & Deane, 2010; Fernandez-Fuentes, Oliva & Fiser, 2006; Hildebrand et al., 2009;
Peng & Yang, 2007; Wojcik, Mornon & Chomilier, 1999). Ab initio loop modelling methods
aim to predict peptide fragments that do not exist in homology modelling templates
without structure databases. Generally, ab initio methods generate large local structure
conformation sets and select predictions (de Bakker et al., 2003; Fiser, Do & Sali, 2000;
Jacobson et al., 2004; Mandell, Coutsias & Kortemme, 2009; Soto et al., 2008). The generated
loop candidates are optimised against scoring functions. In all loop modelling procedures
anchor regions are often problematic and the accuracy of loop modelling depends upon
the distance between the anchors (Xiang, 2006).
Choi et al. (2013), PeerJ, DOI 10.7717/peerj.1 2/15
https://peerj.comhttp://dx.doi.org/10.7717/peerj.1
-
Here, we focus on a specific local property of protein loop structure: the distance
between the two terminal Cα atoms of the loop, which we refer to as its span. The natureof the span distribution is broadly similar across different protein classes or anchor types,
except for loops linking anti-parallel strands (anti-parallel β loops). In particular, the
most highly frequent span appears to stay the same irrespective of the number of residues.
This suggests that the span is distributed independently of other local properties and
global structures. We demonstrate that the observed span distribution can largely be
explained by a simple model of random fluctuations with a given length scale, based on the
Maxwell–Boltzmann distribution.
It is widely believed that the accuracy of loop structure prediction depends on the
number of residues, i.e. the larger the number of residues, the more difficult a loop is
to predict (Choi & Deane, 2010; Karen et al., 2007). We introduce the normalised span
which indicates how stretched a loop is (loop stretch λ). Fully stretched loops (λ ' 1)are almost always predicted accurately, whereas contracted loops (λ� 1) are harder topredict. In fact, shorter loops tend to be more stretched whereas longer loops are likely
to be highly contracted. We suggest that loop stretch should be addressed in practical
modelling situations and loop structure prediction should be concerned with predicting
highly contracted loops.
MATERIALS AND METHODSLoop definitionIn each of the sets of protein structures loops, were identified using the following protocol.
Secondary structures were annotated using JOY (Mizuguchi et al., 1998). A loop structure
was defined as any region between two regular secondary structures that was at least three
residues in length (Donate et al., 1996). Short (less than 4 residues in length) loops were
discarded. Redundancy was removed using sequence identity. If a pair of loops shares over
40% sequence identity (Fernandez-Fuentes & Fiser, 2006), the loop which has a higher
average B-factor was discarded.
Membrane protein structuresMembrane proteins (3,789 chains) were extracted from PDBTM (Tusnady, Dosztanyi &
Simon, 2004). The membrane layer was defined as being from−20 to+20 Å (Scott et al.,
2008) from the centre of the protein and loops whose two end Cα atom coordinates wereoutside the layer were discarded. A total of 1,027 non-redundant membrane loops were
defined.
Soluble protein structuresAll protein chains determined by X-ray crystallography which share less than 99%
sequence identity (
-
Figure 1 The definition of loop span and loop stretch. Loop span is the separation of the two Cαs ateither end of the loop. In this example, 2J9O Chain A (198-205) has a span of 13.7 Å and contains 8residues. Maximum span can be calculated from the number of residues in the loop to be 21.6 Å. Loopstretch is the normalised span (13.7/21.6' 0.63).
was then used to compare the 3,789 membrane chains against the soluble set. Any chains
found during 5 iterations with an E-value cut-off of 0.001 were removed from the list of
soluble protein chains. A total of 25,191 non-redundant soluble loops were identified from
27,717 soluble protein chains.
Loop span and loop stretchThe loop span (l) is the distance between the two terminal Cα atoms of a loop (Fig. 1).
The maximum span lmax is a function of the number of residues n and calculated asfollows:
lmax(n)=
{γ · (n/2− 1)+ δ if n is even
γ · (n− 1)/2 if n is odd
where γ = 6.046 Å and δ = 3.46 Å (Flory, 1998; Tastan, Klein-Seetharaman & Meirovitch,2009). If the distance between two terminal Cα atoms in the loop (i.e. the span) is l, theloop stretch (λ) of the loop is defined as a normalised span.
λ≡l
lmax. (1)
Note that the values of γ and δ are theoretical approximations so the λ of some loops
may occasionally be larger than 1. Similar notations are found in Ring et al. (1992), Tastan,
Klein-Seetharaman & Meirovitch (2009).
Choi et al. (2013), PeerJ, DOI 10.7717/peerj.1 4/15
https://peerj.comhttp://dx.doi.org/10.7717/peerj.1
-
PROTEIN STRUCTURE PREDICTION AND LOOPSTRETCHLoop modelling test setsThere are two modelling test sets. The first set includes loops of 8 residues. The loops were
binned every 0.1 loop stretch. In each bin, 40 test loops were randomly selected. A total of
320 test loops from 0.2 to 1 in loop stretch were used (a full list is given in Table S1).
The second set consists of loops of between 6 and 10 residues in length. Two classes
of loops were collected at each length: contracted loops (λ < 0.4) and stretched loops(λ > 0.95); an identical number of loops was kept in each of these classes at each length.A total of 346 test loops were identified (58, 72, 110, 58 and 48 loops respectively, See
Tables S2 and S3). For example, there are 55 contracted test loops and 55 stretched loops
for loops of 8 residues.
The measurement of accuracy is loop RMSD of all backbone atoms (N, Cα , C and O)after superimposing anchor structures.
MODELLER settingThe default loop refinement script was used. One hundred loop models were sampled
under the molecular dynamics level of slow. The DOPE potential energy (Shen & Sali,
2006) was used for model quality assessment.
FREAD settingA database was constructed using the 27,717 soluble protein chains defined above. All the
parameters were set as default (the environment substitution score cut-off value≥25). Any
results from self-prediction were eliminated.
RESULTSNomenclatureIn this paper, proteins are divided into two main classes: membrane and soluble proteins.
Loops from membrane protein structures are called “membrane loops” and those from
soluble protein structures are referred to as “soluble loops”. Loops are also described by
their secondary structure types: for example, loops connecting anti-parallel β sheets are
termed “anti-parallel β loops”. The physical spatial distance between the two end Cα atomsof a loop is referred to as “span” (l). Maximum loop span (lmax) is the furthest that a setof residues can spread. “Loop stretch” (λ) is the normalised loop span: the observed span
between two Cα atoms at each end of a loop in a protein structure over the loops maximumspan (Fig. 1).
Loop span distributionThe number of residues in a loop is distributed in a similar fashion regardless of anchor
types except for the loops linking anti-parallel β sheets due to the constraint of hydrogen
bonds between adjacent β strands (Fig. 2A). Figure 2B displays how loop spans are
Choi et al. (2013), PeerJ, DOI 10.7717/peerj.1 5/15
https://peerj.comhttp://dx.doi.org/10.7717/peerj.1http://dx.doi.org/10.7717/peerj.1http://dx.doi.org/10.7717/peerj.1http://dx.doi.org/10.7717/peerj.1
-
Figure 2 Statistics of protein loops. (A) The frequency distribution of loops containing different numbers of residues. Anti-parallel β loops tendto have fewer residues. (B) The loop span distribution in terms of the anchor secondary structure do not show differences except for anti-parallel βloops. The upper part of the anti-parallel β loop span distribution is omitted in the figure. (C) The distributions of soluble loop span and membraneloop span appear to be similar. (D)–(G) Q–Q plots showing that the membrane and soluble loop span distributions are from the same probabilitydistribution.
distributed for different anchor types. Again, apart from anti-parallel β loops, the loop
span distributions do not change with anchor structures.
The loop span distribution also does not alter when considering different protein
classes. Figures 2C–2G show how the loop spans of membrane loops and soluble loops
are distributed in a similar manner.
Essentially a loop span value reflects how distant the end tips of the two secondary
structures that the loop connects are. These observations suggest that the loop span may
be distributed independently of local anchor structures and protein types, i.e. anchor
distances do not depend on local secondary structure elements or global protein structures.
Choi et al. (2013), PeerJ, DOI 10.7717/peerj.1 6/15
https://peerj.comhttp://dx.doi.org/10.7717/peerj.1
-
Figure 3 The span distributions for loops containing different numbers of residues. (A) These appearto show a constant mode. Data here is soluble loops excluding anti-parallel beta loops. (B) The modesfor the span distributions for loops containing different numbers of residues compared to the maximumspan for that length. The span modes were estimated using the Gaussian kernel density estimation. Notethat the estimated mode of loops of 4 residues is close to its maximum span.
The modes of loop span distributions are roughly constant (Fig. 2B), even if we split the
loops in terms of the number of residues (Fig. 3A). We fit our data using the Gaussian
kernel density estimation. The estimated distributions show a nearly constant mode
('13 Å on average, Fig. 3B). This constant span value may be due to protein packing.
Folded proteins tend to be tightly packed and thus secondary structures are placed close
to one another while avoiding side chain steric clashes. This packing effect may mean that
the end points of two secondary structures (i.e. span) will lie within a constant span value
regardless of the number of residues in a loop.
Maxwell–Boltzmann distribution for loop spanFrom the above observations, it appears that loop span is distributed independently of
local anchor structures or global protein classes. Here we assume that a protein loop is an
independent unit of the protein structure and the span is determined regardless of any
other effects including sequence or the rest of the structure.
Choi et al. (2013), PeerJ, DOI 10.7717/peerj.1 7/15
https://peerj.comhttp://dx.doi.org/10.7717/peerj.1
-
Here a model for the loop span distribution is established under the hypothesis
that the two end points of a loop fluctuate in three dimensional space, following the
Maxwell–Boltzmann distribution. Two constraints are imposed in this model: the
minimum span lmin and the maximum span as a function of the number of residueslmax(n). Within these constraints, the span oscillates according to a normal distributionN (µ,σ 2) with a given length-scale lmode in three dimensional space.
The underlying assumptions are that the end points cannot approach each other too
closely, and that there is a maximum span achievable for a loop with a given number of
residues (n). Within these constraints, the span is allowed to fluctuate around the givenlength-scale lmode in three dimensional space. Thus, in this model, the loop span l of nresidues is distributed as
l=√
l2x + l2y + l2z lx,ly,lz ∼N
(0,
l2mode2
)(2)
subject to the constraints that l ≥ lmin and l ≤ lmax(n), as stated above. The varianceof l2mode/2 corresponds to a modal span of lmode. Thus there are two parameters to bedetermined in our model: lmin and lmode. We set lmin to 3.8 Å, which is the typical distancebetween two neighbouring Cα atoms in a protein chain. lmode is set to an estimate of theempirical mode using the Gaussian kernel density estimation (12.7 Å).
As there are not many longer loops in the data set, loops longer than 20 residues were
discarded. In addition, all anti-parallel β loops were eliminated due to their physical
constraints. These eliminations left 21,597 soluble loops (The frequency distribution for
each number of residues is in Fig. S2). Having set the two parameters lmin and lmode, loopspans were generated 10 times per model in accordance with the Maxwell–Boltzmann
distribution, preserving the observed distribution of the number of residues (i.e. 10
simulated loop spans were generated for each real loop in the data set). The simulation out-
come is depicted in Fig. 4A. The two distributions show the same shape and the quantile
comparison in Fig. 4B indicates that they are statistically similar except for the tail region.
There are apparent anomalies between the simulated and real span distributions
towards the extremes. The model seems to predict more short-span loops than observed.
Our model imposes a sharp lower threshold at lmin = 3.8 Å, whereas in reality we expecta smoother transition. In other words, we expect our assumption of free fluctuation to
break down when the span gets close to the lower bound and the physical constraints begin
to become relevant. On the other side of the distribution, we see a substantially higher
number of long-span loops (>20 Å) than predicted by the model. The mismatches in the
long-span region tend to become more prominent as the number of residues is increased.
When we examined which loops tend to have exceptionally long spans, we found that
some of these “loops” are domain linkers between independent folding units and therefore
likely to be under different constraints. Others appear to have been misclassified, as the
loop definition used here is based only on the anchors containing at least three consecutive
residues of secondary structures and the loop containing none. This allows segments such
Choi et al. (2013), PeerJ, DOI 10.7717/peerj.1 8/15
https://peerj.comhttp://dx.doi.org/10.7717/peerj.1http://dx.doi.org/10.7717/peerj.1
-
Figure 4 Maxwell–Boltzmann distribution and loop span distribution. (A) The loop span distribution(black) from soluble loops and that of the Maxwell–Boltzmann distribution (red). (B) The Q–Q plotsuggesting that they follow the same distribution.
as termini structures to be included if there happen to be very short helical segments at a
protein structure’s terminus (Fig. S1).
Protein structure prediction and loop stretchThe number of residues in loops is known to be related to the protein stability (Nagi
& Regan, 1997) and the accuracy of most loop modelling techniques. Based on our
observation that the loop span is independent of other properties, we examine its effects
on protein loop structure prediction. Here we introduce loop stretch, the normalised loop
span (Eq. (1)). Loop stretch values take on a range of 0 to 1, which indicates how stretched a
loop is (1: fully stretched).
Figure 5 displays how loop stretch frequencies are distributed for different numbers
of residues, demonstrating that the number of residues is negatively correlated with loop
stretch, i.e. the longer a loop is, the more likely it is to be contracted. This may suggest
that, instead of the standard belief that loop modelling performs worse as the number
of residues in the loop increases, it may be that the real problem is better described by
considering how stretched the loop to be predicted is. For example, if a loop contains many
Choi et al. (2013), PeerJ, DOI 10.7717/peerj.1 9/15
https://peerj.comhttp://dx.doi.org/10.7717/peerj.1http://dx.doi.org/10.7717/peerj.1
-
Figure 5 Loop stretch of long and short loops. Loop stretch distributions for loops containing differentnumbers of residues. Shorter loops tend to be more stretched whereas longer loops are likely to be morecontracted. Only soluble loops excluding anti-parallel β loops are plotted.
residues but is highly stretched, it will be predicted relatively accurately, as it can take on
only a small number of different conformations.
In order to check the relationship between accuracy and loop stretch we used a
test set containing only 8 residue loops with 40 non-redundant loops in every 0.1
loop stretch bin. Two loop modelling methods, which use two different sampling
methods, were tested. MODELLER (Fiser, Do & Sali, 2000) is a popular protein structure
prediction programme which has a built-in ab initio loop modelling module. FREAD
(Choi & Deane, 2010) is a database search method which samples candidate loops
depending on local properties and ranks predictions based on local loop sequence
similarity and anchor geometry matches.
The average accuracy of MODELLER shows a negative linear correlation against loop
stretch for the first test set (Fig. 6A). In the case of fully stretched loops (λ > 0.95),MODELLER can produce consistently accurate predictions, but its predictions worsen
as the target loops are less stretched. FREAD produces more accurate predictions than
MODELLER in general. However its predictions also begin to disperse as the loops become
more contracted (Fig. 6B). FREAD generates candidate loops based on anchor matches
and sequence similarity for a given loop target. This may imply that contracted loops tend
to have multiple structural conformations or stringent sequence identity is required to
predict such highly contracted loops. It should be noted that FREAD is not able to predict
all the target loops due to the incompleteness of the structure database it uses (Fig. 6C).
In order to further assess the effect of loop stretch in loop structure prediction,
MODELLER was re-examined on a second set. The second test set consists of loops from
6 to 10 residues in length. In this set, for each number of residues, the same numbers
of loops (See Materials and Methods) were selected for both contracted (λ < 0.4) andfully stretched loops (λ > 0.95). MODELLER produces consistently accurate results for
Choi et al. (2013), PeerJ, DOI 10.7717/peerj.1 10/15
https://peerj.comhttp://dx.doi.org/10.7717/peerj.1
-
Figure 6 Protein loop structure prediction and loop stretch. Accuracy of protein loop structure prediction methods do not only depend on thenumber of residues, but also on loop stretch. MODELLER (A) and FREAD (B) both show accurate results when the target loop is stretched on thefirst set (including loops of 8 residues in length only). MODELLER shows worse prediction as loop stretch decreases whereas FREAD gives consistentaccuracy on loop stretch. However both fail to predict very contracted loops (λ < 0.4). (C) The coverage of FREAD predictions in terms of loopstretch. (D) The second test set (contracted (λ < 0.4) and stretched (λ > 0.95) loops). The test loops are also split by the number of residues. Forfully stretched loops (λ > 0.95), regardless of the number of residues, MODELLER predicts accurately.
fully stretched loops regardless of the number of residues, but fails to accurately predict
contracted loops (Fig. 6D).
We calculated the partial correlations (Spearman’s rank correlation) between accuracy,
and the number of residues and loop stretch on the second test set so as to investigate what
affects the prediction accuracy more (the number of residues or loop stretch). The partial
correlation between loop stretch and RMSD is larger than that between the number of
residues and RMSD (−0.465 and 0.367 respectively). Loop stretch, just like the number of
residues is something that can be calculated without knowledge of loop conformation and
therefore can be used in the design of loop structure prediction software.
Choi et al. (2013), PeerJ, DOI 10.7717/peerj.1 11/15
https://peerj.comhttp://dx.doi.org/10.7717/peerj.1
-
DISCUSSIONIn this paper, we focus on a specific local property (span) and demonstrate that the modes
of loop span distribution appear to be independent of the number of residues. Loop
span shows a distinct frequency distribution which does not depend on anchor types or
protein classes. From these observations, we hypothesised that loop span is independent
of the other effects and showed how the loop span distribution appears to correspond to a
truncated Maxwell–Boltzmann distribution.
The reason behind the independence of loop span from the number of loop residues
or secondary structure type is not known. The fact that the loop span distribution can be
captured by a simple Maxwell–Boltzmann model allows one to speculate that protein loop
structure prediction is indeed a local mini protein folding problem.
ADDITIONAL INFORMATION AND DECLARATIONS
FundingYoonjoo Choi was funded by the Department of Statistics, St. Cross College and the
University of Oxford. Sumeet Agarwal was funded by the Clarendon Fund of the University
of Oxford. The funders had no role in study design, data collection and analysis, decision to
publish, or preparation of the manuscript.
Grant DisclosuresThe following grant information was disclosed by the authors:
Department of Statistics, St. Cross College and the University of Oxford.
Clarendon Fund of the University of Oxford.
Competing InterestsProfessor Charlotte M. Deane is an editor of PeerJ.
Author Contributions• Yoonjoo Choi conceived and designed the experiments, performed the experiments,
analyzed the data, contributed reagents/materials/analysis tools, wrote the paper.
• Sumeet Agarwal performed the experiments, analyzed the data, contributed
reagents/materials/analysis tools.
• Charlotte M. Deane conceived and designed the experiments, wrote the paper.
Supplemental InformationSupplemental information for this article can be found online at http://dx.doi.org/
10.7717/peerj.1.
REFERENCESAl-Lazikani B, Lesk AM, Chothia C. 1998. Standard conformations for the canonical structures of
immunoglobulins. Journal of Molecular Biology 273:927–948 DOI ./jmbi...
Choi et al. (2013), PeerJ, DOI 10.7717/peerj.1 12/15
https://peerj.comhttp://dx.doi.org/10.7717/peerj.1http://dx.doi.org/10.7717/peerj.1http://dx.doi.org/10.7717/peerj.1http://dx.doi.org/10.7717/peerj.1http://dx.doi.org/10.7717/peerj.1http://dx.doi.org/10.7717/peerj.1http://dx.doi.org/10.7717/peerj.1http://dx.doi.org/10.7717/peerj.1http://dx.doi.org/10.7717/peerj.1http://dx.doi.org/10.7717/peerj.1http://dx.doi.org/10.7717/peerj.1http://dx.doi.org/10.7717/peerj.1http://dx.doi.org/10.7717/peerj.1http://dx.doi.org/10.7717/peerj.1http://dx.doi.org/10.7717/peerj.1http://dx.doi.org/10.7717/peerj.1http://dx.doi.org/10.7717/peerj.1http://dx.doi.org/10.7717/peerj.1http://dx.doi.org/10.7717/peerj.1http://dx.doi.org/10.1006/jmbi.1997.1354http://dx.doi.org/10.7717/peerj.1
-
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. 1997. GappedBLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic AcidsResearch 25:3389–3402 DOI ./nar/...
Anfinsen CB. 1973. Principles that govern the folding of protein chains. Science 181:223–230DOI ./science....
Burke DF, Deane CM, Blundell TL. 2000. Browsing the SLoop database of structurally classifiedloops connecting elements of protein secondary structure. Bioinformatics 16:513–519DOI ./bioinformatics/...
Choi Y, Deane CM. 2010. FREAD revisited: accurate loop structure prediction using a databasesearch algorithm. Proteins 78:1431–1440 DOI ./prot..
Choi Y, Deane CM. 2011. Predicting antibody complementarity determining region structureswithout classification. Molecular Biosystems 7:3327–3334 DOI ./cmbc.
Chothia C, Lesk AM. 1987. Canonical structures for the hypervariable regions ofimmunoglobulins. Journal of Molecular Biology 196:901–917 DOI ./-()-.
Chothia C, Lesk AM, Tramontano A, Levitt M, Smith-Gill SJ, Air G, Sheriff S,Padlan EA, Davies D, Tulip WR, Colman PM, Spinelli S, Alzari PM, Poljak RJ. 1989.Conformations of immunoglobulin hypervariable regions. Nature 342:877–883DOI ./a.
de Bakker PI, DePristo MA, Burke DF, Blundell TL. 2003. Ab initio construction of polypeptidefragments: accuracy of loop decoy discrimination by an all-atom statistical potential andthe AMBER force field with the Generalized Born solvation model. Proteins 51:21–40DOI ./prot..
Donate LE, Rufino SD, Canard LH, Blundell TL. 1996. Conformational analysis and clustering ofshort and medium size loops connecting regular secondary structures: a database for modelingand prediction. Protein Science 5:2600–2616 DOI ./pro..
Espadaler J, Fernandez-Fuentes N, Hermoso A, Querol E, Aviles FX, Sternberg MJE, Oliva B.2004. ArchDB: automated protein loop classification as a tool for structural genomics. NucleicAcids Research 32:D185–D188 DOI ./nar/gkh.
Fernandez-Fuentes N, Fiser A. 2006. Saturating representation of loop conformational fragmentsin structure databanks. BMC Structural Biology 6: DOI ./---.
Fernandez-Fuentes N, Oliva B, Fiser A. 2006. A supersecondary structure library and searchalgorithm for modeling loops in protein structures. Nucleic Acids Research 34:2085–2097DOI ./nar/gkl.
Fiser A, Do RK, Sali A. 2000. Modeling of loops in protein structures. Protein Science 9:1753–1773DOI ./ps....
Flory P. 1998. Statistical mechanics of chain molecules. Hanser.
Hildebrand PW, Goede A, Bauer RA, Gruening B, Ismer J, Michalsky E, Preissner R. 2009.SuperLooper – a prediction server for the modeling of loops in globular and membraneproteins. Nucleic Acids Research 37:W571–W574 DOI ./nar/gkp.
Jacobson MP, Pincus DL, Rapp CS, Day TJ, Honig B, Shaw DE, Friesner RA. 2004. A hierarchicalapproach to all-atom protein loop prediction. Proteins 55:351–367 DOI ./prot..
Karen AR, Weigelt CA, Nayeem A, Krystek SR Jr. 2007. Loopholes and missing links in proteinmodeling. Protein Science 16:1–14 DOI ./ps..
Kwasigroch KM, Chomilier J, Mornon JP. 1996. A global taxonomy of loops in globular proteins.Journal of Molecular Biology 259:855–872 DOI ./jmbi...
Choi et al. (2013), PeerJ, DOI 10.7717/peerj.1 13/15
https://peerj.comhttp://dx.doi.org/10.1093/nar/25.17.3389http://dx.doi.org/10.1126/science.181.4096.223http://dx.doi.org/10.1093/bioinformatics/16.6.513http://dx.doi.org/10.1002/prot.22658http://dx.doi.org/10.1039/c1mb05223chttp://dx.doi.org/10.1016/0022-2836(87)90412-8http://dx.doi.org/10.1038/342877a0http://dx.doi.org/10.1002/prot.10235http://dx.doi.org/10.1002/pro.5560051223http://dx.doi.org/10.1093/nar/gkh002http://dx.doi.org/10.1186/1472-6807-1186-1115http://dx.doi.org/10.1093/nar/gkl156http://dx.doi.org/10.1110/ps.9.9.1753http://dx.doi.org/10.1093/nar/gkp338http://dx.doi.org/10.1002/prot.10613http://dx.doi.org/10.1110/ps.072887807http://dx.doi.org/10.1006/jmbi.1996.0363http://dx.doi.org/10.7717/peerj.1
-
Leszczynski JF, Rose GD. 1986. Loops in globular proteins: a novel category of secondarystructure. Science 234:849–855 DOI ./science..
Mandell DJ, Coutsias EA, Kortemme T. 2009. Sub-angstrom accuracy in protein loopreconstruction by robotics-inspired conformational sampling. Nature Methods 6:551–552DOI ./nmeth-.
Mizuguchi K, Deane CM, Blundell TL, Johnson MS, Overington JP. 1998. JOY: proteinsequence-structure representation and analysis. Bioinformatics 14:617–623 DOI ./bioin-formatics/...
Nagi AD, Regan L. 1997. An inverse correlation between loop length and stability in afour-helix-bundle protein. Folding and Design 2:67–75 DOI ./S-()-.
Oliva B, Bates PA, Querol E, Aviles FX, Sternberg MJE. 1997. An automated classficationof the structure of protein loops. Journal of Molecular Biology 266:814–830DOI ./jmbi...
Pardon E, Haezebrouck P, De Baetselier A, Hooke SD, Fancourt KT, Dobson JDCM, Dael HV,Joniau M. 1995. A Ca(2+)-binding chimera of human lysozyme and bovine alpha-lactalbuminthat can form a molten globule. Journal of Biological Chemistry 270:10514–10524DOI ./jbc....
Peng H, Yang A. 2007. Modeling protein loos with knowledge-based prediction of sequence-structure alignment. Bioinformatics 23:2836–2842 DOI ./bioinformatics/btm.
Queen C, Schneider WP, Selick HE, Payne PW, Landolfi NF, Duncan JF,Avdalovic NM, Levitt M, Junghans RP, Waldmann TA. 1989. A humanized antibodythat binds to the interleukin 2 receptor. Proceedings of the National Academy of Sciences USA86:10029–10033 DOI ./pnas....
Richardson JS. 1981. The anatomy and taxonomy of protein structure. Advances in ProteinChemistry 34:167–339 DOI ./S-()-.
Riechmann L, Clark M, Waldmann H, Winter G. 1988. Reshaping human antibodies for therapy.Nature 332:323–327 DOI ./a.
Ring CS, Kneller DG, Langridge R, Cohen FE. 1992. Taxonomy and conformational analysis ofloops in proteins. Journal of Molecular Biology 224:685–699 DOI ./-()-V.
Scott KA, Bond PJ, Ivetac A, Chetwynd AP, Khalid S, Sansom MSP. 2008. Coarse-GrainedMD simulations of membrane protein-bilayer self-assembly. Structure 16:621–630DOI ./j.str....
Shen MY, Sali A. 2006. Statistical potential for assessment and prediction of protein structures.Protein Science 15:2507–2524 DOI ./ps..
Sibanda BL, Thorton JM. 1985. Beta-hairpin families in globular proteins. Nature 316:170–174DOI ./a.
Soto CS, Fasnacht M, Zhu J, Forrest L, Honig B. 2008. Loop modeling: sampling, filtering, andscoring. Proteins 70:834–843 DOI ./prot..
Tastan O, Klein-Seetharaman J, Meirovitch H. 2009. The effect of loops on the structuralorganization of-helical membrane proteins. Biophysical Journal 96:2299–2312DOI ./j.bpj....
Toma S, Campagnoli S, Margarit I, Gianna R, Grandi G, Bolognesi M, De Filippis V, Fontana A.1991. Grafting of a calcium-binding loop of thermolysin to Bacillus subtilis neutral protease.Biochemistry 30:97–106 DOI ./bia.
Choi et al. (2013), PeerJ, DOI 10.7717/peerj.1 14/15
https://peerj.comhttp://dx.doi.org/10.1126/science.3775366http://dx.doi.org/10.1038/nmeth0809-551http://dx.doi.org/10.1093/bioinformatics/14.7.617http://dx.doi.org/10.1093/bioinformatics/14.7.617http://dx.doi.org/10.1016/S1359-0278(97)00007-2http://dx.doi.org/10.1006/jmbi.1996.0819http://dx.doi.org/10.1074/jbc.270.18.10514http://dx.doi.org/10.1093/bioinformatics/btm456http://dx.doi.org/10.1073/pnas.86.24.10029http://dx.doi.org/10.1016/S0065-3233(08)60520-3http://dx.doi.org/10.1038/332323a0http://dx.doi.org/10.1016/0022-2836(92)90553-Vhttp://dx.doi.org/10.1016/j.str.2008.01.014http://dx.doi.org/10.1110/ps.062416606http://dx.doi.org/10.1038/316170a0http://dx.doi.org/10.1002/prot.21612http://dx.doi.org/10.1016/j.bpj.2008.12.3894http://dx.doi.org/10.1021/bi00215a015http://dx.doi.org/10.7717/peerj.1
-
Tusnady GE, Dosztanyi ZD, Simon I. 2004. Transmembrane proteins in the Protein Data Bank:identification and classification. Bioinformatics 20:2964–2972 DOI ./bioinformat-ics/bth.
Vanhee P, Verschueren E, Baeten L, Stricher F, Serrano L, Rousseau F, Schymkowitz J. 2011.BriX: a database of protein building blocks for structural analysis, modeling and design. NucleicAcids Research 39:D435–D442 DOI ./nar/gkq.
Verschueren E, Vanhee P, van der Sloot AM, Serrano L, Rousseau F, Schymkowitz J. 2011.Protein design with fragment databases. Current Opinion in Structural Biology 21:452–459DOI ./j.sbi....
Wang G, Dunbrack RL Jr. 2005. PISCES: recent improvements to a PDB sequence culling server.Nucleic Acids Research 33:W94–W98 DOI ./nar/gki.
Wojcik J, Mornon JP, Chomilier J. 1999. New efficient statistical sequence-dependent structureprediction of short to medium-sized protein loops based on an exhaustive loop classification.Journal of Molecular Biology 289:1469–1490 DOI ./jmbi...
Wolfson AJ, Kanaoka M, Lau FT, Ringe D. 1991. Insertion of an elastase-binding loop intointerleukin-1 beta. Protein Engineering 4:313–317 DOI ./protein/...
Xiang Z. 2006. Advances in homology protein structure modeling. Current Protein & PeptideScience 7:217–227 DOI ./.
Choi et al. (2013), PeerJ, DOI 10.7717/peerj.1 15/15
https://peerj.comhttp://dx.doi.org/10.1093/bioinformatics/bth340http://dx.doi.org/10.1093/bioinformatics/bth340http://dx.doi.org/10.1093/nar/gkq972http://dx.doi.org/10.1016/j.sbi.2011.05.002http://dx.doi.org/10.1093/nar/gki402http://dx.doi.org/10.1006/jmbi.1999.2826http://dx.doi.org/10.1093/protein/4.3.313http://dx.doi.org/10.2174/138920306777452312http://dx.doi.org/10.7717/peerj.1
How long is a piece of loop?IntroductionMaterials and MethodsLoop definitionMembrane protein structuresSoluble protein structuresLoop span and loop stretch
Protein Structure Prediction and Loop StretchLoop modelling test setsMODELLER settingFREAD setting
ResultsNomenclatureLoop span distributionMaxwell--Boltzmann distribution for loop spanProtein structure prediction and loop stretch
DiscussionReferences