

J. Mol. Biol. (1995) 245, 645–660

Calculation of Protein Structures with Ambiguous Distance Restraints. Automated Assignment of Ambiguous NOE Crosspeaks and Disulphide Connectivities

Michael Nilges

European Molecular Biology Laboratory, Meyerhofstr. 1, D-69117 Heidelberg, FRG

The distances derived from nuclear Overhauser effect (NOE) spectra are usually converted into three-dimensional structures by computer algorithms loosely termed distance geometry. To a varying degree, these methods require that the distance data are unambiguously assigned to pairs of atoms. Typically, however, there are many NOE crosspeaks that cannot be assigned without some knowledge of the structure. These crosspeaks have to be assigned in an iterative manner, using preliminary structures calculated from the unambiguous crosspeaks. In this paper, I present an alternative to this iterative approach. The ambiguity of an NOE crosspeak is correctly described in terms of the distances between all pairs of protons that may be involved. A simple restraining term is defined in terms of “ambiguous” distance restraints that can allow all possible assignments. A new minimization procedure based on simulated annealing is described that is capable of using highly ambiguous data for ab initio structure calculations. In particular, it is feasible to specify the restraint list directly in terms of the proton chemical shift assignment and the NOE peak table, without having assigned NOE crosspeaks to proton pairs. While the primary aim of this paper is determining the global fold of proteins from NMR data, similar strategies can be used for other types of ambiguous distance data. The application to one example, disulphide bridges with unknown connectivity, is described. Model NOE data were generated from the X-ray crystal structure of a small protein with known chemical shift assignments. Varying degrees of ambiguity in the data were assumed. The method obtained the correct polypeptide fold even when all distance restraints were ambiguous. Thus, the new approach may facilitate structure calculations with data derived from very overlapped spectra. It is also a step towards automating the calculation of structures from NMR data. This could prove especially valuable for data derived from three- and four-dimensional experiments. The approach may also prove useful for model building studies and tertiary structure prediction.

Keywords: distance geometry; disulphide bridge; NOE; simulated annealing; three-dimensional solution structure

Introduction

Converting distance information into three-dimensional structures is an important and difficult non-linear optimization problem. The most important example is determining structures of biological macromolecules from distance data obtained by ¹H nuclear magnetic resonance (NMR) experiments. However, distance data can also be obtained from other sources such as fluorescence transfer experiments, model building, and analysis of multiple sequence alignments (Göbel et al., 1994). Indeed, the interest in distance geometry and related methods started years before the ¹H NMR structure determination of proteins became feasible (Crippen, 1977; Crippen & Havel, 1978).

Metric matrix distance geometry (MMDG; Crippen & Havel, 1978) tackles the problem of calculating three-dimensional structures from ¹H NMR distances in a direct and mathematically elegant way (Braun et al., 1981; Havel & Wüthrich, 1984). However, other computational solutions have been applied to ¹H NMR structure determination, mostly torsion angle minimization using a variable target function (Braun & Gō, 1985; Güntert et al., 1991), and, to a lesser extent, simulated annealing starting from extended or random structures (Nilges et al., 1988a, 1991). Since these techniques use only distance data to determine the structure, they have also been called “distance geometry” methods.

Abbreviations used: MMDG, metric matrix distance geometry; ISPA, isolated spin pair approximation; NOE, nuclear Overhauser effect; r.m.s., root-mean-square; SA, simulated annealing; 3D, three-dimensional; 4D, four-dimensional.

0022–2836/95/0506645–16 $08.00/0 © 1995 Academic Press Limited

In general, all previous distance geometry methods have been used with the underlying assumption that each experimental distance restraint is unambiguously assigned to a pair of atoms, apart from special cases such as methyl and methylene groups. Often, however, several protons have the same chemical shift. An NOE crosspeak involving these protons cannot be directly converted into a distance restraint between two atoms. Thus, even after the proton chemical shifts have been completely assigned, the task of assigning the ambiguous NOE crosspeaks remains. For protons that are spatially close due to covalent bonds or secondary structure, this task can generally be completed. However, these NOEs are often not very useful for determining the tertiary fold of the protein. Many ambiguous long-range NOE crosspeaks can only be assigned on the basis of a structural model.

Structure calculations are therefore usually performed in an iterative way. NOEs that can be assigned unambiguously are used to calculate preliminary three-dimensional structures. Additional NOE crosspeaks can be assigned on the basis of these structures (e.g. Güntert et al., 1993). These additional restraints are used to calculate a second generation of structures, which in turn is then used to obtain more assignments. The procedure is iterated until no further assignments can be obtained. This “bootstrap” approach has been partially automated (Güntert et al., 1993; Meadows et al., 1994).

This procedure can fail. For example, the distance between two protons in the preliminary structures may be considerably larger than 5 Å, even though these protons give rise to an NOE crosspeak. A side-chain rotation can, for example, increase the distance by several Å without affecting the overall fold of the protein. Hence, in addition to the requirement that enough NOEs can initially be assigned to calculate preliminary structures with the correct fold, the bootstrap approach also relies on how well the preliminary structures sample the conformation space. A more fundamental limitation of the approach arises if one assumes that only one interpretation of the NOE spectrum is consistent with a three-dimensional structure. Several assignments of the same NOE spectrum may be possible, resulting in different three-dimensional structures (Nilges, 1993).

The aim of this paper is to present an alternative to the bootstrap assignment approach. As usual, the distance restraints derived from the NOE spectra are incorporated into a target function. However, instead of refining the restraint list to remove the ambiguities in the data, the proposed target function allows all possibilities to be specified for each restraint. A simulated annealing minimization scheme is developed to minimize this target function. The general problem of calculating the global structure of a macromolecule from ambiguous NMR data ab initio has not been systematically addressed before. Previous work in this direction (Nilges, 1993) was restricted to symmetric dimers and ab initio calculations only for simple protein folds. With the new method, data sets with a high degree of ambiguity can be used. Thus, I show that the correct protein fold can be obtained even when all distance restraints are ambiguous. It is also shown that the method is applicable to other types of ambiguous distance data, for example the assignment of disulphide connectivities. Throughout the paper, disulphide bonds are used in an analogous fashion to NOE derived distance restraints.

Calculation Strategy

For all structure calculations and analysis, the program X-PLOR version 3.1 was used (Brünger, 1992). Ambiguous distance restraints for BPTI were calculated from the X-ray crystal structure of Deisenhofer & Steigemann (1975; PDB (Bernstein et al., 1977) accession code 4PTI). The proton chemical shift assignments were taken from Wagner et al. (1987). Hydrogens were added to the X-ray crystal structure with the HBUILD function of X-PLOR (Brünger & Karplus, 1988).

NOE derived distance restraints

The volume of an ambiguous NOE crosspeak at the chemical shift coordinates $F_1$, $F_2$ contains contributions from all proton pairs with the same chemical shift assignments:

$$\mathrm{NOE}_{F_1,F_2} = \sum_{i \in \{F_1,\Delta_1\},\; j \in \{F_2,\Delta_2\}} \mathrm{NOE}_{ij}, \qquad (1)$$

where $\{F,\Delta\}$ is defined as the set of all protons with chemical shifts between $F - \Delta$ and $F + \Delta$. If the contribution of one particular spin pair is much larger than that of any other, the NOE crosspeak can be assigned. Within the isolated spin pair approximation (ISPA), contributions to the crosspeak volume or the buildup curve simply depend on the distances $D_{ij}$ for each spin pair $i,j$:

$$\frac{d}{d\tau_m}\,\mathrm{NOE}_{ij} = c\,D_{ij}^{-6}, \qquad (2)$$

where $c$ is a constant. The term “spin” in this paper refers equally to protons, methylene groups, methyl groups, or equivalent protons on aromatic rings (see also Model Data Sets below). Making the usual assumption that the order parameters and internal correlation times for all spin pairs are identical, one obtains for the ambiguous crosspeak

$$\frac{d}{d\tau_m}\,\mathrm{NOE}_{F_1,F_2} = c \left( \sum_{i \in \{F_1,\Delta_1\},\; j \in \{F_2,\Delta_2\}} D_{ij}^{-6} \right). \qquad (3)$$

The ambiguous NOE depends on the sum of the inverse sixth powers of the individual distances. With the definition of an “$r^{-6}$-summed” distance $\bar{D}$,

$$\bar{D}_{F_1,F_2} = \left( \sum_{i \in \{F_1,\Delta_1\},\; j \in \{F_2,\Delta_2\}} D_{ij}^{-6} \right)^{-1/6}, \qquad (4)$$

one can write equation (3) simply as

$$\frac{d}{d\tau_m}\,\mathrm{NOE}_{F_1,F_2} = c\,\bar{D}_{F_1,F_2}^{-6}. \qquad (5)$$

$\bar{D}$ is similar to the $\langle r^{-6}\rangle^{-1/6}$ average distance, which has been introduced for the treatment of methyl and methylene groups (Brünger et al., 1986). The difference is a scale factor of $n^{1/6}$, where $n$ is the number of spin pairs in the sum (equation (4)). Since the data sets used in this paper contain highly ambiguous data points with many contributing distances (i.e. large $n$), this scale factor is not negligible. The $\langle r^{-6}\rangle^{-1/6}$ average distance is between the smallest and the largest of the $n$ distances, while $\bar{D}$ is always shorter than any of the contributing distances. Equation (4) was first suggested for the treatment of methylene and methyl protons by Levy et al. (1989). In contrast to the multidimensional potential developed by Habazettl et al. (1990), originally for a “floating assignment” of methylene protons, equation (4) does not assume that there is only one single significant contribution to an ambiguous NOE.

Figure 1. a, Ambiguous distance restraints using directly the chemical shifts stored in the “Q” array. b, Ambiguous distance restraints for the 3 disulphide bridges in BPTI.
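The relation between the two distance measures can be checked numerically. The following Python sketch is illustrative only: the function names and the four example distances are assumptions and do not come from the paper. It evaluates the $r^{-6}$-summed distance of equation (4) and the $\langle r^{-6}\rangle^{-1/6}$ average for the same set of contributing distances.

```python
def r6_summed(distances):
    """r^-6-summed distance of equation (4): (sum_i d_i^-6)^(-1/6)."""
    return sum(d ** -6 for d in distances) ** (-1.0 / 6.0)


def r6_average(distances):
    """<r^-6>^(-1/6) average: (mean_i d_i^-6)^(-1/6)."""
    n = len(distances)
    return (sum(d ** -6 for d in distances) / n) ** (-1.0 / 6.0)


# Hypothetical contributing distances in Angstrom (not data from the paper).
distances = [2.5, 3.0, 4.0, 5.0]
d_sum = r6_summed(distances)
d_avg = r6_average(distances)
n = len(distances)

print(f"r^-6-summed distance: {d_sum:.3f} A")
print(f"<r^-6>^-1/6 average:  {d_avg:.3f} A")
print(f"ratio average/summed: {d_avg / d_sum:.3f} (n^(1/6) = {n ** (1 / 6):.3f})")
```

For any such set the summed distance is shorter than the shortest contributing distance, the average lies between the shortest and longest, and the two differ by the factor $n^{1/6}$.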

The definition of $\bar{D}$ (equation (4)) makes it possible to specify an NOE derived distance restraint directly in terms of the chemical shift coordinates of the peak. While the current version of X-PLOR (3.1) (Brünger, 1992) does not support the restraint specification based on chemical shifts directly, one can still use this concept by storing the proton chemical shifts, for example, in the Q (occupancy) array of the PDB (Bernstein et al., 1977) file. The restraints can then be specified in terms of the values of Q with the “ATTRIBUTE” command. An example is shown in Figure 1a for a $\Delta$ value of 0.02 p.p.m. This type of restraint specification can be generalized in a simple manner to include, for example, information from 3D or 4D heteronuclear NOE experiments, or 3D TOCSY-NOESY experiments (see, e.g., Clore & Gronenborn, 1991, and Oschkinat et al., 1994 for reviews). It is also easily possible to introduce a different $\Delta$ for each crosspeak. Restraint files of this type can be automatically generated from a peak table with a simple computer program without any knowledge of the three-dimensional structure.
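A program of the kind mentioned above might look like the following Python sketch. It is not the author's program and does not produce X-PLOR syntax; the proton names, chemical shifts, peak table, and the dictionary layout of a restraint are all hypothetical. It only shows how each peak can be expanded into the set of proton pairs allowed by a tolerance $\Delta$, without reference to any structure.

```python
def protons_in_window(shifts, f, delta):
    """Return all protons whose chemical shift lies within f +/- delta."""
    return [name for name, ppm in shifts.items() if abs(ppm - f) <= delta]


def ambiguous_restraints(shifts, peaks, delta):
    """Expand each peak (F1, F2, upper bound) into all allowed proton pairs."""
    restraints = []
    for f1, f2, upper in peaks:
        set1 = protons_in_window(shifts, f1, delta)
        set2 = protons_in_window(shifts, f2, delta)
        # Every combination is kept; the r^-6 sum of equation (4) runs over them.
        pairs = [(a, b) for a in set1 for b in set2 if a != b]
        restraints.append({"f1": f1, "f2": f2, "upper": upper, "pairs": pairs})
    return restraints


# Hypothetical chemical shift assignment (p.p.m.) and peak table.
shifts = {
    "A10.HA": 4.35,
    "G12.HA": 4.36,
    "L20.HD": 0.85,
    "V30.HG": 0.86,
    "F40.HD": 7.20,
}
peaks = [(4.35, 0.85, 5.0), (7.20, 0.85, 3.3)]  # (F1, F2, upper bound in A)

restraints = ambiguous_restraints(shifts, peaks, delta=0.02)
for r in restraints:
    print(r["f1"], r["f2"], r["upper"], r["pairs"])
```

With these invented shifts, the first peak expands to four candidate proton pairs and the second to two, since two protons fall inside each degenerate window.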

Disulphide bridges

Identifying disulphide connectivities on the basis of NOE data is often not straightforward. Like ambiguous NOEs, unknown connectivities are usually left out of the calculation of preliminary structures. Klaus et al. (1993) have described an approach based on a comparison of Cys Cβ–Cβ distances in the preliminary structures with a database of known structures. Other more ad hoc methods have been used; these have been reviewed by Klaus et al. (1993).

While unknown disulphide connectivities are not completely analogous to ambiguous NOE crosspeaks, the two cases are similar enough for the current purpose. Thus, in order to include the unknown disulphide connectivity in the calculation, we use an ambiguous distance restraint between one specific Cys Sγ atom and all other Cys Sγ atoms following equation (4), where $\bar{D}$ is now the length of the covalent sulphur–sulphur bond, 2.02 Å. The restraints necessary for the three disulphide bridges in BPTI are shown in Figure 1b.

This type of restraint can be satisfied by two or more Sγ atoms clustered together. In contrast to NOE distances, clusters of three or more Sγ atoms are obviously unphysical and can be rejected. Fortunately, these clusters showed up rarely in the test calculations. In particular, none of the structures with a low overall restraint energy had these clusters (see Results).

An efficient simulated annealing protocol

Distance data of the form of equation (4) can be easily incorporated in a target function in the standard way:

$$E = w_{cov}E_{cov} + w_{exv}E_{exv} + w_{NOE}E_{NOE}, \qquad (6)$$

where $E_{cov}$ describes covalent geometry (bond lengths, bond angles, planarity, chirality), $E_{exv}$ is a simple repulsive function that prevents atoms from overlapping, and $E_{NOE}$ depends on the experimental distance information. The $w_x$ are the respective weights on the three energy terms.

Starting coordinates for a refinement cannot be obtained by standard distance geometry techniques (see Discussion). In order to avoid bias, initial cartesian coordinates are chosen randomly within a 20 Å cube.

The minimization strategy consists of a sequence of simulated annealing (SA) stages. The whole strategy is outlined in Figure 2. The basic idea is the same as in previous work (Nilges et al., 1988b). The separation into several stages makes the calculations more efficient. The first stage (“random” in Figure 2) replaces the distance geometry “embedding” stage. It is the central part of the new strategy, since it finds an approximate conformation from ambiguous data. The structures are then regularized as described previously (Nilges et al., 1988c; Kuszewski et al., 1992) with the DGSA protocol in the X-PLOR manual (Brünger, 1992). This is followed by two refinement stages (Nilges, 1993). The application of this strategy to a variety of different experimental data sets and a detailed investigation of sampling and convergence properties will be published elsewhere.

Figure 2. Sequence of simulated annealing calculations. Random is the new simulated annealing protocol described in this paper (Table 1); regularize is the standard X-PLOR dgsa protocol (Nilges et al., 1988b; Kuszewski et al., 1992; Brünger, 1992); refine is a refinement protocol (Table 2; Nilges, 1993).

The most important changes of the present random protocol from previous work (Nilges et al., 1988b) are briefly described below. Details are given in Table 1.

Table 1
Random

Stage                           1               2
Temperature† (K)                2000.0          2000.0
Masses (a.m.u.)                 100.0           100.0
Energy constants
  Kbonds (kcal/(mol Å²))        0.05 → 10.0     10.0 → 100.0
  Kangles (kcal/(mol rad²))     0.05 → 10.0     10.0 → 100.0
  Kplanar (kcal/(mol rad²))     0.0             0.0
  Kchiral (kcal/(mol rad²))     0.0             0.0
  Krepel (kcal/(mol Å⁴))        0.1‡            0.002 → 0.01
  KNOE (kcal/(mol Å²))          0.5             5.0
  Asymptote                     2.0             2.0
  S (Å)                         1.0             1.0
  Kharm (kcal/(mol Å²))         0.0001          0.0
Steps                           7500            1500

† The temperature is maintained by coupling to a heat bath (Berendsen et al., 1984) with a coupling constant of 10 ps⁻¹.
‡ The non-bonded interactions are computed only between Cα atoms with van der Waals radii of 2.25 Å.

Table 2
Refinement

Stage                           Search      Cool1               Cool2/mini
Temperature† (K)                2000.0      2000.0 → 1000.0     1000.0 → 100.0
Masses (a.m.u.)                 100.0       100.0               100.0
Energy constants
  Kbonds (kcal/(mol Å²))        1000.0      1000.0              1000.0
  Kangles (kcal/(mol rad²))     125.0       125.0 → 500.0       500.0
  Kplanar (kcal/(mol rad²))     50.0        50.0 → 500.0        500.0
  Kchiral (kcal/(mol rad²))     50.0        50.0 → 500.0        500.0
  Krepel (kcal/(mol Å⁴))        0.1‡        0.003 → 4.0         4.0
  KNOE (kcal/(mol Å²))          25.0        25.0                25.0
  Asymptote                     0.1         0.1 → 1.0           1.0
  S (Å)                         0.5         0.5                 0.5
Steps                           5000        5000                2000/250

† See footnote † to Table 1.
‡ See footnote ‡ to Table 1.

Energy term ENOE

Using expression (4), the information content of the ambiguous NOESY spectrum can easily be expressed in the form of a target function. In the present paper, we use a harmonic “flat-bottom” potential with a linear behaviour for large deviations (Nilges et al., 1988a):

$$E_{NOE} = \sum_k \begin{cases} (L_k - \bar{D}_k)^2 & \text{if } \bar{D}_k < L_k \\ 0 & \text{if } L_k \le \bar{D}_k \le U_k \\ (\bar{D}_k - U_k)^2 & \text{if } U_k < \bar{D}_k < S \\ A(\bar{D}_k - S)^{-1} + B(\bar{D}_k - S) + C & \text{if } \bar{D}_k > S, \end{cases} \qquad (7)$$

where $L_k$ and $U_k$ are the lower and upper bounds on the $r^{-6}$-summed distance derived from the size of the NOE crosspeak, $\bar{D}$ is the $r^{-6}$-summed distance calculated from the current model via equation (4), $S$ is the value where the potential changes between the harmonic and asymptotic shapes, $C$ is the slope of the asymptote, and the coefficients $A$ and $B$ are determined such that the potential is continuous and differentiable everywhere. While the exact value of the asymptotic slope does not seem to have a large influence on the convergence rate, a “soft” asymptotic behaviour seems essential.
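A flat-bottom potential with a soft asymptote can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the X-PLOR implementation: the function name is hypothetical, the switch point is taken as an offset `s` measured from the upper bound (as the values of S in Tables 1 and 2 suggest), and the asymptotic branch is written as `a + b/dev + slope*dev`, with `a` and `b` fixed by requiring continuity of the value and first derivative at the switch. The notation therefore differs from equation (7) as printed.

```python
def soft_square(d, lower, upper, s, slope):
    """Flat-bottom harmonic restraint energy with a linear asymptote.

    d      -- r^-6-summed distance from the current model (A)
    lower  -- lower bound L (A); upper -- upper bound U (A)
    s      -- assumed offset above U where the harmonic part ends (A)
    slope  -- slope of the linear asymptote
    """
    if d < lower:
        return (lower - d) ** 2
    if d <= upper:
        return 0.0
    dev = d - upper
    if dev <= s:
        return dev ** 2
    # Match value (s^2) and derivative (2s) of the harmonic branch at dev = s.
    b = (slope - 2.0 * s) * s ** 2
    a = s ** 2 - b / s - slope * s
    return a + b / dev + slope * dev


# Asymptote 2.0 and S = 1.0 as in Table 1; the bounds are illustrative.
for d in (4.0, 5.5, 6.0, 7.0, 10.0):
    print(d, soft_square(d, lower=0.0, upper=5.0, s=1.0, slope=2.0))
```

With these parameters the coefficient `b` vanishes, so the energy continues exactly linearly beyond the switch point and grows far more slowly than a purely harmonic term for large violations.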

Energy term Eexv

The increased efficiency of this protocol is mostly due to changes in $E_{exv}$. For most of the random stage, $E_{exv}$ is only calculated between the Cα atoms, whose radii have been increased to 2.25 Å. This ensures that the structures have non-overlapping backbones. The packing of the side-chain and missing backbone atoms is taken care of in the subsequent regularization and refinement stages.

Energy term Ecov

One of the major practical problems with starting from a structure with very distorted geometry arises from the evaluation of torsion angle terms. A torsion angle involving four atoms is defined only if all three bond lengths are non-zero, and if both bond angles are not equal to 0° or 180°. Torsion angles in random initial coordinates often do not meet these conditions. A very simple and effective solution is to remove all torsion angle terms (dihedral angles, planarity, and chirality) from $E_{cov}$. This solution was inspired by the success of MMDG calculations, where chirality terms cannot be used in the bound smoothing and embedding steps. They are reintroduced in the regularization and refinement stages.

Minimization procedure

In the annealing schedules, both the temperature and the weights on different energy terms are varied. One can show (Brünger, 1991) that varying the overall weight of the potential energy (or, trivially, the mass) is equivalent to varying the temperature: multiplying the potential energy by a factor has the same effect as dividing the masses or the temperature by the same factor. In order to achieve any additional efficiency, the weights $w_{cov}$, $w_{exv}$ and $w_{NOE}$ have to be varied at different rates. This does not imply a very complicated schedule. Thus, for most of the “random” stage, only the weight on the covalent geometry $w_{cov}$ is varied, while $w_{exv}$ and $w_{NOE}$ are constant. $w_{cov}$ is multiplied by 1.25 every 300 steps of dynamics. The NOE and excluded volume terms are satisfied more or less equally well throughout stage 1, while the covalent geometry (bond lengths and bond angles) is slowly improved from the initial random structure.
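The geometric ramp on the covalent weight can be written out explicitly. The Python sketch below is an illustration only: the function name is hypothetical, and capping the weight at the final value listed in Table 1 is an assumption, since the text does not say how the end point is enforced. The start value 0.05, end value 10.0, factor 1.25, interval of 300 steps and length of 7500 steps are taken from the text and Table 1.

```python
def weight_schedule(w_start, w_final, factor, interval, n_steps):
    """Return the weight in force at each dynamics step of a geometric ramp."""
    weights = []
    w = w_start
    for step in range(n_steps):
        # Scale the weight at the start of every new interval (assumed capping).
        if step > 0 and step % interval == 0:
            w = min(w * factor, w_final)
        weights.append(w)
    return weights


# Stage 1 of the "random" protocol: K_bonds and K_angles go from 0.05 to 10.0.
w_cov = weight_schedule(w_start=0.05, w_final=10.0, factor=1.25,
                        interval=300, n_steps=7500)

# Print the weight at the start of every fifth interval.
for step in range(0, 7500, 1500):
    print(step, round(w_cov[step], 4))
print("final weight:", w_cov[-1])
```

Under these assumptions the ramp needs 24 multiplications to pass 10.0, which fits within the 7500 steps of stage 1.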

In some cases, the convergence can be improved by the addition of a very weak harmonic potential centred at the origin (Kharm in Table 1). This restrains atoms from moving too far away during the initial cycles when $w_{cov}$ is very small. The first stage is terminated when the covalent geometry is roughly as good as that of structures after the “embed” stage of MMDG. With standard, unambiguous distance restraints, the sampling of this method seems to be as good as MMDG with metrization (unpublished results). As for MMDG, the correct enantiomer has to be selected at the end of the random protocol, since no chiral information is present in this stage. This can be done using similar criteria as for MMDG, e.g., the total energy after refinement of both enantiomers (Kuszewski et al., 1992).

Model data sets

Distance sets with various degrees of ambiguity were generated from the X-ray crystal structure of BPTI (Deisenhofer & Steigemann, 1975) and the chemical shift assignments of Wagner et al. (1987). Protons were added to the X-ray coordinates with X-PLOR (Brünger & Karplus, 1988). Each data set is characterized by a parameter $\Delta$ that describes the precision of the peak location, or a half width of the peak (Table 3), and corresponds to $\Delta_1$ and $\Delta_2$ in equation (1). To create data sets for different $\Delta$, the chemical shifts were rounded to the next multiple of $\Delta$. From the X-ray crystal structure, all spin pairs less than 3.6 Å apart were selected. For each chemical shift pair, these were summed according to equation (4) to yield $r^{-6}$-summed distances. These $r^{-6}$-summed distances were classified into weak (<3.6 Å), medium (<3.0 Å) and strong (<2.5 Å), and upper limits on the $r^{-6}$-summed distances were set to 5.0, 3.3 and 2.7 Å, respectively. No lower bounds were used.
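The generation of a model data set as described above can be sketched in Python. This is a hedged illustration, not the procedure actually used: the function names, the three proton coordinates and their shifts are invented, shifts are binned by the integer `round(shift / delta)` rather than stored as rounded floats, and the two shift coordinates of a peak are treated as unordered.

```python
import math
from collections import defaultdict


def upper_bound(d_sum):
    """Map an r^-6-summed distance (A) to the upper limit quoted in the text."""
    if d_sum < 2.5:
        return 2.7   # strong
    if d_sum < 3.0:
        return 3.3   # medium
    return 5.0       # weak (all summed distances are below 3.6 A)


def model_restraints(coords, shifts, delta, cutoff=3.6):
    """Build ambiguous restraints keyed by a pair of chemical shift bins."""
    sums = defaultdict(float)
    names = sorted(coords)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            d = math.dist(coords[a], coords[b])
            if d < cutoff:
                # Shifts are rounded to a multiple of delta (assumed binning).
                key = tuple(sorted((round(shifts[a] / delta),
                                    round(shifts[b] / delta))))
                sums[key] += d ** -6
    return {key: upper_bound(total ** (-1.0 / 6.0))
            for key, total in sums.items()}


# Hypothetical protons: H2 and H3 are degenerate at delta = 0.02 p.p.m.
coords = {"H1": (0.0, 0.0, 0.0), "H2": (2.4, 0.0, 0.0), "H3": (0.0, 2.9, 0.0)}
shifts = {"H1": 4.36, "H2": 0.84, "H3": 0.845}

restraints = model_restraints(coords, shifts, delta=0.02)
print(restraints)
```

In this toy case the two contacts of H1 collapse into a single ambiguous restraint whose summed distance is shorter than either contributing distance, so it is classified as strong.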


Table 3
Data sets for different calculations

Calculation   Δ      NNOE†   Nass‡   Nterms§   Log10(Nposs)‖
N20           0.02   650     71      3005      357
N20a          0.02   650     0       3076      378
N20b          0.02   71      71      71        0
N20c          0.02   71      71      71        0
N30           0.03   618     26      4684      471
N30a          0.03   520     24      3885      393
N40           0.04   595     8       6641      546

† Total number of distance restraints.
‡ Number of unambiguously assigned restraints.
§ Total number of spin pairs contributing to the NOEs.
‖ Total number of different assignments of the spectrum.

No stereospecific assignments were assumed. The chemical shift values of both protons of methylene groups and both methyl groups of propyl groups were set to the same value. For simplicity, the average distances to aromatic ring, methylene and methyl protons were calculated using equation (4). This is not exactly the appropriate treatment for methyl groups and aromatic rings, for which one should use $\langle r^{-3}\rangle^{-1/3}$ and $\langle r^{-6}\rangle^{-1/6}$ averages, respectively (Koning et al., 1991). If the method is used with distances derived from an experimental spectrum, corrections similar in spirit to those used for pseudo atoms (Wüthrich et al., 1983) could be used. Note, however, that the corrections would have to be applied in the opposite sense (i.e. the upper bound has to be decreased) since the $r^{-6}$ sum (equation (4)) is always shorter than $\langle r^{-6}\rangle^{-1/6}$ or $\langle r^{-3}\rangle^{-1/3}$ averages, while the distance to the pseudo atom is longer. In the present study, the generation of the model data and the restraining term are consistent.

Three basic data sets were generated in this way for $\Delta$ = 0.02, 0.03 and 0.04 p.p.m. The corresponding names in Table 3 and Figures 3, 4 and 5 are N20, N30 and N40. Several additional data sets were derived from the sets N20 and N30 to test various aspects of convergence.

In order to test the influence of unambiguous distance restraints, a data set (N20a) was generated that contains only ambiguous restraints, by providing an arbitrary second possibility for each unambiguous restraint. The information content of the 71 unambiguously assigned NOEs in data set N20 was assessed by calculations with data set N20b, which contains only these 71 restraints. Data set N20c contains the ambiguous restraints for the disulphide bridges in addition to the 71 unambiguous restraints of data set N20b. In set N30a, all restraints with both chemical shifts smaller than 3.5 p.p.m. (the aliphatic-aliphatic region) have been removed. This region is often especially crowded and unreliable to integrate.

Table 3 lists the total number of restraints present in the data sets, the total number of spin pairs, and the possible number of “non-trivial” ways to assign the NOE spectrum. The latter number illustrates that a procedure designed to assign the NOE spectra by systematically trying all possibilities without the knowledge of a structure would fail. Different assignments due to equivalent aromatic, methylene, and methyl protons are considered as trivial, and are not counted separately in the Table. The total number of restraints is smaller than the number obtained experimentally (Berndt et al., 1992). For most of the data points, there is one dominant contribution to the sum in equations (1) and (4), which can be assigned to a single spin pair. For N20, 579 of the 650 NOEs have a dominant contribution accounting for more than 90% of the sum, while for 71 NOEs the dominant contribution is between 40 and 90%. 298 of the restraints are predominantly intraresidue, 133 sequential, 63 medium range, and 164 long range. Intraresidue restraints, including structurally irrelevant restraints, are present in the data sets simply because there is no a priori way of assigning them as such.

Figures 3a to c, and 4a to d, illustrate the degree of ambiguity in the data sets further. Figure 3a to c are histograms of the number of possible non-trivial assignments for NOE peaks in each data set. For example, in the data set N20, there are 71 unambiguously assigned NOEs, 166 with two possible assignments, 56 with three, and so on. A standard NOE residue–residue contact plot is shown in Figure 4a. A black square indicates that two residues are connected by at least one spin-spin distance smaller than 3.6 Å. Figure 4b to d show the corresponding plots for the data sets N20, N30, and N40. A grey square indicates a possible but ambiguous connection of the two residues by a distance smaller than 3.6 Å, a black square at least one unambiguously assigned NOE. The data set N20a roughly corresponds to only the grey squares of Figure 4b, N20b to only the black squares.
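The size of the assignment problem can be estimated from such a histogram. The Python sketch below assumes, as a simplification not stated in the paper, that every crosspeak is assigned independently to exactly one of its candidate spin pairs, so that the number of assignments of the whole spectrum is the product of the per-peak counts and its logarithm is a sum. Only the three histogram bins quoted in the text are used, so the result is a partial sum and is not comparable with the Log10(Nposs) entry of Table 3.

```python
import math


def log10_assignments(histogram):
    """log10 of the number of ways to assign a spectrum.

    histogram maps the number of candidate assignments per peak to the
    number of peaks with that many candidates. Independence of the peaks
    is an assumption of this sketch.
    """
    return sum(count * math.log10(n) for n, count in histogram.items())


# First three bins for data set N20 as quoted in the text; the rest is omitted.
partial_n20 = {1: 71, 2: 166, 3: 56}
partial_log10 = log10_assignments(partial_n20)
print(f"log10 of assignments from these bins alone: {partial_log10:.1f}")
```

Even this partial count already corresponds to far more combinations than could be enumerated, which is the point made by the last column of Table 3.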

For all data sets except N20b, the NOE derived distances were complemented with ambiguous restraints for the disulphide bonds.

Results

Convergence to the correct fold

Figure 5a to f shows plots of the r.m.s. difference of Cα atoms of residues 5 to 55 to the X-ray crystal structure against the residual NOE energy. The numbers of converged structures for each data set are compiled in Table 4.


Figure 3. Histograms of the number of possible assignments for NOE cross-peaks for data sets N20 (a), N30 (b) and N40 (c).

Figure 4. NOE contact plots illustrating the different data sets used. a, Standard ¹H-¹H contact plot derived from the X-ray crystal structure of BPTI (Deisenhofer & Steigemann, 1975). Only contacts <3.6 Å are shown. b, Data set N20. A grey square indicates an NOE contact between 2 residues allowed by the ambiguous distance restraints. A black square indicates at least one unambiguous NOE contact. c, As b, for data set N30. d, As b, for data set N40.

The method is remarkably stable for the “best” data set, N20, which was derived with a parameter D of 0.02 p.p.m. Even in this data set, most of the restraints are ambiguous (see Figures 3 and 4). All 20 structures have the correct fold, and 17 of them have final NOE energies under 10 kcal/mol. The three structures that have higher NOE energies also deviate more from the X-ray crystal structure.

Figure 5b shows the effect of making the 71 unambiguous distance restraints ambiguous. The convergence rate drops, so that now only 11 structures have residual energies of below 10 kcal/mol. There are three structures with major errors in the fold and high NOE energies. Remarkably, however, it is still possible to arrive at correct folds with completely ambiguous restraints.

In contrast, none of the calculations with the 71 unambiguous restraints by themselves (data set N20b) converged to the correct fold. However, all 20 structures satisfy the data very well (Figure 5c). The information content of the 71 unambiguously assigned NOEs is obviously insufficient to determine the fold of BPTI. In practice, additional assignments can usually be made based on the knowledge of


Table 4

Number of converged structures

Calculation   Nfold †   Nenergy ‡   Nssbond §
N20           20        17          20
N20a          17        11          18
N20b           0        19          -
N20c           0        19           5
N30           10         4          10
N30a           4         3           7
N40            2         0           5

† Number of structures with final r.m.s. difference to the X-ray structure below 2 Å.

‡ Number of structures with final NOE energy below 10 kcal/mol.

§ Number of structures with the correct disulphide pairing.
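The three criteria in the footnotes of Table 4 amount to simple threshold counts over an ensemble. A minimal sketch follows, assuming each structure is summarised by a hypothetical record with the fields `rmsd` (Å), `e_noe` (kcal/mol) and `ss_correct`; the three example records are invented for illustration and are not results from this paper.

```python
def count_converged(structures, rmsd_cutoff=2.0, energy_cutoff=10.0):
    """Return (Nfold, Nenergy, Nssbond) using the cutoffs quoted in the table footnotes."""
    n_fold = sum(1 for s in structures if s["rmsd"] < rmsd_cutoff)
    n_energy = sum(1 for s in structures if s["e_noe"] < energy_cutoff)
    n_ssbond = sum(1 for s in structures if s["ss_correct"])
    return n_fold, n_energy, n_ssbond


# Invented example records, not data from the paper.
example = [
    {"rmsd": 1.2, "e_noe": 4.0, "ss_correct": True},
    {"rmsd": 1.8, "e_noe": 25.0, "ss_correct": True},
    {"rmsd": 7.5, "e_noe": 8.0, "ss_correct": False},
]
print(count_converged(example))
```

Note that the three counts are independent of each other, which is why a data set such as N20b can have no structure with the correct fold and yet many with low NOE energy.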

refinement, is virtually identical to the contact plot derived from the X-ray crystal structure itself (Figure 4a).

The convergence of the disulphide bridges is shown in Figure 7 in more detail. The Figure shows all disulphide bridges at different times during the calculation, as contact plots between the Sγ atoms. Instead of the usual square representation, the lower triangle of the contact plot is shown on a single line, and a vertical bar indicates a contact. A contact is shown for the shortest Sγ–Sγ distances if the distance is shorter than 5 Å. The Figure shows that disulphide bridges are broken and formed until very late in the random stage of the calculation, indicating that the final result is not biased by the initial structure.

Convergence of the assignments

A detailed comparison of the final assignments obtained by the calculation and the known correct assignments reveals that all of the structures have some assignments that differ from the X-ray crystal structure. This means that the model NOE spectra can be assigned in several ways, not that the method itself is invalidated. On the contrary, the fact that several solutions are generated automatically is one of the main advantages of the method.

Most of the differences are found for peaks with several significant contributions to the NOE (>10%), and the difference lies in the ranking of the contributions. If one restricts the analysis to peaks where the dominant contribution accounts for more than 90% of the NOE, the 17 converged structures for data set N20 have between one and ten incorrect assignments. However, there is no convergence to incorrect assignments, that is, none of the incorrect assignments are present in all 17 structures. Furthermore, the differences occur predominantly in areas that are determined by only few restraints, for example the N and C termini. Thus, the most common misassignment (13 out of 17 structures) is an NOE between Cys55 Hα and Arg1 Hα which is instead assigned to Cys55 Hα and Pro2 Hγ. Seven times the NOE between Gly57 Hα and Arg1 Hβ was misassigned (to Ala58 Hα-Arg1 Hβ or Thr54 Hβ-Pro2 Hγ). Figure 8 shows two calculated structures overlaid with the X-ray crystal structure, one with the incorrect assignment of the NOE between Cys55 Hα and Arg1 Hβ (Figure 8a), one with the correct assignment (Figure 8b).

Equations (1) to (4) treat explicitly also the cases where several spin pairs contribute to the NOE crosspeak. A more stringent test on the convergence is therefore a comparison of the relative contributions to the crosspeaks between the X-ray crystal structure and the ensemble of converged structures. This is shown in Figure 9. For most spin pairs, the contribution is either close to 0% or to 100%. The gray lines parallel to the diagonal indicate the regions

secondary structure, and correlation of NOEs between two spin systems. It may therefore well be possible to determine the three-dimensional structure of a protein with NOE spectra of the quality of N20. This calculation only illustrates the small information content of the unambiguously assigned NOEs alone. Including the ambiguous restraints for the disulphide bridges has only a small effect on the results in terms of r.m.s. differences to the X-ray structure.

For data set N30, only ten structures converged to the correct fold, and only four of these have very low NOE energies. As a comparison with data set N20a shows, this is only partly a consequence of the reduction in the numbers of unambiguously assigned NOEs. The average number of possibilities seems to have a stronger effect. Removing the aliphatic-aliphatic region from this data set reduced the convergence rate further (Figure 5f). Finally, only two calculations converged to within 2 Å of the X-ray structure for data set N40. Clearly, the degree of ambiguity in this data set is a limit for the convergence.

Trajectories

Figure 6 shows snapshots from a calculation with data set N20. The Cα trace of the three-dimensional structure and an NOE contact plot derived from the structure are plotted side by side at several stages during the calculation. The contact plots visualise the assignments in the following way: for each ambiguous restraint, the assignment is taken to be that corresponding to the shortest spin-spin distance. For clarity, a contact is only shown when the shortest distance is smaller than 5 Å. The starting structure has random residue–residue contacts. Halfway through the random stage, the contact plot begins to show some features, like “density” for the antiparallel β-sheet. At the end of the random stage, the contact plot shows many correct features, although the folding topology of the corresponding three-dimensional structure is not entirely correct. The final contact plot, after regularization and


Figure 5. R.m.s. differences of the 20 final structures of each set of calculations to the X-ray structure (calculated for Cα of residues 5 to 55), plotted against the NOE energy of the final structures, evaluated with equation (7) and the final energy parameters in Table 2. Note the different scales of the axes.

where the calculated contribution is more than 20% wrong. Surprisingly, the procedure works reasonably well also for the majority of contributions in the 10% to 90% range (Figure 9a). However, there are incorrectly calculated contributions. This is at least partly a consequence of the wide error bounds on the ambiguous distance restraints used. Figure 9b shows the same plot for a set of structures that were refined using tighter distance bounds (the measured distance ±0.2 Å for strong peaks, 0.3 Å for medium peaks, and 0.5 Å for weak peaks), and stereospecific assignments. Few contributions are


Figure 6. Trajectory for the assignment of NOE connectivities. 1H-1H contact plots are shown at different times in the refinement protocol. Only distances <5 Å that are allowed by the ambiguous restraints are shown. Contact plots are shown (a) of an initial structure; (b) of a structure after 50% of the random protocol; (c) after the random protocol; (d) after regularization and 2 cycles of refinement.
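The rule used to draw these contact plots (take the candidate spin pair with the shortest distance, and show it only below 5 Å) can be sketched as follows. The data layout, the function name `assign_by_shortest_distance` and the example coordinates are hypothetical and chosen only to make the sketch runnable.

```python
import math


def assign_by_shortest_distance(candidates, coords, cutoff=5.0):
    """Pick the candidate spin pair with the shortest distance.

    candidates: list of (atom_i, atom_j) keys, one per possible assignment.
    coords: dict mapping atom keys to (x, y, z) in Angstrom.
    Returns (pair, distance), or None if even the shortest distance is >= cutoff.
    """
    def pair_distance(pair):
        return math.dist(coords[pair[0]], coords[pair[1]])

    best = min(candidates, key=pair_distance)
    d = pair_distance(best)
    return (best, d) if d < cutoff else None


# Hypothetical coordinates; atoms are keyed by (residue number, atom name).
coords = {
    (57, "HA"): (0.0, 0.0, 0.0),
    (58, "HA"): (0.0, 8.0, 0.0),
    (1, "HB"): (3.0, 0.0, 0.0),
}
restraint = [((57, "HA"), (1, "HB")), ((58, "HA"), (1, "HB"))]
print(assign_by_shortest_distance(restraint, coords))
```

The residue numbers of the returned pair give the square to fill in the residue–residue contact plot.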

more than 20% incorrect, and the largest errors are around 50% (this corresponds to a 12% error in the distance). It seems therefore advantageous to get better distance estimates than the qualitative classification into weak, medium, and strong NOEs if ambiguous data are used. In contrast, the behaviour is not necessarily improved by removing the ambiguous data points from the restraint list (see Discussion).
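The correspondence between a 50% error in a contribution and a 12% error in the distance can be checked from the inverse sixth power dependence. The short derivation below assumes that a contribution c scales as D^{-6} and reads the 50% error as a contribution that is too small by a factor of two; both are assumptions made here for illustration.

```latex
c \propto D^{-6} \;\Longrightarrow\; D \propto c^{-1/6},
\qquad
\frac{D'}{D} = \left(\frac{c'}{c}\right)^{-1/6} = (0.5)^{-1/6} = 2^{1/6} \approx 1.12 .
```

Under the same assumption, a contribution that is too large by 50% gives (1.5)^{-1/6} ≈ 0.93, that is, a distance that is about 7% too short.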


Figure 7. Trajectory for the assignment of disulphide bridges. Each line represents one triangle of a Sγ–Sγ contact plot at different times during the refinement. A vertical line indicates a contact. Only distances <5 Å are shown.

there is a fundamental difference between the annealing protocol proposed here, and the variable target function methods (Braun & Go, 1985; Endo et al., 1991; Guntert et al., 1991), and also earlier work based on restrained molecular dynamics (Brunger et al., 1986; Clore et al., 1986): the variations never make use of the assignments of the NOE crosspeaks to spin pairs. By contrast, the variable target function methods employ a buildup procedure to include restraints in the target function based on their separation in sequence. Similarly, the restrained molecular dynamics studies used a separation of the restraints in short and medium range on the one hand, and long range on the other. For both methods, a knowledge of the assignments is necessary at least at the level of single residues.

In part, therefore, the use of an iterative assignment strategy is a consequence of the type of algorithms used for calculating structures from NMR data. The algorithms make explicit use of the assignments in the minimisation strategy. In metric matrix distance geometry algorithms, this is deeply rooted in the algorithm itself (during bound smoothing and embedding).

A step towards fitting all data

Ambiguous distance restraints make it possible to use all data in the NOE spectrum directly from the start of a structure calculation, even if an unambiguous assignment is not possible. Iterative approaches also aim at using as much of the information present in the NOE spectra as possible. However, the interpretation of the NOE spectrum proceeds in parallel with the structure calculations, and usually only data points that can be attributed to a specific spin pair are used to determine the structure.

The different approaches may result in different structures, since leaving experimental data points out of the calculation will have a different effect from using the data in the form of ambiguous restraints. The point can be illustrated by the example of disulphide bonds with unknown covalent connectivity. The usual procedure is to calculate initial structures without any distance restraints for the disulphide bonds. The probability that the Sγ–Sγ distances in the structures calculated this way are close to the covalent bond length can be very small, especially if there are few NOE distance restraints. Thus, the shortest sulphur–sulphur distances in the N20 structures are around 5 to 10 Å. In other words, none of the calculated structures satisfy the experimental fact that the cysteines are disulphide bonded. Also, none of the structures have the correct disulphide bonding pattern, even when more relaxed criteria (Sγ–Sγ distance <10 Å) are used. Using ambiguous distance restraints for the disulphide connectivities, however, the structures have a much higher probability to have sulphur–sulphur distances close to the covalent bond length. 16 of the 20

Discussion

Relationship to other methods

The main conceptual difference between the structure determination strategy presented in this paper and other methods lies in the treatment of the data. Ambiguous data are directly used in the simplest and most straightforward way: the upper and lower bounds U and L in the target function (equation (7)) are derived directly from the sum of all potentially contributing NOEs. The resulting target function has many local minima corresponding to incorrect assignments. The success of the method relies on the ability of the minimization strategy to find the global minimum of this function.

The generalized annealing scheme developed for this purpose involves varying the weights on different terms of the target function. However,
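How a single restraint can cover all potentially contributing spin pairs may be sketched as follows. The sketch assumes that the effective distance is the inverse sixth power sum over all candidate distances, and it uses a plain flat-bottom quadratic penalty between the bounds L and U as a stand-in; the actual functional form of equation (7) is not reproduced here, and the function names are hypothetical.

```python
def ambiguous_distance(distances):
    """Effective distance from all candidate spin-pair distances (r^-6 summation)."""
    return sum(d ** -6 for d in distances) ** (-1.0 / 6.0)


def flat_bottom_penalty(d_bar, lower, upper):
    """Stand-in restraint term: zero between the bounds, quadratic outside them."""
    if d_bar > upper:
        return (d_bar - upper) ** 2
    if d_bar < lower:
        return (lower - d_bar) ** 2
    return 0.0


# One close pair is enough to satisfy the restraint, whichever pair it is.
candidates = [9.0, 3.0, 12.0]  # hypothetical distances in Angstrom
d_bar = ambiguous_distance(candidates)
print(d_bar, flat_bottom_penalty(d_bar, lower=1.8, upper=5.0))
```

Because the sum is dominated by the shortest distance, the effective distance is never larger than the shortest candidate distance, so the restraint is satisfied as soon as any one of the possible assignments is satisfied. This is also why incorrect assignments appear as local minima of the target function.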


Figure 8. Details of 2 N20 SA structures overlaid with the X-ray crystal structure. a, The NOE between Cys55 Hα and Arg1 Hβ is incorrectly assigned to Cys55 Hα and Pro2 Hγ. b, An SA structure with the correct assignment for the same NOE.

N20c structures have exactly three disulphide bridges (Sγ–Sγ distance <2.3 Å). Four different patterns appear, twice (5–14, 30–38, 51–55), five times (5–30, 14–38, 51–55), four times (5–51, 14–38, 30–55), and five times the correct pattern (5–55, 14–38, 30–51). Clearly, the NOE distance restraints in data sets N20b and N20c do not allow the assignment of the disulphide connectivities. However, 16 of the N20c structures satisfy the data in the sense that the cysteines are disulphide bonded.

Sampling of conformation and assignment space

Is it more appropriate to remove ambiguous data points or use them as ambiguous restraints? This question is connected to the issue of sampling. In the above example of the disulphide bridges it could be argued that the conformation space sampled by the N20c structures is more relevant since it is restrained to disulphide bonded structures. In addition, the ambiguous restraints for the disulphide bridges have increased the sampling in this particular example: the average pair-wise r.m.s. difference of the N20c structures is larger than that of the N20b structures (8.01 versus 7.24 Å). This might be a consequence of a more complex energy surface, which would increase the chances that the structures are trapped in local minima corresponding to different assignments.

The disulphide bridges have not converged to any particular pattern in the N20c structures. However, there is the possibility of convergence towards a wrong assignment. While it is difficult to see why manual assignment should be able to avoid this error, it is interesting to test how the removal of a single ambiguous restraint affects the sampling of conformation space. The distance restraint corresponding to the NOE peak at the chemical shifts (F1 = 4.00 p.p.m., F2 = 1.88 p.p.m.), arising mostly due to the interaction between Gly57 Hα and Arg1 Hβ, was taken out of the restraint list N20, and the structures were recalculated with an otherwise unaltered data set. Three possible assignments appeared for this particular restraint (see Results). Table 5 compiles the distances between the three possible spin pairs, Gly57 Hα–Arg1 Hβ, Ala58 Hα–Arg1 Hβ, and Thr54 Hβ–Pro2 Hγ, in the two ensembles of structures and in the X-ray structure. The distance ranges are consistently larger in the N20 structures than in the N20/157 structures. Also none of the N20/157 structures satisfies the experimental fact that there is a strong NOE at (F1 = 4.00 p.p.m., F2 = 1.88 p.p.m.). The shortest distance is 2.8 Å, and corresponds to an incorrect assignment (to Ala58 Hα–Arg1 Hβ).
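The size of the assignment problem for the disulphide bridges can be made explicit with a short enumeration. The sketch below lists all ways of pairing the six cysteines named in the text into three bridges and reads the bridges off a set of Sγ coordinates with the 2.3 Å criterion quoted above; the function names and the example coordinates are hypothetical.

```python
import math
from itertools import combinations


def pairings(items):
    """Yield every way of splitting an even-length list into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


def observed_bridges(sg_coords, cutoff=2.3):
    """Return residue pairs whose S-gamma atoms are closer than cutoff (Angstrom)."""
    return [
        (a, b)
        for a, b in combinations(sorted(sg_coords), 2)
        if math.dist(sg_coords[a], sg_coords[b]) < cutoff
    ]


cysteines = [5, 14, 30, 38, 51, 55]
patterns = list(pairings(cysteines))
print(len(patterns))  # number of possible three-bridge patterns

# Hypothetical S-gamma coordinates, invented for illustration.
sg = {5: (0, 0, 0), 55: (2.0, 0, 0), 14: (10, 0, 0),
      38: (10, 2.1, 0), 30: (20, 0, 0), 51: (20, 0, 2.05)}
print(observed_bridges(sg))
```

Six cysteines allow 5 × 3 × 1 = 15 complete pairings, of which four were observed in the N20c structures.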

Good sampling of conformation space is a requirement for the bootstrap approach. A typical assignment strategy would add an assignment if a particular distance is less than a threshold (5 Å, say) in one member of the ensemble of initial structures (Guntert et al., 1993). However, refinement procedures undersample conformation space in the important sense that the structure calculated with all data is not contained in an ensemble of structures calculated with a reduced data set (Brunger et al.,


Figure 9. Comparison of relative contributions to $\bar{D}$ (equation (4)) calculated from the X-ray crystal structure and the SA structures. The relative contribution due to a distance $D$ between 2 protons is calculated as $(D/\bar{D})^{-6}$. For the SA structures, the difference between the $\langle r^{-6}\rangle^{-1/6}$ average over the distances in all structures and the X-ray structure is plotted. a, Data set N20. b, N20 with tighter upper and lower bounds and stereospecific assignments.
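The relative contributions compared in this Figure can be computed as in the sketch below, which assumes that the effective distance is the inverse sixth power sum over the candidate distances, so that each contribution is the normalised D^-6 weight of its spin pair. As input it uses the three X-ray distances listed in Table 5 for the peak at (4.00, 1.88) p.p.m.; treating these three pairs as the only contributors is a simplification made for illustration, and the function name is hypothetical.

```python
def relative_contributions(distances):
    """Fraction of the summed r^-6 intensity carried by each candidate spin pair."""
    weights = {pair: d ** -6 for pair, d in distances.items()}
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}


# X-ray distances (Angstrom) for the three candidate pairs in Table 5.
xray = {
    "Gly57 HA-Arg1 HB": 2.2,
    "Ala58 HA-Arg1 HB": 6.3,
    "Thr54 HB-Pro2 HG": 4.8,
}
for pair, fraction in relative_contributions(xray).items():
    print(f"{pair}: {100 * fraction:.1f}%")
```

With these distances the shortest pair carries nearly all of the intensity, in line with the statement in the text that this peak arises mostly from the interaction between Gly57 Hα and Arg1 Hβ.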

ambiguous restraints. This turned out to be a difficult problem with symmetric dimers (Nilges, 1993). Due to the fact that essentially only torsion angles were free to move in the structure calculation protocol used there, the method biased the interpretation towards intra-monomer NOEs in some cases. Similar considerations are also applicable to the case of general ambiguities. If, for example, an NOE can be assigned both as intra-residue and long-range, there would be an invariable bias towards the intra-residue NOE with random coil initial structures, and motion essentially only in the torsion angles. For this reason, the current protocol starts from random Cartesian coordinates.

Practical considerations and possible extensions

The major practical outcome is that the distance restraints can be specified directly in terms of the chemical shift coordinates. This may be especially useful in combination with three- and four-dimensional spectra, which are more tedious to evaluate manually. Due to the reduced resolution in the spectra, overlapping and ambiguous peaks remain. The calculation method is ideally suited to interpret the NOE spectra when the proton chemical shifts are mostly determined via through-bond methods (Meadows et al., 1994 and references cited there). It may also be possible to attempt structure calculations if not enough unambiguously defined NOE peaks can be assigned to generate initial structures for an iterative assignment scheme.

However, there remain some additional problems with generating structures automatically from experimentally derived peak lists and assignment tables. If the peak table is generated by an automatic peak picking method, peak positions may be shifted due to extensive overlap. A possible solution to this problem could be to modify equation (3) to include information on the peak shape:

$$\frac{d}{d\tau_m}\,\mathrm{NOE}_{F_1,F_2} = c_0 \sum_{ij} f_1(\delta\nu_i)\, f_2(\delta\nu_j)\, D_{ij}^{-6}, \qquad (8)$$

where $\delta\nu_i$ is the frequency difference of proton i from F1, and $f_1(\delta\nu_i)$ and $f_2(\delta\nu_j)$ describe the peak shape in the two frequency dimensions. Integrating NOE peaks would be replaced by evaluating the NOE spectrum at certain points. This could be all possible frequency pairs based on the assignment table, or all “pixels” (Yang & Havel, 1993). A manual refinement strategy along these lines, using backcalculation, has also been suggested for nucleic acids (Mirau, 1992).
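Evaluating the spectrum at a chosen point (F1, F2) in the spirit of equation (8) might look like the sketch below. The peak-shape functions are not specified in the text, so a Gaussian of fixed width is assumed for both dimensions purely for illustration; the function names, the width and the example values are likewise hypothetical.

```python
import math


def gaussian(delta, width):
    """Assumed peak shape: unit-height Gaussian in the frequency offset delta."""
    return math.exp(-0.5 * (delta / width) ** 2)


def noe_buildup_rate(f1, f2, shifts, pair_distances, c0=1.0, width=0.02):
    """Sum shape-weighted r^-6 terms over ordered spin pairs (i in F1, j in F2)."""
    total = 0.0
    for (i, j), d in pair_distances.items():
        shape = gaussian(shifts[i] - f1, width) * gaussian(shifts[j] - f2, width)
        total += shape * d ** -6
    return c0 * total


# Hypothetical shifts (p.p.m.) and distances (Angstrom).
shifts = {"i": 4.00, "j": 1.88, "k": 1.90}
pair_distances = {("i", "j"): 2.2, ("i", "k"): 4.8}

print(noe_buildup_rate(4.00, 1.88, shifts, pair_distances))  # on the peak centre
print(noe_buildup_rate(8.00, 1.88, shifts, pair_distances))  # far from any peak
```

In this form a spin pair whose shifts lie slightly off the evaluated point still contributes with reduced weight, so a shifted peak position changes the weights smoothly instead of adding or removing candidate assignments outright.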

Artifacts in the spectrum would have to be removed manually from the peak list. A way to recognise and exclude noise peaks during the calculation may be offered by the method of “self-correcting distance geometry” (Hanggi & Braun, 1994), which can readily be extended to ambiguous distance restraints. In a similar vein, ambiguous distance restraints could be used together with an iterative assignment strategy, where

1993; Clore et al., 1993). Therefore, the iterative assignment strategy can miss possible alternate assignments, or the distance threshold has to be set to larger values. Using ambiguous distance restraints, alternate assignments are generated automatically.

Some bias towards particular assignments can be introduced by the calculation method used with the
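The threshold rule of the iterative (bootstrap) strategy described above can be sketched in a few lines. The ensemble below is invented for illustration, and the function name `bootstrap_assignments` is hypothetical.

```python
def bootstrap_assignments(candidates, ensemble, threshold=5.0):
    """Keep a candidate assignment if its distance is below threshold
    in at least one structure of the preliminary ensemble."""
    return [
        pair for pair in candidates
        if any(structure[pair] < threshold for structure in ensemble)
    ]


# Invented distances (Angstrom) for three candidate pairs in two preliminary structures.
candidates = ["pair_A", "pair_B", "pair_C"]
ensemble = [
    {"pair_A": 5.8, "pair_B": 2.8, "pair_C": 5.1},
    {"pair_A": 5.2, "pair_B": 9.0, "pair_C": 4.6},
]
print(bootstrap_assignments(candidates, ensemble))        # threshold of 5 Angstrom
print(bootstrap_assignments(candidates, ensemble, 6.0))   # larger threshold
```

In this invented example the first candidate is never closer than the threshold in the preliminary ensemble and is only recovered when the threshold is raised, which is the kind of missed alternate assignment the text refers to.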

Table 5

Distance ranges (Å) for NOE Gly57 Hα–Arg1 Hβ

Spin pair            N20        N20/157     X-ray
Gly57 Hα–Arg1 Hβ     1.9–8.4    3.5–5.8     2.2
Ala58 Hα–Arg1 Hβ     2.1–10.5   2.8–10.0    6.3
Thr54 Hβ–Pro2 Hγ     2.6–4.8    3.8–5.1     4.8


an ambiguous restraint is added to the list whenever it is satisfied in an initial structure. It may be advantageous to go beyond the qualitative classification of peak size and extract more exact values for the distance restraints. Methods to correct the distance restraints for spin diffusion (Boelens et al., 1988, 1989; Borgias & James, 1990; Kochl & Lefevre, 1990) have to be adapted to deal with the distances of the form of equation (3).

Other uses for model building

Distance geometry and related methods have always been of interest outside the field of NMR. A case in point are the modelling calculations performed on BPTI with the assumed knowledge of the secondary structure and the hydrogen bonds (Levitt, 1983). In the same paper, the limitations of the metric matrix distance geometry (MMDG) approach were pointed out, in that only distance information can be used, and potentials such as the van der Waals potential cannot be incorporated. Ambiguous distance restraints go beyond the possibilities of MMDG in yet another way. The restraints are still purely geometric, and no energy terms in the classical sense (van der Waals, electrostatic) are used.

NOE derived distances and disulphide bridges are not the only use for ambiguous distance restraints. For example, we have used the concept to model the complex of the single stranded DNA binding protein encoded by gene V of filamentous phage M13 with single stranded DNA (Folmer et al., 1994), based on the paramagnetic shift data of Folkers et al. (1993). Due to the mobile nature of the complex, the paramagnetic shift data could not be interpreted as standard distance restraints between the phosphate backbone of the DNA and the protein.

Another possible use is for hydrogen bonds, which are often included as distance restraints in NMR structure calculations. As has been shown for example by Billeter (1992), the hydrogen bonding pattern may deviate from the regular pattern especially near the ends of secondary structure elements, making an unambiguous assignment of the hydrogen bond acceptor from the NMR data difficult. Using ambiguous distance restraints, the experimental information that indicates a hydrogen bond (slow hydrogen exchange) can be included in the calculation as a restraint between the donor and several acceptors.

Conclusions

In this paper, I have presented a calculation strategy that extends the distance geometry concept to a more general type of data. The data that are used are still distances; however, the requirement that each distance restraint is assigned to a pair of points has been removed. In particular, the global fold of a protein can be determined with ambiguous distance restraints alone. The inclusion of ambiguous data points affects the sampling of conformation space of the calculated structures, which is restricted to a more relevant part of the space.

The main use of the method is an automated assignment of ambiguous NOE crosspeaks during the structure calculation. The method can be used to calculate structures directly from a peak table, once the chemical shifts of all protons are known. Ambiguous restraints are not only useful for NOE derived distances, but also for other types of distance data, such as disulphide connectivities and hydrogen bonds.

Acknowledgement

I thank Sean O’Donoghue for a critical reading of the manuscript and many suggestions regarding style.

References

Berendsen, H. J. C., Postma, J. P. M., van Gunsteren, W. F., DiNola, A. & Haak, J. R. (1984). Molecular dynamics with coupling to an external bath. J. Chem. Phys. 81, 3684–3690.

Berndt, K. D., Guntert, P., Orbons, L. P. M. & Wuthrich, K. (1992). Determination of a high-quality nuclear magnetic resonance structure of the bovine pancreatic trypsin inhibitor and comparison with three crystal structures. J. Mol. Biol. 227, 757–775.

Bernstein, F. C., Koetzle, T. F., Williams, G. J. B., Meyer, E. F. Jr., Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T. & Tasumi, M. (1977). The protein data bank: a computer-based archival file for macromolecular structures. J. Mol. Biol. 112, 535–542.

Billeter, M. (1992). Comparison of protein structures determined by NMR in solution and by X-ray diffraction in single crystals. Quart. Rev. Biophys. 25, 325–377.

Boelens, R., Koning, T. M. G. & Kaptein, R. (1988). Determination of biomolecular structures from proton-proton NOEs using a relaxation matrix approach. J. Mol. Struct. 173, 299–311.

Boelens, R., Koning, T. M. G., Van der Marel, G. A., van Boom, J. H. & Kaptein, R. (1989). Iterative procedure for structure determination from proton-proton NOEs using a full relaxation matrix approach. Application to a DNA octamer. J. Magn. Reson. 82, 290–308.

Borgias, B. A. & James, T. L. (1990). MARDIGRAS: a procedure for matrix analysis of relaxation for discerning geometry of an aqueous structure. J. Magn. Reson. 87, 475–487.

Braun, W. & Go, N. (1985). Calculation of protein conformation by proton-proton distance constraints. A new efficient algorithm. J. Mol. Biol. 186, 611–626.

Braun, W., Bosch, C., Brown, L. R., Go, N. & Wuthrich, K. (1981). Combined use of proton-proton Overhauser enhancements and a distance geometry algorithm for determination of polypeptide conformations. Application to micelle-bound glucagon. Biochim. Biophys. Acta, 667, 377–396.

Brunger, A. T. (1991). Simulated annealing in crystallography. Annu. Rev. Phys. Chem. 42, 197–223.

Brunger, A. T. (1992). X-plor. A System for X-ray Crystallography and NMR. Yale University Press, New Haven.

Brunger, A. T. & Karplus, M. (1988). Polar hydrogen positions in proteins: empirical energy placement and neutron diffraction comparison. Proteins: Struct. Funct. Genet. 4, 148–156.

Brunger, A. T., Clore, G. M., Gronenborn, A. M., Saffrich, R. & Nilges, M. (1993). Assessing the quality of solution nuclear magnetic resonance structures by complete cross-validation. Science, 261, 328–331.

Brunger, A. T., Clore, G. M., Gronenborn, A. M. & Karplus, M. (1986). Three-dimensional structure of proteins determined by molecular dynamics with interproton distance restraints: application to crambin. Proc. Nat. Acad. Sci., U.S.A. 83, 3801–3805.

Clore, G. M. & Gronenborn, A. M. (1991). Structures of larger proteins in solution: three- and four-dimensional heteronuclear NMR spectroscopy. Science, 252, 1390–1399.

Clore, G. M., Brunger, A. T., Karplus, M. & Gronenborn, A. M. (1986). Application of molecular dynamics with interproton distance restraints to three-dimensional protein structure determination: a model study of crambin. J. Mol. Biol. 191, 523–551.

Clore, G. M., Robien, M. A. & Gronenborn, A. M. (1993). Exploring the limits of precision and accuracy of protein structures determined by nuclear magnetic resonance spectroscopy. J. Mol. Biol. 231, 82–102.

Crippen, G. M. (1977). A novel approach to the calculation of conformation: distance geometry. J. Comp. Phys. 24, 96–107.

Crippen, G. M. & Havel, T. F. (1978). Stable calculation of coordinates from distance information. Acta Crystallogr. sect. A, A34, 282–284.

Deisenhofer, J. & Steigemann, W. (1975). Crystallographic refinement of the structure of bovine pancreatic trypsin inhibitor at 1.5 Å resolution. Acta Crystallogr. sect. B, 31, 238–250.

Endo, S., Wako, H., Nagayama, K. & Go, N. (1991). A new version of DADAS (distance analysis in dihedral angle space) and its performance. In Computational Aspects of the Study of Biological Macromolecules by Nuclear Magnetic Resonance Spectroscopy (Hoch, J. C., Poulsen, F. M. & Redfield, C. eds), pp. 233–251, Plenum Press, New York.

Folkers, P. J. M., van Duynhoven, J. P. M., van Lieshout, H. T. M., Harmsen, B. J. M., van Boom, J. H., Tesser, G. I., Konings, R. N. H. & Hilbers, C. W. (1993). Exploring the DNA binding domain of gene V protein encoded by bacteriophage M13 with the aid of spin-labeled oligonucleotides in combination with 1H-NMR. Biochemistry, 32, 9407–9416.

Folmer, R. H. A., Nilges, M., Folkers, P. J. M., Konings, R. N. H. & Hilbers, C. W. (1994). A model of the complex between single-stranded DNA and the single-stranded DNA binding protein encoded by gene V of filamentous bacteriophage M13. J. Mol. Biol.

Gobel, U., Sander, C., Schneider, R. & Valencia, A. (1994). Correlated mutations and residue contacts in proteins. Proteins: Struct. Funct. Genet. 18, 309–317.

Guntert, P., Braun, W. & Wuthrich, K. (1991). Efficient computation of three-dimensional protein structures in solution from nuclear magnetic resonance data using the program DIANA and the supporting programs CALIBA, HABAS and GLOMSA. J. Mol. Biol. 217, 517–530.

Guntert, P., Berndt, K. D. & Wuthrich, K. (1993). The program ASNO for computer-supported collection of NOE upper distance constraints as input for protein structure determination. J. Biomol. NMR, 3, 601–606.

Habazettl, J., Cieslar, C., Oschkinat, H. & Holak, T. A. (1990). 1H NMR assignments of sidechain conformations in proteins using a high-dimensional potential in the simulated annealing calculations. FEBS Letters, 268, 141–145.

Havel, T. F. & Wuthrich, K. (1984). A distance geometry program for determining the structures of small proteins and other macromolecules from nuclear magnetic resonance measurements of intramolecular 1H-1H proximities in solution. Bull. Math. Biol. 46, 673–698.

Hanggi, G. & Braun, W. (1994). Pattern recognition and self-correcting distance geometry calculations applied to myohemerythrin. FEBS Letters, in the press.

Kalus, W., Broger, C., Gerber, P. & Senn, H. (1993). Determination of the disulphide bonding pattern in proteins by local and global analysis of nuclear magnetic resonance data. J. Mol. Biol. 232, 987–906.

Kochl, P. & Lefevre, J. F. (1990). The reconstruction of the relaxation matrix from an incomplete set of nuclear Overhauser effects. J. Magn. Reson. 86, 565–583.

Koning, T. M. G., Boelens, R., van der Marel, G. A., van Boom, J. H. & Kaptein, R. (1991). Structure determination of a DNA octamer in solution by NMR spectroscopy. Effect of fast local motions. Biochemistry, 30, 3787–3797.

Kuszewski, J., Nilges, M. & Brunger, A. T. (1992). Sampling and efficiency of metric matrix distance geometry: a novel partial metrization algorithm. J. Biomol. NMR, 2, 33–56.

Levitt, M. (1983). Protein folding by restrained energy minimization and molecular dynamics. J. Mol. Biol. 170, 723–764.

Levy, R. M., Bassolino, D. A., Kitchen, D. B. & Pardi, A. (1989). Solution structures of proteins from NMR data and modeling: alternative folds for neutrophil peptide 5. Biochemistry, 28, 9361–9372.

Meadows, R. P., Olejniczak, E. T. & Fesik, S. W. (1994). A computer-based protocol for semiautomated assignments and 3D structure determination of proteins. J. Biomol. NMR, 4, 79–96.

Mirau, P. A. (1992). A strategy for NMR structure determination. J. Magn. Reson. 96, 480–490.

Nilges, M., Gronenborn, A. M., Brunger, A. T. & Clore, G. M. (1988a). Determination of three-dimensional structures of proteins by simulated annealing with interproton distance restraints: application to crambin, potato carboxypeptidase inhibitor and barley serine proteinase inhibitor 2. Protein Eng. 2, 27–38.

Nilges, M., Clore, G. M. & Gronenborn, A. M. (1988b). Determination of three-dimensional structures of proteins from interproton distance data by dynamical simulated annealing from a random array of atoms. FEBS Letters, 239, 129–136.

Nilges, M., Clore, G. M. & Gronenborn, A. M. (1988c). Determination of three-dimensional structures of proteins from interproton distance data by hybrid distance geometry-dynamical simulated annealing calculations. FEBS Letters, 229, 317–324.

Nilges, M., Kuszewski, J. & Brunger, A. T. (1991). Sampling properties of simulated annealing and distance geometry. In Computational Aspects of the Study of Biological Macromolecules by Nuclear Magnetic Resonance Spectroscopy (Hoch, J. C., Poulsen, F. M. & Redfield, C. eds), pp. 451–455, Plenum Press, New York.

Nilges, M. (1993). A calculation strategy for the structure determination of symmetric dimers by 1H NMR. Proteins: Struct. Funct. Genet. 17, 297–309.

Oschkinat, H., Muller, T. & Dieckmann, T. (1994). Protein structure determination with three- and four-dimensional NMR spectroscopy. Angew. Chem. Int. Ed. 33, 277–293.

Wagner, G., Braun, W., Havel, T. F., Schaumann, T., Go, N. & Wuthrich, K. (1987). Protein structures in solution by nuclear magnetic resonance and distance geometry: the polypeptide fold of the basic pancreatic trypsin inhibitor determined using two different algorithms, DIS-GEO and DISMAN. J. Mol. Biol. 196, 611–639.

Wuthrich, K., Billeter, M. & Braun, W. (1983). Pseudo-structures for the 20 common amino acids for use in studies of protein conformations by measurements of intramolecular proton-proton distance constraints with nuclear magnetic resonance. J. Mol. Biol. 169, 949–961.

Yang, J. & Havel, T. F. (1993). SESAME: A least square approach to the evaluation of protein structures computed from NMR data. J. Biomol. NMR, 3, 355–360.

Edited by P. E. Wright

(Received 5 August 1994; accepted 12 October 1994)