

A force field for virtual atom molecular mechanics of proteins

Anil Korkut^a and Wayne A. Hendrickson^a,b,1

^a Department of Biochemistry and Molecular Biophysics and ^b Howard Hughes Medical Institute, Columbia University, New York, NY 10032

Contributed by Wayne A. Hendrickson, July 13, 2009 (sent for review March 4, 2009)

Activities of many biological macromolecules involve large conformational transitions for which crystallography can specify atomic details of alternative end states, but the course of transitions is often beyond the reach of computations based on full-atomic potential functions. We have developed a coarse-grained force field for molecular mechanics calculations based on the virtual interactions of Cα atoms in protein molecules. This force field is parameterized based on the statistical distribution of the energy terms extracted from crystallographic data, and it is formulated to capture features dependent on secondary structure and on residue-specific contact information. The resulting force field is applied to energy minimization and normal mode analysis of several proteins. We find robust convergence in minimizations to low energies and energy gradients with low degrees of structural distortion, and atomic fluctuations calculated from the normal mode analyses correlate well with the experimental B-factors obtained from high-resolution crystal structures. These findings suggest that the virtual atom force field is a suitable tool for various molecular mechanics applications on large macromolecular systems undergoing large conformational changes.

energy minimization | normal modes | transition pathways

Accurate understanding of the dynamic properties of proteins has been a major challenge in biophysics (1, 2). With advances in macromolecular crystallography, structural information has been obtained on very large complexes such as ribosome particles (3), chaperone complexes (4), virus particles (5), and RNA polymerases (6) as well as on thousands of individual proteins. In addition, snapshots of protein structures in different states of activity demonstrate the existence of very large conformational changes (7, 8). Computational analysis of the dynamics of such systems is extremely difficult, if not impossible, when using full-atomic computational approaches, due not only to computational limitations but also to the complexity of the resulting information. Thus, coarse-grained approaches have gained importance for addressing large systems and large conformational changes (9–11).

Coarse graining reduces computational complexity by greatly decreasing the degrees of freedom of a molecular system, with appropriate assumptions to achieve simplification without compromise of essential features (12). For reducing complexity in proteins, Cα-only models have been the most popular, but other coarse-graining approaches have also been taken, such as the inclusion of side-chain centroids (SCs) (13). An important aspect of coarse-grained analysis is the use of an appropriate pseudoforce field to model the forces and constraints exerted on the molecular system.

The use of simple harmonic potentials to model the Cα-to-Cα interactions in proteins as flexible springs has been most popular (14). Such simple harmonic potentials are very effective in defining the near-native-state fluctuations of proteins calculated by elastic network models, but these models fail to represent the specific restraints that true interatomic potentials impose on virtual bond angles and dihedrals. Unrealistic distortions can result, especially for calculations that aim to model properties far from the native state. A few potential functions have been designed to account for true molecular forces.

The first major approach to a realistic coarse-grained force field is the united residue (UNRES) potential developed by Scheraga and coworkers (13, 15). It has been used mainly in ab initio protein prediction by means of global conformational searches with energy evaluations. With UNRES, the polypeptide chain is represented by Cα and SC positions. Various energy terms such as virtual Cα–Cα bond terms, virtual dihedral and bond-angle constraints, electrostatic interactions, Cα–SC interactions, SC rotamer energies, and local correlation energies are taken into consideration. Several other knowledge-based coarse-grained functions have also been designed for structure prediction and design (16). One of particular interest here is the knowledge-based Cα virtual potential OPUS-CA (17), which includes solvent and hydrogen-bonding energies in addition to virtual local and packing energy terms. Interestingly, this potential models the curvilinear terms in a simple secondary-structure-specific approach based on three structure types (helix, sheet, and loop). Unlike the UNRES potential, but like many related functions, OPUS-CA is not formulated to provide the first or second derivatives needed for molecular mechanics.

Other coarse-grained potentials have been devised for applications in dynamic simulation. In one model (18), the SCs are used to calculate only the linear energy terms (i.e., virtual bonds and nonbonding interactions). Forces here are formulated in analogy to the full-atomic CHARMM force field; thus, the nonbonded interactions, virtual Cα–Cα dihedrals, and bond angles are similar in character to CHARMM curvilinear terms. This model has been used in molecular dynamics simulations of complex biological systems. In a different approach, atomic features are mapped into a reduced representation for a coarse-grained potential that accounts for the double-well character of virtual dihedral and bond angle potentials (19). This potential, parameterized by statistical information solely from HIV protease, has been used in Brownian dynamics simulations of HIV-1 protease (20). None of these "nonharmonic" force fields has been verified systematically against experimental data except for comparisons of UNRES-generated ab initio structures to crystal structures.

We have developed a restrained coarse-grained force field for protein molecules. Our potential function is based on virtual atoms at Cα atomic positions, and it is constituted to preserve accurate geometry in computations on the structure and dynamics of biological macromolecules. We therefore call this the virtual atom molecular mechanics (VAMM) potential. VAMM includes both linear and curvilinear terms, parameterized against crystal-structure data by the Boltzmann conversion method (21), and also local restraints that ensure computational stability. Energy minimizations with VAMM converge to energy gradients of order 10^-6 kcal/(mol·Å) without significant distortions. Normal mode calculations with VAMM yield excellent fits to experimentally obtained B-factors. The current VAMM force field is useful in various computational approaches to protein dynamics, even for the largest systems, and it has flexibility for extensions to include other types of molecules, such as nucleic acids and lipids, and other protein features, such as SCs.

Author contributions: A.K. and W.A.H. designed research; A.K. performed research; A.K. and W.A.H. analyzed data; and A.K. and W.A.H. wrote the paper.

The authors declare no conflict of interest.

Freely available online through the PNAS open access option.

^1 To whom correspondence should be addressed: [email protected].

This article contains supporting information online at www.pnas.org/cgi/content/full/0907674106/DCSupplemental.

www.pnas.org/cgi/doi/10.1073/pnas.0907674106 | PNAS | September 15, 2009 | vol. 106 | no. 37 | 15667–15672

Theoretical Formulation

Coarse-Grained Force Field Parameterization. VAMM is a coarse-grained model of polypeptide chains based on Cα atoms, and it defines the restraints and pseudoforces acting upon these atoms (Fig. 1). The potential function is formulated as

\[ V_{\mathrm{VAMM}} = V_{\mathrm{bonded}} + V_{\mathrm{angle}} + V_{\mathrm{dihedral}} + V_{\mathrm{nonbonded}} + V_{\mathrm{local}} \tag{1} \]

where $V_{\mathrm{bonded}}$, $V_{\mathrm{angle}}$, $V_{\mathrm{dihedral}}$, $V_{\mathrm{nonbonded}}$, and $V_{\mathrm{local}}$ are the virtual bond, angle bending, dihedral, nonbonded, and local restraint potentials, respectively. All of the energy terms correspond to the virtual interactions between the Cα atoms.

To parameterize the functional forms for each term, statistical distributions of the properties are calculated from the evaluation (EVA) database (22), which contains more than 2,600 unique protein structures. The resulting probability distributions are used to calculate a potential of mean force by the Boltzmann conversion method,

\[ V_i = -k_B T \ln\left(P_i / P_0\right) \tag{2} \]

where $k_B$ is the Boltzmann constant, $T$ is the temperature, $P_i$ is the probability of a property at value $i$, and $P_0$ is the reference-state probability. With this sign convention, the most populated values of a property correspond to the lowest energies.
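The Boltzmann conversion can be illustrated with a minimal sketch that turns histogram counts into a potential of mean force. The function name `boltzmann_pmf`, the temperature of 300 K, and the uniform reference state used when none is supplied are assumptions made for illustration; the text does not specify the temperature or reference state used for VAMM.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def boltzmann_pmf(counts, temperature=300.0, reference=None):
    """Convert histogram counts into a potential of mean force (kcal/mol).

    Bins with zero counts return None, because the PMF is undefined
    where the observed probability is zero.
    `reference` is the reference-state probability P_0; when omitted, a
    uniform distribution over the bins is assumed (illustrative choice).
    """
    total = sum(counts)
    p0 = reference if reference is not None else 1.0 / len(counts)
    kt = K_B * temperature
    pmf = []
    for c in counts:
        if c == 0:
            pmf.append(None)
        else:
            pmf.append(-kt * math.log((c / total) / p0))
    return pmf


# A sharply peaked toy distribution: the central bin gets the lowest energy.
pmf = boltzmann_pmf([1, 10, 100, 10, 1])
```

Unpopulated bins are left undefined here rather than assigned an arbitrary large energy; how such regions are handled matters for the bond-angle term discussed later in this section.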

Virtual Cα–Cα Bonds. The virtual interactions between bonded Cα atoms are described by a harmonic potential, such that

\[ V^{BT}_{\mathrm{bonded}}(r_{ij}) = k_{\mathrm{bonded}} \left( |r_{ij}| - |r_0| \right)^2 \tag{3} \]

Three types of virtual bonded restraints exist in polypeptide chains. The first two are the cis and trans peptide bonds on the main chain, and the third comprises the disulfide bridges that link cysteine residues. Each type is described by the same harmonic potential but with different parameters, specified here by the superscript BT for bond type.
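As a small sketch of how the harmonic virtual-bond term and its first derivative might be evaluated for one pair of Cα positions, consider the following. The function name `harmonic_bond` is hypothetical, the default force constant of 70 kcal/(mol·Å²) is the trans-peptide value reported in this section, and the coordinates in the example are invented.

```python
import math


def harmonic_bond(xi, xj, r0, k=70.0):
    """Return (energy, gradient with respect to xi) for V = k * (|r_ij| - r0)**2.

    xi and xj are 3-tuples in angstroms; k is in kcal/(mol*A^2).
    The gradient with respect to xj is the negative of the returned gradient.
    """
    d = [a - b for a, b in zip(xi, xj)]
    r = math.sqrt(sum(c * c for c in d))
    energy = k * (r - r0) ** 2
    # Chain rule: dV/dxi = 2k (r - r0) * (xi - xj) / r
    grad_i = [2.0 * k * (r - r0) * c / r for c in d]
    return energy, grad_i


# A virtual bond stretched by 0.1 A beyond a 3.8 A minimum.
energy, grad = harmonic_bond((0.0, 0.0, 0.0), (3.9, 0.0, 0.0), r0=3.8)
```

Note that the potential as written carries no factor of 1/2, so the second derivative along the bond is 2k rather than k.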

The statistical analysis shows a narrow distribution of Cα–Cα distances centered on 3.78 Å (σ = 0.0093 Å) for trans peptides (Fig. 2A). Hence, a minimum at 3.8 Å can be adopted for the trans peptides or, alternatively, crystal-structure distances can be set as the minima. Cis peptide bonds, most common before proline residues, adopt a different conformation in which the Cα–Cα distance is 2.97 Å on average. To avoid deficiencies around cis peptide bonds, or a more complicated functional form that would also account for the cis conformation, the same functional form is preserved, and the crystal-structure distance of each Cα pair is taken as its minimum, in a fashion similar to harmonic network models (14). Fitting a harmonic potential to the energy profile obtained from Boltzmann conversion of the probability distribution for trans peptides (Fig. 2A) yields a spring constant of ≈70 kcal/(mol·Å²). The same force constant is also adopted for cis peptides, because bonding interactions are similar in cis and trans peptides.

Fig. 1. The Cα-only coarse-grained model. The VAMM pseudoforce field includes restraints for virtual bonds, virtual dihedral angles (φ), virtual bond angles (θ), local interactions, and nonbonded interactions. The springs represent the virtual bonding and local restraints, and the purple shaded spring represents the nonbonded interactions.

Fig. 2. Statistical analyses (Upper) and potentials of mean force (Lower) for virtual trans Cα–Cα bonds, virtual dihedral angles, virtual bond angles, and nonbonded interactions. (A) The probability distribution, P(r), for virtual bond lengths, r = Cα–Cα distance, in the EVA database (Upper) and corresponding Boltzmann conversion energy, E (Lower). The computed PMF (black curve) is fitted by a harmonic potential to give the VAMM potential term (red curve). (B) Probability distribution, P(φ), of virtual dihedral angles φ for residue quartets with the helix–helix–helix–helix (HHHH) conformation (Upper) and corresponding Boltzmann energy, E (Lower). The PMF from the HHHH distribution (black) is fitted by a Fourier series to yield the VAMM term (red). (C) Probability distribution, P(θ), for virtual bond angles, θ, for residue triplets with the helix–helix–helix (HHH) conformation (Upper) with its corresponding Boltzmann energy, E (Lower). The calculated PMF (black) is fitted by a Fourier series inside the defined boundary and a harmonic potential outside to give the VAMM potential term (red). (D) The radial distribution g(r) of all nonbonded interactions calculated in a shell of 12-Å radius (Upper) and the corresponding Boltzmann energy, E (Lower). Note the double peak from populations of short- and long-range interaction shells. The PMF for this overall distribution (black) is fitted by a Morse potential (red). Note that nonbonded interactions for VAMM are parameterized separately for each pair of residue types.

Disulfide bridges are also modeled by a harmonic potential, as for the Cα–Cα virtual bonds. A virtual bond is assumed to exist between the Cα atoms of the Cys residues that form the bridges. In contrast to Cα–Cα interactions, however, there is a fairly large heterogeneity in the distribution of distances for disulfide bridges. In this circumstance, the equilibrium distances in the harmonic potentials for these virtual bonds are assigned directly as the crystal-structure values. Due to the relative heterogeneity of the Cα atoms related to the disulfide bridges, an empirical force constant value of 20 kcal/(mol·Å²) is adopted. This value permits relative flexibility, but it also preserves the disulfide-bridge restraints.

Virtual Dihedral Restraints. The virtual dihedral angle defined by four consecutive Cα atoms is directly related to the secondary structure of the residues involved. Based on this observation, the dihedral angle distributions are calculated for every secondary structure combination possible for each quartet of residues. For this purpose, Dictionary of Protein Secondary Structure (DSSP) (23) assignments are made to all proteins represented in the EVA database. DSSP assigns eight different secondary structure types to residues (i.e., α-helix [H], 3_10 helix [G], π-helix [I], β-sheet [E], turn [T], β-bulge [B], bend [S], and loop [L]), but we combine β-bulge with the β-sheet assignment [E] because these types have very similar dihedral and bending angle propensities. Hence, a series of secondary-structure-based classes of dihedral angle quartets are defined, such as HHHH, HHHL, EEEE, etc. Among such quartets, the dihedrals that are not represented more than 1,000 times in the EVA database are combined into more general classes. For example, all virtual dihedrals whose central residues are a 3_10 helix and a loop are grouped into the single class XGLX. These groupings yielded a total of 126 quartet types having specific dihedral energy parameters.

Having classified the Cα quartets, the probability distribution for dihedral angles in each group is calculated from the EVA database. The probability distribution for the HHHH quartet is given in Fig. 2B as an example. Each probability distribution is used to calculate the corresponding potential of mean force (PMF) by using the Boltzmann conversion method. In the final step, the PMF is fitted to a Fourier series as

\[ V^{SS}_{\mathrm{dihedral}}(\phi) = V_0 + \sum_{n=1}^{4} \left[ k_n \cos(n\phi) + k'_n \sin(n\phi) \right] \tag{4} \]

where $\phi$ is the virtual dihedral angle, $n$ is the multiplicity, and the $k_n$ and $k'_n$ values are the force constants. The superscript SS denotes that the virtual dihedral potential is secondary-structure specific. $V_0$ is the base energy for each SS dihedral type. The fit to the PMF curve for the HHHH quartet is given in Fig. 2B.
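To make the dihedral term concrete, the sketch below computes the signed virtual dihedral from four consecutive Cα positions and evaluates a four-term Fourier series of the kind described above. The function names and all coefficient values are hypothetical; the actual VAMM parameters for each quartet class are not given in this section.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def virtual_dihedral(p1, p2, p3, p4):
    """Signed dihedral angle in radians defined by four consecutive points."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.atan2(y, x)


def dihedral_energy(phi, v0, k_cos, k_sin):
    """V = V0 + sum over n = 1..4 of [k_n cos(n phi) + k'_n sin(n phi)]."""
    return v0 + sum(kc * math.cos(n * phi) + ks * math.sin(n * phi)
                    for n, (kc, ks) in enumerate(zip(k_cos, k_sin), start=1))


# Toy geometry: the first and last points lie on opposite sides (trans).
phi = virtual_dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                       (0.0, 0.0, 1.0), (-1.0, 0.0, 1.0))
# Hypothetical coefficients, for illustration only.
energy = dihedral_energy(phi, v0=1.0,
                         k_cos=[0.5, 0.2, 0.1, 0.05],
                         k_sin=[0.3, 0.0, 0.0, 0.0])
```

Using `atan2` keeps the angle signed over the full range from −π to π, which a periodic Fourier series can represent without special handling at the wrap-around point.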

Virtual Bond Angle. PMFs for virtual bond angles are calculated in a way similar to those for dihedral angles. For the bond angles, quartets of residues are replaced by the residue triplets that define the bond angle. This procedure has yielded a total of 79 triplet types having specific angle-bending parameters. Unlike the dihedral distributions, the virtual-bond-angle distributions indicate that only a limited range of angle values is populated in the Protein Data Bank (PDB), giving rise to localized, sharp peaks. This finding leads to a PMF defined only in a subset of the angle space (i.e., 0 to π), and the PMF cannot be described outside of these boundaries of definition. An example is given for the HHH (helix–helix–helix) triplet in Fig. 2C. The sharp peak is centered on roughly π/2, and the probability values are 0 outside the boundary [0.3π, 0.75π]. To define the energy outside the boundaries and avoid over-fitting, the potential energy is described by a Fourier series within the populated region (i.e., inside the boundary) and by a harmonic potential outside the boundaries. The resulting potential is described as follows:

\[ V^{SS}_{\mathrm{angle}}(\theta) = a(\theta) \left\{ \sum_{n=0}^{4} \left[ k_n \cos(n\theta) + k'_n \sin(n\theta) \right] \right\} + \left( 1 - a(\theta) \right) \left[ k_\theta \left( \theta_{ij} - \theta_0 \right)^2 \right] \tag{5} \]

The $k$ values describe the force constants, $n$ is the multiplicity, and $\theta$ is the angle. SS again denotes the particular secondary-structure specification, and $\theta_0$ is the base value of the angle for a given secondary-structure state. The effective part of the angle-bending potential is selected by the Boolean operator $a(\theta)$: $a(\theta) = 1$ when the angle is within the boundary of the peak of the probability distribution, and $a(\theta) = 0$ when it is outside.
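The piecewise character of the bond-angle term can be sketched as below, with the Boolean selector expressed as a simple range test. The function name `angle_energy` and all parameter values are hypothetical; only the boundary [0.3π, 0.75π] is taken from the HHH example in the text.

```python
import math


def angle_energy(theta, k_cos, k_sin, bounds, k_theta, theta0):
    """Piecewise virtual bond-angle energy.

    Inside `bounds` (a(theta) = 1) a Fourier series with n = 0..4 applies;
    outside (a(theta) = 0) a harmonic term k_theta * (theta - theta0)**2 applies.
    k_cos and k_sin each hold five coefficients, for n = 0, 1, 2, 3, 4.
    """
    lower, upper = bounds
    if lower <= theta <= upper:
        return sum(kc * math.cos(n * theta) + ks * math.sin(n * theta)
                   for n, (kc, ks) in enumerate(zip(k_cos, k_sin)))
    return k_theta * (theta - theta0) ** 2


# Boundary from the HHH example; every coefficient below is invented.
hhh_bounds = (0.3 * math.pi, 0.75 * math.pi)
inside = angle_energy(math.pi / 2, [1.0, 0, 0, 0, 0], [0.0] * 5,
                      hhh_bounds, k_theta=10.0, theta0=math.pi / 2)
outside = angle_energy(0.1 * math.pi, [1.0, 0, 0, 0, 0], [0.0] * 5,
                       hhh_bounds, k_theta=10.0, theta0=math.pi / 2)
```

Whether the two branches join continuously at the boundaries depends on the fitted parameters; this sketch does not enforce continuity.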

Nonbonded Interactions. The interaction between nonbonded Cα atoms can be modeled as springs linking the atoms in a similar fashion to the network models (14). However, the radial distribution of Cα atoms within a 12-Å sphere indicates the existence of two interaction shells, as noted before (24). The generic radial distribution function is given in Fig. 2D. For the generic distribution, the short-range interaction shell peaks at ≈5.5 Å, whereas the long-range shell peaks at ≈9.9 Å. Residue-specific radial distributions differ from one another, but a similar bipartite character is preserved.

Based on these observations, a separate Morse potential with parameters defined for long-range and short-range interactions is fitted to the PMF for each of the 210 types of residue pairs. The boundary position between the two shells, $r_{\mathrm{boundary}}$, is defined as the distance where the fitted functions for long-range and short-range interactions cross (Fig. 2D). An artificially high potential energy is obtained for atoms within ±1 Å of $r_{\mathrm{boundary}}$ due to uncertainty about whether they belong to the short- or long-range interaction shell. Thus, we smooth the potential in this interval to the average of the nonbonded energies at $r_{\mathrm{boundary}} - 1$ Å and $r_{\mathrm{boundary}} + 1$ Å. This boundary smoothing dramatically improves VAMM energy minimizations, permitting convergence to very low energy gradients without significant structural distortions (see Applications section below). A generic form for the nonbonded interactions is represented in Fig. 2D for simplicity, but unlike the Morse potential in ref. 19, our function has residue-specific parameters. The potential is formulated as

\[ V^{RR}_{\mathrm{nonbonded}}(r_{ij}) = 6 e^{-r_0/2.8} \left[ \left( 1 - e^{\varepsilon_1 (r_0 - r_{ij})} \right)^2 - \varepsilon_2 \right], \qquad (r_0, \varepsilon_1, \varepsilon_2) = \begin{cases} r_0^{\mathrm{short}}, \varepsilon_1^{\mathrm{short}}, \varepsilon_2^{\mathrm{short}} & r \le r_{\mathrm{boundary}} \\ r_0^{\mathrm{long}}, \varepsilon_1^{\mathrm{long}}, \varepsilon_2^{\mathrm{long}} & r > r_{\mathrm{boundary}} \end{cases} \tag{6} \]

where RR denotes the particular residue pair, the $r_0$ values are the energy-minimum distances, and $\varepsilon_1$ and $\varepsilon_2$ are residue-specific energy parameters. The $r_0$, $\varepsilon_1$, and $\varepsilon_2$ values are fitted separately for the long- and short-range interactions of each residue pair. Because the radial distribution function is normalized according to the 12-Å cutoff distance, the energy assumes a value of ≈0 at the cutoff distance without any shifting or switching of the function. Nonbonded interactions are defined only for atom pairs that are more than five residues away from each other on the polypeptide chain.
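The two-shell construction with boundary smoothing might be sketched as follows. This is an illustration under stated assumptions: the Morse-type expression in `morse` is one reading of the nonbonded formula, the parameter names `e1` and `e2` are placeholders for the two residue-specific parameters, returning zero beyond the 12-Å cutoff is an assumed convention, and every numerical value in the example is invented.

```python
import math


def morse(r, r0, e1, e2):
    """Morse-type shell energy with a distance-dependent depth (assumed form)."""
    depth = 6.0 * math.exp(-r0 / 2.8)
    return depth * ((1.0 - math.exp(e1 * (r0 - r))) ** 2 - e2)


def nonbonded_energy(r, short, long_, r_boundary, cutoff=12.0, half_width=1.0):
    """Two-shell nonbonded energy, flattened within +/- half_width of r_boundary.

    `short` and `long_` are (r0, e1, e2) tuples for the two interaction shells.
    """
    if r >= cutoff:
        return 0.0  # assumed: pairs beyond the cutoff do not interact
    if abs(r - r_boundary) <= half_width:
        # Replace the crossing region by the average of the two shell
        # energies evaluated at the edges of the smoothing window.
        return 0.5 * (morse(r_boundary - half_width, *short)
                      + morse(r_boundary + half_width, *long_))
    params = short if r < r_boundary else long_
    return morse(r, *params)


# Invented parameters placed near the generic shell peaks (5.5 A and 9.9 A).
short_shell = (5.5, 1.0, 1.0)
long_shell = (9.9, 1.0, 1.0)
e_min = nonbonded_energy(5.5, short_shell, long_shell, r_boundary=7.5)
e_flat_a = nonbonded_energy(7.0, short_shell, long_shell, r_boundary=7.5)
e_flat_b = nonbonded_energy(8.0, short_shell, long_shell, r_boundary=7.5)
```

A constant energy inside the window gives zero force there, which removes the spurious barrier at the shell crossing but leaves small discontinuities in the derivative at the window edges in this simplified version.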

Local Restraints. It is well known that Cα atoms generally enjoy less flexibility than the side-chain atoms in proteins. Most full-atomic force fields actually account for this difference inherently. When a protein system is represented only by its Cα atoms, however, proper restrictions on flexibility are not imposed by typical pseudoforces. We note that experimental distributions of Cα–Cα distances for closely linked residues on the polypeptide chain (2 ≤ |j − i| ≤ 5) are similar to, albeit broader than, those for virtually bonded pairs (|j − i| = 1), whereas the distributions for truly nonbonded pairs (|j − i| > 5) have two peaks (Fig. 2D). Computational distortions can arise because, in the absence of a full-atomic model, strict restraints can be maintained for dihedral and angular terms while permitting deviations from statistical values for other local features. To prevent such unrealistic distortions, local harmonic restraints are incorporated into the force field. The local harmonic restraint potential is defined as

\[ V_{\mathrm{local}}(r_{ij}) = k_{\mathrm{local}} \left( |r_{ij}| - |r_0| \right)^2 \tag{7} \]

where $r_{ij}$ is an inter-Cα distance, $r_0$ is the corresponding equilibrium value taken from the crystal structure, and $k_{\mathrm{local}}$ is the pseudoforce constant. Local restraints are defined between Cα atoms within a contiguous span of five residues, i.e., 2 ≤ |j − i| ≤ 5. A statistical analysis of local residue pairs yields slightly different force constants depending on separations along the peptide chain, but we use the average value of 5 kcal/(mol·Å²) for $k_{\mathrm{local}}$ in our applications. This value suffices to maintain structural integrity during energy minimizations and does not exert excessive constraints during coarse-grained normal mode analysis.
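The sequence-separation rules that decide which term applies to a given Cα pair can be summarized in a short sketch. It assumes a single chain with residues indexed consecutively and ignores disulfide bridges and the 12-Å distance cutoff, so it only illustrates the bookkeeping; the function name `classify_pairs` is hypothetical.

```python
def classify_pairs(n_residues):
    """Partition all C-alpha pairs (i < j) by sequence separation |j - i|.

    separation == 1       -> virtual bond
    2 <= separation <= 5  -> local harmonic restraint
    separation > 5        -> candidate for the nonbonded term
    """
    bonded, local, nonbonded = [], [], []
    for i in range(n_residues):
        for j in range(i + 1, n_residues):
            separation = j - i
            if separation == 1:
                bonded.append((i, j))
            elif separation <= 5:
                local.append((i, j))
            else:
                nonbonded.append((i, j))
    return bonded, local, nonbonded


bonded, local, nonbonded = classify_pairs(8)
```

For an eight-residue chain this yields 7 virtual bonds, 18 local restraints, and 3 nonbonded candidates, which together account for all 28 pairs.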

Applications

Energy Minimization with Truncated Newton Method. Extensive energy minimization of a protein crystal structure can cause distortions of the protein from its native state if computational procedures are inappropriate. An accurate minimization algorithm and force field should not introduce large distortions from an accurate starting model, such as a high-resolution crystal structure (25). Structure preservation upon energy minimization of high-resolution crystal structures is a test to assure that the force field will be useful to define macromolecular structures around their known native states and to evolve meaningful characteristics during further computational analysis.

To examine the behavior of proteins under the VAMM restraints and prepare them for further analysis, energy minimizations are performed. For this purpose, the truncated Newton (TN) algorithm (TNPACK) provided by Schlick and colleagues (26) is adapted to the coarse-grained systems and VAMM. The TN method (27) is a second-order minimization algorithm that can minimize large systems to very low energy gradients in few steps. Derivatives are calculated by using the chain rule and the mathematical formulation given in refs. 28 and 29.

Results from energy minimizations on a test set of high-resolution structures, performed to reach gradient values down to the order of 10^-6 kcal/(mol·Å), are given in Table 1. These VAMM minimizations caused an average root-mean-squared deviation (RMSD) of 0.72 Å, whereas minimizations of the same set with the CHARMM19 force field and an adopted basis Newton–Raphson (ABNR) algorithm left much higher deviations (RMSD = 1.74 Å). Distortions observed for the minimizations with VAMM are in the same range as those observed for minimization with other full-atomic force fields such as AMBER (RMSD = 0.41 Å), OPLS (RMSD = 0.92 Å), and GROMOS96 (RMSD = 1.36 Å), which were performed on a different set of proteins (25).

An example run of a VAMM-based minimization is given in Fig. 3 for the pro-kumamolisin proteinase (PDB ID 1T1E). Among all 15 tested cases, only one exceeded 1 Å in residual RMSD between experimental and minimized structures. Initial gradient values (Fig. 3B) all fall into the range from 1.4 to 2.0 kcal/(mol·Å). These results verify that VAMM defines native structures in a robust and accurate way, giving minimized states in accordance with protein structure space as defined by crystal structures.
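The deviation reported for each minimization is an RMSD over Cα positions, which can be sketched as below. The sketch assumes the two coordinate sets are already in the same frame (no superposition step is performed), which may differ from the procedure actually used for Table 1; the function name `rmsd` and the coordinates are illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-squared deviation between two equally sized coordinate lists.

    Each entry is an (x, y, z) tuple in angstroms. No superposition is done.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum((a - b) ** 2
                  for p, q in zip(coords_a, coords_b)
                  for a, b in zip(p, q))
    return math.sqrt(squared / len(coords_a))


# Two points, each displaced by 1 A along z.
crystal = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
minimized = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
deviation = rmsd(crystal, minimized)
```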

Normal Mode Analysis. Normal mode analysis (NMA) has been one of the most popular applications of coarse-grained systems (14). NMA methods, such as network models, when applied to coarse-grained systems generally give accurate and useful information on the functional/collective motions of the protein (12). The atomic fluctuations calculated by NMA are usually compared with the experimental B-factors of crystal structures to verify the NMA results (30). Interestingly, simple harmonic potentials have proven to be the most successful approach in terms of reproducing the experimental B-factors. In a recent study, five different NMA procedures were compared with one another in terms of reproducing the B-factors of ultrahigh-resolution crystal structures (31). The analysis shows that the Elastic Network Model (ENM) and the similar ElNemo network-modeling server can reproduce the isotropic B-factors better than the other approaches, whereas the block normal mode analysis (BNM) approach (32) performs slightly better in reproducing anisotropic B-factors of ultrahigh-resolution structures.

In light of previous results, we have performed NMA with VAMM and compared its performance with those of ENM and BNM. For VAMM and BNM, the protein structures are briefly minimized to gradient values of 0.1 kcal/mol·Å by the TN method and 10⁻³ kcal/mol·Å by the ABNR algorithm, respectively. No further energy minimization was required with the coarse-grained representation because no imaginary normal mode is observed. We justify our NMA approach in the SI Appendix (33). The Hessian matrix is computed by standard methods. Second derivatives of dihedral and angular terms for the VAMM-based NMA are calculated by the chain rule (28, 29). The Hessian is diagonalized with singular value decomposition to obtain normal mode vectors and frequencies. The resulting eigenvectors are used to compute atomic fluctuations, which are compared with experimental B-factors (Table 1).

Table 1. Energy minimizations and correlations with B-factors from crystal structures

| PDB ID | Protein name | Residues | Resolution, Å | R‡ | Minimization deviation, Å* (VAMM) | Minimization deviation, Å* (CHARMM) | Correlation coefficient† (VAMM) | Correlation coefficient† (ENM) | Correlation coefficient† (BNM) |
|---|---|---|---|---|---|---|---|---|---|
| 1hhp | HIV protease | 198 | 2.70 | 0.19 | 0.67 | 1.90 | 0.76 | 0.72 | 0.75 |
| 1t1e | Pro-kumamolisin | 534 | 1.18 | 0.21 | 0.70 | 2.22 | 0.60 | 0.41 | 0.59 |
| 1wcw | UROIII synthase | 254 | 1.30 | 0.18 | 1.06 | 2.31 | 0.60 | 0.48 | 0.51 |
| 1yfq | Bub3 | 342 | 1.10 | 0.15 | 0.90 | 2.42 | 0.53 | 0.56 | 0.41 |
| 1xnb | Xylanase | 185 | 1.49 | 0.17 | 0.79 | 1.43 | 0.64 | 0.61 | 0.78 |
| 1yt3 | ExoRNase | 375 | 1.60 | 0.20 | 0.81 | 2.41 | 0.57 | 0.62 | 0.53 |
| 1z53 | Cyt c peroxidase | 293 | 1.13 | 0.14 | 0.57 | 1.90 | 0.77 | 0.77 | 0.66 |
| 1xkr | CheC | 205 | 1.75 | 0.23 | 0.75 | 1.82 | 0.58 | 0.66 | 0.49 |
| 2sil | Neuraminidase | 381 | 1.60 | 0.17 | 0.79 | 1.40 | 0.63 | 0.44 | 0.55 |
| 3pte | DD-peptidase | 347 | 1.60 | 0.16 | 0.62 | 1.15 | 0.79 | 0.68 | 0.60 |
| 1wdp | β-Amylase | 493 | 1.2 | 0.13 | 0.59 | 1.43 | 0.67 | 0.53 | 0.63 |
| 1qgi | Chitosanase | 259 | 1.60 | 0.19 | 0.62 | 1.48 | 0.85 | 0.66 | 0.74 |
| 1vd5 | Glucuronidase | 381 | 1.80 | 0.17 | 0.55 | 1.23 | 0.81 | 0.64 | 0.62 |
| 1szn | α-Galactosidase | 417 | 1.54 | 0.15 | 0.76 | 1.50 | 0.54 | 0.54 | 0.53 |
| 1nwz | PYP | 125 | 0.82 | 0.12 | 0.64 | 1.56 | 0.54 | 0.52 | 0.39 |
| Average | | | | | 0.72 | 1.74 | 0.66 | 0.59 | 0.59 |

*RMSD between minimized and X-ray structures with VAMM and CHARMM19 force fields.
†Comparison between experimental B(Cα) and mean square NMA fluctuations by different procedures.
‡Crystallographic residual, $R = \sum (|F_{\mathrm{obs}}| - |F_{\mathrm{calc}}|) / \sum |F_{\mathrm{obs}}|$.

15670 | www.pnas.org/cgi/doi/10.1073/pnas.0907674106 Korkut and Hendrickson

VAMM yields an average correlation coefficient of 0.66, whereas this value is 0.59 when ENM or BNM is used. The statistical significance of these differences was tested with a t test, showing P < 0.04 for the differences relative to both ENM and BNM. Thus, VAMM performs better than both other methods in reproducing the experimental B-factors of high-resolution structures. Specific examples comparing VAMM with network models are shown in Fig. 4. At some positions with ENM, Cα atoms show unrealistically high fluctuations; VAMM prevents such improper behavior by incorporating local restraint terms (Eq. 4). Ma and colleagues have introduced alternative approaches to mitigate this "tip effect" problem by using a modified elastic potential function in ENM calculations (34) and a pairwise Hessian diagonalization scheme in BNM calculations (35). In practice, tip effects may not introduce a very serious problem because the important slow modes do not usually display this abnormality; however, they cause defects in analysis of the overall normal modes. The problem most probably arises because restraints in simple models derive solely from packing density within a protein, and systems become unstable where packing density is extremely low.
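The averages quoted here can be rechecked directly from the correlation columns of Table 1. The sketch below does that and also computes a paired t statistic for VAMM against ENM and BNM. The text does not state which form of the t test was used, so the paired form is an assumption, and `paired_t_statistic` is a hypothetical helper name.

```python
import math
import statistics

# Correlation coefficients transcribed from Table 1, in table row order.
VAMM = [0.76, 0.60, 0.60, 0.53, 0.64, 0.57, 0.77, 0.58,
        0.63, 0.79, 0.67, 0.85, 0.81, 0.54, 0.54]
ENM = [0.72, 0.41, 0.48, 0.56, 0.61, 0.62, 0.77, 0.66,
       0.44, 0.68, 0.53, 0.66, 0.64, 0.54, 0.52]
BNM = [0.75, 0.59, 0.51, 0.41, 0.78, 0.53, 0.66, 0.49,
       0.55, 0.60, 0.63, 0.74, 0.62, 0.53, 0.39]


def paired_t_statistic(x, y):
    """t statistic of the per-protein differences x - y.

    Assumption: a paired test is appropriate because each protein is
    analysed by every method; the source does not specify this.
    """
    diffs = [a - b for a, b in zip(x, y)]
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / std_err


for name, values in (("VAMM", VAMM), ("ENM", ENM), ("BNM", BNM)):
    print(f"{name}: mean correlation = {statistics.mean(values):.2f}")

for name, other in (("ENM", ENM), ("BNM", BNM)):
    t_value = paired_t_statistic(VAMM, other)
    print(f"VAMM vs {name}: paired t = {t_value:.2f} (14 degrees of freedom)")
```

Converting the t statistic to a P value requires the Student t distribution, which is left to a statistics library; the sketch stops at the statistic to stay within the standard library.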

We note that the utility of B-factor comparisons as a test of normal mode calculations (30) is limited by crystal lattice effects. An analysis of 43 T4 lysozyme molecules in 25 crystal lattices showed an average of 0.55 for the correlation coefficient of main-chain B-factors, each against the average of all others (36), and B-factor distributions also vary significantly among four myoglobin structures (37). Calculated fluctuations are the same regardless of crystal lattice, and we find that VAMM-NMA fluctuations correlate with the average B-factors from T4 lysozyme structures at the level of 0.66 and with the myoglobin averages at 0.75. Thus, the average of 0.66 from VAMM for the 15-structure test set (Table 1) may be near the limit of intrinsic variation.

These results indicate that VAMM provides excellent accordance with experimental B-factors and controls against abnormally high local fluctuations at ill-defined Cα positions. This is the second verification of VAMM against experiment, following energy minimizations. It thus implies that VAMM can be used to calculate the normal modes of biological macromolecules to model protein dynamics, and it opens the way for incorporating this force field into other simulation methods.

Transition Pathway Calculations. An accurate and efficient description of large-scale conformational transition pathways and intermediate states of proteins is an important challenge for computational biology. Coarse-grained pathway-analysis algorithms based on network models and simple harmonic potentials, or potentials derived directly from the proteins of interest, have been applied by us (38) and others (39–42) to transitions between alternative protein conformations. Such computations typically generate intermediate structures that are highly distorted from allowed conformations of polypeptide chains; the computations are, in fact, so loaded with high strain energies that intermediate structures have been interpreted as undergoing "conformational transition through cracking" (40, 42). Moreover, no doubt because of such distortions, the pictured transition pathways do not converge efficiently (42). In contrast, the VAMM force field provides an accurate and comprehensive energy function, but one simple enough for calculating large-scale conformational changes (38). VAMM also provides a basic framework for expansion to include side-chain positions without significant sacrifice of its simplicity.

Fig. 3. Truncated Newton energy minimization. (A) The energy (E) minimization profile for pro-kumamolisin, a sedolisin-type proteinase (PDB ID 1T1E). (B) The root-mean-square energy gradient, Grms, calculated for the same protein during the minimization process. (C) The RMSD of the minimized structure from the crystal structure of pro-kumamolisin during minimization. Contributions of individual potential energy terms to the total VAMM energy change during the minimization are shown in Fig. S1 of the SI Appendix.

Fig. 4. Profiles of mean square (ms) fluctuations along the polypeptide chain. (A) Chitosanase crystal structure. (B) Pro-kumamolisin, a sedolisin-type proteinase crystal structure. Profiles are shown for experimental B-factors (blue curves), from NMA using a harmonic potential (green curves), and from NMA using VAMM to construct the Hessian matrix (red curves). Fluctuations are given as ms values normalized to match the crystallographic B-factors, $B = 8\pi^2 \langle u^2 \rangle$, where $\langle u^2 \rangle$ is the mean square fluctuation in the direction of the scattering vector.

Korkut and Hendrickson PNAS | September 15, 2009 | vol. 106 | no. 37 | 15671

Molecular Dynamics Simulations. A potential and important application of the VAMM force field is its implementation in molecular dynamics simulations (43). The general applicability and accuracy of the VAMM force field suggest that such implementation is feasible and has the potential to dramatically expand the use of the VAMM force field for specific types of applications. Such coarse-grained molecular dynamics simulations may be used, for example, in undirected sampling of large-scale conformational changes, for steered transformation pathway simulations, and in examination of conformational ensembles about NMA-based transition pathway calculations.

Methods

Protein Structures. The protein structures are chosen randomly from a set of ultrahigh-resolution crystal structures with resolutions better than 2 Å. The crystal structure of HIV-1 protease (PDB ID 1HHP) is an exception in this selection, with a resolution of 2.7 Å (44); it is chosen because of its common use in analysis of protein dynamics. Secondary-structure assignments are made by using the DSSP software (23), and the Cα atom coordinates are extracted from PDB files for VAMM-based calculations.

Energy Minimization. Energy minimizations with VAMM are performed by using the TN algorithm (26). The TNPACK numeric algorithm for Hessian-vector multiplication is generally used; a user-defined analytical multiplication algorithm did not provide any significant improvement. A golden-section line search is adapted to increase minimization efficiency. Default settings of TNPACK are used for other options.
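To illustrate the line-search step mentioned above, here is a generic golden-section search applied to the energy along a fixed search direction. This is a textbook sketch and not the TNPACK implementation or the authors' adaptation; the function names and the toy quadratic energy are assumptions made for the example.

```python
import math

INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0  # reciprocal of the golden ratio


def golden_section_search(f, a, b, tol=1e-8):
    """Locate a minimum of a unimodal function f on the interval [a, b]."""
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = f(c), f(d)
    while (b - a) > tol:
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


# Toy quadratic "energy" with its minimum at TARGET (illustrative only).
TARGET = (1.0, -2.0)


def energy(x):
    return sum((xi - ti) ** 2 for xi, ti in zip(x, TARGET))


x0 = (2.0, 0.0)
direction = (-2.0, -4.0)  # minus the gradient 2 * (x0 - TARGET) at x0


def energy_along(step):
    """Energy as a function of the step length along `direction`."""
    return energy([xi + step * di for xi, di in zip(x0, direction)])


best_step = golden_section_search(energy_along, 0.0, 1.0)
print(f"optimal step length is approximately {best_step:.4f}")
```

Each iteration reuses one of the two interior function values, so only one new energy evaluation is needed per step, which is the usual reason for preferring this scheme when energy evaluations dominate the cost.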

Energy minimizations with the CHARMM19 force field are performed by the CHARMM simulation package (45). Nonbonded interaction parameters are set such that the electrostatic interaction is shifted to zero at 12 Å, and the van der Waals interaction is switched off from 8 Å to 12 Å. A distance-dependent dielectric constant (ε = 4r) is adopted for minimizations in vacuum to mimic the effect of the solvent.
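The nonbonded settings described above can be sketched numerically. With ε = 4r the Coulomb term q_i q_j / (ε r) becomes q_i q_j / (4 r²), so it falls off faster than in a constant dielectric. The shift and switch factors below are functional forms of the kind commonly used in molecular mechanics packages; whether they match the exact forms used in these calculations is an assumption, as is the unit conversion constant.

```python
# Conversion constant in kcal·Å/(mol·e²); a conventional value, assumed here.
COULOMB_CONSTANT = 332.0716


def shift_factor(r, cutoff=12.0):
    """Factor that brings a pair energy smoothly to zero at `cutoff` (Å)."""
    if r >= cutoff:
        return 0.0
    return (1.0 - (r / cutoff) ** 2) ** 2


def switch_factor(r, r_on=8.0, r_off=12.0):
    """Factor equal to 1 below r_on, 0 above r_off, smooth in between."""
    if r <= r_on:
        return 1.0
    if r >= r_off:
        return 0.0
    numerator = (r_off**2 - r**2) ** 2 * (r_off**2 + 2.0 * r**2 - 3.0 * r_on**2)
    return numerator / (r_off**2 - r_on**2) ** 3


def electrostatic_energy(q_i, q_j, r, cutoff=12.0):
    """Shifted Coulomb energy (kcal/mol) with distance-dependent dielectric."""
    epsilon = 4.0 * r  # epsilon = 4r, as stated in the text
    return COULOMB_CONSTANT * q_i * q_j / (epsilon * r) * shift_factor(r, cutoff)


for distance in (4.0, 8.0, 10.0, 12.0):
    print(
        f"r = {distance:4.1f} Å  "
        f"E_elec(+1, -1) = {electrostatic_energy(1.0, -1.0, distance):8.4f}  "
        f"vdW switch = {switch_factor(distance):.3f}"
    )
```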

NMA. Elastic network analysis is performed in a manner similar to that defined in ref. 14. A simple harmonic potential function is adopted to model all of the hypothetical springs connecting the Cα atoms of proteins.

$$V_{\mathrm{ENM}} = \sum_{i,j} V(r_{ij}) = \sum_{i,j} k_{\mathrm{ENM}} \left( |r_{ij}| - |r_0| \right)^2 \qquad [8]$$

where $V(r_{ij})$ is the pairwise energy between atoms i and j, $r_{ij}$ is the fluctuating distance, and $r_0$ is the equilibrium distance given in the crystal structures. A cutoff distance of 13 Å is adopted. A Hessian is constructed and diagonalized to calculate the eigenvectors and eigenvalues corresponding to normal mode fluctuation vectors and frequencies, respectively. VAMM-based NMA differs only in the use of VAMM for Hessian construction instead of harmonic potentials.
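As a sketch of the Hessian construction for the harmonic potential of Eq. 8, the code below assembles the 3N × 3N matrix of second derivatives at the reference structure, where the first-derivative terms vanish. It assumes each pair within the cutoff is counted once and that a single force constant applies to all springs; `enm_hessian`, `mat_vec`, and the toy coordinates are illustrative names and values, not the authors' code.

```python
def enm_hessian(coords, k_enm=1.0, cutoff=13.0):
    """Hessian (nested lists, 3N x 3N) of V = sum_{i<j} k (r_ij - r0_ij)^2
    evaluated at the reference coordinates, for pairs within `cutoff` Å."""
    n = len(coords)
    hess = [[0.0] * (3 * n) for _ in range(3 * n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [coords[j][a] - coords[i][a] for a in range(3)]
            dist_sq = sum(c * c for c in d)
            if dist_sq > cutoff**2:
                continue  # no spring between atoms beyond the cutoff
            for a in range(3):
                for b in range(3):
                    # Curvature of k (r - r0)^2 at r = r0 is 2k, projected
                    # onto the interatomic unit vector.
                    h = 2.0 * k_enm * d[a] * d[b] / dist_sq
                    hess[3 * i + a][3 * i + b] += h
                    hess[3 * j + a][3 * j + b] += h
                    hess[3 * i + a][3 * j + b] -= h
                    hess[3 * j + a][3 * i + b] -= h
    return hess


def mat_vec(matrix, vector):
    """Plain matrix-vector product for nested-list matrices."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


# Toy Cα-like coordinates in Å (made up for illustration).
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0), (3.0, 5.5, 2.5)]
toy_hessian = enm_hessian(toy_coords)

# A rigid translation stretches no spring, so it must be a zero mode.
translation = [1.0, 0.0, 0.0] * len(toy_coords)
restoring = mat_vec(toy_hessian, translation)
print("max |H · translation| =", max(abs(x) for x in restoring))
```

Diagonalizing the resulting symmetric matrix with a linear algebra library gives the mode vectors and eigenvalues; for a connected, non-degenerate network the six smallest eigenvalues correspond to rigid-body motions.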

BNM is performed by the vibran module of the CHARMM simulation package (45). The protein molecules are energy-minimized to a gradient of 10⁻³ kcal/mol·Å to avoid negative eigenvalues. The nonbonded interaction settings are the same as those used for the energy minimizations.

Atomic fluctuations (theoretical B-factors) are calculated from the frequency-weighted linear sum of all normal modes as computed by any one of the methods.
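The fluctuation calculation can be sketched as follows. Each nonzero mode contributes the squared amplitude of an atom in that mode divided by the mode eigenvalue (the squared frequency); this inverse-eigenvalue weighting is the standard NMA form and is assumed here to be what "frequency-weighted" means. The rescaling to the experimental mean follows the normalization described for Fig. 4, and all function names and toy numbers are illustrative.

```python
import math


def ms_fluctuations(eigenvalues, eigenvectors, n_atoms, zero_tol=1e-8):
    """Per-atom mean-square fluctuation in arbitrary units.

    eigenvectors[k] is mode k as a flat list of 3 * n_atoms components.
    Modes with eigenvalue <= zero_tol (rigid-body motions) are skipped.
    """
    msf = [0.0] * n_atoms
    for eigenvalue, mode in zip(eigenvalues, eigenvectors):
        if eigenvalue <= zero_tol:
            continue
        for i in range(n_atoms):
            amplitude_sq = sum(mode[3 * i + a] ** 2 for a in range(3))
            msf[i] += amplitude_sq / eigenvalue
    return msf


def scale_to_b_factors(msf, b_exp):
    """Rescale fluctuations so that their mean equals the experimental mean."""
    factor = (sum(b_exp) / len(b_exp)) / (sum(msf) / len(msf))
    return [factor * value for value in msf]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy two-atom example: one zero mode and two internal modes (made up).
toy_eigenvalues = [0.0, 1.0, 4.0]
toy_modes = [
    [1.0, 0.0, 0.0, 1.0, 0.0, 0.0],  # skipped: zero eigenvalue
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # moves atom 0 only
    [0.0, 0.0, 0.0, 1.0, 0.0, 0.0],  # moves atom 1 only
]
toy_msf = ms_fluctuations(toy_eigenvalues, toy_modes, n_atoms=2)
toy_b_exp = [20.0, 10.0]  # hypothetical experimental B-factors in Å²
print("scaled B-factors:", scale_to_b_factors(toy_msf, toy_b_exp))
```

The correlation coefficients in Table 1 would then correspond to `pearson` applied to the calculated and experimental per-residue profiles; the correlation is unaffected by the rescaling step.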

A manageable set of 15 structures used for comparison of theoretical and experimental B-factors was chosen from a large set of high-resolution structures analyzed by Eyal et al. (30), so as to have virtually the same average correlation coefficient against ENM normal mode fluctuations for these 15 (0.59) as for all 176 (a range of 0.54–0.58 with varying ENM parameters).

ACKNOWLEDGMENTS. We thank Barry Honig and Ogan Gurel for comments on the manuscript. This work was supported in part by National Institutes of Health Grant GM56550 (to W.A.H.).

1. Pohl FM (1972) Cooperative conformational changes in globular proteins. Angew Chem 11:894–906.

2. Parak FG (2003) Proteins in action: The physics of structural fluctuations and conformational changes. Curr Opin Struct Biol 13:552–557.

3. Ban N, et al. (2000) The complete atomic structure of the large ribosomal subunit at 2.4 Å resolution. Science 289:905–920.

4. Braig K, et al. (1994) The crystal structure of the bacterial chaperonin GroEL at 2.8 Å. Nature 371:578–586.

5. Reinisch KM, Nibert M, Harrison SC (2000) Structure of the reovirus core at 3.6 Å resolution. Nature 404:960–967.

6. Cramer P, Bushnell DA, Kornberg RD (2001) Structural basis of transcription: RNA polymerase II at 2.8 Å resolution. Science 292:1863–1876.

7. Mitra K, Frank J (2006) Ribosome dynamics: Insights from atomic structure modeling into cryo-electron microscopy maps. Annu Rev Biophys Biomol Struct 35:299–317.

8. Vonrhein C, Schlauderer GJ, Schulz GE (1995) Movie of the structural changes during a catalytic cycle of nucleoside monophosphate kinases. Structure 3:483–490.

9. Levitt M, Warshel A (1975) Computer-simulation of protein folding. Nature 253:694–698.

10. Bahar I, Rader AJ (2005) Coarse-grained normal mode analysis in structural biology. Curr Opin Struct Biol 15:586–592.

11. Stein M, Gabdoulline RR, Wade RC (2007) Bridging from molecular simulation to biochemical networks. Curr Opin Struct Biol 17:166–172.

12. Haliloglu T, Bahar I (1999) Structure based analysis of protein dynamics: Comparison of theoretical results for hen lysozyme with X-ray diffraction and NMR relaxation data. Proteins Struct Funct Genet 37:654–667.

13. Liwo A, et al. (1997) A united residue force field for off-lattice protein structure simulations. 1. Functional forms and parameters of long range sidechain interaction potentials from protein crystal data. J Comput Chem 18:849–873.

14. Atilgan AR, et al. (2001) Anisotropy of fluctuation dynamics of proteins with an elastic network model. Biophys J 80:505–515.

15. Liwo A, Khalili M, Scheraga HA (2005) Ab initio simulations of protein-folding pathways by molecular dynamics with the united-residue model of polypeptide chains. Proc Natl Acad Sci USA 102:2362–2367.

16. Poole AM, Ranganathan R (2006) Knowledge-based potentials in protein design. Curr Opin Struct Biol 16:508–513.

17. Wu YH, et al. (2007) OPUS-Ca: A knowledge-based potential function requiring only Calpha positions. Protein Sci 16:1449–1463.

18. Shih AY, Arkhipov A, Freddolino PL, Schulten K (2006) Coarse grained protein-lipid model with application to lipoprotein particles. J Phys Chem B 110:3674–3684.

19. Tozzini V, Rocchia W, McCammon JA (2006) Mapping all-atom models onto one-bead coarse-grained models: General properties and applications to a minimal polypeptide model. J Chem Theory Comput 2:667–673.

20. Chang CE, et al. (2006) Gated binding of ligands to HIV-1 protease: Brownian dynamics simulations in a coarse-grained model. Biophys J 90:3880–3885.

21. Reith D, Meyer H, Muller-Plathe F (2002) CG-OPT: A software package for automatic force field design. Comput Phys Commun 148:299–313.

22. Eyrich VA, et al. (2001) EVA: Continuous automatic evaluation of protein structure prediction servers. Bioinformatics 17:1242–1243.

23. Kabsch W, Sander C (1983) Dictionary of protein secondary structure: Pattern-recognition of hydrogen-bonded and geometrical features. Biopolymers 22:2577–2637.

24. Bahar I, Jernigan RL (1997) Inter-residue potentials in globular proteins and the dominance of highly specific hydrophilic interactions at close separation. J Mol Biol 266:195–214.

25. Summa CM, Levitt M (2007) Near-native structure refinement using in vacuo energy minimization. Proc Natl Acad Sci USA 104:3177–3182.

26. Schlick T, Fogelson A (1992) TNPACK: A truncated Newton minimization package for large-scale problems. 2. Implementation examples. ACM Trans Math Software 18:71–111.

27. Schlick T, Overton M (1987) A powerful truncated Newton method for potential energy minimization. J Comput Chem 8:1025–1039.

28. Blondel A, Karplus M (1996) New formulation for derivatives of torsion angles and improper torsion angles in molecular mechanics: Elimination of singularities. J Comput Chem 17:1132–1141.

29. Niketic SR, Rasmussen K (1977) The Consistent Force Field (Springer, New York).

30. Eyal E, Yang LW, Bahar I (2006) Anisotropic network model: Systematic evaluation and a new web interface. Bioinformatics 22:2619–2627.

31. Kondrashov DA, et al. (2007) Protein structural variation in computational models and crystallographic data. Structure 15:169–177.

32. Tama F, Gadea FX, Marques O, Sanejouand YH (2000) Building-block approach for determining low-frequency normal modes of macromolecules. Proteins Struct Funct Genet 41:1–7.

33. Levitt M, Sander C, Stern PS (1985) Protein normal-mode dynamics: Trypsin-inhibitor, crambin, ribonuclease and lysozyme. J Mol Biol 181:423–447.

34. Lu MY, Poon B, Ma JP (2006) A new method for coarse-grained elastic normal-mode analysis. J Chem Theory Comput 2:464–471.

35. Lu MY, Ma JP (2008) A minimalist network model for coarse-grained normal mode analysis and its application to biomolecular x-ray crystallography. Proc Natl Acad Sci USA 105:15358–15363.

36. Zhang XJ, Wozniak JA, Matthews BW (1995) Protein flexibility and adaptability seen in 25 crystal forms of T4 lysozyme. J Mol Biol 250:527–552.

37. Kondrashov DA, et al. (2008) Sampling of the native conformational ensemble of myoglobin via structures in different crystalline environments. Proteins Struct Funct Bioinf 70:353–362.

38. Korkut A, Hendrickson WA (2009) Computation of conformational transitions in proteins by virtual atom molecular mechanics as validated in application to adenylate kinase. Proc Natl Acad Sci USA 106:15673–15678.

39. Maragakis P, Karplus M (2005) Large amplitude conformational change in proteins explored with a plastic network model: Adenylate kinase. J Mol Biol 352:807–822.

40. Whitford PC, Miyashita O, Levy Y, Onuchic JN (2007) Conformational transitions of adenylate kinase: Switching by cracking. J Mol Biol 366:1661–1671.

41. Zheng WJ, Brooks BR, Hummer G (2007) Protein conformational transitions explored by mixed elastic network models. Proteins Struct Funct Bioinf 69:43–57.

42. Kirillova S, Cortes J, Stefaniu A, Simeon T (2008) An NMA guided path planning approach for computing large amplitude conformational changes in proteins. Proteins Struct Funct Bioinf 70:131–143.

43. Adcock SA, McCammon JA (2006) Molecular dynamics: Survey of methods for simulating the activity of proteins. Chem Rev 106:1589–1615.

44. Spinelli S, et al. (1991) The 3-dimensional structure of the aspartyl protease from the HIV-1 isolate BRU. Biochimie 73:1391–1396.

45. Brooks BR, et al. (1983) CHARMM: A program for macromolecular energy, minimization, and dynamics calculations. J Comput Chem 4:187–217.
