statistical thermodynamics of residue fluctuations in native proteins

Statistical thermodynamics of residue fluctuations in native proteins Osman N. Yogurtcu, Mert Gur, and Burak Erman Citation: J. Chem. Phys. 130, 095103 (2009); doi: 10.1063/1.3078517 View online: http://dx.doi.org/10.1063/1.3078517 View Table of Contents: http://jcp.aip.org/resource/1/JCPSA6/v130/i9 Published by the American Institute of Physics. Additional information on J. Chem. Phys. Journal Homepage: http://jcp.aip.org/ Journal Information: http://jcp.aip.org/about/about_the_journal Top downloads: http://jcp.aip.org/features/most_downloaded Information for Authors: http://jcp.aip.org/authors Downloaded 22 Apr 2013 to 128.205.114.91. This article is copyrighted as indicated in the abstract. Reuse of AIP content is subject to the terms at: http://jcp.aip.org/about/rights_and_permissions


Statistical thermodynamics of residue fluctuations in native proteins

Osman N. Yogurtcu, Mert Gur, and Burak Erman a)

Center for Computational Biology and Bioinformatics, Koc University, Sariyer 34450, Istanbul, Turkey

(Received 7 August 2008; accepted 14 January 2009; published online 6 March 2009)

Statistical thermodynamics of residue fluctuations of native proteins in a temperature, pressure, and force reservoir is formulated. The general theory is discussed in terms of harmonic and anharmonic fluctuations of residues. The two elastic network models based on the harmonic approximation, the anisotropic network and the Gaussian network models, are discussed as the limiting cases of the general theory. The heat capacity and the correlations between the energy fluctuations and residue fluctuations are obtained for the harmonic approximation. The formulation is extended to large fluctuations of residues in order to account for effects of anharmonicity. The fluctuation probability function is constructed for this purpose as a tensorial Hermite series expansion with higher order moments of fluctuations as coefficients. Evaluation of the higher order moments using the proposed statistical thermodynamics model is explained. The formulation is applied to a hexapeptide and the fluctuations of residues obtained by molecular dynamics simulations are characterized in the framework of the model developed. In particular, coupling of two different modes in the nonlinear model is discussed in detail. © 2009 American Institute of Physics. [DOI: 10.1063/1.3078517]

I. INTRODUCTION

A protein in aqueous solution constitutes a system whose atoms exhibit fluctuations over time about well defined mean positions. The aqueous medium forms the reservoir at constant temperature and pressure. The magnitude of fluctuations may be large relative to atomic radii as indicated by experiment. Fluctuations in atomic coordinates are well characterized by experiments.1 In theory, fluctuations are studied at various levels of approximation, ranging from all-atom to coarse-grained scales. Studying the fluctuations of the α-carbons is a convenient approximation where each successive α-carbon pair is assumed to be connected by a virtual bond of fixed length and only interactions between residues, represented by their α-carbons, are considered. In the present study, we adopt this level of approximation.

Coarse-grained models of fluctuations started with the important observation that the large amplitude fluctuations of the protein G-actin could be described in the harmonic approximation by a single parameter only.2 Based on this simple picture of the elastic fluctuations of a protein, the Gaussian network model (GNM) was proposed,3,4 according to which the Cα's were assumed analogous to the junctions of an amorphous network whose fluctuations were similar to those given in the random amorphous network model proposed by Flory.5,6 As in the random network model, the GNM is based on an isotropic description of residue fluctuations where only the number of neighbors of a given residue is important. The anisotropic network model (ANM) was then introduced to estimate the directions of fluctuations.7,8

The GNM and the models that followed it, collectively referred to as the elastic network models (ENMs), are found to provide important insights for understanding the structure-function relations of proteins. For this reason, and because of their immediate applicability to all kinds of proteins without size restrictions, they found wide use during the past decade.4,9–12 In general, these studies and several others that are cited by them elaborate on different levels of approximation of the ENMs. They try to identify the force constants associated with the models, compare the different models, associate the models with NMR data, optimize the model parameters over databases, and apply the models to drug design problems and prediction of binding sites, folding cores, allosteric effects, and hot residues. In addition to the work on harmonic fluctuations cited here, anharmonicities of protein fluctuations13,14 in the form of nonlinear modes that are localized in certain regions of the protein play important roles in protein function.15,16 In this respect, coupling of fast and slow modes resulting in energy flow is the most important process responsible for the protein's function.17

Despite this wide range of interest, a general statistical mechanical treatment of fluctuations that describes the theoretical basis of harmonic as well as anharmonic behavior is missing in the literature. The specific aim of the present paper is to give a statistical thermodynamic interpretation of fluctuations in native proteins that covers both harmonic and anharmonic behavior.

The paper consists of three major parts. In the first part, we introduce the statistical thermodynamics basis of fluctuations in native proteins. We discuss, in some detail, the pairwise inter-residue energies that play a significant role in the model. In the second part, we obtain the harmonic approximation as a special case of the general formalism for fluctuations, and discuss the two most widely used models, ANM and GNM. We also discuss two simple applications of the thermodynamic formalism by deriving the heat capacity and correlations of energy and residue fluctuations of the GNM. In the third part, the effects of anharmonicities are introduced into the probability function of fluctuations, in terms of a moment-based tensorial Hermite series expansion. As a simple application of the series formalism, anharmonic fluctuations of a hexapeptide are obtained by molecular dynamics (MD) simulations, and the distribution functions for the fluctuations of Cα's are determined. Contributions from different modes of fluctuations to a given mode through mode coupling are discussed in terms of the Hermite series expansion.

a) Electronic mail: [email protected].

II. THEORY

In this section, we present the thermodynamic and statistical basis of fluctuations in native proteins. We use the entropy representation for the fundamental relation,18

$$S = S(U, V, R), \tag{1}$$

where S, U, V, and R are the mean (thermodynamic) values of the entropy, energy, volume, and position vectors of the Cα's, respectively. Water is not shown explicitly in the fundamental relation and only a single protein molecule is considered. The protein is in diathermal contact with the surrounding water. Similarly, the protein is in contact with a pressure (P) and a force (F) reservoir, as a result of which the energy, volume, and positions of residues exhibit fluctuations. Other proteins are present in the surroundings but they do not influence the energy levels of the given protein. We call the protein together with its surrounding water an element. The collection of all elements of the system constitutes the ensemble. Statistical mechanics is applicable to a single element. Thermodynamics applies only to an ensemble of the elements. The ensemble of elements with its extensive properties constitutes a macroscopic system.18,19 The thermodynamic variables S, U, V, R are obtained from the ensemble. For each element, these variables exhibit fluctuations about their native values. The distribution $f(\hat{U}, \hat{V}, \hat{R})$ of the instantaneous extensive variables $\hat{U}$, $\hat{V}$, $\hat{R}$ is given by the relation

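$$f(\hat{U}, \hat{V}, \hat{R}) = \exp\left\{ -k^{-1} S\left[\frac{1}{T}, \frac{P}{T}, \frac{F}{T}\right] - k^{-1}\left( \frac{1}{T}\hat{U} + \frac{P}{T}\hat{V} - \frac{F}{T}\cdot\hat{R} \right) \right\}, \tag{2}$$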

where k is the Boltzmann constant and $S[1/T, P/T, F/T]$ is the Massieu transform of the entropy, which for the thermodynamic variables chosen reads as

$$S\left[\frac{1}{T}, \frac{P}{T}, \frac{F}{T}\right] = S - \frac{U}{T} - \frac{P}{T}V + \frac{F}{T}\cdot R. \tag{3}$$

The distribution now takes the explicit form

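$$f(\hat{U}, \hat{V}, \hat{R}) = \exp\left\{ -k^{-1}\left( S - \frac{U}{T} - \frac{P}{T}V + \frac{F}{T}\cdot R \right) - k^{-1}\left( \frac{\hat{U}}{T} + \frac{P}{T}\hat{V} - \frac{F}{T}\cdot\hat{R} \right) \right\}. \tag{4}$$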

In Eq. (4), provided that the system remains around the given equilibrium point, i.e., a point on the thermodynamic surface S = S(U, V, R), there are no restrictions on the degree of departure of the system, i.e., the magnitude of fluctuations, from the average thermodynamic variables. If the fluctuations are large, they may be anharmonic or may induce a jump from one local minimum to another. The applicability of results derived from Eq. (4) is discussed in detail in Secs. III–V.

The correlation of fluctuations of the ith and jth residues may now be obtained from

$$\langle \Delta R_i \Delta R_j^T \rangle = \sum (\hat{R}_i - R_i)(\hat{R}_j - R_j)^T f(\hat{U}, \hat{V}, \hat{R}), \tag{5}$$

where the superscript T denotes transpose and the summation is over all allowable states.

Using Eq. (4) in Eq. (5) leads to

$$\langle \Delta R_i \Delta R_j^T \rangle = kT \left( \frac{\partial R_i}{\partial F_j} \right)_{T, P, F_{i \neq j}}, \tag{6}$$

where the variables to be kept fixed are indicated as subscripts. The equation is valid when the system is in or close to equilibrium. The derivation of Eq. (6) is given by Callen18 and is outlined briefly in Appendix A.

In general, if $\xi_k$ represents any of the extensive variables U, V, R, and $\zeta_k$ represents the conjugate variables 1/T, P/T, −F/T, then, in principle, all higher moments can be derived iteratively according to the rule18

$$\langle \varphi\, \Delta\xi_k \rangle = -k \frac{\partial \langle \varphi \rangle}{\partial \zeta_k} + k \left\langle \frac{\partial \varphi}{\partial \zeta_k} \right\rangle, \tag{7}$$

where $\varphi$ is a higher order product of the fluctuations of the extensive variables, ΔU, ΔV, ΔR. A product of the form $\Delta R_i \Delta R_j \Delta R_k \cdots$ is an example of $\varphi$ that leads to higher order moments of residue position fluctuations.

Equation (6) forms the statistical mechanical basis of all ENMs for fluctuations in native proteins. Assuming that the protein is in equilibrium, the right-hand side of Eq. (6) may be evaluated if the energy of the system is known as a function of residue positions. For the case of pairwise potentials, the most general form of this relation is

$$E_{ij} = E_{ij}^0 f_{ij}\left( \frac{R_{ij}}{R_{ij}^0} \right), \tag{8}$$

where $R_{ij}$ is the distance between residues i and j and $f_{ij}$ is a dimensionless function. $E_{ij}^0$ is the reference interaction energy, and $R_{ij}^0$ is a reference length, both of which will be discussed in detail below.

A. The forces

The right-hand side of Eq. (3) can be expressed in terms of the independent variables T, P, and F as $-\Psi(T, P, F)/T$. Knowing this relationship leads to the following five equations:

$$\Psi = \Psi(T, P, F), \quad \Psi = U - TS + PV - FR, \quad S = -\frac{\partial \Psi}{\partial T}, \quad V = \frac{\partial \Psi}{\partial P}, \quad R = -\frac{\partial \Psi}{\partial F}, \tag{9}$$

where F and R are 3N dimensional, but here we represent them as scalars for clarity of the discussion. The four variables $\Psi$, T, P, and F may be eliminated among these five equations to yield U = U(S, V, R). The forces are then obtained from U according to the relation $F = -\partial U(S, V, R)/\partial R$. Considering pairwise potentials $E_{ij}$ and concentrating on the position variables only, i.e., neglecting S and V dependence, the forces may be written as

$$F_i = -\nabla_{R_i} \sum_j E_{ij}. \tag{10}$$

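The gradient $\partial E_{ij}/\partial R_j$ in Eq. (10) is obtained by the chain rule

$$\begin{aligned}
\nabla_{R_j} E_{ij} &= \left( \frac{\partial E_{ij}}{\partial R_{ij}^2} \right) \frac{\partial R_{ij}^2}{\partial R_j} \\
&= \left( \frac{\partial E_{ij}}{\partial R_{ij}^2} \right) \frac{\partial (R_i \cdot R_i - 2 R_i \cdot R_j + R_j \cdot R_j)}{\partial R_j} \\
&= 2 \left( \frac{\partial E_{ij}}{\partial R_{ij}^2} \right) (R_j - R_i).
\end{aligned} \tag{11}$$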

The term in the first parenthesis in the second line is the slope of $E_{ij}$ with respect to $R_{ij}^2$. For a given i and j, it is a scalar quantity whose value depends solely on the type of the energy function used. The vectorial property of the force comes from the term $R_j - R_i$ in the second parenthesis. Equations (10) and (11) may be arranged in matrix form as

$$F = \Phi^{(3N)} R, \tag{12}$$

where $\Phi^{(3N)}$ is the 3N×3N matrix defined as

$$\Phi_{ij} = \begin{cases} -2 \dfrac{\partial E_{ij}}{\partial R_{ij}^2}, & i \neq j \\[1ex] -\sum_{k \neq i} \Phi_{ik}, & i = j. \end{cases} \tag{13}$$
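As an illustrative check of the chain rule in Eq. (11) and of the matrix in Eq. (13), the following sketch compares the analytic gradient with a finite-difference estimate and assembles the N×N matrix for a small set of points. The pair energy `pair_energy`, its parameters `r0` and `e0`, and the four coordinates are hypothetical choices made only for this example; they are not taken from the paper.

```python
import numpy as np

def pair_energy(r2, r0=3.8, e0=1.0):
    """Hypothetical pair energy, written as a function of the squared distance r2."""
    return e0 * (np.sqrt(r2) / r0 - 1.0) ** 2

def dE_dr2(r2, r0=3.8, e0=1.0):
    """Analytic slope of pair_energy with respect to r2."""
    r = np.sqrt(r2)
    return e0 * (r / r0 - 1.0) / (r0 * r)

def grad_wrt_Rj(Ri, Rj):
    """Chain rule of Eq. (11): dE_ij/dR_j = 2 (dE_ij/dR_ij^2) (R_j - R_i)."""
    r2 = np.sum((Rj - Ri) ** 2)
    return 2.0 * dE_dr2(r2) * (Rj - Ri)

def numerical_grad_wrt_Rj(Ri, Rj, h=1e-6):
    """Central finite-difference gradient of the pair energy with respect to R_j."""
    g = np.zeros(3)
    for a in range(3):
        step = np.zeros(3)
        step[a] = h
        e_plus = pair_energy(np.sum((Rj + step - Ri) ** 2))
        e_minus = pair_energy(np.sum((Rj - step - Ri) ** 2))
        g[a] = (e_plus - e_minus) / (2.0 * h)
    return g

def phi_matrix(coords):
    """N x N matrix of Eq. (13): off-diagonal -2 dE/dR^2, diagonal minus the off-diagonal row sum."""
    n = len(coords)
    phi = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                r2 = np.sum((coords[j] - coords[i]) ** 2)
                phi[i, j] = -2.0 * dE_dr2(r2)
        phi[i, i] = -phi[i].sum()  # diagonal is still zero here, so this sums the off-diagonals
    return phi

# Hypothetical coordinates (arbitrary units) used only to exercise the functions.
coords = np.array([[0.0, 0.0, 0.0], [4.0, 0.5, 0.0], [6.5, 3.0, 1.0], [3.0, 5.0, 2.0]])
phi = phi_matrix(coords)
```

Because every row of `phi` sums to zero, applying it to a uniform displacement of all residues gives zero, which is the translational invariance expected of a pairwise energy.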

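Two different orderings of the $\Phi^{(3N)}$ matrix are used in the study of ENMs. We call them the block representation and the standard MD representation. For details see Appendix B. In the block representation described in Appendix B, Eq. (11) reads as

$$\begin{bmatrix} F_X \\ F_Y \\ F_Z \end{bmatrix} = \begin{bmatrix} \Phi_X^{(N)} & 0 & 0 \\ 0 & \Phi_Y^{(N)} & 0 \\ 0 & 0 & \Phi_Z^{(N)} \end{bmatrix} \begin{bmatrix} R_X \\ R_Y \\ R_Z \end{bmatrix}. \tag{14}$$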

Here, the subscripts denote the X, Y, or Z components, and $\Phi_X^{(N)}$, $\Phi_Y^{(N)}$, and $\Phi_Z^{(N)}$ are N×N. It is to be noted that irrespective of the form of the energy function, the three submatrices in Eq. (14) are identical, as can be seen from the definition of the derivative given in Eq. (11). The force and position vectors in Eq. (14) are thermodynamic quantities, i.e., average values, and at equilibrium, $F = \Phi^{(3N)} R_{eq} = 0$.

In Secs. III and IV, we will use the block representation. The order of the matrices, 3N×3N or N×N, will be self-evident and will not be shown explicitly unless needed for clarity.

B. The correlation matrix

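Correlations among the fluctuations of residues are given by Eq. (6), which requires the evaluation of the derivative $\partial F_i/\partial R_j$. Using Eq. (14), this derivative is written as

$$\frac{\partial F_i}{\partial R_k} = \Phi_{ij} \frac{\partial R_j}{\partial R_k} + \frac{\partial \Phi_{ij}}{\partial R_k} R_j \equiv \Phi_{ik} + \Delta\Phi_{ik} \equiv \Gamma_{ik}. \tag{15}$$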

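Here, $\Delta\Phi_{ik} = (\partial \Phi_{ij}/\partial R_k) R_j$, where $\partial \Phi_{ij}/\partial R_k$ is third order, and its inner product with the position vector $R_j$ gives a second order matrix that has the following block form:

$$\begin{bmatrix} \Delta\Phi_{XX} & \Delta\Phi_{XY} & \Delta\Phi_{XZ} \\ & \Delta\Phi_{YY} & \Delta\Phi_{YZ} \\ & & \Delta\Phi_{ZZ} \end{bmatrix}, \tag{16}$$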

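where the symmetric lower half is not shown. The block matrices are of dimensions N×N, with

$$\Delta\Phi_{XX} = \begin{cases} \dfrac{\partial \Phi_{ij}}{\partial R_{ij}^2} (X_j - X_i)^2 = -2 \dfrac{\partial^2 E_{ij}}{\partial (R_{ij}^2)^2} (X_j - X_i)^2, & i \neq j \\[1ex] -\sum_{k \neq j} \Delta\Phi_{jk}, & i = j, \end{cases} \tag{17}$$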

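where $X_i$ and $X_j$ are the X-components of the ith and jth residues, respectively. The terms for $\Delta\Phi_{YY}$ and $\Delta\Phi_{ZZ}$ are obtained similarly, with Y and Z replacing X, respectively. The first off-diagonal term $\Delta\Phi_{XY}$ is obtained as

$$\Delta\Phi_{XY} = \begin{cases} \dfrac{\partial \Phi_{ij}}{\partial R_{ij}^2} (X_j - X_i)(Y_j - Y_i) = -2 \dfrac{\partial^2 E_{ij}}{\partial (R_{ij}^2)^2} (X_j - X_i)(Y_j - Y_i), & i \neq j \\[1ex] -\sum_{k \neq j} \Delta\Phi_{jk}, & i = j. \end{cases} \tag{18}$$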

The terms for the other off-diagonal blocks are written similarly by replacing the variables in Eq. (18) accordingly. The components of the $\Delta\Phi$ matrix may thus be written in compact form as

$$\Delta\Phi_{\alpha\beta} = \begin{cases} -2 \dfrac{\partial^2 E_{ij}}{\partial (R_{ij}^2)^2} (\alpha_j - \alpha_i)(\beta_j - \beta_i), & i \neq j \\[1ex] -\sum_{k \neq j} \Delta\Phi_{jk}, & i = j, \end{cases} \tag{19}$$

where $\alpha$ and $\beta$ represent the coordinates X, Y, or Z at the given equilibrium or reference state.

The gradient term $\Delta\Phi$ given in Eq. (15) is obtained by first taking the first column of the $\Phi$ matrix, taking its gradient, which gives a vector, and then dotting this with R to obtain the first column of $\Delta\Phi$. Applying the same operation to the remaining columns of $\Phi$ leads to the 3N×3N $\Delta\Phi$ matrix. Rearranging the terms leads to Eq. (19). Substituting Eq. (15) in Eq. (6) leads to

$$\langle \Delta R_i \Delta R_j^T \rangle = kT \left[ (\Phi + \Delta\Phi)^{-1} \right]_{ij} = kT (\Gamma^{-1})_{ij}. \tag{20}$$

Equation (20) is the fundamental relation expressing the correlations in terms of the inverse of the $\Gamma$ matrix.


C. Comments on the energy function and the applicability of Equation (6) in general

In its most general form, the pairwise additive energy of the system in the coarse-grained approximation from Eq. (8) is

$$U = \frac{1}{2} \sum_{i,j} E_{ij}^0 f_{ij}\left( \frac{R_{ij}}{R_{ij}^0} \right). \tag{21}$$

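Here, $E_{ij}^0$ and $R_{ij}^0$ are the energy and distance parameters that characterize the native state such that $dU = 0$ and $d^2U > 0$. The first condition gives

$$dU = \sum_i \nabla_{R_i} U \cdot dR_i = \frac{1}{2} \sum_i \left[ \nabla_{R_i} \sum_j E_{ij}^0 f_{ij}\left( \frac{R_{ij}}{R_{ij}^0} \right) \right] \cdot dR_i = 0. \tag{22}$$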

There is a special functional form of $f_{ij}(R_{ij}/R_{ij}^0)$ in Eq. (22) where the reference distance $R_{ij}^0$ is chosen equal to the equilibrium distance $R_{ij}^{eq}$ between i and j in the native state in the presence of all other residues. In this case, at equilibrium the gradient in the square brackets will equate to zero independently for each term in the summation. We call this form of the energy function "the standard form" because at equilibrium, the term in the square brackets is a minimum for all i and j. This advantage of the standard form rests on the a priori knowledge of the equilibrium state. However, the standard form may not always be readily available for proteins at equilibrium in an aqueous environment because the coordinates of the native protein are usually obtained from crystal structure x-ray data, which may differ significantly from those at equilibrium in water. In this case, MD simulations of the protein in water are needed to establish $R_{ij}^{eq}$ and $E_{ij}^{eq}$ for recovering the standard form of the energy function. An example of the standard form is given in Sec. III on the harmonic approximation. It is to be noted here that when the energy function is given in the standard form, $\Phi_{ij} = 0$ and only the $\Delta\Phi$ matrix contributes to the correlation of fluctuations.

The $\Gamma$ matrix, the inverse of which gives the second order correlations, has first and second derivatives of the energy with respect to position, which corresponds to information on the local structure of the energy function only. Higher order correlations that can be obtained by Eq. (7) require higher order derivatives of the pair potential. This equation contains the (n−1)st derivative of the $\Gamma$ matrix, or the nth derivative of the pair potential, when it is used to obtain the nth order correlation. Characterization of the full statistical features of fluctuations with the proposed model requires the knowledge of all order derivatives of the pair potential, which is equivalent to the knowledge of the full probability distribution function. The second order correlations $\langle \Delta R_i \Delta R_j^T \rangle$ containing the effects of anharmonicities can only be evaluated when $\Delta R_i \Delta R_j^T$ is averaged using the full probability function. The full probability density function is presented below in the form of a tensorial Hermite polynomial, the coefficients of which are given by the general theory presented here. However, it is also possible, by using a perturbation scheme, to evaluate the second order correlation matrix without resort to the full distribution function. In the remaining part of this section, we show that the $\Gamma$ matrix can consistently be renormalized to include the effects of anharmonicities of the pair potentials.

The pair potentials, and therefore the full energy of the system, do not need to possess a unique minimum. Indeed, amino acid pair potentials with multiple minima are common.20 The only requirement for Eq. (6) to be valid is that the fluctuations should not move the system too far away from equilibrium. If the latter is the case, then the variables U, V, and R will not be sufficient to describe the behavior of the system, and additional independent variables will be necessary. Several factors contribute to the deviation of a given protein from the minimum energy configuration. The manifold of internal constraints that the protein is subject to around the native state sets the mechanisms by which a protein may move from one stable energy state to another during its fluctuations. Existence of a pair of neighboring residues in one or the other pair-energy minima is of this type. Given the state of the protein, Eq. (6) then gives the correlations. The chain relations given in Eq. (11) characterize the energy surface at the given conformation of the protein, but also lead to the question of whether the elements of the matrix $\Gamma$ consist of only the local slope and curvatures of pair potentials [see Eqs. (13) and (19)]. If this were the case, then the general result given in Eq. (6) would not reflect the effects of anharmonicities on second order correlations. As stated above, introduction of anharmonicities into the second order correlations needs the construction of the full probability distribution function, Eq. (2). As an alternative route, in the remainder of this section, we show, by a perturbation scheme, that any feature of the pair potential may be incorporated into the second order correlation matrix $\langle \Delta R_i \Delta R_j^T \rangle$.

For illustrative purposes, let us assume that the potential between every pair of residues is strictly harmonic, except one pair, say residues m and n, for which the potential differs from $E_{mn}$ by a small amount $\Delta E_{mn}(\varepsilon)$, where $\varepsilon$ is a measure of deviation from harmonicity. This will change the elements mm, nn, mn, and nm of the $\Gamma$ matrix, which will take the following form:

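$$\Gamma(\varepsilon) = \Gamma(0) + \varepsilon V, \tag{23}$$

where we denote the unperturbed state by zero. $\varepsilon$ is small and V is the matrix all of whose elements are zero except mm, nn, mn, and nm, such that21

$$V = \frac{\partial \Gamma(\varepsilon)}{\partial \varepsilon}. \tag{24}$$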

The eigenvalues and eigenvectors of $\Gamma(0)$ are $\lambda_i(0)$ and $u_i(0)$. The perturbed eigenvalues and eigenvectors are as follows:

$$\lambda_k(\varepsilon) = \lambda_k(0) + \varepsilon\, u_k(0)^T V u_k(0) + \varepsilon^2 \sum_{l \neq k} \frac{u_l(0)^T V u_k(0)}{\lambda_k(0) - \lambda_l(0)}, \tag{25}$$

$$u_k(\varepsilon) = u_k(0) + \varepsilon \sum_{l \neq k} \frac{u_l(0)^T V u_k(0)}{\lambda_k(0) - \lambda_l(0)}\, u_l(0). \tag{26}$$

The perturbed correlation matrix will be written as

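$$\langle \Delta R_i(\varepsilon) \Delta R_j(\varepsilon) \rangle = \frac{3}{2\gamma} \sum_k \frac{1}{\lambda_k(\varepsilon)} [u_k(\varepsilon)]_i [u_k(\varepsilon)]_j. \tag{27}$$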

Substituting for the perturbed eigenvalues and eigenvectors

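$$\langle \Delta R_i(\varepsilon) \Delta R_j(\varepsilon) \rangle = \frac{3}{2\gamma} \sum_k \frac{1}{\lambda_k(0) + \varepsilon\, u_k(0)^T V u_k(0) + \varepsilon^2 \sum_{l \neq k} \dfrac{u_l(0)^T V u_k(0)}{\lambda_k(0) - \lambda_l(0)}} \cdot \left\{ [u_k(0)]_i + \varepsilon \sum_{l \neq k} \frac{u_l(0)^T V u_k(0)}{\lambda_k(0) - \lambda_l(0)} [u_l(0)]_i \right\} \left\{ [u_k(0)]_j + \varepsilon \sum_{m \neq k} \frac{u_m(0)^T V u_k(0)}{\lambda_k(0) - \lambda_m(0)} [u_m(0)]_j \right\}. \tag{28}$$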

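Keeping only the first order terms in $\varepsilon$ leads to

$$\langle \Delta R_i(\varepsilon) \Delta R_j(\varepsilon) \rangle = \langle \Delta R_i(0) \Delta R_j(0) \rangle + \frac{3\varepsilon}{2\gamma} \sum_k \frac{1}{\lambda_k(0)} \left\{ 2 \sum_{l \neq k} \frac{u_l(0)^T V u_k(0)}{\lambda_k(0) - \lambda_l(0)} [u_l(0)]_i [u_k(0)]_j - \frac{1}{\lambda_k(0)} u_k(0)^T V u_k(0) [u_k(0)]_i [u_k(0)]_j \right\}. \tag{29}$$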

Thus, the anharmonicity introduced to a pair potential propagates to the full second order correlation matrix. One can iterate this process by replacing the unperturbed $\Gamma(0)$, $\lambda(0)$, and $u(0)$ by $\Gamma(\varepsilon)$, $\lambda(\varepsilon)$, and $u(\varepsilon)$ and repeating the above analysis. This introduces the effects of anharmonicities to the $\Gamma$ matrix, including those resulting from multiple pair-energy minima. Of course, effects of anharmonicities can be introduced in this way not only to the second order correlation function, but to all higher order moments of the fluctuation vectors using Eq. (7). The higher order moments obtained in this manner may then be used to characterize the probability distribution function $f(\hat{U}, \hat{V}, \hat{R})$ in terms of moment-based Hermite polynomials, which we present following the discussion on the harmonic approximation.
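The first-order part of this perturbation scheme can be illustrated numerically. The sketch below assumes a small positive-definite matrix `gamma0` (a hypothetical chain with anchored ends, so that there is no zero mode) and a perturbation V that is nonzero only at the mm, nn, mn, and nm elements. It compares the first-order eigenvalue shift $u_k(0)^T V u_k(0)$, and the corresponding first-order change of the inverse matrix, with the exact results. All names and numerical values are assumptions made for the illustration.

```python
import numpy as np

def perturbation_matrix(n, m, p):
    """Matrix V that is zero except at the mm, pp, mp, and pm elements (unit change of one spring)."""
    v = np.zeros((n, n))
    v[m, m] = v[p, p] = 1.0
    v[m, p] = v[p, m] = -1.0
    return v

def first_order_eigenvalues(gamma0, v, eps):
    """lambda_k(eps) ~ lambda_k(0) + eps * u_k(0)^T V u_k(0)."""
    lam0, u0 = np.linalg.eigh(gamma0)
    shift = np.einsum("ik,ij,jk->k", u0, v, u0)  # u_k^T V u_k for every k
    return lam0 + eps * shift

def first_order_inverse(gamma0, v, eps):
    """(Gamma0 + eps V)^-1 ~ Gamma0^-1 - eps * Gamma0^-1 V Gamma0^-1."""
    g0_inv = np.linalg.inv(gamma0)
    return g0_inv - eps * g0_inv @ v @ g0_inv

# Hypothetical unperturbed matrix: tridiagonal, positive definite, nondegenerate eigenvalues.
gamma0 = np.array([[2.0, -1.0, 0.0, 0.0],
                   [-1.0, 2.0, -1.0, 0.0],
                   [0.0, -1.0, 2.0, -1.0],
                   [0.0, 0.0, -1.0, 2.0]])
v = perturbation_matrix(4, 1, 2)
eps = 1e-4

exact_lam = np.linalg.eigvalsh(gamma0 + eps * v)
approx_lam = first_order_eigenvalues(gamma0, v, eps)
exact_inv = np.linalg.inv(gamma0 + eps * v)
approx_inv = first_order_inverse(gamma0, v, eps)
```

The residual errors in both comparisons scale as $\varepsilon^2$, which is why the process has to be iterated, as described above, when the departure from harmonicity is not small.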

III. THE HARMONIC APPROXIMATION

The harmonic approximation is based on expanding the energy function into a Taylor series and keeping the quadratic terms. Then, using the differentiations indicated in Eqs. (15), (17), and (18), the matrix $\Gamma$ is obtained, the inverse of which gives the correlation matrix. This approach, which is called the ANM, was introduced by Hinsen7 and applied to proteins by several authors. The model is based on the expansion of the standard form of the energy expression

$$E = \frac{1}{2} \sum_{i,j} E_{ij}(R_{ij}^{eq}) (R_{ij} - R_{ij}^{eq})^2. \tag{30}$$

Since this expression is in standard form, the $\Phi$ matrix vanishes when $R_{ij} = R_{ij}^{eq}$ and the front factor $\partial^2 E_{ij}/\partial (R_{ij}^2)^2$ in the $\Delta\Phi$ matrix in Eq. (19) becomes $E_{ij}(R_{ij}^{eq})/(R_{ij}^{eq})^2$. It is to be noted, however, that when $R_{ij}^0$ is not chosen as $R_{ij}^{eq}$, the $\Phi$ matrix will be nonzero.

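When the term in parentheses in Eq. (30), $\left( \Delta R_{ij}^2 + 2 R_{ij}^{eq} \cdot \Delta R_{ij} + (R_{ij}^{eq})^2 \right)^{1/2} - R_{ij}^{eq}$, is expanded into a Taylor series, and the leading term is taken for infinitesimal fluctuations, the following expression is obtained:

$$E = \frac{1}{2} \sum_{i,j} \frac{\partial^2 E_{ij}}{\partial (R_{ij}^2)^2} \left[ (\Delta R_i - \Delta R_j) \cdot u_{ij} \right]^2, \tag{31}$$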

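where $\Delta R_{ij} = \Delta R_j - \Delta R_i$ and $u_{ij} = (R_i^{eq} - R_j^{eq})/R_{ij}^{eq}$ is the unit vector along $R_i^{eq} - R_j^{eq}$. Equation (31) may now be expressed in matrix form as

$$E = \tfrac{1}{2} \Delta R^T \Gamma \Delta R, \tag{32}$$

where

$$\Gamma_{ij} = \begin{cases} -\dfrac{\partial^2 E_{ij}}{\partial (R_{ij}^2)^2} \cos^2 \theta_{ij}, & i \neq j \\[1ex] -\sum_k \Gamma_{ik}, & i = j \neq k. \end{cases} \tag{33}$$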

Here, $\theta_{ij}$ is the angle between $\Delta R_i - \Delta R_j$ and $R_i^{eq} - R_j^{eq}$.

The energy expression given by Eq. (32), together with the definition of the $\Gamma$ matrix given by Eq. (33), is the basis of the ENMs, which hold for infinitesimally small fluctuations. Formulations of the ENMs based on Eq. (32) are outlined in several papers in the recent book of Cui and Bahar.10

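In the GNM, the matrix $\Gamma$ is assumed to be of the following form:

$$\Gamma_{ij} = \begin{cases} -\gamma, & i \neq j \text{ and } R_{ij} \leq r_{\text{cutoff}} \\ 0, & i \neq j \text{ and } R_{ij} > r_{\text{cutoff}} \\ -\sum_k \Gamma_{ik}, & i = j \neq k. \end{cases} \tag{34}$$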

Here, $R_{ij}$ is the distance between the ith and jth Cα's that are within an interaction distance of $r_{\text{cutoff}}$, and $\gamma$ is the force constant representing this interaction. Residues separated by a distance larger than $r_{\text{cutoff}}$ are assumed not to interact. The $\Gamma$ matrix, or the $\Phi$ matrix by Eq. (20) when $\Delta\Phi = 0$, is N×N, and is identical for the X, Y, and Z components.
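A minimal sketch of how Eq. (34) and Eq. (20) can be evaluated together is given below. It assumes a hypothetical helix-like trace of ten Cα positions, a cutoff of 7 Å, and unit values for $\gamma$ and kT; none of these numbers come from the paper. Because the rows of $\Gamma$ sum to zero, the matrix has one zero eigenvalue (rigid translation), so the inverse is taken as a pseudo-inverse over the nonzero modes.

```python
import numpy as np

def gnm_gamma(coords, r_cutoff=7.0, gamma=1.0):
    """Gamma of Eq. (34): -gamma for pairs within r_cutoff, diagonal = minus the off-diagonal row sum."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    g = np.where(d <= r_cutoff, -gamma, 0.0)
    np.fill_diagonal(g, 0.0)
    np.fill_diagonal(g, -g.sum(axis=1))
    return g

def gnm_correlations(g, kT=1.0):
    """kT times the pseudo-inverse of Gamma, built from the modes with nonzero eigenvalue."""
    lam, u = np.linalg.eigh(g)
    keep = lam > 1e-8 * lam.max()  # drop the zero (translation) mode
    return kT * (u[:, keep] / lam[keep]) @ u[:, keep].T

# Hypothetical helix-like C-alpha trace (angstrom units assumed), about 3.8 A between neighbors.
t = np.arange(10)
coords = np.column_stack([2.3 * np.cos(1.745 * t), 2.3 * np.sin(1.745 * t), 1.5 * t])

g = gnm_gamma(coords)
corr = gnm_correlations(g)
msf = np.diag(corr)  # mean-square fluctuation of each residue, per coordinate component
```

The diagonal of `corr` gives the mean-square fluctuation of each residue, and the off-diagonal elements give the cross-correlations between residue pairs.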

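The harmonic approximation may also be obtained, equivalently, by using the Gaussian distribution $W(\Delta R)$ of fluctuations

$$W(\Delta R) = (2\pi)^{-3N/2} \left[ \det (\Gamma/kT)^{-1} \right]^{-1/2} \exp\left\{ -\tfrac{1}{2} \Delta R^T (\Gamma/kT) \Delta R \right\}. \tag{35}$$

Here, $\Delta R$ is the 3N dimensional fluctuation vector, and $\Gamma$ is the spring constant matrix. Multiplying Eq. (35) with $\Delta R \Delta R^T$ and integrating over all possible states of fluctuations leads to

$$\langle \Delta R \Delta R^T \rangle \equiv \int \Delta R \Delta R^T W(\Delta R)\, d\{\Delta R\} = (2\pi)^{-3N/2} \left[ \det (\Gamma/kT)^{-1} \right]^{-1/2} \int \Delta R \Delta R^T \exp\left\{ -\tfrac{1}{2} \Delta R^T (\Gamma/kT) \Delta R \right\} d\{\Delta R\}, \tag{36}$$

where

$$d\{\Delta R\} = d\Delta X_1\, d\Delta X_2 \cdots d\Delta X_N\, d\Delta Y_1\, d\Delta Y_2 \cdots d\Delta Y_N\, d\Delta Z_1\, d\Delta Z_2 \cdots d\Delta Z_N. \tag{37}$$

Carrying out the integration leads to

$$\langle \Delta R \Delta R^T \rangle = kT\, \Gamma^{-1}, \tag{38}$$

where the elements of the spring constant matrix $\Gamma$ are chosen such that Eq. (38) is consistent with Eq. (20). This result, which forms the basis of the GNM, was first shown for Gaussian networks by Kloczkowski et al.6 In that work, a factor of 3/2 on the right-hand side was present due to the different constants of proportionality adopted in the definition of the elements of the $\Gamma$ matrix.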
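The relation in Eq. (38) can be checked by sampling. The sketch below is one possible illustration, not the procedure used in the paper: it runs many independent Metropolis walkers on the harmonic energy of Eq. (32) with the Boltzmann weight exp(−E/kT), and compares the sampled second moments with $kT\,\Gamma^{-1}$. The 3×3 positive-definite `gamma`, the value of kT, the step size, and the walker counts are all assumed values.

```python
import numpy as np

def energy(x, gamma):
    """Harmonic energy (1/2) x^T Gamma x, evaluated for each row of x."""
    return 0.5 * np.einsum("ni,ij,nj->n", x, gamma, x)

def metropolis_final_states(gamma, kT, n_walkers=20000, n_steps=300, step=0.5, seed=0):
    """Run independent Metropolis walkers and return their final positions."""
    rng = np.random.default_rng(seed)
    x = np.zeros((n_walkers, len(gamma)))
    e = energy(x, gamma)
    for _ in range(n_steps):
        trial = x + step * rng.standard_normal(x.shape)
        e_trial = energy(trial, gamma)
        # Accept with probability min(1, exp(-(E_trial - E)/kT)).
        accept = rng.random(n_walkers) < np.exp(-(e_trial - e) / kT)
        x[accept] = trial[accept]
        e[accept] = e_trial[accept]
    return x

# Hypothetical positive-definite spring matrix (zero modes assumed to be removed already).
gamma = np.array([[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]])
kT = 0.6

x = metropolis_final_states(gamma, kT)
sampled = x.T @ x / len(x)           # estimate of <dR dR^T>
expected = kT * np.linalg.inv(gamma) # right-hand side of Eq. (38)
```

With these settings the sampled and expected matrices agree to within the statistical noise of the finite sample.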

In this section, we rederived the equations of the harmonic ENMs starting from the general statistical thermodynamics formalism. A statistical thermodynamics rendition of the ENMs has not been elaborated in previous studies. This general approach has several advantages over previous mechanistic approaches, as may be apparent in the two examples below.

A. The contribution of harmonic fluctuations to heat capacity

The heat capacity of a native protein can be obtained from Eq. (7) by letting $\varphi = \Delta\xi_k = \Delta E$ and $\zeta_k = 1/T$. With these substitutions, Eq. (7) takes the form

$$\langle (\Delta E)^2 \rangle = -\frac{\partial U}{\partial \beta} = kT^2 C_v, \tag{39}$$

where $\beta = 1/kT$. The contribution from the fluctuations of residues of a protein of N residues comes from the mean energy11

$$U = \left\langle \tfrac{1}{2} \Delta R^T \Gamma \Delta R \right\rangle = \tfrac{3}{2} (N - 1) kT. \tag{40}$$

Differentiating U with respect to $\beta$ leads to the heat capacity

$$C_v = \tfrac{3}{2} (N - 1) k. \tag{41}$$

The term N−1 rather than N appears in Eq. (41) because one degree of freedom is suppressed against translation in each coordinate direction.22 The present statistical thermodynamics model from which the fluctuations are derived is that of a solid where the Cα's fluctuate around their equilibrium positions. In this case, one would expect $C_v = 3(N-1)k$ rather than Eq. (41). This result is obtained because only the potential energy associated with fluctuations is considered in the derivation. The vibrational component, which is not included here, contributes another $\tfrac{3}{2}(N-1)k$, resulting in the heat capacity of a monatomic solid. This is the high temperature limit where all modes of motion are excited. Recently, Yuan et al.23 obtained the harmonic contributions to the heat capacity of native proteins using the GNM and associating each mode i with a frequency $\nu_i \propto \sqrt{\lambda_i}$ and using the Einstein relation

$$E = \sum_{i=2}^{N} \frac{h\nu_i}{\exp(h\nu_i/kT) - 1}$$

for the energy. In this sense, the vibrational modes of the protein are assumed not fully excited at finite temperatures.
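The equipartition result of Eqs. (40) and (41) can be reproduced directly from the correlation matrix, as sketched below. The example assumes a hypothetical linear chain of N residues with nearest-neighbor springs and uses units in which k = 1; the mean energy is computed as $\tfrac{1}{2}\,\mathrm{tr}(\Gamma \langle \Delta R \Delta R^T \rangle)$ per coordinate direction, with the zero mode excluded.

```python
import numpy as np

def chain_gamma(n, gamma=1.0):
    """Spring matrix of a hypothetical linear chain of n residues with nearest-neighbor springs."""
    g = np.zeros((n, n))
    for i in range(n - 1):
        g[i, i] += gamma
        g[i + 1, i + 1] += gamma
        g[i, i + 1] -= gamma
        g[i + 1, i] -= gamma
    return g

def mean_potential_energy(g, kT):
    """U = <(1/2) dR^T Gamma dR>, summed over X, Y, Z, with <dR dR^T> = kT * pinv(Gamma)."""
    lam, u = np.linalg.eigh(g)
    keep = lam > 1e-8 * lam.max()  # exclude the rigid translation mode
    corr = kT * (u[:, keep] / lam[keep]) @ u[:, keep].T
    return 3 * 0.5 * np.trace(g @ corr)

def heat_capacity(g, kT, k=1.0, dT=1e-4):
    """C_v = dU/dT by central finite difference, with kT = k * T."""
    T = kT / k
    u_plus = mean_potential_energy(g, k * (T + dT))
    u_minus = mean_potential_energy(g, k * (T - dT))
    return (u_plus - u_minus) / (2.0 * dT)

n_res = 12
kT = 0.6
g = chain_gamma(n_res)
U = mean_potential_energy(g, kT)   # compare with (3/2)(N - 1) kT
Cv = heat_capacity(g, kT)          # compare with (3/2)(N - 1) k
```

The result depends only on the number of nonzero modes, N−1 per direction, and not on the connectivity or on the value of the spring constant.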

B. Coupling between energy fluctuations and the fluctuations of residue positions

In the statistical mechanics model presented here, the protein exchanges energy with its surroundings. This exchange is the source of $\langle (\Delta E)^2 \rangle$ given in Eq. (39). It is also the driving potential for the fluctuations of residue positions. In this section, we discuss how the fluctuations in energy are correlated with the fluctuations of residue positions in a concerted way, as a consequence of which the protein performs its function.

Using Eq. (7) for correlating $\Delta U$, $\Delta R_i$, and $\Delta R_j$, we obtain

$$\langle \Delta U \Delta R_i \Delta R_j^T \rangle = (kT)^2 \left( \frac{\partial^2 U}{\partial F_j\, \partial F_k} \right). \tag{42}$$

Performing the differentiation shown in Eq. (42) and using the relations $(\partial/\partial F_j)(\partial U/\partial F_k) = (\partial R_k/\partial F_j) = \Gamma^{-1}$ leads to the expression

$$\langle \Delta U \Delta R_i \Delta R_j^T \rangle = (kT)^2 (\Gamma^{-1})_{ij} = kT \langle \Delta R_i \Delta R_j^T \rangle. \tag{43}$$

Thus, fluctuations of energy are distributed to the residues in proportion to the correlations of fluctuations. The diagonal elements of $\langle \Delta R_i \Delta R_j^T \rangle$ are positive by definition. Therefore, the average $\langle \Delta U (\Delta R_i)^2 \rangle$ has to be positive for each i, if Eq. (43) is to hold. In order for the average $\langle \Delta U (\Delta R_i)^2 \rangle$ to be positive, a positive value of $\Delta U$ must couple with large values of $(\Delta R_i)^2$ and a negative $\Delta U$ must couple with small values of $(\Delta R_i)^2$. For the off-diagonal terms, the same pattern holds. If $\langle \Delta R_i \Delta R_j^T \rangle > 0$, then positive energy fluctuations pick up the large positive $\Delta R_i \Delta R_j^T$'s. Conversely, if $\langle \Delta R_i \Delta R_j^T \rangle < 0$, then positive energy fluctuations pick up the large negative $\Delta R_i \Delta R_j^T$'s. The exchange of energy of a protein with its surroundings is expected to have a major role in protein-ligand binding. Recent work12,24 shows that the highest modes of $\langle \Delta U (\Delta R_i)^2 \rangle$ locate the binding sites of ligands on proteins.
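The proportionality in Eq. (43) can be illustrated for the harmonic model by direct sampling. The sketch below assumes that $\Delta U$ is the fluctuation of the harmonic potential energy of Eq. (32) only, draws Gaussian fluctuation vectors with covariance $kT\,\Gamma^{-1}$ as in Eq. (38), and estimates both sides of Eq. (43). The matrix `gamma`, the value of kT, and the sample size are hypothetical.

```python
import numpy as np

def energy_position_coupling(gamma, kT, n_samples=400_000, seed=1):
    """Estimate <dU dR_i dR_j> and <dR_i dR_j> from Gaussian samples with covariance kT * Gamma^-1."""
    rng = np.random.default_rng(seed)
    cov = kT * np.linalg.inv(gamma)
    x = rng.multivariate_normal(np.zeros(len(gamma)), cov, size=n_samples)
    u = 0.5 * np.einsum("ni,ij,nj->n", x, gamma, x)  # harmonic energy of each sample
    du = u - u.mean()                                # energy fluctuation about its mean
    coupling = np.einsum("n,ni,nj->ij", du, x, x) / n_samples
    correlation = x.T @ x / n_samples
    return coupling, correlation

# Hypothetical positive-definite spring matrix (zero modes assumed removed).
gamma = np.array([[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]])
kT = 0.6

coupling, correlation = energy_position_coupling(gamma, kT)
# Eq. (43) predicts coupling ~ kT * correlation, element by element.
```

Within sampling noise, each element of `coupling` equals kT times the corresponding element of `correlation`, so residue pairs with larger correlations receive a proportionally larger share of the energy fluctuations.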

IV. DEVIATIONS FROM THE HARMONIC POTENTIAL

When the energy function is expressed in standard form, the thermodynamic model proposed leads to fluctuations that depend on the curvature of the energy surface, as can be seen from Eq. (19). However, proteins in general exhibit large scale fluctuations, and the dynamics is strongly dependent on the anharmonicity of the energy landscape [25]. The fact that native proteins exhibit large fluctuations about the equilibrium configuration necessitates an improvement in the model that accounts for departures from harmonicity. In Secs. I–III we introduced an iterative perturbation scheme to characterize the effects of anharmonicities in the $\Gamma$ matrix. In this section, we further elaborate on this problem.

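We approximate the probability function $f(U, V, R)$ in the presence of large scale fluctuations by a tensorial Hermite series [26]

$$f(\Delta R) = (2\pi)^{-3N/2}\,[\det\langle \Delta R\,\Delta R^T\rangle]^{-1/2}\exp\!\left\{-\tfrac{1}{2}\,\Delta R^T\langle \Delta R\,\Delta R^T\rangle^{-1}\Delta R\right\}\cdot\left[1+\sum_{\nu=3}^{\infty}(\nu!)^{-1}\,\langle H_\nu\rangle\cdot H_\nu\!\left(\langle \Delta R\,\Delta R^T\rangle^{-1/2}\Delta R\right)\right]. \tag{44}$$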

On the left-hand side, we dropped the arguments $U$ and $V$, and used $\Delta R$ instead of $R$. The leading term of the distribution function is the Gaussian as given by Eq. (35). Nonlinear terms are introduced as corrections in terms of the Hermite polynomials. These correction terms become unimportant as the fluctuations become small, and/or the system approaches a harmonic one. The first few polynomials, $H_\nu$, are as follows:

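$$\begin{aligned}
H_1(\Delta R) &= \Delta R_i,\\
H_2(\Delta R) &= \Delta R_i\,\Delta R_j - \delta_{ij},\\
H_3(\Delta R) &= \Delta R_i\,\Delta R_j\,\Delta R_k - (\Delta R\,\delta)_{ijk},\\
H_4(\Delta R) &= (\Delta R^4 - \Delta R^2\delta + \delta^2)_{ijkl},\\
H_5(\Delta R) &= (\Delta R^5 - \Delta R^3\delta + \Delta R\,\delta^2)_{ijklm},\\
H_6(\Delta R) &= (\Delta R^6 - \Delta R^4\delta + \Delta R^2\delta^2 - \delta^3)_{ijklmn},
\end{aligned} \tag{45}$$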

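where $\delta_{ij}$ is the Kronecker delta, and $(\Delta R\,\delta)_{ijk}$ in the expression for $H_3$ is a short-hand notation for $\Delta R_i\,\delta_{jk} + \Delta R_j\,\delta_{ik} + \Delta R_k\,\delta_{ji}$, with similar expressions for the remaining terms in Eq. (45). For example,

$$(\Delta R^2\delta)_{ijkl} = \Delta R_i\Delta R_j\,\delta_{kl} + \Delta R_i\Delta R_k\,\delta_{jl} + \Delta R_i\Delta R_l\,\delta_{jk} + \Delta R_j\Delta R_k\,\delta_{il} + \Delta R_j\Delta R_l\,\delta_{ik} + \Delta R_k\Delta R_l\,\delta_{ij}.$$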

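The third term in the series represents the first deviation from the harmonic potential and contains the average $\langle \Delta R_i\,\Delta R_j\,\Delta R_k\rangle$. According to the present model, this average is given, by the application of Eq. (7), as [18]

$$\langle \Delta R_i\,\Delta R_j\,\Delta R_k\rangle = (kT)^2\,\frac{\partial^2 R_i}{\partial F_j\,\partial F_k} = (kT)^2\sum_m\left[\frac{\partial}{\partial R_m}(\Gamma^{-1})_{ik}\right](\Gamma^{-1})_{mj}, \tag{46}$$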

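where the second line is obtained by using Eq. (20) for the force relation, as

$$\frac{\partial^2 R_i}{\partial F_j\,\partial F_k} = \frac{\partial}{\partial F_j}\left(\frac{\partial R_i}{\partial F_k}\right) = \frac{\partial}{\partial F_j}(\Gamma^{-1})_{ik} = \sum_m \frac{\partial(\Gamma^{-1})_{ik}}{\partial R_m}\,\frac{\partial R_m}{\partial F_j} = \sum_m\left[\frac{\partial}{\partial R_m}(\Gamma^{-1})_{ik}\right](\Gamma^{-1})_{mj}.$$

The second line of Eq. (46) contains the derivative of $\Gamma^{-1}$, which can be carried out if the energy function is known.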
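The first line of Eq. (46) can be illustrated in one dimension. The sketch below assumes a single coordinate $x$ in a hypothetical anharmonic potential with an applied force entering as $-Fx$; the coefficients `k2`, `k3`, `k4`, the grid, and the function name are illustrative choices, not values from the paper. Boltzmann averages are computed by direct summation on a grid and the force derivatives by central finite differences.

```python
import numpy as np

def moments_under_force(force, kT=1.0, k2=1.0, k3=-0.3, k4=0.1):
    """Mean, variance and third central moment of x for
    U(x) = k2 x^2/2 + k3 x^3/3 + k4 x^4/4 - force * x."""
    x = np.linspace(-12.0, 12.0, 48001)
    u = 0.5 * k2 * x**2 + k3 * x**3 / 3.0 + 0.25 * k4 * x**4 - force * x
    w = np.exp(-(u - u.min()) / kT)      # Boltzmann weights (shifted for stability)
    w /= w.sum()
    mean = np.sum(w * x)
    var = np.sum(w * (x - mean) ** 2)
    third = np.sum(w * (x - mean) ** 3)
    return mean, var, third

kT, h = 1.0, 1e-3
m_minus, _, _ = moments_under_force(-h, kT)
m_zero, var, third = moments_under_force(0.0, kT)
m_plus, _, _ = moments_under_force(+h, kT)

d1 = (m_plus - m_minus) / (2.0 * h)               # d<x>/dF
d2 = (m_plus - 2.0 * m_zero + m_minus) / h**2     # d^2<x>/dF^2

print(var, kT * d1)          # second moment vs. kT d<x>/dF
print(third, kT**2 * d2)     # third moment vs. (kT)^2 d^2<x>/dF^2
```

Each printed pair should agree to within the finite-difference error. A nonzero third moment appears only because the cubic term makes the potential asymmetric.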

As an alternative to Eq. (46), higher order moments, of course, can be evaluated from MD trajectories. In this case, long trajectories are needed for the molecule to populate all the accessible states [25]. The example worked out in Sec. IV of this paper derives the averages from MD trajectories.

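Equation (44) may take a simpler form if it is presented in terms of the transformed fluctuations $\Delta r$, which are related to $\Delta R$ by the transformation

$$\Delta r = \langle \Delta R\,\Delta R^T\rangle^{-1/2}\,\Delta R. \tag{47}$$

With this transformation, the correlation matrix $\langle \Delta r\,\Delta r^T\rangle$ is written as

$$\langle \Delta r\,\Delta r^T\rangle = \langle \Delta R\,\Delta R^T\rangle^{-1/2}\,\langle \Delta R\,\Delta R^T\rangle\,\left(\langle \Delta R\,\Delta R^T\rangle^{-1/2}\right)^T \equiv E, \tag{48}$$

where $E$ is the identity matrix, and the last equality follows because the inverse square root of a symmetric matrix is symmetric. We let $V$ represent the eigenvector matrix that diagonalizes $\langle \Delta R\,\Delta R^T\rangle$, and $\lambda$ represent the eigenvalues. Then, for Eq. (48) to be the unit matrix, we must have

$$\langle \Delta R\,\Delta R^T\rangle^{-1/2} = \mathrm{diag}\,\lambda^{-1/2}\,V^T. \tag{49}$$

With these equalities, we see that the fluctuations $\Delta r$ are the fluctuations in the mode space spanned by the eigenvectors, $V$ [11].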
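The transformation of Eqs. (47)–(49) can be sketched in a few lines. This is an illustrative implementation, assuming the fluctuations are stored as an array with one frame per row and that every eigenvalue of the covariance is nonzero; the synthetic data and the function name are hypothetical.

```python
import numpy as np

def to_modal_coordinates(fluct):
    """fluct: (n_frames, n_dof) array of dR with the mean already removed.
    Returns modal coordinates dr, eigenvalues, and eigenvectors."""
    cov = fluct.T @ fluct / fluct.shape[0]              # <dR dR^T>
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]                   # largest-variance mode first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    whitening = np.diag(eigvals ** -0.5) @ eigvecs.T    # Eq. (49)
    dr = fluct @ whitening.T                            # Eq. (47), frame by frame
    return dr, eigvals, eigvecs

# Synthetic correlated data standing in for a trajectory.
rng = np.random.default_rng(0)
raw = rng.normal(size=(5000, 4)) @ rng.normal(size=(4, 4))
fluct = raw - raw.mean(axis=0)

dr, lam, vecs = to_modal_coordinates(fluct)
print(np.allclose(dr.T @ dr / len(dr), np.eye(4)))      # Eq. (48)
```

For a real trajectory from which rigid-body motion has been removed, the zero eigenvalues would have to be discarded before taking the inverse square root.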

The linear transformation given by Eq. (47) is the Karhunen-Loève or principal component analysis widely used in the analysis of MD trajectories [14,27,28]. Equation (44) may now be written in mode space as

$$f(\Delta r) = (2\pi)^{-3N/2}\exp\!\left(-\tfrac{1}{2}\,\Delta r^2\right)\cdot\left[1+\sum_{\nu=3}^{\infty}(\nu!)^{-1}\,\langle H_\nu\rangle\cdot H_\nu(\Delta r)\right], \tag{50}$$

where the average Hermite polynomials are defined as

$$\langle H_\nu\rangle = \int_{-\infty}^{\infty} H_\nu(\Delta r)\,W(\Delta r)\,d\Delta r. \tag{51}$$

Equation (50) represents the distribution of coordinates in modal space. The elements of $\langle H_\nu\rangle$ now contain products of modal coordinates. For example, the third order terms are now $\langle \Delta r_i\,\Delta r_j\,\Delta r_k\rangle$, and are measures of the extent of mode coupling. Obviously, the second order modes are decoupled since $\langle \Delta r_i\,\Delta r_j\rangle = \delta_{ij}$.

Let us consider the distribution of the first mode, $f(\Delta r_1)$, for example. There are three types of terms in the $\langle H_\nu\rangle$'s in Eq. (50): (i) terms that contain $\Delta r_1$ only, (ii) terms that are combinations of $\Delta r_1$ and other modes, and (iii) terms that do not contain $\Delta r_1$. Terms of type (i) are pure first mode contributions to $f(\Delta r_1)$. Terms of type (ii) indicate the extent of mode coupling on the distribution of $\Delta r_1$, and terms of type (iii) have no contribution to mode 1. The part given by (ii) shows the contributions from the coupling of other modes to $f(\Delta r_1)$. Obviously, the same argument is valid for any mode other than mode 1, and may serve as a suitable approach to understand the effects of mode coupling in proteins.
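The type (i) contribution, i.e., the single-mode part of Eq. (50), can be sketched as follows. The example assumes a one-dimensional standardized coordinate and estimates $\langle H_\nu\rangle$ as sample averages; the skewed synthetic samples, the truncation order, and the function name are illustrative assumptions. NumPy's probabilists' Hermite routines (`numpy.polynomial.hermite_e`) are used for $H_\nu$.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermeval

def hermite_series_density(samples, grid, max_order=6):
    """Gaussian times (1 + sum_{nu>=3} <H_nu> H_nu / nu!) for a single mode."""
    q = (samples - samples.mean()) / samples.std()     # zero mean, unit variance
    coeffs = np.zeros(max_order + 1)
    coeffs[0] = 1.0                                     # the leading "1" of the series
    for nu in range(3, max_order + 1):
        basis = np.zeros(nu + 1)
        basis[nu] = 1.0                                 # selects He_nu
        coeffs[nu] = hermeval(q, basis).mean() / math.factorial(nu)
    gauss = np.exp(-0.5 * grid**2) / np.sqrt(2.0 * np.pi)
    return gauss * hermeval(grid, coeffs)

# Skewed synthetic samples standing in for one modal coordinate.
rng = np.random.default_rng(1)
samples = rng.gamma(shape=8.0, size=20000)
grid = np.linspace(-8.0, 8.0, 1601)

density = hermite_series_density(samples, grid)
area = density.sum() * (grid[1] - grid[0])
print(abs(area - 1.0) < 1e-6)
```

The series stays normalized at any truncation order because every $H_\nu$ with $\nu \geq 1$ integrates to zero against the Gaussian weight. It is not guaranteed to stay positive, however.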

A. Transformation from modal space to real space

Having determined the statistical features of fluctuations in modal space, it is straightforward to study the properties of these correlations and couplings in real space on a residue basis with the help of the transformation $\Delta R = V\,\mathrm{diag}\,\lambda^{1/2}\,\Delta r$. For example, how the fluctuations $\langle \Delta R_i\,\Delta R_j\rangle$ of residues $i$ and $j$ are affected due to the coupling of the modes $u$ and $v$ is given by the expression

$$\langle \Delta R\,\Delta R^T\rangle = V\,\mathrm{diag}\,\lambda\,V^T, \tag{52}$$

where all the elements of $\mathrm{diag}\,\lambda$ are set equal to zero except $\lambda_u$ and $\lambda_v$. Similarly, for the third order correlations, how the fluctuations $\langle \Delta R_i\,\Delta R_j\,\Delta R_k\rangle$ of residues $i$, $j$, and $k$ are affected due to the coupling of the modes $u$, $v$, and $w$ is given by the expression

$$\langle \Delta R_i\,\Delta R_j\,\Delta R_k\rangle = \sum_{p,q,r}(\lambda_p\lambda_q\lambda_r)^{1/2}\,V_{ip}V_{jq}V_{kr}\,\langle \Delta r_p\,\Delta r_q\,\Delta r_r\rangle, \tag{53}$$

where the set $(p, q, r)$ is the permutation of the values $(u, v, w)$.
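Equations (52) and (53) translate directly into array operations. The sketch below is illustrative only: mode indices are zero-based, the three modes in the third-order function are assumed distinct, and the small covariance matrix and function names are hypothetical.

```python
from itertools import permutations
import numpy as np

def covariance_from_modes(eigvals, eigvecs, modes):
    """Eq. (52) with every eigenvalue set to zero except those listed in `modes`."""
    idx = list(modes)
    kept = np.zeros_like(eigvals)
    kept[idx] = eigvals[idx]
    return eigvecs @ np.diag(kept) @ eigvecs.T

def third_order_from_modes(eigvals, eigvecs, modal_third, u, v, w):
    """Eq. (53): sum over the permutations (p, q, r) of the distinct modes (u, v, w).
    modal_third[p, q, r] holds <dr_p dr_q dr_r>."""
    n = eigvecs.shape[0]
    out = np.zeros((n, n, n))
    for p, q, r in permutations((u, v, w)):
        amp = np.sqrt(eigvals[p] * eigvals[q] * eigvals[r]) * modal_third[p, q, r]
        out += amp * np.einsum("i,j,k->ijk",
                               eigvecs[:, p], eigvecs[:, q], eigvecs[:, r])
    return out

# Hypothetical 3 x 3 covariance used only to exercise the functions.
cov = np.array([[2.0, 0.5, 0.0],
                [0.5, 1.0, 0.3],
                [0.0, 0.3, 0.5]])
lam, vecs = np.linalg.eigh(cov)                     # eigenvalues in ascending order
full = covariance_from_modes(lam, vecs, modes=range(3))
two_largest = covariance_from_modes(lam, vecs, modes=[1, 2])
print(np.allclose(full, cov))
```

Keeping all modes recovers the full covariance. Keeping a subset isolates the contribution of those modes to each residue pair.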

Example: Calculations of harmonic and anharmonic contributions to residue correlations for a hexapeptide

We use a randomly chosen hexapeptide of sequence ASN-ASP-MET-PHE-ARG-LEU. This is a toylike protein chosen for illustrative purposes only. Initially, a random conformation was chosen and the energy of the system was minimized for a sufficiently long time until no large scale conformational changes took place. This conformation is taken as the "native" state of the peptide. The fluctuations of residue positions about this conformation are determined by MD simulations. Simulations were performed in explicit solvent (water) using the NAMD 2.5 package with the CHARMM27 force field. All simulations were performed at constant temperature (300 K) in a periodic water box with a 20 Å cushion. To evaluate the nonbonded interactions, the cutoff distance was set to 12 Å. The particle Ewald sum was used to calculate long-range forces in the periodic system, thereby minimizing the error introduced by truncation at the cutoff distance. The integration time step was set to 2 fs and the structure was recorded every 2000 steps (4 ps) for a 22 ns long simulation. Only the final 14 ns of the trajectory was used for the present calculations.

We recorded only the C$^\alpha$ positions for the trajectory. The results of calculations reported here are based on the C$^\alpha$ coordinates, which lead to 18 degrees of freedom. The trajectory consists of an $18\times 1$ fluctuation vector $\Delta R$ recorded for each saved step. The block representation is used in recording each fluctuation vector. The three degrees of freedom due to rigid body rotation and three due to the translation of the centroid are removed from the trajectories, leading to 12 degrees of freedom only. Removal of the six degrees of freedom was performed using the Visual Molecular Dynamics root mean square deviation (RMSD) Tool plug-in. Atom selection for alignment was set to C$^\alpha$ atoms and all structures were aligned using the first structure of the trajectory as reference.
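The removal of rigid-body motion can be illustrated with a least-squares superposition. The sketch below uses the Kabsch algorithm as a stand-in; it is not the VMD plug-in used in the paper, and the synthetic coordinates and function name are hypothetical. The last step arranges the aligned fluctuations in the block order $(X_1,\ldots,X_N, Y_1,\ldots,Y_N, Z_1,\ldots,Z_N)$.

```python
import numpy as np

def superpose(frame, reference):
    """Fit `frame` onto `reference` (both (n_atoms, 3)) after removing centroids.
    Returns the rotated, centered frame (Kabsch algorithm)."""
    frame_c = frame - frame.mean(axis=0)           # remove centroid translation
    ref_c = reference - reference.mean(axis=0)
    u, _, vt = np.linalg.svd(frame_c.T @ ref_c)
    d = np.sign(np.linalg.det(u @ vt))             # avoid improper rotations
    rotation = u @ np.diag([1.0, 1.0, d]) @ vt
    return frame_c @ rotation

# Synthetic six-atom "peptide": the frame is a rotated and shifted copy of the reference.
rng = np.random.default_rng(2)
reference = rng.normal(size=(6, 3))
angle = 0.7
rot_z = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                  [np.sin(angle), np.cos(angle), 0.0],
                  [0.0, 0.0, 1.0]])
moved = reference @ rot_z + np.array([1.0, -2.0, 0.5])

aligned = superpose(moved, reference)
centered_ref = reference - reference.mean(axis=0)
fluctuation = (aligned - centered_ref).T.ravel()   # block order: all X, all Y, all Z
print(fluctuation.shape, np.allclose(fluctuation, 0.0))
```

Because the test frame differs from the reference only by a rigid-body move, the fluctuation vector vanishes after superposition. For a real trajectory the residual is the internal fluctuation $\Delta R$.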

The second order correlation matrix $\langle \Delta R\,\Delta R^T\rangle$ is determined for the trajectory. The matrix has 12 nonzero eigenvalues and eigenvectors. The modal coordinates are obtained by Eq. (47), and the various averages $\langle \Delta r_i\,\Delta r_j\cdots\Delta r_m\rangle$ are calculated from the trajectory. The distributions $f(\Delta r_i)$ for $\Delta r_i$, irrespective of the values of all other $\Delta r_j$, are also calculated from the trajectory. Results are shown by the filled circles in Fig. 1 for all modes, 1–12. The solid curve in each figure is the Gaussian approximation obtained from Eq. (50) with the Hermite series terms equated to zero. The difference between the solid curve and the calculated points is the contribution of anharmonicities to each mode. Only the slowest two modes show significant deviations from harmonicity. The shapes of the modes 3–12 may be approximated relatively well by Gaussians, although there are significant deviations in the maximum anharmonic amplitudes from harmonics, as may readily be verified from Fig. 1.

We now seek an answer to the important question of what fraction of the deviations from harmonicity in a given mode results from coupling with other modes. We will then ask which other mode couples most strongly to the given mode. The proposed tensorial Hermite series expansion is capable of providing answers to these questions.

The coupling of different modes to a given mode $i$ results from the nonzero averages of mixed terms in the Hermite series expansion, such as $\langle \Delta r_i^{\,p}\,\Delta r_j^{\,q}\cdots\Delta r_m^{\,s}\rangle$. Anharmonic contributions to the distribution $f(\Delta r_i)$ purely from mode $i$ will be from the moments $\langle \Delta r_i^{\,p}\rangle$ only. In Fig. 2, we present results of Hermite series expansions up to the 17th order moments, obtained by equating the mixed terms $\langle \Delta r_i^{\,p}\,\Delta r_j^{\,q}\cdots\Delta r_m^{\,s}\rangle$ to zero, and keeping moments of type $\langle \Delta r_i^{\,p}\rangle$ for each mode $i$ only. The reason for going up to the 17th term is that full convergence was observed only at this order. The modal coordinates are nondimensional. Anharmonic contributions in the absence of mode coupling, for modes 1–3, are shown by the solid curves in Fig. 2. The filled circles are obtained from MD histograms. The difference between the points and the curve for a given mode $i$ comes from coupling of other modes to the $i$th. For mode 1, the shape and the skewness of the distribution are obtained by purely anharmonic contributions from mode 1, and contributions from coupling with other modes affect only the peak at $\Delta r = -1.2$. A similar trend is seen also for mode 2. Mode coupling affects only the peak values of the distributions. For mode 3, small deviations at the tails are observed. Modes 4–12 were predicted almost perfectly with pure terms, and effects of mode couplings on the distributions are small. The unrealistic negative values of $f(\Delta r_1)$ seen in the first panel of Fig. 1 result from an artifact of the Hermite series expansion. However, the negative values are insignificant, as seen from the figure. When a sufficiently large number of terms is used in the expansion, the negative values become insignificantly small.

FIG. 1. The distribution functions $f(\Delta r)$ for each of the 12 modal coordinates, irrespective of the others. The filled circles are calculated as histograms from the MD trajectory. The solid curves are the Gaussians obtained from Eq. (50), with the Hermite terms equated to zero.

FIG. 2. Comparison of the MD histograms with Hermite series of 17 terms, for modes 1–3.


We now try to answer the question of which mode couples to which mode in the most significant way. This question has to be answered for each order of the moment separately. After analyzing moments of all orders, one can conclude on the strongest coupling in the system. Here, for illustrative purposes, we consider only third order moments, $\langle \Delta r_i\,\Delta r_j\,\Delta r_k\rangle$, where $i$, $j$, and $k$ take values from 1 to 12 and are not all equal, because terms with $i = j = k$ do not represent mode coupling. In Fig. 3, we present the values of $\langle \Delta r_i\,\Delta r_j\,\Delta r_k\rangle$, which we term "correlation amplitudes," of the third moments. We present only the positive values. They are sorted in descending order.
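The ranking of mixed third order moments described above can be sketched as follows. The example assumes the modal coordinates are available as an array with one frame per row; the synthetic three-mode data, in which the third coordinate is built as the product of the first two so that the triplet (1, 2, 3) is coupled by construction, and the function name are illustrative and unrelated to the hexapeptide trajectory.

```python
from itertools import combinations_with_replacement
import numpy as np

def mixed_third_moments(dr):
    """Return rows (i, j, k, <dr_i dr_j dr_k>) with 1-based mode labels, i <= j <= k,
    excluding pure terms i = j = k, sorted by value in descending order."""
    n_frames, n_modes = dr.shape
    m3 = np.einsum("ti,tj,tk->ijk", dr, dr, dr) / n_frames
    rows = [(i + 1, j + 1, k + 1, m3[i, j, k])
            for i, j, k in combinations_with_replacement(range(n_modes), 3)
            if not (i == j == k)]
    return sorted(rows, key=lambda row: row[3], reverse=True)

# Synthetic modal coordinates: x3 = x1 * x2 has zero mean and unit variance and is
# uncorrelated with x1 and x2 at second order, yet <x1 x2 x3> is close to 1.
rng = np.random.default_rng(3)
x1 = rng.normal(size=20000)
x2 = rng.normal(size=20000)
dr = np.column_stack([x1, x2, x1 * x2])

ranked = mixed_third_moments(dr)
for i, j, k, amplitude in ranked[:3]:
    print(i, j, k, round(amplitude, 3))
```

The coupled triplet appears at the top of the list although all pairwise correlations vanish. Mode coupling of this kind is invisible to the harmonic (second order) analysis.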

It is worth noting that there is one value that is much larger than all the others. Also, most of the correlation amplitude values are below 0.5. In order to give an idea of which point in Fig. 3 corresponds to which triplet of modes, the three modes and the corresponding correlation amplitudes are presented in Table I for values larger than 0.5 in magnitude. The entries with $i = j = k$ are for pure modes; the others are for mixed modes. Pure mode values are included in the table for comparison of their magnitudes with those of the mixed modes.

In Fig. 4, we present the degree of coupling of the first mode with the two other modes in third order correlations, i.e., in $\langle \Delta r_1\,\Delta r_j\,\Delta r_k\rangle$, where $j$ and $k$ are the mode indices shown along the abscissa and the ordinate in Fig. 4. Darker regions in the figure indicate stronger correlations. In general, positive correlations are stronger when compared with the negative ones. The coupling of the first mode is not confined only to its neighboring modes. For example, the coupling $\langle \Delta r_1\,\Delta r_{10}\,\Delta r_{11}\rangle$ between the first mode and the 10th and 11th is one of the strongest positive couplings. Similarly, the negative coupling $\langle \Delta r_1\,\Delta r_7\,\Delta r_{12}\rangle$ is also among the strongest.

FIG. 3. The values of the mixed third order terms, sorted in descending order.

TABLE I. Largest correlation amplitudes for third order moments.

Mode i   Mode j   Mode k   Mode amplitude
   3        3        3         1.640
   7        7        7         1.077
  10       10       10         0.894
  12       12       12        -0.724
   2        3        3        -0.763
   3        3       10         0.759
   3       11       11         0.723
   7        7        9         0.706
   3       10       10         0.685
   3        3        9         0.650
   5        7        7         0.638
   3       12       12         0.629
   7        7       12        -0.609
   9       11       11         0.588
   2        7        7         0.518
   7       11       11         0.514

FIG. 4. Coupling of the first mode to the other two modes in third order correlations. The two modes are indicated by the corresponding mode indices along the abscissa and ordinate. Darker regions indicate stronger correlations.

V. DISCUSSION

The statistical thermodynamics treatment of native proteins presented here points to the fact that nonbonded pair potentials between residues play the key role in determining the fluctuations, or in general, the full thermostatistics. The $\Gamma$ matrix of Eq. (20) contains the second derivatives of the pair potentials. Higher order derivatives of pair potentials are present in the distribution function given by Eq. (44). The forms of the short and long-range inter-residue potentials evaluated by Bahar et al. [20,29] over a databank of 302 globular proteins show that inter-residue potentials for some pairs may be expressed as Lennard-Jones type, or more generally as a Mie potential [30]. For such cases, neglecting the asymmetry at the potential minima, the harmonic representation suffices for characterizing the system for small fluctuations. However, the first and important deviation from a symmetric harmonic potential comes from the asymmetry of the potentials. The pair potential rises steeply when the distance between two residues becomes smaller than the distance at the potential minimum, but rises mildly when the distance becomes larger. The evaluated coarse-grained inter-residue potentials for some other residue pairs depart strongly from the Mie-type potential in that they either do not have minima, or have a multitude of minima with increasing inter-residue separation. These differences establish the specificity of pair potentials and have to be taken into consideration, especially for larger fluctuations. Piazza and Sanejouand [16] used an energy function of the form

$$E_{ij} = \sum_{p=1}^{4}\frac{k_p}{p}\,\left(R_{ij} - R_{ij}^{eq}\right)^p$$

to investigate the effects of nonlinearities in the potential. The tensorial Hermite polynomials formalism presented here is an alternative rational scheme of introducing the deviations of pair potentials from the harmonic.
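The effect of odd-order terms in such a polynomial pair energy can be seen with a short sketch. The coefficients below are hypothetical and chosen only to show the asymmetry; the function name and the dictionary-of-orders interface are illustrative assumptions, not taken from Ref. 16.

```python
def pair_energy(r_ij, r_eq, k):
    """E_ij = sum over p of k[p] / p * (r_ij - r_eq)**p for the orders p present in k."""
    d = r_ij - r_eq
    return sum(k_p / p * d**p for p, k_p in k.items())

harmonic = {2: 5.0}                       # quadratic term only
anharmonic = {2: 5.0, 3: -2.0, 4: 1.0}    # hypothetical cubic and quartic corrections

r_eq, delta = 3.8, 0.5
for label, k in (("harmonic", harmonic), ("anharmonic", anharmonic)):
    stretched = pair_energy(r_eq + delta, r_eq, k)
    compressed = pair_energy(r_eq - delta, r_eq, k)
    print(label, round(stretched, 4), round(compressed, 4))
```

With a negative cubic coefficient, compression costs more energy than an equal stretch. This is the qualitative asymmetry of pair potentials described above; the harmonic form treats both displacements identically.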

In the harmonic approximation, the vibrational modes are independent and the energy of each mode is equal, obeying the equipartition theorem [11]. Independence of modes is destroyed when pair potentials deviate from the harmonic. There is significant interest in the coupling of fluctuational modes in native proteins [15–17]. The interest is mainly an outcome of the belief that biological function is coupled with anharmonic dynamics [15,16,31]. There are two types of anharmonicity associated with a given mode $i$: (i) the part of the distribution that results from the higher moments of mode $i$ itself, and (ii) the part of the distribution that results from coupling of mode $i$ with other modes. The example worked out in this paper shows that the shape of the distribution of mode $i$ is well represented by the higher moments of that same mode, while the coupling of other modes affects the amplitude of the distribution, specifically the peak values.

Moritsugu et al. [17] performed a normal mode analysis of myoglobin assuming that energy transfer is due to a weak anharmonicity that can be decomposed into a vibrational energy flow between a pair of normal modes. Nonlinearity was introduced in terms of third order mode coupling that corresponds to the third order moments of the Hermite series. They showed that the vibrational energy was transferred from a normal mode to very few specific normal modes for myoglobin. The magnitude of the coupling coefficient, which corresponds to the third order moments of fluctuations, was estimated by the degree of the geometrical overlap between the coupled modes. The present Hermite series approximation shows that coupling of higher order than the third may play a significant role in protein behavior.

Larger fluctuations in native proteins are of significant interest in studying the hopping of residues from one state to the other. In this case, the anharmonicity of the energy landscape needs to be taken into consideration. There is growing interest in this direction, and semianalytical models have been used in addition to MD simulations of the anharmonicity [28,32]. The proposed moment based expansion of the fluctuation probability function is capable of characterizing such anharmonicity effects, especially if a sufficient number of higher moments is included in the expansion.

The interest in expressing the fluctuations of proteins in normal mode or principal components is not new [33]. In order to describe the internal motions of human lysozyme obtained by MD or Monte Carlo (MC) simulations as motions of normal mode variables, Horiuchi and Go [34] projected the MC and MD trajectories of the protein on its normal mode axes. The idea behind this study was that the harmonic motion predicted by the normal mode analysis could approximately simulate the motion, which is in reality highly anharmonic. They showed that the lowest frequency normal modes extracted from the MC and MD simulations correlate very well with the hinge bending motions. Amadei et al. [35] showed that it is possible to separate the configurational space into an essential subspace of few degrees of freedom in which anharmonic motions occur and a harmonic space in which the motion has a narrow Gaussian distribution. The relevance of fluctuation dynamics to energy landscape has been discussed by Hayward and co-workers [14,32]. The present study generalizes these arguments by introducing a tensorial moment based Hermite series form for the well known Karhunen-Loève expansion. Our effort in the present work, unlike those of the previous studies mentioned above, is motivated mainly by identifying the effects of mode coupling, which underlies the function of proteins.

APPENDIX A: DERIVATION OF EQUATION (6)

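The correlation matrix $\langle \Delta R\,\Delta R^T\rangle$ is defined as

$$\langle \Delta R\,\Delta R^T\rangle = \sum (R-\bar{R})(R-\bar{R})^T f. \tag{A1}$$

The gradient of $f(U, V, R, N)$ with respect to $F/T$ reads

$$\frac{\partial f}{\partial(F/T)} = k^{-1}\left\{R - \frac{\partial}{\partial(F/T)}\,S\!\left[\frac{1}{T},\frac{P}{T},\frac{F}{T},\frac{\mu}{T}\right]\right\}f = k^{-1}(R-\bar{R})\,f. \tag{A2}$$

Substituting Eq. (A2) into Eq. (A1), we have

$$\langle \Delta R\,\Delta R^T\rangle = kT\sum (R-\bar{R})\,\frac{\partial f}{\partial F} = kT\,\frac{\partial}{\partial F}\langle R-\bar{R}\rangle - kT\left\langle\frac{\partial}{\partial F}(R-\bar{R})\right\rangle. \tag{A3}$$

The first term on the right-hand side vanishes, and since $R$ is statistically independent of $F$, we have

$$\langle \Delta R\,\Delta R^T\rangle = kT\,\frac{\partial \bar{R}}{\partial F}. \tag{A4}$$

Equation (A4) is valid irrespective of system size and is therefore suitable for the study of a single protein.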

APPENDIX B: THE BLOCK REPRESENTATION OF THE $\Gamma$ MATRIX

There are two different representations of the matrices $\Gamma$ and $\tilde{\Gamma}$ with respect to ordering of the $X$, $Y$, and $Z$ coordinates of the $N$ residues. The use of one instead of the other causes confusion. In its full generality, the left-hand side of Eq. (10) consists of the various products of $\Delta X_i$, $\Delta Y_i$, $\Delta Z_i$ and $\Delta X_j$, $\Delta Y_j$, $\Delta Z_j$, expressed with respect to a laboratory fixed coordinate system $OXYZ$. In the block representation, the elements of $\Delta R$ are arranged as $\Delta R = \mathrm{col}(\Delta X_1, \Delta X_2, \ldots, \Delta X_N, \Delta Y_1, \Delta Y_2, \ldots, \Delta Y_N, \Delta Z_1, \Delta Z_2, \ldots, \Delta Z_N)$. In other ENMs the standard MD representation is used, according to which $\Delta R_t = \mathrm{col}(\Delta X_1, \Delta Y_1, \Delta Z_1, \Delta X_2, \Delta Y_2, \Delta Z_2, \ldots, \Delta X_N, \Delta Y_N, \Delta Z_N)$. The correlation matrix is accordingly written either as $C = \langle \Delta R\,\Delta R^T\rangle$ or $C_t = \langle \Delta R_t\,\Delta R_t^T\rangle$. Both $C$ and $C_t$ are of order $3N\times 3N$, where $N$ is the number of residues. The passage from one to the other is made by $C = T\,C_t\,T^T$, where $T$ is a $3N\times 3N$ permutation matrix formed as

$$T_{ij} = \begin{cases} 1, & \text{for } i = 1, 2, \ldots, 3N \text{ and } j = 3\,[(i-1) \bmod N] + \left\lfloor \dfrac{i-1}{N} \right\rfloor + 1,\\[4pt] 0, & \text{otherwise.} \end{cases} \tag{B1}$$

In the block representation the matrices $\Gamma$ and $\tilde{\Gamma}$ are partitioned into submatrices as

$$\Gamma = \begin{bmatrix} \Gamma_{XX} & 0 & 0\\ 0 & \Gamma_{YY} & 0\\ 0 & 0 & \Gamma_{ZZ} \end{bmatrix}, \qquad \tilde{\Gamma} = \begin{bmatrix} \tilde{\Gamma}_{XX} & \tilde{\Gamma}_{XY} & \tilde{\Gamma}_{XZ}\\ - & \tilde{\Gamma}_{YY} & \tilde{\Gamma}_{YZ}\\ - & - & \tilde{\Gamma}_{ZZ} \end{bmatrix}, \tag{B2}$$

where each submatrix is $N\times N$. The second submatrix $\tilde{\Gamma}_{XY}$, for example, has the mixed products $\Delta X_i\,\Delta Y_j$.
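The permutation matrix of Eq. (B1) is easy to build and check. The sketch below follows the index rule literally with 1-based indices, as in the text, and then converts to NumPy's 0-based positions; the function name and the two-residue test case are illustrative.

```python
import numpy as np

def block_permutation(n_res):
    """T of Eq. (B1): maps the residue-ordered vector (X1, Y1, Z1, X2, ...) to the
    block-ordered vector (X1..XN, Y1..YN, Z1..ZN)."""
    size = 3 * n_res
    t = np.zeros((size, size), dtype=int)
    for i in range(1, size + 1):                         # 1-based, as in Eq. (B1)
        j = 3 * ((i - 1) % n_res) + (i - 1) // n_res + 1
        t[i - 1, j - 1] = 1
    return t

t = block_permutation(2)
residue_ordered = np.arange(6)        # positions of X1, Y1, Z1, X2, Y2, Z2
block_ordered = t @ residue_ordered   # positions picked in the order X1, X2, Y1, Y2, Z1, Z2
print(block_ordered)

# The same matrix converts a residue-ordered covariance into the block form.
rng = np.random.default_rng(4)
frames_t = rng.normal(size=(1000, 6))                    # residue-ordered fluctuations
c_t = frames_t.T @ frames_t / len(frames_t)
frames_block = frames_t @ t.T
c_block = frames_block.T @ frames_block / len(frames_block)
print(np.allclose(c_block, t @ c_t @ t.T))
```

Since $T$ is a permutation matrix, $T\,T^T$ is the identity and the reverse passage is $C_t = T^T C\,T$.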

APPENDIX C: HERMITE POLYNOMIALS

Here we give the explicit forms of the Hermite polynomials up to the 17th order used in the calculations in the present work. These are the terms for obtaining the contributions to the distribution function by a single mode. For the general case of mixed modes, the definition of the tensorial Hermite polynomials given by Eq. (44) should be used. In the expressions given here, $q$ has two meanings, either $\Delta r$ or $\langle \Delta r\rangle$, depending on whether it is in $H$ or $\langle H\rangle$, respectively,

$$\begin{aligned}
H_1 &= 0,\\
H_2 &= 0,\\
H_3 &= q^3 - 3q,\\
H_4 &= q^4 - 6q^2 + 3,\\
H_5 &= q^5 - 10q^3 + 15q,\\
H_6 &= q^6 - 15q^4 + 45q^2 - 15,\\
H_7 &= q^7 - 21q^5 + 105q^3 - 105q,\\
H_8 &= q^8 - 28q^6 + 210q^4 - 420q^2 + 105,\\
H_9 &= q^9 - 36q^7 + 378q^5 - 1260q^3 + 945q,\\
H_{10} &= q^{10} - 45q^8 + 630q^6 - 3150q^4 + 4725q^2 - 945,\\
H_{11} &= q^{11} - 55q^9 + 990q^7 - 6930q^5 + 17\,325q^3 - 10\,395q,\\
H_{12} &= q^{12} - 66q^{10} + 1485q^8 - 13\,860q^6 + 51\,975q^4 - 62\,370q^2 + 10\,395,\\
H_{13} &= q^{13} - 78q^{11} + 2145q^9 - 25\,740q^7 + 135\,135q^5 - 270\,270q^3 + 135\,135q,\\
H_{14} &= q^{14} - 91q^{12} + 3003q^{10} - 45\,045q^8 + 315\,315q^6 - 945\,945q^4 + 945\,945q^2 - 135\,135,\\
H_{15} &= q^{15} - 105q^{13} + 4095q^{11} - 75\,075q^9 + 675\,675q^7 - 2\,837\,835q^5 + 4\,729\,725q^3 - 2\,027\,025q,\\
H_{16} &= q^{16} - 120q^{14} + 5460q^{12} - 120\,120q^{10} + 1\,351\,350q^8 - 7\,567\,560q^6 + 18\,918\,900q^4 - 16\,216\,200q^2 + 2\,027\,025,\\
H_{17} &= q^{17} - 136q^{15} + 7140q^{13} - 185\,640q^{11} + 2\,552\,550q^9 - 18\,378\,360q^7 + 64\,324\,260q^5 - 91\,891\,800q^3 + 34\,459\,425q.
\end{aligned} \tag{C1}$$
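The single-mode polynomials of order three and higher belong to the probabilists' Hermite family, whose coefficients follow from the three-term recurrence $He_{n+1}(q) = q\,He_n(q) - n\,He_{n-1}(q)$. The short sketch below generates them with exact integer arithmetic and can be used to cross-check tabulated coefficients; the function name is illustrative. It returns the standard $He_1 = q$ and $He_2 = q^2 - 1$ for the two lowest orders.

```python
def hermite_coefficients(order):
    """Integer coefficients of He_order(q), lowest power first, from the recurrence
    He_{n+1}(q) = q * He_n(q) - n * He_{n-1}(q)."""
    prev, curr = [1], [0, 1]                  # He_0 = 1, He_1 = q
    if order == 0:
        return prev
    for n in range(1, order):
        shifted = [0] + curr                                  # q * He_n
        padded = prev + [0] * (len(shifted) - len(prev))      # He_{n-1}, same length
        prev, curr = curr, [s - n * p for s, p in zip(shifted, padded)]
    return curr

def format_polynomial(coeffs):
    """Readable string, highest power first, skipping zero coefficients."""
    terms = []
    for power in range(len(coeffs) - 1, -1, -1):
        c = coeffs[power]
        if c != 0:
            terms.append(f"{c:+d}*q^{power}")
    return " ".join(terms)

for nu in (3, 4, 11, 13):
    print(nu, format_polynomial(hermite_coefficients(nu)))
```

The magnitude of the lowest-order coefficient of $He_n$ is $(n-1)!!$ for even $n$ and $n!!$ for odd $n$. This gives a quick consistency check on any entry of the list.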

[1] H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov, and P. E. Bourne, Nucleic Acids Res. 28, 235 (2000).

[2] M. M. Tirion, Phys. Rev. Lett. 77, 1905 (1996).

[3] I. Bahar, A. R. Atilgan, and B. Erman, Folding Des. 2, 173 (1997).

[4] T. Haliloglu, I. Bahar, and B. Erman, Phys. Rev. Lett. 79, 3090 (1997).

[5] P. J. Flory, Proc. R. Soc. London, Ser. 351, 351 (1976).

[6] A. Kloczkowski, J. E. Mark, and B. Erman, Macromolecules 22, 1423 (1989).

[7] K. Hinsen, Proteins 33, 417 (1998).

[8] A. R. Atilgan, S. R. Durell, R. L. Jernigan, M. C. Demirel, O. Keskin, and I. Bahar, Biophys. J. 80, 505 (2001).

[9] M. Delarue and Y. H. Sanejouand, J. Mol. Biol. 320, 1011 (2002); O. Keskin, I. Bahar, D. Flatow, D. G. Covell, and R. L. Jernigan, Biochemistry 41, 491 (2002); W. Zheng, B. R. Brooks, and G. Hummer, Proteins 69, 43 (2007); L. Yang, G. Song, and R. L. Jernigan, Biophys. J. 93, 920 (2007); M. C. Demirel, A. R. Atilgan, R. L. Jernigan, B. Erman, and I. Bahar, Protein Sci. 7, 2522 (1998); C. Micheletti, P. Carloni, and A. Maritan, Proteins 55, 635 (2004); H. Yu, L. Ma, Y. Yang, and Q. Cui, PLOS Comput. Biol. 3, e21 (2007); Y. J. Jeong and M. K. Kim, J. Mol. Graphics Modell. 24, 296 (2006); L. Leherte and D. P. Vercauteren, Comput. Phys. Commun. 179, 171 (2008); G. Song and R. L. Jernigan, J. Mol. Biol. 369, 880 (2007); R. Lavery and S. Sophie Sacquin-Mora, J. Biosci. 32, 891 (2007); L. Marsella, Proteins: Struct., Funct., Bioinf. 62, 173 (2006); S. Kundu, S. C. Sorensen, and G. N. Phillips, Proteins: Struct., Funct., Bioinf. 57, 725 (2004); K. Eom, S. C. Baek, J. Ahn, and S. Na, J. Comput. Chem. 28, 1400 (2007); P. Doruker, R. L. Jernigan, and I. Bahar, ibid. 23, 119 (2002); T. Haliloglu and I. Bahar, Proteins 31, 271 (1998); I. Bahar, B. Erman, R. L. Jernigan, A. R. Atilgan, and D. G. Covell, J. Mol. Biol. 285, 1023 (1999); P. Doruker, A. R. Atilgan, and I. Bahar, Proteins 40, 512 (2000); E. Eyal, C. Chennubhotla, L.-W. Yang, and I. Bahar, Bioinformatics 23, i175 (2007); L.-W. Yang, E. Eyal, C. Chennubhotla, J. JunGoo, A. M. Gronenborn, and I. Bahar, Structure 15, 741 (2007); E. Eyal, L. Yang, and I. Bahar, Bioinformatics 22, 2619 (2006); L. W. Yang, A. J. Rader, X. Liu, C. J. Jursa, S. C. Chen, and H. A. Karimi, Nucleic Acids Res. 34, W24 (2006); C. Chennubhotla and I. Bahar, Lect. Notes Comput. Sci. 3909, 379 (2006); L. W. Yang and I. Bahar, Structure (London) 13, 893 (2005); L. W. Yang, X. Liu, X. Christopher, J. Jursa, M. Holliman, A. J. Rader, H. Karimi, and I. Bahar, Bioinformatics 21, 2978 (2005); Y. Wang, A. J. Rader, I. Bahar, and R. L. Jernigan, J. Struct. Biol. 147, 302 (2004); A. J. Rader and I. Bahar, Polymer 45, 659 (2004); C. Xu, D. Tobi, and I. Bahar, J. Mol. Biol. 333, 153 (2003); B. Erman and K. A. Dill, J. Chem. Phys. 112, 1050 (2000); A. Erkip, B. Erman, C. Seok, and K. A. Dill, Polymer 43, 495 (2002); B. Erman, Biophys. J. 91, 3589 (2006); J. L. Liao and D. N. Beratan, ibid. 87, 1369 (2004); F. Tama and C. L. Brooks, J. Mol. Biol. 318, 733 (2002); I. Bahar and A. J. Rader, Curr. Opin. Struct. Biol. 15, 586 (2005).


[10] Q. Cui and I. Bahar, Normal Mode Analysis: Theory and Applications to Biological and Chemical Systems (Chapman and Hall, London, 2006).

[11] I. Bahar, A. R. Atilgan, M. C. Demirel, and B. Erman, Phys. Rev. Lett. 80, 2733 (1998).

[12] T. Haliloglu, E. Seyrek, and B. Erman, Phys. Rev. Lett. 100, 228102 (2008).

[13] R. Levy, D. Perahia, and M. Karplus, Proc. Natl. Acad. Sci. U.S.A. 79, 1346 (1982).

[14] S. Hayward, A. Kitao, and N. Go, Proteins 23, 177 (1995).

[15] B. Juanico, Y. H. Sanejouand, F. Piazza, and D. P. Los Rios, Phys. Rev. Lett. 99, 238104 (2007).

[16] F. Piazza and Y. H. Sanejouand, Phys. Biol. 5, 026001 (2008).

[17] K. Moritsugu, O. Miyashita, and A. Kidera, Phys. Rev. Lett. 85, 3970 (2000); K. Moritsugu, O. Miyashita, and A. Kidera, J. Phys. Chem. B 107, 3309 (2003).

[18] H. B. Callen, Thermodynamics and an Introduction to Thermostatistics, 2nd ed. (Wiley, New York, 1985).

[19] T. L. Hill, Thermodynamics of Small Systems (Dover, New York, 1994).

[20] I. Bahar and R. L. Jernigan, J. Mol. Biol. 266, 195 (1997).

[21] B. Erman, I. Bahar, and C. Chennubhotla (unpublished).

[22] The appearance of 3(N−1) in Eq. (33) is because the Hamiltonian for the GNM is only rotationally invariant and three degrees of freedom due to rigid body translation are suppressed. For a Hamiltonian that has both translational and rotational invariance, six degrees of freedom will be suppressed, leading to a factor of 3(N−2).

[23] Y. Yuan, Y. Wu, and J. Zi, J. Phys.: Condens. Matter 17, 469 (2005).

[24] T. Haliloglu and B. Erman, Phys. Rev. Lett. (unpublished).

[25] F. Pontiggia, G. Colombo, C. Micheletti, and H. Orland, Phys. Rev. Lett. 98, 048102 (2007).

[26] P. J. Flory and D. Y. Yoon, J. Chem. Phys. 61, 5358 (1974).

[27] M. A. Balsera, W. Wriggers, Y. Oono, and K. Schulten, J. Phys. Chem. 100, 2567 (1996); A. A. Palazoglu, A. Gursoy, Y. Arkun, and B. Erman, J. Comput. Biol. 11, 1149 (2004).

[28] A. Kitao, in Normal Mode Analysis: Theory and Applications to Biological and Chemical Systems, edited by Q. Cui and I. Bahar (Chapman and Hall, London, 2006), p. 233.

[29] I. Bahar, M. Kaplan, and R. L. Jernigan, Proteins 29, 292 (1997).

[30] G. Mie, Ann. Phys. 11, 657 (1903).

[31] H. Frauenfelder, F. Parak, and R. D. Young, Annu. Rev. Biophys. Biophys. Chem. 17, 451 (1988); Nonlinear Excitations in Biomolecules, edited by M. Peyrard (Springer, Berlin, 1995).

[32] A. Kitao, S. Hayward, and N. Go, Proteins 33, 496 (1998).

[33] M. Karplus, in Normal Mode Analysis: Theory and Applications to Biological and Chemical Systems, edited by Q. Cui and I. Bahar (Chapman and Hall, London, 2006); B. Brooks and M. Karplus, Proc. Natl. Acad. Sci. U.S.A. 80, 6571 (1983).

[34] T. Horiuchi and N. Go, Proteins 10, 106 (1991).

[35] A. Amadei, A. Linssen, and H. J. C. Berendsen, Proteins 17, 412 (1993).
