Download - 20120703 girona seminar

Girona Seminar, 03/07/2012

multiscale structure and function

Multiscale simulations to identify hotspots in protein interfaces

César L. Ávila, Nils Drechsel, Michael A. Johnston and Jordi Villà-Freixa

Universitat Pompeu Fabra / Universitat de Vic

http://cbbl.imim.es/



sequence->structure->function...

Guix et al. Brain (2009)

Giupponi et al. (in preparation)



how do we explore protein movements?



[Figure: potential energy surface along a reaction coordinate, with points labelled M and SP]



example of the dynamics of a RING finger domain



great... as long as we have a good knowledge of the structure, but what if we do not?

example of the dynamics of a RING finger domain



Sequence alignment of 2NMS.pdb (chain A) and rat.B99990001.pdb. The number before each row is the residue number of the first residue in that row.

alignment columns 1-50
2NMS.pdb, chain A     18  GIPQITGPTT VNGLERGSLT VQCVYRSGWE TYLKWWCRGA IWRDCKILVK
rat.B99990001.pdb     23  GCVPLRGPSS VTGTVGESLN VTCQYEERFK MNKKYWCRGS LVLLCKDIVR

alignment columns 51-100
2NMS.pdb, chain A     68  TSGSEQEVKR DRVSIKDNQK NRTFTVTMED LMKTDADTYW CGIEKTGNDL
rat.B99990001.pdb     73  TGGSE.EARN GRVSIRDDRD NLTFTVTLQN LTLEDAGTYM CAVDIPLIDH

alignment columns 101-112
2NMS.pdb, chain A    118  GVTVQVTIDP AP
rat.B99990001.pdb    122  SFKVELSVVP GN

[Figure: RMSD plots]

James Dalton, Sergio Rubio

in collaboration with Joan Sayós



A more challenging problem: CFTR

James Dalton

Dalton, Kalid, Shushan, Ben-Tal and Villà-Freixa. J. Chem. Inf. Mod. (in press)

Serohijos et al. PNAS (2008); Mornon et al. Cell Mol Life Sci (2008)



homology modeling + docking + md

CRIS CHARNECO

Mulero et al. submitted



Explicit microscopic: major convergence challenges

Simplified microscopic (PDLD)

Semi-macroscopic (PDLD/S-LRA)

Schutz et al., Proteins (2001)

Beyond dynamics: energetics



PDLD/S-LRA

Warshel and coworkers



Khan et al. (unpublished)

residue differential Solvation



2-steps CDK2/CyclinA

Bonet et al. proteins (2006)

stabilization of Lys33 by its direct interactions to Glu51 and Asp145. An interesting feature can be seen in the results of Glu51 and Asp145. In this case, one would expect a similar behavior as for Lys33, but it is not the case. For Glu51, the effect of conformational reorganization is practically nonexistent, and the effect of binding to Cyclin A is again irrelevant for its stabilization. This is striking, because one would expect a large stabilization change of this residue upon spatial reorganization of the PSTAIRE. When looking at the matrix of interactions in Figure 4a we

Fig. 3. Bar plots showing the stabilization for all residues in CDK2 in three different conformational states: (a) unbound CDK2 ($\Delta\Delta G^{solv}_{i,CDK2}$); (b) bound CDK2 conformation with calculations performed in the absence of Cyclin A ($\Delta\Delta G^{solv}_{i,CDK2^*}$); and (c) bound CDK2 ($\Delta\Delta G^{solv}_{i,CDK2^*/CyclinA}$). The “ALL” label refers to the sum of all stabilities.



2-steps CDK2/CyclinA

Bonet, et al Proteins (2006)

[Figure: free energy scheme for soft complexes and strong complexes (catalytic, regulatory), with species A, B and A', B']



RING fingers / UBC interactions

Schepper, et al Proteins (2009)






Norma Díaz-Vergara



[Figure: two panels, Transient Proteins and Permanent Proteins. Axes as labelled on the slide: True Positive Rate vs. False Negative Rate, both from 0 to 1. Curves: GPDLD, GRobetta, ASA, and their transient-specific (Tran) and permanent-specific (Perm) variants, including GPDLDTran + ASA]

Keskin and coworkers; Díaz-Vergara, et al in preparation



AYTON & VOTH, COSB (2007)

going multiscale



Coarse grain models

• average contribution of groups of atoms

• use of the most effective variables (greatest conformational change from the smallest energy change)
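
The first bullet can be illustrated with a minimal sketch, assuming each group of atoms is replaced by one bead placed at its mass-weighted centre. The function name, the data layout and the toy coordinates are assumptions made for this example, not the mapping used in the work presented here.

```python
def bead_position(atoms):
    """Return the mass-weighted centre of a group of atoms.

    `atoms` is a list of (mass, (x, y, z)) tuples; this layout is an
    assumption made for the example.
    """
    total_mass = sum(mass for mass, _ in atoms)
    return tuple(
        sum(mass * xyz[axis] for mass, xyz in atoms) / total_mass
        for axis in range(3)
    )


# Toy group of three atoms (masses in u, coordinates in angstrom).
side_chain = [
    (12.0, (0.0, 0.0, 0.0)),
    (12.0, (1.5, 0.0, 0.0)),
    (16.0, (1.5, 1.2, 0.0)),
]
print(bead_position(side_chain))
```

Averaging over the group removes the fast internal motions of its atoms, which is what allows a coarse grain model to sample large conformational changes cheaply.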

Marrink et al. (2003-2011)



Levitt, NSB (2004)

energy vs sampling



Johnston, Villà-Freixa, in progress

energy vs sampling



EXPLORATION (Coarse-grained) / REFINEMENT (All-Atom)

[Figure: two potential energy surfaces, energy vs. internal coordinates, one coarse-grained and one all-atom, linked by a coordinates mapping]



IV. Concluding Remarks

This work develops and examines a method for calculations of activation entropies of chemical reactions in solution. The method developed focused on the entropic contribution of the reacting fragments. Reliable evaluation of this contribution is essential for further progress in the understanding of the role of entropic effects in enzyme catalysis. Our approach involves the thermodynamic cycle of Figure 1, where the activation barriers are considered for two paths; in one path the reacting fragments are restrained to a single reaction coordinate and in the second path they are allowed to move in the subspace perpendicular to the reaction coordinate. The difference between these two activation barriers can provide our $-T(\Delta S^{\ddagger})'$. However, instead of performing a direct calculation along the above paths we use the free energies $\Delta G'_{RS}$ and $\Delta G'_{TS}$ obtained by starting from the states where the fragments in the RS and TS, respectively, are forced to stay at a given $\bar{R}$ (by a strong Cartesian restraint) and then allowing the fragments to move by releasing the restraints. The difference between $\Delta G'_{RS}$ and $\Delta G'_{TS}$ gives us the desired $-T(\Delta S^{\ddagger})'$ and a residual contribution from the enthalpy of the system. This residual contribution was minimized by finding restraint coordinates that minimized $|\Delta G'_{RS}|$ and $|\Delta G'_{TS}|$.

In order to obtain converging results, it is essential to have ability to perform an extensive sampling of the available potential surface and this cannot be done at present with high-level ab initio approaches. Thus, we use here the EVB potential surfaces. These surfaces can provide a good approximation for the corresponding ab initio potential surfaces and describe in a consistent way the effect of the solvent on the solute Hamiltonian.

The value of $-T(\Delta S^{\ddagger})'$ found in this work is smaller than the value predicted by traditional estimates that involve the loss of three translations and three rotations in the transition state of a bimolecular reactions (refs 1-3) and amount to ∼15 kcal/mol. The reason for this becomes clear upon examination of the present system. It appears that many degrees of freedom have similar motions at the ground state and the transition state. This is the case, for example, for the second water molecule that became H3O+ at the transition state but still retained a large configurational freedom. Thus, a large part of the entropic contribution in the RS remains also in the TS. In view of the present result, it is quite likely that previous consideration of the entropic contributions to enzyme catalysis (e.g., refs 1-3) reflect an overestimate, since the entropic contributions of the reference solution reaction were probably overestimated. However, a more conclusive study of this important issue requires one to use the present approach in studies of the contribution of the substrates motion to the activation free energies of enzyme catalysis. Such studies are now in progress in our laboratory.

Acknowledgment. This work was supported by the NIH Grant GM24492. J.V. acknowledges EMBO fellowship ALTF 509-1998. We are grateful to Dr. J. Florian and Mr. C. F. Jen for insightful discussion.

Appendix

In order to obtain the solute entropic contribution to a transfer between two potential surfaces ($U_I \to U_{II}$), we have to separate the corresponding partition functions to solute and solvent contributions by writing

$$Q_N = \int dR \int dr\, e^{-U_N(R,r)\beta} = \int d\bar{R} \left[ \int dR \int dr\, \delta(R - \bar{R})\, e^{-U_N(R,r)\beta} \right] = \int d\bar{R}\, p_N(\bar{R}) = \int d\bar{R}\, e^{-W_N(\bar{R})\beta} \qquad (A1)$$

where $U_N$ is the potential surface for state N and $\delta(R - \bar{R})$ indicates that the corresponding function would be collected at $\pm\Delta R/2$. Here R and r are the solute and solvent coordinate, respectively, $p_N(\bar{R})$ is the solute unnormalized probability distribution evaluated at the given $\bar{R}$ averaged over all the solvent coordinates and $W_N(\bar{R})$ is the corresponding potential of mean force (PMF) which includes, of course, the solute and solvent contributions. Expanding this potential around its minimum gives

$$W_N(\bar{R}) = U_N(\bar{R}) + \Delta g^{N}_{sol}(\bar{R}) = W_N^0(\bar{R}_0^N) + (K_W^N/2)(\bar{R} - \bar{R}_0^N)^2 \qquad (A2)$$

where we could, of course, use R but we took $\bar{R}$ as our variable to indicate that this is our restraint coordinate. Here, $\Delta g_{sol}(\bar{R})$ is the solvation free energy at $\bar{R}$, $K_W$ is the force constant of the quadratic term of our expansion and $\bar{R}_0^N$ is the value of $\bar{R}$ at the minimum of $W_N$.

Now, for the sake of simplicity, we continue by examining a case where the PMF corresponds to a one-dimensional solute coordinate, although extension to many dimensions is straightforward. Our one-dimensional case is described by the thermodynamic cycle of Figure 3. In this cycle, which is the equivalent of Figure 1, we divide R to segments that correspond to our $q_N$'s. Each of these segments can be defined by the delta function of eq A1 or by introducing a strong quadratic constraint ($(K_1/2)(R - \bar{R})^2$) as is done in the present work. In this case, we have

$$W'_N(\bar{R}) = W_N^0(\bar{R}_0^N) + (K_W/2)(\bar{R} - \bar{R}_0^N)^2 + (K_1/2)(\bar{R} - \bar{R}_0^N)^2 = W_N(\bar{R}) + (K_1/2)(\bar{R} - \bar{R}_0^N)^2 \qquad (A3)$$

Next, we use eq A2 and our quadratic constraint and evaluate the relevant partition functions.

$$Q_N = e^{-W_N^0(\bar{R}_0^N)\beta} \int_{-\infty}^{\infty} d\bar{R}\, e^{-(K_W^N/2)(\bar{R} - \bar{R}_0^N)^2 \beta} = e^{-W_N^0(\bar{R}_0^N)\beta} \sqrt{2\pi / (\beta K_W^N)}$$

Figure 3. A schematic illustration of the effect of $\bar{R}$ and the enthalpic contribution to $\Delta G'$. The figure considers one-dimensional potentials of mean force (W) for two systems and the effect of confining the system to small segments of R by a strong constraint ($K_1$). The figure is essentially a rigorous equivalent of Figure 1 where the effect of restraining the system in different $\bar{R}$'s can be formulated. As explained in the Appendix and illustrated here, the value of $\Delta G'$ depends on $\bar{R}$ and the $\Delta G'$ obtained with the minimum of the corresponding W (i.e., $\bar{R}_0$) has the smallest absolute value. This $\Delta G'$ is our $-T\Delta S'$.

Activation Entropies of Chemical Reactions, J. Phys. Chem. B, Vol. 104, No. 18, 2000, 4583

Strajbl et al. JPCB (2000), Villà et al. PNAS (2000)
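
The closed form for the partition function of a harmonic PMF in the Appendix excerpt above is a Gaussian integral. A minimal numerical check, assuming arbitrary illustrative values for the force constant and temperature (neither is taken from the paper), could look like this:

```python
import math


def harmonic_q_numeric(k_w, beta, half_width=20.0, n=200001):
    """Trapezoid-rule integral of exp(-beta * (k_w / 2) * x**2) over x."""
    dx = 2.0 * half_width / (n - 1)
    total = 0.0
    for i in range(n):
        x = -half_width + i * dx
        weight = 0.5 if i in (0, n - 1) else 1.0  # end points count half
        total += weight * math.exp(-beta * 0.5 * k_w * x * x)
    return total * dx


def harmonic_q_analytic(k_w, beta):
    """Closed form of the same integral: sqrt(2 * pi / (beta * k_w))."""
    return math.sqrt(2.0 * math.pi / (beta * k_w))


# Illustrative values only: k_w in kcal/(mol A^2), beta = 1/kT near room temperature.
k_w, beta = 10.0, 1.0 / 0.593
print(harmonic_q_numeric(k_w, beta), harmonic_q_analytic(k_w, beta))
```

The constant prefactor with the value of the PMF at its minimum is left out, since it multiplies both sides equally.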

is the change of the parameters associated with the X atom for the residue being mutated. Furthermore, it is not necessary to force the folded system to unfold. Additionally, the unfolded state can be modeled using only the neighboring residues of the residue to be mutated, greatly simplifying the calculation of $\Delta G^{uf}_{N_{sp} \to M_{sp}}$ and reducing the computational cost involved. These computational savings can be reinvested by increasing the number and length of frames used to calculate $\Delta G^{f}_{N_{sp} \to M_{sp}}$ and $\Delta G^{uf}_{N_{sp} \to M_{sp}}$ via the free-energy perturbation method, which greatly improves the accuracy of the calculation.

As a test case for the performance of Eq. (31), we have chosen to examine the pseudo-wild-type ubiquitin and the Asp21Asn mutant discussed in our previous work (ref 21; see also Ref. 30). The three-dimensional representation of this system is shown in Figure 6. We have chosen this system because it has been well studied by other

Figure 5: The thermodynamic cycles used to calculate the change in free energy of unfolding upon mutation (see text for details).

CG Model Simulations of Protein Landscapes

Messer et al. Proteins (2010)
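
The free-energy perturbation method mentioned in the excerpt above can be sketched with the standard Zwanzig exponential average. This is a generic illustration, not the protocol of the paper: the function name and the toy energy differences are assumptions.

```python
import math

KT_300K = 0.596  # k_B * T at 300 K in kcal/mol


def fep_delta_g(delta_u_samples, kt=KT_300K):
    """Zwanzig estimator: dG = -kT * ln < exp(-dU / kT) >.

    `delta_u_samples` holds U_target - U_reference for frames sampled on
    the reference potential. The minimum is subtracted before
    exponentiating to keep the sum numerically stable.
    """
    shift = min(delta_u_samples)
    mean_exp = sum(
        math.exp(-(du - shift) / kt) for du in delta_u_samples
    ) / len(delta_u_samples)
    return shift - kt * math.log(mean_exp)


# Toy energy differences in kcal/mol, invented for the example.
print(fep_delta_g([0.8, 1.1, 0.9, 1.4, 1.0]))
```

The estimate is never larger than the plain average of the energy differences, and it converges slowly when the two potentials overlap poorly, which is why more and longer frames improve accuracy.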

simplified reference potentials



be used in the method. The energy function used in the SCP step consists of the van der Waals and dihedral energy terms as defined by the AMBER99 force field (ref 55).

The method to place the side-chains is a straightforward hill climbing algorithm. It starts by generating an initial structure that contains positions for all of the side-chains based on the backbone atom positions. The initial structure is constructed by placing the rotamer on each residue that has the minimal energy between the side-chain and the backbone atoms of the other residues. During this procedure any rotamer that has interaction energy with the backbone higher than a user defined cutoff is discarded and is no longer considered in further iterations. The energy cutoff helps to improve the efficiency by eliminating any side-chains that have steric clashes with atoms in the backbone. In this work an energy cutoff of 100 kcal/mol was used. Unlike the Xiang and Honig method, we only use this one structure as the initial conformation rather than generating 120 starting conformations. This was done to improve efficiency, even if it could cause a slight decrease in accuracy. However, the results show that the method still performs well. Additionally, there is evidence that most side-chains can be placed correctly by only using their interactions with the backbone (ref 57) and other methods also use this as the initial structure (ref 54).

Starting from the initial structure an iterative procedure is used to find side-chains with the lowest energy. Each side-chain is selected in turn, and the interaction energy between the possible rotamers of currently selected side-chain and all of the other currently placed side-chains and the backbone is considered. If there is another rotamer that has a lower energy, the current rotamer is replaced by the lower energy rotamer. This continues until after a full iteration over the entire protein none of the rotamers are replaced or until a user specified maximum number of iterations is reached. In the results of this paper the maximum number of iterations allowed was 10, which was never reached when reconstructing the coarse-grain simulations of src-SH3 and S6. The rotamers can be considered sequentially down the chain or in a random order. We found that for the proteins studied in this paper the order of iteration did not affect the results.

Side-chain minimization step

After the SCP a number of side-chains in very high-energy conformations were detected. The all-atom minimization could only fix a small fraction of the high-energy

Figure 1: Cartoon illustration of the RACOGS method for a short peptide with the sequence VAL-ASP-SER-LEU-VAL. (1) Starting from the Cα atoms the backbone atoms are added. (2) After the backbones are added the side-chains are placed. (3) The first and third amino acids, circled, are clashing and causing a high energy interaction. The side-chain minimization step is performed on the first amino acid and resolves the clash. The last step of adding hydrogens and performing an all-atom minimization is not shown.

Multiscale Analysis of Protein Landscapes

DOI 10.1002/prot PROTEINS 649
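
The hill climbing placement described in this excerpt can be sketched as follows. This is a toy illustration under stated assumptions: the energy callbacks `e_backbone` and `e_pair`, the integer rotamer labels and the example energies are hypothetical, and every residue is assumed to keep at least one rotamer below the cutoff.

```python
def place_side_chains(rotamers, e_backbone, e_pair, cutoff=100.0, max_iter=10):
    """Hill climbing over rotamers; returns one rotamer per residue."""
    n_res = len(rotamers)
    # Discard rotamers whose backbone interaction exceeds the cutoff.
    allowed = [
        [r for r in rotamers[i] if e_backbone(i, r) <= cutoff]
        for i in range(n_res)
    ]
    # Initial structure: lowest backbone interaction energy per residue.
    current = [
        min(allowed[i], key=lambda r: e_backbone(i, r)) for i in range(n_res)
    ]

    def local_energy(i, r):
        others = sum(
            e_pair(i, r, j, current[j]) for j in range(n_res) if j != i
        )
        return e_backbone(i, r) + others

    for _ in range(max_iter):
        changed = False
        for i in range(n_res):
            best = min(allowed[i], key=lambda r: local_energy(i, r))
            if local_energy(i, best) < local_energy(i, current[i]):
                current[i] = best
                changed = True
        if not changed:  # a full pass without replacements: converged
            break
    return current


# Toy system: two residues, two rotamers each; rotamers 0 and 0 clash.
toy_backbone = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 0.0, (1, 1): 0.5}
result = place_side_chains(
    rotamers=[[0, 1], [0, 1]],
    e_backbone=lambda i, r: toy_backbone[(i, r)],
    e_pair=lambda i, ri, j, rj: 10.0 if ri == 0 and rj == 0 else 0.0,
)
print(result)
```

In the toy case the backbone-only start places both residues in the clashing rotamer, and the first pass resolves the clash by moving the first residue.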

(de)constructing CG-aA representations

Heath et al. Proteins (2007)

RACOGS

Ávila et al. CPPS (2011)



all atom vs coarse grain force field

AMBER COARSE GRAINED FORCE FIELD

César L. Ávila, Nils J. D. Drechsel and Jordi Villà-Freixa

September 7, 2011

Abstract

abstract

1 functional form

nils: Can you specify the meaning of each term? Some are obvious but others are not at all, like the one containing the g function

jordi: The order of the terms in the “long” version corresponds with the order of the short version. All terms containing ss, mm and ms are sidechain-sidechain, mainchain-mainchain or mainchain-sidechain interactions respectively. They are: (1) mainchain-mainchain interactions where the “normal” amber forcefield is used. (2) sidechain-sidechain vdw. (3) sidechain-sidechain electrostatics. (4) selfenergy, which is the energy to transfer a residue from the gas phase into the protein and is e.g. larger for polar residues into nonpolar environments, thus it depends on neighboring atoms. (5) sidechain-mainchain vdw. (6) sidechain-mainchain electrostatics. (7) hydrogen bonding. (8) an additional mainchain-mainchain torsional potential to favor secondary structures. (9) additional mainchain-mainchain electrostatics (but I’m not sure for what exactly. Didn’t find information in the paper either, maybe César knows more)

$$V(r)_{amber} = \sum_{bonds} \kappa_b (b - b_0)^2 + \sum_{angles} \kappa_\theta (\theta - \theta_0)^2 + \sum_{dihedrals} (V_n/2)(1 + \cos[n\phi - \delta]) + \sum_{nonbij} \left[ (A_{ij}/r_{ij}^{12}) - (B_{ij}/r_{ij}^{6}) + (q_i q_j / r_{ij}) \right] + GB(r) \qquad (1)$$

nils: is the value 9 in the HB term fixed or is it variable too in the GA?

jordi: The 9 is actually a parameter that we optimize

$$V(r)_{ambercg} = U^{0}_{mm} + U^{ef}_{ss} + U^{QQ}_{ss} + U^{self}_{s} + U^{ef}_{ms} + U^{Qq}_{ms} + \Delta U^{HB}_{mm} + \Delta U^{\phi-\psi}_{mm} + \Delta U^{qq}_{mm} \qquad (2)$$




Adun

http://adun.imim.es

Johnston et al. JCC (2005), LNCS (2007)

Different local and remote (P2P) databases



[Figure 2 panels: CG potential energy landscape for alpha helix; CG potential energy landscape for beta hairpin. Axes: energy / kcal mol−1 vs. RMSD]

Figure 2: Correlation between RMSD and Energy for the complete sampling space of the model peptides. Hexagon binning for snapshots of the 300K trajectory for (Ala)15 (left) and (Val)5ProGly(Val)5 (right).

[Figure 3 panels: Rgyr vs. RMSD (Å)]

Figure 3: Peptide folding FES. Data for (Ala)15 (left) and (Val)5ProGly(Val)5 (right) are depicted. The RMSD is calculated against a model α–helix and β–hairpin respectively.


cases, the CG folding FES shows a single dominant non-native minimum present in addition to the native basin (ref 16). In agreement with the previous figures, it seems clear that the coarse grain model is problematic in handling β secondary structure interactions. The origin of this problem is not the HB pattern, but, as suggested above, the $U_{\phi-\psi}$ term. In order to analyze this fact, we project the FES with respect to the two main chain torsional coordinates and superimpose the expected Ramachandran angles for both peptides. Figure 4 clearly shows a displacement from the expected angles for the β–hairpin system, while a less pronounced displacement is seen in the (Ala)15 peptide.

Figure 4: Peptide folding FES projected onto backbone dihedral space. Trajectories for polyalanine are displayed on the left while those for the hairpin are shown on the right.

Despite the limitation of the coarse grain potential, especially for β–hairpin structures, the objective of this paper is to demonstrate the feasibility in building a complete pipeline for multiscale simulations in a portable software like Adun, so next, we analyze the ability of the method to reconstruct the free energy surface for the all atom system. Figure 5 shows the results for the corrected FE, obtained as follows. First a series of 8 × 8 (α–helix) and 16 × 16 (β–hairpin) representative structures were obtained from Figure 3. For each of these a free energy perturbation (FEP) protocol was carried out in which one slowly transforms the structure from a coarse grain representation to an all atom representation (see the Methods section for details). As the FEP involves the appearance of new atoms in the system, it is obvious that care must be taken not to make the simulation explode right after the first (fully coarse grain) FEP window. The final reconstructed all atom FES is shown in Figure 5. In terms of the final structures, the expected α–helix and β–hairpin structures are compared with the central structures in the relevant minimum region in Figure 4 and in Figure 5. The last row in Figure 6 shows a representative structure for the minimum FE region in Figure 5. In both Figure 5 and Figure 6 it can be seen that the FEP protocol does not significantly introduce changes into the shape of the CG FES when moving to the all atom representation. This result is significant, as it shows that the global pipeline of the method is entirely sound, as it brings about the possibility of improving individual modules of the protocol, in particular the quality of the coarse grain potential used as reference for the FEP.

[Figure 5 panels: Rgyr vs. RMSD (Å)]

Figure 5: Corrected folding FES for the (Ala)15 (left) and (Val)5ProGly(Val)5 (right) peptides derived from Figure 3, using the multiscale free energy perturbation approach to move from the CG to the AA model.
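
A folding FES over two coordinates such as RMSD and Rgyr is commonly estimated from the populations of a trajectory. The sketch below is a generic Boltzmann inversion of a two-dimensional histogram, not the analysis code behind the figures: the function name, the bin widths and the sample values are assumptions.

```python
import math
from collections import Counter


def free_energy_surface(samples, bin_widths, kt=0.596):
    """Return {bin: F} with F = -kT * ln(P / P_max), in the units of kt.

    `samples` is a list of (rmsd, rgyr) pairs; the most populated bin is
    the zero of free energy. kt defaults to 300 K in kcal/mol.
    """
    counts = Counter(
        tuple(int(math.floor(value / width))
              for value, width in zip(sample, bin_widths))
        for sample in samples
    )
    n_max = max(counts.values())
    return {b: -kt * math.log(n / n_max) for b, n in counts.items()}


# Toy trajectory: four frames in one basin, one frame elsewhere.
frames = [(3.1, 4.3)] * 4 + [(5.2, 4.9)]
fes = free_energy_surface(frames, bin_widths=(1.0, 0.5))
print(fes)
```

Bins that are never visited carry no estimate at all, so the quality of such a surface depends directly on how well the coarse grain run samples the space.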

CG PES and reconstruction of the AA PES

alpha-helix beta hairpin

all atom



Target optimal Coarse Grain CG to AA



testing in CASP



[Figure 6 panels: Rgyr vs. RMSD (Å); rows labelled 1HRC, 1PRB, 3NZL]

Figure 6: Coarse grain FES (left column) and all atom FES reconstruction (right column) for three selected structures of increasing complexity: A fast folder, PDB code 1HRC (top row); a two state folder, PDB code 1PRB (middle row), and a challenging CASP 2010 structure 3NZL (bottom row)



Testing on CASP 2010 targets

Table 1: Cα RMSD obtained for the target structures: A) after ab-initio folding with Rosetta; B) the best structure compared to the native in the 300K trajectory; C) the best structure in the bin corresponding to the minimum in the corrected free energy surface compared to the native structure; D) all-atomic representation of the reference structure with which the free energy perturbation was carried out. The lines correspond to: first, one-state folder; second, two-state folder; third, CASP 2008 targets; fourth, CASP 2010 targets

PDB-ID   RMSD        RMSD              RMSD (CG, best     RMSD (AA, reference   residues
         (Rosetta)   (globally best)   in minimum bin)    structure)
1PRB      4.94        5.31             10.32              11.03                   53
1HRC      8.24        7.64             11.49               9.77                  103
1YCC      6.77        8.11              8.71               8.61                  108
2K53      6.64        6.38              6.56               8.20                   76
2K5C      9.53        8.03             10.77              11.51                   95
2K5E      6.19        5.19              5.36               5.96                   73
2KDL     10.36        9.61              9.89              11.38                   56
3DAI     13.82       12.18             14.06              13.96                  130
2KY4     11.39       10.72             10.82              13.32                  149
2L06      8.15        9.38             11.14               9.89                  155
2L09      9.26        7.34              9.52               8.39                   62
3NEU      9.59        9.64              9.94              10.28                  125
3NRW     12.49        8.29             11.74              14.22                  117
3NZL      6.89        6.55              7.00              10.38                   83

Free energy surfaces

As in the previous section, the first step in our protocol is to obtain the FES generated by exploring the coarse grain PES. Our starting points are now the ab initio folded structures obtained before for the 14 targets. After the initial CG FES analysis we move onto the all atom reconstruction for each structure. Table 1, Figure 6 and Figures S3-S6 show the results obtained.

We first analyze the different columns in Table 1. Starting from the Rosetta results (second column in the table) we observe that the sampling of the CG potential produces structures that in some cases explore regions closer to the native structure than the Rosetta protocol itself. This is the case for 10 of the 14 structures analyzed. Such
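
The count of 10 out of 14 quoted for Table 1 can be checked directly from its first two RMSD columns. The short script below copies those values from the table and counts the targets for which the globally best structure of the 300K trajectory is closer to the native than the Rosetta model; the variable names are only illustrative.

```python
# (PDB ID, RMSD Rosetta, RMSD globally best), copied from Table 1.
table1 = [
    ("1PRB", 4.94, 5.31), ("1HRC", 8.24, 7.64), ("1YCC", 6.77, 8.11),
    ("2K53", 6.64, 6.38), ("2K5C", 9.53, 8.03), ("2K5E", 6.19, 5.19),
    ("2KDL", 10.36, 9.61), ("3DAI", 13.82, 12.18), ("2KY4", 11.39, 10.72),
    ("2L06", 8.15, 9.38), ("2L09", 9.26, 7.34), ("3NEU", 9.59, 9.64),
    ("3NRW", 12.49, 8.29), ("3NZL", 6.89, 6.55),
]

# Targets where CG sampling reached a lower RMSD than the Rosetta model.
improved = [pdb for pdb, rosetta, best in table1 if best < rosetta]
print(len(improved), "of", len(table1), "targets:", ", ".join(improved))
```

The four exceptions are 1PRB, 1YCC, 2L06 and 3NEU.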

sequence → secondary structure analysis → generate 3aa + 9aa structural fragments → generate initial tertiary structure → relax tertiary structure → energy score structure → choose structure with lowest energy

x 10 (repeat count shown in the diagram)

Figure 7: Protocol used to generate the initial structures for structures in Table 1 with Rosetta



Improving the model

• better sampling
  – statistical potentials as reference
  – transition path sampling and string method
  – GPU computing
  – better global reaction coordinate

• better energy
  – polarizable models
  – solvation models
  – simple improvement of the CG parameters



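rethinking the problem: hotspots

[Figure: thermodynamic cycle connecting the states Native Unfolded, Native Folded, Native Complex, Mutant Unfolded, Mutant Folded and Mutant Complex. Labelled free energies: $\Delta G_{f}^{N}$, $\Delta G_{a}^{N}$, $\Delta G_{f}^{M}$, $\Delta G_{a}^{M}$, $\Delta G_{u}^{N \to M}$, $\Delta G_{f}^{N \to M}$, $\Delta G_{c}^{N \to M}$]

$$\Delta\Delta G_{B}^{N \to M} = \Delta G_{B}^{M} - \Delta G_{B}^{N} = (\Delta G_{a}^{M} - \Delta G_{f}^{M}) - (\Delta G_{a}^{N} - \Delta G_{f}^{N})$$

$$\Delta\Delta G_{B}^{N \to M} = \Delta G_{c}^{N \to M} - \Delta G_{f}^{N \to M} - \Delta G_{u}^{N \to M}$$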
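
The first relation on this slide, the change in binding free energy upon mutation written as a difference of two differences, can be evaluated as in the sketch below. The function name and the numbers are invented for illustration, and reading the labels f and a as the folding step and the step leading to the complex is an assumption.

```python
def ddg_binding(dg_a_native, dg_f_native, dg_a_mutant, dg_f_mutant):
    """ddG_B(N->M) = (dG_a^M - dG_f^M) - (dG_a^N - dG_f^N), in kcal/mol."""
    dg_b_native = dg_a_native - dg_f_native
    dg_b_mutant = dg_a_mutant - dg_f_mutant
    return dg_b_mutant - dg_b_native


# Invented values in kcal/mol: the mutant binds 2.5 kcal/mol more weakly.
print(ddg_binding(dg_a_native=-18.0, dg_f_native=-8.0,
                  dg_a_mutant=-15.0, dg_f_mutant=-7.5))
```

A positive result means the mutation weakens binding, which is the signature of a hotspot residue.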



rethinking the problem: hotspots



changes on binding free energy upon mutation

[Figure: free energy levels of the Unfolded, Folded and Complex states for WT, Mutant A, Mutant B and Mutant C, with the folding and binding steps marked. Cases shown: destabilization of complex; destabilization of unbound protein; mixed]



extensive optimization

Trajectory reconstruction

explicit vs CG Pearson correlation



CG model optimization with GA

[Flowchart, as far as it can be recovered from the slide]
start → initial parameters → initialization: mutate to build the initial population
control process (serial): spawns processes, collects new parameters, checks if completed
spawned processes (parallel): calculate fitness
select parents with a rank based roulette-wheel selector
recombine with a uniform recombination operator, mutate with a gaussian mutation operator
parent 1 + parent 2 → offspring 1, offspring 2
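
The operators named in the flowchart (rank based roulette-wheel selection, uniform recombination, gaussian mutation) can be sketched in a few lines. This is a serial toy version under stated assumptions, not the parallel implementation of the talk: the fitness is the Pearson correlation between invented "explicit" energies and a two-parameter linear "CG" energy, and keeping the best individual in each generation is an addition made for the example.

```python
import math
import random


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return 0.0 if sxx == 0.0 or syy == 0.0 else sxy / math.sqrt(sxx * syy)


def rank_roulette(population, fitnesses, rng):
    """Pick one individual with probability proportional to its rank."""
    order = sorted(range(len(population)), key=lambda i: fitnesses[i])
    ranks = list(range(1, len(order) + 1))  # worst gets 1, best gets N
    return population[rng.choices(order, weights=ranks, k=1)[0]]


def uniform_recombination(a, b, rng):
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def gaussian_mutation(individual, sigma, rng):
    return [x + rng.gauss(0.0, sigma) for x in individual]


def evolve(fitness, n_params, pop_size=20, generations=50, sigma=0.1, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(-1.0, 1.0) for _ in range(n_params)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        fitnesses = [fitness(ind) for ind in population]
        best = population[max(range(pop_size), key=lambda i: fitnesses[i])]
        children = [best]  # elitism: an assumption of this sketch
        while len(children) < pop_size:
            p1 = rank_roulette(population, fitnesses, rng)
            p2 = rank_roulette(population, fitnesses, rng)
            child = uniform_recombination(p1, p2, rng)
            children.append(gaussian_mutation(child, sigma, rng))
        population = children
    return max((fitness(ind), ind) for ind in population)


# Invented data: two features per structure and "explicit" energies.
FEATURES = [(1, 0), (0, 1), (1, 1), (2, 1), (1, 2), (3, 1)]
EXPLICIT = [2.0 * f1 - 1.0 * f2 for f1, f2 in FEATURES]


def toy_fitness(params):
    cg = [params[0] * f1 + params[1] * f2 for f1, f2 in FEATURES]
    return pearson(cg, EXPLICIT)


print(evolve(toy_fitness, n_params=2))
```

Because the Pearson coefficient ignores overall scale and offset, a high fitness only guarantees that CG and explicit energies are ranked consistently, not that their absolute values agree.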




[Figure: simulation 9, Pearson coefficient vs. generation (0 to 350); curves: mean, standard deviations, best, worst]

[Figure: simulation 9, variance per parameter, x axis labelled "bond types": VDW2, VDW3, HBOND, BONDED, CGX-CT-N, CGX-CT-N3, CGX-CT-H1, CGX-CT-C, AHELIX, BSHEET, GTURN, LHELIX]

[Figures: energy comparison, coarse-grained [kcal/mol] vs. explicit [kcal/mol], four panels; GA correlation, coarse-grained vs. explicit, both axes from 0 to 1]

we still have a problem



Still work in progress

[Figure: Hotspot prediction. Predicted Gbinding (kcal/mol) vs. Experimental Gbinding (kcal/mol), divided into TP, FP, TN and FN regions. Labelled mutations: AK27A, AR59A, AR83Q, AR87A, AH102A, DY29F, DY29A, DD35A, DW38F, DD39A, DT42A, DW44F, DE76A, DE80A]
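
The TP, FP, TN and FN regions of such a plot follow from applying one cutoff to both axes. The sketch below assumes a 2 kcal/mol cutoff, a common convention for hotspots that is not stated on the slide, and uses invented mutation names and values.

```python
def classify_hotspots(predicted, experimental, threshold=2.0):
    """Count TP, FP, TN, FN for values in kcal/mol keyed by mutation name."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for mutation, exp_value in experimental.items():
        is_hotspot = exp_value >= threshold
        called_hotspot = predicted[mutation] >= threshold
        if is_hotspot and called_hotspot:
            counts["TP"] += 1
        elif called_hotspot:
            counts["FP"] += 1
        elif is_hotspot:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


# Invented example values, not data from the talk.
experimental = {"m1": 3.0, "m2": 0.5, "m3": 2.5, "m4": 1.0}
predicted = {"m1": 4.0, "m2": 2.2, "m3": 1.0, "m4": 0.3}
print(classify_hotspots(predicted, experimental))
```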



Beyond the current model

– using statistical potentials as a reference (collaboration with Janusz Bujnicki, IIMCB Warsaw)

– including a semi-explicit solvation model (collaboration with Ken A. Dill, SBNY)



RNA Statistical potential

Raúl Alcántara



Non bonded interactions



Fitting non-bonded term

Matlab: initial guesses from GMM

Mathematica: non-linear fit



force field markup language: FFML

<mrow potential="FourierTorsion" numberOfAtoms="4" type="energy">
  <variable name="angle"/>
  <parameterdata number="1" name="param1"/>
  <!-- energy = param1 * (angle - param2)^2 -->
  <apply>
    <times/>
    <ci>param1</ci>
    <apply>
      <power/>
      <apply>
        <minus/>
        <ci>angle</ci>
        <ci>param2</ci>
      </apply>
      <cn>2</cn>
    </apply>
  </apply>
</mrow>

<mrow potential="FourierTorsion" numberOfAtoms="4" type="force">
  <variable name="dAngle"/>
  <variable name="angle"/>
  <!-- force term = 2 * param1 * (angle - param2) * dAngle -->
  <apply>
    <times/>
    <cn>2</cn>
    <ci>param1</ci>
    <apply>
      <minus/>
      <ci>angle</ci>
      <ci>param2</ci>
    </apply>
    <ci>dAngle</ci>
  </apply>
</mrow>

Meneu et al., in progress



SECIS elements simulations



[Figure 3 panels: a) Linear Alkanes, b) Linear Alkynes, c) Linear PAHs, d) Planar PAHs. y axis: ΔG (kcal/mol). Series: γA + b, Semi-Explicit assembly, TIP3P, Experiment]

Figure 3: The nonpolar solvation free energy for a series of a) linear alkanes, b) linear alkynes, c) polyaromatic hydrocarbons (PAHs) in a linear arrangement, and d) PAHs in a planar arrangement calculated using γA + b, Semi-Explicit assembly, and explicit solvent. For γA + b, the traditional (0.00542 × SAtot) + 0.92 was used (ref 18), and the TIP3P results are those obtained through explicit free energy calculations (ref 36). Experimental comparisons to ∆G cannot be drawn with the linear alkynes or PAHs series, because they have a substantial polar term to the overall solvation.

Figure 4: Maps of the collective dispersion attraction about the solvent accessible surface (SAS) of a) n-pentane, b) cyclopentane, c) pent-1-yne, d) benzene, and e) pyrene. The color of the surface indicates the LJ well-depth, with blue starting at 0 kcal/mol and red lowering to deeper than 5 kcal/mol. Note the red “hot spots” around the triple bond in pent-1-yne and in the center of the benzene and pyrene ring planes. These indicate a significant enhancement of dispersion attraction with the surroundings. As these regions grow with increasing molecule size, these collective dispersion attractions will offset the cost of cavity formation in surrounding solvent. With a simple γA, all these surfaces would be a uniform blue.

Figure 3b shows solvation free energies for the linear alkynes, from the various models. Alkynes have a carbon-carbon triple bond at the end of the chain. In the GAFF forcefield (ref 49), the dispersion interaction well-depth is twice that of carbon-carbon single bonds. Like the explicit simulations, but unlike γA, the Semi-Explicit approach captures the more favorable aqueous solvation of the alkynes relative to the alkanes. Figures 4a and 4c show that the extra attraction for water of the alkynes is localized near the triple bond.

Hot spots: not all hydrocarbons are the same

Figures 4a and 4b show the LJ potential surfaces for n-pentane and cyclopentane. Seams between atom surfaces form favorable interaction “hot spot” regions, while methyl end-groups of the alkane chain are a deeper blue and less favorable. The surface area of cyclopentane is less than n-pentane, but this only accounts for a modest decrease of 0.2 kcal/mol in ∆G when using γA + b. This modest change is much less than the greater than 1 kcal/mol decrease seen experimentally (refs 9, 50). Semi-Explicit assembly includes the effects of these “hot spots” and lowers ∆G by an additional 0.4 kcal/mol. The remaining difference between the estimated and the experimental value likely comes from approximations in the Semi-Explicit assembly approach, such as the void term discussed previously and the incomplete capturing of solvent-solvent interaction enhancement from optimal hydration cages.
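
The γA + b estimate quoted in the Figure 3 caption is a one-line formula. A minimal sketch, assuming the surface area is given in Å² and the result is in kcal/mol (the excerpt does not state the units), with illustrative names:

```python
GAMMA = 0.00542  # assumed units: kcal / (mol * A^2)
B = 0.92         # assumed units: kcal / mol


def nonpolar_dg(surface_area):
    """Nonpolar solvation free energy from the gamma * A + b model."""
    return GAMMA * surface_area + B


# Arbitrary surface areas, for illustration only.
for area in (100.0, 200.0, 300.0):
    print(area, round(nonpolar_dg(area), 3))
```

Because the estimate depends on the total area alone, two molecules with equal area always get the same value, which is why this model cannot reproduce the hot spot effects discussed in the text.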

Fennell et al. JACS (2010)

SEA water model



NILS



[Figure: scatter plot against TIP3P (kcal/mol), both axes from -1 to 4 kcal/mol; 504 molecule cluster, ADUN]

SEA-Water correlation with explicit water: 0.91

Semi-Analytic SEA-Water correlation with explicit water: 0.95

SEA-water vs TIP3P

Drechsel et al, in preparation



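summary

[Figure: thermodynamic cycle for wild type (wt) and mutant (m): $\Delta G_{fold}^{wt}$, $\Delta G_{bind}^{wt}$, $\Delta G_{fold}^{m}$, $\Delta G_{bind}^{m}$, connected by $\Delta\Delta G_{unfolded}^{wt-m}$, $\Delta\Delta G_{unbound}^{wt-m}$ and $\Delta\Delta G_{bound}^{wt-m}$]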



Janusz Bujnicki (IIMCB)
Ken Dill and Chris Fennell (UCSF)

Sergio Rubio
James Dalton (UAB)
Nils Drechsel (SBNY)
Michael A. Johnston (IBM)
Norma Díaz-Vergara (UAB)
César L. Ávila (U Tucumán)

aScidea: COMPUTATIONAL BIOLOGY SOLUTIONS

CÉSAR

NORMA

JAMES

NILS

MICHAEL

SERGIO

ACCESS TO INFORMATION



ACCESS TO INFORMATION



INTEGRATION



[Figure: E (kcal/mol) vs. $r_{OH}$; map over the ($\phi$, $\psi$) plane from -180 to 180; E (kcal/mol) vs. $r_{ij}$]

$$U_{sp} = U_{mm} + U^{ef}_{ss} + U^{QQ}_{ss} + U^{self}_{s} + U^{ef}_{ms} + U^{Qq}_{ms} + \Delta U^{HB}_{mm} + \Delta U^{\phi-\psi}_{mm} + \Delta U^{qq}_{mm}$$

$$U^{ef}_{ss} = \sum_{i<j} \epsilon^{0}_{ij} C^{scale}_{ij} \left[ 3 (r^{0}_{ij}/r_{ij})^{8} - 4 (r^{0}_{ij}/r_{ij})^{6} \right]$$

$$\Delta U^{HB}_{mm} = \begin{cases} -9 & r \le 2.0 \\ -9 \exp(-15 (r - 2.0)^2) & r > 2.0 \end{cases}$$

$$\Delta U^{\phi-\psi}_{mm} = \sum_{i=1}^{4} A_i \, g(\phi - \phi^{i}_{0}, w^{i}_{0}) \, g(\psi - \psi^{i}_{0}, w^{i}_{0})$$

$$g(\xi, w) = \exp(-0.693 (1 - \cos(\xi)) / \sin(w/2))$$

$$U^{self}_{s} = \sum_{i} U_{np}(N^{i}_{np}) + U_{polar}(N^{i}_{polar})$$

$$U_{np} = \begin{cases} 4 \exp[-0.2 (N_{np} - 6)^2] & N_{np} \le 6 \\ 4 & N_{np} > 6 \end{cases}$$

$$U_{polar} = \begin{cases} -2 \exp[-0.2 (N_{polar} - 4)^2] & N_{polar} \le 4 \\ -2 & N_{polar} > 4 \end{cases}$$

Warshel and col. (2009-2010)
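
The piecewise hydrogen bond and self-energy terms on this slide translate directly into small functions. The sketch below reads the garbled symbols as minus signs and as "less than or equal", and assumes distances in Å and energies in kcal/mol; the function names are illustrative.

```python
import math


def delta_u_hb(r):
    """Hydrogen bond correction: flat at -9 up to 2.0, Gaussian decay beyond."""
    if r <= 2.0:
        return -9.0
    return -9.0 * math.exp(-15.0 * (r - 2.0) ** 2)


def u_np(n_np):
    """Self-energy term as a function of the number of nonpolar neighbours."""
    if n_np <= 6:
        return 4.0 * math.exp(-0.2 * (n_np - 6) ** 2)
    return 4.0


def u_polar(n_polar):
    """Self-energy term as a function of the number of polar neighbours."""
    if n_polar <= 4:
        return -2.0 * math.exp(-0.2 * (n_polar - 4) ** 2)
    return -2.0


for r in (1.8, 2.0, 2.2, 2.5, 3.0):
    print(r, round(delta_u_hb(r), 4))
```

Each function is continuous at its switching point, so the terms can be used in a minimization or dynamics run without energy jumps. The prefactor 9 in the hydrogen bond term is the kind of parameter a genetic algorithm could optimize.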
