Girona Seminar, 03/07/2012
multiscale structure and function
Multiscale simulations to identify hotspots in protein interfaces
César L. Ávila, Nils Drechsel, Michael A. Johnston and Jordi Villà-Freixa
Universitat Pompeu Fabra / Universitat de Vic
http://cbbl.imim.es
sequence->structure->function...
Guix et al. Brain (2009); Giupponi et al. (in preparation)
how do we explore protein movements?
[Figure: potential energy surface along a reaction coordinate; point labels: M, SP]
example of the dynamics of a RING finger domain
great... as long as we have good knowledge of the structure. But what if we do not?
                      1         11        21        31        41
2NMS.pdb, chain A  18 GIPQITGPTTVNGLERGSLTVQCVYRSGWETYLKWWCRGAIWRDCKILVK
rat.B99990001.pdb  23 GCVPLRGPSSVTGTVGESLNVTCQYEERFKMNKKYWCRGSLVLLCKDIVR

                      51        61        71        81        91
2NMS.pdb, chain A  68 TSGSEQEVKRDRVSIKDNQKNRTFTVTMEDLMKTDADTYWCGIEKTGNDL
rat.B99990001.pdb  73 TGGSE.EARNGRVSIRDDRDNLTFTVTLQNLTLEDAGTYMCAVDIPLIDH

                      101       111
2NMS.pdb, chain A 118 GVTVQVTIDPAP
rat.B99990001.pdb 122 SFKVELSVVPGN
[Figure: RMSD plots]
James Dalton, Sergio Rubio
in collaboration with Joan Sayós
A more challenging problem: CFTR
James Dalton
Dalton, Kalid, Shushan, Ben-Tal and Villà-Freixa. J. Chem. Inf. Mod. in press
Serohijos et al. PNAS (2008); Mornon et al. Cell Mol Life Sci (2008)
homology modeling + docking + MD
CRIS CHARNECO
Mulero et al. submitted
Explicit microscopic: major convergence challenges
Simplified microscopic (PDLD)
Semi-macroscopic (PDLD/S-LRA)
Schutz et al., Proteins (2001)
Beyond dynamics: energetics
PDLD/S-LRA
Warshel and coworkers
Khan et al. (unpublished)
residue differential Solvation
2-steps CDK2/CyclinA
Bonet et al. Proteins (2006)
stabilization of Lys33 by its direct interactions to Glu51 and Asp145. An interesting feature can be seen in the results of Glu51 and Asp145. In this case, one would expect a similar behavior as for Lys33, but it is not the case. For Glu51, the effect of conformational reorganization is practically nonexistent, and the effect of binding to Cyclin A is again irrelevant for its stabilization. This is striking, because one would expect a large stabilization change of this residue upon spatial reorganization of the PSTAIRE. When looking at the matrix of interactions in Figure 4a we

Fig. 3. Bar plots showing the stabilization for all residues in CDK2 in three different conformational states: (a) unbound CDK2 ($\Delta\Delta G^{solv}_{i,\mathrm{CDK2}}$); (b) bound CDK2 conformation with calculations performed in the absence of Cyclin A ($\Delta\Delta G^{solv}_{i,\mathrm{CDK2}'}$); and (c) bound CDK2 ($\Delta\Delta G^{solv}_{i,\mathrm{CDK2}'/\mathrm{CyclinA}}$). The “ALL” label refers to the sum of all stabilities.
[Figure: free energy diagram for the complexes A B and A' B'; labels: soft complexes, strong complexes, catalytic, regulatory]
RING fingers / UBC interactions
Schepper, et al Proteins (2009)
Norma Díaz-Vergara
[Figure: two plots of True Positive Rate against False Negative Rate (axis labels as printed on the slide), both axes from 0 to 1.
Transient Proteins: GPDLD, GRobetta, GPDLD Tran, GRobetta Tran, GPDLD Tran + ASA.
Permanent Proteins: GPDLD, GRobet, GRobet Perm, ASA Perm, GPDLD Perm.]
Keskin and coworkers; Díaz-Vergara, et al in preparation
AYTON & VOTH, COSB (2007)
going multiscale
Coarse grain models
• average contribution of groups of atoms
• use of the most effective variables (greatest conformational change from the smallest energy change)
Marrink et al. (2003-2011)
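The first bullet can be illustrated with a small sketch. It assumes, for illustration only, that each bead sits at the mass-weighted average position of the atoms it replaces. The function names and the atom-to-bead grouping are hypothetical and are not taken from the models cited on the slide.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def center_of_mass(masses: Sequence[float], coords: Sequence[Vec3]) -> Vec3:
    """Mass-weighted average position of a group of atoms."""
    total = sum(masses)
    return tuple(
        sum(m * c[axis] for m, c in zip(masses, coords)) / total
        for axis in range(3)
    )


def coarse_grain(masses: Sequence[float],
                 coords: Sequence[Vec3],
                 groups: Sequence[Sequence[int]]) -> List[Tuple[float, Vec3]]:
    """Map atoms onto beads: one (total mass, position) pair per index group."""
    beads = []
    for group in groups:
        g_masses = [masses[i] for i in group]
        g_coords = [coords[i] for i in group]
        beads.append((sum(g_masses), center_of_mass(g_masses, g_coords)))
    return beads


# Hypothetical four-atom fragment mapped onto two beads.
masses = [1.0, 3.0, 12.0, 12.0]
coords = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 4.0, 0.0)]
beads = coarse_grain(masses, coords, groups=[[0, 1], [2, 3]])
```

Other averaging choices (geometric centre, a single representative atom) fit the same interface by swapping `center_of_mass`.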
Levitt, NSB (2004)
energy vs sampling
Johnston, Villà-Freixa, in progress
energy vs sampling
[Figure: Potential Energy Surface Coordinates Mapping. Two panels of Energy against Internal Coordinates, labelled Coarse-grained and All-Atom; stages: EXPLORATION, REFINEMENT]
IV. Concluding Remarks

This work develops and examines a method for calculations of activation entropies of chemical reactions in solution. The method developed focused on the entropic contribution of the reacting fragments. Reliable evaluation of this contribution is essential for further progress in the understanding of the role of entropic effects in enzyme catalysis. Our approach involves the thermodynamic cycle of Figure 1, where the activation barriers are considered for two paths; in one path the reacting fragments are restrained to a single reaction coordinate and in the second path they are allowed to move in the subspace perpendicular to the reaction coordinate. The difference between these two activation barriers can provide our $-T(\Delta S^{\ddagger})'$. However, instead of performing a direct calculation along the above paths we use the free energies $\Delta G'_{RS}$ and $\Delta G'_{TS}$ obtained by starting from the states where the fragments in the RS and TS, respectively, are forced to stay at a given $\bar{R}$ (by a strong Cartesian restraint) and then allowing the fragments to move by releasing the restraints. The difference between $\Delta G'_{RS}$ and $\Delta G'_{TS}$ gives us the desired $-T(\Delta S^{\ddagger})'$ and a residual contribution from the enthalpy of the system. This residual contribution was minimized by finding restraint coordinates that minimized $|\Delta G'_{RS}|$ and $|\Delta G'_{TS}|$.

In order to obtain converging results, it is essential to have the ability to perform an extensive sampling of the available potential surface and this cannot be done at present with high-level ab initio approaches. Thus, we use here the EVB potential surfaces. These surfaces can provide a good approximation for the corresponding ab initio potential surfaces and describe in a consistent way the effect of the solvent on the solute Hamiltonian.

The value of $-T(\Delta S^{\ddagger})'$ found in this work is smaller than the value predicted by traditional estimates that involve the loss of three translations and three rotations in the transition state of a bimolecular reaction (refs 1-3) and amount to $\sim$15 kcal/mol. The reason for this becomes clear upon examination of the present system. It appears that many degrees of freedom have similar motions at the ground state and the transition state. This is the case, for example, for the second water molecule that became H$_3$O$^+$ at the transition state but still retained a large configurational freedom. Thus, a large part of the entropic contribution in the RS remains also in the TS. In view of the present result, it is quite likely that previous consideration of the entropic contributions to enzyme catalysis (e.g., refs 1-3) reflect an overestimate, since the entropic contributions of the reference solution reaction were probably overestimated. However, a more conclusive study of this important issue requires one to use the present approach in studies of the contribution of the substrates motion to the activation free energies of enzyme catalysis. Such studies are now in progress in our laboratory.

Acknowledgment. This work was supported by the NIH Grant GM24492. J.V. acknowledges EMBO fellowship ALTF 509-1998. We are grateful to Dr. J. Florian and Mr. C. F. Jen for insightful discussion.

Appendix

In order to obtain the solute entropic contribution to a transfer between two potential surfaces ($U_I \to U_{II}$), we have to separate the corresponding partition functions to solute and solvent contributions by writing

$$Q_N = \int dR \int dr\, e^{-U_N(R,r)\beta} = \int d\bar{R} \left[ \int dR \int dr\, \delta(R - \bar{R})\, e^{-U_N(R,r)\beta} \right] = \int d\bar{R}\, p_N(\bar{R}) = \int d\bar{R}\, e^{-W_N(\bar{R})\beta} \tag{A1}$$

where $U_N$ is the potential surface for state $N$ and $\delta(R - \bar{R})$ indicates that the corresponding function would be collected at $\pm\Delta R/2$. Here $R$ and $r$ are the solute and solvent coordinate, respectively, $p_N(\bar{R})$ is the solute unnormalized probability distribution evaluated at the given $\bar{R}$ averaged over all the solvent coordinates and $W_N(\bar{R})$ is the corresponding potential of mean force (PMF) which includes, of course, the solute and solvent contributions. Expanding this potential around its minimum gives

$$W_N(\bar{R}) = U_N(\bar{R}) + \Delta g^{N}_{sol}(\bar{R}) = W_N^0(\bar{R}^0_N) + (K_W^N/2)(\bar{R} - \bar{R}^0_N)^2 \tag{A2}$$

where we could, of course, use $R$ but we took $\bar{R}$ as our variable to indicate that this is our restraint coordinate. Here, $\Delta g_{sol}(\bar{R})$ is the solvation free energy at $\bar{R}$, $K_W$ is the force constant of the quadratic term of our expansion and $\bar{R}^0_N$ is the value of $\bar{R}$ at the minimum of $W_N$.

Now, for the sake of simplicity, we continue by examining a case where the PMF corresponds to a one-dimensional solute coordinate, although extension to many dimensions is straightforward. Our one-dimensional case is described by the thermodynamic cycle of Figure 3. In this cycle, which is the equivalent of Figure 1, we divide $R$ to segments that correspond to our $q_N$’s. Each of these segments can be defined by the delta function of eq A1 or by introducing a strong quadratic constraint ($(K_1/2)(R - \bar{R})^2$) as is done in the present work. In this case, we have

$$W'_N(\bar{R}) = W_N^0(\bar{R}^0_N) + (K_W^N/2)(\bar{R} - \bar{R}^0_N)^2 + (K_1/2)(\bar{R} - \bar{R}^0_N)^2 = W_N(\bar{R}) + (K_1/2)(\bar{R} - \bar{R}^0_N)^2 \tag{A3}$$

Next, we use eq A2 and our quadratic constraint and evaluate the relevant partition functions.

$$Q_N = e^{-W_N^0(\bar{R}^0_N)\beta} \int_{-\infty}^{\infty} d\bar{R}\, e^{-(K_W^N/2)(\bar{R} - \bar{R}^0_N)^2\beta} = e^{-W_N^0(\bar{R}^0_N)\beta} \sqrt{2\pi/(\beta K_W^N)}$$

Figure 3. A schematic illustration of the effect of $\bar{R}$ and the enthalpic contribution to $\Delta G'$. The figure considers one-dimensional potentials of mean force ($W$) for two systems and the effect of confining the system to small segments of $R$ by a strong constraint ($K_1$). The figure is essentially a rigorous equivalent of Figure 1 where the effect of restraining the system in different $\bar{R}$’s can be formulated. As explained in the Appendix and illustrated here, the value of $\Delta G'$ depends on $\bar{R}$ and the $\Delta G'$ obtained with the minimum of the corresponding $W$ (i.e., $\bar{R}^0$) has the smallest absolute value. This $\Delta G'$ is our $-T\Delta S'$.

Activation Entropies of Chemical Reactions. J. Phys. Chem. B, Vol. 104, No. 18, 2000, p. 4583
Strajbl et al. JPCB (2000), Villà et al. PNAS (2000)
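The closed form quoted in the Appendix excerpt for the harmonic partition function can be checked numerically. The sketch below is an illustration only: the function names, the force constants and the value of kT are assumptions, and the constant prefactor involving the PMF minimum is left out because it cancels in the ratio.

```python
import math


def q_numeric(k_w, beta, half_width=20.0, n=40_001):
    """Trapezoid estimate of the integral of exp(-beta * (k_w / 2) * x**2) over x."""
    step = 2.0 * half_width / (n - 1)
    total = 0.0
    for i in range(n):
        x = -half_width + i * step
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * math.exp(-beta * 0.5 * k_w * x * x)
    return total * step


def q_closed(k_w, beta):
    """Closed form of the same Gaussian integral: sqrt(2 * pi / (beta * k_w))."""
    return math.sqrt(2.0 * math.pi / (beta * k_w))


def restraint_release_free_energy(k_w, k_1, beta):
    """-kT ln(Q / Q') for a harmonic PMF without (Q) and with (Q') the extra
    quadratic restraint k_1 centred on the PMF minimum."""
    return -(1.0 / beta) * math.log(q_closed(k_w, beta) / q_closed(k_w + k_1, beta))


beta = 1.0 / 0.596          # assumed kT of about 0.596 kcal/mol (near 300 K)
k_w, k_1 = 2.0, 50.0        # hypothetical force constants
dg_release = restraint_release_free_energy(k_w, k_1, beta)
```

For this one-dimensional harmonic case the release free energy reduces to $-(kT/2)\ln[(K_W + K_1)/K_W]$, which is negative whenever $K_1 > 0$.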
is the change of the parameters associated with the X atom for the residue being mutated. Furthermore, it is not necessary to force the folded system to unfold. Additionally, the unfolded state can be modeled using only the neighboring residues of the residue to be mutated, which greatly simplifies the calculation of $\Delta G^{uf}_{N_{sp}\to M_{sp}}$ and reduces the computational cost involved. These computational savings can be reinvested by increasing the number and length of frames used to calculate $\Delta G^{f}_{N_{sp}\to M_{sp}}$ and $\Delta G^{uf}_{N_{sp}\to M_{sp}}$ via the free-energy perturbation method, which greatly improves the accuracy of the calculation.

As a test case for the performance of Eq. (31), we have chosen to examine the pseudo-wild-type ubiquitin and the Asp21Asn mutant discussed in our previous work (Ref. 21; see also Ref. 30). The three-dimensional representation of this system is shown in Figure 6. We have chosen this system because it has been well studied by other

Figure 5: The thermodynamic cycles used to calculate the change in free energy of unfolding upon mutation (see text for details).

CG Model Simulations of Protein Landscapes
Messer et al. Proteins (2010)
simplified reference potentials
be used in the method. The energy function used in the SCP step consists of the van der Waals and dihedral energy terms as defined by the AMBER99 force field (Ref. 55).

The method to place the side-chains is a straightforward hill climbing algorithm. It starts by generating an initial structure that contains positions for all of the side-chains based on the backbone atom positions. The initial structure is constructed by placing the rotamer on each residue that has the minimal energy between the side-chain and the backbone atoms of the other residues. During this procedure any rotamer that has interaction energy with the backbone higher than a user defined cutoff is discarded and is no longer considered in further iterations. The energy cutoff helps to improve the efficiency by eliminating any side-chains that have steric clashes with atoms in the backbone. In this work an energy cutoff of 100 kcal/mol was used. Unlike the Xiang and Honig method, we only use this one structure as the initial conformation rather than generating 120 starting conformations. This was done to improve efficiency, even if it could cause a slight decrease in accuracy. However, the results show that the method still performs well. Additionally, there is evidence that most side-chains can be placed correctly by only using their interactions with the backbone (Ref. 57) and other methods also use this as the initial structure (Ref. 54).

Starting from the initial structure an iterative procedure is used to find side-chains with the lowest energy. Each side-chain is selected in turn, and the interaction energy between the possible rotamers of currently selected side-chain and all of the other currently placed side-chains and the backbone is considered. If there is another rotamer that has a lower energy, the current rotamer is replaced by the lower energy rotamer. This continues until after a full iteration over the entire protein none of the rotamers are replaced or until a user specified maximum number of iterations is reached. In the results of this paper the maximum number of iterations allowed was 10, which was never reached when reconstructing the coarse-grain simulations of src-SH3 and S6. The rotamers can be considered sequentially down the chain or in a random order. We found that for the proteins studied in this paper the order of iteration did not affect the results.
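The placement procedure described in this excerpt can be sketched in a few lines. This is a minimal illustration, not the published implementation: `backbone_energy` and `pair_energy` are hypothetical callables standing in for the force-field terms, rotamers are plain labels, and the fallback used when every rotamer of a residue exceeds the cutoff is an assumption.

```python
def place_side_chains(rotamers, backbone_energy, pair_energy,
                      cutoff=100.0, max_iterations=10):
    """Hill-climbing rotamer placement.

    rotamers[i] lists the candidate rotamers of residue i,
    backbone_energy(i, r) is the rotamer-backbone interaction energy,
    pair_energy(i, r, j, s) is the energy between rotamer r on residue i
    and rotamer s on residue j.
    """
    n = len(rotamers)

    # Discard rotamers whose backbone interaction energy exceeds the cutoff.
    allowed = []
    for i in range(n):
        kept = [r for r in rotamers[i] if backbone_energy(i, r) <= cutoff]
        # Assumption: if nothing survives the cutoff, keep the least bad rotamer.
        if not kept:
            kept = [min(rotamers[i], key=lambda r: backbone_energy(i, r))]
        allowed.append(kept)

    # Initial structure: lowest backbone interaction energy for each residue.
    current = [min(allowed[i], key=lambda r: backbone_energy(i, r))
               for i in range(n)]

    def local_energy(i, r):
        """Energy of rotamer r on residue i against backbone and placed side-chains."""
        return backbone_energy(i, r) + sum(
            pair_energy(i, r, j, current[j]) for j in range(n) if j != i)

    for _ in range(max_iterations):
        changed = False
        for i in range(n):                      # sequential order down the chain
            best = min(allowed[i], key=lambda r: local_energy(i, r))
            if local_energy(i, best) < local_energy(i, current[i]):
                current[i] = best
                changed = True
        if not changed:                         # converged: full pass, no swaps
            break
    return current


# Toy example: two residues, rotamers "a" and "b"; two "a" rotamers clash.
toy_backbone = lambda i, r: 0.0 if r == "a" else 1.0
toy_pair = lambda i, r, j, s: 10.0 if (r == "a" and s == "a") else 0.0
placed = place_side_chains([["a", "b"], ["a", "b"]], toy_backbone, toy_pair)
```

In the toy case the backbone-only start is `["a", "a"]`; the first pass swaps residue 0 to `"b"` to remove the clash, and the second pass changes nothing, so the loop stops.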
Side-chain minimization step

After the SCP a number of side-chains in very high-energy conformations were detected. The all-atom minimization could only fix a small fraction of the high-energy

Figure 1: Cartoon illustration of the RACOGS method for a short peptide with the sequence VAL-ASP-SER-LEU-VAL. (1) Starting from the Cα atoms the backbone atoms are added. (2) After the backbones are added the side-chains are placed. (3) The first and third amino acids, circled, are clashing and causing a high energy interaction. The side-chain minimization step is performed on the first amino acid and resolves the clash. The last step of adding hydrogens and performing an all-atom minimization is not shown.
Multiscale Analysis of Protein Landscapes
DOI 10.1002/prot PROTEINS 649
(de)constructing CG-aA representations
Heath et al. Proteins (2007)
RACOGS
Ávila et al. CPPS (2011)
all atom vs coarse grain force field
AMBER COARSE GRAINED FORCE FIELD
Cesar L. Avila, Nils J. D. Drechsel and Jordi Villa–Freixa
September 7, 2011

Abstract
abstract

1 functional form

nils: Can you specify the meaning of each term? Some are obvious but others are not at all, like the one containing the g function.

jordi: The order of the terms in the “long” version corresponds with the order of the short version. All terms containing ss, mm and ms are sidechain-sidechain, mainchain-mainchain or mainchain-sidechain interactions respectively. They are:
(1) mainchain-mainchain interactions where the “normal” amber forcefield is used.
(2) sidechain-sidechain vdw.
(3) sidechain-sidechain electrostatics.
(4) self-energy, which is the energy to transfer a residue from the gas phase into the protein and is e.g. larger for polar residues into nonpolar environments, thus it depends on neighboring atoms.
(5) sidechain-mainchain vdw.
(6) sidechain-mainchain electrostatics.
(7) hydrogen bonding.
(8) an additional mainchain-mainchain torsional potential to favor secondary structures.
(9) additional mainchain-mainchain electrostatics (but I’m not sure for what exactly. Didn’t find information in the paper either, maybe César knows more)

$$V(r)_{amber} = \sum_{bonds} \kappa_b (b - b_0)^2 + \sum_{angles} \kappa_\theta (\theta - \theta_0)^2 + \sum_{dihedrals} (V_n/2)(1 + \cos[n\phi - \delta]) + \sum_{nonb\ ij} \left[ (A_{ij}/r_{ij}^{12}) - (B_{ij}/r_{ij}^{6}) + (q_i q_j / r_{ij}) \right] + GB(r) \tag{1}$$

nils: is the value 9 in the HB term fixed or is it variable too in the GA?

jordi: The 9 is actually a parameter that we optimize

$$V(r)_{ambercg} = U^{0}_{mm} + U^{ef}_{ss} + U^{QQ}_{ss} + U^{self}_{s} + U^{ef}_{ms} + U^{Qq}_{ms} + \Delta U^{HB}_{mm} + \Delta U^{\phi-\psi}_{mm} + \Delta U^{qq}_{mm} \tag{2}$$
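The all-atom functional form of eq. (1) can be evaluated term by term as in the sketch below. It is an illustration under stated assumptions: the function names and the parameter values in the example are hypothetical, the GB(r) solvation term is omitted, and the Coulomb prefactor defaults to 1 because eq. (1) writes the term as q_i q_j / r_ij without a unit conversion constant.

```python
import math


def bond_energy(k_b, b, b0):
    """Harmonic bond term: k_b * (b - b0)**2."""
    return k_b * (b - b0) ** 2


def angle_energy(k_theta, theta, theta0):
    """Harmonic angle term: k_theta * (theta - theta0)**2 (angles in radians)."""
    return k_theta * (theta - theta0) ** 2


def dihedral_energy(v_n, n, phi, delta):
    """Fourier torsion term: (V_n / 2) * (1 + cos(n * phi - delta))."""
    return 0.5 * v_n * (1.0 + math.cos(n * phi - delta))


def nonbonded_energy(a_ij, b_ij, q_i, q_j, r_ij, coulomb_k=1.0):
    """12-6 Lennard-Jones plus Coulomb term for one atom pair.

    coulomb_k is a unit conversion factor; 1.0 reproduces eq. (1) as written.
    """
    return a_ij / r_ij ** 12 - b_ij / r_ij ** 6 + coulomb_k * q_i * q_j / r_ij


def total_energy(bonds, angles, dihedrals, pairs):
    """Sum of all terms of eq. (1) except GB(r); each argument is a list of
    parameter tuples matching the corresponding function above."""
    return (sum(bond_energy(*p) for p in bonds)
            + sum(angle_energy(*p) for p in angles)
            + sum(dihedral_energy(*p) for p in dihedrals)
            + sum(nonbonded_energy(*p) for p in pairs))


# Hypothetical parameters for a single bond, angle, torsion and atom pair.
energy = total_energy(
    bonds=[(300.0, 1.1, 1.0)],
    angles=[(50.0, 1.9, 1.9)],
    dihedrals=[(2.0, 2, 0.0, 0.0)],
    pairs=[(1.0, 1.0, 0.0, 0.0, 1.0)],
)
```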
Adun
http://adun.imim.es
Johnston et al. JCC (2005), LNCS (2007)
Different local and remote (P2P) databases
[Figure panels: "CG potential energy landscape for alpha helix" and "CG potential energy landscape for beta hairpin"; axes: RMSD against Energy / kcal mol−1]
Figure 2: Correlation between RMSD and Energy for the complete sampling space of the model peptides. Hexagon binning for snapshots of the 300K trajectory for (Ala)15 (left) and (Val)5ProGly(Val)5 (right).

[Figure panels: free energy surfaces; axes: RMSD (Å) and Rgyr]
Figure 3: Peptide folding FES. Data for (Ala)15 (left) and (Val)5ProGly(Val)5 (right) are depicted. The RMSD is calculated against a model α–helix and β–hairpin respectively.
cases, the CG folding FES shows a single dominant non-native minimum present in addition to the native basin (Ref. 16). In agreement with the previous figures, it seems clear that the coarse grain model is problematic in handling β secondary structure interactions. The origin of this problem is not the HB pattern, but, as suggested above, the $U^{\phi-\psi}$ term. In order to analyze this fact, we project the FES with respect to the two main chain torsional coordinates and superimpose the expected Ramachandran angles for both peptides. Figure 4 clearly shows a displacement from the expected angles for the β–hairpin system, while a less pronounced displacement is seen in the (Ala)15 peptide.

Figure 4: Peptide folding FES projected onto backbone dihedral space. Trajectories for polyalanine are displayed on the left while those for the hairpin are shown on the right.

Despite the limitation of the coarse grain potential, especially for β–hairpin structures, the objective of this paper is to demonstrate the feasibility in building a complete pipeline for multiscale simulations in a portable software like Adun, so next, we analyze the ability of the method to reconstruct the free energy surface for the all atom system. Figure 5 shows the results for the corrected FE, obtained as follows. First a series of 8 × 8 (α–helix) and 16 × 16 (β–hairpin) representative structures were obtained from Figure 3. For each of these a free energy perturbation (FEP) protocol was carried out in which one slowly transforms the structure from a coarse grain representation to an all atom representation (see the Methods section for details). As the FEP involves the appearance of new atoms in the system, it is obvious that care must be taken not to make the simulation explode right after the first (fully coarse grain) FEP window. The final reconstructed all atom FES is shown in Figure 5. In terms of the final structures, the expected α–helix and β–hairpin structures are compared with the central structures in the relevant minimum region in Figure 4 and in Figure 5. The last row in Figure 6 shows a representative structure for the minimum FE region in Figure 5. In both Figure 5 and Figure 6 it can be seen that the FEP protocol does not significantly introduce changes into the shape of the CG FES when moving to the all atom representation. This result is significant, as it shows that the global pipeline of the method is entirely sound, as it brings about the possibility of improving individual modules of the protocol, in particular the quality of the coarse grain potential used as reference for the FEP.

[Figure panels: free energy surfaces; axes: RMSD (Å) and Rgyr]
Figure 5: Corrected folding FES for the (Ala)15 (left) and (Val)5ProGly(Val)5 (right) peptides derived from Figure 3, using the multiscale free energy perturbation approach to move from the CG to the AA model.
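The free energy perturbation step mentioned in this excerpt is commonly evaluated with the exponential-averaging (Zwanzig) estimator, summed over intermediate windows. The sketch below assumes that estimator; the excerpt itself defers to its Methods section, so the function names, the window layout and the value of kT are illustrative assumptions rather than the protocol used in the talk.

```python
import math


def fep_window(delta_u, kT=0.596):
    """Zwanzig estimator for one window: -kT * ln < exp(-dU / kT) >.

    delta_u holds energy differences U_next - U_current sampled on the
    current state. The minimum is factored out for numerical stability.
    """
    shift = min(delta_u)
    mean_exp = sum(math.exp(-(du - shift) / kT) for du in delta_u) / len(delta_u)
    return shift - kT * math.log(mean_exp)


def fep_total(windows, kT=0.596):
    """Total free energy change as the sum over consecutive windows."""
    return sum(fep_window(w, kT) for w in windows)


# Hypothetical energy differences (kcal/mol) for three windows between
# the coarse grain and the all atom end states.
windows = [
    [1.2, 0.8, 1.0, 1.1],
    [0.5, 0.7, 0.4, 0.6],
    [0.1, 0.3, 0.2, 0.2],
]
dg_cg_to_aa = fep_total(windows)
```

Splitting the transformation into many small windows keeps each set of energy differences narrow, which is what makes the exponential average converge.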
CG PES and reconstruction of the AA PES
alpha-helix beta hairpin
all atom
Target optimal Coarse Grain CG to AA
testing in CASP
[Figure panels: free energy surfaces for 1HRC, 1PRB and 3NZL; axes: RMSD (Å) and Rgyr]
Figure 6: Coarse grain FES (left column) and all atom FES reconstruction (right column) for three selected structures of increasing complexity: a fast folder, PDB code 1HRC (top row); a two state folder, PDB code 1PRB (middle row); and a challenging CASP 2010 structure, 3NZL (bottom row).
Testing on CASP 2010 targets
Table 1: Cα RMSD in terms of Cα coordinates obtained for the target structures: A) after ab-initio folding with Rosetta; B) the best structure compared to the native in the 300K trajectory; C) the best structure in the bin corresponding to the minimum in the corrected free energy surface compared to the native structure; D) all-atomic representation of the reference structure with which the free energy perturbation was carried out. The lines correspond to: first, one-state folder; second, two-state folder; third, CASP 2008 targets; fourth, CASP 2010 targets.

PDB-ID   RMSD        RMSD              RMSD (CG, best     RMSD (AA, reference   residues
         (Rosetta)   (globally best)   in minimum bin)    structure)
1PRB      4.94        5.31             10.32              11.03                  53
1HRC      8.24        7.64             11.49               9.77                 103
1YCC      6.77        8.11              8.71               8.61                 108
2K53      6.64        6.38              6.56               8.20                  76
2K5C      9.53        8.03             10.77              11.51                  95
2K5E      6.19        5.19              5.36               5.96                  73
2KDL     10.36        9.61              9.89              11.38                  56
3DAI     13.82       12.18             14.06              13.96                 130
2KY4     11.39       10.72             10.82              13.32                 149
2L06      8.15        9.38             11.14               9.89                 155
2L09      9.26        7.34              9.52               8.39                  62
3NEU      9.59        9.64              9.94              10.28                 125
3NRW     12.49        8.29             11.74              14.22                 117
3NZL      6.89        6.55              7.00              10.38                  83
Free energy surfaces
As in the previous section, the first step in our protocol is to obtain the FES generated
by exploring the coarse grain PES. Our starting points are now the ab initio folded
structures obtained before for the 14 targets. After the initial CG FES analysis we
move onto the all atom reconstruction for each structure. Table 1, Figure 6 and Figures
S3-S6 show the results obtained.
We first analyze the different columns in Table 1. Starting from the Rosetta results
(second column in the table) we observe that the sampling of the CG potential produces
structures that in some cases explore regions closer to the native structure than the
Rosetta protocol itself. This is the case for 10 of the 14 structures analyzed. Such
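The count quoted in this excerpt (10 of the 14 structures) can be checked against the Rosetta and globally-best columns of Table 1. The values below are transcribed from the table; the variable names are only illustrative.

```python
# (Rosetta RMSD, globally best RMSD in the 300K CG trajectory), from Table 1.
table1 = {
    "1PRB": (4.94, 5.31),
    "1HRC": (8.24, 7.64),
    "1YCC": (6.77, 8.11),
    "2K53": (6.64, 6.38),
    "2K5C": (9.53, 8.03),
    "2K5E": (6.19, 5.19),
    "2KDL": (10.36, 9.61),
    "3DAI": (13.82, 12.18),
    "2KY4": (11.39, 10.72),
    "2L06": (8.15, 9.38),
    "2L09": (9.26, 7.34),
    "3NEU": (9.59, 9.64),
    "3NRW": (12.49, 8.29),
    "3NZL": (6.89, 6.55),
}

# Targets where CG sampling reached a structure closer to native than Rosetta.
improved = [pdb for pdb, (rosetta, best) in table1.items() if best < rosetta]
n_improved = len(improved)  # 10 of 14
```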
[Flowchart: sequence → secondary structure analysis → generate 3aa + 9aa structural fragments → generate initial tertiary structure → relax tertiary structure → energy score structure → choose structure with lowest energy; loop label: x 10]
Figure 7: Protocol used to generate the initial structures for structures in Table 1 with Rosetta
Improving the model
• better sampling
– statistical potentials as reference
– transition path sampling and string method
– GPU computing
– better global reaction coordinate
• better energy
– polarizable models
– solvation models
– simple improvement of the CG parameters
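rethinking the problem: hotspots
[Figure: thermodynamic cycle linking Native Unfolded, Native Folded and Native Complex to Mutant Unfolded, Mutant Folded and Mutant Complex. Step labels: $\Delta G^{f}_{N}$, $\Delta G^{a}_{N}$, $\Delta G^{f}_{M}$, $\Delta G^{a}_{M}$, $\Delta G^{u}_{N\to M}$, $\Delta G^{f}_{N\to M}$, $\Delta G^{c}_{N\to M}$]

$$\Delta\Delta G^{B}_{N\to M} = \Delta G^{B}_{M} - \Delta G^{B}_{N} = (\Delta G^{a}_{M} - \Delta G^{f}_{M}) - (\Delta G^{a}_{N} - \Delta G^{f}_{N})$$

$$\Delta\Delta G^{B}_{N\to M} = \Delta G^{c}_{N\to M} - \Delta G^{f}_{N\to M} - \Delta G^{u}_{N\to M}$$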
changes on binding free energy upon mutation
[Figure: free energy levels of the Unfolded, Folded and Complex states for WT, Mutant A, Mutant B and Mutant C, with Folding and Binding steps marked. Cases shown: destabilization of complex, destabilization of unbound protein, mixed]
extensive optimization
Trajectory reconstruction
explicit vs CG Pearson correlation
CG model optimization with GA
[Flowchart: initialization (start, initial parameters, mutate, initial population). A control process spawns processes that calculate fitness in parallel, collects new parameters and checks if completed. In the serial part, parents (parent 1, parent 2) are selected with a rank-based roulette-wheel selector, recombined with a uniform recombination operator and mutated with a Gaussian mutation operator to give offspring (offspring 1, offspring 2)]
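The operators named in the flowchart (rank-based roulette-wheel selection, uniform recombination, Gaussian mutation) can be sketched as follows. This is a serial toy version under several assumptions: the fitness is the Pearson correlation between hypothetical "explicit" energies and a weighted sum of hypothetical coarse-grained terms, one elite individual is carried over unchanged each generation, and all names, population sizes and mutation widths are invented for the example.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    denom = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return sxy / denom if denom > 0 else 0.0


def rank_roulette(population, fitnesses, rng):
    """Pick one parent with probability proportional to its fitness rank."""
    order = sorted(range(len(population)), key=lambda i: fitnesses[i])
    ranks = list(range(1, len(order) + 1))      # worst gets 1, best gets n
    return population[rng.choices(order, weights=ranks, k=1)[0]]


def uniform_recombination(p1, p2, rng):
    """Each gene goes to either child with probability 0.5."""
    c1, c2 = [], []
    for a, b in zip(p1, p2):
        if rng.random() < 0.5:
            a, b = b, a
        c1.append(a)
        c2.append(b)
    return c1, c2


def gaussian_mutation(individual, sigma, rng):
    return [g + rng.gauss(0.0, sigma) for g in individual]


def evolve(fitness, n_params, pop_size=20, generations=30, sigma=0.05, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(0.0, 2.0) for _ in range(n_params)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations + 1):
        fitnesses = [fitness(ind) for ind in population]
        best = population[max(range(pop_size), key=lambda i: fitnesses[i])]
        history.append(max(fitnesses))
        next_pop = [list(best)]                 # elitism (an assumption)
        while len(next_pop) < pop_size:
            p1 = rank_roulette(population, fitnesses, rng)
            p2 = rank_roulette(population, fitnesses, rng)
            for child in uniform_recombination(p1, p2, rng):
                next_pop.append(gaussian_mutation(child, sigma, rng))
        population = next_pop[:pop_size]
    return best, history


# Toy data: 40 snapshots, 3 energy terms, "explicit" energies from known weights.
data_rng = random.Random(1)
terms = [[data_rng.uniform(0.0, 10.0) for _ in range(3)] for _ in range(40)]
explicit = [2.0 * t[0] + 0.5 * t[1] + 1.0 * t[2] for t in terms]


def fitness(weights):
    cg = [sum(w * x for w, x in zip(weights, t)) for t in terms]
    return pearson(cg, explicit)


best, history = evolve(fitness, n_params=3)
```

Because the fitness evaluations are independent of each other, the list comprehension that computes `fitnesses` is the natural place to distribute work across processes, as the flowchart indicates.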
[Figure: Pearson coefficient against generation (0 to 350) for simulation 9; series: mean, standard deviations, best, worst]
[Figure: variance per parameter for simulation 9; x-axis "bond types": VDW2, VDW3, HBOND, BONDED, CGX-CT-N, CGX-CT-N3, CGX-CT-H1, CGX-CT-C, AHELIX, BSHEET, GTURN, LHELIX]
[Figure: two "energy comparison" panels; axes: coarse-grained [kcal/mol] against explicit [kcal/mol]]
[Figure: two "energy comparison" panels, coarse-grained [kcal/mol] against explicit [kcal/mol]; label: GA; one "correlation" panel, coarse-grained against explicit, both axes from 0 to 1]
we still have a problem
Still work in progress
Hotspot prediction
[Figure: Predicted Gbinding (kcal/mol, 0 to 14) vs. Experimental Gbinding (kcal/mol, -1 to 8), divided into TP, FP, TN and FN regions. Labeled mutations: AK27A, AR59A, AR83Q, AR87A, AH102A, DY29F, DY29A, DD35A, DW38F, DD39A, DT42A, DW44F, DE76A, DE80A]
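The TP/FP/TN/FN regions of the hotspot plot come from applying a cutoff to both the predicted and the experimental binding free energy change. A minimal sketch of that bookkeeping is given below. The 2.0 kcal/mol cutoff is an assumption (the slide does not state the one used), and the mutation names and values are hypothetical, not the data plotted.

```python
def classify_hotspots(predicted, experimental, threshold=2.0):
    """Label each mutation TP, FP, TN or FN and count the labels.

    A mutation is called a hotspot when its binding free energy change
    is at or above `threshold` (kcal/mol; assumed cutoff).
    """
    labels = {}
    for name, exp_value in experimental.items():
        predicted_hot = predicted[name] >= threshold
        experimental_hot = exp_value >= threshold
        if predicted_hot and experimental_hot:
            labels[name] = "TP"
        elif predicted_hot:
            labels[name] = "FP"
        elif experimental_hot:
            labels[name] = "FN"
        else:
            labels[name] = "TN"
    counts = {key: 0 for key in ("TP", "FP", "TN", "FN")}
    for label in labels.values():
        counts[label] += 1
    return labels, counts

# Hypothetical values (kcal/mol), for illustration only.
experimental = {"mutA": 5.2, "mutB": 0.4, "mutC": 3.1, "mutD": 0.9}
predicted = {"mutA": 8.0, "mutB": 3.5, "mutC": 1.0, "mutD": 0.2}
labels, counts = classify_hotspots(predicted, experimental)
```

Because only the side of the cutoff matters, a model can classify hotspots well even when its predicted values are systematically larger than experiment.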
Beyond the current model
– using statistical potentials as a reference (collaboration with Janusz Bujnicki, IIMCB Warsaw)
– including a semi-explicit solvation model (collaboration with Ken A. Dill, SBNY)
RNA Statistical potential
Raúl Alcántara
Non bonded interactions
Fitting non-bonded term
Matlab: initial guesses from GMM
Mathematica: non-linear fit
force field markup language: FFML
<mrow potential="FourierTorsion" numberOfAtoms="4" type="energy">
  <variable name="angle"/>
  <parameterdata number="1" name="param1"/>
  <parameterdata number="2" name="param2"/>
  <apply>
    <times/>
    <ci>param1</ci>
    <apply>
      <power/>
      <apply>
        <minus/>
        <ci>angle</ci>
        <ci>param2</ci>
      </apply>
      <cn>2</cn>
    </apply>
  </apply>
</mrow>
<mrow potential="FourierTorsion" numberOfAtoms="4" type="force">
  <variable name="dAngle"/>
  <variable name="angle"/>
  <parameterdata number="1" name="param1"/>
  <parameterdata number="2" name="param2"/>
  <apply>
    <times/>
    <cn>2</cn>
    <ci>param1</ci>
    <apply>
      <minus/>
      <ci>angle</ci>
      <ci>param2</ci>
    </apply>
    <ci>dAngle</ci>
  </apply>
</mrow>
Meneu et al., in progress
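The energy fragment encodes param1 × (angle − param2)² as a content-MathML style tree. A minimal sketch of how such a tree could be evaluated is shown below. It is not the FFML tooling itself: the `evaluate` and `ffml_energy` helpers are hypothetical, only the three operators used on the slide are supported, and the `variable` and `parameterdata` elements are assumed to be self-closing so the fragment parses as XML.

```python
import xml.etree.ElementTree as ET

FFML_ENERGY = """
<mrow potential="FourierTorsion" numberOfAtoms="4" type="energy">
  <variable name="angle"/>
  <parameterdata number="1" name="param1"/>
  <parameterdata number="2" name="param2"/>
  <apply>
    <times/>
    <ci>param1</ci>
    <apply>
      <power/>
      <apply>
        <minus/>
        <ci>angle</ci>
        <ci>param2</ci>
      </apply>
      <cn>2</cn>
    </apply>
  </apply>
</mrow>
"""

def evaluate(node, env):
    """Recursively evaluate a <ci>, <cn> or <apply> element."""
    if node.tag == "ci":  # identifier: look up its value
        return env[node.text.strip()]
    if node.tag == "cn":  # numeric constant
        return float(node.text)
    if node.tag != "apply":
        raise ValueError(f"unsupported element: {node.tag}")
    operator, *operands = list(node)  # first child names the operator
    values = [evaluate(child, env) for child in operands]
    if operator.tag == "times":
        result = 1.0
        for value in values:
            result *= value
        return result
    if operator.tag == "minus":
        return -values[0] if len(values) == 1 else values[0] - values[1]
    if operator.tag == "power":
        return values[0] ** values[1]
    raise ValueError(f"unsupported operator: {operator.tag}")

def ffml_energy(xml_text, **env):
    """Evaluate the top-level <apply> expression of one FFML fragment."""
    root = ET.fromstring(xml_text)
    return evaluate(root.find("apply"), env)

energy = ffml_energy(FFML_ENERGY, angle=1.5, param1=2.0, param2=0.5)
```

Keeping the functional form in markup rather than in code means the same evaluator can serve any potential that can be written with the supported operators.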
SECIS elements simulations
Semi-Explicit assembly
[Figure 3 plots: ΔG (kcal/mol) for panels a) Linear Alkanes, b) Linear Alkynes, c) Linear PAHs, d) Planar PAHs; series: γA + b, Semi-Explicit assembly, TIP3P, Experiment]
Figure 3: The nonpolar solvation free energy for a series of a) linear alkanes, b) linear alkynes, c) polyaromatic hydrocarbons (PAHs) in a linear arrangement, and d) PAHs in a planar arrangement calculated using γA + b, Semi-Explicit assembly, and explicit solvent. For γA + b, the traditional (0.00542 × SAtot) + 0.92 was used,18 and the TIP3P results are those obtained through explicit free energy calculations.36 Experimental comparisons to ∆G cannot be drawn with the linear alkynes or PAHs series, because they have a substantial polar term to the overall solvation.
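The γA + b estimate quoted in the Figure 3 caption is a linear function of the total surface area. A minimal sketch follows. The function names are hypothetical, and the units are assumed to be Å² for the area and kcal/mol for the result, which the caption does not spell out.

```python
GAMMA = 0.00542  # coefficient from the Figure 3 caption; assumed kcal/(mol Å^2)
OFFSET = 0.92    # constant b from the Figure 3 caption; assumed kcal/mol

def nonpolar_dg_gamma_a(sa_tot):
    """Nonpolar solvation free energy from the total surface area alone."""
    return GAMMA * sa_tot + OFFSET

def area_change_for(delta_g):
    """Surface-area change that produces a given change in the estimate."""
    return delta_g / GAMMA

dg_example = nonpolar_dg_gamma_a(100.0)  # 0.542 + 0.92 for a 100 Å^2 surface
area_shift = area_change_for(0.2)        # area change behind a 0.2 kcal/mol shift
```

Since b cancels when two molecules are compared, a 0.2 kcal/mol difference under this model corresponds to an area difference of roughly 37 Å², whatever the shape or chemistry of the surface.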
Figure 4: Maps of the collective dispersion attraction about the solvent accessible surface (SAS) of a) n-pentane, b) cyclopentane, c) pent-1-yne, d) benzene, and e) pyrene. The color of the surface indicates the LJ well-depth, with blue starting at 0 kcal/mol and red lowering to deeper than 5 kcal/mol. Note the red “hotspots” around the triple bond in pent-1-yne and in the center of the benzene and pyrene ring planes. These indicate a significant enhancement of dispersion attraction with the surroundings. As these regions grow with increasing molecule size, these collective dispersion attractions will offset the cost of cavity formation in surrounding solvent. With a simple γA, all these surfaces would be a uniform blue.
Figure 3b shows solvation free energies for the linear alkynes, from the various models. Alkynes have
a carbon-carbon triple bond at the end of the chain. In the GAFF forcefield,49 the dispersion interaction
well-depth is twice that of carbon-carbon single bonds. Like the explicit simulations, but unlike γA, the
Semi-Explicit approach captures the more favorable aqueous solvation of the alkynes relative to the alkanes.
Figures 4a and 4c show that the extra attraction for water of the alkynes is localized near the triple bond.
Hot spots: not all hydrocarbons are the same
Figures 4a and 4b show the LJ potential surfaces for n-pentane and cyclopentane. Seams between atom
surfaces form favorable interaction “hot spot” regions, while methyl end-groups of the alkane chain are
a deeper blue and less favorable. The surface area of cyclopentane is less than n-pentane, but this only
accounts for a modest decrease of 0.2 kcal/mol in ∆G when using γA + b. This modest change is much
less than the greater than 1 kcal/mol decrease seen experimentally.9, 50 Semi-Explicit assembly includes
the effects of these “hot spots” and lowers ∆G by an additional 0.4 kcal/mol. The remaining difference
between the estimated and the experimental value likely comes from approximations in the Semi-Explicit
assembly approach, such as the void term discussed previously and the incomplete capturing of solvent-
solvent interaction enhancement from optimal hydration cages.
Fennell et al. JACS (2010)
SEA water model
NILS
SEA-water vs TIP3P
[Figure: Fernet (kcal/mol) vs. TIP3 (kcal/mol), both axes -1 to 4; 504 molecule cluster, ADUN]
SEA-Water: correlation with explicit water 0.91
Semi-Analytic SEA-Water: correlation with explicit water 0.95
Drechsel et al, in preparation
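summary
[Diagram: thermodynamic cycle for wild type (wt) and mutant (m). Horizontal legs: ΔGfold and ΔGbind for wt and for m. Vertical legs: ΔΔGunfolded(wt-m), ΔΔGunbound(wt-m), ΔΔGbound(wt-m)]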
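The quantities on this summary slide are tied together by closure of the thermodynamic cycle. The relations below are a sketch under two assumptions that the slide does not state: each ΔΔG(wt-m) is read as the free energy of converting wild type into mutant in the given state, and "unbound" means the folded, free protein.

$$\Delta G_{fold}^{m} - \Delta G_{fold}^{wt} = \Delta\Delta G_{unbound}^{wt-m} - \Delta\Delta G_{unfolded}^{wt-m}$$

$$\Delta G_{bind}^{m} - \Delta G_{bind}^{wt} = \Delta\Delta G_{bound}^{wt-m} - \Delta\Delta G_{unbound}^{wt-m}$$

Under that reading, a mutation changes the binding free energy only through the difference between its effect on the complex and its effect on the unbound protein.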
Janusz Bujnicki (IIMCB)
Ken Dill and Chris Fennell (UCSF)
Sergio Rubio
James Dalton (UAB)
Nils Drechsel (SBNY)
Michael A. Johnston (IBM)
Norma Díaz-Vergara (UAB)
César L. Ávila (U Tucumán)
aScidea: COMPUTATIONAL BIOLOGY SOLUTIONS
CÉSAR, NORMA, JAMES, NILS, MICHAEL, SERGIO
ACCESS TO INFORMATION
INTEGRATION
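[Figure: three panels: E (kcal/mol, -10 to 0) vs. rOH (1 to 3); (φ, ψ) map with both axes from -180 to 180 and a color scale from -10 to 0; E (kcal/mol, -2 to 2) vs. rij (3 to 10)]

$$U_{sp} = U_{mm} + U^{ef}_{ss} + U^{QQ}_{ss} + U^{self}_{s} + U^{ef}_{ms} + U^{Qq}_{ms} + \Delta U^{HB}_{mm} + \Delta U^{\phi\text{-}\psi}_{mm} + \Delta U^{qq}_{mm}$$

$$U^{ef}_{ss} = \sum_{i<j} \varepsilon^{0}_{ij}\, C^{scale}_{ij} \left[ 3\left(r^{0}_{ij}/r_{ij}\right)^{8} - 4\left(r^{0}_{ij}/r_{ij}\right)^{6} \right]$$

$$\Delta U^{HB}_{mm} = \begin{cases} -9 & r \le 2.0 \\ -9\exp\left(-15(r - 2.0)^{2}\right) & r > 2.0 \end{cases}$$

$$\Delta U^{\phi\text{-}\psi}_{mm} = \sum_{i=1}^{4} A_{i}\, g(\phi - \phi^{i}_{0}, w^{i}_{0})\, g(\psi - \psi^{i}_{0}, w^{i}_{0})$$

$$g(\theta, w) = \exp\left(-0.693\,(1 - \cos\theta)/\sin(w/2)\right)$$

$$U^{self}_{s} = \sum_{i} \left[ U_{np}(N^{i}_{np}) + U_{polar}(N^{i}_{polar}) \right]$$

$$U_{np} = \begin{cases} 4\exp\left[-0.2(N_{np} - 6)^{2}\right] & N_{np} \le 6 \\ 4 & N_{np} > 6 \end{cases}$$

$$U_{polar} = \begin{cases} -2\exp\left[-0.2(N_{polar} - 4)^{2}\right] & N_{polar} \le 4 \\ -2 & N_{polar} > 4 \end{cases}$$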
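The piecewise terms on this slide are simple enough to evaluate directly. The sketch below reads them as an 8-6 side chain pair term with minimum -ε at r = r0, a hydrogen-bond correction capped at -9 for r ≤ 2.0, an angular factor g, and neighbour-count self energies for nonpolar and polar residues. The function names are hypothetical, energies are taken to be in kcal/mol as on the plot axes, and angles are assumed to be passed in radians.

```python
import math

def pair_ef(r, r0, epsilon, c_scale=1.0):
    """One i<j term of the 8-6 side chain potential; minimum -epsilon at r = r0."""
    x = r0 / r
    return epsilon * c_scale * (3.0 * x ** 8 - 4.0 * x ** 6)

def delta_u_hb(r):
    """Hydrogen-bond correction: flat at -9 up to r = 2.0, Gaussian decay beyond."""
    if r <= 2.0:
        return -9.0
    return -9.0 * math.exp(-15.0 * (r - 2.0) ** 2)

def g(theta, w):
    """Angular factor; equals 1 at theta = 0 (w must not make sin(w/2) zero)."""
    return math.exp(-0.693 * (1.0 - math.cos(theta)) / math.sin(w / 2.0))

def u_np(n_np):
    """Self energy of a nonpolar residue with n_np neighbours."""
    if n_np <= 6:
        return 4.0 * math.exp(-0.2 * (n_np - 6) ** 2)
    return 4.0

def u_polar(n_polar):
    """Self energy of a polar residue with n_polar neighbours."""
    if n_polar <= 4:
        return -2.0 * math.exp(-0.2 * (n_polar - 4) ** 2)
    return -2.0

def u_self(neighbour_counts):
    """Sum over residues of U_np + U_polar, given (n_np, n_polar) pairs."""
    return sum(u_np(n_np) + u_polar(n_polar)
               for n_np, n_polar in neighbour_counts)
```

Each piecewise function is continuous at its switching point, since the exponential equals 1 there.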
Warshel and col. (2009-2010)