An improved hybrid Monte Carlo method for conformational sampling of proteins
Jesús A. Izaguirre and Scott Hampton, Department of Computer Science and Engineering, University of Notre Dame, March 5, 2003
![Page 1: An improved hybrid Monte Carlo method for conformational sampling of proteins](https://reader035.vdocument.in/reader035/viewer/2022062501/56815c76550346895dca8a53/html5/thumbnails/1.jpg)
1
An improved hybrid Monte Carlo method for
conformational sampling of proteins
Jesús A. Izaguirre and Scott Hampton
Department of Computer Science and Engineering
University of Notre Dame
March 5, 2003
This work is partially supported by two NSF grants (CAREER and BIOCOMPLEXITY) and two grants from the University of Notre Dame
![Page 2: An improved hybrid Monte Carlo method for conformational sampling of proteins](https://reader035.vdocument.in/reader035/viewer/2022062501/56815c76550346895dca8a53/html5/thumbnails/2.jpg)
2
Overview
1. Motivation: sampling conformational space of proteins
2. Methods for sampling (MD, HMC)
3. Evaluation of new Shadow HMC
4. Future applications
![Page 3: An improved hybrid Monte Carlo method for conformational sampling of proteins](https://reader035.vdocument.in/reader035/viewer/2022062501/56815c76550346895dca8a53/html5/thumbnails/3.jpg)
3
Protein: The Machinery of Life
NH2-Val-His-Leu-Thr-Pro-Glu-Glu-Lys-Ser-Ala-Val-Thr-Ala-Leu-Trp-Gly-Lys-Val-Asn-Val-Asp-Glu-Val-Gly-Gly-Glu-…..
![Page 4: An improved hybrid Monte Carlo method for conformational sampling of proteins](https://reader035.vdocument.in/reader035/viewer/2022062501/56815c76550346895dca8a53/html5/thumbnails/4.jpg)
4
Protein Structure
![Page 5: An improved hybrid Monte Carlo method for conformational sampling of proteins](https://reader035.vdocument.in/reader035/viewer/2022062501/56815c76550346895dca8a53/html5/thumbnails/5.jpg)
5
Why protein folding?
Huge gap between sequence data and 3D structure data:
EMBL/GENBANK, DNA (nucleotide) sequences: 15 million sequences, 15,000 million base pairs
SWISSPROT, protein sequences: 120,000 entries
PDB, 3D protein structures: 20,000 entries
Bridging the gap through prediction
Aim of structural genomics: “Structurally characterize most of the protein sequences by an efficient combination of experiment and prediction,” Baker and Sali (2001)
Thermodynamic hypothesis: the native state is at the global free energy minimum, Anfinsen (1973)
![Page 6: An improved hybrid Monte Carlo method for conformational sampling of proteins](https://reader035.vdocument.in/reader035/viewer/2022062501/56815c76550346895dca8a53/html5/thumbnails/6.jpg)
6
Questions related to folding I
Long time kinetics: dynamics of folding. Only statistical correctness is possible (ensemble dynamics, e.g., folding@home).
Short time kinetics: strong correctness is possible (e.g., transport properties, diffusion coefficients).
![Page 7: An improved hybrid Monte Carlo method for conformational sampling of proteins](https://reader035.vdocument.in/reader035/viewer/2022062501/56815c76550346895dca8a53/html5/thumbnails/7.jpg)
7
Questions related to folding II
Sampling: compute equilibrium averages by visiting all (or most) of the “important” conformations
Examples:
Equilibrium distribution of solvent molecules in vacancies
Free energies
Characteristic conformations (misfolded and folded states)
![Page 8: An improved hybrid Monte Carlo method for conformational sampling of proteins](https://reader035.vdocument.in/reader035/viewer/2022062501/56815c76550346895dca8a53/html5/thumbnails/8.jpg)
8
Overview
1. Motivation: sampling conformational space of proteins
2. Methods for sampling (MD, HMC)
3. Evaluation of new Shadow HMC
4. Future applications
![Page 9: An improved hybrid Monte Carlo method for conformational sampling of proteins](https://reader035.vdocument.in/reader035/viewer/2022062501/56815c76550346895dca8a53/html5/thumbnails/9.jpg)
9
Classical molecular dynamics
Newton’s equations of motion: $M\,q'' = -\nabla U(q) = F(q)$ (1)
Atoms
Molecules
CHARMM force field (Chemistry at Harvard Molecular Mechanics)
Bonds, angles and torsions
![Page 10: An improved hybrid Monte Carlo method for conformational sampling of proteins](https://reader035.vdocument.in/reader035/viewer/2022062501/56815c76550346895dca8a53/html5/thumbnails/10.jpg)
10
What is a Forcefield?
The forcefield is a collection of equations and associated constants designed to reproduce molecular geometry and selected properties of tested structures.
In molecular dynamics a molecule is described as a series of charged points (atoms) linked by springs (bonds).
![Page 11: An improved hybrid Monte Carlo method for conformational sampling of proteins](https://reader035.vdocument.in/reader035/viewer/2022062501/56815c76550346895dca8a53/html5/thumbnails/11.jpg)
11
Energy Terms Described in the CHARMm forcefield
Bond, Angle, Dihedral, Improper
![Page 12: An improved hybrid Monte Carlo method for conformational sampling of proteins](https://reader035.vdocument.in/reader035/viewer/2022062501/56815c76550346895dca8a53/html5/thumbnails/12.jpg)
12
Energy Functions
U_bond = oscillations about the equilibrium bond length
U_angle = oscillations of 3 atoms about an equilibrium angle
U_dihedral = torsional rotation of 4 atoms about a central bond
U_nonbond = non-bonded energy terms (electrostatics and Lennard-Jones)
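The four energy terms can be sketched in a few lines of Python. The slide names the terms but not their functional forms, so the forms below (harmonic bond and angle, cosine dihedral, 12-6 Lennard-Jones plus Coulomb) are assumptions based on what CHARMM-style force fields commonly use, and every parameter value and function name is purely illustrative.

```python
import math


def bond_energy(b, b0, kb):
    """Harmonic oscillation about the equilibrium bond length b0."""
    return kb * (b - b0) ** 2


def angle_energy(theta, theta0, ktheta):
    """Harmonic oscillation of 3 atoms about the equilibrium angle theta0 (radians)."""
    return ktheta * (theta - theta0) ** 2


def dihedral_energy(phi, kphi, n, delta):
    """Torsional rotation of 4 atoms about a central bond (phi, delta in radians)."""
    return kphi * (1.0 + math.cos(n * phi - delta))


def nonbond_energy(r, eps, rmin, qi, qj, coulomb_const=332.06):
    """Lennard-Jones plus electrostatics for one atom pair at distance r.

    coulomb_const is an approximate conversion factor to kcal/mol for
    charges in units of e and distances in Angstrom (assumed units).
    """
    lj = eps * ((rmin / r) ** 12 - 2.0 * (rmin / r) ** 6)
    elec = coulomb_const * qi * qj / r
    return lj + elec


# Illustrative (made-up) parameters, not taken from any parameter file.
total = (
    bond_energy(1.55, 1.53, 222.5)
    + angle_energy(math.radians(112.0), math.radians(110.0), 58.0)
    + dihedral_energy(math.radians(60.0), 0.2, 3, 0.0)
    + nonbond_energy(3.8, 0.08, 4.0, -0.1, 0.1)
)
print(f"total energy of the toy fragment: {total:.4f}")
```

The total potential U(q) is the sum of such terms over all bonds, angles, dihedrals, and non-bonded pairs in the molecule.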
![Page 13: An improved hybrid Monte Carlo method for conformational sampling of proteins](https://reader035.vdocument.in/reader035/viewer/2022062501/56815c76550346895dca8a53/html5/thumbnails/13.jpg)
13
Molecular Dynamics: what does it mean?
MD = change in conformation over time using a forcefield
Figure: energy vs. conformational change. Energy is supplied to the minimized system at the start of the simulation; some conformations are impossible to access through MD.
![Page 14: An improved hybrid Monte Carlo method for conformational sampling of proteins](https://reader035.vdocument.in/reader035/viewer/2022062501/56815c76550346895dca8a53/html5/thumbnails/14.jpg)
14
MD, MC, and HMC in sampling
Molecular Dynamics takes long steps in phase space, but it may get trapped
Monte Carlo makes a random walk (short steps); it may escape minima due to randomness
Can we combine these two methods?
MC + MD → HMC
![Page 15: An improved hybrid Monte Carlo method for conformational sampling of proteins](https://reader035.vdocument.in/reader035/viewer/2022062501/56815c76550346895dca8a53/html5/thumbnails/15.jpg)
15
Hybrid Monte Carlo
We can sample from a distribution with density p(x) by simulating a Markov chain with the following transitions:
From the current state, x, a candidate state x’ is drawn from a proposal distribution S(x,x’).
The proposed state is accepted with probability min[1, (p(x’) S(x’,x)) / (p(x) S(x,x’))].
If the proposal distribution is symmetric, S(x’,x) = S(x,x’), then the acceptance probability depends only on p(x’) / p(x).
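The transition rule above, in the symmetric case, can be sketched as a small Metropolis sampler. This is a minimal illustration rather than anything from the talk: the target (a standard normal), the uniform proposal, and all function names are assumptions chosen to keep the example short.

```python
import math
import random


def metropolis(log_p, x0, step, n_samples, rng):
    """Sample from density p using a symmetric uniform proposal.

    Because S(x, x') = S(x', x), the acceptance probability reduces to
    min[1, p(x') / p(x)], evaluated here in log space.
    """
    x = x0
    samples = []
    accepted = 0
    for _ in range(n_samples):
        x_new = x + rng.uniform(-step, step)  # symmetric proposal
        log_ratio = log_p(x_new) - log_p(x)
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            x = x_new
            accepted += 1
        samples.append(x)  # a rejected move repeats the current state
    return samples, accepted / n_samples


def log_standard_normal(x):
    """Log density of N(0, 1) up to an additive constant."""
    return -0.5 * x * x


rng = random.Random(0)
samples, rate = metropolis(log_standard_normal, 0.0, 1.0, 20000, rng)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"acceptance rate {rate:.2f}, mean {mean:.3f}, variance {var:.3f}")
```

Only the ratio p(x’) / p(x) is needed, so the normalization constant of p never has to be computed.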
![Page 16: An improved hybrid Monte Carlo method for conformational sampling of proteins](https://reader035.vdocument.in/reader035/viewer/2022062501/56815c76550346895dca8a53/html5/thumbnails/16.jpg)
16
Hybrid Monte Carlo II
Proposal functions must be reversible: if x’ = s(x), then x = s(x’)
Proposal functions must preserve volume: the Jacobian must have absolute value one
Valid proposal: x’ = -x
Invalid proposals:
x’ = 1 / x (Jacobian not 1)
x’ = x + 5 (not reversible)
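The three example proposals can be checked numerically. The sketch below tests reversibility by applying each map twice and estimates the Jacobian with a central difference; the helper names and the test point x = 2 are arbitrary choices made for illustration.

```python
def is_reversible(s, x, tol=1e-9):
    """A proposal map s is reversible at x if applying it twice returns x."""
    return abs(s(s(x)) - x) < tol


def abs_jacobian(s, x, h=1e-6):
    """Absolute value of ds/dx at x, estimated by a central difference."""
    return abs((s(x + h) - s(x - h)) / (2.0 * h))


proposals = {
    "x' = -x": lambda x: -x,
    "x' = 1/x": lambda x: 1.0 / x,
    "x' = x + 5": lambda x: x + 5.0,
}

x = 2.0
report = {
    name: (is_reversible(s, x), abs_jacobian(s, x))
    for name, s in proposals.items()
}
for name, (reversible, jac) in report.items():
    print(f"{name:12s} reversible={reversible}  |Jacobian|={jac:.3f}")
```

At x = 2 only x’ = -x passes both checks: x’ = 1/x is its own inverse but has |Jacobian| = 1/4 there, and x’ = x + 5 preserves volume but is not its own inverse.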
![Page 17: An improved hybrid Monte Carlo method for conformational sampling of proteins](https://reader035.vdocument.in/reader035/viewer/2022062501/56815c76550346895dca8a53/html5/thumbnails/17.jpg)
17
Hybrid Monte Carlo III
Hamiltonian dynamics preserve volume in phase space
Hamiltonian dynamics conserve the Hamiltonian H(q,p)
Reversible symplectic integrators for Hamiltonian systems preserve volume in phase space
Conservation of the Hamiltonian depends on the accuracy of the integrator
Hybrid Monte Carlo: use a reversible symplectic integrator for MD to generate the next proposal in MC
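As an illustration of these properties, the sketch below uses velocity Verlet (a leapfrog variant), a standard reversible symplectic integrator; the slides do not say which integrator was used, so this choice, the one-dimensional harmonic test problem, and all names are assumptions. It checks that integrating forward, flipping the momentum, and integrating again returns to the start, and that the energy error shrinks by about a factor of four when the time step is halved.

```python
def velocity_verlet(q, p, force, dt, n_steps, mass=1.0):
    """Advance (q, p) by n_steps of velocity Verlet."""
    f = force(q)
    for _ in range(n_steps):
        p += 0.5 * dt * f      # half kick
        q += dt * p / mass     # full drift
        f = force(q)
        p += 0.5 * dt * f      # half kick
    return q, p


def force(q):
    """Force for the harmonic well U(q) = q^2 / 2."""
    return -q


def hamiltonian(q, p):
    return 0.5 * p * p + 0.5 * q * q


def max_energy_error(dt, total_time=10.0):
    """Largest |H - H0| seen along a trajectory started at (q, p) = (1, 0)."""
    q, p = 1.0, 0.0
    h0 = hamiltonian(q, p)
    worst = 0.0
    for _ in range(round(total_time / dt)):
        q, p = velocity_verlet(q, p, force, dt, 1)
        worst = max(worst, abs(hamiltonian(q, p) - h0))
    return worst


# Reversibility: forward, flip momentum, forward again, flip back.
q0, p0 = 1.0, 0.0
q1, p1 = velocity_verlet(q0, p0, force, 0.1, 100)
q_back, p_back = velocity_verlet(q1, -p1, force, 0.1, 100)
reversibility_error = max(abs(q_back - q0), abs(-p_back - p0))

# Accuracy: a second-order method, so halving dt cuts the error about 4x.
error_ratio = max_energy_error(0.1) / max_energy_error(0.05)
print(f"reversibility error {reversibility_error:.2e}, error ratio {error_ratio:.2f}")
```

The energy error stays bounded but nonzero, which is why HMC needs the accept/reject step described next.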
![Page 18: An improved hybrid Monte Carlo method for conformational sampling of proteins](https://reader035.vdocument.in/reader035/viewer/2022062501/56815c76550346895dca8a53/html5/thumbnails/18.jpg)
18
HMC Algorithm
Perform the following steps:
1. Draw random values for the momenta p from a normal distribution; use the given positions q
2. Perform cyclelength steps of MD, using a symplectic reversible integrator with timestep Δt, generating (q’,p’)
3. Compute the change in total energy ΔH = H(q’,p’) - H(q,p)
4. Accept the new state based on exp(-ΔH), i.e., with probability min[1, exp(-ΔH)] (ΔH in units of kT)
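The four steps can be sketched as follows. This is a toy version under stated assumptions, not the ProtoMol implementation: unit masses, kT = 1 by default, a leapfrog integrator, and a three-dimensional Gaussian target standing in for a molecular potential. The names `hmc`, `leapfrog`, and `cycle_length` are illustrative.

```python
import math
import random


def leapfrog(q, p, grad_u, dt, n_steps):
    """Reversible symplectic integrator (unit masses)."""
    q, p = list(q), list(p)
    g = grad_u(q)
    for _ in range(n_steps):
        p = [pi - 0.5 * dt * gi for pi, gi in zip(p, g)]
        q = [qi + dt * pi for qi, pi in zip(q, p)]
        g = grad_u(q)
        p = [pi - 0.5 * dt * gi for pi, gi in zip(p, g)]
    return q, p


def total_energy(u, q, p):
    return u(q) + 0.5 * sum(pi * pi for pi in p)


def hmc(u, grad_u, q0, dt, cycle_length, n_iter, rng, kT=1.0):
    q = list(q0)
    samples, accepted = [], 0
    for _ in range(n_iter):
        # Step 1: fresh momenta from a normal distribution, positions kept.
        p = [rng.gauss(0.0, math.sqrt(kT)) for _ in q]
        h_old = total_energy(u, q, p)
        # Step 2: cycle_length MD steps give the proposal (q', p').
        q_new, p_new = leapfrog(q, p, grad_u, dt, cycle_length)
        # Step 3: change in total energy.
        delta_h = total_energy(u, q_new, p_new) - h_old
        # Step 4: Metropolis test on exp(-dH / kT).
        if delta_h <= 0.0 or rng.random() < math.exp(-delta_h / kT):
            q = q_new
            accepted += 1
        samples.append(list(q))
    return samples, accepted / n_iter


def u(q):
    """Toy potential: independent unit Gaussians."""
    return 0.5 * sum(qi * qi for qi in q)


def grad_u(q):
    return list(q)


rng = random.Random(1)
samples, rate = hmc(u, grad_u, [0.0, 0.0, 0.0], 0.2, 10, 5000, rng)
first = [s[0] for s in samples]
mean = sum(first) / len(first)
var = sum((x - mean) ** 2 for x in first) / len(first)
print(f"acceptance {rate:.2f}, mean {mean:.3f}, variance {var:.3f}")
```

Raising `dt` in this sketch lowers the acceptance rate, which is the effect the scaling discussion in the following slides is about.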
![Page 19: An improved hybrid Monte Carlo method for conformational sampling of proteins](https://reader035.vdocument.in/reader035/viewer/2022062501/56815c76550346895dca8a53/html5/thumbnails/19.jpg)
19
Hybrid Monte Carlo IV
Advantages of HMC:
HMC can propose and accept distant points in phase space, provided the accuracy of the MD integrator is high enough
HMC can move in a biased way, rather than in a random walk (distance k vs sqrt(k))
HMC can quickly change the probability density
![Page 20: An improved hybrid Monte Carlo method for conformational sampling of proteins](https://reader035.vdocument.in/reader035/viewer/2022062501/56815c76550346895dca8a53/html5/thumbnails/20.jpg)
20
Hybrid Monte Carlo V
As the number of atoms increases, the total error in H(q,p) increases. The error is related to the time step used in MD.
Analysis of N replicas of multivariate Gaussian distributions shows that HMC takes O(N^{5/4}) with time step Δt = O(N^{-1/4}), Kennedy & Pendleton, 91

| System size N | Max Δt |
|---|---|
| 66 | 0.5 |
| 423 | 0.25 |
| 868 | 0.1 |
| 5143 | 0.05 |
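The step from the time-step scaling to the O(N^{5/4}) cost is short and worth spelling out. The derivation below assumes a fixed trajectory length τ and O(N) work per MD step; both are assumptions added here for illustration, not statements from the slide.

```latex
\begin{align*}
\Delta t &\propto N^{-1/4}
  && \text{(time step needed to hold the acceptance rate fixed)} \\
n_{\text{steps}} &= \frac{\tau}{\Delta t} \propto N^{1/4}
  && \text{(MD steps per trajectory of fixed length } \tau\text{)} \\
\text{cost} &\propto N \cdot n_{\text{steps}} \propto N^{5/4}
  && \text{(assuming } O(N) \text{ work per MD step)}
\end{align*}
```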
![Page 21: An improved hybrid Monte Carlo method for conformational sampling of proteins](https://reader035.vdocument.in/reader035/viewer/2022062501/56815c76550346895dca8a53/html5/thumbnails/21.jpg)
21
Hybrid Monte Carlo VI
The key problem in scaling is the accuracy of the MD integrator
More accurate methods could help scaling
Creutz and Gocksch 89 proposed higher order symplectic methods for HMC
In MD, however, these methods are more expensive than the scaling gain. They need more force evaluations per step
![Page 22: An improved hybrid Monte Carlo method for conformational sampling of proteins](https://reader035.vdocument.in/reader035/viewer/2022062501/56815c76550346895dca8a53/html5/thumbnails/22.jpg)
22
Overview
1. Motivation: sampling conformational space of proteins
2. Methods for sampling (MD, HMC)
3. Evaluation of new Shadow HMC
4. Future applications
![Page 23: An improved hybrid Monte Carlo method for conformational sampling of proteins](https://reader035.vdocument.in/reader035/viewer/2022062501/56815c76550346895dca8a53/html5/thumbnails/23.jpg)
23
Improved HMC
Symplectic integrators conserve exactly (within roundoff error) a modified Hamiltonian that for short MD simulations (such as in HMC) stays close to the true Hamiltonian, Sanz-Serna & Calvo 94
Our idea is to use highly accurate approximations to the modified Hamiltonian in order to improve the scaling of HMC
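The claim about the modified Hamiltonian can be seen directly in one special case. For a one-dimensional harmonic oscillator with unit mass integrated by velocity Verlet, the quantity p²/2 + ω²(1 − Δt²ω²/4) q²/2 is conserved by the discrete map, while the true Hamiltonian only oscillates near its initial value. The sketch below checks this numerically; it is a textbook closed-form case chosen for illustration and is not the Skeel and Hardy shadow Hamiltonian construction.

```python
def velocity_verlet_step(q, p, dt, omega):
    """One velocity Verlet step for U(q) = omega^2 q^2 / 2, unit mass."""
    p -= 0.5 * dt * omega**2 * q
    q += dt * p
    p -= 0.5 * dt * omega**2 * q
    return q, p


def true_hamiltonian(q, p, omega):
    return 0.5 * p * p + 0.5 * omega**2 * q * q


def modified_hamiltonian(q, p, dt, omega):
    """Quantity conserved exactly by the discrete map (this test case only)."""
    return 0.5 * p * p + 0.5 * omega**2 * (1.0 - (dt * omega) ** 2 / 4.0) * q * q


def max_drift(energy, dt, omega, n_steps=1000):
    """Largest deviation of energy(q, p) from its initial value."""
    q, p = 1.0, 0.0
    e0 = energy(q, p)
    worst = 0.0
    for _ in range(n_steps):
        q, p = velocity_verlet_step(q, p, dt, omega)
        worst = max(worst, abs(energy(q, p) - e0))
    return worst


dt, omega = 0.2, 1.0
drift_true = max_drift(lambda q, p: true_hamiltonian(q, p, omega), dt, omega)
drift_modified = max_drift(
    lambda q, p: modified_hamiltonian(q, p, dt, omega), dt, omega
)
print(f"drift in H: {drift_true:.2e}, drift in modified H: {drift_modified:.2e}")
```

For general potentials no closed form exists, which is where approximations to the modified Hamiltonian come in.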
![Page 24: An improved hybrid Monte Carlo method for conformational sampling of proteins](https://reader035.vdocument.in/reader035/viewer/2022062501/56815c76550346895dca8a53/html5/thumbnails/24.jpg)
24
Shadow Hamiltonian
Work by Skeel and Hardy, 2001, shows how to compute an arbitrarily accurate approximation to the modified Hamiltonian, called the Shadow Hamiltonian
Hamiltonian: $H = \tfrac{1}{2} p^T M^{-1} p + U(q)$
Modified Hamiltonian: $H_M = H + O(\Delta t^{p})$
Shadow Hamiltonian: $SH_{2p} = H_M + O(\Delta t^{2p})$
Arbitrary accuracy
Easy to compute
Stable energy graph
Example: $SH_4 = H - f(q_{n-1}, q_{n-2}, p_{n-1}, p_{n-2}, \beta_{n-1}, \beta_{n-2})$
![Page 25: An improved hybrid Monte Carlo method for conformational sampling of proteins](https://reader035.vdocument.in/reader035/viewer/2022062501/56815c76550346895dca8a53/html5/thumbnails/25.jpg)
25
See comparison of SHADOW and ENERGY
![Page 26: An improved hybrid Monte Carlo method for conformational sampling of proteins](https://reader035.vdocument.in/reader035/viewer/2022062501/56815c76550346895dca8a53/html5/thumbnails/26.jpg)
26
Shadow HMC
Replace total energy ΔH with shadow energy $\Delta SH_{2m} = SH_{2m}(q’,p’) - SH_{2m}(q,p)$
Nearly linear scalability of sampling rate
Computational cost of SHMC: $N^{(1+1/2m)}$, where m is accuracy order of integrator
Extra storage (m copies of q and p)
Moderate overhead (25% for small proteins)
![Page 27: An improved hybrid Monte Carlo method for conformational sampling of proteins](https://reader035.vdocument.in/reader035/viewer/2022062501/56815c76550346895dca8a53/html5/thumbnails/27.jpg)
27
Example Shadow Hamiltonian
![Page 28: An improved hybrid Monte Carlo method for conformational sampling of proteins](https://reader035.vdocument.in/reader035/viewer/2022062501/56815c76550346895dca8a53/html5/thumbnails/28.jpg)
28
ProtoMol: a framework for MD
Front-end: libfrontend
Middle layer: libintegrators
Back-end: libbase, libtopology, libparallel, libforces
Modular design of ProtoMol (Prototyping Molecular dynamics). Available at http://www.cse.nd.edu/~lcls/protomol
Matthey, et al, ACM Trans. Math. Software (TOMS), submitted
![Page 29: An improved hybrid Monte Carlo method for conformational sampling of proteins](https://reader035.vdocument.in/reader035/viewer/2022062501/56815c76550346895dca8a53/html5/thumbnails/29.jpg)
29
SHMC implementation
Shadow Hamiltonian requires propagation of β
Can work for any integrator
![Page 30: An improved hybrid Monte Carlo method for conformational sampling of proteins](https://reader035.vdocument.in/reader035/viewer/2022062501/56815c76550346895dca8a53/html5/thumbnails/30.jpg)
30
Systems tested
![Page 31: An improved hybrid Monte Carlo method for conformational sampling of proteins](https://reader035.vdocument.in/reader035/viewer/2022062501/56815c76550346895dca8a53/html5/thumbnails/31.jpg)
31
Sampling Metric 1
Generate a plot of dihedral angle vs. energy for each angle
Find local maxima
Label ‘bins’ between maxima
For each dihedral angle, print the label of the energy bin that it is currently in
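One way to read this metric is sketched below. The slide does not give the energy function or the binning details, so the cosine dihedral term, the one-degree grid, the periodic wrap-around bin, and all function names are assumptions made only to produce a runnable example.

```python
import math


def dihedral_energy(phi_deg, k=1.0, n=3, delta_deg=0.0):
    """Illustrative torsion energy k * (1 + cos(n * phi - delta))."""
    return k * (1.0 + math.cos(math.radians(n * phi_deg - delta_deg)))


def local_maxima(energy, grid):
    """Grid angles whose energy exceeds both neighbours (periodic grid)."""
    values = [energy(a) for a in grid]
    m = len(grid)
    return [
        grid[i]
        for i in range(m)
        if values[i] > values[i - 1] and values[i] > values[(i + 1) % m]
    ]


def bin_label(phi_deg, maxima):
    """Label of the bin between consecutive maxima that contains phi.

    Angles past the last maximum wrap around to bin 0, which they share
    with angles before the first maximum.
    """
    count = sum(1 for m in maxima if m <= phi_deg)
    return count % len(maxima)


grid = list(range(-180, 180))              # one-degree resolution
maxima = local_maxima(dihedral_energy, grid)
conformation = [-60.0, 60.0, 170.0, -170.0]  # four dihedral angles, degrees
labels = tuple(bin_label(phi, maxima) for phi in conformation)
print("maxima at", maxima, "-> labels", labels)
```

Two snapshots then count as the same conformation when their label tuples match, so small fluctuations within one energy well are not counted as new conformations.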
![Page 32: An improved hybrid Monte Carlo method for conformational sampling of proteins](https://reader035.vdocument.in/reader035/viewer/2022062501/56815c76550346895dca8a53/html5/thumbnails/32.jpg)
32
Sampling Metric 2
Round each dihedral angle to the nearest degree
Print label according to degree
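A minimal sketch of this second metric follows. Mapping the rounded angles into the range 0 to 359, so that -180 and 180 degrees get the same label, is an assumption added here, as are the function names and the toy trajectory.

```python
def conformation_label(dihedrals_deg):
    """Round each dihedral to the nearest degree and wrap into [0, 360)."""
    return tuple(round(a) % 360 for a in dihedrals_deg)


def count_conformations(trajectory):
    """Number of distinct labels visited along a trajectory of snapshots."""
    seen = set()
    for frame in trajectory:
        seen.add(conformation_label(frame))
    return len(seen)


# Toy trajectory: each frame lists two dihedral angles in degrees.
trajectory = [
    [10.2, -60.4],
    [10.4, -60.1],   # same label as the first frame
    [11.0, -60.0],   # first angle moved by a degree: new label
    [10.2, 179.6],
    [10.3, -179.6],  # wraps to the same label as the previous frame
]
n_distinct = count_conformations(trajectory)
print("distinct conformations:", n_distinct)
```

This metric is much finer than the bin-based one, so it reports far more distinct conformations for the same trajectory.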
![Page 33: An improved hybrid Monte Carlo method for conformational sampling of proteins](https://reader035.vdocument.in/reader035/viewer/2022062501/56815c76550346895dca8a53/html5/thumbnails/33.jpg)
33
Acceptance Rates
![Page 34: An improved hybrid Monte Carlo method for conformational sampling of proteins](https://reader035.vdocument.in/reader035/viewer/2022062501/56815c76550346895dca8a53/html5/thumbnails/34.jpg)
34
More Acceptance Rates
![Page 35: An improved hybrid Monte Carlo method for conformational sampling of proteins](https://reader035.vdocument.in/reader035/viewer/2022062501/56815c76550346895dca8a53/html5/thumbnails/35.jpg)
35
Sampling rate for decalanine (dt = 2 fs)
![Page 36: An improved hybrid Monte Carlo method for conformational sampling of proteins](https://reader035.vdocument.in/reader035/viewer/2022062501/56815c76550346895dca8a53/html5/thumbnails/36.jpg)
36
Sampling rate for 2mlt
![Page 37: An improved hybrid Monte Carlo method for conformational sampling of proteins](https://reader035.vdocument.in/reader035/viewer/2022062501/56815c76550346895dca8a53/html5/thumbnails/37.jpg)
37
Sampling rate comparison
Cost per conformation is total simulation time divided by number of new conformations discovered (2mlt, dt = 0.5 fs)
HMC: 122 s/conformation
SHMC: 16 s/conformation
HMC discovered 270 conformations in 33000 seconds
SHMC discovered 2340 conformations in 38000 seconds
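The quoted costs follow directly from the run times and conformation counts on this slide, as the short check below shows. The speedup figure is simply the ratio of the two costs and is not stated in the original.

```python
def cost_per_conformation(total_seconds, conformations):
    """Total simulation time divided by number of new conformations."""
    return total_seconds / conformations


hmc_cost = cost_per_conformation(33000, 270)     # about 122 s/conformation
shmc_cost = cost_per_conformation(38000, 2340)   # about 16 s/conformation
speedup = hmc_cost / shmc_cost

print(f"HMC  {hmc_cost:.1f} s/conformation")
print(f"SHMC {shmc_cost:.1f} s/conformation")
print(f"SHMC is about {speedup:.1f}x cheaper per conformation")
```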
![Page 38: An improved hybrid Monte Carlo method for conformational sampling of proteins](https://reader035.vdocument.in/reader035/viewer/2022062501/56815c76550346895dca8a53/html5/thumbnails/38.jpg)
38
Conclusions
SHMC has a much higher acceptance rate, particularly as system size and timestep increase
SHMC discovers new conformations more quickly
SHMC requires extra storage and moderate overhead
SHMC works best at relatively large timesteps
![Page 39: An improved hybrid Monte Carlo method for conformational sampling of proteins](https://reader035.vdocument.in/reader035/viewer/2022062501/56815c76550346895dca8a53/html5/thumbnails/39.jpg)
39
Future work
Multiscale problems for rugged energy surface: multiple time stepping algorithms plus constraining; temperature tempering and multicanonical ensemble; potential smoothing
System size: parallel multigrid O(N) electrostatics
Applications: free energy estimation for drug design; folding and metastable conformations; average estimation
![Page 40: An improved hybrid Monte Carlo method for conformational sampling of proteins](https://reader035.vdocument.in/reader035/viewer/2022062501/56815c76550346895dca8a53/html5/thumbnails/40.jpg)
40
Acknowledgments
Dr. Thierry Matthey, co-developer of ProtoMol, University of Bergen, Norway
Graduate students: Qun Ma, Alice Ko, Yao Wang, Trevor Cickovski
Students in CSE 598K, “Computational Biology,” Spring 2002
Dr. Robert Skeel, Dr. Ruhong Zhou, and Dr. Christoph Schutte for valuable discussions
Dr. Radford Neal’s presentation “Markov Chain Sampling Using Hamiltonian Dynamics” (http://www.cs.utoronto.ca)
Dr. Klaus Schulten’s presentation “An introduction to molecular dynamics simulations” (http://www.ks.uiuc.edu)