an improved hybrid monte carlo method for conformational sampling of proteins

Post on 31-Dec-2015

24 Views

Category:

Documents

1 Downloads

Preview:

Click to see full reader

DESCRIPTION

An improved hybrid Monte Carlo method for conformational sampling of proteins. Jesús A. Izaguirre and Scott Hampton Department of Computer Science and Engineering University of Notre Dame March 5, 2003 - PowerPoint PPT Presentation

TRANSCRIPT

1

An improved hybrid Monte Carlo method for

conformational sampling of proteins

Jesús A. Izaguirre and Scott HamptonDepartment of Computer Science and Engineering

University of Notre Dame

March 5, 2003

This work is partially supported by two NSF grants (CAREER and BIOCOMPLEXITY) and two grants from University of Notre Dame

2

Overview1. Motivation: sampling conformational space of proteins

2. Methods for sampling (MD, HMC)

3. Evaluation of new Shadow HMC

4. Future applications

3

Protein: The Machinery of LifeNH2-Val-His-Leu-Thr-Pro-Glu-Glu-Lys-Ser-Ala-Val-Thr-Ala-Leu-Trp-Gly-Lys-Val-Asn-Val-Asp-Glu-Val-Gly-Gly-Glu-…..

4

Protein Structure

5

Why protein folding? Huge gap: sequence data and 3D structure data

EMBL/GENBANK, DNA (nucleotide) sequences 15 million sequence, 15,000 million base pairs

SWISSPROT, protein sequences120,000 entries

PDB, 3D protein structures20,000 entries

Bridging the gap through prediction Aim of structural genomics:

“Structurally characterize most of the protein sequences by an efficient combination of experiment and prediction,” Baker and Sali (2001)

Thermodynamics hypothesis: Native state is at the global free energy minimum

Anfinsen (1973)

6

Questions related to folding I Long time kinetics:

dynamics of folding only statistical

correctness possible ensemble dynamics e.g., folding@home

Short time kinetics strong correctness

possible e.g., transport

properties, diffusion coefficients

7

Questions related to folding II Sampling

Compute equilibrium averages by visiting all (most) of “important” conformations

Examples: Equilibrium

distribution of solvent molecules in vacancies

Free energies Characteristic

conformations (misfolded and folded states)

8

Overview1. Motivation: sampling conformational space of proteins

2. Methods for sampling (MD, HMC)

3. Evaluation of new Shadow HMC

4. Future applications

9

Classical molecular dynamics Newton’s

equations of motion:

Atoms Molecules CHARMM force

field(Chemistry at Harvard Molecular Mechanics)

'' ( ) ( ). - - - (1)U Mq q F q

Bonds, angles and torsions

10

What is a Forcefield?

To describe the time evolution of bond lengths, bond angles and torsions, also the nonbond van der Waals and elecrostatic interactions between atoms, one uses a forcefield.The forcefield is a collection of equations and associated constants designed to reproduce molecular geometry and selected properties of tested structures.

In molecular dynamics a molecule is described as a series of charged points (atoms) linked by springs (bonds).

11

Energy Terms Described in the CHARMm forcefield

Bond Angle

Dihedral Improper

12

Energy Functions

Ubond = oscillations about the equilibrium bond lengthUangle = oscillations of 3 atoms about an equilibrium angleUdihedral = torsional rotation of 4 atoms about a central bondUnonbond = non-bonded energy terms (electrostatics and Lennard-Jones)

13

Molecular Dynamics –what does it mean?MD = change in conformation over time using a forcefield

Conformational change

EnergyEnergy supplied to the minimized system at the start of the simulation

Conformation impossible to access through MD

14

MD, MC, and HMC in sampling Molecular Dynamics takes long steps in phase

space, but it may get trapped Monte Carlo makes a random walk (short

steps), it may escape minima due to randomness

Can we combine these two methods?

MCMDHMC

15

Hybrid Monte Carlo We can sample from a distribution with

density p(x) by simulating a Markov chain with the following transitions: From the current state, x, a candidate state x’

is drawn from a proposal distribution S(x,x’). The proposed state is accepted with prob.min[1,(p(x’) S(x’,x)) / (p(x) S(x,x’))]

If the proposal distribution is symmetric, S(x’,x)) = S(x,x’)), then the acceptance prob. only depends on p(x’) / p(x)

16

Hybrid Monte Carlo II Proposal functions must be reversible:

if x’ = s(x), then x = s(x’) Proposal functions must preserve

volume Jacobian must have absolute value one Valid proposal: x’ = -x Invalid proposals:

x’ = 1 / x (Jacobian not 1) x’ = x + 5 (not reversible)

17

Hybrid Monte Carlo III Hamiltonian dynamics preserve volume in

phase space Hamiltonian dynamics conserve the Hamiltonian

H(q,p) Reversible symplectic integrators for

Hamiltonian systems preserve volume in phase space

Conservation of the Hamiltonian depends on the accuracy of the integrator

Hybrid Monte Carlo: Use reversible symplectic integrator for MD to generate the next proposal in MC

18

HMC Algorithm

Perform the following steps:1. Draw random values for the momenta p from

normal distribution; use given positions q2. Perform cyclelength steps of MD, using a

symplectic reversible integrator with timestep t, generating (q’,p’)

3. Compute change in total energy H = H(q’,p’) - H(q,p)

4. Accept new state based on exp(- H )

19

Hybrid Monte Carlo IV

Advantages of HMC: HMC can propose and accept distant points

in phase space, provided the accuracy of the MD integrator is high enough

HMC can move in a biased way, rather than in a random walk (distance k vs sqrt(k))

HMC can quickly change the probability density

20

Hybrid Monte Carlo V As the number of atoms

increases, the total error in the H(q,p) increases. The error is related to the time step used in MD

Analysis of N replicas of multivariate Gaussian distributions shows that HMC takes O(N5/4 ) with time step t = O(N-1/4) Kennedy & Pendleton, 91

System size N

Max t

66 0.5

423 0.25

868 0.1

5143 0.05

21

Hybrid Monte Carlo VI The key problem in scaling is the accuracy of

the MD integrator More accurate methods could help scaling Creutz and Gocksch 89 proposed higher

order symplectic methods for HMC In MD, however, these methods are more

expensive than the scaling gain. They need more force evaluations per step

22

Overview1. Motivation: sampling conformational space of proteins

2. Methods for sampling (MD, HMC)

3. Evaluation of new Shadow HMC

4. Future applications

23

Improved HMC Symplectic integrators conserve exactly

(within roundoff error) a modified Hamiltonian that for short MD simulations (such as in HMC) stays close to the true Hamiltonian Sanz-Serna & Calvo 94

Our idea is to use highly accurate approximations to the modified Hamiltonian in order to improve the scaling of HMC

24

Shadow Hamiltonian

Work by Skeel and Hardy, 2001, shows how to compute an arbitrarily accurate approximation to the modified Hamiltonian, called the Shadow Hamiltonian

Hamiltonian: H=1/2pTM-1p + U(q) Modified Hamiltonian: HM = H + O(t p) Shadow Hamiltonian: SH2p = HM + O(t 2p)

Arbitrary accuracy Easy to compute Stable energy graph

Example, SH4 = H – f( qn-1, qn-2, pn-1, pn-2 ,βn-1 ,βn-2)

25

See comparison of SHADOW and ENERGY

26

Shadow HMC Replace total energy H with shadow

energy SH2m = SH2m (q’,p’) – SH2m (q,p)

Nearly linear scalability of sampling rateComputational cost SHMC, N(1+1/2m), where

m is accuracy order of integrator Extra storage (m copies of q and p) Moderate overhead (25% for small

proteins)

27

Example Shadow Hamiltonian

28

ProtoMol: a framework for MD

Front-end

Middle layer

back-end

libfrontend

libintegrators

libbase, libtopologylibparallel, libforces

Modular design of ProtoMol (Prototyping Molecular dynamics).Available at http://www.cse.nd.edu/~lcls/protomol

Matthey, et al, ACM Tran. Math. Software (TOMS), submitted

29

SHMC implementation Shadow Hamiltonian

requires propagation of β

Can work for any integrator

30

Systems tested

31

Sampling Metric 1 Generate a plot of dihedral angle vs.

energy for each angle Find local maxima Label ‘bins’ between maxima For each dihedral angle, print the label

of the energy bin that it is currently in

32

Sampling Metric 2 Round each dihedral angle to the

nearest degree Print label according to degree

33

Acceptance Rates

34

More Acceptance Rates

35

Sampling rate for decalanine (dt = 2 fs)

36

Sampling rate for 2mlt

37

Sampling rate comparison Cost per conformation is total

simulation time divided by number of new conformations discovered (2mlt, dt = 0.5 fs) HMC 122 s/conformation SHMC 16 s/conformation HMC discovered 270 conformations in

33000 seconds SHMC discovered 2340 conformations in

38000 seconds

38

Conclusions SHMC has a much higher acceptance

rate, particularly as system size and timestep increase

SHMC discovers new conformations more quickly

SHMC requires extra storage and moderate overhead.

SHMC works best at relatively large timesteps

39

Future work Multiscale problems for rugged energy surface

Multiple time stepping algorithms plus constraining Temperature tempering and multicanonical

ensemble Potential smoothing

System size Parallel Multigrid O(N) electrostatics

Applications Free energy estimation for drug design Folding and metastable conformations Average estimation

40

Acknowledgments Dr. Thierry Matthey, lead developer of

ProtoMol, University of Bergen, Norway Graduate students: Qun Ma, Alice Ko,

Yao Wang, Trevor Cickovski Dr. Robert Skeel, Dr. Ruhong Zhou, and

Dr. Christoph Schutte for valuable discussions

41

Multiple time stepping Fast/slow force splitting

Bonded: “fast” Small periods

Long range nonbonded: “slow” Large characteristic time

Evaluate slow forces less frequently Fast forces cheap Slow force evaluation expensive

42

The Impulse integratorGrubmüller,Heller, Windemuth and Schulten,

1991 Tuckerman, Berne and Martyna, 1992

The impulse “train”

Time, t

Fast impulses, t

Slow impulses, t

How far apart can we stretch the impulse train?

43

Stretching slow impulses t ~ 100 fs if accuracy does not degenerate

1/10 of the characteristic time MaIz, SIAM J. Multiscale Modeling and Simulation, 2003

(submitted)

Resonances (let be the shortest period) Natural: t = n , n = 1, 2, 3, … Numerical:

Linear: t = /2 Nonlinear: t = /3

MaIS_a, SIAM J. on Sci. Comp. (SISC), 2002 (in press)MaIS_b, 2003 ACM Symp. App. Comp. (SAC’03), 2002 (in press)

Impulse far from being multiscale!

44

3rd order nonlinear resonance of Impulse MaIS_a, SISC, 2002; MaIS_b, ACM

SAC’03, 2002

Fig. 1: Left: flexible water system. Right: Energy drift from 500ps MD simulation of flexible water at room temperature revealing 3:1 and 4:1 nonlinear resonance (3.3363 and 2.4 fs)

45

Overview Introduction

Molecular dynamics (MD) in action Classical MD Protein folding

Nonlinear instabilities of Impulse integrator Approximate MD integrators

Targeted MOLLY MUSICO Applications

Summary Future work Acknowledgements

46

Objective statement

Design multiscale integrators that are not limited by nonlinear and linear instabilities Allowing larger time steps Better sequential performance Better scaling

47

Overview Introduction

Molecular dynamics (MD) in action Classical MD Protein folding

Nonlinear instabilities of Impulse integrator Approximate MD integrators

Targeted MOLLY MUSICO Applications

Summary Future work Acknowledgements

48

Targeted MOLLY (TM)

TM = MOLLY + targeted Langevin coupling

Mollified Impulse (MOLLY) to overcome linear instabilities

Izaguirre, Reich and Skeel, 1999

Stochasticity to stabilize MOLLYIzaguirre, Catarello, et al, 2001

MaIz, 2003 ACM Symp. App. Comp. (SAC’03), 2002 (in press)MaIz, SIAM J. Multiscale Modeling and Simulation, 2003 (submitted)

49

Mollified Impulse (MOLLY)

MOLLY (mollified Impulse) Slow potential at time averaged positions,

A(x) Averaging using only fastest forces Mollified slow force = Ax(x) F(A(x)) Equilibrium and B-spline

B-spline MOLLY Averaging over certain time interval Needs analytical Hessians Step sizes up to 6 fs (50~100% speedup)

50

Introducing stochasticity Langevin stabilization of MOLLY (LM)

Izaguirre, Catarello, et al, 2001

12 fs for flexible waters with correct dynamics Dissipative particle dynamics (DPD):

Pagonabarraga, Hagen and Frenkel, 1998

Pair-wise Langevin force on “particles” Time reversible if self-consistent

Vi Vj

FRi, FD

i

FRj = - FR

i

FDj = - FD

i

51

Targeted Langevin coupling

Targeted at “trouble-making” pairs Bonds, angles Hydrogen bonds

Stabilizing MOLLY Slow forces evaluated much less

frequently Recovering correct dynamics

Coupling coefficient small

52

TM: main results

16 fs for flexible waters Correct dynamics

Self-diffusion coefficient, D. leapfrog w/ 1fs, D = 3.69+/-0.01 TM w/ (16 fs, 2fs), D = 3.68+/-0.01

Correct structure Radial distribution function (r.d.f.)

53

TM: correct r.d.f.

Fig. 4. Radial distribution function of O-H (left) and H-H (right) in flexible waters.

54

TM: discussion Much larger time steps (~100 fs)

Better force splitting Zhou, Harder, etc. 2001

More stable MOLLY integrators Conformational transition rate

folding@home: ensemble dynamics Celestial systems Graphics rendering in gaming

applications

55

Overview Introduction

Molecular dynamics (MD) in action Classical MD Protein folding

Nonlinear instabilities of Impulse integrator Approximate MD integrators

Targeted MOLLY MUSICO Applications

Summary Future work Acknowledgements

56

Lengthening time scales: SRP

1 msec simulation of an enzyme using stochastic reaction path (SRP) method. Courtesy of R. Elber at Cornell Univ.. SRP requires initial and final configuration.

57

What do we learn from SRP?

Cons: final configuration not always available

Pros: large time steps/long simulation Motions whose char. time is less than the

step size filtered out Essential (interesting) motions remain

What are other approaches for filtering out fast motions?

58

Lengthening time scales: MUSICO

Ideas: Splitting into nearly linear and nonlinear

parts Implicit integration of linear part with

constraining of internal d.o.f. Explicit treatment of highly nonlinear part Optional pairwise stochasticity for stability

MUSICO (in progress) Multi-Scale Implicit-explicit Constrained

molecular dynamics

59

MUSICO: the anticipations

Larger step sizes: ps, ns Implicit methods generally more stable

Schlick, et al (1997)

Greater scalability Hessian computations mostly local Low in communication

Real speedup Linear parts cheap 100 fold speedup sequentially?

60

MUSICO: the implicit part Implicit equations for nearly linear part

Newton-Raphson method

Require Hessians Krylov subspace method

1/ 2

21/ 2 1/ 2 1/ 2

1 1/ 2

1 1/ 2 1

1. / 2,

2. ( ),

3. ,

4. / 2

n n n

n n n

n n n

n n n

t

U t

t

t

x x p

F x F

p p F

x x p

20 1 1 1/ 2 1/ 2 1/ 2( , ,..., ) ( ) 0,

.N n n nf f f U t

F F x F

J f F

61

Overview Introduction

Molecular dynamics (MD) in action Classical MD Protein folding

Nonlinear instabilities of Impulse integrator Approximate MD integrators

Targeted MOLLY MUSICO Applications

Summary Future work Acknowledgements

62

Applications of approximate MD

Combining w/ other algorithms Steered/interactive MD Monte Carlo (MC)

Escaping from local minima Quicker exploring conformational space

Coarsening Large system and/or long simulations

Protein simulations: Folding/Unfolding/Equilibrium Drug design Other

63

Approx. MD in drug design

Designing drugs that regulate proteins Examples: drugs that bind to

Enzymes Receptors Ion channels and transporter systems

Evaluation of binding affinity using approximate MD Very long simulations and better sampling

64

Approx. MD in drug design (cont.)

65

ProtoMol: a framework for MD

Front-end

Middle layer

back-end

libfrontend

libintegrators

libbase, libtopologylibparallel, libforces

Modular design of ProtoMol (Prototyping Molecular dynamics).Available at http://www.cse.nd.edu/~lcls/protomol

Matthey, et al, ACM Tran. Math. Software (TOMS), submitted

66

Overview Introduction

Molecular dynamics (MD) in action Classical MD Protein folding

Nonlinear instabilities of Impulse integrator Approximate MD integrators

Targeted MOLLY MUSICO Applications

Summary Future work Acknowledgements

67

Summary Introduction Nonlinear instabilities (original) New algorithms

Targeted MOLLY (original) Larger time steps and speedup Correct dynamics and structures

MUSICO (original, in progress) Applications of approx. MD (in progress)

68

Overview Introduction

Molecular dynamics (MD) in action Classical MD Protein folding

Nonlinear instabilities of Impulse integrator Approximate MD integrators

Targeted MOLLY MUSICO Applications

Summary Future work Acknowledgements

69

Future work Short term

MUSICO / Krylov solver Simulation of small proteins in vacuum

Mid term Optimization of MUSICO / Krylov solver Simulation of large proteins in water Binding affinity in drug design

Long term Distributed protein folding and drug design Infrastructure for heterogeneous distributed

computing – similar to folding@home

70

Future work (cont.)

Funding opportunities NSF NIH DOE Pharmaceutical industry

71

Key references[1] Izaguirre, Ma, et al. Overcoming instabilities in Verlet-I/r-RESPA with the

mollified impulse method. Vol. 24 of Lecture Notes in Comput. Sci. & Eng., pages 146-174, Springer-Verlag, Berlin, New York, 2002

[2] Ma, Izaguirre and Skeel. Verlet-I/r-RESPA/Impulse is limited by nonlinear instability. SIAM J. Scientific Computing, 2002 (in press).

[3] Ma and Izaguirre. Targeted mollified impulse method for molecular dynamics. SIAM J. Multiscale Modeling and Simulation, submitted

[4] Matthey, Cickovski, Hampton, Ko, Ma, Slabach and Izaguirre. PROTOMOL, an object-oriented framework for prototyping novel applications of molecular dynamics. ACM Tran. Math. Software (TOMS), submitted

[5] Ma, Izaguirre and Skeel. Nonlinear instability in multiple time stepping molecular dynamics. 2003 ACM Symposium on Applied Computing (SAC’03). Melborne, Florida. 2002 (in press)

[6] Ma and Izaguirre. Long time step molecular dynamics using targeted Langevin Stabilization. Accepted by the 2003 ACM SAC’03. Melborne, Florida. 2002 (in press)

72

Overview Introduction

Molecular dynamics (MD) in action Classical MD Protein folding

Nonlinear instabilities of Impulse integrator Approximate MD integrators

Targeted MOLLY MUSICO Applications

Summary Future work Acknowledgements

73

Acknowledgements

People Dr. Jesus Izaguirre Dr. Robert Skeel, Univ. of Illinois at Urbana-

Champaign Dr. Thierry Matthey, University of Bergen, Norway

Resources Hydra and BOB clusters at ND Norwegian Supercomputing Center, Bergen, Norway

Funding NSF BIOCOMPLEXITY-IBN-0083653, and NSF CAREER Award ACI-0135195

74

THE END. THANKS!

75

Key references[1] J. A. Izaguirre, Q. Ma, T. Matthey, et al. Overcoming instabilities in Verlet-I/r-RESPA

with the mollified impulse method. In T. Schlick and H. H. Gan, editors, Proceedings of the 3rd International Workshop on Algorithms for Macromolecular Modeling, Vol. 24 of Lecture Notes in Computational Science and Engineering, pages 146-174, Springer-Verlag, Berlin, New York, 2002

[2] Q. Ma, J. A. Izaguirre, and R. D. Skeel. Verlet-I/r-RESPA/Impulse is limited by nonlinear instability. Accepted by the SIAM Journal on Scientific Computing, 2002. Available at http://www.nd.edu/~qma1/publication_h.html.

[3] Q. Ma and J. A. Izaguirre. Targeted mollified impulse method for molecular dynamics. Submitted to the SIAM Journal on Multiscale Modeling and Simulation, 2003.

[4] T. Matthey, T. Cickovski, S. Hampton, A. Ko, Q. Ma, T. Slabach and J. Izaguirre. PROTOMOL, an object-oriented framework for prototyping novel applications of molecular dynamics. Submitted to the ACM Transactions on Mathematical Software (TOMS), 2003.

[5] Q. Ma, J. A. Izaguirre, and R. D. Skeel. Nonlinear instability in multiple time stepping molecular dynamics. Accepted by the 2003 ACM Symposium on Applied Computing (SAC’03). Melborne, Florida. March 2003

[6] Q. Ma and J. A. Izaguirre. Long time step molecular dynamics using targeted Langevin Stabilization. Accepted by the 2003 ACM Symposium on Applied Computing (SAC’03). Melborne, Florida. March 2003

[7] M. Zhang and R. D. Skeel. Cheap implicit symplectic integrators. Appl. Num. Math., 25:297-302, 1997

76

Other references[8] J. A. Izaguirre, Justin M. Wozniak, Daniel P. Catarello, and Robert D. Skeel. Langevin

Stabilization of Molecular Dynamics, J. Chem. Phys., 114(5):2090-2098, Feb. 1, 2001.  [9] T. Matthey and J. A. Izaguirre, ProtoMol: A Molecular Dynamics Framework with Incremental

Parallelization, in Proc. of the Tenth SIAM Conf. on Parallel Processing for Scientific Computing, 2001.

[10] H. Grubmuller, H. Heller, A. Windemuth, and K. Schulten, Generalized Verlet algorithm for efficient molecular dynamics simulations with long range interactions, Molecular Simulations 6 (1991), 121-142.

[11] M. Tuckerman, B. J. Berne, and G. J. Martyna, Reversible multiple time scale molecular dynamics, J. Chem. Phys 97 (1992), no. 3, 1990-2001

[12] J. A. Izaguirre, S. Reich, and R. D. Skeel. Longer time steps for molecular dynamics. J. Chem. Phys., 110(19):9853–9864, May 15, 1999.

[13] L. Kale, R. Skeel, M. Bhandarkar, R. Brunner, A. Gursoy, N. Krawetz, J. Phillips, A. Shinozaki, K. Varadarajan, and K. Schulten. NAMD2: Greater scalability for parallel molecular dynamics. J. Comp. Phys., 151:283–312, 1999.

[14] R. D. Skeel. Integration schemes for molecular dynamics and related applications. In M. Ainsworth, J. Levesley, and M. Marletta, editors, The Graduate Student’s Guide to Numerical Analysis, SSCM, pages 119-176. Springer-Verlag, Berlin, 1999

[15] R. Zhou, , E. Harder, H. Xu, and B. J. Berne. Efficient multiple time step method for use with ewald and partical mesh ewald for large biomolecular systems. J. Chem. Phys., 115(5):2348–2358, August 1 2001.

77

Other references (cont.)[16] E. Barth and T. Schlick. Extrapolation versus impulse in multiple-time-stepping schemes:

Linear analysis and applications to Newtonian and Langevin dynamics. J. Chem. Phys., 1997.[17] I. Pagonabarraga, M. H. J. Hagen and D. Frenkel. Self-consistent dissipative particle dynamics

algorithm. Europhys Lett., 42 (4), pp. 377-382 (1998).[18] G. Besold, I. Vattulainen, M. Kartunnen, and J. M. Polson. Towards better integrators for

dissipative particle dynamics simulations. Physical Review E, 62(6):R7611–R7614, Dec. 2000.[19] R. D. Groot and P. B. Warren. Dissipative particle dynamics: Bridging the gap between

atomistic and mesoscopic simulation. J. Chem. Phys., 107(11):4423–4435, Sep 15 1997.[20] I. Pagonabarraga and D. Frenkel. Dissipative particle dynamics for interacting systems. J.

Chem. Phys., 115(11):5015–5026, September 15 2001.[21] Y. Duan and P. A. Kollman. Pathways to a protein folding intermediate observed in a 1-

microsecond simulation in aqueous solution. Science, 282:740-744, 1998. [22] M. Levitt. The molecular dynamics of hydrogen bonds in bovine pancreatic tripsin inhibitor

protein, Nature, 294, 379-380, 1981 [23] M. P. Allen and D. J. Tildesley, Computer Simulation of Liquids, Oxford University Press,

New York, 1987.[24] E. Hairer, C. Lubich and G. Wanner. Geometric numerical integration: structure-preserving

algorithms for ordinary differential equations. Springer, 2002[25] C. B. Anfinsen. Principles that govern the folding of protein chains. Science, 181, 223-230,

1973

78

Flood of data – genes and proteins

http://www.ncbi.nlm.nih.gov/Genbank/genbankstats.html

79

Flood of data (cont.)

http://us.expasy.org/sprot/relnotes/relstat.html

80

Flood of data (cont.)PDB Content Growth

http://www.rcsb.org/pdb/holdings.html

81

Flood of data (cont.)

http://us.expasy.org/sprot/relnotes/relstat.html

82

Pseudo-code of an MD simulation

MD Simulation:

(1) Pre-processing: Construct initial configuration of x0, v0 and F0;(2) loop 1 to number of steps

(a) Update velocities (by a half step)(b) Update positions (by a full step)(c) Evaluate forces on each atom(d) Update velocities (by a half step)

(3) Post-processing

Algorithm 1: Pseudo-code of an MD simulation.

83

Why protein folding? Huge gap: sequence data and 3D structure data

EMBL/GENBANK, DNA (nucleotide) sequences 15 million sequence, 15,000 million base pairs

SWISSPROT, protein sequences120,000 entries

PDB, 3D protein structures20,000 entries

Bridging the gap through prediction Aim of structural genomics:

“Structurally characterize most of the protein sequences by an efficient combination of experiment and prediction,” Baker and Sali (2001)

Thermodynamics hypothesis: Native state is at the global free energy minimum

Anfinsen (1973)

84

Folding: challenging time scale gap

Longest fully solvated: Duan and Kollman, 1998 1 s single trajectory/100 days Cray T3E

IBM’s Blue Gene project: 1999 -- 2005 Petaflop (1015 flops per second) computer

for folding proteins by 2005

85

Stömer/Verlet/Leapfrog Discretization of Eq. (1) as

Stömer (astronomy, its higher order variants, 1907, aurora borealis), Verlet (molecular dynamics, 1967), Leapfrog (partial differential equations)

An explicit one-step method:

Second order accurate, symplectic, and time reversible

21 12 ( )n n n nq q q h f q

1 1

1/ 2 1 1/ 2 1/ 2 1/ 2 1

: ( , ) ( , ) :

( ), , ( )2 2

Vh n n n n

n n n n n n n n n

q v q v

h hv v f q q q hv v v f q

86

Verlet-I/r-RESPA/ImpulseGrubmüller,Heller, Windemuth and Schulten,

1991 Tuckerman, Berne and Martyna, 1992

The state-of-the-art MTS integrator Splitting via switching functions 2nd order accurate, time reversible

slow1/ 2half kick: ( ) / 2

oscillate: update positions and momentum

using leapfrog ( /2, much smaller time steps)

n n nv v t f q

t

Algorithm 1. Half step discretization of Impulse integrator

87

Nonlinear resonance of Impulse

Approach Analytical: Stability conditions using only model

parameters for a simple nonlinear problem Numerical: Long simulations differing only in outer

time steps; correlation between step size and instability

Results: energy growth occurs unless longest t < 1/3 shortest period.

Unconditionally unstable 3rd order resonance Flexible waters: outer time step less than 3~3.3fs Proteins w/ SHAKE: time step less than 4~5fs

Ma, Izaguirre and Skeel (2003)

88

Nonlinear instability: analytical Approach:

1-D nonlinear model problem, in the neighborhood of stable equilibrium

MTS splitting of potential:

Analyze the reversible symplectic map Express stability condition in terms of problem

parameters Result:

3rd order resonance stable only if “equality” met 4th order resonance stable only if “inequality” met Impulse unstable at 3rd order resonance in practice

2 2 2 3 4 5oscillate kick( ) ( / 2) ( / 2 / 3 / 4) ( ).U q q Aq Bq Cq O q

89

Nonlinear: analytical (cont.) Main result. Let

1. (3rd order) Map stable at equilibrium if and unstable if

Impulse is unstable in practice. 2. (4th order) Map stable if

and unstable if

May be stable at the 4th order resonance.

2 , where , where

1 ' /(2 ') if ' 0, 0 if ' 0, and

1 ' /(2 ') if s '/ 0, 0 if s '/ 0, and

' sin( / 2) and c ' cos( / 2).

i

hs A c c c

hc A s

s h h

0, 0,B C 0.B

20 or 2 ' '/ ,C C hB s c 20 2 ' '/ .C hB s c

90

Fig. 2: Left: flexible melittin protein (PDB entry 2mlt). Right: Energy drift from 10ns MD simulation of flexible 2mlt at room temperature revealing 3:1 nonlinear resonance (at 3, 3.27 and 3.78 fs).

Nonlinear: numerical (cont.)

91

Nonlinear: numerical (cont.)

Energy drift from 500ps SHAKE-constrained MD simulation of explicitly solvated 2mlt at room temperature revealing combined 4:1 and 3:1 nonlinear resonance.

92

B-spline MOLLY (cont.)Total energy(Kcal/mol) vs. time (fs)

Relative drift of toal energy:

< 1.0% for t = 5.0 fs,

< 3.5% for t = 6.0 fs.

93

TM: background (cont.) Dissipative particle dynamics:

Coarsening Pair-wise random/dissipative force on

elements (thus momentum preserving) Time reversible if using self-consistent

scheme (self-consistent dissipative leapfrog, SCD-leapfrog)

Vi Vj

FRi, FD

i

FRj = - FR

i

FDj = - FD

i

94

Self-consistent dissipative leapfrog

1. 1/2 kick: ( ) /(2 ),

2. Oscillate:

3. Evaluate: ( ), ( , ) ( ) , ( ) /

4. 1/2 kick (a): ( ) /(2 )

5. 1/2 kick (b):

C D Ri i i i i

i i i

C D Ri j i j j ij ij ij i j ij ij

C Ri i i i

ii

v v t F F F m

r r v t

F r F r v v e e F r e t

v v t F F m

v v t F

/(2 )

6. Evaluate: ( , )

Iterate steps 5 and 6 k times to obtain consistency of and .

Di

Di j j

Di ij

m

F r v

F v

Algorithm 2. One step of the self-consistent dissipative leapfrog discretization

Pagonabarraga, Hagen and Frenkel, 1998

95

Binding effectiveness computation Structure based drug design

First, study molecular details of a function or condition

Then, find a protein target for a disease Third, design the drug that binds to the

protein target Determine protein structure (experiments,

prediction) Search drug databases – screening Effectiveness of binding using molecular dynamics

96

Small proteins

Beta hairpin protein Villin head piece protein

http://www.stanford.edu/group/pandegroup/Cosm/phase2.html

97

Targeted MOLLY discretization

One step of Targeted MOLLY:(1) Update velocities w/ mollified “slow” forces (by a half long-step)(2) Propagate positions and velocities w/ “fast” forces and pairwise

targeted Langevin coupling (by a full long-step)(3) Do a time-averaging of positions using “fastest” forces and

evaluate the mollified “slow” force(4) Update velocities w/ mollified “slow” forces (by a half long-step)

Algorithm 1: Pseudo-code of one step of Targeted MOLLY

98

TM: one step of discretization

slow1 1

1 1

1 fast R D

-n

1 MOLLIFIED KICK: / 2,

2OSCILLATE: Propagate and by integrating

, ( ) ( ) ( ) for an interval to get and p

n n

n n

n

p p t F

q p

q M p p F q F q F qt q

_

_reduced

_slow

using SCD-leapfrog.

A TIME AVERAGING: Calculate time-averaged positions and

a Jacobian matrix, using only the fastest forces, ( ).1

MOLLIFIED KICK: (2

n

n q n

n n n

q

J q F q

p p t F q

) / 2.

Algorithm 2. One step of the Targeted MOLLY (TM) discretization

99

TM: canonical ensemble sampling

Fig. 3: Phase diagram (left) and velocity distribution (right) of the FPU problem [Hairer, C. Lubich and G. Wanner 2002] using TM with Langevin coupling coefficient of 0.01 showing TM recovers canonical ensemble after 2E07 steps with initial conditions x(0) = 0, v(0) = 1.

2 2 2 4 2( , ) / 2 / 2 ( 1) / 4, with (0) 0, (0) 1, 100.H x v v x x x v

100

MD in action: coalescence

Coalescence of water droplets w/ surfactants in gas phase. http://www.cse.nd.edu/~lcls/protomol

101

ProtoMol: object oriented framework for MD Matthey and Izaguirre (2001), Matthey, et al (2003)

http://www.cse.nd.edu/~lcls/protomol

Kale, Skeel, et al (1999)

102

ProtoMol (cont.)

103

Approx. MD: ion channel gating

KcsA potassium channel, courtesy of the Theoretical and Computational Biophysics Group at UIUC, http://www.ks.uiuc.edu/Research/smd_imd/kcsa/

104

MD: Verlet Method

Newton’s equation represents a set of N second order differential equations which are solved numerically at discrete time steps to determine the trajectory of each atom.

Advantage of the Verlet Method: requires only one force evaluation per timestep

Energy function:

used to determine the force on each atom:

105

What is a Forcefield?

To describe the time evolution of bond lengths, bond angles and torsions, also the nonbond van der Waals and elecrostatic interactions between atoms, one uses a forcefield.The forcefield is a collection of equations and associated constants designed to reproduce molecular geometry and selected properties of tested structures.

In molecular dynamics a molecule is described as a series of charged points (atoms) linked by springs (bonds).

106

Energy Terms Described in the CHARMm forcefield

Bond Angle

Dihedral Improper

107

Energy Functions

Ubond = oscillations about the equilibrium bond lengthUangle = oscillations of 3 atoms about an equilibrium angleUdihedral = torsional rotation of 4 atoms about a central bondUnonbond = non-bonded energy terms (electrostatics and Lenard-Jones)

108

Multigrid I

109

Multigrid II

110

Results (10-4 rPE)

111

Simulation Results for Melittin

PME requires about 3% of the CPU time (17 days 20 hours) when measured against Ewald

MG in pbc requires only about 1% MG is about 66% faster than PME

top related