The Pennsylvania State University

The Graduate School

Department of Chemistry

ANHARMONIC EFFECTS OF SMALL CLUSTERS OF MOLECULES

AND RANKING ACTIVITY OF PROTEIN MUTANTS

A Dissertation in

Chemistry

by

Malika D. Kumarasiri

© 2009 Malika D. Kumarasiri

Submitted in Partial Fulfillment of the Requirements

for the Degree of

Doctor of Philosophy

May 2009

The dissertation of Malika D. Kumarasiri was reviewed and approved* by the following:

Sharon Hammes-Schiffer, Eberly Professor of Biotechnology, Professor of Chemistry, Dissertation Advisor, Chair of Committee

A. Welford Castleman, Jr., Evan Pugh Professor of Chemistry and Physics, Eberly Distinguished Chair in Science

Philip C. Bevilacqua, Professor of Chemistry

James D. Kubicki, Associate Professor of Geochemistry

Ayusman Sen, Professor of Chemistry, Head of the Department of Chemistry

*Signatures are on file in the Graduate School

ABSTRACT

This thesis is presented in two parts. In part 1, anharmonic effects of small molecules are

investigated using theoretical methods. In part 2, mutants of enzymes are ranked according to

their activation energy barriers.

Anharmonic effects are required to describe vital processes such as bond breaking or

bond forming, and they significantly affect properties such as geometries and vibrational

frequencies. Despite their importance, anharmonic effects are typically overlooked due to the

high computational cost associated with calculating them. In part 1 of this thesis, anharmonic

effects of small clusters of ammonium nitrate and hydroxylammonium nitrate are investigated.

We compare the structures and vibrational modes against their harmonic counterparts using a

vibrational perturbation theory approach within the density functional theory framework.

Anharmonic effects significantly alter the structures and vibrational frequencies of ammonium

nitrate and hydroxylammonium nitrate clusters.

In part 2 of the thesis, we implement an efficient procedure to rank many mutants of

enzymes or protein designs according to the free energy barrier of the catalyzed reaction.

Escherichia coli dihydrofolate reductase (DHFR) and its mutants are used in this study, and the

mutant structures are generated based on the wild type enzyme structure. Different methods are

investigated to calculate the free energy barrier of hydride transfer in DHFR. The hydride transfer

reaction is investigated using empirical valence bond molecular dynamics simulations followed

by a weighted histogram analysis or umbrella integration to generate the free energy distribution

along the reaction coordinate. Fifteen single mutants of DHFR are used in this study. Our results

indicate a promising correlation between experimentally determined reaction rates and calculated

free energy barriers. The procedures are mostly automated and can easily be adapted for other

enzymatic mutants or designs.

TABLE OF CONTENTS

LIST OF FIGURES

LIST OF TABLES

ACKNOWLEDGEMENTS

Chapter 1 Introduction

1.1 General Introduction
1.2 References

Chapter 2 Anharmonic Effects in Ammonium Nitrate and Hydroxylammonium Nitrate Clusters

2.1 Introduction
2.2 Methods
2.3 Results
2.4 Conclusions
2.5 References

Chapter 3 Simulation Methods for Hydride Transfer in Dihydrofolate Reductase

3.1 Introduction
3.2 EVB Molecular Dynamics
3.3 WHAM and UI
3.4 Application to DHFR
3.5 Conclusions
3.6 References

Chapter 4 Ranking Mutants of Dihydrofolate Reductase According to the Hydride Transfer Rates

4.1 Introduction
4.2 Methods
4.3 Results
4.4 Conclusions
4.5 References

Chapter 5 Conclusions

5.1 Anharmonic Effects in Small Clusters
5.2 Simulation of Hydride Transfer in Dihydrofolate Reductase
5.3 Ranking Mutants of Dihydrofolate Reductase
5.4 References

Appendix A Technical Details of the Mutation Procedure

A.1 Introduction
A.2 Protocol for Creating Mutant Topology Files
A.3 Generating Mutant Coordinates
A.3.1 Sample Input Pdb File
A.3.2 Sample Output Pdb File
A.3.3 Sample Profix Long Entry

Appendix B Scripts for Automating Computer Job Submission

B.1 Introduction
B.2 Makedir.sh
B.3 Run_restr.sh
B.4 Submit_min1.sh
B.5 Run_min1.sh
B.6 Run_all.sh
B.7 Run_gromos.sh
B.8 Run_lambda.sh
B.9 ExtractV.f90

LIST OF FIGURES

Figure 1.1: The two-step reaction in DHFR. Proton transfer at N5 position of DHF is thought to precede hydride transfer to C6 position.

Figure 2.1: Optimized geometries of monomers and dimers of AN and HAN. Hydrogen bonds are indicated by dashed lines. The hydrogen bonding distances are given in Angstroms for the vibrationally averaged and equilibrium geometries, where the equilibrium distances are given in parentheses. (a) Covalent monomer of AN with Cs symmetry. (b) Covalent monomer of HAN with Cs symmetry. (c) Ionic dimer of (AN)2 with C2h symmetry. (d) Ionic dimer of (HAN)2 with C2 symmetry. This figure was created using GaussView.27

Figure 3.1: Free energy profiles of E. coli wild-type DHFR using WHAM (blue) and UI (red). (a) Full free energy curve using 19 windows. (b) Partial free energy curve using 6 windows.

Figure 3.2: Free energy profiles of wild-type DHFR using UI method. Full free energy curve using 19 windows is in blue, and the partial free energy curve using 6 windows is in red.

Figure 3.3: Original reactant state snapshot. Periodic images manifest to make the protein structure appear broken. This figure was created using VMD.37

Figure 3.4: Resolvated structure with no periodic images. This figure was created using VMD.37

Figure 3.5: Summary of steps of the restrained minimization and MD simulation procedure. fc is the restraining force constant which is halved with each cycle i.

Figure 3.6: Partial free energy profile of E. coli wild-type DHFR with WHAM (blue) and UI (red) using the sewed structure and 6 windows.

Figure 4.1: Hydride transfer reaction from the NADPH cofactor to the protonated dihydrofolate substrate H3F+ to form the products tetrahydrofolate H4F and NADP+. Figure reproduced with permission from Ref.8.

Figure 4.2: Summary of the steps for the generation of the initial mutant structure, equilibration, and calculation of the free energy barrier. Here fc is the force constant of the position restraints with respect to the initial structure during the restrained minimizations and molecular dynamics simulations.

Figure 4.3: Depiction of the mutation sites of DHFR. The cofactor is green, the substrate is magenta, and the mutated residues are red. This figure was created using VMD.49

Figure 4.4: Correlation plot for the calculated and experimental changes in the free energy barrier for the 15 mutants, where the calculated free energy barriers were obtained using UI with 6 windows. The correlation coefficient is R = 0.82.

LIST OF TABLES

Table 2.1: Harmonic and VPT2 frequencies in cm-1 of vibrational modes in the isolated neutral species. The experimental frequencies for NH3 are from Ref. 30. The experimental frequencies for HNO3 are from Refs. 31-34. The experimental frequencies for HONH2 are from Ref. 35.

Table 2.2: Harmonic and VPT2 frequencies in cm-1 of vibrational modes in the isolated ionic species. Note that VPT2 method is not applicable to NH4+ because it behaves as a spherical top.

Table 2.3: Harmonic and VPT2 frequencies in cm-1 of vibrational modes in the covalent monomers (i.e. one ion pair) AN and HAN.

Table 2.4: Harmonic and VPT2 frequencies in cm-1 of vibrational modes in the ionic dimer (AN)2.

Table 2.5: Harmonic and VPT2 frequencies in cm-1 of vibrational modes in the ionic dimer (HAN)2.

Table 2.6: Nuclear magnetic shielding constants for AN, HAN, (AN)2 and (HAN)2. All shielding constants are given in ppm. σeq and σvib are the shieldings at the equilibrium and the vibrationally averaged geometries, respectively. For reference, the nuclear magnetic shielding constant for H in TMS calculated at this level of theory is 31.9702.

Table 4.1: The experimentally determined hydride transfer rate constants for E. coli DHFR mutants at pH ≈7 and 300 K. These rate constants were measured at pH 7 for wild-type DHFR and all mutants except D27E and D27C, which were measured at pH 7.3.

Table 4.2: The change in the free energy barrier relative to the wild-type free energy barrier for a series of mutants for different equilibration periods. The experimental free energy barriers are obtained from the transition state theory rate constant expression $k = \frac{k_{\mathrm{B}}T}{h}\exp\left(-\Delta G^{\dagger}/k_{\mathrm{B}}T\right)$ using the experimentally determined rate constants in Table 4.1. The calculated free energy barriers are obtained using WHAM with six windows, UI with six windows, and UI with four windows. The notation “WHAM,350” denotes WHAM with 350 ps of MD on the EVB reactant surface in the equilibration procedure. “WHAM,650” and “WHAM,850” are defined accordingly. UI4 uses $\lambda_m$ = 0.050, 0.250, 0.500 and 0.625 and UI4’ uses $\lambda_m$ = 0.050, 0.125, 0.500 and 0.625. All free energies are given in kcal/mol. The equilibration period is given in ps.

ACKNOWLEDGEMENTS

I thank my advisor Dr. Sharon Hammes-Schiffer for all the guidance and support I

received during my research and professional development. I am also grateful for all the support

and discussions I had with Dr. Alexander Soudackov. Their advice made my graduate years a

very rewarding experience. I would also like to acknowledge the help from past and present

members of the Hammes-Schiffer group, especially Dr. M. Pak, Dr. C. Swalina, Dr. K.F. Wong,

Dr. Q. Wang, Dr. A. Chackraborty, Dr. J. Watney, Dr. Y. Small, Dr. J.H. Skone, Dr. A. Hazra,

D.K. Chakravorty, M. Ludlow, S. Edwards, N. Veeraraghavan, and G. Baker.

I would like to thank my doctoral committee members Dr. Will Castleman, Dr. Phil

Bevilacqua and Dr. James Kubicki for the interest and effort put towards my work. I also

acknowledge the financial support from the granting agencies: Air Force Office of Scientific

Research (grant no. FA9550-04-1-0062), National Institutes of Health (grant no. GM56207) and

the Defense Advanced Research Projects Agency (Protein Design Processes project).

Finally, I would like to acknowledge my parents for their guidance in my education and

development. I am also indebted to my wife Vindhya. Without her standing beside me every step

of my graduate years, I would not have made it this far.

Chapter 1

Introduction

1.1 General Introduction

This thesis consists of two parts. In part 1, we evaluate the importance of

anharmonic effects by examining clusters of small molecules using ab initio methods. In part 2,

we present an efficient scheme to rank mutants of enzymes according to their reaction rates using

an empirical valence bond (EVB) methodology.

Throughout computational chemistry, the harmonic approximation is used to describe

many molecular properties. However, the harmonic approximation is unsuccessful at describing

certain vital chemical phenomena such as bond breaking or bond forming processes. Anharmonic

effects must be included in computations to account for such processes. Additionally,

anharmonicity can significantly alter properties such as geometries, vibrational frequencies and

nuclear magnetic shieldings. There are several ways to include anharmonic effects in

first-principles calculations. Two popular approaches are self-consistent (vibrational

self-consistent field: VSCF1,2) and second-order perturbative (second-order vibrational perturbation

theory: VPT23). The VSCF method is implemented in the GAMESS package,4 and

the VPT2 method is implemented in Gaussian 03.5 Converged variational VSCF results

might be expected to be more accurate than those of a second-order perturbative method. However,

as noted by Handy and coworkers, the VPT2 treatment often leads to an effective inclusion of

nearly exact higher-order terms that are required for computation of the third and semidiagonal fourth

derivatives needed for second-order perturbation theory.6,7 Therefore, VPT2 predictions can be closer to

experimental results than VSCF predictions.8 We also note that incorporating quantitative

anharmonic effects into calculations requires heavy additional computations and thus is still

prohibitive for large molecular systems such as biomolecules.
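
To make the contrast with the harmonic picture concrete, the standard second-order expression for the vibrational term values of an asymmetric top is sketched below. The notation is introduced here for illustration and is not taken from this thesis: $\omega_i$ are harmonic wavenumbers, $x_{ij}$ are anharmonic constants, $n_i$ are vibrational quantum numbers, and $\chi_0$ is a constant shift.

```latex
\begin{align}
E(\mathbf{n})/hc &= \chi_0 + \sum_i \omega_i\left(n_i+\tfrac{1}{2}\right)
  + \sum_{i\le j} x_{ij}\left(n_i+\tfrac{1}{2}\right)\left(n_j+\tfrac{1}{2}\right) \\
\nu_i &= E(n_i=1)/hc - E(\mathbf{0})/hc
  = \omega_i + 2x_{ii} + \tfrac{1}{2}\sum_{j\ne i} x_{ij}
\end{align}
```

The harmonic approximation corresponds to setting every $x_{ij}$ to zero, so that each fundamental $\nu_i$ equals $\omega_i$. The constants $x_{ij}$ are built from the cubic and semidiagonal quartic force constants, which is why the third and semidiagonal fourth derivatives mentioned above are needed.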

In part 1 of this thesis, we specifically investigate anharmonic effects of single and

double pairs of ammonium nitrate (AN) and hydroxylammonium nitrate (HAN) molecules. The

VPT2 method is used within the density functional theory framework to compare and contrast

anharmonic effects against their harmonic counterparts. AN and HAN are useful model systems

in ionic liquid studies. They are good examples of protic ionic liquids, which are formed by

proton transfer between acids and bases. This proton transfer reaction has been a topic of

extensive investigations due to its potential use in high-temperature fuel cells.9-12 AN has also

been used as a solid oxidizer in rocket propulsion fuels,13,14 and HAN has been used in liquid

propellants. Accurate understanding of fundamental properties of these ionic materials provides

the foundation for future studies of ionic liquids. In chapter 2, we will characterize AN and HAN

with the VPT2 method.

In part 2 of this thesis, we investigate chemical reactions within large biomolecules such

as enzymes. As stated previously, biomolecules are not tractable with ab initio computational

methods due to the large system size. Thus, we perform molecular dynamics (MD) simulations

within the framework of the empirical valence bond (EVB) theory.15 EVB methodology allows

chemical bonds to break or form and includes anharmonic effects for these bonds. The method is

then extended to investigate activation free energy barriers of many enzymatic mutants.
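
As a rough illustration of how a measured rate constant relates to an activation free energy barrier, the sketch below inverts the transition state theory expression quoted in the caption of Table 4.2, k = (kB T / h) exp(−ΔG†/kB T). The function names and the three rate constants are hypothetical and chosen only for this example; they are not data from this work.

```python
import math

KB = 1.380649e-23        # Boltzmann constant, J/K
H = 6.62607015e-34       # Planck constant, J s
R_KCAL = 1.987204e-3     # gas constant, kcal/(mol K)


def barrier_from_rate(k, temperature=300.0):
    """Free energy barrier in kcal/mol from a rate constant in s^-1 (TST)."""
    prefactor = KB * temperature / H          # k_B T / h in s^-1
    return -R_KCAL * temperature * math.log(k / prefactor)


def rate_from_barrier(delta_g, temperature=300.0):
    """Rate constant in s^-1 from a free energy barrier in kcal/mol (TST)."""
    prefactor = KB * temperature / H
    return prefactor * math.exp(-delta_g / (R_KCAL * temperature))


# Hypothetical rate constants in s^-1, invented purely for illustration.
hypothetical_rates = {"variant_A": 200.0, "variant_B": 20.0, "variant_C": 2.0}

# Rank from lowest barrier (fastest) to highest barrier (slowest).
ranking = sorted(hypothetical_rates,
                 key=lambda name: barrier_from_rate(hypothetical_rates[name]))
```

Under this expression, a tenfold change in the rate constant at 300 K shifts the barrier by RT ln 10, roughly 1.4 kcal/mol, which indicates the resolution a calculation needs in order to separate mutants whose rates differ by an order of magnitude.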

Enzymatic proteins are remarkable molecules due to their specificity and high efficiency

in catalysis.16,17 Decoding their ability to increase rates of reactions by many orders of magnitude

under physiological conditions is not only of interest to fundamental sciences but to applied

sciences and industry as well. Many state-of-the-art experimental and computational techniques

are employed to probe the structure and dynamics of enzymes. Among the computational

techniques are various flavors of MD simulation methods.15,18 MD simulation is based on the

hypothesis that statistical ensemble averages of a system are equal to the time averages.

Interactions between particles of the system are often described using an empirical forcefield, and

the time evolution of the system is computed over a large number of steps with each step carried

out for a small change in time.15 As a result, calculations extending over a long period of time,

with quality dependent on the empirical forcefield, are typical characteristics of MD simulations.
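
The step-by-step time evolution described above can be illustrated with a minimal sketch. It assumes a single particle in one dimension, a harmonic force standing in for an empirical forcefield, reduced units, and the velocity Verlet integrator; none of these choices are taken from this thesis, and production MD codes are far more elaborate.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Propagate one particle in 1D; return the list of (x, v) at each step."""
    trajectory = [(x, v)]
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        trajectory.append((x, v))
    return trajectory


def harmonic_force(x, k=1.0):
    """Toy 'forcefield': a single harmonic spring (illustrative only)."""
    return -k * x


def total_energy(x, v, mass=1.0, k=1.0):
    """Kinetic plus potential energy for the toy system."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


# Many small time steps, as described in the text (reduced units).
traj = velocity_verlet(x=1.0, v=0.0, force=harmonic_force,
                       mass=1.0, dt=0.01, n_steps=10000)
energies = [total_energy(x, v) for x, v in traj]

# Time average of x^2 along the trajectory.
time_avg_x2 = sum(x * x for x, _ in traj) / len(traj)
```

For this toy oscillator with unit amplitude, the time average of x² approaches 0.5 as the trajectory grows longer, and the total energy stays nearly constant when the time step is small. In a real simulation, such time averages are used in place of ensemble averages.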

Although significant progress has been made toward understanding the structures and

functions of enzymes using MD simulations, there are still many unanswered questions due to the

complex structural and dynamical nature of enzymes. Continuous improvements are made in

theoretical procedures not only to improve accuracy of predictions of these systems but also to

increase their efficiency. However, improving one of these two aspects typically comes at the

expense of the other. One particularly interesting area under investigation is computational

mutant studies or protein design. The goal of mutant studies or protein design is to investigate

possibilities of altering activity or specificity of an enzyme. Typically, the mutants or designs are

then ranked according to their chemical activity. Enzymes with altered activity or specificity are

in demand for drug discovery, detergents, soil treatment, and even enzyme-based computers.19-23

Although mutant studies can be performed by both experimental and computational methods,

computational methods to predict activity are typically more attractive in the initial stages because

they require fewer resources.

Computational methods used in mutant activity studies require two essential qualities: (1)

they should be very efficient as there will be many structures to study; (2) they should be able to

describe the chemical processes in enzymes accurately enough to distinguish between the

activities of different mutants or designs. This implies that it is prohibitive to perform long

conventional MD calculations per mutant or design, and the MD procedure needs to be

streamlined and possibly automated so that many computations can be managed at once.

Additionally, the inability of empirical forcefields to describe bond breaking and bond forming

processes needs to be addressed. These problems can be remedied by using a hybrid

quantum/classical MD (QM/MM) approach.24-26 While there are many flavors of QM/MM

approaches, we use an empirical valence bond (EVB) strategy in this thesis.15,27 The EVB MD

method is well suited for our task, as it is capable of providing insights into chemical reactions

quantitatively and very efficiently. It has been used successfully to describe a wide range of

chemical reactions in solution and proteins.15,24,28-30 To describe a reaction with the EVB method,

two valence bond states are defined, one for the reactant state and one for the product state. The

electronic ground state of the potential energy surface is modeled by a mixture of these two states

to allow the chemical reaction to be driven forward. A more complete description of the EVB

method is given in Section 3.2.
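
The two-state mixing described above can be sketched as the lowest eigenvalue of a 2×2 Hamiltonian whose diagonal elements are the reactant and product valence bond energies. In the minimal sketch below, the parabolic diabatic curves, the constant coupling, and all numerical values are hypothetical choices made for illustration in arbitrary units; they are not the parameters used in this thesis.

```python
import math


def evb_ground_state(v11, v22, v12):
    """Lowest eigenvalue of the symmetric 2x2 matrix [[v11, v12], [v12, v22]]."""
    mean = 0.5 * (v11 + v22)
    half_gap = 0.5 * math.sqrt((v11 - v22) ** 2 + 4.0 * v12 ** 2)
    return mean - half_gap


def diabatic_reactant(q, k=20.0):
    """Hypothetical reactant valence bond state: parabola centered at q = -1."""
    return 0.5 * k * (q + 1.0) ** 2


def diabatic_product(q, k=20.0):
    """Hypothetical product valence bond state: parabola centered at q = +1."""
    return 0.5 * k * (q - 1.0) ** 2


coupling = 2.0  # hypothetical constant off-diagonal coupling

# Ground state energy along a model reaction coordinate q from -1.5 to 1.5.
profile = [
    (q / 10.0,
     evb_ground_state(diabatic_reactant(q / 10.0),
                      diabatic_product(q / 10.0),
                      coupling))
    for q in range(-15, 16)
]
```

Where the two diabatic curves cross (q = 0 in this toy model), the coupling lowers the ground state by exactly its magnitude. Far from the crossing, the ground state follows whichever valence bond state is lower in energy, so a single smooth surface connects reactant and product.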

We present the mutant ranking studies performed within the EVB/MD framework to

efficiently rank mutants of dihydrofolate reductase (DHFR) according to their activity. DHFR has

become an ideal system for computational studies due to the small system size, the abundance of

structures present in the Protein Data Bank,31 and the importance of the reaction it catalyzes.

DHFR catalyzes the reduction of 7,8-dihydrofolate (DHF) to 5,6,7,8-tetrahydrofolate (THF),

where coenzyme nicotinamide adenine dinucleotide phosphate (NADPH) acts as a hydride

donor.32 THF is the active form of folate in humans, and it is required to function as a methyl

group shuttle for the de novo synthesis of purines, pyrimidines, and certain amino acids.

Inhibiting DHFR activity results in folate deficiency, which can be manipulated into a therapeutic

effect to battle cancerous cell growth or bacterial growth.17,19,33-35 The complete reaction of

DHFR is a two-step process, where a proton transfer to the N5 position of DHF is thought to

occur prior to the hydride transfer between protonated DHF and NADPH (Figure 1.1). This

notion is based on the chemical intuition that an initial proton transfer would be energetically

more favorable than an initial hydride transfer. Hydride transfer in DHFR has been the subject of

many computational studies, which provide a solid foundation for studying mutants.

Figure 1.1: The two-step reaction in DHFR. Proton transfer at N5 position of DHF is thought to precede

hydride transfer to C6 position.

In chapter 3 we will present the methodology for generating free energy profiles and

validate it using E. coli wild-type DHFR. Chapter 4 will present the procedures and results for

ranking the mutants of DHFR according to their catalytic reaction rates. Finally, in chapter 5 we

will provide overall conclusions.

1.2 References

(1) Carney, G. D.; Sprandel, L. I.; Kern, C. W. Advances in Chemical Physics 1978, 37, 305.
(2) Bowman, J. M. Journal of Chemical Physics 1978, 68, 608.
(3) Barone, V. Journal of Chemical Physics 2004, 120, 3059.
(4) Schmidt, M. W.; Baldridge, K. K.; Boatz, J. A.; Elbert, S. T.; Gordon, M. S.; Jensen, J. H.; Koseki, S.; Matsunaga, N.; Nguyen, K. A.; Su, S. J.; Windus, T. L.; Dupuis, M.; Montgomery, J. A. Journal of Computational Chemistry 1993, 14, 1347.
(5) Frisch, M. J.; Trucks, G. W.; Schlegel, H. B.; Scuseria, G. E.; Robb, M. A.; Cheeseman, J. R.; Montgomery, J. A., Jr.; Vreven, T.; Kudin, K. N.; Burant, J. C.; Millam, J. M.; Iyengar, S. S.; Tomasi, J.; Barone, V.; Mennucci, B.; Cossi, M.; Scalmani, G.; Rega, N.; Petersson, G. A.; Nakatsuji, H.; Hada, M.; Ehara, M.; Toyota, K.; Fukuda, R.; Hasegawa, J.; Ishida, M.; Nakajima, T.; Honda, Y.; Kitao, O.; Nakai, H.; Klene, M.; Li, X.; Knox, J. E.; Hratchian, H. P.; Cross, J. B.; Adamo, C.; Jaramillo, J.; Gomperts, R.; Stratmann, R. E.; Yazyev, O.; Austin, A. J.; Cammi, R.; Pomelli, C.; Ochterski, J. W.; Ayala, P. Y.; Morokuma, K.; Voth, G. A.; Salvador, P.; Dannenberg, J. J.; Zakrzewski, V. G.; Dapprich, S.; Daniels, A. D.; Strain, M. C.; Farkas, O.; Malick, D. K.; Rabuck, A. D.; Raghavachari, K.; Foresman, J. B.; Ortiz, J. V.; Cui, Q.; Baboul, A. G.; Clifford, S.; Cioslowski, J.; Stefanov, B. B.; Liu, G.; Liashenko, A.; Piskorz, P.; Komaromi, I.; Martin, R. L.; Fox, D. J.; Keith, T.; Al-Laham, M. A.; Peng, C. Y.; Nanayakkara, A.; Challacombe, M.; Gill, P. M. W.; Johnson, B.; Chen, W.; Wong, M. W.; Gonzalez, C.; Pople, J. A. Gaussian03; revision C.03 ed.; Gaussian, Inc.: Pittsburgh, PA, 2003.
(6) Burcl, R.; Handy, N. C.; Carter, S. Spectrochim Acta A Mol Biomol Spectrosc 2003, 59, 1881.
(7) Burcl, R.; Carter, S.; Handy, N. C. Chemical Physics Letters 2003, 373, 357.
(8) Barone, V. Journal of Chemical Physics 2005, 122, 014108.
(9) Alavi, S.; Thompson, D. L. Journal of Chemical Physics 2002, 117, 2599.
(10) Alavi, S.; Thompson, D. L. Journal of Chemical Physics 2003, 119, 4274.
(11) Guillot, B.; Guissani, Y. Journal of Chemical Physics 2002, 116, 2047.
(12) Schmidt, M. W.; Gordon, M. S.; Boatz, J. A. Journal of Physical Chemistry A 2005, 109, 7285.
(13) Kondirkov, B. N.; Annikov, V. E.; Egorshev, V. Y.; DeLuca, L.; Bronzi, C. J. Propul. Power 1999, 15, 763.
(14) Sinditskii, V. P.; Egorshev, V. Y.; Levshenkov, A. I.; Serushkin, V. V. Propellants, Explosives, Pyrotechnics 2005, 30, 269.
(15) Warshel, A. Computer Modeling of Chemical Reactions in Enzymes and Solutions; John Wiley & Sons, Inc.: New York, 1991.
(16) Benkovic, S. J.; Hammes-Schiffer, S. Science 2003, 301, 1196.
(17) Berg, J. M.; Tymoczko, J. L.; Stryer, L. Biochemistry, 5th ed.; W.H. Freeman: New York, 2002.
(18) Brooks, C. L.; Karplus, M.; Pettitt, B. M. Proteins : a theoretical perspective of dynamics, structure, and thermodynamics; J. Wiley: New York, 1988.
(19) Allegra, C. J.; Hoang, K.; Yeh, G. C.; Drake, J. C.; Baram, J. Journal of Biological Chemistry 1987, 262, 13520.
(20) Baron, R.; Lioubashevski, O.; Katz, E.; Niazov, T.; Willner, I. Angew Chem Int Ed Engl 2006, 45, 1572.
(21) Leahy, J. G.; Colwell, R. R. Microbiological Reviews 1990, 54, 305.
(22) Kapoor, K. K.; Jain, M. K.; Mishra, M. M.; Singh, C. P. Annals of Microbiology (Paris) 1978, 129 B, 613.
(23) Rao, A. G. Plant Physiology 2008, 147, 6.
(24) Billeter, S. R.; Webb, S. P.; Iordanov, T.; Agarwal, P. K.; Hammes-Schiffer, S. Journal of Chemical Physics 2001, 114, 6925.
(25) Billeter, S. R.; Webb, S. P.; Agarwal, P. K.; Iordanov, T.; Hammes-Schiffer, S. Journal of the American Chemical Society 2001, 123, 11262.
(26) Hammes-Schiffer, S. Accounts of Chemical Research 2006, 39, 93.
(27) Warshel, A. Journal of Physical Chemistry 1982, 86, 2218.
(28) Schmitt, U. W.; Voth, G. A. Journal of Physical Chemistry B 1998, 102, 5547.
(29) Vuilleumier, R.; Borgis, D. Chemical Physics Letters 1998, 284, 71.
(30) Cembran, A.; Gao, J. Theoretical Chemistry Accounts 2007, 118, 211.
(31) Berman, H. M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T. N.; Weissig, H.; Shindyalov, I. N.; Bourne, P. E. Nucleic Acids Research 2000, 28, 235.
(32) Miller, G. P.; Benkovic, S. J. Chemistry & Biology 1998, 5, R105.
(33) Miovic, M.; Pizer, L. I. Journal of Bacteriology 1971, 106, 856.
(34) Huennekens, F. M. Advances in Enzyme Regulation 1994, 34, 397.
(35) Schweitzer, B. I.; Dicker, A. P.; Bertino, J. R. FASEB Journal 1990, 4, 2441.

Chapter 2

Anharmonic Effects in Ammonium Nitrate and Hydroxylammonium Nitrate Clusters

Reproduced in part with permission from M. D. Kumarasiri, C. Swalina, and S. Hammes-Schiffer, Journal of Physical Chemistry B 2007, 111, 4653.

© 2007 American Chemical Society

2.1 Introduction

The physical properties of hydrogen-bonded acid-base complexes impact a wide range of

materials. One example is room temperature ionic liquids, which are typically defined to be

organic salts that melt below 100 °C.1-3 Ionic liquids have many potential technological

applications because of their low vapor pressure, versatility, and environmentally benign nature.

Protic ionic liquids, which are formed by proton transfer between acids and bases, are potentially

relevant to high-temperature fuel cell applications.4 Although not room temperature ionic liquids,

ammonium nitrate (AN) and hydroxylammonium nitrate (HAN) serve as useful model systems

for the development of methods to study the properties of ionic liquids. AN has been used as a

solid oxidizer in rocket propulsion fuels,5,6 and HAN has been used in liquid propellants.7,8

Understanding the fundamental properties of these ionic materials will provide the foundation for

future studies of ionic liquids. A topic of particular interest is the role of proton transfer reactions

in hydrogen-bonded acid-base complexes. Hydrogen tunneling and coupling between heavy

atom motions and the transferring proton are expected to be important in these types of reactions.

A variety of theoretical studies have investigated the role of proton transfer reactions in

hydrogen-bonded acid-base complexes. Thompson and coworkers used density functional theory

and ab initio MP2 theory to study proton transfer in gas phase clusters of ammonium nitrate,9

ammonium dinitramide,10 and hydroxylammonium nitrate.11 In addition, Morokuma and


coworkers studied ammonium dinitramide clusters at both the RHF and the MP2 levels.12 The

calculations on single acid-base pairs in the gas phase indicate that the hydrogen bonded, neutral

acid-base pair is the only stable structure (i.e., the ionic pairs are not stable) at correlated levels of

theory. The ionic dimers (i.e., two ionic acid-base pairs) are stable minima on the correlated

potential energy surfaces. Thus, the properties of ionic dimers are expected to be more relevant

to bulk ionic materials. In addition to these studies, recently Schmidt, Gordon, and Boatz

performed calculations on proton transfer in triazolium-dinitramide ion pairs.13 Guillot and

Guissani performed one-phase and two-phase molecular dynamics simulations to study the

impact of proton transfer on the phase behavior of ammonium chloride (NH4Cl).14 They

determined that the existence of both ionic and covalent species in the liquid phase influences the

melting process.

The objective of this chapter is to characterize covalent and ionic clusters of ammonium

nitrate (NH4+NO3−) and hydroxylammonium nitrate (HONH3+NO3−) with the inclusion of

anharmonic effects. We perform density functional theory calculations of the isolated neutral and

ionic components, the covalent monomers, and the ionic dimers. In each case, we use

second-order vibrational perturbation theory (VPT2) to calculate the frequencies and geometries. This

approach leads to more accurate frequencies and geometries than previous calculations of

frequencies directly from the Hessian because the anharmonic effects are included. We also

calculate the anharmonic effects on the nuclear magnetic shielding constants for nitrogen,

oxygen, and hydrogen nuclei in the ionic clusters. All of these calculations provide insight into

the significance of anharmonic effects in ionic materials and provide data that will be useful for

the parameterization of molecular mechanical forcefields for ionic liquids and other ionic

materials.


2.2 Methods

All of the calculations were performed with density functional theory (DFT) using the

B3LYP functional15-17 and the 6-311++G(d,p) basis set18,19 with the Gaussian 03 package.20 We

used a pruned (99,770) grid for the numerical integrations. We calculated the frequencies based

on the harmonic approximation directly from the Hessian and the frequencies including

anharmonic effects with the VPT2 method. In the VPT2 method, the zeroth-order vibrational

wavefunctions are generated from the harmonic approximation, and the second-order perturbation

theory corrections are calculated from the cubic force constants and semidiagonal quartic force

constants. The required cubic and quartic force constants are obtained by numerical

differentiation of the analytical Hessians. The VPT2 method has been implemented by

Barone21,22 in the Gaussian 03 package.20
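As a rough illustration of the VPT2 procedure described above, the following Python sketch applies it to a one-dimensional Morse oscillator. It is not the Gaussian 03 implementation: the cubic and quartic force constants are obtained here by central finite differences of the potential itself rather than of analytical Hessians, and the harmonic frequency and well depth are hypothetical values chosen only for illustration. For a single mode expressed in the dimensionless normal coordinate, the VPT2 anharmonic constant is x = φ4/16 − 5φ3²/(48ω) and the fundamental frequency is ν = ω + 2x, which for a Morse potential coincides with the exact fundamental.

```python
import math

OMEGA = 3000.0   # harmonic frequency in cm^-1 (hypothetical, roughly an X-H stretch)
DEPTH = 40000.0  # Morse well depth in cm^-1 (hypothetical)


def morse(q):
    """Morse potential (cm^-1) as a function of the dimensionless normal coordinate q."""
    alpha = math.sqrt(OMEGA / (2.0 * DEPTH))  # gives V ~ (OMEGA / 2) q^2 near q = 0
    return DEPTH * (1.0 - math.exp(-alpha * q)) ** 2


def cubic_quartic(potential, h=0.05):
    """Third and fourth derivatives at q = 0 from central finite differences."""
    vm2, vm1, v0, vp1, vp2 = (potential(k * h) for k in (-2, -1, 0, 1, 2))
    phi3 = (vp2 - 2.0 * vp1 + 2.0 * vm1 - vm2) / (2.0 * h ** 3)
    phi4 = (vp2 - 4.0 * vp1 + 6.0 * v0 - 4.0 * vm1 + vm2) / h ** 4
    return phi3, phi4


def vpt2_fundamental(omega, phi3, phi4):
    """Single-mode VPT2 fundamental: nu = omega + 2x."""
    x = phi4 / 16.0 - 5.0 * phi3 ** 2 / (48.0 * omega)
    return omega + 2.0 * x


phi3, phi4 = cubic_quartic(morse)
nu_vpt2 = vpt2_fundamental(OMEGA, phi3, phi4)
nu_exact = OMEGA - OMEGA ** 2 / (2.0 * DEPTH)  # exact Morse fundamental

print(f"harmonic: {OMEGA:.1f}  VPT2: {nu_vpt2:.1f}  exact Morse: {nu_exact:.1f} cm^-1")
```

In a polyatomic molecule the same idea applies mode by mode, with additional terms that couple different normal modes through the cubic and semidiagonal quartic force constants.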

We calculated the anharmonic contribution to the vibrationally averaged isotropic nuclear

magnetic shielding constants for all nuclei by comparing shielding constants evaluated at the

equilibrium and vibrationally averaged geometries at 0 K. The gauge-independent

atomic orbital (GIAO) approach23 was used to calculate the nuclear magnetic shielding constants.

This approach accounts specifically for the contribution to the shielding constants arising from

the anharmonicity of the potential energy surface. Additional zero-point vibrational effects could

be calculated from the curvature of the surface corresponding to the shielding constant and the

harmonic frequencies,24-26 but such calculations are beyond the scope of this work.

2.3 Results

As observed previously for AN and HAN,9,11 the only stable structures for the monomers

are hydrogen-bonded, neutral acid-base pairs, whereas the structures corresponding to two ionic


acid-base pairs are the global minima for the dimers. The optimized geometries for the

monomers and dimers of AN and HAN are shown in Figure 2.1.

Figure 2.1: Optimized geometries of monomers and dimers of AN and HAN. Hydrogen bonds are

indicated by dashed lines. The hydrogen bonding distances are given in Angstroms for the vibrationally

averaged and equilibrium geometries, where the equilibrium distances are given in parentheses. (a)

Covalent monomer of AN with Cs symmetry. (b) Covalent monomer of HAN with Cs symmetry. (c) Ionic

dimer of (AN)2 with C2h symmetry. (d) Ionic dimer of (HAN)2 with C2 symmetry. This figure was created

using GaussView.27


The AN and HAN monomers have Cs symmetry, the (AN)2 dimer has C2h symmetry, and

the (HAN)2 dimer has C2 symmetry. The hydrogen bonding distances are given for both the

vibrationally averaged and the equilibrium structures. The vibrationally averaged structure is

obtained by averaging the coordinates over the nuclear vibrational wavefunction calculated with

the VPT2 method. Thus, the vibrationally averaged structures include anharmonic effects. As

expected, the bond lengths for the bonds between the donor atoms and the hydrogen atoms

increase when the anharmonic effects are included. For the structures given in Figures 2.1(a), (b)

and (c), the distance between the hydrogen atom and the donor atom increases and the distance

between the hydrogen atom and the acceptor atom decreases in each hydrogen bond when

anharmonic effects are included. The hydrogen bonding in the (HAN)2 dimer shown in Figure

2.1(d) is more complex because one of the oxygen atoms on each NO3− moiety serves as the

acceptor for two hydrogen bonds.

The calculated frequencies for the isolated neutral and ionic species are given in Tables

2.1 and 2.2, respectively. The experimental frequencies for the isolated neutral species are also

provided in Table 2.1. A comparison of the frequencies calculated with the VPT2 method to the

experimental data enables us to benchmark the VPT2 method for these types of systems. As

shown in Table 2.1, the frequencies obtained with the VPT2 method are in better agreement with

the gas phase experimental data than those obtained with the conventional harmonic approach.

These results illustrate the importance of anharmonic effects. In some cases, the anharmonic

effects decrease the frequency by ∼200 cm-1, significantly improving the agreement with

experiment. A more standard approach to account for vibrational anharmonicity in electronic

structure calculations is to scale the calculated harmonic frequencies by an empirical scaling

factor. The empirical scaling factor for the B3LYP DFT method with similar basis sets has been

determined to be ~0.96 – 0.97.28,29 While this empirical scaling procedure leads to qualitatively


reasonable frequencies, Tables 2.1 and 2.2 indicate that the VPT2 method is more quantitatively

accurate. The average deviation from experiment is 7 cm-1 for the VPT2 frequencies, compared to 66

cm-1 for the harmonic frequencies.

Table 2.1: Harmonic and VPT2 frequencies in cm-1 of vibrational modes in the isolated neutral species. The experimental frequencies for NH3 are from Ref. 30. The experimental frequencies for HNO3 are from Refs. 31-34. The experimental frequencies for HONH2 are from Ref. 35.

Species Label Description Experimental frequency Harmonic VPT2

NH3 v1(a1) Sym. Stretch 3337 3480 3339

v2(a1) Sym. Bend 950 1006 902

v3(e) Asym. Stretch 3444 3607 3440

v4(e) Asym. Bend 1627 1669 1619

HNO3 v1(a') OH stretch 3550 3727 3548

v2(a') NO asym. stretch 1709 1756 1711

v3(a') NO sym. stretch 1326 1349 1319

v4(a') NOH bend 1304 1320 1294

v5(a') NO(OH) stretch 878 897 875

v6(a') ONO bend 647 649 633

v7(a'') ONO(OH) bend 580 587 575

v8(a'') N out-of-plane bend 763 773 762

v9(a'') OH torsion 458 461 446

HONH2 v1(a') OH stretch 3650 3824 3631

v2(a') NH stretch 3294 3448 3286

v3(a') HNH bend 1604 1673 1609

v4(a') NOH bend 1353 1390 1337

v5(a') NH2 wag 1115 1135 1096

v6(a') NO stretch 895 927 900

v7(a'') NH stretch 3359 3528 3342

v8(a'') NH2 twist 1294 1328 1286

v9(a'') OH torsion 386 442 419
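The comparison between harmonic, empirically scaled, and VPT2 frequencies can be checked directly from the entries of Table 2.1. The short Python sketch below is only an illustration: the scaling factor of 0.965 is an assumed value inside the ~0.96 – 0.97 range quoted in the text, not a value used in this work. Read as signed averages (calculated minus experimental), the tabulated values give about +66 cm-1 for the harmonic frequencies and about −7 cm-1 for the VPT2 frequencies.

```python
# (experimental, harmonic, VPT2) frequencies in cm^-1, transcribed from Table 2.1
TABLE_2_1 = [
    # NH3
    (3337, 3480, 3339), (950, 1006, 902), (3444, 3607, 3440), (1627, 1669, 1619),
    # HNO3
    (3550, 3727, 3548), (1709, 1756, 1711), (1326, 1349, 1319), (1304, 1320, 1294),
    (878, 897, 875), (647, 649, 633), (580, 587, 575), (763, 773, 762), (458, 461, 446),
    # HONH2
    (3650, 3824, 3631), (3294, 3448, 3286), (1604, 1673, 1609), (1353, 1390, 1337),
    (1115, 1135, 1096), (895, 927, 900), (3359, 3528, 3342), (1294, 1328, 1286),
    (386, 442, 419),
]

SCALE = 0.965  # assumed empirical scaling factor (illustrative only)


def mean(values):
    values = list(values)
    return sum(values) / len(values)


signed_harmonic = mean(h - e for e, h, v in TABLE_2_1)
signed_vpt2 = mean(v - e for e, h, v in TABLE_2_1)
signed_scaled = mean(SCALE * h - e for e, h, v in TABLE_2_1)

abs_harmonic = mean(abs(h - e) for e, h, v in TABLE_2_1)
abs_vpt2 = mean(abs(v - e) for e, h, v in TABLE_2_1)
abs_scaled = mean(abs(SCALE * h - e) for e, h, v in TABLE_2_1)

print(f"mean signed deviation:   harmonic {signed_harmonic:+.1f}, "
      f"scaled {signed_scaled:+.1f}, VPT2 {signed_vpt2:+.1f} cm^-1")
print(f"mean absolute deviation: harmonic {abs_harmonic:.1f}, "
      f"scaled {abs_scaled:.1f}, VPT2 {abs_vpt2:.1f} cm^-1")
```

Reporting both the signed and the absolute averages makes it clear how much of the agreement comes from cancellation between positive and negative deviations.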


Table 2.2: Harmonic and VPT2 frequencies in cm-1 of vibrational modes in the isolated ionic species. Note that the VPT2 method is not applicable to NH4+ because it behaves as a spherical top.

Species Label Description Harmonic VPT2

NH4+ v1(a1) Sym. Stretch 3372

v2(e) Twist 1727

v3(t) Asym. Stretch 3475

v4(t) Asym. Bend 1489

NO3- v1(a') Sym. Stretch 1066 1044

v2(a'') N out-of-plane bend 835 825

v3(e') Asym. Stretch 1378 1344

v4(e'') ONO Asym. Bend 709 699

HONH3+ v1(a') OH stretch 3698 3523

v2(a') NH stretch 3426 3254

v3(a') NH stretch 3328 3202

v4(a') HNH bend 1640 1586

v5(a') NH3 umbrella mode 1592 1542

v6(a') NOH bend 1476 1404

v7(a') NH3 wag 1152 1126

v8(a') NO stretch 1016 981

v9(a'') NH stretch 3400 3229

v10(a'') HNH bend 1639 1591

v11(a'') NH3 twist 1193 1157

v12(a'') HONH torsion 315 258


The calculated frequencies for the covalent monomers of AN and HAN are given in

Table 2.3. Tables 2.4 and 2.5 provide the calculated frequencies of the ionic dimers of AN and HAN,

respectively. The inclusion of anharmonic effects significantly decreases the frequencies of the

vibrational modes in all of these clusters, particularly for the NH and OH stretching modes. The

largest effects were observed for the NH and OH stretching modes involved in hydrogen bonding

interactions. The NH and OH stretching frequencies (i.e., ν2–ν4 and ν29–ν32) are decreased by up to ~500

cm-1 in (HAN)2. Moreover, the NH4+ symmetric stretch frequency, ν2, is decreased by ~1000

cm-1 in (AN)2. In these cases, the application of the empirical scaling factor leads to substantial

errors in the predicted frequencies. In addition to providing the frequencies for all modes in the

clusters, Table 2.3 also provides descriptions of the modes that are relatively localized with

definitive character. The remaining modes are not straightforward to characterize. For the ionic

dimers, we identified a mode corresponding to an intermolecular breathing mode, which

corresponds to ν11 in (AN)2 and ν20 in (HAN)2. These breathing modes are associated with

relatively low frequencies of ~300 cm-1.


Table 2.3: Harmonic and VPT2 frequencies in cm-1 of vibrational modes in the covalent monomers (i.e. one ion pair) AN and HAN.

Species Label Description Harmonic VPT2

AN v1(a') NH3 asym. stretch 3586 3416

v2(a') NH3 sym. stretch 3472 3326

v3(a') OH stretch 2732 2266

v4(a') NO asym. stretch 1733 1675

v5(a') NH3 asym. bend 1656 1629

v6(a') NOH bend 1511 1459

v7(a') NO sym. stretch 1326 1290

v8(a') NH3 sym. bend 1151 1099

v9(a') NO(OH) stretch 953 940

v10(a') ONO bend 691 680

v11(a') ONO(OH) bend 660 645

v12(a') NH3 in-plane rotation 430 424

v13(a') NHO stretch 248 235

v14(a') NHO in-plane bend 106 90

v15(a'') NH3 asym. stretch 3592 3420

v16(a'') NH3 asym. bend 1668 1632

v17(a'') NH3 out-of-plane torsion 1097 1051

v18(a'') N out-of-plane bend 791 782

v19(a'') OH torsion 339 349

v20(a'') 73 68

v21(a'') 61

HAN v1(a') OH(HONH2) stretch 3682 3480

v2(a') NH sym. stretch 3448 3286

v3(a') OH(HNO3) stretch 2660 2106

v4(a') ONO stretch 1740 1683

v5(a') HNH bend 1665 1605

v6(a') NOH(HNO3) bend 1533 1487

v7(a') NOH(HONH2) bend 1487 1442

v8(a') ONO stretch 1313 1273

v9(a') NH2 wag 1191 1164

v10(a') NO(HONH2) stretch 984 953

v11(a') NO(HNO3) stretch 963 937

v12(a') ONO bend 698 687

v13(a') ONO bend 655 643

v14(a') HONH2 in-plane wag 271 26

v15(a') NHO stretch 200 190

v16(a') 163 147

v17(a'') NH asym. stretch 3519 3335

v18(a'') NH2 twist 1320 1274

v19(a'') OH(HNO3) torsion 1089 1060

v20(a'') N out-of-plane bend 786 779

v21(a'') OH(HONH2) torsion 558 544

v22(a'') NH2 twist 370 372

v23(a'') 81 79

v24(a'') 42 46


Table 2.4: Harmonic and VPT2 frequencies in cm-1 of vibrational modes in the ionic dimer (AN)2.

Label Description Harmonic VPT2

v1(ag) NH asym. stretch 3504 3332

v2(ag) NH(H-bonded) sym. stretch 2943 1839

v3(ag) NH4+ twist 1752 1680

v4(ag) NH4+ twist 1726 1567

v5(ag) NH4+ asym. bend 1510 1443

v6(ag) NO(H-bonded) asym. stretch 1309 1272

v7(ag) NO3- sym. stretch 1028 1001

v8(ag) N out-of-plane bend 828 814

v9(ag) ONO asym. bend 715 701

v10(ag) NH4+ wag 416 364

v11(ag) Breathing 304 291

v12(ag) 131 126

v13(ag) 41 30

v14(au) NH asym. stretch 3503 3338

v15(au) NH(H-bonded) sym. stretch 2899 2554

v16(au) NH4+ twist 1751 1682

v17(au) NH4+ twist 1706 1617

v18(au) NO3- asym. stretch 1518 1483

v19(au) NH4+ asym. bend 1498 1436

v20(au) ONO asym. bend 733 721

v21(au) NH4+ wag 404 374

v22(au) 283 270

v23(au) 76 73

v24(au) 65 65

v25(bg) NH asym. stretch 3577 3405

v26(bg) NH(H-bonded) asym. stretch 2782 2292

v27(bg) NH4+ asym. bend 1623 1550

v28(bg) NO3- asym. stretch 1496 1460

v29(bg) NH4+ asym. bend 1361 1294

v30(bg) ONO asym. bend 723 710

v31(bg) NH4+ wag 436 411

v32(bg) NH4+ wag 311 288

v33(bg) 178 140

v34(bg) 105 98

v35(bg) 72 64

v36(bu) NH asym. stretch 3577 3402

v37(bu) NH(H-bonded) asym. stretch 2913 2482

v38(bu) NH4+ asym. bend 1621 1541

v39(bu) NH4+ asym. bend 1404 1349

v40(bu) NO(H-bonded) asym. stretch 1281 1235

v41(bu) NO3- sym. stretch 1025 996

v42(bu) N out-of-plane bend 828 812

v43(bu) ONO asym. bend 714 700

v44(bu) NH4+ wag 463 463

v45(bu) NH4+ wag 332 311

v46(bu) 254 201

v47(bu) 100 99

v48(bu) 33 22


Table 2.5: Harmonic and VPT2 frequencies in cm-1 of vibrational modes in the ionic dimer (HAN)2.

Label Description Harmonic VPT2

v1(a) NH stretch 3520 3350

v2(a) OH stretch 3307 3096

v3(a) NH(H-bonded) stretch 3013 2576

v4(a) NH(H-bonded) stretch 2836 2355

v5(a) HNH bend 1715 1609

v6(a) NH3 umbrella mode 1633 1587

v7(a) HNH bend 1604 1543

v8(a) NOH bend 1583 1525

v9(a) NO(NO3) asym. stretch 1499 1465

v10(a) NH3 wag 1310 1269

v11(a) NH3 twist 1293 1244

v12(a) NH3 wag 1219 1188

v13(a) NO(NO3) sym. stretch 1043 1019

v14(a) NO(HONH3) stretch 1031 996

v15(a) N out-of-plane bend 828 806

v16(a) OH(HONH3) twist 802 733

v17(a) ONO bend 728 715

v18(a) ONO bend 707 689

v19(a) NH3 twist 458 442

v20(a) Breathing 286 274

v21(a) 233 205

v22(a) 177 154

v23(a) 145 140

v24(a) 124 126

v25(a) 111 102

v26(a) 71 62

v27(a) 52 39

v28(a) 30 12

v29(b) NH stretch 3519 3350

v30(b) OH stretch 3318 3039

v31(b) NH(H-bonded) stretch 3017 2640

v32(b) NH(H-bonded) stretch 2881 2460

v33(b) HNH bend 1709 1623

v34(b) HNH bend 1626 1566

v35(b) NOH bend 1607 1591

v36(b) NH3 umbrella mode 1577 1527

v37(b) NO(NO3) asym. stretch 1520 1481

v38(b) NH3 wag 1311 1273

v39(b) NH3 twist 1280 1239

v40(b) NH3 wag 1220 1186

v41(b) NO(NO3) sym. stretch 1037 1012

v42(b) NO(HONH3) stretch 1029 995

v43(b) N out-of-plane bend 826 811

v44(b) OH(HONH3) twist 782 724

v45(b) ONO bend 726 712

v46(b) ONO bend 701 675

v47(b) NH3 twist 460 446

v48(b) 289 280

v49(b) 256 240

v50(b) 190 169

v51(b) 151 148

v52(b) 133 130

v53(b) 115 107

v54(b) 48 53


The nuclear magnetic shielding constants for all nuclei in the covalent monomers and the

ionic dimers of AN and HAN are given in Table 2.6. For both monomers, the shieldings for the

oxygen nuclei involved in hydrogen bonding interactions are influenced more by anharmonic

effects than the other oxygen nuclei. For both dimers and the AN monomer, the shieldings for

the nitrogen nuclei involved in hydrogen bonding interactions are influenced more by anharmonic

effects than the other nitrogen nuclei. For both monomers and (AN)2, the magnitude of the shift

of the shieldings due to anharmonic effects is similar for all of the hydrogen nuclei, but the

direction of this shift is different for the hydrogen nuclei involved in hydrogen bonds. For

(HAN)2, the anharmonic effects on the shieldings for the hydrogen nuclei do not exhibit a clear

trend because of the more complex hydrogen bonding pattern. For reference, we also calculated

the nuclear magnetic shielding constant for hydrogen in tetramethylsilane (TMS) at the same

level of theory.36 This reference enables the calculation of chemical shifts that are experimentally

observable. The nuclear magnetic shielding constants for other reference materials are also

straightforward to calculate. These results indicate that the inclusion of anharmonic effects

significantly alters the nuclear magnetic shielding constants. Thus, a quantitatively accurate

prediction of chemical shifts for comparison to experimental data requires the inclusion of

anharmonic effects.
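The conversion from shielding constants to chemical shifts mentioned above can be sketched in a few lines of Python. The example uses the hydrogen entries for the AN monomer from Table 2.6 and the TMS reference value from its caption, with the usual convention δ = σref − σ. One assumption is made for illustration: the same TMS reference value is used for both the equilibrium and the vibrationally averaged shieldings, since no vibrationally averaged TMS value is reported.

```python
SIGMA_TMS_H = 31.9702  # ppm, H shielding in TMS from the caption of Table 2.6

# (atom label, sigma_eq, sigma_eq - sigma_vib) in ppm for the H nuclei of the AN monomer
AN_HYDROGENS = [
    ("6H", 14.1399, 0.4162),
    ("7,8H", 30.9618, -1.9366),
    ("9H", 30.4328, -1.9017),
]


def chemical_shift(sigma, sigma_ref=SIGMA_TMS_H):
    """Chemical shift in ppm relative to the reference shielding."""
    return sigma_ref - sigma


for label, sigma_eq, difference in AN_HYDROGENS:
    sigma_vib = sigma_eq - difference  # recover the vibrationally averaged shielding
    delta_eq = chemical_shift(sigma_eq)
    delta_vib = chemical_shift(sigma_vib)
    print(f"{label:>5}: delta_eq = {delta_eq:7.4f} ppm, delta_vib = {delta_vib:7.4f} ppm")
```

Because the reference cancels in the difference, the anharmonic change in the chemical shift of each nucleus is simply the tabulated σeq − σvib value under this assumption.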

These calculations provide insight into several general features of covalent and ionic

hydrogen-bonded clusters. As observed previously, the most stable structures for the monomers

are covalent acid-base pairs, whereas the most stable structures for the dimers are ionic acid-base

pairs. The hydrogen bonding distances are greater in the ionic dimers than in the covalent

monomers. Although the hydrogen bonding distances might be expected to be shorter for the

charged species, the observed trend arises in part because the nitrogen and oxygen atoms are

involved in multiple competing hydrogen bonding interactions in the dimers. In addition, the

frequencies undergo substantial qualitative shifts from the covalent monomers to the ionic


dimers. Although the direct correspondence between specific modes in the covalent and ionic

complexes is not rigorous due to mixing among the many modes, some general trends are

observed. As expected, the NH stretching frequencies in NH3 and NH4+ for AN and (AN)2,

respectively, differ significantly. The NO symmetric and asymmetric stretching frequencies in

NO3 also differ substantially between the covalent and the ionic AN clusters. Furthermore, the

intermolecular hydrogen-bonding stretching motion shifts from ν13 = 235 cm-1 in AN to a

breathing mode of ν11 = 291 cm-1 in (AN)2. Similar trends in the frequencies are observed for

HAN and (HAN)2, although the characterization of the modes is not as straightforward. In this

case, the intermolecular hydrogen-bonding stretching motion shifts from ν15 = 190 cm-1 in HAN

to a breathing mode of ν20 = 274 cm-1 in (HAN)2. The quantitative study of these changes in the

structures and vibrational frequencies requires the inclusion of anharmonic effects.


Table 2.6: Nuclear magnetic shielding constants for AN, HAN, (AN)2 and (HAN)2. All shielding constants are given in ppm. σeq and σvib are the shieldings at the equilibrium and the vibrationally averaged geometries, respectively. For reference, the nuclear magnetic shielding constant for H in TMS calculated at this level of theory is 31.9702.

Species Atom σeq σeq - σvib

AN 1N -116.6324 2.7692

2O -73.7641 10.0719

3O -189.1048 7.7809

4O -211.7382 4.6787

5N 242.2423 -15.1759

6H 14.1399 0.4162

7,8H 30.9618 -1.9366

9H 30.4328 -1.9017

HAN 1N -114.9366 2.4143

2O -78.2194 9.0993

3O -172.1313 4.9056

4O -216.7373 2.7291

5O 236.5382 -1.9766

6N 141.3442 -2.1825

7H 14.3767 0.6096

8H 26.0492 -0.2386

9H 27.2008 -0.2030

10H 27.2006 -0.2032

(AN)2 6,11N -143.9728 -0.5831

1,15N 224.5532 -1.8135

8,9,10,13O -187.3903 2.0320

7,12O -129.3589 2.1699

3,4,14,16H 20.4853 0.5080

2,5,17,18H 28.5415 -0.5431

(HAN)2 1,12N -142.7165 0.3727

6,16N 156.4747 2.0139

4,11O -214.6025 2.5519

2,14O -155.4694 1.3114

3,13O -118.9725 2.8498

5,19O 208.0308 3.4048

10,15H 19.3276 0.4135

7,17H 20.7594 0.3260

8,20H 22.2187 -0.0484

9,18H 26.7813 -0.0224


2.4 Conclusions

In this chapter, we characterized the covalent and ionic clusters of ammonium nitrate and

hydroxylammonium nitrate using density functional theory and second-order vibrational

perturbation theory. These clusters exhibit strong hydrogen bonding interactions. Our

calculations confirmed that the most stable structures are covalent acid-base pairs for the

monomers and ionic acid-base pairs for the dimers. The hydrogen bonding distances were found

to be greater in the ionic dimers than in the covalent monomers in part because the nitrogen and

oxygen atoms are involved in multiple competing hydrogen bonding interactions in the dimers.

We also observed significant shifts in the stretching frequencies from the covalent monomers to

the ionic dimers. Moreover, we identified an intermolecular hydrogen-bonding stretching motion

of ~200 cm-1 in the monomers that shifts to an intermolecular breathing motion of slightly higher

frequency of ~300 cm-1 in the dimers.

Our calculations illustrate that the anharmonicities of the potential energy surfaces

influence the geometries, frequencies, and nuclear magnetic shieldings for these systems. The

inclusion of anharmonic effects was found to significantly decrease many of the calculated

frequencies in these clusters and to improve the agreement of the calculated frequencies with the

experimental data available for the isolated neutral species. Our results also indicate that the

anharmonic effects should be included in calculations of nuclear magnetic shielding constants for

these types of systems to ensure quantitatively accurate predictions for comparison to

experimental data. Furthermore, the consideration of anharmonic effects in the development of

molecular forcefields will be important for simulations of proton transfer reactions in ionic

liquids and other ionic materials.


2.5 References

(1) Welton, T. Chemical Reviews 1999, 99, 2071.
(2) Holbrey, J. D.; Seddon, K. R. J. Chem. Soc., Dalton Trans. 1999, 2133.
(3) Brennecke, J. F.; Maginn, E. J. AIChE Journal 2001, 47, 2384.
(4) Yoshizawa, M.; Xu, W.; Angell, A. Journal of the American Chemical Society 2003, 125, 15411.
(5) Kondrikov, B. N.; Annikov, V. E.; Egorshev, V. Y.; DeLuca, L.; Bronzi, C. J. Propul. Power 1999, 15, 763.
(6) Sinditskii, V. P.; Egorshev, V. Y.; Levshenkov, A. I.; Serushkin, V. V. Propellants, Explosives, Pyrotechnics 2005, 30, 269.
(7) Lee, H.; Litzinger, T. A. Combustion and Flame 2001, 127, 2205.
(8) Lee, H.; Litzinger, T. A. Combustion and Flame 2003, 135, 151.
(9) Alavi, S.; Thompson, D. L. Journal of Chemical Physics 2002, 117, 2599.
(10) Alavi, S.; Thompson, D. L. Journal of Chemical Physics 2003, 118, 2599.
(11) Alavi, S.; Thompson, D. L. Journal of Chemical Physics 2003, 119, 4274.
(12) Mebel, A. M.; Lin, M. C.; Morokuma, K.; Melius, C. F. Journal of Physical Chemistry 1995, 99, 6842.
(13) Schmidt, M. W.; Gordon, M. S.; Boatz, J. A. Journal of Physical Chemistry A 2005, 109, 7285.
(14) Guillot, B.; Guissani, Y. Journal of Chemical Physics 2002, 116, 2047.
(15) Lee, C.; Yang, W.; Parr, R. G. Physical Review B 1988, 37, 785.
(16) Becke, A. D. Journal of Chemical Physics 1993, 98, 5648.
(17) Stephens, P. J.; Devlin, F. J.; Chabalowski, C. F.; Frisch, M. J. Journal of Physical Chemistry 1994, 98, 11623.
(18) Krishnan, R.; Binkley, J. S.; Seeger, R.; Pople, J. A. Journal of Chemical Physics 1980, 72, 650.
(19) Clark, T.; Chandrasekhar, J.; Spitznagel, G. W.; Schleyer, P. v. R. Journal of Computational Chemistry 1983, 4, 294.
(20) Frisch, M. J.; Trucks, G. W.; Schlegel, H. B.; Scuseria, G. E.; Robb, M. A.; Cheeseman, J. R.; Montgomery, J. A., Jr.; Vreven, T.; Kudin, K. N.; Burant, J. C.; Millam, J. M.; Iyengar, S. S.; Tomasi, J.; Barone, V.; Mennucci, B.; Cossi, M.; Scalmani, G.; Rega, N.; Petersson, G. A.; Nakatsuji, H.; Hada, M.; Ehara, M.; Toyota, K.; Fukuda, R.; Hasegawa, J.; Ishida, M.; Nakajima, T.; Honda, Y.; Kitao, O.; Nakai, H.; Klene, M.; Li, X.; Knox, J. E.; Hratchian, H. P.; Cross, J. B.; Adamo, C.; Jaramillo, J.; Gomperts, R.; Stratmann, R. E.; Yazyev, O.; Austin, A. J.; Cammi, R.; Pomelli, C.; Ochterski, J. W.; Ayala, P. Y.; Morokuma, K.; Voth, G. A.; Salvador, P.; Dannenberg, J. J.; Zakrzewski, V. G.; Dapprich, S.; Daniels, A. D.; Strain, M. C.; Farkas, O.; Malick, D. K.; Rabuck, A. D.; Raghavachari, K.; Foresman, J. B.; Ortiz, J. V.; Cui, Q.; Baboul, A. G.; Clifford, S.; Cioslowski, J.; Stefanov, B. B.; Liu, G.; Liashenko, A.; Piskorz, P.; Komaromi, I.; Martin, R. L.; Fox, D. J.; Keith, T.; Al-Laham, M. A.; Peng, C. Y.; Nanayakkara, A.; Challacombe, M.; Gill, P. M. W.; Johnson, B.; Chen, W.; Wong, M. W.; Gonzalez, C.; Pople, J. A. Gaussian03; revision C.03 ed.; Gaussian, Inc.: Pittsburgh, PA, 2003.
(21) Barone, V. Journal of Chemical Physics 2004, 120, 3059.
(22) Barone, V. Journal of Chemical Physics 2005, 122, 014108.
(23) Ditchfield, R. Journal of Chemical Physics 1972, 56, 5688.
(24) Astrand, P.-O.; Ruud, K.; Taylor, P. R. Journal of Chemical Physics 2000, 112, 2655.


(25) Ruud, K.; Astrand, P.-O.; Taylor, P. R. Journal of Chemical Physics 2000, 112, 2668.
(26) Ruud, K.; Astrand, P.-O.; Taylor, P. R. Journal of the American Chemical Society 2001, 123, 4826.
(27) Dennington II, R.; Keith, T.; Millam, J.; Eppinnett, K.; Hovell, W. L.; Gilliland, R. GaussView; 3.09 ed.; Semichem, Inc.: Shawnee Mission, KS, 2003.
(28) Irikura, K. K.; Johnson, R. D., III; Kacker, R. N. Journal of Physical Chemistry A 2005, 109, 8430.
(29) Andersson, M. P.; Uvdal, P. Journal of Physical Chemistry A 2005, 109, 2937.
(30) Shimanouchi, T. Molecular Vibrational Frequencies. In NIST Chemistry WebBook, NIST Standard Reference Database Number 69; Linstrom, P. J., Mallard, W. G., Eds.; National Institute of Standards and Technology: Gaithersburg, MD, 20899 (http://webbook.nist.gov), June 2005.
(31) McGraw, G. E.; Bernitt, D. L.; Hisatsune, I. C. Journal of Chemical Physics 1965, 42, 237.
(32) Perrin, A.; Lado-Bordowsky, O.; Valentin, A. Molecular Physics 1989, 67, 249.
(33) Maki, A. G.; Olsen, W. B. Journal of Molecular Spectroscopy 1989, 133, 171.
(34) Goldman, A.; Burkholder, J. B.; Howard, C. J.; Escribano, R.; Maki, A. G. Journal of Molecular Spectroscopy 1988, 131, 195.
(35) Luckhaus, D. Journal of Chemical Physics 1997, 106, 8409.
(36) Note that these calculations are based on harmonic frequencies. The VPT2 method is not applicable to TMS because it behaves as a spherical top.


Chapter 3

Simulation Methods for Hydride Transfer in Dihydrofolate Reductase

Reproduced in part with permission from D. K. Chakravorty, M. D. Kumarasiri, A. V. Soudackov, and S. Hammes-Schiffer, Journal of Chemical Theory and Computation 2008, 4, 1974.


3.1 Introduction

This chapter presents the methodology used to simulate the hydride transfer reaction

catalyzed by dihydrofolate reductase (DHFR). Our goal is to predict the free energy barrier of

wild-type DHFR as efficiently as possible without losing accuracy. This will allow us to simulate

the hydride transfer in many mutants of DHFR. The next chapter will utilize this methodology to

rank mutants of DHFR according to their hydride transfer reaction rates.

DHFR is a vital enzyme required for folate metabolism in humans. It converts DHF to

THF using the coenzyme NADPH. Specifically, the pro-R hydride of NADPH is transferred to

the C6 position of N5 protonated DHF (H3F+).1 The mechanism of this hydride transfer, as well

as the structure and dynamics of DHFR, has been studied extensively. The minima and transition

states have been studied with ab initio, semiempirical, and QM/MM methods.2-5 Hammes-

Schiffer and coworkers have used a hybrid quantum/classical approach to study the hydride

transfer reaction in DHFR.1,6-9 Truhlar, Gao, and coworkers10-14 and Thorpe and Brooks15-17 have

also studied this reaction using similar theoretical methods. X-ray crystallographic structures of

DHFR from different species have been determined for many substrate, cofactor, and inhibitor

complexes.18 Thus, DHFR provides an excellent target for studying the ranking of mutant activity.

We are specifically focusing on Escherichia coli DHFR.


In transition state theory, the relationship between the rate constant $k$ of a reaction and the

free energy barrier of activation $\Delta G^{\ddagger}$ can be written as (Ref. 19)

$$k \propto e^{-\Delta G^{\ddagger}/k_{\mathrm{B}}T} \tag{3.1}$$

Therefore, ordering mutants according to their $\Delta G^{\ddagger}$ will rank them according to their reaction

rates. The free energy barrier $\Delta G^{\ddagger}$ of an enzymatic reaction can be calculated by generating the

free energy profile of the reaction. However, as stated in Section 1.1, common empirical force

fields used in molecular dynamics simulations cannot directly be used to study a chemical

reaction, as they do not permit bond breaking and forming processes. To overcome this

disadvantage, we use an empirical valence bond (EVB) approach.20,21
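The ranking idea behind Equation 3.1 can be sketched in Python as follows. The barrier values and mutant names are hypothetical placeholders, not results from this work, and the sketch assumes that the prefactor of the rate expression is the same for all mutants, so that only the barrier difference matters. Because the barriers are given per mole, the gas constant R replaces the Boltzmann constant.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def relative_rate(barrier_mutant, barrier_wild_type, temperature=298.15):
    """Rate of a mutant relative to wild type from the barrier difference (kcal/mol)."""
    return math.exp(-(barrier_mutant - barrier_wild_type) / (R_KCAL * temperature))


# Hypothetical free energy barriers in kcal/mol (illustrative values only)
barriers = {"wild type": 15.0, "mutant A": 16.2, "mutant B": 14.6, "mutant C": 17.5}

# A lower barrier means a faster reaction, so sorting by barrier ranks by rate.
ranked = sorted(barriers, key=barriers.get)

for name in ranked:
    ratio = relative_rate(barriers[name], barriers["wild type"])
    print(f"{name:>10}: barrier = {barriers[name]:5.1f} kcal/mol, k/k_wt = {ratio:.3g}")
```

Since the exponential is monotonic, the ranking depends only on the ordering of the barriers, which is why relative barriers are sufficient for ranking mutants.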

3.2 EVB Molecular Dynamics

In the EVB approach for a hydride transfer reaction, two valence bond states are defined.

In the first state, the transferring hydrogen is bonded to the donor (reactant state) and in the

second state, the transferring hydrogen is bonded to the acceptor (product state). Then MD

simulations are carried out to generate the free energy profile of the reaction along a collective

reaction coordinate. The total wavefunction of a system described by two valence bond (VB)

states can be written as

$$\Psi = c_1 \psi_1 + c_2 \psi_2, \tag{3.2}$$

where $\psi_1$ is the wavefunction of the first VB state and $\psi_2$ is the wavefunction of the second VB

state. The EVB Hamiltonian matrix corresponding to this system is

$$H_{\mathrm{EVB}}(\mathbf{R}) = \begin{pmatrix} V_{11}(\mathbf{R}) & V_{12}(\mathbf{R}) \\ V_{21}(\mathbf{R}) & V_{22}(\mathbf{R}) \end{pmatrix}, \tag{3.3}$$

where $\mathbf{R}$ represents the coordinates of the nuclei, $V_{11}(\mathbf{R})$ is the potential energy of state 1, and

$V_{22}(\mathbf{R})$ is the potential energy of state 2. The electronic ground state potential surface is

obtained by diagonalizing $H_{\mathrm{EVB}}(\mathbf{R})$. Although the terms $V_{12}(\mathbf{R})$ and $V_{21}(\mathbf{R})$ can be expressed in

simple analytical functional forms with parameters fit to ab initio calculations or experimental

data, we approximate them as constants for simplicity and computational efficiency.
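For a two-state EVB Hamiltonian with a real, symmetric coupling (V12 = V21), the diagonalization has a closed form. The Python sketch below is a minimal illustration of the lower eigenvalue and is not the simulation code used in this work; the numerical values in the example are arbitrary.

```python
import math


def evb_ground_state(v11, v22, v12):
    """Lower eigenvalue of the 2x2 matrix [[v11, v12], [v12, v22]]."""
    average = 0.5 * (v11 + v22)
    half_gap = 0.5 * (v11 - v22)
    return average - math.sqrt(half_gap ** 2 + v12 ** 2)


# Arbitrary illustrative energies (same units for all three arguments)
print(evb_ground_state(0.0, 10.0, 0.0))  # no coupling: lower diabatic energy
print(evb_ground_state(5.0, 5.0, 2.0))   # degenerate states: lowered by the coupling
```

When the two diabatic energies are equal, the ground state lies below them by exactly the coupling, which is why the coupling constant directly affects the barrier height near the transition state.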

To obtain $\Delta G^{\ddagger}$ for the hydrogen transfer reaction, the free energy profile of the reaction

must be calculated. This can be achieved efficiently by performing MD simulations with a

mapping or a biasing potential rather than the ground state EVB potential.20 The mapping

potential drives the reaction forward and is defined as

$$V_{\mathrm{map}}(\mathbf{R}; \lambda_i) = (1 - \lambda_i) V_{11}(\mathbf{R}) + \lambda_i V_{22}(\mathbf{R}), \tag{3.4}$$

where $V_{11}(\mathbf{R})$ and $V_{22}(\mathbf{R})$ are the diagonal elements of the EVB Hamiltonian in Equation 3.3 and

$\mathbf{R}$ represents the coordinates of the system. As the mapping parameter $\lambda_i$ is varied between 0 and 1, the

reaction progresses from VB state 1 (reactant state) to VB state 2 (product state). Therefore, MD

simulations are performed for a series of $\lambda_i$ values along a collective reaction coordinate. The collective

reaction coordinate chosen is an energy gap reaction coordinate:

$$\Lambda(\mathbf{R}) = V_{11}(\mathbf{R}) - V_{22}(\mathbf{R}). \tag{3.5}$$

This definition is analogous to the solvent coordinate used in standard Marcus theory for electron

transfer reactions.21-23
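The way the mapping potential drives the system from reactant to product can be illustrated with a one-dimensional toy model in Python. The two harmonic diabatic surfaces, their force constant, and the coordinate grid are hypothetical choices made only for this sketch; in the actual simulations the diabatic energies come from the force field evaluated on the full set of coordinates.

```python
def mapping_potential(v11, v22, lam):
    """Mapping potential: linear interpolation between the two diabatic energies."""
    return (1.0 - lam) * v11 + lam * v22


def energy_gap(v11, v22):
    """Energy gap reaction coordinate."""
    return v11 - v22


# Hypothetical 1D diabatic surfaces: reactant well at x = -1, product well at x = +1
def v11(x):
    return 0.5 * 10.0 * (x + 1.0) ** 2


def v22(x):
    return 0.5 * 10.0 * (x - 1.0) ** 2


grid = [-2.0 + 0.01 * i for i in range(401)]


def mapping_minimum(lam):
    """Grid point that minimizes the mapping potential for a given lambda."""
    return min(grid, key=lambda x: mapping_potential(v11(x), v22(x), lam))


for lam in (0.0, 0.25, 0.5, 0.75, 1.0):
    x_min = mapping_minimum(lam)
    gap = energy_gap(v11(x_min), v22(x_min))
    print(f"lambda = {lam:4.2f}: minimum at x = {x_min:5.2f}, energy gap = {gap:6.2f}")
```

In this toy model the minimum of the mapping potential moves smoothly from the reactant well to the product well as the mapping parameter increases, so each window samples a different range of the energy gap coordinate.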

The free energy profile of the reaction is obtained in two steps. In the first step, the free

energy for the mapping potential $V_{\mathrm{map}}(\mathbf{R}; \lambda_i)$ is calculated along the reaction coordinate $\Lambda(\mathbf{R})$ for

each $\lambda_i$ using a standard binning procedure during MD simulations. In the second step, the

distributions of free energies along $\Lambda(\mathbf{R})$ obtained with different $\lambda_i$ are combined using a


statistical method. The weighted histogram analysis method (WHAM) and umbrella integration

(UI) method are two such statistical methods used for this purpose.24

3.3 WHAM and UI

In umbrella sampling,25-27 simulations are performed with a series of biasing potentials

( )iw ξ , where ξ is the reaction coordinate. The distribution b ( )iP ξ of the biased system along

the reaction coordinate is typically obtained by standard binning procedures to generate a

histogram. Specifically, the relevant range of the reaction coordinate is divided into bins, and

bbin( )iP ξ is the fraction of sampled configurations in the bin centered at the reaction coordinate

ξbin for the window corresponding to the biasing potential ( )iw ξ . The potential of mean force

(PMF) for the biased system along the reaction coordinate is given by

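$$A_i^{\mathrm{b}}(\xi) = -\frac{1}{\beta}\ln P_i^{\mathrm{b}}(\xi), \tag{3.6}$$

where $\beta = 1/(k_{\mathrm{B}}T)$. The PMF for the unbiased system in each window is

$$A_i^{\mathrm{u}}(\xi) = -\frac{1}{\beta}\ln P_i^{\mathrm{b}}(\xi) - w_i(\xi) + F_i, \tag{3.7}$$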

where $F_i$ are constants that differ for each biasing potential or window.

In WHAM,26-30 the constants $F_i$ are calculated iteratively to combine the unbiased

potentials of mean force for different windows. The following two equations are solved

iteratively:

$$P(\xi) = \frac{\sum_i^{\mathrm{windows}} N_i\,P_i^{\mathrm{b}}(\xi)}{\sum_j^{\mathrm{windows}} N_j\,e^{-\beta\left[w_j(\xi) - F_j\right]}}, \tag{3.8}$$

$$e^{-\beta F_i} = \int d\xi\; e^{-\beta w_i(\xi)}\,P(\xi), \tag{3.9}$$

where $N_i$ is the total number of configurations sampled for window $i$ used to construct $P_i^{\mathrm{b}}(\xi)$.

After these equations are solved to self-consistency, the PMF $A(\xi)$ is obtained directly from $P(\xi)$

using the relation $A(\xi) = -\ln P(\xi)/\beta$.
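The self-consistent solution of Equations 3.8 and 3.9 can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch, not the Fortran 90 program used in this work: the function name `wham`, the discretization of the integral as a sum over bins, and the synthetic harmonic test data are all assumptions made for the example.

```python
import numpy as np


def wham(p_biased, n_samples, bias, beta, tol=1e-12, max_iter=100_000):
    """Iterate Equations 3.8 and 3.9 on a grid of bins.

    p_biased : (windows, bins) normalized biased histograms P_i^b
    n_samples: (windows,) number of configurations N_i in each window
    bias     : (windows, bins) biasing potential w_i at the bin centers
    Returns the normalized unbiased distribution P and the constants F_i.
    """
    n = np.asarray(n_samples, dtype=float)[:, None]
    numerator = (n * p_biased).sum(axis=0)        # sum_i N_i P_i^b
    f = np.zeros(p_biased.shape[0])               # initial guess F_i = 0
    for _ in range(max_iter):
        denominator = (n * np.exp(-beta * (bias - f[:, None]))).sum(axis=0)
        p = numerator / denominator               # Equation 3.8
        # Equation 3.9, with the integral replaced by a sum over bins.
        f_new = -np.log((np.exp(-beta * bias) * p).sum(axis=1)) / beta
        f_new -= f_new[0]                         # fix the arbitrary offset
        converged = np.max(np.abs(f_new - f)) < tol
        f = f_new
        if converged:
            break
    denominator = (n * np.exp(-beta * (bias - f[:, None]))).sum(axis=0)
    p = numerator / denominator
    return p / p.sum(), f


# Synthetic check: exact biased distributions built from a known PMF.
beta = 1.0
xi = np.linspace(-3.0, 3.0, 121)
pmf_true = xi**2
centers = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
bias = 0.5 * 2.0 * (xi[None, :] - centers[:, None]) ** 2
p_biased = np.exp(-beta * (pmf_true[None, :] + bias))
p_biased /= p_biased.sum(axis=1, keepdims=True)

p, f = wham(p_biased, np.full(5, 1000), bias, beta)
pmf = -np.log(p) / beta      # A(xi) = -ln P(xi) / beta
pmf -= pmf.min()
```

With real simulation data the rows of `p_biased` would be histograms of the sampled reaction coordinate, and empty bins would need to be masked before taking the logarithm.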

In UI,31,32 the derivative of the unbiased PMF with respect to the reaction coordinate is

calculated for each window,

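$$\frac{\partial A_i^{\mathrm{u}}(\xi)}{\partial \xi} = -\frac{1}{\beta}\frac{\partial \ln P_i^{\mathrm{b}}(\xi)}{\partial \xi} - \frac{dw_i(\xi)}{d\xi}. \tag{3.10}$$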

The data from different windows are combined according to a weighted average

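$$\frac{\partial A(\xi)}{\partial \xi} = \sum_i^{\mathrm{windows}} p_i(\xi)\,\frac{\partial A_i^{\mathrm{u}}(\xi)}{\partial \xi}, \tag{3.11}$$

where

$$p_i(\xi) = \frac{N_i\,P_i^{\mathrm{b}}(\xi)}{\sum_j N_j\,P_j^{\mathrm{b}}(\xi)}. \tag{3.12}$$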

Subsequently, A(ξ) is obtained by numerical integration over ξ. In previous applications of UI,

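the biasing potential is assumed to be of the form $w_i(\xi) = \frac{K}{2}(\xi - \xi_i)^2$. Moreover, the biased

PMF is expanded in a power series and truncated after the quadratic term, which is equivalent to

assuming a normal distribution for $P_i^{\mathrm{b}}(\xi)$,

$$P_i^{\mathrm{b}}(\xi) = \frac{1}{\sigma_i^{\mathrm{b}}\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(\frac{\xi - \bar{\xi}_i^{\mathrm{b}}}{\sigma_i^{\mathrm{b}}}\right)^2\right], \tag{3.13}$$

where the mean $\bar{\xi}_i^{\mathrm{b}}$ and the variance $(\sigma_i^{\mathrm{b}})^2$ for each window are determined from the simulation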

data. These approximations lead to an analytical expression for the derivative of the unbiased

PMF given in Equation 3.10.
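For the harmonic bias and the normal distribution of Equation 3.13, the UI workflow of Equations 3.10 to 3.12 can be sketched as follows. This is an illustrative Python sketch rather than the Mathematica notebook used in this work; the function names and the synthetic harmonic test system are assumptions made for the example.

```python
import numpy as np


def ui_mean_force(xi, means, variances, n_samples, centers, k_bias, beta):
    """Weighted average of the per-window PMF derivatives (Eqs. 3.10-3.12)."""
    xi = np.asarray(xi, dtype=float)[None, :]
    means = np.asarray(means, dtype=float)[:, None]
    variances = np.asarray(variances, dtype=float)[:, None]
    n = np.asarray(n_samples, dtype=float)[:, None]
    centers = np.asarray(centers, dtype=float)[:, None]
    # Equation 3.10 with a Gaussian P_i^b and w_i = (K/2)(xi - xi_i)^2.
    d_unbiased = (xi - means) / (beta * variances) - k_bias * (xi - centers)
    # Equation 3.13: normal distribution for each window.
    p_b = np.exp(-0.5 * (xi - means) ** 2 / variances)
    p_b /= np.sqrt(2.0 * np.pi * variances)
    # Equation 3.12: normalized window weights at each xi.
    weights = n * p_b
    weights /= weights.sum(axis=0, keepdims=True)
    # Equation 3.11: combine the windows.
    return (weights * d_unbiased).sum(axis=0)


def integrate_trapezoid(xi, mean_force):
    """Cumulative trapezoidal integration of dA/dxi, with A = 0 at xi[0]."""
    xi = np.asarray(xi, dtype=float)
    steps = 0.5 * (mean_force[1:] + mean_force[:-1]) * np.diff(xi)
    return np.concatenate(([0.0], np.cumsum(steps)))


# Synthetic check: for A(xi) = (k0/2) xi^2 the biased distributions are
# exactly Gaussian, so the means and variances are known analytically.
# With simulation data they would be the sample mean and variance of xi.
beta, k0, k_bias = 1.0, 2.0, 2.0
centers = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
means = k_bias * centers / (k0 + k_bias)
variances = np.full(5, 1.0 / (beta * (k0 + k_bias)))

xi = np.linspace(-1.5, 1.5, 61)
force = ui_mean_force(xi, means, variances, np.full(5, 1000),
                      centers, k_bias, beta)
pmf = integrate_trapezoid(xi, force)
pmf -= pmf.min()
```

No histogram and no iteration appear anywhere in this sketch; only the per-window mean, variance, and sample count enter the calculation.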

The UI method differs from WHAM in two important aspects. First, the UI method is

based on the derivative of the PMF, rather than the PMF itself, so it does not involve offsets and

therefore avoids the iterative procedure inherent to WHAM. Second, the mean and variance of the

normal distribution for each window are determined directly from the raw simulation data, so a

binning procedure is not required to obtain the derivative of the PMF given in Equation 3.11.

Specifically, the values of the reaction

coordinate for all configurations sampled are collected during the simulation, and the mean and

variance of the reaction coordinates collected for each window are determined directly from these

data without generating a histogram.33

To implement the UI method within the framework of a two-state EVB potential using an

energy gap reaction coordinate and a mapping potential, the derivative of the unbiased PMF

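given in Equation 3.10 can be expressed as

$$\frac{\partial A_i^{\mathrm{u}}}{\partial \xi} = -\frac{1}{\beta}\frac{\partial \ln P_i^{\mathrm{b}}(\xi)}{\partial \xi} - \left[\frac{1}{2} - \lambda_i + \frac{\xi}{2\sqrt{\xi^2 + 4V_{12}^2}}\right]. \tag{3.14}$$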

The complete derivation is given elsewhere.24 Approximating $P_i^{\mathrm{b}}(\xi)$ by a normal distribution,

we have obtained an analytical form for the derivative of the unbiased PMF for each window.

The data for the different windows can be combined using Equation 3.11, followed by numerical

integration of the derivative of the PMF over ξ to obtain the PMF A(ξ).
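The origin of the bias derivative in Equation 3.14 can be sketched briefly. The outline below assumes that the unbiased surface is the lower eigenvalue $V_g$ of the two-state EVB Hamiltonian with constant coupling $V_{12}$, that the bias is the difference between the mapping potential and $V_g$, and that $\xi$ is the energy gap of Equation 3.5. It is a sketch under these assumptions, not a substitute for the full derivation in Ref. 24.

```latex
\begin{align}
V_g &= \tfrac{1}{2}\left(V_{11} + V_{22}\right)
      - \tfrac{1}{2}\sqrt{\left(V_{11} - V_{22}\right)^2 + 4V_{12}^2}, \\
V_{\mathrm{map}} &= (1-\lambda_i)V_{11} + \lambda_i V_{22}
      = \tfrac{1}{2}\left(V_{11} + V_{22}\right)
      + \left(\tfrac{1}{2} - \lambda_i\right)\xi, \\
w_i(\xi) &= V_{\mathrm{map}} - V_g
      = \left(\tfrac{1}{2} - \lambda_i\right)\xi
      + \tfrac{1}{2}\sqrt{\xi^2 + 4V_{12}^2}, \\
\frac{dw_i}{d\xi} &= \tfrac{1}{2} - \lambda_i
      + \frac{\xi}{2\sqrt{\xi^2 + 4V_{12}^2}} .
\end{align}
```

Substituting the last line into Equation 3.10 gives the bracketed term of Equation 3.14. Because the bias depends on the configuration only through $\xi$, it plays the same role as the harmonic umbrella potential in standard UI.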

UI has several advantages over WHAM. One advantage of UI is that it does not require

overlap between the distributions of the windows, although such overlap is desirable to enhance

the accuracy. In contrast, WHAM requires sufficient overlap between the distributions of the

windows although, in principle, given sufficient sampling within each window, WHAM and UI

should converge to the same results if the distributions are Gaussian. However, the convergence

of the iterative procedure in WHAM becomes slow for small overlap between the distributions of

the windows, and insufficient sampling of the tail regions of the distributions combined with very

small overlap could preclude convergence. Additionally, UI utilizes an analytical expression for

the distributions, thereby decreasing the statistical noise. Moreover, UI does not require an

iterative procedure, so convergence is not an issue. These advantages become particularly

pronounced for small overlaps between the distributions of the windows, although additional

windows will enhance the accuracy of both methods.

3.4 Application to DHFR

The initial coordinates for wild type DHFR were obtained from an equilibrated reactant

state snapshot of a previous simulation.8,34 The initial simulation system includes the entire

protein, the substrate, and the cofactor solvated by 4122 explicit water molecules in a truncated

octahedral periodic box. The potential energy surface was represented by a two-state EVB

potential,20 where state 1 corresponds to the transferring hydrogen atom bonded to the donor, and

state 2 corresponds to the transferring hydrogen atom bonded to the acceptor, as described

previously. The diagonal elements $V_{11}(\mathbf{R})$ and $V_{22}(\mathbf{R})$ of the EVB Hamiltonian

correspond to the GROMOS force field.35 The EVB constant coupling parameter was 34.66

kcal/mol, and the constant energy adjustment for $V_{22}(\mathbf{R})$ was 65.25 kcal/mol. Both of these EVB

parameters were obtained elsewhere.34 The MD simulations were performed using GROMOS

with a modified FORCE routine. The integration time step was 1 fs, and the constraints were

maintained by SHAKE. Two separate Berendsen thermostats with 0.1 ps relaxation times each

were used to maintain the temperature of the solute and the solvent molecules at 300 K.

A set of 19 mapping parameters from λi = 0.05 to 0.95 with a spacing of 0.05 were used

to generate the full free energy curve. The starting configuration for each window was obtained

from the previous window after 20 ps of equilibration. Each window was equilibrated for a total

of 350 ps, followed by 300 ps of data collection. The free energy barriers of 15.0 kcal/mol and

15.3 kcal/mol were determined with UI and WHAM, respectively. We used a Fortran 90

program to calculate the free energy barrier with WHAM and a Mathematica33 notebook to

calculate UI barriers. These barrier heights are consistent with the classical barriers determined

from previous simulations using both thermodynamic integration and WHAM. We also

generated two other independent sets of data with 50 ps of equilibration followed by 300 ps of

data collection. The free energy barriers determined from these three data sets differ by less than

0.5 kcal/mol compared to the free energy barrier of 15.4 kcal/mol obtained from another set of

data generated using 20 mapping potentials with 4.5 ns of molecular dynamics for each window

and an additional 2 ns for the four windows near the transition state.8,36 The nuclear quantum

effects of the transferring hydrogen, such as zero point energy and tunneling, have been shown

previously to decrease the free energy barrier by 2.4 kcal/mol.1 Since this decrease is expected to

be similar for wild-type and all mutants, not including the nuclear quantum effects will allow us

to increase computational efficiency of the method without sacrificing accuracy. Thus, all free

energy barriers were obtained with a classical treatment of the transferring hydrogen nucleus.

Generating 19 windows with 650 ps each requires approximately 1300 CPU hours on

Intel Xeon 3.0 GHz processors. Thus, it is desirable to generate the free energy barrier

at a lower computational cost. A set of six mapping parameters, λi = 0.05, 0.125, 0.250, 0.375,

0.500 and 0.625, were used to accomplish this and to generate a partial free energy curve.

Choosing λi=0.625 as the last window drives the reaction over the barrier, allowing adequate

sampling of the transition state region. Each window was equilibrated for 50 ps, followed by 300

ps of data collection. The resulting free energy barrier was 15.4 kcal/mol with WHAM and 15.1

kcal/mol with UI. Approximately 230 CPU hours were required to generate the partial free

energy curve from six windows. This validates our choice of generating partial free energy

profiles using a smaller number of windows. The complete free energy profile using 19 windows

and the partial free energy profile using six windows are given in Figure 3.1. Figure 3.2 compares

the free energy barriers obtained with six windows to that of 19 windows.
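The reported CPU times can be cross-checked with a short calculation. The sketch below assumes that the cost scales linearly with the total simulated time, which ignores any per-window overhead; the variable names are illustrative.

```python
# Total simulated time for the full and partial free energy curves (ps).
full_ps = 19 * 650              # 19 windows, 350 ps + 300 ps each
partial_ps = 6 * (50 + 300)     # 6 windows, 50 ps + 300 ps each

# Cost per ps implied by the reported 1300 CPU hours for the full curve.
cpu_hours_full = 1300.0
hours_per_ps = cpu_hours_full / full_ps

# Linear extrapolation to the six-window protocol.
estimated_partial_hours = partial_ps * hours_per_ps
savings_factor = full_ps / partial_ps
```

Under this assumption the six-window protocol is estimated at roughly 220 CPU hours, close to the approximately 230 CPU hours reported, and corresponds to a nearly sixfold reduction in simulated time.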

Figure 3.1: Free energy profiles of E. coli wild-type DHFR using WHAM (blue dashed) and UI (red). (a) Full free energy curve using 19 windows. (b) Partial free energy curve using 6 windows.

[Figure 3.1 panels (a) and (b); horizontal axis: collective reaction coordinate]

Figure 3.2: Free energy profiles of wild-type DHFR using UI method. Full free energy curve using 19 windows is in blue, and the partial free energy curve using 6 windows is in red.

The structure of the initial snapshot in the central cell is given in Figure 3.3. It is

composed of 21 pieces involving periodic images because the protein has drifted from the middle

of the central cell during previous simulations. During MD simulations, programs such as

GROMOS use periodic boundary conditions, so this drifting has no effect on the calculations.

However, the programs that we used to make mutations for the calculations discussed in the next

chapter do not use periodic boundary conditions. Therefore, it is necessary to reconstruct the

system and place it in the middle of the central cell before making mutations. This involves three

steps:

1. Sewing the periodic images and centering the protein at the middle of the box.

2. Resolvating the protein.

3. Re-equilibrating the system.

Figure 3.3: Original reactant state snapshot. Periodic images manifest to make the protein structure appear broken. This figure was created using VMD.37

First, the system was stripped of all non-crystal waters. Sewing the periodic images was

done using the UTAILOR routine implemented in the DLPROTEIN38 MD simulation package.

The UTAILOR procedure uses bond information to reconnect every bond whose two atoms do not

both lie in the central cell. To move the crystal waters with the protein, the geometrical

center of the sewed protein was first placed at the center of the central box, shifting the crystal

waters with the protein. This leaves the crystal waters that were not sewed during the UTAILOR

procedure outside the central box. Then the truncated octahedral periodic

boundary conditions were applied to move the crystal waters back into the central cell. During

this process, we also had to compensate for the difference in the origin of the coordinate system

between GROMOS and DLPROTEIN. The resulting structure was resolvated. The final system

contained 4097 SPC/E39 water molecules in a truncated octahedral periodic box with a distance of

66.55 Å between opposing square surfaces (Figure 3.4). The SPC/E water model includes

Lennard-Jones and electrostatic interactions. The water molecules were equilibrated for 500 ps

while freezing the protein structure. Then four cycles of restrained steepest descent

minimizations and 15 ps MD simulations were done on the reactant state potential energy surface

with gradual release of a restraining force constant. Figure 3.5 illustrates this procedure. The

initial restraining force constant was 59.75 kcal mol$^{-1}$ Å$^{-2}$ and was halved with each cycle.

Finally, 300 ps of MD simulation was done on the EVB reactant potential surface prior to starting

the MD simulations for the windows corresponding to different mapping potentials. Each

window was equilibrated for 120 ps, followed by 250 ps of data collection.
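The restraint release schedule described above can be written out explicitly. The helper below is a hypothetical illustration of the halving rule, not part of the simulation scripts.

```python
def restraint_schedule(k_initial=59.75, n_cycles=4):
    """Restraining force constant (kcal mol^-1 A^-2) for each cycle,
    starting from k_initial and halving with every cycle."""
    return [k_initial / 2**cycle for cycle in range(n_cycles)]


schedule = restraint_schedule()

# Each cycle includes 15 ps of restrained MD after the minimization.
total_restrained_md_ps = 15 * len(schedule)
```

The four cycles therefore use force constants of 59.75, 29.875, 14.9375, and 7.46875 kcal mol$^{-1}$ Å$^{-2}$ and add up to 60 ps of restrained MD.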

Figure 3.4: Resolvated structure with no periodic images. This figure was created using VMD.37

Figure 3.5: Summary of steps of the restrained minimization and MD simulation procedure. fc is the restraining force constant which is halved with each cycle i.

The resulting free energy barrier is 15.6 kcal/mol with WHAM (Figure 3.6). UI provides

a 15.4 kcal/mol barrier. When we analyzed the data using 70 ps of equilibration followed by 300

ps of data collection per window or, alternatively, 170 ps of equilibration followed by 200 ps of

data collection per window, we obtained a free energy barrier of 15.7 kcal/mol or 15.6 kcal/mol,

respectively. Thus, the free energy barrier is converged to within 0.3 kcal/mol using six windows

and 120 ps of equilibration followed by 250 ps of data collection for each window.

[Figure 3.5 flowchart: wild-type reactant structure; i = 1; restrained minimization of reactant (fc = k); restrained MD of reactant (fc = k, 15 ps); test i ≤ 4 (Y: i = i + 1, k = k/2; N: continue); MD of EVB reactant (300 ps); MD with mapping potentials for 6 windows (370 ps each); generate free energy curve with WHAM or UI]

Figure 3.6: Partial free energy profile of E. coli wild-type DHFR with WHAM (blue dashed) and UI (red) using the sewed structure and 6 windows.

Based on previous observations, we also attempted to generate the free energy barrier

using fewer windows with the UI method. Using four windows with $\lambda_i$ = 0.050,

0.250, 0.500 and 0.625 results in a 15.1 kcal/mol free energy barrier, while using $\lambda_i$ = 0.050,

0.125, 0.500 and 0.625 results in a 15.2 kcal/mol free energy barrier. Using only three windows

with the UI method did not result in a meaningful free energy barrier. We also noted that the

WHAM method using only four windows does not generate a meaningful result. This is expected

with WHAM, as it relies more heavily on the overlap between windows.

3.5 Conclusions

In this chapter, we described the theory behind predicting enzymatic reaction rates

according to the activation free energy barrier. Within transition state theory, the reaction rate

decreases exponentially with the free energy barrier. The free energy barrier of a hydrogen transfer reaction can be generated by carrying out

EVB/MD simulations with an energy gap collective reaction coordinate. The transferring

hydrogen was represented by a Morse potential. A mapping potential is used to drive the reaction

over the barrier. The MD simulations were carried out for a series of mapping potentials

allowing sampling along the entire energy gap reaction coordinate.

We also described two statistical methods that can be used to generate the free energy

curve. The WHAM method uses an iterative procedure to combine the biased PMF curves

obtained with different mapping potentials. The UI method is based on the derivative of the PMF

with respect to the reaction coordinate rather than the PMF itself. There are two significant

advantages of UI over WHAM. The first advantage is that UI does not rely on a binning

procedure to generate histograms and therefore reduces the statistical error and converges

efficiently. The second advantage is that UI can provide accurate PMF curves efficiently even

with a small number of windows that do not overlap significantly. Thus, UI is a promising

method for generating accurate PMF curves for large systems for which sampling may be limited.

We established that for DHFR, it is not necessary to generate the full free energy profile

to calculate the free energy barrier. Generating only partial free energy profiles with a smaller

number of windows significantly reduces the computational cost. To facilitate mutant studies, we

also centered the protein in the central cell based on an initial structure that was generated with

periodic boundary conditions. The free energy barrier calculated with this initial structure agrees

with previous calculations and was similar for both the WHAM and UI methods. We showed

that the UI method can generate comparable free energy barriers even with only four windows.

3.6 References

(1) Agarwal, P. K.; Billeter, S. R.; Hammes-Schiffer, S. Journal of Physical Chemistry B 2002, 106, 3283.
(2) Peter L. Cummins, J. E. G. Journal of Computational Chemistry 1990, 11, 791.
(3) Andrés, J.; Safont, V. S.; Martins, J. B. L.; Beltrán, A.; Moliner, V. Journal of Molecular Structure: THEOCHEM 1995, 330, 411.
(4) Andrés, J.; Moliner, V.; Safont, V. S.; Domingo, L. R.; Picher, M. T.; Krechl, J. Bioorganic Chemistry 1996, 24, 10.
(5) Castillo, R.; Andres, J.; Moliner, V. Journal of the American Chemical Society 1999, 121, 12140.
(6) Wong, K. F.; Selzer, T.; Benkovic, S. J.; Hammes-Schiffer, S. Proc Natl Acad Sci U S A 2005, 102, 6807.
(7) Watney, J. B.; Soudackov, A. V.; Wong, K. F.; Hammes-Schiffer, S. Chemical Physics Letters 2006, 418, 268.
(8) Wang, Q.; Hammes-Schiffer, S. Journal of Chemical Physics 2006, 125, 184102.
(9) Billeter, S. R.; Webb, S. P.; Iordanov, T.; Agarwal, P. K.; Hammes-Schiffer, S. Journal of Chemical Physics 2001, 114, 6925.
(10) Hammes-Schiffer, S. Current Opinion in Structural Biology 2004, 14, 192.
(11) Pang, J.; Pu, J.; Gao, J.; Truhlar, D. G.; Allemann, R. K. Journal of the American Chemical Society 2006, 128, 8015.
(12) Garcia-Viloca, M.; Truhlar, D. G.; Gao, J. Biochemistry 2003, 42, 13558.
(13) Garcia-Viloca, M.; Truhlar, D. G.; Gao, J. Journal of Molecular Biology 2003, 327, 549.
(14) Garcia-Viloca, M.; Alhambra, C.; Truhlar, D. G.; Gao, J. Journal of Computational Chemistry 2003, 24, 177.
(15) Thorpe, I. F.; Brooks, C. L., 3rd. Proteins 2004, 57, 444.
(16) Rod, T. H.; Radkiewicz, J. L.; Brooks, C. L., 3rd. Proc Natl Acad Sci U S A 2003, 100, 6980.
(17) Brooks, C. L.; Karplus, M.; Pettitt, B. M. Proteins: a theoretical perspective of dynamics, structure, and thermodynamics; J. Wiley: New York, 1988.
(18) Berman, H. M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T. N.; Weissig, H.; Shindyalov, I. N.; Bourne, P. E. Nucleic Acids Research 2000, 28, 235.
(19) Wigner, E. Physical Review 1932, 40, 749.
(20) Warshel, A. Computer Modeling of Chemical Reactions in Enzymes and Solutions; John Wiley & Sons, Inc.: New York, 1991.
(21) Warshel, A. Journal of Physical Chemistry 1982, 86, 2218.
(22) Marcus, R. A. Annual Review of Physical Chemistry 1964, 15, 155.
(23) Zusman, L. D. Chemical Physics 1980, 49, 295.
(24) Chakravorty, D. K.; Kumarasiri, M.; Soudackov, A. V.; Hammes-Schiffer, S. Journal of Chemical Theory and Computation 2008, 4, 1974.
(25) Torrie, G. M.; Valleau, J. P. Chemical Physics Letters 1974, 28, 578.
(26) Ferrenberg, A. M.; Swendsen, R. H. Physical Review Letters 1989, 63, 1195.
(27) Ferrenberg, A. M.; Swendsen, R. H. Phys. Rev. Lett. 1988, 61, 2635.
(28) Kumar, S.; Rosenberg, J. M.; Bouzida, D.; Swendsen, R. H.; Kollman, P. A. J. Comput. Chem. 1992, 13, 1011.
(29) Roux, B. Comput. Phys. Commun. 1995, 91, 275.
(30) Souaille, M.; Roux, B. Comput. Phys. Commun. 2001, 135, 40.
(31) Kastner, J.; Thiel, W. J. Chem. Phys. 2005, 123, 144104.
(32) Kastner, J.; Thiel, W. J. Chem. Phys. 2006, 124, 234106.
(33) Mathematica, Version 6.0; Wolfram Research, Inc.: Champaign, IL, 2007.
(34) Wong, K. F.; Watney, J. B.; Hammes-Schiffer, S. Journal of Physical Chemistry B 2004, 108, 12231.
(35) van Gunsteren, W. F.; Billeter, S. R.; Eising, A. A.; Hunenberger, P. H.; Kruger, P.; Mark, A. E.; Scott, W. R. P.; Tironi, I. G. Biomolecular simulation: The GROMOS96 manual and user guide; VdF Hochschulverlag, ETH Zurich: Zurich, 1996.
(36) Wong, K. F.; Watney, J. B.; Hammes-Schiffer, S. J. Phys. Chem. B 2004, 108, 12231.
(37) Humphrey, W.; Dalke, A.; Schulten, K. J Mol Graph 1996, 14, 33.
(38) Melchionna, S.; Cozzini, S. DLPROTEIN 2.1; Rome, Italy, 2001.
(39) Berendsen, H. J. C.; Grigera, J. R.; Straatsma, T. P. Journal of Physical Chemistry 1987, 91, 6269.

Chapter 4

Ranking Mutants of Dihydrofolate Reductase According to the Hydride Transfer Rates

Reproduced in part with permission from M. D. Kumarasiri, G.A. Baker, A.V. Soudackov and S. Hammes-Schiffer, submitted to Journal of Physical Chemistry B.


4.1 Introduction

Computational protein design is a rapidly growing field with potential applications in

pharmaceuticals, biotechnology, and other industrial processes. Since most protein design

protocols generate large numbers of designs,1-3 an efficient method for ranking these designs

according to specified criteria is essential. This type of ranking process enables the selection of a

smaller number of designs for experimental characterization. A variety of criteria could be used

in this ranking process, including protein stability, substrate binding energy, and activity. Often

the objective is to rank protein designs according to the enzyme-catalyzed reaction rate. Within

the framework of transition state theory, the rate of a chemical reaction is exponentially related to

the free energy barrier. In this case, the protein designs are ranked according to the relative free

energy barriers of the chemical step of interest.

In this chapter, we present an efficient computational approach for ranking mutant

enzymes according to the relative free energy barriers associated with the catalyzed chemical

reaction. The mutant enzymes are generated using a rotamer library4 in conjunction with

restrained minimizations and molecular dynamics simulations. For each mutant, the partial free

energy curve for the chemical step of interest is calculated along a collective reaction coordinate

using biased molecular dynamics simulations and umbrella integration5,6 with an empirical

valence bond potential.7,8 This procedure does not include any type of parameter fitting for the

mutants. Each step in the procedure can be automated and optimized for the specific enzyme

system.

We apply this ranking approach to dihydrofolate reductase (DHFR), which catalyzes the

reduction of 7,8-dihydrofolate (DHF) into 5,6,7,8-tetrahydrofolate (THF) using the coenzyme

nicotinamide adenine dinucleotide phosphate (NADPH).9 In this reaction, the hydride is

transferred from the C4 position of the NADPH cofactor to the C6 position of the protonated

dihydrofolate substrate. The product THF is essential for the synthesis of purines, pyrimidines,

and certain amino acids. As a result, DHFR inhibition has been promoted as a pharmacological

target for antibacterial agents and anticancer drugs.10-14 Furthermore, the hydride transfer

reaction catalyzed by DHFR and its mutants has been the subject of a wide variety of

experimental15-29 and theoretical30-43 studies. In particular, kinetic measurements on 15 single

mutant DHFR enzymes indicate hydride transfer rates ranging from 0.2 s-1 to 319 s-1 at pH 7.16-21

The objective of this chapter is to use the computational approach outlined above to rank these 15

DHFR mutant enzymes according to the rate of hydride transfer and compare the results to the

available experimental data.
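The connection between measured rates and barrier changes can be made concrete with the transition state theory relation, in which a rate ratio maps onto a difference in free energy barriers. The sketch below assumes identical prefactors for the two enzymes and a temperature of 298.15 K; both are assumptions made for illustration, and the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def barrier_change(k_mutant, k_reference, temperature=298.15):
    """Change in free energy barrier (kcal/mol) implied by a rate ratio,
    assuming equal prefactors: ddG = -RT ln(k_mutant / k_reference)."""
    return -R_KCAL * temperature * math.log(k_mutant / k_reference)


# Spread of barriers implied by the slowest (0.2 s^-1) and fastest
# (319 s^-1) experimental hydride transfer rates quoted in the text.
spread = barrier_change(0.2, 319.0)
```

Under these assumptions the experimental rates span a range of about 4.4 kcal/mol in free energy barrier, which sets the scale that the computed barrier changes must resolve.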

In Section 4.2, we present the general computational ranking approach and describe its

implementation for studying the hydride transfer reaction in DHFR. Section 4.3 compares the

experimental and calculated changes in free energy barriers for this enzyme-catalyzed reaction.

The conclusions are presented in Section 4.4.

4.2 Methods

We calculated the free energy barrier for hydride transfer in wild-type DHFR and 15

mutants to determine the change in the free energy barrier upon mutation. These 15 mutants were

identified using a general literature search for experimental measurements of the hydride transfer

rate at pH 7.44 The hydride transfer reaction catalyzed by DHFR is depicted in Figure 4.1. Our

group studied this reaction previously in wild-type DHFR,31,32,45 as well as a few selected

mutants,33,34 with a hybrid quantum-classical molecular dynamics approach, which includes the

nuclear quantum effects of the transferring hydrogen with grid-based or path integral methods.

Here we use the same simulation system and EVB potential but do not include the nuclear

quantum effects because they are expected to be similar for all mutants and therefore will not

significantly impact the changes in the free energy barrier upon mutation. The reasoning behind

this was provided in Section 3.4.

[Reaction scheme: H3F+ + NADPH → H4F + NADP+]

Figure 4.1: Hydride transfer reaction from the NADPH cofactor to the protonated dihydrofolate substrate H3F+ to form the products tetrahydrofolate H4F and NADP+. Figure reproduced with permission from Ref. 8.

In previous simulations, the initial coordinates were obtained from a crystal structure of

Escherichia coli DHFR complexed with NADP+ and folate (PDB code 1rx2).22 Here, we started

with a snapshot of the equilibrated reactant state obtained from previous simulations.32,45 We

followed the procedure described in Section 3.4 to generate a single connected structure in the

central cell. The present simulation system has the protein, the cofactor NADPH, and the

substrate H3F+ solvated by 4097 explicit water molecules in a truncated octahedral periodic box

with a distance of 66.55 Å between opposing square faces. The protonation states of the amino

acids are the same as those used in previous work, which were determined from the pKa values at

pH=7 and hydrogen bonding environments. The potential energy surface of the hydride transfer

is represented by a two-state empirical valence bond (EVB) potential,7 where state 1 corresponds

to the transferring hydrogen atom bonded to the donor carbon, and state 2 corresponds to the

transferring hydrogen atom bonded to the acceptor carbon. The diagonal elements of the EVB

Hamiltonian, $V_{11}(\mathbf{R})$ and $V_{22}(\mathbf{R})$, are based on the GROMOS force field46 with the

covalent bond involving the transferring hydrogen represented by a Morse potential.31 The two

constant EVB parameters corresponding to the relative energy of the two valence bond states and

the coupling between these states are 65.25 kcal/mol and 34.66 kcal/mol, respectively, and are fixed at these

values for all mutants. These parameters were determined elsewhere by fitting the results

obtained from hybrid quantum/classical molecular dynamics simulations of wild-type DHFR to

the experimental free energies of reaction and activation obtained from the pH-independent

forward and reverse hydride transfer reaction rate constants.32 In the present calculations, the

transferring hydrogen nucleus is treated classically, and the results are compared to experimental

hydride transfer rates for the mutants measured at pH ≈ 7. For this reason, we focus on the

changes in the free energy barriers relative to the wild-type DHFR rather than the absolute free

energy barriers.

The free energy barrier for each mutant is estimated by generating the potential of mean

force for the hydride transfer reaction along a collective reaction coordinate using umbrella

sampling techniques. The collective reaction coordinate is defined as the difference between the

energies of the two VB states

$$\Lambda(\mathbf{R}) = V_{11}(\mathbf{R}) - V_{22}(\mathbf{R}), \tag{4.1}$$

where $V_{11}(\mathbf{R})$ and $V_{22}(\mathbf{R})$ are the energies of VB states 1 and 2, respectively, and $\mathbf{R}$ denotes all

nuclear coordinates. The molecular dynamics simulations are performed with mapping potentials

defined as linear combinations of the energies of the two VB states7

$$V_{\mathrm{map}}(\mathbf{R};\lambda_m) = (1-\lambda_m)\,V_{11}(\mathbf{R}) + \lambda_m\,V_{22}(\mathbf{R}). \tag{4.2}$$

As the mapping parameter λm is varied from zero to unity, the reaction progresses from the

reactant state to the product state. The potential of mean force is obtained by propagating a series

of independent molecular dynamics simulations with different mapping potentials and combining

them using the weighted histogram analysis method (WHAM)47 or umbrella integration (UI).5,6

The details of the WHAM and UI implementations and of the molecular dynamics simulations are given

in Sections 3.3 and 3.4, respectively.

In this chapter, our objective is to estimate the free energy barriers for the mutants as

efficiently as possible. Thus, we decreased the number of windows and the amount of

equilibration and data collection for each window and confirmed that the desired accuracy for the

free energy barrier is still maintained. For this purpose, we generated the free energy profile

using WHAM with only six windows: $\lambda_m$ = 0.050, 0.125, 0.250, 0.375, 0.500 and 0.625. As

stated in Section 3.4, the starting configuration for each window was obtained from the previous

window after 20 ps of equilibration. Each window was equilibrated for a total of 120 ps,

followed by 250 ps of data collection. The resulting wild-type free energy barrier of 15.6

kcal/mol is similar to the previously obtained barrier with 20 windows and significantly more

equilibration and data collection.32,45

Previously we showed that UI provides the same free energy profiles as WHAM but

requires fewer windows for efficient convergence.8 Using UI to generate the free energy profile,

we obtained a wild-type free energy barrier of 15.4 kcal/mol with the six windows given above.

We obtained a free energy barrier of 15.1 kcal/mol using only four of these windows: $\lambda_m$ =

0.050, 0.250, 0.500 and 0.625. Based on this analysis, we compared the free energy barrier

changes for the 15 mutants using WHAM with six windows and UI with both four and six

windows. As will be shown below, the WHAM and UI methods with six windows lead to nearly

identical results, and the UI method with four windows leads to qualitatively similar results with

minor quantitative discrepancies.

The mutant structures of DHFR were generated from an equilibrated wild-type structure.

The coordinates of this structure are given in Supporting Information. The profix utility in the

JACKAL suite of programs4,48 was used to generate the initial mutant structures from this wild-

type structure. The profix utility uses a backbone rotamer library, a side-chain rotamer library,

and distance geometry constraints to sample segment conformations. Missing residues are

reconstructed with the Nest and Scap modules4 in conjunction with the all-atom AMBER96

forcefield.49 In our procedure, the coordinates of the residue to be mutated were deleted from the

pdb file, the residue name was changed to the new name, and the profix utility was run for the

modified pdb file. Appendix A provides additional details of profix usage and mutation

procedure. In addition to the mutated residue, we observed that the conformations of

approximately four residues on each side of the mutation site are also altered during this

procedure.

The resulting mutant structures were subjected to four cycles of restrained minimizations

and molecular dynamics simulations on the pure reactant potential energy surface with a gradual

release of the atomic restraints. The initial restraining force constant on all atoms with respect to

the initial structure was 59.75 kcal mol$^{-1}$ Å$^{-2}$ and was halved with each cycle of the procedure.

Each cycle consisted of a steepest descent geometry optimization followed by 15 ps of molecular

dynamics. Subsequently, 650 ps of equilibration on the EVB reactant potential energy surface

with $\lambda_m$ = 0.05 was performed. After this equilibration, molecular dynamics simulations with

mapping potentials corresponding to $\lambda_m$ = 0.050, 0.125, 0.250, 0.375, 0.500 and 0.625 were

propagated. The starting configuration for each window was obtained from the previous window

after 20 ps of equilibration, and each window was equilibrated for a total of 120 ps, followed by

250 ps of data collection.

In order to confirm that the initial equilibration time of 650 ps on the EVB reactant

surface was sufficient, we performed the calculations for all mutants with an initial equilibration

time of only 300 ps on the EVB reactant surface. The free energy barriers differ from those

obtained with 650 ps of equilibration by less than 0.5 kcal/mol for all mutants except G121P,

G121V, D122A, and D27E. For these four mutants, we repeated the calculations with an initial

equilibration time of 850 ps on the EVB reactant surface and found that the resulting free energy

barriers are within 0.2 kcal/mol of those obtained with 650 ps of equilibration. Based on these

tests, we concluded that an initial equilibration time of 650 ps on the EVB reactant potential

energy surface is sufficient for all mutants studied.

The steps for calculating the free energy barrier change upon mutation, along with the

approximate CPU times for an Intel Xeon 3.0 GHz processor, are as follows:

1. Generation of mutant structure from wild-type using the profix utility (<2 CPU minutes)

2. Restrained minimizations/molecular dynamics on pure reactant surface (≈8 CPU hours)

3. 650 ps equilibration on EVB reactant surface with $\lambda_m$ = 0.05 (≈65 CPU hours)

4. 120 ps equilibration and 250 ps data collection for each window (≈37 CPU hours per window; the windows may be run in parallel on separate processors after the initial 20 ps per window)

5. Generation of free energy profiles using WHAM or UI (<2 CPU minutes)

This procedure is depicted in Figure 4.2. Note that this procedure does not include any

free parameters or parameter fitting for the mutant calculations. The individual steps have been

automated. Appendix B provides the essential scripts written for automation purposes. For each

mutant, the first three steps and the fifth step require only a single processor, and the fourth step

can be run in parallel using four to six processors, depending on the number of windows used.

Thus, the free energy barrier changes for 16 enzymes can be evaluated in approximately one

week using 32 processors.
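The one-week estimate can be checked against the approximate CPU times listed for the individual steps. The sketch below is a back-of-the-envelope calculation only: it assumes six windows per enzyme, takes the "<2 CPU minutes" steps at their upper bound, and assumes ideal load balancing across processors, which ignores queue waits and the serial dependence between steps. The names are hypothetical.

```python
# Rough CPU-time estimate from the approximate per-step timings in the text.
# Assumptions: six windows per enzyme, ideal load balancing, no queue time.

SERIAL_STEP_HOURS = {
    "profix_mutation": 2.0 / 60.0,        # upper bound of "<2 CPU minutes"
    "restrained_min_md": 8.0,
    "evb_reactant_equilibration": 65.0,
    "wham_or_ui_analysis": 2.0 / 60.0,    # upper bound of "<2 CPU minutes"
}
HOURS_PER_WINDOW = 37.0


def cpu_hours_per_enzyme(n_windows=6):
    """Total CPU hours needed for one enzyme (wild-type or mutant)."""
    return sum(SERIAL_STEP_HOURS.values()) + n_windows * HOURS_PER_WINDOW


def ideal_wall_days(n_enzymes=16, n_processors=32, n_windows=6):
    """Wall-clock days if the total CPU time were spread evenly over processors."""
    total_hours = n_enzymes * cpu_hours_per_enzyme(n_windows)
    return total_hours / n_processors / 24.0


if __name__ == "__main__":
    print(f"CPU hours per enzyme: {cpu_hours_per_enzyme():.1f}")
    print(f"Ideal wall-clock days for 16 enzymes on 32 processors: {ideal_wall_days():.1f}")
```

With these inputs the total is roughly 295 CPU hours per enzyme, or about six days for 16 enzymes on 32 processors, in line with the estimate of approximately one week.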

Figure 4.2: Summary of the steps for the generation of the initial mutant structure, equilibration, and calculation of the free energy barrier. Here fc is the force constant of the position restraints with respect to the initial structure during the restrained minimizations and molecular dynamics simulations.

4.3 Results

Table 4.1 provides the experimentally determined rates for hydride transfer catalyzed by

the 15 mutants. The locations of these mutation sites in the DHFR structure are depicted in

Figure 4.3. Note that these 7 mutation sites are distributed throughout the protein. The slowest

mutant exhibits a decrease in the hydride transfer rate by a factor of ~1000, and the fastest mutant

exhibits an increase in the hydride transfer rate by a factor of ~1.5. The associated experimental

free energy barriers were obtained using the standard transition state theory rate constant

expression. All of the experiments were performed at pH 7 except those for the D27C and D27E

mutants, which were performed at pH 7.3. Based on the experimentally observed hydride

transfer rate changes in wild-type DHFR around pH 7, we expect the rate changes for the D27C and

D27E mutants from pH 7.3 to pH 7 to be within our numerical accuracy. The experimental

changes in the free energy barrier, $\Delta\Delta G^{\dagger}_{\mathrm{expt}}$, are defined relative to the wild-type free energy

barrier at pH 7 and were calculated using the transition state theory rate expression. We also

note that previously the transmission coefficient was calculated to be 0.88 for wild-type DHFR,31

and we do not expect the degree of recrossings of the dividing surface to differ significantly for

the mutants.
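When two rate constants are compared at the same temperature, the prefactor $k_B T/h$ of the transition state theory expression cancels, so the experimental change in the barrier reduces to $\Delta\Delta G^{\dagger}_{\mathrm{expt}} = -k_B T \ln(k_{\mathrm{mut}}/k_{\mathrm{WT}})$. The following Python sketch illustrates this conversion using rate constants from Table 4.1; the function name is hypothetical, and a temperature of 300 K is assumed, as stated in the caption of Table 4.1.

```python
import math

# Boltzmann constant expressed per mole (gas constant) in kcal mol^-1 K^-1.
K_B = 0.0019872041


def barrier_change(k_mutant, k_wild_type, temperature=300.0):
    """Change in free energy barrier (kcal/mol) relative to wild-type.

    Follows from k = (k_B T / h) exp(-dG / k_B T): the prefactor cancels
    in the ratio of the two rate constants.
    """
    return -K_B * temperature * math.log(k_mutant / k_wild_type)


if __name__ == "__main__":
    k_wt = 220.0  # wild-type hydride transfer rate constant (s^-1), Table 4.1
    for mutant, k_hyd in [("G121L", 0.2), ("G121P", 0.5), ("S148D", 319.0)]:
        print(mutant, round(barrier_change(k_hyd, k_wt), 1))
```

For G121L, G121P and S148D this gives changes of about 4.2, 3.6 and -0.2 kcal/mol, respectively.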

Table 4.1: The experimentally determined hydride transfer rate constants for E. coli DHFR mutants at pH ≈7 and 300 K. These rate constants were measured at pH 7 for wild-type DHFR and all mutants except D27E and D27C, which were measured at pH 7.3.

Mutant   Ref.   khyd (s-1)      Mutant   Ref.   khyd (s-1)
G121L    16     0.2             D27E     17     40
G121P    16     0.5             S49A     20     120
G121V    16     1.4             S148A    21     157
D27C     17     1.7             S148K    21     162
G121S    18     3.7             G67V     18     190
D122A    19     4.0             WT       16     220
D122S    19     5.9             H149Q    44     234
D122N    19     9.4             S148D    21     319

Figure 4.3: Depiction of the mutation sites of DHFR. The cofactor is green, the substrate is magenta, and the mutated residues are orange. This figure was created using VMD.50

Table 4.2 provides a comparison of the experimental and calculated changes in the free

energy barrier relative to wild-type DHFR for the series of 15 mutants. The calculated changes in

the free energy barrier are very similar using WHAM and UI with six windows. The results

obtained using UI with only four windows ($\lambda_m$ = 0.050, 0.250, 0.500 and 0.625) are qualitatively

similar to those obtained using UI with six windows, but the quantitative changes in the free

energy barrier differ by as much as 0.7 kcal/mol and the correlation coefficient51 is only R = 0.78.

Using a different set of four windows, specifically $\lambda_m$ = 0.050, 0.125, 0.500 and 0.625, yields

similar results with R = 0.77. We expected that the second set of four windows might yield better

results than the first set because the second set provides better sampling of the reactant region, but we

did not observe such a trend. We also attempted using three windows ($\lambda_m$ = 0.050, 0.500 and

0.625), but no meaningful free energy barriers could be predicted. Figure 4.4 depicts a correlation

plot of the results obtained using UI with six windows, in which case the correlation coefficient51

is R = 0.82. Given the significant approximations underlying the computational approach, this

level of agreement between the calculated and experimental data is encouraging. Note that the

computational approach predicts the correct direction of the change in free energy barrier for all

15 mutants.
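The correlation coefficient quoted above can be recomputed from the tabulated barrier changes. The sketch below is an illustration in Python: it uses the standard Pearson definition, which is assumed to be the one intended by the cited reference, and the values are transcribed from the experimental and UI six-window columns of Table 4.2.

```python
import math

# (mutant, experimental change, change from UI with six windows) in kcal/mol,
# transcribed from Table 4.2.
BARRIER_CHANGES = [
    ("G121L", 4.2, 4.8), ("G121P", 3.6, 2.6), ("G121V", 3.0, 1.9),
    ("D27C", 2.9, 2.7), ("G121S", 2.4, 2.8), ("D122A", 2.4, 2.5),
    ("D122S", 2.2, 1.8), ("D122N", 1.9, 2.9), ("D27E", 1.0, 2.0),
    ("S49A", 0.4, 1.4), ("S148A", 0.2, 0.8), ("S148K", 0.2, 1.7),
    ("G67V", 0.1, 0.9), ("H149Q", 0.0, -1.0), ("S148D", -0.2, -1.6),
]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


if __name__ == "__main__":
    experimental = [row[1] for row in BARRIER_CHANGES]
    calculated = [row[2] for row in BARRIER_CHANGES]
    print(f"R = {pearson_r(experimental, calculated):.2f}")
```

With the rounded table entries, the result is close to the reported value of R = 0.82.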

Based on Figure 4.4, we would like to understand the basis for the differences in the rates

among the mutants. The correlation between the experimental barrier heights and the calculated

barrier heights for the mutants with faster rates, S148D, H149Q, G67V, S148K, S148A, and

S49A, appears to be different from the mutants with slower rates. However, according to Figure

4.4, no clear relationship is evident between the positions of the mutation sites relative to the

active site and the rates of hydride transfer. We also note that previous studies indicate that the

thermally averaged donor acceptor distance at the reactant state is larger than that of the transition

state.31,52 Some structural changes that may affect the rate of hydride transfer have also been

observed previously. For example, in the case of S148D, which is the fastest mutant in the group,

Benkovic and coworkers suggest that substituting Ser148 with aspartic acid can strengthen the

interaction between the βG–βH and Met20 loops, stabilizing the product state.53 For the slower

Gly121 mutants, it has been suggested that the mutation at position 121 disrupts the network of

coupled promoting motions in DHFR, which manifests as conformational changes that occur

during the reaction.30,54 This results in an increased free energy barrier. Thermally averaged

distances can be used to propose such chemical insights. This work is currently in progress.

Table 4.2: The change in the free energy barrier relative to the wild-type free energy barrier for a series of mutants for different equilibration periods. The experimental free energy barriers are obtained from the transition state theory rate constant expression $k = \frac{k_B T}{h}\exp\left(-\Delta G^{\dagger}/k_B T\right)$ using the experimentally determined rate constants in Table 4.1. The calculated free energy barriers are obtained using WHAM with six windows, UI with six windows, and UI with four windows. The notation “WHAM,350” denotes WHAM with 350 ps of MD on the EVB reactant surface in the equilibration procedure. “WHAM,650” and “WHAM,850” are defined analogously. UI4 uses $\lambda_m$ = 0.050, 0.250, 0.500 and 0.625, and UI4’ uses $\lambda_m$ = 0.050, 0.125, 0.500 and 0.625. All free energies are given in kcal/mol.

                        ΔΔG† (kcal/mol)
Mutant   expt   WHAM,350   WHAM,650   WHAM,850   UI6,650   UI4,650   UI4',650
G121L     4.2     4.4        4.3                   4.8       4.4       4.1
G121P     3.6     0.7        2.2        2.3        2.6       3.0       2.7
G121V     3.0    -0.1        1.6        1.8        1.9       2.4       2.2
D27C      2.9     2.6        2.6                   2.7       2.9       2.7
G121S     2.4     2.4        2.4                   2.8       3.3       3.1
D122A     2.4    -0.8        2.3        2.3        2.5       2.9       2.8
D122S     2.2     1.3        1.6                   1.8       2.2       1.8
D122N     1.9     2.9        2.8                   2.9       3.4       3.2
D27E      1.0    -0.3        1.9        1.9        2.0       2.6       2.3
S49A      0.4     1.5        1.2                   1.4       1.8       1.4
S148A     0.2     0.4        0.4                   0.8       1.5       1.3
S148K     0.2     2.1        1.7                   1.7       2.0       1.8
G67V      0.1     0.7        0.9                   0.9       1.6       1.5
H149Q     0.0    -0.9       -1.3                  -1.0      -1.0      -1.3
S148D    -0.2    -2.8       -1.6       -1.7       -1.6      -0.9      -1.1

Figure 4.4: Correlation plot for the calculated and experimental changes in the free energy barrier for the 15 mutants, where the calculated free energy barriers were obtained using UI with 6 windows. The correlation coefficient is R = 0.82.

During our calculations, several assumptions were made. Based on chemical intuition, it

was assumed that the proton transfer step of the DHFR reaction precedes the hydride transfer.

The nuclear quantum effects of the transferring hydrogen were assumed to decrease the free

energy barrier of mutants by a similar amount to the wild-type. We also assumed that there are no

recrossings of the dividing surface, based on the high transmission coefficient of 0.88 calculated

previously.31 We represented the hydride transfer reaction by a two state EVB potential, and the

EVB parameters were assumed to be the same for wild-type and the mutants. Additionally, the

off-diagonal coupling terms were approximated by constants. Finally, our simulations suffer

from limitations that are inherent to MD simulations. These include limitations of the forcefield,

solvent model, the reaction field electrostatic treatment in GROMOS, and the length of

simulations governed by available computer power.

4.4 Conclusions

In this chapter, we presented a computationally efficient approach for evaluating the

impact of mutation on enzyme-catalyzed reaction rates. This procedure requires the generation

and equilibration of the mutant structure, followed by the calculation of a partial free energy

curve using an empirical valence bond potential in conjunction with biased molecular dynamics

simulations and umbrella integration. No parameter fitting is involved in this procedure for the

mutants. The individual steps are automated and optimized for computational efficiency.

We used this approach to calculate the changes in the free energy barrier for hydride

transfer upon mutation of DHFR. The 15 mutants studied were chosen objectively based on a

general literature search for experimental measurements of the hydride transfer rate44 at pH 7.

The agreement between the calculated and experimental changes in the free energy barrier upon

mutation is encouraging. The computational approach predicts the correct direction of the change

in free energy barrier for all mutants, and the correlation coefficient between the calculated and

experimental data is 0.82. In the future, this approach will be used to predict the impact of

mutations that have not been studied experimentally yet. The feedback between experiment and

theory will guide the further refinement of the procedure. This general approach for ranking

protein designs according to the free energy barrier has implications for protein engineering and

drug design.

4.5 References

(1) Rothlisberger, D.; Khersonsky, O.; Wollacott, A. M.; Jiang, L.; DeChancie, J.; Betker, J.; Gallaher, J. L.; Althoff, E. A.; Zanghellini, A.; Dym, O.; Albeck, S.; Houk, K. N.; Tawfik, D. S.; Baker, D. Nature (London) 2008, 453, 190.
(2) Jiang, L.; Althoff, E. A.; Clemente, F. R.; Doyle, L.; Rothlisberger, D.; Zanghellini, A.; Gallaher, J. L.; Betker, J. L.; Tanaka, F.; Barbas III, C. F.; Hilvert, D.; Houk, K. N.; Stoddard, B. L.; Baker, D. Science 2008, 319, 1387.
(3) Das, R.; Baker, D. Annu. Rev. Biochem. 2008, 77, 363.
(4) Xiang, Z.; Honig, B. J. Mol. Biol. 2001, 311, 421.
(5) Kastner, J.; Thiel, W. J. Chem. Phys. 2005, 123, 144104.
(6) Kastner, J.; Thiel, W. J. Chem. Phys. 2006, 124, 234106.
(7) Warshel, A. Computer Modeling of Chemical Reactions in Enzymes and Solutions; John Wiley & Sons, Inc.: New York, 1991.
(8) Chakravorty, D. K.; Kumarasiri, M.; Soudackov, A. V.; Hammes-Schiffer, S. Journal of Chemical Theory and Computation 2008, 4, 1974.
(9) Miller, G. P.; Benkovic, S. J. Chemistry & Biology 1998, 5, R105.
(10) Berg, J. M.; Stryer, L.; Tymoczko, J. Biochemistry, 5th ed.; Freeman: New York, 2002.
(11) Miovic, M.; Pizer, L. I. Journal of Bacteriology 1971, 106, 856.
(12) Allegra, C. J.; Hoang, K.; Yeh, G. C.; Drake, J. C.; Baram, J. J. Biol. Chem. 1987, 262, 13520.
(13) Huennekens, F. M. Advances in Enzyme Regulation 1994, 34, 397.
(14) Schweitzer, B. I.; Dicker, A. P.; Bertino, J. R. FASEB J. 1990, 4, 2441.
(15) Fierke, C. A.; Johnson, K. A.; Benkovic, S. J. Biochemistry 1987, 26, 4085.
(16) Cameron, C. E.; Benkovic, S. J. Biochemistry 1997, 36, 15792.
(17) David, C. L.; Howell, E. E.; Farnum, M. F.; Villafranca, J. E.; Oatley, S. J.; Kraut, J. Biochemistry 1992, 31, 9813.
(18) Rajagopalan, P. T. R.; Lutz, S.; Benkovic, S. J. Biochemistry 2002, 41, 12618.
(19) Miller, G. P.; Benkovic, S. J. Biochemistry 1998, 37, 6336.
(20) Adams, J. A.; Fierke, C. A.; Benkovic, S. J. Biochemistry 1991, 30, 11046.
(21) Miller, G. P.; Wahnon, D. C.; Benkovic, S. J. Biochemistry 2001, 40, 867.
(22) Sawaya, M. R.; Kraut, J. Biochemistry 1997, 36, 586.
(23) Osborne, M. J.; Schnell, J.; Benkovic, S. J.; Dyson, H. J.; Wright, P. E. Biochemistry 2001, 40, 9846.
(24) Schnell, J. R.; Dyson, H. J.; Wright, P. E. Annual Review of Biophysical Biomolecular Structure 2004, 33, 119.
(25) Zhang, Z. Q.; Rajagopalan, P. T. R.; Selzer, T.; Benkovic, S. J.; Hammes, G. G. Proc. Nat. Acad. Sci. U.S.A. 2004, 101, 2764.
(26) Sikorski, R. S.; Wang, L.; Markham, K. A.; Rajagopalan, P. T. R.; Benkovic, S. J.; Kohen, A. J. Am. Chem. Soc. 2004, 126, 4778.
(27) Antikainen, N. M.; Smiley, R. D.; Benkovic, S. J.; Hammes, G. G. Biochemistry 2005, 44, 16835.
(28) Wang, L.; Goodey, N. M.; Benkovic, S. J.; Kohen, A. Proceedings of the National Academy of Sciences U.S.A. 2006, 103, 15753.
(29) Boehr, D. D.; McElheny, D.; Dyson, H. J.; Wright, P. E. Science 2006, 313, 1638.
(30) Agarwal, P. K.; Billeter, S. R.; Rajagopalan, P. T. R.; Benkovic, S. J.; Hammes-Schiffer, S. Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 2794.
(31) Agarwal, P. K.; Billeter, S. R.; Hammes-Schiffer, S. J. Phys. Chem. B 2002, 106, 3283.
(32) Wong, K. F.; Watney, J. B.; Hammes-Schiffer, S. J. Phys. Chem. B 2004, 108, 12231.
(33) Watney, J. B.; Agarwal, P. K.; Hammes-Schiffer, S. J. Am. Chem. Soc. 2003, 125, 3745.
(34) Wong, K. F.; Selzer, T.; Benkovic, S. J.; Hammes-Schiffer, S. Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 6807.
(35) Castillo, R.; Andres, J.; Moliner, V. J. Am. Chem. Soc. 1999, 121, 12140.
(36) Radkiewicz, J. L.; Brooks, C. L., III. J. Am. Chem. Soc. 2000, 122, 225.
(37) Cummins, P. L.; Greatbanks, S. P.; Rendell, A. P.; Gready, J. E. J. Phys. Chem. B 2002, 106, 9934.
(38) Garcia-Viloca, M.; Truhlar, D. G.; Gao, J. Biochemistry 2003, 42, 13558.
(39) Rod, T. H.; Radkiewicz, J. L.; Brooks III, C. L. Proceedings of the National Academy USA 2003, 100, 6980.
(40) Thorpe, I. F.; Brooks III, C. L. J. Phys. Chem. B 2003, 107, 14042.
(41) Thorpe, I. F.; Brooks III, C. L. Proteins: Structure, Function, and Bioinformatics 2004, 57, 444.
(42) Swanwick, R. S.; Shrimpton, P. J.; Allemann, R. K. Biochemistry 2004, 43, 4119.
(43) Liu, H.; Warshel, A. Biochemistry 2007, 46, 6011.
(44) Lee, J.; Benkovic, S. J. personal communication.
(45) Wang, Q.; Hammes-Schiffer, S. J. Chem. Phys. 2006, 125, 184102.
(46) van Gunsteren, W. F.; Billeter, S. R.; Eising, A. A.; Hunenberger, P. H.; Kruger, P.; Mark, A. E.; Scott, W. R. P.; Tironi, I. G. Biomolecular simulation: The GROMOS96 manual and user guide; VdF Hochschulverlag, ETH Zurich: Zurich, 1996.
(47) Kumar, S.; Rosenberg, J. M.; Bouzida, D.; Swendsen, R. H.; Kollman, P. A. J. Comput. Chem. 1992, 13, 1011.
(48) Xiang, J. Z.; Honig, B. JACKAL: A Protein Structure Modeling Package; Columbia University & Howard Hughes Medical Institute: New York, 2002.
(49) Cornell, W. D.; Cieplak, P.; Bayly, C. I.; Gould, I. R.; Merz, K. M., Jr.; Ferguson, D. M.; Spellmeyer, D. C.; Fox, T.; Caldwell, J. W.; Kollman, P. A. J. Am. Chem. Soc. 1995, 117, 5179.
(50) Humphrey, W.; Dalke, A.; Schulten, K. J Mol Graph 1996, 14, 33.
(51) Weisstein, E. W. "Correlation Coefficient" From MathWorld - A Wolfram Web Resource. http://mathworld.wolfram.com/CorrelationCoefficient.html
(52) Wong, K. F.; Watney, J. B.; Hammes-Schiffer, S. Journal of Physical Chemistry B 2004, 108, 12231.
(53) Miller, G. P.; Wahnon, D. C.; Benkovic, S. J. Biochemistry 2001, 40, 867.
(54) Watney, J. B.; Agarwal, P. K.; Hammes-Schiffer, S. Journal of the American Chemical Society 2003, 125, 3745.

Chapter 5

Conclusions

5.1 Anharmonic Effects in Small Clusters

We investigated anharmonic effects of ammonium nitrate and hydroxylammonium nitrate

covalent monomers, ionic dimers and constituent ions. Density functional theory and second-

order vibrational perturbation theory as implemented in the Gaussian 03 package were used in the

calculations. Our calculations illustrate that the anharmonicities of the potential energy surfaces

significantly influence the geometries, frequencies, and nuclear magnetic shieldings for these

systems. All clusters exhibit strong hydrogen bonding interactions. Our calculations confirmed

that the most stable structures are covalent acid-base pairs for the monomers and ionic acid-base

pairs for the dimers.1,2 The hydrogen bonding distances were found to be greater in the ionic

dimers than in the covalent monomers in part because the nitrogen and oxygen atoms are

involved in multiple competing hydrogen bonding interactions in the dimers.

We also observed significant shifts in the stretching frequencies from the covalent

monomers to the ionic dimers. Moreover, we identified an intermolecular hydrogen-bonding

stretching motion of ~200 cm-1 in the monomers that shifts to an intermolecular breathing motion

of slightly higher frequency of ~300 cm-1 in the dimers. In these cases, it is incorrect to use

scaling factors that are normally used in ab initio harmonic frequency calculations. The inclusion

of anharmonic effects was found to significantly decrease many of the calculated frequencies in

these clusters and to improve the agreement of the calculated frequencies with the experimental

data available for the isolated neutral species.

Our calculations of nuclear magnetic shielding constants for all nuclei in these clusters

illustrate that quantitatively accurate predictions of nuclear magnetic shieldings for comparison to

experimental data require the inclusion of anharmonic effects. Furthermore, the consideration of

anharmonic effects in the development of molecular forcefields will be particularly important for

simulations of proton transfer reactions in ionic liquids and other ionic materials.

5.2 Simulation Methods for Hydride Transfer in Dihydrofolate Reductase

We also described two statistical methods that can be used to generate the potential of

mean force (PMF) for a chemical reaction: the weighted histogram analysis method (WHAM)

and the umbrella integration (UI) method. In WHAM, two equations are solved iteratively, and

the PMF is obtained directly from them. The UI method is based on the derivative of the PMF

with respect to the reaction coordinate rather than the PMF itself. There are two significant

advantages of UI over WHAM. The first advantage is that UI does not rely on a binning

procedure to generate histograms and therefore reduces the statistical error and converges

efficiently. The second advantage is that the UI method can provide accurate PMF curves

efficiently even with a small number of windows that do not overlap significantly. Thus, UI is a

promising method for generating accurate PMF curves for large systems for which sampling may

be limited.
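The idea behind umbrella integration can be illustrated with a minimal Python sketch. It implements the standard harmonic-bias form of the method, in which each window contributes a mean force estimated from the mean and variance of its biased distribution, and the window estimates are combined with normal-distribution weights before numerical integration. This is an assumption made for illustration: the calculations in this thesis use EVB mapping potentials rather than harmonic restraints, and the function and argument names below are hypothetical.

```python
import math


def umbrella_integration(grid, centers, k_bias, means, variances, counts, kT):
    """PMF on `grid` from per-window statistics (harmonic-bias umbrella integration).

    centers   : reference positions of the harmonic biases (k_bias/2)(xi - center)^2
    means     : mean of the reaction coordinate in each biased window
    variances : variance of the reaction coordinate in each biased window
    counts    : number of samples collected in each window
    """
    mean_force = []
    for xi in grid:
        numerator = 0.0
        denominator = 0.0
        for center, mu, var, n in zip(centers, means, variances, counts):
            # Biased distribution of this window modelled as a normal distribution.
            p = math.exp(-0.5 * (xi - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)
            # Unbiased derivative of the PMF estimated from this window.
            d_pmf = kT * (xi - mu) / var - k_bias * (xi - center)
            numerator += n * p * d_pmf
            denominator += n * p
        mean_force.append(numerator / denominator)

    # Trapezoidal integration of the weighted mean force along the grid.
    pmf = [0.0]
    for i in range(1, len(grid)):
        step = grid[i] - grid[i - 1]
        pmf.append(pmf[-1] + 0.5 * (mean_force[i] + mean_force[i - 1]) * step)
    lowest = min(pmf)
    return [value - lowest for value in pmf]
```

Because only the mean and variance of each window enter, no histogram binning is needed, which is the first advantage noted above. The grid should stay within the region covered by the windows, since the weights vanish far from all window means.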

The free energy barrier calculated for hydride transfer in DHFR was similar to previously

calculated barriers with significantly longer sampling periods.3,4 We established that for DHFR, it

is not necessary to generate the full free energy profile to calculate the free energy barrier.

Additionally, generating only partial free energy profiles with fewer windows

significantly reduces the computational cost. Compared to the previously obtained free energy

barrier using 20 windows with significantly more sampling, the free energy barriers using 6

windows were similar for both the WHAM and the UI methods.5 The UI method was able to

generate comparable free energy barriers even with four windows.

5.3 Ranking Mutants of DHFR According to Catalytic Reaction Rates

A computationally efficient approach was presented for evaluating the impact of

mutation on enzyme-catalyzed reaction rates. This procedure requires the generation and

equilibration of the mutant structure, followed by the calculation of a partial free energy curve

using an EVB potential in conjunction with biased molecular dynamics simulations and the UI or

the WHAM method. No parameter fitting is involved in this procedure for the mutants. The

individual steps were automated and optimized for computational efficiency.

We used this approach to calculate the changes in the free energy barrier for hydride

transfer upon mutation of DHFR. The 15 mutants studied were chosen objectively based on a

general literature search for experimental measurements of the hydride transfer rate6 at pH 7.

Mutations were performed using rotamer libraries as implemented in the Jackal suite of programs.7,8

The resulting structures were subjected to a restrained minimization and equilibration procedure.

We observed that some mutants require longer equilibration periods.

The agreement between the calculated and experimental changes in the free energy

barrier upon mutation is encouraging. The computational approach predicts the correct direction

of the change in free energy barrier for all mutants, and the correlation coefficient between the

calculated and experimental data is 0.82. In the future, this approach will be used to predict the

impact of mutations that have not been studied experimentally yet. The feedback between

experiment and theory will guide the further refinement of the procedure. This general approach

for ranking protein designs according to the free energy barrier has implications for protein

engineering and drug design. Additionally, the method is mostly automated and can readily be

modified to be used for mutants of different enzymes or enzyme designs.

5.4 References

(1) Mebel, A. M.; Lin, M. C.; Morokuma, K.; Melius, C. F. Journal of Physical Chemistry 1995, 99, 6842.
(2) Nguyen, M.-T.; Jamka, A. J.; Cazar, R. A.; Tao, F.-M. The Journal of Chemical Physics 1997, 106, 8710.
(3) Wong, K. F.; Watney, J. B.; Hammes-Schiffer, S. Journal of Physical Chemistry B 2004, 108, 12231.
(4) Wang, Q.; Hammes-Schiffer, S. Journal of Chemical Physics 2006, 125, 184102.
(5) Chakravorty, D. K.; Kumarasiri, M.; Soudackov, A. V.; Hammes-Schiffer, S. Journal of Chemical Theory and Computation 2008, 4, 1974.
(6) Lee, J.; Benkovic, S. J. personal communication.
(7) Xiang, J. Z.; Honig, B. JACKAL: A Protein Structure Modeling Package; Columbia University & Howard Hughes Medical Institute: New York, 2002.
(8) Xiang, Z.; Honig, B. Journal of Molecular Biology 2001, 311, 421.

Appendix A

Technical Details of the Mutation Procedure

A.1 Introduction

The mutation procedure involves generating the GROMOS coordinate file, the

GROMOS topology and the GROMOS EVB files for the mutants. The GROMOS topology files

for the mutants are obtained by following the protocol given in Section A.2. The mutant EVB

files are edited manually, as this involves replacing only a few atom indices. Generating the

GROMOS coordinate file is the most time-consuming step and involves a significant amount of

manual manipulation of files (Section A.3). The mutant topology and EVB files require ≈10

minutes to generate, while the coordinate file requires ≈30 minutes. The example scenario

provided below is for mutating the wild-type enzyme to the G121L mutant.

A.2 Protocol for Creating Mutant Topology and EVB Files

In the EVB files for each mutant, atom indices must shift by a constant amount, which is

the difference in the total number of atoms between the wild-type and the mutant. For the

G121L mutant, the atom indices shift by +4, and they can easily be edited

manually. We use the GROMOS executable progmt.64 to create the topology file. Progmt.64

usage is explained with examples in the GROMOS manual. The topology for the amino acids in

the protein is created separately from the substrate and cofactor. Then the partial topologies are

merged to create the topology of the whole system. Therefore, only the amino acid part of the

topology needs to be regenerated for each mutant. The following steps are followed to create a

mutant topology file:

1. Edit the AANM directive of the control file for progmt.64 to reflect the mutation. This section lists the residues present in the enzyme, and only the identity of the mutated residue should be changed. The GROMOS program manual provides examples of all control file directives.

2. Execute progmt.64 to create the amino acid part of the topology file. This can be done using a script similar to Run_min1.sh given in Appendix B by changing the name of the executable in line 25.

3. Merge the amino acid topology with the substrate and cofactor topologies using progmt.64. Here the NMOL2 directive in the control file is kept at 1. NMOL2 specifies how many times a second molecular topology is merged with a previous topology.
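The constant index shift applied to the EVB files, described at the start of this section, amounts to the small calculation sketched below in Python. The layout of the GROMOS EVB files is not reproduced in this appendix, so the sketch operates on a plain list of integer atom indices; the function name, the example indices and the wild-type atom count of 1702 (inferred from the G121L count of 1706 in Makedir.sh and the stated shift of +4) are assumptions made for illustration.

```python
def shift_atom_indices(indices, n_atoms_wild_type, n_atoms_mutant):
    """Shift every atom index by the change in the total number of atoms."""
    delta = n_atoms_mutant - n_atoms_wild_type
    return [index + delta for index in indices]


if __name__ == "__main__":
    # Hypothetical EVB atom indices; only the shift of +4 comes from the text.
    print(shift_atom_indices([1703, 1750], n_atoms_wild_type=1702, n_atoms_mutant=1706))
```

For a mutation that removes atoms, the same function yields a negative shift.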

A.3 Generating Mutant Coordinates

As stated in Section 4.2, the profix utility in the Jackal suite of programs was used to

perform the mutation. We start with the wild-type pdb file containing the atomic coordinates and the

SEQRES cards that list the amino acid sequence. We remove the mutated residue from the

coordinates section of the pdb file and change the SEQRES entry of the mutated residue to reflect

the mutation. Then profix is executed with the -fix 1 option. Thus the command line input is

“profix -fix 1 <input_file_name> <output_file_name> >& <log_file_name>”. Relevant sections

of a sample input pdb file, output pdb file and the profix log file are given in Sections A.3.1,

A.3.2 and A.3.3, respectively. Once the coordinates are created using profix, they can be copied

and pasted into the GROMOS coordinate file.
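The manual preparation of the profix input file could in principle be scripted. The Python sketch below illustrates the two edits described above, changing the SEQRES entry and deleting the coordinates of the mutated residue. It is not part of the original automation scripts, and it assumes whitespace-delimited records laid out as in the sample listing of Section A.3.1 (no chain identifier, 13 residues per SEQRES record); a standard fixed-column pdb file would require column-based slicing instead. The function name is hypothetical.

```python
def prepare_profix_input(lines, position, new_residue, residues_per_record=13):
    """Return pdb lines with `position` renamed in SEQRES and removed from ATOM records.

    Assumes whitespace-delimited fields:
      SEQRES <record number> <total residues> <residue names...>
      ATOM   <serial> <atom name> <residue name> <residue number> <x> <y> <z>
    """
    edited = []
    for line in lines:
        fields = line.split()
        if fields and fields[0] == "SEQRES":
            first_in_record = (int(fields[1]) - 1) * residues_per_record + 1
            offset = position - first_in_record
            residues = fields[3:]
            if 0 <= offset < len(residues):
                residues[offset] = new_residue
            edited.append(" ".join(fields[:3] + residues))
        elif len(fields) >= 5 and fields[0] == "ATOM" and int(fields[4]) == position:
            continue  # drop the coordinates of the residue to be rebuilt by profix
        else:
            edited.append(line)
    return edited
```

For the G121L example, the call would be `prepare_profix_input(lines, 121, "LEU")`, after which profix is run on the edited file as described above.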

A.3.1 Sample Input Pdb File

In the input pdb file, residue 121 is changed to LEU in the SEQRES section and is

removed completely from the coordinates section. The mutated residue is underlined for

identification purposes.

TITLE DHFR_WT(1RX2)_TO_G121L
SEQRES  1 159 MET ILE SER LEU ILE ALA ALA LEU ALA VAL ASP ARG VAL
SEQRES  2 159 ILE GLY MET GLU ASN ALA MET PRO TRP ASN LEU PRO ALA
SEQRES  3 159 ASP LEU ALA TRP PHE LYS ARG ASN THR LEU ASP LYS PRO
SEQRES  4 159 VAL ILE MET GLY ARG HIS THR TRP GLU SER ILE GLY ARG
SEQRES  5 159 PRO LEU PRO GLY ARG LYS ASN ILE ILE LEU SER SER GLN
SEQRES  6 159 PRO GLY THR ASP ASP ARG VAL THR TRP VAL LYS SER VAL
SEQRES  7 159 ASP GLU ALA ILE ALA ALA CYS GLY ASP VAL PRO GLU ILE
SEQRES  8 159 MET VAL ILE GLY GLY GLY ARG VAL TYR GLU GLN PHE LEU
SEQRES  9 159 PRO LYS ALA GLN LYS LEU TYR LEU THR HIS ILE ASP ALA
SEQRES 10 159 GLU VAL GLU LEU ASP THR HIS PHE PRO ASP TYR GLU PRO
SEQRES 11 159 ASP ASP TRP GLU SER VAL PHE SER GLU PHE HIS ASP ALA
SEQRES 12 159 ASP ALA GLN ASN SER HIS SER TYR CYS PHE GLU ILE LEU
SEQRES 13 159 GLU ARG ARG
ATOM     1  H1  MET    1   42.160  31.995  28.982
ATOM     2  H2  MET    1   40.908  32.853  28.352
ATOM     3  N   MET    1   41.383  32.589  29.191
ATOM     4  H3  MET    1   41.772  33.407  29.615
ATOM     5  CA  MET    1   40.646  31.835  30.184
...
...
ATOM  1168  CA  GLU  120   12.372  31.835  47.872
ATOM  1169  CB  GLU  120   10.949  32.200  48.160
ATOM  1170  CG  GLU  120    9.978  31.844  47.054
ATOM  1171  CD  GLU  120    8.503  32.041  47.420
ATOM  1172  OE1 GLU  120    7.836  31.020  47.607
ATOM  1173  OE2 GLU  120    8.119  33.213  47.705
ATOM  1174  C   GLU  120   12.994  32.652  46.720
ATOM  1175  O   GLU  120   13.402  33.821  46.983
ATOM  1181  N   ASP  122   13.500  32.448  41.907
ATOM  1182  H   ASP  122   14.264  33.091  41.858
ATOM  1183  CA  ASP  122   13.011  31.934  40.579
ATOM  1184  CB  ASP  122   12.186  32.976  39.764
...
...

A.3.2 Sample Output Pdb File

In the output file, LEU is added at position 121. Only the coordinates of the united atoms

are given here.

...
...
ATOM  1859  CA  GLU  120   12.372  31.835  47.872
ATOM  1862  CB  GLU  120   10.966  32.277  48.133
ATOM  1863  CG  GLU  120    9.876  31.665  47.213
ATOM  1864  CD  GLU  120    9.199  32.690  46.305
ATOM  1865  OE1 GLU  120    8.566  33.616  46.877
ATOM  1866  OE2 GLU  120    9.295  32.561  45.058
ATOM  1860  C   GLU  120   13.044  32.586  46.703
ATOM  1861  O   GLU  120   13.548  33.721  46.947
ATOM  1873  N   LEU  121   13.004  31.991  45.511
ATOM  1881  HN  LEU  121   12.529  31.116  45.427
ATOM  1874  CA  LEU  121   13.625  32.567  44.322
ATOM  1877  CB  LEU  121   14.989  31.901  44.098
ATOM  1878  CG  LEU  121   16.180  32.459  44.906
ATOM  1879  CD1 LEU  121   16.283  33.991  44.827
ATOM  1880  CD2 LEU  121   16.067  32.012  46.370
ATOM  1875  C   LEU  121   12.767  32.346  43.077
ATOM  1876  O   LEU  121   11.672  31.771  43.160
ATOM  1892  N   ASP  122   13.497  32.416  41.920
ATOM  1900  HN  ASP  122   14.051  33.248  41.897
ATOM  1893  CA  ASP  122   13.011  31.934  40.579
ATOM  1896  CB  ASP  122   12.206  33.002  39.778
...
...

A.3.3 Sample Profix Log Entry

A portion of the output from profix is provided here. It identifies that a residue is missing

at position 121 and then adds it and prints out the conservation scores. Then it re-indexes the

system.

...
...
Warning...... the pdb file:1rx2_to_g121l has breaker at:E120 D122
...
...
conserve score 114 I---I:1
conserve score 115 D---D:1
conserve score 116 A---A:0.931592
conserve score 117 E---E:0.919129
conserve score 118 V---V:0.899655
conserve score 119 E---E:0.863992
conserve score 120 ----L:0.771263
conserve score 121 D---D:0.863992
conserve score 122 T---T:0.899655
conserve score 123 H---H:0.919129
conserve score 124 F---F:0.931592
conserve score 125 P---P:1
conserve score 126 D---D:1
conserve score 127 Y---Y:1
...
...
reindexing...
indexing from old to new... 1 -- 1
indexing from old to new... 2 -- 2
...
...
indexing from old to new... 119 -- 119
indexing from old to new... 120 -- 120
indexing from old to new........ -- 121
indexing from old to new... 122 -- 122
indexing from old to new... 123 -- 123
...
write down the final structure...1rx2_to_g121l_fix.pdb

Appendix B

Scripts for Automating Computer Job Submission

B.1 Introduction

Supervising MD simulations for a large number of mutants requires utilizing many

scripts and small programs. Some of the scripts used for submitting jobs and editing input files are

given here. Our strategy was to use a few master scripts to manage a large number of other

scripts (Sections B.2, B.3 and B.6). The master scripts are able to edit input files and other

scripts, submit computer jobs, monitor job status and collect results. The scripts provided here

have been formatted for printing purposes. Therefore, they may not reflect the best scripting

practices. Additionally, these scripts are open-source and are not bound by the copyright rules of

the rest of this thesis.

The scenario used in the scripts is for the G121L DHFR mutant. It is assumed that the

main job directory is /home/malika/DHFR/G121L/data_x2. Generating the G121L mutant free

energy profile starts with making the mutation as described in Appendix A. Then the necessary

directories are created by executing the Makedir.sh script. The Run_restr.sh script is used to perform the

restrained minimizations and equilibrations before starting the windows. This script

automatically calls each minimization and equilibration script. Submit_min1.sh is used to submit

the first restrained minimization to the computer job queue and is provided as an example. This

script executes the Run_min1.sh script, which calls the GROMOS executable to perform the

requested MD simulations. The windows are started by executing the Run_all.sh script. This script

automatically calls each Run_gromos.sh script, one per window. Each Run_gromos.sh script

executes a Run_lambda.sh script to perform MD simulations. At the end of the MD simulations of

each window, the necessary GROMOS files are copied into the analysis directory, which is

within the main directory of the mutant. A Fortran 90 program (Section B.9) is used to extract

$V_{11}(R)$ and $V_{22}(R)$ to perform WHAM or UI analysis. A compact description of each script is

given before the script. Scripts are also heavily annotated.

B.2 Makedir.sh

This script creates the directory structure for a mutant and copies and edits the
necessary input files. The variables that commonly require changing are defined at the
beginning of the script. Additionally, the list of window values in the "for LAM in" loop
can be changed to accommodate a different number of windows.

    #!/bin/sh

    # Makes the full folder structure for a mutant
    # and copies and edits the necessary files.
    # Copy this script to wherever you want the $DATADIR to be.

    NEWm='G121L'    # new mutant
    OLDm='D27C'     # old mutant
    NEWd='x2'       # directory name 2nd half
    OLDd='test'     # old directory name 2nd half
    NEWp='1706'     # new # of protein atoms
    OLDp='1701'     # old # of protein atoms
    NEWt='13997'    # new total atoms
    OLDt='13992'    # old total atoms

    SYS_BASE='1RX2_G121L'
    HOMEDIR='/home/malika/DHFR/G121L/data_'   # old folder name 1st half
    DATADIR='data_'                           # new folder name 1st half

    # Begin creating directories
    mkdir ${DATADIR}${NEWd}
    mkdir ${DATADIR}${NEWd}/restrain_jobs
    mkdir ${DATADIR}${NEWd}/analysis
    mkdir ${DATADIR}${NEWd}/tempfiles
    mkdir ${DATADIR}${NEWd}/inputs

    # Copy input files and scripts
    cp ${HOMEDIR}${OLDd}/inputs/* ${DATADIR}${NEWd}/inputs/
    cp ${HOMEDIR}${OLDd}/*.sh ${DATADIR}${NEWd}/

    sed 's/'$OLDm'/'$NEWm'/' < ${HOMEDIR}${OLDd}/analysis/untar.sh \
        > ${DATADIR}${NEWd}/analysis/untar.sh
    sed 's/'$OLDm'/'$NEWm'/' < ${HOMEDIR}${OLDd}/analysis/link.sh \
        > ${DATADIR}${NEWd}/analysis/link.sh

    # Work on windows
    # File names are changed on the fly
    for LAM in 0.050 0.125 0.250 0.375 0.500 0.625
    do
        mkdir ${DATADIR}${NEWd}/l${LAM}
        mkdir ${DATADIR}${NEWd}/l${LAM}/${SYS_BASE}_l${LAM}_init_md
        mkdir ${DATADIR}${NEWd}/l${LAM}/${SYS_BASE}_l${LAM}_cont_md
        sed -e 's/'$OLDm'/'$NEWm'/' -e 's/'$OLDd'/'$NEWd'/' \
            < ${HOMEDIR}${OLDd}/${LAM}_run_LAM.sh \
            > ${DATADIR}${NEWd}/${LAM}_run_LAM.sh
    done

    # Work on scripts
    # File names are changed on the fly
    for WORD in eqm min1 min2 min3 min4 posrest1 posrest2 posrest3 posrest4
    do
        sed -e 's/'$OLDm'/'$NEWm'/' -e 's/'$OLDd'/'$NEWd'/' \
            < ${HOMEDIR}${OLDd}/${WORD}_run_LAM.sh \
            > ${DATADIR}${NEWd}/${WORD}_run_LAM.sh
    done

    # Work on any remaining script
    # File names are changed on the fly
    for WORD in run_all initialize
    do
        sed -e 's/'$OLDm'/'$NEWm'/' -e 's/'$OLDd'/'$NEWd'/' \
            < ${HOMEDIR}${OLDd}/${WORD}.sh > ${DATADIR}${NEWd}/${WORD}.sh
    done

    # Now work on control files list
    # File names are changed on the fly
    # Atom numbers are changed on the fly
    # The control files were copied into inputs/ above, so work there
    cd ${DATADIR}${NEWd}/inputs
    for WORD in eqm init init0 cont min1 min2 min3 min4 posrest1 posrest2 posrest3 posrest4
    do
        sed -e 's/'$OLDp'/'$NEWp'/' -e 's/'$OLDt'/'$NEWt'/' \
            < ${WORD}_1RX2_${OLDm}_control.dat \
            > ${WORD}_1RX2_${NEWm}_control.dat
        rm ${WORD}_1RX2_${OLDm}_control.dat
    done

    exit 0
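The sed commands in Makedir.sh use the s command without the g flag, so only the first occurrence of the old mutant name on each line is replaced. The following minimal sketch illustrates the difference. The helper name retarget and the template line are made up for this illustration and do not come from the thesis scripts.

```shell
#!/bin/sh
# Hypothetical helper (not part of the thesis scripts): rewrite a template
# for a new mutant, replacing every occurrence of the old name on each line.
retarget() {
    # $1 = old mutant name, $2 = new mutant name; reads stdin, writes stdout
    sed "s/$1/$2/g"
}

# A made-up template line that mentions the mutant twice.
line="SYS_BASE='1RX2_D27C'; ./0.050_run_gromos.sh D27C_0.050"

# Without the g flag only the first match on the line is changed.
first_only=$(printf '%s\n' "$line" | sed 's/D27C/G121L/')
# With the g flag both matches are changed.
all=$(printf '%s\n' "$line" | retarget D27C G121L)

echo "$first_only"
echo "$all"
```

If every line of a template names the mutant at most once, the two forms give the same result, which may be why the original scripts omit the flag.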

B.3 Run_restr.sh

This script is similar to Run_all.sh. It submits each restrained minimization and

equilibration job successively. It calls scripts similar to Run_lambda.sh for each step of

minimization or equilibration.

    #!/bin/bash

    # Perform the restrained minimization/MD procedure
    # (bash is required because the script uses arrays)
    SYS_BASE='1RX2_G121L'
    HOMEDIR='/home/malika/DHFR/G121L/data_x2'   # job submitting folder
    LAM=( min1 posrest1 min2 posrest2 min3 posrest3 min4 posrest4 )
    i=0        # index for array LAM
    time=180   # wait time in seconds

    ./${LAM[$i]}_run_gromos.sh G121L_${LAM[$i]} &
    echo "First job submitted: ${LAM[$i]}"
    sleep $time

    # Loop until the output of the last step has been detected
    while [ "$i" -lt "${#LAM[@]}" ]
    do
        if test -e ${HOMEDIR}/inputs/${SYS_BASE}_${LAM[$i]}.xyz
        then
            echo "====> ${SYS_BASE}_${LAM[$i]}.xyz detected!"
            i=`expr $i + 1`
            # Submit the next step, unless the last step has just finished
            if [ "$i" -lt "${#LAM[@]}" ]
            then
                echo "----> LAM is incremented to ${LAM[$i]}; next job submitted..."
                ./${LAM[$i]}_run_gromos.sh G121L_${LAM[$i]} &
            fi
        else
            echo "${SYS_BASE}_${LAM[$i]}.xyz is still not there... Waiting..."
            sleep $time
        fi
    done
    echo "All jobs submitted..."

    exit 0
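The submit-then-wait pattern used by Run_restr.sh can also be written without arrays, so that it runs under plain POSIX sh and cannot index past the last step. The following is a minimal sketch under stated assumptions: submit_step and marker_for are hypothetical stand-ins for the real per-step submission script and for the coordinate file written to the inputs directory, and the polling interval is shortened to one second for illustration.

```shell
#!/bin/sh
# Sketch of the submit-then-wait pattern for plain POSIX sh.
# submit_step and marker_for are hypothetical stand-ins for the real
# "<step>_run_gromos.sh" call and the inputs/<SYS_BASE>_<step>.xyz file.
WORKDIR=$(mktemp -d)
POLL=1          # seconds between checks (the thesis scripts use 180)
SUBMITTED=""

marker_for() { echo "$WORKDIR/$1.xyz"; }

submit_step() {
    SUBMITTED="$SUBMITTED $1"
    # Stand-in for a queued job: create the marker file a little later.
    ( sleep 1; : > "$(marker_for "$1")" ) &
}

run_chain() {
    for step in "$@"; do
        submit_step "$step"
        # Block until this step's coordinate file exists before moving on.
        while [ ! -e "$(marker_for "$step")" ]; do
            sleep "$POLL"
        done
    done
}

run_chain min1 posrest1 min2
echo "submitted:$SUBMITTED"
rm -rf "$WORKDIR"
```

Because the for loop walks the argument list directly, there is no index to keep in step with the file being waited for.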

B.4 Submit_min1.sh

This script submits a restrained minimization job to the computer job queue. It calls

Run_min1.sh to run the desired GROMOS job.

    #!/bin/sh

    # Script to autosubmit min1
    # Usage: ./<script_name> <job_name>
    HERE=`pwd`
    JOBNAME=$1_$$
    JOBDIR=$1_$$

    # Name of the executable
    EXEC=/scratch/${USER}/${JOBDIR}/Run_min1.sh

    # Create the PBS script
    cat << EnD > $JOBNAME.pbs
    #PBS -S /bin/sh
    #PBS -N $JOBNAME
    #PBS -q batch
    #PBS -l walltime=1:00:00
    #PBS -l ncpus=1
    #PBS -j oe
    mkdir /scratch/${USER}/${JOBDIR}
    cd $HERE/restrain_jobs
    cp ../*.sh /scratch/${USER}/${JOBDIR}
    cd /scratch/${USER}/${JOBDIR}
    ${EXEC}
    tar -czvf ${JOBNAME}_job.tar.gz *
    mkdir ${HERE}/restrain_jobs/${JOBNAME}
    cp ${JOBNAME}_job.tar.gz ${HERE}/restrain_jobs/${JOBNAME}
    cd ${HERE}/restrain_jobs/${JOBNAME}
    tar -xzvf ${JOBNAME}_job.tar.gz
    rm ${JOBNAME}_job.tar.gz
    rm -rf /scratch/${USER}/${JOBDIR}
    EnD

    # Now submit the pbs job to the queue
    qsub ${HERE}/${JOBNAME}.pbs

    exit 0
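Submit_min1.sh writes the PBS file through a here-document with an unquoted delimiter, so every variable in it is expanded when the file is generated, not when the job runs. The short sketch below shows this behaviour and how a backslash defers expansion to run time. The job name and the file name demo.pbs are illustrative only.

```shell
#!/bin/sh
# Sketch: which variables are fixed when the job file is generated.
# JOBNAME and the file name demo.pbs are illustrative only.
TMP=$(mktemp -d)
JOBNAME="G121L_min1_demo"

# Unquoted delimiter: $JOBNAME is expanded now, \$PWD is kept for later.
cat << EnD > "$TMP/demo.pbs"
#PBS -N $JOBNAME
echo "generated for $JOBNAME"
echo "running in \$PWD"
EnD

generated=$(cat "$TMP/demo.pbs")
echo "$generated"
rm -rf "$TMP"
```

In the generated file the job name appears literally, while the last line still contains the text $PWD and is evaluated only when the job script itself runs.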

B.5 Run_min1.sh

This script executes the first GROMOS minimization job. The script creates necessary

input and output links for the GROMOS executable.

    #!/bin/sh

    # Run the requested GROMOS jobs (min1)

    SYS_BASE='1RX2_G121L'
    XDIR='/home/malika/grforce/'
    LAM='freeze_test'   # current LAM
    HOMEDIR='/home/malika/DHFR/G121L/data_xx'

    IUNIT=${HOMEDIR}/inputs/min1_${SYS_BASE}_control.dat
    OUNIT=${SYS_BASE}_${LAM}.out

    # Create input links for GROMOS executable
    ln -s ${HOMEDIR}/inputs/${SYS_BASE}-mt.dat fort.20
    ln -s ${HOMEDIR}/inputs/${SYS_BASE}_init.xyz fort.21
    ln -s ${HOMEDIR}/inputs/${SYS_BASE}_restrain.xyz fort.22
    ln -s ${HOMEDIR}/inputs/${SYS_BASE}_atom_seq.xyz fort.23

    # Output links
    ln -s ${SYS_BASE}_${LAM}.xyz fort.11
    ln -s ${SYS_BASE}_${LAM}.trj fort.12
    ln -s ${SYS_BASE}_${LAM}.nrg fort.15

    # Execute GROMOS
    $XDIR/bin/jw_epromd.64 < $IUNIT > $OUNIT

    # Copy coordinates
    # Edit files for next step: posrest1
    rm -f fort.*
    cp ${SYS_BASE}_${LAM}.xyz ${HOMEDIR}/inputs/${SYS_BASE}_min1.xyz
    sed 's/POSITION/REFPOSITION/' < ${SYS_BASE}_${LAM}.xyz \
        > ${HOMEDIR}/inputs/${SYS_BASE}_restrain1.xyz
    sed 's/POSITION/POSRESSPEC/' < ${SYS_BASE}_${LAM}.xyz \
        > ${HOMEDIR}/inputs/${SYS_BASE}_atom_seq1.xyz

    # Archive output files
    tar -czvf ${SYS_BASE}_${LAM}.trj.tar.gz ${SYS_BASE}_${LAM}.trj
    tar -czvf ${SYS_BASE}_${LAM}.nrg.tar.gz ${SYS_BASE}_${LAM}.nrg
    rm -f *.trj *.nrg

    exit 0
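The output links in Run_min1.sh point to files that do not exist when the links are created. The sketch below shows why this works, assuming the executable writes its units through files named fort.N in the working directory, as the ln commands in the script suggest. Writing through a dangling symbolic link creates the target file, and removing the link afterwards leaves the data in place. The file names are illustrative.

```shell
#!/bin/sh
# Sketch: an "output link" is a symlink to a file that does not exist yet.
# File names here are illustrative.
TMP=$(mktemp -d)
cd "$TMP" || exit 1

ln -s demo_min1.xyz fort.11        # dangling link: target not yet created
before=$( [ -e demo_min1.xyz ] && echo yes || echo no )

echo "POSITION" > fort.11          # stand-in for the program writing unit 11
after=$(cat demo_min1.xyz)         # the data landed in the link target

rm -f fort.*                       # removing the link keeps the data file
kept=$( [ -f demo_min1.xyz ] && echo yes || echo no )

echo "$before $after $kept"
cd / && rm -rf "$TMP"
```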

B.6 Run_all.sh

This script submits each window once the previous window has completed 20 ps of MD,
which it detects by checking for the GROMOS coordinate file that the previous window
writes at the end of those 20 ps.

    #!/bin/bash

    # Script to auto submit windows
    # (bash is required because the script uses arrays)

    # Define variables
    SYS_BASE='1RX2_G121L'                         # file identifier
    HOMEDIR='/home/malika/DHFR/G121L/data_x2'     # start folder
    LAM=( 0.050 0.125 0.250 0.375 0.500 0.625 )   # window identifiers
    i=0
    time=300                                      # sleep time in sec.

    # Start first window
    ./${LAM[$i]}_run_gromos.sh G121L_${LAM[$i]} &
    echo "First job submitted, LAMBDA: ${LAM[$i]}"
    sleep $time

    # Start checking for output files of the previous job
    # Then start the following window
    while [ "$i" -lt "${#LAM[@]}" ]
    do
        if test -e ${HOMEDIR}/tempfiles/${SYS_BASE}_l${LAM[$i]}_init_md.xyz
        then
            echo "====> ${SYS_BASE}_l${LAM[$i]}_init_md.xyz detected!"
            i=`expr $i + 1`
            # Submit the next window, unless we are past the last window
            if [ "$i" -lt "${#LAM[@]}" ]
            then
                echo "      LAMBDA is incremented to ${LAM[$i]}"
                ./${LAM[$i]}_run_gromos.sh G121L_${LAM[$i]} &
                echo "----> LAMBDA ${LAM[$i]} submitted."
            fi
        else
            echo "Waiting for ${SYS_BASE}_l${LAM[$i]}_init_md.xyz ..."
            sleep $time
        fi
    done
    echo "All jobs submitted..."

    exit 0

B.7 Run_gromos.sh

This script submits the computer job to the job queue of the server. It is called
within Run_all.sh, although it can also be executed directly. The actual file name of the
script begins with the window designation. The jobs are run in the /scratch/malika
directory, and the results are then archived and copied back to the starting directory.
Each window has one Run_gromos.sh script.

    #!/bin/sh

    # Submits a job to the batch PBS queue on shscluster2
    # Usage: ./<script_name> <job_name>
    HERE=`pwd`
    JOBNAME=$1_$$
    JOBDIR=$1_$$

    # Name of the executable (must match the script copied to scratch below)
    EXEC=/scratch/${USER}/${JOBDIR}/0.125_run_LAM.sh

    # Create the PBS script
    cat << EnD > $JOBNAME.pbs
    #PBS -S /bin/sh
    #PBS -N $JOBNAME
    #PBS -q batch
    #PBS -l walltime=180:00:00
    #PBS -l ncpus=1
    #PBS -l nodes=1:big
    #PBS -j oe
    mkdir /scratch/${USER}/${JOBDIR}
    cd $HERE/l0.125
    cp -r *md /scratch/${USER}/${JOBDIR}
    cp ../0.125_run_LAM.sh /scratch/${USER}/${JOBDIR}
    cd /scratch/${USER}/${JOBDIR}
    ${EXEC}
    tar -czvf ${JOBNAME}_job.tar.gz *
    mkdir ${HERE}/l0.125/${JOBNAME}
    cp ${JOBNAME}_job.tar.gz ${HERE}/l0.125/${JOBNAME}
    cd ${HERE}/l0.125/${JOBNAME}
    tar -xzvf ${JOBNAME}_job.tar.gz
    rm ${JOBNAME}_job.tar.gz
    rm -rf /scratch/${USER}/${JOBDIR}
    EnD

    # Now submit the pbs job to the queue
    qsub ${HERE}/${JOBNAME}.pbs

    exit 0
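Run_gromos.sh hard-codes the window value (0.125) in several places, which is why one edited copy is kept per window. Since the file name of each copy begins with the window designation, the value could instead be recovered from the file name. The sketch below illustrates this possible alternative. The helper window_of and the example path are hypothetical and are not part of the thesis scripts.

```shell
#!/bin/sh
# Sketch: recover the window designation from a script's file name, so that
# one script body could serve every window. window_of is a hypothetical
# helper, not part of the thesis scripts.
window_of() {
    name=${1##*/}          # strip any leading directory
    echo "${name%%_*}"     # keep everything before the first underscore
}

# Inside a real script the argument would be "$0"; a fixed example path is
# used here so that the sketch is self-contained.
WIN=$(window_of "/home/user/data_x2/0.125_run_gromos.sh")
echo "window: $WIN  directory: l$WIN"
```

The same helper applied to a name such as min1_run_gromos.sh returns min1, so it would also cover the restrained minimization and equilibration steps.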

B.8 Run_lambda.sh

This script performs the actual GROMOS calculation for one window. It performs two
jobs: the first runs the initial 20 ps in order to generate the coordinate file that the
Run_all.sh script waits for, and the second then continues the job for 350 ps. Each window
has one Run_lambda.sh script.

    #!/bin/sh

    # Initialize and run MD for one LAMBDA

    SYS_BASE='1RX2_G121L'                       # file identifier
    XDIR='/home/malika/grforce/'                # executable directory
    LAM=l0.125                                  # current LAMBDA
    OLDLAM=l0.050                               # previous LAMBDA
    HOMEDIR='/home/malika/DHFR/G121L/data_x2'   # where job files are

    # First run 20 ps
    cd ${SYS_BASE}_${LAM}_init_md/
    IUNIT=${HOMEDIR}/inputs/init_${SYS_BASE}_control.dat
    OUNIT=${SYS_BASE}_${LAM}_init_md.out

    # Create input links for GROMOS executable
    ln -s ${HOMEDIR}/inputs/${SYS_BASE}-mt-pert-evb.dat fort.20
    ln -s ${HOMEDIR}/tempfiles/${SYS_BASE}_${OLDLAM}_init.xyz fort.21
    ln -s ${HOMEDIR}/inputs/${SYS_BASE}-pert-evb.dat fort.30
    ln -s ${HOMEDIR}/inputs/${SYS_BASE}-evb-${LAM}.dat fort.50

    # Output links
    ln -s ${SYS_BASE}_${LAM}_init_md.xyz fort.11
    ln -s ${SYS_BASE}_${LAM}_init_md.trj fort.12
    ln -s ${SYS_BASE}_${LAM}_init_md.nrg fort.15
    ln -s ${SYS_BASE}_${LAM}_init_md.dlm fort.16

    # Execute program
    $XDIR/bin/jw_epromd.64 < $IUNIT > $OUNIT

    # Copy .xyz to tempfiles to trigger Run_all.sh
    cp ${SYS_BASE}_${LAM}_init_md.xyz ${HOMEDIR}/tempfiles/
    rm -f fort.*
    rm -f *.trj *.nrg *.dlm

    # Now continue running 330 ps
    cd ../${SYS_BASE}_${LAM}_cont_md/
    IUNIT=${HOMEDIR}/inputs/cont_${SYS_BASE}_control.dat
    OUNIT=${SYS_BASE}_${LAM}_cont_md.out

    # Create input and output links for this job
    ln -s ${HOMEDIR}/inputs/${SYS_BASE}-mt-pert-evb.dat fort.20
    ln -s ${HOMEDIR}/tempfiles/${SYS_BASE}_${LAM}_init_md.xyz fort.21
    ln -s ${HOMEDIR}/inputs/${SYS_BASE}-pert-evb.dat fort.30
    ln -s ${HOMEDIR}/inputs/${SYS_BASE}-evb-${LAM}.dat fort.50
    ln -s ${SYS_BASE}_${LAM}_cont_md.xyz fort.11
    ln -s ${SYS_BASE}_${LAM}_cont_md.trj fort.12
    ln -s ${SYS_BASE}_${LAM}_cont_md.nrg fort.15
    ln -s ${SYS_BASE}_${LAM}_cont_md.dlm fort.16

    $XDIR/bin/jw_epromd.64 < $IUNIT > $OUNIT
    rm -f fort.*

    # Archive output files
    # Copy .nrg file to analysis directory
    tar -czvf ${SYS_BASE}_${LAM}_cont_md.trj.tar.gz *.trj
    tar -czvf ${SYS_BASE}_${LAM}_cont_md.nrg.tar.gz *.nrg
    tar -czvf ${SYS_BASE}_${LAM}_cont_md.dlm.tar.gz *.dlm
    cp ${SYS_BASE}_${LAM}_cont_md.nrg.tar.gz ${HOMEDIR}/analysis/
    rm -f *.trj *.nrg *.dlm

    exit 0

B.9 ExtractV.f90

This Fortran 90 program extracts $V_{11}(R)$ and $V_{22}(R)$ from the GROMOS .nrg output
file. It creates two separate output files, one to be used for the WHAM analysis, and the
other for the UI analysis. The data extraction is based on searching for the EVBMAT
keyword in the GROMOS file.

    program extractv

      ! Extracts V11 and V22 from a GROMOS .nrg file by searching for
      ! the EVBMAT keyword.
      ! Input is a GROMOS .nrg file.
      ! WHAM output is in kJ/mol and UI output is in kcal/mol.
      ! Malika - 4/5/2008

      implicit none

      character(len=80) :: rest
      character(len=80) :: filename

      integer :: temp
      real (kind=8) :: v11, v22, dummy

      ! Read input file name
      write(*,'(a)',advance='no') "Enter the name of Gromos NRG file: "
      read(*,*) filename

      ! Open input and output files
      open(1,file=trim(filename))
      open(2,file=trim(filename)//".wham")
      open(3,file=trim(filename)//".ui")

      temp = 0

      do
        read(1,'(a80)',end=99) rest

        if (trim(rest).eq."# EVBMAT") then
          temp = temp + 1
          read(1,*) v11, dummy
          read(1,*) dummy, v22
          write(2,*) v11, v22
          write(3,*) v11/4.184, v22/4.184, (v11-v22)/4.184
        endif

      enddo

      ! End of file reached: report how many EVBMAT blocks were found
      99 continue
      write(*,'("Number of EVBMATs extracted: ",I6)') temp

      ! Close opened files
      close(1)
      close(2)
      close(3)

    end program extractv
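The extraction can be cross-checked quickly from the shell with awk. The sketch below assumes the block layout implied by the read statements in ExtractV.f90: a line containing "# EVBMAT" followed by two rows of two numbers, with V11 first on the first row and V22 second on the second row. The function name extract_ui and the sample data are made up for this illustration. It prints the three UI columns in kcal/mol.

```shell
#!/bin/sh
# Cross-check sketch: the same extraction as ExtractV.f90, done with awk.
# The input layout is assumed from the Fortran read statements.
extract_ui() {
    awk '
        $0 ~ /^# EVBMAT[ \t]*$/ {
            getline; v11 = $1      # first row: V11, then an ignored entry
            getline; v22 = $2      # second row: ignored entry, then V22
            # Convert kJ/mol to kcal/mol, as in the .ui output
            printf "%.4f %.4f %.4f\n", v11/4.184, v22/4.184, (v11-v22)/4.184
        }'
}

# Made-up sample with one EVBMAT block and one unrelated block.
sample='# EVBMAT
-418.4 10.0
10.0 -209.2
# OTHER
1 2'

result=$(printf '%s\n' "$sample" | extract_ui)
echo "$result"
```

For the sample block this gives -100.0000, -50.0000 and -50.0000 kcal/mol for V11, V22 and their difference.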

VITA

Malika D. Kumarasiri

Malika D. Kumarasiri was born in Colombo, Sri Lanka to Padmini Kumarasiri and

Pettanayake Kumarasiri. In December 2001, he graduated from University of Colombo, Sri

Lanka with a Bachelor of Science Honors degree in chemistry. He married Vindhya Panagoda in

2003. He then joined the research group of Prof. Sharon Hammes-Schiffer at the Pennsylvania

State University in University Park, Pennsylvania to pursue graduate studies in chemistry. In

December 2008, he received a Doctorate of Philosophy in chemistry for investigating anharmonic

effects of small clusters and ranking enzymatic mutants efficiently, according to their catalytic

reaction rates. His publications include:

M. Kumarasiri, G.A. Baker, A.V. Soudackov, and S. Hammes-Schiffer, “Ranking

Mutants of Dihydrofolate Reductase According to the Hydride Transfer Rates,” Journal

of Physical Chemistry B, submitted.

D.K. Chakravorty, M. Kumarasiri, A.V. Soudackov, and S. Hammes-Schiffer,

“Implementation of Umbrella Integration within the Framework of the Empirical Valence

Bond Approach,” Journal of Chemical Theory and Computation, 2008, 4, 1974 – 1980.

M. Kumarasiri, C. Swalina, and S. Hammes-Schiffer, “Anharmonic Effects in

Ammonium Nitrate and Hydroxylammonium Nitrate Clusters,” Journal of Physical

Chemistry B, 2007, 111, 4653 – 4658.
