The Pennsylvania State University
The Graduate School
Department of Chemistry
ANHARMONIC EFFECTS OF SMALL CLUSTERS OF MOLECULES
AND RANKING ACTIVITY OF PROTEIN MUTANTS
A Dissertation in
Chemistry
by
Malika D. Kumarasiri
© 2009 Malika D. Kumarasiri
Submitted in Partial Fulfillment of the Requirements
for the Degree of
Doctor of Philosophy
May 2009
The dissertation of Malika D. Kumarasiri was reviewed and approved* by the following:
Sharon Hammes-Schiffer, Eberly Professor of Biotechnology, Professor of Chemistry, Dissertation Advisor, Chair of Committee
A. Welford Castleman, Jr., Evan Pugh Professor of Chemistry and Physics, Eberly Distinguished Chair in Science
Philip C. Bevilacqua, Professor of Chemistry
James D. Kubicki, Associate Professor of Geochemistry
Ayusman Sen, Professor of Chemistry, Head of the Department of Chemistry
*Signatures are on file in the Graduate School
ABSTRACT
This thesis is presented in two parts. In part 1, anharmonic effects of small molecules are
investigated using theoretical methods. In part 2, mutants of enzymes are ranked according to
their activation energy barriers.
Anharmonic effects are required to describe vital processes such as bond breaking or
bond forming, and they significantly affect properties such as geometries and vibrational
frequencies. Despite their importance, anharmonic effects are typically overlooked due to the
high computational cost associated with calculating them. In part 1 of this thesis, anharmonic
effects of small clusters of ammonium nitrate and hydroxylammonium nitrate are investigated.
We compare the structures and vibrational modes against their harmonic counterparts using a
vibrational perturbation theory approach within the density functional theory framework.
Anharmonic effects significantly alter the structures and vibrational frequencies of ammonium
nitrate and hydroxylammonium nitrate clusters.
In part 2 of the thesis, we implement an efficient procedure to rank many mutants of
enzymes or protein designs according to the free energy barrier of the catalyzed reaction.
Escherichia coli dihydrofolate reductase (DHFR) and its mutants are used in this study, and the
mutant structures are generated based on the wild-type enzyme structure. Different methods are
investigated to calculate the free energy barrier of hydride transfer in DHFR. The hydride transfer
reaction is investigated using empirical valence bond molecular dynamics simulations followed
by a weighted histogram analysis or umbrella integration to generate the free energy distribution
along the reaction coordinate. Fifteen single mutants of DHFR are used in this study. Our results
indicate a promising correlation between experimentally determined reaction rates and calculated
free energy barriers. The procedures are mostly automated and can easily be adapted for other
enzymatic mutants or designs.
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGEMENTS
Chapter 1 Introduction
1.1 General Introduction
1.2 References
Chapter 2 Anharmonic Effects in Ammonium Nitrate and Hydroxylammonium Nitrate Clusters
2.1 Introduction
2.2 Methods
2.3 Results
2.4 Conclusions
2.5 References
Chapter 3 Simulation Methods for Hydride Transfer in Dihydrofolate Reductase
3.1 Introduction
3.2 EVB Molecular Dynamics
3.3 WHAM and UI
3.4 Application to DHFR
3.5 Conclusions
3.6 References
Chapter 4 Ranking Mutants of Dihydrofolate Reductase According to the Hydride Transfer Rates
4.1 Introduction
4.2 Methods
4.3 Results
4.4 Conclusions
4.5 References
Chapter 5 Conclusions
5.1 Anharmonic Effects in Small Clusters
5.2 Simulation of Hydride Transfer in Dihydrofolate Reductase
5.3 Ranking Mutants of Dihydrofolate Reductase
5.4 References
Appendix A Technical Details of the Mutation Procedure
A.1 Introduction
A.2 Protocol for Creating Mutant Topology Files
A.3 Generating Mutant Coordinates
A.3.1 Sample Input Pdb File
A.3.2 Sample Output Pdb File
A.3.3 Sample Profix Long Entry
Appendix B Scripts for Automating Computer Job Submission
B.1 Introduction
B.2 Makedir.sh
B.3 Run_restr.sh
B.4 Submit_min1.sh
B.5 Run_min1.sh
B.6 Run_all.sh
B.7 Run_gromos.sh
B.8 Run_lambda.sh
B.9 ExtractV.f90
LIST OF FIGURES
Figure 1.1: The two-step reaction in DHFR. Proton transfer at N5 position of DHF is thought to precede hydride transfer to C6 position.
Figure 2.1: Optimized geometries of monomers and dimers of AN and HAN. Hydrogen bonds are indicated by dashed lines. The hydrogen bonding distances are given in Angstroms for the vibrationally averaged and equilibrium geometries, where the equilibrium distances are given in parentheses. (a) Covalent monomer of AN with Cs symmetry. (b) Covalent monomer of HAN with Cs symmetry. (c) Ionic dimer of (AN)2 with C2h symmetry. (d) Ionic dimer of (HAN)2 with C2 symmetry. This figure was created using GaussView.27
Figure 3.1: Free energy profiles of E. coli wild-type DHFR using WHAM (blue) and UI (red). (a) Full free energy curve using 19 windows. (b) Partial free energy curve using 6 windows.
Figure 3.2: Free energy profiles of wild-type DHFR using UI method. Full free energy curve using 19 windows is in blue, and the partial free energy curve using 6 windows is in red.
Figure 3.3: Original reactant state snapshot. Periodic images manifest to make the protein structure appear broken. This figure was created using VMD.37
Figure 3.4: Resolvated structure with no periodic images. This figure was created using VMD.37
Figure 3.5: Summary of steps of the restrained minimization and MD simulation procedure. fc is the restraining force constant which is halved with each cycle i.
Figure 3.6: Partial free energy profile of E. coli wild-type DHFR with WHAM (blue) and UI (red) using the sewed structure and 6 windows.
Figure 4.1: Hydride transfer reaction from the NADPH cofactor to the protonated dihydrofolate substrate H3F+ to form the products tetrahydrofolate H4F and NADP+. Figure reproduced with permission from Ref. 8.
Figure 4.2: Summary of the steps for the generation of the initial mutant structure, equilibration, and calculation of the free energy barrier. Here fc is the force constant of the position restraints with respect to the initial structure during the restrained minimizations and molecular dynamics simulations.
Figure 4.3: Depiction of the mutation sites of DHFR. The cofactor is green, the substrate is magenta, and the mutated residues are red. This figure was created using VMD.49
Figure 4.4: Correlation plot for the calculated and experimental changes in the free energy barrier for the 15 mutants, where the calculated free energy barriers were obtained using UI with 6 windows. The correlation coefficient is R = 0.82.
LIST OF TABLES
Table 2.1: Harmonic and VPT2 frequencies in cm$^{-1}$ of vibrational modes in the isolated neutral species. The experimental frequencies for NH3 are from Ref. 30. The experimental frequencies for HNO3 are from Refs. 31-34. The experimental frequencies for HONH2 are from Ref. 35.
Table 2.2: Harmonic and VPT2 frequencies in cm$^{-1}$ of vibrational modes in the isolated ionic species. Note that VPT2 method is not applicable to $\mathrm{NH_4^+}$ because it behaves as a spherical top.
Table 2.3: Harmonic and VPT2 frequencies in cm$^{-1}$ of vibrational modes in the covalent monomers (i.e. one ion pair) AN and HAN.
Table 2.4: Harmonic and VPT2 frequencies in cm$^{-1}$ of vibrational modes in the ionic dimer (AN)2.
Table 2.5: Harmonic and VPT2 frequencies in cm$^{-1}$ of vibrational modes in the ionic dimer (HAN)2.
Table 2.6: Nuclear magnetic shielding constants for AN, HAN, (AN)2 and (HAN)2. All shielding constants are given in ppm. $\sigma_{eq}$ and $\sigma_{vib}$ are the shieldings at the equilibrium and the vibrationally averaged geometries, respectively. For reference, the nuclear magnetic shielding constant for H in TMS calculated at this level of theory is 31.9702.
Table 4.1: The experimentally determined hydride transfer rate constants for E. coli DHFR mutants at pH ≈7 and 300 K. These rate constants were measured at pH 7 for wild-type DHFR and all mutants except D27E and D27C, which were measured at pH 7.3.
Table 4.2: The change in the free energy barrier relative to the wild-type free energy barrier for a series of mutants for different equilibration periods. The experimental free energy barriers are obtained from the transition state theory rate constant expression $k = \frac{k_B T}{h} \exp\left(-\Delta G^{\dagger} / k_B T\right)$ using the experimentally determined rate constants in Table 4.1. The calculated free energy barriers are obtained using WHAM with six windows, UI with six windows, and UI with four windows. The notation “WHAM,350” denotes WHAM with 350 ps of MD on the EVB reactant surface in the equilibration procedure. “WHAM,650” and “WHAM,850” are defined accordingly. UI4 uses $\lambda_m$ = 0.050, 0.250, 0.500 and 0.625 and UI4’ uses $\lambda_m$ = 0.050, 0.125, 0.500 and 0.625. All free energies are given in kcal/mol. The equilibration period is given in ps.
ACKNOWLEDGEMENTS
I thank my advisor, Dr. Sharon Hammes-Schiffer, for all the guidance and support I
received in my research and professional development. I am also grateful for all the support
and discussions I had with Dr. Alexander Soudackov. Their advice made my graduate years a
very rewarding experience. I would also like to acknowledge the help from past and present
members of the Hammes-Schiffer group, especially Dr. M. Pak, Dr. C. Swalina, Dr. K.F. Wong,
Dr. Q. Wang, Dr. A. Chackraborty, Dr. J. Watney, Dr. Y. Small, Dr. J.H. Skone, Dr. A. Hazra,
D.K. Chakravorty, M. Ludlow, S. Edwards, N. Veeraraghavan, and G. Baker.
I would like to thank my doctoral committee members Dr. Will Castleman, Dr. Phil
Bevilacqua and Dr. James Kubicki for the interest and effort put towards my work. I also
acknowledge the financial support from the granting agencies: Air Force Office of Scientific
Research (grant no. FA9550-04-1-0062), National Institutes of Health (grant no. GM56207) and
the Defense Advanced Research Projects Agency (Protein Design Processes project).
Finally, I would like to acknowledge my parents for their guidance in my education and
development. I am also indebted to my wife, Vindhya. Without her standing beside me every step
of my graduate years, I would not have made it this far.
Chapter 1
Introduction
1.1 General Introduction
This thesis consists of two parts. In part 1, we evaluate the importance of
anharmonic effects by examining clusters of small molecules using ab initio methods. In part 2,
we present an efficient scheme to rank mutants of enzymes according to their reaction rates using
an empirical valence bond (EVB) methodology.
Throughout computational chemistry, the harmonic approximation is used to describe
many molecular properties. However, the harmonic approximation is unsuccessful at describing
certain vital chemical phenomena such as bond breaking or bond forming processes. Anharmonic
effects must be included in computations to account for such processes. Additionally,
anharmonicity can significantly alter properties such as geometries, vibrational frequencies and
nuclear magnetic shieldings. There are several ways to include anharmonic effects into
calculations using first principles. Two popular ways are based on self-consistent (vibrational
self-consistent field: VSCF1,2) or second order perturbative (second order vibrational perturbation
theory: VPT23) approaches. The VSCF method is implemented in the GAMESS package4, and
the VPT2 method is implemented in Gaussian 03.5 Although converged variational VSCF results
might be expected to be more accurate than those of a second-order perturbative method, Handy
and coworkers noted that the VPT2 treatment, which requires only third and semidiagonal fourth
derivatives, often leads to an effective inclusion of nearly exact higher-order terms.6,7
Therefore, VPT2 predictions can be closer to experimental results than VSCF predictions.8 We
also note that incorporating quantitative
anharmonic effects into calculations requires heavy additional computations and thus is still
prohibitive for large molecular systems such as biomolecules.
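As a rough illustration of how anharmonicity shifts a vibrational frequency, the following sketch compares the harmonic and anharmonic fundamental of a one-dimensional Morse oscillator, whose term values are $G(n) = \omega_e (n + \tfrac{1}{2}) - \omega_e x_e (n + \tfrac{1}{2})^2$. The constants `OMEGA_E` and `OMEGA_E_X_E` are hypothetical values chosen only for illustration; they are not taken from the AN or HAN calculations, and the function names are likewise assumptions of this sketch.

```python
def morse_term_value(n, omega_e, omega_e_x_e):
    """Vibrational term value G(n) in cm^-1 for a Morse oscillator."""
    return omega_e * (n + 0.5) - omega_e_x_e * (n + 0.5) ** 2


def fundamental(omega_e, omega_e_x_e):
    """Energy of the n = 0 -> n = 1 transition in cm^-1."""
    return (morse_term_value(1, omega_e, omega_e_x_e)
            - morse_term_value(0, omega_e, omega_e_x_e))


# Hypothetical constants for an X-H stretch (illustration only).
OMEGA_E = 3000.0      # harmonic wavenumber, cm^-1
OMEGA_E_X_E = 60.0    # anharmonicity constant, cm^-1

harmonic = OMEGA_E
anharmonic = fundamental(OMEGA_E, OMEGA_E_X_E)

print(f"harmonic fundamental:   {harmonic:.1f} cm^-1")
print(f"anharmonic fundamental: {anharmonic:.1f} cm^-1")
print(f"anharmonic shift:       {anharmonic - harmonic:.1f} cm^-1")
```

In this model the fundamental is lowered by $2\omega_e x_e$ relative to the harmonic value, which is the kind of shift the anharmonic treatment is meant to capture.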
In part 1 of this thesis, we specifically investigate anharmonic effects of single and
double pairs of ammonium nitrate (AN) and hydroxylammonium nitrate (HAN) molecules. The
VPT2 method is used within the density functional theory framework to compare and contrast
anharmonic effects against their harmonic counterparts. AN and HAN are useful model systems
in ionic liquid studies. They are good examples of protic ionic liquids, which are formed by
proton transfer between acids and bases. This proton transfer reaction has been a topic of
extensive investigations due to its potential use in high-temperature fuel cells.9-12 AN has also
been used as a solid oxidizer in rocket propulsion fuels,13,14 and HAN has been used in liquid
propellants. Accurate understanding of fundamental properties of these ionic materials provides
the foundation for future studies of ionic liquids. In chapter 2, we will characterize AN and HAN
with the VPT2 method.
In part 2 of this thesis, we investigate chemical reactions within large biomolecules such
as enzymes. As stated previously, biomolecules are not tractable with ab initio computational
methods due to the large system size. Thus, we perform molecular dynamics (MD) simulations
within the framework of the empirical valence bond (EVB) theory.15 EVB methodology allows
chemical bonds to break or form and includes anharmonic effects for these bonds. The method is
then extended to investigate activation free energy barriers of many enzymatic mutants.
Enzymatic proteins are remarkable molecules due to their specificity and high efficiency
in catalysis.16,17 Decoding their ability to increase rates of reactions by many orders of magnitude
under physiological conditions is not only of interest to fundamental sciences but to applied
sciences and industry as well. Many state-of-the-art experimental and computational techniques
are employed to probe the structure and dynamics of enzymes. Among the computational
techniques are various flavors of MD simulation methods.15,18 MD simulation is based on the
hypothesis that statistical ensemble averages of a system are equal to the time averages.
Interactions between particles of the system are often described using an empirical forcefield, and
the time evolution of the system is computed over a large number of steps with each step carried
out for a small change in time.15 As a result, MD simulations are typically long calculations
whose quality depends on the empirical forcefield.
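The time-stepping idea described above can be sketched with a minimal velocity Verlet integrator for a single particle in one dimension. This is only an illustration under stated assumptions: the harmonic force, the reduced units, and the names `velocity_verlet`, `K_SPRING`, and `MASS` are hypothetical and do not correspond to the forcefield or MD software used in this thesis.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Propagate one particle in 1D; return lists of positions and velocities."""
    xs, vs = [], []
    f = force(x)
    for _ in range(n_steps):
        v += 0.5 * dt * f / mass   # first half-step velocity update
        x += dt * v                # full-step position update
        f = force(x)               # force at the new position
        v += 0.5 * dt * f / mass   # second half-step velocity update
        xs.append(x)
        vs.append(v)
    return xs, vs


# Hypothetical model system in reduced units: a harmonic "bond".
K_SPRING = 1.0
MASS = 1.0

xs, vs = velocity_verlet(x=1.0, v=0.0, force=lambda x: -K_SPRING * x,
                         mass=MASS, dt=0.01, n_steps=100_000)

# Time averages accumulated along the trajectory.
mean_potential = sum(0.5 * K_SPRING * x * x for x in xs) / len(xs)
mean_kinetic = sum(0.5 * MASS * v * v for v in vs) / len(vs)
final_energy = 0.5 * K_SPRING * xs[-1] ** 2 + 0.5 * MASS * vs[-1] ** 2

print(f"<V> = {mean_potential:.4f}, <T> = {mean_kinetic:.4f}, E_final = {final_energy:.4f}")
```

Each step advances the system by a small `dt`, and observables are estimated as averages over the many stored steps, which is why long trajectories are needed for converged averages.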
Although significant progress has been made toward understanding the structures and
functions of enzymes using MD simulations, there are still many unanswered questions due to the
complex structural and dynamical nature of enzymes. Continuous improvements are made in
theoretical procedures not only to improve accuracy of predictions of these systems but also to
increase their efficiency. However, improving one of these two aspects typically comes at the
expense of the other. One particularly interesting area under investigation is computational
mutant studies or protein design. The goal of mutant studies or protein design is to investigate
possibilities of altering activity or specificity of an enzyme. Typically, the mutants or designs are
then ranked according to their chemical activity. Enzymes with altered activity or specificity are
in demand for drug discovery, detergents, soil treatment, and even enzyme-based computers.19-23
Although mutant studies can be performed by both experimental and computational methods,
computational methods to predict activity are typically more attractive in the initial stages
because they require fewer resources.
Computational methods used in mutant activity studies require two essential qualities: (1)
they should be very efficient as there will be many structures to study; (2) they should be able to
describe the chemical processes in enzymes accurately enough to distinguish between the
activities of different mutants or designs. This implies that it is prohibitive to perform long
conventional MD calculations per mutant or design, and the MD procedure needs to be
streamlined and possibly automated so that many computations can be managed at once.
Additionally, the inability of empirical forcefields to describe bond breaking and bond forming
processes needs to be addressed. These problems can be remedied by using a hybrid
quantum/classical MD (QM/MM) approach.24-26 While there are many flavors of QM/MM
approaches, we use an empirical valence bond (EVB) strategy in this thesis.15,27 The EVB MD
method is well suited for our task, as it is capable of providing insights into chemical reactions
quantitatively and very efficiently. It has been used successfully to describe a wide range of
chemical reactions in solution and proteins.15,24,28-30 To describe a reaction with the EVB method,
two valence bond states are defined, one for the reactant state and one for the product state. The
electronic ground state potential energy surface is modeled as a mixture of these two states
to allow the chemical reaction to be driven forward. A more complete description of the EVB
method is given in Section 3.2.
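A generic two-state picture of this mixing can be sketched as follows: the ground state energy is the lower eigenvalue of a 2x2 Hamiltonian whose diagonal elements are the reactant and product valence bond energies and whose off-diagonal element is a coupling. The harmonic diabatic curves, the constant coupling `V12`, and all numerical values below are hypothetical choices for illustration and are not the EVB parametrization used for DHFR.

```python
import math


def evb_ground_state(v11, v22, v12):
    """Lower eigenvalue of the 2x2 matrix [[v11, v12], [v12, v22]]."""
    mean = 0.5 * (v11 + v22)
    half_gap = 0.5 * (v11 - v22)
    return mean - math.sqrt(half_gap ** 2 + v12 ** 2)


# Hypothetical diabatic (valence bond) states along a 1D coordinate x.
FORCE_CONSTANT = 20.0
V12 = 2.0  # constant coupling, assumed for simplicity


def reactant_energy(x):
    return 0.5 * FORCE_CONSTANT * (x + 1.0) ** 2


def product_energy(x):
    return 0.5 * FORCE_CONSTANT * (x - 1.0) ** 2


# Scan the coordinate and build the ground state curve.
grid = [-1.5 + 0.01 * i for i in range(301)]
ground = [evb_ground_state(reactant_energy(x), product_energy(x), V12) for x in grid]

# Barrier measured from the reactant-side minimum to the maximum between the wells.
reactant_min = min(e for x, e in zip(grid, ground) if x < 0.0)
barrier_top = max(e for x, e in zip(grid, ground) if -1.0 <= x <= 1.0)
print(f"barrier on the ground state curve: {barrier_top - reactant_min:.3f}")
```

Because the coupling lowers the energy most strongly where the two diabatic curves cross, the ground state curve is smooth through the crossing region and yields a finite barrier between the reactant and product wells.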
We present the mutant ranking studies performed within the EVB/MD framework to
efficiently rank mutants of dihydrofolate reductase (DHFR) according to their activity. DHFR has
become an ideal system for computational studies due to the small system size, the abundance of
structures present in the Protein Data Bank,31 and the importance of the reaction it catalyzes.
DHFR catalyzes the reduction of 7,8-dihydrofolate (DHF) to 5,6,7,8-tetrahydrofolate (THF),
where coenzyme nicotinamide adenine dinucleotide phosphate (NADPH) acts as a hydride
donor.32 THF is the active form of folate in humans, and it is required to function as a methyl
group shuttle for the de novo synthesis of purines, pyrimidines, and certain amino acids.
Inhibiting DHFR activity results in folate deficiency, which can be manipulated into a therapeutic
effect to battle cancerous cell growth or bacterial growth.17,19,33-35 The complete reaction of
DHFR is a two-step process, where a proton transfer to the N5 position of DHF is thought to
occur prior to the hydride transfer between protonated DHF and NADPH (Figure 1.1). This
notion is based on the chemical intuition that an initial proton transfer would be energetically
more favorable than an initial hydride transfer. Hydride transfer in DHFR has been the subject of
many computational studies, which provide a solid foundation for studying mutants.
Figure 1.1: The two-step reaction in DHFR. Proton transfer at N5 position of DHF is thought to precede
hydride transfer to C6 position.
In chapter 3 we will present the methodology for generating free energy profiles and
validate it using E. coli wild-type DHFR. Chapter 4 will present the procedures and results for
ranking the mutants of DHFR according to their catalytic reaction rates. Finally, in chapter 5 we
will provide overall conclusions.
1.2 References
(1) Carney, G. D.; Sprandel, L. I.; Kern, C. W. Advances in Chemical Physics 1978, 37, 305.
(2) Bowman, J. M. Journal of Chemical Physics 1978, 68, 608.
(3) Barone, V. Journal of Chemical Physics 2004, 120, 3059.
(4) Schmidt, M. W.; Baldridge, K. K.; Boatz, J. A.; Elbert, S. T.; Gordon, M. S.; Jensen, J. H.; Koseki, S.; Matsunaga, N.; Nguyen, K. A.; Su, S. J.; Windus, T. L.; Dupuis, M.; Montgomery, J. A. Journal of Computational Chemistry 1993, 14, 1347.
(5) Frisch, M. J.; Trucks, G. W.; Schlegel, H. B.; Scuseria, G. E.; Robb, M. A.; Cheeseman, J. R.; Montgomery, J. A., Jr.; Vreven, T.; Kudin, K. N.; Burant, J. C.; Millam, J. M.; Iyengar, S. S.; Tomasi, J.; Barone, V.; Mennucci, B.; Cossi, M.; Scalmani, G.; Rega, N.; Petersson, G. A.; Nakatsuji, H.; Hada, M.; Ehara, M.; Toyota, K.; Fukuda, R.; Hasegawa, J.; Ishida, M.; Nakajima, T.; Honda, Y.; Kitao, O.; Nakai, H.; Klene, M.; Li, X.; Knox, J. E.; Hratchian, H. P.; Cross, J. B.; Adamo, C.; Jaramillo, J.; Gomperts, R.; Stratmann, R. E.; Yazyev, O.; Austin, A. J.; Cammi, R.; Pomelli, C.; Ochterski, J. W.; Ayala, P. Y.; Morokuma, K.; Voth, G. A.; Salvador, P.; Dannenberg, J. J.; Zakrzewski, V. G.; Dapprich, S.; Daniels, A. D.; Strain, M. C.; Farkas, O.; Malick, D. K.; Rabuck, A. D.; Raghavachari, K.; Foresman, J. B.; Ortiz, J. V.; Cui, Q.; Baboul, A. G.; Clifford, S.; Cioslowski, J.; Stefanov, B. B.; Liu, G.; Liashenko, A.; Piskorz, P.; Komaromi, I.; Martin, R. L.; Fox, D. J.; Keith, T.; Al-Laham, M. A.; Peng, C. Y.; Nanayakkara, A.; Challacombe, M.; Gill, P. M. W.; Johnson, B.; Chen, W.; Wong, M. W.; Gonzalez, C.; Pople, J. A. Gaussian03; revision C.03 ed.; Gaussian, Inc.: Pittsburgh, PA, 2003.
(6) Burcl, R.; Handy, N. C.; Carter, S. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy 2003, 59, 1881.
(7) Burcl, R.; Carter, S.; Handy, N. C. Chemical Physics Letters 2003, 373, 357.
(8) Barone, V. Journal of Chemical Physics 2005, 122, 014108.
(9) Alavi, S.; Thompson, D. L. Journal of Chemical Physics 2002, 117, 2599.
(10) Alavi, S.; Thompson, D. L. Journal of Chemical Physics 2003, 119, 4274.
(11) Guillot, B.; Guissani, Y. Journal of Chemical Physics 2002, 116, 2047.
(12) Schmidt, M. W.; Gordon, M. S.; Boatz, J. A. Journal of Physical Chemistry A 2005, 109, 7285.
(13) Kondirkov, B. N.; Annikov, V. E.; Egorshev, V. Y.; DeLuca, L.; Bronzi, C. Journal of Propulsion and Power 1999, 15, 763.
(14) Sinditskii, V. P.; Egorshev, V. Y.; Levshenkov, A. I.; Serushkin, V. V. Propellants, Explosives, Pyrotechnics 2005, 30, 269.
(15) Warshel, A. Computer Modeling of Chemical Reactions in Enzymes and Solutions; John Wiley & Sons, Inc.: New York, 1991.
(16) Benkovic, S. J.; Hammes-Schiffer, S. Science 2003, 301, 1196.
(17) Berg, J. M.; Tymoczko, J. L.; Stryer, L. Biochemistry, 5th ed.; W.H. Freeman: New York, 2002.
(18) Brooks, C. L.; Karplus, M.; Pettitt, B. M. Proteins: A Theoretical Perspective of Dynamics, Structure, and Thermodynamics; J. Wiley: New York, 1988.
(19) Allegra, C. J.; Hoang, K.; Yeh, G. C.; Drake, J. C.; Baram, J. Journal of Biological Chemistry 1987, 262, 13520.
(20) Baron, R.; Lioubashevski, O.; Katz, E.; Niazov, T.; Willner, I. Angewandte Chemie International Edition 2006, 45, 1572.
(21) Leahy, J. G.; Colwell, R. R. Microbiological Reviews 1990, 54, 305.
(22) Kapoor, K. K.; Jain, M. K.; Mishra, M. M.; Singh, C. P. Annals of Microbiology (Paris) 1978, 129 B, 613.
(23) Rao, A. G. Plant Physiology 2008, 147, 6.
(24) Billeter, S. R.; Webb, S. P.; Iordanov, T.; Agarwal, P. K.; Hammes-Schiffer, S. Journal of Chemical Physics 2001, 114, 6925.
(25) Billeter, S. R.; Webb, S. P.; Agarwal, P. K.; Iordanov, T.; Hammes-Schiffer, S. Journal of the American Chemical Society 2001, 123, 11262.
(26) Hammes-Schiffer, S. Accounts of Chemical Research 2006, 39, 93.
(27) Warshel, A. Journal of Physical Chemistry 1982, 86, 2218.
(28) Schmitt, U. W.; Voth, G. A. Journal of Physical Chemistry B 1998, 102, 5547.
(29) Vuilleumier, R.; Borgis, D. Chemical Physics Letters 1998, 284, 71.
(30) Cembran, A.; Gao, J. Theoretical Chemistry Accounts 2007, 118, 211.
(31) Berman, H. M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T. N.; Weissig, H.; Shindyalov, I. N.; Bourne, P. E. Nucleic Acids Research 2000, 28, 235.
(32) Miller, G. P.; Benkovic, S. J. Chemistry & Biology 1998, 5, R105.
(33) Miovic, M.; Pizer, L. I. Journal of Bacteriology 1971, 106, 856.
(34) Huennekens, F. M. Advances in Enzyme Regulation 1994, 34, 397.
(35) Schweitzer, B. I.; Dicker, A. P.; Bertino, J. R. FASEB Journal 1990, 4, 2441.
Chapter 2
Anharmonic Effects in Ammonium Nitrate and Hydroxylammonium Nitrate Clusters
Reproduced in part with permission from M. D. Kumarasiri, C. Swalina, and S. Hammes-Schiffer, Journal of Physical Chemistry B 2007, 111, 4653.
© 2007 American Chemical Society
2.1 Introduction
The physical properties of hydrogen-bonded acid-base complexes impact a wide range of
materials. One example is room temperature ionic liquids, which are typically defined to be
organic salts that melt below 100 °C.1-3 Ionic liquids have many potential technological
applications because of their low vapor pressure, versatility, and environmentally benign nature.
Protic ionic liquids, which are formed by proton transfer between acids and bases, are potentially
relevant to high-temperature fuel cell applications.4 Although not room temperature ionic liquids,
ammonium nitrate (AN) and hydroxylammonium nitrate (HAN) serve as useful model systems
for the development of methods to study the properties of ionic liquids. AN has been used as a
solid oxidizer in rocket propulsion fuels,5,6 and HAN has been used in liquid propellants.7,8
Understanding the fundamental properties of these ionic materials will provide the foundation for
future studies of ionic liquids. A topic of particular interest is the role of proton transfer reactions
in hydrogen-bonded acid-base complexes. Hydrogen tunneling and coupling between heavy
atom motions and the transferring proton are expected to be important in these types of reactions.
A variety of theoretical studies have investigated the role of proton transfer reactions in
hydrogen-bonded acid-base complexes. Thompson and coworkers used density functional theory
and ab initio MP2 theory to study proton transfer in gas phase clusters of ammonium nitrate,9
ammonium dinitramide,10 and hydroxylammonium nitrate.11 In addition, Morokuma and
coworkers studied ammonium dinitramide clusters at both the RHF and the MP2 levels.12 The
calculations on single acid-base pairs in the gas phase indicate that the hydrogen bonded, neutral
acid-base pair is the only stable structure (i.e., the ionic pairs are not stable) at correlated levels of
theory. The ionic dimers (i.e., two ionic acid-base pairs) are stable minima on the correlated
potential energy surfaces. Thus, the properties of ionic dimers are expected to be more relevant
to bulk ionic materials. In addition to these studies, recently Schmidt, Gordon, and Boatz
performed calculations on proton transfer in triazolium-dinitramide ion pairs.13 Guillot and
Guissani performed one-phase and two-phase molecular dynamics simulations to study the
impact of proton transfer on the phase behavior of ammonium chloride (NH4Cl).14 They
determined that the existence of both ionic and covalent species in the liquid phase influences the
melting process.
The objective of this chapter is to characterize covalent and ionic clusters of ammonium
nitrate (NH4+NO3−) and hydroxylammonium nitrate (HONH3+NO3−) with the inclusion of
anharmonic effects. We perform density functional theory calculations of the isolated neutral and
ionic components, the covalent monomers, and the ionic dimers. In each case, we use second-order
vibrational perturbation theory (VPT2) to calculate the frequencies and geometries. This
approach leads to more accurate frequencies and geometries than previous calculations of
frequencies directly from the Hessian because the anharmonic effects are included. We also
calculate the anharmonic effects on the nuclear magnetic shielding constants for nitrogen,
oxygen, and hydrogen nuclei in the ionic clusters. All of these calculations provide insight into
the significance of anharmonic effects in ionic materials and provide data that will be useful for
the parameterization of molecular mechanical forcefields for ionic liquids and other ionic
materials.
2.2 Methods
All of the calculations were performed with density functional theory (DFT) using the
B3LYP functional15-17 and the 6-311++G(d,p) basis set18,19 with the Gaussian 03 package.20 We
used a pruned (99,770) grid for the numerical integrations. We calculated the frequencies based
on the harmonic approximation directly from the Hessian and the frequencies including
anharmonic effects with the VPT2 method. In the VPT2 method, the zeroth-order vibrational
wavefunctions are generated from the harmonic approximation, and the second-order perturbation
theory corrections are calculated from the cubic force constants and semidiagonal quartic force
constants. The required cubic and quartic force constants are obtained by numerical
differentiation of the analytical Hessians. The VPT2 method has been implemented by
Barone21,22 in the Gaussian 03 package.20
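As a rough illustration of how the second-order correction enters, the standard VPT2 expression for a non-degenerate fundamental is ν_i = ω_i + 2x_ii + ½ Σ_{j≠i} x_ij, where ω_i are the harmonic frequencies and x_ij are the anharmonic constants assembled from the cubic and semidiagonal quartic force constants. The sketch below assumes the x_ij matrix is already available. The function name and all numbers are hypothetical and are not taken from this work or from the Gaussian implementation.

```python
def vpt2_fundamentals(omega, x):
    """Fundamentals nu_i = omega_i + 2*x_ii + 0.5*sum_{j != i} x_ij (all in cm-1).

    omega: harmonic frequencies; x: symmetric matrix of anharmonic constants.
    Valid for non-degenerate modes only.
    """
    n = len(omega)
    fundamentals = []
    for i in range(n):
        off_diagonal = sum(x[i][j] for j in range(n) if j != i)
        fundamentals.append(omega[i] + 2.0 * x[i][i] + 0.5 * off_diagonal)
    return fundamentals


# Hypothetical two-mode example (illustrative numbers only).
omega = [3700.0, 1600.0]                  # harmonic frequencies
x = [[-80.0, -20.0], [-20.0, -15.0]]      # anharmonic constants x_ij
nu = vpt2_fundamentals(omega, x)
print(nu)
```

The restriction to non-degenerate modes is the reason a separate treatment would be needed for spherical tops.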
We calculated the anharmonic contribution to the vibrationally averaged isotropic nuclear
magnetic shielding constants for all nuclei by comparing shielding constants evaluated at the
equilibrium and vibrationally averaged geometries at 0 K. The gauge-independent
atomic orbital (GIAO) approach23 was used to calculate the nuclear magnetic shielding constants.
This approach accounts specifically for the contribution to the shielding constants arising from
the anharmonicity of the potential energy surface. Additional zero-point vibrational effects could
be calculated from the curvature of the surface corresponding to the shielding constant and the
harmonic frequencies,24-26 but such calculations are beyond the scope of this work.
2.3 Results
As observed previously for AN and HAN,9,11 the only stable structures for the monomers
are hydrogen-bonded, neutral acid-base pairs, whereas the structures corresponding to two ionic
acid-base pairs are the global minima for the dimers. The optimized geometries for the
monomers and dimers of AN and HAN are shown in Figure 2.1.
Figure 2.1: Optimized geometries of monomers and dimers of AN and HAN. Hydrogen bonds are
indicated by dashed lines. The hydrogen bonding distances are given in Angstroms for the vibrationally
averaged and equilibrium geometries, where the equilibrium distances are given in parentheses. (a)
Covalent monomer of AN with Cs symmetry. (b) Covalent monomer of HAN with Cs symmetry. (c) Ionic
dimer of (AN)2 with C2h symmetry. (d) Ionic dimer of (HAN)2 with C2 symmetry. This figure was created
using GaussView.27
The AN and HAN monomers have Cs symmetry, the (AN)2 dimer has C2h symmetry, and
the (HAN)2 dimer has C2 symmetry. The hydrogen bonding distances are given for both the
vibrationally averaged and the equilibrium structures. The vibrationally averaged structure is
obtained by averaging the coordinates over the nuclear vibrational wavefunction calculated with
the VPT2 method. Thus, the vibrationally averaged structures include anharmonic effects. As
expected, the bond lengths for the bonds between the donor atoms and the hydrogen atoms
increase when the anharmonic effects are included. For the structures given in Figures 2.1(a), (b)
and (c), the distance between the hydrogen atom and the donor atom increases and the distance
between the hydrogen atom and the acceptor atom decreases in each hydrogen bond when
anharmonic effects are included. The hydrogen bonding in the (HAN)2 dimer shown in Figure
2.1(d) is more complex because one of the oxygen atoms on each NO3− moiety serves as the
acceptor for two hydrogen bonds.
The calculated frequencies for the isolated neutral and ionic species are given in Tables
2.1 and 2.2, respectively. The experimental frequencies for the isolated neutral species are also
provided in Table 2.1. A comparison of the frequencies calculated with the VPT2 method to the
experimental data enables us to benchmark the VPT2 method for these types of systems. As
shown in Table 2.1, the frequencies obtained with the VPT2 method are in better agreement with
the gas phase experimental data than those obtained with the conventional harmonic approach.
These results illustrate the importance of anharmonic effects. In some cases, the anharmonic
effects decrease the frequency by ∼200 cm-1, significantly improving the agreement with
experiment. A more standard approach to account for vibrational anharmonicity in electronic
structure calculations is to scale the calculated harmonic frequencies by an empirical scaling
factor. The empirical scaling factor for the B3LYP DFT method with similar basis sets has been
determined to be ~0.96 – 0.97.28,29 While this empirical scaling procedure leads to qualitatively
reasonable frequencies, Tables 2.1 and 2.2 indicate that the VPT2 method is more quantitatively
accurate. The average deviation from experiment is 7 cm-1 for the VPT2 frequencies, compared with
66 cm-1 for the harmonic frequencies.
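The averages quoted above can be checked against the 22 modes listed in Table 2.1. The short sketch below recomputes them. It assumes the quoted values are signed mean deviations (calculated minus experimental), which is an interpretation rather than something stated in the text. The mean absolute deviation is computed as well for comparison. The function names are illustrative.

```python
# (experimental, harmonic, VPT2) frequencies in cm-1, transcribed from Table 2.1.
TABLE_2_1 = [
    # NH3
    (3337, 3480, 3339), (950, 1006, 902), (3444, 3607, 3440), (1627, 1669, 1619),
    # HNO3
    (3550, 3727, 3548), (1709, 1756, 1711), (1326, 1349, 1319),
    (1304, 1320, 1294), (878, 897, 875), (647, 649, 633),
    (580, 587, 575), (763, 773, 762), (458, 461, 446),
    # HONH2
    (3650, 3824, 3631), (3294, 3448, 3286), (1604, 1673, 1609),
    (1353, 1390, 1337), (1115, 1135, 1096), (895, 927, 900),
    (3359, 3528, 3342), (1294, 1328, 1286), (386, 442, 419),
]


def mean_signed_deviation(rows, column):
    """Average of (calculated - experimental); column 1 = harmonic, 2 = VPT2."""
    return sum(row[column] - row[0] for row in rows) / len(rows)


def mean_absolute_deviation(rows, column):
    """Average of |calculated - experimental|."""
    return sum(abs(row[column] - row[0]) for row in rows) / len(rows)


harmonic_msd = mean_signed_deviation(TABLE_2_1, 1)
vpt2_msd = mean_signed_deviation(TABLE_2_1, 2)
vpt2_mad = mean_absolute_deviation(TABLE_2_1, 2)
print(round(harmonic_msd), round(vpt2_msd), round(vpt2_mad, 1))
```

Under this interpretation the harmonic frequencies are on average too high by about 66 cm-1 and the VPT2 frequencies too low by about 7 cm-1, while the VPT2 mean absolute deviation is somewhat larger (about 11 cm-1) because positive and negative errors partly cancel in the signed average.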
Table 2.1: Harmonic and VPT2 frequencies in cm-1 of vibrational modes in the isolated neutral species. The experimental frequencies for NH3 are from Ref. 30. The experimental frequencies for HNO3 are from Refs. 31-34. The experimental frequencies for HONH2 are from Ref. 35.
Species Label Description Experimental frequency Harmonic VPT2
NH3 v1(a1) Sym. Stretch 3337 3480 3339
v2(a1) Sym. Bend 950 1006 902
v3(e) Asym. Stretch 3444 3607 3440
v4(e) Asym. Bend 1627 1669 1619
HNO3 v1(a') OH stretch 3550 3727 3548
v2(a') NO asym. stretch 1709 1756 1711
v3(a') NO sym. stretch 1326 1349 1319
v4(a') NOH bend 1304 1320 1294
v5(a') NO(OH) stretch 878 897 875
v6(a') ONO bend 647 649 633
v7(a'') ONO(OH) bend 580 587 575
v8(a'') N out-of-plane bend 763 773 762
v9(a'') OH torsion 458 461 446
HONH2 v1(a') OH stretch 3650 3824 3631
v2(a') NH stretch 3294 3448 3286
v3(a') HNH bend 1604 1673 1609
v4(a') NOH bend 1353 1390 1337
v5(a') NH2 wag 1115 1135 1096
v6(a') NO stretch 895 927 900
v7(a'') NH stretch 3359 3528 3342
v8(a'') NH2 twist 1294 1328 1286
v9(a'') OH torsion 386 442 419
Table 2.2: Harmonic and VPT2 frequencies in cm-1 of vibrational modes in the isolated ionic species. Note that the VPT2 method is not applicable to NH4+ because it behaves as a spherical top.
Species Label Description Harmonic VPT2
NH4+ v1(a1) Sym. Stretch 3372
v2(e) Twist 1727
v3(t) Asym. Stretch 3475
v4(t) Asym. Bend 1489
NO3- v1(a') Sym. Stretch 1066 1044
v2(a'') N out-of-plane bend 835 825
v3(e') Asym. Stretch 1378 1344
v4(e'') ONO Asym. Bend 709 699
HONH3+ v1(a') OH stretch 3698 3523
v2(a') NH stretch 3426 3254
v3(a') NH stretch 3328 3202
v4(a') HNH bend 1640 1586
v5(a') NH3 umbrella mode 1592 1542
v6(a') NOH bend 1476 1404
v7(a') NH3 wag 1152 1126
v8(a') NO stretch 1016 981
v9(a'') NH stretch 3400 3229
v10(a'') HNH bend 1639 1591
v11(a'') NH3 twist 1193 1157
v12(a'') HONH torsion 315 258
The calculated frequencies for the covalent monomers of AN and HAN are given in
Table 2.3. Tables 2.4 and 2.5 provide the calculated frequencies of the ionic dimers of AN and HAN,
respectively. The inclusion of anharmonic effects significantly decreases the frequencies of the
vibrational modes in all of these clusters, particularly for the NH and OH stretching modes. The
largest effects were observed for the NH and OH stretching modes involved in hydrogen bonding
interactions. The NH and OH stretching frequencies (i.e., ν2-4, 29-32) are decreased by up to ~500
cm-1 in (HAN)2. Moreover, the NH4+ symmetric stretch frequency, ν2, is decreased by ~1000
cm-1 in (AN)2. In these cases, the application of the empirical scaling factor leads to substantial
errors in the predicted frequencies. In addition to providing the frequencies for all modes in the
clusters, Table 2.3 also provides descriptions of the modes that are relatively localized with
definitive character. The remaining modes are not straightforward to characterize. For the ionic
dimers, we identified a mode corresponding to an intermolecular breathing mode, which
corresponds to ν11 in (AN)2 and ν20 in (HAN)2. These breathing modes are associated with
relatively low frequencies of ~300 cm-1.
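To make the failure of empirical scaling concrete, the sketch below applies the scaling range quoted earlier (~0.96 to 0.97) to the harmonic frequency of the hydrogen-bonded NH symmetric stretch ν2 in (AN)2 and compares the result with the VPT2 value from Table 2.4. The function name is illustrative, and the two endpoint scaling factors are simply the ends of the quoted range.

```python
def scaled_frequency(harmonic, scale):
    """Empirically scaled harmonic frequency (cm-1)."""
    return harmonic * scale


harmonic_nu2 = 2943.0   # (AN)2 v2(ag), harmonic, Table 2.4
vpt2_nu2 = 1839.0       # (AN)2 v2(ag), VPT2, Table 2.4

scaled_low = scaled_frequency(harmonic_nu2, 0.96)
scaled_high = scaled_frequency(harmonic_nu2, 0.97)
gap_low = scaled_low - vpt2_nu2
gap_high = scaled_high - vpt2_nu2
print(round(scaled_low), round(scaled_high), round(gap_low), round(gap_high))
```

A uniform factor lowers this mode by only about 90 to 120 cm-1, so the scaled value remains roughly 1000 cm-1 above the VPT2 frequency.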
Table 2.3: Harmonic and VPT2 frequencies in cm-1 of vibrational modes in the covalent monomers (i.e. one ion pair) AN and HAN.
Species Label Description Harmonic VPT2
AN v1(a') NH3 asym. stretch 3586 3416
v2(a') NH3 sym. stretch 3472 3326
v3(a') OH stretch 2732 2266
v4(a') NO asym. stretch 1733 1675
v5(a') NH3 asym. bend 1656 1629
v6(a') NOH bend 1511 1459
v7(a') NO sym. stretch 1326 1290
v8(a') NH3 sym. bend 1151 1099
v9(a') NO(OH) stretch 953 940
v10(a') ONO bend 691 680
v11(a') ONO(OH) bend 660 645
v12(a') NH3 in-plane rotation 430 424
v13(a') NHO stretch 248 235
v14(a') NHO in-plane bend 106 90
v15(a'') NH3 asym. stretch 3592 3420
v16(a'') NH3 asym. bend 1668 1632
v17(a'') NH3 out-of-plane torsion 1097 1051
v18(a'') N out-of-plane bend 791 782
v19(a'') OH torsion 339 349
v20(a'') 73 68
v21(a'') 61
HAN v1(a') OH(HONH2) stretch 3682 3480
v2(a') NH sym. stretch 3448 3286
v3(a') OH(HNO3) stretch 2660 2106
v4(a') ONO stretch 1740 1683
v5(a') HNH bend 1665 1605
v6(a') NOH(HNO3) bend 1533 1487
v7(a') NOH(HONH2) bend 1487 1442
v8(a') ONO stretch 1313 1273
v9(a') NH2 wag 1191 1164
v10(a') NO(HONH2) stretch 984 953
v11(a') NO(HNO3) stretch 963 937
v12(a') ONO bend 698 687
v13(a') ONO bend 655 643
v14(a') HONH2 in-plane wag 271 26
v15(a') NHO stretch 200 190
v16(a') 163 147
v17(a'') NH asym. stretch 3519 3335
v18(a'') NH2 twist 1320 1274
v19(a'') OH(HNO3) torsion 1089 1060
v20(a'') N out-of-plane bend 786 779
v21(a'') OH(HONH2) torsion 558 544
v22(a'') NH2 twist 370 372
v23(a'') 81 79
v24(a'') 42 46
Table 2.4: Harmonic and VPT2 frequencies in cm-1 of vibrational modes in the ionic dimer (AN)2.
Label Description Harmonic VPT2
v1(ag) NH asym. stretch 3504 3332
v2(ag) NH(H-bonded) sym. stretch 2943 1839
v3(ag) NH4+ twist 1752 1680
v4(ag) NH4+ twist 1726 1567
v5(ag) NH4+ asym. bend 1510 1443
v6(ag) NO(H-bonded) asym. stretch 1309 1272
v7(ag) NO3- sym. stretch 1028 1001
v8(ag) N out-of-plane bend 828 814
v9(ag) ONO asym. bend 715 701
v10(ag) NH4+ wag 416 364
v11(ag) Breathing 304 291
v12(ag) 131 126
v13(ag) 41 30
v14(au) NH asym. stretch 3503 3338
v15(au) NH(H-bonded) sym. stretch 2899 2554
v16(au) NH4+ twist 1751 1682
v17(au) NH4+ twist 1706 1617
v18(au) NO3- asym. stretch 1518 1483
v19(au) NH4+ asym. bend 1498 1436
v20(au) ONO asym. bend 733 721
v21(au) NH4+ wag 404 374
v22(au) 283 270
v23(au) 76 73
v24(au) 65 65
v25(bg) NH asym. stretch 3577 3405
v26(bg) NH(H-bonded) asym. stretch 2782 2292
v27(bg) NH4+ asym. bend 1623 1550
v28(bg) NO3- asym. stretch 1496 1460
v29(bg) NH4+ asym. bend 1361 1294
v30(bg) ONO asym. bend 723 710
v31(bg) NH4+ wag 436 411
v32(bg) NH4+ wag 311 288
v33(bg) 178 140
v34(bg) 105 98
v35(bg) 72 64
v36(bu) NH asym. stretch 3577 3402
v37(bu) NH(H-bonded) asym. stretch 2913 2482
v38(bu) NH4+ asym. bend 1621 1541
v39(bu) NH4+ asym. bend 1404 1349
v40(bu) NO(H-bonded) asym. stretch 1281 1235
v41(bu) NO3- sym. stretch 1025 996
v42(bu) N out-of-plane bend 828 812
v43(bu) ONO asym. bend 714 700
v44(bu) NH4+ wag 463 463
v45(bu) NH4+ wag 332 311
v46(bu) 254 201
v47(bu) 100 99
v48(bu) 33 22
Table 2.5: Harmonic and VPT2 frequencies in cm-1 of vibrational modes in the ionic dimer (HAN)2.
Label Description Harmonic VPT2
v1(a) NH stretch 3520 3350
v2(a) OH stretch 3307 3096
v3(a) NH(H-bonded) stretch 3013 2576
v4(a) NH(H-bonded) stretch 2836 2355
v5(a) HNH bend 1715 1609
v6(a) NH3 umbrella mode 1633 1587
v7(a) HNH bend 1604 1543
v8(a) NOH bend 1583 1525
v9(a) NO(NO3) asym. stretch 1499 1465
v10(a) NH3 wag 1310 1269
v11(a) NH3 twist 1293 1244
v12(a) NH3 wag 1219 1188
v13(a) NO(NO3) sym. stretch 1043 1019
v14(a) NO(HONH3) stretch 1031 996
v15(a) N out-of-plane bend 828 806
v16(a) OH(HONH3) twist 802 733
v17(a) ONO bend 728 715
v18(a) ONO bend 707 689
v19(a) NH3 twist 458 442
v20(a) Breathing 286 274
v21(a) 233 205
v22(a) 177 154
v23(a) 145 140
v24(a) 124 126
v25(a) 111 102
v26(a) 71 62
v27(a) 52 39
v28(a) 30 12
v29(b) NH stretch 3519 3350
v30(b) OH stretch 3318 3039
v31(b) NH(H-bonded) stretch 3017 2640
v32(b) NH(H-bonded) stretch 2881 2460
v33(b) HNH bend 1709 1623
v34(b) HNH bend 1626 1566
v35(b) NOH bend 1607 1591
v36(b) NH3 umbrella mode 1577 1527
v37(b) NO(NO3) asym. stretch 1520 1481
v38(b) NH3 wag 1311 1273
v39(b) NH3 twist 1280 1239
v40(b) NH3 wag 1220 1186
v41(b) NO(NO3) sym. stretch 1037 1012
v42(b) NO(HONH3) stretch 1029 995
v43(b) N out-of-plane bend 826 811
v44(b) OH(HONH3) twist 782 724
v45(b) ONO bend 726 712
v46(b) ONO bend 701 675
v47(b) NH3 twist 460 446
v48(b) 289 280
v49(b) 256 240
v50(b) 190 169
v51(b) 151 148
v52(b) 133 130
v53(b) 115 107
v54(b) 48 53
The nuclear magnetic shielding constants for all nuclei in the covalent monomers and the
ionic dimers of AN and HAN are given in Table 2.6. For both monomers, the shieldings for the
oxygen nuclei involved in hydrogen bonding interactions are influenced more by anharmonic
effects than the other oxygen nuclei. For both dimers and the AN monomer, the shieldings for
the nitrogen nuclei involved in hydrogen bonding interactions are influenced more by anharmonic
effects than the other nitrogen nuclei. For both monomers and (AN)2, the magnitude of the shift
of the shieldings due to anharmonic effects is similar for all of the hydrogen nuclei, but the
direction of this shift is different for the hydrogen nuclei involved in hydrogen bonds. For
(HAN)2, the anharmonic effects on the shieldings for the hydrogen nuclei do not exhibit a clear
trend because of the more complex hydrogen bonding pattern. For reference, we also calculated
the nuclear magnetic shielding constant for hydrogen in tetramethylsilane (TMS) at the same
level of theory.36 This reference enables the calculation of chemical shifts that are experimentally
observable. The nuclear magnetic shielding constants for other reference materials are also
straightforward to calculate. These results indicate that the inclusion of anharmonic effects
significantly alters the nuclear magnetic shielding constants. Thus, a quantitatively accurate
prediction of chemical shifts for comparison to experimental data requires the inclusion of
anharmonic effects.
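As an illustration of how the TMS reference is used, a chemical shift can be estimated as δ = σ(TMS) − σ. The sketch below applies this to atom 6H of the AN monomer, using the values from Table 2.6 and the TMS shielding quoted in its caption. It assumes the same TMS reference value is used for both the equilibrium and the vibrationally averaged geometries. The function and variable names are illustrative.

```python
SIGMA_TMS_H = 31.9702  # ppm, H in TMS at this level of theory (Table 2.6 caption)


def chemical_shift(sigma, sigma_ref=SIGMA_TMS_H):
    """Chemical shift (ppm) relative to the reference shielding."""
    return sigma_ref - sigma


# Atom 6H of the AN monomer, Table 2.6.
sigma_eq = 14.1399          # shielding at the equilibrium geometry
sigma_eq_minus_vib = 0.4162 # tabulated difference sigma_eq - sigma_vib
sigma_vib = sigma_eq - sigma_eq_minus_vib

delta_eq = chemical_shift(sigma_eq)
delta_vib = chemical_shift(sigma_vib)
print(round(delta_eq, 4), round(delta_vib, 4))
```

With a fixed reference, the anharmonic change in the predicted shift equals the tabulated σeq − σvib, here about 0.4 ppm for this proton.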
These calculations provide insight into several general features of covalent and ionic
hydrogen-bonded clusters. As observed previously, the most stable structures for the monomers
are covalent acid-base pairs, whereas the most stable structures for the dimers are ionic acid-base
pairs. The hydrogen bonding distances are greater in the ionic dimers than in the covalent
monomers. Although the hydrogen bonding distances might be expected to be shorter for the
charged species, the observed trend arises in part because the nitrogen and oxygen atoms are
involved in multiple competing hydrogen bonding interactions in the dimers. In addition, the
frequencies undergo substantial qualitative shifts from the covalent monomers to the ionic
dimers. Although the direct correspondence between specific modes in the covalent and ionic
complexes is not rigorous due to mixing among the many modes, some general trends are
observed. As expected, the NH stretching frequencies in NH3 and NH4+ for AN and (AN)2,
respectively, differ significantly. The NO symmetric and asymmetric stretching frequencies in
NO3 also differ substantially between the covalent and the ionic AN clusters. Furthermore, the
intermolecular hydrogen-bonding stretching motion shifts from ν13 = 235 cm-1 in AN to a
breathing mode of ν11 = 291 cm-1 in (AN)2. Similar trends in the frequencies are observed for
HAN and (HAN)2, although the characterization of the modes is not as straightforward. In this
case, the intermolecular hydrogen-bonding stretching motion shifts from ν15 = 190 cm-1 in HAN
to a breathing mode of ν20 = 274 cm-1 in (HAN)2. The quantitative study of these changes in the
structures and vibrational frequencies requires the inclusion of anharmonic effects.
Table 2.6: Nuclear magnetic shielding constants for AN, HAN, (AN)2 and (HAN)2. All shielding constants are given in ppm. σeq and σvib are the shieldings at the equilibrium and the vibrationally averaged geometries, respectively. For reference, the nuclear magnetic shielding constant for H in TMS calculated at this level of theory is 31.9702.
Species Atom σeq σeq − σvib
AN 1N -116.6324 2.7692
2O -73.7641 10.0719
3O -189.1048 7.7809
4O -211.7382 4.6787
5N 242.2423 -15.1759
6H 14.1399 0.4162
7,8H 30.9618 -1.9366
9H 30.4328 -1.9017
HAN 1N -114.9366 2.4143
2O -78.2194 9.0993
3O -172.1313 4.9056
4O -216.7373 2.7291
5O 236.5382 -1.9766
6N 141.3442 -2.1825
7H 14.3767 0.6096
8H 26.0492 -0.2386
9H 27.2008 -0.2030
10H 27.2006 -0.2032
(AN)2 6,11N -143.9728 -0.5831
1,15N 224.5532 -1.8135
8,9,10,13O -187.3903 2.0320
7,12O -129.3589 2.1699
3,4,14,16H 20.4853 0.5080
2,5,17,18H 28.5415 -0.5431
(HAN)2 1,12N -142.7165 0.3727
6,16N 156.4747 2.0139
4,11O -214.6025 2.5519
2,14O -155.4694 1.3114
3,13O -118.9725 2.8498
5,19O 208.0308 3.4048
10,15H 19.3276 0.4135
7,17H 20.7594 0.3260
8,20H 22.2187 -0.0484
9,18H 26.7813 -0.0224
2.4 Conclusions
In this chapter, we characterized the covalent and ionic clusters of ammonium nitrate and
hydroxylammonium nitrate using density functional theory and second-order vibrational
perturbation theory. These clusters exhibit strong hydrogen bonding interactions. Our
calculations confirmed that the most stable structures are covalent acid-base pairs for the
monomers and ionic acid-base pairs for the dimers. The hydrogen bonding distances were found
to be greater in the ionic dimers than in the covalent monomers in part because the nitrogen and
oxygen atoms are involved in multiple competing hydrogen bonding interactions in the dimers.
We also observed significant shifts in the stretching frequencies from the covalent monomers to
the ionic dimers. Moreover, we identified an intermolecular hydrogen-bonding stretching motion
of ~200 cm-1 in the monomers that shifts to an intermolecular breathing motion of slightly higher
frequency of ~300 cm-1 in the dimers.
Our calculations illustrate that the anharmonicities of the potential energy surfaces
influence the geometries, frequencies, and nuclear magnetic shieldings for these systems. The
inclusion of anharmonic effects was found to significantly decrease many of the calculated
frequencies in these clusters and to improve the agreement of the calculated frequencies with the
experimental data available for the isolated neutral species. Our results also indicate that the
anharmonic effects should be included in calculations of nuclear magnetic shielding constants for
these types of systems to ensure quantitatively accurate predictions for comparison to
experimental data. Furthermore, the consideration of anharmonic effects in the development of
molecular forcefields will be important for simulations of proton transfer reactions in ionic
liquids and other ionic materials.
2.5 References
(1) Welton, T. Chemical Reviews 1999, 99, 2071. (2) Holbrey, J. D.; Seddon, K. R. J. Chem. Soc., Dalton Trans. 1999, 2133. (3) Brennecke, J. F.; Maginn, E. J. AIChE Journal 2001, 47, 2384. (4) Yoshizawa, M.; Xu, W.; Angell, A. Journal of the American Chemical Society 2003, 125, 15411. (5) Kondirkov, B. N.; Annikov, V. E.; Egorshev, V. Y.; DeLuca, L.; Bronzi, C. J. Propul. Power 1999, 15, 763. (6) Sinditskii, V. P.; Egorshev, V. Y.; Levshenkov, A. I.; Serushkin, V. V. Propellants, Explosives, Pyrotechnics 2005, 30, 269. (7) Lee, H.; Litzinger, T. A. Combustion and Flame 2001, 127, 2205. (8) Lee, H.; Litzinger, T. A. Combustion and Flame 2003, 135, 151. (9) Alavi, S.; Thompson, D. L. Journal of Chemical Physics 2002, 117, 2599. (10) Alavi, S.; Thompson, D. L. Journal of Chemical Physics 2003, 118, 2599. (11) Alavi, S.; Thompson, D. L. Journal of Chemical Physics 2003, 119, 4274. (12) Mebel, A. M.; Lin, M. C.; Morokuma, K.; Melius, C. F. Journal of Physical Chemistry 1995, 99, 6842. (13) Schmidt, M. W.; Gordon, M. S.; Boatz, J. A. Journal of Physical Chemistry A 2005, 109, 7285. (14) Guillot, B.; Guissani, Y. Journal of Chemical Physics 2002, 116, 2047. (15) Lee, C.; Yang, W.; Parr, P. G. Physical Review B 1988, 37, 785. (16) Becke, A. D. Journal of Chemical Physics 1993, 98, 5648. (17) Stephens, P. J.; Devlin, F. J.; Chablowski, C. F.; Frisch, M. J. Journal of Physical Chemistry 1994, 98, 11623. (18) Krishnan, R.; Binkley, J. S.; Seeger, R.; Pople, J. A. Journal of Chemical Physics 1980, 72, 650. (19) Clark, T.; Chandrasekhar, J.; Spitznagel, G. W.; Schleyer, P. v. R. Journal of Computational Chemistry 1983, 4, 294. (20) Frisch, M. J.; Trucks, G. W.; Schlegel, H. B.; Scuseria, G. E.; Robb, M. A.; Cheeseman, J. R.; Montgomery, J. A., Jr.; Vreven, T.; Kudin, K. N.; Burant, J. C.; Millam, J. M.; Iyengar, S. S.; Tomasi, J.; Barone, V.; Mennucci, B.; Cossi, M.; Scalmani, G.; Rega, N.; Petersson, G. 
A.; Nakatsuji, H.; Hada, M.; Ehara, M.; Toyota, K.; Fukuda, R.; Hasegawa, J.; Ishida, M.; Nakajima, T.; Honda, Y.; Kitao, O.; Nakai, H.; Klene, M.; Li, X.; Knox, J. E.; Hratchian, H. P.; Cross, J. B.; Adamo, C.; Jaramillo, J.; Gomperts, R.; Stratmann, R. E.; Yazyev, O.; Austin, A. J.; Cammi, R.; Pomelli, C.; Ochterski, J. W.; Ayala, P. Y.; Morokuma, K.; Voth, G. A.; Salvador, P.; Dannenberg, J. J.; Zakrzewski, V. G.; Dapprich, S.; Daniels, A. D.; Strain, M. C.; Farkas, O.; Malick, D. K.; Rabuck, A. D.; Raghavachari, K.; Foresman, J. B.; Ortiz, J. V.; Cui, Q.; Baboul, A. G.; Clifford, S.; Cioslowski, J.; Stefanov, B. B.; Liu, G.; Liashenko, A.; Piskorz, P.; Komaromi, I.; Martin, R. L.; Fox, D. J.; Keith, T.; Al-Laham, M. A.; Peng, C. Y.; Nanayakkara, A.; Challacombe, M.; Gill, P. M. W.; Johnson, B.; Chen, W.; Wong, M. W.; Gonzalez, C.; Pople, J. A. Gaussian03; revision C.03 ed.; Gaussian, Inc.: Pittsburgh, PA, 2003. (21) Barone, V. Journal of Chemical Physics 2004, 120, 3059. (22) Barone, V. Journal of Chemical Physics 2005, 122, 014108. (23) Ditchfield, R. Journal of Chemical Physics 1972, 56, 5688. (24) Astrand, P.-O.; Ruud, K.; Taylor, P. R. Journal of Chemical Physics 2000, 112, 2655.
(25) Ruud, K.; Astrand, P.-O.; Taylor, P. R. Journal of Chemical Physics 2000, 112, 2668. (26) Ruud, K.; Astrand, P.-O.; Taylor, P. R. Journal of the American Chemical Society 2001, 123, 4826. (27) Dennington II, R.; Keith, T.; Millam, J.; Eppinnett, K.; Hovell, W. L.; Gilliland, R. GaussView; 3.09 ed.; Semichem, Inc.: Shawnee Mission, KS, 2003. (28) Irikura, K. K.; Johnson, R. D., III; Kacker, R. N. Journal of Physical Chemistry A 2005, 109, 8430. (29) Andersson, M. P.; Uvdal, P. Journal of Physical Chemistry A 2005, 109, 2937. (30) Shimanouchi, T. Molecular Vibrational Frequencies. In NIST Chemistry WebBook, NIST Standard Reference Database Number 69; Linstrom, P. J., Mallard, W. G., Eds.; National Institute of Standards and Technology: Gaithersburg, MD, 20899 (http://webbook.nist.gov), June 2005. (31) McGraw, G. E.; Bernitt, D. L.; Hisatsune, I. C. Journal of Chemical Physics 1965, 42, 237. (32) Perrin, A.; Lado-Bordowsky, O.; Valentin, A. Molecular Physics 1989, 67, 249. (33) Maki, A. G.; Olsen, W. B. Journal of Molecular Spectroscopy 1989, 133, 171. (34) Goldman, A.; Burkholder, J. B.; Howard, C. J.; Escribano, R.; Maki, A. G. Journal of Molecular Spectroscopy 1988, 131, 195. (35) Luckhaus, D. Journal of Chemical Physics 1997, 106, 8409. (36) Note that these calculations are based on harmonic frequencies. The VPT2 method is not applicable to TMS because it behaves as a spherical top.
Chapter 3
Simulation Methods for Hydride Transfer in Dihydrofolate Reductase
Reproduced in part with permission from D. K. Chakravorty, M. D. Kumarasiri, A. V. Soudackov, and S. Hammes-Schiffer, Journal of Chemical Theory and Computation 2008, 4, 1974.
© 2007 American Chemical Society
3.1 Introduction
This chapter presents the methodology used to simulate the hydride transfer reaction
catalyzed by dihydrofolate reductase (DHFR). Our goal is to predict the free energy barrier of
wild-type DHFR as efficiently as possible without losing accuracy. This will allow us to simulate
the hydride transfer in many mutants of DHFR. The next chapter will utilize this methodology to
rank mutants of DHFR according to their hydride transfer reaction rates.
DHFR is a vital enzyme required for folate metabolism in humans. It converts DHF to
THF using the coenzyme NADPH. Specifically, the pro-R hydride of NADPH is transferred to
the C6 position of N5 protonated DHF (H3F+).1 The mechanism of this hydride transfer, as well
as the structure and dynamics of DHFR, has been studied extensively. The minima and transition
states have been studied with ab initio, semiempirical, and QM/MM methods.2-5 Hammes-
Schiffer and coworkers have used a hybrid quantum/classical approach to study the hydride
transfer reaction in DHFR.1,6-9 Truhlar, Gao, and coworkers10-14 and Thorpe and Brooks15-17 have
also studied this reaction using similar theoretical methods. X-ray crystallographic structures of
DHFR from different species have been determined for many substrate, cofactor, and inhibitor
complexes.18 Thus, DHFR provides an excellent target for studying the ranking of mutant activity.
We focus specifically on Escherichia coli DHFR.
In transition state theory, the relationship between the rate constant $k$ of a reaction and the
free energy barrier of activation $\Delta G^{\dagger}$ can be written as (Ref. 19)
$$k \propto e^{-\Delta G^{\dagger}/k_{\mathrm{B}}T} \tag{3.1}$$
Therefore, ordering mutants according to their $\Delta G^{\dagger}$ will rank them according to their reaction
rates. The free energy barrier $\Delta G^{\dagger}$ of an enzymatic reaction can be calculated by generating the
free energy profile of the reaction. However, as stated in Section 1.1, common empirical force
fields used in molecular dynamics simulations cannot directly be used to study a chemical
reaction, as they do not permit bond breaking and forming processes. To overcome this
disadvantage, we use an empirical valence bond (EVB) approach.20,21
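A minimal sketch of how Equation 3.1 translates barriers into a ranking is given below. It assumes that all mutants share the same prefactor, so that only the barrier difference matters. The mutant names and barrier values are hypothetical placeholders, not results from this work.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal mol^-1 K^-1


def relative_rate(delta_barrier, temperature=298.15):
    """Rate ratio k/k_ref for a barrier difference (kcal/mol), assuming equal prefactors."""
    return math.exp(-delta_barrier / (K_B * temperature))


def rank_by_barrier(barriers):
    """Names ordered from lowest barrier (fastest) to highest barrier (slowest)."""
    return sorted(barriers, key=barriers.get)


# Hypothetical free energy barriers in kcal/mol (illustrative only).
barriers = {"wild-type": 13.4, "mutant A": 14.9, "mutant B": 13.9}

ranking = rank_by_barrier(barriers)
ratios = {name: relative_rate(value - barriers["wild-type"])
          for name, value in barriers.items()}
print(ranking)
print(ratios)
```

Because the dependence is exponential, a barrier increase of about 1.4 kcal/mol at room temperature corresponds to roughly a tenfold decrease in rate.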
3.2 EVB Molecular Dynamics
In the EVB approach for a hydride transfer reaction, two valence bond states are defined.
In the first state, the transferring hydrogen is bonded to the donor (reactant state) and in the
second state, the transferring hydrogen is bonded to the acceptor (product state). Then MD
simulations are carried out to generate the free energy profile of the reaction along a collective
reaction coordinate. The total wavefunction of a system described by two valence bond (VB)
states can be written as
$$\Psi = c_1 \psi_1 + c_2 \psi_2, \tag{3.2}$$
where $\psi_1$ is the wavefunction of the first VB state and $\psi_2$ is the wavefunction of the second VB
state. The EVB Hamiltonian matrix corresponding to this system is
$$\mathbf{H}_{\mathrm{EVB}}(\mathbf{R}) = \begin{pmatrix} V_{11}(\mathbf{R}) & V_{12}(\mathbf{R}) \\ V_{21}(\mathbf{R}) & V_{22}(\mathbf{R}) \end{pmatrix}, \tag{3.3}$$
where $\mathbf{R}$ represents the coordinates of the nuclei, $V_{11}(\mathbf{R})$ is the potential energy of state 1, and
$V_{22}(\mathbf{R})$ is the potential energy of state 2. The electronic ground state potential surface is
obtained by diagonalizing $\mathbf{H}_{\mathrm{EVB}}(\mathbf{R})$. Although the terms $V_{12}(\mathbf{R})$ and $V_{21}(\mathbf{R})$ can be expressed in
simple analytical functional forms with parameters fit to ab initio calculations or experimental
data, we approximate them as constants for simplicity and computational efficiency.
To obtain $\Delta G^{\dagger}$ for the hydrogen transfer reaction, the free energy profile of the reaction
must be calculated. This can be achieved efficiently by performing MD simulations with a
mapping (or biasing) potential rather than the ground state EVB potential.20 The mapping
potential drives the reaction forward and is defined as
$$V_{\mathrm{map}}(\mathbf{R};\lambda_i) = (1-\lambda_i)\,V_{11}(\mathbf{R}) + \lambda_i V_{22}(\mathbf{R}), \tag{3.4}$$
where $V_{11}(\mathbf{R})$ and $V_{22}(\mathbf{R})$ are the diagonal elements of the EVB Hamiltonian in Equation 3.3 and
$\mathbf{R}$ represents the coordinates of the system. As the mapping parameter $\lambda_i$ is varied between 0 and 1, the
reaction progresses from VB state 1 (reactant state) to VB state 2 (product state). Therefore, MD
simulations are performed for a series of $\lambda_i$ values along a collective reaction coordinate. The collective
reaction coordinate chosen is an energy gap reaction coordinate:
$$\Lambda(\mathbf{R}) = V_{11}(\mathbf{R}) - V_{22}(\mathbf{R}). \tag{3.5}$$
This definition is analogous to the solvent coordinate used in standard Marcus theory for electron
transfer reactions.21-23
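The way the mapping parameter moves sampling along the energy gap can be seen in a one-dimensional toy model. The sketch below assumes two displaced harmonic diabatic states with equal force constants. All parameters and names are hypothetical and serve only to illustrate Equations 3.4 and 3.5.

```python
def mapping_potential(v11, v22, lam):
    """Equation 3.4: linear interpolation between the two diabatic energies."""
    return (1.0 - lam) * v11 + lam * v22


def energy_gap(v11, v22):
    """Equation 3.5: energy gap reaction coordinate."""
    return v11 - v22


K_FORCE = 10.0   # hypothetical force constant
OFFSET = -2.0    # hypothetical energy shift of the product state


def v11(x):
    return 0.5 * K_FORCE * (x + 1.0) ** 2            # reactant state, minimum at x = -1


def v22(x):
    return 0.5 * K_FORCE * (x - 1.0) ** 2 + OFFSET   # product state, minimum at x = +1


grid = [i * 0.01 - 2.0 for i in range(401)]
windows = []
for lam in (0.0, 0.25, 0.5, 0.75, 1.0):
    # Locate the minimum of the mapping potential on the grid.
    x_min = min(grid, key=lambda x: mapping_potential(v11(x), v22(x), lam))
    windows.append((lam, x_min, energy_gap(v11(x_min), v22(x_min))))

for lam, x_min, gap in windows:
    print(lam, round(x_min, 2), round(gap, 1))
```

In this model the minimum of the mapping potential moves linearly from the reactant minimum to the product minimum as λ increases, so each window samples a different range of the energy gap, including the region near Λ = 0 that is rarely visited on either diabatic surface alone.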
The free energy profile of the reaction is obtained in two steps. In the first step, the free
energy for the mapping potential $V_{\mathrm{map}}(\mathbf{R};\lambda_i)$ is calculated along the reaction coordinate $\Lambda(\mathbf{R})$ for
each $\lambda_i$ using a standard binning procedure during MD simulations. In the second step, the
distributions of free energies along $\Lambda(\mathbf{R})$ obtained with different $\lambda_i$ are combined using a
statistical method. The weighted histogram analysis method (WHAM) and the umbrella integration
(UI) method are two such statistical methods used for this purpose.24
3.3 WHAM and UI
In umbrella sampling,25-27 simulations are performed with a series of biasing potentials
$w_i(\xi)$, where $\xi$ is the reaction coordinate. The distribution $P_i^{\mathrm{b}}(\xi)$ of the biased system along
the reaction coordinate is typically obtained by standard binning procedures to generate a
histogram. Specifically, the relevant range of the reaction coordinate is divided into bins, and
$P_i^{\mathrm{b}}(\xi_{\mathrm{bin}})$ is the fraction of sampled configurations in the bin centered at the reaction coordinate
$\xi_{\mathrm{bin}}$ for the window corresponding to the biasing potential $w_i(\xi)$. The potential of mean force
(PMF) for the biased system along the reaction coordinate is given by
$$A_i^{\mathrm{b}}(\xi) = -\frac{1}{\beta}\ln P_i^{\mathrm{b}}(\xi), \tag{3.6}$$
where $\beta = 1/k_{\mathrm{B}}T$. The PMF for the unbiased system in each window is
$$A_i^{\mathrm{u}}(\xi) = -\frac{1}{\beta}\ln P_i^{\mathrm{b}}(\xi) - w_i(\xi) + F_i, \tag{3.7}$$
where $F_i$ are constants that differ for each biasing potential or window.
In WHAM,26-30 the constants $F_i$ are calculated iteratively to combine the unbiased
potentials of mean force for different windows. The following two equations are solved
iteratively:
$$P(\xi) = \frac{\sum_i^{\mathrm{windows}} N_i P_i^{\mathrm{b}}(\xi)}{\sum_j^{\mathrm{windows}} N_j \, e^{-\beta [w_j(\xi) - F_j]}} \tag{3.8}$$
$$e^{-\beta F_i} = \int d\xi \, e^{-\beta w_i(\xi)} P(\xi) \tag{3.9}$$
where $N_i$ is the total number of configurations sampled for window $i$ used to construct $P_i^{\mathrm{b}}(\xi)$.
After these equations are solved to self-consistency, the PMF $A(\xi)$ is obtained directly from $P(\xi)$
using the relation $A(\xi) = -\ln P(\xi) / \beta$.
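The self-consistent solution of Equations 3.8 and 3.9 can be sketched in Python as follows. This is an illustrative sketch rather than the Fortran 90 program used in this work. It assumes that all windows share one grid of bin centers, that the integral in Equation 3.9 may be replaced by a sum over bins, and that the iteration starts from $F_i = 0$; the function name and argument layout are hypothetical.

```python
import math

def wham(grid, biased_hists, n_samples, biases, beta, tol=1e-10, max_iter=20000):
    """Iterate Eqs. 3.8 and 3.9; return A(xi) shifted so that its minimum is zero.

    biased_hists[i][k] is P_i^b at grid[k]; biases[i] is the callable w_i(xi).
    """
    n_win, n_grid = len(biased_hists), len(grid)
    # Boltzmann factors exp(-beta * w_i(xi)) of the biasing potentials
    boltz = [[math.exp(-beta * biases[i](xi)) for xi in grid] for i in range(n_win)]
    # Numerator of Eq. 3.8, which does not depend on the constants F_i
    num = [sum(n_samples[i] * biased_hists[i][k] for i in range(n_win))
           for k in range(n_grid)]
    f = [0.0] * n_win
    p = []
    for _ in range(max_iter):
        scale = [n_samples[j] * math.exp(beta * f[j]) for j in range(n_win)]
        # Eq. 3.8: combined distribution for the current constants F_i
        p = [num[k] / sum(scale[j] * boltz[j][k] for j in range(n_win))
             for k in range(n_grid)]
        # Eq. 3.9, with the integral replaced by a sum over bins
        f_new = [-math.log(sum(boltz[i][k] * p[k] for k in range(n_grid))) / beta
                 for i in range(n_win)]
        change = max(abs(a - b) for a, b in zip(f_new, f))
        f = f_new
        if change < tol:
            break
    pmf = [-math.log(pk) / beta if pk > 0.0 else math.inf for pk in p]
    lowest = min(pmf)
    return [a - lowest for a in pmf]
```

Because only differences in the PMF are meaningful, the sketch shifts the result so that its minimum is zero. The number of iterations needed grows as the overlap between neighboring windows shrinks.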
In UI,31,32 the derivative of the unbiased PMF with respect to the reaction coordinate is
calculated for each window,
$$\frac{\partial A_i^{\mathrm{u}}}{\partial \xi} = -\frac{1}{\beta} \frac{\partial \ln P_i^{\mathrm{b}}(\xi)}{\partial \xi} - \frac{d w_i}{d \xi}. \tag{3.10}$$
The data from different windows are combined according to a weighted average
$$\frac{\partial A}{\partial \xi} = \sum_i^{\mathrm{windows}} p_i(\xi) \frac{\partial A_i^{\mathrm{u}}}{\partial \xi}, \tag{3.11}$$
where
$$p_i(\xi) = \frac{N_i P_i^{\mathrm{b}}(\xi)}{\sum_j N_j P_j^{\mathrm{b}}(\xi)}. \tag{3.12}$$
Subsequently, $A(\xi)$ is obtained by numerical integration over $\xi$. In previous applications of UI,
the biasing potential is assumed to be of the form $w_i(\xi) = \frac{K}{2} (\xi - \xi_i)^2$. Moreover, the biased
PMF is expanded in a power series and truncated after the quadratic term, which is equivalent to
assuming a normal distribution for $P_i^{\mathrm{b}}(\xi)$,
$$P_i^{\mathrm{b}}(\xi) = \frac{1}{\sigma_i^{\mathrm{b}} \sqrt{2\pi}} \exp\left[ -\frac{1}{2} \left( \frac{\xi - \bar{\xi}_i^{\mathrm{b}}}{\sigma_i^{\mathrm{b}}} \right)^2 \right], \tag{3.13}$$
where the mean $\bar{\xi}_i^{\mathrm{b}}$ and the variance $(\sigma_i^{\mathrm{b}})^2$ for each window are determined from the simulation
data. These approximations lead to an analytical expression for the derivative of the unbiased
PMF given in Equation 3.10.
The UI method differs from WHAM in two important aspects. First, the UI method is
based on the derivative of the PMF, rather than the PMF itself, so it does not involve offsets and
therefore avoids the iterative procedure inherent to WHAM. Second, UI does not require a
binning procedure to obtain the derivative of the PMF given in Equation 3.11, because the mean
and variance of the normal distribution for each window are determined directly from the raw
simulation data. Specifically, the values of the reaction
coordinate for all configurations sampled are collected during the simulation, and the mean and
variance of the reaction coordinates collected for each window are determined directly from these
data without generating a histogram.33
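A compact Python sketch of umbrella integration with harmonic biasing potentials is given here for illustration. It is not the Mathematica notebook used in this work. It assumes the harmonic form $w_i(\xi) = \frac{K}{2}(\xi - \xi_i)^2$ with one force constant $K$ for all windows, the Gaussian form of Equation 3.13, and trapezoidal integration; the function names are hypothetical.

```python
import math
from statistics import fmean, pvariance

def window_stats(samples):
    """Mean and variance of the raw reaction-coordinate values of one window."""
    return fmean(samples), pvariance(samples)

def ui_pmf(grid, means, variances, n_samples, centers, k_bias, beta):
    """Umbrella integration with harmonic biases w_i = (K/2)(xi - xi_i)^2."""
    dadxi = []
    for xi in grid:
        # N_i * P_i^b(xi) with the Gaussian form of Eq. 3.13 (numerator of Eq. 3.12)
        weights = [n * math.exp(-0.5 * (xi - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
                   for n, m, v in zip(n_samples, means, variances)]
        # Eq. 3.10 with the Gaussian approximation, one slope per window
        slopes = [(xi - m) / (beta * v) - k_bias * (xi - c)
                  for m, v, c in zip(means, variances, centers)]
        # Eq. 3.11: weighted average over windows (grid must lie in sampled range)
        dadxi.append(sum(w * s for w, s in zip(weights, slopes)) / sum(weights))
    # Trapezoidal integration of dA/dxi, with A set to zero at the first grid point
    pmf = [0.0]
    for k in range(1, len(grid)):
        pmf.append(pmf[-1] + 0.5 * (dadxi[k] + dadxi[k - 1]) * (grid[k] - grid[k - 1]))
    return pmf
```

No histogram and no iteration appear in this sketch: each window enters only through its mean, its variance, and its number of samples.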
To implement the UI method within the framework of a two-state EVB potential using an
energy gap reaction coordinate and a mapping potential, the derivative of the unbiased PMF
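given in Equation 3.10 can be expressed as
$$\frac{\partial A_i^{\mathrm{u}}}{\partial \xi} = -\frac{1}{\beta} \frac{\partial \ln P_i^{\mathrm{b}}(\xi)}{\partial \xi} - \left[ \frac{1}{2} - \lambda_i + \frac{\xi}{2 \sqrt{\xi^2 + 4 V_{12}^2}} \right]. \tag{3.14}$$
The complete derivation is given elsewhere.24 Approximating $P_i^{\mathrm{b}}(\xi)$ by a normal distribution,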
we have obtained an analytical form for the derivative of the unbiased PMF for each window.
The data for the different windows can be combined using Equation 3.11, followed by numerical
integration of the derivative of the PMF over ξ to obtain the PMF A(ξ).
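The window-dependent term of Equation 3.14 can be checked numerically with a few lines of Python. The sketch assumes that the effective biasing potential of window $i$ is the difference between the mapping potential and the ground-state energy of the two-state EVB Hamiltonian, which for $\xi = V_{11} - V_{22}$ gives $w_i(\xi) = (\frac{1}{2} - \lambda_i)\xi + \frac{1}{2}\sqrt{\xi^2 + 4V_{12}^2}$. This form and the function names are assumptions made for the example; the full derivation is in Ref. 24.

```python
import math

V12 = 34.66  # constant EVB coupling in kcal/mol, the value quoted in Section 3.4

def evb_bias(xi, lam, v12=V12):
    """Assumed bias w_i(xi) = V_map - V_ground, written in terms of the energy gap."""
    return (0.5 - lam) * xi + 0.5 * math.sqrt(xi * xi + 4.0 * v12 * v12)

def evb_bias_slope(xi, lam, v12=V12):
    """Analytical derivative dw_i/dxi of the assumed bias."""
    return 0.5 - lam + xi / (2.0 * math.sqrt(xi * xi + 4.0 * v12 * v12))

def unbiased_slope(xi, lam, mean, variance, beta, v12=V12):
    """Derivative of the unbiased PMF with a Gaussian P_i^b inserted."""
    return (xi - mean) / (beta * variance) - evb_bias_slope(xi, lam, v12)
```

Comparing `evb_bias_slope` with a finite-difference derivative of `evb_bias` is a quick consistency check before the slopes are combined across windows.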
UI has several advantages over WHAM. One advantage of UI is that it does not require
overlap between the distributions of the windows, although such overlap is desirable to enhance
the accuracy. In contrast, WHAM requires sufficient overlap between the distributions of the
windows although, in principle, given sufficient sampling within each window, WHAM and UI
should converge to the same results if the distributions are Gaussian. However, the convergence
of the iterative procedure in WHAM becomes slow for small overlap between the distributions of
the windows, and insufficient sampling of the tail regions of the distributions combined with very
small overlap could preclude convergence. Additionally, UI utilizes an analytical expression for
the distributions, thereby decreasing the statistical noise. Moreover, UI does not require an
iterative procedure, so convergence is not an issue. These advantages become particularly
pronounced for small overlaps between the distributions of the windows, although additional
windows will enhance the accuracy of both methods.
3.4 Application to DHFR
The initial coordinates for wild type DHFR were obtained from an equilibrated reactant
state snapshot of a previous simulation.8,34 The initial simulation system includes the entire
protein, the substrate, and the cofactor solvated by 4122 explicit water molecules in a truncated
octahedral periodic box. The potential energy surface was represented by a two-state EVB
potential,20 where state 1 corresponds to the transferring hydrogen atom bonded to the donor, and
state 2 corresponds to the transferring hydrogen atom bonded to the acceptor, as described
previously. The diagonal elements of the EVB Hamiltonian, $V_{11}(\mathbf{R})$ and $V_{22}(\mathbf{R})$,
correspond to the GROMOS force field.35 The EVB constant coupling parameter was 34.66
kcal/mol, and the constant energy adjustment for $V_{22}(\mathbf{R})$ was 65.25 kcal/mol. Both of these EVB
parameters were obtained elsewhere.34 The MD simulations were performed using GROMOS
with a modified FORCE routine. The integration time step was 1 fs, and the constraints were
maintained by SHAKE. Two separate Berendsen thermostats with 0.1 ps relaxation times each
were used to maintain the temperature of the solute and the solvent molecules at 300 K.
A set of 19 mapping parameters from λi = 0.05 to 0.95 with a spacing of 0.05 were used
to generate the full free energy curve. The starting configuration for each window was obtained
from the previous window after 20 ps of equilibration. Each window was equilibrated for a total
of 350 ps, followed by 300 ps of data collection. The free energy barriers of 15.0 kcal/mol and
15.3 kcal/mol were determined with UI and WHAM, respectively. We used a Fortran 90
program to calculate the free energy barrier with WHAM and a Mathematica33 notebook to
calculate UI barriers. These barrier heights are consistent with the classical barriers determined
from previous simulations using both thermodynamic integration and WHAM. We also
generated two other independent sets of data with 50 ps of equilibration followed by 300 ps of
data collection. The free energy barriers determined from these three data sets differ by less than
0.5 kcal/mol compared to the free energy barrier of 15.4 kcal/mol obtained from another set of
data generated using 20 mapping potentials with 4.5 ns of molecular dynamics for each window
and an additional 2 ns for the four windows near the transition state.8,36 The nuclear quantum
effects of the transferring hydrogen, such as zero point energy and tunneling, have been shown
previously to decrease the free energy barrier by 2.4 kcal/mol.1 Since this decrease is expected to
be similar for wild-type and all mutants, not including the nuclear quantum effects will allow us
to increase computational efficiency of the method without sacrificing accuracy. Thus, all free
energy barriers were obtained with a classical treatment of the transferring hydrogen nucleus.
Generating 19 windows with 650 ps each requires approximately 1300 CPU hours on
Intel Xeon 3.0 GHz processors. Thus it is desirable to obtain the free energy barrier
at a lower computational cost. To this end, a set of six mapping parameters, λi = 0.05, 0.125, 0.250, 0.375,
0.500 and 0.625, was used to generate a partial free energy curve.
Choosing λi=0.625 as the last window drives the reaction over the barrier, allowing adequate
sampling of the transition state region. Each window was equilibrated for 50 ps, followed by 300
ps of data collection. The resulting free energy barrier was 15.4 kcal/mol with WHAM and 15.1
kcal/mol with UI. Approximately 230 CPU hours were required to generate the partial free
energy curve from six windows. This validates our choice of generating partial free energy
profiles using a smaller number of windows. The complete free energy profile using 19 windows
and the partial free energy profile using six windows are given in Figure 3.1. Figure 3.2 compares
the free energy profile obtained with six windows to that obtained with 19 windows.
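The quoted CPU times can be cross-checked with simple arithmetic, sketched here in Python. The sketch assumes that the cost scales linearly with the total simulated time, which is an assumption made for the estimate and not a statement from the timing data; the variable and function names are illustrative.

```python
# Values quoted in the text for the full free energy curve
FULL_WINDOWS, FULL_PS, FULL_CPU_HOURS = 19, 650, 1300

# Assumed linear scaling: CPU hours per picosecond of simulation
cpu_hours_per_ps = FULL_CPU_HOURS / (FULL_WINDOWS * FULL_PS)

def estimated_cpu_hours(n_windows, ps_per_window):
    """Estimated cost of a set of windows under the linear-scaling assumption."""
    return n_windows * ps_per_window * cpu_hours_per_ps

# Six windows with 50 ps of equilibration and 300 ps of data collection each
partial = estimated_cpu_hours(6, 350)
print(f"{cpu_hours_per_ps:.3f} CPU hours per ps, about {partial:.0f} CPU hours for six windows")
```

The estimate of roughly 220 CPU hours is close to the approximately 230 CPU hours reported for the six-window calculation.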
Figure 3.1: Free energy profiles of E. coli wild-type DHFR using WHAM (blue dashed) and UI (red). (a) Full free energy curve using 19 windows. (b) Partial free energy curve using 6 windows.
[Figure 3.1, panels (a) and (b). Horizontal axis: collective reaction coordinate.]
Figure 3.2: Free energy profiles of wild-type DHFR using UI method. Full free energy curve using 19 windows is in blue, and the partial free energy curve using 6 windows is in red.
The structure of the initial snapshot in the central cell is given in Figure 3.3. It is
composed of 21 pieces involving periodic images because the protein has drifted from the middle
of the central cell during previous simulations. During MD simulations, programs such as
GROMOS use periodic boundary conditions, so this drifting has no effect on the calculations.
However, the programs that we used to make mutations for the calculations discussed in the next
chapter do not use periodic boundary conditions. Therefore, it is necessary to reconstruct the
system and place it in the middle of the central cell before making mutations. This involves three
steps:
1. Sewing the periodic images and centering the protein at the middle of the box.
2. Resolvating the protein.
3. Re-equilibrating the system.
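Step 1 of this list can be illustrated with a short Python sketch. It is not the UTAILOR routine. It assumes a cubic box of edge length `box` with its origin at a corner (rather than the truncated octahedron used here) and a single molecule in which every atom can be reached from atom 0 through bonds; the function names are hypothetical.

```python
from collections import deque

def sew_molecule(coords, bonds, box):
    """Unwrap one bonded molecule so that no bond crosses a periodic boundary."""
    neighbors = {i: [] for i in range(len(coords))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    sewn = {0: list(coords[0])}
    queue = deque([0])
    while queue:
        a = queue.popleft()
        for b in neighbors[a]:
            if b in sewn:
                continue
            # Place b at the periodic image closest to its already placed neighbor
            sewn[b] = [ra + (rb - ra) - box * round((rb - ra) / box)
                       for ra, rb in zip(sewn[a], coords[b])]
            queue.append(b)
    return [sewn[i] for i in range(len(coords))]

def center_in_box(coords, box):
    """Shift the geometric center of the coordinates to the center of the box."""
    n = len(coords)
    center = [sum(r[d] for r in coords) / n for d in range(3)]
    return [[r[d] - center[d] + 0.5 * box for d in range(3)] for r in coords]
```

Walking the bond graph outward from one atom guarantees that every bond is restored to its physical length, after which the whole molecule can be translated as a rigid unit.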
Figure 3.3: Original reactant state snapshot. Periodic images manifest to make the protein structure appear broken. This figure was created using VMD.37
First the system is stripped of all non-crystal waters. Sewing the periodic images was
done using the UTAILOR routine implemented in the DLPROTEIN38 MD simulation package.
The UTAILOR procedure uses bond information to reconnect every bond whose two atoms do not
both lie in the central cell. To move the crystal waters with the protein,
first, the geometrical center of the sewed protein was placed at the center of the central box,
shifting the crystal waters with the protein. This positions the crystal waters that were not sewed
during the UTAILOR procedure outside the central box. Then the truncated octahedral periodic
boundary conditions were applied to move the crystal waters back into the central cell. During
this process, we also had to compensate for the difference in the origin of the coordinate system
between GROMOS and DLPROTEIN. The resulting structure was resolvated. The final system
contained 4097 SPC/E39 water molecules in a truncated octahedral periodic box with a distance of
66.55 Å between opposing square surfaces (Figure 3.4). The SPC/E water model includes
Lennard-Jones and electrostatic interactions. The water molecules were equilibrated for 500 ps
while freezing the protein structure. Then four cycles of restrained steepest descent
minimizations and 15 ps MD simulations were done on the reactant state potential energy surface
with gradual release of a restraining force constant. Figure 3.5 illustrates this procedure. The
initial restraining force constant used was 59.75 kcal mol$^{-1}$ Å$^{-2}$ and was halved with each cycle.
Finally, 300 ps of MD simulation was done on the EVB reactant potential surface prior to starting
the MD simulations for the windows corresponding to different mapping potentials. Each
window was equilibrated for 120 ps, followed by 250 ps of data collection.
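The restraint release schedule described above amounts to four force constants, each half of the previous one. A minimal Python sketch follows; the function name is illustrative, and the minimization and MD steps themselves are only indicated by a comment.

```python
def restraint_schedule(initial_fc=59.75, n_cycles=4):
    """Restraining force constants (kcal mol^-1 A^-2) for successive cycles."""
    return [initial_fc / 2 ** cycle for cycle in range(n_cycles)]

for cycle, fc in enumerate(restraint_schedule(), start=1):
    # Each cycle: restrained steepest descent minimization, then 15 ps of restrained MD
    print(f"cycle {cycle}: fc = {fc:.5f} kcal mol^-1 A^-2")
```

The four values are 59.75, 29.875, 14.9375, and 7.46875 kcal mol$^{-1}$ Å$^{-2}$.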
Figure 3.4: Resolvated structure with no periodic images. This figure was created using VMD.37
Figure 3.5: Summary of steps of the restrained minimization and MD simulation procedure. fc is the restraining force constant which is halved with each cycle i.
The resulting free energy barrier is 15.6 kcal/mol with WHAM (Figure 3.6). UI provides
a 15.4 kcal/mol barrier. When we analyzed the data using 70 ps of equilibration followed by 300
ps of data collection per window or, alternatively, 170 ps of equilibration followed by 200 ps of
data collection per window, we obtained a free energy barrier of 15.7 kcal/mol or 15.6 kcal/mol,
respectively. Thus, the free energy barrier is converged to within 0.3 kcal/mol using six windows
and 120 ps of equilibration followed by 250 ps of data collection for each window.
[Figure 3.5 flowchart: wild-type reactant structure; set i = 1; restrained minimization of reactant (fc = k); restrained MD of reactant (fc = k, 15 ps); set i = i + 1 and k = k/2; repeat while i ≤ 4; then MD of EVB reactant (300 ps); MD for 6 windows with mapping potentials (370 ps each); generate free energy curve with WHAM or UI.]
Figure 3.6: Partial free energy profile of E. coli wild-type DHFR with WHAM (blue dashed) and UI (red) using the sewed structure and 6 windows.
Based on previous observations, we also attempted to generate the free energy barrier
using fewer windows with the UI method. Using four windows with $\lambda_i$ = 0.050,
0.250, 0.500 and 0.625 results in a 15.1 kcal/mol free energy barrier, while using $\lambda_i$ = 0.050,
0.125, 0.500 and 0.625 results in a 15.2 kcal/mol free energy barrier. Using only three windows
with the UI method did not result in a meaningful free energy barrier. We also noted that the
WHAM method using only four windows does not generate a meaningful result. This is expected
with WHAM, as it relies more heavily on the overlap between windows.
3.5 Conclusions
In this chapter, we described the theory behind predicting enzymatic reaction rates
according to the activation free energy barrier. Within transition state theory, the reaction rate
depends exponentially on the free energy barrier. The free energy barrier of a hydrogen transfer reaction can be generated by carrying out
EVB/MD simulations with an energy gap collective reaction coordinate. The transferring
hydrogen was represented by a Morse potential. A mapping potential is used to drive the reaction
over the barrier. The MD simulations were carried out for a series of mapping potentials
allowing sampling along the entire energy gap reaction coordinate.
We also described two statistical methods that can be used to generate the free energy
curve. The WHAM method uses an iterative procedure to combine the biased PMF curves
obtained with different mapping potentials. The UI method is based on the derivative of the PMF
with respect to the reaction coordinate rather than the PMF itself. There are two significant
advantages of UI over WHAM. The first advantage is that UI does not rely on a binning
procedure to generate histograms and therefore reduces the statistical error and converges
efficiently. The second advantage is that UI can provide accurate PMF curves efficiently even
with a small number of windows that do not overlap significantly. Thus, UI is a promising
method for generating accurate PMF curves for large systems for which sampling may be limited.
We established that for DHFR, it is not necessary to generate the full free energy profile
to calculate the free energy barrier. Generating only partial free energy profiles with a smaller
number of windows significantly reduces the computational cost. To facilitate mutant studies, we
also centered the protein in the central cell based on an initial structure that was generated with
periodic boundary conditions. The free energy barrier calculated with this initial structure agrees
with previous calculations and was similar for both the WHAM and UI methods. We showed
that the UI method can generate comparable free energy barriers even with only four windows.
3.6 References
(1) Agarwal, P. K.; Billeter, S. R.; Hammes-Schiffer, S. Journal of Physical Chemistry B 2002, 106, 3283.
(2) Peter L. Cummins, J. E. G. Journal of Computational Chemistry 1990, 11, 791.
(3) Andrés, J.; Safont, V. S.; Martins, J. B. L.; Beltrán, A.; Moliner, V. Journal of Molecular Structure: THEOCHEM 1995, 330, 411.
(4) Andrés, J.; Moliner, V.; Safont, V. S.; Domingo, L. R.; Picher, M. T.; Krechl, J. Bioorganic Chemistry 1996, 24, 10.
(5) Castillo, R.; Andres, J.; Moliner, V. Journal of the American Chemical Society 1999, 121, 12140.
(6) Wong, K. F.; Selzer, T.; Benkovic, S. J.; Hammes-Schiffer, S. Proc Natl Acad Sci U S A 2005, 102, 6807.
(7) Watney, J. B.; Soudackov, A. V.; Wong, K. F.; Hammes-Schiffer, S. Chemical Physics Letters 2006, 418, 268.
(8) Wang, Q.; Hammes-Schiffer, S. Journal of Chemical Physics 2006, 125, 184102.
(9) Billeter, S. R.; Webb, S. P.; Iordanov, T.; Agarwal, P. K.; Hammes-Schiffer, S. Journal of Chemical Physics 2001, 114, 6925.
(10) Hammes-Schiffer, S. Current Opinion in Structural Biology 2004, 14, 192.
(11) Pang, J.; Pu, J.; Gao, J.; Truhlar, D. G.; Allemann, R. K. Journal of the American Chemical Society 2006, 128, 8015.
(12) Garcia-Viloca, M.; Truhlar, D. G.; Gao, J. Biochemistry 2003, 42, 13558.
(13) Garcia-Viloca, M.; Truhlar, D. G.; Gao, J. Journal of Molecular Biology 2003, 327, 549.
(14) Garcia-Viloca, M.; Alhambra, C.; Truhlar, D. G.; Gao, J. Journal of Computational Chemistry 2003, 24, 177.
(15) Thorpe, I. F.; Brooks, C. L., 3rd. Proteins 2004, 57, 444.
(16) Rod, T. H.; Radkiewicz, J. L.; Brooks, C. L., 3rd. Proc Natl Acad Sci U S A 2003, 100, 6980.
(17) Brooks, C. L.; Karplus, M.; Pettitt, B. M. Proteins: a theoretical perspective of dynamics, structure, and thermodynamics; J. Wiley: New York, 1988.
(18) Berman, H. M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T. N.; Weissig, H.; Shindyalov, I. N.; Bourne, P. E. Nucleic Acids Research 2000, 28, 235.
(19) Wigner, E. Physical Review 1932, 40, 749.
(20) Warshel, A. Computer Modeling of Chemical Reactions in Enzymes and Solutions; John Wiley & Sons, Inc.: New York, 1991.
(21) Warshel, A. Journal of Physical Chemistry 1982, 86, 2218.
(22) Marcus, R. A. Annual Review of Physical Chemistry 1964, 15, 155.
(23) Zusman, L. D. Chemical Physics 1980, 49, 295.
(24) Chakravorty, D. K.; Kumarasiri, M.; Soudackov, A. V.; Hammes-Schiffer, S. Journal of Chemical Theory and Computation 2008, 4, 1974.
(25) Torrie, G. M.; Valleau, J. P. Chemical Physics Letters 1974, 28, 578.
(26) Ferrenberg, A. M.; Swendsen, R. H. Physical Review Letters 1989, 63, 1195.
(27) Ferrenberg, A. M.; Swendsen, R. H. Phys. Rev. Lett. 1988, 61, 2635.
(28) Kumar, S.; Rosenberg, J. M.; Bouzida, D.; Swendsen, R. H.; Kollman, P. A. J. Comput. Chem. 1992, 13, 1011.
(29) Roux, B. Comput. Phys. Commun. 1995, 91, 275.
(30) Souaille, M.; Roux, B. Comput. Phys. Commun. 2001, 135, 40.
(31) Kastner, J.; Thiel, W. J. Chem. Phys. 2005, 123, 144104.
(32) Kastner, J.; Thiel, W. J. Chem. Phys. 2006, 124, 234106.
(33) Mathematica, Version 6.0; Wolfram Research, Inc.: Champaign, IL, 2007.
(34) Wong, K. F.; Watney, J. B.; Hammes-Schiffer, S. Journal of Physical Chemistry B 2004, 108, 12231.
(35) van Gunsteren, W. F.; Billeter, S. R.; Eising, A. A.; Hunenberger, P. H.; Kruger, P.; Mark, A. E.; Scott, W. R. P.; Tironi, I. G. Biomolecular simulation: The GROMOS96 manual and user guide; VdF Hochschulverlag, ETH Zurich: Zurich, 1996.
(36) Wong, K. F.; Watney, J. B.; Hammes-Schiffer, S. J. Phys. Chem. B 2004, 108, 12231.
(37) Humphrey, W.; Dalke, A.; Schulten, K. J Mol Graph 1996, 14, 33.
(38) Melchionna, S.; Cozzini, S. DLPROTEIN 2.1; Rome, Italy, 2001.
(39) Berendsen, H. J. C.; Grigera, J. R.; Straatsma, T. P. Journal of Physical Chemistry 1987, 91, 6269.
Chapter 4
Ranking Mutants of Dihydrofolate Reductase According to the Hydride Transfer Rates
Reproduced in part with permission from M. D. Kumarasiri, G.A. Baker, A.V. Soudackov and S. Hammes-Schiffer, submitted to Journal of Physical Chemistry B.
© 2008 American Chemical Society
4.1 Introduction
Computational protein design is a rapidly growing field with potential applications in
pharmaceuticals, biotechnology, and other industrial processes. Since most protein design
protocols generate large numbers of designs,1-3 an efficient method for ranking these designs
according to specified criteria is essential. This type of ranking process enables the selection of a
smaller number of designs for experimental characterization. A variety of criteria could be used
in this ranking process, including protein stability, substrate binding energy, and activity. Often
the objective is to rank protein designs according to the enzyme-catalyzed reaction rate. Within
the framework of transition state theory, the rate of a chemical reaction is exponentially related to
the free energy barrier. In this case, the protein designs are ranked according to the relative free
energy barriers of the chemical step of interest.
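The exponential relation between rate and barrier referred to here is the transition state theory expression $k = (k_{\mathrm{B}} T / h) \exp(-\Delta G^{\ddagger} / RT)$, so that a change in rate upon mutation corresponds to $\Delta\Delta G^{\ddagger} = -RT \ln(k_{\mathrm{mutant}} / k_{\mathrm{wild\ type}})$. The following Python sketch illustrates the conversion. The default temperature of 300 K is the simulation temperature and is used here only as an assumption, since the temperature of the kinetic measurements is not stated in this section; the function names are illustrative.

```python
import math

K_B = 1.380649e-23        # Boltzmann constant, J/K
H = 6.62607015e-34        # Planck constant, J s
R = 1.987204259e-3        # gas constant, kcal mol^-1 K^-1

def barrier_from_rate(rate, temperature=300.0):
    """Free energy barrier (kcal/mol) from k = (k_B T / h) exp(-dG/RT), rate in s^-1."""
    return -R * temperature * math.log(rate * H / (K_B * temperature))

def barrier_change(rate_mutant, rate_wild_type, temperature=300.0):
    """Change in the barrier upon mutation; positive when the mutant is slower."""
    return -R * temperature * math.log(rate_mutant / rate_wild_type)
```

Under these assumptions, a 1000-fold decrease in rate corresponds to an increase in the barrier of about 4.1 kcal/mol, which sets the scale of the barrier changes that the ranking procedure must resolve.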
In this chapter, we present an efficient computational approach for ranking mutant
enzymes according to the relative free energy barriers associated with the catalyzed chemical
reaction. The mutant enzymes are generated using a rotamer library4 in conjunction with
restrained minimizations and molecular dynamics simulations. For each mutant, the partial free
energy curve for the chemical step of interest is calculated along a collective reaction coordinate
using biased molecular dynamics simulations and umbrella integration5,6 with an empirical
valence bond potential.7,8 This procedure does not include any type of parameter fitting for the
mutants. Each step in the procedure can be automated and optimized for the specific enzyme
system.
We apply this ranking approach to dihydrofolate reductase (DHFR), which catalyzes the
reduction of 7,8-dihydrofolate (DHF) into 5,6,7,8-tetrahydrofolate (THF) using the coenzyme
nicotinamide adenine dinucleotide phosphate (NADPH).9 In this reaction, the hydride is
transferred from the C4 position of the NADPH cofactor to the C6 position of the protonated
dihydrofolate substrate. The product THF is essential for the synthesis of purines, pyrimidines,
and certain amino acids. As a result, DHFR inhibition has been promoted as a pharmacological
target for antibacterial agents and anticancer drugs.10-14 Furthermore, the hydride transfer
reaction catalyzed by DHFR and its mutants has been the subject of a wide variety of
experimental15-29 and theoretical30-43 studies. In particular, kinetic measurements on 15 single
mutant DHFR enzymes indicate hydride transfer rates ranging from 0.2 s$^{-1}$ to 319 s$^{-1}$ at pH 7.16-21
The objective of this chapter is to use the computational approach outlined above to rank these 15
DHFR mutant enzymes according to the rate of hydride transfer and compare the results to the
available experimental data.
In Section 4.2, we present the general computational ranking approach and describe its
implementation for studying the hydride transfer reaction in DHFR. Section 4.3 compares the
experimental and calculated changes in free energy barriers for this enzyme-catalyzed reaction.
The conclusions are presented in Section 4.4.
4.2 Methods
We calculated the free energy barrier for hydride transfer in wild-type DHFR and 15
mutants to determine the change in the free energy barrier upon mutation. These 15 mutants were
identified using a general literature search for experimental measurements of the hydride transfer
rate at pH 7.44 The hydride transfer reaction catalyzed by DHFR is depicted in Figure 4.1. Our
group studied this reaction previously in wild-type DHFR,31,32,45 as well as a few selected
mutants,33,34 with a hybrid quantum-classical molecular dynamics approach, which includes the
nuclear quantum effects of the transferring hydrogen with grid-based or path integral methods.
Here we use the same simulation system and EVB potential but do not include the nuclear
quantum effects because they are expected to be similar for all mutants and therefore will not
significantly impact the changes in the free energy barrier upon mutation. The reasoning behind
this was provided in Section 3.4.
[Figure 4.1 reaction scheme: H3F+ + NADPH → H4F + NADP+]
Figure 4.1: Hydride transfer reaction from the NADPH cofactor to the protonated dihydrofolate substrate H3F+ to form the products tetrahydrofolate H4F and NADP+. Figure reproduced with permission from Ref. 8.
In previous simulations, the initial coordinates were obtained from a crystal structure of
Escherichia coli DHFR complexed with NADP+ and folate (PDB code 1rx2).22 Here, we started
with a snapshot of the equilibrated reactant state obtained from previous simulations.32,45 We
followed the procedure described in Section 3.4 to generate a single connected structure in the
central cell. The present simulation system has the protein, the substrate H3F+, and the
cofactor NADPH solvated by 4097 explicit water molecules in a truncated octahedral periodic box
with a distance of 66.55 Å between opposing square faces. The protonation states of the amino
acids are the same as those used in previous work, which were determined from the pKa values at
pH=7 and hydrogen bonding environments. The potential energy surface of the hydride transfer
is represented by a two-state empirical valence bond (EVB) potential,7 where state 1 corresponds
to the transferring hydrogen atom bonded to the donor carbon, and state 2 corresponds to the
transferring hydrogen atom bonded to the acceptor carbon. The diagonal elements of the EVB
Hamiltonian, $V_{11}(\mathbf{R})$ and $V_{22}(\mathbf{R})$, are based on the GROMOS force field46 with the
covalent bond involving the transferring hydrogen represented by a Morse potential.31 The two
constant EVB parameters corresponding to the relative energy of the two valence bond states and
the coupling between these states are 65.25 kcal/mol and 34.66 kcal/mol, respectively, and are fixed at these
values for all mutants. These parameters were determined elsewhere by fitting the results
obtained from hybrid quantum/classical molecular dynamics simulations of wild-type DHFR to
the experimental free energies of reaction and activation obtained from the pH-independent
forward and reverse hydride transfer reaction rate constants.32 In the present calculations, the
transferring hydrogen nucleus is treated classically, and the results are compared to experimental
hydride transfer rates for the mutants measured at pH ≈ 7. For this reason, we focus on the
changes in the free energy barriers relative to the wild-type DHFR rather than the absolute free
energy barriers.
The free energy barrier for each mutant is estimated by generating the potential of mean
force for the hydride transfer reaction along a collective reaction coordinate using umbrella
sampling techniques. The collective reaction coordinate is defined as the difference between the
energies of the two VB states
$$\Lambda(\mathbf{R}) = V_{11}(\mathbf{R}) - V_{22}(\mathbf{R}), \tag{4.1}$$
where $V_{11}(\mathbf{R})$ and $V_{22}(\mathbf{R})$ are the energies of VB states 1 and 2, respectively, and $\mathbf{R}$ denotes all
nuclear coordinates. The molecular dynamics simulations are performed with mapping potentials
defined as linear combinations of the energies of the two VB states7
$$V_{\mathrm{map}}(\mathbf{R}; \lambda_m) = (1 - \lambda_m) V_{11}(\mathbf{R}) + \lambda_m V_{22}(\mathbf{R}). \tag{4.2}$$
As the mapping parameter λm is varied from zero to unity, the reaction progresses from the
reactant state to the product state. The potential of mean force is obtained by propagating a series
of independent molecular dynamics simulations with different mapping potentials and combining
them using the weighted histogram analysis method (WHAM)47 or umbrella integration (UI).5,6
The details of the molecular dynamics simulations and WHAM and UI implementations are given
in Sections 3.3 and 3.4.
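Equations 4.1 and 4.2 translate directly into code. The Python sketch below is illustrative only: the function names are hypothetical, and the ground-state energy is included as the standard lowest eigenvalue of a two-state Hamiltonian with constant coupling $V_{12}$, which is the surface on which the unbiased free energy is defined.

```python
import math

def energy_gap(v11, v22):
    """Collective reaction coordinate of Eq. 4.1."""
    return v11 - v22

def mapping_potential(v11, v22, lam):
    """Mapping potential of Eq. 4.2 for the mapping parameter lam."""
    return (1.0 - lam) * v11 + lam * v22

def evb_ground_state(v11, v22, v12):
    """Lowest eigenvalue of the 2x2 EVB Hamiltonian with coupling v12."""
    return 0.5 * (v11 + v22) - 0.5 * math.sqrt((v11 - v22) ** 2 + 4.0 * v12 ** 2)
```

Since the mapping potential equals $V_{11} - \lambda_m \Lambda$, increasing $\lambda_m$ tilts the sampled surface linearly along the energy gap, which is what drives the system from the reactant toward the product.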
In this chapter, our objective is to estimate the free energy barriers for the mutants as
efficiently as possible. Thus, we decreased the number of windows and the amount of
equilibration and data collection for each window and confirmed that the desired accuracy for the
free energy barrier is still maintained. For this purpose, we generated the free energy profile
using WHAM with only six windows: $\lambda_m$ = 0.050, 0.125, 0.250, 0.375, 0.500 and 0.625. As
stated in Section 3.4, the starting configuration for each window was obtained from the previous
window after 20 ps of equilibration. Each window was equilibrated for a total of 120 ps,
followed by 250 ps of data collection. The resulting wild-type free energy barrier of 15.6
kcal/mol is similar to the previously obtained barrier with 20 windows and significantly more
equilibration and data collection.32,45
Previously we showed that UI provides the same free energy profiles as WHAM but
requires fewer windows for efficient convergence.8 Using UI to generate the free energy profile,
we obtained a wild-type free energy barrier of 15.4 kcal/mol with the six windows given above.
We obtained a free energy barrier of 15.1 kcal/mol using only four of these windows: $\lambda_m$ =
0.050, 0.250, 0.500 and 0.625. Based on this analysis, we compared the free energy barrier
changes for the 15 mutants using WHAM with six windows and UI with both four and six
windows. As will be shown below, the WHAM and UI methods with six windows lead to nearly
identical results, and the UI method with four windows leads to qualitatively similar results with
minor quantitative discrepancies.
The mutant structures of DHFR were generated from an equilibrated wild-type structure.
The coordinates of this structure are given in Supporting Information. The profix utility in the
JACKAL suite of programs4,48 was used to generate the initial mutant structures from this wild-
type structure. The profix utility uses a backbone rotamer library, a side-chain rotamer library,
and distance geometry constraints to sample segment conformations. Missing residues are
reconstructed with the Nest and Scap modules4 in conjunction with the all-atom AMBER96
forcefield.49 In our procedure, the coordinates of the residue to be mutated were deleted from the
pdb file, the residue name was changed to the new name, and the profix utility was run for the
modified pdb file. Appendix A provides additional details of profix usage and mutation
procedure. In addition to the mutated residue, we observed that the conformations of
approximately four residues on each side of the mutation site are also altered during this
procedure.
The resulting mutant structures were subjected to four cycles of restrained minimizations
and molecular dynamics simulations on the pure reactant potential energy surface with a gradual
release of the atomic restraints. The initial restraining force constant on all atoms with respect to
the initial structure was 59.75 kcal mol$^{-1}$ Å$^{-2}$ and was halved with each cycle of the procedure.
Each cycle consisted of a steepest descent geometry optimization followed by 15 ps of molecular
dynamics. Subsequently, 650 ps of equilibration on the EVB reactant potential energy surface
with $\lambda_m$ = 0.05 was performed. After this equilibration, molecular dynamics simulations with
mapping potentials corresponding to $\lambda_m$ = 0.050, 0.125, 0.250, 0.375, 0.500 and 0.625 were
propagated. The starting configuration for each window was obtained from the previous window
after 20 ps of equilibration, and each window was equilibrated for a total of 120 ps, followed by
250 ps of data collection.
In order to confirm that the initial equilibration time of 650 ps on the EVB reactant
surface was sufficient, we performed the calculations for all mutants with an initial equilibration
time of only 300 ps on the EVB reactant surface. The free energy barriers differ from those
obtained with 650 ps of equilibration by less than 0.5 kcal/mol for all mutants except G121P,
G121V, D122A, and D27E. For these four mutants, we repeated the calculations with an initial
equilibration time of 850 ps on the EVB reactant surface and found that the resulting free energy
barriers are within 0.2 kcal/mol of those obtained with 650 ps of equilibration. Based on these
tests, we concluded that an initial equilibration time of 650 ps on the EVB reactant potential
energy surface is sufficient for all mutants studied.
The steps for calculating the free energy barrier change upon mutation, along with the
approximate CPU times for an Intel Xeon 3.0 GHz processor, are as follows:
1. Generation of mutant structure from wild-type using the profix utility (<2 CPU
minutes)
2. Restrained minimizations/molecular dynamics on pure reactant surface (≈8 CPU
hours)
3. 650 ps equilibration on EVB reactant surface with $\lambda_m$ = 0.05 (≈65 CPU hours)
4. 120 ps equilibration and 250 ps data collection for each window (≈37 CPU hours per
window; the windows may be run in parallel on separate processors after the initial
20 ps per window)
5. Generation of free energy profiles using WHAM or UI (<2 CPU minutes)
This procedure is depicted in Figure 4.2. Note that this procedure does not include any
free parameters or parameter fitting for the mutant calculations. The individual steps have been
automated. Appendix B provides the essential scripts written for automation purposes. For each
mutant, the first three steps and the fifth step require only a single processor, and the fourth step
can be run in parallel using four to six processors, depending on the number of windows used.
Thus, the free energy barrier changes for 16 enzymes can be evaluated in approximately one
week using 32 processors.
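The one-week estimate can be checked with simple arithmetic from the approximate per-step CPU times listed above. The following is only an illustrative sketch: the function name `cpu_budget` is hypothetical, the two steps that take less than 2 CPU minutes are ignored, and perfect load balancing across processors is assumed, so the result is a lower bound on the wall-clock time.

```bash
#!/bin/bash
# Rough CPU budget for the mutant protocol (illustrative sketch).
# Arguments: number of enzymes, windows per enzyme, available processors.
# Per-step costs (CPU hours) are the approximate values quoted in the text.
cpu_budget() {
    awk -v n="$1" -v w="$2" -v p="$3" 'BEGIN {
        restr = 8     # restrained minimizations/MD on the reactant surface
        evb   = 65    # 650 ps equilibration on the EVB reactant surface
        win   = 37    # equilibration plus data collection, per window
        per   = restr + evb + w * win   # CPU hours per enzyme
        total = n * per                 # CPU hours for all enzymes
        # Prints: hours per enzyme, total hours, ideal wall-clock days
        printf "%d %d %.1f\n", per, total, total / p / 24
    }'
}

cpu_budget 16 6 32
```

For 16 enzymes with six windows each on 32 processors, this gives 295 CPU hours per enzyme, 4720 CPU hours in total, and about 6.1 days of ideal wall-clock time, in line with the estimate of approximately one week.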
Figure 4.2: Summary of the steps for the generation of the initial mutant structure, equilibration, and calculation of the free energy barrier. Here fc is the force constant of the position restraints with respect to the initial structure during the restrained minimizations and molecular dynamics simulations.
[Flowchart boxes, in order: wild-type reactant structure; profix to mutate; i = 1; restrained minimization of reactant (fc = k); restrained MD of reactant (fc = k, 15 ps); decision i ≤ 4 (Y/N) with the update i = i + 1, k = k/2; MD of EVB reactant (650 ps); 6 windows of MD with mapping potentials (370 ps each); generate free energy curve with WHAM or UI.]
4.3 Results
Table 4.1 provides the experimentally determined rates for hydride transfer catalyzed by
the 15 mutants. The locations of these mutation sites in the DHFR structure are depicted in
Figure 4.3. Note that these 7 mutation sites are distributed throughout the protein. The slowest
mutant exhibits a decrease in the hydride transfer rate by a factor of ~1000, and the fastest mutant
exhibits an increase in the hydride transfer rate by a factor of ~1.5. The associated experimental
free energy barriers were obtained using the standard transition state theory rate constant
expression. All of the experiments were performed at pH 7 except for those on the D27C and D27E
mutants, which were performed at pH 7.3. Based on the experimentally observed hydride
transfer rate changes in wild-type DHFR around pH 7, we expect the rate changes for the D27C and
D27E mutants from pH 7.3 to pH 7 to be within our numerical accuracy. The experimental
changes in the free energy barrier, $\Delta\Delta G^{\dagger}_{\mathrm{expt}}$, are defined relative to the wild-type free energy
barrier at pH 7 and were calculated using the transition state theory rate expression. We also
note that previously the transmission coefficient was calculated to be 0.88 for wild-type DHFR,31
and we do not expect the degree of recrossings of the dividing surface to differ significantly for
the mutants.
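Because the transition state theory prefactor is the same for the mutant and the wild type at a given temperature, it cancels when the two rate constants are divided, so the experimental barrier change reduces to −RT ln(k_mut/k_WT). The following is a minimal sketch of that conversion; the function name `ddg_expt` is hypothetical, R is expressed per mole in kcal/(mol K), and T = 300 K is taken from the caption of Table 4.1.

```bash
#!/bin/bash
# Experimental change in free energy barrier from two TST rate constants:
#   ddG = -R T ln(k_mut / k_wt)
# Arguments: k_mut, k_wt (same units), optional temperature in K (default 300).
ddg_expt() {
    awk -v kmut="$1" -v kwt="$2" -v T="${3:-300}" 'BEGIN {
        R = 0.0019872                      # gas constant in kcal/(mol K)
        printf "%.1f\n", -R * T * log(kmut / kwt)   # log() is the natural log
    }'
}

ddg_expt 0.2 220    # G121L relative to wild type
ddg_expt 319 220    # S148D relative to wild type
```

With the rate constants of Table 4.1 this prints 4.2 and -0.2 kcal/mol for G121L and S148D, respectively.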
Table 4.1: The experimentally determined hydride transfer rate constants for E. coli DHFR mutants at pH ≈7 and 300 K. These rate constants were measured at pH 7 for wild-type DHFR and all mutants except D27E and D27C, which were measured at pH 7.3.
Mutant  Ref.  k_hyd (s^-1)    Mutant  Ref.  k_hyd (s^-1)
G121L   16    0.2             D27E    17    40
G121P   16    0.5             S49A    20    120
G121V   16    1.4             S148A   21    157
D27C    17    1.7             S148K   21    162
G121S   18    3.7             G67V    18    190
D122A   19    4.0             WT      16    220
D122S   19    5.9             H149Q   44    234
D122N   19    9.4             S148D   21    319
Figure 4.3: Depiction of the mutation sites of DHFR. The cofactor is green, the substrate is magenta, and the mutated residues are orange. This figure was created using VMD.50
Table 4.2 provides a comparison of the experimental and calculated changes in the free
energy barrier relative to wild-type DHFR for the series of 15 mutants. The calculated changes in
the free energy barrier are very similar using WHAM and UI with six windows. The results
obtained using UI with only four windows ($\lambda_m$ = 0.050, 0.250, 0.500 and 0.625) are qualitatively
similar to those obtained using UI with six windows, but the quantitative changes in the free
energy barrier differ by as much as 0.7 kcal/mol and the correlation coefficient51 is only R = 0.78.
Using a different set of four windows, specifically $\lambda_m$ = 0.050, 0.125, 0.500 and 0.625, yields
similar results with R = 0.77. We expected that the second set of four windows might yield better
results than the first set because it samples the reactant region more densely, but we
did not observe such a trend. We also attempted using three windows ($\lambda_m$ = 0.050, 0.500 and
0.625), but no meaningful free energy barriers could be predicted. Figure 4.4 depicts a correlation
plot of the results obtained using UI with six windows, in which case the correlation coefficient51
is R = 0.82. Given the significant approximations underlying the computational approach, this
level of agreement between the calculated and experimental data is encouraging. Note that the
computational approach predicts the correct direction of the change in free energy barrier for all
15 mutants.
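The correlation coefficient quoted above can be reproduced from the tabulated barrier changes. The following is a sketch only: the function name `pearson_r` and the variable `table42` are hypothetical, and the value pairs are the experimental and six-window UI entries transcribed from Table 4.2.

```bash
#!/bin/bash
# Pearson correlation coefficient of two whitespace-separated columns
# read from standard input (column 1 = x, column 2 = y).
pearson_r() {
    awk '{
        n++; sx += $1; sy += $2
        sxx += $1 * $1; syy += $2 * $2; sxy += $1 * $2
    }
    END {
        cov = sxy - sx * sy / n     # n times the covariance
        vx  = sxx - sx * sx / n     # n times the variance of x
        vy  = syy - sy * sy / n     # n times the variance of y
        if (n < 2 || vx * vy <= 0) { print "nan"; exit }
        printf "%.2f\n", cov / sqrt(vx * vy)
    }'
}

# Experimental and UI (six windows) barrier changes in kcal/mol,
# one mutant per line, in the row order of Table 4.2.
table42='4.2 4.8
3.6 2.6
3.0 1.9
2.9 2.7
2.4 2.8
2.4 2.5
2.2 1.8
1.9 2.9
1.0 2.0
0.4 1.4
0.2 0.8
0.2 1.7
0.1 0.9
0.0 -1.0
-0.2 -1.6'

printf '%s\n' "$table42" | pearson_r
```

With these 15 pairs the sketch prints 0.82, matching the value reported for UI with six windows.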
Figure 4.4 raises the question of what underlies the differences in the rates
among the mutants. The correlation between the experimental barrier heights and the calculated
barrier heights for the mutants with faster rates, S148D, H149Q, G67V, S148K, S148A, and
S49A, appears to differ from that for the mutants with slower rates. However, according to Figure
4.4, no clear relationship is evident between the positions of the mutation sites relative to the
active site and the rates of hydride transfer. We also note that previous studies indicate that the
thermally averaged donor-acceptor distance at the reactant state is larger than that at the transition
state.31,52 Some structural changes that may affect the rate of hydride transfer have also been
observed previously. For example, in the case of S148D, which is the fastest in the group,
Benkovic and coworkers suggest that substituting Ser148 with aspartic acid can strengthen the
interaction between the βG-βH and Met20 loops, stabilizing the product state.53 For the slower
Gly121 mutants, it has been suggested that the mutation at position 121 disrupts the network of
coupled promoting motions in DHFR, which manifests as conformational changes that occur
during the reaction.30,54 This results in an increased free energy barrier. Thermally averaged
distances can be used to propose such chemical insights. This work is currently in progress.
Table 4.2: The change in the free energy barrier relative to the wild-type free energy barrier for a series of
mutants for different equilibration periods. The experimental free energy barriers are obtained from the
transition state theory rate constant expression $k = (k_B T/h)\exp(-\Delta G^{\dagger}/k_B T)$ using the experimentally
determined rate constants in Table 4.1. The calculated free energy barriers are obtained using WHAM with
six windows, UI with six windows, and UI with four windows. The notation “WHAM,350” denotes
WHAM with 350 ps of MD on the EVB reactant surface in the equilibration procedure. “WHAM,650” and
“WHAM,850” are defined analogously. UI4 uses $\lambda_m$ = 0.050, 0.250, 0.500 and 0.625 and UI4’ uses $\lambda_m$
= 0.050, 0.125, 0.500 and 0.625. All free energies are given in kcal/mol.
                        ΔΔG† by method
Mutant   expt   WHAM,350   WHAM,650   WHAM,850   UI6,650   UI4,650   UI4',650
G121L     4.2      4.4        4.3                  4.8       4.4       4.1
G121P     3.6      0.7        2.2        2.3       2.6       3.0       2.7
G121V     3.0     -0.1        1.6        1.8       1.9       2.4       2.2
D27C      2.9      2.6        2.6                  2.7       2.9       2.7
G121S     2.4      2.4        2.4                  2.8       3.3       3.1
D122A     2.4     -0.8        2.3        2.3       2.5       2.9       2.8
D122S     2.2      1.3        1.6                  1.8       2.2       1.8
D122N     1.9      2.9        2.8                  2.9       3.4       3.2
D27E      1.0     -0.3        1.9        1.9       2.0       2.6       2.3
S49A      0.4      1.5        1.2                  1.4       1.8       1.4
S148A     0.2      0.4        0.4                  0.8       1.5       1.3
S148K     0.2      2.1        1.7                  1.7       2.0       1.8
G67V      0.1      0.7        0.9                  0.9       1.6       1.5
H149Q     0.0     -0.9       -1.3                 -1.0      -1.0      -1.3
S148D    -0.2     -2.8       -1.6       -1.7      -1.6      -0.9      -1.1
Figure 4.4: Correlation plot for the calculated and experimental changes in the free energy barrier for the 15 mutants, where the calculated free energy barriers were obtained using UI with 6 windows. The correlation coefficient is R = 0.82.
During our calculations, several assumptions were made. Based on chemical intuition, it
was assumed that the proton transfer step of the DHFR reaction precedes the hydride transfer.
The nuclear quantum effects of the transferring hydrogen were assumed to decrease the free
energy barrier of mutants by a similar amount to the wild-type. We also assumed that there are no
recrossings of the dividing surface, based on the high transmission coefficient of 0.88 calculated
previously.31 We represented the hydride transfer reaction by a two state EVB potential, and the
EVB parameters were assumed to be the same for wild-type and the mutants. Additionally, the
off-diagonal coupling terms were approximated by constants. Finally, our simulations suffer
from limitations that are inherent to MD simulations. These include limitations of the force field,
the solvent model, and the reaction field electrostatic treatment in GROMOS, as well as the length of
the simulations, which is governed by the available computer power.
4.4 Conclusions
In this chapter, we presented a computationally efficient approach for evaluating the
impact of mutation on enzyme-catalyzed reaction rates. This procedure requires the generation
and equilibration of the mutant structure, followed by the calculation of a partial free energy
curve using an empirical valence bond potential in conjunction with biased molecular dynamics
simulations and umbrella integration. No parameter fitting is involved in this procedure for the
mutants. The individual steps are automated and optimized for computational efficiency.
We used this approach to calculate the changes in the free energy barrier for hydride
transfer upon mutation of DHFR. The 15 mutants studied were chosen objectively based on a
general literature search for experimental measurements of the hydride transfer rate at pH 7 (ref 44).
The agreement between the calculated and experimental changes in the free energy barrier upon
mutation is encouraging. The computational approach predicts the correct direction of the change
in free energy barrier for all mutants, and the correlation coefficient between the calculated and
experimental data is 0.82. In the future, this approach will be used to predict the impact of
mutations that have not yet been studied experimentally. The feedback between experiment and
theory will guide the further refinement of the procedure. This general approach for ranking
protein designs according to the free energy barrier has implications for protein engineering and
drug design.
4.5 References
(1) Rothlisberger, D.; Khersonsky, O.; Wollacott, A. M.; Jiang, L.; DeChancie, J.; Betker, J.; Gallaher, J. L.; Althoff, E. A.; Zanghellini, A.; Dym, O.; Albeck, S.; Houk, K. N.; Tawfik, D. S.; Baker, D. Nature (London) 2008, 453, 190.
(2) Jiang, L.; Althoff, E. A.; Clemente, F. R.; Doyle, L.; Rothlisberger, D.; Zanghellini, A.; Gallaher, J. L.; Betker, J. L.; Tanaka, F.; Barbas, C. F., III; Hilvert, D.; Houk, K. N.; Stoddard, B. L.; Baker, D. Science 2008, 319, 1387.
(3) Das, R.; Baker, D. Annu. Rev. Biochem. 2008, 77, 363.
(4) Xiang, Z.; Honig, B. J. Mol. Biol. 2001, 311, 421.
(5) Kastner, J.; Thiel, W. J. Chem. Phys. 2005, 123, 144104.
(6) Kastner, J.; Thiel, W. J. Chem. Phys. 2006, 124, 234106.
(7) Warshel, A. Computer Modeling of Chemical Reactions in Enzymes and Solutions; John Wiley & Sons, Inc.: New York, 1991.
(8) Chakravorty, D. K.; Kumarasiri, M.; Soudackov, A. V.; Hammes-Schiffer, S. J. Chem. Theory Comput. 2008, 4, 1974.
(9) Miller, G. P.; Benkovic, S. J. Chem. Biol. 1998, 5, R105.
(10) Berg, J. M.; Stryer, L.; Tymoczko, J. Biochemistry, 5th ed.; Freeman: New York, 2002.
(11) Miovic, M.; Pizer, L. I. J. Bacteriol. 1971, 106, 856.
(12) Allegra, C. J.; Hoang, K.; Yeh, G. C.; Drake, J. C.; Baram, J. J. Biol. Chem. 1987, 262, 13520.
(13) Huennekens, F. M. Adv. Enzyme Regul. 1994, 34, 397.
(14) Schweitzer, B. I.; Dicker, A. P.; Bertino, J. R. FASEB J. 1990, 4, 2441.
(15) Fierke, C. A.; Johnson, K. A.; Benkovic, S. J. Biochemistry 1987, 26, 4085.
(16) Cameron, C. E.; Benkovic, S. J. Biochemistry 1997, 36, 15792.
(17) David, C. L.; Howell, E. E.; Farnum, M. F.; Villafranca, J. E.; Oatley, S. J.; Kraut, J. Biochemistry 1992, 31, 9813.
(18) Rajagopalan, P. T. R.; Lutz, S.; Benkovic, S. J. Biochemistry 2002, 41, 12618.
(19) Miller, G. P.; Benkovic, S. J. Biochemistry 1998, 37, 6336.
(20) Adams, J. A.; Fierke, C. A.; Benkovic, S. J. Biochemistry 1991, 30, 11046.
(21) Miller, G. P.; Wahnon, D. C.; Benkovic, S. J. Biochemistry 2001, 40, 867.
(22) Sawaya, M. R.; Kraut, J. Biochemistry 1997, 36, 586.
(23) Osborne, M. J.; Schnell, J.; Benkovic, S. J.; Dyson, H. J.; Wright, P. E. Biochemistry 2001, 40, 9846.
(24) Schnell, J. R.; Dyson, H. J.; Wright, P. E. Annu. Rev. Biophys. Biomol. Struct. 2004, 33, 119.
(25) Zhang, Z. Q.; Rajagopalan, P. T. R.; Selzer, T.; Benkovic, S. J.; Hammes, G. G. Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 2764.
(26) Sikorski, R. S.; Wang, L.; Markham, K. A.; Rajagopalan, P. T. R.; Benkovic, S. J.; Kohen, A. J. Am. Chem. Soc. 2004, 126, 4778.
(27) Antikainen, N. M.; Smiley, R. D.; Benkovic, S. J.; Hammes, G. G. Biochemistry 2005, 44, 16835.
(28) Wang, L.; Goodey, N. M.; Benkovic, S. J.; Kohen, A. Proc. Natl. Acad. Sci. U.S.A. 2006, 103, 15753.
(29) Boehr, D. D.; McElheny, D.; Dyson, H. J.; Wright, P. E. Science 2006, 313, 1638.
(30) Agarwal, P. K.; Billeter, S. R.; Rajagopalan, P. T. R.; Benkovic, S. J.; Hammes-Schiffer, S. Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 2794.
(31) Agarwal, P. K.; Billeter, S. R.; Hammes-Schiffer, S. J. Phys. Chem. B 2002, 106, 3283.
(32) Wong, K. F.; Watney, J. B.; Hammes-Schiffer, S. J. Phys. Chem. B 2004, 108, 12231.
(33) Watney, J. B.; Agarwal, P. K.; Hammes-Schiffer, S. J. Am. Chem. Soc. 2003, 125, 3745.
(34) Wong, K. F.; Selzer, T.; Benkovic, S. J.; Hammes-Schiffer, S. Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 6807.
(35) Castillo, R.; Andres, J.; Moliner, V. J. Am. Chem. Soc. 1999, 121, 12140.
(36) Radkiewicz, J. L.; Brooks, C. L., III. J. Am. Chem. Soc. 2000, 122, 225.
(37) Cummins, P. L.; Greatbanks, S. P.; Rendell, A. P.; Gready, J. E. J. Phys. Chem. B 2002, 106, 9934.
(38) Garcia-Viloca, M.; Truhlar, D. G.; Gao, J. Biochemistry 2003, 42, 13558.
(39) Rod, T. H.; Radkiewicz, J. L.; Brooks, C. L., III. Proc. Natl. Acad. Sci. U.S.A. 2003, 100, 6980.
(40) Thorpe, I. F.; Brooks, C. L., III. J. Phys. Chem. B 2003, 107, 14042.
(41) Thorpe, I. F.; Brooks, C. L., III. Proteins: Struct., Funct., Bioinf. 2004, 57, 444.
(42) Swanwick, R. S.; Shrimpton, P. J.; Allemann, R. K. Biochemistry 2004, 43, 4119.
(43) Liu, H.; Warshel, A. Biochemistry 2007, 46, 6011.
(44) Lee, J.; Benkovic, S. J. personal communication.
(45) Wang, Q.; Hammes-Schiffer, S. J. Chem. Phys. 2006, 125, 184102.
(46) van Gunsteren, W. F.; Billeter, S. R.; Eising, A. A.; Hunenberger, P. H.; Kruger, P.; Mark, A. E.; Scott, W. R. P.; Tironi, I. G. Biomolecular simulation: The GROMOS96 manual and user guide; VdF Hochschulverlag, ETH Zurich: Zurich, 1996.
(47) Kumar, S.; Rosenberg, J. M.; Bouzida, D.; Swendsen, R. H.; Kollman, P. A. J. Comput. Chem. 1992, 13, 1011.
(48) Xiang, J. Z.; Honig, B. JACKAL: A Protein Structure Modeling Package; Columbia University & Howard Hughes Medical Institute: New York, 2002.
(49) Cornell, W. D.; Cieplak, P.; Bayly, C. I.; Gould, I. R.; Merz, K. M., Jr.; Ferguson, D. M.; Spellmeyer, D. C.; Fox, T.; Caldwell, J. W.; Kollman, P. A. J. Am. Chem. Soc. 1995, 117, 5179.
(50) Humphrey, W.; Dalke, A.; Schulten, K. J. Mol. Graphics 1996, 14, 33.
(51) Weisstein, E. W. "Correlation Coefficient" From MathWorld - A Wolfram Web Resource. http://mathworld.wolfram.com/CorrelationCoefficient.html
(52) Wong, K. F.; Watney, J. B.; Hammes-Schiffer, S. J. Phys. Chem. B 2004, 108, 12231.
(53) Miller, G. P.; Wahnon, D. C.; Benkovic, S. J. Biochemistry 2001, 40, 867.
(54) Watney, J. B.; Agarwal, P. K.; Hammes-Schiffer, S. J. Am. Chem. Soc. 2003, 125, 3745.
Chapter 5
Conclusions
5.1 Anharmonic Effects in Small Clusters
We investigated anharmonic effects of ammonium nitrate and hydroxylammonium nitrate
covalent monomers, ionic dimers and constituent ions. Density functional theory and second-order
vibrational perturbation theory as implemented in the Gaussian 03 package were used in the
calculations. Our calculations illustrate that the anharmonicities of the potential energy surfaces
significantly influence the geometries, frequencies, and nuclear magnetic shieldings for these
systems. All clusters exhibit strong hydrogen bonding interactions. Our calculations confirmed
that the most stable structures are covalent acid-base pairs for the monomers and ionic acid-base
pairs for the dimers.1,2 The hydrogen bonding distances were found to be greater in the ionic
dimers than in the covalent monomers in part because the nitrogen and oxygen atoms are
involved in multiple competing hydrogen bonding interactions in the dimers.
We also observed significant shifts in the stretching frequencies from the covalent
monomers to the ionic dimers. Moreover, we identified an intermolecular hydrogen-bonding
stretching motion of ~200 cm$^{-1}$ in the monomers that shifts to an intermolecular breathing motion
with a slightly higher frequency of ~300 cm$^{-1}$ in the dimers. In these cases, it is incorrect to use
scaling factors that are normally used in ab initio harmonic frequency calculations. The inclusion
of anharmonic effects was found to significantly decrease many of the calculated frequencies in
these clusters and to improve the agreement of the calculated frequencies with the experimental
data available for the isolated neutral species.
Our calculations of nuclear magnetic shielding constants for all nuclei in these clusters
illustrate that quantitatively accurate predictions of nuclear magnetic shieldings for comparison to
experimental data require the inclusion of anharmonic effects. Furthermore, the consideration of
anharmonic effects in the development of molecular force fields will be particularly important for
simulations of proton transfer reactions in ionic liquids and other ionic materials.
5.2 Simulation Methods for Hydride Transfer in Dihydrofolate Reductase
We also described two statistical methods that can be used to generate the potential of
mean force (PMF) for a chemical reaction: the weighted histogram analysis method (WHAM)
and the umbrella integration (UI) method. In WHAM, two equations are solved iteratively, and
the PMF is obtained directly from them. The UI method is based on the derivative of the PMF
with respect to the reaction coordinate rather than the PMF itself. There are two significant
advantages of UI over WHAM. The first advantage is that UI does not rely on a binning
procedure to generate histograms and therefore reduces the statistical error and converges
efficiently. The second advantage is that the UI method can provide accurate PMF curves
efficiently even with a small number of windows that do not overlap significantly. Thus, UI is a
promising method for generating accurate PMF curves for large systems for which sampling may
be limited.
The free energy barrier calculated for hydride transfer in DHFR was similar to previously
calculated barriers with significantly longer sampling periods.3,4 We established that for DHFR, it
is not necessary to generate the full free energy profile to calculate the free energy barrier.
Additionally, generating only partial free energy profiles with fewer windows
significantly reduces the computational cost. Compared to the previously obtained free energy
barrier using 20 windows with significantly more sampling, the free energy barriers using 6
windows were similar for both the WHAM and the UI methods.5 The UI method was able to
generate comparable free energy barriers even with four windows.
5.3 Ranking Mutants of DHFR According to Catalytic Reaction Rates
A computationally efficient approach was presented for evaluating the impact of
mutation on enzyme-catalyzed reaction rates. This procedure requires the generation and
equilibration of the mutant structure, followed by the calculation of a partial free energy curve
using an EVB potential in conjunction with biased molecular dynamics simulations and the UI or
the WHAM method. No parameter fitting is involved in this procedure for the mutants. The
individual steps were automated and optimized for computational efficiency.
We used this approach to calculate the changes in the free energy barrier for hydride
transfer upon mutation of DHFR. The 15 mutants studied were chosen objectively based on a
general literature search for experimental measurements of the hydride transfer rate at pH 7 (ref 6).
Mutations were done using rotamer libraries as implemented in the Jackal suite of programs.7,8
The resulting structures were subjected to a restrained minimization and equilibration procedure.
We observed that some mutants require longer equilibration periods.
The agreement between the calculated and experimental changes in the free energy
barrier upon mutation is encouraging. The computational approach predicts the correct direction
of the change in free energy barrier for all mutants, and the correlation coefficient between the
calculated and experimental data is 0.82. In the future, this approach will be used to predict the
impact of mutations that have not yet been studied experimentally. The feedback between
experiment and theory will guide the further refinement of the procedure. This general approach
for ranking protein designs according to the free energy barrier has implications for protein
engineering and drug design. Additionally, the method is mostly automated and can readily be
modified to be used for mutants of different enzymes or enzyme designs.
5.4 References
(1) Mebel, A. M.; Lin, M. C.; Morokuma, K.; Melius, C. F. J. Phys. Chem. 1995, 99, 6842.
(2) Nguyen, M.-T.; Jamka, A. J.; Cazar, R. A.; Tao, F.-M. J. Chem. Phys. 1997, 106, 8710.
(3) Wong, K. F.; Watney, J. B.; Hammes-Schiffer, S. J. Phys. Chem. B 2004, 108, 12231.
(4) Wang, Q.; Hammes-Schiffer, S. J. Chem. Phys. 2006, 125, 184102.
(5) Chakravorty, D. K.; Kumarasiri, M.; Soudackov, A. V.; Hammes-Schiffer, S. J. Chem. Theory Comput. 2008, 4, 1974.
(6) Lee, J.; Benkovic, S. J. personal communication.
(7) Xiang, J. Z.; Honig, B. JACKAL: A Protein Structure Modeling Package; Columbia University & Howard Hughes Medical Institute: New York, 2002.
(8) Xiang, Z.; Honig, B. J. Mol. Biol. 2001, 311, 421.
Appendix A
Technical Details of the Mutation Procedure
A.1 Introduction
The mutation procedure involves generating the GROMOS coordinate file, the
GROMOS topology and the GROMOS EVB files for the mutants. The GROMOS topology files
for the mutants are obtained by following the protocol given in Section A.2. The mutant EVB
files are edited manually because this involves replacing only a few atom indices. Generating the
GROMOS coordinate file is the most time-consuming step and involves a significant amount of
manual manipulation of files (Section A.3). The mutant topology and EVB files require ≈10
minutes to generate, while the coordinate file requires ≈30 minutes. The example scenario
provided below is for mutating the wild-type enzyme to the G121L mutant.
A.2 Protocol for Creating Mutant Topology and EVB Files
In the EVB files for each mutant, atom indices must shift by a constant amount, which is
the difference in the total number of atoms between the wild-type and the mutant. For the
G121L mutant, this shifts the atom indices by +4, and the atom indices can easily be edited
manually. We use the GROMOS executable progmt.64 to create the topology file. Progmt.64
usage is explained with examples in the GROMOS manual. The topology for the amino acids in
the protein is created separately from the substrate and cofactor. Then the partial topologies are
merged to create the topology of the whole system. Therefore, only the amino acid part of the
topology needs to be regenerated for each mutant. The following steps are followed to create a
mutant topology file:
1. Edit the AANM directive of the control file for progmt.64 to reflect the mutation.
This section lists the residues present in the enzyme, and only the identity of the mutated
residue should be changed. The GROMOS program manual provides examples of all
control file directives.
2. Execute progmt.64 to create the amino acid part of the topology file. Executing
progmt.64 can be done using a script similar to Run_min1.sh given in Appendix B by
changing the name of the executable in line 25.
3. Merge amino acid topology with substrate and cofactor topologies using progmt.64.
Here the NMOL2 directive in the control file is kept at 1. NMOL2 specifies how
many times a second molecular topology is merged with a previous topology.
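The constant index shift described at the start of this section could also be scripted instead of edited by hand. The following is only a sketch under a loud assumption: the layout of the GROMOS EVB file is not reproduced here, so the example assumes a hypothetical format in which the first field of each data line is an atom index. The function name `shift_atom_indices` and the sample lines are likewise hypothetical.

```bash
#!/bin/bash
# Add a constant offset to the atom index in the first field of each line.
# ASSUMPTION: hypothetical file layout with one atom index at the start of
# every data line; lines whose first field is not an integer are left as is.
# Note that awk rebuilds modified lines with single spaces between fields.
shift_atom_indices() {
    # $1 = offset (e.g. 4 for wild type -> G121L); input is read from stdin
    awk -v off="$1" '$1 ~ /^[0-9]+$/ { $1 = $1 + off } { print }'
}

# Hypothetical example input: two index lines and one comment line
printf '%s\n' '1703 C4' '1704 H4' '# donor and hydride' | shift_atom_indices 4
```

For the G121L mutant the offset is +4, so the two index lines above become 1707 and 1708, while the comment line is passed through unchanged.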
A.3 Generating Mutant Coordinates
As stated in Section 4.2, the profix utility in the Jackal suite of programs was used to
perform the mutation. We start with the wild-type pdb file with the atomic coordinates and the
SEQRES cards that list the amino acid sequence. We remove the mutated residue from the
coordinates section of the pdb file and change the SEQRES entry of the mutated residue to reflect
the mutation. Then profix is executed with the -fix 1 option. Thus the command line input is
“profix -fix 1 <input_file_name> <output_file_name> >& <log_file_name>”. Relevant sections
of a sample input pdb file, output pdb file and the profix log file are given in Sections A.3.1,
A.3.2 and A.3.3, respectively. Once the coordinates are created using profix, they can be copied
and pasted into the GROMOS coordinate file.
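Removing the mutated residue from the coordinates section can be done with a one-line filter. The following is a sketch that assumes the simplified whitespace-separated record layout of the sample in Section A.3.1 (no chain identifier column); the function name `drop_residue` is hypothetical, and the GLY 121 line in the example uses placeholder coordinates that are not taken from the actual structure.

```bash
#!/bin/bash
# Remove all ATOM records of one residue from a PDB-like file (sketch).
# ASSUMPTION: whitespace-separated fields as in the sample of Section A.3.1:
#   ATOM serial atom_name residue_name residue_number x y z
# A standard fixed-column PDB file would need substr()-based parsing instead.
drop_residue() {
    # $1 = residue number to remove; the pdb text is read from stdin
    awk -v res="$1" '!($1 == "ATOM" && $5 == res)'
}

# Example: the GLY 121 record has placeholder (hypothetical) coordinates.
printf '%s\n' \
    'ATOM 1175 O GLU 120 13.402 33.821 46.983' \
    'ATOM 1176 N GLY 121 0.000 0.000 0.000' \
    'ATOM 1181 N ASP 122 13.500 32.448 41.907' | drop_residue 121
```

Only the GLU 120 and ASP 122 records are printed. The SEQRES entry is not touched by this filter and still has to be changed separately.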
A.3.1 Sample Input Pdb File
In the input pdb file, residue 121 is changed to LEU in the SEQRES section and is
removed completely from the coordinates section. The mutated residue is the LEU at the
fourth position of SEQRES record 10.
TITLE DHFR_WT(1RX2)_TO_G121L
SEQRES 1 159 MET ILE SER LEU ILE ALA ALA LEU ALA VAL ASP ARG VAL
SEQRES 2 159 ILE GLY MET GLU ASN ALA MET PRO TRP ASN LEU PRO ALA
SEQRES 3 159 ASP LEU ALA TRP PHE LYS ARG ASN THR LEU ASP LYS PRO
SEQRES 4 159 VAL ILE MET GLY ARG HIS THR TRP GLU SER ILE GLY ARG
SEQRES 5 159 PRO LEU PRO GLY ARG LYS ASN ILE ILE LEU SER SER GLN
SEQRES 6 159 PRO GLY THR ASP ASP ARG VAL THR TRP VAL LYS SER VAL
SEQRES 7 159 ASP GLU ALA ILE ALA ALA CYS GLY ASP VAL PRO GLU ILE
SEQRES 8 159 MET VAL ILE GLY GLY GLY ARG VAL TYR GLU GLN PHE LEU
SEQRES 9 159 PRO LYS ALA GLN LYS LEU TYR LEU THR HIS ILE ASP ALA
SEQRES 10 159 GLU VAL GLU LEU ASP THR HIS PHE PRO ASP TYR GLU PRO
SEQRES 11 159 ASP ASP TRP GLU SER VAL PHE SER GLU PHE HIS ASP ALA
SEQRES 12 159 ASP ALA GLN ASN SER HIS SER TYR CYS PHE GLU ILE LEU
SEQRES 13 159 GLU ARG ARG
ATOM 1 H1 MET 1 42.160 31.995 28.982
ATOM 2 H2 MET 1 40.908 32.853 28.352
ATOM 3 N MET 1 41.383 32.589 29.191
ATOM 4 H3 MET 1 41.772 33.407 29.615
ATOM 5 CA MET 1 40.646 31.835 30.184
...
...
ATOM 1168 CA GLU 120 12.372 31.835 47.872
ATOM 1169 CB GLU 120 10.949 32.200 48.160
ATOM 1170 CG GLU 120 9.978 31.844 47.054
ATOM 1171 CD GLU 120 8.503 32.041 47.420
ATOM 1172 OE1 GLU 120 7.836 31.020 47.607
ATOM 1173 OE2 GLU 120 8.119 33.213 47.705
ATOM 1174 C GLU 120 12.994 32.652 46.720
ATOM 1175 O GLU 120 13.402 33.821 46.983
ATOM 1181 N ASP 122 13.500 32.448 41.907
ATOM 1182 H ASP 122 14.264 33.091 41.858
ATOM 1183 CA ASP 122 13.011 31.934 40.579
ATOM 1184 CB ASP 122 12.186 32.976 39.764
...
...
A.3.2 Sample Output Pdb File
In the output file, LEU is added at position 121. Only the coordinates of the united atoms
are given here.
...
...
ATOM 1859 CA GLU 120 12.372 31.835 47.872
ATOM 1862 CB GLU 120 10.966 32.277 48.133
ATOM 1863 CG GLU 120 9.876 31.665 47.213
ATOM 1864 CD GLU 120 9.199 32.690 46.305
ATOM 1865 OE1 GLU 120 8.566 33.616 46.877
ATOM 1866 OE2 GLU 120 9.295 32.561 45.058
ATOM 1860 C GLU 120 13.044 32.586 46.703
ATOM 1861 O GLU 120 13.548 33.721 46.947
ATOM 1873 N LEU 121 13.004 31.991 45.511
ATOM 1881 HN LEU 121 12.529 31.116 45.427
ATOM 1874 CA LEU 121 13.625 32.567 44.322
ATOM 1877 CB LEU 121 14.989 31.901 44.098
ATOM 1878 CG LEU 121 16.180 32.459 44.906
ATOM 1879 CD1 LEU 121 16.283 33.991 44.827
ATOM 1880 CD2 LEU 121 16.067 32.012 46.370
ATOM 1875 C LEU 121 12.767 32.346 43.077
ATOM 1876 O LEU 121 11.672 31.771 43.160
ATOM 1892 N ASP 122 13.497 32.416 41.920
ATOM 1900 HN ASP 122 14.051 33.248 41.897
ATOM 1893 CA ASP 122 13.011 31.934 40.579
ATOM 1896 CB ASP 122 12.206 33.002 39.778
...
...
A.3.3 Sample Profix Log Entry
A portion of the output from profix is provided here. It identifies that a residue is missing
at position 121 and then adds it and prints out the conservation scores. Then it re-indexes the
system.
...
...
Warning...... the pdb file:1rx2_to_g121l has breaker at:E120 D122
...
...
conserve score 114 I---I:1
conserve score 115 D---D:1
conserve score 116 A---A:0.931592
conserve score 117 E---E:0.919129
conserve score 118 V---V:0.899655
conserve score 119 E---E:0.863992
conserve score 120 ----L:0.771263
conserve score 121 D---D:0.863992
conserve score 122 T---T:0.899655
conserve score 123 H---H:0.919129
conserve score 124 F---F:0.931592
conserve score 125 P---P:1
conserve score 126 D---D:1
conserve score 127 Y---Y:1
...
...
reindexing...
indexing from old to new... 1 -- 1
indexing from old to new... 2 -- 2
...
...
indexing from old to new... 119 -- 119
indexing from old to new... 120 -- 120
indexing from old to new........ -- 121
indexing from old to new... 122 -- 122
indexing from old to new... 123 -- 123
...
write down the final structure...1rx2_to_g121l_fix.pdb
Appendix B
Scripts for Automating Computer Job Submission
B.1 Introduction
Supervising MD simulations for a large number of mutants requires utilizing many
scripts and small programs. Some of the scripts used for submitting jobs and editing input files are
given here. Our strategy was to use a few master scripts to manage a large number of other
scripts (Sections B.2, B.3 and B.6). The master scripts are able to edit input files and other
scripts, submit computer jobs, monitor job status and collect results. The scripts provided here
have been formatted for printing purposes. Therefore, they may not reflect the best scripting
practices. Additionally, these scripts are open-source and are not bound by the copyright rules of
the rest of this thesis.
The scenario used in the scripts is for the G121L DHFR mutant. It is assumed that the
main job directory is /home/malika/DHFR/G121L/data_x2. Generating the G121L mutant free
energy profile starts with making the mutation as described in Appendix A. Then the necessary
directories are created by executing the Makedir.sh script. The Run_restr.sh script is used to perform the
restrained minimizations and equilibrations before starting the windows. This script
automatically calls each minimization and equilibration script. Submit_min1.sh is used to submit
the first restrained minimization to the computer job queue and is provided as an example. This
script executes the Run_min1.sh script, which calls the GROMOS executable to perform the
requested MD simulations. The windows are started by executing the Run_all.sh script. This script
automatically calls each Run_gromos.sh script, one per window. Each Run_gromos.sh script
executes a Run_lambda.sh script to perform MD simulations. At the end of the MD simulations of
each window, the necessary GROMOS files are copied into the analysis directory, which is
within the main directory of the mutant. A Fortran 90 program (Section B.9) is used to extract
$V_{11}(R)$ and $V_{22}(R)$ to perform WHAM or UI analysis. A compact description of each script is
given before the script. Scripts are also heavily annotated.
B.2 Makedir.sh
This script makes the necessary directory structure for a mutant and copies and edits the
necessary input files. The variables that commonly require changing are provided at the
beginning of the script. Additionally, the list of window values in the "for LAM in ..." loop can
be changed to accommodate a different number of windows.

#!/bin/sh

# Makes the full folder structure for a mutant
# and copy and edit necessary files.
# Copy script to where ever you want the $DATADIR to be.

NEWm='G121L'   # new mutant
OLDm='D27C'    # old mutant
NEWd='x2'      # directory name 2nd half
OLDd='test'    # old directory name 2nd half
NEWp='1706'    # new # of protein atoms
OLDp='1701'    # old # of protein atoms
NEWt='13997'   # new total atoms
OLDt='13992'   # old total atoms

SYS_BASE='1RX2_G121L'
HOMEDIR='/home/malika/DHFR/G121L/data_'   # old folder name 1st half
DATADIR='data_'                           # new folder name 1st half

# Begin creating directories
mkdir ${DATADIR}${NEWd}
mkdir ${DATADIR}${NEWd}/restrain_jobs
mkdir ${DATADIR}${NEWd}/analysis
mkdir ${DATADIR}${NEWd}/tempfiles
mkdir ${DATADIR}${NEWd}/inputs

# Copy input files and scripts
cp ${HOMEDIR}${OLDd}/inputs/* ${DATADIR}${NEWd}/inputs/
cp ${HOMEDIR}${OLDd}/*.sh ${DATADIR}${NEWd}/

sed 's/'$OLDm'/'$NEWm'/' < ${HOMEDIR}${OLDd}/analysis/untar.sh \
  > ${DATADIR}${NEWd}/analysis/untar.sh
sed 's/'$OLDm'/'$NEWm'/' < ${HOMEDIR}${OLDd}/analysis/link.sh \
  > ${DATADIR}${NEWd}/analysis/link.sh

# Work on windows
# File names are changed on the fly
# Edit the list of values below to change the number of windows
for LAM in 0.050 0.125 0.250 0.375 0.500 0.625
do
  mkdir ${DATADIR}${NEWd}/l${LAM}
  mkdir ${DATADIR}${NEWd}/l${LAM}/${SYS_BASE}_l${LAM}_init_md
  mkdir ${DATADIR}${NEWd}/l${LAM}/${SYS_BASE}_l${LAM}_cont_md
  sed -e 's/'$OLDm'/'$NEWm'/' -e 's/'$OLDd'/'$NEWd'/' \
    < ${HOMEDIR}${OLDd}/${LAM}_run_LAM.sh \
    > ${DATADIR}${NEWd}/${LAM}_run_LAM.sh
done

# Work on scripts
# File names are changed on the fly
for WORD in eqm min1 min2 min3 min4 posrest1 posrest2 posrest3 posrest4
do
  sed -e 's/'$OLDm'/'$NEWm'/' -e 's/'$OLDd'/'$NEWd'/' \
    < ${HOMEDIR}${OLDd}/${WORD}_run_LAM.sh \
    > ${DATADIR}${NEWd}/${WORD}_run_LAM.sh
done

# Work on any remaining script
# File names are changed on the fly
for WORD in run_all initialize
do
  sed -e 's/'$OLDm'/'$NEWm'/' -e 's/'$OLDd'/'$NEWd'/' \
    < ${HOMEDIR}${OLDd}/${WORD}.sh > ${DATADIR}${NEWd}/${WORD}.sh
done

# Now work on control files list
# File names are changed on the fly
# Atoms numbers are changed on the fly
# (the control files are read from and written to the current directory)
for WORD in eqm init init0 cont min1 min2 min3 min4 posrest1 posrest2 posrest3 posrest4
do
  sed -e 's/'$OLDp'/'$NEWp'/' -e 's/'$OLDt'/'$NEWt'/' \
    < ${WORD}_1RX2_${OLDm}_control.dat \
    > ${WORD}_1RX2_${NEWm}_control.dat
  rm ${WORD}_1RX2_${OLDm}_control.dat
done

exit 0
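The sed substitutions in Makedir.sh have no g flag, so they replace only the first match on each line. Whether any of the template files contains the mutant name twice on one line is not stated here; the following minimal sketch only illustrates the difference. The helper names rename_first and rename_all and the sample line are hypothetical and are not part of the thesis scripts.

```shell
#!/bin/sh
# Illustration only: rename_first, rename_all and the sample line are
# hypothetical; the mutant labels are the ones used in Makedir.sh.
OLDm='D27C'
NEWm='G121L'

rename_first() {
    # Same form as in Makedir.sh: first occurrence on each line only
    sed "s/${OLDm}/${NEWm}/"
}

rename_all() {
    # With the g flag: every occurrence on each line
    sed "s/${OLDm}/${NEWm}/g"
}

line='cp 1RX2_D27C.xyz data_D27C/'
first=$(printf '%s\n' "$line" | rename_first)
all=$(printf '%s\n' "$line" | rename_all)

echo "without g: $first"
echo "with g:    $all"
```

If a template line can carry the label more than once, the g form is the safer choice.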
B.3 Run_restr.sh
This script is similar to Run_all.sh. It submits each restrained minimization and
equilibration job successively. It calls scripts similar to Run_lambda.sh for each step of
minimization or equilibration.
#!/bin/bash

# perform restrain minimize/MD procedure
# (bash is required because the script uses arrays)
SYS_BASE='1RX2_G121L'
HOMEDIR='/home/malika/DHFR/G121L/data_x2'   # job submitting folder
loopvar="a"                                 # loop variable; set to "b" to stop
LAM=( min1 posrest1 min2 posrest2 min3 posrest3 min4 posrest4 )
i=0                                         # index for array LAM
last=`expr ${#LAM[@]} - 1`                  # index of the final step
time=180                                    # wait time in seconds

./${LAM[$i]}_run_gromos.sh G121L_${LAM[$i]} &
echo "First job submitted: " ${LAM[$i]}
sleep $time

while [ "$loopvar" != "b" ]
do
  echo "Enter inner loop, loopvar: " $loopvar
  if test -e ${HOMEDIR}/inputs/${SYS_BASE}_${LAM[$i]}.xyz
  then
    i=`expr $i + 1`
    echo "====> ${SYS_BASE}_${LAM[$i-1]}.xyz detected! LAM is incremented to " ${LAM[$i]}
    echo "----> next job submitted..."
    ./${LAM[$i]}_run_gromos.sh G121L_${LAM[$i]} &
  else
    echo "${SYS_BASE}_${LAM[$i]}.xyz is still not there... Waiting..."
    sleep $time
  fi
  # Stop once the final step has been submitted
  if [ "$i" -ge "$last" ]
  then
    loopvar="b"
  fi
done
echo "All jobs submitted..."

exit 0
B.4 Submit_min1.sh
This script submits a restrained minimization job to the computer job queue. It calls
Run_min1.sh to run the desired GROMOS job.
#!/bin/sh

# Script to autosubmit min1
# Usage: ./<script_name> <job_name>
HERE=`pwd`
JOBNAME=$1_$$
JOBDIR=$1_$$

# Name of the executable
EXEC=/scratch/${USER}/${JOBDIR}/Run_min1.sh

# create the PBS script
cat << EnD > $JOBNAME.pbs
#PBS -S /bin/sh
#PBS -N $JOBNAME
#PBS -q batch
#PBS -l walltime=1:00:00
#PBS -l ncpus=1
#PBS -j oe
mkdir /scratch/${USER}/${JOBDIR}
cd $HERE/restrain_jobs
cp ../*.sh /scratch/${USER}/${JOBDIR}
cd /scratch/${USER}/${JOBDIR}
${EXEC}
tar -czvf ${JOBNAME}_job.tar.gz *
mkdir ${HERE}/restrain_jobs/${JOBNAME}
cp ${JOBNAME}_job.tar.gz ${HERE}/restrain_jobs/${JOBNAME}
cd ${HERE}/restrain_jobs/${JOBNAME}
tar -xzvf ${JOBNAME}_job.tar.gz
rm ${JOBNAME}_job.tar.gz
rm -rf /scratch/${USER}/${JOBDIR}
EnD

# Now submit the pbs job to the queue
qsub ${HERE}/${JOBNAME}.pbs

exit 0
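The PBS file above is written with an unquoted here-document, so variables such as ${JOBNAME} and ${HERE} are expanded when the file is generated, not when the job runs. The following minimal sketch shows this behavior on a two-line file. The job name and the temporary directory are made-up values, and the escaped \$PBS_O_WORKDIR line is an addition for contrast that does not appear in the thesis scripts.

```shell
#!/bin/sh
# Illustration only: JOBNAME and WORKDIR are made-up values.
JOBNAME='demo_job'
WORKDIR=$(mktemp -d)

# Unquoted delimiter: ${JOBNAME} is expanded now, while the escaped
# \$PBS_O_WORKDIR is written literally and left for the job to expand.
cat << EnD > "${WORKDIR}/${JOBNAME}.pbs"
#PBS -N ${JOBNAME}
cd \$PBS_O_WORKDIR
EnD

generated=$(cat "${WORKDIR}/${JOBNAME}.pbs")
rm -rf "$WORKDIR"

echo "$generated"
```

Because of this early expansion, each generated .pbs file already has its job-specific scratch paths filled in.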
B.5 Run_min1.sh
This script executes the first GROMOS minimization job. The script creates necessary
input and output links for the GROMOS executable.
#!/bin/sh

# Run the requested GROMOS jobs (min1)

SYS_BASE='1RX2_G121L'
XDIR='/home/malika/grforce/'
LAM='freeze_test'   # current LAM
HOMEDIR='/home/malika/DHFR/G121L/data_xx'

IUNIT=${HOMEDIR}/inputs/min1_${SYS_BASE}_control.dat
OUNIT=${SYS_BASE}_${LAM}.out

# Create input links for GROMOS executable
ln -s ${HOMEDIR}/inputs/${SYS_BASE}-mt.dat fort.20
ln -s ${HOMEDIR}/inputs/${SYS_BASE}_init.xyz fort.21
ln -s ${HOMEDIR}/inputs/${SYS_BASE}_restrain.xyz fort.22
ln -s ${HOMEDIR}/inputs/${SYS_BASE}_atom_seq.xyz fort.23

# Output links
ln -s ${SYS_BASE}_${LAM}.xyz fort.11
ln -s ${SYS_BASE}_${LAM}.trj fort.12
ln -s ${SYS_BASE}_${LAM}.nrg fort.15

# Execute GROMOS
$XDIR/bin/jw_epromd.64 < $IUNIT > $OUNIT

# Copy coordinates
# Edit files for next step: posrest1
rm -f fort.*
cp ${SYS_BASE}_${LAM}.xyz ${HOMEDIR}/inputs/${SYS_BASE}_min1.xyz
sed 's/POSITION/REFPOSITION/' < ${SYS_BASE}_${LAM}.xyz \
  > ${HOMEDIR}/inputs/${SYS_BASE}_restrain1.xyz
sed 's/POSITION/POSRESSPEC/' < ${SYS_BASE}_${LAM}.xyz \
  > ${HOMEDIR}/inputs/${SYS_BASE}_atom_seq1.xyz

# Archive output files
tar -czvf ${SYS_BASE}_${LAM}.trj.tar.gz ${SYS_BASE}_${LAM}.trj
tar -czvf ${SYS_BASE}_${LAM}.nrg.tar.gz ${SYS_BASE}_${LAM}.nrg
rm -f *.trj *.nrg

exit 0
B.6 Run_all.sh
This script submits each window once the previous window has completed 20 ps of MD, which it
detects by checking for the GROMOS coordinate file written at the end of those 20 ps.
#!/bin/bash

# Script to auto submit windows
# (bash is required because the script uses arrays)

# Define variables
SYS_BASE='1RX2_G121L'                         # file identifier
HOMEDIR='/home/malika/DHFR/G121L/data_x2'     # start folder
loopvar="a"                                   # loop variable
LAM=( 0.050 0.125 0.250 0.375 0.500 0.625 )   # window identifiers
i=0
last=`expr ${#LAM[@]} - 1`                    # index of the last window
time=300                                      # sleep time in sec.

# Start first window
./${LAM[$i]}_run_gromos.sh G121L_${LAM[$i]} &
echo "First job submitted, LAMBDA: " ${LAM[$i]}
sleep $time

# Start checking for output files of the previous job
# Then start the following window
while [ "$loopvar" != "b" ]
do
  if test -e ${HOMEDIR}/tempfiles/${SYS_BASE}_l${LAM[$i]}_init_md.xyz
  then
    i=`expr $i + 1`
    echo "====> ${SYS_BASE}_l${LAM[$i-1]}_init_md.xyz detected!"
    echo " LAMBDA is incremented to " ${LAM[$i]}
    ./${LAM[$i]}_run_gromos.sh G121L_${LAM[$i]} &
    echo "----> LAMBDA ${LAM[$i]} submitted."
  else
    echo "Waiting for ${SYS_BASE}_l${LAM[$i]}_init_md.xyz ..."
    sleep $time
  fi

  # Break loop if we're at the last window
  if [ "$i" -ge "$last" ]
  then
    loopvar="b"
  fi
done
echo "All jobs submitted..."

exit 0
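The wait-then-submit pattern used by Run_all.sh and Run_restr.sh can be isolated in a small function. The following is a minimal sketch, not the thesis implementation: the function name wait_for_file, the retry limit (the thesis scripts wait indefinitely), and the temporary directory standing in for tempfiles/ are all assumptions, and job submission is simulated with echo and touch.

```shell
#!/bin/sh
# wait_for_file PATH INTERVAL MAX_TRIES  (hypothetical helper)
# Returns 0 once PATH exists, 1 if it has not appeared after MAX_TRIES checks.
wait_for_file() {
    path=$1
    interval=$2
    max_tries=$3
    tries=0
    while [ ! -e "$path" ]; do
        tries=$((tries + 1))
        if [ "$tries" -ge "$max_tries" ]; then
            return 1
        fi
        sleep "$interval"
    done
    return 0
}

# Demo: TMP stands in for tempfiles/, and touching a file stands in for
# the first 20 ps of a window finishing.
TMP=$(mktemp -d)
submitted=''
prev=''
for lam in 0.050 0.125 0.250; do
    if [ -n "$prev" ]; then
        # Do not start a window until the previous one has left its file
        wait_for_file "${TMP}/l${prev}_init_md.xyz" 1 3 || break
    fi
    echo "submitting window ${lam}"
    submitted="${submitted}${lam} "
    touch "${TMP}/l${lam}_init_md.xyz"
    prev=$lam
done
rm -rf "$TMP"
```

A retry limit keeps the driver from polling forever when a window fails before writing its coordinate file.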
B.7 Run_gromos.sh
This script submits the computer job to the job queue of the server. This script is called
within Run_all.sh although it can be executed directly. The actual file name of the script begins
with the window designation. The jobs are run in the /scratch/malika directory, and the results
are then archived and copied back to the starting directory. Each window has one Run_gromos.sh script.
#!/bin/sh

# Submits a job to the batch PBS queue on shscluster2
# Usage: ./<script_name> <job_name>
HERE=`pwd`
JOBNAME=$1_$$
JOBDIR=$1_$$

# Name of the executable
EXEC=/scratch/${USER}/${JOBDIR}/0.125_Run_lambda.sh

# create the PBS script
cat << EnD > $JOBNAME.pbs
#PBS -S /bin/sh
#PBS -N $JOBNAME
#PBS -q batch
#PBS -l walltime=180:00:00
#PBS -l ncpus=1
#PBS -l nodes=1:big
#PBS -j oe
mkdir /scratch/${USER}/${JOBDIR}
cd $HERE/l0.125
cp -r *md /scratch/${USER}/${JOBDIR}
cp ../0.125_run_LAM.sh /scratch/${USER}/${JOBDIR}
cd /scratch/${USER}/${JOBDIR}
${EXEC}
tar -czvf ${JOBNAME}_job.tar.gz *
mkdir ${HERE}/l0.125/${JOBNAME}
cp ${JOBNAME}_job.tar.gz ${HERE}/l0.125/${JOBNAME}
cd ${HERE}/l0.125/${JOBNAME}
tar -xzvf ${JOBNAME}_job.tar.gz
rm ${JOBNAME}_job.tar.gz
rm -rf /scratch/${USER}/${JOBDIR}
EnD

# Now submit the pbs job to the queue
qsub ${HERE}/${JOBNAME}.pbs

exit 0
B.8 Run_lambda.sh
This script performs the actual GROMOS calculation for one window. It runs two jobs: the first
covers the initial 20 ps and generates the coordinate file that Run_all.sh checks for, and the
second continues the simulation for 350 ps. Each window has one Run_lambda.sh script.
#!/bin/sh

# Initialize and run MD for one LAMBDA

SYS_BASE='1RX2_G121L'                       # file identifier
XDIR='/home/malika/grforce/'                # executable directory
LAM=l0.125                                  # current LAMBDA
OLDLAM=l0.050                               # previous LAMBDA
HOMEDIR='/home/malika/DHFR/G121L/data_x2'   # where job files are

# First run 20 ps
cd ${SYS_BASE}_${LAM}_init_md/
IUNIT=${HOMEDIR}/inputs/init_${SYS_BASE}_control.dat
OUNIT=${SYS_BASE}_${LAM}_init_md.out

# Create input links for GROMOS executable
ln -s ${HOMEDIR}/inputs/${SYS_BASE}-mt-pert-evb.dat fort.20
ln -s ${HOMEDIR}/tempfiles/${SYS_BASE}_${OLDLAM}_init.xyz fort.21
ln -s ${HOMEDIR}/inputs/${SYS_BASE}-pert-evb.dat fort.30
ln -s ${HOMEDIR}/inputs/${SYS_BASE}-evb-${LAM}.dat fort.50

# Output links
ln -s ${SYS_BASE}_${LAM}_init_md.xyz fort.11
ln -s ${SYS_BASE}_${LAM}_init_md.trj fort.12
ln -s ${SYS_BASE}_${LAM}_init_md.nrg fort.15
ln -s ${SYS_BASE}_${LAM}_init_md.dlm fort.16

# Execute program
$XDIR/bin/jw_epromd.64 < $IUNIT > $OUNIT

# Copy .xyz to tempfiles to trigger Run_all.sh
cp ${SYS_BASE}_${LAM}_init_md.xyz ${HOMEDIR}/tempfiles/
rm -f fort.*
rm -f *.trj *.nrg *.dlm

# Now continue running 330 ps
cd ../${SYS_BASE}_${LAM}_cont_md/
IUNIT=${HOMEDIR}/inputs/cont_${SYS_BASE}_control.dat
OUNIT=${SYS_BASE}_${LAM}_cont_md.out

# Create input and output links for this job
ln -s ${HOMEDIR}/inputs/${SYS_BASE}-mt-pert-evb.dat fort.20
ln -s ${HOMEDIR}/tempfiles/${SYS_BASE}_${LAM}_init_md.xyz fort.21
ln -s ${HOMEDIR}/inputs/${SYS_BASE}-pert-evb.dat fort.30
ln -s ${HOMEDIR}/inputs/${SYS_BASE}-evb-${LAM}.dat fort.50
ln -s ${SYS_BASE}_${LAM}_cont_md.xyz fort.11
ln -s ${SYS_BASE}_${LAM}_cont_md.trj fort.12
ln -s ${SYS_BASE}_${LAM}_cont_md.nrg fort.15
ln -s ${SYS_BASE}_${LAM}_cont_md.dlm fort.16

$XDIR/bin/jw_epromd.64 < $IUNIT > $OUNIT
rm -f fort.*

# Archive output files
# Copy .nrg file to analysis directory
tar -czvf ${SYS_BASE}_${LAM}_cont_md.trj.tar.gz *.trj
tar -czvf ${SYS_BASE}_${LAM}_cont_md.nrg.tar.gz *.nrg
tar -czvf ${SYS_BASE}_${LAM}_cont_md.dlm.tar.gz *.dlm
cp ${SYS_BASE}_${LAM}_cont_md.nrg.tar.gz ${HOMEDIR}/analysis/
rm -f *.trj *.nrg *.dlm

exit 0
B.9 ExtractV.f90
This Fortran 90 program extracts $V_{11}(R)$ and $V_{22}(R)$ from the GROMOS .nrg output
file. It creates two separate output files, one to be used for the WHAM analysis, and the other for
the UI analysis. The data extraction is based on searching for the EVBMAT keyword in the
GROMOS file.
program extractv

  ! Extracts V11 and V22 from a Gromos .nrg file by searching for
  ! EVBMAT keyword
  ! Input is a GROMOS .nrg file.
  ! WHAM output is in kJ/mol and UI output is in kcal/mol
  ! Malika - 4/5/2008

  implicit none

  ! Conversion factor: 1 kcal = 4.184 kJ
  real (kind=8), parameter :: kj_per_kcal = 4.184d0

  character(len=80) :: rest
  character(len=80) :: filename

  integer :: temp
  real (kind=8) :: v11, v22, dummy

  ! Read input file name
  write(*,'(a)',advance='no') "Enter the name of Gromos NRG file: "
  read(*,*) filename

  ! Open input and output files
  open(1,file=trim(filename))
  open(2,file=trim(filename)//".wham")
  open(3,file=trim(filename)//".ui")

  temp = 0

  do
    read(1,'(a80)',end=99) rest

    ! The two lines after the keyword hold V11 (first value on the
    ! first line) and V22 (second value on the second line)
    if (trim(rest).eq."# EVBMAT") then
      temp = temp + 1
      read(1,*) v11, dummy
      read(1,*) dummy, v22
      write(2,*) v11, v22
      write(3,*) v11/kj_per_kcal, v22/kj_per_kcal, (v11-v22)/kj_per_kcal
    endif

  enddo

  ! End of file reached: report the number of records found
  99 continue
  write(*,'("Number of EVBMATs extracted: ",I6)') temp

  ! Close opened files
  close(1)
  close(2)
  close(3)

end program extractv
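For a quick cross-check of the extraction, the same search can be sketched with awk from the shell. This is not part of the thesis scripts. The record layout (an "# EVBMAT" line followed by two rows of two numbers) is inferred from the read statements in ExtractV.f90, and the function name extract_v, the fixed four-decimal output format, and the energies in the synthetic file are assumptions made for illustration.

```shell
#!/bin/sh
# extract_v FILE  (hypothetical helper)
# Prints V11, V22 and V11-V22 in kcal/mol for every EVBMAT record,
# mirroring the columns of the .ui output.
extract_v() {
    awk '
        /^# EVBMAT[ \t]*$/ {
            getline; v11 = $1      # first row:  V11 is the first value
            getline; v22 = $2      # second row: V22 is the second value
            # kJ/mol -> kcal/mol (1 kcal = 4.184 kJ)
            printf "%.4f %.4f %.4f\n", v11/4.184, v22/4.184, (v11-v22)/4.184
        }
    ' "$1"
}

# Synthetic two-record file with made-up energies
NRG=$(mktemp)
cat > "$NRG" << 'EOF'
# EVBMAT
41.84 1.0
1.0 0.0
# OTHER
5 5
# EVBMAT
-8.368 2.0
2.0 4.184
EOF

ui=$(extract_v "$NRG")
rm -f "$NRG"
echo "$ui"
```

Comparing a few lines of this output with the .ui file is a cheap consistency check on the unit conversion.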
VITA
Malika D. Kumarasiri
Malika D. Kumarasiri was born in Colombo, Sri Lanka to Padmini Kumarasiri and
Pettanayake Kumarasiri. In December 2001, he graduated from University of Colombo, Sri
Lanka with a Bachelor of Science Honors degree in chemistry. He married Vindhya Panagoda in
2003. He then joined the research group of Prof. Sharon Hammes-Schiffer at the Pennsylvania
State University in University Park, Pennsylvania to pursue graduate studies in chemistry. In
December 2008, he received a Doctorate of Philosophy in chemistry for investigating anharmonic
effects of small clusters and ranking enzymatic mutants efficiently, according to their catalytic
reaction rates. His publications include:
M. Kumarasiri, G.A. Baker, A.V. Soudackov, and S. Hammes-Schiffer, “Ranking
Mutants of Dihydrofolate Reductase According to the Hydride Transfer Rates,” Journal
of Physical Chemistry B, submitted.
D.K. Chakravorty, M. Kumarasiri, A.V. Soudackov, and S. Hammes-Schiffer,
“Implementation of Umbrella Integration within the Framework of the Empirical Valence
Bond Approach,” Journal of Chemical Theory and Computation, 2008, 4, 1974 – 1980.
M. Kumarasiri, C. Swalina, and S. Hammes-Schiffer, “Anharmonic Effects in
Ammonium Nitrate and Hydroxylammonium Nitrate Clusters,” Journal of Physical
Chemistry B, 2007, 111, 4653 – 4658.