
Reviews in Computational Chemistry II

Edited by

Kenny B. Lipkowitz and Donald B. Boyd

WILEY-VCH New York Chichester Weinheim Brisbane Singapore Toronto


A NOTE TO THE READER This book has been electronically reproduced from digital information stored at John Wiley & Sons, Inc. We are pleased that the use of this new technology will enable us to keep works of enduring scholarly value in print as long as there is a reasonable demand for them. The content of this book is identical to previous printings.

Kenny B. Lipkowitz Department of Chemistry Indiana University-Purdue University at Indianapolis 1125 East 38th Street Indianapolis, Indiana 46205

Donald B. Boyd Lilly Research Laboratories Eli Lilly and Company Lilly Corporate Center Indianapolis, Indiana 46285

Copyright © 1991 by Wiley-VCH, Inc.

Originally published as ISBN 1-56081-515-9

Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 and 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4744. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 605 Third Avenue, New York, NY 10158-0012, (212) 850-6011, fax (212) 850-6008. E-mail: [email protected].

For ordering and customer service, call 1-800-CALL-WILEY.

Library of Congress Cataloging-in-Publication Data

Reviews in computational chemistry : advances / edited by Kenny B. Lipkowitz and Donald B. Boyd
p. cm.

Includes bibliographical references and index

ISBN 0-471-18810-7

1. Chemistry -- Data processing. 2. Chemistry -- Mathematics. I. Lipkowitz, Kenny B. II. Boyd, Donald B.
QD39.3.E46R49 1991 541.2'2--dc20 91-20017

CIP

10 9 8 7 6 5 4 3 2 1

Preface

In 1987, we laid the plans for a book series reviewing advances in the rapidly growing and evolving field of computational chemistry. We felt that such a series would be a service to the scientific community and would fill a need not otherwise being met. We aimed the scope broadly, covering not only quantum and molecular mechanics, but also the closely allied technologies of computer-assisted molecular design, molecular modeling, computer graphics, and quantitative structure-activity relationships. In other words, any way in which computers could help scientists better understand molecules was of interest to us and, we felt, would be of interest to others. It was our desire to have the discourses meet the needs of experts actively working in the field, while providing enough introductory material that newcomers could also gain from the reviews. These plans led to the first volume, Reviews in Computational Chemistry, which was published in 1990. The response by the scientific community has been heartening.

New applications, new methodologies, and new perspectives are offered in this second volume. We have arranged the contributions as follows. First are four chapters dealing with conformational analysis, molecular mechanics, and molecular dynamics. Following these are four chapters on quantum mechanically oriented topics and two chapters on quantitative structure-activity relationships (QSAR). Lastly, an essay focusing on pivotal papers and trends in the computational chemistry literature and an updated appendix on software for molecular modeling are presented.

In Chapter 1, Andrew R. Leach introduces and thoroughly describes methods for generating and modeling conformations of small and medium-sized molecules. He, of course, helped develop rule-based methods for generating conformers and describes not only this new approach, but also systematic search, random search, distance geometry, and molecular dynamics methods. In Chapter 2, John M. Troyer and Fred E. Cohen survey the ever pressing problem of predicting the three-dimensional structure of proteins. Advances in understanding the folding of macromolecules will greatly accelerate molecular biology and drug design studies. Essential to any classical simulation of molecular structure is a force field. The parameterization of the force field is of utmost importance because all the results are dependent on it. Two laboratories actively involved in force field development have contributed the next two chapters. J. Phillip Bowen and Norman L. Allinger describe in Chapter 3 the development of the famous force field that has evolved from MM1 and MM2 to MM3. These force fields are especially applicable to small molecules of interest to organic chemists. In Chapter 4, Uri Dinur and Arnold T. Hagler delineate progress in developing better force fields particularly applicable to macromolecules such as proteins.

In Chapter 5, Steve Scheiner shares his expertise in quantum mechanical studies of hydrogen-bonded systems. Strategies for rigorous theoretical treatment of intermolecular systems are explained. In Chapter 6, Donald E. Williams describes an analysis of charge distribution in molecules. Electrons are distributed throughout the constellation of nuclei in a molecule, so it has always been an intriguing problem to divide this probabilistic distribution among the atoms into net (or so-called partial or point) charges. Good net atomic charges are of interest not only to quantum chemists, but also to developers of force fields. The electron distribution within a molecule produces electrical effects around its periphery that control how one molecule perceives another. Electrostatic potentials are reviewed by Peter Politzer and Jane S. Murray in Chapter 7. In Chapter 8, Michael C. Zerner builds on the chapter of J. J. P. Stewart in the first volume of this series and develops the relationship of the many semiempirical molecular orbital methods in use today. Although ab initio molecular orbital approaches are essential for rigorous description of small molecular systems, semiempirical methods remain the most practical way to describe the electronic structure of larger systems. Particular focus is given to transition metal complexes.

Molecular topology deals with mathematical descriptions of which atoms are bonded in a molecule and what the environment of each atom is. Scientists working to understand the relationship between molecular structure and biological activity will appreciate that molecular topology has wide applicability without being costly in terms of computer resources. In Chapter 9, Lowell H. Hall and Lemont B. Kier update their ever expanding applications of molecular connectivity indexes. The group at the Institute of Chemistry in Moldavia has forged a new method for analyzing structure-property relationships built on a quantum chemical foundation and treatable by small computers. I. B. Bersuker, who studied in the department of V. A. Fock in Leningrad, and A. S. Dimoglo reveal QSAR applications of this method in Chapter 10.

We have again been fortunate in having renowned experts in the field write the reviews. We thank them for sharing their knowledge. Joanne Hequembourg Boyd is acknowledged for assistance in indexing and proofreading. We appreciate the high level of interest that exists for this book series and welcome comments and suggestions from the readership.

Kenny B. Lipkowitz and Donald B. Boyd Indianapolis December, 1990

Contributors

Norman L. Allinger, Department of Chemistry, School of Chemical Sciences, University of Georgia, Athens, Georgia 30602, U.S.A.

I. B. Bersuker, Institute of Chemistry, Academy of Sciences, S.S.R. Moldova, Grosul str. 3, 277028 Kishinev, U.S.S.R.

Donald B. Boyd, Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, Indiana 46285, U.S.A.

J. Phillip Bowen, Department of Chemistry, School of Chemical Sciences, University of Georgia, Athens, Georgia 30602, U.S.A.

Fred E. Cohen, Department of Medicine, University of California, San Francisco, San Francisco, California 94143-0446, U.S.A.

A. S. Dimoglo, Institute of Chemistry, Academy of Sciences, S.S.R. Moldova, Grosul str. 3, 277028 Kishinev, U.S.S.R.

Uri Dinur, Department of Chemistry, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel.

Arnold T. Hagler, Biosym Technologies Inc., 10065 Barnes Canyon Road, San Diego, California 92121, U.S.A.

Lowell H. Hall, Department of Chemistry, Eastern Nazarene College, Quincy, Massachusetts 02170, U.S.A.

Lemont B. Kier, Department of Medicinal Chemistry, School of Pharmacy, Virginia Commonwealth University, Richmond, Virginia 23298, U.S.A.

Andrew R. Leach, Computer Graphics Laboratory, School of Pharmacy, University of California, San Francisco, San Francisco, California 94143-0446, U.S.A.

Jane S. Murray, Department of Chemistry, University of New Orleans, New Orleans, Louisiana 70148, U.S.A.

Peter Politzer, Department of Chemistry, University of New Orleans, New Orleans, Louisiana 70148, U.S.A.

Steve Scheiner, Department of Chemistry, Southern Illinois University, Carbondale, Illinois 62901, U.S.A.

John M. Troyer, Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California 94143-0446, U.S.A.

Donald E. Williams, Department of Chemistry, University of Louisville, Louisville, Kentucky 40292, U.S.A.

Michael C. Zerner, Quantum Theory Project, Department of Chemistry, University of Florida, Gainesville, Florida 32611, U.S.A.

Contents

1. A Survey of Methods for Searching the Conformational Space of Small and Medium-Sized Molecules
   Andrew R. Leach

   Introduction
      Conformational Analysis: Some Concepts
      Conformational Searching: Statement of the Problem
   Systematic Search Methods
      Tree Representations and Their Use in Systematic Search
      Implementations of the Systematic Search
   Model Building Approaches and Symbolic Representations of Conformation
      Molecular Models
      The “Build-up” Approach: Polypeptides and DNA
      Symbolic Representations of Conformation and Their Use in Searching Conformational Space
      Crystallographic Databases and Conformational Analysis
   Random Search Methods
      Cartesian and Internal Coordinate Random Search Methods
      Random Simulations and the Metropolis Algorithm
      Further Uses of the Metropolis Algorithm in Random Searching Methods
      Simulated Annealing
   Distance Geometry and Related Methods
      The Representation of Conformations Using Interatomic Distances
      Detailed Description of the Distance Geometry Method
      The Generation of Conformations of a Simple Molecule Using Distance Geometry and Some Applications of the Method
      Energy Embedding
      Related Approaches: Target Function Minimization, the Diffusion Equation Method, and the Ellipsoid Algorithm
   Molecular Dynamics
      The Molecular Dynamics Method
      Using Molecular Dynamics to Search Conformational Space
      Restrained Molecular Dynamics
   Summary and Conclusions
   References

2. Simplified Models for Understanding and Predicting Protein Structure
   John M. Troyer and Fred E. Cohen

   Introduction
   Molecular Mechanics Modeling
   Knowledge-Based Modeling
   Semiempirical and Polymer Models
   Conclusion
   References

3. Molecular Mechanics: The Art and Science of Parameterization
   J. Phillip Bowen and Norman L. Allinger

   Introduction
   Molecular Mechanics Theory
   History of Molecular Mechanics
   Formulation of Molecular Mechanics
      Bond Stretching
      Angle Bending
      Torsional Angles
      van der Waals
      Electrostatics
      Cross Terms
   Heats of Formation
   Parameterization
   References

4. New Approaches to Empirical Force Fields
   Uri Dinur and Arnold T. Hagler

   Introduction
   The Basic Paradigm
   Force Fields and Their Physical Significance
   System of Coordinates, Spectroscopic versus Empirical Force Fields, and the Assumption of Transferability
   The Energy Expression
   Determining Force Constants
   Derivation of “Quantum Mechanical” Force Fields from Ab Initio Data: The Theory of Energy Derivatives
   Specific Force Constant Analysis and Computational Observables
   Applications of the Theory of Energy Second Derivatives
   Ab Initio Dihedral Potentials
   Nonbonded Interactions
   Conclusions
   References

5. Calculating the Properties of Hydrogen Bonds by ab Initio Methods
   Steve Scheiner

   Definition of a Hydrogen Bond
      Geometry
      Energetics
      Electronic Rearrangement
      Spectroscopic Criteria
      Exceptions Make the Rules
   Theoretical Framework
      Perturbation Theory vs. Supermolecular Approach
      Components of Interaction Energy
   Computational Issues
   Superposition Error
      Historical Perspective
      Secondary Effects
      Conclusions
   Interaction Energy
      Simple Predictive Models
      Basis Set Dependence
      Anisotropy of Correlated Components
   Geometry
      Hartree-Fock Level
      Correlation Contributions
      Level of Correlation
   Potential Energy Surfaces
      Water Dimer
      HF Dimer
      Ammonia Dimer
   Flexibility and Vibrational Frequencies
      Energetic Requirements for Geometric Deformation
      Vibrational Frequencies
      Influence of Basis Set, Correlation, and Anharmonicity
   Summary and Recommendations
   References

6. Net Atomic Charge and Multipole Models for the ab Initio Molecular Electric Potential
   Donald E. Williams

   Introduction
   Electronegativity, Net Atomic Charges, and Molecular Multipoles
   Calculation of ab Initio Wavefunctions
   Observed and Calculated Dipole Moments
   Population Analysis of the Wavefunction
   Calculation and Display of the Electric Potential
   Multipole Expansion of the Wavefunction
   Calculation of Potential-Derived Point Charges and Multipoles in Molecules
      Least-Squares Derivation of PD Net Atomic Charges
      Location of Grid Points for the Electric Potential
      Goodness-of-Fit Parameters
   Results for Potential-Derived Net Atomic Charges
      Hydrocarbons
      Halogen Compounds
      Oxygen Compounds
      Nitrogen Compounds (Except Amides)
      Amides
      Miscellaneous Compounds
   Potential-Derived Monopole Models with Additional Nonatomic Sites
      Lone Pair Sites in Azabenzenes
      Lone Pair and Bond Charge Models for Fluorohydrocarbons
      Lone Pair Sites in Water Monomer and Dimer
   Potential-Derived Multicenter Multipole Models
      PD Atomic Multipole Models
      PD Bond Dipole Models
   Electrostatics in Molecular Mechanics
   Conclusion
   References

7. Molecular Electrostatic Potentials and Chemical Reactivity
   Peter Politzer and Jane S. Murray

   Introduction
   The Electrostatic Potential: Definition and Significance
   Historical Survey
      Electrophilic Processes
      Biological Recognition Processes
      Hydrogen Bonding
   Computational Methodology
      Rigorous Evaluation of V(r)
      Approximate Evaluation of V(r)
   Some Recent Applications
      Nucleophilic Processes
      Correlations with Other Properties
      Strained Molecules
   Summary
   References

8. Semiempirical Molecular Orbital Methods
   Michael C. Zerner

   Introduction
   Hartree-Fock Theory
   Approximate Formulations of the Fock Equations
      Zero Differential Overlap Methods
      Extended Hückel Schemes
   Complete Neglect of Differential Overlap
   Intermediate Neglect of Differential Overlap
   Neglect of Diatomic Differential Overlap
   Extended Hückel Theories
   Semiempirical Quantum Chemistry
   Parameterization Schemes
      MINDO/3
      MNDO, AM1, and PM3
      SINDO1
      INDO/S
   Current Reliability of Semiempirical Methods
      Properties
      Reactions
   Summary
   References

9. The Molecular Connectivity Chi Indexes and Kappa Shape Indexes in Structure-Property Modeling
   Lowell H. Hall and Lemont B. Kier

   Introduction
   Background for Molecular Connectivity
   Development of Molecular Connectivity
   Molecular Connectivity Approach
   Molecular Connectivity Method
      Order Zero: ⁰χ
      Order One: ¹χ
      Higher Order Chi Indexes: ᵐχₜ and ᵐχₜᵛ
   Physical Significance of Molecular Connectivity Indexes
      Chromatographic Retention
      Molar Volume
      Heat of Atomization of Hydrocarbons and Alcohols
      Ionization Potential
      Molar Refraction
   QSAR Applications of Molecular Connectivity Chi Indexes
      QSAR of General Anesthetics
      Phenol Toxicity to Fathead Minnows
      Inhibition of Microsomal p-Hydroxylation of Anilines by Alcohols
      Antiviral Activity of Benzimidazoles against Flu Virus
      Bioconcentration Factor for Phenyl and Biphenyl Compounds
   Characterization of Molecular Shape
   Background: Steric or Shape Influence
   Methods for Steric Quantification
      Quantitation of Influence on Properties
      Geometric Models
      Object Comparisons
      Structure Description Based on Topology or Chemical Graph Theory
   Model of Molecular Shape Based on Chemical Graph Theory
      General Model
      First-Order Shape Attribute
      Second-Order Shape Attribute
      Third-Order Shape Attribute
      A Shape Index from Zero-Order Paths
   Shape Information in the Kappa Values
   Encoding Atom Identity
      Modified Atom Count
      Effect of Alpha Inclusion in Kappas
   Kappa Index Values for Small Molecules
   Molecular Shape Quantitation
      General Model
      Higher Order Indexes
      Additivity
   General Applications
      Shape Similarity
      Cavity Definition
      Molecular Flexibility
   Specific Application of Kappa Indexes
      The Pitzer Acentric Factor
      Comparison with the Taft Steric Parameter
      Enzyme Inhibitors
      Toxicity Analysis
   Characterization of Skeletal Atoms, the Topological State
   References

10. The Electron-Topological Approach to the QSAR Problem
    I. B. Bersuker and A. S. Dimoglo

    Introduction
    Background
    Brief Review of QSAR Methods
    Basic Ideas of the Electron-Topological Approach
    Algorithms and Computer Implementation
    Applications to Specific Problems
    Concluding Remarks
    References

11. The Computational Chemistry Literature
    Donald B. Boyd

    Introduction
    Nobel Laureates
    Most Cited Long-standing Papers
    Most Cited Papers in 1984 and 1985
    Some Papers Recently Receiving Recognition
    Comparison of Computational Chemistry Journals
    Conclusion
    References

Appendix: Compendium of Software for Molecular Modeling
    Donald B. Boyd

    Introduction
    Themes
    References
    Software
       Personal Computers
       Minicomputers-Superminicomputers-Supercomputers-Workstations

Author Index

Subject Index

CHAPTER 1

A Survey of Methods for Searching the Conformational Space of Small and Medium-Sized Molecules

Andrew R. Leach
Computer Graphics Laboratory, School of Pharmacy, University of California, San Francisco, San Francisco, California 94143-0446

INTRODUCTION

Conformational Analysis: Some Concepts

The conformations of a given molecular configuration are traditionally defined as the set of arrangements of its atoms in space, which can be interconverted solely by rotation about single bonds.¹ It is usually helpful to relax this definition somewhat to recognize that many conformational changes require small distortions of bond angles and bond lengths and also to acknowledge that rotation can occur about bonds with a bond order intermediate between one and two. Conformational analysis can be loosely defined as the study of the conformations of a molecule and their influence on its properties and behavior. Although the foundations of conformational analysis were laid in the late nineteenth century by chemists such as van't Hoff² and Le Bel,³ “modern” conformational analysis is frequently considered to date from the pioneering work of Barton,⁴ who extended the earlier work of Hassel⁵ to explain how equatorial and axial substituents on cyclohexane rings show different reactivity. His ideas were rapidly accepted both by workers interested in natural products and those concerned with mechanistic studies. One major reason for the subsequent interest at that time was the development of techniques such as infrared spectroscopy, NMR, and X-ray crystallography, which (unlike the degradative/synthetic methods used previously) enabled the chemist to actually determine the conformation.


Figure 1 Schematic representation of a conformational energy surface showing how the potential energy (E) varies with some conformational parameter (1). The minimum of lower energy is narrower and so may be less populated than the broader minimum.

Conformational Searching: Statement of the Problem

Most molecules of interest to organic, bioorganic, and medicinal chemists can adopt more than one conformation. Well-known examples are the staggered and eclipsed forms of ethane, and the chair, twist-boat, and boat forms of cyclohexane. The conformations of a molecule are typically present in different amounts. Interconversion between conformations is due to internal vibrations of the molecule, which can themselves be regarded as arising from a variety of internal motions such as the stretching of bonds, the bending of bond angles, and the rotation about single bonds. By relating changes in these internal motions to some potential energy function, it is possible to regard changes in the molecule’s conformation as movements on the multidimensional surface that describes the relationship between the value of the energy function and the conformation. Stable conformations of a molecule (sometimes called conformers) correspond to local minima in the potential energy function. The molecule undergoes oscillations about each minimum, giving rise to conformational entropy. The relative populations of the minima depend on their statistical weights,⁶ which include contributions from both the potential energy and the entropy. An important consequence of this is that the global energy minimum on the potential energy surface does not necessarily correspond to the structure with the highest statistical weight. This is schematically illustrated in Figure 1; although the narrow potential well is of lower energy, the broader well may have a larger statistical weight due to the proportionately larger contribution from the conformational entropy, because more conformational microstates are accessible.⁷
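The point made with Figure 1 can be checked numerically. The following is a minimal sketch, assuming two independent one-dimensional harmonic wells whose energies and force constants are hypothetical values chosen only for illustration; the statistical weight of each well is taken as the integral of the Boltzmann factor over the well.

```python
import math

K_B = 0.0019872  # Boltzmann constant in kcal/(mol K)

def well_weight(e_min, force_const, temperature, half_width=3.0, steps=6000):
    """Integrate exp(-E/kT) over one harmonic well E = e_min + 0.5*k*x**2."""
    kt = K_B * temperature
    dx = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = -half_width + i * dx
        energy = e_min + 0.5 * force_const * x * x
        total += math.exp(-energy / kt) * dx
    return total

def populations(wells, temperature=300.0):
    """Relative populations of the wells from their statistical weights."""
    weights = [well_weight(e, k, temperature) for e, k in wells]
    partition = sum(weights)
    return [w / partition for w in weights]

# Hypothetical wells: (minimum energy, force constant), arbitrary coordinate units.
narrow_low = (0.0, 100.0)   # lower in energy but narrow
broad_high = (0.3, 4.0)     # 0.3 kcal/mol higher but much broader
pops = populations([narrow_low, broad_high])
print(pops)
```

With these assumed parameters the broader well carries the larger population at 300 K even though it is not the global energy minimum, which is the situation the figure depicts.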


To perform a “conformational search” it is therefore necessary to determine those minimum energy conformations that are believed to contribute to the overall conformational partition function. This requires some means of determining the energy of any given conformation and a method for determining minima on the surface described by the potential energy function. Conformational energies are usually calculated using either quantum mechanics (often by semiempirical methods⁸) or molecular mechanics.⁹ It is beyond the scope of this article to deal with either of these, but two points are pertinent to the present discussion. First, relative energies calculated with one energy function may not necessarily correspond to relative values obtained using another. Second, the effects of solvent are often not included (most potential energy functions currently in use are more appropriate to isolated molecules in the gaseous state). A variety of efforts have been made to include the effects of solvent by varying the dielectric constant or by adding a term to represent the free energy of solvation to the potential function,¹⁰⁻¹⁴ but there is still room for further improvement. A variety of methods can be used to locate minima on the conformational energy surface.¹⁵ Of particular importance in conformational searching is that most minimization algorithms can go only in a “downhill” direction and are unable to surmount energy barriers in order to locate a lower minimum elsewhere on the surface. Some algorithms have been described that can overcome energy barriers, but none to date has successfully been able to locate the global minimum energy conformation from an arbitrary structure. A consequence of this inability to pass over energy barriers on the surface is that in order to perform a conformational search it is necessary to have some method of deriving initial structures for subsequent minimization, and it is these algorithms that will be the main focus of this review.
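The dependence of a downhill minimizer on its starting structure can be illustrated with a short sketch. The one-dimensional double-well function, the step size, and the starting points below are all hypothetical, and plain steepest descent stands in for whichever minimization algorithm is actually used.

```python
def energy(x):
    """Hypothetical one-dimensional double-well energy surface."""
    return x**4 - 2.0 * x**2 + 0.3 * x

def gradient(x):
    """Analytical derivative of the energy function above."""
    return 4.0 * x**3 - 4.0 * x + 0.3

def minimize_downhill(x, step=0.01, tol=1e-8, max_iter=100000):
    """Plain steepest descent: every move is in the downhill direction."""
    for _ in range(max_iter):
        g = gradient(x)
        if abs(g) < tol:
            break
        x -= step * g
    return x

# Each starting point relaxes into the well it already sits in.
starts = [-1.5, -0.5, 0.5, 1.5]
minima = [minimize_downhill(x0) for x0 in starts]
for x0, xm in zip(starts, minima):
    print(f"start {x0:+.1f} -> minimum at {xm:+.4f}, energy {energy(xm):+.4f}")
```

In this sketch the two starting points on the right-hand side of the barrier finish in the higher of the two minima, so the lower minimum is found only if at least one starting structure lies in its basin.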

The most general conformational searching algorithms are those that aim to identify all minima on the potential energy surface. However, as the number of minima usually increases dramatically with the number of rotatable bonds, finding all of them rapidly becomes an impossible task. It is then necessary to reduce the scope of the search. There are two ways in which this is typically done. The first method is to impose some form of constraint on the conformations generated. These constraints may come from experimental investigations or from theoretical studies. It is possible to incorporate constraints into most searching methods, though more easily in some than others. The second way to reduce the search to a manageable level is to bias it toward regions of the conformational space that correspond to the very lowest energy structures. The extreme examples of this approach are those algorithms that aim to locate just the global minimum energy conformation. Some authors would contend that a method that can supply only this one conformation is inadequate because the global energy minimum may not be the active (i.e., functional) structure and because more than one minimum may have to be considered in order to fully understand the behavior of the system. However, it is clear that there are some molecules (for example, polypeptides and proteins) where the number of energy minima is so huge that it is necessary to restrict the search, and it is usually assumed that the native conformation is the one with the lowest energy.


The methods to be discussed will be classified into the following categories: systematic search, model building and symbolic approaches, random methods, distance geometry and related methods, and molecular dynamics. The emphasis will be on explaining the underlying concepts of each approach, giving some indication of the areas of applicability, and citing some typical examples. The scope of this chapter is to cover conformational searching methods for small and medium-sized molecules. Considerable effort has been expended on the development of methods for predicting the tertiary structure of proteins; many of these methods are quite specific for these molecules and are considered elsewhere in this volume.¹⁶ However, some of the algorithms originally developed for searching the conformational space of proteins and polypeptides could also be applied to other systems and will be discussed.

SYSTEMATIC SEARCH METHODS

The systematic search is perhaps the most obvious of all searching methods. As the conformations of a molecule can to a first approximation be defined as those structures that differ solely by rotation about single bonds, an obvious way to perform a conformational search is to systematically increment each single bond through 360°, thereby generating all possible combinations of torsional angles. Such an algorithm is called a grid search. It is usual to then minimize each structure to find the associated minimum energy conformation. The size of molecule to which this straightforward algorithm can be applied is fairly limited. Suppose the angular increment is θ and that there are n rotatable bonds in the molecule. The number of conformations generated is therefore (360/θ)ⁿ. For example, using an angular increment of 60° to systematically search the conformational space of a molecule with 5 rotatable bonds produces 7776 structures. If each of these requires an average of 10 sec to minimize (a fairly optimistic estimate with contemporary hardware), the search would require a total of 22 hr. Extending the search to a molecule with 7 rotatable bonds would require the minimization of 279,936 structures if the same grid size were used and would take just over a month. This exponential increase in the number of possible solutions is frequently termed a combinatorial explosion. Rings are an additional problem for systematic search methods, for it is then necessary to incorporate some means of closing the cycle. Nevertheless, the grid search does have an inherent appeal because it is an exhaustive technique: one can be certain to find all the conformations at the resolution of the chosen grid.
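The arithmetic in this paragraph is easy to reproduce. The sketch below assumes an increment that divides 360° evenly and uses the 10 sec per minimization quoted in the text; the function names are illustrative only.

```python
import itertools

def grid_size(increment_deg, n_bonds):
    """Number of grid points: (360 / increment) ** n_bonds."""
    return (360 // increment_deg) ** n_bonds

def torsion_grid(increment_deg, n_bonds):
    """Generate every combination of torsion angles on the grid."""
    values = range(0, 360, increment_deg)
    return itertools.product(values, repeat=n_bonds)

SECONDS_PER_MINIMIZATION = 10  # the estimate quoted in the text

for n_bonds in (5, 7):
    count = grid_size(60, n_bonds)
    hours = count * SECONDS_PER_MINIMIZATION / 3600
    print(n_bonds, count, round(hours, 1), round(hours / 24, 1))
```

For 5 bonds this gives 7776 structures and about 21.6 hr; for 7 bonds it gives 279,936 structures and about 32.4 days, in line with the figures above. `torsion_grid` is lazy, so the combinations can be consumed one at a time rather than stored.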

Tree Representations and Their Use in Systematic Search

A tree representation is helpful in understanding the systematic search and how the basic algorithm described above can be improved. Trees are frequently used to represent the interrelationships between the states a problem or system can adopt.¹⁷,¹⁸ An example is shown in Figure 2. They consist of nodes connected by arcs. There is often a single root node that represents the initial situation; in a systematic search the root node would correspond to the starting point with no dihedral angle values assigned. From the root node there are one or more daughter (or leaf) nodes; these correspond to the options available for the first “move.” In the systematic search, these relate to the values that the first dihedral angle ω₁ can adopt, and so there will be 360/θ₁ such nodes. From each of the nodes at this first level, there will be an appropriate number of nodes which correspond to the values that the second dihedral angle is allowed, and so on. Figure 3 illustrates a tree in which there are three values for the first torsional angle, two for the second, and three for the third. A maximum of 18 (= 3 × 2 × 3) conformations would therefore be generated in this simple example.

Now consider how the tree representation is related to the way in which the search is performed, using Figure 3 to illustrate the procedure. Setting the first bond to its first value corresponds to moving from the root node on the tree to the first of its daughter nodes (numbered 2 in Figure 3). The second bond is now assigned to its first value. This corresponds to a move from node 2 to node 5. Similarly for the third bond; as values have now been assigned to all of the variable bonds in the molecule, the conformation is fully defined and ready to be minimized. Having generated this first conformation, there are a variety of choices for the next move. A commonly used algorithm for searching trees is the depth-first search, which uses a backtracking method; here the nodes would be expanded in the order 1, 2, 5, 11, 12, 13, 6, 14, 15, 16, 3, ..., as illustrated in Figure 4.
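The visiting order can be reproduced with a few lines of code. This sketch assumes that the nodes of Figure 3 are numbered level by level from the root (node 1), which is the numbering implied by the sequence quoted in the text; the function names are illustrative.

```python
def build_tree(values_per_torsion):
    """Number nodes level by level (root = 1) and return {node: children}."""
    children = {1: []}
    level = [1]
    next_id = 2
    for n_values in values_per_torsion:
        new_level = []
        for parent in level:
            for _ in range(n_values):
                children[parent].append(next_id)
                children[next_id] = []
                new_level.append(next_id)
                next_id += 1
        level = new_level
    return children

def depth_first_order(children, node=1):
    """Return nodes in the order a backtracking depth-first search visits them."""
    order = [node]
    for child in children[node]:
        order.extend(depth_first_order(children, child))
    return order

tree = build_tree([3, 2, 3])          # three, two, and three torsion values
order = depth_first_order(tree)
leaves = [n for n in order if not tree[n]]
print(order)
print(len(leaves), "fully defined conformations")
```

Under that numbering assumption the sequence begins 1, 2, 5, 11, 12, 13, 6, 14, 15, 16, 3, as in the text, and the 18 terminal nodes are the fully defined conformations.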

One way to improve the efficiency of a systematic search is to discard structures that violate some form of energetic or geometric criterion (e.g., close interatomic contacts or unsatisfactory ring closure) before the time-consuming energy minimization stage. The efficiency may be improved further by checking the partially constructed conformations for such problems. If a violation is detected, then all conformations that lie below the current node in the search tree will also contain the problem and can be eliminated from further consideration. For example, if the first value of torsional angle 1 when combined with the second value of torsion 2 gives rise to some problem, then all conformations which contain this combination of torsional angle values would be invalid and can be immediately rejected. These conformations are represented by nodes 14, 15 and 16 in Figure 3. The portion of the tree that lies below node 6 is said to have been pruned from the search tree. Note, however, that only those portions of the molecule whose relative orientations will not be changed later can be considered in such checks, and so the order in which the dihedral angles are altered will be crucial to ensure optimal efficiency.
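Pruning of partial conformations can be sketched as follows. The `violates` callback is a hypothetical stand-in for a real geometric or energetic test, and torsion values are represented by zero-based indices, so the example rule (index 0 for torsion 1 with index 1 for torsion 2) corresponds to the combination discussed above.

```python
def pruned_search(torsion_values, violates):
    """Depth-first search that abandons a branch as soon as a partial
    assignment of torsion values violates the supplied criterion."""
    accepted = []
    checks = 0

    def extend(partial):
        nonlocal checks
        if len(partial) == len(torsion_values):
            accepted.append(tuple(partial))
            return
        for value in torsion_values[len(partial)]:
            candidate = partial + [value]
            checks += 1
            if violates(candidate):
                continue  # prune: nothing below this node is generated
            extend(candidate)

    extend([])
    return accepted, checks

def clash(partial):
    """Hypothetical rule: first value of torsion 1 with second value of torsion 2."""
    return len(partial) >= 2 and partial[0] == 0 and partial[1] == 1

values = [[0, 1, 2], [0, 1], [0, 1, 2]]
conformations, n_checks = pruned_search(values, clash)
print(len(conformations), "conformations kept after", n_checks, "checks")
```

In this example the three conformations below the offending node are never built, leaving 15 of the 18. Because the test is applied to partial assignments, it is valid only when later torsions cannot change the relative positions of the atoms already tested, as the text notes.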

Implementations of the Systematic Search

A number of groups have successfully used these and similar ideas to enable the systematic search method to be applied to quite large molecules.




Figure 3 A tree representation of the conformational space of a molecule in which the first dihedral angle can adopt three values, the second dihedral can adopt two values, and the third dihedral can adopt three values. There are a total of 18 possible conformations.

Figure 4 In a depth-first search, the nodes are examined in the order 1, 2, 5, 11, 12, 13, 6, 14, 15, 16, 3, 7, 17, 18, 19, 8, 20, 21, 22, 4, 9, 23, 24, 25, 10, 26, 27, 28.


Lipton and Still’s MULTIC program19 incorporates many such features (MULTIC can in some ways be regarded as a successor to Still’s “Ringmaker” program20). For example, the rotatable bonds are chosen in a sequentially unidirectional fashion. In an acyclic molecule this means that the search starts at one end of the structure and moves down the chain. In a cyclic system, rings are opened to give a “pseudoacyclic” molecule, which is then processed as for the acyclic case; ring closure constraints are enforced where appropriate. By ordering the dihedral angles in this way, it is possible to identify, at each stage of the process, atoms whose relative positions will not change in a later torsional rotation. These sets of atoms can therefore now be tested for constraints (e.g., close interatomic contacts or ring closure tests) and thus enable any additional conformations having the same undesirable combinations of dihedral angles to be eliminated from further consideration.

Additional sources of computational inefficiency may be eliminated from tree searching algorithms. Two potential problems with backtracking algorithms, even with early testing, are known as “thrashing” and “rediscovery.” Thrashing occurs when a partial solution fails, at which point a standard backtracking algorithm would return to the most recently changed variable. However, the cause of the failure may be higher in the tree, in which case the algorithm must cycle through all the variables between the point where the failure occurred and the assignment that led to the failure. Rediscovery occurs when combinations of assignments continue to be generated even though a subset of those assignments leads to a problem. For example, should the second value of torsion 2 in Figure 3 when combined with the third value of torsion 3 always give rise to a high-energy steric interaction, then in a standard backtracking algorithm this combination would be generated three times (corresponding to the three possible values of torsion 1) when it should only be examined once. Koschmann and colleagues have described how a program using techniques based on artificial intelligence research into “truth maintenance systems” can be used to avoid some of these problems.21 Such systems use elaborate means to cache, or store, information as they proceed. In effect, the system is able to “learn” about the problem during the search. Although the storage and retrieval of this information require additional resources, they can sometimes significantly improve search efficiency.
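The rediscovery example can be illustrated with a much simpler device than a truth maintenance system: a list of previously found conflicting subsets that is consulted before the expensive test is repeated. The sketch below is not the method of Koschmann and colleagues; the names `search` and `steric_conflict` are hypothetical, and the conflict test is assumed to report which assignments caused the failure. It addresses rediscovery only, not thrashing.

```python
def search(n_values, find_conflict, use_cache=True):
    """Backtracking search that can remember conflicting subsets of assignments.

    find_conflict: hypothetical callback returning the set of
                   (torsion index, value index) pairs responsible for a
                   failure, or None if the partial assignment is acceptable.
    """
    nogoods = []                                   # learned conflicting subsets
    stats = {"conflicts_found": 0, "cache_hits": 0}
    complete = []

    def extend(partial):
        assigned = frozenset(enumerate(partial))
        # Cheap lookup: does this assignment contain a known conflict?
        if use_cache and any(bad <= assigned for bad in nogoods):
            stats["cache_hits"] += 1
            return
        conflict = find_conflict(partial)          # the expensive test
        if conflict is not None:
            stats["conflicts_found"] += 1
            nogoods.append(frozenset(conflict))
            return
        if len(partial) == len(n_values):
            complete.append(partial)
            return
        for value in range(n_values[len(partial)]):
            extend(partial + (value,))

    extend(())
    return complete, stats


def steric_conflict(partial):
    # Second value of torsion 2 with third value of torsion 3 (zero-based),
    # whatever the value of torsion 1.
    if len(partial) == 3 and partial[1] == 1 and partial[2] == 2:
        return {(1, 1), (2, 2)}
    return None


plain, plain_stats = search([3, 2, 3], steric_conflict, use_cache=False)
cached, cached_stats = search([3, 2, 3], steric_conflict, use_cache=True)
print(plain_stats)    # {'conflicts_found': 3, 'cache_hits': 0}
print(cached_stats)   # {'conflicts_found': 1, 'cache_hits': 2}
```

Both runs return the same 15 conformations; with the cache the offending combination is examined by the expensive test once rather than three times, at the cost of storing and searching the list of conflicts.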

Rings are difficult to deal with in a systematic search. One approach is to use a set of acceptable ring closure constraints. In MULTIC, for example, six values must be within prescribed limits for the conformation to be acceptable: the distance between the two atoms of the ring closure bond, the two internal angles, and the three internal dihedral angles (see Figure 5). However, such closure constraints can be applied only quite late in the process, when most of the ring system has been completed, and so rejection at this stage is relatively inefficient. An additional set of constraints, described by Smith and Veber,22 and also used by Lipton and Still, can be employed when half of the ring has been constructed. At this point the remaining bonds in their most extended form


Figure 5 The six ring closure constraints used by the MULTIC systematic search: three dihedral values, two bond angles, and one bond length must be within prescribed bounds.

must have sufficient length to close the ring. If not, that branch of the search tree can be discarded. An alternative approach to the ring closure problem is to use an analytical solution to determine the atomic coordinates. Go and Scheraga derived a series of equations whose solutions give the set of torsional angles that will bridge a gap between two endpoints.23 They found that for six or fewer dihedrals the problem is totally defined (i.e., at least seven dihedrals are required before any torsional angles can be assigned independently). Bruccoleri and Karplus have modified this algorithm to allow for some variation in the bond angles24; they found that the original Go and Scheraga method (which assumes rigid bond lengths and angles) reduced the range of endpoints for which closure was possible.
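The half-ring test, in which the remaining bonds in their most extended form must be able to span the gap, can be sketched as follows. The sketch assumes that the most extended form is a planar all-trans zigzag with identical bond lengths and bond angles; the default values (1.53 Å and 111°, typical of sp3 carbon chains) and the function names are illustrative assumptions, not parameters taken from MULTIC.

```python
import math


def max_span(n_bonds, bond_length=1.53, bond_angle_deg=111.0):
    """End-to-end distance (in angstroms) of a planar all-trans chain of n_bonds."""
    half_angle = math.radians(bond_angle_deg) / 2.0
    # Each bond advances by b*sin(angle/2) along the chain axis.
    along = n_bonds * bond_length * math.sin(half_angle)
    # Perpendicular steps cancel in pairs; an odd number of bonds leaves one over.
    across = bond_length * math.cos(half_angle) if n_bonds % 2 else 0.0
    return math.hypot(along, across)


def can_still_close(gap, remaining_bonds, **geometry):
    """True if the unbuilt bonds, fully extended, are long enough to bridge the gap."""
    return gap <= max_span(remaining_bonds, **geometry)


print(round(max_span(3), 2))         # 3.88
print(can_still_close(3.0, 3))       # True
print(can_still_close(5.0, 3))       # False
```

A branch for which `can_still_close` returns False can be discarded without building the rest of the ring. Passing the test does not guarantee closure; it only rules out gaps that are too wide.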

Bruccoleri and Karplus used this modified procedure in a program (CONGEN) to systematically generate polypeptide loop structures for use in homology modeling.25 In this technique the (unknown) structure of a protein is predicted using that of a homologous one as a template; in families of homologous proteins it is frequently found that the core structures are very similar but that there are variations on the perimeters which must be constructed by some other means. CONGEN determines possible conformations of such loop residues. The generation of loop structures, like the construction of rings, is an example of a conformational search that is subject to geometric constraints (i.e., the endpoints of the loop structures must coincide with the equivalent points on the rest of the model).

CONGEN first derives a set of backbone conformations using the modified Go and Scheraga method. As seven torsions is the threshold for which any independent assignment is possible, values can be assigned only to loops with greater than three residues (each nonproline amino acid has two freely rotating backbone bonds). Values are systematically assigned to the φ and ψ values of N − 3 residues, and then the chain is closed. The (φ, ψ) values are taken from Ramachandran plots, so that only those combinations known to be energetically acceptable are used. Having established a series of backbone conformations, it is then necessary to add the side chains.

Bruccoleri and Karplus have investigated no fewer than five different methods for constructing side chain conformations. One of these methods is a straightforward systematic search that generates all possible conformations using a series of nested iterations over all the side chains. In most cases such an approach is not feasible due to the amount of time required. A second method uses the same systematic approach, but terminates when a single acceptable set of side chain conformations has been found. The remaining three methods aim to determine the “best” side chain conformation (which in the absence of any additional information is usually considered to be the one with the lowest energy). One of these remaining methods is based on an algorithm suggested by Hooke and Jeeves.26 A conformation is first generated by the second approach outlined above; although this represents an acceptable arrangement of the side chains (i.e., no high-energy steric contacts), its energy may still be quite high. The conformational space of the first residue is then systematically searched (keeping the other residue conformations fixed), and the conformation with the lowest energy is saved. This procedure is then repeated for the second residue, then the third, and so on until the entire chain has been treated. The program then returns to the first residue and continues to iterate until the energies of the side chain atoms do not change or until some predefined limit on the number of iterations is exceeded. This method will find an energetically reasonable conformation for the side chains, but it is not necessarily the one of lowest energy because the iterative process may have been biased by the initial side chain arrangement. Bruccoleri, Haber, and Novotny have demonstrated that the CONGEN program can be used to construct models of antibody hypervariable loops that agree well with the known crystal structures.27

It is interesting to compare the CONGEN approach with that of Moult and James, who also address the problem of generating protein loop conformations.28 Again, a two-stage procedure is employed; backbone conformations are constructed first and then the side chains are added. To generate backbone conformations, the backbone is divided into two halves, which are then built from their respective end points. Only those pairs of chains for which the end atoms superimpose within some criterion are selected. This “divide and conquer” strategy drastically reduces the number of possible conformations that must be built and is a common tactic in searching algorithms. For residues other than glycine and proline, 11 different backbone conformations are used; these were identified by analyzing the (φ, ψ) maps of 13 highly resolved protein structures. Having obtained backbone geometries, the side chains are added by performing a complete systematic search using a fairly crude grid in which only the staggered conformations of each bond are considered. Even at this level of resolution, the search would consume an intolerable amount of computer resources were it not for the use of rules, or filters, that prune the search tree. In addition to rules found in other systematic search programs (e.g., checks for high steric interactions), Moult and James have introduced further tests based on known features of protein structure. For example, one of these additional rules states that the conformation with the lowest electrostatic energy is very close to the correct one. Thus only the polar and charged side chains are initially allowed to vary until the conformation with the lowest electrostatic energy has been found. The hydrophobic side chains are then allowed to explore their conformational space, and the conformation chosen is the one with the smallest exposed hydrophobic area. They tested their method on Streptomyces griseus trypsin (STG). A model for this protein had previously been manually constructed29 from bovine trypsin. When the structure of STG was solved and compared to this model,30 it was found that some regions, each containing between 3 and 6 amino acid residues, had been built incorrectly. Two of these regions were selected by Moult and James for analysis using their automated procedure; one contained four residues (Arg-Leu-Ser-Met) and the other five residues (Asp-Asn-Ala-Asp-Glu). The conformations chosen by the method proved to have rms deviations between 1.2 and 1.4 Å with respect to the X-ray structure.
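The saving offered by the “divide and conquer” backbone construction can be seen in a deliberately simplified toy model, in which a chain is a walk on a two-dimensional square lattice and two half chains are joined when their free ends land on the same lattice point. Everything here is an assumption made for illustration (the lattice, the four step directions, exact coincidence as the superposition criterion); it is not the Moult and James procedure.

```python
from itertools import product

STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1)]   # toy "bond" directions


def build_half(start, n_steps):
    """Enumerate every chain of n_steps from start as (end point, moves) pairs."""
    chains = []
    for moves in product(STEPS, repeat=n_steps):
        x, y = start
        for dx, dy in moves:
            x, y = x + dx, y + dy
        chains.append(((x, y), moves))
    return chains


def join_halves(first_anchor, last_anchor, n_half):
    """Build both halves independently; keep pairs whose free ends coincide."""
    by_end = {}
    for end, moves in build_half(first_anchor, n_half):
        by_end.setdefault(end, []).append(moves)
    loops = []
    for end, moves in build_half(last_anchor, n_half):
        # Reverse the second half so the joined chain runs anchor to anchor.
        back = tuple((-dx, -dy) for dx, dy in reversed(moves))
        for front in by_end.get(end, []):
            loops.append(front + back)
    return loops


def brute_force(first_anchor, last_anchor, n_total):
    """Build every full-length chain and keep those that end on the last anchor."""
    return [moves for end, moves in build_half(first_anchor, n_total)
            if end == last_anchor]


loops = join_halves((0, 0), (2, 0), 3)
reference = brute_force((0, 0), (2, 0), 6)
print(sorted(loops) == sorted(reference))    # True
```

For six bonds the two halves require 2 × 4³ = 128 partial chains to be built, whereas the brute-force search builds 4⁶ = 4096 full chains to find the same set of loops.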

Marshall, Dammkoehler, and colleagues have devoted considerable effort to the development of efficient systematic searching techniques31,32 for use in the development of pharmacophore models. Such models describe a set of geometric relationships between functional groups believed to be important in the interaction with a receptor. They have used a variety of interesting schemes to surmount the combinatorial explosions that arise with systematic searches. One method they employed with some success is to apply the results of a search on one molecule in searching the conformational space of another. As all of the molecules must fit the receptor model, the search can be restricted to those regions of conformational space which correspond to the currently defined model. For example, if a pair of atoms must lie within some distance range in order to agree with the current model, then the range of torsional values that will permit this constraint to be satisfied can be calculated, and only these values used in the search (rather than the full 360°). In one example, a set of 28 angiotensin converting enzyme (ACE) inhibitors was studied in order to determine a model for the receptor site.33 The use of a variety of searching techniques enabled the search time to be reduced by over three orders of magnitude in comparison with a previous study.
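The idea of restricting a torsion to the values compatible with a distance constraint can be sketched for the simplest case, the 1,4 distance in a four-atom chain with rigid bond lengths and angles. The geometry (1.53 Å bonds, 111° angles), the distance window, and the function names are assumptions chosen for illustration; they are not taken from the work of Marshall and colleagues, and a grid scan is used here in place of an analytical calculation of the permitted range.

```python
import math


def distance_1_4(torsion_deg, bonds=(1.53, 1.53, 1.53), angles_deg=(111.0, 111.0)):
    """Distance between atoms 1 and 4 of a chain 1-2-3-4 for a given torsion.

    A torsion of 0 degrees is cis (shortest distance), 180 degrees is trans.
    """
    b12, b23, b34 = bonds
    a2, a3 = (math.radians(a) for a in angles_deg)
    phi = math.radians(torsion_deg)
    # Atom 2 at the origin, atom 3 on the x axis, atom 1 in the xy plane.
    atom1 = (b12 * math.cos(a2), b12 * math.sin(a2), 0.0)
    atom4 = (b23 - b34 * math.cos(a3),
             b34 * math.sin(a3) * math.cos(phi),
             b34 * math.sin(a3) * math.sin(phi))
    return math.dist(atom1, atom4)


def allowed_torsions(d_min, d_max, step_deg=30):
    """Grid values of the torsion for which the 1,4 distance lies in [d_min, d_max]."""
    return [t for t in range(0, 360, step_deg)
            if d_min <= distance_1_4(t) <= d_max]


allowed = allowed_torsions(3.5, 4.0)
print(allowed)    # [120, 150, 180, 210, 240]
```

With this hypothetical window only 5 of the 12 grid values need be searched for this bond; applied to several bonds at once, such reductions multiply.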

Any systematic search involves a balance between the grid resolution and the available computer resources: too fine a grid and the search may take too long; too coarse and minima may be missed. Lipton and Still have investigated the effect of changing the various variables in their MULTIC search method; in addition to the resolution of the grid, ring closure criteria must be selected, and nonbonded cutoff values assigned. From their studies it is clear that the minimum grid resolution varies with the atom types involved; for example, the lower barrier to rotation about C(sp3)-C(sp2) bonds means that a finer grid is needed for such bonds than for C(sp3)-C(sp3) bonds. An additional result of some significance was that the various cutoff values are interdependent, and so if one variable is changed it is necessary to investigate whether another should also be reassigned.
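The cost side of this balance is simple arithmetic: the number of grid points is the product, over all rotatable bonds, of 360° divided by the step size for that bond. The short sketch below makes the point; the step sizes are arbitrary illustrative values, not the resolutions recommended by Lipton and Still.

```python
import math


def grid_size(step_degrees):
    """Number of conformations on a torsional grid, given one step size per bond."""
    for step in step_degrees:
        if 360 % step:
            raise ValueError(f"step of {step} degrees does not divide 360")
    return math.prod(360 // step for step in step_degrees)


print(grid_size([60] * 5))                # 7776
print(grid_size([30] * 5))                # 248832
print(grid_size([30, 30, 60, 60, 60]))    # 31104
```

Halving the step size for all five bonds multiplies the number of starting conformations by 2⁵ = 32, whereas refining only the two bonds that need a finer grid costs a factor of 4.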

MODEL BUILDING APPROACHES AND SYMBOLIC REPRESENTATIONS OF CONFORMATION

Molecular Models

The development of handheld molecular models was an important landmark in conformational analysis. Two of the earliest types are still extensively used today: those devised by Dreiding34 and those of Corey, Pauling, and Koltun (CPK).35 Models such as these can be used to construct a wide variety of molecules and in the hands of an experienced chemist are quite adequate for performing qualitative conformational analyses of small systems. Computer molecular graphics36,37 has extended the scope of the model-building approach, particularly when allied to some form of fragment library. Many of the currently available molecular modeling systems38 contain predefined fragments that can be joined together by the user. The resulting models are often of high quality and little different from the associated energy minimized structures. One significant advantage of computer-built models over their manual counterparts is that in today’s integrated modeling packages all of the functionality for performing energy calculation and minimization is included within the same program; with a manually built model one has no means for evaluating the relative energies of the conformations constructed.

To extend the fragment-joining approach from the construction of a single model to a true conformational search, it is necessary to recognize that a given fragment may be able to exist in more than one conformation. The 20 naturally occurring amino acids provide a classic example of this; they have a number of conformational minima, well documented by experimental and theoretical investigations.39 The most straightforward way in which a fragment-based conformational search can be implemented is as a systematic search, in which all possible combinations of fragment conformations are generated and then minimized. The tree-search algorithms discussed above are thus directly applicable to a fragment-based search. An important assumption with the approach is that a given fragment exhibits the same conformational behavior in a large molecule as in a small one, implying that short-range forces are dominant in determining conformation. A second point to note is that most fragment-joining methods are restricted to those regions of conformational space implicitly
