Page 1: Discovering Unique, Low energy Transition States of Small ... Papers/2013 CIM.pdf · Molecular Memetic Computing (MMC) methodology for the discovery of unique, low-energy transition

Discovering Unique, Low-energy Transition States of Small, Non-cyclic Molecules Using Evolutionary Molecular Memetic Computing

M.M.H. Ellabaan1, Y.S. Ong2, S.D. Handoko2, C.K. Kwoh3, and H. Man4

1Center for Systems Microbiology, Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark

2Centre for Computational Intelligence, School of Computer Engineering, Nanyang Technological University, Singapore

3BioInformatics Research Centre, School of Computer Engineering, Nanyang Technological University, Singapore

4Department of Biology, Boston University, Boston, MA, USA

Abstract. In the last few decades, the identification of transition states has attracted growing research interest from various scientific communities. According to transition state theory, reaction paths, landscape analysis, and many thermodynamic properties of biochemical systems can be accurately identified through the transition states. Transition states describe the paths that molecular systems take in transiting across stable states. In this article, we present a novel Molecular Memetic Computing (MMC) methodology for the discovery of unique, low-energy transition states and showcase the efficacy of identifying the transition states using the evolutionary nature of the memetic computing paradigm. In essence, the MMC is equipped with a tree-based representation of non-cyclic molecules and covalent-bond-driven evolutionary operators in addition to the typical backbone of memetic algorithms. Herein, we employ a genetic algorithm for the global search and the Berny algorithm for the individual learning, and make use of the valley-adaptive clearing scheme as the niching strategy in the spirit of Lamarckian learning. Experiments with a number of small non-cyclic molecules demonstrated excellent efficacy of the MMC compared to recent advances of several state-of-the-art algorithms. Not only did the MMC uncover the largest number of transition states, but it also incurred the least computational cost.

1. Introduction

Small molecules with a few dozen atoms play important roles in biology, chemistry, and medicine. Alongside their usage as the combinatorial building blocks in chemical synthesis, small molecules have become essential for screening, designing, discovering, and synthesizing new drugs [1]. The discovery of low-energy stereoisomers of small molecules has gained growing attention from the scientific community in the past decades, as these stereoisomers may lead to the discovery of more effective drug molecules compared to their traditional counterparts [2–4]. Such stereoisomers are not always available in nature. They may require special synthesis, for which knowledge of their transition states is often required in order to identify the minimum energy pathway to achieve the synthesis. The transition states describe molecular configurations assumed during the synthesis of a stereoisomer from another stereoisomer. The transient nature of the transition states makes computational study a necessity, as it is difficult, if not impossible, to observe them through wet-lab experiments. Knowledge of the transition states helps scientists not only in the synthesis of drug

isomers, but also in the identification of drug sensitivities to environmental changes. The lower the energy of the transition states associated with a given drug isomer, the more vulnerable the drug is to environmental changes. Such information may be used to rank drug analogs relative to one another. Analogs that are less sensitive to environmental changes can then be recommended subject to pathological conditions.

Identification of the transition states is a computationally challenging task. Given a small molecule, its energy landscape is generally highly constrained and involves a vast search space. While no explicit constraint function is normally formulated, the energy terms that serve as the objective function can easily approach infinity, signalling a nonsensical or naturally impossible atomic configuration of the molecule. Furthermore, the energy calculation of the molecule is generally computationally expensive. With a single evaluation requiring minutes to hours of CPU time, the challenge is hence to maximize the production of meaningful conformations so as not to waste precious computational resources evaluating many nonsensical structures. To date, many optimization approaches have been proposed to identify the transition states. These can be classified into conventional and stochastic approaches [5].

Conventional approaches assume the availability of domain knowledge to produce reasonably good initial guesses of the transition states. Methods in this category can be further segregated into double-ended and single-ended techniques [6]. Given pairs of known stereoisomers (reactants and products), the double-ended methods find transition states that bridge them [7–9]. In contrast, the single-ended methods find transition states in the vicinity of some initial guesses [10]. When such domain-specific a priori information is available, conventional approaches converge efficiently to precise transition states. Otherwise, considerable effort would be spent attempting to “repair” the “bad” initial guesses.

In contrast to conventional approaches, significantly fewer stochastic population-based methods have been explored for locating the transition states, making this a fertile area for research. Among the related stochastic optimization techniques proposed to date, Chaudhury et al. studied the simulated-annealing approach for locating saddle points [11] and extended their efforts with the inclusion of gradient information [12]. Bungay et al. considered coupling the genetic algorithm with eigenvalue information to bias the evolutionary search towards transition states [13]. Despite their stochastic nature, these methods also require domain-specific a priori information for their proper operation due to the vast search space involved. Furthermore, it is worth noting that while stochastic search methods are able to reliably identify the regions of interest where transition states may be located, they generally suffer from slow and inefficient convergence to high-quality solutions.

Taking these cues, we present in this article a novel evolutionary Molecular Memetic Computing (MMC) methodology requiring zero domain knowledge on the initial guesses of the transition states. The essence of MMC lies in the tree-based representation of non-cyclic molecules and the covalent-bond-driven evolutionary operators in addition to the typical backbone of memetic algorithms: the population-based global search method and the individual-based life-time learning procedure. In this article, we employ a genetic algorithm [14] for the global search and the Berny algorithm [15] for the individual learning, and make use of the valley-adaptive clearing scheme [16] as the niching strategy in the spirit of Lamarckian learning [17, 18]. Using the specialized representation and operators, the

MMC is capable of producing offspring that represent meaningful atomic configurations of a molecule most, if not all, of the time. These in turn serve as good initial guesses of the transition states. Additionally, we formulate a fitness function that biases the memetic search towards finding low-energy transition states. Experiments with five real non-cyclic molecules at the Hartree-Fock level of ab initio calculation employing the STO-3G basis set have been conducted to assess the efficacy of the MMC against the recent advances of several state-of-the-art algorithms. Not only does the MMC uncover the largest set of transition states, but it also incurs the least computational cost. Furthermore, the evolutionary memetic computing paradigm allows knowledge discovery that would be valuable for subsequent studies [19].

In the remainder of this article, the mathematical formulation of the problem of searching for the transition states is briefly discussed in Section 2. Algorithmic details of the MMC are then presented in Section 3. Experimental results and in-depth discussions follow in Section 4. These include the presentation of knowledge discovery from the evolutionary data in the forms of factors affecting the number of transition states, landscape analysis, and the density distribution of the transition states. Section 5 finally concludes this article and outlines plausible directions for immediate future work.

2. Problem Formulation

The energy landscape has proven to be a useful conceptual framework in the field of protein folding and molecular optimization [20–22]. The landscape can be formally defined as an ordered set of three components, i.e. $(X, E, d)$, where $X = \{x \mid x \in \mathbb{R}^{3n}\}$ is the set of all possible structural configurations of a molecule with $n$ atoms, $E: X \to \mathbb{R}$ represents the energy terms that serve as the fitness function, and $d: X \times X \to \mathbb{R}$ denotes the distance measure between two structural configurations in $X$. The structural configuration space $X$ is therefore a set of physically consistent three-dimensional molecular structures. Each structure $x \in X$ is a vector of $3n$ real numbers in angstroms (Å), representing the Cartesian coordinates of each of the $n$ atoms of the molecule. The potential energy function $E$ gives an indication of the height of the energy landscape. Herein, we work with first-principles calculations at the Hartree-Fock level using the STO-3G basis set [23]. To measure the similarity between two structural configurations in $X$, the Ultrafast Shape Recognition (USR) [24, 25] is considered in this article.
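The article does not spell out how USR compares two structures, so the following is only a minimal sketch of the general idea, under stated assumptions. Each structure is summarised by the moments of its atomic distances to four reference points (the centroid, the atom closest to it, the atom farthest from it, and the atom farthest from that one), and two structures are compared through the mean absolute difference of these twelve numbers. The function names are hypothetical, and the choice of moments (mean, variance, third central moment) is one possible convention; published implementations normalise the higher moments differently.

```python
import math

def _moments(values):
    """Return the mean, variance, and third central moment of a list of distances."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    return [mean, var, third]

def usr_descriptor(coords):
    """Twelve-number shape descriptor from four reference points (assumed convention)."""
    n = len(coords)
    ctd = tuple(sum(c[k] for c in coords) / n for k in range(3))  # centroid
    cst = min(coords, key=lambda c: math.dist(c, ctd))            # closest atom to centroid
    fct = max(coords, key=lambda c: math.dist(c, ctd))            # farthest atom from centroid
    ftf = max(coords, key=lambda c: math.dist(c, fct))            # farthest atom from fct
    descriptor = []
    for ref in (ctd, cst, fct, ftf):
        descriptor.extend(_moments([math.dist(c, ref) for c in coords]))
    return descriptor

def usr_similarity(coords_a, coords_b):
    """Similarity in (0, 1]; 1 means identical descriptors."""
    da, db = usr_descriptor(coords_a), usr_descriptor(coords_b)
    mean_abs_diff = sum(abs(p - q) for p, q in zip(da, db)) / len(da)
    return 1.0 / (1.0 + mean_abs_diff)

# Illustrative four-atom geometries in angstroms (made-up coordinates).
bent = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.1, 1.3, 0.0), (3.6, 1.2, 0.9)]
shifted = [(x + 5.0, y - 2.0, z + 1.0) for x, y, z in bent]
stretched = [(2.0 * x, y, z) for x, y, z in bent]
```

Because the descriptor is built from interatomic distances only, a rigidly translated copy scores a similarity of 1 without any alignment step. A dissimilarity for the landscape analysis could then be taken as one minus the similarity, although that mapping is also an assumption here.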

A transition state denotes the configuration where the minimum energy barrier is needed to cross from the low-energy vicinity of the reactant to the low-energy vicinity of the product. Transition states play a key role in theoretical and physical chemistry. Their molecular structures and energy values help in the identification of reaction rates [26] and minimum energy paths [27]. Mathematically, a transition state can be defined as a stationary point on the landscape with only one negative eigenvalue in the Hessian matrix [6], i.e.

$$X_{TS} = \{\, x \in X \mid \nabla E(x) = 0 \ \text{and exactly one}\ \lambda_i < 0 \,\}$$

where $\lambda_i$ is an eigenvalue of the Hessian matrix $H(x)$. Scientists are generally interested in specific transition states with the following properties:

- low-energy, i.e. $E(x) \le E_{th}$, where $E_{th}$ is some user-defined threshold
- unique, i.e. distinguishable from every other reported transition state $x'$ when $|E(x) - E(x')|$ and $d(x, x')$ are compared against $\epsilon_E$ and $\epsilon_d$, where $\epsilon_E$ and $\epsilon_d$ are the maximum acceptable similarities in the potential energy and the structural configuration spaces, respectively
- precise, i.e. the gradient $\nabla E(x)$ vanishes to within a small numerical tolerance
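The definition of a transition state as a stationary point with exactly one negative Hessian eigenvalue can be illustrated on a toy surface. The sketch below is not the ab initio energy used in the article: it assumes a hypothetical two-dimensional double-well function, estimates the gradient and Hessian by finite differences, and counts negative eigenvalues. All function names and tolerances are illustrative choices.

```python
import math

def energy(x, y):
    """Toy double-well surface: minima at (-1, 0) and (1, 0), saddle at (0, 0)."""
    return (x * x - 1.0) ** 2 + y * y

def gradient(f, x, y, h=1e-5):
    """Central-difference gradient of f at (x, y)."""
    return ((f(x + h, y) - f(x - h, y)) / (2 * h),
            (f(x, y + h) - f(x, y - h)) / (2 * h))

def hessian(f, x, y, h=1e-4):
    """Central-difference second derivatives (fxx, fxy, fyy) of f at (x, y)."""
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h ** 2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h ** 2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h ** 2)
    return fxx, fxy, fyy

def eigenvalues_sym2(fxx, fxy, fyy):
    """Closed-form eigenvalues of a symmetric 2x2 matrix, in ascending order."""
    mean = 0.5 * (fxx + fyy)
    radius = math.hypot(0.5 * (fxx - fyy), fxy)
    return mean - radius, mean + radius

def is_transition_state(f, x, y, grad_tol=1e-6):
    """True if (x, y) is stationary and its Hessian has exactly one negative eigenvalue."""
    gx, gy = gradient(f, x, y)
    if math.hypot(gx, gy) > grad_tol:
        return False
    lams = eigenvalues_sym2(*hessian(f, x, y))
    return sum(1 for lam in lams if lam < 0) == 1
```

On this surface the origin is the barrier between the two wells, so it passes the test, while the minima at (±1, 0) fail the eigenvalue count and a point such as (0.5, 0) fails the stationarity check. For a real molecule the same test applies to the full 3n-dimensional Hessian.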

3. Molecular Memetic Computing

As an emerging field, memetic computing has gained increasing research attention in recent decades with a growing number of successes [28–36]. Inspired by Dawkins’ notion of a meme, it was first introduced as the Memetic Algorithm (MA): a marriage between population-based global search and individual-based local search, which respectively facilitate exploration and exploitation of the search space. Not only is an MA capable of escaping local optima, but it also converges rapidly to some optimum with sufficient precision due to the life-time learning of the individual. To date, MAs have been crafted for solving real-world problems in science and engineering more efficiently [29, 31–32, 37–42].

In this section, the specialized representation and operators for the evolution of molecular structure are presented. These are crucial for efficient discovery of the low-energy transition states of small non-cyclic molecules using the molecular memetic computing (MMC) methodology. A specialized fitness function is also formulated to enable efficient selection using the stochastic roulette-wheel procedure. Finally, the valley-adaptive clearing scheme [16] and the Berny algorithm [15] for maintaining diversity and enabling life-time individual learning, respectively, are discussed briefly.

3.1. Tree-based Representation

The most straightforward representation of a molecular structure is undoubtedly the string of concatenated Cartesian coordinates of all its atoms. However, various constraints of a molecular system, such as covalent bonding, make such a representation prone to evolving into nonsensical structures. To alleviate this problem, we propose a tree-based representation of the molecular structure where nodes represent atoms, links describe covalent bonds, and subtrees translate into substructures, as depicted in Figure 1. Furthermore, the tree-based representation allows the design of specialized evolutionary operators that produce meaningful evolved structures most of the time, as will be discussed next.
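As a rough illustration of the tree-based representation, a non-cyclic molecule can be stored as a rooted tree built from its list of covalent bonds, so that every link cuts off a well-defined substructure. The sketch below is an assumption about one possible data structure, not the authors' implementation; the function names and the methanol-like bond list are hypothetical.

```python
from collections import defaultdict

def build_tree(bonds, root):
    """Return {atom: [child atoms]} for an acyclic bond graph rooted at `root`."""
    adjacency = defaultdict(list)
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    children = {root: []}
    stack = [(root, None)]
    while stack:
        node, parent = stack.pop()
        for neighbour in adjacency[node]:
            if neighbour == parent:
                continue
            if neighbour in children:
                # Reaching an already-visited atom means the bond graph has a ring.
                raise ValueError("molecule contains a cycle")
            children[neighbour] = []
            children[node].append(neighbour)
            stack.append((neighbour, node))
    return children

def subtree(children, node):
    """All atoms in the substructure hanging below (and including) `node`."""
    atoms, stack = [], [node]
    while stack:
        current = stack.pop()
        atoms.append(current)
        stack.extend(children[current])
    return atoms

# Methanol-like connectivity used purely as an example.
bonds = [("C", "O"), ("C", "H1"), ("C", "H2"), ("C", "H3"), ("O", "H4")]
tree = build_tree(bonds, "C")
```

Here `subtree(tree, "O")` returns the hydroxyl fragment, which is the unit a bond-driven operator would move or exchange. Each node would additionally carry the Cartesian coordinates of its atom.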

Figure 1. Tree-based Representation of Non-cyclic Molecule

3.2. Covalent-bond-driven Molecular Evolutionary Operators

3.2.1. Initialization

Constructing the tree-based representation from a seed molecular structure using the procedure described above produces only one individual. To initialize a population, random sets of mutations are applied to the seed individual such that multiple structures with different configurations are obtained, as depicted in Figure 2.

[Figure residue: tree nodes labelled with the Cartesian coordinates (x, y, z) of atoms C1 to C4, N, H1 to H9, O1, and O2.]

Figure 2. Initial Population of MMC

3.2.2. Crossover

Crossover between two molecular configurations aims at interchanging the substructures of different molecular configurations such that favourable substructures are allowed to replicate across generations. With reference to the tree-based representation, crossover can be easily performed by choosing a random link as the crossover point following the alignment of the two molecular configurations using the Kabsch procedure [43], as illustrated in Figure 3. The resulting structures will therefore most likely observe the constraints imposed by the covalent bonds. Even when the constraints are not strictly observed, the evolved structures can be easily repaired.
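The crossover step can be sketched as a subtree swap. The code below assumes the two parents share the same bond list and have already been aligned (the Kabsch step is not shown), picks a random bond as the crossover point, and exchanges the coordinates of every atom on the far side of that bond. The function names, the data layout (a dict from atom label to coordinates), and the example coordinates are all hypothetical; the article does not specify how the exchanged substructure is attached or repaired.

```python
import random

def atoms_beyond_bond(bonds, near, far):
    """Atoms on the `far` side of the bond (near, far) in an acyclic bond graph."""
    adjacency = {}
    for a, b in bonds:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    seen, stack, side = {near, far}, [far], []
    while stack:
        current = stack.pop()
        side.append(current)
        for neighbour in adjacency[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return side

def crossover(parent_a, parent_b, bonds, rng=random):
    """Swap the substructure beyond a randomly chosen bond between two aligned parents."""
    near, far = rng.choice(bonds)
    swapped = set(atoms_beyond_bond(bonds, near, far))
    child_a = {atom: (parent_b[atom] if atom in swapped else parent_a[atom])
               for atom in parent_a}
    child_b = {atom: (parent_a[atom] if atom in swapped else parent_b[atom])
               for atom in parent_b}
    return child_a, child_b, (near, far)

# Two made-up conformations of the same three-bond fragment.
bonds = [("C", "O"), ("C", "H1"), ("O", "H2")]
parent_a = {"C": (0.0, 0.0, 0.0), "O": (1.4, 0.0, 0.0),
            "H1": (-0.5, 0.9, 0.0), "H2": (1.8, 0.9, 0.0)}
parent_b = {"C": (0.0, 0.0, 0.0), "O": (1.4, 0.0, 0.0),
            "H1": (-0.5, -0.9, 0.0), "H2": (1.8, 0.0, 0.9)}
child_a, child_b, cut = crossover(parent_a, parent_b, bonds, random.Random(0))
```

Because whole substructures are exchanged, bonds inside each substructure keep their lengths; only the bond at the crossover point may need the repair mentioned above.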

Figure 3. Covalent-bond-driven Crossover

3.2.3. Mutation

Mutation of a molecular configuration aims at effecting random changes to covalent bonds, about which rotation and along which stretching may be performed. With reference to the tree-based representation, mutation can be easily attained by carrying out a random rotation about, or a translation along (subject to minimum and maximum allowable bond lengths [44]), a randomly chosen link as shown in Figure 4. In this manner, the resulting structures will always observe the constraints imposed by the covalent bonds.
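Both mutation moves reduce to rigid transformations of the substructure on one side of the chosen bond. The sketch below assumes atoms are stored in a dict from label to coordinates and that the caller supplies the list of atoms to move (the far atom of the bond and everything beyond it). Rotation uses Rodrigues' formula about the bond axis, and stretching translates the substructure along the bond with the new length clamped to user-supplied limits. The function names and the numeric limits in the example are hypothetical, not values from reference [44].

```python
import math

def rotate_point(p, a, b, angle):
    """Rotate point p by `angle` radians about the axis from a to b (Rodrigues' formula)."""
    axis = [b[i] - a[i] for i in range(3)]
    norm = math.sqrt(sum(c * c for c in axis))
    u = [c / norm for c in axis]
    v = [p[i] - a[i] for i in range(3)]
    cos_t, sin_t = math.cos(angle), math.sin(angle)
    dot = sum(u[i] * v[i] for i in range(3))
    cross = (u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0])
    return tuple(a[i] + v[i] * cos_t + cross[i] * sin_t + u[i] * dot * (1 - cos_t)
                 for i in range(3))

def rotate_bond(coords, a, b, moved, angle):
    """Rotate the atoms in `moved` about the bond a-b; all bond lengths are preserved."""
    new = dict(coords)
    for atom in moved:
        new[atom] = rotate_point(coords[atom], coords[a], coords[b], angle)
    return new

def stretch_bond(coords, a, b, moved, target, min_len, max_len):
    """Translate the atoms in `moved` along a-b so the bond length becomes the clamped target."""
    length = math.dist(coords[a], coords[b])
    target = max(min_len, min(max_len, target))
    shift = [(target - length) * (coords[b][i] - coords[a][i]) / length for i in range(3)]
    new = dict(coords)
    for atom in moved:
        new[atom] = tuple(coords[atom][i] + shift[i] for i in range(3))
    return new

# Made-up three-atom fragment: rotate and stretch the C-O bond, moving O and H.
mol = {"C": (0.0, 0.0, 0.0), "O": (1.4, 0.0, 0.0), "H": (1.8, 0.9, 0.0)}
rotated = rotate_bond(mol, "C", "O", ["O", "H"], math.pi / 2)
stretched = stretch_bond(mol, "C", "O", ["O", "H"], 3.0, 1.2, 1.6)
```

Rotation leaves every bond length unchanged, and stretching changes only the selected bond, which is why offspring produced this way respect the covalent-bond constraints.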

(a) bond rotation (b) bond stretching

Figure 4. Covalent-bond-driven Mutation

3.3. Transition State Fitness Function

The fitness function is designed to evaluate quantitatively how well a generated structural configuration of a molecule qualifies as a candidate solution of a low-energy transition state. It is defined mathematically as

$$F(x) = \frac{E_{th} - E(x)}{\left(\lVert \nabla E(x) \rVert + \epsilon\right)\left(\bigl|\,|\lambda^{-}| - 1\,\bigr| + \epsilon\right)}$$

where $|\lambda^{-}|$ and $\epsilon$ denote the number of negative eigenvalues and a sufficiently small value to prevent division-by-zero error, respectively. The numerator serves to favour solutions that meet the user-defined low-energy threshold of $E_{th}$. The denominator, on the other hand, is designed to bias the search towards transition states.
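A small numerical sketch can make the roles of the numerator and denominator concrete. It assumes the fitness has the form (E_th − E) / ((‖∇E‖ + ε)(| |λ⁻| − 1 | + ε)), which matches the verbal description above: the numerator rewards energies below the threshold, and the denominator shrinks as the gradient vanishes and the count of negative eigenvalues approaches one. The function name, argument layout, and default ε are hypothetical.

```python
def ts_fitness(energy, grad_norm, eigenvalues, energy_threshold, eps=1e-6):
    """Assumed fitness form: larger for low-energy, near-stationary, one-negative-eigenvalue points."""
    n_negative = sum(1 for lam in eigenvalues if lam < 0)
    numerator = energy_threshold - energy
    denominator = (grad_norm + eps) * (abs(n_negative - 1) + eps)
    return numerator / denominator

# Same energy and gradient, different Hessian signatures.
one_negative = ts_fitness(-1.0, 0.0, [-0.5, 0.3, 0.8], energy_threshold=0.0)
two_negative = ts_fitness(-1.0, 0.0, [-0.5, -0.3, 0.8], energy_threshold=0.0)
no_negative = ts_fitness(-1.0, 0.0, [0.5, 0.3, 0.8], energy_threshold=0.0)
```

Under this form a candidate above the energy threshold receives a negative fitness, so a roulette-wheel selection would need the values shifted or clipped to be non-negative first; how the authors handle that case is not stated.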

3.4. Valley-adaptive Clearing Scheme

To maintain diversity of the evolving populations, a valley-adaptive clearing scheme [16] is incorporated to adapt to the non-uniform width of valleys in the energy landscape. Individuals in the population are clustered into groups that share common valleys. The least fit individuals from each group are then cleared by relocating them to random regions of the search space. The elite individual of each valley group is then refined using the Berny algorithm [15], described next, in the spirit of Lamarckian learning [17, 18].

3.5. Berny Algorithm

With its capability of converging to the transition states precisely, the Berny algorithm enables life-time individual learning of the elite individual in each valley group. The Berny geometry optimization algorithm is a modified version of the Schlegel algorithm [15] that operates by adjusting the signs of inadequate eigenvalues. When multiple negative eigenvalues are encountered, all negative eigenvalues are replaced by their absolute values except the one with the smallest magnitude. When no negative eigenvalue is present, the sign of the least positive eigenvalue is negated. Following the

life-time individual learning, valley elites that qualify as transition states are archived and checked against possible duplicates.
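The eigenvalue sign adjustment described for the Berny algorithm can be written down directly. The sketch below follows the rule exactly as worded in this section and covers only that adjustment step, not the optimizer built around it; the function name is hypothetical.

```python
def adjust_eigenvalues(eigenvalues):
    """Force exactly one negative eigenvalue, following the rule as stated in the text."""
    lams = list(eigenvalues)
    negative = [i for i, lam in enumerate(lams) if lam < 0]
    if len(negative) > 1:
        # Keep the negative eigenvalue of smallest magnitude; flip the others to positive.
        keep = min(negative, key=lambda i: abs(lams[i]))
        for i in negative:
            if i != keep:
                lams[i] = abs(lams[i])
    elif not negative:
        # No negative eigenvalue: negate the least positive one.
        i = min(range(len(lams)), key=lambda i: lams[i])
        lams[i] = -lams[i]
    return lams
```

For example, `adjust_eigenvalues([-3.0, -0.5, 2.0])` yields `[3.0, -0.5, 2.0]`, and `adjust_eigenvalues([0.2, 1.0, 4.0])` yields `[-0.2, 1.0, 4.0]`. A spectrum that already has one negative eigenvalue is returned unchanged.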

4. Results and Discussions

To showcase the efficacy of the evolutionary MMC methodology detailed above, an experimental study was performed at the Hartree-Fock level of ab initio calculation of the energy, derivatives, and eigenvalues with the STO-3G basis set. A population size of 10 individuals and an evolution length of 1,000 generations were used. The molecules of interest are the following small non-cyclic compounds (also shown in Figure 5):

- GABA (gamma-aminobutyric acid): a chief inhibitory neurotransmitter that regulates neuronal excitability in the mammalian central nervous system.

- GABOB (gamma-amino-beta-hydroxybutyric acid or buxamin): a derivative of the neurotransmitter GABA that works mainly as an anti-epileptic drug, offering significant de-stressing and anti-aging benefits.

- Leucine: an essential amino acid important for haemoglobin formation that potently activates the mammalian target of rapamycin kinase for cell growth regulation.

- Tromethamine: an alkalizing agent and buffer in enzymatic reactions that affects the balance of water and electrolytes in the body; also used in the synthesis of surface-active agents and pharmaceuticals, in particular for the treatment of metabolic acidosis.

- Valine: another essential amino acid characterized by stimulant activity that regulates the immune function of the body and helps in the development and growth of muscle as well as tissue repair; also a main precursor in the penicillin biosynthetic pathway; has been found useful in treatments involving muscle, mental, and emotional upsets and for insomnia and nervousness.

(a) GABA (b) GABOB

(c) Leucine (d) Tromethamine (e) Valine

Figure 5. Structural View of the Small Non-cyclic Molecules of Interest

Carbon atoms are represented in green, Nitrogen in blue, Oxygen in red, and Hydrogen in silver.

Figure 6 depicts the number of transition states discovered and the computational cost incurred by different algorithms, namely the Stochastic Multi-start Local Search using the Berny algorithm (SMLS) [45] as the representative of single-ended methods, the GRRM [46] as the representative of double-ended methods, the Sequential-Niching Memetic Algorithm (SNMA) [30] that represents the recent advances in niching and memetic algorithms, and finally the proposed MMC methodology with specialized representation and evolutionary operators. As a double-ended method, the GRRM requires the reactant and product information to uncover the transition states between them. In contrast, the SMLS, the SNMA, and the MMC only need to supply initial guesses of the transition states prior to executing the life-time individual learning procedure. Intuitively, the GRRM should incur lower computational cost than the other three methods. However, Figure 6 shows that the GRRM is more efficient than only the SMLS and the SNMA. Without the tree-based representation, initial guesses produced by the SMLS and the SNMA can easily be nonsensical configurations that require considerable effort to repair. With the tree-based representation and specialized evolutionary operators, the MMC is observed to be not only more efficient, but also more effective than the GRRM as well as the SMLS and the SNMA in uncovering the transition states.

In terms of the number of transition states uncovered, the SNMA actually performs comparably to the GRRM. The use of a niching algorithm in the SNMA allows selective life-time individual learning, which in turn gives way to a better exploration of the search space than in the case of the SMLS. This improvement is achievable without requiring the reactant and product information as is the case with the GRRM. In the MMC, the use of the valley-adaptive clearing scheme as the niching algorithm, coupled with the tree-based representation and the covalent-bond-driven evolutionary operators, has further improved the efficacy of using the memetic computing methodology in the discovery of unique, low-energy transition states of small, non-cyclic molecules.

Figure 6. Performance of SMLS, SNMA, GRRM, and MMC on GABA, GABOB, Leucine, Tromethamine, and Valine in Terms of Number of Transition States Uncovered and Number of Energy Calculations Performed

With the significantly larger number of uncovered transition states, the use of evolutionary MMC has furthermore allowed knowledge discovery from data generated over the course of the evolution rather than requiring prior knowledge as is the case with the other methods. Table 1 tabulates the characteristics of the uncovered transition states in terms of the number of oxygen, carbon, and hydrogen atoms as well as the total number of atoms and the number of rotatable bonds. Observation suggests that oxygen atoms play a key role in the flexibility of the molecule, allowing a larger number of transition states to be uncovered. Carbon atoms, in contrast, seem to limit the flexibility of the molecule, decreasing the number of transition states to be uncovered. Meanwhile, hydrogen atoms are observed not to play an important role in the number of transition states uncovered. Additionally, it is

observed that the number of rotatable bonds plays an obvious role in the number of transition states uncovered while the total number of atoms has not shown any significant impact. The large numbers of transition states of GABOB and tromethamine indicate the flexibility of the molecules, allowing them to interact with a broader range of molecular systems. GABOB as an anti-epileptic drug, for example, is a more effective neuro-inhibitor than GABA, from which it is derived. The flexibility that GABOB can adopt allows it to interact more efficiently with a larger set of GABA-ergic system components in the central nervous system.

Table 1. Characteristics of Uncovered Transition States in Terms of Number of Atoms, Number of Rotatable Bonds, and Fitness-Distance Correlation

                     GABA     GABOB    Leucine  Tromethamine  Valine
#transition_state    256      643      305      519           159
#oxygen              2        3        2        3             2
#carbon              4        4        6        4             5
#hydrogen            9        9        13       11            11
#total_atoms         16       17       22       19            19
#rotatable_bonds     3        3        3        3             2
FDC                  26.57    3.37     54.48    31.83         92.20

In Table 1, characteristics of the energy landscape of the molecules are also tabulated as the Fitness-Distance Correlation (FDC) [13] to measure the difficulty of the landscape. FDC is essentially the Pearson product-moment correlation between the energy differences and the structural differences of the solutions to the lowest-energy transition states, as shown below

$$FDC = \frac{\mathrm{cov}(\Delta E, \Delta d)}{\sigma(\Delta E)\,\sigma(\Delta d)}$$

where $\mathrm{cov}(\cdot,\cdot)$ and $\sigma(\cdot)$ are the covariance and the standard deviation, respectively, while $\Delta E$ and $\Delta d$ represent the energy difference and the USR structure dissimilarity between each transition state and the lowest-energy transition state, respectively. The FDC categorizes energy landscapes into well-ordered, rough, or deceptive. High correlation, with an FDC value above 60, indicates a well-ordered landscape such that optimization methods can locate the transition states quite easily, whereas low correlation indicates a rough landscape in which optimization techniques could be misled to sub-optimal solutions. Negative correlation, meanwhile, indicates a deceptive landscape where the lowest-energy transition state is located among the high-energy transition states. From Table 1, the non-cyclic molecules we investigate, except valine, are found to possess rough landscapes. Of remarkable difficulty is the landscape of GABOB, with an FDC value of 3.37. The MMC has nevertheless uncovered around six to eight times more of its transition states than the other methods, owing to the tree-based representation and the covalent-bond-driven evolutionary operators. This further showcases the strengths of the evolutionary memetic computing paradigm when coupled with the appropriate representation and operators such that little or no effort would be spent exploring the region where no solution may be located.
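The FDC itself is a plain Pearson correlation and is easy to compute once the energies of the uncovered transition states and their dissimilarities to the lowest-energy one are available. The sketch below assumes those two lists are given in matching order and returns the coefficient on the usual scale from -1 to 1; the values in Table 1 exceed 1, so they appear to be expressed on a scale of 100, which is an inference rather than something the text states. The function names are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if std_x == 0 or std_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (std_x * std_y)

def fitness_distance_correlation(energies, dissimilarities):
    """Correlate energy gaps to the best transition state with structural dissimilarity to it."""
    lowest = min(energies)
    delta_e = [e - lowest for e in energies]
    return pearson(delta_e, dissimilarities)
```

A landscape in which energy rises steadily with dissimilarity from the best transition state gives a value near 1 (well-ordered), a value near 0 corresponds to a rough landscape, and a negative value corresponds to the deceptive case described above.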

With the large number of transition states uncovered, we extend our analysis to include the distribution of the transition states uncovered by the MMC across the relative energy and the structure dissimilarity, making reference to the lowest-energy transition states, as shown in Figure 7. The distributions of the uncovered transition states of the molecules under investigation are depicted in separate subgraphs using scatter plots. Of remarkable achievement is the broad range of

transition states uncovered by the MMC, as can be concluded from observing the x-axis of the scatter plots. In other words, the MMC has been capable of discovering unique transition states with a high degree of structural dissimilarity. Observing the y-axis, on the other hand, it is noted that the uncovered transition states are distinguishable into low- and high-energy transition states. The presence of high-energy transition states in the region of low dissimilarities could easily hinder the discovery of low-energy transition states in the region of high dissimilarities. Fortunately, this is not the case with the MMC. The valley-adaptive clearing scheme used to maintain the population diversity has indeed been an effective measure for the successful and efficacious exploration of the energy landscape. Together with the tree-based representation and the covalent-bond-driven evolutionary operators, the MMC has incurred the least computational cost in addition to uncovering the largest set of transition states. This shows that the evolutionary memetic computing paradigm is indeed a powerful and efficient tool for solving complex problems when equipped with appropriate representation and operators.

Figure 7. Distribution of Uncovered Transition States across Relative Energy and Structure Dissimilarity with reference to the Lowest-energy Transition State. Panels: (a) GABA, (b) GABOB, (c) Leucine, (d) Tromethamine, (e) Valine.

5. Conclusion and Future Works

Given the importance of small molecules in biology, chemistry, and medicine, this article has been

dedicated to the discovery of the transition states of small non-cyclic molecules. The

mathematical formulation and a brief description of the nonlinear programming problem of finding

the transition states have been presented. A novel Molecular Memetic Computing (MMC)

methodology that requires no prior knowledge for the discovery of transition states has been

proposed. At the heart of the method are the tree-based representation of non-cyclic molecules and

the covalent-bond-driven evolutionary operators. In the spirit of Lamarckian learning [17, 18], the

specialized representation and operators are used to complement the memetic computing

technique comprising a genetic algorithm [14] for the population-based global search, the Berny

algorithm [15] for the life-time individual learning, and the valley-adaptive clearing scheme [16] for

maintaining the population diversity.
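
The interplay of global search, individual learning, and niching summarized above can be illustrated with a deliberately simplified sketch. Everything in it is an assumption chosen for illustration: a two-dimensional toy energy function stands in for a molecular energy surface, plain Newton iterations on the gradient stand in for the Berny algorithm, a fixed-radius archive stands in for the valley-adaptive clearing scheme, and real-valued vectors stand in for the tree-based representation. It is not the MMC itself.

```python
import math
import random


def energy(p):
    x, y = p
    return (x * x - 1) ** 2 + (y * y - 1) ** 2  # toy surface with 4 saddles


def gradient(p):
    x, y = p
    return (4 * x * (x * x - 1), 4 * y * (y * y - 1))


def hessian(p):
    x, y = p
    return ((12 * x * x - 4, 0.0), (0.0, 12 * y * y - 4))


def negative_eigenvalues(h):
    """Count negative eigenvalues of a symmetric 2x2 matrix."""
    tr = h[0][0] + h[1][1]
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    disc = math.sqrt(max(tr * tr - 4 * det, 0.0))
    return sum(1 for ev in ((tr - disc) / 2, (tr + disc) / 2) if ev < 0)


def local_learning(p, max_iter=100, tol=1e-10):
    """Newton iterations on the gradient (assumed stand-in for Berny)."""
    x, y = p
    for _ in range(max_iter):
        gx, gy = gradient((x, y))
        if math.hypot(gx, gy) < tol:
            return (x, y), True
        (a, b), (c, d) = hessian((x, y))
        det = a * d - b * c
        if abs(det) < 1e-12:  # singular Hessian: give up on this start
            break
        x -= (d * gx - b * gy) / det
        y -= (a * gy - c * gx) / det
    return (x, y), False


def memetic_saddle_search(pop_size=20, generations=30, radius=0.2, seed=0):
    rng = random.Random(seed)

    def random_point():
        return (rng.uniform(-2, 2), rng.uniform(-2, 2))

    population = [random_point() for _ in range(pop_size)]
    archive = []  # unique first-order saddle points found so far
    for _ in range(generations):
        for individual in population:
            refined, converged = local_learning(individual)
            is_ts = converged and negative_eigenvalues(hessian(refined)) == 1
            # Niching: a saddle within `radius` of an archived one is cleared.
            # Lamarckian learning: the refined coordinates, not the starting
            # ones, are stored and later used as parents.
            if is_ts and all(math.dist(refined, s) > radius for s in archive):
                archive.append(refined)
        population = []
        while len(population) < pop_size:
            if len(archive) >= 2 and rng.random() < 0.5:
                # Arithmetic crossover of two archived saddles plus mutation.
                a, b = rng.sample(archive, 2)
                w = rng.random()
                child = tuple(w * u + (1 - w) * v + rng.gauss(0, 0.3)
                              for u, v in zip(a, b))
            else:
                child = random_point()  # random restart keeps exploring
            population.append(child)
    return sorted(archive, key=energy)
```

Calling `memetic_saddle_search()` returns the distinct stationary points with exactly one negative Hessian eigenvalue that the search has located on the toy surface. On a real molecular energy surface, the energy, gradient, and Hessian would come from a quantum-chemistry or force-field calculation rather than from closed-form expressions.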

The evolutionary MMC methodology shows the promise of memetic computing techniques

in the field of molecular optimization by demonstrating excellent efficacy against several state-of-

the-art approaches. Not only did the MMC uncover the largest number of transition states, but it

also incurred the lowest computational cost. The use of the evolutionary MMC, furthermore,

allows knowledge discovery from data generated over the course of the evolution rather than

requiring prior knowledge, as is the case with many earlier methods [6–13]. It has been found

that oxygen atoms play a key role in the flexibility of the molecule, allowing a larger number of

transition states to be uncovered. In contrast, carbon atoms have been observed to limit the

flexibility of the molecule, decreasing the number of transition states uncovered. Meanwhile,

hydrogen atoms have not been observed to play any important role in the number of transition

states uncovered. Additionally, it has also been found, as one would expect, that the

number of rotatable bonds plays an obvious role in the number of transition states uncovered while

the total number of atoms has not shown any significant impact. On a side note, the large number of

transition states of GABOB and tromethamine suggests the flexibility of these molecules, allowing

them to interact with a broader range of molecular systems. GABOB as an anti-epileptic drug, for

instance, is a more effective neuro-inhibitor than GABA, from which it is derived. The flexibility that

GABOB can adopt allows it to interact more efficiently with a larger set of GABA-ergic system

components in the central nervous system.

With the generic tree-based representation, the MMC can be easily adapted for subsequent studies

of other small non-cyclic molecules that are important in biology, chemistry, and medicine. To cover

molecules that involve ring structures, an extension to the tree-based representation shall be

studied. Extensions to the current covalent-bond-driven evolutionary operators shall also be

devised. To allow working with larger molecules, techniques to intelligently restrict the execution of

the individual learning procedure [47–50] shall be explored. Finally, a web service that would allow

scholars to search for the transition states of molecular systems of their choice shall be developed.

Ultimately, our aim is to pave the way toward establishing the landscape connectivity graph, the

masterpiece of comprehensive chemical reactions.

References

1. Chen, J., S.J. Swamidass, Y. Dou, J. Bruand, and P. Baldi, “ChemDB: A Public Database of

Small Molecules and Related Chemoinformatics Resources,” Bioinformatics, 21(22),

pp.4133–4139, 2005.

2. Lagorce, D., T. Pencheva, B.O. Villoutreix, and M.A. Miteva, “DG-AMMOS: A New Tool to

Generate 3D Conformation of Small Molecules Using Distance Geometry and Automated

Molecular Mechanics Optimization for in silico Screening,” BMC Chemical Biology, 9(1),

2009.

3. Sperandio, O., M. Souaille, F. Delfaud, M.A. Miteva, and B.O. Villoutreix, “MED-3DMC: A New

Tool to Generate 3D Conformation Ensembles of Small Molecules with A Monte Carlo

Sampling of the Conformational Space,” European Journal of Medicinal Chemistry, 44(4),

pp.1405–1409, 2009.

4. Miteva, M.A., F. Guyon, and P. Tufféry, “Frog2: Efficient 3D Conformation Ensemble

Generator for Small Compounds,” Nucleic Acids Research, 38, pp.W622–W627, 2010.

5. Ellabaan, M.M.H., Y.S. Ong, M.H. Lim, and J.L. Kuo, “Finding Multiple First-order Saddle

Points Using A Valley-adaptive Clearing Genetic Algorithm,” in Proceedings of 2009 IEEE

International Symposium on Computational Intelligence in Robotics and Automation,

pp.457–462, 2009.

6. Trygubenko, S.A. and D.J. Wales, “A Doubly Nudged Elastic Band Method for Finding

Transition States,” Journal of Chemical Physics, 120(5), pp.2082–2094, 2004.

7. Carr, J.M., S.A. Trygubenko, and D.J. Wales, “Finding Pathways Between Distant Local

Minima,” Journal of Chemical Physics, 122(23), pp.234903–234909, 2005.

8. Koslover, E.S. and D.J. Wales, “Comparison of Double-ended Transition State Search

Methods,” Journal of Chemical Physics, 127(13), pp.134102–134113, 2007.

9. Sheppard, D., R. Terrell, and G. Henkelman, “Optimization Methods for Finding Minimum

Energy Paths,” Journal of Chemical Physics, 128(13), pp.134106–134115, 2008.

10. Olsen, R.A., G.J. Kroes, G. Henkelman, A. Arnaldsson, and H. Jónsson, “Comparison of

Methods for Finding Saddle Points without Knowledge of the Final States,” Journal of

Chemical Physics, 121(20), pp.9776–9792, 2004.

11. Chaudhury, P. and S.P. Bhattacharyya, “A Simulated Annealing Based Technique for Locating

First-order Saddle Points on Multidimensional Surfaces and Constructing Reaction Paths:

Several Model Studies,” Journal of Molecular Structure: THEOCHEM, 429, pp.175–186, 1998.

12. Chaudhury, P., S.P. Bhattacharyya, and W. Quapp, “A Genetic Algorithm Based Technique for

Locating First-order Saddle Point Using A Gradient-dominated Recipe,” Chemical Physics,

253(2–3), pp.295–303, 2000.

13. Bungay, S.D., R.A. Poirier, and R.J. Charron, “Optimization of Transition State Structures

Using Genetic Algorithms,” Journal of Mathematical Chemistry, 28(4), pp.389–401, 2000.

14. Holland, J.H., Adaptation in Natural and Artificial Systems, MIT Press, 1992.

15. Schlegel, H.B., “Optimization of Equilibrium Geometries and Transition Structures,” Journal

of Computational Chemistry, 3(2), pp.214–218, 1982.

16. Ellabaan, M.M.H. and Y.S. Ong, “Valley-adaptive Clearing Scheme for Multimodal

Optimization Evolutionary Search,” in Proceedings of the Ninth International Conference on

Intelligent Systems Design and Applications, pp.1–6, 2009.

17. Ong, Y.S. and A.J. Keane, “Meta-lamarckian Learning in Memetic Algorithm,” IEEE

Transactions on Evolutionary Computation, 8(2), pp.99–110, 2004.

18. Le, M.N., Y.S. Ong, Y. Jin, and B. Sendhoff, “Lamarckian Memetic Algorithms: Local Optimum

and Connectivity Structure Analysis,” Memetic Computing, 1(3), pp.175–190, 2009.

19. Ang, J.H., K.C. Tan, and A.A. Mamun, “An Evolutionary Memetic Algorithm for Rule

Extraction,” Expert Systems with Applications, 37(2), pp.1302–1315, 2010.

20. Brooks, C.L. 3rd, J.N. Onuchic, and D.J. Wales, “Statistical Thermodynamics. Taking A Walk on

A Landscape,” Science, 293(5530), pp.612–613, 2001.

21. Wales, D.J., Energy Landscapes: Applications to Clusters, Biomolecules and Glasses,

Cambridge University Press, 2003.

22. Soh, H., Y.S. Ong, Q.C. Nguyen, Q.H. Nguyen, M.S. Habibullah, T. Hung, and J.L. Kuo,

“Discovering Unique, Low-Energy Pure Water Isomers: Memetic Exploration, Optimization,

and Landscape Analysis,” IEEE Transactions on Evolutionary Computation, 14(3), pp.419–

437, 2010.

23. Jensen, F., Introduction to Computational Chemistry, Wiley, 1999.

24. Zhou, T., K. Lafleur, and A. Caflisch, “Complementing Ultrafast Shape Recognition with An Optical

Isomerism Descriptor,” Journal of Molecular Graphics and Modelling, 29(3), pp.443–449,

2010.

25. Ballester, P.J., “Ultrafast Shape Recognition: Method and Applications,” Future Medicinal

Chemistry, 3(1), pp.65–78, 2011.

26. Truong, T.N. and D.G. Truhlar, “Ab initio Transition State Theory Calculations of the Reaction

Rate for OH+CH4→H2O+CH3,” Journal of Chemical Physics, 93(3), pp.1761–1769, 1990.

27. Henkelman, G. and G. Jóhannesson, “Methods for Finding Saddle Points and Minimum

Energy Paths,” Progress in Theoretical Chemistry and Physics, 5, pp.269–302, 2002.

28. Moscato, P., “On Evolution, Search, Optimization, Genetic Algorithms, and Martial Arts:

Towards Memetic Algorithms,” Caltech Concurrent Computation Program, 1989.

29. Neri, F., J. Toivanen, G. Cascella, and Y.S. Ong, “An Adaptive Multimeme Algorithm for

Designing HIV Multidrug Therapies,” IEEE/ACM Transactions on Computational Biology and

Bioinformatics, 4(2), pp.264–278, 2007.

30. Vitela, J.E. and O. Castanos, “A Real-coded Niching Memetic Algorithm for Continuous

Multimodal Function Optimization,” in Proceedings of 2008 IEEE Congress on Evolutionary

Computation, pp.2170–2177, 2008.

31. Ong, Y.S., M.H. Lim, and X.S. Chen, “Memetic Computation—Past, Present, & Future,” IEEE

Computational Intelligence Magazine, 5(2), pp.24–36, 2010.

32. Chen, X.S., Y.S. Ong, M.H. Lim, and K.C. Tan, “A Multi-facet Survey on Memetic

Computation,” IEEE Transactions on Evolutionary Computation, 15(5), pp.591–607, 2011.

33. Cheng, H.C., T.C. Chiang, and L.C. Fu, “A Two-stage Hybrid Memetic Algorithm for

Multiobjective Job Shop Scheduling,” Expert Systems with Applications, 38(9), pp.10983–

10998, 2011.

34. Zhang, J., Z. Zhan, Y. Lin, N. Chen, Y. Gong, J. Zhong, H.S.H. Chung, Y. Li, and Y. Shi,

“Evolutionary Computation Meets Machine Learning: A Survey,” IEEE Computational

Intelligence Magazine, 6(4), pp.68–75, 2011.

35. Jin, Y., K. Tang, X. Yu, B. Sendhoff, and X. Yao, “A Framework for Finding Robust Optimal

Solutions Over Time,” Memetic Computing, in Press.

36. Le, M.N., Y.S. Ong, S. Menzel, Y. Jin, and B. Sendhoff, “Evolution by Adapting Surrogates,”

Evolutionary Computation, in Press.

37. Chia, J.Y., C.K. Goh, K.C. Tan, and V.A. Shim, “Memetic Informed Evolutionary Optimization

via Data Mining,” Memetic Computing, 3(2), pp.73–87, 2011.

38. Damas, S., O. Cordón, and J. Santamaría, “Medical Image Registration Using Evolutionary

Computation: An Experimental Survey,” IEEE Computational Intelligence Magazine, 6(4),

pp.26–42, 2011.

39. Meng, Y., Y. Zheng, and Y. Jin, “Autonomous Self-reconfiguration of Modular Robots by

Evolving A Hierarchical Mechanochemical Model,” IEEE Computational Intelligence

Magazine, 6(1), pp.43–54, 2011.

40. Shim, V.A., K.C. Tan, J.Y. Chia, and J.K. Chong, “Evolutionary Algorithms for Solving Multi-

objective Travelling Salesman Problems,” Flexible Services and Manufacturing, 23(2),

pp.207–241, 2011.

41. Le, M.N., Y.S. Ong, Y. Jin, and B. Sendhoff, “Gene Meets Meme in Computational

Intelligence: A Theoretic Modeling of Symbiotic Evolution,” IEEE Computational Intelligence

Magazine, 7(1), pp.20–35, 2012.

42. Thomas, S.A. and Y. Jin, “Evolving Connectivity Between Genetic Oscillators and Switches

Using Evolutionary Algorithms,” Journal of Bioinformatics and Computational Biology, in

Press.

43. Kabsch, W., “A Solution for the Best Rotation to Relate Two Sets of Vectors,” Acta

Crystallographica, 32(5), pp.922–923, 1976.

44. Weast, R.C., Handbook of Chemistry and Physics, CRC Press, 1984.

45. Kok, S. and C. Sandrock, “Locating and Characterizing the Stationary Points of the Extended

Rosenbrock Function,” Evolutionary Computation, 17(3), pp.437–453, 2009.

46. Maeda, S., Y. Matsuda, S. Mizutani, A. Fujii, and K. Ohno, “Long-range Migration of A Water

Molecule to Catalyze A Tautomerization in Photoionization of the Hydrated Formamide

Cluster,” Journal of Physical Chemistry A, 114(44), pp.11896–11899, 2010.

47. Handoko, S.D., C.K. Kwoh, and Y.S. Ong, “Using Classification for Constrained Memetic

Algorithm: A New Paradigm,” in Proceedings of IEEE International Conference on Systems,

Man and Cybernetics, pp.547–552, 2008.

48. Handoko, S.D., C.K. Kwoh, and Y.S. Ong, “Classification-assisted Memetic Algorithms for

Equality-constrained Optimization Problems,” Lecture Notes in Computer Science, 5866,

pp.391–400, 2009.

49. Handoko, S.D., C.K. Kwoh, and Y.S. Ong, “Feasibility Structure Modeling: An Effective

Chaperone for Constrained Memetic Algorithms,” IEEE Transactions on Evolutionary

Computation, 14(5), pp.740–758, 2010.

50. Handoko, S.D., C.K. Kwoh, Y.S. Ong, and J. Chan, “Classification-assisted Memetic Algorithms

for Solving Optimization Problems with Restricted Equality Constraint Function Mapping,” in

Proceedings of 2011 IEEE Congress on Evolutionary Computation, pp.1209–1216, 2011.