Hands-on workshop: Density-Functional Theory and Beyond - Frontiers of Advanced Electronic Structure and Molecular Dynamics Methods
Beijing, China, July 30 – August 10, 2018

Tutorial III: Searching conformational space
Manuscript for Exercise Problems

Prepared by Dmitrii Maksimov, Haiyuan Wang, and Carsten Baldauf
Peking University
Beijing, August 2, 2018




Introduction

The aim of this tutorial is to familiarize you with the basic concepts of global searches in molecular structure space using the search tool Fafoom.

The practice session consists of three parts:

Part A: Genetic algorithm search
Problem I: Preparing to run
Problem II: Start of the first run

Part B: Analysis of the results
Problem III: Collecting the structures
Problem IV: Sorting the structures
Problem V: Visualizing and analyzing the results

Part C: And beyond
Problem VI: Adding custom parameters for visualizing
Problem VII: Extra capabilities for search

Please start with Problem I and prepare and launch your genetic algorithm (GA) based search for alanine dipeptide. While waiting for the results, please proceed with Part B. Once your GA run is completed (Problem II), you can apply what you have learned while doing Part B. Part C features some optional free exercises.

In the directory $HandsOn/tutorial_3/, you can find all the files necessary for this tutorial. Dedicated folders have been prepared in the skel/ directory for each problem. Please copy the contents of the $HandsOn/tutorial_3/skel/ folder into your own working directory. Solutions are provided in $HandsOn/tutorial_3/solutions/, while some scripts are provided in $HandsOn/tutorial_3/utilities/.

Practical notes

• You can use Jmol for visualizing 3D structures. Jmol recognizes a number of different chemical formats. You can measure bond lengths, bond angles and dihedral angles with Jmol: just double-click on the first atom and click the remaining ones.

• The atom ordering in a file matters. Most of the scripts/programs rely on the fact that the atoms are ordered in a consistent manner, especially when comparing structures!

• The Berlin ab initio amino acid database (see footnote 1) provides structural data of 20 amino acids, bare and in cation complexes [1]. You can use the data stored there for benchmarking!

1 http://aminoaciddb.rz-berlin.mpg.de/


Investigating molecular structures

Representing a chemical compound

Small (bio)organic molecules are often highly flexible and may adopt different 3D structures that differ in properties and energy. For methods like protein-ligand docking or catalyst design it is important to know the low-energy conformers of the molecule that build its molecular ensemble. To this end, the molecular conformational space needs to be sampled efficiently so that all relevant low-energy conformers are found.

Figure 1 depicts popular chemical representations of alanine dipeptide. The chemical formula stores only the composition of the compound. The simplified molecular-input line-entry system (SMILES) [2] string is a convenient representation as it allows for encoding the connectivity, the bond order and the stereochemical information in a one-line notation. It should be noted that a number of valid SMILES codes can be constructed for the same compound. An advantage of SMILES strings is that they are intuitive and can be easily read and written.

[Figure 1 panels:
Chemical formula: C6H12N2O2
SMILES string: CC(=O)N[C@H](C(=O)NC)C
InChI string: 1S/C6H12N2O2/c1-4(6(10)7-3)8-5(2)9/h4H,1-3H3,(H,7,10)(H,8,9)/t4-/m0/s1 (annotated layers: version, standard; chemical formula; atom connections; hydrogen atoms; stereochemical layer)
Graph: structural formula of alanine dipeptide
3D: three-dimensional structure]

Figure 1: Alternative chemical representations, using alanine dipeptide as an example.

SMILES codes can be used to generate a schematic, graph-like representation of a molecule. Finally, the last missing piece of information, namely the spatial arrangement of atoms, is revealed in a 3D representation of a molecule. Two types of coordinates are commonly employed to represent a molecular 3D structure: Cartesian and internal coordinates. In Cartesian coordinates, each atom is represented as a point in 3D space. Cartesian coordinates are universal, intuitive and always relative to the origin of the coordinate system. The simplest internal coordinates are based on the 'Z-matrix coordinates', i.e. they include bond lengths, bond angles as well as dihedral angles (torsions) (Figure 1). The main advantage of the internal coordinates is that they are orientation- and location-invariant, i.e. they remain unchanged upon translation and rotation in 3D space. Moreover, the dihedral angles are in most cases the only relevant degrees of freedom (DOFs). Bond lengths and bond angles usually have only one minimum, i.e. the energy will increase rapidly if these parameters adopt non-optimal values. In contrast, there is no single minimum for the value of a dihedral angle and in most cases, diverse values can be adopted. The adopted values depend on the neighboring atoms/functional groups and on the steric interactions within the conformation.

For the purpose of global structure search as we perform it here, only single, non-ring bonds between non-terminal atoms are considered as fully rotatable bonds, after excluding bonds that are attached to methyl groups, which carry three identical substituents. Further, the cis/trans nomenclature can be utilized to describe the relative orientation of functional groups within a molecule. If the functional groups are oriented in the same direction we refer to it as cis, whereas if the groups are oriented in opposite directions we refer to it as trans.

The full representation of a molecular 3D structure is the list of its Cartesian coordinates. An alternative way to store 3D structures is to use a reduced representation that contains the connectivity of atoms (in SMILES notation) and DOFs with the corresponding values. The difference between these two alternative representations is illustrated in Figure 2.


Full representation (Cartesian coordinates):

C 7.77629 -0.99492 -2.45999
C 6.70435 -0.55761 -1.49837
O 6.43080 0.61938 -1.29985
N 6.05315 -1.59489 -0.87545
C 4.97379 -1.36011 0.08672
H 4.45919 -0.42867 -0.17831
C 5.54586 -1.18095 1.51158
O 5.22975 -1.89502 2.46012
N 6.40900 -0.11162 1.63863
C 7.00728 0.22717 2.90258
C 3.97100 -2.50525 0.02836
H 8.45209 -0.15731 -2.65424
H 7.31692 -1.31259 -3.39960
H 8.36070 -1.81736 -2.03868
H 6.49171 -2.50679 -0.87046
H 6.52118 0.51324 0.84106
H 6.22683 0.34653 3.65919
H 7.56699 1.15828 2.79096
H 7.68144 -0.57908 3.20522
H 3.10880 -2.29845 0.67116
H 4.41840 -3.44833 0.36172
H 3.60690 -2.65261 -0.99416

Reduced representation:

SMILES: CC(=O)N[C@H](C(=O)NC)C

Significant degrees of freedom:
rotatable bond 1 (C-N-C-C): 178°
rotatable bond 2 (N-C-C-N): 61°
rotatable bond 3 (C-C-N-C): -88°
rotatable bond 4 (C-N-C-C): -180°

Figure 2: Comparison of the full and the reduced representation of the 3D structure of alanine dipeptide. The full representation contains all atomic Cartesian coordinates. The reduced representation consists of the SMILES string and a dictionary of the rotatable bonds together with the corresponding values of one specific conformation. For bond lengths and bond angles, standard values can be assumed.

The substantial advantage of the reduced representation is that, for a specified chemical compound, the only data stored are at the same time the DOFs for the optimization. This is extremely convenient, especially for larger systems. Nevertheless, one should keep in mind that the reduced representation stores no information about the bond lengths and bond angles, but assumes no substantial changes from standard or equilibrium values of these coordinates.
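The reduced representation can be pictured as a small data structure. The following is a minimal sketch, not Fafoom's internal format: the class name ReducedStructure and the dictionary labels are hypothetical, while the SMILES string and the torsion values are those of Figure 2.

```python
from dataclasses import dataclass, field


@dataclass
class ReducedStructure:
    """Hypothetical container: connectivity as SMILES plus DOF values."""
    smiles: str
    # Dihedral angles in degrees, keyed by an illustrative label.
    torsions: dict = field(default_factory=dict)


# Values taken from Figure 2 (alanine dipeptide, one conformation).
alanine_dipeptide = ReducedStructure(
    smiles="CC(=O)N[C@H](C(=O)NC)C",
    torsions={
        "bond 1 (C-N-C-C)": 178.0,
        "bond 2 (N-C-C-N)": 61.0,
        "bond 3 (C-C-N-C)": -88.0,
        "bond 4 (C-N-C-C)": -180.0,
    },
)

# Four numbers describe the conformation in the reduced representation,
# compared with 22 atoms x 3 Cartesian coordinates in the full one.
n_reduced = len(alanine_dipeptide.torsions)
n_cartesian = 22 * 3
```

The comparison of n_reduced with n_cartesian illustrates why the reduced representation is convenient as a search space, at the price of assuming standard bond lengths and bond angles.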

Geometric similarity of structures

The quantification of molecular similarity is a common problem that needs to be solved, e.g. in order to remove duplicates from a pool of 3D structures. The most popular approach to quantify the similarity is the root-mean-square deviation (RMSD), calculated for two sets of Cartesian coordinates.

Root-mean-square deviation (RMSD): Given two 3D geometries of a compound with N atoms, the RMSD is defined as follows:

\mathrm{RMSD} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} d_i^2}    (1)

where d_i is the distance between the corresponding atoms. Although fast to calculate, the RMSD value describes the similarity of two molecular conformations only after the best superposition of the geometries is identified. The most popular algorithm for finding the best alignment of two sets of coordinates is the Kabsch algorithm [3]. After translating the centroids of the two sets of coordinates to the center of the coordinate system, the Kabsch algorithm computes the optimal rotation matrix that minimizes the RMSD. Often only heavy atoms are considered in RMSD calculations. There are multiple advantages of using the Cartesian RMSD, e.g. it is a well-recognized metric, it is easy to calculate and reproduce, and it is available as a basic functionality in most modeling packages.
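As an illustration of Eq. (1), the sketch below computes the RMSD of two coordinate sets after moving both centroids to the origin. It deliberately omits the rotation step of the Kabsch algorithm, so it only gives the minimal RMSD for geometries that are already rotationally aligned. The function names are hypothetical and not part of Fafoom.

```python
import math


def center(coords):
    """Translate a list of (x, y, z) tuples so that its centroid is at the origin."""
    n = len(coords)
    cx, cy, cz = (sum(c[k] for c in coords) / n for k in range(3))
    return [(x - cx, y - cy, z - cz) for x, y, z in coords]


def rmsd(coords_a, coords_b):
    """RMSD following Eq. (1); atoms must be listed in the same order.

    Only the translation is removed here. The optimal rotation
    (Kabsch algorithm) is NOT applied in this simplified sketch.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("both geometries must have the same number of atoms")
    a, b = center(coords_a), center(coords_b)
    squared = sum((p[k] - q[k]) ** 2 for p, q in zip(a, b) for k in range(3))
    return math.sqrt(squared / len(a))


# A rigid translation does not change the RMSD after centering.
geometry = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
shifted = [(x + 5.0, y + 5.0, z + 5.0) for x, y, z in geometry]
rmsd_shifted = rmsd(geometry, shifted)

# Stretching a two-atom "bond" from 2 to 4 moves each atom by 1 after centering.
rmsd_stretched = rmsd([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
                      [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)])
```

The dependence on a consistent atom ordering, mentioned in the practical notes, is visible here: the function pairs atoms purely by their position in the list.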

Torsional RMSD (tRMSD): Instead of using Cartesian coordinates, the values of the significant torsional degrees of freedom, i.e. rotatable bonds, can be used. In analogy to the Cartesian RMSD, given two 3D geometries with m rotatable bonds, the formula for the tRMSD reads:

\mathrm{tRMSD} = \sqrt{\frac{1}{m} \sum_{i=1}^{m} \theta_i^2}    (2)

where θ_i is the angular difference between the values of the corresponding dihedral angles. The calculation of the tRMSD is cheaper than that of the Cartesian RMSD. The value of the tRMSD is also easier to interpret. The major practical drawback of the tRMSD is the necessity to always provide a list of the considered torsions in order to ensure reproducibility.
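A minimal sketch of Eq. (2) follows. It assumes that angles are given in degrees and that the angular difference is taken as the shortest way around the circle, so that 178° and -180° differ by 2° rather than 358°. The function names are hypothetical, and any normalization used inside Fafoom may differ.

```python
import math


def angular_difference(a_deg, b_deg):
    """Smallest difference between two angles in degrees, in the range [0, 180]."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def trmsd(torsions_a, torsions_b):
    """Torsional RMSD following Eq. (2) for two equally ordered torsion lists."""
    if len(torsions_a) != len(torsions_b) or not torsions_a:
        raise ValueError("need two non-empty torsion lists of equal length")
    squared = sum(angular_difference(a, b) ** 2
                  for a, b in zip(torsions_a, torsions_b))
    return math.sqrt(squared / len(torsions_a))


# Torsion values of Figure 2; -180 and 180 describe the same orientation.
conformer_a = [178.0, 61.0, -88.0, -180.0]
conformer_b = [178.0, 61.0, -88.0, 180.0]
same = trmsd(conformer_a, conformer_b)

periodic = angular_difference(178.0, -180.0)
```

Handling the periodicity explicitly matters: a naive subtraction would report two nearly identical conformers as very different.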


Characterizing the Potential Energy Surface

The potential energy surface (PES) describes the relation between the geometry and the potential energy of a molecule [4] (Figure 3).

[Figure 3 axes: Energy, Coordinate 1, Coordinate 2; labeled features: global minimum, local minimum, 1st order saddle point]

Figure 3: Schematic representation of a model PES representing the energy as a function of two coordinates.

Usually, a number of local minima, with a global minimum among them, exist on the PES of flexible organic compounds. If the system is in a minimum, any small displacement will increase the potential energy. Each PES minimum corresponds to a different 3D geometry. As a consequence, a variety of low-energy conformers can be adopted by flexible molecules. Another type of stationary point located on the PES is a first-order saddle point, which corresponds to a transition state (TS) structure. A first-order saddle point is a maximum along exactly one direction and a minimum in all other directions.

In the following, we focus on methods for global optimization, i.e. methods that can be used to find the global minimum on the PES.

Global optimization

The exploration of the high-dimensional PES is a complex task. The solution space is vast and thus it is generally infeasible to tackle the search problem in a deterministic way. A number of stochastic methods have been developed in order to efficiently sample the PES and to generate low-energy conformers. Some of the most popular methods are summarized in Table 1.

Table 1: Popular sampling approaches. Names of freely available programs are highlighted in boldface. Reprinted from [5].

Method | Description | Implemented, e.g., in
grid-based | based on grids of selected Cartesian or internal coordinates (e.g., grids of different torsional angle values of a molecule) | CAESAR [6], Open Babel [7], Confab [8], MacroModel [9], MOE [10]
rule/knowledge-based | use known (e.g., from experiments) structural preferences of compounds | ALFA [11], CONFECT [12], CORINA and ROTATE [13, 14], COSMOS [15, 16], OMEGA [17]
population-based metaheuristic | improve candidate solutions in a guided search | Balloon [18], Cyndi [19]
distance geometry | based on a matrix with permitted distances between pairs of atoms | RDKit [20]
basin-hopping [21] / minima hopping [22] | based on moves across the PES combined with local relaxation | ASE [23], GMIN [24], TINKER SCAN [25]

In this tutorial, we selected a genetic algorithm based search for sampling the PES. The genetic algorithm (GA) [26, 27, 28] is a metaheuristic optimization method and belongs to the family of evolutionary computing techniques. The concept of a GA is to mimic evolution and follow the 'survival of the fittest' principle. The algorithm starts with a pool of random candidate solutions. The best of the solutions are allowed to evolve while the unfavorable ones are removed from the pool. With this, the algorithm uses the available information in order to explore promising candidates. GAs have found numerous applications in the field of 3D structure prediction, e.g.: (i) conformational searches for molecules, e.g. unbranched alkanes [29] or polypeptide folding [30]; (ii) molecular design [31]; (iii) protein-ligand docking [32, 33]; (iv) cluster optimization [34, 35, 36, 37]; (v) predictions of crystal structures [38, 39]. In most of the applications, the fitness is a function of the total energy. In addition, an example of a GA in which experimental information is included in the search process was suggested by Neiss and Schoos [40].

Fafoom - Flexible algorithm for optimization of molecules

Fafoom performs a global search based on a user-curated selection of degrees of freedom and conducts local optimization in Cartesian coordinates with an external software (here with FHI-aims). Fafoom is a Python package implemented using Python 2.7 and designed for sampling the conformational space of organic molecules. It is distributed under the GNU Lesser General Public License [41] and available from GitHub (https://github.com/FHIBioGroup/fafoom-dev). Some important aspects of Fafoom:

• Energies and distances are measured in eV and Å, respectively.

• Generated 3D structures are accepted as sensible structures only after checking for clashes between atoms that are not connected.

• The similarity of 3D structures is checked by either:
  – Cartesian RMSD: if the RMSD exceeds a certain cutoff, the structures are considered to be different.
  – Degrees of freedom deviation (DOFd): a measure of the variation of the values of the DOFs. DOFd is a list with one value (True or False) per entry (per DOF type). E.g. for the type 'rotatable bonds' the list will store 'True' if the tRMSD value for the evaluated 3D structures does not exceed a certain cutoff. If the list contains at least one 'False', the two structures are considered to be different. In other words, two structures are considered similar only if the values of all considered degrees of freedom are similar.

• 3D structures are internally encoded as strings in structure-data format (SDF). The SDF is a combination of the MDL Mol format, which stores the information about the atoms, bonds, connectivity and 3D coordinates, with any associated data.

• A blacklist is maintained to keep track of generated conformers. The blacklist stores all structures that: (i) were subjected to local relaxations and (ii) resulted from converged local relaxations. The blacklist is consulted in order to evaluate the uniqueness of the newly generated structures and to ultimately avoid recomputing structures.


Genetic algorithm based search

The main aim of the Fafoom package is to allow for performing genetic-algorithm-based searches for sampling the conformational space. The operating principle of the algorithm is given by the following pseudocode (Algorithm 1):

# initialization
while i < popsize:
    x = random_sensible_geometry
    if x is not in the blacklist:
        blacklist.append(x)
        x = local_relaxation(x)
        blacklist.append(x)
        population.append(x)
        i += 1

# iteration
while j < iterations:
    population.sort(index=energy)
    (parent1, parent2) = population.select_candidates(2)
    child = sensible_crossover(parent1, parent2)
    repeat
        child = mutation(child)
    until child is sensible and is not in the blacklist
    blacklist.append(child)
    child = local_relaxation(child)
    blacklist.append(child)
    population.append(child)
    population.sort(index=energy)
    population.delete_high_energy_candidates(1)
    if convergence criteria met:
        break
    else:
        j += 1

Algorithm 1: Genetic algorithm for sampling the conformational space of molecules.

Initialization

Starting from a molecule in SDF format, degrees of freedom, e.g. rotatable bonds, cis/trans bonds, and pyranose rings, are initially characterized. This assignment does not change during the GA search, as we consider intact molecules. Next, random values are assigned to the DOFs:

• Rotatable bonds are assigned integers in the range from −179° to 180°.

• cis/trans bonds can be either 0° or 180°.

• Pyranose ring conformations are assigned as one of 38 ring puckers (two chairs, six boats, six skew-boats, 12 half-chairs and 12 envelopes [42, 43]) by assigning an integer number between 0 and 37.

If the resulting 3D geometry is sensible and unique, i.e. different from all structures stored in the blacklist, local relaxation is performed. Once the local relaxation is completed, the values of the DOFs are updated and the structure is added to the population and to the blacklist. The procedure (i.e. generating, optimizing, and checking a random structure) is repeated until the intended population size N is reached.
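The random assignment of DOF values described above can be sketched as follows. This is an illustration only: the function name and the dictionary keys are hypothetical, and the sensibility and blacklist checks that Fafoom performs afterwards are left out.

```python
import random


def random_dof_values(n_torsions, n_cistrans, n_pyranose, rng=None):
    """Draw random starting values for each DOF type (hypothetical helper).

    Ranges follow the text: torsions are integers from -179 to 180 degrees,
    cis/trans bonds are 0 or 180 degrees, ring puckers are indices 0..37.
    """
    rng = rng or random.Random()
    return {
        "torsion": [rng.randint(-179, 180) for _ in range(n_torsions)],
        "cistrans": [rng.choice((0, 180)) for _ in range(n_cistrans)],
        "pyranosering": [rng.randint(0, 37) for _ in range(n_pyranose)],
    }


# Example: a molecule with four rotatable bonds, one cis/trans bond, one ring.
values = random_dof_values(4, 1, 1, rng=random.Random(0))
```

Passing a seeded random.Random instance makes a run reproducible, which is useful when debugging a search setup.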

Iteration

The objective function (energy) is optimized as the population evolves over generations. An iteration begins with assigning fitness values F_i to the individual population members i:

F_i = \frac{E_{\max} - E_i}{E_{\max} - E_{\min}}    (3)


where E_min (E_max) refers to the lowest (highest) energy among the energies of the structures in the current population and E_i refers to the energy of structure i. The highest fitness (F = 1) is assigned to the structure with the lowest energy ('best', i.e. most stable) in the population. The structure with the highest energy is assigned the fitness F = 0. For populations with little variation in the energy values, i.e. if E_max − E_min < 0.001 eV, all structures are assigned a fitness value of F = 1. Once the fitness values are assigned, the genetic operations follow.
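The fitness assignment of Eq. (3), including the special case for nearly degenerate populations, can be written in a few lines. The function name is hypothetical; energies are assumed to be in eV, as elsewhere in Fafoom.

```python
def fitness_values(energies, threshold=0.001):
    """Fitness per Eq. (3): 1 for the lowest energy, 0 for the highest.

    If the energy spread is below `threshold` (0.001 eV in the text),
    every structure receives a fitness of 1.
    """
    e_min, e_max = min(energies), max(energies)
    spread = e_max - e_min
    if spread < threshold:
        return [1.0] * len(energies)
    return [(e_max - e) / spread for e in energies]


# Three structures with energies in eV: the most stable one gets F = 1.
example = fitness_values([-10.0, -9.5, -9.0])

# Nearly degenerate population: all fitness values are set to 1.
degenerate = fitness_values([-10.0, -10.0002])
```

The special case avoids a division by a number close to zero and prevents tiny numerical energy differences from dominating the selection.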

Selection: Three different mechanisms for assigning selection probabilities p to the structures are implemented. Based on the resulting probability values, two distinct structures are selected and are referred to as 'parents' for the subsequent genetic operations (crossing-over and mutation).

• roulette wheel: p_i = \frac{F_i}{\sum_{n=1}^{N} F_n}

• reversed roulette wheel: p_i = \frac{F_{N+1-i}}{\sum_{n=1}^{N} F_n}

• uniform: p_i = \frac{1}{N}
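The three selection mechanisms listed above can be sketched as follows. The scheme names passed as strings and both function names are hypothetical, and the reversed roulette wheel assumes that the population is sorted by fitness so that index N+1−i is meaningful. The fallback to uniform probabilities when all fitness values are zero is an assumption of this sketch.

```python
import random


def selection_probabilities(fitness, scheme="roulette_wheel"):
    """Selection probabilities p_i for a population sorted by fitness."""
    n = len(fitness)
    total = sum(fitness)
    if scheme == "uniform" or total == 0:
        return [1.0 / n] * n
    if scheme == "roulette_wheel":
        return [f / total for f in fitness]
    if scheme == "reversed_roulette_wheel":
        # p_i uses the fitness of member N+1-i, i.e. the reversed list.
        return [f / total for f in reversed(fitness)]
    raise ValueError("unknown scheme: " + scheme)


def select_parents(population, probabilities, rng=None):
    """Draw two distinct population members according to the probabilities."""
    if len(population) < 2:
        raise ValueError("need at least two structures")
    rng = rng or random.Random()
    indices = range(len(population))
    first = rng.choices(indices, weights=probabilities)[0]
    # Exclude the first parent from the second draw.
    remaining = [0.0 if i == first else p for i, p in enumerate(probabilities)]
    if sum(remaining) == 0:
        remaining = [0.0 if i == first else 1.0 for i in indices]
    second = rng.choices(indices, weights=remaining)[0]
    return population[first], population[second]


fitness = [1.0, 0.5, 0.0]  # sorted: best structure first
p_roulette = selection_probabilities(fitness)
p_reversed = selection_probabilities(fitness, "reversed_roulette_wheel")
p_uniform = selection_probabilities(fitness, "uniform")
parents = select_parents(["a", "b", "c"], p_roulette, rng=random.Random(1))
```

With the roulette wheel, the structure with F = 0 is never selected, so in this example the parents are always "a" and "b" in some order.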

Crossing-over: The aim of the crossing-over operation is to recombine favorable properties of individuals. After combining the DOFs into a vector of values, the crossing-over is a two-step procedure (Figure 4): (i) a cutting point in the vector(s) is defined and (ii) the respective segments are exchanged and thereby recombined into new candidate structures ('children').

[Figure 4 shows the torsion vectors (t1 ... t6) and cis/trans vectors (c1 ... c6) of Parent 1 and Parent 2 and the children obtained from them by single point and by uniform crossing-over.]

Figure 4: Crossing-over procedure.

The structures generated by crossing-over must be sensible (clash free); otherwise this step is repeated until a sensible geometry is generated or a maximum number of attempts is reached. If no sensible geometry can be generated, the children are exact copies of the parents. Two crossing-over methods are implemented in Fafoom: single point and uniform crossover.
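The two crossing-over variants can be illustrated on plain lists of DOF values. This is a sketch with hypothetical function names. It omits the clash check and the retry loop described above, and it assumes that uniform crossover swaps each position independently with probability 0.5, which the text does not specify.

```python
import random


def single_point_crossover(parent1, parent2, rng=None):
    """Cut both vectors at one random point and exchange the tails."""
    if len(parent1) != len(parent2) or len(parent1) < 2:
        raise ValueError("parents must have equal length of at least 2")
    rng = rng or random.Random()
    cut = rng.randint(1, len(parent1) - 1)
    child1 = parent1[:cut] + parent2[cut:]
    child2 = parent2[:cut] + parent1[cut:]
    return child1, child2


def uniform_crossover(parent1, parent2, rng=None):
    """Decide independently for each position whether the values are swapped."""
    if len(parent1) != len(parent2):
        raise ValueError("parents must have equal length")
    rng = rng or random.Random()
    child1, child2 = [], []
    for a, b in zip(parent1, parent2):
        if rng.random() < 0.5:  # assumed swap probability
            a, b = b, a
        child1.append(a)
        child2.append(b)
    return child1, child2


torsions_1 = [10, 20, 30, 40]
torsions_2 = [-10, -20, -30, -40]
sp_children = single_point_crossover(torsions_1, torsions_2, random.Random(0))
un_children = uniform_crossover(torsions_1, torsions_2, random.Random(0))
```

In both variants every position of the two children together still holds exactly the two parental values, so no new DOF values are created by crossing-over alone; that is the role of mutation.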

Mutation: With some probability, new individuals from the crossing-over procedure may undergo mutation, performed independently for each DOF type. To this end, selected DOF values are randomly changed, which then results in changes in the 3D structure and the energy. The probability of a mutation and the maximum number of alterations can be controlled with parameters. The location and the actual number of alterations are decided randomly. This genetic operation, too, is only successful if the created 3D structure is sensible and unique. Otherwise the mutation is repeated until a sensible and unique structure is generated or a maximum number of attempts is exceeded.
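A mutation of torsion values along these lines might look as follows. The function and parameter names are hypothetical, and the sketch leaves out the check for sensible and unique structures as well as the retry loop.

```python
import random


def mutate_torsions(torsions, probability, max_changes, rng=None):
    """Return a possibly mutated copy of a list of torsion values (degrees).

    With the given probability, between 1 and `max_changes` randomly
    chosen positions receive a new random integer from -179 to 180.
    """
    rng = rng or random.Random()
    mutated = list(torsions)
    if not mutated or max_changes < 1:
        return mutated
    if rng.random() >= probability:
        return mutated  # no mutation this time
    n_changes = rng.randint(1, min(max_changes, len(mutated)))
    for position in rng.sample(range(len(mutated)), n_changes):
        mutated[position] = rng.randint(-179, 180)
    return mutated


original = [178, 61, -88, -180]
unchanged = mutate_torsions(original, 0.0, 2, random.Random(0))
changed = mutate_torsions(original, 1.0, 2, random.Random(0))
```

Because the function works on a copy, the parent values stay intact and a failed mutation attempt can simply be repeated.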

Local optimization and update: A structure created by the aforementioned genetic operations is passed on to the external software for local optimization (here FHI-aims). Once the optimization is completed, the values of the DOFs are updated and the structure is added to the blacklist. After adding the newly optimized structure to the population, the highest-energy individual in the population is eliminated in order to maintain a population of constant size and to implement the idea of subsequent generations.

Termination

After a minimum number of iterations, the convergence of the algorithm is evaluated. The algorithm terminates if at least one of the following criteria is met:

• the lowest energy has not changed by more than a defined threshold during a defined number of iterations

• the lowest energy has reached a defined value

• the maximum number of iterations is exceeded

• relaxation with the external package (FHI-aims) fails
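The first three termination criteria can be combined into a single check, sketched below. The function and parameter names are hypothetical and do not necessarily correspond to Fafoom's parameter names. The fourth criterion (a failed FHI-aims relaxation) is not modeled, and the sketch treats reaching the maximum number of iterations as termination.

```python
def should_terminate(best_energies, min_iterations, max_iterations,
                     window, energy_threshold, energy_wanted=None):
    """Decide whether the search should stop.

    `best_energies` holds the lowest energy (eV) found after each iteration.
    """
    iteration = len(best_energies)
    if iteration >= max_iterations:
        return True
    if iteration < min_iterations:
        return False  # convergence is not evaluated yet
    if energy_wanted is not None and best_energies[-1] <= energy_wanted:
        return True
    if iteration > window:
        # Change of the lowest energy over the last `window` iterations.
        change = abs(best_energies[-1] - best_energies[-1 - window])
        if change <= energy_threshold:
            return True
    return False


history = [-10.0, -10.5, -10.5, -10.5, -10.5]
converged = should_terminate(history, 3, 20, 3, 0.001)

improving = [-10.0, -10.2, -10.4, -10.6, -10.8]
still_running = should_terminate(improving, 3, 20, 3, 0.001)
target_reached = should_terminate(improving, 3, 20, 3, 0.001, energy_wanted=-10.7)
```

Note that a stalled lowest energy only indicates convergence of this particular run; as stated later in the tutorial, several independent runs are advisable.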

Parameters

The Fafoom parameters can be assigned to three groups: (i) molecular settings, (ii) run settings, and (iii) GA settings. The list of all parameters (together with descriptions and defaults) can be found in the Fafoom manual. An overview of the available parameters is given in Figure 5.

[Figure 5 annotations:
• check the box on chirality in Part II
• similarity descriptor between structures: 'cartesian' or 'internal_coord'
• number of the initial random individuals
• command to run FHI-aims
• number of iterations]

Figure 5: Overview of the Fafoom parameters as contained in the configuration file parameters.txt.


Part A: Genetic algorithm search

In the first part of this tutorial you will use Fafoom and run a genetic algorithm based search for the alanine dipeptide, see the structure in Figure 6.

[Figure 6 shows the structural formula of alanine dipeptide.]

Figure 6: Alanine dipeptide

Problem I: Preparing to run

First of all, Fafoom requires the 3D representation of the molecule (coordinates) as a structure template as well as a definition of the DOFs to be adjusted. For example, each torsion is defined as a list of 4 atoms. There are multiple ways to create this structural input, for example:

• From a SMILES code, using the prepare.py script.

• From a 3D molecular file format like PDB, XYZ, MOL, etc., by converting to SDF format.

• By manual construction of the molecule with, e.g., Jmol.

Methods 2 and 3 require further manual specification of the DOFs, i.e. lists of atoms that define the torsion angles to which genetic operations are to be applied.

Method 1: Preparing from SMILES code

Create the file mol.smi, which contains the SMILES code of the molecule. In the case of alanine dipeptide, the SMILES code is: CC(=O)N[C@H](C(=O)NC)C

echo "CC(=O)N[C@H](C(=O)NC)C" > mol.smi

Then use the mol.smi file as input for the prepare.py script, which is part of the Fafoom package and employs functionality of the RDKit package to produce a 3D representation of the molecule. You can run it by typing:

python prepare.py mol.smi

The script prepare.py will produce the mol.sdf file, which is the 3D representation of your molecule in SDF format. The file parameters.txt contains the run parameters for Fafoom. The clear advantage of this method is that the torsion angles are specified automatically and that a template input file is generated. Take a look at parameters.txt and identify the sections with the DOFs. The other parameters in this file will be discussed in detail later.

Method 2: Converting from another format

There are more than 100 chemical file formats for representing molecular structures. Some examples are presented in the Appendix in Figure 9. Open Babel (see footnote 2) is a tool to read and convert files of different formats, e.g. if you want to convert an FHI-aims geometry.in file to an XYZ file, use:

obabel -ifhiaims geometry.in -oxyz -Ogeometry.xyz

where

-i <format> specifies the format of the input
-o <format> specifies the format of the output
-O <filename> specifies the name of the output file

2 https://github.com/openbabel/openbabel


So if you already have a 3D representation of the molecule of interest in a format other than SDF, you can convert it using obabel. In the folder problem_1 you can find the files AlanineDipeptide.in and alanineDipeptide.xyz, which are in different file formats. Try to convert one of them to SDF format:

obabel -ifhiaims AlanineDipeptide.in -osdf -Omol.sdf

Remember that the input file specifying the molecule has to be called mol.sdf.

In order to produce input files for Fafoom, you can also convert your molecule to SMILES format and repeat Method 1 (for example: obabel -ixyz AlanineDipeptide.xyz -osmi -Omol.smi). If for some reason it is not possible to convert your molecule into a SMILES code, you can prepare the parameters.txt file manually. For that, you need to specify the atoms that define the torsion angles. It is convenient to open your molecule in Jmol by typing:

jmol mol.sdf

In order to show the atom numbers, go to Display → Label → Number in the menu bar at the top of Jmol. Now you can see the atom numbers of your molecule. A right click opens the menu; go to Measurements and choose "Click for torsion (dihedral) measurements". Now, by double-clicking on each atom, you can select 4 atoms (see Fig. 7) and specify all the dihedral angles that should be taken into account. To take a look at the selected dihedral angles, go to Tools → Measurements in the Jmol menu. You can see the atom numbers a, b, c and d for each dihedral angle. Be careful: Fafoom starts to count from 0, so all the numbers should be decreased by one and added as separate quadruples after the list_of_torsions flag in the parameters.txt file.

Attention:

• Fafoom starts counting from 0 while Jmol counts starting from 1. Take this into account and decrease the numbers by 1 for the lists for Fafoom.

• In Fafoom, the middle numbers of the torsion angle quadruple MUST rise (see Figure 7). Wrong ordering of the atoms will lead to errors when manipulating torsions during the GA run.
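The two rules above can be applied mechanically. The helper below is a hypothetical sketch, not part of Fafoom: it shifts Jmol's 1-based atom numbers to 0-based indices and, assuming that "must rise" means that the second index has to be smaller than the third, reverses the quadruple when necessary. Reversing a quadruple a-b-c-d to d-c-b-a describes the same dihedral angle.

```python
def jmol_to_fafoom_torsion(a, b, c, d):
    """Convert a torsion given as 1-based Jmol atom numbers to 0-based indices.

    The quadruple is reversed if needed so that the second index is
    smaller than the third (assumed meaning of "middle numbers must rise").
    """
    atoms = [a, b, c, d]
    if len(set(atoms)) != 4 or min(atoms) < 1:
        raise ValueError("expected four distinct atom numbers starting at 1")
    quadruple = [n - 1 for n in atoms]
    if quadruple[1] > quadruple[2]:
        quadruple.reverse()
    return quadruple


# Both selection orders in Jmol give the same entry for the torsion list.
forward = jmol_to_fafoom_torsion(2, 4, 5, 7)
backward = jmol_to_fafoom_torsion(7, 5, 4, 2)
```

The atom numbers in the example are arbitrary and do not refer to a specific molecule.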

[Figure 7 shows a central bond A-B with atoms 1, 2 and 3 attached to A and atoms 4 and 5 attached to B (label in figure: A > B).
Valid representations of the same dihedral angle: 1-A-B-4, 2-A-B-4, 3-A-B-4, 1-A-B-5, 2-A-B-5, 3-A-B-5
WRONG in Fafoom: 4-B-A-1, 5-B-A-3]

Figure 7: A dihedral angle can be represented in multiple ways. It is common to select heavy atoms for the representation. Pay attention to the order of atoms in a Fafoom GA run.

Method 3: Manual construction of the molecule with Jmol

This has been introduced in Tutorial 1.


Adjusting the parameters.txt file

An example of the parameters.txt file is presented in Fig. 5. The energy calculations will be run externally with FHI-aims, which is specified in the energy_function field in the [Run settings] section of the file. During the conformational sampling, two structures will be identified as the same if the RMSD between them is below the rmsd_cutoff_uniq value. If this value is too low, nearly all structures will be considered unique, which can lead to a lack of diversity in the population and to trapping of the GA in one part of conformational space, since the probability of mutation is usually quite low. On the other hand, setting this value too high (e.g. above 1) may well lead to judging quite different structures as similar and thereby to missing minima in the GA search. This value is also sensitive to the convergence criteria of the local relaxation (FHI-aims control.in). Adjust this value to 0.5, since we will use quite "loose" convergence criteria. The GA and run settings in parameters.txt are selected such that a minimal GA run can be conducted: popsize=3 and max_iter=8. Given these settings, the search will start with an initial population of 3 random structures, and 5 further structures will be produced by the genetic operations. In each generation, one new candidate structure is built. Within this minimal GA run, a total of 3 + 5 = 8 DFT relaxations will be conducted one after another.

Attention:

• Make sure that the aims_call in parameters.txt is correct. In our case it is mpirun -np 4 aims.x

• Please also check the parameters "popsize" and "max_iter" (see below) in parameters.txt.

Create the directory adds in your working folder and copy the mol.sdf file into it. The directory adds should also contain the control.in file that will be used for the FHI-aims relaxations (you will find an example in the tutorial_3/solutions/ folder). In order to perform an exemplary GA run for the alanine dipeptide on the present machines and within the timescale of the tutorial, we have to carefully adjust the settings of the DFT calculations for a "prescreening" of the conformational space.

Modify the provided template for the control.in file:

• Add the missing xc-type option: pbe

• Add the tag: vdw_correction_hirshfeld

• Append the 'light' species defaults (available in the directory $SPECIES_DEFAULTS) for the atoms C, H, N, and O to the control.in file.

• In order to adjust the computational demand for a prescreening-like calculation, the design of FHI-aims, which keeps the important settings under user control, comes in handy:
  – Decrease the number of grid points required for the calculation: set outer_grid 50 and leave only division 0.3822 50 uncommented for each of the species.
  – Adjust the SCF convergence criteria (the flags are sc_accuracy_rho, sc_accuracy_eev, sc_accuracy_etot, sc_accuracy_forces).
  – Also loosen the convergence criterion of the geometry relaxation: relax_geometry trm [number]

Attention:

• Too drastic a reduction of the above parameters will result in failure of the FHI-aims calculations and thereby in failure of the Fafoom GA search.

• The population size as well as the number of generations of this GA run are very small too. A production run will have to be longer, and repeats are advisable too!

For a test/tutorial run like this, it might be OK. For production runs, you have to be more accurate!


Problem II: Start of the first run

If your working directory contains parameters.txt and the folder adds, which in turn contains the files mol.sdf and control.in, you can start your GA run with:

ga.py parameters.txt &
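Before launching, the expected directory layout can be verified with a few lines of Python. This is a minimal sketch and not part of Fafoom; the function name missing_inputs is made up for illustration, and only the three files named in the text are checked.

```python
from pathlib import Path

# Relative paths taken from the tutorial text.
EXPECTED_INPUTS = ("parameters.txt", "adds/mol.sdf", "adds/control.in")


def missing_inputs(workdir="."):
    """Return the expected input files that are absent from workdir."""
    root = Path(workdir)
    return [rel for rel in EXPECTED_INPUTS if not (root / rel).is_file()]


missing = missing_inputs(".")
if missing:
    print("Missing before launching ga.py:", ", ".join(missing))
else:
    print("All input files found.")
```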

The Fafoom GA algorithm starts and creates multiple *_structure folders that will contain the respective FHI-aims calculations. Also, if something goes wrong, a restart can be performed automatically just by running ga.py in the same folder again. The output.txt file is generated immediately and completed during the run. You can check the progress of the GA run by typing:

tail -f output.txt

An additional geometry.in.constrained file has also appeared in the adds folder. This file is generated automatically if it is not provided. Had Fafoom found this file, which could contain, for example, a surface or a single atom or molecule, it would have performed the GA search of alanine dipeptide with respect to the system contained in this file. In output.txt you will find information about the molecule and the identified degrees of freedom that will be optimized during the run.

The structure search takes some time and will, due to the fairly reduced settings, only find a few structures. Until your search is done, you should instead analyze the results of a 200-iteration GA run provided in tutorial_3/solutions/problem_2. In order to save space, only one representative result.out file is provided in the folder. You can extract the results with "unzip long_GA.zip" and proceed for now to Part B.

Once your GA run is finished, you can take a look at the results. As you probably already saw when analyzing the long GA run in Part B, alanine dipeptide can adopt a number of stable conformers. In order to find most of them, a number of independent GA runs (at least 5) with a bigger population size (5-15) and number of iterations (40-50) would be needed. However, the short GA run that you have conducted should have found at least a few alanine dipeptide conformers. Take a look at your results:

• Each of the *_structure folders contains the files of an individual FHI-aims run: control.in, geometry.in, and result.out. The files geometry_in.sdf, geometry.out, geometry_out.sdf, and geometry_out.xyz are generated automatically by Fafoom. Check them and find out what they are.

• backup_* files can be used to restart your calculation.

• output.txt is the log-file of the GA run.

• backup_new_blacklist.dat contains all unique structures found, in SDF format. Take a look at this file; you can find the index and energy of each structure in the "comment" section.

Once you have some structures calculated, you can write a script to collect all the relaxed geometries into one file for further analysis, or use the scripts in the folder utilities: collect_structures.py and collect_final_geometries.py. The first one collects the structures from geometry_out.xyz in each folder and puts them into one file (all_structures.xyz). The second one collects information about the individual FHI-aims runs from the various result.out files, which is useful when you perform a search with respect to another system.
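If you prefer to write the collection step yourself, it could look roughly like the sketch below. It is not the code of the provided collect_structures.py; it simply assumes that every *_structure folder may hold a geometry_out.xyz file and that sorting the folder names alphabetically gives a sensible order.

```python
from pathlib import Path


def collect_structures(workdir=".", output="all_structures.xyz"):
    """Concatenate geometry_out.xyz from all *_structure folders.

    Returns the number of structures written to the output file.
    """
    root = Path(workdir)
    frames = []
    for folder in sorted(root.glob("*_structure")):
        xyz = folder / "geometry_out.xyz"
        if xyz.is_file():  # skip runs that have not finished yet
            frames.append(xyz.read_text().rstrip() + "\n")
    (root / output).write_text("".join(frames))
    return len(frames)
```

Calling collect_structures(".") in the GA working directory writes all_structures.xyz there and returns how many geometries were found.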

Did your GA run find the reference global minimum? (Don’t worry if it didn’t: with such a short GA the chance is below 20%.)


Part B: Analysis of the results
Problem III: Collecting the structures
The GA run (either yours or the one from the zip archive) will have identified a number of local minima. In order to collect all the unique structures obtained during the GA search, you can write your own script or copy the example from /tutorial_3/utilities/collect_structures.py to your working directory. Please do not forget to make it executable by typing "chmod u+x collect_structures.py". Running

./collect_structures.py

will produce the all_structures.xyz file. Open this file in Jmol:

jmol all_structures.xyz

and take a look at all the structures by pressing the "Go to next Frame" (right arrow) button; there can be duplicates among the structures. A look at cleaner data is also possible: Fafoom partially does the work for you, and the file backup_new_blacklist.dat contains only the unique structures identified during the GA run based on the rmsd_cutoff_uniq flag.

jmol backup_new_blacklist.dat

Can you already tell whether the global minimum has been found? This will be easier after we have sorted the structures in the next step.

For the next analysis steps, write a script to convert the backup_new_blacklist.dat file, which is basically an SDF trajectory file, into an XYZ trajectory file where the second line of each structure in XYZ format contains the index and energy of the structure (00index_structureEnergy: energy), where index is the index of the structure, i.e. the number of the folder where it is stored, and energy is the total energy of the relaxed geometry.

You can find an example of such a script in tutorial_3/utilities/convert_blacklist.py.

Attention: Copy it to your working directory and change the execution rights:
chmod +x convert_blacklist.py collect_structures.py

Invoking the script through

./convert_blacklist.py -i backup_new_blacklist.dat

will produce the unique_structures.xyz file. This file serves as the basis for the analysis tool we will use in the next steps.
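To see what such a conversion involves, here is a minimal sketch of an SDF-to-XYZ converter. It is not the provided convert_blacklist.py. It assumes V2000 records terminated by "$$$$", and it assumes that the text wanted on the XYZ comment line sits on the third header line of each record (falling back to the title line if that is empty); where Fafoom actually stores index and energy should be checked in your own file. Atom lines are split on whitespace, which is enough for small molecules, whereas a strict parser would use the fixed-width columns of the format.

```python
def split_sdf_records(sdf_text):
    """Yield each SDF record as a list of lines (records end with '$$$$')."""
    record = []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":
            yield record
            record = []
        else:
            record.append(line)


def record_to_xyz(record):
    """Convert one V2000 record into an XYZ frame."""
    natoms = int(record[3][:3])  # counts line: first field is the atom count
    # Assumption: use the SDF comment line, else the title line.
    comment = record[2].strip() or record[0].strip()
    lines = [str(natoms), comment]
    for atom_line in record[4:4 + natoms]:
        x, y, z, symbol = atom_line.split()[:4]
        lines.append(f"{symbol} {float(x):.4f} {float(y):.4f} {float(z):.4f}")
    return "\n".join(lines) + "\n"


def sdf_to_xyz(sdf_text):
    """Convert a multi-record SDF string into a multi-frame XYZ string."""
    records = split_sdf_records(sdf_text)
    return "".join(record_to_xyz(r) for r in records if len(r) > 3)


# Water record as shown in the appendix of this tutorial.
WATER_SDF = "\n".join([
    "water",
    " OpenBabel04141614133D",
    "",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    "    0.5647    0.1757    1.4286 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.5326    0.2062    1.3936 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.2858    0.7863    0.7297 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "  2  1  1  0  0  0  0",
    "  3  1  1  0  0  0  0",
    "M  END",
    "$$$$",
])
print(sdf_to_xyz(WATER_SDF))
```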

Problem IV: Sorting the structures
When you have all structures collected, it is convenient to have them sorted in order to take a look at the lowest-energy conformer found.

Sort the structures and generate a "trajectory" file unique_structures_sorted.xyz in which all structures are sorted by increasing energy. For visualization, we will also need to produce a file unique_structures_data.dat in which the values of the torsion angles and the energies of the corresponding structures are stored.

You can copy such a script from tutorial_3/utilities/sort_structures.py into your working directory, adjust the permissions (chmod ...), and run:

./sort_structures.py -i unique_structures.xyz
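The sorting step itself is simple enough to sketch in a few lines. The code below is an illustration only, not the provided sort_structures.py. It assumes that the comment line of every XYZ frame contains the text "Energy:" followed by a number, and the energies in the example are made-up values.

```python
import re

# Assumed comment-line pattern: "... Energy: <number>"
ENERGY_PATTERN = re.compile(r"Energy:\s*(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")


def read_frames(xyz_text):
    """Split a multi-frame XYZ string into frames (lists of lines)."""
    lines = xyz_text.splitlines()
    frames, i = [], 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # tolerate blank lines between frames
            continue
        natoms = int(lines[i])
        frames.append(lines[i:i + natoms + 2])
        i += natoms + 2
    return frames


def frame_energy(frame):
    """Extract the energy from the comment (second) line of a frame."""
    match = ENERGY_PATTERN.search(frame[1])
    if match is None:
        raise ValueError(f"no energy found in comment line: {frame[1]!r}")
    return float(match.group(1))


def sort_frames(xyz_text):
    """Return the XYZ text with frames ordered by increasing energy."""
    frames = sorted(read_frames(xyz_text), key=frame_energy)
    return "".join("\n".join(frame) + "\n" for frame in frames)


# Two one-atom frames with made-up energies.
example = (
    "1\n0001_structure Energy: -10.5\nH 0.0 0.0 0.0\n"
    "1\n0002_structure Energy: -12.0\nH 0.0 0.0 1.0\n"
)
print(sort_frames(example))
```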


Figure 8: The global minimum of alanine dipeptide: C7eq conformer (φ ≈ -83°, ψ ≈ 75°). The atoms are annotated with the corresponding indices. Peptide bonds: ω1: 1-2-4-5, ω2: 5-6-8-9. Backbone dihedral angles: φ: 2-4-5-6, ψ: 4-5-6-8.

The sort_structures.py command will produce the file unique_structures_sorted.xyz (take a look with jmol). To identify whether you have found the global minimum, you may look at the first structure of this file and visually inspect it (measure the φ and ψ angles; you can do this in Jmol). The global minimum of alanine dipeptide is the C7eq conformer (see Figure 8).
Additionally, the file unique_structures_data.dat is generated. It contains several columns, for example DIH_0, which holds the dihedral angle values of the individual conformers. The ’Hierarchy’ column should contain ones, since this makes it convenient to plot the energy hierarchy. Please open sort_structures.py and take a look inside: Can you find the line containing "list_of_torsions"? You can easily find the necessary information about the atoms that make up a torsion angle by looking it up in the parameters.txt file (combine list_of_torsions and list_of_cistrans together).

Take a look at the produced files and at the folder called temp, which contains all the split structures. The dihedral angle C-C-N-C (1-2-4-5) is one of the two peptide bonds in alanine dipeptide; in the Fafoom parameters.txt file it appears as (0,1,3,4), i.e. with the atom indices counted from zero (see Figure 8). After having successfully executed the script and thereby produced the two files unique_structures_sorted.xyz and unique_structures_data.dat, we can visualize the results.
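If you want to compute a dihedral angle yourself rather than measure it in Jmol, the standard construction projects the two outer bonds onto the plane perpendicular to the central bond and takes the signed angle between the projections. The sketch below uses only the Python standard library; the helper to_zero_based is a hypothetical convenience for switching between the one-based labels of Figure 8 and the zero-based indices used in parameters.txt.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle p0-p1-p2-p3 in degrees, in (-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)  # unit vector along the central bond
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def to_zero_based(indices):
    """Convert one-based atom labels to zero-based indices."""
    return tuple(i - 1 for i in indices)


print(to_zero_based((1, 2, 4, 5)))
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

With the coordinates of atoms 2, 4, 5, and 6 of a conformer, dihedral returns φ; with atoms 4, 5, 6, and 8, it returns ψ.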

Problem V: Visualizing and analyzing of the results
Now you can use the Interactive Sketchmap Visualizer (ISV) developed in Michele Ceriotti’s group at EPFL by Sandip De (https://cosmo.epfl.ch). It is a convenient, browser-based tool for visualizing structures (e.g. the results of your structure search) and for interactively plotting related data (e.g. energies or dihedral angles) at the same time. You can read more about it at http://interactive.sketchmap.org or download the code from GitHub (https://github.com/cosmo-epfl/isv).

ISV is installed on the machines and you can use it by typing:

build.py --data unique_structures_data.dat --traj unique_structures_sorted.xyz --extserver

This command generates the temporary files needed to run a local server. The visualization then runs in your browser once you start the server with:

python server.py --app unique_structures_data

After the page has loaded, click the link unique_structures_data, which can be found by scrolling down the webpage a little bit. This will open a tab in the browser where you can play with the interactive environment.

• Can you produce a Ramachandran plot (https://en.wikipedia.org/wiki/Ramachandran_plot) or an energy hierarchy?

• Can you plot how the energy depends on one of the dihedral angles?

• Can you also save plots and show them?

• Can you save an image of the lowest-energy structure?

In order to shut down the server, first close the browser and then press CTRL+C in the terminal window.
That is it for the basic GA tutorial, the usage of Fafoom together with FHI-aims, and the analysis of the results using the Interactive Sketchmap Visualizer. If you have the time and the stamina, proceed to Part C (optional).

Part C: And beyond
Problem VI: Adding custom parameters for visualizing
If you want to visualize more "coordinates" in ISV, just add extra columns to your input file unique_structures_data.dat, for example the distance between two oxygen atoms of the molecule. Be careful to add the values consistently with the order of the structures, which are sorted by increasing energy. You can adjust the sort_structures.py script or write your own script that, for example, visits each file in the temp folder, measures the property you want, and adds it to the data file. Investigate the sort_structures.py script carefully. Do not forget that the names of the columns must not contain spaces: the recommended format for long names is very_long_name. You can also take a look inside tutorial_3/utilities/add_column_to_data.py and specify the property to extract yourself. For now, only the distance between two atoms is implemented. You can add the extraction of a bond angle between three atoms or of another dihedral yourself. Be careful: the script produces a new file modified_data.dat with the new column added. In order to open the new data file in ISV, please specify the "--data" flag correctly when calling the build.py script. Take care to keep the correct order (as found in the ’name’ column) for the newly added property. Now you can plot some dependencies, for example the distance between two atoms of your choice vs. a dihedral angle, colored according to the energy. It is not always obvious what drives a system to prefer particular structures.
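Adding a distance column by hand could look like the following sketch. It is not the provided add_column_to_data.py. It assumes that the data file is a whitespace-separated table with a single header row and one row per structure, in the same order as the frames of the sorted XYZ file; the column name Energy and all numbers in the usage example are made up for illustration.

```python
import math


def read_xyz_frames(xyz_text):
    """Split a multi-frame XYZ string into lists of (symbol, x, y, z)."""
    lines = xyz_text.splitlines()
    frames, i = [], 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # tolerate blank lines between frames
            continue
        natoms = int(lines[i])
        atoms = []
        for line in lines[i + 2:i + 2 + natoms]:
            symbol, x, y, z = line.split()[:4]
            atoms.append((symbol, float(x), float(y), float(z)))
        frames.append(atoms)
        i += natoms + 2
    return frames


def distance(atoms, i, j):
    """Distance between atoms i and j (zero-based indices)."""
    return math.dist(atoms[i][1:], atoms[j][1:])


def add_column(table_text, name, values):
    """Append one column to a whitespace-separated table with a header row."""
    if any(ch.isspace() for ch in name):
        raise ValueError("column names must not contain spaces")
    rows = [line.rstrip() for line in table_text.splitlines() if line.strip()]
    if len(rows) - 1 != len(values):
        raise ValueError("one value per structure is required")
    out = [f"{rows[0]} {name}"]
    out += [f"{row} {value:.4f}" for row, value in zip(rows[1:], values)]
    return "\n".join(out) + "\n"
```

A possible use: read unique_structures_sorted.xyz with read_xyz_frames, compute distance(frame, i, j) for each frame, and write the result of add_column to a new data file that is then passed to build.py via --data.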

Problem VII: Extra capabilities for search
By adjusting geometry_constrained.in, which has appeared in the adds directory, you may set up a search for the molecular conformation of whatever you put as mol.sdf (here alanine dipeptide) relative to the system in geometry_constrained.in. This file is in FHI-aims format, and putting atoms there will make Fafoom perform a GA search with respect to that system. For example, it can be a single ion, another molecule, or even a slab or a surface (periodic boundary conditions need to be specified in the latter case). Fafoom will produce conformations, put them on top of the "host" system in geometry.in.constrained, and then pass the files to FHI-aims for local relaxation. Things to adjust are:

• In control.in, adjust for example the species defaults and the charge to meet the requirements of your system.

• In parameters.txt, set the optimize_orientation flag to "True" in order to include varying the rotation and position of the molecule with respect to the fixed system from geometry_constrained.in.

Since these extra computations can be rather demanding and go beyond the time frame of this tutorial afternoon, you can also use Fafoom simply as a random structure generator (a generator of input files) by specifying energy_function=’no’ in parameters.txt. You may, for example, generate input structures for amino acids with a Ca2+ placed in adds/geometry_constrained.in. You can even compare them to the relaxed structures from http://aminoaciddb.rz-berlin.mpg.de/. Or generate complexes of two amino acids by placing one (then rigid) conformer in the adds/geometry_constrained.in file. Just play a bit.

Problem VIII: Other systems of interest
You can take a look at the Berlin Amino Acid Database (http://aminoaciddb.rz-berlin.mpg.de), download structures, and perform an analysis. In order to download structures, first find the system(s) you are interested in, identify the range of their URLs, and then adapt and run the following bash script:


for x in $(seq -f "%04g" 1 NUMBER); do
    wget http://aminoaciddb.rz-berlin.mpg.de/database/AMINO/BB/CATION/conformer.$x.xyz
done

Take a look at the website for the correct address of the system that you want to download. Adjust the placeholders NUMBER, AMINO, BB, and CATION to match the subset of data you are interested in. In order to get everything into the right format and into one file:
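The same download loop can be written in Python, which makes the zero-padded numbering explicit. This is a sketch under the assumption that the URL pattern of the bash script applies; AMINO, BB, and CATION remain placeholders to be replaced with the values found on the website, and the download function is defined but not called here.

```python
import urllib.request

BASE_URL = "http://aminoaciddb.rz-berlin.mpg.de/database"


def conformer_urls(amino, bb, cation, number):
    """Build the URLs conformer.0001.xyz ... conformer.<number>.xyz."""
    return [
        f"{BASE_URL}/{amino}/{bb}/{cation}/conformer.{i:04d}.xyz"
        for i in range(1, number + 1)
    ]


def download_all(urls):
    """Fetch each URL into the current directory under its own file name."""
    for url in urls:
        filename = url.rsplit("/", 1)[-1]
        urllib.request.urlretrieve(url, filename)


# Placeholders as in the bash script; replace them before downloading.
for url in conformer_urls("AMINO", "BB", "CATION", 3):
    print(url)
```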

cat *.xyz > unique_structure.xyz

Adjust and use the scripts you used before, in particular the list_of_torsions in the sort_structures.py script. You will obtain a file suitable for analysis with ISV.
Now you know enough to run your own calculation and to sample and analyze your own system.

Create your own molecule of interest using Jmol or SMILES. Remember that you can convert multiple file formats, e.g. xyz files, to SMILES by using OpenBabel:

obabel -ixyz conformer.0001.xyz -osmi -Omol.smi

Acknowledgments
We would like to thank the people testing this tutorial for their time and feedback, and you for your patience and interest.

Appendix
Chemical formats

XYZ (first line: number of atoms; second line: comment):

    3

    O 0.56468 0.17567 1.42856
    H 1.53257 0.20623 1.39359
    H 0.28583 0.78628 0.72973

FHI-AIMS:

    atom 0.56468 0.17567 1.42856 O
    atom 1.53257 0.20623 1.39359 H
    atom 0.28583 0.78628 0.72973 H

SDF (header, followed by the atom block and the bond block):

    water
     OpenBabel04141614133D

      3  2  0  0  0  0  0  0  0  0999 V2000
        0.5647    0.1757    1.4286 O   0  0  0  0  0  0  0  0  0  0  0  0
        1.5326    0.2062    1.3936 H   0  0  0  0  0  0  0  0  0  0  0  0
        0.2858    0.7863    0.7297 H   0  0  0  0  0  0  0  0  0  0  0  0
      2  1  1  0  0  0  0
      3  1  1  0  0  0  0
    M  END
    $$$$

Figure 9: Different chemical file formats of water.
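The relation between the XYZ and the FHI-aims atom lines in Figure 9 is a simple reordering of columns, which the following minimal sketch illustrates. It handles a single XYZ frame of an isolated molecule only; lattice vectors or constraints, which a real geometry.in may contain, are outside its scope.

```python
def xyz_to_aims(xyz_text):
    """Convert one XYZ frame into FHI-aims 'atom x y z symbol' lines."""
    lines = xyz_text.splitlines()
    natoms = int(lines[0])
    out = []
    for line in lines[2:2 + natoms]:  # skip atom count and comment line
        symbol, x, y, z = line.split()[:4]
        out.append(f"atom {x} {y} {z} {symbol}")
    return "\n".join(out) + "\n"


# Water coordinates from Figure 9; the comment line is left empty.
WATER_XYZ = "\n".join([
    "3",
    "",
    "O 0.56468 0.17567 1.42856",
    "H 1.53257 0.20623 1.39359",
    "H 0.28583 0.78628 0.72973",
])
print(xyz_to_aims(WATER_XYZ))
```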

References[1] Matti Ropo, Markus Schneider, Carsten Baldauf, and Volker Blum. First-principles data set of 45,892 isolated and cation-

coordinated conformers of 20 proteinogenic amino acids. Sci. Data, 3:160009, 2016. doi: 10.1038/sdata.2016.9.

17

Page 18: Hands-onworkshop: Density-FunctionalTheoryandBeyond ... · 2018. 8. 2. · pseudocode(Algorithm1): #initialization whilei

[2] David Weininger. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules.J. Chem. Inf. Comput. Sci., 28(1):31–36, 1988. doi: 10.1021/ci00057a005.

[3] Wolfgang Kabsch. A solution for the best rotation to relate two sets of vectors. Acta Crystallogr. Sect. A, 32(5):922–923,1976. doi: 10.1107/S0567739476001873.

[4] H. Bernhard Schlegel. Geometry optimization. WIREs: Comput. Mol. Sci., 1(5):790–809, 2011. doi: 10.1002/wcms.34.

[5] Adriana Supady, Volker Blum, and Carsten Baldauf. First-Principles Molecular Structure Search with a Genetic Algorithm.J. Chem. Inf. Model., 2015. doi: 10.1021/acs.jcim.5b00243.

[6] Jiabo Li, Tedman Ehlers, Jon Sutter, Shikha Varma-O’Brien, and Johannes Kirchmair. CAESAR: a new conformer generationalgorithm based on recursive buildup and local rotational symmetry consideration. J. Chem. Inf. Model., 47(5):1923–1932,2007. doi: 10.1021/ci700136x.

[7] Noel M. O’Boyle, Michael Banck, Craig A. James, Chris Morley, Tim Vandermeersch, and Geoffrey R. Hutchison. OpenBabel: An open chemical toolbox. J. Cheminform., 3(1):33, 2011. doi: 10.1186/1758-2946-3-33.

[8] Noel M. O’Boyle, Tim Vandermeersch, Christopher J. Flynn, Anita R. Maguire, and Geoffrey R. Hutchison. Confab -Systematic generation of diverse low-energy conformers. J. Cheminform., 3(1):8, 2011. doi: 10.1186/1758-2946-3-8.

[9] Fariborz Mohamadi, Nigel G. J. Richards, Wayne C. Guida, Rob Liskamp, Mark Lipton, Craig Caufield, George Chang,Thomas Hendrickson, and W. Clark Still. Macromodel - An integrated software system for modeling organic and bioorganicmolecules using molecular mechanics. J. Comput. Chem., 11(4):440–467, 1990. doi: 10.1002/jcc.540110405.

[10] MOE (Molecular Operating Environment). Chemical Computing Group, Inc.: Montreal, Canada, 2008.

[11] Javier Klett, Alvaro Cortés-Cabrera, Rubén Gil-Redondo, Federico Gago, and Antonio Morreale. ALFA: Automatic LigandFlexibility Assignment. J. Chem. Inf. Model., 54(1):314–323, 2014. doi: 10.1021/ci400453n.

[12] Christin Schärfer, Tanja Schulz-Gasch, Jérôme Hert, Lennart Heinzerling, Benjamin Schulz, Therese Inhester, Martin Stahl,and Matthias Rarey. CONFECT: Conformations from an Expert Collection of Torsion Patterns. ChemMedChem, pages1690–1700, 2013. doi: 10.1002/cmdc.201300242.

[13] Jens Sadowski, Johann Gasteiger, and Gerhard Klebe. Comparison of Automatic Three-Dimensional Model Builders Using639 X-ray Structures. J. Chem. Inf. Comput. Sci., 34(4):1000–1008, 1994. doi: 10.1021/ci00020a039.

[14] Steffen Renner, Christof H. Schwab, Johann Gasteiger, and Gisbert Schneider. Impact of conformational flexibility on three-dimensional similarity searching using correlation vectors. J. Chem. Inf. Model., 46(6):2324–2332, 2006. doi: 10.1021/ci050075s.

[15] Alessio Andronico, Arlo Randall, Ryan W. Benz, and Pierre Baldi. Data-driven high-throughput prediction of the 3-Dstructure of small molecules: review and progress. J. Chem. Inf. Model., 51(4):760–776, 2011. doi: 10.1021/ci100223t.

[16] Peter Sadowski and Pierre Baldi. Small-molecule 3D structure prediction using open crystallography data. J. Chem. Inf.Model., 53(12):3127–3130, 2013. doi: 10.1021/ci4005282.

[17] Paul C. D. Hawkins, A. Geoffrey Skillman, Gregory L. Warren, Benjamin A. Ellingson, and Matthew T. Stahl. Conformergeneration with OMEGA: algorithm and validation using high quality structures from the Protein Databank and CambridgeStructural Database. J. Chem. Inf. Model., 50(4):572–584, 2010. doi: 10.1021/ci100031x.

[18] Mikko J. Vainio and Mark S. Johnson. Generating conformer ensembles using a multiobjective genetic algorithm. J. Chem.Inf. Model., 47(6):2462–2474, 2007. doi: 10.1021/ci6005646.

[19] Xiaofeng Liu, Fang Bai, Sisheng Ouyang, Xicheng Wang, Honglin Li, and Hualiang Jiang. Cyndi: a multi-objective evolutionalgorithm based method for bioactive molecular conformational generation. BMC Bioinf., 10(1):101, 2009. doi: 10.1186/1471-2105-10-101.

[20] RDKit: Cheminformatics and Machine Learning Software. URL http://www.rdkit.org/.

[21] David J. Wales and Jonathan P. K. Doye. Global Optimization by Basin-Hopping and the Lowest Energy Structures ofLennard-Jones Clusters Containing up to 110 Atoms. J. Phys. Chem. A, 101(28):5111–5116, 1997. doi: 10.1021/jp970984n.

[22] Stefan Goedecker. Minima hopping: an efficient search method for the global minimum of the potential energy surface ofcomplex molecular systems. J. Chem. Phys., 120(21):9911–9917, 2004. doi: 10.1063/1.1724816.

[23] Sune R. Bahn and Karsten W. Jacobsen. An object-oriented scripting interface to a legacy electronic structure code. Comput.Sci. Eng., 4(3):56–66, 2002. doi: 10.1109/5992.998641.

[24] David J Wales. GMIN: A program for finding global minima and calculating thermodynamic properties from basin-sampling.URL http://www-wales.ch.cam.ac.uk/GMIN/.

[25] Jay W Ponder. Tinker - Software Tools for Molecular Design. URL http://dasher.wustl.edu/tinker/.

[26] John H. Holland. Adaptation in natural and artificial systems: an introductory analysis with applications to biology,control, and artificial intelligence. University of Michigan Press, Ann Arbor, MI, 1975. ISBN 0472084607.

[27] David B. Fogel, editor. Evolutionary Computation: The Fossil Record. IEEE Press, Piscataway, NJ, 1998.

[28] David E. Goldberg. Genetic algorithms in search, optimization, and machine learning. Addison-Wesley, Reading, MA,1989.

[29] Nikhil Nair and Jonathan M. Goodman. Genetic Algorithms in Conformational Analysis. J. Chem. Inf. Comput. Sci., 38(2):317–320, 1998. doi: 10.1021/ci970433u.

[30] Martin Damsbo, Brian S. Kinnear, Matthew R. Hartings, Peder T. Ruhoff, Martin F. Jarrold, and Mark A. Ratner. Applicationof evolutionary algorithm methods to polypeptide folding: comparison with experimental results for unsolvated Ac-(Ala-Gly-Gly)5-LysH+. Proc. Natl. Acad. Sci. U. S. A., 101(19):7215–7222, 2004. doi: 10.1073/pnas.0401659101.

18

Page 19: Hands-onworkshop: Density-FunctionalTheoryandBeyond ... · 2018. 8. 2. · pseudocode(Algorithm1): #initialization whilei

[31] Niss O. Carstensen, Johannes M. Dieterich, and Bernd Hartke. Design of optimally switchable molecules by genetic algorithms.Phys. Chem. Chem. Phys., 13(7):2903–2910, 2011. doi: 10.1039/c0cp01065k.

[32] Gareth Jones, Peter Willett, Robert C. Glen, Andrew R. Leach, and Robin Taylor. Development and validation of a geneticalgorithm for flexible docking. J. Mol. Biol., 267(3):727–748, 1997. doi: 10.1006/jmbi.1996.0897.

[33] Garrett M. Morris, David S. Goodsell, Robert S. Halliday, Ruth Huey, William E. Hart, Richard K. Belew, and Arthur J.Olson. Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy function. J. Comput.Chem., 19(14):1639–1662, 1998. doi: 10.1002/(SICI)1096-987X(19981115)19:14<1639::AID-JCC10>3.0.CO;2-B.

[34] Bernd Hartke. Global geometry optimization of clusters using genetic algorithms. J. Phys. Chem., 97(39):9973–9976, 1993.doi: 10.1021/j100141a013.

[35] David M. Deaven and Kai-Ming Ho. Molecular geometry optimization with a genetic algorithm. Phys. Rev. Lett., 75(2):288–291, 1995.

[36] Roy L. Johnston. Evolving better nanoparticles: Genetic algorithms for optimising cluster geometries. Dalt. Trans., (22):4193–4207, 2003. doi: 10.1039/b305686d.

[37] Saswata Bhattacharya, Sergey V. Levchenko, Luca M. Ghiringhelli, and Matthias Scheffler. Stability and Metastability ofClusters in a Reactive Atmosphere: Theoretical Evidence for Unexpected Stoichiometries of MgMOx. Phys. Rev. Lett., 111(13):135501, 2013. doi: 10.1103/PhysRevLett.111.135501.

[38] Artem R. Oganov and Colin W. Glass. Crystal structure prediction using ab initio evolutionary techniques: principles andapplications. J. Chem. Phys., 124(24):244704, 2006. doi: 10.1063/1.2210932.

[39] Scott M. Woodley and Richard Catlow. Crystal structure prediction from first principles. Nature Mater., 7(12):937–946,2008. doi: 10.1038/nmat2321.

[40] Christian Neiss and Detlef Schooss. Accelerated cluster structure search using electron diffraction data in a genetic algorithm.Chem. Phys. Lett., 532(null):119–123, 2012. doi: 10.1016/j.cplett.2012.02.062.

[41] GNU Lesser General Public License. URL https://www.gnu.org/licenses/lgpl.html.

[42] Attila Bérces, Dennis M. Whitfield, and Tomoo Nukada. Quantitative description of six-membered ring conformations fol-lowing the IUPAC conformational nomenclature. Tetrahedron, 57(3):477–491, 2001. doi: 10.1016/S0040-4020(00)01019-X.

[43] Anthony D. Hill and Peter J. Reilly. Puckering coordinates of monocyclic rings by triangular decomposition. J. Chem. Inf.Model., 47(3):1031–1035, 2007. doi: 10.1021/ci600492e.

19