Evolutionary Algorithms for the Protein Structure Prediction Problem and Molecular Docking. A.-A. Tantar, N. Melab and E-G. Talbi, {tantar, melab, talbi}@lifl.fr. Laboratoire d’Informatique Fondamentale de Lille, Parallel Cooperative Optimization Research Group, INRIA DOLPHIN Project. Supported by the French Research Agency.

Post on 15-Jan-2016


Page 1

Evolutionary Algorithms for the Protein Structure Prediction Problem and Molecular Docking

A.-A. Tantar, N. Melab and E-G. Talbi
{tantar, melab, talbi}@lifl.fr

Laboratoire d’Informatique Fondamentale de Lille

Parallel Cooperative Optimization Research Group

INRIA DOLPHIN Project

Supported by the French Research Agency

Page 2

Outline

Protein Structure Prediction, Molecular Docking: modeling and complexity analysis

Parallel Hybrid Metaheuristics for the PSP Problem

Grid experimentation on GRID5000

Conclusion and Future Work

Page 3

Protein Structure Prediction Problem

Protein Structure Prediction (PSP) ~ finding the ground-state conformation (tertiary structure) of a protein, given its amino-acid sequence (the primary structure).

[Figure: amino-acids assemble into a protein; conformational sampling and energy minimization lead to the native conformation]

Page 4

Molecular Docking

Molecular Docking ~ the prediction of the optimal bound conformation of two molecules exhibiting geometrical and chemical complementarity.

[Figure: ligand (XK263 inhibitor) + receptor (HIV-1 protease), and the bound complex HIV-1 protease + XK263]

Page 5

Scoring Energy Function (A)

Neumaier, A. (1997). Molecular modeling of proteins and mathematical prediction of protein structure. SIAM Review, 39(3):407-460.

Empirical Force Fields – Classical Mechanics

AMBER – Assisted Model Building with Energy Refinement
CHARMM – Chemistry at HARvard Macromolecular Mechanics
OPLS – Optimized Potentials for Liquid Simulations
GROMOS – GROningen MOlecular Simulation
SYBYL scoring functions: D-Score (DOCK), F-Score (FlexX), G-Score (GOLD)
AutoDock/AutoGrid – suite of tools for automated docking
...

According to recent results, Empirical Methods OUTPERFORM Ab Initio Methods (!!)

[Figure: interaction strength scale running from WEAK to STRONG, with the labels London forces, Van der Waals, ionic interactions, dipole-dipole, stacking, hydrogen bond]

Page 6

Scoring Energy Function (B)

AutoDock

DOCK - SYBYL/D-Score

Page 7

Scoring Energy Function (C)

GOLD - SYBYL/G-Score

FlexX - SYBYL/F-Score

Renxiao Wang, Yipin Lu, and Shaomeng Wang, Comparative Evaluation of 11 Scoring Functions for Molecular Docking, J. Med. Chem., 10.1021/jm0203783, 2003, 46, 2287-2303

Page 8

Scoring Energy Function (D)

The factors of the energy function model oscillating entities: forces are simulated by springs interconnecting the atoms. This offers a simple concept and fast energy evaluations.

The constants are derived from higher-level calculations (e.g. ab initio) and are difficult to obtain and to fit.

Empirical force fields have the DRAWBACK that their results are not directly comparable with results obtained through a differently parameterized force field.

RMSD (Root Mean Squared Deviation) is the typical distance measure between conformations.
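As an illustration of the RMSD measure mentioned above, here is a minimal C++ sketch. The `Atom` struct and the `rmsd` function are hypothetical names introduced for this example, and the sketch assumes both conformations list the same atoms in the same order and are already superimposed (no rotational or translational fitting is done).

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// One atom position in Cartesian space (hypothetical helper type).
struct Atom {
    double x, y, z;
};

// Root mean squared deviation between two conformations that list the same
// atoms in the same order. Assumption: the structures are already aligned;
// no superposition step is performed here.
double rmsd(const std::vector<Atom>& a, const std::vector<Atom>& b) {
    if (a.size() != b.size() || a.empty()) {
        throw std::invalid_argument("conformations must be non-empty and of equal size");
    }
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double dx = a[i].x - b[i].x;
        const double dy = a[i].y - b[i].y;
        const double dz = a[i].z - b[i].z;
        sum += dx * dx + dy * dy + dz * dz;  // squared distance for atom i
    }
    return std::sqrt(sum / static_cast<double>(a.size()));
}
```

In practice a superposition step usually precedes this computation, so that the value reflects shape differences rather than placement in space.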

Page 9

PSP modeling: Encoding of the conformations

Cartesian atomic coordinates representation

amino-acid-based encoding – hydrophobic/hydrophilic models

all heavy atoms coordinates

backbone Cα coordinates

backbone coordinates and side-chain centroid coordinates

torsion-angle based representation

[Figure: an amino-acid chain mapped to a torsion-angle based chromosome, e.g. 286, 165, 316, ..., 7, 69]

Page 10

Molecular Docking: Encoding of the conformations

AutoDock Representation - string of real valued genes

three Cartesian coordinates for ligand translation
four variables for defining a quaternion specifying ligand orientation
one real value for each ligand torsion
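The gene layout described above can be pictured as a small data structure. The following C++ sketch is an assumption-laden illustration, not AutoDock's actual data type: `DockingGenome`, `geneCount` and `flatten` are hypothetical names, and the identity quaternion is chosen arbitrarily as the default orientation.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical container for a docking chromosome laid out as
// 3 translation genes + 4 quaternion genes + one gene per ligand torsion.
struct DockingGenome {
    std::array<double, 3> translation{};                       // x, y, z
    std::array<double, 4> orientation{{1.0, 0.0, 0.0, 0.0}};   // quaternion (identity by assumption)
    std::vector<double> torsions;                              // one angle per rotatable bond

    explicit DockingGenome(std::size_t nTorsions) : torsions(nTorsions, 0.0) {}

    // Total number of real-valued genes in the string.
    std::size_t geneCount() const { return 3 + 4 + torsions.size(); }

    // Flat view of the chromosome, in the order translation, orientation, torsions.
    std::vector<double> flatten() const {
        std::vector<double> genes(translation.begin(), translation.end());
        genes.insert(genes.end(), orientation.begin(), orientation.end());
        genes.insert(genes.end(), torsions.begin(), torsions.end());
        return genes;
    }
};
```

A ligand with five rotatable bonds would thus be encoded by 12 real values under this layout.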

Model complexity (from High Complexity to Low Complexity)

flexible docking - both the ligand and the receptor are modeled as flexible molecules; limitations may be imposed

partially flexible docking – flexibility is modeled to some extent, by focusing on the smaller molecule or by defining comprehensive regions of significance

rigid docking – extreme simplification; both the ligand and the receptor are rigid entities, no flexibility being allowed at any point

+ Multiple Potential Binding Sites

Garrett M. Morris et al., Automated Docking Using a Lamarckian Genetic Algorithm and an Empirical Binding Free Energy Function, Journal of Computational Chemistry, Vol. 19, No. 14, 1639-1662, 1998.

Page 11

Complexity Analysis

Levinthal’s Paradox

A molecule of 40 residues with 10 conformations per residue has $10^{40}$ conformations.

At $10^{14}$ conformations per second, enumerating them all takes about $10^{26}$ seconds, i.e. roughly $3 \times 10^{18}$ years.

$10^{11}$ local optima for the [met]-enkephalin pentapeptide

75 atoms
Five amino-acids (Tyr-Gly-Gly-Phe-Met)
22 variable backbone dihedral angles
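The enumeration time in Levinthal's paradox can be checked with a few lines of arithmetic. The C++ sketch below works in base-10 logarithms to avoid overflow; the function name `enumerationYearsLog10` and the choice of a 365.25-day year are assumptions made for this example. With 40 residues, 10 conformations per residue and $10^{14}$ conformations per second, it gives about $10^{26}$ seconds, or roughly $3 \times 10^{18}$ years.

```cpp
#include <cmath>

// Returns log10 of the number of years needed to enumerate every conformation.
// Assumptions: independent residues, a fixed sampling rate, 365.25-day years.
double enumerationYearsLog10(int residues,
                             double conformationsPerResidue,
                             double log10RatePerSecond) {
    // log10(c^n) = n * log10(c)
    const double log10Conformations = residues * std::log10(conformationsPerResidue);
    // seconds = conformations / rate, expressed in log space
    const double log10Seconds = log10Conformations - log10RatePerSecond;
    const double secondsPerYear = 365.25 * 24.0 * 3600.0;
    return log10Seconds - std::log10(secondsPerYear);
}
```

Calling `enumerationYearsLog10(40, 10.0, 14.0)` returns a value close to 18.5, which corresponds to the order of magnitude quoted above.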

Page 12

Motivation and objectives

PSP and Molecular Docking can be modeled as optimization problems …

… which are multi-modal … and require a huge amount of resources for large proteins

Need for hybrid metaheuristics

Need for large scale parallelism (Grid computing)

Cost: $150000-$250000 per X-ray structure

Study of two hybrid metaheuristics for PSP: Genetic Algorithms and Simulated Annealing

Page 13

Classification of the Existing Methods

Energy minimization vs. Geometry complementarity approaches

Ab initio, de novo electronic structure calculations
  rely on quantum mechanics for determining different molecular characteristics
  comprise no approximations, and no a priori experimental data is required
  high computational complexity => reduced size systems

Semi-empirical methods
  make use of approximations as a substitute for ab initio techniques
  employ simplified models for electron-electron interactions

Empirical methods
  rely upon molecular dynamics (classical mechanics based methods)
  often the only applicable methods for large molecular systems, e.g. proteins and polymers
  do not dissociate atoms into electrons and nuclei; atoms are indivisible entities

Continuous vs. Discretization - Combinatorial Optimization

Page 14

Classification – Algorithmic Overview (A)

Global Optimization – Energy Landscape Projections, Convex Global Underestimation, etc.
  construct a metamodel of the energy surface, avoiding most of the drawbacks caused by the funnel-with-bumps structure of the landscape
  tend to require less computational time than classical techniques, by a few orders of magnitude
  apparently have a lower probability of remaining blocked in kinetic traps
  rely on assumptions about the landscape which may not extend to different classes
  require the construction of a metamodel, underestimation, etc., which offers no optimality/correctness guarantee

Molecular Dynamics
  offers accurate results as far as the force field model fits the real energy landscape
  can be applied in a straightforward manner: apart from the formalism, there are no parameters to tune
  requires extensive computational resources and may only be applied to reduced models
  the design and the development of the formalism may be far from trivial
  given the high computational complexity, stand-alone approaches cannot be practical

[Figure: time scale in seconds, $10^{-15}$ (femto), $10^{-12}$ (pico), $10^{-9}$ (nano), $10^{-6}$ (micro), $10^{-3}$ (milli), $10^{0}$, with the markers MD step, MD long run, bond vibration, isomerization, water dynamics, fastest folders, typical folders, slow folders]

Page 15

Classification – Algorithmic Overview (B)

Monte Carlo / Simulated Annealing / Metropolis-based algorithms
  do not require a large number of parameters; adaptive versions overcome, to an extent, the drawbacks caused by the initially specified values
  offer a performance guarantee for attaining the optimal value
  compared to gradient-based methods, are less prone to getting trapped in local minima
  require an infeasible number of sampling points to obtain the performance guarantee
  appropriate for local search optimization or for the refinement of specific areas
  might employ specific distributions for performing the conformational sampling
  depending on the nature of the method, may pose important parallelization problems

Evolutionary algorithms – Genetic Algorithms
  exhibit strong exploration characteristics while lacking intensification capabilities; hybridization approaches tend to be more appropriate
  offer the base for complex parallelization techniques
  require implementing different components, each with different parameters
  the design of the operators may need to consider structural aspects (e.g. the crossover operator should not mix angles from different amino-acids when performing the recombination)
  do not offer any performance guarantee – statistical convergence

Page 16

The sketch of an evolutionary algorithm

[Figure: flow chart of the evolutionary engine with the stages Initialization, Evaluation, Parents, Stop?, Best individual, Selection, Genitors, Recombination/mutation, Offspring, Evaluation, Replacement]

Page 17

Generation of the Initial population

Seeding - information inserted from structural databases
  ensures that the population is seeded with the substructures that lead to the optimal conformation
  does not represent the desired approach, as structural information may not be available in all cases

Ab initio approach – no initial structural information is given
  uniform random initialization of the dihedral angles – the most common option
  depending on structural data, different distributions may be employed for offering a bias

The size of the population may have an impact on the convergence rate:
  reduced number of chromosomes -> premature convergence
  larger number of chromosomes -> extra computational time
  should be related to the cardinality of the alphabet used for encoding, chromosome length, algorithm parameters, etc.; empirical, restricted to special cases - no rigorous specifications

Dusan P. Djurdjevic, Mark J. Biggs, Ab Initio Protein Fold Prediction Using Evolutionary Algorithms: Influence of Design and Control Parameters on Performance, 10.1002/jcc.20440, Wiley InterScience, 2005.

Page 18

Selection and Replacement (A)

Selection pressure: the probability of selecting the best chromosome as compared to the average selection probability of all the chromosomes

Selection intensity: the expected average fitness value of the population after performing a selection step, having as base a normalized Gaussian distribution

Selection variance: the expected variance of the fitness distribution of the population after performing a selection step, having as base a normalized Gaussian distribution

Loss of diversity: the percentage of non-selected individuals during the selection phase

Bias: the absolute difference between the expected reproduction probability of a specific chromosome and the chromosome’s normalized fitness

Spread: the range of possible values for the number of offspring of a given chromosome

Page 19

Selection and Replacement (B)

Selection strategy

fitness-proportionate selection
  no bias, does not guarantee a minimum spread

rank-based selection
  selection relative to rank and not to the chromosome’s real fitness value

stochastic tournament selection
  tournament among randomly chosen individuals
  selection pressure can be adjusted by modifying the size of the tournament
  offers a constant selection pressure

uniform selection
  selection bias towards sparse fitness levels and not towards best fit chromosomes
  might not behave well under extensive genetic diversity conditions (i.e., initial phases)
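To make the tournament idea concrete, here is a minimal C++ sketch of a size-k tournament in which the lowest-energy contestant always wins. This is a simplification chosen for the example: a stochastic tournament would let the better contestant win only with some probability. The function name `tournamentSelect` and the convention that lower energy means fitter are assumptions.

```cpp
#include <cstddef>
#include <random>
#include <stdexcept>
#include <vector>

// Picks tournamentSize individuals uniformly at random (with replacement)
// and returns the index of the one with the lowest energy.
// A larger tournament size means a higher selection pressure.
std::size_t tournamentSelect(const std::vector<double>& energies,
                             std::size_t tournamentSize,
                             std::mt19937& rng) {
    if (energies.empty() || tournamentSize == 0) {
        throw std::invalid_argument("need a non-empty population and a positive tournament size");
    }
    std::uniform_int_distribution<std::size_t> pick(0, energies.size() - 1);
    std::size_t best = pick(rng);
    for (std::size_t i = 1; i < tournamentSize; ++i) {
        const std::size_t challenger = pick(rng);
        if (energies[challenger] < energies[best]) {
            best = challenger;  // the challenger is fitter (lower energy)
        }
    }
    return best;
}
```

With a tournament size of 1 the operator degenerates to uniform random selection; as the size grows, the best individual of the population is returned almost every time.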

Page 20

Selection and Replacement (C)

Replacement strategy

Global replacement vs. Local replacement
  generational replacement – replace all the parents with the offspring
  uniform replacement - fewer offspring than parents, replace at random
  elitist replacement - fewer offspring than parents, replace the worst parents
  fitness-based replacement - more offspring than parents, insert only the best offspring

random, first-in-first-out – the simplest ones, do not exhibit strong performance characteristics

worst-fit replacement – the least fit chromosome(s) are replaced

exponential replacement – chromosomes are ranked from the least fit to the most fit, the replacement process employing an associated probability distribution

...

Page 21

Recombination Operators

discrete recombination – exchange of chromosome values; for each locus in the offspring, the parent chromosomes have equal probabilities of contributing

intermediate recombination – offspring obtained by interpolation of the parents

$\mathrm{Offspring}[i] = u_i \, A[i] + (1 - u_i) \, B[i], \quad u_i \in [-\delta, 1+\delta], \quad i \in \{1, \dots, n\}$ (i indexes the loci; $\delta \ge 0$ extends the interval beyond the parents)

line recombination – particular case of intermediate recombination, a single uniformly random generated variable being considered

$\mathrm{Offspring}[i] = u \, A[i] + (1 - u) \, B[i], \quad u \in [-\delta, 1+\delta], \quad i \in \{1, \dots, n\}$

one/multi point crossover – implies the generation of one/multiple cutting points marking the segments to be combined into offspring

uniform crossover – each locus is considered as a potential cutting point

...
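The intermediate and line recombination operators can be sketched in a few lines of C++. The function name `recombine`, the `singleFactor` flag and the use of `std::mt19937` are choices made for this example; `delta` stands for the amount by which the sampling interval extends beyond the parents, and is assumed to be non-negative.

```cpp
#include <cstddef>
#include <random>
#include <stdexcept>
#include <vector>

// Intermediate recombination: every locus gets its own factor u_i drawn
// uniformly from [-delta, 1 + delta). With singleFactor = true the same u is
// reused for all loci, which gives line recombination.
std::vector<double> recombine(const std::vector<double>& a,
                              const std::vector<double>& b,
                              double delta, bool singleFactor,
                              std::mt19937& rng) {
    if (a.size() != b.size()) {
        throw std::invalid_argument("parents must have equal length");
    }
    std::uniform_real_distribution<double> factor(-delta, 1.0 + delta);
    std::vector<double> child(a.size());
    double u = factor(rng);
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (!singleFactor && i > 0) {
            u = factor(rng);  // fresh factor for each locus
        }
        child[i] = u * a[i] + (1.0 - u) * b[i];
    }
    return child;
}
```

Note that this sketch interpolates the values linearly; for torsion angles one may also want to account for their periodicity, which is not handled here.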

Page 22

Mutation Operators

real valued mutation – the main parameters to adjust are the mutation step size and the mutation probability, the latter usually chosen to be 1/n, where n is the number of variables

variable step mutation – small mutation steps are drawn with high probability, large mutation steps with low probability

$M[i] \leftarrow M[i] + s_i \, r_i \, p_i, \quad i \in \{1, \dots, n\}, \quad s_i \in \{-1, +1\}, \quad r_i = r \, |B_i - A_i|, \quad r \approx 0.1, \quad M[i] \in [A_i, B_i], \quad p_i = 2^{-u k}, \quad u \in [0, 1]$

step-size adaptation mutation – n step-sizes / one direction / n-directions; do not offer consistent improvements for extreme multi-modal / high noise functions

...
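The variable step mutation formula can be read as the following C++ sketch. The names `Range` and `variableStepMutation` are hypothetical, `k` is taken as a free precision parameter because the slide does not fix its value, and clamping to the domain is one possible way (an assumption) to keep each gene inside its bounds.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <random>
#include <stdexcept>
#include <vector>

// Allowed interval [lo, hi] for one gene (hypothetical helper type).
struct Range {
    double lo;
    double hi;
};

// Each gene mutates with the given probability. The step is
// s * r * |hi - lo| * 2^(-u * k), with s in {-1, +1} and u uniform in [0, 1),
// so small steps are far more likely than large ones.
std::vector<double> variableStepMutation(std::vector<double> genes,
                                         const std::vector<Range>& domain,
                                         double mutationProbability,
                                         double r, double k,
                                         std::mt19937& rng) {
    if (genes.size() != domain.size()) {
        throw std::invalid_argument("one range per gene is required");
    }
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    std::bernoulli_distribution positive(0.5);
    for (std::size_t i = 0; i < genes.size(); ++i) {
        if (unit(rng) >= mutationProbability) {
            continue;  // this locus is left untouched
        }
        const double s = positive(rng) ? 1.0 : -1.0;
        const double range = r * std::fabs(domain[i].hi - domain[i].lo);
        const double p = std::pow(2.0, -unit(rng) * k);
        const double moved = genes[i] + s * range * p;
        // Assumption: out-of-domain values are clamped back to the bounds.
        genes[i] = std::min(domain[i].hi, std::max(domain[i].lo, moved));
    }
    return genes;
}
```

With r = 0.1 on a 0..360 degree domain, the largest possible step is 36 degrees, and a larger `k` concentrates more of the steps near zero.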

Page 23

Guided Mutation - Mutation + LS

[Figure: an individual from the GA population, encoded as the angles ∂1, ∂2, ..., ∂i, ..., ∂n, undergoes an angle mutation (∂i becomes ∂i'); local search then produces the optimized individual ∂'1, ∂'2, ..., ∂'n]

Page 24

Termination Criteria

preset number of fitness function evaluations / number of generations

termination once the evolution ceases

optimal value has been found (!!)

...
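The three criteria listed above can be combined in a single stopping test. The C++ sketch below is only one possible arrangement: the class name `TerminationCriteria`, its parameters and the "no improvement for N generations" reading of "the evolution ceases" are assumptions made for this example.

```cpp
#include <cstddef>
#include <limits>

// Stops when the evaluation budget is spent, when the best energy has not
// improved for a number of consecutive generations, or when a target energy
// (if one is known) has been reached.
class TerminationCriteria {
public:
    TerminationCriteria(std::size_t maxEvaluations,
                        std::size_t maxStagnantGenerations,
                        double targetEnergy)
        : maxEvaluations_(maxEvaluations),
          maxStagnant_(maxStagnantGenerations),
          target_(targetEnergy) {}

    // Call once per generation with the running evaluation count and the
    // best energy of the current population. Returns true to stop.
    bool shouldStop(std::size_t evaluationsSoFar, double bestEnergy) {
        if (bestEnergy < bestSeen_) {
            bestSeen_ = bestEnergy;
            stagnant_ = 0;
        } else {
            ++stagnant_;
        }
        return evaluationsSoFar >= maxEvaluations_
            || stagnant_ >= maxStagnant_
            || bestSeen_ <= target_;
    }

private:
    std::size_t maxEvaluations_;
    std::size_t maxStagnant_;
    double target_;
    double bestSeen_ = std::numeric_limits<double>::infinity();
    std::size_t stagnant_ = 0;
};
```

Since the optimal value is rarely known in advance, the target can be set to a very low energy so that only the budget and stagnation tests are active.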

Page 25

Distance - Levenberg-Marquardt

Page 26

Distance - Diversification

[Figure: conformations ∂i, ∂j, ∂k are taken from the GA population. CONVERGENCE branch: Levenberg-Marquardt. DIVERSIFICATION branch: compute the distances between conformations, RMSD(i,j), RMSD(i,k), RMSD(j,k) (Root Mean Squared Deviation), then transform the conformations]

Ananth Ranganathan, The Levenberg-Marquardt Algorithm, citeseer.ist.psu.edu/638988.html, June 2004

Page 27

Literature examples (A)

Dusan P. Djurdjevic, Mark J. Biggs, Ab Initio Protein Fold Prediction Using Evolutionary Algorithms: Influence of Design and Control Parameters on Performance, DOI 10.1002/jcc.20440, Wiley Interscience
  Generational Evolutionary Algorithm
  Steady-State Evolutionary Algorithm
  tournament selection, single-point/multi-point and uniform crossover, uniform distribution mutation with angles within the range of 0-360, termination in case of lack of improvement for a specified number of generations

Christopher D. Rosin (University of California, San Diego), A Comparison of Global and Local Search Methods in Drug Docking, Proceedings of the Seventh International Conference on Genetic Algorithms, ICGA, 1997
  Simulated Annealing – 100 tests with 1.5~1.8 billion function evaluations per test, linearly reduced temperature
  Solis and Wets’ algorithm (a class of local and global search algorithms) – part of the GA+LS hybrid, deviations chosen from a normal distribution; does not require gradient information
  Genetic Algorithm+LS – local search applied to only 7% of the population in each generation; stochastic selection, two-point crossover, Cauchy-deviate mutation

Page 28

Literature examples (B)

Garrett M. Morris et al., Automated Docking Using a Lamarckian Genetic Algorithm and an Empirical Binding Free Energy Function, Journal of Computational Chemistry, Vol. 19, No. 14, 1639-1662, 1998.
  Monte Carlo – applied on a pre-calculated grid of interaction energies
  Simulated Annealing
  Genetic Algorithm, Lamarckian Genetic Algorithm
  uniform random value initialization (-180.0, 180.0), maximum number of generations/function evaluations, proportional elitist selection, constant-size population, two-point crossover with breaks between genes, Cauchy distribution-based mutation, local search at the end of each generation on a user-defined percentage of the population
  each method is given ~1.5 million function evaluations / up to 41.5 minutes on a 200 MHz Silicon Graphics MIPS with 128 MB of RAM

Page 29

Outline

Protein Structure Prediction (PSP): modeling and complexity analysis

Parallel Hybrid Metaheuristics for the PSP Problem

Grid experimentation on GRID5000

Conclusion and Future Work

Page 30

Our contributions

ParadisEO: PARAllel and DIStributed Evolving Objects
http://paradiseo.gforge.inria.fr/

Evolving Objects framework (EO): European project (Geneura Team, INRIA, LIACS), http://eodev.sourceforge.net

Multi-Objective EO (MOEO) for the design of multi-objective evolutionary algorithms

Moving Objects (MO) for the design of local search algorithms

ParadisEO for parallel hybrid metaheuristics

Transparent use:
  Message passing (MPI, PVM): clusters, networks of workstations
  Multi-programming (PThreads): shared memory multi-processors (SMP)
  Parallel distributed computing: clusters of SMPs (CLUMPS)
  Grid computing: Condor-MW and Globus (MPICH-G2)

[Figure: layered architecture, with ParadisEO for computational Grids built on EO, MO and MOEO, over PVM, PThreads, MPI (LAM, CH), Condor-MW and Globus]

S. Cahon, N. Melab and E-G. Talbi. ParadisEO: A Framework for the Reusable Design of Parallel and Distributed Metaheuristics. Journal of Heuristics, Elsevier Science, Vol.10(3), pages 357-380, May 2004.

Page 31

Contributions

ParadisEO-EO: a framework of Evolutionary Algorithms

ParadisEO-MO: solution-based metaheuristics (Hill Climbing, Simulated Annealing, Tabu search)

ParadisEO-MOEO: techniques related to multi-objective optimization

ParadisEO-PEO: parallel and distributed metaheuristics
  Parallelism (partitioning solutions at several steps)
  Cooperation/Hybridization (algorithms exchange solutions), e.g. island cooperation

http://paradiseo.gforge.inria.fr

Page 32

Development of hybrid metaheuristics

[Figure: taxonomy of hybrid metaheuristics. Levels: High level, Low level. Modes: Relay (metaheuristics executed in sequence), Coevolutionary (concurrently executing metaheuristics). Other labels: encapsulated metaheuristics, independently cooperating metaheuristics, trivial, ParadisEO distributed multi-agent model, inheritance relationships]

E-G. Talbi, « A taxonomy of hybrid metaheuristics », Journal of Heuristics, 2002.

Unifying view of the three parallel hierarchical levels

The deployment of concurrent independent/cooperative metaheuristics

The parallelization of a single step of the metaheuristic (based on distribution of the handled solutions)

The parallelization of the processing of a single solution

Page 33

Design of several levels of parallelization/hybridization

Heuristic: independent walks, multi-start model, hybridization/cooperation of metaheuristics

Population / Neighborhood: parallel evaluation of the neighborhood/population

Solution: processing of a single solution (objective / data partitioning)

[Figure: scalability axis across the three levels S, P, H]

Page 34

Parallel asynchronous hierarchical Lamarckian GA

Parallel evaluation of the population

Low-level co-evolutionary hybridization

Cooperative GAs (Island model)

Page 35

Asynchronous Island Model

[Figure: several evolutionary algorithms (E.A.) exchanging individuals through migration]

Page 36

The Parallel Evaluation of the Population

[Figure: the E.A. sends out solutions and receives their full fitness]

Page 37

Parallel Hybrid Simulated Annealing

Synchronous Multi-Start Model

Parallel Neighbourhood Exploration

Gradient Local Search Optimization

Generate S_0
k := 0
while T_k > T_threshold do
    for s := 1 to nbSamples do
        S_rand := randomMove( S_0 );
        ΔE := eval( S_rand ) - eval( S_0 );
        if ΔE < 0 then
            S_0 := S_rand;
        else
            S_0 := S_rand with prob. 1.0 / ( 1.0 + e^(ΔE / T_k) );
        endif
    endfor
    k := k + 1;
    Adjust T_k;
endwhile
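The annealing loop above translates fairly directly into C++. The sketch below is a sequential illustration only, under several assumptions that are not in the slides: the random move is a Gaussian perturbation of every coordinate, the temperature is adjusted geometrically (`t *= cooling`), and the best state seen is tracked and returned. The names `State` and `simulatedAnnealing` are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <random>
#include <stdexcept>
#include <vector>

using State = std::vector<double>;

State simulatedAnnealing(State s0,
                         const std::function<double(const State&)>& eval,
                         double t0, double tThreshold, double cooling,
                         std::size_t nbSamples, double stepSize,
                         std::mt19937& rng) {
    if (!(cooling > 0.0 && cooling < 1.0) || !(tThreshold > 0.0)) {
        throw std::invalid_argument("need 0 < cooling < 1 and tThreshold > 0");
    }
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    std::normal_distribution<double> step(0.0, stepSize);
    double e0 = eval(s0);
    State best = s0;      // assumption: keep the best state ever visited
    double eBest = e0;
    for (double t = t0; t > tThreshold; t *= cooling) {  // geometric cooling (assumed)
        for (std::size_t s = 0; s < nbSamples; ++s) {
            State sRand = s0;
            for (double& v : sRand) {
                v += step(rng);  // random move: Gaussian perturbation (assumed)
            }
            const double eRand = eval(sRand);
            const double dE = eRand - e0;
            // Improvements are always accepted; otherwise accept with
            // probability 1 / (1 + e^(dE / T)), as in the pseudocode.
            const bool accept = dE < 0.0 || unit(rng) < 1.0 / (1.0 + std::exp(dE / t));
            if (accept) {
                s0 = sRand;
                e0 = eRand;
                if (e0 < eBest) {
                    best = s0;
                    eBest = e0;
                }
            }
        }
    }
    return best;
}
```

For a quick trial, `eval` can be any function of the state, for example a sum of squares; in the setting of the talk it would be the energy of the conformation encoded by the state.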

Page 38

Outline

Protein Structure Prediction (PSP): modeling and complexity analysis

Parallel Hybrid Metaheuristics for the PSP Problem

Grid experimentation on GRID5000

Conclusion and Future Work

Page 39

Deployment of ParadisEO-G4

GRID5000: A fully reconfigurable grid! The configuration phase relies on the deployment of pre-built Linux « images » having Globus and MPICH-G2 already installed.

• Lille, Nice-Sophia Antipolis, Lyon, Nancy, Rennes: 400 CPUs

• exclusive reservation – no interference may occur; the processors are completely available during the reservation time.

1. Reservation of the desired nodes

2. Select a master node for the Globus GRID

3. Configure the Globus GRID (certificates, user credentials, xinetd, postgresql, etc.)

4. Deployment and execution – MPICH-G2

[Figure: clusters A, B and C, each made of a front-end node (FRONTALE) and the machines M1 to M6]

Page 40

1L2Y and α-Cyclodextrin

[Figures: Tryptophan-cage, Protein Data Bank ID: 1L2Y; α-, β-, γ-Cyclodextrin]

Page 41

Parallel asynchronous hierarchical Lamarckian GA - Cyclodextrin

Native energy at 242 kcal/mol

MAX_GEN          100
POPULATION       300
CXOVER           0.95
CMUTATION        0.05
LS_XOVER         0.15
LS_MUTATION      0.05
XOVER            1.0
MUTATION         0.1
MIGRATION_R      15%
MIGRATION_STEP   5 Gen

Page 42

Parallel Hybrid SA – 1L2Y & Cyclodextrin

the same number of individuals is sampled by both algorithms, GA and SA

[Figures: Tryptophan-cage (1L2Y), GA vs. SA; Cyclodextrin, SA]

Page 43

Experimental Results

Cyclodextrin Native Energy: 242.4 kcal/mol

Tryptophan-cage Native Energy: 46.446 kcal/mol

Cyclodextrin    Min        Max        Avg        StDev
GA              2470       5845.27    3790.56    708.54
SA              1029.14    4593       2359.26    281.2

                GA+LS, no isl.    GA+LS      SA         SA+LS
Cyclodextrin    264.599           164.973    1029.14    904.046
1L2Y            93.5521           57.5447    201.37     86.3106

Random Search, 1L2Y    Min        Max        Avg        StDev
Random                 1012.54    13471.4    4420.91    1521.67
Random+LS              1204       6329.91    3132.8     800.85

                GA+LS, Island    GA+LS, no isl.    GA+Isl.    GA
Cyclodextrin    31m20s           29m32s            7m58s      7m24s
1L2Y            29m57s           31m17s            7m47s      7m45s

Page 44

Outline

Protein Structure Prediction (PSP): modeling and complexity analysis

Parallel Hybrid Metaheuristics for the PSP Problem

Grid experimentation on GRID5000

Conclusion and Future Work

Page 45

Conclusion and Future Work (1)

The GA behaves better on the considered benchmarks, especially on Cyclodextrin
  Energies below that of the native conformation were obtained

The SA results are comparable to the results obtained by the GA with no hybridization

The SA allows for little intrinsic parallelism …

… but it does not get trapped in local minima as easily as directed LS (e.g. gradient LS)

Page 46

Conclusion and Future Work (2)

Hierarchical multi-stage parallel models may prove to be efficient on the considered benchmarks

Hybridization schemes combining …
  … the GA (strong exploration capabilities)
  … and the SA (intensification + less prone to getting trapped in local minima than gradient methods)

Page 47

ParadisEO-PEO Implementation

Page 48

ParadisEO-PEO Evolutionary Algorithm

pop_eval( population );
do {
    eoPop< EOT > offsprings;
    select( population, offsprings );
    transform( offsprings );
    pop_eval( offsprings );
    replace( population, offsprings );
} while ( cont( population ) );

peoEA(
    eoContinue< EOT >& __cont,
    peoPopEval< EOT >& __pop_eval,
    eoSelect< EOT >& __select,
    peoTransform< EOT >& __transform,
    eoReplacement< EOT >& __replace
);

continuation criterion - maximum number of generations, checkpoints, etc.

evaluation function - the designed class has to be derived from the peoPopEval class: peoSeqPopEval, peoParaPopEval

selection strategy – applied at each iteration in order to obtain the offspring population

transformation operators - crossover operator(s), mutation operator(s); peoEA requires a peoTransform derived object: peoSeqTransform, peoParaSGATransform

replacement strategy – for integrating the offspring back into the initial population

Page 49

ParadisEO-PEO EA Representation of the Individuals

#include <EO.h>
...

class Conformation : public EO< eoScalarFitness< double, greater< double > > > {
    ...
    int operator[]( const int& index ) const {
        return chromosome->getChromo( index+1 );
    }
    ...
    Individu* chromosome;
    ...
    static Molecule* molecule;
    static Hamiltonian* hamiltonian;
    static Population* population;
};

extern void pack( const dockingAtGrid::Conformation& conformation );
extern void unpack( dockingAtGrid::Conformation& conformation );

Page 50

ParadisEO-PEO EA Representation – Packing & Unpacking

Pack Data to Send

void pack( const dockingAtGrid::Conformation& conformation ) {

    if ( conformation.invalid() )
        pack( (unsigned int)0 );
    else {
        pack( (unsigned int)1 );
        pack( conformation.fitness() );
    }

    for ( unsigned int index = 0; index < conformation.size(); index++ )
        pack( conformation[ index ] );
}

Unpack Recv. Data

void unpack( dockingAtGrid::Conformation& conformation ) {

    eoScalarFitness< double, std::greater< double > > fitness;
    unsigned int validFitness;

    unpack( validFitness );
    if ( validFitness ) {
        unpack( fitness );
        conformation.fitness( fitness );
    } else {
        ...
    }
}

[Figure: the packed data is sent through the MIDDLEWARE and unpacked on reception]

Page 51

ParadisEO-PEO EA Random chromosome initialization

Each active torsional angle is initialized with a random value within a specified angle domain (typically 0..360).

#include <eoInit.h>
...

class ConformationInit : public eoInit< Conformation > {
public:
    ...
    void operator()( Conformation& conf ) {
        for ( unsigned int i = 1; i <= conf.molecule->getNvars(); i++ ) {
            conf.chromosome->setChromo( i, eo::rng.random( ANGLE_DOMAIN ) );
        }
    }
    ...
};

Page 52

Asynchronous Island Model

Async. Isl. Parameters

// migrations will occur periodically at every N generations
eoPeriodicContinue< Conformation > mig_cont( MIGRATIONS_AT_N_GENERATIONS );

// selection of the emigrant conformations is performed as a stochastic tournament
eoStochTournamentSelect< Conformation > mig_select_one;
eoSelectNumber< Conformation > mig_select( mig_select_one, NUMBER_OF_MIGRANTS );

// migrations are merged with the population - the worse individuals are discarded
eoDeterministicSaDReplacement< Conformation > mig_elit_replace( 0.3, 0.0, 0.1, 0.0 );

// the migration stream will follow a ring topology
RingTopology ring_topology;

peoAsyncIslandMig< Conformation > island_mig( mig_cont, mig_select, mig_elit_replace, ring_topology, alg.population, alg.population );

EA

// to be activated at the end of every generation
checkpoint->add( island_mig );
peoEA< Conformation > eAlg( checkpoint, pop_eval, select, paraTransform, replacement );
island_mig.setOwner( eAlg );

Page 53

The Parallel Evaluation of the Population

// functor for performing the evaluation of a specified conformation
ConformationEval conformationEval;

// wrapper for the evaluation functor – in addition it offers parallel evaluation
peoParaPopEval< Conformation > pop_eval( conformationEval );

peoParaSGATransform< Conformation > paraTransform( crossover, XOVER_RATE, mutation, MUTATION_RATE );

…
peoEA< Conformation > eAlg( checkpoint, pop_eval, select, paraTransform, replacement );

XML Grid Mapping

<?xml version="1.0"?>
<schema>
    <group scheduler="0">
        <node name="0" num_workers="0"></node>
        <node name="1" num_workers="0">
            <runner>1</runner>
            ...
        </node>
        <node name="2" num_workers="1"></node>
        <node name="3" num_workers="1">
        ...
    </group>
</schema>

[Figure labels on the mapping: SCHEDULER (node 0), ALGORITHMS (node 1), WORKER NODES (nodes 2 and 3)]

Page 54

End of story...