Development and Validation of a Genetic Algorithm for Flexible Docking

Gareth Jones, Peter Willet, Robert C. Glen, Andrew R. Leach and Robin Taylor

J. Mol. Biol., 1997, 267

Bioinformatics Seminar 2005 Matthias Dietzen

Contents

Introduction

Docking

Genetic Algorithm

Development of GOLD

Validation of GOLD

Conclusions

Discussion

Introduction

Nowadays, computer-aided design of therapeutic molecules is the method of choice

Screening virtual libraries for novel chemical entities and predicting their binding modes for a given receptor would save both time and money

Satisfies the "fail fast, fail cheap" principle

Docking

Definition

Problems

Docking: Definition

“Docking tries to find the energetically most feasible three-dimensional arrangement of two molecules in close contact with each other.”

Uses of docking: target validation, lead discovery, lead optimization

Docking: Problems

Three levels of complexity: rigid (comparatively simple), semi-flexible (hard), fully flexible (practically infeasible)

The combinatorial explosion that arises when accounting for the flexibility of ligand and/or receptor forces the development of highly sophisticated algorithms

One of these: the genetic algorithm

Genetic Algorithm

Definition

Model

Algorithm

Genetic Algorithm: Definition

“A Genetic Algorithm evolves the population of possible solutions through genetic operators to a final population, optimizing a predefined fitness function.”

Underlying principle: Darwin's theory of evolution

Population growth is limited by the food available

Individuals using this food more efficiently will produce more offspring

This leads to the displacement of less adapted individuals: "survival of the fittest"

Genetic Algorithm: Model

A genetic algorithm provides:

Population(s) of individuals competing against each other

Each individual represented as a set of chromosomes encoding the individual's features

Genetic operators modelling processes of evolution

A fitness function ranking the individuals of one generation

Genetic Algorithm: Algorithm

1. Select and initialize the set of genetic operators

2. Randomly create an initial population and rank it by fitness

3. Select parents according to their ranking

4. Breed children by the use of genetic operators

5. Evaluate the children's fitness

6. Replace the least fit members of the population

7. Repeat from step 3 until termination or convergence
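
The steps above can be sketched as a small steady-state loop in Python. This is only an illustration, not GOLD's implementation: the function names, the rank weights, the fixed operation budget used as termination criterion, and the toy bit-counting fitness are all assumptions made for the example.

```python
import random

def run_ga(fitness, random_individual, crossover, mutate,
           pop_size=20, n_operations=200, seed=0):
    """Steady-state GA following steps 2-7 (all names are illustrative)."""
    rng = random.Random(seed)
    # Step 2: random initial population, ranked by fitness (best first).
    population = sorted((random_individual(rng) for _ in range(pop_size)),
                        key=fitness, reverse=True)
    for _ in range(n_operations):  # Step 7: fixed budget as termination
        # Step 3: rank-weighted parent choice (best rank, largest weight).
        weights = [pop_size - rank for rank in range(pop_size)]
        mother, father = rng.choices(population, weights=weights, k=2)
        # Step 4: breed a child with the genetic operators.
        child = mutate(crossover(mother, father, rng), rng)
        # Steps 5-6: evaluate the child; it replaces the least fit member
        # only if it is fitter.
        if fitness(child) > fitness(population[-1]):
            population[-1] = child
            population.sort(key=fitness, reverse=True)
    return population[0]

# Toy problem (assumption): maximize the number of 1-bits in a 16-bit list.
def count_ones(bits):
    return sum(bits)

def random_bits(rng, n=16):
    return [rng.randint(0, 1) for _ in range(n)]

def one_point(a, b, rng):
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def flip_one(bits, rng):
    i = rng.randrange(len(bits))
    return bits[:i] + [1 - bits[i]] + bits[i + 1:]

best = run_ga(count_ones, random_bits, one_point, flip_one)
```

Because only the least fit member is ever replaced, the best individual of the population can never get worse from one operation to the next.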

Development of GOLD

Chromosomes

Fitness Function

Genetic Operators

Development of GOLD: Chromosomes

2 binary strings for conformation information of both ligand and protein: 1 byte for each bond's rotation angle

2 integer strings for mapping of hydrogen bonds: acceptor (ligand) -> donor (receptor) and donor (ligand) -> acceptor (receptor)

Use of least squares fitting to form as many hydrogen bonds as possible
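
To make the one-byte-per-bond encoding concrete, here is a minimal Python sketch of how such a binary string could be decoded into rotation angles. The slide only states that each angle takes one byte; the mapping of the 256 byte values onto the range [-180°, 180°) and the function name are assumptions.

```python
def decode_torsions(chromosome):
    """Map each byte (0..255) of a conformation string to an angle in degrees.

    Assumption: the 256 values are spread evenly over [-180, 180),
    which gives a step of 360/256 = 1.40625 degrees.
    """
    step = 360.0 / 256
    return [-180.0 + value * step for value in chromosome]

# Three rotatable bonds encoded as three bytes.
angles = decode_torsions(bytes([0, 128, 255]))
print(angles)  # [-180.0, 0.0, 178.59375]
```

With one byte per bond, a bit-flip mutation in a high-order bit changes the angle by up to 180°, while a flip in the lowest bit changes it by about 1.4°.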

Development of GOLD: Fitness Function

3 energy terms:

H_Bond_Energy: sum of energies of all hydrogen bonds in the complex

Complex_Energy: steric energy of interaction between ligand and receptor

Internal_Energy: the ligand's steric and torsional energy based on molecular mechanics

Final fitness score: -(H_Bond_Energy + Internal_Energy + Complex_Energy)
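
The sign convention of the final score is easy to get wrong, so here is a tiny Python sketch. The energy values are hypothetical and serve only to show that a lower total energy yields a higher fitness.

```python
def fitness_score(h_bond_energy, complex_energy, internal_energy):
    """Negated sum of the three energy terms: lower energy, higher fitness."""
    return -(h_bond_energy + internal_energy + complex_energy)

# Hypothetical energies, chosen only to illustrate the sign convention.
score = fitness_score(h_bond_energy=-12.0, complex_energy=-30.0,
                      internal_energy=5.0)
print(score)  # 37.0
```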

Development of GOLD: Fitness Function - H_Bond_Energy

Energy of one hydrogen bond: Epair x distance_weight x angle_weight

Epair: hydrogen-bond energy between a donor and an acceptor

distance_weight, angle_weight: geometrical arrangement of donor hydrogen, acceptor and any lone pairs

Development of GOLD: Fitness Function - H_Bond_Energy, Epair

Uses model fragments for donor (d) and acceptor (a)

Accounts for displacement of water (w)

Initially, donor and acceptor are in solution, but when forming a hydrogen bond, water is stripped off

$$E_{pair} = (E_{da} + E_{ww}) - (E_{dw} + E_{aw})$$

Development of GOLD: Genetic operators

Island model: isolated subpopulations instead of one large population

Increases efficiency, but not effectiveness

Five subpopulations, each with 100 individuals

Use of four genetic operators: crossover, mutation, migration, selection

Development of GOLD: Genetic operators

Crossover: the child inherits the parents' features by crossover of chromosomes

Mutation: changes a single individual's chromosome randomly (bit flipping)

Migration: copies an individual from one island to a neighbouring one

Selection: relative probability of choosing the fittest individual as a parent (selection pressure: 1.1)
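
The four operators can be sketched in a few lines of Python. Everything below is a simplified illustration under stated assumptions: one-point crossover on bit lists, a single bit flip per mutation, a ring of islands for migration, and linear ranking as one common way to turn a selection pressure of 1.1 into parent probabilities (the fittest individual is then 1.1 times as likely to be chosen as an average one). The slides do not specify these details.

```python
import random

def crossover(parent_a, parent_b, rng):
    """One-point crossover of two equal-length bit lists; returns two children."""
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:], parent_b[:cut] + parent_a[cut:]

def mutate(bits, rng):
    """Return a copy of the chromosome with one randomly chosen bit flipped."""
    i = rng.randrange(len(bits))
    return bits[:i] + [1 - bits[i]] + bits[i + 1:]

def migrate(islands, source, rng):
    """Copy a random individual to the next island (ring layout assumed)."""
    target = (source + 1) % len(islands)
    islands[target].append(list(rng.choice(islands[source])))

def selection_probabilities(n, pressure=1.1):
    """Linear ranking: rank 0 is the fittest and is chosen with pressure / n."""
    return [(pressure - 2 * (pressure - 1) * rank / (n - 1)) / n
            for rank in range(n)]

probs = selection_probabilities(100)  # one island of 100 individuals
```

With a pressure as low as 1.1 the selection is only mildly biased towards fit individuals, which keeps the subpopulations diverse.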

Validation of GOLD

Data set

Classification

Results

Validation of GOLD: Data set

Data set of 100 protein-ligand complexes of pharmacological interest from the PDB

High variance of the test set:

Heavy atoms between 6 and 55

Rotatable bonds between 0 and 30

Many functionally different protein types

Metalloenzymes

Hand-curated with respect to charges, protonation and tautomeric states

Validation of GOLD: Classification

20 GA runs per complex, to make sure the best solution is found

Four subjective categories:

Good: binding mode, hydrogen bonds, close contacts, metal coordination correct

Close: result acceptable, but with some displacement of ligand groups from the experimental result

Errors: partially correct, but with significant errors

Wrong: completely incorrect

These categories are preferred to rmsd, since a small rmsd may mask errors

Validation of GOLD: Classification

[Figure: two example docking results. Left: good. Right: errors.]

Validation of GOLD: Results

Prediction: 71/100 in categories good and close

Complexes predicted after 2, 5, 10 runs:

GA runs   Correctly predicted
2         49/71
5         63/71
10        65/71

Validation of GOLD: Results

Ligand composition:

[Figure: three bar charts showing the Max, Avg and Min values of heavy atoms, % H-bonding, and torsions/free corners]

Validation of GOLD: Results

Problems in resolution:

[Figure: classification (Good + Close vs. Errors + Wrong, 0% to 100%) by resolution in Å, in bins ≤ 1.5, ≤ 2.0, ≤ 2.5, ≤ 3.0, > 3.0]

Validation of GOLD: Results

Summary:

71% prediction accuracy

In general, GOLD does not require 20 runs

Fails for ligands with many heavy atoms/torsions, due to complexity

Fails for ligands with few hydrogen bonds, due to the fitness score

Prediction rate of 77% for resolution ≤ 2.5 Å

Conclusions

Genetic algorithms in general:

Random initialization (non-deterministic)

Convergence to global minimum

Solutions are suboptimal

Need of a local minimizer

GOLD:

Bit vector mutation leads to solutions far from the original individual

Problems of docking large, flexible, hydrophobic ligands

Thank you for your attention!

Discussion

Validation of GOLD: Results

Ligand composition (good+close / errors+wrong):

       Heavy atoms   Torsions & free corners   % H-bonding
Max    52/55         28/40                     66.7/53.9
Avg    20.4/24.3     7.9/11.4                  31.9/25.1
Min    6/9           0/0                       8.8/4.8

Development: The Fitness Function – H_Bond_Energy, distance_wt

$$\text{distance\_wt} = \begin{cases} 1, & d \le 0.25\,\text{Å} \\ \dfrac{d_{max} - d}{d_{max} - 0.25\,\text{Å}}, & 0.25\,\text{Å} < d < d_{max} \\ 0, & d \ge d_{max} \end{cases}$$

$d_{max}$ varies linearly from 4.0 Å (when the GA starts) to 1.5 Å (after 75,000 genetic operations)

This allows long-range interactions in the beginning but only close contacts in the end
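
The distance weight and the shrinking cutoff can be written down directly. The following Python sketch follows the piecewise definition above; the function names and the clamping of the schedule after 75,000 operations are assumptions.

```python
def d_max(operations_done, total_operations=75_000, start=4.0, end=1.5):
    """Linear schedule for d_max in Å; held at `end` after total_operations."""
    fraction = min(max(operations_done / total_operations, 0.0), 1.0)
    return start + fraction * (end - start)

def distance_wt(d, dmax):
    """Weight 1 for d <= 0.25 Å, linear decay to 0 at dmax, 0 beyond."""
    if d <= 0.25:
        return 1.0
    if d >= dmax:
        return 0.0
    return (dmax - d) / (dmax - 0.25)

# The same 2.0 Å separation counts early in the run but not at the end.
early = distance_wt(2.0, d_max(0))
late = distance_wt(2.0, d_max(75_000))
```

Early in the run the 2.0 Å pair still contributes to the hydrogen-bond energy, while at the end, with the cutoff at 1.5 Å, its weight is zero.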

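Development: The Fitness Function – H_Bond_Energy, angle_wt

Acceptor without lone-pair directional preference: angle_wt = 1

For acceptors with directionality in the plane of the lone pairs:

$$\text{angle\_wt} = \begin{cases} 1, & \theta < 20^\circ \\ \left[\dfrac{60^\circ - \theta}{60^\circ - 20^\circ}\right]^2, & 20^\circ \le \theta \le 60^\circ \\ 0, & \theta > 60^\circ \end{cases}$$

For acceptors with directionality along the lone pairs:

$$\text{angle\_wt} = \begin{cases} 1, & \theta > 160^\circ \\ \left[\dfrac{\theta - 60^\circ}{160^\circ - 60^\circ}\right]^2, & 60^\circ \le \theta \le 160^\circ \\ 0, & \theta < 60^\circ \end{cases}$$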
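
A short Python sketch of the two directional angle weights follows. The function names are assumptions. For the "along the lone pairs" case, the intermediate branch is taken to rise from 0 at 60° to 1 at 160°, which is what the two boundary cases (weight 0 below 60°, weight 1 above 160°) require.

```python
def angle_wt_in_plane(theta):
    """Directionality in the plane of the lone pairs; theta in degrees."""
    if theta < 20.0:
        return 1.0
    if theta > 60.0:
        return 0.0
    return ((60.0 - theta) / (60.0 - 20.0)) ** 2

def angle_wt_along(theta):
    """Directionality along the lone pairs; theta in degrees."""
    if theta > 160.0:
        return 1.0
    if theta < 60.0:
        return 0.0
    return ((theta - 60.0) / (160.0 - 60.0)) ** 2

print(angle_wt_in_plane(40.0))  # 0.25
print(angle_wt_along(110.0))    # 0.25
```

Both weights are continuous: the quadratic branch meets the constant branches at its end points.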

Development: The Fitness Function – Complex_Energy

$$\text{Complex\_Energy} = \sum_{\text{atoms } i} \sum_{\text{atoms } j} E_{ij}, \qquad E_{ij} = \frac{A}{d_{ij}^{8}} - \frac{B}{d_{ij}^{4}}$$

This 8-4 potential is smoother than the standard Lennard-Jones 12-6 potential

A, B chosen to reproduce the minimum of the 12-6 potential

Adjustments for hydrogen bonds:

$E_{ij} = 0$ for the interaction of donor-H and acceptor

The distance between donor and acceptor is scaled by 1.43, which effectively reduces the vdW radii to 70%
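
Choosing A and B to reproduce a given minimum has a closed form: setting the derivative of A/d^8 - B/d^4 to zero at d0 and requiring a depth of -k gives A = k d0^8 and B = 2 k d0^4. The Python sketch below uses this to compare the 8-4 and 12-6 potentials; the values of d0 and k are hypothetical, and the hydrogen-bond adjustments are left out.

```python
def coefficients_8_4(d0, k):
    """A, B such that A/d**8 - B/d**4 has its minimum value -k at d = d0."""
    return k * d0 ** 8, 2.0 * k * d0 ** 4

def energy_8_4(d, a, b):
    """Softened 8-4 pair potential."""
    return a / d ** 8 - b / d ** 4

def energy_12_6(d, d0, k):
    """Lennard-Jones 12-6 potential with the same minimum, for comparison."""
    return k * ((d0 / d) ** 12 - 2.0 * (d0 / d) ** 6)

# Hypothetical pair: minimum at 3.5 Å with depth 0.2 (arbitrary energy units).
d0, k = 3.5, 0.2
a, b = coefficients_8_4(d0, k)
close_8_4 = energy_8_4(0.8 * d0, a, b)     # compressed contact, 8-4
close_12_6 = energy_12_6(0.8 * d0, d0, k)  # same contact, 12-6
```

At 80% of the optimal distance the 8-4 potential gives roughly 1.1 k, whereas the 12-6 potential gives roughly 6.9 k, so moderately overlapping poses are penalized far less.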

Development: The Fitness Function – Complex_Energy

Let $-k_{ij}$ be the minimum energy of interaction between two atoms i and j

If $E_{ij} > \text{scale} \times k_{ij}$, then $E_{ij} = 1.5 \times \text{scale} \times k_{ij}$

scale varies logarithmically from 1.0 (when the GA starts) to 120.0 (after 75,000 genetic operations)

This encourages the formation of close contacts early in a GA run, while avoiding steric clashes in the end
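
The effect of the cap can be seen by applying the rule, exactly as stated on this slide, to one clashing pair at the start and at the end of a run. This Python sketch takes `scale` as an input and does not model its schedule; the function name and the energy values are hypothetical.

```python
def capped_energy(e_ij, k_ij, scale):
    """Replace energies above scale * k_ij by 1.5 * scale * k_ij."""
    threshold = scale * k_ij
    return 1.5 * threshold if e_ij > threshold else e_ij

# Hypothetical clash: raw pair energy 50.0 with well depth k_ij = 0.2.
at_start = capped_energy(50.0, 0.2, scale=1.0)   # capped at about 0.3
at_end = capped_energy(50.0, 0.2, scale=120.0)   # capped at about 36
mild = capped_energy(10.0, 0.2, scale=120.0)     # below threshold, unchanged
```

With scale = 1.0 even a severe clash costs almost nothing, so the GA can explore tightly packed poses; with scale = 120.0 the same clash is heavily penalized.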

Development: The Fitness Function – Internal_Energy

Steric energy (for each pair of atoms i, j):

$$E_{ij} = \frac{C}{d_{ij}^{12}} - \frac{D}{d_{ij}^{6}}$$

with C and D chosen such that $E_{ij}$ is minimal for $d_{ij} = r_i + r_j$

Torsional energy (for four consecutively bonded atoms i, j, k, l):

$$E_{ijkl} = \tfrac{1}{2} V_{ijkl} \left[ 1 + \frac{\eta_{ijkl}}{|\eta_{ijkl}|} \cos\left( |\eta_{ijkl}| \, \omega_{ijkl} \right) \right]$$

with ω the torsional angle, η the periodicity (predefined), V the barrier to rotation (predefined)
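
Both internal-energy terms are simple to evaluate, as the following Python sketch shows. The condition that the steric term is minimal at d = r_i + r_j fixes only the ratio C/D, so the well depth `depth` is an extra assumption, as are the function names and the sample values.

```python
import math

def steric_12_6(d, r_i, r_j, depth=1.0):
    """C/d**12 - D/d**6 with its minimum (-depth) at d = r_i + r_j."""
    d0 = r_i + r_j
    c = depth * d0 ** 12
    big_d = 2.0 * depth * d0 ** 6
    return c / d ** 12 - big_d / d ** 6

def torsional_energy(v, eta, omega_deg):
    """0.5 * V * [1 + sign(eta) * cos(|eta| * omega)], omega in degrees."""
    sign = eta / abs(eta)
    return 0.5 * v * (1.0 + sign * math.cos(abs(eta) * math.radians(omega_deg)))

# Threefold torsion (eta = 3) with barrier V = 2.0: maximal at 0 degrees,
# minimal at 60 degrees.
eclipsed = torsional_energy(2.0, 3, 0.0)
staggered = torsional_energy(2.0, 3, 60.0)
```

The sign of η selects whether ω = 0 is an energy maximum (positive η) or a minimum (negative η), while |η| sets the number of minima per full rotation.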