Post on 21-Dec-2015
Development and Validation of a Genetic Algorithm for Flexible Docking
Gareth Jones, Peter Willett, Robert C. Glen, Andrew R. Leach and Robin Taylor
J. Mol. Biol., 1997, 267
Bioinformatics Seminar 2005 Matthias Dietzen
Contents
Introduction
Docking
Genetic Algorithm
Development of GOLD
Validation of GOLD
Conclusions
Discussion
Introduction
Nowadays, computer-aided design of therapeutic molecules is the method of choice
Screening virtual libraries for novel chemical entities and predicting their binding modes for a given receptor saves both time and money
Satisfies the "fail fast, fail cheap" principle
Docking: Definition
“Docking tries to find the energetically most feasible three-dimensional arrangement of two molecules in close contact with each other.”
Use of docking:
  Target validation
  Lead discovery
  Lead optimization
Docking: Problems
3 different complexities:
  Rigid (comparatively simple)
  Semi-flexible (hard)
  Flexible (practically infeasible)
The combinatorial explosion that arises when accounting for flexibility of ligand and/or receptor forces the development of highly sophisticated algorithms
One of these: the Genetic Algorithm
Genetic Algorithm: Definition
“A Genetic Algorithm evolves the population of possible solutions through genetic operators to a final population, optimizing a predefined fitness function.”
Underlying principle: Darwin's Theory of Evolution
  Population growth is limited by the food available
  Individuals using this food more efficiently will produce more offspring
  This leads to displacement of less adapted individuals
  "Survival of the fittest"
Genetic Algorithm: Model
A Genetic Algorithm provides:
  Population(s) of individuals competing against each other
  Each individual represented as a set of chromosomes encoding the individual's features
  Genetic operators modelling processes of evolution
  A fitness function ranking the individuals of one generation
Genetic Algorithm: Algorithm
1. Select and initialize the set of genetic operators
2. Randomly create an initial population and rank it by fitness
3. Select parents according to their ranking
4. Breed children using the genetic operators
5. Evaluate the children's fitness
6. Replace the least fit members of the population
7. Go to 3 until termination or convergence
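The seven steps can be sketched as a small steady-state loop. This is a minimal illustration on a toy problem (maximising the number of 1-bits in a bit string), not GOLD's implementation; the function names, the 50/50 choice between crossover and mutation, and the simple rank weighting are all assumptions made for the example.

```python
import random

def run_ga(fitness, random_individual, crossover, mutate,
           pop_size=100, n_operations=2000, seed=0):
    """Steady-state GA sketch; returns the fittest individual found."""
    rng = random.Random(seed)
    # Step 2: random initial population, ranked by fitness (best first).
    population = sorted((random_individual(rng) for _ in range(pop_size)),
                        key=fitness, reverse=True)
    # Rank-biased weights: index 0 (fittest) gets the largest weight.
    weights = [pop_size - i for i in range(pop_size)]
    for _ in range(n_operations):
        # Step 3: select parents according to their ranking.
        mum, dad = rng.choices(population, weights=weights, k=2)
        # Step 4: breed a child with one of the genetic operators.
        if rng.random() < 0.5:
            child = crossover(mum, dad, rng)
        else:
            child = mutate(mum, rng)
        # Steps 5-6: evaluate the child; replace the least fit member.
        if fitness(child) > fitness(population[-1]):
            population[-1] = child
            population.sort(key=fitness, reverse=True)
    # Step 7 is the loop itself (fixed operation budget as termination).
    return population[0]

# Toy problem (assumption): fitness is the number of 1-bits.
def one_max(bits):
    return sum(bits)

def random_bits(rng, n=16):
    return [rng.randint(0, 1) for _ in range(n)]

def one_point_crossover(a, b, rng):
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def flip_one_bit(a, rng):
    i = rng.randrange(len(a))
    return a[:i] + [1 - a[i]] + a[i + 1:]

best = run_ga(one_max, random_bits, one_point_crossover, flip_one_bit)
```

Because only the least fit member is ever replaced, the best individual in the population can never get worse from one operation to the next.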
Development of GOLD: Chromosomes
2 binary strings for conformation information of both ligand and protein
  1 byte for each bond's rotation angle
2 integer strings for mapping of hydrogen bonds
  Acceptor (ligand) -> Donor (receptor)
  Donor (ligand) -> Acceptor (receptor)
Use of least-squares fitting to form as many hydrogen bonds as possible
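To make the "1 byte per rotation angle" encoding concrete, here is a minimal decoding sketch. The slide only states that each angle occupies one byte; the linear mapping onto [-180°, 180°) and the function name are assumptions for illustration.

```python
def decode_torsions(chromosome):
    """Map each byte (0..255) to a rotation angle in degrees.

    Assumption: the 256 byte values are spread evenly over
    [-180, 180), i.e. a step of 360/256 (about 1.4 degrees).
    """
    step = 360.0 / 256.0
    return [b * step - 180.0 for b in chromosome]

# Three rotatable bonds encoded as three bytes.
angles = decode_torsions(bytes([0, 128, 255]))
```

Under this assumed mapping, flipping a high-order bit of a byte changes the angle by up to 180°, while flipping the lowest bit changes it by about 1.4°.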
Development of GOLD: Fitness Function
3 energy terms:
  H_Bond_Energy: sum of energies of all hydrogen bonds in the complex
  Complex_Energy: steric energy of interaction between ligand and receptor
  Internal_Energy: the ligand's steric and torsional energy based on molecular mechanics
Final fitness score: -(H_Bond_Energy + Internal_Energy + Complex_Energy)
Development of GOLD: Fitness Function - H_Bond_Energy
Epair x distance_weight x angle_weight
  Epair: hydrogen-bond energy between a donor and an acceptor
  distance_weight, angle_weight: geometrical arrangement of donor hydrogen, acceptor and any lone pairs
Development of GOLD: Fitness Function - H_Bond_Energy: Epair
Uses model fragments for donor (d) and acceptor (a)
Accounts for displacement of water (w)
Initially, donor and acceptor are both solvated; when they form a hydrogen bond, the bound water is stripped off
Epair = (Eda + Eww) – (Edw + Eaw)
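The Epair expression is an energy balance: the bonds formed (donor-acceptor and water-water) minus the bonds broken (donor-water and acceptor-water). A short sketch with made-up numbers shows the bookkeeping; the energy values are hypothetical and carry no physical meaning.

```python
def e_pair(e_da, e_ww, e_dw, e_aw):
    """Energy change when donor-water and acceptor-water contacts are
    replaced by a donor-acceptor bond plus a water-water bond."""
    return (e_da + e_ww) - (e_dw + e_aw)

# Hypothetical energies, chosen only to illustrate the arithmetic.
example = e_pair(e_da=-5.0, e_ww=-4.0, e_dw=-3.0, e_aw=-3.5)
```

A negative result means the hydrogen bond is favourable even after paying for the displaced water.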
Development of GOLD: Genetic operators
Island model:
  Isolated subpopulations instead of one large population
  Does not increase effectiveness, but improves efficiency
  Five subpopulations, each with 100 individuals
Use of four genetic operators:
  Crossover
  Mutation
  Migration
  Selection
Development of GOLD: Genetic operators
Crossover
  Inherits the parents' features by crossover of chromosomes
Mutation
  Changes a single individual's chromosome randomly (bit flipping)
Migration
  Copies an individual from one island to a neighbouring one
Selection
  Selection pressure: relative probability of choosing the fittest individual as a parent
  Pressure: 1.1
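One common way to realise a selection pressure of 1.1 is linear ranking, in which the fittest individual is 1.1 times as likely to be chosen as an average one. The sketch below assumes that scheme, together with a ring of islands for migration; the slides do not specify either detail, and all names are illustrative.

```python
import random

def linear_rank_probabilities(n, pressure=1.1):
    """Selection probabilities for ranks 0 (fittest) .. n-1 (least fit).

    Assumed scheme: linear ranking, so the fittest individual has
    probability pressure/n and the least fit (2 - pressure)/n.
    """
    return [(2 - pressure + 2 * (pressure - 1) * (n - 1 - r) / (n - 1)) / n
            for r in range(n)]

def select_parent(island, rng, pressure=1.1):
    """`island` is a list of individuals sorted from fittest to least fit."""
    probs = linear_rank_probabilities(len(island), pressure)
    return rng.choices(island, weights=probs, k=1)[0]

def migrate(islands, rng):
    """Copy one selected individual to the neighbouring island on a ring,
    overwriting that island's least fit member. The caller is expected
    to re-rank the destination island afterwards."""
    src = rng.randrange(len(islands))
    dst = (src + 1) % len(islands)
    islands[dst][-1] = select_parent(islands[src], rng)

# One island of 100 individuals, as on the slide.
probs = linear_rank_probabilities(100)
```

With a pressure this close to 1, selection is only mildly biased towards fit individuals, which helps to preserve diversity within each island.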
Validation of GOLD: Data set
Data set of 100 protein-ligand complexes of pharmacological interest from the PDB
High variance of the test set:
  Heavy atoms between 6 and 55
  Rotatable bonds between 0 and 30
  Many functionally different protein types
  Metalloenzymes
Hand-curated with respect to charges, protonation and tautomeric states
Validation of GOLD: Classification
20 GA runs per complex
  Intended to ensure that the best solution is found
Four subjective categories:
  Good: binding mode, hydrogen bonds, close contacts, metal coordination correct
  Close: result acceptable, but with some displacement of ligand groups from the experimental result
  Errors: partially correct, but with significant errors
  Wrong: completely incorrect
Classification is preferred over rmsd, since a small rmsd may mask errors
Validation of GOLD: Classification
[Figure] Left: good. Right: errors.
Validation of GOLD: Results
Prediction: 71/100 in categories good and close
Complexes predicted after 2, 5 and 10 runs:
  GA runs   Correctly predicted
  2         49/71
  5         63/71
  10        65/71
Validation of GOLD: Results
Ligand composition:
[Figure: bar charts of Max, Avg and Min values for heavy atoms, % H-bonding, and torsions/free corners]
Validation of GOLD: Results
Problems in resolution:
[Figure: bar chart of classification (0% to 100%; Good + Close vs. Errors + Wrong) by resolution in Å (≤ 1.5, ≤ 2.0, ≤ 2.5, ≤ 3.0, > 3.0)]
Validation of GOLD: Results
Summary:
  71% prediction accuracy
  In general, GOLD does not require 20 runs
  Fails for ligands with many heavy atoms or torsions, due to complexity
  Fails for ligands with few hydrogen bonds, due to the fitness score
  Prediction rate of 77% for resolution ≤ 2.5 Å
Conclusions
Genetic Algorithms in general:
  Random initialization (non-deterministic)
  Convergence to global minimum
  Solutions are suboptimal
  Need for a local minimizer
GOLD:
  Bit vector mutation leads to solutions far from the original individual
  Problems docking large, flexible, hydrophobic ligands
Validation of GOLD: Results
Ligand composition (good+close / errors+wrong):
        Heavy atoms   Torsions & free corners   % H-bonding
  Max   52/55         28/40                     66.7/53.9
  Avg   20.4/24.3     7.9/11.4                  31.9/25.1
  Min   6/9           0/0                       8.8/4.8
Development: The Fitness Function – H_Bond_Energy: distance_wt
distance_wt = 1, d ≤ 0.25 Å
distance_wt = (dmax – d) / (dmax – 0.25 Å), d in [0.25 Å, dmax]
distance_wt = 0, d ≥ dmax
dmax varies linearly from 4.0 Å (when the GA starts) to 1.5 Å (after 75,000 genetic operations)
This allows long-range interactions at the beginning but only close contacts at the end
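The piecewise weight and the shrinking cutoff can be written down directly. The sketch below follows the slide's definitions; the function names and the choice to hold dmax at 1.5 Å once 75,000 operations have passed are assumptions.

```python
def d_max(n_ops, start=4.0, end=1.5, anneal_ops=75_000):
    """Cutoff distance in Angstrom after n_ops genetic operations.

    Linear interpolation from `start` to `end`; held at `end` afterwards
    (assumption: the slide does not say what happens beyond 75,000).
    """
    frac = min(n_ops, anneal_ops) / anneal_ops
    return start + (end - start) * frac

def distance_wt(d, dmax):
    """Weight in [0, 1] for a hydrogen-bond distance d (Angstrom)."""
    if d <= 0.25:
        return 1.0
    if d >= dmax:
        return 0.0
    return (dmax - d) / (dmax - 0.25)

# The same 2.125 Angstrom contact early and late in a run.
early = distance_wt(2.125, d_max(0))
late = distance_wt(2.125, d_max(75_000))
```

The same contact is half-weighted at the start of a run and ignored at the end, which is how the schedule moves from long-range guidance to strict close contacts.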
Development: The Fitness Function – H_Bond_Energy: angle_wt
Acceptors without lone-pair directional preference: angle_wt = 1
Acceptors with directionality in the plane of the lone pairs:
  angle_wt = 1, θ < 20°
  angle_wt = [(60° – θ) / (60° – 20°)]^2, θ in [20°, 60°]
  angle_wt = 0, θ > 60°
Acceptors with directionality along the lone pairs:
  angle_wt = 1, θ > 160°
  angle_wt = [(θ – 60°) / (160° – 60°)]^2, θ in [60°, 160°]
  angle_wt = 0, θ < 60°
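The two directional cases can be sketched as follows. The function names are illustrative, and the second function assumes the weight rises continuously from 0 at 60° to 1 at 160°, which is what the stated limits (1 above 160°, 0 below 60°) require.

```python
def angle_wt_in_plane(theta):
    """Acceptors with directionality in the plane of the lone pairs.

    theta in degrees: full weight below 20, none above 60.
    """
    if theta < 20.0:
        return 1.0
    if theta > 60.0:
        return 0.0
    return ((60.0 - theta) / (60.0 - 20.0)) ** 2

def angle_wt_along_lone_pair(theta):
    """Acceptors with directionality along the lone pairs.

    theta in degrees: full weight above 160, none below 60.
    Assumption: quadratic ramp that is continuous at both limits.
    """
    if theta > 160.0:
        return 1.0
    if theta < 60.0:
        return 0.0
    return ((theta - 60.0) / (160.0 - 60.0)) ** 2

# Halfway through each ramp the weight is a quarter, not a half.
mid_in_plane = angle_wt_in_plane(40.0)
mid_along = angle_wt_along_lone_pair(110.0)
```

Squaring the linear ramp penalises moderately distorted geometries more strongly than a plain linear weight would.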
Development: The Fitness Function – Complex_Energy
Complex_Energy = ∑(atoms i) ∑(atoms j) Eij
Eij = A/dij^8 – B/dij^4 (8-4 potential)
  Smoother than the standard Lennard-Jones 12-6 potential
  A, B chosen to reproduce the minimum of the 12-6 potential
Adjustments for hydrogen bonds:
  Eij = 0 for the interaction of donor hydrogen and acceptor
  Distance between donor and acceptor is scaled by 1.43, which is equivalent to reducing the vdW radii to 70%
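Choosing A and B to reproduce a given minimum has a closed form. Setting the derivative of A/d^8 – B/d^4 to zero at the distance d0 and requiring the energy there to be –k gives A = k·d0^8 and B = 2·k·d0^4. The sketch below assumes that "reproduce the minimum" means matching both the position d0 and the depth k of the 12-6 well; the function names and the numbers in the example are illustrative.

```python
def coefficients_8_4(k, d0):
    """A and B such that A/d**8 - B/d**4 has its minimum -k at d = d0."""
    return k * d0 ** 8, 2.0 * k * d0 ** 4

def e_8_4(d, k, d0):
    """8-4 pair energy at distance d for well depth k and minimum at d0."""
    a, b = coefficients_8_4(k, d0)
    return a / d ** 8 - b / d ** 4

# Hypothetical atom pair: well depth 0.5 at 3.0 Angstrom.
at_minimum = e_8_4(3.0, k=0.5, d0=3.0)
```

The lower exponents make the repulsive wall rise less steeply than in the 12-6 form, so slightly overlapping poses are penalised rather than rejected outright.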
Development: The Fitness Function – Complex_Energy
Let –kij be the minimum energy of interaction between two atoms i and j
If Eij > scale x kij, then Eij = 1.5 x scale x kij
scale varies logarithmically from 1.0 (when the GA starts) to 120.0 (after 75,000 genetic operations)
This encourages close contacts early in a GA run while penalising steric clashes at the end
Development: The Fitness Function – Internal_Energy
Steric energy (for each pair of atoms i, j):
  Eij = C/dij^12 – D/dij^6
  with C and D chosen such that Eij is minimal for dij = ri + rj
Torsional energy (for four consecutively bonded atoms i, j, k, l):
  Eijkl = ½ Vijkl [1 + (ηijkl / |ηijkl|) cos(|ηijkl| x ωijkl)]
  with ω: torsional angle; η: periodicity (predefined); V: barrier to rotation (predefined)
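The torsional term can be evaluated directly from its definition. In this sketch the angle is taken in degrees and the periodicity is assumed to be a non-zero integer whose sign selects whether the energy is highest or lowest at ω = 0; the function name and the example values are illustrative.

```python
import math

def torsional_energy(omega_deg, barrier, periodicity):
    """E = 1/2 * V * [1 + sign(eta) * cos(|eta| * omega)].

    omega_deg   torsional angle in degrees
    barrier     V, barrier to rotation
    periodicity eta, non-zero; its sign sets the phase
    """
    sign = 1.0 if periodicity > 0 else -1.0
    omega = math.radians(omega_deg)
    return 0.5 * barrier * (1.0 + sign * math.cos(abs(periodicity) * omega))

# Hypothetical three-fold torsion with a barrier of 2.0.
eclipsed = torsional_energy(0.0, barrier=2.0, periodicity=3)
staggered = torsional_energy(60.0, barrier=2.0, periodicity=3)
```

With a positive three-fold periodicity the energy reaches the full barrier at 0° and drops to zero at 60°; a negative periodicity swaps the two.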