de novo design tools for the generation of synthetically accessible ligands
Post on 03-Jan-2016
De Novo Design Tools for the Generation of Synthetically Accessible Ligands
ICAMS
Peter Johnson, Krisztina Boda, Shane Weaver, Aniko Valko, Vilmos Valko
Receptor Structure Based Drug Design
Objective:
To suggest potential leads that bind strongly to a given protein because of shape and electrostatic complementarity, and that are easy to synthesise
Approaches:
• Docking methods (preferably flexible docking) identify new lead structures by rapidly screening a database of 3-D structures of known compounds
• De novo design methods (such as SPROUT) construct a diverse set of entirely novel potential leads from scratch
SPROUT Components
• Detects potential binding pockets of the protein structures
• Identifies favourable interaction sites (H-bonding, hydrophobic, covalent, metal, user defined)
• Docks structures to target interaction sites
• Generates 3D molecular structures of novel ligands by linking the docked starting fragments together in an incremental construction scheme
• Scores, sorts and clusters the solutions
Problem with Large Answer Sets
De novo design programs such as SPROUT can suggest large sets of entirely novel potential leads
Powerful heuristics are necessary to evaluate (and reduce) often large answer sets
• Binding Affinity Score: eliminate candidates with poor estimated binding affinity
• Synthetic Feasibility: eliminate candidates with complex molecular structures
For de novo design, prediction of synthetic accessibility is equally important
Hypothetical ligands, including those predicted to bind very strongly, have no practical value unless they can be readily synthesised.
Our Attempts to Provide Solutions:
CAESA (estimates synthetic accessibility)
Complexity Analysis (estimates structural complexity and drug-likeness)
SynSPROUT (avoids the problem by building constraints into the structure generation process)
CAESA: Computer Assisted Estimation of Synthetic Accessibility
Glenn Myatt, Jon Baber
Goals of CAESA Project
Clear need for automated method of ranking hypothetical compounds according to perceived ease of synthesis
Good synthetic chemists can do this job themselves on a small number of compounds but are unwilling to do it for hundreds or thousands of compounds
CAESA attempts to do the same job but never gets bored!
Estimation of Synthetic Accessibility: Criteria used by CAESA
CAESA scores the synthetic accessibility of structures using two main criteria:
a) An estimate of structural complexity:
• stereocentres
• complex topological features (fusions etc.)
• functional group complexity
b) Availability of good starting materials:
• rapid retrosynthetic analysis
• database of commercially available materials
• reaction rule base (editable)
CAESA Components
[Diagram labels: Target Structure; Complexity Knowledge Base; Identify Synthetic Molecular Complexity; Detailed Information on the Structure's Complexity; Automatic Selection of Starting Materials; Selected Starting Materials]
Starting Materials and Synthetic Accessibility
Availability of suitable starting materials is a very important factor: good starting materials can dramatically reduce the difficulty of synthesising a compound.
Good starting materials for part of the target molecule mean that the analysis of synthetic difficulty or structural complexity can be directed to just those portions of the target molecule that cannot be made from available starting materials
Finding good starting materials through retrosynthetic analysis also provides possible synthetic routes as a byproduct
Traditional Retrosynthetic Analysis
[Diagram: search tree from the TARGET through Level 1, Level 2 and Level 3 precursors; one node is identified as an available starting material]
In the absence of frequent user intervention, the combinatorial nature of this approach prohibits very broad searches (many alternative reactions) as well as very deep searches (many steps between starting materials and products)
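The combinatorial growth of a retrosynthetic search tree can be illustrated with a small calculation. This is only a sketch: it assumes every intermediate has the same number of applicable retro-reactions (a uniform branching factor), which real searches do not, and the function names are hypothetical.

```python
from typing import List


def precursors_per_level(branching: int, depth: int) -> List[int]:
    """Number of precursor nodes at levels 1..depth of a uniform search tree."""
    return [branching ** level for level in range(1, depth + 1)]


def total_precursors(branching: int, depth: int) -> int:
    """Total number of precursors generated down to the given depth."""
    return sum(precursors_per_level(branching, depth))


if __name__ == "__main__":
    # Compare a narrow/shallow, a broad, and a deep search.
    for branching, depth in [(5, 3), (20, 3), (5, 8)]:
        print(branching, depth, total_precursors(branching, depth))
```

Either widening the search (more alternative reactions) or deepening it (more steps) multiplies the number of precursors that have to be generated and checked.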
Bidirectional Search for Synthetic Routes
[Diagram: retrosynthetic on-line generation of precursors from the Target, and synthetic off-line generation of virtual SMs from the Database of Starting Materials; structures labelled A and B; MT = metal or metalloid]
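The bidirectional idea can be sketched as two bounded expansions that are intersected. This is a toy illustration, not CAESA's implementation: molecules are plain strings, and the `FORWARD` and `RETRO` tables, the step functions and `meeting_points` are all hypothetical stand-ins for real reaction rules.

```python
from typing import Callable, Dict, Iterable, List, Set

Step = Callable[[str], Iterable[str]]


def expand(seeds: Set[str], step: Step, depth: int) -> Set[str]:
    """All nodes reachable from the seeds in at most `depth` applications of `step`."""
    seen = set(seeds)
    frontier = set(seeds)
    for _ in range(depth):
        frontier = {nxt for node in frontier for nxt in step(node)} - seen
        seen |= frontier
    return seen


def meeting_points(target: str, starting_materials: List[str],
                   retro_step: Step, forward_step: Step,
                   retro_depth: int, forward_depth: int) -> Set[str]:
    """Structures found both as precursors of the target and as virtual SMs."""
    # Forward (synthetic) expansion can be precomputed off-line.
    virtual_sms = expand(set(starting_materials), forward_step, forward_depth)
    # Backward (retrosynthetic) expansion is done on-line per target.
    precursors = expand({target}, retro_step, retro_depth)
    return precursors & virtual_sms


# Toy reaction tables: SM -> V1 -> V2 going forward, T -> P1 -> V2 going backward.
FORWARD: Dict[str, List[str]] = {"SM": ["V1"], "V1": ["V2"]}
RETRO: Dict[str, List[str]] = {"T": ["P1"], "P1": ["V2"]}


def forward_step(molecule: str) -> List[str]:
    return FORWARD.get(molecule, [])


def retro_step(molecule: str) -> List[str]:
    return RETRO.get(molecule, [])


if __name__ == "__main__":
    print(meeting_points("T", ["SM"], retro_step, forward_step, 2, 2))
```

In this toy case a four-step route is found with two levels of search from each side, which is the motivation for meeting in the middle rather than searching deeply in one direction.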
Example of Starting Material Selection
[Figure panels: Target Structure; Starting Materials Selected; Complexity Analysis; Residual Complexity (structure drawings with atoms marked X)]
Adjacent stereocentres on ring: relatively easy to control stereochemistry
Summary of CAESA Features
CAESA carries out a retrosynthetic analysis which terminates when a starting material from a database (such as ACD) is found
Found starting materials are scored according to length and difficulty of reaction sequence and coverage of target compound
All chemistry rules and transformations are described in editable text knowledge bases easily modified by chemists
Quality of the analysis depends on the chemistry included in the knowledge bases and the comprehensiveness of the starting material libraries
But CAESA is relatively slow, and speedier methods are needed for pruning of large data sets
Alternative Approach
Complexity Analysis
Based on statistical distribution of various substitution patterns found in databases of existing drugs and available starting materials.
Molecular Complexity Analysis of de Novo Designed Ligands. Krisztina Boda and A. Peter Johnson, J. Med. Chem.; 2006; ASAP Web Release Date: 26-Jan-2006
Assumption
Complexity analysis based on statistical distribution of various substitution patterns
If a molecular structure contains ring and chain substitution patterns which are common in
• existing drugs, then the structure is likely to be “drug-like” as well as readily synthesisable
• available starting materials, then the structure is likely to be readily synthesisable
Building Complexity Database
[Diagram: an Input structure is decomposed into chain patterns and ring/ring substitution patterns]
Enumerate chain patterns
• 1-centred
• 2-centred
• 3-centred
• 4-centred
→ Database of chains
Enumerate ring/ring substitution patterns
→ Database of rings/ring substitutions
Atom Substitution Hierarchy
Ring (and chain) substitutions are organised in hierarchies
[Figure: example hierarchy for a ring topology, from the generic atom (A) pattern with sp2/sp3 labels down to specific N, O, S, F, Cl and Br substitutions, each node annotated with its number of occurrences]
The hierarchy stores:
• Atom type sequence
• Number of occurrences
• Binding properties
Total occurrences of the topology: 11,801
Ligand Complexity Analysis
1. Enumerate ring and chain patterns
2. Generate canonical names for each atom pattern (Canonical name: A, Canonical name: B, Canonical name: C, [More Patterns])
3. Match canonical name against the hierarchy roots of the database
4. Retrieval of frequency of occurrences → Calculate score
5. Rank structures by complexity score
DATABASE of hierarchies + frequency of occurrences
Speed of Complexity Analysis: ~1000-1200 structures/minute on Linux PC (3 GHz)
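Calculation of Complexity Score
CONCEPT
Penalise atom patterns which are infrequent or not present in the complexity database.
In SPROUT the complexity analysis is followed by ranking the putative ligands according to their evaluated complexity score.
\[
\mathrm{SCORE}_{\mathrm{TOTAL}} = \frac{\sum \left( \mathrm{SCORE}_{\mathrm{TOPOLOGY}} + \mathrm{SCORE}_{\mathrm{ATOM\ SUBS.}} \right)}{\text{Num of Patterns}} + P_{\mathrm{stereo}} + P_{\mathrm{rotbond}}
\]
\[
\mathrm{SCORE}_{\mathrm{TOPOLOGY}} =
\begin{cases}
\mathrm{Penalty} \cdot \left( 1 - \dfrac{\ln(\text{No. occur. matched topology})}{\ln(\text{No. occur. most common topology})} \right), & \text{if topology exists} \\[2ex]
2 \cdot \mathrm{Penalty}, & \text{if topology missing from database}
\end{cases}
\]
\[
\mathrm{SCORE}_{\mathrm{ATOM\ SUBS.}} =
\begin{cases}
\mathrm{Penalty} \cdot \left( 1 - \dfrac{\ln(\text{No. occur. best matched atom subs.})}{\ln(\text{No. occur. most common atom subs.})} \right), & \text{if matching atom subs. exists} \\[2ex]
2 \cdot \mathrm{Penalty}, & \text{if topology or atom subs. missing}
\end{cases}
\]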
Penalty values can be altered to tailor the system for different applications.
The penalty values used in the examples presented here are 25, 20, 15, 10 for 1-,2-,3- and 4-centred chain patterns, 40 and 30 for rings and ring substitutions.
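The per-pattern scoring rule can be sketched in a few lines. This is a minimal illustration that assumes the rule is penalty × (1 − ln(matched count) / ln(most common count)) when the pattern is found and 2 × penalty when it is missing; the function name, the dictionary keys and the occurrence counts in the usage lines are hypothetical, while the penalty values are the ones quoted above.

```python
import math
from typing import Optional

# Penalty values quoted on the slide; the key names are illustrative only.
PENALTIES = {
    "chain_1_centred": 25.0,
    "chain_2_centred": 20.0,
    "chain_3_centred": 15.0,
    "chain_4_centred": 10.0,
    "ring": 40.0,
    "ring_substitution": 30.0,
}


def pattern_score(penalty: float, matched: Optional[int], most_common: int) -> float:
    """Score one pattern: 0 for the most common pattern, 2 * penalty if missing."""
    if most_common <= 1:
        raise ValueError("most_common must be greater than 1")
    if not matched:  # None or 0 occurrences: pattern not in the database
        return 2.0 * penalty
    return penalty * (1.0 - math.log(matched) / math.log(most_common))


if __name__ == "__main__":
    ring = PENALTIES["ring"]
    print(pattern_score(ring, 1000, 1000))  # most common pattern
    print(pattern_score(ring, 10, 1000))    # infrequent pattern
    print(pattern_score(ring, None, 1000))  # pattern missing from the database
```

The logarithm compresses the wide range of occurrence counts, so a pattern seen ten times is penalised noticeably but far less than one that has never been seen.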
Validation Experiment: Comparison with CAESA
Both methods used to estimate synthetic accessibility for the same set of 50 top selling drugs
[Figure: CAESA vs. Complexity Analysis. Complexity analysis score (y-axis, 0.0 to 110.0) plotted against CAESA prediction of synthetic accessibility (x-axis, 0% to 100%)]
Elapsed time:
CAESA: 703 sec
Complexity Analysis: 8 sec
Complexity scores are calculated using the complexity database derived from available SMs + 2.0 penalty for each identified stereo centre in the structures.
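The reported timings can be put on a per-structure basis with a short calculation. The figures (703 sec, 8 sec, 50 drugs) come from the slide; the helper names are hypothetical.

```python
def speedup(slow_seconds: float, fast_seconds: float) -> float:
    """How many times faster the second method is than the first."""
    return slow_seconds / fast_seconds


def seconds_per_structure(total_seconds: float, n_structures: int) -> float:
    """Average time spent on one structure."""
    return total_seconds / n_structures


if __name__ == "__main__":
    print(speedup(703, 8))                 # ratio of CAESA to complexity analysis time
    print(seconds_per_structure(703, 50))  # CAESA, seconds per drug
    print(seconds_per_structure(8, 50))    # complexity analysis, seconds per drug
```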
Complexity Analysis vs CAESA
More suitable for prioritization of thousands of structures within a reasonable time frame.
Provides acceptable compromise between the speed of the analysis and the accuracy of calculated scores.
Because this approach is based on characteristics of existing readily available compounds, simple but novel structural features may be wrongly identified as complex
Yet another alternative approach
Build synthetic feasibility into the structure generation process
SynSPROUT Approach
Ease of synthesis is a key factor in drug development
Build synthetic constraints into structure generation process
SynSPROUT Scheme
[Diagram: Classic SPROUT joins fragments from a Fragment Library by fuse, spiro and new bond operations. SynSPROUT combines a pool of readily available starting materials with reliable high yielding reactions from a Synthetic Knowledge Base to give readily synthesisable putative ligand structures: VIRTUAL SYNTHESIS IN RECEPTOR CAVITY]
Built in / user defined reactions:
• Amide formation
• Ether formation
• Ester formation
• Amine alkylation
• Reductive amination
• etc.
Current Status
• Promising structures with estimated high binding affinity
• SynSPROUT provides the equivalent to screening a large number of combinatorial libraries
• Potential for suggesting starting points for new combinatorial libraries
• Combination of a large starting material library with a large reaction knowledgebase causes a combinatorial problem, even with parallel processing
• Restricting either size of library or number of synthetic reactions gives acceptable run times
De Novo Structure Generation vs. Lead Optimization
De Novo Structure Generation
• AIM: To generate diverse putative ligands from scratch
• No structural information from any existing bound ligand is utilised
Lead Optimization
• AIM: To suggest better ligands structurally similar to the bound one
• The structure of a good bound ligand provides a starting point (core)
Variations on the SynSPROUT Theme
SPROUT LeadOpt
Two modes for structure based lead optimisation
• Core Extension: extends core structure (derived from lead) by virtual synthetic chemistry
• Monomer Replacement: replaces monomers which have been identified by retrosynthetic analysis of a lead compound
Core Extension
• Import the modified bound ligand (core) + identify substitution points (functional groups)
• Generate core + monomer product by performing virtual synthetic reaction(s) at selected functional groups
• Estimate binding affinity for products
Synthetic Knowledge Base: list of reactions (between functional groups)
Core Extension Scheme
General Scheme
[Diagram: the Core Structure and a Monomer Library (multiple low energy conformers + detected functional groups) are combined by simulating the synthetic reaction in the 3D context of the receptor site, giving CORE products carrying monomers R11 to R13, R21 to R23 and R31 to R33]
All possible core + monomer combinations are generated
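Generating every core + monomer combination amounts to a Cartesian product over the extension points. The sketch below assumes three extension points with three monomers each, reading the diagram labels R11 to R33 as "site, monomer" indices; that reading, the site names and `enumerate_products` are assumptions for illustration.

```python
from itertools import product
from typing import Dict, Iterator, List, Tuple


def enumerate_products(core: str,
                       monomers_by_site: Dict[str, List[str]]
                       ) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Yield (core, {site: monomer}) for every combination of monomers."""
    sites = sorted(monomers_by_site)
    for combo in product(*(monomers_by_site[site] for site in sites)):
        yield core, dict(zip(sites, combo))


# Hypothetical library: three extension points, three monomers each.
LIBRARY = {
    "site1": ["R11", "R12", "R13"],
    "site2": ["R21", "R22", "R23"],
    "site3": ["R31", "R32", "R33"],
}

if __name__ == "__main__":
    products = list(enumerate_products("CORE", LIBRARY))
    print(len(products))
    print(products[0])
```

The number of products is the product of the library sizes at each site, which is why multiple extension points quickly become a combinatorial problem.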
Automatic Monomer Library Generation
[Diagram: SDF file of 3D monomers → Atom & Ring Perception → Detect Functional Groups (joining points) → Monomer Library (multiple low energy conformers + detected functional groups)]
Perception Knowledge Base
• Aromaticity
• Normalisation
• Hybridisation
• H-bonding properties
Synthetic Knowledge Base
• Synthetic rules
• Functional Groups
…
CHEMICAL-LABEL <Carboxylic Acid>
C[SPCENTRE=2](=O)-O[HS=1]
CHEMICAL-LABEL <Primary Amine>
C-N[HS=2];[CONNECTION=1]
Synthetic Knowledge Base
Joining Rules
• Steps of formation
• Hybridization changes
• Bond type
• Bond length
• Dihedral penalty/angle
EXPLANATION Amide Formation
IF Carboxylic Acid INTER Primary Amine
THEN
  delete-atom 3
  change-hybridization 5 to SP2
  form-bond - between 1 and 5
  DIHEDRAL-ATOMS 2 1 5 4
  DIHEDRAL 0 0
  BOND-LENGTH 1.35
END-THEN
[Diagram: the two reacting groups with atoms numbered 1 to 5]
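To show how a text rule like the amide formation entry could be turned into structured data, here is a small parser sketch. The keywords are taken from the slide, but the line layout of `RULE_TEXT`, the `JoiningRule` class and `parse_rule` are assumptions for illustration; this is not SPROUT's actual reader.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Rule text as on the slide, one statement per line (layout assumed).
RULE_TEXT = """\
EXPLANATION Amide Formation
IF Carboxylic Acid INTER Primary Amine
THEN
delete-atom 3
change-hybridization 5 to SP2
form-bond - between 1 and 5
DIHEDRAL-ATOMS 2 1 5 4
DIHEDRAL 0 0
BOND-LENGTH 1.35
END-THEN
"""


@dataclass
class JoiningRule:
    name: str = ""
    groups: Tuple[str, str] = ("", "")
    deleted_atoms: List[int] = field(default_factory=list)
    new_bond: Optional[Tuple[int, int]] = None
    dihedral_atoms: Tuple[int, ...] = ()
    bond_length: Optional[float] = None


def parse_rule(text: str) -> JoiningRule:
    """Parse one joining rule; unrecognised lines are ignored."""
    rule = JoiningRule()
    for line in text.splitlines():
        words = line.split()
        if not words:
            continue
        key = words[0]
        if key == "EXPLANATION":
            rule.name = " ".join(words[1:])
        elif key == "IF":
            left, right = " ".join(words[1:]).split(" INTER ")
            rule.groups = (left, right)
        elif key == "delete-atom":
            rule.deleted_atoms.append(int(words[1]))
        elif key == "form-bond":
            # e.g. "form-bond - between 1 and 5"
            rule.new_bond = (int(words[3]), int(words[5]))
        elif key == "DIHEDRAL-ATOMS":
            rule.dihedral_atoms = tuple(int(w) for w in words[1:])
        elif key == "BOND-LENGTH":
            rule.bond_length = float(words[1])
    return rule


if __name__ == "__main__":
    print(parse_rule(RULE_TEXT))
```

Keeping the rules as editable text and parsing them at start-up is what lets chemists add reactions without touching the program itself.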
Importing the Core Structure (from MOL/PDB file in Elephant module)
When importing from a PDB file, a pdb→mol converter is invoked
Functional group(s) are automatically detected when the core structure is imported into the system
Hydrogen donor/acceptor or spherical target sites anchor the imported core structure inside the receptor cavity, partially restricting the displacement of the core during lead optimization, but allowing slight movements in order to avoid boundary violations.
Product Generation I.
Step I. Generate products by mimicking synthetic reactions between core + monomers
[Diagram: Core with substituents R1 and R2; reactions labelled Sulphonamide Formation and Amide Formation]
Product Generation II.
Step II. Ligand flexibility = generate multiple low energy conformers
Primary monomer conformers generated by
(a) CORINA + ROTATE
(b) sampling discrete dihedral angles around formed bonds
Secondary conformers generated by twisting about rotatable bonds of the low energy monomer conformers
User defined parameters:
• Max deviation
• Sampling of dihedral angles
• Max penalty
Rigid body docking
[Diagram: Core with R1 and R2]
Product Generation III.
Step III. Docking + rejection of conformers with
• High internal energy
• Boundary violation
Multiple Extension Points: Combinatorial Problem
• Clients-Master-Slaves architecture
• Mixed SGI/Linux cluster network (TCP/IP socket network communication)
[Diagram: Client 1, Client 2, Client 3, … connect to the Master, which distributes work to Slave 1, Slave 2, Slave 3, … on Linux and SGI machines; each slave holds a CORE with R1, R2 and R3]
Each slave performs optimization on a different core + monomer combination
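The work distribution can be sketched with a local worker pool. This is only an analogy: a thread pool stands in for the TCP/IP socket cluster, and `optimise` returns a toy score in place of real docking and optimisation; both function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from typing import List, Tuple

Combo = Tuple[str, str, str]


def optimise(combo: Combo) -> Tuple[Combo, float]:
    """Placeholder for per-combination optimisation; returns a toy score."""
    _core, r1, r2 = combo
    return combo, -float(len(r1) + len(r2))


def run_master(core: str, r1_library: List[str], r2_library: List[str],
               n_workers: int = 4) -> List[Tuple[Combo, float]]:
    """Hand each core + monomer combination to a worker and rank the results."""
    jobs = [(core, r1, r2) for r1, r2 in product(r1_library, r2_library)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(optimise, jobs))
    # Lower (more negative) score first, following the SPROUT score convention.
    return sorted(results, key=lambda item: item[1])


if __name__ == "__main__":
    for combo, score in run_master("CORE", ["a", "bb"], ["c", "ddd"]):
        print(combo, score)
```

Because each combination is optimised independently, the jobs need no communication with each other, which is what makes the problem easy to spread over many processors.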
Case Study (CDK2)
PDB: 1KE8
[Diagram: ligand structure divided into CORE, R1 and R2]
Monomer Reagent Library Generation
Maybridge & Aldrich (~140,000) 2D structures
Applied filters
• At least one of the following functional groups: Carboxylic Acid, Primary Amine, Primary Alkyl Halide, Carbonyl
• Number of heavy atoms ≥ 8
• Number of heavy atoms ≤ 16
• Number of acceptor atoms ≤ 5
• Number of donor atoms ≤ 3
• Number of rotatable bonds ≤ 2
• Max chain length ≤ 3
• Allowed atom types: H, B, C, N, O, F, S, Cl, Br
• Number of rings ≤ 3
• Stereo centres ≤ 1
• No 3-, 4-, 7-, 8- or 9-membered ring
→ 1171 2D structures
CORINA, ROTATE
→ 4557 3D conformers
Monomer Library
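The filters listed for the monomer reagent library can be written as a single predicate. The sketch assumes the descriptors have already been computed by some cheminformatics tool; the `Monomer` class, its field names and `passes_filters` are hypothetical, while the thresholds are the ones from the slide.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, Tuple

ALLOWED_ELEMENTS = {"H", "B", "C", "N", "O", "F", "S", "Cl", "Br"}
FORBIDDEN_RING_SIZES = {3, 4, 7, 8, 9}
REQUIRED_GROUPS = {"Carboxylic Acid", "Primary Amine",
                   "Primary Alkyl Halide", "Carbonyl"}


@dataclass(frozen=True)
class Monomer:
    """Precomputed descriptors for one candidate monomer (assumed inputs)."""
    heavy_atoms: int
    acceptors: int
    donors: int
    rotatable_bonds: int
    max_chain_length: int
    elements: FrozenSet[str]
    ring_sizes: Tuple[int, ...]
    stereo_centres: int
    functional_groups: FrozenSet[str]


def passes_filters(m: Monomer) -> bool:
    """Apply the slide's filters; all conditions must hold."""
    return (
        8 <= m.heavy_atoms <= 16
        and m.acceptors <= 5
        and m.donors <= 3
        and m.rotatable_bonds <= 2
        and m.max_chain_length <= 3
        and m.elements <= ALLOWED_ELEMENTS
        and len(m.ring_sizes) <= 3
        and not (set(m.ring_sizes) & FORBIDDEN_RING_SIZES)
        and m.stereo_centres <= 1
        and bool(m.functional_groups & REQUIRED_GROUPS)
    )


# Illustrative monomer: a small aromatic primary amine.
EXAMPLE = Monomer(heavy_atoms=10, acceptors=2, donors=1, rotatable_bonds=1,
                  max_chain_length=2, elements=frozenset({"C", "H", "N"}),
                  ring_sizes=(6,), stereo_centres=0,
                  functional_groups=frozenset({"Primary Amine"}))

if __name__ == "__main__":
    print(passes_filters(EXAMPLE))
    print(passes_filters(replace(EXAMPLE, ring_sizes=(7,))))
```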
Case Study (CDK2)
[Diagram: CORE with R1 and R2]
Sulphonyl chloride reacts with
• Primary amine in sulphonamide formation
Primary amine reacts with
• Carboxylic acid in amide reaction
• Primary alkyl halide in amine alkylation reaction
• Carbonyl in reductive amination and imine formation
Case Study (CDK2)
[Diagram: CORE with R1 and R2]
R1 Monomer Library
• 523 Primary Amine
R2 Monomer Library
• 293 Carboxylic Acid
• 93 Primary Alkyl Halide
• 393 Carbonyl
R1 x R2 = 432,345 combinations
Results
Elapsed time ~ 5 Hours (with 100 slave processors)
R1 + Core + R2 combinations:
• Screened 81.23%
• Failed 4.87%
• Accepted 13.90% (54,123)
[Figure: histogram of the number of generated products against estimated binding affinity, with bins -4 to -5, -5 to -6, -6 to -7 and -7 to -8; bar counts shown are 2549, 25913, 1014 and 24646 (bin assignment not preserved)]
Case Study (Generated Products)
[Figure: eight generated product structures labelled with estimated binding affinities -7.95, -7.82, -7.75, -7.60, -7.47, -7.56, -7.45 and -7.07]
Monomer Replacement
• Many lead compounds are composed of readily available starting materials (monomers) linked by reliable high yielding reactions
• Retrosynthetic analysis can be used to identify the monomers
• Structurally related analogues could be generated by exhaustive monomer replacement
• Considerable efficiency gains if monomer library is arranged in a hierarchy based on substructural relationships
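The efficiency gain from a substructure hierarchy comes from pruning: if a node is not a substructure of the query, none of its descendants needs to be tested. The sketch below uses sets of feature labels as a stand-in for molecules and the subset relation as a stand-in for real substructure matching; the `Node` class, `find_monomer` and the example hierarchy are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional


@dataclass
class Node:
    name: str
    features: FrozenSet[str]
    children: List["Node"] = field(default_factory=list)  # superstructures


def find_monomer(nodes: List[Node], query: FrozenSet[str],
                 tested: List[str]) -> Optional[Node]:
    """Find the node equal to the query, descending only through its substructures."""
    for node in nodes:
        tested.append(node.name)  # record every comparison actually made
        if node.features == query:
            return node
        if node.features < query:  # node is a proper substructure of the query
            hit = find_monomer(node.children, query, tested)
            if hit is not None:
                return hit
    return None


# Toy hierarchy: each child contains everything its parent contains.
chloroaniline = Node("chloroaniline", frozenset({"benzene", "NH2", "Cl"}))
aniline = Node("aniline", frozenset({"benzene", "NH2"}), [chloroaniline])
benzene = Node("benzene", frozenset({"benzene"}), [aniline])
aminopyridine = Node("aminopyridine", frozenset({"pyridine", "NH2"}))
pyridine = Node("pyridine", frozenset({"pyridine"}), [aminopyridine])
ROOTS = [benzene, pyridine]

if __name__ == "__main__":
    tested: List[str] = []
    hit = find_monomer(ROOTS, frozenset({"pyridine", "NH2"}), tested)
    print(hit.name if hit else None, tested)
```

In this toy case the whole benzene branch is skipped after a single failed comparison at its root, so three comparisons are made instead of five.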
Hierarchy Construction
[Diagram: Amide monomer structures compared with one another and classed as Substructure, Superstructure or No overlap]
Hierarchy Usage
[Diagram: Amide hierarchy of monomer structures]
Monomer Replacement
[Diagram: Retro-synthetic analysis splits the lead structure into monomers. Do they exist in starting materials HIERARCHY? A set of analogue structures is shown]
CASE STUDY: Optimisation of SPROUT designed inhibitors of P. falciparum Dihydro-orotate Dehydrogenase using Monomer Replacement
Initial lead compound MD-155
SPROUT score -7.88
Retrosynthetic analysis finds amide formation and Ullmann/Suzuki reaction for monomer formation
Monomer library: aryl halides and p-halo-anilines
2D structures: 1923
conformations: 26916
High scoring monomer replacement results
Monomer replacement gave 840 new structures (including multiple conformers of the same structure)
Scores: -7.50 to -9.30.
Experimental Results for Some Ligands Suggested by SPROUT LeadOpt Monomer Replacement
Starting Point: MD-155
• PfDHODH Ki 3.0 µM
• HsDHODH Ki 11.0 nM
MD-204
• PfDHODH Ki 733 nM
• HsDHODH Ki 21.0 nM
• 4 fold enhancement in Ki for PfDHODH
MD-213
• PfDHODH Ki 478 nM
• HsDHODH Ki 21.7 nM
• 6 fold enhancement in Ki for PfDHODH
[Figure: structures of MD-155, MD-204 and MD-213]
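The quoted fold enhancements can be checked with a short calculation. It takes the MD-155 PfDHODH Ki as 3.0 µM (3000 nM), the value implied by the stated 4 and 6 fold enhancements; the helper name is hypothetical.

```python
def fold_enhancement(ki_start_nm: float, ki_new_nm: float) -> float:
    """Ratio of starting Ki to new Ki; values above 1 mean tighter binding."""
    return ki_start_nm / ki_new_nm


MD155_PF_KI_NM = 3.0 * 1000  # 3.0 µM expressed in nM (assumed unit)
MD204_PF_KI_NM = 733.0
MD213_PF_KI_NM = 478.0

if __name__ == "__main__":
    print(round(fold_enhancement(MD155_PF_KI_NM, MD204_PF_KI_NM), 1))
    print(round(fold_enhancement(MD155_PF_KI_NM, MD213_PF_KI_NM), 1))
```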
Conclusions
Scoring functions for assessment of binding affinity of the hypothetical compounds produced by de novo design are far from perfect
Hence only readily synthesisable putative ligands will undergo experimental evaluation by medicinal chemists
Assessment of synthetic feasibility is a tractable problem
Acknowledgements
Matt Davies, Phil Bone and Timo Heikkala for experimental work
Molecular Networks GmbH for providing CORINA & ROTATE
MDL for providing MDDR, one of the databases used in the complexity analysis project
for sponsoring the lead optimization project