De Novo Design Tools for the Generation of Synthetically Accessible Ligands

ICAMS

Peter Johnson, Krisztina Boda, Shane Weaver, Aniko Valko, Vilmos Valko

Receptor Structure Based Drug Design

Objective:

To suggest potential leads that
• bind strongly to a given protein because of shape and electrostatic complementarity
• are easy to synthesise

Approaches:

• Docking methods (preferably flexible docking) identify new lead structures by rapidly screening a database of 3-D structures of known compounds
• De novo design methods (such as SPROUT) construct a diverse set of entirely novel potential leads from scratch

SPROUT Components

• Detects potential binding pockets of the protein structures
• Identifies favourable interaction sites (H-bonding, hydrophobic, covalent, metal, user defined)
• Docks structures to target interaction sites
• Generates 3D molecular structures of novel ligands by linking the docked starting fragments together in an incremental construction scheme
• Scores, sorts and clusters the solutions

Problem with Large Answer Sets

De novo design programs such as SPROUT can suggest large sets of entirely novel potential leads.

Powerful heuristics are necessary to evaluate (and reduce) the often large answer sets:

• Binding Affinity Score: eliminate candidates with poor estimated binding affinity
• Synthetic Feasibility: eliminate candidates with complex molecular structures

For de novo design, prediction of synthetic accessibility is equally important.

Hypothetical ligands, including those predicted to bind very strongly, have no practical value unless they can be readily synthesised.

Our Attempts to Provide Solutions:

CAESA (estimates synthetic accessibility)

Complexity Analysis (estimates structural complexity and drug-likeness)

SynSPROUT (avoids the problem by building constraints into the structure generation process)

CAESA: Computer Assisted Estimation of Synthetic Accessibility

Glenn Myatt, Jon Baber

Goals of CAESA Project

Clear need for automated method of ranking hypothetical compounds according to perceived ease of synthesis

Good synthetic chemists can do this job themselves on small number of compounds but are unwilling to do it for hundreds or thousands of compounds

CAESA attempts to do the same job but never gets bored!

Estimation of Synthetic Accessibility: Criteria used by CAESA

CAESA scores the synthetic accessibility of structures using two main criteria:

a) An estimate of structural complexity:
• stereocentres
• complex topological features (fusions etc.)
• functional group complexity

b) Availability of good starting materials:
• rapid retrosynthetic analysis
• database of commercially available materials
• reaction rule base (editable)

CAESA Components

[Figure: flow diagram with boxes labelled Target Structure, Automatic Selection of Starting Materials, Selected Starting Materials, Complexity Knowledge Base, Identify Synthetic Molecular Complexity, and Detailed Information on the Structure's Complexity]

Starting Materials and Synthetic Accessibility

Availability of suitable starting materials is a very important factor: good starting materials can dramatically reduce the difficulty of synthesising a compound.

Good starting materials for part of the target molecule means the analysis of structural synthetic difficulty or complexity can be directed to just those portions of the target molecule that cannot be made from available starting materials

Finding good starting materials through retrosynthetic analysis also provides possible synthetic routes as a byproduct

Traditional Retrosynthetic Analysis

[Figure: retrosynthetic tree from the TARGET through Level 1, Level 2 and Level 3 precursors; one precursor is identified as an available starting material]

In the absence of frequent user intervention, the combinatorial nature of this approach prohibits very broad searches (many alternative reactions) as well as very deep searches (many steps between starting materials and products)
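The growth of a retrosynthetic tree with breadth and depth can be illustrated with a small calculation. This is only a sketch: it assumes every structure has the same number of alternative disconnections, and the function name and the example numbers are hypothetical rather than taken from CAESA.

```python
def precursors_generated(branching: int, depth: int) -> int:
    """Number of precursors in a full retrosynthetic tree.

    branching: alternative reactions considered per structure (breadth)
    depth:     number of precursor levels explored (steps back from the target)
    Assumes, for illustration only, a uniform branching factor.
    """
    return sum(branching ** level for level in range(1, depth + 1))


# Broad and deep searches quickly become impractical.
for branching, depth in [(5, 3), (20, 3), (20, 6)]:
    total = precursors_generated(branching, depth)
    print(f"branching={branching:>2} depth={depth}: {total:,} precursors")
```

With these illustrative numbers, widening the search from 5 to 20 reactions per structure at three levels raises the count from 155 to 8,420 precursors, and deepening it to six levels gives tens of millions.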

Bidirectional Search for Synthetic Routes

[Figure: bidirectional search linking the Target and the Database of Starting Materials. Labels: Retrosynthetic on-line Generation of Precursors; Synthetic off-line generation of virtual SMs; MT = metal or metalloid]
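The idea of a bidirectional search can be sketched with a toy example: starting materials are expanded in the synthetic direction ahead of time, the target is expanded in the retrosynthetic direction at query time, and a route exists wherever the two sets meet. The structure names, the transformation tables and the `expand` helper below are all hypothetical stand-ins, not CAESA's data or code.

```python
def expand(structures, steps, depth):
    """Breadth-first expansion: every structure reachable within `depth` steps."""
    seen = set(structures)
    frontier = set(structures)
    for _ in range(depth):
        frontier = {nxt for s in frontier for nxt in steps.get(s, ())} - seen
        seen |= frontier
    return seen


# Toy chemistry: names stand in for real structures.
forward_steps = {"SM1": ["V1"], "SM2": ["V2"], "V1": ["V3"]}          # synthetic direction
retro_steps = {"TARGET": ["P1", "P2"], "P1": ["V3"], "P2": ["P4"]}   # retrosynthetic direction

virtual_sms = expand({"SM1", "SM2"}, forward_steps, depth=2)  # done once, off-line
precursors = expand({"TARGET"}, retro_steps, depth=2)         # done per target, on-line

meeting_points = virtual_sms & precursors
print(sorted(meeting_points))
```

Each side only has to search to half the total route length, which is what keeps the combinatorial growth manageable in this kind of scheme.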

Example of Starting Material Selection

[Figure: worked example with panels labelled Target Structure, Starting Materials Selected, Complexity Analysis and Residual Complexity. Annotation: adjacent stereocentres on ring, relatively easy to control stereochemistry]

Summary of CAESA Features

CAESA carries out a retrosynthetic analysis which terminates when a starting material from a database (such as ACD) is found

Found starting materials are scored according to length and difficulty of reaction sequence and coverage of target compound

All chemistry rules and transformations are described in editable text knowledge bases easily modified by chemists

Quality of the analysis depends on the chemistry included in the knowledge bases and the comprehensiveness of the starting material libraries

But CAESA is relatively slow, and speedier methods are needed for pruning large data sets

Alternative Approach

Complexity Analysis

Based on statistical distribution of various substitution patterns found in databases of existing drugs and available starting materials.

Molecular Complexity Analysis of de Novo Designed Ligands. Krisztina Boda and A. Peter Johnson, J. Med. Chem. 2006; ASAP Web Release Date: 26-Jan-2006

Complexity analysis based on statistical distribution of various substitution patterns

Assumption

If a molecular structure contains ring and chain substitution patterns which are common in
• existing drugs, then the structure is likely to be “drug-like” as well as readily synthesisable
• available starting materials, then the structure is likely to be readily synthesisable

Building Complexity Database

Input structure

Enumerate chain patterns:
• 1-centred
• 2-centred
• 3-centred
• 4-centred

Stored in: Database of chains

Enumerate ring/ring substitution patterns

Stored in: Database of rings/ring substitutions
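A minimal sketch of how chain patterns might be enumerated and counted is given here. It assumes, purely for illustration, that an n-centred chain pattern can be approximated by a path of n connected atoms named by its element sequence; the authors' actual pattern definition is likely richer. The function name and the toy molecule are hypothetical.

```python
from collections import Counter


def chain_patterns(atoms, bonds, max_len=4):
    """Count linear atom paths of 1..max_len atoms, keyed by element sequence.

    atoms: dict mapping atom index -> element symbol
    bonds: iterable of (index, index) pairs
    """
    neighbours = {i: set() for i in atoms}
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)

    counts = Counter()

    def extend(path):
        names = tuple(atoms[i] for i in path)
        # Each path is walked once from either end; count only one direction.
        if tuple(path) <= tuple(reversed(path)):
            # Use the smaller of the two readings as a direction-independent key.
            counts[min(names, names[::-1])] += 1
        if len(path) < max_len:
            for nxt in neighbours[path[-1]]:
                if nxt not in path:
                    extend(path + [nxt])

    for start in atoms:
        extend([start])
    return counts


# Toy input structure: an N-C-C-O chain (hydrogens omitted).
counts = chain_patterns({0: "N", 1: "C", 2: "C", 3: "O"}, [(0, 1), (1, 2), (2, 3)])
for pattern, n in sorted(counts.items()):
    print("-".join(pattern), n)
```

Running the same enumeration over a whole collection of drugs or starting materials and summing the counters would give the occurrence frequencies that a database of chains needs.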

Atom Substitution Hierarchy

Ring (and chain) substitutions are organised in hierarchies

[Figure: example hierarchy of nitrogen-substituted patterns (N, NN, NNN) with occurrence counts 1586, 420, 83 and 21]

The hierarchy stores:
• Atom type sequence
• Number of occurrences
• Binding properties

Total occurrences of the topology: 11,801

Ligand Complexity Analysis

1. Enumerate ring and chain patterns

2. Generate canonical names for each atom pattern (Canonical name: A, Canonical name: B, Canonical name: C, [More Patterns])

3. Match canonical name against the hierarchy roots of the database (DATABASE of hierarchies + frequency of occurrences)

4. Retrieval of frequency of occurrences → Calculate score

5. Rank structures by complexity score

Speed of Complexity Analysis: ~ 1000-1200 structures / minute on Linux PC (3 GHz)

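Calculation of Complexity Score

Penalise atom patterns which are infrequent or not present in the complexity database.

CONCEPT: the overall SCORE combines SCORE_TOPOLOGY and SCORE_ATOM SUBS. over the enumerated patterns (Num of Patterns), together with the penalties P_stereo and P_rotbond.

$$
\mathrm{SCORE}_{\mathrm{TOPOLOGY}} =
\begin{cases}
\left(1 - \dfrac{\ln(\text{No. occur. matched topology})}{\ln(\text{No. occur. most common topology})}\right) \cdot \mathrm{Penalty}, & \text{if topology exists} \\[2ex]
\mathrm{Penalty}, & \text{if topology missing from database}
\end{cases}
$$

$$
\mathrm{SCORE}_{\mathrm{ATOM\ SUBS.}} =
\begin{cases}
\left(1 - \dfrac{\ln(\text{No. occur. best matched atom subs.})}{\ln(\text{No. occur. most common atom subs.})}\right) \cdot \mathrm{Penalty}, & \text{if matching atom subs. exists} \\[2ex]
\mathrm{Penalty}, & \text{if topology or atom subs. missing}
\end{cases}
$$

In SPROUT the complexity analysis is followed by ranking the putative ligands according to their evaluated complexity score.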

Penalty values can be altered to tailor the system for different applications.

The penalty values used in the examples presented here are 25, 20, 15, 10 for 1-,2-,3- and 4-centred chain patterns, 40 and 30 for rings and ring substitutions.
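The per-pattern penalty can be sketched in a few lines. This assumes the score for one pattern is (1 - ln(matched occurrences) / ln(occurrences of the most common pattern)) times the penalty, with the full penalty applied when the pattern is missing; the function name, the dictionary keys and the occurrence counts in the example are illustrative, and only the penalty values come from the slide.

```python
import math


def pattern_score(matched_count, most_common_count, penalty):
    """Penalty-scaled rarity of one pattern; the full penalty if it is missing.

    Assumes most_common_count > 1 so that its logarithm is non-zero.
    """
    if matched_count <= 0:
        return penalty  # pattern not present in the complexity database
    rarity = 1.0 - math.log(matched_count) / math.log(most_common_count)
    return rarity * penalty


# Penalty values quoted above; the keys are hypothetical labels.
PENALTY = {"chain1": 25, "chain2": 20, "chain3": 15, "chain4": 10,
           "ring": 40, "ring_subst": 30}

print(pattern_score(1586, 1586, PENALTY["ring"]))        # most common pattern
print(pattern_score(21, 1586, PENALTY["ring_subst"]))    # rare pattern
print(pattern_score(0, 1586, PENALTY["chain1"]))         # missing pattern
```

The logarithm compresses the wide range of occurrence counts, so the most common pattern scores zero, a rare one scores part of the penalty, and an unseen one scores all of it.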

Validation Experiment: Comparison with CAESA

Both methods used to estimate synthetic accessibility for the same set of 50 top selling drugs

[Figure: CAESA vs. Complexity Analysis; axis: CAESA prediction of synthetic accessibility, 0% to 100%]

Elapsed time:
• CAESA: 703 sec
• Complexity Analysis: 8 sec

Complexity scores are calculated using the complexity database derived from available SMs + 2.0 penalty for each identified stereo centre in the structures.

Complexity Analysis vs CAESA

More suitable for prioritization of thousands of structures within a reasonable time frame.

Provides acceptable compromise between the speed of the analysis and the accuracy of calculated scores.

Because this approach is based on characteristics of existing readily available compounds, simple but novel structural features may be wrongly identified as complex

Yet another alternative approach

Build synthetic feasibility into the structure generation process

SynSPROUT Approach

Ease of synthesis is a key factor in drug development

Build synthetic constraints into structure generation process

[Figure: Classic SPROUT compared with SynSPROUT. Labels: Fragment Library; new bond; Pool of readily available starting materials; Synthetic Knowledge Base; Reliable high yielding reactions; Readily synthesisable putative ligand structures]

Built in / user defined reactions:
• Amide formation
• Ether formation
• Ester formation
• Amine alkylation
• Reductive amination
• etc.

VIRTUAL SYNTHESIS IN RECEPTOR CAVITY

SynSPROUT Scheme

Current Status

• Promising structures with estimated high binding affinity
• SynSPROUT provides the equivalent to screening a large number of combinatorial libraries
• Potential for suggesting starting points for new combinatorial libraries
• Combination of a large starting material library with a large reaction knowledge base causes a combinatorial problem, even with parallel processing
• Restricting either size of library or number of synthetic reactions gives acceptable run times

De Novo Structure Generation vs. Lead Optimization

De Novo Structure Generation
• No structural information from any existing bound ligand is utilised
• AIM: To generate diverse putative ligands from scratch

Lead Optimization
• The structure of a good bound ligand provides a starting point (core)
• AIM: To suggest better ligands structurally similar to the bound one

Variations on the SynSPROUT Theme

SPROUT LeadOpt

Two modes for structure based lead optimisation:
• Core Extension: extends core structure (derived from lead) by virtual synthetic chemistry
• Monomer Replacement: replaces monomers which have been identified by retrosynthetic analysis of a lead compound

Core Extension

• Import the modified bound ligand (core) + identify substitution points (functional groups)
• Generate core + monomer product by performing virtual synthetic reaction(s) at selected functional groups
• Estimate binding affinity for products

Synthetic Knowledge Base: list of reactions (between functional groups)

Core Extension Scheme

General Scheme: all possible core + monomer combinations are generated

[Figure: Core Structure (CORE) with candidate substituents R11-R13, R21-R23 and R31-R33 taken from the Monomer Library (multiple low energy conformers + detected functional groups). Caption: simulate synthetic reaction in the 3D context of receptor site]

Automatic Monomer Library Generation

Input: SDF file of 3D monomers

Atom & Ring Perception, using the Perception Knowledge Base:
• Aromaticity
• Normalisation
• Hybridisation
• H-bonding properties

Detect Functional Groups (joining points), using the Synthetic Knowledge Base:
• Synthetic rules
• Functional Groups

Output: Monomer Library (multiple low energy conformers + detected functional groups)

Synthetic Knowledge Base

Functional group definitions:

CHEMICAL-LABEL <Carboxylic Acid>
C[SPCENTRE=2](=O)-O[HS=1]
CHEMICAL-LABEL <Primary Amine>
C-N[HS=2];[CONNECTION=1]

Joining Rules describe:
• Steps of formation
• Hybridization changes
• Bond type
• Bond length
• Dihedral penalty/angle

EXPLANATION Amide Formation
IF Carboxylic Acid INTER Primary Amine
THEN
    delete-atom 3
    change-hybridization 5 to SP2
    form-bond - between 1 and 5
    DIHEDRAL-ATOMS 2 1 5 4
    DIHEDRAL 0 0
    BOND-LENGTH 1.35
END-THEN
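The three structural steps of the Amide Formation rule can be mimicked on a toy molecular graph. This sketch assumes the atom numbering implied by the two patterns: 1 = carbonyl C, 2 = carbonyl O, 3 = hydroxyl O, 4 = amine C, 5 = amine N. That numbering is an inference, not something stated in the knowledge base, and the `Mol` class and function name are hypothetical. Hydrogens and 3D geometry are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Mol:
    atoms: dict = field(default_factory=dict)  # label -> {"element": ..., "hyb": ...}
    bonds: set = field(default_factory=set)    # each bond is a frozenset of two labels


def apply_amide_formation(mol):
    """Apply the THEN steps of the rule to atoms labelled 1..5."""
    # delete-atom 3: remove the hydroxyl oxygen and any bond it takes part in
    del mol.atoms[3]
    mol.bonds = {b for b in mol.bonds if 3 not in b}
    # change-hybridization 5 to SP2: the amide nitrogen becomes planar
    mol.atoms[5]["hyb"] = "SP2"
    # form-bond between 1 and 5: the new C-N amide bond
    mol.bonds.add(frozenset({1, 5}))
    return mol


# Acid fragment (atoms 1-3) and amine fragment (atoms 4-5) before reaction.
pair = Mol(
    atoms={1: {"element": "C", "hyb": "SP2"}, 2: {"element": "O", "hyb": "SP2"},
           3: {"element": "O", "hyb": "SP3"}, 4: {"element": "C", "hyb": "SP3"},
           5: {"element": "N", "hyb": "SP3"}},
    bonds={frozenset({1, 2}), frozenset({1, 3}), frozenset({4, 5})},
)
product = apply_amide_formation(pair)
print(sorted(product.atoms), sorted(sorted(b) for b in product.bonds))
```

Under this reading, DIHEDRAL-ATOMS 2 1 5 4 would be the O=C-N-C torsion about the new bond, and BOND-LENGTH 1.35 the length set for that bond.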

Importing the Core Structure (from MOL/PDB file in Elephant module)

• When importing from a PDB file, a pdb→mol converter is invoked

• Functional group(s) are automatically detected when the core structure is imported into the system

• Hydrogen donor/acceptor or spherical target sites anchor the imported core structure inside the receptor cavity, partially restricting the displacement of the core during lead optimization, but allowing slight movements in order to avoid boundary violations.

Product Generation I.

Step I. Generate products by mimicking synthetic reactions between core + monomers

[Figure: Core joined to R2 monomers; reactions shown: Sulphonamide Formation, Amide Formation]

Product Generation II.

Step II. Ligand flexibility = generate multiple low energy conformers (rigid body docking)

• Primary monomer conformers generated by (a) CORINA + ROTATE, (b) sampling discrete dihedral angles around formed bonds
• Secondary conformers generated by twisting about rotatable bonds of the low energy monomer conformers

User defined parameters:
• Max deviation
• Sampling of dihedral angles
• Max penalty

Product Generation III.

Step III. Docking + rejection of conformers with
• High internal energy
• Boundary violation
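The rejection step can be pictured as a simple filter over docked conformers. The `Conformer` fields, the energy units and the thresholds below are hypothetical; the source does not say how SPROUT LeadOpt measures internal energy or boundary violation.

```python
from dataclasses import dataclass


@dataclass
class Conformer:
    name: str
    internal_energy: float     # hypothetical units
    boundary_violations: int   # atoms overlapping the receptor boundary


def keep_conformers(conformers, max_energy, max_violations=0):
    """Reject conformers with high internal energy or boundary violations."""
    return [c for c in conformers
            if c.internal_energy <= max_energy
            and c.boundary_violations <= max_violations]


docked = [
    Conformer("conf_a", internal_energy=2.1, boundary_violations=0),
    Conformer("conf_b", internal_energy=9.7, boundary_violations=0),  # too strained
    Conformer("conf_c", internal_energy=1.4, boundary_violations=3),  # clashes
]
accepted = keep_conformers(docked, max_energy=5.0)
print([c.name for c in accepted])
```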

Multiple Extension Points: Combinatorial Problem

• Clients-Master-Slaves architecture
• Mixed SGI/Linux cluster network (TCP/IP socket network communication)

[Figure: Client 1, Client 2, Client 3, … connected to the Master, which passes the CORE to Slave 1, Slave 2 and Slave 3 running on Linux and SGI machines]

Each slave performs optimization on a different core + monomer combination
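The division of labour can be sketched with a local worker pool standing in for the cluster. This is not the TCP/IP socket architecture described above: threads replace the slave machines, and the monomer names and the `optimise` function with its dummy score are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def optimise(combination):
    """Stand-in for one slave's job on a single core + monomer combination."""
    r1, r2 = combination
    dummy_score = -(len(r1) + len(r2))  # placeholder, not a binding affinity
    return (r1, r2, dummy_score)


r1_monomers = ["amine_a", "amine_b"]
r2_monomers = ["acid_a", "halide_a", "carbonyl_a"]

# The "master" enumerates every R1 x R2 combination and hands each to a worker.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(optimise, product(r1_monomers, r2_monomers)))

print(len(results), "combinations processed")
```

Because each combination is independent of the others, the work parallelises cleanly, and the total still grows as the product of the library sizes at each extension point.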

Case Study (CDK2)

PDB: 1KE8

[Figure: CORE with extension points R1 and R2]

Monomer Reagent Library Generation

Source: Maybridge & Aldrich (~140,000) 2D structures

Applied filters:
• At least one of the following functional groups: Carboxylic Acid, Primary Amine, Primary Alkyl Halide, Carbonyl
• Number of heavy atoms ≥ 8
• Number of heavy atoms ≤ 16
• Number of acceptor atoms ≤ 5
• Number of donor atoms ≤ 3
• Number of rotatable bonds ≤ 2
• Max chain length ≤ 3
• Allowed atom types: H, B, C, N, O, F, S, Cl, Br
• Number of rings ≤ 3
• Stereo centres ≤ 1
• No 3-, 4-, 7-, 8- or 9-membered rings

After filtering: 1171 2D structures

CORINA and ROTATE: 4557 3D conformers, stored as the Monomer Library
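The monomer filters listed for this case study translate directly into a predicate over precomputed descriptors. The sketch below assumes the descriptors are already available in a dictionary; the key names and the two example monomers are hypothetical, and no cheminformatics toolkit is used to compute them.

```python
ALLOWED_ELEMENTS = {"H", "B", "C", "N", "O", "F", "S", "Cl", "Br"}
REQUIRED_GROUPS = {"Carboxylic Acid", "Primary Amine", "Primary Alkyl Halide", "Carbonyl"}
FORBIDDEN_RING_SIZES = {3, 4, 7, 8, 9}


def passes_filters(m):
    """True if a monomer's descriptors satisfy every filter in the list."""
    return (
        bool(REQUIRED_GROUPS & set(m["functional_groups"]))
        and 8 <= m["heavy_atoms"] <= 16
        and m["acceptors"] <= 5
        and m["donors"] <= 3
        and m["rotatable_bonds"] <= 2
        and m["max_chain_length"] <= 3
        and set(m["elements"]) <= ALLOWED_ELEMENTS
        and len(m["ring_sizes"]) <= 3
        and m["stereo_centres"] <= 1
        and not (FORBIDDEN_RING_SIZES & set(m["ring_sizes"]))
    )


# Two made-up monomers with hand-written descriptor values.
small_amine = {"functional_groups": ["Primary Amine"], "heavy_atoms": 8,
               "acceptors": 1, "donors": 1, "rotatable_bonds": 1,
               "max_chain_length": 2, "elements": ["C", "N", "H"],
               "ring_sizes": [6], "stereo_centres": 0}
seven_ring_acid = dict(small_amine, functional_groups=["Carboxylic Acid"],
                       ring_sizes=[7])

print(passes_filters(small_amine), passes_filters(seven_ring_acid))
```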

Case Study (CDK2)

[Figure: CORE with extension points R1 and R2]

Sulphonyl chloride reacts with
• Primary amine in sulphonamide formation

Primary amine reacts with
• Carboxylic acid in amide reaction
• Primary alkyl halide in amine alkylation reaction
• Carbonyl in reductive amination and imine formation

Case Study (CDK2)

[Figure: CORE with extension points R1 and R2]

R1 Monomer Library:
• 523 Primary Amine

R2 Monomer Library:
• 293 Carboxylic Acid
• 93 Primary Alkyl Halide
• 393 Carbonyl

R1 x R2 = 432,345 combinations

Results

Elapsed time ~ 5 hours (with 100 slave processors)

R1 + Core + R2 combinations:
• Screened 81.23%
• Failed 4.87%
• Accepted 13.90% (54,123)

[Figure: histogram of products by estimated binding affinity; bins -4 to -5, -5 to -6, -6 to -7, -7 to -8; count axis 10,000 to 30,000]

Case Study (CDK2)

Case Study (Generated Products)

Monomer Replacement

• Many lead compounds are composed of readily available starting materials (monomers) linked by reliable high yielding reactions

• Retrosynthetic analysis can be used to identify the monomers

• Structurally related analogues could be generated by exhaustive monomer replacement

• Considerable efficiency gains if monomer library is arranged in a hierarchy based on substructural relationships
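The efficiency gain from a substructure hierarchy comes from pruning: if a monomer is not contained in the query, none of its superstructures can be either, so the whole branch below it is skipped. The sketch below uses sets of feature labels as a toy stand-in for structures and set inclusion as a stand-in for substructure matching; the `Node` class, the feature names and the example hierarchy are all hypothetical.

```python
class Node:
    def __init__(self, name, features, children=()):
        self.name = name
        self.features = frozenset(features)  # toy stand-in for a structure
        self.children = list(children)       # superstructures of this node


def substructures_of(query, roots):
    """Return (names of hierarchy entries contained in `query`, comparisons made)."""
    hits, comparisons = [], 0
    stack = list(roots)
    while stack:
        node = stack.pop()
        comparisons += 1
        if node.features <= query:
            hits.append(node.name)
            stack.extend(node.children)  # only descend below a match
        # otherwise every superstructure below this node must also fail
    return sorted(hits), comparisons


hierarchy = [
    Node("aryl", {"ring"}, [
        Node("aryl-halide", {"ring", "Br"}, [
            Node("halo-aniline", {"ring", "Br", "NH2"})]),
        Node("aryl-acid", {"ring", "COOH"}, [
            Node("amino-aryl-acid", {"ring", "COOH", "NH2"})]),
    ]),
    Node("alkyl", {"chain"}, [Node("alkyl-halide", {"chain", "Br"})]),
]

hits, comparisons = substructures_of(frozenset({"ring", "Br", "NH2"}), hierarchy)
print(hits, comparisons)
```

In this toy case five comparisons cover a seven-entry library. With real substructure matching, which is far more expensive than a set comparison, skipping whole branches matters much more.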

Hierarchy Construction

[Figure: amide monomers compared pairwise and classified as Substructure, Superstructure or No overlap]

Hierarchy Usage

[Figure: amide example of searching the hierarchy]

Monomer Replacement

[Figure: Retro-synthetic analysis of the lead identifies monomers. Do they exist in the starting materials HIERARCHY?]

CASE STUDY: Optimisation of SPROUT designed inhibitors of P. falciparum Dihydro-orotate Dehydrogenase using Monomer Replacement

Initial lead compound MD-155

Sprout score -7.88

Retrosynthetic analysis finds amide formation and Ullmann/Suzuki reaction for monomer formation

Monomer library: aryl halides and p-halo-anilines

2D structures: 1923

conformations: 26916

High scoring monomer replacement results

Monomer replacement gave 840 new structures (including multiple conformers of the same structure)

Scores – 7.50 to 9.30.

Experimental Results for Some Ligands Suggested by SPROUT LeadOpt Monomer Replacement

Starting Point: MD-155
• PfDHODH Ki 3.0 µM
• HsDHODH Ki 11.0 nM

MD-204
• PfDHODH Ki 733 nM
• HsDHODH Ki 21.0 nM
• 4 fold enhancement in Ki for PfDHODH

MD-213
• PfDHODH Ki 478 nM
• HsDHODH Ki 21.7 nM
• 6 fold enhancement in Ki for PfDHODH
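The quoted fold enhancements can be checked with a short calculation. It reads the starting PfDHODH Ki of MD-155 as 3.0 µM (3000 nM), which is the unit consistent with the 4 fold and 6 fold figures. That reading is an assumption made here, and the function name is hypothetical.

```python
def fold_enhancement(ki_start_nm, ki_new_nm):
    """Ratio of starting Ki to new Ki; values above 1 mean tighter binding."""
    return ki_start_nm / ki_new_nm


ki_md155_nm = 3.0 * 1000  # 3.0 µM expressed in nM (assumed unit)

for name, ki_nm in [("MD-204", 733.0), ("MD-213", 478.0)]:
    print(name, round(fold_enhancement(ki_md155_nm, ki_nm), 1))
```

The ratios come out at about 4.1 and 6.3, in line with the 4 fold and 6 fold enhancements reported.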

Conclusions

Scoring functions for assessment of binding affinity of the hypothetical compounds produced by de novo design are far from perfect

Hence only readily synthesisable putative ligands will undergo experimental evaluation by medicinal chemists

Assessment of synthetic feasibility is a tractable problem

Acknowledgements

Matt Davies, Phil Bone and Timo Heikkala for experimental work

Molecular Networks GmbH for providing CORINA & ROTATE

MDL for providing MDDR, one of the databases used in the complexity analysis project

for sponsoring the lead optimization project
