de novo design tools for the generation of synthetically accessible ligands

De Novo design tools for the generation of synthetically accessible ligands Peter Johnson, Krisztina Boda, Shane Weaver, Aniko Valko, Vilmos Valko


Post on 03-Jan-2016


Page 1: De Novo design tools for the generation of synthetically accessible ligands

De Novo design tools for the generation of synthetically accessible ligands

ICAMS

Peter Johnson, Krisztina Boda, Shane Weaver, Aniko Valko, Vilmos Valko

Page 2: De Novo design tools for the generation of synthetically accessible ligands

Receptor Structure Based Drug Design

Objective:

To suggest potential leads that
• bind strongly to a given protein because of shape and electrostatic complementarity
• are easy to synthesise

Approaches:

• Docking methods (preferably flexible docking) identify new lead structures by rapidly screening a database of 3-D structures of known compounds
• De novo design methods (such as SPROUT) construct a diverse set of entirely novel potential leads from scratch

Page 3: De Novo design tools for the generation of synthetically accessible ligands

SPROUT Components

• Detects potential binding pockets of the protein structures
• Identifies favourable interaction sites (hydrogen bonding, hydrophobic, covalent, metal, user defined)
• Docks structures to target interaction sites
• Generates 3D molecular structures of novel ligands by linking the docked starting fragments together in an incremental construction scheme
• Scores, sorts and clusters the solutions

Page 4: De Novo design tools for the generation of synthetically accessible ligands

Problem with Large Answer Sets

De novo design programs such as SPROUT can suggest large sets of entirely novel potential leads

Powerful heuristics are necessary to evaluate (and reduce) often large answer sets:

• Binding Affinity Score: eliminate candidates with poor estimated binding affinity
• Synthetic Feasibility: eliminate candidates with complex molecular structures

Page 5: De Novo design tools for the generation of synthetically accessible ligands

For de novo design, prediction of synthetic accessibility is equally important

Hypothetical ligands, including those predicted to bind very strongly, have no practical value unless they can be readily synthesised.

Our Attempts to Provide Solutions:

• CAESA (estimates synthetic accessibility)
• Complexity Analysis (estimates structural complexity and drug-likeness)
• SynSPROUT (avoids the problem by building constraints into the structure generation process)

Page 6: De Novo design tools for the generation of synthetically accessible ligands

CAESA: Computer Assisted Estimation of Synthetic Accessibility

Glenn Myatt, Jon Baber

Page 7: De Novo design tools for the generation of synthetically accessible ligands

Goals of CAESA Project

Clear need for automated method of ranking hypothetical compounds according to perceived ease of synthesis

Good synthetic chemists can do this job themselves on small number of compounds but are unwilling to do it for hundreds or thousands of compounds

CAESA attempts to do the same job but never gets bored!

Page 8: De Novo design tools for the generation of synthetically accessible ligands

Estimation of Synthetic Accessibility: Criteria used by CAESA

CAESA scores the synthetic accessibility of structures using two main criteria:

a) An estimate of structural complexity:
• stereocentres
• complex topological features (fusions etc.)
• functional group complexity

b) Availability of good starting materials:
• rapid retrosynthetic analysis
• database of commercially available materials
• reaction rule base (editable)

Page 9: De Novo design tools for the generation of synthetically accessible ligands

CAESA Components

[Diagram with the labels: Target Structure; Complexity Knowledge Base; Identify Synthetic Molecular Complexity; Selected Starting Materials; Detailed Information on the Structure's Complexity]

Page 10: De Novo design tools for the generation of synthetically accessible ligands

Automatic Selection of Starting Materials

Starting Materials and Synthetic Accessibility

Availability of suitable starting materials is a very important factor: good starting materials can dramatically reduce the difficulty of synthesising a compound.

When good starting materials are found for part of the target molecule, the analysis of structural synthetic difficulty or complexity can be directed to just those portions of the target molecule that cannot be made from available starting materials.

Finding good starting materials through retrosynthetic analysis also provides possible synthetic routes as a byproduct.

Page 11: De Novo design tools for the generation of synthetically accessible ligands

Traditional Retrosynthetic Analysis

[Diagram: retrosynthetic tree from the TARGET through Level 1, Level 2 and Level 3 precursors; one precursor is identified as an available starting material]

In the absence of frequent user intervention, the combinatorial nature of this approach prohibits very broad searches (many alternative reactions) as well as very deep searches (many steps between starting materials and products).
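The combinatorial growth described for the retrosynthetic tree can be made concrete with a short calculation. This is a minimal sketch assuming a uniform branching factor, i.e. every structure has the same number of applicable retrosynthetic transforms and each transform yields one precursor; the function name and the numbers are illustrative and are not taken from CAESA.

```python
def precursor_count(branching: int, depth: int) -> int:
    """Total precursors generated down to `depth` levels, assuming every
    structure has `branching` applicable retrosynthetic transforms."""
    return sum(branching ** level for level in range(1, depth + 1))


# Growth of the tree for 10 alternative reactions per structure.
for depth in (1, 2, 3):
    print(depth, precursor_count(10, depth))
```

Under this assumption, each extra level multiplies the work by the branching factor, which is why both broad and deep searches quickly become impractical without pruning.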

Page 12: De Novo design tools for the generation of synthetically accessible ligands

Bidirectional Search for Synthetic Routes

[Diagram: precursors generated retrosynthetically from the Target (routes A and B) are matched against a Database of Starting Materials that has been extended with virtual starting materials. Labels: Retrosynthetic on-line Generation of Precursors; Synthetic off-line generation of virtual SMs; MT = metal or metalloid]

Page 13: De Novo design tools for the generation of synthetically accessible ligands

Example of Starting Material Selection

[Figure: Target Structure → Starting Materials Selected → Complexity Analysis → Residual Complexity. Annotation on the residual structure: adjacent stereocentres on ring, relatively easy to control stereochemistry]

Page 14: De Novo design tools for the generation of synthetically accessible ligands

Summary of CAESA Features

• CAESA carries out a retrosynthetic analysis which terminates when a starting material from a database (such as ACD) is found
• Found starting materials are scored according to length and difficulty of reaction sequence and coverage of target compound
• All chemistry rules and transformations are described in editable text knowledge bases easily modified by chemists
• Quality of the analysis depends on the chemistry included in the knowledge bases and the comprehensiveness of the starting material libraries
• But CAESA is relatively slow, and speedier methods are needed for pruning of large data sets

Page 15: De Novo design tools for the generation of synthetically accessible ligands

Alternative Approach: Complexity Analysis

Based on statistical distribution of various substitution patterns found in databases of existing drugs and available starting materials.

Molecular Complexity Analysis of de Novo Designed Ligands. Krisztina Boda and A. Peter Johnson. J. Med. Chem. 2006; ASAP Web Release Date: 26-Jan-2006

Page 16: De Novo design tools for the generation of synthetically accessible ligands

Complexity analysis based on statistical distribution of various substitution patterns

Assumption

If a molecular structure contains ring and chain substitution patterns which are common in
• existing drugs, then the structure is likely to be “drug-like” as well as readily synthesisable
• available starting materials, then the structure is likely to be readily synthesisable

Page 17: De Novo design tools for the generation of synthetically accessible ligands

Building Complexity Database

[Diagram: for each input structure]
• Enumerate chain patterns (1-centred, 2-centred, 3-centred, 4-centred) → Database of chains
• Enumerate ring/ring substitution patterns → Database of rings/ring substitutions

Page 18: De Novo design tools for the generation of synthetically accessible ligands

Atom Substitution Hierarchy

Ring (and chain) substitutions are organised in hierarchies

[Diagram: a ring topology at the root with sp2 and sp3 atom positions, and below it hetero-atom and halogen substituted variants (N, O, S, F, Cl, Br), each labelled with its number of occurrences]

The hierarchy stores:
• Atom type sequence
• Number of occurrences
• Binding properties

Total occurrences of the topology: 11,801

Page 19: De Novo design tools for the generation of synthetically accessible ligands

Ligand Complexity Analysis

1. Enumerate ring and chain patterns
2. Generate canonical names for each atom pattern (Canonical name: A, Canonical name: B, Canonical name: C, [More Patterns])
3. Match canonical name against the hierarchy roots of the database
4. Retrieval of frequency of occurrences → Calculate score
5. Rank structures by complexity score

DATABASE of hierarchies + frequency of occurrences

Speed of Complexity Analysis: ~ 1000-1200 structures / minute on Linux PC (3 GHz)
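Steps 1 and 2 of the analysis can be illustrated with a toy enumeration. This is a minimal sketch under strong assumptions: a molecule is reduced to element symbols and bonds, a "chain pattern" is taken to be a simple path of 1 to 4 atoms, and the canonical name is the lexicographically smaller spelling of the path. The real patterns also carry substitution and hybridisation information, so `chain_patterns` and its naming scheme are hypothetical.

```python
from collections import Counter


def chain_patterns(atoms, bonds, max_len=4):
    """Enumerate simple paths of 1..max_len atoms and count canonical names.

    atoms: list of element symbols; bonds: list of (i, j) index pairs.
    A path and its reverse describe the same pattern, so the canonical
    name is the lexicographically smaller of the two spellings.
    """
    neighbours = {i: set() for i in range(len(atoms))}
    for i, j in bonds:
        neighbours[i].add(j)
        neighbours[j].add(i)

    names = Counter()

    def extend(path):
        symbols = [atoms[k] for k in path]
        name = min("-".join(symbols), "-".join(reversed(symbols)))
        # Each path is reached from both ends; count it only once.
        if path[0] <= path[-1]:
            names[name] += 1
        if len(path) < max_len:
            for nxt in neighbours[path[-1]]:
                if nxt not in path:
                    extend(path + [nxt])

    for start in range(len(atoms)):
        extend([start])
    return names


# Toy input: an N-C-C-O chain (hydrogens omitted).
names = chain_patterns(["N", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])
print(sorted(names.items()))
```

The resulting names would then be looked up in the frequency database (steps 3 and 4); a plain dictionary keyed by canonical name is enough for that part of the sketch.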

Page 20: De Novo design tools for the generation of synthetically accessible ligands

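Calculation of Complexity Score

CONCEPT: Penalise atom patterns which are infrequent or not present in the complexity database.

In SPROUT the complexity analysis is followed by ranking the putative ligands according to their evaluated complexity score.

$$
\mathrm{SCORE}_{\mathrm{TOTAL}} = \frac{\mathrm{SCORE}_{\mathrm{TOPOLOGY}} + \mathrm{SCORE}_{\mathrm{ATOM\ SUBS.}}}{\text{Num of Patterns}} + P_{\mathrm{stereo}} + P_{\mathrm{rotbond}}
$$

$$
\mathrm{SCORE}_{\mathrm{TOPOLOGY}} =
\begin{cases}
\mathrm{Penalty} \cdot \left(1 - \dfrac{\ln(\text{No. occur. matched topology})}{\ln(\text{No. occur. most common topology})}\right), & \text{if topology exists} \\[2ex]
\mathrm{Penalty} \cdot 2, & \text{if topology missing from database}
\end{cases}
$$

$$
\mathrm{SCORE}_{\mathrm{ATOM\ SUBS.}} =
\begin{cases}
\mathrm{Penalty} \cdot \left(1 - \dfrac{\ln(\text{No. occur. best matched atom subs.})}{\ln(\text{No. occur. most common atom subs.})}\right), & \text{if matching atom subs. exists} \\[2ex]
\mathrm{Penalty} \cdot 2, & \text{if topology or atom subs. missing}
\end{cases}
$$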

Penalty values can be altered to tailor the system for different applications.

The penalty values used in the examples presented here are 25, 20, 15, 10 for 1-,2-,3- and 4-centred chain patterns, 40 and 30 for rings and ring substitutions.
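The per-pattern score can be sketched in a few lines of Python. This is an illustrative reading of the piecewise definition on this slide, not the SPROUT implementation: the function name is hypothetical, the occurrence counts in the example are chosen only for illustration, and the most common pattern is assumed to occur more than once so that its logarithm is non-zero.

```python
import math


def pattern_score(matched: int, most_common: int, penalty: float) -> float:
    """Score one pattern against the complexity database.

    Returns 0 for the most common pattern, `penalty` for a pattern seen
    exactly once, and 2 * penalty when the pattern is missing
    (matched == 0). Assumes most_common > 1.
    """
    if matched <= 0:
        return 2 * penalty
    return penalty * (1 - math.log(matched) / math.log(most_common))


# Ring penalty of 40, as quoted above; the counts are illustrative only.
print(pattern_score(3591, 3591, 40))  # most common pattern
print(pattern_score(610, 3591, 40))   # less frequent pattern
print(pattern_score(0, 3591, 40))     # pattern absent from the database
```

The logarithmic ratio means that a pattern has to be orders of magnitude rarer than the most common one before it attracts a large share of the penalty.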

Page 21: De Novo design tools for the generation of synthetically accessible ligands

Validation Experiment: Comparison with CAESA

Both methods used to estimate synthetic accessibility for the same set of 50 top selling drugs

Page 22: De Novo design tools for the generation of synthetically accessible ligands

CAESA vs. Complexity Analysis

[Scatter plot: complexity analysis score (0.0 to 110.0) against CAESA prediction of synthetic accessibility (0% to 100%)]

Elapsed time:
• CAESA: 703 sec
• Complexity Analysis: 8 sec

Complexity scores are calculated using the complexity database derived from available starting materials (SMs), plus a 2.0 penalty for each identified stereo centre in the structures.

Page 23: De Novo design tools for the generation of synthetically accessible ligands

Complexity Analysis vs CAESA

More suitable for prioritization of thousands of structures within a reasonable time frame.

Provides acceptable compromise between the speed of the analysis and the accuracy of calculated scores.

Because this approach is based on characteristics of existing readily available compounds, simple but novel structural features may be wrongly identified as complex

Page 24: De Novo design tools for the generation of synthetically accessible ligands

Yet another alternative approach

Build synthetic feasibility into the structure generation process

Page 25: De Novo design tools for the generation of synthetically accessible ligands

SynSPROUT Approach

Ease of synthesis is a key factor in drug development: build synthetic constraints into the structure generation process

SynSPROUT Scheme

[Diagram comparing the two schemes]

Classic SPROUT: Fragment Library, with fragments joined by fuse, spiro or new bond operations

SynSPROUT: Pool of readily available starting materials + Synthetic Knowledge Base of reliable high yielding reactions → VIRTUAL SYNTHESIS IN RECEPTOR CAVITY → readily synthesisable putative ligand structures

Built in / user defined reactions:
• Amide formation
• Ether formation
• Ester formation
• Amine alkylation
• Reductive amination
• etc.

Page 26: De Novo design tools for the generation of synthetically accessible ligands

Current Status

• Promising structures with estimated high binding affinity
• SynSPROUT provides the equivalent to screening a large number of combinatorial libraries
• Potential for suggesting starting points for new combinatorial libraries
• Combination of a large starting material library with a large reaction knowledge base causes a combinatorial problem, even with parallel processing
• Restricting either size of library or number of synthetic reactions gives acceptable run times

Page 27: De Novo design tools for the generation of synthetically accessible ligands

De Novo Structure Generation vs. Lead Optimization

De Novo Structure Generation
• AIM: To generate diverse putative ligands from scratch
• No structural information from any existing bound ligand is utilised

Lead Optimization
• AIM: To suggest better ligands structurally similar to the bound one
• The structure of a good bound ligand provides a starting point (core)

Page 28: De Novo design tools for the generation of synthetically accessible ligands

Variations on the SynSPROUT Theme

SPROUT LeadOpt

Two modes for structure based lead optimisation:
• Core Extension: extends core structure (derived from lead) by virtual synthetic chemistry
• Monomer Replacement: replaces monomers which have been identified by retrosynthetic analysis of a lead compound

Page 29: De Novo design tools for the generation of synthetically accessible ligands

Core Extension

• Import the modified bound ligand (core) + identify substitution points (functional groups)
• Generate core + monomer product by performing virtual synthetic reaction(s) at selected functional groups
• Estimate binding affinity for products

Page 30: De Novo design tools for the generation of synthetically accessible ligands

Core Extension Scheme

General Scheme

[Diagram with the following elements]
• Core Structure: multiple low energy conformers + detected functional groups
• Monomer Library: monomers R11, R12, R13, R21, R22, R23, R31, R32, R33
• Synthetic Knowledge Base: list of reactions (between functional groups)
• Simulate synthetic reaction in the 3D context of receptor site

All possible core + monomer combinations are generated
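The exhaustive pairing of core sites with compatible monomers can be sketched as a Cartesian product. This is a minimal illustration, assuming that each core site and each monomer carries a single functional group label and that the knowledge base is a lookup from a pair of groups to a reaction name; `REACTIONS`, `options_for_site` and `enumerate_products` are hypothetical names, and no 3D docking is modelled.

```python
from itertools import product

# Hypothetical knowledge base: unordered pair of groups -> reaction name.
REACTIONS = {
    frozenset({"carboxylic acid", "primary amine"}): "amide formation",
    frozenset({"sulphonyl chloride", "primary amine"}): "sulphonamide formation",
}


def options_for_site(core_group, monomers):
    """Monomers (name, group) that have a known reaction with core_group."""
    return [
        (name, REACTIONS[frozenset({core_group, group})])
        for name, group in monomers
        if frozenset({core_group, group}) in REACTIONS
    ]


def enumerate_products(core_sites, monomers):
    """Yield one dict per core + monomer combination across all sites."""
    sites = sorted(core_sites)
    per_site = [options_for_site(core_sites[s], monomers) for s in sites]
    for combo in product(*per_site):
        yield dict(zip(sites, combo))


core = {"R1": "sulphonyl chloride", "R2": "primary amine"}
library = [("m1", "primary amine"), ("m2", "carboxylic acid"), ("m3", "primary amine")]
for candidate in enumerate_products(core, library):
    print(candidate)
```

Because the number of candidates is the product of the per-site option counts, adding extension points or enlarging the library grows the search multiplicatively.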

Page 31: De Novo design tools for the generation of synthetically accessible ligands

Automatic Monomer Library Generation

[Flow diagram]

SDF file of 3D monomers
→ Atom & Ring Perception, using the Perception Knowledge Base:
o Aromaticity
o Normalisation
o Hybridisation
o H-bonding properties
→ Detect Functional Groups (joining points), using the Synthetic Knowledge Base (functional groups, synthetic rules)
→ Monomer Library: multiple low energy conformers + detected functional groups

Page 32: De Novo design tools for the generation of synthetically accessible ligands

Synthetic Knowledge Base

Functional group definitions:

CHEMICAL-LABEL <Carboxylic Acid>
C[SPCENTRE=2](=O)-O[HS=1]
CHEMICAL-LABEL <Primary Amine>
C-N[HS=2];[CONNECTION=1]

Joining Rules specify:
• Steps of formation
• Hybridization changes
• Bond type
• Bond length
• Dihedral penalty/angle

Example joining rule:

EXPLANATION Amide Formation
IF Carboxylic Acid INTER Primary Amine
THEN
  delete-atom 3
  change-hybridization 5 to SP2
  form-bond - between 1 and 5
  DIHEDRAL-ATOMS 2 1 5 4
  DIHEDRAL 0 0
  BOND-LENGTH 1.35
END-THEN

[Diagram: the carboxylic acid and primary amine patterns with their atoms numbered 1 to 5]
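The three structural steps of the amide formation rule can be mimicked on a tiny molecular graph. This is a minimal sketch, assuming the atom numbering 1 = carbonyl C, 2 = carbonyl O, 3 = hydroxyl O, 4 = amine C, 5 = amine N (an inference from the two patterns and the DIHEDRAL-ATOMS line, not something the slide states); the data layout and the function name are hypothetical, and the geometric parts of the rule (dihedral, bond length) are not modelled.

```python
def apply_amide_rule(atoms, bonds):
    """Apply the structural steps of the rule to a matched acid + amine pair.

    atoms: dict atom number -> {"element": str, "hyb": str}
    bonds: set of frozenset({i, j}) pairs
    Returns new (atoms, bonds); the inputs are left unchanged.
    """
    # delete-atom 3: remove the hydroxyl oxygen and any bond it takes part in
    new_atoms = {k: dict(v) for k, v in atoms.items() if k != 3}
    new_bonds = {b for b in bonds if 3 not in b}
    # change-hybridization 5 to SP2: the amide nitrogen becomes planar
    new_atoms[5]["hyb"] = "SP2"
    # form-bond between 1 and 5: join carbonyl carbon to nitrogen
    new_bonds.add(frozenset({1, 5}))
    return new_atoms, new_bonds


atoms = {
    1: {"element": "C", "hyb": "SP2"},
    2: {"element": "O", "hyb": "SP2"},
    3: {"element": "O", "hyb": "SP3"},
    4: {"element": "C", "hyb": "SP3"},
    5: {"element": "N", "hyb": "SP3"},
}
bonds = {frozenset({1, 2}), frozenset({1, 3}), frozenset({4, 5})}
product_atoms, product_bonds = apply_amide_rule(atoms, bonds)
print(sorted(product_atoms), sorted(sorted(b) for b in product_bonds))
```

Keeping the rule as a short list of graph edits is what makes a text knowledge base of this kind easy for chemists to extend.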

Page 33: De Novo design tools for the generation of synthetically accessible ligands

Importing the Core Structure (from MOL/PDB file in Elephant module)

• When importing from a PDB file, a pdb→mol converter is invoked
• Functional group(s) are automatically detected when the core structure is imported into the system
• Hydrogen donor/acceptor or spherical target sites anchor the imported core structure inside the receptor cavity, partially restricting the displacement of the core during lead optimization, but allowing slight movements in order to avoid boundary violations

Page 34: De Novo design tools for the generation of synthetically accessible ligands

Product Generation I.

Step I. Generate products by mimicking synthetic reactions between core + monomers

[Diagram: Core joined to R1 by Sulphonamide Formation and to R2 by Amide Formation]

Page 35: De Novo design tools for the generation of synthetically accessible ligands

Product Generation II.

Step II. Ligand flexibility = generate multiple low energy conformers

Primary monomer conformers generated by
(a) CORINA + ROTATE
(b) sampling discrete dihedral angles around formed bonds

Secondary conformers generated by twisting about rotatable bonds of the low energy monomer conformers

User defined parameters:
• Max deviation
• Sampling of dihedral angles
• Max penalty

Rigid body docking

[Diagram: Core with attached monomers R1 and R2]

Page 36: De Novo design tools for the generation of synthetically accessible ligands

Product Generation III.

Step III. Docking + rejection of conformers with
• High internal energy
• Boundary violation

Page 37: De Novo design tools for the generation of synthetically accessible ligands

Multiple Extension Points: Combinatorial Problem

• Clients-Master-Slaves architecture
• Mixed SGI/Linux cluster network (TCP/IP socket network communication)

[Diagram: Client 1, Client 2, Client 3, … connect to the Master, which distributes work to Slave 1, Slave 2 and Slave 3 on Linux and SGI machines; each slave holds a CORE with monomers R1, R2 and R3]

Each slave performs optimization on a different core + monomer combination
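The distribution of core + monomer combinations over workers can be sketched with a process-local pool. This is a stand-in under stated assumptions: the slides describe a socket-based cluster, whereas the sketch uses Python's `ThreadPoolExecutor` inside one process; `optimise` is a placeholder that returns a made-up score, and `run_master` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def optimise(combination):
    """Placeholder for docking and scoring one core + monomer combination."""
    r1, r2 = combination
    # Stand-in score derived from name lengths; not a real binding affinity.
    return combination, -(len(r1) + len(r2))


def run_master(r1_library, r2_library, workers=3):
    """Farm every R1 x R2 combination out to the pool, best score first."""
    combos = list(product(r1_library, r2_library))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(optimise, combos))
    return sorted(results, key=lambda item: item[1])


for combo, score in run_master(["a", "bb"], ["c"]):
    print(combo, score)
```

Since each combination is scored independently, the work divides cleanly across workers, which is the property the cluster architecture relies on.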

Page 38: De Novo design tools for the generation of synthetically accessible ligands

Case Study (CDK2)

PDB: 1KE8

[Figure: structure of the bound ligand, divided into a CORE with extension points R1 and R2]

Page 39: De Novo design tools for the generation of synthetically accessible ligands

Case Study (CDK2): Monomer Reagent Library Generation

Maybridge & Aldrich (~140,000 2D structures)

Applied filters:
• At least one of the following functional groups: Carboxylic Acid, Primary Amine, Primary Alkyl Halide, Carbonyl
• Number of heavy atoms ≥ 8
• Number of heavy atoms ≤ 16
• Number of acceptor atoms ≤ 5
• Number of donor atoms ≤ 3
• Number of rotatable bonds ≤ 2
• Max chain length ≤ 3
• Allowed atom types: H, B, C, N, O, F, S, Cl, Br
• Number of rings ≤ 3
• Stereo centres ≤ 1
• No 3-, 4-, 7-, 8- or 9-membered rings

→ 1171 2D structures

→ CORINA + ROTATE → 4557 3D conformers

→ Monomer Library
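The monomer filters listed for this case study translate directly into a predicate. This is a minimal sketch assuming the descriptors have already been computed by some cheminformatics toolkit; the `Monomer` record, its field names and `passes_filters` are hypothetical and only restate the thresholds from the slide.

```python
from dataclasses import dataclass

ALLOWED_ELEMENTS = {"H", "B", "C", "N", "O", "F", "S", "Cl", "Br"}
FORBIDDEN_RING_SIZES = {3, 4, 7, 8, 9}
REACTIVE_GROUPS = {"carboxylic acid", "primary amine", "primary alkyl halide", "carbonyl"}


@dataclass
class Monomer:
    heavy_atoms: int
    acceptors: int
    donors: int
    rotatable_bonds: int
    max_chain_length: int
    elements: frozenset
    ring_sizes: tuple
    stereo_centres: int
    functional_groups: frozenset


def passes_filters(m: Monomer) -> bool:
    """True if the monomer satisfies every filter from the case study."""
    return (
        8 <= m.heavy_atoms <= 16
        and m.acceptors <= 5
        and m.donors <= 3
        and m.rotatable_bonds <= 2
        and m.max_chain_length <= 3
        and m.elements <= ALLOWED_ELEMENTS
        and len(m.ring_sizes) <= 3
        and m.stereo_centres <= 1
        and not FORBIDDEN_RING_SIZES.intersection(m.ring_sizes)
        and bool(m.functional_groups & REACTIVE_GROUPS)
    )


example = Monomer(10, 2, 1, 1, 2, frozenset({"C", "N", "H"}), (6,), 0,
                  frozenset({"primary amine"}))
print(passes_filters(example))
```

Expressing the filters as one boolean expression keeps each threshold visible and easy to adjust for a different target.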

Page 40: De Novo design tools for the generation of synthetically accessible ligands

Case Study (CDK2)

[Diagram: R1 and R2 attached to the CORE]

Sulphonyl chloride reacts with
• Primary amine in sulphonamide formation

Primary amine reacts with
• Carboxylic acid in amide reaction
• Primary alkyl halide in amine alkylation reaction
• Carbonyl in reductive amination and imine formation

Page 41: De Novo design tools for the generation of synthetically accessible ligands

Case Study (CDK2)

[Diagram: R1 and R2 attached to the CORE]

R1 Monomer Library:
• 523 Primary Amine

R2 Monomer Library:
• 293 Carboxylic Acid
• 93 Primary Alkyl Halide
• 393 Carbonyl

R1 Monomer Library × R2 Monomer Library = 432,345 combinations

Results

Elapsed time ~ 5 hours (with 100 slave processors)

R1 + Core + R2 combinations:
• Screened 81.23%
• Failed 4.87%
• Accepted 13.90% (54,123)

[Bar chart: number of generated products (0 to 30,000) against estimated binding affinity, in bins -4 to -5, -5 to -6, -6 to -7 and -7 to -8; bar labels 2549, 25913, 1014 and 24646]

Page 42: De Novo design tools for the generation of synthetically accessible ligands

Case Study (Generated Products)

[Figure: eight generated product structures with estimated binding affinity scores -7.95, -7.82, -7.75, -7.60, -7.47, -7.56, -7.45 and -7.07]

Page 43: De Novo design tools for the generation of synthetically accessible ligands

Monomer Replacement

• Many lead compounds are composed of readily available starting materials (monomers) linked by reliable high yielding reactions

• Retrosynthetic analysis can be used to identify the monomers

• Structurally related analogues could be generated by exhaustive monomer replacement

• Considerable efficiency gains if monomer library is arranged in a hierarchy based on substructural relationships
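One way a substructure hierarchy can save work is by pruning. This is a minimal sketch built on an assumption the slide does not spell out: if a monomer fails to fit (for example, it clashes with the receptor boundary), every superstructure below it in the hierarchy is skipped without being tested. The hierarchy is treated as a tree, and the function name, molecule names and `fits` predicate are all hypothetical.

```python
def search_hierarchy(node, children, fits, accepted=None):
    """Depth-first walk that skips the superstructures of any failing monomer.

    children: dict mapping a monomer to the superstructures directly below it.
    fits: callable returning True if the monomer is acceptable.
    """
    if accepted is None:
        accepted = []
    if not fits(node):
        return accepted  # prune: superstructures of this node are not visited
    accepted.append(node)
    for child in children.get(node, []):
        search_hierarchy(child, children, fits, accepted)
    return accepted


# Toy hierarchy: each child contains its parent as a substructure.
hierarchy = {"benzene": ["toluene", "aniline"], "toluene": ["xylene"]}
tested = []


def fits(name):
    tested.append(name)
    return name != "toluene"  # pretend toluene does not fit


print(search_hierarchy("benzene", hierarchy, fits))
print(tested)
```

In this toy run the superstructure of the rejected monomer is never evaluated, which is where the saving over an exhaustive scan of a flat library would come from.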

Page 44: De Novo design tools for the generation of synthetically accessible ligands

Hierarchy Construction

[Diagram: nitrogen-containing monomer structures arranged under the root "Amide"; pairwise comparisons between structures are labelled Substructure, Superstructure or No overlap]

Page 45: De Novo design tools for the generation of synthetically accessible ligands

Hierarchy Usage

[Diagram: the hierarchy under the root "Amide" being searched with nitrogen- and oxygen-containing query structures]

Page 46: De Novo design tools for the generation of synthetically accessible ligands

Monomer Replacement

[Diagram: Retro-synthetic analysis splits the lead compound into monomers, followed by the question "Do they exist in starting materials HIERARCHY?"; analogues are then generated from replacement monomers]

Page 47: De Novo design tools for the generation of synthetically accessible ligands

CASE STUDY: Optimisation of SPROUT designed inhibitors of P. falciparum Dihydro-orotate Dehydrogenase using Monomer Replacement

• Initial lead compound: MD-155
• SPROUT score: -7.88
• Retrosynthetic analysis finds amide formation and Ullmann/Suzuki reaction for monomer formation
• Monomer library: aryl halides and p-halo-anilines
• 2D structures: 1923
• Conformations: 26916

Page 48: De Novo design tools for the generation of synthetically accessible ligands

High scoring monomer replacement results

Monomer replacement gave 840 new structures (including multiple conformers of the same structure)

Scores – 7.50 to 9.30.

Page 49: De Novo design tools for the generation of synthetically accessible ligands

Experimental Results for Some Ligands Suggested by SPROUT LeadOpt Monomer Replacement

Starting Point: MD-155
• PfDHODH Ki 3.0 µM
• HsDHODH Ki 11.0 nM

MD-204 (4 fold enhancement in Ki for PfDHODH)
• PfDHODH Ki 733 nM
• HsDHODH Ki 21.0 nM

MD-213 (6 fold enhancement in Ki for PfDHODH)
• PfDHODH Ki 478 nM
• HsDHODH Ki 21.7 nM

[Figure: chemical structures of MD-155, MD-204 and MD-213]

Page 50: De Novo design tools for the generation of synthetically accessible ligands

Conclusions

Scoring functions for assessment of binding affinity of the hypothetical compounds produced by de novo design are far from perfect

Hence only readily synthesisable putative ligands will undergo experimental evaluation by medicinal chemists

Assessment of synthetic feasibility is a tractable problem

Page 51: De Novo design tools for the generation of synthetically accessible ligands

Acknowledgements

Matt Davies, Phil Bone and Timo Heikkala for experimental work

Molecular Networks GmbH for providing CORINA & ROTATE

MDL for providing MDDR, one of the databases used in the complexity analysis project

for sponsoring the lead optimization project