
Upload: sean-ekins

Post on 28-Jan-2015


DESCRIPTION

ACS talk 2013

TRANSCRIPT

Page 1: Enhancing High Throughput Screening For Mycobacterium tuberculosis Drug Discovery Using Bayesian Models

Sean Ekins 1,2*, Robert C. Reynolds 3,4, Baojie Wan 5, Scott G. Franzblau 5, Joel S. Freundlich 6,7 and Barry A. Bunin 1

1 Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, CA 94010, USA.
2 Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, NC 27526, USA.
3 Southern Research Institute, 2000 Ninth Avenue South, Birmingham, AL 35205, USA.
4 Current address: University of Alabama at Birmingham, College of Arts and Sciences, Department of Chemistry, 1530 3rd Avenue South, Birmingham, Alabama 35294-1240, USA.
5 Institute for Tuberculosis Research, University of Illinois at Chicago, Chicago, IL 60607, USA.
6 Department of Medicine, Center for Emerging and Reemerging Pathogens, UMDNJ – New Jersey Medical School, 185 South Orange Avenue, Newark, NJ 07103, USA.
7 Department of Pharmacology & Physiology, UMDNJ – New Jersey Medical School, 185 South Orange Avenue, Newark, NJ 07103, USA.

Page 2

Tuberculosis kills 1.6-1.7 million people per year (~1 every 19 seconds)
One third of the world's population is infected

Multidrug resistance in 4.3% of cases
Extensively drug-resistant TB is increasing in incidence
One new drug in over 40 years
Drug-drug interactions and co-morbidity with HIV

Collaboration between groups is rare
These groups may work on existing or new targets
Use of computational methods with TB is rare

Applying CDD to build a disease community for TB

Page 3

~20 public datasets for TB, including Novartis data on TB hits

>300,000 compounds

Patents, Papers Annotated by CDD

Open to browse by anyone

http://www.collaborativedrug.com/

register

Page 4

Fitting into the drug discovery process

Ekins et al., Trends in Microbiology 19: 65-74, 2011

Page 5

HTS Hit rates

SRI papers

Usually less than 1%

Page 6

UIC hit rates

| Provider | Compound library | Number of compounds | Inhibitor concentration (ug/ml or uM) | Readout | Hit rate (%) at 90% inhibition |
| --- | --- | --- | --- | --- | --- |
| ChemBridge | Novacore | 50,000 | 30 uM | Luminescence (LuxAB) | 4.55 |
| Asinex | Diverse | 59,760 | 50 uM | Luminescence (LuxAB) | 1.91 |
| ASDI | | 6,811 | 30 uM | Luminescence (LuxAB) | 2.73 |
| Prestwick | | 1,120 | 20 ug/ml | Luminescence (ATP) | 20.6 |
| | | | | Fluorescence (MABA) | 16.07 |
| MRCT | | 100,000 | 10 uM | Luminescence (LuxABCDE) | 0.67 |
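To put the UIC hit rates above in perspective, the short sketch below converts each rate into an approximate number of hits and the average number of compounds screened per hit. The library sizes and rates are transcribed from the table; the function names and the rounding are illustrative choices, not part of the talk.

```python
# Library sizes and hit rates (%) at 90% inhibition, transcribed from the UIC table.
screens = [
    ("ChemBridge Novacore", 50_000, 4.55),
    ("Asinex Diverse", 59_760, 1.91),
    ("ASDI", 6_811, 2.73),
    ("Prestwick, Luminescence (ATP)", 1_120, 20.6),
    ("Prestwick, Fluorescence (MABA)", 1_120, 16.07),
    ("MRCT", 100_000, 0.67),
]


def approx_hits(n_compounds, hit_rate_pct):
    """Approximate hit count implied by a library size and a percent hit rate."""
    return round(n_compounds * hit_rate_pct / 100)


def compounds_per_hit(hit_rate_pct):
    """Average number of compounds that must be screened to find one hit."""
    return 100 / hit_rate_pct


for name, n, rate in screens:
    print(f"{name}: ~{approx_hits(n, rate)} hits, "
          f"~{compounds_per_hit(rate):.0f} compounds screened per hit")
```

At the lowest rate in the table (0.67%), roughly 150 compounds are screened for every hit found.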

Page 7

Wasting data?

Information from these inefficient and expensive HTS campaigns does not appear to have been used to direct “informed” selection of new libraries in subsequent screens and compound optimization in TB drug discovery

How can we continuously learn from all the data?

Page 8

Bayesian machine learning

Ekins, Williams and Xu, Drug Metab Dispos 38: 2302-2308, 2010

Bayesian classification is a simple probabilistic classification model. It is based on Bayes' theorem:

p(h|d) = p(d|h) p(h) / p(d)

h is the hypothesis or model
d is the observed data
p(h) is the prior belief (probability of hypothesis h before observing any data)
p(d) is the data evidence (marginal probability of the data)
p(d|h) is the likelihood (probability of data d if hypothesis h is true)
p(h|d) is the posterior probability (probability of hypothesis h being true given the observed data d)

A weight is calculated for each feature using a Laplacian-adjusted probability estimate to account for the different sampling frequencies of different features.

The weights are summed to provide a probability estimate
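The talk does not spell out the weight formula. The following is a minimal sketch, assuming the commonly described Laplacian-corrected form in which a feature found in `n_total` training molecules, `n_active` of them active, gets the weight ln((n_active + 1) / (n_total * p_base + 1)), with `p_base` the overall active fraction. The feature names, the toy data and the function names are hypothetical.

```python
import math


def laplacian_weight(n_active, n_total, p_base):
    """Laplacian-corrected log weight for one feature (assumed formulation).

    n_active: training molecules containing the feature that are active
    n_total:  training molecules containing the feature
    p_base:   overall fraction of actives in the training set
    """
    return math.log((n_active + 1) / (n_total * p_base + 1))


def train(molecules, labels):
    """Return per-feature weights from feature sets and 0/1 activity labels."""
    p_base = sum(labels) / len(labels)
    counts = {}  # feature -> [n_active, n_total]
    for features, label in zip(molecules, labels):
        for f in features:
            c = counts.setdefault(f, [0, 0])
            c[0] += label
            c[1] += 1
    return {f: laplacian_weight(a, n, p_base) for f, (a, n) in counts.items()}


def score(features, weights):
    """Sum the weights of the features present; unseen features contribute 0."""
    return sum(weights.get(f, 0.0) for f in features)


# Toy data: each molecule is a set of hypothetical fingerprint feature IDs.
molecules = [{"f1", "f2"}, {"f1", "f3"}, {"f2", "f3"}, {"f3"}, {"f1"}, {"f3", "f4"}]
labels = [1, 1, 0, 0, 1, 0]

weights = train(molecules, labels)
print(score({"f1", "f4"}, weights), score({"f3"}, weights))
```

In this formulation a rarely seen feature is pulled toward a weight of zero, which is how the correction accounts for different sampling frequencies. The summed score is best read as a relative ranking score, not a calibrated probability.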

Page 9

Process – Bioactivity only

Workflow diagram, box labels:

High-throughput phenotypic Mtb screening
Mtb screening molecule database
Descriptors + Bioactivity
Bayesian Machine Learning Mtb Model
Molecule Database (e.g. GSK malaria actives) virtually scored using Bayesian Models
Top scoring molecules assayed for Mtb growth inhibition
Identify in vitro hits
New bioactivity data may enhance models
Increased hit/lead discovery efficiency

Page 10

Bayesian Classification TB Models

| Dataset (number of molecules) | External ROC score | Internal ROC score | Concordance | Specificity | Sensitivity |
| --- | --- | --- | --- | --- | --- |
| MLSMR all single point screen (N = 220463) | 0.86 ± 0 | 0.86 ± 0 | 78.56 ± 1.86 | 78.59 ± 1.94 | 77.13 ± 2.26 |
| MLSMR dose response set (N = 2273) | 0.73 ± 0.01 | 0.75 ± 0.01 | 66.85 ± 4.06 | 67.21 ± 7.05 | 65.47 ± 7.96 |

We can use the public data for machine learning model building
Using Discovery Studio Bayesian model
Leave out 50% x 100

Ekins et al., Mol BioSyst, 6: 840-851, 2010
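As a reminder of what the reported statistics measure, here is a small sketch that computes concordance, specificity, sensitivity and ROC AUC for one set of predictions. The definitions are the standard ones and are assumed to match the talk's usage; the labels, scores and the 0.5 score cutoff are toy values. The ± values above presumably summarise such statistics over the 100 random 50% leave-out splits.

```python
def classification_metrics(y_true, y_pred):
    """Concordance (accuracy), specificity and sensitivity as percentages."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return {
        "concordance": 100 * (tp + tn) / len(y_true),
        "specificity": 100 * tn / (tn + fp),
        "sensitivity": 100 * tp / (tp + fn),
    }


def roc_auc(y_true, scores):
    """ROC AUC as the fraction of (active, inactive) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # A tie between an active and an inactive counts as half a correct pair.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 3 actives, 5 inactives, with hypothetical model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [2.1, 0.4, 1.3, -0.5, 0.9, -1.2, -2.0, 0.1]
y_pred = [1 if s > 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```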

Page 11

Bayesian Classification Models for TB

Good (feature structures shown on the slide are not reproduced here)

G1: 1704324327, 73 out of 165 good, Bayesian Score: 2.885
G2: -2092491099, 57 out of 120 good, Bayesian Score: 2.873
G3: -1230843627, 75 out of 188 good, Bayesian Score: 2.811
G4: 940811929, 35 out of 65 good, Bayesian Score: 2.780
G5: 563485513, 123 out of 357 good, Bayesian Score: 2.769

Bad

B1: 1444982751, 0 out of 1158 good, Bayesian Score: -3.135
B2: 274564616, 0 out of 1024 good, Bayesian Score: -3.018
B3: -1775057221, 0 out of 982 good, Bayesian Score: -2.978
B4: 48625803, 0 out of 740 good, Bayesian Score: -2.712
B5: 899570811, 0 out of 738 good, Bayesian Score: -2.709

Active compounds with MIC < 5 uM

Laplacian-corrected Bayesian classifier models were generated using FCFP-6 and simple descriptors. Two models were built, on ~220,000 and >2,000 compounds.

Ekins et al., Mol BioSyst, 6: 840-851, 2010
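As a rough consistency check, the feature scores listed on this slide can be approximately reproduced with a Laplacian-corrected weight of the form ln((good + 1) / (total * p_base + 1)). Both ingredients are assumptions: the formula is the commonly described one, and the baseline active fraction `P_BASE` = 0.019 is back-calculated from the slide's own scores, not stated in the talk.

```python
import math

P_BASE = 0.019  # assumed baseline active fraction, back-calculated from the scores


def laplacian_weight(good, total, p_base=P_BASE):
    """Laplacian-corrected log weight for one feature (assumed formulation)."""
    return math.log((good + 1) / (total * p_base + 1))


# (label, good count, total count, Bayesian score reported on the slide)
features = [
    ("G1", 73, 165, 2.885),
    ("G2", 57, 120, 2.873),
    ("G3", 75, 188, 2.811),
    ("G4", 35, 65, 2.780),
    ("G5", 123, 357, 2.769),
    ("B1", 0, 1158, -3.135),
    ("B2", 0, 1024, -3.018),
    ("B3", 0, 982, -2.978),
    ("B4", 0, 740, -2.712),
    ("B5", 0, 738, -2.709),
]

# Largest absolute gap between the recomputed and the reported scores.
max_diff = 0.0
for label, good, total, reported in features:
    computed = laplacian_weight(good, total)
    max_diff = max(max_diff, abs(computed - reported))
    print(f"{label}: computed {computed:.3f}, reported {reported:.3f}")
```

Under these assumptions all ten recomputed values agree with the reported scores to within 0.01. That is consistent with the scores being log ratios against a baseline hit rate of about 2%, but it does not prove that this is how they were calculated.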

Page 12

Bayesian Classification Dose response

Good

Bad

Ekins et al., Mol BioSyst, 6: 840-851, 2010

Page 13

Initial testing of Mtb Bayesian models using NIAID and GVKbio data

Both models performed substantially better than the random hit rate at identifying known active compounds (5 uM MIC cutoff) in the first 1000 compounds sorted by the Bayesian model scores.

The number of active compounds was substantially larger in the NIAID dataset (1871 out of 3748) than in the GVKbio dataset (377 out of 2880).

Ekins et al., Mol BioSyst, 6: 840-851, 2010

Page 14

Additional test sets

100K library
Novartis Data
FDA drugs

Suggests models can predict data from the same and independent labs
Enrichments 4-10 fold
Initial enrichment enables screening few compounds to find actives

21 hits in 2108 cpds
34 hits in 248 cpds
1702 hits in >100K cpds

Ekins and Freundlich, Pharm Res, 28, 1859-1869, 2011.
Ekins et al., Mol BioSyst, 6: 840-851, 2010
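Fold enrichment, as quoted above, is conventionally the hit rate among the top-scored compounds divided by the hit rate of the whole set. The sketch below assumes that definition. The baseline counts come from this slide, while the count of hits in the top 100 is a made-up value used only to show the calculation.

```python
def hit_rate(hits, total):
    """Fraction of compounds that are hits."""
    return hits / total


def fold_enrichment(hits_in_top, n_top, hits_total, n_total):
    """Hit rate in the top-ranked subset relative to the whole-set hit rate."""
    return hit_rate(hits_in_top, n_top) / hit_rate(hits_total, n_total)


# Baseline (random) hit rates implied by the counts on this slide.
# ">100K" is taken as exactly 100,000 here, so that rate is an upper bound.
baselines = {
    "21 hits in 2108 cpds": hit_rate(21, 2108),
    "34 hits in 248 cpds": hit_rate(34, 248),
    "1702 hits in >100K cpds": hit_rate(1702, 100_000),
}
for name, rate in baselines.items():
    print(f"{name}: baseline hit rate {100 * rate:.2f}%")

# Hypothetical ranking outcome: 5 of the 21 hits fall in the top 100 scored compounds.
example = fold_enrichment(hits_in_top=5, n_top=100, hits_total=21, n_total=2108)
print(f"fold enrichment: {example:.1f}")
```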

Page 15

Dual-Event models

Become more stringent in what we call an ACTIVE

An active has IC90 < 10 uM and a selectivity index (SI) greater than ten. SI was calculated as SI = CC50/IC90, where CC50 is the concentration that resulted in 50% inhibition of Vero cells.
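The dual-event activity definition above can be sketched as a simple labelling function. The thresholds are the ones on the slide; the function names and the example values are hypothetical, and IC90 and CC50 are assumed to be in the same units.

```python
def selectivity_index(cc50, ic90):
    """SI = CC50 / IC90, with both values in the same concentration units."""
    return cc50 / ic90


def is_dual_event_active(ic90, cc50, ic90_cutoff=10.0, si_cutoff=10.0):
    """Active only if potent (IC90 below cutoff) and selective (SI above cutoff)."""
    return ic90 < ic90_cutoff and selectivity_index(cc50, ic90) > si_cutoff


# Hypothetical compounds as (IC90, CC50) pairs.
examples = {
    "potent and selective": (2.0, 50.0),       # SI = 25
    "potent but cytotoxic": (2.0, 15.0),       # SI = 7.5
    "selective but not potent": (12.0, 500.0),  # IC90 above the cutoff
}
for name, (ic90, cc50) in examples.items():
    print(name, is_dual_event_active(ic90, cc50))
```

Censored measurements (for example a CC50 reported only as ">100") would need an explicit rule before they can be labelled. This sketch does not handle them.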

Page 16

Dual-Event models

Workflow diagram, box labels:

High-throughput phenotypic Mtb screening
Mtb screening molecule database
Descriptors + Bioactivity (+Cytotoxicity)
Bayesian Machine Learning Mtb Model
Molecule Database (e.g. GSK malaria actives) virtually scored using Bayesian Models
Top scoring molecules assayed for Mtb growth inhibition
Identify in vitro hits
New bioactivity data may enhance models
Increased hit/lead discovery efficiency

Page 17

Bayesian Classification TB Models

| Dataset (number of molecules) | External ROC score | Internal ROC score | Concordance | Specificity | Sensitivity |
| --- | --- | --- | --- | --- | --- |
| MLSMR all single point screen (N = 220463) | 0.86 ± 0 | 0.86 ± 0 | 78.56 ± 1.86 | 78.59 ± 1.94 | 77.13 ± 2.26 |
| MLSMR dose response set (N = 2273) | 0.73 ± 0.01 | 0.75 ± 0.01 | 66.85 ± 4.06 | 67.21 ± 7.05 | 65.47 ± 7.96 |
| NEW dose response and cytotoxicity (N = 2273) | 0.82 ± 0.02 | 0.84 ± 0.02 | 82.61 ± 4.68 | 83.91 ± 5.48 | 65.99 ± 7.47 |

Single pt ROC XV AUC = 0.88
Dose resp = 0.78
Dose resp + cyto = 0.86

Ekins et al., PLOSONE, in press 2013

Page 18

Good

bad

MLSMR dual event model

Ekins et al., PLOSONE, in press 2013

Page 19

A new dataset to model

Page 20

Models with SRI kinase data

Model 1 ROC XV AUC (N 23797) = 0.89
Model 2 (N 1248) = 0.72
Model 3 (N 1248) = 0.77

Leave out 50% x 100

| Dataset (number of molecules) | External ROC score | Internal ROC score | Concordance | Specificity | Sensitivity |
| --- | --- | --- | --- | --- | --- |
| Model 1 (N = 23797) | 0.87 ± 0 | 0.88 ± 0 | 76.77 ± 2.14 | 76.49 ± 2.41 | 81.7 ± 2.96 |
| Model 2 (N = 1248) | 0.65 ± 0.01 | 0.70 ± 0.01 | 61.58 ± 1.56 | 61.85 ± 8.45 | 61.30 ± 8.24 |
| Model 3 (N = 1248) | 0.74 ± 0.02 | 0.75 ± 0.02 | 68.67 ± 6.88 | 69.28 ± 9.84 | 64.84 ± 12.11 |

Ekins et al., PLOSONE, in press 2013

Page 21

Testing to date has been retrospective

Can we use our models to select compounds and influence design?

Prospective prediction

Do it enough times to show robustness

Page 22

Testing prospectively

The MLSMR dose response with cytotoxicity model and the TAACF kinase dose response with cytotoxicity model were used to screen the:

Asinex library (N = 25,008)
Maybridge library (N = 57,200)
Selleck Chemicals kinase library (N = 194)

Page 23

Results - Asinex library

94 molecules were selected with the MLSMR dose response and cytotoxicity model and 88 with the model built on the kinase inhibitor scaffold library with cytotoxicity; all were tested at a single concentration.

8 hits (MLSMR model) and 19 hits (kinase model) showed > 90% inhibition at 100 ug/ml (8.5% and 21.5% hit rates)

Results - Maybridge library

50 molecules had greater than or equal to 90% inhibition at 100 ug/ml (28.7% hit rate), 8 of them with good SI

Ekins et al., PLOSONE, in press 2013

Page 24

Asinex and MLSMR actives PCA

Ekins et al., PLOSONE, in press 2013

Page 25

Examples of selective and active compounds with MIC <10 ug/ml

Page 26

An example of the model ranking similar compounds

| Maybridge number | Inhibition % MABA at 100 ug/ml | Inhibition % LORA at 100 ug/ml | MIC MABA (ug/ml) | MIC LORA (ug/ml) | CC50 Vero (ug/ml) | MLSMR model score | Kinase model score |
| --- | --- | --- | --- | --- | --- | --- | --- |
| JFD02381 | 98.9 | 95 | 5.84 | 10.09 | >100 | 25.27 (0.80) | 12.79 (0.5) |
| JFD02382 | 91.5 | 90.1 | >100 | 47.99 | >100 | 18.32 (0.69) | 9.78 (0.43) |

(The Structure column on the slide held chemical structure drawings, which are not reproduced here.)

Page 27

Analysis of SelleckChem Kinase library N=194

47 molecules showed greater than or equal to 90% inhibition of M. tuberculosis at 100 ug/ml, a hit rate of 24.2%.

Note: the best model was another dual activity model (Ekins et al., Chem Biol 20: 370-378, 2013)

Ekins et al., PLOSONE, in press 2013

Page 28

Kinase inhibitors active vs Mtb

SI not ideal; several other weaker actives are approved drugs

Page 29

A summary of the numbers involved – filtering for hits.

82,403 molecules screened through Bayesian models

550 molecules were tested in vitro

124 actives were identified

22.5 % hit rate

Identified several novel potent lead series with good cytotoxicity & selectivity

Identified known human kinase inhibitors and FDA approved drugs as new hits
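The summary numbers can be cross-checked against the per-library counts quoted on the earlier results slides. In the sketch below the number of Maybridge molecules tested (174) is not stated in the talk; it is inferred as 550 minus the other three counts, and it is consistent with the quoted 28.7% hit rate. The labels are illustrative.

```python
# (library / model, molecules tested, hits with >= 90% inhibition at 100 ug/ml)
results = [
    ("Asinex, MLSMR dose response + cytotoxicity model", 94, 8),
    ("Asinex, kinase dose response + cytotoxicity model", 88, 19),
    ("Maybridge", 174, 50),  # tested count inferred: 550 - 94 - 88 - 194
    ("Selleck Chemicals kinase library", 194, 47),
]

total_tested = sum(tested for _, tested, _ in results)
total_hits = sum(hits for _, _, hits in results)
overall_rate = 100 * total_hits / total_tested

for name, tested, hits in results:
    print(f"{name}: {hits}/{tested} = {100 * hits / tested:.1f}%")
print(f"Overall: {total_hits}/{total_tested} = {overall_rate:.1f}%")
```

The totals reproduce the 550 molecules tested, 124 actives and 22.5% hit rate on this slide, compared with the "usually less than 1%" quoted earlier for conventional HTS.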

Page 30

Conclusions

Still difficult to identify molecules with bioactivity and no cytotoxicity

Models perform differently on different data sets

Need to understand what factors are key

Hit rate much higher than HTS / screen a fraction of molecules

Computational models should be used prior to HTS

Focus resources

Page 31

Acknowledgments

The project described was supported by Award Number R43 LM011152-01 “Biocomputation across distributed private datasets to enhance drug discovery” from the National Library of Medicine (PI: S. Ekins)

Accelrys

The CDD TB has been developed thanks to funding from the Bill and Melinda Gates Foundation (Grant#49852 “Collaborative drug discovery for TB through a novel database of SAR data optimized to promote data archiving and sharing”)

Allen Casey (IDRI)

Page 32

You can find me @... CDD Booth 205

PAPER ID: 13433
PAPER TITLE: “Dispensing processes profoundly impact biological assays and computational and statistical analyses”
April 8th 8.35am Room 349

PAPER ID: 14750
PAPER TITLE: “Enhancing High Throughput Screening For Mycobacterium tuberculosis Drug Discovery Using Bayesian Models”
April 9th 1.30pm Room 353

PAPER ID: 21524
PAPER TITLE: “Navigating between patents, papers, abstracts and databases using public sources and tools”
April 9th 3.50pm Room 350

PAPER ID: 13358
PAPER TITLE: “TB Mobile: Appifying Data on Anti-tuberculosis Molecule Targets”
April 10th 8.30am Room 357

PAPER ID: 13382
PAPER TITLE: “Challenges and recommendations for obtaining chemical structures of industry-provided repurposing candidates”
April 10th 10.20am Room 350

PAPER ID: 13438
PAPER TITLE: “Dual-event machine learning models to accelerate drug discovery”
April 10th 3.05pm Room 350