SLAS2016: Why have one model when you could have thousands?
TRANSCRIPT
![Page 1: SLAS2016: Why have one model when you could have thousands?](https://reader031.vdocument.in/reader031/viewer/2022030218/588617ea1a28abe63e8b68ab/html5/thumbnails/1.jpg)
Why have one model when you could have thousands?
Alex M. Clark, Ph.D.
January 2016
© 2016 Molecular Materials Informatics, Inc. http://molmatinf.com
![Page 2: SLAS2016: Why have one model when you could have thousands?](https://reader031.vdocument.in/reader031/viewer/2022030218/588617ea1a28abe63e8b68ab/html5/thumbnails/2.jpg)
MOLECULAR MATERIALS INFORMATICS
Cheminformatics
• Generally 2D structures with activities:
• Look for trends: structure-activity relationships
• Leverages quantity rather than detail... but quality is also supremely important
![Page 3: SLAS2016: Why have one model when you could have thousands?](https://reader031.vdocument.in/reader031/viewer/2022030218/588617ea1a28abe63e8b68ab/html5/thumbnails/3.jpg)
Structure-Activity Models
• Bayesian models very effective
• Tabulate structure fingerprints for actives vs. inactives
• Prediction: ordering, probability
• Low maintenance
• ECFP6 fingerprints, e.g. 10001001000001101001011101110111
• ROC integral: 0.8343
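A minimal sketch of how tabulating fingerprints for actives vs. inactives can produce a score for ordering. It assumes the Laplacian-corrected naive Bayes formulation that is often paired with ECFP-style fingerprints; the slides do not state the formula. The names `train_bayesian` and `predict` are hypothetical, and fingerprints are represented as plain sets of feature indices rather than real ECFP6 hashes.

```python
import math
from collections import Counter


def train_bayesian(fingerprints, actives):
    """Tabulate feature counts and return a per-feature contribution table.

    fingerprints: list of sets of feature indices (stand-ins for ECFP6 bits)
    actives:      list of bools, True where the molecule is active
    Assumed formula: log((A_f + 1) / (T_f * P + 1)), where A_f is the number
    of actives containing feature f, T_f the total number of molecules
    containing f, and P the overall fraction of actives.
    """
    total = Counter()
    active = Counter()
    for fp, is_active in zip(fingerprints, actives):
        for feature in fp:
            total[feature] += 1
            if is_active:
                active[feature] += 1
    base_rate = sum(actives) / len(actives)
    return {
        f: math.log((active[f] + 1) / (total[f] * base_rate + 1))
        for f in total
    }


def predict(model, fp):
    """Sum the contributions of the features present; unseen features add 0."""
    return sum(model.get(feature, 0.0) for feature in fp)


# Toy training set: feature 1 occurs only in actives, feature 4 only in inactives.
toy_fps = [{1, 2}, {1, 3}, {2, 4}, {3, 4}]
toy_actives = [True, True, False, False]
toy_model = train_bayesian(toy_fps, toy_actives)
```

The raw score is only meaningful for ordering molecules within one model; turning it into a probability needs a separate calibration step that is not shown here.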
![Page 4: SLAS2016: Why have one model when you could have thousands?](https://reader031.vdocument.in/reader031/viewer/2022030218/588617ea1a28abe63e8b68ab/html5/thumbnails/4.jpg)
The Data Problem
• > 10 years ago: quantity the biggest issue
- open structure-activity data rare and small
- paid collections, big pharma registration
• ~5 years ago: quality the biggest issue
- huge databases, e.g. PubChem, ChemSpider, ZINC, vendors, etc.
- generally no provenance: anything goes
• Cheminformatics seemed to be stagnant...
- new methods, same mediocre performance
![Page 5: SLAS2016: Why have one model when you could have thousands?](https://reader031.vdocument.in/reader031/viewer/2022030218/588617ea1a28abe63e8b68ab/html5/thumbnails/5.jpg)
The Data Solution
• Recently: some excellent developments
- Open Melting Points: models actually work
- PubChem: direct submission by scientists
- CDD: store and share with same platform
- ChEMBL: large, open, high quality, broad
• Can now have quantity and quality, without fees or restrictions
• Evidence suggests that the data was holding us back, not the methods
![Page 6: SLAS2016: Why have one model when you could have thousands?](https://reader031.vdocument.in/reader031/viewer/2022030218/588617ea1a28abe63e8b68ab/html5/thumbnails/6.jpg)
ChEMBL
• Hierarchy looks like this:
- target → assay → activity → molecule
• What we need it to be:
- [Diagram labels: dataset, target, assay, activity, molecule, merged activity, materials for model]
![Page 7: SLAS2016: Why have one model when you could have thousands?](https://reader031.vdocument.in/reader031/viewer/2022030218/588617ea1a28abe63e8b68ab/html5/thumbnails/7.jpg)
Slicing & Dicing
• Divide by target, species and type of assay (protein binding, whole cell, ADMET, etc.)
• Measurements: [Ki, Kd] or [IC50, EC50, AC50, GI50]
• Units: [M, mM, μM, nM]
• Relations: [=, <, >, ≤, ≥]
• Total of 8646 groups of structure-activity data
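Mixing units and relations only works once everything is on one scale. The sketch below assumes the common convention of converting each concentration to molar and taking the negative base-10 logarithm (the "p" scale, as in pIC50); on that scale an upper bound on concentration becomes a lower bound, so the relation flips. The names `UNIT_TO_MOLAR`, `FLIPPED` and `to_p_scale` are hypothetical.

```python
import math

# Conversion factors from the units listed above to molar.
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "μM": 1e-6, "nM": 1e-9}

# Taking -log10 reverses ordering, so inequality relations swap direction.
FLIPPED = {"=": "=", "<": ">", ">": "<", "≤": "≥", "≥": "≤"}


def to_p_scale(relation, value, unit):
    """Convert (relation, value, unit) to (relation, -log10(molar value))."""
    if value <= 0:
        raise ValueError("concentration must be positive")
    molar = value * UNIT_TO_MOLAR[unit]
    return FLIPPED[relation], -math.log10(molar)


# An IC50 reported as "< 10 nM" becomes a p-value greater than about 8.
example_relation, example_p = to_p_scale("<", 10, "nM")
```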
![Page 8: SLAS2016: Why have one model when you could have thousands?](https://reader031.vdocument.in/reader031/viewer/2022030218/588617ea1a28abe63e8b68ab/html5/thumbnails/8.jpg)
Consolidation
• Strip salts / adducts
• Common organic elements only:
- [H, C, N, O, P, S, F, Cl, Br, I, B, Si, Se, As, Sb, Te]
• Duplicate molecules: merge activities, e.g.
- [1.2, 1.8] ➡ 1.5 ± 0.3
- [> 5, 5.5] ➡ > 5
- [< 1, 3.5] ➡ invalid
• Keep groups with at least 100 molecules remaining
• Now down to 1839 datasets
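The three merge examples above can be reproduced with a small rule set. This is a sketch that is consistent with those examples, not necessarily the rules actually used: exact values are averaged with half their range as the spread, a bound is kept when every exact value agrees with it, and anything contradictory is rejected. The function name `merge_activities`, the tuple layout, and the decision to give up on mixed upper and lower bounds are all assumptions.

```python
from statistics import mean


def merge_activities(measurements):
    """Merge duplicate measurements for one molecule.

    measurements: list of (relation, value) with relation in =, <, >, ≤, ≥
    Returns (relation, value, spread), or None when the data is contradictory.
    Simplification: ≤ and ≥ are treated like < and >.
    """
    exact = [v for r, v in measurements if r == "="]
    lower = [v for r, v in measurements if r in (">", "≥")]
    upper = [v for r, v in measurements if r in ("<", "≤")]

    if lower and upper:
        return None  # mixed bounds: this sketch does not try to reconcile them
    if lower:
        bound = max(lower)
        if any(v < bound for v in exact):
            return None  # an exact value falls below the lower bound
        return (">", bound, 0.0)
    if upper:
        bound = min(upper)
        if any(v > bound for v in exact):
            return None  # an exact value exceeds the upper bound
        return ("<", bound, 0.0)
    if not exact:
        return None
    return ("=", mean(exact), (max(exact) - min(exact)) / 2)


merged_exact = merge_activities([("=", 1.2), ("=", 1.8)])    # about 1.5 ± 0.3
merged_bound = merge_activities([(">", 5.0), ("=", 5.5)])    # > 5
merged_invalid = merge_activities([("<", 1.0), ("=", 3.5)])  # invalid
```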
![Page 9: SLAS2016: Why have one model when you could have thousands?](https://reader031.vdocument.in/reader031/viewer/2022030218/588617ea1a28abe63e8b68ab/html5/thumbnails/9.jpg)
Model Building
• Bayesian models need a threshold...
[Figure: pIC50 scale split by a threshold into inactive and active]
• Suitable values often known; for large-scale automation, must estimate
• Score: population, balance, trial Bayesian
• See J. Chem. Inf. Model. 55, 1246-1260 (2015)
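To give a feel for automated threshold estimation, the sketch below covers only the "balance" idea, with a minimum population per class: it tries each midpoint between adjacent distinct values and keeps the one that splits actives and inactives most evenly. It is not the scoring function from the cited paper, which also involves a trial Bayesian model. The name `pick_threshold` and the `min_per_class` parameter are hypothetical.

```python
def pick_threshold(p_values, min_per_class=2):
    """Return the cut that best balances inactive (< cut) and active (>= cut).

    Candidate cuts are midpoints between adjacent distinct values. Cuts that
    leave fewer than min_per_class molecules on either side are skipped.
    Returns None when no candidate qualifies.
    """
    ordered = sorted(set(p_values))
    best_cut, best_score = None, -1.0
    for low, high in zip(ordered, ordered[1:]):
        cut = (low + high) / 2
        n_active = sum(v >= cut for v in p_values)
        n_inactive = len(p_values) - n_active
        if min(n_active, n_inactive) < min_per_class:
            continue
        # Balance score: 1.0 for an even split, approaching 0 when lopsided.
        score = min(n_active, n_inactive) / max(n_active, n_inactive)
        if score > best_score:
            best_cut, best_score = cut, score
    return best_cut


toy_threshold = pick_threshold([4.0, 4.5, 5.0, 7.0, 7.5, 8.0])
```

A pure balance criterion ignores whether the resulting split is learnable, which is presumably why a trial model is part of the score mentioned above.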
![Page 10: SLAS2016: Why have one model when you could have thousands?](https://reader031.vdocument.in/reader031/viewer/2022030218/588617ea1a28abe63e8b68ab/html5/thumbnails/10.jpg)
Model Results
• Metrics generally good for Bayesian models using ECFP6 fingerprints
• Note that some datasets have no SAR at all
[Figure: AUC (easy) and AUC (hard), each plotted against population]
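For reference, the AUC metric can be computed directly from model scores and active/inactive labels. The sketch below uses the standard pairwise definition: the fraction of (active, inactive) pairs in which the active scores higher, counting ties as half. The function name `roc_auc` is hypothetical, and how the "easy" and "hard" test partitions were constructed is not covered.

```python
def roc_auc(scores, actives):
    """Area under the ROC curve via pairwise comparison of scores.

    scores:  list of model outputs (higher means more likely active)
    actives: list of bools, True where the molecule is active
    """
    positives = [s for s, a in zip(scores, actives) if a]
    negatives = [s for s, a in zip(scores, actives) if not a]
    if not positives or not negatives:
        raise ValueError("need at least one active and one inactive")
    wins = sum(
        (p > n) + 0.5 * (p == n)
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Three of the four active/inactive pairs are ranked correctly.
toy_auc = roc_auc([0.9, 0.8, 0.3, 0.1], [True, False, True, False])
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect separation. The pairwise loop is quadratic, which is fine for illustration but slow for large datasets.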
![Page 11: SLAS2016: Why have one model when you could have thousands?](https://reader031.vdocument.in/reader031/viewer/2022030218/588617ea1a28abe63e8b68ab/html5/thumbnails/11.jpg)
Deliverable
• Datasets with acceptable models: 1826
- list of unique molecules
- activity (standard molar units)
- threshold (active/inactive)
- target & assay provenance
- Bayesian model (ECFP6)
• Targets are diverse, data is high quality: thanks to the ChEMBL project
• Can apply all models to any molecule...
• Start with a set of discontinued drugs...
![Page 12: SLAS2016: Why have one model when you could have thousands?](https://reader031.vdocument.in/reader031/viewer/2022030218/588617ea1a28abe63e8b68ab/html5/thumbnails/12.jpg)
Discontinued Drugs
• ~50 drugs that passed most tests, but never made it to market
• Maybe they cure something else?
![Page 13: SLAS2016: Why have one model when you could have thousands?](https://reader031.vdocument.in/reader031/viewer/2022030218/588617ea1a28abe63e8b68ab/html5/thumbnails/13.jpg)
Detail & Visualisation
Atom-centric Bayesian
Honeycomb clustering
![Page 14: SLAS2016: Why have one model when you could have thousands?](https://reader031.vdocument.in/reader031/viewer/2022030218/588617ea1a28abe63e8b68ab/html5/thumbnails/14.jpg)
PolyPharma app
• Proof of concept tools being explored for several drug discovery collaborations
• Interactive functionality demonstrated as a mobile app for iPhone & iPad
• Free to use
http://itunes.apple.com/app/polypharma/id1025327772
![Page 15: SLAS2016: Why have one model when you could have thousands?](https://reader031.vdocument.in/reader031/viewer/2022030218/588617ea1a28abe63e8b68ab/html5/thumbnails/15.jpg)
![Page 16: SLAS2016: Why have one model when you could have thousands?](https://reader031.vdocument.in/reader031/viewer/2022030218/588617ea1a28abe63e8b68ab/html5/thumbnails/16.jpg)
![Page 17: SLAS2016: Why have one model when you could have thousands?](https://reader031.vdocument.in/reader031/viewer/2022030218/588617ea1a28abe63e8b68ab/html5/thumbnails/17.jpg)
![Page 18: SLAS2016: Why have one model when you could have thousands?](https://reader031.vdocument.in/reader031/viewer/2022030218/588617ea1a28abe63e8b68ab/html5/thumbnails/18.jpg)
![Page 19: SLAS2016: Why have one model when you could have thousands?](https://reader031.vdocument.in/reader031/viewer/2022030218/588617ea1a28abe63e8b68ab/html5/thumbnails/19.jpg)
![Page 20: SLAS2016: Why have one model when you could have thousands?](https://reader031.vdocument.in/reader031/viewer/2022030218/588617ea1a28abe63e8b68ab/html5/thumbnails/20.jpg)
![Page 21: SLAS2016: Why have one model when you could have thousands?](https://reader031.vdocument.in/reader031/viewer/2022030218/588617ea1a28abe63e8b68ab/html5/thumbnails/21.jpg)
![Page 22: SLAS2016: Why have one model when you could have thousands?](https://reader031.vdocument.in/reader031/viewer/2022030218/588617ea1a28abe63e8b68ab/html5/thumbnails/22.jpg)
![Page 23: SLAS2016: Why have one model when you could have thousands?](https://reader031.vdocument.in/reader031/viewer/2022030218/588617ea1a28abe63e8b68ab/html5/thumbnails/23.jpg)
![Page 24: SLAS2016: Why have one model when you could have thousands?](https://reader031.vdocument.in/reader031/viewer/2022030218/588617ea1a28abe63e8b68ab/html5/thumbnails/24.jpg)
Acknowledgments
http://molmatinf.com http://molsync.com http://cheminf20.org
@aclarkxyz
• Collaborative Drug Discovery
• Sean Ekins
• Society for Laboratory Automation & Screening
• Inquiries to [email protected]