

Page 1

Olexandr Isayev, Ph.D., University of North Carolina at Chapel Hill

@olexandr http://olexandrisayev.com

How deep learning can help to design better and safer medicine?

KinomeNet: multi-task deep convolutional network


Page 2

About me

Ph.D. in Chemistry (computational)

Minor in CS

Worked in a federal research lab on HPC and GPU computing to solve chemical problems

Now I am a research faculty member at the University of North Carolina at Chapel Hill

http://olexandrisayev.com

I am also Director of Drug Discovery at Atlas Regeneration, where we use AI and multi-omics to develop regenerative medicine and stem cell differentiation technologies.

http://atlasregeneration.com/

Page 3

A public-private partnership that supports the discovery of new medicines through open-access research: www.thesgc.org

Page 4

How are drugs discovered?

Page 5

The Long and Winding Road to Drug Discovery

Data science approaches are useful across the pipeline, but the techniques differ greatly from stage to stage

Aim for success, but if not: fail early, fail cheap

Page 6

Medicines Are Transforming the Treatment of Many Diseases

Page 7

Robotic biological testing (high-throughput screening, HTS)

Robotic synthesis

Page 10

Drowning in Data but Starving for Knowledge

The rapid growth of materials research has led to the accumulation of vast amounts of data, for example:

• 160,000 entries in the Inorganic Crystal Structure Database (ICSD)
• Numerous commercial and open experimental databases: NIST, MatWeb, MatBase, etc.
• Vast computational databases such as AFLOWLIB, Materials Project, and the Harvard Clean Energy Project

Page 11

Scannell et al., Nature Reviews Drug Discovery, 2012, 11, 191–200

Decline in Pharmaceutical R&D efficiency

The cost of developing a new drug (~$2–3B) roughly doubles every nine years.
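To make the growth rate concrete, here is a small Python sketch of the arithmetic behind "doubles every nine years": the cost multiplier after a given number of years and the equivalent compound annual growth rate. The $2B starting point is simply the low end of the range above, and the function names are illustrative.

```python
def cost_multiplier(years: float, doubling_time: float = 9.0) -> float:
    """Factor by which cost grows after `years` if it doubles every `doubling_time` years."""
    return 2.0 ** (years / doubling_time)


def annual_growth_rate(doubling_time: float = 9.0) -> float:
    """Compound annual growth rate equivalent to the given doubling time."""
    return 2.0 ** (1.0 / doubling_time) - 1.0


start_cost_billion = 2.0  # low end of the ~$2-3B range on the slide
for years in (9, 18, 27):
    print(f"after {years} years: ~${start_cost_billion * cost_multiplier(years):.0f}B")
print(f"implied annual growth: {annual_growth_rate():.1%}")  # about 8.0%
```

Doubling every nine years works out to roughly 8% compound growth per year, so the cost quadruples in 18 years.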

Page 12

Why Do Drugs Fail?

Page 13

Selectivity of Kinase Inhibitors

All kinases bind ATP and therefore contain a conserved binding site

Most compounds inhibit more than one kinase
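A simple way to put a number on this promiscuity is a selectivity score: the fraction of tested kinases that a compound inhibits below some potency cutoff. The sketch below assumes a 1 µM IC50 cutoff (published scores use various thresholds and assay types); the kinase names and IC50 values are hypothetical.

```python
from typing import Dict


def selectivity_score(ic50_nm: Dict[str, float], threshold_nm: float = 1000.0) -> float:
    """Fraction of tested kinases inhibited below `threshold_nm` (lower = more selective)."""
    if not ic50_nm:
        raise ValueError("no kinases tested")
    hits = sum(1 for ic50 in ic50_nm.values() if ic50 < threshold_nm)
    return hits / len(ic50_nm)


# Hypothetical IC50 profile (nM) of one compound against four kinases.
profile = {"kinase_A": 25.0, "kinase_B": 150.0, "kinase_C": 400.0, "kinase_D": 12000.0}
print(selectivity_score(profile))  # 3 of 4 below 1 µM -> 0.75
```

A perfectly selective compound scores 1/N when N kinases are tested; values near 1 indicate a promiscuous inhibitor.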

Page 14

Why Don’t We Do Better? A Couple of Observations

• Tykerb – Breast cancer

• Gleevec – Leukemia, GI cancers

• Nexavar – Kidney and liver cancer

• Staurosporine – natural product (alkaloid) with many uses, e.g., antifungal, antihypertensive

Collins and Workman, Nature Chemical Biology, 2006, 2, 689–700

>40% of biologically active compounds bind to more than one target

Page 15

VIRTUAL SCREENING

Virtual screening to identify potential hits

~10⁶–10⁷ molecules → empirical rules/filters, similarity search, ML or QSAR models, structure-based models → consensus QSAR → potential hits (~10²–10³ candidate molecules)
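The funnel can be illustrated with a toy two-stage screen in Python: an empirical filter (here a molecular-weight cutoff of 500 Da, borrowed from Lipinski's rule of five) followed by a Tanimoto similarity search against one known active. Fingerprints are represented as sets of on-bit indices; the compounds, bits, and weights are made up, and the ML/QSAR, structure-based, and consensus stages are omitted.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # indices of "on" bits in a binary fingerprint


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity of two binary fingerprints."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def screen(library: Dict[str, Tuple[float, Fingerprint]], query: Fingerprint,
           max_mw: float = 500.0, top_k: int = 2) -> List[Tuple[str, float]]:
    """Filter by a simple empirical rule, then rank survivors by similarity to a known active."""
    # Stage 1: empirical rule/filter (molecular weight cutoff).
    passed = {name: fp for name, (mw, fp) in library.items() if mw <= max_mw}
    # Stage 2: similarity search against the query active.
    scored = [(name, tanimoto(fp, query)) for name, fp in passed.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    # Keep the top-ranked compounds as potential hits.
    return scored[:top_k]


query_active = frozenset({1, 4, 7, 9})
library = {
    "cmpd_1": (320.0, frozenset({1, 4, 7, 10})),
    "cmpd_2": (610.0, frozenset({1, 4, 7, 9})),  # identical bits, but fails the MW filter
    "cmpd_3": (280.0, frozenset({2, 3, 5})),
    "cmpd_4": (450.0, frozenset({1, 4, 9})),
}
print(screen(library, query_active))  # [('cmpd_4', 0.75), ('cmpd_1', 0.6)]
```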

Page 16

Our vision for next-gen cheminformatics platforms

• Scale up machine learning methods with the data
• Use the full variety of available data (-omics, sensors, etc.)
• Take advantage of the latest algorithmic developments: deep learning

Page 17

Collected all human kinase data from open sources

• ChEMBL
• PKIS
• PubChem
• Private datasets
• Literature, patents, etc.

300,000+ Molecules

489 Targets

>800,000 Experimental data points

Largest target dataset: >25,000 molecules; smallest: 1 molecule

Human Kinase Inhibitor Data Collection

Page 18

Human Kinase IC50 Data Distribution 

“Popular” targets

“Rare” targets
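The popular/rare split can be sketched as counting measurements per target over (molecule, target, IC50) records. The slides do not state the cutoff used to call a target "rare", so `min_points` below is a hypothetical parameter, and the records are invented for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Record = Tuple[str, str, float]  # (molecule_id, target, IC50 in nM)


def split_targets(records: Iterable[Record], min_points: int) -> Tuple[List[str], List[str]]:
    """Split targets into "popular" (>= min_points measurements) and "rare" (< min_points)."""
    counts = Counter(target for _, target, _ in records)
    popular = sorted(t for t, n in counts.items() if n >= min_points)
    rare = sorted(t for t, n in counts.items() if n < min_points)
    return popular, rare


# Invented measurements; in the collection described above, per-target counts
# range from 1 to more than 25,000 molecules.
records = [
    ("mol_1", "ABL1", 12.0),
    ("mol_2", "ABL1", 300.0),
    ("mol_3", "ABL1", 45.0),
    ("mol_1", "ZAP70", 800.0),
    ("mol_4", "TYMS", 5000.0),
]
print(split_targets(records, min_points=2))  # (['ABL1'], ['TYMS', 'ZAP70'])
```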

Page 19

Convolutional Neural Network (ConvNet)

Page 20

Convolution Function (Filter)

Comes from image and signal processing

The easiest way to understand a convolution is by thinking of it as a sliding window function applied to a matrix.

Many groundbreaking deep learning (DL) results are based on networks with convolutional filters:

• Image recognition
• Object detection
• Medical image processing
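The sliding-window description translates directly into code. This pure-Python sketch applies a kernel to a matrix with stride 1 and no padding ("valid" mode); like most deep learning libraries, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
from typing import List

Matrix = List[List[float]]


def conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide `kernel` over `image` (stride 1, no padding) and sum elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the window whose top-left corner is at (i, j).
            window_sum = sum(image[i + di][j + dj] * kernel[di][dj]
                             for di in range(kh) for dj in range(kw))
            row.append(window_sum)
        output.append(row)
    return output


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(conv2d(image, [[1, 1], [1, 1]]))  # [[12, 16], [24, 28]]
```

A 3×3 input and a 2×2 kernel give a 2×2 output; in a ConvNet the kernel values are learned rather than fixed.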

Page 21

Different Levels of Abstraction 

• Hierarchical Learning 

• Natural progression from low-level to high-level structure, as seen in natural complexity 

• Easier to monitor what is being learnt and to guide the machine to better subspaces 

• A good lower-level representation can be used for many distinct tasks 

Page 22

KinomeNet: Convolutional Neural Network for QSAR

Diagram: 2D matrix of descriptors → ConvNet → multitask learning (253 targets, e.g., ABL1, ACVR1, ZAK, ZAP70)
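The slides do not say how the descriptors are arranged into a 2D matrix. One simple possibility, assumed here purely for illustration and not necessarily what KinomeNet does, is to zero-pad a descriptor vector to the next perfect square and reshape it row by row:

```python
import math
from typing import List


def descriptors_to_matrix(descriptors: List[float]) -> List[List[float]]:
    """Zero-pad a 1D descriptor vector to the next perfect square and reshape it row by row."""
    n = len(descriptors)
    side = math.isqrt(n - 1) + 1 if n else 0  # smallest side with side * side >= n
    padded = descriptors + [0.0] * (side * side - n)
    return [padded[r * side:(r + 1) * side] for r in range(side)]


print(descriptors_to_matrix([0.1, 0.2, 0.3, 0.4, 0.5]))
# [[0.1, 0.2, 0.3], [0.4, 0.5, 0.0], [0.0, 0.0, 0.0]]
```

With this kind of layout the convolution treats descriptors in neighboring cells as related, so the ordering of descriptors matters; any such scheme has to make that design choice.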

Page 23

Some Statistics & Performance Numbers

Target | Model | N compounds | Active @ 1 µM | AUC | TN | FP | TP | FN | Sensitivity | Specificity
MAP4K4 | Random Forest | 160 | 10 | 0.88 | 149 | 1 | 1 | 9 | 0.1 | 0.93
BMX | Random Forest | 155 | 151 | 0.78 | 0 | 4 | 151 | 0 | 1.0 | 0.0
MAP4K4 | DL model | 160 | 10 | 0.91 | 150 | 0 | 6 | 4 | 0.6 | 0.94
BMX | DL model | 155 | 151 | 0.93 | 4 | 0 | 149 | 6 | 0.99 | 1.0

Average AUC: RF (Random Forest) 0.90; KinomeNet 0.96
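The sensitivity and specificity columns follow from the confusion-matrix counts, and AUC can be computed as the probability that a randomly chosen active scores higher than a randomly chosen inactive. A minimal Python sketch of these standard definitions (the score lists in the last line are made up):

```python
from typing import Sequence


def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: fraction of actives predicted active."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True negative rate: fraction of inactives predicted inactive."""
    return tn / (tn + fp)


def roc_auc(active_scores: Sequence[float], inactive_scores: Sequence[float]) -> float:
    """Probability that a random active outscores a random inactive (ties count 1/2)."""
    wins = 0.0
    for a in active_scores:
        for i in inactive_scores:
            if a > i:
                wins += 1.0
            elif a == i:
                wins += 0.5
    return wins / (len(active_scores) * len(inactive_scores))


# BMX, random forest row: TN=0, FP=4, TP=151, FN=0
print(sensitivity(tp=151, fn=0), specificity(tn=0, fp=4))  # 1.0 0.0
print(roc_auc([0.9, 0.8], [0.7, 0.85]))  # 0.75
```

The BMX random forest row shows why a single metric can mislead: with 151 actives among 155 compounds, a model that calls every compound active reaches perfect sensitivity while its specificity is zero.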

Page 24

KinomeNet: “Deorphanizing” rare targets

Diagram: 2D matrix of descriptors → ConvNet → multitask learning (253 targets, e.g., ABL1, ACVR1, ZAK, ZAP70)

Page 25

KinomeNet: “Deorphanizing” rare targets

Diagram: 2D matrix of descriptors → ConvNet → multitask learning over 320 targets: 253 “frequent” plus 67 “rare” (shown: ACVR1, TYMS, …)
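In multitask training most (molecule, target) pairs were never measured, so a common approach, assumed here since the slides do not describe KinomeNet's loss, is to compute the loss only over observed labels. A minimal sketch with binary active/inactive labels (for example, active at 1 µM) and `None` marking missing measurements:

```python
import math
from typing import List, Optional


def masked_multitask_bce(predictions: List[List[float]],
                         labels: List[List[Optional[int]]]) -> float:
    """Mean binary cross-entropy over observed (molecule, target) pairs only.

    predictions[m][t]: predicted probability that molecule m is active on target t.
    labels[m][t]: 1 (active), 0 (inactive) or None (not measured, ignored).
    """
    eps = 1e-7  # keeps log() away from 0
    total, count = 0.0, 0
    for pred_row, label_row in zip(predictions, labels):
        for p, y in zip(pred_row, label_row):
            if y is None:
                continue  # unmeasured pair: contributes nothing to the loss
            p = min(max(p, eps), 1.0 - eps)
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count if count else 0.0


preds = [[0.9, 0.5, 0.2], [0.1, 0.8, 0.5]]
labels = [[1, None, 0], [0, 1, None]]
print(round(masked_multitask_bce(preds, labels), 4))  # 0.1643, from the 4 measured pairs
```

Because unmeasured pairs are simply skipped, molecules measured on only a few targets can still be used, and every target's output shares the layers trained on all the data.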

Page 26

Why It Works: Transfer Learning

• Feature‐representation‐transfer

• To learn a “good” feature representation for the target domain. 

• The knowledge used to transfer across domains is encoded into the learned feature representation.

• With the new feature representation, the performance of the target task is expected to improve. 
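Feature-representation transfer can be sketched as: keep a representation learned on data-rich tasks fixed and fit only a small new output head on the few labeled compounds of a rare target. In this sketch the "pretrained" layer is a hypothetical stand-in (hard-coded weights, not taken from KinomeNet), the data are toy values, and the head is plain logistic regression trained by stochastic gradient descent.

```python
import math
from typing import List, Tuple

# Hypothetical "pretrained" shared layer (stands in for weights learned on frequent targets).
SHARED_WEIGHTS = [[1.0, -1.0], [0.5, 0.5]]


def shared_features(descriptors: List[float]) -> List[float]:
    """Frozen feature extractor: one linear layer followed by ReLU."""
    return [max(0.0, sum(wt * x for wt, x in zip(row, descriptors))) for row in SHARED_WEIGHTS]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_head(features: List[List[float]], labels: List[int],
               lr: float = 0.5, epochs: int = 200) -> Tuple[List[float], float]:
    """Fit only a logistic-regression head on top of the frozen features."""
    w, b = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            grad = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b


def predict(w: List[float], b: float, x: List[float]) -> int:
    return int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0)


# Toy "rare" target with only four labeled compounds (hypothetical descriptor vectors).
rare_x = [[2.0, 0.0], [0.0, 2.0], [3.0, 1.0], [1.0, 3.0]]
rare_y = [1, 0, 1, 0]
feats = [shared_features(x) for x in rare_x]
w, b = train_head(feats, rare_y)
print([predict(w, b, f) for f in feats])  # [1, 0, 1, 0]
```

Only the two head weights and the bias are trained here; the shared layer carries over unchanged, which is what lets a target with very few labels reuse what was learned elsewhere.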

Page 27

Recovery of Kinase Similarity by the Network  
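The slide does not say how kinase similarity was recovered. One way to probe it, offered only as an illustrative assumption, is to treat each target's output-layer weight vector as an embedding and compare targets by cosine similarity; the three-dimensional weight vectors below are invented.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(weights: Dict[str, List[float]], target: str) -> str:
    """Return the other target whose weight vector is closest in cosine similarity."""
    others = (t for t in weights if t != target)
    return max(others, key=lambda t: cosine(weights[target], weights[t]))


# Hypothetical output-layer weight vectors, one per target task.
head_weights = {
    "ABL1": [0.9, 0.1, 0.0],
    "KIT": [0.8, 0.2, 0.1],
    "ZAP70": [0.0, 0.3, 0.9],
}
print(most_similar(head_weights, "ABL1"))  # KIT
```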

Page 28

Atlas Regeneration

Young, dynamic startup company (founded in 2015) in North Carolina

We use AI to develop regenerative medicine

Design molecules to induce differentiation of induced pluripotent stem cells (iPSCs)

Tissue and muscle regeneration, fibrosis

Page 29

AI Drug Discovery Platform

BIG CHEMICAL DATA → FAST ARTIFICIAL INTELLIGENCE → TOP HITS

250M+ screening molecules

• Integrated public data (PubChem, ChEMBL, etc.)
• Private datasets
• Literature and patents
• In vitro (HTS)
• In vivo (mice, rats)
• Multi-omics
• Signaling pathways
• Gene expression

Page 30

Large-scale prediction of bioactivity with deep learning

TGF-β inhibitor (fibrosis)

200M+ potential candidates, screened for selectivity, off-target binding, toxicity, metabolic stability, bioavailability, solubility, etc. → 7

• Good selectivity
• Three novel scaffolds
• Predicted potency: 7–25 nM
• Good synthetic accessibility
• Good ADME/Tox properties

Page 31

Conclusions

• Data availability is the biggest barrier
• Novel architecture for multitask QSAR
• Improvement over well-converged RF models
• Convenience: 1 model vs. 320 models
• Training 1 network is faster than training 320 RF models
• Scalability of DL to “Big Data”
• DL benefits from transfer learning
• The more tasks and data, the higher the benefit
• Transferability: KinomeNet → GPCRNet