
Page 1: Recent advances in machine learning for ADMETox prediction

Dr. Igor V. Tetko

BIGCHEM GmbH, Institute of Structural Biology, Helmholtz Zentrum München, Germany and A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Ivanovo, Russia

6th April 2021

Page 2

Agenda

•  Use of ADMETox in industry
•  New sources of data
•  Inductive learning
•  Interpretation of predictions
   –  Design of interpretable descriptors
   –  Use of (more) interpretable methods
•  Descriptor-less methods
   –  Explainable AI
•  Accuracy of prediction
•  Conclusion and perspectives

Page 3

Traditional Process of Drug Discovery

Target Discovery and Validation: 500 000 cpds (2-3 years)
Lead Discovery: <5 000 cpds (0.5-1 years)
Lead Optimisation, ADME/DMPK lead profiling: <500 cpds (1-3 years)
Pre-clinics, ADME/DMPK: <5 cpds (1-2 years)
Clinics (pharmacology, safety and efficacy, ADME/DMPK): 1 clinical candidate (5-6 years)
Approval: 1 drug (1-2 years)

•  Profiling and screening in the virtual space helps to identify the most promising candidates

Slide courtesy of Dr. C. Höfer, Sandoz

Page 4

ADMETox filters in Bayer

Göller, A. H. et al. Drug Discov. Today 2020, 25 (9), 1702-1709.

Page 5

Bayer workflow for model life cycle

Göller, A. H. et al. Drug Discov. Today 2020, 25 (9), 1702-1709.

Page 6

Challenges - Extraction of information from patents

[0835] To a solution of 2-amino-4,6-dimethoxybenzamide (0.266 g, 1.36 mmol) and 3-(5-(methylsulfinyl)thiophen-2-yl)benzaldehyde (0.34 g, 1.36 mmol) in N,N-dimethylacetamide (17 mL) was added NaHSO3 (0.36 g, 2.03 mmol) and p-toluenesulfonic acid monohydrate (0.052 g, 0.271 mmol) at rt. The reaction mixture was heated at 120 °C for 12.5 h. After that time the reaction was cooled to rt, concentrated under reduced pressure and diluted with water (20 mL). The precipitated solids were collected by filtration, washed with water and dried. The product was purified by flash column chromatography (silica gel, 95:5 chloroform/methanol) to give 5,7-dimethoxy-2-(3-(5-(methylsulfinyl)thiophen-2-yl)phenyl)quinazolin-4(3H)-one (0.060 g, 10%) as a light yellow solid: mp 289-290 °C; 1H NMR (400 MHz, DMSO-d6) δ 12.19 (br s, 1H), 8.48 (s, 1H), 8.18 (d, J = 7.81 Hz, 1H), 7.90 (d, J = 8.20 Hz, 1H), 7.72 (d, J = 3.90 Hz, 1H), 7.55-7.64 (m, 2H), 6.77 (d, J = 2.34 Hz, 1H), 6.54 (d, J = 1.95 Hz, 1H), 3.88 (s, 3H), 3.84 (s, 3H), 2.96 (s, 3H); ESIMS m/z 427 [M+H]+.

http://www.google.com/patents/US20140140956
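Production text-mining pipelines parse such experimental paragraphs with chemically aware grammars; as a toy illustration only (not any vendor's method), simple regular expressions can already pull structured fields such as yield and melting point out of the text above:

```python
import re

# Hypothetical sketch: extracting yield and melting point from a patent
# experimental paragraph. Real extraction tools use chemical NER/grammars;
# plain regexes are shown only to convey the idea.
text = ("The product was purified by flash column chromatography to give "
        "the title compound (0.060 g, 10%) as a light yellow solid: "
        "mp 289-290 C.")

yield_pct = re.search(r"\([\d.]+ g, (\d+)%\)", text)   # "(0.060 g, 10%)"
mp = re.search(r"mp (\d+)-(\d+)", text)                # "mp 289-290"

print(yield_pct.group(1))          # reaction yield in percent
print(mp.group(1), mp.group(2))    # melting point range
```

A real system would also resolve the compound name to a structure and attach units and provenance to each extracted value.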

Page 7

NextMove Software Ltd.

Challenges - Extraction of information from patents

http://ochem.eu

Page 8

Prediction errors for a set of drugs (Bergström dataset) using models developed with different training sets

Bar chart: prediction error (°C) vs. training set size: 277 (Bergström), 4k (Karthikeyan), 50k (Literature), 275k (Patents).

Page 9

Prediction of solubility using logP and melting point (MP) based on 230k measurements

logS = 0.5 - 0.01 (MP - 25) - logP
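The equation on the slide (the general solubility equation) maps directly to code; a minimal sketch, with MP in °C and logS the log10 aqueous solubility:

```python
# General solubility equation from the slide: logS = 0.5 - 0.01*(MP - 25) - logP
# MP is the melting point in degrees Celsius; logP the octanol-water
# partition coefficient.
def log_s(logp: float, mp_celsius: float) -> float:
    return 0.5 - 0.01 * (mp_celsius - 25.0) - logp

# Example: a hypothetical compound with logP = 2.0 melting at 125 C
print(log_s(2.0, 125.0))  # -> -2.5
```

Note how a 100 °C rise in melting point costs one log unit of solubility, while logP enters with a full unit slope.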

Page 10

Transfer Learning

•  Inductive transfer learning: labeled data are available in the target domain
   –  Self-taught learning: no labeled data in the source domain
   –  Multi-task learning: labeled data are available in the source domain
•  Transductive transfer learning: labeled data are available only in the source domain
   –  Domain adaptation: different domains but a single task
   –  Sample selection bias: single domain and single task
•  Unsupervised transfer learning: no labeled data

Adapted from: Pan, S. J.; Yang, Q. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering 2010, 22, 1345-1359.

Page 11

Multi-task learning

Bar chart: mean absolute error for modeling one vs. several properties on human data (fat, brain, liver, kidney, muscle).

Problem:
•  prediction of tissue-air partition coefficients
•  small datasets of 30-100 molecules (human & rat data)
Results: simultaneous prediction of several properties increased the accuracy of the models.

Varnek, A. et al. J. Chem. Inf. Model. 2009, 49, 133-144.
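The benefit of fitting related endpoints jointly can be sketched with a toy multi-output model (this is an illustrative numpy sketch with simulated data, not the referenced study's method): one shared descriptor matrix, all tasks solved in a single ridge regression.

```python
import numpy as np

# Toy multi-task setup: 40 "molecules" with 5 descriptors, 3 related
# endpoints (think tissue-air partition coefficients) generated from a
# shared linear model plus noise. All values are simulated.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))             # descriptor matrix, shared by tasks
W_true = rng.normal(size=(5, 3))         # 3 related endpoints
Y = X @ W_true + 0.01 * rng.normal(size=(40, 3))

lam = 0.1                                # ridge penalty
# One linear solve fits all three tasks simultaneously:
W = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ Y)

rmse = np.sqrt(np.mean((X @ W - Y) ** 2))
print(W.shape)  # (5, 3): one coefficient vector per endpoint
```

In neural-network multi-task learning the sharing happens in the hidden layers rather than in a common design matrix, but the principle is the same: related tasks constrain a shared representation.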

Page 12

Analysis of toxicity of chemical compounds

129 142 toxicity measurements from RTECS*
87 064 unique molecular structures
29 toxicity endpoints

*RTECS: Registry of Toxic Effects of Chemical Substances

Sosnin, S. et al. J. Chem. Inf. Model. 2019, 59, 1062-1072.

Jain, S. et al. J. Chem. Inf. Model. 2021, 61 (2), 653-663: extended to >60 endpoints.

Page 13

Toxicity prediction of single vs. multitask

Bar chart: RMSE of single-task (ST) vs. multi-task (MT) models per endpoint.

Sosnin, S. et al. J. Chem. Inf. Model. 2019, 59, 1062-1072.

Page 14

Data storage and model development http://ochem.eu

Tetko, I. V.; Maran, U.; Tropsha, A. Mol. Inform. 2017, 36.

Page 15

Comparison of different models, RMSE (single-task vs. multi-task)

Page 16

Comprehensive model view

Page 17

Traditional Representations of Chemical Structures

Figure: saccharin (C7H5NO3S) represented in 1D, 2D and 3D.

Karlov, D. S. et al. RSC Advances 2019, 9, 5151-5157.

Page 18

Sedykh, A. et al. Chem. Res. Toxicol. 2021, 34, 634-640.

Design of interpretable descriptors

Structural alerts and extended functional groups (EFG), http://ochem.eu. Salmina, E. S. et al. Molecules 2016, 21, 1.

Yang,C.etal.J.Chem.Inf.Model.2015,55,510-528.

Toxprints
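The idea behind such alert-based descriptors is a binary vector marking which predefined substructures a molecule contains. Real implementations (ToxPrints, the OCHEM EFG set) do graph matching with SMARTS patterns, e.g. in RDKit; the toy below uses naive substring search on SMILES, which is chemically unreliable and serves only to convey the fingerprint idea:

```python
# Toy alert fingerprint. ASSUMPTION: patterns and matching are illustrative;
# production code would use SMARTS graph matching, not substring search.
ALERTS = {
    "nitro": "[N+](=O)[O-]",
    "benzene_ring": "c1ccccc1",
}

def alert_fingerprint(smiles: str) -> list:
    """Return a 1/0 vector: which alert strings occur in the SMILES."""
    return [1 if pattern in smiles else 0 for pattern in ALERTS.values()]

print(alert_fingerprint("c1ccccc1[N+](=O)[O-]"))  # nitrobenzene -> [1, 1]
print(alert_fingerprint("CCO"))                   # ethanol -> [0, 0]
```

Each bit is directly interpretable: a model weight on the "nitro" bit speaks about the nitro group, which is exactly why such descriptors support interpretation.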

Page 19

Overview of the workflow used to analyze the Tox21 450k dataset. (a) Overall study design. (b) Construct and evaluate a predictive model with the selected predictor, modeling algorithm, and endpoint.

Wu, L. et al. Chem. Res. Toxicol. 2021, 34, 541-549.

Page 20

Overall best performance (5CV/TEST):

RF + all: 0.84/0.84
LS-SVM + MORDRED: 0.87/0.88
RF + EFG: 0.81/0.86
All + MOLD2: 0.82/0.84

Wu, L. et al. Chem. Res. Toxicol. 2021, 34, 541-549.

Page 21

Machine Learning directly from chemical structures

Text processing: convolutional neural networks, Transformers, LSTM. Graph processing: message-passing neural networks.

Auto-encoder:

Saccharin:c1ccc2c(c1)C(=O)NS2(=O)=O

SMILES canonization by machine learning → transfer learning to new data

Page 22

Convolutional vs. Descriptor-based Neural Networks

Coefficient of determination, r². Transformer-CNN provides similar or better accuracy compared to traditional methods based on descriptors, even for small datasets (hundreds of compounds!). Karpov, P. et al. J. Cheminform. 2020, 12, 17. https://github.com/bigchem/transformer-cnn
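For reference, the reported metric can be computed as below (one standard definition of r²; the paper may use a variant such as the squared Pearson correlation):

```python
# Coefficient of determination: r^2 = 1 - SS_res / SS_tot.
def r_squared(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # perfect fit -> 1.0
```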

Page 23

Layer wise Relevance Propagation (LRP)

...minimum of ANN on the learning set and as a rule matches the end of a network training, where ANN overtraining takes place. The S3 point was used to investigate the influence of overtraining on variable selection and to prevent the network falling into "local minima". The termination of network training consisted in limiting the network run to Nall = 10 000 epochs or stopping its learning after Nstop = 2000 epochs following the last improvement of MSE in either of the S1 or S2 points (see details in ref 8).

ANALYZED PRUNING METHODS

Pruning methods take an important place in the ANN literature. Two main groups of methods can be identified: (1) sensitivity methods and (2) penalty-term methods. The sensitivity methods introduce some measures of importance of weights by so-called "sensitivities". The sensitivities of all weights are calculated, and the elements with the smallest sensitivities are deleted.24-26 The second group modifies the error function by introducing penalty terms.27,28 These terms drive some weights to zero during training. A good overview of the literature on these and some other types of pruning methods can be found in ref 29.

Usually, pruning methods were used to eliminate redundant weights or to optimize the internal architecture (number of hidden layers and number of neurons on the hidden layers30,31) of ANN. Occasionally, some conclusions about the relevance of input variables were made, if all weights from some input neurons were deleted. Authors did not try to focus their analysis on the determination of some relevant set of variables, because it was impossible to do when pruning was concentrated at the level of a single weight. However, the weight sensitivities from the first group of methods can be easily converted to neuron sensitivities. We investigate here five pruning methods A-E. A sensitivity Si of input variable i is introduced by

Si = Σk sk ≡ Σ(j=1..nj) sji   (1)

where sk is a sensitivity of a weight wk, the first summation runs over the set of outgoing weights of neuron i, and, using another order of weight numeration, sji is a sensitivity of a weight wji connecting the ith neuron to the jth neuron in the next layer. Method B was developed by us previously,32,33 while in the other methods an estimation of weight sensitivity was adapted from the appropriate references.

The sensitivity of a neuron of an ANN in the first two methods is based on direct analysis of the magnitudes of its outgoing weights ("magnitude-based" methods). Conversely, in the last three methods, the sensitivity is approximated by a change of the network error due to the elimination of some neuron weights ("error-based" methods).

(A) The first method was inspired by ref 14. The generation of color maps is useful for visual determination of the most important input variables, but it is rather subjective. This technique cannot be used, however, for automatic processing of large amounts of data. Therefore, the absolute magnitude of a weight wk

sk = |wk|   (2)

was used as its sensitivity in eq 1. In such a way the most appropriate relation between ref 14 and our model is achieved. Sensitivities of neurons calculated accordingly were successfully used to prune unimportant neurons responsible for four basic tastes (see ref 34 for details).

(B) The preliminary version of the second approach was used to predict activity of anti-AIDS substances33 according to

Si = Σ(j=1..nj) (wji / maxa |wja|)²   (3)

where maxa is taken over all weights ending at neuron j. The rationale of this equation is that different neurons in an upper layer can work in different modes (Figure 4). Thus, a neuron working near saturation has bigger input values in comparison with one working in linear mode. An input neuron connected principally to the linear-mode neurons on the hidden layer is always characterized by smaller sensitivities when compared with a neuron primarily connected to the saturated ones. The sensitivity calculated by (3) eliminates this drawback. This method was improved to take into account the sensitivity of neurons in the upper layers

Si = Σ(j=1..nj) (wji / maxa |wja|)² · Sj   (4)

where Sj is a sensitivity of the jth neuron in the upper layer. This is a recurrent formula calculating the sensitivity of a unit on layer s via the neuron sensitivities on layer s+1. The sensitivities of output-layer neurons are set to 1. All sensitivities in a layer are usually normalized to a maximum value of 1. This type of sensitivity calculation was especially useful in the case of simultaneous analysis of several output patterns.

(C) Let us consider an approximation of the error function E of an ANN by a Taylor series. When the weight vector W is perturbed, the change in the error is approximately

δE = Σi gi δwi + (1/2) Σi hii (δwi)² + (1/2) Σ(i≠j) hij δwi δwj + O(||δW||³)   (5)

where the δwi are the components of δW, gi are the components of the gradient of E with respect to W, and the hij are elements of the Hessian matrix H.

Figure 4. Modes of the activation function f(ξ) = 1/(1 + e^(-ξ)) in dependence on the input ξ = Σi wi ai of lower-layer nodes. Here ai is the value of a neuron i on the previous layer.

Tetko, I. V. et al. J. Chem. Inf. Comput. Sci. 1996, 36, 794-803. Bach, S. et al. PLoS ONE 2015, 10, e0130140.
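The recurrent, magnitude-based sensitivity of eqs 3-4 is easy to sketch in numpy. The weights below are hypothetical toy values, not from any trained model; the layout assumes W[j, i] holds the weight from neuron i to neuron j in the next layer:

```python
import numpy as np

# Sketch of the method-B sensitivity (eqs 3-4):
#   S_i = sum_j (w_ji / max_a |w_ja|)^2 * S_j,
# applied recursively from the output layer (where S = 1) downward.
def layer_sensitivities(W, S_next):
    """W[j, i]: weight from neuron i into neuron j of the next layer."""
    scale = np.abs(W).max(axis=1, keepdims=True)   # max_a |w_ja| per neuron j
    return (((W / scale) ** 2) * S_next[:, None]).sum(axis=0)

# Toy two-layer net: 3 inputs -> 2 hidden -> 1 output (hypothetical weights)
W1 = np.array([[0.5, -2.0, 0.1],
               [1.0,  0.2, 0.0]])
W2 = np.array([[1.0, 0.5]])

S_hidden = layer_sensitivities(W2, np.ones(1))   # recurrent step from output
S_input = layer_sensitivities(W1, S_hidden)
print(np.argmax(S_input))  # input 1 dominates via its large weight |w| = 2
```

The per-row normalization by maxa |wja| is exactly the correction for saturated vs. linear-mode hidden neurons discussed above.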

Page 24

Interpretation of models for Transformer-CNN

Karpov, P.; Godin, G.; Tetko, I. V. J. Cheminform. 2020, 12, 17. https://github.com/bigchem/transformer-cnn

Page 25

Schematic of the XAI methodology and neural network architecture. A message-passing graph neural network (GNN) and a forward fully connected neural network (FNN) were combined to process an input presented as a molecular graph with atom, bond, and computed global properties (e.g., octanol-water partition coefficient, topological polar surface area). The integrated gradients method was applied to compute atom, bond, and global importance scores.

Jiménez-Luna, J. et al. J. Chem. Inf. Model. 2021, 10.1021/acs.jcim.0c01344.
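Integrated gradients itself is model-agnostic; the minimal numpy sketch below applies it to a toy analytic function (not the paper's GNN), approximating the path integral with a midpoint Riemann sum:

```python
import numpy as np

# Integrated gradients: IG_i = (x_i - x'_i) * integral_0^1 of
# dF/dx_i(x' + a*(x - x')) da, here for the toy model F(x) = x0^2 + 3*x1
# with a zero baseline x'. All inputs are illustrative.
def f_grad(x):
    return np.array([2.0 * x[0], 3.0])   # analytic gradient of F

def integrated_gradients(x, baseline, steps=200):
    alphas = (np.arange(steps) + 0.5) / steps            # midpoint rule
    grads = np.mean([f_grad(baseline + a * (x - baseline)) for a in alphas],
                    axis=0)
    return (x - baseline) * grads

ig = integrated_gradients(np.array([2.0, 1.0]), np.zeros(2))
print(ig)  # attributions sum to F(x) - F(baseline) = 7 (completeness)
```

The completeness property shown in the comment is what makes the atom/bond/global scores in such schemes add up to the model's prediction shift.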

Page 26

Examples of motifs indicating a) hERG and b) CYP450 inhibition.

Jiménez-Luna, J. et al. J. Chem. Inf. Model. 2021, 10.1021/acs.jcim.0c01344.

Page 27

Accuracy of predictions for classification model

Page 28

Page 29

Advanced Machine Learning for Innovative Drug Discovery

15 PhD positions at http://ai-dd.eu; application deadline April 18th!

Page 30

Take home message

•  ADMETox modeling depends on the quality and amount of data

•  Text-processing methods can automatically generate data

   –  Data extraction from patents is a mature technology

   –  Image-processing methods can extract data from books, reports, PDFs

•  Simultaneous modeling of related properties increases model quality

•  Use of interpretable descriptors and interpretable methods should not be neglected

•  Use of descriptor-less methods yields highly predictive models

•  Explainable Artificial Intelligence (XAI) to explain models is on the rise

“Compared to Big Data challenges, “how to best analyze the Big Data”, the future progress is linked to the need for explainable “chemistry aware” methods.”*

*Tetko, I. V.; Engkvist, O. J. Cheminform. 2020, 12, 74.

Page 31

Acknowledgement

Pavel Karpov, Zhonghua Xia, Mark Embrechts, Dipan Ghosh, Joseph Yap, Elena Dracheva, Genny Cau, Monica Campillos, Guillaume Godin (Firmenich), Sergey Sosnin (Skoltech), Maxim Fedorov (Skoltech), Yura Sushko (Google), Sergey Novotarskyi (Facebook), Robert Körner, M. Sattler (HMGU)