Recent advances in machine learning for ADMETox prediction
Dr. Igor V. Tetko
BIGCHEM GmbH, Institute of Structural Biology, Helmholtz Zentrum München, Germany, and A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Ivanovo, Russia
6th April 2021
Agenda
• Use of ADMETox in industry
• New sources of data
• Inductive learning
• Interpretation of predictions
– Design of interpretable descriptors
– Use of (more) interpretable methods
• Descriptor-less methods
– Explainable AI
• Accuracy of prediction
• Conclusion and perspectives
Traditional Process of Drug Discovery
[Pipeline diagram; ADME/DMPK and pharmacology (safety and efficacy) profiling accompany every stage:]
Target discovery and validation: 500,000 compounds, 2-3 years
Lead discovery: <5,000 compounds, 0.5-1 years
Lead optimisation: <500 compounds, 1-3 years
Lead profiling: <5 compounds, 1-2 years
Pre-clinics and clinics: 1 clinical candidate (CC), 5-6 years
Approval: 1 drug, 1-2 years
• Profiling and screening in the virtual space helps to identify the most promising candidates
Slide courtesy of Dr. C. Höfer, Sandoz
ADMETox filters in Bayer
Göller, A. H. et al. Drug Discov. Today 2020, 25(9), 1702-1709.
Bayer workflow for model life cycle
Göller, A. H. et al. Drug Discov. Today 2020, 25(9), 1702-1709.
Challenges - Extraction of information from patents
[0835] To a solution of 2-amino-4,6-dimethoxybenzamide (0.266 g, 1.36 mmol) and 3-(5-(methylsulfinyl)thiophen-2-yl)benzaldehyde (0.34 g, 1.36 mmol) in N,N-dimethylacetamide (17 mL) was added NaHSO3 (0.36 g, 2.03 mmol) and p-toluenesulfonic acid monohydrate (0.052 g, 0.271 mmol) at rt. The reaction mixture was heated at 120 °C for 12.5 h. After that time the reaction was cooled to rt, concentrated under reduced pressure and diluted with water (20 mL). The precipitated solids were collected by filtration, washed with water and dried. The product was purified by flash column chromatography (silica gel, 95:5 chloroform/methanol) to give 5,7-dimethoxy-2-(3-(5-(methylsulfinyl)thiophen-2-yl)phenyl)quinazolin-4(3H)-one (0.060 g, 10%) as a light yellow solid: mp 289-290 °C; 1H NMR (400 MHz, DMSO-d6) δ 12.19 (br s, 1H), 8.48 (s, 1H), 8.18 (d, J = 7.81 Hz, 1H), 7.90 (d, J = 8.20 Hz, 1H), 7.72 (d, J = 3.90 Hz, 1H), 7.55-7.64 (m, 2H), 6.77 (d, J = 2.34 Hz, 1H), 6.54 (d, J = 1.95 Hz, 1H), 3.88 (s, 3H), 3.84 (s, 3H), 2.96 (s, 3H); ESIMS m/z 427 [M+H]+.
http://www.google.com/patents/US20140140956
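To make the challenge concrete, here is a toy rule-based sketch (my own illustration; real systems such as NextMove Software's pipelines use full chemistry-aware NLP, not bare regular expressions) that pulls two simple fields from the paragraph above:

```python
# Toy rule-based extraction from the patent paragraph above (illustration
# only; production text-mining tools use chemistry-aware NLP, not regexes).
import re

text = ("The product was purified ... to give 5,7-dimethoxy-2-(3-(5-"
        "(methylsulfinyl)thiophen-2-yl)phenyl)quinazolin-4(3H)-one "
        "(0.060 g, 10%) as a light yellow solid: mp 289-290 °C.")

# "(0.060 g, 10%)" -> isolated yield; "mp 289-290 °C" -> melting point
yield_pct = re.search(r"\((?:[\d.]+\s*g),\s*([\d.]+)%\)", text)
melting = re.search(r"mp\s*([\d.]+)\s*-\s*([\d.]+)\s*°C", text)
print(f"yield: {yield_pct.group(1)}%")                   # yield: 10%
print(f"mp: {melting.group(1)}-{melting.group(2)} °C")   # mp: 289-290 °C
```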
NextMove Software Ltd.
Challenges - Extraction of information from patents
http://ochem.eu
Prediction errors for a set of drugs (Bergström dataset) using models developed with different training sets
[Bar chart: prediction error (°C, 0-60) as a function of training set size: 277 (Bergström), 4k (Karthikeyan), 50k (literature), 275k (patents)]
Prediction of solubility using logP and melting point (MP) based on 230k measurements
logS = 0.5 − 0.01·(MP − 25) − logP
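The equation is straightforward to apply; below is a minimal sketch (the function name is mine, and clamping the melting-point term to zero for liquids, MP < 25 °C, is a common convention for this type of equation rather than something stated on the slide):

```python
def log_s(mp_celsius: float, log_p: float) -> float:
    """Estimate log10 molar solubility from melting point (°C) and logP,
    per the slide: logS = 0.5 - 0.01*(MP - 25) - logP."""
    mp_term = max(mp_celsius - 25.0, 0.0)  # clamp for liquids (assumed convention)
    return 0.5 - 0.01 * mp_term - log_p

# Example: a compound melting at 150 °C with logP = 2.0
print(log_s(150.0, 2.0))  # -2.75
```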
Transfer Learning
• Inductive transfer learning: labeled data are available in a target domain
– Multi-task learning: labeled data are also available in a source domain
– Self-taught learning: no labeled data in a source domain
• Transductive learning: labeled data are available only in a source domain
– Domain adaptation: different domains but a single task
– Sample selection bias: single domain and single task
• Unsupervised transfer learning: no labeled data
(An example of the inductive branch follows below.)
Adapted from: Pan, S. J.; Yang, Q. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering 2010, 22, 1345-1359.
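As a minimal illustration of the inductive branch (fine-tuning; all layer sizes and names here are hypothetical, not from the talk), a network pre-trained on a large source property can be reused on a small target set by freezing its trunk and training a fresh head:

```python
# Inductive transfer learning by fine-tuning: reuse a pre-trained trunk,
# retrain only a new head on the small target dataset (sizes are illustrative).
import torch
import torch.nn as nn

trunk = nn.Sequential(nn.Linear(100, 64), nn.ReLU())  # shared representation
source_head = nn.Linear(64, 1)
# ... assume trunk + source_head were trained on the large source dataset ...

target_head = nn.Linear(64, 1)        # fresh head for the target property
for p in trunk.parameters():
    p.requires_grad = False           # freeze the transferred representation
optim = torch.optim.Adam(target_head.parameters(), lr=1e-3)

x_t, y_t = torch.randn(32, 100), torch.randn(32, 1)   # small target set
loss = nn.functional.mse_loss(target_head(trunk(x_t)), y_t)
loss.backward()                        # gradients flow only into the new head
optim.step()
```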
Multi-task learning
[Bar chart: mean absolute error (0-0.6) for models trained on one vs. several properties, human data, per tissue: fat, brain, liver, kidney, muscle]
Problem:
• prediction of tissue-air partition coefficients
• small datasets, 30-100 molecules (human & rat data)
Results: simultaneous prediction of several properties increased the accuracy of the models.
Varnek, A. et al. J. Chem. Inf. Model. 2009, 49, 133-144.
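A minimal sketch of how such multi-task (hard parameter sharing) models are typically set up, with sparse labels masked out of the loss. The layer sizes and the five-head layout are my assumptions for illustration, not the architecture of the cited paper:

```python
# Multi-task network sketch: one shared trunk, one head per endpoint.
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    def __init__(self, n_descriptors: int, n_tasks: int = 5):
        super().__init__()
        # Shared trunk: learns a representation common to all endpoints
        self.shared = nn.Sequential(
            nn.Linear(n_descriptors, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
        )
        # One regression head per tissue-air partition coefficient
        self.heads = nn.ModuleList([nn.Linear(32, 1) for _ in range(n_tasks)])

    def forward(self, x):
        h = self.shared(x)
        return torch.cat([head(h) for head in self.heads], dim=1)

# Training with missing labels: mask out endpoints not measured for a molecule
model = MultiTaskNet(n_descriptors=100)
x = torch.randn(8, 100)            # 8 molecules, 100 descriptors
y = torch.randn(8, 5)              # 5 endpoints (NaN where unmeasured)
mask = ~torch.isnan(y)
y = torch.nan_to_num(y)
pred = model(x)
loss = ((pred - y) ** 2 * mask).sum() / mask.sum()  # masked MSE
loss.backward()
```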
Analysis of toxicity of chemical compounds
129,142 toxicity measurements from RTECS*; 87,064 unique molecular structures; 29 toxicity endpoints
*RTECS: Registry of Toxic Effects of Chemical Substances
Sosnin, S. et al. J. Chem. Inf. Model. 2019, 59, 1062-1072.
Jain, S. et al. J. Chem. Inf. Model. 2021, 61(2), 653-663 – extended to >60 endpoints.
Toxicity prediction: single-task vs. multi-task models
[Bar chart: RMSE (0-1.6) of single-task (ST) vs. multi-task (MT) models per endpoint]
Sosnin, S. et al. J. Chem. Inf. Model. 2019, 59, 1062-1072.
Data storage and model development http://ochem.eu
Tetko, I. V.; Maran, U.; Tropsha, A. Mol. Inform. 2017, 36.
Comparison of different models, RMSE
[Chart comparing single-task and multi-task models]
Comprehensive model view
Traditional Representations of Chemical Structures
Saccharin: 1D (C7H5NO3S), 2D, and 3D representations
Karlov, D. S. et al. RSC Advances 2019, 9, 5151-5157.
Sedykh, A. et al. Chem. Res. Toxicol. 2021, 34, 634-640.
Design of interpretable descriptors
Structural alerts and extended functional groups (EFG): http://ochem.eu; Salmina, E. S. et al. Molecules 2016, 21, 1.
ToxPrints: Yang, C. et al. J. Chem. Inf. Model. 2015, 55, 510-528.
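A minimal sketch of how structural-alert descriptors are computed in practice, via SMARTS substructure matching; the two alert patterns below are generic illustrations of mine, not the actual EFG or ToxPrint definitions from the cited papers:

```python
# Binary structural-alert fingerprint via SMARTS matching with RDKit.
from rdkit import Chem

alerts = {
    "aromatic nitro": Chem.MolFromSmarts("[c][N+](=O)[O-]"),
    "Michael acceptor": Chem.MolFromSmarts("C=CC=O"),
}
mol = Chem.MolFromSmiles("O=[N+]([O-])c1ccccc1")  # nitrobenzene
fingerprint = {name: mol.HasSubstructMatch(patt) for name, patt in alerts.items()}
print(fingerprint)  # {'aromatic nitro': True, 'Michael acceptor': False}
```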
Overview of the workflow used to analyze the Tox21 450k dataset. (a) Overall study design. (b) Construct and evaluate the predictive model with the selected predictor, modeling algorithm, and endpoint.
Wu, L. et al. Chem. Res. Toxicol. 2021, 34, 541-549.
Overall best performance (5CV/TEST):
RF + all: 0.84/0.84; LS-SVM + MORDRED: 0.87/0.88; RF + EFG: 0.81/0.86; All + MOLD2: 0.82/0.84
Wu, L. et al. Chem. Res. Toxicol. 2021, 34, 541-549.
Machine Learning directly from chemical structures
Text processing: convolutional neural networks, Transformers, LSTM
Graph processing: message-passing neural networks
Auto-encoder:
Saccharin:c1ccc2c(c1)C(=O)NS2(=O)=O
SMILES canonization by machine learning → transfer learning to new data
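The model learns canonization as a translation task; the rule-based RDKit equivalent used to generate such (arbitrary SMILES → canonical SMILES) training pairs looks roughly like this (a sketch, not the project's own code):

```python
# Generating (random SMILES -> canonical SMILES) pairs with RDKit,
# the kind of data a canonization model can be pre-trained on.
from rdkit import Chem

mol = Chem.MolFromSmiles("c1ccc2c(c1)C(=O)NS2(=O)=O")   # saccharin
random_forms = {Chem.MolToSmiles(mol, doRandom=True) for _ in range(5)}
canonical = Chem.MolToSmiles(mol)                        # one canonical string
for s in random_forms:
    print(s, "->", canonical)
```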
Convolutional vs. Descriptor-based Neural Networks
Coefficient of determination, r². Transformer-CNN provides similar or better accuracy compared to traditional methods based on descriptors, even for small datasets (hundreds of compounds!).
Karpov, P. et al. J. Cheminform. 2020, 12, 17. https://github.com/bigchem/transformer-cnn
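For readers unfamiliar with descriptor-less text models, here is a toy character-level SMILES CNN (purely illustrative; the actual Transformer-CNN in the repository above is considerably more elaborate, and the vocabulary below is a simplification):

```python
# Toy character-level CNN over SMILES: embed characters, convolve, pool.
import torch
import torch.nn as nn

vocab = {ch: i + 1 for i, ch in enumerate("()=#123456789CNOSclnos[]+-@H")}

def encode(smiles: str, max_len: int = 64) -> torch.Tensor:
    ids = [vocab.get(ch, 0) for ch in smiles][:max_len]
    return torch.tensor(ids + [0] * (max_len - len(ids)))  # zero-padded

class SmilesCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(len(vocab) + 1, 32)
        self.conv = nn.Conv1d(32, 64, kernel_size=5, padding=2)
        self.out = nn.Linear(64, 1)

    def forward(self, x):                     # x: (batch, seq)
        h = self.emb(x).transpose(1, 2)       # (batch, channels, seq)
        h = torch.relu(self.conv(h)).max(dim=2).values  # global max pooling
        return self.out(h)

model = SmilesCNN()
batch = torch.stack([encode("c1ccc2c(c1)C(=O)NS2(=O)=O")])  # saccharin
print(model(batch))  # one predicted property value per molecule
```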
Layer-wise Relevance Propagation (LRP)
[Excerpt from Tetko, I. V. et al. J. Chem. Inf. Comput. Sci. 1996, 36, 794-803, reproduced on the slide:]

…minimum of ANN on the learning set and as a rule matches the end of a network training, where ANN overtraining takes place. The S3 point was used to investigate the influence of overtraining on variable selection and to prevent the network falling into "local minima". The termination of network training consisted in limiting the network run to $N_{\mathrm{all}} = 10{,}000$ epochs or to stop its learning after $N_{\mathrm{stop}} = 2{,}000$ epochs following the last improvement of MSE in either of the S1 or S2 points (see details in ref 8).

ANALYZED PRUNING METHODS

Pruning methods take an important place in the ANN literature. Two main groups of methods can be identified: (1) sensitivity methods and (2) penalty term methods. The sensitivity methods introduce some measures of importance of weights by so-called "sensitivities". The sensitivities of all weights are calculated, and the elements with the smallest sensitivities are deleted. The second group modifies the error function by introducing penalty terms. These terms drive some weights to zero during training. A good overview of the literature on these and some other types of pruning methods can be found in ref 29.

Usually, pruning methods were used to eliminate redundant weights or to optimize internal architectures (number of hidden layers and number of neurons on the hidden layers) of ANN. Occasionally, some conclusions about the relevance of input variables were made if all weights from some input neurons were deleted. Authors did not try to focus their analysis on the determination of some relevant set of variables, because it was impossible to do when pruning was concentrated at the level of a single weight. However, the weight sensitivities from the first group of methods can be easily converted to neuron sensitivities.

We investigate here five pruning methods A-E. A sensitivity $S_i$ of input variable $i$ is introduced by

$$S_i = \sum_{k \in \chi_i} s_k \equiv \sum_{j=1}^{n_j} s_{ji} \qquad (1)$$

where $s_k$ is a sensitivity of a weight $w_k$, and summation is over a set $\chi_i$ of outgoing weights of the neuron $i$ or, using another order of weight numeration, $s_{ji}$ is a sensitivity of a weight $w_{ji}$ connecting the $i$th neuron to the $j$th neuron in the next layer. The method B was developed by us previously, while in the other methods an estimation of weight sensitivity was updated from the appropriate references.

The sensitivity of a neuron of ANN in the first two methods is based on direct analysis of the magnitudes of its outgoing weights ("magnitude-based" methods). Conversely, in the last three methods, the sensitivity is approximated by a change of the network error due to the elimination of some neuron weights ("error-based" methods).

(A) The first method was inspired by ref 14. The generation of color maps is useful for visual determination of the most important input variables, but it is rather subjective. This technique cannot be used, however, for automatic processing of large amounts of data. Therefore, the absolute magnitude of a weight $w_k$,

$$s_k = |w_k| \qquad (2)$$

was used as its sensitivity in eq 1. In such a way the most appropriate relation between ref 14 and our model is achieved. Sensitivities of neurons calculated accordingly were successfully used to prune unimportant neurons responsible for four basic tastes (see ref 34 for details).

(B) The preliminary version of the second approach was used to predict activity of anti-AIDS substances according to

$$S_i = \sum_{j=1}^{n_j} \left( \frac{w_{ji}}{\max_a |w_{ja}|} \right)^2 \qquad (3)$$

where $\max_a$ is taken over all weights ending at neuron $j$. The rationale of this equation is that different neurons in an upper layer can work in different modes (Figure 4). Thus, a neuron working near saturation has bigger input values in comparison with one working in linear mode. An input neuron connected principally to the linear-mode neurons on the hidden layer is always characterized by smaller sensitivities when compared with a neuron primarily connected to the saturated ones. The sensitivity calculated by (3) eliminates this drawback. This method was improved to take into account the sensitivity of neurons in the upper layers:

$$S_i = \sum_{j=1}^{n_j} \left( \frac{w_{ji}}{\max_a |w_{ja}|} \right)^2 S_j \qquad (4)$$

where $S_j$ is a sensitivity of the $j$th neuron in the upper layer. This is a recurrent formula calculating the sensitivity of a unit on layer $s$ via the neuron sensitivities on layer $s+1$. The sensitivities of output-layer neurons are set to 1. All sensitivities in a layer are usually normalized to a maximum value of 1. This type of sensitivity calculation was especially useful in the case of simultaneous analysis of several output patterns.

(C) Let us consider an approximation of the error function $E$ of an ANN by a Taylor series. When the weight vector $W$ is perturbed, the change in the error is approximately

$$\delta E = \sum_i g_i\,\delta w_i + \frac{1}{2}\sum_i h_{ii}\,\delta w_i^2 + \frac{1}{2}\sum_{i \neq j} h_{ij}\,\delta w_i\,\delta w_j + O(\|\delta W\|^3) \qquad (5)$$

where the $\delta w_i$ are the components of $\delta W$, the $g_i$ are the components of the gradient of $E$ with respect to $W$, and the $h_{ij}$ are elements of the Hessian matrix $H$.

Figure 4. Modes of an activation function $f(\xi) = 1/(1 + e^{-\xi})$ in dependence on the input $\xi = \sum_i w_i a_i$ of lower-layer nodes. Here $a_i$ is the value of a neuron $i$ on the previous layer.

Tetko, I. V. et al. J. Chem. Inf. Comput. Sci. 1996, 36, 794-803. Bach, S. et al. PLoS ONE 2015, 10, e0130140.
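A small NumPy sketch of the magnitude-based sensitivities in eqs 3-4 above (variable names are mine; this is an illustration of the formulas, not code from the paper):

```python
# Back-propagating neuron sensitivities per eq 4: S_i = sum_j (w_ji / max_a |w_ja|)^2 * S_j
import numpy as np

def layer_sensitivities(W: np.ndarray, S_upper: np.ndarray) -> np.ndarray:
    """W[j, i]: weight from neuron i in this layer to neuron j in the next.
    S_upper[j]: sensitivity of neuron j in the next layer (output layer: 1)."""
    norm = np.abs(W).max(axis=1, keepdims=True)   # max_a |w_ja| per upper neuron j
    return (((W / norm) ** 2) * S_upper[:, None]).sum(axis=0)

# Backward pass through a toy 6-4-1 network:
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 6))     # 6 inputs -> 4 hidden
W2 = rng.normal(size=(1, 4))     # 4 hidden -> 1 output
S_hidden = layer_sensitivities(W2, np.ones(1))
S_hidden /= S_hidden.max()       # normalize to a maximum of 1, as in the paper
S_input = layer_sensitivities(W1, S_hidden)
print(S_input)                   # low values flag candidate inputs for pruning
```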
Interpretation of models for Transformer-CNN
Karpov, P.; Godin, G.; Tetko, I. V. J. Cheminform. 2020, 12, 17. https://github.com/bigchem/transformer-cnn
Schematic of the XAI methodology and neural network architecture. A message-passing graph neural network (GNN) and a forward fully connected neural network (FNN) were combined to process an input presented as a molecular graph with atom, bond, and computed global properties (e.g., octanol–water partition coefficient, topological polar surface area). The integrated gradients method was applied to compute atom, bond, and global importance scores.
Jiménez-Luna, J. et al. J. Chem. Inf. Model. 2021, 10.1021/acs.jcim.0c01344.
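A compact sketch of integrated gradients for a generic differentiable model (zeros baseline, straight-line path; my simplification over feature vectors, not the authors' implementation, which attributes over graph inputs):

```python
# Integrated gradients: attribute a prediction to input features by
# integrating gradients along a path from a baseline to the input.
import torch

def integrated_gradients(model, x, baseline=None, steps=50):
    baseline = torch.zeros_like(x) if baseline is None else baseline
    # Interpolate between baseline and input, then average the gradients
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, *([1] * x.dim()))
    path = baseline + alphas * (x - baseline)       # (steps, *x.shape)
    path.requires_grad_(True)
    model(path).sum().backward()
    avg_grad = path.grad.mean(dim=0)
    return (x - baseline) * avg_grad                # attribution per feature

# Usage with any differentiable model taking a feature vector:
model = torch.nn.Sequential(torch.nn.Linear(10, 8), torch.nn.ReLU(),
                            torch.nn.Linear(8, 1))
x = torch.randn(10)
print(integrated_gradients(model, x))
```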
Examples of motifs indicating a) hERG and b) CYP450 inhibition.
Jiménez-Luna, J. et al. J. Chem. Inf. Model. 2021, 10.1021/acs.jcim.0c01344.
Accuracy of predictions for classification models
Advanced Machine Learning for Innovative Drug Discovery
15 PhD positions at http://ai-dd.eu – deadline April 18th!
Take home message
• ADMETox modeling depends on the quality and amount of data
• Text processing methods can automatically generate data
– Data extraction from patents is a mature technology
– Image processing methods can extract data from books, reports, and PDFs
• Simultaneous modeling of related properties increases model quality
• The use of interpretable descriptors and interpretable methods should not be neglected
• The use of descriptor-less methods provides highly predictive models
• Explainable Artificial Intelligence (XAI) to explain models is on the rise
"Compared to Big Data challenges ("how to best analyze the Big Data"), the future progress is linked to the need for explainable, "chemistry-aware" methods."*
*Tetko, I. V.; Engkvist, O. J. Cheminform. 2020, 12, 74.
Acknowledgement
Pavel Karpov, Zhonghua Xia, Mark Embrechts, Dipan Ghosh, Joseph Yap, Elena Dracheva, Genny Cau, Monica Campillos, Guillaume Godin (Firmenich), Sergey Sosnin (Skoltech), Maxim Fedorov (Skoltech), Yura Sushko (Google), Sergey Novotarskyi (Facebook), Robert Körner, M. Sattler (HMGU)