Recent advances in machine learning for ADMETox prediction
Dr. Igor V. Tetko
BIGCHEM GmbH, Institute of Structural Biology, Helmholtz Zentrum München, Germany, and A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Ivanovo, Russia
6th April 2021
Agenda
• Use of ADMETox in industry
• New sources of data
• Inductive learning
• Interpretation of predictions
– Design of interpretable descriptors
– Use of (more) interpretable methods
• Descriptor-less methods
– Explainable AI
• Accuracy of prediction
• Conclusion and perspectives
Traditional Process of Drug Discovery
[Pipeline diagram; ADME/DMPK and pharmacology (safety and efficacy) profiling accompany every stage:]
Target discovery and validation: 500,000 compounds, 2-3 years
Lead discovery: <5,000 compounds, 0.5-1 years
Lead optimisation: <500 compounds, 1-3 years
Lead profiling: <5 compounds, 1-2 years
Pre-clinics and clinics: 1 clinical candidate (CC), 5-6 years
Approval: 1 drug, 1-2 years
• Profiling and screening in the virtual space helps to identify the most promising candidates
Slide courtesy of Dr. C. Höfer, Sandoz
ADMETox filters in Bayer
Göller, A. H. et al. Drug Discov. Today 2020, 25(9), 1702-1709.
Bayer workflow for model life cycle
Göller, A. H. et al. Drug Discov. Today 2020, 25(9), 1702-1709.
Challenges - Extraction of information from patents
[0835] To a solution of 2-amino-4,6-dimethoxybenzamide (0.266 g, 1.36 mmol) and 3-(5-(methylsulfinyl)thiophen-2-yl)benzaldehyde (0.34 g, 1.36 mmol) in N,N-dimethylacetamide (17 mL) was added NaHSO3 (0.36 g, 2.03 mmol) and p-toluenesulfonic acid monohydrate (0.052 g, 0.271 mmol) at rt. The reaction mixture was heated at 120 °C for 12.5 h. After that time the reaction was cooled to rt, concentrated under reduced pressure and diluted with water (20 mL). The precipitated solids were collected by filtration, washed with water and dried. The product was purified by flash column chromatography (silica gel, 95:5 chloroform/methanol) to give 5,7-dimethoxy-2-(3-(5-(methylsulfinyl)thiophen-2-yl)phenyl)quinazolin-4(3H)-one (0.060 g, 10%) as a light yellow solid: mp 289-290 °C; 1H NMR (400 MHz, DMSO-d6) δ 12.19 (br s, 1H), 8.48 (s, 1H), 8.18 (d, J = 7.81 Hz, 1H), 7.90 (d, J = 8.20 Hz, 1H), 7.72 (d, J = 3.90 Hz, 1H), 7.55-7.64 (m, 2H), 6.77 (d, J = 2.34 Hz, 1H), 6.54 (d, J = 1.95 Hz, 1H), 3.88 (s, 3H), 3.84 (s, 3H), 2.96 (s, 3H); ESIMS m/z 427 [M+H]+.
http://www.google.com/patents/US20140140956
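To make the challenge concrete, here is a toy rule-based sketch (my own illustration; real systems such as NextMove Software's pipelines use full chemistry-aware NLP, not bare regular expressions) that pulls two simple fields from the paragraph above:

```python
# Toy rule-based extraction from the patent paragraph above (illustration
# only; production text-mining tools use chemistry-aware NLP, not regexes).
import re

text = ("The product was purified ... to give 5,7-dimethoxy-2-(3-(5-"
        "(methylsulfinyl)thiophen-2-yl)phenyl)quinazolin-4(3H)-one "
        "(0.060 g, 10%) as a light yellow solid: mp 289-290 °C.")

# "(0.060 g, 10%)" -> isolated yield; "mp 289-290 °C" -> melting point
yield_pct = re.search(r"\((?:[\d.]+\s*g),\s*([\d.]+)%\)", text)
melting = re.search(r"mp\s*([\d.]+)\s*-\s*([\d.]+)\s*°C", text)
print(f"yield: {yield_pct.group(1)}%")                   # yield: 10%
print(f"mp: {melting.group(1)}-{melting.group(2)} °C")   # mp: 289-290 °C
```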
NextMove Software Ltd.
Challenges - Extraction of information from patents
http://ochem.eu
Prediction errors for a set of drugs (Bergström dataset) using models developed with different training sets
[Bar chart: prediction error (°C, 0-60) as a function of training set size: 277 (Bergström), 4k (Karthikeyan), 50k (literature), 275k (patents)]
Prediction of solubility using logP and melting point (MP) based on 230k measurements
logS = 0.5 − 0.01·(MP − 25) − logP
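The equation is straightforward to apply; below is a minimal sketch (the function name is mine, and clamping the melting-point term to zero for liquids, MP < 25 °C, is a common convention for this type of equation rather than something stated on the slide):

```python
def log_s(mp_celsius: float, log_p: float) -> float:
    """Estimate log10 molar solubility from melting point (°C) and logP,
    per the slide: logS = 0.5 - 0.01*(MP - 25) - logP."""
    mp_term = max(mp_celsius - 25.0, 0.0)  # clamp for liquids (assumed convention)
    return 0.5 - 0.01 * mp_term - log_p

# Example: a compound melting at 150 °C with logP = 2.0
print(log_s(150.0, 2.0))  # -2.75
```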
Transfer Learning
• Inductive transfer learning: labeled data are available in a target domain
– Multi-task learning: labeled data are also available in a source domain
– Self-taught learning: no labeled data in a source domain
• Transductive learning: labeled data are available only in a source domain
– Domain adaptation: different domains but a single task
– Sample selection bias: single domain and single task
• Unsupervised transfer learning: no labeled data
(An example of the inductive branch follows below.)
Adapted from: Pan, S. J.; Yang, Q. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering 2010, 22, 1345-1359.
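As a minimal illustration of the inductive branch (fine-tuning; all layer sizes and names here are hypothetical, not from the talk), a network pre-trained on a large source property can be reused on a small target set by freezing its trunk and training a fresh head:

```python
# Inductive transfer learning by fine-tuning: reuse a pre-trained trunk,
# retrain only a new head on the small target dataset (sizes are illustrative).
import torch
import torch.nn as nn

trunk = nn.Sequential(nn.Linear(100, 64), nn.ReLU())  # shared representation
source_head = nn.Linear(64, 1)
# ... assume trunk + source_head were trained on the large source dataset ...

target_head = nn.Linear(64, 1)        # fresh head for the target property
for p in trunk.parameters():
    p.requires_grad = False           # freeze the transferred representation
optim = torch.optim.Adam(target_head.parameters(), lr=1e-3)

x_t, y_t = torch.randn(32, 100), torch.randn(32, 1)   # small target set
loss = nn.functional.mse_loss(target_head(trunk(x_t)), y_t)
loss.backward()                        # gradients flow only into the new head
optim.step()
```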
Multi-task learning
[Bar chart: mean absolute error (0-0.6) for models trained on one vs. several properties, human data, per tissue: fat, brain, liver, kidney, muscle]
Problem:
• prediction of tissue-air partition coefficients
• small datasets, 30-100 molecules (human & rat data)
Results: simultaneous prediction of several properties increased the accuracy of the models.
Varnek, A. et al. J. Chem. Inf. Model. 2009, 49, 133-144.
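A minimal sketch of how such multi-task (hard parameter sharing) models are typically set up, with sparse labels masked out of the loss. The layer sizes and the five-head layout are my assumptions for illustration, not the architecture of the cited paper:

```python
# Multi-task network sketch: one shared trunk, one head per endpoint.
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    def __init__(self, n_descriptors: int, n_tasks: int = 5):
        super().__init__()
        # Shared trunk: learns a representation common to all endpoints
        self.shared = nn.Sequential(
            nn.Linear(n_descriptors, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
        )
        # One regression head per tissue-air partition coefficient
        self.heads = nn.ModuleList([nn.Linear(32, 1) for _ in range(n_tasks)])

    def forward(self, x):
        h = self.shared(x)
        return torch.cat([head(h) for head in self.heads], dim=1)

# Training with missing labels: mask out endpoints not measured for a molecule
model = MultiTaskNet(n_descriptors=100)
x = torch.randn(8, 100)            # 8 molecules, 100 descriptors
y = torch.randn(8, 5)              # 5 endpoints (NaN where unmeasured)
mask = ~torch.isnan(y)
y = torch.nan_to_num(y)
pred = model(x)
loss = ((pred - y) ** 2 * mask).sum() / mask.sum()  # masked MSE
loss.backward()
```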
Analysis of toxicity of chemical compounds
129,142 toxicity measurements from RTECS*; 87,064 unique molecular structures; 29 toxicity endpoints
*RTECS: Registry of Toxic Effects of Chemical Substances
Sosnin, S. et al. J. Chem. Inf. Model. 2019, 59, 1062-1072.
Jain, S. et al. J. Chem. Inf. Model. 2021, 61(2), 653-663 – extended to >60 endpoints.
Toxicity prediction: single-task vs. multi-task models
[Bar chart: RMSE (0-1.6) of single-task (ST) vs. multi-task (MT) models per endpoint]
Sosnin, S. et al. J. Chem. Inf. Model. 2019, 59, 1062-1072.
Data storage and model development http://ochem.eu
Tetko, I. V.; Maran, U.; Tropsha, A. Mol. Inform. 2017, 36.
Comparison of different models, RMSE
[Chart comparing single-task and multi-task models]
Comprehensive model view
Traditional Representations of Chemical Structures
Saccharin: 1D (C7H5NO3S), 2D, and 3D representations
Karlov, D. S. et al. RSC Advances 2019, 9, 5151-5157.
Sedykh, A. et al. Chem. Res. Toxicol. 2021, 34, 634-640.
Design of interpretable descriptors
Structural alerts and extended functional groups (EFG): http://ochem.eu; Salmina, E. S. et al. Molecules 2016, 21, 1.
ToxPrints: Yang, C. et al. J. Chem. Inf. Model. 2015, 55, 510-528.
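A minimal sketch of how structural-alert descriptors are computed in practice, via SMARTS substructure matching; the two alert patterns below are generic illustrations of mine, not the actual EFG or ToxPrint definitions from the cited papers:

```python
# Binary structural-alert fingerprint via SMARTS matching with RDKit.
from rdkit import Chem

alerts = {
    "aromatic nitro": Chem.MolFromSmarts("[c][N+](=O)[O-]"),
    "Michael acceptor": Chem.MolFromSmarts("C=CC=O"),
}
mol = Chem.MolFromSmiles("O=[N+]([O-])c1ccccc1")  # nitrobenzene
fingerprint = {name: mol.HasSubstructMatch(patt) for name, patt in alerts.items()}
print(fingerprint)  # {'aromatic nitro': True, 'Michael acceptor': False}
```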
Overview of the workflow used to analyze the Tox21 450k dataset. (a) Overall study design. (b) Construct and evaluate the predictive model with the selected predictor, modeling algorithm, and endpoint.
Wu, L. et al. Chem. Res. Toxicol. 2021, 34, 541-549.
Overall best performance (5CV/TEST):
RF + all: 0.84/0.84; LS-SVM + MORDRED: 0.87/0.88; RF + EFG: 0.81/0.86; All + MOLD2: 0.82/0.84
Wu, L. et al. Chem. Res. Toxicol. 2021, 34, 541-549.
Machine Learning directly from chemical structures
Text processing: convolutional neural networks, Transformers, LSTM
Graph processing: message-passing neural networks
Auto-encoder:
Saccharin:c1ccc2c(c1)C(=O)NS2(=O)=O
SMILES canonization by machine learning → transfer learning to new data
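The model learns canonization as a translation task; the rule-based RDKit equivalent used to generate such (arbitrary SMILES → canonical SMILES) training pairs looks roughly like this (a sketch, not the project's own code):

```python
# Generating (random SMILES -> canonical SMILES) pairs with RDKit,
# the kind of data a canonization model can be pre-trained on.
from rdkit import Chem

mol = Chem.MolFromSmiles("c1ccc2c(c1)C(=O)NS2(=O)=O")   # saccharin
random_forms = {Chem.MolToSmiles(mol, doRandom=True) for _ in range(5)}
canonical = Chem.MolToSmiles(mol)                        # one canonical string
for s in random_forms:
    print(s, "->", canonical)
```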
Convolutional vs. Descriptor-based Neural Networks
Coefficient of determination, r². Transformer-CNN provides similar or better accuracy compared to traditional methods based on descriptors, even for small datasets (hundreds of compounds!).
Karpov, P. et al. J. Cheminform. 2020, 12, 17. https://github.com/bigchem/transformer-cnn
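For readers unfamiliar with descriptor-less text models, here is a toy character-level SMILES CNN (purely illustrative; the actual Transformer-CNN in the repository above is considerably more elaborate, and the vocabulary below is a simplification):

```python
# Toy character-level CNN over SMILES: embed characters, convolve, pool.
import torch
import torch.nn as nn

vocab = {ch: i + 1 for i, ch in enumerate("()=#123456789CNOSclnos[]+-@H")}

def encode(smiles: str, max_len: int = 64) -> torch.Tensor:
    ids = [vocab.get(ch, 0) for ch in smiles][:max_len]
    return torch.tensor(ids + [0] * (max_len - len(ids)))  # zero-padded

class SmilesCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(len(vocab) + 1, 32)
        self.conv = nn.Conv1d(32, 64, kernel_size=5, padding=2)
        self.out = nn.Linear(64, 1)

    def forward(self, x):                     # x: (batch, seq)
        h = self.emb(x).transpose(1, 2)       # (batch, channels, seq)
        h = torch.relu(self.conv(h)).max(dim=2).values  # global max pooling
        return self.out(h)

model = SmilesCNN()
batch = torch.stack([encode("c1ccc2c(c1)C(=O)NS2(=O)=O")])  # saccharin
print(model(batch))  # one predicted property value per molecule
```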
Layer-wise Relevance Propagation (LRP)
[Excerpt from Tetko, I. V. et al. J. Chem. Inf. Comput. Sci. 1996, 36, 794-803, reproduced on the slide:]

…minimum of ANN on the learning set and as a rule matches the end of a network training, where ANN overtraining takes place. The S3 point was used to investigate the influence of overtraining on variable selection and to prevent the network falling into "local minima". The termination of network training consisted in limiting the network run to $N_{\mathrm{all}} = 10{,}000$ epochs or to stop its learning after $N_{\mathrm{stop}} = 2{,}000$ epochs following the last improvement of MSE in either of the S1 or S2 points (see details in ref 8).

ANALYZED PRUNING METHODS

Pruning methods take an important place in the ANN literature. Two main groups of methods can be identified: (1) sensitivity methods and (2) penalty term methods. The sensitivity methods introduce some measures of importance of weights by so-called "sensitivities". The sensitivities of all weights are calculated, and the elements with the smallest sensitivities are deleted. The second group modifies the error function by introducing penalty terms. These terms drive some weights to zero during training. A good overview of the literature on these and some other types of pruning methods can be found in ref 29.

Usually, pruning methods were used to eliminate redundant weights or to optimize internal architectures (number of hidden layers and number of neurons on the hidden layers) of ANN. Occasionally, some conclusions about the relevance of input variables were made if all weights from some input neurons were deleted. Authors did not try to focus their analysis on the determination of some relevant set of variables, because it was impossible to do when pruning was concentrated at the level of a single weight. However, the weight sensitivities from the first group of methods can be easily converted to neuron sensitivities.

We investigate here five pruning methods A-E. A sensitivity $S_i$ of input variable $i$ is introduced by

$$S_i = \sum_{k \in \chi_i} s_k \equiv \sum_{j=1}^{n_j} s_{ji} \qquad (1)$$

where $s_k$ is a sensitivity of a weight $w_k$, and summation is over a set $\chi_i$ of outgoing weights of the neuron $i$ or, using another order of weight numeration, $s_{ji}$ is a sensitivity of a weight $w_{ji}$ connecting the $i$th neuron to the $j$th neuron in the next layer. The method B was developed by us previously, while in the other methods an estimation of weight sensitivity was updated from the appropriate references.

The sensitivity of a neuron of ANN in the first two methods is based on direct analysis of the magnitudes of its outgoing weights ("magnitude-based" methods). Conversely, in the last three methods, the sensitivity is approximated by a change of the network error due to the elimination of some neuron weights ("error-based" methods).

(A) The first method was inspired by ref 14. The generation of color maps is useful for visual determination of the most important input variables, but it is rather subjective. This technique cannot be used, however, for automatic processing of large amounts of data. Therefore, the absolute magnitude of a weight $w_k$,

$$s_k = |w_k| \qquad (2)$$

was used as its sensitivity in eq 1. In such a way the most appropriate relation between ref 14 and our model is achieved. Sensitivities of neurons calculated accordingly were successfully used to prune unimportant neurons responsible for four basic tastes (see ref 34 for details).

(B) The preliminary version of the second approach was used to predict activity of anti-AIDS substances according to

$$S_i = \sum_{j=1}^{n_j} \left( \frac{w_{ji}}{\max_a |w_{ja}|} \right)^2 \qquad (3)$$

where $\max_a$ is taken over all weights ending at neuron $j$. The rationale of this equation is that different neurons in an upper layer can work in different modes (Figure 4). Thus, a neuron working near saturation has bigger input values in comparison with one working in linear mode. An input neuron connected principally to the linear-mode neurons on the hidden layer is always characterized by smaller sensitivities when compared with a neuron primarily connected to the saturated ones. The sensitivity calculated by (3) eliminates this drawback. This method was improved to take into account the sensitivity of neurons in the upper layers:

$$S_i = \sum_{j=1}^{n_j} \left( \frac{w_{ji}}{\max_a |w_{ja}|} \right)^2 S_j \qquad (4)$$

where $S_j$ is a sensitivity of the $j$th neuron in the upper layer. This is a recurrent formula calculating the sensitivity of a unit on layer $s$ via the neuron sensitivities on layer $s+1$. The sensitivities of output-layer neurons are set to 1. All sensitivities in a layer are usually normalized to a maximum value of 1. This type of sensitivity calculation was especially useful in the case of simultaneous analysis of several output patterns.

(C) Let us consider an approximation of the error function $E$ of an ANN by a Taylor series. When the weight vector $W$ is perturbed, the change in the error is approximately

$$\delta E = \sum_i g_i\,\delta w_i + \frac{1}{2}\sum_i h_{ii}\,\delta w_i^2 + \frac{1}{2}\sum_{i \neq j} h_{ij}\,\delta w_i\,\delta w_j + O(\|\delta W\|^3) \qquad (5)$$

where the $\delta w_i$ are the components of $\delta W$, the $g_i$ are the components of the gradient of $E$ with respect to $W$, and the $h_{ij}$ are elements of the Hessian matrix $H$.

Figure 4. Modes of an activation function $f(\xi) = 1/(1 + e^{-\xi})$ in dependence on the input $\xi = \sum_i w_i a_i$ of lower-layer nodes. Here $a_i$ is the value of a neuron $i$ on the previous layer.

Tetko, I. V. et al. J. Chem. Inf. Comput. Sci. 1996, 36, 794-803. Bach, S. et al. PLoS ONE 2015, 10, e0130140.
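A small NumPy sketch of the magnitude-based sensitivities in eqs 3-4 above (variable names are mine; this is an illustration of the formulas, not code from the paper):

```python
# Back-propagating neuron sensitivities per eq 4: S_i = sum_j (w_ji / max_a |w_ja|)^2 * S_j
import numpy as np

def layer_sensitivities(W: np.ndarray, S_upper: np.ndarray) -> np.ndarray:
    """W[j, i]: weight from neuron i in this layer to neuron j in the next.
    S_upper[j]: sensitivity of neuron j in the next layer (output layer: 1)."""
    norm = np.abs(W).max(axis=1, keepdims=True)   # max_a |w_ja| per upper neuron j
    return (((W / norm) ** 2) * S_upper[:, None]).sum(axis=0)

# Backward pass through a toy 6-4-1 network:
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 6))     # 6 inputs -> 4 hidden
W2 = rng.normal(size=(1, 4))     # 4 hidden -> 1 output
S_hidden = layer_sensitivities(W2, np.ones(1))
S_hidden /= S_hidden.max()       # normalize to a maximum of 1, as in the paper
S_input = layer_sensitivities(W1, S_hidden)
print(S_input)                   # low values flag candidate inputs for pruning
```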
Interpretation of models for Transformer-CNN
Karpov, P.; Godin, G.; Tetko, I. V. J. Cheminform. 2020, 12, 17. https://github.com/bigchem/transformer-cnn
Schematic of the XAI methodology and neural network architecture. A message-passing graph neural network (GNN) and a forward fully connected neural network (FNN) were combined to process an input presented as a molecular graph with atom, bond, and computed global properties (e.g., octanol–water partition coefficient, topological polar surface area). The integrated gradients method was applied to compute atom, bond, and global importance scores.
Jiménez-Luna, J. et al. J. Chem. Inf. Model. 2021, 10.1021/acs.jcim.0c01344.
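A compact sketch of integrated gradients for a generic differentiable model (zeros baseline, straight-line path; my simplification over feature vectors, not the authors' implementation, which attributes over graph inputs):

```python
# Integrated gradients: attribute a prediction to input features by
# integrating gradients along a path from a baseline to the input.
import torch

def integrated_gradients(model, x, baseline=None, steps=50):
    baseline = torch.zeros_like(x) if baseline is None else baseline
    # Interpolate between baseline and input, then average the gradients
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, *([1] * x.dim()))
    path = baseline + alphas * (x - baseline)       # (steps, *x.shape)
    path.requires_grad_(True)
    model(path).sum().backward()
    avg_grad = path.grad.mean(dim=0)
    return (x - baseline) * avg_grad                # attribution per feature

# Usage with any differentiable model taking a feature vector:
model = torch.nn.Sequential(torch.nn.Linear(10, 8), torch.nn.ReLU(),
                            torch.nn.Linear(8, 1))
x = torch.randn(10)
print(integrated_gradients(model, x))
```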
Examples of motifs indicating a) hERG and b) CYP450 inhibition.
Jiménez-Luna, J. et al. J. Chem. Inf. Model. 2021, 10.1021/acs.jcim.0c01344.
Accuracy of predictions for classification models
Advanced Machine Learning for Innovative Drug Discovery
15 PhD positions at http://ai-dd.eu – deadline April 18th!
Take home message
• ADMETox modeling depends on the quality and amount of data
• Text processing methods can automatically generate data
– Data extraction from patents is a mature technology
– Image processing methods can extract data from books, reports, and PDFs
• Simultaneous modeling of related properties increases model quality
• The use of interpretable descriptors and interpretable methods should not be neglected
• The use of descriptor-less methods provides highly predictive models
• Explainable Artificial Intelligence (XAI) to explain models is on the rise
"Compared to Big Data challenges ("how to best analyze the Big Data"), the future progress is linked to the need for explainable, "chemistry-aware" methods."*
*Tetko, I. V.; Engkvist, O. J. Cheminform. 2020, 12, 74.
Acknowledgement
Pavel Karpov, Zhonghua Xia, Mark Embrechts, Dipan Ghosh, Joseph Yap, Elena Dracheva, Genny Cau, Monica Campillos, Guillaume Godin (Firmenich), Sergey Sosnin (Skoltech), Maxim Fedorov (Skoltech), Yura Sushko (Google), Sergey Novotarskyi (Facebook), Robert Körner, M. Sattler (HMGU)