
Research Article

SDTRLS: Predicting Drug-Target Interactions for Complex Diseases Based on Chemical Substructures

Cheng Yan,1,2 Jianxin Wang,1 Wei Lan,1 Fang-Xiang Wu,3 and Yi Pan4

1 School of Information Science and Engineering, Central South University, Changsha, Hunan 410083, China
2 School of Computer and Information, Qiannan Normal University for Nationalities, Duyun, Guizhou 558000, China
3 Department of Mechanical Engineering and Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK, Canada
4 Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA

Correspondence should be addressed to Jianxin Wang; jxwang@mail.csu.edu.cn

Received 8 April 2017; Revised 19 October 2017; Accepted 1 November 2017; Published 3 December 2017

Academic Editor: Daniela Paolotti

Copyright © 2017 Cheng Yan et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

It is well known that drug discovery for complex diseases via biological experiments is a time-consuming and expensive process. Alternatively, computational methods provide a low-cost and high-efficiency way of predicting drug-target interactions (DTIs) from biomolecular networks. However, current computational methods mainly deal with DTI predictions for known drugs; there are few methods for large-scale prediction for failed drugs and new chemical entities that are currently stored in biological databases and that may be effective for diseases other than their originally targeted ones. In this study, we propose a method (called SDTRLS) which predicts DTIs through the RLS-Kron model with chemical substructure similarity fusion and Gaussian Interaction Profile (GIP) kernels. SDTRLS can be an effective predictor for targets of old drugs, failed drugs, and new chemical entities from large-scale biomolecular network databases. Our computational experiments show that SDTRLS outperforms the state-of-the-art SDTNBI method; specifically, in the G protein-coupled receptors (GPCRs) external validation, the maximum and the average AUC values of SDTRLS are 0.842 and 0.826, respectively, which are superior to those of SDTNBI, which are 0.797 and 0.766, respectively. This study provides an important basis for new drug development and drug repositioning based on biomolecular networks.

1. Introduction

The identification of target molecules associated with specific diseases is the basis of modern drug discovery and development [1–3]. Therefore, the identification of drug-target interactions (DTIs) is important for drug development. However, it is well known that drug discovery is a costly and time-consuming process in the field of pharmacology. According to statistical data from the US Food and Drug Administration, the cost of discovering a new drug is approximately $1.8 billion and it takes an average of 13 years [4]. How to deal with this problem has therefore become a pressing issue. Over the past decades, a large number of researchers and organizations have used advances in computing technology to develop computational methods and tools [5–13] for large-scale prediction of potential DTIs and for drug repositioning.

Meanwhile, a large amount of DTI data has been generated with the rapid growth of public chemical and biological databases. For example, PubChem [14] is a freely available chemistry database. The DrugBank [15] database currently contains 7759 drug entities, 4104 target proteins, and 15199 DTIs. The freely available online ChEMBL [16] database provides pharmaceutical chemists with a convenient platform for querying target bioactivity data for compounds or targets. In addition, various other resources, including TTD [17], KEGG [18], SIDER [19], STITCH [20], STRING [21], and BindingDB [22], have established the basis for DTI prediction.

With the development of computational methods, it is now possible to identify potential DTIs and repurpose existing drugs quickly and inexpensively [23–27]. These methods fall mainly into three categories: basic network-based models, machine learning-based models, and other similarity-based approaches [28].

Hindawi, Complexity, Volume 2017, Article ID 2713280, 10 pages, https://doi.org/10.1155/2017/2713280


From the viewpoint of basic network-based models, Cheng et al. [29] developed a method to predict DTIs through network-based inference (NBI). NBI outperforms drug-based similarity inference (DBSI) and target-based similarity inference (TBSI) because it makes full use of the known DTIs. Moreover, node- and edge-weighted NBI variants were developed by assigning weights to the nodes and edges of the drug-target network. Network-based Random Walk with Restart on the Heterogeneous network (NRWRH) was developed by Chen et al.; it implements a random walk on a heterogeneous network composed of the protein-protein similarity network, the drug-drug similarity network, and the known drug-target interaction network [30]. It is an enhanced version of the traditional random walk that improves predictive performance by making full use of the data in the integrated heterogeneous network.

Some machine learning-based approaches have also been developed to predict DTIs. Following Bleakley et al. [31] and Mordelet and Vert [32], Bleakley and Yamanishi [33] proposed the bipartite local model (BLM), which uses local support vector machine (SVM) classifiers trained on known DTIs and integrates chemical structure similarity and protein sequence similarity. The Gaussian Interaction Profile (GIP) kernels on drug-target networks developed by van Laarhoven et al. [34] brought significant improvements. To address the problem of negative samples, Lan et al. [35] proposed a prediction method (PUDT) which classifies unlabeled samples into reliable negative examples and likely negative examples based on the similarity of protein structure and achieved good results.

Matrix decomposition techniques are also used for predicting DTIs, miRNA-disease associations [36, 37], and so on. They map the DTI matrix to low-dimensional matrices in order to infer hidden interactions from the known ones. Gönen [38] proposed a Bayesian model that combines dimensionality reduction, matrix factorization, and binary classification for predicting DTIs by integrating drug-drug chemical similarity and protein-protein sequence similarity. The Multiple Similarities Collaborative Matrix Factorization (MSCMF) [39] method projects drugs and targets into a common low-rank feature space and significantly improves the results by adjusting the weights of the similarity matrices of drugs and targets. Ezzat et al. [40] developed a regularized matrix factorization method that uses additional similarity information to account for the many nonoccurring edges in the interaction matrix that are actually unknown or hidden cases. DrugE-Rank [12] is a machine learning-based model that combines the advantages of feature-based and similarity-based methods to improve prediction performance.

Although the above methods have achieved good results in predicting new DTIs for known drugs, it is also important to predict DTIs for failed drugs and new chemical entities. Thousands of drugs have failed in clinical phases, and the US National Center for Advancing Translational Sciences is spending US$20 million on research into repurposing 58 failed drugs [41, 42], as drugs that failed against their initially targeted diseases may be effective against other diseases. Wu et al. [43] proposed an integrated network and chemoinformatics tool for systematic prediction of DTIs and drug repositioning, namely, SDTNBI (substructure-drug-target network-based inference), which predicts new DTIs for failed drugs and new chemical entities by integrating known DTIs and the chemical substructures of failed drugs or new chemical entities through resource diffusion. Their study assumed that chemical substructures play key roles in DTIs. The method achieved good prediction results for large-scale failed drugs and new chemical entities based on the chemical substructures shared between them and the known drugs.

In this study, we propose a method called SDTRLS (substructure-drug-target Kronecker product kernel regularized least squares) for large-scale DTI prediction and drug repositioning based on the chemical substructures of known drugs, failed drugs, and new chemical entities. Firstly, we compute the substructure similarity and then create Gaussian Interaction Profile (GIP) kernels for drug entities and target proteins based on known DTIs. The $k$-nearest neighbor (KNN) method is used to compute the initial interaction scores of a new chemical entity or failed drug that has no known DTIs. Through similarity network fusion (SNF) [44], the substructure similarity and the GIP kernel of drugs are integrated. SNF substantially outperforms single-type data analysis and establishes an integrative approach to predicting DTIs. Finally, the RLS-Kron [34] classifier is used to predict DTIs; it constructs a large kernel that directly relates drug-target pairs by combining the similarity kernels of drug entities and target proteins. To comprehensively assess the performance of our method, we compare it against current state-of-the-art algorithms using the same data and evaluation criteria. We use 10-fold cross validation and external validation to show the accuracy and robustness of our method. The computational results show that SDTRLS is comparable to the other five methods in terms of stability. In particular, on the G protein-coupled receptors (GPCRs) external validation dataset, the maximum and average AUC values are 0.842 and 0.826, respectively, which are superior to the 0.797 and 0.766 of the state-of-the-art SDTNBI method. To further confirm the prediction ability of SDTRLS, we perform an analysis of some of the prediction results. In summary, we provide a new alternative method for DTI prediction for known drugs, failed drugs, and new chemical entities. It provides a basis for drug discovery, drug development, and personalized medical treatment in the future.

2. Materials

This study used five internal validation datasets and two external validation datasets. The internal datasets are used to validate the predictions of new DTIs for known drugs, and the external datasets are used to validate the predictions of all DTIs for new chemical entities and failed drugs. The five internal datasets are G protein-coupled receptors (GPCRs), kinase superfamily (Kinases), ion channels (ICs), nuclear receptors (NRs), and Global. GPCRs and Kinases were downloaded from the ChEMBL database; ICs and NRs were collected from the ChEMBL and BindingDB databases. Global is a global network covering genome-wide targets, in which all drugs also come from the DrugBank database. The two external datasets were selected from GPCRs and Kinases in the DrugBank database, respectively.

Table 1: Drugs, targets, and DTIs in each dataset.

Datasets            Targets     $S_d$   $S_t$   $N_{dt}$   Sparsity (%)
Internal datasets   GPCRs       4741    97      17111      3.72
                    Kinases     2827    206     13647      2.34
                    ICs         7929    97      8944       1.16
                    NRs         5218    35      7366       4.03
                    Global      1844    1032    10185      0.54
External datasets   ExGPCRs     92      46      271        6.4
                    ExKinases   188     28      202        3.84

$S_d$ is the number of drugs, $S_t$ is the number of targets, $N_{dt}$ is the number of known DTIs, and sparsity is the proportion of $N_{dt}$ to all possible DTIs in the dataset.

External validation predicts all DTIs for a set of drugs, so it needs a basic dataset that includes drugs, targets, and known DTIs. GPCRs and Kinases are the basic datasets for ExGPCRs and ExKinases, respectively. For example, the 17111 known DTIs of GPCRs in Table 1 serve as the prior knowledge for the external validation on ExGPCRs.

Table 1, which contains statistics for the five internal validation datasets and the two external datasets, shows that the 92 drugs of ExGPCRs and the 4741 drugs of GPCRs are independent of each other. However, the 46 targets of ExGPCRs are a subset of the 97 targets of GPCRs. The relationship of drugs and targets between Kinases and ExKinases is the same as that between GPCRs and ExGPCRs. These datasets can be downloaded from http://lmmd.ecust.edu.cn/methods/sdtnbi.
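As a quick consistency check, the sparsity column of Table 1 can be recomputed from the other three columns, reading sparsity as the percentage of known DTIs among all possible drug-target pairs. The sketch below only restates the counts reported in Table 1; the function and variable names are illustrative and not part of the original study.

```python
# Counts from Table 1: (number of drugs, number of targets, number of known DTIs).
table1 = {
    "GPCRs": (4741, 97, 17111),
    "Kinases": (2827, 206, 13647),
    "ICs": (7929, 97, 8944),
    "NRs": (5218, 35, 7366),
    "Global": (1844, 1032, 10185),
    "ExGPCRs": (92, 46, 271),
    "ExKinases": (188, 28, 202),
}


def sparsity_percent(n_drugs, n_targets, n_dtis):
    """Known DTIs as a percentage of all possible drug-target pairs."""
    return 100.0 * n_dtis / (n_drugs * n_targets)


sparsity = {name: sparsity_percent(*counts) for name, counts in table1.items()}

for name, value in sparsity.items():
    print(f"{name:10s} {value:.2f}%")
```

All seven datasets have fewer than 7% of their possible pairs labeled as interactions, which is why the unlabeled pairs dominate every evaluation reported later.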

2.1. Chemical Substructure. In this study, we used seven types of fingerprints to express the chemical substructures of each molecule. All substructure data were generated with the PaDEL-Descriptor software: CDK Fingerprint, CDK Extended Fingerprint, CDK Graph Only Fingerprint, Substructure Fingerprint, Klekota-Roth Fingerprint, MACCS Fingerprint, and PubChem Fingerprint, abbreviated as CDK, CDKExt, Graph, FP4, KR, MACCS, and PubChem, respectively. For each fingerprint type, each molecule is represented by a multidimensional vector with values of 0 or 1. We only used the substructures that appear in the datasets.

Table 2 gives an overview of the seven substructure types on the GPCRs dataset, including the dimension of each. The dimensions were obtained by counting, for each type, all substructures that appear in the dataset.

Table 2: The dimensions on GPCRs.

Chemical substructure types   Dimensions
CDK                           1024
CDKExt                        1012
FP4                           131
Graph                         1023
KR                            1834
MACCS                         153
PubChem                       627

3. Methods

3.1. Chemical Substructure Similarity. Let $S = \{s_1, s_2, \ldots, s_K\}$ be the set of all substructures for one of the seven chemical substructure types, where $K$ is the dimension of that type. For example, $K$ is 1024 for CDK and 153 for MACCS. $D = \{d_1, d_2, \ldots, d_m\}$ is the set of drugs, where $m$ is the number of drugs. For one substructure type, drug $d_i$ can be represented by a profile (binary vector) over the substructures, that is, $DS(d_i) = \{ds_1(d_i), ds_2(d_i), \ldots, ds_K(d_i)\}$. If drug $d_i$ has $s_k$, the value of $ds_k(d_i)$ is 1; otherwise it is 0. For a given substructure type, the substructure similarity $S_{\mathrm{subsim}}(d_i, d_j)$ of drugs $d_i$ and $d_j$ can be computed by the weighted cosine correlation coefficient based on the substructure information [27]:

$$S_{\mathrm{subsim}}(d_i, d_j) = \frac{\sum_{k=1}^{K} w_k\, ds_k(d_i)\, ds_k(d_j)}{\sqrt{\sum_{k=1}^{K} w_k\, ds_k^2(d_i)}\,\sqrt{\sum_{k=1}^{K} w_k\, ds_k^2(d_j)}}, \tag{1}$$

where $w_k$ is the weight of the $k$th substructure ($s_k$), which can be calculated by the formula [27]

$$w_k = \exp\left(-\frac{f_k^2}{\delta^2 h^2}\right), \tag{2}$$

where $f_k$ is the frequency of chemical substructure $s_k$ in the whole dataset, $\delta$ is the standard deviation of $\{s_k\}_{k=1}^{K}$, and $h$ is a parameter (set to 0.1 in this study). The rationale for weighting the substructures when computing the similarity between drugs and new chemical entities is that substructures with fewer occurrences should contribute more than substructures that appear frequently.
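The weighting and the weighted cosine similarity described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the frequency of a substructure is taken to be the fraction of molecules that contain it, the standard deviation is taken over those frequencies, and the function names `substructure_weights` and `weighted_cosine` are hypothetical. Drugs whose weighted profile has zero norm are given similarity 0 to avoid division by zero.

```python
import numpy as np


def substructure_weights(fp, h=0.1):
    """Weights exp(-f_k^2 / (delta^2 * h^2)) for a binary (m, K) fingerprint matrix."""
    fp = np.asarray(fp, dtype=float)
    f = fp.mean(axis=0)   # assumption: f_k = fraction of molecules containing substructure k
    delta = f.std()       # assumption: delta = standard deviation of the frequencies
    if delta == 0.0:      # all substructures equally frequent: fall back to equal weights
        return np.ones_like(f)
    return np.exp(-(f ** 2) / (delta ** 2 * h ** 2))


def weighted_cosine(fp, w):
    """Pairwise weighted cosine similarity between the rows of a binary matrix."""
    fp = np.asarray(fp, dtype=float)
    numerator = (fp * w) @ fp.T
    norms = np.sqrt((w * fp ** 2).sum(axis=1))
    denominator = np.outer(norms, norms)
    similarity = np.zeros_like(numerator)
    np.divide(numerator, denominator, out=similarity, where=denominator > 0)
    return similarity


# Toy example: 3 molecules, 3 substructures; substructure 0 is present everywhere.
fp = np.array([[1, 1, 0],
               [1, 0, 1],
               [1, 1, 0]])
w = substructure_weights(fp, h=1.0)   # h=1.0 only to keep the toy weights readable
S = weighted_cosine(fp, w)
```

In the toy example the ubiquitous substructure receives the smallest weight, so the similarity between molecules 0 and 1 is driven almost entirely by the rarer substructures they do not share.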

3.2. Gaussian Interaction Profile Kernel. Let $T = \{t_1, t_2, \ldots, t_n\}$ denote the set of $n$ targets. A drug-target network can be represented by a bipartite graph with adjacency matrix $Y \in R^{m \times n}$, where $y_{ij}$ is 1 if $d_i$ and $t_j$ have a known DTI and 0 otherwise. The Gaussian Interaction Profile (GIP) kernel is constructed from the topology of the known DTI network [10, 34]. The kernel of drugs $d_i$ and $d_j$ can be formulated as

$$K_{\mathrm{GIP},d}(d_i, d_j) = \exp\left(-\gamma_d \left\| Y(d_i) - Y(d_j) \right\|^2\right), \qquad \gamma_d = \frac{\alpha}{\frac{1}{N_d}\sum_{i=1}^{N_d} \left\| Y(d_i) \right\|^2}, \tag{3}$$

where $Y(d_i) = \{y_{i1}, y_{i2}, \ldots, y_{in}\}$ is the interaction profile of drug $d_i$, $N_d$ is the number of drugs, and $\alpha$ is a parameter that controls the bandwidth; we set it to 1 in this study. Similarly, the kernel of targets $t_i$ and $t_j$ can be calculated by (4) and (5):

$$K_{\mathrm{GIP},t}(t_i, t_j) = \exp\left(-\gamma_t \left\| Y(t_i) - Y(t_j) \right\|^2\right), \tag{4}$$

$$\gamma_t = \frac{\beta}{\frac{1}{N_t}\sum_{i=1}^{N_t} \left\| Y(t_i) \right\|^2}, \tag{5}$$

where $Y(t_j) = \{y_{1j}, y_{2j}, \ldots, y_{mj}\}^T$ is the interaction profile of target $t_j$ and $N_t$ is the number of targets; we also set the parameter $\beta$ to 1.
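The GIP kernel for drugs and for targets differs only in whether rows or columns of the interaction matrix are used as profiles, so one helper covers both. The following is a minimal NumPy sketch; the name `gip_kernel` and the toy matrix are illustrative, and the code assumes at least one known interaction so that the mean squared norm is nonzero.

```python
import numpy as np


def gip_kernel(profiles, bandwidth=1.0):
    """Gaussian Interaction Profile kernel between the rows of `profiles`.

    gamma is the bandwidth parameter (alpha or beta in the text) divided by
    the mean squared norm of the interaction profiles.
    """
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = (profiles ** 2).sum(axis=1)
    gamma = bandwidth / sq_norms.mean()
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b, clipped at 0.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


# Toy interaction matrix: 3 drugs (rows) x 3 targets (columns).
Y = np.array([[1, 0, 1],
              [1, 0, 0],
              [0, 1, 0]])
K_gip_d = gip_kernel(Y)      # drug kernel from row profiles
K_gip_t = gip_kernel(Y.T)    # target kernel from column profiles
```

Because the bandwidth is normalized by the average profile norm, the kernel values stay on a comparable scale across datasets with very different numbers of known interactions.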

3.3. Similarity Network Fusion. We have two similarity matrices for drugs (including known drugs and new chemical entities), namely, the substructure similarity $S_{\mathrm{subsim}} \in R^{m \times m}$ and the GIP kernel $K_{\mathrm{GIP},d} \in R^{m \times m}$. To construct a more comprehensive similarity kernel for drugs, we used the SNF method to fuse these two similarity kernels.

Firstly, the row-normalized matrices $P^{(1)}$ and $P^{(2)}$ are calculated from the drug similarity matrices $S_{\mathrm{subsim}}$ and $K_{\mathrm{GIP},d}$, respectively. Secondly, according to the $K$-nearest neighbors (KNN) method, the matrices $S^{(1)}$ and $S^{(2)}$ are obtained from $P^{(1)}$ and $P^{(2)}$ by the following equation [44]:

$$S(d_i, d_j) = \begin{cases} \dfrac{P(d_i, d_j)}{\sum_{d_k \in N(d_i)} P(d_i, d_k)}, & d_j \in N(d_i), \\ 0, & \text{otherwise}, \end{cases} \tag{6}$$

where $N(d_i)$ is the set of the top $N$ most similar neighbors of drug $d_i$. In this study, we set $N$ to 50. The main idea of SNF is to iteratively update the similarity matrices $P^{(1)}$ and $P^{(2)}$ [44]:

$$P^{(1)}_{t+1} = S^{(1)} \times P^{(2)}_{t} \times \left(S^{(1)}\right)^T, \qquad P^{(2)}_{t+1} = S^{(2)} \times P^{(1)}_{t} \times \left(S^{(2)}\right)^T, \tag{7}$$

where $t$ is the iteration index. The number of iterations is set to 20 in this study, so that the iteration does not run too long and $\max(\max(\mathrm{abs}((P^{(1)}_t + P^{(2)}_t)/2 - (P^{(1)}_{t-1} + P^{(2)}_{t-1})/2))) < 10^{-3}$. The initial matrices are defined as $P^{(1)}_{t=1} = P^{(1)}$ and $P^{(2)}_{t=1} = P^{(2)}$. The final similarity matrix $S_{\mathrm{final}} \in R^{m \times m}$ of drugs is the average of the matrices $P^{(1)}_{20}$ and $P^{(2)}_{20}$, that is, $S_{\mathrm{final}} = (P^{(1)}_{20} + P^{(2)}_{20})/2$.
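The fusion procedure above can be sketched as follows. This is a simplified illustration of the steps as described in this section, not the reference SNF implementation: it assumes that a drug counts among its own nearest neighbors, and it renormalizes the rows after every update for numerical stability, which the text does not state. The function names are hypothetical.

```python
import numpy as np


def row_normalize(W):
    """Scale each row so that it sums to 1."""
    W = np.asarray(W, dtype=float)
    return W / W.sum(axis=1, keepdims=True)


def knn_kernel(P, n_neighbors):
    """Keep the n largest entries of each row and renormalize them (local kernel)."""
    S = np.zeros_like(P)
    for i in range(P.shape[0]):
        idx = np.argsort(P[i])[::-1][:n_neighbors]   # assumption: self may be a neighbor
        S[i, idx] = P[i, idx] / P[i, idx].sum()
    return S


def fuse(W1, W2, n_neighbors=50, n_iter=20, tol=1e-3):
    """Fuse two similarity matrices by cross-diffusion and return their average."""
    P1, P2 = row_normalize(W1), row_normalize(W2)
    S1, S2 = knn_kernel(P1, n_neighbors), knn_kernel(P2, n_neighbors)
    fused = (P1 + P2) / 2
    for _ in range(n_iter):
        # Both updates use the matrices from the previous iteration.
        P1, P2 = S1 @ P2 @ S1.T, S2 @ P1 @ S2.T
        # Assumption: row renormalization after each update, added for stability.
        P1, P2 = row_normalize(P1), row_normalize(P2)
        previous, fused = fused, (P1 + P2) / 2
        if np.abs(fused - previous).max() < tol:
            break
    return fused


# Toy example with two 4 x 4 similarity matrices.
W1 = np.array([[1.0, 0.8, 0.1, 0.2],
               [0.8, 1.0, 0.3, 0.1],
               [0.1, 0.3, 1.0, 0.7],
               [0.2, 0.1, 0.7, 1.0]])
W2 = np.array([[1.0, 0.6, 0.2, 0.1],
               [0.6, 1.0, 0.2, 0.3],
               [0.2, 0.2, 1.0, 0.9],
               [0.1, 0.3, 0.9, 1.0]])
S_final = fuse(W1, W2, n_neighbors=2)
```

The sparse local kernels restrict diffusion to each drug's nearest neighbors, so similarities supported by both inputs are reinforced while isolated, weak similarities fade over the iterations.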

3.4. Kron RLS. Kronecker product kernels are widely used for prediction problems in other studies and settings [45–47]. In this study, we also use a Kronecker product kernel to construct a larger kernel over drug-target pairs. The prediction of DTIs is then based on the ranking of the pairs, which include pairs of known drugs and targets as well as pairs of new entities or failed drugs and targets. A higher rank implies a higher probability that the interaction exists. Based on the kernels of drugs and targets, the Kronecker product kernel of drug-target pairs is constructed as follows [34]:

$$K((d_i, t_j), (d_k, t_l)) = K_d(d_i, d_k)\, K_t(t_j, t_l), \tag{8}$$

where $K_d(d_i, d_k)$ is the $(i, k)$th element of the drug kernel $S_{\mathrm{final}}$, while $K_t(t_j, t_l)$ is the $(j, l)$th element of the target kernel $K_{\mathrm{GIP},t}$.

According to the Kronecker product kernel of formula (8), the predictions of DTIs for all drug-target pairs can be calculated as follows [34]:

$$\mathrm{vec}(\hat{Y}^T) = K (K + \sigma I)^{-1}\, \mathrm{vec}(Y^T), \tag{9}$$

where $\hat{Y}$ is the matrix of predicted scores and $\sigma$ is a regularization parameter. A higher value of $\sigma$ gives a smoother result. When $\sigma = 0$, we get $\hat{Y} = Y$, which shows no generalization [34]. We also use the eigendecompositions of the kernel matrices, following van Laarhoven et al. [34]. The eigendecompositions of $K_d$ and $K_t$ are $K_d = V_d \Lambda_d V_d^T$ and $K_t = V_t \Lambda_t V_t^T$, in which $V_d$ and $V_t$ are the unitary matrices of eigenvectors and $\Lambda_d$ and $\Lambda_t$ are the diagonal matrices of eigenvalues for drugs and targets, respectively. Since the eigenvalues (eigenvectors) of a Kronecker product are the Kronecker products of the eigenvalues (eigenvectors), the Kronecker product kernel of drug-target pairs can be formulated as follows [34]:

$$K = K_d \otimes K_t = V \Lambda V^T, \tag{10}$$

in which

$$V = V_d \otimes V_t, \qquad \Lambda = \Lambda_d \otimes \Lambda_t. \tag{11}$$
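The practical benefit of the eigendecomposition is that the full Kronecker kernel over all drug-target pairs never has to be formed or inverted: the regularized least squares prediction can be computed from the two small decompositions. The sketch below illustrates this with NumPy; the function name `kron_rls`, the toy kernels, and the value chosen for the regularization parameter are illustrative assumptions (the text does not report the value used).

```python
import numpy as np


def kron_rls(K_d, K_t, Y, sigma=1.0):
    """Kronecker RLS prediction without building the Kronecker product kernel.

    K_d: (m, m) symmetric drug kernel, K_t: (n, n) symmetric target kernel,
    Y: (m, n) binary interaction matrix, sigma: regularization parameter.
    """
    lam_d, V_d = np.linalg.eigh(K_d)
    lam_t, V_t = np.linalg.eigh(K_t)
    lam = np.outer(lam_d, lam_t)              # eigenvalues of the Kronecker kernel
    projected = V_d.T @ Y @ V_t               # Y expressed in the joint eigenbasis
    return V_d @ (projected * lam / (lam + sigma)) @ V_t.T


# Toy example: 2 drugs, 3 targets.
K_d = np.array([[1.0, 0.5],
                [0.5, 1.0]])
K_t = np.array([[1.0, 0.2, 0.0],
                [0.2, 1.0, 0.3],
                [0.0, 0.3, 1.0]])
Y = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0]])
sigma = 1.0
Y_hat = kron_rls(K_d, K_t, Y, sigma)
```

For $m$ drugs and $n$ targets this replaces the inversion of an $mn \times mn$ matrix with two eigendecompositions of sizes $m \times m$ and $n \times n$, which is what makes the approach feasible on datasets with thousands of drugs.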

3.5. KNN for New Chemical Entities. New chemical entities or failed drugs have no known associations with targets, which makes it impossible to predict their associations with existing methods. In this study, we used the KNN method to estimate the interaction scores of new chemical entities or failed drugs from the similarity between them and known drugs. For example, denoting a new chemical entity or failed drug by $C_{\mathrm{new}}$, its interaction score with target $t_j$ can be computed by the formula

$$\mathrm{Score}(C_{\mathrm{new}}, t_j) = \frac{\sum_{l \in K_{\mathrm{new}}} S_{\mathrm{subsim}}(d_i, d_l)\, y_{lj}}{\sum_{l \in K_{\mathrm{new}}} S_{\mathrm{subsim}}(d_i, d_l)}, \tag{12}$$

where $S_{\mathrm{subsim}}(d_i, d_l)$ is the $(i, l)$th element of the chemical substructure similarity matrix $S_{\mathrm{subsim}} \in R^{m \times m}$ (with $i$ the index of $C_{\mathrm{new}}$), $y_{lj}$ is the $(l, j)$th element of $Y \in R^{m \times n}$, and $K_{\mathrm{new}}$ is the set of the top $K$ neighbors of $C_{\mathrm{new}}$ according to $S_{\mathrm{subsim}}$. In this study, we set $K$ to 4.
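The KNN initialization is a similarity-weighted average of the interaction profiles of the most similar known drugs. A minimal NumPy sketch is given below; the function name `knn_scores` and the toy numbers are illustrative, and the code assumes that the selected neighbors have a positive total similarity.

```python
import numpy as np


def knn_scores(sim_to_known, Y_known, k=4):
    """Initial interaction scores for a drug with no known DTIs.

    sim_to_known: (m,) substructure similarities between the new entity and
    the m known drugs; Y_known: (m, n) known interaction matrix.
    """
    sim = np.asarray(sim_to_known, dtype=float)
    Y_known = np.asarray(Y_known, dtype=float)
    top = np.argsort(sim)[::-1][:k]           # indices of the k most similar known drugs
    weights = sim[top]
    return weights @ Y_known[top] / weights.sum()


# Toy example: 5 known drugs, 2 targets, k = 2.
sim_new = [0.9, 0.1, 0.5, 0.0, 0.3]
Y_known = [[1, 0],
           [0, 1],
           [1, 1],
           [0, 0],
           [0, 1]]
scores = knn_scores(sim_new, Y_known, k=2)
```

With the two nearest neighbors (similarities 0.9 and 0.5), the first target is shared by both and scores 1.0, while the second is supported only by the weaker neighbor and scores 0.5/1.4.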


Table 3: The performance of 10-fold cross validation on 5 datasets (AUC).

Target    FP        DBSI-R          NWNBI           EWNBI*          NBI             SDTNBI*         SDTRLS
GPCRs     CDK       0.896 ± 0.003   0.981 ± 0.001   0.981 ± 0.001   0.980 ± 0.001   0.904 ± 0.003   0.982 ± 0.002
          CDKExt    0.895 ± 0.002   0.981 ± 0.001   0.981 ± 0.001   0.980 ± 0.001   0.901 ± 0.003   0.982 ± 0.002
          FP4       0.896 ± 0.002   0.981 ± 0.001   0.981 ± 0.001   0.980 ± 0.001   0.966 ± 0.002   0.979 ± 0.002
          Graph     0.897 ± 0.002   0.981 ± 0.001   0.981 ± 0.001   0.980 ± 0.001   0.917 ± 0.003   0.980 ± 0.001
          KR        0.909 ± 0.002   0.981 ± 0.001   0.981 ± 0.001   0.980 ± 0.001   0.960 ± 0.002   0.983 ± 0.002
          MACCS     0.881 ± 0.005   0.981 ± 0.001   0.981 ± 0.001   0.980 ± 0.001   0.931 ± 0.002   0.982 ± 0.001
          PubChem   0.895 ± 0.003   0.981 ± 0.001   0.981 ± 0.001   0.980 ± 0.001   0.918 ± 0.003   0.981 ± 0.001
Kinases
ICs       CDK       0.923 ± 0.002   0.582 ± 0.007   0               0.573 ± 0.013   0.932 ± 0.004   0.956 ± 0.005
          CDKExt    0.922 ± 0.003   0.582 ± 0.007   0               0.573 ± 0.013   0.931 ± 0.004   0.955 ± 0.005
          FP4       0.916 ± 0.003   0.582 ± 0.007   0               0.573 ± 0.013   0.954 ± 0.003   0.943 ± 0.005
          Graph     0.920 ± 0.003   0.582 ± 0.007   0               0.573 ± 0.013   0.940 ± 0.003   0.948 ± 0.005
          KR        0.932 ± 0.004   0.582 ± 0.007   0               0.573 ± 0.013   0.971 ± 0.002   0.953 ± 0.005
          MACCS     0.919 ± 0.004   0.582 ± 0.007   0               0.573 ± 0.013   0.941 ± 0.003   0.950 ± 0.005
          PubChem   0.916 ± 0.003   0.582 ± 0.007   0               0.573 ± 0.013   0.937 ± 0.003   0.949 ± 0.005
NRs
Global

0 represents the fact that we did not compute the prediction performance because of data reasons; * stands for the prediction results derived from previous studies.

4. Experiments and Results

4.1. Benchmark Evaluation and Evaluation Indices. To demonstrate the performance of our method, we adopt 10-fold cross validation and external validation. 10-fold cross validation is widely used in the prediction of DTIs [29, 48, 49] and in other interaction prediction tasks in bioinformatics. In this procedure, the whole dataset is randomly divided into 10 groups; each group in turn serves as the testing set while the remaining 9 groups serve as the training set, so that the process is repeated 10 times.
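The splitting procedure can be sketched with the Python standard library alone. The sketch below is illustrative: it assumes that the items being split are drug-target pairs, which the text does not spell out, and the name `ten_fold_splits` and the fixed seed are hypothetical choices.

```python
import random


def ten_fold_splits(items, n_folds=10, seed=0):
    """Yield (train, test) lists; each fold is the test set exactly once."""
    items = list(items)
    random.Random(seed).shuffle(items)
    folds = [items[i::n_folds] for i in range(n_folds)]
    for i in range(n_folds):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


# Toy example: 25 (drug, target) index pairs.
pairs = [(d, t) for d in range(5) for t in range(5)]
splits = list(ten_fold_splits(pairs))
```

In a DTI setting, the interactions in the test fold would be hidden from the interaction matrix before the kernels are computed, so that the GIP kernels do not leak the held-out labels.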

Furthermore, the DTIs of new chemical entities and failed drugs are a very important part of this study. We use two external datasets (ExGPCRs and ExKinases) to evaluate the performance of our method by predicting all interactions of their drugs.

Table 4: The performance of two external validations (AUC).

Target      FP        DBSI-R   NWNBI   EWNBI*   NBI     SDTNBI   SDTRLS
ExGPCRs     CDK       0.752    0.756   0.764    0.769   0.753    0.824
            CDKExt    0.751    0.756   0.764    0.769   0.751    0.804
            FP4       0.758    0.756   0.764    0.769   0.784    0.818
            Graph     0.758    0.756   0.764    0.769   0.761    0.842
            KR        0.770    0.756   0.764    0.769   0.797    0.840
            MACCS     0.754    0.756   0.764    0.769   0.758    0.822
            PubChem   0.754    0.756   0.764    0.769   0.759    0.831
ExKinases   MACCS     0.851    0.812   0.821    0.828   0.852    0.844
            PubChem   0.850    0.812   0.821    0.828   0.852    0.846

* stands for the prediction results derived from previous studies.

We use the AUC (area under the ROC curve) as the evaluation metric for SDTRLS, as was done for the SDTNBI method, and the values in Tables 3, 5, and 6 are presented in the format of mean ± standard deviation. The larger the AUC value is, the better the prediction is.
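For readers less familiar with the metric, the AUC equals the probability that a randomly chosen positive pair is scored higher than a randomly chosen negative pair, with ties counted as one half. The following pure-Python sketch computes it directly from that definition; it is quadratic in the number of samples and is meant only as an illustration, not as the evaluation code used in the study.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Toy example: one of the four positive/negative pairs is ranked the wrong way round.
example_auc = auc([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
```

A value of 0.5 corresponds to random ranking and 1.0 to a ranking in which every known interaction precedes every unlabeled pair.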

4.2. Cross Validation. Table 3 lists the AUC values obtained in the 10-fold cross validation on the 5 datasets. On the GPCRs dataset, the minimum AUC of SDTRLS among the seven substructures is 0.979 and the average is 0.981, which indicates good prediction results. On the NRs dataset, however, the validation results for each substructure are relatively poor: the minimum value is 0.905 with the Graph substructure, while the maximum value is 0.916. On the Kinases dataset, the results are also very stable, with maximum and minimum values of 0.973 and 0.969, respectively. On the ICs dataset, the results are reasonably good: the minimum AUC is 0.943 with the FP4 substructure and the maximum is 0.956 with the CDK substructure. Similarly, on the Global dataset, the results are stable, although slightly lower, with the values of the 7 substructures lying between 0.935 and 0.936. In general, the validation results on the GPCRs and Kinases datasets are better than those on the other three datasets. Moreover, the prediction performances of EWNBI, NWNBI, and NBI on the Kinases dataset are slightly better than that of SDTRLS, while SDTRLS has an obvious advantage on the ICs, NRs, and Global datasets. In addition, because the authors did not provide the data needed by the EWNBI method for three datasets (ICs, NRs, and Global) and the prediction results of datasets GPCRs and Kinases are not good, we do not compute the AUC values of the EWNBI method on these three datasets. Overall, SDTRLS and SDTNBI provide more stable prediction results across the 5 datasets.

4.3. External Validation. Table 4 lists the evaluation results of the six methods on the two external datasets, ExGPCRs and ExKinases, whose basic datasets are GPCRs and Kinases, respectively. Overall, the external validation results of all prediction methods are worse than the 10-fold cross validation results because new chemical entities have no known DTIs. On the ExGPCRs dataset, the AUC values of SDTRLS over the 7 substructures are between 0.804 and 0.842. On the ExKinases dataset, they are between 0.827 and 0.855. As can be seen from Table 4, the results of all approaches on ExKinases are better than on ExGPCRs. On the ExKinases dataset, there are no obvious differences in AUC values among DBSI-R, SDTNBI, and SDTRLS. On ExGPCRs, SDTRLS demonstrates its excellent prediction power.

4.4. Comparison with Previous Methods. The datasets used in this study are derived from those used for the SDTNBI method, the state-of-the-art method, whose prediction performance is more stable than that of the other 4 methods. In this study, SDTRLS is compared with SDTNBI by $t$-test statistical analysis and with the other 5 methods in terms of the parameter-independent AUC value.

Table 5 shows the $t$-test results for SDTNBI and SDTRLS on the five datasets GPCRs, Kinases, ICs, NRs, and Global. The average AUC of our method is greater than that of SDTNBI on every dataset, especially on GPCRs and Kinases, where it rises from 0.928 to 0.981 and from 0.919 to 0.971, respectively. Moreover, the differences are significant ($p < 0.05$) on the GPCRs, Kinases, and NRs datasets and highly significant ($p < 0.01$) on GPCRs and Kinases. In conclusion, our method is more stable than the SDTNBI method in terms of 10-fold cross validation.

Table 5: The $t$-test results of 10-fold cross validations on 5 datasets (AUC).

Methods   GPCRs           Kinases         ICs             NRs             Global
SDTNBI    0.928 ± 0.026   0.919 ± 0.028   0.944 ± 0.014   0.888 ± 0.023   0.928 ± 0.017
SDTRLS    0.981 ± 0.001   0.971 ± 0.001   0.951 ± 0.005   0.912 ± 0.004   0.935 ± 0.000
$p$       0.0002          0.0004          0.248           0.016           0.232

We also compare the prediction results with those of four other methods on the five datasets GPCRs, Kinases, ICs, NRs, and Global. The four competing methods are NBI, NWNBI, EWNBI, and DBSI-R [29]. NBI applies a mass diffusion-based method on the bipartite graph to obtain the predicted list. In the EWNBI method, the DTI network is weighted by the potency of the binding affinity or inhibitory activity of the interactions between drugs and targets. The theoretical basis of the NWNBI method is that hub nodes are more difficult to influence. The DBSI method is based on the hypothesis that two similar drugs may have similar targets. Table 3 shows that SDTRLS is slightly better than NBI, NWNBI, and EWNBI on GPCRs and much better than DBSI-R. On Kinases, SDTRLS is much better than DBSI-R while being comparable with NBI, NWNBI, and EWNBI. In general, SDTRLS is comparable to these four methods in the 10-fold cross validation on the GPCRs and Kinases datasets.

Figure 1: Robustness of SDTRLS with respect to the number of neighbors $N(d_i)$; the dotted line is the default value and its prediction performance. Panels: ExGPCRs with MACCS; ExGPCRs with Graph. Horizontal axis: $N(d_i)$, from 10 to 100; vertical axis: AUC, from 0.5 to 1.0.

Table 6: The $t$-test results of two external validations (AUC).

Methods   ExGPCRs         ExKinases
SDTNBI    0.766 ± 0.017   0.853 ± 0.005
SDTRLS    0.825 ± 0.013   0.842 ± 0.009
$p$       1.04e−05        0.019

Table 6 shows the results of SDTNBI and SDTRLS on the two datasets ExGPCRs and ExKinases. Our method greatly outperforms SDTNBI on ExGPCRs in terms of the average AUC and the $t$-test result ($p < 0.01$). On ExKinases, the average AUC of our method is slightly lower than that of SDTNBI, which may be due to the sparsity of known DTIs in this dataset.
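The ExGPCRs summary statistics can be checked against the per-fingerprint AUC values reported for ExGPCRs in Table 4. The sketch below uses only the standard library and assumes that the reported spread is the sample standard deviation over the seven fingerprint types; this assumption is not stated in the text, but it reproduces the reported values. The $p$ values are not recomputed because the exact form of the $t$-test is not specified.

```python
from statistics import mean, stdev

# ExGPCRs AUC values from Table 4, ordered CDK, CDKExt, FP4, Graph, KR, MACCS, PubChem.
sdtnbi = [0.753, 0.751, 0.784, 0.761, 0.797, 0.758, 0.759]
sdtrls = [0.824, 0.804, 0.818, 0.842, 0.840, 0.822, 0.831]

summary = {
    name: (mean(values), stdev(values))   # assumption: sample standard deviation
    for name, values in (("SDTNBI", sdtnbi), ("SDTRLS", sdtrls))
}

for name, (avg, spread) in summary.items():
    print(f"{name}: {avg:.3f} +/- {spread:.3f}")
```

Under this reading, the SDTNBI row gives roughly 0.766 with a spread of 0.017 and the SDTRLS row roughly 0.826 with a spread of 0.013, within rounding of the ExGPCRs column of Table 6.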

We also compare the prediction results of our method with those of the other four competing methods on the same datasets, ExGPCRs and ExKinases. Table 4 shows that SDTRLS outperforms the other four competing methods on the ExGPCRs dataset and is comparable with them on the ExKinases dataset.

4.5. Parameter Analysis for $N(d_i)$ and $K$. In this section, we analyze two parameters: $N(d_i)$ for similarity network fusion and $K$ for new chemical entities. The parameter $h$ was set to be 1 according to a previous study [27]. Moreover, since GIP is widely used in other studies [10, 34, 37, 50, 51], we also set both $\alpha$ and $\beta$ to 1. All results were obtained by external validation on the ExGPCRs dataset with the MACCS and Graph substructures. Figure 1 shows the sensitivity of the prediction performance of SDTRLS to the number of neighbors $N(d_i)$ used in similarity network fusion. SDTRLS had stable prediction performance over a wide range from 10 to 100. The impact of the parameter $K$ for new chemical entities on the prediction performance of SDTRLS in terms of AUC is illustrated in Figure 2. SDTRLS was robust to different values of $K$.

4.6 Case Studies. In order to further confirm the prediction ability of our method, we conduct an experimental analysis

Figure 2: Robustness of SDTRLS with respect to the parameter $K$ (panels: ExGPCRs with Graph and ExGPCRs with MACCS; x-axis: $K$ from 1 to 10; y-axis: AUC from 0.5 to 1.0). The dotted line marks the default value and its prediction performance.

on the ExGPCRs dataset, whose known DTIs are not used as a priori knowledge when conducting external validation. The selected drug predictions are checked against the DrugBank, ChEMBL, and KEGG databases. Table 7 lists the confirmed results based on the ExGPCRs dataset. We select the top five predicted interactions of five drugs; the top-ranked predicted interaction of every drug is confirmed by searching the databases. Furthermore, 76% of all predicted DTIs (19 out of 25) are confirmed using the three databases, and 32% of the predicted DTIs (8 out of 25) are simultaneously confirmed by two databases. In particular, the predicted interactions of DB00209, DB00283, and DB00334 are all confirmed by the databases. In addition, we further examine the results marked as unknown by searching the relevant literature. For example, thiethylperazine (DB00372) is an antagonist of human dopamine D3 (hDRD3 170) according to the description in Petsko and Ringe [52], which shows that the prediction is meaningful and that the remaining unknown DTIs deserve validation in the future. In general, these results indicate that our method is effective in practical applications.
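The case-study procedure (take the top-ranked predictions per drug, then look them up in external databases) can be sketched as follows. All drug names, target names, scores, and database contents below are made up for illustration; `top_k_targets` and `confirmed_fraction` are hypothetical helper names, not part of any database API.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (drug, target)


def top_k_targets(scores: Dict[str, float], k: int) -> List[str]:
    """Targets sorted by descending score (ties broken by name), cut to k."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [target for target, _ in ranked[:k]]


def confirmed_fraction(pairs: List[Pair], databases: Dict[str, Set[Pair]]) -> float:
    """Fraction of predicted pairs found in at least one database."""
    hits = sum(1 for p in pairs if any(p in db for db in databases.values()))
    return hits / len(pairs)


# Entirely made-up scores and database contents, for illustration only.
predicted_scores = {
    "drugA": {"t1": 0.9, "t2": 0.4, "t3": 0.7},
    "drugB": {"t1": 0.2, "t2": 0.8, "t3": 0.6},
}
databases = {
    "db_one": {("drugA", "t1"), ("drugB", "t2")},
    "db_two": {("drugA", "t1"), ("drugA", "t3")},
}
pairs = [(d, t) for d, s in predicted_scores.items() for t in top_k_targets(s, k=2)]
fraction = confirmed_fraction(pairs, databases)
```

With these toy inputs, three of the four top-2 predictions appear in at least one of the two mock databases.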

5 Conclusions

The systematic understanding of the interactions between chemical compounds and target proteins is very important for new drug design and development. In the past decades, in order to overcome the time-consuming nature of traditional biochemical methods, many computational approaches, such as machine learning and network inference, have been developed to predict DTIs. However, these methods mainly focused on new DTIs for known drugs and paid less attention to DTIs for new chemical entities. In addition, their prediction performance is not good enough.

In this study, we have constructed the similarity kernel of approved drugs, failed drugs, and new chemical entities

Table 7: The new confirmation of drug-target interactions based on the Graph substructure in ExGPCRs.

Drug ID    Target ID     Rank   Source
DB00209    hCHRM2 86     1      KEGG
           hCHRM3 98     2      KEGG
           hCHRM1 92     3      DrugBank, KEGG
DB00283    hDRD3 170     1      ChEMBL
           hDRD4 106     2      ChEMBL
           hDRD2 94      3      ChEMBL
           hOPRM1 166    4      ChEMBL
           hOPRK1 173    5      ChEMBL
DB00334    hCHRM2 86     1      DrugBank, ChEMBL
           hA1AB 164     4      ChEMBL, KEGG
           hA1AD 116     5      ChEMBL, KEGG
DB00372    hDRD2 94      1      DrugBank, KEGG
           h5HT2A 125    2      Unknown
           h5HT2C 126    3      Unknown
           hDRD3 170     4      Unknown
           h5HT1A 89     5      Unknown
DB00612    hB1AR 88      1      DrugBank, KEGG
           hB2AR 84      2      DrugBank
           hB3AR 93      3      Unknown
           hDRD2 94      4      Unknown
           hOPRM1 166    5      Unknown


by weighting the chemical substructures. Then GIP kernels were calculated for drugs and targets according to the known DTIs. For new chemical entities or failed drugs, we used KNN to initialize the DTIs before calculating the GIP kernel. To construct a comprehensive similarity kernel for drugs, the SNF method was used to fuse the GIP kernel and the substructure similarity kernel. Finally, the scores of drug-target pairs were predicted by RLS-Kron. We compared the prediction performance with other competing methods via 10-fold cross validation and external validation.
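The GIP kernel step in this summary can be illustrated with a short sketch. It assumes the usual Gaussian form, in which the kernel value for two drugs is `exp(-gamma * squared distance)` between their interaction profiles and `gamma` is a parameter (called `alpha` here) divided by the mean squared norm of the profiles. The function name `gip_kernel` and the toy matrix are illustrative assumptions, not the authors' code.

```python
import math
from typing import List


def gip_kernel(profiles: List[List[float]], alpha: float = 1.0) -> List[List[float]]:
    """Gaussian Interaction Profile kernel over the rows of `profiles`.

    Each row is the binary interaction profile of one drug (or one target).
    The bandwidth is alpha divided by the mean squared norm of the rows.
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(v * v for v in row) for row in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = alpha / mean_sq_norm
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel


# Toy interaction matrix: 3 drugs (rows) x 4 targets (columns).
Y = [
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
]
K_drug = gip_kernel(Y)                             # drug kernel from the rows
K_target = gip_kernel([list(c) for c in zip(*Y)])  # target kernel from the columns
```

Drugs with identical interaction profiles get a kernel value of 1, and the value decays as the profiles diverge. A drug with an all-zero profile carries no information in this kernel, which is why a new entity needs an initial profile before the kernel is computed.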

However, there are still some limitations in this study. First, since the target set is fixed by the current datasets, it may not be possible to predict DTIs for targets outside the datasets. Other similarity information of targets, such as sequence and functional networks [53–56], is not used when the similarity kernel of targets is constructed. In addition, the 3D structure of drugs may also need to be considered as important information. It is expected that additional information may improve prediction performance. In the future, more information obtained using other methods, such as ClusterViz [57], should be integrated to develop a more efficient prediction method. Nevertheless, this study provides an important basis for new drug development and drug repositioning and also plays an important role in the development of personalized medicine.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work has been supported in part by the National Natural Science Foundation of China under Grants no. 61772552, no. 61420106009, and no. 61622213.

References

[1] T. T. Ashburn and K. B. Thor, "Drug repositioning: identifying and developing new uses for existing drugs," Nature Reviews Drug Discovery, vol. 3, no. 8, pp. 673–683, 2004.

[2] N. Novac, "Challenges and opportunities of drug repositioning," Trends in Pharmacological Sciences, vol. 34, no. 5, pp. 267–272, 2013.

[3] M. R. Hurle, L. Yang, Q. Xie, D. K. Rajpal, P. Sanseau, and P. Agarwal, "Computational drug repositioning: from data to therapeutics," Clinical Pharmacology & Therapeutics, vol. 93, no. 4, pp. 335–341, 2013.

[4] A. L. Hopkins, "Drug discovery: predicting promiscuity," Nature, vol. 462, no. 7270, pp. 167-168, 2009.

[5] S. J. Swamidass, "Mining small-molecule screens to repurpose drugs," Briefings in Bioinformatics, vol. 12, no. 4, pp. 327–335, 2011.

[6] B. Y. Feng, A. Simeonov, A. Jadhav, et al., "A high-throughput screen for aggregation-based inhibition in a large compound library," Journal of Medicinal Chemistry, vol. 50, no. 10, pp. 2385–2390, 2007.

[7] M. A. Yildirim, K.-I. Goh, M. E. Cusick, A.-L. Barabasi, and M. Vidal, "Drug-target network," Nature Biotechnology, vol. 25, no. 10, pp. 1119–1126, 2007.

[8] S. Zhao and S. Li, "Network-based relating pharmacological and genomic spaces for drug target identification," PLoS ONE, vol. 5, no. 7, Article ID e11764, 2010.

[9] S. Alaimo, A. Pulvirenti, R. Giugno, and A. Ferro, "Drug-target interaction prediction through domain-tuned network-based inference," Bioinformatics, vol. 29, no. 16, pp. 2004–2008, 2013.

[10] J.-P. Mei, C.-K. Kwoh, P. Yang, X.-L. Li, and J. Zheng, "Drug-target interaction prediction by learning from local information and neighbors," Bioinformatics, vol. 29, no. 2, pp. 238–245, 2013.

[11] T. van Laarhoven and E. Marchiori, "Predicting Drug-Target Interactions for New Drug Compounds Using a Weighted Nearest Neighbor Profile," PLoS ONE, vol. 8, no. 6, Article ID e66952, 2013.

[12] Q. Yuan, J. Gao, D. Wu, S. Zhang, H. Mamitsuka, and S. Zhu, "DrugE-Rank: improving drug-target interaction prediction of new candidate drugs or targets by ensemble learning to rank," Bioinformatics, vol. 32, no. 12, pp. i18–i27, 2016.

[13] H. Luo, J. Wang, M. Li, et al., "Drug repositioning based on comprehensive similarity measures and Bi-Random walk algorithm," Bioinformatics, vol. 32, no. 17, pp. 2664–2671, 2016.

[14] E. E. Bolton, Y. Wang, P. A. Thiessen, and S. H. Bryant, Chapter 12 - PubChem: Integrated Platform of Small Molecules and Biological Activities, Elsevier Science & Technology, 2008.

[15] D. S. Wishart, C. Knox, A. C. Guo, et al., "DrugBank: a knowledgebase for drugs, drug actions and drug targets," Nucleic Acids Research, vol. 36, pp. D901–D906, 2008.

[16] A. Gaulton, L. J. Bellis, A. P. Bento, et al., "ChEMBL: a large-scale bioactivity database for drug discovery," Nucleic Acids Research, vol. 40, no. 1, pp. D1100–D1107, 2012.

[17] C. Qin, C. Zhang, F. Zhu, et al., "Therapeutic target database update 2014: a resource for targeted therapeutics," Nucleic Acids Research, vol. 42, no. 1, pp. D1118–D1123, 2014.

[18] M. Kanehisa and S. Goto, "KEGG: kyoto encyclopedia of genes and genomes," Nucleic Acids Research, vol. 28, no. 1, pp. 27–30, 2000.

[19] M. Kuhn, M. Campillos, I. Letunic, L. J. Jensen, and P. Bork, "A side effect resource to capture phenotypic effects of drugs," Molecular Systems Biology, vol. 6, p. 343, 2010.

[20] M. Kuhn, D. Szklarczyk, S. Pletscher-Frankild, et al., "STITCH 4: integration of protein-chemical interactions with user data," Nucleic Acids Research, vol. 42, no. 1, pp. D401–D407, 2014.

[21] A. Franceschini, D. Szklarczyk, S. Frankild, et al., "STRING v9.1: protein-protein interaction networks with increased coverage and integration," Nucleic Acids Research, vol. 41, no. 1, pp. D808–D815, 2013.

[22] T. Liu, Y. Lin, X. Wen, R. N. Jorissen, and M. K. Gilson, "BindingDB: a web-accessible database of experimentally determined protein-ligand binding affinities," Nucleic Acids Research, vol. 35, supplement 1, pp. D198–D201, 2007.

[23] J. T. Dudley, T. Deshpande, and A. J. Butte, "Exploiting drug-disease relationships for computational drug repositioning," Briefings in Bioinformatics, vol. 12, no. 4, pp. 303–311, 2011.

[24] H. Ding, I. Takigawa, H. Mamitsuka, and S. Zhu, "Similarity-based machine learning methods for predicting drug-target interactions: A brief review," Briefings in Bioinformatics, vol. 15, no. 5, pp. 734–747, 2013.

[25] Y. Tabei and Y. Yamanishi, "Scalable prediction of compound-protein interactions using minwise hashing," BMC Systems Biology, vol. 7, p. S3, 2013.


[26] H. Yabuuchi, S. Niijima, H. Takematsu, et al., "Analysis of multiple compound-protein interactions reveals novel bioactive molecules," Molecular Systems Biology, vol. 7, article no. 472, 2011.

[27] Y. Yamanishi, M. Kotera, M. Kanehisa, and S. Goto, "Drug-target interaction prediction from chemical, genomic and pharmacological data in an integrated framework," Bioinformatics, vol. 26, no. 12, Article ID btq176, pp. i246–i254, 2010.

[28] X. Chen, C. C. Yan, X. Zhang, et al., "Drug-target interaction prediction: Databases, web servers and computational models," Briefings in Bioinformatics, vol. 17, no. 4, pp. 696–712, 2016.

[29] F. Cheng, C. Liu, J. Jiang, et al., "Prediction of drug-target interactions and drug repositioning via network-based inference," PLoS Computational Biology, vol. 8, no. 5, Article ID e1002503, 2012.

[30] X. Chen, M.-X. Liu, and G.-Y. Yan, "Drug-target interaction prediction by random walk on the heterogeneous network," Molecular BioSystems, vol. 8, no. 7, pp. 1970–1978, 2012.

[31] K. Bleakley, G. Biau, and J.-P. Vert, "Supervised reconstruction of biological networks with local models," Bioinformatics, vol. 23, no. 13, pp. i57–i65, 2007.

[32] F. Mordelet and J.-P. Vert, "SIRENE: Supervised inference of regulatory networks," Bioinformatics, vol. 24, no. 16, pp. i76–i82, 2008.

[33] K. Bleakley and Y. Yamanishi, "Supervised prediction of drug-target interactions using bipartite local models," Bioinformatics, vol. 25, no. 18, pp. 2397–2403, 2009.

[34] T. van Laarhoven, S. B. Nabuurs, and E. Marchiori, "Gaussian interaction profile kernels for predicting drug-target interaction," Bioinformatics, vol. 27, no. 21, pp. 3036–3043, 2011.

[35] W. Lan, J. Wang, M. Li, et al., "Predicting drug–target interaction using positive-unlabeled learning," Neurocomputing, vol. 206, pp. 50–57, 2016.

[36] W. Lan, J. Wang, M. Li, J. Liu, F. X. Wu, and Y. Pan, "Predicting microrna-disease associations based on improved microrna and disease similarities," IEEE/ACM Transactions on Computational Biology & Bioinformatics, vol. PP, no. 99, p. 1, 2016.

[37] C. Yan, J. Wang, P. Ni, W. Lan, F. X. Wu, and Y. Pan, "Dnrlmf-mda: predicting microrna-disease associations based on similarities of micrornas and diseases," IEEE/ACM Transactions on Computational Biology & Bioinformatics, 2017.

[38] M. Gonen, "Predicting drug-target interactions from chemical and genomic kernels using Bayesian matrix factorization," Bioinformatics, vol. 28, no. 18, pp. 2304–2310, 2012.

[39] X. Zheng, H. Ding, H. Mamitsuka, and S. Zhu, "Collaborative matrix factorization with multiple similarities for predicting drug-target interactions," in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1025–1033, Chicago, Ill, USA, August 2013.

[40] A. Ezzat, P. Zhao, M. Wu, and X. Li, "Drug-target interaction prediction with graph regularized matrix factorization," IEEE/ACM Transactions on Computational Biology & Bioinformatics, vol. 1, p. 1, 2016.

[41] M. Hay, D. W. Thomas, J. L. Craighead, C. Economides, and J. Rosenthal, "Clinical development success rates for investigational drugs," Nature Biotechnology, vol. 32, no. 1, pp. 40–51, 2014.

[42] A. Mullard, "Drug repurposing programmes get lift off," Nature Reviews Drug Discovery, vol. 11, no. 7, pp. 505-506, 2012.

[43] Z. Wu, F. Cheng, J. Li, W. Li, G. Liu, and Y. Tang, "SDTNBI: an integrated network and chemoinformatics tool for systematic prediction of drug–target interactions and drug repositioning," Briefings in Bioinformatics, pp. 333–347, 2016.

[44] B. Wang, A. M. Mezlini, F. Demir, et al., "Similarity network fusion for aggregating data types on a genomic scale," Nature Methods, vol. 11, no. 3, pp. 333–337, 2014.

[45] J. Basilico and T. Hofmann, "Unifying collaborative and content-based filtering," in Proceedings of the Twenty-First International Conference on Machine Learning, ICML 2004, pp. 65–72, July 2004.

[46] A. Ben-Hur and W. S. Noble, "Kernel methods for predicting protein-protein interactions," Bioinformatics, vol. 21, no. 1, pp. i38–i46, 2005.

[47] M. Hue and J.-P. Vert, "On learning with kernels for unordered pairs," in Proceedings of the 27th International Conference on Machine Learning, ICML 2010, pp. 463–470, June 2010.

[48] Z. Xia, L. Wu, X. Zhou, and S. T. Wong, "Semi-supervised drug-protein interaction prediction from heterogeneous biological spaces," BMC Systems Biology, vol. 4, no. Suppl 2, p. S6, 2010.

[49] C. Huang, R. Zhang, Z. Chen, et al., "Predict potential drug targets from the ion channel proteins based on SVM," Journal of Theoretical Biology, vol. 262, no. 4, pp. 750–756, 2010.

[50] Z.-H. You, Z.-A. Huang, Z. Zhu, et al., "Pbmda: A novel and effective path-based computational model for mirna-disease association prediction," PLoS Computational Biology, vol. 13, no. 3, 2017.

[51] W. Lan, M. Li, K. Zhao, et al., "LDAP: a web server for lncRNA-disease association prediction," Bioinformatics, vol. 33, no. 3, pp. 458–460, 2016.

[52] G. Petsko and D. Ringe, "Ligand-based virtual screening for new antagonists of dopamine receptor D2/D3," Shanghai Management Science, 2013.

[53] X. Peng, J. Wang, W. Peng, F. Wu, and Y. Pan, "Protein–protein interactions: detection, reliability assessment and applications," Briefings in Bioinformatics, vol. 18, no. 5, pp. 798–819, 2016.

[54] B. Zhao, J. Wang, and F. Wu, "Computational Methods to Predict Protein Functions from Protein-Protein Interaction Networks," Current Protein & Peptide Science, vol. 18, no. 11, 2017.

[55] B. Zhao, J. Wang, X. Li, and F.-X. Wu, "Essential protein discovery based on a combination of modularity and conservatism," Methods, vol. 110, pp. 54–63, 2016.

[56] B. Zhao, J. Wang, M. Li, et al., "A New Method for Predicting Protein Functions from Dynamic Weighted Interactome Networks," IEEE Transactions on NanoBioscience, vol. 15, no. 2, pp. 133–141, 2016.

[57] J. Wang, J. Zhong, G. Chen, M. Li, F.-X. Wu, and Y. Pan, "ClusterViz: A Cytoscape APP for Cluster Analysis of Biological Network," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 12, no. 4, pp. 815–822, 2015.


From the viewpoint of basic network-based models, Cheng et al. [29] developed a method to predict DTIs through network-based inference (NBI). Compared with drug-based similarity inference (DBSI) and target-based similarity inference (TBSI), NBI performs better because it makes full use of the known DTIs. Moreover, node- and edge-weighted NBI methods were developed by assigning weights to the nodes and edges of the drug-target network. Network-based Random Walk with Restart on the Heterogeneous network (NRWRH) was developed by Chen et al. [30]; it implements the random walk on a heterogeneous network (protein-protein similarity network, drug-drug similarity network, and known drug-target interaction network). It is an enhanced version of the traditional random walk that improves predictive performance by making full use of the data in the integrated heterogeneous network.
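As a rough illustration of network-based inference, the sketch below assumes the usual two-step resource diffusion on the bipartite drug-target graph: resource placed on a drug's known targets flows to the drugs linked to those targets and then back to targets, each node splitting its resource equally among its neighbours. The article does not spell out this procedure, so the function `nbi_scores` and the toy matrix are illustrative assumptions.

```python
from typing import List


def nbi_scores(A: List[List[int]], drug: int) -> List[float]:
    """Two-step resource diffusion (targets -> drugs -> targets) for one drug.

    A is a binary drug-target matrix (rows: drugs, columns: targets).
    """
    n_drugs, n_targets = len(A), len(A[0])
    drug_degree = [sum(row) for row in A]
    target_degree = [sum(A[d][t] for d in range(n_drugs)) for t in range(n_targets)]

    # Initial resource: one unit on every known target of the query drug.
    resource_on_targets = [float(a) for a in A[drug]]

    # Step 1: each target splits its resource equally among its drugs.
    resource_on_drugs = [0.0] * n_drugs
    for t in range(n_targets):
        if target_degree[t] == 0:
            continue
        share = resource_on_targets[t] / target_degree[t]
        for d in range(n_drugs):
            if A[d][t]:
                resource_on_drugs[d] += share

    # Step 2: each drug splits what it received equally among its targets.
    final = [0.0] * n_targets
    for d in range(n_drugs):
        if drug_degree[d] == 0:
            continue
        share = resource_on_drugs[d] / drug_degree[d]
        for t in range(n_targets):
            if A[d][t]:
                final[t] += share
    return final


# Toy network: drug 0 hits targets 0 and 1; drug 1 hits targets 1 and 2.
A = [
    [1, 1, 0],
    [0, 1, 1],
]
scores = nbi_scores(A, drug=0)
```

In the toy network, target 2 is not a known target of drug 0 but still receives a nonzero score through the shared target 1, which is how such a method proposes new interactions. A drug with no known targets starts with zero resource and therefore gets all-zero scores, which illustrates why plain diffusion cannot handle new chemical entities.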

Some machine learning-based approaches were also developed to predict DTIs. Following Bleakley et al. [31] and Mordelet and Vert [32], Bleakley and Yamanishi [33] further proposed the bipartite local model (BLM) to predict DTIs, which uses local support vector machine (SVM) classifiers with known DTIs and integrates chemical structure similarity and protein sequence similarity information. Gaussian Interaction Profile (GIP) kernels on drug-target networks, developed by van Laarhoven et al. [34], were a significant improvement. In order to solve the problem of negative samples, Lan et al. [35] proposed a prediction method (PUDT) which classifies unlabeled samples into reliable negative examples and likely negative examples based on the similarity of protein structure and achieved good results.

The matrix decomposition technique is also used for predicting DTIs, miRNA-disease associations [36, 37], and so on. It maps the DTI matrix to low-dimensional matrices in order to infer hidden interactions from the known interactions. Gonen [38] proposed a Bayesian model that combines dimensionality reduction, matrix factorization, and binary classification for predicting DTIs by integrating drug-drug chemical similarity and protein-protein sequence similarity. The Multiple Similarities Collaborative Matrix Factorization (MSCMF) method [39] projects drugs and targets into a common low-rank feature space and significantly improved the results by adjusting the weights of the similarity matrices of drugs and of targets. Ezzat et al. [40] developed a regularized matrix factorization method that uses other similarity information to handle the many nonoccurring edges in the interaction matrix that are actually unknown or hidden cases. DrugE-Rank [12] is a machine learning-based model that combines the advantages of two different types of methods, feature-based and similarity-based, to improve prediction performance.
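The idea of mapping the DTI matrix to low-dimensional factors can be sketched with a plain regularized factorization fitted by stochastic gradient descent. This is a generic illustration under assumed hyperparameters, not any of the cited methods (which add Bayesian priors, similarity terms, or graph regularization); `factorize` and `squared_error` are hypothetical names.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def factorize(Y: Matrix, rank: int = 2, steps: int = 500, lr: float = 0.05,
              reg: float = 0.01, seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Fit Y ~ U V^T by stochastic gradient descent with L2 regularization."""
    rng = random.Random(seed)
    n, m = len(Y), len(Y[0])
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(m)]
    for _ in range(steps):
        for i in range(n):
            for j in range(m):
                pred = sum(U[i][r] * V[j][r] for r in range(rank))
                err = Y[i][j] - pred
                for r in range(rank):
                    u, v = U[i][r], V[j][r]
                    U[i][r] += lr * (err * v - reg * u)
                    V[j][r] += lr * (err * u - reg * v)
    return U, V


def squared_error(Y: Matrix, U: Matrix, V: Matrix) -> float:
    """Sum of squared differences between Y and the reconstruction U V^T."""
    return sum(
        (Y[i][j] - sum(u * v for u, v in zip(U[i], V[j]))) ** 2
        for i in range(len(Y))
        for j in range(len(Y[0]))
    )


# Toy interaction matrix of rank 2: drugs 0 and 1 share targets 0 and 1.
Y = [
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
U, V = factorize(Y)
fit_error = squared_error(Y, U, V)
```

Each drug row of `U` and each target row of `V` is a point in the shared low-rank space, and the inner product of a drug row with a target row serves as the predicted interaction score.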

Although the above methods have achieved good results in predicting new DTIs for known drugs, it is also important to predict DTIs of failed drugs and new chemical entities. Thousands of drugs have failed in clinical phases, and the US National Center for Advancing Translational Sciences is even paying US$20 million for research on repurposing 58 failed drugs [41, 42], as drugs that failed for their initially targeted diseases may be effective for other diseases. Wu et al. [43] proposed an integrated network and chemoinformatics tool for systematic prediction of DTIs and drug repositioning, namely SDTNBI (substructure-drug-target network-based inference), which predicts new DTIs of failed drugs and new chemical entities by integrating known DTIs and the chemical substructures of failed drugs or new chemical entities through resource diffusion. Their study assumed that chemical substructures play key roles in DTIs. This method achieved good prediction results for large-scale failed drugs and new chemical entities based on the chemical substructures shared between them and the known drugs.

In this study, we propose a method called SDTRLS (substructure-drug-target Kronecker product kernel regularized least squares) for large-scale DTI prediction and drug repositioning based on the chemical substructures of known drugs, failed drugs, and new chemical entities. First, we compute the substructure similarity and then create Gaussian Interaction Profile (GIP) kernels for drug entities and target proteins based on known DTIs. The $k$-nearest neighbor (KNN) method is used to compute the initial relational score for a new chemical entity or failed drug that has no known DTIs. Through similarity network fusion (SNF) [44], the substructure similarity and the GIP similarity of drugs are integrated. SNF substantially outperforms single-type data analysis and establishes integrative approaches to predicting DTIs. Finally, the RLS-Kron classifier [34] is used to predict DTIs; it constructs a large kernel that directly relates drug-target pairs by combining the similarity kernels of drug entities and target proteins. In order to comprehensively assess the performance of our method, we compare it against current state-of-the-art algorithms with the same data and evaluation criteria. We use 10-fold cross validation and external validation to show the accuracy and robustness of our method. The computational results show that our proposed SDTRLS is comparable to five other methods in terms of stability. In particular, on the G protein-coupled receptors (GPCRs) external validation dataset, the maximum and average AUC values were 0.842 and 0.826, respectively, which are superior to the 0.797 and 0.766 of the state-of-the-art SDTNBI method. In order to further confirm the prediction ability of SDTRLS, we perform an experimental analysis on some prediction results. In summary, we provide a new alternative method for DTI prediction for known drugs, failed drugs, and new chemical entities. It provides a basis for drug discovery, development, and personalized medical treatment in the future.
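The KNN initialization step mentioned above can be sketched as follows. The exact weighting used by SDTRLS is not recoverable from this text, so the sketch assumes a simple similarity-weighted average over the k most similar known drugs; `knn_initial_profile` and the toy values are hypothetical.

```python
from typing import List


def knn_initial_profile(
    sim_to_known: List[float], known_profiles: List[List[float]], k: int = 2
) -> List[float]:
    """Similarity-weighted average of the profiles of the k most similar drugs.

    sim_to_known[i] is the substructure similarity between the new entity
    and known drug i; known_profiles[i] is that drug's interaction profile.
    """
    # Indices of the k known drugs with the highest similarity to the new entity.
    nearest = sorted(range(len(sim_to_known)), key=lambda i: -sim_to_known[i])[:k]
    total = sum(sim_to_known[i] for i in nearest)
    n_targets = len(known_profiles[0])
    if total == 0:
        return [0.0] * n_targets
    return [
        sum(sim_to_known[i] * known_profiles[i][t] for i in nearest) / total
        for t in range(n_targets)
    ]


# Toy data: 3 known drugs, 3 targets, and one new entity.
known_profiles = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
]
sim_to_known = [0.6, 0.2, 0.1]
initial = knn_initial_profile(sim_to_known, known_profiles, k=2)
```

The resulting non-binary profile gives the new entity a starting row in the interaction matrix, so that an interaction-profile kernel can be computed for it even though it has no known DTIs.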

2 Materials

This study used five internal validation datasets and two external validation datasets. The internal datasets are used to validate the predictions of new DTIs of known drugs, and the external datasets are used to validate the predictions of all DTIs of new entities and failed drugs. The five internal datasets are G protein-coupled receptors (GPCRs), kinase superfamily (Kinases), ion channels (ICs), nuclear receptors (NRs), and Global. GPCRs and Kinases were downloaded from the ChEMBL database. ICs and NRs were collected from the ChEMBL and BindingDB databases. The Global dataset is a global network covering genome-wide targets, where all drugs also










3 Methods







$$w_k = \exp\left(-\frac{f_k^2}{\delta^2 h^2}\right) \tag{2}$$

$$\gamma_d = \frac{\alpha}{\frac{1}{N_d}\sum_{i=1}^{N_d} \lVert Y(d_i) \rVert^2} \tag{3}$$

$$\gamma_t = \frac{\beta}{\frac{1}{N_t}\sum_{i=1}^{N_t} \lVert Y(t_i) \rVert^2} \tag{5}$$

$$S(d_i, d_j) = \ldots \tag{6}$$

$$K\bigl((d_i, t_j), (d_k, t_l)\bigr) = K_d(d_i, d_k)\, K_t(t_j, t_l) \tag{8}$$

$$\operatorname{vec}(T) = K (K + \sigma I)^{-1} \operatorname{vec}(Y^T) \tag{9}$$

$$K = K_d \otimes K_t = V \Lambda V^T \tag{10}$$

$$l \in K_{\text{new}} \tag{12}$$
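Equations (8) to (10) describe a kernel on drug-target pairs formed as a Kronecker product, a regularized least squares prediction, and an eigendecomposition of the product kernel. A minimal NumPy sketch of that computation is given below; it assumes symmetric positive semidefinite kernels and avoids building the full Kronecker matrix by working with the eigendecompositions of the two factors. The function name `rls_kron` and the toy matrices are assumptions, not the authors' implementation.

```python
import numpy as np


def rls_kron(K_d: np.ndarray, K_t: np.ndarray, Y: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Kronecker RLS scores without forming the full Kronecker kernel.

    K_d: (n_d, n_d) symmetric PSD drug kernel
    K_t: (n_t, n_t) symmetric PSD target kernel
    Y:   (n_d, n_t) known interaction matrix
    Returns an (n_d, n_t) matrix of predicted scores.
    """
    lam_d, V_d = np.linalg.eigh(K_d)
    lam_t, V_t = np.linalg.eigh(K_t)
    # Eigenvalues of the Kronecker product are all pairwise products.
    lam = np.outer(lam_d, lam_t)
    # Apply the filter lam / (lam + sigma) in the joint eigenbasis.
    C = (V_d.T @ Y @ V_t) * (lam / (lam + sigma))
    return V_d @ C @ V_t.T


# Toy kernels (diagonally dominant, hence positive definite) and interactions.
K_d = np.array([[1.0, 0.5],
                [0.5, 1.0]])
K_t = np.array([[1.0, 0.2, 0.0],
                [0.2, 1.0, 0.3],
                [0.0, 0.3, 1.0]])
Y = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
scores = rls_kron(K_d, K_t, Y, sigma=1.0)
```

The design choice here is cost: the direct formula needs to solve a system of size n_d * n_t, whereas the factored form only decomposes the two small kernels.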





ExGPCRs
MACCS      0.754  0.756  0.764  0.769  0.758  0.822
PubChem    0.754  0.756  0.764  0.769  0.759  0.831

ExKinases
MACCS      0.851  0.812  0.821  0.828  0.852  0.844
PubChem    0.850  0.812  0.821  0.828  0.852  0.846









Figure 1: Panels ExGPCRs with MACCS and ExGPCRs with Graph (x-axis: $N(d_i)$ from 10 to 100; y-axis: AUC from 0.5 to 1.0).











