
Research Article

Predicting Missing Links Based on a New Triangle Structure

Shenshen Bai,1,2 Longjie Li,1 Jianjun Cheng,1 Shijin Xu,1 and Xiaoyun Chen1

1 School of Information Science & Engineering, Lanzhou University, Lanzhou 730000, China
2 Department of Electronic and Information Engineering, Lanzhou Vocational Technical College, Lanzhou 730070, China

Correspondence should be addressed to Longjie Li; ljli@lzu.edu.cn and Xiaoyun Chen; chenxy@lzu.edu.cn

Received 1 May 2018; Revised 17 October 2018; Accepted 12 November 2018; Published 2 December 2018

Guest Editor: Katarzyna Musial

Copyright © 2018 Shenshen Bai et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

With the rapid growth of various complex networks, link prediction has become increasingly important because it can discover missing information and predict future interactions between nodes in a network. Recently, the CAR and CCLP indexes have been presented for link prediction by means of different triangle structure information. However, both indexes may lose the contributions of some shared neighbors. In this work, we propose a new index that remedies this weakness and thereby improves the accuracy of link prediction. The proposed index focuses on a new triangle structure, i.e., the triangle formed by one seed node, one common neighbor, and one other node. It emphasizes the importance of these triangles but does not ignore the contribution of any common neighbor. In addition, the proposed index adopts the theory of resource allocation by penalizing large-degree neighbors. A comparison with CN, AA, RA, ADP, CAR, CAA, CRA, and CCLP on 12 real-world networks shows that the proposed index outperforms the compared methods in terms of AUC and ranking score.

1. Introduction

As a fundamental research hotspot in complex network analysis, link prediction has a wide range of applications in both theory and practice, such as the analysis of network evolution [1, 2], recommendation systems [3], and the detection of potential interactions between proteins in biological networks [4, 5]. The basic task of link prediction is to estimate the missing or latent links between unconnected nodes in a network [6, 7]. To date, a host of algorithms and models have been proposed for link prediction [6, 8, 9]. Reference [8] groups them into two categories: similarity-based approaches and learning-based approaches. A similarity-based approach computes similarity scores between unconnected nodes based on the known information. A list of node pairs ranked in descending order of similarity score is then obtained, and the node pairs at the top are considered most likely to be linked. A learning-based approach formalizes the link prediction problem as a binary classification task [10] and uses machine learning methods to solve it. The key job in a learning-based approach is to construct the feature vectors of node pairs. In general, learning-based approaches are more complicated than similarity-based ones.

The hypothesis behind similarity-based approaches is that the more similar two nodes are, the more likely it is that a link exists between them [8]. This idea is simple and intuitive, and the study of this kind of approach has therefore become the mainstream [6, 9]. The Common Neighbors (CN) index [11], as its name suggests, simply counts the number of common neighbors of two nodes. The Adamic-Adar (AA) [12] and Resource Allocation (RA) [13] indexes are two variants of the CN index; they penalize the contributions of large-degree common neighbors. These indexes are called local methods because they use only local structure information. In addition, some global and quasilocal methods have been proposed, such as Katz [14], SimRank [15], Random Walks with Restart [16], Local Path [17], FriendLink [18], and Local Random Walk [19].

As complex networks keep growing in size, local methods remain good candidates because they are more efficient in terms of running time than global and quasilocal methods. Therefore, we focus on local methods in this study. Recently, Cannistraci et al. proposed the CAR index [20], which suggests that links between the common neighbors, i.e., local-community-links (LCLs), are more valuable than common neighbors in link prediction. In the CAR index, a local

Hindawi, Complexity, Volume 2018, Article ID 7312603, 11 pages, https://doi.org/10.1155/2018/7312603

Figure 1: Triangles used in similarity indexes. Panels: (a) example network (nodes a to i); (b) triangles used in CAR index; (c) triangles used in CCLP index; (d) triangles used in TRA index; (e) similarity of (a, b): CN(a, b) = 4, RA(a, b) = 4/3, CAR(a, b) = 4, CCLP(a, b) = 4/3, TRA(a, b) = 49/24. Legend: seed node, common neighbor, non-common neighbor, neighbor of common neighbor.

community is a triangle passing through two common neighbors and one seed node. In the example network shown in Figure 1(a), there is one LCL between the common neighbors of seed nodes $a$ and $b$ (see Figure 1(b)). Thus, the CAR index assigns a similarity score of four to nodes $a$ and $b$. However, if we remove the link between $c$ and $d$, CAR will assign a zero similarity score to $a$ and $b$, even though they have four common neighbors. The idea of LCL has also been plugged into the AA, RA, and Jaccard indexes [20]. Later, Wu et al. proposed the CCLP index based on the clustering coefficients of common neighbors. This index considers all triangles passing through a common neighbor. For the example network in Figure 1(a), there are triangles passing through nodes $c$, $d$, and $f$, respectively (see Figure 1(c)). Thus, the CCLP index accumulates the clustering coefficients of nodes $c$, $d$, and $f$ when calculating the similarity between $a$ and $b$ but entirely neglects the contribution of node $e$. In real-world networks, it is possible that no triangles pass through some, or even all, shared neighbors of a node pair. Thus, the CAR and CCLP indexes may assign a very low or even zero similarity score to a node pair even if it has many common neighbors.

In this paper, we define a new type of triangle structure called the TRA-triangle, which is formed by one seed node, one common neighbor, and one other node (see Figure 1(d)). Based on the TRA-triangle, a new similarity index, namely, the TRA index, is proposed for link prediction. This index suggests that the common neighbors that can form TRA-triangles with a seed node are more important than others. In addition, the proposed index penalizes large-degree neighbors, as done in the RA index [13]. Although the TRA, CAR-based, and CCLP indexes are all based on triangle structures, the intuitions behind them are different. The CAR-based indexes hold that LCLs are more valuable than common neighbors. The CCLP index is inspired by the CAR index but employs all triangles passing through common neighbors, while the TRA index, which uses only the TRA-triangles, strikes a balance between CAR and CCLP. Furthermore, as mentioned above, the CAR-based and CCLP indexes lose the contribution of those common neighbors with no triangles passing through them, whereas the TRA index counts the contribution of all kinds of common neighbors. Therefore, the TRA index can achieve better prediction accuracy than the CAR-based indexes and the CCLP index. The accuracy of the TRA index is evaluated on 12 real-world networks from various fields. The experimental results show that our index is far superior to the CAR-based indexes and the CCLP index. Take the HEP network, which is very sparse, as an example: the improvements made by TRA over CAR and CCLP under the AUC metric are up to 26.9% and 4.2%, respectively.

The rest of the paper is structured as follows. In Section 2, we describe the link prediction problem and the evaluation metrics, list the compared methods and networks, and outline the Wilcoxon signed-ranks test. Section 3 introduces the proposed method. In Section 4, the experimental results and performance analysis of the proposed method are presented. Finally, Section 5 concludes this work.

2. Preliminaries

2.1. Problem Description and Metric. Consider an undirected and unweighted network $G(V, E)$, in which $V$ and $E$ are the node set and link set, respectively. In this study, multilinks and self-loops are not allowed. Let $N = |V|$ be the number of nodes in the network, and let $U$ be the universal set of possible links, which contains $N(N-1)/2$ links. Then the set of nonobserved (or nonexisting) links is $U - E$. Suppose there are some missing links in $U - E$; the task of link prediction is to find those links. A similarity-based approach assigns a similarity score to each node pair in $U - E$ and assumes that the higher the score of a node pair, the more likely it is that a link exists between the two nodes.

To test the performance of a similarity index, we randomly divide the link set $E$ into two parts, a training set $E^{tr}$ and a testing set $E^{ts}$, such that $E = E^{tr} \cup E^{ts}$ and $E^{tr} \cap E^{ts} = \emptyset$. $E^{tr}$ is treated as the observed information and $E^{ts}$ is used for testing. Two parameter-free metrics are employed to quantify the accuracy of link prediction algorithms: AUC [6] and ranking score [21, 22]. In this setting, the AUC score can be interpreted as the probability that a randomly selected missing link (i.e., a link in $E^{ts}$) is given a higher score than a randomly selected nonexistent link (i.e., a link in $U - E$). In practice, if we perform $n$ independent comparisons, and there are $n_1$ times that the missing link has a higher score and $n_2$ times that the two links have the same score, the AUC value is computed as

$$AUC = \frac{n_1 + 0.5\,n_2}{n}. \tag{1}$$

Ranking score (RS) takes into consideration the ranks of the links in the testing set after all nonobserved links are sorted in descending order of their similarity scores. Let $H = U - E^{tr}$ be the set of nonobserved links. Let $e_i$ be a missing link in $E^{ts}$ and $r_i$ be its rank. The ranking score of $e_i$ is defined as $RS(e_i) = r_i / |H|$, and the ranking score of the link prediction result is as follows:

$$RS = \frac{1}{|E^{ts}|} \sum_{e_i \in E^{ts}} RS(e_i) = \frac{1}{|E^{ts}|} \sum_{e_i \in E^{ts}} \frac{r_i}{|H|}. \tag{2}$$

Note that a higher AUC value indicates better accuracy, whereas a smaller ranking score indicates better accuracy.
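The two metrics in (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `auc_score` and `ranking_score`, the `score(x, y)` callback, the default of 10000 comparisons, and the tie handling in the ranking (ties keep their input order) are all assumptions made for the example.

```python
import random


def auc_score(score, test_links, nonexistent_links, n=10000, seed=0):
    """Sampled AUC as in Eq. (1): (n1 + 0.5 * n2) / n.

    score             -- hypothetical callback score(x, y) -> number
    test_links        -- links in the testing set E^ts
    nonexistent_links -- links in U - E
    """
    rng = random.Random(seed)
    n1 = n2 = 0
    for _ in range(n):
        s_missing = score(*rng.choice(test_links))
        s_nonexistent = score(*rng.choice(nonexistent_links))
        if s_missing > s_nonexistent:
            n1 += 1          # missing link scored higher
        elif s_missing == s_nonexistent:
            n2 += 1          # tie
    return (n1 + 0.5 * n2) / n


def ranking_score(score, test_links, nonobserved_links):
    """Ranking score as in Eq. (2): mean of r_i / |H| over the test links.

    nonobserved_links is H = U - E^tr and must contain every test link.
    Ties are broken by input order (an assumption of this sketch).
    """
    ranked = sorted(nonobserved_links, key=lambda e: score(*e), reverse=True)
    rank = {e: i + 1 for i, e in enumerate(ranked)}
    return sum(rank[e] / len(ranked) for e in test_links) / len(test_links)


if __name__ == "__main__":
    # Toy scores for four nonobserved node pairs (made-up values).
    toy = {("a", "b"): 3, ("a", "c"): 1, ("b", "c"): 2, ("c", "d"): 0}
    score = lambda x, y: toy[(x, y)]
    test = [("a", "b")]
    nonexistent = [e for e in toy if e not in test]
    print(auc_score(score, test, nonexistent, n=200))
    print(ranking_score(score, test, list(toy)))
```

Sampling with a fixed seed keeps the AUC estimate reproducible; an exact AUC would instead compare every pair of test and nonexistent links.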

2.2. Local Similarity Indexes. To date, many similarity indexes have been proposed for link prediction [6, 8, 9]. Here we list the local similarity indexes that are used in our experiments for comparison.

(1) Common Neighbors (CN) index [11] defines the similarity between $x$ and $y$ as the number of their common neighbors:

$$CN(x, y) = |\Gamma(x) \cap \Gamma(y)|, \tag{3}$$

where $\Gamma(x)$ denotes the set of neighbors of node $x$.

(2) Adamic-Adar (AA) index [12] is a variant of the CN index which assumes that small-degree neighbors contribute more than large-degree neighbors when computing similarity. It is defined as

$$AA(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{\log k_z}, \tag{4}$$

where $k_z$ is the degree of node $z$.

(3) Resource Allocation (RA) index [13] defines the similarity between $x$ and $y$ as the amount of resource that $y$ receives from $x$ through their common neighbors:

$$RA(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{k_z}. \tag{5}$$

(4) Adaptive Degree Penalization (ADP) index [23] penalizes a common neighbor according to its degree and the average clustering coefficient of the network. Therefore, it can automatically adapt to the network. The ADP index is defined as

$$ADP(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} k_z^{-\beta C}, \tag{6}$$

where $\beta$ is a constant and $C$ is the average clustering coefficient of the network. We set $\beta = 2.5$, as suggested by the authors.

(5) CAR index [20] suggests that two seed nodes are more likely to link together if there are links between their common neighbors. It is defined as

$$CAR(x, y) = CN(x, y) \cdot \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{L(z)}{2}, \tag{7}$$

where $L(z)$ is the number of links between $z$ and the other common neighbors of $x$ and $y$.

(6) CAA and CRA indexes [20] are generated by plugging the idea of the CAR index into the AA and RA indexes, respectively. They are defined as

$$CAA(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{L(z)}{\log k_z}, \tag{8}$$

$$CRA(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{L(z)}{k_z}. \tag{9}$$

(7) CCLP index [24] computes the similarity between $x$ and $y$ by employing the clustering coefficients of their common neighbors:

$$CCLP(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} CC_z, \tag{10}$$

where $CC_z$ denotes the clustering coefficient of node $z$, which is

$$CC_z = \frac{2 t_z}{k_z (k_z - 1)}, \tag{11}$$

in which $t_z$ is the number of triangles passing through node $z$.
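As an illustration of the definitions in (3), (5), (7), (9), (10), and (11), the following Python sketch evaluates several of the indexes on a small graph. The edge list `EDGES` is a hypothetical reconstruction chosen only so that the scores of the pair (a, b) agree with the values reported in Figure 1; it is not taken from the paper, and the function names are assumptions of this example. Exact fractions are used so that the results can be compared with the figure directly.

```python
from fractions import Fraction

# Hypothetical edge list (an assumption of this sketch), chosen to be
# consistent with the similarity values reported in Figure 1 for (a, b).
EDGES = [("a", "c"), ("a", "d"), ("a", "e"), ("a", "f"), ("a", "h"),
         ("b", "c"), ("b", "d"), ("b", "e"), ("b", "f"),
         ("c", "d"), ("c", "h"), ("f", "g"), ("f", "i"), ("g", "i")]


def build_adjacency(edges):
    """Map each node to the set of its neighbors (undirected graph)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def cn(adj, x, y):
    """Eq. (3): number of common neighbors."""
    return len(adj[x] & adj[y])


def ra(adj, x, y):
    """Eq. (5): sum of 1 / k_z over common neighbors z."""
    return sum(Fraction(1, len(adj[z])) for z in adj[x] & adj[y])


def lcl(adj, x, y, z):
    """L(z): links between z and the other common neighbors of x and y."""
    return len(adj[z] & adj[x] & adj[y])


def car(adj, x, y):
    """Eq. (7): CN(x, y) times the sum of L(z) / 2."""
    common = adj[x] & adj[y]
    return len(common) * Fraction(sum(lcl(adj, x, y, z) for z in common), 2)


def cra(adj, x, y):
    """Eq. (9): sum of L(z) / k_z over common neighbors z."""
    return sum(Fraction(lcl(adj, x, y, z), len(adj[z])) for z in adj[x] & adj[y])


def clustering(adj, z):
    """Eq. (11): 2 * t_z / (k_z * (k_z - 1)), or 0 when k_z < 2."""
    k = len(adj[z])
    if k < 2:
        return Fraction(0)
    t = sum(1 for u in adj[z] for v in adj[z] if u < v and v in adj[u])
    return Fraction(2 * t, k * (k - 1))


def cclp(adj, x, y):
    """Eq. (10): sum of clustering coefficients of common neighbors."""
    return sum(clustering(adj, z) for z in adj[x] & adj[y])


adj = build_adjacency(EDGES)
for name, index in [("CN", cn), ("RA", ra), ("CAR", car), ("CCLP", cclp)]:
    print(name, index(adj, "a", "b"))
```

Removing the edge ("c", "d") from the list makes `car` return zero for (a, b) while `cn` still returns 4, which is the weakness of CAR discussed in the Introduction.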

2.3. Networks. In this study, we use 12 real-world networks drawn from various fields to evaluate the effectiveness of link prediction methods.

(1) Advogato (ADV): a social network whose users are mainly free and open source software developers [25].

(2) Celegans (CE): the neural network of a Caenorhabditis elegans worm [26].

(3) Dolphin: a social network of 62 dolphins in a community living off Doubtful Sound, New Zealand [27].

(4) Email: a network of email interchanges between members of a university [28].

(5) Foodweb (FW): a food web in Florida Bay during the rainy season [29].

(6) Hamster: a friendship network between users of hamsterster.com [30].

(7) HEP: the coauthorship network of scientists who posted preprints on the high-energy theory archive from 1995 to 1999 [31].

(8) Karate: the social network of a karate club at a US university [32].

(9) Political blogs (PB): a network of blogs about US politics [33].

(10) USAir: a network of the US air transportation system [6].

(11) Word: an adjacency network of common adjectives and nouns in the novel "David Copperfield" by Charles Dickens [34].

(12) Yeast: the protein-protein interaction network of budding yeast [35].

In this work, all the aforementioned networks are treated as undirected and unweighted, and only the giant component of each network is used. Table 1 lists the basic statistics of the giant components of these networks.

Given a network $G(V, E)$, let $x$ and $y$ be two seed nodes. $(x, y)$ is called a seed node pair with common neighbors if $x$ and $y$ are not linked and have at least one common neighbor. $P_\Lambda$ denotes the set of seed node pairs with common neighbors; formally,

$$P_\Lambda = \{(x, y) \mid (x, y) \notin E \wedge \Gamma(x) \cap \Gamma(y) \neq \emptyset\}. \tag{12}$$

Let $x$ and $y$ be two seed nodes and $z$ be one of their common neighbors. If $CC_z = 0$, we call $z$ a zero-triangle-neighbor; otherwise, $z$ is a triangle-neighbor. If $L(z) \neq 0$, $z$ is called a CAR-triangle-neighbor, and if $\triangle(x, y, z) \neq 0$ (see (18)), $z$ is called a TRA-triangle-neighbor. Let $S$ be the set of triangle-neighbors, and let $S_{CAR}$ and $S_{TRA}$ denote the sets of CAR- and TRA-triangle-neighbors, respectively. Clearly, $S_{CAR} \subseteq S_{TRA} \subseteq S$.

Let $P_\exists(S)$ and $P_\forall(S)$ be two subsets of $P_\Lambda$. For any pair in $P_\exists(S)$, at least one of the shared neighbors is not a triangle-neighbor, and for any pair in $P_\forall(S)$, none of the shared neighbors is a triangle-neighbor. More explicitly,

$$\begin{aligned}
P_\exists(S) &= \{(x, y) \in P_\Lambda \mid \exists z \in \Gamma(x) \cap \Gamma(y),\ z \notin S\}, \\
P_\forall(S) &= \{(x, y) \in P_\Lambda \mid \forall z \in \Gamma(x) \cap \Gamma(y),\ z \notin S\}.
\end{aligned} \tag{13}$$

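Similarly, we define $P_\exists(S_{TRA})$, $P_\forall(S_{TRA})$, $P_\exists(S_{CAR})$, and $P_\forall(S_{CAR})$, which are

$$\begin{aligned}
P_\exists(S_{TRA}) &= \{(x, y) \in P_\Lambda \mid \exists z \in \Gamma(x) \cap \Gamma(y),\ z \notin S_{TRA}\}, \\
P_\forall(S_{TRA}) &= \{(x, y) \in P_\Lambda \mid \forall z \in \Gamma(x) \cap \Gamma(y),\ z \notin S_{TRA}\}, \\
P_\exists(S_{CAR}) &= \{(x, y) \in P_\Lambda \mid \exists z \in \Gamma(x) \cap \Gamma(y),\ z \notin S_{CAR}\}, \\
P_\forall(S_{CAR}) &= \{(x, y) \in P_\Lambda \mid \forall z \in \Gamma(x) \cap \Gamma(y),\ z \notin S_{CAR}\}.
\end{aligned} \tag{14}$$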

Correspondingly, the ratios of those subsets to $P_\Lambda$ are, respectively, defined as

$$\begin{aligned}
R_\exists(S) &= \frac{|P_\exists(S)|}{|P_\Lambda|}, &
R_\forall(S) &= \frac{|P_\forall(S)|}{|P_\Lambda|}, \\
R_\exists(S_{TRA}) &= \frac{|P_\exists(S_{TRA})|}{|P_\Lambda|}, &
R_\forall(S_{TRA}) &= \frac{|P_\forall(S_{TRA})|}{|P_\Lambda|}, \\
R_\exists(S_{CAR}) &= \frac{|P_\exists(S_{CAR})|}{|P_\Lambda|}, &
R_\forall(S_{CAR}) &= \frac{|P_\forall(S_{CAR})|}{|P_\Lambda|}.
\end{aligned} \tag{15}$$

Table 2 lists these ratios over the 12 networks.
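To make the definitions in (12), (13), and (15) concrete, the following Python sketch counts seed node pairs with common neighbors and computes the two ratios for the set $S$ of triangle-neighbors. It is only an illustrative sketch under stated assumptions: the names `build_adjacency`, `has_triangle`, and `seed_pair_ratios` are hypothetical, only $R_\exists(S)$ and $R_\forall(S)$ are covered (the $S_{CAR}$ and $S_{TRA}$ variants would swap in a different membership test), and a graph with an empty $P_\Lambda$ is assigned ratios of zero.

```python
from itertools import combinations


def build_adjacency(edges):
    """Map each node to the set of its neighbors (undirected graph)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def has_triangle(adj, z):
    """True if at least one triangle passes through z, i.e. CC_z > 0."""
    return any(v in adj[u] for u, v in combinations(adj[z], 2))


def seed_pair_ratios(adj):
    """Return (R_exists(S), R_forall(S)) as in Eqs. (12), (13), and (15)."""
    p_lambda = p_exists = p_forall = 0
    for x, y in combinations(sorted(adj), 2):
        common = adj[x] & adj[y]
        if y in adj[x] or not common:
            continue                      # pair is linked or shares no neighbor
        p_lambda += 1
        outside = [z for z in common if not has_triangle(adj, z)]
        if outside:
            p_exists += 1                 # some shared neighbor is not in S
        if len(outside) == len(common):
            p_forall += 1                 # no shared neighbor is in S
    if p_lambda == 0:
        return 0.0, 0.0                   # convention chosen for this sketch
    return p_exists / p_lambda, p_forall / p_lambda


if __name__ == "__main__":
    # A path a-b-c: the only seed pair (a, c) shares b, which lies on no triangle.
    print(seed_pair_ratios(build_adjacency([("a", "b"), ("b", "c")])))
```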

2.4. Wilcoxon Signed-Ranks Test. The Wilcoxon signed-ranks test is a nonparametric statistical hypothesis test used to check whether two methods perform equally well over multiple networks [38, 39]. Let $d_i$ be the difference between the performance scores of two link prediction methods on the $i$th network. The differences are ranked according to their absolute values; in case of ties, average ranks are assigned. Let $R^+$ be the sum of ranks for the networks on which the second method outperformed the first, and $R^-$ the sum of ranks for the opposite. For a larger number of networks, the statistic

$$z = \frac{T - \frac{1}{4} N (N + 1)}{\sqrt{\frac{1}{24} N (N + 1)(2N + 1)}} \tag{16}$$

is distributed approximately normally [39]. In (16), $T = \min(R^+, R^-)$ and $N$ is the number of networks.

With $\alpha = 0.05$, if $z$ is smaller than $-1.96$, we reject the null hypothesis, which states that both methods perform equally well.
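The statistic in (16) can be computed with a short Python function. The sketch below is an illustration rather than the authors' code: the name `wilcoxon_z` is hypothetical, differences are rounded to avoid spurious floating-point ties, and the ranks of zero differences are split evenly between $R^+$ and $R^-$, which is a common convention that the text above does not specify. The usage example feeds in the CN and TRA columns of Table 3.

```python
from math import sqrt


def wilcoxon_z(first, second, ndigits=10):
    """z statistic of Eq. (16) for paired scores of two methods.

    Ranks of zero differences are split evenly between R+ and R-
    (an assumption of this sketch).
    """
    diffs = [round(b - a, ndigits) for a, b in zip(first, second)]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        # Extend the block while the absolute differences are tied.
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[start]]):
            end += 1
        average_rank = (start + end) / 2 + 1   # mean of ranks start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = average_rank
        start = end + 1
    zero_share = 0.5 * sum(r for r, d in zip(ranks, diffs) if d == 0)
    r_plus = sum(r for r, d in zip(ranks, diffs) if d > 0) + zero_share
    r_minus = sum(r for r, d in zip(ranks, diffs) if d < 0) + zero_share
    t = min(r_plus, r_minus)
    return (t - n * (n + 1) / 4) / sqrt(n * (n + 1) * (2 * n + 1) / 24)


# AUC values of CN and TRA on the 12 networks, as listed in Table 3.
CN_AUC = [0.8992, 0.8450, 0.7832, 0.8471, 0.6053, 0.8037,
          0.8984, 0.6985, 0.9192, 0.9357, 0.6656, 0.7041]
TRA_AUC = [0.9043, 0.8713, 0.7850, 0.8493, 0.6890, 0.8127,
           0.8985, 0.7755, 0.9282, 0.9452, 0.6809, 0.7054]

z = wilcoxon_z(CN_AUC, TRA_AUC)
print("reject null hypothesis at alpha = 0.05:", z < -1.96)
```

Because TRA scores higher than CN on every network, $R^-$ is zero here and $T$ takes its smallest possible value, so $z$ falls well below the $-1.96$ threshold.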


Table 1: The basic structural features of the giant components of the 12 networks. $|V|$ and $|E|$ are the total numbers of nodes and edges, respectively. $D$ denotes the density, which is $D = 2|E| / (|V|(|V| - 1))$. $\langle k \rangle$ and $\langle d \rangle$ represent the average degree and the average shortest distance, respectively. $C$ and $r$ indicate the clustering coefficient [26] and assortative coefficient [36], respectively. $H$ is the degree heterogeneity [6], defined as $H = \langle k^2 \rangle / \langle k \rangle^2$, and $e$ is the network efficiency [37].

Networks  |V|   |E|    D        ⟨k⟩     ⟨d⟩    C      r       H      e
ADV       5042  39227  3.1E-03  15.560  3.275  0.253  -0.096  5.303  0.324
CE        297   2148   4.9E-02  14.465  2.455  0.292  -0.163  1.801  0.445
Dolphin   62    159    8.4E-02  5.129   3.357  0.259  -0.044  1.327  0.379
Email     1133  5451   8.5E-03  9.622   3.606  0.220  0.078   1.942  0.300
FW        128   2075   2.6E-01  32.422  1.776  0.335  -0.112  1.237  0.622
Hamster   1788  12476  7.8E-03  13.955  3.453  0.143  -0.089  3.264  0.317
HEP       5835  13815  8.1E-04  4.735   7.026  0.506  0.185   1.926  0.155
Karate    34    78     1.4E-01  4.588   2.408  0.571  -0.476  1.693  0.492
PB        1222  16714  2.2E-02  27.355  2.738  0.320  -0.221  2.971  0.398
USAir     332   2126   3.9E-02  12.807  2.738  0.625  -0.208  3.464  0.406
Word      112   425    6.8E-02  7.589   2.536  0.173  -0.129  1.815  0.442
Yeast     2224  6609   2.7E-03  5.943   4.376  0.138  -0.105  2.803  0.246

Table 2: The ratios of various seed pairs over the 12 networks.

Networks  R∃(S)  R∀(S)  R∃(S_CAR)  R∀(S_CAR)  R∃(S_TRA)  R∀(S_TRA)
ADV       0.001  0.001  0.807      0.750      0.018      0.014
CE        0.001  0.000  0.768      0.670      0.012      0.008
Dolphin   0.020  0.011  0.857      0.817      0.089      0.069
Email     0.008  0.005  0.881      0.841      0.070      0.057
FW        0.000  0.000  0.240      0.136      0.005      0.000
Hamster   0.014  0.005  0.893      0.817      0.074      0.048
HEP       0.007  0.007  0.881      0.876      0.029      0.028
Karate    0.004  0.000  0.743      0.706      0.034      0.011
PB        0.001  0.000  0.577      0.497      0.010      0.007
USAir     0.000  0.000  0.510      0.509      0.005      0.005
Word      0.067  0.028  0.799      0.735      0.108      0.062
Yeast     0.083  0.063  0.945      0.931      0.357      0.324

3. The Proposed Index

The link prediction problem is closely related to the mechanism of network evolution [2, 40]. A recently proposed triangle growth mechanism demonstrates that various key features observed in most real-world networks can be generated in simulated networks [41]. Triangle structure information therefore plays an important role in link formation.

In this work, we focus on a new triangle structure, namely, the TRA-triangle. A TRA-triangle passes through one seed node, one common neighbor, and one other node. In our opinion, the common neighbors that can form TRA-triangles are more important than others. Given two nodes $u$ and $v$, we denote the number of triangles passing through them as $\triangle(u, v)$, which is

$$\triangle(u, v) = \begin{cases} CN(u, v), & \text{if } (u, v) \in E, \\ 0, & \text{otherwise.} \end{cases} \tag{17}$$

For the example network in Figure 1(a), the triangles used for seed nodes $a$ and $b$ are shown in Figure 1(d). Clearly, $\triangle(a, c) = 2$ and $\triangle(a, d) = 1$. Thus, node $c$ is in closer contact with $a$ than node $d$ is. Given seed nodes $x$ and $y$, let $z$ be one of their common neighbors. The function $\triangle(x, y, z)$ sums the numbers of TRA-triangles formed by $x$, $z$ and by $y$, $z$, which is

$$\triangle(x, y, z) = \triangle(x, z) + \triangle(y, z). \tag{18}$$

In this paper, we propose a new similarity index by combining the aforementioned triangle structure with the idea of the RA index [13]. For convenience, we name our new method the TRA index. Its definition is

$$TRA(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1 + \triangle(x, y, z)/2}{k_z}. \tag{19}$$

In (19), the numerator is $1 + \triangle(x, y, z)/2$. Therefore, the TRA index does not miss the effect of any common neighbor. If all common neighbors are zero-triangle-neighbors, TRA degenerates to RA. For the example network in Figure 1(a), $TRA(a, b) = (1 + 3/2)/4 + (1 + 2/2)/3 + (1 + 0/2)/2 + (1 + 0/2)/4 = 49/24$.
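The definitions in (17), (18), and (19) translate directly into code. The Python sketch below is a minimal illustration under stated assumptions: the edge list `EDGES` is a hypothetical graph chosen so that the degrees and triangle counts of the common neighbors of (a, b) match the worked example above (it is not copied from Figure 1), and the function names are introduced only for this example.

```python
from fractions import Fraction

# Hypothetical edge list (an assumption of this sketch) whose common
# neighbors of (a, b) have the degrees and triangle counts used in the
# worked example: c (k=4, 3 triangles), d (k=3, 2), e (k=2, 0), f (k=4, 0).
EDGES = [("a", "c"), ("a", "d"), ("a", "e"), ("a", "f"), ("a", "h"),
         ("b", "c"), ("b", "d"), ("b", "e"), ("b", "f"),
         ("c", "d"), ("c", "h"), ("f", "g"), ("f", "i"), ("g", "i")]


def build_adjacency(edges):
    """Map each node to the set of its neighbors (undirected graph)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def triangles(adj, u, v):
    """Eq. (17): CN(u, v) if u and v are linked, otherwise 0."""
    return len(adj[u] & adj[v]) if v in adj[u] else 0


def tra(adj, x, y):
    """Eq. (19): sum of (1 + triangle(x, y, z) / 2) / k_z over common neighbors."""
    total = Fraction(0)
    for z in adj[x] & adj[y]:
        t = triangles(adj, x, z) + triangles(adj, y, z)   # Eq. (18)
        total += (1 + Fraction(t, 2)) / len(adj[z])
    return total


def ra(adj, x, y):
    """RA index, Eq. (5), for comparison with TRA."""
    return sum(Fraction(1, len(adj[z])) for z in adj[x] & adj[y])


adj = build_adjacency(EDGES)
print("TRA(a, b) =", tra(adj, "a", "b"))
print("RA(a, b)  =", ra(adj, "a", "b"))
```

On a graph where no common neighbor lies on a triangle with a seed node, every `t` is zero and `tra` returns the same value as `ra`, which mirrors the degeneration to RA noted above.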


Table 3: The AUC of different methods in 12 networks. The results are the average of 50 independent implementations with $|E^{ts}|/|E| = 0.1$. The best performance for each network is marked with an asterisk.

Networks  CN      AA       RA       ADP      CAR     CAA     CRA     CCLP    TRA
ADV       0.8992  0.9026   0.9030   0.9033   0.8054  0.8051  0.8063  0.9011  0.9043*
CE        0.8450  0.8613   0.8662   0.8654   0.7657  0.7677  0.7704  0.8625  0.8713*
Dolphin   0.7832  0.7863   0.7854   0.7866*  0.6475  0.6473  0.6473  0.7804  0.7850
Email     0.8471  0.8491   0.8488   0.8491   0.6994  0.6995  0.6996  0.8452  0.8493*
FW        0.6053  0.6071   0.6114   0.6097   0.6192  0.6271  0.6321  0.6323  0.6890*
Hamster   0.8037  0.8067   0.8074   0.8067   0.6542  0.6542  0.6543  0.8075  0.8127*
HEP       0.8984  0.8987*  0.8987*  0.8987*  0.7079  0.7079  0.7079  0.8624  0.8985
Karate    0.6985  0.7409   0.7523   0.7532   0.5848  0.5881  0.5880  0.7085  0.7755*
PB        0.9192  0.9226   0.9239   0.9242   0.8926  0.8929  0.8946  0.9217  0.9282*
USAir     0.9357  0.9466   0.9522   0.9523*  0.9136  0.9158  0.9202  0.9391  0.9452
Word      0.6656  0.6649   0.6621   0.6651   0.5717  0.5713  0.5714  0.6727  0.6809*
Yeast     0.7041  0.7047   0.7045   0.7047   0.5994  0.5994  0.5994  0.6972  0.7054*

4. Experimental Results

Table 3 lists the prediction results of the different methods in terms of AUC on the 12 networks. The results are obtained by averaging over 50 independent realizations for each network, with the testing set containing 10% of the links. The highest AUC value for each network is highlighted. Clearly, the TRA index achieves the best result on nine of the 12 networks. Meanwhile, the TRA index outperforms the CAR, CAA, CRA, and CCLP indexes on all networks. We can see from Table 2 that, on most of the networks, a varying fraction of the seed node pairs with common neighbors belongs to $P_\exists(S)$ and/or $P_\forall(S)$. As stated in the Introduction, the CCLP index gives lower or zero similarity scores to those pairs. Furthermore, the values of both $R_\exists(S_{CAR})$ and $R_\forall(S_{CAR})$ are very high on most of the networks. In particular, on Dolphin, Email, Hamster, HEP, and Yeast, the values of $R_\forall(S_{CAR})$ are greater than 0.8. This indicates that only a very small fraction of the seed node pairs with common neighbors on those networks can be assigned nonzero similarity scores by the CAR-based indexes. Although some seed node pairs belong to $P_\exists(S_{TRA})$ and/or $P_\forall(S_{TRA})$, the TRA index can still assign reasonable similarity scores to them. Therefore, the results of the TRA index in Table 3 are better than those of the CAR, CAA, CRA, and CCLP indexes. Among the CN, AA, RA, and ADP indexes, the ADP index performs best, since it can penalize common neighbors by automatically adapting to the network. On Dolphin, HEP, and USAir, the ADP index obtains the best accuracy, and the performance of our index is close to the best. In addition, the TRA index achieves much better AUC scores than the others on FW and Karate. This result suggests that TRA-triangles play an important role in these two networks. According to Table 1, both networks are dense. Roughly speaking, the probability that TRA-triangle-neighbors exist between seed nodes is higher on dense networks than on sparse ones.

To check whether the proposed index is significantly different from the compared methods, we applied the Wilcoxon signed-ranks test [39] to the results in Table 3. The pairwise test results are presented in Figure 2. From the statistical point

Figure 2: The results of the Wilcoxon signed-ranks test based on Table 3 (y-axis: $z$; x-axis: pairwise comparisons of TRA with CN, AA, RA, ADP, CAR, CAA, CRA, and CCLP; reference line at $z = -1.96$). With $\alpha = 0.05$, if $z \le -1.96$, the null hypothesis is rejected.

of view, our index is significantly better than the others except the ADP index, because the ADP index is able to adapt to the structure of a network automatically. Although there is no statistical difference between our index and the ADP index according to the Wilcoxon signed-ranks test, our index performs better than the ADP index in terms of AUC.

Figure 3 shows the changes of AUC on the 12 networks when the proportion of $E^{ts}$ in $E$ increases from 10% to 20%. It is evident from Figure 3 that the AUC values of all indexes show downward trends when the proportion increases from 10% to 20%, except on FW. The reason is that enlarging $E^{ts}$ shrinks the training set $E^{tr}$, which in turn reduces the number of common neighbors between seed nodes. Consequently, link prediction becomes more difficult. The FW network, which possesses a high average degree, a small average shortest distance, and a small degree heterogeneity, is a very dense

Figure 3: The changes of AUC when $|E^{ts}|/|E|$ increases from 10% to 20% on 12 networks (one panel per network: ADV, CE, Dolphin, Email, FW, Hamster, HEP, Karate, PB, USAir, Word, and Yeast; x-axis: CN, AA, RA, ADP, CAR, CAA, CRA, CCLP, and TRA; y-axis: AUC). Each point is obtained by averaging over 50 independent realizations.

network. Therefore, shrinking the training set has only a slight influence on accuracy on FW. In addition, we can observe from Figure 3 that the behavior of all indexes on ADV, CE, Dolphin, Email, Hamster, HEP, Karate, Word, and Yeast is very similar. On these nine networks, the AUC values of the CAR-based indexes are obviously lower than those of the others. On the FW network, the results of the CAR-based indexes are better than those of the CN, AA, RA, and ADP indexes, because FW is a very dense network in which the ratio of CAR-triangle-neighbors is very high (see Table 2). On PB and USAir, the performance of the CAR-based indexes is not as poor as on the other nine networks. The reason is that both networks have high average degrees, small average shortest distances, and high ratios of CAR-triangle-neighbors.

Furthermore, we list the AUC values of the different methods on the 12 networks when $|E^{ts}|/|E| = 0.2$ in Table 4. Our index outperforms the others on eight of the 12 networks, while the CCLP index achieves the highest value on CE.

Table 5 gives the results in terms of ranking score. These results are similar to those in Table 3. In terms of ranking score, the TRA index outperforms the others except on Dolphin, HEP, and USAir. The pairwise Wilcoxon signed-ranks test results are shown in Figure 4. Similar to the test in Figure 2, the TRA index is significantly better than the compared methods except the ADP index. As discussed above, ADP has an adaptive capability and hence performs better than the other compared methods.

Figure 5 shows the changes of ranking score on the 12 networks when $|E^{ts}|/|E|$ increases from 10% to 20%. Clearly, all indexes yield higher ranking scores as $E^{ts}$ grows. Recall that a higher ranking score means lower accuracy. As analyzed above, FW is very dense. Thus, the changes of AUC on FW are very slight (see Figure 3). However, the changes of ranking score on FW are more

Figure 4: The results of the Wilcoxon signed-ranks test based on Table 5 (y-axis: $z$; x-axis: pairwise comparisons of TRA with CN, AA, RA, ADP, CAR, CAA, CRA, and CCLP; reference line at $z = -1.96$). With $\alpha = 0.05$, if $z \le -1.96$, the null hypothesis is rejected.


Figure 5: The changes of ranking score when $|E^{ts}|/|E|$ increases from 10% to 20% on 12 networks (one panel per network: ADV, CE, Dolphin, Email, FW, Hamster, HEP, Karate, PB, USAir, Word, and Yeast; x-axis: CN, AA, RA, ADP, CAR, CAA, CRA, CCLP, and TRA; y-axis: ranking score). Each point is obtained by averaging over 50 independent realizations.




Table 5: The ranking score of different methods in 12 networks. The results are the average of 50 independent implementations with $|E^{ts}|/|E| = 0.1$. The best performance for each network is emphasized by boldface.


evident, especially for the CAA and CRA indexes. The reason is that the calculation of ranking score considers all missing links. In addition, as seen in Figure 5, the CAA and CRA indexes perform worse than the CAR index according to ranking score. From the definitions of these three indexes, we find that zero-triangle-neighbors have a stronger negative impact on the CAA and CRA indexes than on the CAR index.

Finally, the ranking scores of all methods on the 12 networks with $|E^{ts}|/|E| = 0.2$ are listed in Table 6. In terms of ranking score, our index outperforms all other indexes except on HEP and USAir. These results are consistent with those for AUC. In contrast with FW, the influence of TRA-triangles on HEP and USAir is small.

From the above results, we can conclude that the TRA index is superior to the CAR-based indexes and the CCLP index and performs better than common-neighbor-based methods on most networks.

5 Conclusion and Discussion

Link prediction is an important research topic in complex network analysis and has a wide range of applications in various fields. Inspired by the triangle growth mechanism in network evolution [41], this paper proposed the TRA index for link prediction. When computing the similarity between two seed nodes, the proposed index not only counts the contributions of all common neighbors but also emphasizes the importance of the neighbors that can form TRA-triangles. To some extent, TRA-triangles reflect the close relationships between neighbors and seed nodes. In addition, the proposed index adopts the theory of resource allocation [13] because of its effectiveness.

The accuracy of the TRA index is experimentally evaluated on 12 real-world networks from various fields in terms of AUC and ranking score. The experimental results show that the proposed index performs far better than the CAR-based indexes. Our index also outperforms the CCLP index, which we attribute to its strategy of counting every common neighbor rather than only those with triangles passing through them. Compared with common-neighbor-based methods, the proposed index yields some improvement in accuracy on most networks. These results indicate that combining the information of TRA-triangles with the theory of resource allocation in a similarity index is a helpful idea for link prediction.




Several directions remain for future work. One is to analyze the degree of influence of TRA-triangles on different networks and, further, to set the weight of TRA-triangles adaptively for each network. The second is to study the application of the TRA index to other topics, such as community detection and anomaly detection. In addition, for learning-based link prediction approaches, the TRA index can be used as a feature of a node pair.

Data Availability

The networks used in this study are available from http://deim.urv.cat/~alexandre.arenas/data/welcome.htm, http://www-personal.umich.edu/~mejn/netdata/, http://vlado.fmf.uni-lj.si/pub/networks/data/, http://noesis.ikor.org/datasets/link-prediction, and http://konect.uni-koblenz.de/networks/.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (no. 61602225) and the Fundamental Research Funds for the Central Universities (no. lzujbky-2017-192).

References

[1] Q.-M. Zhang, L. Lü, W.-Q. Wang, Y.-X. Zhu, and T. Zhou, “Potential theory for directed networks,” PLoS ONE, vol. 8, no. 2, Article ID e55437, 2013.

[2] Q. Zhang, X. Xu, Y. Zhu, and T. Zhou, “Measuring multiple evolution mechanisms of complex networks,” Scientific Reports, vol. 5, no. 1, 2015.

[3] L. Lü, M. Medo, C. H. Yeung, Y. Zhang, Z. Zhang, and T. Zhou, “Recommender systems,” Physics Reports, vol. 519, no. 1, pp. 1–49, 2012.

[4] R. Guimerà and M. Sales-Pardo, “Missing and spurious interactions and the reconstruction of complex networks,” Proceedings of the National Academy of Sciences of the United States of America, vol. 106, no. 52, pp. 22073–22078, 2009.

[5] S. S. Bhowmick and B. S. Seah, “Clustering and summarizing protein-protein interaction networks: a survey,” IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 3, pp. 638–658, 2016.

[6] L. Lü and T. Zhou, “Link prediction in complex networks: a survey,” Physica A: Statistical Mechanics and its Applications, vol. 390, no. 6, pp. 1150–1170, 2011.

[7] L. Li, L. Qian, X. Wang, S. Luo, and X. Chen, “Accurate similarity index based on activity and connectivity of node for link prediction,” International Journal of Modern Physics B, vol. 29, no. 17, 1550108, 15 pages, 2015.

[8] P. Wang, B. Xu, Y. Wu, and X. Zhou, “Link prediction in social networks: the state-of-the-art,” Science China Information Sciences, vol. 58, no. 1, pp. 1–38, 2014.

[9] V. Martínez, F. Berzal, and J.-C. Cubero, “A survey of link prediction in complex networks,” ACM Computing Surveys, vol. 49, no. 4, pp. 69:1–69:33, 2016.

[10] C. Ahmed, A. ElKorany, and R. Bahgat, “A supervised learning approach to link prediction in Twitter,” Social Network Analysis and Mining, vol. 6, no. 1, 2016.

[11] D. Liben-Nowell and J. Kleinberg, “The link-prediction problem for social networks,” Journal of the Association for Information Science and Technology, vol. 58, no. 7, pp. 1019–1031, 2007.

[12] L. A. Adamic and E. Adar, “Friends and neighbors on the Web,” Social Networks, vol. 25, no. 3, pp. 211–230, 2003.

[13] T. Zhou, L. Lü, and Y.-C. Zhang, “Predicting missing links via local information,” The European Physical Journal B, vol. 71, no. 4, pp. 623–630, 2009.

[14] L. Katz, “A new status index derived from sociometric analysis,” Psychometrika, vol. 18, no. 1, pp. 39–43, 1953.

[15] G. Jeh and J. Widom, “SimRank,” in Proceedings of the Eighth ACM SIGKDD International Conference, p. 538, Edmonton, Alberta, Canada, July 2002.

[16] H. Tong, C. Faloutsos, and J. Pan, “Fast random walk with restart and its applications,” in Proceedings of the 6th International Conference on Data Mining (ICDM ’06), pp. 613–622, December 2006.

[17] L. Lü, C.-H. Jin, and T. Zhou, “Similarity index based on local paths for link prediction of complex networks,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 80, no. 4, Article ID 046122, 2009.

[18] A. Papadimitriou, P. Symeonidis, and Y. Manolopoulos, “Fast and accurate link prediction in social networking systems,” The Journal of Systems and Software, vol. 85, no. 9, pp. 2119–2132, 2012.

[19] W. Liu and L. Lü, “Link prediction based on local random walk,” EPL (Europhysics Letters), vol. 89, no. 5, Article ID 58007, 2010.

[20] C. V. Cannistraci, G. Alanis-Lobato, and T. Ravasi, “From link-prediction in brain connectomes and protein interactomes to the local-community-paradigm in complex networks,” Scientific Reports, vol. 3, article 1613, no. 4, 2013.

[21] B. Chen and L. Chen, “A link prediction algorithm based on ant colony optimization,” Applied Intelligence, vol. 41, no. 3, pp. 694–708, 2014.

[22] D. Caiyan, L. Chen, and B. Li, “Link prediction in complex network based on modularity,” Soft Computing, vol. 21, no. 15, pp. 4197–4214, 2017.

[23] V. Martínez, F. Berzal, and J.-C. Cubero, “Adaptive degree penalization for link prediction,” Journal of Computational Science, vol. 13, pp. 1–9, 2016.

[24] Z. Wu, Y. Lin, J. Wang, and S. Gregory, “Link prediction with node clustering coefficient,” Physica A: Statistical Mechanics and its Applications, vol. 452, pp. 1–8, 2016.

[25] P. Massa, M. Salvetti, and D. Tomasoni, “Bowling alone and trust decline in social network sites,” in Proceedings of the 8th IEEE International Symposium on Dependable, Autonomic and Secure Computing, DASC 2009, pp. 658–663, China, December 2009.

[26] D. J. Watts and S. H. Strogatz, “Collective dynamics of ‘small-world’ networks,” Nature, vol. 393, no. 6684, pp. 440–442, 1998.

[27] D. Lusseau, K. Schneider, O. J. Boisseau, P. Haase, E. Slooten, and S. M. Dawson, “The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations: can geographic isolation explain this unique trait?” Behavioral Ecology and Sociobiology, vol. 54, no. 4, pp. 396–405, 2003.

[28] R. Guimerà, L. Danon, A. Díaz-Guilera, F. Giralt, and A. Arenas, “Self-similar community structure in a network of human interactions,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 68, no. 6, Article ID 065103, 2003.

[29] R. E. Ulanowicz and D. L. DeAngelis, “Network analysis of trophic dynamics in South Florida ecosystems,” in US Geological Survey Program on the South Florida Ecosystem, vol. 114, 45 edition, 2005.

[30] J. Kunegis, “KONECT – the Koblenz network collection,” in Proceedings of the 22nd International Conference on World Wide Web (WWW ’13), pp. 1343–1350, May 2013.

[31] M. E. Newman, “The structure of scientific collaboration networks,” Proceedings of the National Academy of Sciences of the United States of America, vol. 98, no. 2, pp. 404–409, 2001.

[32] W. W. Zachary, “An information flow model for conflict and fission in small groups,” Journal of Anthropological Research, vol. 33, no. 4, pp. 452–473, 1977.

[33] L. A. Adamic and N. Glance, “The political blogosphere and the 2004 US Election: divided they blog,” in Proceedings of the 3rd International Workshop on Link Discovery (LinkKDD ’05), pp. 36–43, ACM, 2005.

[34] M. E. J. Newman, “Finding community structure in networks using the eigenvectors of matrices,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 74, no. 3, Article ID 036104, 19 pages, 2006.

[35] D. Bu, Y. Zhao, L. Cai et al., “Topological structure analysis of the protein-protein interaction network in budding yeast,” Nucleic Acids Research, vol. 31, no. 9, pp. 2443–2450, 2003.

[36] M. E. Newman, “Mixing patterns in networks,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 67, no. 2, 2003.

[37] V. Latora and M. Marchiori, “Efficient behavior of small-world networks,” Physical Review Letters, vol. 87, no. 19, Article ID 198701, 2001.

[38] F. Wilcoxon, “Individual comparisons by ranking methods,” Biometrics Bulletin, vol. 1, no. 6, pp. 80–83, 1945.

[39] J. Demšar, “Statistical comparisons of classifiers over multiple data sets,” Journal of Machine Learning Research, vol. 7, pp. 1–30, 2006.

[40] W.-Q. Wang, Q.-M. Zhang, and T. Zhou, “Evaluating network models: a likelihood analysis,” EPL (Europhysics Letters), vol. 98, no. 2, Article ID 28004, 2012.

[41] Z. Wu, G. Menichetti, C. Rahmede, and G. Bianconi, “Emergent complex network geometry,” Scientific Reports, vol. 5, 2015.


Figure 1: Triangles used in similarity indexes. (a) Example network. (b) Triangles used in the CAR index. (c) Triangles used in the CCLP index. (d) Triangles used in the TRA index. (e) Similarity of $(a, b)$: $CN(a, b) = 4$, $RA(a, b) = 4/3$, $CAR(a, b) = 4$, $CCLP(a, b) = 4/3$, and $TRA(a, b) = 49/24$. Legend: seed node; common neighbor; non-common neighbor; neighbor of common neighbor.

community is a triangle passing through two common neighbors and one seed node. In the example network shown in Figure 1(a), there is one LCL between the common neighbors of seed nodes $a$ and $b$ (see Figure 1(b)). Thus, the CAR index assigns a similarity score of four to nodes $a$ and $b$. However, if we remove the link between $c$ and $d$, CAR will assign a zero similarity score to $a$ and $b$ even though they have four common neighbors. In addition, the idea of LCL is also plugged into the AA, RA, and Jaccard indexes [20]. Later, Wu et al. proposed the CCLP index based on the clustering coefficients of common neighbors. This index considers all triangles passing through a common neighbor. For the example network in Figure 1(a), there are triangles passing through nodes $c$, $d$, and $f$, respectively (see Figure 1(c)). Thus, the CCLP index accumulates the clustering coefficients of nodes $c$, $d$, and $f$ when calculating the similarity between $a$ and $b$ but entirely neglects the contribution of node $e$. In real-world networks, it is possible that no triangles pass through some, or even all, of the shared neighbors of a node pair. Thus, the CAR and CCLP indexes may assign a very low or even zero similarity score to the node pair, even if it has many common neighbors.

In this paper, we define a new type of triangle structure called the TRA-triangle, which is formed by one seed node, one common neighbor, and one other node (see Figure 1(d)). Based on the TRA-triangle, a new similarity index, namely, the TRA index, is proposed for link prediction. This index suggests that the common neighbors that can form TRA-triangles with a seed node are more important than others. In addition, the proposed index also penalizes large-degree neighbors, as done in the RA index [13]. Although the TRA, CAR-based, and CCLP indexes are all based on triangle structures, the intuitions behind them are different. The CAR-based indexes hold that LCLs are more valuable than common neighbors. The CCLP index is inspired by the CAR index but employs all triangles passing through common neighbors, while the TRA index, which uses only TRA-triangles, strikes a balance between CAR and CCLP. Furthermore, as mentioned above, the CAR-based and CCLP indexes lose the contribution of common neighbors that have no triangles passing through them, whereas the TRA index counts the contribution of all kinds of common neighbors. Therefore, the TRA index can achieve better prediction accuracy than the CAR-based indexes and the CCLP index. The accuracy of the TRA index is evaluated on 12 real-world networks from various fields. The experimental results show that our index is far superior to the CAR-based indexes and the CCLP index. Take the HEP network, which is very sparse, as an example: the improvements made by TRA over CAR and CCLP under the AUC metric are up to 26.9% and 4.2%, respectively.

The rest of the paper is structured as follows. In Section 2, we describe the link prediction problem and the evaluation metrics, list the compared methods and networks, and present the Wilcoxon signed-ranks test. Section 3 introduces the proposed method. In Section 4, the experimental results and performance analysis of the proposed method are presented. Finally, Section 5 concludes this work.

2 Preliminaries

2.1. Problem Description and Metric. Consider an undirected and unweighted network $G(V, E)$, in which $V$ and $E$ are the node set and link set, respectively. In this study, multilinks and self-loops are not allowed. Let $N = |V|$ be the number of nodes in the network and let $U$ be the universal set of possible links, which contains $N(N-1)/2$ links. Then the set of nonobserved (nonexistent) links is $U - E$. Suppose there are some missing links in $U - E$; the task of link prediction

$$AUC = \frac{n_1 + 0.5\,n_2}{n} \qquad (1)$$

$$RS = \frac{1}{|E^{ts}|} \sum_{e_i \in E^{ts}} RS(e_i) = \frac{1}{|E^{ts}|} \sum_{e_i \in E^{ts}} \frac{r_i}{|H|} \qquad (2)$$
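Equations (1) and (2) can be illustrated with a short Python sketch. It assumes the usual reading of the symbols, which is not spelled out next to the equations here: $n$ random comparisons between a missing link and a nonexistent link, of which $n_1$ are won by the missing link and $n_2$ are ties, and $r_i$ as the rank of missing link $e_i$ in the candidate list $H$ sorted by descending score. The function names, the `score` dictionary, and the toy data are hypothetical.

```python
import random


def auc(score, missing, nonexistent, n=10000, seed=0):
    """Estimate AUC = (n1 + 0.5 * n2) / n from n random comparisons."""
    rng = random.Random(seed)
    n1 = n2 = 0
    for _ in range(n):
        s_missing = score[rng.choice(missing)]
        s_nonexistent = score[rng.choice(nonexistent)]
        if s_missing > s_nonexistent:
            n1 += 1  # the missing link is ranked higher
        elif s_missing == s_nonexistent:
            n2 += 1  # tie
    return (n1 + 0.5 * n2) / n


def ranking_score(score, missing, nonexistent):
    """Average of r_i / |H| over the missing links (lower is better)."""
    candidates = list(missing) + list(nonexistent)  # assumed content of H
    ranked = sorted(candidates, key=lambda e: score[e], reverse=True)
    rank = {e: i + 1 for i, e in enumerate(ranked)}  # 1-based ranks
    return sum(rank[e] / len(ranked) for e in missing) / len(missing)


# Toy data: one missing link that outscores three nonexistent links.
score = {("a", "b"): 3.0, ("a", "c"): 1.0, ("b", "d"): 0.5, ("c", "d"): 0.0}
missing = [("a", "b")]
nonexistent = [("a", "c"), ("b", "d"), ("c", "d")]

print(auc(score, missing, nonexistent))            # 1.0
print(ranking_score(score, missing, nonexistent))  # 0.25
```

The two metrics move in opposite directions: a perfect predictor drives AUC toward 1 and the ranking score toward its minimum.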




$$CN(x, y) = |\Gamma(x) \cap \Gamma(y)| \qquad (3)$$

$$AA(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{\log k_z} \qquad (4)$$

$$RA(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{k_z} \qquad (5)$$

$$ADP(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} k_z^{-\beta C} \qquad (6)$$
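The common-neighbor-based indexes in (3)-(6) differ only in how each common neighbor $z$ is weighted by its degree $k_z$. The following Python sketch computes them on an adjacency-set representation. The graph, the helper `build_graph`, and the arguments `beta` and `c` (standing for $\beta$ and $C$ in (6)) are illustrative assumptions, not taken from the paper's experiments.

```python
import math


def build_graph(edges):
    """Build an undirected graph as a dict mapping node -> set of neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def cn(adj, x, y):
    """Equation (3): number of common neighbors."""
    return len(adj[x] & adj[y])


def aa(adj, x, y):
    """Equation (4): each common neighbor weighted by 1 / log(k_z)."""
    return sum(1 / math.log(len(adj[z])) for z in adj[x] & adj[y])


def ra(adj, x, y):
    """Equation (5): each common neighbor weighted by 1 / k_z."""
    return sum(1 / len(adj[z]) for z in adj[x] & adj[y])


def adp(adj, x, y, beta, c):
    """Equation (6): each common neighbor weighted by k_z ** (-beta * c)."""
    return sum(len(adj[z]) ** (-beta * c) for z in adj[x] & adj[y])


# Hypothetical toy graph: a and b share the neighbors c (degree 2) and d (degree 3).
adj = build_graph([("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("b", "e"), ("d", "e")])

print(cn(adj, "a", "b"))  # 2
print(ra(adj, "a", "b"))  # 1/2 + 1/3
```

A common neighbor always has degree at least 2, so the logarithm in (4) is strictly positive and the division is safe.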




$$CAR(x, y) = CN(x, y) \cdot \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{L(z)}{2} \qquad (7)$$

$$CAA(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{L(z)}{\log k_z} \qquad (8)$$

$$CRA(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{L(z)}{k_z} \qquad (9)$$

$$CCLP(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} CC_z \qquad (10)$$

$$CC_z = \frac{2 t_z}{k_z (k_z - 1)} \qquad (11)$$
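A small Python sketch can make the weakness of the CAR and CCLP indexes concrete. It assumes that $L(z)$ is the number of links between $z$ and the other common neighbors of the seed pair and that $t_z$ is the number of triangles passing through $z$; neither definition is stated next to the equations above. The toy graph is hypothetical and is not the network of Figure 1.

```python
from itertools import combinations


def build_graph(edges):
    """Build an undirected graph as a dict mapping node -> set of neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def car(adj, x, y):
    """Equation (7): CN(x, y) times the number of links among common neighbors."""
    common = adj[x] & adj[y]
    lcl = sum(len(adj[z] & common) for z in common) / 2  # each link counted twice
    return len(common) * lcl


def clustering(adj, z):
    """Equation (11): CC_z = 2 * t_z / (k_z * (k_z - 1))."""
    k = len(adj[z])
    if k < 2:
        return 0.0
    t = sum(1 for u, v in combinations(adj[z], 2) if v in adj[u])
    return 2 * t / (k * (k - 1))


def cclp(adj, x, y):
    """Equation (10): sum of clustering coefficients of the common neighbors."""
    return sum(clustering(adj, z) for z in adj[x] & adj[y])


# Seeds a and b share three common neighbors c, d, e; c and d are linked.
base = [("a", "c"), ("a", "d"), ("a", "e"), ("b", "c"), ("b", "d"), ("b", "e")]
with_link = build_graph(base + [("c", "d")])
without_link = build_graph(base)

print(car(with_link, "a", "b"), cclp(with_link, "a", "b"))        # 3.0 and about 1.333
print(car(without_link, "a", "b"), cclp(without_link, "a", "b"))  # 0.0 and 0.0
```

Removing the single link between c and d drops both scores to zero although the pair still has three common neighbors, which is the loss of contribution the text describes.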







$$P_{\Lambda} = \{(x, y) \mid (x, y) \notin E \wedge \Gamma(x) \cap \Gamma(y) \neq \emptyset\} \qquad (12)$$

$$P_{\exists}(S), \quad P_{\forall}(S) \qquad (13)$$

$$P_{\exists}(S_{TRA}), \quad P_{\forall}(S_{TRA}), \quad P_{\exists}(S_{CAR}), \quad P_{\forall}(S_{CAR}) \qquad (14)$$

[The right-hand sides of (13) and (14) are missing from the source.]

$$R_{\exists}(S) = \frac{|P_{\exists}(S)|}{|P_{\Lambda}|}, \qquad R_{\forall}(S) = \frac{|P_{\forall}(S)|}{|P_{\Lambda}|},$$

$$R_{\exists}(S_{TRA}) = \frac{|P_{\exists}(S_{TRA})|}{|P_{\Lambda}|}, \qquad R_{\forall}(S_{TRA}) = \frac{|P_{\forall}(S_{TRA})|}{|P_{\Lambda}|},$$

$$R_{\exists}(S_{CAR}) = \frac{|P_{\exists}(S_{CAR})|}{|P_{\Lambda}|}, \qquad R_{\forall}(S_{CAR}) = \frac{|P_{\forall}(S_{CAR})|}{|P_{\Lambda}|} \qquad (15)$$



$$z = \frac{T - \frac{1}{4} N (N + 1)}{\sqrt{\frac{1}{24} N (N + 1) (2N + 1)}} \qquad (16)$$
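The statistic in (16) can be computed directly from paired results. The sketch below assumes the standard formulation of the Wilcoxon signed-ranks test for comparing two methods over $N$ data sets, as described by Demšar [39]: differences are ranked by absolute value (average ranks for ties), ranks of zero differences are split evenly between the two sums, and $T$ is the smaller of the two rank sums. The function name and the input data are hypothetical.

```python
import math


def wilcoxon_z(a, b):
    """z statistic of Equation (16) for paired results a[i], b[i] on N data sets."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Give tied absolute differences the average of their rank positions.
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    zero_share = 0.5 * sum(r for r, di in zip(ranks, d) if di == 0)
    r_plus = sum(r for r, di in zip(ranks, d) if di > 0) + zero_share
    r_minus = sum(r for r, di in zip(ranks, d) if di < 0) + zero_share
    t = min(r_plus, r_minus)
    return (t - n * (n + 1) / 4) / math.sqrt(n * (n + 1) * (2 * n + 1) / 24)


# Hypothetical case: one method is better on all 12 networks.
a = [float(i) for i in range(1, 13)]
b = [0.0] * 12
z = wilcoxon_z(a, b)
print(round(z, 4))  # -3.0594
print(z <= -1.96)   # True: the null hypothesis is rejected at alpha = 0.05
```

With $N = 12$, the most extreme value the statistic can take is about $-3.06$, reached when one method wins on every network.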



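[Equation (17), a piecewise definition over a node pair $(u, v)$, is incomplete in the source: the function symbol and the right-hand side are missing. The missing symbols are written $f$ and $F$ below.]

$$F(x, y, z) = f(x, z) + f(y, z) \qquad (18)$$

$$TRA(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1 + F(x, y, z)/2}{k_z} \qquad (19)$$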
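The TRA index of (19) gives every common neighbor $z$ its resource-allocation term $1/k_z$ and adds a bonus when $z$ forms TRA-triangles with the seed nodes. The following Python sketch is a minimal illustration under an explicit assumption: the pairwise indicator of (17) is taken to be 1 when a seed node and the common neighbor share at least one further neighbor (so that a triangle of one seed node, one common neighbor, and one other node exists) and 0 otherwise. This reading is inferred from the description of the TRA-triangle and is not confirmed by the equations as extracted; the function names and the toy graph are hypothetical.

```python
def build_graph(edges):
    """Build an undirected graph as a dict mapping node -> set of neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def has_tra_triangle(adj, seed, z):
    """Assumed indicator for (17): 1 if seed and z share a neighbor, else 0."""
    return 1 if adj[seed] & adj[z] else 0


def tra(adj, x, y):
    """Equation (19): sum over common neighbors z of (1 + F(x, y, z) / 2) / k_z."""
    total = 0.0
    for z in adj[x] & adj[y]:
        bonus = has_tra_triangle(adj, x, z) + has_tra_triangle(adj, y, z)  # Equation (18)
        total += (1 + bonus / 2) / len(adj[z])
    return total


# Seeds a and b share three common neighbors c, d, e; c and d are linked.
base = [("a", "c"), ("a", "d"), ("a", "e"), ("b", "c"), ("b", "d"), ("b", "e")]
with_link = build_graph(base + [("c", "d")])
without_link = build_graph(base)

print(tra(with_link, "a", "b"))     # 2/3 + 2/3 + 1/2 = 11/6
print(tra(without_link, "a", "b"))  # 1/2 + 1/2 + 1/2 = 3/2, equal to RA
```

When no TRA-triangle exists, the score falls back to the RA value instead of dropping to zero, so no common neighbor is ignored.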


Figure 3 (plot residue; panels: ADV, CE, Dolphin, Email, FW, Hamster, HEP, Karate, PB, USAir, Word, and Yeast; y-axis: AUC; x-axis: $|E^{ts}|/|E|$ at 10% and 20%).







8 Complexity


minus20

minus22

minus24

minus26

minus28

minus30

z

TRA

CN

TRA

AA

TRA

RA

TRA

AD

P

TRA

CA

R

TRA

CA

A

TRA

CRA

TRA

CCL

P

z=-196



CAR CAA CAR CAA

CAR CAA CAR CAA

CAR CAA CAR CAA

CRA CCLP TRA010015020025030035040045050

Rank

ing

Scor

e

ADV

1020

1020

1020

1020

1020

1020

1020

1020

1020

1020

1020

1020


02

03

04

05

06

Rank

ing

Scor

e

CE


04

05

06

07

08

Rank

ing

Scor

e

Dolphin


03

04

05

06

07

Rank

ing

Scor

e

Email


032

034

036

038

040

042

044

Rank

ing

Scor

e

FW


04

05

06

07

08

Rank

ing

Scor

e

Hamster


02

03

04

05

06

07

Rank

ing

Scor

e

HEP


03

04

05

06

07

08

09

Rank

ing

Scor

e

Karate


Rank

ing

Scor

e

PB


006

008

010

012

014

016

018

020

Rank

ing

Scor

e

USAir


05

06

07

08

09

Rank

ing

Scor

e

Word


Rank

ing

Scor

e

Yeast


Complexity 9












10 Complexity




Data Availability




Acknowledgments


References

















Complexity 11

































Journal of












Journal of







Volume 2018






Volume 2018








Complexity 3



119860119880119862 = 1198991 + 051198992119899 (1)


119877119878 = 110038161003816100381610038161198641199051199041003816100381610038161003816sum119890119894isin119864119905119904

119877119878 (119890119894) =110038161003816100381610038161198641199051199041003816100381610038161003816sum119890119894isin119864119905119904

119903119894|119867| (2)




119862119873(119909 119910) = 1003816100381610038161003816Γ (119909) cap Γ (119910)1003816100381610038161003816 (3)



119860119860 (119909 119910) = sum119911isinΓ(119909)capΓ(119910)

1log 119896119911

(4)



119877119860 (119909 119910) = sum119911isinΓ(119909)capΓ(119910)

1119896119911 (5)


119860119863119875(119909 119910) = sum119911isinΓ(119909)capΓ(119910)

119896minus120573119862119911 (6)




119871 (119911)2 (7)



119862119860119860 (119909 119910) = sum119911isinΓ(119909)capΓ(119910)

119871 (119911)log 119896119911

(8)

119862119877119860 (119909 119910) = sum119911isinΓ(119909)capΓ(119910)

119871 (119911)119896119911 (9)


119862119862119871119875 (119909 119910) = sum119911isinΓ(119909)capΓ(119910)

119862119862119911 (10)


119862119862119911 =2119905119911

119896119911 (119896119911 minus 1) (11)







4 Complexity











119875Λ = (119909 119910) | (119909 119910) notin 119864 and Γ (119909) cap Γ (119910) = 0 (12)


119875exist (119878)


119875forall (119878)


(13)


119875exist (119878119879119877119860)


119875forall (119878119879119877119860)


119875exist (119878119862119860119877)


119875forall (119878119862119860119877)


(14)


119877exist (119878) =10038161003816100381610038161003816119875exist (119878)

100381610038161003816100381610038161003816100381610038161003816119875Λ1003816100381610038161003816

119877forall (119878) =10038161003816100381610038161003816119875forall (119878)

100381610038161003816100381610038161003816100381610038161003816119875Λ1003816100381610038161003816

119877exist (119878119879119877119860) =10038161003816100381610038161003816119875exist (119878119879119877119860)

100381610038161003816100381610038161003816100381610038161003816119875Λ1003816100381610038161003816

119877forall (119878119879119877119860) =10038161003816100381610038161003816119875forall (119878119879119877119860)

100381610038161003816100381610038161003816100381610038161003816119875Λ1003816100381610038161003816

119877exist (119878119862119860119877) =10038161003816100381610038161003816119875exist (119878119862119860119877)

100381610038161003816100381610038161003816100381610038161003816119875Λ1003816100381610038161003816

119877forall (119878119862119860119877) =10038161003816100381610038161003816119875forall (119878119862119860119877)

100381610038161003816100381610038161003816100381610038161003816119875Λ1003816100381610038161003816

(15)



119911 = 119879 minus (14)119873 (119873 + 1)radic(124)119873 (119873 + 1) (2119873 + 1) (16)



Complexity 5








(119906 V) =


(17)



(119909 119910 119911) = (119909 119911) + (119910 119911) (18)


119879119877119860 (119909 119910) = sum119911isinΓ(119909)capΓ(119910)

1 + (119909 119910 119911) 2119896119911

(19)


6 Complexity







minus20

minus22

minus24

minus26

minus28

minus30

z

TRA

CN

TRA

AA

TRA

RA

TRA

AD

P

TRA

CA

R

TRA

CA

A

TRA

CRA

TRA

CCL

P

z=-196




Complexity 7


CAR CAA CAR CAA

CAR CAA CAR CAA

CAR CAA CAR CAA

CRA CCLP TRA070

075

080

085

090

095

100AU

C

ADV

1020

1020

1020

1020

1020

1020

1020

1020

1020

1020

1020

1020


070

075

080

085

090

095

AUC

CE


060

065

070

075

080

085

AUC

Dolphin


065

070

075

080

085

090

AUC

Email


055

060

065

070

075

080

AUC

FW


065

070

075

080

085

090

AUC

Hamster


AUC

HEP


055

060

065

070

075

080

AUC

Karate


AUC

PB


AUC

USAir


AUC

Word


055

060

065

070

075

080AU

CYeast







8 Complexity


minus20

minus22

minus24

minus26

minus28

minus30

z

TRA

CN

TRA

AA

TRA

RA

TRA

AD

P

TRA

CA

R

TRA

CA

A

TRA

CRA

TRA

CCL

P

z=-196



CAR CAA CAR CAA

CAR CAA CAR CAA

CAR CAA CAR CAA

CRA CCLP TRA010015020025030035040045050

Rank

ing

Scor

e

ADV

1020

1020

1020

1020

1020

1020

1020

1020

1020

1020

1020

1020


02

03

04

05

06

Rank

ing

Scor

e

CE


04

05

06

07

08

Rank

ing

Scor

e

Dolphin


03

04

05

06

07

Rank

ing

Scor

e

Email


032

034

036

038

040

042

044

Rank

ing

Scor

e

FW


04

05

06

07

08

Rank

ing

Scor

e

Hamster


02

03

04

05

06

07

Rank

ing

Scor

e

HEP


03

04

05

06

07

08

09

Rank

ing

Scor

e

Karate


Rank

ing

Scor

e

PB


006

008

010

012

014

016

018

020

Rank

ing

Scor

e

USAir


05

06

07

08

09

Rank

ing

Scor

e

Word


Rank

ing

Scor

e

Yeast


Complexity 9












10 Complexity




Data Availability




Acknowledgments


References

















Complexity 11

































Journal of












Journal of







Volume 2018






Volume 2018








4 Complexity











119875Λ = (119909 119910) | (119909 119910) notin 119864 and Γ (119909) cap Γ (119910) = 0 (12)


119875exist (119878)


119875forall (119878)


(13)


119875exist (119878119879119877119860)


119875forall (119878119879119877119860)


119875exist (119878119862119860119877)


119875forall (119878119862119860119877)


(14)


119877exist (119878) =10038161003816100381610038161003816119875exist (119878)

100381610038161003816100381610038161003816100381610038161003816119875Λ1003816100381610038161003816

119877forall (119878) =10038161003816100381610038161003816119875forall (119878)

100381610038161003816100381610038161003816100381610038161003816119875Λ1003816100381610038161003816

119877exist (119878119879119877119860) =10038161003816100381610038161003816119875exist (119878119879119877119860)

100381610038161003816100381610038161003816100381610038161003816119875Λ1003816100381610038161003816

119877forall (119878119879119877119860) =10038161003816100381610038161003816119875forall (119878119879119877119860)

100381610038161003816100381610038161003816100381610038161003816119875Λ1003816100381610038161003816

119877exist (119878119862119860119877) =10038161003816100381610038161003816119875exist (119878119862119860119877)

100381610038161003816100381610038161003816100381610038161003816119875Λ1003816100381610038161003816

119877forall (119878119862119860119877) =10038161003816100381610038161003816119875forall (119878119862119860119877)

100381610038161003816100381610038161003816100381610038161003816119875Λ1003816100381610038161003816

(15)



119911 = 119879 minus (14)119873 (119873 + 1)radic(124)119873 (119873 + 1) (2119873 + 1) (16)



Complexity 5








(119906 V) =


(17)



(119909 119910 119911) = (119909 119911) + (119910 119911) (18)


119879119877119860 (119909 119910) = sum119911isinΓ(119909)capΓ(119910)

1 + (119909 119910 119911) 2119896119911

(19)


6 Complexity







minus20

minus22

minus24

minus26

minus28

minus30

z

TRA

CN

TRA

AA

TRA

RA

TRA

AD

P

TRA

CA

R

TRA

CA

A

TRA

CRA

TRA

CCL

P

z=-196




Complexity 7


CAR CAA CAR CAA

CAR CAA CAR CAA

CAR CAA CAR CAA

CRA CCLP TRA070

075

080

085

090

095

100AU

C

ADV

1020

1020

1020

1020

1020

1020

1020

1020

1020

1020

1020

1020


070

075

080

085

090

095

AUC

CE


060

065

070

075

080

085

AUC

Dolphin


065

070

075

080

085

090

AUC

Email


055

060

065

070

075

080

AUC

FW


065

070

075

080

085

090

AUC

Hamster


AUC

HEP


055

060

065

070

075

080

AUC

Karate


AUC

PB


AUC

USAir


AUC

Word


055

060

065

070

075

080AU

CYeast







8 Complexity


minus20

minus22

minus24

minus26

minus28

minus30

z

TRA

CN

TRA

AA

TRA

RA

TRA

AD

P

TRA

CA

R

TRA

CA

A

TRA

CRA

TRA

CCL

P

z=-196



CAR CAA CAR CAA

CAR CAA CAR CAA

CAR CAA CAR CAA

CRA CCLP TRA010015020025030035040045050

Rank

ing

Scor

e

ADV

1020

1020

1020

1020

1020

1020

1020

1020

1020

1020

1020

1020


02

03

04

05

06

Rank

ing

Scor

e

CE


04

05

06

07

08

Rank

ing

Scor

e

Dolphin


03

04

05

06

07

Rank

ing

Scor

e

Email


032

034

036

038

040

042

044

Rank

ing

Scor

e

FW


04

05

06

07

08

Rank

ing

Scor

e

Hamster


02

03

04

05

06

07

Rank

ing

Scor

e

HEP


03

04

05

06

07

08

09

Rank

ing

Scor

e

Karate


Rank

ing

Scor

e

PB


006

008

010

012

014

016

018

020

Rank

ing

Scor

e

USAir


05

06

07

08

09

Rank

ing

Scor

e

Word


Rank

ing

Scor

e

Yeast


Complexity 9












10 Complexity




Data Availability




Acknowledgments


References

















Complexity 11

































Journal of












Journal of







Volume 2018






Volume 2018








Complexity 5








(119906 V) =


(17)



(119909 119910 119911) = (119909 119911) + (119910 119911) (18)


119879119877119860 (119909 119910) = sum119911isinΓ(119909)capΓ(119910)

1 + (119909 119910 119911) 2119896119911

(19)


6 Complexity







minus20

minus22

minus24

minus26

minus28

minus30

z

TRA

CN

TRA

AA

TRA

RA

TRA

AD

P

TRA

CA

R

TRA

CA

A

TRA

CRA

TRA

CCL

P

z=-196




Complexity 7


CAR CAA CAR CAA

CAR CAA CAR CAA

CAR CAA CAR CAA

CRA CCLP TRA070

075

080

085

090

095

100AU

C

ADV

1020

1020

1020

1020

1020

1020

1020

1020

1020

1020

1020

1020


070

075

080

085

090

095

AUC

CE


060

065

070

075

080

085

AUC

Dolphin


065

070

075

080

085

090

AUC

Email


055

060

065

070

075

080

AUC

FW


065

070

075

080

085

090

AUC

Hamster


AUC

HEP


055

060

065

070

075

080

AUC

Karate


AUC

PB


AUC

USAir


AUC

Word


055

060

065

070

075

080AU

CYeast







8 Complexity


minus20

minus22

minus24

minus26

minus28

minus30

z

TRA

CN

TRA

AA

TRA

RA

TRA

AD

P

TRA

CA

R

TRA

CA

A

TRA

CRA

TRA

CCL

P

z=-196



CAR CAA CAR CAA

CAR CAA CAR CAA

CAR CAA CAR CAA

CRA CCLP TRA010015020025030035040045050

Rank

ing

Scor

e

ADV

1020

1020

1020

1020

1020

1020

1020

1020

1020

1020

1020

1020


02

03

04

05

06

Rank

ing

Scor

e

CE


04

05

06

07

08

Rank

ing

Scor

e

Dolphin


03

04

05

06

07

Rank

ing

Scor

e

Email


032

034

036

038

040

042

044

Rank

ing

Scor

e

FW


04

05

06

07

08

Rank

ing

Scor

e

Hamster


02

03

04

05

06

07

Rank

ing

Scor

e

HEP


03

04

05

06

07

08

09

Rank

ing

Scor

e

Karate


Rank

ing

Scor

e

PB


006

008

010

012

014

016

018

020

Rank

ing

Scor

e

USAir


05

06

07

08

09

Rank

ing

Scor

e

Word


Rank

ing

Scor

e

Yeast


Complexity 9












10 Complexity




Data Availability




Acknowledgments


References

















Complexity 11

































Journal of












Journal of







Volume 2018






Volume 2018








6 Complexity







minus20

minus22

minus24

minus26

minus28

minus30

z

TRA

CN

TRA

AA

TRA

RA

TRA

AD

P

TRA

CA

R

TRA

CA

A

TRA

CRA

TRA

CCL

P

z=-196




Complexity 7


CAR CAA CAR CAA

CAR CAA CAR CAA

CAR CAA CAR CAA

CRA CCLP TRA070

075

080

085

090

095

100AU

C

ADV

1020

1020

1020

1020

1020

1020

1020

1020

1020

1020

1020

1020


070

075

080

085

090

095

AUC

CE


060

065

070

075

080

085

AUC

Dolphin


065

070

075

080

085

090

AUC

Email


055

060

065

070

075

080

AUC

FW


065

070

075

080

085

090

AUC

Hamster


AUC

HEP


055

060

065

070

075

080

AUC

Karate


AUC

PB


AUC

USAir


AUC

Word


055

060

065

070

075

080AU

CYeast





























































