

Page 1: gl2vec: Learning Feature Representation Using Graphlets ... · gl2vec: Learning Feature Representation Using Graphlets for Directed Networks Kun Tu 1, Jian Li , Don Towsley , Dave


Learning Features of Network Structures Using Graphlets

Kun Tu∗, Jian Li∗, Member, IEEE, Don Towsley, Life Fellow, IEEE, Dave Braines, Member, IEEE, and Liam D. Turner, Member, IEEE

Abstract—Networks are fundamental to the study of complex systems, ranging from social contacts, message transactions, to biological regulations and economical networks. In many realistic applications, these networks may vary over time. Modeling and analyzing such temporal properties is of additional interest as it can provide a richer characterization of relations between nodes in networks. In this paper, we explore the role of graphlets in network classification for both static and temporal networks. Graphlets are small non-isomorphic induced subgraphs representing connected patterns in a network and their frequency can be used to assess network structures. We show that graphlet features, which are not captured by state-of-the-art methods, play a significant role in enhancing the performance of network classification. To that end, we propose two novel graphlet-based techniques, gl2vec for network embedding, and gl-DCNN for diffusion-convolutional neural networks. We demonstrate the efficacy and usability of gl2vec and gl-DCNN through extensive experiments using several real-world static and temporal networks. We find that features learned from graphlets can bring notable performance increases to state-of-the-art methods in network analysis.

Index Terms—Graphlet, subgraph feature, temporal network, DCNN, subgraph ratio profile, network classification, null model


1 INTRODUCTION

Networks, where elements are denoted as nodes and their interactions are denoted as edges, are fundamental to the study of complex systems [2], [35], including social, communication, and biological networks. Analysis of networks includes network classification, community detection, and so on. Typical analyses usually model these systems as static undirected graphs that describe relations between nodes. However, in many realistic applications, these relations are directional and may change over time [21], [26], [42]. Modeling these directed and temporal properties is of additional interest as it can provide a richer characterization of relations between nodes in networks. It has become one of the most in-demand but computationally challenging problems in today's highly connected society.

∗Co-primary authors. This research was sponsored by the U.S. Army Research Laboratory and the U.K. Ministry of Defence under Agreement Number W911NF-16-3-0001. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Army Research Laboratory, the U.S. Government, the U.K. Ministry of Defence or the U.K. Government. The U.S. and U.K. Governments are authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.

• K. Tu is with the College of Information and Computer Sciences, University of Massachusetts Amherst, Amherst, MA, USA. E-mail: [email protected].

• J. Li is with the Department of Electrical and Computer Engineering, Binghamton University, the State University of New York, Binghamton, NY, USA. E-mail: [email protected].

• D. Towsley is with the College of Information and Computer Sciences, University of Massachusetts Amherst, Amherst, MA, USA. E-mail: [email protected].

• D. Braines is with IBM Research UK, Emerging Technology, Hursley Park, Winchester, UK. E-mail: dave [email protected].

• L. D. Turner is with the School of Computer Science and Informatics, Cardiff University, Cardiff, UK. E-mail: [email protected].

A wide variety of analytical approaches have been proposed for network classification tasks. This often involves applying machine learning techniques to these problems, which requires the network to be represented as a feature vector. However, representing a network as a feature vector is challenging due to high dimensionality and potentially large and complex network structures. In this paper, we propose to advance the state of the art by exploring the role of graphlets in network classification for both static and temporal networks. Graphlets are small non-isomorphic induced subgraphs representing connected patterns in a network and their frequency can be used to assess network structure. For example, Figure 1 shows all 16 triads in a static directed network, and Figure 2 shows all possible 2-node and 3-node, 3-edge, δ-temporal graphlets. These will be described in detail in Section 2.

While there has been a dramatic increase in research into networks in recent years, there is little work investigating graphlets within graphs that capture important human ties and interactions among these networks. Additionally, most studies of graphlets have focused on their presence in static graphs, and little is known about how they change over time. Such an understanding of network dynamicity can offer interpretable explanations of the possible role or purpose of the observed network based on its similarity to other networks. Such analyses can also provide guidance on addressing temporal changes in networks during network analysis, including comparison, classification, and prediction of temporal network behaviors. In this paper, we focus on investigating the functionality of graphlets in network embedding and graph neural networks for network classification.

Network Embedding. Various ways of learning feature representations of nodes in networks have recently been proposed to exploit their relations to vector representations [1], [18],

arXiv:1812.05473v3 [cs.SI] 5 Apr 2020


Fig. 1: All 16 triads in a static directed network [14].

[34], [44], [48]. However, most of these are applied to node and edge predictions and fail to fully capture network structures. It remains an open question as to whether network classification through node embedding methods can be improved, since the whole network structure clearly also plays a significant role.

Graph Neural Networks. Graph neural networks (e.g., graph convolutional networks [25], diffusion-convolutional neural networks (DCNNs) [5]) have been proposed as a new model to learn representations from graph-structured data, which can outperform probabilistic relational models and kernel methods at node classification tasks. Graph neural networks, in particular DCNNs, offer a complementary approach that provides a significant improvement in predictive performance at node classification tasks. However, their benefit for network classification tasks is not clear.

In this paper, we demonstrate the benefit of including graphlet features in network embedding and graph neural networks for network classification. To achieve this goal, we propose a graphlet-based network embedding methodology, gl2vec, and a graphlet-based DCNN, gl-DCNN, respectively, and apply these in extensive experiments to show significant performance improvements.

1.1 gl2vec

We propose a novel network embedding methodology, gl2vec, to address the aforementioned issues with current state-of-the-art techniques. To capture network structure, gl2vec constructs vectors for feature representations by comparing static or temporal graphlet statistics in a network to random graphs generated from different null models (known as the subgraph ratio profile, i.e., SRP; see Section 3). Null models are used to generate random graphs with specific structural features, to determine the significance of graphlet frequency within a specific graph (see Section 3). We show how the ratios of occurrences of graphlets in a network to their occurrences in random graphs (SRPs) can be used as a fixed-length feature representation to classify and compare networks of varying sizes and periods of time with improved accuracy.

We demonstrate the efficacy and usability of gl2vec on network classification tasks such as network type classification and subgraph identification in several real-world static

Fig. 2: All 2-node and 3-node, 3-edge, δ-temporal graphlets as defined in [42]. Edge labels correspond to the ordering of edges. All 36 graphlets are labeled with Mi,j across 6 rows and 6 columns. The first edge in each graphlet is from the green to the orange node. The second edge is the same along each row, and the third edge is the same along each column.

and temporal directed networks. We apply various well-known machine learning models along with our graph feature representation for network classification, and make a comparison with state-of-the-art methods, such as different graph kernels [48], node2vec [18], struc2vec [44], sub2vec [1], and graph2vec [34].

First, we study how static and temporal network graphlets can be used to classify the network type. A network type is defined as the network domain [42] (or superfamily [32], representing the type of relation or interaction between nodes) that a network belongs to, e.g., email networks, Google+ or Twitter in social networks, question answering networks, or even networks representing switching between mobile apps. Graphs or subgraphs from the same network type often have similar structures [32]. Correctly identifying the type of a network further allows us to study interactions between nodes, and predict unobserved network structures. Secondly, we consider the problem of identifying a particular (sub)network within the same network type from its static or temporal topological structure. For example, we predict the community ID for (sub)graphs within a network, such as identifying a department based on the temporal email-exchange pattern or detecting a mobile phone user given their app switching behaviors represented as static or temporal networks.

Given a network topological structure, identifying the network type or a network community ID in a network can be viewed as a (sub)graph classification problem. Many existing methods use different graph embedding techniques to represent graphs in a vector space and apply machine learning methods for classification. Yet, little work has applied network graphlets to real-world applications in directed (temporal) network classification. In this paper, we find that a strong relation exists between network type, graphlet distribution, and subgraph ratio profile (SRP), a measure to decide if a graphlet is a significant pattern in a network by comparing graphlet counts between an empirical network and random graphs from a null model.


We also numerically characterize the impact of different null models on the performance of network classification in static directed networks, and results show that all models achieve equally good performance.

From this, we find that gl2vec identifies additional network features that are not captured by state-of-the-art methods. These features are important to improve the accuracy of network classification in both static and temporal directed networks. We show that concatenating gl2vec to the aforementioned state-of-the-art methods for network type classification and subgraph identification tasks can bring notable performance increases.

1.2 gl-DCNN

DCNN requires node attributes to embed nodes and graphs. We propose gl-DCNN, which concatenates additional graphlet features to DCNN. We demonstrate the efficacy and usability of gl-DCNN on both node classification and graph classification in several real-world networks. We show that gl-DCNN can significantly improve the performance of DCNN, especially in graph classification.

The rest of the paper is organized as follows. First, we present preliminaries in Section 2. From this, we present gl2vec and gl-DCNN, and their evaluations, in Sections 3 and 4, respectively. We discuss related work in Section 5 and conclude our findings in Section 6.

2 DEFINITIONS

In this section, we provide definitions used in the rest of the paper. For temporal networks and temporal network graphlets, we consider definitions given in [42], although we could equally use the definitions in [26]. We present them here for completeness.

Definition 1. A temporal directed network [42] is a set of nodes and a collection of directed temporal edges with a timestamp on each edge. Formally, a temporal directed network T on a set of nodes V is a collection of tuples (u_i, v_i, t_i), i = 1, · · · , N, where N is the number of directed temporal edges, u_i, v_i ∈ V, and t_i ∈ R is a timestamp. We refer to (u_i, v_i, t_i) as a temporal edge.

In order to strictly order the tuples, we assume timestamps t_i are unique. This assumption can easily be extended to cases where timestamps are not unique, at the cost of complex notation.

Definition 2. A directed static network G(V, E) is defined as a set of nodes, denoted as V, and a set of directed edges without timestamps, denoted as E ⊂ V² \ {(u, u) : u ∈ V}.

In the following, we formalize the definitions of (static)graphlet and temporal graphlet.

Definition 3. Graphlets are small connected non-isomorphicinduced subgraphs of a larger network.

In particular, we focus on triads, shown in Figure 1. Note that the first three triads are not connected and hence do not satisfy the graphlet definition, but we argue that they are also important in constructing vectors for network feature representation.

It is worth mentioning that four- or even five-node graphlets can also be used for network embedding in our model. However, counting four- or five-node graphlets is much more computationally expensive, and some of our graph datasets (see Section 3.4) are too large to reasonably compute counts of graphlets with more than three nodes. Given this, and that triads have been widely studied in the literature, we focus on triads in this work.
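As a concrete illustration of triad counting, networkx ships a triadic census that classifies every node triple of a directed graph into the 16 triad types of Figure 1. This is only a sketch: the paper itself uses JMotif for triad counting, and the function name `triad_counts` is ours.

```python
# Illustrative triad counting with networkx (a stand-in for JMotif).
import networkx as nx

def triad_counts(edges):
    """Return the 16 triad counts of a directed edge list as a fixed-order vector."""
    G = nx.DiGraph(edges)
    census = nx.triadic_census(G)   # dict keyed by triad codes such as '003', '030T'
    order = sorted(census)          # fix an order so vectors are comparable across graphs
    return [census[code] for code in order]

# Example: a directed 3-cycle has exactly one node triple, of type '030C'.
vec = triad_counts([(0, 1), (1, 2), (2, 0)])
```

The census counts disconnected triads (e.g., type '003') as well, which matches our use of all 16 triads, including the unconnected ones, as features.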

Definition 4. Temporal network graphlets [42] are defined as induced subgraphs on sequences of temporal edges. Formally, a k-node, l-edge, δ-temporal graphlet is a sequence of l edges, M = (u_1, v_1, t_1), · · · , (u_l, v_l, t_l), that are time-ordered within a duration δ, i.e., t_1 < · · · < t_l and t_l − t_1 ≤ δ, such that the induced static graph from the edges is connected with k nodes.

We consider all 2-node and 3-node, 3-edge, δ-temporal graphlets, as shown in Figure 2. Note that [42] used the term network motif.
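Definition 4 can be checked mechanically. The sketch below, our own illustration rather than the SNAP counting code used later, tests whether a given edge sequence is a valid k-node, δ-temporal graphlet instance: strictly time-ordered, spanning at most δ, with a connected induced static graph on k nodes.

```python
# Check whether a sequence of temporal edges satisfies Definition 4.
def is_temporal_graphlet(edges, k, delta):
    """edges: list of (u, v, t) tuples in sequence order.
    Returns True iff the edges are strictly time-ordered, span at most delta,
    and their induced static graph is connected on exactly k nodes."""
    times = [t for _, _, t in edges]
    if any(t1 >= t2 for t1, t2 in zip(times, times[1:])):
        return False                      # must satisfy t_1 < ... < t_l
    if times[-1] - times[0] > delta:
        return False                      # must satisfy t_l - t_1 <= delta
    nodes = {u for u, _, _ in edges} | {v for _, v, _ in edges}
    if len(nodes) != k:
        return False
    # connectivity of the (undirected view of the) induced static graph
    adj = {n: set() for n in nodes}
    for u, v, _ in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(adj[n] - seen)
    return seen == nodes
```

For example, `is_temporal_graphlet([("a","b",1), ("b","a",2), ("a","b",3)], k=2, delta=10)` recognizes a valid 2-node, 3-edge instance.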

3 GL2VEC: NETWORK EMBEDDING WITH GRAPHLETS

In this section, we consider and characterize the significant importance of graphlets in designing network embedding techniques. Network embedding has received considerable attention due to its effect on the performance of network classification; see Section 5. However, previous work has primarily focused on examining this for undirected static networks. Applying these existing techniques to directed static networks may lose network structure information, while applying them to temporal networks may lose temporal information, and both may result in reduced accuracy. Therefore, we introduce a new static (temporal) network embedding technique based on static (temporal) network graphlets.

3.1 Subgraph Ratio Profile and Null Models

Graph embeddings need to be independent of network size and, if temporal, the time period the network covers. While previous work has shown that the counts and probability distribution of graphlets are strongly related to network types [42], graphlet counts may differ across networks. To achieve this independence, we use the subgraph ratio profile (SRP) for network embedding, which is computed using graphlet counts from both the network in question and random graphs produced using a null model.

Definition 5. A null model [37] is a generative model used to generate random graphs that match a specific graph in some of its structural features, such as the degrees of nodes or the number of nodes and edges.

3.1.1 Static Networks

For static networks, we consider three different null models:

(i) NE: random graphs with the same number of nodes and edges;

(ii) MAN: random graphs with the same numbers of (M)utual, (A)symmetric and (N)ull edges;

(iii) BDS: random graphs with the same in/out degree-pair sequence (also called bidegree sequence, BDS).
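Two of these null models map directly onto standard random-graph generators. The sketch below uses networkx as an illustrative stand-in (our own helper names, not the paper's tooling); MAN-preserving generation is more involved and is omitted here.

```python
# Illustrative generators for the NE and BDS null models.
import networkx as nx

def ne_null(edges):
    """NE: a random directed graph with the same numbers of nodes and edges."""
    G = nx.DiGraph(edges)
    return nx.gnm_random_graph(G.number_of_nodes(), G.number_of_edges(),
                               directed=True, seed=42)

def bds_null(edges):
    """BDS: a random (multi)graph with the same in/out degree-pair sequence."""
    G = nx.DiGraph(edges)
    din = [d for _, d in G.in_degree()]
    dout = [d for _, d in G.out_degree()]
    return nx.directed_configuration_model(din, dout, seed=42)
```

Note that the configuration model preserves the bidegree sequence exactly but may produce self-loops and parallel edges, which in practice are discarded or rewired.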


NE has been widely used in previous studies since it is easy to generate random graphs [28], and the probability of a node degree in a random graph can be approximated by a Poisson distribution in the large limit of graph size [38]. Thus network features and graphlet statistics can be easily modeled. Recent studies [4], [38] showed that node degrees in a wide range of real-world networks do not necessarily follow a Poisson distribution, and suggested a null model with a controlled node degree sequence for network study. We extend this to BDS for our study. A study using directed networks [23] discovered that reciprocity between nodes in different social networks tends to reach the maximal reciprocity constraint given the in/out degree sequence. Since reciprocities can be computed from the numbers of mutual, asymmetric, and null edges, the effect of graphlet statistics in MAN on network classification is worth investigating. We numerically characterize the impact of null models on the performance of network classification in Section 3.4.

3.1.2 Temporal Networks

For temporal networks, since there is no equivalent null model, we consider ensembles of randomized time-shuffled data as a temporal null model [31]. To be more specific, we randomly permute the timestamps on the edges while keeping the node pairs fixed. This model breaks the temporal dependencies between edges but preserves the network structure.
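The time-shuffled null model amounts to a single permutation step; a minimal sketch (the helper name is ours):

```python
# Temporal null model: permute timestamps across edges, keep node pairs fixed.
import random

def shuffle_timestamps(temporal_edges, seed=None):
    """temporal_edges: list of (u, v, t) tuples.
    Returns a randomized copy with the same node pairs and the same multiset of
    timestamps, but with the pairing between edges and times shuffled."""
    rng = random.Random(seed)
    times = [t for _, _, t in temporal_edges]
    rng.shuffle(times)
    return [(u, v, t) for (u, v, _), t in zip(temporal_edges, times)]
```

Because only the edge-to-timestamp assignment changes, the induced static graph, and hence all static graphlet counts, is identical to the original network's.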

In our study, we use a null model to compare graphlet counts in a network against random graphs. The difference between counts is then used to construct an SRP as a feature representation of the network.

Definition 6. Subgraph ratio profile (SRP) [32] for a graphlet i is defined as

SRP_i = Δ_i / √(Σ_i Δ_i²),    (1)

where

Δ_i = (N_i^ob − ⟨N_i^rand⟩) / (N_i^ob + ⟨N_i^rand⟩ + ε).

Here N_i^ob is the count of graphlet i observed in an empirical network, and ⟨N_i^rand⟩ is the average count in random networks in a null model. Last, ε (usually set to four) is an error term to make sure that Δ_i is not too large when a graphlet i rarely appears in both empirical and random graphs.

A large positive value of an SRP indicates that a graphlet occurs much more frequently in a network than would be expected by random chance. Since the SRP for a graphlet has been normalized, it can be used to compare networks of different sizes. The network embedding is a vector containing the 16 SRPs for static triads. For null models of temporal directed networks, we randomly reorder the temporal edges. The embedding contains the SRPs for the 36 temporal graphlets illustrated in Figure 2.
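Equation (1) translates directly into code. The sketch below (the function name `srp_vector` is ours) turns observed and null-model graphlet counts into the normalized SRP feature vector:

```python
# Compute the SRP feature vector of Equation (1) from graphlet counts.
import math

def srp_vector(observed, null_avg, eps=4.0):
    """observed[i]: count of graphlet i in the empirical network;
    null_avg[i]: average count of graphlet i over the null-model graphs."""
    deltas = [(o - r) / (o + r + eps) for o, r in zip(observed, null_avg)]
    norm = math.sqrt(sum(d * d for d in deltas)) or 1.0  # guard the all-zero case
    return [d / norm for d in deltas]
```

By construction the output has unit Euclidean norm (unless all deltas vanish), which is what makes SRP vectors comparable across networks of different sizes.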

3.2 Problem Formulation

We use two network classification tasks, network type classification and subgraph identification, as the basis for evaluating gl2vec.

Denote {G_i(V_i, E_i, L_i)}_{i=1}^N as (sub)graphs in different static or temporal networks, where V_i is a set of nodes and E_i is a set of edges in G_i. If G_i is a temporal network, each edge in E_i is a temporal edge with a timestamp as defined in Definition 1; otherwise, each edge in E_i is a directed edge. Suppose that graphs can be categorized into D classes, D < N. We associate each graph G_i with a label L_i ∈ {1, · · · , D}.

Let f : {G_i} → R^m be a mapping function (also called a graph embedding function) from G_i to a 1 × m feature representation vector defined using SRPs of static or temporal graphlets.

Let g : R^m → P ∈ R^D be a classifier that maps a feature representation to a categorical distribution P over D labels. We represent the probability distribution of G_i's label as P_i = [p_{i,1}, . . . , p_{i,D}] = g(f(G_i)).

Our goal is to solve this classification problem by designing an embedding function f and selecting a machine learning model g that minimizes the sum of cross entropy [11] over all graphs:

arg min_{g,f} −Σ_i Σ_{j=1}^{D} 1_{L_i=j} log(p_{i,j}) = arg min_{g,f} (−Σ_i log(p_{i,L_i})).
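Since the indicator selects exactly one term per graph, the objective reduces to summing the negative log-probability assigned to each graph's true label; a minimal numerical sketch (names are ours, labels 0-indexed for convenience, whereas the paper indexes them from 1):

```python
# The cross-entropy objective: minus the summed log-probability of true labels.
import math

def cross_entropy_loss(probs, labels):
    """probs[i][j]: predicted probability that graph i has label j (0-indexed);
    labels[i]: the true (0-indexed) label of graph i."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels))
```

For example, a classifier that is certain and correct on a graph contributes 0 to the loss, while a uniform guess over D labels contributes log D.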

We obtain g by training machine learning models. In the next section, we discuss how to design an embedding function f for static and temporal networks using graphlets.

Algorithm 1: gl2vec
Data: Static or temporal graph edge list E, null model M
Result: Graph feature vector f
1: N_ob = getGraphletCounts(E)
2: N_rand = getAvgGraphletCountsInNullModel(E, M)
3: for i = 1 : |N_ob| do
4:     f_i = getSRP(N_ob,i, N_rand,i)
5: return f

3.3 Algorithm

gl2vec works as follows: given the topological structure of a directed static or temporal network, we first compute its graphlet counts. For static networks, we apply JMotif [52] to compute triad counts for networks and random graphs in different null models. We refer interested readers to [52] for more details. For temporal networks, we use the SNAP package [42] to compute 3-edge, δ-temporal graphlet counts.

We then compute average graphlet counts under the null models. For static networks, there are two approaches: simulation based and probability based. The simulation-based approach generates a large set of random graphs with the same structure as the given network, and graphlet counts are computed for each random graph. The probability-based approach computes the probability of occurrence for each type of graphlet given the in/out degrees of the nodes involved. The time complexity of the simulation-based approach is O(MND²), where M is the number of random graphs,


Network type                      Datasets                     # of nodes         # of edges
Social networks                   Twitter                      81,306             1,768,149
                                  Google+                      106,674            13,673,453
Question answering networks       Askubuntu                    159,316            596,933
                                  Mathoverflow                 24,818             239,978
P2P file sharing networks         p2p-Gnutella                 6,301 to 62,586    20,777 to 147,892
Physics paper citation networks   Cit-HepPh                    34,546             421,578
                                  Cit-HepTh                    27,770             352,807
Friendship networks               Slashdot                     77,360             905,468
Wikipedia networks                WikiVote                     7,115              103,689
Bitcoin networks                  OTC trust weighted signed    5,881              35,592
Trust networks                    Epinion                      75,879             508,837
Advice-seeking networks           Advice                       30 to 60           200 to 500
Co-sponsorship networks           US Senate                    100 to 104         4,188 to 6,415

TABLE 1: Summary of static dataset characteristics.

N is the number of nodes, and D is the average degree of the graph in consideration. For the probability-based approach, the time complexity is O(D³), where D is the number of unique bi-degree pairs [52].

We apply the probability-based approach to NE due to its fast computation speed and high accuracy. We apply the simulation-based approach to the BDS null model because it has lower computational complexity there. It also avoids approximation errors that occur when the network size is small (fewer than 1000 nodes). For temporal networks, we generate random graphs by shuffling timestamps on edges and then compute their average temporal graphlet counts. Finally, we compute SRPs for the corresponding graphlets using Equation (1). The pseudocode is presented in Algorithm 1.
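The simulation-based step can be sketched end to end: draw M random graphs from the NE null model and average their triad counts. This is an illustration only, with networkx standing in for JMotif and our own helper name:

```python
# Simulation-based average triad counts under the NE null model.
import networkx as nx

def avg_null_triad_counts(edges, num_graphs=100, seed=0):
    """Average the 16 triad counts over num_graphs NE-null-model samples
    (random directed graphs with the same numbers of nodes and edges)."""
    G = nx.DiGraph(edges)
    codes = sorted(nx.triadic_census(G))      # the 16 triad type codes
    totals = {c: 0 for c in codes}
    for i in range(num_graphs):
        R = nx.gnm_random_graph(G.number_of_nodes(), G.number_of_edges(),
                                directed=True, seed=seed + i)
        for c, n in nx.triadic_census(R).items():
            totals[c] += n
    return {c: totals[c] / num_graphs for c in codes}
```

These averages play the role of ⟨N_i^rand⟩ in Equation (1); the cost grows linearly in the number of sampled graphs, matching the M factor in the O(MND²) complexity above.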

3.4 Experiments

In this section, we conduct network classification tasks on several real-world static and temporal directed networks. Experiments include two tasks: network type classification and subgraph identification. In network type classification, we use gl2vec to predict the most likely purpose of the relation and interaction between nodes, e.g., email communication, question answering, or friendship in social networks. In subgraph identification, we predict the community ID for (sub)graphs within the same network. Examples include identifying a department based on email-exchange patterns or detecting a mobile phone user based on their app switching behavior represented as static or temporal networks.

Highlights of our experimental findings include:

▷ gl2vec, constructing vectors for feature representations using static or temporal graphlet SRPs, can outperform or achieve comparable performance with state-of-the-art methods in network type classification.

▷ For static directed networks, the impact of null models on the computation complexity of SRP is significant. Compared to the other two null models, BDS and MAN, NE has the least control on the network structure, i.e., it provides the most randomness in the structure of a random graph. Hence, NE is much easier to implement. Furthermore, we observe that NE achieves a slightly better performance in network classification in most cases.

▷ Adding graphlet features from gl2vec to state-of-the-art methods significantly improves their performance. This suggests that graphlet patterns from SRPs provide substantial additional information about network type that does not exist in state-of-the-art methods.

▷ Both static and temporal graphlets play important roles in temporal network classification.

3.4.1 Datasets

We use a wide range of real-world network datasets, which only contain topological structure. Attributes of nodes and edges are unknown, except for labels for classification and, in temporal networks, timestamps of edges. These datasets may challenge some current state-of-the-art methods that require attributes of nodes or edges.

Static Directed Network Datasets: We use different types of static directed networks and perform network classification using their topological structures in our experiments.

• SNAP Datasets [30]: For social networks, the Twitter dataset contains 1000 ego-networks with 81,306 nodes and 1,768,149 edges. The Google+ dataset contains 133 ego-networks with 106,674 nodes and 13,673,453 edges. A directed edge from u to v represents that user u follows v. The sizes of ego-networks range from 10 to 4,964 nodes.

The Askubuntu and Mathoverflow datasets are question-answering networks that store interactions between users. The interactions include posting answers to questions (a2q), comments to questions (c2q), and comments to answers (c2a). Both datasets contain four directed networks: an a2q network, a c2q network, a c2a network, and a network containing all interactions.

The p2p-Gnutella dataset contains 9 directed peer-to-peer file-sharing networks. Nodes represent hosts and edges represent topological connections between hosts.

Cit-HepPh and Cit-HepTh are two physics paper citation networks.

Slashdot is a friendship network, where users tag each other as friends. The WikiVote dataset contains votes from Wikipedia users to promote other users to administrator status. The Bitcoin OTC trust weighted signed network contains ratings from Bitcoin users of other users.

• Other Network Types: The Epinion social network [45] is a who-trusts-whom network from the consumer review site Epinions.com. A directed edge represents that a user “trusts” another user. The Advice dataset contains advice-seeking relations between employees in four different companies [9], [27], [29]. Co-sponsorship networks [15] contain US Senate co-sponsorship patterns during the 1995, 2000, 2005, and 2010 congressional terms. Nodes represent senators, and a directed edge from u to v represents that senator u cosponsored at least one piece of legislation for which senator v was the primary sponsor.

A summary of the characteristics of these datasets is given in Table 1.

Temporal Directed Network Datasets: We also collect temporal directed networks to test feature representation using temporal graphlets.

• Email Networks: EmailEU [59] is a directed temporal network constructed from email exchanges in a large European research institution over an 803-day period. It contains 986 email addresses as nodes and 332,334 emails as edges with timestamps. There are 42 ground-truth departments in the dataset, and we choose the 26 departments whose email networks have size larger than 10. EmailTraffic [40] is a temporal directed network storing the email interactions of 819 staff in 23 different departments at BBN over about 7 months. Edges with integer timestamps represent emails sent at a certain time.

We construct temporal subgraphs, each lasting 12 weeks, for the departments in the EmailEU network. This ensures that each subgraph becomes a connected network component when converted to an unweighted static graph. We create these graphs at the beginning of every four weeks to avoid too much overlap of edges between graphs; each department has up to 28 subgraphs as a result. For departments in EmailTraffic, we create subgraphs at the beginning of every week, with each subgraph covering four weeks.

• SwitchApp (from the Tymer project [54]–[56]) contains application-switching data for 53 Android users over a 42-day period. We construct a directed temporal network for each user on each day, where a directed edge (denoted as euv) with an integer timestamp t represents the user switching from app u to app v at time t.
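The sliding-window construction above (e.g., 12-week windows whose starts are 4 weeks apart for EmailEU) can be sketched as a small helper over a timestamped edge list; the function name and argument names are ours, and timestamps are assumed to be already rescaled to a common unit.

```python
def window_subgraphs(edges, window, stride, horizon):
    """Split a timestamped edge list into overlapping temporal subgraphs.

    edges: iterable of (u, v, t) tuples with integer timestamps.
    window: window length; stride: gap between window starts;
    horizon: total observation period. Mirrors the windowing scheme
    described in the text, e.g. window=12, stride=4 (in weeks).
    """
    subgraphs = []
    start = 0
    while start + window <= horizon:
        end = start + window
        # Keep only edges whose timestamp falls inside this window.
        sub = [(u, v, t) for (u, v, t) in edges if start <= t < end]
        subgraphs.append(sub)
        start += stride
    return subgraphs
```

Each returned edge list is one temporal subgraph; dropping the third component of every tuple gives the unweighted static version used for the static baselines.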

3.4.2 Experiment Setup

We compute SRPs for static and temporal graphlets for the corresponding static and temporal networks in our datasets. We use three widely used machine learning models that provide good performance with small amounts of training data in multi-class classification: XGBoost [7], SVM [8], and random forest [49]. XGBoost usually outperforms other classifiers when the dataset is of moderate size. SVM is suitable for small amounts of training data. Random forest not only works well for imbalanced data, but also performs feature selection during training, which can help us investigate the usefulness of our feature representation, especially when used in conjunction with other approaches by concatenating the feature vectors.

We use grid search to find the best hyper-parameters for these models. For XGBoost, the learning rate ranges from 0.001 to 1, the maximal tree depth ranges from 4 to 32, the minimal child weight is 1, and the subsample ratio of training instances ranges from 0.4 to 1. The regularization weight in SVM ranges from 1 to 8. In random forest, the number of trees ranges from 50 to 400 and the minimal number of samples required to split a tree node from 2 to 10. 10-fold cross-validation is adopted to split the data to select the best parameters. All experiments are conducted on a cluster with 32 Xeon CPUs, 256GB of RAM, and one Tesla K40 GPU.
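The grid-search bookkeeping described above can be sketched with the standard library alone. The concrete grid values below only mirror the stated ranges (the exact grids used in the experiments are not specified in the text), and the fold split is a plain contiguous one; both are assumptions for illustration.

```python
from itertools import product

# Hypothetical XGBoost grid mirroring the ranges quoted in the text.
xgb_grid = {
    "learning_rate": [0.001, 0.01, 0.1, 1.0],
    "max_depth": [4, 8, 16, 32],
    "min_child_weight": [1],
    "subsample": [0.4, 0.6, 0.8, 1.0],
}

def grid_points(grid):
    """Enumerate every hyper-parameter combination in the grid."""
    keys = sorted(grid)
    return [dict(zip(keys, vals)) for vals in product(*(grid[k] for k in keys))]

def kfold_indices(n, k=10):
    """Contiguous k-fold split of n sample indices (no shuffling)."""
    idx = list(range(n))
    size, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        stop = start + size + (1 if i < extra else 0)
        folds.append((idx[:start] + idx[stop:], idx[start:stop]))
        start = stop
    return folds
```

In the actual pipeline, each grid point would be scored by averaging the classifier's validation accuracy over the 10 folds, and the best-scoring point kept.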

We compare the network classification accuracy of gl2vec to state-of-the-art methods, including graphlet and Weisfeiler-Lehman kernels [48], and the recently developed node and graph embedding methods node2vec [18], struc2vec [44], sub2vec [1], and graph2vec [34].

For node embedding methods such as node2vec and struc2vec, we apply a sum-based approach [10] to aggregate node embedding vectors into a graph embedding. We refer interested readers to [20] for more detail. The length of the network embedding (ranging from 50 to 500) is determined using grid search and 10-fold cross-validation. We modify state-of-the-art methods to apply them to directed graphs: we run a random walk on directed graphs in sub2vec instead of undirected graphs. Some state-of-the-art methods also require node attributes for network embeddings, and node degrees are suggested for computing undirected graph embeddings [20]. For directed networks, we therefore use NetworkX to compute the in/out degree and centralities such as betweenness, closeness, and in/out-degree centrality for each node. We also consider additional attributes: the counts of subgraphs of a specific triad that a node belongs to. These counts are normalized as a distribution indicating the likelihood that a node belongs to a specific triad.
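The sum-based aggregation step amounts to coordinate-wise addition of the node vectors; a minimal sketch (function and argument names ours, with a "mean" variant included since mean-field-style pooling is one of the aggregators cited above):

```python
def aggregate_node_embeddings(node_vectors, mode="sum"):
    """Aggregate per-node embedding vectors into one graph embedding.

    node_vectors: list of equal-length vectors, one per node.
    mode: "sum" adds them coordinate-wise; "mean" averages, which
    reduces sensitivity to graph size.
    """
    dim = len(node_vectors[0])
    graph_vec = [0.0] * dim
    for vec in node_vectors:
        for i, x in enumerate(vec):
            graph_vec[i] += x
    if mode == "mean":
        graph_vec = [x / len(node_vectors) for x in graph_vec]
    return graph_vec
```

The resulting fixed-length vector is what the classifiers in Section 3.4.2 consume, regardless of the number of nodes in the original (sub)graph.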

3.4.3 Network Type Classification Task

In network type classification, we are given the topological structure of a subgraph in a network. Our goal is to predict the type of interaction that an edge represents, e.g., email exchange or question answering.

Among all the datasets introduced in Section 3.4.1, the EmailEU, EmailTraffic, and SwitchApp datasets have ground-truth labels (department ID or user ID) available for each subgraph, which is created from the email exchanges in a department or the app-switching behavior of a user within a period of time. Hence, we can obtain all subgraphs for these communities in these three networks. For the other datasets, there is no ground-truth information on network communities; we compute network communities using modularity [36] to obtain subgraphs. These subgraphs are converted into feature vectors using the previously introduced embedding methods and assigned labels according to network types. Finally, we collect about 10,000 (sub)graphs from 2,355 real-world networks taken from the 15 network types introduced above, which include Google+ and Twitter social networks, high energy physics theory citation networks, Gnutella P2P networks, SwitchApp, and so on.

Static Directed Networks: We use all datasets to evaluate embedding methods on static networks. Note that we convert temporal networks into unweighted static networks by removing the timestamps on the edges. Baseline methods include the graphlet graph kernel (GK Graphlet), the Weisfeiler-Lehman graph kernel (GK WL), a feature vector with triad distribution (MotifDist), node2vec, graph2vec, sub2vec, and struc2vec.

We first investigate the impact of null models on our proposed graph representation approach gl2vec. This includes the computation of SRPs using the following three different null models:

(i) gl2vec(BDS): random graphs with the same bidegreesequence;


             XGBoost (%)    SVM (%)        RF (%)
gl2vec(BDS)  80.92 ± 3.20   72.69 ± 3.38   80.17 ± 4.07
gl2vec(MAN)  81.49 ± 3.33   71.39 ± 3.98   79.78 ± 3.46
gl2vec(NE)   81.58 ± 3.07   71.64 ± 2.13   79.42 ± 3.69

TABLE 2: The impact of null models on network type classification accuracy of gl2vec.

                 XGBoost (%)    SVM (%)        RF (%)
GK Graphlet      78.94 ± 3.18   72.66 ± 2.79   78.72 ± 3.01
+gl2vec          82.18 ± 2.86   69.01 ± 2.27   81.39 ± 3.36
GK WL            78.26 ± 2.65   72.81 ± 2.74   78.41 ± 3.02
+gl2vec          82.54 ± 2.85   68.59 ± 2.75   82.26 ± 3.43
MotifDist        78.08 ± 3.34   71.40 ± 2.29   78.01 ± 3.56
+gl2vec          81.75 ± 3.48   69.70 ± 3.64   80.95 ± 3.63
node2vec         74.25 ± 3.07   69.03 ± 1.23   72.24 ± 1.67
+gl2vec          88.76 ± 1.26   73.24 ± 2.92   86.14 ± 1.71
graph2vec        72.48 ± 3.99   70.81 ± 3.84   72.61 ± 3.36
+gl2vec          79.83 ± 4.59   66.70 ± 4.04   80.03 ± 4.38
sub2vec          81.39 ± 1.70   79.69 ± 1.41   78.44 ± 2.26
+gl2vec          92.30 ± 2.29   83.16 ± 2.62   90.01 ± 2.16
struc2vec        79.15 ± 3.42   78.22 ± 3.15   78.94 ± 3.31
+nodeTriadDistr  81.93 ± 3.53   79.18 ± 3.55   82.01 ± 3.42
+gl2vec          93.38 ± 1.51   84.25 ± 0.82   93.48 ± 1.42
gl2vec           81.58 ± 3.07   71.64 ± 2.13   79.42 ± 3.69

TABLE 3: Network type classification accuracy. We use “+” to denote an embedding generated by combining two embedding methods. Bold indicates the best performing machine learning model for each embedding.

(ii) gl2vec(MAN): random graphs with the same number of mutual, asymmetric, and null edges;

(iii) gl2vec(NE): random graphs with the same number of nodes and edges.
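As a minimal illustration of why NE is the easiest of the three null models to implement, the following sketch samples an NE random graph using only the standard library. The function name and edge-set representation are ours; BDS and MAN, which additionally preserve the bidegree sequence or the mutual/asymmetric/null dyad counts (typically via constrained edge swaps), are deliberately not shown.

```python
import random

def ne_null_model(n, m, seed=None):
    """Sample a directed random graph with the same number of nodes
    and edges as the observed network (the NE null model).

    Draws m distinct directed edges (no self-loops) uniformly from
    the n*(n-1) possible ordered node pairs.
    """
    if m > n * (n - 1):
        raise ValueError("more edges requested than ordered pairs exist")
    rng = random.Random(seed)
    edges = set()
    while len(edges) < m:
        u = rng.randrange(n)
        v = rng.randrange(n)
        if u != v:          # reject self-loops; duplicates are absorbed by the set
            edges.add((u, v))
    return edges
```

Graphlet counts averaged over many such samples give the null-model baseline against which the empirical counts are compared when forming the SRP.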

The accuracies of gl2vec with different null models for network type classification are shown in Table 2. We observe that all null models used to compute SRPs have a similar impact on classification, with NE performing slightly better than the others. NE is thus preferable among the three because of its lower computational complexity. Therefore, in the rest of the paper, we only present results for gl2vec using NE and, for simplicity, denote gl2vec(NE) simply as gl2vec.

The accuracies of different embedding methods for network type classification are presented in Table 3. We make the following observations:

• The machine learning methods used have an impact on the results. For this task, XGBoost provides the best performance on average in network type classification.

• gl2vec can outperform or achieve comparable performance to state-of-the-art methods with appropriate machine learning methods.

• Combining gl2vec with different state-of-the-art methods by directly concatenating their feature vectors significantly improves on the state-of-the-art methods, especially for the graph-based network embedding methods sub2vec and struc2vec. This suggests that our approach and state-of-the-art methods capture important but different features for network type classification, giving the best results through the combination of these features. Furthermore, this validates the importance of including subgraph information in feature representations for tasks like network classification, in which network structure plays a significant role. Moreover, there are also improvements on MotifDist and GK Graphlet. This confirms that using null models to construct feature representations helps improve prediction performance. Finally, as the representations from gl2vec, MotifDist, and GK Graphlet all construct features from graphlets, the improvement that gl2vec brings in these cases is less notable.

Fig. 3: Classifying email datasets and SwitchApp temporal networks.

Temporal Directed Networks: We consider the temporal datasets discussed in Section 3.4.1. We explore whether temporal graphlets provide more information than static graphlets in temporal networks. We investigate the effect of graphlets on predicting whether a temporal (sub)graph is an email exchange network or the app-switching behavior of a mobile user. Since the state-of-the-art methods work only on static networks, we choose gl2vec(NE) as the baseline for comparison due to its good performance in static network type classification. The results are shown in Figure 3, from which we observe that temporal information improves network type classification in all models considered here. Therefore, it is important to use temporal graphlets when constructing vectors for feature representations of temporal networks, since temporal graphlets provide more network structure information than static graphlets.

3.4.4 Subgraph Identification Task

In subgraph identification, we are interested in classifying subgraphs within the same network given their topological structure. For example, we can identify which department an email-exchange subgraph belongs to or detect a mobile phone user given their app-switching behavior.

We use the EmailEU, EmailTraffic, and SwitchApp datasets, since ground-truth labels (department ID or user ID) are available for each subgraph. We first solve this problem using static graph embedding methods. Then we investigate whether the timestamp information of edges can help improve identification accuracy.

Static Directed Networks: The accuracies in identifying departments in the EmailEU and EmailTraffic networks and user IDs in the SwitchApp network using different methods are reported in Tables 4, 5, and 6, respectively. Again, we only present results for gl2vec using NE, as mentioned earlier. We cannot obtain results from graph2vec due to insufficient GPU memory on our test system. We also evaluate gl2vec when combined with state-of-the-art methods. We use “+” to denote these combinations. For


                 XGBoost (%)    SVM (%)         RF (%)
MotifDistr       56.68 ± 6.70   45.82 ± 7.38    61.54 ± 10.50
+gl2vec          64.18 ± 6.52   52.20 ± 4.80    63.79 ± 8.94
GK WL            50.96 ± 8.91   47.92 ± 6.15    57.01 ± 6.91
+gl2vec          63.12 ± 5.44   51.95 ± 4.44    65.29 ± 8.81
GK Graphlet      61.22 ± 4.70   52.32 ± 5.16    62.90 ± 4.49
+gl2vec          62.04 ± 5.69   52.22 ± 4.85    64.35 ± 8.86
node2vec         52.08 ± 3.00   57.76 ± 3.11    57.89 ± 2.83
+gl2vec          63.20 ± 3.69   59.20 ± 5.89    63.22 ± 3.40
sub2vec          55.45 ± 3.42   52.02 ± 3.29    59.87 ± 3.77
+gl2vec          73.01 ± 8.93   58.88 ± 9.42    77.69 ± 6.90
struc2vec        60.25 ± 9.40   56.80 ± 11.34   60.59 ± 11.14
+nodeTriadDistr  60.78 ± 9.13   59.86 ± 9.22    61.24 ± 9.88
+gl2vec          69.78 ± 6.20   54.91 ± 9.14    70.30 ± 7.36
gl2vec           61.72 ± 3.09   51.03 ± 3.30    63.09 ± 3.23

TABLE 4: Accuracy in correctly identifying the 26 EmailEU departments in static directed networks.

                 XGBoost (%)     SVM (%)         RF (%)
MotifDistr       67.81 ± 7.60    62.60 ± 8.27    70.04 ± 7.48
+gl2vec          78.81 ± 10.87   71.93 ± 6.78    80.19 ± 7.73
GK WL            72.18 ± 5.86    70.73 ± 6.81    75.99 ± 5.82
+gl2vec          77.96 ± 9.03    71.73 ± 7.45    80.58 ± 7.24
GK Graphlet      74.39 ± 10.71   70.77 ± 12.35   78.61 ± 8.91
+gl2vec          77.17 ± 11.73   71.52 ± 6.71    80.18 ± 7.23
node2vec         74.02 ± 8.13    70.45 ± 12.25   77.41 ± 6.93
+gl2vec          85.21 ± 6.64    75.36 ± 6.88    87.81 ± 4.93
sub2vec          77.79 ± 3.93    77.80 ± 3.63    77.01 ± 3.83
+gl2vec          83.39 ± 5.43    86.74 ± 5.25    87.00 ± 4.58
struc2vec        73.78 ± 9.40    65.33 ± 9.34    72.16 ± 9.14
+nodeTriadDistr  74.35 ± 10.72   66.84 ± 10.08   77.17 ± 10.44
+gl2vec          79.85 ± 14.01   56.23 ± 14.88   81.43 ± 12.38
gl2vec           76.80 ± 6.24    71.13 ± 6.49    80.78 ± 5.65

TABLE 5: Accuracy in correctly identifying EmailTraffic departments in static directed networks.

example, MotifDistr+gl2vec combines the feature vectors from MotifDistr and gl2vec for feature representation.

We observe that the addition of graphlet SRP features to state-of-the-art methods can significantly improve the performance of the corresponding methods. This indicates that our gl2vec method provides new information not present in state-of-the-art methods.

Algorithm Performance with Graphlet Features: Tables 4, 5, and 6 indicate that random forest (RF) is usually more accurate for graph embeddings that include our SRP feature vectors in the baselines. This is because RF automatically performs feature selection during training and adapts to

                 XGBoost (%)    SVM (%)        RF (%)
MotifDistr       11.82 ± 2.03   11.62 ± 2.02   12.33 ± 2.28
+gl2vec          16.16 ± 1.85   12.95 ± 2.31   15.34 ± 1.45
GK WL            11.50 ± 1.65   14.59 ± 0.97   13.43 ± 2.26
+gl2vec          16.01 ± 2.43   13.15 ± 1.44   17.31 ± 1.81
GK Graphlet      13.89 ± 1.26   15.24 ± 1.67   15.29 ± 2.28
+gl2vec          16.52 ± 1.77   13.98 ± 2.51   15.95 ± 1.61
node2vec         10.15 ± 1.50   7.91 ± 1.32    9.98 ± 1.66
+gl2vec          16.33 ± 2.04   12.94 ± 2.71   16.21 ± 1.97
sub2vec          16.27 ± 2.20   16.19 ± 4.37   16.54 ± 1.64
+gl2vec          31.74 ± 3.58   23.43 ± 2.33   33.94 ± 4.58
struc2vec        14.18 ± 2.21   9.75 ± 2.49    12.17 ± 2.64
+nodeTriadDistr  15.23 ± 2.57   7.81 ± 2.02    13.16 ± 2.09
+gl2vec          19.53 ± 3.13   9.30 ± 1.60    20.70 ± 3.07
gl2vec           16.17 ± 1.80   13.56 ± 1.60   16.82 ± 1.31

TABLE 6: Accuracy in correctly identifying the 53 SwitchApp users in static directed networks.

the change in the number of features. As a result, it is easier for RF to achieve better results given a similar amount of effort in fine-tuning the hyper-parameters. Finally, the improvements on all machine learning models confirm that it is worth combining our gl2vec graph embedding with other methods to achieve better performance.

Temporal Directed Networks: In the temporal networks from EmailEU and EmailTraffic, we attempt to identify which department emails belong to. For the SwitchApp dataset, we attempt to identify a particular user based on their daily app-switching behavior represented as a temporal network.

For the EmailEU and EmailTraffic datasets, multiple temporal and static networks are constructed for each department from email exchanges as described in Section 3.4.1. For the SwitchApp dataset, 42 temporal and static networks are generated for each person from their daily app-switching behavior. XGBoost, SVM, and random forest are implemented using different network feature representations: subgraph ratio profiles (SRPs) with temporal (“Temporal”) and with static (“Static”) graphlets, and combined SRPs with both temporal and static graphlets (“Temp+Static”). We illustrate the result from the sub2vec representation (“Sub2Vec”) because it performs best among the baseline methods. Finally, we create a combination of all three representations (“CombineAll”).

Fig. 4: Department identification in the EmailEU dataset. Dashed line represents the accuracy of a random selection model.

Fig. 5: BBN department identification in EmailTraffic. Dashed line represents the accuracy of a random selection model.

The results for EmailEU, EmailTraffic, and SwitchApp are shown in Figures 4, 5, and 6, respectively. The dashed line


Fig. 6: User identification in SwitchApp. Dashed line represents the accuracy of a random selection model.

is the accuracy of a random selection model. We make the following observations:

• The accuracy achieved by the temporal graphlet embedding is slightly better than that of the static graphlet embedding on both the EmailEU and SwitchApp datasets. However, the static graphlet embedding performs better than temporal graphlets on the EmailTraffic dataset. This shows that static graphlets are still useful for temporal network classification and, in some datasets, can capture useful features even better than temporal graphlets. Hence, we combine both static and temporal graphlet features (“Temp+Static”) and observe that this achieves a significant improvement in accuracy over the baseline, which suggests that both temporal and static graphlets are useful for network identification (of departments or personal app-switching behavior).

• Our graphlet-based network embeddings are competitive with the state-of-the-art method, sub2vec.

• Combining all three graph embedding vectors (static and temporal graphlets, and sub2vec) for classification yields the best accuracy. This suggests that both our static and temporal embedding approaches capture useful features that boost the performance of state-of-the-art methods.

• It should be noted that the accuracy for SwitchApp is much lower than for the other two datasets. This is primarily because all users have structurally very similar app-switch networks [56], which, combined with the large number of classification choices, makes distinguishing between users a challenging task.

4 GL-DCNN: DIFFUSION-CONVOLUTIONAL NEURAL NETWORK WITH GRAPHLETS

Diffusion-convolutional neural networks (DCNNs) [5] have been proposed as a model for graph-structured data; they significantly outperform state-of-the-art methods on node classification tasks and offer comparable performance to baseline methods on graph classification tasks. In this section, we argue that subgraph features can also significantly improve the performance of DCNNs in network classification tasks, especially graph classification.

4.1 Node Features from Graphlets

DCNNs require node attributes to embed nodes and graphs. We generate additional node attributes that contain graphlet

Cora
Model   Accuracy (%)   micro F1 (%)   macro F1 (%)
DCNN    86.77          86.77          85.84
+gl     87.21          87.21          86.07

Pubmed
Model   Accuracy (%)   micro F1 (%)   macro F1 (%)
DCNN    89.73          89.73          89.42
+gl     90.66          90.66          89.82

TABLE 7: Node classification for the Cora paper citation network and the Pubmed network. We use “+” to denote an embedding generated by combining features generated from graphlets. When the subgraph features are added (+gl), the performance improves.

information. Let v be a node in a graph of size n; then v belongs to (n−1)(n−2)/2 unique triads (3-node subgraphs). We compute the distribution of all 16 triad types (see Figure 1) over those subgraphs as attributes for node v.
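The per-node triad attributes can be sketched as follows. This is a simplified sketch: it buckets each of the (n−1)(n−2)/2 triads through a node by its counts of mutual, asymmetric, and null dyads, whereas the full census of all 16 directed triad types additionally distinguishes edge orientations. The function and key names are ours.

```python
from itertools import combinations

def triad_profile(node, nodes, edges):
    """Distribution of triads through `node`, keyed by dyad composition.

    edges: set of directed (u, v) pairs. Each triad is keyed by the
    sorted multiset of its three dyad states: 'm' (mutual),
    'a' (asymmetric), or 'n' (null). The full 16-type census would
    split some of these buckets further by orientation.
    """
    def dyad(u, v):
        uv, vu = (u, v) in edges, (v, u) in edges
        if uv and vu:
            return "m"
        if uv or vu:
            return "a"
        return "n"

    others = [w for w in nodes if w != node]
    profile = {}
    # Every unordered pair of other nodes forms one triad with `node`.
    for x, y in combinations(others, 2):
        key = tuple(sorted(dyad(a, b) for a, b in
                           [(node, x), (node, y), (x, y)]))
        profile[key] = profile.get(key, 0) + 1
    total = sum(profile.values())
    return {k: c / total for k, c in profile.items()}
```

Normalizing by the triad count, as here, yields the distribution-style attribute vector described above rather than raw counts.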

4.2 Node Classification

We report classification accuracy as well as micro- and macro-averaged F1, where the F1 score is the harmonic mean of precision and recall, computed as

F1 = (2 × precision × recall) / (precision + recall).

Micro F1 computes the F1 score globally by counting the total true positives, false negatives, and false positives, while macro F1 computes the F1 score for each label and takes their unweighted mean; the latter does not take label imbalance into account. We report each measure as a mean and confidence interval computed over several trials.
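The micro/macro distinction above can be made concrete with a short stdlib sketch (the function name is ours):

```python
from collections import Counter

def f1_scores(y_true, y_pred):
    """Compute micro- and macro-averaged F1 for multi-class labels."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # predicted label p, but it was wrong
            fn[t] += 1   # true label t was missed
    def f1(t, p_, n_):
        prec = t / (t + p_) if t + p_ else 0.0
        rec = t / (t + n_) if t + n_ else 0.0
        return 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    # Micro: pool the counts over all labels, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: per-label F1, then an unweighted mean (ignores imbalance).
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    return micro, macro
```

Note that when every sample has exactly one true and one predicted label, pooled micro F1 coincides with accuracy, which is consistent with the identical Accuracy and micro F1 columns in Table 7.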

4.2.1 Node Classification Data

The Cora paper citation network [47] contains 2,708 nodes representing machine learning papers and 5,429 edges representing citations. Each paper has a label from one of seven machine learning subjects. The Pubmed network [47] contains 19,717 nodes representing scientific papers on diabetes and 44,338 links. Each paper is assigned a label from three classes, where the label is the type of diabetes addressed in the publication.

DCNN [5] treats both the Cora and Pubmed networks as undirected graphs; however, we treat them as directed graphs in our embedding method, with each undirected edge considered as two reciprocal directed edges. To characterize the importance of subgraph (graphlet) features, we concatenate the feature vectors generated for nodes from graphlets with the corresponding DCNN feature vectors for classification.
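Both preprocessing steps just described are mechanical; a minimal sketch (function names ours):

```python
def to_reciprocal_digraph(undirected_edges):
    """Replace each undirected edge {u, v} with the two directed
    edges (u, v) and (v, u), as done before computing directed
    graphlet features on Cora and Pubmed."""
    directed = set()
    for u, v in undirected_edges:
        directed.add((u, v))
        directed.add((v, u))
    return directed

def concat_features(dcnn_vec, graphlet_vec):
    """Concatenate a node's DCNN features with its graphlet features."""
    return list(dcnn_vec) + list(graphlet_vec)
```

The concatenated vectors simply extend each node's attribute row; the DCNN architecture itself is unchanged.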

4.2.2 Result Discussion

Table 7 compares the performance of a pure DCNN and a DCNN concatenated with subgraph features generated from graphlets for node classification. It is clear that subgraph features improve classification accuracy as well as micro- and macro-averaged F1 scores. This indicates that our subgraph-based node embedding represents more information than is captured by a DCNN


Dataset   Model   Accuracy (%)
NCI1      DCNN    62.61 ± 3.23
          +gl     65.83 ± 4.41
NCI109    DCNN    61.72 ± 3.10
          +gl     62.13 ± 3.43
MUTAG     DCNN    67.44 ± 2.84
          +gl     69.49 ± 2.75
PTC       DCNN    55.46 ± 3.43
          +gl     55.83 ± 3.11
ENZYMES   DCNN    18.16 ± 2.49
          +gl     18.01 ± 1.89

TABLE 8: Graph classification comparison between a pure DCNN and a DCNN concatenated with subgraph features from graphlets. We use “+” to denote an embedding generated by combining features generated from graphlets.

directly. However, we note that the improvement is not that significant; one reason is that the counts for most subgraphs are close to zero in the datasets used here, and the subgraph distribution is similar across the citation networks.

4.3 Graph Classification

We also ran experiments to investigate how subgraph features improve DCNN performance when labels are learned for the whole network.

4.3.1 Graph Classification Data

Similar to Atwood and Towsley [5], we consider a standard set of graph classification datasets: NCI1, NCI109, MUTAG, PTC, and ENZYMES. There are 4,100 and 4,127 graphs in NCI1 and NCI109 [57], respectively, representing chemical compounds. The label of each graph indicates whether the compound can suppress or inhibit the growth of a panel of human tumor cell lines, and the label of each node is chosen from 37 possible labels for NCI1 and 38 for NCI109. MUTAG [12] contains 188 nitro compounds labeled as either aromatic or heteroaromatic, with seven node features. The 344 compounds in PTC [51] are labeled according to whether they are carcinogenic to rats, with 19 node features. Three node features are associated with the 600 proteins in ENZYMES [6].

4.3.2 Result Discussion

Table 8 compares the performance of a pure DCNN [5] and a DCNN concatenated with subgraph features generated from graphlets. In contrast with the node classification experiments, we observe that subgraph features can significantly improve graph classification performance, especially on the NCI1, NCI109, and MUTAG datasets. These results further suggest that graphlet features play a critical role in graph classification and are helpful for improving classification accuracy.

5 RELATED WORK

The primary focus of related work on classifying networks is examining the topological structure of the graph. The work most closely related to our method is graph kernels, which have been used to calculate similarities between static undirected graphs [17], [24], [58]. However, the corresponding computational complexity grows significantly with network size. Moreover, studies of graphlet kernels do not consider features generated by comparing graphlet counts between an empirical network and random graphs from different null models. In our experiments we have established that different null models can yield significant improvements in network classification.

Different node embedding techniques have been proposed in recent years, such as node2vec [19], DeepWalk [43], LINE [50], and Locally Linear Embedding [46], which use feature vectors to embed nodes into high-dimensional space and empirically perform well. However, these methods can only be applied to node classification, not graph classification. Neural networks on graphs, such as graph convolutional networks (GCNs) [13], [25], also obtain an embedding for each node. It has recently been shown that GCN-based approaches obtain competitive results against kernel-based methods and graph-based regularization techniques; e.g., [53] exploits structural regularity regarding regular equivalence, [41] considers the embedding distribution, and heterogeneous information is incorporated in [16], [22]. However, these methods are usually computationally expensive and used for small-scale tasks.

Additionally, several approaches have been proposed to aggregate node feature vectors into a feature vector for networks. For example, a graph-coarsening approach [13] computes a hierarchical structure containing multiple layers; nodes in lower layers are clustered and combined into a node in upper layers using element-wise max-pooling. However, this has high computational complexity. Some approaches [39] define an order over nodes and concatenate their feature vectors for a convolutional neural network for classification; however, this can only be applied to undirected static networks. Recently, some subgraph-embedding-based approaches have been proposed. struc2vec [44] applies sum-based approaches such as mean-field [10] and loopy belief propagation [33] to aggregate node embeddings into a graph representation. sub2vec [1] embeds subgraphs with arbitrary structure, while graph2vec [34] builds on the doc2vec framework to learn data-driven distributed representations of arbitrarily sized graphs. However, these embeddings do not fully capture network structures for best performance. Similar to ours is [3], which uses motif frequencies, while we use graphlet distributions and SRPs. Furthermore, [3] requires node labels, which gl2vec and gl-DCNN do not.

6 CONCLUSION

We presented two novel graphlet-based techniques, gl2vec for network embedding and gl-DCNN for diffusion-convolutional neural networks. We have shown that features learned from graphlets can significantly improve the performance of state-of-the-art methods in network analysis. First, we examined the utility of gl2vec by using its features in different network classification tasks for both static and temporal directed networks, and found that concatenating gl2vec with many state-of-the-art methods yields the best accuracy for real-world applications such as identifying network types, predicting community IDs for subgraphs, and detecting mobile phone users based on their


app-switching behaviors. Second, we showed that additional node attributes learned from graphlet features can significantly improve the performance of DCNNs, especially in graph classification. In future work, we will investigate whether graphlet census information can serve as features for nodes in a network. Specifically, we will investigate whether embedding each node with the numbers of graphlets it belongs to in a network can further improve node and network classification.

REFERENCES

[1] B. Adhikari, Y. Zhang, N. Ramakrishnan, and B. A. Prakash.Sub2vec: Feature Learning for Subgraphs. In PAKDD, 2018.

[2] R. Albert and A.-L. Barabasi. Statistical Mechanics of ComplexNetworks. Reviews of modern physics, 74(1):47, 2002.

[3] E. G. Allan Jr, W. H. Turkett, and E. W. Fulp. Using network motifsto identify application protocols. In IEEE GLOBECOM, 2009.

[4] L. A. N. Amaral, A. Scala, M. Barthlmy, and H. E. Stanley. Classesof small-world networks. PNAS, 97(21):11149–11152, 2000.

[5] J. Atwood and D. Towsley. Diffusion-convolutional neural net-works. In NIPS, 2016.

[6] K. M. Borgwardt, C. S. Ong, S. Schonauer, S. Vishwanathan, A. J.Smola, and H.-P. Kriegel. Protein function prediction via graphkernels. Bioinformatics, 21(suppl 1):i47–i56, 2005.

[7] T. Chen and C. Guestrin. Xgboost: A Scalable Tree BoostingSystem. In ACM SIGKDD, 2016.

[8] C. Cortes and V. Vapnik. Support-vector networks. Machinelearning, 20(3):273–297, 1995.

[9] R. L. Cross, R. L. Cross, and A. Parker. The hidden power of socialnetworks: Understanding how work really gets done in organizations.Harvard Business Press, 2004.

[10] H. Dai, B. Dai, and L. Song. Discriminative embeddings of latentvariable models for structured data. In ICML, 2016.

[11] P.-T. De Boer, D. P. Kroese, S. Mannor, and R. Y. Rubinstein. ATutorial on the Cross-Entropy Method. Annals of operations research,134(1):19–67, 2005.

[12] A. K. Debnath, R. L. Lopez de Compadre, G. Debnath, A. J.Shusterman, and C. Hansch. Structure-activity relationship ofmutagenic aromatic and heteroaromatic nitro compounds. corre-lation with molecular orbital energies and hydrophobicity. Journalof medicinal chemistry, 34(2):786–797, 1991.

[13] M. Defferrard, X. Bresson, and P. Vandergheynst. ConvolutionalNeural Networks on Graphs with Fast Localized Spectral Filter-ing. In NIPS, 2016.

[14] M. Doroud, P. Bhattacharyya, S. F. Wu, and D. Felmlee. Theevolution of ego-centric triads: A microscopic approach towardpredicting macroscopic network properties. In IEEE SocialCom,2011.

[15] J. H. Fowler. Connecting the congress: A study of cosponsorshipnetworks. Political Analysis, 14(4):456–487, 2006.

[16] H. Gao and H. Huang. Deep attributed network embedding. InIJCAI, 2018.

[17] B. Gauzere, P.-A. Grenier, L. Brun, and D. Villemin. Treelet Kernel Incorporating Cyclic, Stereo and Inter Pattern Information in Chemoinformatics. Pattern Recognition, 48(2):356–367, 2015.

[18] A. Grover and J. Leskovec. node2vec: Scalable Feature Learning for Networks. In ACM SIGKDD, 2016.

[19] A. Grover and J. Leskovec. node2vec: Scalable Feature Learning for Networks. In ACM SIGKDD, 2016.

[20] W. L. Hamilton, R. Ying, and J. Leskovec. Representation learning on graphs: Methods and applications. arXiv:1709.05584, 2017.

[21] P. Holme and J. Saramäki. Temporal Networks. Physics reports, 519(3):97–125, 2012.

[22] X. Huang, J. Li, and X. Hu. Label informed attributed network embedding. In ACM WSDM, 2017.

[23] B. Jiang, Z.-L. Zhang, and D. Towsley. Reciprocity in social networks with capacity constraints. In ACM SIGKDD, 2015.

[24] R. Kaspar and B. Horst. Graph Classification and Clustering based on Vector Space Embedding, volume 77. World Scientific, 2010.

[25] T. N. Kipf and M. Welling. Semi-supervised classification with graph convolutional networks. arXiv:1609.02907, 2016.

[26] L. Kovanen, M. Karsai, K. Kaski, J. Kertész, and J. Saramäki. Temporal Motifs in Time-Dependent Networks. Journal of Statistical Mechanics: Theory and Experiment, 2011(11):P11005, 2011.

[27] D. Krackhardt. Cognitive social structures. Social networks, 9(2):109–134, 1987.

[28] M. Kretzschmar and M. Morris. Measures of concurrency in networks and the spread of infectious disease. Mathematical biosciences, 133(2):165–195, 1996.

[29] E. Lazega et al. The collegial phenomenon: The social mechanisms of cooperation among peers in a corporate law partnership. Oxford University Press on Demand, 2001.

[30] J. Leskovec and A. Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, June 2014.

[31] A. Mellor. Classifying Conversation in Digital Communication. arXiv:1801.10527, 2018.

[32] R. Milo, S. Itzkovitz, N. Kashtan, R. Levitt, S. Shen-Orr, I. Ayzenshtat, M. Sheffer, and U. Alon. Superfamilies of Evolved and Designed Networks. Science, 303(5663):1538–1542, 2004.

[33] K. P. Murphy, Y. Weiss, and M. I. Jordan. Loopy Belief Propagation for Approximate Inference: An Empirical Study. In UAI, 1999.

[34] A. Narayanan, M. Chandramohan, R. Venkatesan, L. Chen, Y. Liu, and S. Jaiswal. graph2vec: Learning Distributed Representations of Graphs. arXiv:1707.05005, 2018.

[35] M. Newman. Networks: an Introduction. Oxford University Press, 2010.

[36] M. E. Newman. Modularity and community structure in networks. PNAS, 103(23):8577–8582, 2006.

[37] M. E. Newman and M. Girvan. Finding and Evaluating Community Structure in Networks. Physical review E, 69(2):026113, 2004.

[38] M. E. Newman, S. H. Strogatz, and D. J. Watts. Random graphs with arbitrary degree distributions and their applications. Physical review E, 64(2):026118, 2001.

[39] M. Niepert, M. Ahmed, and K. Kutzkov. Learning Convolutional Neural Networks for Graphs. In ICML, 2016.

[40] C. Olsson, P. Petrov, J. Sherman, and A. Perez-Lopez. Finding and explaining similarities in linked data. In STIDS, 2011.

[41] S. Pan, R. Hu, G. Long, J. Jiang, L. Yao, and C. Zhang. Adversarially regularized graph autoencoder for graph embedding. arXiv:1802.04407, 2018.

[42] A. Paranjape, A. R. Benson, and J. Leskovec. Motifs in temporal networks. In ACM WSDM, 2017.

[43] B. Perozzi, R. Al-Rfou, and S. Skiena. Deepwalk: Online Learning of Social Representations. In ACM SIGKDD, 2014.

[44] L. F. Ribeiro, P. H. Saverese, and D. R. Figueiredo. struc2vec: Learning Node Representations from Structural Identity. In ACM SIGKDD, 2017.

[45] M. Richardson, R. Agrawal, and P. Domingos. Trust management for the semantic web. In International Semantic Web Conference. Springer, 2003.

[46] S. T. Roweis and L. K. Saul. Nonlinear Dimensionality Reduction by Locally Linear Embedding. Science, 290(5500):2323–2326, 2000.

[47] P. Sen, G. Namata, M. Bilgic, L. Getoor, B. Galligher, and T. Eliassi-Rad. Collective classification in network data. AI magazine, 29(3):93–93, 2008.

[48] N. Shervashidze, P. Schweitzer, E. J. v. Leeuwen, K. Mehlhorn, and K. M. Borgwardt. Weisfeiler-lehman graph kernels. Journal of Machine Learning Research, 12(Sep):2539–2561, 2011.

[49] V. Svetnik, A. Liaw, C. Tong, J. C. Culberson, R. P. Sheridan, and B. P. Feuston. Random Forest: a Classification and Regression Tool for Compound Classification and QSAR Modeling. Journal of chemical information and computer sciences, 43(6):1947–1958, 2003.

[50] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei. LINE: Large-Scale Information Network Embedding. In WWW, 2015.

[51] H. Toivonen, A. Srinivasan, R. D. King, S. Kramer, and C. Helma. Statistical evaluation of the predictive toxicology challenge 2000–2001. Bioinformatics, 19(10):1183–1193, 2003.

[52] K. Tu. Jmotif, 2018.

[53] K. Tu, P. Cui, X. Wang, P. S. Yu, and W. Zhu. Deep recursive network embedding with regular equivalence. In ACM SIGKDD, 2018.

[54] K. Tu, J. Li, D. Towsley, D. Braines, and L. Turner. Network classification in temporal networks using motifs. In ECML/PKDD-AALTD, 2018.

[55] K. Tu, J. Li, D. Towsley, D. Braines, and L. Turner. gl2vec: Learning feature representation using graphlets for directed networks. In IEEE/ACM ASONAM, 2019.

[56] L. D. Turner, R. M. Whitaker, S. M. Allen, D. E. Linden, K. Tu, J. Li, and D. Towsley. Evidence to support common application switching behaviour on smartphones. Royal Society Open Science, 6(3), 2019.

[57] N. Wale, I. A. Watson, and G. Karypis. Comparison of descriptor spaces for chemical compound retrieval and classification. Knowledge and Information Systems, 14(3):347–375, 2008.

[58] P. Yanardag and S. Vishwanathan. Deep Graph Kernels. In ACM SIGKDD, 2015.

[59] H. Yin, A. R. Benson, J. Leskovec, and D. F. Gleich. Local Higher-Order Graph Clustering. In ACM SIGKDD, 2017.

Kun Tu holds a Ph.D. (2019) in Computer Science from the University of Massachusetts Amherst. He is currently working as a Machine Learning Scientist, designing and developing machine learning models for large-scale industry problems and deploying those models in parallel. His research interests focus on Network Science, Machine Learning and Big Data.

Jian Li is an Assistant Professor of Computer Engineering in the Department of Electrical and Computer Engineering at Binghamton University, the State University of New York. He was a postdoctoral research associate at the University of Massachusetts Amherst. He received his Ph.D. in Computer Engineering from Texas A&M University in December 2016, and his B.E. in Electrical Engineering from Shanghai Jiao Tong University in June 2012. His current research interests lie broadly in the interplay of large-scale networked systems and big data analytics, focusing on applying machine learning, online algorithms, game theory, and signal processing technologies to big data applications.

Don Towsley holds a B.A. in Physics (1971) and a Ph.D. in Computer Science (1975) from the University of Texas. He is currently a Distinguished Professor at the University of Massachusetts in the College of Information & Computer Sciences. He has held visiting positions at numerous universities and research labs. His research interests include network science, performance evaluation, and quantum networking.

He is co-founder and Co-Editor-in-Chief of the new ACM Transactions on Modeling and Performance Evaluation of Computing Systems (TOMPECS), and has served as Editor-in-Chief of IEEE/ACM Transactions on Networking and on numerous editorial boards. He has served as Program Co-chair of several conferences, including INFOCOM 2009.

He is a corresponding member of the Brazilian Academy of Sciences and has received several achievement awards, including the 2007 IEEE Koji Kobayashi Award and the 2011 INFOCOM Achievement Award. He has received numerous paper awards, including the 2012 ACM SIGMETRICS Test-of-Time Award, a 2008 SIGCOMM Test-of-Time Paper Award, and a 2018 SIGMOBILE Test-of-Time Award. He also received the 1998 IEEE Communications Society William Bennett Best Paper Award. Last, he has been elected Fellow of both the ACM and IEEE.

Dave Braines is the Chief Technology Officer for Emerging Technology, IBM Research UK, and is a Fellow of the British Computer Society. As a member of IBM Research he is an active researcher in the field of Artificial Intelligence and is currently focused on Machine Learning, Deep Learning and Network Motif analysis. He has published over 100 conference and journal papers and is currently the industry technical leader for a 10-year research consortium comprised of 17 academic, industry and government organisations from the UK and US. Dave is passionate about human-machine cognitive interfaces and has developed a number of techniques to support deep interactions between human users and machine agents.

Since 2017 Dave has been pursuing a part-time PhD in Artificial Intelligence at Cardiff University, and in his spare time he likes to get outdoors for camping, walking, kayaking, cycling or anything else that gets him away from desks and screens!

Liam D. Turner is an Assistant Professor of Computer Science at the School of Computer Science and Informatics, Cardiff University, UK. He received his Ph.D. from Cardiff University in 2017 and his B.Sc. in Computer Science from Cardiff University in 2013. His current research interests primarily surround using computational models to represent and understand complex systems and behaviors, and how machine learning can be leveraged to develop systems for supporting human behavior.