986 ieee transactions on cybernetics, vol. 46, no. 4...

14
986 IEEE TRANSACTIONS ON CYBERNETICS, VOL. 46, NO. 4, APRIL 2016 Local Community Mining on Distributed and Dynamic Networks From a Multiagent Perspective Zhan Bu, Zhiang Wu, Member, IEEE, Jie Cao, and Yichuan Jiang, Senior Member, IEEE Abstract—Distributed and dynamic networks are ubiquitous in many real-world applications. Due to the huge-scale, decentralized, and dynamic characteristics, the global topological view is either too hard to obtain or even not available. So, most existing community detection methods working on the global view fail to handle such decentralized and dynamic large networks. In this paper, we propose a novel autonomy-oriented computing-based method for community mining (AOCCM) from the multiagent perspective in the distributed environment. In particular, AOCCM utilizes reactive agents to pick the neighborhood node with the largest structural similarity as the candidate node, and thus determine whether it should be added into local community based on the modularity gain. We further improve AOCCM to a more efficient incremental version named AOCCM-i for mining communities from dynamic networks. AOCCM and AOCCM-i can be easily expanded to detect both nonoverlapping and overlapping global community structures. Experimental results on real-life networks demonstrate that the proposed methods can reduce the computational cost by avoiding repeated structural similarity calculation and can still obtain the high-quality communities. Index Terms—Autonomy-oriented computing (AOC), dis- tributed and dynamic networks, incremental computing, local community detection (LCD), multiagent. I. I NTRODUCTION R EAL-LIFE networks, such as the transportation sys- tems [1], the computer science networks [2], and the online friendship network systems (e.g., Twitter and Facebook) [3], [4], are composed of a large number of highly Manuscript received October 28, 2014; revised February 5, 2015; accepted March 31, 2015. Date of publication June 16, 2015; date of current ver- sion March 15, 2016. This work was supported in part by the National Natural Science Foundation of China under Grant 71372188 and Grant 61472079, in part by the National Center for International Joint Research on E-Business Information Processing under Grant 2013B01035, in part by the National Key Technologies Research and Development Program of China under Grant 2013BAH16F01, in part by Industry Projects in Jiangsu S&T Pillar Program under Grant BE2014141, in part by the Priority Academic Program Development of Jiangsu Higher Education Institutions, and in part by the Key/Surface Project of Natural Science Research in Jiangsu Provincial Colleges and Universities under Grant 12KJA520001, Grant 14KJA520001, and Grant 14KJB520015. This paper was recommended by Associate Editor B. Chaib-draa. (Corresponding author: Zhiang Wu.) Z. Bu, Z. Wu, and J. Cao are with the Jiangsu Provincial Key Laboratory of E-Business, Nanjing University of Finance and Economics, Nanjing 210003, China (e-mail: [email protected]). Y. Jiang is with the School of Computer Science and Engineering, Southeast University, Nanjing 211189, China, and also with the College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TCYB.2015.2419263 interconnected nodes/actors. And they often display a com- mon topological feature-community structure. Discovering the latent communities therein is a useful way to infer some important functions. In general, a community should be thought of a set of nodes that have more and/or better-connected edges between its members than between its members and the remainder of the network. The existing community definitions in the literature can be roughly divided into three categories, one is global-based [5]–[7], the other is based on the node- similarity [8]–[11], and the third is local-based [12]–[14]. 1) The global-based definitions consider the graph as a whole, and they follow the assumption that a graph has community structure if it is different from a random graph, i.e., null model. 2) The node-similarity-based definitions are based on the assumption that communities are groups of nodes similar to each other. Traditional methods, including hierarchi- cal [8], partitional [9], and spectral clustering [10], [11], are based on this kind of definition. 3) The local-based definitions compare the internal and external cohesion of a subgraph. The first recipe of this kind is Luccio Sami (LS)-set [12], which stems from social network analysis. The condition of LS-set is quite strict and can be relaxed into the so-called weak definition of community [13]. Some other local-based definitions can be found in [15]–[17]. Since existing global-based and the node-similarity-based community detection approaches require clear pictures of the entire graph structure, they are often described as global com- munity detection (in short as GCD henceforth). In those GCD methods, the networks concerned are centralized (i.e., they are processed in a centralized manner and with a global control) instead of being distributed. However, in the real world, many applications involve distributed networks, in which resources and controls are often decentralized. As they are based on the structure-oriented view, the characteristics and effects of social actors are neglected. To consider the effects of actor characteristics, the actor- structure crossing view has been recently introduced into community detection, especially applications of the local community detection (LCD) or autonomy-oriented comput- ing (AOC) [18]–[20]. The networks concerned by LCD/AOC methods can be distributed, and each actor in the networks is modeled as an agent who acts autonomously to find 2168-2267 c 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Upload: others

Post on 23-Sep-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 986 IEEE TRANSACTIONS ON CYBERNETICS, VOL. 46, NO. 4 ...cie.nufe.edu.cn/__local/...E5E4740E_21AAEC.pdf?e=.pdf · community detection methods working on the global view fail to handle

986 IEEE TRANSACTIONS ON CYBERNETICS, VOL. 46, NO. 4, APRIL 2016

Local Community Mining on Distributedand Dynamic Networks From

a Multiagent PerspectiveZhan Bu, Zhiang Wu, Member, IEEE, Jie Cao, and Yichuan Jiang, Senior Member, IEEE

Abstract—Distributed and dynamic networks are ubiquitous inmany real-world applications. Due to the huge-scale, decentralized,and dynamic characteristics, the global topological view is eithertoo hard to obtain or even not available. So, most existingcommunity detection methods working on the global view fail tohandle such decentralized and dynamic large networks. In thispaper, we propose a novel autonomy-oriented computing-basedmethod for community mining (AOCCM) from the multiagentperspective in the distributed environment. In particular, AOCCMutilizes reactive agents to pick the neighborhood node with thelargest structural similarity as the candidate node, and thusdetermine whether it should be added into local communitybased on the modularity gain. We further improve AOCCM to amore efficient incremental version named AOCCM-i for miningcommunities from dynamic networks. AOCCM and AOCCM-i canbe easily expanded to detect both nonoverlapping and overlappingglobal community structures. Experimental results on real-lifenetworks demonstrate that the proposed methods can reducethe computational cost by avoiding repeated structural similaritycalculation and can still obtain the high-quality communities.

Index Terms—Autonomy-oriented computing (AOC), dis-tributed and dynamic networks, incremental computing, localcommunity detection (LCD), multiagent.

I. INTRODUCTION

REAL-LIFE networks, such as the transportation sys-tems [1], the computer science networks [2], and

the online friendship network systems (e.g., Twitter andFacebook) [3], [4], are composed of a large number of highly

Manuscript received October 28, 2014; revised February 5, 2015; acceptedMarch 31, 2015. Date of publication June 16, 2015; date of current ver-sion March 15, 2016. This work was supported in part by the NationalNatural Science Foundation of China under Grant 71372188 and Grant61472079, in part by the National Center for International Joint Researchon E-Business Information Processing under Grant 2013B01035, in partby the National Key Technologies Research and Development Programof China under Grant 2013BAH16F01, in part by Industry Projects inJiangsu S&T Pillar Program under Grant BE2014141, in part by the PriorityAcademic Program Development of Jiangsu Higher Education Institutions,and in part by the Key/Surface Project of Natural Science Research inJiangsu Provincial Colleges and Universities under Grant 12KJA520001,Grant 14KJA520001, and Grant 14KJB520015. This paper was recommendedby Associate Editor B. Chaib-draa. (Corresponding author: Zhiang Wu.)

Z. Bu, Z. Wu, and J. Cao are with the Jiangsu Provincial Key Laboratory ofE-Business, Nanjing University of Finance and Economics, Nanjing 210003,China (e-mail: [email protected]).

Y. Jiang is with the School of Computer Science and Engineering, SoutheastUniversity, Nanjing 211189, China, and also with the College of ComputerScience and Technology, Nanjing University of Aeronautics and Astronautics,Nanjing 211106, China.

Color versions of one or more of the figures in this paper are availableonline at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCYB.2015.2419263

interconnected nodes/actors. And they often display a com-mon topological feature-community structure. Discovering thelatent communities therein is a useful way to infer someimportant functions.

In general, a community should be thought of a set ofnodes that have more and/or better-connected edges betweenits members than between its members and the remainderof the network. The existing community definitions in theliterature can be roughly divided into three categories, oneis global-based [5]–[7], the other is based on the node-similarity [8]–[11], and the third is local-based [12]–[14].

1) The global-based definitions consider the graph as awhole, and they follow the assumption that a graph hascommunity structure if it is different from a randomgraph, i.e., null model.

2) The node-similarity-based definitions are based on theassumption that communities are groups of nodes similarto each other. Traditional methods, including hierarchi-cal [8], partitional [9], and spectral clustering [10], [11],are based on this kind of definition.

3) The local-based definitions compare the internal andexternal cohesion of a subgraph. The first recipe ofthis kind is Luccio Sami (LS)-set [12], which stemsfrom social network analysis. The condition of LS-set isquite strict and can be relaxed into the so-called weakdefinition of community [13]. Some other local-baseddefinitions can be found in [15]–[17].

Since existing global-based and the node-similarity-basedcommunity detection approaches require clear pictures of theentire graph structure, they are often described as global com-munity detection (in short as GCD henceforth). In those GCDmethods, the networks concerned are centralized (i.e., they areprocessed in a centralized manner and with a global control)instead of being distributed. However, in the real world, manyapplications involve distributed networks, in which resourcesand controls are often decentralized. As they are based on thestructure-oriented view, the characteristics and effects of socialactors are neglected.

To consider the effects of actor characteristics, the actor-structure crossing view has been recently introduced intocommunity detection, especially applications of the localcommunity detection (LCD) or autonomy-oriented comput-ing (AOC) [18]–[20]. The networks concerned by LCD/AOCmethods can be distributed, and each actor in the networksis modeled as an agent who acts autonomously to find

2168-2267 c© 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Page 2: 986 IEEE TRANSACTIONS ON CYBERNETICS, VOL. 46, NO. 4 ...cie.nufe.edu.cn/__local/...E5E4740E_21AAEC.pdf?e=.pdf · community detection methods working on the global view fail to handle

BU et al.: LOCAL COMMUNITY MINING ON DISTRIBUTED AND DYNAMIC NETWORKS FROM A MULTIAGENT PERSPECTIVE 987

its local community. In [20], the community evolution hasbeen introduced into the proposed incremental AOC-basedmethod (AOC-i), in which the new community structure canbe quickly derived based on the previous one and the incre-mental network update. Although existing LCD/AOC methodshave already achieved significant success, further study is stillneeded on seeking a nice balance between the high efficiencyof local search models and the high accuracy of detected com-munities. Therefore, we proposed a novel AOC method fromthe multiagent perspective for detecting community structuresin the distributed environment, in which three key problemsare carefully concerned.

1) How to model the real-life networks in the distributedenvironment?

2) How to effectively and accurately detect the local com-munity starting from an arbitrary distributed node?

3) How to monitor the influence on a given local commu-nity and incrementally compute its current structure?

Targeting at effectively solving these above problems, inthis paper, we propose a fully distributed system guided byAOC methodology for modeling decentralized networks. Inthis system, every node is assigned an autonomous agent forLCD. For the LCD task of each agent, a heuristic algorithmnamed AOCCM is presented, and then its incremental versioncalled AOCCM-i is designed for handling community evolu-tion. More specifically, our main contributions are summarizedin threefold as follows.

1) We present a novel modularity gain criterion, basedon which a heuristic algorithm named AOCCM isdesigned for the LCD task of each agent. The pro-posed method is able to start from an arbitrary nodein a distributed network, and repeats two iterative steps(Update and Join) until the local community hasreached its convergent status or the agent’s clock timeis over.

2) We expand AOCCM to a more efficient incremen-tal method (AOCCM-i) for mining communities fromdynamic and distributed networks. The process ofAOCCM-i is an iterative process consisting of a seriesof discrete evolutionary cycles. In each cycle, the newobjective can be incremental updated based on the pre-vious results and the dynamic changes of the network.

3) Based on the local communities detected by AOCCM orAOCCM-i, we further propose two global versions fornonoverlapping and overlapping community detections.Thorough experiments on real-life networks demonstratethat the proposed methods can keep a nice balancebetween the high accuracy and short running time.

The remainder of this paper is organized as follows.Section II presents the related work about AOC and dynamicnetwork mining. In Section III, we give a problem definitionof distributed community mining and the basic ideas behindour method. Section IV introduces the AOC-based method forcommunity mining. In Section V, we validate the proposedmethods using some real-world networks, and examine its per-formances in detail. We further present an incremental AOCmethod for dynamic network mining in Section VI, and finallythe conclusion of this paper is given in Section VII.

II. RELATED WORK

Here, we discuss related work from two areas: 1) AOC and2) dynamic network mining.

A. Autonomy-Oriented Computing

Early work in LCD can be adopted to AOC, which can beclassified into two main categories, namely: 1) degree-basedmethods and 2) similarity-based methods.

Degree-based methods evaluate the local community qual-ity by investigating nodesdegrees. Some naive solutions, suchas l-shell search algorithm [21], discovery-then-examinationapproach [15], and outwardness-based method [16], onlyconsider the number of edges inside and outside a local com-munity. Clauset [22] defined local modularity by consideringthe boundary points of a subgraph, and proposes a greedy algo-rithm on optimizing this measure. Similarly, Luo et al. [17]presented another measurement as the ratio of the internaldegree and external degree of a subgraph. Both measurementscan achieve high recall but suffer from low precision due toincluding many outliers [15].

Similarity-based methods utilize similarities between nodesto help evaluate the local community quality. Local tight-ness expansion (LTE) algorithm [23] is a representative ofsimilarity-based methods, using a well-designed metric forlocal community quality known as tightness. There are a fewalternative similarity-based metrics such as vertex similarityprobability model [24] and relation strength similarity [25] thatcan also help evaluate the local community quality, althoughthey are not originally designed for LCD.

Some multiagent technologies have been introduced intocommunity detection [26], [27], in which, each actor in thenetworks is modeled as an agent and acts autonomously tofind its community. For example, Chen et al. [28] formu-lated the agents’ utility by the combination of a gain functionand a loss function and make agents select communities by agame-theoretic framework to achieve an equilibrium for inter-preting a community structure. To consider in the distributedexperiment, Yang et al. [20] utilized reactive agents to makedistributed and incremental mining of communities based onlyon their local views and interactions.

Our new autonomy-oriented computing-based method forcommunity mining (AOCCM) is also based on the multiagentperspective, in which, the local search model of each agent isalso an extension of the similarity model. However, in compar-ison to the above approaches which calculate the quantitativemetrics for every node in the neighbor sets, the structural simi-larity of each pair of nodes in AOCCM is calculated only once.By introducing the notion of modularity gain, which is seen asa quantified criterion to decide whether the candidate node canbe added into the local community or not, the effectiveness ofAOCCM is very high.

B. Dynamic Network Mining

Recently, finding communities in dynamic networks hasgained more and more attention. A family of events on bothcommunities and individuals have been introduced in [29]to characterize evolution of communities. An evolutionary

Page 3: 986 IEEE TRANSACTIONS ON CYBERNETICS, VOL. 46, NO. 4 ...cie.nufe.edu.cn/__local/...E5E4740E_21AAEC.pdf?e=.pdf · community detection methods working on the global view fail to handle

988 IEEE TRANSACTIONS ON CYBERNETICS, VOL. 46, NO. 4, APRIL 2016

Fig. 1. Environment of the AOC system.

version of the spectral clustering algorithms has been firstlyproposed by Chi et al. [30], in which the graph cut is usedas a metric for measuring community structures and com-munity evolutions. Their work has been further expandedby Lin et al. [31], in which, a graph-factorization cluster-ing algorithm named FacetNet has been proposed to analyzedynamic networks. The above mentioned studies often adopteda two-step approach where first static analysis is applied to thesnapshots of the social network at different time steps, and thencommunity evolutions are introduced afterwards to interpretthe change of communities over time. As they overlooked theold community structures as obtained in the previous snapshot,this strategy of recalculating is not efficient. In the frameworkof multiagent system (MAS), Yang et al. [20] introduced anAOC-i, in which the new community structure can be quicklyderived based on the incremental change and the old commu-nity structure as obtained in the previous cycle. The proposedincremental computing method in this paper is also AOC-based. Compared with AOC-i, AOCCM-i does not requireany user-defined parameters, and the network updating can bemore arbitrarily, e.g., gradually inputting any number of newedges. We will also prove in the experiment that the averageconvergent speed of AOCCM-i is much faster than AOC-is.

III. PROBLEM DEFINITION AND PRELIMINARIES

As the nodes and the connectivity information in a dis-tributed network are located at different positions. Communitymining in the distributed network is aimed to find all theunderlying communities only based on the local views and thecooperations among the nodes. One possible solution to solvethis problem is to provide the clear picture of the network toan administrative agent, on which a GCD algorithm is appliedto discover the underlying communities. However, this strat-egy will have several inherent limitations when confrontinglarge-scale or highly-dynamic networks. To address this issues,the methodology of AOC is presented in this paper, which isderived from [18] and [19]. Generally speaking, an AOC sys-tem can be viewed a MAS, in which agents interact with eachother according to some predefined transfer protocols to detectthe local community.

A. Environment of AOC System

The presented AOC system is defined as a distributed net-work in which nodes and the connectivity information aredecentralized, as shown in Fig. 1. Let G = (V, E) be a givenweighted distributed network, where V is the set of nodes(|V| = n), E is the set of edges (E = {eij}) that connect thenodes in V , and wij is the weight on edge eij. In the AOC sys-

tem, each node is defined as a tuple <�i,�i, ϒi, li>, wherethe components denote the identifiers of its adjacent neigh-bors (�i), the message pool on the node (�i), the data poolon the node (ϒi), and the community label of the node (li),respectively. For each node vi of the distributed network, thereis one autonomous agent Ai residing on it, which is character-ized by three characteristics, including clock, community,and boundary, denoted as <t, Ci(t),Bi(t)>. The attribute tdenotes the clock maintained by agent Ai, who will adjustits clock by t ← t + 1 after each iteration. When its clockreaches the final time T , Ai will become inactive. Ci(t) denotesthe local community detected by agent Ai at time t. Initially,each agent only includes its appurtenant node in the com-munity, e.g., Ci(0) = {vi}. Then, it starts to enlarge itslocal community based on the information it gathers from itsenvironment, until its clock reaches the final time T or thecommunity cannot be changed any more. In other words, wehave Ci = Ci(T). The boundary of each agent is closelyassociated with its local community, which can be defined asBi(t) = {vj|vj �∈ Ci(t), vk ∈ Ci(t),<vj, vk, wjk>∈ E}. At thebeginning, Bi(0) is set to be the adjacent neighbors (�i) ofnode vi. During the community updating of agent Ai, nodesbelonging to its community will be added from the bound-ary one by one. In contrast, those nodes not belonging to thecommunity will be gradually excluded out of its boundary.The above mentioned notations are summarized in Table I.

Remark: The identifier of node vi, e.g., i, is also used todenote the address of the location, in which the node situates.After knowing the identifier of vi, Ai can send messages to it.In the AOC system, the message pool on each node (�i) storesthe messages from others agents, and the data pool ϒi storesthe structural similarity between node vi and its adjacent nodes.

Page 4: 986 IEEE TRANSACTIONS ON CYBERNETICS, VOL. 46, NO. 4 ...cie.nufe.edu.cn/__local/...E5E4740E_21AAEC.pdf?e=.pdf · community detection methods working on the global view fail to handle

BU et al.: LOCAL COMMUNITY MINING ON DISTRIBUTED AND DYNAMIC NETWORKS FROM A MULTIAGENT PERSPECTIVE 989

TABLE INOTATIONS OF THE AOC SYSTEM

Initially, both �i and ϒi are empty. Without loss of generality,let us consider agent Ai. Before conducting local commu-nity mining, Ai sends messages to the neighbors of node vi

(�j,vj∈�i ← Ai.message); then Aj,vj∈�i return �j,vj∈�i to cal-culate the structural similarities between vi and its neighbors;finally, Ai stores all the structure similarity values in the datapool of node vi (ϒi ← {sij|vj ∈ �i}). Therefore, the structuralsimilarity of each pair of nodes in AOC system is calculatedonly once.

B. Objective of AOC System

The objective of the AOC system can be achieved if the allagents autonomously achieve their respective objectives

C = {C1, . . . , Cn} (1)

where Ci is the stable local community of agent Ai, and∪1≤i≤nCi = V . The objective of agent Ai is to find a localcommunity structure starting from its appurtenant node vi, onlybased on its local view [e.g., Ci(t) ∪ Bi(t)] and interactionswith other agents.

Generally, a community is measured by a specific propertyof the nodes within it. For this task, different community mea-surements have been proposed [13], [16], [17], [22] in recentyears. In this paper, we adopt a structural similarity measurefrom the cosine similarity function [23] to effectively denotesthe local connectivity density of any two adjacent nodes in adistributed network. Here, we first formalize some notions ofthe local community.

Definition 1 (Structural Similarity): Given a distributednetwork G = (V, E), the structural similarity between twoadjacent nodes vi and vj is defined as

sij =∑

vk∈�i∩�jwikwjk

∑vk∈�i

w2ik

∑vk∈�j

w2jk

. (2)

When we consider an unweighted network, the weight wij

of any edge can be set to 1 and the equation above can betransformed to

sij =∣∣�i ∩ �j

∣∣

|�i|∣∣�j

∣∣

(3)

which corresponds to the edge-clustering coefficient intro-duced in [13].

Definition 2 (Internal Similarity and External Similarity):By employing the structural similarity, we introduce internal

and external similarities of the community Ci(t) as follows:

Sin(Ci(t)) =∑

vj,vk∈Ci(t),<vj,vk,wjk>∈E

sjk (4)

Sout(Ci(t)) =∑

vj∈Ci(t),vk∈Cci (t),<vj,vk,wjk>∈E

sjk (5)

where Cci (t) is the complement of Ci(t).

The criterion, agent Ai used to find the local communitycontaining the appurtenant node vi, is derived from [32], whichfinds a community with a large number of edges within itselfand a small number of edges to the rest of the network.

Definition 3 (Local Modularity): The local modularity ofthe community Ci(t), denoted as W(Ci(t)), is given as follows:

W(Ci(t)) = I(Ci(t))

|Ci(t)|2 −O(Ci(t))

|Ci(t)||Cci (t)|

(6)

where I(Ci(t)) = ∑vi,vj∈Ci(t) Aij, O(Ci(t)) =

∑vi∈Ci(t),vj∈Cc

i (t) Aij, A = [Aij] is an n × n adjacencymatrix of the distributed network G.

Based on the definition of local modularity, we have thefollowing theorem.

Theorem 1: The local modularity value of the communityCi(t) will increase when Ci(t) has high-intracluster density andlow intercluster density.

Proof: The term I(Ci(t)) is twice the number of the edgeswithin Ci(t), and O(Ci(t)) represents the number of edgesbetween Ci(t) and the rest of the network. Each term is nor-malized by the total number of possible edges in each case.Note that, we normalize the first term by |Ci(t)|2 rather than|Ci(t)|(|Ci(t)| − 1) in order to conveniently derive the modu-larity gain discussed below, but in practice this makes littledifference. Subject to this small difference, the local modu-larity can be described as the intracluster density minus theintercluster density. Thus, the proof completes.

Based on Definition 2 and Theorem 1, we have thefollowing corollary.

Corollary 1: The local modularity value of the communityCi(t) will increase when Ci(t) has high-internal similarity andlow external similarity.

Proof: A high value of Sin(Ci(t)) reveals a large number ofcommon neighbors of any adjacent node pair in Ci(t), resultingin a high value of intracluster density. While, a low value ofSout(Ci(t)) reveals a small number of common neighbors ofany adjacent node pair between Ci(t) and Cc

i (t), resulting in alow value of intracluster density.

Page 5: 986 IEEE TRANSACTIONS ON CYBERNETICS, VOL. 46, NO. 4 ...cie.nufe.edu.cn/__local/...E5E4740E_21AAEC.pdf?e=.pdf · community detection methods working on the global view fail to handle

990 IEEE TRANSACTIONS ON CYBERNETICS, VOL. 46, NO. 4, APRIL 2016

Fig. 2. W variant when a node vj joins Ci(t).

In Definition 2, as the second term will be made negligibleby the large |Cc

i (t)|, a very small community can give a highvalue of W(Ci(t)). We further make an adjustment in the spiritof the ratio cut and maximize the following criterion:

W(Ci(t)) = |Ci(t)|∣∣Cc

i (t)∣∣(

I(Ci(t))

|Ci(t)|2 −O(Ci(t))

|Ci(t)||Cci (t)|

)

(7)

where the factor |Ci(t)||Cci (t)| penalizes very small and very

large communities and produces more balanced solutions.Suppose at clock t, Ai explores the adjacent nodes in the

boundary area Bi(t), as shown in Fig. 2. It distinguishesthree types of links: those internal to the community Ci(t)(L),between Ci(t) and the node vj(Lin), between Ci(t) and othersnodes in Bi(t)(Lout). To simplify the calculations, we expressthe number of external links in terms of L and kj (the degreeof node vj), so Lin = a1L = a2kj, Lout = b1L, with b1 ≥ 0,a1 ≥ 1/L, a2 ≥ 1/kj [since any vj in Bi(t) at least hasone neighbor in Ci(t)]. So, the value of W for the currentcommunity can be written as

W(Ci(t)) = n− |Ci(t)||Ci(t)| 2L− (a1 + b1)L. (8)

Then, the variant W of the community Ci(t) ∪ vj becomes

W(Ci(t) ∪ vj

) = n− |Ci(t)| − 1

|Ci(t)| + 12L(1+ a1)−

(b1L+ kj − a2kj

).

(9)

So, we define the modularity gain in the following.Definition 4 (Modularity Gain): The modularity gain for

the community Ci(t) adopting a neighbor node vj can bedenoted as

�WCi(t)(vj) = W(Ci(t) ∪ vj

)− W(Ci(t))

= n− |Ci(t)| − 1

|Ci(t)| + 12L(1+ a1)−

(b1L+ kj − a2kj

)

−(

n− |Ci(t)||Ci(t)| 2L− (a1 + b1)L

)

= 2na2kj|Ci(t)| − L

|Ci(t)|(|Ci(t)| + 1)− kj. (10)

It means that if a small node in terms of degree links manynodes in community Ci(t), adopting it may increase the localmodularity of Ci(t). Therefore, �WCi(t)(vj) can be utilized as

Algorithm 1 Life-Cycle of Agent Ai (AOCCM(Ai))1: /*Initialization phase*/2: t← 0;3: Ci(0)← {vi};4: Bi(0)← {vj|vj ∈ �i};5: /*Active phase*/6: while t < T do7: v∗j = arg maxvj∈Bi(t)

∑vj∈Ci(t) sij;

8: if �WCi(t)(v∗j ) > 0 then

9: Bi(t + 1)← Bi(t) ∪ {vk|vk ∈ �j∗ , vk /∈ Ci(t)} − {v∗j };10: Ci(t + 1)← Ci(t) ∪ {v∗j };11: else12: Bi(t + 1)← Bi(t)− {v∗j };13: end if14: t← t + 1;15: if Bi(t) = ∅ then16: break;17: end if18: end while19: /*Inactive phase*/20: Ci = Ci(t);21: li ← arg maxvj∈Ci kj;

a criterion for Ai to determine whether the candidate node vj

should be included in the community Ci(t + 1) or not.

IV. AOC-BASED METHOD FOR COMMUNITY MINING

In this section, we propose an AOC-based method for com-munity mining (in short as AOCCM henceforth). First, weintroduce the basic idea of AOCCM and then present algo-rithmic details including the complexity analysis. Second,we introduce how to use AOCCM to detect the globalnonoverlapping and overlapping community structures.

In the AOC system, each agent, e.g., Ai, starts from itsappurtenant node vi to find the densely connected local com-munity. Ai works with two iterative steps: 1) Update stepand 2) Join step. First, the appurtenant node vi is added intothe local community, e.g., Ci(0) = {vi}. In the Update step,Ai refreshes the boundary area Bi(t), and calculate the struc-tural similarities between nodes in the community Ci(t) andtheir neighbor nodes in Bi(t). In the Join step, Ai tries toabsorb a node in Bi(t), e.g., v∗j , having highest structuralsimilarity with nodes in Ci(t) into the local community. If�WCi(t)(v

∗j ) > 0, then the node v∗j will be inserted into

Ci(t+1). Otherwise, it will be removed from Bi(t+1) and othernodes will be considered in the descending order of the struc-tural similarity. The two procedures above will be repeated byAi in turn until its clock reaches the final time T or its bound-ary is empty. Then, the whole community Ci is discovered.Ai further selects the node with maximum degree in Ci as thecore node, the identifier of which can be seen as the label ofdetected community. The life-cycle of agent Ai on node vi isgiven in the following.

Remark: Unlike existing methods [16], [17], [22], whichcalculate the quantitative metrics for each node in B and selectthe node who produces the greatest increment of the metric to

Page 6: 986 IEEE TRANSACTIONS ON CYBERNETICS, VOL. 46, NO. 4 ...cie.nufe.edu.cn/__local/...E5E4740E_21AAEC.pdf?e=.pdf · community detection methods working on the global view fail to handle

BU et al.: LOCAL COMMUNITY MINING ON DISTRIBUTED AND DYNAMIC NETWORKS FROM A MULTIAGENT PERSPECTIVE 991

join C, each agent Ai in the AOC system picks the neighbornode with the largest structure similarity as the candidate nodev∗j and calculate �WCi(t)(v

∗j ) to determine whether it should

be added into Ci(t+1) or not. The structural similarity reflectsthe local connectivity density of the network. The larger thesimilarity between a node inside Ci(t) and a node outside it, themore common neighbors the two nodes share, and the moreprobability they are at the same community. So the executionof AOCCM on each agent is accelerated and the accuracyremains high.

A. Complexity Analysis

The running time of AOCCM on agent Ai is mainly con-sumed in line 7 of Algorithm 1, which is selecting the neighbornode with the largest structure similarity. Agent Ai can imple-ment it using a binary Fibonacci heap Hi [23], which takestwo steps.

1) Extract (extract the maximum element from Hi). Aseach Extract operation of Hi takes O(log n′i) timeand the body of the while loop is executed n′ times,the total time for all Extract steps is O(n′ log n′i),where n′i is the number of nodes inferred (nodesin Ci ∪ Bi).

2) Update [for each node in current Bi(t), Ai updates itssum of structure similarities with nodes in Ci(t)]. First,the sum of structure similarities with nodes in Ci(t) foreach node vj ∈ Bi(t) should be computed, which can becompleted in O(k′i) time, where k′i is the mean degree ofinferred nodes. For nodes which are not in Hi, Ai insertsthem into Hi in O(1) time; otherwise, it takes O(1)

time to make an increase-key operation. As the abovesteps are executed O(m′i) times, where m′i is the num-ber of edges in Ci ∪Bi. Therefore, the total time of theUpdate steps is O(m′ik′i). Adding all together, the totaltime complexity is O(m′ik′i + n′i log n′i) for AOCCM onagent Ai.

B. Nonoverlapping Community Detection

Nonoverlapping community detection aims to find a goodK-way partition P = {P1, . . . ,PK}, where Pk is the kth com-munity, in which li = lj ∀vi, vj ∈ Pk, and P1 ∪ · · · ∪ PK ⊆ V ,Pk ∩ Pk′ = ∅ ∀ k �= k′. K is automatically determined byresults of each AOCCM(Ai). Our assumption is that similaradjacent agents will return analogous community structures,in which the core nodes are almost unanimous. Therefore,if Ai detects the same community label, their appurtenantnodes are likely to be in the same community. The pro-cess of AOCCM expansion algorithm for nonoverlapping (inshort as AOCCMnO henceforth) is given as in Algorithm 2,where L = {li|i = 1, . . . , n} is the label list of nodes inthe distributed network. AOCCMnO could be completed inO(m∗k∗ + n∗ log n∗) time, where n∗, m∗ are the number ofnodes and edges in the largest Ci ∪ Bi, and k∗ is the meandegree of inferred nodes in it.

C. Overlapping Community Detection

While, for an overlapping partition, overlapping communi-ties can be represented as a membership matrix U = [ui,k],

Algorithm 2 AOCCMnO(G)

1: for i = 1; i <= n; i++ do2: [Ci, li]← AOCCM(Ai); //Parallel Computing3: end for4: L = unique(L); // L = {l′k|1 ≤ k ≤ K}5: K = Length(L);6: for k = 1; k <= K; k ++ do7: Pk = {vi|∀li = l′k};8: end for

Algorithm 3 AOCCMO(G)

1: for s = 1; s <= n; s++ do2: [Ci, li]← AOCCM(Ai); //Parallel Computing3: end for4: L = unique(L); // L = {l′k|1 ≤ k ≤ K}5: K = Length(L);6: for i = 1; i <= n; i++ do7: for k = 1; i <= K; k ++ do

8: ui,k =∑

j=1,··· ,n ∧lj=l′k

δ(vi,Cj)∑

j=1,··· ,n δ(vi,Cj);

9: end for10: end for

i = 1, . . . , n, k = 1, . . . , K, where 0 ≤ ui,k ≤ 1 denotes theratio of membership that node vi belongs to Pk. If node ibelongs to only one community, ui,k = 1, and it clearly fol-lows that

∑Kk=1 ui,k = 1 for all 1 ≤ i ≤ n. With the detected

communities of AOCCM(Ai), ui,k can be calculated as follows:

ui,k =∑

j=1,...,n∧

lj=l′k δ(vi, Cj

)

∑j=1,...,n δ

(vi, Cj

) (11)

δ(vi, Cj) ={

1 if vi ∈ Cj

0 otherwise.(12)

The process of AOCCM expansion algorithm for overlap-ping (in short as AOCCMO henceforth) is given as follows.The running time of AOCCMO is mainly consumed inlines 6–9 of Algorithm 3, which is calculating the member-ship that node i belongs to Pk. The total time of those steps isO(nK). Adding the local community extraction steps, the totaltime complexity is O(m∗k∗ + n∗ log n∗ + nK) for AOCCMO.

In the following example, we show how to detect commu-nities hidden in the distributed network (see Fig. 3) using theAOCCM algorithm. The original distributed network includes12 nodes and 20 edges, the weight on the edges betweennodes v1, v5, and v9 is 3, and 1 on other edges. We will focuson a single agent and observe its local community at differenttime step during its entire life-cycle. In this case, we chooseagent A1, and the final time T is set to be 12.

First, A1 initializes itself. The time of its clock is set to 0,and its local community and boundary area at time 0 are setto C1(0) = {v1}, B1(0) = {v2, v3, v5, v9}, respectively.

Then, A1 starts its active phase. After calculating the struc-ture similarities in step 7, it selects node v5 from B1(0),which has the highest structural similarity with the nodesin C1(0). In step 8, �WC1(0)(v5) is calculated to deter-mine whether node v5 should be added into C1(1) or not.

Page 7: 986 IEEE TRANSACTIONS ON CYBERNETICS, VOL. 46, NO. 4 ...cie.nufe.edu.cn/__local/...E5E4740E_21AAEC.pdf?e=.pdf · community detection methods working on the global view fail to handle

992 IEEE TRANSACTIONS ON CYBERNETICS, VOL. 46, NO. 4, APRIL 2016

Fig. 3. Illustrative example for AOCCM, AOCCMnO, and AOCCMO.

As �WC1(0)(v5) > 0, node v5 is added into C1(1), and B1(1)

is refreshed to be {v2, v3, v6, v7, v8, v9}. Next, in step 14,A1 updates its clock, the time of its clock is 1. In step 15,A1 checks whether it has reached a convergent status byobserving its boundary area. As B1(1) is not empty and thecurrent time is less than T , A1 goes to step 7 and starts anew iteration. After finishing the second iteration, the localcommunity and boundary area of A1 and time 2 becomes:C1(2) = {v1, v5, v9}, B1(2) = {v2, v3, v6, v7, v8, v10, v11, v12},respectively. As B1(2) is not empty and the current time isstill less than T . A1 goes to step 7 and starts the third itera-tion, after which the local community of A1 at time 3 keepsunchanged while its boundary area B1(3) is shrank to be{v3, v6, v7, v8, v10, v11, v12}. Once again, A1 has not reached itsconvergent status and continues the iterative process until itsclock time reaches 10. In this case, the boundary area B1(10)

is empty, so it quits the cycle of community updating.Finally, A1 records its convergent local community as

C1 = C1(10) = {v1, v5, v9}, and further selects the identifierof the node with maximum degree (e.g., v5) as the labelof detected community. The entire life-cycle of agent A1 iscompleted and A1 becomes inactive.

Similarly, we can observe the activities of other agents.When we acquire the attributive tags of all 12 nodes, theglobal nonoverlapping community structure has been detected.Finally, all local communities are further assembled to be amembership matrix U = [ui,k], which promulgates the globaloverlapping community structure.

V. EVALUATION OF THE AOC SYSTEM

Four real-world undirected networks: 1) Karate;2) National Collegiate Athletic Associa-tion (NCAA); 3) Facebook; and 4) pretty goodprivacy (PGP) are used to validate the AOC system.Although these networks are given as centralized represen-tations, for the purpose of testing our distributed method,here we treat them as distributed networks, i.e., we considerthat their nodes and links are distributed (e.g., over differentsources, or geographical locations). Some characteristics ofthese networks are shown in Table II, where |V| and |E|indicate the numbers of nodes and edges respectively in thenetwork, and k indicates the average degree. Karate isa well known social network that describes the friendshiprelations between members of a karate club. NCAA is a

TABLE IIREAL-WORLD NETWORKS FOR EXPERIMENTS

Fig. 4. Comparison on efficiency.

representation of the schedule of American Division I collegefootball games. Vertices in the network represent teams,which are divided into eleven communities (or conferences)and five independent teams. Edges represent regular seasongames between the two teams they connect. Facebook hasbeen anonymized by replacing the Facebook-internal ids foreach user with a new value. Each edge tells whether twousers have the same political affiliations. PGP is a large scalesocial network, where each node represents a peer and eachtie points out that one peer trusts the other.

A. Performance of AOCCM

1) Effectiveness: To test the effectiveness of our approach,the results of AOCCM are compared with the ground truthcommunities of each network. To be special, let Ti be theground truth community including the node vi, we can com-pare Ti and Ci in the framework of precision, recall, andf-measure (PRF) to assess our results. A higher value ofprecision (P) indicates fewer wrong classifications, while ahigher value of recall (R) indicates less false negatives. It iscommon to use the harmonic mean of both measurements,called F-measure, which weighs precision and recall equally

Page 8: 986 IEEE TRANSACTIONS ON CYBERNETICS, VOL. 46, NO. 4 ...cie.nufe.edu.cn/__local/...E5E4740E_21AAEC.pdf?e=.pdf · community detection methods working on the global view fail to handle

BU et al.: LOCAL COMMUNITY MINING ON DISTRIBUTED AND DYNAMIC NETWORKS FROM A MULTIAGENT PERSPECTIVE 993

TABLE IIIACCURACY COMPARISON ON REAL-WORLD NETWORKS

important. They are calculated as follows:

P(vi) = |Ci ∩ Ti||Ci| (13)

R(vi) = |Ci ∩ Ti||Ti| (14)

F1(vi) = 2P(vi)R(vi)

P(vi)+ R(vi). (15)

Since the last two networks (Facebook and PGP) haveno ground truth, we apply FUC [7] to identify communitieswith the global structure information of these two networks,and utilize its detection results as the ground truth for the

AOCCM and LCD methods. This is based on the intuitionthat an LCD method is acceptable if it can achieve an approx-imate result as a GCD approach does, because LCD methodsusually perform faster than GCD approaches. In addition,since the global community quality metrics such as the well-known modularity metric [5] are not suitable to evaluate thequality of the detected local community, we use each nodein a community as a seed and report algorithms’ averageprecision, recall, and F1-measure on this community. We com-pare AOCCM with classical degree-based LCD algorithmsLuo-Wang-Promislow [17], ELC [16], and a similarity-basedalgorithm LTE [23]. The comparison results are presented

Page 9: 986 IEEE TRANSACTIONS ON CYBERNETICS, VOL. 46, NO. 4 ...cie.nufe.edu.cn/__local/...E5E4740E_21AAEC.pdf?e=.pdf · community detection methods working on the global view fail to handle

994 IEEE TRANSACTIONS ON CYBERNETICS, VOL. 46, NO. 4, APRIL 2016

Fig. 5. AOCCMnO on small social networks. (a) Karate: ground truth. (b) Karate: AOCCMnO results. (c) NCAA: ground truth. (d) NCAA: AOCCMnOresults.

in Table III. Note that FUC totally detects 85 communities onPGP, from which ten large communities in terms of size areselected to compare the accuracy. In Table III, we can observethat: 1) the recall values for all methods are overall worsethan precision values, this is because LCD methods are basedon the greedy search, which will trend to find a local opti-mal solution; 2) AOCCM almost achieves the high precisionfor all datasets, which demonstrate the superiority of its localsearch model over the other methods; and 3) AOCCM usu-ally outperforms LMR and ELC, and have a slight advantageover LTE, even though the later has been proven by extensiveexperiments to be one of the most accurate algorithms amongprevious LCD methods in [23].

2) Efficiency: We further compare the efficiency ofAOCCM, LMR, ELC, and LTE, Fig. 4 shows the averagerunning time of LCD methods starting from each node inthe four test graphs. Apparently, the execution of AOCCM ismore accelerated. Both AOCCM and LTE are similarity-basedalgorithms, their difference lies at the definition of local mod-ularity. Compared with AOCCM, the calculation of modularitygain in LTE is more complex, which will consume extra time.LMR and ELC are degree-based LCD algorithms, which need

calculate the quantitative metrics for each node in B. Themetric calculations are somewhat duplicate, which can not besimplified. Especially, the stopping criteria for ELC is to judewhether the current community is a “p-strong community,”which will cost more time in every search step.

B. Performance of AOCCMnO

Here, we first apply AOCCMnO to the two small socialnetworks with ground truth: 1) Karate and 2) NCAA. Thepurpose is to gain a direct understanding of nonoverlappingcommunity detection by network visualization. Then, we fur-ther compare AOCCMnO with classical GCD methods, suchas FNM [5], FUC [7], METIS [33], and Cluto [34].Karate is split into two parties following a disagreement

between an instructor (node 1) and an administrator (node 34),which serves as the ground truth about the communitiesin Fig. 5(a). We employ AOCCMnO to extract nonoverlap-ping communities from the network. The result is shown inFig. 5(b), which supplements the division of the club withmore information. More interestingly, AOCCMnO actuallytends to partition this network into four rather than two

Page 10: 986 IEEE TRANSACTIONS ON CYBERNETICS, VOL. 46, NO. 4 ...cie.nufe.edu.cn/__local/...E5E4740E_21AAEC.pdf?e=.pdf · community detection methods working on the global view fail to handle

BU et al.: LOCAL COMMUNITY MINING ON DISTRIBUTED AND DYNAMIC NETWORKS FROM A MULTIAGENT PERSPECTIVE 995

TABLE IVMODULARITY AND RUNNING TIME COMPARISON BY AOCCMNO, FNM [5], FUC [7], METIS [33], AND CLUTO [34]

communities, as indicated by the nodes in four colors/shapesin Fig. 5(b). This implies that there exits a latent subparty(including nodes 6, 7, and 11) inside the party led by node 1,and a latent subparty (including nodes 25, 26, 32) inside theparty led by node 34.

The ground truth of NCAA labels nodes with their actualconferences, corresponding twelve different colors/shapes inFig. 5(c). As shown in Fig. 5(d), AOCCMnO generallywell captures the “sharp-cut” teams in conferences “AC,”“BE,” “Ten,” “SE,” “MV,” “WA,” and “Twelve,” respectively,although there yet exists some teams assigned mistakenly.Note that nearly all the “orangered rectangle” in Fig. 5(c) aretotally detected mistakenly by AOCCMnO. This is indeed rea-sonable since those nodes have very few internal connections,actually, they represent five independent teams (Utah State,Navy, Notre Dame, Connecticut, and Central Florida) inNCAA.

1) Modularity and Running Time Comparison: The globalnonoverlapping community structure can be evaluated by somepredefined quantitative criterions, in which, the modularityof [5] is one of most popular quality functions. Modularitycan then be written as follows:

Q = 1

2m

ij

(

Aij − kikj

2m

)

χ(li, lj

)(16)

where the χ -function yields one if nodes vi and vj are in thesame community (li = lj), zero otherwise.

In order to verity the effectiveness of AOCCMnO,we compare it with classical GCD methods, such asFNM [5], FUC [7], METIS [33], and Cluto [34]. For eachmethod/network, Table IV displays the modularity that isachieved and the running time. The modularity obtained byAOCCMnO are slightly lower than FUCs, but it outperformsnearly all the other methods. In terms of running time, METIShas a great advantage due to its powerful parallel processingmodules. However, it performs poorly on graphs with obscurecommunity structure, e.g., Karate and NCAA. AOCCMnO,on the contrary, keeps a nice balance between high modularityand short running time.

C. Performance of AOCCMO

To evaluate the performance of AOCCMO, we also employthe PRF framework. Let Ck be the kth overlapping community,which obeys C1∪· · ·∪CK ⊆ V . In the following, we introducea membership threshold α, 0 < α ≤ 1, to control the scale atwhich we want to observe the overlapping communities in anetwork.

Fig. 6. Accuracy for different α on the four test networks. (a) Karate.(b) NCAA. (c) Facebook. (d) PGP.

Definition 5 (α-Overlapping Community): The kthα-overlapping community, denoted by Ck(α), is defined as

Ck(α) = {vi|ui,k ≥ α}. (17)

Therefore, we can use each node in a overlapping commu-nity as a seed and report AOCCMOs average precision, recall,and F1-measure. The precision (P(α)), recall (R(α)), andF1-measure (F1(α)) of the detected α-overlapping communitystructure are defined as follows:

P(α) =

∑k=1,...,K

∑vi∈Ck(α)

∣∣∣Ck(α)∩Ti

∣∣∣

∣∣∣Ck(α)

∣∣∣

∑k=1,...,K

∑vi∈Ck(α)

1(18)

R(α) =∑

k=1,...,K∑

vi∈Ck(α)

∣∣∣Ck(α)∩Ti

∣∣∣

|Ti|∑k=1,...,K

∑vi∈Ck(α)

1(19)

F1(α) = 2P(α)R(α)

P(α)+ R(α). (20)

Fig. 6 shows the accuracy in the function of α for the fourtest graphs, from which we can observe that: 1) the recallvalues for AOCCMO have a significant improvement in allscales, compared with previous AOCCM algorithms; 2) thevalues of α in the range [0.6, 0.8] are optimal, in the sensethat overlapping communities extracted by AOCCMO in thisregion have a high F1-measure; and 3) AOCCMO performsbetter in dense networks rather than in sparse networks.

Page 11: 986 IEEE TRANSACTIONS ON CYBERNETICS, VOL. 46, NO. 4 ...cie.nufe.edu.cn/__local/...E5E4740E_21AAEC.pdf?e=.pdf · community detection methods working on the global view fail to handle

996 IEEE TRANSACTIONS ON CYBERNETICS, VOL. 46, NO. 4, APRIL 2016

VI. INCREMENTAL AOC-BASED METHOD FOR

MINING DYNAMIC NETWORKS

In real world, an AOC system could be updated peri-odically depending on new local updates. We can useG = {G1, G

2, . . . , GT} to denote a collection of snapshot

graphs for a given dynamic network over T discrete time steps.

Let Cl = {Cl

1, . . . , Clnl} be the archived objective of the AOC

system at time l, where nl is the total number of agents. Theproblem of incremental community detection can be simplifiedto accurately and efficiently compute C

l+1 when the networkis updated from G

l to Gl+1.

One immediate approach to solve the above problem is todirectly apply the AOCCM algorithm on each agent in theupdated network as discussed in Section IV. Obviously, thestrategy of recalculating is not efficient as it overlooks the oldcommunity structure in the previous snapshot. To address thisissues, we try to find an incremental function �

∗, which canfigure out the new community structure based on the previousarchived objective and the incremental update

Cl = �

∗(C

l−1,�Gl)

(21)

where �Gl = (�Vl,�El) = G

l−Gl−1 denotes the incremen-

tal update of the network G at time l.

A. Incremental AOC-Based Method

In the incremental AOC-based method (in short asAOCCM-i henceforth), the network to be mined is dynami-cally changing, that will trigger the agents to detect the newcommunity structure. We can understand the AOCCM-i algo-rithm as an iterative process consisting of a series of discreteevolutionary cycles. In the lth evolutionary cycle, the newobjective of agent Ai can be quickly derived based on itsprevious local community (Cl−1

i ) and the incremental updateof the network (�G

l). The life-cycle of agent Ai in the lthevolutionary cycle is given in Algorithm 4.

Remark: In Algorithm 4, the initial local community ofagent Ai is set to be C

l−1i , which is the obtained objective

in the previous cycle. In the AOC system, once the net-work is updated, agent Ai can quickly monitor the changesin its local environment by the cooperation among agents.If the incremental update, �G

l, happens to agent Ai, wehave Bl

i(0) �= ∅, which will be initialized as {vj|vj �∈ Cl−1i ,

vk ∈ Cl−1i ,< vj, vk, wjk >∈ �El}. Otherwise, Bl

i(0) = ∅,agent Ai will directly turn to the inactive phase. Except forthe initialization phase, the life-cycle of agent Ai in AOCCM-iis almost the same as that in the AOCCM algorithm, asshown in Algorithm 1. Like AOCCM, AOCCM-i can bealso expanded to detect the global nonoverlapping and over-lapping community structures of the dynamic distributednetworks.

Fig. 7 shows an example of AOCCM-i, which focuses onagent A8 from its l− 1th evolutionary cycle to the lth one. Inthe previous evolutionary cycle, A8 detects a local communityincluding nodes {v8, v6, v7, v5}; then the network is updatedby adding into three new edges {e15, e19, e59}. A8 monitorsthe changes, and initializes its local community and boundary

Algorithm 4 Life-Cycle of Ai in the lth Evolutionary Cycle1: /*Initialization phase*/2: t← 0;3: Cl

i(0)← Cl−1i ;

4: Bli(0)← {vj|vj �∈ C

l−1i , vk ∈ C

l−1i ,< vj, vk, wjk >∈ �El};

5: if Bli(0) = ∅ then

6: go to Step 22;7: end if8: /*Active phase*/9: while t < T do

10: v∗j = arg maxvj∈Bli(t)

∑vj∈Cl

i (t)sij;

11: if �WCli (t)

(v∗j ) > 0 then

12: Bli(t + 1)← Bl

i(t) ∪ {vk|vk ∈ �j∗ , vk /∈ Cli(t)} − {v∗j };

13: Cli(t + 1)← Cl

i(t) ∪ {v∗j };14: else15: Bl

i(t + 1)← Bli(t)− {v∗j };

16: end if17: t← t + 1;18: if Bl

i(t) = ∅ then19: break;20: end if21: end while22: /*Inactive phase*/23: C

li = Ci(t);

24: lli ← arg maxvj∈Clikj;

Fig. 7. Illustrative example for AOCCM-i. (a) l − 1th evolutionary cycle.(b) Network updating. (c) lth evolutionary cycle.

area to be Cl8(0) = {v8, v6, v7, v5} and Bl

8(0) = {v1, v9},respectively. Next, A8 starts its active phase and continuesthe iterative searching process until its clock time reachesT or it has reached its convergent status. Finally, the localcommunity detected by A8 at the lth evolutionary cycleis {v8, v6, v7, v5, v1, v9}. In Fig. 8, we further show how touse AOCCM-i to mine the dynamic NCAA. In each update ofNCAA, 100 new edges are added. The entire process containssix evolutionary circles. Fig. 8(a)–(f) presents the snapshotsof NCAA after each evolutionary cycle, respectively. The finalNCAA from the entire process is shown in Fig. 8(f), from whichwe can see that the detected community structure is almost thesame as that found by the AOCCMnO.

B. Performance Analysis of the AOCCM-i Method

We first take the similar method as in [20] to analyzethe computational complexity of AOCCM-i. The time steps

Page 12: 986 IEEE TRANSACTIONS ON CYBERNETICS, VOL. 46, NO. 4 ...cie.nufe.edu.cn/__local/...E5E4740E_21AAEC.pdf?e=.pdf · community detection methods working on the global view fail to handle

BU et al.: LOCAL COMMUNITY MINING ON DISTRIBUTED AND DYNAMIC NETWORKS FROM A MULTIAGENT PERSPECTIVE 997

Fig. 8. Process of mining the dynamic NCAA. (a) First evolutionary cycle: n1 = 42 and m1 = 100. (b) Second evolutionary cycle: n2 = 64 and m2 = 200.(c) Third evolutionary cycle: n3 = 79 and m3 = 300. (d) Fourth evolutionary cycle: n4 = 93 and m4 = 400. (e) Fifth evolutionary cycle: n5 = 104 andm5 = 500. (f) Sixth evolutionary cycle: n6 = 115 and m6 = 616.

required by Ai to finish local community mining in the lthevolutionary cycle is defined as tli. Without considering theparallel computing, the total time steps required by all agents

in the lth evolutionary cycle can be calculated as τ l =∑nl

i=1 tli.Thus, we can approximately evaluate the efficiency of the

AOCCM-i method using τ l. Smaller value of τ l indicates thatless time is required in the lth evolutionary cycle. In the exper-iments, for each network, we use three different strategies tomine its community structure. The first one is based on therecalculating strategy, that is, directly applying AOCCMnO onthe updated network for nonoverlapping community detection.The second one is based on (21), in which, the new communitystructure will be detected by AOCCM-i based on the previousobjective and the incremental update of the network. The lastone is AOC-i proposed by Yang et al. [20]. Note that all net-works used here are considered as dynamic ones, which growgradually by adding a fixed number (μ) of edges in each evo-lutionary cycle. As shown in Fig. 9, AOCCM-i outperformsAOC-i, and the τ value of AOCCM-i is smaller than AOCCMs

by nearly two magnitude. Our method works efficiently andrequires much less time to deal with the new network aftereach update.

In each evolutionary cycle of AOCCM-i, a fixed number (μ)of edges are added into the network, which can be seen as theupdating speed of a dynamic network. Fig. 10 further presentsthe relationship between the average τ value and the updatingspeed μ, from which, we can see that Avg. τ grows withthe increase of μ. This implies that if the average convergentspeed of agents could be matched with the updating speed ofthe dynamic network, AOCCM-i will perform well withoutany delay; otherwise, it may result in some delay.

We finally compare AOCCM-i and AOC-i on the effec-tiveness. For each algorithm/network, Fig. 11(a)–(d) displaysthe modularity that is achieved in each evolutionary cycle.As shown in Fig. 11, the Q values of AOCCM-i are largerthan AOC-is on the first three networks, while AOCCM-i isslightly inferior to AOC-i on PGP. This might due to the strictdefinition of the modularity gain, which decides whether the

Page 13: 986 IEEE TRANSACTIONS ON CYBERNETICS, VOL. 46, NO. 4 ...cie.nufe.edu.cn/__local/...E5E4740E_21AAEC.pdf?e=.pdf · community detection methods working on the global view fail to handle

998 IEEE TRANSACTIONS ON CYBERNETICS, VOL. 46, NO. 4, APRIL 2016

Fig. 9. Efficiency comparisons between AOCCM, AOCCM-i, and AOC-i onthe four test networks. (a) Karate: μ = 1. (b) NCAA: μ = 8. (c) Facebook:μ = 1100. (d) PGP: μ = 350.

Fig. 10. Average τ value with different μ.

Fig. 11. Efficiency comparisons between AOCCM, AOCCM-i, and AOC-i onthe four test networks. (a) Karate: μ = 1. (b) NCAA: μ = 8. (c) Facebook:μ = 1100. (d) PGP: μ = 350.

candidate node should be added into the local community.As PGP is a large sparse network compared with other net-works, the local search model by AOCCM-i might quicklyreach the convergent status. One possible solution is to relax

the selection criteria, e.g., redefine the modularity gain as�WCi(t)(vj) = W(Ci(t) ∪ vj)/W(Ci(t)), and then use a thresh-old to control the candidate node selection. We will investigatethis in our further work.

VII. CONCLUSION

Real-life networks are distributed and composed of a set ofsocial actors and their interaction relations. Multiagent technolo-gies have already achieved significant success in past years,especially for modeling and analyzing autonomous and dis-tributed multientity systems. The questions then arise of how toconnect real-life networks and MASs and how to use multiagenttechnologies to model and analyze the community structures ofreal-life networks. This paper attempts to answer this problemby surveying real-life networks from a multiagent perspective.

In this paper, we have proposed an AOC-based method forcommunity mining (AOCCM), in which, each actor in thedistributed networks is associated with an agent who sponta-neously interacts with the local environment to find the localcommunity. The identifier of core node in the local commu-nity can be seen as its community tag in the global view. Ourmethod can be easily expanded to find the global nonover-lapping and overlapping community structures. Experimentalresults on real-life networks have demonstrated the advantageof AOCCM over previous LCD methods by either effec-tiveness or efficiency. Furthermore, we have described howAOCCM can be expanded to a more efficient incrementalAOC-based method (AOCCM-i) for mining communities fromdynamic and distributed networks. With the experiment, wefound that if the average convergent speed in each evolution-ary cycle is matched with the updating speed of a dynamicnetwork, the performance of the AOCCM-i algorithm mightbe good without any delay.

One limitation of this paper is the core node selection: inAOCCM, large node in terms of degree is selected as the corenode, the identifier of which is used as its community labelin the global version. However, this assumption may be toostrong, for example, in social networks, a person with widesocial relations may act as an intermediary between differentcommunities. In our future work, we will investigate the com-munity mapping technology, which automatically generatescommunity label according to the detected local community.

Another interesting topic for the future work is to considerthe dynamic change in the AOC system. In this paper, the inter-action relations between old nodes are fixed during communitymining. However, in reality the interaction between any twoactors can arbitrarily generate and disappear, even the weightof interaction can dynamically change with the time. Such adynamic situation may bring about new problems to our cur-rent incremental computing model. Therefore, it is essential todevise feasible approaches to deal with the emergent problemsin the dynamic AOC system.

REFERENCES

[1] R. Guimerà, S. Mossa, A. Turtschi, and L. N. Amaral, “The worldwideair transportation network: Anomalous centrality, community structure,and cities’ global roles,” Proc. Nat. Acad. Sci., vol. 102, no. 22,pp. 7794–7799, May 2005.

Page 14: 986 IEEE TRANSACTIONS ON CYBERNETICS, VOL. 46, NO. 4 ...cie.nufe.edu.cn/__local/...E5E4740E_21AAEC.pdf?e=.pdf · community detection methods working on the global view fail to handle

BU et al.: LOCAL COMMUNITY MINING ON DISTRIBUTED AND DYNAMIC NETWORKS FROM A MULTIAGENT PERSPECTIVE 999

[2] Y.-Y. Ahn, J. P. Bagrow, and S. Lehmann, “Link communities revealmultiscale complexity in networks,” Nature, vol. 466, no. 7307,pp. 761–764, 2010.

[3] B. Yang, D. Liu, and J. Liu, “Discovering communities from social net-works: Methodologies and applications,” in Handbook of Social NetworkTechnologies and Applications. New York, NY, USA: Springer, 2010,pp. 331–346.

[4] Z. Lu, X. Sun, Y. Wen, G. Cao, and T. La Porta, “Algorithms and appli-cations for community detection in weighted networks,” IEEE Trans.Parallel Distrib. Syst., doi: 10.1109/TPDS.2014.2370031, 2015.

[5] M. E. Newman and M. Girvan, “Finding and evaluating communitystructure in networks,” Phys. Rev. E, vol. 69, no. 2, 2004, Art. ID 026113.

[6] Z. Bu, C. Zhang, Z. Xia, and J. Wang, “A fast parallel modularity opti-mization algorithm (FPMQA) for community detection in online socialnetwork,” Knowl. Based Syst., vol. 50, pp. 246–259, Sep. 2013.

[7] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, “Fastunfolding of communities in large networks,” J. Statist. Mech. TheoryExp., vol. 2008, no. 10, 2008, Art. ID P10008.

[8] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of StatisticalLearning. New York, NY, USA: Springer, 2009.

[9] A. Hlaoui and S. Wang, “A direct approach to graph clustering,” in Proc.Neural Netw. Comput. Intell. Conf., Grindelwald, Switzerland, 2004,pp. 158–163.

[10] B. Yang and J. Liu, “Discovering global network communities based onlocal centralities,” ACM Trans. Web, vol. 2, no. 1, 2008, Art. ID 9.

[11] L. Yang, X. Cao, D. Jin, X. Wang, and D. Meng, “A unified semi-supervised community detection framework using latent space graph reg-ularization,” IEEE Trans. Cybern., doi: 10.1109/TCYB.2014.2377154,2015.

[12] F. Luccio and M. Sami, “On the decomposition of networks in minimallyinterconnected subnetworks,” IEEE Trans. Circuit Theory, vol. 16, no. 2,pp. 184–188, May 1969.

[13] F. Radicchi, C. Castellano, F. Cecconi, V. Loreto, and D. Parisi,“Defining and identifying communities in networks,” Proc. Nat. Acad.Sci. USA, vol. 101, no. 9, pp. 2658–2663, Mar. 2004.

[14] T. Zhang and B. Wu, “A method for local community detection byfinding core nodes,” in Proc. IEEE Comput. Soc. Int. Conf. Adv. Soc.Netw. Anal. Min. (ASONAM). Istanbul, Turkey, 2012, pp. 1171–1176.

[15] J. Chen, O. Zaïane, and R. Goebel, “Local community identificationin social networks,” in Proc. IEEE Int. Conf. Adv. Soc. Netw. Anal.Min. (ASONAM), Athens, Greece, 2009, pp. 237–242.

[16] J. P. Bagrow, “Evaluating local community methods in networks,”J. Statist. Mech. Theory Exp., vol. 2008, no. 5, 2008, Art. ID P05001.

[17] F. Luo, J. Z. Wang, and E. Promislow, “Exploring local communitystructures in large networks,” Web Intell. Agent Syst., vol. 6, no. 4,pp. 387–400, 2008.

[18] J. Liu, X. Jin, and K. C. Tsui, “Autonomy-oriented computing (AOC):Formulating computational systems with autonomous components,”IEEE Trans. Syst., Man, Cybern. A, Syst., Humans, vol. 35, no. 6,pp. 879–902, Nov. 2005.

[19] J. Liu, “Autonomy-oriented computing (AOC): The nature and impli-cations of a paradigm for self-organized computing,” in Proc. 4th Int.Conf. Nat. Comput. (ICNC), vol. 1. Jinan, China, 2008, pp. 3–11.

[20] B. Yang, J. Liu, and D. Liu, “An autonomy-oriented computing approachto community mining in distributed and dynamic networks,” Auton.Agents Multi-Agent Syst., vol. 20, no. 2, pp. 123–157, 2010.

[21] J. P. Bagrow and E. M. Bollt, “Local method for detecting communities,”Phys. Rev. E, vol. 72, no. 4, 2005, Art. ID 046108.

[22] A. Clauset, “Finding local community structure in networks,”Phys. Rev. E, vol. 72, no. 2, 2005, Art. ID 026132.

[23] J. Huang, H. Sun, Y. Liu, Q. Song, and T. Weninger, “Towardsonline multiresolution community detection in large-scale networks,”PLoS ONE, vol. 6, no. 8, 2011, Art. ID e23829.

[24] K. Li and Y. Pang, “A vertex similarity probability model for findingnetwork community structure,” in Advances in Knowledge Discoveryand Data Mining. Berlin, Germany: Springer, 2012, pp. 456–467.

[25] H.-H. Chen, L. Gou, X. L. Zhang, and C. L. Giles, “Discovering missinglinks in networks using vertex similarity measures,” in Proc. 27th Annu.ACM Symp. Appl. Comput., Trento, Italy, 2012, pp. 138–143.

[26] Y. Jiang and J. Jiang, “Understanding social networks from a multia-gent perspective,” IEEE Trans. Parallel Distrib. Syst., vol. 25, no. 10,pp. 2743–2759, Oct. 2014.

[27] W. Wang and Y. Jiang, “Community-aware task allocation for socialnetworked multiagent systems,” IEEE Trans. Cybern., vol. 44, no. 9,pp. 1529–1543, Sep. 2014.

[28] W. Chen, Z. Liu, X. Sun, and Y. Wang, “Community detection in socialnetworks through community formation games,” in Proc. Int. Joint Conf.Artif. Intell. (IJCAI), Barcelona, Spain, 2011, pp. 2576–2581.

[29] S. Asur, S. Parthasarathy, and D. Ucar, “An event-based frameworkfor characterizing the evolutionary behavior of interaction graphs,”ACM Trans. Knowl. Disc. Data, vol. 3, no. 4, 2009, Art. ID 16.

[30] Y. Chi, X. Song, D. Zhou, K. Hino, and B. L. Tseng, “Evolutionaryspectral clustering by incorporating temporal smoothness,” in Proc. 13thACM SIGKDD Int. Conf. Knowl. Disc. Data Min., San Jose, CA, USA,2007, pp. 153–162.

[31] Y.-R. Lin, Y. Chi, S. Zhu, H. Sundaram, and B. L. Tseng, “Facetnet:A framework for analyzing communities and their evolutions in dynamicnetworks,” in Proc. 17th Int. Conf. World Wide Web, Beijing, China,2008, pp. 685–694.

[32] Y. Zhao, E. Levina, and J. Zhu, “Community extraction for socialnetworks,” Proc. Nat. Acad. Sci., vol. 108, no. 18, pp. 7321–7326, 2011.

[33] G. Karypis, “Multi-constraint mesh partitioning for contact/impact com-putations,” in Proc. ACM/IEEE Conf. Supercomput., Phoenix, AZ, USA,2003, p. 56.

[34] G. Karypis, E.-H. Han, and V. Kumar, “Chameleon: Hierarchical clus-tering using dynamic modeling,” Computer, vol. 32, no. 8, pp. 68–75,Aug. 1999.

Zhan Bu received the Ph.D. degree in computer sci-ence from the Nanjing University of Aeronautics andAstronautics, Nanjing, China, in 2014.

He is currently a Lecturer with the JiangsuProvincial Key Laboratory of E-Business, NanjingUniversity of Finance and Economics, Nanjing. Hiscurrent research interests include social networkanalysis, complex network, and data mining.

Dr. Bu is a member of China ComputerFederation (CCF)

Zhiang Wu (M’12) received the Ph.D. degreein computer science from Southeast University,Nanjing, China, in 2009.

He is currently an Associate Professor with theJiangsu Provincial Key Laboratory of E-Business,Nanjing University of Finance and Economics,Nanjing. His current research interests include dis-tributed computing, social network analysis, and datamining.

Dr. Wu is a member of ACM and CCF.

Jie Cao received the Ph.D. degree from SoutheastUniversity, Nanjing, China, in 2002.

He is currently a Professor and the Dean ofthe School of Information Engineering, NanjingUniversity of Finance and Economics, Nanjing. Hiscurrent research interests include cloud computing,business intelligence, and data mining.

Prof. Cao was a recipient of the Young andMid-Aged Expert with Outstanding Contribution inJiangsu Province and was selected in the Programfor New Century Excellent Talents in University.

Yichuan Jiang (SM’13) received the Ph.D. degreefrom Fudan University, Shanghai, China, in 2002.

He is currently a Full Professor and the Directorof the Distributed Intelligence and Social ComputingLaboratory, School of Computer Science andEngineering, Southeast University, Nanjing, China.His current research interests include multiagentsystems, social networks, social computing, andcomplex distributed systems.

Prof. Jiang is a member of ACM.