
An Efficient Algorithm Based on Self-adapted Fuzzy C-Means Clustering for Detecting Communities in Complex Networks

Jianzhi Jin (1), Yuhua Liu (1), Kaihua Xu (2), Fang Hu (1)

(1) Department of Computer Science, HuaZhong Normal University, Wuhan, 430079, China

(2) College of Physical Science and Technology, HuaZhong Normal University, Wuhan, 430079, China

Email: [email protected]

2011.12.16

Outline

Introduction

Self-adapted Fuzzy C-Means Clustering in Complex Networks

Simulations and Analysis

Conclusion and Future Works

Introduction(1/6)

Many complex networked systems are found to divide naturally into modules or communities: groups of vertices with relatively dense connections within groups but sparser connections between them.

Detecting communities can provide invaluable help in understanding and visualizing the structure of networks.

Introduction(2/6)

Requirements for detecting communities:

High efficiency and high accuracy

Based on sound theoretical principles

No cut-node or cut-link is allowed

Introduction(3/6)

Detecting Communities Validation Metrics

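Modularity

$$Q = \sum_{s=1}^{K} \left[ \frac{m_s}{m} - \left( \frac{d_s}{2m} \right)^2 \right]$$

Accuracy

$$\mathrm{Accuracy} = \frac{\sum_{v=1}^{n} \mathrm{equal}(l_{tv}, l_{pv})}{n}$$

Density

$$P = \frac{\sum_{s=1}^{K} m_s}{m}$$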
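As a rough illustration of the three metrics, here is a minimal Python sketch. It assumes the usual reading of the symbols, which the slide does not spell out: m is the total number of links, m_s the number of links inside community s, d_s the sum of the degrees of the nodes in community s, and l_tv / l_pv the true and predicted labels of node v. The function names `community_metrics` and `accuracy` are hypothetical, and the accuracy function assumes predicted labels have already been matched to the true label names.

```python
def community_metrics(edges, labels):
    """Return (modularity Q, density P) of a partition of an undirected graph.

    edges  : list of (u, v) pairs, each undirected link listed once
    labels : dict mapping node -> community id
    """
    m = len(edges)   # total number of links
    m_s = {}         # links with both endpoints inside community s
    d_s = {}         # sum of node degrees in community s
    for u, v in edges:
        for node in (u, v):
            d_s[labels[node]] = d_s.get(labels[node], 0) + 1
        if labels[u] == labels[v]:
            m_s[labels[u]] = m_s.get(labels[u], 0) + 1
    q = sum(m_s.get(s, 0) / m - (d / (2 * m)) ** 2 for s, d in d_s.items())
    p = sum(m_s.values()) / m
    return q, p


def accuracy(true_labels, predicted_labels):
    """Fraction of nodes whose predicted label equals the true label."""
    hits = sum(1 for v in true_labels if true_labels[v] == predicted_labels[v])
    return hits / len(true_labels)


# Two triangles joined by a single link, split into their natural communities.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
labels = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
q, p = community_metrics(edges, labels)
print(q, p, accuracy(labels, labels))
```

For this toy graph, six of the seven links fall inside a community, so the density is 6/7.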

Introduction(4/6)

FCM in Complex Networks

Fuzzy c-means (FCM) clustering has been applied to detecting communities in recent years.

The mainstream algorithms include AFCM, CFCM and NFCM.

All of them use different variants of the Laplacian matrix of the graph.

Introduction(5/6)

FCM in Complex Networks

AFCM uses the Laplacian matrix $N = D - A$.

CFCM uses $N = D^{-1} A$.

NFCM uses $N = D^{-1/2} (D - A) D^{-1/2}$.

$D$ is the diagonal matrix formed from the degrees of all nodes in the network, and $A$ is the adjacency matrix of the network.
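The three matrix variants can be built directly from an adjacency matrix. The NumPy sketch below is only an illustration: the function name `laplacian_variants` is hypothetical, and it assumes an undirected graph with no isolated nodes, so every degree is positive and the inverses exist.

```python
import numpy as np


def laplacian_variants(A):
    """Return the matrices used by AFCM, CFCM and NFCM for adjacency matrix A."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)                        # node degrees
    D = np.diag(deg)
    n_afcm = D - A                             # N = D - A
    n_cfcm = np.diag(1.0 / deg) @ A            # N = D^{-1} A
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    n_nfcm = d_inv_sqrt @ (D - A) @ d_inv_sqrt # N = D^{-1/2} (D - A) D^{-1/2}
    return n_afcm, n_cfcm, n_nfcm


# Path graph on three nodes: 0 - 1 - 2
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
n_afcm, n_cfcm, n_nfcm = laplacian_variants(A)
print(n_afcm)
print(n_cfcm)
print(n_nfcm)
```

Each row of D - A sums to zero, each row of D^{-1}A sums to one, and the normalized variant is symmetric with ones on its diagonal.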

Introduction(6/6)

FCM in Complex Networks

Good clustering accuracy and running efficiency

Good overall performance

Two deficiencies:

The number of clusters cannot be found automatically

The iteration easily gets stuck in a local extremum

Self-adapted Fuzzy C-Means Clustering in Complex Networks(1/5)

SFCM in Complex Networks

A new FCM-based algorithm for detecting communities: Self-adapted FCM (SFCM).

A new validity function is constructed to find the optimal number of clusters automatically.

Self-adapted Fuzzy C-Means Clustering in Complex Networks(2/5)

A New Validity Function

The inter-cluster distances should be as large as possible.

The intra-cluster distances should be as small as possible.

$$L(c) = \frac{\left( \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m} \, \| v_i - \bar{v} \|^2 \right) (n - c)}{\left( \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m} \, \| x_j - v_i \|^2 \right) (c - 1)}$$
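To make the validity function concrete, the sketch below evaluates L(c) as the ratio of a membership-weighted inter-cluster scatter, scaled by (n - c), to a membership-weighted intra-cluster scatter, scaled by (c - 1). It is a minimal sketch under stated assumptions: the slide does not define the reference point subtracted from each prototype, so the code takes it to be the mean of all data points; the function name `validity` and the array layout (memberships of shape c by n) are also hypothetical.

```python
import numpy as np


def validity(X, U, V, m=2.0):
    """Evaluate L(c) for data X (n, dim), memberships U (c, n), prototypes V (c, dim)."""
    X, U, V = (np.asarray(a, dtype=float) for a in (X, U, V))
    n, c = X.shape[0], V.shape[0]
    W = U ** m
    centre = X.mean(axis=0)  # assumption: reference point is the overall data mean
    # Inter-cluster term: sum_i sum_j u_ij^m * ||v_i - centre||^2
    between = np.sum(W * np.sum((V - centre) ** 2, axis=1)[:, None])
    # Intra-cluster term: sum_i sum_j u_ij^m * ||x_j - v_i||^2
    within = np.sum(W * np.sum((X[None, :, :] - V[:, None, :]) ** 2, axis=2))
    return (between * (n - c)) / (within * (c - 1))


X = [[0, 0], [0, 1], [10, 0], [10, 1]]
# A partition that follows the two obvious groups ...
good = validity(X, [[1, 1, 0, 0], [0, 0, 1, 1]], [[0, 0.5], [10, 0.5]])
# ... and one that cuts across them.
bad = validity(X, [[1, 0, 1, 0], [0, 1, 0, 1]], [[5, 0], [5, 1]])
print(good, bad)
```

The partition that keeps nearby points together scores far higher, which is the behaviour the two requirements on inter-cluster and intra-cluster distances ask for.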

Self-adapted Fuzzy C-Means Clustering in Complex Networks(3/5)

Steps of the Algorithm

Step 1 Initialization: termination condition $\varepsilon > 0$, cluster number $c = 2$, $L(1) = 0$, $k = 0$.

Step 2 The partition matrix $U^{(k)}$ is constructed:

$$u_{ij}^{(k)} = \frac{1}{\sum_{r=1}^{c} \left( \frac{d_{ij}^{(k)}}{d_{rj}^{(k)}} \right)^{\frac{2}{m-1}}}$$

If there exist $j$ and $r$ such that $d_{rj}^{(k)} = 0$, then $u_{rj}^{(k)} = 1$ and $u_{ij}^{(k)} = 0$ for $i \neq r$.

Self-adapted Fuzzy C-Means Clustering in Complex Networks(4/5)

Steps of the Algorithm

Step 3 The prototypes $V^{(k+1)}$ are calculated:

$$v_i^{(k+1)} = \frac{\sum_{j=1}^{n} \left( u_{ij}^{(k)} \right)^m x_j}{\sum_{j=1}^{n} \left( u_{ij}^{(k)} \right)^m}$$

Step 4 If $\| V^{(k+1)} - V^{(k)} \| < \varepsilon$, then stop the iteration; else let $k = k + 1$ and go to Step 2.
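Steps 2 to 4 form the standard fuzzy c-means iteration for a fixed cluster number c. The following NumPy sketch shows one way to write that loop; it is not the authors' implementation. The function name `fcm`, the random choice of initial prototypes from the data points, the Euclidean distance, and the `max_iter` safety cap are all assumptions made for the illustration.

```python
import numpy as np


def fcm(X, c, m=2.0, eps=1e-6, max_iter=300, seed=0):
    """Fuzzy c-means for a fixed c. Returns memberships U (c, n) and prototypes V (c, dim)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Assumption: start from c distinct data points chosen at random.
    V = X[rng.choice(n, size=c, replace=False)]
    for _ in range(max_iter):
        # Step 2: distances d_ij between prototype i and point j, then memberships.
        d = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)
        U = np.zeros_like(d)
        zero = d == 0
        hit = zero.any(axis=0)  # points that coincide with some prototype
        inv = d[:, ~hit] ** (-2.0 / (m - 1.0))
        U[:, ~hit] = inv / inv.sum(axis=0)
        cols = np.flatnonzero(hit)
        # Special case d_rj = 0: full membership in (the first) such cluster r.
        U[zero[:, cols].argmax(axis=0), cols] = 1.0
        # Step 3: prototypes as membership-weighted means of the data.
        W = U ** m
        V_new = (W @ X) / W.sum(axis=1, keepdims=True)
        # Step 4: stop once the prototypes have essentially stopped moving.
        shift = np.linalg.norm(V_new - V)
        V = V_new
        if shift < eps:
            break
    return U, V


X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
U, V = fcm(X, c=2)
print(U.argmax(axis=0))  # hard assignment: cluster with the largest membership
```

Writing the membership as the normalized inverse distance raised to the power 2/(m-1) is algebraically the same as the formula in Step 2 and avoids an explicit double loop over clusters.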

Self-adapted Fuzzy C-Means Clustering in Complex Networks(5/5)

Steps of the Algorithm

Step 5 $L(c)$ is calculated for $c$ between 2 and $n$. If $L(c)$ is the highest value, then stop the algorithm; else go to Step 2 with $c = c + 1$.

Deficiency

The computational complexity is $O(n^3)$.
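The outer loop in Step 5 can be sketched as a search over c driven by the validity score. The slide does not say how "the highest value" is detected, so the code below makes an explicit assumption: it stops at the first c whose score drops below the previous one and returns the previous c, starting from L(1) = 0 as in Step 1. The names `choose_cluster_number` and `score` are hypothetical; `score(c)` stands for running the inner iteration with c clusters and evaluating L(c).

```python
def choose_cluster_number(score, n):
    """Increase c from 2 and return (best_c, best_L) at the first peak of score(c).

    score : callable returning L(c) for a given cluster number c
    n     : number of nodes; c is kept strictly below n (assumption)
    """
    prev = 0.0  # L(1) = 0, as in Step 1
    c = 2
    while c < n:
        cur = score(c)
        if cur < prev:
            # Assumed stopping rule: L(c - 1) was a peak, so stop there.
            return c - 1, prev
        prev = cur
        c += 1
    # No drop was observed; fall back to the largest c that was tried.
    return c - 1, prev


# Hypothetical validity values that peak at c = 4.
values = {2: 1.0, 3: 4.0, 4: 9.0, 5: 3.0, 6: 20.0}
print(choose_cluster_number(values.get, n=7))
```

Stopping at the first peak keeps the number of inner runs small, but it can miss a higher value at a larger c (as the hypothetical value at c = 6 shows); scanning the whole range would avoid that at a higher cost.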

Simulations and Analysis(1/7)

Zachary’s Karate Club

Network of American Football Games

Tests on Computer-generated Networks

Simulations and Analysis(2/7)

Zachary’s Karate Club

Square nodes and circle nodes represent the instructor's faction and the administrator's faction, respectively. The squares are further split into two communities, shown in blue and green, and the circles are likewise split into two communities, shown in red and yellow.

Simulations and Analysis(3/7)

Zachary’s Karate Club

The modularity of all four algorithms is not high.

The modularity of AFCM is substantially lower than that of the others.

The modularity of CFCM is slightly lower than that of NFCM and SFCM.

Algorithm  Communities  Modularity  Density
AFCM       4            0.052433    0.628205
CFCM       4            0.226003    0.730769
NFCM       4            0.227318    0.730769
SFCM       4            0.227318    0.730769

Simulations and Analysis(4/7)

Network of American Football Games

The algorithm automatically finds ten communities, which match ten conferences almost exactly. A total of 11 nodes, marked with red circles, are unclassified or misclassified, giving an accuracy of 90.43%.

Simulations and Analysis(5/7)

Network of American Football Games

The modularity obtained by SFCM is higher than that of the other algorithms, and so is the density.

As before, the number of communities is pre-specified for the first three algorithms.

Algorithm  Communities  Modularity  Density
AFCM       10           0.495357    0.674029
CFCM       10           0.495442    0.673915
NFCM       10           0.494795    0.673475
SFCM       10           0.498077    0.675367

Simulations and Analysis(6/7)

Tests on Computer-generated Networks

RN(c, m, k, p)

Here c is the number of communities in the network, m is the number of nodes in each community, k is the degree of each node, and p is the density defined under the validation metrics.
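The slides do not describe how RN(c, m, k, p) networks are generated, so the sketch below swaps in a simple planted-partition model rather than the authors' generator. It keeps the same four parameters but treats k as the expected degree and p as the expected fraction of a node's links that stay inside its own community. The function name `planted_partition` and the example parameter values are hypothetical.

```python
import numpy as np


def planted_partition(c, m, k, p, seed=0):
    """Random graph with c communities of m nodes, expected degree k,
    and an expected fraction p of links inside communities (needs c >= 2, m >= 2).

    Returns the symmetric 0/1 adjacency matrix and the community labels.
    """
    rng = np.random.default_rng(seed)
    n = c * m
    labels = np.repeat(np.arange(c), m)
    same = labels[:, None] == labels[None, :]
    # Link probabilities chosen so that the expected degree is p*k + (1-p)*k = k.
    p_in = min(1.0, p * k / (m - 1))
    p_out = min(1.0, (1.0 - p) * k / (m * (c - 1)))
    prob = np.where(same, p_in, p_out)
    # Sample each unordered pair once (strict upper triangle), then mirror it.
    upper = np.triu(rng.random((n, n)) < prob, k=1)
    A = (upper | upper.T).astype(int)
    return A, labels


A, labels = planted_partition(c=4, m=32, k=16, p=0.8)
inside = A[labels[:, None] == labels[None, :]].sum() / A.sum()
print(A.shape, A.sum(axis=1).mean(), inside)
```

With p close to 1 almost every link stays inside a community, and with p close to 0 the planted structure disappears, which mirrors the role p plays in the tests.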

Simulations and Analysis(7/7)

Tests on Computer-generated Networks

As p increases from 0 to 1, the community structure in the network becomes more cohesive.

All algorithms correctly cluster all the nodes when p is no less than 0.5.

For p between 0.2 and 0.5, the accuracy of SFCM is better than that of the others.

Conclusion and Future Works

A new validity function is defined in this algorithm to find the optimal cluster number automatically.

The simulation results verify that the algorithm is more complete and accurate.

Its higher computational complexity ultimately limits its performance.

In future research, we will focus on reducing the computational complexity with little loss of precision, and on obtaining the globally optimal solution.

Please Ask Questions

Thank you!