An Efficient Algorithm Based on Self-adapted Fuzzy C-Means Clustering for Detecting
Communities in Complex Networks
Jianzhi Jin1, Yuhua Liu1, Kaihua Xu2, Fang Hu1
1 Department of Computer Science, HuaZhong Normal University, Wuhan, 430079, China
2 College of Physical Science and Technology, HuaZhong Normal University, Wuhan, 430079, China
Email: [email protected]
2011.12.16
Outline
Introduction
Self-adapted Fuzzy C-Means Clustering in Complex Networks
Simulations and Analysis
Conclusion and Future Works
Introduction(1/6)
Many complex networked systems are found to divide
naturally into modules or communities, groups of vertices
with relatively dense connections within groups but
sparser connections between them.
Detecting communities can provide invaluable help in
understanding and visualizing the structure of networks.
Introduction(2/6)
Community Detection Requirements:
High efficiency and high accuracy
Based on sound theoretical principles
No cut-node or cut-link allowed
Introduction(3/6)
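Community Detection Validation Metrics
Modularity
$$Q = \sum_{s=1}^{K} \left[ \frac{m_s}{m} - \left( \frac{d_s}{2m} \right)^{2} \right]$$
Accuracy
$$\mathrm{Accuracy} = \frac{\sum_{v=1}^{n} \mathrm{equal}(l_{tv}, l_{pv})}{n}$$
Density
$$P = \frac{\sum_{s=1}^{K} m_s}{m}$$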
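The three metrics can be sketched in a few lines of NumPy. The slide does not define its symbols, so the readings used here are assumptions: m is the total number of edges, m_s the number of edges inside community s, d_s the sum of the degrees of the nodes in community s, and l_tv, l_pv the true and predicted labels of node v. The function names `community_metrics` and `accuracy` are hypothetical.

```python
import numpy as np

def community_metrics(A, labels):
    """Return (modularity Q, density P) for a symmetric 0/1 adjacency matrix A.

    ASSUMED symbol meanings (not stated on the slide):
    m = total edges, m_s = edges inside community s,
    d_s = sum of the degrees of the nodes in community s.
    """
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    m = A.sum() / 2.0                      # each undirected edge appears twice in A
    Q = 0.0
    P = 0.0
    for s in np.unique(labels):
        idx = labels == s
        m_s = A[np.ix_(idx, idx)].sum() / 2.0   # edges with both ends in s
        d_s = A[idx].sum()                      # total degree of community s
        Q += m_s / m - (d_s / (2.0 * m)) ** 2
        P += m_s / m
    return Q, P

def accuracy(true_labels, pred_labels):
    """Fraction of nodes whose predicted label equals the true label.

    ASSUMPTION: predicted labels are already matched to the true label names.
    """
    t = np.asarray(true_labels)
    p = np.asarray(pred_labels)
    return float(np.mean(t == p))
```

For two triangles joined by a single edge and split into their two natural communities, this gives P = 6/7, since six of the seven edges are internal.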
Introduction(4/6)
FCM in Complex Networks
FCM has been applied to detecting communities in recent years
The mainstream algorithms: AFCM, CFCM, NFCM, etc.
All use different variants of the Laplacian matrix of the graph
Introduction(5/6)
FCM in Complex Networks
The Laplacian matrix $N = D - A$ is used in AFCM
$N = D^{-1}A$ is used in CFCM, and $N = D^{-1/2}(D - A)D^{-1/2}$ is used in NFCM
$D$ is the diagonal matrix formed from the degrees of all nodes in the network, and $A$ is the adjacency matrix of the network.
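As a small illustration, the three matrices can be built from an adjacency matrix as follows. This is a sketch that assumes an undirected graph with no isolated nodes (every degree is positive); the function name `fcm_input_matrices` is hypothetical.

```python
import numpy as np

def fcm_input_matrices(A):
    """Build the matrix N used by AFCM, CFCM and NFCM from adjacency matrix A.

    ASSUMPTION: every node has degree > 0, otherwise D is not invertible.
    """
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)                     # node degrees
    D = np.diag(deg)
    N_afcm = D - A                          # Laplacian, N = D - A
    N_cfcm = np.diag(1.0 / deg) @ A         # N = D^{-1} A
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    N_nfcm = d_inv_sqrt @ (D - A) @ d_inv_sqrt   # N = D^{-1/2} (D - A) D^{-1/2}
    return N_afcm, N_cfcm, N_nfcm
```

Each row of D - A sums to zero, each row of D^{-1}A sums to one, and the normalized form has ones on its diagonal, which makes for a quick sanity check.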
Introduction(6/6)
FCM in Complex Networks
Better clustering accuracy and running efficiency
Good overall performance
Two deficiencies:
Cannot determine the number of clusters automatically
Easily gets stuck in a local extremum
Self-adapted Fuzzy C-Means Clustering in Complex Networks(1/5)
SFCM in Complex Networks
A new FCM-based algorithm for detecting communities: Self-adapted FCM (SFCM).
It constructs a new validity function to find an optimal number of clusters automatically.
Self-adapted Fuzzy C-Means Clustering in Complex Networks(2/5)
A New Validity Function
The inter-cluster distances should be as large as possible
The intra-cluster distances should be as small as possible
$$L(c) = \frac{\sum_{i=1}^{c} \sum_{j=1}^{n} (u_{ij})^{m} \, \|v_i - \bar{v}\|^{2} \, (n - c)}{\sum_{i=1}^{c} \sum_{j=1}^{n} (u_{ij})^{m} \, \|x_j - v_i\|^{2} \, (c - 1)}$$
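A minimal sketch of the validity function, for readers who want to evaluate it numerically. The slide does not define the overall center, so taking it as the mean of all data points is an assumption, as is the array layout (U is c-by-n, V is c-by-p, X is n-by-p). The function name `validity_L` is hypothetical.

```python
import numpy as np

def validity_L(X, U, V, m=2.0):
    """Ratio of weighted between-cluster to within-cluster scatter, L(c).

    X: (n, p) data, U: (c, n) fuzzy memberships, V: (c, p) cluster centers.
    Requires c >= 2 and a non-zero within-cluster scatter.
    """
    X = np.asarray(X, dtype=float)
    U = np.asarray(U, dtype=float)
    V = np.asarray(V, dtype=float)
    c, n = U.shape
    Um = U ** m
    v_bar = X.mean(axis=0)   # ASSUMPTION: overall center = mean of the data
    # sum_i sum_j u_ij^m ||v_i - v_bar||^2
    between = (Um.sum(axis=1) * ((V - v_bar) ** 2).sum(axis=1)).sum()
    # sum_i sum_j u_ij^m ||x_j - v_i||^2
    d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)   # shape (c, n)
    within = (Um * d2).sum()
    return (between * (n - c)) / (within * (c - 1))
```

A larger value means the centers are far from the overall center relative to how tightly the points sit around their own centers, which matches the two design goals stated above.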
Self-adapted Fuzzy C-Means Clustering in Complex Networks(3/5)
Steps of the Algorithm
Step 1 Initialization: termination condition $\varepsilon > 0$, cluster number $c = 2$, $L(1) = 0$, $k = 0$.
Step 2 The partition matrix $U^{(k)}$ is constructed:
$$u_{ij}^{(k)} = \frac{1}{\sum_{r=1}^{c} \left( \frac{d_{ij}^{(k)}}{d_{rj}^{(k)}} \right)^{\frac{2}{m-1}}}$$
If there exist $j$ and $r$ such that $d_{rj}^{(k)} = 0$, then $u_{rj}^{(k)} = 1$ and $u_{ij}^{(k)} = 0$ for $i \neq r$.
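Step 2 can be sketched as below, assuming that d_ij is the Euclidean distance between prototype v_i and data point x_j (the slide does not name the metric). The zero-distance branch implements the special case in which a point coincides with a prototype. The function name `update_memberships` is hypothetical.

```python
import numpy as np

def update_memberships(X, V, m=2.0):
    """Compute the partition matrix U (c x n) from data X (n x p) and centers V (c x p)."""
    X = np.asarray(X, dtype=float)
    V = np.asarray(V, dtype=float)
    # d[i, j] = distance between center v_i and point x_j (ASSUMED Euclidean)
    d = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)
    c, n = d.shape
    U = np.zeros((c, n))
    for j in range(n):
        zero = np.flatnonzero(d[:, j] == 0.0)
        if zero.size > 0:
            # point j sits exactly on a center: full membership there, 0 elsewhere
            # (if several centers coincide with the point, the first one is used)
            U[zero[0], j] = 1.0
        else:
            # ratios[i, r] = (d_ij / d_rj)^(2 / (m - 1))
            ratios = (d[:, j][:, None] / d[:, j][None, :]) ** (2.0 / (m - 1.0))
            U[:, j] = 1.0 / ratios.sum(axis=1)
    return U
```

Every column of the result sums to one, so each point distributes a total membership of 1 across the c clusters.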
Self-adapted Fuzzy C-Means Clustering in Complex Networks(4/5)
Steps of the Algorithm
Step 3 The prototypes $V^{(k+1)}$ are calculated:
$$v_i^{(k+1)} = \frac{\sum_{j=1}^{n} \left( u_{ij}^{(k)} \right)^{m} x_j}{\sum_{j=1}^{n} \left( u_{ij}^{(k)} \right)^{m}}$$
Step 4 If $\|V^{(k+1)} - V^{(k)}\| < \varepsilon$, then stop the iteration; else let $k = k + 1$ and go to Step 2.
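Steps 3 and 4 amount to a weighted mean and a norm test. The sketch below assumes the Frobenius norm for the stopping test, since the slide does not say which matrix norm is meant; the function names are hypothetical.

```python
import numpy as np

def update_prototypes(X, U, m=2.0):
    """Step 3: each prototype is the mean of the data weighted by u_ij^m."""
    X = np.asarray(X, dtype=float)
    Um = np.asarray(U, dtype=float) ** m          # shape (c, n)
    return (Um @ X) / Um.sum(axis=1, keepdims=True)

def converged(V_new, V_old, eps=1e-5):
    """Step 4: stop when the prototypes move less than eps.

    ASSUMPTION: Frobenius norm of the difference between prototype matrices.
    """
    return bool(np.linalg.norm(np.asarray(V_new) - np.asarray(V_old)) < eps)
```

With crisp memberships the update reduces to the ordinary cluster mean, which is an easy way to check the implementation by hand.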
Self-adapted Fuzzy C-Means Clustering in Complex Networks(5/5)
Steps of the Algorithm
Step 5 $L(c)$ is calculated for $c$ in the range from 2 to $n$. If $L(c)$ is the highest value, then stop the algorithm; else go to Step 2 with $c = c + 1$.
Deficiency
The computational complexity is $O(n^3)$.
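The five steps can be combined into one loop over c. The sketch below is not the authors' implementation and rests on several assumptions: Euclidean distances, the overall data mean as the reference center in L(c), a deterministic farthest-point seeding of the prototypes (the slides do not say how V is initialized), and a stopping rule that reads "L(c) is the highest value" as "the first c at which L starts to decrease". All function names are hypothetical.

```python
import numpy as np

def _farthest_point_init(X, c):
    # ASSUMPTION: deterministic seeding; start at X[0], then repeatedly add
    # the point farthest from the centers chosen so far.
    centers = [X[0]]
    for _ in range(c - 1):
        d = np.min([np.linalg.norm(X - v, axis=1) for v in centers], axis=0)
        centers.append(X[np.argmax(d)])
    return np.array(centers, dtype=float)

def _fcm(X, c, m=2.0, eps=1e-6, max_iter=300):
    """Steps 2-4: alternate membership and prototype updates until V settles."""
    V = _farthest_point_init(X, c)
    for _ in range(max_iter):
        d = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)
        d = np.fmax(d, 1e-12)   # approximates the zero-distance rule of Step 2
        ratio = d[:, None, :] / d[None, :, :]          # [i, r, j] = d_ij / d_rj
        U = 1.0 / (ratio ** (2.0 / (m - 1.0))).sum(axis=1)
        Um = U ** m
        V_new = (Um @ X) / Um.sum(axis=1, keepdims=True)
        done = np.linalg.norm(V_new - V) < eps
        V = V_new
        if done:
            break
    return U, V

def _validity(X, U, V, m=2.0):
    c, n = U.shape
    Um = U ** m
    v_bar = X.mean(axis=0)      # ASSUMPTION: overall center = data mean
    between = (Um.sum(axis=1) * ((V - v_bar) ** 2).sum(axis=1)).sum()
    within = (Um * ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)).sum()
    return between * (n - c) / (within * (c - 1))

def select_cluster_number(X, m=2.0, c_max=None):
    """Step 5: increase c from 2 and stop when L(c) drops below L(c - 1)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    c_max = n - 1 if c_max is None else c_max
    scores = {1: 0.0}           # L(1) = 0, as in Step 1
    for c in range(2, c_max + 1):
        U, V = _fcm(X, c, m)
        scores[c] = _validity(X, U, V, m)
        if scores[c] < scores[c - 1]:
            return c - 1, scores
    return c_max, scores
```

Called on an (n, p) array, `select_cluster_number` returns the chosen c together with the L values it computed along the way. The rows of X would be whatever node representation the method derives from the network; plain feature vectors work for experimenting. Stopping at the first decrease saves work but can miss a later, higher peak; scanning every c up to `c_max` and taking the maximum is the more conservative alternative.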
Simulations and Analysis(1/7)
Zachary’s Karate Club
Network of American Football Games
Tests on Computer-generated Networks
Simulations and Analysis(2/7)
Zachary’s Karate Club
Square nodes and circle nodes represent the instructor’s faction and the administrator’s faction, respectively. The squares split further into two communities, shown in blue and green, and the circles likewise split into two communities, shown in red and yellow.
Simulations and Analysis(3/7)
Zachary’s Karate Club
The modularity is not high for any of the algorithms.
The modularity of AFCM is substantially lower than the rest.
The modularity of CFCM is slightly lower than that of NFCM and SFCM.
Algorithm Communities Modularity Density
AFCM 4 0.052433 0.628205
CFCM 4 0.226003 0.730769
NFCM 4 0.227318 0.730769
SFCM 4 0.227318 0.730769
Simulations and Analysis(4/7)
Network of American Football Games
The algorithm automatically finds ten communities, which match ten conferences almost exactly. A total of 11 nodes, marked with red circles, are unclassified or misclassified, giving an Accuracy of 90.43%.
Simulations and Analysis(5/7)
Network of American Football Games
The modularity obtained by SFCM is higher than that of the other algorithms, and so is the density.
The number of communities for the first three algorithms is pre-specified.
Algorithm Communities Modularity Density
AFCM 10 0.495357 0.674029
CFCM 10 0.495442 0.673915
NFCM 10 0.494795 0.673475
SFCM 10 0.498077 0.675367
Simulations and Analysis(6/7)
Tests on Computer-generated Networks
RN(c, m, k, p)
where c is the number of communities in the network, m is the number of nodes in each community, k is the degree of each node, and p is the density as presented earlier.
Simulations and Analysis(7/7)
Tests on Computer-generated Networks
As p increases from 0 to 1, the community structure in the network becomes more cohesive.
All algorithms correctly cluster all the nodes when p is no less than 0.5.
For p between 0.2 and 0.5, the accuracy of SFCM is better than that of the others.
Conclusion and Future Works
A new validity function is defined in this algorithm to find an optimal cluster number automatically.
The simulation results indicate that the algorithm is more complete and accurate than the compared FCM variants.
Its higher computational complexity, $O(n^3)$, will ultimately limit its performance.
In future research, we will focus on reducing the computational complexity with little loss of precision, and on obtaining the globally optimal solution.