gSkeletonClu [1]
Revealing density-based clustering structure from the core-connected tree of a network
[1] Huang, J., Sun, H., Song, Q., Deng, H., & Han, J. (2013). Revealing density-based clustering structure from the core-connected tree of a network. IEEE Transactions on Knowledge and Data Engineering, 25(8), 1876–1889. http://doi.org/10.1109/TKDE.2012.100
http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=6200274&url=http%3A%2F%2Fieeexplore.ieee.org%2Fiel5%2F69%2F4358933%2F06200274.pdf%3Farnumber%3D6200274
Abstract
Objective: Identify communities and vertex roles in a weighted network
Overview
Given a weighted network…
1 - Calculate its CCMST (Core-Connected Maximal Spanning Tree), using the Core-Connectivity Similarity as the edge weight
2 - Find the Structure Core-Connected components
● These are the components that contain the cores
3 - Attach the vertices classified as borders
4 - Identify the hubs and outliers
Def1) Neighborhood
Neighborhood of n1:
r(n1) = {n1, n2, n3, n8}
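Def2) Structural Similarity
Worked example for σ(n1, n8), with weight(n1, n8) = 10:
num = 2*weight(n1, n8) = 20
num += 10*10, so num = 120
denA = Sqrt[(10*10) + (10*10)] = 14.14
denB = Sqrt[(10*10) + (10*10) + (10*10) + (5*5)] = 18.02
σ(n1, n8) = num/(denA*denB)
*Note: the slide starts num at 2*weight(n1, n8) and denA, denB at 1 without explaining why. These starting values are consistent with treating every vertex as its own neighbor with self-weight 1 (Def1 lists n1 inside r(n1)). Under that reading the extra 1 sits under each square root, and it does not change the rounded result.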
σ(n1, n8) = 0.47
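The calculation above can be sketched in a few lines of Python. This is only an illustration: the adjacency dictionary is a hypothetical reconstruction that reproduces the numbers on the slide (the vertex names "a", "b" and "c" are placeholders), and the self-weight of 1 is an assumption taken from the note on the initial values.

```python
from math import sqrt

# Hypothetical weighted adjacency that reproduces the slide's numbers.
# "a", "b", "c" are placeholder names, not vertices from the slides.
graph = {
    "n1": {"n8": 10, "a": 10},
    "n8": {"n1": 10, "a": 10, "b": 10, "c": 5},
}

def weight(graph, u, x):
    """Edge weight, with the assumed self-weight w(u, u) = 1."""
    return 1.0 if u == x else graph[u].get(x, 0.0)

def structural_similarity(graph, u, v):
    # Closed neighborhoods: each vertex counts as its own neighbor (Def1).
    nu = set(graph[u]) | {u}
    nv = set(graph[v]) | {v}
    num = sum(weight(graph, u, x) * weight(graph, v, x) for x in nu & nv)
    den_u = sqrt(sum(weight(graph, u, x) ** 2 for x in nu))
    den_v = sqrt(sum(weight(graph, v, x) ** 2 for x in nv))
    return num / (den_u * den_v)

sigma = structural_similarity(graph, "n1", "n8")
print(round(sigma, 2))  # 0.47
```

The numerator picks up 10 + 10 from the two endpoints and 100 from the shared neighbor, which is where the 2*weight(n1, n8) starting value comes from.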
Def3) Ɛ-Neighborhood
Ɛ-Neighborhood for n1:
● Ɛ = 0.47
● rƐ(n1) = {n1, n2, n8}
Def4) Core
If |rƐ(u)| >= μ, then u is a core, denoted by Kε,μ(u).
Considering
● Ɛ = 0.47
● μ = 3
we have |rƐ(n1)| = 3 >= μ, so...
● Kε,μ(n1) holds: n1 is a core.
Def5) Directly Structure-Reachable
If u is a core AND v belongs to rƐ(u), then v is directly structure-reachable from u.
So:
● u ⟼ε,μ v
○ n1 ⟼ε,μ n8
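Def3 to Def5 can be checked with a small sketch. The similarity values are the ones the slides give for n1 (0.47 from Def2, 0.68 and 0.43 from the CS(n1) candidate list). The function names are hypothetical and chosen only for this example.

```python
# Similarities between n1 and its neighbors, as listed in the slides.
sigma_n1 = {"n2": 0.68, "n3": 0.43, "n8": 0.47}

def eps_neighborhood(u, sigma_u, eps):
    """Vertices whose similarity to u is at least eps, plus u itself."""
    return {u} | {v for v, s in sigma_u.items() if s >= eps}

def is_core(u, sigma_u, eps, mu):
    """u is a core when its eps-neighborhood has at least mu vertices."""
    return len(eps_neighborhood(u, sigma_u, eps)) >= mu

def directly_reachable(u, v, sigma_u, eps, mu):
    """True if u is a core and v lies in u's eps-neighborhood."""
    return is_core(u, sigma_u, eps, mu) and v in eps_neighborhood(u, sigma_u, eps)

hood = eps_neighborhood("n1", sigma_n1, eps=0.47)
print(sorted(hood))                                           # ['n1', 'n2', 'n8']
print(is_core("n1", sigma_n1, eps=0.47, mu=3))                # True
print(directly_reachable("n1", "n8", sigma_n1, 0.47, 3))      # True
print(directly_reachable("n1", "n3", sigma_n1, 0.47, 3))      # False
```

n3 drops out because its similarity to n1 (0.43) is below Ɛ = 0.47.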
Def6) Hubs and Outliers
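A vertex h is a hub if
● h does not belong to any cluster
AND
● h bridges multiple clusters, that is, there are vertices u and v in different clusters such that:
h ∈ r(u) ∧ h ∈ r(v)
If h does not belong to any cluster and is not a hub:
h is an outlier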
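A minimal sketch of this rule, assuming the clusters are already known. The toy adjacency and the cluster labels "A" and "B" are hypothetical; only the roles of n0 (hub) and n7 (outlier) follow the slides.

```python
def classify_unclustered(graph, cluster_of):
    """Label every vertex that has no cluster as 'hub' or 'outlier'."""
    roles = {}
    for v, neighbors in graph.items():
        if v in cluster_of:
            continue  # clustered vertices keep their cluster
        # Distinct clusters that v touches through its neighbors.
        touched = {cluster_of[u] for u in neighbors if u in cluster_of}
        roles[v] = "hub" if len(touched) >= 2 else "outlier"
    return roles

# Hypothetical toy network: n0 touches clusters A and B, n7 touches only B.
graph = {
    "n0": ["n1", "n4"], "n7": ["n6"],
    "n1": ["n0", "n2"], "n2": ["n1"],
    "n4": ["n0", "n6"], "n6": ["n4", "n7"],
}
cluster_of = {"n1": "A", "n2": "A", "n4": "B", "n6": "B"}
roles = classify_unclustered(graph, cluster_of)
print(roles)  # {'n0': 'hub', 'n7': 'outlier'}
```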
Def7) Structure Core-Similarity
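CS(n1) candidates...
1. (n1, n0) - 0.08
2. (n1, n2) - 0.68
3. (n1, n3) - 0.43
4. (n1, n8) - 0.47
With μ = 3, CS(n1) = 0.47: this is the largest Ɛ for which n1 is still a core (rƐ(n1) = {n1, n2, n8}).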
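One way to read the candidate list: the core-similarity is the μ-th largest similarity in the closed neighborhood of the vertex. The sketch below assumes σ(u, u) = 1 for the vertex itself and returns 0 when the vertex has too few neighbors to ever be a core. Both choices are assumptions of this example, and the function name is hypothetical.

```python
def core_similarity(sigma_u, mu):
    """Largest eps for which u is a core: the mu-th largest similarity
    in u's closed neighborhood (sigma(u, u) = 1 is assumed)."""
    values = sorted([1.0] + list(sigma_u.values()), reverse=True)
    return values[mu - 1] if len(values) >= mu else 0.0

sigma_n1 = {"n0": 0.08, "n2": 0.68, "n3": 0.43, "n8": 0.47}
cs_n1 = core_similarity(sigma_n1, mu=3)
print(cs_n1)  # 0.47
```

Sorted, the values are 1.0, 0.68, 0.47, 0.43, 0.08, so the third one is 0.47.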
Def8) Reachability-Similarity
RS(n6, n7) = min {0.51, 0.1} = 0.1
RS(n6, n4) = 0.51
RS(n6, n5) = 0.55
---
RS(n7, n6) = min {0, 0.1} = 0
RS is asymmetric: RS(n6, n7) ≠ RS(n7, n6)!
Def9) Core Connectivity Similarity
CCS(n6, n4) = 0.51
CCS(n6, n5) = 0.51
CCS(n6, n7) = 0
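The n6/n7 numbers in Def8 and Def9 fit the pattern RS(u, v) = min{CS(u), σ(u, v)} and CCS(u, v) = min{RS(u, v), RS(v, u)}. The sketch below assumes exactly that reading and uses only the values shown for n6 and n7 (CS(n6) = 0.51, CS(n7) = 0, σ(n6, n7) = 0.1). The function names are hypothetical.

```python
def reachability_similarity(cs, sigma, u, v):
    """RS(u, v) = min{CS(u), sigma(u, v)}; depends on the order of u, v."""
    return min(cs[u], sigma[frozenset((u, v))])

def core_connectivity_similarity(cs, sigma, u, v):
    """CCS(u, v) = min{RS(u, v), RS(v, u)}; symmetric by construction."""
    return min(reachability_similarity(cs, sigma, u, v),
               reachability_similarity(cs, sigma, v, u))

cs = {"n6": 0.51, "n7": 0.0}
sigma = {frozenset(("n6", "n7")): 0.1}

rs_67 = reachability_similarity(cs, sigma, "n6", "n7")
rs_76 = reachability_similarity(cs, sigma, "n7", "n6")
ccs_67 = core_connectivity_similarity(cs, sigma, "n6", "n7")
print(rs_67, rs_76, ccs_67)  # 0.1 0.0 0.0
```

Taking the minimum of both directions is what makes CCS usable as an undirected edge weight.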
Def10) Structure Core-Connected
Given Ɛ ∈ ℝ, μ ∈ ℕ and u, v ∈ V, the vertices u and v are directly core-connected with each other if and only if:
● Kε,μ(u) ∧ Kε,μ(v) ∧ u ⟼ε,μ v
This is denoted by: u ⟷ε,μ v
gSkeletonClu first finds the structures that satisfy the definition above. It then attaches the "borders" (vertices that are directly structure-reachable from a core but do not satisfy the definition themselves). Finally, it separates the clusters, hubs and outliers.
CCMST - Core-Connected Maximal Spanning Tree
Instead of using the complete network, the authors proved that the Structure Core-Connected components can be identified from the CCMST, a maximal spanning tree whose edge weights are CCS(u, v).
Ɛ-Candidates:
● 0.51
● 0.47
● 0.43
● 0.08
● 0
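A maximal spanning tree can be built with Kruskal's algorithm by visiting the edges from heaviest to lightest. The sketch below is a generic version, not the authors' implementation. The edge list is hypothetical, and reading the Ɛ candidates off the distinct tree weights is an assumption of this example.

```python
def maximal_spanning_tree(vertices, edges):
    """Kruskal's algorithm on edges sorted by decreasing weight."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    tree = []
    for u, v, w in sorted(edges, key=lambda e: e[2], reverse=True):
        ru, rv = find(u), find(v)
        if ru != rv:          # the edge joins two different components
            parent[ru] = rv
            tree.append((u, v, w))
    return tree

# Hypothetical CCS-weighted edges (u, v, CCS(u, v)).
vertices = ["a", "b", "c", "d"]
edges = [("a", "b", 0.51), ("b", "c", 0.47), ("a", "c", 0.43), ("c", "d", 0.08)]

tree = maximal_spanning_tree(vertices, edges)
candidates = sorted({w for _, _, w in tree}, reverse=True)
print(tree)        # [('a', 'b', 0.51), ('b', 'c', 0.47), ('c', 'd', 0.08)]
print(candidates)  # [0.51, 0.47, 0.08]
```

The edge (a, c) is skipped because a and c are already connected through b with heavier edges.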
Core-Connected Components from CCMST
[Figure: components obtained from the CCMST for Ɛ = 0.51, Ɛ = 0.47, Ɛ = 0.43 and Ɛ = 0.08]
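Cutting the tree at a given Ɛ can be sketched as follows: keep only the tree edges with weight >= Ɛ and collect the groups they connect. The tree below is hypothetical, and the function name is made up for this example. In the full method only groups that contain cores seed clusters, a check this sketch leaves out.

```python
def core_connected_components(vertices, tree, eps):
    """Keep tree edges with weight >= eps and return the groups they connect."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v, w in tree:
        if w >= eps:
            parent[find(u)] = find(v)

    groups = {}
    for v in vertices:
        groups.setdefault(find(v), []).append(v)
    return list(groups.values())

# Hypothetical CCMST edges (u, v, CCS(u, v)).
vertices = ["a", "b", "c", "d"]
tree = [("a", "b", 0.51), ("b", "c", 0.47), ("c", "d", 0.08)]

at_047 = core_connected_components(vertices, tree, eps=0.47)
at_051 = core_connected_components(vertices, tree, eps=0.51)
print(at_047)  # [['a', 'b', 'c'], ['d']]
print(at_051)  # [['a', 'b'], ['c'], ['d']]
```

Raising Ɛ removes more edges, so the groups can only split, never merge.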
Attracting Indices for Attaching Borders
RS(2,3) = 0.55
RS(1,3) = 0.43
--
AS(3) = max {0.55, 0.43} = 0.55
With Ɛ = 0.47, AS(3) > Ɛ, so:
n3 is attached to the cluster that contains n2.
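The border step can be sketched like this, assuming the attracting index is the largest RS value pointing at the border vertex. The cluster label "A" and the function name are hypothetical. The RS values are the ones on the slide.

```python
def attach_border(rs_to_v, cluster_of, eps):
    """Return (cluster, AS) for a border vertex; cluster is None when
    no core attracts the vertex strongly enough."""
    best_core = max(rs_to_v, key=rs_to_v.get)
    attracting_index = rs_to_v[best_core]      # AS(v)
    if attracting_index > eps:
        return cluster_of[best_core], attracting_index
    return None, attracting_index

rs_to_n3 = {"n2": 0.55, "n1": 0.43}            # RS(2,3) and RS(1,3)
cluster_of = {"n1": "A", "n2": "A"}            # hypothetical cluster label

cluster, as_n3 = attach_border(rs_to_n3, cluster_of, eps=0.47)
print(cluster, as_n3)  # A 0.55
```

With a stricter Ɛ such as 0.6, the same call returns None and n3 stays unattached.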
So what…? Let's execute from scratch!
Step 1 - Prepare your weapons!!
Calculate the Weighted Core-Similarity Network
Ɛ = 0.47
μ = 3
[Figure: the weighted network and the resulting weighted core-similarity network]
Step 2 - Point your weapons...
Calculate the CCMST
Step 3A - Fire!
Detect Core-Connected Components...
Ɛ = 0.47
μ = 3
Step 3B - Fire again!
Attach the borders!
Ɛ = 0.47
Step 3C - Kill it, before it kills you!
Detect clusters, hubs and outliers
n0 is a hub because:
● n0 does not belong to any cluster
● n0 bridges the clusters A and B.
n7 is an outlier because:
● it does not belong to any cluster and it is not a hub =(
Results - Guard the guns… You are the winner! (or just a survivor...)
Clustering of Automatically Selected Ɛ
If you have the Ɛ candidates extracted from the CCMST…
AND...
If you adopt a way to measure what is the best Ɛ...
Then, you can automatically select the Ɛ parameter.
One possible choice is to use the modularity Q as a quality measure of the network clustering. Q is at most 1 (it can be negative for poor partitions), and a value closer to 1 indicates a better clustering result.
In a nutshell… run gSkeletonClu for every Ɛ candidate and, based on a quality index, choose the best partition!!!
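The selection loop can be sketched as below. The modularity function is the standard Newman formula written by hand. The toy graph (two triangles joined by one edge) and the two partitions are hypothetical stand-ins for the results of running the clustering at each Ɛ candidate. How hubs and outliers should be counted (for example as singleton communities) is left open here.

```python
def modularity(edges, community_of):
    """Newman modularity Q for an undirected weighted edge list."""
    m = sum(w for _, _, w in edges)
    inside, degree = {}, {}
    for u, v, w in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] = degree.get(cu, 0.0) + w
        degree[cv] = degree.get(cv, 0.0) + w
        if cu == cv:
            inside[cu] = inside.get(cu, 0.0) + w
    return sum(inside.get(c, 0.0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())

def select_eps(edges, partitions):
    """partitions maps each eps candidate to a {vertex: community} dict."""
    return max(partitions, key=lambda eps: modularity(edges, partitions[eps]))

# Hypothetical toy graph: two triangles joined by the edge (c, d).
edges = [("a", "b", 1), ("b", "c", 1), ("a", "c", 1),
         ("d", "e", 1), ("e", "f", 1), ("d", "f", 1), ("c", "d", 1)]
# Hypothetical clustering results for two eps candidates.
partitions = {
    0.47: {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2},
    0.08: {"a": 1, "b": 1, "c": 1, "d": 1, "e": 1, "f": 1},
}

best_eps = select_eps(edges, partitions)
print(best_eps)  # 0.47
```

The two-community partition scores Q = 5/14, while putting everything in one community scores Q = 0, so Ɛ = 0.47 wins.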
Did you like it?
There is more!
From the CCMST it is also possible to extract the clustering hierarchy… (next opportunity)
Limitations
● gSkeletonClu can only be applied to networks!
● In the gSkeletonClu authors' paper, the tests show that it is slower than SCAN…
● It may not work on BIG networks (more than 1 million vertices).
○ The SCAN++ authors (Shiokawa, 2015) [1][2] ran tests on BIG networks and could not run gSkeletonClu on them…
Have fun!
[1] http://www.vldb.org/pvldb/vol8/p1178-shiokawa.pdf
[2] http://pt.slideshare.net/LazyShion/scan-efficient-algorithm-for-finding-clusters-hubs-and-outliers-on-largescale-graphs-vldb-2015
Presentation created by: Danilo Amaral de Oliveira
Thank you!