gSkeletonClu - Revealing density-based clustering structure from the core-connected tree of a network


Upload: danilo-oliveira

Post on 21-Jan-2017


TRANSCRIPT

Page 1: gSkeletonClu - Revealing density-based clustering structure from the core-connected tree of a network

gSkeletonClu [1]: Revealing density-based clustering structure from the core-connected tree of a network

[1] Huang, J., Sun, H., Song, Q., Deng, H., & Han, J. (2013). Revealing density-based clustering structure from the core-connected tree of a network. IEEE Transactions on Knowledge and Data Engineering, 25(8), 1876–1889. http://doi.org/10.1109/TKDE.2012.100 http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=6200274&url=http%3A%2F%2Fieeexplore.ieee.org%2Fiel5%2F69%2F4358933%2F06200274.pdf%3Farnumber%3D6200274

Page 2: gSkeletonClu - Revealing density-based clustering structure from the core-connected tree of a network

Abstract

Objective: identify the communities and the vertex roles (hubs and outliers) in a weighted network

Page 3: gSkeletonClu - Revealing density-based clustering structure from the core-connected tree of a network

Overview

Given a weighted network…

1 - Calculate its CCMST (Core-Connected Maximal Spanning Tree) using the Core-Connectivity Similarity

2 - Find the structure core-connected components

● Components that contain the cores

3 - Attach the vertices classified as borders

4 - Identify the hubs and outliers

Page 4: gSkeletonClu - Revealing density-based clustering structure from the core-connected tree of a network

Def1) Neighborhood

Neighborhood of n1:

Γ(n1) = {n1, n2, n3, n8}

Page 5: gSkeletonClu - Revealing density-based clustering structure from the core-connected tree of a network

Def2) Structural Similarity

num = 2*weight(n1, n8) = 20

denA = denB = 1

num += 10*10 = 120

denA += sqrt[(10*10) + (10*10)] = 14.14

denB += sqrt[(10*10) + (10*10) + (10*10) + (5*5)] = 18.03

σ(n1, n8) = num/(denA*denB)

*Note: the initial values of num and den are a mystery.
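The arithmetic above can be reproduced with a short Python sketch. The edge weights below are hypothetical: they are picked only so that the numbers match the slide, and the function follows the slide's steps (direct edge counted twice, one product per common neighbor, square roots without the initial 1), which may differ in detail from the formula in the paper.

```python
from math import sqrt

# Hypothetical edge weights, chosen only to reproduce the slide's numbers.
WEIGHTS = {
    ("n1", "n8"): 10, ("n1", "n2"): 10, ("n1", "n3"): 10,
    ("n1", "n0"): 5, ("n8", "n2"): 10,
}


def build_adjacency(weights):
    """Turn an undirected edge list into {vertex: {neighbor: weight}}."""
    adj = {}
    for (u, v), w in weights.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def structural_similarity(adj, u, v):
    """Cosine-like similarity of u and v, following the slide's arithmetic."""
    # The direct edge is counted twice, as in "num = 2*weight(n1, n8)".
    num = 2 * adj[u].get(v, 0)
    # One product per common neighbor, as in "num += 10*10".
    for x in adj[u].keys() & adj[v].keys():
        num += adj[u][x] * adj[v][x]
    den_u = sqrt(sum(w * w for w in adj[u].values()))
    den_v = sqrt(sum(w * w for w in adj[v].values()))
    return num / (den_u * den_v)


ADJ = build_adjacency(WEIGHTS)
SIGMA_N1_N8 = structural_similarity(ADJ, "n1", "n8")
print(round(SIGMA_N1_N8, 2))
```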

Page 6: gSkeletonClu - Revealing density-based clustering structure from the core-connected tree of a network

Def2) Structural Similarity

σ(n1, n8) = 0.47

Page 7: gSkeletonClu - Revealing density-based clustering structure from the core-connected tree of a network

Def3) Ɛ-Neighborhood

Ɛ-Neighborhood for n1:

● Ɛ = 0.47
● ΓƐ(n1) = {n1, n2, n8}

Page 8: gSkeletonClu - Revealing density-based clustering structure from the core-connected tree of a network

Def4) Core

If |ΓƐ(u)| >= μ, then u is a core, denoted by Kε,μ(u).

Considering

● Ɛ = 0.47
● μ = 3

n1 has 3 vertices in its Ɛ-neighborhood, so...

● Kε,μ(n1)
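A minimal sketch of Def3 and Def4 in Python, using the neighborhood of n1 and the similarity values shown on the slides. The function names are hypothetical, and the sketch assumes that a vertex always belongs to its own Ɛ-neighborhood.

```python
def eps_neighborhood(u, neighbors, sigma, eps):
    """Vertices in the neighborhood of u whose similarity to u is >= eps."""
    # Assumption: u always belongs to its own eps-neighborhood.
    return {u} | {v for v in neighbors if v != u and sigma[(u, v)] >= eps}


def is_core(u, neighbors, sigma, eps, mu):
    """u is a core when its eps-neighborhood has at least mu vertices."""
    return len(eps_neighborhood(u, neighbors, sigma, eps)) >= mu


GAMMA_N1 = {"n1", "n2", "n3", "n8"}
SIGMA = {("n1", "n2"): 0.68, ("n1", "n3"): 0.43, ("n1", "n8"): 0.47}

N_EPS_N1 = eps_neighborhood("n1", GAMMA_N1, SIGMA, eps=0.47)
N1_IS_CORE = is_core("n1", GAMMA_N1, SIGMA, eps=0.47, mu=3)
print(sorted(N_EPS_N1), N1_IS_CORE)
```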

Page 9: gSkeletonClu - Revealing density-based clustering structure from the core-connected tree of a network

Def5) Directly Structure-Reachable

If u is a core AND v belongs to ΓƐ(u), then v is directly structure-reachable from u.

So:

● u ⟼ε,μ v
  ○ n1 ⟼ε,μ n8

Page 10: gSkeletonClu - Revealing density-based clustering structure from the core-connected tree of a network

Def6) Hubs and Outliers

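If h does not belong to any cluster

AND

h bridges multiple clusters, i.e. there are vertices u and v in different clusters such that:

● h ∈ Γ(u) ∧ h ∈ Γ(v)

then h is a hub.

If h does not belong to any cluster and is not a hub:

h is an outlier.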
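The rule above can be sketched in Python as follows. The toy adjacency and the cluster labels "A" and "B" are hypothetical and only mimic the situation on the slide (n0 between two clusters, n7 hanging off one cluster).

```python
def classify_unclustered(adj, cluster_of):
    """Label every vertex that is outside all clusters as 'hub' or 'outlier'."""
    roles = {}
    for h, neighbors in adj.items():
        if h in cluster_of:
            continue  # clustered vertices are neither hubs nor outliers
        # Clusters that h touches through its neighbors.
        touched = {cluster_of[x] for x in neighbors if x in cluster_of}
        roles[h] = "hub" if len(touched) >= 2 else "outlier"
    return roles


# Hypothetical toy network: cluster A = {n1, n2}, cluster B = {n5, n6}.
ADJ = {
    "n0": {"n1", "n5"}, "n1": {"n0", "n2"}, "n2": {"n1"},
    "n5": {"n0", "n6"}, "n6": {"n5", "n7"}, "n7": {"n6"},
}
CLUSTER_OF = {"n1": "A", "n2": "A", "n5": "B", "n6": "B"}
ROLES = classify_unclustered(ADJ, CLUSTER_OF)
print(ROLES)
```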


Page 12: gSkeletonClu - Revealing density-based clustering structure from the core-connected tree of a network

Def7) Structure Core-Similarity

CS(n1) candidates...

1. (n1, n0) - 0.08
2. (n1, n2) - 0.68
3. (n1, n3) - 0.43
4. (n1, n8) - 0.47

Ɛ
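One way to read the candidate list: the core-similarity of a vertex is the largest Ɛ for which it is still a core. The sketch below rests on two assumptions that are not spelled out on the slide: the vertex counts itself with similarity 1, and the core-similarity is 0 when the vertex has fewer than μ vertices in its neighborhood.

```python
def core_similarity(similarities, mu):
    """Largest eps for which the vertex still has mu vertices in its eps-neighborhood."""
    # Assumption: the vertex itself counts, with similarity 1 to itself.
    ranked = sorted([1.0] + list(similarities), reverse=True)
    # Assumption: a vertex with fewer than mu vertices can never be a core.
    return ranked[mu - 1] if len(ranked) >= mu else 0.0


# Similarities of n1 to n0, n2, n3 and n8, taken from the list above.
CS_N1 = core_similarity([0.08, 0.68, 0.43, 0.47], mu=3)
print(CS_N1)
```

With μ = 3 this picks 0.47 for n1, which agrees with the Ɛ used on the earlier slides.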


Page 14: gSkeletonClu - Revealing density-based clustering structure from the core-connected tree of a network

Def8) Reachability-Similarity

RS(n6, n7) = min {0.51, 0.1} = 0.1

RS(n6, n4) = 0.51

RS(n6, n5) = 0.55

---

RS(n7, n6) = min {0, 0.1} = 0

RS is asymmetric: RS(n6, n7) ≠ RS(n7, n6)!
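Reading the examples above as RS(u, v) = min{CS(u), σ(u, v)}, the asymmetry can be checked with a few lines of Python. The function name is hypothetical; the values CS(n6) = 0.51, CS(n7) = 0 and σ(n6, n7) = 0.1 are the ones used on the slide.

```python
def reachability_similarity(cs_u, sigma_uv):
    """RS(u, v): limited by the core-similarity of the source vertex u."""
    return min(cs_u, sigma_uv)


CS = {"n6": 0.51, "n7": 0.0}
SIGMA_N6_N7 = 0.1

RS_67 = reachability_similarity(CS["n6"], SIGMA_N6_N7)
RS_76 = reachability_similarity(CS["n7"], SIGMA_N6_N7)
print(RS_67, RS_76)
```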

Page 15: gSkeletonClu - Revealing density-based clustering structure from the core-connected tree of a network

Def9) Core Connectivity Similarity

CCS(n6, n4) = 0.51

CCS(n6, n5) = 0.51

CCS(n6, n7) = 0
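A small sketch of Def9, assuming that CCS(u, v) is the smaller of the two reachability-similarities RS(u, v) and RS(v, u). That assumption agrees with CCS(n6, n7) = 0 above and makes the measure symmetric. The function name is hypothetical.

```python
def core_connectivity_similarity(cs_u, cs_v, sigma_uv):
    """CCS(u, v) as the smaller of RS(u, v) and RS(v, u)."""
    rs_uv = min(cs_u, sigma_uv)
    rs_vu = min(cs_v, sigma_uv)
    return min(rs_uv, rs_vu)


# Values from the slides: CS(n6) = 0.51, CS(n7) = 0, sigma(n6, n7) = 0.1.
CCS_67 = core_connectivity_similarity(0.51, 0.0, 0.1)
CCS_76 = core_connectivity_similarity(0.0, 0.51, 0.1)
print(CCS_67, CCS_76)
```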


Page 17: gSkeletonClu - Revealing density-based clustering structure from the core-connected tree of a network

Def10) Structure Core-Connected

Given Ɛ ∈ ℝ, μ ∈ ℕ and u, v ∈ V, u and v are directly core-connected with each other if and only if:

● Kε,μ(u) ∧ Kε,μ(v) ∧ u ⟼ε,μ v

This is denoted by: u ⟷ε,μ v

gSkeletonClu first finds the structures that satisfy the definition above. It then attaches the "borders" (vertices that are directly structure-reachable from a core but do not satisfy the definition above). At the end, gSkeletonClu separates the clusters, hubs and outliers.

Page 18: gSkeletonClu - Revealing density-based clustering structure from the core-connected tree of a network

CCMST - Core-Connected Maximal Spanning Tree

Instead of using the complete network, the authors proved that it is possible to identify the structure core-connected components from the CCMST, using CCS(u, v) as the edge weight.

Ɛ-Candidates:

● 0.51
● 0.47
● 0.43
● 0.08
● 0
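As an illustration, a maximal spanning tree over CCS weights can be built with Kruskal's algorithm and a union-find structure. This is a generic sketch, not necessarily the construction used in the paper, and the toy vertices and weights are hypothetical. The distinct weights of the tree edges then serve as Ɛ-candidates.

```python
def maximal_spanning_tree(vertices, weighted_edges):
    """Kruskal's algorithm, taking the heaviest edges first."""
    parent = {v: v for v in vertices}

    def find(v):
        # Follow parent links to the root, compressing the path on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for u, v, w in sorted(weighted_edges, key=lambda e: e[2], reverse=True):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two different components
            parent[root_u] = root_v
            tree.append((u, v, w))
    return tree


def eps_candidates(tree):
    """Distinct tree edge weights, from the strictest to the loosest."""
    return sorted({w for _, _, w in tree}, reverse=True)


# Hypothetical network whose edge weights stand for CCS(u, v).
VERTICES = ["a", "b", "c", "d"]
EDGES = [("a", "b", 0.51), ("b", "c", 0.47), ("a", "c", 0.43), ("c", "d", 0.08)]
TREE = maximal_spanning_tree(VERTICES, EDGES)
CANDIDATES = eps_candidates(TREE)
print(TREE, CANDIDATES)
```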

Page 19: gSkeletonClu - Revealing density-based clustering structure from the core-connected tree of a network

Core-Connected Components from CCMST

[Figure panels: the core-connected components for Ɛ = 0.51, Ɛ = 0.47, Ɛ = 0.43 and Ɛ = 0.08]
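A sketch of how the components for a given Ɛ could be read off the tree: keep only the tree edges whose weight is at least Ɛ and collect the connected pieces. The toy tree is hypothetical, and dropping vertices that are left without any edge is a simplification made here.

```python
def core_connected_components(vertices, tree, eps):
    """Connected pieces of the tree after dropping edges lighter than eps."""
    adj = {v: set() for v in vertices}
    for u, v, w in tree:
        if w >= eps:
            adj[u].add(v)
            adj[v].add(u)

    seen, components = set(), []
    for start in vertices:
        # Simplification: a vertex with no surviving edge forms no component.
        if start in seen or not adj[start]:
            continue
        stack, component = [start], set()
        while stack:
            x = stack.pop()
            if x in component:
                continue
            component.add(x)
            stack.extend(adj[x] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical CCMST edges as (u, v, CCS) triples.
VERTICES = ["a", "b", "c", "d", "e"]
TREE = [("a", "b", 0.51), ("b", "c", 0.47), ("c", "d", 0.08), ("d", "e", 0.51)]
for eps in (0.51, 0.47, 0.08):
    print(eps, core_connected_components(VERTICES, TREE, eps))
```

Lowering Ɛ can only merge components, never split them, which is why a single tree is enough to explore every candidate.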

Page 20: gSkeletonClu - Revealing density-based clustering structure from the core-connected tree of a network

Attracting Indices for Attaching Borders

RS(2,3) = 0.55

RS(1,3) = 0.43

--

AS(3) = 0.55

Page 21: gSkeletonClu - Revealing density-based clustering structure from the core-connected tree of a network

Attracting Indices for Attaching Borders

AS(3) = 0.55

Ɛ = 0.47

If AS(3) > Ɛ, n3 is attached to the cluster that contains n2. Here 0.55 > 0.47, so n3 joins that cluster.
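The attaching step can be sketched as below, assuming that the attracting index AS(v) is the largest RS(u, v) over the cores u next to v, which matches AS(3) = 0.55 on the slide. The function name and the cluster label "A" are hypothetical.

```python
def attach_border(rs_from_cores, cluster_of, eps):
    """Return (cluster, AS) for a border vertex, with cluster None if it is not attached.

    rs_from_cores maps each neighboring core u to RS(u, border).
    """
    best_core = max(rs_from_cores, key=rs_from_cores.get)
    attracting_index = rs_from_cores[best_core]
    if attracting_index > eps:
        return cluster_of[best_core], attracting_index
    return None, attracting_index


# RS(2, 3) = 0.55 and RS(1, 3) = 0.43 come from the slide.
RS_TO_N3 = {"n2": 0.55, "n1": 0.43}
CLUSTER_OF = {"n1": "A", "n2": "A"}
ATTACHED = attach_border(RS_TO_N3, CLUSTER_OF, eps=0.47)
print(ATTACHED)
```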

Page 22: gSkeletonClu - Revealing density-based clustering structure from the core-connected tree of a network

So what? Let's execute it from scratch!

Page 23: gSkeletonClu - Revealing density-based clustering structure from the core-connected tree of a network

Step 1 - Prepare your weapons!!

Calculate the weighted core-similarity network

Ɛ = 0.47

μ = 3

[Figures: the weighted network and the resulting weighted core-similarity network]

Page 24: gSkeletonClu - Revealing density-based clustering structure from the core-connected tree of a network

Step 2- Point your weapons...

Calculate the CCMST


Page 25: gSkeletonClu - Revealing density-based clustering structure from the core-connected tree of a network

Step 3A - Fire!

Detect Core-Connected Components...

Ɛ = 0.47

μ = 3

Page 26: gSkeletonClu - Revealing density-based clustering structure from the core-connected tree of a network

Step 3B - Fire again !

Attach the borders!

Ɛ= 0.47

Page 27: gSkeletonClu - Revealing density-based clustering structure from the core-connected tree of a network

Step 3C - Kill it, before it kills you!

Detect clusters, hubs and outliers

n0 is a hub because:

● n0 does not belong to any cluster
● n0 bridges the clusters A and B.

n7 is an outlier because:

● it does not belong to any cluster and it is not a hub =(

Page 28: gSkeletonClu - Revealing density-based clustering structure from the core-connected tree of a network

Results - Put the guns away… You are the winner! (or just a survivor...)

Page 29: gSkeletonClu - Revealing density-based clustering structure from the core-connected tree of a network

Clustering with an Automatically Selected Ɛ

If you have the Ɛ candidates extracted from the CCMST…

AND...

If you adopt a way to measure what is the best Ɛ...

Then, you can automatically select the Ɛ parameter.

One possible choice is to use the modularity Q as the quality measure of a network clustering. Q is at most 1, and values closer to 1 indicate a better clustering result.

In a nutshell… run gSkeletonClu for every Ɛ candidate and, based on a quality index, choose the best partition!
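A rough sketch of that selection loop, using the standard weighted modularity as the quality index. The partitions are hard-coded here and stand in for the output of gSkeletonClu at each candidate; the toy network (two triangles joined by one edge) and the Ɛ labels attached to the partitions are hypothetical.

```python
def modularity(edges, community_of):
    """Weighted modularity Q of a partition given as {vertex: community}."""
    total = sum(w for _, _, w in edges)
    internal, degree = {}, {}
    for u, v, w in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] = degree.get(cu, 0.0) + w
        degree[cv] = degree.get(cv, 0.0) + w
        if cu == cv:
            internal[cu] = internal.get(cu, 0.0) + w
    return sum(
        internal.get(c, 0.0) / total - (d / (2 * total)) ** 2
        for c, d in degree.items()
    )


def best_eps(edges, partitions):
    """Pick the candidate whose partition has the highest modularity."""
    return max(partitions, key=lambda eps: modularity(edges, partitions[eps]))


# Hypothetical network: two triangles joined by the edge c-d.
EDGES = [
    ("a", "b", 1), ("b", "c", 1), ("a", "c", 1),
    ("d", "e", 1), ("e", "f", 1), ("d", "f", 1),
    ("c", "d", 1),
]
# Stand-ins for the partitions gSkeletonClu would return at each candidate.
PARTITIONS = {
    0.47: {"a": "A", "b": "A", "c": "A", "d": "B", "e": "B", "f": "B"},
    0.08: {v: "A" for v in "abcdef"},
}
BEST = best_eps(EDGES, PARTITIONS)
print(BEST, round(modularity(EDGES, PARTITIONS[BEST]), 3))
```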

Page 30: gSkeletonClu - Revealing density-based clustering structure from the core-connected tree of a network

Did you like it?

There is more!

From the CCMST it is possible to extract the clustering hierarchy… (next time)

Limitations

● gSkeletonClu can only be applied to networks!
● In the gSkeletonClu paper, the authors' tests show that it is slower than SCAN…
● It may not work on BIG networks (more than 1 million vertices).
  ○ The SCAN++ authors (Shiokawa, 2015) [1][2] ran tests on BIG networks and could not run gSkeletonClu on them…

Have fun!

[1] http://www.vldb.org/pvldb/vol8/p1178-shiokawa.pdf
[2] http://pt.slideshare.net/LazyShion/scan-efficient-algorithm-for-finding-clusters-hubs-and-outliers-on-largescale-graphs-vldb-2015

Page 31: gSkeletonClu - Revealing density-based clustering structure from the core-connected tree of a network

Presentation created by: Danilo Amaral de Oliveira

Thank you!