Robust Local Community Detection: On Free Rider Effect and Its Elimination
Yubao Wu1, Ruoming Jin2, Jing Li1, Xiang Zhang1
1Case Western Reserve University
2Kent State University
Generic Local Community Detection Problem
Input: a) Graph G(V, E) b) A set of query nodes Q ⊆ V c) A goodness metric f(S)
Output: Subgraph G[S] such that: 1) S contains Q (Q ⊆ S) 2) f(S) is maximized
Community Goodness Metrics
Intuitions                                  Goodness metrics       Ref.    Formulas
Internal denseness                          Classic density        [1]
Internal denseness                          Edge-surplus           [2]     concave
Internal denseness                          Minimum degree         [3,4]
Internal denseness & external sparseness    Subgraph modularity    [5]
Internal denseness & external sparseness    Density-isolation      [6]
Internal denseness & external sparseness    External conductance   [7]
Boundary sharpness                          Local modularity       [8]
[1] B. Saha, et al. RECOMB’10.
[2] C. Tsourakakis, et al. SIGMOD’14.
[3] M. Sozio, et al. KDD’10.
[4] W. Cui, et al. SIGMOD’14.
[5] F. Luo, et al. WIAS’08.
[6] K. J. Lang, CIKM’07.
[7] R. Andersen, et al. FOCS’06.
[8] A. Clauset, PRE’05.
Free Rider Effect
Goodness metrics       A      A ∪ B   A ∪ C
Classic density        2.50   2.95    2.83
Edge-surplus           15.3   26.5    22.8
Minimum degree         4      4       4
Subgraph modularity    2.0    3.6     4.6
Density-isolation      -2.6   3.8     1.5
Ext. conductance       0.25   0.14    0.11
Local modularity       0.63   0.70    0.78
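The effect can be reproduced on a toy graph. The sketch below assumes the usual definition of classic density, e(S)/|S| (edges inside S divided by the number of nodes in S), and uses a hypothetical graph rather than the one on the slide: A is a small clique holding the query, B is a larger clique attached by a single bridge edge. Merging B into A raises the score even though B has little to do with the query.

```python
from itertools import combinations


def classic_density(edges, nodes):
    """Classic density e(S)/|S| (assumed definition): internal edges over node count."""
    nodes = set(nodes)
    internal = sum(1 for u, v in edges if u in nodes and v in nodes)
    return internal / len(nodes)


# Hypothetical toy graph, not the graph from the slides:
# A is a 4-clique (query side), B is a 6-clique, joined by one bridge edge.
A = ["a0", "a1", "a2", "a3"]
B = ["b0", "b1", "b2", "b3", "b4", "b5"]
edges = list(combinations(A, 2)) + list(combinations(B, 2)) + [("a3", "b0")]

density_A = classic_density(edges, A)        # 6 edges / 4 nodes
density_AB = classic_density(edges, A + B)   # 22 edges / 10 nodes

print(density_A, density_AB)
```

Because the score of A ∪ B exceeds the score of A, a method that maximizes classic density prefers the merged subgraph; B rides along for free.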
Free Rider Effect in Real Networks
One existing method: classic density [1]
[Figure: (a) Co-author network; (b) Biological network]
[1] B. Saha, et al. Dense subgraphs with restrictions and applications to gene annotation graphs. RECOMB, 2010.
Query Biased Node Weighting
r(u): proximity value of node u w.r.t. the query
Node weight: π(u) = 1 / r(u)
Query biased density: ρ(S) = e(S) / π(S)
π(S): sum of node weights in S
Subgraph A becomes the query biased densest subgraph
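A minimal sketch of the weighting scheme, under two assumptions: the node weight is the reciprocal of the proximity value, and the query biased density is e(S) divided by the summed node weights π(S). The graph and the proximity values are hypothetical (high proximity inside A, low inside B); in practice they would come from a proximity measure computed for the query.

```python
from itertools import combinations


def query_biased_density(edges, nodes, proximity):
    """rho(S) = e(S) / pi(S), with pi(u) = 1 / r(u) (assumed reading of the slide)."""
    nodes = set(nodes)
    internal = sum(1 for u, v in edges if u in nodes and v in nodes)
    total_weight = sum(1.0 / proximity[u] for u in nodes)
    return internal / total_weight


# Hypothetical toy graph: 4-clique A (query side), 6-clique B, one bridge edge.
A = ["a0", "a1", "a2", "a3"]
B = ["b0", "b1", "b2", "b3", "b4", "b5"]
edges = list(combinations(A, 2)) + list(combinations(B, 2)) + [("a3", "b0")]

# Hypothetical proximity values: nodes near the query score higher.
proximity = {u: 0.2 for u in A}
proximity.update({u: 0.02 for u in B})

rho_A = query_biased_density(edges, A, proximity)        # 6 / 20
rho_AB = query_biased_density(edges, A + B, proximity)   # 22 / 320

print(rho_A, rho_AB)
```

Nodes far from the query carry large weights, so adding B inflates the denominator faster than it adds edges, and A alone scores higher than A ∪ B.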
Yubao Wu, Ruoming Jin, Jing Li, and Xiang Zhang. Robust local community detection: on free rider effect and its elimination. PVLDB, 8(7):798-809, 2015.
QDC Problem
Query biased densest connected subgraph (QDC) problem:
Input: a) Graph G(V, E) b) A set of query nodes Q ⊆ V
Output: Subgraph G[S] such that: 1) S contains Q (Q ⊆ S) 2) Query biased density ρ(S) is maximized 3) G[S] is connected
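QDC Problem and Two Related Problems
Problem   Input                   Output                                                      Complexity
QDC       1) G(V, E) 2) query Q   1) S contains Q 2) ρ(S) is maximized 3) G[S] is connected   NP-hard
QDC’      1) G(V, E) 2) query Q   1) S contains Q 2) ρ(S) is maximized                        Polynomial
QDC’’     1) G(V, E) 2) query Q   ρ(S) is maximized                                           Polynomial
If the QDC’’ solution contains Q, it is also optimal for QDC’.
If the QDC’ solution is connected, it is also optimal for QDC.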
Finding the QDC’’
1. Remove low degree nodes
2. Detect the densest subgraph
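The two steps above can be approximated with a greedy peeling heuristic: repeatedly drop the node with the fewest edges per unit of weight and remember the densest intermediate subgraph. This is a swapped-in stand-in, not the exact polynomial algorithm referred to on the slide, and it assumes the density e(S)/π(S) with precomputed node weights. The function name `greedy_peel` and the toy graph are hypothetical.

```python
from itertools import combinations


def greedy_peel(adj, weight):
    """Greedy peeling heuristic for maximizing e(S) / pi(S).

    adj: dict mapping each node to the set of its neighbours.
    weight: dict mapping each node to its (positive) node weight.
    """
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    num_edges = sum(len(vs) for vs in adj.values()) // 2
    total_weight = sum(weight[u] for u in adj)
    best_density, best_nodes = num_edges / total_weight, set(adj)
    while len(adj) > 1:
        # Remove the node that contributes the fewest edges per unit of weight.
        u = min(adj, key=lambda x: len(adj[x]) / weight[x])
        num_edges -= len(adj[u])
        total_weight -= weight[u]
        for v in adj.pop(u):
            adj[v].discard(u)
        density = num_edges / total_weight
        if density > best_density:
            best_density, best_nodes = density, set(adj)
    return best_nodes, best_density


# Hypothetical toy graph: 4-clique A (light weights), 6-clique B (heavy weights).
A = ["a0", "a1", "a2", "a3"]
B = ["b0", "b1", "b2", "b3", "b4", "b5"]
adj = {u: set() for u in A + B}
for u, v in list(combinations(A, 2)) + list(combinations(B, 2)) + [("a3", "b0")]:
    adj[u].add(v)
    adj[v].add(u)
weight = {**{u: 5 for u in A}, **{u: 50 for u in B}}

best_nodes, best_density = greedy_peel(adj, weight)
print(sorted(best_nodes), best_density)
```

Peeling gives no optimality guarantee here; it is only meant to show how low-degree, heavily weighted nodes are stripped away before the dense core is reported.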
Finding the QDC’
Subgraph contraction
• Reduce the search space
• Retain the densest subgraph
• On the reduced search space
Finding the QDC
Greedy Node Deletion
1) Delete low degree nodes
2) Maintain the connectivity
Local Expansion
1) Connect the query nodes with a Steiner tree
2) Greedy local expansion
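The local expansion strategy might look like the sketch below. Two simplifications are assumed: the Steiner tree is replaced by a union of breadth-first shortest paths from the first query node to the others (a crude stand-in), and expansion stops as soon as no neighbouring node raises the density e(S)/π(S). The graph is assumed to be connected; all names and the toy data are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_path(adj, source, target):
    """Shortest path from source to target (graph assumed connected)."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            break
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    path, node = [], target
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def density(adj, weight, nodes):
    """e(S) / pi(S) for the subgraph induced by nodes."""
    edges = sum(len(adj[u] & nodes) for u in nodes) / 2
    return edges / sum(weight[u] for u in nodes)


def local_expansion(adj, weight, query):
    query = list(query)
    # Step 1: connect the query nodes (BFS paths stand in for a Steiner tree).
    nodes = set(query)
    for q in query[1:]:
        nodes.update(bfs_path(adj, query[0], q))
    # Step 2: greedily add the neighbouring node that raises the density most.
    best = density(adj, weight, nodes)
    while True:
        frontier = {v for u in nodes for v in adj[u]} - nodes
        if not frontier:
            break
        candidate = max(frontier, key=lambda v: density(adj, weight, nodes | {v}))
        new_density = density(adj, weight, nodes | {candidate})
        if new_density <= best:
            break
        nodes.add(candidate)
        best = new_density
    return nodes, best


# Hypothetical toy graph: 4-clique A (light weights), 6-clique B (heavy weights).
A = ["a0", "a1", "a2", "a3"]
B = ["b0", "b1", "b2", "b3", "b4", "b5"]
adj = {u: set() for u in A + B}
for u, v in list(combinations(A, 2)) + list(combinations(B, 2)) + [("a3", "b0")]:
    adj[u].add(v)
    adj[v].add(u)
weight = {**{u: 5 for u in A}, **{u: 50 for u in B}}

community, rho = local_expansion(adj, weight, ["a0", "a2"])
print(sorted(community), rho)
```

Every added node is adjacent to the current set, so the result stays connected by construction. Greedy expansion can stop at a local optimum, so this is a heuristic rather than an exact solver.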
Experiments: Datasets
Dataset        # Nodes       # Edges          # Communities
Amazon         334,863       925,872          151,037
DBLP           317,080       1,049,866        13,477
Youtube        1,134,890     2,987,624        8,385
Orkut          3,072,441     117,185,083      6,288,363
LiveJournal    3,997,962     34,681,189       287,512
Friendster     65,608,366    1,806,067,135    957,154
[1] J. Yang and J. Leskovec. Defining and evaluating network communities based on ground-truth. In ICDM, 2012.
[2] snap.stanford.edu
Experiments: State-of-the-Art Methods
Classes                                     Abbr.   Ref.   Key Idea
Internal denseness                          DS      [1]    Densest subgraph with query constraint
Internal denseness                          OQC     [2]    Optimal quasi-clique; edge-surplus
Internal denseness                          MDG     [3]    Minimum degree
Internal denseness & external sparseness    PRN     [4]    External conductance
Internal denseness & external sparseness    LS      [5]    Local spectral
Internal denseness & external sparseness    EMC     [6]    More internal edges than external edges
Internal denseness & external sparseness    SM      [7]    Subgraph modularity
Boundary sharpness                          LM      [8]    Local modularity
[1] B. Saha, et al. RECOMB’10.
[2] C. Tsourakakis, et al. SIGMOD’14.
[3] M. Sozio, et al. KDD’10.
[4] R. Andersen, et al. FOCS’06.
[5] M. W. Mahoney, et al. JMLR’12.
[6] G. W. Flake, KDD’00.
[7] F. Luo, et al. WIAS’08.
[8] A. Clauset, PRE’05.
Experiments: Effectiveness Evaluation Metrics
Metrics
F-score
Community goodness metrics
  Density
  Cohesiveness
  Separability
Consistency
[1] J. Yang and J. Leskovec. Defining and evaluating network communities based on ground-truth. In ICDM, pages 745-754, 2012.
[2] L. Ma, et al. GMAC: A seed-insensitive approach to local community detection. In DaWak, pages 297-308, 2013.
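For the F-score, a minimal sketch follows, assuming the standard balanced F-score (harmonic mean of precision and recall) between a detected community and a ground-truth community. The function name and the node sets are hypothetical. The example adds two "free rider" nodes to a community that is otherwise recovered perfectly.

```python
def f_score(detected, ground_truth):
    """Return (precision, recall, F-score) of a detected community.

    Assumes the balanced F-score: 2 * precision * recall / (precision + recall).
    """
    detected, ground_truth = set(detected), set(ground_truth)
    overlap = len(detected & ground_truth)
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / len(detected)
    recall = overlap / len(ground_truth)
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical example: nodes 5 and 6 are free riders outside the true community.
precision, recall, f1 = f_score({1, 2, 3, 4, 5, 6}, {1, 2, 3, 4})
print(precision, recall, f1)
```

Recall stays perfect because every true member is found, but the extra nodes lower precision and therefore the F-score.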
Effectiveness Evaluation: F-Score
F-score           QDC    DS     OQC    MDG    PRN    LS     EMC    SM     LM
Amazon            0.83   0.52   0.54   0.46   0.69   0.66   0.61   0.60   0.58
DBLP              0.46   0.31   0.33   0.32   0.48   0.42   0.34   0.36   0.37
Youtube           0.43   0.23   0.22   0.17   0.26   0.24   0.21   0.21   0.22
Orkut             0.47   0.15   0.16   0.13   0.21   0.17   0.19   0.16   0.18
LiveJournal       0.64   0.48   0.47   0.40   0.52   0.51   0.47   0.48   0.49
Friendster        0.32   --     0.14   0.12   0.17   0.16   --     0.14   0.13
Avg. F-score      0.53   0.30   0.31   0.27   0.39   0.36   0.33   0.33   0.33
Avg. Precision    0.65   0.46   0.45   0.29   0.51   0.41   0.34   0.38   0.48
Avg. Recall       0.78   0.61   0.58   0.69   0.67   0.64   0.66   0.63   0.59
Effectiveness Evaluation: Goodness Metrics
[Figure: Community goodness metrics on LiveJournal graph]
Effectiveness Evaluation: Consistency
Consistency    QDC    DS     OQC    MDG    PRN    LS     EMC    SM     LM
Amazon         0.94   0.77   0.76   0.58   0.79   0.69   0.74   0.67   0.61
DBLP           0.88   0.62   0.64   0.37   0.65   0.53   0.56   0.43   0.56
Youtube        0.85   0.61   0.54   0.46   0.71   0.41   0.57   0.37   0.36
Orkut          0.83   0.56   0.52   0.32   0.68   0.43   0.51   0.54   0.47
LiveJournal    0.93   0.74   0.67   0.43   0.84   0.64   0.73   0.58   0.52
Friendster     0.78   --     0.56   0.45   0.65   0.49   --     0.32   0.39
Average        0.87   0.64   0.62   0.44   0.72   0.53   0.61   0.49   0.49
Conclusions
1) The free rider effect is a serious problem;
2) The query biased node weighting scheme can effectively eliminate the free rider effect and thus improve accuracy.