Community Detection for Emerging Networks
Jiawei Zhang1, Philip S. Yu1,2
1 University of Illinois at Chicago, USA
2 Tsinghua University, China
New Social Networks Emerge Every Year
[Figure: timeline of social network launch years, 2004 to 2011]
http://en.wikipedia.org/wiki/List_of_social_networking_websites
Emerging Networks Attract Limited Usage
[Figure: an emerging network with user accounts, locations, tips, and temporal activities (8 AM to 11 PM)]
Emerging Networks Contain Sparse Information
Hard to calculate effective closeness measures among users due to the sparse information
Emerging Network Community Detection
closeness measures among users: Intimacy
Challenge 1: Information Sparsity Problem
• Solution: use both Link and Attribute information
Intimacy Calculation with both Connection and Attribute Information
[Figure: block matrix with rows and columns indexed by user, time, loc, and word, containing a 0 block]
network transitional matrix
weighted normalized adjacency matrices (1) among users (2) between users and attributes
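The block layout described above can be sketched as follows. This is a minimal illustration, not the authors' exact construction: the function names, the single weight `w_link`, and the choice of row normalization are assumptions, and the attribute columns (time, loc, word) are assumed to be concatenated into one user-attribute matrix.

```python
import numpy as np

def row_normalize(m):
    """Scale each row to sum to 1; all-zero rows stay zero."""
    m = np.asarray(m, dtype=float)
    sums = m.sum(axis=1, keepdims=True)
    return np.divide(m, sums, out=np.zeros_like(m), where=sums > 0)

def build_transition_matrix(user_adj, user_attr, w_link=0.5):
    """Assemble a square block matrix over users and attributes.

    user_adj  : n x n user-user adjacency matrix (hypothetical input)
    user_attr : n x m user-attribute counts (time, loc, word columns)
    w_link    : assumed weight on links; 1 - w_link goes to attributes
    """
    n, m = np.asarray(user_attr).shape
    q = np.zeros((n + m, n + m))
    q[:n, :n] = w_link * row_normalize(user_adj)            # user -> user
    q[:n, n:] = (1.0 - w_link) * row_normalize(user_attr)   # user -> attribute
    q[n:, :n] = row_normalize(np.asarray(user_attr).T)      # attribute -> user
    # The attribute-attribute block is left as 0.
    return q
```

With this layout, users occupy the first n rows and columns, so user-to-user scores sit in the upper-left corner of any matrix derived from it.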
Intimacy Calculation with both Connection and Attribute Information
high-dimensional stationary network transitional matrix
we only care about the intimacy matrix among users (lower dimension)
sub-matrix at the upper left corner
intimacy matrix among users
stationary network transitional matrix calculation
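The slides do not reproduce the formula for the stationary matrix, so the sketch below assumes a restart-style fixed point, S = (1 - alpha) Q S + alpha I, which has the closed form S = alpha (I - (1 - alpha) Q)^(-1). The parameter `alpha` and both function names are hypothetical. The second function keeps only the upper-left user block, as described above.

```python
import numpy as np

def stationary_matrix(q, alpha=0.15):
    """Solve S = (1 - alpha) * Q @ S + alpha * I (assumed update rule)."""
    size = q.shape[0]
    identity = np.eye(size)
    # Solving the linear system avoids forming an explicit inverse.
    return alpha * np.linalg.solve(identity - (1.0 - alpha) * q, identity)

def user_intimacy(q, n_users, alpha=0.15):
    """Return the upper-left n_users x n_users block of the stationary matrix."""
    return stationary_matrix(q, alpha)[:n_users, :n_users]
```

Under this assumption, the solve step works on the full (users + attributes) dimension even though only the small user block is kept.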
Challenge 2: Cold Start Community Detection
Emerging Network Community Detection
A special case: Cold Start Community Detection
(no social activities exist at all)
[Figure: user accounts, locations, tweets, and temporal activities (8 AM to 11 PM), with anchor links between the networks]
Users use multiple social networks simultaneously
anchor users
non-anchor users
Partially Aligned Social Networks
Intimacy Calculation with Information across Aligned Networks
network transitional matrix of Twitter
network transitional matrix of Foursquare
anchor transitional matrix
weighted aligned network transitional matrix
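One way to combine the three matrices listed above is sketched below. The exact weighting is not given in the slides, so the mixing weight `w_anchor`, the final row renormalization, and the function name are all assumptions. Users are assumed to occupy the first rows of each network's matrix, and anchor links are assumed to be one-to-one index pairs.

```python
import numpy as np

def aligned_transition_matrix(q_target, q_source, anchors, w_anchor=0.3):
    """Combine two network matrices and anchor links into one block matrix.

    q_target, q_source : per-network transitional matrices (users first)
    anchors            : list of (target_user_index, source_user_index)
    w_anchor           : assumed weight given to anchor links
    """
    dt, ds = q_target.shape[0], q_source.shape[0]
    within = np.zeros((dt + ds, dt + ds))
    within[:dt, :dt] = q_target          # target network on the diagonal
    within[dt:, dt:] = q_source          # source network on the diagonal
    across = np.zeros_like(within)       # anchor transitional matrix
    for i, j in anchors:
        across[i, dt + j] = 1.0
        across[dt + j, i] = 1.0
    combined = (1.0 - w_anchor) * within + w_anchor * across
    # Renormalize rows so nodes without an anchor link keep their original rows.
    sums = combined.sum(axis=1, keepdims=True)
    return np.divide(combined, sums, out=np.zeros_like(combined), where=sums > 0)
```

For example, with Foursquare as the target and Twitter as the source, the Foursquare users would sit in the upper-left corner of the result.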
high-dimensional stationary aligned network transitional matrix
we only care about the intimacy matrix among users (lower dimension)
sub-matrix at the upper left corner
intimacy matrix among users in Foursquare
Intimacy Calculation with Information across Aligned Networks
Challenge 3: High Time and Space Costs
approximation
intimacy matrix among users in
intimacy matrix among users in Foursquare
the final approximated intimacy matrix
Solution: Approximated Intimacy Calculation
Approximated Intimacy Calculation
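The approximation formula itself is not recoverable from the slides. The sketch below shows one plausible reading, stated as an assumption: compute the low-dimensional user intimacy matrix in each network separately, then carry the source network's scores over to the target network through the anchor links. The weight `w_source` and the function name are hypothetical.

```python
import numpy as np

def approx_aligned_intimacy(h_target, h_source, anchors, w_source=0.3):
    """Approximate aligned intimacy from two per-network user matrices.

    h_target, h_source : user-user intimacy matrices of each network
    anchors            : list of (target_user_index, source_user_index)
    w_source           : assumed weight on information from the source network
    """
    n_t, n_s = h_target.shape[0], h_source.shape[0]
    a = np.zeros((n_t, n_s))             # binary anchor matrix
    for i, j in anchors:
        a[i, j] = 1.0
    # Map source intimacy onto pairs of anchored target users.
    transferred = a @ h_source @ a.T
    return h_target + w_source * transferred
```

In the cold start case, `h_target` is all zeros and the result comes entirely from the transferred scores. This form never builds or inverts the high-dimensional aligned matrix, which is where the time and space savings would come from under this assumption.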
Clustering based on Intimacy Matrix
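The slides do not name the clustering algorithm. As an illustration only, the sketch below uses spectral bisection, a standard technique swapped in here, to split users into two communities from an intimacy matrix. The function name is hypothetical.

```python
import numpy as np

def spectral_bisection(intimacy):
    """Split users into two groups by the sign of the Fiedler vector."""
    w = (intimacy + intimacy.T) / 2.0        # symmetrize the scores
    np.fill_diagonal(w, 0.0)                 # ignore self-intimacy
    laplacian = np.diag(w.sum(axis=1)) - w   # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(laplacian)      # eigenvalues in ascending order
    fiedler = vecs[:, 1]                     # second-smallest eigenvector
    return (fiedler > 0).astype(int)
```

The label values themselves are arbitrary, since the sign of an eigenvector is not fixed. Only the grouping matters.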
Parameter Adjustment: weights of different information types and sources
Experiments
• Dataset
# anchor links: 3,388
Experiments
• Comparison Methods
Experiments
• Evaluation Metrics
Experiment Results
• how new the emerging networks are
• performance of methods using approximated intimacy scores is close to that of methods using the exact intimacy scores
• the parameter adjustment step helps
• our proposed methods can overcome the cold start problem very well
• methods with the approximated intimacy matrix can save a lot of space and time
Summary
• Problem Studied: Emerging Network Community Detection & Cold Start Community Detection
• Calculate the Intimacy scores among users in the emerging network with both Connection and Attribute information across Partially Aligned Networks.
• To lower the time and space cost: Approximated Intimacy Calculation
Q & A
Anchor Links across Networks