Lehrstuhl Informatik 5
(Information Systems)
Prof. Dr. M. Jarke
1
LearningLayers
Contextualized versus Structural Overlapping Community Structures in Social Media
Mohsen Shahriari, Ying Li, Ralf Klamma
This slide deck is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
Contextualized versus Structural
Overlapping Communities in Social
Media
Mohsen Shahriari, Sabrina Haefele, Ralf Klamma
Advanced Community Information Systems (ACIS)
RWTH Aachen University, Germany
shahriari, haefele, [email protected]
Chair of Computer Science 5
RWTH Aachen University
Outline
Research background
– Necessity of community analysis
– Community detection
Literature & Challenges
Research questions
Baselines & Proposed Methods
Dataset & Metrics
Results
Conclusion & Future Works
Research Background: How to
Characterize Networks
Power law
– Eligible for social network analysis
– Presence of hubs
Small-World-ness
Motifs
– Synchronizability, cooperativity, stability and robustness may depend on motif
structures
Community structure
– Overlapping community structure
– But also to support other applications
– Scale up information
[Figure: Degree distribution of the CiteULike user-tag network. Source: networkscience.wordpress.com]
Source: Milgram experiment “The small world problem”
Source: networkscience.wordpress.com
Research Background: What Is A
(overlapping) Community?
Nodes are densely connected inside communities
and sparsely connected between them (Girvan & Newman, 2002)
People with similar interests
or needs (Preece, 2000)
Recent research: overlapping
structures are dense (Yang & Leskovec, 2012)
Research Background: What Is A
(overlapping) Community?
Some networks call for other definitions
Signed social networks: density and balance theory (Doreian, 2004)
Communities and their definitions can be interpreted differently
[Figure: signed network example with positive (+) and negative (-) edges]
Research Background: What is A
(overlapping) Community?
Communities may form when people have
ideas, innovations and thoughts to discuss
– Even when they do not know each other
Literature
Challenges regarding Content-based
Overlapping Community Detection (OCD)
Little is known about the significance of content
– Community events, e.g., releases in an open source developer network
– Correlation of content and structural properties of the social media
Few methods detect overlapping community structures
– Most detect only disjoint community structures
Most of the methods are not suitable for thread-based data
structures
– They need extensive tuning
Most of the approaches do not work on actual posts/contents
– They mainly use attributes/tags
Research Questions
How are structural properties such as the number of overlapping
nodes, modularity and average community size
affected by contextualized similarities among users in
question & answer social platforms?
Can adding content improve the performance of
structure-based algorithms?
Structural/Content-Based OCD
Approaches
First we introduce the baselines used in this work
– Disassortative degree Mixing and Information Diffusion (DMID)
– Speaker-listener Label Propagation Algorithm (SLPA)
– Stanoev, Smilkov and Kocarev (SSK)
– Algorithm by Li, Zhang, Liu, Chen and Zhang (CLIZZ)
Then we introduce the proposed content-based methods
– Cost function optimization clustering algorithm (CFOCA)
– Term community merging algorithm (TCMA)
– Combining content and structural values
Baseline Methods: Disassortative Degree
Mixing and Information Diffusion (DMID)
Detecting the most influential nodes (leaders)
– Using the disassortative degree mixing property
– $AS_{ij} = \deg(i) - \deg(j)$
– Row-normalize the disassortative matrix
– $T_{ij} = \frac{AS_{ij}}{\sum_{k=1}^{N} AS_{ik}}$
– Performing a random walk
– $DA^{t+1} = DA^{t} \times T$
– Computing the local leadership value
– Combining degree and disassortative value
– $LL_i = DA_i \times DRN_i$
Cascading behavior, named the network coordination game
$P_A(i) = \frac{|\{j \in N(i) : j \text{ has behaviour } A\}|}{|N(i)|}$
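The leader-detection steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the disassortative value is taken as the absolute degree difference on existing edges (so that rows can be normalized into probabilities), the random walk starts from a uniform vector and runs for a fixed number of steps, and DRN_i, which the slide does not define, is assumed to be the degree of node i divided by the maximum degree. The function name `dmid_leadership` is hypothetical.

```python
def dmid_leadership(adj, steps=50):
    """adj: dict mapping node -> set of neighbours (undirected graph).

    Returns a dict node -> local leadership value LL_i = DA_i * DRN_i.
    """
    nodes = sorted(adj)
    deg = {v: len(adj[v]) for v in nodes}

    # Disassortative matrix AS, defined on edges only.
    # Assumption: absolute degree difference.
    AS = {i: {j: abs(deg[i] - deg[j]) for j in adj[i]} for i in nodes}

    # Row normalisation: T_ij = AS_ij / sum_k AS_ik (all-zero rows stay zero).
    T = {}
    for i in nodes:
        total = sum(AS[i].values())
        T[i] = {j: (w / total if total else 0.0) for j, w in AS[i].items()}

    # Random walk DA^{t+1} = DA^t x T, started from a uniform vector.
    DA = {v: 1.0 / len(nodes) for v in nodes}
    for _ in range(steps):
        nxt = {v: 0.0 for v in nodes}
        for i in nodes:
            for j, p in T[i].items():
                nxt[j] += DA[i] * p
        DA = nxt

    # Assumption: DRN_i is the degree normalised by the maximum degree.
    max_deg = max(deg.values()) or 1
    return {v: DA[v] * deg[v] / max_deg for v in nodes}
```

On a star graph the hub receives the largest leadership value, which matches the intuition that leaders are high-degree nodes surrounded by low-degree followers.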
Baseline Methods: Speaker-listener
Label Propagation Algorithm (SLPA)
Extension of the label propagation algorithm
– Nodes can take multiple labels
Idea: speaker-listener information propagation process (mimics human
communication)
Nodes can store updated labels
Steps:
1. Each node's memory is initialized with a unique label
2. Repeat until a user-defined number of iterations is reached:
   a. Select one node as listener
   b. Each neighbor (speaker) randomly selects a label from its memory
   c. Listener accepts one of the propagated labels according to a rule (e.g.,
      most popular label)
3. Post-processing phase for identifying the communities
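The steps above translate fairly directly into code. The following is a minimal sketch, not a reference implementation: it assumes that every node acts as listener once per iteration (in shuffled order), that the listening rule is "most popular label" with random tie-breaking, and that post-processing keeps a label for a node when it makes up at least a fraction `threshold` of that node's memory. The names `slpa`, `iterations`, `threshold` and `seed` are chosen here for illustration.

```python
import random
from collections import Counter, defaultdict


def slpa(adj, iterations=20, threshold=0.1, seed=0):
    """adj: dict mapping node -> set of neighbours. Returns a list of node sets."""
    rng = random.Random(seed)
    # Step 1: every node starts with its own unique label in memory.
    memory = {v: [v] for v in adj}
    nodes = list(adj)

    # Step 2: repeated speaker-listener rounds.
    for _ in range(iterations):
        rng.shuffle(nodes)
        for listener in nodes:
            if not adj[listener]:
                continue  # isolated nodes never hear anything
            # Each neighbour (speaker) sends a label drawn from its memory;
            # frequent labels are more likely to be drawn.
            received = [rng.choice(memory[s]) for s in sorted(adj[listener])]
            counts = Counter(received)
            top = max(counts.values())
            winners = sorted(lab for lab, c in counts.items() if c == top)
            memory[listener].append(rng.choice(winners))

    # Step 3: post-processing, keep labels that are frequent enough.
    communities = defaultdict(set)
    for v, labels in memory.items():
        for label, c in Counter(labels).items():
            if c / len(labels) >= threshold:
                communities[label].add(v)
    return list(communities.values())
```

A node that keeps more than one label after thresholding belongs to more than one community, which is how overlaps arise. Raising `threshold` toward 0.5 makes the result closer to a disjoint partition.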
Baseline Methods: Stanoev, Smilkov
and Kocarev (SSK)
An algorithm based on influence dynamics and membership
computation
– Relationships of nodes and their influences are more important than direct
connections
– Proxies among nodes are better established when there exist triangles among
nodes
Computing the transitive link matrix using both the adjacency matrix and
triangle occurrences
Computing the membership of nodes to leaders
– Weighted average membership of neighbors
Baseline Methods: CLIZZ
Two-phase algorithm
– Identifying influential nodes based on influence range
– Influence ranges are computed based on shortest
distance
– Computing membership values of nodes using an
updating rule
Proposed Content-Based Methods:
Feature Creation Phase
Term Matrix
– Constructed from the threads of each user
– Converted by tf-idf
[Figure: threads are converted by tf-idf into the term matrix]
Term Matrix (example values)
w1    w2    w3    …
0.23  0.5   0
0.8   0     1
0     1.2   0.59
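To make the feature creation phase concrete, here is a small sketch of how such a term matrix could be built. It is an assumption-laden illustration: each user's threads are concatenated into one document, tokenization is a plain whitespace split, and the weighting is the textbook form tf × log(N / df). The slides do not say which tf-idf variant or preprocessing was used, and the function name `tfidf_matrix` is hypothetical.

```python
import math
from collections import Counter


def tfidf_matrix(user_threads):
    """user_threads: dict mapping user -> list of thread texts.

    Returns (vocab, matrix), where matrix[user] is a list of tf-idf
    weights aligned with vocab.
    """
    # One document per user: all of the user's threads joined together.
    docs = {u: " ".join(threads).lower().split()
            for u, threads in user_threads.items()}
    vocab = sorted({w for words in docs.values() for w in words})
    n_docs = len(docs)
    # Document frequency: in how many users' documents a term occurs.
    df = Counter(w for words in docs.values() for w in set(words))

    matrix = {}
    for u, words in docs.items():
        tf = Counter(words)
        total = len(words) or 1  # avoid division by zero for empty documents
        matrix[u] = [tf[w] / total * math.log(n_docs / df[w]) for w in vocab]
    return vocab, matrix
```

With this weighting, a term used by every user gets weight 0, so only terms that distinguish users contribute to the similarity computations that follow.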
Cost Function Optimization
Clustering Algorithm (CFOCA)
Minimization of the costs
Cost function J based on cosine similarity
Updating the centroids using gradient descent
Modification for overlapping communities: threshold
for distance to other centroids
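The slide gives only the ingredients of CFOCA, so the following is a hedged sketch of one way they could fit together, not the authors' algorithm. It assumes the cost J is the sum of cosine distances (1 − cosine similarity) between each user vector and its closest centroid, that centroids are updated by plain batch gradient descent on J, and that a user joins every community whose centroid lies within `overlap_threshold` cosine distance, in addition to the closest one. All names and default values are illustrative.

```python
import math


def cosine(x, c):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    nx = math.sqrt(sum(a * a for a in x))
    nc = math.sqrt(sum(b * b for b in c))
    if nx == 0 or nc == 0:
        return 0.0
    return sum(a * b for a, b in zip(x, c)) / (nx * nc)


def cfoca_sketch(points, centroids, lr=0.1, epochs=100, overlap_threshold=0.3):
    """Returns (centroids, cover); cover[k] is the set of point indices in community k."""
    centroids = [list(c) for c in centroids]
    for _ in range(epochs):
        grads = [[0.0] * len(c) for c in centroids]
        for x in points:
            # Assign the point to its most similar centroid.
            k = max(range(len(centroids)), key=lambda i: cosine(x, centroids[i]))
            c = centroids[k]
            nx = math.sqrt(sum(a * a for a in x))
            nc = math.sqrt(sum(b * b for b in c))
            if nx == 0 or nc == 0:
                continue
            cos = cosine(x, c)
            for d in range(len(c)):
                # Gradient of (1 - cos) w.r.t. c is -(x/(|x||c|) - cos * c/|c|^2).
                grads[k][d] -= x[d] / (nx * nc) - cos * c[d] / (nc * nc)
        # Gradient descent step on J.
        for k, c in enumerate(centroids):
            for d in range(len(c)):
                c[d] -= lr * grads[k][d]

    # Overlap rule: closest centroid plus every centroid within the threshold.
    cover = [set() for _ in centroids]
    for idx, x in enumerate(points):
        dists = [1.0 - cosine(x, c) for c in centroids]
        best = min(range(len(dists)), key=dists.__getitem__)
        for k, dist in enumerate(dists):
            if k == best or dist <= overlap_threshold:
                cover[k].add(idx)
    return centroids, cover
```

In this sketch the threshold is the only source of overlap: with `overlap_threshold=0` the result is an ordinary disjoint clustering, and larger values let users who sit between two centroids belong to both communities.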
Term Community Merging
Algorithm (TCMA)
Two phases
– Compute one community per word
– Refine the communities using the overlap
coefficient
Term Matrix (example values)
w1    w2    w3    …
0.23  0.5   0
0.8   0.76  1
0     1.2   0.59
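The two phases can be illustrated with a short sketch. It rests on assumptions the slide does not spell out: a user belongs to a term's community when the user's weight for that term exceeds `min_weight`, the overlap coefficient is the standard |A ∩ B| / min(|A|, |B|), and refinement means repeatedly merging any two communities whose coefficient reaches `merge_threshold`. The function names and default values are hypothetical.

```python
def overlap_coefficient(a, b):
    """Standard overlap coefficient |A ∩ B| / min(|A|, |B|) of two sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def tcma_sketch(term_matrix, min_weight=0.0, merge_threshold=0.5):
    """term_matrix: non-empty dict mapping user -> list of term weights.

    Returns a list of user sets (communities may overlap).
    """
    n_terms = len(next(iter(term_matrix.values())))

    # Phase 1: one community per term, holding the users who use that term.
    communities = []
    for t in range(n_terms):
        members = {u for u, row in term_matrix.items() if row[t] > min_weight}
        if members and members not in communities:
            communities.append(members)

    # Phase 2: merge pairs of communities until no pair is similar enough.
    merged = True
    while merged:
        merged = False
        for i in range(len(communities)):
            for j in range(i + 1, len(communities)):
                if overlap_coefficient(communities[i], communities[j]) >= merge_threshold:
                    communities[i] = communities[i] | communities[j]
                    del communities[j]
                    merged = True
                    break
            if merged:
                break
    return communities
```

Because a user typically uses many terms, the phase-1 communities overlap heavily. The merge threshold then controls how much of that overlap survives into the final cover.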
Content-Based Weighting Method
Generate two weights from content
Use OCD algorithms such as DMID, SSK and CLiZZ
to compute communities
[Figure: threads, the pair (r, s), and the term matrix]
Term Matrix (example values)
w1    w2    w3    …
0.23  0.5   0     …
0.8   0     1     …
Datasets and Metrics
Jmol dataset
– Forum discussion regarding a Java-Tool for molecular modeling of
chemical structures
– Open source development
– 2002 – 2012
– Publicly available at https://github.com/rwth-acis/REST-OCDServices/wiki/Jmol-Dataset
Combined modularity
– Considering both content and density
Number of overlapping nodes and average community sizes to
extract useful information
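Two of the metrics named above are simple to compute once a cover (a list of possibly overlapping communities) is available. The sketch below shows one straightforward reading of them: a node counts as overlapping if it appears in more than one community, and the average community size is the mean number of members per community. The function name `cover_statistics` is chosen for illustration. Combined modularity is not sketched because the slides do not give its formula.

```python
from collections import Counter


def cover_statistics(cover):
    """cover: list of communities, each an iterable of node ids.

    Returns (number_of_overlapping_nodes, average_community_size).
    """
    # Count in how many communities each node appears.
    membership = Counter(v for community in cover for v in set(community))
    overlapping = sum(1 for count in membership.values() if count > 1)
    avg_size = sum(len(set(c)) for c in cover) / len(cover) if cover else 0.0
    return overlapping, avg_size
```

For example, the cover `[{1, 2, 3}, {3, 4}]` has one overlapping node (node 3) and an average community size of 2.5.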
Similarity Costs versus Average
Community Size
Releases 1, 10 and 11 have low content similarity
Release 6 has the highest content similarity
and the largest community size
Similarity Costs versus Number of
Overlapping Nodes
Releases 2, 3, 4 and 5 have high similarity and few
overlapping nodes
Similarity costs are global measures
Similarity Costs versus Modularity
Inverse relation between content similarity and modularity
Average Community Size versus
Releases
Content-based algorithms are useful when the structure of the
network is missing
Content-based algorithms detect larger communities
Number of Overlapping Nodes versus
Releases
Content-based methods may reflect the actual changes
Content-based methods detect more overlap than
structure-based methods
Conclusion & Future Works
Conclusion & Message:
Content has a significant effect on structure-based techniques
– Changes in community sizes, number of overlapping nodes and modularity
– Content-based methods detect larger communities with larger overlaps
Future Works:
Investigate local similarity costs
Improve time complexity
References
Ahn, Y.-Y., Bagrow, J. P., & Lehmann, S. (2010). Link communities reveal multiscale complexity in networks. Nature, 466(7307), 761–764. doi:10.1038/nature09182
Derényi, I., Palla, G., & Vicsek, T. (2005). Clique Percolation in Random Networks. Physical Review Letters, 94(16), 160202. doi:10.1103/PhysRevLett.94.160202
Ding, Z., Zhang, X., Sun, D., & Luo, B. (2016). Overlapping Community Detection based on Network Decomposition. Sci Rep, 6(24115). doi:10.1038/srep24115
Doreian, P. (2004). Evolution of Human Signed Networks, 1(2), 277–293. Retrieved from http://snap.stanford.edu/class/cs224w-readings/dorean04evolution.pdf
Girvan, M., & Newman, M. E. J. (2002). Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12), 7821–7826. doi:10.1073/pnas.122653799
Günnemann, S., Boden, B., Färber, I., & Seidl, T. (2013). Efficient Mining of Combined Subspace and Subgraph Clusters in Graphs with Feature Vectors. In Advances in Knowledge Discovery and Data Mining (pp. 261–275). Springer Berlin Heidelberg.
Günnemann, S., Färber, I., Boden, B., & Seidl, T. (2010). Subspace clustering meets dense subgraph mining: a synthesis of two paradigms. In The 10th International Conference on Data Mining.
Havemann, F., Heinz, M., Struck, A., & Gläser, J. (2011). Identification of overlapping communities and their hierarchy by locally calculating community-changing resolution levels. Journal of Statistical Mechanics: Theory and Experiment. doi:10.1088/1742-5468/2011/01/P01023
Preece, J. (2002). Supporting Community and Building Social Capital - Guest Editorial. Communications of the ACM, 45(4), 37–39.
Shahriari, M., Parekodi, S., & Klamma, R. (2015). Community-aware Ranking Algorithms for Expert Identification in Question-answer Forums. In Proceedings of the 15th International Conference on Knowledge Technologies and Data-driven Business, I-KNOW (pp. 1–8). ACM. Retrieved from http://doi.acm.org/10.1145/2809563.2809592
Shen, H., Cheng, X., Cai, K., & Hu, M.-B. (2009). Detect overlapping and hierarchical community structure in networks. Physica A: Statistical Mechanics and its Applications, 388(8), 1706–1712. doi:10.1016/j.physa.2008.12.021
Yang, J., & Leskovec, J. (2012). Structure and Overlaps of Communities in Networks. CoRR, abs/1205.6228.