Page 1: Chair of Computer Science 5 RWTH Aachen Universitymagazin.know-center.tugraz.at/wp-content/uploads/... · Contextualized versus Structural Overlapping Community Structures in Social

Lehrstuhl Informatik 5

(Information Systems)

Prof. Dr. M. Jarke

Learning Layers

Contextualized versus Structural Overlapping Community Structures in Social Media

Mohsen Shahriari, Ying Li, Ralf Klamma

This slide deck is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Contextualized versus Structural Overlapping Communities in Social Media

Mohsen Shahriari, Sabrina Haefele, Ralf Klamma

Advanced Community Information Systems (ACIS)

RWTH Aachen University, Germany

{shahriari, haefele, klamma}@dbis.rwth-aachen.de

Chair of Computer Science 5

RWTH Aachen University

Outline

Research background

– Necessity of community analysis

– Community detection

Literature & Challenges

Research questions

Baselines & Proposed Methods

Dataset & Metrics

Results

Conclusion & Future Work

Research Background: How to Characterize Networks

Power law

– Eligible for social network analysis

– Presence of hubs

Small-World-ness

Motifs

– Synchronizability, cooperativity, stability and robustness may depend on motif structures

Community structure

– Overlapping community structure

– But also to support other applications

– Scale up information

[Figure: Degree distribution of the CiteULike user-tag network. Source: networkscience.wordpress.com]

Source: Milgram experiment “The small world problem”

Source: Taken from networkscience.wordpress.com

Research Background: What Is an (Overlapping) Community?

Nodes are densely connected inside communities and sparsely connected between them

People with similar interests or needs (Preece, 2000)

Recent research: overlapping structures are dense (Yang & Leskovec, 2012)

(Girvan & Newman, 2002)

Research Background: What Is an (Overlapping) Community?

Some networks call for other definitions

Signed social networks: density and balance theory (Doreian, 2004)

Communities and their definitions can be interpreted differently

[Figure: example network with edges labeled + and −]

Research Background: What Is an (Overlapping) Community?

Communities may form when people have ideas, innovations and thoughts to discuss

– Even when they do not know each other

Literature

Challenges Regarding Content-Based Overlapping Community Detection (OCD)

Little is known about the significance of content

– Community events, e.g., releases in an open source developer network

– Correlation of content and structural properties of the social media

Few content-based methods detect overlapping community structures

– Most detect only disjoint community structures

Most of the methods are not suitable for thread-based data structures

– They need extensive tuning

Most of the approaches do not work on actual posts/contents

– They mainly use attributes/tags

Research Questions

How are structural properties such as the number of overlapping nodes, modularity and average community size affected by contextualized similarities among users in question & answer social platforms?

Can adding content improve the performance of structure-based algorithms?

Structural/Content-Based OCD Approaches

First we introduce the baselines used in this work

– Disassortative degree Mixing and Information Diffusion (DMID)

– Speaker-listener Label Propagation Algorithm (SLPA)

– Stanoev, Smilkov and Kocarev (SSK)

– Algorithm by Li, Zhang, Liu, Chen and Zhang (CLiZZ)

Then we introduce the proposed content-based methods

– Cost function optimization clustering algorithm (CFOCA)

– Term community merging algorithm (TCMA)

– Combining content and structural values

Baseline Methods: Disassortative Degree Mixing and Information Diffusion (DMID)

Detecting most influential nodes (leaders)

– Using the disassortative degree mixing property

– $AS_{ij} = \deg(i) - \deg(j)$

– Row normalize the disassortative matrix

– $T_{ij} = \dfrac{AS_{ij}}{\sum_{k=1}^{N} AS_{ik}}$

– Performing a random walk

– $DA^{t+1} = DA^{t} \times T$

– Computing the local leadership value

– Combining degree and disassortative value

– $LL_{i} = DA_{i} \times DRN_{i}$

Cascading behavior modeled as a network coordination game

$P_{A}(i) = \dfrac{|\{\, j \in N(i) : j \text{ has behaviour } A \,\}|}{|N(i)|}$
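
The leader-detection steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. Several details are assumptions that the slide does not state: the sketch takes the absolute degree difference between connected nodes so that rows normalize to probabilities, it treats $DRN_i$ as the degree divided by the maximum degree, it starts the walk from a uniform distribution, and it lets a walker stay in place when a row of $AS$ sums to zero. The names `dmid_leadership` and `adoption_fraction` are hypothetical.

```python
def dmid_leadership(adjacency, steps=50):
    """Return a local leadership value LL_i for every node.

    adjacency: dict mapping each node to a list of its neighbors (undirected).
    """
    nodes = sorted(adjacency)
    deg = {v: len(adjacency[v]) for v in nodes}

    # AS_ij: degree difference between connected nodes
    # (absolute value is an assumption of this sketch).
    AS = {i: {j: abs(deg[i] - deg[j]) for j in adjacency[i]} for i in nodes}

    # T_ij = AS_ij / sum_k AS_ik (row normalization).
    T = {}
    for i in nodes:
        total = sum(AS[i].values())
        T[i] = {j: a / total for j, a in AS[i].items()} if total > 0 else {}

    # Random walk: DA^{t+1} = DA^t x T, started from a uniform vector (assumption).
    DA = {v: 1.0 / len(nodes) for v in nodes}
    for _ in range(steps):
        nxt = {v: 0.0 for v in nodes}
        for i in nodes:
            if not T[i]:
                nxt[i] += DA[i]  # zero row: the walker stays (assumption)
            for j, p in T[i].items():
                nxt[j] += DA[i] * p
        DA = nxt

    # DRN_i: degree relative to the maximum degree (assumption).
    max_deg = max(deg.values())
    DRN = {v: deg[v] / max_deg for v in nodes}

    # LL_i = DA_i x DRN_i
    return {v: DA[v] * DRN[v] for v in nodes}


def adoption_fraction(adjacency, adopters, i):
    """P_A(i): fraction of i's neighbors that already show behaviour A."""
    neighbors = adjacency[i]
    return sum(1 for j in neighbors if j in adopters) / len(neighbors)


# Usage: in a star graph the hub should receive the highest leadership value.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
leadership = dmid_leadership(star)
print(max(leadership, key=leadership.get))
print(adoption_fraction(star, {1, 2}, 0))
```

In the coordination game a node would adopt behaviour A once `adoption_fraction` exceeds some threshold; the slide does not give that threshold, so it is left out here.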

Baseline Methods: Speaker-listener Label Propagation Algorithm (SLPA)

Extension of the label propagation algorithm

– Nodes can take multiple labels

Idea: speaker-listener information propagation process (mimics human communication)

Nodes can store updated labels

Steps:

1. Each node's memory is initialized with a unique label

2. Do until a user-defined iteration number is reached:

   1. Select one node as listener

   2. Each neighbor (speaker) randomly selects a label from its memory

   3. Listener accepts one of the propagated labels according to a rule (e.g., most popular label)

3. Post-processing phase for identifying the communities
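
The three steps above can be sketched as follows. This is a hedged illustration rather than a reference implementation: visiting every node once per iteration in shuffled order, breaking ties at random, and the post-processing rule (keep a label for a node when it fills at least a fraction `threshold` of that node's memory) are assumptions of this sketch, and the function name `slpa` is hypothetical.

```python
import random
from collections import Counter


def slpa(adjacency, iterations=20, threshold=0.2, seed=0):
    """Speaker-listener label propagation on an undirected graph.

    adjacency: dict mapping each node to a list of its neighbors.
    Returns (memory, communities), where communities maps label -> set of nodes.
    """
    rng = random.Random(seed)

    # Step 1: every node's memory starts with its own unique label.
    memory = {node: [node] for node in adjacency}
    nodes = list(adjacency)

    # Step 2: repeat for a user-defined number of iterations.
    for _ in range(iterations):
        rng.shuffle(nodes)
        for listener in nodes:
            speakers = adjacency[listener]
            if not speakers:
                continue
            # Each speaker draws one label from its memory; frequent labels
            # are drawn more often because the memory keeps duplicates.
            spoken = [rng.choice(memory[s]) for s in speakers]
            counts = Counter(spoken)
            top = max(counts.values())
            winners = sorted(lab for lab, c in counts.items() if c == top)
            # Listener rule: accept the most popular label (random tie-break).
            memory[listener].append(rng.choice(winners))

    # Step 3: post-processing, keep labels that are frequent enough per node.
    communities = {}
    for node, labels in memory.items():
        for label, count in Counter(labels).items():
            if count / len(labels) >= threshold:
                communities.setdefault(label, set()).add(node)
    return memory, communities


# Usage: two triangles that share node 2.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3, 4], 3: [2, 4], 4: [2, 3]}
memory, communities = slpa(graph)
print(communities)
```

A node that keeps more than one label after thresholding belongs to several communities, which is how overlap arises in this scheme.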

Baseline Methods: Stanoev, Smilkov and Kocarev (SSK)

An algorithm based on influence dynamics and membership computation

– Relationships of nodes and their influences are more important than direct connections

– Proximity between nodes is better established when triangles exist among them

Computing the transitive link matrix using both the adjacency matrix and triangle occurrences

Computing the membership of nodes to leaders

– Weighted average membership of neighbors

Baseline Methods: CLiZZ

Two-phase algorithm

– Identifying influential nodes based on influence range

– Influence ranges are computed based on shortest distance

– Computing membership values of nodes using an updating rule

Proposed Content-Based Methods: Feature Creation Phase

Term Matrix

– Constructed from threads of the user

– Converted by tf-idf

[Figure: Threads → tf-idf → Term Matrix]

w1 w2 w3 …

0.23 0.5 0

0.8 0 1

0 1.2 0.59
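
A minimal sketch of the feature creation phase, assuming one text document per row (for instance, all thread posts of one user concatenated), whitespace tokenization, and the plain weighting tf × ln(N / df). The slides do not specify the tf-idf variant or any preprocessing, so these choices and the name `tfidf_matrix` are illustrative only.

```python
import math
from collections import Counter


def tfidf_matrix(documents):
    """Build a term matrix with tf-idf weights.

    documents: list of strings, one per row of the matrix.
    Returns (vocabulary, matrix), with matrix[d][t] the weight of term t in d.
    """
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        row = []
        for term in vocabulary:
            tf = counts[term] / len(tokens) if tokens else 0.0
            idf = math.log(n_docs / doc_freq[term])
            row.append(tf * idf)
        matrix.append(row)
    return vocabulary, matrix


# Usage with three made-up forum snippets.
vocab, matrix = tfidf_matrix(["jmol applet bug", "jmol script", "jmol bug report"])
print(vocab)
for row in matrix:
    print([round(w, 2) for w in row])
```

A term that occurs in every document gets weight zero under this variant, so it does not contribute to the similarity between rows.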

Cost Function Optimization Clustering Algorithm (CFOCA)

Minimization of the costs

Cost function J based on cosine similarity

Updating the centroids using gradient descent

Modification for overlapping communities: threshold for distance to other centroids
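
The idea can be sketched under explicit assumptions, since the slide does not give the exact cost or update rule. Here the cost is taken as $J = \sum_i \bigl(1 - \cos(x_i, c_{a(i)})\bigr)$, where $c_{a(i)}$ is the centroid nearest to $x_i$; centroids move by plain gradient descent on $J$; and a point additionally joins every centroid whose cosine distance is within `delta` of its best distance. All names (`cfoca_sketch`, `delta`, `lr`) are hypothetical, and all vectors are assumed to be non-zero.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cost(points, centroids):
    """J: sum over points of the cosine distance to the nearest centroid."""
    return sum(min(1.0 - cosine(x, c) for c in centroids) for x in points)


def assign(points, centroids, delta):
    """Nearest centroid, plus any centroid within `delta` of that distance."""
    memberships = []
    for x in points:
        dists = [1.0 - cosine(x, c) for c in centroids]
        best = min(dists)
        memberships.append({k for k, d in enumerate(dists) if d - best <= delta})
    return memberships


def cfoca_sketch(points, centroids, lr=0.1, iterations=100, delta=0.05):
    centroids = [list(c) for c in centroids]
    for _ in range(iterations):
        nearest = [
            min(range(len(centroids)), key=lambda k: 1.0 - cosine(x, centroids[k]))
            for x in points
        ]
        for k, c in enumerate(centroids):
            norm_c = math.sqrt(sum(a * a for a in c))
            grad = [0.0] * len(c)
            for x, owner in zip(points, nearest):
                if owner != k:
                    continue
                norm_x = math.sqrt(sum(a * a for a in x))
                cos = cosine(x, c)
                for d in range(len(c)):
                    # d(1 - cos)/dc_d = -(x_d / (|x||c|) - cos * c_d / |c|^2)
                    grad[d] -= x[d] / (norm_x * norm_c) - cos * c[d] / norm_c ** 2
            centroids[k] = [a - lr * g for a, g in zip(c, grad)]
    return centroids, assign(points, centroids, delta)


# Usage: two well-separated groups of tf-idf-like vectors.
points = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.9]]
start = [[1.0, 0.5], [0.5, 1.0]]
final, memberships = cfoca_sketch(points, start)
print(memberships)
print(cost(points, start), cost(points, final))
```

A point that lies roughly midway between two centroids would end up in both membership sets, which is where the overlap comes from in this sketch; a larger `delta` produces more overlap.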

Term Community Merging Algorithm (TCMA)

Two phases

– Compute one community per word

– Refinement of the communities using the overlap coefficient

Term Matrix

w1 w2 w3 …

0.23 0.5 0

0.8 0.76 1

0 1.2 0.59
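
The two phases can be sketched as follows, with the caveat that the slide does not spell out the membership rule or the merging order. This sketch assumes that a row of the term matrix belongs to a word's community when its weight for that word is positive, uses the standard overlap coefficient $|A \cap B| / \min(|A|, |B|)$, and greedily merges the first pair at or above a threshold until no such pair remains. The function names and the threshold value are hypothetical.

```python
def overlap_coefficient(a, b):
    """|A ∩ B| / min(|A|, |B|) for two non-empty sets."""
    return len(a & b) / min(len(a), len(b))


def tcma_sketch(term_matrix, threshold=0.6):
    """Return a list of (possibly overlapping) communities of row indices."""
    n_terms = len(term_matrix[0])

    # Phase 1: one community per word, containing every row that uses it.
    communities = []
    for t in range(n_terms):
        members = {u for u, row in enumerate(term_matrix) if row[t] > 0}
        if members:
            communities.append(members)

    # Phase 2: merge pairs whose overlap coefficient reaches the threshold.
    merged = True
    while merged:
        merged = False
        for i in range(len(communities)):
            for j in range(i + 1, len(communities)):
                if overlap_coefficient(communities[i], communities[j]) >= threshold:
                    communities[i] = communities[i] | communities[j]
                    del communities[j]
                    merged = True
                    break
            if merged:
                break
    return communities


# Usage: four rows (e.g., users) and three terms.
matrix = [
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
]
print(tcma_sketch(matrix))
```

In this example the communities of the first two terms merge, while the third stays separate and shares row 3 with the merged community, so row 3 is an overlapping member.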

Content-Based Weighting Method

Generate two weights from content

Use OCD algorithms such as DMID, SSK and CLiZZ to compute communities

[Figure: Threads, Term Matrix and the weight pair ( r , s )]

Term Matrix

w1 w2 w3 …

0.23 0.5 0 …

0.8 0 1 …

Datasets and Metrics

Jmol dataset

– Forum discussion regarding a Java tool for molecular modeling of chemical structures

– Open source development

– 2002 – 2012

– Publicly available at https://github.com/rwth-acis/REST-OCDServices/wiki/Jmol-Dataset

Combined modularity

– Considering both content and density

Number of overlapping nodes and average community size, to extract useful information
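
The two simpler metrics can be computed directly from a cover, i.e. a list of node sets. The sketch below assumes that a node counts as overlapping when it appears in more than one community and that average community size is the mean number of members per community; the combined modularity is not sketched because the slides do not define it. The function name `cover_statistics` is hypothetical.

```python
from collections import Counter


def cover_statistics(communities):
    """Return (number of overlapping nodes, average community size)."""
    membership_counts = Counter(
        node for community in communities for node in community
    )
    overlapping = sum(1 for c in membership_counts.values() if c > 1)
    if not communities:
        return overlapping, 0.0
    average_size = sum(len(c) for c in communities) / len(communities)
    return overlapping, average_size


# Usage: node 3 belongs to both communities.
print(cover_statistics([{0, 1, 3}, {2, 3}]))
```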

Similarity Costs versus Average Community Size

Releases 1, 10 and 11 have low content similarity

Release 6 has the highest content similarity

Average community size is highest

Similarity Costs versus Number of Overlapping Nodes

Releases 2, 3, 4 and 5 have high similarity and few overlapping nodes

Similarity costs are global measures

Similarity Costs versus Modularity

Inverse relation between content similarity and modularity

Average Community Size versus Releases

Content-based algorithms are useful when the structure of the network is missing

Content-based algorithms detect larger communities

Number of Overlapping Nodes versus Releases

Content-based methods may reflect the actual changes

Content-based methods detect more overlap than structure-based methods

Conclusion & Future Work

Conclusion & Message:

Content has a significant effect on structure-based techniques

– Changes in community sizes, number of overlapping nodes and modularity

– Content-based methods detect larger communities with larger overlaps

Future Work:

Investigate local similarity costs

Improve time complexity

References

Ahn, Y.-Y., Bagrow, J. P., & Lehmann, S. (2010). Link communities reveal multiscale complexity in networks. Nature, 466(7307), 761–764. doi:10.1038/nature09182

Derényi, I., Palla, G., & Vicsek, T. (2005). Clique Percolation in Random Networks. Physical Review Letters, 94(16), 160202. doi:10.1103/PhysRevLett.94.160202

Ding, Z., Zhang, X., Sun, D., & Luo, B. (2016). Overlapping Community Detection based on Network Decomposition. Sci Rep, 6(24115). doi:10.1038/srep24115

Doreian, P. (2004). Evolution of Human Signed Networks, 1(2), 277–293. Retrieved from http://snap.stanford.edu/class/cs224w-readings/dorean04evolution.pdf

Girvan, M., & Newman, M. E. J. (2002). Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12), 7821–7826. doi:10.1073/pnas.122653799

Gunnemann, S., Boden, B., Farber, I., & Seidl, T. (2013). Efficient Mining of Combined Subspace and Subgraph Clusters in Graphs with Feature Vectors. In Advances in Knowledge Discovery and Data Mining (pp. 261–275). Springer Berlin Heidelberg.

Gunnemann, S., Farber, I., Boden, B., & Seidl, T. (2010). Subspace clustering meets dense subgraph mining: A synthesis of two paradigms. In The 10th International Conference On Data Mining.

Havemann, F., Heinz, M., Struck, A., & Gläser, J. (2011). Identification of overlapping communities and their hierarchy by locally calculating community-changing resolution levels. Journal of Statistical Mechanics: Theory and Experiment. doi:10.1088/1742-5468/2011/01/P01023

Preece, J. (2002). Supporting Community and Building Social Capital - Guest Editorial. Communications of the ACM, 45(4), 37–39.

Shahriari, M., Parekodi, S., & Klamma, R. (2015). Community-aware Ranking Algorithms for Expert Identification in Question-answer Forums. In Proceedings of the 15th International Conference on Knowledge Technologies and Data-driven Business (I-KNOW) (pp. 1–8). ACM. Retrieved from http://doi.acm.org/10.1145/2809563.2809592

Shen, H., Cheng, X., Cai, K., & Hu, M.-B. (2009). Detect overlapping and hierarchical community structure in networks. Physica A: Statistical Mechanics and its Applications, 388(8), 1706–1712. doi:10.1016/j.physa.2008.12.021

Yang, J., & Leskovec, J. (2012). Structure and Overlaps of Communities in Networks. CoRR, abs/1205.6228.
