a framework for community identification in dynamic social networks chayant, tanya berger-wolf,...

A Framework for Community Identification in Dynamic Social Networks

Chayant, Tanya Berger-Wolf, David Kempe

[KDD’07]

Advisor: Dr. Koh Jia-Ling

Speaker: Che-Wei Liang

Date: 2008.1.8

Outline

• Introduction
• Problem formulation
• Finding optimal colorings
• Group coloring heuristics
• Experiment
• Conclusion

Introduction

• Social networks
  – Graphs of interactions between individuals.

Introduction

• What is a community?
  – A collection of individuals who interact unusually frequently.
  – Communities reveal interesting properties shared by their members, such as common hobbies or occupations.

• Why dynamic communities?
  – Communities that change over time may have more interesting properties.

History of Interactions

[Figure: interactions among individuals 1 to 5 at time step t=1]

Assume discrete time and interactions in the form of complete subgraphs.
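Under this assumption, a history is simply a list of time steps, each holding the groups observed at that step, and each group stands for a complete subgraph on its members. The minimal sketch below assumes that representation; the `history` data and the helper names `clique_edges` and `interaction_edges` are hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical example history: history[t] lists the groups observed at step t.
history = [
    [frozenset({1, 2, 3}), frozenset({4, 5})],
    [frozenset({1, 2}), frozenset({3, 4, 5})],
]


def clique_edges(group):
    """Edges of the complete subgraph on the members of one group."""
    return {frozenset(pair) for pair in combinations(sorted(group), 2)}


def interaction_edges(step_groups):
    """Union of the cliques for all groups observed in one time step."""
    edges = set()
    for group in step_groups:
        edges |= clique_edges(group)
    return edges
```

For example, the first step yields three edges for the triangle {1, 2, 3} plus one edge for {4, 5}.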

Approach: Graph Model

[Figure: graph model, with one copy of individuals 1 to 5 in each time step]
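The figure suggests a layered graph with one copy of every individual per time step. One way to build such a graph is sketched below, assuming (beyond what the slide states) that each observed group also gets its own vertex, that each individual vertex is linked to the vertex of the group it is observed in, and that consecutive copies of the same individual are linked. The function name `build_graph_model` and the vertex tuples `("ind", i, t)` / `("grp", t, k)` are hypothetical.

```python
def build_graph_model(history):
    """history[t]: list of groups (sets of individuals) observed at step t.

    Returns (vertices, edges) of a layered graph: one vertex per individual
    per step, one vertex per group, membership edges, and temporal edges.
    """
    individuals = sorted({i for step in history for g in step for i in g})
    vertices, edges = [], []
    for t, step in enumerate(history):
        # Every individual gets a vertex at every step, observed or not.
        vertices += [("ind", i, t) for i in individuals]
        for k, group in enumerate(step):
            vertices.append(("grp", t, k))
            # Membership edges between a group vertex and its members at step t.
            edges += [(("ind", i, t), ("grp", t, k)) for i in sorted(group)]
        if t > 0:
            # Temporal edges linking consecutive copies of the same individual.
            edges += [(("ind", i, t - 1), ("ind", i, t)) for i in individuals]
    return vertices, edges
```

With two steps over individuals 1 to 5 and two groups per step, this gives 10 individual vertices, 4 group vertices, 10 membership edges, and 5 temporal edges.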

Preliminaries

Problem Formulation

• Assumptions about individual behavior:
  – Individuals and groups represent exactly one community at a time.
  – Concurrent groups represent distinct communities.

Problem Formulation

• Assumptions about individual behavior (cont.):
  – Conservatism: changes in community affiliation are rare.
  – Group loyalty: individuals observed in a group belong to the same community.
  – Parsimony: each individual has few affiliations overall.

Approach: Color = Community

Valid coloring: groups in the same time step receive distinct colors.
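The validity condition is easy to check mechanically. The sketch below assumes group colorings are stored as a dictionary from group id to color, with the groups of each time step listed separately; the name `is_valid_coloring` is hypothetical.

```python
def is_valid_coloring(group_steps, color):
    """group_steps[t]: ids of the groups observed at step t.
    color: dict mapping each group id to its color (community).

    Valid iff no two groups in the same time step share a color.
    """
    for groups in group_steps:
        used = [color[g] for g in groups]
        if len(used) != len(set(used)):
            return False
    return True
```

Groups in different time steps may share a color; that is exactly how a community persists over time.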

• i-cost (Conservatism): switching cost (α)

• g-cost (Group loyalty):
  – Being absent (β1)
  – Being different (β2)

• c-cost (Parsimony): number of colors (γ)
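The three cost components can be tallied for one individual once its color sequence and the group colors are fixed. This is a minimal sketch under stated assumptions: α is charged per color change between consecutive steps; β2 is charged when the individual is observed in a group of a different color; β1 is charged when some other group with the individual's color exists at that step (its community meets without it); γ is charged per distinct color used. These trigger rules and the name `individual_cost` are our reading, not confirmed by the slides.

```python
def individual_cost(ind_colors, membership, group_colors,
                    alpha, beta1, beta2, gamma):
    """ind_colors[t]: color of the individual at step t.
    membership[t]: id of the group it is observed in at step t, or None.
    group_colors[t]: dict mapping group id -> color at step t.
    """
    # i-cost (conservatism): alpha per switch between consecutive steps.
    switches = sum(1 for a, b in zip(ind_colors, ind_colors[1:]) if a != b)
    absent = different = 0
    for t, x in enumerate(ind_colors):
        g = membership[t]
        if g is not None and group_colors[t][g] != x:
            different += 1  # observed in a group of another color (beta2)
        if any(c == x and gid != g for gid, c in group_colors[t].items()):
            absent += 1  # a group of its own color meets without it (beta1)
    # c-cost (parsimony): gamma per distinct color used.
    distinct = len(set(ind_colors))
    return (alpha * switches + beta1 * absent
            + beta2 * different + gamma * distinct)
```

For instance, an individual colored red at both steps, observed in a red group at the first step and unobserved at the second while another red group meets, pays one β1 and one γ.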


• Minimum Community Interpretation: for a given cost setting (α, β1, β2, γ), find a vertex coloring that minimizes the total cost.
  – Colors of group vertices = community structure
  – Colors of individual vertices = affiliation sequences

• The problem is NP-Complete and APX-Hard.

Finding Optimal Colorings

• Individual Coloring
  – G(t, x): g-cost of coloring i at time step t with color x.
  – I(t, x, y): i-cost of coloring i at time steps t and t-1 with colors x and y.
  – C(x, R): c-cost of using color x when R is the set of colors used in prior steps.
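With the group coloring fixed, the functions G, I and C suggest a dynamic program over states (current color x, set R of colors used so far) for each individual. The sketch below is our reading of how they combine, not a recurrence stated on the slide; `optimal_individual_coloring` is a hypothetical name, and the caller supplies G, I and C as plain functions.

```python
from typing import Callable, Hashable, List, Sequence, Tuple


def optimal_individual_coloring(
    T: int,
    colors: Sequence[Hashable],
    G: Callable[[int, Hashable], float],
    I: Callable[[int, Hashable, Hashable], float],
    C: Callable[[Hashable, frozenset], float],
) -> Tuple[float, List[Hashable]]:
    """Minimum-cost color sequence for one individual over T >= 1 steps.

    State (x, R): color x at the current step, R = colors used so far.
    The number of states grows with 2**len(colors), so this only suits
    small palettes.
    """
    # Step 0: g-cost of the first color plus its c-cost (no prior colors).
    states = {(x, frozenset([x])): (G(0, x) + C(x, frozenset()), [x])
              for x in colors}
    for t in range(1, T):
        nxt = {}
        for (y, R), (cost, seq) in states.items():
            for x in colors:
                # g-cost at t, i-cost for (t: x, t-1: y), c-cost of x given R.
                c = cost + G(t, x) + I(t, x, y) + C(x, R)
                key = (x, R | {x})
                if key not in nxt or c < nxt[key][0]:
                    nxt[key] = (c, seq + [x])
        states = nxt
    return min(states.values(), key=lambda v: v[0])
```

Tracking R is what makes C computable: whether a color is new depends on the whole history, not just the previous step.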

Finding Optimal Colorings

• Group Coloring
  – Using exhaustive search over all group colorings.
  – Speed up by branch-and-bound techniques.
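A generic branch-and-bound over group colorings might look like the sketch below: it branches on the distinct-color assignments for one time step at a time and prunes a branch once its partial cost already reaches the best complete coloring found. It assumes a caller-supplied `partial_cost` that never decreases as more groups are colored (true when all cost terms are nonnegative), so the partial cost is a valid lower bound. The names `best_group_coloring` and `partial_cost` are hypothetical; the paper's actual bounds are not described on the slide.

```python
from itertools import permutations


def best_group_coloring(group_steps, colors, partial_cost):
    """group_steps[t]: group ids at step t.
    partial_cost(assignment): cost of a partial coloring; must never
    decrease as groups are added, so it lower-bounds every completion.
    """
    best_cost, best_assignment = float("inf"), None

    def extend(t, assignment):
        nonlocal best_cost, best_assignment
        cost = partial_cost(assignment)
        if cost >= best_cost:
            return  # bound: no completion of this branch can do better
        if t == len(group_steps):
            best_cost, best_assignment = cost, dict(assignment)
            return
        groups = group_steps[t]
        # Branch: every assignment of distinct colors to this step's groups.
        for perm in permutations(colors, len(groups)):
            assignment.update(zip(groups, perm))
            extend(t + 1, assignment)
            for g in groups:
                del assignment[g]

    extend(0, {})
    return best_cost, best_assignment
```

Without pruning this degenerates to the exhaustive search; the bound only skips branches that provably cannot beat the incumbent.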

Group Coloring Heuristics

• Bipartite Matching Heuristic
  – Using standard flow techniques.

• Greedy Heuristics
  – Maximize “similarity”.
  – Jaccard’s index: $\mathrm{Jac}(g, g') = \frac{|g \cap g'|}{|g \cup g'|}$
  – Repeatedly select the pair (g, g’) with the highest similarity and decide that g and g’ should have the same color.
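The greedy heuristic can be sketched with Jaccard similarity and a union-find structure over color classes. Assumptions beyond the slide: candidate pairs are groups from different time steps; a pair is merged only if the two color classes occupy no common time step (so the coloring stays valid); merging stops once similarity drops to zero. The names `jaccard` and `greedy_group_coloring` are hypothetical.

```python
from itertools import combinations


def jaccard(a: frozenset, b: frozenset) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_group_coloring(groups):
    """groups: list of (time_step, frozenset_of_members).
    Returns a color index for each group."""
    parent = list(range(len(groups)))  # union-find over color classes
    steps = [{t} for t, _ in groups]   # time steps occupied by each class

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    pairs = [(jaccard(groups[i][1], groups[j][1]), i, j)
             for i, j in combinations(range(len(groups)), 2)
             if groups[i][0] != groups[j][0]]
    pairs.sort(key=lambda p: p[0], reverse=True)  # most similar first
    for sim, i, j in pairs:
        if sim == 0:
            break
        ri, rj = find(i), find(j)
        if ri == rj or steps[ri] & steps[rj]:
            continue  # same color already, or merge would break validity
        parent[rj] = ri
        steps[ri] |= steps[rj]
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(groups))]
```

For example, with {1, 2, 3} and {4, 5} at step 0 and {1, 2} and {4, 5} at step 1, the identical {4, 5} groups merge first (similarity 1), then {1, 2, 3} and {1, 2} (similarity 2/3).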

Experiment

• Synthetic Data sets

Southern Women Data Set by Davis, Gardner, and Gardner, 1941

Photograph by Ben Shahn, Natchez, MS, October 1935

[Figure: aggregated network]

[Figure: event participation]

An Optimal Coloring: (α,β1,β2,γ)=(1,1,3,1)

[Figure: optimal coloring; labeled regions: Core, Periphery, Periphery, Core]

An Optimal Coloring: (α,β1,β2,γ)=(1,1,1,1)

[Figure: optimal coloring; labeled regions: Core, Periphery, Core]

Conclusion

• An optimization-based framework for finding communities in dynamic social networks.

• Finding an optimal solution is NP-Complete and APX-Hard.

• Model evaluation by exhaustive search.

• Heuristic algorithms for larger data sets.
  – Heuristic results are comparable to the optimal ones.
