A Framework for Community Identification in Dynamic Social Networks
Chayant, Tanya Berger-Wolf, David Kempe
[KDD’07]
Advisor: Dr. Koh Jia-Ling
Speaker: Che-Wei Liang
Date: 2008.1.8
Outline
• Introduction
• Problem formulation
• Finding optimal colorings
• Group Coloring Heuristics
• Experiment
• Conclusion
Introduction
• Social networks
– Graphs of interactions between individuals.
Introduction
• What is a community?
– Collections of individuals who interact unusually frequently.
– Reveal interesting properties shared by members, such as common hobbies or occupations.
• Why dynamic communities?
– They may have more interesting properties.
History of Interactions
[Figure: interactions at t=1 among individuals 1, 2, 3, 4, 5]
Assume discrete time and interactions in form of complete subgraphs.
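The assumption above can be made concrete with a small sketch. It stores, for each discrete time step, the groups observed at that step, and expands each group into the edges of a complete subgraph. The `history` data and the name `interaction_edges` are hypothetical; the actual groupings shown on the slide are not recoverable from this transcript.

```python
from itertools import combinations

# Hypothetical toy history: time step -> list of groups observed at that step.
history = {
    1: [{1, 2, 3}, {4, 5}],
    2: [{1, 2}, {3, 4, 5}],
}

def interaction_edges(groups):
    """Expand each group into the edges of a complete subgraph (clique)."""
    edges = set()
    for group in groups:
        # Every pair of members in the same group is assumed to interact.
        for u, v in combinations(sorted(group), 2):
            edges.add((u, v))
    return edges

edges_t1 = interaction_edges(history[1])
print(sorted(edges_t1))
```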
Approach: Graph Model
[Figure: graph model; individuals 1-5 repeated at each of five time steps]
Preliminaries
Problem Formulation
• Assumptions about the behavior of individuals:
– Individuals and groups represent exactly one community at a time.
– Concurrent groups represent distinct communities.
Problem Formulation
• Assumptions about the behavior of individuals (cont.):
– Conservatism: community affiliation changes are rare.
– Group Loyalty: individuals observed in a group belong to the same community.
– Parsimony: few affiliations overall for each individual.
Approach: Color = Community
Valid coloring: groups in the same time step receive distinct colors.
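The validity condition can be checked mechanically. The sketch below assumes a hypothetical representation in which each time step maps to the list of colors assigned to its groups; the name `is_valid_coloring` is illustrative, not from the paper.

```python
def is_valid_coloring(group_colors):
    """Return True if no two groups in the same time step share a color.

    group_colors: dict mapping time step -> list of colors, one per group.
    """
    return all(len(colors) == len(set(colors)) for colors in group_colors.values())

valid = {1: ["red", "blue"], 2: ["red", "green"]}
invalid = {1: ["red", "red"], 2: ["blue"]}  # two concurrent groups share "red"

print(is_valid_coloring(valid), is_valid_coloring(invalid))
```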
i-cost
• Conservatism: switching cost (α)
• Group loyalty:
– Being absent (β1)
– Being different (β2)
• Parsimony: number of colors (γ)
g-cost
• Conservatism: switching cost (α)
• Group loyalty:
– Being absent (β1)
– Being different (β2)
• Parsimony: number of colors (γ)
c-cost
• Conservatism: switching cost (α)
• Group loyalty:
– Being absent (β1)
– Being different (β2)
• Parsimony: number of colors (γ)
• Minimum Community Interpretation: for a given cost setting (α, β1, β2, γ), find the vertex coloring that minimizes the total cost.
– Color of group vertices = community structure
– Color of individual vertices = affiliation sequences
• The problem is NP-complete and APX-hard.
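As a rough illustration of the objective, the sketch below sums the cost components for a given coloring. It is one plausible reading of the slides, not the paper's exact definition, and it assumes: α is charged each time an individual changes color between consecutive time steps; β1 is charged when a group at that step has the individual's color but the individual is not in it; β2 is charged when the individual is in a group whose color differs from their own; and γ is charged once per distinct color an individual uses. The function name and data layout are hypothetical.

```python
def total_cost(groups, group_colors, ind_colors, alpha, beta1, beta2, gamma):
    """Sum the assumed i-, g-, and c-costs over all individuals.

    groups:       time -> list of sets of individuals
    group_colors: time -> list of colors (parallel to groups[time])
    ind_colors:   individual -> {time -> color}, defined for every time step
    """
    times = sorted(groups)
    cost = 0
    for person, colors in ind_colors.items():
        # Assumed i-cost: pay alpha whenever the color changes between steps.
        for prev, cur in zip(times, times[1:]):
            if colors[prev] != colors[cur]:
                cost += alpha
        # Assumed g-cost: absent from own community (beta1) or
        # present in a group of a different color (beta2).
        for t in times:
            for members, gcolor in zip(groups[t], group_colors[t]):
                if gcolor == colors[t] and person not in members:
                    cost += beta1
                if person in members and gcolor != colors[t]:
                    cost += beta2
        # Assumed c-cost: pay gamma per distinct color used.
        cost += gamma * len(set(colors.values()))
    return cost

# Hypothetical toy instance.
groups = {1: [{1, 2}, {3}], 2: [{1}, {2, 3}]}
group_colors = {1: ["A", "B"], 2: ["A", "B"]}
switching = {1: {1: "A", 2: "A"}, 2: {1: "A", 2: "B"}, 3: {1: "B", 2: "B"}}
staying = {1: {1: "A", 2: "A"}, 2: {1: "A", 2: "A"}, 3: {1: "B", 2: "B"}}

cost_switching = total_cost(groups, group_colors, switching, 1, 1, 3, 1)
cost_staying = total_cost(groups, group_colors, staying, 1, 1, 3, 1)
print(cost_switching, cost_staying)
```

Under these assumptions, letting individual 2 switch communities is cheaper than keeping them in community A while they are observed in a group colored B.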
Finding Optimal Colorings
• Individual Coloring
– G(t, x): g-cost of coloring individual i at time step t with color x.
– I(t, x, y): i-cost of coloring individual i at time steps t and t-1 with colors x and y.
– C(x, R): c-cost of using color x when R is the set of colors used in prior steps.
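The three quantities above suggest a dynamic program over time steps for one individual, once the group colors are fixed. The following is a minimal sketch under stated assumptions, not the authors' implementation: the state is the pair (current color x, set R of colors used so far), the candidate colors are restricted to those used by some group, and the concrete forms of G, I, and C are the same assumed readings of β1/β2, α, and γ as in the slides' cost list. All names are hypothetical.

```python
def best_individual_coloring(person, groups, group_colors, alpha, beta1, beta2, gamma):
    """Return (min cost, {time -> color}) for one individual, given group colors."""
    times = sorted(groups)
    palette = sorted({c for t in times for c in group_colors[t]})

    def g_cost(t, x):  # G(t, x), assumed form
        cost = 0
        for members, gcolor in zip(groups[t], group_colors[t]):
            if gcolor == x and person not in members:
                cost += beta1  # absent from a group of own color
            if person in members and gcolor != x:
                cost += beta2  # present in a group of another color
        return cost

    def i_cost(x, y):  # I(t, x, y), assumed independent of t
        return alpha if x != y else 0

    def c_cost(x, used):  # C(x, R), assumed: pay gamma for each new color
        return 0 if x in used else gamma

    # best[(x, R)] = (cost, colors so far) for prefixes ending in color x.
    best = {}
    for x in palette:
        best[(x, frozenset([x]))] = (g_cost(times[0], x) + c_cost(x, frozenset()), [x])
    for t in times[1:]:
        nxt = {}
        for (y, used), (cost, path) in best.items():
            for x in palette:
                new_cost = cost + i_cost(x, y) + g_cost(t, x) + c_cost(x, used)
                key = (x, used | {x})
                if key not in nxt or new_cost < nxt[key][0]:
                    nxt[key] = (new_cost, path + [x])
        best = nxt
    cost, path = min(best.values(), key=lambda item: item[0])
    return cost, dict(zip(times, path))

# Hypothetical toy instance.
groups = {1: [{1, 2}, {3}], 2: [{1}, {2, 3}]}
group_colors = {1: ["A", "B"], 2: ["A", "B"]}
best_cost, best_colors = best_individual_coloring(2, groups, group_colors, 1, 1, 3, 1)
print(best_cost, best_colors)
```

Because R is part of the state, this sketch is exponential in the number of colors and is only meant for small instances.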
Finding Optimal Colorings
• Group Coloring
– Exhaustive search over all group colorings.
– Sped up by branch-and-bound techniques.
Group Coloring Heuristics
• Bipartite Matching Heuristic
– Uses standard flow techniques.
• Greedy Heuristics
– Maximize “similarity”.
– Jaccard’s index: Jac(g, g’) = |g ∩ g’| / |g ∪ g’|
– Repeatedly select the pair (g, g’) with the highest similarity and decide that g and g’ should have the same color.
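The greedy idea can be sketched as follows. This is an illustrative variant, not necessarily the heuristic from the paper: it assumes pairs are drawn only from different time steps, that pairs with zero similarity are ignored, and that a merge is skipped when it would give two concurrent groups the same color (which would make the coloring invalid). The function names are hypothetical.

```python
from itertools import combinations

def jaccard(g, h):
    """Jaccard index |g ∩ h| / |g ∪ h| of two sets of individuals."""
    union = g | h
    return len(g & h) / len(union) if union else 0.0

def greedy_group_coloring(groups):
    """groups: time -> list of sets. Returns {(time, index): color id}."""
    nodes = [(t, k) for t in sorted(groups) for k in range(len(groups[t]))]
    color = {node: idx for idx, node in enumerate(nodes)}  # all distinct at first
    pairs = [
        (jaccard(groups[a[0]][a[1]], groups[b[0]][b[1]]), a, b)
        for a, b in combinations(nodes, 2)
        if a[0] != b[0]  # only compare groups from different time steps
    ]
    pairs.sort(key=lambda p: -p[0])  # highest similarity first (stable sort)
    for sim, a, b in pairs:
        if sim == 0 or color[a] == color[b]:
            continue
        class_a = [n for n in nodes if color[n] == color[a]]
        class_b = [n for n in nodes if color[n] == color[b]]
        times_a = {n[0] for n in class_a}
        if any(n[0] in times_a for n in class_b):
            continue  # merging would color two concurrent groups the same
        for n in class_b:
            color[n] = color[a]
    return color

# Hypothetical toy instance.
groups = {1: [{1, 2, 3}, {4, 5}], 2: [{1, 2}, {3, 4, 5}]}
coloring = greedy_group_coloring(groups)
print(coloring)
```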
Experiment
• Synthetic Data sets
Southern Women Data Set by Davis, Gardner, and Gardner, 1941
Photograph by Ben Shahn, Natchez, MS, October 1935
[Figure panels: Aggregated network; Event participation]
An Optimal Coloring: (α,β1,β2,γ)=(1,1,3,1)
[Figure: optimal coloring; labels: Core, Periphery, Periphery, Core]
An Optimal Coloring: (α,β1,β2,γ)=(1,1,1,1)
[Figure: optimal coloring; labels: Core, Periphery, Core]
Conclusion
• An optimization-based framework for finding communities in dynamic social networks.
• Finding an optimal solution is NP-Complete and APX-Hard.
• Model evaluation by exhaustive search.
• Heuristic algorithms for larger data sets.
– Heuristic results are comparable to optimal.