![Page 1: Community Extraction for Social Networks - U-M Personal World Wide](https://reader035.vdocument.in/reader035/viewer/2022071601/613d3428736caf36b75a8c37/html5/thumbnails/1.jpg)
Community Extraction for Social Networks
Yunpeng Zhao
Department of Statistics, University of Michigan, Ann Arbor, MI 48109, USA
August 3, 2011
Advisors: Liza Levina and Ji Zhu
![Page 2: Community Extraction for Social Networks - U-M Personal World Wide](https://reader035.vdocument.in/reader035/viewer/2022071601/613d3428736caf36b75a8c37/html5/thumbnails/2.jpg)
Outline
Review of community detection
Community extraction
Asymptotic consistency
Simulation study
Real data analysis
![Page 3: Community Extraction for Social Networks - U-M Personal World Wide](https://reader035.vdocument.in/reader035/viewer/2022071601/613d3428736caf36b75a8c37/html5/thumbnails/3.jpg)
Network data
Network analysis has been a focus of attention in different fields.
Social science: friendship networks
Internet: WWW, hyper-links
Biology: food webs, gene regulatory networks
![Page 4: Community Extraction for Social Networks - U-M Personal World Wide](https://reader035.vdocument.in/reader035/viewer/2022071601/613d3428736caf36b75a8c37/html5/thumbnails/4.jpg)
Community detection
Communities: Networks consist of communities, or clusters, with many connections within a community and few connections between communities.
Community detection problem: For an undirected network $N = (V, E)$, the community detection problem is typically formulated as finding a partition $V = V_1 \cup \cdots \cup V_K$ which gives “tight” communities in some suitable sense.
![Page 5: Community Extraction for Social Networks - U-M Personal World Wide](https://reader035.vdocument.in/reader035/viewer/2022071601/613d3428736caf36b75a8c37/html5/thumbnails/5.jpg)
Community detection problem
Existing community detection methods: minimizing links between communities while maximizing links within communities (see Newman (2004) for a review).
For simplicity, we consider the case of partitioning the network into two communities $V_1$ and $V_2$.
![Page 6: Community Extraction for Social Networks - U-M Personal World Wide](https://reader035.vdocument.in/reader035/viewer/2022071601/613d3428736caf36b75a8c37/html5/thumbnails/6.jpg)
Min-cut (Wu and Leahy, 1993)
To minimize
$$R = \sum_{i \in V_1,\, j \in V_2} A_{ij},$$
where $A$ is the adjacency matrix of the network.
However, min-cut always yields a trivial solution of $V_1 = V$ or $V_2 = V$.
![Page 7: Community Extraction for Social Networks - U-M Personal World Wide](https://reader035.vdocument.in/reader035/viewer/2022071601/613d3428736caf36b75a8c37/html5/thumbnails/7.jpg)
Ratio-cut (Wei and Cheng, 1989)
$$\min \frac{R}{|V_1| \cdot |V_2|},$$
where $|V_1|$ and $|V_2|$ are the sizes of the two groups.
Ratio-cut avoids trivial solutions because $|V_1| \cdot |V_2|$ is maximized at $|V_1| = |V_2| = |V|/2$, so the denominator favors balanced groups.
![Page 8: Community Extraction for Social Networks - U-M Personal World Wide](https://reader035.vdocument.in/reader035/viewer/2022071601/613d3428736caf36b75a8c37/html5/thumbnails/8.jpg)
Normalized-cut (Shi and Malik, 2000)
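$$\min \left[ \frac{R}{\mathrm{assoc}(V_1, V)} + \frac{R}{\mathrm{assoc}(V_2, V)} \right],$$
where $\mathrm{assoc}(V_k, V) = \sum_{i \in V_k,\, j \in V} A_{ij}$ for $k = 1, 2$.
Normalized-cut can avoid trivial solutions because an extremely small group $V_k$ may have a large ratio $R/\mathrm{assoc}(V_k, V)$.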
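The three cut criteria reviewed so far can be compared on a small example. The sketch below is illustrative only: the toy graph (two triangles joined by one edge) and the function names are assumptions made for this example, not part of the talk.

```python
def cut_size(A, V1, V2):
    """R: total weight of links between V1 and V2."""
    return sum(A[i][j] for i in V1 for j in V2)


def assoc(A, Vk):
    """assoc(Vk, V): sum of A[i][j] over i in Vk and all j in V."""
    return sum(A[i][j] for i in Vk for j in range(len(A)))


def ratio_cut(A, V1, V2):
    return cut_size(A, V1, V2) / (len(V1) * len(V2))


def normalized_cut(A, V1, V2):
    R = cut_size(A, V1, V2)
    return R / assoc(A, V1) + R / assoc(A, V2)


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = [[0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

balanced = ([0, 1, 2], [3, 4, 5])
lopsided = ([0], [1, 2, 3, 4, 5])

for name, (V1, V2) in [("balanced", balanced), ("lopsided", lopsided)]:
    print(name, cut_size(A, V1, V2), ratio_cut(A, V1, V2), normalized_cut(A, V1, V2))
```

On this graph both ratio-cut and normalized-cut score the balanced split lower (better) than the split that isolates a single node.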
![Page 9: Community Extraction for Social Networks - U-M Personal World Wide](https://reader035.vdocument.in/reader035/viewer/2022071601/613d3428736caf36b75a8c37/html5/thumbnails/9.jpg)
Modularity (Newman and Girvan, 2004)
To maximize
$$Q = \sum_{k=1}^{2} \left[ \frac{O_{kk}}{L} - \left( \frac{D_k}{L} \right)^2 \right],$$
where $O_{kk} = \sum_{i \in V_k,\, j \in V_k} A_{ij}$, $D_k = \sum_{i \in V_k,\, j \in V} A_{ij}$, $L = \sum_{k=1}^{2} D_k$.
$Q$ represents the fraction of edges that fall within communities, minus the “average” value of the same quantity if edges fall at random given the degree of each node.
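As a quick check of the modularity formula, the following sketch evaluates $Q$ directly from the definitions of $O_{kk}$, $D_k$ and $L$. The toy graph and the function name `modularity` are assumptions for illustration.

```python
def modularity(A, communities):
    """Q = sum_k [ O_kk / L - (D_k / L)^2 ] for a partition of the nodes."""
    n = len(A)
    # L is the sum of all D_k, i.e. the sum of all entries of A
    # when the communities partition the node set.
    L = sum(sum(row) for row in A)
    Q = 0.0
    for Vk in communities:
        O_kk = sum(A[i][j] for i in Vk for j in Vk)
        D_k = sum(A[i][j] for i in Vk for j in range(n))
        Q += O_kk / L - (D_k / L) ** 2
    return Q


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = [[0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

q_split = modularity(A, [[0, 1, 2], [3, 4, 5]])
q_trivial = modularity(A, [list(range(n)), []])
print(q_split, q_trivial)
```

Here the two-triangle split gives $Q = 5/14$, while the trivial partition that puts every node in one community gives $Q = 0$.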
![Page 10: Community Extraction for Social Networks - U-M Personal World Wide](https://reader035.vdocument.in/reader035/viewer/2022071601/613d3428736caf36b75a8c37/html5/thumbnails/10.jpg)
Outline
X Review of community detection
Community extraction
Asymptotic consistency
Simulation study
Real data analysis
![Page 11: Community Extraction for Social Networks - U-M Personal World Wide](https://reader035.vdocument.in/reader035/viewer/2022071601/613d3428736caf36b75a8c37/html5/thumbnails/11.jpg)
Community extraction
Most networks consist of a number (not known a priori) of communities, with relatively tight links within each community and sparse links to the outside, and “background” nodes that only have sparse links to other nodes.
We propose a method that extracts communities sequentially: at each step, the tightest is extracted from the network until no more meaningful communities exist.
![Page 12: Community Extraction for Social Networks - U-M Personal World Wide](https://reader035.vdocument.in/reader035/viewer/2022071601/613d3428736caf36b75a8c37/html5/thumbnails/12.jpg)
Criterion
Extract one community at a time by looking for a set of nodes with a large number of links within itself and a small number of links to the rest of the network.
The links within the complement of this set do not matter.
To maximize
$$W(S) = \frac{I(S)}{k^2} - \frac{B(S)}{k(n-k)},$$
where
$$I(S) = \sum_{i, j \in S} A_{ij}, \qquad B(S) = \sum_{i \in S,\, j \in S^c} A_{ij}, \qquad k = |S|.$$
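The extraction criterion can be evaluated directly from its definition. The sketch below is a minimal illustration; the function name `extraction_criterion` and the toy graph (a triangle plus sparsely linked background nodes) are assumptions, not taken from the talk.

```python
def extraction_criterion(A, S):
    """W(S) = I(S) / k^2 - B(S) / (k (n - k)) for a candidate node set S."""
    n = len(A)
    S = set(S)
    k = len(S)
    if k == 0 or k == n:
        raise ValueError("S must be a non-empty proper subset of the nodes")
    complement = [j for j in range(n) if j not in S]
    I = sum(A[i][j] for i in S for j in S)           # links within S (each counted twice)
    B = sum(A[i][j] for i in S for j in complement)  # links from S to the rest
    return I / k**2 - B / (k * (n - k))


# Hypothetical toy graph: triangle {0,1,2}, plus background edges 2-3 and 4-5.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (4, 5)]
n = 7
A = [[0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

w_triangle = extraction_criterion(A, {0, 1, 2})
w_background = extraction_criterion(A, {3, 4, 5, 6})
print(w_triangle, w_background)
```

Note that only links inside $S$ and links leaving $S$ enter the computation; links within the complement are never touched.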
![Page 13: Community Extraction for Social Networks - U-M Personal World Wide](https://reader035.vdocument.in/reader035/viewer/2022071601/613d3428736caf36b75a8c37/html5/thumbnails/13.jpg)
Adjusted criterion
Empirically, the previous criterion performs well for dense networks. However, it always finds very small communities for sparse networks.
To avoid small communities, we also propose to maximize
$$W_a(S) = k(n-k) \left( \frac{I(S)}{k^2} - \frac{B(S)}{k(n-k)} \right).$$
The factor $k(n-k)$ penalizes communities with $k$ close to 1 or $n$ and encourages more balanced solutions.
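Expanding the product may make the effect of the adjustment easier to see. The short derivation below uses only the definitions of $I(S)$, $B(S)$ and $k$ from the previous slide; the closing remark is one reading of the formula rather than a statement from the talk.

```latex
\begin{aligned}
W_a(S) &= k(n-k)\left(\frac{I(S)}{k^2} - \frac{B(S)}{k(n-k)}\right) \\
       &= \frac{n-k}{k}\, I(S) - B(S).
\end{aligned}
```

In this form the adjusted criterion weights the internal links by $(n-k)/k$ and subtracts the raw count of boundary links.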
![Page 14: Community Extraction for Social Networks - U-M Personal World Wide](https://reader035.vdocument.in/reader035/viewer/2022071601/613d3428736caf36b75a8c37/html5/thumbnails/14.jpg)
Algorithm
Tabu Search (Glover, 1986; Glover and Laguna, 1997): a local optimization technique based on label switching
Run the algorithm for many random orderings of the nodes
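To give a feel for label-switching tabu search, here is a minimal sketch that maximizes the adjusted criterion by flipping one node in or out of $S$ per step. It is not the authors' implementation: the tabu tenure, iteration count, aspiration rule, random start and toy graph are all assumptions chosen for illustration.

```python
import random


def adjusted_criterion(A, S):
    """W_a(S) = k (n - k) * ( I(S) / k^2 - B(S) / (k (n - k)) )."""
    n, k = len(A), len(S)
    if k == 0 or k == n:
        return float("-inf")  # degenerate sets are never preferred
    I = sum(A[i][j] for i in S for j in S)
    B = sum(A[i][j] for i in S for j in range(n) if j not in S)
    return k * (n - k) * (I / k**2 - B / (k * (n - k)))


def tabu_search(A, n_iter=200, tabu_len=5, seed=0):
    """Single-flip tabu search over node sets S (hypothetical parameters)."""
    rng = random.Random(seed)
    n = len(A)
    S = {i for i in range(n) if rng.random() < 0.5}  # random starting labels
    best_S, best_val = set(S), adjusted_criterion(A, S)
    tabu_until = [0] * n  # node i may not be flipped while tabu_until[i] >= t
    for t in range(1, n_iter + 1):
        order = list(range(n))
        rng.shuffle(order)  # random node order breaks ties between equal moves
        move, move_val = None, float("-inf")
        for i in order:
            val = adjusted_criterion(A, S ^ {i})  # value after flipping node i
            # A tabu move is allowed only if it beats the best value so far.
            if tabu_until[i] >= t and val <= best_val:
                continue
            if val > move_val:
                move, move_val = i, val
        if move is None:
            continue
        S ^= {move}  # take the best admissible move, even if it is worse
        tabu_until[move] = t + tabu_len
        if move_val > best_val:
            best_S, best_val = set(S), move_val
    return best_S, best_val


# Hypothetical toy graph: 4-clique {0,1,2,3} plus sparse background nodes 4..9.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (3, 4), (5, 6), (7, 8), (8, 9)]
n = 10
A = [[0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

found_S, found_val = tabu_search(A)
print(sorted(found_S), found_val)
```

Because the search accepts the best admissible move even when it lowers the criterion, it can leave local maxima; the tabu list keeps it from immediately undoing that move. In practice one would restart from several random starting points and keep the best result.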
![Page 15: Community Extraction for Social Networks - U-M Personal World Wide](https://reader035.vdocument.in/reader035/viewer/2022071601/613d3428736caf36b75a8c37/html5/thumbnails/15.jpg)
Outline
X Review of community detection
X Community extraction
Asymptotic consistency
Simulation study
Real data analysis
Future work
![Page 16: Community Extraction for Social Networks - U-M Personal World Wide](https://reader035.vdocument.in/reader035/viewer/2022071601/613d3428736caf36b75a8c37/html5/thumbnails/16.jpg)
Block models
Asymptotic consistency can be established under the assumption of block models.
General block models
1. Each node is assigned to a block independently of other nodes, with probability $\pi_k$ for block $k$, $1 \le k \le K$, $\sum_{k=1}^{K} \pi_k = 1$.
2. Given that node $i$ belongs to block $a$ and node $j$ belongs to block $b$, $P[A_{ij} = 1] = p_{ab}$, and all edges are independent.
Block models for networks with background
We can define the last block as background by assuming $p_{aK} < p_{bb}$ for all $a = 1, \ldots, K$ and all $b = 1, \ldots, K-1$.
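The two steps of the block model translate directly into a sampler. The sketch below is an illustration under stated assumptions: the function names, the example parameters and the use of Python's standard `random` module are choices made here, not details from the talk.

```python
import random


def sample_block_model(n, pi, P, seed=0):
    """Draw block labels and a symmetric 0/1 adjacency matrix without self-loops."""
    rng = random.Random(seed)
    K = len(pi)
    # Step 1: each node joins block k independently with probability pi[k].
    labels = rng.choices(range(K), weights=pi, k=n)
    # Step 2: each pair (i, j) is linked independently with probability P[a][b].
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < P[labels[i]][labels[j]]:
                A[i][j] = A[j][i] = 1
    return labels, A


def is_background_last(P):
    """Check p_aK < p_bb for all a = 1..K and b = 1..K-1 (0-based indices here)."""
    K = len(P)
    return all(P[a][K - 1] < P[b][b] for a in range(K) for b in range(K - 1))


# Hypothetical parameters: one community (block 0) and background (block 1).
pi = [0.3, 0.7]
P = [[0.5, 0.05],
     [0.05, 0.05]]
labels, A = sample_block_model(50, pi, P, seed=1)
print(is_background_last(P), sum(map(sum, A)) // 2)
```

The check `is_background_last` returns `True` for these parameters because every link probability involving the last block is smaller than the within-community probability.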
![Page 17: Community Extraction for Social Networks - U-M Personal World Wide](https://reader035.vdocument.in/reader035/viewer/2022071601/613d3428736caf36b75a8c37/html5/thumbnails/17.jpg)
Asymptotic consistency
For simplicity, assume there is only one community and background in the network ($K = 2$ with parameters $p_{11}$, $p_{12}$, $p_{22}$, $\pi$ and $1 - \pi$).
Let $c$ denote the true community labels and $\hat{c}^{(n)}$ the estimated labels. Based on Bickel and Chen (2010), we proved:
Theorem
For any $0 < \pi < 1$, if $p_{11} > p_{12}$, $p_{11} > p_{22}$ and $p_{11} + p_{22} > 2p_{12}$, the maximizer $\hat{c}^{(n)}$ of both the unadjusted and adjusted criteria satisfies
$$P[\hat{c}^{(n)} = c] \to 1 \quad \text{as } n \to \infty.$$
![Page 18: Community Extraction for Social Networks - U-M Personal World Wide](https://reader035.vdocument.in/reader035/viewer/2022071601/613d3428736caf36b75a8c37/html5/thumbnails/18.jpg)
Outline
X Review of community detection
X Community extraction
X Asymptotic consistency
Simulation study
Real data analysis
![Page 19: Community Extraction for Social Networks - U-M Personal World Wide](https://reader035.vdocument.in/reader035/viewer/2022071601/613d3428736caf36b75a8c37/html5/thumbnails/19.jpg)
Simulation I
Two communities with background (block model)
n = 1000
n1 = 100, 200; n2 = 100
p12 = p23 = p13 = p33 = 0.05
p11 = 0.05i, p22 = 0.04i, i = 3, 4
Rand index
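For reference, the Rand index between two labelings is the fraction of node pairs on which they agree (both place the pair together, or both place it apart). The sketch below assumes the plain, unadjusted Rand index; the slides do not say which variant was used, and the function name is chosen here for illustration.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of node pairs on which two labelings agree (needs >= 2 nodes)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


same_up_to_renaming = rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed = rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(same_up_to_renaming, crossed)
```

The index does not depend on how the groups are named, so a labeling and its relabeled copy score 1.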
![Page 20: Community Extraction for Social Networks - U-M Personal World Wide](https://reader035.vdocument.in/reader035/viewer/2022071601/613d3428736caf36b75a8c37/html5/thumbnails/20.jpg)
Results for simulation I
[Figure: 2 × 2 grid of panels comparing methods M, B, and E; vertical axis from 0.0 to 1.0. Rows: n1 = 100, n2 = 200 and n1 = 200, n2 = 200. Columns: p11 = 0.15, p22 = 0.12 and p11 = 0.2, p22 = 0.16.]
![Page 21: Community Extraction for Social Networks - U-M Personal World Wide](https://reader035.vdocument.in/reader035/viewer/2022071601/613d3428736caf36b75a8c37/html5/thumbnails/21.jpg)
Simulation II
Two communities with background
n = 1000
n1 = 100, 200; n2 = 100
p12 = p23 = p13 = p33 = 0.05
p11 = 0.05i, p22 = 0.04i, i = 3, 4
Doubling the degree of the 10 highest-degree nodes
![Page 22: Community Extraction for Social Networks - U-M Personal World Wide](https://reader035.vdocument.in/reader035/viewer/2022071601/613d3428736caf36b75a8c37/html5/thumbnails/22.jpg)
Results for simulation II
[Figure: 2 × 2 grid of panels comparing methods M, B, and E; vertical axis from 0.0 to 1.0. Rows: n1 = 100, n2 = 200 and n1 = 200, n2 = 200. Columns: p11 = 0.15, p22 = 0.12 and p11 = 0.2, p22 = 0.16.]
![Page 23: Community Extraction for Social Networks - U-M Personal World Wide](https://reader035.vdocument.in/reader035/viewer/2022071601/613d3428736caf36b75a8c37/html5/thumbnails/23.jpg)
Outline
X Review of community detection
X Community extraction
X Asymptotic consistency
X Simulation study
Real data analysis
![Page 24: Community Extraction for Social Networks - U-M Personal World Wide](https://reader035.vdocument.in/reader035/viewer/2022071601/613d3428736caf36b75a8c37/html5/thumbnails/24.jpg)
Karate club network
Friendships between 34 members of a karate club (Zachary, 1977).
The club subsequently split into two parts following a disagreement between an instructor (node 0) and an administrator (node 33).
![Page 25: Community Extraction for Social Networks - U-M Personal World Wide](https://reader035.vdocument.in/reader035/viewer/2022071601/613d3428736caf36b75a8c37/html5/thumbnails/25.jpg)
Karate club network
[Figure: karate club network with nodes labeled 0 to 33, shown in three panels: (a) Modularity, (b) Block model, (c) Extraction]
![Page 26: Community Extraction for Social Networks - U-M Personal World Wide](https://reader035.vdocument.in/reader035/viewer/2022071601/613d3428736caf36b75a8c37/html5/thumbnails/26.jpg)
Political books network
Links in the political books network (Newman, 2006) represent pairs of books frequently bought together on amazon.com.
Blue: liberal
Red: conservative
![Page 27: Community Extraction for Social Networks - U-M Personal World Wide](https://reader035.vdocument.in/reader035/viewer/2022071601/613d3428736caf36b75a8c37/html5/thumbnails/27.jpg)
Political books network
(a) Modularity (b) Block model (c) Extraction
![Page 28: Community Extraction for Social Networks - U-M Personal World Wide](https://reader035.vdocument.in/reader035/viewer/2022071601/613d3428736caf36b75a8c37/html5/thumbnails/28.jpg)
Acknowledgment
Thanks to my advisors: Elizaveta Levina and Ji Zhu
Thanks to Professor Mark Newman for constructive suggestions and for sharing his code
Thanks to my friends: Xi Chen, Jian Guo and Yizao Wang
And also
![Page 29: Community Extraction for Social Networks - U-M Personal World Wide](https://reader035.vdocument.in/reader035/viewer/2022071601/613d3428736caf36b75a8c37/html5/thumbnails/29.jpg)
Thank you all very much!