estimating graph shweta jain properties through › wp-content › uploads › 2018 › 05 ›...
TRANSCRIPT
![Page 1: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/1.jpg)
Estimating Graph Properties through Sampling
Shweta JainAdvisor: Prof. C. Seshadhri
University of California,Santa Cruz
1
![Page 2: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/2.jpg)
Large GraphsSocial Network
Routing Networks
Protein-Interaction Networks
Citation Networks
2
![Page 3: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/3.jpg)
Peculiarities of real-world graphs❖ Degree distribution
❖ Heavy tailed
A: Actor collaboration network, B: WWW, C: Power Grid data [Barabási et. al., 1999]
Source: www.sciencemag.com3
![Page 4: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/4.jpg)
Peculiarities of real-world graphs
❖ Counts of patterns: cycles, triangles, cliques
❖ Avg. distance between nodes - small world property❖ High clustering coefficients
3-clique (triangle) 5-clique
4
5-cycle
![Page 5: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/5.jpg)
Need for graph sampling
❖ Scale - traditional graph-theoretic algorithms impractical❖ Limitations of access model e.g. streaming❖ Can utilize unique characteristics of real-world graphs
5
![Page 6: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/6.jpg)
Goals
❖ Estimate global characteristics from small sample.❖ Fast, work well on real-world instances.❖ Accurate, with provable error bounds.
6
![Page 7: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/7.jpg)
Applications
❖ Computationally hard problems - clique counting❖ Restricted access model - estimating the degree
distribution
7
![Page 8: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/8.jpg)
A Fast and Provable Method for Estimating Clique Counts using Turán’s Theorem.
Shweta JainC. Seshadhri
University of California,Santa Cruz
WWW 2017 Best Paper
8
![Page 9: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/9.jpg)
Cliques❖ k-clique: set of k vertices all connected to each other.
❖ [Holland et. al., 1970], [Milo et. al., 2002], [Burt, 2004], [Przulj et. al., 2004], [Hanneman et. al., 2005], [Hormozdiari et. al., 2007], [Faust, 2010], [Jackson, 2010], [Tsourakakis et. al., 2015], [Sizemore et. al., 2016] - clique counts appear in all these papers.
❖ Used in modeling, community detection, spam detection etc. 9
3-clique (triangle) 5-clique4-clique
![Page 10: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/10.jpg)
Problem Statement❖ Given a simple, undirected graph G, and a positive
integer k, estimate the number of k-cliques in G.
10
#5-Cliques = 0123
![Page 11: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/11.jpg)
Prior theoretical work
❖ Clique counting:❖ Arboricity and subgraph listing algorithms. [Chiba et. al., 1985]❖ Finding dense subgraphs with size bounds. [Alon et. al., 1994] ❖ Efficient algorithms for clique problems. [Vassilevska, 2009]
❖ Maximal clique counting:❖ Finding all cliques of an undirected graph. [Bron et. al., 1963]❖ Worst case time complexity of generating all maximal cliques. [Tomita
et.al., 2004]❖ Listing all maximal cliques in large sparse real-world graphs.
[Eppstein et. al, 2013]
theo work
11
![Page 12: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/12.jpg)
Challenge❖ Combinatorial explosion!
GRAPH VERTICES EDGES 7-CLIQUES 10-CLIQUES
web-BerkStan
0.6M 6M 9T 50000T
as-skitter 2M 11M 73B 22T
com-lj 4M 34M 510T 14000000T
com-orkut 3M 110M 360B 31T
12
Enumeration is costly.Hence, approximate.
![Page 13: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/13.jpg)
Practical approaches❖ Practical approaches:
❖ Color Coding [Alon et. al, 1994], [Hormozdiari et. al., 2007], [Betzler et. al., 2011], [Zhao et. al., 2012]
❖ Edge Sampling, GRAFT [Tsourakakis et. al., 2009], [Tsourakakis et. al., 2011], [Rahman et. al., 2014]
❖ MCMC based [Bhuiyan et. al., 2012]
❖ Parallel algorithm using MapReduce [Finocchi et. al., 2015]
❖ kClist [Danisch et. al., 2018]
13
![Page 14: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/14.jpg)
Our contribution
❖ We present a randomized algorithm, TuránShadow that approximates the number of k-cliques in G and has the following properties:❖ Runs on a single machine❖ Provable error bounds
14
![Page 15: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/15.jpg)
Our contribution❖ Extremely fast and accurate
❖ For 10 cliques, no other method terminated for all graphs in min{100xTuranShadow, 7 hours}!
15
GRAPH 7-CLIQUES TIME ERROR %
web-BerkStan 9.3T < 4 minutes 1.05
as-skitter 73B < 3 minutes 0.23
com-orkut 361B < 2 hours 1.97
![Page 16: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/16.jpg)
Main theoremLet S be the Turán k-clique shadow of G. Then w.h.p.
TuránShadow outputs a (1 ± !)-approximation to the number of k-cliques in G.
The running time of TuránShadow is O*(⍺|S|+m+n).⍺: degeneracy
m: #edgesn: #vertices
16
![Page 17: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/17.jpg)
Degeneracy
❖ ⍺: degeneracy of graph❖ Measure of density, low for real-world graphs❖ Let T: set of all subgraphs of G❖ Degeneracy = max
t2Tmin
v2t{degree of v in G|t}
17
![Page 18: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/18.jpg)
How many edges can a n-vertex graph have without having a triangle?
n2
4Ans:
[Turán, 1941] If the graph has more than edges, then it must have a triangle.
n2
4
18
![Page 19: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/19.jpg)
[Erdös, 1941] If the graph has even one more edge than , then it must have triangles.n2
4⌦(n)
density = #edges�n2
�
19
= 1
2
Thus, if density > , then graph necessarily has
triangles.⌦(n)
1
2
![Page 20: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/20.jpg)
Turán’s theoremGeneralizes for larger k.
If a graph on n vertices has density greater than
then it must have
k-cliques.
1� 1
k � 1
⌦(nk�2)
20
![Page 21: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/21.jpg)
G
Naïve algorithm
GGG
E[#samples] =
n =k =
1M5
#5-cliques = 100T
21
≅ 1016
![Page 22: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/22.jpg)
Key IdeaReal world graphs have dense pockets.
Drill down on dense pockets and count cliques within them!22
![Page 23: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/23.jpg)
Turan Shadow
23
G
G1G7
G4
G8 G2
G6 G3
G5
G9
Turán density!
decompose
G-> G1, k1
G2, k2
G3, k3
…
![Page 24: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/24.jpg)
G1
Turan Shadow
C = G1
G7
G4
G8G2
G6 G3
G5
G9
G1G1G1
#samples = 1 0 2 3 0 1
E[#samples] =
n1=k1=
215
<
< Erdös!
24
![Page 25: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/25.jpg)
Constructing the shadow
❖ Convert G to a DAG - order by degeneracy❖ Build clique enumeration tree, stopping whenever Turán
density is reached.
25
![Page 26: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/26.jpg)
Constructing the shadow
…v1 v2 v3 v4 vn
Convert G to DAG
26
![Page 27: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/27.jpg)
Constructing the shadow
…v1 v2 v3 vnv4
v3
v2
v4
Convert G to DAG
Check outnbrhd of v1
27
![Page 28: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/28.jpg)
Constructing the shadow
…v1 v2 v3 vnv4
v3
v2
v4
Convert G to DAG
Check outnbrhd of v1: Γ+(v1)Is density > Turán
density (k-1)? Add to TuránShadow
Yes
28
![Page 29: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/29.jpg)
Constructing the shadow
…v1 v2 v3 vnv4
v4
Convert G to DAG
Check outnbrhd of v1
Add to TuránShadow
NoExpand further
Is density > Turán density (k-1)?
Add to TuránShadowYes
v4
v2 v3
29
Γ+(v1)⋂Γ+(v2)
![Page 30: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/30.jpg)
Sampling
…
G1(n1, k1) G2(n2, k2) G3(n3, k3) Gl(nl, kl)
Sample leaf i with probability
(
niki)
Pjl
(
njkj)
Randomly sample ki vertices from leaf i
30
![Page 31: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/31.jpg)
Sampling
…
G1(n1, k1) G2(n2, k2) G3(n3, k3) Gl(nl, kl)
Bernoulli r.v. X = 1 if ki-clique, else 0
Exp[X] = #k-cliques in GPjl
(njkj)
31
![Page 32: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/32.jpg)
Putting it all together
❖ Construct Turán Shadow❖ Setup distribution over leaves❖ Sample from distribution and scale success ratio
32
![Page 33: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/33.jpg)
TuranShadow terminated in minutes for all graphs except com-orkut (3M/100M) for which it took 3 hours.
33
7 and 10 Clique Count Estimation PerformanceTi
me
(s)
0
1
10
100
1,000
10,000
100,000
loc-
gow
web
-Sta
n
amaz
on
yout
ube
Goo
gle
Berk
Stan
as-s
kitte
r
Pate
nts
soc-
poke
c
com
-lj
com
-ork
ut
k=7 k=10
![Page 34: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/34.jpg)
3-100x speedup for k=7.
For k=10, no other algorithm terminated for all graphs in min{100x, 7 hours}
34
k = 7Sp
eedu
p
0
1
10
100
1,000
loc-gow
web-Stan
amazon
youtube
BerkStan
as-skitter
Patents
soc-pokec
com-lj
com-orkut
ES GRAFT
![Page 35: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/35.jpg)
Size of shadow
105 106 107 108 109 1010
Number of edges
105
106
107
108
109
1010S
hado
wsi
zeShadow size, k=7
Shadow size roughly linear in m.35
![Page 36: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/36.jpg)
Less than 2% error with just 50,000 samples.
36
![Page 37: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/37.jpg)
Trends in clique countsC
lique
s
1E-011E+01
1E+05
1E+09
1E+13
1E+17
5 6 7 8 9 10
com-ljweb-BerkStancom-orkutas-skitter
com-youtubeamazon0601cit-Patents
Clique Size k
![Page 38: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/38.jpg)
What we achieved
❖ We make clique-counting feasible for larger cliques.❖ Single commodity machine. No need to use
MapReduce.❖ Extremely fast and accurate❖ Provable error bounds
38
![Page 39: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/39.jpg)
Open Questions
❖ Feasible for cliques of size k > 10?❖ Can we count near-cliques?❖ Can this approach be used for dense subgraph
discovery?
39
![Page 40: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/40.jpg)
Thank you
Questions?
40
![Page 41: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/41.jpg)
Provable and Practical Approximations For the Degree Distribution using Sublinear Graph Samples*
* Talya and Shweta are equal contributors.
WWW 2018
Talya EdenTel Aviv University
Shweta JainUniversity of California,
Santa Cruz
Ali PinarSandia National Labs
Dana RonTel Aviv University
C. SeshadhriUniversity of California,
Santa Cruz
1
![Page 42: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/42.jpg)
Large GraphsSocial Network
Routing Networks
Protein-Interaction Networks
Citation Networks
2
![Page 43: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/43.jpg)
❖ Degree(v) = #vertices v is connected to
Degree Distribution
d = 5v
3
![Page 44: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/44.jpg)
❖ Degree(v) = #vertices v is connected to
❖ Degree distribution: histogram of number of vertices of a certain degree
Degree Distribution
d = 5v
0
1
2
3
4
1 2 3 4 5
d
#vertices
4
![Page 45: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/45.jpg)
Heavy tail
A: Actor collaboration network, B: WWW, C: Power Grid data [Barabási et. al., 1999]Source: www.sciencemag.com
5
![Page 46: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/46.jpg)
Why sample❖ If access to whole graph: O(n) algorithm
6
0
1
2
3
4
1 2 3 4 50
1
2
3
4
1 2 3 4 50
1
2
3
4
1 2 3 4 50
1
2
3
4
1 2 3 4 50
1
2
3
4
1 2 3 4 50
1
2
3
4
1 2 3 4 50
1
2
3
4
1 2 3 4 50
1
2
3
4
1 2 3 4 50
1
2
3
4
1 2 3 4 50
1
2
3
4
1 2 3 4 50
1
2
3
4
1 2 3 4 50
1
2
3
4
1 2 3 4 5
d
#vertic
es
![Page 47: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/47.jpg)
Why sample❖ But what if we did not have access to whole graph?
❖ Internet, routing networks
❖ Crawl based methods, traceroutes [Faloutsos et. al., 1999]
❖ Contains bias! [Achlioptas et. al., 2009]
❖ Cannot simply scale sample.
❖ [Faloutsos et. al., 1999], [Leskovec et. al., 2006], [Ebbes et. al., 2008][Maiya et. al., 2011], [Ahmed et. al., 2010, 2014] - aim to capture representative graph sample
7
![Page 48: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/48.jpg)
Problem Definition❖ ccdh: complementary cumulative degree histogram
❖ N(d) = #vertices with degree >= d❖ monotonically non-increasing, smooth
Can we estimate N(d) for any given d?
8
![Page 49: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/49.jpg)
Query Model1. Vertex queries: u.a.r. v ∈ V
9
Can I get a vertex
Here you go!
Can I get a vertex
Here you go!
![Page 50: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/50.jpg)
Query Model2. Neighbor queries: u.a.r. neighbor u of v
10
Can I have a neighbor of A
Here you go!
Can I have a neighbor of B
Here you go!
A
B
![Page 51: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/51.jpg)
Query Model3. Degree queries: degree dv
11
Can I have the degree of A
4
Can I have the degree of B
9
A
B
![Page 52: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/52.jpg)
Query Model1. Vertex queries: u.a.r. v ∈ V
2. Neighbor queries: u.a.r. neighbor u of v
3. Degree queries: degree dv
12
![Page 53: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/53.jpg)
Prior work
❖ Vertex sampling [Stumpf et. al., 2005, Lee et. al. 2006]
❖ Edge Sampling [Stumpf et. al., 2005, Lee et. al. 2006]
❖ Random Walk with Jump [Lee et. al. 2006]
❖ Forest Fire Sampling [Faloutsos et. al., 2006]
❖ Snowball Sampling [Maiya et. al., 2011]
❖ Linear system solver [Zhang et. al., 2015]
13
All need to sample at least 10-30% of the graph!
![Page 54: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/54.jpg)
Main contribution❖ Randomized algorithm SADDLES that estimates N(d)❖ Uses a sublinear number of queries for any degree distribution
bounded below by a power law.❖ Power Law
exponent number of samples
2 n
3 n❖ Strongly sublinear!
14
1-2
-32
![Page 55: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/55.jpg)
Main contribution❖ In practice, we needed to sample only 1% of the graph❖ Works well for all degrees
15
100 101 102 103 104 105
degree d
100
101
102
103
104
105
106
N(d
)web-Google actual
SADDLESVSVS invOWSOWS invFFRWJIN inv
![Page 56: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/56.jpg)
Query complexity❖ Depends on 2 parameters:
❖ h-index = mind max(d, N(d))❖ Largest d, such that there are at least d vertices of degree
>= d.❖ Same as the bibliometric h-index!
16d
N(d)
d = N(d)
h
![Page 57: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/57.jpg)
Query complexity❖ Depends on 2 parameters:
❖ h-index = mind max(d, N(d))❖ z-index = mind:N (d)>0 sqrt(d·N(d))
❖ replace max by geometric mean❖ h and z are large for power laws!
17
![Page 58: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/58.jpg)
Vertex sampling❖ Sample u.a.r. vertices❖ Bin them according to degree❖ Need samples
18
Have to take many samples
to hit highdegree vertex
d d
![Page 59: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/59.jpg)
Edge sampling
19
-61
-61
-61-6
1
-61
-61 1
1
11
1
1
Undirected edge -> 2 directed edges
wt((v,u)) =
![Page 60: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/60.jpg)
Edge sampling
20
-61
-61
-61-
61
-61
-61 1
1
11
1
1
1Sum of weights of
edges incident on a vertex = 1
![Page 61: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/61.jpg)
Edge sampling
21
-61
1
1
11
1
11
Sum of weights of edges incident on a vertex = 1
-61-6
1
-61
-61 -6
1
![Page 62: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/62.jpg)
Edge sampling
22
-61
1
1
11
1
11
1
1
1
1
1
1-61
-61
-61 -6
1
Sum of all weights = n
![Page 63: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/63.jpg)
Edge sampling
23
-61
0
0
00
0
00
0
0
0
0
0
1-61
-61
-61 -6
1
To get N(d), set weights of
irrelevant edges to 0
Say, d = 5
Sum of all weights = N(d)
![Page 64: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/64.jpg)
Edge sampling
24
0 00000 -61 -6
1 -61 -6
1 -61 -6
1
Set of objects, we want their sum
Sample randomly
Take average of sampled weights
Scale by number of edgesto get total sum
-61 -6
1 -61 0
—42/6 x 12 = 1
0
![Page 65: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/65.jpg)
Main Idea❖ Combine vertex sampling and edge sampling❖ But we don’t have edge sampling❖ Simulate it!
25
![Page 66: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/66.jpg)
Theoretical work❖ Average degree [Feige et. al., 2006], [Goldreich et. al.,
2002, 2008]❖ Number of star graphs, moments [Eden et. al., 2011]❖ Number of triangles [Eden et. al., 2014]
26
![Page 67: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/67.jpg)
Simulated Edge Sampling
27
❖ Sample some vertices❖ The neighbors of these vertices is the edge set that
we will perform random sampling on.
![Page 68: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/68.jpg)
Simulated Edge Sampling
28
❖ Sample r vertices❖ Set up distribution D to sample
vertex v ∝ dv❖ Repeat q times:
❖ Sample a vertex v from D❖ Sample u.a.r. neighbor u of
v❖ Find average weight of
samples❖ Scale appropriately
u
r vertices
v
![Page 69: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/69.jpg)
Putting it all together
29
Sample vertices
Enough vertices withdegree>d found?
Yes Use estimator of vertex sampling
No
Sample edgesand use estimator of
edge sampling
d
d
![Page 70: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/70.jpg)
r and q❖ Total samples: ❖ How big do r and q need to be?
❖ If VS: r =❖ If ES: r = ❖ Similarly,
q =
30
degree d vertex
Want at least 1 of its d neighbors to be in R
![Page 71: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/71.jpg)
Query complexity❖ Query complexity:
31
❖ Vertex queries: ❖ Neighbor queries:
d
N(d)
d = N(d)
h
![Page 72: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/72.jpg)
Simulated Edge Sampling❖ Single edge sample is uniform at random❖ But multiple edge samples are correlated❖ Key insights:
❖ Correlation can be contained if h and z are high. Power laws have high h and z!
❖ 1-hop distance is enough - don’t need to do long random walks
32
![Page 73: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/73.jpg)
h and z❖ Indeed large!
33
GRAPH VERTICES EDGES AVG. DEG. h z
web-BerkStan 0.6M 6M 10 707 220
as-skitter 2M 11M 7 982 184
com-lj 4M 34M 9 810 114
com-orkut 3M 110M 38 1638 172
![Page 74: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/74.jpg)
Results
34
100 101 102 103 104
degree d
100
101
102
103
104
105
106
107
N(d
)
cit-Patents actualSADDLESVSVS invOWSOWS invFFRWJIN inv
100 101 102 103 104 105
degree d
100
101
102
103
104
105
106
N(d
)
web-Google actualSADDLESVSVS invOWSOWS invFFRWJIN inv
![Page 75: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa](https://reader033.vdocument.in/reader033/viewer/2022060323/5f0dd1ac7e708231d43c3e00/html5/thumbnails/75.jpg)
Thank you
Questions?
35