Counting Triangles in Real-World Networks
CSE'11 2
Geoff Sanders (Lawrence Livermore)
Gary L. Miller (SCS, CMU)
Mihail N. Kolountzakis (Math, University of Crete)
Outline: Motivation, Existing Work, Spectral Family, Combinatorial Family, Experimental Results, Conclusions
[Figure: a triangle on vertices A, B, C]
Friends of friends tend to become friends themselves! (Wasserman, Faust ’94)
[Photo, left to right: Paul Erdős, Ronald Graham, Fan Chung Graham]
Eckmann-Moses, Uncovering the Hidden Thematic Structure of the Web (PNAS, 2001)
Key Idea: Connected regions of high curvature (i.e., dense in triangles) indicate a common topic!
Triangles used for Web Spam Detection (Becchetti et al., KDD ’08)
Key Idea: The triangle distribution among spam hosts is significantly different from that of non-spam hosts!
Triangles used for assessing Content Quality in Social Networks (Welser, Gleave, Fisher, Smith, Journal of Social Structure, 2007)
Key Claim: The number of triangles in the self-centered social network of a user is a good indicator of the role of that user in the community!
(Watts, Strogatz ’98)
Signed triangles in structural balance theory (Jon Kleinberg)
Triangle-closing models are also used to model the microscopic evolution of social networks (Leskovec et al., KDD ’08)
CAD applications: e.g., solving systems of geometric constraints involves triangle counting! (Fudos, Hoffmann 1997)
Numerous other applications, including:
• Motif detection / frequent subgraph mining (e.g., protein-protein interaction networks)
• Community detection (Berry et al. ’09)
• Outlier detection (CET ’08)
• Link recommendation
Fast triangle counting algorithms are necessary.
Real-world networks: there is no general, good definition, but typical characteristics include:
• Skewed degree distributions
• High clustering coefficients
• “Small world” characteristics (six degrees of separation)
Existing Work
Alon, Yuster, Zwick: asymptotically the fastest algorithm, but not practical for large graphs.
In practice, one of the iterator algorithms is preferred:
• Node Iterator (count the edges among the neighbors of each vertex)
• Edge Iterator (count the common neighbors of the endpoints of each edge)
Both run asymptotically in O(mn) time.
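A minimal sketch of the two iterator algorithms, assuming a simple undirected graph given as an edge list with integer (comparable) vertex labels; the helper names are illustrative, not taken from the paper. Both counts are divided by 3 because every triangle is found once from each of its three vertices (node iterator) or three edges (edge iterator).

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list (self-loops ignored)."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def node_iterator(adj):
    """Count edges among the neighbors of each vertex.

    Each triangle is found once from each of its 3 vertices, hence the // 3.
    """
    count = 0
    for nbrs in adj.values():
        for a, b in combinations(nbrs, 2):
            if b in adj[a]:
                count += 1
    return count // 3


def edge_iterator(adj):
    """Count common neighbors of the endpoints of each edge.

    Each triangle is found once from each of its 3 edges, hence the // 3.
    """
    count = 0
    for u, nbrs in adj.items():
        for v in nbrs:
            if u < v:  # visit every undirected edge once
                count += len(nbrs & adj[v])
    return count // 3


k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
adj = build_adjacency(k4)
print(node_iterator(adj), edge_iterator(adj))  # 4 4
```

The node iterator examines every pair of neighbors of every vertex, so high-degree vertices dominate its cost; this is what the degree-based partitioning idea targets.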
Remarks: The idea of partitioning the vertices into “large”- and “small”-degree vertices and treating each class appropriately appears in Alon, Yuster, Zwick.
For more work, see the references in our paper:
▪ Itai, Rodeh (STOC ’77)
▪ Papadimitriou, Yannakakis (IPL ’81)
……
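The large/small degree idea can be sketched as follows. This is an illustrative Python version, not the Alon, Yuster, Zwick algorithm itself: the threshold √m is an assumption of the sketch, and all-heavy triangles are found by brute force, whereas Alon, Yuster, Zwick use fast matrix multiplication for that part. Light vertices have degree at most √m, so pairing up their neighbors costs O(m√m) in total, and there are fewer than 2√m heavy vertices.

```python
import math
from itertools import combinations


def count_triangles_partitioned(edges):
    """Exact triangle count with a heavy/light split of the vertices.

    Illustrative sketch: 'heavy' means degree > sqrt(m). Triangles touching a light
    vertex are found by pairing that vertex's neighbors; triangles with three heavy
    vertices are found by brute force over heavy triples.
    """
    adj = {}
    for u, v in edges:
        if u != v:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    threshold = math.sqrt(m)
    light = {v for v, nbrs in adj.items() if len(nbrs) <= threshold}
    heavy = [v for v in adj if v not in light]

    count = 0
    # Light part: charge each triangle to its smallest-labelled light vertex,
    # so a triangle with several light vertices is still counted once.
    for v in light:
        for a, b in combinations(adj[v], 2):
            if b not in adj[a]:
                continue
            if (a in light and a < v) or (b in light and b < v):
                continue
            count += 1
    # Heavy part: there are fewer than 2*sqrt(m) heavy vertices.
    for a, b, c in combinations(heavy, 3):
        if b in adj[a] and c in adj[a] and c in adj[b]:
            count += 1
    return count
```

For example, on a wheel with a hub joined to an 8-cycle, the hub is heavy, the rim vertices are light, and the 8 hub-rim triangles are all found in the light part.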
r independent samples of three distinct vertices
Then the following holds:
with probability at least 1-δ
Works for dense graphs, e.g., $T_3 \geq n^2 \log n$.
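Triple sampling can be sketched like this, assuming vertices labelled 0..n-1 and uniform sampling of unordered triples; the function name is illustrative. The fraction of sampled triples that form a triangle, scaled by C(n, 3), is an unbiased estimate of the triangle count. In a sparse graph almost every sample misses, so many samples are needed, which is why this approach suits dense graphs.

```python
import random
from math import comb


def triple_sampling_estimate(n, edges, r, seed=None):
    """Estimate the triangle count from r uniform samples of 3 distinct vertices."""
    rng = random.Random(seed)
    edge_set = {frozenset(e) for e in edges if e[0] != e[1]}
    hits = 0
    for _ in range(r):
        a, b, c = rng.sample(range(n), 3)  # three distinct vertices, uniformly
        if (frozenset((a, b)) in edge_set and frozenset((a, c)) in edge_set
                and frozenset((b, c)) in edge_set):
            hits += 1
    # E[hits / r] = T / C(n, 3), so rescaling gives an unbiased estimate of T.
    return hits / r * comb(n, 3)
```

On a complete graph every sampled triple is a triangle, so the estimate equals C(n, 3) exactly; on an edgeless graph it is 0.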
(Bar-Yossef, Kumar, Sivakumar ’02): requires $n^2/\mathrm{polylog}\, n$ edges
More follow-up work: (Jowhari, Ghodsi ’05), (Buriol, Frahling, Leonardi, Marchetti-Spaccamela, Sohler ’06),
(Becchetti, Boldi, Castillo, Gionis ’08)
Spectral Family
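$\Delta(G) = \frac{1}{6}\sum_{i=1}^{n} \lambda_i^3$ and $\Delta(v) = \frac{1}{2}\sum_{j=1}^{n} \lambda_j^3 u_{v,j}^2$, where $\lambda_1, \dots, \lambda_n$ are the eigenvalues of the adjacency matrix and $u_j$ is the j-th eigenvector ($u_{v,j}$ is its entry for vertex v).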
Key Idea: A few top eigenvalue-eigenvector pairs typically give a good approximation to the number of triangles.
CET, [ICDM ’08]
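A small numerical sketch of the EigenTriangle idea, assuming NumPy and a dense symmetric 0/1 adjacency matrix. It uses Δ(G) = Σ λ_i³ / 6 for the global count and Δ(v) = Σ_j λ_j³ u_{v,j}² / 2 for the count at vertex v, keeping only the k eigenvalues of largest magnitude. The dense solver `numpy.linalg.eigh` is used for brevity; on large sparse graphs a Lanczos-type solver (for example SciPy's `scipy.sparse.linalg.eigsh`) would compute only the top-k pairs. The function name is illustrative.

```python
import numpy as np


def eigentriangle(A, k=None):
    """Global and per-vertex triangle counts from the top-k eigenpairs of A.

    With k=None all eigenpairs are used and the result is exact (up to rounding).
    """
    A = np.asarray(A, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(A)  # A is symmetric
    order = np.argsort(-np.abs(eigvals))  # largest magnitude first
    if k is not None:
        order = order[:k]
    lam, U = eigvals[order], eigvecs[:, order]
    global_count = np.sum(lam ** 3) / 6.0
    local_counts = (U ** 2) @ (lam ** 3) / 2.0  # one entry per vertex
    return global_count, local_counts
```

On K4 (eigenvalues 3, -1, -1, -1) the full spectrum gives (27 - 3)/6 = 4 triangles and 3 triangles per vertex, while keeping only the top eigenvalue gives 27/6 = 4.5.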
Keep only the top 3 eigenvalues!
Political Blogs network (1.2K nodes, 17K edges) (Adamic, Glance ’04)
The few top eigenvalues are significantly larger than the bulk of the eigenvalues (“Eigenvalue power law”).
Hence they contribute a lot to the number of triangles, and cubing amplifies this even more.
The bulk of the eigenvalues is almost symmetrically distributed around 0, so their cubes nearly cancel out.
The Lanczos method converges fast due to the large eigengaps.
Political Blogs network (1.2K nodes, 17K edges) (Adamic, Glance ’04)
Pearson’s correlation coefficient ρ = 0.9997 using a rank-10 approximation
Note: a rank-3 approximation already gives almost perfect results.
Key idea (CET, [KAIS ’11]):
• Sample the i-th column A(i) of the adjacency matrix with probability proportional to the degree of the i-th vertex, and scale it “appropriately”.
• Compute a low-rank approximation of the sampled matrix using the SVD.
Observation 1: For a symmetric matrix, the eigendecomposition and the SVD are closely related: the eigenvectors are the left singular vectors, and $\lambda_i = \sigma_i \,\mathrm{sgn}(u_i^\top v_i)$, where $\lambda_i$ and $\sigma_i$ are the i-th eigenvalue and singular value, and $u_i$ and $v_i$ are the i-th left and right singular vectors.
Observation 2: We only need a rank-k approximation $A_k$ of A, where k is small.
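Observation 1 can be checked numerically. This sketch (NumPy assumed, function name illustrative) recovers the signed eigenvalues of a symmetric matrix from its SVD via $\lambda_i = \sigma_i \,\mathrm{sgn}(u_i^\top v_i)$. It assumes no two eigenvalues of opposite sign share the same magnitude: in that case the singular vectors for that singular value mix the two eigenspaces and the sign test becomes ambiguous.

```python
import numpy as np


def signed_eigenvalues_from_svd(A):
    """Recover eigenvalues of a symmetric matrix A from its SVD.

    For symmetric A, lambda_i = sigma_i * sgn(u_i . v_i), where u_i and v_i are
    the i-th left and right singular vectors (columns of U and V).
    """
    U, s, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    dots = np.sum(U * Vt.T, axis=0)  # u_i . v_i for every i
    return s * np.sign(dots)
```

For K4 it returns 3 together with three copies of -1 (in SVD order), so the cubes again sum to 24 and the triangle count is 24/6 = 4.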
Frieze, Kannan, Vempala
Idea: Sample c columns to obtain $\tilde{A}$, and find $\tilde{A}_k$ instead of the optimal $A_k$. Recover the signs from the left and right singular vectors. Use EigenTriangle!
Results: c = 100, k = 6 for Flickr (400K nodes, 2M edges): 95.6% accuracy
(1) Pick column i with probability proportional to its squared length.
(2) Use the sampled matrix to obtain a good low-rank approximation to the original one.
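A sketch of the column-sampling step, assuming NumPy and a dense adjacency matrix. Columns are drawn with probability proportional to their squared length (for a 0/1 adjacency matrix this is the vertex degree) and rescaled by $1/\sqrt{c\,p_i}$, so that $E[CC^\top] = AA^\top$. The sign-recovery rule is an assumption of this sketch, not necessarily the paper's: the eigenvalue for each top left singular vector u of C is estimated by the Rayleigh quotient $u^\top A u$, which carries both sign and magnitude and is unaffected by the arbitrary sign of u. `approx_triangles` is a hypothetical name.

```python
import numpy as np


def sample_columns(A, c, rng):
    """Sample c columns i.i.d. with p_i = ||A(i)||^2 / ||A||_F^2, rescaled by 1/sqrt(c p_i).

    With this scaling E[C C^T] = A A^T. For a 0/1 adjacency matrix ||A(i)||^2 is
    the degree of vertex i.
    """
    A = np.asarray(A, dtype=float)
    col_sq = np.sum(A ** 2, axis=0)
    p = col_sq / col_sq.sum()
    idx = rng.choice(A.shape[1], size=c, replace=True, p=p)
    return A[:, idx] / np.sqrt(c * p[idx]), p


def approx_triangles(A, c, k, seed=0):
    """Hypothetical sketch: triangle estimate from the top-k left singular vectors of C.

    Eigenvalues are estimated by Rayleigh quotients u^T A u (assumed sign-recovery rule).
    """
    A = np.asarray(A, dtype=float)
    C, _ = sample_columns(A, c, np.random.default_rng(seed))
    U, _, _ = np.linalg.svd(C, full_matrices=False)
    U_k = U[:, :k]
    lam = np.einsum("ij,ij->j", U_k, A @ U_k)  # u_j^T A u_j for each column j
    return float(np.sum(lam ** 3) / 6.0)
```

Every rescaled column has the same squared length, $\|A\|_F^2 / c$, which is one way to see that the scaling compensates exactly for the non-uniform sampling probabilities.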
Success relies on empirical properties: real-world networks typically, but not always, satisfy the properties shown before.
Very little is known about the spectrum; most of what we know concerns the top eigenvalues.
Even less is known about the eigenvectors of real-world networks.
Combinatorial Family
Approximate a given graph G with a sparse graph H such that H is close to G in a certain sense.
Examples:
• Cut-preserving sparsifiers (Benczúr, Karger)
• Spectral sparsifiers (Spielman, Teng)
What about Triangle Sparsifiers?
Speedup: e.g., $1/p^2$ if we use any standard iterator method.
Setting p optimally, using the “median boosting trick” (Jerrum, Valiant, Vazirani ’86).
Sampling takes expected sublinear time O(pm); this can justify even O(n) speedups in graphs with sufficiently many triangles.
In practice: huge speedups, high accuracy.
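The triangle sparsifier can be sketched as follows: keep each edge independently with probability p, count triangles exactly in the sparsified graph, and rescale by $1/p^3$, since a triangle survives only if all three of its edges do. This minimal Python version assumes a simple graph given as an edge list; the exact counter is a plain node iterator, and the function names are illustrative.

```python
import random
from itertools import combinations


def count_triangles(edges):
    """Exact count via the node iterator on adjacency sets."""
    adj = {}
    for u, v in edges:
        if u != v:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    total = sum(1 for nbrs in adj.values()
                for a, b in combinations(nbrs, 2) if b in adj[a])
    return total // 3


def triangle_sparsifier_estimate(edges, p, seed=None):
    """Keep each edge with probability p, count triangles, rescale by 1/p^3.

    A triangle survives with probability p^3, so the result is unbiased.
    """
    rng = random.Random(seed)
    kept = [e for e in edges if rng.random() < p]
    return count_triangles(kept) / p ** 3
```

With p = 1 every edge is kept and the estimate is exact; with p = 0.1 the iterator runs on roughly a tenth of the edges.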
McSherry, Achlioptas
Idea: sparsify the matrix A appropriately, then compute a low-rank approximation faster that is “good” in terms of any reasonable norm (e.g., Frobenius, 2-norm).
• CET et al. [ASONAM ’09]: speeds up spectral computations without affecting the triangle estimates.
• MACH: Fast Randomized Tensor Decompositions (CET, SDM ’10): theoretical guarantees for HOSVD of dense tensors; works great in practice for Tucker decompositions too.
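A sketch of entry-wise matrix sparsification in this spirit, assuming NumPy and a symmetric matrix: keep each entry with probability p and rescale it by 1/p, so the sparsified matrix equals A in expectation. Deciding once per unordered pair (i, j) keeps the result symmetric. The single global p and the function name are simplifying assumptions of this sketch.

```python
import numpy as np


def sparsify_symmetric(A, p, seed=None):
    """Keep each entry (i, j), i <= j, with probability p and rescale it by 1/p.

    The kept pattern is mirrored so the result stays symmetric; E[result] = A.
    """
    A = np.asarray(A, dtype=float)
    rng = np.random.default_rng(seed)
    keep = np.triu(rng.random(A.shape) < p)  # decide once per unordered pair
    keep = keep | keep.T
    return np.where(keep, A / p, 0.0)
```

The top eigenvalues of the sparsified matrix (e.g., via `np.linalg.eigvalsh`) can then be compared with those of A; the sparser matrix makes iterative eigensolvers cheaper.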
Theorem: If …, then with probability $1 - 1/n^{3-d}$ the sampled graph has a triangle count that ε-approximates the true number of triangles, for any 0 < d < 3.
[Figure: color classes 1, 2, …, k+1]
Hajnal-Szemerédi theorem: Every graph on n vertices with maximum degree Δ(G) = k is (k+1)-colorable with all color classes differing in size by at most 1.
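Proof sketch: Create an auxiliary graph where each triangle is a vertex and two vertices are connected iff the corresponding triangles share an edge.
Observe: the auxiliary graph has maximum degree O(n), since each edge of a triangle lies in at most n - 2 triangles.
Invoke the Hajnal-Szemerédi theorem and apply a Chernoff bound to each color class (triangles in the same class share no edge, so their survival events are independent). Finally, take a union bound. Q.E.D.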
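The auxiliary graph in the proof can be built explicitly. This Python sketch (illustrative names; a simple graph with comparable vertex labels is assumed) creates one node per triangle and joins two triangles when they share an edge. Each of a triangle's three edges lies in at most n - 2 triangles, so a triangle is adjacent to at most 3(n - 3) others, which is the O(n) degree bound the proof relies on.

```python
from itertools import combinations


def triangle_overlap_graph(edges):
    """Auxiliary graph: one node per triangle, adjacent iff the triangles share an edge."""
    adj = {}
    for u, v in edges:
        if u != v:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    triangles = sorted({tuple(sorted((v, a, b)))
                        for v, nbrs in adj.items()
                        for a, b in combinations(nbrs, 2) if b in adj[a]})
    # Group triangles by edge, then connect all triangles sharing that edge.
    by_edge = {}
    for t in triangles:
        for e in combinations(t, 2):  # t is sorted, so edge keys are sorted pairs
            by_edge.setdefault(e, []).append(t)
    aux = {t: set() for t in triangles}
    for group in by_edge.values():
        for s, t in combinations(group, 2):
            aux[s].add(t)
            aux[t].add(s)
    return aux
```

In K5 every edge lies in 3 triangles, so each of the 10 triangles is adjacent to 3 · 2 = 6 others, matching 3(n - 3) for n = 5.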
(Kolountzakis, Miller, Peng, CET, Int. Math. ’11)
Given a graph G with n vertices and m edges, which graph maximizes the number of edges in the line graph L(G)?
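The quantity in this question has a simple closed form: two edges of G are adjacent in L(G) exactly when they share an endpoint, so $|E(L(G))| = \sum_v \binom{d_v}{2}$, which is also the number of neighbor pairs the node iterator examines. A small Python check, assuming a simple graph given as an edge list; the function names are illustrative.

```python
from itertools import combinations
from math import comb


def line_graph_edge_count(edges):
    """|E(L(G))| = sum over vertices v of C(deg(v), 2): pairs of edges meeting at v."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return sum(comb(d, 2) for d in deg.values())


def line_graph_edge_count_brute(edges):
    """Cross-check: count pairs of edges that share an endpoint."""
    return sum(1 for e, f in combinations(edges, 2) if set(e) & set(f))
```

With m = 4 edges, a star gives C(4, 2) = 6 line-graph edges while a perfect matching gives 0, which illustrates why skewed degree distributions make the iterator methods expensive.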
Experimental Results
Datasets (nodes, edges):
• LiveJournal (5.4M, 48M)
• Orkut (3.1M, 117M)
• Web-EDU (9.9M, 46.3M)
• YouTube (1.2M, 3M)
• Flickr (1.9M, 15.6M)
Social networks are abundant in triangles!
[Figure: running time (secs) of Exact, Triple Sampling, and Hybrid on Orkut, Flickr, LiveJournal, Wiki-2006, Wiki-2007]
Accuracy ~99%
p was set to 0.1. More sophisticated techniques for setting p exist, using a doubling procedure (CET, Kolountzakis, Miller).
Our results show no clear winner, but the hybrid algorithm achieves both high accuracy and speed.
Our code, even our exact algorithm, outperforms the code of the fastest approximate-counting competitors; hence we compared different versions of our own code!
Conclusions
Real-world graphs can be thought of as “planar graphs”: many problems can be solved more efficiently than in the general case.
Spectral algorithm designed around empirically observed special spectral properties
Triangle Sparsifiers (fast with strong theoretical guarantees)
“Interplay” of combinatorial and spectral approaches: MACH for HOSVD
Degree-based partitioning is a very practical “trick”
State-of-the-art results for sampling-based and semi-streaming triangle counting algorithms
• Triangles in Kronecker graphs [CET ICDM ’08]
• Triangle Power Laws [CET ICDM ’08]
• Random projections and counting triangles [Kolountzakis, Miller, Peng, CET ’11]
• Semi-streaming model with low space usage and only 3 passes over the graph stream [Kolountzakis, Miller, Peng, CET ’11]
• MapReduce implementation [CET et al., KDD ’09]
• High-quality code with optimized cache properties
Remove edge (1,2).
Remove any weighted edge w sufficiently large.
Spielman-Srivastava and Benczúr-Karger sparsifiers also don’t work!
THANK YOU!
QUESTIONS
621,963,073
Best method for our applications: best running time, high accuracy
Hybrid vs. naïve sampling: improves accuracy, increases running time.