billion-scale network embedding with iterative random ...4 challenge: billion-scale network data...
TRANSCRIPT
![Page 1: Billion-scale Network Embedding with Iterative Random ...4 Challenge: Billion-scale Network Data Social Networks WeChat: 1 billion monthly active users (March, 2018) Facebook: 2 billion](https://reader035.vdocument.in/reader035/viewer/2022071608/6145a12807bb162e665fcee3/html5/thumbnails/1.jpg)
Billion-scale Network Embedding
with Iterative Random Projection
Ziwei Zhang Peng Cui Haoyang Li Xiao Wang Wenwu Zhu
Tsinghua U Tsinghua U Tsinghua U Tsinghua U Tsinghua U
![Page 2: Billion-scale Network Embedding with Iterative Random ...4 Challenge: Billion-scale Network Data Social Networks WeChat: 1 billion monthly active users (March, 2018) Facebook: 2 billion](https://reader035.vdocument.in/reader035/viewer/2022071608/6145a12807bb162e665fcee3/html5/thumbnails/2.jpg)
2
Network Data is Ubiquitous
Social Network Biology Network
Traffic Network
![Page 3: Billion-scale Network Embedding with Iterative Random ...4 Challenge: Billion-scale Network Data Social Networks WeChat: 1 billion monthly active users (March, 2018) Facebook: 2 billion](https://reader035.vdocument.in/reader035/viewer/2022071608/6145a12807bb162e665fcee3/html5/thumbnails/3.jpg)
3
Network Embedding: Vector Representation of Nodes
Generate
Embed
Apply feature-based machine
learning algorithms
Fast computing of nodes similarity
Support parallel computing
Applications: link prediction,
node classification, community
detection, centrality measure,
anomaly detection ...
![Page 4: Billion-scale Network Embedding with Iterative Random ...4 Challenge: Billion-scale Network Data Social Networks WeChat: 1 billion monthly active users (March, 2018) Facebook: 2 billion](https://reader035.vdocument.in/reader035/viewer/2022071608/6145a12807bb162e665fcee3/html5/thumbnails/4.jpg)
4
Challenge: Billion-scale Network Data
Social Networks
WeChat: 1 billion monthly active users (March,
2018)
Facebook: 2 billion active users (2017)
E-commerce Networks
Amazon: 353 million products, 310 million users, 5
billion orders (2017)
Citation Networks
130 million authors, 233 million publications, 754
million citations (Aminer, 2018)
How to conduct network embedding for
such large-scale network dataοΌ
![Page 5: Billion-scale Network Embedding with Iterative Random ...4 Challenge: Billion-scale Network Data Social Networks WeChat: 1 billion monthly active users (March, 2018) Facebook: 2 billion](https://reader035.vdocument.in/reader035/viewer/2022071608/6145a12807bb162e665fcee3/html5/thumbnails/5.jpg)
5
Bottleneck of Existing Methods Methods based on random-walks
DeepWalk, B. Perozzi, et al. KDD 2014.
LINE, J. Tang, et al. WWW 2015.
Node2vec, A. Grover, et al. KDD 2016.
Methods based on matrix factorization
M-NMF, X. Wang, et al. AAAI 2017.
AROPE, Z. Zhang, et al. KDD 2018.
Methods based on deep learning
SDNE, D. Wang, et al. KDD 2016.
DVNE, D. Zhu, et al. KDD 2018.
Common bottleneck: based on sophisticated optimization
Computationally expansive
Hard to resort to distributed computing scheme
Optimization is entangled and needs global information
β Communication cost is high
Only handle thousands or millions of nodes and edges
![Page 6: Billion-scale Network Embedding with Iterative Random ...4 Challenge: Billion-scale Network Data Social Networks WeChat: 1 billion monthly active users (March, 2018) Facebook: 2 billion](https://reader035.vdocument.in/reader035/viewer/2022071608/6145a12807bb162e665fcee3/html5/thumbnails/6.jpg)
Network embedding: essentially a dimension reduction problem
Random projection: optimization-free for dimension reduction
Basic idea: randomly project data into a low-dimensional subspace
Extremely efficient and friendly to distributed computing
6
Random Projection
![Page 7: Billion-scale Network Embedding with Iterative Random ...4 Challenge: Billion-scale Network Data Social Networks WeChat: 1 billion monthly active users (March, 2018) Facebook: 2 billion](https://reader035.vdocument.in/reader035/viewer/2022071608/6145a12807bb162e665fcee3/html5/thumbnails/7.jpg)
Key network property: high-order proximity
Can solve the network sparsity problem
Measure indirect relationship between nodes
β How to design a high-order proximity preserved random projection?
7
High-Order Proximity
Target Node
First-order
Second-order
Third-order
β¦
![Page 8: Billion-scale Network Embedding with Iterative Random ...4 Challenge: Billion-scale Network Data Social Networks WeChat: 1 billion monthly active users (March, 2018) Facebook: 2 billion](https://reader035.vdocument.in/reader035/viewer/2022071608/6145a12807bb162e665fcee3/html5/thumbnails/8.jpg)
8
Problem Formulation Objective function: matrix factorization of preserving high-order proximity
minπ,π
π β πππ π2
π = π π΄ = πΌ1π΄1 + πΌ2π΄
2 +β―+ πΌππ΄π
Slight modification: assuming positive semi-definite and using 2 norm
minπ
πππ β πππ2
π = π π΄ = πΌ1π΄1 + πΌ2π΄
2 +β―+ πΌππ΄π
Random projection:
Denote π β βπΓπ as a Gaussian random matrix
π ππ~π© 0,1
π
Surprisingly simple result:
π = ππ
![Page 9: Billion-scale Network Embedding with Iterative Random ...4 Challenge: Billion-scale Network Data Social Networks WeChat: 1 billion monthly active users (March, 2018) Facebook: 2 billion](https://reader035.vdocument.in/reader035/viewer/2022071608/6145a12807bb162e665fcee3/html5/thumbnails/9.jpg)
9
Theoretical Guarantee
Theoretical guarantee
Basically, random projection can effectively minimize the objective function
However, calculating π is still very time consuming
![Page 10: Billion-scale Network Embedding with Iterative Random ...4 Challenge: Billion-scale Network Data Social Networks WeChat: 1 billion monthly active users (March, 2018) Facebook: 2 billion](https://reader035.vdocument.in/reader035/viewer/2022071608/6145a12807bb162e665fcee3/html5/thumbnails/10.jpg)
10
Iterative Projection Iterative projection:
π = ππ = πΌ1π΄1 + πΌ2π΄
2 +β―+ πΌππ΄π π
= πΌ1π΄1π + πΌ2π΄
2π +β―+ πΌππ΄ππ
Can be calculated iteratively
Why efficient?
π΄: π Γπ sparse adjacency matrix
π : π Γ π low-dimensional matrix
Associative law of matrix multiplication
π΄π΄β¦π΄π΄π΄π
Γ A Γ A Γ A
π΄π΄β¦π΄π΄ π΄π
π΄π΄β¦π΄ π΄π΄π
Sparse
Low-dimensional
Sparse
Low-dimensional
Sparse
Low-dimensional
Sparse matrix multiplication!
![Page 11: Billion-scale Network Embedding with Iterative Random ...4 Challenge: Billion-scale Network Data Social Networks WeChat: 1 billion monthly active users (March, 2018) Facebook: 2 billion](https://reader035.vdocument.in/reader035/viewer/2022071608/6145a12807bb162e665fcee3/html5/thumbnails/11.jpg)
11
Iterative Projection
A1 A2 AqΓ A Γ A Γ Aβ¦
Γ R
π1Γ A
π2 ππΓ A Γ Aβ¦
Γ R
Time Consuming!
Efficient!
![Page 12: Billion-scale Network Embedding with Iterative Random ...4 Challenge: Billion-scale Network Data Social Networks WeChat: 1 billion monthly active users (March, 2018) Facebook: 2 billion](https://reader035.vdocument.in/reader035/viewer/2022071608/6145a12807bb162e665fcee3/html5/thumbnails/12.jpg)
12
RandNE: Iterative Random projection Network Embedding
Time Complexity: π πππ + ππ2
π/π: number of nodes/edges; π: dimension; π: order
Linear w.r.t. network size
Only need to calculate q sparse matrix products
Orders of magnitude faster than existing methods!
Advantages:
Distributed Calculation
Dynamic Updating
![Page 13: Billion-scale Network Embedding with Iterative Random ...4 Challenge: Billion-scale Network Data Social Networks WeChat: 1 billion monthly active users (March, 2018) Facebook: 2 billion](https://reader035.vdocument.in/reader035/viewer/2022071608/6145a12807bb162e665fcee3/html5/thumbnails/13.jpg)
13
Distributed Calculation
Iterative random projection only involves matrix product ππ = π΄ππβ1
Each dimension can be calculated separately
Property of sparse matrix product
No communication is needed during calculation!
π β βπΓπ
![Page 14: Billion-scale Network Embedding with Iterative Random ...4 Challenge: Billion-scale Network Data Social Networks WeChat: 1 billion monthly active users (March, 2018) Facebook: 2 billion](https://reader035.vdocument.in/reader035/viewer/2022071608/6145a12807bb162e665fcee3/html5/thumbnails/14.jpg)
14
Dynamic Updating Networks are dynamic in nature
E.g., in social networks, users add/delete friends, new users join, old users leave
Changes of edges β Calculate incremental parts!
ππ + Ξππ = π΄ + Ξπ΄ β (ππβ1+Ξππβ1)
β Ξππ = π΄ β Ξππβ1 + Ξπ΄ β ππβ1 + Ξπ΄ β Ξππβ1
Changes of nodes β adjust the dimensionality
TimeT T+1
![Page 15: Billion-scale Network Embedding with Iterative Random ...4 Challenge: Billion-scale Network Data Social Networks WeChat: 1 billion monthly active users (March, 2018) Facebook: 2 billion](https://reader035.vdocument.in/reader035/viewer/2022071608/6145a12807bb162e665fcee3/html5/thumbnails/15.jpg)
15
Dynamic Updating
Linear scalability w.r.t. number of changed nodes/edges
No error accumulation
Identical results as re-running the algorithm
![Page 16: Billion-scale Network Embedding with Iterative Random ...4 Challenge: Billion-scale Network Data Social Networks WeChat: 1 billion monthly active users (March, 2018) Facebook: 2 billion](https://reader035.vdocument.in/reader035/viewer/2022071608/6145a12807bb162e665fcee3/html5/thumbnails/16.jpg)
16
Experimental Setting: Moderate-scale Networks
Datasets: BlogCatalog, Flickr, YouTube
Baselines:
DeepWalk (KDD 2014): DFS random walk + skip-gram
LINE (WWW 2015): BFS random walk + skip-gram
Node2vec (KDD 2016): biased random walk + skip-gram
SDNE (KDD 2016): deep auto-encoder
![Page 17: Billion-scale Network Embedding with Iterative Random ...4 Challenge: Billion-scale Network Data Social Networks WeChat: 1 billion monthly active users (March, 2018) Facebook: 2 billion](https://reader035.vdocument.in/reader035/viewer/2022071608/6145a12807bb162e665fcee3/html5/thumbnails/17.jpg)
17
Experimental Results Running time
At least dozens of times faster
![Page 18: Billion-scale Network Embedding with Iterative Random ...4 Challenge: Billion-scale Network Data Social Networks WeChat: 1 billion monthly active users (March, 2018) Facebook: 2 billion](https://reader035.vdocument.in/reader035/viewer/2022071608/6145a12807bb162e665fcee3/html5/thumbnails/18.jpg)
20
Experimental Results Node Classification
![Page 19: Billion-scale Network Embedding with Iterative Random ...4 Challenge: Billion-scale Network Data Social Networks WeChat: 1 billion monthly active users (March, 2018) Facebook: 2 billion](https://reader035.vdocument.in/reader035/viewer/2022071608/6145a12807bb162e665fcee3/html5/thumbnails/19.jpg)
22
Experimental Results Parameter analysis:
Effectiveness of preserving high-order proximity
Scalability
<4 minutes for network with 1 million nodes, 100 million edges with one PC
![Page 20: Billion-scale Network Embedding with Iterative Random ...4 Challenge: Billion-scale Network Data Social Networks WeChat: 1 billion monthly active users (March, 2018) Facebook: 2 billion](https://reader035.vdocument.in/reader035/viewer/2022071608/6145a12807bb162e665fcee3/html5/thumbnails/20.jpg)
23
Experiments on a Billion-scale Network Experimental results on WeChat
250 millions nodes, 4.8 billion edges
Network Reconstruction
Dynamic link prediction
Better results and no error accumulation!
![Page 21: Billion-scale Network Embedding with Iterative Random ...4 Challenge: Billion-scale Network Data Social Networks WeChat: 1 billion monthly active users (March, 2018) Facebook: 2 billion](https://reader035.vdocument.in/reader035/viewer/2022071608/6145a12807bb162e665fcee3/html5/thumbnails/21.jpg)
24
Experimental Results Running time and acceleration ratio
Practical running time for real billion-scale networks
Number of Computing Nodes 4 8 12 16
Running Time(s) 82157 46029 33965 24757
<7 hours!
Support distributed computing
![Page 22: Billion-scale Network Embedding with Iterative Random ...4 Challenge: Billion-scale Network Data Social Networks WeChat: 1 billion monthly active users (March, 2018) Facebook: 2 billion](https://reader035.vdocument.in/reader035/viewer/2022071608/6145a12807bb162e665fcee3/html5/thumbnails/22.jpg)
25
Conclusion RandNE: a billion-scale network embedding method
Based on iterative random projection to preserve high-order proximities
Much more computationally efficient
Distributed algorithm
Handle dynamic networks
Experimental results on moderate-scale networks
At least one order of magnitude faster
Better or comparable performance
Linear scalability
Experiments on WeChat, a real billion-scale network
Better results in network reconstruction and link prediction
No error accumulation
Linear acceleration ratio
![Page 23: Billion-scale Network Embedding with Iterative Random ...4 Challenge: Billion-scale Network Data Social Networks WeChat: 1 billion monthly active users (March, 2018) Facebook: 2 billion](https://reader035.vdocument.in/reader035/viewer/2022071608/6145a12807bb162e665fcee3/html5/thumbnails/23.jpg)
26
Thanks!Ziwei Zhang, Tsinghua University
http://zw-zhang.github.io/
http://nrl.thumedialab.com/