
Page 1: Learning Representations of Large-Scale Networks (ir.sdu.edu.cn/~zhuminchen/RL/tangjian2017.pdf)

Learning Representations of Large-Scale Networks

Jian Tang

HEC Montréal

Montréal Institute of Learning Algorithms (MILA)

Page 2

Networks

Examples: Social Network, World Wide Web, Internet-of-Things, Road Network

• Ubiquitous in the real world

• A flexible and general data structure

• Many types of data can be formulated as networks

Page 3

[Figure: Continental Airline route map] Source: http://www.airlineroutemaps.com/

Page 4

Graph from Albert-László Barabási’s SIGIR 2009 keynote

Page 5

Gene-Regulatory Network

- Abdollahi et al. Transcriptional network governing the angiogenic switch in human pancreatic cancer. PNAS, vol. 104, no. 31, 2007.

Page 6

Network Mining: Link Prediction

Does she know Richard Gere?

Page 7

Network Mining: Ranking

Co-citation network of Sloan Digital Sky Survey - http://nevac.ischool.drexel.edu/~james/infovis09/FP-tree-visual.html

• Importance of vertices

• Which is the most influential paper?

Page 8

Network Mining: Community Detection

[Figure: coauthor network with communities “Information retrieval”, “Machine learning”, “Data mining”]

• Who tends to work together?

- Q. Mei, D. Cai, D. Zhang, and C. Zhai. Topic Modeling with Network Regularization. WWW 2008.

Page 9

Network Mining: Classification

• d1 is democratic

• d2 is republican

• What can we say about d3 and d4?

- Graph from Jerry Zhu’s tutorial at ICML 2007

Page 10

Network Mining: Many Other Tasks

• Sampling

• Recommendation

• Structure analysis (e.g., structural holes)

• Evolution

• Matching

• Visualization

• …

Page 11

Traditional Representations of Networks

• Suffer from data sparsity

• Suffer from high dimensionality

• Do not facilitate computation

• Do not represent “semantics”

• …

Page 12

Research Question and Challenges (1)

• How to effectively and efficiently represent a large-scale network?

• Challenges:

• Large-scale: millions of nodes and billions of edges

• Heterogeneous: directed/undirected, and binary/weighted

Page 13

Learning Node Representations for Networks

• Network -> node representations -> downstream tasks
  • Node Classification
  • Node Clustering
  • Link Prediction
  • Recommendation
  • …

• E.g., Facebook social network -> user representations (features) -> friend recommendation

• E.g., unstructured text -> word co-occurrence network -> word representation

[Figure: example sentences of unstructured text and the resulting word co-occurrence network over words such as "degree", "network", "edge", "node", "word", "document", "classification", "text", "embedding"]

Page 14

Extremely Low-dimensional Representations: 2D/3D for Visualizing Networks

[Figure: networks and high-dimensional data mapped to a 2D/3D layout; example visualization types: heatmaps, network diagrams, scatter plots]

Page 15

Visualizing Scientific Papers (Tang et al. 2016)


10M Scientific Papers on One Slide

Page 16

In some cases, we are given a collection of small networks…

• A collection of small networks with different structures

• Examples: sentence dependency graphs, molecule networks, ego-networks, …

Page 17

Research Question and Challenges (2)

• How to learn an effective representation for an entire network?

• Challenges:

• The structures of different networks are different

Page 18

Outline

• Part I: Learning Node Representations of Networks

• Related Work: Laplacian Eigenmap, Word2Vec

• LINE, DeepWalk, and Node2Vec

• Extensions

• Part II: Visualizing Networks and High-Dimensional Data

• t-SNE

• LargeVis

• Part III: Learning Representations of Entire Networks

• CNN

• Neural Message Passing

• Part IV: Summary, Challenges & Future Work

Page 20

Problem Definition: Node Embedding

[Figure: networks -> node representations]

• Given a network/graph $G = (V, E, W)$, where $V$ is the set of nodes, $E$ is the set of edges between the nodes, and $W$ is the set of edge weights, the goal of node embedding is to represent each node $i$ with a vector $u_i \in \mathbb{R}^d$ that preserves the structure of the network.

Page 21

Related Work

• Classical graph embedding algorithms
  • MDS, IsoMap, LLE, Laplacian Eigenmap, …
  • Hard to scale up

• Graph factorization (Ahmed et al. 2013)
  • Not specifically designed for network representation
  • Undirected graphs only

• Neural word embeddings (Bengio et al. 2003)
  • Neural language model
  • word2vec (skipgram), paragraph vectors, etc.

Page 22

Laplacian Eigenmap (Belkin and Niyogi, 2003)

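• Intuition: the embeddings of similar nodes should be close to each other

• Objective: $\min_U \frac{1}{2}\sum_{i,j} w_{ij}\,\|u_i - u_j\|^2 = \min_U \operatorname{tr}(U^\top L U)$, subject to $U^\top D U = I$

• Where $U$ stacks the node embeddings $u_i$ as rows, $L = D - W$ is the Laplacian matrix, and $D_{ii} = \sum_j w_{ij}$

• Optimization by finding the eigenvectors with the smallest eigenvalues of the generalized eigenproblem $L u = \lambda D u$ (each eigenvector $u$ gives one embedding dimension)

• Computationally expensive to find the eigenvectors when networks are very large

Mikhail Belkin and Partha Niyogi. Laplacian Eigenmaps for Dimensionality Reduction and Data Representation. Neural Computation, 2003.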
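As a small numerical check of the Laplacian Eigenmap objective, the sketch below builds $L = D - W$ for a toy weighted graph and verifies that $\frac{1}{2}\sum_{i,j} w_{ij}(u_i - u_j)^2 = u^\top L u$ for a one-dimensional embedding. The toy graph, the embedding values, and the function names are illustrative assumptions, not taken from the slides.

```python
def laplacian(W):
    """Return L = D - W for a symmetric weight matrix W (list of lists)."""
    n = len(W)
    L = [[-W[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        L[i][i] += sum(W[i])  # D_ii = sum_j w_ij
    return L


def quadratic_form(L, u):
    """Compute u^T L u for a one-dimensional embedding u."""
    n = len(u)
    return sum(u[i] * L[i][j] * u[j] for i in range(n) for j in range(n))


def pairwise_cost(W, u):
    """Compute 1/2 * sum_{i,j} w_ij * (u_i - u_j)^2."""
    n = len(u)
    return 0.5 * sum(W[i][j] * (u[i] - u[j]) ** 2
                     for i in range(n) for j in range(n))


# Hypothetical 3-node graph: edge 0-1 with weight 2, edge 0-2 with weight 1.
W = [[0, 2, 1], [2, 0, 0], [1, 0, 0]]
u = [0.5, 0.4, -1.0]
print(quadratic_form(laplacian(W), u), pairwise_cost(W, u))
```

Both numbers agree, which is why minimizing the pairwise cost reduces to an eigenvalue problem on $L$.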

Page 23

Skipgram (Mikolov et al. 2013)

• Goal: represent each word $i$ with a vector, trained from a sequence of words

• Distributional hypothesis (John Rupert Firth): You shall know a word by the company it keeps

• Skip-gram: learning word representations by predicting the nearby words within a window

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, Jeffrey Dean. Distributed Representations of Words and Phrases and their Compositionality. NIPS 2013.

Page 24

Skipgram

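• Objective: $\frac{1}{T}\sum_{t=1}^{T}\sum_{-c \le j \le c,\, j \ne 0} \log p(w_{t+j} \mid w_t)$, with $p(w_O \mid w_I) = \frac{\exp({v'_{w_O}}^\top v_{w_I})}{\sum_{w=1}^{W}\exp({v'_w}^\top v_{w_I})}$

• Where $c$ is the window size, and $v_w$ and $v'_w$ are the input and output vectors of word $w$

• Direct optimization is computationally expensive due to the softmax function

• Negative sampling: $\log\sigma({v'_{w_O}}^\top v_{w_I}) + \sum_{k=1}^{K}\mathbb{E}_{w_k \sim P_n(w)}\big[\log\sigma(-{v'_{w_k}}^\top v_{w_I})\big]$

• Where $P_n(w)$ is a noise distribution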
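To make the negative-sampling objective concrete, here is a minimal sketch that evaluates the loss (the negated objective) for one center word, one observed context word, and a few sampled noise words. The vectors, their dimensionality, and the function names are made-up assumptions for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def neg_sampling_loss(v_in, v_out_pos, v_out_negs):
    """Negative of: log sigma(v'_O . v_I) + sum_k log sigma(-v'_k . v_I)."""
    loss = -math.log(sigmoid(dot(v_out_pos, v_in)))
    for v_neg in v_out_negs:
        loss -= math.log(sigmoid(-dot(v_neg, v_in)))
    return loss


# Hypothetical 2-dimensional vectors.
v_center = [0.2, -0.1]                 # input vector of the center word
v_context = [0.3, 0.4]                 # output vector of an observed context word
v_noise = [[-0.5, 0.1], [0.0, 0.7]]    # output vectors of K = 2 noise words
loss = neg_sampling_loss(v_center, v_context, v_noise)
print(loss)
```

Each term only touches the vectors of the sampled words, so the cost per training pair depends on $K$ rather than on the vocabulary size.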

Page 25

LINE: Large-scale Information Network Embedding (Tang et al., Most Cited Paper of WWW 2015)

• Arbitrary types of networks
  • Directed, undirected, and/or weighted

• Clear objective function
  • Preserve the first-order and second-order proximity

• Scalable
  • Asynchronous stochastic gradient descent
  • Millions of nodes and billions of edges: a couple of hours on a single machine

Jian Tang, Meng Qu, Mingzhe Wang, Jun Yan, Ming Zhang and Qiaozhu Mei. LINE: Large-scale Information Network Embedding. WWW’15

Page 26

First-order Proximity

• The local pairwise proximity between the nodes
  • However, many links between the nodes are not observed
  • Not sufficient for preserving the entire network structure

Page 27

Second-order Proximity


• Proximity between the neighborhood structures of the nodes

“The degree of overlap of two people’s friendship networks correlates with the strength of ties between them” --Mark Granovetter

“You shall know a word by the company it keeps” --John Rupert Firth

Page 28

Preserving the First-order Proximity (LINE 1st)

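• Distributions (defined on the undirected edge $i - j$)

• Empirical distribution of first-order proximity: $\hat{p}_1(v_i, v_j) = \frac{w_{ij}}{\sum_{(m,n)\in E} w_{mn}}$

• Model distribution of first-order proximity: $p_1(v_i, v_j) = \frac{1}{1 + \exp(-u_i^\top u_j)}$, where $u_i$ is the embedding of node $i$

• Objective: $O_1 = \mathrm{KL}(\hat{p}_1, p_1) = -\sum_{(i,j)\in E} w_{ij}\log p_1(v_i, v_j)$ (omitting constants)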

Page 29

Preserving the Second-order Proximity (LINE 2nd)

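• Distributions (defined on the directed edge $i \to j$)

• Empirical distribution of neighborhood structure: $\hat{p}_2(v_j \mid v_i) = \frac{w_{ij}}{\sum_{k\in V} w_{ik}}$

• Model distribution of neighborhood structure: $p_2(v_j \mid v_i) = \frac{\exp({u'_j}^\top u_i)}{\sum_{k\in V}\exp({u'_k}^\top u_i)}$, where $u'_j$ is the embedding of node $j$ when it acts as a context

• Objective: $O_2 = \sum_i \mathrm{KL}(\hat{p}_2(\cdot \mid v_i), p_2(\cdot \mid v_i)) = -\sum_{(i,j)\in E} w_{ij}\log p_2(v_j \mid v_i)$ (weighting each node $i$ by its degree and omitting constants)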
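The two LINE objectives can be evaluated directly on a toy graph. The sketch below assumes the sigmoid form for the first-order model and a full softmax over context vectors for the second-order model; the edge list, the embeddings `emb` ($u_i$), the context vectors `ctx` ($u'_i$), and all function names are hypothetical illustration values.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def p1(u_i, u_j):
    """First-order model probability: sigmoid of the inner product."""
    return 1.0 / (1.0 + math.exp(-dot(u_i, u_j)))


def p2(j, i, emb, ctx):
    """Second-order model probability p2(v_j | v_i): softmax over all context vectors."""
    scores = [math.exp(dot(c, emb[i])) for c in ctx]
    return scores[j] / sum(scores)


def objective_first(edges, emb):
    """O1 = - sum_{(i,j) in E} w_ij * log p1(v_i, v_j)."""
    return -sum(w * math.log(p1(emb[i], emb[j])) for i, j, w in edges)


def objective_second(edges, emb, ctx):
    """O2 = - sum_{(i,j) in E} w_ij * log p2(v_j | v_i)."""
    return -sum(w * math.log(p2(j, i, emb, ctx)) for i, j, w in edges)


edges = [(0, 1, 3.0), (1, 2, 1.0)]            # (i, j, w_ij)
emb = [[0.1, 0.2], [0.0, -0.3], [0.4, 0.1]]   # node embeddings u_i
ctx = [[0.2, 0.0], [-0.1, 0.3], [0.3, 0.3]]   # context embeddings u'_i
print(objective_first(edges, emb), objective_second(edges, emb, ctx))
```

The full softmax in `p2` costs $O(|V|)$ per edge, which is what motivates negative sampling in practice.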

Page 30

Optimization Tricks

• Stochastic gradient descent + negative sampling
  • Randomly sample an edge and multiple negative edges

• The gradient w.r.t. the embedding for a sampled edge $(i, j)$: $\frac{\partial O_2}{\partial u_i} = -w_{ij}\cdot\frac{\partial \log p_2(v_j \mid v_i)}{\partial u_i}$

• Problematic when the variance of the edge weights is large
  • The variance of the gradients is large

• Solution: edge sampling
  • Sample the edges according to their weights and treat the sampled edges as binary

• Complexity: $O(dK|E|)$
  • Linear in the dimensionality $d$, the number of negative samples $K$, and the number of edges $|E|$
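Edge sampling can be sketched in a few lines: edges are drawn with probability proportional to their weights and then handed to the optimizer as unweighted pairs, so every gradient step has the same scale. Here `random.choices` stands in for a faster sampler; the edge list, seed, and function name are assumptions for illustration.

```python
import random
from collections import Counter


def sample_edges(edges, num_samples, seed=0):
    """Draw edges with probability proportional to weight; return them as binary edges."""
    rng = random.Random(seed)
    weights = [w for _, _, w in edges]
    picked = rng.choices(edges, weights=weights, k=num_samples)
    return [(i, j) for i, j, _ in picked]  # the weight is dropped after sampling


# Hypothetical graph with very different edge weights.
edges = [(0, 1, 9.0), (1, 2, 1.0)]
counts = Counter(sample_edges(edges, 10000))
print(counts)
```

The heavy edge is visited about nine times as often as the light one, which replaces multiplying its gradient by nine.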

Page 31

Discussion

• Embed nodes with few neighbors
  • Expand the neighbors by adding higher-order neighbors
  • Breadth-first search (BFS)
  • Adding only second-order neighbors works well in most cases

• Embed new nodes
  • Fix the embeddings of existing nodes
  • Optimize the objective w.r.t. the embeddings of new nodes

Page 32

DeepWalk (Perozzi et al. 2014)


• Learning node representations with the technique for learning word representations, i.e., Skipgram

• Treat random walks on networks as sentences

Random walk generation (generate node contexts through random search)

Predict the nearby nodes in the random walks

Bryan Perozzi, Rami Al-Rfou, Steven Skiena. DeepWalk: Online Learning of Social Representations. KDD’14
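The two DeepWalk steps, generating random walks and turning them into Skip-gram training pairs, can be sketched as follows. The toy adjacency list, the walk length, the window size, and the function names are illustrative assumptions; the walk here is uniform over neighbors and ignores edge weights.

```python
import random


def random_walk(adj, start, length, rng):
    """Uniform random walk of at most `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        neighbors = adj[walk[-1]]
        if not neighbors:
            break
        walk.append(rng.choice(neighbors))
    return walk


def context_pairs(walk, window):
    """(center, context) pairs within `window` steps, as Skip-gram would consume them."""
    pairs = []
    for pos, center in enumerate(walk):
        lo, hi = max(0, pos - window), min(len(walk), pos + window + 1)
        pairs.extend((center, walk[k]) for k in range(lo, hi) if k != pos)
    return pairs


# Hypothetical undirected graph given as adjacency lists.
adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
rng = random.Random(0)
walks = [random_walk(adj, node, 5, rng) for node in adj]
pairs = [p for w in walks for p in context_pairs(w, window=2)]
print(walks[0], len(pairs))
```

The walks play the role of sentences and the nodes the role of words, so any Skip-gram trainer can be applied to `pairs` unchanged.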

Page 33

DeepWalk (Perozzi et al. 2014)

• Optimization: hierarchical softmax (Morin and Bengio, 2005)
  • Assign the nodes to the leaves of a binary tree
  • Predict the node => predict a path in the tree
  • Make binary decisions along the path
  • Complexity per prediction reduced from $O(|V|)$ to $O(\log|V|)$

[Figure: predicting the nearby nodes in the random walks (v1 -> v3, v1 -> v5) with hierarchical softmax]
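A minimal sketch of hierarchical softmax, assuming a complete binary tree stored in heap layout with one vector per inner node: the probability of a leaf is the product of the binary decisions on its root-to-leaf path. The tree layout, the vectors, and the function names are assumptions for illustration, not DeepWalk's actual implementation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def leaf_probability(leaf, h, inner_vecs, num_leaves):
    """P(leaf | h) as a product of binary decisions along the root-to-leaf path.

    Heap layout: inner nodes are 1..num_leaves-1, leaves are
    num_leaves..2*num_leaves-1 (num_leaves must be a power of two).
    """
    node = num_leaves + leaf
    prob = 1.0
    while node > 1:
        parent = node // 2
        score = sum(a * b for a, b in zip(inner_vecs[parent], h))
        # Right child (odd index) uses sigma(score); left child uses 1 - sigma(score).
        prob *= sigmoid(score) if node % 2 else 1.0 - sigmoid(score)
        node = parent
    return prob


num_leaves = 8                                    # |V| = 8 nodes at the leaves
inner_vecs = {k: [0.1 * k, -0.05 * k] for k in range(1, num_leaves)}
h = [0.7, -0.2]                                   # embedding of the center node
probs = [leaf_probability(v, h, inner_vecs, num_leaves) for v in range(num_leaves)]
print(sum(probs))
```

Each prediction touches only $\log_2 8 = 3$ inner nodes, yet the leaf probabilities still sum to one.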

Page 34

Node2Vec (Grover and Leskovec, 2016)

• Find the node context by a hybrid strategy of
  • Breadth-first Sampling (BFS): structural equivalence
  • Depth-first Sampling (DFS): homophily

Aditya Grover and Jure Leskovec. node2vec: Scalable Feature Learning for Networks. KDD’16

Page 35

Expand Node Contexts with Biased Random Walk

• Biased random walk with two parameters $p$ and $q$
  • $p$: controls the probability of revisiting a node in the walk
  • $q$: controls the probability of exploring “outward” nodes
  • Find optimal $p$ and $q$ through cross-validation on labeled data

• Optimized through an objective similar to that of LINE with first-order proximity
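The effect of $p$ and $q$ can be seen by computing the next-step distribution of the biased walk. The sketch assumes an unweighted graph and the usual weighting (1/p for returning to the previous node, 1 for nodes adjacent to it, 1/q for nodes farther away); the toy graph and the function name are illustrative assumptions.

```python
def transition_probs(adj, prev, cur, p, q):
    """Distribution over the next node, given the walk just moved prev -> cur."""
    weights = {}
    for nxt in adj[cur]:
        if nxt == prev:             # distance 0 from prev: return
            weights[nxt] = 1.0 / p
        elif nxt in adj[prev]:      # distance 1 from prev: stay close
            weights[nxt] = 1.0
        else:                       # distance 2 from prev: move outward
            weights[nxt] = 1.0 / q
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}


# Hypothetical graph: triangle 0-1-2 plus a pendant node 3 attached to 1.
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}
probs = transition_probs(adj, prev=0, cur=1, p=4.0, q=0.5)
print(probs)
```

With a large $p$ and a small $q$ the walk rarely backtracks and prefers the outward node 3; swapping the settings keeps it near the start.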

Page 36

Comparison between LINE, DeepWalk, and Node2Vec

Algorithm | Neighbor Expansion | Proximity | Optimization | Labeled Data
LINE | BFS | 1st or 2nd | Negative Sampling | No
DeepWalk | Random | 2nd | Hierarchical Softmax | No
Node2Vec | BFS + DFS | 1st | Negative Sampling | Yes

Page 37

Applications

• Node classification (Perozzi et al. 2014, Tang et al. 2015a, Grover and Leskovec 2016)

• Node visualization (Tang et al. 2015a)

• Link prediction (Grover and Leskovec 2016)

• Recommendation (Zhao et al. 2016)

• Text representation (Tang et al. 2015a, Tang et al. 2015b)

• …

Page 38

Node Classification

• Social network => user representations (features) => classifier

• Community identities as classification labels

Table: Results on Flickr Network (Perozzi et al. 2014)

DeepWalk > Laplacian Eigenmap

Page 39

Node Classification

• Social network => user representations (features) => classifier

• Community identities as classification labels

Table: Results on YouTube Network (Tang et al. 2015a)

LINE(1st + 2nd) > LINE(2nd) > DeepWalk > LINE(1st)

Page 40

Node Visualization (Tang et al. 2015a)

• Coauthor network: authors from three different research fields: “Data mining”, “Machine learning”, “Computer vision”

[Figure: node visualizations produced by (a) Graph factorization, (b) DeepWalk, (c) LINE(2nd)]

Page 41

Link Prediction (Grover and Leskovec, 2016)

Table: Results of Link Prediction

Node Embeddings (LINE, DeepWalk, node2vec) > Jaccard’s Coefficient > Adamic-Adar

Page 42

Unsupervised Text Representation (Tang et al. 2015a)

• Construct text networks from unstructured text

[Figure: unstructured text (example sentences) converted into a word co-occurrence network (words such as "degree", "network", "edge", "node", "word", "document", "classification", "text", "embedding") and a word-document network (words such as "text", "information", "network", "word", "classification" linked to doc_1, doc_2, doc_3, doc_4, …)]
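Constructing the two text networks can be sketched with plain counters: a word co-occurrence network whose edge weight counts how often two words appear within a window, and a word-document network whose edge weight is the term frequency. The tokenized toy corpus, the window size, and the function names are assumptions for illustration.

```python
from collections import Counter


def cooccurrence_network(sentences, window):
    """Undirected weighted edges: how often two words co-occur within `window` tokens."""
    edges = Counter()
    for tokens in sentences:
        for pos, word in enumerate(tokens):
            for other in tokens[pos + 1 : pos + 1 + window]:
                if other != word:
                    edges[tuple(sorted((word, other)))] += 1
    return edges


def word_document_network(sentences):
    """Bipartite weighted edges (word, doc_id): term frequency in the document."""
    edges = Counter()
    for doc_id, tokens in enumerate(sentences):
        for word in tokens:
            edges[(word, doc_id)] += 1
    return edges


# Hypothetical corpus of two tokenized documents.
docs = [["text", "network", "embedding"], ["network", "embedding", "network"]]
ww = cooccurrence_network(docs, window=2)
wd = word_document_network(docs)
print(ww)
print(wd)
```

Adding more text mostly raises the weights of existing edges rather than adding nodes, since the vocabulary grows slowly.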

Page 43

Word Analogy (Tang et al. 2015a)

• Entire Wikipedia articles => word co-occurrence network (~2M words, 1B edges)

• Size of word co-occurrence networks does not grow linearly with data size
  • Only the weights of edges change

Algorithm | Semantic (%) | Syntactic (%) | Overall (%)
GF | 61.38 | 44.08 | 51.93
SkipGram | 69.14 | 57.94 | 63.02
LINE(1st) | 58.08 | 49.42 | 53.35
LINE(2nd) | 73.79 | 59.72 | 66.10

LINE(2nd) > LINE(1st)

LINE(2nd) > SkipGram

Page 44

Text Classification on Long Documents (Tang et al. 2015a)

• Learn the word embeddings from the word co-occurrence network (w-w) or the word-document network (w-d)

• Document embedding as the average of the word embeddings in the document

LINE(w-w) > SkipGram

LINE(w-d) > LINE(w-w)

[Figure: classification accuracy of each method]
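Averaging word embeddings into a document embedding is a one-liner in spirit; the sketch below also skips out-of-vocabulary words, which is an assumption of this example rather than something the slides specify. The word vectors and the function name are hypothetical.

```python
def document_embedding(tokens, word_vecs):
    """Average the vectors of the in-vocabulary words of a document."""
    known = [word_vecs[t] for t in tokens if t in word_vecs]
    if not known:
        return None  # no known word: no embedding can be formed
    dim = len(known[0])
    return [sum(v[d] for v in known) / len(known) for d in range(dim)]


# Hypothetical 2-dimensional word embeddings.
word_vecs = {"graph": [1.0, 0.0], "node": [0.0, 1.0], "edge": [1.0, 1.0]}
doc_vec = document_embedding(["graph", "node", "edge", "unknown"], word_vecs)
print(doc_vec)
```

The resulting vector can be fed to any standard classifier as the document's feature vector.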

Page 45

Text Classification on Short Documents (Tang et al. 2015a)

• Learn the word embeddings from the word co-occurrence network (w-w) or the word-document network (w-d)

• Document embedding as the average of the word embeddings in the document

LINE(w-w) > SkipGram

LINE(w-w) > LINE(w-d)

[Figure: classification accuracy of each method]

Page 46

Extensions

• Other variants

• Multi-view networks

• Networks with node attributes

• Heterogeneous networks

• Task-specific network embedding

Page 48

Other Variants

• Leverage global structural information (Cao et al. 2015)

• Non-linear methods based on autoencoders (Wang et al. 2016)

• Directed network embedding (Ou et al. 2016)

• Signed network embedding (Wang et al. 2017)

• Shaosheng Cao, Wei Lu, and Qiongkai Xu. GraRep: Learning graph representations with global structural information. CIKM 2015.

• Mingdong Ou, Peng Cui, Jian Pei, and Wenwu Zhu. Asymmetric transitivity preserving graph embedding. KDD 2016.

• Daixin Wang, Peng Cui, and Wenwu Zhu. Structural deep network embedding. KDD 2016.

• Suhang Wang, Jiliang Tang, Charu Aggarwal, Yi Chang, and Huan Liu. Signed network embedding in social media. SDM 2017.

Page 49

Extensions

• Other variants

• Multi-view networks

• Networks with node attributes

• Heterogeneous networks

• Task-specific network embedding

Page 50

Multi-view Network Embedding (Qu and Tang et al. 2017)

• Multiple types of relationships between nodes exist in real-world networks
  • E.g., following and retweeting relationships between users on Twitter

• Each type of relationship => a view of the network

• Multiple types of relationships => multi-view networks

• Infer robust node representations with multiple views
  • Complementary information in different views

Figure: Networks with multiple views

Meng Qu, Jian Tang, Jingbo Shang, Xiang Ren, Ming Zhang, and Jiawei Han. Learning Distributed Node Representations for Networks with Multiple Views. To appear in CIKM 2017.

Page 51

A Co-Regularization Approach (Qu and Tang et al. 2017)

• Each node has a robust representation and multiple view-specific representations

• Preserve the structure of different views through view-specific representations

• Promote the collaboration of different views to vote for robust representations
  • Regularize the view-specific representations

Page 52

A Co-Regularization Approach (Qu and Tang et al. 2017)

• Objective: view-specific objective + regularization objective

• View-specific objective: preserves the structure of each view $k$ through the view-specific embeddings $x_i^k$

• Regularization objective: $\sum_i \sum_k \lambda_i^k \,\|x_i^k - x_i\|_2^2$

• $x_i$: robust embedding of node $i$

• $x_i^k$: view-specific node embedding of node $i$

• $\lambda_i^k$: weights of views of node $i$

Page 53

Learning the Weights of the Views via Neural Attention (Qu and Tang et al. 2017)

• Define the attention weight of views for each node: $\lambda_i^k = \frac{\exp(z_k^\top x_i^C)}{\sum_{k'} \exp(z_{k'}^\top x_i^C)}$

• $x_i^C$: concatenation of view-specific embeddings of node $i$

• $z_k$: embedding of view $k$

• According to the regularization term: $x_i = \sum_k \lambda_i^k x_i^k$

• Learning the weights with supervised data, e.g., node classification

Page 54

Results of Multi-label Node Classification

MVE > MVE-NoAttn (without learning the weights of views) > LINE/node2vec (single best view)

Page 55

Extensions

• Other variants

• Multi-view networks

• Networks with node attributes

• Heterogeneous networks

• Task-specific network embedding

Page 56

Networks with Node Attributes (Yang et al. 2015, Kipf and Welling 2016, Liao et al. 2017)

• Networks with text information (Yang et al. 2015)

• Networks with attributes (Liao et al. 2017)
  • Gender, location, text, …

• Variational graph autoencoders (Kipf and Welling 2016)
  • Encode the node with neighborhood structures and attributes
  • Decode the neighborhood structures

• Cheng Yang, Zhiyuan Liu, Deli Zhao, Maosong Sun, and Edward Y. Chang. Network representation learning with rich text information. IJCAI 2015.

• Thomas N. Kipf and Max Welling. Variational Graph Auto-Encoders. NIPS Workshop 2016.

• Lizi Liao, Xiangnan He, Hanwang Zhang, and Tat-Seng Chua. Attributed Social Network Embedding. arXiv, 2017.

Page 57

Extensions

• Other variants

• Multi-view networks

• Networks with node attributes

• Heterogeneous networks

• Task-specific network embedding

Page 58

Heterogeneous Network Embedding via Deep Architectures (Chang et al. 2015)

• Heterogeneous networks of images and text

• Make the embeddings of linked objects close to each other

• image-image, image-text, text-text

Shiyu Chang, Wei Han, Jiliang Tang, Guo-Jun Qi, Charu C. Aggarwal, and Thomas S. Huang. Heterogeneous Network Embedding via Deep Architectures. KDD’15.

Page 59

Heterogeneous Star Network Embedding (Chen et al. 2017)

• Heterogeneous Star Networks

• Paper, keywords, authors, venues

• Aims to embed the center objects

• paper

Ting Chen and Yizhou Sun. Task-Guided and Path-Augmented Heterogeneous Network Embedding for Author Identification. WSDM’17.

Page 60

Extensions

• Other variants

• Multi-view networks

• Networks with node attributes

• Heterogeneous networks

• Task-specific network embedding

Page 61

Semi-supervised Text Representation (Tang et al. 2015b)

• Heterogeneous text network
  • Word-word, word-document, and word-label networks
  • Different levels of word co-occurrences: local context-level, document-level, label-level

• Learning word embeddings through jointly training the heterogeneous networks

• Document embeddings as the average of word embeddings

[Figure: unstructured text, partly labeled (documents marked "label" or "null"), converted into three networks: a word co-occurrence network, a word-document network (words linked to doc_1, doc_2, doc_3, doc_4, …), and a word-label network (words linked to label_1, label_2, label_3, …)]

Jian Tang, Meng Qu, and Qiaozhu Mei. PTE: Predictive Text Embedding through Large-scale Heterogeneous Text Networks. KDD’15.

Page 62

Results on Text Classification of Long Documents

Type | Algorithm | 20newsgroup Micro-F1 | 20newsgroup Macro-F1 | Wikipedia Micro-F1 | Wikipedia Macro-F1 | IMDB Micro-F1 | IMDB Macro-F1
Unsupervised | LINE(G_wd) | 79.73 | 78.40 | 80.14 | 80.13 | 89.14 | 89.14
Predictive embedding | CNN | 80.15 | 79.43 | 79.25 | 79.32 | 89.00 | 89.00
Predictive embedding | PTE(G_wl) | 82.70 | 81.97 | 79.00 | 79.02 | 85.98 | 85.98
Predictive embedding | PTE(G_ww+G_wl) | 83.90 | 83.11 | 81.65 | 81.62 | 89.14 | 89.14
Predictive embedding | PTE(G_wd+G_wl) | 84.39 | 83.64 | 82.29 | 82.27 | 89.76 | 89.76
Predictive embedding | PTE(joint) | 84.20 | 83.39 | 82.51 | 82.49 | 89.80 | 89.80

PTE > CNN

Page 63

Results on Text Classification of Short Documents

Type | Algorithm | 20newsgroup Micro-F1 | 20newsgroup Macro-F1 | Wikipedia Micro-F1 | Wikipedia Macro-F1 | IMDB Micro-F1 | IMDB Macro-F1
Unsupervised | LINE(G_ww) | 74.22 | 70.12 | 71.13 | 71.12 | 73.84 | 73.84
Predictive embedding | CNN | 76.16 | 73.08 | 72.71 | 72.69 | 75.97 | 75.96
Predictive embedding | PTE(G_wl) | 76.45 | 72.74 | 73.44 | 73.42 | 73.92 | 73.91
Predictive embedding | PTE(G_ww+G_wl) | 76.80 | 73.28 | 72.93 | 72.92 | 74.93 | 74.92
Predictive embedding | PTE(G_wd+G_wl) | 77.46 | 74.03 | 73.13 | 73.11 | 75.61 | 75.61
Predictive embedding | PTE(joint) | 77.15 | 73.61 | 73.58 | 73.57 | 75.21 | 75.21

PTE ≈ CNN

Page 64

Semi-supervised Classification with Graph Convolutional Networks (Kipf et al. 2017)

• Task: given a graph G = (V, E), the node features $X \in \mathbb{R}^{N \times D}$, and the labels of a subset of the nodes, predict the labels of the remaining nodes.

• Learning the node representations through multi-layer graph convolutional networks

• Combining the representation of each node (self-link) with the representations of its neighbors

Thomas N. Kipf and Max Welling. Semi-Supervised Classification with Graph Convolutional Networks. ICLR’17.


Multi-layer Graph Convolution Neural Networks

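• Starting from the node features: $H^{(0)} = X$

• Define the propagation rule: $H^{(l+1)} = \sigma\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)}\right)$

• Add the self-links: $\tilde{A} = A + I_N$

• Normalize the matrix: $\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$, with $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$

• Nonlinear propagation: $\sigma(\cdot)$, e.g., ReLU

• Final objective: cross-entropy over the labeled nodes $\mathcal{Y}_L$, $\mathcal{L} = -\sum_{l \in \mathcal{Y}_L} \sum_{f=1}^{F} Y_{lf} \ln Z_{lf}$, where $Z$ is the softmax output of the last layer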
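A minimal NumPy sketch of the propagation rule of Kipf and Welling (2017), $H^{(l+1)} = \sigma(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)})$ with $\tilde{A} = A + I$, may help make the steps concrete. The function names, the dense adjacency matrix, and the two-layer ReLU/softmax architecture are assumptions chosen for illustration; this is not the authors' implementation.

```python
import numpy as np

def normalize_adjacency(A):
    """Return D~^{-1/2} (A + I) D~^{-1/2} for a dense adjacency matrix A."""
    A_tilde = A + np.eye(A.shape[0])      # add the self-links
    deg = A_tilde.sum(axis=1)             # degrees including self-links (always >= 1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_forward(A, X, W0, W1):
    """Two-layer GCN (assumed architecture): softmax(A_hat ReLU(A_hat X W0) W1)."""
    A_hat = normalize_adjacency(A)
    H1 = np.maximum(A_hat @ X @ W0, 0.0)            # nonlinear propagation, H(0) = X
    logits = A_hat @ H1 @ W1
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    expv = np.exp(logits)
    return expv / expv.sum(axis=1, keepdims=True)   # row-wise class probabilities

def masked_cross_entropy(Z, labels, labeled_idx):
    """Cross-entropy summed over the labeled nodes only."""
    return float(-np.sum(np.log(Z[labeled_idx, labels[labeled_idx]])))
```

Only the labeled nodes contribute to the loss, but every node's representation is updated because the normalized adjacency matrix mixes information across edges.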


Experimental Results (Kipf et al. 2017)


GCN > Label Propagation


Outline

• Part I: Learning Node Representations of Networks

• Related Work: Laplacian Eigenmap, Word2Vec

• LINE, DeepWalk, and Node2Vec

• Extensions

• Part II: Visualizing Networks and High-Dimensional Data

• t-SNE

• LargeVis

• Part III: Learning Representations of Entire Networks

• CNN

• Neural Message Passing

• Part IV: Summary, Challenges & Future Work


Extremely Low-dimensional Representations: 2D/3D for Visualizing Networks

[Figure: high-dimensional data → K-nearest neighbor graph (KNN-G) construction → graph layout → 2D/3D layout; networks enter directly at the graph layout step. Resulting visual forms: heatmaps, network diagrams, scatter plots.]


t-SNE (van der Maaten and Hinton, 2008; van der Maaten, 2014)

• A state-of-the-art algorithm for high-dimensional data visualization

• Deployed in TensorFlow for visualizing the representations learned by deep neural networks.

[Figures: visualization of MNIST data; TensorBoard visualization by t-SNE]

L.J.P. van der Maaten and G.E. Hinton. Visualizing High-Dimensional Data Using t-SNE. JMLR, 2008.

L.J.P. van der Maaten. Accelerating t-SNE using Tree-Based Algorithms. JMLR, 2014.


Constructing the K-nearest Neighbor Graph

• Finding the nearest neighbors for all the data points

• Vantage-point tree

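• Calculating the similarities between the data points: $p_{j|i} = \dfrac{\exp(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2)}{\sum_{k \in \mathcal{N}_i} \exp(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2)}$ for $j \in \mathcal{N}_i$ (0 otherwise), and $p_{ij} = \dfrac{p_{j|i} + p_{i|j}}{2N}$

• $\mathcal{N}_i$: nearest neighbors of node i

• Complexity: O(N log N) w.r.t. the number of data points N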


K-nearest Neighbor Graph Layout

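• Similarity between two data points i and j in the low-dimensional space is defined as: $q_{ij} = \dfrac{(1 + \lVert y_i - y_j \rVert^2)^{-1}}{\sum_{k \neq l} (1 + \lVert y_k - y_l \rVert^2)^{-1}}$

• $y_i$: low-dimensional representation (coordinates) of node i

• Objective: minimize the divergence between the similarities defined in the high-dimensional space and those in the low-dimensional space, $C = KL(P \,\|\, Q) = \sum_{i \neq j} p_{ij} \log \dfrac{p_{ij}}{q_{ij}}$

• Complexity: O(N log N) (van der Maaten, 2014).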
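As a rough illustration of the layout objective, the sketch below computes the Student-t similarities $q_{ij}$ used by t-SNE in the low-dimensional space and the KL divergence between a given similarity matrix P and Q. The function names and the small `eps` guard are assumptions for this example; the computation is the exact O(N^2) form, not the tree-based approximation.

```python
import numpy as np

def low_dim_similarities(Y):
    """q_ij proportional to (1 + ||y_i - y_j||^2)^-1, normalized over all pairs i != j."""
    sq_dist = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    W = 1.0 / (1.0 + sq_dist)
    np.fill_diagonal(W, 0.0)          # a point is not its own neighbor
    return W / W.sum()

def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q) summed over the entries where p_ij > 0."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / (Q[mask] + eps))))
```

A layout algorithm would move the coordinates Y to decrease this divergence; the pairwise matrix built here is exactly what makes the naive computation quadratic in N.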


Limitations of t-SNE

• K-NNG construction: complexity grows as O(N log N) in the number of data points N

• Graph layout: complexity is O(N log N)

• Very sensitive parameters


LargeVis (Tang et al., Best Paper Nomination at WWW 2016)

• Efficient approximation of K-NNG construction

• 30 times faster than t-SNE (3 million data points)

• Better time-accuracy tradeoff

• Efficient probabilistic model for graph layout

• O(N log N) -> O(N)

• 7 times faster than t-SNE (3 million data points)

• Better visualization layouts

• Stable parameters across different data sets

Jian Tang, Jingzhou Liu, Ming Zhang, and Qiaozhu Mei. Visualizing Large-scale and High-dimensional Data. WWW’16.


Random Projection Trees

• Partition the whole space into different regions with multiple hyperplanes


K-NNG Construction

• Search for nearest neighbors by traversing the trees

• Only data points in the same leaf are considered as candidate nearest neighbors

• Multiple trees are usually used to improve the accuracy

• e.g., hundreds
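The tree-based search above can be sketched as follows. This is a simplified illustration, not the LargeVis implementation: the split rule (a hyperplane equidistant from two randomly chosen points), the function names, and the default values of `n_trees` and `leaf_size` are all assumptions.

```python
import numpy as np

def build_rp_tree(X, indices, leaf_size, rng):
    """Recursively split `indices` with random hyperplanes; return the list of leaves."""
    if len(indices) <= leaf_size:
        return [indices]
    a, b = rng.choice(indices, size=2, replace=False)
    normal = X[a] - X[b]                    # hyperplane equidistant from a and b
    midpoint = (X[a] + X[b]) / 2.0
    side = (X[indices] - midpoint) @ normal > 0
    left, right = indices[side], indices[~side]
    if len(left) == 0 or len(right) == 0:   # degenerate split, e.g. duplicate points
        return [indices]
    return (build_rp_tree(X, left, leaf_size, rng)
            + build_rp_tree(X, right, leaf_size, rng))

def knn_from_trees(X, k, n_trees=5, leaf_size=20, seed=0):
    """Approximate K-NNG: candidates are the points sharing a leaf in any tree."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    candidates = [set() for _ in range(n)]
    for _ in range(n_trees):
        for leaf in build_rp_tree(X, np.arange(n), leaf_size, rng):
            for i in leaf:
                candidates[i].update(int(j) for j in leaf if j != i)
    knn = []
    for i in range(n):
        cand = np.array(sorted(candidates[i]), dtype=int)
        dist = np.linalg.norm(X[cand] - X[i], axis=1)
        knn.append(cand[np.argsort(dist)[:k]])   # keep the k closest candidates
    return knn
```

Each additional tree enlarges the candidate sets and so raises accuracy, at the price of more distance computations, which is the tradeoff the slide refers to.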


Reduce the Number of Trees

• Construct a less accurate K-NNG with a few trees

• Iteratively refine the K-NNG through “neighbor exploring”

• “A neighbor of my neighbor is also likely to be my neighbor”

• Second-order neighbors are also treated as candidates for first-order neighbors
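The neighbor-exploring refinement can be sketched in a few lines. The function below assumes an initial (possibly inaccurate) neighbor list per point and repeatedly re-ranks each point's current neighbors together with its neighbors' neighbors; the name `explore_neighbors` and its arguments are hypothetical.

```python
import numpy as np

def explore_neighbors(X, knn, k, n_iters=1):
    """Refine an approximate K-NNG: second-order neighbors become candidates."""
    n = X.shape[0]
    for _ in range(n_iters):
        new_knn = []
        for i in range(n):
            cand = set(int(j) for j in knn[i])           # current first-order neighbors
            for j in knn[i]:
                cand.update(int(l) for l in knn[int(j)])  # neighbors of my neighbors
            cand.discard(i)
            cand = np.array(sorted(cand), dtype=int)
            dist = np.linalg.norm(X[cand] - X[i], axis=1)
            new_knn.append(cand[np.argsort(dist)[:k]])    # keep the k closest
        knn = new_knn
    return knn
```

Because the current neighbors always stay in the candidate set, an iteration can only keep or shrink the distances to the retained neighbors.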


It Works!

• X axis: accuracy of K-NNG

• Y axis: running time (minutes)

• t-SNE: 16 hours (95% accuracy)

• LargeVis: 25 minutes

• >30 times faster than t-SNE

[Figure: running time vs. K-NNG accuracy for LargeVis, t-SNE, and random projection trees]


Learning the Layout of KNN Graph

• Preserve the similarities of the nodes in 2D/3D space

• Represent each node with a 2D/3D vector

• Keep similar data close and dissimilar data far apart

• Probability of observing a binary edge between nodes (i,j): $p(e_{ij} = 1) = f(\lVert y_i - y_j \rVert)$, e.g., $f(x) = \dfrac{1}{1 + x^2}$

• Likelihood of observing a weighted edge between nodes (i,j): $p(e_{ij} = w_{ij}) = p(e_{ij} = 1)^{w_{ij}}$


A Probabilistic Model for Graph Layout

• Objective: $O = \prod_{(i,j) \in E} p(e_{ij} = w_{ij}) \prod_{(i,j) \in \bar{E}} \left(1 - p(e_{ij} = w_{ij})\right)^{\gamma}$

• $\gamma$: a unified weight assigned to the negative edges

• Randomly sample some negative edges

• Optimized through asynchronous stochastic gradient descent

• Time complexity: linear in the number of data points
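To make the role of the negative edges concrete, here is a hedged sketch of a sampled log-likelihood for a layout Y. It assumes the edge probability $p(e_{ij} = 1) = 1 / (1 + \lVert y_i - y_j \rVert^2)$, draws negative nodes uniformly at random (the paper samples them from a degree-based noise distribution), and uses illustrative values for `n_neg` and `gamma`; none of the names come from the LargeVis code.

```python
import numpy as np

def edge_prob(yi, yj):
    """Assumed form p(e_ij = 1) = 1 / (1 + ||y_i - y_j||^2)."""
    return 1.0 / (1.0 + np.sum((yi - yj) ** 2))

def sampled_log_likelihood(Y, edges, weights, n_neg=5, gamma=7.0, seed=0):
    """Log of the objective, with the negative-edge product replaced by samples."""
    rng = np.random.default_rng(seed)
    n = Y.shape[0]
    total = 0.0
    for (i, j), w in zip(edges, weights):
        term = np.log(edge_prob(Y[i], Y[j]))      # observed (positive) edge
        for _ in range(n_neg):
            k = int(rng.integers(n))              # uniform negative sample (assumption)
            if k == i or k == j:
                continue
            # negative edge, weighted by gamma; small constant avoids log(0)
            term += gamma * np.log(1.0 - edge_prob(Y[i], Y[k]) + 1e-12)
        total += w * term
    return float(total)
```

Each edge touches only a constant number of sampled negatives, so one pass costs time proportional to the number of edges, which is linear in N for a K-NNG with fixed K.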


It Works Too!

• Time complexity

• t-SNE: O(N log N)

• LargeVis: O(N)

• On 3 million data points

• t-SNE: 45 hours

• LargeVis: 5.6 hours

• Seven times faster

[Figure: running-time comparison of LargeVis and t-SNE]


Visualization Quality

• Metric: classification accuracy with KNN in the 2D space

• Configuration:

• LargeVis with default parameters

• t-SNE with default and optimal parameters (tuned per data set)

• LargeVis ≈ t-SNE with optimal parameters

• LargeVis >> t-SNE with default parameters

• Parameters of LargeVis are very stable

[Figure: KNN classification accuracy of LargeVis, t-SNE (optimal parameters), and t-SNE (default parameters)]


10M Scientific Papers on One Slide


Computer Science Mathematics


Physics Biology


Computer Science vs. Mathematics


Computer Science vs. Physics


Wikipedia Articles (color: semantic cluster)


LiveJournal Network (color: community)


Computer Science Authors (color: community)


Summary

• LargeVis: a new technique for visualizing networks and high-dimensional data

• A better tool than t-SNE.

• >7 times faster than t-SNE on three million data points


Software

Our release:

• LINE (C++): https://github.com/tangjianpku/LINE (307 stars, released since 2015.3)

• LargeVis (C++ & Python): https://github.com/lferry007/LargeVis (307 stars, released since 2016.7)

Other tools based on our implementation:

• R version in CRAN: https://github.com/elbamos/largeVis

• LargeVis Tutorial: https://jlorince.github.io/viz-tutorial/

• Interactive Visualization: https://github.com/NLeSC/DiVE


Outline

• Part I: Learning Node Representations of Networks

• Laplacian Eigenmap

• Word2Vec

• LINE, DeepWalk, and Node2Vec

• Part II: Visualizing Networks and High-Dimensional Data

• t-SNE

• LargeVis

• Part III: Learning Representations of Entire Networks

• CNN

• Neural Message Passing

• Part IV: Summary, Challenges & Future Work


Learning Representations of Entire (Small) Networks

• A collection of small networks with different structures

• How to represent an entire (small) network?

[Figure: example networks: sentence dependency graph, molecules, ego-networks, …]


Convolutional Neural Networks


PATCHY-SAN [Niepert+ '16]

• Node selection (w = 4 nodes)

• Neighborhood assembly (at least k = 4 nodes)

• Neighborhood normalization (exactly k = 4 nodes)

[Figure: the three steps illustrated on an example graph with nodes a to j; each selected node's neighborhood is assembled and its nodes are ranked 1 to 4]

PATCHY-SAN [Niepert+ '16]

• Each normalized neighborhood is arranged as a nodes (1, 2, 3, 4) × attributes (a1, a2, …, an) array

• Apply CNNs

[Figure: normalized neighborhoods mapped to node-by-attribute arrays that are fed to a CNN]

Neural Message Passing


Structure2vec (Dai et al. 2016)

• Message passing: $\tilde{u}_i = \sigma\left(W_1 x_i + W_2 \sum_{j \in N(i)} \tilde{u}_j + W_3 \sum_{j \in N(i)} x_j\right)$

• $\{W_1, W_2, W_3\}$ are parameters

• $N(i)$ are the neighbors of $i$

• $\sigma$ is an activation function

Structure2vec (Dai et al. 2016)

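• We have embedding vectors $\{\tilde{u}_i\}$

• Message passing: $\tilde{u}_i = \sigma\left(W_1 x_i + W_2 \sum_{j \in N(i)} \tilde{u}_j + W_3 \sum_{j \in N(i)} x_j\right)$

• Represent a graph by $\sum_i \tilde{u}_i$

• Minimize the empirical square loss $\left(y - \theta^{\top} \sigma\left(\sum_i \tilde{u}_i\right)\right)^2$

• $y$ is the graph label

• $\theta$ is a parameter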
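The update and the graph-level prediction above can be written compactly in matrix form. The sketch below is one possible reading under stated assumptions: embeddings start at zero, a fixed number of synchronous rounds is run, and tanh stands in for the unspecified activation σ. The function names are hypothetical.

```python
import numpy as np

def structure2vec_embed(A, X, W1, W2, W3, n_rounds=4):
    """Iterate u_i = sigma(W1 x_i + W2 sum_{j in N(i)} u_j + W3 sum_{j in N(i)} x_j)."""
    U = np.zeros((A.shape[0], W1.shape[0]))     # assumed initialization: all zeros
    for _ in range(n_rounds):
        # A @ U and A @ X sum embeddings and features over each node's neighbors
        U = np.tanh(X @ W1.T + (A @ U) @ W2.T + (A @ X) @ W3.T)
    return U

def predict(A, X, W1, W2, W3, theta, n_rounds=4):
    """Graph-level prediction theta^T sigma(sum_i u_i)."""
    U = structure2vec_embed(A, X, W1, W2, W3, n_rounds)
    return float(theta @ np.tanh(U.sum(axis=0)))

def square_loss(y, y_hat):
    """Empirical square loss for one labeled graph."""
    return (y - y_hat) ** 2
```

Because the graph is represented by a sum over nodes, relabeling the nodes leaves the prediction unchanged.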


Neural Message Passing Framework (Gilmer et al. 2017)

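• Message passing phase: $m_v^{t+1} = \sum_{w \in N(v)} M_t(h_v^t, h_w^t, e_{vw})$, $h_v^{t+1} = U_t(h_v^t, m_v^{t+1})$

• Readout phase: $\hat{y} = R(\{h_v^T \mid v \in G\})$

• $M_t$: message function

• $U_t$: vertex update function

• $R$: readout function

[Figure: a vertex v receiving messages from its neighbors w1, w2, w3, w4]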
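A small sketch of the framework of Gilmer et al. (2017) with pluggable functions may clarify how the two phases fit together. The dictionary-based graph encoding, the helper names, and the three example functions (a scalar edge weight scaling the neighbor state, a tanh update, and a sum readout) are assumptions for illustration only.

```python
import numpy as np

def mpnn_forward(adj, h0, edge_feat, message_fn, update_fn, readout_fn, T=3):
    """Run T rounds of message passing, then read out a graph-level vector.

    adj: dict mapping each vertex v to a list of neighbors w
    h0: dict mapping each vertex to its initial state (NumPy vector)
    edge_feat: dict mapping (v, w) to the feature of edge v-w
    """
    h = dict(h0)
    for _ in range(T):
        # message passing phase: m_v = sum over neighbors w of M(h_v, h_w, e_vw)
        m = {v: sum((message_fn(h[v], h[w], edge_feat[(v, w)]) for w in adj[v]),
                    np.zeros_like(h[v]))
             for v in adj}
        # vertex update: h_v <- U(h_v, m_v), computed from the old states
        h = {v: update_fn(h[v], m[v]) for v in adj}
    # readout phase: R({h_v}) must not depend on the vertex order
    return readout_fn(list(h.values()))

def example_message(h_v, h_w, e_vw):
    return e_vw * h_w                  # scalar edge weight scales the neighbor state

def example_update(h_v, m_v):
    return np.tanh(h_v + m_v)

def example_readout(states):
    return np.sum(np.stack(states), axis=0)   # sum is permutation invariant
```

Swapping in a learned edge-type matrix for `example_message` and a GRU for `example_update` would give the shape of the gated graph neural network described next; the loop itself would not change.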


Gated Graph Neural Networks (Li et al. 2016)

• Message function: $M_t(h_v^t, h_w^t, e_{vw}) = A_{e_{vw}} h_w^t$

• $A_{e_{vw}}$: a learned matrix for each type of edge (discrete type)

• Vertex updating function: $U_t(h_v^t, m_v^{t+1}) = \mathrm{GRU}(h_v^t, m_v^{t+1})$

• Readout function: $R = \sum_{v \in V} \sigma\left(i(h_v^T, h_v^0)\right) \odot j(h_v^T)$

• $i$ and $j$: neural networks

Neural Message Passing (Gilmer et al. 2017)

• Message function: $M(h_v, h_w, e_{vw}) = A(e_{vw}) h_w$

• $A(\cdot)$: a neural network that maps an edge vector (continuous) to a matrix

• Vertex updating function: the same GRU update as in Gated Graph Neural Networks

• Readout function: the gated sum used in Gated Graph Neural Networks, or Set2Set (Vinyals et al. 2015)

• Both readouts are permutation invariant

Neural Message Passing (Gilmer et al. 2017)

• Add virtual edges

• Add a specific type of edge between nodes without edges

• Add a master node

• Connect to every node

Table: Results on the task of predicting chemical properties (QM9 dataset)

Summary

• Convolutional Neural Network

• Define “receptive field” on graphs

• Neural message passing

• Define message passing, node updating, and graph pooling functions


Outline

• Part I: Learning Node Representations of Networks

• Laplacian Eigenmap

• Word2Vec

• LINE, DeepWalk, and Node2Vec

• Part II: Visualizing Networks and High-Dimensional Data

• t-SNE

• LargeVis

• Part III: Learning Representations of Entire Networks

• CNN

• Neural message passing

• Part IV: Summary, Challenges & Future Work


Summary

• Network representation: a new methodology for analyzing and mining networks

• State-of-the-art approaches for node representation learning

• LINE, DeepWalk, and Node2Vec

• Moving towards task-specific node representations (e.g., PTE and GraphConv)

• Visualizing large-scale networks and high-dimensional data

• LargeVis

• Scales up to tens of millions of nodes or data points

• Learning representations of network substructures

• CNN, neural message passing


Challenges & Future Work

• Scalability

• How to scale up to networks with billions of nodes

• Hierarchical representations

• How to learn hierarchical representations of networks

• Dynamic

• Heterogeneous networks

• Multiple types of nodes, multiple types of edges

• Learning isomorphism-invariant representations of entire networks

• …

References

### Node Embeddings ###

[Belkin et al. 2003] Mikhail Belkin and Partha Niyogi. Laplacian Eigenmaps for Dimensionality Reduction and Data Representation. Neural Computation, 2003.

[Mikolov et al. 2013] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, Jeffrey Dean. Distributed Representations of Words and Phrases and their Compositionality. NIPS 2013.

[Tang et al. 2015a] Jian Tang, Meng Qu, Mingzhe Wang, Jun Yan, and Qiaozhu Mei. LINE: Large-scale Information Network Embedding. WWW’15

[Perozzi et al. 2014] Bryan Perozzi, Rami Al-Rfou, Steven Skiena. DeepWalk: Online Learning of Social Representations. KDD’14

[Grover et al. 2016] Aditya Grover and Jure Leskovec. node2vec: Scalable Feature Learning for Networks. KDD’16

[Cao et al. 2015] Shaosheng Cao, Wei Lu, and Qiongkai Xu. GraRep: learning graph representations with global structural information. CIKM’15.

[Qu et al. 2017] Meng Qu, Jian Tang, Jingbo Shang, Xiang Ren, Ming Zhang, and Jiawei Han. Learning Distributed Node Representations for Networks with Multiple Views.

[Yang et al. 2015] Cheng Yang, Zhiyuan Liu, Deli Zhao, Maosong Sun, Edward Y. Chang. Network representation learning with rich text information. IJCAI 2015.

[Kipf et al. 2016] Thomas N. Kipf and Max Welling. Variational Graph Auto-encoders. NIPS Workshop 2016.

[Liao et al. 2017] Lizi Liao, Xiangnan He, Hanwang Zhang, and Tat-Seng Chua. Attributed Social Network Embedding. arXiv, 2017.

[Tang et al. 2015b] Jian Tang, Meng Qu, and Qiaozhu Mei. PTE: Predictive Text Embedding through Large-scale Heterogeneous Text Networks. KDD’15.

[Kipf et al. 2017] Thomas N. Kipf and Max Welling. Semi-Supervised Classification with Graph Convolutional Networks. ICLR’17.

[Chang et al. 2015] Shiyu Chang, Wei Han, Jiliang Tang, Guo-Jun Qi, Charu C. Aggarwal, Thomas S. Huang. Heterogeneous Network Embedding via Deep Architectures. KDD’15.

[Chen et al. 2017] Ting Chen and Yizhou Sun. Task-Guided and Path-Augmented Heterogeneous Network Embedding for Author Identification. WSDM’17.

[Wang et al. 2016] Daixin Wang, Peng Cui, Wenwu Zhu. Structural Deep Network Embedding. KDD’16.

### Node Visualizations ###

[Maaten et al. 2008] L.J.P. van der Maaten and G.E. Hinton. Visualizing High-Dimensional Data Using t-SNE. JMLR, 2008.

[Maaten et al. 2014] L.J.P. van der Maaten. Accelerating t-SNE using Tree-Based Algorithms. JMLR, 2014.

[Tang et al. 2016] Jian Tang, Jingzhou Liu, Ming Zhang, and Qiaozhu Mei. Visualizing Large-scale and High-dimensional Data. WWW’16

### Graph Embeddings ###

[Li et al. 2016] Cheng Li, Xiaoxiao Guo, and Qiaozhu Mei. 2016. DeepGraph: Graph Structure Predicts Network Growth. arXiv preprint arXiv:1610.06251 (2016).

[Niepert et al. 2016] Mathias Niepert, Mohamed Ahmed, and Konstantin Kutzkov. 2016. Learning convolutional neural networks for graphs. In Proceedings of the 33rd annual international conference on machine learning. ACM.

[Li et al. 2017] Cheng Li, Jiaqi Ma, Xiaoxiao Guo, and Qiaozhu Mei. 2017. DeepCas: an End-to-end Predictor of Information Cascades. In Proceedings of the 26th international conference on World wide web.

[Dai et al. 2016] Dai, Hanjun, Bo Dai, and Le Song. "Discriminative embeddings of latent variable models for structured data." International Conference on Machine Learning. 2016.

[Gilmer et al. 2017] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, G. E. Dahl. Neural message passing for quantum chemistry. arXiv, 2017.

[Li et al. 2016] Y. Li, D. Tarlow, M. Brockschmidt, R. Zemel. Gated graph sequence neural networks. ICLR, 2016.


Optimization

• The gradient w.r.t. the embedding: $\dfrac{\partial C}{\partial y_i} = 4 \sum_{j \neq i} (p_{ij} - q_{ij})\, q_{ij} Z\, (y_i - y_j)$, where $p_{ij}$ and $q_{ij}$ are the similarities in the high-dimensional and low-dimensional spaces

• Z is the partition function: $Z = \sum_{k \neq l} (1 + \lVert y_k - y_l \rVert^2)^{-1}$

• The complexity w.r.t. the number of data points N is O(N^2)

• Too expensive!

Barnes-Hut Approximation

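• Rewriting the gradient as: $\dfrac{\partial C}{\partial y_i} = 4 \left( \sum_{j \neq i} p_{ij} q_{ij} Z\, (y_i - y_j) - \sum_{j \neq i} q_{ij}^2 Z\, (y_i - y_j) \right)$

• Attractive forces (first sum). Complexity: linear in the number of edges, since $p_{ij}$ is nonzero only for K-NNG edges

• Repulsive forces (second sum). Complexity: O(N^2)

• Constructing a quadtree of the nodes according to the current low-dimensional representations

• Sum over node i and the nodes in a sufficiently distant cell: $\sum_{j \in \text{cell}} q_{ij}^2 Z\, (y_i - y_j) \approx N_{\text{cell}}\, q_{i,\text{cell}}^2 Z\, (y_i - y_{\text{cell}})$, where $y_{\text{cell}}$ is the cell's center of mass and $N_{\text{cell}}$ its number of nodes

• From O(N^2) to O(N log N)!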