
Deep Learning on Graphs with Graph Convolutional Networks

Thomas Kipf, 22 March 2017

[Figure: GCN architecture: input graph, hidden layers with ReLU activations, output]

joint work with Max Welling (University of Amsterdam)

BDL Workshop @ NIPS 2016 · Conference paper @ ICLR 2017 (+ new paper on arXiv)


The success story of deep learning


Examples: speech data, natural language processing (NLP)

Deep neural nets that exploit:

- translation invariance (weight sharing)
- hierarchical compositionality


Recap: Deep learning on Euclidean data


Euclidean data: grids (e.g. a 2D grid of pixels), sequences (a 1D grid), …


Recap: Deep learning on Euclidean data


We know how to deal with this:

Convolutional neural networks (CNNs)

(Animation by Vincent Dumoulin) (Source: Wikipedia)

or recurrent neural networks (RNNs)

(Source: Christopher Olah’s blog)


Convolutional neural networks (on grids)


(Animation by Vincent Dumoulin)

Single CNN layer with 3x3 filter:

Update for a single pixel:
• Transform neighbors individually
• Add everything up

Full update:

h_i^{(l+1)} = σ( Σ_{j=0}^{8} W_j h_{n_j(i)}^{(l)} )

(n_j(i): the j-th pixel in the 3×3 neighborhood of pixel i; the weights W_j are shared across all positions i)
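Written out in code, a single such layer is just "transform each neighbor with its own shared weight, then add everything up". A minimal numpy sketch of this view (single channel, valid cross-correlation, no framework; all names are illustrative):

```python
import numpy as np

def conv3x3_single_pixel(image, W, i, j):
    """Update one pixel: transform each 3x3 neighbor with its own weight
    (shared across pixel positions), then add everything up."""
    out = 0.0
    for a in (-1, 0, 1):
        for b in (-1, 0, 1):
            out += W[a + 1, b + 1] * image[i + a, j + b]
    return out  # note: cross-correlation, as in most deep learning frameworks

def conv3x3(image, W):
    """Full update: the same per-pixel rule at every interior location."""
    H, Wd = image.shape
    out = np.zeros((H - 2, Wd - 2))
    for i in range(1, H - 1):
        for j in range(1, Wd - 1):
            out[i - 1, j - 1] = conv3x3_single_pixel(image, W, i, j)
    return out

image = np.random.randn(5, 5)
W = np.random.randn(3, 3)
print(conv3x3(image, W).shape)  # (3, 3)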


Graph-structured data


What if our data looks like this? Or this? [Figures: two example graphs]

Real-world examples:
• Social networks
• World-wide web
• Protein-interaction networks
• Telecommunication networks
• Knowledge graphs
• …


Graphs: Definitions


Model wish list:
• Set of trainable parameters
• Trainable in O(|E|) time, i.e. linear in the number of edges
• Applicable even if the input graph changes

Graph: G = (V, E)
  V: set of nodes v_i
  E: set of edges (v_i, v_j)

We can define an adjacency matrix A (N × N, for N nodes):
  A_ij = 1 if there is an edge between v_i and v_j, else A_ij = 0
  (edges can also be weighted: A_ij ∈ ℝ)
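To make the definitions concrete, a small numpy sketch on a toy 4-node graph (the graph itself is an arbitrary example, not one from the slides):

```python
import numpy as np

# Toy undirected graph: 4 nodes, edge set E
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
N = 4

A = np.zeros((N, N))
for i, j in edges:
    A[i, j] = 1.0  # A_ij = 1 if (v_i, v_j) in E
    A[j, i] = 1.0  # undirected graph: A is symmetric

D = np.diag(A.sum(axis=1))  # diagonal degree matrix
print(A)
```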


A naive approach


• Take adjacency matrix A and feature matrix X

• Concatenate them

• Feed them into deep (fully connected) neural net

• Done?

Problems:

• Huge number of parameters

• Needs to be re-trained if number of nodes changes

• Does not generalize across graphs

We need weight sharing!

=> CNNs on graphs or “Graph Convolutional Networks” (GCNs)
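To quantify the first problem above, a back-of-the-envelope sketch (all sizes hypothetical; one reading of the recipe is that each node's input is its adjacency row concatenated with its features):

```python
N, F, H = 1000, 50, 64       # nodes, features per node, hidden units

# Naive input per node: adjacency row (length N) plus features (length F)
naive_in = N + F
params_naive = naive_in * H  # weights in the first fully connected layer alone
print(params_naive)          # 67200, and it grows linearly with N

# If the graph grows to N = 10000 nodes, the input dimension changes,
# so the whole network must be re-trained from scratch.
```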


CNNs on graphs with spatial filters


Consider this undirected graph:

Calculate update for node in red:

How is this related to spectral CNNs on graphs?
➡ Localized 1st-order approximation of spectral filters [Kipf & Welling, ICLR 2017]

Update rule:

h_i^{(l+1)} = σ( h_i^{(l)} W_0^{(l)} + Σ_{j ∈ N_i} (1/c_ij) h_j^{(l)} W_1^{(l)} )

  N_i: neighbor indices of node v_i
  c_ij: normalization constant (per edge)

(related idea was first proposed in Scarselli et al. 2009)
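A direct, loop-based numpy transcription of this update rule. The slides leave c_ij abstract; the sketch picks the symmetric choice c_ij = sqrt(d_i · d_j) as one common option:

```python
import numpy as np

def gcn_node_update(H, A, W0, W1):
    """h_i <- ReLU( h_i W0 + sum_{j in N_i} (1/c_ij) h_j W1 ),
    here with c_ij = sqrt(d_i * d_j) (one common choice)."""
    deg = A.sum(axis=1)
    H_new = H @ W0                       # self-connection term
    for i in range(A.shape[0]):
        for j in range(A.shape[0]):
            if A[i, j] > 0:              # j is a neighbor of i
                c_ij = np.sqrt(deg[i] * deg[j])
                H_new[i] += (1.0 / c_ij) * (H[j] @ W1)
    return np.maximum(H_new, 0)          # ReLU

# 4-node toy graph, 3-dim input features, 2-dim output per node
A = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]], float)
H = np.random.randn(4, 3)
print(gcn_node_update(H, A, np.random.randn(3, 2), np.random.randn(3, 2)).shape)
```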


Vectorized form


H^{(l+1)} = σ( H^{(l)} W_0^{(l)} + Ã H^{(l)} W_1^{(l)} )    with    Ã = D^{-1/2} A D^{-1/2}

Or treat the self-connection in the same way:

H^{(l+1)} = σ( Â H^{(l)} W^{(l)} )    with    Â = D̃^{-1/2} (A + I_N) D̃^{-1/2}

(D̃: diagonal degree matrix of A + I_N)

Â is typically sparse
➡ We can use sparse matrix multiplications!
➡ Efficient implementation in Theano or TensorFlow
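A sketch of the renormalized variant with scipy's sparse matrices, following the formulas above; an illustrative reimplementation, not the released gcn code:

```python
import numpy as np
import scipy.sparse as sp

def normalize_adjacency(A):
    """A_hat = D~^(-1/2) (A + I) D~^(-1/2), kept sparse throughout."""
    A_tilde = A + sp.eye(A.shape[0])              # add self-connections
    d = np.asarray(A_tilde.sum(axis=1)).ravel()   # degrees of A + I (always >= 1)
    D_inv_sqrt = sp.diags(1.0 / np.sqrt(d))
    return (D_inv_sqrt @ A_tilde @ D_inv_sqrt).tocsr()

def gcn_layer(A_hat, H, W):
    """H^(l+1) = ReLU(A_hat H^(l) W^(l)): one sparse-dense matmul per layer."""
    return np.maximum(A_hat @ (H @ W), 0)

# Random sparse graph with N nodes and F input features
N, F, hidden = 100, 16, 8
A = sp.random(N, N, density=0.05, format="csr")
A = ((A + A.T) > 0).astype(float)                 # symmetrize and binarize
H1 = gcn_layer(normalize_adjacency(A), np.random.randn(N, F),
               np.random.randn(F, hidden))
print(H1.shape)  # (100, 8)
```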


Model architecture (with spatial filters)

[Figure: multi-layer GCN: input graph, hidden layers with ReLU activations, output]

Input: feature matrix X, preprocessed adjacency matrix Â

[Kipf & Welling, ICLR 2017] (or more sophisticated filters / basis functions)
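Stacking two such layers with a softmax output gives the model used in the experiments below. A self-contained schematic sketch (an identity matrix stands in for a real preprocessed Â):

```python
import numpy as np
import scipy.sparse as sp

def softmax(X):
    e = np.exp(X - X.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def gcn_forward(A_hat, X, W0, W1):
    """Z = softmax( A_hat ReLU(A_hat X W0) W1 ): per-node class scores."""
    H = np.maximum(A_hat @ (X @ W0), 0)
    return softmax(A_hat @ (H @ W1))

N, F, hidden, classes = 6, 4, 8, 3
A_hat = sp.eye(N, format="csr")   # placeholder for a real preprocessed adjacency
X = np.random.randn(N, F)
Z = gcn_forward(A_hat, X, 0.1 * np.random.randn(F, hidden),
                0.1 * np.random.randn(hidden, classes))
print(Z.shape, Z.sum(axis=1))     # (6, 3); each row sums to 1
```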


What does it do? An example.

f([Karate Club network]) = [2-dim embedding per node]

Forward pass through untrained 3-layer GCN model

Parameters initialized randomly

Produces (useful?) random embeddings!


Connection to Weisfeiler-Lehman algorithm


A “classical” approach for node feature assignment

Useful as graph isomorphism check for most graphs (exception: highly regular graphs)

WL:   h_i^{(t+1)} = hash( Σ_{j ∈ N_i} h_j^{(t)} )

GCN:  h_i^{(l+1)} = σ( Σ_{j ∈ N_i} (1/c_ij) h_j^{(l)} W^{(l)} )

➡ A GCN can be seen as a differentiable, parameterized variant of this relabeling step.
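For comparison, a minimal sketch of this classical algorithm, 1-dim Weisfeiler-Lehman color refinement (Python's built-in hash stands in for the hash function, purely for illustration):

```python
def wl_refine(adj, colors, iterations=3):
    """1-dim Weisfeiler-Lehman: repeatedly hash each node's own color
    together with the sorted multiset of its neighbors' colors."""
    for _ in range(iterations):
        colors = {
            v: hash((colors[v], tuple(sorted(colors[u] for u in adj[v]))))
            for v in adj
        }
    return colors

# Toy graph as an adjacency list
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
colors = {v: 1 for v in adj}   # uniform initial coloring
print(wl_refine(adj, colors))  # structurally equivalent nodes 0 and 1 share a color
```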


Semi-supervised classification on graphs


Setting: some nodes are labeled (black circle); all other nodes are unlabeled

Task: Predict node label of unlabeled nodes

Standard approach: graph-based regularization ("smoothness constraints") [Zhu et al., 2003]:

L = L_0 + λ L_reg,    with    L_reg = Σ_{i,j} A_ij ‖f(X_i) − f(X_j)‖² = f(X)ᵀ Δ f(X)

(Δ = D − A is the graph Laplacian; assumes connected nodes are likely to share the same label)
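The pairwise and quadratic forms of L_reg agree numerically; a quick sanity check on a toy graph (summing each unordered node pair once):

```python
import numpy as np

# Toy graph: adjacency matrix A and graph Laplacian Delta = D - A
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))
L = D - A

f = np.random.randn(4)  # some scalar output f(X_i) per node

# Pairwise smoothness penalty, each unordered node pair counted once
pairwise = sum(A[i, j] * (f[i] - f[j]) ** 2
               for i in range(4) for j in range(i + 1, 4))
quadratic = f @ L @ f   # the Laplacian quadratic form f^T (D - A) f
print(np.isclose(pairwise, quadratic))  # True
```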


Semi-supervised classification on graphs


Embedding-based approaches use a two-step pipeline:

1) Get embedding for every node
2) Train classifier on node embedding

Examples: DeepWalk [Perozzi et al., 2014], node2vec [Grover & Leskovec, 2016]

Problem: Embeddings are not optimized for classification!

Idea: Train graph-based classifier end-to-end using GCN

Evaluate loss on labeled nodes only:

L = − Σ_{l ∈ Y_L} Σ_f Y_lf · ln Z_lf

  Y_L: set of labeled node indices
  Y: label matrix
  Z: GCN output (after softmax)
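A numpy sketch of this masked loss, assuming Y is one-hot and Z is already row-normalized by a softmax:

```python
import numpy as np

def masked_cross_entropy(Z, Y, labeled_idx):
    """L = - sum over labeled nodes l and classes f of Y_lf * ln Z_lf."""
    Zl, Yl = Z[labeled_idx], Y[labeled_idx]
    return -np.sum(Yl * np.log(Zl + 1e-12))  # epsilon for numerical safety

# 5 nodes, 3 classes; only nodes 0 and 3 are labeled
Z = np.full((5, 3), 1.0 / 3.0)               # uniform predictions
Y = np.zeros((5, 3)); Y[0, 1] = 1; Y[3, 2] = 1
print(masked_cross_entropy(Z, Y, [0, 3]))    # = -2 * ln(1/3), about 2.197
```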


Toy example (semi-supervised learning)


Video also available here: http://tkipf.github.io/graph-convolutional-networks


Classification on citation networks


(Figure from: Bronstein, Bruna, LeCun, Szlam, Vandergheynst, 2016)

Input: Citation networks (nodes are papers, edges are citation links, optionally bag-of-words features on nodes)

Target: Paper category (e.g. stat.ML, cs.LG, …)


Experiments and results


Classification results (accuracy) [table not reproduced in this transcript; one baseline variant uses no input features]

Dataset statistics [table not reproduced]

(Kipf & Welling, Semi-Supervised Classification with Graph Convolutional Networks, ICLR 2017)

Model: 2-layer GCN

Learned representations (t-SNE embedding of hidden layer activations)


Other recent work


Methods range from the spectral domain to the spatial domain:

• SyncSpecCNN (Spectral Transformer Nets) [Yi et al., 2016]
• Chebyshev CNN (approximate spectral filters in spatial domain) [Defferrard et al., 2016]
• Mixture model CNN (learned basis functions) [Monti et al., 2016]

Trade-off: global filters are hard to generalize; local filters have a limited field of view.


Conclusions and open questions


Results from the past 1-2 years have shown:
• The deep learning paradigm can be extended to graphs
• State-of-the-art results in a number of domains
• Use end-to-end training instead of multi-stage approaches (graph kernels, DeepWalk, node2vec, etc.)

Open problems:
• Learn basis functions from scratch (first step: [Monti et al., 2016])
• Robustness of filters to changes in graph structure
• Learn representations of graphs (first step: [Duvenaud et al., 2015])
• Theoretical understanding of what is learned
• Common benchmark datasets
• …


Further reading


Blog post on Graph Convolutional Networks: http://tkipf.github.io/graph-convolutional-networks

Code on GitHub: http://github.com/tkipf/gcn

Kipf & Welling, Semi-Supervised Classification with Graph Convolutional Networks, ICLR 2017: https://arxiv.org/abs/1609.02907

Kipf & Welling, Variational Graph Auto-Encoders, NIPS BDL Workshop, 2016: https://arxiv.org/abs/1611.07308

You can get in touch with me via:
• E-Mail: [email protected]
• Twitter: @thomaskipf
• Web: http://tkipf.github.io

Project funded by SAP


How deep is deep enough?


Residual connection: H^{(l+1)} = σ( Â H^{(l)} W^{(l)} ) + H^{(l)}
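A sketch of the idea (assuming the renormalized propagation rule from above): a residual GCN layer adds its input back, so deeper stacks keep an identity path for gradients.

```python
import numpy as np

def residual_gcn_layer(A_hat, H, W):
    """H^(l+1) = ReLU(A_hat H W) + H  (input and output widths must match)."""
    return np.maximum(A_hat @ (H @ W), 0) + H
```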