
Deep Learning on Graphs with Graph Convolutional Networks

Thomas Kipf, 22 March 2017

[Figure: GCN architecture: input graph, hidden layers with ReLU activations, output]

joint work with Max Welling (University of Amsterdam)

BDL Workshop @ NIPS 2016 · Conference paper @ ICLR 2017 (+ new paper on arXiv)


The success story of deep learning


Examples: speech data, natural language processing (NLP)

Deep neural nets that exploit:

- translation invariance (weight sharing)
- hierarchical compositionality


Recap: Deep learning on Euclidean data


Euclidean data: grids (e.g. a 2D grid of pixels), sequences (a 1D grid), …


Recap: Deep learning on Euclidean data


We know how to deal with this:

Convolutional neural networks (CNNs)

(Animation by Vincent Dumoulin) (Source: Wikipedia)

or recurrent neural networks (RNNs)

(Source: Christopher Olah’s blog)


Convolutional neural networks (on grids)


(Animation by Vincent Dumoulin)

Single CNN layer with 3x3 filter:

Update for a single pixel:
• Transform neighbors individually
• Add everything up

Full update:

h_i^{(l+1)} = σ( Σ_{j=0}^{8} W_j h_{n_j(i)}^{(l)} )

(n_j(i): the j-th pixel in the 3×3 neighborhood of pixel i; the weights W_j are shared across all positions i)
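Written out in code, a single such layer is just "transform each neighbor with its own shared weight, then add everything up". A minimal numpy sketch of this view (single channel, valid cross-correlation, no framework; all names are illustrative):

```python
import numpy as np

def conv3x3_single_pixel(image, W, i, j):
    """Update one pixel: transform each 3x3 neighbor with its own weight
    (shared across pixel positions), then add everything up."""
    out = 0.0
    for a in (-1, 0, 1):
        for b in (-1, 0, 1):
            out += W[a + 1, b + 1] * image[i + a, j + b]
    return out  # note: cross-correlation, as in most deep learning frameworks

def conv3x3(image, W):
    """Full update: the same per-pixel rule at every interior location."""
    H, Wd = image.shape
    out = np.zeros((H - 2, Wd - 2))
    for i in range(1, H - 1):
        for j in range(1, Wd - 1):
            out[i - 1, j - 1] = conv3x3_single_pixel(image, W, i, j)
    return out

image = np.random.randn(5, 5)
W = np.random.randn(3, 3)
print(conv3x3(image, W).shape)  # (3, 3)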


Graph-structured data


What if our data looks like this? Or this? [Figures: two example graphs]

Real-world examples:
• Social networks
• World-wide web
• Protein-interaction networks
• Telecommunication networks
• Knowledge graphs
• …


Graphs: Definitions


Model wish list:
• Set of trainable parameters
• Trainable in O(|E|) time, i.e. linear in the number of edges
• Applicable even if the input graph changes

Graph: G = (V, E)
  V: set of nodes v_i
  E: set of edges (v_i, v_j)

We can define an adjacency matrix A (N × N, for N nodes):
  A_ij = 1 if there is an edge between v_i and v_j, else A_ij = 0
  (edges can also be weighted: A_ij ∈ ℝ)
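To make the definitions concrete, a small numpy sketch on a toy 4-node graph (the graph itself is an arbitrary example, not one from the slides):

```python
import numpy as np

# Toy undirected graph: 4 nodes, edge set E
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
N = 4

A = np.zeros((N, N))
for i, j in edges:
    A[i, j] = 1.0  # A_ij = 1 if (v_i, v_j) in E
    A[j, i] = 1.0  # undirected graph: A is symmetric

D = np.diag(A.sum(axis=1))  # diagonal degree matrix
print(A)
```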


A naive approach


• Take adjacency matrix A and feature matrix X

• Concatenate them

• Feed them into deep (fully connected) neural net

• Done?

Problems:

• Huge number of parameters

• Needs to be re-trained if number of nodes changes

• Does not generalize across graphs

We need weight sharing!

=> CNNs on graphs or “Graph Convolutional Networks” (GCNs)
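To quantify the first problem above, a back-of-the-envelope sketch (all sizes hypothetical; one reading of the recipe is that each node's input is its adjacency row concatenated with its features):

```python
N, F, H = 1000, 50, 64       # nodes, features per node, hidden units

# Naive input per node: adjacency row (length N) plus features (length F)
naive_in = N + F
params_naive = naive_in * H  # weights in the first fully connected layer alone
print(params_naive)          # 67200, and it grows linearly with N

# If the graph grows to N = 10000 nodes, the input dimension changes,
# so the whole network must be re-trained from scratch.
```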


CNNs on graphs with spatial filters


Consider this undirected graph:

Calculate update for node in red:

How is this related to spectral CNNs on graphs?
➡ Localized 1st-order approximation of spectral filters [Kipf & Welling, ICLR 2017]

Update rule:

h_i^{(l+1)} = σ( h_i^{(l)} W_0^{(l)} + Σ_{j ∈ N_i} (1/c_ij) h_j^{(l)} W_1^{(l)} )

  N_i: neighbor indices of node v_i
  c_ij: normalization constant (per edge)

(related idea was first proposed in Scarselli et al. 2009)
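A direct, loop-based numpy transcription of this update rule. The slides leave c_ij abstract; the sketch picks the symmetric choice c_ij = sqrt(d_i · d_j) as one common option:

```python
import numpy as np

def gcn_node_update(H, A, W0, W1):
    """h_i <- ReLU( h_i W0 + sum_{j in N_i} (1/c_ij) h_j W1 ),
    here with c_ij = sqrt(d_i * d_j) (one common choice)."""
    deg = A.sum(axis=1)
    H_new = H @ W0                       # self-connection term
    for i in range(A.shape[0]):
        for j in range(A.shape[0]):
            if A[i, j] > 0:              # j is a neighbor of i
                c_ij = np.sqrt(deg[i] * deg[j])
                H_new[i] += (1.0 / c_ij) * (H[j] @ W1)
    return np.maximum(H_new, 0)          # ReLU

# 4-node toy graph, 3-dim input features, 2-dim output per node
A = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]], float)
H = np.random.randn(4, 3)
print(gcn_node_update(H, A, np.random.randn(3, 2), np.random.randn(3, 2)).shape)
```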


Vectorized form


H^{(l+1)} = σ( H^{(l)} W_0^{(l)} + Ã H^{(l)} W_1^{(l)} )    with    Ã = D^{-1/2} A D^{-1/2}

Or treat the self-connection in the same way:

H^{(l+1)} = σ( Â H^{(l)} W^{(l)} )    with    Â = D̃^{-1/2} (A + I_N) D̃^{-1/2}

(D̃: diagonal degree matrix of A + I_N)

Â is typically sparse
➡ We can use sparse matrix multiplications!
➡ Efficient implementation in Theano or TensorFlow
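A sketch of the renormalized variant with scipy's sparse matrices, following the formulas above; an illustrative reimplementation, not the released gcn code:

```python
import numpy as np
import scipy.sparse as sp

def normalize_adjacency(A):
    """A_hat = D~^(-1/2) (A + I) D~^(-1/2), kept sparse throughout."""
    A_tilde = A + sp.eye(A.shape[0])              # add self-connections
    d = np.asarray(A_tilde.sum(axis=1)).ravel()   # degrees of A + I (always >= 1)
    D_inv_sqrt = sp.diags(1.0 / np.sqrt(d))
    return (D_inv_sqrt @ A_tilde @ D_inv_sqrt).tocsr()

def gcn_layer(A_hat, H, W):
    """H^(l+1) = ReLU(A_hat H^(l) W^(l)): one sparse-dense matmul per layer."""
    return np.maximum(A_hat @ (H @ W), 0)

# Random sparse graph with N nodes and F input features
N, F, hidden = 100, 16, 8
A = sp.random(N, N, density=0.05, format="csr")
A = ((A + A.T) > 0).astype(float)                 # symmetrize and binarize
H1 = gcn_layer(normalize_adjacency(A), np.random.randn(N, F),
               np.random.randn(F, hidden))
print(H1.shape)  # (100, 8)
```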


Model architecture (with spatial filters)

[Figure: multi-layer GCN: input graph, hidden layers with ReLU activations, output]

Input: feature matrix X, preprocessed adjacency matrix Â

[Kipf & Welling, ICLR 2017] (or more sophisticated filters / basis functions)
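Stacking two such layers with a softmax output gives the model used in the experiments below. A self-contained schematic sketch (an identity matrix stands in for a real preprocessed Â):

```python
import numpy as np
import scipy.sparse as sp

def softmax(X):
    e = np.exp(X - X.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def gcn_forward(A_hat, X, W0, W1):
    """Z = softmax( A_hat ReLU(A_hat X W0) W1 ): per-node class scores."""
    H = np.maximum(A_hat @ (X @ W0), 0)
    return softmax(A_hat @ (H @ W1))

N, F, hidden, classes = 6, 4, 8, 3
A_hat = sp.eye(N, format="csr")   # placeholder for a real preprocessed adjacency
X = np.random.randn(N, F)
Z = gcn_forward(A_hat, X, 0.1 * np.random.randn(F, hidden),
                0.1 * np.random.randn(hidden, classes))
print(Z.shape, Z.sum(axis=1))     # (6, 3); each row sums to 1
```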


What does it do? An example.

f([Karate Club network]) = [2-dim embedding per node]

Forward pass through untrained 3-layer GCN model

Parameters initialized randomly

Produces (useful?) random embeddings!


Connection to Weisfeiler-Lehman algorithm


A “classical” approach for node feature assignment

Useful as graph isomorphism check for most graphs (exception: highly regular graphs)

WL:   h_i^{(t+1)} = hash( Σ_{j ∈ N_i} h_j^{(t)} )

GCN:  h_i^{(l+1)} = σ( Σ_{j ∈ N_i} (1/c_ij) h_j^{(l)} W^{(l)} )

➡ A GCN can be seen as a differentiable, parameterized variant of this relabeling step.
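For comparison, a minimal sketch of this classical algorithm, 1-dim Weisfeiler-Lehman color refinement (Python's built-in hash stands in for the hash function, purely for illustration):

```python
def wl_refine(adj, colors, iterations=3):
    """1-dim Weisfeiler-Lehman: repeatedly hash each node's own color
    together with the sorted multiset of its neighbors' colors."""
    for _ in range(iterations):
        colors = {
            v: hash((colors[v], tuple(sorted(colors[u] for u in adj[v]))))
            for v in adj
        }
    return colors

# Toy graph as an adjacency list
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
colors = {v: 1 for v in adj}   # uniform initial coloring
print(wl_refine(adj, colors))  # structurally equivalent nodes 0 and 1 share a color
```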


Semi-supervised classification on graphs


Setting: some nodes are labeled (black circle); all other nodes are unlabeled

Task: Predict node label of unlabeled nodes

Standard approach: graph-based regularization ("smoothness constraints") [Zhu et al., 2003]:

L = L_0 + λ L_reg,    with    L_reg = Σ_{i,j} A_ij ‖f(X_i) − f(X_j)‖² = f(X)ᵀ Δ f(X)

(Δ = D − A is the graph Laplacian; assumes connected nodes are likely to share the same label)
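The pairwise and quadratic forms of L_reg agree numerically; a quick sanity check on a toy graph (summing each unordered node pair once):

```python
import numpy as np

# Toy graph: adjacency matrix A and graph Laplacian Delta = D - A
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))
L = D - A

f = np.random.randn(4)  # some scalar output f(X_i) per node

# Pairwise smoothness penalty, each unordered node pair counted once
pairwise = sum(A[i, j] * (f[i] - f[j]) ** 2
               for i in range(4) for j in range(i + 1, 4))
quadratic = f @ L @ f   # the Laplacian quadratic form f^T (D - A) f
print(np.isclose(pairwise, quadratic))  # True
```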


Semi-supervised classification on graphs


Embedding-based approaches use a two-step pipeline:

1) Get embedding for every node
2) Train classifier on node embedding

Examples: DeepWalk [Perozzi et al., 2014], node2vec [Grover & Leskovec, 2016]

Problem: Embeddings are not optimized for classification!

Idea: Train graph-based classifier end-to-end using GCN

Evaluate loss on labeled nodes only:

L = − Σ_{l ∈ Y_L} Σ_f Y_lf · ln Z_lf

  Y_L: set of labeled node indices
  Y: label matrix
  Z: GCN output (after softmax)
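A numpy sketch of this masked loss, assuming Y is one-hot and Z is already row-normalized by a softmax:

```python
import numpy as np

def masked_cross_entropy(Z, Y, labeled_idx):
    """L = - sum over labeled nodes l and classes f of Y_lf * ln Z_lf."""
    Zl, Yl = Z[labeled_idx], Y[labeled_idx]
    return -np.sum(Yl * np.log(Zl + 1e-12))  # epsilon for numerical safety

# 5 nodes, 3 classes; only nodes 0 and 3 are labeled
Z = np.full((5, 3), 1.0 / 3.0)               # uniform predictions
Y = np.zeros((5, 3)); Y[0, 1] = 1; Y[3, 2] = 1
print(masked_cross_entropy(Z, Y, [0, 3]))    # = -2 * ln(1/3), about 2.197
```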


Toy example (semi-supervised learning)


Video also available here: http://tkipf.github.io/graph-convolutional-networks


Classification on citation networks


(Figure from: Bronstein, Bruna, LeCun, Szlam, Vandergheynst, 2016)

Input: Citation networks (nodes are papers, edges are citation links, optionally bag-of-words features on nodes)

Target: Paper category (e.g. stat.ML, cs.LG, …)


Experiments and results


Classification results (accuracy) [table not reproduced in this transcript; one baseline variant uses no input features]

Dataset statistics [table not reproduced]

(Kipf & Welling, Semi-Supervised Classification with Graph Convolutional Networks, ICLR 2017)

Model: 2-layer GCN

Learned representations (t-SNE embedding of hidden layer activations)


Other recent work


Methods range from the spectral domain to the spatial domain:

• SyncSpecCNN (Spectral Transformer Nets) [Yi et al., 2016]
• Chebyshev CNN (approximate spectral filters in spatial domain) [Defferrard et al., 2016]
• Mixture model CNN (learned basis functions) [Monti et al., 2016]

Trade-off: global filters are hard to generalize; local filters have a limited field of view.


Conclusions and open questions


Results from the past 1-2 years have shown:
• The deep learning paradigm can be extended to graphs
• State-of-the-art results in a number of domains
• Use end-to-end training instead of multi-stage approaches (graph kernels, DeepWalk, node2vec, etc.)

Open problems:
• Learn basis functions from scratch (first step: [Monti et al., 2016])
• Robustness of filters to changes in graph structure
• Learn representations of graphs (first step: [Duvenaud et al., 2015])
• Theoretical understanding of what is learned
• Common benchmark datasets
• …


Further reading


Blog post on Graph Convolutional Networks: http://tkipf.github.io/graph-convolutional-networks

Code on GitHub: http://github.com/tkipf/gcn

Kipf & Welling, Semi-Supervised Classification with Graph Convolutional Networks, ICLR 2017: https://arxiv.org/abs/1609.02907

Kipf & Welling, Variational Graph Auto-Encoders, NIPS BDL Workshop, 2016: https://arxiv.org/abs/1611.07308

You can get in touch with me via:
• E-Mail: [email protected]
• Twitter: @thomaskipf
• Web: http://tkipf.github.io

Project funded by SAP


How deep is deep enough?


Residual connection: H^{(l+1)} = σ( Â H^{(l)} W^{(l)} ) + H^{(l)}
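A sketch of the idea (assuming the renormalized propagation rule from above): a residual GCN layer adds its input back, so deeper stacks keep an identity path for gradients.

```python
import numpy as np

def residual_gcn_layer(A_hat, H, W):
    """H^(l+1) = ReLU(A_hat H W) + H  (input and output widths must match)."""
    return np.maximum(A_hat @ (H @ W), 0) + H
```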