CS224W: Machine Learning with Graphs
Jure Leskovec, Weihua Hu, Stanford University
http://cs224w.stanford.edu


TRANSCRIPT

Page 1:

CS224W: Machine Learning with Graphs
Jure Leskovec, Weihua Hu, Stanford University
http://cs224w.stanford.edu

Page 2:

Output: Node embeddings. We can also embed larger network structures: subgraphs and whole graphs.

Page 3:

¡ Key idea: Generate node embeddings based on local network neighborhoods

[Figure: an input graph on nodes A–F, and the computation graph of a target node aggregating information from its neighbors]

Page 4:

¡ Intuition: Nodes aggregate information from their neighbors using neural networks

[Figure: the same computation graph, with each aggregation step implemented by a neural network (the gray boxes)]

Page 5:

Scarselli et al., 2009b; Battaglia et al., 2016; Defferrard et al., 2016; Duvenaud et al., 2015; Hamilton et al., 2017a; Kearnes et al., 2016; Kipf & Welling, 2017; Lei et al., 2017; Li et al., 2016; Velickovic et al., 2018; Verma & Zhang, 2018; Ying et al., 2018; Zhang et al., 2018

What's inside the box?

¡ Many model variants have been proposed with different choices of neural networks.

[Figure: the computation graph with each aggregation box marked "?"]

Page 6:

¡ Many model variants have been proposed with different choices of neural networks.

Graph Convolutional Networks (GCN) [Kipf & Welling ICLR'2017]: each aggregation step is a Mean over neighbor messages, followed by Linear + ReLU.

[Figure: the computation graph with each box labeled "Mean" and "Linear + ReLU"]
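Below is a minimal sketch of the mean-aggregation step just described, assuming dense NumPy arrays and a dict adjacency; the helper name gcn_layer is ours, not the course's reference code:

```python
import numpy as np

def gcn_layer(H, neighbors, W):
    """One GCN-style step: mean over the neighbor multi-set, then Linear + ReLU.

    H:         (num_nodes, d_in) current node embeddings
    neighbors: dict mapping node id -> non-empty list of neighbor ids
    W:         (d_in, d_out) weight matrix
    """
    H_new = np.zeros((H.shape[0], W.shape[1]))
    for v, nbrs in neighbors.items():
        m = H[nbrs].mean(axis=0)          # mean pooling over neighbor embeddings
        H_new[v] = np.maximum(m @ W, 0)   # linear transform + ReLU
    return H_new
```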

Page 7:

¡ Many model variants have been proposed with different choices of neural networks.

GraphSAGE [Hamilton+ NeurIPS'2017]: each aggregation step applies an MLP to each neighbor message and then takes an element-wise Max.

[Figure: the computation graph with each box labeled "MLP" and "Max"]
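A matching sketch of the max-pooling aggregation above, in the same assumed setup (a single Linear + ReLU stands in for the per-neighbor MLP):

```python
import numpy as np

def graphsage_max_layer(H, neighbors, W):
    """GraphSAGE-style step: apply an MLP (here one Linear + ReLU for brevity)
    to each neighbor embedding, then take an element-wise max.

    H:         (num_nodes, d_in) current node embeddings
    neighbors: dict mapping node id -> non-empty list of neighbor ids
    W:         (d_in, d_out) weight matrix
    """
    H_new = np.zeros((H.shape[0], W.shape[1]))
    for v, nbrs in neighbors.items():
        msgs = np.maximum(H[nbrs] @ W, 0)  # MLP on each neighbor message
        H_new[v] = msgs.max(axis=0)        # element-wise max pooling
    return H_new
```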

Page 8:

¡ Intuition: Network neighborhood defines a computation graph

Every node defines a computation graph based on its neighborhood!

Page 9:

¡ Obtain node representations by neighbor aggregation

Page 10:

¡ Obtain a graph representation by pooling node representations

Pool (e.g., Sum, Average)

Page 11:

Graph Neural Networks have achieved state-of-the-art performance on:

¡ Node classification [Kipf+ ICLR'2017]
¡ Graph classification [Ying+ NeurIPS'2018]
¡ Link prediction [Zhang+ NeurIPS'2018]

Page 12:

Graph Neural Networks have achieved state-of-the-art performance on:

¡ Node classification [Kipf+ ICLR'2017]
¡ Graph classification [Ying+ NeurIPS'2018]
¡ Link prediction [Zhang+ NeurIPS'2018]

Are GNNs perfect? What are the limitations of GNNs?

Page 13:

¡ Some simple graph structures cannot be distinguished by conventional GNNs.

GCN and GraphSAGE fail to distinguish the two graphs.
Assume: input node features are uniform (denoted by the same node color).

Page 14:

¡ Some simple graph structures cannot be distinguished by conventional GNNs.
¡ GNNs are not robust to noise in graph data.

GCN and GraphSAGE fail to distinguish the two graphs.
Assume: input node features are uniform (denoted by the same node color).

[Figure: a graph with binary node feature vectors is fed into a GNN, which outputs a class prediction (Class 1 / Class 2 / Class 3). Noise in the graph: 1. node feature perturbation; 2. edge addition/deletion.]

Page 15:

1. Limitations of conventional GNNs in capturing graph structure
2. Vulnerability of GNNs to noise in graph data
3. Open questions & future directions

Page 16:
Page 17:

¡ Given two different graphs, can GNNs map them into different graph representations?
¡ This is an important condition for the classification scenario.

[Figure: two graphs, each mapped by a GNN to a representation; are the two representations different?]

Page 18:

¡ This is essentially the graph isomorphism test problem.
¡ No polynomial-time algorithm is known for the general case.
¡ GNNs may not perfectly distinguish all graphs!

[Figure: the same two-graph setup, with question marks on the GNN outputs]

Page 19:

¡ This is essentially the graph isomorphism test problem.
¡ No polynomial-time algorithm is known for the general case.
¡ GNNs may not perfectly distinguish all graphs.

How well can GNNs perform the graph isomorphism test? This requires rethinking the mechanism of how GNNs capture graph structure.

Page 20:

¡ GNNs use different computational graphs to distinguish different graphs.

[Figure: two graphs, with nodes 1–4 and 1'–4', each unrolled into its per-node computation graphs]

Page 21:

¡ Node representation captures rooted subtree structure.

[Figure: the per-node computation graphs of both graphs, annotated as rooted subtree structures]

Page 22:

¡ Most discriminative GNNs map different subtrees into different node representations (denoted by different colors).

[Figure: nodes 1–4 and 1'–4' colored according to their rooted subtrees]

Page 23:

¡ Most discriminative GNNs map different subtrees into different node representations (denoted by different colors).

Injectivity

[Figure: the same coloring, highlighting that the subtree-to-representation map is injective]

Page 24:

¡ A function is injective if it maps different elements into different outputs.

Page 25:

¡ The entire neighbor aggregation is injective if every step of neighbor aggregation is injective.

Injective → Injective → The entire function is injective!

Page 26:

¡ Neighbor aggregation is essentially a function over a multi-set (a set with repeating elements).

Neighbor aggregation and multi-set functions are equivalent.
Examples of multi-sets: the same color indicates the same node features.

[Figure: example multi-sets of colored nodes]

Page 27:

¡ Neighbor aggregation is essentially a function over a multi-set (a set with repeating elements).

The discriminative power of GNNs can be characterized by that of multi-set functions.

Next: Analyzing GCN and GraphSAGE

Page 28:

Recall: GCN uses mean pooling.

Mean pooling + Linear + ReLU

Page 29:

Recall: GCN uses mean pooling.

Mean pooling + Linear + ReLU

GCN will fail to distinguish proportionally equivalent multi-sets, e.g., {a, b} vs. {a, a, b, b}, whose means coincide. Not injective!
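A quick numeric check of this failure mode (toy feature vectors of our choosing):

```python
import numpy as np

# Proportionally equivalent multi-sets of neighbor features: {a, b} vs. {a, a, b, b}.
a, b = np.array([1.0, 0.0]), np.array([0.0, 1.0])

m1 = np.mean([a, b], axis=0)        # mean of {a, b}
m2 = np.mean([a, a, b, b], axis=0)  # mean of {a, a, b, b}

print(m1, m2)                # both [0.5 0.5]
assert np.allclose(m1, m2)   # mean pooling cannot separate them: not injective
```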

Page 30:

Recall: GraphSAGE uses max pooling.

MLP + Max pooling

Page 31:

Recall: GraphSAGE uses max pooling.

MLP + Max pooling

GraphSAGE will fail to distinguish even multi-sets with the same distinct elements but different multiplicities. Not injective!
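The analogous check for max pooling (again with toy feature vectors):

```python
import numpy as np

# Multi-sets with the same distinct elements but different multiplicities:
# {a, b} vs. {a, b, b}.
a, b = np.array([1.0, 0.0]), np.array([0.0, 1.0])

m1 = np.max([a, b], axis=0)     # element-wise max of {a, b}
m2 = np.max([a, b, b], axis=0)  # element-wise max of {a, b, b}

print(m1, m2)                # both [1. 1.]
assert np.allclose(m1, m2)   # max pooling cannot separate them: not injective
```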

Page 32:

Recall: GraphSAGE uses max pooling.

MLP + Max pooling

GraphSAGE will fail to distinguish even multi-sets with the same distinct elements. Not injective!

How can we design an injective multi-set function using neural networks?

Page 33:

Theorem: Any injective multi-set function can be expressed as
$\phi\left(\sum_{x \in S} f(x)\right)$
where $S$ is a multi-set and $\phi$ and $f$ are some non-linear functions.

[Figure: $\phi(f(\cdot) + f(\cdot) + f(\cdot))$, a sum over the multi-set followed by $\phi$]

Page 34:

Theorem: Any injective multi-set function can be expressed as
$\phi\left(\sum_{x \in S} f(x)\right)$

We can model $\phi$ and $f$ using a Multi-Layer Perceptron (MLP). Note: an MLP is a universal approximator.

[Figure: $\mathrm{MLP}_{\phi}(\mathrm{MLP}_{f}(\cdot) + \mathrm{MLP}_{f}(\cdot) + \mathrm{MLP}_{f}(\cdot))$]

Page 35:

Graph Isomorphism Network (GIN) [Xu+ ICLR'2019]

MLP + Sum pooling
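A minimal sketch of this layer, in the style of the earlier aggregation sketches. Following the GIN update of Xu+ ICLR'2019, the node's own embedding is mixed in with a (1 + ε) factor; the shapes and names here are our assumptions:

```python
import numpy as np

def gin_layer(H, neighbors, W1, W2, eps=0.0):
    """GIN-style step: sum over the neighbor multi-set plus (1 + eps) times the
    node's own embedding, followed by a 2-layer MLP (the phi/f of the theorem).

    H:         (num_nodes, d) node embeddings
    neighbors: dict mapping node id -> list of neighbor ids
    W1, W2:    MLP weight matrices
    """
    H_new = np.zeros((H.shape[0], W2.shape[1]))
    for v, nbrs in neighbors.items():
        s = (1.0 + eps) * H[v] + H[nbrs].sum(axis=0)  # injective sum aggregation
        H_new[v] = np.maximum(s @ W1, 0) @ W2         # MLP
    return H_new
```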

Page 36:

Graph Isomorphism Network (GIN) [Xu+ ICLR'2019]

MLP + Sum pooling

GIN's neighbor aggregation is injective!

Page 37:

¡ Graph pooling is also a function over a multi-set. Sum pooling can give injective graph pooling!

[Figure: the node embeddings of each graph combined by sum pooling into a graph embedding]

Page 38:

Graph Isomorphism Network (GIN) [Xu+ ICLR'2019]

MLP + Sum pooling

GIN's neighbor aggregation is injective!

So far: GIN achieves maximal discriminative power by using injective neighbor aggregation. How powerful is this?

Page 39:

¡ GIN is closely related to the Weisfeiler-Lehman (WL) graph isomorphism test (1968).
¡ The WL test is known to be capable of distinguishing most real-world graphs.

Page 40:

¡ GIN is closely related to the Weisfeiler-Lehman (WL) graph isomorphism test (1968).
¡ The WL test is known to be capable of distinguishing most real-world graphs.
¡ Next: We will show that GIN is as discriminative as the WL test.

Page 41:

¡ WL first maps different rooted subtrees to different colors.

This is similar to the injective neighbor aggregation in GIN.

[Figure: the two graphs, with nodes 1–4 and 1'–4' colored by their rooted subtrees]
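A compact sketch of WL color refinement, our own minimal implementation for intuition (colors are kept as canonical nested tuples so that histograms are comparable across graphs):

```python
from collections import Counter

def wl_histogram(neighbors, rounds=3):
    """1-WL color refinement. neighbors: dict node -> list of neighbors.
    Returns the multi-set (Counter) of node colors after `rounds` refinements."""
    color = {v: 0 for v in neighbors}  # uniform initial colors
    for _ in range(rounds):
        # New color = (own color, sorted multi-set of neighbor colors)
        color = {v: (color[v], tuple(sorted(color[u] for u in neighbors[v])))
                 for v in neighbors}
    return Counter(color.values())

g1 = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4], 4: [3]}  # triangle plus an edge
g2 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}  # path on 5 nodes
# Equal histograms are necessary (not sufficient) for isomorphism.
print(wl_histogram(g1) == wl_histogram(g2))  # False: the graphs are distinguished
```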

Page 42:

¡ WL then counts the different colors.

This is similar to graph pooling in a GNN.

[Figure: color histograms of the two graphs (counts such as 4 ×, 1 ×, 1 ×, 2 ×)]

Page 43:

¡ Finally, WL compares the counts.

[Figure: the two color histograms side by side. Compare!]

Page 44:

¡ Finally, WL compares the counts.

The WL test and GIN are operationally equivalent: the set of graphs that the WL test can distinguish coincides with the set of graphs that GIN can distinguish.

[Figure: Venn diagram in which "Graphs that WL test can distinguish" and "Graphs that GIN can distinguish" coincide]

Page 45: CS224W: Machine Learning with Graphs Jure Leskovec, Weihua ...web.stanford.edu/class/cs224w/slides/18-limitations.pdf · 12/3/19 Jure Leskovec, Stanford CS224W: Machine Learning with

Observation¡ GINs have the same discriminative power as

the WL graph isomorphism test.

¡ WL test has been known to distinguish most of the graphs, except for some corner cases.

12/3/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, http://cs224w.stanford.edu 45

The two graphs below look the same to the WL test because all of their nodes have the same local subtree structure!

[Embedded excerpt from Murphy+ ICML 2019, "Relational Pooling for Graph Representations". Its Figure 1 shows Gskip(11, 2) and Gskip(11, 3): two non-isomorphic Circulant Skip Links (CSL) graphs, 4-regular graphs on M = 11 vertices whose edges form a cycle plus skip links connecting vertices R ∈ {2, 3} hops apart. The WL test incorrectly deems them isomorphic.]

Follow-up work resolves such corner cases, but with exponential time complexity: [Murphy+ ICML 2019]
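Continuing the WL sketch from page 41, a quick check on these corner cases. The g_skip construction below follows the CSL definition in the excerpt (a cycle plus skip links R hops apart); wl_histogram is the function defined earlier:

```python
def g_skip(m, r):
    """Gskip(M, R): undirected 4-regular graph on vertices 0..M-1 whose edges
    form a cycle {j, j+1 mod M} plus skip links {j, j+R mod M}."""
    nbrs = {v: set() for v in range(m)}
    for j in range(m):
        for u in ((j + 1) % m, (j + r) % m):
            nbrs[j].add(u)
            nbrs[u].add(j)
    return {v: sorted(ns) for v, ns in nbrs.items()}

# Both graphs are 4-regular with uniform initial colors, so WL refinement never
# splits the colors: the histograms match although the graphs are non-isomorphic.
print(wl_histogram(g_skip(11, 2)) == wl_histogram(g_skip(11, 3)))  # True
```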

Page 46:

¡ Graph classification: social and bio/chem graphs

Training accuracy of different GNN architectures: GIN fits the training data much better than GCN and GraphSAGE.

[Plot: training accuracy curves for GIN, WL kernel, GCN, and GraphSAGE]

Page 47:

¡ Graph classification: social and bio/chem graphs

Training accuracy of different GNN architectures: the same trend holds across datasets!

[Plot: training accuracy curves for GIN, GraphSAGE, and GCN on another dataset]

Page 48:

¡ Graph classification: social and bio/chem graphs

GIN also outperforms existing GNNs in terms of test accuracy, because it can better capture graph structure.

Data / Model   GIN (Powerful)   Sum-1   GCN (Mean)   GraphSAGE (Max)
RDT-B          92.4             90.0    50.0         50.0
IMDB-B         75.1             74.1    74.0         72.3
NCI1           82.7             82.0    80.2         77.7
MUTAG          89.4             90.0    85.6         85.1

The Reddit dataset (RDT-B) does not have node features!

Page 49:

¡ Existing GNNs use non-injective neighbor aggregation and thus have low discriminative power.
¡ GIN uses injective neighbor aggregation and is as discriminative as the WL graph isomorphism test.
¡ GIN achieves state-of-the-art test performance in graph classification.

Page 50:
Page 51:

¡ Deep neural networks are vulnerable to adversarial attacks!
¡ Attacks are often implemented as imperceptible noise that changes the prediction.

[Figure: an input plus a perturbation in the gradient direction that changes the prediction]

Page 52:

¡ Adversaries are very common in applications of graph neural networks, e.g., search engines, recommender systems, social networks, etc.
¡ These adversaries will exploit any exposed vulnerabilities!

Page 53:

¡ Adversaries are very common in applications of graph neural networks, e.g., search engines, recommender systems, social networks.
¡ These adversaries will exploit any exposed vulnerabilities.

Are GNNs robust to adversarial attacks?

Page 54:

¡ Here we focus on semi-supervised node classification using Graph Convolutional Networks (GCN) [Kipf+ ICLR'2017].

Input: a partially labeled, attributed graph.
Goal: predict the labels of the unlabeled nodes.

[Figure: a graph with a few labeled nodes and many unlabeled nodes marked '?'; GCN message passing propagates information across the graph]

Page 55:

$A \in \{0,1\}^{N \times N}$: adjacency matrix
$X \in \{0,1\}^{N \times D}$: (binary) node attributes

Classification model (two-step GCN message passing):
$f_\theta(A, X) = \mathrm{softmax}\!\left(\hat{A}\,\mathrm{ReLU}\!\left(\hat{A} X W^{(1)}\right) W^{(2)}\right)$
$\hat{A} \equiv \tilde{D}^{-1/2}(A + I)\,\tilde{D}^{-1/2}$: renormalized adjacency matrix

Training: minimize the cross-entropy loss on the labeled data.
Testing: apply the model to predict the unlabeled data.
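A minimal NumPy sketch of this forward pass (dense matrices; row-wise softmax over classes):

```python
import numpy as np

def gcn_forward(A, X, W1, W2):
    """Two-step GCN: softmax(Â ReLU(Â X W1) W2), with Â = D̃^{-1/2}(A+I)D̃^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])              # add self-loops
    d_inv_sqrt = np.diag(A_tilde.sum(axis=1) ** -0.5)
    A_hat = d_inv_sqrt @ A_tilde @ d_inv_sqrt     # renormalized adjacency
    H = np.maximum(A_hat @ X @ W1, 0)             # first propagation + ReLU
    Z = A_hat @ H @ W2                            # second propagation
    Z = np.exp(Z - Z.max(axis=1, keepdims=True))  # numerically stable softmax
    return Z / Z.sum(axis=1, keepdims=True)
```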

Page 56:

¡ What are the attack possibilities in the real world?

Target node $t \in V$: the node whose classification label we want to change.
Attacker nodes $S \subset V$: the nodes the attacker can modify.

[Figure: a graph highlighting the target node and two attacker nodes]

Page 57:

¡ What are the attack possibilities in the real world?

Target node $t \in V$: the node whose classification label we want to change.
Attacker nodes $S \subset V$: the nodes the attacker can modify.

Direct attack ($S = \{t\}$):
¡ Modify the target's features (example: change website content)
¡ Add connections to the target (example: buy likes/followers)
¡ Remove connections from the target (example: unfollow untrusted users)

Page 58:

¡ What are the attack possibilities in the real world?

Target node $t \in V$: the node whose classification label we want to change.
Attacker nodes $S \subset V$: the nodes the attacker can modify.

Direct attack ($S = \{t\}$):
¡ Modify the target's features (example: change website content)
¡ Add connections to the target (example: buy likes/followers)
¡ Remove connections from the target (example: unfollow untrusted users)

Indirect attack ($t \notin S$):
• Modify the attackers' features
• Add connections to the attackers
• Remove connections from the attackers
Examples: hijack friends of the target; create a link/spam farm.

Page 59:

(Same setup and attack possibilities as the previous slide.)

How can we mathematically formalize these attack possibilities?

Page 60:

¡ Zügner+, Adversarial Attacks on Neural Networks for Graph Data, KDD'18

Maximize (change of the predicted label of the target node)
Subject to (limited noise in the graph)

[Figure: adding small graph noise near the target node changes the GCN's predicted class (Class 1 / Class 2 / Class 3)]

Page 61:

¡ Find a modified graph that maximizes the change of the predicted label of the target node:

$\underset{A', X'}{\operatorname{argmax}} \; \max_{c \neq c_{\mathrm{old}}} \; \log Z^{*}_{v,c} - \log Z^{*}_{v,c_{\mathrm{old}}}$
where $Z^{*} = f_{\theta^{*}}(A', X') = \mathrm{softmax}\!\left(\hat{A}'\,\mathrm{ReLU}\!\left(\hat{A}' X' W^{(1)}\right) W^{(2)}\right)$,
with $\theta^{*} = \operatorname{argmin}_{\theta} \mathcal{L}(\theta; A', X')$
s.t. $(A', X') \approx (A, X)$

Let's parse the objective function!

Page 62:

¡ Find a modified graph that maximizes the change of the predicted label of the target node (objective as above).

Annotations: $A'$ is the modified adjacency matrix; $X'$ are the modified node features; $c_{\mathrm{old}}$ is the original prediction; $c$ is the new prediction (specified by the attacker); $v$ is the target node.
Page 63: CS224W: Machine Learning with Graphs Jure Leskovec, Weihua ...web.stanford.edu/class/cs224w/slides/18-limitations.pdf · 12/3/19 Jure Leskovec, Stanford CS224W: Machine Learning with

¡ Find a modified graph that maximizes the change of predicted labels of target node

12/3/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, http://cs224w.stanford.edu 63

argmaxVW,XW

maxYZY[\]

log 𝑍a,Y∗ − log 𝑍a,Y[\]∗

𝑤ℎ𝑒𝑟𝑒 𝑍∗ = 𝑓g∗ 𝐴h, 𝑋h =𝑠𝑜𝑓𝑡𝑚𝑎𝑥 >𝐴′ 𝑅𝑒𝐿𝑈 >𝐴′𝑋′𝑊 C 𝑊 D ,

𝑤𝑖𝑡ℎ 𝜃∗ = argmingℒ (𝜃; 𝐴h, 𝑋h)

𝑠. 𝑡. 𝐴h, 𝑋h ≈ (𝐴, 𝑋)

Increase the loglikelihood of target node 𝑣 being predicted as 𝑐

Decrease the loglikelihood of target node 𝑣being predicted as 𝑐tuv

Page 64:

¡ Find a modified graph that maximizes the change of the predicted label of the target node (objective as above).

The GCN is trained on the modified graph ($\theta^{*} = \operatorname{argmin}_{\theta} \mathcal{L}(\theta; A', X')$), and this trained model is then used to predict the label of the target node.

Page 65:

¡ Find a modified graph that maximizes the change of the predicted label of the target node (objective as above).

The constraint $(A', X') \approx (A, X)$ requires the modified graph to be close to the original graph, e.g., a fixed budget of edge deletions/additions or node attribute perturbations.

Page 66:

¡ In practice, we cannot exactly solve the optimization problem because:
§ Graph modification is discrete (we cannot use simple gradient descent to optimize)
§ The inner loop involves expensive re-training of the GCN

Page 67:

¡ Some heuristics have been proposed to efficiently obtain an approximate solution.
¡ For example:
§ Greedily choosing the graph modifications step by step (a toy sketch follows below)
§ Simplifying the GCN by removing the ReLU activation (to work in closed form)
§ Etc.

More details in Zügner+ KDD'2018!
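A toy sketch of the greedy idea only (our illustration, not Zügner et al.'s algorithm): score every candidate edge flip by how much it lowers the target's confidence in its current class, keeping the GCN weights fixed rather than re-training; gcn_forward is the sketch from page 55:

```python
import numpy as np

def greedy_edge_attack(A, X, W1, W2, target, budget=5):
    """Greedily flip the edge that most lowers the target node's confidence in
    its originally predicted class (weights held fixed for simplicity)."""
    A = A.copy()
    c_old = gcn_forward(A, X, W1, W2)[target].argmax()
    n = A.shape[0]
    for _ in range(budget):
        best, best_conf = None, gcn_forward(A, X, W1, W2)[target, c_old]
        for u in range(n):
            for v in range(u + 1, n):
                A[u, v] = A[v, u] = 1 - A[u, v]   # try flipping edge (u, v)
                conf = gcn_forward(A, X, W1, W2)[target, c_old]
                if conf < best_conf:
                    best, best_conf = (u, v), conf
                A[u, v] = A[v, u] = 1 - A[u, v]   # undo the trial flip
        if best is None:                          # no flip helps; stop early
            break
        u, v = best
        A[u, v] = A[v, u] = 1 - A[u, v]           # commit the best flip
    return A
```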

Page 68:

¡ Semi-supervised node classification with GCN
§ Class predictions for a single node, produced by 5 GCNs with different random initializations

[Plot: predicted probabilities (clean) over Class 1 through Class 7, with Class 6 being the correct class]

Classification margin: > 0 means correct classification; < 0 means incorrect classification.

Page 69:

¡ The GCN prediction is easily manipulated by only 5 modifications of the graph structure (|V| ≈ 2k, |E| ≈ 5k).

[Plot: predicted probabilities after the attack (5 modifications) over Class 1 through Class 7]

Page 70:

¡ The prediction of the GCN is easily manipulated by only 5 modifications of the graph structure.

GNNs are not robust to adversarial attacks!

[Plot: predicted probabilities after the attack (5 modifications)]

Page 71:
Page 72:

¡ Chemistry: molecular graphs
§ Molecular property prediction, e.g., GNN(molecule) = toxic?
¡ Biology: protein-protein interaction networks
§ Protein function prediction, e.g., GNN(protein) = adenylate cyclase activity?
Page 73: CS224W: Machine Learning with Graphs Jure Leskovec, Weihua ...web.stanford.edu/class/cs224w/slides/18-limitations.pdf · 12/3/19 Jure Leskovec, Stanford CS224W: Machine Learning with

¡ Scarcity of labeled data§ Labels require expensive experiments

à Models overfit to small training datasets

¡ Out-of-distribution prediction§ Test examples are very different from training in

scientific discovery à Models typically perform poorly

12/3/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, http://cs224w.stanford.edu 73

Page 74: CS224W: Machine Learning with Graphs Jure Leskovec, Weihua ...web.stanford.edu/class/cs224w/slides/18-limitations.pdf · 12/3/19 Jure Leskovec, Stanford CS224W: Machine Learning with

¡ Pre-training GNNs [Hu+ 2019]

12/3/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, http://cs224w.stanford.edu 74

GNN

Pre-train GNNs on relevant, easy to obtain graph data

Unlabeled molecules

Chemistry database

Pre-train

Downstream task 1

Downstream task 2

Downstream task N

Fine-tunewith chemical knowledge

Page 75: CS224W: Machine Learning with Graphs Jure Leskovec, Weihua ...web.stanford.edu/class/cs224w/slides/18-limitations.pdf · 12/3/19 Jure Leskovec, Stanford CS224W: Machine Learning with

¡ We have seen how to attack GNNs¡ Open question:

How to defend against the attacks?

12/3/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, hpp://cs224w.stanford.edu 75

Challenges:¡ Tractable optimization on discrete graph data¡ Achieving good trade-off between Accuracy

and Robustness