TRANSCRIPT
CS224W: Machine Learning with Graphs. Jure Leskovec, Weihua Hu, Stanford University
http://cs224w.stanford.edu
…
Output: Node embeddings. We can also embed larger network structures: subgraphs, whole graphs.
¡ Key idea: Generate node embeddings based on local network neighborhoods
[Figure: an input graph; target node B's local neighborhood (nodes A-F) defines how information flows to B]
¡ Intuition: Nodes aggregate information from their neighbors using neural networks
[Figure: the same computation graph for target node B, with a neural network at each aggregation step]
Scarselli et al., 2009b; Battaglia et al., 2016; Defferrard et al., 2016; Duvenaud et al., 2015; Hamilton et al., 2017a; Kearnes et al., 2016; Kipf & Welling, 2017; Lei et al., 2017; Li et al., 2016; Velickovic et al., 2018; Verma & Zhang, 2018; Ying et al., 2018; Zhang et al., 2018
[Figure: the computation graph again, with unspecified boxes ("?") at each aggregation step]
What’s inside the box?
¡ Many model variants have been proposed, with different choices of neural networks.
Graph Convolutional Networks [Kipf & Welling ICLR'2017]
Each aggregation step: Mean over neighbor messages → Linear + ReLU
GraphSAGE [Hamilton+ NeurIPS'2017]
Each aggregation step: MLP on each neighbor message → Max
¡ Intuition: Network neighborhood defines a computation graph
Every node defines a computation graph based on its neighborhood!
¡ Obtain node representations by neighbor aggregation
¡ Obtain graph representations by pooling node representations (a sketch follows below)
Pool (e.g., Sum, Average)
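To make these two steps concrete, here is a minimal numpy sketch; the toy graph, feature dimensions, and a single shared weight matrix are illustrative assumptions, not from the slides:

```python
import numpy as np

def gnn_layer(H, neighbors, W):
    """One round of neighbor aggregation: every node averages its
    neighbors' embeddings, then applies a shared Linear + ReLU."""
    out = np.zeros((H.shape[0], W.shape[1]))
    for v, nbrs in neighbors.items():
        msg = H[nbrs].mean(axis=0)         # aggregate messages from neighbors
        out[v] = np.maximum(0.0, msg @ W)  # transform: Linear + ReLU
    return out

# Toy graph on 4 nodes with edges 0-1, 0-2, 1-2, 2-3
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
H = np.random.randn(4, 8)   # initial node features
W = np.random.randn(8, 8)   # layer weights (shared across layers here)
for _ in range(2):          # 2 layers -> information from 2-hop neighborhoods
    H = gnn_layer(H, neighbors, W)

h_graph = H.sum(axis=0)     # graph embedding: pool node embeddings (sum)
```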
Graph Neural Networks have achieved state-of-the-art performance on:
¡ Node classification [Kipf+ ICLR’2017]
¡ Graph classification [Ying+ NeurIPS'2018]
¡ Link prediction [Zhang+ NeurIPS'2018]
Are GNNs perfect?
What are the limitations of GNNs?
¡ Some simple graph structures cannot be distinguished by conventional GNNs.
GCN and GraphSAGE fail to distinguish the two graphs.
Assume: input node features are uniform (denoted by the same node color).
¡ GNNs are not robust to noise in graph data.
[Figure: noise in the graph, (1) node feature perturbation or (2) edge addition/deletion, changes the GNN's class prediction (classes 1-3) for the target node]
1. Limitations of conventional GNNs in capturing graph structure
2. Vulnerability of GNNs to noise in graph data
3. Open questions & future directions
¡ Given two different graphs, can GNNs map them into different graph representations?
¡ This is an important condition for the classification scenario.
¡ This is essentially the graph isomorphism test problem.
¡ No polynomial-time algorithm is known for the general case.
¡ GNNs may not perfectly distinguish all graphs!
How well can GNNs perform the graph isomorphism test?
Answering this requires rethinking how GNNs capture graph structure.
¡ GNNs use different computational graphs to distinguish different graphs.
[Figure: two graphs, with nodes 1-4 and 1'-4', and the computation graph rooted at each node]
¡ A node's representation captures the rooted subtree structure around it.
¡ The most discriminative GNNs map different subtrees to different node representations (denoted by different colors).
Injectivity
¡ A function is injective if it maps different elements to different outputs.
¡ The entire neighbor aggregation is injective if every step of neighbor aggregation is injective.
Injective → Injective → the entire function is injective!
¡ Neighbor aggregation is essentially a function over a multi-set (a set with repeating elements).
Neighbor aggregation ≡ multi-set function
[Figure: examples of multi-sets; the same color indicates the same node features]
The discriminative power of GNNs can be characterized by that of multi-set functions.
Next: Analyzing GCN, GraphSAGE
Recall: GCN uses mean pooling.
Mean pooling → Linear + ReLU
GCN will fail to distinguish proportionally equivalent multi-sets.
Not injective!
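A quick numeric check of this failure mode (a minimal sketch; the one-hot "color" features are an illustrative assumption):

```python
import numpy as np

# Two one-hot "colors" standing in for node features
a, b = np.array([1.0, 0.0]), np.array([0.0, 1.0])

S1 = [a, b]           # multi-set {a, b}
S2 = [a, a, b, b]     # proportionally equivalent multi-set {a, a, b, b}

print(np.mean(S1, axis=0))   # [0.5 0.5]
print(np.mean(S2, axis=0))   # [0.5 0.5] -- identical: mean pooling is not injective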
Recall: GraphSAGE uses max pooling.
MLP + Max pooling
GraphSAGE even fails to distinguish multi-sets that have the same set of distinct elements.
Not injective!
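The analogous check for max pooling (same illustrative one-hot features as above); the last line previews that sum pooling does separate the two multi-sets:

```python
import numpy as np

a, b = np.array([1.0, 0.0]), np.array([0.0, 1.0])

S1 = [a, b]        # multi-set {a, b}
S2 = [a, a, b]     # same distinct elements, different multiplicities

print(np.max(S1, axis=0))   # [1. 1.]
print(np.max(S2, axis=0))   # [1. 1.] -- identical: max pooling is not injective
print(np.sum(S1, axis=0), np.sum(S2, axis=0))   # [1. 1.] vs [2. 1.] -- sum separates them
```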
How can we design an injective multi-set function using neural networks?
Theorem: Any injective multi-set function can be expressed as
$$\phi\Big(\sum_{x \in S} f(x)\Big)$$
where $S$ is a multi-set, the sum runs over the elements of the multi-set, and $\phi$ and $f$ are non-linear functions.
We can model $\phi$ and $f$ using multi-layer perceptrons (MLPs). Note: the MLP is a universal approximator.
$$\mathrm{MLP}_{\phi}\Big(\sum_{x \in S} \mathrm{MLP}_{f}(x)\Big)$$
Graph Isomorphism Network (GIN) [Xu+ ICLR’2019]
MLP + sum pooling
The GIN’s neighbor aggregation is injective!
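A minimal PyTorch-style sketch of one GIN aggregation step; the $(1 + \epsilon) \cdot h_v$ self-term follows the GIN paper, while the 2-layer MLP and dimensions are illustrative choices:

```python
import torch
import torch.nn as nn

class GINLayer(nn.Module):
    """One GIN step: h_v <- MLP((1 + eps) * h_v + sum of neighbor embeddings).
    Summation keeps the multi-set aggregation injective (unlike mean/max)."""
    def __init__(self, dim, eps=0.0):
        super().__init__()
        self.eps = eps
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, H, A):
        # H: (n, dim) node embeddings; A: (n, n) dense adjacency matrix.
        # A @ H computes the sum of each node's neighbor embeddings.
        return self.mlp((1 + self.eps) * H + A @ H)

# Graph-level readout would then be injective sum pooling: h_G = H.sum(dim=0)
```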
¡ Graph pooling is also a function over a multi-set. Sum pooling gives injective graph pooling!
[Figure: graph representations of the two graphs obtained by sum pooling over their node embeddings]
So far: GIN achieves maximal discriminative power by using injective neighbor aggregation.
How powerful is this?
¡ GIN is closely related to the Weisfeiler-Lehman (WL) graph isomorphism test (1968).
¡ The WL test is known to be capable of distinguishing most real-world graphs.
¡ Next: We will show GIN is as discriminative as the WL test.
¡ WL first maps different rooted subtrees to different colors
Similar to the injective neighbor aggregation in GIN
¡ WL then counts different colors.
Similar to graph pooling in GNN
[Figure: WL color counts for the two graphs, e.g., 4× one color vs. 1×, 1×, and 2× of three colors]
¡ Finally, WL compares the color counts.
The WL test and GIN are operationally equivalent.
[Figure: the set of graphs the WL test can distinguish coincides with the set of graphs GIN can distinguish]
Observation:
¡ GINs have the same discriminative power as the WL graph isomorphism test.
¡ The WL test has been known to distinguish most graphs, except for some corner cases.
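For concreteness, a minimal sketch of 1-dimensional WL color refinement (graphs and iteration count are illustrative). Run on a classic pair of 2-regular graphs, a 6-cycle vs. two triangles (an assumed example, since the slide figures are not recoverable), it returns identical color histograms, exactly such a corner case:

```python
from collections import Counter

def wl_histogram(adj, iters=3):
    """1-WL color refinement. adj maps node -> list of neighbors.
    Returns the histogram of final node colors."""
    color = {v: 0 for v in adj}   # uniform initial features
    for _ in range(iters):
        # New color = (own color, sorted multi-set of neighbor colors)
        sig = {v: (color[v], tuple(sorted(color[u] for u in adj[v]))) for v in adj}
        relabel = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        color = {v: relabel[sig[v]] for v in adj}
    return Counter(color.values())

cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}   # one 6-cycle
triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],                # two 3-cycles
             3: [4, 5], 4: [3, 5], 5: [3, 4]}
print(wl_histogram(cycle6) == wl_histogram(triangles))       # True: WL cannot tell them apart
```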
[Excerpt: Murphy+, "Relational Pooling for Graph Representations", ICML 2019]

where CNN denotes a 2D CNN (LeCun et al., 1989; Krizhevsky et al., 2012) if there are no edge features and a 3D CNN (Ji et al., 2013) if there are edge features, [· on ·] is a concatenation of the representations, and MLP is a multilayer perceptron. Multi-resolution 3D convolutions (Qi et al., 2016) can be used to map variable-sized graphs into the same-sized representation for downstream layers.

GNNs. The function $\vec{f}$ can also be a graph neural network (GNN), a broad class of models that use the graph $G$ itself to define the computation graph. These are permutation-invariant by design, but we will show that their integration into RP can (1) make them more powerful and (2) speed up their computation via theoretically sound approximations. The GNNs we consider follow a message-passing (Gilmer et al., 2017) scheme defined by the recursion

$$h_u^{(l)} = \phi^{(l)}\Big(h_u^{(l-1)},\ \mathrm{JP}\big((h_v^{(l-1)})_{v \in \mathcal{N}(u)}\big)\Big), \quad (4)$$

where $\phi^{(l)}$ is a learnable function with distinct weights at each layer $1 \le l \le L$ of the computation graph, JP is a general (learnable) permutation-invariant function (Murphy et al., 2019), $\mathcal{N}(u)$ is the set of neighbors of $u \in V$, and $h_u^{(l)} \in \mathbb{R}^{d_h^{(l)}}$ is a vector describing the embedding of node $u$ at layer $l$. $h_u^{(0)}$ is the feature vector of node $u$, $(X^{(v)})_{u,\cdot}$, or can be assigned a constant $c$ if $u$ has no features. Under this framework, node embeddings can be used directly to predict node-level targets, or all node embeddings can be aggregated (via a learnable function) to form an embedding $h_G$ used for graph-wide tasks.

There are several variations of Equation 4 in the literature. Duvenaud et al. (2015) proposed using embeddings from all layers $l \in \{1, 2, \ldots, L\}$ for graph classification. Hamilton et al. (2017a) used a similar framework for node classification and link prediction tasks, using the embedding at the last layer, while Xu et al. (2018) extend Hamilton et al. (2017a) to once again use embeddings at all layers for node and link prediction tasks. Other improvements include attention (Velickovic et al., 2018). This approach can be derived from spectral graph convolutions (e.g., Kipf & Welling, 2017). More GNNs are discussed in Section 3.

Recently, Xu et al. (2019) and Morris et al. (2019) showed that these architectures are at most as powerful as the Weisfeiler-Lehman (WL) algorithm for testing graph isomorphism (Weisfeiler & Lehman, 1968), which itself effectively follows a message-passing scheme. Accordingly, we will broadly refer to models defined by Equation 4 as WL-GNNs. Xu et al. (2019) propose a WL-GNN called Graph Isomorphism Network (GIN) which is as powerful as the WL test in graphs with discrete features.

Can a WL-GNN be more powerful than the WL test? WL-GNNs inherit a shortcoming from the WL test (Cai et al., 1992; Arvind et al., 2017; Furer, 2017; Morris et al., 2019): node representations $h_u^{(l)}$ do not encode whether two nodes have the same neighbor or distinct neighbors with the same features, limiting their ability to learn an expressive representation of the entire graph. Consider a task where graphs represent molecules, where node features indicate atom type and edges denote the presence or absence of bonds. Here, the first WL-GNN layer cannot distinguish whether two (say) carbon atoms have a bond with the same carbon atom or bonds to two distinct carbon atoms. Successive layers of the WL-GNN update node representations, and the hope is that nodes eventually get unique representations (up to isomorphisms), and thus allow the WL-GNN to detect whether two nodes have the same neighbor based on the representations of their neighbors. However, if there are too few WL-GNN layers or complex cycles in the graph, the graph and its nodes will not be adequately represented.

Figure 1: $G_{skip}(11, 2)$ and $G_{skip}(11, 3)$. The WL test incorrectly deems these isomorphic.

To better understand this challenge, consider the extreme case illustrated by the two graphs in Figure 1. These are cycle graphs with $M = 11$ nodes where nodes that are $R \in \{2, 3\}$ 'hops' around the circle are connected by an edge. These highly symmetric graphs, which are special cases of circulant graphs (Vilfred, 2004), are formally defined in Definition 2.1, but the key point is that the WL test, and thus WL-GNNs, cannot distinguish these two nonisomorphic graphs.

Definition 2.1 (Circulant Skip Links (CSL) graphs): Let $R$ and $M$ be co-prime natural numbers¹ such that $R < M - 1$. $G_{skip}(M, R)$ denotes an undirected 4-regular graph with vertices $\{0, 1, \ldots, M-1\}$ whose edges form a cycle and have skip links. That is, for the cycle, $\{j, j+1\} \in E$ for $j \in \{0, \ldots, M-2\}$ and $\{M-1, 0\} \in E$. For the skip links, recursively define the sequence $s_1 = 0$, $s_{i+1} = (s_i + R) \bmod M$ and let $\{s_i, s_{i+1}\} \in E$ for any $i \in \mathbb{N}$.

We will use RP to help WL-GNNs overcome this shortcoming. Let $\vec{f}$ be a WL-GNN that we make permutation-sensitive by assigning each node an identifier that depends on $\pi$. Permutation-sensitive IDs prevent the RP sum from collapsing to just one term but, more importantly, help distinguish neighbors that otherwise appear identical. In particular, given any $\pi \in \Pi_{|V|}$, we append to the rows of $X_\pi^{(v)}$ one-hot encodings of the row number before computing $\vec{f}$. We can represent this by an augmented vertex attribute matrix…

¹Two numbers are co-prime if their only common factor is 1.
Follow-up work resolves these corner cases, but with exponential time complexity: [Murphy+ ICML 2019]
The two graphs look the same to the WL test because all of their nodes have the same local subtree structure!
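A small sketch constructing the CSL graphs of Definition 2.1 in the excerpt above (the function name is mine):

```python
from math import gcd

def csl_edges(M, R):
    """Edge set of G_skip(M, R) per Definition 2.1: an M-cycle plus
    skip links of stride R (requires gcd(M, R) == 1 and R < M - 1)."""
    assert gcd(M, R) == 1 and 1 < R < M - 1
    cycle = {frozenset((j, (j + 1) % M)) for j in range(M)}
    skips = {frozenset((s, (s + R) % M)) for s in range(M)}
    return cycle | skips

g2, g3 = csl_edges(11, 2), csl_edges(11, 3)
print(len(g2), len(g3))   # 22 and 22: both 4-regular on 11 nodes
# Both graphs are vertex-transitive, so 1-WL colors every node identically
# within each graph: identical histograms for two nonisomorphic graphs.
```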
¡ Graph classification: social and bio/chem graphs
Training accuracy of different GNN architectures: GIN fits the training data much better than GCN and GraphSAGE.
[Figure: training accuracy curves for the WL kernel, GIN, GCN, and GraphSAGE; the same trend holds across datasets]
¡ Graph classification: social and bio/chem graphs
GIN outperforms existing GNNs also in terms of test accuracy because it can better capture graph structure.
Data / Model   GIN (Powerful)   Sum-1   GCN (Mean)   GraphSAGE (Max)
RDT-B          92.4             90.0    50.0         50.0
IMDB-B         75.1             74.1    74.0         72.3
NCI1           82.7             82.0    80.2         77.7
MUTAG          89.4             90.0    85.6         85.1
Reddit dataset does not have node features!
¡ Existing GNNs use non-injective neighbor aggregation and thus have low discriminative power
¡ GIN uses injective neighbor aggregation, and is as discriminative as the WL graph isomorphism test
¡ GIN achieves state-of-the-art test performance in graph classification
¡ Deep Neural Networks are vulnerable to adversarial attacks!
¡ Attacks are often implemented as imperceptible noise that changes the prediction
[Figure: an imperceptible perturbation along the gradient direction changes the prediction]
¡ Adversaries are very common in applications of graph neural networks, e.g., search engines, recommender systems, social networks, etc.
¡ These adversaries will exploit any exposed vulnerabilities!
Are GNNs robust to adversarial attacks?
[Figure: a partially labeled graph; message passing propagates information from labeled nodes to unlabeled nodes (marked "?")]
¡ Here we focus on semi-supervised node classification using Graph Convolutional Neural Networks (GCN) [Kipf+ ICLR’2017].
Input: a partially labeled, attributed graph ("?" denotes an unlabeled node)
Goal: predict the labels of the unlabeled nodes
$A \in \{0,1\}^{N \times N}$: adjacency matrix; $X \in \{0,1\}^{N \times D}$: (binary) node attributes
$$\mathrm{softmax}\big(\hat{A}\,\mathrm{ReLU}(\hat{A} X W^{(1)})\, W^{(2)}\big)$$
$$\hat{A} \equiv \tilde{D}^{-1/2}(A + I)\tilde{D}^{-1/2}\text{: renormalized adjacency matrix}$$
Classification model: two-step GCN message passing
Training: minimize the cross-entropy loss on the labeled data
Testing: apply the model to predict the labels of unlabeled data
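A numpy sketch of this two-step model (forward pass only; training, label masking, and dimensions are omitted or illustrative):

```python
import numpy as np

def gcn_forward(A, X, W1, W2):
    """softmax(A_hat @ ReLU(A_hat @ X @ W1) @ W2), where
    A_hat = D~^{-1/2} (A + I) D~^{-1/2} is the renormalized adjacency."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    A_hat = A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    H = np.maximum(0.0, A_hat @ X @ W1)      # step 1 of message passing
    logits = A_hat @ H @ W2                  # step 2 of message passing
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)  # row-wise softmax -> class probs Z
```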
Target node $t \in V$: the node whose classification label we want to change. Attacker nodes $S \subset V$: the nodes the attacker can modify.
¡ What are the attack possibilities in the real world?
Direct attack ($S = \{t\}$):
¡ Modify the target's features (example: change website content)
¡ Add connections to the target (example: buy likes/followers)
¡ Remove connections from the target (example: unfollow untrusted users)
Indirect attack ($t \notin S$):
¡ Modify the attackers' features
¡ Add connections to the attackers
¡ Remove connections from the attackers
Examples: hijack friends of the target; create a link/spam farm
How can we mathematically formalize these attack possibilities?
¡ Zügner+, Adversarial Attacks on Neural Networks for Graph Data, KDD’18
Maximize: the change of the predicted label of the target node
Subject to: limited noise in the graph
[Figure: adding small graph noise changes the GCN's predicted class for the target node]
$$\operatorname*{argmax}_{A',\,X'}\ \max_{c \neq c_{old}}\ \log Z^{*}_{v,c} - \log Z^{*}_{v,c_{old}}$$
$$\text{where } Z^{*} = f_{\theta^{*}}(A', X') = \mathrm{softmax}\big(\hat{A}'\,\mathrm{ReLU}(\hat{A}' X' W^{(1)})\, W^{(2)}\big),$$
$$\text{with } \theta^{*} = \operatorname*{argmin}_{\theta} \mathcal{L}(\theta;\, A', X'), \qquad \text{s.t. } (A', X') \approx (A, X)$$
¡ Find a modified graph that maximizes the change of the predicted label of the target node
Let’s parse the objective function!
$A'$: modified adjacency matrix; $X'$: modified node features; $c_{old}$: original prediction; $c$: new prediction (specified by the attacker); $v$: target node
Increase the log-likelihood of target node $v$ being predicted as $c$; decrease the log-likelihood of $v$ being predicted as $c_{old}$.
The GCN is trained on the modified graph, which is then used to predict the label of the target node.
The modified graph should be close to the original graph, e.g., a fixed budget of edge deletions/additions or node attribute perturbations.
¡ In practice, we cannot exactly solve the optimization problem because…
§ Graph modification is discrete (we cannot use simple gradient descent to optimize)
§ The inner loop involves expensive re-training of the GCN
¡ Some heuristics have been proposed to efficiently obtain an approximate solution (see the sketch below). For example:
§ Greedily choosing the graph modifications step by step
§ Simplifying the GCN by removing the ReLU activation (to work in closed form)
§ Etc.
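A hedged sketch of the greedy loop; the scoring function `score_flip` is a hypothetical placeholder for the closed-form surrogate objective of Zügner+ KDD'18, which is not reproduced here:

```python
def greedy_structure_attack(A, target, candidate_edges, score_flip, budget=5):
    """Greedily apply the single edge flip that most increases the attack
    objective, repeating until the modification budget is spent.
    A is a dense 0/1 adjacency matrix (e.g., a numpy array)."""
    for _ in range(budget):
        # score_flip(A, target, e) is assumed to estimate how much flipping
        # edge e increases log Z*_{target,c} - log Z*_{target,c_old}
        u, v = max(candidate_edges, key=lambda e: score_flip(A, target, e))
        A[u, v] = A[v, u] = 1 - A[u, v]   # flip edge (add if absent, else delete)
    return A
```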
More details in Zügner+ KDD’2018!
¡ Semi-supervised node classification with GCN
§ Class predictions for a single node, produced by 5 GCNs with different random initializations
[Figure: predicted probabilities (clean) over classes 1-7; class 6 is correct]
Classification margin: > 0 means correct classification, < 0 means incorrect classification
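In the notation of the attack objective above, the classification margin plotted in these figures is (a standard definition, spelled out here for clarity):

$$\mathrm{margin}(v) = Z^{*}_{v,\,c_{true}} - \max_{c \neq c_{true}} Z^{*}_{v,\,c}$$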
¡ The GCN prediction is easily manipulated by only 5 modifications of the graph structure (|V| ≈ 2k, |E| ≈ 5k)
[Figure: predicted probabilities after the attack (5 modifications) over classes 1-7]
GNNs are not robust to adversarial attacks!
¡ Chemistry: molecular graphs § Molecular property prediction: GNN(molecule) = toxic?
¡ Biology: protein-protein interaction networks § Protein function prediction: GNN(protein) = adenylate cyclase activity?
¡ Scarcity of labeled data § Labels require expensive experiments → models overfit to small training datasets
¡ Out-of-distribution prediction § In scientific discovery, test examples are very different from training examples → models typically perform poorly
¡ Pre-training GNNs [Hu+ 2019]
[Figure: pre-train the GNN on relevant, easy-to-obtain graph data (unlabeled molecules from a chemistry database), then fine-tune with chemical knowledge on downstream tasks 1…N]
¡ We have seen how to attack GNNs
¡ Open question: how to defend against the attacks?
Challenges:
¡ Tractable optimization on discrete graph data
¡ Achieving a good trade-off between accuracy and robustness