

UAG: Uncertainty-aware Attention Graph Neural Network for Defending Adversarial Attacks

Boyuan Feng, Yuke Wang, Yufei Ding
{boyuan, yuke_wang, yufeiding}@ucsb.edu

University of California, Santa Barbara

Abstract

With the increasing popularity of graph-based learning, graph neural networks (GNNs) emerge as the essential tool for gaining insights from graphs. However, unlike conventional CNNs, which have been extensively explored and exhaustively tested, people still worry about the robustness of GNNs in critical settings such as financial services. The main reason is that existing GNNs usually serve as black boxes in prediction and do not provide uncertainty estimates on their predictions. On the other side, recent advances in Bayesian deep learning on CNNs have demonstrated success in quantifying and explaining such uncertainties to fortify CNN models. Motivated by these observations, we propose UAG, the first systematic solution to defend adversarial attacks on GNNs by identifying and exploiting hierarchical uncertainties in GNNs. UAG develops a Bayesian uncertainty technique to explicitly capture uncertainties in GNNs and further employs an uncertainty-aware attention technique to defend adversarial attacks on GNNs. Intensive experiments show that our proposed defense approach outperforms state-of-the-art solutions by a significant margin.

Introduction

As the emerging trend of extending deep learning from Euclidean data (e.g., images) to non-Euclidean data (e.g., graphs), the graph neural network (GNN) (Xu et al. 2019b; Kipf and Welling 2017; Veličković et al. 2018) has won much attention from both research and industrial domains. Compared with conventional graph-learning approaches (e.g., random walks (Grover and Leskovec 2016; Perozzi, Al-Rfou, and Skiena 2014) and graph Laplacians (Luo et al. 2011, 2009; Cheng et al. 2018)), GNNs excel at both computational efficiency and runtime performance for various tasks, such as node classification (Kaspar and Horst 2010; Gibert, Valveny, and Bunke 2012; Duran and Niepert 2017) and link prediction (Chen, Li, and Huang 2005; Kunegis and Lommatzsch 2009; Tylenda, Angelova, and Bedathur 2009). Despite this stunning success, concerns remain about the robustness of GNNs, especially in safety-critical domains (e.g., financial services and medicinal chemistry). Existing work (Zügner and Günnemann 2019; Xu et al. 2019a; Waniek et al.

Copyright © 2021, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

2018) has shown that GNNs are sensitive to small perturbations on the topology and the node features, which motivates our work on defending adversarial attacks.

The most recent work, RGCN (Zhu et al. 2019), improves GNN robustness with a simple strategy that replaces deterministic GNN features with a Gaussian distribution and measures the variance in the intermediate feature vectors. However, it assumes fixed GNN weights without quantifying the uncertainty from the GNN model and does not consider the uncertainty from the graph topology, leading to unsatisfactory accuracy under severe attacks. We believe the key to improving GNN robustness is to develop a powerful technique to quantify and exploit uncertainties from various sources to absorb the effect of adversarial attacks.

In this paper, we focus on exploring the benefits of explicitly quantifying GNN uncertainty to defend GNNs against adversarial attacks. In particular, there are two major types of uncertainties in GNNs: the model uncertainty and the data uncertainty. The former refers to the uncertainty in the model parameters, telling whether the selected parameters can best suit the distribution of the collected data. The latter refers to the uncertainty in the noisy data collection, coming either from the noises in the data collection process or from the adversarial attacks. However, exploring these uncertainties is non-trivial, since there are several challenges to overcome:

1. Uncertainty Measurement: How to explicitly measure the uncertainty of GNNs?

2. Robustness: How to effectively incorporate the measured uncertainty into existing GNNs for defending adversarial attacks?

To tackle these challenges, we propose the first Bayesian-based, uncertainty-guided approach to defend GNNs effectively. First, we develop a Bayesian uncertainty technique based on the powerful Bayesian framework to capture these uncertainties from different sources. Intuitively, we measure an uncertainty value for individual nodes, where a higher uncertainty usually indicates a lower prediction accuracy. Then, we design an uncertainty-aware attention technique to dynamically adjust the impact of one node towards its neighboring nodes according to its uncertainty. In particular, for nodes with high uncertainty that may have been attacked,



we restrict their feature propagation towards neighboring nodes in order to absorb the attack impact.

In short, we summarize our contributions as follows:

• We identify two types of uncertainties in GNNs (i.e., model uncertainty and data uncertainty) and propose a Bayesian uncertainty technique to explicitly capture both types of uncertainties.

• We introduce an uncertainty-aware attention technique to defend against adversarial attacks by assigning less impact (lower weights) to nodes with high uncertainty, thus mitigating their effect on the final prediction.

• Rigorous experiments and studies across various datasets on mainstream GNNs show that our proposed defense approach outperforms the state-of-the-art RGCN by a significant margin.

Related Work

Graph Neural Network. Graph Neural Networks (GNNs) are now becoming a major way of gaining insights from graph structures. A GNN generally includes several graph convolutional layers, each of which consists of a neighbor aggregation and a node update step. The most common graph convolutional layer (Kipf and Welling 2017) computes the embedding for node v at layer k + 1 based on the node embeddings at layer k, where k ≥ 0:

$$h_v^{(k+1)} = \sigma\Big(\sum_{u \in \tilde{N}(v)} \frac{1}{c_u c_v}\, h_u^{(k)} \cdot W^{(k)}\Big) \quad (1)$$

As shown in Equation 1, h_v^{(k)} is the embedding vector for node v at layer k, W^{(k)} is the GNN weight at layer k, and Ñ(v) = N(v) ∪ {v} is the set containing node v and its neighboring nodes. c_u and c_v are fixed values determined by the degrees of nodes u and v and will be omitted in the following sections for notational simplicity. Intuitively, the graph convolution layer aggregates information across nodes by averaging the features of nearby nodes; a short code sketch follows. More advanced GNNs utilize different aggregation methods. For example, GAT (Veličković et al. 2018) aggregates node features with a weighted average based on the similarity between node features.
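To make the aggregation in Equation 1 concrete, the following minimal PyTorch sketch (our illustration, not the authors' released code) implements the layer with dense tensors; the symmetric normalization recovers the 1/(c_u c_v) factor from the node degrees.

import torch

def gcn_layer(A, X, W):
    # One GCN layer (Eq. 1): h_v' = sigma( sum_{u in N(v) ∪ {v}} 1/(c_u c_v) h_u W ).
    # A: [N, N] adjacency, X: [N, f_k] node features, W: [f_k, f_{k+1}] weights.
    A_hat = A + torch.eye(A.size(0))                 # add self-loops: N(v) ∪ {v}
    deg = A_hat.sum(dim=1)                           # node degrees define c_u, c_v
    d = deg.pow(-0.5)
    norm = d.unsqueeze(1) * A_hat * d.unsqueeze(0)   # entry (v, u) = 1/(c_u c_v)
    return torch.relu(norm @ X @ W)                  # sigma chosen as ReLU here

Sparse message passing (as in PyTorch Geometric) computes the same result without materializing the dense N × N matrix.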

Graph Adversarial Attacks and Defense. Existing works have explored the robustness of GNNs in two opposite but closely related directions: GNN attacks and GNN defense. On the attack side, existing GNN attacks can be broadly classified into two major categories, poisoning (Zügner, Akbarnejad, and Günnemann 2018; Zügner and Günnemann 2019) and evasion (Dai et al. 2018), depending on when they happen. The former (poisoning attack) happens during the training time of the GNN by modifying training data, and the latter (evasion attack) takes place during GNN inference by changing test data samples. Our work is orthogonal and complementary to this existing GNN attack research, since 1) our goal is to minimize the impact of these GNN attacks by incorporating model and data uncertainties during the GNN computation; and 2) our defense-oriented research may potentially motivate more diverse adversarial attacks tailored for GNNs.

[Figure 1: Overview of UAG. The graph topology (A) and node features (X) feed a probabilistic branch (the Model Uncertainty Branch, MUB, and the Data Uncertainty Branch, DUB, producing the model and data uncertainty) and a deterministic branch of attention-equipped GCN convolutions producing the prediction.]

On the defense side, RGCN (Zhu et al. 2019) proposes a novel model to make GCN immune to adversarial attacks by leveraging Gaussian distributions to reduce the impact of GNN attacks. Different from RGCN, our UAG is the first work to identify and quantify how adversarial attacks affect GNN performance through model and data uncertainties, taking both the model (weights) and the data (topology and embedding features) into consideration. We further exploit such uncertainty information by incorporating novel techniques to facilitate the defense.

Bayesian Neural Network and Uncertainty. Many research efforts have been made towards developing Bayesian Neural Networks to measure uncertainty (Gal 2016) in computer vision (Kendall and Gal 2017; Gal and Ghahramani 2016a,b; Alex Kendall and Cipolla 2017), natural language processing (Xiao and Wang 2019), and time series analysis (Zhu and Laptev 2017). These works usually focus on convolutional neural networks and use the Bayesian Neural Network as a regularization technique. Some recent contributions (Zhang et al. 2019; Hasanzadeh et al. 2020) extend the Bayesian Neural Network to the graph domain as a stochastic regularization technique. These works aim to solve the over-smoothing problem in GNNs and do not explicitly quantify the uncertainty. To the best of our knowledge, we are the first to exploit uncertainty in the graph domain to defend adversarial attacks.

Methodology

The overview of UAG is presented in Figure 1. UAG takes two inputs: the adjacency matrix A ∈ R^{N×N} and the node features X ∈ R^{N×D}, where N is the number of nodes and D is the dimension of the node features. There are two main branches in UAG: the probabilistic branch (including the Model Uncertainty Branch (MUB) and the Data Uncertainty Branch (DUB)) and the deterministic branch, where the architecture and weights differ across branches. Given the graph data (A, X), the probabilistic branch measures the node-wise uncertainty U = [U_M, U_D] ∈ R^{N×2} from the GNN model weights and the graph data. Here, the probabilistic branch adopts a novel Bayesian uncertainty technique to measure the uncertainty of each node under adversarial attacks. The deterministic branch takes the measured node-wise uncertainty U and the graph data (A, X) to generate the node classification results Y ∈ R^N. It contains an uncertainty-aware attention technique that adaptively adjusts the edge attention during inference to defend against adversarial attacks.
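As a reading aid, the two-branch dataflow of Figure 1 can be summarized by the following schematic sketch; prob_branch and det_branch are hypothetical placeholders for the branches detailed in the rest of this section.

import torch

def uag_forward(A, X, prob_branch, det_branch):
    # Probabilistic branch: node-wise model / data uncertainty, each of shape [N].
    U_M, U_D = prob_branch(A, X)
    U = torch.stack([U_M, U_D], dim=1)   # U = [U_M, U_D] ∈ R^{N x 2}
    # Deterministic branch: attention-equipped GCN guided by U.
    Y = det_branch(A, X, U)              # node classification results
    return Y, U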


[Figure 2: Bayesian Uncertainty Technique Overview. (a) Model Uncertainty Branch (MUB); (b) Data Uncertainty Branch (DUB).]

Bayesian Uncertainty Technique

The Bayesian uncertainty technique measures two sources of uncertainty: the model uncertainty U_M ∈ R^N and the data uncertainty U_D ∈ R^N. Formally, following the law of total variance, the uncertainty in the prediction Y is

$$Var(Y) = Var(E[Y|A,X]) + E[Var(Y|A,X)] = U_M(Y|A,X) + U_D(Y|A,X) \quad (2)$$

Here, we use a model uncertainty branch (MUB) and a data uncertainty branch (DUB) to assess U_M(Y|A,X) and U_D(Y|A,X), respectively.

Model Uncertainty. The model uncertainty U_M measures the uncertainty in the mapping process E[Y|A,X] due to model weight selection. We use a 2-layer GCN to quantify the model uncertainty. Instead of using fixed weights, we utilize a probability distribution to describe the uncertainty from the model weights, as illustrated in Figure 2(a). Formally, given the graph data (A, X) and the partial labels Y ∈ R^{N_L}, with N_L as the number of labeled nodes, we first train the weight posterior distribution p(W|A,X). Then, we conduct the prediction mapping procedure as

$$p(Y|X,A) = \int_W p(Y|W,A,X)\, p(W|A,X)\, dW \quad (3)$$

Since the exact inference is intractable, we adopt the MC dropout variational inference (Gal and Ghahramani 2016a) method by multiplying a deterministic model weight W_MUB with a random variable B following the Bernoulli distribution. This provides q(W) as an approximation to the true posterior p(W|A,X). In particular, the model weights W follow

$$q(W) \sim B \odot W_{MUB}, \qquad P(B) \sim Bernoulli(p) \quad (4)$$

where ⊙ is the Hadamard product, and p is a hyperparameter (0.8 by default in our evaluation). During training, we can train the weights by minimizing the cross-entropy loss

$$L_{model} = -\frac{1}{T} \sum_{t=1}^{T} \log p(Y_t|W_t,A,X) + \frac{1-p}{2T}\, \|W_{MUB}\|^2 \quad (5)$$

where W_t is sampled from q(W), Y_t is the prediction under the sampled weight W_t, and T is the number of samples during the MC dropout variational inference. During inference, we perform the Monte Carlo integration:

$$E(Y|A,X) = \frac{1}{T} \sum_{t=1}^{T} Y_t \quad (6)$$

where Y_t ∈ R^L is the prediction after the softmax layer, and L is the number of classes in the graph data.
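A minimal sketch of Equations 4–6 follows, assuming some GCN forward function gcn(A, X, W) that returns class logits (e.g., stacked layers like gcn_layer above); the Bernoulli mask B is resampled on every forward pass, and the sample count T = 20 is an illustrative choice that the paper does not fix here.

import torch
import torch.nn.functional as F

p, T = 0.8, 20                                   # keep probability (paper default) and MC samples

def sample_weight(W_MUB):
    # q(W) ~ B ⊙ W_MUB with P(B) ~ Bernoulli(p)  (Eq. 4)
    B = torch.bernoulli(torch.full_like(W_MUB, p))
    return B * W_MUB

def mc_loss(gcn, A, X, y, W_MUB):
    # L_model = -(1/T) sum_t log p(Y_t | W_t, A, X) + (1-p)/(2T) ||W_MUB||^2  (Eq. 5)
    nll = sum(F.cross_entropy(gcn(A, X, sample_weight(W_MUB)), y) for _ in range(T)) / T
    return nll + (1 - p) / (2 * T) * W_MUB.pow(2).sum()

def mc_predict(gcn, A, X, W_MUB):
    # Monte Carlo integration E(Y|A,X) = (1/T) sum_t Y_t  (Eq. 6)
    probs = torch.stack([F.softmax(gcn(A, X, sample_weight(W_MUB)), dim=1) for _ in range(T)])
    return probs, probs.mean(dim=0)              # [T, N, L] samples and their mean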

Here, we observe that applying the Bernoulli distribution at different granularities leads to different probabilistic interpretations. To provide a comprehensive measurement of the model uncertainty, we apply dropout independently for individual GNN layers, channels, nodes, and edges:

$$W_{uv}^{(k)} = B_{uv}^{(k)} \odot B_u^{(k)} \odot W^{(k)} \quad (7)$$

where W^{(k)} ∈ R^{f_k × f_{k+1}} is the GNN weight at the k-th layer, B_{uv}^{(k)} ∈ R determines the dropout at the edge level, B_u^{(k)} ∈ {0,1}^{f_k} drops the weight at the channel level, and f_k is the number of feature channels at layer k. From the perspective of individual nodes, we have

$$h_v^{(k+1)} = \sigma\Big(\sum_{u \in \tilde{N}(v)} h_u^{(k)} \cdot W_{uv}^{(k)}\Big) = \sigma\Big(\sum_{u \in \tilde{N}(v)} h_u^{(k)} \cdot \big(B_{uv}^{(k)} \odot B_u^{(k)} \odot W^{(k)}\big)\Big) \quad (8)$$

where Ñ(v) = N(v) ∪ {v}. Note that this dropout also provides a Bayesian view of dropping edges or nodes when either B_{uv}^{(k)} = 0 or B_u^{(k)} = 0.

Given the Bayesian framework on GNNs, we can measure the model uncertainty as the variance in the predictions:

$$U_M(Y|A,X) = Var(Y|A,X) = E(Y^2|A,X) - [E(Y|A,X)]^2 = \frac{1}{T} \sum_{t=1}^{T} Y_t^2 - [E(Y|A,X)]^2 \quad (9)$$

Here, we additionally apply a reduce operation to transform the L-dimensional vector U_M into a scalar value as the model uncertainty.
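Continuing the sketch above, the model uncertainty of Equation 9 is the variance across the T sampled predictions; the final mean over the L class dimensions is our assumed choice of the reduce operation, which the paper leaves unspecified.

import torch

def model_uncertainty(probs):
    # probs: [T, N, L] softmax outputs from T MC-dropout passes (e.g., from mc_predict above).
    mean = probs.mean(dim=0)                         # E(Y|A,X)
    var = probs.pow(2).mean(dim=0) - mean.pow(2)     # (1/T) sum_t Y_t^2 - E(Y)^2  (Eq. 9)
    return var.mean(dim=1)                           # reduce the L-dim vector to one scalar per node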


Figure 3: Relationship between accuracy and node diversity. DICE_0.2 and Mettack_0.2 indicate perturbing 20% of the edges with DICE and Mettack, respectively.

Data Uncertainty. The data uncertainty U_D measures the prediction noise intrinsic to the data inputs. There are two standard approaches to identify the data uncertainty of each node. One approach utilizes the maximum predicted probability to measure confidence in the prediction. However, this confidence comes as a side effect of the model training and lacks a sophisticated probabilistic interpretation. The other approach, from the CNN domain, predicts the uncertainty value based on the image inputs. However, naively borrowing this approach into the GNN domain focuses only on the node features and fails to exploit the important topology information in the graph data.

Instead, we aim to capture the data uncertainty that is intrinsic to the graph topology in terms of the node diversity Div_node^(k), defined as the number of different labels among a node's k-hop neighbors. Our key observation is that adversarial attacks on graph data usually increase the node diversity and add edge connections between nodes with different labels. For example, the DICE attack (delete edges internally, connect externally) (Waniek et al. 2018) exploits the node label information to increase node diversity by deleting edges between nodes with the same label and adding edges between nodes with different labels. Figure 3 shows the accuracy among nodes whose node diversity is larger than various thresholds. We observe that the accuracy usually decreases significantly as the node diversity increases, which holds for both the clean graph data and the attacked graph data from various attacking algorithms.
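Node diversity is straightforward to compute; the sketch below is our interpretation, counting distinct labels among a node's labeled k-hop neighbors (marking unlabeled nodes with -1 is a convention we introduce for illustration).

import torch

def node_diversity(A, labels, k=2):
    # Div_node^(k): number of distinct labels among the k-hop neighbors of each node.
    N = A.size(0)
    reach = torch.eye(N, dtype=torch.bool) | A.bool()     # 1-hop neighborhood (with self)
    for _ in range(k - 1):
        reach = reach | (reach.float() @ A).bool()        # grow the neighborhood by one hop
    div = torch.zeros(N)
    for v in range(N):
        labs = labels[reach[v]]
        div[v] = labs[labs >= 0].unique().numel()         # only labeled neighbors count
    return div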

To this end, we explicitly measure the data uncertainty by treating the prediction as a Gaussian distribution and setting the variance to be the node diversity, as illustrated in Figure 2(b). Formally, we have

$$Y \sim N(\mu(A,X), \sigma^2(A,X)) \quad (10)$$

where μ and σ² are the predicted label and the node diversity, respectively. Here, we parameterize μ and σ² with the adjacency matrix A and the node features X and use a 2-layer GCN to predict their values. During inference, we use σ²(A,X) as the data uncertainty:

$$U_D(Y|A,X) = \sigma^2(A,X) \quad (11)$$

Figure 4: Illustration of UAT. (a) Aggregation on the clean graph; (b) aggregation when the red node is attacked.

To train the data uncertainty, we have two losses, on the labeled nodes and the unlabeled nodes, respectively. On the labeled nodes, we focus on the ground-truth labels and use a KL divergence that requires the predicted distribution to match the ground-truth distribution:

$$L_1 = KL\big(N(\mu(A,X), \sigma^2(A,X))\, \|\, N(Y, \sigma^2)\big) \quad (12)$$

where Y comes from the ground-truth label and σ² measures the node diversity Div_node^(k) in the graph data. When computing the node diversity, we utilize only labeled nodes in the clean graph data. In particular, for a given node, we first collect all labeled 2-hop neighboring nodes and then count the number of distinct labels. On the unlabeled nodes, similar to unsupervised learning (Bojchevski and Günnemann 2018) on graph data, we focus on the graph topology and adopt an energy-based unsupervised loss:

$$L_2 = \sum_i \sum_{k<l} \sum_{j_k \in N_{ik}} \sum_{j_l \in N_{il}} \big(E_{ij_k}^2 + \exp(-E_{ij_l})\big), \qquad E_{ij} = D_{KL}\big(N(Y_j, \sigma_j^2)\, \|\, N(Y_i, \sigma_i^2)\big) \quad (13)$$

Assuming that a node tends to have a similar label to its neighboring nodes, this loss implicitly captures the node diversity by forcing higher feature similarities between neighboring nodes.
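Both losses have short closed forms for univariate Gaussians. The sketch below is our reading of Equations 12–13; the triplet sampling (a node i with a nearer-hop neighbor j_k and a farther-hop neighbor j_l, k < l) follows the ranking scheme of Bojchevski and Günnemann (2018).

import torch

def kl_gauss(mu1, var1, mu2, var2):
    # Closed-form KL( N(mu1, var1) || N(mu2, var2) ), elementwise.
    return 0.5 * (torch.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

def loss_labeled(mu, var, y, div):
    # L1 (Eq. 12): match the predicted Gaussian to N(ground-truth label, node diversity).
    return kl_gauss(mu, var, y.float(), div).mean()

def loss_unlabeled(mu, var, triplets):
    # L2 (Eq. 13): E_ij = KL( N(Y_j, var_j) || N(Y_i, var_i) ); push the energy of the
    # nearer neighbor down (E^2 term) and of the farther one up (exp(-E) term).
    loss = torch.tensor(0.0)
    for i, j_near, j_far in triplets:
        e_near = kl_gauss(mu[j_near], var[j_near], mu[i], var[i])
        e_far = kl_gauss(mu[j_far], var[j_far], mu[i], var[i])
        loss = loss + e_near ** 2 + torch.exp(-e_far)
    return loss / max(len(triplets), 1)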

Uncertainty-aware Attention Technique

The Uncertainty-aware Attention Technique (UAT) adaptively adjusts the edge attention during inference to defend against adversarial attacks, as illustrated in Figure 4. We equip a 2-layer GCN with our UAT. On a clean graph (Figure 4a), we adopt edge aggregation similar to existing GNNs, where each node aggregates and propagates information across neighboring nodes. On an attacked graph (e.g., the red node in Figure 4b), UAT adaptively limits the information propagation between the attacked node and other nodes. While existing works (Zügner, Akbarnejad, and Günnemann 2018) have shown that attacking one node in the graph can also lead to wrong predictions on other nodes, UAT mitigates this by reducing the impact of attacked nodes on the remaining nodes.

Formally, given the feature h_u^{(k)} for each node u at layer k, we compute an attention Att_τ(u) for each node u and compute each GNN layer as

$$h_v^{(k+1)} = \sigma\Big(\sum_{u \in \tilde{N}(v)} Att_\tau^{uv} \cdot h_u^{(k)} \cdot W^{(k)}\Big), \qquad Att_\tau^{uv} = \min(Att_\tau(u), Att_\tau(v)) \quad (14)$$


where τ ∈ {M, D} indicates whether we are using the model uncertainty or the data uncertainty, and each node embedding h_u^{(k)} is weighted by an attention value. Here, we use the attention from both nodes u and v to decide the attention value on the edge. Note that the deterministic branch focuses on improving accuracy and utilizes weights independent from the probabilistic branch, which focuses on capturing uncertainties. We design two attentions to measure the model uncertainty and the data uncertainty, respectively:

$$Att_\tau(u) = \exp(-\zeta \cdot U_{\tau,u}), \qquad \zeta = \alpha_\tau \cdot \exp(-\beta_\tau \cdot Range(U_\tau)) \quad (15)$$

where U_{τ,u} measures the uncertainty on node u, and α_τ > 0 and β_τ > 0 are two hyperparameters that adjust the impact of the uncertainty. Intuitively, a larger uncertainty U_{τ,u} on a node u leads to a lower weight in the information propagation along the graph topology. Here, we additionally utilize a Range(U_τ) operation to measure the global uncertainty diversity in order to absorb uniform changes in the uncertainty scale across all nodes under diverse attacks. In particular, we collect U_τ for all nodes and measure Range(U_τ) as the absolute difference between the first and third quartiles. We have investigated several functions to combine the data uncertainty and the model uncertainty, and find that a simple minimum combination already leads to good performance:

$$Att_{Both}(u) = \min(Att_M(u), Att_D(u)) \quad (16)$$

Intuitively, we restrict the information propagation from a node when it shows either a high model uncertainty or a high data uncertainty.
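Putting Equations 14–16 together, an attention-equipped layer can be sketched as follows (our dense illustration; α = 15 and β = 2.5 are the defaults reported in the parameter analysis below).

import torch

def att_from_uncertainty(U, alpha=15.0, beta=2.5):
    # Att_tau(u) = exp(-zeta * U_tau,u), zeta = alpha * exp(-beta * Range(U_tau))  (Eq. 15)
    iqr = torch.quantile(U, 0.75) - torch.quantile(U, 0.25)   # Range(U): interquartile range
    zeta = alpha * torch.exp(-beta * iqr.abs())
    return torch.exp(-zeta * U)                               # lower weight for uncertain nodes

def uat_layer(A, X, W, U_M, U_D):
    # Combined attention Att_Both(u) = min(Att_M(u), Att_D(u))  (Eq. 16)
    att = torch.minimum(att_from_uncertainty(U_M), att_from_uncertainty(U_D))
    A_hat = A + torch.eye(A.size(0))                          # self-loops as in Eq. 1
    att_uv = torch.minimum(att.unsqueeze(1), att.unsqueeze(0)) * A_hat   # Att^{uv}  (Eq. 14)
    return torch.relu(att_uv @ X @ W)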

Evaluation

In this section, we evaluate UAG on three popular datasets and compare it with three baselines to show its effectiveness.

Experiment Environments

Datasets. We select the most typical datasets (Cora, Citeseer, and Pubmed) used by many GNN papers (Kipf and Welling 2017; Xu et al. 2019b; Hamilton, Ying, and Leskovec 2017) to evaluate our UAG. In these datasets, nodes represent documents, edges refer to citations, and each node has its own associated bag-of-words features. Table 1 summarizes the details of these datasets. We follow the common data split by selecting 10% of the nodes as the training set, 10% as the validation set, and 80% as the testing set.

Table 1: Datasets for Evaluation.

Dataset    #Vertex   #Edge    #Dim    #Class
Citeseer   3,327     9,464    3,703   6
Cora       2,708     10,858   1,433   7
Pubmed     19,717    88,676   500     3

Baselines. Graph Convolutional Network (GCN) (Kipf and Welling 2017) is one of the most popular GNN architectures. It has been widely adopted in node classification, graph classification, and link prediction tasks. Besides, it is also the key backbone network for many other GNNs, such as GraphSage (Hamilton, Ying, and Leskovec 2017) and differentiable pooling (Diffpool) (Ying et al. 2018). Graph Attention Network (GAT) (Veličković et al. 2018), another typical type of GNN, aims to distinguish graph structures that cannot be identified by GCN. GAT differs from GCN in its aggregation function, which assigns different weights to different nodes during aggregation. Robust GCN (RGCN) (Zhu et al. 2019) leverages Gaussian distributions for node representations to amortize the effects of adversarial attacks.

Attack Methods. Random Attack is a popular attack method that randomly adds fake edges to the graph dataset without considering node labels. The DICE attack (delete edges internally, connect externally) (Waniek et al. 2018) exploits the node label information to increase node diversity by deleting edges between nodes with the same label and adding edges between nodes with different labels (a short sketch is given at the end of this subsection). Mettack (Zügner and Günnemann 2019) is another representative attack that adopts a meta-learning approach to reason about the loss change by iteratively perturbing individual edges and features.

Platforms. We implement UAG based on PyTorch Geometric (Fey and Lenssen 2019). We evaluate UAG on a Dell Workstation T7910 (Ubuntu 18.04) with an Intel Xeon CPU E5-2603, 64 GB of memory, and an NVIDIA 1080Ti GPU with 12 GB of memory.
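Of the three attacks, DICE is simple enough to sketch in a few lines; the following is a minimal dense-matrix illustration of the perturbation as described, not the reference implementation.

import torch

def dice_attack(A, labels, n_attempts):
    # Delete edges Internally, Connect Externally: make up to n_attempts random perturbations.
    A = A.clone()
    N = A.size(0)
    for _ in range(n_attempts):
        u, v = torch.randint(N, (2,))
        if u == v:
            continue
        if labels[u] == labels[v] and A[u, v] > 0:
            A[u, v] = A[v, u] = 0      # delete an edge inside a class
        elif labels[u] != labels[v] and A[u, v] == 0:
            A[u, v] = A[v, u] = 1      # add an edge across classes
    return A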

Overall Performance

In this section, we demonstrate the effectiveness of our proposed UAG approach (UAG-Both) by comparing accuracy (under different attack methods and different attack ratios) with the original unoptimized GCN and GAT models as well as with the state-of-the-art RGCN (Zhu et al. 2019) defense method. Besides, to gain more design insights, we add two more baselines for comparison: UAG-Data (only considering data uncertainty) and UAG-Model (only considering model uncertainty). We show the accuracy under diverse attack ratios, defined as the number of attacked edges over the number of total edges. For Mettack on the Pubmed dataset, since running the adversarial attack for all nodes is very time-consuming, we randomly sample 10% of them. For the other datasets and attack methods, we attack all nodes.

Figure 5 shows the accuracy comparison for the Random Attack among different implementations on the Cora, Citeseer, and Pubmed datasets. As we can tell from these three sub-plots, the UAG method and its variants (UAG-Data, UAG-Model, and UAG-Both) consistently outperform the state-of-the-art RGCN defense approach. The major source of this performance improvement is that UAG effectively captures the data uncertainty and the model uncertainty, based on which it adaptively adjusts the edge weights and the amount of information propagation between nodes. For individual dataset settings, with the increase of the attack ratio, we see an overall trend of decreasing accuracy among these implementations. We also notice that on the Cora and Citeseer datasets, RGCN offers a notable accuracy improvement over the original GAT and GCN. However, it is still inferior to our UAG approach, since we explicitly capture the uncertainties, instead of relying on a


Figure 5: Results of different methods when adopting Random Attack as the attack method.

Figure 6: Results of different methods when adopting DICE Attack as the attack method.

Figure 7: Results of different methods when adopting Mettack as the attack method.

Figure 8: Relationship between Accuracy and Uncertainty. Left: Model Uncertainty vs. Accuracy. Mid: Data Uncertainty vs. Accuracy. Right: Data Uncertainty vs. True Diversity.


[Figure 9: Loss Design Benefits and Parameter Analysis. (a) Benefit of Loss Designs; (b) Impact of Data Hyperparameters (α_D ∈ {10, 15, 20}, β_D ∈ {2.0, 2.5, 3.0}); (c) Impact of Model Hyperparameters (α_M ∈ {10, 15, 20}, β_M ∈ {2.0, 2.5, 3.0}). Each panel plots accuracy (%) against attack ratio, with GCN as the baseline.]

[Figure 10: Left: Accuracy of Static Edge Weights. Right: Edge weight distribution (frequency vs. weight value) under Random Attack with attack ratios of 20% and 80%.]

Gaussian distribution to implicitly defend against adversarial attacks. Another observation is that UAG-Both usually outperforms UAG-Data and UAG-Model under diverse datasets and attack ratios. The reason is that the data uncertainty and the model uncertainty capture uncertainties from different sources, and combining these two values usually offers a more comprehensive measurement of the prediction uncertainty.

On the DICE Attack (Figure 6), besides observations similar to the above attack setting, we make further observations. GAT and RGCN are sensitive to the DICE attack, as demonstrated by the significant accuracy drop with increasing attack ratio, because DICE intentionally increases the node diversity by adding cross-community connections (i.e., edges between nodes with different labels). In contrast, our UAG approach can handle this attack effectively, because it explicitly captures the node diversity as the data uncertainty. We also observe a similar performance trend for Mettack (Figure 7) as for the previous two types of attacks, which further demonstrates the advantage of our UAG approach in terms of higher accuracy under diverse attack ratios.

Ablation Studies

In this section, we conduct a set of ablation studies for in-depth analysis.

Accuracy and Uncertainty. As shown in Figure 8(a) and (b), an increase in the model uncertainty or the data uncertainty leads to a decrease in accuracy. This also strengthens our initial assumption that the relationship between uncertainties and accuracy can be exploited to defend adversarial attacks. Figure 8(c) exhibits the relation between the data uncertainty and the node diversity, showing the effectiveness of our loss design (Eq. 12) in learning the node diversity.

Loss Design Benefits. Figure 9(a) validates the effectiveness

of the two loss designs for the data uncertainty. Here, L1 (Eq. 12) forces UAG to learn the node diversity on labeled nodes, while L2 (Eq. 13) is an unsupervised loss that implicitly encodes the node diversity on unlabeled nodes. We observe that UAG with L1+L2 outperforms L1 or L2 alone, since it fully exploits both the labeled and the unlabeled nodes.

Parameter Analysis. Figures 9(b) and (c) show the impact of different values of the data-uncertainty-related and model-uncertainty-related hyperparameters on UAG's performance. Intuitively, larger values of α_τ and lower values of β_τ lead to a stronger impact from the uncertainties and higher accuracy under large attack ratios, where τ ∈ {M, D}. However, setting α_τ too large or β_τ too small may also impose too many constraints on the information propagation. In our experiments, we achieve satisfactory results by using α_τ = 15 and β_τ = 2.5.

Uncertainty on Attention Values. Figure 10 visualizes the weight changes under different ratios of the Random Attack. We notice that a higher random attack ratio leads to a weight distribution denser towards zero. This is because our UAG approach tries to amortize the impact of such attacks by moving weight values towards zero, which minimizes the value propagation between neighboring nodes. Furthermore, we consider pre-assigning different static weights to show the key importance of the weight value under adversarial attacks. We can see that the higher the static weight value, the poorer the performance in maintaining model accuracy under attack. The major reason is that more "attack impact" is propagated to different nodes through node aggregation, thus lowering the model's overall performance. Our UAG can adaptively determine the weight value for different nodes based on the model and data uncertainty, thus largely absorbing the influence of the adversarial attack.

Conclusion

In this paper, we propose UAG, the first systematic defense solution against adversarial attacks on GNNs that considers hierarchical uncertainties in GNNs. UAG incorporates a Bayesian uncertainty technique to explicitly capture uncertainty in GNNs and further employs an uncertainty-aware attention technique to fortify GNNs. Extensive experiments further demonstrate UAG's advantages over state-of-the-art solutions. Overall, our work paves a new way for exploring the benefits of uncertainty in GNN research.


References

Alex Kendall, V. B.; and Cipolla, R. 2017. Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding. In Tae-Kyun Kim, Stefanos Zafeiriou, G. B.; and Mikolajczyk, K., eds., Proceedings of the British Machine Vision Conference (BMVC), 57.1–57.12. BMVA Press. doi:10.5244/C.31.57.

Bojchevski, A.; and Günnemann, S. 2018. Deep Gaussian Embedding of Graphs: Unsupervised Inductive Learning via Ranking. In International Conference on Learning Representations. URL https://openreview.net/forum?id=r1ZdKJ-0W.

Chen, H.; Li, X.; and Huang, Z. 2005. Link prediction approach to collaborative filtering. In Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), 141–142. IEEE.

Cheng, D.; Gong, Y.; Chang, X.; Shi, W.; Hauptmann, A.; and Zheng, N. 2018. Deep feature learning via structured graph Laplacian embedding for person re-identification. Pattern Recognition 82: 94–104.

Dai, H.; Li, H.; Tian, T.; Huang, X.; Wang, L.; Zhu, J.; and Song, L. 2018. Adversarial attack on graph structured data. arXiv preprint arXiv:1806.02371.

Duran, A. G.; and Niepert, M. 2017. Learning graph representations with embedding propagation. In Advances in Neural Information Processing Systems (NIPS), 5119–5130.

Fey, M.; and Lenssen, J. E. 2019. Fast Graph Representation Learning with PyTorch Geometric. In ICLR Workshop on Representation Learning on Graphs and Manifolds.

Gal, Y. 2016. Uncertainty in Deep Learning. Ph.D. thesis, University of Cambridge.

Gal, Y.; and Ghahramani, Z. 2016a. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. In Proceedings of the 33rd International Conference on Machine Learning (ICML), 1050–1059. JMLR.org.

Gal, Y.; and Ghahramani, Z. 2016b. A Theoretically Grounded Application of Dropout in Recurrent Neural Networks. In Proceedings of the 30th International Conference on Neural Information Processing Systems (NIPS), 1027–1035. Red Hook, NY, USA: Curran Associates Inc.

Gibert, J.; Valveny, E.; and Bunke, H. 2012. Graph embedding in vector spaces by node attribute statistics. Pattern Recognition 45(9): 3072–3083.

Grover, A.; and Leskovec, J. 2016. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), 855–864.

Hamilton, W.; Ying, Z.; and Leskovec, J. 2017. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems (NIPS), 1024–1034.

Hasanzadeh, A.; Hajiramezanali, E.; Boluki, S.; Zhou, M.; Duffield, N.; Narayanan, K.; and Qian, X. 2020. Bayesian Graph Neural Networks with Adaptive Connection Sampling. In International Conference on Machine Learning (ICML). URL https://arxiv.org/abs/2006.04064.

Kaspar, R.; and Horst, B. 2010. Graph Classification and Clustering Based on Vector Space Embedding, volume 77. World Scientific.

Kendall, A.; and Gal, Y. 2017. What uncertainties do we need in Bayesian deep learning for computer vision? In Advances in Neural Information Processing Systems (NeurIPS), 5574–5584.

Kipf, T. N.; and Welling, M. 2017. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR).

Kunegis, J.; and Lommatzsch, A. 2009. Learning spectral graph transformations for link prediction. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML), 561–568.

Luo, D.; Ding, C.; Huang, H.; and Li, T. 2009. Non-negative Laplacian embedding. In 2009 Ninth IEEE International Conference on Data Mining (ICDM), 337–346. IEEE.

Luo, D.; Nie, F.; Huang, H.; and Ding, C. H. 2011. Cauchy graph embedding. In Proceedings of the 28th International Conference on Machine Learning (ICML), 553–560.

Perozzi, B.; Al-Rfou, R.; and Skiena, S. 2014. DeepWalk: Online Learning of Social Representations. In Proceedings of the 20th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), 701–710. New York, NY, USA: Association for Computing Machinery. doi:10.1145/2623330.2623732.

Tylenda, T.; Angelova, R.; and Bedathur, S. 2009. Towards time-aware link prediction in evolving social networks. In Proceedings of the 3rd Workshop on Social Network Mining and Analysis, 1–10.

Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; and Bengio, Y. 2018. Graph Attention Networks. In International Conference on Learning Representations (ICLR). URL https://openreview.net/forum?id=rJXMpikCZ.

Waniek, M.; Michalak, T. P.; Wooldridge, M. J.; and Rahwan, T. 2018. Hiding individuals and communities in a social network. Nature Human Behaviour 2: 139.

Xiao, Y.; and Wang, W. Y. 2019. Quantifying uncertainties in natural language processing tasks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, 7322–7329.

Xu, K.; Chen, H.; Liu, S.; Chen, P.-Y.; Weng, T.-W.; Hong, M.; and Lin, X. 2019a. Topology attack and defense for graph neural networks: An optimization perspective. arXiv preprint arXiv:1906.04214.

Xu, K.; Hu, W.; Leskovec, J.; and Jegelka, S. 2019b. How Powerful are Graph Neural Networks? In International Conference on Learning Representations (ICLR). URL https://openreview.net/forum?id=ryGs6iA5Km.

Ying, R.; You, J.; Morris, C.; Ren, X.; Hamilton, W. L.; and Leskovec, J. 2018. Hierarchical Graph Representation Learning with Differentiable Pooling. In Proceedings of the 32nd International Conference on Neural Information Processing Systems (NIPS), 4805–4815. Red Hook, NY, USA.

Zhang, Y.; Pal, S.; Coates, M.; and Ustebay, D. 2019. Bayesian Graph Convolutional Neural Networks for Semi-Supervised Classification. Proceedings of the AAAI Conference on Artificial Intelligence 33: 5829–5836. doi:10.1609/aaai.v33i01.33015829.

Zhu, D.; Zhang, Z.; Cui, P.; and Zhu, W. 2019. Robust graph convolutional networks against adversarial attacks. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 1399–1407.

Zhu, L.; and Laptev, N. 2017. Deep and Confident Prediction for Time Series at Uber. In 2017 IEEE International Conference on Data Mining Workshops (ICDMW), 103–110.

Zügner, D.; Akbarnejad, A.; and Günnemann, S. 2018. Adversarial attacks on neural networks for graph data. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2847–2856.

Zügner, D.; and Günnemann, S. 2019. Adversarial attacks on graph neural networks via meta learning. arXiv preprint arXiv:1902.08412.