
Transformers are Graph Neural Networks

Chaitanya K. Joshi

Graph Deep Learning Reading Group

Full blog post: https://graphdeeplearning.github.io/post/transformers-are-gnns/

Some success stories: GNNs for RecSys

Another success story? The Transformer architecture for NLP

Representation Learning for NLP

Breaking down the Transformer

We update the hidden feature h of the i'th word in a sentence S from layer ℓ to layer ℓ+1 as follows:

$$h_i^{\ell+1} = \sum_{j \in S} w_{ij} \left( V^{\ell} h_j^{\ell} \right), \qquad w_{ij} = \text{softmax}_j \left( Q^{\ell} h_i^{\ell} \cdot K^{\ell} h_j^{\ell} \right)$$

where j ∈ S denotes the set of words in the sentence and Q, K, V are learnable linear weights.
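As a concrete reference, here is a minimal PyTorch sketch of this single-head update. The class name and the (n_words, d_model) per-sentence layout are illustrative assumptions, not code from the talk:

```python
import torch
import torch.nn.functional as F

class SingleHeadAttention(torch.nn.Module):
    """One layer of the update above: h_i <- sum_j w_ij (V h_j)."""
    def __init__(self, d_model: int):
        super().__init__()
        # Q, K, V: the learnable linear weights from the equation.
        self.Q = torch.nn.Linear(d_model, d_model, bias=False)
        self.K = torch.nn.Linear(d_model, d_model, bias=False)
        self.V = torch.nn.Linear(d_model, d_model, bias=False)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (n_words, d_model), one row per word in the sentence S.
        scores = self.Q(h) @ self.K(h).T   # Qh_i . Kh_j for every pair (i, j)
        w = F.softmax(scores, dim=-1)      # w_ij, normalized over j in S
        return w @ self.V(h)               # weighted sum of value vectors
```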

Multi-head Attention

Bad random initializations can destabilize the learning process of this dot-product attention mechanism. We can 'hedge our bets' by concatenating multiple attention 'heads', each with its own learnable weights, and projecting the result back down:

$$h_i^{\ell+1} = \text{Concat}\left(\text{head}_1, \dots, \text{head}_K\right) O^{\ell}, \qquad \text{head}_k = \text{Attention}\left(Q^{k,\ell} h_i^{\ell},\ K^{k,\ell} h_j^{\ell},\ V^{k,\ell} h_j^{\ell}\right)$$

where Q^{k,ℓ}, K^{k,ℓ}, V^{k,ℓ} are the weights of the k'th head and O^ℓ is a down-projection that restores the hidden dimension.
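A minimal sketch of the multi-head scheme, extending the single-head sketch above. The fused head layout and the √d scaling (introduced on the next slide) are standard implementation choices assumed here, not details fixed by the slide:

```python
import torch
import torch.nn.functional as F

class MultiHeadAttention(torch.nn.Module):
    """K parallel attention heads, concatenated and projected by O."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        # Per-head Q/K/V weights, fused into one matrix each for efficiency.
        self.Q = torch.nn.Linear(d_model, d_model, bias=False)
        self.K = torch.nn.Linear(d_model, d_model, bias=False)
        self.V = torch.nn.Linear(d_model, d_model, bias=False)
        self.O = torch.nn.Linear(d_model, d_model, bias=False)  # down-projection

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        n = h.size(0)
        # Reshape to (n_heads, n_words, d_head): each head attends independently.
        q = self.Q(h).view(n, self.n_heads, self.d_head).transpose(0, 1)
        k = self.K(h).view(n, self.n_heads, self.d_head).transpose(0, 1)
        v = self.V(h).view(n, self.n_heads, self.d_head).transpose(0, 1)
        # Scaled dot-product attention (the scaling is the next slide's fix).
        w = F.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        heads = (w @ v).transpose(0, 1).reshape(n, -1)  # Concat(head_1..head_K)
        return self.O(heads)
```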

The Final Picture

Update each word's features through a Multi-head Attention mechanism, i.e. as a weighted sum of the features of the other words in the sentence,

+ Scaled dot-product attention (dividing each dot product by √d keeps the softmax inputs in a stable range)

+ Normalization layers

+ Residual links
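A sketch of how these pieces compose, reusing the MultiHeadAttention sketch above. The post-norm ordering is an assumption, and a full Transformer block also adds a position-wise feed-forward sub-layer with its own residual and norm, omitted here for brevity:

```python
import torch

class TransformerLayer(torch.nn.Module):
    """Multi-head attention wrapped in a residual link and normalization."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn = MultiHeadAttention(d_model, n_heads)  # sketch above
        self.norm = torch.nn.LayerNorm(d_model)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Residual link: add the attention output back onto the input,
        # then normalize, so feature scales stay stable across layers.
        return self.norm(h + self.attn(h))
```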

Representation Learning on Graphs

Graph Neural Networks

GNNs update the hidden features h of node i at layer ℓ via a non-linear transformation of the node's own features, added to an aggregation of the features of each neighbouring node j ∈ N(i):

$$h_i^{\ell+1} = \sigma \left( U^{\ell} h_i^{\ell} + \sum_{j \in \mathcal{N}(i)} V^{\ell} h_j^{\ell} \right)$$

where U, V are learnable weight matrices of the GNN layer and σ is a non-linearity such as ReLU.
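A minimal dense-adjacency sketch of this update; ReLU stands in for σ, and real GNN libraries use sparse message passing rather than a full adjacency matrix:

```python
import torch

class SimpleGNNLayer(torch.nn.Module):
    """h_i <- sigma(U h_i + sum over j in N(i) of V h_j)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.U = torch.nn.Linear(d_model, d_model, bias=False)
        self.V = torch.nn.Linear(d_model, d_model, bias=False)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h: (n_nodes, d_model); adj: (n_nodes, n_nodes) binary adjacency,
        # so (adj @ x)[i] sums x_j over the neighbourhood N(i).
        return torch.relu(self.U(h) + adj @ self.V(h))
```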

Connections b/w GNNs and Transformers

Consider a sentence as a fully connected graph of words…

Standard Graph NN: sums over the local neighborhood N(i), with a simple linear transform and sum.

Graph Attention Network (GAT): sums over the local neighborhood N(i), with a weighted sum aggregation via attention.

Transformer: sums over all words in S (the fully connected graph), with a weighted sum via attention,

+ Multi-head mechanism

+ Normalization layers

+ Residual links
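The whole comparison fits in a few lines of code: dot-product attention masked to a graph's edges gives a GAT-flavoured layer (the original GAT paper actually scores pairs with an additive mechanism rather than dot products), and passing the all-ones adjacency, i.e. the fully connected word graph, recovers exactly the Transformer update. A hypothetical sketch, with Q, K, V as in the earlier examples:

```python
import torch

def masked_attention(h, adj, Q, K, V):
    """Attention restricted to a graph: node i only attends over j in N(i).
    Q, K, V are torch.nn.Linear projections, as in the sketches above."""
    scores = Q(h) @ K(h).T / h.size(-1) ** 0.5
    scores = scores.masked_fill(adj == 0, float("-inf"))  # drop non-edges
    return torch.softmax(scores, dim=-1) @ V(h)

# adj = sparse neighbourhoods N(i)  -> a GAT-style graph layer
# adj = torch.ones(n, n)            -> the Transformer update over sentence S
```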
