ivan titov slides - akbc.ws

66
Extracting and Modeling Relations with Graph Convolutional Networks Ivan Titov 1 with Diego Marcheggiani, Michael Schlichtkrull, Thomas Kipf, Max Welling, Rianne van den Berg and Peter Bloem

Upload: others

Post on 01-Oct-2021

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Ivan Titov Slides - akbc.ws

Extracting and Modeling Relations with Graph Convolutional NetworksIvan Titov

1

with Diego Marcheggiani, Michael Schlichtkrull, Thomas Kipf, Max Welling, Rianne van den Berg and Peter Bloem

Page 2: Ivan Titov Slides - akbc.ws

Inferring missing facts in knowledge bases: link prediction

Vaganova Academy

studied_at

lived_in ?

Mikhail Baryshnikov

St. Petersburg

located_in

Page 3: Ivan Titov Slides - akbc.ws

Relation Extraction

Vaganova Academy

studied_at

lived_in

Mikhail Baryshnikov Mariinsky Theatre

danced_for ?

St. Petersburg

located_in

Baryshnikov danced for Mariinsky based in what was then Leningrad (now St. Petersburg)

danced_for

Page 4: Ivan Titov Slides - akbc.ws

Generalization of link prediction and relation extraction

Vaganova Academy

studied_at

lived_in

Mikhail Baryshnikov Mariinsky Theatre

danced_for ?

St. Petersburg

located_in

E.g., Universal Schema (Reidel et al., 2013)

After a promising start in Mariinsky ballet, Baryshnikov defected to Canada in 1974 ...

Page 5: Ivan Titov Slides - akbc.ws

KBC: it is natural to represent both sentences and KB with graphs

Vaganova Academy

studied_at

lived_in

After a promising start in Mariinsky ballet, Baryshnikov defected to Canada in 1974 ...

Mikhail Baryshnikov Mariinsky Theatre

danced_for ?

St. Petersburg

located_in

For sentences, the graphs encode beliefs about their linguistic structure

How can we model (and exploit) these graphs with graph neural networks?

Page 6: Ivan Titov Slides - akbc.ws

Outline

Graph Convolutional Networks (GCNs)

Extracting Semantic Relations: Semantic Role Labeling

Link Prediction with Graph Neural Networks

Syntactic GCNs

Semantic Role Labeling Model

Relational GCNs

Denoising Graph Autoencoders for Link Prediction

Page 7: Ivan Titov Slides - akbc.ws

Graph Convolutional Networks: Neural Message Passing

Page 8: Ivan Titov Slides - akbc.ws

Graph Convolutional Networks: message passing

v

Undirected graph Update for node v

Kipf & Welling (2017). Related ideas earlier, e.g., Scarselli et al. (2009).

Page 9: Ivan Titov Slides - akbc.ws

Graph Convolutional Networks: message passing

v

Undirected graph Update for node v

Kipf & Welling (2017). Related ideas earlier, e.g., Scarselli et al. (2009).

Page 10: Ivan Titov Slides - akbc.ws

GCNs: multilayer convolution operation

Representations informed by node neighbourhoods

X = H(0) Z = H(N)

H(1)

Input

Hidden layer

H(2)

Hidden layer

Output

Initial feature representations of nodes

Parallelizable computation, can be made quite efficient (e.g., Hamilton, Ying and Leskovec (2017)).

Page 11: Ivan Titov Slides - akbc.ws

GCNs: multilayer convolution operation

X = H(0) Z = H(N)

H(1)

Input

Hidden layer

H(2)

Hidden layer

Output

Parallelizable computation, can be made quite efficient (e.g., Hamilton, Ying and Leskovec (2017)).

Representations informed by node neighbourhoods

Initial feature representations of nodes

Page 12: Ivan Titov Slides - akbc.ws

Graph Convolutional Networks: Previous work

Shown very effective on a range of problems - citations graphs, chemistry, ...

Mostly:

- Unlabeled and undirected graphs- Node labeling in a single large graph (transductive setting)- Classification of graphlets

How to apply GCNs to graphs we have in knowledge based completion / construction?

See Bronstein et al. (Signal Processing, 2017) for an overview

Page 13: Ivan Titov Slides - akbc.ws

Link Prediction with Graph Neural Networks

Page 14: Ivan Titov Slides - akbc.ws

Link Prediction

Vaganova Academy

studied_at

lived_in ?

Mikhail Baryshnikov Mariinsky Theatre

danced_for

St. Petersburg

located_inlocated_in

Page 15: Ivan Titov Slides - akbc.ws

Link Prediction

Vaganova Academy

studied_at

lived_in ?

Mikhail Baryshnikov Mariinsky Theatre

danced_for

St. Petersburg

located_inlocated_in

Page 16: Ivan Titov Slides - akbc.ws

Link Prediction

Vaganova Academy

studied_at

lived_in ?

Mikhail Baryshnikov Mariinsky Theatre

danced_for

St. Petersburg

located_inlocated_in

Page 17: Ivan Titov Slides - akbc.ws

KB Factorization

Vaganova Academy

studied_at

lived_in ?

Mikhail Baryshnikov Mariinsky Theatre

danced_for

St. Petersburg

located_inlocated_in

Page 18: Ivan Titov Slides - akbc.ws

KB Factorization

Vaganova Academy

studied_at

lived_in ?

Mikhail Baryshnikov Mariinsky Theatre

danced_for

St. Petersburg

located_in

X X

Baryshnikov lived_in St. Petersburg

A scoring function is used to predict whether a relation holds:

RESCAL(Nickel et al., 2011)

located_in

Page 19: Ivan Titov Slides - akbc.ws

KB Factorization

Vaganova Academy

studied_at

lived_in ?

Mikhail Baryshnikov Mariinsky Theatre

danced_for

St. Petersburg

located_in

X X

Baryshnikov lived_in St. Petersburg

A scoring function is used to predict whether a relation holds:

DistMult (Yang et al., 2014)

located_in

Page 20: Ivan Titov Slides - akbc.ws

KB Factorization

Vaganova Academy

studied_at

lived_in ?

Mikhail Baryshnikov Mariinsky Theatre

danced_for

St. Petersburg

located_in

X X

Baryshnikov lived_in St. Petersburg

A scoring function is used to predict whether a relation holds:

DistMult (Yang et al., 2014)

located_in

Relies on SGD to propagate information across the graph

Page 21: Ivan Titov Slides - akbc.ws

Relational GCNs

Vaganova Academy

studied_at

lived_in ?

Mikhail Baryshnikov Mariinsky Theatre

danced_for

St. Petersburg

located_in

X X

Baryshnikov lived_in St. Petersburg

A scoring function is used to predict whether a relation holds:

DistMult (Yang et al., 2014)

Use the same scoring function but with GCN node representations rather than parameter vectors

located_in

Schlichtkrull et al., 2017

Page 22: Ivan Titov Slides - akbc.ws

Relational GCNs

Vaganova Academy

studied_at

lived_in ?

Mikhail Baryshnikov Mariinsky Theatre

danced_for

St. Petersburg

located_inlocated_in

X X

Baryshnikov lived_in St. Petersburg

A scoring function is used to predict whether a relation holds:

DistMult (Yang et al., 2014)

Use the same scoring function but with GCN node representations rather than parameter vectors

Info about St. Petersburg reached here

Schlichtkrull et al., 2017

Page 23: Ivan Titov Slides - akbc.ws

Relational GCNs

Vaganova Academy

studied_at

lived_in ?

Mikhail Baryshnikov Mariinsky Theatre

danced_for

St. Petersburg

located_inlocated_in

X X

Baryshnikov lived_in St. Petersburg

A scoring function is used to predict whether a relation holds:

DistMult (Yang et al., 2014)

Use the same scoring function but with GCN node representations rather than parameter vectors Schlichtkrull et al., 2017

Info about St. Petersburg reached here

Page 24: Ivan Titov Slides - akbc.ws

Relational GCNs

Page 25: Ivan Titov Slides - akbc.ws

Relational GCNs

How do we train Relational GCNs?

How do we compactly parameterize Relational GCNs?

Page 26: Ivan Titov Slides - akbc.ws

GCN Denoising Autoencoders

citizen_of

Mikhail Baryshnikov

U.S.A.

Mariinsky Theatre

danced_for

Vaganova Academy

awarded

studied_at

VilcekPrize

St. Petersburg

located_in

located_in

lived

_in

Take the training graph

Schlichtkrull et al (2017)

Page 27: Ivan Titov Slides - akbc.ws

GCN Denoising Autoencoders

Mikhail Baryshnikov

U.S.A.

Mariinsky Theatre

danced_for

Vaganova Academy

awarded

studied_at

VilcekPrize

St. Petersburg

located_in

located_in

Produce a noisy version: drop some random edgesUse this graph for encoding nodes with GCNs

Schlichtkrull et al., 2017

Page 28: Ivan Titov Slides - akbc.ws

GCN Denoising Autoencoders

citizen_of

Mikhail Baryshnikov

U.S.A.

Mariinsky Theatre

danced_for

Vaganova Academy

awarded

studied_at

VilcekPrize

St. Petersburg

located_in

located_in

lived

_in

Force the model to reconstruct the original graph (including dropped edges)

(a ranking loss on edges) Schlichtkrull et al., 2017

Page 29: Ivan Titov Slides - akbc.ws

Training

Encoder

Decoder

Classic DistMult Our R-GCN

(e.g., node embeddings)

(e.g., node

embeddings)

Schlichtkrull et al., 2017

Page 30: Ivan Titov Slides - akbc.ws

Instead of denoising AEs, we can use variational AEs to train R-GCNs

VAE R-GCN can be regarded as an inference network performing amortized

variational inference

Intuition:

R-GCN AEs are amortized versions of factorization models

GCN Autoencoders: Denoising vs Variational

Page 31: Ivan Titov Slides - akbc.ws

Relational GCN

v

There are too many relations in realistic KBs, we cannot use full rank matrices

Schlichtkrull et al., 2017

Page 32: Ivan Titov Slides - akbc.ws

Relational GCN

Naive logic:

We score with a diagonal matrix (DistMul), let’s use a diagonal one in GCN

Page 33: Ivan Titov Slides - akbc.ws

Relational GCN

Block diagonal assumption:

Latent features can be grouped into sets of tightly inter-related

features, modeling dependencies across the sets is less important

Page 34: Ivan Titov Slides - akbc.ws

Relational GCN

Basis / Dictionary learning:

Represent every KB relation as a linear combination of basis transformations

basis

transformations

coefficients

Page 35: Ivan Titov Slides - akbc.ws

Results on FB15k-237 (hits@10)

Our R-GCN relies on DistMult in the

decoder: DistMult is its natural baseline

See other results and metrics in the paper.Results for ComplEX, TransE and HolE from code of Trouillon et al. (2016). Results for HolE using code by Nickel et al. (2015)

Our model

DistMult baseline

Page 36: Ivan Titov Slides - akbc.ws

Relational GCNs

Fast and simple approach to Link Prediction

Captures multiple paths without the need to explicitly marginalize over them

Unlike factorizations, can be applied to subgraphs unseen in training

FUTURE WORK:

R-GCNs can be used in combination with more powerful factorizations / decoders

Objectives favouring recovery of paths rather than edges

Gates and memory may be effective

Page 37: Ivan Titov Slides - akbc.ws

Extracting Semantic Relations

Page 38: Ivan Titov Slides - akbc.ws

Semantic Role Labeling

Closely related to the relation extraction task

Discovering the predicate-argument structure of a sentence

Sequa makes and repairs jet engines

Page 39: Ivan Titov Slides - akbc.ws

Semantic Role Labeling

Closely related to the relation extraction task

Discovering the predicate-argument structure of a sentence- Discover predicates

Sequa makes and repairs jet engines

Page 40: Ivan Titov Slides - akbc.ws

Semantic Role Labeling

Closely related to the relation extraction task

Discovering the predicate-argument structure of a sentence- Discover predicates- Identify arguments and label them with their semantic roles

Sequa makes and repairs jet

creator

engines

creationrepairer

entity repaired

Page 41: Ivan Titov Slides - akbc.ws

Syntax/semantics interaction

Sequa makes and repairs jet

subj

creator

enginescoord

obj

conjnmod

creation

Some syntactic dependencies are mirrored in the semantic graph

Page 42: Ivan Titov Slides - akbc.ws

Syntax/semantics interaction

Sequa makes and repairs jet

subj

creator

enginescoord

obj

conjnmod

creation

Some syntactic dependencies are mirrored in the semantic graph

… but not all of them – the syntax-semantics interface is far from trivial

repairer

entity repaired

GCNs provide a flexible framework for capturing interactions between the graphs

Page 43: Ivan Titov Slides - akbc.ws

Syntactic GCNs: directionality and labels

v

WinWout

amod objadvmod advmod

WloopAlong syntactic edges

Direction opposite of syntactic edges

Page 44: Ivan Titov Slides - akbc.ws

Syntactic GCNs: directionality and labels

v

WinWout

amod objadvmod advmod

Wloop

Direction opposite of syntactic edges

Along syntactic edges

Weight matrix for each direction:Wout, Win, Wloop

Bias for each label + direction, e.g. bin-subj

Page 45: Ivan Titov Slides - akbc.ws

Syntactic GCNs: edge-wise gating

The gate weights the message

Not all edges are equally informative for the downstream task or reliable

We use parsers to predict syntax

Marcheggiani et al., EMNLP 2017

Page 46: Ivan Titov Slides - akbc.ws

Graph Convolutional Encoders

Sequa makes and repairs jet

subj

coord

obj

conjengines

nmod

Encoder (BiRNN, CNN, ..)

Page 47: Ivan Titov Slides - akbc.ws

Graph Convolutional Encoders

Sequa makes and repairs jet

subj

coord

obj

conjengines

nmod

Encoder (BiRNN, CNN, ..)

GCN layer 1

Page 48: Ivan Titov Slides - akbc.ws

Graph Convolutional Encoders

Sequa makes and repairs jet

subj

coord

obj

conj

Wout

Wout

W out

enginesnmod

Wout

W out

Encoder (BiRNN, CNN, ..)

GCN layer 1

Page 49: Ivan Titov Slides - akbc.ws

Graph Convolutional Encoders

Sequa makes and repairs jet

subj

coord

obj

conjengines

nmod

Wout

Wout

W out Wout

W out

Encoder (BiRNN, CNN, ..)

GCN layer 1

Page 50: Ivan Titov Slides - akbc.ws

Graph Convolutional Encoders

Sequa makes and repairs jet

subj

coord

obj

conjengines

nmod

Wout

Wout

W out Wout

W out

Encoder (BiRNN, CNN, ..)

GCN layer 2

GCN layer 1

GCN layer 3

Page 51: Ivan Titov Slides - akbc.ws

Graph Convolutional Encoders

Sequa makes and repairs jet

subj

coord

obj

conj

Encoder (BiRNN, CNN, ..)

GCN layer 2

GCN layer 1

enginesnmod

GCN layer 3

Wout

Wout

W out Wout

W out

Page 52: Ivan Titov Slides - akbc.ws

Graph Convolutional Encoders

How do we construct a GCN-based semantic role labeler?

Page 53: Ivan Titov Slides - akbc.ws

GCNs for Semantic Role Labeling

BiRNN

RepairerSemantic Role Labeler

GCN layer(s)

Sequa makes and repairs jet

subj

coord

obj

conj

Wout W

out

W out

enginesnmod

Wout

W out

Marcheggiani et al., EMNLP 2017

Page 54: Ivan Titov Slides - akbc.ws

GCNs for Semantic Role Labeling

NULL

Sequa makes and repairs jet

subj

coord

obj

conj

Wout W

out

W out

enginesnmod

Wout

W out

Marcheggiani et al., EMNLP 2017

BiRNN

Semantic Role Labeler

GCN layer(s)

Page 55: Ivan Titov Slides - akbc.ws

GCNs for Semantic Role Labeling

NULL

Sequa makes and repairs jet

subj

coord

obj

conj

Wout W

out

W out

enginesnmod

Wout

W out

Marcheggiani et al., EMNLP 2017

BiRNN

Semantic Role Labeler

GCN layer(s)

Page 56: Ivan Titov Slides - akbc.ws

GCNs for Semantic Role Labeling

NULL

Sequa makes and repairs jet

subj

coord

obj

conj

Wout W

out

W out

enginesnmod

Wout

W out

Marcheggiani et al., EMNLP 2017

BiRNN

Semantic Role Labeler

GCN layer(s)

Page 57: Ivan Titov Slides - akbc.ws

GCNs for Semantic Role Labeling

NULL

Sequa makes and repairs jet

subj

coord

obj

conj

Wout W

out

W out

enginesnmod

Wout

W out

Marcheggiani et al., EMNLP 2017

BiRNN

Semantic Role Labeler

GCN layer(s)

Page 58: Ivan Titov Slides - akbc.ws

GCNs for Semantic Role Labeling

Entity Repaired

Sequa makes and repairs jet

subj

coord

obj

conj

Wout W

out

W out

enginesnmod

Wout

W out

Marcheggiani et al., EMNLP 2017

BiRNN

Semantic Role Labeler

GCN layer(s)

Page 59: Ivan Titov Slides - akbc.ws

Results (F1) on Chinese (CoNLL-2009, dev set)

Marcheggiani & Titov (EMNLP, 2017) Predicate disambiguation is excluded from the F1 metric

Page 60: Ivan Titov Slides - akbc.ws

Results (F1) on Chinese (CoNLL-2009, test set)

Marcheggiani & Titov (EMNLP, 2017)

Page 61: Ivan Titov Slides - akbc.ws

Results (F1) on English (CoNLL-2009)

Marcheggiani & Titov (EMNLP, 2017)

Ensemble of 3

Our model

Single models Ensembles

Page 62: Ivan Titov Slides - akbc.ws

Flexibility of GCN encoders

Simple and fast approach to integrating linguistic structure into encoders

In principle we can exploit almost any kind of linguistic structure:

Semantic role labeling structure

Co-reference chains

AMR semantic graphs

Their combination

Page 63: Ivan Titov Slides - akbc.ws

Other applications of syntactic GCN encoders

Bastings et al. (EMNLP, 2017)

Others recently applied them to NER

Cetoli et al. (arXiv:1709.10053)

We also showed them effective as encoders in Neural Machine Translation

Page 64: Ivan Titov Slides - akbc.ws

Conclusions

GCNs are in subtasks of KBC (and in NLP beyond KBC):

- Semantic Roles: we proposed GCNs for encoding linguistic knowledge - Link prediction: GCNs for link prediction (and entity classification) in

multi-relational knowledge bases

Code available

We are hiring! (PhD students / postdocs)

Page 65: Ivan Titov Slides - akbc.ws

Analysis / Discussion

- Improvement across the board, especially in the middle of the range

Page 66: Ivan Titov Slides - akbc.ws

Effect of Distance between Argument and Predicate (English)