
Forcing Neural Link Predictors to Play by the Rules

Sebastian Riedel

Machine Reading

Collaborators

2

Tim Rocktäschel (now at Oxford)

Thomas Demeester (Ghent University)

Pasquale Minervini (UCL)

Johannes Welbl (UCL)

Guillaume Bouchard (now at Bloomsbury AI)

Tim Dettmers (University of Lugano)

Théo Trouillon (Xerox Research)

Calculate Truth Scores for Statements

3

The higher the score, the more likely the statement is true.

Goal

Learn to score statements from data and prior knowledge (given in natural language, predicate logic, atoms, rules, etc.), using neural networks, graphical models, etc.

Running Example

4

Enrique can speak Spanish

Enrique works in Mexico City

Enrique lives in Mexico

Luis can speak Spanish

Luis lives in Mexico

livesIn(Ivan, Mexico)

speaks(Ivan, Spanish)

When someone lives in Mexico, they will likely speak Spanish

s(·) = 0.9

Training data (text): the sentences. Training data (structure): the atoms. Prior knowledge (text): the rule.

Compliance

5

I can accept tickets to NBA court-side events

You must not accept tickets to expensive sports events

NBA court-side tickets cost at least $1000

s(·) = 0.1

with Bloomsbury AI

Fact Checking

6

US Unemployment is 42%

The official unemployment rate in the US is 4.4%

There are 420 billion potential working hours

Only 240 billion working hours were actually recorded

s(·) = 0.1

Scientific Text

7

DrugA interacts with DrugB

DrugA interacts with Protein1

Protein1 has been shown to stimulate Protein2

DrugB increases the rate of Protein2 …

s(·) = 0.9

Knowledge Graphs

8

s(AndrewMcCallum, livesIn, Amherst) = 0.9

[Figure: knowledge graph with nodes Andrew McCallum, Dartmouth and Amherst; edges attended, attended and employee; query edge: livesIn?]

Part 1: Learning To Score

9

Enrique can speak Spanish

Enrique works in Mexico City

Enrique lives in Mexico

Luis can speak Spanish

Luis lives in Mexico

livesIn(Ivan, Mexico)

speaks(Ivan, Spanish)

When someone lives in Mexico, they will likely speak Spanish

s(·) = 0.9

ICML 2016, JMLR 2017, AAAI 2018

Part 2: Injecting Prior Knowledge

10

Enrique can speak Spanish

Enrique works in Mexico City

Enrique lives in Mexico

Luis can speak Spanish

Luis lives in Mexico

livesIn(Ivan, Mexico)

speaks(Ivan, Spanish)

When someone lives in Mexico, they will likely speak Spanish

s(·) = 0.9

NAACL 2015, EMNLP 2016, UAI 2017

Part 1: Learning To Score

11

(Running example repeated.)

Simplifying the Problem

12

Enrique can speak Spanish

Enrique works in Mexico City

Enrique lives in Mexico

Luis can speak Spanish

Luis lives in Mexico

livesIn(Ivan, Mexico)

speaks(Ivan, Spanish)

s(·) = 0.9

Simplifying the Problem: Link Prediction

13

Enrique canSpeak Spanish

Enrique worksIn MexicoCity

Enrique livesIn Mexico

Luis canSpeak Spanish

Luis livesIn Mexico

livesIn(Ivan, Mexico)

speaks(Ivan, Spanish)

s(·) = 0.9

Simplifying the Problem: Triples

14

Encode the triple, then score the encoding:

s(Enrique, canSpeak, Spanish) = 0.9

DistMult (Yang et al., 2015)

15

Encode, then score:

s(Enrique, canSpeak, Spanish) = <v_Enrique, v_canSpeak, v_Spanish> = 0.9

Symmetric

16

DistMult assigns the same score when subject and object are swapped:

s(Spanish, canSpeak, Enrique) = <v_Spanish, v_canSpeak, v_Enrique> = 0.9
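As a minimal illustration (a numpy sketch with names of our choosing, not code from the talk), DistMult's trilinear score and the symmetry the slide points out:

```python
import numpy as np

def distmult_score(v_subj, v_pred, v_obj):
    # DistMult: trilinear product <v_subj, v_pred, v_obj> = sum_i s_i * p_i * o_i
    return np.sum(v_subj * v_pred * v_obj)

rng = np.random.default_rng(0)
enrique, can_speak, spanish = (rng.normal(size=5) for _ in range(3))

# The score is unchanged when subject and object are swapped:
assert np.isclose(distmult_score(enrique, can_speak, spanish),
                  distmult_score(spanish, can_speak, enrique))
```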

ComplEx (Trouillon et al., 2016)

17

With complex-valued embeddings, the score keeps the trilinear form but conjugates the object embedding and takes the real part:

s(Enrique, canSpeak, Spanish) = Re <v_Enrique, v_canSpeak, conj(v_Spanish)> = 0.9
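A matching sketch for ComplEx (again illustrative, not the authors' code): complex-valued embeddings with a conjugated object remove the symmetry above:

```python
import numpy as np

def complex_score(v_subj, v_pred, v_obj):
    # ComplEx: Re(<v_subj, v_pred, conj(v_obj)>) over complex-valued embeddings
    return np.real(np.sum(v_subj * v_pred * np.conj(v_obj)))

rng = np.random.default_rng(0)
enrique, can_speak, spanish = (rng.normal(size=5) + 1j * rng.normal(size=5)
                               for _ in range(3))

# Swapping subject and object now changes the score:
print(complex_score(enrique, can_speak, spanish))
print(complex_score(spanish, can_speak, enrique))
```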

TransE (Bordes et al. 2013)

18

The predicate translates the subject towards the object:

s(Enrique, canSpeak, Spanish) = -||v_Enrique + v_canSpeak - v_Spanish||
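And a one-line sketch of TransE, where the predicate acts as a translation; the distance is negated so that, as elsewhere in the talk, higher means more plausible:

```python
import numpy as np

def transe_score(v_subj, v_pred, v_obj):
    # TransE: v_subj + v_pred should land close to v_obj
    return -np.linalg.norm(v_subj + v_pred - v_obj)
```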

RESCAL (Nickel et al., 2011)

19

A bilinear score with a full matrix per relation:

s(Enrique, canSpeak, Spanish) = v_Enrique^T W_canSpeak v_Spanish = 0.9
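RESCAL swaps the relation vector for a full matrix; a minimal sketch:

```python
import numpy as np

def rescal_score(v_subj, W_pred, v_obj):
    # RESCAL: bilinear score with one full matrix W_pred per relation
    return v_subj @ W_pred @ v_obj
```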

ER-MLP (Dong et al., 2014)

20

A multi-layer perceptron over the concatenated embeddings:

s(Enrique, canSpeak, Spanish) = MLP([v_Enrique; v_canSpeak; v_Spanish]) = 0.9
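A small PyTorch sketch of an ER-MLP-style scorer (the layer sizes are illustrative assumptions): concatenate the three embeddings and pass them through a multi-layer perceptron that outputs a scalar score:

```python
import torch
import torch.nn as nn

class ERMLPScorer(nn.Module):
    def __init__(self, dim=32, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 * dim, hidden),  # concatenated (subject, predicate, object)
            nn.ReLU(),
            nn.Linear(hidden, 1),        # scalar truth score
        )

    def forward(self, v_subj, v_pred, v_obj):
        return self.mlp(torch.cat([v_subj, v_pred, v_obj], dim=-1)).squeeze(-1)
```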

ConvE (Dettmers et al. 2017)

21

A convolution over the reshaped subject and predicate embeddings, followed by a dot product with the object embedding:

s(Enrique, canSpeak, Spanish) = <conv(v_Enrique, v_canSpeak), v_Spanish> = 0.9
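A rough PyTorch sketch of a ConvE-style scorer; the reshape dimensions, channel count and the single convolution layer are simplifying assumptions, not the published architecture:

```python
import torch
import torch.nn as nn

class ConvEScorer(nn.Module):
    def __init__(self, dim=32, h=4, w=8, channels=8):
        super().__init__()
        assert h * w == dim
        self.h, self.w = h, w
        self.conv = nn.Conv2d(1, channels, kernel_size=3, padding=1)
        self.proj = nn.Linear(channels * 2 * h * w, dim)

    def forward(self, v_subj, v_pred, v_obj):
        # Reshape subject and predicate embeddings into 2D maps, stack them,
        # convolve, project back to the embedding dimension, then take the
        # dot product with the object embedding.
        x = torch.cat([v_subj.view(-1, 1, self.h, self.w),
                       v_pred.view(-1, 1, self.h, self.w)], dim=2)
        x = torch.relu(self.conv(x)).flatten(start_dim=1)
        return (self.proj(x) * v_obj).sum(dim=-1)
```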

Training Objective

Terms in the training objective for observed facts

22

∑ over (subject, predicate, object) in the training data of loss(s(subject, predicate, object))

The loss is low when the score of an observed fact is high.

Plus some term with negative sampling.

Optimised by stochastic gradient descent.
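A sketch of such an objective in PyTorch; the slide only says "some term with negative sampling", so the softplus loss and the object-corruption scheme here are assumptions:

```python
import torch
import torch.nn.functional as F

def training_loss(score_fn, E, R, triples, num_entities, k_neg=5):
    # E: entity embeddings, R: relation embeddings, triples: LongTensor (N, 3)
    s, p, o = triples.t()
    # Observed facts: low loss when their scores are high.
    loss = F.softplus(-score_fn(E[s], R[p], E[o])).mean()
    for _ in range(k_neg):
        o_neg = torch.randint(num_entities, o.shape)  # corrupt the object
        loss = loss + F.softplus(score_fn(E[s], R[p], E[o_neg])).mean() / k_neg
    return loss  # minimise over E and R with stochastic gradient descent
```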

Learning Visualised

23

Random entity embeddings, with training and test edges

[Figure: embedding space with speaks, language and livesIn edges; legend: training edge, test edge]

Learning Visualised

24

Predictions of a randomly initialised model

[Figure: speaks, language and livesIn edges; legend: training edge, test edge; false negative, false positive, match]

Learning Visualised

25

Predictions after training

[Figure: speaks, language and livesIn edges; legend: training edge, test edge; false negative, false positive, match]

Part 2: Injecting Prior Knowledge

26

(Running example repeated.)

Learning Visualised

27

With sufficient training data

[Figure: speaks, language and livesIn edges; legend: training edge, test edge; false negative, false positive, match]

Learning Visualised

28

With limited training data

[Figure: language and livesIn edges only; legend: training edge, test edge; false negative, false positive, match]

Limited training Data

29

(Obviously) leads to errors

[Figure: language and livesIn edges with errors; legend: training edge, test edge; false negative, false positive, match]

Errors

30

May violate our prior knowledge:

when A lives in C (livesIn), and C's language is L (language), then A likely speaks L (speaks)

[Figure: entities A, C and L with edges livesIn(A, C), language(C, L) and speaks(A, L)]

Solution

31

Add a loss term that punishes this specific violation:

[Figure: the A, C, L triangle with livesIn, language and speaks edges]

… + loss(v_A, v_C, v_L, v_livesIn, v_speaks, v_language)
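One way to make this loss term concrete, sketched in the spirit of the approach rather than reproducing its exact formulation: score the rule body with a min for the conjunction, and penalise whenever the body scores higher than the head:

```python
import torch

def rule_violation_loss(score_fn, v_A, v_C, v_L, r_livesIn, r_language, r_speaks):
    # livesIn(A, C) AND language(C, L) should imply speaks(A, L):
    # the loss is positive exactly when the body outscores the head.
    body = torch.min(score_fn(v_A, r_livesIn, v_C),
                     score_fn(v_C, r_language, v_L))
    head = score_fn(v_A, r_speaks, v_L)
    return torch.relu(body - head)
```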

Re-training

32

with the specific loss term added

[Figure: language and livesIn edges; legend: training edge, test edge; false negative, false positive, match]

Re-training

33

fixes the specific error

[Figure: language and livesIn edges; legend: training edge, test edge; false negative, false positive, match]

Re-training

34

but not all

[Figure: other violations of the rule remain among the livesIn and language edges]

Loss

35

So minimise the loss of the worst case

… + max over A, C, L of loss(v_A, v_C, v_L, v_livesIn, v_speaks, v_language)

Expensive: the maximum ranges over all entity triples (a combinatorial search).

Loss

36

Avoid combinatorial optimisation by maximising over embeddings

… + max over v_A, v_C, v_L of loss(v_A, v_C, v_L, v_livesIn, v_speaks, v_language)

Cheaper: the maximum is over continuous embeddings rather than discrete entities.

Full Loss

Terms in the training objective

37

∑ over (subject, predicate, object) in the training data of loss(s(subject, predicate, object))

+ max over v_A, v_C, v_L of loss(v_A, v_C, v_L, v_livesIn, v_speaks, v_language)

Minimised in a two-player game.
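A sketch of the inner maximisation (Player 2), reusing rule_violation_loss from the sketch above; the optimiser, step count and learning rate are illustrative choices:

```python
import torch

def adversarial_rule_loss(score_fn, r_livesIn, r_language, r_speaks,
                          dim=32, steps=20, lr=0.1):
    # Player 2: synthesise entity embeddings v_A, v_C, v_L that maximise
    # the rule violation, by gradient ascent from a random start.
    v_A, v_C, v_L = (torch.randn(dim, requires_grad=True) for _ in range(3))
    opt = torch.optim.Adam([v_A, v_C, v_L], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        (-rule_violation_loss(score_fn, v_A, v_C, v_L,
                              r_livesIn, r_language, r_speaks)).backward()
        opt.step()
    # Player 1 minimises the violation at the adversarial point; detaching the
    # entities sends gradients only into the relation embeddings.
    return rule_violation_loss(score_fn, v_A.detach(), v_C.detach(), v_L.detach(),
                               r_livesIn, r_language, r_speaks)
```

Alternating the two steps gives the game: Player 2 searches embedding space for violations, Player 1 repairs them.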

Player 1: Link Predictor

38

Estimates entity and relation embeddings to fix the graph (gradient descent)

Player 1

39

Predictions after training

Player 2: Adversary

40

Synthesises entity embeddings that break the rules (gradient ascent)

Player 2: Adversary

41

Synthesises entity embeddings that break the rules (gradient ascent)

[Figure: synthesised entities that violate livesIn(A, C) ∧ language(C, L) ⇒ speaks(A, L)]

Player 1: Link Predictor

42

Estimates entity and relation embeddings to fix the graph and violations

Player 2: Adversary

43

Synthesises entity embeddings that break the rules (gradient ascent)

Player 1: Link Predictor

44

Estimates entity and relation embeddings to fix the graph and violations

Closed Form Solutions

45

For some rules and models, the max expression has a closed-form solution:

max over v_A, v_B of loss(v_A, v_B, v_rel1, v_rel2), for the rule "when A relation1 B, then A relation2 B",

= loss_closed-form(v_rel1, v_rel2)

This holds for DistMult, ComplEx and TransE:

Entailment in Vector Space

Where should the livesIn vector go so that worksIn implies livesIn?

46

v_Enrique = [1 1], v_Mexico = [2 1], v_worksIn = [1 1]

s(Enrique, worksIn, Mexico) = <v_Enrique, v_worksIn, v_Mexico> = 1·1·2 + 1·1·1 = 3

Entailment in Vector Space

Entailment: the livesIn score must be at least as high as the worksIn score

47

The (Enrique, Mexico) pair embedding is v_Enrique ⊙ v_Mexico = [2 1], so:

s(Enrique, worksIn, Mexico) = v_worksIn · [2 1] = [1 1] · [2 1] = 3

Entailment requires: v_livesIn · [2 1] ≥ 3

Entailment in Vector Space

When it’s the same vector

48

v_livesIn = v_worksIn = [1 1]: [1 1] · [2 1] = 3 ≥ 3

Entailment in Vector Space

When the first component is larger

49

v_livesIn = [2 1]: [2 1] · [2 1] = 5 ≥ 3

Entailment in Vector Space

When the first component is even larger

50

v_livesIn = [3 1]: [3 1] · [2 1] = 7 ≥ 3

Entailment in Vector Space

When the second component is larger

51

v_livesIn = [1 2]: [1 2] · [2 1] = 4 ≥ 3

Entailment in Vector Space

For any linear combination

52

v_livesIn = [x y] with x ≥ 1 and y ≥ 1: [x y] · [2 1] = 2x + y ≥ 3

Entailment in Vector Space

And any (non-negative) input entity pair

53

If v_livesIn ≥ v_worksIn componentwise, then for any entity-pair embedding [a b] with a, b ≥ 0:

v_livesIn · [a b] ≥ v_worksIn · [a b]

Compare Order Embeddings (Vendrov et al. 2016)
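For the componentwise picture above, the closed form becomes simple. A sketch, assuming DistMult with non-negative entity-pair embeddings as in the slides: the rule holds whenever the conclusion's relation vector dominates the premise's componentwise, so the loss penalises every component where the premise exceeds the conclusion:

```python
import numpy as np

def lifted_implication_loss(v_rel1, v_rel2):
    # Closed form for "when A rel1 B, then A rel2 B": with non-negative
    # entity-pair embeddings, v_rel2 >= v_rel1 componentwise guarantees the
    # entailment, so penalise components where v_rel1 exceeds v_rel2.
    return np.maximum(0.0, v_rel1 - v_rel2).sum()

works_in = np.array([1.0, 1.0])
lives_in = np.array([1.0, 2.0])
print(lifted_implication_loss(works_in, lives_in))  # 0.0: the rule is satisfied
```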

Results

54

Applied to general link prediction problems

Integrated rules such as nationalityOf(P, N) ∧ officialLang(N, L) ⇒ speaks(P, L)

FB122, Hits@3:

TransE: 59%
KALE-TransE (Guo et al. 2016): 62%
DistMult: 67%
DistMult with rules: 71%
ComplEx: 67%
ComplEx with rules: 72%

UAI 2017

Conclusion

Train a neural "sentence scorer" with limited data and prior knowledge

Implemented via a two-player game:

Player 1 learns to predict training facts and to follow the rules

Player 2 creates tuples (in embedding space) that violate the rules

Runtime is "independent of domain size"

Future work:

inject prior knowledge given in natural language

extract explanations (see Sameer Singh's talk)

55

Papers presented in this talk

Complex Embeddings for Simple Link Prediction. Théo Trouillon, Johannes Welbl, Sebastian Riedel, Éric Gaussier and Guillaume Bouchard. International Conference on Machine Learning (ICML), 2016.

Injecting Logical Background Knowledge into Embeddings for Relation Extraction. Tim Rocktäschel, Sameer Singh and Sebastian Riedel. Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2015.

Lifted Rule Injection for Relation Embeddings. Thomas Demeester, Tim Rocktäschel and Sebastian Riedel. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2016.

Adversarial Sets for Regularising Neural Link Predictors. Pasquale Minervini, Thomas Demeester, Tim Rocktäschel and Sebastian Riedel. Conference on Uncertainty in Artificial Intelligence (UAI), 2017.

56

Entailed predicates need to live in the boxes of their premises

57