Forcing Neural Link Predictors to Play by the Rules
TRANSCRIPT
Collaborators
Tim Rocktäschel (now at Oxford)
Thomas Demeester (University of Ghent)
Pasquale Minervini (UCL)
Johannes Welbl (UCL)
Guillaume Bouchard (now at Bloomsbury AI)
Tim Dettmers (University of Lugano)
Théo Trouillon (Xerox Research)
Calculate Truth Scores for Statements
The higher the score, the more likely the statement is true.
Goal: learn from data and prior knowledge, using neural networks, graphical models, etc.
Statements can be in natural language, predicate logic, atoms, rules, etc.
Running Example
Training data (Text):
  Enrique can speak Spanish
  Enrique works in Mexico City
  Enrique lives in Mexico
  Luis speaks Spanish
  Luis lives in Mexico
Training data (Structure):
  livesIn(Ivan, Mexico)
  speaks(Ivan, Spanish)
Prior Knowledge (Text):
  When someone lives in Mexico, they will likely speak Spanish
s( ) = 0.9
Compliance (with Bloomsbury AI)
I can accept tickets to NBA court-side events
You must not accept tickets to expensive sports events
NBA court-side tickets cost at least $1000
s( ) = 0.1
Fact Checking
US unemployment is 42%
The official unemployment rate in the US is 4.4%
There are 420 billion potential working hours
Only 240 billion working hours were actually recorded
s( ) = 0.1
Scientific Text
DrugA interacts with DrugB
DrugA interacts with Protein1
Protein1 has been shown to stimulate Protein2
DrugB increases the rate of Protein2 …
s( ) = 0.9
Knowledge Graphs
s( AndrewMcCallum livesIn Amherst ) = 0.9
[Graph figure: Andrew McCallum node with "attended" and "employee" edges to Dartmouth and Amherst, and a query edge "livesIn?"]
Part 1: Learning To Score
(running example as above)
s( ) = 0.9
ICML 2016, JMLR 2017, AAAI 2018
Part 2: Injecting Prior Knowledge
(running example as above)
s( ) = 0.9
NAACL 2015, EMNLP 2016, UAI 2017
Part 1: Learning To Score
(running example as above)
Simplifying the Problem
Same training data as before, but without the prior-knowledge rule.
Simplifying the Problem: Link Prediction
Training data, now entirely as structured triples:
  Enrique canSpeak Spanish
  Enrique worksIn MexicoCity
  Enrique livesIn Mexico
  Luis canSpeak Spanish
  Luis livesIn Mexico
  livesIn(Ivan, Mexico)
  speaks(Ivan, Spanish)
s( ) = 0.9
Training Objective
Terms in the training objective for observed facts:

    sum over (subject, predicate, object) in training data of  loss( s(subject, predicate, object) )

The loss is low when the score of an observed fact is high. Plus some term with negative sampling. Optimised by stochastic gradient descent.
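The objective above can be sketched end to end. This is a minimal, illustrative implementation, assuming a DistMult scorer s(s, p, o) = ⟨e_s ∘ r_p, e_o⟩ (one of the model families discussed in this talk), a logistic loss, and the toy triples from the running example; the embedding size, learning rate, and epoch count are made-up values for the sketch:

```python
# Minimal sketch of the training objective: sum of losses over observed
# facts, plus a negative-sampling term, optimised by SGD.
# Assumes a DistMult scorer; entities/relations are the toy running example.
import numpy as np

rng = np.random.default_rng(0)
entities = ["Enrique", "Luis", "Ivan", "Spanish", "Mexico", "MexicoCity"]
relations = ["canSpeak", "worksIn", "livesIn"]
E = {e: rng.normal(scale=0.1, size=8) for e in entities}
R = {r: rng.normal(scale=0.1, size=8) for r in relations}

facts = [("Enrique", "canSpeak", "Spanish"),
         ("Enrique", "worksIn", "MexicoCity"),
         ("Enrique", "livesIn", "Mexico"),
         ("Luis", "canSpeak", "Spanish"),
         ("Luis", "livesIn", "Mexico"),
         ("Ivan", "livesIn", "Mexico"),
         ("Ivan", "canSpeak", "Spanish")]

def score(s, p, o):
    """DistMult: tri-linear dot product of subject, predicate, object."""
    return float(E[s] @ (R[p] * E[o]))

def sgd_step(s, p, o, label, lr=0.5):
    """One logistic-loss SGD step; label is 1 (observed fact) or 0 (negative)."""
    g = 1.0 / (1.0 + np.exp(-score(s, p, o))) - label   # d loss / d score
    gs, gp, go = g * R[p] * E[o], g * E[s] * E[o], g * E[s] * R[p]
    E[s] -= lr * gs
    R[p] -= lr * gp
    E[o] -= lr * go

for epoch in range(300):
    for (s, p, o) in facts:
        sgd_step(s, p, o, 1)                    # push observed facts up
        neg = entities[rng.integers(len(entities))]
        if (s, p, neg) not in facts:            # negative sampling: corrupt object
            sgd_step(s, p, neg, 0)

print(score("Enrique", "livesIn", "Mexico"))    # observed fact: high score
```

Corrupting only the object is one common negative-sampling choice; corrupting the subject as well works the same way.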
Learning Visualised
Random entity embeddings, and training and test edges
[Figure: embedding-space plot with relations speaks, language, livesIn; legend: training edge, test edge]
Learning Visualised
Predictions of a randomly initialised model
[Figure: embedding-space plot with relations speaks, language, livesIn; legend: training edge, test edge, false negative, false positive, match]
Learning Visualised
Predictions after training
[Figure: embedding-space plot with relations speaks, language, livesIn; legend: training edge, test edge, false negative, false positive, match]
Part 2: Injecting Prior Knowledge
(running example as above)
Learning Visualised
With sufficient training data
[Figure: embedding-space plot with relations speaks, language, livesIn; legend: training edge, test edge, false negative, false positive, match]
Learning Visualised
With limited training data
[Figure: embedding-space plot with relations language and livesIn; legend: training edge, test edge, false negative, false positive, match]
Limited Training Data
(Obviously) leads to errors
[Figure: embedding-space plot with relations language and livesIn; legend: training edge, test edge, false negative, false positive, match]
Errors
May violate our prior knowledge, e.g. the rule:

    When A lives in C, and C's language is L, then A likely speaks L
    livesIn(A, C), language(C, L) => speaks(A, L)
Solution
Add a loss term that punishes this specific violation of livesIn(A, C), language(C, L) => speaks(A, L), written over the embeddings of the entities and relations involved:

    … + loss(vA, vB, vC, vlives, vspeaks, vlang)
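One way to realise such a loss term, sketched under assumptions the slides leave open: score the rule's body with a soft conjunction (a min here) and penalise the amount by which the body outscores the head. The function name `rule_loss` and all vectors are illustrative, not from the papers.

```python
# Illustrative differentiable penalty for violating
# livesIn(A,C) AND language(C,L) => speaks(A,L):
# the rule is broken when the body scores high but the head scores low,
# so we penalise relu(body - head); min acts as a soft conjunction.
import numpy as np

def distmult(v_s, v_rel, v_o):
    return float(v_s @ (v_rel * v_o))

def rule_loss(v_A, v_C, v_L, v_lives, v_lang, v_speaks):
    body = min(distmult(v_A, v_lives, v_C), distmult(v_C, v_lang, v_L))
    head = distmult(v_A, v_speaks, v_L)
    return max(0.0, body - head)    # zero when the head scores at least the body

v = np.ones(4)
assert rule_loss(v, v, v, v, v, 2 * v) == 0.0   # head strong enough: no penalty
assert rule_loss(v, v, v, v, v, 0 * v) > 0.0    # head too weak: penalised
```

The penalty is differentiable almost everywhere, so it can simply be added to the training objective and minimised by gradient descent.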
Re-training
With the specific loss term added
[Figure: embedding-space plot with relations language and livesIn; legend: training edge, test edge, false negative, false positive, match]
Re-training
Fixes the specific error
[Figure: embedding-space plot with relations language and livesIn; legend: training edge, test edge, false negative, false positive, match]
Loss
We cannot add a term for every possible violation, so minimise the loss of the worst case:

    … + max_{A,B,C} loss(vA, vB, vC, vlives, vspeaks, vlang)

This max over all entity triples is expensive (combinatorial).
Loss
Avoid the combinatorial optimisation by maximising over embeddings instead of entities:

    … + max_{vA, vB, vC} loss(vA, vB, vC, vlives, vspeaks, vlang)

This is cheaper: a continuous maximisation.
Full Loss
Terms in the training objective for observed facts, plus the adversarial term:

    sum over (subject, predicate, object) in training data of  loss( s(subject, predicate, object) )
      + max_{vA, vB, vC} loss(vA, vB, vC, vlives, vspeaks, vlang)

Minimised in a two-player game.
Player 1: Link Predictor
Estimates entity and relation embeddings to fit the graph (gradient descent)
Player 2: Adversary
Synthesises entity embeddings that break the rules (gradient ascent)
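Player 2's inner maximisation can be sketched concretely. Everything here is an illustrative assumption rather than the exact setup of the papers: the rule is the simple implication "A worksIn B => A livesIn B" under a DistMult scorer, the relation vectors are fixed toy values, and a unit-ball projection keeps the adversary bounded.

```python
# Sketch of the adversary: instead of searching over discrete entities, it
# synthesises entity EMBEDDINGS vA, vB by gradient ascent on a rule-violation
# loss for "A worksIn B => A livesIn B" (DistMult scorer, fixed relations).
import numpy as np

r_works = np.array([1.0, 3.0])                  # body relation (fixed)
r_lives = np.array([1.0, 1.0])                  # head relation (fixed)
vA = np.array([0.1, 0.1])                       # adversarial entity embeddings
vB = np.array([0.1, 0.1])

def violation(vA, vB):
    # relu(s(A worksIn B) - s(A livesIn B)): positive means the rule is broken
    return max(0.0, vA @ (r_works * vB) - vA @ (r_lives * vB))

lr = 0.1
for _ in range(200):                            # gradient ASCENT on the violation
    if violation(vA, vB) > 0.0:                 # subgradient of the relu gate
        diff = r_works - r_lives
        gA, gB = diff * vB, diff * vA           # d violation / d vA, d vB
        vA, vB = vA + lr * gA, vB + lr * gB
    vA = vA / max(1.0, np.linalg.norm(vA))      # project back onto the unit ball
    vB = vB / max(1.0, np.linalg.norm(vB))

print(violation(vA, vB))                        # worst-case violation found
```

Player 1 would then take a gradient descent step on this same quantity with respect to the relation embeddings, alternating with the adversary.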
Player 1: Link Predictor
Estimates entity and relation embeddings to fit the graph and fix the violations (gradient descent)
Closed-Form Solutions
For some rules and models, the max expression has a closed-form solution. For the rule "When A relation1 B then A relation2 B":

    max_{vA, vB} loss(vA, vB, vrel1, vrel2) = loss_closed-form(vrel1, vrel2)

for DistMult, ComplEx, and TransE.
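As an illustration of why such closed forms exist (under assumptions chosen for this sketch, not necessarily those of the papers): for DistMult with entity embeddings kept in [0, 1]^d, the element-wise product vA * vB also lies in [0, 1]^d, so the worst-case violation of "A rel1 B => A rel2 B" is attained at a corner of the cube and depends only on the relation vectors.

```python
# Sketch of a closed form for the simple implication under DistMult:
# max over vA*vB in [0,1]^d of (vA*vB) @ (r1 - r2) = sum(relu(r1 - r2)),
# so no optimisation over entity embeddings is needed at all.
import numpy as np

def closed_form_violation(r1, r2):
    return float(np.maximum(r1 - r2, 0.0).sum())

# brute-force check over the corner products vA*vB in {0,1}^d
r1, r2 = np.array([2.0, -1.0, 0.5]), np.array([1.0, 3.0, 0.5])
corners = [np.array([a, b, c], float)
           for a in (0, 1) for b in (0, 1) for c in (0, 1)]
brute = max(p @ (r1 - r2) for p in corners)
print(brute, closed_form_violation(r1, r2))     # both equal 1.0
```

Replacing the inner maximisation by such a closed form makes the regulariser's cost independent of the number of entities.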
Entailment in Vector Space
Where to put the livesIn vector so that it is implied by worksIn?

    Enrique = [1, 1], Mexico = [2, 1], worksIn = [1, 1]
    s(Enrique worksIn Mexico) = ⟨Enrique ∘ Mexico, worksIn⟩ = ⟨[2, 1], [1, 1]⟩ = 3
    livesIn = ?
Entailment in Vector Space
Entailment: the livesIn score must be at least as high as the worksIn score:

    s(Enrique livesIn Mexico) ≥ s(Enrique worksIn Mexico) = ⟨[2, 1], [1, 1]⟩ = 3
Entailment in Vector Space
When livesIn is the same vector as worksIn:

    livesIn = [1, 1]:  ⟨[2, 1], [1, 1]⟩ = 3  (equal to the worksIn score)

When the first component is larger:

    livesIn = [2, 1]:  ⟨[2, 1], [2, 1]⟩ = 5
    livesIn = [3, 1]:  ⟨[2, 1], [3, 1]⟩ = 7
Entailment in Vector Space
And this holds for any (non-negative) input entity pair: if every component of livesIn is at least the corresponding component of worksIn (here livesIn = [y, z] with y ≥ 1, z ≥ 1), then for any pair representation [a, b] with a, b ≥ 0:

    ⟨[a, b], livesIn⟩ ≥ ⟨[a, b], worksIn⟩

Compare Order Embeddings (Vendrov et al. 2016)
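The construction above can be checked numerically; the concrete vectors are the toy values from the slides, and the random pairs stand in for "any non-negative input entity pair".

```python
# Numeric check of the entailment construction: with non-negative entity-pair
# representations, making livesIn component-wise >= worksIn guarantees the
# livesIn score is at least the worksIn score for EVERY input pair.
import numpy as np

works_in = np.array([1.0, 1.0])
lives_in = np.array([3.0, 1.0])                 # >= worksIn in every component
pair = np.array([2.0, 1.0])                     # (Enrique, Mexico)

assert pair @ works_in == 3.0 and pair @ lives_in == 7.0

rng = np.random.default_rng(0)
for _ in range(1000):                           # any non-negative pair works
    p = rng.uniform(0.0, 5.0, size=2)
    assert p @ lives_in >= p @ works_in
```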
Results
Applied to general link prediction problems. Integrated rules such as

    nationalityOf(P, N), officialLang(N, L) => speaks(P, L)

FB122, Hits@3:
    TransE                        59%
    KALE-TransE (Guo et al. 16)   62%
    DistMult                      67%
    DistMult with rules           71%
    ComplEx                       67%
    ComplEx with rules            72%

UAI 2017
Conclusion
Train a neural "sentence scorer" with limited data and prior knowledge.
Implemented via a two-player game:
  Player 1 learns to predict training facts and follow the rules
  Player 2 creates tuples (in embedding space) that violate the rules
Runtime is "independent of domain size".
Future work:
  inject prior knowledge in natural language
  extract explanations (see Sameer Singh's talk)
Papers presented in this work
Complex Embeddings for Simple Link Prediction. Théo Trouillon, Johannes Welbl, Sebastian Riedel, Éric Gaussier, Guillaume Bouchard. International Conference on Machine Learning (ICML) 2016.
Injecting Logical Background Knowledge into Embeddings for Relation Extraction. Tim Rocktäschel, Sameer Singh, Sebastian Riedel. Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL) 2015.
Lifted Rule Injection for Relation Embeddings. Thomas Demeester, Tim Rocktäschel, Sebastian Riedel. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) 2016.
Adversarial Sets for Regularising Neural Link Predictors. Pasquale Minervini, Thomas Demeester, Tim Rocktäschel, Sebastian Riedel. Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI) 2017.