TRANSCRIPT
Explainable, Data-Efficient, Verifiable Representation Learning
Pasquale Minervini, UCL [email protected]
ICL, Nov 27th
Symbolic and Sub-Symbolic AI

Symbolic — Ontologies, First-Order Logic, Logic Programming, Knowledge Bases, Theorem Proving, …

∀X, Y, Z : grandFather(X, Y) ⇐ father(X, Z), parent(Z, Y)

✓ Data-Efficient
✓ Interpretable and Explainable
✓ Easy to Incorporate Knowledge
✓ Verifiable

Sub-Symbolic/Connectionist — Neural Representation Learning, (Deep) Latent Variable Models, …

f₁(f₂(… fₙ(·) …)) = {0.15 ≡ cat, 0.85 ≡ dog}

✓ Handles Noisy, Ambiguous, Sensory Data
✓ Highly Parallel
✓ High Predictive Accuracy
Knowledge Graphs

Knowledge Graph — a graph-structured Knowledge Base, where knowledge is encoded by relationships between entities. In practice, a set of subject-predicate-object triples, each denoting a relationship of type predicate between the subject and the object.
subject predicate object
Barack Obama was born in Honolulu
Hawaii has capital Honolulu
Barack Obama is politician of United States
Hawaii is located in United States
Barack Obama is married to Michelle Obama
Michelle Obama is a Lawyer
Michelle Obama lives in United States
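The triples above can be represented directly in code. A minimal sketch (the helper `objects_of` is an illustrative name, not from the slides):

```python
# A Knowledge Graph as a set of (subject, predicate, object) triples,
# using the examples from the slide.
triples = {
    ("Barack Obama", "was born in", "Honolulu"),
    ("Hawaii", "has capital", "Honolulu"),
    ("Barack Obama", "is politician of", "United States"),
    ("Hawaii", "is located in", "United States"),
    ("Barack Obama", "is married to", "Michelle Obama"),
    ("Michelle Obama", "is a", "Lawyer"),
    ("Michelle Obama", "lives in", "United States"),
}

def objects_of(subject, predicate):
    """Return all objects o such that (subject, predicate, o) is in the graph."""
    return {o for (s, p, o) in triples if s == subject and p == predicate}

print(objects_of("Barack Obama", "was born in"))  # {'Honolulu'}
```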
Link Prediction in Knowledge Graphs

[Figure: a graph with nodes Washington, Malia Ann Obama, Sasha Obama, Barack Obama, and Michelle Obama, with "lives in" and "parent of" edges, and a missing "?" edge between Barack Obama and Michelle Obama]
Rule-Based Link Prediction

∀X, Y, Z : married with(X, Y) ⇐ parent of(X, Z), parent of(Y, Z)

[Figure: the same graph, where the rule predicts the missing "married with" edge between Barack Obama and Michelle Obama from their shared "parent of" edges]

✕ Not always true ✕ Hard to learn from data ✕ Hard to formalise for other modalities
Neural Link Prediction

P(BO married MO) ∝ f_married(e_BO, e_MO)

Learning Representations:

ℒ(𝒢 ∣ Θ) = ∑_{(s,p,o)∈𝒢} log σ(f_p(e_s, e_o)) + ∑_{(s,p,o)∉𝒢} log [1 − σ(f_p(e_s, e_o))]
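The objective above can be sketched in a few lines. A minimal, self-contained illustration with DistMult as the scoring function f_p (the entity and predicate names, and the random initialisation, are hypothetical):

```python
import math
import random

random.seed(0)
k = 4  # embedding dimension
entities = ["BO", "MO", "Honolulu"]
predicates = ["married", "born_in"]
E = {e: [random.gauss(0, 1) for _ in range(k)] for e in entities}
R = {p: [random.gauss(0, 1) for _ in range(k)] for p in predicates}

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def score(s, p, o):
    # DistMult scoring function: <e_s, r_p, e_o> = sum_i e_s[i] * r_p[i] * e_o[i]
    return sum(a * b * c for a, b, c in zip(E[s], R[p], E[o]))

def log_likelihood(positives, negatives):
    # L(G|Theta) = sum_{(s,p,o) in G}     log sigma(f_p(e_s, e_o))
    #            + sum_{(s,p,o) not in G} log(1 - sigma(f_p(e_s, e_o)))
    ll = sum(math.log(sigmoid(score(*t))) for t in positives)
    ll += sum(math.log(1.0 - sigmoid(score(*t))) for t in negatives)
    return ll

pos = [("BO", "married", "MO"), ("BO", "born_in", "Honolulu")]
neg = [("MO", "born_in", "Honolulu")]  # a sampled "corrupted" triple
ll = log_likelihood(pos, neg)
```

Training would maximise this log-likelihood with respect to the embeddings E and R, e.g. by stochastic gradient ascent; negatives are typically obtained by corrupting the subject or object of observed triples.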
Neural Link Prediction — Scoring Functions

The interaction between the latent features is defined by the scoring function f(·) — several variants exist in the literature:

RESCAL [Nickel et al. 2011]: f_p(e_s, e_o) = e_s⊤ W_p e_o, with W_p ∈ ℝ^{k×k}
TransE [Bordes et al. 2013]: f_p(e_s, e_o) = −‖e_s + r_p − e_o‖₂, with r_p ∈ ℝ^k
DistMult [Yang et al. 2015]: f_p(e_s, e_o) = ⟨e_s, r_p, e_o⟩, with r_p ∈ ℝ^k
HolE [Nickel et al. 2016]: f_p(e_s, e_o) = r_p⊤ (ℱ⁻¹[ℱ[e_s] ⊙ ℱ[e_o]]), with r_p ∈ ℝ^k
ComplEx [Trouillon et al. 2016]: f_p(e_s, e_o) = Re(⟨e_s, r_p, e̅_o⟩), with r_p ∈ ℂ^k
ConvE [Dettmers et al. 2017]: f_p(e_s, e_o) = f(vec(f([e_s; r_p] ∗ ω)) W) e_o, with r_p ∈ ℝ^k, W ∈ ℝ^{c×k}
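Three of the scoring functions in the table have one-line implementations. A sketch in plain Python (embeddings as lists; for ComplEx, lists of complex numbers):

```python
import math

def transe(e_s, r_p, e_o):
    # TransE: -||e_s + r_p - e_o||_2 — high when the relation acts
    # as a translation from subject to object.
    return -math.sqrt(sum((s + r - o) ** 2 for s, r, o in zip(e_s, r_p, e_o)))

def distmult(e_s, r_p, e_o):
    # DistMult: trilinear product <e_s, r_p, e_o>.
    return sum(s * r * o for s, r, o in zip(e_s, r_p, e_o))

def complex_score(e_s, r_p, e_o):
    # ComplEx: Re(<e_s, r_p, conj(e_o)>), with complex-valued embeddings;
    # the conjugate lets the model score asymmetric relations.
    return sum((s * r * o.conjugate()).real for s, r, o in zip(e_s, r_p, e_o))

# TransE scores a triple perfectly exactly when e_s + r_p == e_o:
e_s, r_p, e_o = [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]
best_transe = transe(e_s, r_p, e_o)  # distance 0, the maximum possible score
```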
Evaluation Metrics — Area Under the Precision-Recall Curve (AUC-PR), Mean Reciprocal Rank (MRR), Hits@k. In MRR and Hits@k, for each test triple:
• Replace its subject with every entity in the Knowledge Graph,
• Score all the resulting triple variants, and compute the rank of the original test triple,
• Repeat for the object.

MRR = (1/|𝒯|) ∑_{i=1}^{|𝒯|} 1/rank_i,   Hits@k = |{i : rank_i ≤ k}| / |𝒯|

Neural Link Prediction — Accuracy

[Results from Lacroix et al. ICML 2018]
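Both metrics follow directly from the list of ranks. A minimal sketch:

```python
def mrr(ranks):
    # MRR = (1/|T|) * sum_i 1/rank_i
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at_k(ranks, k):
    # Hits@k = |{i : rank_i <= k}| / |T|
    return sum(1 for r in ranks if r <= k) / len(ranks)

ranks = [1, 2, 10, 100]
m = mrr(ranks)            # (1 + 0.5 + 0.1 + 0.01) / 4 ≈ 0.4025
h = hits_at_k(ranks, 10)  # 3 of 4 ranks are <= 10, i.e. 0.75
```

MRR rewards placing the true triple near the very top (rank 1 contributes 1, rank 100 almost nothing), while Hits@k only asks whether it lands in the top k.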
Convolutional 2D Knowledge Graph Embeddings [AAAI 2018]

Idea — use ideas from computer vision for modeling the interactions between latent features:
• Reshape the Subject Embedding and the Predicate Embedding into 2D "latent images",
• Concatenate them and apply 2D Convolutions,
• Project the feature maps and multiply with the Entity Matrix to obtain the Logits.

✓ Scalable ✓ Efficiency via parameter sharing ✓ State-of-the-art results

[Bar chart (MRR, Hits@1, Hits@3, Hits@10; range 0.1-0.6) comparing DistMult, ComplEx, R-GCN, and ConvE]
Interpreting Knowledge Graph Embeddings [Minervini et al. ECML 2017]

It is quite hard to understand the semantics of the learned representations, but we can use their geometric relationships for identifying — and incorporating — semantic relationships between them.

Regularising Knowledge Graph Embeddings [Minervini et al. ECML 2017]

✕ The learned embeddings do not guarantee that constraints such as is_a(x, y) ∧ is_a(y, z) ⇒ is_a(x, z) hold.
Incorporating Background Knowledge via Adversarial Training [Minervini et al. UAI 2017]

Idea — an adversarial training process where, iteratively:
• An adversary searches for inputs where the model violates the constraints, e.g. S = {e_x, e_y, e_z} such that is_a(e_x, e_y) ∧ is_a(e_y, e_z) ∧ ¬is_a(e_x, e_z),
• The model is regularised to correct such violations.

Formally:

min_Θ ℒ_data(D ∣ Θ) + λ max_S ℒ_violation(S, D ∣ Θ)

• The inputs S can live either in input space or in embedding space,
• In the most interesting cases, the inner max has closed-form solutions,
• The constraints are then guaranteed to hold everywhere in embedding space.

✓ Incorporates Background Knowledge ✓ Verifiable
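To make the inner maximisation concrete, here is a toy sketch for the is_a transitivity constraint under a DistMult scorer. All names are illustrative; the violation measure is one of several possible choices, and the adversary here uses simple random hill-climbing rather than the closed-form or gradient-based maximisation used in the actual method:

```python
import math
import random

random.seed(0)
k = 4
r_is_a = [random.gauss(0, 1) for _ in range(k)]  # relation embedding for is_a

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def score(e_s, e_o):
    # DistMult score of is_a(s, o)
    return sum(s * r * o for s, r, o in zip(e_s, r_is_a, e_o))

def violation(ex, ey, ez):
    # Inconsistency w.r.t. is_a(x,y) ∧ is_a(y,z) ⇒ is_a(x,z):
    # positive when the body looks true but the head looks false.
    body = min(sigmoid(score(ex, ey)), sigmoid(score(ey, ez)))
    head = sigmoid(score(ex, ez))
    return max(0.0, body - head)

# The adversary: hill-climb over three adversarial embeddings e_x, e_y, e_z
# to maximise the violation loss.
S = [[random.gauss(0, 1) for _ in range(k)] for _ in range(3)]
best = violation(*S)
for _ in range(500):
    cand = [[v + random.gauss(0, 0.1) for v in e] for e in S]
    v = violation(*cand)
    if v > best:
        S, best = cand, v
```

The model would then be regularised by adding λ times this maximal violation to the data loss, pushing the embeddings towards a region where the constraint holds.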
Incorporating Background Knowledge via Adversarial Training [Minervini et al. UAI 2017]

[Bar chart (Hits@3, Hits@5, Hits@10; range 55-78) comparing TransE, KALE-Pre, KALE-Joint, DistMult, ASR-DistMult, ComplEx, and ASR-ComplEx]
Incorporating Background Knowledge in Natural Language Inference Models [Minervini et al. CoNLL 2018]

Natural Language Inference — detect the type of relationship, i.e. entailment, contradiction, or neutral, between two sentences.

Background knowledge: if a sentence x contradicts y, then y also contradicts x; if x entails y, and y entails z, then x also entails z.

Example:
x) A man in uniform is pushing a medical bed.
y) A man is pushing carrying something.
P(x entails y) = 0.72
P(y contradicts x) = 0.93
ℒ_violation({x, y}): 0.01 ⇝ 0.92
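The two rules above translate into simple violation losses over the model's predicted probabilities. A hedged sketch (function names are illustrative, and these are only one natural way to score the inconsistencies):

```python
def contradiction_symmetry_violation(p_con_xy, p_con_yx):
    # Rule: if x contradicts y, then y contradicts x.
    # Penalise being confident about con(x, y) but not about con(y, x).
    return max(0.0, p_con_xy - p_con_yx)

def entailment_transitivity_violation(p_ent_xy, p_ent_yz, p_ent_xz):
    # Rule: if x entails y, and y entails z, then x entails z.
    # Penalise a confident body with an unconfident head.
    return max(0.0, min(p_ent_xy, p_ent_yz) - p_ent_xz)

# A consistent model pays nothing; an inconsistent one pays the gap:
ok = contradiction_symmetry_violation(0.9, 0.9)      # 0.0
bad = contradiction_symmetry_violation(0.9, 0.2)     # ≈ 0.7
```

An adversary can then search for sentence pairs maximising these losses, and the NLI model is regularised to reduce them.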
End-to-End Differentiable Reasoning

Core idea — we can combine neural networks and symbolic models by re-implementing classic reasoning algorithms as end-to-end differentiable (neural) architectures.

(Black-Box) Neural Models:
• Can generalise from noisy and ambiguous modalities
• Can learn representations from data
• State-of-the-art on a number of tasks

Symbolic Reasoning Models:
• Data-efficient
• Interpretable
• Explainable
• Verifiable
• Can incorporate background knowledge and constraints
Reasoning via Backward Chaining

Backward Chaining — start with a list of goals, and work backwards from the consequent Q to the antecedent P, to see whether any data supports the antecedents. You can see backward chaining as a query reformulation strategy.

Example: given the rule q(X) ← p(X) and the facts p(a), p(b), p(c), …, the query q(a)? is reformulated into the subgoal p(a), which is supported by the Knowledge Base. ✓
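The q(a)? example above can be run through a deliberately simplified backward-chaining sketch. This toy interpreter does not thread substitutions across multiple subgoals (which is fine for single-atom rule bodies like the one on the slide), and all names are illustrative:

```python
# Facts are (pred, args...) tuples; rules map a head atom to a list of
# body atoms. Variables are uppercase strings, constants are lowercase.
facts = {("p", "a"), ("p", "b"), ("p", "c")}
rules = [(("q", "X"), [("p", "X")])]  # q(X) <- p(X)

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def substitute(atom, theta):
    return tuple(theta.get(t, t) for t in atom)

def unify(atom, other, theta):
    # Try to extend substitution theta so the two atoms match.
    if len(atom) != len(other):
        return None
    theta = dict(theta)
    for t, u in zip(atom, other):
        t = theta.get(t, t)
        if is_var(t):
            theta[t] = u
        elif t != u:
            return None
    return theta

def prove(goal, theta=None):
    # Backward chaining: check the facts, else reformulate via a rule body.
    goal = substitute(goal, theta or {})
    for fact in facts:  # direct support from the data
        if unify(goal, fact, {}) is not None:
            return True
    for head, body in rules:  # reformulate the query via a rule
        theta2 = unify(head, goal, {})
        if theta2 is not None and all(prove(sub, theta2) for sub in body):
            return True
    return False

print(prove(("q", "a")))  # True: q(a) reduces to the subgoal p(a)
```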
End-to-End Differentiable Reasoning

Symbol matching is soft: a query such as grandPaOf(abe, bart) can be proven from the fact grandFatherOf(abe, bart), with unification scores sim = 1 for abe, sim = 1 for bart, and sim = 0.9 for matching grandPaOf against grandFatherOf.

Knowledge Base:
fatherOf(abe, homer).
parentOf(homer, bart).
grandFatherOf(X, Y) ⇐ fatherOf(X, Z), parentOf(Z, Y).

Proving the query grandPaOf(abe, bart):
• Comparing the query directly against the facts fatherOf(abe, homer) and parentOf(homer, bart) yields proof scores S1 and S2.
• Unifying the query with the rule head grandFatherOf(X, Y), under the substitution {X/abe, Y/bart}, yields the subgoals fatherOf(abe, Z) and parentOf(Z, bart), with proof score S3.
• Proving the subgoal fatherOf(abe, Z) against the facts in the Knowledge Base, e.g. with Z/homer, yields proof scores S4, S5, …

Train via self-supervision:

∑_{F∈K} log p_{KB∖F}(F) − ∑_{F̃∼corr(F)} log p_{KB}(F̃)

Rules can also be learned from data by parameterising their predicates with trainable representations, e.g. θ₁(X, Y) ⇐ θ₂(X, Z), θ₃(Z, Y).

[Minervini et al. 2018, AAAI 2020, Welbl et al. ACL 2019]
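The soft unification idea can be sketched numerically. A toy illustration (the embeddings and the RBF similarity are hypothetical choices, and real differentiable provers aggregate over many proofs; here we score a single proof as the minimum similarity over its unification steps):

```python
import math

# Toy embeddings for predicate symbols; entities match exactly in this example.
emb = {
    "grandPaOf":     [1.0, 0.0, 0.9],
    "grandFatherOf": [1.0, 0.3, 0.9],
    "fatherOf":      [0.0, 1.0, 0.2],
}

def sim(a, b):
    # Soft unification score: RBF kernel on squared embedding distance,
    # so sim(s, s) = 1 and the score decays smoothly with distance.
    d2 = sum((x - y) ** 2 for x, y in zip(emb[a], emb[b]))
    return math.exp(-d2)

# Proof score for grandPaOf(abe, bart) via the fact grandFatherOf(abe, bart):
# the entities unify exactly (score 1), the predicates unify softly.
proof_score = min(sim("grandPaOf", "grandFatherOf"), 1.0, 1.0)
```

Because every step is differentiable, gradients of the proof score flow back into the symbol embeddings, which is what lets rules and representations be trained jointly.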
End-to-End Differentiable Reasoning
[Minervini et al. AAAI 2020]
End-to-End Differentiable Reasoning with Natural Language [Welbl et al. ACL 2019, Minervini et al. AAAI 2020]

We can embed facts from the KG and facts from text in a shared embedding space, and learn to reason over them jointly:

[Figure: KB facts, e.g. containedIn(River Thames, UK), and textual facts, e.g. "London is located in the UK" and "London is standing on the River Thames", are passed through encoders into shared KB and text representations; a query is answered by recursively applying rules, e.g. linking the pattern "[X] is located in the [Y]" to locatedIn(X, Y) and using locatedIn(X, Y) ⇐ locatedIn(X, Z), locatedIn(Z, Y), combining proofs with AND/OR over k-NN retrieval of supporting facts]
Thank you!