expanding the yago knowledge base - thomas rebele · yago knowledge base rebele the yago knowledge...
TRANSCRIPT
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
1/41
Expanding the YAGO knowledge base
Thomas Rebele
Télécom ParisTech
2018-07-19
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
What is a knowledgebase?
What is YAGO?
Involvement
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
What is a knowledge base?
2/41
Albert EinsteinMileva Maric
marriedAlfred Kleinerhas advisor
Nobel Prize in Physics
won prize
◮
◮
◮
◮
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
What is a knowledgebase?
What is YAGO?
Involvement
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
What is a knowledge base?
2/41
Albert EinsteinMileva Maric
marriedAlfred Kleinerhas advisor
Nobel Prize in Physics
won prize
Applications of knowledge bases:
◮ Question answering
◮ Semantic search
◮ Text analysis
◮ Machine translation
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
What is a knowledgebase?
What is YAGO?
Involvement
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
What is YAGO?
3/41
◮ Knowledge base with 10 million entities and >210 million facts
◮ Automatically extracted from Wikipedia, Wordnet, and Geonames
◮ Multilingual facts from 10 languages
◮ Focus on precision
◮ Developed by Max-Planck Institute for Informatics and TélécomParisTech
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
What is a knowledgebase?
What is YAGO?
Involvement
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
What is YAGO?
4/41
Languages
EnglishGermanFrench
Albert Einstein
Albert Einstein was aphysicist. His work in-fluenced the philosophyof science. He developedthe theory of relativity.
Albert Einstein
Spouse(s) Mileva MaricDoctoral advisor(s) Alfred Kleiner
Categories: Nobel laureates in Physics
Albert EinsteinMileva Maric
marriedAlfred Kleinerhas advisor
Nobel Prize in Physics
won prize
automaticextraction
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
What is a knowledgebase?
What is YAGO?
Involvement
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
Involvement
5/41
◮ I joined the project in 2014
◮ Maintenance and development
◮ Contributed to open source release in 2017 athttps://github.com/yago-naga/yago3/
◮ Coordinated / contributed to the evaluation
◮ ground truth: Wikipedia◮ 98% facts of the sample were correct
Publication: ISWC 2016 (resource paper)
Thomas Rebele Fabian Suchanek Johannes Hoffart Joanna Biega Erdal Kuzey Gerhard Weikum
YAGO is very accurate. But how complete is it?
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
6/41
Contributions:
Extracting more information aboutresidences, gender, birth and death dates
Repairing regular expressionsby adding missing words
Preprocessing tabular databy transforming queries to Bash scripts
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
6/41
Contributions:
Extracting more information aboutresidences, gender, birth and death dates
Repairing regular expressionsby adding missing words
Preprocessing tabular databy transforming queries to Bash scripts
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Problem statement
Extensions
Place of residence
Gender
Evaluation
Births per month
Life span over time
Relative populationsize
Summary
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
Using YAGO for the humanities: Problem statement
7/41
◮ Every person lives somewhere,but YAGO knows the residence only for 30% of the people
◮ Every person has a gender,but YAGO knows the gender only for 64% of the people
How can we make YAGO more complete?
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Problem statement
Extensions
Place of residence
Gender
Evaluation
Births per month
Life span over time
Relative populationsize
Summary
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
Using YAGO for the humanities: Place of residence
8/41
Plato
Plato was a philoso-pher. He founded theAcademy in Athens. Helaid the foundation forphilosophy.
Plato
Birthplace Athens
Categories: 420s BC births | 340s BCdeaths | Greek philosopher | Greekmale wrestler | Austrian writer
residence
previous approach
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Problem statement
Extensions
Place of residence
Gender
Evaluation
Births per month
Life span over time
Relative populationsize
Summary
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
Using YAGO for the humanities: Place of residence
8/41
Plato
Plato was a philoso-pher. He founded theAcademy in Athens. Helaid the foundation forphilosophy.
Plato
Birthplace Athens
Categories: 420s BC births | 340s BCdeaths | Greek philosopher | Greekmale wrestler | Austrian writer
residence
previous approach
AustriaGreece Greece
mapping of
5900 demonyms
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Problem statement
Extensions
Place of residence
Gender
Evaluation
Births per month
Life span over time
Relative populationsize
Summary
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
Using YAGO for the humanities: Place of residence
8/41
Plato
Plato was a philoso-pher. He founded theAcademy in Athens. Helaid the foundation forphilosophy.
Plato
Birthplace Athens
Categories: 420s BC births | 340s BCdeaths | Greek philosopher | Greekmale wrestler | Austrian writer
residence
previous approach
AustriaGreece Greece
mapping of
5900 demonyms
Greece: 2Austria: 1
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Problem statement
Extensions
Place of residence
Gender
Evaluation
Births per month
Life span over time
Relative populationsize
Summary
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
Using YAGO for the humanities: Place of residence
8/41
Plato
Plato was a philoso-pher. He founded theAcademy in Athens. Helaid the foundation forphilosophy.
Plato
Birthplace Athens
Categories: 420s BC births | 340s BCdeaths | Greek philosopher | Greekmale wrestler | Austrian writer
residence
previous approach
AustriaGreece Greece
mapping of
5900 demonyms
Greece: 2Austria: 1
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Problem statement
Extensions
Place of residence
Gender
Evaluation
Births per month
Life span over time
Relative populationsize
Summary
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
Using YAGO for the humanities: Gender
9/41
Extract gender:
Languages
EnglishGermanFrench
Albert Einstein
Albert Einstein was aphysicist. His work in-fluenced the philosophyof science. He developedthe theory of relativity.
Albert Einstein
Categories: Male scientist | Swissphysicists
◮
◮
From pronoun:
◮ YAGO’s originalalgorithm
◮ Count pronouns (he,him / she, her)
◮ Assign genderaccordingly
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Problem statement
Extensions
Place of residence
Gender
Evaluation
Births per month
Life span over time
Relative populationsize
Summary
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
Using YAGO for the humanities: Gender
9/41
Extract gender:
Languages
EnglishGermanFrench
Albert Einstein
Albert Einstein was aphysicist. His work in-fluenced the philosophyof science. He developedthe theory of relativity.
Albert Einstein
Categories: Male scientist | Swissphysicists
From category
◮
◮
From pronoun:
◮ YAGO’s originalalgorithm
◮ Count pronouns (he,him / she, her)
◮ Assign genderaccordingly
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Problem statement
Extensions
Place of residence
Gender
Evaluation
Births per month
Life span over time
Relative populationsize
Summary
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
Using YAGO for the humanities: Gender
9/41
Extract gender:
Languages
EnglishGermanFrench
Albert Einstein
Albert Einstein was aphysicist. His work in-fluenced the philosophyof science. He developedthe theory of relativity.
Albert Einstein
Categories: Male scientist | Swissphysicists
From category From first name:
◮ Countmales/females foreach first name
◮ Assign names togender accordingly
From pronoun:
◮ YAGO’s originalalgorithm
◮ Count pronouns (he,him / she, her)
◮ Assign genderaccordingly
Using YAGO for the humanities: Evaluation
10/41
◮ Compare extraction process on Wikipedia dump from 2017-02-20
◮ Extracted on 11 languages
◮ Evaluate precision based on a sample of 100 people
ExtractionYAGO
beforeRecall
YAGO
nowRecall Precision DBpedia (en)
Place of
residence 0.7m 30% 2.1m 91% (+201%) 97% (*) 0.7m
Gender 1.5m 64% 2.0m 87% (+35%) 98% 4k
Birth dates 1.6m 69% 1.7m 74% (+8%) 100% 0.8m
Death dates 0.7m 33% 0.8m 36% (+10%) 100% 0.3m
Table: Coverage and precision of our methods.Recall relative to total number of people in YAGO (2.2m).
m million k thousand(*) 6% of anachronistic residencies (e.g., German Empire instead of Germany)
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Problem statement
Extensions
Place of residence
Gender
Evaluation
Births per month
Life span over time
Relative populationsize
Summary
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
Using YAGO for the humanities: Births per month
11/41
0 2 4 6 8 10 12
7.5%
8%
8.5%
9%
Month
Rel
ativ
eb
irth
s
YAGO - all
National Center for Health Statistics
Figure: Births per month in the United States between 2003 and 2015(with the Student’s t confidence interval at α = 95%).
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Problem statement
Extensions
Place of residence
Gender
Evaluation
Births per month
Life span over time
Relative populationsize
Summary
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
Using YAGO for the humanities: Births per month
11/41
0 2 4 6 8 10 12
7.5%
8%
8.5%
9%
Month
Rel
ativ
eb
irth
s
YAGO - all
National Center for Health Statistics
Figure: Births per month in the United States between 2003 and 2015(with the Student’s t confidence interval at α = 95%).
Languages
EnglishEuskara
Relative age effect
The relative age effectdescribes a bias. Peopleborn early in the selec-tion period of sports oracademia are more likelyto succeed.
Categories: Ageism|Epidemiology
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Problem statement
Extensions
Place of residence
Gender
Evaluation
Births per month
Life span over time
Relative populationsize
Summary
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
Using YAGO for the humanities: Births per month
11/41
0 2 4 6 8 10 12
7.5%
8%
8.5%
9%
Month
Rel
ativ
eb
irth
s
YAGO - no sportsmen
National Center for Health Statistics
Figure: Births per month in the United States between 2003 and 2015(with the Student’s t confidence interval at α = 95%).
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Problem statement
Extensions
Place of residence
Gender
Evaluation
Births per month
Life span over time
Relative populationsize
Summary
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
Using YAGO for the humanities: Life span over time
12/41
100 500 1000 1500 1900
45
50
55
60
65
70
75
80
85
Year
Med
ian
age
male
female
Figure: Median age over time, by year of birth(with the Student’s t confidence interval at α = 95%).
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Problem statement
Extensions
Place of residence
Gender
Evaluation
Births per month
Life span over time
Relative populationsize
Summary
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
Using YAGO for the humanities: Relative population size
13/41
−3000 −2500 −2000 −1500 −1000 −500 0 500 1000 1500 20000
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.81
Year
Rel
ativ
ep
op
ula
tio
nsi
ze
Egypt
Babylonian-EmpireSyria
ChinaGreece
Ancient-RomeBritainFrance
ItalyGermany
United-States
Figure: Relative population size, by century. The y-axis is scaled by a quadraticfunction.
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Problem statement
Extensions
Place of residence
Gender
Evaluation
Births per month
Life span over time
Relative populationsize
Summary
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
Using YAGO for the humanities: Summary
14/41
◮ Extension of YAGO:◮ More people with residences (+201%, 97% precison)◮ More people with genders (+35%, 98% precision)◮ More birth and death dates (+8%/10%, 100% precision)
◮ Case studies:◮ Births per month◮ Life span over time◮ Relative population size over time
◮ Interdisciplinary project
Publication: ISWC 2017 (workshop paper)
Thomas Rebele Arash Nekoei Fabian Suchanek
We often had to repair regular expressions (e.g., for matching dates).Can we automate this step?
100 500 1000 1500 1900
50
60
70
80
Year
Med
ian
age
male
female
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Problem statement
Extensions
Place of residence
Gender
Evaluation
Births per month
Life span over time
Relative populationsize
Summary
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
Using YAGO for the humanities: Summary
15/41
Contributions:
Extracting more information aboutresidences, gender, birth and death dates
Repairing regular expressionsby adding missing words
Preprocessing tabular databy transforming queries to Bash scripts
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
Introduction
Problem statement
What is new in ourapproach
Approximate regexmatching
Finding the gaps
Add missing parts
Feedback function
Experiments
Summary
AnsweringQueries with UnixShell
Conclusion
Adding Words to Regexes: Introduction
16/41
Why does YAGO not knowthe ISBN numbers of my books?
◮ We want to find ISBN numbers in Wikipedia to include it in YAGO
◮ We try the regex ISBN(978|979)?\d{10}
◮ I978-2-1234-5680-3
◮
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
Introduction
Problem statement
What is new in ourapproach
Approximate regexmatching
Finding the gaps
Add missing parts
Feedback function
Experiments
Summary
AnsweringQueries with UnixShell
Conclusion
Adding Words to Regexes: Introduction
16/41
Why does YAGO not knowthe ISBN numbers of my books?
◮ We want to find ISBN numbers in Wikipedia to include it in YAGO
◮ We try the regex ISBN(978|979)?\d{10}
◮ Why does the regex not find I978-2-1234-5680-3 ?
◮ How can we modify the regex automatically to match the word?
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
Introduction
Problem statement
What is new in ourapproach
Approximate regexmatching
Finding the gaps
Add missing parts
Feedback function
Experiments
Summary
AnsweringQueries with UnixShell
Conclusion
Adding Words to Regexes: Problem statement
17/41
Problem statement, first try:
Given
◮ a regular expression r and ISBN(978|979)?\d{10}
◮ a set of positive examples E+, { I978-2-1234-5680-3 }find a regular expression r′ such that
◮ L(r) ⊆ L(r′)
◮ E+ ⊆ L(r′)
′ = .∗
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
Introduction
Problem statement
What is new in ourapproach
Approximate regexmatching
Finding the gaps
Add missing parts
Feedback function
Experiments
Summary
AnsweringQueries with UnixShell
Conclusion
Adding Words to Regexes: Problem statement
17/41
Problem statement, first try:
Given
◮ a regular expression r and ISBN(978|979)?\d{10}
◮ a set of positive examples E+, { I978-2-1234-5680-3 }find a regular expression r′ such that
◮ L(r) ⊆ L(r′)
◮ E+ ⊆ L(r′)
Solution:
r′ = .∗
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
Introduction
Problem statement
What is new in ourapproach
Approximate regexmatching
Finding the gaps
Add missing parts
Feedback function
Experiments
Summary
AnsweringQueries with UnixShell
Conclusion
Adding Words to Regexes: Problem statement
18/41
Problem statement:
Given
◮ a regular expression r, ISBN(978|979)?\d{10}
◮ a set of positive examples E+, { I978-2-1234-5680-3 }◮ a set of negative examples E−, { 0612345678 }
find a regular expression r′ such that
◮ L(r) ⊆ L(r′)
◮ E+ ⊆ L(r′)
◮ L(r′) ∩ E− is small
◮ ′ ≥ ≈
◮ ′ ≥
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
Introduction
Problem statement
What is new in ourapproach
Approximate regexmatching
Finding the gaps
Add missing parts
Feedback function
Experiments
Summary
AnsweringQueries with UnixShell
Conclusion
Adding Words to Regexes: Problem statement
18/41
Problem statement:
Given
◮ a regular expression r, ISBN(978|979)?\d{10}
◮ a set of positive examples E+, { I978-2-1234-5680-3 }◮ a set of negative examples E−, { 0612345678 }
find a regular expression r′ such that
◮ L(r) ⊆ L(r′)
◮ E+ ⊆ L(r′)
◮ L(r′) ∩ E− is small
Evaluation:
◮ Precision of r′ ≥ or ≈ precision of r
◮ Recall of r′ ≥ recall of r(w.r.t. the intended meaning of the regex)
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
Introduction
Problem statement
What is new in ourapproach
Approximate regexmatching
Finding the gaps
Add missing parts
Feedback function
Experiments
Summary
AnsweringQueries with UnixShell
Conclusion
Adding Words to Regexes: What is new in our approach
19/41
Previous approaches:
regex E+ E− regex+ + −→
Our approach:
regex E+ E− regex+ + −→
Rationale: creating a large set of positive examples is difficult
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
Introduction
Problem statement
What is new in ourapproach
Approximate regexmatching
Finding the gaps
Add missing parts
Feedback function
Experiments
Summary
AnsweringQueries with UnixShell
Conclusion
Adding Words to Regexes: Approximate regex matching
20/41
Step 1: match string and regex approximately [Myers et al. 1989]
.
.
I S B N
?
|
.
9 7 8
.
9 7 9
.
\d \d ... \d \d
I 9 7 8 - 2 - 1 2 3 4 - 5 6 8 0 - 3
...
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
Introduction
Problem statement
What is new in ourapproach
Approximate regexmatching
Finding the gaps
Add missing parts
Feedback function
Experiments
Summary
AnsweringQueries with UnixShell
Conclusion
Adding Words to Regexes: Finding the gaps
21/41
Step 2: find the gaps
◮ Between regex leaves
◮
.
.
I S B N
?
|
.
9 7 8
.
9 7 9
.
\d \d ... \d \dS B N
I 9 7 8 - 2 - 1 2 3 4 - 5 6 8 0 - 3
...
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
Introduction
Problem statement
What is new in ourapproach
Approximate regexmatching
Finding the gaps
Add missing parts
Feedback function
Experiments
Summary
AnsweringQueries with UnixShell
Conclusion
Adding Words to Regexes: Finding the gaps
21/41
Step 2: find the gaps
◮ Between regex leaves
◮ Between characters of the string
.
.
I S B N
?
|
.
9 7 8
.
9 7 9
.
\d \d ... \d \d
I 9 7 8 - 2 - 1 2 3 4 - 5 6 8 0 - 3-
...
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
Introduction
Problem statement
What is new in ourapproach
Approximate regexmatching
Finding the gaps
Add missing parts
Feedback function
Experiments
Summary
AnsweringQueries with UnixShell
Conclusion
Adding Words to Regexes: Add missing parts
22/41
Step 3 (simple approach): adapt regex, so that it includes the missingparts
.
.
I S B N
?
|
.
9 7 8
.
9 7 9
.
\d \d ... \d \dS B N
I 9 7 8 - 2 - 1 2 3 4 - 5 6 8 0 - 3
...
.
.
I ?
.
S B N
?
|
.
9 7 8
.
9 7 9
.
\d \d ... \d \d
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
Introduction
Problem statement
What is new in ourapproach
Approximate regexmatching
Finding the gaps
Add missing parts
Feedback function
Experiments
Summary
AnsweringQueries with UnixShell
Conclusion
Adding Words to Regexes: Add missing parts
22/41
Step 3 (simple approach): adapt regex, so that it includes the missingparts
.
.
I S B N
?
|
.
9 7 8
.
9 7 9
.
\d \d ... \d \d
I 9 7 8 - 2 - 1 2 3 4 - 5 6 8 0 - 3-
...
.
.
I ?
.
S B N
?
|
.
9 7 8
.
9 7 9
?
-
.
\d \d ... \d \d
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
Introduction
Problem statement
What is new in ourapproach
Approximate regexmatching
Finding the gaps
Add missing parts
Feedback function
Experiments
Summary
AnsweringQueries with UnixShell
Conclusion
Adding Words to Regexes: Add missing parts
22/41
Step 3 (simple approach): adapt regex, so that it includes the missingparts
.
.
I S B N
?
|
.
9 7 8
.
9 7 9
.
\d \d ... \d \d
I 9 7 8 - 2 - 1 2 3 4 - 5 6 8 0 - 3-
...
.
.
I ?
.
S B N
?
|
.
9 7 8
.
9 7 9
?
-
.
\d ?
-
\d ... \d \d
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
Introduction
Problem statement
What is new in ourapproach
Approximate regexmatching
Finding the gaps
Add missing parts
Feedback function
Experiments
Summary
AnsweringQueries with UnixShell
Conclusion
Adding Words to Regexes: Add missing parts
22/41
Step 3 (simple approach): adapt regex, so that it includes the missingparts
.
.
I S B N
?
|
.
9 7 8
.
9 7 9
.
\d \d ... \d \d
I 9 7 8 - 2 - 1 2 3 4 - 5 6 8 0 - 3-
...
.
.
I ?
.
S B N
?
|
.
9 7 8
.
9 7 9
?
-
.
\d ?
-
\d ... \d ?
-
\d
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
Introduction
Problem statement
What is new in ourapproach
Approximate regexmatching
Finding the gaps
Add missing parts
Feedback function
Experiments
Summary
AnsweringQueries with UnixShell
Conclusion
Adding Words to Regexes: Add missing parts
23/41
Step 3 (adaptive approach): adapt regex, so that it includes the missingparts
Exemplarily for a concatenation:
.
a b ... c d
gs
2g1 g
e
2g3
{g1, g2, g3}
→
.
a
?
b ... c d
{g1, gs
2} {g e
2, g3}
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
Introduction
Problem statement
What is new in ourapproach
Approximate regexmatching
Finding the gaps
Add missing parts
Feedback function
Experiments
Summary
AnsweringQueries with UnixShell
Conclusion
Adding Words to Regexes: Add missing parts
23/41
Step 3 (adaptive approach): adapt regex, so that it includes the missingparts
Exemplarily for a concatenation:
.
a b ... c d
gs
2g1 g
e
2g3
{g1, g2, g3}
→
.
a
?
b ... c d
{g1, gs
2} {g e
2, g3}
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
Introduction
Problem statement
What is new in ourapproach
Approximate regexmatching
Finding the gaps
Add missing parts
Feedback function
Experiments
Summary
AnsweringQueries with UnixShell
Conclusion
Adding Words to Regexes: Feedback function
24/41
Now we want to find URLs:
◮ We try regex r = http://[a-zA-Z\.]+
◮ It does not find s = wikipedia.org
◮ Repaired regex r′ = (http://)?[a-zA-Z\.]+
Problem:
◮ r′ finds all words
◮ Precision drops
−
◮
◮
( ′) = | − ∩ ( ′)| ≤ α| − ∩ ( )|
◮ ( ′) =http://[a-zA-Z\.]+|wikipedia.org
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
Introduction
Problem statement
What is new in ourapproach
Approximate regexmatching
Finding the gaps
Add missing parts
Feedback function
Experiments
Summary
AnsweringQueries with UnixShell
Conclusion
Adding Words to Regexes: Feedback function
24/41
Now we want to find URLs:
◮ We try regex r = http://[a-zA-Z\.]+
◮ It does not find s = wikipedia.org
◮ Repaired regex r′ = (http://)?[a-zA-Z\.]+
Problem:
◮ r′ finds all words
◮ Precision drops
Solution: use feedback on set of negative examples E−
◮ Determine the parts of the regex that we can make optional
◮ We use the number of false positives, i.e.,
f (r′) = |E− ∩ L(r′)| ≤ α|E− ∩ L(r)|
◮ If f (r′) = false, add the word as disjunction instead:http://[a-zA-Z\.]+|wikipedia.org
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
Introduction
Problem statement
What is new in ourapproach
Approximate regexmatching
Finding the gaps
Add missing parts
Feedback function
Experiments
Summary
AnsweringQueries with UnixShell
Conclusion
Adding Words to Regexes: Experiments
25/41
Input data:
◮ Datasets:ReLIE [Li et al., 2008],Enron [Babbar et al., 2010], andWikipedia infobox attributes
◮ In total 8 tasks (e.g., phone numbers, software names, dates)
◮ In total 52 regexes
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
Introduction
Problem statement
What is new in ourapproach
Approximate regexmatching
Finding the gaps
Add missing parts
Feedback function
Experiments
Summary
AnsweringQueries with UnixShell
Conclusion
Adding Words to Regexes: Experiments
26/41
Input data:
◮ Datasets:ReLIE [Li et al., 2008],Enron [Babbar et al., 2010], andWikipedia infobox attributes
◮ In total 8 tasks (e.g., phonenumbers, software names, dates)
◮ In total 52 regexes
Approaches:
◮ Dis: r|s1| · · · |sn
◮ Star: .*
◮ B&S: [Babbar et al., 2010](reimplementation)
◮ Simple
◮ Adaptive
baseline adaptive
measure original dis star B&S simple α = 1.0 α = 1.1 α = 1.20
F1 55 55 21 40 56 60 60 60recall 66 67 62 35 69 75 76 77precision 64 64 14 71 64 63 63 63
length 56 270 2 3929 250 76 80 81
Table: Averaged measures for the different systems. Length is # of characters ofthe regex.
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
Introduction
Problem statement
What is new in ourapproach
Approximate regexmatching
Finding the gaps
Add missing parts
Feedback function
Experiments
Summary
AnsweringQueries with UnixShell
Conclusion
Adding Words to Regexes: Summary
27/41
Summary:
◮ Algorithm for adding missing words to regexes
◮ Increases recall, while keeping precision stable
◮ Source code available athttps://github.com/thomasrebele/regex-repair
Future work:
◮ Decrease dependency on E−
◮ Add a generalization step as postprocessing
Publications: ISWC 2017 (demo), PAKDD 2018 (full paper)
Thomas Rebele Katerina Tzompanaki Fabian Suchanek
Now that we have all this data, how can we process it efficiently?
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
Introduction
Problem statement
What is new in ourapproach
Approximate regexmatching
Finding the gaps
Add missing parts
Feedback function
Experiments
Summary
AnsweringQueries with UnixShell
Conclusion
Adding Words to Regexes: Summary
28/41
Contributions:
Extracting more information aboutresidences, gender, birth and death dates
Repairing regular expressionsby adding missing words
Preprocessing tabular databy transforming queries to Bash scripts
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Motivation
Idea
Approach
Optimization
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Motivation
29/41
How can I findall my academic ancestors?
Albert Einstein
Relativity
teaches
Alfred Kleiner
Statistical physics
teaches
hasAdvisor
Johann Müller
Electromagnetism
teaches
hasAdvisor
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Motivation
Idea
Approach
Optimization
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Idea
30/41
QSPARQL / OWL
qresult
database
qTSV/n-triples/
/
σπ
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Motivation
Idea
Approach
Optimization
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Idea
30/41
QSPARQL / OWL
qresult
database
qTSV/n-triples/
/
σπ
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Motivation
Idea
Approach
Optimization
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Idea
30/41
QSPARQL / OWL
qresult
database
qTSV/n-triples/
/
&Bash script
Q1. Datalog
σπ2. Algebra
÷3. Optimize
^4. Translate
/
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Motivation
Idea
Approach
Optimization
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Idea
30/41
QSPARQL / OWL
qresult
database
qTSV/n-triples/
/
&Bash script
Q1. Datalog
σπ2. Algebra
÷3. Optimize
^4. Translate
/
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Motivation
Idea
Approach
Optimization
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Approach
31/41
QSPARQL / OWL
&Bash script
Q1. Datalog
σπ2. Algebra
÷3. Optimize
^4. Translate
Query "Who are Einstein’s academic ancestors?" in SPARQL:
SELECT ?Y WHERE {<Einstein> <hasAdvisor>+ ?Y
}
Translating the query to Datalog (simplified):
facts(X, Y, Z) :~ read_ntriples input
adv(X, Y) :- facts(X, "hasAdvisor", Y).
result(Y) :- adv("Einstein", Y).result(Y) :- result(X), adv(X, Y).
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Motivation
Idea
Approach
Optimization
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Approach
32/41
QSPARQL / OWL
&Bash script
Q1. Datalog
σπ2. Algebra
÷3. Optimize
^4. Translate
facts(X, Y, Z) :~ read_ntriples input
adv(X, Y) :- facts(X, "hasAdvisor", Y).
result(Y) :- adv("Einstein", Y).result(Y) :- result(X), adv(X, Y).
µx
∪
π2
σ1=Einstein
π1,3
σ2=hasAdvisor
input
π3
⋊⋉1=1
x π1,3
σ2=hasAdvisor
input
( ) ( )
( )
( , )
( , )( , )
( , ., )( , ., )
( )
( , , )
( ) ( , )( , )
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Motivation
Idea
Approach
Optimization
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Approach
32/41
QSPARQL / OWL
&Bash script
Q1. Datalog
σπ2. Algebra
÷3. Optimize
^4. Translate
facts(X, Y, Z) :~ read_ntriples input
adv(X, Y) :- facts(X, "hasAdvisor", Y).
result(Y) :- adv("Einstein", Y).result(Y) :- result(X), adv(X, Y).
µx
∪
π2
σ1=Einstein
π1,3
σ2=hasAdvisor
input
π3
⋊⋉1=1
x π1,3
σ2=hasAdvisor
input
( ) ( )
( )
( , )
( , )( , )
( , ., )( , ., )
( )
( , , )
( )
projection
( , )( , )
selection
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Motivation
Idea
Approach
Optimization
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Approach
32/41
QSPARQL / OWL
&Bash script
Q1. Datalog
σπ2. Algebra
÷3. Optimize
^4. Translate
facts(X, Y, Z) :~ read_ntriples input
adv(X, Y) :- facts(X, "hasAdvisor", Y).
result(Y) :- adv("Einstein", Y).result(Y) :- result(X), adv(X, Y).
µx
∪
π2
σ1=Einstein
π1,3
σ2=hasAdvisor
input
π3
⋊⋉1=1
x π1,3
σ2=hasAdvisor
input
( ) ( )
( )
( , )
( , )( , )
( , ., )( , ., )
( )
( , , )
( )
projection
( , )( , )
selection
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Motivation
Idea
Approach
Optimization
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Approach
32/41
QSPARQL / OWL
&Bash script
Q1. Datalog
σπ2. Algebra
÷3. Optimize
^4. Translate
facts(X, Y, Z) :~ read_ntriples input
adv(X, Y) :- facts(X, "hasAdvisor", Y).
result(Y) :- adv("Einstein", Y).result(Y) :- result(X), adv(X, Y).
µx
∪
π2
σ1=Einstein
π1,3
σ2=hasAdvisor
input
π3
⋊⋉1=1
x π1,3
σ2=hasAdvisor
input
( ) ( )
( )
( , )
( , )( , )
( , ., )( , ., )
( )
join
( , , )
( ) ( , )( , )
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Motivation
Idea
Approach
Optimization
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Approach
32/41
QSPARQL / OWL
&Bash script
Q1. Datalog
σπ2. Algebra
÷3. Optimize
^4. Translate
facts(X, Y, Z) :~ read_ntriples input
adv(X, Y) :- facts(X, "hasAdvisor", Y).
result(Y) :- adv("Einstein", Y).result(Y) :- result(X), adv(X, Y).
µx
∪
π2
σ1=Einstein
π1,3
σ2=hasAdvisor
input
π3
⋊⋉1=1
x π1,3
σ2=hasAdvisor
input
least fixed point
( ) ( )
( )
( , )
( , )( , )
( , ., )( , ., )
( )
( , , )
recursive call
( ) ( , )( , )
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Motivation
Idea
Approach
Optimization
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Approach
32/41
QSPARQL / OWL
&Bash script
Q1. Datalog
σπ2. Algebra
÷3. Optimize
^4. Translate
facts(X, Y, Z) :~ read_ntriples input
adv(X, Y) :- facts(X, "hasAdvisor", Y).
result(Y) :- adv("Einstein", Y).result(Y) :- result(X), adv(X, Y).
µx
∪
π2
σ1=Einstein
π1,3
σ2=hasAdvisor
input
π3
⋊⋉1=1
x π1,3
σ2=hasAdvisor
input
( ) ( )
( )
( , )
( , )( , )
( , s., )( , s., )
( )
( , , )
( ) ( , )( , )
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Motivation
Idea
Approach
Optimization
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Approach
32/41
QSPARQL / OWL
&Bash script
Q1. Datalog
σπ2. Algebra
÷3. Optimize
^4. Translate
facts(X, Y, Z) :~ read_ntriples input
adv(X, Y) :- facts(X, "hasAdvisor", Y).
result(Y) :- adv("Einstein", Y).result(Y) :- result(X), adv(X, Y).
µx
∪
π2
σ1=Einstein
π1,3
σ2=hasAdvisor
input
π3
⋊⋉1=1
x π1,3
σ2=hasAdvisor
input
( )
( )
( )
( , )
( , )( , )
( , s., )( , s., )
( )
( , , )
( ) ( , )( , )
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Motivation
Idea
Approach
Optimization
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Approach
32/41
QSPARQL / OWL
&Bash script
Q1. Datalog
σπ2. Algebra
÷3. Optimize
^4. Translate
facts(X, Y, Z) :~ read_ntriples input
adv(X, Y) :- facts(X, "hasAdvisor", Y).
result(Y) :- adv("Einstein", Y).result(Y) :- result(X), adv(X, Y).
µx
∪
π2
σ1=Einstein
π1,3
σ2=hasAdvisor
input
π3
⋊⋉1=1
x π1,3
σ2=hasAdvisor
input
( )
( )
( )
( , )
( , )( , )
( , s., )( , s., )
( )
( , , )
( )
( , )( , )
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Motivation
Idea
Approach
Optimization
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Approach
32/41
QSPARQL / OWL
&Bash script
Q1. Datalog
σπ2. Algebra
÷3. Optimize
^4. Translate
facts(X, Y, Z) :~ read_ntriples input
adv(X, Y) :- facts(X, "hasAdvisor", Y).
result(Y) :- adv("Einstein", Y).result(Y) :- result(X), adv(X, Y).
µx
∪
π2
σ1=Einstein
π1,3
σ2=hasAdvisor
input
π3
⋊⋉1=1
x π1,3
σ2=hasAdvisor
input
( )
( )
( )
( , )
( , )( , )
( , s., )( , s., )
( )
( , , )
( ) ( , )( , )
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Motivation
Idea
Approach
Optimization
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Approach
32/41
QSPARQL / OWL
&Bash script
Q1. Datalog
σπ2. Algebra
÷3. Optimize
^4. Translate
facts(X, Y, Z) :~ read_ntriples input
adv(X, Y) :- facts(X, "hasAdvisor", Y).
result(Y) :- adv("Einstein", Y).result(Y) :- result(X), adv(X, Y).
µx
∪
π2
σ1=Einstein
π1,3
σ2=hasAdvisor
input
π3
⋊⋉1=1
x π1,3
σ2=hasAdvisor
input
( )
( )
( )
( , )
( , )( , )
( , s., )( , s., )
( )
( , , )
( ) ( , )( , )
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Motivation
Idea
Approach
Optimization
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Approach
32/41
QSPARQL / OWL
&Bash script
Q1. Datalog
σπ2. Algebra
÷3. Optimize
^4. Translate
facts(X, Y, Z) :~ read_ntriples input
adv(X, Y) :- facts(X, "hasAdvisor", Y).
result(Y) :- adv("Einstein", Y).result(Y) :- result(X), adv(X, Y).
µx
∪
π2
σ1=Einstein
π1,3
σ2=hasAdvisor
input
π3
⋊⋉1=1
x π1,3
σ2=hasAdvisor
input
( )
( )
( )
( , )
( , )( , )
( , s., )( , s., )
( )
( , , )
( ) ( , )( , )
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Motivation
Idea
Approach
Optimization
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Approach
32/41
QSPARQL / OWL
&Bash script
Q1. Datalog
σπ2. Algebra
÷3. Optimize
^4. Translate
facts(X, Y, Z) :~ read_ntriples input
adv(X, Y) :- facts(X, "hasAdvisor", Y).
result(Y) :- adv("Einstein", Y).result(Y) :- result(X), adv(X, Y).
µx
∪
π2
σ1=Einstein
π1,3
σ2=hasAdvisor
input
π3
⋊⋉1=1
x π1,3
σ2=hasAdvisor
input
( ) ( )
( )
( , )
( , )( , )
( , s., )( , s., )
( )
( , , )
( ) ( , )( , )
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Motivation
Idea
Approach
Optimization
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Approach
32/41
QSPARQL / OWL
&Bash script
Q1. Datalog
σπ2. Algebra
÷3. Optimize
^4. Translate
facts(X, Y, Z) :~ read_ntriples input
adv(X, Y) :- facts(X, "hasAdvisor", Y).
result(Y) :- adv("Einstein", Y).result(Y) :- result(X), adv(X, Y).
µx
∪
π2
σ1=Einstein
π1,3
σ2=hasAdvisor
input
π3
⋊⋉1=1
x π1,3
σ2=hasAdvisor
input
( ) ( )
( )
( , )
( , )( , )
( , s., )( , s., )
( )
( , , )
( ) ( , )( , )
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Motivation
Idea
Approach
Optimization
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Approach
33/41
QSPARQL / OWL
&Bash script
Q1. Datalog
σπ2. Algebra
÷3. Optimize
^4. Translate
Optimizations:
µx
∪
π2
σ1=Einstein
π1,3
σ2=hasAdvisor
input
π3
⋊⋉1=1
x π1,3
σ2=hasAdvisor
input
µx
∪
π3
σ1=Einstein
σ2=hasAdvisor
input
π3
⋊⋉1=1
x π1,3
σ2=hasAdvisor
input
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Motivation
Idea
Approach
Optimization
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Approach
33/41
QSPARQL / OWL
&Bash script
Q1. Datalog
σπ2. Algebra
÷3. Optimize
^4. Translate
Optimizations:
µx
∪
π2
σ1=Einstein
π1,3
σ2=hasAdvisor
input
π3
⋊⋉1=1
x π1,3
σ2=hasAdvisor
input
µx
∪
π3
σ1=Einstein
σ2=hasAdvisor
input
π3
⋊⋉1=1
δx π1,3
σ2=hasAdvisor
input
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Motivation
Idea
Approach
Optimization
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Approach
33/41
QSPARQL / OWL
&Bash script
Q1. Datalog
σπ2. Algebra
÷3. Optimize
^4. Translate
Optimizations:
µx
∪
π2
σ1=Einstein
π1,3
σ2=hasAdvisor
input
π3
⋊⋉1=1
x π1,3
σ2=hasAdvisor
input
µx
sort
∪
π3
σ1=Einstein
σ2=hasAdvisor
input
π3
⋊⋉1=1
sort1
δx
sort1
π1,3
σ2=hasAdvisor
input
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Motivation
Idea
Approach
Optimization
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Approach
33/41
QSPARQL / OWL
&Bash script
Q1. Datalog
σπ2. Algebra
÷3. Optimize
^4. Translate
Optimizations:
µx
∪
π2
σ1=Einstein
π1,3
σ2=hasAdvisor
input
π3
⋊⋉1=1
x π1,3
σ2=hasAdvisor
input
µx
π3
σ1=Einstein
σ2=hasAdvisor
input
sort
π3
⋊⋉1=1
sort1
δx
sort1
π1,3
σ2=hasAdvisor
input
(1st)
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Motivation
Idea
Approach
Optimization
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Approach
33/41
QSPARQL / OWL
&Bash script
Q1. Datalog
σπ2. Algebra
÷3. Optimize
^4. Translate
Optimizations:
µx
∪
π2
σ1=Einstein
π1,3
σ2=hasAdvisor
input
π3
⋊⋉1=1
x π1,3
σ2=hasAdvisor
input
£ a
sort1
π1,3
σ2=hasAdvisor
input
µx
π3
σ1=Einstein
σ2=hasAdvisor
input
sort
π3
⋊⋉1=1
sort1
δx
a
(1st)
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Motivation
Idea
Approach
Optimization
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Approach
34/41
QSPARQL / OWL
&Bash script
Q1. Datalog
σπ2. Algebra
÷3. Optimize
^4. Translate
£ a
sort1
π1,3
σ2=adv.
input
µx
π3
σ1=Einstein
σ2=adv.
input
sort
π3
⋊⋉1=1
sort1
δx
a
(1st)
awk ’( $1 == "Einstein"
&& $2 == "hasAdvisor"){ print $3 >> "b" }
($2 == "hasAdvisor"){ print $1 FS $3 >> "pre_a" }
’ <(read_ntriples input)
# lock a(
sort -k 1 pre_a > a# unlock a
) &
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Motivation
Idea
Approach
Optimization
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Approach
34/41
QSPARQL / OWL
&Bash script
Q1. Datalog
σπ2. Algebra
÷3. Optimize
^4. Translate
£ a
sort1
π1,3
σ2=adv.
input
µx
π3
σ1=Einstein
σ2=adv.
input
sort
π3
⋊⋉1=1
sort1
δx
a
(1st)
awk ’( $1 == "Einstein"
&& $2 == "hasAdvisor"){ print $3 >> "b" }
($2 == "hasAdvisor"){ print $1 FS $3 >> "pre_a" }
’ <(read_ntriples input)
# lock a(
sort -k 1 pre_a > a# unlock a
) &
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Motivation
Idea
Approach
Optimization
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Approach
34/41
QSPARQL / OWL
&Bash script
Q1. Datalog
σπ2. Algebra
÷3. Optimize
^4. Translate
£ a
sort1
π1,3
σ2=adv.
input
µx
π3
σ1=Einstein
σ2=adv.
input
sort
π3
⋊⋉1=1
sort1
δx
a
(1st)
awk ’( $1 == "Einstein"
&& $2 == "hasAdvisor"){ print $3 >> "b" }
($2 == "hasAdvisor"){ print $1 FS $3 >> "pre_a" }
’ <(read_ntriples input)
# lock a(
sort -k 1 pre_a > a# unlock a
) &
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Motivation
Idea
Approach
Optimization
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Approach
34/41
QSPARQL / OWL
&Bash script
Q1. Datalog
σπ2. Algebra
÷3. Optimize
^4. Translate
£ a
sort1
π1,3
σ2=adv.
input
µx
π3
σ1=Einstein
σ2=adv.
input
sort
π3
⋊⋉1=1
sort1
δx
a
(1st)
awk ’( $1 == "Einstein"
&& $2 == "hasAdvisor"){ print $3 >> "b" }
($2 == "hasAdvisor"){ print $1 FS $3 >> "pre_a" }
’ <(read_ntriples input)
# lock a(
sort -k 1 pre_a > a# unlock a
) &
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Motivation
Idea
Approach
Optimization
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Approach
34/41
QSPARQL / OWL
&Bash script
Q1. Datalog
σπ2. Algebra
÷3. Optimize
^4. Translate
£ a
sort1
π1,3
σ2=adv.
input
µx
π3
σ1=Einstein
σ2=adv.
input
sort
π3
⋊⋉1=1
sort1
δx
a
(1st)
awk ’( $1 == "Einstein"
&& $2 == "hasAdvisor"){ print $3 >> "b" }
($2 == "hasAdvisor"){ print $1 FS $3 >> "pre_a" }
’ <(read_ntriples input)
# lock a(
sort -k 1 pre_a > a# unlock a
) &
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Motivation
Idea
Approach
Optimization
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Approach
34/41
QSPARQL / OWL
&Bash script
Q1. Datalog
σπ2. Algebra
÷3. Optimize
^4. Translate
£ a
sort1
π1,3
σ2=adv.
input
µx
π3
σ1=Einstein
σ2=adv.
input
sort
π3
⋊⋉1=1
sort1
δx
a
(1st)
while# ...sort -k 1 -u
<( # wait for ajoin -1 1 -2 1 -o 2.2
<(sort -k 1 -u delta)a
)
# ...[ -s delta ];do continue; done
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Motivation
Idea
Approach
Optimization
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Approach
34/41
QSPARQL / OWL
&Bash script
Q1. Datalog
σπ2. Algebra
÷3. Optimize
^4. Translate
£ a
sort1
π1,3
σ2=adv.
input
µx
π3
σ1=Einstein
σ2=adv.
input
sort
π3
⋊⋉1=1
sort1
δx
a
(1st)
while# ...sort -k 1 -u
<( # wait for ajoin -1 1 -2 1 -o 2.2
<(sort -k 1 -u delta)a
)
# ...[ -s delta ];do continue; done
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Motivation
Idea
Approach
Optimization
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Optimization
35/41
How can I find all professors?
Professor(X) :- Person(X),teachesCourse(X,Y).
Professor(X) :- advisorOf(X,Y),Professor(Y).
Person(X) :- Employee(X).Person(X) :- Professor(X).
Professor(X) :- Professor(X),teachesCourse(X,Y).
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Motivation
Idea
Approach
Optimization
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Optimization
35/41
How can I find all professors?
Professor(X) :- Person(X),teachesCourse(X,Y).
Professor(X) :- advisorOf(X,Y),Professor(Y).
Person(X) :- Employee(X).Person(X) :- Professor(X).
Professor(X) :- Professor(X),teachesCourse(X,Y).
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Motivation
Idea
Approach
Optimization
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Optimization
35/41
How can I find all professors?
Professor(X) :- Person(X),teachesCourse(X,Y).
Professor(X) :- advisorOf(X,Y),Professor(Y).
Person(X) :- Employee(X).Person(X) :- Professor(X).
Combining the first and the last rule leads to
Professor(X) :- Professor(X),teachesCourse(X,Y).
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Motivation
Idea
Approach
Optimization
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Optimization
35/41
How can I find all professors?
Professor(X) :- Person(X),teachesCourse(X,Y).
Professor(X) :- advisorOf(X,Y),Professor(Y).
Person(X) :- Employee(X).Person(X) :- Professor(X).
Combining the first and the last rule leads to
Professor(X) :- Professor(X),teachesCourse(X,Y).
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Motivation
Idea
Approach
Optimization
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Optimization
36/41
µx
∪
π1
⋊⋉1=1
∪
employee x
teachesCourse
π1
⋊⋉2=1
advisorOf x
( ) (?)
( ) (?)
( )
( , , ?)
( )
( )
⇒
(?)
(?, , )
( )
⇒
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Motivation
Idea
Approach
Optimization
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Optimization
36/41
µx
∪
π1
⋊⋉1=1
∪
employee x
teachesCourse
π1
⋊⋉2=1
advisorOf x
( )
(?)
( )
(?)
( )
( , , ?)
( )
( )
⇒
(?)
(?, , )
( )
⇒
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Motivation
Idea
Approach
Optimization
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Optimization
36/41
µx
∪
π1
⋊⋉1=1
∪
employee x
teachesCourse
π1
⋊⋉2=1
advisorOf x
( )
(?)
( )
(?)
( )
( , , ?)
( )
( )
⇒ superfluous
(?)
(?, , )
( )
⇒
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Motivation
Idea
Approach
Optimization
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Optimization
36/41
µx
∪
π1
⋊⋉1=1
∪
employee x
teachesCourse
π1
⋊⋉2=1
advisorOf x
( ) (?)
( ) (?)
( )
( , , ?)
( )
( )
⇒ superfluous
(?)
(?, , )
( )
⇒
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Motivation
Idea
Approach
Optimization
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Optimization
36/41
µx
∪
π1
⋊⋉1=1
∪
employee x
teachesCourse
π1
⋊⋉2=1
advisorOf x
( ) (?)
( ) (?)
( )
( , , ?)
( )
( )
⇒ superfluous
(?)
(?, , )
( )
⇒ necessary
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Motivation
Idea
Approach
Optimization
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Optimization
36/41
µx
∪
π1
⋊⋉1=1
∪
employee x
teachesCourse
π1
⋊⋉2=1
advisorOf x
(c1) (?)
(c1) (?)
(c1)
(c1, c1, ?)
(c1)
(c1)
⇒ superfluous
(?)
(?, c1, c1)
(c1)
⇒ necessary
Answering Queries with Unix Shell: Experiments
37/41
◮ Dataset: LUBM university benchmark
◮ 14 different queries
◮ Competitors: Datalog-based systems (DLV, Souffle, RDFox),Triple stores (Jena, Stardog, Virtuoso),Database management systems (MonetDB, Postgres)
Number of finished queries within time limit
BashDLV
Souffle
RDFoxJena
Stardog
Virtuoso
MonetDB*
Postgres*
0
5
10
14
#o
ffi
nis
hed
qu
erie
s
LUBM 10
* = we folded the TBox into the query
Answering Queries with Unix Shell: Experiments
38/41
101 102 103
100
101
102
103
# of universities
Ru
nti
me
(s)
Bash DLV RDFox
Stardog Souffle Jena
Virtuoso MonetDB Postgres
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Motivation
Idea
Approach
Optimization
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Experiments
39/41
dataset Bash RDFox BigDatalog Stardog Virtuoso
LiveJournal 117 70 532 941 -orkut 225 121 1838 1123 -friendster 16306 - - - -
Table: Runtime for the reachability query, in seconds.
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Motivation
Idea
Approach
Optimization
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Summary
40/41
Summary:
◮ Preprocess large datasets without installing software
◮ Supports subset of SPARQL / OWL and Datalog as query language
◮ Try it online athttps://www.thomasrebele.org/projects/bashlog
◮ Source code available athttps://github.com/thomasrebele/bashlog
Future work:
◮ Numerical comparisons
◮ Aggregations (e.g., max, count)
Publication: ISWC 2018 (full paper)
Thomas Rebele Thomas P. Tanon Fabian Suchanek
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Outline
Using YAGO forthe humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
Conclusion
41/41
This thesis showed how to extend YAGO along several axes:
◮ Improve completeness w.r.t. people
◮ Automatically repairing of its regular expressions
◮ Preprocessing queries using only a Bash shell
Other accomplishments:
◮ Source code of all contributions is available online
◮ Publications at ISWC 2016 (resource paper),ISWC 2017 (demo, workshop), PAKDD 2018 (full paper),ISWC 2018 (full paper)
Future work:
◮ More studies on human society using facts from YAGO (ongoing)
◮ Combine YAGO and Wikidata
◮ Queries with numerical comparisons and aggregations