Data Science Research
@
Introduction Ontological Pathfinding Experiments
Ontological Pathfinding: Mining First-Order Knowledge from Large Knowledge Bases
Yang Chen, Sean Goldberg, Daisy Zhe Wang, Soumitra Siddharth Johri
{yang,sean,daisyw}@cise.ufl.edu, [email protected]
Computer and Information Science and Engineering
University of Florida
SIGMOD’16, San Francisco, CA
Jun 29, 2016
Ontological Pathfinding Jun 29, 2016 1/25
Outline
1 Introduction
    Knowledge Bases
2 Ontological Pathfinding
    Partitioning
    Parallel Rule Mining
3 Experiments
    Overall Result
    Partitioning
Knowledge Bases
A knowledge base organizes human information in a structured format.

Predicate     Subject            Object
isLocatedIn   Washington, D.C.   United States
hasCapital    Canada             Ottawa
wasBornIn     Donald Knuth       Milwaukee, Wisconsin
isCitizenOf   Donald Knuth       United States
dealsWith     United States      Canada

H(x, y)       b1(x, z)      b2(y, z)
dealsWith     isLocatedIn   isLocatedIn
dealsWith     imports       exports
isCitizenOf   wasBornIn     hasCapital
worksAt       wasBornIn     isLocatedIn
isLocatedIn   hasCapital    isLocatedIn

Figure: Knowledge base examples.
Knowledge Bases
ProbKB
First-Order Knowledge
Kale is rich in Calcium ∧ Calcium helps prevent Osteoporosis → Kale helps prevent Osteoporosis.
Question answering;
Data cleaning;
Incremental knowledge construction.
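The Kale rule can be read as a join on the shared entity (Calcium). The following is a minimal sketch, assuming facts are stored as (predicate, subject, object) triples; the predicate names `isRichIn` and `helpsPrevent` and the helper `apply_chain_rule` are illustrative names, not taken from the slides.

```python
from collections import defaultdict

def apply_chain_rule(facts, body1, body2, head):
    """Infer head(x, y) from body1(x, z) and body2(z, y)."""
    # Index body2 facts by subject so the join on z is a dictionary lookup.
    by_subject = defaultdict(set)
    for p, s, o in facts:
        if p == body2:
            by_subject[s].add(o)
    inferred = set()
    for p, x, z in facts:
        if p == body1:
            for y in by_subject.get(z, ()):
                inferred.add((head, x, y))
    return inferred

# Hypothetical encoding of the two premises from the slide.
facts = {
    ("isRichIn", "Kale", "Calcium"),
    ("helpsPrevent", "Calcium", "Osteoporosis"),
}
new_facts = apply_chain_rule(facts, "isRichIn", "helpsPrevent", "helpsPrevent")
print(new_facts)
```

Facts produced this way could feed the applications listed above, for example answering "what helps prevent Osteoporosis?" with an entity that was never stated explicitly.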
State-of-the-Art
AMIE
    YAGO2: 834K entities, 1M facts;
    Runtime: 3.59 minutes.
AMIE+
    YAGO2s: 2.1M entities, 4.5M facts;
    Runtime: 1 hour.
Sherlock
    TextRunner: 250K facts;
    Runtime: 50 minutes.
Freebase: 112M entities, 388M facts;
Is it possible to mine first-order rules from Freebase?
Contributions
Goal: Mine first-order knowledge from web-scale knowledge bases.
Result: Design the Ontological Pathfinding algorithm to mine 36,625 inference rules from Freebase (388M facts) in 34 hours; publish the first Freebase rule set.
Contributions:
    Partition the KB into independent subsets to reduce join sizes. (Improve runtime from 2.55 days to 5.06 hours for a single task.)
    Design a parallel rule mining algorithm for each partition. (Achieve a 3-6x speedup.)
    Prune inefficient and erroneous candidate rules. (Make joins possible.)
Partitioning
Figure: Independent overlapping partitions. The KB is split into Partition 1 and Partition 2; the figure also labels Output 1, Output 2, Output 3, and a combined Output.
The Partitioning Problem
Given size constraints $s$ and $m$, find a partition $\{M_1, \ldots, M_k\}$ of the rules $M$ that satisfies the following constraints:
(C1) $|\Gamma_i| \le s, \quad 1 \le i \le k$
(C2) $|M_i| \le m, \quad 1 \le i \le k$
(C3) $\bigcup_{i=1}^{k} M_i = M$
(C4) $M_i \cap M_j = \emptyset, \quad 1 \le i < j \le k$
where $\sigma(\Gamma, M_i) = |\Gamma_i| = \sum_{p \in \Gamma(M_i)} H_0(p)$.
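The constraints (C1) to (C4) can be checked mechanically for a candidate partition. The sketch below is a validity check only, not the partitioning algorithm itself. It assumes that a rule is a tuple of predicate names `(head, body1, body2)`, that Γ(M_i) is the set of predicates appearing in M_i (head included), and that H0(p) is the number of facts with predicate p; the function names are hypothetical.

```python
from collections import Counter

def partition_sizes(facts, rule_parts):
    """Return (|Gamma_i|, |M_i|) for each rule subset M_i."""
    # H0: predicate histogram, assumed to count the facts per predicate.
    h0 = Counter(p for p, _, _ in facts)
    sizes = []
    for part in rule_parts:
        # Gamma(M_i): predicates used by the rules in M_i (assumed to include heads).
        preds = {p for rule in part for p in rule}
        sizes.append((sum(h0[p] for p in preds), len(part)))
    return sizes

def is_valid_partition(facts, rules, rule_parts, s, m):
    """Check constraints (C1)-(C4) for rule_parts, a list of sets of rules."""
    sizes = partition_sizes(facts, rule_parts)
    union = set().union(*rule_parts)
    c1 = all(gamma <= s for gamma, _ in sizes)            # (C1) |Gamma_i| <= s
    c2 = all(n <= m for _, n in sizes)                    # (C2) |M_i| <= m
    c3 = union == set(rules)                              # (C3) parts cover M
    c4 = sum(len(p) for p in rule_parts) == len(union)    # (C4) parts are disjoint
    return c1 and c2 and c3 and c4

facts = [
    ("exports", "Canada", "Aluminum"),
    ("imports", "United States", "Aluminum"),
    ("dealsWith", "Canada", "United States"),
    ("wasBornIn", "Donald Knuth", "Milwaukee, Wisconsin"),
    ("hasCapital", "United States", "Washington, D.C."),
    ("isCitizenOf", "Donald Knuth", "United States"),
]
rules = [
    ("dealsWith", "exports", "imports"),
    ("isCitizenOf", "wasBornIn", "hasCapital"),
]
parts = [{rules[0]}, {rules[1]}]
print(partition_sizes(facts, parts))
print(is_valid_partition(facts, rules, parts, s=3, m=1))
```

Note that only the rule sets must be disjoint; the fact sets Γ_i may share facts, which is why the size bound is stated on each |Γ_i| separately.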
Binary Partitioning Example
(a) Γ

p                    x                     y
exports              United States         Computer
exports              Canada                Aluminum
imports              United States         Aluminum
imports              United States         Clothing
dealsWith            Canada                United States
isLocatedIn          Washington, D.C.      United States
isLocatedIn          Ottawa                Canada
isLocatedIn          Stanford University   Stanford, California
hasCapital           Canada                Ottawa
hasCapital           United States         Washington, D.C.
wasBornIn            Donald Knuth          Milwaukee, Wisconsin
isCitizenOf          Donald Knuth          United States
worksAt              Donald Knuth          Stanford University
hasAcademicAdvisor   Donald Knuth          Marshall Hall, Jr.

(b) M

H(x,y)        b1(x,z)       b2(y,z)
dealsWith     isLocatedIn   isLocatedIn
dealsWith     exports       imports
isCitizenOf   wasBornIn     hasCapital
worksAt       wasBornIn     isLocatedIn
isLocatedIn   hasCapital    isLocatedIn

Figure: Γ and M are split into Partition 1 and Partition 2.
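To make the example concrete, the sketch below computes the fact subsets for one possible binary split of the five rules. The split (rules with head `dealsWith` versus the rest) is a hypothetical choice for illustration, since the transcript does not preserve which rules land in which partition; the helper `facts_for` is likewise an assumed name.

```python
facts = [
    ("exports", "United States", "Computer"),
    ("exports", "Canada", "Aluminum"),
    ("imports", "United States", "Aluminum"),
    ("imports", "United States", "Clothing"),
    ("dealsWith", "Canada", "United States"),
    ("isLocatedIn", "Washington, D.C.", "United States"),
    ("isLocatedIn", "Ottawa", "Canada"),
    ("isLocatedIn", "Stanford University", "Stanford, California"),
    ("hasCapital", "Canada", "Ottawa"),
    ("hasCapital", "United States", "Washington, D.C."),
    ("wasBornIn", "Donald Knuth", "Milwaukee, Wisconsin"),
    ("isCitizenOf", "Donald Knuth", "United States"),
    ("worksAt", "Donald Knuth", "Stanford University"),
    ("hasAcademicAdvisor", "Donald Knuth", "Marshall Hall, Jr."),
]
rules = [  # (head, body1, body2)
    ("dealsWith", "isLocatedIn", "isLocatedIn"),
    ("dealsWith", "exports", "imports"),
    ("isCitizenOf", "wasBornIn", "hasCapital"),
    ("worksAt", "wasBornIn", "isLocatedIn"),
    ("isLocatedIn", "hasCapital", "isLocatedIn"),
]

def facts_for(rule_subset, facts):
    """Facts whose predicate appears in any rule of the subset."""
    preds = {p for rule in rule_subset for p in rule}
    return [f for f in facts if f[0] in preds]

# Hypothetical split: rules concluding dealsWith vs. all other rules.
m1 = [r for r in rules if r[0] == "dealsWith"]
m2 = [r for r in rules if r[0] != "dealsWith"]
gamma1 = facts_for(m1, facts)
gamma2 = facts_for(m2, facts)
shared = set(gamma1) & set(gamma2)
print(len(gamma1), len(gamma2), sorted({f[0] for f in shared}))
```

With this split the rule sets are disjoint while the fact sets overlap on the `isLocatedIn` facts, and the `hasAcademicAdvisor` fact is needed by neither partition.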
Figure labels: rule subsets M1, M2; fact subsets Γ1, Γ2.
Partitioning
Joining partitioned RDDs requires:
$O(t^{l-1}|S||M|) \to O(t^{l-1} s_m |M|)$,
bounded above by the largest partition size $s_m$.
Parallel Rule Mining
p(x, y) ← q(x, z), r(y, z).
Group facts by the join variable z.
For each group, apply the inference rules with an in-memory hash join; tag each fact with the rule that inferred it, or with “0” for base facts.
Map each fact to a (fact, {r}) pair, where {r} is the list of rules that infer it.
Check each (fact, {r}) pair and generate (r, c, 1) tuples, where c = (0 ∈ {r}) indicates correctness.
Reduce by r, aggregating the counts.
Map each rule to its confidence score.
Figure: Mining pipeline over Rules = {R1, R2, R3} and facts F1 to F5, with stages labeled Group by join variables, Group joins, Group by facts, Check, and Count.
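The steps above can be sketched on a single machine with plain Python dictionaries standing in for the distributed group and reduce operations. This is a simplified illustration under stated assumptions, not the authors' Spark implementation: rules have the form head(x, y) ← body1(x, z), body2(y, z), rule identifiers are non-zero so that 0 can mark base facts, and the confidence of a rule is taken to be (correct inferences) / (all inferences). The function name `mine` is hypothetical.

```python
from collections import defaultdict

def mine(facts, rules):
    """rules: {rule_id: (head, body1, body2)}; returns {rule_id: confidence}."""
    # Group facts by the join variable z (the object of both body atoms).
    groups = defaultdict(lambda: defaultdict(list))
    for p, s, o in facts:
        groups[o][p].append(s)

    # Tag every fact with the set of rules that infer it; base facts carry 0.
    tags = defaultdict(set)
    for fact in facts:
        tags[fact].add(0)
    for by_pred in groups.values():
        # Join inside one group: every pairing of body1 and body2 subjects.
        for rule_id, (head, b1, b2) in rules.items():
            for x in by_pred.get(b1, ()):
                for y in by_pred.get(b2, ()):
                    tags[(head, x, y)].add(rule_id)

    # Check each (fact, {r}) pair and reduce the (r, c, 1) tuples by rule.
    correct = defaultdict(int)
    total = defaultdict(int)
    for fact, rule_ids in tags.items():
        is_base = 0 in rule_ids  # c = (0 in {r})
        for rule_id in rule_ids - {0}:
            total[rule_id] += 1
            correct[rule_id] += is_base

    # Map each rule to its confidence score (assumed: correct / total).
    return {r: correct[r] / total[r] for r in total}

facts = [
    ("exports", "United States", "Computer"),
    ("exports", "Canada", "Aluminum"),
    ("imports", "United States", "Aluminum"),
    ("imports", "United States", "Clothing"),
    ("dealsWith", "Canada", "United States"),
]
rules = {"R1": ("dealsWith", "exports", "imports")}
scores = mine(facts, rules)
print(scores)
```

Grouping by the join value first keeps each hash join local to one group, which is what allows the groups to be processed independently.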
Rule Pruning
The Non-Functionality Problem
Example
diedIn(x, z), wasBornIn(y, z) → hasAcademicAdvisor(x, y).
“diedIn” and “wasBornIn” are N:1 predicates.
Large intermediate results.
Histogram-based detection:
    Predicate-Subject Histogram H1 = {(p, x, |{p(x, ·)}|)};
    Predicate-Object Histogram H2 = {(p, y, |{p(·, y)}|)};
    Functional constraint t requires H2(diedIn, z) ≤ t and H2(wasBornIn, z) ≤ t for all z;
    t picked by experiments.
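The histogram test can be sketched as follows. The code builds the predicate-object histogram H2 and flags a rule whose body predicates exceed the threshold t for some join value. Treating a single violation as grounds for flagging the whole rule is an assumption of this sketch, and the function names and entity names are hypothetical.

```python
from collections import Counter

def object_histogram(facts):
    """H2: number of facts p(., y) for each (predicate, object) pair."""
    return Counter((p, o) for p, _, o in facts)

def violates_functional_constraint(body_preds, h2, t):
    """True if some join value z has more than t facts for a body predicate."""
    return any(count > t for (p, _), count in h2.items() if p in body_preds)

# Toy data: three people died in the same place, one was born there.
facts = [
    ("diedIn", "a", "Paris"),
    ("diedIn", "b", "Paris"),
    ("diedIn", "c", "Paris"),
    ("wasBornIn", "d", "Paris"),
]
h2 = object_histogram(facts)
body = {"diedIn", "wasBornIn"}
print(violates_functional_constraint(body, h2, t=2))
print(violates_functional_constraint(body, h2, t=3))
```

A join on z = "Paris" here yields 3 × 1 candidate pairs; with real N:1 predicates the product of the two histogram counts is what makes the intermediate result large.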
Datasets
KB            YAGO2     YAGO2s      Freebase
# Predicates  130       126         67,415
# Entities    834,554   2,137,468   111,781,246
# Facts       948,047   4,484,907   388,474,630
Table: Dataset statistics.
Experiments: Overall Results
KB        Algorithm  # Rules  Precision  Runtime
YAGO2     OP         218      0.35       3.59 min
YAGO2     AMIE       1090     0.46       4.56 min
YAGO2s    OP         312      0.35       19.40 min
YAGO2s    AMIE       278+     N/A        5+ d
Freebase  OP         36,625   0.60       33.22 h
Freebase  AMIE       0+       N/A        5+ d
Table: Overall mining result.
Experiments: Quality
We detect trivial extensions and composite rules, which provide little knowledge beyond the rules of lengths 2 and 3.
Trivial extensions add valid rules to the body of another rule.
book/book/first_edition(x, u), book/book_edition/book(u, v), book/book/first_edition(v, y) → book/book/editions(x, y)
Composite rules chain multiple shorter rules.
film/film/sequel(x, u), film/film/country(u, v) (→ film/film/country(x, v)),
location/country/official_language(v, y) → film/film/language(x, y)
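One way to picture composite-rule detection is a lookup over already-mined shorter rules. The sketch below is an assumption-laden illustration rather than the method used in the talk: it only handles chain-shaped rules b1(x, u), b2(u, v), b3(v, y) → h(x, y), the tuple encoding and the name `is_composite` are hypothetical, and predicate names are written with underscores as in Freebase.

```python
def is_composite(long_rule, short_rules):
    """Check whether a chain rule can be built by chaining two shorter rules.

    long_rule:   ((b1, b2, b3), h), read as b1(x,u), b2(u,v), b3(v,y) -> h(x,y)
    short_rules: set of ((c1, c2), g), read as c1(x,u), c2(u,v) -> g(x,v)
    """
    (b1, b2, b3), head = long_rule
    for (c1, c2), g in short_rules:
        # Left grouping: (b1, b2) -> g, then (g, b3) -> head.
        if (c1, c2) == (b1, b2) and ((g, b3), head) in short_rules:
            return True
        # Right grouping: (b2, b3) -> g, then (b1, g) -> head.
        if (c1, c2) == (b2, b3) and ((b1, g), head) in short_rules:
            return True
    return False


# The two shorter rules from the composite example on the slide.
short_rules = {
    (("film/film/sequel", "film/film/country"), "film/film/country"),
    (("film/film/country", "location/country/official_language"),
     "film/film/language"),
}
long_rule = (
    ("film/film/sequel", "film/film/country",
     "location/country/official_language"),
    "film/film/language",
)
composite = is_composite(long_rule, short_rules)
```

A rule flagged this way adds nothing that the two shorter rules do not already infer when applied in sequence, which is why such rules count as low-value in the quality breakdown.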
Experiments: Quality
(c) Freebase Length 4 Rules: correct 3.4%, incorrect 6.3%, composite 9.0%, trivial extensions 81.3%.
Figure: Quality of long rules.
Experiments: Quality
[Figure panels: (a) YAGO2s: Rule Lengths (rule lengths 2 to 5); (b) Freebase: Rule Lengths (rule lengths 2 to 4). Each panel plots # mined rules, precision, and runtime in hours against rule length.]
Figure: OP performance for mining rules of length 4 (YAGO and Freebase) and length 5 (YAGO).
Experiments: Effect of Partitioning
[Figure panels: Freebase Partitions (s=20M, m=2K) and Freebase Partitions (s=200M, m=10K). Each panel plots partition size × rule size (in units of 10^9) and runtime in minutes for each partition.]
Figure: Effect of partitioning.
Experiments: Effect of Partitioning
[Figure panels: (a) Freebase: Runtime vs Partitioning; (b) Freebase: Max Runtime vs Partitioning; (c) Freebase: DOV vs Partitioning, each plotted against max partition size (in millions) for m = 1K, 2K, 5K, and 10K; (d) YAGO2s: Runtime vs Partitioning, runtime and max runtime in minutes for m = 500 and m = 1000; (e) Runtime vs Functional Constraint, YAGO2s runtime in minutes and Freebase runtime in hours; (f) Pruned Rules Quality, pruning precision and # pruned rules vs functional constraint for YAGO2s and Freebase.]
Runtime: 2.55 days → 5.06 hours.
Slowest partition: 1.27 days → 38.14 minutes.
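As a quick check on what these two reductions amount to, the before and after figures can be converted to common units and divided. The snippet below only does that arithmetic on the numbers quoted on the slide; the variable names are illustrative.

```python
HOURS_PER_DAY = 24
MINUTES_PER_DAY = 24 * 60

# Overall runtime: 2.55 days before, 5.06 hours after.
total_speedup = (2.55 * HOURS_PER_DAY) / 5.06

# Slowest partition: 1.27 days before, 38.14 minutes after.
slowest_speedup = (1.27 * MINUTES_PER_DAY) / 38.14

print(f"overall: {total_speedup:.1f}x, slowest partition: {slowest_speedup:.1f}x")
```

The overall runtime drops by roughly a factor of 12 and the slowest partition by roughly a factor of 48.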
Conclusion
Design the Ontological Pathfinding algorithm that scales rule mining to Freebase (largest KB with 388M facts in 34 hours).
Partition KB into independent subsets to reduce join sizes.
Divide joins into smaller joins that run in parallel. Prototype with Spark.
Publish the first Freebase rule set (36,625 inference rules).
Open-source at http://dsr.cise.ufl.edu/projects/probkb-web-scale-probabilistic-knowledge-base.
Thank you!
Yang Chen: http://cise.ufl.edu/~yang
Data Science Research at UF: http://dsr.cise.ufl.edu
Questions?