Data Science Research

@

Introduction Ontological Pathfinding Experiments

Ontological Pathfinding: Mining First-Order Knowledge from Large Knowledge Bases

Yang Chen, Sean Goldberg, Daisy Zhe Wang, Soumitra Siddharth Johri

{yang,sean,daisyw}@cise.ufl.edu, soumitra.johri@ufl.edu

Computer and Information Science and Engineering
University of Florida

SIGMOD’16, San Francisco, CA
Jun 29, 2016

Ontological Pathfinding Jun 29, 2016 1/25

Outline

1 Introduction
    Knowledge Bases

2 Ontological Pathfinding
    Partitioning
    Parallel Rule Mining

3 Experiments
    Overall Result
    Partitioning

Knowledge Bases

A knowledge base organizes human information in a structured format.

Predicate     Subject            Object
isLocatedIn   Washington, D.C.   United States
hasCapital    Canada             Ottawa
wasBornIn     Donald Knuth       Milwaukee, Wisconsin
isCitizenOf   Donald Knuth       United States
dealsWith     United States      Canada

H(x, y)       b1(x, z)      b2(y, z)
dealsWith     isLocatedIn   isLocatedIn
dealsWith     imports       exports
isCitizenOf   wasBornIn     hasCapital
worksAt       wasBornIn     isLocatedIn
isLocatedIn   hasCapital    isLocatedIn

Figure: Knowledge base examples.

Knowledge Bases

ProbKB

First-Order Knowledge

Kale is rich in Calcium ∧ Calcium helps prevent Osteoporosis → Kale helps prevent Osteoporosis.

Question answering;

Data cleaning;

Incremental knowledge construction.
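
The kale rule above can be read as a chain rule that derives a new fact from two existing ones. A minimal sketch, assuming facts are stored as (predicate, subject, object) triples; the predicate names `isRichIn` and `helpsPrevent` and the function `apply_chain_rule` are hypothetical and not taken from the slides:

```python
from collections import defaultdict

def apply_chain_rule(facts, head, body1, body2):
    """Infer head(x, y) from body1(x, z) and body2(z, y)."""
    # Index the body2 facts by their subject z so the join is a hash lookup.
    by_subject = defaultdict(set)
    for pred, subj, obj in facts:
        if pred == body2:
            by_subject[subj].add(obj)

    inferred = set()
    for pred, x, z in facts:
        if pred == body1:
            for y in by_subject.get(z, ()):
                inferred.add((head, x, y))
    # Report only facts that are not already in the knowledge base.
    return inferred - set(facts)

# Hypothetical encoding of the two premises on the slide.
facts = [
    ("isRichIn", "Kale", "Calcium"),
    ("helpsPrevent", "Calcium", "Osteoporosis"),
]
new_facts = apply_chain_rule(facts, "helpsPrevent", "isRichIn", "helpsPrevent")
print(new_facts)  # {('helpsPrevent', 'Kale', 'Osteoporosis')}
```

Deriving facts this way is what makes the listed applications possible: the inferred triple can answer a question, flag a missing entry, or extend the knowledge base.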

State-of-the-Art

AMIE

    YAGO2: 834K entities, 1M facts;
    Runtime: 3.59 minutes.

AMIE+

    YAGO2S: 2.1M entities, 4.5M facts;
    Runtime: 1 hour.

Sherlock

    TextRunner: 250K facts;
    Runtime: 50 minutes.

Freebase: 112M entities, 388M facts;

Is it possible to mine first-order rules from Freebase?

Contributions

Goal: Mining first-order knowledge from web-scale knowledge bases.

Result: Design the Ontological Pathfinding algorithm to mine 36,625 inference rules from Freebase (388M facts) in 34 hours; publish the first Freebase rule set.

Contributions:

Partition KB into independent subsets to reduce join sizes. (Improve runtime from 2.55 days to 5.06 hours for a single task.)

Design a parallel rule mining algorithm for each partition. (Achieve a speedup of 3-6 times.)

Prune inefficient and erroneous candidate rules. (Make joins possible.)

Partitioning

[Figure: Independent Overlapping Partitions. Labels: Partition 1, Partition 2; Output 1, Output 2, Output 3; Output.]

The Partitioning Problem

Given size constraints $s$ and $m$, find a partition $\{M_1, \ldots, M_k\}$ of the rules $M$ that satisfies the following constraints:

(C1) $|\Gamma_i| \le s, \quad 1 \le i \le k$

(C2) $|M_i| \le m, \quad 1 \le i \le k$

(C3) $\bigcup_{i=1}^{k} M_i = M$

(C4) $M_i \cap M_j = \emptyset, \quad 1 \le i < j \le k$

where $\sigma(\Gamma, M_i) = |\Gamma_i| = \sum_{p \in \Gamma(M_i)} H_0(p)$.
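
The four constraints can be checked mechanically for any candidate partition. The sketch below is illustrative only and rests on two assumptions the slide does not spell out: Γ(M_i) is read as the set of predicates that appear in the rules of M_i, and H_0(p) as the number of facts with predicate p. All function names and the toy data are hypothetical.

```python
from collections import Counter

def predicates_of(rules):
    """Assumed Gamma(M_i): every predicate used in a head or body of the rules."""
    return {pred for rule in rules for pred in rule}

def sigma(histogram, rules):
    """Assumed sigma(Gamma, M_i): sum of H_0(p) over the predicates of the rules."""
    return sum(histogram[p] for p in predicates_of(rules))

def is_valid_partition(all_rules, parts, histogram, s, m):
    """Check constraints (C1)-(C4) for a candidate partition (a list of sets)."""
    union = set().union(*parts)
    c1 = all(sigma(histogram, part) <= s for part in parts)  # fact budget
    c2 = all(len(part) <= m for part in parts)               # rule budget
    c3 = union == set(all_rules)                             # covers M
    c4 = sum(len(part) for part in parts) == len(union)      # pairwise disjoint
    return c1 and c2 and c3 and c4

# Toy data: each rule is a (head, body1, body2) tuple of predicate names.
facts = [("q", "a", "b"), ("q", "c", "b"), ("r", "d", "b"),
         ("p", "a", "d"), ("t", "a", "e")]
histogram = Counter(pred for pred, _, _ in facts)
rules = [("p", "q", "r"), ("t", "q", "q")]
parts = [{rules[0]}, {rules[1]}]

print(is_valid_partition(rules, parts, histogram, s=4, m=1))  # True
print(is_valid_partition(rules, parts, histogram, s=3, m=1))  # False
```

In the toy data the first part needs 4 facts (predicates p, q, r) and the second needs 3 (predicates t, q), so lowering s to 3 violates (C1).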

Binary Partitioning Example

(a) Γ

p                    x                     y
exports              United States         Computer
exports              Canada                Aluminum
imports              United States         Aluminum
imports              United States         Clothing
dealsWith            Canada                United States
isLocatedIn          Washington, D.C.      United States
isLocatedIn          Ottawa                Canada
isLocatedIn          Stanford University   Stanford, California
hasCapital           Canada                Ottawa
hasCapital           United States         Washington, D.C.
wasBornIn            Donald Knuth          Milwaukee, Wisconsin
isCitizenOf          Donald Knuth          United States
worksAt              Donald Knuth          Stanford University
hasAcademicAdvisor   Donald Knuth          Marshall Hall, Jr.

(b) M

H(x,y)        b1(x,z)       b2(y,z)
dealsWith     isLocatedIn   isLocatedIn
dealsWith     exports       imports
isCitizenOf   wasBornIn     hasCapital
worksAt       wasBornIn     isLocatedIn
isLocatedIn   hasCapital    isLocatedIn

[Figure: the facts and rules are divided into Partition 1 and Partition 2.]

[Figure, final build: the partitions are labeled M1 and M2 (rules) and Γ1 and Γ2 (facts).]
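
To make the binary partitioning example concrete, the sketch below selects the facts each rule subset needs. The split of the five rules into `m1` and `m2` is a hypothetical choice made for illustration; the slides do not state which rules land in which partition. It also assumes that a partition's facts are all facts whose predicate appears in one of its rules.

```python
# Facts (Γ) and rules (M) as listed in the example tables.
facts = [
    ("exports", "United States", "Computer"),
    ("exports", "Canada", "Aluminum"),
    ("imports", "United States", "Aluminum"),
    ("imports", "United States", "Clothing"),
    ("dealsWith", "Canada", "United States"),
    ("isLocatedIn", "Washington, D.C.", "United States"),
    ("isLocatedIn", "Ottawa", "Canada"),
    ("isLocatedIn", "Stanford University", "Stanford, California"),
    ("hasCapital", "Canada", "Ottawa"),
    ("hasCapital", "United States", "Washington, D.C."),
    ("wasBornIn", "Donald Knuth", "Milwaukee, Wisconsin"),
    ("isCitizenOf", "Donald Knuth", "United States"),
    ("worksAt", "Donald Knuth", "Stanford University"),
    ("hasAcademicAdvisor", "Donald Knuth", "Marshall Hall, Jr."),
]
rules = [
    ("dealsWith", "isLocatedIn", "isLocatedIn"),
    ("dealsWith", "exports", "imports"),
    ("isCitizenOf", "wasBornIn", "hasCapital"),
    ("worksAt", "wasBornIn", "isLocatedIn"),
    ("isLocatedIn", "hasCapital", "isLocatedIn"),
]

def facts_for(rule_subset, all_facts):
    """Keep the facts whose predicate occurs in some rule of the subset."""
    preds = {pred for rule in rule_subset for pred in rule}
    return [fact for fact in all_facts if fact[0] in preds]

# Hypothetical split: first two rules versus the remaining three.
m1, m2 = rules[:2], rules[2:]
gamma1, gamma2 = facts_for(m1, facts), facts_for(m2, facts)
shared = set(gamma1) & set(gamma2)

print(len(gamma1), len(gamma2), len(shared))  # 8 8 3
```

Under this split each partition holds 8 of the 14 facts. The three `isLocatedIn` facts are copied into both, which is why the partitions overlap while the rule sets stay disjoint, and the `hasAcademicAdvisor` fact is needed by neither.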

Partitioning

Joining partitioned RDDs requires:

$O(t^{l-1}|S||M|) \to O(t^{l-1} s_m |M|)$,

bounded above by the largest partition size $s_m$.

Parallel Rule Mining

p(x, y) ← q(x, z), r(y, z).

Group facts by the join variable z.

Rules = {R1, R2, R3}

For each group, apply inference rules by an in-memory hash join, each fact noted by the inferring rule or “0” for base facts.

Map each fact to (fact, {r}) pair, {r} containing a list of inferring rules.

[Figure: rules R1, R2, R3 applied to facts F1 to F5 through the stages "Group by join variables", "Group joins", and "Group by facts".]
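
The three steps on these slides (group by the join variable, join inside each group, group by fact) can be sketched on a single machine. This is plain Python rather than the RDD-based implementation the slides refer to; the function `mine`, the rule id "R1", and the choice of sample facts are assumptions made for illustration.

```python
from collections import defaultdict

BASE = 0  # tag for facts that are already in the knowledge base

def mine(facts, rules):
    """rules maps a rule id to (p, q, r), meaning p(x, y) <- q(x, z), r(y, z)."""
    # Step 1: group facts by the join variable z, here the object of each fact.
    groups = defaultdict(list)
    for pred, subj, obj in facts:
        groups[obj].append((pred, subj))

    # Step 2: inside each group, index subjects by predicate (a hash index)
    # and pair q-subjects with r-subjects to produce inferred facts.
    tagged = [(fact, BASE) for fact in facts]
    for members in groups.values():
        by_pred = defaultdict(list)
        for pred, subj in members:
            by_pred[pred].append(subj)
        for rule_id, (p, q, r) in rules.items():
            for x in by_pred.get(q, ()):
                for y in by_pred.get(r, ()):
                    tagged.append(((p, x, y), rule_id))

    # Step 3: group by fact, so each fact carries the set of rules inferring it.
    by_fact = defaultdict(set)
    for fact, tag in tagged:
        by_fact[fact].add(tag)
    return dict(by_fact)

facts = [
    ("exports", "United States", "Computer"),
    ("exports", "Canada", "Aluminum"),
    ("imports", "United States", "Aluminum"),
    ("imports", "United States", "Clothing"),
    ("dealsWith", "Canada", "United States"),
]
rules = {"R1": ("dealsWith", "exports", "imports")}

result = mine(facts, rules)
print(result[("dealsWith", "Canada", "United States")])
```

Here the only successful join is on z = Aluminum, which yields dealsWith(Canada, United States). That fact is also a base fact, so its tag set contains both 0 and "R1"; an inferred fact whose set lacks 0 would be one the rule predicts but the knowledge base does not contain.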

Page 51: Ontological Pathfinding: Mining First-Order Knowledge from ...yang/doc/sigmod16/slides.pdf · Ontological Path nding: Mining First-Order Knowledge from Large Knowledge Bases Yang

Data Science Research

@

Introduction Ontological Pathfinding Experiments

Parallel Rule Mining

p(x, y)← q(x, z), r(y, z).

Map each fact to (fact, {r}) pair, {r} containing a list ofinferring rules.

R1

R2

R3

R1

R2

R3

Group joins Group by facts

R1

R2

R3

F1F2, F5

F4

F2, F3F1

F2

F5F3, F4

F5

Group by join variables

Rules = {R1, R2, R3}

Ontological Pathfinding Jun 29, 2016 14/25

Page 52: Ontological Pathfinding: Mining First-Order Knowledge from ...yang/doc/sigmod16/slides.pdf · Ontological Path nding: Mining First-Order Knowledge from Large Knowledge Bases Yang

Data Science Research

@

Introduction Ontological Pathfinding Experiments

Parallel Rule Mining

p(x, y)← q(x, z), r(y, z).

Map each fact to (fact, {r}) pair, {r} containing a list ofinferring rules.

R1

R2

R3

R1

R2

R3

Group joins Group by facts

R1

R2

R3

F1F2, F5

F4

F2, F3F1

F2

F5F3, F4

F5

Group by join variables

Rules = {R1, R2, R3}

Ontological Pathfinding Jun 29, 2016 14/25

Page 53: Ontological Pathfinding: Mining First-Order Knowledge from ...yang/doc/sigmod16/slides.pdf · Ontological Path nding: Mining First-Order Knowledge from Large Knowledge Bases Yang

Data Science Research

@

Introduction Ontological Pathfinding Experiments

Parallel Rule Mining

p(x, y)← q(x, z), r(y, z).

Map each fact to (fact, {r}) pair, {r} containing a list ofinferring rules.

R1

R2

R3

R1

R2

R3

Group joins Group by facts

R1

R2

R3

F1F2, F5

F4

F2, F3F1

F2

F5F3, F4

F5

Group by join variables

Rules = {R1, R2, R3}

Ontological Pathfinding Jun 29, 2016 14/25

Page 54: Ontological Pathfinding: Mining First-Order Knowledge from ...yang/doc/sigmod16/slides.pdf · Ontological Path nding: Mining First-Order Knowledge from Large Knowledge Bases Yang

Data Science Research

@

Introduction Ontological Pathfinding Experiments

Parallel Rule Mining

p(x, y)← q(x, z), r(y, z).

Map each fact to (fact, {r}) pair, {r} containing a list ofinferring rules.

R1

R2

R3

R1

R2

R3

Group joins Group by facts

R1

R2

R3

F1F2, F5

F4

F2, F3F1

F2

F5F3, F4

F5

Group by join variables

Rules = {R1, R2, R3}

Ontological Pathfinding Jun 29, 2016 14/25

Page 55: Ontological Pathfinding: Mining First-Order Knowledge from ...yang/doc/sigmod16/slides.pdf · Ontological Path nding: Mining First-Order Knowledge from Large Knowledge Bases Yang

Data Science Research

@

Introduction Ontological Pathfinding Experiments

Parallel Rule Mining

p(x, y)← q(x, z), r(y, z).

Map each fact to (fact, {r}) pair, {r} containing a list ofinferring rules.

R1

R2

R3

R1

R2

R3

Group joins Group by facts

R1

R2

R3

F1F2, F5

F4

F2, F3F1

F2

F5F3, F4

F5

Group by join variables

Rules = {R1, R2, R3}

Ontological Pathfinding Jun 29, 2016 14/25

Page 56: Ontological Pathfinding: Mining First-Order Knowledge from ...yang/doc/sigmod16/slides.pdf · Ontological Path nding: Mining First-Order Knowledge from Large Knowledge Bases Yang

Parallel Rule Mining

p(x, y) ← q(x, z), r(y, z).

Check each (fact, {r}) pair and generate (r, c, 1) tuples, where c = (0 ∈ {r}) indicates correctness, i.e. whether the inferred fact is also a base fact.

[Figure: pipeline over Rules = {R1, R2, R3} and facts F1 to F5. Stages shown: Group by join variables, Group joins, Group by facts, Check.]
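A small sketch of the check step, assuming the grouped input is a dictionary from fact to its set of rule ids. The name `check` and the choice to emit nothing for base facts that no rule infers are assumptions of this example, not details stated on the slide.

```python
# fact -> set of rule ids that produced it; 0 marks a base fact.
by_fact = {
    ("p", "a", "b"): {0, 1},   # base fact also inferred by rule 1
    ("p", "a", "c"): {1, 2},   # inferred by rules 1 and 2, not a base fact
    ("q", "a", "k"): {0},      # base fact that no rule infers
}


def check(by_fact):
    """Emit one (r, c, 1) tuple per (fact, inferring rule) pair."""
    tuples = []
    for fact, rule_ids in by_fact.items():
        correct = 0 in rule_ids            # c = (0 in {r})
        for rule_id in sorted(rule_ids - {0}):
            tuples.append((rule_id, correct, 1))
    return tuples


checked = check(by_fact)
```

Rule 1 receives one correct and one incorrect tuple here, and rule 2 receives one incorrect tuple.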


Page 62: Ontological Pathfinding: Mining First-Order Knowledge from ...yang/doc/sigmod16/slides.pdf · Ontological Path nding: Mining First-Order Knowledge from Large Knowledge Bases Yang

Parallel Rule Mining

p(x, y) ← q(x, z), r(y, z).

Reduce by r, aggregating the counts.

[Figure: pipeline over Rules = {R1, R2, R3} and facts F1 to F5. Stages shown: Group by join variables, Group joins, Group by facts, Check, Count.]
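The reduce step amounts to summing the trailing 1s per rule. The sketch below keys the sums by `(rule_id, correct)` so that correct and total counts can both be recovered later; that keying and the name `reduce_by_rule` are assumptions of this example.

```python
from collections import Counter

# (r, c, 1) tuples such as the check step might emit.
checked = [(1, True, 1), (1, False, 1), (1, True, 1), (2, False, 1)]


def reduce_by_rule(checked):
    """Sum the counts for every (rule id, correctness) combination."""
    counts = Counter()
    for rule_id, correct, n in checked:
        counts[(rule_id, correct)] += n
    return counts


counts = reduce_by_rule(checked)
```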

Page 63: Ontological Pathfinding: Mining First-Order Knowledge from ...yang/doc/sigmod16/slides.pdf · Ontological Path nding: Mining First-Order Knowledge from Large Knowledge Bases Yang

Parallel Rule Mining

p(x, y) ← q(x, z), r(y, z).

Map each rule to its confidence score.

[Figure: pipeline over Rules = {R1, R2, R3} and facts F1 to F5. Stages shown: Group by join variables, Group joins, Group by facts, Check, Count.]
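The slide does not spell out the confidence formula. The sketch below assumes it is the fraction of a rule's inferred facts that are also base facts, i.e. correct count divided by total count; the input layout and the name `confidence` are likewise hypothetical.

```python
# Aggregated counts keyed by (rule id, correctness flag).
counts = {(1, True): 2, (1, False): 1, (2, False): 1}


def confidence(counts):
    """Assumed score: correct inferences divided by all inferences, per rule."""
    totals, correct = {}, {}
    for (rule_id, is_correct), n in counts.items():
        totals[rule_id] = totals.get(rule_id, 0) + n
        if is_correct:
            correct[rule_id] = correct.get(rule_id, 0) + n
    return {rule_id: correct.get(rule_id, 0) / total
            for rule_id, total in totals.items()}


scores = confidence(counts)
```

Under this assumption rule 1 scores 2/3 and rule 2 scores 0.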

Page 64: Ontological Pathfinding: Mining First-Order Knowledge from ...yang/doc/sigmod16/slides.pdf · Ontological Path nding: Mining First-Order Knowledge from Large Knowledge Bases Yang

Rule Pruning: The Non-Functionality Problem

Example

diedIn(x, z), wasBornIn(y, z) → hasAcademicAdvisor(x, y).

"diedIn" and "wasBornIn" are N:1 predicates.

Joining them on z produces large intermediate results.

Histogram-based detection:

Predicate-Subject Histogram H1 = {(p, x, |{p(x, ·)}|)};
Predicate-Object Histogram H2 = {(p, y, |{p(·, y)}|)};
Functional constraint t requires H2(diedIn, z) ≤ t and H2(wasBornIn, z) ≤ t for all z;

t is picked by experiments.
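The two histograms and the threshold test can be sketched as below. The toy facts, the function names `build_histograms` and `satisfies_constraint`, and the reading that a rule fails the constraint as soon as any object value of a body predicate exceeds t are assumptions of this example.

```python
from collections import Counter

# Toy (predicate, subject, object) facts; names are illustrative only.
facts = [
    ("diedIn", "alice", "paris"),
    ("diedIn", "bob", "paris"),
    ("diedIn", "carol", "paris"),
    ("wasBornIn", "dave", "paris"),
    ("wasBornIn", "erin", "rome"),
]


def build_histograms(facts):
    """H1 counts facts per (predicate, subject); H2 per (predicate, object)."""
    h1 = Counter((pred, subj) for pred, subj, _ in facts)
    h2 = Counter((pred, obj) for pred, _, obj in facts)
    return h1, h2


def satisfies_constraint(h2, body_predicates, t):
    """True if H2(p, z) <= t for every body predicate p and every z."""
    return all(n <= t for (pred, _), n in h2.items() if pred in body_predicates)


h1, h2 = build_histograms(facts)
body = {"diedIn", "wasBornIn"}
```

With these facts H2(diedIn, paris) = 3, so the example rule violates the constraint at t = 2 and satisfies it at t = 3.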



Page 72: Ontological Pathfinding: Mining First-Order Knowledge from ...yang/doc/sigmod16/slides.pdf · Ontological Path nding: Mining First-Order Knowledge from Large Knowledge Bases Yang

Datasets

KB            YAGO2     YAGO2s      Freebase
# Predicates  130       126         67,415
# Entities    834,554   2,137,468   111,781,246
# Facts       948,047   4,484,907   388,474,630

Table: Dataset statistics.

Page 73: Ontological Pathfinding: Mining First-Order Knowledge from ...yang/doc/sigmod16/slides.pdf · Ontological Path nding: Mining First-Order Knowledge from Large Knowledge Bases Yang

Experiments: Overall Results

KB        Algorithm  # Rules  Precision  Runtime
YAGO2     OP         218      0.35       3.59 min
YAGO2     AMIE       1090     0.46       4.56 min
YAGO2s    OP         312      0.35       19.40 min
YAGO2s    AMIE       278+     N/A        5+ d
Freebase  OP         36,625   0.60       33.22 h
Freebase  AMIE       0+       N/A        5+ d

Table: Overall mining result.

Page 74: Ontological Pathfinding: Mining First-Order Knowledge from ...yang/doc/sigmod16/slides.pdf · Ontological Path nding: Mining First-Order Knowledge from Large Knowledge Bases Yang

Experiments: Quality

We detect trivial extensions and composite rules, which provide little knowledge beyond the rules of lengths 2 and 3.

Trivial extensions add valid rules to the body of another rule.

book/book/first_edition(x, u), book/book_edition/book(u, v), book/book/first_edition(v, y) → book/book/editions(x, y)

Composite rules chain multiple shorter rules.

film/film/sequel(x, u), film/film/country(u, v), (→ film/film/country(x, v))
location/country/official_language(v, y) → film/film/language(x, y)


Page 77: Ontological Pathfinding: Mining First-Order Knowledge from ...yang/doc/sigmod16/slides.pdf · Ontological Path nding: Mining First-Order Knowledge from Large Knowledge Bases Yang

Experiments: Quality

Category            Share
Correct             3.4%
Incorrect           6.3%
Composite           9.0%
Trivial extensions  81.3%

(c) Freebase Length 4 Rules

Figure: Quality of long rules.

Page 78: Ontological Pathfinding: Mining First-Order Knowledge from ...yang/doc/sigmod16/slides.pdf · Ontological Path nding: Mining First-Order Knowledge from Large Knowledge Bases Yang

Experiments: Quality

[Plots: (a) YAGO2s: Rule Lengths; (b) Freebase: Rule Lengths. Each panel plots # Mined Rules, Precision, and Runtime/h against Rule Length.]

Figure: OP performance for mining lengths 4 (YAGO and Freebase) and 5 (YAGO) rules.

Page 79: Ontological Pathfinding: Mining First-Order Knowledge from ...yang/doc/sigmod16/slides.pdf · Ontological Path nding: Mining First-Order Knowledge from Large Knowledge Bases Yang

Experiments: Effect of Partitioning

[Plots: Freebase Partitions (s=20M, m=2K) and Freebase Partitions (s=200M, m=10K). Each panel plots partition size × rule size (/10^9) and runtime/min per partition.]

Figure: Effect of partitioning.

Page 80: Ontological Pathfinding: Mining First-Order Knowledge from ...yang/doc/sigmod16/slides.pdf · Ontological Path nding: Mining First-Order Knowledge from Large Knowledge Bases Yang

Experiments: Effect of Partitioning

[Plots: (a) Freebase: Runtime vs Partitioning; (b) Freebase: Max Runtime vs Partitioning; (c) Freebase: DOV vs Partitioning; (d) YAGO2s: Runtime vs Partitioning; (e) Runtime vs Functional Constraint; (f) Pruned Rules Quality. Panels (a) to (d) use Max Partition size/M on the x-axis, with one curve per m (10K, 5K, 2K, 1K for Freebase; 1000 and 500 for YAGO2s). Panels (e) and (f) use the Functional Constraint on the x-axis and show YAGO2s and Freebase runtime, pruning precision, and # pruned rules.]

Runtime: 2.55 days → 5.06 hours.

Slowest partition: 1.27 days → 38.14 minutes.
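As a quick worked calculation, the two before/after figures can be converted to common units to read off the speedup factors. This assumes each arrow compares the same workload before and after the optimization shown on this slide; the slide itself does not state the configurations.

```latex
\[
\frac{2.55\ \text{days} \times 24\ \text{h/day}}{5.06\ \text{h}}
  = \frac{61.2}{5.06} \approx 12.1,
\qquad
\frac{1.27\ \text{days} \times 1440\ \text{min/day}}{38.14\ \text{min}}
  = \frac{1828.8}{38.14} \approx 48.
\]
```

So the total runtime drops by roughly a factor of 12 and the slowest partition by roughly a factor of 48.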

Page 81: Ontological Pathfinding: Mining First-Order Knowledge from ...yang/doc/sigmod16/slides.pdf · Ontological Path nding: Mining First-Order Knowledge from Large Knowledge Bases Yang

Conclusion

Design the Ontological Pathfinding algorithm, which scales rule mining to Freebase (the largest KB, with 388M facts, in 34 hours).

Partition the KB into independent subsets to reduce join sizes.

Divide joins into smaller joins that run in parallel. Prototype with Spark.

Publish the first Freebase rule set (36,625 inference rules).

Open-source at http://dsr.cise.ufl.edu/projects/probkb-web-scale-probabilistic-knowledge-base.

Page 82: Ontological Pathfinding: Mining First-Order Knowledge from ...yang/doc/sigmod16/slides.pdf · Ontological Path nding: Mining First-Order Knowledge from Large Knowledge Bases Yang

Thank you!

Yang Chen: http://cise.ufl.edu/~yang

Data Science Research at UF: http://dsr.cise.ufl.edu

Questions?