TRANSCRIPT
COLIBRI: Unsupervised Link Discovery Through Knowledge Base Repair
Axel-Cyrille Ngonga Ngomo, Mohamed Ahmed Sherif, Klaus Lyko
ESWC 2014, Crete, Greece
Outline Motivation Approach Evaluation Conclusion and Future Work
Outline
1 Motivation
2 Approach
3 Evaluation
4 Conclusion and Future Work
Ngonga Ngomo· Sherif · Lyko Colibri 2/24
Why Link Discovery?
1 Fourth Linked Data principle: include links to other URIs
2 Links are central for
  • Cross-ontology QA
  • Data Integration
  • Reasoning
  • Federated Queries
  • ...
Why is it difficult?
• Time complexity
  • Large number of triples
  • Quadratic runtime
• Complexity of specifications
  • Combination of several attributes required for high precision
  • Tedious discovery of most adequate mapping
  • Dataset-dependent similarity functions
http://saim.aksw.org
Solution
1 Use unsupervised link discovery
  • No need for training data
  • Minimizes load on user
2 Combine results of linking tasks over n > 2 knowledge bases
  • Make explicit use of the topology of the Data Web
3 Repair noisy data to improve link discovery
  • Address different quality of datasets across the Data Web
Colibri overview
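[Figure: Colibri pipeline. The knowledge bases K1, K2, ..., Kn feed unsupervised LD, which yields mappings; voting turns the mappings into voting matrices; repair derives repair instances from the voting matrices.]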
Key Concepts
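• Mapping matrix
• $M_{12} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}$
[Figure: resources ex1:1, ex1:2, ex1:3 of K1 and ex2:1, ex2:2, ex2:3 of K2, joined by two links of weight 1]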
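• Pseudo-F-measure as objective function
• $P(M_{ij}) = \frac{|links(K_i, M_{ij})| + |links(K_j, M_{ij})|}{2|M_{ij}|}$
• $R(M_{ij}) = \frac{|links(K_i, M_{ij})| + |links(K_j, M_{ij})|}{|K_i| + |K_j|}$
• $F_\beta = (1 + \beta^2) \frac{P R}{\beta^2 P + R}$
Example:
• $P(M_{12}) = 1$
• $R(M_{12}) = \frac{2}{3}$
• $F_1(M_{12}) = \frac{4}{5}$
[Figure: the mapping M12 between K1 and K2, with two links of weight 1]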
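The example values can be checked with a minimal Python sketch. It assumes that links(K, M) denotes the set of resources of K that take part in at least one link of M, an interpretation that reproduces the numbers on the slide; the function name `pseudo_f_measure` and the set-of-pairs encoding of a mapping are illustrative choices, not part of Colibri.

```python
def pseudo_f_measure(mapping, size_i, size_j, beta=1.0):
    """Return (P, R, F_beta) for a mapping given as a set of (source, target) pairs.

    Assumption: links(K, M) is read as the set of resources of K that
    occur in at least one link of M.
    """
    if not mapping:
        return 0.0, 0.0, 0.0
    linked_i = {s for s, _ in mapping}  # linked resources of K_i
    linked_j = {t for _, t in mapping}  # linked resources of K_j
    covered = len(linked_i) + len(linked_j)
    precision = covered / (2 * len(mapping))
    recall = covered / (size_i + size_j)
    f_beta = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
    return precision, recall, f_beta


# M12 from the slide: ex1:1 -> ex2:1 and ex1:2 -> ex2:2, with |K1| = |K2| = 3.
m12 = {("ex1:1", "ex2:1"), ("ex1:2", "ex2:2")}
p, r, f1 = pseudo_f_measure(m12, size_i=3, size_j=3)
```

No reference mapping enters the computation, which is what makes the measure usable as an unsupervised objective.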
Step 1: Unsupervised Link Discovery
• Link all pairs $(K_i, K_j)$ using any unsupervised link discovery approach
• Here, Euclid
  • Specifications are points in a similarity space
  • Find accurate specification by using hierarchical grid search
  • Detect specification which maximizes $F_\beta$
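The idea of a hierarchical grid search can be illustrated with a generic Python sketch. This is not Euclid's implementation: the function `hierarchical_grid_search`, its parameters and the toy objective are all assumptions made for illustration. In the setting of the slides, the objective would be the pseudo-F-measure of the mapping that a specification produces.

```python
import itertools


def hierarchical_grid_search(objective, bounds, points_per_axis=5, depth=4):
    """Maximize `objective` over a box by evaluating a grid and refining it.

    bounds: list of (low, high) pairs, one per dimension of the space.
    Each level evaluates a regular grid, then shrinks the box to one grid
    cell on either side of the best point found so far.
    """
    best_point, best_score = None, float("-inf")
    for _ in range(depth):
        steps = [(high - low) / (points_per_axis - 1) for low, high in bounds]
        axes = [
            [low + k * step for k in range(points_per_axis)]
            for (low, _), step in zip(bounds, steps)
        ]
        for point in itertools.product(*axes):
            score = objective(point)
            if score > best_score:
                best_point, best_score = point, score
        # Refine: centre the next, smaller box on the current best point.
        bounds = [
            (max(low, x - step), min(high, x + step))
            for x, (low, high), step in zip(best_point, bounds, steps)
        ]
    return best_point, best_score


def toy_objective(point):
    """Stand-in objective with a single maximum at (0.62, 0.31)."""
    return -((point[0] - 0.62) ** 2) - (point[1] - 0.31) ** 2


best, score = hierarchical_grid_search(toy_objective, [(0.0, 1.0), (0.0, 1.0)])
```

Each level costs `points_per_axis ** dimensions` evaluations, so the refinement reaches a fine resolution around the optimum without evaluating a fine grid over the whole space.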
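• Mapping matrices
• $M_{12} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}$
• $M_{13} = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 0.5 & 0 \\ 0 & 0 & 0.5 \end{pmatrix}$
• $M_{23} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0.5 & 0 \\ 0 & 0 & 0.5 \end{pmatrix}$
[Figure: link graph between the resources of K1 (ex1:1 to ex1:3), K2 (ex2:1 to ex2:3) and K3 (ex3:1 to ex3:3), with edge weights 1 and 0.5 as given by the mapping matrices]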
Step 2: Voting
[Figure: a resource s in Ki, a resource t in Kj and a resource z in Kk, connected by three links of weight 1]
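• $V_{ij} = \frac{1}{n-1} \left( M_{ij} + \sum_{k=1,\, k \neq i,j}^{n} M_{ik} M_{kj} \right)$
• Mapping matrices
• $M_{12} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}$
• $M_{13} = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 0.5 & 0 \\ 0 & 0 & 0.5 \end{pmatrix}$
• $M_{23} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0.5 & 0 \\ 0 & 0 & 0.5 \end{pmatrix}$
[Figure: link graph between the resources of K1, K2 and K3, with edge weights 1 and 0.5 as given by the mapping matrices]
• $V_{12} = \begin{pmatrix} 1 & 0 & 0.25 \\ 0 & 0.625 & 0 \\ 0 & 0 & 0.125 \end{pmatrix}$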
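The voting formula can be traced on the example with a small pure-Python sketch. It assumes that a reverse mapping such as $M_{32}$ is the transpose of $M_{23}$; the helper names (`transpose`, `matmul`, `voting_matrix`) and the dictionary layout are illustrative only.

```python
def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def voting_matrix(mappings, i, j, n):
    """Compute V_ij = (M_ij + sum over k != i, j of M_ik M_kj) / (n - 1).

    mappings: dict {(a, b): matrix} holding one direction per pair.
    Assumption: the missing direction M_ba is the transpose of M_ab.
    """
    def m(a, b):
        if (a, b) in mappings:
            return mappings[(a, b)]
        return transpose(mappings[(b, a)])

    total = [list(row) for row in m(i, j)]
    for k in range(1, n + 1):
        if k in (i, j):
            continue
        product = matmul(m(i, k), m(k, j))  # links from K_i to K_j via K_k
        total = [[t + p for t, p in zip(tr, pr)] for tr, pr in zip(total, product)]
    return [[value / (n - 1) for value in row] for row in total]


mappings = {
    (1, 2): [[1, 0, 0], [0, 1, 0], [0, 0, 0]],
    (1, 3): [[1, 0, 1], [0, 0.5, 0], [0, 0, 0.5]],
    (2, 3): [[1, 0, 0], [0, 0.5, 0], [0, 0, 0.5]],
}
v12 = voting_matrix(mappings, 1, 2, n=3)
# v12 == [[1.0, 0.0, 0.25], [0.0, 0.625, 0.0], [0.0, 0.0, 0.125]]
```

The entry 0.125 for (ex1:3, ex2:3) comes entirely from the path through K3, since $M_{12}$ contains no such link.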
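• Voting matrices
• $V_{12} = \begin{pmatrix} 1 & 0 & 0.25 \\ 0 & 0.625 & 0 \\ 0 & 0 & 0.125 \end{pmatrix}$
• $V_{13} = \begin{pmatrix} 1 & 0 & 0.5 \\ 0 & 0.5 & 0 \\ 0 & 0 & 0.25 \end{pmatrix}$
• $V_{23} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0.5 & 0 \\ 0 & 0 & 0.25 \end{pmatrix}$
• Post-processed matrices
• $V_{12} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0.625 & 0 \\ 0 & 0 & 0.125 \end{pmatrix}$
[Figure: link graph between the resources of K1, K2 and K3, with edge weights 1 and 0.5 as given by the mapping matrices]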
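• Assume links in $V_{ij}$ to be correct
• $v_{ij} = 1$ → All matrices agree on how to link $(K_i, K_j)$, e.g., $V_{12}$(ex1:1, ex2:1)
• For all $v_{ij} < 1$ assume either
  1 Missing links, e.g., $V_{12}$(ex1:3, ex2:3) not contained in $M_{12}$
  2 Weak links, e.g., $V_{12}$(ex1:2, ex2:2) < 1 is due to $M_{13}$(ex1:2, ex3:2) and $M_{32}$(ex3:2, ex2:2) being 0.5
$V_{12} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0.625 & 0 \\ 0 & 0 & 0.125 \end{pmatrix}$
[Figure: link graph between the resources of K1, K2 and K3; successive overlays highlight the agreed link (ex1:1, ex2:1), the missing link (ex1:3, ex2:3) and the weak link (ex1:2, ex2:2)]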
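The three cases can be told apart mechanically, as the following Python sketch suggests. The decision rule is an assumption read off the two examples: an entry with a vote below 1 counts as missing when $M_{ij}$ has no corresponding link and as weak otherwise. The function name `classify_votes` is hypothetical.

```python
def classify_votes(vote, mapping):
    """Split the non-zero entries of a voting matrix into three groups.

    Assumed rule: vote of 1 -> agreed; vote below 1 without a link in
    the mapping matrix -> missing; vote below 1 with a link -> weak.
    Entries are reported as zero-based (row, column) index pairs.
    """
    agreed, missing, weak = [], [], []
    for s, (vote_row, map_row) in enumerate(zip(vote, mapping)):
        for t, (v, m) in enumerate(zip(vote_row, map_row)):
            if v == 0:
                continue  # no matrix votes for this pair
            if v >= 1:
                agreed.append((s, t))
            elif m == 0:
                missing.append((s, t))
            else:
                weak.append((s, t))
    return agreed, missing, weak


v12_post = [[1, 0, 0], [0, 0.625, 0], [0, 0, 0.125]]  # post-processed V12
m12 = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]
agreed, missing, weak = classify_votes(v12_post, m12)
```

On the example this yields (ex1:1, ex2:1) as agreed, (ex1:3, ex2:3) as missing and (ex1:2, ex2:2) as weak, in line with the slide.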
Step 3: Repair
• Goal: Repair instance data so as to improve $v_{ij} < 1$
• Link to be repaired is (ex1:2, ex2:2).
• Reason for this link:
  • $r_s$ = ex1:2 and
  • $r_t$ = ex3:2.
• Computing average similarity:
  • σ(ex1:2) = 0.75 while
  • σ(ex3:2) = 0.5.
• Colibri overwrites the values of ex3:2 with those of ex1:2.
$V_{12} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0.625 & 0 \\ 0 & 0 & 0.125 \end{pmatrix}$
[Figure: link graph between the resources of K1, K2 and K3, highlighting ex1:2, ex2:2 and ex3:2 and the two links of weight 0.5 that involve ex3:2]
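The repair decision can be sketched in Python as follows. The sketch assumes that σ(r) is the mean weight of all links incident to r across the mapping matrices, which reproduces the values 0.75 and 0.5 above. The property values in `instances` are invented for illustration, and `average_similarity` and `repair` are hypothetical names.

```python
def average_similarity(resource, links):
    """Mean weight of the links incident to `resource` (assumed reading of sigma)."""
    weights = [w for pair, w in links.items() if resource in pair]
    return sum(weights) / len(weights)


def repair(r_s, r_t, links, instances):
    """Overwrite the values of the less reliable resource with those of the other.

    Returns (kept, overwritten). `instances` is modified in place.
    """
    if average_similarity(r_s, links) >= average_similarity(r_t, links):
        kept, overwritten = r_s, r_t
    else:
        kept, overwritten = r_t, r_s
    instances[overwritten] = dict(instances[kept])
    return kept, overwritten


# Non-zero entries of M12, M13 and M23 from the example.
links = {
    ("ex1:1", "ex2:1"): 1.0, ("ex1:2", "ex2:2"): 1.0,
    ("ex1:1", "ex3:1"): 1.0, ("ex1:1", "ex3:3"): 1.0,
    ("ex1:2", "ex3:2"): 0.5, ("ex1:3", "ex3:3"): 0.5,
    ("ex2:1", "ex3:1"): 1.0, ("ex2:2", "ex3:2"): 0.5, ("ex2:3", "ex3:3"): 0.5,
}
# Made-up property values; ex3:2 carries a noisy label.
instances = {
    "ex1:2": {"label": "Cafe Central"},
    "ex3:2": {"label": "Cafe Cntral"},
}
sigma_s = average_similarity("ex1:2", links)  # (1.0 + 0.5) / 2
sigma_t = average_similarity("ex3:2", links)  # (0.5 + 0.5) / 2
kept, overwritten = repair("ex1:2", "ex3:2", links, instances)
```

After the call, ex3:2 holds a copy of the values of ex1:2, so a later linking iteration would see matching data for the two resources.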
Benchmark Generation Approach
• So far, no benchmark for linking n > 2 knowledge bases
• Benchmark generation approach (Ferrara et al., 2011)
• Generated m − 1 copies of initial dataset K1
• Alteration operators:
  • Misspellings
  • Abbreviations
  • Word permutations
• Alteration strategy:
  • Pick random resource according to alteration probability
  • Pick random operator
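The alteration strategy can be illustrated with a short Python sketch. The three operators below are simplified stand-ins (an adjacent-character swap, a one-word initial, a word shuffle) and not the operators of the cited benchmark generator; the sample labels and all function names are made up for illustration.

```python
import random


def misspell(text, rng):
    """Swap two adjacent characters as a simple stand-in for a misspelling."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]


def abbreviate(text, rng):
    """Reduce one randomly chosen word to its initial followed by a period."""
    words = text.split()
    if not words:
        return text
    i = rng.randrange(len(words))
    words[i] = words[i][0] + "."
    return " ".join(words)


def permute_words(text, rng):
    """Shuffle the order of the words."""
    words = text.split()
    rng.shuffle(words)
    return " ".join(words)


OPERATORS = (misspell, abbreviate, permute_words)


def alter_copy(labels, alteration_probability, rng):
    """Copy a dataset, altering each resource with the given probability."""
    altered = {}
    for resource, label in labels.items():
        if rng.random() < alteration_probability:
            operator = rng.choice(OPERATORS)  # pick a random operator
            altered[resource] = operator(label, rng)
        else:
            altered[resource] = label
    return altered


rng = random.Random(42)  # fixed seed so that runs are repeatable
k1 = {"ex1:1": "Golden Dragon Restaurant", "ex1:2": "Cafe Central"}
m = 3
copies = [alter_copy(k1, 0.5, rng) for _ in range(m - 1)]  # m - 1 altered copies
```

Because every copy keeps the resource identifiers of K1, the reference mapping between any two generated knowledge bases is known by construction.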
Experimental Setup
• Datasets:
  • Two synthetic datasets (OAEI2010)
  • Three real-world datasets (Koepcke et al., 2010)
• Colibri:
  • Maximal number of iterations = 10
  • Number of knowledge bases = {3, 4, 5}
  • Alteration probability ap = {10%, 20%, ..., 50%}
  • Repeat each experiment 5 times
Experimental Results (synthetic dataset)
KBs   F (Euclid)   F (Colibri)   Runtime (sec)   Repaired links
3     0.89         0.98          0.4             43
4     0.90         1.00          0.9             35
5     0.88         1.00          1.3             34
• Restaurant dataset
• Average values after 10 iterations
• Alteration probability ap = 50%
Experimental Results (real-world dataset)
KBs   F (Euclid)   F (Colibri)   Runtime (sec)   Repaired links
3     0.86         0.98          81.8            300
4     0.85         0.99          160.4           150
5     0.84         0.88          246.8           60
• Amazon dataset
• Average values after 10 iterations
• Alteration probability ap = 50%
Results on the Restaurants dataset
• Alteration probability ap = 50%
• Knowledge bases = 5
[Figure: precision, recall, F-measure and error rate (y-axis, 0 to 100) plotted against the iteration number]
Full results at: https://github.com/AKSW/LIMES/tree/master/evaluationsResults/colibri
Conclusion and Future Work
• Conclusion
  • Presented Colibri
  • Improved F-measure of Euclid by up to 14%
• Future Work
  • Evaluation on other datasets
  • Interactive scenarios (i.e., consult user before dataset repair)
  • Combination with other unsupervised solutions (e.g., Eagle)
Thank You!
Questions?
Mohamed Sherif
Augustusplatz 10, D-04109 Leipzig
[email protected]
http://aksw.org/MohamedSherif
http://limes.sf.net