An Empirical Study of Instance-Based Ontology Mapping
Antoine Isaac, Lourens van der Meij, Stefan Schlobach, Shenghui Wang
STITCH@CATCH funded by NWO
Vrije Universiteit Amsterdam · Koninklijke Bibliotheek Den Haag · Max Planck Institute Nijmegen
ISWC 2007
Metamotivation
• Ontology mapping in practice
  • Based on real problems in the host institution, the Dutch Royal Library
• Task-driven
  • Annotation support
  • Merging of thesauri
• Real thesauri (100 years of tradition)
  • Really messy
  • Conceptually difficult
  • Inexpressive
• Generic solutions to specific questions & tasks
  • Using Semantic Web standards (SKOSification)
Overview
• Use-case
• Instance-based mapping
• Evaluation
• Experiments
• Results
• Conclusions
The Alignment Task: Context
• National Library of the Netherlands (KB)
• 2 main collections
  • Legal Deposit (Depot): all Dutch printed books
  • Scientific Collections: history, language, …
• Each described (indexed) by its own thesaurus
[Diagram: Scientific Collection (1.4M books) indexed with GTT; Depot (1M books) indexed with Brinkman]
A need for thesaurus mapping
• The KB wants to
  • (Scenario 1) possibly discontinue one of the two annotation and retrieval methods
  • (Scenario 2) possibly merge the thesauri
• We explore mapping
  • (Task 1) In case of a single/new/merged retrieval system, find books annotated with the old system, facilitated by mappings
  • (Task 2) Find candidate terms for a merged thesaurus
• We use the doubly annotated corpus to calculate instance-based mappings
Overview
• Use-case
• Instance-based mapping
• Evaluation
• Experiments
• Results
• Conclusions
Standard approach (Jaccard)
• Use co-occurrence measure to calculate similarity between 2 concepts: e.g.
B GElements of B
Elements of G
Joint Elements
Similarity = 5/9 = 55 % (overlap, e.g. Degree of Greenness )Similarity = 1/7 = 14 % (overlap, e.g. Degree of Greenness )
Set of books in the library
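This co-occurrence idea can be sketched in Python, with each concept represented by the set of books annotated with it (the book identifiers below are invented for illustration):

```python
# Instance-based Jaccard similarity between two thesaurus concepts,
# where a concept is represented by the set of books annotated with it.

def jaccard(instances_b, instances_g):
    """|B ∩ G| / |B ∪ G| over the concepts' instance (book) sets."""
    union = instances_b | instances_g
    if not union:
        return 0.0
    return len(instances_b & instances_g) / len(union)

# Reproducing the slide's first example: 5 joint books out of 9 in total.
b = {1, 2, 3, 4, 5, 6, 7}       # books annotated with the first concept
g = {3, 4, 5, 6, 7, 8, 9}       # books annotated with the second concept
print(round(jaccard(b, g), 2))  # 5/9 -> 0.56
```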
Issues with this measure (sparse data)
• Which mapping is more reliable?
  • Jacc = 18/21 = 86%
  • Jacc = 1/1 = 100%
• The second is actually worse: a single shared book can pair unrelated concepts, e.g. bB = {MemberOfParliament} and bG = {Cricket}
• We need
  • more reliable measures, or
  • thresholds (at least n doubly annotated books), or … ?
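A minimal sketch of the threshold remedy, assuming we simply refuse to score concept pairs with too few doubly annotated books (the sets below are invented):

```python
# Only trust Jaccard when the two concepts share at least `threshold`
# doubly annotated books; otherwise report the score as unreliable.

def reliable_jaccard(instances_b, instances_g, threshold=10):
    """Jaccard similarity, or None when co-occurrence evidence is too sparse."""
    joint = len(instances_b & instances_g)
    if joint < threshold:
        return None
    return joint / len(instances_b | instances_g)

shared = set(range(18))                          # 18 shared books
b1, g1 = shared | {100, 101}, shared | {200}
print(reliable_jaccard(b1, g1))                  # 18/21: kept

# The MemberOfParliament/Cricket-style case: one shared book.
print(reliable_jaccard({"book42"}, {"book42"}))  # None: filtered out
```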
Issue with this measure (hierarchy)
• Consider a hierarchy
[Venn diagram over the set of books in the library, for concepts B′ and G]
• Non-hierarchical: Jacc(B′,G) = 1/2 = 50%
• Counting hierarchical elements (books annotated with descendants of B′): Jacc(B′,G) = 2/6 = 33%
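One way to realize the hierarchical variant, sketched under the assumption that a concept's instance set is extended with the books annotated with any of its descendants (the tiny thesaurus below is invented):

```python
# Extend a concept's instance set with its descendants' instances
# before computing a similarity measure such as Jaccard.

def descendants(concept, children):
    """All concepts below `concept` in the hierarchy (excluding itself)."""
    result = set()
    stack = list(children.get(concept, ()))
    while stack:
        c = stack.pop()
        if c not in result:
            result.add(c)
            stack.extend(children.get(c, ()))
    return result

def extended_instances(concept, annotations, children):
    """Books annotated with `concept` or any of its descendants."""
    books = set(annotations.get(concept, ()))
    for d in descendants(concept, children):
        books |= set(annotations.get(d, ()))
    return books

children = {"Sports": ["Cricket", "Football"]}
annotations = {"Sports": {1}, "Cricket": {2, 3}, "Football": {4}}
print(extended_instances("Sports", annotations, children))  # {1, 2, 3, 4}
```

Enlarging the instance sets this way changes both the intersection and the union, which is why the score can drop, as in the 50% vs. 33% example above.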
An empirical study of instance-based OM
• We experimented with three dimensions:
  • Similarity measure: Jaccard, Corrected Jaccard, Pointwise Mutual Information, Log Likelihood Ratio, Information Gain
  • Threshold: 0 or 10
  • Hierarchy: yes or no
• Why only 2 thresholds? Because of evaluation costs!
Overview
• Use-case
• Instance-based mapping
• Evaluation
• Experiments
• Results
• Conclusions
User Evaluation Statistics
• 3 evaluators, 1500 evaluations
• 90% agreement on ONLYEQ
  • If some evaluator says "equivalent", 73% of the other evaluators say the same
• Comparing two evaluators, agreement is best for "Equivalent", followed by "No Link", "Narrower Than" and "Broader Than" (at or above 50%); "Related To" has only 35% agreement
• There are correlations between evaluators: for example, Ev1 and Ev2 agreed much more with each other on "no link" than with Ev3
Evaluation Interpretation: What is a good mapping?
• This is use-case specific. We considered:
  • ONLYEQ: only an "Equivalent" answer → correct
  • NOTREL: EQ, BT, NT → correct
  • ALL: EQ, BT, NT, RT → correct
• The obvious question: do they produce the same results?
Evaluation: validity of the (different) methods
• The answer is: yes
• All evaluation interpretations produce the same results (on different scales)
A remark about Evaluation
• The use of mappings is strongly task-dependent
  • Scenario 1 (legacy data/annotation support) and Scenario 2 (thesaurus merging) require different mappings
• Our evaluation is valid for Scenario 2 (intensional)
• Scenario 1 can be evaluated differently (e.g. cross-validation on test data)
  • See our paper at the Cultural Heritage Workshop
Overview
• Use-case
• Instance-based mapping
• Evaluation
• Experiments
• Results
• Conclusions
Experiments: Setup, Data and Thesauri
• We calculated
  • 5 different similarity measures, with
  • threshold 0 or 10, and
  • hierarchy: yes or no
• Based on
  • 24,061 GTT concepts
  • 4,990 Brinkman concepts
  • 243,886 books with double annotations
Experiments: Result calculation
• Average precision at similarity position i:
  • Pi = Ngood,i / i (take the first i mappings, ranked by similarity, and return the proportion of correct ones)
• Example: a precision of 86% at the 798th mapping means that of the first 798 mappings, 86% were correct
• Recall is estimated based on lexical mappings
• F-measure is calculated as usual
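The precision-at-position computation can be sketched as follows, over a ranked list of mappings marked correct or incorrect (the judgments below are invented):

```python
# Precision at position i over a candidate mapping list that is ranked
# by descending similarity; each entry records whether evaluators
# judged that mapping correct.

def precision_at(ranked_correct, i):
    """P_i = (number of correct mappings among the first i) / i."""
    top = ranked_correct[:i]
    return sum(top) / len(top)

ranked_correct = [True, True, False, True, True]
print(precision_at(ranked_correct, 5))  # 4 correct of 5 -> 0.8
```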
Overview
• Use-case
• Instance-based mapping
• Evaluation
• Experiments
• Results
• Conclusions
Results: Three research questions
1. What is the influence of the choice of threshold?
2. What is the influence of hierarchical information?
3. What is the best measure and setting for instance-based mapping?
What is the influence of the choice of threshold?
• A threshold is needed for Jaccard
• A threshold is NOT needed for LLR
Best measure and setting for instance-based mapping?
• We have two winners: the corrected Jaccard measures
Conclusion
• Summary
  • About 80% precision at an estimated 80% recall
  • Simple measures perform better when a statistical correction is applied (a threshold, or an explicit correction in the measure)
  • Hierarchical aspects remain unresolved
  • Some measures are really unsuited
• Future work:
  • Generalize results
    • Other use cases, web directories, …
  • Study other measures
Similarity measures: Formulae
• Jaccard: Jacc(B,G) = |B ∩ G| / |B ∪ G|, over the concepts' instance sets
• Corrected Jaccard: assigns a smaller score to less frequently co-occurring annotations (the intersection count is discounted before normalizing by the union)
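The corrected variant can be sketched as below. The slide does not reproduce the exact formula, so the sqrt(n·(n − 0.8)) discount, while it follows one published form of the correction, should be treated as an assumption here:

```python
# Corrected Jaccard: discount rarely co-occurring concepts so that,
# e.g., a single shared book no longer yields a 100% score.
# The sqrt(joint * (joint - 0.8)) discount is an assumed form.
from math import sqrt

def corrected_jaccard(instances_b, instances_g):
    joint = len(instances_b & instances_g)
    union = len(instances_b | instances_g)
    if union == 0 or joint == 0:
        return 0.0
    return sqrt(joint * (joint - 0.8)) / union

# One shared book: plain Jaccard would say 1/1 = 100%,
# while the corrected score stays well below that.
print(corrected_jaccard({"book42"}, {"book42"}))  # sqrt(0.2) ≈ 0.447
```

As the intersection grows, the discount fades and the score converges to plain Jaccard, which matches the intuition that large overlaps are trustworthy evidence.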
Information Theoretic Measures
• Pointwise Mutual Information:
  • measures the reduction of uncertainty that the annotation with one concept yields for the annotation with another concept
  • disadvantage: inadequate for sparse data
• Log Likelihood Ratio:
• Information Gain:
  • the difference in entropy; used to determine the attribute that best distinguishes positive from negative examples
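Pointwise mutual information can be estimated from annotation counts over the doubly annotated corpus; a sketch, with invented counts and log base 2 as an arbitrary choice:

```python
# PMI between two concepts B and G, estimated from annotation counts
# over N doubly annotated books. Positive values mean the concepts
# co-occur more often than independence would predict.
from math import log2

def pmi(n_joint, n_b, n_g, n_total):
    """PMI(B,G) = log( P(B,G) / (P(B) * P(G)) )."""
    p_joint = n_joint / n_total
    p_b = n_b / n_total
    p_g = n_g / n_total
    return log2(p_joint / (p_b * p_g))

# Concepts co-occurring 50x more often than chance:
print(pmi(n_joint=50, n_b=100, n_g=100, n_total=10000))  # ≈ 5.64
```

The sparse-data weakness mentioned above is visible here: with tiny counts, a single co-occurrence of two rare concepts already produces a large PMI value.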