Instance-Based Ontology Matching by Instance Enrichment
TRANSCRIPT
Ontology matching Instance-based OM IBOMbIE Experiments Comparison other OM Conclusions
Instance-Based Ontology Matching by Instance Enrichment
Balthasar A.C. Schopman
Supervisors: Antoine Isaac, Shenghui Wang, Stefan Schlobach
Vrije Universiteit Amsterdam
June 29, 2009
Outline
1 Ontology matching
2 Instance-based OM
3 IBOMbIE
4 Experiments
5 Comparison with other OM
6 Conclusions
Research questions
General research questions:
How do different algorithm design options of IBOMbIE influence the final result?
How does the performance of IBOMbIE relate to other OM algorithms?
Questions from the audience
Crucial questions: please interrupt me. Other questions: after the presentation, please.
Introduction
Ontology
Definition of an ontology¹:
An ontology typically (1) defines a vocabulary relevant in a certain domain of interest, (2) specifies the meaning of terms and (3) specifies relations between terms.
Ontologies:
controlled vocabulary
thesaurus
database schema
canonical semantic web ontology: a set of typed, interrelated concepts defined in a formal language
¹ by Euzenat and Shvaiko
Introduction
Ontology Matching (OM)
Ontologies ...
facilitate interoperability between parties
do not solve the heterogeneity problem, but raise it to a higher level: the OM level
Elementary OM techniques:
terminological
structure-based
semantic-based
instance-based
Introduction
Instance-based OM (IBOM)
Variants of IBOM:
1 use dually annotated instances (DAI)
2 create DAI
3 use extension of concepts (DAI not required)
General pros and cons:
Con: does not deduce specific relations
Con: suitable instances rarely available
Pro: focus on active part of ontology
Pro: able to deal with ambiguous linguistic phenomena: synonyms, homonyms
Intro
Definitions of ‘instance of’-relation
Example definitions:
Canonical semantic web definition
Library definition
[Figure: RDF graph: someone:Peter rdf:type foaf:Person; someone:Peter foaf:name "Peter"; someone:Peter foaf:knows someone:Nate]
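The semantic web 'instance of' relation from the example can be illustrated with plain triples. This is a toy sketch using strings instead of a real RDF library:

```python
# the triples from the example graph, as (subject, predicate, object)
triples = {
    ("someone:Peter", "rdf:type", "foaf:Person"),
    ("someone:Peter", "foaf:name", '"Peter"'),
    ("someone:Peter", "foaf:knows", "someone:Nate"),
}

def instances_of(concept):
    """All subjects related to `concept` via rdf:type."""
    return {s for (s, p, o) in triples if p == "rdf:type" and o == concept}

print(instances_of("foaf:Person"))  # {'someone:Peter'}
```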
Library definition:
[Figure: a vocabulary with concepts c1, c2, c3, ...; object o1 is annotated with c1, object o2 is annotated with c1, c2 and c3]
Intro
Application
Two library scenarios: KB and TEL
match controlled vocabularies
data-sets: book catalogs
multi-lingual
IBOM
IBOM: measuring similarity
[Figure: two overlapping sets of instances, the extensions of concepts c1 and c2; their overlap indicates the similarity of c1 and c2]
IBOM
Jaccard coefficient
Jaccard coefficient:
J(c1, c2) = |i1 ∩ i2| / |i1 ∪ i2|
quantifies the overlap of the extensions of concepts → relatedness between concepts
Con: does not support multi-sets
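As a sketch, the Jaccard coefficient over concept extensions can be computed directly; the instance IDs below are hypothetical:

```python
def jaccard(ext1, ext2):
    """Jaccard coefficient of two concept extensions (sets of instance IDs)."""
    ext1, ext2 = set(ext1), set(ext2)
    union = ext1 | ext2
    return len(ext1 & ext2) / len(union) if union else 0.0

# hypothetical extensions: books annotated with concepts c1 and c2
ext_c1 = {"b1", "b2", "b3", "b4"}
ext_c2 = {"b3", "b4", "b5"}
print(jaccard(ext_c1, ext_c2))  # 2 shared of 5 total -> 0.4
```

Note the set semantics: an instance annotated twice with the same concept counts once, which is the multi-set limitation mentioned above.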
IBOM
Creating dually annotated instances (DAI)
The Jaccard coefficient needs DAI. If DAI are unavailable:
exact instance matching → merge annotations
approximate instance matching → enrich instances
Instance matching
Approximate instance matching
Instance similarity measures:
Lucene
vector space model (VSM)
Enriching instances
Basic instance enrichment (IE)
[Figure: instance i1 with annotations {a, b} in data-set D1 is matched to instance i2 with annotations {A, B} in data-set D2; the annotations A and B are copied onto i1]
Enriching instances
IE parameter: topN
[Figure: instance i1 ({a, b}) in data-set D1 is matched against data-set D2; with topN = 3, the annotations of its three best-matching instances, i2 ({A, B}), i4 ({A, C}) and i3 ({D}), are added to i1 in rank order]
Enriching instances
IE parameter: similarity threshold (ST)
[Figure: instance i1 ({a, b}) in data-set D1 with sim(i1,i2) = 0.8, sim(i1,i3) = 0.4 and sim(i1,i4) = 0.2; only matches whose similarity reaches the threshold ST contribute their annotations to i1]
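The interplay of the two enrichment parameters can be sketched as follows, using the similarity scores from the example figure; the annotation sets are the ones shown on the slides:

```python
def enrich_annotations(a_annotations, matches, top_n=3, st=0.0):
    """Enrich an instance with the annotations of its matches in the other
    data-set, keeping only the top-N matches whose similarity >= ST.
    matches: list of (similarity, annotation_set) pairs."""
    enriched = set(a_annotations)
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    for sim, annotations in ranked[:top_n]:
        if sim >= st:
            enriched |= annotations
    return enriched

# the example from the slides: sim(i1,i2)=0.8, sim(i1,i3)=0.4, sim(i1,i4)=0.2
matches = [(0.8, {"A", "B"}), (0.4, {"D"}), (0.2, {"A", "C"})]
print(sorted(enrich_annotations({"a", "b"}, matches, top_n=3, st=0.3)))
# -> ['A', 'B', 'D', 'a', 'b']
```

With topN=3 and ST=0.3 the match i4 (similarity 0.2) is filtered out; the baseline corresponds to topN=1 with no threshold.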
Experimental questions
Experimental questions
Instance similarity measure
topN parameter
ST parameter
combining topN + ST parameters
performance as compared to other OM algorithms
Evaluation
Alignment evaluation
Methods:
Gold standard := good alignment
Reindexing
Measures:
Precision
Recall
f-measure
Results of experiments
Results: instance similarity measure - quality
[Figure: precision (P), recall (R) and F-measure (F) vs. mapping rank for VSM and Lucene; (a) gold standard, (b) reindex]
Virtually equal
Results of experiments
Results: instance similarity measure - quality
[Figure: (c) overlap between the VSM and Lucene mappings vs. mapping rank; (d) manual evaluation of precision for VSM and Lucene]
Edge to VSM
Results of experiments
Results: instance similarity measure - run-time

time to enrich 100K instances (hrs:min):

indexed instances   Lucene   VSM
524K                 1:04    0:17
1,457K               7:20    0:22
2,506K              26:15    0:32

[Figure: increase in run-time vs. number of indexed documents (x 100K) for VSM and Lucene]

Optimizations of VSM:
pre-calculate the weights of indexed documents
purge insignificant weights (35% + 50%)
word-centered indexing approach
Results of experiments
Results: topN parameter (TEL)
As N increases, the quality of the mappings decreases
[Figure: F-measure vs. mapping rank for topN = 1 (baseline) up to topN = 6; (i) gold standard, (j) reindex]
Results of experiments
Results: similarity threshold parameter (KB)
Best performance with ST: ST = µ
Best performance: baseline (topN=1, ST=∞)
[Figure: F-measure vs. mapping rank for thresholds T = µ + k·σ with k from -1.5 to +1.5 in steps of 0.5, vs. the baseline; (k) gold standard, (l) reindex]
Results of experiments
Results: combining parameters
Using both parameters performs good in TEL, not in KB...possibly due to:
more selective IBOMbIE pays off in TEL, because vocabularies+ instance annotations are more different than in KB scenario.
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
100 1000 10000 100000 1e+06
f-m
easu
re
mapping rank
baselinetopN=1 ST=mu-0.5s
topN=1 ST=mutopN=1 ST=mu+0.5stopN=2 ST=mu-0.5s
topN=2 ST=mutopN=2 ST=mu+0.5stopN=3 ST=mu-0.5s
topN=3 ST=mu
(m) KB
0
0.05
0.1
0.15
0.2
0.25
0.3
100 1000 10000 100000 1e+06
f-m
easu
re
mapping rank
baselinetopN=1 ST=mu-0.5s
topN=1 ST=mutopN=1 ST=mu+0.5stopN=2 ST=mu-0.5s
topN=2 ST=mutopN=2 ST=mu+0.5s
topN=3 ST=mutopN=3 ST=mu+0.5s
(n) TEL
(evaluation method: reindexing)
Ontology matching Instance-based OM IBOMbIE Experiments Comparison other OM Conclusions
OAEI
Ontology alignment evaluation initiative (OAEI)
matcher   terminological   structure-based   semantic-based   instance-based
DSSim           X                 X                 X                ✗
Lily            X                 X                 X                ✗
TaxoMap         X                 X                 X                ✗
IBOMbIE         ✗                 ✗                 ✗                X

DSSim, Lily and TaxoMap:
consider the KB ontologies “huge”
feature functionality to deal with large ontologies
OAEI
Performance comparison: quality
[Figure: precision (P) and recall (R) vs. mapping rank for IBOMbIE (topN=1), DSSim, Lily and TaxoMap]
OAEI
Performance comparison: resources + coverage
matcher    run-time   number of mappings
DSSim       12:00          2,930
Lily          ?            2,797
TaxoMap      2:40          1,851
IBOMbIE      1:54          7,000+

(Number of lexically equal concepts in the KB vocabularies = 2,895)
Conclusions + discussion
The IBOMbIE algorithm is quite promising:
relatively low run-time
able to deal with large ontologies
amount + quality of mappings
Pros of IBOM:
able to align ontologies using disjoint data-sets
Basic instance enrichment appears to be the best-performing method. Possible cause: the Jaccard coefficient does not support multi-sets.
Fin
Thank you... any questions?
Vocabularies
vocabulary          size
KB   GTT            35K
KB   Brinkman        5K
TEL  LCSH          340K
TEL  Rameau        155K
TEL  SWD           805K
IE parameter: similarity threshold (ST)
       D1 annotated with   D2 annotated with     µ       σ
KB            O1                  O2           0.297   0.106
KB            O2                  O1           0.279   0.101
TEL           O1                  O2           0.260   0.097
TEL           O2                  O1           0.232   0.084

standard ST: µ
step-size: ½σ
VSM
Weights are components of vectors:
term frequency - inverse document frequency: TF-IDF
e.g. audiovisual features

tfidf(w, d) = tf(w, d) * idf(w)
tf(w, d) = √n(w, d) / |d|
idf(w) = log( |D| / |{d ∈ D : w ∈ d}| )

VSM cosine similarity:
cosine_sim(d1, d2) = (d1 · d2) / (|d1| |d2|) = Σ_i w(i,d1) w(i,d2) / ( √(Σ_i w(i,d1)²) √(Σ_i w(i,d2)²) )
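A minimal sketch of the TF-IDF weighting and cosine similarity defined above, assuming documents are already tokenized into word lists (the example documents are hypothetical):

```python
import math

def tfidf_vectors(docs):
    """TF-IDF vectors as defined on the slide:
    tf(w,d) = sqrt(n(w,d)) / |d|, idf(w) = log(|D| / df(w))."""
    n_docs = len(docs)
    df = {}  # document frequency of each word
    for d in docs:
        for w in set(d):
            df[w] = df.get(w, 0) + 1
    return [{w: (math.sqrt(d.count(w)) / len(d)) * math.log(n_docs / df[w])
             for w in set(d)}
            for d in docs]

def cosine_sim(v1, v2):
    """Cosine of the angle between two sparse weight vectors (dicts)."""
    dot = sum(w * v2.get(t, 0.0) for t, w in v1.items())
    n1 = math.sqrt(sum(w * w for w in v1.values()))
    n2 = math.sqrt(sum(w * w for w in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

docs = [["ontology", "matching"], ["ontology", "alignment"], ["french", "cooking"]]
v = tfidf_vectors(docs)
print(round(cosine_sim(v[0], v[1]), 3))  # partial word overlap: between 0 and 1
```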
Ontology matching Instance-based OM IBOMbIE Experiments Comparison other OM Conclusions
Evaluation method: gold standard
Gold standard := good alignment
P = precision = |{reference} ∩ {retrieved}| / |{retrieved}|
R = recall = |{reference} ∩ {retrieved}| / |{reference}|
F = F-measure = 2 · P · R / (P + R)
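These three measures can be sketched directly; the mappings below are hypothetical concept pairs:

```python
def evaluate(reference, retrieved):
    """Precision, recall and F-measure of a retrieved alignment against
    a gold-standard reference alignment (both sets of mappings)."""
    reference, retrieved = set(reference), set(retrieved)
    correct = len(reference & retrieved)
    precision = correct / len(retrieved) if retrieved else 0.0
    recall = correct / len(reference) if reference else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

gold = {("c1", "C1"), ("c2", "C2"), ("c3", "C3"), ("c4", "C4")}
found = {("c1", "C1"), ("c2", "C2"), ("c5", "C5")}
p, r, f = evaluate(gold, found)
print(p, r, f)  # 2 of 3 retrieved mappings are correct, 2 of 4 references found
```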
Evaluation method: reindexing
[Figure: ontologies o_1 (concepts a, b, c) and o_2 (concepts x, y, z); a dually annotated instance i_dual carries annotations {a, b} from o_1 and {x} from o_2; reindexing uses the alignment to translate {a, b} into predicted o_2-annotations {x, z}, which are compared against the real o_2-annotations]

P = ( Σ over dually annotated instances of |{reference} ∩ {retrieved}| / |{retrieved}| ) / |{reindexed instances}|
R = ( Σ over dually annotated instances of |{reference} ∩ {retrieved}| / |{reference}| ) / |{dually annotated instances}|
IBOMbIE algorithm overview
Whole algorithm
Start: two data-sets Dx and Dy
1 Enrich the instances of Dx with annotations of instances of Dy. For every instance a:
1 Find the N best matching instances {b} in Dy
2 Add the annotations of {b} to a
2 Enrich vice versa
3 Merge the data-sets into one dually annotated data-set
4 Apply the Jaccard measure
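The four steps above can be sketched end-to-end in Python. The similarity measure, data structures and data below are hypothetical stand-ins (the thesis uses Lucene or VSM over real book records):

```python
from collections import defaultdict

def word_overlap(a, b):
    """Toy instance similarity: word overlap of the textual descriptions."""
    wa, wb = set(a["text"].split()), set(b["text"].split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def enrich(dx, dy, sim, top_n=1):
    """Steps 1-2: enrich each instance of dx with the annotations of its
    top-N most similar instances in dy."""
    out = {}
    for a_id, a in dx.items():
        ranked = sorted(dy.values(), key=lambda b: sim(a, b), reverse=True)
        extra = set().union(*(b["annotations"] for b in ranked[:top_n])) if ranked else set()
        out[a_id] = {"text": a["text"], "annotations": a["annotations"] | extra}
    return out

def jaccard_alignment(instances):
    """Step 4: Jaccard similarity for every pair of concepts."""
    ext = defaultdict(set)  # concept -> extension (set of instance IDs)
    for i_id, inst in instances.items():
        for c in inst["annotations"]:
            ext[c].add(i_id)
    concepts = sorted(ext)
    return {(c1, c2): len(ext[c1] & ext[c2]) / len(ext[c1] | ext[c2])
            for i, c1 in enumerate(concepts) for c2 in concepts[i + 1:]}

# hypothetical toy data-sets: books annotated with two different vocabularies
d1 = {"i1": {"text": "history of amsterdam", "annotations": {"a"}}}
d2 = {"i2": {"text": "history of amsterdam", "annotations": {"A"}},
      "i3": {"text": "french cooking", "annotations": {"B"}}}

# steps 1-3: enrich both ways, then merge into one dually annotated data-set
merged = {**enrich(d1, d2, word_overlap), **enrich(d2, d1, word_overlap)}
print(jaccard_alignment(merged))  # Jaccard score per concept pair
```

Note that without a similarity threshold even a zero-similarity best match contributes its annotations (here i3 inherits "a"), which is exactly why the ST parameter exists.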