In the Name of God
Semantic Web
Morteza Amini
Ontology Alignment
Sharif University of Technology Spring 90-91
Outline
The Problem of Ontologies
Ontology Alignment Overall Process
Ontology Heterogeneity
Similarity Methods
Sharif Univ. of Tech. Ontology Alignment - Morteza Amini 2
The Problem
Like the Web, the Semantic Web by
design will be distributed and
heterogeneous.
Ontologies are used in it to support
interoperability and common
understanding between different parties.
Ontologies themselves may have some
heterogeneities.
Ontology Alignment is needed to find
semantic relationships among entities of
ontologies.
[Figure: ontologies a, b, c, and d, captioned "How should I use them?"]
Need for Ontology Merging
There is significant overlap in existing ontologies
Yahoo! and DMOZ Open Directory
Product catalogs for similar domains
Terminology (1)
Mapping: a formal expression that states the semantic
relationship between two entities belonging to different
ontologies.
Given two ontologies O1 and O2, mapping one ontology onto
another means that for each entity (concept C, relation R, or
instance I) in ontology O1, we try to find a corresponding entity,
which has the same intended meaning, in ontology O2.
map(e1i) = e2j
Ontology Alignment: a set of correspondences between
two or more (in case of multi-alignment) ontologies. These
correspondences are expressed as mappings.
Terminology (2)
Ontology Transformation: a general term for any process which leads to a new ontology O’ from an ontology O by using a transformation function T.
Ontology Translation: an ontology transformation function t for translating an ontology o written in some language L into an ontology o’ written in a distinct language L’.
Ontology Merging: the creation of a new ontology from two (possibly overlapping) source ontologies. This concept is closely related to that of integration in the database community.
An Example of Ontology Alignment
[Figure: two example ontologies linked by similarity values (1.0, 0.6, 0.6, 0.8). Labels shown: Object, Thing, Vehicle, Car, Automobile, Boat, Owner, Speed, Specification, Has Owner, Has Speed, Has Specification, Ali, Ali’s Peugeot, Peugeot 405, Fast, 250 km/h. Legend: Concept, Property, Instance, Type, Similarity.]
Car : Ontology A ( similar to ) Automobile : Ontology B
Car – Automobile
Label Similarity = 0.0
Super Similarity = 1.0
Instance Similarity = 0.6
Relation Similarity = 0.8
Total Similarity = 0.6
An Example of Ontology Merging
[Figure 1 of 3: two source ontologies. One contains Object, Vehicle, Car, Bus, Luxury Car, Family Car, Sport Car, and BMW; the other contains Thing, Automobile, Family Car, Sport Car, and Porsche.]
[Figure 2 of 3: the same entities from both ontologies, shown together.]
[Figure 3 of 3: the merged ontology. Merged nodes: "Object, Thing" and "Car, Automobile". Other nodes: Vehicle, Bus, Luxury Car, Family Car, Sport Car, Porsche, BMW.]
Ontology Alignment Process
[Figure: the alignment process between Input and Output. Steps shown: Features, Similarity, Aggregation, Interpretation, and Entity Pair Selection, repeated over Iterations.]
Features
[Figure: example ontology. Labels shown: Object, Vehicle, Car, Boat, Owner, Speed, hasOwner, hasSpeed, Porsche KA-123, Marc, 250 km/h.]
Similarity Measure
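String similarity: string comparisons, e.g., of labels.
E.g., $\mathrm{sim}_{String}(s_1, s_2) = \max\left(0, \frac{\min(|s_1|, |s_2|) - ed(s_1, s_2)}{\min(|s_1|, |s_2|)}\right)$, where $ed(s_1, s_2)$ is the edit distance between $s_1$ and $s_2$.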
Object similarity: direct object comparisons. Are two objects the same?
E.g., instances.
Set similarity: set comparisons. Are the two sets of objects the same?
E.g., concepts (based on their instances).
Set similarity requires a precalculated similarity of the objects based on object similarity method.
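The string similarity measure on this slide can be sketched in Python as follows. This is a minimal sketch, assuming that ed is the unit-cost edit distance; the function names and the handling of empty strings are illustrative choices, not taken from the slides.

```python
def edit_distance(s1: str, s2: str) -> int:
    """Unit-cost edit distance (insertions, deletions, replacements)."""
    previous = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        current = [i]
        for j, c2 in enumerate(s2, start=1):
            current.append(min(
                previous[j] + 1,               # delete c1
                current[j - 1] + 1,            # insert c2
                previous[j - 1] + (c1 != c2),  # replace, free if equal
            ))
        previous = current
    return previous[-1]


def sim_string(s1: str, s2: str) -> float:
    """max(0, (min(|s1|, |s2|) - ed(s1, s2)) / min(|s1|, |s2|))."""
    shortest = min(len(s1), len(s2))
    if shortest == 0:
        # Assumption: an empty label is similar only to another empty label.
        return 1.0 if s1 == s2 else 0.0
    return max(0.0, (shortest - edit_distance(s1, s2)) / shortest)


print(sim_string("car", "car"))         # 1.0
print(sim_string("car", "automobile"))  # 0.0
```

Under these assumptions the labels "car" and "automobile" score 0.0, which agrees with the label similarity given for Car and Automobile in the alignment example.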
Similarity Rules
            Feature      Similarity Measure
Concepts    label        String Similarity
            subclassOf   Set Similarity
            instances    Set Similarity
            …
Relations
Instances
Aggregation
$\mathrm{sim}(e, f) = \sum_k w_k \, \mathrm{sim}_k(e, f)$
How are the individual similarity measures combined?
Linearly
Weighted
Special Function
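A weighted linear aggregation can be sketched as below. This is a minimal sketch: the function name and the division by the total weight are assumptions (the division makes the weights behave as if they summed to 1). The input values are the four Car – Automobile similarities from the alignment example.

```python
from typing import Dict, Optional


def aggregate(similarities: Dict[str, float],
              weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted sum of the individual similarities, normalized by total weight."""
    if weights is None:
        # Assumption: equal weights when none are given (plain linear average).
        weights = {name: 1.0 for name in similarities}
    total_weight = sum(weights[name] for name in similarities)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    weighted = sum(weights[name] * value for name, value in similarities.items())
    return weighted / total_weight


car_automobile = {"label": 0.0, "super": 1.0, "instance": 0.6, "relation": 0.8}
print(round(aggregate(car_automobile), 2))  # 0.6
```

With equal weights the result matches the total similarity of 0.6 shown for Car and Automobile, which suggests that the example uses a plain average; that reading is an inference from the numbers.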
Interpretation
From similarities to mappings.
A threshold can be applied on the similarity (measured in
the previous step) to determine the required mapping.
map(e1i) = e2j ← sim(e1i, e2j) > t
The threshold can be determined through test (training)
data sets.
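One way to determine the threshold from training data is to try several candidate values and keep the one that best reproduces a reference alignment. The sketch below assumes the F-measure as the quality criterion; the slides do not name one, and all function names and sample scores are hypothetical.

```python
def interpret(similarities: dict, t: float) -> set:
    """Keep every entity pair whose similarity exceeds the threshold t."""
    return {pair for pair, score in similarities.items() if score > t}


def f_measure(predicted: set, reference: set) -> float:
    """Harmonic mean of precision and recall against a reference alignment."""
    correct = len(predicted & reference)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(reference)
    return 2 * precision * recall / (precision + recall)


def learn_threshold(similarities: dict, reference: set, candidates: list) -> float:
    """Pick the candidate threshold with the best F-measure on training data."""
    return max(candidates,
               key=lambda t: f_measure(interpret(similarities, t), reference))


# Hypothetical training data: aggregated similarities and the correct mappings.
scores = {
    ("Car", "Automobile"): 0.6,
    ("Boat", "Automobile"): 0.3,
    ("Owner", "Owner"): 0.9,
    ("Speed", "Owner"): 0.5,
}
reference = {("Car", "Automobile"), ("Owner", "Owner")}
print(learn_threshold(scores, reference, [0.2, 0.4, 0.55, 0.8]))  # 0.55
```

Note that `interpret` returns a set of pairs, so an entity may be mapped to several partners; enforcing a functional map(e1i) = e2j would need an extra selection step.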
Forms of Heterogeneity in Ontologies (1)
(1) Syntactic: mismatches that depend on the choice of the representation language (OWL, RDFS, DAML, N3, Datalog, Prolog, …).
(2) Terminological: all forms of mismatches that are related to the process of naming the entities (e.g. individuals, classes, properties, relations) that occur in an ontology.
Typical Examples:
different words are used to name the same entity (synonymy);
the same word is used to name different entities (polysemy);
words from different languages (English, French, etc.) are used to name entities;
syntactic variations of the same word (different acceptable spellings, abbreviations, use of optional prefixes or suffixes, etc.).
Mismatches at the terminological level are not as deep as those occurring at the conceptual level. However, most real cases have to do with the terminological level (e.g., with the way different people name the same entities), and therefore this level is at least as crucial as the other one.
Forms of Heterogeneity in Ontologies (2)
(3) Conceptual: we encounter mismatches which have to do
with the content of an ontology.
Metaphysical differences: which have to do with how the world
is “broken into pieces”.
Coverage: the ontologies cover different, possibly
overlapping, portions of the world.
Granularity: One ontology provides a more (or less)
detailed description of the same entities.
Perspective: an ontology may provide a viewpoint, which
is different from the viewpoint adopted in another
ontology.
Forms of Heterogeneity in Ontologies (3)
Metaphysical differences:
Overcoming Heterogeneity
One common approach to the problems of heterogeneity is the definition of relations (mappings) across the heterogeneous representations.
These relations can be used for transforming expressions of one ontology into a form compatible with that of the other.
This may happen at any level:
syntactic: through semantics-preserving transducers;
terminological: through functions mapping lexical information;
conceptual: through general transformation of the representations (sometimes requiring a complete prover for some languages).
Structure of Mappings
Alignment: a process that starts from two representations O and O’ and produces a set of mappings between pairs of (simple or complex) entities <e, e’> belonging to O and O’ respectively.
Intuitively, we will assume that in general a mapping can be described as a quadruple: <e, e’, n, R>
e and e’ are the entities between which a relation is asserted by the mapping.
n is a degree of trust (confidence) in that mapping.
R is the relation associated with the mapping, i.e., the relation holding between e and e’. R may be, for example:
a simple set-theoretic relation
a fuzzy relation
a probability distribution over a complete set of relations
a similarity measure
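The quadruple <e, e’, n, R> can be represented as a small record type. The sketch below is illustrative only: the field names, the use of a plain string for R, and the assumption that n lies in [0, 1] are not stated on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    source: str        # e, an entity of ontology O
    target: str        # e', an entity of ontology O'
    confidence: float  # n, the degree of trust in the mapping
    relation: str      # R, the relation asserted between e and e'

    def __post_init__(self):
        # Assumption: confidence is normalized to the interval [0, 1].
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")


# An alignment is then simply a collection of such mappings.
alignment = [Mapping("A:Car", "B:Automobile", 0.6, "similar to")]
print(alignment[0].relation)  # similar to
```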
Finding Mappings Through Similarity
There are many ways to assess the similarity between two
entities. The most common way amounts to defining a measure
of this similarity.
The characteristics which can be asked from these measures:
Similarity Methods
Local Methods
Take a local view to compute similarities.
Global Methods
Take a global view to compute similarities and merge the
computed local similarities.
Similarity – Local Methods
Terminological Methods
String Based Methods
Language Based Methods
Structural Methods
Internal Structure
External Structure
Extensional (based on instances) Methods
When the classes share the same instances
When they do not
Terminological Methods
Terminological methods compare strings.
Can be applied to:
name
label
comments concerning entities
URI
Take advantage of the structure of the string (as a sequence of letters).
The main idea in using such measures is the fact that usually similar entities have similar names and descriptions in different ontologies.
Terminological Methods - Normalization
There are a number of normalization procedures that help improve the results of subsequent comparisons:
Case normalization: converting each alphabetic character in the strings to its lowercase counterpart;
Diacritics suppression: replacing characters that carry diacritic signs with their most frequent replacement (e.g., replacing Montréal with Montreal);
Blank normalization: normalizing all blank characters (blank, tabulation, carriage return) into a single blank character;
Link stripping: normalizing some links between words, e.g., replacing apostrophes and underscores with dashes;
Stopword elimination: removing words that can be found in a stopword list (typically words like "to", "a", …).
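The normalization steps above can be chained into one function. This is a minimal sketch using only the Python standard library: the stopword list is a tiny illustrative sample, and diacritics are removed by Unicode decomposition, which approximates the "most frequent replacement" rule but does not cover characters without a decomposition (such as ø).

```python
import re
import unicodedata

STOPWORDS = {"to", "a", "the", "of"}  # illustrative sample, not a real list


def normalize(label: str, stopwords=STOPWORDS) -> str:
    # Case normalization: lowercase every character.
    text = label.lower()
    # Diacritics suppression: decompose, then drop combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Link stripping: apostrophes and underscores become dashes.
    text = re.sub(r"['’_]", "-", text)
    # Blank normalization: any run of whitespace becomes a single blank.
    text = re.sub(r"\s+", " ", text).strip()
    # Stopword elimination: drop words found in the stopword list.
    return " ".join(word for word in text.split(" ") if word not in stopwords)


print(normalize("Montréal"))                 # montreal
print(normalize("  The  Owner_of\ta Car "))  # owner-of car
```

The order of the steps matters: here link stripping runs before stopword elimination, so "of" inside "owner-of" is kept. A different order would be equally defensible.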
Terminological Methods - String Based
Substring Similarity
Hamming Distance
N-Gram Distance
Edit Distance
Jaro Similarity
Token Based Distances
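Two of the listed measures are short enough to sketch here. The slides do not give their formulas, so the normalizations below are assumptions: the Hamming distance counts differing positions plus the length difference, divided by the longer length, and the n-gram similarity divides the number of shared n-grams by the size of the smaller n-gram set.

```python
def hamming_distance(s: str, t: str) -> float:
    """Share of positions that differ; extra characters count as differences."""
    longest = max(len(s), len(t))
    if longest == 0:
        return 0.0
    mismatches = sum(a != b for a, b in zip(s, t))
    return (mismatches + abs(len(s) - len(t))) / longest


def ngrams(s: str, n: int = 3) -> set:
    """Set of all substrings of length n."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_similarity(s: str, t: str, n: int = 3) -> float:
    """Shared n-grams relative to the smaller n-gram set (one possible choice)."""
    grams_s, grams_t = ngrams(s, n), ngrams(t, n)
    if not grams_s or not grams_t:
        return 0.0  # at least one string is shorter than n
    return len(grams_s & grams_t) / min(len(grams_s), len(grams_t))


print(hamming_distance("car", "cars"))     # 0.25
print(ngram_similarity("night", "nacht"))  # 0.0
```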
Terminological Methods - String Based
In string edit distance, the operations usually considered are insertion of a character, replacement of a character by another and deletion of a character.
Levenshtein distance is an edit distance with all costs set to 1.
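The relation between a general edit distance and the unit-cost case can be made concrete with a dynamic-programming sketch in which each operation has its own cost. The parameter names are illustrative; with the default costs of 1 the function computes the unit-cost distance.

```python
def weighted_edit_distance(s: str, t: str,
                           insert_cost: float = 1.0,
                           delete_cost: float = 1.0,
                           replace_cost: float = 1.0) -> float:
    """Cheapest sequence of insertions, deletions and replacements turning s into t."""
    # previous[j] holds the cost of turning the processed prefix of s into t[:j].
    previous = [j * insert_cost for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        current = [i * delete_cost]
        for j, b in enumerate(t, start=1):
            current.append(min(
                previous[j] + delete_cost,       # delete a
                current[j - 1] + insert_cost,    # insert b
                previous[j - 1] + (0.0 if a == b else replace_cost),
            ))
        previous = current
    return previous[-1]


print(weighted_edit_distance("kitten", "sitting"))                    # 3.0
print(weighted_edit_distance("kitten", "sitting", replace_cost=2.0))  # 5.0
```

Raising the replacement cost to 2 makes a replacement no cheaper than a deletion followed by an insertion, which is why the second result grows from 3 to 5.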
Terminological Methods – Language Based
Rely on using NLP techniques to find associations between instances of concepts or classes.
Intrinsic methods: perform the terminological matching with the help of morphological and syntactic analysis to normalize terms. Example (stemming): going → go.
Extrinsic methods: make use of external resources such as dictionaries and lexicons (e.g., WordNet). Example: Resnik semantic similarity.
Structural Methods
The structure of the entities found in an ontology can be compared, instead of their names or identifiers.
Internal Structure: use criteria such as the range of their properties (attributes and relations), their cardinality, and the transitivity and/or symmetry of their properties to calculate the similarity between them.
External Structure: The similarity comparison between two entities from two ontologies can be based on the position of entities within their hierarchies.
Structural Methods – External (1)
If two entities from two ontologies are similar, their neighbors might also be somehow similar.
Criteria for deciding that the two entities are similar include:
Their direct super-entities are already similar.
Their sibling-entities are already similar.
Their direct sub-entities are already similar.
All (or most) of their descendant-entities (entities in the sub tree rooted at the entity in question) are already similar.
All (or most) of their leaf-entities are already similar.
All (or most) of entities in the paths from the root to the entities in question are already similar.
Structural Methods – External (2)
Some existing Approaches:
Structural topological dissimilarity on hierarchies
Upward Cotopic Distance
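The slide names the upward cotopic distance without defining it. The sketch below follows the usual definition: the upward cotopy of a class is the class together with all of its superclasses, and the distance is one minus the overlap ratio of two cotopies. The toy hierarchy (including the placement of Owner under Object) is a hypothetical example, and the measure as written assumes both classes live in one shared hierarchy.

```python
def upward_cotopy(concept: str, parents: dict) -> set:
    """The concept itself plus all of its direct and indirect superclasses."""
    seen = {concept}
    stack = [concept]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def upward_cotopic_distance(c1: str, c2: str, parents: dict) -> float:
    """1 - |UC(c1) & UC(c2)| / |UC(c1) | UC(c2)|."""
    up1, up2 = upward_cotopy(c1, parents), upward_cotopy(c2, parents)
    return 1.0 - len(up1 & up2) / len(up1 | up2)


# Hypothetical hierarchy: each class maps to its direct superclasses.
parents = {
    "Car": ["Vehicle"],
    "Boat": ["Vehicle"],
    "Vehicle": ["Object"],
    "Owner": ["Object"],
}
print(upward_cotopic_distance("Car", "Boat", parents))   # 0.5
print(upward_cotopic_distance("Car", "Owner", parents))  # 0.75
```

To compare classes from two different ontologies this way, their superclasses would first have to be related, for example through mappings found in an earlier iteration.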
Extensional (based on instances) Methods
Compare the extensions of classes, i.e., their sets of instances, rather than
their interpretation.
Conditions in which such techniques can be used:
When the classes share the same instances
When they do not
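The two conditions can be illustrated with two set measures. Neither is named on the slide, so both are swapped-in examples: the Jaccard coefficient for classes that share instances, and an average best-match score, built on a pairwise object similarity, for classes that do not. All names and sample data are hypothetical.

```python
def jaccard_similarity(instances_a: set, instances_b: set) -> float:
    """Shared instances relative to all instances of the two classes."""
    union = instances_a | instances_b
    if not union:
        return 0.0
    return len(instances_a & instances_b) / len(union)


def best_match_similarity(instances_a: set, instances_b: set, object_sim) -> float:
    """Average, over all instances, of the similarity to the best partner."""
    if not instances_a or not instances_b:
        return 0.0
    best_a = sum(max(object_sim(a, b) for b in instances_b) for a in instances_a)
    best_b = sum(max(object_sim(a, b) for a in instances_a) for b in instances_b)
    return (best_a + best_b) / (len(instances_a) + len(instances_b))


# Shared instances: plain set overlap is enough.
print(jaccard_similarity({"a", "b", "c"}, {"b", "c", "d"}))  # 0.5

# No shared identifiers: compare instances with an object similarity first.
def same_name(a: str, b: str) -> float:
    return 1.0 if a.lower() == b.lower() else 0.0

print(best_match_similarity({"BMW", "Porsche"}, {"bmw", "Audi"}, same_name))  # 0.5
```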
Similarity – Global Methods
After the local similarities are calculated, it remains to compute the
alignment. This involves some more global treatment,
including:
aggregating the results of these base methods in order to compute
the similarity between compound entities
organizing the combination of various similarity / alignment
algorithms
involving the user in the loop
finally extracting the alignments (mappings) from the resulting
(dis)similarity
Compound Similarity
Some existing approaches:
User Feedback
The support of effective interaction of the user with the
system components is one concern of ontology
alignment.
User input can take place in many areas of alignment:
Assessing initial similarity between some terms;
Invoking and composing alignment methods;
Accepting or refusing similarity or alignment provided by the
various methods.
Alignment Extraction
The ultimate alignment goal is a satisfactory set of
correspondences between ontologies.
Manual Extraction: display the entity pairs with their
similarity scores and/or ranks, leaving the choice of the
appropriate pairs up to the user of the alignment tool.
Automatic Extraction: Using Thresholds
Hard threshold: retains all the correspondences above a threshold n.
Delta method: uses as a threshold the highest similarity value
minus a particular constant value d.
Proportional method: uses a percentage of the highest
similarity value as a threshold.
Percentage: retains the top n% of correspondences, ranked by similarity.
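The four automatic extraction methods differ only in how the cut-off is chosen, as the sketch below shows. Function names and sample scores are hypothetical. The delta and proportional methods use >= so that the best-scoring pair is always retained, and the percentage method rounds down; both are design choices rather than requirements from the slide.

```python
def hard_threshold(scores: dict, n: float) -> dict:
    """Keep correspondences whose similarity is above n."""
    return {pair: s for pair, s in scores.items() if s > n}


def delta_method(scores: dict, d: float) -> dict:
    """Threshold is the highest similarity minus the constant d."""
    if not scores:
        return {}
    cutoff = max(scores.values()) - d
    return {pair: s for pair, s in scores.items() if s >= cutoff}


def proportional_method(scores: dict, fraction: float) -> dict:
    """Threshold is a fraction (e.g. 0.5 for 50%) of the highest similarity."""
    if not scores:
        return {}
    cutoff = max(scores.values()) * fraction
    return {pair: s for pair, s in scores.items() if s >= cutoff}


def percentage_method(scores: dict, percent: float) -> dict:
    """Keep the top percent% of correspondences, ranked by similarity."""
    keep = int(len(scores) * percent / 100)  # rounds down
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:keep])


scores = {
    ("Car", "Automobile"): 0.6,
    ("Vehicle", "Vehicle"): 1.0,
    ("Boat", "Bus"): 0.2,
    ("Owner", "Owner"): 0.8,
}
print(sorted(delta_method(scores, 0.25)))
# [('Owner', 'Owner'), ('Vehicle', 'Vehicle')]
```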
Learning Methods
As in many other fields, learning methods developed in machine learning prove useful in ontology alignment.
Two particular areas:
Supervised learning, in which the ontology alignment algorithm learns how to work from many good alignments (positive examples) and bad alignments (negative examples).
It is difficult to know which techniques work well for which ontology features.
An ontology alignment algorithm trained on several ontology pairs might not work well for a new ontology pair.
Learning from data, in which a population of instances is communicated to the algorithm together with their relations and the classes they belong to.
Existing Works
[Table: besides the columns below, the original has T-mark columns headed Features (Lexical, Structure, String, Semantic, Instance) and Aggregation. The column positions of the marks are not recoverable here, so each row lists its marks only.]
Method      Year  Organization   Project Leader  Automatic  Marks
OntoMorph   1997  S. California  Chalupsky       Semi       T
U.S. Army   1999  DARPA                          Semi       T
Smart       1999  Stanford       Fridman, Noy    Semi       T T
Chimaera    1999  Stanford       McGuinness      Semi       T T T
Prompt      2001  Stanford       Noy, Musen      Semi       T T
InfoSleuth  2001  Amsterdam      Ding            Semi       T T
A. Prompt   2002  Stanford       Noy, Musen      Semi       T T T
Glue        2002  Illinois       Doan            Automatic  T T T T
IF Map      2003  Southampton    Kalfoglou       Automatic  T T
NOM         2003  Karlsruhe      Ehrig           Automatic  T T T T T
QOM         2004  Karlsruhe      Ehrig           Automatic  T T T T
CROSI       2005  Southampton    Kalfoglou       Automatic  T T T