Logics for Data and Knowledge Representation
Applications of ClassL: Lightweight Ontologies
Outline
o Ontologies
   o Descriptive and classification ontologies
   o Real world and classification semantics
o Lightweight Ontologies
   o Converting classifications into Lightweight Ontologies
o Applications on Lightweight Ontologies
   o Document Classification
   o Query-answering
   o Semantic Matching
Ontologies
Ontologies are explicit specifications of conceptualizations [Gruber, 1993].
They are often thought of as directed graphs whose nodes represent concepts and whose edges represent relations between concepts.
[Figure: an example ontology graph with the concepts Animal, Bird, Mammal, Head, Body, Predator, Herbivore, Tiger, Goat, Cat and Chicken, linked by ‘is-a’, ‘part-of’ and ‘eats’ edges]
ONTOLOGIES :: LIGHTWEIGHT ONTOLOGIES :: CLASSIFICATIONS :: APPLICATIONS ON LIGHTWEIGHT ONTOLOGIES
Concepts and Relations between them
CONCEPT: a concept represents a set of objects or individuals.
EXTENSION: this set is called the concept extension or the concept interpretation.
Concepts are often lexically defined, i.e. they have natural language names which are used to describe the concept extensions (e.g. Animal, Lion, Rome), often with an additional description (gloss).
RELATION: a link from a source concept to a target concept.
The backbone structure of an ontology graph is a taxonomy in which the relations are ‘is-a’, ‘part-of’ and ‘instance-of’. The remaining structure of the graph supplies auxiliary information about the modeled domain and may include relations like ‘located-in’, ‘eats’, ‘ant’, etc. In Library Science, these two kinds of relations are called hierarchical (BT/NT, broader/narrower term) and associative (RT, related term) relations, respectively.
Ontology as a graph: a mathematical definition
An ontology is an ordered pair O = <V, E>, where:
V is the set of vertices describing the concepts;
E is the set of edges describing relations.
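As an illustration only, the pair O = <V, E> can be stored as a set of concept names plus a set of labelled edges. The Python sketch below uses hypothetical names: the class `Ontology` and its methods `add_edge` and `related`. The ‘is-a’ edges come from the TBox axioms listed later in these slides (Bird ⊑ Animal, Mammal ⊑ Animal, Chicken ⊑ Bird, Cat ⊑ Predator). The ‘eats’ edge from Tiger to Goat is an assumed example and is not read off the figure.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Ontology:
    """O = <V, E>: V holds concept names, E holds labelled edges (source, relation, target)."""
    V: set[str] = field(default_factory=set)
    E: set[tuple[str, str, str]] = field(default_factory=set)

    def add_edge(self, source: str, relation: str, target: str) -> None:
        # Adding an edge also adds both endpoints to V, so every edge stays between vertices of V.
        self.V.update((source, target))
        self.E.add((source, relation, target))

    def related(self, source: str, relation: str) -> set[str]:
        """Targets reachable from `source` through one edge labelled `relation`."""
        return {t for (s, r, t) in self.E if s == source and r == relation}


o = Ontology()
# 'is-a' edges taken from the TBox axioms given later in the slides.
for child, parent in [("Bird", "Animal"), ("Mammal", "Animal"),
                      ("Chicken", "Bird"), ("Cat", "Predator")]:
    o.add_edge(child, "is-a", parent)
# Assumed, illustrative auxiliary edge (not taken from the figure).
o.add_edge("Tiger", "eats", "Goat")

print(sorted(o.related("Chicken", "is-a")))  # ['Bird']
print(len(o.V))                              # 8
```

Keeping the relation name on each edge is what later lets the backbone (‘is-a’, ‘part-of’) be separated from auxiliary relations such as ‘eats’.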
Tree-like Ontologies
Take the ontology in the previous slide and remove the auxiliary relations (such as ‘eats’)…
… we get a tree-like ontology consisting of its backbone structure with ‘is-a’ and ‘part-of’ relations (*), that is, an informal lightweight ontology.
(*) Notice that in some cases we can obtain more complex structures, such as DAGs or even graphs with cycles.
[Figure: the same ontology restricted to its backbone: Animal, Bird, Mammal, Head, Body, Predator, Herbivore, Tiger, Goat, Cat and Chicken, linked only by ‘is-a’ and ‘part-of’ edges]
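The step above can be sketched in Python under two assumptions. Edges are stored as (child, relation, parent) triples, and only ‘is-a’ and ‘part-of’ belong to the backbone. The helper names `backbone` and `is_rooted_tree` are hypothetical. The Predator is-a Mammal edge and the Tiger eats Goat edge are assumptions made for the example. The tree check also illustrates the caveat (*): dropping auxiliary relations does not by itself guarantee a tree.

```python
# Relations kept in the backbone; anything else (e.g. 'eats') is treated as auxiliary.
BACKBONE = {"is-a", "part-of"}


def backbone(edges):
    """Drop auxiliary relations, keeping only the (child, relation, parent) backbone edges."""
    return {(c, r, p) for (c, r, p) in edges if r in BACKBONE}


def is_rooted_tree(edges):
    """True if the child -> parent edges form one rooted tree (no DAG, no cycle)."""
    parents = {}
    for child, _, parent in edges:
        parents.setdefault(child, set()).add(parent)
    nodes = {c for c, _, _ in edges} | {p for _, _, p in edges}
    if any(len(ps) > 1 for ps in parents.values()):
        return False  # some node has two parents: a DAG, not a tree
    roots = nodes - set(parents)
    if len(roots) != 1:
        return False  # no root at all (cycle) or several disconnected roots
    root = next(iter(roots))
    for node in nodes:
        seen = set()
        while node != root:  # climb towards the root; revisiting a node means a cycle
            if node in seen:
                return False
            seen.add(node)
            node = next(iter(parents[node]))
    return True


edges = {
    ("Bird", "is-a", "Animal"), ("Mammal", "is-a", "Animal"),
    ("Chicken", "is-a", "Bird"), ("Cat", "is-a", "Predator"),
    ("Predator", "is-a", "Mammal"),  # assumed edge, for illustration
    ("Tiger", "eats", "Goat"),       # assumed auxiliary edge
}
print(is_rooted_tree(edges))            # False: 'Goat' becomes a second root
print(is_rooted_tree(backbone(edges)))  # True
```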
Classification vs. Descriptive Ontologies
o Classification ontologies
They are used to classify things, such as books, documents, web pages, etc.; the purpose is to provide domain-specific terminology and organize individuals accordingly. Such ontologies usually take the form of classifications with (BT/NT/RT) or without explicit relations.
o Descriptive ontologies
They are used to describe a piece of the world (e.g., the Gene Ontology, an Industry ontology); the purpose is to offer an unambiguous description of the world. Relations are typically explicit (e.g. is-a) and can be of any kind.
Classification vs. Real World semantics
o Classification ontologies are in classification semantics
In classification ontologies, the extension of each concept (label of a node) is the set of documents about the entities or individual objects described by the label of the concept. For example, the extension of the concept animal is “the set of documents about animals” of any kind.
o Descriptive ontologies are in real world semantics
In descriptive ontologies, concepts represent real world entities. For example, the extension of the concept animal is the set of real world animals, which can be connected via relations of the proper kind.
Classification ontologies
Descriptive ontologies
Why ‘Lightweight’ Ontologies?
The majority of existing ontologies are ‘simple’ taxonomies or classifications, i.e., hierarchically organized categories used to classify resources.
Ontologies with arbitrary relations do exist, but in general there are no intuitive and efficient reasoning techniques that support them.
… so we need ‘lightweight’ ontologies.
Lightweight Ontologies
A (formal) lightweight ontology is a triple O = <N, E, C>, where:
o N is a finite set of nodes;
o E is a set of edges on N, such that <N, E> is a rooted tree;
o C is a finite set of concepts expressed in a formal language F, such that for any node $n_i \in N$ there is one and only one concept $c_i \in C$, and, if $n_i$ is the parent node for $n_j$, then $c_j \sqsubseteq c_i$.
NOTE: lightweight ontologies are in classification semantics.
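The conditions of the definition can be checked mechanically. For simplicity, the sketch below assumes that every concept is a conjunction of atomic propositions, stored as a frozenset of atoms. Under that assumption, c_j ⊑ c_i holds exactly when every atom of c_i also occurs in c_j. General ClassL formulas would need a real subsumption check. The function name and the example concepts are hypothetical, and the tree is given as a child -> parent map that is assumed to be rooted.

```python
def is_lightweight_ontology(parent, concept):
    """Check the concept constraints of O = <N, E, C> on a tree given as child -> parent.

    Assumption: each concept is a conjunction of atoms stored as a frozenset, so that
    c_j ⊑ c_i holds exactly when atoms(c_i) ⊆ atoms(c_j).
    """
    nodes = set(parent) | set(parent.values())
    if set(concept) != nodes:
        return False  # every node must carry exactly one concept
    # For each edge n_j -> n_i (child -> parent), require c_j ⊑ c_i.
    return all(concept[p] <= concept[c] for c, p in parent.items())


parent = {"Bird": "Animal", "Mammal": "Animal", "Chicken": "Bird"}
concept = {
    "Animal": frozenset({"animal"}),
    "Bird": frozenset({"animal", "bird"}),
    "Mammal": frozenset({"animal", "mammal"}),
    "Chicken": frozenset({"animal", "bird", "chicken"}),
}
print(is_lightweight_ontology(parent, concept))  # True
bad = dict(concept, Chicken=frozenset({"chicken"}))
print(is_lightweight_ontology(parent, bad))      # False: chicken is not subsumed by animal ⊓ bird
```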
Converting tree-like structures into LOs
For a descriptive ontology, the backbone taxonomy of ‘is-a’ and ‘instance-of’ relations intuitively coincides with the subsumption (‘⊑’) relation in LOs.
NOTE: ‘part-of’ relations correspond to subsumption only if they are transitive. For instance, the following chain cannot be translated:
handle part-of door part-of school part-of school system
(a handle is part of a door and a door is part of a school, but a handle is not meaningfully part of the school system).
For a classification ontology, the extension of each node is the set of documents (books, websites, etc.) that should be classified under the node. Therefore, the links have to be interpreted as ‘subset’ relations and can be transformed directly into subsumption in the target LOs.
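The translation rules above can be summarised as a small Python function. This is a sketch under assumptions. Links are (child, relation, parent) triples. `part_of_is_transitive` is a hypothetical flag through which the modeller declares that ‘part-of’ is transitive in the domain at hand, as for geographic containment (e.g. Rome part-of Italy part-of Europe). Links that cannot be translated are returned separately instead of being silently dropped.

```python
def to_subsumptions(edges, kind, part_of_is_transitive=False):
    """Translate (child, relation, parent) links into axioms 'child ⊑ parent'.

    kind == "classification": every link is read as 'subset', so all links become ⊑.
    kind == "descriptive": 'is-a' and 'instance-of' become ⊑; 'part-of' only when the
    caller declares it transitive (hypothetical flag); other links are left untranslated.
    """
    axioms, untranslated = [], []
    for child, rel, parent in edges:
        if (kind == "classification" or rel in ("is-a", "instance-of")
                or (rel == "part-of" and part_of_is_transitive)):
            axioms.append(f"{child} ⊑ {parent}")
        else:
            untranslated.append((child, rel, parent))
    return axioms, untranslated


chain = [("handle", "part-of", "door"), ("door", "part-of", "school"),
         ("school", "part-of", "school system")]
geo = [("Italy", "part-of", "Europe"), ("Rome", "part-of", "Italy")]

print(to_subsumptions(chain, "descriptive")[0])     # []
print(to_subsumptions(chain, "classification")[0])  # all three links become ⊑
print(to_subsumptions(geo, "descriptive", part_of_is_transitive=True)[0])
# ['Italy ⊑ Europe', 'Rome ⊑ Italy']
```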
Descriptive and classification ontologies
[Figure: (a) an ‘is-a’ hierarchy over Animal, Vertebrate, Invertebrate, Mammal and Bird; (b) a ‘part-of’ hierarchy over World, Europe, Asia, France, Italy and Rome]
(a) and (b) are two descriptive ontologies. The corresponding classification ontologies are obtained by substituting all the relations with ‘subset’.
(a) and (b) can be converted into lightweight ontologies by substituting the relations with subsumptions. However, the semantics changes from real world to classification semantics.
Populated (Lightweight) Ontologies
In Information Retrieval, classification is seen as the process of arranging a set of objects (e.g., documents) into a set of categories or classes.
A classification ontology is said to be populated if a set of objects has been classified under ‘proper’ nodes.
Thus a populated (lightweight) ontology includes (explicit or implicit) ‘instance-of’ relations.
Example of a Populated Ontology
[Figure: the Animal ontology as a lightweight ontology (⊑ links among Animal, Bird, Mammal, Head, Body, Predator, Herbivore, Tiger, Goat, Cat and Chicken), with documents such as ‘Chicken Soup’, ‘How to Raise Chicken’, ‘Tom and Jerry’ and ‘www.protectTiger.org’ attached to nodes through ‘instance-of’ links]
Lightweight Ontologies in ClassL: TBox
In ClassL, a lightweight ontology is encoded as a subsumption terminology. Recall:
‘… C is a finite set of concepts expressed in a formal language F, such that for any node $n_i \in N$, there is one and only one concept $c_i \in C$, and, if $n_i$ is the parent node for $n_j$, then $c_j \sqsubseteq c_i$.’
1. Bird ⊑ Animal
2. Mammal ⊑ Animal
3. Chicken ⊑ Bird
4. Cat ⊑ Predator
5. …
NOTE: a tree-like ontology can be transformed into a lightweight ontology, but not vice versa. This is because we lose information during the translation (e.g., whether a link was ‘is-a’ or ‘part-of’, since both become ⊑).
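For a TBox made only of atomic axioms A ⊑ B, like the one above, checking T ⊨ A ⊑ B reduces to reachability. B is entailed exactly when it can be reached from A by following the axioms reflexively and transitively. The Python sketch below assumes the axiom Predator ⊑ Mammal, since the list above is truncated. The function name `subsumers` is hypothetical.

```python
from collections import defaultdict


def subsumers(tbox, concept):
    """All B with T ⊨ concept ⊑ B, for a TBox of atomic axioms (A, B) meaning A ⊑ B.

    For such TBoxes this is exactly the reflexive-transitive closure of the axioms.
    """
    up = defaultdict(set)
    for a, b in tbox:
        up[a].add(b)
    seen, stack = {concept}, [concept]
    while stack:
        for b in up[stack.pop()]:
            if b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


TBOX = [("Bird", "Animal"), ("Mammal", "Animal"), ("Chicken", "Bird"),
        ("Cat", "Predator"),
        ("Predator", "Mammal")]  # assumed axiom, not listed in the slide
print(sorted(subsumers(TBOX, "Cat")))          # ['Animal', 'Cat', 'Mammal', 'Predator']
print("Animal" in subsumers(TBOX, "Chicken"))  # True
```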
Populated LOs in ClassL: TBox + ABox
‘instance-of’ links are encoded into ‘concept assertions’:
1. Chicken(ChickenSoup)
2. Cat(TomAndJerry)
3. …
Instances are the elements of the domain, namely the documents classified in the categories.
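With an atomic TBox and concept assertions like the ones above, the instances of a concept C are the individuals asserted in some C′ with T ⊨ C′ ⊑ C. The Python sketch below repeats the reachability check for subsumption so that it is self-contained. The function names and the axiom Predator ⊑ Mammal are assumptions for illustration only.

```python
from collections import defaultdict


def subsumers(tbox, concept):
    """All B with T ⊨ concept ⊑ B, for atomic axioms (A, B) meaning A ⊑ B."""
    up = defaultdict(set)
    for a, b in tbox:
        up[a].add(b)
    seen, stack = {concept}, [concept]
    while stack:
        for b in up[stack.pop()]:
            if b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def instances(tbox, abox, concept):
    """Individuals a asserted as C'(a) for some C' with T ⊨ C' ⊑ concept."""
    return {a for c, a in abox if concept in subsumers(tbox, c)}


TBOX = [("Bird", "Animal"), ("Mammal", "Animal"), ("Chicken", "Bird"),
        ("Cat", "Predator"), ("Predator", "Mammal")]  # last axiom assumed
ABOX = [("Chicken", "ChickenSoup"), ("Cat", "TomAndJerry")]  # Chicken(ChickenSoup), Cat(TomAndJerry)
print(sorted(instances(TBOX, ABOX, "Animal")))  # ['ChickenSoup', 'TomAndJerry']
print(instances(TBOX, ABOX, "Bird"))            # {'ChickenSoup'}
```

In classification semantics this reads naturally: the document ChickenSoup, classified under Chicken, is also a document about birds and about animals.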
Classifications:
o are easy to use for humans;
o are pervasive (Google, Yahoo, Amazon, our PC directories, email folders, address book, etc.);
o are largely used in commercial applications (Google, Yahoo, eBay, Amazon, BBC, CNN, libraries, etc.);
o have been studied for a very long time (e.g., Dewey Decimal Classification system - DDC, Library of Congress Classification system - LCC, etc.).
Classification Example: Yahoo! Directory
Classification Example: Email Folders
Classification Example: E-Commerce Category
Label Semantics
Natural language words are often ambiguous.
E.g., Java (an island, a beverage, a programming language).
When a word is used with other words in a label, improper senses can be pruned.
E.g., in “Java Language” only the 3rd sense of Java is preserved.
We translate node labels into unambiguous propositions in ClassL, in classification semantics.
This can be done by using NLP (Natural Language Processing) techniques.
[Figure: a classification spanning levels 0 to 4, with the nodes Subjects, Computers and Internet, Programming, Java Language and Java Beans]
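The pruning idea can be illustrated with a toy heuristic: keep a sense of a word only if its domain is shared by some sense of another word in the same label. This is not the NLP pipeline referred to above. The sense inventory and its domain tags are hypothetical and hand-made for the Java example.

```python
# Hypothetical, hand-made sense inventory: word -> {sense id: domain tag}.
SENSES = {
    "java": {"java#island": "geography", "java#beverage": "food",
             "java#programming_language": "computing"},
    "language": {"language#natural_language": "linguistics",
                 "language#programming_language": "computing"},
}


def prune_senses(label):
    """For each word of a label, keep the senses whose domain another word also has."""
    words = [w.lower() for w in label.split()]
    kept = {}
    for i, word in enumerate(words):
        senses = SENSES.get(word, {})
        # Domains of every sense of every *other* word in the label.
        context = {dom for j, other in enumerate(words) if j != i
                   for dom in SENSES.get(other, {}).values()}
        compatible = {s for s, dom in senses.items() if dom in context}
        # With no compatible sense there is no evidence to prune, so all senses are kept.
        kept[word] = compatible or set(senses)
    return kept


print(prune_senses("Java Language")["java"])  # {'java#programming_language'}
print(len(prune_senses("Java")["java"]))      # 3
```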
Link semantics
Get-specific principle: child nodes in a classification are always considered in the context of their parent nodes. As a consequence, they specialize the meaning of the parent nodes.
Subsumption relation (a): the extension of the child label B is a proper subset of the extension of the parent label A. The meaning of node 2 is B.
General intersection relation (b): the extensions of A and B overlap, and the extension of the child node is the part of B that lies within A, hence a subset of the parent node. The meaning of node 2 is C = A ⊓ B.
We generalize to (b), which covers (a) as the special case where B ⊑ A and therefore A ⊓ B = B. The meaning of the node is what we call the concept at node.
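[Figure: a node 1 labelled A with a child node 2 labelled B, whose meaning is asked for; (a) the subsumption case; (b) the general intersection case, where C is the overlap of A and B]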
Concept at node
[Figure: a classification with five numbered nodes (1 to 5) labelled Europe, Pictures, Wine and Cheese, Italy and Austria]
In ClassL: $C_4 = C_{europe} \sqcap C_{pictures} \sqcap C_{italy}$
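Under the get-specific principle, the concept at a node is the conjunction (⊓) of the label concepts on the path from the root to that node. The Python sketch below assumes that the tree in the figure is Europe (1) with children Pictures (2) and Wine and Cheese (3), and that Pictures has children Italy (4) and Austria (5). This shape is inferred from the formula for C4 and may not match the original figure. Translating ‘Wine and Cheese’ as wine ⊔ cheese is also an assumption.

```python
# Assumed reading of the figure: node id -> (label concept, parent id).
# 'Wine and Cheese' is assumed to translate to the disjunction wine ⊔ cheese.
TREE = {
    1: ("europe", None),
    2: ("pictures", 1),
    3: ("(wine ⊔ cheese)", 1),
    4: ("italy", 2),
    5: ("austria", 2),
}


def concept_at_node(node):
    """Conjunction of the label concepts on the path from the root down to `node`."""
    labels = []
    while node is not None:
        label, node = TREE[node]  # collect this label, then move to the parent
        labels.append(label)
    return " ⊓ ".join(reversed(labels))


print(concept_at_node(4))  # europe ⊓ pictures ⊓ italy
print(concept_at_node(3))  # europe ⊓ (wine ⊔ cheese)
```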
Document Classification
Document concept: each document d in a classification is assigned a proposition $C_d$ in ClassL, built from d in two steps:
1. keywords are retrieved from d by using standard text mining techniques;
2. keywords are converted into propositions by using the methodology discussed above to translate node labels.
Automatic classification: for any given document d and its concept $C_d$, we classify d in each node $n_i$ such that:
1. $\models C_d \sqsubseteq C_i$, and
2. there is no node $n_j$ ($j \neq i$) for which $\models C_j \sqsubseteq C_i$ and $\models C_d \sqsubseteq C_j$.
In other words, we always classify d in the node with the most specific concept.
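The classification rule can be sketched in Python with a brute-force truth-table test for ⊨ C ⊑ D. This test is adequate for small propositional examples only. The sketch makes three assumptions. Propositions are nested tuples. The document concept is given directly, so keyword extraction is skipped. The concepts at node reuse the Europe/Pictures example, with ‘Wine and Cheese’ read as wine ⊔ cheese. All function names are hypothetical.

```python
from itertools import product

# A ClassL proposition: an atom (str), ("and", p, ...), ("or", p, ...) or ("not", p).


def atoms(p):
    return {p} if isinstance(p, str) else set().union(*(atoms(q) for q in p[1:]))


def holds(p, val):
    """Truth value of proposition p under the assignment val (atom -> bool)."""
    if isinstance(p, str):
        return val[p]
    op, *args = p
    if op == "and":
        return all(holds(q, val) for q in args)
    if op == "or":
        return any(holds(q, val) for q in args)
    return not holds(args[0], val)  # "not"


def subsumed(c, d):
    """⊨ c ⊑ d: no truth assignment of the atoms makes c true and d false."""
    names = sorted(atoms(c) | atoms(d))
    for bits in product([False, True], repeat=len(names)):
        val = dict(zip(names, bits))
        if holds(c, val) and not holds(d, val):
            return False
    return True


def classify(c_doc, concepts):
    """Nodes n_i with ⊨ C_d ⊑ C_i and no other such n_j with ⊨ C_j ⊑ C_i."""
    candidates = [i for i, c_i in concepts.items() if subsumed(c_doc, c_i)]
    return [i for i in candidates
            if not any(j != i and subsumed(concepts[j], concepts[i]) for j in candidates)]


# Concepts at node for the (assumed) Europe / Pictures tree.
CONCEPTS = {
    1: "europe",
    2: ("and", "europe", "pictures"),
    3: ("and", "europe", ("or", "wine", "cheese")),
    4: ("and", "europe", "pictures", "italy"),
    5: ("and", "europe", "pictures", "austria"),
}
print(classify(("and", "europe", "pictures", "italy", "rome"), CONCEPTS))  # [4]
print(classify(("and", "europe", "cheese"), CONCEPTS))                     # [3]
```

The truth-table check is exponential in the number of atoms. It only illustrates the definition and does not stand in for a real SAT-based reasoner.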
Query-answering
Query-answering on a hierarchy of documents, given a query q consisting of a set of keywords, is defined in two steps:
1. The ClassL proposition $C_q$ is built from q by converting q’s keywords as described above.
2. The set of answers (retrieval set) to q is defined through a set of subsumption checking problems in ClassL, where T is the TBox:
$A_q = \{ d \in \mathit{document} \mid T \models C_d \sqsubseteq C_q \}$
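Query answering adds the background axioms T to the same kind of check. In the Python sketch below, a TBox is a list of (lhs, rhs) pairs meaning lhs ⊑ rhs. `to_proposition` is a hypothetical simplification that turns a keyword set into the conjunction of its keywords. The axioms italy ⊑ europe and austria ⊑ europe, and the documents, are made up for illustration.

```python
from itertools import product


def atoms(p):
    return {p} if isinstance(p, str) else set().union(*(atoms(q) for q in p[1:]))


def holds(p, val):
    if isinstance(p, str):
        return val[p]
    op, *args = p
    if op == "and":
        return all(holds(q, val) for q in args)
    if op == "or":
        return any(holds(q, val) for q in args)
    return not holds(args[0], val)  # "not"


def entails(tbox, c, d):
    """T ⊨ c ⊑ d: every assignment satisfying all axioms of T and c also satisfies d."""
    names = sorted(set().union(atoms(c), atoms(d),
                               *(atoms(lhs) | atoms(rhs) for lhs, rhs in tbox)))
    for bits in product([False, True], repeat=len(names)):
        val = dict(zip(names, bits))
        satisfies_tbox = all(holds(rhs, val) for lhs, rhs in tbox if holds(lhs, val))
        if satisfies_tbox and holds(c, val) and not holds(d, val):
            return False
    return True


def to_proposition(keywords):
    """Hypothetical simplification: a keyword set becomes the conjunction of its keywords."""
    return ("and", *sorted(keywords))


def answer(query_keywords, docs, tbox):
    """A_q = {d | T ⊨ C_d ⊑ C_q}."""
    c_q = to_proposition(query_keywords)
    return {d for d, c_d in docs.items() if entails(tbox, c_d, c_q)}


TBOX = [("italy", "europe"), ("austria", "europe")]  # assumed background axioms
DOCS = {
    "d1": to_proposition({"italy", "pictures"}),
    "d2": to_proposition({"austria", "pictures"}),
    "d3": to_proposition({"italy", "wine"}),
}
print(sorted(answer({"europe", "pictures"}, DOCS, TBOX)))  # ['d1', 'd2']
```

Without the TBox, none of the documents would be returned for this query, because no document mentions europe explicitly.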
Semantic Matching: Why?
Most popular forms of knowledge can be represented as graphs.
The heterogeneity between knowledge graphs requires making explicit the relations between their elements, such as semantic equivalence.
Some popular situations that can be modeled as a matching problem are:
o Concept matching in semantic networks.
o Schema matching in distributed databases.
o Ontology matching (ontology “alignment”) in the Semantic Web.
The Matching Problem
Matching problem: given two finite graphs, find all nodes in the two graphs that syntactically or semantically correspond to each other.
Given two graph-like structures (e.g., classifications, XML and database schemas, ontologies), a matching operator produces a mapping between the nodes of the graphs.
Solution: a possible solution [Giunchiglia & Shvaiko, 2003] consists in converting the two input graphs into lightweight ontologies and then matching them semantically.
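The sketch below is a toy Python illustration of semantic matching between two lightweight ontologies. For each pair of concepts at node, it uses background axioms to test whether the two are equivalent (=), less general (⊑), more general (⊒) or disjoint (⊥). If none of these can be proved, it answers ‘idk’. This is a brute-force sketch of the idea, not the algorithm of [Giunchiglia & Shvaiko, 2003]. The two small ontologies and the axioms (images ≡ pictures, italy ⊑ europe) are hypothetical.

```python
from itertools import product


def atoms(p):
    return {p} if isinstance(p, str) else set().union(*(atoms(q) for q in p[1:]))


def holds(p, val):
    if isinstance(p, str):
        return val[p]
    op, *args = p
    if op == "and":
        return all(holds(q, val) for q in args)
    if op == "or":
        return any(holds(q, val) for q in args)
    return not holds(args[0], val)  # "not"


def satisfiable(tbox, *props):
    """Is some assignment a model of every axiom lhs ⊑ rhs in the TBox and of every prop?"""
    names = sorted(set().union(*(atoms(p) for p in props),
                               *(atoms(lhs) | atoms(rhs) for lhs, rhs in tbox)))
    for bits in product([False, True], repeat=len(names)):
        val = dict(zip(names, bits))
        if (all(holds(rhs, val) for lhs, rhs in tbox if holds(lhs, val))
                and all(holds(p, val) for p in props)):
            return True
    return False


def relation(tbox, a, b):
    """Semantic relation between two concepts at node, given background axioms."""
    a_le_b = not satisfiable(tbox, a, ("not", b))  # T ⊨ a ⊑ b
    b_le_a = not satisfiable(tbox, b, ("not", a))  # T ⊨ b ⊑ a
    if a_le_b and b_le_a:
        return "="
    if a_le_b:
        return "⊑"
    if b_le_a:
        return "⊒"
    if not satisfiable(tbox, a, b):
        return "⊥"  # disjoint
    return "idk"   # no relation could be proved


def match(o1, o2, tbox):
    """Relation for every pair of nodes; o1 and o2 map node ids to concepts at node."""
    return {(n1, n2): relation(tbox, c1, c2)
            for n1, c1 in o1.items() for n2, c2 in o2.items()}


O1 = {"images": "images", "images/italy": ("and", "images", "italy")}
O2 = {"pictures": "pictures", "pictures/europe": ("and", "pictures", "europe")}
TBOX = [("images", "pictures"), ("pictures", "images"), ("italy", "europe")]  # assumed
for pair, rel in match(O1, O2, TBOX).items():
    print(pair, rel)
```

Under these assumptions, the pairs are related as follows:

o (images, pictures): =
o (images, pictures/europe): ⊒
o (images/italy, pictures): ⊑
o (images/italy, pictures/europe): ⊑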
A Matching Problem
[Figure: two graphs whose node correspondences (marked “?”) are to be determined]