Ontologies and terminological concept modelling
Bodil Nistrup Madsen &
Hanne Erdman Thomsen
DANTERMcentret & Copenhagen Business School
EAFT and NORDTERM Workshop 10th February 2006, Vaasa
Handelshøjskolen i København
Part 1: The terminological method: principles and tools
Part 2: Terminological ontologies vs. other kinds of ontologies
Part 3: Terminological concept modelling vs. conceptual data modelling
Part 1: The terminological method: principles and tools
Principles:
• feature specifications
• dimensions
• dimension specifications
• subdividing dimensions
• inheritance
Tools:
• i-Term & i-Model
• CAOS 2
Example ontology from Working Group 07: Prevention, Health Promotion and Public Health
National Board of Health, Denmark
Background:
• IT strategy for the health sector, Government of Denmark, 2003: The Danish Council for Health Terminology
• Working groups: Administrative concepts, Clinical process, Medication, Adverse events, Quality development, Information security, Prevention, Health Promotion and Public Health, Clinical interventions and results
Objective: To develop a common concept database for the Danish health sector as a basis for the development of electronic health record systems.
DANTERMcentret: terminology courses and consultancy
Working Group 07: Prevention, Health Promotion and Public Health
National Board of Health, Denmark
http://begrebsbasen.sst.dk/forebyggelse
and a special report, which may be downloaded from the web site
Terminological methods presented by examples from i-Term & i-Model
Terminology and Knowledge Management System DANTERMcentret
i-Model allows the user to interactively produce a graphical representation of a concept system (‘traditional’ presentation).
It is possible to enter all kinds of concept relations, using special symbols for generic, part-whole, temporal and other relations, which may be named specifically by the user.
The user may also enter feature specifications and subdivision criteria (subdividing dimensions).
feature specification
subdivision criteria
i-Model: choose your own colours and layout
i-Model: ’Traditional’ layout
i-Model:
Inheritance may be introduced.
Polyhierarchy is possible.
No consistency checking in diagrams.
polyhierarchy
inheritance
illegal polyhierarchy: the two superordinate concepts must belong to different groups (dimensions)
How to build a concept system in i-Model
part-whole relation
temporal relation
type relation
associative relation
This concept system comprises: • concept positions• feature specifications • subdivision criteria
CAOS: Computer-Aided Ontology Structuring
Bodil Nistrup Madsen
Hanne Erdman Thomsen
Carl Vikner
Bo Krantz Simonsen
Jacob M. Christensen
Dept. of Computational Linguistics
Concept systems in CAOS are based on the UML notation – with extensions.
We build terminological ontologies.
dimension specifications (specify the values associated with the corresponding attribute on the subconcepts)
subdividing dimension (concepts belonging to the same subdividing dimension are grouped together and the subdividing dimension is shown on the links to the concepts)
feature specification
primary feature specification
inherited feature specifications
type relation
How to build a concept system in CAOS 2
First concept prevention and dimension specification: TARGET GROUP with values:
• population
• high-risk groups
• high-risk individuals
The terminologist does not know the terms yet!
Three subordinate concepts automatically generated on the basis of the dimension specification. No terms – yet!
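The generation step can be sketched in a few lines of Python. This is only an illustration of the idea, not how CAOS is implemented; the class `Concept` and the function `expand_dimension` are hypothetical names introduced for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    """A concept position; the term may still be unknown."""
    term: Optional[str] = None
    features: dict = field(default_factory=dict)   # attribute -> value
    children: list = field(default_factory=list)


def expand_dimension(parent, dimension, values):
    """Create one unnamed subordinate concept per value of the dimension."""
    new_concepts = []
    for value in values:
        # Each subconcept inherits the parent's feature specifications
        # and receives one new feature specification of its own.
        features = dict(parent.features)
        features[dimension] = value
        new_concepts.append(Concept(term=None, features=features))
    parent.children.extend(new_concepts)
    return new_concepts


prevention = Concept(term="prevention")
subconcepts = expand_dimension(
    prevention,
    "TARGET GROUP",
    ["population", "high-risk groups", "high-risk individuals"],
)
```

The three generated concepts carry feature specifications but no terms; terms can be filled in later by assigning to `term`.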
Terms have been added
TARGET GROUP chosen as subdividing dimension
Second dimension specification: PHASE IN CLINICAL COURSE with values on new concepts:
• before
• during
• after
Terms added at this stage.
Attempt at creating an illegal polyhierarchy: a concept universal selective prevention with two superordinate concepts within the same group (dimension TARGET GROUP).
Creating a legal polyhierarchy: a concept universal primary prevention with two superordinate concepts within two different groups (dimensions TARGET GROUP and PHASE IN CLINICAL COURSE).
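A minimal sketch of the legality check, assuming each superordinate concept is recorded together with the subdividing dimension it belongs to. The function name is hypothetical, and the superordinate terms (universal prevention, selective prevention, primary prevention) are inferred from the names of the combined concepts rather than quoted from the slides.

```python
def is_legal_polyhierarchy(superordinates):
    """superordinates: list of (term, subdividing_dimension) pairs.

    A polyhierarchy is legal only if every superordinate concept
    belongs to a different subdividing dimension.
    """
    dimensions = [dimension for _term, dimension in superordinates]
    return len(dimensions) == len(set(dimensions))


# Two superordinate concepts in the same group: rejected.
illegal = is_legal_polyhierarchy([
    ("universal prevention", "TARGET GROUP"),
    ("selective prevention", "TARGET GROUP"),
])

# Two superordinate concepts in different groups: accepted.
legal = is_legal_polyhierarchy([
    ("universal prevention", "TARGET GROUP"),
    ("primary prevention", "PHASE IN CLINICAL COURSE"),
])
```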
There is only one delimiting dimension: TARGET GROUP.
The introduction of the feature specifications containing the dimension ARENA indicates that some other concepts may exist, e.g. prevention in schools. Alternatively, the feature specifications containing ARENA may be considered supplementary and determined by the feature specifications containing TARGET GROUP.
New dimension specification: ARENA with the values school and risk environment.
CAOS implements more restrictive terminological principles.
CAOS helps the user in setting up consistent concept systems adhering to the terminological principles.
The user has the possibility of overriding some constraints if she wants to.
The backbone of this concept modelling is constituted by characteristics modelled by formal feature specifications, i.e. attribute-value pairs.
Constraints in CAOS related to subdivision criteria
1) A concept (with only one mother concept) may contain at most one delimiting feature specification
(i.e. a subdividing dimension may not overlap another one).
Argumentation:
Multiple delimiting characteristics on one concept may obscure the concept system by leaving out well-founded superordinate concepts, i.e. by creating conceptual gaps. If the terminologist considers it necessary to attach more than one delimiting characteristic to a concept, this may indicate gaps in the concept system.
2) A concept (of level 2 or below) must contain at least one delimiting feature specification
(i.e. the subdividing dimensions taken together must cover all subordinate concepts).
Argumentation:
It is not possible to make proper definitions for a concept if the concept does not have a delimiting characteristic.
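Constraints 1 and 2 can be read as a simple check on the number of delimiting feature specifications per concept. The sketch below is an assumption-laden illustration, not the CAOS implementation: the function name is hypothetical, and levels are assumed to be numbered from 1 at the top concept.

```python
def check_delimiting_specs(level, n_mothers, delimiting_specs):
    """Return the list of violated constraints for one concept.

    level: depth in the hierarchy (1 = top concept; assumed numbering).
    n_mothers: number of superordinate (mother) concepts.
    delimiting_specs: dict attribute -> value of delimiting feature specifications.
    """
    problems = []
    if n_mothers == 1 and len(delimiting_specs) > 1:
        problems.append("constraint 1: more than one delimiting feature specification")
    if level >= 2 and len(delimiting_specs) == 0:
        problems.append("constraint 2: no delimiting feature specification")
    return problems


ok = check_delimiting_specs(2, 1, {"TARGET GROUP": "population"})
too_many = check_delimiting_specs(
    2, 1, {"TARGET GROUP": "population", "ARENA": "school"}
)
missing = check_delimiting_specs(2, 1, {})
```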
3) An attribute may only be associated with one value in a feature structure
(i.e. one concept can only be related to two superordinate concepts, if the two superordinate concepts belong to different subdividing dimensions – which is the case in a polyhierarchical structure).
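Constraint 3 can be illustrated by merging the feature structures inherited from two superordinate concepts: the merge fails as soon as one attribute would receive two different values. The function name and the pairing of terms with values (e.g. universal prevention with TARGET GROUP: population) are assumptions made for this sketch.

```python
def merge_feature_structures(*structures):
    """Merge attribute-value dicts; an attribute may carry only one value."""
    merged = {}
    for structure in structures:
        for attribute, value in structure.items():
            if attribute in merged and merged[attribute] != value:
                raise ValueError(
                    f"{attribute} would have two values: "
                    f"{merged[attribute]!r} and {value!r}"
                )
            merged[attribute] = value
    return merged


universal = {"TARGET GROUP": "population"}
selective = {"TARGET GROUP": "high-risk groups"}
primary = {"PHASE IN CLINICAL COURSE": "before"}

# Different dimensions: the inherited feature structures are compatible.
merged = merge_feature_structures(universal, primary)

# Same dimension, different values: the merge is rejected.
try:
    merge_feature_structures(universal, selective)
    conflict = None
except ValueError as error:
    conflict = str(error)
```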
4) A given dimension may occur only on one concept in an ontology (uniqueness of dimensions)
(i.e. feature specifications with the same attribute must always occur on coordinate concepts).
Argumentation:
This principle creates coherence and simplicity in the ontological structure, because concepts that are characterized by means of a certain common dimension must appear as coordinate concepts on the same level with a common superordinate concept.
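The uniqueness of dimensions can be checked by recording, for each dimension, the concepts on which it is specified. The sketch below uses a hypothetical function name, and the second concept, health promotion, is used only to provoke a violation; it is not taken from the example ontology.

```python
from collections import defaultdict


def find_duplicate_dimensions(dimension_specs):
    """dimension_specs: iterable of (concept_term, dimension) pairs.

    Returns the dimensions that occur on more than one concept.
    """
    owners = defaultdict(set)
    for concept, dimension in dimension_specs:
        owners[dimension].add(concept)
    return {d: sorted(c) for d, c in owners.items() if len(c) > 1}


specs = [
    ("prevention", "TARGET GROUP"),
    ("prevention", "PHASE IN CLINICAL COURSE"),
    ("prevention", "ARENA"),
]
no_duplicates = find_duplicate_dimensions(specs)
duplicates = find_duplicate_dimensions(specs + [("health promotion", "ARENA")])
```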
5) A feature specification may only occur once in an ontology as primary (uniqueness of primary feature specifications)
(i.e. a given primary feature specification can only appear on one of the subordinate concepts).
Argumentation:
This principle helps create coherence and simplicity in the ontological structure because closely related concepts, i.e. concepts with common characteristics, are kept close together in the ontology: they must be subconcepts of one common superordinate concept.
The two uniqueness principles (4 and 5) make it possible to a certain extent to carry out automatic placing of concepts into an ontology.
If a new concept is characterized by one or more feature specifications, the system can be instructed to search the ontology for concepts with the attributes as dimensions and possibly concepts having the same feature specifications, and on this basis propose a location for the new concept.
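The search described above might look as follows. This is a rough sketch under stated assumptions: the ontology is represented as a plain dict, the function name and result keys are hypothetical, and the term universal prevention is assumed for the concept with TARGET GROUP: population.

```python
def propose_location(ontology, new_features):
    """Propose where a new concept could be placed.

    ontology: term -> {"dimensions": set of attributes specified as dimensions,
                       "primary": dict of primary feature specifications}
    new_features: attribute -> value for the new concept.
    """
    # Concepts carrying one of the attributes as a dimension are
    # candidate superordinate concepts.
    mothers = sorted(
        term for term, info in ontology.items()
        if any(attribute in info["dimensions"] for attribute in new_features)
    )
    # Concepts with the same primary feature specification may be
    # the same concept, or a superordinate of the new one.
    matches = sorted(
        term for term, info in ontology.items()
        if any(info["primary"].get(a) == v for a, v in new_features.items())
    )
    return {"possible superordinates": mothers,
            "same feature specification": matches}


ontology = {
    "prevention": {
        "dimensions": {"TARGET GROUP", "PHASE IN CLINICAL COURSE"},
        "primary": {},
    },
    "universal prevention": {
        "dimensions": set(),
        "primary": {"TARGET GROUP": "population"},
    },
}
proposal = propose_location(ontology, {"TARGET GROUP": "population"})
```

Because of the two uniqueness principles, each attribute can match at most one dimension-bearing concept, which is what makes the proposal unambiguous.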
Part 2: Terminological ontologies vs. other kinds of ontologies
Terminological ontologies vs. philosophical ontologies
bottom-up vs. top-down
Sowa (1997:36) says:
"Philosophers usually build their ontologies from the top down. They start with grand conceptions about everything in heaven and earth.
Programmers, however, tend to work from the bottom up. For their database and AI systems, they start with limited ontologies or microworlds, which have a small number of concepts that are tailored for a single application."
[Figure: Brentano’s tree structure of the categories of Aristotle. Nodes, row by row: Being; Substance, Accident; Affection, Relation; Inherence, Process, Circumstance; Quantity, Quality, Activity, Position, Passivity, State, Where, When.]
Terminology work is corpus based – bottom up.
List of references used by Group 07 Prevention, Health Promotion and Public Health
Need for a top ontology for the health terminology!
The working groups use general concepts in their definitions and as top concepts.
Possible strategies:
1. Top-down – before the work of the working groups
2. Bottom-up – after / during the work of the working groups based on general concepts identified by the 7 groups.
Solution: Strategy 2!
Draft top ontology for the health terminology
Terminological principles used!
OpenCyc Selected Vocabulary and Upper Ontology
http://www.cyc.com/cycdoc/vocab/upperont-diagram.html
Terminological ontologies vs. mathematical-logical ontologies
In terminological concept modelling only relevant subconcepts are registered. This means that not all possible ‘combinations’ of concepts from two or more groups (dimensions) will be registered, e.g. a concept universal secondary prevention is not relevant.
[Figure: lattice diagram with nodes x, y and a, b, c, d]
In lattices you typically find all combinations that are logically possible.
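The difference can be made concrete with the two dimensions from Part 1. A lattice-style approach enumerates every combination of values, whereas the terminologist registers only the relevant ones. In this sketch the mapping of universal primary prevention to the pair (population, before) is an assumption, as is the choice to register only that one combination.

```python
from itertools import product

TARGET_GROUP = ["population", "high-risk groups", "high-risk individuals"]
PHASE = ["before", "during", "after"]

# Lattice view: every logically possible combination of values.
all_combinations = set(product(TARGET_GROUP, PHASE))

# Terminological view: only combinations that correspond to relevant concepts.
# Assumed mapping: universal primary prevention = (population, before).
registered = {("population", "before")}

# Combinations such as (population, during), i.e. a would-be
# "universal secondary prevention", are simply not registered.
unregistered = all_combinations - registered
```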
Part 3: Terminological concept modelling vs. conceptual data modelling
Concept systems corresponding to central concepts in ISO 1087-1, with extensions from the NORDTERM version of ISO 1087-1
characteristic
NUMBER OF REFERENTS
POSITION IN HIERARCHY
intension
concept
concept system
concept relation
extension
object
generic relation partitive relation
associative relation
temporal relation
sequential relation
hierarchical relation
ISO-1087 Terminology of terminology
Concept system 3.2 Concepts
Concept marked with red is not defined in ISO 1087-1.
individual concept [NUMBER OF REFERENTS: one object]
superordinate concept [POSITION IN HIERARCHY: above subordinate concept]
general concept [NUMBER OF REFERENTS: two or more objects]
subordinate concept [POSITION IN HIERARCHY: below superordinate concept]
characteristic
intension
concept
extension
object
ISO 1087-1 (cf. previous slide)
concept
unit of knowledge created by a unique combination of characteristics
intension
set of characteristics which makes up the concept
NORDTERM: Danish version of concept system – 02 Concepts
Concept marked with red is not defined in ISO 1087-1.
characteristic
property
intension
concept
extension
object
referent
intension
set of characteristics which denotes the extension of a concept
mængde af karakteristiske træk der udpeger ekstensionen af et begreb
concept
unique combination of characteristics which makes up the content of a term
unik kombination af karakteristiske træk der udgør indholdssiden af en term
property
quality of an entity
characteristic
intension
concept
extension
object
ISO 1087-1 (from previous slide)
NORDTERM: Danish definitions translated into English
Draft concept system: NORDTERM Terminology of terminology in i-Model
Terminological concept modelling using UML
UML diagrams corresponding to central concepts of ISO 1087-1
NB! Here we are not talking about conceptual data models for a database
individual concept [NUMBER OF REFERENTS: one object]
superordinate concept [POSITION IN HIERARCHY: above subordinate concept]
general concept [NUMBER OF REFERENTS: two or more objects]
subordinate concept [POSITION IN HIERARCHY: below superordinate concept]
NUMBER OF REFERENTS
POSITION IN HIERARCHY
concept
ISO-1087 Terminology of terminology
Traditional presentation
individual concept
superordinate concept
general concept
subordinate concept
number of referents
position in hierarchy
concept
Example of specialisation (= type relation) and discriminators (= subdivision criteria) in UML diagrams
ISO-1087 (types of concepts)
discriminator
specialisation
concept
concept system
concept relation
generic relation partitive relation
associative relation
temporal relation
sequential relation
hierarchical relation
concept: unit of knowledge created by a unique combination of characteristics
concept system: set of concepts structured according to the relations among them
hierarchical relation: relation between two concepts which may be either a generic or a partitive relation
associative relation: relation between two concepts having a non-hierarchical thematic connection by virtue of experience
generic relation: relation between two concepts where the intension of one of the concepts includes that of the other concept and at least one additional delimiting characteristic
partitive relation: relation between two concepts where one of the concepts constitutes the whole and the other concept a part of that whole
sequential relation: associative relation based on spatial or temporal proximity
temporal relation: sequential relation involving events in time
ISO 1087-1
concept system:
3.2 Concepts
concept
concept system
concept relation
generic relation partitive relation
associative relation
temporal relation
sequential relation
hierarchical relation
1..*
1..*
Example of aggregation (= part-whole relation) in UML diagrams
ISO-1087 (types of concepts)
aggregation
Conceptual data modelling for DANTERM / CAOS databases represented in UML
conceptSystem: S-ID (pk), SYSTNAME, LANG (fk)
concSystPos: S-ID (pk), C-ID (pk), POS-ID
concept: C-ID (pk), LANG (fk), CLASSA (fk)
term: C-ID (pk, fk), E-ID (pk, fk), STATUS, …
expression: E-ID (pk), EXPRESS
concSystRel: S-ID (pk), C-ID1 (pk), S-ID2 (pk), C-ID2 (pk), R-ID
Associations: belongs to; is related to; is expressed by
Multiplicity: 0..* = zero, one or more; 1..* = one or more
Diagram elements: class, attributes, association, multiplicity
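To make the data model more tangible, the classes can be written out as relational tables. The following is a sketch using Python's built-in `sqlite3` module, not the actual DANTERM / CAOS schema: column names use underscores instead of hyphens, the column types are guesses, the tables referenced by LANG and CLASSA are not shown in the diagram and are therefore left out, and the status value 'preferred' is a made-up example.

```python
import sqlite3

SCHEMA = """
CREATE TABLE conceptSystem (s_id INTEGER PRIMARY KEY, systname TEXT, lang TEXT);
CREATE TABLE concept (c_id INTEGER PRIMARY KEY, lang TEXT, classa TEXT);
CREATE TABLE expression (e_id INTEGER PRIMARY KEY, express TEXT);
-- term links concept and expression (many-to-many)
CREATE TABLE term (
    c_id INTEGER REFERENCES concept(c_id),
    e_id INTEGER REFERENCES expression(e_id),
    status TEXT,
    PRIMARY KEY (c_id, e_id)
);
-- a concept's position within a concept system
CREATE TABLE concSystPos (
    s_id INTEGER REFERENCES conceptSystem(s_id),
    c_id INTEGER REFERENCES concept(c_id),
    pos_id TEXT,
    PRIMARY KEY (s_id, c_id)
);
-- relation between two concept positions
CREATE TABLE concSystRel (
    s_id INTEGER, c_id1 INTEGER, s_id2 INTEGER, c_id2 INTEGER, r_id TEXT,
    PRIMARY KEY (s_id, c_id1, s_id2, c_id2)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# One concept expressed by one expression via a term record.
conn.execute("INSERT INTO concept VALUES (1, 'en', NULL)")
conn.execute("INSERT INTO expression VALUES (10, 'prevention')")
conn.execute("INSERT INTO term VALUES (1, 10, 'preferred')")

rows = conn.execute(
    "SELECT e.express FROM term t "
    "JOIN expression e ON e.e_id = t.e_id WHERE t.c_id = 1"
).fetchall()
```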
attributes are found in a special compartment in the class
Information about primary keys (pk), foreign keys (fk) and data types (e.g. String) may be added to the attributes.
extra class between classes in a many-to-many relationship
is expressed by (1..* to 1..*)
term: C-ID (pk, fk), E-ID (pk, fk), STATUS, …
expression: E-ID (pk), EXPRESS
reflexive association
One concept in one position in a concept system is related to one or several concepts in the same concept system.
concSystRel: S-ID (pk), C-ID1 (pk), S-ID2 (pk), C-ID2 (pk), R-ID
• in order to produce a well-functioning database it is necessary to know the concept model for the domain underlying the data model, which forms the basis of the database structure
• knowledge about the concepts in a domain is found in the characteristics and the concept relations
Terminological concept modelling vs. conceptual data modelling
• concept systems and data models do have something in common
but
• there is no one-to-one correspondence between a concept system and the data model of the database:
• There is no one-to-one mapping between concepts and characteristics in the concept model and classes and attributes in the data model.
• Some concepts correspond to attributes in the data model, and some concepts may correspond neither to classes nor to attributes.
A concept system for concepts may comprise concepts such as superordinate concept and subordinate concept, which are subordinate concepts to concept.
superordinate concept
subordinate concept
position in hierarchy
concept
There are no corresponding classes or attributes in the conceptual data model; rather, they will be represented by means of the attributes C-ID1 and C-ID2 on the class concSystRel, and the corresponding table concSystRel relates two concepts to each other (via their positions) together with a specification of which relation type (attribute R-ID) holds between them.
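How superordinate and subordinate concepts emerge from rows of concSystRel, rather than from classes of their own, can be sketched as follows. The direction convention (C-ID1 is the superordinate concept, C-ID2 the subordinate one), the relation type value "generic" and the identifiers are all assumptions made for this illustration.

```python
# Each row: (S-ID, C-ID1, S-ID2, C-ID2, R-ID)
# Assumed convention: C-ID1 is superordinate to C-ID2 when R-ID is "generic".
rows = [
    (1, 100, 1, 101, "generic"),
    (1, 100, 1, 102, "generic"),
    (1, 101, 1, 102, "associative"),
]


def subordinates_of(c_id, rows):
    """Concepts directly below c_id in the generic hierarchy."""
    return [c2 for (_s1, c1, _s2, c2, r) in rows if c1 == c_id and r == "generic"]


def superordinates_of(c_id, rows):
    """Concepts directly above c_id in the generic hierarchy."""
    return [c1 for (_s1, c1, _s2, c2, r) in rows if c2 == c_id and r == "generic"]


below_100 = subordinates_of(100, rows)
above_102 = superordinates_of(102, rows)
```

Being a superordinate or subordinate concept is thus a role read off the relation rows, not a stored class or attribute.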
Another example: concepts such as intension and extension, which are very important in a concept system for the understanding of central concepts like concept and characteristic, will not be found in an entity/relationship diagram for a terminology database.
characteristic
property
intension
concept
extension
object
referent
UML:
UML was originally developed for conceptual data modelling, i.e. graphical presentation of the model that forms the basis of the structure of an IT system, for example a database.
For terminological concept modelling, UML therefore has some limitations:
• it is not possible to represent several dimensions, from which one may be chosen as the subdividing dimension
• there is no notation for the specification of dimension values, at least not in the way it is done in CAOS
• there is no notation for feature specifications. A UML facility comes close to feature specifications as used in CAOS: in specializations it is possible to introduce attributes with initial values, e.g. ‘plane’ for the class ‘flight’, which is a specialization of the class ‘travel’.
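The facility mentioned in the last point has a close analogue in object-oriented code: a subclass that fixes an initial value for an inherited attribute. In this sketch the attribute name `means_of_transport` is an assumption; the slides give only the value ‘plane’ and the classes ‘flight’ and ‘travel’.

```python
class Travel:
    # The general class leaves the attribute unspecified.
    means_of_transport = None


class Flight(Travel):
    # The specialization introduces an initial value,
    # much like a feature specification on a subconcept.
    means_of_transport = "plane"


trip = Travel()
flight = Flight()
```

Unlike a feature specification, an initial value is only a default: an individual instance could still be assigned another value.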
The attributes in a conceptual data model:
• specify which kinds of information may be related to each class and consequently to each instance in the IT system
• the values of the attributes will exist only in the IT system (e.g. in the database), and they will give information about instances. The value of an attribute may differ for each instance of the class.
• Concept modelling
  • information about concepts in the form of feature specifications and concept relations
• Conceptual data modelling
  • information about the classes in the form of attributes and associations between the classes
  • attributes give no information about the meaning of the classes, but only a specification of what kind of information will be given about the entities represented by the classes in question
NB! The attribute values describe the individual instances, not the concept which lies behind the class
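The contrast can be shown side by side. In the sketch below, `TermRecord` stands in for a class of the data model, and its attribute values vary from instance to instance; the feature specification, by contrast, is a fixed part of the description of the concept itself. The record layout, the status values and the term universal prevention are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class TermRecord:
    """Data-model class: attributes say what may be recorded per instance."""
    c_id: int
    e_id: int
    status: str


# Attribute values belong to individual instances and may differ.
records = [
    TermRecord(c_id=1, e_id=10, status="preferred"),
    TermRecord(c_id=1, e_id=11, status="deprecated"),
]
statuses = {record.status for record in records}

# A feature specification characterizes the concept itself,
# independently of any instance stored in a database.
universal_prevention = {"TARGET GROUP": "population"}
```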
Terminological concept modelling vs. conceptual data modelling
Thank you for your attention!