Prof. Dr. Klaus-Dirk Schmitz, Cologne University of Applied Sciences
Data Modelling: Part 2, TSS 2007 Cologne
Data Modelling Part 2: Modelling Principles
Terminology Summer School, Cologne, 16-20 July 2007
Klaus-Dirk Schmitz, Institute for Information Management, Faculty 03, University of Applied Sciences
K.-D. Schmitz, IIM, FH Köln
Overview
• A little bit of theory again
• Data modelling
  • (Data categories)
  • Dependencies
  • Modelling variances
  • Concept orientation
  • Term autonomy
• Data modelling in general: meta model
• Support by (ISO) standards
Communication
“mouse”
Terminological triangle
“mouse”
Terminological triangle
[Figure: terminological triangle linking object, concept and term (designation)]
Object
Any part of the perceivable or conceivable world
Objects may be material (e.g. mouse) or immaterial (e.g. magnetism)
Concept
Unit of thinking made up of characteristics that are derived by categorizing objects having a number of identical properties (DIN)
Unit of knowledge created by a unique combination of characteristics (ISO)
Concepts are not bound to particular languages; they are, however, influenced by social or cultural background.
Term
Designation of a defined concept in a special language by a linguistic expression
Designation: Any representation of a concept
A term may consist of one or more words
“mouse”
Communication can be disturbed
“return key?”
Synonymy
“enter key”
Communication can fail
“mouse”
Homonymy / Polysemy
Data modelling for terminology DBs
Important aspects:
• (selection of adequate data categories)
• modelling dependency
• modelling variance
• concept orientation
• term autonomy
Terminology science and terminology standards provide the appropriate theory, principles and methods for data modelling
Data categories
In the past, data categories were not discussed in detail in terminology theory
First approaches described the “fields” of forms for recording terminological data offline
Later, the description of term bank structures improved
But there was no real definition of the underlying data categories
First comprehensive analysis of the terminological data categories used in terminology management systems (TMS), in preparation for ISO 12620
First standard for data categories: ISO 12620:1999
Data model for terminological data collections, developed for terminology interchange (MARTIF): ISO 12200:1999
Improved as the Terminological Markup Framework (TMF) in ISO 16642:2003
Data categories
Dependencies between data categories
ISO 12620:1999 provides a “simple hierarchy” of data categories:
• grammar = term-related: grammar is dependent on the term
In addition, many more dependencies exist and have to be taken into account:
• the source is dependent on the definition
• for additional definitions, additional sources are needed
• the source of the definition has to be differentiated from the source of the term or of the context example
Modelling dependencies
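One way to make these dependencies explicit is to nest each dependent data category inside the element it depends on. The Python sketch below uses hypothetical class names (`Sourced`, `TermBlock`, `Entry`) that do not come from ISO 12620; it only illustrates that grammar belongs to the term and that every definition carries its own source, kept apart from the sources of the term and of the context example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Sourced:
    """A text (definition or context example) bundled with its own source."""
    text: str
    source: str


@dataclass
class TermBlock:
    term: str
    gender: Optional[str] = None       # grammar depends on the term
    term_source: Optional[str] = None  # source of the term, not of a definition
    contexts: List[Sourced] = field(default_factory=list)  # each context has its own source


@dataclass
class Entry:
    # Every additional definition brings its own source along with it.
    definitions: List[Sourced] = field(default_factory=list)
    terms: List[TermBlock] = field(default_factory=list)


# Hypothetical entry: only the first definition and its source are taken
# from the TBX example in these slides; everything else is made up.
entry = Entry(
    definitions=[
        Sourced("Maß für die Lichtundurchlässigkeit", "DIN-6370.1996-05, S. 383"),
        Sourced("second definition (hypothetical)", "source B (hypothetical)"),
    ],
    terms=[TermBlock("Opazität", gender="f", term_source="source C (hypothetical)")],
)

for d in entry.definitions:
    print(d.source)
```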
Even when following ISO 12620, there is sometimes more than one modelling solution for implementing a data category
Simple example:
a) gender: m. / f. / n.
b) masculine: yes / no
   feminine: yes / no
   neuter: yes / no
Modelling variances
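A minimal Python sketch of the two gender variants and of converting between them. The names `Gender`, `to_flags` and `from_flags` are hypothetical; the point is that variant (a) can only ever hold one value, while variant (b) has to be checked for consistency.

```python
from enum import Enum
from typing import Dict


class Gender(Enum):
    """Variant (a): a single field with a closed list of values."""
    MASCULINE = "m."
    FEMININE = "f."
    NEUTER = "n."


def to_flags(gender: Gender) -> Dict[str, bool]:
    """Variant (a) -> variant (b): one yes/no field per gender."""
    return {g.name.lower(): g is gender for g in Gender}


def from_flags(flags: Dict[str, bool]) -> Gender:
    """Variant (b) -> variant (a); only possible if exactly one flag is 'yes'."""
    chosen = [g for g in Gender if flags.get(g.name.lower(), False)]
    if len(chosen) != 1:
        raise ValueError(f"expected exactly one gender flag, got {len(chosen)}")
    return chosen[0]


print(to_flags(Gender.FEMININE))  # {'masculine': False, 'feminine': True, 'neuter': False}
print(from_flags({"masculine": True, "feminine": False, "neuter": False}))
```

Variant (b) can also record a term used with more than one gender, but then the conversion back to variant (a) fails; this is exactly the kind of decision that has to be taken when the entry structure is designed.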
Complex example:
a) term: ink jet printer
   superordinate concept: non-impact printer
   subordinate concept: bubble jet printer
   coordinate concept: laser printer
b) term: ink jet printer
   related concept: non-impact printer; type of relation: superordinate
   related concept: bubble jet printer; type of relation: subordinate
   related concept: laser printer; type of relation: coordinate
Modelling variances
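Variant (a) needs one fixed field per relation type, whereas variant (b) uses a repeatable pair of related concept and type of relation. A Python sketch of converting (a) into (b), assuming (hypothetically) that the fields of variant (a) are named `<type> concept` as on the slide:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Relation:
    related_concept: str
    relation_type: str  # e.g. "superordinate", "subordinate", "coordinate"


# Variant (a): one fixed field per relation type.
variant_a: Dict[str, str] = {
    "term": "ink jet printer",
    "superordinate concept": "non-impact printer",
    "subordinate concept": "bubble jet printer",
    "coordinate concept": "laser printer",
}

SUFFIX = " concept"


def to_variant_b(entry: Dict[str, str]) -> List[Relation]:
    """Turn the fixed '<type> concept' fields of variant (a) into generic relations."""
    return [
        Relation(value, key[: -len(SUFFIX)])
        for key, value in entry.items()
        if key.endswith(SUFFIX)
    ]


for r in to_variant_b(variant_a):
    print(f"related concept: {r.related_concept}; type of relation: {r.relation_type}")
```

With variant (b), a new relation type (for example a partitive relation) is only a new value, not a new field; with variant (a) it would require changing the entry structure.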
Concept orientation has been addressed in most theoretical publications and practical guidelines for terminology work
However, ISO 12620:1999 contains no data category for the concept itself
The “concept” can only be represented by data modelling principles
Concept orientation
[Figure: one word linked to several meanings]
Lexicographical model / entry
[Figure: descriptive terminology management: one concept linked to several terms]
Terminological model / entry
[Figure: prescriptive terminology management: one concept linked to several terms, one of them shown in parentheses as “(term)”]
Terminological model / entry
All terminological information belonging to one concept, including all terms in all languages and all term-related and administrative data, must be stored in one terminological entry
concept = terminological entry
Concept orientation
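The principle concept = terminological entry can be illustrated by grouping flat term records by concept. In this Python sketch the concept IDs are hypothetical and “Eingabetaste” is a German term added for illustration; the English terms come from the slides. The homonym “mouse” ends up in two different entries, whereas a word-based (lexicographical) model would keep it in one.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Flat, term-oriented records: (concept ID, language, term).
records: List[Tuple[str, str, str]] = [
    ("C001", "en", "enter key"),
    ("C001", "en", "return key"),
    ("C001", "de", "Eingabetaste"),
    ("C002", "en", "mouse"),  # the computer mouse
    ("C003", "en", "mouse"),  # the animal: same string, different concept
]


def build_entries(rows: List[Tuple[str, str, str]]) -> Dict[str, Dict[str, List[str]]]:
    """Group records so that each concept becomes exactly one entry
    holding all of its terms in all languages."""
    entries: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for concept_id, lang, term in rows:
        entries[concept_id][lang].append(term)
    return entries


entries = build_entries(records)
print(dict(entries["C001"]))  # {'en': ['enter key', 'return key'], 'de': ['Eingabetaste']}
print(len(entries))           # 3
```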
Many older term banks and TMS are designed for term orientation rather than concept orientation
Modern TMS not only follow the concept-oriented approach but also provide features that keep concept entries consistent (e.g. preventing “double entries”)
Concept orientation
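A simple way to help prevent double entries is to check, before a new entry is created, whether a term already occurs in some entry for the same language. The function below is a hypothetical sketch, not the feature of any particular TMS; because of homonymy, a match can only be a warning for the terminologist, never a reason to merge entries automatically.

```python
from typing import Dict, List, Set


def find_possible_duplicates(entries: Dict[str, Dict[str, List[str]]],
                             lang: str, term: str) -> Set[str]:
    """Return the IDs of existing entries that already contain `term` in `lang`
    (case-insensitive comparison). A hit is only a warning: because of homonymy,
    the same term may legitimately occur in entries for different concepts."""
    needle = term.casefold()
    return {
        entry_id
        for entry_id, languages in entries.items()
        if any(existing.casefold() == needle for existing in languages.get(lang, []))
    }


# Hypothetical termbase: concept ID -> language -> terms.
termbase = {
    "C001": {"en": ["enter key", "return key"], "de": ["Eingabetaste"]},
    "C002": {"en": ["mouse"], "de": ["Maus"]},
}

print(find_possible_duplicates(termbase, "en", "Return key"))  # {'C001'}
print(find_possible_duplicates(termbase, "en", "keyboard"))    # set()
```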
All terms belonging to one concept should be managed (in one terminological entry) as autonomous (repeatable) blocks of data categories without any preference for a specific term
This way, all terms can be documented with the relevant term-related data categories
Term autonomy is necessary for the main term, all synonyms, all variants, and all short forms
Term autonomy is not explicitly discussed in the theoretical literature
Term autonomy
LREC 2002, May 2002 K.-D. Schmitz, IIM, FH Köln
Concept: represented by ID number and/or classification/notation
Language 1 Language 2 Language 3 ...
Term 1+ AuxInfo
Term 2+ AuxInfo
Term 1+ AuxInfo
Term 2+ AuxInfo
Term 1+ AuxInfo
Term 3+ AuxInfo
Term autonomy
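Term autonomy means that every term gets its own repeatable block with the same term-related data categories, while concept-level data is stored once per entry. A Python sketch with hypothetical class and field names; no block is marked as preferred, so a synonym is simply another block.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TermBlock:
    """One autonomous, repeatable block per term: main term, synonyms,
    variants and short forms all get the same term-related fields."""
    term: str
    term_type: str = "full form"      # e.g. "full form", "short form", "variant"
    gender: Optional[str] = None
    context: Optional[str] = None
    style: Optional[str] = None


@dataclass
class ConceptEntry:
    entry_id: str                     # the concept, represented by an ID number
    domain: Optional[str] = None      # concept-level data, stored once per entry
    terms: Dict[str, List[TermBlock]] = field(default_factory=dict)  # language -> blocks

    def add_term(self, lang: str, block: TermBlock) -> None:
        self.terms.setdefault(lang, []).append(block)


entry = ConceptEntry("C001", domain="computer hardware (hypothetical)")
entry.add_term("en", TermBlock("enter key"))
entry.add_term("en", TermBlock("return key"))  # synonym: just another block
entry.add_term("de", TermBlock("Eingabetaste", gender="f"))
print({lang: [b.term for b in blocks] for lang, blocks in entry.terms.items()})
```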
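[Figure: a concept carrying concept-level data (domain, notation, etc.) is linked to the terms TermDE 1, TermDE 2, TermEN 1 and TermFR 1, each with its own term-related data (grammar, context, style, etc.)]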
Term autonomy
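[Figure: without a concept level, the terms TermDE 1, TermDE 2, TermEN 1 and TermFR 1 (with grammar, context, style, etc. and domain) are linked directly to each other by relations such as variant, abbreviation, synonym and translation; the figure marks this with “??”]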
Term autonomy (no concept-orientation)
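One way to see the problem of linking terms without a concept level is to count links. This is an illustration added here, not taken from the slides: if n terms designate the same concept and every pair has to be related directly, n(n-1)/2 links are needed, compared with n links to one shared concept entry.

```python
from itertools import combinations


def pairwise_links(n_terms: int) -> int:
    """Links needed if every pair of terms must be related directly."""
    return sum(1 for _ in combinations(range(n_terms), 2))


def concept_links(n_terms: int) -> int:
    """Links needed if every term points to one shared concept entry."""
    return n_terms


for n in (4, 10):
    print(n, pairwise_links(n), concept_links(n))
# 4 6 4
# 10 45 10
```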
Data modelling: meta model (ISO 12200)
Data modelling: meta model (ISO 16642)
Design of a terminology DB
Data modelling taken from terminology management / terminology interchange (ISO 12200, ISO 12620, ISO 16642)
Monolingual terminological entries:
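[Diagram: a Terminological Data Collection with Global Info and Complementary Info contains any number (*) of Concept/Entry elements; each Concept/Entry contains any number (*) of Term elements]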
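Read as a class model, the monolingual meta model nests entries in the collection and terms in the entries. A minimal Python sketch, assuming that `*` means “zero or more” and using hypothetical class names that mirror the diagram labels rather than a normative schema:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Term:
    text: str


@dataclass
class ConceptEntry:  # "Concept/Entry"
    entry_id: str
    terms: List[Term] = field(default_factory=list)  # Term *


@dataclass
class TerminologicalDataCollection:
    global_info: Dict[str, str] = field(default_factory=dict)    # "Global Info"
    complementary_info: List[str] = field(default_factory=list)  # "Complementary Info"
    entries: List[ConceptEntry] = field(default_factory=list)    # Concept/Entry *


tdc = TerminologicalDataCollection(global_info={"title": "demo termbase (hypothetical)"})
tdc.entries.append(ConceptEntry("C001", [Term("enter key"), Term("return key")]))
print(len(tdc.entries), [t.text for t in tdc.entries[0].terms])  # 1 ['enter key', 'return key']
```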
Design of a terminology DB
Monolingual terminological entries: Example
Multilingual terminological entries:
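[Diagram: a Terminological Data Collection with Global Info and Complementary Info contains any number (*) of Concept/Entry elements; each Concept/Entry contains any number (*) of Language sections, and each Language section contains any number (*) of Term elements]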
Design of a terminology DB
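The multilingual meta model inserts a language level between entry and term. A minimal Python sketch of this structure, with hypothetical class names mirroring the diagram labels and `*` read as “zero or more”; the helper `terms_per_language` is also hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Term:
    text: str


@dataclass
class LanguageSection:  # "Language *"
    lang: str
    terms: List[Term] = field(default_factory=list)  # Term *


@dataclass
class ConceptEntry:  # "Concept/Entry"
    entry_id: str
    languages: List[LanguageSection] = field(default_factory=list)


@dataclass
class TerminologicalDataCollection:
    global_info: Dict[str, str] = field(default_factory=dict)
    complementary_info: List[str] = field(default_factory=list)
    entries: List[ConceptEntry] = field(default_factory=list)


def terms_per_language(entry: ConceptEntry) -> Dict[str, int]:
    """Count the terms in each language section of one entry."""
    return {section.lang: len(section.terms) for section in entry.languages}


entry = ConceptEntry("C001", [
    LanguageSection("en", [Term("enter key"), Term("return key")]),
    LanguageSection("de", [Term("Eingabetaste")]),
])
tdc = TerminologicalDataCollection(entries=[entry])
print(terms_per_language(tdc.entries[0]))  # {'en': 2, 'de': 1}
```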
Design of a terminology DB
Multilingual terminological entries: Example
Multilingual terminological entries: Example TBX
```xml
<?xml version='1.0'?>
<!DOCTYPE martif SYSTEM "./TBXcoreStructureDTD-v-1-0.DTD">
<martif type='TBX' xml:lang='en'><martifHeader>…</martifHeader><text><body>
  <termEntry id="ID0000073578">
    <descrip type="subjectField">Materialbeschaffenheit</descrip> <!-- "material properties" -->
    <langSet xml:lang="de"><ntig>
      <termGrp><term>Opazität</term> <!-- "opacity" -->
        <termNote type="partOfSpeech">Substantiv</termNote> <!-- "noun" -->
        <termNote type="grammaticalGender">f</termNote></termGrp>
      <descripGrp><descrip type="definition">Maß für die Lichtundurchlässigkeit</descrip> <!-- "measure of impermeability to light" -->
        <ref type="sourceIdentifier" target="DIN-6370.1996-05">S. 383</ref></descripGrp> <!-- "S." = page -->
    </ntig></langSet>
  </termEntry>
</body></text></martif>
```
Design of a terminology DB
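Because TBX is XML, standard XML tooling can read such entries. A sketch using Python's standard-library `xml.etree.ElementTree` on a trimmed, well-formed copy of the example above (no DOCTYPE, no header, fewer data categories); it does not validate against the TBX DTD, and the function name `read_entries` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the TBX example (no DOCTYPE, no martifHeader).
TBX_SAMPLE = """<martif type="TBX" xml:lang="en"><text><body>
<termEntry id="ID0000073578">
  <descrip type="subjectField">Materialbeschaffenheit</descrip>
  <langSet xml:lang="de"><ntig>
    <termGrp><term>Opazität</term>
      <termNote type="grammaticalGender">f</termNote></termGrp>
  </ntig></langSet>
</termEntry>
</body></text></martif>"""

# ElementTree reports the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_entries(xml_text: str) -> dict:
    """Map each termEntry id to {language: [terms]}."""
    root = ET.fromstring(xml_text)
    result = {}
    for entry in root.iter("termEntry"):
        languages = {}
        for lang_set in entry.iter("langSet"):
            languages[lang_set.get(XML_LANG)] = [t.text for t in lang_set.iter("term")]
        result[entry.get("id")] = languages
    return result


print(read_entries(TBX_SAMPLE))  # {'ID0000073578': {'de': ['Opazität']}}
```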
Terminology data modelling: benefits from standardization
basic principles and methods
ISO 704: Terminology work - Principles and methods
ISO 1087: Terminology work - Vocabulary
DIN 2330: Begriffe und Benennungen - Allgemeine Grundsätze (Concepts and terms - General principles)
DIN 2342: Begriffe der Terminologielehre - Grundbegriffe (Concepts of terminology science - Basic concepts)
concept orientation + term autonomy
working procedures
ISO 12618: Computer applications in terminology - Design, implementation and maintenance of terminology management systems (under revision as ISO 26162)
DIN 2339: “Terminologiearbeit” (Terminology work) (under revision)
KÜDES: Empfehlungen für die Terminologiearbeit (Recommendations for terminology work)
guidelines for terminology management/work
Terminology data modelling: benefits from standardization
IT-realization: design
ISO 12200: Computer applications in terminology -Machine-readable terminology interchange format (MARTIF) - Negotiated interchange
ISO 16642: Computer applications in terminology -Terminological markup framework (TMF)
ISO 12618: Computer applications in terminology - Design, implementation and maintenance of terminology management systems (under revision as ISO 26162)
data modelling + meta model
Terminology data modelling: benefits from standardization
IT-realization: data categories
ISO 12620: Computer applications in terminology - Data categories (under revision)
IT-realization: data interchange
ISO 12200: MARTIF
ISO 16642: TMF
TBX: Termbase Exchange Format (LISA / ISO)
Terminology data modelling: benefits from standardization
Conclusion
A terminology management solution (or a termbase) has to be designed and implemented very thoroughly, especially when:
selecting adequate data categories
modelling the terminological entry
choosing a terminology management tool (system / software)
Wrong decisions and mistakes can later be repaired only with great effort and at high cost
Thank you for your attention
Prof. Dr. Klaus-Dirk Schmitz, Fachhochschule Köln
Faculty 03 - ITMK/IIM, Mainzer Str. 5
50678 Köln