
Prof. Dr. Klaus-Dirk Schmitz
Cologne University of Applied Sciences

Data Modelling: Part 2
TSS 2007 Cologne

Data Modelling Part 2: Modelling Principles

Terminology Summer School - Cologne
16 - 20 July 2007

Klaus-Dirk Schmitz
Institute for Information Management
Faculty 03
University of Applied Sciences Cologne

K.-D. Schmitz, IIM, FH Köln

Overview

• A little bit of theory again
• Data modelling
   • (Data categories)
   • Dependencies
   • Modelling variances
   • Concept orientation
   • Term autonomy
• Data modelling in general: meta model
• Support by (ISO) standards

Communication

[Figure: communication example with the term “mouse”]

Terminological triangle

[Figure: terminological triangle with the corners object, concept, and term (designation); example: “mouse”]

Object

Any part of the perceivable or conceivable world

Objects may be material (e.g. mouse) or immaterial (e.g. magnetism)


Concept

Unit of thinking made up of characteristics that are derived by categorizing objects having a number of identical properties (DIN)

Unit of knowledge created by a unique combination of characteristics (ISO)

Concepts are not bound to particular languages. They are, however, influenced by social or cultural background


Term

Designation of a defined concept in a special language by a linguistic expression

Designation: Any representation of a concept

A term may consist of one or more words

Example: “mouse”

Communication can be disturbed

Synonymy: “return key?” / “enter key”

Communication can fail

Homonymy / Polysemy: “mouse”

Data modelling for terminology DBs

Important aspects:
• (selection of adequate data categories)
• modelling dependency
• modelling variance
• concept orientation
• term autonomy

Terminology science and terminology standards provide the adequate theory, principles and methods for data modelling

Data categories

Data categories have not been discussed in detail in terminology theory in the past

First approaches of describing “fields” of forms for recording terminological data offline

Improvement for the description of term bank structures

But no real definition of underlying data categories

Data categories

First comprehensive analysis of terminological data categories used in TMS for the preparation of ISO 12620

First standard for data categories: ISO 12620:1999

Data model for terminological data collections developed for terminology interchange (MARTIF): ISO 12200:1999

Improved for the Terminological Markup Framework (TMF) in ISO 16642:2003

Modelling dependencies

Dependencies between data categories

ISO 12620:1999 provides a “simple hierarchy” of data categories
• grammar = term-related: grammar is dependent on term

In addition to this, many more dependencies exist and have to be taken into account
• source is dependent on definition
• for additional definitions, additional sources are needed
• the source of the definition has to be differentiated from the source of the term or the context example
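The dependencies between data categories can be made explicit in a data structure. The following is a minimal Python sketch and not part of the original slides; the class and field names (`Definition`, `ContextExample`, `TermBlock`, `source`, `grammar`) and the sample values are hypothetical. Each source is stored on the item it documents, so the source of a definition stays distinct from the source of the term or of the context example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Definition:
    text: str
    source: Optional[str] = None  # this source depends on the definition


@dataclass
class ContextExample:
    text: str
    source: Optional[str] = None  # this source depends on the context example


@dataclass
class TermBlock:
    term: str
    grammar: Optional[str] = None  # grammar depends on the term
    source: Optional[str] = None   # source of the term itself
    definitions: List[Definition] = field(default_factory=list)
    contexts: List[ContextExample] = field(default_factory=list)


# Hypothetical sample data: all source labels are placeholders.
block = TermBlock(term="ink jet printer", grammar="noun", source="source A")
block.definitions.append(Definition("first definition text", source="source B"))
block.definitions.append(Definition("second definition text", source="source C"))
block.contexts.append(ContextExample("context sentence", source="source D"))

# Every additional definition brings its own source.
definition_sources = [d.source for d in block.definitions]
```

A flat record with a single `source` field could not say whether the source documents the term, a definition, or the context example; nesting the field under its parent avoids that ambiguity.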

Modelling variances

Even when following ISO 12620, there is sometimes more than one modelling solution for implementing a data category

Simple example:

a) gender: m. / f. / n.

b) masculine: yes / no
   feminine: yes / no
   neuter: yes / no
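The two gender variants carry the same information in most cases and can be converted into each other. The Python sketch below is only an illustration under assumptions: the function names and the dictionary layout are hypothetical and do not come from ISO 12620 or from any TMS.

```python
GENDERS = ("masculine", "feminine", "neuter")
CODES = {"m.": "masculine", "f.": "feminine", "n.": "neuter"}


def gender_to_flags(code):
    """Variant a) -> variant b): one pick-list value becomes three yes/no fields."""
    chosen = CODES[code]
    return {name: (name == chosen) for name in GENDERS}


def flags_to_gender(flags):
    """Variant b) -> variant a); fails unless exactly one gender is set to yes."""
    selected = [name for name in GENDERS if flags.get(name)]
    if len(selected) != 1:
        raise ValueError("variant a) can hold exactly one gender value")
    return next(code for code, name in CODES.items() if name == selected[0])
```

The conversion from b) to a) can fail: variant b) is able to record a term with two genders (or none), which variant a) cannot express in a single field. This kind of difference is what makes the modelling decision matter for later data interchange.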

Modelling variances

Complex example:

a) term: ink jet printer
   superordinate concept: non-impact printer
   subordinate concept: bubble jet printer
   coordinate concept: laser printer

b) term: ink jet printer
   related concept: non-impact printer
      type of relation: superordinate
   related concept: bubble jet printer
      type of relation: subordinate
   related concept: laser printer
      type of relation: coordinate
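The ink jet printer example can be converted mechanically from variant a) to variant b). The following Python sketch assumes a plain dictionary representation; the function name `to_typed_relations` and the key `relations` are hypothetical, while the field labels are taken from the slide.

```python
# Maps each dedicated field of variant a) to the relation type used in variant b).
RELATION_FIELDS = {
    "superordinate concept": "superordinate",
    "subordinate concept": "subordinate",
    "coordinate concept": "coordinate",
}


def to_typed_relations(record):
    """Variant a) -> variant b): one repeatable pair (related concept, type of relation)."""
    relations = []
    for field_name, relation_type in RELATION_FIELDS.items():
        for target in record.get(field_name, []):
            relations.append(
                {"related concept": target, "type of relation": relation_type}
            )
    return {"term": record["term"], "relations": relations}


variant_a = {
    "term": "ink jet printer",
    "superordinate concept": ["non-impact printer"],
    "subordinate concept": ["bubble jet printer"],
    "coordinate concept": ["laser printer"],
}
variant_b = to_typed_relations(variant_a)
```

In variant b) a further relation type needs only a new value for "type of relation", whereas variant a) would need an additional field in the entry structure.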

Concept orientation

Concept orientation is covered in most of the theoretical publications and the practical guidelines for terminology work

But no data category for concept exists in ISO 12620:1999

The “concept” can only be represented by data modelling principles

Lexicographical model / entry

[Figure: one word linked to several meanings]

Terminological model / entry

descriptive terminology management

[Figure: one concept linked to several terms]

Terminological model / entry

prescriptive terminology management

[Figure: one concept linked to several terms, one of them shown in parentheses as “(term)”]
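The difference between the lexicographical model (one word, several meanings) and the terminological model (one concept, several terms) can be shown by regrouping the same data. The Python sketch below is an illustration only: the concept identifiers C1 to C3 and the reading of “mouse” as animal and pointing device are assumptions, and the terms are the examples used earlier in the slides.

```python
from collections import defaultdict

# Hypothetical word-oriented (lexicographical) records: word -> meanings.
lexical_entries = {
    "mouse": ["C1", "C2"],   # assumed: C1 = animal, C2 = pointing device
    "enter key": ["C3"],
    "return key": ["C3"],
}


def to_concept_entries(word_entries):
    """Regroup word -> meanings into concept -> terms."""
    concept_entries = defaultdict(list)
    for word, concept_ids in word_entries.items():
        for concept_id in concept_ids:
            concept_entries[concept_id].append(word)
    return dict(concept_entries)


concept_entries = to_concept_entries(lexical_entries)
```

After regrouping, the homonym “mouse” appears in two separate entries, while the synonyms “enter key” and “return key” share one entry.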

Concept orientation

All terminological information belonging to one concept, including all terms in all languages and all term-related and administrative data, must be stored in one terminological entry

concept = terminological entry

Concept orientation

Many of the older term banks and TMS are designed more for term orientation

Modern TMS not only follow the concept approach but also support features for consistent concept entries (preventing “double entries”)
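How a TMS prevents “double entries” is not described in the slides. One simple approach, sketched here in Python under that caveat, is to report every term that occurs in more than one entry of the same language. The function name, the termbase layout, and the entry IDs are hypothetical.

```python
def find_double_entry_candidates(entries):
    """Map (language, normalized term) to the IDs of all entries containing it,
    keeping only terms that occur in more than one entry."""
    seen = {}
    for entry_id, terms_by_language in entries.items():
        for language, terms in terms_by_language.items():
            for term in terms:
                # Normalize so that case and stray spaces do not hide a match.
                key = (language, term.strip().casefold())
                seen.setdefault(key, set()).add(entry_id)
    return {key: sorted(ids) for key, ids in seen.items() if len(ids) > 1}


# Hypothetical termbase: entry ID -> language -> terms.
termbase = {
    "E1": {"en": ["enter key", "return key"]},
    "E2": {"en": ["mouse"]},
    "E3": {"en": ["Return key"]},
}
candidates = find_double_entry_candidates(termbase)
```

A hit is only a candidate. The two entries may describe the same concept (a real double entry to be merged) or two different concepts that share a homonymous term, so the decision stays with the terminologist.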

Term autonomy

All terms belonging to one concept should be managed (in one terminological entry) as autonomous (repeatable) blocks of data categories without any preference for a specific term

• Therefore all terms can be documented with the relevant term-related data categories
• Term autonomy is necessary for the main term, all synonyms, all variants, and all short forms
• Term autonomy is not explicitly discussed in theoretical literature

LREC 2002, May 2002 K.-D. Schmitz, IIM, FH Köln

Term autonomy

[Figure: one concept, represented by ID-No. and/or classification / notation, with the columns Language 1, Language 2, Language 3, ...; each language lists its terms (Term 1, Term 2, Term 3), each with its own AuxInfo]
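Term autonomy combined with concept orientation can be sketched as a small data structure. The Python classes below are an assumption for illustration (the names `Term`, `ConceptEntry`, `aux_info`, and `add_term` are hypothetical): every term is a repeatable block with its own auxiliary information, and no term is stored in a privileged position.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Term:
    """Autonomous, repeatable block: every term carries its own data categories."""
    text: str
    aux_info: Dict[str, str] = field(default_factory=dict)


@dataclass
class ConceptEntry:
    """One terminological entry per concept, identified by an ID number."""
    concept_id: str
    terms: Dict[str, List[Term]] = field(default_factory=dict)  # language -> terms

    def add_term(self, language: str, term: Term) -> None:
        self.terms.setdefault(language, []).append(term)


# Hypothetical sample entry; ID and data category values are placeholders.
entry = ConceptEntry(concept_id="C3")
entry.add_term("en", Term("enter key", {"term type": "main term", "grammar": "noun"}))
entry.add_term("en", Term("return key", {"term type": "synonym", "grammar": "noun"}))
```

Because the synonym is a full `Term` block and not a plain string attached to the main term, it can be documented with the same term-related data categories as any other term.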

Term autonomy

[Figure: a concept with concept-level data (Domain, Notation, etc.) and the terms TermDE 1, TermDE 2, TermFR 1, TermEN 1, each with its own term-level data (Gram., Context, Style, etc.)]

Term autonomy (no concept-orientation)

[Figure: the terms TermDE 1, TermDE 2, TermFR 1, TermEN 1, each with its own data (Gram., Context, Style, etc., Domain) but without a shared concept; whether two terms are related as Variant, Abbreviation, Synonym, or Translation remains open (“??”)]

Data modelling: meta model (ISO 12200)


Data modelling: meta model (ISO 16642)


Design of a terminology DB

Data modelling taken from terminology management / terminology interchange (ISO 12200, ISO 12620, ISO 16642)

Monolingual terminological entries:

Terminological Data Collection
   Complementary Info
   Global Info
   Concept/Entry *
      Term *

Design of a terminology DB

Monolingual terminological entries: Example

Design of a terminology DB

Multilingual terminological entries:

Terminological Data Collection
   Complementary Info
   Global Info
   Concept/Entry *
      Language *
         Term *

Design of a terminology DB

Multilingual terminological entries: Example


Design of a terminology DB

Multilingual terminological entries: Example TBX

<?xml version='1.0'?>
<!DOCTYPE martif SYSTEM "./TBXcoreStructureDTD-v-1-0.DTD">
<martif type='TBX' xml:lang='en'>
  <martifHeader>…</martifHeader>
  <text>
    <body>
      <termEntry id='ID0000073578'>
        <descrip type='subjectField'>Materialbeschaffenheit</descrip>
        <langSet lang='de'>
          <ntig>
            <termGrp>
              <term>Opazität</term>
              <termNote type='partOfSpeech'>Substantiv</termNote>
              <termNote type='grammaticalGender'>f</termNote>
            </termGrp>
            <descripGrp>
              <descrip type='definition'>Maß für die Lichtundurchlässigkeit</descrip>
              <ref type='sourceIdentifier' target='DIN-6370.1996-05'>S. 383</ref>
            </descripGrp>
          </ntig>
        </langSet>
      </termEntry>
    </body>
  </text>
</martif>
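The TBX example follows the entry / language / term levels of the multilingual entry structure, and this can be checked by reading it back with a standard XML parser. The Python sketch below uses only `xml.etree.ElementTree` from the standard library. The sample string is a well-formed version of the slide's fragment without the DOCTYPE line and the elided header; the function name `read_entries` and the output dictionary layout are hypothetical.

```python
import xml.etree.ElementTree as ET

TBX_SAMPLE = """
<martif type="TBX" xml:lang="en">
  <text><body>
    <termEntry id="ID0000073578">
      <descrip type="subjectField">Materialbeschaffenheit</descrip>
      <langSet lang="de">
        <ntig>
          <termGrp>
            <term>Opazität</term>
            <termNote type="partOfSpeech">Substantiv</termNote>
            <termNote type="grammaticalGender">f</termNote>
          </termGrp>
          <descripGrp>
            <descrip type="definition">Maß für die Lichtundurchlässigkeit</descrip>
            <ref type="sourceIdentifier" target="DIN-6370.1996-05">S. 383</ref>
          </descripGrp>
        </ntig>
      </langSet>
    </termEntry>
  </body></text>
</martif>
"""


def read_entries(tbx_text):
    """Return one dict per termEntry: concept level -> language level -> term level."""
    root = ET.fromstring(tbx_text.strip())
    entries = []
    for term_entry in root.iter("termEntry"):
        languages = {}
        for lang_set in term_entry.findall("langSet"):
            terms = []
            for ntig in lang_set.findall("ntig"):
                source = ntig.find("descripGrp/ref[@type='sourceIdentifier']")
                terms.append({
                    "term": ntig.findtext("termGrp/term"),
                    "notes": {n.get("type"): n.text
                              for n in ntig.findall("termGrp/termNote")},
                    "definition": ntig.findtext(
                        "descripGrp/descrip[@type='definition']"),
                    # The source belongs to the definition, not to the term.
                    "definition_source":
                        None if source is None else source.get("target"),
                })
            languages[lang_set.get("lang")] = terms
        entries.append({
            "id": term_entry.get("id"),
            "subjectField": term_entry.findtext("descrip[@type='subjectField']"),
            "languages": languages,
        })
    return entries


entries = read_entries(TBX_SAMPLE)
```

The subject field sits at concept level, the grammatical notes sit inside the term block, and the source reference is grouped with the definition it documents.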

Terminology data modelling: benefits from standardization

basic principles and methods
• ISO 704: Terminology work - Principles and methods
• ISO 1087: Terminology work - Vocabulary
• DIN 2330: Begriffe und Benennungen - Allgemeine Grundsätze
• DIN 2342: Begriffe der Terminologielehre - Grundbegriffe

concept orientation + term autonomy

Terminology data modelling: benefits from standardization

working procedures
• ISO 12618: Computer applications in terminology - Design, implementation and maintenance of terminology management systems (review 26162)
• DIN 2339: „Terminologiearbeit“ (review)
• KÜDES: Empfehlungen für die Terminologiearbeit

guidelines for terminology management/work

Terminology data modelling: benefits from standardization

IT-realization: design
• ISO 12200: Computer applications in terminology - Machine-readable terminology interchange format (MARTIF) - Negotiated interchange
• ISO 16642: Computer applications in terminology - Terminological markup framework (TMF)
• ISO 12618: Computer applications in terminology - Design, implementation and maintenance of terminology management systems (review 26162)

data modelling + meta-model

Terminology data modelling: benefits from standardization

IT-realization: data categories
• ISO 12620: Computer applications in terminology - Data categories (review)

IT-realization: data interchange
• ISO 12200: MARTIF
• ISO 16642: TMF
• TBX: Termbase Exchange Format (LISA ISO)

Conclusion

A terminology management solution (or a termbase) has to be designed and implemented very thoroughly, especially for:
• selecting adequate data categories
• modelling the terminological entry
• choosing a terminology management tool (system / software)

Wrong decisions and mistakes can later be repaired only with great effort and at high cost

Thank you for your attention

Prof. Dr. Klaus-Dirk Schmitz
Fachhochschule Köln
Fakultät 03 - ITMK/IIM
Mainzer Str. 5
50678 Köln