
ICS-FORTH, February 19, 2002

The CIDOC CRM, a Standard for the Integration of Cultural Information

Martin Doerr

Foundation for Research and Technology - Hellas
Institute of Computer Science
Center for Cultural Informatics

Monterey, California, February 19, 2002

The CIDOC CRM - Outline

Problem statement – information diversity

Motivation example – the Yalta Conference

The goal of the CIDOC CRM

Presentation of contents

Methodological considerations

State of development

Conclusion

The CIDOC CRM - Cultural Diversity and Data Standards

Aspects of cultural information:

Collection description (art, archeology, natural history…)
Archives and literature (records, treaties, letters, artful works…)
Administration, preservation, conservation of material heritage
Science and scholarship: investigation, interpretation
Presentation: exhibition making, teaching, publication

But how to make a data standard?

Each aspect needs its methods, forms, communication means
Data overlap, but do not fit in one schema
Understanding lives from relationships, but how to express them?

The CIDOC CRM - Historical Archives…

Type: Text
Title: Protocol of Proceedings of Crimea Conference
Title.Subtitle: II. Declaration of Liberated Europe
Date: February 11, 1945
Creator: The Premier of the Union of Soviet Socialist Republics; The Prime Minister of the United Kingdom; The President of the United States of America
Publisher: State Department
Subject: Postwar division of Europe and Japan

“The following declaration has been approved: The Premier of the Union of Soviet Socialist Republics, the Prime Minister of the United Kingdom and the President of the United States of America have consulted with each other in the common interests of the people of their countries and those of liberated Europe. They jointly declare their mutual agreement to concert… …and to ensure that Germany will never again be able to disturb the peace of the world…”

[Diagram labels: Documents; Metadata; About…]

The CIDOC CRM - Images, non-verbose…

Type: Image
Title: Allied Leaders at Yalta
Date: 1945
Publisher: United Press International (UPI)
Source: The Bettmann Archive
Copyright: Corbis
References: Churchill, Roosevelt, Stalin

[Diagram labels: Photos, Persons; Metadata; About…]

The CIDOC CRM - Places and Objects

TGN Id: 7012124
Names: Yalta (C,V), Jalta (C,V)
Types: inhabited place (C), city (C)
Position: Lat: 44 30 N, Long: 034 10 E
Hierarchy: Europe (continent) <– Ukrayina (nation) <– Krym (autonomous republic)
Note: …Site of conference between Allied powers in WW II in 1945; …
Source: TGN, Thesaurus of Geographic Names

Title: Yalta, Crimean Peninsula
Publisher: Kurgan-Lisnet
Source: Liaison Agency

[Diagram labels: Places, Objects; About…]

The CIDOC CRM - Capture Underlying Semantics…

Diversity requires many (meta)data standards.
Related information does not match one format or query.

We have recognized (already in 1994, with Museum Benaki) that event-centric models can integrate many kinds of retrospective (historical) information, and maybe more.

CIDOC has engaged in interdisciplinary work to create semantic content models, so-called domain ontologies.

Ontologies are formalized knowledge: clearly defined concepts and relationships about possible states of affairs of the domain.

The CIDOC CRM - Explicit Events, Object Identity, Symmetry

[Diagram. Nodes: E7 Activity “Crimea Conference”; E65 Creation; E31 Document “Yalta Agreement”; E38 Image; E39 Actor (three instances); E53 Place 7012124; E52 Time-Span “February 1945”; E52 Time-Span “11-2-1945”. Edge labels: carried out, participated in, has created, falls within, took place at, refers to, at least covering, at most within.]
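The event-centric idea of this slide can be sketched as a small set of subject-property-object triples. This is only an illustration: the identifiers are hypothetical, and the way the edges connect the nodes is one plausible reading of the diagram, not a confirmed transcription of it.

```python
from collections import defaultdict

# Hypothetical identifiers; which node each edge connects is an assumption.
TRIPLES = [
    ("crimea_conference", "is_a", "E7 Activity"),
    ("crimea_conference", "took_place_at", "tgn:7012124"),
    ("crimea_conference", "has_time_span", "February 1945"),
    ("yalta_agreement_creation", "is_a", "E65 Creation"),
    ("yalta_agreement_creation", "has_created", "yalta_agreement"),
    ("yalta_agreement_creation", "falls_within", "crimea_conference"),
    ("yalta_agreement", "is_a", "E31 Document"),
    ("allied_leaders_photo", "is_a", "E38 Image"),
    ("allied_leaders_photo", "refers_to", "crimea_conference"),
    ("tgn:7012124", "is_a", "E53 Place"),
]


def neighbours(triples):
    """Index every non-class link in both directions (properties are symmetric)."""
    index = defaultdict(set)
    for subject, prop, obj in triples:
        if prop == "is_a":
            continue  # class membership is not a link between items
        index[subject].add(obj)
        index[obj].add(subject)
    return index


def related(start, triples):
    """Return everything reachable from `start` by following links in any direction."""
    index = neighbours(triples)
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in index[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen - {start}


print(sorted(related("yalta_agreement", TRIPLES)))
```

Because the conference is an explicit node, the document, the photograph and the gazetteer entry become connected even though their original records share no common field.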

The CIDOC CRM - Outcomes

The CIDOC Conceptual Reference Model

A collaboration with the International Council of Museums
An ontology of 75 classes and 106 properties for culture and more
With the capacity to explain dozens of (meta)data formats
Accepted by ISO TC46 in Sept. 2000, now Committee Draft

Serving as:

An intellectual guide to create schemata, formats, profiles
A language for analysis of existing sources for integration (“Identify elements with common meaning”)
A transport format for data integration / migration / Internet

The CIDOC CRM - The Intellectual Role of the CRM

[Diagram. Nodes: World Phenomena; Conceptualization; Data structures & Presentation models; Data in various forms (legacy systems, databases). Edge labels: abstracts from / approximates; explains, motivates; organize; refer to.]

The CIDOC CRM - Encoding of the CIDOC CRM

The CIDOC CRM is a formal ontology (defined in TELOS), but CRM instances can be encoded in many forms: RDBMS, ooDBMS, XML, RDF(S).

It uses multiple isA to achieve uniqueness of properties in the schema.

It uses multiple instantiation to be able to combine classes whose combination is not always valid (e.g. destruction and activity).

It uses multiple isA for properties to capture different abstractions of relationships.

Methodological aspects:

Entities are introduced only if they anchor a property (i.e. if they are structurally relevant).

Frequent joins (short-cuts) of complex data paths, for data found in different degrees of detail, are modeled explicitly.

Justifying Multiple Inheritance: achieving uniqueness of properties

Single Inheritance form:

Museum Artefact (museum number, material, collection)
    Canister (container, lid)
    Ecclesiastical item (belongs to church)
        Holy Bread Basket (container, lid)

Repetition of properties!

Multiple Inheritance form:

Museum Artefact (museum number, material, collection)
    Canister (container, lid)
    Ecclesiastical item (belongs to church)
Holy Bread Basket isA Ecclesiastical item, Canister

Unique identity of properties!
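The multiple inheritance form can be tried out in a few lines of Python. This is a minimal sketch, not part of the CRM definition: the `PROPERTIES` attribute and both helper functions are hypothetical names chosen for the illustration.

```python
class MuseumArtefact:
    PROPERTIES = ("museum number", "material", "collection")


class EcclesiasticalItem(MuseumArtefact):
    PROPERTIES = ("belongs to church",)


class Canister(MuseumArtefact):
    PROPERTIES = ("container", "lid")


class HolyBreadBasket(EcclesiasticalItem, Canister):
    PROPERTIES = ()  # declares nothing itself; everything is inherited


def all_properties(cls):
    """Collect declared and inherited properties, most general class first."""
    collected = []
    for klass in reversed(cls.__mro__):
        for prop in vars(klass).get("PROPERTIES", ()):
            if prop not in collected:
                collected.append(prop)
    return collected


def declaring_classes(cls, prop):
    """Name every class in the ancestry of `cls` that declares `prop` itself."""
    return [k.__name__ for k in cls.__mro__ if prop in vars(k).get("PROPERTIES", ())]


print(all_properties(HolyBreadBasket))
print(declaring_classes(HolyBreadBasket, "lid"))
```

With single inheritance, "lid" would have to be declared once on Canister and again on Holy Bread Basket; here the second helper finds exactly one declaring class.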

Possible Encoding of Data as CIDOC CRM instance (XML-style)

Epitaphios GE34604 (entity Man-made Object)
    title_changed_by, custody_changed_by
        Transfer of Epitaphios GE34604 (entity Transfer of Custody, Acquisition)
            custody_surrendered_by
                Metropolitan Church of the Greek Community of Ankara (entity Actor)
            transfers_title_from
                Metropolitan Church of the Greek Community of Ankara (entity Actor)
            custody_received_by
                Museum Benaki (entity Actor)
            transfers_title_to
                Exchangeable Fund of Refugees (entity Legal Body)
                    has type
                        national foundation (entity Type)
            carried out by
                Exchangeable Fund of Refugees (entity Actor)
            has time-span
                GE34604_transfer_time (entity Time-Span)
                    at most within
                        1923 - 1928 (entity Time Primitive)
            took place at
                Greece (entity Place)    [TGN data]
                    has type
                        nation (entity Type), republic (entity Type)
                    falls within
                        Europe (entity Place)
                            has type
                                continent (entity Type)
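A nested encoding of this kind can be produced with Python's standard `xml.etree.ElementTree` module. The sketch below covers only part of the instance, and the element and attribute names (`entity`, `property`, `class`, `name`) are assumptions made for the illustration, not a format prescribed by the CRM.

```python
import xml.etree.ElementTree as ET


def entity(name, crm_class, *links):
    """Build an <entity> element; each link is a (property_name, target_element) pair."""
    element = ET.Element("entity", {"name": name, "class": crm_class})
    for property_name, target in links:
        prop = ET.SubElement(element, "property", {"name": property_name})
        prop.append(target)
    return element


transfer = entity(
    "Transfer of Epitaphios GE34604", "Transfer of Custody",
    ("custody_surrendered_by",
     entity("Metropolitan Church of the Greek Community of Ankara", "Actor")),
    ("custody_received_by", entity("Museum Benaki", "Actor")),
    ("has_time_span",
     entity("GE34604_transfer_time", "Time-Span",
            ("at_most_within", entity("1923 - 1928", "Time Primitive")))),
    ("took_place_at",
     entity("Greece", "Place", ("falls_within", entity("Europe", "Place")))),
)
epitaphios = entity("Epitaphios GE34604", "Man-made Object",
                    ("custody_changed_by", transfer))

xml_text = ET.tostring(epitaphios, encoding="unicode")
print(xml_text)
```

The same instance could equally be stored as RDF statements or relational rows; the nesting is a choice of serialization, not of the model.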

The CIDOC CRM - What it does and what it does not

Idea: Not being prescriptive creates much flexibility!

The CRM can be used as a data format for transport / migration / presentation (but it is not designed for data entry)

It does not propose what to describe

It allows one to interpret what museums and archives actually describe

It tries to formalize concepts which help data integration and resource discovery (not all information)

Focused on data structure semantics, integration, information about the past

The CIDOC CRM - Top-level Entities relevant for Integration

[Diagram. Nodes: Actors; Types; Conceptual Objects; Physical Entities; Temporal Entities; Appellations; Places; Time-Spans. Edge labels: participate in; affect or / refer to; refer to / refine; refer to / identify; location; at; within.]

The CIDOC CRM - A Classification of its Relationships

Identification of real world items by real world names.

Classification of real world items.

Part-decomposition and structural properties of Conceptual & Physical Objects, Periods, Actors, Places and Times.

Participation of persistent items in temporal entities.
    This creates a notion of history: “world-lines” meeting in space-time.

Location of periods in space-time and physical objects in space.

Influence of objects on activities and products and vice-versa.

Reference of information objects to any real-world item.

The CIDOC CRM - Example: The Temporal Entity Hierarchy

The CIDOC CRM - Example: Temporal Entity

Temporal Entity

Scope Note: This is an abstract entity and has no examples. It groups together things such as events, states and other phenomena which are limited in time. It is specialized into Period, which holds on some geographic area, and Condition State, which holds for, on, or over a certain object.

It consists of related or similar phenomena.
It is limited in time and is the only link to time, but is not time itself.
It spreads out over a place or object (physical or not).
It is the core of a model of physical history, open for unlimited specialisation.

The CIDOC CRM - Example: Temporal Entity Subclasses

Period
    binds together related phenomena
    introduces inclusion topologies - parts etc.
    is confined in space and time
    the basic unit for temporal-spatial reasoning

Event
    looks at the input and the outcome
    the basic unit for causal reasoning
    each event is a period if we study the process

Activity
    brings the people in
    adds purpose

The CIDOC CRM - Temporal Entity: Main Properties

Temporal Entity Properties:
    has time-span (is time-span of): Time-Span

Period Properties:
    consists of (forms part of): Period
    falls within (contains): Period
    took place at (witnessed): Place

Event Properties:
    had participants (participated in): Actor
    occurred in the presence of (was present at): Stuff

Activity Properties:
    carried out by (performed): Actor
    had specific purpose (was purpose of): Activity
    had as general purpose (was purpose of): Type
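How the properties above accumulate down the hierarchy can be sketched with two lookup tables. This assumes the chain Temporal Entity > Period > Event > Activity, as the preceding slides describe it; the table and function names are hypothetical.

```python
# Direct superclass of each class (Condition State is left out for brevity).
SUPERCLASS = {
    "Temporal Entity": None,
    "Period": "Temporal Entity",
    "Event": "Period",
    "Activity": "Event",
}

# Properties declared on each class, mapped to their range.
DECLARED = {
    "Temporal Entity": {"has time-span": "Time-Span"},
    "Period": {
        "consists of": "Period",
        "falls within": "Period",
        "took place at": "Place",
    },
    "Event": {
        "had participants": "Actor",
        "occurred in the presence of": "Stuff",
    },
    "Activity": {
        "carried out by": "Actor",
        "had specific purpose": "Activity",
        "had as general purpose": "Type",
    },
}


def properties_of(crm_class):
    """Return declared plus inherited properties of a class as {property: range}."""
    chain = []
    current = crm_class
    while current is not None:
        chain.append(current)
        current = SUPERCLASS[current]
    merged = {}
    for klass in reversed(chain):  # most general class first
        merged.update(DECLARED[klass])
    return merged


for name, value_range in properties_of("Activity").items():
    print(f"{name}: {value_range}")
```

An Activity therefore answers "where?" and "when?" through properties it never declares itself, which is what lets one query run over events of very different kinds.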

The CIDOC CRM - Termini post quem / ante quem

[Diagram. Nodes: Pope Leo I; Attila; “Attila meeting Leo I”; Birth of Leo I; Birth of Attila; Death of Leo I; Death of Attila; time-spans at most within AD 452, AD 453 and AD 461. Edge labels: carried out by (performed); has time-span (is time-span of); was death of (died in); brought into life (was born); at most within; before. Deduction: before. Legend: had participants; took out of existence; brought into existence.]
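The deduction in this diagram can be reproduced with a small constraint propagation. The sketch below assumes that a person's birth precedes any activity they carry out, which in turn precedes their death; the assignment of AD 453 and AD 461 to the two deaths follows the usual historical dating and is an assumption about the diagram. The meeting's own date is deliberately left out to show what follows from the deaths alone. All names are hypothetical.

```python
# Latest possible year (AD) of events whose time-span is known ("at most within").
LATEST = {"death_of_attila": 453, "death_of_leo": 461}

# (earlier, later) pairs: birth before activity, activity before death.
BEFORE = [
    ("birth_of_attila", "meeting"),
    ("birth_of_leo", "meeting"),
    ("meeting", "death_of_attila"),
    ("meeting", "death_of_leo"),
]


def terminus_ante_quem(event, before, latest):
    """Latest year by which `event` must have happened, or None if unconstrained.

    Follows the 'before' links forward; assumes the links contain no cycle.
    """
    candidates = []
    if event in latest:
        candidates.append(latest[event])
    for earlier, later in before:
        if earlier == event:
            bound = terminus_ante_quem(later, before, latest)
            if bound is not None:
                candidates.append(bound)
    return min(candidates) if candidates else None


print(terminus_ante_quem("meeting", BEFORE, LATEST))
print(terminus_ante_quem("birth_of_leo", BEFORE, LATEST))
```

A terminus post quem works the same way in the opposite direction, taking the maximum of the earliest known dates of everything that must come before.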

The CIDOC CRM - The Participation Properties

Property (inverse)                               Domain                Range
had participants (participated in)               Event                 Actor
carried out by (performed)                       Activity              Actor
transferred title to (acquired title of)         Acquisition           Actor
transferred title from (surrendered title of)    Acquisition           Actor
custody surrendered by (surrendered custody)     Transfer of Custody   Actor
custody received by (received custody)           Transfer of Custody   Actor
has formed (was formed by)                       Formation             Group
dissolved (was dissolved by)                     Dissolution           Group
by mother (gave birth)                           Birth                 Person
brought into life (was born)                     Birth                 Person
was death of (died in)                           Death                 Person

The CIDOC CRM - Activities

[Diagram. Nodes: CIDOC Entity; CIDOC Notion; String; Type; Event; Activity. Edge labels: has notes; has type; is type of. Cardinalities shown: 0,n 1,1 and 0,n 1,n.]

The CIDOC CRM - Activities: Measurement

[Diagram. Nodes: Attribute Assignment; Measurement; Physical Entity; Dimension (value, unit). Edge labels: has dimension (is dimension of); was measured (measured); observed dimension (was observed). All cardinalities shown are 0,n.]

The CIDOC CRM - Activities: Condition Assessment

[Diagram. Nodes: Activity; Condition Assessment; Physical Entity; Condition State; Actor; Temporal Entity. Edge labels: has conditions (condition of); assessed by (concerns); has identified (identified by); carried out by (performed), in the role of. Cardinalities shown: 0,n and 1,1.]

The CIDOC CRM - Activities: Acquisition

[Diagram. Nodes: Activity; Acquisition; Actor; Physical Object. Edge labels: is current owner of (has current owner); is former or current owner of (has former or current owner); acquires title of (transferred title to); transferred title of (changed ownership by); surrenders title of (transferred title from). All cardinalities shown are 0,n.]

The CIDOC CRM - Activities: Move

[Diagram. Nodes: Activity; Move; Physical Object; Place; Type. Edge labels: moved by (moved); moved to (occupied); moved from (vacated); has current location (currently holds); has current permanent location (is ~ of); has former or current location (is ~ of); had as general purpose (was purpose of); has specific purpose (was purpose of). Cardinalities shown: 0,n and 0,1.]

The CIDOC CRM - Activities: Modification/Production

[Diagram. Nodes: Activity; Modification; Physical Entity; Actor; Type; Man-Made Entity; Design or Procedure; Material. Edge labels: has produced (was produced by); carried out by (performed), in the role of; used general technique (was technique of); used specific technique (was used by); consists of (is incorporated in); usually employs (is usually employed by). All cardinalities shown are 0,n.]

The CIDOC CRM - Entity 11: Modification

Properties:
    is called (identifies): Appellation
    has type (is type of): Type
    had participants (participated in): Actor
    carried out by (performed): Actor (in the role of: Type)
    has produced (was produced by): Physical Man-Made Stuff
    took into account (was taken into account by): Conceptual Object
    occurred in the presence of (was present at): Stuff
    used object (was used for): Physical Object (mode of use: String)
    used general technique (was technique of): Type
    used specific technique (was used by): Design or Procedure
    was motivation for (motivated): Conceptual Object
    motivated the creation of (was created for): Conceptual Object
    was intended use of (was made for): Man-Made Object (mode of use: String)
    had specific purpose (was purpose of): Activity
    had as general purpose (was purpose of): Type

[Margin labels on the slide mark groups of these as declared properties and inherited properties.]

The CIDOC CRM - Time Uncertainty, Certainty and Duration

[Diagram. A time axis with the labels: before; after; at most within; at least covering; duration. A further label reads “intensity”.]
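The two bounds named on this slide can be modelled as an outer and an inner interval. This is a minimal sketch under stated assumptions: time is a plain number axis (years here), and the class, its methods and `definitely_before` are hypothetical names, not CRM definitions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Interval = Tuple[int, int]  # (start, end) positions on a numeric time axis


@dataclass(frozen=True)
class TimeSpan:
    at_most_within: Optional[Interval] = None     # outer bound: nothing happened outside
    at_least_covering: Optional[Interval] = None  # inner bound: certainly ongoing here

    def is_consistent(self) -> bool:
        """The inner bound, if both are known, must lie inside the outer bound."""
        if self.at_most_within is None or self.at_least_covering is None:
            return True
        outer_start, outer_end = self.at_most_within
        inner_start, inner_end = self.at_least_covering
        return outer_start <= inner_start <= inner_end <= outer_end

    def duration_bounds(self):
        """Return (shortest, longest) possible duration; None where unknown."""
        shortest = longest = None
        if self.at_least_covering is not None:
            shortest = self.at_least_covering[1] - self.at_least_covering[0]
        if self.at_most_within is not None:
            longest = self.at_most_within[1] - self.at_most_within[0]
        return shortest, longest


def definitely_before(a: TimeSpan, b: TimeSpan) -> bool:
    """True only if a's outer bound ends before b's outer bound starts."""
    if a.at_most_within is None or b.at_most_within is None:
        return False
    return a.at_most_within[1] < b.at_most_within[0]


span = TimeSpan(at_most_within=(1923, 1928), at_least_covering=(1925, 1926))
print(span.is_consistent(), span.duration_bounds())
```

Keeping both bounds lets imprecise dates take part in reasoning: "before" can be asserted whenever the outer bounds do not overlap, without pretending to know the exact date.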

The CIDOC CRM - Time-Span

[Diagram. Nodes: CIDOC Entity; Temporal Entity; Condition State; Period; Event; Time Span; Date; Appellation; Time Appellation; Period Appellation; Place. Edge labels: has time-span (is time-span of); falls within (contains); consists of (forms part of); is called (identifies); is identified by (identifies); took place at (witnessed).]

The CIDOC CRM - Example: Place

Place

A place is an extent in space, determined diachronically with respect to a larger, persistent constellation of matter, often continents. It may be determined by coordinates, geophysical features, artefacts, communities, political systems or objects, but it is not identical to any of them.

A “CRM Place” is not a landscape, not a seat - it is an abstraction from temporal changes - “the place where…”

A means to reason about the “where” in multiple reference systems.

Examples: figures from the bow of a ship, African dinosaur foot-prints in Portugal

The CIDOC CRM - Place

[Diagram. Nodes: Place; Place Appellation; Address; Place Name; Spatial Coordinates; Section Definition; Physical Object; Move; Production. Edge labels: consists of (forms part of); defines section of (has section definition); is located on or within (has section); identifies (is identified by); has former or current location (is former or current location of); moved to (occupied); moved from (vacated); moved (moved by); has produced (was produced by); happened at (witnessed).]

The CIDOC CRM - Extension Example: Getty’s TGN

[Diagram. Nodes: Attribute Assignment; Place Naming; Group; Actor; Community; Place; Place Appellation; Time-Span; Period. Edge labels: carries out; assigns name; to community; to place; identified by; is identified by (identifies); falls within; took place at; has time-span.]

Example from the TGN

[Diagram. Nodes: Nineveh naming (two instances); People of Iraq; TGN7017998; TGN1001441; City of Nineveh; Kuyunjik; Nineveh; 1st mill. BC; 20th century. Edge labels: carry out; assigns name; to community; to place; identified by; is identified by (identifies); falls within; took place at; has time-span.]

The CIDOC CRM - Physical Stuff

The CIDOC CRM - Conceptual Object

The CIDOC CRM - Actors

The CIDOC CRM - Application: Mapping DC to the CIDOC CRM

Example: Partial DC Record about a Technical Report

Type: text
Title: Mapping of the Dublin Core Metadata Element Set to the CIDOC CRM
Creator: Martin Doerr
Publisher: ICS-FORTH
Identifier: FORTH-ICS / TR 274 July 2000
Language: English

The CIDOC CRM - Application: Mapping DC to the CIDOC CRM (RDF style)

[Diagram. Nodes: E33 Linguistic Object (Object: FORTH-ICS / TR-274 July 2000); E41 Appellation (Name: Mapping of the Dublin Core Metadata Element Set to the CIDOC CRM…); E75 Conceptual Object Appellation (Name: FORTH-ICS / TR-274 July 2000); E55 Type (Type: FORTH Identifier); E56 Language (Lang.: English); E65 Creation (Event: 0001); E39 Actor (Actor: 0001); E72 Actor Appellation (Name: Martin Doerr); E7 Activity (Event: 0002); E55 Type (Type: Publication); E39 Actor (Actor: 0002); E72 Actor Appellation (Name: ICS-FORTH). Edge labels: is identified by; was created by; carried out by; was taken into account by; has type; has language.]
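A mapping of this kind can be sketched as a function that turns a flat DC record into triples. The wiring between nodes below is an assumed reading of the diagram (the transcript preserves the labels but not the arrows), and `dc_to_crm` with its node naming scheme is hypothetical.

```python
from itertools import count


def dc_to_crm(record, subject="E33 Linguistic Object"):
    """Translate a flat Dublin Core record into (subject, property, object) triples."""
    event_ids = count(1)
    actor_ids = count(1)
    triples = []

    def add_agent_event(link, event_class, name, event_type=None):
        # An agent is never attached directly: an explicit event sits in between.
        event = f"{event_class} Event:{next(event_ids):04d}"
        actor = f"E39 Actor Actor:{next(actor_ids):04d}"
        triples.append((subject, link, event))
        if event_type:
            triples.append((event, "has type", f"E55 Type {event_type}"))
        triples.append((event, "carried out by", actor))
        triples.append((actor, "is identified by", f"E72 Actor Appellation {name}"))

    if "Title" in record:
        triples.append((subject, "is identified by",
                        f"E41 Appellation {record['Title']}"))
    if "Identifier" in record:
        triples.append((subject, "is identified by",
                        f"E75 Conceptual Object Appellation {record['Identifier']}"))
    if "Language" in record:
        triples.append((subject, "has language", f"E56 Language {record['Language']}"))
    if "Creator" in record:
        add_agent_event("was created by", "E65 Creation", record["Creator"])
    if "Publisher" in record:
        add_agent_event("was taken into account by", "E7 Activity",
                        record["Publisher"], event_type="Publication")
    return triples


RECORD = {
    "Title": "Mapping of the Dublin Core Metadata Element Set to the CIDOC CRM",
    "Creator": "Martin Doerr",
    "Publisher": "ICS-FORTH",
    "Identifier": "FORTH-ICS / TR 274 July 2000",
    "Language": "English",
}
TRIPLES = dc_to_crm(RECORD)
for triple in TRIPLES:
    print(triple)
```

The flat fields Creator and Publisher each expand into an event with its own actor, which is where the CRM form gains the connectivity that the DC record lacks.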

The CIDOC CRM - Application: Mapping DC to the CIDOC CRM

Example: Partial DC Record about a painting

Type: image
Title: Garden of Paradise
Creator: Master of the Paradise Garden
Publisher: Staedelsches Kunstinstitut

The CIDOC CRM - Application: Mapping DC to the CIDOC CRM

[Diagram. Nodes: E23 Iconographic Object (Object: PA 310-1A??); E41 Appellation (Name: Garden of Paradise…); E55 Type (AAT: painting); E12 Production (Event: 0003); E39 Actor (ULAN: 4162); E72 Actor Appellation (Name: Master of the Paradise Garden); E31 Document (Docu: 0001); E55 Type (DCT1: image); E65 Creation (Event: 0004); E55 Type (Type: Publication Creation); E39 Actor (Actor: 0003); E72 Actor Appellation (Name: Staedelsches Kunstinstitut). Edge labels: is identified by; has type; was produced by; carried out by; is documented in; was created by.]

The CIDOC CRM - Application: Repository Indexing

[Diagram. Labels: CIDOC CRM; CRM entities (Actors, Events, Objects); Ontology expansion; Thesauri extent; Derived knowledge data (e.g. RDF); Sources and metadata (XML/RDF); Background knowledge / Authorities.]

The CIDOC CRM - Proposition about Restructuring Data

Data acquisition needs a different structure from presentation:

Acquisition (can be motivated by the CRM):
    sequence and order, completeness, constraints to guide and control data entry
    ergonomic, case-specific language, optimized to specialist needs
    often working on series of analogous items

Integration / comprehension (target of the CRM):
    restore connectivity with related subjects
    match and relate by underlying common concepts
    no preferred direction or subject

Presentation, story-telling (can be based on the CRM):
    explore context, paths, analogies orthogonal to data acquisition
    present in order, allow for digestion
    support abstraction

The CIDOC CRM - About the CRM

As an explanatory and mediation model, the CRM:

does not enforce constraints
    optional properties, multivalued properties, multiple instantiation

contains redundant paths
    “short cuts” of secondary processes, complex indirections

contains abstractions at various levels
    of entities and
    of attributes/properties

The CIDOC CRM - Frequently asked questions

1. Will the CRM require me to change my documentation?

Answer: No. It makes no prescriptions. You may, however, orient your document structures according to CRM constructs, so that an automatic transfer into a CRM-compatible form is possible.

2. What is the benefit of standardisation?

Answer: You can communicate the meaning of your data structures in a unique way to any user of the CRM.

The CIDOC CRM - Frequently asked questions

3. What does compatibility with the CRM mean?

Answer: That the parts of my data structure with the same scope can be explained by the CRM as:

Subset: data can be transformed into a CRM instance and back.
Superset: A CRM instance can be transformed... and back.
Overlap: equivalent parts can be transformed...
Extension: The CRM can be consistently extended by refinement or generalization such that my data structure becomes a subset.

Still to be defined precisely. Major consensus issue.

The CIDOC CRM - Frequently asked questions

4. What does the CRM cover?

Answer: Potentially the whole world as physical history in human perception!

At the moment: CIDOC Information Categories, DC, EAD, 90% of SPECTRUM, most of FRBR; it fits with OPENGIS and many proprietary formats.

The mapping exercise has verified the stability of the model, its extensibility and the effectiveness of the methodology.

Experience from 5-10 years of development by interdisciplinary groups, several implementations.

The CIDOC CRM - Benefits of the CRM (From Tony Gill)

Elegant and simple compared to comparable Entity-Relationship models

Coherently integrates information at varying degrees of detail

Readily extensible through O-O class typing and specializations

Richer semantic content; allows inferences to be made from underspecified data elements

Designed for mediation of cultural heritage information

The CIDOC CRM - State of Development

Proposed to ISO as Committee Draft. The structure has been stable for four years.

Currently the SIG is elaborating minor extensions to satisfy archeology, natural history and relationships to digital libraries. The process will close in July.

Elaboration of improved documentation, scope note formulation and presentation until the end of 2002.

Acceptance as a standard is expected by 2003.

The CIDOC CRM - Conclusions

The CRM is NOT a metadata standard.
It should become our language for semantic interoperability.
It is a Conceptual Reference Model for analyzing and designing cultural information systems.

The CRM is in the ISO standardization process:

Dissemination for wide understanding and consensus
Extended application tests
Elaboration of details and documentation
Consensus