ICS-FORTH, February 19, 2002
The CIDOC CRM, a Standard for the Integration of Cultural Information
Martin Doerr
Center for Cultural Informatics, Institute of Computer Science
Foundation for Research and Technology - Hellas
Monterey, California, February 19, 2002
The CIDOC CRM: Outline
Problem statement – information diversity
Motivation example – the Yalta Conference
The goal of the CIDOC CRM
Presentation of contents
Methodological considerations
State of development
Conclusion
The CIDOC CRM: Cultural Diversity and Data Standards
Aspects of cultural information:
- Collection description (art, archaeology, natural history, ...)
- Archives and literature (records, treaties, letters, artful works, ...)
- Administration, preservation, conservation of material heritage
- Science and scholarship: investigation, interpretation
- Presentation: exhibition making, teaching, publication
But how to make a data standard?
- Each aspect needs its own methods, forms and means of communication
- Data overlap, but do not fit into one schema
- Understanding lives from relationships, but how to express them?
The CIDOC CRM: Historical Archives...
Documents
Metadata:
Type: Text
Title: Protocol of Proceedings of Crimea Conference
Title.Subtitle: II. Declaration of Liberated Europe
Date: February 11, 1945
Creator: The Premier of the Union of Soviet Socialist Republics; The Prime Minister of the United Kingdom; The President of the United States of America
Publisher: State Department
Subject: Postwar division of Europe and Japan
About...
“The following declaration has been approved: The Premier of the Union of Soviet Socialist Republics, the Prime Minister of the United Kingdom and the President of the United States of America have consulted with each other in the common interests of the people of their countries and those of liberated Europe. They jointly declare their mutual agreement to concert ... and to ensure that Germany will never again be able to disturb the peace of the world ...”
The CIDOC CRM: Images, non-verbose...
Photos, Persons
Metadata:
Type: Image
Title: Allied Leaders at Yalta
Date: 1945
Publisher: United Press International (UPI)
Source: The Bettmann Archive
Copyright: Corbis
References: Churchill, Roosevelt, Stalin
About...
The CIDOC CRM: Places and Objects
Places, Objects
TGN Id: 7012124
Names: Yalta (C,V), Jalta (C,V)
Types: inhabited place (C), city (C)
Position: Lat: 44 30 N, Long: 034 10 E
Hierarchy: Europe (continent) <– Ukrayina (nation) <– Krym (autonomous republic)
Note: ... Site of conference between Allied powers in WW II in 1945; ...
Source: TGN, Thesaurus of Geographic Names
About...
Title: Yalta, Crimean Peninsula
Publisher: Kurgan-Lisnet
Source: Liaison Agency
The CIDOC CRM: Capture Underlying Semantics...
- Diversity requires many (meta)data standards
- Related information does not match one format or query
We recognized (as early as 1994, with Museum Benaki): event-centric models can integrate many kinds of retrospective (historical) information (and maybe more).
CIDOC has engaged in interdisciplinary work to create semantic content models, so-called domain ontologies.
Ontologies are formalized knowledge: clearly defined concepts and relationships about possible states of affairs of the domain.
The CIDOC CRM: Explicit Events, Object Identity, Symmetry
[Diagram. Nodes: E7 Activity "Crimea Conference"; E65 Creation; E31 Document "Yalta Agreement"; E38 Image; three E39 Actor nodes; E53 Place 7012124; E52 Time-Span February 1945; E52 Time-Span 11-2-1945. Property labels: participated in; carried out; has created; refers to; took place at; falls within; at least covering; at most within.]
The CIDOC CRM: Outcomes
The CIDOC Conceptual Reference Model:
- A collaboration with the International Council of Museums
- An ontology of 75 classes and 106 properties for culture and more
- With the capacity to explain dozens of (meta)data formats
- Accepted by ISO TC46 in Sept. 2000, now Committee Draft
Serving as:
- An intellectual guide to create schemata, formats, profiles
- A language for analysis of existing sources for integration: “Identify elements with common meaning”
- A transportation format for data integration / migration / Internet
The CIDOC CRM: The Intellectual Role of the CRM
[Diagram. Boxes: World Phenomena; Conceptualization; Data structures & Presentation models; Data in various forms; Legacy systems; Databases. Arrow labels: abstracts from; approximates; explains; motivates; organize; refer to; "?".]
The CIDOC CRM: Encoding of the CIDOC CRM
The CIDOC CRM is a formal ontology (defined in TELOS), but CRM instances can be encoded in many forms: RDBMS, ooDBMS, XML, RDF(S).
- Uses multiple isA to achieve uniqueness of properties in the schema.
- Uses multiple instantiation to be able to combine classes whose combination is not always valid (e.g. destruction and activity).
- Uses multiple isA for properties to capture different abstractions of relationships.
Methodological aspects:
- Entities are introduced only if they anchor a property (i.e. if they are structurally relevant).
- Frequent joins (short-cuts) of complex data paths are modeled explicitly, for data found in different degrees of detail.
Justifying Multiple Inheritance: achieving uniqueness of properties
Single inheritance form:
  Museum Artefact: museum number, material, collection
    Ecclesiastical item: belongs to church
      Holy Bread Basket: container, lid
    Canister: container, lid
  Repetition of properties!
Multiple inheritance form:
  Museum Artefact: museum number, material, collection
    Ecclesiastical item: belongs to church
    Canister: container, lid
      Holy Bread Basket (isA Ecclesiastical item and Canister)
  Unique identity of properties!
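The contrast between the two forms can be sketched in Python, which supports multiple inheritance. This is only an illustration: the `PROPERTIES` attribute and the helper `declaring_classes` are hypothetical names introduced here, and the class layout is read off the slide rather than taken from the CRM definition.

```python
# Sketch: count how often a property has to be declared in each schema form.
# PROPERTIES and declaring_classes are illustrative names, not CRM constructs.

class MuseumArtefact:
    PROPERTIES = ("museum number", "material", "collection")

class EcclesiasticalItem(MuseumArtefact):
    PROPERTIES = ("belongs to church",)

class Canister(MuseumArtefact):
    PROPERTIES = ("container", "lid")

# Single inheritance form: the basket has one parent, so it must repeat
# the canister properties itself.
class HolyBreadBasketSingle(EcclesiasticalItem):
    PROPERTIES = ("container", "lid")

# Multiple inheritance form: the basket inherits from both parents and
# declares nothing of its own.
class HolyBreadBasketMultiple(EcclesiasticalItem, Canister):
    PROPERTIES = ()

def declaring_classes(prop, classes):
    """Names of the classes that declare `prop` themselves (not via inheritance)."""
    return sorted(c.__name__ for c in classes if prop in vars(c).get("PROPERTIES", ()))

single_form = [MuseumArtefact, EcclesiasticalItem, Canister, HolyBreadBasketSingle]
multiple_form = [MuseumArtefact, EcclesiasticalItem, Canister, HolyBreadBasketMultiple]

print(declaring_classes("lid", single_form))    # two declarations of the same property
print(declaring_classes("lid", multiple_form))  # one declaration
```

In the multiple inheritance form every property has exactly one declaring class, which is what lets a query for "lid" address one property instead of several look-alikes.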
Possible Encoding of Data as CIDOC CRM instance (XML-style)
Epitaphios GE34604 (entity Man-made Object)
  custody_changed_by, title_changed_by
    Transfer of Epitaphios GE34604 (entity Transfer of Custody, Acquisition)
      custody_surrendered_by
        Metropolitan Church of the Greek Community of Ankara (entity Actor)
      transfers_title_from
        Metropolitan Church of the Greek Community of Ankara (entity Actor)
      custody_received_by
        Museum Benaki (entity Actor)
      transfers_title_to
        Exchangeable Fund of Refugees (entity Legal Body)
          has type
            national foundation (entity Type)
      carried out by
        Exchangeable Fund of Refugees (entity Actor)
      has time-span
        GE34604_transfer_time (entity Time-Span)
          at most within
            1923 - 1928 (entity Time Primitive)
      took place at
        Greece (entity Place) (TGN data)
          has type
            nation (entity Type)
            republic (entity Type)
          falls within
            Europe (entity Place)
              has type
                continent (entity Type)
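A nested instance like the Epitaphios transfer can be held in a very small data structure. The sketch below is one possible in-memory form, not the encoding used by the CRM authors: the `Entity` class, its `link` method and the `outline` function are hypothetical helpers, and only part of the listing is reproduced.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    """An instance with its CRM classes and outgoing (property, target) links."""
    name: str
    classes: tuple
    links: list = field(default_factory=list)

    def link(self, prop, target):
        self.links.append((prop, target))
        return self  # allow chaining

def outline(entity, depth=0):
    """Render an entity and everything reachable from it as indented lines."""
    pad = "  " * depth
    lines = [f"{pad}{entity.name} (entity {', '.join(entity.classes)})"]
    for prop, target in entity.links:
        lines.append(f"{pad}  {prop}")
        lines.extend(outline(target, depth + 2))
    return lines

europe = Entity("Europe", ("Place",)).link("has type", Entity("continent", ("Type",)))
greece = (Entity("Greece", ("Place",))
          .link("has type", Entity("nation", ("Type",)))
          .link("falls within", europe))
span = Entity("GE34604_transfer_time", ("Time-Span",)).link(
    "at most within", Entity("1923 - 1928", ("Time Primitive",)))
transfer = (Entity("Transfer of Epitaphios GE34604", ("Transfer of Custody", "Acquisition"))
            .link("custody_received_by", Entity("Museum Benaki", ("Actor",)))
            .link("has time-span", span)
            .link("took place at", greece))
epitaphios = Entity("Epitaphios GE34604", ("Man-made Object",)).link(
    "custody_changed_by", transfer)

lines = outline(epitaphios)
print("\n".join(lines))
```

Note that the transfer carries two classes at once (Transfer of Custody and Acquisition), which is the multiple instantiation mentioned on the encoding slide.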
The CIDOC CRM: What it does and what it does not do
Idea: Not being prescriptive creates much flexibility!
- The CRM can be used as a data format for transport / migration / presentation (but it is not designed for data entry)
- It does not propose what to describe
- It allows one to interpret what museums and archives actually describe
- It tries to formalize concepts which help data integration and resource discovery (not all information)
- Focused on data structure semantics, integration, information about the past
The CIDOC CRM: Top-level Entities relevant for Integration
[Diagram. Entities: Actors; Types; Conceptual Objects; Physical Entities; Temporal Entities; Appellations; Places; Time-Spans. Relationship labels: participate in; affect or / refer to; refer to / refine; refer to / identify; location; at; within.]
The CIDOC CRM: A Classification of its Relationships
- Identification of real world items by real world names.
- Classification of real world items.
- Part-decomposition and structural properties of Conceptual & Physical Objects, Periods, Actors, Places and Times.
- Participation of persistent items in temporal entities: creates a notion of history, “world-lines” meeting in space-time.
- Location of periods in space-time and physical objects in space.
- Influence of objects on activities and products and vice-versa.
- Reference of information objects to any real-world item.
The CIDOC CRM: Example: Temporal Entity
Temporal Entity
Scope Note: This is an abstract entity and has no examples. It groups together things such as events, states and other phenomena which are limited in time. It is specialized into Period, which holds on some geographic area, and Condition State, which holds for, on, or over a certain object.
- consists of related or similar phenomena,
- is limited in time, is the only link to time, but not time itself,
- spreads out over a place or object (physical or not),
- the core of a model of physical history, open for unlimited specialisation.
The CIDOC CRM: Example: Temporal Entity - Subclasses
Period
- binds together related phenomena
- introduces inclusion topologies (parts etc.)
- is confined in space and time
- the basic unit for temporal-spatial reasoning
Event
- looks at the input and the outcome
- the basic unit for causal reasoning
- each event is a period if we study the process
Activity
- brings the people in
- adds purpose
The CIDOC CRM: Temporal Entity - Main Properties
Temporal Entity properties:
  has time-span (is time-span of): Time-Span
Period properties:
  consists of (forms part of): Period
  falls within (contains): Period
  took place at (witnessed): Place
Event properties:
  had participants (participated in): Actor
  occurred in the presence of (was present at): Stuff
Activity properties:
  carried out by (performed): Actor
  had specific purpose (was purpose of): Activity
  had as general purpose (was purpose of): Type
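The way each subclass adds properties to those it inherits can be sketched with Python dataclasses. This is a simplified illustration under stated assumptions: the subclass chain Temporal Entity, Period, Event, Activity is read off these slides, ranges such as Place and Actor are reduced to plain strings, and the method `all_participants` is a hypothetical helper that treats "carried out by" as a special case of "had participants".

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TemporalEntity:
    has_time_span: Optional[str] = None           # has time-span: Time-Span

@dataclass
class Period(TemporalEntity):
    consists_of: List["Period"] = field(default_factory=list)
    falls_within: List["Period"] = field(default_factory=list)
    took_place_at: List[str] = field(default_factory=list)   # Place, as a string here

@dataclass
class Event(Period):
    had_participants: List[str] = field(default_factory=list)           # Actor
    occurred_in_the_presence_of: List[str] = field(default_factory=list)  # Stuff

@dataclass
class Activity(Event):
    carried_out_by: List[str] = field(default_factory=list)             # Actor
    had_specific_purpose: List["Activity"] = field(default_factory=list)
    had_as_general_purpose: List[str] = field(default_factory=list)     # Type

    def all_participants(self) -> List[str]:
        """Actors who carried out the activity also count as participants."""
        extra = [a for a in self.carried_out_by if a not in self.had_participants]
        return self.had_participants + extra

conference = Activity(
    has_time_span="February 1945",
    took_place_at=["Yalta"],
    carried_out_by=["Churchill", "Roosevelt", "Stalin"],
)
print(conference.all_participants())
```

Because an Activity is also an Event and a Period, a query written against Period properties such as "took place at" finds the conference without knowing that it is an activity.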
The CIDOC CRM: Termini postquem / antequem
[Diagram. The meeting of Pope Leo I and Attila is carried out by (performed) both actors. Each actor has a Birth (brought into life / was born) and a Death (was death of / died in). Events have time-spans (has time-span / is time-span of) with "at most within" dates AD 452, AD 453 and AD 461. "before" links connect the events. Legend: had participants; took out of existence; brought into existence. Deduction: before.]
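The deduction on this slide can be sketched as a small inference over "before" links. The rule used here is an assumption made for illustration: an actor's birth lies before any event the actor takes part in, and that event lies before the actor's death. The function `transitive_closure` and the variable names are hypothetical.

```python
from itertools import product

def transitive_closure(pairs):
    """Extend a set of (earlier, later) pairs until no new pair follows."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure

# Participation taken from the slide: both actors performed the meeting.
participants = {"meeting": ["Leo I", "Attila"]}

before = set()
for event, actors in participants.items():
    for actor in actors:
        before.add((f"Birth of {actor}", event))   # brought into existence first
        before.add((event, f"Death of {actor}"))   # taken out of existence later

deduced = transitive_closure(before)

# Events that must lie before the meeting, and events that must lie after it.
earlier = sorted(a for a, b in deduced if b == "meeting")
later = sorted(b for a, b in deduced if a == "meeting")
print(earlier, later)
```

Once the meeting is dated, every event in `earlier` gains a terminus ante quem and every event in `later` a terminus post quem, without any date being recorded for them directly.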
The CIDOC CRM: The Participation Properties
Property (inverse)                              Domain                Range
had participants (participated in)              Event                 Actor
carried out by (performed)                      Activity              Actor
transferred title to (acquired title of)        Acquisition           Actor
transferred title from (surrendered title of)   Acquisition           Actor
custody surrendered by (surrendered custody)    Transfer of Custody   Actor
custody received by (received custody)          Transfer of Custody   Actor
has formed (was formed by)                      Formation             Group
dissolved (was dissolved by)                    Dissolution           Group
by mother (gave birth)                          Birth                 Person
brought into life (was born)                    Birth                 Person
was death of (died in)                          Death                 Person
The CIDOC CRM: Activities
[Diagram. Classes: CIDOC Entity; CIDOC Notion; Event; Activity; Type; String. Properties: has type (is type of); has notes. Cardinalities shown: 0,n; 1,1; 1,n.]
The CIDOC CRM: Activities: Measurement
[Diagram. Classes: Attribute Assignment; Measurement; Physical Entity; Dimension (value, unit). Properties: has dimension (is dimension of); was measured (measured); observed dimension (was observed). All cardinalities shown are 0,n.]
The CIDOC CRM: Activities: Condition Assessment
[Diagram. Classes: Activity; Condition Assessment; Physical Entity; Condition State; Temporal Entity; Actor. Properties: has conditions (condition of); assessed by (concerns); has identified (identified by); carried out by (performed), in the role of. Cardinalities shown: 0,n; 1,1.]
The CIDOC CRM: Activities: Acquisition
[Diagram. Classes: Activity; Acquisition; Actor; Physical Object. Properties: is current owner of (has current owner); is former or current owner of (has former or current owner); acquires title of (transferred title to); surrenders title of (transferred title from); transferred title of (changed ownership by). All cardinalities shown are 0,n.]
The CIDOC CRM: Activities: Move
[Diagram. Classes: Activity; Move; Physical Object; Place; Type. Properties: moved by (moved); moved to (occupied); moved from (vacated); has current location (currently holds); has current permanent location (is ~ of); has former or current location (is ~ of); had as general purpose (was purpose of); has specific purpose (was purpose of). Cardinalities shown: 0,n; 0,1.]
The CIDOC CRM: Activities: Modification/Production
[Diagram. Classes: Activity; Modification; Physical Entity; Man-Made Entity; Actor; Type; Design or Procedure; Material. Properties: has produced (was produced by); carried out by (performed), in the role of; used general technique (was technique of); used specific technique (was used by); consists of (is incorporated in); usually employs (is usually employed by). All cardinalities shown are 0,n.]
The CIDOC CRM: Entity 11: Modification
Properties (the slide brackets them into groups of declared and inherited properties):
  is called (identifies): Appellation
  has type (is type of): Type
  had participants (participated in): Actor
  carried out by (performed): Actor (in the role of: Type)
  has produced (was produced by): Physical Man-Made Stuff
  took into account (was taken into account by): Conceptual Object
  occurred in the presence of (was present at): Stuff
  used object (was used for): Physical Object (mode of use: String)
  used general technique (was technique of): Type
  used specific technique (was used by): Design or Procedure
  was motivation for (motivated): Conceptual Object
  motivated the creation of (was created for): Conceptual Object
  was intended use of (was made for): Man-Made Object (mode of use: String)
  had specific purpose (was purpose of): Activity
  had as general purpose (was purpose of): Type
The CIDOC CRM: Time Uncertainty, Certainty and Duration
[Diagram. A curve of “intensity” over time, annotated with the labels: before; after; at most within; at least covering; duration.]
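The interplay of the two bounds can be sketched as follows. This is an illustration under assumptions, not the CRM's own formalization: time is reduced to plain numbers on a year axis, "at most within" is read as an outer bound and "at least covering" as an inner bound of the true extent, and the names `TimeSpan` and `definitely_before` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Interval = Tuple[float, float]  # (start, end) on a numeric time axis; an assumption

@dataclass(frozen=True)
class TimeSpan:
    at_most_within: Interval                      # outer bound: true extent lies inside
    at_least_covering: Optional[Interval] = None  # inner bound: true extent contains it

    def __post_init__(self):
        if self.at_least_covering is not None:
            (o1, o2), (i1, i2) = self.at_most_within, self.at_least_covering
            if not (o1 <= i1 <= i2 <= o2):
                raise ValueError("inner bound must lie inside outer bound")

    def max_duration(self) -> float:
        start, end = self.at_most_within
        return end - start

    def min_duration(self) -> float:
        if self.at_least_covering is None:
            return 0.0
        start, end = self.at_least_covering
        return end - start

def definitely_before(a: TimeSpan, b: TimeSpan) -> bool:
    """True if a must have ended before b can have started, whatever the true extents."""
    return a.at_most_within[1] <= b.at_most_within[0]

# Illustrative values only: an event known to fall somewhere within 1923-1928,
# and one known to fall within 1945 and to cover at least part of that year.
transfer = TimeSpan(at_most_within=(1923, 1928))
conference = TimeSpan(at_most_within=(1945, 1946), at_least_covering=(1945.1, 1945.15))

print(definitely_before(transfer, conference))
print(transfer.min_duration(), transfer.max_duration())
```

Keeping both bounds lets "before" and "after" be decided from uncertain dates: the answer is certain only when the outer bounds do not overlap.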
The CIDOC CRM: Time-Span
[Diagram. Classes: CIDOC Entity; Temporal Entity; Condition State; Period; Event; Time Span; Place; Appellation; Time Appellation; Date; Period Appellation. Properties: has time-span (is time-span of); falls within (contains); consists of (forms part of); is called (identifies); is identified by (identifies); took place at (witnessed).]
The CIDOC CRM: Example: Place
Place
- A place is an extent in space, determined diachronically with respect to a larger, persistent constellation of matter, often continents: by coordinates, geophysical features, artefacts, communities, political systems, objects. It is not identical to any of these.
- A “CRM Place” is not a landscape, not a seat. It is an abstraction from temporal changes: “the place where...”
- A means to reason about the “where” in multiple reference systems.
Examples: figures from the bow of a ship, African dinosaur foot-prints in Portugal
The CIDOC CRM: Place
[Diagram. Classes: Place; Place Appellation; Address; Place Name; Spatial Coordinates; Section Definition; Physical Object; Move; Production. Properties: consists of (forms part of); identifies (is identified by); defines section of (has section definition); is located on or within (has section); has former or current location (is former or current location of); moved to (occupied); moved from (vacated); moved (moved by); has produced (was produced by); happened at (witnessed).]
The CIDOC CRM: Extension Example: Getty’s TGN
[Diagram. Classes: Attribute Assignment; Place Naming; Group; Community; Actor; Place; Place Appellation; Period; Time-Span. Property labels: carries out; assigns name; to community; to place; took place at; has time-span; falls within; is identified by (identifies); identified by.]
Example from the TGN
[Diagram. Instances: People of Iraq; Nineveh naming (shown twice); TGN7017998; TGN1001441; City of Nineveh; Kuyunjik; Nineveh; 1st mill. BC; 20th century. Property labels: carry out; assigns name; to community; to place; took place at; has time-span; falls within; is identified by (identifies); identified by.]
The CIDOC CRM - Application: Mapping DC to the CIDOC CRM
Example: Partial DC Record about a Technical Report
Type: text
Title: Mapping of the Dublin Core Metadata Element Set to the CIDOC CRM
Creator: Martin Doerr
Publisher: ICS-FORTH
Identifier: FORTH-ICS / TR 274 July 2000
Language: English
The CIDOC CRM - Application: Mapping DC to the CIDOC CRM (RDF style)
[Diagram. Nodes:
  E33 Linguistic Object (Object: FORTH-ICS / TR-274 July 2000)
  E41 Appellation (Name: Mapping of the Dublin Core Metadata Element Set to the CIDOC CRM...)
  E65 Creation (Event: 0001)
  E39 Actor (Actor: 0001); E72 Actor Appellation (Name: Martin Doerr)
  E7 Activity (Event: 0002); E55 Type (Type: Publication)
  E39 Actor (Actor: 0002); E72 Actor Appellation (Name: ICS-FORTH)
  E75 Conceptual Object Appellation (Name: FORTH-ICS / TR-274 July 2000); E55 Type (Type: FORTH Identifier)
  E56 Language (Lang.: English)
Property labels: is identified by; was created by; carried out by; has type; was taken into account by; has language.]
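A mapping of this kind can be sketched as a function from a flat DC record to (subject, property, object) triples. The sketch is hedged in two ways: which node each property connects is an assumption read from the node and label lists of the diagram, and the function name `dc_to_crm` and the tuple representation of nodes are hypothetical. The event and actor identifiers are the ones printed on the slide.

```python
def dc_to_crm(dc):
    """Map a few Dublin Core elements of a text record to CRM-style triples.

    Nodes are (class, identifier) tuples; the wiring between them is an
    assumption made for illustration.
    """
    doc = ("E33 Linguistic Object", dc["Identifier"])
    creation = ("E65 Creation", "Event: 0001")
    creator = ("E39 Actor", "Actor: 0001")
    publication = ("E7 Activity", "Event: 0002")
    publisher = ("E39 Actor", "Actor: 0002")

    return [
        # DC Title and Language attach directly to the document.
        (doc, "is identified by", ("E41 Appellation", dc["Title"])),
        (doc, "has language", ("E56 Language", dc["Language"])),
        # DC Creator becomes an explicit creation event with an actor.
        (doc, "was created by", creation),
        (creation, "carried out by", creator),
        (creator, "is identified by", ("E72 Actor Appellation", dc["Creator"])),
        # DC Publisher becomes an activity of type Publication with its own actor.
        (doc, "was taken into account by", publication),
        (publication, "has type", ("E55 Type", "Publication")),
        (publication, "carried out by", publisher),
        (publisher, "is identified by", ("E72 Actor Appellation", dc["Publisher"])),
    ]

record = {
    "Type": "text",
    "Title": "Mapping of the Dublin Core Metadata Element Set to the CIDOC CRM",
    "Creator": "Martin Doerr",
    "Publisher": "ICS-FORTH",
    "Identifier": "FORTH-ICS / TR-274 July 2000",
    "Language": "English",
}

triples = dc_to_crm(record)
for subject, prop, obj in triples:
    print(subject[1], "|", prop, "|", obj[1])
```

The point of the exercise is visible in the output: the single DC fields Creator and Publisher each expand into an event, an actor and an appellation, so that the same person or institution can be matched across records.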
The CIDOC CRM - Application: Mapping DC to the CIDOC CRM
Example: Partial DC Record about a painting
Type: image
Title: Garden of Paradise
Creator: Master of the Paradise Garden
Publisher: Staedelsches Kunstinstitut
The CIDOC CRM - Application: Mapping DC to the CIDOC CRM
[Diagram. Nodes:
  E23 Iconographic Object (Object: PA 310-1A??)
  E41 Appellation (Name: Garden of Paradise...)
  E55 Type (AAT: painting)
  E12 Production (Event: 0003)
  E39 Actor (ULAN: 4162); E72 Actor Appellation (Name: Master of the Paradise Garden)
  E31 Document (Docu: 0001); E55 Type (DCT1: image)
  E65 Creation (Event: 0004); E55 Type (Type: Publication Creation)
  E39 Actor (Actor: 0003); E72 Actor Appellation (Name: Staedelsches Kunstinstitut)
Property labels: is identified by; has type; was produced by; carried out by; is documented in; was created by.]
The CIDOC CRM - Application: Repository Indexing
[Diagram. Labels: Sources and metadata (XML/RDF); Background knowledge / Authorities; Thesauri extent; CIDOC CRM; Ontology expansion; CRM entities (Actors, Events, Objects); Derived knowledge data (e.g. RDF).]
The CIDOC CRM: Proposition about Restructuring Data
Data acquisition needs a different structure from presentation:
Acquisition (can be motivated by the CRM):
- sequence and order, completeness, constraints to guide and control data entry
- ergonomic, case-specific language, optimized to specialist needs
- often working on series of analogous items
Integration / comprehension (target of the CRM):
- restore connectivity with related subjects
- match and relate by underlying common concepts
- no preferred direction or subject
Presentation, story-telling (can be based on the CRM):
- explore context, paths, analogies orthogonal to data acquisition
- present in order, allow for digestion
- support abstraction
The CIDOC CRM: About the CRM
As an explanatory and mediation model, the CRM:
- does not enforce constraints: optional properties, multivalued properties, multiple instantiation
- contains redundant paths: “short cuts” of secondary processes, complex indirections
- contains abstractions at various levels: of entities and of attributes/properties
The CIDOC CRM: Frequently asked questions
1. Will the CRM require me to change my documentation?
Answer: No. It makes no prescriptions. You may, however, orient your document structures according to CRM constructs, so that an automatic transfer into a CRM-compatible form is possible.
2. What is the benefit of standardisation?
Answer: You can communicate the meaning of your data structures unambiguously to any user of the CRM.
The CIDOC CRM: Frequently asked questions
3. What does compatibility with the CRM mean?
Answer: That the parts of my data structure with the same scope can be explained by the CRM as:
- Subset: data can be transformed into a CRM instance and back.
- Superset: a CRM instance can be transformed... and back.
- Overlap: equivalent parts can be transformed...
- Extension: the CRM can be consistently extended by refinement or generalization such that my data structure becomes a subset.
Still to be defined precisely. Major consensus issue.
The CIDOC CRM: Frequently asked questions
4. What does the CRM cover?
Answer: Potentially the whole world as physical history in human perception!
At the moment: CIDOC Information Categories, DC, EAD, 90% of SPECTRUM, most of FRBR, fits with OPENGIS, and many proprietary formats.
The mapping exercise has verified the stability of the model, its extensibility and the effectiveness of the methodology.
Experience of 5-10 years of development by interdisciplinary groups, several implementations.
The CIDOC CRM: Benefits of the CRM (from Tony Gill)
- Elegant and simple compared to comparable Entity-Relationship models
- Coherently integrates information at varying degrees of detail
- Readily extensible through O-O class typing and specializations
- Richer semantic content; allows inferences to be made from underspecified data elements
- Designed for mediation of cultural heritage information
The CIDOC CRM: State of Development
- Proposed to ISO as Committee Draft. The structure has been stable for four years.
- Currently the SIG is elaborating minor extensions to satisfy archaeology, natural history and relationships to digital libraries. The process will close in July.
- Elaboration of improved documentation, scope note formulation and presentation until the end of 2002.
- Acceptance as a standard expected by 2003.
The CIDOC CRM: Conclusions
The CRM is NOT a metadata standard:
- it should become our language for semantic interoperability,
- it is a Conceptual Reference Model for analyzing and designing cultural information systems.
The CRM is in the ISO standardization process:
- Dissemination for wide understanding and consensus
- Extended application tests
- Elaboration of details and documentation
- Consensus