Interoperability in the Cultural Heritage Domain
Lourens van der Meij, VU Amsterdam – KB
(part of sheets by A. Isaac)
October 3rd, 2008
Background
• CATCH (NWO): Continuous Access To Cultural Heritage
  • Computer science research projects
  • Applied to cultural heritage (libraries, museums)
• STITCH: SemanTic Interoperability To access Cultural Heritage
  • Interoperability of metadata:
    • Exchanging (standardization)
    • Integrating (translating, linking)
Intention
Show through example applications that the following are important in the cultural heritage domain:
• Integration of data, collections, and services
• Interoperability:
  • Data standardized so that it can be used across different applications
  • Functionality reusable via services
• Creating mappings (semantic links) between data from different sources
First
• Illustrate integrated access to collections in the CH domain by looking at a use case.
  • Introduction of the use case
  • About vocabularies
  • Introduce the collections that will be integrated
  • Faceted browsing
  • What we want ->
  • Demo
  • Requirements, details
(Integrated) Access to collections
• Collections: records of books, pieces of art, …
• Electronic access, web portal
• STITCH focuses on semantics: structured access using the available knowledge sources, not full-text search
• Records: metadata, information about the object
  • Author
  • Date
  • Subject
• CH institutes often maintain knowledge structures (KOS), i.e. vocabularies, to facilitate storage, access, and maintenance.
• Subject metadata and access through KOS are the focus of STITCH.
Vocabularies (Knowledge Structures, KOS)
• Thesauri and classification systems: structuring collections; describing content, form, and other aspects of collection elements.
• STITCH is a cooperation between VU Amsterdam (KRR group), the National Library (KB), and MPI Nijmegen.
• Many vocabularies within the KB: on the order of 10 vocabularies are maintained internally, and 20 or more external vocabularies play a role. Why?
  • History
  • Specialized collections, particular views on the collection, and theories about how access should be provided
• Examples of vocabularies in the demos.
Vocabularies
• Many different (kinds of) vocabularies
• Many different representations, data formats, methods of access
• Integrated access requires:
  • Standardized representation of vocabularies and collections
  • Standardized access => services
  • Links between elements of vocabularies (alignment of vocabularies)
• Next: example of integration
Illustration, use case STITCH
• Integrated access to two collections:
  • KB: Illuminated Manuscripts (geïllumineerde manuscripten)
  • BnF: Mandragore, illuminated manuscripts (manuscrits enluminés)
• STITCH focus:
  • Integration
    • Alignment, techniques (and standards)
  • Interoperability
    • RDF, SKOS
Those aspects will be discussed after the first demo.
KB Illustrated Manuscripts
KB Illustrated Manuscripts: Iconclass
Mandragore
Faceted browsing
• Access the collection using the structure of the vocabularies
• Different dimensions: subject, author, …
• Use the hierarchy of a vocabulary, if it has one, to group objects together
  • Lions, giraffes, zebras -> animals: distinguish them as a group.
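The grouping step can be illustrated with a minimal Python sketch. It assumes a toy vocabulary in which every concept has at most one broader concept; the concept names, record ids, and function names are all hypothetical and are not taken from the STITCH demo.

```python
from collections import defaultdict

# Hypothetical toy vocabulary: each concept points to its broader concept.
BROADER = {
    "lion": "animals",
    "giraffe": "animals",
    "zebra": "animals",
    "animals": None,
    "oak": "plants",
    "plants": None,
}

# Hypothetical records: (object id, subject concept).
RECORDS = [("ms-1", "lion"), ("ms-2", "zebra"), ("ms-3", "oak"), ("ms-4", "giraffe")]


def top_concept(concept, broader):
    """Follow broader links until a concept without a parent is reached."""
    while broader.get(concept) is not None:
        concept = broader[concept]
    return concept


def facet_by_top_concept(records, broader):
    """Group object ids under the top-level concept of their subject."""
    facets = defaultdict(list)
    for object_id, subject in records:
        facets[top_concept(subject, broader)].append(object_id)
    return dict(facets)


facets = facet_by_top_concept(RECORDS, BROADER)
print(facets)
```

A real facet browser would let the user descend one hierarchy level at a time rather than jump straight to the top concept; the sketch only shows how the hierarchy turns individual subjects into a group.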
What we have
MDS 1
- Field 1
  - Field 1.1
- Field 2
  - Field 2.1
  - Field 2.2
- …
MDS 2
- Field 1
  - Field 1.1
  - Field 1.2
    - Field 1.2.1
  - Field 1.3
- Field 2
- …
What we want
Demo
• KB Illuminated Manuscripts
• BnF Mandragore manuscripts
• http://galjas.cs.vu.nl:33333/MANDRA-SV-ICE-mandraNewNONE, amphibians
• Wheat
Integrated Access
• Integrated semantic access requires:
  • Standardized representation of vocabularies and collections
  • Standardized access => services
  • Links between elements of vocabularies
Standardized representation
• Use of semantic web techniques
  • “Things” are represented as “resources” (URIs), across any application and data set
  • Values are simple strings or numbers (literals), or URIs
  • Properties are typed, named links between URIs, and between URIs and literals
  • Theory, reasoning methods, interoperability, some standardization
• Still need standardization on how to represent CH objects (XML: Dublin Core), vocabularies (SKOS), and links between elements of vocabularies.
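SKOS: Example
(Diagram: RDF graph for an Iconclass concept)
• http://www.iconclass.nl/s_11F rdf:type skos:Concept
• http://www.iconclass.nl/s_11F skos:prefLabel “the Virgin Mary”@en
• http://www.iconclass.nl/s_11F skos:prefLabel “la Vierge Marie”@fr
• http://www.iconclass.nl/s_11F skos:broader http://www.iconclass.nl/s_11
• http://www.iconclass.nl/s_11F skos:inScheme http://www.iconclass.nl/
• http://www.iconclass.nl/ rdf:type skos:ConceptScheme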
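The graph on this slide can be written down as subject-predicate-object triples. The Python sketch below uses plain tuples rather than an RDF library. The direction of the links (s_11F as the narrower concept carrying the labels) is an assumption read off the diagram, and the helper names `objects` and `pref_label` are hypothetical.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ICONCLASS = "http://www.iconclass.nl/"

concept = ICONCLASS + "s_11F"
parent = ICONCLASS + "s_11"

# Each triple is (subject, predicate, object).
# Literals are modelled as (text, language) pairs; everything else is a URI string.
TRIPLES = [
    (concept, RDF + "type", SKOS + "Concept"),
    (concept, SKOS + "prefLabel", ("the Virgin Mary", "en")),
    (concept, SKOS + "prefLabel", ("la Vierge Marie", "fr")),
    (concept, SKOS + "broader", parent),
    (concept, SKOS + "inScheme", ICONCLASS),
    (ICONCLASS, RDF + "type", SKOS + "ConceptScheme"),
]


def objects(triples, subject, predicate):
    """Return all objects of triples with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def pref_label(triples, subject, lang):
    """Return the preferred label of a concept in one language, or None."""
    for text, label_lang in objects(triples, subject, SKOS + "prefLabel"):
        if label_lang == lang:
            return text
    return None


print(pref_label(TRIPLES, concept, "fr"))
print(objects(TRIPLES, concept, SKOS + "broader"))
```

Because labels carry a language tag, the same concept URI can serve a Dutch, French, or English interface without any change to the links between concepts.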
SKOS (Simple Knowledge Organization System)
• SKOS offers building blocks to represent KOSs in RDF
  • Objects: Concept and ConceptScheme
  • Lexical properties (multilingual)
    • prefLabel
    • altLabel
  • Semantic relations
    • broader, narrower
    • related
  • Notes
    • scopeNote
    • definition
  • …
Vocabulary alignment
• Aim: finding semantic correspondences between vocabulary elements
  • “klassieke ruïnes” (classical ruins) ≈ “landschap met ruïnes” (landscape with ruins)
  • “maagd Maria” (Virgin Mary) = “Heilige Moeder” (Holy Mother)
• Doing it (semi-)automatically
  • Vocabularies are big (tens of thousands of concepts)
  • They change
Automatic alignment techniques
• Lexical: labels of entities and textual definitions
• Structural: structure of the vocabularies
• Background knowledge: using a shared conceptual reference to find links
• Extensional: object information (e.g. book indexing)
Example labels: “céréale, grain, blé” (cereal, grain, wheat) and “blé” (wheat)
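The lexical technique can be sketched in Python as follows. This is a minimal illustration, not the matcher used in STITCH: it assumes that two concepts are candidates for a link when they share a label after lowercasing and accent stripping. The concept ids and the extra label “orge” are made up; the other labels come from the slide.

```python
import unicodedata


def normalize(label):
    """Lowercase a label and strip accents so near-identical spellings match."""
    decomposed = unicodedata.normalize("NFKD", label.lower().strip())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def lexical_matches(vocab_a, vocab_b):
    """Return (concept_a, concept_b, shared labels) for concepts sharing a normalized label."""
    # Index vocabulary B by normalized label for fast lookup.
    index_b = {}
    for concept_b, labels in vocab_b.items():
        for label in labels:
            index_b.setdefault(normalize(label), set()).add(concept_b)

    matches = []
    for concept_a, labels in vocab_a.items():
        shared = {}
        for label in labels:
            key = normalize(label)
            for concept_b in index_b.get(key, ()):
                shared.setdefault(concept_b, set()).add(key)
        for concept_b, keys in sorted(shared.items()):
            matches.append((concept_a, concept_b, sorted(keys)))
    return matches


# Hypothetical concept ids; "orge" is an invented non-matching label.
vocab_a = {"a:1": ["céréale", "grain", "blé"]}
vocab_b = {"b:7": ["Blé"], "b:8": ["orge"]}

matches = lexical_matches(vocab_a, vocab_b)
print(matches)
```

Exact matching on normalized labels misses synonyms and spelling variants and can produce false links between homonyms, which is one reason the other three techniques are used alongside it.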
Extensional Statistical Alignment
• Object information (e.g. book indexing)
(Diagram: a collection of books indexed with both Thesaurus 1 and Thesaurus 2; example concepts “Dutch Literature” and “Dutch”)
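The idea can be sketched in Python under stated assumptions. The slides do not name the statistical measure that was used, so the sketch substitutes the Jaccard overlap of the sets of books indexed with each concept. The book ids, the concepts “Painting” and “Art”, the assignment of concepts to thesauri, and the threshold are all hypothetical.

```python
def jaccard(books_a, books_b):
    """Overlap of two sets of book ids: |A and B| / |A or B|."""
    union = books_a | books_b
    if not union:
        return 0.0
    return len(books_a & books_b) / len(union)


def extensional_alignment(index_a, index_b, threshold=0.5):
    """Score every concept pair by shared books and keep pairs at or above the threshold."""
    candidates = []
    for concept_a, books_a in index_a.items():
        for concept_b, books_b in index_b.items():
            score = jaccard(books_a, books_b)
            if score >= threshold:
                candidates.append((score, concept_a, concept_b))
    # Highest-scoring candidate links first.
    return sorted(candidates, reverse=True)


# Hypothetical indexing data: concept -> ids of the books indexed with it.
index_a = {"Dutch Literature": {"b1", "b2", "b3"}, "Painting": {"b4"}}
index_b = {"Dutch": {"b1", "b2", "b3", "b5"}, "Art": {"b4", "b6"}}

candidates = extensional_alignment(index_a, index_b)
print(candidates)
```

Comparing every pair of concepts is quadratic in the vocabulary sizes; with tens of thousands of concepts one would only score pairs that share at least one book.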
Results
1: 9132.9 (1704 3479 976) Schilderijen - schilderkunst
2: 8088.5 (1204 2330 767) Kwaliteitszorg - kwaliteitsmanagement
3: 6232.7 (820 1572 543) Personeelsmanagement - personeelsbeleid
4: 5392.1 (1399 3271 622) Beeldende kunsten - beeldende kunst
5: 5063.1 (4951 1152 613) Nederlands - Nederlandse taalkunde
17: 3421.8 (280 714 243) Diabetes mellitus - suikerziekte
Alignment: no Trivial Solution
• Current techniques are not reliable as the sole source of knowledge
• What is a good alignment?
  • Evaluation criteria?
  • => What will it be used for?
• Usage scenarios:
  • Integrated search
  • Reindexing
  • Thesaurus merging
  • Navigation => faceted browsing
What next
• Evaluation, lessons learned
• What next ->
• Second use case: reindexing
• (Vocabulary service)
• Conclusion
Why usage scenarios
• Evaluation of an alignment depends on its use.
• Real-world applications provide a test of the quality of alignments.
• Requirements on alignments depend on their use.
• What kinds of links should be distinguished?
• Optional demo evaluation:
  • http://localhost:33344/logineval
  • http://kits.cs.vu.nl:33344/logineval
• Next: reindexing, the use case nearest to a real-world application.
Situation at Dutch libraries, National Library (= KB)
• KB: two large collections:
  • DEPOT (deposit collection: all Dutch-language publications)
  • Own scientific collection
  • Subject indexing using two completely different indexing systems: Brinkman and GOO
• Common automation system for the Netherlands and Europe (OCLC-Pica)
• Metadata of books contains lots of fields
• A book or publication is given metadata by different libraries, using many different vocabularies.
Reindexing
• KB has about 20 people indexing books daily; about 20,000 books per year are indexed.
• Indexing: adding keywords and classification information to books. Even internally, indexing is done according to different vocabularies.
• Some books come with indexing done by other libraries (public libraries, Biblion).
• If Biblion index terms, or combinations of them, could be translated to KB index terms (Brinkman), that would mean less work for the KB.
WinIBW
• OCLC (PICA): automation system for libraries in the Netherlands, also used within Europe
• Online Public Access Catalogue (OPAC)
• WinIBW: internet access to the Pica system (local and central) for adding records, adding metadata, and searching records
• Demo, closest to a real-world application.
Reindexing
• Biblion -> Brinkman
  • Fietstochten (cycling tours), Kapellen (chapels), Beesel, Heiligenbeelden (statues of saints), … -> Brinkman?
• Use alignment:
  • Bibl:Fietstochten -> Brinkman?
  • Bibl:Kapellen -> Brinkman?
• Demo (example: z sel 3-10-2008 gd?79)
Result
Reindexing
• Under evaluation
• Improvement:
  • Use other metadata
  • Adapt the scenario (pass records with 95% confidence)
• Many other uses.
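A reindexing pass with a confidence cut-off might look like the Python sketch below. It reads "95% confidence" as a per-mapping threshold, which is an interpretation rather than something the slides spell out. The alignment table, the placeholder Brinkman terms (term-A and so on), and the confidence values are invented for illustration.

```python
# Hypothetical alignment: Biblion term -> list of (Brinkman term, confidence).
ALIGNMENT = {
    "Bibl:Fietstochten": [("Brinkman:term-A", 0.97)],
    "Bibl:Kapellen": [("Brinkman:term-B", 0.96), ("Brinkman:term-C", 0.60)],
    "Bibl:Heiligenbeelden": [("Brinkman:term-D", 0.80)],
}


def reindex(biblion_terms, alignment, min_confidence=0.95):
    """Split a record's Biblion terms into Brinkman suggestions and terms left for a human indexer."""
    suggested, unresolved = [], []
    for term in biblion_terms:
        confident = [
            target
            for target, confidence in alignment.get(term, [])
            if confidence >= min_confidence
        ]
        if confident:
            suggested.extend(confident)
        else:
            # No mapping, or only low-confidence mappings: leave it to the indexer.
            unresolved.append(term)
    return suggested, unresolved


record = ["Bibl:Fietstochten", "Bibl:Kapellen", "Bibl:Beesel", "Bibl:Heiligenbeelden"]
suggested, unresolved = reindex(record, ALIGNMENT)
print(suggested)
print(unresolved)
```

Terms that stay unresolved are where the human indexers would still be needed, so the threshold trades saved work against the risk of wrong index terms.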
Sketch of the vocabularies relevant to the KB
Integrated Access
• Services through the internet
• Protocols: SOAP, REST, …
• Collection access?
• Vocabulary access, alignment access
• http://eculture.cs.vu.nl:38080/vocreptags
• http://localhost:8080/vocreptags
Lessons
• Using semantic web techniques, interoperability and integration of collections can be made easier.
• Aligning vocabularies is of use in different situations. The alignment methods need to be fine-tuned to the application they are meant for.
• When introducing new techniques, interaction between the CH field and scientific institutes is very valuable.
• Standardization of access to collections and vocabularies should be dealt with (prototype has been developed).
Terms
• An ontology in both computer science and information science is a formal representation of a set of concepts within a domain and the relationships between those concepts. It is used to reason about the properties of that domain, and may be used to define the domain.
• Metadata (meta data, or sometimes metainformation) is "data about data", of any sort in any media. An item of metadata may describe an individual datum, or content item, or a collection of data including multiple content items and hierarchical levels, for example a database schema.
Terms (continued)
• A library classification is a system of coding and organizing library materials (books, serials, audiovisual materials, computer files, maps, manuscripts, realia) according to their subject and allocating a call number to each information resource. Similar to classification systems used in biology, bibliographic classification systems group similar entities together, typically arranged in a hierarchical tree structure.
• In information technology, a thesaurus represents a database or list of semantically orthogonal topical search keys. In the field of Artificial Intelligence, a thesaurus may sometimes be referred to as an ontology.