SemanTic Interoperability To access Cultural Heritage

Frank van Harmelen, Henk Matthezing, Peter Wittenburg, Marjolein van Gendt, Antoine Isaac, Lourens van der Meij, Stefan Schlobach, Paul Doorenbosch

CH Interoperability Problems

• Current CH trend: portals that build on heterogeneous collections
  Different databases/vocabularies/MD schemes

[Figure: Document Collection X and Document Collection Y, each described in its own description base (Description Base X with metadata scheme MDS 1, Description Base Y with MDS 2, each scheme with its own nested fields) and indexed with its own thesaurus (Thesaurus x, Thesaurus y)]

• Syntactic interoperability problem being solved?
  Access can be granted

• Semantic interoperability still to be addressed
  Links with original vocabularies/MD structures are lost

[Figure: MDS 1 (DB X) and MDS 2 (DB Y) are translated in one shot towards a merged structure, the Unified MD Scheme of a Unified (Virtual) Description Base; no semantic information is available for the description vocabulary]

STITCH General Goals

Allow heterogeneous CH collections to be accessed
• In an integrated way
• Still benefiting from specific collection commitments
  Keeping original metadata schemes and vocabularies

Using Semantic Web means for
• Representation of different points of view in one system
• Creation and use of alignment knowledge

[Figure: MDS 1 and MDS 2, each kept with its own nested fields, shown together with a knowledge base, DB X and DB Y]

STITCH General Goals (2)

Research objective: develop theory, methods and tools for allowing metadata interoperability through semantic links between vocabularies

• Formalization of schemes (and collections)
• Applying ontology mapping techniques to those schemes
• Using the results of the mappings in formal reasoning mechanisms (and dedicated interfaces)

Applying SW research to concrete objectives

• Specificity of resources (thesauri, metadata schemes)
  Formalization in a context of natural semantics

• What can ontology mapping techniques bring to solve the interoperability problem in CH?
  • Quantitative and qualitative evaluation
  • Integration into realistic scenarios
  Are these techniques really applicable to the CH case?

• Uses that have to be further specified
  • What does ‘accessing collections in an integrated way’ mean?
  • Interfaces, services?
  Anticipating needs that are not yet stabilized

Pilot Project

Experiment on a reduced scale
• Choose and formalize 2 collections and their associated subject vocabularies
  • Rijksmuseum ARIA Masterpieces and its “catalogue”
  • KB Illustrated Manuscripts and Iconclass
• Use existing mapping tools to align vocabularies
• Adapt/develop a browsing interface providing integrated access using:
  • Original vocabularies and their structure
  • Alignment information

1st Collection: KB Illustrated Manuscripts

2nd Collection: Rijksmuseum ARIA collection

PP Modules

[Figure: pilot project (PP) architecture, with the following modules
  Initial thesauri: Iconclass, ARIA catalogue
  Initial collections: KB Manuscripts, ARIA
  Standard SW representation of vocabularies (syntactic interoperability for vocabularies)
  Standard SW representation of collections (syntactic interoperability and manually-achieved semantic interoperability for MD schemes)
  Mapping knowledge (semantic interoperability for vocabularies)
  SW description storage and query engine
  Definitions of facets
  View specification
  Browser]

[Figure: PP Modules diagram repeated, labelled "Collection formalization"]

Collection Formalization Goals

• Analysis of the vocabularies and MD structures
• Representation using SW languages
  • Testing standard means (SKOS/RDF)
• Conversion for vocabularies, but also for metadata structures
  • Ontologies providing proper collection-related relations
• Conversion for interface and reasoning engine (application-specific) but also for formal ontology mapping tools

Vocabulary Formalisation: ARIA in SKOS

[Figure: RDF graph. aria:T_27945 (skos:prefLabel "Birds") has skos:broader aria:BT_24563 (skos:prefLabel "AnimalPieces"); both are of rdf:type skos:Concept and skos:inScheme aria:Catalog_CS, which is of rdf:type skos:ConceptScheme]
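
A minimal sketch of how the graph on this slide could be held and queried in plain Python, without any RDF library. The triples are read off the slide; the pairing of labels with nodes (BT_24563 as "AnimalPieces", T_27945 as "Birds") is an assumption based on the label order, and the helper names `objects` and `broader_labels` are hypothetical.

```python
# Triples read off the slide; prefixed names are kept as plain strings.
# Assumption: BT_24563 carries "AnimalPieces" and T_27945 carries "Birds".
TRIPLES = {
    ("aria:T_27945", "rdf:type", "skos:Concept"),
    ("aria:T_27945", "skos:prefLabel", "Birds"),
    ("aria:T_27945", "skos:inScheme", "aria:Catalog_CS"),
    ("aria:T_27945", "skos:broader", "aria:BT_24563"),
    ("aria:BT_24563", "rdf:type", "skos:Concept"),
    ("aria:BT_24563", "skos:prefLabel", "AnimalPieces"),
    ("aria:BT_24563", "skos:inScheme", "aria:Catalog_CS"),
    ("aria:Catalog_CS", "rdf:type", "skos:ConceptScheme"),
}


def objects(subject, predicate, triples=TRIPLES):
    """Return all objects of (subject, predicate, ?) in sorted order."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def broader_labels(concept, triples=TRIPLES):
    """Return the preferred labels of the direct broader concepts."""
    return [
        label
        for parent in objects(concept, "skos:broader", triples)
        for label in objects(parent, "skos:prefLabel", triples)
    ]
```

Under these assumptions, `broader_labels("aria:T_27945")` walks one skos:broader link and returns the label of the parent concept.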

Collection Formalization Problems

• Interpreting and representing vocabularies using formal standards is hindered by expressivity variation
  • Complex models
  • Fuzzy structures, weakly structured
  Implies some loss of data during standardisation?

• Part of the formalization is system-specific
  • Depending on application environment
    • Standard RDFS expressivity and implemented tools
  • Depending on the mapping tools, which might make different hypotheses on the nature of knowledge to align
    • OWL classes vs. nodes in trees
  Changes the role of the standard representation in the system?

[Figure: PP Modules diagram repeated, labelled "Collection integration"]

Automatic Ontology Matching Techniques

Generally aiming at recognizing equivalence or subsumption links between ontology elements

• Lexical
  Labels of entities, textual definitions
• Structural
  Structure of the formal definitions of entities, position in the hierarchy
• Statistical
  Objects, instantiation of the concepts
• Shared background knowledge (“oracles”)
  Using conceptual references to deduce correspondences

Most mapping tools use a mix of such approaches
E.g. lexical string matching can ignite a structural alignment process

[Figure: example with the labels "brain", "Long tumor", "tumor", "Long"]
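
The way a lexical match can ignite a structural step can be sketched as follows. This is a toy illustration under stated assumptions, not one of the mapping tools used in the project: the vocabularies, concept ids, the 0.85 threshold and the function names are all hypothetical, and string similarity is taken from Python's standard `difflib`.

```python
from difflib import SequenceMatcher


def normalize(label):
    """Lower-case a label and collapse whitespace before comparison."""
    return " ".join(label.lower().split())


def lexical_score(label_a, label_b):
    """String similarity in [0, 1] between two normalized labels."""
    return SequenceMatcher(None, normalize(label_a), normalize(label_b)).ratio()


def lexical_matches(labels_a, labels_b, threshold=0.85):
    """Pair concepts whose labels are similar enough (dicts: id -> label)."""
    matches = []
    for id_a, label_a in labels_a.items():
        for id_b, label_b in labels_b.items():
            score = lexical_score(label_a, label_b)
            if score >= threshold:
                matches.append((id_a, id_b, round(score, 2)))
    return sorted(matches)


def structural_candidates(matches, children_a, children_b):
    """For each lexically matched pair, propose their direct children as
    candidate pairs for further checking (dicts: id -> list of child ids)."""
    matched = {(a, b) for a, b, _ in matches}
    candidates = set()
    for a, b in matched:
        for child_a in children_a.get(a, []):
            for child_b in children_b.get(b, []):
                if (child_a, child_b) not in matched:
                    candidates.add((child_a, child_b))
    return sorted(candidates)


# Hypothetical toy vocabularies.
LABELS_A = {"a:1": "Animal pieces", "a:2": "Birds"}
LABELS_B = {"b:1": "animals", "b:2": "birds"}
CHILDREN_A = {"a:2": ["a:3"]}
CHILDREN_B = {"b:2": ["b:4"]}

MATCHES = lexical_matches(LABELS_A, LABELS_B)
CANDIDATES = structural_candidates(MATCHES, CHILDREN_A, CHILDREN_B)
```

Here only "Birds"/"birds" passes the lexical threshold, and that single match is what makes the children `a:3` and `b:4` candidates for the structural step.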

Collection Integration Goals

• Provide mappers with proper resources
  • Pre-processing done in previous step
• Use them in the most efficient way
  • Setting taking into account the specificities of CH vocabularies
• Evaluation/selection of their results
  • Taking into account the use of CH vocabularies in their collection
• Use their result in the application system
  • Post-processing
• Do it for vocabularies but also for metadata schemes
  Not in pilot

Mappings

Collection Integration Problems

• Input: needs pre-processing, possibly division
• Output: needs re-interpretation of mapping relations
  • Can confidence measures be used?

• Alignment process
  • Usually turning to resources that may be absent from thesauri
    • Rich formal/structural information
    • Dually indexed documents
  • Not (properly) using all information found in thesauri
    • E.g. rich lexical information
  Leading to ‘low-quality’ thesaurus mapping
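
One possible reading of the "re-interpretation" and "confidence measures" points, sketched in Python. Everything here is an assumption for illustration: the mapper output format (source, relation, target, confidence), the 0.7 threshold, the `ic:example_*` identifiers and the choice of the SKOS mapping properties skos:exactMatch, skos:broadMatch and skos:narrowMatch as targets.

```python
# Assumed mapper relations: "=" equivalence, "<" source is more specific
# than target, ">" source is more general than target.
RELATION_TO_SKOS = {
    "=": "skos:exactMatch",
    "<": "skos:broadMatch",   # target is the broader concept
    ">": "skos:narrowMatch",  # target is the narrower concept
}


def reinterpret(mappings, min_confidence=0.7):
    """Turn raw mapper output into SKOS-style mapping triples,
    dropping low-confidence and unknown relations."""
    triples = []
    for source, relation, target, confidence in mappings:
        if confidence < min_confidence:
            continue
        prop = RELATION_TO_SKOS.get(relation)
        if prop is None:
            continue
        triples.append((source, prop, target))
    return triples


# Hypothetical mapper output; the ic:example_* ids are placeholders.
RAW = [
    ("aria:T_27945", "=", "ic:example_1", 0.95),
    ("aria:T_27945", "<", "ic:example_2", 0.80),
    ("aria:BT_24563", "=", "ic:example_3", 0.40),
    ("aria:BT_24563", "?", "ic:example_4", 0.90),
]

MAPPING_TRIPLES = reinterpret(RAW)
```

Whether a fixed confidence threshold is appropriate is exactly the open question the slide raises; the cut-off is only there to make the filtering step concrete.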

[Figure: PP Modules diagram repeated, labelled "Collection access"]

User Interface: Access to Collections

• Adapted faceted browsing paradigm (Flamenco)
  • Search by navigating through several facets
  • STITCH PP facet adaptation:
    From orthogonal facets (‘material’, ‘location’) to facets describing different conceptual schemes (ARIA, Iconclass)

• 3 views on integrated collections
  • Single view
  • Combined view
  • Merged view

• http://stitch.cs.vu.nl

Collections Access: Single View

• Facets based on 1 point of view and its associated concept scheme(s)
• Access to objects indexed against concepts from other schemes
  • If a mapping exists between their index concepts and the concepts of the single view
  A single point of view on the integrated data set

[Figure: SINGLE ARIA view, with ARIA facet1, ARIA facet2 and other ARIA facets]
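
A minimal sketch of the single-view idea: selecting a concept of the view's scheme also returns objects indexed in another scheme, provided a mapping links their index concept to the selected one. The object ids, the `ic:*` concepts and the function name are hypothetical; only aria:T_27945 comes from the slides.

```python
def single_view_hits(view_concept, index, mapping):
    """Return objects reachable from one concept of the view's scheme.

    index:   object id -> set of concepts it is indexed with (any scheme)
    mapping: concept of another scheme -> set of concepts of the view scheme
    """
    hits = set()
    for obj, concepts in index.items():
        for concept in concepts:
            if concept == view_concept or view_concept in mapping.get(concept, ()):
                hits.add(obj)
    return sorted(hits)


# Hypothetical data: one ARIA object, two KB manuscripts.
INDEX = {
    "aria:object_1": {"aria:T_27945"},
    "kb:manuscript_1": {"ic:birds"},
    "kb:manuscript_2": {"ic:unmapped"},
}
MAPPING = {"ic:birds": {"aria:T_27945"}}

ARIA_VIEW_HITS = single_view_hits("aria:T_27945", INDEX, MAPPING)
```

The second manuscript stays invisible from the ARIA view because its index concept has no mapping, which is the condition stated on the slide.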

Collections Access: Combined View

• Search based on 2 (or more) points of view
  • One facet uses 1 vocabulary from 1 point of view
  • Facets attached to the different points of view are presented
• Simultaneous access to different points of view of the same data

[Figure: COMBINED view, with IconClass facet1, ARIA facet1 and other ARIA/IC facets]
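
One way to picture the combined view is as a conjunction of selections made in facets from different schemes, evaluated over an index that has been extended through the mappings. This is a sketch under assumptions, not the prototype's logic: the data, the `ic:*` and `aria:object_*` ids and both function names are hypothetical.

```python
def expand_index(index, mapping):
    """Add to each object the concepts its own index concepts are mapped to."""
    expanded = {}
    for obj, concepts in index.items():
        mapped = set()
        for concept in concepts:
            mapped |= set(mapping.get(concept, ()))
        expanded[obj] = set(concepts) | mapped
    return expanded


def combined_view_hits(selected, index, mapping):
    """Objects matching every selected concept, whatever scheme it comes from."""
    expanded = expand_index(index, mapping)
    wanted = set(selected)
    return sorted(obj for obj, concepts in expanded.items() if wanted <= concepts)


# Hypothetical data; the mapping is given in both directions.
INDEX = {
    "aria:object_1": {"aria:T_27945"},
    "kb:manuscript_1": {"ic:birds", "ic:gardens"},
}
MAPPING = {"ic:birds": {"aria:T_27945"}, "aria:T_27945": {"ic:birds"}}

BOTH = combined_view_hits(["aria:T_27945", "ic:birds"], INDEX, MAPPING)
NARROWED = combined_view_hits(["aria:T_27945", "ic:gardens"], INDEX, MAPPING)
```

Selecting one concept in an ARIA facet and one in an Iconclass facet narrows the result set the same way two orthogonal facets would in classic faceted browsing.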

Collections Access: Merged View

• Facets using a merged concept scheme
  • Mapping leads to hierarchical links between schemes
• Making the links between vocabularies more visible during search
  • A way to ‘enrich’ weakly structured vocabularies

[Figure: MERGED view, with Merged facet1 and other merged facets]
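
A small sketch of how mapping results could be turned into hierarchical links to form a merged scheme. The ARIA pair comes from the SKOS slide; the Iconclass-side ids, the mapping link and the function names are hypothetical, and treating a mapping as a narrower/broader link is an assumption.

```python
def merge_schemes(broader_a, broader_b, mapping_links):
    """Build one child lookup from two hierarchies plus mapping links.

    Each argument is a list of (narrower, broader) pairs.
    """
    children = {}
    for narrower, broader in [*broader_a, *broader_b, *mapping_links]:
        children.setdefault(broader, set()).add(narrower)
    return children


def descendants(concept, children):
    """All concepts below `concept` in the merged hierarchy."""
    seen = set()
    stack = [concept]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return sorted(seen)


ARIA_BROADER = [("aria:T_27945", "aria:BT_24563")]
IC_BROADER = [("ic:songbirds", "ic:birds")]      # hypothetical
MAPPING_LINKS = [("ic:birds", "aria:T_27945")]   # hypothetical mapping result

MERGED = merge_schemes(ARIA_BROADER, IC_BROADER, MAPPING_LINKS)
BELOW_ANIMAL_PIECES = descendants("aria:BT_24563", MERGED)
```

In this toy case the flat ARIA concept gains an Iconclass subtree, which is the kind of enrichment of a weakly structured vocabulary the slide refers to.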

Collection Access: Conclusion

Prototype is a thin layer on top of SW/RDF technology (using Sesame)

• All data is stored in and retrieved from RDF repositories

• Easily adaptable for experimentation with different views (without programming)

For convincing results you need ‘good quality’ mapping

• E.g., to assess the value of Merged view
  Towards application-specific evaluation criteria?