SemanTic Interoperability To access Cultural Heritage
Frank van Harmelen, Henk Matthezing, Peter Wittenburg, Marjolein van Gendt, Antoine Isaac, Lourens van der Meij, Stefan Schlobach, Paul Doorenbosch

CH Interoperability Problems
• Current CH trend: portals that build on heterogeneous collections
  Different databases/vocabularies/MD schemes
[Figure: two separate collections. Document Collection X is described in Description Base X, using metadata scheme MDS 1 and Thesaurus x; Document Collection Y is described in Description Base Y, using metadata scheme MDS 2 and Thesaurus y. Each metadata scheme has its own hierarchy of fields.]

CH Interoperability Problems
• Current CH trend: portals that build on heterogeneous collections
  Different databases/vocabularies/MD schemes
• Syntactic interoperability problem being solved?
  Access can be granted
• Semantic interoperability still to be addressed
  Links with original vocabularies/MD structures are lost
[Figure: DB X (MDS 1) and DB Y (MDS 2) feed a unified (virtual) description base with a unified MD scheme. Annotations: one-shot translation towards a merged MD structure; no semantic information for description vocabulary.]

STITCH General Goals
Allow heterogeneous CH collections to be accessed
• In an integrated way
• Still benefiting from specific collection commitments
  Keeping original metadata schemes and vocabularies
Using Semantic Web means for
• Representation of different points of view in one system
• Creation and use of alignment knowledge
[Figure: DB X and DB Y with a knowledge base that holds both original metadata schemes, MDS 1 and MDS 2.]

STITCH General Goals (2)
Research objective: develop theory, methods and tools for allowing metadata interoperability through semantic links between vocabularies
• Formalization of schemes (and collections)
• Applying ontology mapping techniques to those schemes
• Using the results of the mappings in formal reasoning mechanisms (and dedicated interfaces)

Applying SW research to concrete objectives
• Specificity of resources (thesauri, metadata schemes)
  Formalization in a context of natural semantics
• What can ontology mapping techniques bring to solve the interoperability problem in CH?
  • Quantitative and qualitative evaluation
  • Integration into realistic scenarios
  Are these techniques really applicable to the CH case?
• Uses that have to be further specified
  • What does ‘accessing collections in an integrated way’ mean?
  • Interfaces, services?
  Anticipating needs that are not yet stabilized

Pilot Project
Experiment on a reduced scale
• Choose and formalize 2 collections and their associated subject vocabularies
  • Rijksmuseum ARIA Masterpieces and its “catalogue”
  • KB Illustrated Manuscripts and Iconclass
• Use existing mapping tools to align vocabularies
• Adapt/develop a browsing interface providing an integrated access using:
  • Original vocabularies and their structure
  • Alignment information

PP Modules
[Figure: pilot project modules.
Components: initial thesauri (Iconclass, ARIA catalogue); initial collections (KB Manuscripts, ARIA); standard SW representation of vocabularies; standard SW representation of collections; mapping knowledge; SW description storage and query engine; view specification; definitions of facets; browser.
Annotations: syntactic interoperability for vocabularies; semantic interoperability for vocabularies; syntactic interoperability and manually-achieved semantic interoperability for MD schemes.]

PP Modules: Collection formalization

Collection Formalization Goals
• Analysis of the vocabularies and MD structures
• Representation using SW languages
  • Testing standard means (SKOS/RDF)
• Conversion for vocabularies, but also for metadata structures
  • Ontologies providing proper collection-related relations
• Conversion for interface and reasoning engine (application-specific) but also for formal ontology mapping tools

Vocabulary Formalisation: ARIA in SKOS
[Figure: RDF graph. Two concepts, aria:BT_24563 and aria:T_27945, are each of rdf:type skos:Concept and skos:inScheme aria:Catalog_CS, which is itself of rdf:type skos:ConceptScheme. The concepts carry the skos:prefLabel values "AnimalPieces" and "Birds" and are linked by skos:broader.]
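The graph on this slide can be written down as plain subject/predicate/object triples. The following is a minimal sketch, not the project's actual conversion code. Two things are assumptions: the `aria:` namespace URI (the slide gives only the prefix) and the reading that aria:BT_24563 is "AnimalPieces", the broader concept of aria:T_27945 "Birds" (the flattened figure does not show which label or arrow belongs to which node).

```python
# Namespaces. SKOS and RDF are the standard W3C URIs.
# ARIA is a hypothetical placeholder: the slide only shows the "aria:" prefix.
ARIA = "http://example.org/aria/"
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Assumption: BT_24563 = "AnimalPieces" is the broader concept of T_27945 = "Birds".
triples = [
    (ARIA + "Catalog_CS", RDF + "type", SKOS + "ConceptScheme"),
    (ARIA + "BT_24563", RDF + "type", SKOS + "Concept"),
    (ARIA + "BT_24563", SKOS + "inScheme", ARIA + "Catalog_CS"),
    (ARIA + "BT_24563", SKOS + "prefLabel", "AnimalPieces"),
    (ARIA + "T_27945", RDF + "type", SKOS + "Concept"),
    (ARIA + "T_27945", SKOS + "inScheme", ARIA + "Catalog_CS"),
    (ARIA + "T_27945", SKOS + "prefLabel", "Birds"),
    (ARIA + "T_27945", SKOS + "broader", ARIA + "BT_24563"),
]


def objects(subject, predicate):
    """Return all objects of triples matching the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def broader_labels(concept):
    """Return the preferred labels of the direct broader concepts."""
    labels = []
    for parent in objects(concept, SKOS + "broader"):
        labels.extend(objects(parent, SKOS + "prefLabel"))
    return labels
```

Under the stated assumption, `broader_labels(ARIA + "T_27945")` walks one `skos:broader` step and returns the label of the parent concept.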

Collection Formalization Problems
• Interpreting and representing vocabularies using formal standards is hindered by expressivity variation
  • Complex models
  • Fuzzy structures, weakly structured
  Implies some loss of data during standardisation?
• Part of the formalization is system-specific
  • Depending on application environment
    • Standard RDFS expressivity and implemented tools
  • Depending on the mapping tools, which might make different hypotheses on the nature of knowledge to align
    • OWL classes vs. nodes in trees
  Changes the role of the standard representation in the system?

PP Modules: Collection integration

Automatic Ontology Matching Techniques
Generally aiming at recognizing equivalence or subsumption links between ontology elements
• Lexical
  Labels of entities, textual definitions
• Structural
  Structure of the formal definitions of entities, position in the hierarchy
• Statistical
  Objects, instantiation of the concepts
• Shared background knowledge (“oracles”)
  Using conceptual references to deduce correspondences
Most mapping tools use a mix of such approaches
E.g. lexical string matching can trigger a structural alignment process
[Figure: example graph; surviving node labels: brain, Long, tumor]
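The lexical technique listed above can be illustrated with a small label comparison. This is only a sketch of the general idea, using Python's standard `difflib` for string similarity; it is not one of the mapping tools used in the project. The `ex:` identifiers, the second vocabulary and the 0.9 threshold are all hypothetical.

```python
from difflib import SequenceMatcher


def normalize(label):
    """Lower-case a label and collapse whitespace before comparison."""
    return " ".join(label.lower().split())


def lexical_matches(scheme_a, scheme_b, threshold=0.9):
    """Return (id_a, id_b, score) for label pairs whose similarity reaches the threshold."""
    matches = []
    for id_a, label_a in scheme_a.items():
        for id_b, label_b in scheme_b.items():
            score = SequenceMatcher(None, normalize(label_a), normalize(label_b)).ratio()
            if score >= threshold:
                matches.append((id_a, id_b, round(score, 2)))
    return matches


# Toy data: the ARIA labels come from the SKOS slide, the "ex:" scheme is invented.
aria = {"aria:T_27945": "Birds", "aria:BT_24563": "AnimalPieces"}
other = {"ex:c1": "birds", "ex:c2": "landscapes"}

candidates = lexical_matches(aria, other)
```

The returned score is a candidate confidence value. A structural step could then start from these candidate pairs and compare their positions in the two hierarchies.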

Collection Integration Goals
• Provide mappers with proper resources
  • Pre-processing done in previous step
• Use them in the most efficient way
  • Setting taking into account the specificities of CH vocabularies
• Evaluation/selection of their results
  • Taking into account the use of CH vocabularies in their collection
• Use their result in the application system
  • Post-processing
• Do it for vocabularies but also for metadata schemes
  Not in pilot

Collection Integration Problems
• Input: needs pre-processing, possibly division
• Output: needs re-interpretation of mapping relations
  • Can confidence measures be used?
• Alignment process
  • Usually turning to resources that may be absent from thesauri
    • Rich formal/structural information
    • Dually indexed documents
  • Not (properly) using all information found in thesauri
    • E.g. rich lexical information
  Leading to ‘low-quality’ thesaurus mapping

PP Modules: Collection access

User Interface: Access to Collections
• Adapted faceted browsing paradigm (Flamenco)
  • Search by navigating through several facets
  • STITCH PP facet adaptation:
    From orthogonal facets (‘material’, ‘location’) to facets describing different conceptual schemes (ARIA, Iconclass)
• 3 views on integrated collections
  • Single view
  • Combined view
  • Merged view
• http://stitch.cs.vu.nl

Collections Access: Single View
• Facets based on 1 point of view and its associated concept scheme(s)
• Access to objects indexed against concepts from other schemes
  • If a mapping exists between their index concepts and the concepts of the single view
A single point of view on the integrated data set
[Figure: SINGLE ARIA view, with ARIA facet1, ARIA facet2 and other ARIA facets]
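The single-view lookup can be sketched as follows: an object indexed with a concept from another scheme is returned when that concept is mapped to the concept selected in the view. This is an illustrative sketch, not the prototype's implementation; all identifiers and the mapping table are hypothetical toy data.

```python
# Hypothetical toy data: object -> set of index concepts, each from the object's own scheme.
index = {
    "aria:object_1": {"aria:T_27945"},
    "kb:manuscript_1": {"ic:birds"},
    "kb:manuscript_2": {"ic:unmapped"},
}

# Hypothetical mapping result: concept of the other scheme -> concept of the view's scheme.
mapping = {"ic:birds": "aria:T_27945"}


def single_view(view_concept, index, mapping):
    """Return objects indexed with the view concept, directly or through the mapping."""
    hits = []
    for obj, concepts in sorted(index.items()):
        # Translate foreign concepts into the view's scheme where a mapping exists.
        translated = {mapping.get(c, c) for c in concepts}
        if view_concept in translated:
            hits.append(obj)
    return hits


result = single_view("aria:T_27945", index, mapping)
```

Here `kb:manuscript_2` stays invisible in the view because its index concept has no mapping, which is why the quality and coverage of the mapping directly affect what the user can reach.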

Collections Access: Combined View
• Search based on 2 (or more) points of view
  • One facet uses 1 vocabulary from 1 point of view
  • Facets attached to the different points of view are presented
• Simultaneous access to different points of view of the same data
[Figure: COMBINED view, with ARIA facet1, IconClass facet1 and other ARIA/IC facets]
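One possible reading of the combined view is that the user picks a concept in each facet and the result must satisfy every selection, with the mapping making objects from the other collection reachable. The sketch below illustrates that reading only; the slide does not specify the selection semantics. All identifiers and the equivalence pairs are hypothetical.

```python
# Hypothetical toy data: object -> set of index concepts.
index = {
    "aria:object_1": {"aria:birds", "aria:paintings"},
    "kb:manuscript_1": {"ic:birds", "ic:saints"},
    "kb:manuscript_2": {"ic:saints"},
}

# Hypothetical mapping result, read as symmetric equivalence pairs.
equivalent = {("aria:birds", "ic:birds")}


def expand(concept):
    """Return the concept together with every concept mapped to it."""
    expanded = {concept}
    for a, b in equivalent:
        if a == concept:
            expanded.add(b)
        if b == concept:
            expanded.add(a)
    return expanded


def combined_view(selections, index):
    """Return objects that satisfy every facet selection, directly or via the mapping."""
    hits = []
    for obj, concepts in sorted(index.items()):
        if all(expand(selection) & concepts for selection in selections):
            hits.append(obj)
    return hits


# One selection in an ARIA facet and one in an Iconclass facet.
result = combined_view(["aria:birds", "ic:saints"], index)
```

The manuscript is found although the first selection was made in the ARIA facet, because its Iconclass index concept is mapped to the selected ARIA concept.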

Collections Access: Merged View
• Facets using a merged concept scheme
  • Mapping leads to hierarchical links between schemes
• Making the links between vocabularies more visible during search
  • A way to ‘enrich’ weakly structured vocabularies
[Figure: MERGED view, with Merged facet1 and other merged facets]
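A merged concept scheme can be pictured as the union of the hierarchical links inside each vocabulary and the cross-scheme links derived from the mapping. The sketch below shows that idea on toy data; it is not the prototype's code, every identifier is hypothetical, and it assumes the merged hierarchy contains no cycles.

```python
# Hypothetical toy data: child -> parent links inside each original scheme.
broader = {
    "aria:birds": "aria:animal_pieces",
    "ic:birds_of_prey": "ic:birds",
}

# Hypothetical mapping results, read as (narrower, broader) links across schemes.
mapped_broader = [("ic:birds", "aria:birds")]


def merge(broader, mapped_broader):
    """Build a parent -> children table from in-scheme and cross-scheme links."""
    children = {}
    for child, parent in list(broader.items()) + mapped_broader:
        children.setdefault(parent, []).append(child)
    return children


def subtree(concept, children):
    """List a concept and all its descendants, depth first (assumes no cycles)."""
    result = [concept]
    for child in children.get(concept, []):
        result.extend(subtree(child, children))
    return result


merged = merge(broader, mapped_broader)
facet = subtree("aria:animal_pieces", merged)
```

In this toy case the ARIA branch gains the Iconclass concepts as descendants, which is one way a weakly structured vocabulary could be enriched by a more detailed one.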

Collection Access: Conclusion
Prototype is a thin layer on top of SW/RDF technology (using Sesame)
• All data is stored in and retrieved from RDF repositories
• Easily adaptable for experimentation with different views (without programming)
For convincing results you need ‘good quality’ mapping
• E.g., to assess the value of Merged view
  Towards application-specific evaluation criteria?