Enriching Cultural Heritage Data with DBpedia
Antoine Isaac | DBpedia Community Meeting 2016
Arrival of a Portuguese ship | Anonymous | 1660 - 1625, Rijksmuseum | Netherlands, Public Domain
Europeana?
Europeana Collections homepage | Europeana | CC BY-SA
Europeana aggregation infrastructure | Europeana | CC BY-SA
Europeana has many data challenges
We aggregate very heterogeneous metadata
• More than 48M objects
• 3,500 galleries, libraries, archives and museums
• 50 languages
• From all EU countries
• Level of quality varies greatly
Linked Open Data
Europeana Linked Open Data video on Vimeo | Europeana | CC BY-SA
Europeana Linked Data Strategy
Our efforts and lines of work
• The Europeana Data Model (EDM) offers a way to represent richer (linked) data
• We apply an enrichment strategy to link source data to reference data, including DBpedia
Will be discussed in Parallel Session 2:
• We encourage data providers to contribute links between objects and (their own) vocabularies
• We encourage alignment activities between domain vocabularies
The Europeana Data Model
Clavecin, Bartolomeo Cristofori | Cite de la Musique, MIMO - Musical Instruments Museums Online | CC BY-NC-SA
Europeana Data Model example | Europeana | CC BY-SA
Create a “semantic layer” on top of cultural heritage objects
Include multilingual “value vocabularies” (e.g. thesauri represented in SKOS) from Europeana’s providers or from third-party data sources
Semantic enrichment, a solution for better quality data?
Automatic and manual enrichment are more and more commonly used in digital libraries to:
• normalise data
• “standardize data” by linking it to authority resources
• improve multilingual coverage in datasets
• contextualise resources
The main components of semantic enrichment
• Source: source objects whose metadata is being enriched
• Target: set of resources used to enrich the source metadata; targets can be of different types, from simple uncontrolled strings to resources published as LOD
• Rules: specify how the enrichment between the source and target should be executed
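As a rough illustration of how the three components relate, here is a minimal Python sketch. The class names, the exact-label rule and the sample values are assumptions made for illustration only; they do not describe Europeana's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SourceObject:
    """A source object whose metadata is being enriched."""
    identifier: str
    metadata: dict[str, list[str]]
    links: dict[str, list[str]] = field(default_factory=dict)


@dataclass
class TargetResource:
    """A resource from the target vocabulary (here: a LOD resource with labels)."""
    uri: str
    labels: set[str]


# A rule decides whether a source value should be linked to a target resource.
Rule = Callable[[str, TargetResource], bool]


def exact_label_rule(value: str, target: TargetResource) -> bool:
    """Hypothetical rule: case-insensitive exact match on any target label."""
    return value.strip().lower() in {label.lower() for label in target.labels}


def enrich(source: SourceObject, field_name: str,
           targets: list[TargetResource], rule: Rule) -> SourceObject:
    """Apply one rule to one metadata field and record the resulting links."""
    for value in source.metadata.get(field_name, []):
        for target in targets:
            if rule(value, target):
                source.links.setdefault(field_name, []).append(target.uri)
    return source


# Sample values, chosen only for illustration.
harpsichord = TargetResource(
    "http://dbpedia.org/resource/Harpsichord", {"Harpsichord", "Clavecin"}
)
obj = SourceObject("object/1", {"dc:type": ["clavecin"]})
enrich(obj, "dc:type", [harpsichord], exact_label_rule)
print(obj.links)
```

Keeping the rule as a separate callable mirrors the split on the slide: the same source and target can be combined under stricter or looser matching rules without changing either side.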
Automatic enrichment process in Europeana
Analysis
• selection of metadata fields in descriptions
• selection of potential rules to match
Linking
• matching the values of the metadata fields to values of the contextual resources
• adding contextual links
Augmentation of search index
• selection of values from the contextual resource
• values go into the search index
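The three stages can be sketched as a small pipeline. This is a simplified Python illustration, not the Europeana code: the vocabulary entries use placeholder example.org URIs, and the function names and the label-matching strategy are assumptions.

```python
# Hypothetical contextual resources with multilingual labels (placeholder URIs).
VOCABULARY = {
    "http://example.org/place/paris": {"en": "Paris", "fr": "Paris", "nl": "Parijs"},
    "http://example.org/place/vienna": {"en": "Vienna", "de": "Wien", "fr": "Vienne"},
}

# Metadata fields selected for enrichment (assumed configuration).
ENRICHED_FIELDS = ["dcterms:spatial", "dc:coverage"]


def analyse(record):
    """Analysis: select the values of the metadata fields to be enriched."""
    return [(f, v) for f in ENRICHED_FIELDS for v in record.get(f, [])]


def link(candidates):
    """Linking: match field values against labels of the contextual resources."""
    links = []
    for field_name, value in candidates:
        for uri, labels in VOCABULARY.items():
            if value.lower() in (label.lower() for label in labels.values()):
                links.append((field_name, uri))
    return links


def augment_index(links):
    """Augmentation: copy the labels of linked resources into the search index."""
    terms = set()
    for _, uri in links:
        terms.update(VOCABULARY[uri].values())
    return sorted(terms)


record = {"dcterms:spatial": ["Wien"], "dc:title": ["Stadtansicht"]}
links = link(analyse(record))
index_terms = augment_index(links)
print(links)
print(index_terms)
```

In this sketch a record that only says "Wien" becomes findable under the English and French labels as well, which is the multilingual benefit the augmentation step is meant to deliver.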
Vocabularies we currently enrich metadata with
Entity Class | Target vocabulary | Size    | Metadata Fields subject of Enrichment
Places       | GeoNames          | 140,097 | dcterms:spatial, dc:coverage
Concepts     | DBpedia           | 5,284   | dc:subject, dc:type
             | GEMET             | 280     |
Agents       | DBpedia           | 161,209 | dc:creator, dc:contributor
Time         | Semium Time       | 2,566   | dc:coverage, dcterms:temporal, dc:date, edm:year
Why DBpedia?
Building an ecosystem of networked references
• It offers labels in about 124 languages across all its language editions, of which 48 match the languages that Europeana supports
• It gives fairly complete and accurate descriptive metadata about entities
• It works well as a “pivot” vocabulary, providing further links to other vocabularies such as Wikidata and Freebase
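To make the label and “pivot” points concrete, the following Python sketch works on a hand-written entity description. The entity dictionary, the placeholder example.org link targets and the set of supported languages are all assumptions for illustration; nothing here queries DBpedia.

```python
# Hypothetical, hand-written excerpt of an entity description.
ENTITY = {
    "uri": "http://dbpedia.org/resource/Harpsichord",
    "labels": {
        "en": "Harpsichord",
        "fr": "Clavecin",
        "de": "Cembalo",
        "nl": "Klavecimbel",
    },
    # Placeholder equivalence links standing in for links to other vocabularies.
    "same_as": [
        "http://example.org/wikidata/harpsichord",
        "http://example.org/freebase/harpsichord",
    ],
}

# Assumed subset of the languages supported by the portal.
SUPPORTED_LANGUAGES = {"en", "fr", "de", "it"}


def usable_labels(entity, supported):
    """Keep only the labels whose language the portal supports."""
    return {
        lang: label
        for lang, label in entity["labels"].items()
        if lang in supported
    }


def pivot(entity, prefix):
    """Follow equivalence links into another vocabulary, selected by URI prefix."""
    return [uri for uri in entity["same_as"] if uri.startswith(prefix)]


print(usable_labels(ENTITY, SUPPORTED_LANGUAGES))
print(pivot(ENTITY, "http://example.org/wikidata/"))
```

Once an object is linked to one such entity, both the supported-language labels and the identifiers in other vocabularies come along with that single link.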
Not everything is perfect
Colombes : championnats de France d’Athlétisme : rivière, le speaker | Agence de presse Meurisse | 1921, National Library of France | France, Public Domain
Challenges of multilingual automatic enrichment
Evaluation of metadata enrichment practices in digital libraries: steps towards better data enrichments
Poisonous India or the Importance of a Semantic and Multilingual Enrichment Strategy
Marlies Olensky, Juliane Stiller, Evelyn Dröge, MTSR 2012
http://link.springer.com/chapter/10.1007%2F978-3-642-35233-1_25
Comparative evaluation of enrichments
We ran a quantitative evaluation on a sample set enriched by 7 different tools (settings)
http://pro.europeana.eu/taskforce/evaluation-and-enrichments
Example of Recommendations that will be explored
Define your enrichment goals
• Develop better criteria for evaluating enrichment
Choose the right service
• enrichment tool more aware of the semantics of the model
Monitor your enrichment process and re-assess
• target dataset could be richer: new terms, new languages, more granular
Enrichment using a better reference for contextual entities?
You will hear about this in the next session ☺
With slides from Valentine Charles, Juliane Stiller, Hugo Manguinhas and Stefan Gradmann