linked data and cultural heritage data: an overview of the approaches from europeana and the...


Upload: the-european-library

Post on 14-Jun-2015


DESCRIPTION

Europeana provides access to digital resources from a wide range of cultural heritage institutions across Europe. To support Europeana, a wide network of organizations collaborates in data integration activities. The European Library acts as the library-domain aggregator for Europeana, and also serves as a gateway to the collections and data of Europe’s national and research libraries, operating on the principle of open data for re-use. The Europeana Network addresses its data integration challenges by leveraging Linked Data and the Semantic Web. Its approach to data integration is based on a single data model, the Europeana Data Model, which embraces Semantic Web principles to integrate the various data models and ontologies used in cultural heritage data. The Linked Data paradigm brings many new challenges to libraries. The generic nature of data representation in Linked Data, while allowing any community to manipulate the data, also opens many paths for implementation, with no clear optimal choice for libraries. The European Library leverages its operational infrastructure to make library data available. It maintains The European Library Open Dataset, which is derived from the data aggregated from member libraries and made available under the Creative Commons CC0 1.0 Universal license, in order to promote and facilitate its reuse by any community. Extensive linking is performed in the preparation of The European Library Open Dataset. It relies on Information Extraction and Data Mining to establish links to external open datasets, covering the most prominent entity types present in library data: persons, corporate bodies, places, concepts, intellectual works and manifestations. The European Library also applies a linked data approach to intellectual property rights clearance processes, in support of mass digitization projects. This approach is applied within the European ARROW rights infrastructure.

TRANSCRIPT

Page 1: Linked Data and cultural heritage data: an overview of the approaches from Europeana and The European Library
Page 2

Linked Data and cultural heritage data: an overview of the approaches from Europeana and The European Library

Nuno Freire
Chief data officer
The European Library

Pacific Neighbourhood Consortium 2014 Annual Conference
Taipei, October 2014

Page 3

Outline

Introduction and Context

• The European Library

• Europeana

• The data model for metadata exchange in the Europeana network

Linked Data at The European Library

• Managing and linking person names

• Managing and linking place names

• Managing and linking concepts

Page 4

Introduction and context

www.theeuropeanlibrary.org

Page 5

What is The European Library?

Project started 1996, full operational service from 2005

European hub of metadata, collections and increasing amount of full text

Membership of national and research libraries of 47 Council of Europe states

Non-profit, owned and managed by member libraries

Page 6
Page 7

What does The European Library offer?

Experienced European project partner

Large-scale aggregation

Infrastructure

Data and digital content of Europe’s libraries

Data distribution

Data enrichment

Linked open data

Page 8

Open data distribution

Page 9

32.6m records from 2,300 European galleries, museums, archives and libraries

Books, newspapers, journals, letters, diaries, archival papers

Paintings, maps, drawings, photographs

Music, spoken word, radio broadcasts

Film, newsreels, television

Curated exhibitions

31 languages

EUROPEANA - Europe’s cultural heritage portal

Page 10

The European Library as libraries aggregator to Europeana

Domain Aggregators

• Audiovisual collections, e.g. EUScreen, European Film Gateway

• Archives, e.g. APEX

• Thematic collections, e.g. Judaica Europeana, Europeana Fashion

• Libraries, e.g. The European Library

National initiatives

• National Aggregators, e.g. Culture Grid, Culture.fr

• Regional Aggregators, e.g. Musées Lausannois

Page 11

Metadata in the Europeana Context

Provides a portal for users to access that data

• Metadata, previews and links to source

Makes the metadata freely available for anyone to re-use

• Under the Creative Commons Zero (CC0) public domain dedication

Makes metadata available via an API

Makes metadata available as Linked Open Data

• http://data.europeana.eu/

Page 12

Europeana Data Model: a Collaborative Effort

Cross-community development

Involving library, archive and museum experts

Ca. 60 participants

http://pro.europeana.eu/edm-documentation

Page 13

Europeana Data Model: general principles

• A cross domain approach

• Supporting the common semantics of cultural domains

• Addressing the requirements of the Europeana portal

• Adheres to the modeling principles of the Web of Data

• Available as an OWL ontology and XML schema

• Allows finer-grained models of the different domains to be at least partly interoperable at the semantic level

• Allows metadata to retain their original expressivity and richness

Page 14

Linked Data at The European Library

Managing and linking person names

Page 15

Which data from VIAF is used at The European Library

Name variants: various forms of the name of the person or organization. May include the complete name, abbreviated names, acronyms, etc.

Date of birth/death: the dates of birth and death of the person.

Nationalities: the nationalities of a person or organization.

Page 16

How data from VIAF is used in The European Library

Name variants
• For matching of names across records and data sources
• Improves the identification of all publications of a work, the identification of publications in books-in-print databases, and the identification of the contributor in the rights-holders databases.

Date of birth/death
• Used for determining the public domain status.
• Used for matching confirmation and disambiguation of homonyms across data sources

Nationalities
• Used, in some countries, for determining the public domain status of the work.
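
As a rough illustration of how a death date can feed a public domain check, the sketch below assumes a "life plus 70 years" term that runs to the end of the calendar year, which is the general rule in EU copyright law. Actual rights clearance depends on jurisdiction, type of work and, in some countries, nationality. The function name and its parameters are hypothetical and are not taken from ARROW or The European Library.

```python
from datetime import date
from typing import Optional


def is_public_domain(death_year: Optional[int],
                     on: date,
                     term_years: int = 70) -> Optional[bool]:
    """Return True/False when the death year is known, None when it is not.

    Assumption: protection lasts until 31 December of the year that falls
    `term_years` after the author's death, so the work is free from
    1 January of the following year.
    """
    if death_year is None:
        return None  # unknown death date: the status cannot be decided here
    return on.year > death_year + term_years
```

Returning `None` for an unknown death date keeps "cannot decide" separate from "still protected", which matters when the date has to be looked up in an authority file first.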

Page 17

The matching process

VIAF data used for matching, disambiguation, and match probability

Page 18

Matching work contributors with VIAF

Names are matched by similarity

Confirmation of the correctness of a name match is taken from other matching data
• The dates of birth and death
• The title of the work is compared against the list of titles available in VIAF
• All the contributors of the work are matched against the list of known co-authors in VIAF
• The publisher(s) of the work are matched against the list of known publishers in VIAF

A match is only chosen if enough supporting evidence is found
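
The evidence-based matching described on this slide can be sketched as follows. This is a minimal illustration, not the production matcher: the `ViafCandidate` and `Contributor` classes are hypothetical simplifications, name similarity is approximated with Python's `difflib`, and the two thresholds are assumed values.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Optional, Set


@dataclass
class ViafCandidate:  # hypothetical, simplified view of a VIAF entry
    name_variants: Set[str]
    birth_year: Optional[int] = None
    death_year: Optional[int] = None
    titles: Set[str] = field(default_factory=set)
    coauthors: Set[str] = field(default_factory=set)
    publishers: Set[str] = field(default_factory=set)


@dataclass
class Contributor:  # hypothetical view of a contributor in a record
    name: str
    title: str
    birth_year: Optional[int] = None
    death_year: Optional[int] = None
    other_contributors: Set[str] = field(default_factory=set)
    publishers: Set[str] = field(default_factory=set)


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in the range 0..1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def evidence_count(c: Contributor, v: ViafCandidate) -> int:
    """Count the independent pieces of evidence that support the name match."""
    count = 0
    if c.birth_year is not None and c.birth_year == v.birth_year:
        count += 1
    if c.death_year is not None and c.death_year == v.death_year:
        count += 1
    if c.title.lower() in {t.lower() for t in v.titles}:
        count += 1
    if c.other_contributors & v.coauthors:
        count += 1
    if c.publishers & v.publishers:
        count += 1
    return count


def is_match(c: Contributor, v: ViafCandidate,
             min_similarity: float = 0.9, min_evidence: int = 2) -> bool:
    """Accept only a similar name backed by enough supporting evidence."""
    best = max((name_similarity(c.name, n) for n in v.name_variants),
               default=0.0)
    return best >= min_similarity and evidence_count(c, v) >= min_evidence
```

Requiring both a similar name and a minimum amount of evidence is what keeps homonyms apart: two different people named "Doe, Jane" will pass the name test but rarely share dates, titles and publishers.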

Page 19

Contributor names in statements of responsibility

“French Canadian freely arranged by Katherine K. Davis”.

“ed. by Peter Noever ; with a forew. by Frank O. Gehry; and contrib. by Coop Himmelblau.”

“W. Lange, A.C. Zeven and N.G. Hogenboom, editors”

“by Pamela and Neal Priestland”

“Vicente Aleixandre ; estudio previo, selección y notas de Leopoldo de Luis”

Page 20

The approach

Approach the problem as a Named Entity Recognition task in text that may not be grammatically correct and thus lacks lexical evidence

Some requirements from the ARROW context
• Easily applicable to several languages
• The outcomes of the recognition task must be explainable

Design decisions
• Exploring the structured data within national bibliographies
• By analysis of the frequency of word occurrences in names of persons, and in other textual data
• Using word occurrence frequency makes it possible to bypass the need for building training sets
• Using word occurrence frequency also makes it possible to provide simpler explanations of the name recognition results

Page 21

The process – bibliographic record processing

The named entity recognition is performed for a record as follows:
• The statement of responsibility is tokenized
• The person names are recognized by comparing the tokens with the dictionaries
• The recognized names are compared against the names of the contributors present in the structured fields of the record
• If no similar name exists in the record, the contributor is added to the record in a structured data field
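
The four steps above can be sketched as follows. This is an illustrative reading of the slide, not the published algorithm: the two dictionaries are assumed to map a lower-cased word to its number of occurrences in person names and in other text, the 0.5 and 0.8 thresholds are arbitrary, and all function names are hypothetical.

```python
import re
from difflib import SequenceMatcher
from typing import Dict, List


def tokenize(statement: str) -> List[str]:
    """Split into words (keeping a trailing period) and punctuation marks."""
    return re.findall(r"\w+\.?|[^\w\s]", statement)


def name_score(token: str, name_counts: Dict[str, int],
               other_counts: Dict[str, int]) -> float:
    """Share of a word's occurrences that were inside person names."""
    word = token.rstrip(".").lower()
    in_names = name_counts.get(word, 0)
    elsewhere = other_counts.get(word, 0)
    total = in_names + elsewhere
    return in_names / total if total else 0.0


def recognize_names(statement: str, name_counts: Dict[str, int],
                    other_counts: Dict[str, int],
                    threshold: float = 0.5) -> List[str]:
    """Group consecutive name-like tokens into person names.

    Trailing periods are dropped, so "K." is emitted as "K".
    """
    names: List[str] = []
    current: List[str] = []
    for token in tokenize(statement):
        if name_score(token, name_counts, other_counts) >= threshold:
            current.append(token.rstrip("."))
        else:
            if current:
                names.append(" ".join(current))
            current = []
    if current:
        names.append(" ".join(current))
    return names


def add_missing_contributors(recognized: List[str], structured: List[str],
                             min_similarity: float = 0.8) -> List[str]:
    """Append recognized names that resemble no existing contributor."""
    result = list(structured)
    for name in recognized:
        similar = any(
            SequenceMatcher(None, name.lower(), known.lower()).ratio()
            >= min_similarity
            for known in result
        )
        if not similar:
            result.append(name)
    return result
```

Because each decision is a ratio of two counts, a recognition result can be explained by pointing at the word frequencies involved, which fits the explainability requirement from the ARROW context.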

Page 22

Evaluation data set (size of bibliographies and evaluation samples)

National Bibliography | Total records | Main language | Statements of responsibility (evaluation sample) | Referred persons (evaluation sample)
British Library | 13.4 million | English | 205 | 328
German National Library | 9.4 million | German | 200 | 378
National Library of the Netherlands | 3.2 million | Dutch | 200 | 335
National Library of Greece | 0.4 million | Greek | 297 | 379
Central Institute for the Union Catalogue of Italian Libraries | 12.4 million | Italian | 224 | 297
Royal Library of Belgium | 1 million | French and Dutch | 203 | 387
Total | | | 1329 | 2104

Page 23

Evaluation results

Dataset | Exact match precision | Exact match recall | Partial match precision | Partial match recall
British Library | 0.981 | 0.979 | 0.991 | 0.991
German National Library | 0.975 | 0.934 | 0.992 | 0.992
National Library of the Netherlands | 0.973 | 0.875 | 0.977 | 0.979
National Library of Greece | 0.656 | 0.414 | 0.758 | 0.868
Central Institute for the Union Catalogue of Italian Libraries | 0.97 | 0.896 | 0.971 | 0.973
Royal Library of Belgium | 0.981 | 0.959 | 0.981 | 0.982
Overall | 0.948 | 0.837 | 0.958 | 0.963
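
For readers unfamiliar with the metrics, the sketch below shows the standard way precision and recall are computed for exact matches. It is an assumption that the figures above follow this standard definition; the slides do not spell out the scoring rules, and the partial match variant is not reproduced here. The function name and input layout are hypothetical.

```python
from typing import List, Tuple


def precision_recall(predicted: List[List[str]],
                     gold: List[List[str]]) -> Tuple[float, float]:
    """Exact-match precision and recall over a list of statements.

    `predicted[i]` holds the names recognized in statement i and
    `gold[i]` holds the names a human annotator marked in it.
    """
    true_pos = false_pos = false_neg = 0
    for pred, ref in zip(predicted, gold):
        pred_set, ref_set = set(pred), set(ref)
        true_pos += len(pred_set & ref_set)
        false_pos += len(pred_set - ref_set)
        false_neg += len(ref_set - pred_set)
    found = true_pos + false_pos
    expected = true_pos + false_neg
    precision = true_pos / found if found else 0.0
    recall = true_pos / expected if expected else 0.0
    return precision, recall
```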

Page 24

Linked Data at The European Library

Managing and linking place names

Page 25

The approach for place name linking

• We process the complete metadata elements

• The alignment is performed with Geonames

• Using the RDF dump of Geonames

• A generic approach not using any language specific information

• The words themselves are not used as evidence

• We use only characteristics of the words (capitalization, size, etc.)

• Wordnets, part-of-speech analysis, morphological analysis, etc., are not used.

• … in order to allow the use of this approach in a language independent manner
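
To make "characteristics of the words" concrete, the sketch below derives a few surface features from a token without looking at what the word means. The slide names only capitalization and size; the all-capitals flag is an added assumption, and the function name is hypothetical.

```python
from typing import Dict, Union


def word_shape_features(token: str) -> Dict[str, Union[bool, int]]:
    """Describe a token by its surface form only, never by its meaning."""
    return {
        "is_capitalized": token[:1].isupper(),  # first character is upper case
        "is_all_caps": token.isupper(),         # assumed extra feature
        "length": len(token),                   # the "size" of the word
    }
```

Since the features rely on Unicode character properties rather than vocabularies, the same function applies unchanged to any script that distinguishes letter case.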

Page 26

Resolution of the place names

• This task aims to find a single entity in the geographic ontology for aligning with the place name

• The first step of this task is to find all possible candidates for the resolution in the geographic ontology

• Uses a heuristic based predictive model:

• Assigns a probability for each resolution candidate as match or non_match

• An alignment is established if a minimum probability threshold for the class match is achieved.
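
The resolution step can be sketched as below. The scoring function is a toy stand-in for the predictive model, assuming only two signals (exact name match and relative population) with made-up weights. The `Candidate` class, its identifiers and the 0.7 threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:  # hypothetical entry from a geographic ontology
    entity_id: str
    name: str
    population: int


def match_probability(place_name: str, cand: Candidate,
                      candidates: List[Candidate]) -> float:
    """Toy score in 0..1; the real model uses more features than these two."""
    exact = 1.0 if cand.name.lower() == place_name.lower() else 0.0
    largest = max(c.population for c in candidates) or 1  # avoid dividing by 0
    relative_population = cand.population / largest
    return 0.6 * exact + 0.4 * relative_population


def resolve(place_name: str, candidates: List[Candidate],
            threshold: float = 0.7) -> Optional[Candidate]:
    """Return the best candidate, or None if none reaches the threshold."""
    if not candidates:
        return None
    best = max(candidates,
               key=lambda c: match_probability(place_name, c, candidates))
    if match_probability(place_name, best, candidates) >= threshold:
        return best
    return None
```

Returning `None` below the threshold means an uncertain place name stays unlinked rather than being aligned with the wrong entity.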

Page 27

Which information supports the place name resolution

Feature | Description
Number of words | The number of words in the place name.
Name match | If the recognized place name matched: the main name of the place, an alternate name, etc.
Exact name match | If the recognized place name matched exactly the place name.
Relative population | Relative population of the candidate in comparison with other candidates.
Geographic feature type | The type of geographic feature: continent, country, city, etc.
Related places found | The number of other place names found in the administrative hierarchy.
Relative related places | The relative number of administrative divisions found in the subject heading.
In source country | If it is located in one of the source countries of the subject heading system.

Page 28

Linked Data at The European Library

Managing and linking concepts

Page 29

Linking Subject Indexing and Classification Data

The context
• The centralization of bibliographic metadata enables resource access under a unified knowledge organization system

The challenges
• Diversity of languages
• Diversity of knowledge organization systems in use across European libraries
• Heterogeneous levels of details in subject information

Current status at The European Library
• Use of alignments between ontologies:
• Alignments were created manually or semi-automatically
• Alignments in use include: CERIF, MACS (LCSH, RAMEAU, SWD), UDC and DDC

Page 30

References

Further details may be consulted in the following publications:

• Freire, N., 2014, 'Word Occurrence Based Extraction of Work Contributors from Statements of Responsibility', International Journal on Digital Libraries, Volume 14, Issue 3, pages 141-148. DOI: 10.1007/s00799-014-0113-3.
• Charles, V., Freire, N., Antoine, I., 2014, 'Links, languages and semantics: linked data approaches in The European Library and Europeana', in 'Linked Data in Libraries: Let's make it happen!', IFLA 2014 Satellite Meeting on Linked Data in Libraries.
• Freire, N., Muhr, M., 2013, 'Use of Authorities Open Data in the ARROW Rights Infrastructure', in proceedings of the DC-2013 Linking to the Future Conference.
• Freire, N., 2013, 'Visualization and navigation of knowledge in pan-European resources: the case of The European Library', in proceedings of the International UDC Seminar on Classification & Visualization: interfaces to knowledge.
• Freire, N., et al., 2012, 'Author Consolidation across European National Bibliographies and Academic Digital Repositories', 11th International Conference on Current Research Information Systems.
• Freire, N., Borbinha, J., Calado, P., 2011, 'A Language Independent Approach for Aligning Subject Heading Systems with Geographic Ontologies', International Conference on Dublin Core and Metadata Applications 2011.
• Freire, N., Borbinha, J., Calado, P., Martins, B., 2011, 'A Metadata Geoparsing System for Place Name Recognition and Resolution in Metadata Records', ACM/IEEE Joint Conference on Digital Libraries.

Page 31

Thank you

Nuno Freire Chief data officer

[email protected]