Find Research Data
b2find.eudat.eu
www.eudat.eu
B2FIND Integration
How to publish metadata in EUDAT’s B2FIND catalogue
This work is licensed under the Creative Commons CC-BY 4.0 licence
Version 2, December 2015
EUDAT: A truly pan-European Infrastructure
EUDAT offers common data services to both research communities and individuals through a network of 35 European organisations.
EUDAT enables European researchers from any discipline and any geographic location to preserve, find, access, and process data in a trusted environment.
European Infrastructures
Technology Providers
Research Communities
Community-Driven Solutions
PHYSICAL SCIENCES & ENGINEERING
SOCIAL SCIENCES & HUMANITIES
MATERIALS & ANALYTICAL FACILITIES
ENVIRONMENTAL SCIENCES
MAPPER
BIOMEDICAL & MEDICAL SCIENCES
EUDAT services (the so-called B2 Service Suite) are designed, built and implemented based on user community requirements.
B2 Service Suite
What is B2FIND?
• B2FIND is based on a comprehensive joint metadata catalogue of research data collections stored in EUDAT data centres and other repositories
• B2FIND provides a simple and user-friendly discovery service on metadata steadily harvested from a wide range of research communities
Why should you publish your metadata in B2FIND?
• Make your research data
– searchable, visible and accessible to the public
– popular in a cross-disciplinary and international scope
• Improve interoperability and re-use of your data
• Allow feedback and annotations on your research output
• Benefit from validation, quality assurance and added value of your metadata
Data from a huge selection of subjects
• B2FIND has a truly cross-community approach
• Metadata is mapped and offered covering a wide range of communities
– From Climate Research to Social Sciences
– From Biodiversity to Linguistics
– From Archaeology to Seismology
• Transformation and homogenisation of the catalogue allows use of a common vocabulary
B2FIND communities
• B2FIND initially comprises communities in the EUDAT registered domain of data that provide a well-described and stable metadata offering
• EUDAT is extending the service to other interested and reliable data and metadata providers
• The list of currently integrated communities is available at http://b2find.eudat.eu/group/
What will be covered
• How to get your metadata published in B2FIND?
• Metadata Generation
• Metadata repository and provider
• Metadata Harvesting
• Metadata Formats (excerpt)
• Metadata Mapping
• B2FIND MD Schema (excerpt)
• Metadata Validation
• Support requests
• Appendix: OAI-PMH - What it is and how it works
How to get your metadata published in B2FIND? - The Metadata (MD) Ingestion Roadmap
[Diagram: the ingestion roadmap]
Data Provider on Community site: MD Generation, MD Repository and Provider
Service Provider on EUDAT site: MD Harvesting, MD Mapping and Validation, MD Uploading and Indexation
Metadata Generation
• must be done in close proximity to the data production
• should be part of the data management plan
• must be checked and possibly enhanced to aim for a comprehensive data description
• benefits from quality control at an early stage
• should be based on common ontologies and metadata formats
Metadata repository and provider
• To be set up on the community site to allow harvesting
• OAI-PMH is the preferred protocol (for a detailed description of the protocol and an installation guide for the data provider tool, see the Appendix)
• Other data transfer techniques are also supported, if necessary
• EUDAT offers support for the installation
Metadata Harvesting
• B2FIND harvests regularly and incrementally from OAI endpoints
• Initially, the B2FIND team will attempt a first harvest from a given and accessible OAI endpoint
• The frequency and the harvested sets will be negotiated with the community
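An incremental harvest request can be sketched as follows. This is only an illustration under stated assumptions, not the B2FIND harvester itself: the endpoint https://example.org/oai, the set name "climate" and the helper list_records_url are hypothetical. Only the verb and parameter names (ListRecords, metadataPrefix, set, from) come from the OAI-PMH protocol.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL.

    Passing from_date restricts the harvest to records changed since
    that date, which is what makes a repeated harvest incremental.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec      # harvest only the negotiated set
    if from_date:
        params["from"] = from_date    # date of the previous harvest
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and set name, for illustration only.
print(list_records_url("https://example.org/oai", set_spec="climate", from_date="2015-12-01"))
```

Storing the date of each successful harvest and passing it as from_date on the next run is one simple way to keep repeated harvests incremental.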
Metadata Formats (excerpt)

Dublin Core
Specification: http://dublincore.org/specifications/ and the following standard documents:
• IETF RFC 5013
• ISO Standard 15836-2009
• NISO Standard Z39.85
Description: The Dublin Core Schema is a small set of vocabulary terms that can be used to describe web resources (video, images, web pages, etc.), as well as physical resources such as books or CDs, and objects like artworks. The full set of Dublin Core metadata terms can be found on the Dublin Core Metadata Initiative (DCMI) website (see the specification link).
Used by B2FIND to harvest from communities: DataCite, NARCIS, PanData, TheEuropeanLibrary, SDL, DARIAH, IVOA, PDC

ISO 19115
Specification: http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=53798
Description: ISO 19115-1:2014 defines the schema required for describing geographic information and services by means of metadata. It provides information about the identification, the extent, the quality, the spatial and temporal aspects, the content, the spatial reference, the portrayal, distribution, and other properties of digital geographic data and services.
Used by B2FIND to harvest from communities: ENES, Earlinet

MarcXML
Specification: http://www.loc.gov/standards/marcxml/
Description: MARC (MAchine-Readable Cataloging) standards are a set of digital formats for the description of items catalogued by libraries, such as books. MARC was developed by Henriette Avram at the US Library of Congress during the 1960s to create records that can be used by computers, and to share those records among libraries.
Used by B2FIND to harvest from communities: B2SHARE, ALEPH

CMDI
Specification: http://www.clarin.eu/content/component-metadata
Description: CMDI (Component MetaData Infrastructure) was initiated by CLARIN to provide a framework to describe and reuse metadata blueprints. Description building blocks (“components”, which include field definitions) can be grouped into a ready-made description format (a “profile”).
Used by B2FIND to harvest from communities: CLARIN

DDI
Specification: http://www.ddialliance.org
Description: DDI (Data Documentation Initiative) is an effort to create an international standard for describing data from the social, behavioural, and economic sciences.
Used by B2FIND to harvest from communities: CESSDA
Metadata Mapping
The community-specific ‘raw’ metadata are processed and homogenised to the B2FIND schema in the following steps:
• Parse harvested XML records and select entries by MD format specific XPath rules
• Analyse and parse values and map onto key-value pairs (JSON) vs. given controlled vocabularies
• Check and validate the resulting JSON records against the B2FIND schema
• Use (community specific) ontologies and thesauri
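The first two mapping steps can be illustrated with a minimal Python sketch. It assumes a Dublin Core record and a hypothetical rule table (RULES) that maps B2FIND field names to XPath expressions. The rule table, the sample record and the helper map_record are invented for illustration and are not the actual B2FIND mapping code.

```python
import json
import xml.etree.ElementTree as ET

# Namespace prefixes used in the XPath rules below.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
}

# Hypothetical mapping rules: B2FIND field name -> XPath in a Dublin Core record.
RULES = {
    "Title": ".//dc:title",
    "Description": ".//dc:description",
    "Creator": ".//dc:creator",
    "Source": ".//dc:identifier",
}

# Invented sample record, for illustration only.
SAMPLE = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Sea surface temperature 1990-2000</dc:title>
  <dc:creator>Doe, Jane</dc:creator>
  <dc:creator>Roe, Richard</dc:creator>
  <dc:identifier>https://example.org/data/42</dc:identifier>
</oai_dc:dc>"""


def map_record(xml_text, rules=RULES):
    """Select values by XPath and map them onto key-value pairs."""
    root = ET.fromstring(xml_text)
    record = {}
    for field, xpath in rules.items():
        values = [el.text.strip() for el in root.findall(xpath, NS) if el.text]
        if values:
            # Repeated elements are joined into a ';' separated list.
            record[field] = "; ".join(values)
    return record


print(json.dumps(map_record(SAMPLE), indent=2))
```

Fields with no matching element (here Description) are simply left out of the resulting record, so a later validation step can report them as missing.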
B2FIND Metadata Schema (excerpt)

Metadata type | B2FIND field name | Semantic definition | Allowed values / CV | Level of obligation | Occurrence
General information | Title | A name or title by which a resource is known | Free text | Mandatory | 1
General information | Description | All additional textual information | CKAN2.0 only supports plain text | Recommended | 1
Data access | Source | URI of the related resource | Valid URL | Mandatory | 1
Data access | PID | Persistent Identifier | | Recommended | 1
Data access | DOI | Digital Object Identifier | | Recommended | 1
Provenance data | Creator | List of the main researchers involved in producing the data | Text field (‘;’ list of cited names, separately indexed) | Recommended | 0-n
Provenance data | Discipline | Field of research | Text field (mapped and validated against CV) | Recommended | 0-n
Provenance data | Publisher | The person or institution that publishes the data | | |
Provenance data | PublicationYear | The year when the data was or will be made public | YYYY | Recommended | 1
Data coverage | TemporalCoverage | Relation to or coverage of a specific interval in time | Interval between two UTC date timestamps: [BeginDateTime, EndDateTime] | Optional | 1
Data coverage | SpatialCoverage | The spatial limits of a place | A spatial point or box specification, CKAN representation: spatial={"type":"Polygon","coordinates":[[[minlat,minlon…]]} | Optional | 1
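A check of the obligation levels listed in this excerpt could look like the sketch below. The field names and levels are taken from the excerpt. The function check_obligations and its message texts are hypothetical, and the real B2FIND validation may differ.

```python
import re

# Obligation levels as given in the schema excerpt.
MANDATORY = ("Title", "Source")
RECOMMENDED = ("Description", "PID", "DOI", "Creator", "Discipline", "PublicationYear")


def check_obligations(record):
    """Return (errors, warnings) for a mapped key-value record."""
    errors = [f"missing mandatory field: {f}" for f in MANDATORY if not record.get(f)]
    warnings = [f"missing recommended field: {f}" for f in RECOMMENDED if not record.get(f)]
    year = record.get("PublicationYear")
    if year and not re.fullmatch(r"\d{4}", str(year)):
        errors.append("PublicationYear must have the form YYYY")
    return errors, warnings


errors, warnings = check_obligations(
    {"Title": "Example dataset", "Source": "https://example.org/data/42", "PublicationYear": "2015"}
)
print(errors)
print(warnings)
```

Missing mandatory fields are reported as errors and missing recommended fields as warnings, so a record can still be accepted while its description is improved.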
Metadata Validation
Check each field for coverage, consistency and validity
• ‘Technical’, e.g.:
– Check date-time vs. UTC format
– Check spatial coverage by geonames.org and consistency of lat/lon coordinates
• Semantic mapping
– using controlled vocabularies
– using ISO standards, e.g. iso639 library for ‘Language’
• Online checks
– of links to the data objects (‘Source’, ‘PID’ and ‘DOI’)
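Two of the ‘technical’ checks can be sketched in Python. The timestamp layout YYYY-MM-DDThh:mm:ssZ is an assumption (it is the UTC form used by OAI-PMH), and the function names are hypothetical. The geonames.org lookup and the online link checks are not shown.

```python
from datetime import datetime, timezone


def parse_utc(timestamp):
    """Parse a timestamp of the assumed form YYYY-MM-DDThh:mm:ssZ as UTC."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def valid_temporal_coverage(begin, end):
    """True if both timestamps parse and the interval is not reversed."""
    try:
        return parse_utc(begin) <= parse_utc(end)
    except ValueError:
        return False


def valid_lat_lon(lat, lon):
    """True if the coordinates lie within the usual geographic ranges."""
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


print(valid_temporal_coverage("1990-01-01T00:00:00Z", "2000-12-31T23:59:59Z"))
print(valid_lat_lon(53.55, 9.99))
```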
Support requests
www.eudat.eu/support-request?service=B2FIND
For more info: https://eudat.eu/services/b2find
User documentation: https://www.eudat.eu/services/userdoc/b2find-integration
Appendix
OAI-PMH: What it is and how it works
OAI-PMH (http://www.openarchives.org)
• stands for Open Archives Initiative Protocol for Metadata Harvesting
• aims at world-wide consolidation of scholarly archives
• enables free access to the archives (at least: metadata)
• is a low-barrier mechanism for repository interoperability
• consists of a set of six verbs or services that are invoked within HTTP
• provides consistent interfaces for data and service providers
• allows effortless implementation
• is based only on a few simple protocols (HTTP, XML, DC)
Basic functioning of OAI-PMH: Data/Service Provider setup
[Diagram]
Service Provider: Metadata Harvester, Local Metadata Storage, EUDAT Metadata Catalogue, “Services”, e.g. Search, Access, Commenting, …
Data Provider: Metadata (Documents)
Exchange between the two: Requests (based on HTTP) from the Service Provider, Metadata (encoded in XML) from the Data Provider
OAI benefits
• Interoperability: it is by no means domain specific and is based on common metadata schemas
• Widely used: it is a quasi-standard tool for providing metadata. For registered data providers (more than 2800 repositories worldwide) see e.g. https://www.openarchives.org/Register/BrowseSites
• Simple to install: this appendix offers a guide to the jOAI software. See the list of tools implemented by members of the Open Archives Initiative community at https://www.openarchives.org/pmh/tools/tools.php
• Simple to use: OAI attached great importance to the simplicity of the protocol
OAI shortcomings
• Inefficiency: the XML serialisation and deserialisation takes time.
• Reference clash issue: if two records happen to have the same ID value, the envelope is not valid XML.
• Persistence of deletion: OAI-PMH allows three levels of persistence, but most providers promise none.
• Lack of SSL: by a strict reading, the OAI-PMH standard supports only http:, but not https:
Software for OAI-PMH
jOAI software (http://www.dlese.org/dds/services/joai_software.jsp)
• is a Java-based data provider and harvester tool
• is open source Open Archives Initiative software
• runs in a servlet container such as Apache Tomcat
• enables existing systems, archives and databases to provide metadata via OAI-PMH and to harvest metadata to the file system
Installation overview
To install and run the jOAI software you must have the following:
• oai.war - the jOAI software
• Apache Tomcat v5.5.x or v6.x
• Java Standard Edition (SE) (or JDK) version 6
For details see the OAI-PMH tutorial at http://www.oaiforum.org/tutorial/
Data provider
Configuration and customisation can be done directly in the jOAI data provider site:
1. Setup and configuration
   Data Provider, Setup and status, Repository Information and Administration
2. Add metadata by adding directories of files
   Metadata Files Configuration, Add metadata directory
3. (Re)index added/changed directories ..
4. (optional): Set configuration, Access control, …
OAI-PMH Harvester – Verbs and Parameters
Verbs that specify the service being invoked
• Identify - used to retrieve information about the repository.
• ListIdentifiers - used to retrieve record headers from the repository.
• ListRecords - used to harvest full records from the repository.
• ListSets - used to retrieve the set structure of the repository.
• ListMetadataFormats - lists available metadata formats.
• GetRecord - used to retrieve an individual record from the repository.
Selective harvesting by parameters
• identifier - specifies a specific record identifier.
• metadataPrefix - specifies the metadata format of the returned records.
• set - specifies the set that returned records must belong to.
• from/until - returns records created/updated/deleted after/before this date.
• resumptionToken - a token to resume a request where it last left off.
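To show how a harvester might read the reply to one of these verbs, here is a minimal Python sketch that parses a ListIdentifiers response and picks up the resumptionToken. The sample response and the helper parse_list_identifiers are invented for illustration. The element names and the namespace come from the OAI-PMH 2.0 protocol.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

# Invented sample response, for illustration only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>oai:example.org:1</identifier>
      <datestamp>2015-12-01</datestamp>
    </header>
    <header status="deleted">
      <identifier>oai:example.org:2</identifier>
      <datestamp>2015-12-02</datestamp>
    </header>
    <resumptionToken>token-123</resumptionToken>
  </ListIdentifiers>
</OAI-PMH>"""


def parse_list_identifiers(xml_text):
    """Return the record headers and the resumption token (None if absent or empty)."""
    root = ET.fromstring(xml_text)
    headers = []
    for header in root.iter(f"{OAI}header"):
        headers.append({
            "identifier": header.findtext(f"{OAI}identifier"),
            "datestamp": header.findtext(f"{OAI}datestamp"),
            "deleted": header.get("status") == "deleted",
        })
    token = root.findtext(f".//{OAI}resumptionToken")
    return headers, (token or None)


headers, token = parse_list_identifiers(SAMPLE_RESPONSE)
print(len(headers), token)
```

A harvester would repeat the request with the returned resumptionToken until no token (or an empty one) comes back, which marks the end of the list.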
An example of an OAI Provider and Harvester