Do MORe with your data
LoCloud Final Conference, 5th February 2016
Dr. Dimitris Gavrilis
Digital Curation Unit - IMIS, Athena Research Center
LoCloud is funded by the European Commission's ICT Policy Support Programme
• Key characteristics:
  – Fault-tolerance
  – High-availability
  – Elasticity
  – Scalability
• Key components:
  – Storage layer
  – Decentralized & scalable services
  – Pluggable services
MORe Architecture
Micro-service architecture
[Figure: architecture diagram. Labels, grouped by layer:]
• Storage nodes
• Data access layer
• Core services layer
• Input service mgmt (input sources): OAI-PMH, MINT mapping tool, File-Upload
• Validation service mgmt (validation micro-services): Structure, Schema, Linking, Schematron rules
• Enrichment service mgmt (enrichment micro-services): Language identification, Thesauri collections, Vocabulary matching, Background links, Geo normalization, Geo coding, Reverse geo-coding, Historic place names
• Publish serv. mgmt (publish services): Archive, Elastic Search, RDF Store, OAI-PMH, Omeka, Wikimedia, LoCloud collections
Enrichment micro-services
• 14 enrichment services so far:
  – Thematic
  – Spatial
  – Temporal
  – Other
• Enrichment services run in:
  – Austria
  – Spain
  – Greece
  – Lithuania
  – Slovenia
  – Norway
Distributed
Validation
• Validation schemes
  – Flexibility
• Schematron rule-based validation
  – No more rejected packages
• Get completeness graphs for every package and
  – schema
  – element
  – per mandatory/recommended set
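The numbers behind a completeness graph can be approximated as the share of records in a package that carry each element, reported separately for the mandatory and the recommended set. The following is a minimal sketch under stated assumptions: records are plain dicts, and the element sets `MANDATORY` and `RECOMMENDED` are illustrative placeholders, not the actual profiles used by MORe.

```python
from collections import Counter

# Hypothetical element sets, for illustration only.
MANDATORY = ["dc:title", "dc:type", "edm:rights"]
RECOMMENDED = ["dc:creator", "dcterms:spatial"]


def element_completeness(records, elements):
    """Return {element: fraction of records with a non-empty value}."""
    filled = Counter()
    for record in records:
        for element in elements:
            if record.get(element):  # missing, None, "" and [] all count as empty
                filled[element] += 1
    total = len(records)
    return {e: (filled[e] / total if total else 0.0) for e in elements}


def package_completeness(records):
    """Per-element completeness, grouped by mandatory/recommended set."""
    return {
        "mandatory": element_completeness(records, MANDATORY),
        "recommended": element_completeness(records, RECOMMENDED),
    }


package = [
    {"dc:title": "Church of St. Nicholas", "dc:type": "image", "edm:rights": "CC0"},
    {"dc:title": "Old mill", "dc:type": "", "dc:creator": "Unknown"},
]
report = package_completeness(package)
```

Each fraction in `report` corresponds to one bar of such a graph; an empty package yields 0.0 for every element rather than a division error.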
Metadata Quality
• On-the-fly indexing, analysis and intuitive presentation of:
  – Thematic information
  – Spatial information
  – Temporal information
Preview
Publication
• Publish your enriched data to:
  – Europeana
  – An RDF Store as LOD
  – Elastic Search
  – A zip archive for download
• Publish to multiple targets simultaneously
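Publishing to several targets at once can be pictured as running one publisher per target concurrently and collecting the outcome of each. This is a minimal sketch, not MORe's implementation: the publisher functions below are hypothetical stand-ins, and a thread pool is simply one convenient way to run them side by side.

```python
from concurrent.futures import ThreadPoolExecutor


def publish_to_all(records, targets):
    """Run every publisher concurrently; return {target name: result or exception}."""
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        futures = {name: pool.submit(publish, records) for name, publish in targets.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception as exc:  # one failing target must not block the others
                results[name] = exc
    return results


# Hypothetical publishers standing in for real targets.
def to_elastic(records):
    return f"indexed {len(records)} records"


def to_zip(records):
    return f"archived {len(records)} records"


def to_rdf_store(records):
    raise ConnectionError("RDF store unreachable")


outcome = publish_to_all(
    [{"id": 1}, {"id": 2}],
    {"elastic": to_elastic, "zip": to_zip, "rdf": to_rdf_store},
)
```

Keeping the exception as the result for a failed target lets the caller report partial success instead of aborting the whole publication.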
Enrichment micro-services
• We have our own Geo-names server
Place names
• We have our own PeriodO database
Periods
• We have access to over 30 thesauri
AIT (Angewandte Informationstechnik Forschungsgesellschaft mbH)
Author | Name of vocabulary
University of California, Santa Barbara | Alexandria Digital Library Feature Type Thesaurus
Royal Commission on the Ancient and Historical Monuments of Scotland (RCAHMS) | Archeological Objects Thesaurus Scotland
English Heritage | Archeological Sciences Thesaurus
English Heritage | Building Materials Thesaurus
English Heritage | Components Thesaurus
American Folklore Society | Ethnographic Thesaurus
English Heritage | Event Type Thesaurus
English Heritage | Evidence Thesaurus
English Heritage | FISH Archeological Objects Thesaurus
Eionet European Environment Information and Observation Network | General Multilingual Environmental Thesaurus GEMET
Federation Internationale des Archives du Film (FIAF) | General Subject headings for Film Archives
The Discovery Programme | Irish Monuments
The Discovery Programme | Irish Periods
Royal Commission on the Ancient and Historical Monuments of Scotland (RCAHMS) | Maritime Craft Thesaurus Scotland
English Heritage | Maritime Craft Type Thesaurus
English Heritage and Royal Commission on the Historical Monuments of England | MDA Archaeological Objects Thesaurus
Royal Commission on the Ancient and Historical Monuments of Wales (RCAHMW) | Monument Thesaurus Wales
Royal Commission on the Ancient and Historical Monuments of Scotland (RCAHMS) | Monument Type Thesaurus
English Heritage | Period Thesaurus
Royal Commission on the Ancient and Historical Monuments of Wales (RCAHMW) | Period Thesaurus Wales
Bibliographic Standards Committee of the Rare Books and Manuscripts Section (ACRL/ALA) | Relator Terms for Use in Rare Book and Special Collections Cataloguing
Universidad de León | Tesauro de Ciencias de la Documentación
Library of Congress. Prints and Photographs Division | Thesaurus for Graphic Materials 1: Subject Terms
Library of Congress. Prints and Photographs Division | Thesaurus for Graphic Materials 2: Genre and Physical Characteristic Terms
Ministero per i Beni e le Attività Culturali | Thesaurus PICO 4.1
UKAT | UK Archival Thesaurus (UKAT)
UNESCO | UNESCO thesaurus
Thesauri mappings
• Map your subject terms to standardized concepts from SKOSified vocabularies:
  – AAT
  – Perio.do
  – …
• Subject collections showcase
  – Publicly available subject collections
• Seamless integration with MORe
  – Autocomplete search of terms within thesaurus
• Targeted enrichment based on item-level subject terms
Subject collections
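Mapping a free-text subject term to a concept, and autocompleting terms within a thesaurus, both reduce to lookups over the labels of a SKOS-style vocabulary. The sketch below assumes a tiny hand-made vocabulary held in memory; the concept URIs are placeholders, and exact case-insensitive label matching is an assumption rather than the matching strategy MORe uses.

```python
from bisect import bisect_left

# Hypothetical concepts; the URIs are placeholders, not real vocabulary entries.
CONCEPTS = [
    {"uri": "http://example.org/concept/1", "prefLabel": "church", "altLabels": ["chapel"]},
    {"uri": "http://example.org/concept/2", "prefLabel": "castle", "altLabels": ["fortress", "citadel"]},
    {"uri": "http://example.org/concept/3", "prefLabel": "chalice", "altLabels": []},
]


def build_index(concepts):
    """Sorted list of (normalized label, uri) pairs for preferred and alternative labels."""
    pairs = []
    for concept in concepts:
        for label in [concept["prefLabel"], *concept["altLabels"]]:
            pairs.append((label.strip().lower(), concept["uri"]))
    return sorted(pairs)


def match_term(index, term):
    """Return the URI of the concept whose label equals the term, else None."""
    key = term.strip().lower()
    pos = bisect_left(index, (key, ""))
    if pos < len(index) and index[pos][0] == key:
        return index[pos][1]
    return None


def autocomplete(index, prefix, limit=10):
    """Return up to `limit` labels starting with the prefix, in alphabetical order."""
    key = prefix.strip().lower()
    pos = bisect_left(index, (key, ""))
    hits = []
    while pos < len(index) and index[pos][0].startswith(key) and len(hits) < limit:
        hits.append(index[pos][0])
        pos += 1
    return hits


INDEX = build_index(CONCEPTS)
```

Because the index is sorted, both the exact match and the prefix search start from a single binary search, which keeps autocomplete responsive even for large thesauri.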
• Automatic enrichment of content with entries from:
  – Wikipedia
  – DBpedia
  – SKOSified thesauri
UPV/EHU – Universidad del País Vasco
Metadata Enrichment
• The MORe API lets you run the entire aggregation engine through REST
• Developers area
  – API key generation
  – API documentation with examples
  – Example Java projects for NetBeans & Eclipse IDEs
Developers & Creative Industries API Integration
Developers & Creative Industries Plugins
• Developers can create their own enrichment micro-services on their own servers and integrate them into the enrichment process of MORe.
• Developers have to implement a REST-based interface and declare it as an enrichment micro-service in MORe.
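To give a feel for what such a plugin might look like, here is a minimal sketch of a REST enrichment micro-service built only on the Python standard library. Everything about the contract is an assumption made for illustration: the `/enrich` path, the JSON-in/JSON-out payload, and the toy `wordCount` enrichment are hypothetical, and the interface MORe actually expects is defined in its developer documentation.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def enrich(record):
    """Toy enrichment: add a 'wordCount' field derived from the title."""
    enriched = dict(record)
    enriched["wordCount"] = len(str(record.get("title", "")).split())
    return enriched


class EnrichmentHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/enrich":  # hypothetical endpoint path
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            record = json.loads(self.rfile.read(length))
        except ValueError:  # covers invalid JSON and invalid UTF-8
            self.send_error(400, "Body must be JSON")
            return
        if not isinstance(record, dict):
            self.send_error(400, "Body must be a JSON object")
            return
        body = json.dumps(enrich(record)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the sketch quiet
        pass


def make_server(host="127.0.0.1", port=0):
    """Create the server; port 0 lets the OS pick a free port."""
    return HTTPServer((host, port), EnrichmentHandler)

# To run the service: make_server(port=8080).serve_forever()
```

Keeping the enrichment logic in a plain function (`enrich`) separate from the HTTP handler makes it easy to test without starting a server.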
• 10 more projects are using/evaluating MORe
  – ARIADNE chose MORe as its primary aggregator
    • Over 1 million records have been aggregated and published to the ARIADNE portal
  – RDA DDRI WG uses MORe
• Zero downtime
• Zero data loss
• New metadata schemas have been integrated
• New enrichment services have been developed / integrated
MORe success stories
Thank you
LoCloud is funded by the European Commission's ICT Policy Support Programme
The views and opinions expressed in this presentation are the sole responsibility of the
authors and do not necessarily reflect the views of the European Commission.
Funding
[Figure: enrichment of a sample record. Labels:]
• Native record (OAI_DC)
• EDM Record
  – Missing language attributes
  – Place label is a concatenated string of coordinates
• Enriched EDM Record
  – Language identification
  – Vocabulary matching
  – Geo-normalization
  – Geo-coding
Enrichment Plan
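An enrichment plan can be read as an ordered list of steps, each taking a record and returning an enriched copy. The sketch below illustrates that idea for the two problems named above: a missing language attribute and a place label that is really a pair of coordinates. The field names (`lang`, `place`, `lat`, `long`) are hypothetical rather than EDM properties, and `tag_language` is only a stand-in that fills in a default instead of performing real language identification.

```python
import re

# Matches labels such as "37.9715, 23.7257" or "37.9715;23.7257".
COORD_PAIR = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*[,; ]\s*(-?\d+(?:\.\d+)?)\s*$")


def tag_language(record, default="en"):
    """Stand-in for language identification: fill a missing language attribute."""
    if record.get("lang"):
        return record
    return {**record, "lang": default}


def geo_normalize(record):
    """Split a 'lat, long' place label into numeric fields when it parses cleanly."""
    match = COORD_PAIR.match(str(record.get("place") or ""))
    if not match:
        return record
    lat, lon = float(match.group(1)), float(match.group(2))
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return record  # out-of-range values are left untouched
    return {**record, "lat": lat, "long": lon}


def run_plan(record, plan):
    """Apply each enrichment step in order; every step returns a new record."""
    for step in plan:
        record = step(record)
    return record


PLAN = [tag_language, geo_normalize]
enriched = run_plan({"title": "Acropolis", "place": "37.9715, 23.7257"}, PLAN)
```

Because every step has the same record-in, record-out shape, further steps such as vocabulary matching or geo-coding could be appended to `PLAN` without changing `run_plan`.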