The Mint Mapping tool

The MoRe aggregator

Vassilis Tzouvaras, Dimitris Gavrilis

National Technical University of Athens

Digital Curation Unit - IMIS, Athena Research Center

LoCloud is funded by the European Commission's ICT Policy Support Programme


Post on 15-Apr-2017



Cultural Heritage Content

• Diversity of cultural heritage content

– Numerous metadata schemas to annotate content (LIDO, CIDOC-CRM, EAD, METS)

• Massive digitization and annotation activities are in progress

• Need for interoperability

MINT Mapping Tool

• Lets users map their own metadata schemas to reference domain models

• Follows a typical web-based architecture

• Originally developed for ATHENA; currently used for EUScreen, CARARE, Judaica, ECLAP, DCA and Linked Heritage
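
To make the idea of a schema mapping concrete, here is a minimal Python sketch. It is not MINT's implementation or API: the `MAPPING` table, the `source_record` layout and the `apply_mapping` helper are hypothetical, and only illustrate copying a provider's fields onto Dublin Core-style target elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: provider element name -> target (Dublin Core-like) element name.
MAPPING = {
    "objectTitle": "title",
    "maker": "creator",
    "yearMade": "date",
}

def apply_mapping(source_xml: str, mapping: dict) -> ET.Element:
    """Copy every mapped source element into a new target record."""
    source = ET.fromstring(source_xml)
    target = ET.Element("record")
    for child in source:
        target_name = mapping.get(child.tag)
        if target_name is None:
            continue  # unmapped source fields are skipped in this sketch
        ET.SubElement(target, target_name).text = (child.text or "").strip()
    return target

# Hypothetical provider record; <inventoryNo> has no mapping and is dropped.
source_record = """
<item>
  <objectTitle>Amphora</objectTitle>
  <maker>Unknown</maker>
  <inventoryNo>123</inventoryNo>
</item>
"""

mapped = apply_mapping(source_record, MAPPING)
print(ET.tostring(mapped, encoding="unicode"))
```

A real mapping tool would typically work on paths rather than flat element names and would emit a reusable transformation; the flat dictionary here is only the simplest form of the same idea.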

MINT 2 – What’s new?

• The backend was reconstructed for better performance

– File size for imports is extended

• The frontend was updated

– New interface

– Workflow is integrated in UI

– Facilitated browsing of input and target schema

MoRe Overall Architecture

[Architecture diagram. Components shown: Registry; Storage (Apache Cassandra cluster, Fedora-commons, temporary storage); Vocabulary services; Messaging; JMS logging; Core services; Enrichment service management; External enrichment services (Entity matching / NLP, Geocoding / Historic place names), labelled REST; Publish service management (OAI-PMH, RDF Store, Elastic Search, Archive).]

Cloud architecture

• De-centralized

• Scalable

• Four cloud environments

– Storage

– Monitoring & logging

– Core services deployment

– Enrichment services deployment

Distributed

• Enrichment services run in:

– Austria

– Spain

– Greece

– Lithuania

– Slovenia

– Norway

• Scalability can be facilitated through a virtualization infrastructure

Workflow

[Workflow diagram. Systems shown: OAI-PMH, LoCloud Collections, Wikimedia, MINT, Omeka, Archive, RDF Store, Solr. Actions shown: Harvest, Validate, Ingest, Transform, Enrich, Index, Publish, Delete, Reject.]

Intermediate Schemas

[Diagram listing the schemas: Dublin Core, LIDO, CARARE, EAD, ESE, EDM, OMEKA-XML, OGD.]

Core services

• Harvesting

• Validation

• Ingestion

• Transformation

• Enrichment

• Previewing

• Publishing

Core services: Harvesting

• Harvests content from metadata sources

– OAI-PMH repository

– MINT

– LoCloud Collections

– Wikimedia

• Multiple schemas are supported

– OAI_DC

– CARARE

– CARARE 2.0

– LIDO

– EAD

– EDM

– ESE
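
Harvesting from an OAI-PMH repository can be sketched as follows. This is an illustration of the protocol's `ListRecords` verb and `resumptionToken` paging, not MoRe's harvester; the base URL, the `SAMPLE` response and both helper functions are hypothetical, and no network request is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def list_records_url(base_url, metadata_prefix=None, resumption_token=None):
    """Build a ListRecords request URL for the first or a follow-up page."""
    if resumption_token:
        # In OAI-PMH a resumptionToken is sent on its own, without metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)

def parse_page(response_xml):
    """Return (record identifiers, resumption token or None) for one response page."""
    root = ET.fromstring(response_xml)
    identifiers = [
        el.text
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token

# Hypothetical, heavily shortened response used instead of a live repository.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url("https://example.org/oai", metadata_prefix="oai_dc")
ids, token = parse_page(SAMPLE)
next_url = list_records_url("https://example.org/oai", resumption_token=token)
print(first_url, ids, next_url, sep="\n")
```

A harvester would repeat the request with each returned token until a page arrives without one.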

Core services: Validation

• Validates incoming information packages

• Executes validation schemes

• Validation micro-services

– Structure

– Schema

– Linking

– Schematron rules

• Flexible

• How it is used in MoRe

– Pre-validation

– Post-validation
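
The idea of a validation scheme that runs several micro-services can be sketched as below. The check functions, the `<title>` rule and the contents of `PRE_VALIDATION` and `POST_VALIDATION` are assumptions made for illustration; the slides do not say which checks MoRe runs at which stage.

```python
import xml.etree.ElementTree as ET

def check_structure(xml_text):
    """Structure check: the package must be well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return []
    except ET.ParseError as exc:
        return [f"structure: {exc}"]

def check_required_title(xml_text):
    """Stand-in for a schema or Schematron rule: a non-empty <title> must exist."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return []  # malformed input is reported by check_structure instead
    title = root.find("title")
    if title is None or not (title.text or "").strip():
        return ["rule: missing or empty <title>"]
    return []

# Hypothetical schemes: an ordered list of micro-services per stage.
PRE_VALIDATION = [check_structure]
POST_VALIDATION = [check_structure, check_required_title]

def validate(xml_text, scheme):
    """Run every micro-service in the scheme and collect all error messages."""
    errors = []
    for check in scheme:
        errors.extend(check(xml_text))
    return errors

print(validate("<record><title>Vase</title></record>", POST_VALIDATION))
print(validate("<record/>", POST_VALIDATION))
```

Keeping each check as a small function with the same signature is what makes the scheme flexible: a stage is changed by editing a list, not the checks.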

Core services: Ingestion

• Ingests content into storage

• Uses the storage layer API

• Pluggable drivers for attaching different technologies / repositories

– Apache Cassandra

– Filesystem-based

– Fedora-commons

• Versioning support

• Complex digital object support
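
A pluggable storage layer of this kind is commonly built around a small driver interface. The sketch below assumes a hypothetical `StorageDriver` interface with two toy drivers (in-memory and filesystem); it does not reproduce MoRe's storage API, and real Cassandra or Fedora-commons drivers are out of scope here.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path

class StorageDriver(ABC):
    """Hypothetical driver interface that the ingest step codes against."""

    @abstractmethod
    def put(self, object_id: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, object_id: str) -> bytes: ...

class InMemoryDriver(StorageDriver):
    def __init__(self):
        self._items = {}

    def put(self, object_id, data):
        self._items[object_id] = data

    def get(self, object_id):
        return self._items[object_id]

class FilesystemDriver(StorageDriver):
    """Stores each object as one file; object_id must be a safe file name."""

    def __init__(self, root):
        self._root = Path(root)
        self._root.mkdir(parents=True, exist_ok=True)

    def put(self, object_id, data):
        (self._root / object_id).write_bytes(data)

    def get(self, object_id):
        return (self._root / object_id).read_bytes()

def ingest(driver: StorageDriver, object_id: str, data: bytes) -> None:
    """Ingest only knows the interface, so drivers can be swapped freely."""
    driver.put(object_id, data)

drivers = [InMemoryDriver(), FilesystemDriver(tempfile.mkdtemp())]
for driver in drivers:
    ingest(driver, "pkg-1", b"<record/>")
    print(type(driver).__name__, driver.get("pkg-1"))
```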

Core services: Content Model

• Digital objects comprise data streams

• Each data stream can hold any kind of information

– XML/RDF, image, video, documents, etc.

• Each different representation of an information object is stored as a different data stream

• Each curation action generates a new version

– Transformation, enrichment
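
The content model above can be sketched as a small data structure. The class name, field names and the `curate` method are hypothetical, and treating the initial ingestion as the first version is an assumption of this sketch rather than something the slides state.

```python
from dataclasses import dataclass, field

@dataclass
class DigitalObject:
    """A digital object holding named data streams plus a version history."""
    object_id: str
    datastreams: dict = field(default_factory=dict)  # stream name -> latest content
    history: list = field(default_factory=list)      # one entry per version

    def curate(self, action: str, stream_name: str, content: bytes) -> int:
        """Store content in a data stream and record a new version for the action."""
        self.datastreams[stream_name] = content
        self.history.append(
            {"version": len(self.history) + 1, "action": action, "stream": stream_name}
        )
        return len(self.history)

obj = DigitalObject("obj-1")
obj.curate("ingestion", "native", b"<lido/>")        # original representation
obj.curate("transformation", "edm", b"<rdf/>")       # new representation, new stream
obj.curate("enrichment", "edm", b"<rdf>more</rdf>")  # same stream, new version

for entry in obj.history:
    print(entry)
```

Each representation lives in its own stream (`native`, `edm`), while every curation action adds one entry to the history.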

Core services: Transformation

• Transforms entire information packages into the Europeana Data Model (EDM), or any other schema

• Multiple transformation routines

– Per schema

– Per project

– Per provider

• Users can attach a rights statement
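
One way to organise routines per schema, per project and per provider is a lookup that prefers the most specific match. The precedence order (provider, then project, then schema), the routine names and the `ROUTINES` table are all assumptions of this sketch; the slides only state that the three levels exist.

```python
# Hypothetical registry: (schema, project, provider) -> routine name.
# None means "applies to any project" or "any provider".
ROUTINES = {
    ("LIDO", None, None): "lido-to-edm-default",
    ("LIDO", "project-a", None): "lido-to-edm-project-a",
    ("LIDO", "project-a", "museum-1"): "lido-to-edm-museum-1",
}

def select_routine(schema, project=None, provider=None):
    """Return the most specific routine registered for the given context."""
    candidates = (
        (schema, project, provider),  # provider-specific
        (schema, project, None),      # project-wide
        (schema, None, None),         # schema default
    )
    for key in candidates:
        if key in ROUTINES:
            return ROUTINES[key]
    raise LookupError(f"no transformation routine registered for schema {schema!r}")

print(select_routine("LIDO", "project-a", "museum-1"))
print(select_routine("LIDO", "project-a", "museum-9"))
print(select_routine("LIDO", "project-b"))
```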

Core services: Enrichment

• The generic enrichment service facilitates the execution of the enrichment micro-services

– Hides the complexity from the user by using enrichment plans

– Provides seamless integration with the MoRe UI

• Virtual enrichment driver

– Allows developers/creative industries to create their own enrichment services and declare/use them within MoRe

Core services: Previewing

• Preview the XML record information for all data streams

• Preview the record in HTML (using the Europeana style sheet)

Core services: Publishing

• Publish transformed / enriched information

– Internal OAI-PMH provider

– XML export

– Publish directly to RDF repositories (Sesame, Virtuoso)

– Solr index server

Enrichment micro-services

• Thematic

– Thesauri collections

– Vocabulary matching

– Background links

• Spatial

– Geo normalization

– Geocoding

– Reverse geocoding

– Historic place names

• Other

– Language identification

[Diagram. External sources shown: SKOS thesauri, GeoNames, DBpedia, Wikipedia.]

Enrichment Plan

• Enrichment micro-services are used within enrichment workflows:

– Enrichment plans

• Each enrichment plan applies to a specific schema

• Each enrichment plan executes enrichment micro-services in a specific order

[Diagram. An enrichment plan chaining the micro-services: Language identification, Vocabulary matching, Geo-normalization, Geocoding.]

Enrichment Plan

• Each enrichment plan defines run-time parameters for specific services

– Content based

[Diagram. An enrichment plan chaining the micro-services: Language identification, Vocabulary matching, Geo-normalization, Geocoding. Example of a content-based parameter: "Add subject collection A only if term X or Y is matched".]
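
An enrichment plan, as described above, is tied to one schema, runs micro-services in a fixed order and passes run-time parameters to them. The sketch below is a toy model of that idea: the three service functions, the `PLAN` structure and the record layout are hypothetical, and the language heuristic is deliberately naive.

```python
def identify_language(record):
    """Toy heuristic: Greek letters in the title mean 'el', otherwise 'en'."""
    has_greek = any("\u0370" <= ch <= "\u03ff" for ch in record["title"])
    record["language"] = "el" if has_greek else "en"
    return record

def match_vocabulary(record, terms):
    """Record which vocabulary terms occur as words in the title."""
    words = set(record["title"].lower().split())
    record["matched_terms"] = sorted(words & set(terms))
    return record

def add_subject_collection(record, collection, required_terms):
    """Content-based rule: add the collection only if a required term matched."""
    if set(record.get("matched_terms", [])) & set(required_terms):
        record.setdefault("collections", []).append(collection)
    return record

# Hypothetical plan: one schema, ordered steps, per-step run-time parameters.
PLAN = {
    "schema": "EDM",
    "steps": [
        (identify_language, {}),
        (match_vocabulary, {"terms": ["amphora", "vase", "coin"]}),
        (add_subject_collection, {"collection": "A", "required_terms": ["amphora", "vase"]}),
    ],
}

def run_plan(plan, record, schema):
    """Apply each micro-service of the plan, in order, to a record of the plan's schema."""
    if schema != plan["schema"]:
        raise ValueError(f"plan applies to {plan['schema']}, not {schema}")
    for service, params in plan["steps"]:
        record = service(record, **params)
    return record

amphora = run_plan(PLAN, {"title": "Red-figure amphora"}, "EDM")
coin = run_plan(PLAN, {"title": "Silver coin"}, "EDM")
print(amphora)
print(coin)
```

The order matters here: the subject collection rule depends on the terms produced by the vocabulary matching step that runs before it.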

Dashboard

Packages organization

Package overview

Package lifecycle overview

Preview

Metadata completeness & statistics

Enrichment services overview

Direct access to 27 thesauri

Create & (re)use subject collections