natasa bulatovic max planck digital library research and development

Post on 15-Jan-2016

This work is licensed under a Creative Commons Attribution 2.0 Germany License http://creativecommons.org/licenses/by/2.0/de/

eSciDoc, VIRR and Digitization Lifecycle - insights into an infrastructure for management of digitized resources Natasa Bulatovic

Max Planck Digital Library

Research and Development

Max Planck Digital Library (MPDL) is a service unit within the Max Planck Society (MPG)

MPG consists of about 80 institutes in three scientific sections:

the Chemistry, Physics and Technology Section

the Biology and Medicine Section

the Human Sciences Section

The core activities of the MPDL lie in building up service infrastructure and tools for publications and research data

MPDL develops software solutions in close cooperation with scientists, librarians and technicians

In the Human Sciences Section several institutes have digitized cultural artefacts and want to make them open access

The Max Planck Digital Library (MPDL) in a Nutshell

eSciDoc SOA Landscape

Which data are managed?

How?

PubMan – Publication Management

VIRR – Textual digitized resources management

IMEJI – Image management

PubMan: Management of publications

21.04.23

Collaboration of the MPDL with the Max Planck Institute for European Legal History

Motivation: The period of the Holy Roman Empire produced an enormous corpus of legislative sources. Until now, no complete collection of these works exists.

VIRR is about

ViRR Key features

Web-based collaborative application

Editor (bibliographic metadata, table of contents and structural metadata)

Viewer (online representation)

Browser

ViRR Editor

Combines a set of tools

Paginator

Table of Contents Editor

Metadata Editor

One complex, but flexible workspace

No default order for the usage of the tools

ViRR Editor - Paginator

Assign the logical page numbers to the physical ones

Choose between different formats (Arabic, Latin, custom)

Paginate manually or automatically
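The automatic pagination step can be pictured with a minimal sketch. It is not ViRR's implementation: the function names, the scan identifiers and the reading of "Latin" as Roman numerals are all assumptions made for illustration.

```python
def to_roman(n: int) -> str:
    """Convert a positive integer to a lowercase Roman numeral."""
    if n <= 0:
        raise ValueError("Roman numerals need a positive integer")
    numerals = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ]
    out = []
    for value, symbol in numerals:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)


def paginate(scans, first_scan, start_number=1, fmt="arabic"):
    """Map each scan from first_scan onward to a logical page label."""
    formatter = {"arabic": str, "roman": to_roman}[fmt]
    start = scans.index(first_scan)
    return {
        scan: formatter(start_number + offset)
        for offset, scan in enumerate(scans[start:])
    }


# Hypothetical scan identifiers (physical pages in scanning order).
scans = ["scan_001", "scan_002", "scan_003", "scan_004", "scan_005"]

# Front matter gets Roman numerals, the main text restarts at Arabic 1.
labels = paginate(scans[:2], "scan_001", fmt="roman")
labels.update(paginate(scans, "scan_003", start_number=1))
print(labels)
```

Manual pagination would then amount to overriding single entries of the resulting mapping.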

ViRR Editor - ToC Editor

Gather the logical structure of a work by breaking it down into structural elements

Arrange the hierarchical order of structural elements in the tree

Assign scans to structural elements

Choose from fine-grained structural element types (over sixty)
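A possible data structure behind such a table of contents is a tree whose nodes carry an element type, a title and the scans assigned to them. The sketch below is an assumption for illustration only: the class name, the element types and the scan identifiers are hypothetical and not taken from ViRR.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class StructuralElement:
    kind: str                      # element type, e.g. "chapter" (hypothetical)
    title: str
    scans: List[str] = field(default_factory=list)
    children: List["StructuralElement"] = field(default_factory=list)

    def add(self, child: "StructuralElement") -> "StructuralElement":
        """Append a child element and return it, so calls can be chained."""
        self.children.append(child)
        return child

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, "StructuralElement"]]:
        """Yield (depth, element) pairs in document order."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


work = StructuralElement("monograph", "Sample work")
chapter = work.add(StructuralElement("chapter", "Chapter 1", scans=["scan_003"]))
chapter.add(StructuralElement("section", "Section 1.1", scans=["scan_003", "scan_004"]))
work.add(StructuralElement("index", "Index", scans=["scan_005"]))

outline = [(depth, el.kind, el.title) for depth, el in work.walk()]
for depth, kind, title in outline:
    print("  " * depth + f"{kind}: {title}")
```

Rearranging the hierarchy then means moving a node between `children` lists, while the scan assignment stays attached to the node.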

ViRR Editor – Metadata Editor

Assign descriptive metadata to structural elements

Detailed description of every structural element

Systematic browsing

Dedicated search will be possible

ViRR Viewer

Browse by scan

Browse by ToC

Navigate to page

View metadata of structural element

Page (web resolution)

Page (full resolution) on click

ViRR: Sharing and reuse

http://virr.mpdl.mpg.de

From ViRR to Digitization Lifecycle

Project goal: support the complete Digitization Lifecycle with guidelines, standards, tools and a publishing platform

Partners: MPI for European Legal History, Frankfurt

Kunsthistorisches Institut, Florence (KHI)

Bibliotheca Hertziana, Rome

MPI for Human Development, Berlin

Related projects: ViRR (see http://colab.mpdl.mpg.de/mediawiki/ViRR:_Virtueller_Raum_Reichsrecht)

XML-Workflow (see http://colab.mpdl.mpg.de/mediawiki/MPDL_Project_XML_Workflow)

Imeji: Management of image collections

Imeji: repository of Digital Images

Organized into

Collections

Created and defined by the institution, project, working group

Albums

Created and defined by the researcher

Imeji: what is so different about it?

Imeji is not Flickr, nor Facebook...

Freely definable metadata profiles at collection level

Controlled Vocabularies may be integrated

Smart search for dates, ranges (based on the metadata type)

Helps gathering the metadata more effectively

Focusses on collaboration and metadata quality

Repository: Data can be exported at any time
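The idea of a typed metadata profile at collection level, with a range search that depends on the metadata type, can be sketched as follows. The profile, the field names and the sample records are made up for illustration and do not reflect Imeji's actual data model.

```python
from datetime import date

# Hypothetical collection-level profile: field name -> expected Python type.
profile = {"title": str, "photographer": str, "taken_on": date}


def validate(metadata, profile):
    """Return a list of problems; an empty list means the record fits the profile."""
    problems = []
    for name, expected in profile.items():
        if name not in metadata:
            problems.append(f"missing field: {name}")
        elif not isinstance(metadata[name], expected):
            problems.append(f"{name}: expected {expected.__name__}")
    return problems


def search_date_range(items, field_name, start, end):
    """Return items whose date field lies in [start, end], bounds included."""
    return [
        item for item in items
        if isinstance(item.get(field_name), date)
        and start <= item[field_name] <= end
    ]


images = [
    {"title": "Facade", "photographer": "A. Example", "taken_on": date(1998, 5, 1)},
    {"title": "Fresco", "photographer": "B. Example", "taken_on": date(2004, 9, 12)},
    {"title": "Portal", "photographer": "C. Example", "taken_on": "2004"},  # wrong type
]

reports = [validate(img, profile) for img in images]
hits = search_date_range(images, "taken_on", date(2000, 1, 1), date(2009, 12, 31))
print(reports)
print([img["title"] for img in hits])
```

Because the third record stores the date as free text, it fails validation and is invisible to the range search, which is the kind of quality problem a typed profile is meant to surface.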

eSciDoc and other services

eSciDoc SOA Landscape

eSciDoc core infrastructure

Set Handler (OAI-PMH)

Admin Handler

Aggregation Definition Handler

Statistics Data Handler

Scope Handler

Report Handler

Report Definition Handler

Item Handler

Container Handler

Context Handler

Organizational Unit Handler

Content Model Manager

User Account Handler

Role Handler

Group Handler

Resources & Data Statistics Security

Content Relation Handler

CoNE Service

● Manages named entities

○ Journals

○ Persons

○ Dewey Decimal Classification (3 public levels)

○ Creative Commons Licenses (CC licenses)

○ ISO 639-3 Languages

○ MIME Types

○ PACS classification

○ Custom classifications

● Reuse

○ Data delivered in multiple formats (JSON, HTML, RDF/XML, Options list)

● Motivation

○ Metadata quality: autosuggest components in solutions during metadata editing

○ Disambiguation: each entity is a named graph

○ Data linking: CoNE identifiers in publication metadata

○ Technical facilitation: all lists in one place

○ Persons: Researcher Portfolio

● Extensions

○ Refresh data from external sources

CoNE – Control of Named Entities

http://cone.mpdl.mpg.de/

http://pubman.mpdl.mpg.de/cone/persons/resource/persons2450+

Content negotiation supported
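Content negotiation means the client names the representation it wants in the HTTP Accept header. The sketch below only builds such a request and does not send it. The media types are the standard ones for JSON, HTML and RDF/XML; whether the CoNE service answers to exactly these values is an assumption, and the helper name is hypothetical.

```python
from urllib.request import Request

# Standard media types for the formats named on the slide.
MEDIA_TYPES = {
    "json": "application/json",
    "html": "text/html",
    "rdf": "application/rdf+xml",
}


def build_request(resource_url, fmt):
    """Build (but do not send) a GET request asking for one representation."""
    return Request(resource_url, headers={"Accept": MEDIA_TYPES[fmt]})


# Resource URL copied verbatim from the slide.
url = "http://pubman.mpdl.mpg.de/cone/persons/resource/persons2450+"
req = build_request(url, "rdf")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would perform the actual request; that step is left out so the sketch runs without network access.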

Transformation Service

● Transforms textual data formats

○ Metadata

○ Resources

○ Standard formats

○ Specific formats (e.g. EndNote custom fields)

● Motivation

○ Migration of data from MPI

○ Exports and dissemination

○ Imports

○ Continuous interoperability enhancement

○ Implement once, use wherever needed
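"Implement once, use wherever needed" suggests a single registry of conversions that imports, exports and migrations all call. The following is a minimal sketch of that pattern, not the eSciDoc service itself: the registry, the format names "record", "bibtex" and "endnote", and the sample record are assumptions for illustration.

```python
TRANSFORMERS = {}


def transformer(source, target):
    """Register a function that converts `source` format into `target` format."""
    def register(func):
        TRANSFORMERS[(source, target)] = func
        return func
    return register


@transformer("record", "bibtex")
def record_to_bibtex(rec):
    fields = ",\n".join(
        f"  {key} = {{{rec[key]}}}" for key in ("author", "title", "year")
    )
    return f"@article{{{rec['key']},\n{fields}\n}}"


@transformer("record", "endnote")
def record_to_endnote(rec):
    # EndNote tagged style: one %-tag per line.
    return "\n".join([
        "%0 Journal Article",
        f"%A {rec['author']}",
        f"%T {rec['title']}",
        f"%D {rec['year']}",
    ])


def transform(data, source, target):
    """Look up the registered conversion and apply it."""
    try:
        func = TRANSFORMERS[(source, target)]
    except KeyError:
        raise ValueError(f"no transformation from {source} to {target}") from None
    return func(data)


# Made-up sample record.
rec = {"key": "doe2010", "author": "Doe, J.", "title": "A sample title", "year": "2010"}
print(transform(rec, "record", "bibtex"))
print(transform(rec, "record", "endnote"))
```

Adding a new format then only requires one more decorated function; every caller of `transform` gains it without further changes.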

Formats shown in the diagram: eDoc, BibTex, APA, OpenURL, EndNote, arXiv, Pmc, TEI, AJP, Bmc, METS, Spires, eSciDoc-Publication, eSciDoc-TOC

Search & Export Service

Citation style manager

● Searches and exports results

● Citation styles (Citation style manager)

○ EndNote

○ BibTex

○ …

● Reuse

○ Data delivered in multiple formats (PDF, HTML, XML, ODT)

○ By external systems (content management, wordpress)

● Motivation

○ Search results should be available in various outputs

○ One service – many presentations (e.g. Wordpress Plug-in)

○ One interface – easy inclusion of various export formats

Syndication Service

● Provides the latest data updates

● RSS

● Atom

● Reuse

○ Subscription to feeds and data reuse

○ By any external clients

● Extensions

○ Media RSS
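An external client reusing such a feed might look like the sketch below. It parses an inline Atom document rather than a live feed; the feed content, the identifiers and the function name are made up for illustration.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom 1.0 namespace

# Made-up sample feed standing in for a real syndication response.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Latest releases (sample)</title>
  <entry>
    <title>First item</title>
    <id>urn:example:item:1</id>
    <updated>2010-05-01T10:00:00Z</updated>
  </entry>
  <entry>
    <title>Second item</title>
    <id>urn:example:item:2</id>
    <updated>2010-05-03T08:30:00Z</updated>
  </entry>
</feed>
"""


def latest_entries(feed_xml, since):
    """Return (updated, title) pairs newer than `since`, newest first.

    Timestamps are compared as strings, which is only valid because all of
    them use the same UTC format.
    """
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall(f"{ATOM}entry"):
        updated = entry.findtext(f"{ATOM}updated")
        if updated > since:
            entries.append((updated, entry.findtext(f"{ATOM}title")))
    return sorted(entries, reverse=True)


print(latest_entries(SAMPLE_FEED, "2010-05-02T00:00:00Z"))
```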

Validation service

Semantic validation

Contextual validation

Validation rule editor (upcoming)

Data acquisition service

• Fetches data from known sources via identifier (unAPI interface)

• Transforms data to other formats
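The unAPI convention uses two query parameters, `id` and `format`: no parameters lists the formats a server supports, `id` alone lists the formats available for one object, and both together return the object. The sketch below only builds such URLs. The endpoint address and the identifier are hypothetical placeholders, not the address of the eSciDoc service.

```python
from urllib.parse import urlencode


def unapi_url(endpoint, identifier=None, fmt=None):
    """Build one of the three unAPI request forms."""
    params = {}
    if identifier is not None:
        params["id"] = identifier
    if fmt is not None:
        if identifier is None:
            raise ValueError("unAPI needs an id whenever a format is requested")
        params["format"] = fmt
    return endpoint + ("?" + urlencode(params) if params else "")


ENDPOINT = "http://example.org/unapi"  # hypothetical endpoint

print(unapi_url(ENDPOINT))                           # formats the server offers
print(unapi_url(ENDPOINT, "example:123"))            # formats for one object
print(unapi_url(ENDPOINT, "example:123", "bibtex"))  # the object itself
```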

PubMan SWORD Server

• Deposit of data packages (metadata and fulltexts)

• Logic implements a PubMan-specific workflow

PID Cache manager

● Fetches Handles from the GWDG Handle System (dummy resolution)

● Assigns a pre-fetched handle to the resource

● Synchronizes the assigned handle with the resolution to a resource in the Handle system
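The three steps above can be sketched with an in-memory stand-in for the Handle service. None of this is the GWDG or EPIC API: the class names, the handle prefix and the URLs are hypothetical, and the sketch updates the resolution immediately, whereas the real service may synchronize later.

```python
class FakeHandleService:
    """In-memory stand-in for a Handle service (not the GWDG API)."""

    def __init__(self, prefix):
        self.prefix = prefix
        self.table = {}        # handle -> URL it currently resolves to
        self._counter = 0

    def create(self, target_url):
        self._counter += 1
        handle = f"{self.prefix}/{self._counter:04d}"
        self.table[handle] = target_url
        return handle

    def update(self, handle, target_url):
        self.table[handle] = target_url


class PidCache:
    DUMMY_URL = "http://example.org/dummy"  # placeholder resolution target

    def __init__(self, service, size):
        self.service = service
        self.size = size
        self.pool = []
        self.refill()

    def refill(self):
        """Pre-fetch handles with a dummy resolution until the pool is full."""
        while len(self.pool) < self.size:
            self.pool.append(self.service.create(self.DUMMY_URL))

    def assign(self, resource_url):
        """Hand out a pre-fetched handle and point it at the real resource."""
        handle = self.pool.pop(0)
        self.service.update(handle, resource_url)
        self.refill()
        return handle


service = FakeHandleService("12345")  # hypothetical prefix
cache = PidCache(service, size=3)
handle = cache.assign("http://example.org/item/1")
print(handle, "->", service.table[handle])
```

The point of the pool is that assigning an identifier never has to wait for the remote service to mint one.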

EPIC – European Persistent Identifier Consortium (GWDG Germany, SARA Netherlands, CSC Finland, http://www.pidconsortium.eu/ )

A note on the metadata profiles

● DCAP based (Dublin Core Application Profile)

● DC terms (identified by URIs)

● eSciDoc solution specific terms (identified by URIs)

● METS/MODS

● Publicly available

● Functional description http://colab.mpdl.mpg.de/mediawiki/ESciDoc_Application_Profiles

● Schemas http://metadata.mpdl.mpg.de/escidoc/metadata/schemas/0.1/

● Interoperability levels

● Shared term definitions (done)

● Semantic interoperability (done)

● Description set syntactic interoperability (prepared)

● Description set profile interoperability (prepared)

Premises

● Applications

○ Web-based

○ Internationalized

○ Integrated Help system

○ Easy to use

○ Easy to install

● Services and infrastructure

○ Reusable, interoperable, composed, technology-independent

○ Extensible, scalable and performant

● Data

○ Persistently identified, versioned, discoverable, provenance and authenticity information, fine-grained authorization

○ Described with published metadata profiles

○ Interoperable and enabled for reuse and repurpose

Related projects and new developments

DARIAH

Digital Research Infrastructure for Arts and Humanities (see http://dariah.eu)

Imeji

AWOB

Astronomers Workbench

Resource Registries

ECHO – European Cultural Heritage Online (see http://echo.mpiwg-berlin.mpg.de/home )

Thank you!

bulatovic@mpdl.mpg.de

http://colab.mpdl.mpg.de

http://escidoc.org
