Page 1

The DRIVER initiative for networking repositories

Wolfram Horstmann

Universität Bielefeld

Page 2

DRIVER motivation

Scholarly communication is changing towards the distributed provision of text, data and services

Repositories are seen as a saviour in this development, building such a distributed system

An infrastructure supporting distributed repositories and services is needed

(and reactions)

(needs explanation)

Page 3

Some observations on repositories

They represent a shift towards:

open internet exposure, as opposed to closed databases (‚graveyards‘)

content orientation, as opposed to a merely technical orientation (‚web servers‘)

distributed systems: centralized structures are no longer strictly required

Page 4

„Everybody can be a publisher“

Common description standards, e.g. Dublin Core Metadata Initiative; many subject-specific standards

Common transfer protocols, e.g. OAI-PMH, but also FTP, XML-RPC, WS, etc.

Searchability is possible!
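The slide names Dublin Core and OAI-PMH as the common ground that makes repositories searchable. The Python below is a minimal sketch of the harvester side. It parses an OAI-PMH `ListRecords` response carrying `oai_dc` records and builds the follow-up request URL. The namespace URIs and the `verb`, `metadataPrefix` and `resumptionToken` parameters come from the OAI-PMH 2.0 specification. The repository URL, the sample record and the function names are hypothetical, and the sketch makes no network calls.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace URIs from the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written sample response from a hypothetical repository, for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2009-01-30T12:00:00Z</responseDate>
  <request verb="ListRecords" metadataPrefix="oai_dc">http://repo.example.org/oai</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repo.example.org:1</identifier>
        <datestamp>2008-11-03</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example article</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
          <dc:creator>Roe, Richard</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>token-42</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def parse_list_records(xml_text):
    """Return (records, resumption_token) for one ListRecords response page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind("oai:ListRecords/oai:record", NS):
        dc = rec.find("oai:metadata/oai_dc:dc", NS)  # None for deleted records
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": dc.findtext("dc:title", namespaces=NS) if dc is not None else None,
            "creators": [c.text for c in dc.findall("dc:creator", NS)] if dc is not None else [],
        })
    token = root.findtext("oai:ListRecords/oai:resumptionToken", namespaces=NS)
    return records, (token or None)  # an empty or missing token marks the last page


def list_records_url(base_url, token=None, metadata_prefix="oai_dc"):
    """Build the next ListRecords request; resumptionToken is an exclusive argument."""
    params = {"verb": "ListRecords"}
    if token:
        params["resumptionToken"] = token
    else:
        params["metadataPrefix"] = metadata_prefix
    return base_url + "?" + urlencode(params)
```

A harvester would call `list_records_url` with the token returned by `parse_list_records` and repeat until the token comes back as `None`.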

Still, many results are lost to re-use/remix

Closed: too sensitive, weakly described, unimportant (???)

Missing service frameworks / infrastructures

Problems: Data and service interoperability

Solution: „Infrastructure“

Repositories can solve access problem

Page 5

What infrastructures are: DRIVER terms

Not an infrastructure:

a single repository

a single application for search and retrieval (e.g. BASE): only local operation; feedback (backward causation) on the repositories is missing

Maybe an infrastructure:

the distributed repository landscape as a whole, as a capacity for emergent properties, e.g. quality and quantity, incentives for data population, nurturing the development of service providers

Definitely an infrastructure:

many service providers in one organisational and technical context (e.g. a run-time environment), enabling re-use and remix of data and services

Page 6

DRIVER Objectives

Organisational structure for repositories, e.g. the „Confederation“

Improving quality and standards in local repositories, e.g. validation procedures

Building a distributed runtime system, e.g. service and data sharing

Target groups:

Repository Managers

Service Providers

Information System Executives

Page 7

The DRIVER approach is incremental

Start with publication metadata: an existing distributed system, somehow connected

Considerable homogeneity of formats: OAI-PMH

Extend geographical coverage: from 5 countries, to 10, to 27, to ???

Extend towards other contents: from publication metadata to enhanced publications, i.e. representations of „texts + data“

Learn about subject specificity: data bring in disciplinary requirements

Page 8

The DRIVER Initiative

DRIVER-I 6/2006 – 11/2007

Organisational Models and Technical Test-Bed

DRIVER-II 12/2007 – 11/2009

Running Organisation and Production Infrastructure

DRIVER-Confederation 2010ff

Operations Office and Technical Deployment

NB: DRIVER is not an authoritative body; it is a liberal, bottom-up initiative of stakeholders

Page 9

DRIVER partners and related projects

Networking, support, policy, studies: Göttingen, Nottingham, SURF, Ghent, Ljubljana, Minho, Copenhagen

Technical development and deployment: Athens, Bielefeld, Pisa, Warsaw

Partners link to many other initiatives:

OA services: SHERPA/RoMEO, OpenDOAR, BASE…

Projects: Europeana, PEER, DELOS, DL.org, D4Science, PARSE-Insight, NESTOR…

Orgs: DINI, JISC, LIBER, SPARC, KE …

Platforms: DSPACE/FEDORA/OPUS/ePrints

Page 10

DRIVER-II Midterm Review, January 30, 2009 - Pisa

Project structure

Networking

Research

Service

Running Infrastructure: Content & Functionality

Construction of Services: ideas, design, development

Technical Management

Advocacy: attracting users, content and service providers

Discovery: technology watch, requirements for enhanced publications (EPs)

Page 11

Some results

Page 12

Some Results: Studies

Page 13

Some Results: A Portal

Page 14

Some Results: A Search

Page 15

Some Results: Repository Registration

Page 16

Some Results: Guidelines

Build on knowledge from past & current IR projects (EU)

26 actively involved contributors (experts and repository managers) from 8 countries

Practical answers on how to:

improve full-text access

standardize metadata quality

create a reliable infrastructure for permanent identification, resolution, traceability and storage

resolve semantic and classification issues
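The Python below is a hedged sketch of the kind of automated record check a validation procedure for "standardize metadata quality" might run. The rule set (`REQUIRED_FIELDS`) is hypothetical and does not reproduce the DRIVER Guidelines. The format checks follow the vocabularies listed on the "DRIVER and standards" slide: three-letter ISO 639-2 language codes and ISO 8601 dates. Dates are restricted here to the common YYYY, YYYY-MM and YYYY-MM-DD forms. Each record is assumed to be a dictionary that maps Dublin Core element names to lists of values.

```python
import re
from datetime import date

# Hypothetical rule set for illustration only; NOT the official DRIVER Guidelines.
REQUIRED_FIELDS = ("title", "creator", "date", "type")

LANG_RE = re.compile(r"[a-z]{3}")                           # ISO 639-2: three lowercase letters
DATE_RE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")  # YYYY, YYYY-MM or YYYY-MM-DD


def is_iso8601_date(value):
    """Accept the common ISO 8601 calendar forms YYYY, YYYY-MM and YYYY-MM-DD."""
    m = DATE_RE.fullmatch(value)
    if not m:
        return False
    year, month, day = m.groups()
    try:
        date(int(year), int(month or 1), int(day or 1))  # rejects e.g. month 13 or 30 February
    except ValueError:
        return False
    return True


def validate(record):
    """Return a list of problems for a record {element: [values]}; empty means it passes."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not any(v.strip() for v in record.get(field, [])):
            problems.append(f"missing {field}")
    for lang in record.get("language", []):
        if not LANG_RE.fullmatch(lang):
            problems.append(f"language not ISO 639-2: {lang!r}")
    for value in record.get("date", []):
        if not is_iso8601_date(value):
            problems.append(f"date not ISO 8601: {value!r}")
    return problems
```

The function returns a list of problems instead of a pass/fail flag. A repository manager can then see every deviation in one run.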

Page 17

Some Results: Support structures

Page 18

Some Results: Repositories

185+ harvested repositories

21 countries

856,264+ documents

Page 19

Some Results: Service-Oriented-Arch.

9 hosting nodes

25+ functionality typologies (services)

36 service instances

3 applications: DRIVER Main, Belgium, Spain-Recolecta

Page 20

Some Results: Runtime-System & Hosting

Diagram: Enabling Layer; Data Layer (EU Open Access Repositories); Functionality Layer; Administrators and End users; Advanced User Interfaces, National portals, Project Applications

Page 21

Another Compulsory Design Diagram

Page 22

Some Results: Software

Meant for large service providers only!

Page 23

Technicalities

Page 24

DRIVER and standards

Service Resources are implemented as Web Services and accessed through the corresponding Web Service Interface

Call parameters are enveloped in SOAP messages

The Enabling Services are also compatible with REST

XML is the lingua franca for the whole system: resource internal status, i.e. resource profiles

Profiles in the Information Service are stored in the eXist XML database

Vocabularies:

Names of languages: ISO 639-2 (three letters, B/T)

Names of countries: ISO 3166 (two letters)

Date format: ISO 8601:1988 (E)

DRIVER aggregation:

Harvesting according to the OAI-PMH protocol

Adopting the OAI-Provenance best practice (OAI-about)

To be extended to other object models and harvesting protocols

Queries to Search and Index follow the SRW/CQL standard
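Because search follows SRW/CQL, a client sends its CQL query string as a parameter of an SRU `searchRetrieve` request. The Python below is a minimal sketch of assembling such a request. The parameter names `operation`, `version`, `query`, `startRecord` and `maximumRecords` are those of SRU 1.1. The endpoint URL, the `dc.title` and `dc.creator` index names and the helper functions are assumptions for illustration, not DRIVER's actual configuration. Only a conjunction of `index = "term"` clauses is supported.

```python
from urllib.parse import urlencode


def cql_term(value):
    """Quote a CQL search term, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def cql_and(clauses):
    """Join (index, term) pairs into 'index = "term" and ...' (CQL boolean AND)."""
    return " and ".join(f"{index} = {cql_term(term)}" for index, term in clauses)


def search_retrieve_url(base_url, query, start=1, maximum=10):
    """Build an SRU 1.1 searchRetrieve URL carrying a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": query,
        "startRecord": start,
        "maximumRecords": maximum,
    }
    return base_url + "?" + urlencode(params)
```

For example, `cql_and([("dc.title", "open access"), ("dc.creator", "Horstmann")])` yields `dc.title = "open access" and dc.creator = "Horstmann"`, which `search_retrieve_url` then URL-encodes.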

Page 25

Enabling Layer Developments

Function | Task | Type | Partner | Status | D-NET
IS-Store | Resource profile store | Enhanced Port (PERL > JAVA) | CNR | RC | 1.1
IS-S&N | W3C S&N/Topics | Enhanced Port (PERL > JAVA) | CNR | RC | 1.1
IS-Lookup | Resource discovery | Enhanced Port (PERL > JAVA) | CNR | RC | 1.1
IS-Registry | Resource registration/de-registration/update | Enhanced Port (PERL > JAVA) | CNR | RC | 1.1
Manager | Orchestration of DRIVER Info Space | Enhanced Port (PERL > JAVA) | CNR | RC | 1.1
Authn&Authz | Service-to-service secure interaction/multiple applications | Enhanced Service (JAVA) | ICM | Proto | 2.0
Monitoring | Admin User Interface and autonomic administration | Novel Service (JAVA) | CNR | RC | 1.2
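The enabling layer centres on an Information Service that stores XML resource profiles and offers registration, lookup, and subscription and notification. The Python class below is a toy, in-memory sketch of that division of labour. It is not D-NET's interface; in DRIVER the profiles live in eXist. The class keeps profiles as XML strings, stamps each with a generated id, finds resources by a `<type>` element, and calls back subscribers on registration or update. The profile layout, the class and its method names are all hypothetical.

```python
import uuid
import xml.etree.ElementTree as ET
from collections import defaultdict


class InformationService:
    """Toy stand-in for IS-Registry, IS-Lookup and IS-S&N (hypothetical API)."""

    def __init__(self):
        self._profiles = {}                    # resource id -> XML profile text
        self._subscribers = defaultdict(list)  # resource type -> callbacks

    def _store(self, rid, profile_xml):
        root = ET.fromstring(profile_xml)
        root.set("id", rid)  # stamp the resource id into the profile itself
        self._profiles[rid] = ET.tostring(root, encoding="unicode")
        for callback in self._subscribers[root.findtext("type")]:
            callback(rid)    # notify subscribers interested in this resource type

    def register(self, profile_xml):
        rid = str(uuid.uuid4())
        self._store(rid, profile_xml)
        return rid

    def update(self, rid, profile_xml):
        if rid not in self._profiles:
            raise KeyError(rid)
        self._store(rid, profile_xml)

    def deregister(self, rid):
        del self._profiles[rid]

    def subscribe(self, resource_type, callback):
        self._subscribers[resource_type].append(callback)

    def lookup(self, resource_type):
        """Return ids of profiles whose <type> child equals resource_type."""
        return [rid for rid, text in self._profiles.items()
                if ET.fromstring(text).findtext("type") == resource_type]
```

Keeping profiles as XML text mirrors the "XML as lingua franca" point. A real service would query them with XQuery rather than re-parsing each one on lookup.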

Page 26

Data-Layer Developments

Function | Task | Type | Partner | Status | D-NET
Harvester | Collects arbitrary formats | Port (PERL > JAVA) | UniBi/CNR | Alpha | 2.0
Transformator | Eases arbitrary mappings | Novel service (JAVA) | UniBi/CNR | Alpha | 2.0
Feature Extraction | Executes transform.s. and utilities | Novel service (JAVA) | UniBi | Alpha | 2.0
Text-Engine | Utilities, e.g. language detection, full-text extraction | Novel service (JAVA) | UniBi | Alpha | 1.1
MD-Store | Support special MD operations | Port (PERL > JAVA) | UniBi | Alpha | 1.1
Store | Generic store for binaries | Novel service (JAVA) | UniBi/ICM/CNR | Proto | 2.0
Index | Lookup table for stored information | Adapt from YADDA | ICM/UniBi | Prod. | 1.0
OAI-ORE Publisher | Exposure of stored information | Novel service (JAVA) | CNR | Spec. | 2.0
OAI-PMH Publisher | Exposure of stored information | -- | CNR | Prod. | 1.0
Content Service | Managing complex objects | Novel service (JAVA) | CNR | Proto | 2.0
Access Service | Generic service for using remote objects | Novel service (JAVA) | CNR | Proto | 2.0
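Several data-layer services form a chain: records are harvested, mapped by the Transformator, kept in the MD-Store and made searchable by the Index. The Python below is a deliberately small sketch of that chain under stated assumptions. The field mapping, record layout and class names are hypothetical. The Index is a plain inverted index (token to record ids), not the YADDA-based component named in the table.

```python
import re
from collections import defaultdict


def transform(record, mapping):
    """Transformator stand-in: keep and rename only the fields named in `mapping`."""
    return {mapping[k]: v for k, v in record.items() if k in mapping}


class MDStore:
    """Metadata store stand-in: record id -> transformed record."""

    def __init__(self):
        self.records = {}

    def put(self, rid, record):
        self.records[rid] = record


class Index:
    """Inverted index: lower-cased word token -> set of record ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, rid, record):
        for values in record.values():
            for value in values:
                for token in re.findall(r"\w+", value.lower()):
                    self.postings[token].add(rid)

    def search(self, *terms):
        """Ids of records containing every term (boolean AND)."""
        sets = [self.postings.get(t.lower(), set()) for t in terms]
        return set.intersection(*sets) if sets else set()


def ingest(harvested, mapping):
    """Run harvested records {id: {field: [values]}} through transform, store and index."""
    store, index = MDStore(), Index()
    for rid, record in harvested.items():
        clean = transform(record, mapping)
        store.put(rid, clean)
        index.add(rid, clean)
    return store, index
```

Fields absent from the mapping, such as a format element, are dropped before storage. They are therefore never indexed either.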

Page 27

Functional Layer Developments

Function | Task | Type | Partner | Status | D-NET
AID | Enhanced Publications management | Novel Service (JAVA) | NKUA | Spec. | 2.0
Advanced search | Optimized Search | Enhanced Service (JAVA) | NKUA | Spec. | 2.0
Advanced search | Similarity Search | Novel Service (JAVA) | ICM | Spec. | 2.0
User Services | Advanced personalization | Enhanced Service (JAVA) | NKUA | Spec. | 2.0
Community Service | Advanced Community management | Enhanced Service (JAVA) | NKUA | Spec. | 2.0
Web Interface | Generic to data model and services | Enhanced Service (JAVA) | NKUA | Spec. | 1.2
Web Interface | Enhanced UIs | Enhanced Service (JAVA) | NKUA | Spec. | 2.0

Page 28

Current Work: DRIVER-II

Networking: Confederation with a who-is-who advisory board

Outreach: LIBER, SPARC, US, Japan etc.

Consolidation: DRIVER-I services packaged and performing at production quality

Enhancement of DRIVER-I services: improved indexing and data aggregation functionalities

DRIVER-II services: D-NET v2.0, enhanced publication management and functionality

Page 29

DRIVER II – D-NET v2.0

Studies:

What are „Enhanced Publications“? >> PDF

Technologies for „Enhanced Publications“ >> PDF

Long-Term Preservation of „Enhanced Publications“

„Technology Watch“: the Future >> PDF

Demonstrators:

„Enhanced Publications“ >> Live

„Enhanced Publications“ Long-Term Preserv. >> Film

Infrastructure:

Specs. ready, development in progress >> WIKI

D-NET v1.1: Java porting & build system

D-NET v1.2: New aggregator, installer (, contracts)

D-NET v2.0: Compound object management

Page 30

Outlook: Enhanced Publications

Page 31

Outlook: Enhanced Publications

Based on OAI-ORE

Page 32

The Web-Capable Model – OAI-ORE

http://www.openarchives.org/ore/
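OAI-ORE models a compound object, such as a text together with its dataset, as an Aggregation. A Resource Map describes the Aggregation, and `ore:aggregates` links the Aggregation to each of its parts. The Python below is a minimal sketch that emits such a Resource Map as N-Triples. The `ore:` terms and the RDF type URI come from the ORE and RDF vocabularies. Two things are assumptions: the example URIs, and deriving the Aggregation URI by appending a `#aggregation` fragment to the Resource Map URI. The sketch also omits the further metadata the ORE specification defines for Resource Maps.

```python
ORE = "http://www.openarchives.org/ore/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def resource_map(rem_uri, aggregated_uris):
    """Return N-Triples lines for a Resource Map describing one Aggregation."""
    # Assumption: the Aggregation URI is the ReM URI plus a fragment, so the two differ.
    agg_uri = rem_uri + "#aggregation"
    triples = [
        (rem_uri, RDF_TYPE, ORE + "ResourceMap"),
        (rem_uri, ORE + "describes", agg_uri),
        (agg_uri, RDF_TYPE, ORE + "Aggregation"),
    ]
    # Each part of the enhanced publication (text, data, ...) is an aggregated resource.
    triples += [(agg_uri, ORE + "aggregates", uri) for uri in aggregated_uris]
    return [f"<{s}> <{p}> <{o}> ." for s, p, o in triples]
```

Calling `resource_map("http://example.org/rem/ep1", [...article and dataset URIs...])` produces three header triples plus one `ore:aggregates` triple per part.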

Page 33

The Document Model for DRIVER

Page 34

The Object Model – Internal Processing

Primitives: Types, Sets and Objects

Object: atoms, descriptions, relations
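The internal object model is built from three primitives (types, sets and objects), and each object carries atoms, descriptions and relations. The Python dataclass sketch below is one way to picture this. All class and field names are hypothetical and only mirror the slide's terms. The type check in `ObjectSet.add` is an assumed constraint, not documented DRIVER behaviour.

```python
from dataclasses import dataclass, field


@dataclass
class Atom:
    """A content-bearing part of an object, e.g. a file (hypothetical fields)."""
    uri: str
    mime_type: str


@dataclass
class Relation:
    """A typed link from one object to another object's id."""
    predicate: str
    target: str


@dataclass
class DObject:
    """An object of a given Type: atoms, descriptions and relations."""
    oid: str
    type_name: str
    atoms: list[Atom] = field(default_factory=list)
    descriptions: dict[str, str] = field(default_factory=dict)  # metadata keyed by format
    relations: list[Relation] = field(default_factory=list)


@dataclass
class ObjectSet:
    """A named Set of objects; here (an assumption) all members share one Type."""
    name: str
    type_name: str
    members: dict[str, DObject] = field(default_factory=dict)

    def add(self, obj: DObject) -> None:
        if obj.type_name != self.type_name:
            raise TypeError(f"{obj.oid} has type {obj.type_name}, set expects {self.type_name}")
        self.members[obj.oid] = obj

    def related(self, oid: str, predicate: str) -> list[str]:
        """Ids of objects that member `oid` links to via `predicate`."""
        return [r.target for r in self.members[oid].relations if r.predicate == predicate]
```

Relations point at object ids rather than embedding objects. This keeps compound objects such as enhanced publications as graphs of independently stored parts.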

Page 35

The DRIVER-application

Page 36

Compound Object Management

Diagram: Object Instances, DRIVER Processing, DRIVER Application; Web-Representation; Web-Processing

Page 37

Conclusion

Page 38

Lessons learnt

Distributed data infrastructure requires links between organisational and technical concepts

Data specialists, computer scientists, service providers

Guidelines / content policies as a „glue“

In distributed data provision, quality and access measures are the most ‚expensive‘ tasks

Distributed service operation (as opposed to data provision) can be solved, but raises novel questions (e.g. SLAs)

„Infrastructure“ for novel paradigms of scholarly communication is hard to get across ;-)

Page 39

Summary

DRIVER tackles the data infrastructure challenge from the text-repository side (mostly OAI-PMH)

DRIVER handshakes with primary & secondary data through „enhanced publications“

DRIVER isn't only a project but also a forum for information specialists

‚Products‘ include: Studies, Infrastructure run-time-system in production, software, support …

DRIVER has addressed many problems of data and service interoperability in a distributed repository environment and found some solutions

Page 40

But…

How could DRIVER link to serious processing of unstructured data?

Page 41

Thanks