The DRIVER initiative for networking repositories

Wolfram Horstmann

Universität Bielefeld

DRIVER motivation

Scholarly communication changes towards distributed provision of text, data and services

Repositories are seen as a saviour in this development, as the building blocks of such a distributed system

An infrastructure supporting distributed repositories and services is needed

(and reactions)

(needs explanation)

Some observations on repositories

They represent a shift towards …

open internet exposure as opposed to closed databases (‚graveyards‘)

content orientation as opposed to mere technical orientation (‚web-servers‘)

distributed systems: centralized structures are not immediately required nowadays

„Everybody can be a publisher“

Common description standards, e.g. Dublin Core Metadata Initiative; many subject-specific standards

Common transfer protocols e.g. OAI-PMH, but also FTP, XML-RPC, WS, etc.

Searchability is possible!

Still: many results are lost to re-use/remix

Closed: too sensitive, weakly described, unimportant (???)

Missing service frameworks / infrastructures

Problems: Data and service interoperability

Solution: „Infrastructure“

Repositories can solve access problem

What infrastructures are: DRIVER terms

Not an infrastructure:

Single repository

Single application for search and retrieval (e.g. BASE)

Only local operation

Backwards causation on repositories is missing

Maybe an infrastructure:

Distributed repository landscape as a whole

As a capacity for emergent properties, e.g. quality and quantity incentive for data population

Nurturing development of service providers

Definitely an infrastructure:

Many service providers in one organisational and technical context (e.g. run-time environment)

Enabling re-use and remix of data and services

DRIVER Objectives

Organisational structure for repositories, e.g. the „Confederation“

Improving quality and standards in local repositories, e.g. validation procedures

Building a distributed runtime system, e.g. service and data sharing

Target groups: Repository Managers, Service Providers, Information System Executives

The DRIVER approach is incremental

Start with publication metadata:

Existing distributed system, somehow connected

Considerable homogeneity in formats: OAI-PMH

Extend geographical coverage:

From 5 countries, to 10, to 27, to ???

Extend towards other contents:

From publication metadata to enhanced publications, i.e. representations of „texts + data“

Learn about subject specificity:

Data bring in disciplinary requirements

The DRIVER Initiative

DRIVER-I 6/2006 – 11/2007

Organisational Models and Technical Test-Bed

DRIVER-II 12/2007 – 11/2009

Running Organisation and Production Infrastructure

DRIVER-Confederation 2010ff

Operations Office and Technical Deployment

NB: DRIVER is not an authoritative body; it is a liberal bottom-up initiative of stakeholders

DRIVER partners and related projects

Networking, Support, Policy, Studies: Göttingen, Nottingham, SURF, Ghent, Ljubljana, Minho, Copenhagen

Technical development and deployment: Athens, Bielefeld, Pisa, Warsaw

Partners make links to many other things:

OA-services: Sherpa-ROMEO, OpenDOAR, BASE…

Projects: Europeana, PEER, DELOS, DL.org, D4Science, PARSE-Insight, NESTOR…

Orgs: DINI, JISC, LIBER, SPARC, KE …

Platforms: DSPACE/FEDORA/OPUS/ePrints

DRIVER-II Midterm Review, January 30, 2009 - Pisa

Project structure

[Diagram labels: Networking; Research; Service; Technical; Management]

Running Infrastructure: content & functionality

Construction of Services: ideas, design, development

Advocacy: attracting users, content and service providers

Discovery: technology watch, EPs requirements

Some results

Some Results: Studies

Some Results: A Portal

Some Results: A Search

Some Results: Repository Registration

Some Results: Guidelines

Build on knowledge from past & current IR projects (EU)

26 actively involved contributors (experts and repository managers) from 8 countries

Practical answers on how to:

Improve full-text access

Standardize metadata quality

Create a reliable infrastructure for permanent identification, resolution, traceability and storage

Resolve semantic and classification issues

Some Results: Support structures

Some Results: Repositories

185+ harvested repositories

21 countries

856,264+ documents

Some Results: Service-Oriented-Arch.

9 hosting nodes

25+ functionality typologies (services)

36 service instances

3 applications: DRIVER Main, Belgium, Spain-Recolecta

Some Results: Runtime-System & Hosting

[Diagram labels: Enabling Layer; Data Layer; EU Open Access Repositories; Functionality Layer; Administrators; End users; Advanced User Interfaces; National portals; Project Applications]

Another Compulsory Design Diagram

Some Results: A software

Meant for large service providers only!

Technicalities

DRIVER and standards

Service Resources are implemented as Web Services and accessed through the corresponding Web Service Interface

Call parameters are enveloped in SOAP messages

The Enabling Services are also compatible with REST
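
As a rough illustration of how call parameters end up inside a SOAP message, the sketch below builds a SOAP 1.1 envelope with Python's standard library. The operation name `findResources`, its `query` parameter and the `urn:example:driver` namespace are hypothetical placeholders and not DRIVER's actual service interface.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
EXAMPLE_NS = "urn:example:driver"  # hypothetical service namespace (assumption)


def build_envelope(operation: str, params: dict) -> str:
    """Wrap an operation call and its parameters in a SOAP 1.1 envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # The operation element carries one child element per call parameter.
    call = ET.SubElement(body, f"{{{EXAMPLE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{EXAMPLE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical call: operation and parameter names are illustrative only.
message = build_envelope("findResources", {"query": "type=Index"})
print(message)
```

A real client would normally generate such envelopes from the service's WSDL rather than by hand; the point here is only the shape of the message.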

XML is the lingua franca for the whole system

Resource internal status, i.e. Resource profiles

Profiles in Information Service use the eXist XML engine

Vocabularies:

Names of languages: ISO 639-2 (three letters, B/T)

Names of countries: ISO 3166 (two letters)

Date format: ISO 8601:1988 (E)
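
To make the vocabulary constraints concrete, here is a minimal sketch of a shape check for the three fields. It assumes a record is a plain dictionary with hypothetical keys `language`, `country` and `date`, and it only tests the form of each value (three lowercase letters, two uppercase letters, a parseable calendar date), not membership in the official code lists.

```python
import re
from datetime import date

LANGUAGE_RE = re.compile(r"[a-z]{3}")  # shape of an ISO 639-2 code, e.g. "ger" (B) or "deu" (T)
COUNTRY_RE = re.compile(r"[A-Z]{2}")   # shape of an ISO 3166 two-letter code, e.g. "DE"


def is_calendar_date(value: str) -> bool:
    """True if value parses as an ISO 8601 calendar date such as 2009-01-30."""
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def check_record(record: dict) -> list:
    """Return the names of the fields whose values do not fit the expected shape."""
    problems = []
    if not LANGUAGE_RE.fullmatch(record.get("language", "")):
        problems.append("language")
    if not COUNTRY_RE.fullmatch(record.get("country", "")):
        problems.append("country")
    if not is_calendar_date(record.get("date", "")):
        problems.append("date")
    return problems


# Field names and sample values are illustrative assumptions.
good = {"language": "ger", "country": "DE", "date": "2009-01-30"}
bad = {"language": "de", "country": "Germany", "date": "30.01.2009"}
print(check_record(good))
print(check_record(bad))
```

A production validator would additionally look each code up in the respective ISO list.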

DRIVER Aggregation:

Harvesting according to the OAI-PMH protocol

Adopting OAI-Provenance best practice (OAI-about)

To be extended to other object models and harvesting protocols
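
The harvesting step can be sketched as follows. This is not the DRIVER Harvester itself, just a minimal illustration of an OAI-PMH `ListRecords` request and of reading identifiers, Dublin Core titles and the resumption token from a response. The repository URL and the sample response are made up for the example, and no network call is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL; a resumption token replaces the other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_records(xml_text):
    """Return ([(identifier, title), ...], resumption_token_or_None)."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        title = rec.findtext(".//dc:title", namespaces=NS)
        records.append((identifier, title))
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, (token or None)


# Made-up response for illustration; a real harvester would fetch it over HTTP.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>token-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://repo.example.org/oai"))
records, token = parse_records(SAMPLE)
print(records, token)
```

A full harvest would loop, requesting the next page with the returned token until the repository sends an empty or missing `resumptionToken`.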

Queries to Search and Index conform to the SRW/CQL standard
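
For readers unfamiliar with CQL, the sketch below composes a simple CQL query and wraps it in a URL-style (SRU) `searchRetrieve` request; SRW carries the same query in a SOAP message instead. The endpoint URL and the `dc.title` / `dc.creator` index names are assumptions for the example, and the indexes actually supported by the DRIVER Search service may differ.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def cql_term(index: str, value: str) -> str:
    """One CQL search clause, with the value quoted and escaped."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{index} = "{escaped}"'


def cql_and(*clauses: str) -> str:
    """Combine clauses with the CQL boolean operator 'and'."""
    return " and ".join(clauses)


def sru_url(base_url: str, query: str, maximum_records: int = 10) -> str:
    """Build an SRU 1.1 searchRetrieve request URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": query,
        "maximumRecords": maximum_records,
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and index names.
query = cql_and(cql_term("dc.title", "open access"), cql_term("dc.creator", "Smith"))
url = sru_url("https://search.example.org/sru", query)
print(query)
print(url)
```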

Enabling Layer Developments

Function | Task | Partner | Status | D-NET
IS-Store: Resource profile store | Enhanced; Port (PERL > JAVA) | CNR | RC | 1.1
IS-S&N: W3C S&N/Topics | Enhanced; Port (PERL > JAVA) | CNR | RC | 1.1
IS-Lookup: Resource discovery | Enhanced; Port (PERL > JAVA) | CNR | RC | 1.1
IS-Registry: Resource registration/de-registration/update | Enhanced; Port (PERL > JAVA) | CNR | RC | 1.1
Manager: Orchestration of DRIVER Info Space | Enhanced; Port (PERL > JAVA) | CNR | RC | 1.1
Authn&Authz: Service-2-Service secure interaction/multiple applications | Enhanced Service (JAVA) | ICM | Proto | 2.0
Monitoring: Admin User Interface and autonomic administration | Novel Service (JAVA) | CNR | RC | 1.2

Data-Layer Developments

Function | Task | Partner | Status | D-NET
Harvester: Collects arbitrary formats | Port (PERL > JAVA) | UniBi/CNR | Alpha | 2.0
Transformator: Eases arbitrary mappings | Novel service (JAVA) | UniBi/CNR | Alpha | 2.0
Feature Extraction: Executes transform.s. and utilities | Novel service (JAVA) | UniBi | Alpha | 2.0
Text-Engine: Utilities, e.g. language detection, full-text-extr. | Novel service (JAVA) | UniBi | Alpha | 1.1
MD-Store: Support special MD operations | Port (PERL > JAVA) | UniBi | Alpha | 1.1
Store: Generic store for binaries | Novel service (JAVA) | UniBi/ICM/CNR | Proto | 2.0
Index: Lookup table for stored information | Adapt from YADDA | ICM/UniBi | Prod. | 1.0
OAI-ORE Publisher: Exposure of stored information | Novel service (JAVA) | CNR | Spec. | 2.0
OAI-PMH Publisher: Exposure of stored information | -- | CNR | Prod. | 1.0
Content Service: Managing complex objects | Novel service (JAVA) | CNR | Proto | 2.0
Access Service: Generic service for using remote objects | Novel service (JAVA) | CNR | Proto | 2.0

Functional Layer Developments

Function | Task | Partner | Status | D-NET
AID: Enhanced Publications management | Novel Service (JAVA) | NKUA | Spec. | 2.0
Advanced search: Optimized Search | Enhanced Service (JAVA) | NKUA | Spec. | 2.0
Advanced search: Similarity Search | Novel Service (JAVA) | ICM | Spec. | 2.0
User Services: Advanced personalization | Enhanced Service (JAVA) | NKUA | Spec. | 2.0
Community Service: Advanced Community management | Enhanced Service (JAVA) | NKUA | Spec. | 2.0
Web Interface: Generic to data model and services | Enhanced Service (JAVA) | NKUA | Spec. | 1.2
Web Interface: Enhanced UIs | Enhanced Service (JAVA) | NKUA | Spec. | 2.0

Current Work: DRIVER-II

Networking:

Confederation with who-is-who advisory board

Outreach: LIBER, SPARC, US, JAPAN etc…

Consolidation:

DRIVER-I Services packaged and performing in production quality

Enhancement:

DRIVER-I Services: improved indexing and data aggregation functionalities

DRIVER-II Services: D-NET v2.0, enhanced publication management and functionality

DRIVER II – D-NET v2.0

Studies:

What are „Enhanced Publications“? >> PDF

Technologies for „Enhanced Publications“ >> PDF

Long-Term Preservation of „Enhanced Publications“

„Technology Watch“: the Future >> PDF

Demonstrators:

„Enhanced Publications“ >> Live

„Enhanced Publications“ Long-Term Preserv. >> Film

Infrastructure:

Specs. ready, development in progress >> WIKI

D-NET v1.1: Java-Porting & Build-System

D-NET v1.2: New Aggregator, Installer (, Contracts)

D-NET v2.0: Compound Object Management

Outlook: Enhanced Publications

Based on OAI-ORE

The Web-Capable Model – OAI-ORE

http://www.openarchives.org/ore/
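
To give a feel for what the OAI-ORE model expresses, the sketch below writes out the core statements of a resource map for one enhanced publication as plain triples. The `ore:describes` and `ore:aggregates` relations and the `ResourceMap` / `Aggregation` classes come from the ORE vocabulary; all `example.org` URIs are hypothetical, and this is not how DRIVER's OAI-ORE Publisher is implemented.

```python
ORE = "http://www.openarchives.org/ore/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def resource_map_triples(rem_uri, aggregation_uri, resource_uris):
    """Core triples of an ORE resource map describing one aggregation."""
    triples = [
        (rem_uri, RDF_TYPE, ORE + "ResourceMap"),
        (rem_uri, ORE + "describes", aggregation_uri),
        (aggregation_uri, RDF_TYPE, ORE + "Aggregation"),
    ]
    # One ore:aggregates statement per part of the enhanced publication.
    triples += [(aggregation_uri, ORE + "aggregates", uri) for uri in resource_uris]
    return triples


def to_ntriples(triples):
    """Serialise URI-only triples in N-Triples syntax."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


# Hypothetical enhanced publication: an article plus its underlying dataset.
triples = resource_map_triples(
    "https://example.org/rem/42",
    "https://example.org/aggregation/42",
    ["https://example.org/article/42.pdf", "https://example.org/dataset/42.csv"],
)
print(to_ntriples(triples))
```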

The Document Model for DRIVER

The Object Model – Internal Processing

Primitives: Types, Sets and Objects

Object: atoms, descriptions, relations

The DRIVER-application

Compound Object Management

Object Instances DRIVER Processing DRIVER Application

Web-Representation

Web-Processing

Conclusion

Lessons learnt

Distributed data infrastructure requires links between organisational and technical concepts

Data specialists, computer scientists, service providers

Guidelines / content policies as a „glue“

In distributed data provision, quality and access measures are the most ‚expensive‘ tasks

Distributed service operation (not data provision) can be solved but raises novel questions (SLAs)

„Infrastructure“ for novel paradigms of scholarly communication is hard to get across ;-)

Summary

DRIVER tackles the data infrastructure challenge from the text-repository side (mostly OAI-PMH)

DRIVER handshakes with primary & secondary data through „enhanced publications“

DRIVER isn‘t only a project but a forum for information specialists

‚Products‘ include: Studies, Infrastructure run-time-system in production, software, support …

DRIVER has addressed many problems of data and service interoperability in a distributed repository environment and found some solutions

But…

How could DRIVER link to serious processing of unstructured data?

Thanks
