„Metadata“ The DRIVER experience and the OpenAIRE direction Metadata-Workshop | Nijmegen | 7/8-SEP-2010 | Wolfram Horstmann 1


Page 1: „Metadata“

„Metadata“

The DRIVER experience and the OpenAIRE direction

Metadata-Workshop | Nijmegen | 7/8-SEP-2010 | Wolfram Horstmann


Page 2: „Metadata“

The metadata scope of this talk

• Metadata is a multifaceted thing, and you can do many beautiful things with it…

• Focus in DRIVER and OpenAIRE
– Metadata for research publications, but also administrative metadata, authority files, terminologies etc.
• Format: Simple DC, but also DIDL, OAI-ORE, RDF…
• Protocol: OAI-PMH, but also feeds, syncs…
• Function: aggregation & search, but also deploy, mine…
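As an illustration of the baseline combination named above (OAI-PMH as protocol, Simple DC as format), here is a minimal sketch of one harvesting step. The function names, the endpoint `http://example.org/oai` and the sample response are assumptions made for this example; only the `ListRecords` verb, the `oai_dc` metadata prefix, the `resumptionToken` element and the XML namespaces come from the OAI-PMH and Dublin Core specifications.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, resumption_token=None):
    """Build an OAI-PMH ListRecords request for Simple DC (oai_dc)."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it replaces metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(OAI + "record"):
        fields = {}
        for el in rec.iter():
            # Collect every Dublin Core element; all of them are repeatable.
            if el.tag.startswith(DC) and el.text:
                fields.setdefault(el.tag[len(DC):], []).append(el.text.strip())
        identifier = rec.findtext(OAI + "header/" + OAI + "identifier")
        records.append({"id": identifier, "dc": fields})
    token = root.findtext(".//" + OAI + "resumptionToken")
    return records, (token or None)


# Made-up response from a hypothetical repository, used instead of a live call.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
 <ListRecords>
  <record>
   <header><identifier>oai:example.org:1</identifier></header>
   <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
     <dc:title>A sample title</dc:title>
     <dc:creator>Doe, Jane</dc:creator>
    </oai_dc:dc>
   </metadata>
  </record>
  <resumptionToken>page-2</resumptionToken>
 </ListRecords>
</OAI-PMH>"""

records, token = parse_records(SAMPLE)
next_url = list_records_url("http://example.org/oai", token)
```

A harvester would repeat the request with each returned token until no token comes back; the network call itself is left out here on purpose.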

Page 3: „Metadata“

A coarse genealogy

2007 2008 2009 2010 2011

D-NET v1.0 D-NET v1.2

Portal

Page 4: „Metadata“

The Beginnings & Essentials

• Since 2004, originally a service for researchers at Bielefeld University for finding documents in repositories distributed across the globe
• In the meantime used worldwide
• Indexing >25 million docs from >1,500 sources
• Simple, pragmatic, informal and independent; minimal effort but high reliability and value
• Mostly OAI-PMH > synergies with DRIVER
• Now work on thesauri, mining, syncing etc.

Page 5: „Metadata“
Page 6: „Metadata“

Lessons learnt

• OAI-PMH/Simple DC allows an effective search engine with immediate added value
• Many years of operation show that even simple, distributed approaches require a lot of care and patience
• Heterogeneity of distributed resources introduces ambiguity and requires service-side effort
– Over 1000 profiles and processing pipelines for sources
• Negative effects attenuated by display for humans
– „Users know what they see when they see it“
• Main drawbacks
– Local data quality
– Missing sharing and re-use between service providers
• „Repository infrastructure“ needed
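The point about per-source profiles and processing pipelines can be pictured with a small sketch. Everything here is hypothetical: the source names, the cleanup rules and the language table are invented for illustration and are not taken from the actual service. The idea shown is only that each source gets its own mapping from field to cleanup function, with a shared default.

```python
import re


def year_only(value):
    """Reduce a free-text date to a four-digit year, or None if none is found."""
    match = re.search(r"\b(1[5-9]\d\d|20\d\d)\b", value)
    return match.group(1) if match else None


# Illustrative lookup table, far from complete.
LANGUAGE_CODES = {
    "english": "eng", "en": "eng", "eng": "eng",
    "german": "deu", "deutsch": "deu", "de": "deu", "ger": "deu", "deu": "deu",
}


def language_code(value):
    """Map a free-text language value to a three-letter code, or None."""
    return LANGUAGE_CODES.get(value.strip().lower())


DEFAULT_PROFILE = {"date": year_only, "language": language_code}

# Hypothetical sources: "repo-b" delivers locale strings such as "de_DE".
PROFILES = {
    "repo-a": DEFAULT_PROFILE,
    "repo-b": {**DEFAULT_PROFILE,
               "language": lambda v: language_code(v.split("_")[0])},
}


def normalise(source, record):
    """Apply the source's profile to a record of field -> list of values."""
    profile = PROFILES.get(source, DEFAULT_PROFILE)
    out = {}
    for field, values in record.items():
        clean = profile.get(field, str.strip)
        kept = [c for c in (clean(v) for v in values) if c]
        if kept:
            out[field] = kept
    return out


cleaned = normalise("repo-a", {"date": ["2009-05-01", "n.d."],
                               "language": ["English"],
                               "title": ["  A title "]})
```

With over a thousand sources, the maintenance cost sits in the `PROFILES` table rather than in the code, which is one way to read the "care and patience" remark.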

Page 7: „Metadata“

A coarse genealogy

2007 2008 2009 2010 2011

D-NET v1.0 D-NET v1.2

Portal

Page 8: „Metadata“

The DRIVER initiative for networking repositories

2007-2009

Page 9: „Metadata“

DRIVER Objectives: Infrastructure!

Organisational structures for repositories
– e.g. the „Confederation“

Improving quality and standards in local repositories
– e.g. guidelines and validation procedures

Building a distributed infrastructure for metadata
– e.g. service and function sharing

Target groups
– Repository managers
– Service providers
– Information system executives

Page 10: „Metadata“

What infrastructures are: DRIVER terms

Not an infrastructure
– Single repository
– Single application for search and retrieval (e.g. BASE)
– Only local operation
– Backwards causation on repositories is missing

Maybe an infrastructure
– Distributed repository landscape as a whole
– As a capacity for emergent properties, e.g. quality and quantity incentive for data population
– Nurturing development of service providers

Definitely an infrastructure
– Many service providers in one organisational and technical context (e.g. run-time environment)
– Enabling re-use and remix of data and services

Page 11: „Metadata“

The DRIVER approach was incremental

Start with publication metadata
– Existing distributed system, somehow connected
– Considerable homogeneity and formats: OAI-PMH

Extend geographical coverage
– From 5 countries, to 10, to 27, to ???

Extend towards other contents
– From publication metadata to enhanced publications, i.e. representations of „texts + data“

Learn about subject specificity
– Data bring in disciplinary requirements

Page 12: „Metadata“

The DRIVER Initiative

DRIVER-I (6/2006 – 11/2007): Organisational Models and Technical Test-Bed

DRIVER-II (12/2007 – 11/2009): Running Organisation and Production Infrastructure

DRIVER-Confederation and Technical Service (2010ff): Organisation and Technical Deployment

Page 13: „Metadata“

Some Results: Studies

Page 14: „Metadata“

Some Results: Guidelines

Build on knowledge from past & current IR projects (EU)
26 actively involved contributors (experts and repository managers) from 8 countries
Practical answers on how to:
– Improve full-text access
– Standardize metadata quality
– Create a reliable infrastructure for permanent identification, resolution, traceability and storage
– Resolve semantic and classification issues
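To make "standardize metadata quality" concrete, a validation step could look roughly like the sketch below. The rule set is hypothetical: the list of mandatory fields and the URL check are assumptions chosen for illustration, not the rules of the actual guidelines.

```python
# Hypothetical rule set; the real guidelines define their own requirements.
MANDATORY = ("title", "creator", "date", "identifier")


def validate(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for field in MANDATORY:
        if not any(v.strip() for v in record.get(field, [])):
            problems.append(f"missing {field}")
    urls = [v for v in record.get("identifier", [])
            if v.startswith(("http://", "https://"))]
    if not urls:
        problems.append("no resolvable identifier (URL)")
    return problems


def compliance(records):
    """Share of records passing all checks, between 0.0 and 1.0."""
    if not records:
        return 0.0
    return sum(1 for r in records if not validate(r)) / len(records)


good = {"title": ["T"], "creator": ["Doe, Jane"], "date": ["2009"],
        "identifier": ["http://example.org/1"]}
bad = {"title": ["T"], "identifier": ["hdl:123"]}
report = {"good": validate(good), "bad": validate(bad),
          "share": compliance([good, bad])}
```

Returning readable problem strings rather than a boolean keeps the result usable as feedback to repository managers, which is where the slides locate the quality work.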

Page 15: „Metadata“

Some Results: A Portal

Page 16: „Metadata“

Some Results: A Search

Page 17: „Metadata“

Some Results: Repository Registration

Page 18: „Metadata“

Some Results: Support structures

Page 19: „Metadata“

Some Results: Repositories

Page 20: „Metadata“

Some Results: Service-Oriented-Arch.

9 hosting nodes

25+ functionality typologies (services)

36 service instances + other applications: Spain, Slovenia, EFG …

Page 21: „Metadata“

Some Results: Runtime-System & Hosting

[Figure: runtime system with Data Layer, Enabling Layer and Functionality Layer; labels: EU Open Access Repositories, Advanced User Interfaces, National portals, Project Applications, Administrators, End users]

Page 22: „Metadata“

Some Results: A software

Meant for large service providers only!

Page 23: „Metadata“

Lessons learnt

Distributed data infrastructure requires links between organisational and technical concepts

Data specialists, computer scientists, service providers

Guidelines / content policies as a „glue“

In distributed data provision, quality and access measures are the most ‚expensive‘ tasks

Infrastructure AND data focus very demanding

Distributed service operation (not data provision) can be solved but raises novel questions (SLAs)

Infrastructure is there, applications are next…

Page 24: „Metadata“

Metadata aspects in DRIVER

OAI-PMH/SimpleDC corroborated

Necessity for other extensions shown
– Administrative (CRIS): ‚project‘, ‚funder‘
– Subject-specific: NLM, PACS etc.
– Authority files: institutions, journals, authors…

Enhanced Publications = Text + Data
– Aggregation encoding: DIDL, OAI-ORE
– Introduce preservation challenges
– Necessity for different service typology
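The "text + data" idea behind enhanced publications can be sketched as a small data structure. This is an illustration only: the class, the example URIs and the plain-tuple output are assumptions, and the sketch borrows just two OAI-ORE terms (`ore:describes`, `ore:aggregates`) without producing a real serialisation.

```python
from dataclasses import dataclass, field


@dataclass
class Aggregation:
    """An enhanced publication: one text plus any number of datasets."""
    uri: str
    text_uri: str
    dataset_uris: list = field(default_factory=list)

    def triples(self, resource_map_uri):
        """Yield (subject, predicate, object) statements for the aggregation."""
        yield (resource_map_uri, "ore:describes", self.uri)
        for part in [self.text_uri, *self.dataset_uris]:
            yield (self.uri, "ore:aggregates", part)


# Made-up URIs for illustration.
ep = Aggregation(
    uri="http://example.org/ep/1#aggregation",
    text_uri="http://example.org/article/1.pdf",
    dataset_uris=["http://example.org/data/1.csv"],
)
statements = list(ep.triples("http://example.org/ep/1.rdf"))
```

Because the parts are separate resources with their own URIs, each can be preserved or replaced independently, which hints at why such aggregations bring new preservation challenges.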

Page 25: „Metadata“

A coarse genealogy

2007 2008 2009 2010 2011

D-NET v1.0 D-NET v1.2

Portal

Page 26: „Metadata“

Primer

Page 27: „Metadata“

OpenAIRE Assignment

OpenAIRE: Open Access Infrastructure for Research in Europe

Objective: Support the Open Access Pilot of the EC & ERC (practical implementation of „clause 39“)

– European Helpdesk: National Nodes
– Repository Infrastructure: Deposit-Multiplexer
– Research on Metadata, Impact & Disciplines

Page 28: „Metadata“

OpenAIRE - factsheet

Open Access Infrastructure for Research in Europe

Programme: FP7 – Research Infrastructures

Starting date: December 1, 2009

Duration: 36 months

Budget: 4.1 Million

38 partners covering all European member-states

– To be reached at www.openaire.eu

Page 29: „Metadata“

European Helpdesk

Promote FP7 pilot and ERC OA guidelines
National Open Access Liaison Offices (27 countries)
Provide OA “toolkits” for
– Researchers
– Institutions
Set up 24/7 portal for deposit and search of OA publications
Liaison with
– Other European OA initiatives
– Publishers
– CRIS systems

Page 30: „Metadata“

Liaison Offices


Page 31: „Metadata“

Supporting Repository Infrastructure

OpenAIRE portal built on D-NET
Access to scientific publications
– Search, browse
– Visualization tools
Deposition of articles
– Set up repository for homeless researchers (INVENIO)
– Multiplexer for OA publications in existing repositories
Provide monitoring tools for
– Document/depositing statistics
– Usage statistics from repository infrastructure
Interoperation with other infrastructures
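One possible reading of the two deposit options on this slide is a routing decision: send a deposit to the researcher's existing repository where one is known, and otherwise to the repository for homeless researchers. The sketch below assumes that reading; the registry, the endpoint and the fallback identifier are all made up.

```python
# Hypothetical identifier for the fallback repository; not a name from the slides.
ORPHAN_REPOSITORY = "orphan-repository"


def route_deposit(affiliation, repositories_by_institution):
    """Pick the target repository for a deposit.

    Falls back to the orphan repository when the institution is unknown.
    """
    key = affiliation.strip().lower()
    return repositories_by_institution.get(key, ORPHAN_REPOSITORY)


# Made-up registry mapping institutions to their deposit endpoints.
REGISTRY = {"example university": "https://repository.example.edu/deposit"}

known = route_deposit(" Example University ", REGISTRY)
unknown = route_deposit("Independent researcher", REGISTRY)
```

A real multiplexer would also need authentication and a deposit protocol per target; the lookup above only shows the decision itself.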

Page 32: „Metadata“

OpenAIRE system in a nutshell

[Figure: OpenAIRE overall overview: functionalities and domains served]

Page 33: „Metadata“

Explorative activities JRA

Interoperability for usage statistics / metrics and administrative research information systems (CRIS/CERIF)
Explore the requirements, practices, incentives, workflows, data models, and technologies to deposit, access, and otherwise manipulate research datasets
– Work with four (4) scientific communities: Health (Life Sciences); Environment; Information & Communication Science; Socio-economic Sciences and Humanities

Page 34: „Metadata“

Metadata directions foreseeable

Repository compliance even more important than in DRIVER

Interface to administrative systems essential
– E.g. EC project database

Authority files for authors, journals etc.

Exchange with others: arXiv, PubMed etc.

Data extensions will introduce new worlds
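The interface between publication metadata and an administrative source such as a project database might be sketched as follows. The sketch assumes that a record carries its project reference as a string in `dc:relation` of the form `info:eu-repo/grantAgreement/EC/FP7/<number>`; that convention, the project table and the project number are assumptions for this example and are not stated in the slides.

```python
import re

# Assumed convention for project references inside dc:relation.
GRANT = re.compile(r"^info:eu-repo/grantAgreement/EC/FP7/(\d+)$")


def project_ids(record):
    """Extract project numbers from the record's relation values."""
    ids = []
    for value in record.get("relation", []):
        match = GRANT.match(value.strip())
        if match:
            ids.append(match.group(1))
    return ids


def link_projects(record, projects):
    """Join a publication record against a table of known projects."""
    return [projects[i] for i in project_ids(record) if i in projects]


# Made-up project table standing in for an administrative database.
PROJECTS = {"123456": {"id": "123456", "title": "Example project"}}

record = {"title": ["A sample title"],
          "relation": ["info:eu-repo/grantAgreement/EC/FP7/123456",
                       "http://example.org/related"]}
linked = link_projects(record, PROJECTS)
```

The join only works if repositories fill the field consistently, which is one way to see why repository compliance matters even more here than in DRIVER.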


Page 35: „Metadata“

A coarse genealogy

2007 2008 2009 2010 2011

D-NET v1.0 D-NET v1.2

Portal

Page 36: „Metadata“

Conclusions

• Metadata allow and require serious international infrastructure in research

• Even very simple approaches unfold complexity in distributed systems

• „Division of labour“ necessary
– Keep an eye on trade-offs between specialized expertise and organisational overhead

• Suggested approach: Simple and integrative rather than complex and integrated