„Metadata“
The DRIVER experience and the OpenAIRE direction
Metadata-Workshop | Nijmegen | 7/8-SEP-2010 | Wolfram Horstmann
The metadata scope of this talk
• Metadata is a multifaceted thing and you can do many beautiful things with it…
• Focus in DRIVER and OpenAIRE
– Metadata for research publications, but also administrative data, authority files, terminologies etc.
• Format: Simple DC but also DIDL, OAI-ORE, RDF…
• Protocol: OAI-PMH but also Feeds, Syncs…
• Function: aggregation & search but also deploy, mine…
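As an illustration of the baseline combination named above (OAI-PMH as protocol, Simple DC as format), here is a minimal sketch of how a service provider might build a ListRecords request and read the Dublin Core fields of a response. The repository URL, the sample record and the function names are assumptions made for this example; they are not taken from DRIVER or OpenAIRE code. Only the namespaces and request arguments come from the OAI-PMH 2.0 specification.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request URL for Simple DC (oai_dc)."""
    if resumption_token:
        # resumptionToken is an exclusive argument: no metadataPrefix alongside it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        fields = {}
        for el in rec.iterfind(".//oai_dc:dc/*", NS):
            name = el.tag.split("}", 1)[-1]  # drop the "{namespace}" prefix
            fields.setdefault(name, []).append((el.text or "").strip())
        records.append({"identifier": ident, "fields": fields})
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS).strip()
    return records, token or None


# Hand-written sample response (hypothetical repository and record).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example title</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
          <dc:language>en</dc:language>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>token-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_records(SAMPLE)
print(list_records_url("https://repository.example.org/oai"))
print(records[0]["fields"]["title"], token)
```

A real harvester would fetch each URL over HTTP and keep requesting with the returned resumption token until none is left; the sketch leaves the network out so it can be run as is.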
A coarse genealogy
[Figure: timeline 2007 to 2011 showing D-NET v1.0, D-NET v1.2 and the Portal]
The Beginnings & Essentials
• Since 2004, originally a service for researchers at Bielefeld University for finding documents in repositories distributed across the globe
• In the meantime used world-wide
• Indexing >25 million docs from >1500 sources
• Simple, pragmatic, informal and independent; minimal effort but high reliability and value
• Mostly OAI-PMH > Synergies with DRIVER
• Now work on Thesauri, Mining, Syncing etc.
Lessons learnt
• OAI-PMH/SimpleDC allows an effective search engine with immediate added value
• Many years of operation show that even simple, distributed approaches require a lot of care and patience
• Heterogeneity of distributed resources introduces ambiguity and requires service-side effort
– Over 1000 profiles and processing pipelines for sources
• Negative effects attenuated by display for humans
– „users know what they see, when they see it“
• Main drawbacks
– Local data quality
– Missing sharing and re-use between service-providers
• „Repository Infrastructure“ needed
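To make the point about heterogeneity and per-source profiles concrete, here is a minimal sketch of what such a profile could look like: a small mapping, maintained by the service provider, that translates each repository's local values into common ones. The class, the field choices and the example values are all hypothetical; the actual profiles and pipelines referred to in the slides are not described in this talk.

```python
from dataclasses import dataclass, field


@dataclass
class SourceProfile:
    """Hypothetical per-source profile: local value -> normalized value."""
    source_id: str
    language_map: dict = field(default_factory=dict)
    type_map: dict = field(default_factory=dict)


def normalize(fields, profile):
    """Return a copy of Simple DC fields with language and type mapped."""
    out = {name: list(values) for name, values in fields.items()}
    for name, mapping in (("language", profile.language_map),
                          ("type", profile.type_map)):
        if name not in out:
            continue
        cleaned = []
        for value in out[name]:
            key = value.strip().lower()
            # Unknown values pass through in cleaned-up form.
            cleaned.append(mapping.get(key, key))
        out[name] = cleaned
    return out


# Two invented repositories that describe the same things differently.
profile_a = SourceProfile("repo-a", {"deutsch": "de", "ger": "de"}, {"artikel": "article"})
profile_b = SourceProfile("repo-b", {"eng": "en"}, {"journal article": "article"})

rec_a = normalize(
    {"title": ["Ein Beispiel"], "language": ["Deutsch"], "type": ["Artikel"]},
    profile_a,
)
rec_b = normalize(
    {"title": ["An example"], "language": [" ENG "], "type": ["Journal Article"]},
    profile_b,
)
print(rec_a["language"], rec_a["type"])
print(rec_b["language"], rec_b["type"])
```

Every such mapping lives on the service side and has to be maintained per source, which is the effort the slide refers to; as long as each service provider keeps its own, none of that work is shared or re-used.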
The DRIVER initiative for networking repositories
2007-2009
DRIVER Objectives: Infrastructure!
Organisational structures for repositories
– e.g. the „Confederation“
Improving quality and standards in local repositories
– e.g. guidelines and validation procedures
Building a distributed infrastructure for metadata
– e.g. service and function sharing
Target Groups
– Repository Managers
– Service Providers
– Information System Executives
What infrastructures are: DRIVER terms
Not an infrastructure
– Single repository
– Single application for search and retrieval (e.g. BASE)
– Only local operation
– Backwards causation on repositories is missing
Maybe an infrastructure
– Distributed repository landscape as a whole
– As a capacity for emergent properties, e.g. quality and quantity incentive for data population
– Nurturing development of service providers
Definitely an infrastructure
– Many service providers in one organisational and technical context (e.g. run-time environment)
– Enabling re-use and remix of data and services
The DRIVER approach was incremental
Start with publication metadata
– Existing distributed system, somehow connected
– Considerable homogeneity and formats: OAI-PMH
Extend geographical coverage
– From 5 countries, to 10, to 27, to ???
Extend towards other contents
– From publication metadata to enhanced publications, i.e. representations of „texts + data“
Learn about subject specificity
– Data bring in disciplinary requirements
The DRIVER Initiative
DRIVER-I (6/2006 – 11/2007): Organisational Models and Technical Test-Bed
DRIVER-II (12/2007 – 11/2009): Running Organisation and Production Infrastructure
DRIVER-Confederation and Technical Service (2010ff): Organisation and Technical Deployment
Some Results: Studies
Some Results: Guidelines
Build on knowledge from past & current IR projects (EU)
26 actively involved contributors (experts and repository managers) from 8 countries
Practical answers on how to:
– Improve full-text access
– Standardize metadata quality
– Create a reliable infrastructure for permanent identification, resolution, traceability and storage
– Resolve semantic and classification issues
Some Results: A Portal
Some Results: A Search
Some Results: Repository Registration
Some Results: Support structures
Some Results: Repositories
Some Results: Service-Oriented-Arch.
9 hosting nodes
25+ functionality typologies (services)
36 service instances + other applications: Spain, Slovenia, EFG …
Some Results: Runtime-System & Hosting
[Figure: layered architecture with an Enabling Layer, a Data Layer (EU Open Access Repositories) and a Functionality Layer, serving administrators and end users through advanced user interfaces, national portals and project applications]
Some Results: A software
Meant for large service providers only!
Lessons learnt
Distributed data infrastructure requires links between organisational and technical concepts
Data specialists, computer scientists, service providers
Guidelines / content policies as a „glue“
In distributed data provision, quality and access measures are the most ‚expensive‘ tasks
Infrastructure AND data focus very demanding
Distributed service operation (not data provision) can be solved but asks novel questions (SLAs)
Infrastructure is there, applications are next…
Metadata aspects in DRIVER
OAI-PMH/SimpleDC corroborated
Necessity for other extensions shown
– Administrative (CRIS): ‚project‘, ‚funder‘
– Subject-specific: NLM, PACS etc.
– Authority files: institutions, journals, authors…
Enhanced Publications = Text + Data
– Aggregation-Encoding: DIDL, OAI-ORE
– Introduce preservation-challenges
– Necessity for different Service-Typology
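The idea of an enhanced publication as an aggregation of text and data can be sketched as a small data structure that emits OAI-ORE style statements. This is only an illustration under assumptions: the class name, its fields and the example URIs are invented here, and the sketch uses just two terms from the ORE vocabulary (`Aggregation` and `aggregates`); it does not reflect how DRIVER actually encoded enhanced publications.

```python
from dataclasses import dataclass, field

# Vocabulary URIs from the OAI-ORE and RDF specifications.
ORE = "http://www.openarchives.org/ore/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class EnhancedPublication:
    """Hypothetical model: one aggregation bundling text and data resources."""
    aggregation_uri: str
    text_uris: list = field(default_factory=list)
    dataset_uris: list = field(default_factory=list)

    def triples(self):
        """Yield (subject, predicate, object) statements for the aggregation."""
        yield (self.aggregation_uri, RDF_TYPE, ORE + "Aggregation")
        for uri in self.text_uris + self.dataset_uris:
            yield (self.aggregation_uri, ORE + "aggregates", uri)


# Invented example: one article plus one dataset.
ep = EnhancedPublication(
    "https://example.org/ep/1",
    text_uris=["https://example.org/article.pdf"],
    dataset_uris=["https://example.org/data.csv"],
)
triples = list(ep.triples())
for t in triples:
    print(t)
```

Even in this toy form the aggregation points at several independently hosted resources, which hints at why preservation becomes harder than for a single metadata record.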
OpenAIRE Assignment
OpenAIRE: Open Access Infrastructure for Research in Europe
Objective: Support the Open Access Pilot of the EC & ERC (practical implementation of „clause 39“)
- European Helpdesk: National Nodes
- Repository Infrastructure: Deposit-Multiplexer
- Research on Metadata, Impact & Disciplines
OpenAIRE - factsheet
Open Access Infrastructure for Research in Europe
Programme: FP7 – Research Infrastructures
Starting date: December 1, 2009
Duration: 36 months
Budget: 4.1 Million
38 partners covering all European member-states
– To be reached at www.openaire.eu
European Helpdesk
Promote FP7-pilot and ERC OA guidelines
National Open Access Liaison Offices (27 countries)
Provide OA “toolkits” for
– Researchers
– Institutions
Set up 24/7 portal for deposit and search of OA publications
Liaison with
– Other European OA initiatives
– Publishers
– CRIS systems
Liaison Offices
Supporting Repository Infrastructure
OpenAIRE portal built on D-NET
Access to scientific publications
– Search, browse
– Visualization tools
Deposition of articles
– Set up repository for homeless researchers (INVENIO)
– Multiplexer for OA publications in existing repositories
Provide monitoring tools for
– Document/depositing statistics
– Usage statistics from repository infrastructure
Interoperation with other infrastructures
OpenAIRE system in a nutshell
[Figure: OpenAIRE overall overview: functionalities and domains served]
Explorative activities JRA
Interoperability for usage statistics / metrics and administrative research information systems (CRIS/CERIF)
Explore the requirements, practices, incentives, workflows, data models, and technologies to deposit, access, and otherwise manipulate research datasets
– Work with four (4) scientific communities: Health (Life Sciences); Environment; Information & Communication Science; Socio-economic Sciences and Humanities
Metadata directions foreseeable
Repository compliance even more important than in DRIVER
Interface to administrative systems essential
– E.g. EC project database
Authority files for authors, journals etc.
Exchange with others: arXiv, PubMed etc.
Data extensions will introduce new worlds
Conclusions
• Metadata allow and require serious international infrastructure in research
• Even very simple approaches unfold complexity in distributed systems
• „Division of labour“ necessary
– Keep an eye on trade-offs between specialized expertise and organisational overhead
• Suggested approach: Simple and integrative rather than complex and integrated