To architect or engineer? Lessons from DataPool on building RDM repositories

Steve Hitchcock, JISC DataPool Project

9th DCC Research Data Management Forum (RDMF9), Cambridge, 14-15 November 2012

DESCRIPTION

There cannot be many mature products where development meetings have not been interrupted with a rueful declaration that to make further progress “you wouldn’t start from here”. This encapsulates one key difference between the architect and the engineer: the engineer is prepared to work with the set of tools provided, while the architect prefers to start with a blank sheet of paper or an open space. In building research data repositories on two different software platforms, Microsoft SharePoint and EPrints, the DataPool Project is working somewhere between these extremes. Which approach will prove the more resilient for research data management (RDM)? In this talk we will look at the relevant factors.

TRANSCRIPT

Page 1: To architect or engineer? Lessons from DataPool on building RDM repositories

Steve Hitchcock, JISC DataPool Project

9th DCC Research Data Management Forum (RDMF9), Cambridge, 14-15 November 2012

Page 3

DataPool architecture (SharePoint)

Peter Hancock, iSolutions, University of Southampton

Page 4

DataPool Building Capacity, Developing Skills, Supporting Researchers

JISCMRD Progress Workshop 24-25 October 2012 Nottingham

Byatt, D., Hitchcock, S., White, W.

Policy and guidance, Data repository, Training

http://datapool.soton.ac.uk/

SharePoint

EPrints 3.3

EPrints data apps

Support for Data Management Plans e.g.

3-layer metadata

Capture/share with external sources, e.g. SWORD-ARM

Informed by

Developing/working with

Progress

Assign DataCite DOIs

Graduate & staff training services

Large-scale data storage

University Strategic Research Groups

Doctoral Training Centres

IDMB Surveys of data practices among academics

Case studies +
• Imaging, 3D
• Geodata
• ++

Page 5

Data repository platforms

• DataFlow
• MS SharePoint
• EPrints

Other platforms available

• DSpace, CKAN, data.bris, etc.

Architected

Engineered

From a data repository perspective

Page 6

Implementations of DataFlow Model

DataStage, SWORD, Curated repository/archive

DataFlow: two data deposit motivations for creators: want to (practice), need to (policy)

Two-stage architecture DataBank

Addresses Dropbox effect for data producers

EPrints

DSpace QMUL

Page 7

DataStage: Upload file

DataStage was developed at the University of Oxford

DataStage screenshots courtesy JISC Kaptur project

http://www.vads.ac.uk/kaptur/

Thanks to Carlos Silva

Page 8

DataStage: Submit as data package

Page 9

3-layer metadata model

Takeda et al., 6th IDCC, Dec. 2010

Available from http://eprints.soton.ac.uk/169533/

JISC Institutional Data Management Blueprint (IDMB) Project, University of Southampton

Page 10

SharePoint user interface 1: project

Page 11

SharePoint user interface 2: data

+ fields for format, keywords

Page 12

Prof. Simon Cox (Engineering) on SharePoint

“The concept that formed part of SP thinking (at Southampton) from the very inception … that ability to use SP as a way to manage or at least collaborate as part of a 5-10 year programme of work.

“The other side is what we’re doing with intellectual property and what we’re offering for students. I chair a group design project, and every single student has said ‘I just do it all on Dropbox’. The same is happening with our research. So I think we have at least to provide a level of service and a level of integration between our research experience and our teaching experience. Would these people go to Southampton rather than University of Nowhereshire on the Web or the University of Google or the University of Dropbox? These are deep questions for us.”

Page 13

ePrints Soton: Item type: Dataset

Currently EPrints v3.2, customised to ePrints Soton

Dataset Item Type from 2007

Page 14

ePrints Soton: start to deposit Dataset

Page 15

EPrints data apps

Apps available from EPrints Bazaar http://bazaar.eprints.org/

Apps work with EPrints v3.3 or later

Page 16

EPrints (test repo) DataShare enabled

App by Tim Brody, EPrints + DataPool

Page 17

EPrints (test repo) Data Core enabled

App by Patrick McSweeney

Data Core “adds a few fields and doesn’t remove any fields from the eprint object. It creates an alternate workflow for datasets which is much smaller than a normal eprints workflow.”

Page 18

EPrints (test repo) Data Core enabled 2

App by Patrick McSweeney

Page 19

Essex Research Data metadata profile aims

“Using metadata schema relevant to UK HE and research data (DataCite, INSPIRE and DDI 2.1), we have developed a basic metadata profile suited to describing research data generated at institutions with disciplinary diversity. The inclusion of fields like Funder and Grant number will ensure future harvesting and linking opportunities (like RCUK Research Outcome Systems). The metadata also suits the EPSRC data registry requirements.”

http://researchdataessex.posterous.com/repository-beta-metadata-profile-released
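
The profile's emphasis on fields such as Funder and Grant number can be illustrated with a simple completeness check on a dataset record. This is a minimal sketch only: the field names (`title`, `creator`, `funder`, `grant_number`) and the `DatasetRecord` class are hypothetical, chosen for illustration, and are not the actual Essex profile or the EPrints schema.

```python
from dataclasses import dataclass, field

# Hypothetical required fields for illustration; not the actual Essex profile.
REQUIRED_FIELDS = ("title", "creator", "funder", "grant_number")


@dataclass
class DatasetRecord:
    """A toy dataset description with a few profile-style fields."""
    title: str = ""
    creator: str = ""
    funder: str = ""
    grant_number: str = ""
    keywords: list = field(default_factory=list)


def missing_fields(record):
    """Return the names of required fields that are empty on this record."""
    return [name for name in REQUIRED_FIELDS if not getattr(record, name).strip()]


# A record deposited without funding details fails the check.
record = DatasetRecord(title="Survey data", creator="A. Researcher")
print(missing_fields(record))
```

A check of this kind is one way a repository could enforce the fields that later harvesting and linking depend on.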

Page 20

EPrints: Essex Research Data repository

EPrints v3.3.10, customised to Essex Research Data

http://researchdata.essex.ac.uk/

Screenshots courtesy JISC Research Data @Essex project

Thanks to Louise Corti, Tom Ensom, Alexis Wolton

Page 21

Essex Research Data record

Page 22

Essex Research Data: observations

• Assumes data deposit, so no selection of EPrints Item Type
• No selection of e.g. Creative Commons licence, just copyright
• Requirement for Time Period suggests particular type of data expected
• Fields for Geographic info (not required) suggest particular type of data expected

Page 23

Architects and surroundings

“On one plot aggressively crystalline blocks by Rogers Stirk Harbour are going up, their diamond shapes having nothing in particular to do with anything around them. On another Foster and Partners have designed a series of curving, stepped, blobby things, of the kind usually designed to take advantage of views on the Med or the Gulf, but are here facing each other like rows of daleks. Again, it shows little interest in anything around it.”

R. Moore, Utopia on Thames, Observer, 11 Nov 2012

Nine Elms, London

usembassylondon

Page 24

Open access repository interoperability

Confederation of Open Access Repositories (COAR)

Dublin Core, CRIS-CERIF

OpenAIRE, RepositoryNet+, Rioxx

RCUK: Research Outcomes System, Gateway to Research, REF

Is there the same current debate about interoperability of data repositories?

Page 25

COAR on OA interoperability

Specific initiatives designed to support interoperability: AuthorClaim, CRIS-OAR, DataCite, DINI Certificate for Document and Publication Services, DOI, DRIVER, Handle System, KE Usage Statistics Guidelines, OAI-ORE, OAI-PMH, OA-Statistik, OA Repository Junction, OpenAIRE, ORCID, PersID, PIRUS, SURE, SWORD, and UK RepositoryNet+.

COAR, The Current State of Open Access Repository Interoperability (2012), 26 Oct. 2012 v.02

MT @gknight2000 (Gareth Knight) Lincoln's CKan instance impressive bit.ly/QQd1au Doesn't appear to support OAIPMH or preservation function #jiscmrd
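
Whether a repository supports OAI-PMH can be probed with the protocol's `Identify` verb. The following is a minimal sketch, not a harvester: the base URL in the usage line is a hypothetical placeholder, and the sample response is hand-written for illustration rather than taken from any real repository. No network request is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def identify_url(base_url):
    """Build an OAI-PMH Identify request URL for a repository endpoint."""
    return f"{base_url}?{urlencode({'verb': 'Identify'})}"


def repository_name(response_xml):
    """Return repositoryName from an Identify response, or None if absent."""
    root = ET.fromstring(response_xml)
    node = root.find("oai:Identify/oai:repositoryName", OAI_NS)
    return node.text if node is not None else None


# Hand-written sample response for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <Identify>
    <repositoryName>Example Data Repository</repositoryName>
    <protocolVersion>2.0</protocolVersion>
  </Identify>
</OAI-PMH>"""

# Hypothetical endpoint; substitute the repository's real OAI-PMH base URL.
print(identify_url("https://repository.example.org/oai"))
print(repository_name(SAMPLE))
```

A repository that returns no valid `Identify` response at any advertised endpoint is unlikely to be harvestable over OAI-PMH, which is the concern raised in the tweet above.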

Page 26

What next for DataPool repositories?

SharePoint
• User test and feedback sessions scheduled, will direct further development

EPrints apps (1 or 2 of following, initially)
• Develop app based on Essex data repository, providing other repositories with a 1-click install of this profile
• Build interoperability (I/O) apps: e.g. Data Management Plans, Dropbox
• Automate record capture for producers of large-scale, regular data outputs