The blessing and the curse: handshaking between general and specialist data repositories

Post on 05-Dec-2014

DESCRIPTION

Talk presented at the Genomic Standards Consortium 15 conference.

TRANSCRIPT

The blessing and the curse: handshaking between general and specialist data repositories

Hilmar Lapp (NESCent), Todd Vision (UNC Chapel Hill)

GSC 15 Conference, Bethesda, MD, April 22-24, 2013

> 180 for biological sciences alone

Which data goes where? Which is required?

Addressing the long tail of orphan data

[Figure: volume (y-axis) against rank frequency of datatype (x-axis). The head of the curve is labeled "Specialized repositories (e.g. GenBank, GBIF)"; the tail is labeled "Orphan data". After Heidorn (2008) http://hdl.handle.net/2142/9127]

Many datasets belong to the long tail. Though less standardized, they can be rich in information content and have unique value

General purpose repositories cater to long-tail data

And that’s aside from the proverbial Babel of data formats.

Where does this leave the user?

Where to deposit what, and how?

Enter Publication:

Please enter your publication:

Publication:

Enter Publication:

Metadata has to be provisioned redundantly

How to concisely link to the supporting data?

Given the article, how do I find the data?

Given a data record, how do I find related data?

How do I assess quality and fitness for purpose?

Lessons from Dryad/TreeBASE handshaking

• The End: To make data archiving and reuse a standard part of scholarly communication.

• The Means:
  - Integrate data archiving with the process of publication.
  - Make archiving easy and low burden for both authors and journals.
  - Give researchers incentives to archive their data.
  - Promote responsible data reuse.
  - Empower journals, societies & publishers in shared governance.
  - Ensure sustainability and long-term preservation.
  - Work with and support trusted, specialized disciplinary repositories.

• The Scope:
  - Research data in sciences and medicine (early focus on evolution and ecology).
  - Content must be complementary to existing disciplinary repositories.
  - Data must be associated with a vetted publication (article, thesis, book chapter, etc.).
  - Associated non-data content (e.g. software scripts, figures) where appropriate.

Lessons learnt

• Different priorities on deposit versus metadata richness may void benefits

• Advantages of one-stop deposition and when to use it are not obvious to users

• Custom-building handshaking protocols is not robust, doesn’t scale
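To make the idea of one-stop deposition with a handshake concrete, here is a minimal Python sketch. The slides do not describe the actual Dryad/TreeBASE protocol, so everything here is an assumption for illustration: the `Submission` class, its field names, the datatype labels, and the `handshake` function are all hypothetical. It only shows the general shape: the author enters the publication metadata once, and the general repository forwards the files of a specialist datatype together with that same metadata.

```python
from dataclasses import dataclass, field


@dataclass
class Submission:
    """Hypothetical deposit package; field names are illustrative only."""
    publication_doi: str                 # entered once by the author
    title: str
    files: dict = field(default_factory=dict)  # filename -> datatype label


def handshake(submission, specialist_types):
    """Split one deposit into a general-repository record and a hand-off
    record for a specialist repository. Both share the publication metadata,
    so the author does not have to enter it twice."""
    shared = {
        "publication_doi": submission.publication_doi,
        "title": submission.title,
    }
    specialist = sorted(
        name for name, kind in submission.files.items() if kind in specialist_types
    )
    general = sorted(
        name for name, kind in submission.files.items() if kind not in specialist_types
    )
    return {**shared, "files": general}, {**shared, "files": specialist}


# Usage with made-up values (the DOI and filenames are placeholders).
sub = Submission(
    publication_doi="10.0000/example",
    title="Example study",
    files={"tree.nex": "phylogeny", "traits.csv": "table"},
)
general_record, handoff_record = handshake(sub, specialist_types={"phylogeny"})
```

A real handshake would also need agreed metadata fields and identifiers on both sides, which is where the third lesson applies: a function like this written separately for every pair of repositories is exactly the custom-built approach that does not scale.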

How to promote

• Minimum metadata reporting standards?

• Uptake of community specialist repositories?

• Archival of all long-tail data?

• Linking between repositories?

[Diagram labels: Data, Metadata, Links]

Standards for repository & web of data interoperability

Promoting community rallying around standards?

Repo: http://datadryad.org
Blog: http://blog.datadryad.org
Wiki: http://datadryad.org/wiki
Code: http://code.google.com/p/dryad
List: dryad-users@nescent.org
@datadryad Dryad
