The Dryad Digital Repository: Published data as part of the greater data ecosystem

Todd Vision, Hilmar Lapp (National Evolutionary Synthesis Center, NESCent). NESCent: Kevin Clarke, Heather Piwowar, Peggy Schaeffer, Ryan Scherle. UNC-CH <MRC>: Sarah Carrier, Elena Feinstein, Jane Greenberg, Hollie White. Kristin Antelman (NCSU), Bill Michener (UNM / DataONE), Bill Piel (Yale / TreeBASE). Funding: NSF, IMLS

Upload: hilmar-lapp

Post on 30-May-2015


DESCRIPTION

Presented at the M3 and Biosharing Special Interest Group (SIG) meeting at ISMB 2010 in Boston, MA: http://gensc.org/gc_wiki/index.php/M3_%26_BioSharing

TRANSCRIPT

Page 1: The Dryad Digital Repository: Published data as part of the greater data ecosystem

The Dryad Digital Repository: Published data as part of the greater data ecosystem

Todd Vision, Hilmar Lapp
National Evolutionary Synthesis Center (NESCent)

NESCent: Kevin Clarke, Heather Piwowar, Peggy Schaeffer, Ryan Scherle

UNC-CH <MRC>: Sarah Carrier, Elena Feinstein, Jane Greenberg, Hollie White

Kristin Antelman (NCSU)

Bill Michener (UNM / DataONE)

Bill Piel (Yale / TreeBASE)

Funding: NSF, IMLS

Page 2: The Dryad Digital Repository: Published data as part of the greater data ecosystem

Henry Oldenburg

Page 3: The Dryad Digital Repository: Published data as part of the greater data ecosystem

Use and reuse of archived data in evolutionary biology

• n=27 articles from 5 journals

[Bar chart, axis: % articles (0 to 100). Categories: Use previously published data; Provide supplementary data; Provide supplementary materials; GenBank submission honored. Bar values as extracted: 100, 41, 7, 48.]

Page 4: The Dryad Digital Repository: Published data as part of the greater data ecosystem

Sharing data on request is not effective

• Wicherts et al. (2006) requested data from 141 articles in the field of psychology. “6 months later, after … 400 emails, [sending] detailed descriptions of our study aims, approvals of our ethical committee, signed assurances not to share data with others, and even our full resumes…” only 27% of authors complied.

• In a survey among geneticists by Campbell et al. (2002), the most frequent reason for withholding data was the effort required to share it (80%). 28% were unable to confirm others’ published research because of data withholding.

Page 5: The Dryad Digital Repository: Published data as part of the greater data ecosystem

Archiving at the time of publication is effective

• The point in time when authors are most prepared to archive their data.

• No opportunity for loss, corruption, etc., of data files.

• Publication can be both carrot and stick.

• The “GenBank model” is uniquely successful.

Page 6: The Dryad Digital Repository: Published data as part of the greater data ecosystem

Further incentives to authors

• Increases impact of one’s own work

• A quid pro quo for access to others’ data

• Relief from the burden of ad hoc data sharing

Page 7: The Dryad Digital Repository: Published data as part of the greater data ecosystem
Page 8: The Dryad Digital Repository: Published data as part of the greater data ecosystem

Evoldir survey March 2008

n=414

“Do you think the data underlying published scientific results should be made publicly accessible?” Yes: 395 (95.4%) No: 19 (4.6%)

“If yes, do you think journals should require data sharing of their authors, or should it be voluntary?” Required: 220 (55.6%) Voluntary: 176 (44.4%)

Page 9: The Dryad Digital Repository: Published data as part of the greater data ecosystem

Joint Data Archiving Policy

Data are important products of the scientific enterprise, and they should be preserved and usable for decades in the future.

[This journal] requires, as a condition for publication, that data supporting the results in the article should be deposited in an appropriate public archive.

Authors may elect to … embargo access to the data for a period up to a year after publication.

Exceptions may be granted at the discretion of the editor, especially for sensitive information such as human subject data or the location of endangered species.

Whitlock, M. C., M. A. McPeek, M. D. Rausher, L. Rieseberg, and A. J. Moore. 2010. Data Archiving. American Naturalist 175(2):145-146. DOI:10.1086/650340
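The embargo clause in the policy can be read as a simple date check. The sketch below is only an illustration: it assumes that "up to a year" means the same calendar date one year after publication, and the function names `latest_release` and `embargo_allowed` are hypothetical, not part of any journal or Dryad system.

```python
from datetime import date


def latest_release(publication: date) -> date:
    """Last day an embargo may end, assuming 'a year' means the same date next year."""
    try:
        return publication.replace(year=publication.year + 1)
    except ValueError:
        # Publication on 29 February: the following year has no such day,
        # so this sketch falls back to 28 February (an assumption).
        return publication.replace(year=publication.year + 1, day=28)


def embargo_allowed(publication: date, release: date) -> bool:
    """True if the requested release date is within the one-year window."""
    return publication <= release <= latest_release(publication)


if __name__ == "__main__":
    published = date(2010, 2, 1)
    print(embargo_allowed(published, date(2010, 8, 1)))  # six-month embargo
    print(embargo_allowed(published, date(2011, 6, 1)))  # longer than a year
```

Exceptions granted by an editor, such as for human subject data, fall outside this check and would need to be handled separately.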

Page 10: The Dryad Digital Repository: Published data as part of the greater data ecosystem

So where is this “appropriate public archive”?

Page 11: The Dryad Digital Repository: Published data as part of the greater data ecosystem

Potential archiving solutions

• Specialized databases (e.g. GenBank, TreeBASE): Will cover some datatypes well, some not at all. High quality data, but with greater submission burden. May have issues with sustainability.

• Supplementary materials online: Publisher provides basic infrastructure, but with low level of service.

• Author-managed websites: Avoids some of the hazards of informal sharing, but is fragile.

Or ...

Page 12: The Dryad Digital Repository: Published data as part of the greater data ecosystem

• Functional goals

To publish and preserve the data reported in the biological literature.

To promote reuse of the data.

• Organizational goals

Governance is shared by a consortium of journals.

Responsible long-term stewardship.

Dryad - A shared public archive
http://datadryad.org

Page 13: The Dryad Digital Repository: Published data as part of the greater data ecosystem

• Permanent identifiers (DOIs), trackable data citations

• Explicit terms (CCZero) for reuse

• No paywall to access

• Searchable across publishers & repositories

• Metadata enhanced for discoverability

• Support for standard APIs

• Commitment to preservation in perpetuity

• Migration of formats, files updatable

• Support for embargoes

Dryad - A shared public archive
http://datadryad.org
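To make the point about permanent identifiers concrete, here is a minimal sketch of turning a DOI string, as it might appear in a reference list, into a resolvable link. It assumes the public DOI proxy at https://doi.org/; the helper names `normalize_doi` and `doi_url` are hypothetical and the validity check is deliberately loose.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"  # public DOI proxy

# Prefixes commonly found in front of a DOI in citations.
_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw: str) -> str:
    """Return the bare DOI ('10.xxxx/suffix') from a citation-style string."""
    doi = raw.strip()
    lowered = doi.lower()
    for prefix in _PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    # Loose sanity check only: DOIs start with '10.' and contain a '/'.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {raw!r}")
    return doi


def doi_url(raw: str) -> str:
    """Build a resolver URL, percent-encoding characters that are unsafe in URLs."""
    return DOI_RESOLVER + quote(normalize_doi(raw), safe="/")


if __name__ == "__main__":
    print(doi_url("DOI:10.1086/650340"))
```

Because the identifier rather than a file location is what gets cited, a link built this way keeps working when the repository moves or migrates the underlying files.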

Page 14: The Dryad Digital Repository: Published data as part of the greater data ecosystem


Page 15: The Dryad Digital Repository: Published data as part of the greater data ecosystem

Dryad is a digital library

not a traditional bioinformatics database

Page 16: The Dryad Digital Repository: Published data as part of the greater data ecosystem

Repository priorities

Integration

Sharing

Discovery

Preservation

Page 17: The Dryad Digital Repository: Published data as part of the greater data ecosystem

Repository priorities

Integration

Sharing

Discovery

Preservation

Dryad’s scope

Page 18: The Dryad Digital Repository: Published data as part of the greater data ecosystem

Low-burden for deposition

[Workflow diagram. Actors: author; editor (JOURNAL); data curator (DRYAD). Boxes: prepare manuscript and related data files; submit manuscript; manuscript review; accepted? (yes/no); send article description; upload data; curation; accepted?; Dryad data package; send data identifier (DOI). Outputs: published article (with data citation); published data (with article citation).]
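The box labels in the deposition diagram above can be read as a handoff between author, journal and repository. The sketch below strings them together in one plausible order; the ordering itself is an assumption inferred from the labels, and the `Deposit` class, the `deposit` function and the `mint_doi` callback are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Deposit:
    """One manuscript plus its data files moving through journal and repository."""
    manuscript: str
    data_files: List[str]
    steps: List[str] = field(default_factory=list)
    data_doi: Optional[str] = None


def deposit(d: Deposit,
            manuscript_accepted: bool,
            data_accepted: bool,
            mint_doi: Callable[[Deposit], str]) -> Deposit:
    # Author and journal side (step names are the labels from the slide).
    d.steps += ["prepare manuscript and related data files",
                "submit manuscript",
                "manuscript review"]
    if not manuscript_accepted:
        return d  # editor's "accepted?" is no: nothing reaches the repository

    # Assumed handoff: the journal notifies the repository, then the author uploads.
    d.steps += ["send article description", "upload data", "curation"]
    if not data_accepted:
        return d  # curator's "accepted?" is no: no identifier is issued

    d.data_doi = mint_doi(d)  # hypothetical callback standing in for DOI registration
    d.steps.append("send data identifier (DOI)")
    return d


if __name__ == "__main__":
    done = deposit(Deposit("example.pdf", ["table1.csv"]), True, True,
                   lambda dep: "doi:10.0000/placeholder")  # placeholder, not a real DOI
    print(done.data_doi)
    print(done.steps)
```

In this reading the author's only extra task is the upload itself, which is what keeps the burden of deposition low; the article and the data package then cite each other through the returned identifier.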

Page 19: The Dryad Digital Repository: Published data as part of the greater data ecosystem

• engaging the scientist in the data curation process

• supporting the full data life cycle

• encouraging data stewardship and sharing

• promoting best practices

• engaging citizens

• developing domain-agnostic solutions

1. Build on existing cyberinfrastructure

2. Create new cyberinfrastructure

3. Support new communities of practice

DataONE: An Interoperating Consortium

Page 20: The Dryad Digital Repository: Published data as part of the greater data ecosystem

Distributed framework

Flexible, scalable, sustainable network of Member Nodes and Coordinating Nodes

Page 21: The Dryad Digital Repository: Published data as part of the greater data ecosystem
Page 22: The Dryad Digital Repository: Published data as part of the greater data ecosystem

Lessons from Dryad (so far)

• The importance of journals in data publication.

• The value of a shared public repository to promotion of data reuse.

• The delicate balance of benefit and burden to data authors.

• The need to break down data silos.

• Achieving long-term data preservation by achieving long-term organizational sustainability.