
Literature/data integration and

Ryan Scherle, Data Repository Architect, Dryad Digital Repository

HighWire Fall Publishers’ Meeting, November 20, 2013

You may reuse any of the original content in these slides as you wish, provided you attribute the source

CC-BY-NC-SA nic221, http://www.flickr.com/photos/nic221/391536867/

Bumpus HC (1898) The Elimination of the Unfit as Illustrated by the Introduced Sparrow, Passer domesticus. Biological Lectures from the Marine Biological Laboratory: 209-226.

CC-BY Adamo, http://www.piqs.de/fotos/121272.html

Who cares if the data is lost?

By Agrant141 (Own work) [CC-BY-SA-3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons

James Cook, portrait by Nathaniel Dance-Holland, c. 1775, National Maritime Museum, Greenwich

Source: Publishing Research Consortium, http://publishingresearch.net (n=3824)


Who cares if the data is lost?

Data “available upon request”

Wicherts and colleagues requested data from 141 articles in American Psychological Association journals.

“6 months later, after … 400 emails, [sending] detailed descriptions of our study aims, approvals of our ethical committee, signed assurances not to share data with others, and even our full resumes…” only 27% of authors complied

Wicherts JM, Borsboom D, Kats J, Molenaar D (2006) doi:10.1037/0003-066X.61.7.726

Fighting data entropy

[Figure: information content of data over time (Michener et al. 1997). Axes: Information Content vs. Time. Labels: time of publication, specific details, general details, retirement or career change, accident, death.]

Funder policies

US funding agencies that require or strongly recommend data sharing:

o CDC
o DOD
o DOE
o EPA
o NASA
o NIH
o NIST
o NOAA
o NSF
o USDA

Joint data archiving policy

Data are important products of the scientific enterprise, and they should be preserved and usable for decades in the future.

As a condition for publication, data supporting the results in the article should be deposited in an appropriate public archive.

Authors may elect to embargo access to the data for a period up to a year after publication.

Exceptions may be granted at the discretion of the editor, especially for sensitive information.

http://datadryad.org/pages/jdap
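The embargo rule above (data released no later than one year after publication) reduces to simple date arithmetic. A minimal Python sketch, assuming the year is counted from the article's publication date and that an article published on 29 February may keep its data embargoed until 28 February of the following year; the function names are hypothetical and not part of any Dryad software:

```python
from datetime import date


def add_one_year(d: date) -> date:
    """Same calendar day one year later; 29 February maps to 28 February."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        # Only 29 February can fail, because the following year is not a leap year.
        return d.replace(year=d.year + 1, day=28)


def latest_release(published: date) -> date:
    """Latest embargo end date (assumed: publication date + 1 year)."""
    return add_one_year(published)


def embargo_allowed(published: date, release: date) -> bool:
    """True if a requested release date respects the one-year maximum embargo.

    Releasing on or before publication means no embargo, which is always allowed.
    """
    return release <= latest_release(published)


published = date(2013, 11, 20)
print(latest_release(published))                     # 2014-11-20
print(embargo_allowed(published, date(2014, 6, 1)))  # True
print(embargo_allowed(published, date(2015, 1, 1)))  # False
```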

Piwowar HA, Chapman WW (2008) hdl:10101/npre.2008.1700.1

Impact factor and archiving policies

[Figure: journal impact factor and data archiving policies, n=70 journals; labels IF=3.6, IF=4.5, IF=6.0.]

Data archiving landscape

There are so many data repositories that we need directories of them:

o http://re3data.org
o http://DataBib.org

These repositories vary along many dimensions:

o Datatype focus
o Community focus
o Allowed file sizes
o Curation policies
o Data access policies
o Funding model

Data archiving landscape

[Figure: repositories arranged by Datatype Focus (general vs. focused) and Community Focus (general vs. focused): Figshare, Institutional Repository, Supplemental Materials, GenBank, Pangaea, Zenodo, Lab Database, Dryad.]

Dryad vs supplementary materials

Feature | Dryad | SOM
Discoverable: indexed and exposed to both web and bibliographic search engines | ✔ | ✗
Identifiable: DataCite DOIs within articles serve as permanent, resolvable identifiers | ✔ | ✗*
Permanent: processes in place to promote preservation (incl. format migration) | ✔ | ✔/✗**
Curated: quality control by both automated processes and human inspection | ✔ | ✗*
Ease of deposit: streamlined deposit, allowance for large and complex datasets | ✔ | ✔/✗**
Formatted for reuse: do not convert reusable formats to PDF | ✔ | ✔/✗**
Updatable: new versions of data files can be added, metadata can be enhanced | ✔ | ✗
Support for embargoes: can delay release of data in accordance with journal policy | ✔ | ✗
Free reuse: no paywall, clear terms of reuse (all data released under CC Zero) | ✔ | ✔/✗**
Support for large files: allow data files up to 10GB | ✔ | ✗
Economy of scale: cost efficiency from shared infrastructure | ✔ | ✔/✗**
Alignment to organizational mission: focus on archiving and reuse of scientific data | ✔ | ✗

* A few publisher SOM sites are exceptions to the general rule.
** Practices differ among publishers; see Smit (2011), doi:10.1045/january2011-smit

DataDryad.org

What makes Dryad unique

1. Tight focus on data associated with published literature

2. Data packages are curated

3. Open development process allows broad participation

4. Nonprofit organization managed by stakeholders

Dryad features

Quick and easy submission process…

Dryad features

…referencing authoritative sources…

Dryad features

…and leveraging integration with journals…

Dryad features

…to maximize the submitter’s valuable time.


Data citations

Best practice is to cite both the article and the data – they are both useful research products

But limit data citations to one data package per article – this eliminates most concerns about the size/granularity of data files
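Citing both the article and its single data package amounts to producing two reference entries, each ending in a resolvable DOI link. A minimal Python sketch; the Citation fields, the entry layout, and the example DOIs (10.9999/...) are hypothetical and do not represent Dryad's or any journal's official citation style. The https://doi.org/ prefix is the standard DOI resolver:

```python
from __future__ import annotations

from dataclasses import dataclass
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"


def doi_url(doi: str) -> str:
    """Turn 'doi:10.x/...' or '10.x/...' into a resolvable https URL."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[len("doi:"):]
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path, keeping '/'.
    return DOI_RESOLVER + quote(doi, safe="/")


@dataclass(frozen=True)
class Citation:
    authors: str
    year: int
    title: str
    container: str  # journal for the article, repository for the data package
    doi: str

    def as_reference(self) -> str:
        return f"{self.authors} ({self.year}) {self.title}. {self.container}. {doi_url(self.doi)}"


def reference_entries(article: Citation, data_package: Citation) -> list[str]:
    """One entry for the article plus exactly one for its data package."""
    return [article.as_reference(), data_package.as_reference()]


# Hypothetical example records (not real publications or DOIs).
article = Citation("Doe J, Roe R", 2013, "Example article", "Example Journal",
                   "10.9999/example.article")
data = Citation("Doe J, Roe R", 2013, "Data from: Example article",
                "Dryad Digital Repository", "doi:10.9999/example.data")

for entry in reference_entries(article, data):
    print(entry)
```

Capping the list at two entries mirrors the one-data-package-per-article advice: the reference list never has to enumerate individual data files.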

[Figure: example article; labels: Materials and Methods, References.]

Dryad uptake

>4,000 data packages containing >12,000 files associated with articles in 275 journals

200 submissions each month and growing

Some data packages have been downloaded more than 10,000 times

Fewer than 10% of authors chose to embargo their data when this option is allowed by the journal

Price schedule

Plan | Member | Non-member | Minimum purchase
Voucher | $65 per data package | $70 per data package | 25 vouchers
Deferred payment | $70 per data package | $75 per data package | 1-year contract
Subscription | annual fee based on $25 per published research article | annual fee based on $30 per published research article | 2-year contract
Pay on submission | N/A | $80 per data package, payable by the submitter | 1 data package
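The price schedule can be turned into a rough annual cost comparison. A minimal Python sketch, assuming: vouchers are bought in a single purchase of at least 25; the subscription fee is the per-article rate times the number of research articles the journal publishes that year; and contract lengths (1 or 2 years) are ignored. Pay on submission has only one price and is billed to the submitter rather than the journal. All function names are hypothetical:

```python
# Prices in US dollars, copied from the November 2013 price schedule.
VOUCHER = {"member": 65, "non-member": 70}       # per data package
VOUCHER_MINIMUM = 25                              # vouchers per purchase
DEFERRED = {"member": 70, "non-member": 75}      # per data package
SUBSCRIPTION = {"member": 25, "non-member": 30}  # per published research article
PAY_ON_SUBMISSION = 80                            # per data package, paid by the submitter


def voucher_cost(packages: int, status: str) -> int:
    # Assumption: one purchase covers the year; unused vouchers still cost money.
    return max(packages, VOUCHER_MINIMUM) * VOUCHER[status]


def deferred_cost(packages: int, status: str) -> int:
    return packages * DEFERRED[status]


def subscription_cost(articles: int, status: str) -> int:
    # Assumption: fee = rate x all research articles published that year,
    # whether or not each article has a data package.
    return articles * SUBSCRIPTION[status]


def pay_on_submission_cost(packages: int) -> int:
    # Single price regardless of membership; the submitter pays, not the journal.
    return packages * PAY_ON_SUBMISSION


def annual_costs(packages: int, articles: int, status: str) -> dict:
    """Rough one-year cost of each plan; contract lengths are ignored."""
    return {
        "voucher": voucher_cost(packages, status),
        "deferred payment": deferred_cost(packages, status),
        "subscription": subscription_cost(articles, status),
        "pay on submission": pay_on_submission_cost(packages),
    }


print(annual_costs(packages=40, articles=100, status="member"))
```

Under these assumptions, a member journal publishing 100 research articles a year, 40 of them with data packages, would pay $2,600 (voucher), $2,800 (deferred payment), or $2,500 (subscription), while pay on submission would cost its authors $3,200 in total.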

Sponsoring open data

Publishers, societies, and other organizations are now sponsoring deposits in 44 journals:

Functional Ecology, Heredity, Journal of Heredity, Systematic Biology, The American Naturalist, Ecological Monographs, Proceedings A, Proceedings B, Journal of Ecology, Interface Focus, Plant Physiology, The Plant Cell, Open Biology, Ecology and Evolution, Evolutionary Applications, eLife

Evolution, Elementa, Palaeontology, MycoKeys, Comparative Cytogenetics, Subterranean Biology, Nature Conservation, NeoBiota, PhytoKeys, ZooKeys, Paleobiology, Biodiversity Data Journal, BioRisk, Molecular Ecology, Molecular Ecology Resources

GMS German Medical Science, GMS Medizinische Informatik, Biometrie und Epidemiologie, Special Papers in Palaeontology, Journal of Evolutionary Biology, Journal of the Royal Society Interface, Journal of Applied Ecology, Journal of Animal Ecology, Methods in Ecology and Evolution, The Journal of Paleontology, Journal of Hymenoptera Research, Philosophical Transactions A, Philosophical Transactions B

In development…

Added value for journals, including a data display widget and a dashboard for editors

Integrated article & data submission

Key functionality

o Makes data deposition simple for authors (once files are prepared)
o Ensures permanent link to data within each article (and vice versa)

Options are customized to meet journal policies

o Data can be submitted prior to manuscript review or upon acceptance

o Journals may allow authors the option of embargoing data for 1 year after publication
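These per-journal options behave like a small configuration record that the submission system checks each author request against. A minimal Python sketch; JournalPolicy, DepositStage, and check_request are hypothetical names for illustration, not Dryad's actual integration interface, and the 12-month cap is taken from the joint data archiving policy:

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum

MAX_EMBARGO_MONTHS = 12  # "up to a year after publication"


class DepositStage(Enum):
    BEFORE_REVIEW = "prior to manuscript review"
    UPON_ACCEPTANCE = "upon acceptance"


@dataclass(frozen=True)
class JournalPolicy:
    """Hypothetical per-journal settings for integrated article and data submission."""
    journal: str
    deposit_stage: DepositStage
    allow_embargo: bool  # whether authors may embargo their data at all


def check_request(policy: JournalPolicy, stage: DepositStage, embargo_months: int) -> list[str]:
    """Return the problems with an author's deposit request (empty list if it is fine)."""
    problems = []
    if stage != policy.deposit_stage:
        problems.append(f"{policy.journal} expects data {policy.deposit_stage.value}")
    if embargo_months < 0:
        problems.append("embargo length cannot be negative")
    elif embargo_months > 0 and not policy.allow_embargo:
        problems.append(f"{policy.journal} does not allow embargoes")
    elif embargo_months > MAX_EMBARGO_MONTHS:
        problems.append(f"embargo limited to {MAX_EMBARGO_MONTHS} months after publication")
    return problems


policy = JournalPolicy("Example Journal", DepositStage.UPON_ACCEPTANCE, allow_embargo=True)
print(check_request(policy, DepositStage.UPON_ACCEPTANCE, 6))   # []
print(check_request(policy, DepositStage.BEFORE_REVIEW, 13))
```

Returning a list of problems rather than raising on the first one lets a submission form show the author every mismatch with the journal's policy at once.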


To learn more

Repository home: http://datadryad.org
News: http://blog.datadryad.org
Twitter: @datadryad

Ryan Scherle, [email protected]
