

Data Sharing*, Archiving, and Discovery: Tips and Tools

William Michener

College of University Libraries & Learning Sciences

DataONE

University of New Mexico

*Making data available for others to use


The Data Deluge

Data Entropy

[Figure: Content plotted against Time. Labels: Time of publication; Specific details; General details; Accident; Retirement or career change; Death. (Michener et al. 1997)]

Vines, T. H. et al. Curr. Biol. http://dx.doi.org/10.1016/j.cub.2013.11.014 (2013).

Dark data in the long tail

Specific Data are Hard to Find …

The Rest are Inaccessible

PB Heidorn (2008) Library Trends 57 (2), 280-299

Convergent Science

• “the merging of ideas, approaches and technologies from widely diverse fields of knowledge to stimulate innovation and discovery”

Data Sharing

A brief history of ecological data sharing

• The International Biological Program (IBP): 1964-1974

  • “… data policies and protocols were never elaborated nor even agreed to in principle.” (Porter & Callahan 1994)

Michener (2015) Ecological Informatics 29:33-44

A brief history of ecological data sharing

Long Term Ecological Research Network (LTER): 1980-present

• LTER Guidelines for Site Data Management Policies issued in 1990 (Porter & Callahan 1994)

• LTER Network Data Access Policy, Data Access Requirements, and General Data Use Agreement (approved by the LTER Coordinating Committee April 6, 2005)

Approx. 20,000 data packages available

Michener (2015) Ecological Informatics 29:33-44

A brief history of ecological data sharing

• NSF Policy from Grant General Conditions (April 1, 2001)

  • “NSF … expects investigators to share with other researchers, at no more than incremental cost and within a reasonable time, the data, samples, physical collections and other supporting materials created or gathered in the course of the work.”

• America COMPETES Act (August 9, 2007)

  • requires civilian federal agencies to provide guidelines, policy and procedures, to facilitate and optimize the open exchange of data and research between agencies, the public and policymakers.

Michener (2015) Ecological Informatics 29:33-44


The 2011 Joint Data Archiving Policy (JDAP; see datadryad.org)

• [Journal] requires, as a condition for publication, that data supporting the results in the paper should be archived in an appropriate public archive, such as [list of approved archives here]. Data are important products of the scientific enterprise, and they should be preserved and usable for decades in the future. Authors may elect to have the data publicly available at time of publication, or, if the technology of the archive allows, may opt to embargo access to the data for a period up to a year after publication. Exceptions may be granted at the discretion of the editor, especially for sensitive information such as human subject data or the location of endangered species.

Michener (2015) Ecological Informatics 29:33-44

A brief history of ecological data sharing

• “PLOS journals require authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception.”¹

• Nature, Science, Ecological Monographs, …

¹ Michener (2015) Ecological Informatics 29:33-44

Community Practices and Perceptions

[Figure: chart with a 0 to 100 axis comparing Baseline (2010) with Follow-up (2014). Categories: Use others’ datasets if their data were easily accessible; Willing to share data across a broad group; Process for searching; Perception; Satisfaction]

Views: 35,693; Citations: 188 (published Jun 2011)

Views: 8,342; Citations: 8 (published Aug 2015)

Benefits of Data Sharing

• “data sharing accelerates the pace of science by enabling researchers to discover and re-use relevant data, combine data from multiple sources, and ask new questions”

• “public trust increases as science is made more transparent and findings can be reproduced and verified”

• Researchers “benefit from the credit attributed to them when their archived data are cited and used by others” and “citation rates of publication increase when the research data are shared”

Michener (2015) Ecological Informatics 29:33-44

Best Practices

Best Practices for Sharing Data: 1. Create and Follow a Data Management Plan

Michener WK (2015) Ten Simple Rules for Creating a Good Data Management Plan. PLoS Comput Biol 11(10): e1004525. doi:10.1371/journal.pcbi.1004525

Best Practices for Sharing Data: 2. Adopt/follow Data Sharing & Attribution Policies

Joint Data Archiving Policy: [Journal] requires, as a condition for publication, that data supporting the results in the paper should be archived in an appropriate public archive, such as [list of approved archives here]. Data are important products of the scientific enterprise, and they should be preserved and usable for decades in the future. Authors may elect to have the data publicly available at time of publication, or, if the technology of the archive allows, may opt to embargo access to the data for a period up to a year after publication. Exceptions may be granted at the discretion of the editor, especially for sensitive information such as human subject data or the location of endangered species.

http://datadryad.org/pages/jdap

Whitlock, M. C., M. A. McPeek, M. D. Rausher, L. Rieseberg, and A. J. Moore. 2010. Data Archiving. American Naturalist. 175(2):145-146, http://dx.doi.org/10.1086/650340

Creative Commons Licenses (https://creativecommons.org)

Best Practices for Sharing Data: 3. Fully Document the Data

• Darwin Core – species and biodiversity collections

• EML – Ecological Metadata Language

• ISO 19115 – for a wide variety of geospatial data

https://knb.ecoinformatics.org/#tools/morpho

http://rs.tdwg.org/dwc/
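As a rough illustration of the Darwin Core bullet above, the sketch below writes one occurrence record to CSV using Darwin Core term names as column headers. The function name `occurrences_to_csv`, the choice of six terms, and every record value are assumptions made for this example; a real dataset would follow the full standard and the target repository's requirements.

```python
import csv
import io

# Column names are Darwin Core terms; which ones to include is an assumption.
DWC_FIELDS = [
    "occurrenceID",
    "basisOfRecord",
    "scientificName",
    "eventDate",
    "decimalLatitude",
    "decimalLongitude",
]


def occurrences_to_csv(records):
    """Serialize a list of dicts keyed by Darwin Core terms to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=DWC_FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)  # raises ValueError on keys outside DWC_FIELDS
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical record, invented purely for illustration.
    example = {
        "occurrenceID": "example:occ:1",
        "basisOfRecord": "HumanObservation",
        "scientificName": "Turdus migratorius",
        "eventDate": "2016-12-20",
        "decimalLatitude": 35.08,
        "decimalLongitude": -106.62,
    }
    print(occurrences_to_csv([example]))
```

Using shared term names as headers is what lets aggregators combine records from many collections without per-dataset mapping.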

Best Practices for Sharing Data: 4. Preserve the Data, Software and Workflows

http://specifyx.specifysoftware.org

Catalog of 1,500+ Data Repositories

Best Practices for Sharing Data: 5. “Publish” and Disseminate the Data Products

http://www.gbif.org

http://www.vertnet.org

http://www.nature.com/sdata/

Archiving

Role of the Data Archive

Cook et al. (In press) Preserve: Protecting Data for Long-Term Use. In: Recknagel F, Michener WK (eds) Ecological Informatics, 4th edn. Springer.

Bad Practices for Preserving Data

Example from Lesson 4 in DataONE education modules (see DataONE.org)

Best Practices for Preserving Data

Cook et al. (In press) Preserve: Protecting Data for Long-Term Use. In: Recknagel F, Michener WK (eds) Ecological Informatics, 4th edn. Springer.

1. “Keep similar measurements together in one data set”

2. Follow standard approaches (e.g., International System) when defining names, units & formats (e.g., yyyy-mm-dd or yyyymmdd for dates, such as 20161220)

3. Use consistent data organization
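The date-format advice in item 2 can be sketched in a few lines of Python. This is a minimal example, not a complete parser: the function name `to_iso_date` and the list of accepted input formats are assumptions for illustration, and ambiguous inputs (for instance day-first versus month-first slashed dates) would need a project-specific rule.

```python
from datetime import datetime

# Input formats accepted by this sketch (an assumption; extend as needed).
KNOWN_FORMATS = ("%Y-%m-%d", "%Y%m%d", "%m/%d/%Y")


def to_iso_date(text: str) -> str:
    """Return the date as yyyy-mm-dd, or raise ValueError if no format matches."""
    cleaned = text.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue  # try the next candidate format
    raise ValueError(f"Unrecognized date format: {text!r}")


if __name__ == "__main__":
    for raw in ("20161220", "2016-12-20", "12/20/2016"):
        print(raw, "->", to_iso_date(raw))
```

Rejecting unrecognized strings, rather than guessing, keeps silent date errors out of the archived file.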

Best Practices for Preserving Data

Cook et al. (In press) Preserve: Protecting Data for Long-Term Use. In: Recknagel F, Michener WK (eds) Ecological Informatics, 4th edn. Springer.

4. Use a stable file format

   • Text/CSV, shapefile, GeoTIFF, HDF, netCDF

5. Specify spatial & temporal coordinates

6. Assign descriptive file names

   • “Soil carbon and nitrogen concentrations in Barrow….”

7. Save raw data in read-only format and save processing scripts (R, MATLAB, SAS)
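Items 4, 6 and 7 can be combined in a short Python sketch: write the raw table as plain-text CSV under a descriptive name, then clear the file's write permissions. The helper names, the file name, and the sample values are all hypothetical, and the permission handling assumes a POSIX-style file system (on Windows only the read-only flag is affected).

```python
import csv
import stat
import tempfile
from pathlib import Path

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def write_raw_csv(path: Path, header: list[str], rows: list[list[str]]) -> None:
    """Write rows to a plain-text CSV file, then clear its write-permission bits."""
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    path.chmod(path.stat().st_mode & ~WRITE_BITS)


def is_read_only(path: Path) -> bool:
    """True when no write-permission bit is set on the file."""
    return (path.stat().st_mode & WRITE_BITS) == 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        # Hypothetical descriptive name: place, variables, collection date.
        raw = Path(folder) / "barrow_soil_carbon_nitrogen_20161220.csv"
        # Made-up sample values, for illustration only.
        write_raw_csv(raw, ["depth_cm", "carbon_pct"], [["10", "2.4"]])
        print(raw.name, "read-only:", is_read_only(raw))
```

Processing scripts would then read this file and write their results elsewhere, so the raw copy is never modified.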

Best Practices for Preserving Data

Michener (In press) Quality assurance and quality control. In: Recknagel F, Michener WK (eds) Ecological Informatics, 4th edn. Springer.

8. Assure data quality

9. Provide complete documentation

10. Protect data (1 original, 1 copy onsite, 1 off-site)
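One way to act on item 10 is to copy the original file to each backup location and confirm that every copy has the same checksum. The sketch below assumes SHA-256 as the checksum and uses two local folders as stand-ins for the onsite and off-site locations; the function names are hypothetical, and a real off-site copy would live on separate hardware or a remote service.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(original: Path, backup_dirs: list[Path]) -> str:
    """Copy the original into each directory and check each copy's checksum."""
    expected = sha256_of(original)
    for directory in backup_dirs:
        directory.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(original, directory / original.name))
        if sha256_of(copy) != expected:
            raise OSError(f"Checksum mismatch for {copy}")
    return expected


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        root = Path(folder)
        original = root / "original.csv"
        original.write_text("site,value\nA,1\n", encoding="utf-8")
        # Local stand-ins for the onsite and off-site locations.
        checksum = copy_and_verify(original, [root / "onsite", root / "offsite"])
        print("all copies match", checksum[:12])
```

Storing the checksum alongside the data also lets a later reader confirm that a file has not changed since it was archived.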

The Data Repository Will Ensure:

Cook et al. (In press) Preserve: Protecting Data for Long-Term Use. In: Recknagel F, Michener WK (eds) Ecological Informatics, 4th edn. Springer.

1. Files are received as sent

2. Documentation describes files

3. Parameters and units are defined

4. File content is consistent

5. Parameter values are reasonable

6. Files are reformatted and reorganized if necessary
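Checks 3 and 5 in the list above lend themselves to automation, and a data provider can run similar checks before submitting. The sketch below is an assumption-laden illustration, not a description of how any particular repository works: the data dictionary, column names, units, and plausible ranges are all invented for the example.

```python
import csv
import io

# Hypothetical data dictionary: column -> (unit, lowest plausible, highest plausible).
DATA_DICTIONARY = {
    "air_temp": ("degC", -90.0, 60.0),
    "soil_ph": ("pH", 0.0, 14.0),
}


def check_table(csv_text, dictionary):
    """Return a list of problems: undefined columns and implausible values."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for column in reader.fieldnames or []:
        if column not in dictionary:
            problems.append(f"column {column!r} has no definition or unit")
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        for column, (unit, low, high) in dictionary.items():
            raw = row.get(column)
            if raw is None:
                continue  # column absent from this file
            try:
                value = float(raw)
            except ValueError:
                problems.append(f"line {line_no}: {column}={raw!r} is not numeric")
                continue
            if not low <= value <= high:
                problems.append(
                    f"line {line_no}: {column}={value} {unit} is outside {low} to {high}"
                )
    return problems


if __name__ == "__main__":
    sample = "air_temp,soil_ph,site\n12.5,6.8,A\n999,7.1,B\n"
    for problem in check_table(sample, DATA_DICTIONARY):
        print(problem)
```

A range check of this kind flags values for human review; it cannot decide on its own whether an unusual value is an error.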

Discovery

Data Repositories

Best Practices for Data Discovery: 1. Search a Domain Portal or Aggregator

Data Federations (DataONE, GBIF)

[Figure labels: carbon cycling, plant biomass, ocean nitrogen, avian distribution]

Best Practices for Data Discovery: 2. Refine the Search, Using Relevant Facets

Best Practices for Data Discovery: 3. Give Back – i.e., Cite the Data Appropriately

Dryad links to journals

Provides citation instructions


dataone.org