UCAR Workshop Review – “Bridging Data Lifecycles: Tracking Data Use via Data Citations” Matt Mayernik Research Data Service Specialist NCAR Library/Integrated Information Services (IIS) National Center for Atmospheric Research (NCAR) University Corporation for Atmospheric Research (UCAR) BESSIG, April 18, 2012


Page 1:

UCAR Workshop Review – “Bridging Data Lifecycles: Tracking Data Use via Data Citations”

Matt Mayernik
Research Data Service Specialist
NCAR Library/Integrated Information Services (IIS)
National Center for Atmospheric Research (NCAR)
University Corporation for Atmospheric Research (UCAR)

BESSIG, April 18, 2012

Page 2:

Workshop
• April 5-6, at UCAR Center Green Campus
• Funded by NOAA through the UCAR JOSS program
• ~80 attendees
  – Academic librarians
  – Data management professionals
  – Software engineers
  – Scientists
• Agenda and presentations posted at http://library.ucar.edu/data_workshop/

Page 3:

What is a Data Citation?

From: Patil, S. and M. Stieglitz. 2011. Hydrologic similarity among catchments under variable flow conditions. Hydrology and Earth System Sciences, 15, 989–997. doi: 10.5194/hess-15-989-2011

Citation to journal article

Citation to data set

Page 4:

Interest in Data Citations

NSF GEO issued a “Dear Colleague Letter” on March 29

Page 5:

NCAR/UCAR Data

Climate model output data

Longitudinal time-series data

Observational data from field studies

All images: copyright University Corporation for Atmospheric Research

Page 6:

Motivation for Data Citations
• Understand use and impact of data
  – Measurements of data use
  – Give scientists and data centers credit for producing, managing, and curating data
  – Metrics requirements as an FFRDC
• Connecting data and scholarship
• Increase transparency of data and science

Page 7:

Mark Parsons

Page 8:

Data Citation Practices

• Most data users don’t cite data

• Ex. “MODIS snow cover data” from NSIDC

From: Parsons, M. A., Duerr, R., and Minster, J.-B. 2010. Data Citation and Peer Review. Eos Transactions, AGU, 91(34): 297-298. http://dx.doi.org/10.1029/2010EO340001

Page 9:

Hypothesis: ~80% of citation scenarios for 80% of ESS data

Mark Parsons

Page 10:

EZID: long-term identifiers made easy

take control of the management and distribution of your research, share and get credit for it, and build your reputation through its collection and documentation

Primary Functions
1. Create persistent identifiers
2. Manage identifiers over time
3. Manage associated metadata over time

Joan Starr

Page 11:

DOIs vs ARKs

DOIs
• Established brand in publishing
• Indexed by major A&I citation databases
• Cannot be deleted
• More costly
• Ex. http://dx.doi.org/10.5065/D6WD3XH5

ARKs
• Case sensitive
• Special feature supports granularity
• Informative
• Less costly
• Ex. http://n2t.net/ark:/b5065/d6wd3xh5

Both resolve to: http://www.ncl.ucar.edu

Joan Starr
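As a rough illustration of the DOI and ARK comparison on this slide, the Python sketch below builds resolver URLs from the two example identifiers. The function names are hypothetical and were not part of the talk; the resolver hosts are the ones shown in the slide's examples. ARKs are compared exactly because the slide lists them as case sensitive; the case-insensitive DOI comparison is an assumption taken from the DOI Handbook, not from the slide.

```python
DOI_RESOLVER = "http://dx.doi.org/"  # resolver host from the slide's DOI example
ARK_RESOLVER = "http://n2t.net/"     # resolver host from the slide's ARK example


def doi_url(doi: str) -> str:
    """Return the resolver URL for a DOI name such as '10.5065/D6WD3XH5'."""
    return DOI_RESOLVER + doi.strip()


def ark_url(ark: str) -> str:
    """Return the resolver URL for an ARK such as 'ark:/b5065/d6wd3xh5'."""
    return ARK_RESOLVER + ark.strip()


def same_doi(a: str, b: str) -> bool:
    """Compare two DOI names ignoring case (assumption: DOIs are case insensitive)."""
    return a.strip().upper() == b.strip().upper()


def same_ark(a: str, b: str) -> bool:
    """Compare two ARKs exactly, since the slide lists ARKs as case sensitive."""
    return a.strip() == b.strip()
```

Calling `doi_url("10.5065/D6WD3XH5")` and `ark_url("ark:/b5065/d6wd3xh5")` reproduces the two example links above. Keeping the comparison rules in separate functions makes the differing case behavior explicit wherever identifiers are matched or counted.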

Page 12:

Excerpts from existing AGU policy – Citing Data

...data cited in AGU publications must be permanently archived in a data center or centers that meet the following conditions:
• are open to scientists throughout the world.
• are committed to archiving data sets indefinitely.
• provide services at reasonable costs.

Data sets that are available only from the author, through miscellaneous public network services, or academic, government or commercial institutions not chartered specifically for archiving data, may not be cited in AGU publications.

Bill Cook

Page 13:

Excerpts from existing AGU policy – Preserving/Archiving Data

AGU does not expect to archive data sets subject to this policy, except on a for-fee basis and for sets of a small size.

It is not AGU's intention to serve as an archive for large data sets that should be housed in data centers.

AGU maintains a deposit service for supplementary material of different types in order to provide long-term access to small supporting data sets and graphics files that are published concurrently with, and are an electronic component of, some AGU journal articles.

Bill Cook

Page 14:

NCAR Data Citation Initiatives

1. Technical

2. Policy/procedural

Image copyright University Corporation for Atmospheric Research

Page 15:

Citation Challenges

1. Diversity

2. Granularity

3. Version Control

4. Maintenance Over Time

Page 16:

What granularity for EOL DOIs and when are they issued?

• Given a large project with aircraft, soundings, radars, model output and satellite data, do we:
  – Assign a DOI for each data file?
  – Assign one DOI for all datasets for the project?
  – Assign separate DOIs for datasets from each major platform?
  – What about ancillary data? Do we assign DOIs or does the providing institution?

• We are considering assigning DOIs to each major platform's data associated with the project (e.g. C-130, S-Pol), to outside datasets to which we have added value, and to data for which no DOI exists

• It may be beneficial to issue DOIs only when processed data are released, so that publications do not reference preliminary data

Mike Daniels
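The tentative assignment rules on this slide can be sketched in Python as follows. This is one possible reading of the bullets, not EOL's actual procedure: the `Dataset` fields, the function names, and the treatment of value-added outside data as always receiving a new DOI are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Dataset:
    name: str
    platform: str                       # e.g. "C-130", "S-Pol"
    external: bool = False              # produced by an outside institution
    value_added: bool = False           # outside data that has been enhanced
    existing_doi: Optional[str] = None  # DOI already issued elsewhere, if any
    processed: bool = False             # False while data are still preliminary


def needs_new_doi(ds: Dataset) -> bool:
    """Apply the slide's tentative rules to a single dataset."""
    if not ds.processed:
        return False  # wait for the processed release
    if ds.external:
        # Outside data: only when value was added or no DOI exists yet.
        return ds.value_added or ds.existing_doi is None
    return ds.existing_doi is None


def platforms_needing_dois(datasets: List[Dataset]) -> Dict[str, List[str]]:
    """Group eligible datasets by platform, one DOI per platform group."""
    groups: Dict[str, List[str]] = {}
    for ds in datasets:
        if needs_new_doi(ds):
            groups.setdefault(ds.platform, []).append(ds.name)
    return groups
```

Grouping by platform reflects the middle option among the granularity questions above: coarser than one DOI per file, finer than one DOI per project.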

Page 17:

Data QC

Gary Strand

Page 18:

The LTER NIS 2000

Nicole Kaplan, CSU - Long-Term Management of Ecological Data - April 2012, UCAR

K.S. Baker, B.J. Benson, D.L. Henshaw, D. Blodgett, J.H. Porter, S.G. Stafford. (2000) Evolution of a Multisite Network Information System: The LTER Information Management Paradigm. BioScience. 50(11) 963-978.

Nicole Kaplan

Page 19:

The LTER NIS 2011


Nicole Kaplan

Page 20:

Results of CU Faculty Survey About Data Curation

• Many researchers had curation plans for their data
• Many had orphan data without curation plans
• Few departments had a procedure for data preservation; some participated in discipline-based repositories supporting long-term storage
• Receptivity to a library role in data curation fell more in line with the researchers' disciplinary culture or philosophy regarding data sharing and collaborative projects.

Barb Losoff

Page 21:

Ruth Duerr

Page 22:

Lynn Yarmey

Page 23:

Citations in the Bigger Picture

Ted Habermann, NOAA/NESDIS/NGDC, NASA/ESDIS

Data preservation is communicating with the future

Ted Habermann

Page 24:

Metadata Types and Sharing

Discovery

Use / Mashup

Understanding

Discovery Portal

Community Metadata Collections

User

User

More documentation is required for understanding data than discovering or using it.

Ted Habermann

Page 25:

Tim Killeen

Page 26:

Bridging Data Lifecycles, April 5-6, 2012

Current Practices @ NCAR’s Research Data Archive

Metrics Usage - Sample

37% of Users are from US

Now exporting 25+ TB monthly

Subsetting, in general, is 500+ requests/month

Track user activity: who accessed what and when

Steve Worley
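Tracking "who accessed what and when" can be sketched as a small log summary in Python. The record layout, the sample entries, and the function names below are hypothetical and do not describe the Research Data Archive's actual system; they only show how such records could be rolled up into monthly request counts and per-dataset user counts.

```python
from collections import Counter
from datetime import datetime

# Hypothetical access records: (user id, dataset id, ISO timestamp).
ACCESS_LOG = [
    ("user_a", "dataset_1", "2012-03-01T10:15:00"),
    ("user_b", "dataset_1", "2012-03-02T08:00:00"),
    ("user_a", "dataset_2", "2012-04-01T12:30:00"),
]


def requests_per_month(log):
    """Count requests per (year, month) using the 'when' field."""
    counts = Counter()
    for _user, _dataset, when in log:
        stamp = datetime.fromisoformat(when)
        counts[(stamp.year, stamp.month)] += 1
    return counts


def unique_users_per_dataset(log):
    """Map each dataset to the number of distinct users who accessed it."""
    users = {}
    for user, dataset, _when in log:
        users.setdefault(dataset, set()).add(user)
    return {dataset: len(names) for dataset, names in users.items()}
```

Counts like these measure access, not scholarly use, which is the gap that data citations are meant to help close.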

Page 27:

* SPIDR NODES

- 294,337 visits (browser/user only)
- 14,658 unique visitors
- 9.27 pages/visit
- 6:45 avg. duration

Most Accessed out of 28 Data Sets:

Dan Kowal, Data Administrator

Annual Reporting Example

Dan Kowal

Page 28:

NCAR Mauna Loa Solar Observatory Pubs.

Leonard Sitongia

Page 29:

Dataset Family Tree Example

International Comprehensive Ocean Atmosphere Data Set (ICOADS)
Global marine surface observations (1662-2011)

HadISST (1871-2011)
NOAA OI SST (1981-2011)
NOAA ERSST (1854-2011)
HadSLP (1871-2011)
JMA SST (1871-2011)
Ocean Clouds (1900-2010)
NOC Surf. Flux (1973-2009)
WASwind (1950-2009)

Global and Regional Atmospheric and Ocean Re-analyses
NCEP/NCAR, NARR, ERA-40, ERA-Interim, 20CR, OARCA

Etc.

Steve Worley
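One way to picture the family tree is as a parent lookup, sketched below in Python. The edges are an assumption: each product named on the slide is treated as a direct child of ICOADS because the slide's actual arrows are not recoverable from this transcript, and the re-analysis tier is left out for the same reason. The function names are hypothetical.

```python
from collections import Counter

# Assumed edges: every listed product is treated as derived directly from ICOADS.
DERIVED_FROM = {
    "HadISST": "ICOADS",
    "NOAA OI SST": "ICOADS",
    "NOAA ERSST": "ICOADS",
    "HadSLP": "ICOADS",
    "JMA SST": "ICOADS",
    "Ocean Clouds": "ICOADS",
    "NOC Surf. Flux": "ICOADS",
    "WASwind": "ICOADS",
}


def lineage(dataset, derived_from=DERIVED_FROM):
    """Return the dataset followed by its chain of source datasets."""
    chain = [dataset]
    while chain[-1] in derived_from:
        chain.append(derived_from[chain[-1]])
    return chain


def credit_sources(cited, derived_from=DERIVED_FROM):
    """Credit each citation to the cited dataset and to every upstream source."""
    credit = Counter()
    for name in cited:
        for ancestor in lineage(name, derived_from):
            credit[ancestor] += 1
    return credit
```

Under these assumptions, a citation to a derived product also counts toward ICOADS, which is one possible answer to the "how to count" question for parent datasets.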

Page 30:

How to Get Started
• Know what you want to achieve
• Know your identifier options
• Engage stakeholders
• Start with well-bounded cases
• Plan for the long-term implications
  – How to maintain
  – How to count

Page 31:

Thank You

Workshop agenda and presentations: http://library.ucar.edu/data_workshop/

Email: [email protected]
