
Page 1: NISO Virtual Conference Scientific Data Management: Caring for Your Institution and its Intellectual Wealth

NISO Virtual Conference: Scientific Data Management

February 18, 2015

Jennifer Doty, Research Data Librarian

Emory University, Atlanta, GA

Learning to Curate Research Data

Page 2

Overview

Data Curation Working Group
  • pilot project
  • lessons learned

Data Curation Workshop
  • planning
  • challenges
  • feedback (more lessons learned)

Next Steps…

Page 3

Data Curation Working Group Pilot Project

Learning to Curate Research Data

Page 4

www.icpsr.umich.edu

Page 5

Collaborative Curation

“We propose that domain specific archives partner with institution based repositories to provide expertise, tools, guidelines, and best practices to the research communities they serve.”

Green, Ann G., and Myron P. Gutmann. (2007) "Building Partnerships Among Social Science Researchers, Institution-based Repositories, and Domain Specific Data Archives." OCLC Systems and Services: International Digital Library Perspectives. 23: 35-53. <http://hdl.handle.net/2027.42/41214>

Page 6

Support:

Page 7

Ron Nakao, Stanford

Libbie Stephenson, UCLA

Jon Stiles, UC Berkeley

Jen Doty, Emory

Rob O’Reilly, Emory

Joel Herndon, Duke

Jared Lyle, ICPSR

Page 8

DCWG Pilot Goals

For participants:

• Apply curation theories to practice through actual data processing.

• Finish the session with a fully curated data collection ready for archiving.

• Interact with and ask questions of other data specialists within a working environment.

• Gain first-hand experience using ICPSR’s internal tools and workflows for curation.

• Understand the level of effort required to work through collections and provide assistance to researchers.

• Learn about issues not previously considered (e.g., costing, standardized workflows).

Page 9

DCWG Pilot Goals

For ICPSR:

• Engage with outside data curators to learn what others are doing and thinking.

• Polish internal procedures and tools by opening them to outside review and critique.

• Curate and archive more data, benefiting the ICPSR membership and the entire social science community.

• Better utilize the resources of the Official Representative (OR) community, including personal relationships and, especially, their wide-ranging expertise.

• Train a data curation community of support.

Page 11

DCWG Schedule

Week 1 – Introductions & Data Sources

Week 2 – Acquisition

Week 3 – Review

Week 4 – Processing

Week 5 – Metadata

Week 6 – Dissemination

Week 7 – Summary

Page 12

DCWG Topics

Acquisitions

• Gathering information from the data producer

• Legal agreements

• Appraisal

• What to keep, and for how long?

Review

• Quality review: are the data complete, accurate, and well documented?

• Disclosure review: is there sensitive or private information?

• Create a plan of attack

Processing

• Data cleaning

• Ensuring data integrity

• Quality review: is the final package self-contained?

Metadata

• Standards overview

• Variable-level metadata

• Study-level metadata

Dissemination

• Final packaging and review

• Workflows

• Preservation policies

• Web delivery
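The “Ensuring data integrity” and “Preservation policies” topics are often made concrete through fixity checking: record a checksum for every file when a package is assembled, then recompute and compare before dissemination or after a storage migration. The sketch below is a minimal Python illustration of that general technique, assuming a data package is simply a local directory of files. The function names are hypothetical and are not taken from ICPSR's internal tools or workflows.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(package_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to the package) to its checksum."""
    return {
        str(p.relative_to(package_dir)): sha256_of(p)
        for p in sorted(package_dir.rglob("*"))
        if p.is_file()
    }


def verify_manifest(package_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return files that are missing, changed, or absent from the manifest."""
    current = build_manifest(package_dir)
    # Missing files yield None from .get(), so they are flagged as well.
    problems = [name for name, digest in manifest.items() if current.get(name) != digest]
    problems += [name for name in current if name not in manifest]
    return sorted(problems)


if __name__ == "__main__":
    # Hypothetical two-file package built in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp)
        (pkg / "data.csv").write_text("state,year\nGA,1972\n")
        (pkg / "codebook.txt").write_text("state: two-letter code\n")
        manifest = build_manifest(pkg)
        print(verify_manifest(pkg, manifest))  # []
        (pkg / "data.csv").write_text("state,year\nGA,1973\n")
        print(verify_manifest(pkg, manifest))  # ['data.csv']
```

SHA-256 is used here only as a widely available example; a repository's own preservation policy may specify a different algorithm or a standard manifest format.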

Page 13

What's in it for us?

• Well timed with new hires in 2012 and with support from senior leadership for RDM projects

• Learn from gold-standard holders:
  • ICPSR processing pipeline and tools
  • implications of providing premium-level service for staffing and resource allocation

Nobel Prize Illustration by Howdy, I’m H. Michael Karshis on Flickr / CC BY 2.0

Page 14

The Data

• Panel Data - all states in the United States, 1972-2007, annual

• Coded Data - state-level data policies on home schooling, and relevant court cases

• Publicly-Available Data - a mix of demographic, economic, and social data from sources such as the BEA, the Census Bureau, and the NCES

• No issues with regard to sensitivity of data or proprietary restrictions

Page 15

Issues and Considerations

• Data assembled for a particular project, not with long-term archiving and research in mind

• Discrepancies in documentation:
  • variable names
  • unclear citations
  • broken URLs
  • variables in data missing from codebook, and vice versa
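The last discrepancy above, variables present in the data but absent from the codebook (and vice versa), can be flagged mechanically with two set differences. A minimal Python sketch, assuming the data file is a CSV with variable names in its header row and that the codebook's variable names have already been extracted into a set. The function names and the variable names in the example are hypothetical and are not from the pilot dataset.

```python
import csv
import io


def data_variables(csv_text: str) -> set[str]:
    """Read variable names from the header row of a CSV file's contents."""
    header = next(csv.reader(io.StringIO(csv_text)), [])  # empty file -> no names
    return {name.strip() for name in header}


def compare_variables(data_vars: set[str], codebook_vars: set[str]) -> dict[str, list[str]]:
    """Report variables that are documented on only one side."""
    return {
        "in_data_not_codebook": sorted(data_vars - codebook_vars),
        "in_codebook_not_data": sorted(codebook_vars - data_vars),
    }


if __name__ == "__main__":
    # Hypothetical header and codebook entries, for illustration only.
    sample = "state,year,hs_law,var_x\nGA,1972,0,1\n"
    codebook = {"state", "year", "hs_law", "var_y"}
    print(compare_variables(data_variables(sample), codebook))
    # {'in_data_not_codebook': ['var_x'], 'in_codebook_not_data': ['var_y']}
```

Because the comparison is exact, a name spelled differently in the data and in the codebook appears on both sides of the report, which also surfaces the variable-name discrepancies listed above; resolving them still requires checking with the investigators.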

Page 16

Issues and Considerations

• Long history with the Principal Investigator for the project, which meant lots of context about the project and the data

• Useful in clarifying ambiguities in the data, e.g. “it makes sense to us” citations

• Even with that context, there was still much work and back-and-forth involved

Page 17

Issues and Considerations

Absent that prior history, the climb would have been much steeper…

Steep climb up by lisaAngulo reid on Flickr / CC BY-NC 2.0

Page 18

Lessons Learned

Overall, it was very impressive “to see how the sausage is made”:

• ICPSR processing pipeline

• SDE infrastructure

• Internal production and preservation tools

Sausage machine by Scoobyfoo on Flickr / CC BY-NC-ND 2.0

Page 19

Lessons Learned

Realistically, at current levels we are best equipped to provide consultations and guidance, but not hands-on data curation.

IBM 1620 in Computer Lab by euthman on Flickr / CC BY-SA

Page 20

Work in Progress

• Intent to archive the dataset with ICPSR still holds, but this has been delayed by:
  • the need for further documentation from investigators
  • demands on our time from other projects

• Future plans for archiving datasets created by campus researchers are informed by lessons learned from participating in the pilot project

Page 21

Data Curation Workshop for Researchers & Librarians

Learning to Curate Research Data

Page 23

Local Workshop Objectives

• Raise awareness of funder requirements and journal policies to preserve and share data, and resources available to help do so

• Educate researchers and librarians in best practices for documenting and preparing data for long-term preservation and sharing

• Provide guidance and support to researchers depositing their data with appropriate domain repositories (e.g. ICPSR, Dryad)

• Opportunity to reach the researchers where they reside…

Page 25

Lizzy Rolando, Georgia Tech

Jen Doty, Emory University

Mandy Swygart-Hobaugh, Georgia State University

Page 27

Challenges

• condensing week-long workshop material into one-day sessions

• identifying topics most relevant to each audience

• time constraints for everyone: January dates overlapped with the start of classes on all 3 campuses

Page 28

Workshop agenda

• Identifying and Finding Data to Archive

• Reviewing Data

• Reviewing Confidential Data

• Cleaning Data

• Describing Data

• Depositing Data

• Disseminating & Publishing Data

• Local data curation resources

Page 29

Participant Feedback

• Mixed responses: generally positive about content and structure; all replies were useful for revising material and improving marketing

• Appreciated the balance of presentations, exercises, and discussions

• Expressed interest in information related to planning for data management and curation

• Increased awareness of resources available

Page 30

Next Steps…

For ICPSR:

• revising materials: vary the approach for researchers and librarians
  • Researchers: why best practices matter and how to apply them to projects and data right now
  • Librarians: focus on curation topics

• planning for additional offerings in other locations

Page 31

Next Steps…

For institutions:

• identify related training to offer locally

• adopt methods to support our researchers preparing data for archiving and sharing

• explore additional opportunities to partner with domain data archives

Page 32

Green Question Mark by mikecogh on Flickr / CC BY

Page 33

Thank You!

Jennifer Doty

Research Data Librarian

Emory University


Page 34

References

RDAP14 presentation: http://www.slideshare.net/asist_org/rdap14-learning-to-curate-panel-32822019

Overview (ASIS&T Bulletin, Aug/Sep 2014): http://www.asis.org/Bulletin/Aug-14/AugSep14_DotyEtAl.html

ACRL webinar: http://connect.ala.org/file-manager/download/group/85286/Webinar Slides/20140113_acrl_lyle.pptx