DMPTool and Data Management Basics

Hannah Norton

July 29, 2014

Image modified from: http://www.flickr.com/photos/blprnt/3642742876/in/photostream/

Background: the Data Lifecycle

[Diagram, labeled "Data Management Planning", showing the lifecycle stages: Study Concept, Data Collection, Data Processing, Data Distribution, Data Archiving, Data Discovery, Data Analysis, Repurposing]

* Based on Data Documentation Initiative (DDI) version 3.0 Combined Life Cycle Model

What is a data management plan (DMP)?

• A clear description of how you plan to address data management issues in your research.

• A way to communicate your data management efforts to members of your team and others (especially funders).

A data management plan gives a concise description of the who, what, where, and when of your data throughout its life cycle.

Why do researchers need a Data Management Plan (DMP)?

For all the same reasons you should take care of your data…

• To ensure that valuable data resources will be accessible in the future to members of the research team and the broader community.

• To make life easier – by planning ahead and documenting data throughout its life cycle, researchers can save time and focus on research.

• To increase the visibility of research.

• To satisfy funders’ requirements.

Components of a DMP

• Project description
• Data collection:
  – Types of data
  – Data and metadata standards to be used
• Legal and ethical issues:
  – Privacy and confidentiality
  – Intellectual property rights
• Policies for data sharing and re-use
• Data preservation (long-term)
• Who is responsible for data management

http://dmptool.org

Log in to DMPTool with Gatorlink

Funders with DMPTool Templates

• Alfred P. Sloan Foundation
• Gordon and Betty Moore Foundation
• Gulf of Mexico Research Initiative
• Institute of Education Sciences (US Dept of Education)
• Institute of Museum and Library Services
• Joint Fire Science Program
• National Institutes of Health
• National Endowment for the Humanities – Office of Digital Humanities
• National Science Foundation (General and 11 Directorates)
• U.S. Geological Survey

http://library.ufl.edu/datamgmt

http://guides.uflib.ufl.edu/datamanagement

Sample DMPs from UF

• Example text in the IR@UF: http://ufdc.ufl.edu/AA00014694/00001/

• Research Computing guidance on Data Management Plans (includes links to UF College of Engineering and Department of Astronomy guides): http://www.hpc.ufl.edu/research/proposal-support/data-management-plan/

Components of a DMP

• Project description
• Data collection
• Legal and ethical issues
• Policies for data sharing and re-use
• Data preservation (long-term)
• Who is responsible for data management

Example data collection questions

• What file formats will you use for your data, and why? What metadata/documentation will be submitted alongside the data? (NIH)

• Describe the data to be collected (actual observations) during your research including amount (if known). Name the type of data, the instrument or collection approach, and how the data will be sampled. (NSF-BIO)

• Give a short description of the data, including amount (estimated amount or known amount) and content. Data types could include XML spreadsheets, interview transcripts, text files, historical documents, diaries, field notes, geospatial data, citations, software code, algorithms, etc. (NEH)

Data generated throughout the lifecycle has different needs

• Raw data - some must be kept forever, others can be discarded after the project is complete

• Intermediate data for analyzing and processing - can often be discarded at the end of the computation, but computational methods should be kept for reproducibility

• Final data - should be made available indefinitely to the community

File formats

Formats with the following characteristics are considered relatively stable and better for long-term preservation:

• open documentation
• support across a range of software platforms
• wide adoption
• no compression (or lossless compression)
• no embedded files or embedded programs/scripts
• non-proprietary format

See the following for preferred and accepted file formats for the IR@UF: http://ufdc.ufl.edu/AA00017119/00011
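
The characteristics listed above can be treated as a simple checklist. The following is a minimal Python sketch, not part of the original slides: the `FormatProfile` class and both example assessments are hypothetical illustrations, and the IR@UF list linked above remains the reference for which formats are actually preferred or accepted.

```python
from dataclasses import dataclass


@dataclass
class FormatProfile:
    """One file format, assessed against the six characteristics on the slide."""
    name: str
    open_documentation: bool
    cross_platform_support: bool
    widely_adopted: bool
    uncompressed_or_lossless: bool
    no_embedded_content: bool
    non_proprietary: bool

    def unmet(self) -> list[str]:
        """Return the characteristics this format does not satisfy."""
        checks = {
            "open documentation": self.open_documentation,
            "support across software platforms": self.cross_platform_support,
            "wide adoption": self.widely_adopted,
            "no compression or lossless compression": self.uncompressed_or_lossless,
            "no embedded files or programs/scripts": self.no_embedded_content,
            "non-proprietary format": self.non_proprietary,
        }
        return [label for label, ok in checks.items() if not ok]


# Hypothetical assessments, for illustration only.
plain_csv = FormatProfile("CSV (plain text)", True, True, True, True, True, True)
vendor_binary = FormatProfile(
    "vendor-specific binary (hypothetical)", False, False, False, True, False, False
)

for profile in (plain_csv, vendor_binary):
    gaps = profile.unmet()
    status = "meets all listed characteristics" if not gaps else "missing: " + ", ".join(gaps)
    print(f"{profile.name}: {status}")
```

A format with an empty `unmet()` list is a reasonable candidate for long-term preservation; anything else may need conversion before deposit.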

What exactly is metadata again?

• Descriptive information that helps you and others understand your data

• “Data about data” that acts as a surrogate for your data when you or others are trying to:
  – Find the data later
  – Know what the data is later
  – Share the data later

Metadata across the disciplines

Basic information to keep:

• Descriptive
  – What is it about?
  – Title, time, author, keywords
  – Relations to other data objects
• Administrative
  – Ownership and use permissions
• Provenance
  – Where does it come from?
  – History of changes to the data, versions

More specific information varies by discipline
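
As a rough illustration of the three groups above, the sketch below stores descriptive, administrative, and provenance information in one record and writes it out as JSON, for example to keep in a file next to the dataset. It is a minimal sketch, not a metadata standard: the `DatasetMetadata` class, its field names, and the sample values are all assumptions made for this example, and a discipline-specific standard would normally define the actual fields.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class DatasetMetadata:
    # Descriptive: what is it about?
    title: str
    author: str
    date_collected: str
    keywords: list[str] = field(default_factory=list)
    related_objects: list[str] = field(default_factory=list)
    # Administrative: ownership and use permissions
    owner: str = ""
    use_permissions: str = ""
    # Provenance: where it comes from and how it has changed
    source: str = ""
    change_history: list[str] = field(default_factory=list)

    def record_change(self, note: str) -> None:
        """Append one entry to the history of changes."""
        self.change_history.append(note)

    def to_json(self) -> str:
        """Serialize the record so it can be saved alongside the data."""
        return json.dumps(asdict(self), indent=2)


# Hypothetical sample values, for illustration only.
record = DatasetMetadata(
    title="Example survey responses",
    author="A. Researcher",
    date_collected="2014-07-29",
    keywords=["survey", "example"],
    owner="Example Lab",
    use_permissions="Contact owner before reuse",
    source="Collected with a hypothetical online form",
)
record.record_change("v1: raw export")
record.record_change("v2: removed duplicate rows")
print(record.to_json())
```

Keeping the change history in the same record as the descriptive fields means a later reader can see both what the data is and how it reached its current version.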

Components of a DMP

• Project description
• Data collection
• Legal and ethical issues
• Policies for data sharing and re-use
• Data preservation (long-term)
• Who is responsible for data management

Example legal/ethical questions

• Procedures for managing and for maintaining the confidentiality of the data to be shared (IES)

• Will any permission restrictions need to be placed on the data? (NSF-BIO)

• Policies for public access and sharing should be described, including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements. (NEH)

Components of a DMP

• Project description
• Data collection
• Legal and ethical issues
• Policies for data sharing and re-use
• Data preservation (long-term)
• Who is responsible for data management

Example data sharing questions

• Will you share data via a repository, handle requests directly or use another mechanism? (IES)

• What transformations will be necessary to prepare data for preservation/data sharing? (NIH)

• How long will the original data collector/creator/principal investigator retain the right to use the data before opening it up to wider use? (NEH)

Example data preservation/archiving questions

• If your method of sharing is with an archive, which archive/repository/database have you identified as a place to deposit data? (IES)

• What is the long-term strategy for maintaining, curating and archiving the data? (NSF-BIO)

• The Data Management Plan should describe physical and cyber resources and facilities that will be used for the effective preservation and storage of research data. These can include third party facilities and repositories. (NEH)

Finding a home for your data

• Data storage, both short-term and long-term, can take place in 3 types of places:
  – Locally, within the lab or research environment
  – Within the institution
  – Within a national/discipline-based repository

See the following guide to find discipline-based repositories: http://guides.uflib.ufl.edu/datasets

http://www.hpc.ufl.edu/

Repositories

Advantages of an institutional repository:

• Linked to your institution – intellectual capital of the institution in one place

• You can put all your datasets together

• Some guarantee of support from the university

• Some domain repositories may “go out of business” once their funding ends

Advantages of a domain repository:

• Your data will be stored with similar datasets

• Researchers in your discipline may find your data more easily

• The repository will understand what your data needs in terms of storage, archiving and preservation

• Computational tools may be developed to crunch a critical mass of data of a certain kind

Adapted from: http://libraries.mit.edu/guides/subjects/data-management/Managing%20Research%20Data%20101.pdf

Benefits of sharing data

• Data can be used by other researchers with different objectives

• Accelerate the time of discovery by building upon previous research

• Results can be reproduced more easily and accurately

• Researchers receive the credit they’re due

• Data producers have a new channel by which to promote their work (increase impact of research)

Components of a DMP

• Project description
• Data collection
• Legal and ethical issues
• Policies for data sharing and re-use
• Data preservation (long-term)
• Who is responsible for data management

Example data management responsibility questions

• Roles and responsibilities of project or institutional staff in the management and retention of research data (IES)

• Who will be responsible for data management and for monitoring the data management plan? How will adherence to this data management plan be checked or demonstrated? (NSF-BIO)

• Who will have responsibility over time for decisions about the data once the original personnel are no longer available? (NEH)

A cautionary tale…

From NYU Health Science Center Libraries: http://youtu.be/N2zK3sAtr-4

Questions?

Feel free to contact the Data Management/Curation Task Force: [email protected]

Or me: Hannah Norton, [email protected], 352-273-8412