This Photo by Unknown Author is
licensed under CC BY-SA
@chochrphd
Topics Of Discussion
• Overview of Research Data Management
• Data Lifecycle
• Data Management Plan (DMP)
• Websites-Resources
What is research data?
Research Data Management Defined
• Statistical Records
• Video & Audio recordings
• Images
• Measurements
• Software & Code
• Algorithms
• Lab notebooks
• Biospecimens
Research Data Management Defined
Research Data Management
• The organization, storage, preservation, and sharing of data collected and used in a research project.
✓ Everyday management of research data during the lifetime of a project
✓ Decisions about how data will be preserved and shared after the project is completed
Research Data Management Defined
Importance of research data management
• Verify the integrity of your data
• Make your data findable and reusable
• Help others understand your data
• Encourage other researchers to reuse and cite your data
• It is required by some funding agencies
Data Lifecycle
Research Data Management Defined
Plan
Discover
Collect & Organize
Quality
Describe
Store
Share
PLAN
Stage 1 Things to consider:
• Policies
• Type of data
• Versions
• Backup
• Describing and labeling
• Access and Sharing
• Rights and Permissions
• Roles and Responsibilities
• Budget
Stage 1: Plan
Data Management Plan (DMP)
• A document that describes how you will treat your data throughout a project and what happens with the data after the project ends.
• Some funding agencies require a Data Management Plan
Stage 1: Plan
• Data Management Plans address:
1. Data Type
2. Data Format
3. Data Sharing Plan
4. Data Archiving/Preserving Plan
https://old.dataone.org/data-management-planning
https://dmptool.org/
EXAMPLE: National Science Foundation (NSF) DMP
Length
• The data management plan is a supplementary document.
• Plans should be no longer than two pages.
Components
• Types of data produced
• Data and metadata standards
• Data access and sharing
• Data reuse and redistribution
• Archiving and preservation
EXAMPLE: National Science Foundation (NSF) DMP
Types of data produced:
The types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project.
EXAMPLE: National Science Foundation (NSF) DMP
Types of data produced:
Questions To Ask
1. What types of data will be produced for your project?
2. How will the data be created or captured?
3. What software programs will be used to generate your data?
4. How much data will be produced?
5. How big will your digital files be and how many will there be?
6. Will you be using existing data? If so, what is the source of that data?
EXAMPLE: National Science Foundation (NSF) DMP
Data and Metadata Standards
The standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies).
EXAMPLE: National Science Foundation (NSF) DMP
Data and Metadata Standards
Questions to Ask
1. How will you document your data and project?
2. What file formats will you be using in your project and why?
3. How will you organize your files into directories, and what naming conventions will you use?
4. How often will your data change or be updated, and will versions need to be tracked?
5. What types of metadata do you need to collect in order for someone else to fully understand your data?
EXAMPLE: National Science Foundation (NSF) DMP
Data Access, Sharing, Reuse, and Redistribution
Policies for access and sharing, including provisions for the appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements. Policies and provisions for reuse, redistribution, and the production of derivatives.
EXAMPLE: National Science Foundation (NSF) DMP
Data Access, Sharing, Reuse, and Redistribution
Questions to Ask
1. Who is responsible for managing and controlling your data?
2. Who is likely to be interested in your data and what are the foreseeable future uses of the data?
3. When and where do you intend to publish or distribute your data?
4. How will the data be made available?
5. Will there be an embargo period before the data is made available for wider distribution? If so, explain why.
6. Are there issues regarding privacy or restricted, confidential, or sensitive data?
7. How have you addressed any institutional review board (IRB) protocols that may apply to your research?
8. Are there intellectual property issues or agreements with industry or government agencies that affect sharing?
9. If you are using data from other sources, do you have the right to share that data?
EXAMPLE: National Science Foundation (NSF) DMP
Storage and Preservation
Plans for archiving data, samples, and other research products, and for preservation of access to them.
EXAMPLE: National Science Foundation (NSF) DMP
Storage and Preservation
Questions to Ask
1. What is your strategy for data storage and backup?
2. What data will be preserved for the long term?
3. Are extra steps required to prepare the data for preservation?
4. What related information or metadata will be preserved along with the data?
5. Where and how will the data be preserved?
6. What procedures does the archive have in place to ensure preservation and backup?
7. How long will the data be kept after the project is completed?
Stage 2: Discover
Locate existing data
• Data Directories (e.g. re3data, OpenAccessDirectory)
• General Repositories (e.g. figshare)
• Discipline-related repositories (e.g. DRYAD for life sciences)
• Data Journals (e.g. https://www.nature.com/sdata/)
DataCite is an international organization that helps researchers to find, access, and use data.
Collect & Organize
Stage 3 Things to consider:
• Finding and reusing data
• Choosing a file format
• Naming data files
• Data versioning
File name
Stage 3: Collect and Organize
Before starting your project, decide on a naming convention for your files.
1. Meaningful
2. Length
3. Underscores & Hyphens
4. YYYYMMDD
5. Zeros
6. No special characters
7. Versions
• Stanford University Libraries - Data Management Services
• University of Wisconsin Research Data Services
• Purdue University Libraries - Data Management for Graduate Researchers
• Cornell University Research Data Management Service Group
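The seven naming guidelines above can be applied by generating file names from a small helper rather than typing them by hand. The sketch below is only one possible convention: the function name `build_file_name`, the 30-character cap, the three-digit sequence number, and the `_v02` suffix are illustrative assumptions, not rules taken from the slides or from the guides listed.

```python
import re
from datetime import date


def build_file_name(description: str, day: date, sequence: int, version: int, ext: str) -> str:
    """Build a file name such as 'soil_ph_site_a_20160402_003_v02.csv' (hypothetical convention)."""
    # Meaningful but short: keep letters and digits, replace everything else
    # (spaces, punctuation, special characters) with underscores, cap the length.
    slug = re.sub(r"[^A-Za-z0-9]+", "_", description).strip("_").lower()[:30].rstrip("_")
    # YYYYMMDD dates sort chronologically when file names are sorted alphabetically.
    stamp = day.strftime("%Y%m%d")
    # Leading zeros keep 003 ahead of 010 in a directory listing.
    return f"{slug}_{stamp}_{sequence:03d}_v{version:02d}.{ext}"


print(build_file_name("Soil pH (site A)", date(2016, 4, 2), 3, 2, "csv"))
# prints: soil_ph_site_a_20160402_003_v02.csv
```

Whatever convention a project picks, writing it down once as code or in a README makes it easier for every collaborator to follow it the same way.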
Stage 3: Collect and Organize
File Format
• Choose one and stick to it
• Consider the software that will be used to access data
• Repository requirements
• Lost features during conversion
• Stanford University Libraries - Data Management Services
• Cornell University Research Data Management Service Group
• Cambridge University Libraries - Data Management
Stage 3: Collect and Organize
Data Versioning
Saving new copies of your files when you make changes so that you can go back and retrieve specific versions of your files later.
DataFileName_1.0 = original document
DataFileName_1.1 = original document with minor revisions
DataFileName_2.0 = document with substantial revisions
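The major.minor numbering shown above can be expressed as a small rule. This is a minimal sketch assuming labels always have exactly two numeric parts; the function name `next_version` and the `substantial` flag are hypothetical names introduced for illustration.

```python
def next_version(version: str, substantial: bool) -> str:
    """Return the next 'major.minor' label for a data file."""
    major, minor = (int(part) for part in version.split("."))
    if substantial:
        # Substantial revisions start a new major version: 1.1 -> 2.0
        return f"{major + 1}.0"
    # Minor revisions increment the second number: 1.0 -> 1.1
    return f"{major}.{minor + 1}"


print(next_version("1.0", substantial=False))  # prints: 1.1
print(next_version("1.1", substantial=True))   # prints: 2.0
```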
Data Versioning
Style 1: end of the file name.
image1_v1.jpg
image1_v2.jpg
image2_v1.jpg
image2_v2.jpg
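When versions are kept at the end of the file name, a short script can pick out the most recent copy of each file. The sketch below assumes names of the form `<base>_v<number>.<extension>`, as in the examples above; the pattern and the function name `latest_versions` are illustrative choices, and files that do not match the pattern are simply skipped.

```python
import re
from typing import Dict, List

# Matches names such as 'image1_v2.jpg': base name, version number, extension.
VERSIONED = re.compile(r"^(?P<base>.+)_v(?P<version>\d+)\.(?P<ext>\w+)$")


def latest_versions(names: List[str]) -> Dict[str, str]:
    """Map each base name to the file name carrying its highest version number."""
    best_version: Dict[str, int] = {}
    latest: Dict[str, str] = {}
    for name in names:
        match = VERSIONED.match(name)
        if match is None:
            continue  # not a versioned file name under this convention
        base = match["base"]
        version = int(match["version"])  # compare as numbers so v10 beats v9
        if version > best_version.get(base, -1):
            best_version[base] = version
            latest[base] = name
    return latest


files = ["image1_v1.jpg", "image1_v2.jpg", "image2_v1.jpg", "image2_v2.jpg"]
print(latest_versions(files))
# prints: {'image1': 'image1_v2.jpg', 'image2': 'image2_v2.jpg'}
```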
Data Versioning
Style 2: incorporate names or initials of collaborators
dataset1_20160402_KES
dataset1_20160301_WTC
dataset1_20160814_GSC
Stage 4: Data Quality
Quality Assurance vs. Quality Control
• Assurance: Process oriented and focuses on defect prevention
• Control: Product oriented and focuses on defect identification.
Stage 4: Data Quality
Importance of QA/QC plan
• Help others understand how to use data
• Avoid mistakes due to poor data quality
• Track errors and conflicts
Stage 4: Data Quality
QA/QC plans should include
• Methods to deal with erroneous data (Assurance)
• Methods to identify erroneous data (Control)
• Methods to mark erroneous data (Control)
Stage 4: Data Quality
Methods
• Consistent techniques, processes, and environments
• Mechanisms to compare data sets
• Scripts or macros
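As one example of a quality control script, the sketch below identifies values outside an expected range and marks them with a flag instead of deleting or changing them. The field name `ph`, the 0 to 14 range, and the flag labels are assumptions chosen for illustration; a real QA/QC plan would define its own checks and codes.

```python
from typing import Dict, List, Optional


def flag_out_of_range(
    rows: List[Dict[str, Optional[float]]], field: str, low: float, high: float
) -> List[Dict]:
    """Return copies of the rows with an added 'qc_flag' entry; originals are untouched."""
    flagged = []
    for row in rows:
        value = row.get(field)
        if value is None:
            flag = "missing"
        elif low <= value <= high:
            flag = "ok"
        else:
            flag = "out_of_range"
        # Mark the record rather than dropping it, so the error stays traceable.
        flagged.append({**row, "qc_flag": flag})
    return flagged


readings = [{"ph": 6.8}, {"ph": 15.2}, {"ph": None}]  # hypothetical measurements
for record in flag_out_of_range(readings, "ph", 0.0, 14.0):
    print(record)
```

Running the same script on every new batch of data is one way to keep the checking technique consistent across the project.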
Components of data description
Stage 5: Data Description
• Describe scientific context
• Include critical information
• Identifiers within datasets
• Create a data dictionary
Metadata
Stage 5: Data Description
• “Data about data” (context)
• Description of your research data
Stage 5: Data Description
What does metadata do?
• Makes your data easier to find.
• Increases understanding and reusability of data.
• Makes your data and associated research verifiable.
Stage 5: Data Description
What to include in metadata
• General Information
• Data and File Overview
• Methodological Information
• Data-specific information
who created the data
what the data file contains
when the data were generated
where the data were generated
why the data were generated
how the data were generated.
Stage 5: Data Description
Where can metadata be collected?
• Lab notebooks
• Plain text README files
• Within data file
• Web forms
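A plain text README can be generated from the who, what, when, where, why, and how questions so that none of them is forgotten. The sketch below is an assumed minimal layout, not a standard README template; the function name `render_readme` and the sample answers are hypothetical.

```python
QUESTIONS = ["who", "what", "when", "where", "why", "how"]


def render_readme(answers: dict) -> str:
    """Render the six metadata questions as plain text, refusing incomplete input."""
    missing = [key for key in QUESTIONS if not answers.get(key)]
    if missing:
        raise ValueError("README is missing: " + ", ".join(missing))
    lines = [f"{key.upper()}: {answers[key]}" for key in QUESTIONS]
    return "\n".join(lines) + "\n"


# Hypothetical answers for illustration only.
example = {
    "who": "Name and contact of the person who created the data",
    "what": "Contents of the data file",
    "when": "Dates the data were generated",
    "where": "Location where the data were generated",
    "why": "Purpose of the data collection",
    "how": "Instruments and methods used",
}
print(render_readme(example))
```

The resulting text can be saved as README.txt next to the data files it describes.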
Data Dictionary
Stage 5: Data Description
• Describes all the data stored in a data set or used by a database
• Describes the data, does not contain the data
Components of data dictionary
Stage 5: Data Description
• List of all files
• Type of data included
• List of field and variable names
• Description of information contained in each field
Examples:
• Ag Data Commons
• National Renewable Energy Laboratory (NREL)
• Protein Data Bank Exchange Data Dictionary (PDBx/mmCIF V4.0)
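A first draft of a data dictionary can be produced from the data file itself, listing each field name and a rough guess at its type, with the description left for the researcher to fill in. The sketch below assumes a CSV file with a header row; the type inference is deliberately crude, and the names `infer_type` and `draft_data_dictionary` are hypothetical.

```python
import csv
import io
from typing import Dict, List


def infer_type(values: List[str]) -> str:
    """Guess a column type from its text values: integer, decimal, text, or empty."""
    non_empty = [value for value in values if value != ""]
    if not non_empty:
        return "empty"
    for label, cast in (("integer", int), ("decimal", float)):
        try:
            for value in non_empty:
                cast(value)
            return label
        except ValueError:
            continue
    return "text"


def draft_data_dictionary(csv_text: str) -> List[Dict[str, str]]:
    """List every field with its guessed type and a blank description to complete by hand."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames or []:
        column = [row[name] or "" for row in rows]
        entries.append({"field": name, "type": infer_type(column), "description": ""})
    return entries


sample = "site,depth_cm,ph\nA,10,6.8\nB,20,7.1\n"  # hypothetical data
for entry in draft_data_dictionary(sample):
    print(entry)
```

The output describes the data without containing it, which matches the role of a data dictionary; units, allowed values, and field descriptions still have to be written by someone who knows the project.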
Data Storage
Stage 6 Things to consider:
• Size of dataset
• Computational requirements
• Backup
• Security
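For the backup consideration, one common check is to compare checksums of the original file and its backup copy so that silent corruption is noticed. The sketch below uses SHA-256 from the Python standard library; the function names `sha256_of` and `backup_matches` are illustrative, and the slides do not prescribe a particular checksum.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Chunked reading keeps memory use low for large datasets.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original: Path, backup: Path) -> bool:
    """Return True when the backup is byte-for-byte identical to the original."""
    return sha256_of(original) == sha256_of(backup)
```

Storing the checksum alongside the data also lets the file be verified later, after it has been moved to an archive or repository.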
Stage 7: Data Sharing
Benefits of Data Sharing
• Promote new discoveries
• Enhance Impact
• Support Validation
• Encourage Collaboration
• Increase Public investment
• Reduce redundancy
Stage 7: Data Sharing
Locations
• Disciplinary repository
• Data journal
• Supplementary File
• Web-based tools
Stage 7: Data Sharing
Preparation for sharing
• Use consistent and meaningful file names
• Use self-explanatory variable names and abbreviations
• Remove redundant variables and labels
• Apply anonymization as needed
• Check copyright and privacy permissions
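One step that is often part of anonymization is replacing direct identifiers, such as participant names, with stable codes. The sketch below does this with a keyed hash (HMAC-SHA-256) from the Python standard library. It is pseudonymization only and is an assumption about how a project might proceed: the function name `pseudonymize`, the `P` prefix, and the 12-character code length are hypothetical, and indirect identifiers such as dates or locations may still need separate treatment.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Replace an identifier with a stable code that cannot be recomputed without the key."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    # A short prefix of the digest is easier to read; longer codes lower the collision risk.
    return "P" + digest[:12]


key = b"keep-this-key-out-of-the-shared-dataset"  # hypothetical key, store it securely
code = pseudonymize("Participant Name", key)
print(code == pseudonymize("Participant Name", key))  # prints: True
```

The same participant always receives the same code, so records can still be linked across files after the names are removed.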
Summary
• Data management is the organization, storage, preservation, and sharing of data collected and used in a research project.
• Data management is critical in every stage of the data lifecycle
• Things to always remember:
• RECORD and TRACK
• NAME FILES
• STORE and BACKUP
• GUIDELINES and REQUIREMENTS
Free Data Management software
Service Description
Adobe Bridge
Adobe Bridge is free software for locally organizing images.
Figshare
Figshare is a multidisciplinary repository where users can make all of their research outputs available in a citable, shareable and discoverable manner. Figshare allows users to upload any file format to be made visualisable in the browser so that figures, datasets, media, papers, posters, presentations and filesets can be disseminated. Figshare uses DataCite DOIs for persistent data citation.
Open Science Framework
The Open Science Framework (OSF) is a free, open source web application that connects and supports the research workflow, enabling scientists to increase the efficiency and effectiveness of their research. Researchers use the OSF to collaborate, document, archive, share, and register research projects, materials, and data.
XSEDE Bridges computing and storage
Bridges is an XSEDE national infrastructure facility hosted at the Pittsburgh Supercomputing Center (PA). Campus XSEDE champion is Aaron Culich (as of 2016). XSEDE offers free computing and storage to qualified researchers through a competitive application process.
XSEDE Storage Services
XSEDE is a set of national facilities that scientists can use to interactively share computing resources, data and expertise. People around the world use these resources and services — things like supercomputers, collections of data and new tools — to improve our planet. XSEDE resources include several services for storing research data.
References
https://guides.library.yale.edu/rdm_healthsci/home
https://pitt.libguides.com/managedata/understanding#s-lg-box-4890536
https://data.research.cornell.edu/content/readme
https://ukdataservice.ac.uk/deposit-data/preparing-data.aspx
https://dmptool.org/
https://www.dataone.org/
https://datadryad.org/stash
Qualitative Data Management
• Create a data dictionary that contains:
• Dates
• Locations
• Individual or group characteristics
• Interview characteristics
• Other defining features
• Ensure fidelity of analyzed data
• Ethics requirements
• Version control
Mack N, Woodsong C, MacQueen KM, Guest G, Namey E. Qualitative Research Methods: A Data Collector's Field Guide.