METRO RDM Webinar
![Page 1: METRO RDM Webinar](https://reader030.vdocument.in/reader030/viewer/2022021503/587c1a101a28abb5068b4f6f/html5/thumbnails/1.jpg)
Managing & Preserving Data Sets
Vicky Steeves | METRO Webinar | 8/3/2015
![Page 2: METRO RDM Webinar](https://reader030.vdocument.in/reader030/viewer/2022021503/587c1a101a28abb5068b4f6f/html5/thumbnails/2.jpg)
Itinerary
❖ Science Data: Definition & Explanation
❖ Current Trends in DigiPres for Science
❖ Benefits to Curating Datasets
❖ Existing Problems
❖ Research Data Management
❖ Upcoming Tools
![Page 3: METRO RDM Webinar](https://reader030.vdocument.in/reader030/viewer/2022021503/587c1a101a28abb5068b4f6f/html5/thumbnails/3.jpg)
What is Data @ the Federal Gov’t?
“the recorded factual material commonly accepted in the scientific community as necessary to validate research findings.”
-Federal Office of Management & Budget Circular A-110
![Page 4: METRO RDM Webinar](https://reader030.vdocument.in/reader030/viewer/2022021503/587c1a101a28abb5068b4f6f/html5/thumbnails/4.jpg)
Why is Science data different?
![Page 5: METRO RDM Webinar](https://reader030.vdocument.in/reader030/viewer/2022021503/587c1a101a28abb5068b4f6f/html5/thumbnails/5.jpg)
Why is Science data different?
OR
![Page 6: METRO RDM Webinar](https://reader030.vdocument.in/reader030/viewer/2022021503/587c1a101a28abb5068b4f6f/html5/thumbnails/6.jpg)
Why is Science data different?
![Page 7: METRO RDM Webinar](https://reader030.vdocument.in/reader030/viewer/2022021503/587c1a101a28abb5068b4f6f/html5/thumbnails/7.jpg)
Why is Science data different?
![Page 8: METRO RDM Webinar](https://reader030.vdocument.in/reader030/viewer/2022021503/587c1a101a28abb5068b4f6f/html5/thumbnails/8.jpg)
Why is Science data different?
![Page 9: METRO RDM Webinar](https://reader030.vdocument.in/reader030/viewer/2022021503/587c1a101a28abb5068b4f6f/html5/thumbnails/9.jpg)
Digital Preservation of Science in the US
❏ North Carolina County Geospatial Data
❏ Caroline Dean Wildflower Collection
❏ FSU Biological Scientist, Dr. A.K.S.K. Prasad Diatomscapes I and II Collections Photographs
❏ FSU Department of Oceanography Technical Reports
![Page 10: METRO RDM Webinar](https://reader030.vdocument.in/reader030/viewer/2022021503/587c1a101a28abb5068b4f6f/html5/thumbnails/10.jpg)
NDSA Levels of Digital Preservation
Level 1 Level 2 Level 3 Level 4
Storage & Geographic Locations
File Fixity & Data Integrity
Information Security
Metadata
File Formats
![Page 11: METRO RDM Webinar](https://reader030.vdocument.in/reader030/viewer/2022021503/587c1a101a28abb5068b4f6f/html5/thumbnails/11.jpg)
USGS Levels of Digital Preservation
Level 1 Level 2 Level 3 Level 4
Storage & Geographic Locations
Data Integrity
Information Security
Metadata
File Formats
Physical Media
![Page 12: METRO RDM Webinar](https://reader030.vdocument.in/reader030/viewer/2022021503/587c1a101a28abb5068b4f6f/html5/thumbnails/12.jpg)
Benefits
● Satisfy your federal grant requirements
○ more likely to receive future funding if data is made immediately accessible & you write and follow a well-structured DMP
● Saves time & effort, making research process more efficient
○ reduces duplicated work & cuts costs of storage
○ more efficient quality control of data produced
○ better version control of data: identify versions that can be periodically purged to save money on storage costs!
● Adding metadata means that both you & others can go back to it and easily understand and use it effectively
![Page 13: METRO RDM Webinar](https://reader030.vdocument.in/reader030/viewer/2022021503/587c1a101a28abb5068b4f6f/html5/thumbnails/13.jpg)
Benefits
● Makes your data scientifically and legally defensible
○ verifiable & authenticatable, replicable & easily duplicated!
● Supports Open Access Movement
○ advocates for researchers to share data to foster development of body of knowledge
○ sustainability of science data in the long-term!
● Reinforce open scientific inquiry
○ could lead to new and unanticipated discoveries!
○ continually improve the quality of data
○ data can be used as valuable teaching instrument to train future scientists
![Page 14: METRO RDM Webinar](https://reader030.vdocument.in/reader030/viewer/2022021503/587c1a101a28abb5068b4f6f/html5/thumbnails/14.jpg)
EXAMPLE: Benefit to Science
![Page 15: METRO RDM Webinar](https://reader030.vdocument.in/reader030/viewer/2022021503/587c1a101a28abb5068b4f6f/html5/thumbnails/15.jpg)
Existing Problems: Management
“My research assistants manage all my data.”
“I think only the division chair knows how much server space we have in total.”
“There are not enough computer terminals in the imaging lab.”
“I have no time to standardize my data management.”
“I have no way to get my data off of the computers in the gene sequencing lab because the files are too large.”
“I don’t know where exactly to get support for my database. I don’t know if it’s IT’s jurisdiction or job.”
![Page 16: METRO RDM Webinar](https://reader030.vdocument.in/reader030/viewer/2022021503/587c1a101a28abb5068b4f6f/html5/thumbnails/16.jpg)
Existing Problem: Scope
![Page 17: METRO RDM Webinar](https://reader030.vdocument.in/reader030/viewer/2022021503/587c1a101a28abb5068b4f6f/html5/thumbnails/17.jpg)
Existing Problems: Media
![Page 18: METRO RDM Webinar](https://reader030.vdocument.in/reader030/viewer/2022021503/587c1a101a28abb5068b4f6f/html5/thumbnails/18.jpg)
Possible Solution
![Page 19: METRO RDM Webinar](https://reader030.vdocument.in/reader030/viewer/2022021503/587c1a101a28abb5068b4f6f/html5/thumbnails/19.jpg)
Research Data Management
a set of practices which affords researchers the ability to more quickly, efficiently, and accurately find, access, and understand their own or others’ research data
*also our way to get science to listen to us! (sorry Scott)
![Page 20: METRO RDM Webinar](https://reader030.vdocument.in/reader030/viewer/2022021503/587c1a101a28abb5068b4f6f/html5/thumbnails/20.jpg)
Data Management Plan
A data management plan (DMP) is a document that describes how you will collect, organise, manage, store, secure, back up, preserve, and share your data.
![Page 21: METRO RDM Webinar](https://reader030.vdocument.in/reader030/viewer/2022021503/587c1a101a28abb5068b4f6f/html5/thumbnails/21.jpg)
Federal Agencies & DMPs
![Page 22: METRO RDM Webinar](https://reader030.vdocument.in/reader030/viewer/2022021503/587c1a101a28abb5068b4f6f/html5/thumbnails/22.jpg)
DMPs Usual Requirements
❏ Format of Data
❏ Research methodology
❏ Roles & Responsibilities for Data
❏ Metadata Standards
❏ Storage & Back Up Procedure
❏ Long-Term Archiving & Preservation Plan
❏ Access Policy
❏ Security Measures
  ❏ Data
  ❏ Humans
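The requirements on this slide can be treated as a checklist for a draft DMP. The following is a minimal sketch, not a funder-approved template: the section names are taken from the slide, while the dictionary layout of the draft and the function name `missing_sections` are assumptions made for illustration.

```python
# Section names come from the slide; the dict-based draft format is hypothetical.
REQUIRED_SECTIONS = [
    "Format of Data",
    "Research Methodology",
    "Roles & Responsibilities for Data",
    "Metadata Standards",
    "Storage & Back Up Procedure",
    "Long-Term Archiving & Preservation Plan",
    "Access Policy",
    "Security Measures",
]


def missing_sections(plan):
    """Return the required sections that are absent or blank in a draft DMP."""
    return [
        name
        for name in REQUIRED_SECTIONS
        if not str(plan.get(name, "")).strip()
    ]


# Example draft with made-up answers; blank entries count as missing.
draft = {
    "Format of Data": "CSV exports plus TIFF images",
    "Metadata Standards": "Darwin Core",
    "Access Policy": "   ",
}

for name in missing_sections(draft):
    print("TODO:", name)
```

Individual funders may ask for more or fewer sections, so the list would need to be adjusted to the agency's own guidance.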
![Page 23: METRO RDM Webinar](https://reader030.vdocument.in/reader030/viewer/2022021503/587c1a101a28abb5068b4f6f/html5/thumbnails/23.jpg)
Existing Tools
![Page 24: METRO RDM Webinar](https://reader030.vdocument.in/reader030/viewer/2022021503/587c1a101a28abb5068b4f6f/html5/thumbnails/24.jpg)
Research Lifecycle
Data management is done at all stages of the research lifecycle.
*each step in the process has its own best practices & standards
![Page 25: METRO RDM Webinar](https://reader030.vdocument.in/reader030/viewer/2022021503/587c1a101a28abb5068b4f6f/html5/thumbnails/25.jpg)
Research Lifecycle: Create
Creating Data
what format will the data be in?
where will we store this data?
how will it be backed up?
how are we going to share this data?
how will we collect this data?
how will we describe this data?
![Page 26: METRO RDM Webinar](https://reader030.vdocument.in/reader030/viewer/2022021503/587c1a101a28abb5068b4f6f/html5/thumbnails/26.jpg)
Research Lifecycle: Process
Processing Data
how will we check, validate, or clean the data?
a. how will we describe that process?
b. how will we describe the data?
will we store the processed data? where? how?
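One way to answer "how will we describe that process?" is to have the cleaning step record what it did to each row. The sketch below is only an illustration under stated assumptions: the field names `specimen` and `mass_g`, the sample rows, and the function `clean_rows` are hypothetical, and rows are assumed to hold string values, as they would when read with `csv.DictReader`.

```python
def clean_rows(rows, required_fields):
    """Trim whitespace and drop incomplete rows, logging every action taken."""
    cleaned = []
    log = []
    for index, row in enumerate(rows):
        # Strip surrounding whitespace from every value.
        trimmed = {key: str(value).strip() for key, value in row.items()}
        # A row is incomplete if any required field is empty after trimming.
        empty = [field for field in required_fields if not trimmed.get(field)]
        if empty:
            log.append(f"row {index}: dropped, missing {', '.join(empty)}")
            continue
        if trimmed != row:
            log.append(f"row {index}: trimmed whitespace")
        cleaned.append(trimmed)
    return cleaned, log


# Made-up sample data for illustration only.
rows = [
    {"specimen": " A-1 ", "mass_g": "12.5"},
    {"specimen": "A-2", "mass_g": ""},
    {"specimen": "A-3", "mass_g": "9.1"},
]

cleaned, log = clean_rows(rows, ["specimen", "mass_g"])
for entry in log:
    print(entry)
```

Saving the log next to the processed file gives later users a record of how the cleaned data differs from the raw data.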
![Page 27: METRO RDM Webinar](https://reader030.vdocument.in/reader030/viewer/2022021503/587c1a101a28abb5068b4f6f/html5/thumbnails/27.jpg)
Research Lifecycle: Analyze
Analyzing Data
how will we interpret data?
what research outputs will be produced?
a. what format will they be in?
b. how can we make it preservation-ready?
where will we store this data?
how will we ready this data for publication?
![Page 28: METRO RDM Webinar](https://reader030.vdocument.in/reader030/viewer/2022021503/587c1a101a28abb5068b4f6f/html5/thumbnails/28.jpg)
Existing Resources
Metadata
Open File Formats
Darwin Core
Library of Congress File Standards Guide
ABCD
Data Documentation Initiative
File Directories & Org
Fixity Checks & Security
Back Ups
Version Control
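A fixity check, as listed above, compares files against checksums recorded earlier to detect silent changes or losses. The following is a minimal sketch using SHA-256 from Python's standard library; the function names `build_manifest` and `check_fixity` and the in-memory manifest format are assumptions for illustration, not a specific tool from the webinar.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large data files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def check_fixity(root, manifest):
    """Compare the files under root against a previously recorded manifest."""
    current = build_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "changed": sorted(
            name
            for name in manifest
            if name in current and current[name] != manifest[name]
        ),
        "new": sorted(set(current) - set(manifest)),
    }
```

In practice the manifest would be written to disk (for example as JSON) when the data is deposited, stored separately from the data itself, and rechecked on a schedule and after every copy or migration.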
![Page 29: METRO RDM Webinar](https://reader030.vdocument.in/reader030/viewer/2022021503/587c1a101a28abb5068b4f6f/html5/thumbnails/29.jpg)
Research Lifecycle: Preserve
Preserving Data
what is the best archival format for our type of data?
what is the best type of archival storage for our data?
What needs to be preserved alongside our data to make it useful to others?
a. metadata and documentation
![Page 30: METRO RDM Webinar](https://reader030.vdocument.in/reader030/viewer/2022021503/587c1a101a28abb5068b4f6f/html5/thumbnails/30.jpg)
Research Lifecycle: Access
Access to Data
Interoperable formatting means many people can use our data
distribute data
share data
promote data
![Page 31: METRO RDM Webinar](https://reader030.vdocument.in/reader030/viewer/2022021503/587c1a101a28abb5068b4f6f/html5/thumbnails/31.jpg)
Existing Resources
![Page 32: METRO RDM Webinar](https://reader030.vdocument.in/reader030/viewer/2022021503/587c1a101a28abb5068b4f6f/html5/thumbnails/32.jpg)
Research Lifecycle: ReUse
Re-using Data
5 years down the road? 10?
a. use open, new archival formats
b. refresh our storage media
c. scrutinize our findings & integrate data into new projects
teach next generation using our datasets
![Page 33: METRO RDM Webinar](https://reader030.vdocument.in/reader030/viewer/2022021503/587c1a101a28abb5068b4f6f/html5/thumbnails/33.jpg)
What’s more...
![Page 34: METRO RDM Webinar](https://reader030.vdocument.in/reader030/viewer/2022021503/587c1a101a28abb5068b4f6f/html5/thumbnails/34.jpg)
Short-Term Solution
![Page 35: METRO RDM Webinar](https://reader030.vdocument.in/reader030/viewer/2022021503/587c1a101a28abb5068b4f6f/html5/thumbnails/35.jpg)
Long-Term Solution
![Page 36: METRO RDM Webinar](https://reader030.vdocument.in/reader030/viewer/2022021503/587c1a101a28abb5068b4f6f/html5/thumbnails/36.jpg)
Upcoming Tools & Strategies
● ReproZip is a general tool for Linux distributions that simplifies the process of creating reproducible experiments from command-line executions.
○ automatically creates a package that contains all the dependencies (e.g., libraries, input data) that are required to run the input command.
● Hydra in a Box is a turnkey, feature-rich, robust, flexible digital repository that is easy to install, configure, and maintain
○ Joint venture between DPLA, Stanford University, and DuraSpace for $2M from IMLS
○ 30-month period to completion