![Page 1: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/1.jpg)
Practical Data Management
ACRL DCIG Webinar, 30 April 2014
Kristin Briney, PhD
![Page 2: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/2.jpg)
andrius.v, https://www.flickr.com/photos/banditaz/6823875954 (CC BY-NC-SA)
![Page 3: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/3.jpg)
Mr.TinDC, https://www.flickr.com/photos/mr_t_in_dc/5940438148 (CC BY-ND)
![Page 4: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/4.jpg)
International Institute of Tropical Agriculture, https://www.flickr.com/photos/iita-media-library/8160877379 (CC BY-NC)
Musgo Dumio_Momio, https://www.flickr.com/photos/30976576@N07/2903662286 (CC BY-NC-SA)
![Page 5: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/5.jpg)
Jen Doty and Rob O'Reilly, “Learning to Curate @ Emory”. RDAP 2014
![Page 6: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/6.jpg)
Data Management Basics
• Introduction to a few topics in data management
  – File organization and naming
  – Documentation
  – Storage and backups
  – Future file usability
![Page 7: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/7.jpg)
Teach & Use
![Page 8: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/8.jpg)
For each minute of planning at the beginning of a project, you will save 10 minutes of headache later
![Page 9: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/9.jpg)
FILE ORGANIZATION & NAMING
Dan Zen, http://www.flickr.com/photos/danzen/5551831155/ (CC BY)
![Page 10: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/10.jpg)
File Organization
• What?
  – Keeping your files in order
![Page 11: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/11.jpg)
File Organization
• Why?
  – Easier to find and use data
  – Tell, at a glance, what is done and what you have yet to do
  – Can still find and use files in the future
![Page 12: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/12.jpg)
File Organization
• When?
  – Always!
  – Get in the habit of putting files in the right place
![Page 13: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/13.jpg)
File Organization
• How?
  – Any system is better than none
  – Make your system logical for your data
    • 80/20 Rule
  – Possibilities
    • By project
    • By analysis type
    • By date
    • …
![Page 14: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/14.jpg)
Example
• Thesis
  – By chapter
    • By file type (draft, figure, table, etc.)
• Data
  – By researcher
    • By analysis type
      – By date
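A layout like the one in this example can be created in one pass before any data are collected. The following is a minimal sketch in Python, not part of the original slides; the folder names (`chapter_01`, `researcher_A`, `UVVis`, and so on) are hypothetical placeholders to replace with your own.

```python
from pathlib import Path

# Hypothetical layout modeled on the example: thesis by chapter, then file type;
# data by researcher, then analysis type. All names are placeholders.
LAYOUT = {
    "thesis": {
        "chapter_01": ["drafts", "figures", "tables"],
        "chapter_02": ["drafts", "figures", "tables"],
    },
    "data": {
        "researcher_A": ["UVVis", "IR"],
    },
}


def create_layout(root, layout=LAYOUT):
    """Create every folder in the layout under root and return the paths made."""
    created = []
    for top, groups in layout.items():
        for group, leaves in groups.items():
            for leaf in leaves:
                folder = Path(root) / top / group / leaf
                # exist_ok=True makes the script safe to run more than once.
                folder.mkdir(parents=True, exist_ok=True)
                created.append(folder)
    return created
```

Calling `create_layout("my_project")` builds the whole tree at once, so every file has a "right place" from the first day.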
![Page 15: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/15.jpg)
File Naming Conventions
• What?
  – Consistent naming for files
![Page 16: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/16.jpg)
http://retractionwatch.com/2014/01/07/doing-the-right-thing-authors-retract-brain-paper-with-systematic-human-error-in-coding/
![Page 17: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/17.jpg)
File Naming Conventions
• Why?
  – Make it easier to find files
  – Avoid duplicates
  – Make it easier to wrap up a project because you know which files belong to it
![Page 18: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/18.jpg)
File Naming Conventions
• When?
  – For a group of related files (3 to 1000+)
  – May need different conventions for different groups
![Page 19: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/19.jpg)
File Naming Conventions
• How?
  – Pick what is most important for your name
    • Date
    • Site
    • Analysis
    • Sample
    • Short description
![Page 20: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/20.jpg)
File Naming Conventions
• How?
  – Files should be named consistently
  – File names should be descriptive but short (<25 characters)
  – Use underscores instead of spaces
  – Avoid these characters: " / \ : * ? ' < > [ ] & $
  – Use the dating convention YYYY-MM-DD or YYYYMMDD
![Page 21: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/21.jpg)
Example
• YYYYMMDD_site_sampleNum
  – 20140422_PikeLake_03
  – 20140424_EastLake_12
• Analysis-sample-concentration
  – UVVis-stilbene-10mM
  – IR-benzene-pure
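A convention is easier to keep when names are generated and checked by a script rather than typed by hand. The sketch below assumes Python and the first convention above (YYYYMMDD_site_sampleNum); `make_name` and `problems` are hypothetical helper names, and the two-digit sample number is an assumption taken from the example names.

```python
import re
from datetime import date

# Characters the slides say to avoid, plus the space (use underscores instead).
FORBIDDEN = set('"/\\:*?\'<>[]&$ ')
MAX_LENGTH = 25  # "descriptive but short (<25 characters)"


def make_name(day, site, sample_num):
    """Build a YYYYMMDD_site_sampleNum name, e.g. 20140422_PikeLake_03."""
    return f"{day:%Y%m%d}_{site}_{sample_num:02d}"


def problems(name):
    """Return a list of ways a file name breaks the conventions."""
    found = []
    if len(name) >= MAX_LENGTH:
        found.append("too long")
    bad = sorted(FORBIDDEN & set(name))
    if bad:
        found.append("forbidden characters: " + " ".join(repr(c) for c in bad))
    if not re.match(r"\d{8}_", name):
        found.append("does not start with YYYYMMDD_")
    return found
```

For instance, `make_name(date(2014, 4, 22), "PikeLake", 3)` reproduces the first example name, and `problems` returns an empty list for it.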
![Page 22: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/22.jpg)
DOCUMENTATION
Brady, https://www.flickr.com/photos/freddyfromutah/4424199420 (CC BY)
![Page 23: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/23.jpg)
What would someone unfamiliar with your data need in order to find, evaluate, understand, and reuse them?
![Page 24: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/24.jpg)
Documentation
• Why?
  – Data without notes are unusable
  – Because you won’t remember everything
  – For others who may need to use your files
![Page 25: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/25.jpg)
Documentation
• When?
  – Always
  – Documentation needs will vary between files
![Page 26: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/26.jpg)
Documentation
• How?
  – Take good notes
  – Metadata schemas
    • http://www.dcc.ac.uk/resources/metadata-standards
![Page 27: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/27.jpg)
Documentation
• How?
  – Methods
    • Protocols
    • Code
    • Survey
    • Codebook
    • Data dictionary
    • Anything that lets someone reproduce your results
![Page 28: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/28.jpg)
Documentation
• How?
  – Templates
    • Like structured metadata but easier
    • Decide on a list of information before you collect data
      – Make sure you record all necessary details
      – Takes a few minutes upfront, easy to use later
    • Print and post in prominent place or use as worksheet
![Page 29: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/29.jpg)
Example
• I need to collect:
  – Date
  – Experiment
  – Scan number
  – Powers
  – Wavelengths
  – Concentration (or sample weight)
  – Calibration factors, like timing and beam size
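The same template can also live next to digital data as a small checklist that flags anything left unrecorded. This is a minimal sketch assuming Python; the field list comes from the example above, while the key spellings and the function names are hypothetical.

```python
# Fields taken from the example list above; the key spellings are hypothetical.
REQUIRED_FIELDS = [
    "date",
    "experiment",
    "scan_number",
    "powers",
    "wavelengths",
    "concentration_or_sample_weight",
    "calibration_factors",
]


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or left blank in a record."""
    return [name for name in required if record.get(name) in (None, "")]


def blank_template(required=REQUIRED_FIELDS):
    """Return a printable worksheet with one empty line per field."""
    return "\n".join(f"{name}: ______" for name in required)
```

`print(blank_template())` gives a worksheet to post by the instrument, and `missing_fields(record)` can be run on each entry before the session ends.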
![Page 30: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/30.jpg)
Documentation
• How?
  – README.txt
    • For digital information, address the questions
      – “What the heck am I looking at?”
      – “Where do I find X?”
    • Use for project description in main folder
    • Use to document conventions
    • Use wherever you need extra clarity
![Page 31: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/31.jpg)
Example
• Project-wide README.txt
  – Basic project information
    • Title
    • Contributors
    • Grant info
    • etc.
  – Contact information for at least one person
  – All locations where data live, including backups
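A project-wide README with these fields can be generated from a few values so that no item is forgotten. The sketch below assumes Python; `write_readme` and its parameters are hypothetical names, and the exact layout of the file is only one possibility.

```python
from pathlib import Path


def write_readme(folder, title, contributors, grant, contact, data_locations):
    """Write a project-wide README.txt containing the fields listed above."""
    lines = [
        f"Title: {title}",
        "Contributors: " + ", ".join(contributors),
        f"Grant info: {grant}",
        f"Contact: {contact}",
        "",
        "Data locations (including backups):",
    ]
    lines += [f"  - {location}" for location in data_locations]
    path = Path(folder) / "README.txt"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path
```

Plain text is used deliberately: a README.txt opens on any system without special software.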
![Page 32: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/32.jpg)
Example
“Talk_v1: rough outline of talk
Talk_v2: draft of talk
Talk_v3: updated 2014-01-15 after feedback”

“‘Data’ folder contains all raw data files by date
‘Analysis’ has analyzed data and plots
‘Paper’ has drafts of article on this work”
![Page 33: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/33.jpg)
grover_net, http://www.flickr.com/photos/9246159@N06/599820538/ (CC BY-ND)
STORAGE AND BACKUPS
![Page 34: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/34.jpg)
Storage
• Why?
  – Need good storage practices to prevent loss
  – Keep data secure
![Page 35: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/35.jpg)
Storage
• How?
  – Library motto: Lots of Copies Keeps Stuff Safe!
  – Rule of 3: 2 onsite, 1 offsite
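The Rule of 3 is simple enough to check mechanically against a written storage plan. This is a minimal sketch assuming Python; the function name and the plan entries are hypothetical, and whether a given location counts as offsite is a judgment you make for your own setup.

```python
def satisfies_rule_of_three(copies):
    """Check a storage plan against "2 onsite, 1 offsite".

    copies is a list of (description, is_offsite) pairs.
    """
    onsite = sum(1 for _, is_offsite in copies if not is_offsite)
    offsite = sum(1 for _, is_offsite in copies if is_offsite)
    return onsite >= 2 and offsite >= 1


# Hypothetical plan: labels and onsite/offsite flags are illustrative only.
plan = [
    ("laptop", False),
    ("departmental shared drive", False),
    ("cloud storage", True),
]
print(satisfies_rule_of_three(plan))
```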
![Page 36: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/36.jpg)
Storage
• How?
  – Computer
  – External hard drive
  – Shared drives/servers
  – Tape backup
  – Cloud storage*
  – CDs/DVDs
  – USB flash drive
Erica Wheelan, https://www.flickr.com/photos/reinventedwheel/5985479866 (CC BY)
![Page 37: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/37.jpg)
*Cloud Storage
• Read the Terms of Service!
• E.g., Google Drive
  – “When you upload or otherwise submit content to our Services, you give Google (and those we work with) a worldwide license to use, host, store, reproduce, modify, create derivative works (such as those resulting from translations, adaptations or other changes we make so that your content works better with our Services), communicate, publish, publicly perform, publicly display and distribute such content. The rights you grant in this license are for the limited purpose of operating, promoting, and improving our Services, and to develop new ones”
![Page 38: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/38.jpg)
Backups
http://toystory.disney.com/
![Page 39: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/39.jpg)
Backups
• How?
  – Any backup is better than none
  – Automatic backup is better than manual
  – Your work is only as safe as your backup plan
![Page 40: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/40.jpg)
Backups
• How?
  – Check your backups
    • Backups are only as good as your ability to recover data
    • Test your backups periodically
      – Preferably a fixed schedule
      – 1 or 2 times a year may be enough
      – Bigger/more complex backups should be checked more often
    • Test your backup whenever you change things
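One part of testing a backup is confirming that every file in the working folder exists in the backup with identical contents. The sketch below assumes Python and a backup that mirrors the source folder structure; `verify_backup` is a hypothetical helper, and it covers file integrity only, not a full restore test.

```python
import hashlib
from pathlib import Path


def sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source_dir, backup_dir):
    """Return relative paths that are missing from the backup or differ from the source."""
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    failures = []
    for original in sorted(source_dir.rglob("*")):
        if not original.is_file():
            continue
        relative = original.relative_to(source_dir)
        copy = backup_dir / relative
        # A file fails if it is absent or its checksum does not match.
        if not copy.is_file() or sha256(copy) != sha256(original):
            failures.append(str(relative))
    return failures
```

An empty list means every source file has a matching copy; anything listed should be investigated before you rely on that backup.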
![Page 41: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/41.jpg)
Example
• I keep my data
  – On my computer
  – Backed up manually on shared drive
    • I set a weekly reminder to do this
  – Backed up automatically via SpiderOak cloud storage
![Page 42: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/42.jpg)
FUTURE FILE USABILITY
Ian, http://www.flickr.com/photos/ian-s/2152798588/ (CC BY-NC-ND)
![Page 43: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/43.jpg)
Future File Usability
• What?
  – Can you read your files from 10 years ago?
  – Data need to be
    • Accessible
    • Interpretable
    • Readable
![Page 44: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/44.jpg)
lukasbenc, https://www.flickr.com/photos/lukasbenc/3493808772 (CC BY-NC-SA)
![Page 45: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/45.jpg)
Future File Usability
• Why?
  – You may want to use the data in 5 years
  – PI sometimes keeps data and notes
  – Prep for data sharing
  – Per OMB Circular A-110, must retain data at least 3 years post-project
    • Better to retain for >6 years
![Page 46: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/46.jpg)
Future File Usability
• When?
  – When you wrap up a project
  – (As you work on a project)
![Page 47: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/47.jpg)
Future File Usability
• How?
  – Back up written notes
    • People always forget this one
    • Difficult to interpret data without notes
    • Options
      – Digitally scan (recommended with digital data)
      – Photocopies
![Page 48: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/48.jpg)
Future File Usability
• How?
  – Convert file formats
    • Can you open digital files from 10 years ago?
    • Use open, non-proprietary formats that are in wide use
      – .docx → .txt
      – .xlsx → .csv
      – .jpg → .tif
    • Save a copy in the old format, just in case
    • Preserve software if no open file format
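Before converting anything, it helps to list which files still lack a copy in a more durable format. This is a minimal sketch assuming Python; it reads the three extension pairs on the slide as "convert the first to the second", which is an interpretation, and `conversion_candidates` is a hypothetical helper. It only reports files; the conversion itself is done in the software that created them.

```python
from pathlib import Path

# Extension pairs from the slide, read here as: format in use -> format to convert to.
OPEN_ALTERNATIVE = {".docx": ".txt", ".xlsx": ".csv", ".jpg": ".tif"}


def conversion_candidates(folder):
    """List (file name, suggested extension) for files with no converted copy alongside."""
    candidates = []
    for path in sorted(Path(folder).rglob("*")):
        target = OPEN_ALTERNATIVE.get(path.suffix.lower())
        # Skip files that already have a same-named copy in the target format.
        if path.is_file() and target and not path.with_suffix(target).exists():
            candidates.append((path.name, target))
    return candidates
```

Because the check looks for a same-named file with the target extension, keeping the old-format copy next to the converted one (as the slide recommends) removes it from the list.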
![Page 49: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/49.jpg)
Future File Usability
• How?
  – Move to new media
    • Hardware dies and becomes obsolete
      – Floppy disks!
    • Expect average lifetime to be 3-5 years
    • Keep up with technology
![Page 50: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/50.jpg)
WHERE TO GO FROM HERE
![Page 51: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/51.jpg)
Center for Teaching Vanderbilt University, https://www.flickr.com/photos/vandycft/8244800868 (CC BY-NC)
![Page 52: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/52.jpg)
easylocum, https://www.flickr.com/photos/easylocum/2921542814 (CC BY)
![Page 53: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/53.jpg)
Chris Hoving, https://www.flickr.com/photos/pcrucifer/2433274595 (CC BY-ND)
![Page 54: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/54.jpg)
Resources
• Data Ab Initio blog
  – http://dataabinitio.com/
• eScience Portal
  – http://esciencelibrary.umassmed.edu/
• DataONE Best Practices
  – http://www.dataone.org/best-practices
![Page 55: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/55.jpg)
Steal My Slides
• Slides + recording available
  – http://connect.ala.org/node/220603
• Slides available
  – http://www.slideshare.net/kbriney
![Page 56: Practical Data Management](https://reader035.vdocument.in/reader035/viewer/2022081513/56816925550346895de05dee/html5/thumbnails/56.jpg)
Thank You!
• This presentation is available under a Creative Commons Attribution (CC-BY) license
• Some content courtesy of Dorothea Salo
  – http://www.graduateschool.uwm.edu/research/researcher-central/proposal-development/data-plan/boot-camp/ (CC BY)