Data Literacy: Creating and Managing Research Data

Data Literacy: Creating and Managing Research Data. Research Resources Forum 2014. Cunera Buys, Kelsey Rydland, Claire Stewart

Upload: cunera

Post on 18-Aug-2015

TRANSCRIPT

  1. 1. Data Literacy: Creating and Managing Research Data. Research Resources Forum 2014. Cunera Buys, Kelsey Rydland, Claire Stewart
  2. 2. Data nightmares Tweeted in 2012 by Gail Steinhart, Head of Research Services, Mann Library, Cornell University
  3. 3. Data nightmares
  4. 4. Science Staff. 2011. Challenges and Opportunities. Science 331 (6018) (February 11): 692-693. doi:10.1126/science.331.6018.692.
  5. 5. What are data?
  6. 6. Data. Digital Curation Centre (UK): "Data, any information in binary digital form, is at the centre of the Curation Lifecycle." Office of Management and Budget: "Research data means the recorded factual material commonly accepted in the scientific community as necessary to validate research findings."
  7. 7. Data in the sciences, humanities. BICEP2 (South Pole telescope): BICEP2 Collaboration, 2014. Performativity, Place, Space: Burgess and Hamming, 2011.
  8. 8. Every discipline has data! Spreadsheets; scanned books and images; instrument data. Managing data well from the start of the project is critical: make a plan.
  9. 9. What is data? Types of data include: observational data; laboratory experimental data; computer simulation; textual analysis; physical artifacts or relics. Examples of data include: audio and video files; code or scripts; digital text; lab notebooks; geospatial images; photographs; rock samples; survey results; scanned documents; spreadsheets; video games.
  10. 10. Data management is important because...
  11. 11. FUNDING AGENCIES
  12. 12. Why do funders and the broader science community want to share and preserve data?
  13. 13. Prevent Data Loss
  14. 14. Scientific Reproducibility
  15. 15. Recognition. "Chapter II.C.2.f(i)(c), Biographical Sketch(es), has been revised to rename the Publications section to Products and amend terminology and instructions accordingly. This change makes clear that products may include, but are not limited to, publications, data sets, software, patents, and copyrights."
  16. 16. Journal Requirements. "7. Sharing of Data, Materials, and Software: Publication is conditional upon the agreement of the authors to make freely available any materials and information described in their publication that may be reasonably requested by others." "Data Availability: PLOS journals require authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception. When submitting a manuscript online, authors must provide a Data Availability Statement describing compliance with PLOS's policy. If the article is accepted for publication, the data availability statement will be published as part of the final article. Refusal to share data and related metadata and methods in accordance with this policy will be grounds for rejection. PLOS journal editors encourage researchers to contact them if they encounter difficulties in obtaining data from articles published in PLOS journals. If restrictions on access to data come to light after publication, we reserve the right to post a correction, to contact the authors' institutions and funders, or in extreme cases to retract the publication."
  17. 17. Deposit on publication of article. Some journal publishers require or recommend that supporting data for articles be made publicly available. The Joint Data Archiving Policy (JDAP) requires data sharing in a public archive as a condition of publication. Journals that have adopted JDAP include: Science, Nature and Genetics. The author is usually responsible for making data available in a repository or archive. Check the data archiving policies of journals before submitting articles.
  18. 18. Why share data? Why make it open? Clearly documents and provides evidence for research in conjunction with published results. Meets copyright and ethical compliance requirements (e.g. HIPAA). Increases the impact of research through data citation. Preserves data for long-term access and prevents loss of data. Describes and shares data with others to further new discoveries and research. Prevents duplication of research. Accelerates the pace of research. Promotes reproducibility of research.
  19. 19. Start with a plan
  20. 20. Data management planning: Common Data Lifecycle Stages. From: Fary, Michael and Owen, Kim, "Developing an Institutional Research Data Management Plan Service," Educause ACTI white paper, January 2013, http://net.educause.edu/ir/library/pdf/ACTI1301.pdf
  21. 21. Points to address in your DMP: types of data to be produced; standards or descriptions that would be used with the data (metadata); how these data will be accessed and shared; policies and provisions for data sharing and reuse; provisions for archiving and preservation. Image: flickr.com/photos/inl/5097547405
  22. 22. Thoughts on naming stuff: file naming; versioning; directory structures; metadata.
  23. 23. Why should you care? Find your files more easily. Creates uniformity. Allows for sorting. Understand what is under the hood. Allows for versioning. Source: http://library.stanford.edu/spc/university-archives/managing-university-records/file-naming-guidelines
  24. 24. File naming, Part I. Create names that allow for useful sorting (YES: 20130909_RogersParkAnalysis; NO: Kelseys Rogers Park Files). Keep names short and easy to read (YES: 2014_RogersParkStudy; NO: Rogers Park Demographic Analysis of..). Use camel case (YES: 2014_RogersParkAnalysis; NO: 2014 Rogers Park Analysis). Source: http://library.stanford.edu/spc/university-archives/managing-university-records/file-naming-guidelines
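The naming rules on this slide can be sketched in a few lines of Python. This is only an illustration: the helper names `camel_case` and `sortable_name` are hypothetical, and the YYYYMMDD prefix follows the slide's "YES" example.

```python
import re
from datetime import date


def camel_case(words: str) -> str:
    """Join words into CamelCase, dropping spaces and symbols."""
    parts = re.findall(r"[A-Za-z0-9]+", words)
    return "".join(p[0].upper() + p[1:] for p in parts)


def sortable_name(day: date, description: str) -> str:
    """Prefix a CamelCase description with YYYYMMDD so names sort by date."""
    return f"{day:%Y%m%d}_{camel_case(description)}"


print(sortable_name(date(2013, 9, 9), "Rogers Park analysis"))
# prints 20130909_RogersParkAnalysis
```

Because the date comes first and is zero-padded, an ordinary alphabetical sort of the file names is also a chronological sort.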
  25. 25. File naming, Part II. Avoid spaces, symbols, and abbreviations; it is OK to use underscores (_) and hyphens (-). DATES! Use them! They enhance sorting. Should be: YEAR_MONTH_DAY (19791203 or 1979_12_03). Use the file name as version control (e.g. KelseyPartyPolicy_rev2013_02_20.docx). Source: http://library.stanford.edu/spc/university-archives/managing-university-records/file-naming-guidelines
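A rough way to check existing file names against these rules is a small validator. The sketch below is an assumption-laden illustration: the function `check_name` and both regular expressions are hypothetical, and they encode only the rules stated on this slide (allowed characters and a YEAR_MONTH_DAY date).

```python
import re
from datetime import date

# Assumed rule: letters, digits, underscores, hyphens, then a single extension.
SAFE_NAME = re.compile(r"^[A-Za-z0-9_-]+\.[A-Za-z0-9]+$")
# Assumed date forms: 19791203 or 1979_12_03, not embedded in a longer number.
DATE = re.compile(r"(?<!\d)(\d{4})_?(\d{2})_?(\d{2})(?!\d)")


def check_name(filename: str) -> list[str]:
    """Return a list of problems found in a file name (empty if none)."""
    problems = []
    if not SAFE_NAME.match(filename):
        problems.append("contains spaces or symbols")
    match = DATE.search(filename)
    if match is None:
        problems.append("no YEAR_MONTH_DAY date")
    else:
        year, month, day = (int(part) for part in match.groups())
        try:
            date(year, month, day)  # rejects impossible dates such as month 13
        except ValueError:
            problems.append("date is not a valid calendar date")
    return problems


print(check_name("KelseyPartyPolicy_rev2013_02_20.docx"))
print(check_name("2014 Rogers Park Analysis.docx"))
```

The first call reports no problems; the second is flagged for both its spaces and its missing full date.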
  26. 26. Some thoughts on directories. Folders should be major functions/activities. Subfolders by year. Make folder names explanatory. Avoid personal names. Avoid duplication. Simple and simplistic. Source: http://bentley.umich.edu/dchome/resources/filenaming.php
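One possible layout that follows this advice, with a folder per activity and a subfolder per year, can be created with `pathlib`. The function name `make_project_tree` and the activity names are made up for illustration, and the demo writes only into a temporary directory.

```python
import tempfile
from pathlib import Path


def make_project_tree(root: Path, activities: list[str], years: list[int]) -> list[Path]:
    """Create root/<activity>/<year> folders and return the paths created."""
    created = []
    for activity in activities:
        for year in years:
            folder = root / activity / str(year)
            folder.mkdir(parents=True, exist_ok=True)  # safe to re-run
            created.append(folder)
    return created


# Demo in a throwaway directory; the activity names are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    for folder in make_project_tree(Path(tmp), ["DataCollection", "Analysis"], [2013, 2014]):
        print(folder.relative_to(tmp))
```

Keeping the year one level below the activity means each activity folder stays short and the same structure repeats every year.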
  27. 27. Don't lose your DATA. Store at least 3 copies: USB, someplace else, and someplace else (e.g. USB, personal computer, Northwestern Box). box.northwestern.edu: 30 GB of FREE storage.
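The "three copies" habit can be partly automated. The sketch below copies one file into several folders and compares SHA-256 checksums to confirm each copy matches the original. The function names are hypothetical, and the destination folders stand in for whatever locations you actually use (a USB drive, a synced cloud folder, and so on).

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256(path: Path) -> str:
    """Checksum of a file; reads it whole, so suited to modest file sizes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def back_up(source: Path, destinations: list[Path]) -> list[Path]:
    """Copy source into each destination folder and verify every copy."""
    expected = sha256(source)
    copies = []
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        target = Path(shutil.copy2(source, folder))  # copy2 keeps timestamps
        if sha256(target) != expected:
            raise IOError(f"copy at {target} does not match the original")
        copies.append(target)
    return copies


# Demo with temporary folders standing in for real storage locations.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    original = root / "20130909_RogersParkAnalysis.csv"
    original.write_text("id,value\n1,2\n")
    for copy in back_up(original, [root / "usb", root / "cloud"]):
        print(copy.relative_to(root))
```

Verifying the checksum after copying catches silent corruption that a simple "the file exists" check would miss.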
  28. 28. Northwestern Box Demo!
  29. 29. Do you have a repository? Project repository? Funder repository? Open data? Who knows?! See Databib (http://databib.org/)
  30. 30. Metadata. Metadata (metacontent) is defined as the data providing information about one or more aspects of the data, such as: means of creation of the data; purpose of the data; time and date of creation; creator or author of the data; location on a computer network where the data were created; standards used. Data about data...
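Some of these aspects can be read straight from the file system. The sketch below is a minimal illustration with a hypothetical function name; note that it records the last-modified time, which is not always the same as the creation time, and that purpose, creator, and standards still have to be written down by a person.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def basic_file_metadata(path: Path) -> dict:
    """Collect the technical metadata the operating system already knows."""
    info = path.stat()
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    return {
        "file_name": path.name,
        "size_bytes": info.st_size,
        "last_modified_utc": modified.isoformat(),  # modification, not creation
        "location": str(path.resolve()),
    }


# Demo on a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "survey.csv"
    sample.write_text("id\n1\n")
    print(basic_file_metadata(sample))
```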
  31. 31. Data about data?
  32. 32. Metadata: data about data. Information that describes the data. Two types: structural metadata and descriptive metadata. Good metadata lets you explain your data to somebody who knows nothing about your research.
  33. 33. Metadata according to ICPSR. A number of elements should be included in metadata, including, but not limited to: principal investigator; funding sources; data collector/producer; project description; sample and sampling procedures; weighting; substantive, temporal, and geographic coverage of the data collection; data source(s); unit(s) of analysis/observation; variables; technical information on files; data collection instruments.
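One lightweight way to keep track of these elements is a plain record stored next to the data, plus a check for anything left blank. The key names below are hypothetical spellings of the elements listed on this slide, not an official ICPSR schema, and the values are placeholders.

```python
import json

# Hypothetical keys, one per element named on the slide.
ELEMENTS = [
    "principal_investigator", "funding_sources", "data_collector",
    "project_description", "sampling_procedures", "weighting",
    "coverage", "data_sources", "units_of_analysis", "variables",
    "technical_file_information", "collection_instruments",
]


def missing_elements(record: dict) -> list[str]:
    """List the elements that are absent or empty in a metadata record."""
    return [key for key in ELEMENTS if not record.get(key)]


# Placeholder values only; fill in real details for an actual project.
record = {
    "principal_investigator": "PLACEHOLDER NAME",
    "project_description": "PLACEHOLDER DESCRIPTION",
    "variables": ["id", "value"],
}

print(json.dumps(record, indent=2))
print("Still to document:", missing_elements(record))
```

Saving the record as JSON alongside the data files keeps the description readable by both people and software.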
  34. 34. RESOURCES. Northwestern University Library Data Management Web Page: http://www.library.northwestern.edu/dmp; DMPTool: https://dmp.org/; Northwestern University's Research Data: Ownership, Retention and Access Policy: http://www.research.northwestern.edu/policies/documents/research_data.pdf; Northwestern University Library's Center for Scholarly Communication & Digital Curation: http://www.library.northwestern.edu/services/faculty-graduate-students/scholarly-communication
  35. 35. Contact information: Data Management Support. Cunera Buys, e-science librarian: [email protected]; Kelsey Rydland, GIS/Data Analyst: [email protected]; Claire Stewart, Head, Digital Collections & Scholarly Communication Services: [email protected]