The Curator’s Approach to Data Management and Sustainability Nic Weber & Megan Senseney Center for Informatics Research in Science & Scholarship Graduate School of Library & Information Science University of Illinois at Urbana-Champaign Digital Humanities at Oxford Summer School 14-18 July 2014
Agenda
Data management
– ...as a DH technique
• “…valued ends…”
• “…available resources…”
– DMP Agency Mandates
– DMP beyond two pages
Sustainability
– Significant properties
– 2 Case studies in DH sustainability
“I’m trying to deflate the idea of digital humanities from a domain to an underlying set of practices” 6 July DH 2014
DM as a DH Technique
Many different Techniques
Data Management as a DH Technique
“…the ensemble of practices by which one uses available resources in order to achieve certain valued ends.”
Harold Lasswell
Valued Ends
• Preservation of Knowledge (material artifacts that are produced, as well as ways of knowing)
• Maximize the value of public investment
• Increase the efficiency of doing digital humanities research – both immediate and long-term.
The Royal Society Science Policy Centre. (2012). Science as an open enterprise. Page 60.
Data management
• Is highly personal
• Interpersonal when collaborating
• Intrapersonal in our relationship with institutions, organizations and funding agencies
! =
Data management techniques include concerns of …
• Planning (more in a bit) / Costing
• Documentation
• Formatting
• Storage
• Copyright / IP / Licensing
Documentation
Documentation : tricks and tips
• Include a “header” line that describes the variables as the first line in the table.
• Use plain ASCII text for your file names, variable names, and data values.
• Record naming schemes (<- develop naming schemes)
• When you export from an analysis environment (e.g. SPSS, R, Gephi), record the transformations in a separate file named readme_(filename).txt
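The tips above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed workflow: the function names (`is_plain_ascii`, `write_table`, `write_readme`), the example file name, and the sample rows are all hypothetical. Only the readme_(filename).txt pattern comes from the slide, and dropping the data file's extension when building the readme name is an assumption.

```python
import csv
import os
import tempfile


def is_plain_ascii(name):
    """Return True if the name uses only plain ASCII characters."""
    return name.isascii()


def write_table(path, header, rows):
    """Write a CSV file whose first line is a header describing the variables."""
    if not is_plain_ascii(os.path.basename(path)):
        raise ValueError("file name should be plain ASCII: " + path)
    # encoding="ascii" makes the write fail loudly on non-ASCII values.
    with open(path, "w", newline="", encoding="ascii") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


def write_readme(data_path, transformations):
    """Record transformations in readme_(filename).txt next to the data file.

    Assumption: (filename) means the data file name without its extension.
    """
    folder, filename = os.path.split(data_path)
    stem = os.path.splitext(filename)[0]
    readme_path = os.path.join(folder, "readme_" + stem + ".txt")
    with open(readme_path, "w", encoding="ascii") as handle:
        for step in transformations:
            handle.write("- " + step + "\n")
    return readme_path


# Hypothetical example: a small table exported from an analysis environment.
workdir = tempfile.mkdtemp()
data_path = os.path.join(workdir, "letters_1850_v01.csv")
write_table(
    data_path,
    ["letter_id", "year", "word_count"],
    [["L001", 1850, 412], ["L002", 1851, 388]],
)
readme_path = write_readme(
    data_path,
    ["Exported from R", "Dropped rows with missing year"],
)
```

The naming scheme itself (here, collection_year_version) is whatever the project records and agrees on; the point is that it is written down and applied consistently.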
Storage & Formatting!
Storage : DIY Cyberinfrastructure
Formatting & Storage: Tricks and Tips
• Store data in nonproprietary software formats (e.g., comma delimited text file, .csv); proprietary software (e.g., Excel, Access) can become unavailable, whereas text files can always be read.
• During analysis, store an uncorrected (raw) data file. Do not make any corrections to this file; make corrections with a script instead, so that every change is recorded and repeatable.
Modified from: https://www.nceas.ucsb.edu/content/simple-guidelines-effective-data-management
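One way to follow the raw-file tip is sketched below in Python. It is an illustration only: the helper names (`protect_raw`, `correct_rows`, `strip_whitespace`), the file names, and the sample data are hypothetical, and marking the raw file read-only is an added precaution that the slide does not mention.

```python
import csv
import os
import stat
import tempfile


def protect_raw(path):
    """Mark the raw file read-only so it is not edited by accident."""
    os.chmod(path, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)


def correct_rows(raw_path, corrected_path, fix_row):
    """Read the raw CSV, apply fix_row to each record, and write a new file.

    The raw file is only ever opened for reading.
    """
    with open(raw_path, newline="", encoding="utf-8") as source:
        reader = csv.DictReader(source)
        fieldnames = reader.fieldnames
        rows = [fix_row(dict(row)) for row in reader]
    with open(corrected_path, "w", newline="", encoding="utf-8") as target:
        writer = csv.DictWriter(target, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


def strip_whitespace(row):
    """Example correction: trim stray spaces from every value."""
    return {key: value.strip() for key, value in row.items()}


# Hypothetical example: a raw export with a stray-whitespace problem.
workdir = tempfile.mkdtemp()
raw_path = os.path.join(workdir, "survey_raw.csv")
with open(raw_path, "w", newline="", encoding="utf-8") as handle:
    handle.write("respondent,answer\nr01, yes \nr02,no\n")
protect_raw(raw_path)

corrected_path = os.path.join(workdir, "survey_corrected.csv")
count = correct_rows(raw_path, corrected_path, strip_whitespace)
```

Because the corrections live in the script rather than in hand edits, the corrected file can be regenerated from the raw file at any time.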
Copyright / IP slide
IP: Tricks of Trade
Melissa Levine’s Checklist on the DH Curation Guide: http://guide.dhcuration.org/legal/policy/#p05
Data Management Planning
• Is highly social – Dialectic (optimal vs. practical) – Plans change
DMP Mandates (Funding Agencies)
AHRC
• Peer reviewed: Yes
• Components: Summary of Digital Outputs and Digital Technologies; Technical Methodology; Standards and Formats; Hardware and Software; Data Acquisition, Processing, Analysis and Use; Technical Support and Relevant Experience; Preservation, Sustainability and Use; Preserving Your Data; Ensuring Continued Access and Use of Your Digital Outputs
• Enforcement: Unclear
NEH
• Peer reviewed: Yes
• Components: Expected types of data; Period of data retention; Data forms and dissemination; Data storage and preservation
• Enforcement: Yes
EU
• Peer reviewed: No
• Components: Data set reference and name; Data set description; Standards and metadata; Data sharing; Archiving and preservation
• Enforcement: Sliding
AHRC Example Project: Kitchen Cosmology Project, University of Bristol. PI: Dr. Rita Langer. Link: http://bit.ly/1n0eVUn
NEH Example Project: A unified approach to preserving cultural software objects and their development histories, UC Santa Cruz. PI: Noah Wardrip-Fruin. Link: http://1.usa.gov/1kNxM8n
completed worksheets
Costing – Tricks and Tips
4C: Overview of 10 curation cost models: http://bit.ly/1lDMUFt “…provides a short description of each of the models and a presentation of their core features…”
More tricks of the trade slide
• Advertise your data
• Say how you would like it to be cited (paper? data? both?)
• State known limitations (fit-for-purpose)
• Rely on journals, repositories and colleagues for guidance
• Don’t rely on journals, repositories or colleagues for guidance
SUSTAINABILITY How do projects end?
Why this matters to DC
Fundamental questions of digital preservation:
1. What must you retain to ensure the integrity and authenticity of the digital object?
2. What can you lose without potential implications?
Significant Properties
“…characteristics of an information object that must be maintained to ensure that object’s continued access, use, and meaning over time as it is moved to new technologies.” (Wilson, 2007).
Five categories of SPs • Content • Context • Rendering • Structure • Behavior
Criteria for deciding significance
Grace, S. & Knight, G. (2008)
GLOBALIZATION AND AUTONOMY ONLINE COMPENDIUM
Case study 1 : Sustainability
Rockwell, Day, Yu, and Engel (2014) Burying Dead Projects: Depositing the Globalization Compendium. Digital Humanities Quarterly; 8 (002). http://www.digitalhumanities.org/dhq/vol/8/2/000179/000179.html
Then we came to (planning for) the end
- XML files with content;
- A MySQL bibliographic database;
- A metadata database of the content for generating topical pages and for searching;
- A full text index for searching the text;
- The code that handles the dynamic generation of the site, the searching, linking, and the XSL transforms;
- Some HTML pages and CSS stylesheets;
- And various images that are embedded in pages.
End of what?
http://globalautonomy.ca/global1/index.jsp
“The experience of the Compendium is that the intellectual work is not only in the individual articles, or even in the bibliographic data – it is in the interaction between these, mediated by code and in the user experience.”
Rockwell et al. 2014
What was deposited?
Content: …the texts, including bibliography, and glossary. We also considered the text on the HTML pages content.
Code: HTML, CSS, and includes the XSLT code that generated much of the interface
Process: …materials (but not all) that document the editorial processes, including the editorial backend that strictly speaking was not part of the Compendium as experienced.
The User Experience: …information about the experience of the Compendium as an interactive work by writing a narrative along with screen shots of typical use of the Compendium stored as PDFs
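A deposit organized along these four categories could be described by a simple manifest. The Python sketch below is an assumption-laden illustration, not a record of what Rockwell et al. actually did: the category keys, the function names (`sha256_of`, `build_manifest`), the file names, and the use of SHA-256 checksums as a check on file integrity are all hypothetical choices.

```python
import hashlib
import json
import os
import tempfile

# Category keys adapted from the four headings above (hypothetical spelling).
CATEGORIES = ("content", "code", "process", "user_experience")


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(files_by_category):
    """Group deposited files by category and record a checksum for each."""
    manifest = {}
    for category, paths in files_by_category.items():
        if category not in CATEGORIES:
            raise ValueError("unknown category: " + category)
        manifest[category] = [
            {"file": os.path.basename(path), "sha256": sha256_of(path)}
            for path in paths
        ]
    return manifest


# Hypothetical example: two placeholder files standing in for a real deposit.
workdir = tempfile.mkdtemp()
article = os.path.join(workdir, "article_001.xml")
with open(article, "wb") as handle:
    handle.write(b"<article/>")
narrative = os.path.join(workdir, "walkthrough.txt")
with open(narrative, "wb") as handle:
    handle.write(b"Narrative of typical use.")

manifest = build_manifest({
    "content": [article],
    "user_experience": [narrative],
})
manifest_json = json.dumps(manifest, indent=2)
```

A manifest like this answers only part of the first preservation question: it can show that a file is unchanged, but not whether the right files were kept.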
Five categories of SPs • Content • Context • Rendering • Structure • Behavior
Rockwell’s Categories • Content • Code • Process • User Experience
PERSEUS DIGITAL LIBRARY
Case study 2 : Sustainability
How would Perseus End? (hint – not by beheading Medusa)
RESOURCE LIST
Rockwell, Day, Yu, and Engel (2014) Burying Dead Projects: Depositing the Globalization Compendium. Digital Humanities Quarterly; 8 (002). http://www.digitalhumanities.org/dhq/vol/8/2/000179/000179.html
Grace, S. & Knight, G. (2008) What are significant properties and why should I care? Presentation delivered at Digital Curation 101, 7 October 2008. Edinburgh, Scotland.