Data Management for Grant Funded Projects

May 10, 2011

Jennifer Eustis, Libraries
Antje Harnisch, OSP
Jila Kazerounian, UITS
David Lowe, Libraries
Carolyn Mills, Libraries

Table of Contents

Motivation: What’s behind the new NSF Data Management Plan (DMP) requirement? [DL]
– “Data” as record
– Ulterior concerns

Your plan: How do I meet the new DMP requirement?
I. Types of data [JK]
II. Standards (formats, metadata) [JK, JE]
III. Project storage [DL]
IV. Access policies [AH]
V. Post-project plans [JK, DL]
VI. Review of UConn examples [AH+]

Resources: Where can I turn for advice in meeting the new DMP requirement?
– OSP [AH]
– UITS [JK]
– Libraries’ Scholarly Communication page+ [CM]

Data and the Scientific Record: Purpose

• To communicate (findings, hypotheses, insights)
• To organize (nomenclature, terminology, disciplines)
• To build communities toward collaboration
• To document, manage, and resolve controversies
• To establish precedence
• To be trustworthy
• To be reproducible
• To perturb assumptions and methods

See Clifford Lynch (2009)

Your Data and the DMP

• Consider providing access to the data from your project that serves the above purposes effectively and efficiently

• Examples of data like this, not like that (UOregon)
  – Not:
    • preliminary analyses
    • drafts of scientific papers
    • plans for future research
    • peer reviews
    • communications with colleagues

Motivations for Data Concerns

• Fragility of digital data
• Data are best managed when like sets/types are kept together
• New paradigm of data-intensive discovery
  – Especially cross-disciplinary

Motivation: Data fragility

[Image from Gizmodo, via Business Insider (2010)]

Motivation: Like Data Together

• Traditional libraries:
  – Maps
  – AV
  – Oversize
  – Archival material
• eScience repositories:
  – ICPSR for social sciences
  – GenBank for genetic sequencing
  – …

Motivation: New Possibilities

• Visions of “what if”:
  – Single discipline: Is better earthquake/tsunami prediction possible from signs in data already in hand?
  – Cross-discipline: Traffic engineers and communications disorders specialists research shared data to alleviate the “wrong way driver” problem

“The future is interdisciplinary.” (Susan Herbst, March 22, 2011)

Your DMP

I. Types of data [JK]
II. Standards (formats, metadata) [JK, JE]
III. Project storage [DL]
IV. Access policies [AH]
V. Post-project plans [JK, DL]
VI. Review of UConn examples [AH+]

I. Types of Data

• Describing the types of data covers the following points:

• Who is the data for, and who controls it (PI, funding agency, university, etc.)? Who is your audience for the data, and how will they use it?

• What kind of data is it? For example: numeric, text, modeling, multimedia/image, etc.

I. Types of Data

• Is the data generated from experiments, simulated from models, observed and captured at the time of some event, or derived and compiled from databases, data mining, etc.?

http://www.data-archive.ac.uk/sharing/acceptable.asp

I. Types of Data

• What is the growth rate of the data? Are you gathering data by hand, or using sophisticated instrumentation that captures large volumes of data at once?

• Will there be more data as time goes on? If so, you will need to plan for the growth: enough storage this year may not be sufficient next year.
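
To plan for growth, it can help to turn these questions into a rough storage budget. The Python sketch below is a minimal example under stated assumptions: a fixed volume already on hand, a yearly intake of new data that may itself grow (e.g., as instrumentation improves), and a fixed number of copies (working copy plus backups). The function name and parameters are hypothetical, not part of any UConn or NSF tool.

```python
def projected_storage_gb(initial_gb: float, first_year_new_gb: float, years: int,
                         acquisition_growth: float = 0.0, copies: int = 2) -> list[float]:
    """Projected storage (GB) needed at the end of each year, counting every copy.

    initial_gb: data already on hand at the start (hypothetical input).
    first_year_new_gb: new data expected in year 1.
    acquisition_growth: yearly growth in the amount of new data (0.5 = 50% more each year).
    copies: working copy plus backups (2 = one primary, one backup).
    """
    totals = []
    holdings = initial_gb
    new_gb = first_year_new_gb
    for _ in range(years):
        holdings += new_gb                # cumulative data kept so far
        totals.append(holdings * copies)  # every copy needs its own space
        new_gb *= 1 + acquisition_growth  # next year's intake
    return totals


# Hypothetical project: 100 GB on hand, 50 GB of new data per year, one backup copy.
print(projected_storage_gb(100.0, 50.0, 3))  # [300.0, 400.0, 500.0]
```

Setting `acquisition_growth=1.0` models an intake that doubles every year, which shows quickly how "enough this year" can fall short the next.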

II. The Standards for Format

• Data could be in one of the following formats:
  – Text, e.g., Word, PDF
  – Numeric, e.g., Excel, Access, MySQL
  – Multimedia/image, e.g., JPEG, TIFF, DICOM, MPEG, QuickTime
  – Models, e.g., 3D
  – Domain-specific, e.g., FITS in astronomy, CIF in chemistry
• For more detail on the types of data, refer to the report issued by the UK Data Archive (http://www.data-archive.ac.uk/sharing/acceptable.asp).

II. The Standards for Format

• Formats more likely to be accessible in the future are:
  – Non-proprietary
  – Open, documented standard
  – In common usage by the research community
  – Standard representation (e.g., plain text)
  – Unencrypted
  – Uncompressed

II. The Standards for Format

• Examples of preferred format choices:
  – PDF/A, not Word
  – CSV (comma-separated values), not Excel
  – MPEG-4, not QuickTime
  – TIFF or JPEG 2000, not GIF or JPG
  – XML or RDF, not RDBMS
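
As a minimal sketch of the "CSV, not Excel" choice, the Python standard library's `csv` module can write tabular data as plain, non-proprietary text. This assumes the data are already loaded as a list of dictionaries; the function names and the example columns (`site`, `count`) are hypothetical.

```python
import csv
from pathlib import Path


def export_to_csv(rows: list[dict], fieldnames: list[str], path: Path) -> None:
    """Write rows as plain UTF-8 CSV with a header row."""
    # newline="" lets the csv module control line endings, as its documentation recommends.
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def read_csv(path: Path) -> list[dict]:
    """Read the file back; every value comes back as a string."""
    with path.open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```

Because a round trip returns every value as a string, it may be worth recording column types and units in the metadata. By default `DictWriter` raises `ValueError` if a row contains a key that is not in `fieldnames`, which helps catch stray columns before a file is shared.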

II. File Version Control

• Strategies include:
  – File-naming conventions
  – Standard file headers (inside the file) listing creation date, version number, and status
  – Log files

II. File Version Control

  – Version control software (e.g., SVN [Subversion])
• Always record every change to a file, no matter how small.
• Discard obsolete versions that are no longer needed, but only after making backups.
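
Standard file headers and a change log can be combined in a few lines. This is a minimal sketch, assuming plain-text files whose readers tolerate `#` comment lines (many CSV readers do not, so a separate readme or sidecar file may be safer there). The header fields follow the strategies listed above; the function names, the tab-separated log layout, and the version strings are hypothetical.

```python
from datetime import date, datetime
from typing import Optional


def make_header(project: str, version: str, status: str, created: date) -> str:
    """Standard header block listing creation date, version number, and status."""
    return "\n".join([
        f"# Project: {project}",
        f"# Created: {created.isoformat()}",
        f"# Version: {version}",
        f"# Status: {status}",
    ])


def log_change(log_path: str, filename: str, version: str, note: str,
               when: Optional[datetime] = None) -> str:
    """Append one tab-separated line per change to a plain-text log; return the line."""
    when = when or datetime.now()
    line = f"{when.isoformat(timespec='seconds')}\t{filename}\t{version}\t{note}\n"
    with open(log_path, "a", encoding="utf-8") as log:  # "a" never overwrites earlier entries
        log.write(line)
    return line
```

Appending rather than rewriting keeps every change on record, however small; a version control system such as SVN does the same job more thoroughly for files it manages.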

II. File Naming Convention

• Reserve the three-letter file extension for application-specific codes, e.g., formats like WRL, MOV, TIF

• Identify the activity or project in the file name, e.g., use the unique project name or identifier:
  Project_name_YYYYMMDD[hh][mm][ss][_extra].ext
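
The naming pattern above can be both generated and checked in code. This minimal Python sketch assumes the project name uses only letters, digits, hyphens, and underscores, and it enforces the three-character extension recommended above; the helper names and example file names are hypothetical.

```python
import re
from datetime import date, datetime

# Project_name_YYYYMMDD[hh][mm][ss][_extra].ext
NAME_PATTERN = re.compile(
    r"^(?P<project>[A-Za-z0-9-]+(?:_[A-Za-z0-9-]+)*?)"  # project name or identifier
    r"_(?P<date>\d{8})"                                 # YYYYMMDD
    r"(?P<time>\d{2}(?:\d{2}(?:\d{2})?)?)?"             # optional hh, hhmm, or hhmmss
    r"(?:_(?P<extra>[A-Za-z0-9-]+))?"                   # optional _extra
    r"\.(?P<ext>[A-Za-z0-9]{3})$"                       # three-character extension
)


def build_name(project: str, when: datetime, ext: str,
               extra: str = "", with_time: bool = False) -> str:
    """Assemble a file name following the convention."""
    stamp = when.strftime("%Y%m%d%H%M%S" if with_time else "%Y%m%d")
    suffix = f"_{extra}" if extra else ""
    return f"{project}_{stamp}{suffix}.{ext}"


def is_valid_name(filename: str) -> bool:
    """True if the name matches the pattern and its date is a real calendar date."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        return False
    d = match.group("date")
    try:
        date(int(d[:4]), int(d[4:6]), int(d[6:]))  # rejects e.g. month 13
    except ValueError:
        return False
    return True


print(build_name("SoilSurvey", datetime(2011, 5, 10, 14, 30), "tif", extra="plotA", with_time=True))
# SoilSurvey_20110510143000_plotA.tif
```

Checking names before files are shared helps keep a collection consistent; a stricter version could also validate the optional hh/mm/ss digits.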

II. File Naming Conventions

• Many academic disciplines have specific recommendations, e.g.:
  – DOE’s Atmospheric Radiation Measurement (ARM) Program: http://www.arm.gov/data/plan.stm
  – GIS datasets from the Commonwealth of Massachusetts: http://www.mass.gov/mgis/dwn-name.htm

II. Metadata

Metadata, in simplest terms, is “data about data”:
• Structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource
• A metadata record consists of a set of attributes, or elements, necessary to describe the resource in question
• Metadata helps keep the data accessible
• Metadata can be embedded in the data or stored separately

II. Metadata

• Three main types of metadata, addressed in different places in various standards:

• Descriptive: describes the resource for identification and discovery

• Structural: how compound objects are put together

• Administrative: creation date, file type, rights management (who can access the data), and preservation (archiving and preserving)

II. Dublin Core

The Dublin Core metadata standard is a simple yet effective element set for describing a wide range of networked resources.

• Extensibility
• Data dictionary

Examples:
• Invasive plant database
• Connecticut History Online images

II. Dublin Core elements

• Title
• Creator
• Subject
• Description
• Publisher
• Contributor
• Date
• Type
• Format
• Identifier
• Source
• Language
• Relation
• Coverage
• Rights

See the NSDL metadata guidelines.
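
A Dublin Core record can be serialized with Python's standard `xml.etree.ElementTree`. This minimal sketch uses the DC element namespace `http://purl.org/dc/elements/1.1/` and the fifteen elements listed above. The `record` wrapper element, the function name, and the sample values are hypothetical; a repository or harvesting protocol (e.g., OAI-PMH) will expect its own wrapper and schema, so check those before relying on this layout.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = ("title", "creator", "subject", "description", "publisher",
               "contributor", "date", "type", "format", "identifier",
               "source", "language", "relation", "coverage", "rights")


def dublin_core_record(fields: dict[str, list[str]]) -> str:
    """Serialize simple Dublin Core fields as XML; every element may repeat."""
    unknown = set(fields) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    ET.register_namespace("dc", DC_NS)  # serialize as dc:title, dc:creator, ...
    root = ET.Element("record")         # hypothetical wrapper element
    for name in DC_ELEMENTS:            # emit in the standard element order
        for value in fields.get(name, []):
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


print(dublin_core_record({
    "title": ["Invasive plant survey, 2010 field season"],  # hypothetical values
    "creator": ["Doe, Jane"],
    "date": ["2011-05-10"],
    "type": ["Dataset"],
    "format": ["text/csv"],
}))
```

Rejecting unknown keys (e.g., `author` instead of `creator`) is one simple way to enforce the consistency the next slide asks for.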

II. Guidelines for Good Metadata

• Format appropriate to the data being collected
• Interoperable: can be stored and transmitted if needed
• Standard controlled vocabulary to reflect content
• Includes a statement on conditions and terms of use
• Supports long-term management
• Consistency
• Accuracy
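
Several of these guidelines (controlled vocabulary, a terms-of-use statement, consistency) can be checked automatically. The sketch below assumes a flat record of Dublin Core-style fields, uses the DCMI Type Vocabulary as the controlled list for `type`, and applies a hypothetical project rule that title, creator, date, and rights must always be present.

```python
from datetime import date

# DCMI Type Vocabulary terms, used here as the controlled vocabulary for "type".
DCMI_TYPES = {"Collection", "Dataset", "Event", "Image", "InteractiveResource",
              "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
              "StillImage", "Text"}
# Hypothetical project rule: these fields must always be filled in.
REQUIRED = ("title", "creator", "date", "rights")


def check_record(record: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the record passed."""
    problems = [f"missing {field}" for field in REQUIRED
                if not record.get(field, "").strip()]
    if record.get("date", "").strip():
        try:
            date.fromisoformat(record["date"])  # ISO 8601, e.g. 2011-05-10
        except ValueError:
            problems.append(f"date not ISO 8601: {record['date']!r}")
    if "type" in record and record["type"] not in DCMI_TYPES:
        problems.append(f"type not in DCMI Type Vocabulary: {record['type']!r}")
    return problems
```

Requiring `rights` is one way to make sure every record carries a statement on conditions and terms of use; the exact required set would come from the project or repository.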

II. Metadata Creation Tools

– Dublin Core tools: http://dublincore.org/tools

– Learning Object Metadata editors: http://www.cancore.ca/editors.html
  Use this tool online: http://demo.licef.teluq.uquebec.ca/eRIB/

– FGDC metadata tools: http://www.fgdc.gov/metadata/geospatial-metadata-tools
  A list of metadata creation tools and metadata processing software. Each tool makes use of the Federal Geographic Data Committee's (FGDC) Content Standard for Digital Geospatial Metadata and may support the Biological Data Profile.

– OAI-specific tools: http://www.openarchives.org/tools/tools.html
  A list of links to the tools implemented by members of the Open Archives Initiative community.

II. Metadata Resources

– Consult your professional societies for preferred metadata resources and tools

– National Information Standards Organization (NISO): http://www.niso.org/publications/press/UnderstandingMetadata.pdf
  A link to NISO’s booklet “Understanding Metadata,” with examples and resources.

III. Project Storage

• (see handout)

IV. NSF: Dissemination and Access

• Investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of work under NSF grants. Grantees are expected to encourage and facilitate such sharing.

• General adjustments and, where essential, exceptions to this sharing expectation may be specified by the funding NSF Program or Division/Office for a particular field or discipline to safeguard the rights of individuals and subjects, the validity of results, or the integrity of collections or to accommodate the legitimate interest of investigators.

• A grantee or investigator also may request a particular adjustment or exception from the cognizant NSF Program Officer.

IV. Access Policies and Provisions

• Will the data be shared with other researchers? If so, how?

• Do you have the right to share the data if you did not produce it?

• If the data can be shared, will it be shared with everyone or only with a limited number of people?

Data Policies

• UConn-Storrs has no data policy (yet)
• Ownership
  – Health Center policy stipulates:
    • Generally, research data are owned by the faculty member or graduate student who created the data
    • Data generated in projects supported by grants or contracts containing such provisions shall be jointly owned by the Health Center and the PI. The Health Center shall have an irrevocable right to obtain such data from the PI at any time, even if that individual has left the institution. Custody of the data will continue to be the responsibility of the PI.
  – Typical university data policy:
    • Consistent with federal policy and prevailing higher education practice, Research Data belong to the University
• Retention: at least 3 years
  – Institution: Case must retain research data in sufficient detail and for an adequate period of time to enable appropriate responses to questions about accuracy, authenticity, primacy, and compliance with laws and regulations governing the conduct of the research.
  – PI: custodian, responsible for collection, retention, and management of research data

IV. Data Sharing

• Publication: dissemination through articles in scientific journals
• Investigator: the scientist responds directly to data requests (e.g., mailing a CD-ROM containing data or posting data on a Web site)
• Data hosting (local data center or offsite): a controlled, secure environment in which eligible researchers can perform analyses using data resources
• Data archive: a place where data can be acquired, manipulated, documented, and distributed
• Mixed mode: more than one version of a dataset, each providing a different level of access
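
A mixed-mode release can be produced from a single master file by deriving a reduced-access copy. This minimal sketch assumes tabular rows held as dictionaries; the column names, the choice of which columns to drop, and the coarsening rules (3-digit ZIP prefix, birth year only) are hypothetical examples. Dropping or coarsening columns does not by itself guarantee that subjects cannot be identified, so the rules for each access level should follow the applicable regulations and the IRB protocol.

```python
from typing import Callable


def reduced_access_copy(rows: list[dict], drop: set[str],
                        generalize: dict[str, Callable]) -> list[dict]:
    """Build a lower-access version of a dataset; the full version is left untouched."""
    public = []
    for row in rows:
        new_row = {}
        for column, value in row.items():
            if column in drop:
                continue  # withheld from this access level
            fn = generalize.get(column)
            new_row[column] = fn(value) if fn else value  # coarsened or copied as-is
        public.append(new_row)
    return public


full = [{"subject": "S01", "name": "Jane Doe", "zip": "06269",
         "birth_date": "1980-04-02", "score": 17}]
public = reduced_access_copy(
    full,
    drop={"name"},
    generalize={"zip": lambda z: z[:3] + "XX", "birth_date": lambda d: d[:4]},
)
print(public)  # [{'subject': 'S01', 'zip': '062XX', 'birth_date': '1980', 'score': 17}]
```

The full version would then be offered only through a controlled channel (e.g., data hosting), while the reduced copy goes to the archive or Web site.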

IV. Limitations to Sharing/Access

• What are the issues related to confidentiality and intellectual property?

• Does the data have direct or indirect information that could identify the research subjects?

• Is all or part of the data copyrightable? (Any copyright could be waived with a CC0 declaration: http://creativecommons.org/choose/zero)

IV. Privacy Concerns

• Do any regulations apply to the data (for example, HIPAA for health-care-related data)?

• Are there any ethical issues in data management?
• Privileged or confidential information should be released only in a form that protects the privacy of individuals and subjects involved. Data-sharing policies for awards that involve human subjects should recognize and address human-subjects protocols and the need to protect privacy and confidentiality.
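
One common technique for releasing data in a privacy-protecting form is to replace direct identifiers with keyed pseudonyms, so that records can still be linked across files without exposing the original IDs. This minimal sketch uses HMAC-SHA-256 from the Python standard library; the column name, the key handling, and the 12-character pseudonym length are assumptions. It addresses direct identifiers only: indirect identifiers (dates, locations, rare attributes) still need review, and pseudonymization alone should not be assumed to satisfy HIPAA or an IRB.

```python
import hashlib
import hmac


def pseudonym(subject_id: str, secret_key: bytes, length: int = 12) -> str:
    """Stable code for a subject ID; without the key it cannot feasibly be recomputed."""
    digest = hmac.new(secret_key, subject_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length]  # truncated hex digest (hypothetical length)


def pseudonymize(rows: list[dict], id_column: str, secret_key: bytes) -> list[dict]:
    """Replace the ID column in every row with its keyed pseudonym; inputs are not modified."""
    return [{**row, id_column: pseudonym(row[id_column], secret_key)} for row in rows]
```

The key should be stored separately from the released data, since anyone holding it can regenerate the mapping from real IDs to pseudonyms.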

IV. Intellectual Property Concerns

• Some proposals may involve proprietary or other restricted data. For example, projects having proprietary information that will eventually lead to commercialization, such as Engineering Research Center (ERC), Nanoscale Science and Engineering Center (NSEC), Industry/University Cooperative Research Center (I/UCRC), Small Business Innovation Research (SBIR), Small Business Technology Transfer (STTR), and Grant Opportunities for Academic Liaison with Industry (GOALI) awards.

• Any such data-management issues should be discussed as well as the conditions that might prevent or delay the sharing of data. The proposal’s DMP would address the distinction between released and restricted data and how they would be managed. Exceptions to the basic data-management policy should be discussed with the cognizant program officer before submission of such proposals.

V. Plans for Eventual Transition or Termination of the Data

Provisions for transition or termination could entail the following:

• Do you need to destroy the data after a certain period? How permanent is the data: long-term (10 years or more) or short-term (3-5 years or less)?
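
Once a retention period is chosen, the earliest disposition date is simple date arithmetic. This sketch assumes the period is counted in whole years from a single start date; which event starts the clock (end of award, final report, publication) depends on the applicable policy, and the function name and dates are hypothetical.

```python
from datetime import date


def retain_until(start: date, years: int) -> date:
    """Earliest date the retention period is over, counting whole years from start."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        # start is Feb 29 and the target year is not a leap year: use Feb 28
        return start.replace(year=start.year + years, day=28)


# Hypothetical award ending Aug 31, 2014, kept for the 3-year minimum:
print(retain_until(date(2014, 8, 31), 3))  # 2017-08-31
```

The Feb 29 branch matters only for leap-day start dates, but without it `date.replace` would raise `ValueError` for those cases.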

V. Plans for Eventual Transition or Termination of the Data

• Do you have the right to do so (is this your data, is it copyrighted, etc.)?

• Do you need to keep all versions? Just the final version? The first and last? The answer depends on re-processing costs: if intermediate versions can be regenerated by re-processing the data, you may not need to keep them.

V. Plans for Eventual Transition or Termination of the Data

• Are there any legal or ethical obligations for secure removal of data after a certain period (e.g., data covered by HIPAA, the Health Insurance Portability and Accountability Act)?

• How do you plan to destroy the data? Options include degaussing (exposing the media to a magnetic field), software tools that wipe disks, and physical destruction of the media (there are companies that do this).

V. Plans for Eventual Transition or Termination of the Data

• Is there a need and a plan to migrate/transition your data to another medium or structure (after keeping the data for a long period of time)?

V. Good Practices

• Test data restore from backup
• Check documentation and metadata
• Are files still readable? Still accessible at the published URL?
• Migrate files to newer formats
• Update software to read/write data
• Weed out obsolete data (and destroy it where appropriate)
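
Testing a restore can be as simple as comparing a restored copy with the original, byte for byte. This minimal Python sketch assumes the backup has been restored into a separate directory tree; the function name and directory layout are hypothetical. It uses the standard library's `filecmp.cmp` with `shallow=False`, which compares file contents rather than relying only on size and modification time.

```python
import filecmp
from pathlib import Path


def restore_problems(original: Path, restored: Path) -> list[str]:
    """List files that are missing from, or differ in, the restored tree."""
    problems = []
    for src in sorted(p for p in original.rglob("*") if p.is_file()):
        rel = src.relative_to(original)
        copy = restored / rel
        if not copy.is_file():
            problems.append(f"missing: {rel.as_posix()}")
        elif not filecmp.cmp(src, copy, shallow=False):  # full content comparison
            problems.append(f"differs: {rel.as_posix()}")
    return problems
```

An empty list means every original file came back intact; running this on a schedule, rather than only after a failure, is what turns a backup into a tested one.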

V. Post-Project Plans

• Leadership opportunities
  – Metadata schema development
  – Repository development
• Collaboration is leadership
• Standards context
  – OAIS reference model (ISO 14721:2003)
    • Submission/ingest
    • Archiving, including “fixity checks” (via checksums)
    • Dissemination
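
Fixity checks are usually implemented by recording a checksum for each file at ingest and recomputing it later. This is a minimal sketch using SHA-256 from Python's `hashlib`; the manifest layout mirrors GNU `sha256sum` output, and the function names are hypothetical. The manifest should be kept outside the directory it describes, or it will appear in its own listing.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Checksum a file in 1 MiB chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def make_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest, e.g. at ingest."""
    return {p.relative_to(root).as_posix(): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def write_manifest(manifest: dict[str, str], out: Path) -> None:
    """Save as '<digest>  <path>' lines, the layout used by GNU sha256sum."""
    out.write_text("".join(f"{d}  {name}\n" for name, d in sorted(manifest.items())),
                   encoding="utf-8")


def fixity_check(root: Path, manifest: dict[str, str]) -> list[str]:
    """Re-hash every file listed in the manifest; report anything missing or changed."""
    problems = []
    for name, expected in sorted(manifest.items()):
        target = root / name
        if not target.is_file():
            problems.append(f"missing: {name}")
        elif sha256_of(target) != expected:
            problems.append(f"changed: {name}")
    return problems
```

Files added after ingest are not reported, since only manifest entries are checked; comparing a fresh `make_manifest` against the stored one would catch those too.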

VI. Example DMPs

• A (gold handout)
• B (salmon handout)

VI. Data Management Plan

• Supplementary document entitled “Data Management Plan” (no more than 2 pages)

• Not included in the 15-page limit for proposal bodies
• May not be needed, e.g., because the project doesn’t deal with data, but that must be stated and justified in the “plan”

• Will be reviewed as part of intellectual merit or broader impacts of the proposal, or both

VI. Monitoring/Reporting

• Annual reports
  – Progress on data production
  – Progress on sharing and dissemination of data
• Final project report
  – Data produced during the award
  – Data to be retained after the award expires
  – How data will be available for sharing
  – How data will be disseminated
  – Formats used, including any metadata
  – Location of data (archive/storage)
• Future proposals
  – Data management issues included in “Results of prior NSF support”

VI. Examples/Samples/Templates

• The DMP depends on the discipline, the types of data, and the nature of the project

• Templates are difficult to provide
• The DMP should answer the questions NSF has posted; these can be used as section headers
• Examples can be found through links provided on the UConn website

UConn Resources

• Libraries: http://lib.uconn.edu/scholarlycommunication/data.html

• UITS: http://itrequest.uconn.edu

Thank you! Questions?

Our contact info:

• Library:
  – Jennifer Eustis, Catalog/Metadata Librarian
  – David Lowe, Digital Preservation Librarian
  – Carolyn Mills, Liaison to Biology and Agriculture
• Office of Sponsored Programs:
  – Antje Harnisch, Assistant Director, Pre-Award and Contract Services
• University ITS:
  – Jila Kazerounian, Manager of Web Development and Integration Technologies