data management lab: session 1 slides

Research Data Management Spring 2014: Session 1 Practical strategies for better results University Library Center for Digital Scholarship

Upload: heather-coates

Post on 11-Nov-2014

DESCRIPTION

Spring 2014 Data Management Lab: Session 1 Slides (more details at http://ulib.iupui.edu/digitalscholarship/dataservices/datamgmtlab) What you will learn: 1. Build awareness of research data management issues associated with digital data. 2. Introduce methods to address common data management issues and facilitate data integrity. 3. Introduce institutional resources supporting effective data management methods. 4. Build proficiency in applying these methods. 5. Build strategic skills that enable attendees to solve new data management problems.

TRANSCRIPT

Page 1: Data Management Lab: Session 1 Slides

Research Data Management

Spring 2014: Session 1

Practical strategies for better results

University Library Center for Digital Scholarship

Page 2: Data Management Lab: Session 1 Slides

Acknowledgements

Department of Biostatistics – Data Management, Indiana University School of Medicine Colleagues at Johns Hopkins University, Purdue University, Oregon State University, University of Oregon, New York University, and others who shared their expertise.

Page 3: Data Management Lab: Session 1 Slides

ROAD MAP FOR THIS LAB

Page 4: Data Management Lab: Session 1 Slides

Overview

• Four sessions, 2 hours each
• Some lecture, more discussion and activities
• Major products
  – Practical, detailed data management plan [DRAFT]
  – Map of data outcomes
  – Storage & backup plan
  – Documentation checklist
  – Data quality standards
  – Screening & cleaning checklist

Page 5: Data Management Lab: Session 1 Slides

Products & Resources

• Box folders
  – Session 1, 2, 3, 4: Materials for each session
  – Resources: Miscellaneous resources that span sessions or are useful later
  – Upload HERE: Folder for uploading products
• Will be used to assess my teaching – content & delivery
• Will NOT be used to assess you
• Please delete your name from your files before you upload them

Page 6: Data Management Lab: Session 1 Slides

1. Research data management plans & planning
2. Documentation & metadata
3. Data quality
4. Ethical & legal issues in data sharing & reuse

Page 7: Data Management Lab: Session 1 Slides

Session 1

1. Research data management plans & planning
  a) Planning for good data management from the start
  b) Defining expected outcomes for your data
  c) Getting a storage and backup plan

Page 8: Data Management Lab: Session 1 Slides

Activities & Discussions

• Introductions (<1 minute each)
  – Name
  – Department or Program
  – What do you want to get out of these workshops?

Page 9: Data Management Lab: Session 1 Slides

INTRODUCTION TO RESEARCH DATA MANAGEMENT

MODULE 1

Page 10: Data Management Lab: Session 1 Slides

LEARNING OUTCOMES

• Describe key challenges associated with managing digital research data
• Identify the potential consequences of irresponsible or inattentive data management

Page 11: Data Management Lab: Session 1 Slides

Data Deluge

Data is collected from sensors, sensor networks, remote sensing, observations, and more. This calls for increased attention to data management and stewardship.

Image credits: photo courtesy of www.carboafrica.net; photo courtesy of http://modis.gsfc.nasa.gov/; photo courtesy of http://www.futurlec.com; CC image by tajai on Flickr; CC image by CIMMYT on Flickr; image collected by Viv Hutchinson.

Page 12: Data Management Lab: Session 1 Slides

The World of Data Around Us

[Chart: "Information" and "Available Storage" in petabytes worldwide, 2005 to 2010, on a y-axis from 0 to 1,000,000. The gap between the two is labeled "Transient information or unfilled demand for storage".]

Source: John Gantz, IDC Corporation: The Expanding Digital Universe

Page 13: Data Management Lab: Session 1 Slides

Why Data Management

• Natural disaster
• Facilities infrastructure failure
• Storage failure
• Server hardware/software failure
• Application software failure
• External dependencies (e.g. PKI failure)
• Format obsolescence
• Legal encumbrance
• Human error
• Malicious attack by human or automated agents
• Loss of staffing competencies
• Loss of institutional commitment
• Loss of financial stability
• Changes in user expectations and requirements

Image credits: CC image by Sharyn Morrow on Flickr; CC image by momboleum on Flickr.

Page 14: Data Management Lab: Session 1 Slides

Best Practices

Poor data practice results in loss of information (data entropy).

[Figure: information content (y-axis) versus time (x-axis), annotated with: time of publication, specific details, general details, accident, retirement or career change, death. (Michener et al. 1997)]

Source: Best Practices for Preparing Ecological Data Sets, ESA, August 2010

Page 15: Data Management Lab: Session 1 Slides

Data Loss

.33

Vines et al, 2014

Page 16: Data Management Lab: Session 1 Slides

“MEDICARE PAYMENT ERRORS NEAR $20B” (CNN) December 2004 Miscoding and billing errors from doctors and hospitals totaled $20,000,000,000 in FY 2003 (9.3% error rate). The error rate measured claims that were paid despite being medically unnecessary, inadequately documented, or improperly coded. In some instances, Medicare asked health care providers for medical records to back up their claims and got no response. The survey did not document instances of alleged fraud. This error rate was actually an improvement over the previous fiscal year (9.8% error rate).

“AUDIT: JUSTICE STATS ON ANTI-TERROR CASES FLAWED” (AP) February 2007 The Justice Department Inspector General found that only two of 26 sets of data concerning terrorism attacks were accurate. The Justice Department uses these statistics to argue for its budget. The Inspector General said the data “appear to be the result of decentralized and haphazard methods of collections … and do not appear to be intentional.”

“OOPS! TECH ERROR WIPES OUT ALASKA INFO” (AP) March 2007 A technician managed to delete the data and backup for the $38 billion Alaska oil revenue fund – money received by residents of the State. Correcting the errors cost the State an additional $220,700 (which of course was taken off the receipts to Alaska residents).

Slide courtesy of BLM

Page 17: Data Management Lab: Session 1 Slides

Professional Stakes

Page 18: Data Management Lab: Session 1 Slides

Benefits of GOOD Data Management

• Efficiency
• Safety
• Quality
• Reputation
• Compliance

Page 19: Data Management Lab: Session 1 Slides

Minute paper

Why should we care about how research data is managed? [Subtext: Why should researchers spend time managing their data better?]

Don’t forget to upload your paper to Box.

Page 20: Data Management Lab: Session 1 Slides

References

1. DataONE Education Module: Data Management. DataONE. Retrieved December 2013. From http://www.dataone.org/sites/all/documents/L01_DataManagement.pptx
2. Cook, B. (2013). NACP All Investigator Meeting: Data Management Practices for Early Career Scientists. Presented February 3, 2013. From http://daac.ornl.gov/NACP_AIM_2013/NACP_AIM_Agenda.html
3. Vines et al. (2014). The availability of research data declines rapidly with article age. Current Biology. http://dx.doi.org/10.1016/j.cub.2013.11.014

Page 21: Data Management Lab: Session 1 Slides

DATA MANAGEMENT PLANS & PLANNING

MODULE 1

Page 22: Data Management Lab: Session 1 Slides

LEARNING OUTCOMES

• Understand the life cycle approach to managing research data

• Summarize the basic components of US federal funding agency requirements for data management and sharing.

• Outline planned project and data documentation in a data management plan.

• Define expected outcomes for data.

Page 23: Data Management Lab: Session 1 Slides

The Life Cycle Approach

• Helps define and explain complex processes (graphically). (Carlson, 2013)

• Helps identify important components, roles, responsibilities, milestones, etc. (Carlson, 2013)

• Demonstrates connections and relationships between parts and the whole. (Carlson, 2013)

• Emphasizes the role of data management as an active process embedded throughout the research and knowledge creation life cycles.

Page 24: Data Management Lab: Session 1 Slides

DataONE Data Life Cycle

Page 25: Data Management Lab: Session 1 Slides

Amanda Whitmire, 2013

Page 26: Data Management Lab: Session 1 Slides

Humphrey, Knowledge Creation Cycle

Page 27: Data Management Lab: Session 1 Slides

Progress Towards Openness

• 1985: National Research Council
• 1999: OMB Circular A-110 revisions
• 2003: NIH Data Sharing Policy
• 2008: NIH Public Access Policy
• 2011: NSF DMP Requirement
• 2012: NEH, Office of Digital Humanities DMP Requirement
• 2013: NSF biosketch change
• 2013: OSTP Memo on Public Access to the Results of Federally-Funded Research

Page 28: Data Management Lab: Session 1 Slides

OSTP Memo - February 2013

• Data
  – Maximize access by the general public and without charge…protecting confidentiality and personal privacy
  – …recognizing proprietary interests, business confidential information, and intellectual property rights
  – …preserving the balance between the relative value of long-term preservation and access and administrative burden
  – …ensure all researchers develop data management plans
  – Ensure appropriate evaluation of the merits of submitted DMPs
  – Promote the deposit of data in publicly accessible databases
  – …support training, education, and workforce development related to scientific data management, analysis, storage, preservation, and stewardship

Page 29: Data Management Lab: Session 1 Slides

Policy Drivers

• Funding agencies
  – Increased impact of funding dollars
  – Reduce redundant data collection
  – Further scientific research
• Research Communities
  – Enhance use and value of existing data
  – Address big challenges

Page 30: Data Management Lab: Session 1 Slides

Data Management Planning

[Data life cycle diagram: Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze]

Page 31: Data Management Lab: Session 1 Slides

DMPs – What do they do?

• Outlines what you will do with your data during and after you complete your research

• Submitted to funders – formal document
• Functional DMP – working document
  – Start developing during design
  – Use to guide project start-up
  – Review and update throughout the project

Page 32: Data Management Lab: Session 1 Slides

DMPs – Why?

• Doing it right saves you time and makes your research more efficient
  – Document crucial information for your thesis or dissertation
• Makes it easier to preserve and share your data
• Increases visibility of research

Data management is an investment in your research to make it easier and more efficient.

Page 33: Data Management Lab: Session 1 Slides

A dose of DMP realism

My data management plan – a satire

Page 34: Data Management Lab: Session 1 Slides

DMP

Introduction to the DMP
• Workshop - emphasis on planning
• BUT it is a working document

Sections to draft
• Data description
• Existing data (if applicable)
• Format

Page 35: Data Management Lab: Session 1 Slides

Mapping Data Outcomes

• Clearly describe what you want your research project to accomplish

• Define what the data need to be in order for you to answer your research questions

• Review example

Page 36: Data Management Lab: Session 1 Slides

DMP

Data mapping exercise – map out research questions through data fields/points/variables
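
One way to record such a map is a small nested structure that links each research question to the variables needed to answer it. The sketch below is illustrative only: the research question, the variable names, and the `unmapped_variables` helper are hypothetical and are not part of the lab materials.

```python
# Hypothetical example: map each research question to the data fields needed to answer it.
data_map = {
    "Does tutoring attendance predict final course grade?": {
        "outcome": ["final_grade"],
        "predictors": ["tutoring_sessions_attended"],
        "covariates": ["prior_gpa", "credit_hours"],
    },
}

# Fields that the (hypothetical) data collection instrument actually captures.
collected_fields = {"final_grade", "tutoring_sessions_attended", "prior_gpa"}


def unmapped_variables(data_map, collected_fields):
    """Return the variables required by the map that are not being collected."""
    required = {
        variable
        for roles in data_map.values()
        for variables in roles.values()
        for variable in variables
    }
    return sorted(required - set(collected_fields))


print(unmapped_variables(data_map, collected_fields))  # ['credit_hours']
```

A check like this can flag, before data collection starts, any variable a research question depends on that no instrument captures.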

Page 37: Data Management Lab: Session 1 Slides

References

1. Carlson, J. (2013). ICPSR Curating and Managing Data for Reuse: Life Cycle Models and Principles.
2. DataONE Education Module: Data Management Planning. DataONE. From http://www.dataone.org/sites/all/documents/L03_DataManagementPlanning.pptx
3. Humphrey, C. (2008). e-Science and the Life Cycle of Research. From http://datalib.library.ualberta.ca/~humphrey/lifecycle-science060308.doc
4. Whitmire, A. (2013). Research Life Cycle. From http://guides.library.oregonstate.edu/content.php?pid=502068&sid=4136875

Page 38: Data Management Lab: Session 1 Slides

ETHICAL & LEGAL OBLIGATIONS MODULE 1

Page 39: Data Management Lab: Session 1 Slides

LEARNING OUTCOMES

• Identify your legal obligations for sharing and long-term preservation.

• Identify your ethical obligations for ensuring data confidentiality, privacy, and security.

• Describe intellectual property issues for data that result in a patentable or commercial product.

Page 40: Data Management Lab: Session 1 Slides

Ethical vs. Legal

• Ethical (Professional Society, Licensure, Community of Practice)
  – Sharing (consent, IRB approval, de-identification, etc.)
  – Redistribution & Re-use
  – Citation
• Legal (Federal, State, Local, Funding Agency, Institution)
  – Intellectual Property (e.g., who owns it?)
  – Copyright
  – Patents
  – Trade secrets
  – Licensing
  – Monetary exchange
  – Open source vs. proprietary software
  – Data retention

Page 41: Data Management Lab: Session 1 Slides

Privacy

• Privacy: having control over the extent, timing, and circumstances of sharing oneself (physically, behaviorally, or intellectually) with others.

• Federal guidelines: FERPA, HIPAA

• Most research involves asking subjects to provide or release information voluntarily following an informed consent process.

• Privacy issues arise in regard to information obtained for research purposes without the consent of the subjects.

Page 42: Data Management Lab: Session 1 Slides

Confidentiality

• Confidentiality: treatment of information that an individual has disclosed in a relationship of trust and with the expectation that it will not be divulged to others, without permission, in ways that are inconsistent with the understanding of the original disclosure.
• Questions to consider:
  – Are identifiers really needed or could data be collected anonymously?
  – If identifiers are needed, can coded IDs be created to use for data collection, merging, and analysis, with identifiers kept entirely separate and secure?
  – How will the data be protected from inadvertent disclosure or unauthorized access during collection, storage, and analysis?
  – Should data be manipulated in specific ways to reduce specificity, by collapsing data into categories with small numbers of individuals, reducing age or geographic specificity, etc.?
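
The coded-ID and reduced-specificity questions above can be sketched in code. This is a minimal illustration under stated assumptions, not a vetted de-identification procedure: the field names, the `assign_coded_ids` and `age_band` helpers, and the sample records are all hypothetical, and a real project would follow its IRB protocol and institutional guidance.

```python
import secrets


def assign_coded_ids(records, id_field="name"):
    """Split records into de-identified rows plus a separate code-to-identifier key."""
    key = {}        # coded ID -> direct identifier; store separately and securely
    code_for = {}   # direct identifier -> coded ID, so repeat rows get the same code
    deidentified = []
    for record in records:
        identifier = record[id_field]
        if identifier not in code_for:
            code = "P" + secrets.token_hex(4)
            while code in key:  # regenerate on the unlikely chance of a collision
                code = "P" + secrets.token_hex(4)
            code_for[identifier] = code
            key[code] = identifier
        row = {field: value for field, value in record.items() if field != id_field}
        row["participant_id"] = code_for[identifier]
        deidentified.append(row)
    return deidentified, key


def age_band(age, width=10):
    """Collapse an exact age into a band such as '30-39' to reduce specificity."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


# Hypothetical records, for illustration only.
records = [
    {"name": "Participant One", "age": 34, "score": 12},
    {"name": "Participant Two", "age": 61, "score": 9},
]
rows, key = assign_coded_ids(records)
for row in rows:
    row["age_band"] = age_band(row.pop("age"))
```

The analysis files would hold only `rows`; the `key` linking codes to names would be kept in a separate, access-controlled location.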

Page 43: Data Management Lab: Session 1 Slides

Intellectual Property Rights

• Patent
• Copyright
• Trademark
• Design
• Circuit Layout Right
• Plant Breeder’s Right
• Trade Secret

Page 44: Data Management Lab: Session 1 Slides

DMP

Sections to work on:
• Ethics and privacy
• Legal obligations

Page 45: Data Management Lab: Session 1 Slides

References

1. Australian Research Council. (nd). National Principles of Intellectual Property Management for Publicly Funded Research. From http://www.arc.gov.au/pdf/01_01.pdf

Page 46: Data Management Lab: Session 1 Slides

STORAGE & BACKUP MODULE 1

Page 47: Data Management Lab: Session 1 Slides

LEARNING OUTCOMES

• Prepare a comprehensive storage and backup plan.

• Create protected copies of files at crucial points in your study.

Page 48: Data Management Lab: Session 1 Slides

Storage & Back-up Plan

• Storage
  – Keep primary copies in a secure, accessible location
• Backup
  – Additional copies to prevent data loss
  – Rule of 3
  – Diversify hardware, software, and physical location
• Other considerations
  – Security, encryption, compression

Page 49: Data Management Lab: Session 1 Slides

Storage @ IU

• Box @ IU – http://kb.iu.edu/data/bdsv.html

• Research File System – http://kb.iu.edu/data/aroz.html

• Scholarly Data Archive – http://kb.iu.edu/data/aiyi.html

• REDCap – http://www.indianactsi.org/rct

• Slashtmp (sharing) – http://kb.iu.edu/data/angt.html

Page 50: Data Management Lab: Session 1 Slides

Backup Plan

• Rule of 3
  – Local copy (ex: desktop or laptop)
  – Semi-local copy (ex: IU cloud storage)
  – Remote copy (ex: IU cloud storage)
• Backup frequency
  – How much data can you risk losing?
• Backup procedure
  – Manual or automatic?
  – Full or incremental?
  – Verification/testing?
  – Documentation
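
The verification step in a backup procedure can be sketched as a checksum comparison. The example below is an assumption-laden illustration: `sha256_of` and `backup_and_verify` are hypothetical helper names, and the destination folders stand in for whatever semi-local and remote storage a project actually uses.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source, destinations):
    """Copy source into each destination folder and report whether checksums match."""
    source = Path(source)
    expected = sha256_of(source)
    results = {}
    for destination in destinations:
        destination = Path(destination)
        destination.mkdir(parents=True, exist_ok=True)
        copy_path = destination / source.name
        shutil.copy2(source, copy_path)  # copy2 also preserves timestamps
        results[str(copy_path)] = sha256_of(copy_path) == expected
    return results


if __name__ == "__main__":
    # Hypothetical demo: temporary folders stand in for the two backup locations.
    with tempfile.TemporaryDirectory() as workspace:
        source = Path(workspace) / "survey_raw.csv"
        source.write_text("id,score\n1,42\n")
        targets = [Path(workspace) / name for name in ("semi_local", "remote")]
        print(backup_and_verify(source, targets))
```

Recording the returned report alongside the backup date covers the documentation item in the list above.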

Page 51: Data Management Lab: Session 1 Slides

Security & Encryption

• Use IU systems
  – Strong authentication protocols
• Encryption
  – Useful for portable devices (e.g., laptops, external hard drives, flash drives, smartphones, etc.)
  – Use for highly sensitive data
  – IU recommendations
    • http://kb.iu.edu/data/ayzi.html
    • http://kb.iu.edu/data/bcnh.html

Page 52: Data Management Lab: Session 1 Slides

Master Files

• Provide snapshots of key phases in the data life cycle
  – Raw
  – Cleaned
  – Phases of processing
• In combination with detailed documentation, these files make write-up easier and support reproducibility and reuse
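
A master file can be approximated by copying the working file at a key phase and marking the copy read-only. The sketch below is one possible approach, not a prescribed procedure: `snapshot_master_file` and the phase-plus-date naming pattern are hypothetical, and read-only permissions protect against accidental edits only, not deliberate changes or hardware loss.

```python
import shutil
import stat
import tempfile
from datetime import date
from pathlib import Path


def snapshot_master_file(working_file, archive_dir, phase):
    """Copy working_file into archive_dir, label it by phase and date, and make it read-only."""
    working_file = Path(working_file)
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    name = f"{working_file.stem}_{phase}_{date.today().isoformat()}{working_file.suffix}"
    master = archive_dir / name
    if master.exists():
        # Refuse to overwrite: a master file is meant to be a fixed snapshot.
        raise FileExistsError(f"{master} already exists")
    shutil.copy2(working_file, master)
    # Remove write permission for everyone (on Windows this sets the read-only flag).
    master.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return master


if __name__ == "__main__":
    # Hypothetical demo in a temporary folder.
    with tempfile.TemporaryDirectory() as workspace:
        working = Path(workspace) / "survey.csv"
        working.write_text("id,score\n1,42\n")
        master = snapshot_master_file(working, Path(workspace) / "masters", "raw")
        print(master.name)
        master.chmod(stat.S_IRUSR | stat.S_IWUSR)  # restore write access so cleanup succeeds
```

Calling it once for the raw data, again after cleaning, and at each later processing phase yields the series of snapshots the slide describes.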

Page 53: Data Management Lab: Session 1 Slides

EF-5 Horror Stories

• World’s Biggest Data Breaches: http://www.informationisbeautiful.net/visualizations/worlds-biggest-data-breaches-hacks/

• Excel error responsible for misinterpretation of data and resulting policy decisions: http://arstechnica.com/tech-policy/2013/04/microsoft-excel-the-ruiner-of-global-economies/

• Sandy’s floodwaters damage 1500 volumes of digital art: http://www.theverge.com/2013/1/15/3876790/eyebeam-hurricane-sandy-digital-archive-rescue

Page 54: Data Management Lab: Session 1 Slides

EF-3 Horror Stories

• UNC Researcher Demoted over data breach:
  – http://www.insidehighered.com/news/2011/01/27/unc_case_highlights_debate_about_data_security_and_accountability_for_hacks
  – http://www.databreaches.net/cancer-researcher-fights-unc-demotion-over-data-breach/

• UK Tamiflu Clinical Trial data: http://blogs.plos.org/speakingofmedicine/2014/01/03/follow-the-money-or-why-it-took-an-accounts-committee-to-decide-why-access-to-clinical-trial-data-matters/

• Data loss at Emory Healthcare exposes over 315,000 patients: http://www.bizjournals.com/atlanta/news/2012/04/18/data-loss-at-emory-healthcare-exposes.html?s=print

Page 56: Data Management Lab: Session 1 Slides

Minute Paper

Describe how your storage and backup plan will address the key risks for your data.

Don’t forget to upload your paper to Box.

Page 57: Data Management Lab: Session 1 Slides

DMP

Sections to work on:
• Data organization
  – Storage & Backup Plan

Don’t forget to upload your DMP to Box.

Page 58: Data Management Lab: Session 1 Slides

Wrapping up

What’s next?

Discussion
• What worked?
• What didn’t?