data management lab: session 4 slides

Research Data Management Spring 2014: Session 4 Practical strategies for better results University Library Center for Digital Scholarship

Upload: heather-coates

Post on 07-May-2015

Category:

Education


DESCRIPTION

Data Management Lab: Session 4 Slides (more details at http://ulib.iupui.edu/digitalscholarship/dataservices/datamgmtlab) What you will learn: 1. Build awareness of research data management issues associated with digital data. 2. Introduce methods to address common data management issues and facilitate data integrity. 3. Introduce institutional resources supporting effective data management methods. 4. Build proficiency in applying these methods. 5. Build strategic skills that enable attendees to solve new data management problems.

TRANSCRIPT

Page 1: Data Management Lab: Session 4 Slides

Research Data Management

Spring 2014: Session 4

Practical strategies for better results

University Library Center for Digital Scholarship

Page 2: Data Management Lab: Session 4 Slides

ETHICAL & LEGAL OBLIGATIONS DATA PROTECTION, RIGHTS, & ACCESS

MODULE 4

Page 3: Data Management Lab: Session 4 Slides

LEARNING OUTCOMES

• Identify your legal obligations for sharing and long-term preservation.
• Identify how ethical and legal obligations affect data protection and sharing.
• Identify core project documents for long-term access and preservation.
• Select appropriate tools and platforms for storing, managing, and preserving data.

Page 4: Data Management Lab: Session 4 Slides

Ethical vs. Legal Obligations

• Ethical (Professional Society, Licensure, Community of Practice)
  – Sharing (consent, IRB approval, de-identification, etc.)
  – Redistribution & Re-use
  – Citation
• Legal (Federal, State, Local, Funding Agency, Institution)
  – Intellectual Property (e.g., who owns it?)
  – Copyright
  – Patents
  – Trade secrets
  – Licensing
  – Monetary exchange
  – Open source vs. proprietary software
  – Data retention

Page 5: Data Management Lab: Session 4 Slides

Federal Laws

• HIPAA in research
• FERPA
• Centers for Medicare and Medicaid Services
  – Requires patient records to be retained for a period of 5 years (see 42 CFR 482.24(b)). Medicaid requirements may vary by state.
• AHIMA
  – Retention of Health Information
  – Federal Record Retention Requirements

Page 6: Data Management Lab: Session 4 Slides

Indiana State Laws

• Data Retention
  – Health care records (Ind. Code § 16-39-7-1)
    • Health care providers must maintain the original health records or microfilms of the records for at least 7 years.
• IU General Counsel Guidance: State Data Protection and Security Laws
  – Social Security Number law: Ind. Code § 4-1-10
  – Security Breach law: Ind. Code § 4-1-11
  – Data Destruction: Ind. Code § 24-4-14

Page 7: Data Management Lab: Session 4 Slides

Indiana University Policies

• Human Subjects Standard Operating Procedures: http://researchadmin.iu.edu/HumanSubjects/hs_policies.html

• Animal Care & Use (IACUC) Policies: http://researchadmin.iu.edu/IACUC/IUPUI/iacuc_policies.html

• Research, Ethics, Education & Policy: http://researchadmin.iu.edu/REEP/reep_policies.html – Code of Federal Regulations (CFR), Copyright, Patents, etc.

Page 8: Data Management Lab: Session 4 Slides

Handling Sensitive Data

• IU Guidelines: http://protect.iu.edu/cybersecurity/data/handling

• Privacy: having control over the extent, timing, and circumstances of sharing oneself (physically, behaviorally, or intellectually) with others.
  – Privacy issues arise in regard to information obtained for research purposes without the consent of the subjects.

• Confidentiality: treatment of information that an individual has disclosed in a relationship of trust and with the expectation that it will not be divulged to others in ways that are inconsistent with the understanding of the original disclosure without permission.

Page 9: Data Management Lab: Session 4 Slides

Protect, Store, Preserve

• Protection
  – Includes storage, backup, archiving, preservation AND physical security, encryption, and other topics
• Backup v. archive
  – Backups (active files): a copy (or copies) of the original file; intended for rapid recovery
  – Archives (selected, static files): long-term preservation of the file, not intended for rapid recovery
• Preservation is archiving PLUS data rescue, reformatting, conversion, metadata to ensure ACCESS
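A backup is only useful for rapid recovery if the copy matches the original. The following is a minimal Python sketch, not an IU-provided tool, of copying a file and confirming the copy with a SHA-256 checksum. The function names and the example file name are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source: Path, backup_dir: Path) -> bool:
    """Copy source into backup_dir and confirm the copy is bit-identical."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    copy_path = backup_dir / source.name
    shutil.copy2(source, copy_path)  # copy2 also preserves timestamps
    return sha256_of(source) == sha256_of(copy_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "survey_responses.csv"  # hypothetical file name
        original.write_text("id,score\n1,42\n")
        print(backup_and_verify(original, Path(tmp) / "backup"))
```

Recording the checksum alongside an archived file also gives a later reader a way to check that the file has not changed since deposit.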

Page 10: Data Management Lab: Session 4 Slides

Deciding what to preserve

Gutman et al, 2004, Data Science Journal

• How significant are the records for research?
• How significant is the source and context of the records?
• Is the information unique?
• How useable are the records?
• Do the records document decisions that set precedents?
• Are the records related to other permanent records?
• What is the time frame covered by the information?
• What are the cost considerations for permanent maintenance of the records?

Page 11: Data Management Lab: Session 4 Slides

IU Resources: Protect, Store, Preserve

• Storage for active files
  – Research File System: http://kb.iu.edu/data/aroz.html
• Collect and store sensitive data @ REDCap
  – http://www.indianactsi.org/rct
• Encryption software @ IUWare
• Preserve
  – open data @ IUPUI DataWorks
  – “dark data” @ Scholarly Data Archive: http://kb.iu.edu/data/aiyi.html

Page 12: Data Management Lab: Session 4 Slides

Documentation for Preservation

• What will you need to reuse the data in 5 years? What will a colleague or student need to understand the data in 5-10 years?
  – Study: research questions/aims, IRB protocol, informed consents/authorizations, etc.
  – Data collection instruments or tools OR data sources
  – Data collection process or workflow
  – Data dictionary or model, codebook, readme.txt
  – Lab or research notebook
  – Processing or analytical scripts
  – Suggested citation

Page 13: Data Management Lab: Session 4 Slides

Metadata for Preservation

• If your data is worth keeping and can be shared, put it in a repository that enables both preservation and sharing
  – Typically, the repository creates metadata to enable discovery and preservation
  – Standards depend on the community
• If it can’t be shared openly, put it in a “dark” repository or storage system
  – IU Scholarly Data Archive
  – Metadata likely not necessary; submit documentation with the data

Page 14: Data Management Lab: Session 4 Slides

Discussion

How do our ethical and legal obligations as researchers affect how we store and protect our data?

Page 15: Data Management Lab: Session 4 Slides

Backups, Archives, and Data Preservation

1. Backup, wikipedia.org, http://en.wikipedia.org/wiki/Backup , (accessed 3/16/2011)

2. Georgia Tech Library, NSF Data Management Plans – Research Data Management (Georgia Tech Library and Information Center), http://libguides.gatech.edu/content.php?pid=123776&sid=1514980 (accessed 3/16/2011)

3. Albanesius, Chloe, Google: Storage software update led to e-mail bug, http://www.pcmag.com/article2/0,2817,2381168,00.asp (accessed 11/18/2011)

4. Van den Eynden, Veerle, Corti, Louise, Woollard, Matthew, Bishop, Libby and Horton, Laurence, Managing and Sharing Data, http://www.data-archive.ac.uk/media/2894/managingsharing.pdf (accessed 4/25/12)

For more information about physical security, encryption, and data disposal, visit: http://www.data-archive.ac.uk/media/2894/managingsharing.pdf

Page 16: Data Management Lab: Session 4 Slides

References

1. UK Data Service: How to share data. From http://ukdataservice.ac.uk/manage-data/plan/how-share.aspx

2. DataONE Education Module: Data Protection Backups. DataONE. Retrieved Nov 12, 2012. From http://www.dataone.org/sites/all/documents/L06_DataProtectionBackups.pptx

3. Gutman, M. P., Schurer, K., Donakowski, D., Beedham, H. (2004). The Selection, Appraisal, and Retention of Social Science Data. Data Science Journal, 3, 209-221. doi: 10.2481/dsj.3.209

Page 17: Data Management Lab: Session 4 Slides

DATA SHARING & REUSE MODULE 4

Page 18: Data Management Lab: Session 4 Slides

LEARNING OUTCOMES

• Evaluate resources for sharing data and openly or publicly available data.

Page 20: Data Management Lab: Session 4 Slides

Why should I care?

What matters is the totality of the evidence.

Ioannidis JPA (2005) Why Most Published Research Findings Are False. PLoS Med 2(8): e124. doi:10.1371/journal.pmed.0020124

Page 21: Data Management Lab: Session 4 Slides
Page 24: Data Management Lab: Session 4 Slides

Common Data Sharing Strategies

• Depositing in a specialist data centre or archive
• Submitting to a journal to support a publication
• Depositing in a self-archiving system or an institutional repository
• Disseminating via a project or institutional website
• Informal peer-to-peer exchange

Source: http://ukdataservice.ac.uk/manage-data/plan/how-share.aspx

Page 25: Data Management Lab: Session 4 Slides

Strategies for Sensitive Data

• Authorization or Informed Consent
• De-identification
  – Statistical deletion or masking of 18 identifiers (HIPAA)
  – Can be reversed
• Anonymization
  – Removing all links to subject so that data cannot be traced back
  – Cannot be reversed
• Limited data set
  – A set of data in which most of the Protected Health Information has been removed.
  – Typically involves a Data Use Agreement
• Restricted Use Data & Data Enclaves
  – ICPSR
  – UK Data Service Secure Lab
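The difference between de-identification (reversible) and anonymization (not reversible) can be sketched in Python. This is an illustrative sketch only and does not by itself satisfy HIPAA or any IRB requirement. The column names in `DIRECT_IDENTIFIERS`, the `subject_id` field, and all function names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical direct-identifier columns; a real project would use its own list.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def pseudonym(subject_id: str, secret_key: bytes) -> str:
    """Derive a stable study code from a subject ID using a keyed hash."""
    mac = hmac.new(secret_key, subject_id.encode("utf-8"), hashlib.sha256)
    return mac.hexdigest()[:12]


def strip_identifiers(record: dict) -> dict:
    """Return a copy of the record without direct identifiers or subject ID."""
    return {key: value for key, value in record.items()
            if key not in DIRECT_IDENTIFIERS and key != "subject_id"}


def deidentify(records: list, secret_key: bytes):
    """Replace subject IDs with study codes and return a linking table.

    Whoever holds the linking table can re-identify subjects, which is
    why this step can be reversed and the table must be stored securely.
    """
    cleaned, link_table = [], {}
    for record in records:
        code = pseudonym(record["subject_id"], secret_key)
        link_table[code] = record["subject_id"]
        cleaned.append({"study_code": code, **strip_identifiers(record)})
    return cleaned, link_table


def anonymize(records: list) -> list:
    """Drop identifiers and keep no linking table, so no link remains."""
    return [strip_identifiers(record) for record in records]


records = [{"subject_id": "S001", "name": "A. Person", "email": "a@example.org", "score": 3}]
cleaned, link_table = deidentify(records, secret_key=b"keep-this-key-secret")
print(cleaned, anonymize(records))
```

Even with identifiers removed, combinations of remaining fields can sometimes identify a person, so a sketch like this is a starting point for discussion with the IRB rather than a complete procedure.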

Page 26: Data Management Lab: Session 4 Slides

The Spectrum of Data Sharing

[Figure: the spectrum of data sharing. Horizontal axis: ACCESS, from DARK to OPEN. Vertical axis: SCOPE OF DATA. Items shown: raw data, cleaned data, limited data, restricted use data, published results.]

Page 27: Data Management Lab: Session 4 Slides

Sharing Considerations

• What: De-identified data, Limited Data Set, Publication-related, All processed data
• With whom: Upon request, Colleagues, Community, Anyone
• Where/How: Secure system, Community resource, Subject repository, Institutional repository
• When: Embargoed, With publication, Immediately

Page 28: Data Management Lab: Session 4 Slides

IU Resources: Access, Sharing, Re-use

• Slashtmp: http://kb.iu.edu/data/angt.html
• Indiana CTSI Resources
  – Alfresco Share, REDCap @ http://www.indianactsi.org/rct
• IU High Performance Storage:
  – Research File System: http://kb.iu.edu/data/aroz.html
  – Scholarly Data Archive: http://kb.iu.edu/data/aiyi.html
• IUPUI DataWorks: http://dataworks.iupui.edu
• Data Dryad: http://datadryad.org/
• Figshare: http://figshare.com/ (Data, tables, figures)
• ICPSR: http://www.icpsr.umich.edu/icpsrweb/deposit/
• IU GitHub: https://github.iu.edu/repositories (Code)

Page 29: Data Management Lab: Session 4 Slides

Activity & Discussion

Explore one of the following repositories.
• ICPSR
• Harvard Dataverse Network
• National Database for Autism Research

Be prepared to discuss the following:
• Is it easy to browse or search for data?
• Is it easy to view and download data and documentation?
• Can you analyze or visualize the data in the repository?

Page 30: Data Management Lab: Session 4 Slides

References

1. DataONE Education Module: Data Management. DataONE. Retrieved December 2013. From http://www.dataone.org/sites/all/documents/L04_DataEntryManipulation.pptx

2. Data Information Specialists Committee (DISC-UK). Data Sharing Continuum: http://www.disc-uk.org/docs/data_sharing_continuum.pdf

Page 31: Data Management Lab: Session 4 Slides

DATA ATTRIBUTION & CITATION MODULE 4

Page 32: Data Management Lab: Session 4 Slides

LEARNING OUTCOMES

• Identify two technologies enabling data citation.

Page 33: Data Management Lab: Session 4 Slides

Unique Identifiers: Items

• DOI (most common for data)
  – Provides an actionable, interoperable, persistent link
  – Actionable: through use of identifier syntax and network resolution mechanism (Handle System®)
  – Persistent: through combination of supporting improved handle infrastructure (registry database, proxy support, etc.) and social infrastructure (obligations by Registration Agencies)
  – Interoperable: through use of a data model providing semantic interoperability and grouping mechanisms
• Data Citation Index (Thomson Reuters)
  – Gathers citation data (often for free) to build a database of data citations
  – Takes in free data and charges institutions for the expensive tool
• Handle
  – a unique URL
  – for example, an item record in a repository
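To illustrate what "actionable" means in practice, a DOI becomes a working link when it is appended to the public resolver at https://doi.org/. The Python sketch below builds such a link; the function name `doi_to_url` and its simple format check are assumptions made for illustration, not part of any DOI specification.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Turn a bare DOI (or one with a 'doi:' prefix) into a resolver URL."""
    cleaned = doi.strip()
    if cleaned.lower().startswith("doi:"):
        cleaned = cleaned[4:].strip()
    # Simplified sanity check: DOIs start with "10." and contain a "/".
    if not cleaned.startswith("10.") or "/" not in cleaned:
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path, keeping "/".
    return DOI_RESOLVER + quote(cleaned, safe="/")


# DOI of the Data Science Journal article cited earlier in these slides.
print(doi_to_url("doi: 10.2481/dsj.3.209"))
```

Because the resolver redirects to wherever the publisher or repository currently hosts the item, the link keeps working even if the hosting URL changes.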

Page 35: Data Management Lab: Session 4 Slides

Unique Identifiers: Authors

• ORCID
  – 10 things
  – Can be used across multiple platforms
• ResearcherID
  – Created by Thomson Reuters
  – limited to their products (e.g., Web of Science)
• Scopus Author ID
  – Created by Elsevier
  – limited to the Scopus database
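An ORCID iD is a 16-character identifier whose last character is a check digit computed with the ISO 7064 MOD 11-2 algorithm, which lets software catch most typing errors. The Python sketch below checks that digit; the function names are hypothetical, and the sketch only validates the format, not whether the iD is actually registered.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for 15 digits."""
    total = 0
    for character in base_digits:
        total = (total + int(character)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the length, digits, and check character of an ORCID iD."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16:
        return False
    if not all(character in "0123456789" for character in compact[:15]):
        return False
    return orcid_check_digit(compact[:15]) == compact[15]


# 0000-0002-1825-0097 is the sample iD used in ORCID's own documentation.
print(is_valid_orcid("0000-0002-1825-0097"))
```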

Page 36: Data Management Lab: Session 4 Slides

What do unique identifiers do?

• Enable increased availability of impact measures
  – Citation metrics (e.g., Google Scholar)
  – Article-level metrics (e.g., PLoS)
  – Altmetrics (social media based: Twitter, ResearchGate, blog mentions, etc.)
  – Web analytics (views, downloads)
• Make it easy to share, cite, and track
• Enable you to be proactive in disseminating your work and gathering evidence of its impact

Page 38: Data Management Lab: Session 4 Slides

Open Data Repositories

• These two will be merged soon:
  – DataBib: http://databib.org/
  – Re3data: http://www.re3data.org/
• DOAR: http://www.opendoar.org/
• NCBI: http://www.ncbi.nlm.nih.gov/
• US Government:
  – https://www.data.gov/
  – https://www.data.gov/open-gov/
  – Healthdata.gov
• Stats Indiana: http://www.stats.indiana.edu/

Page 39: Data Management Lab: Session 4 Slides

Licensed Statistics & Data @ IUPUI

• Linked from http://ulib.iupui.edu/resources/abc/A
  – Business Insights: Essentials
  – Current Index to Statistics
  – Datastream (terminal in UL Reference Room)
  – ICPSR
  – National Archive of Criminal Justice Data
  – ProQuest Statistical Insight Tables
• Statistical Data (research guide, many sources)

Page 40: Data Management Lab: Session 4 Slides

Practical Strategies

• Sign up for an ORCID, especially if
  – You have a common name
  – You have publications in more than one name
• Put your scholarship online where others can find and access it
  – Subject repository
  – Institutional repository
  – Personal portfolio

Page 41: Data Management Lab: Session 4 Slides

Activity & Discussion

How would you cite the following dataset? http://www.icpsr.umich.edu/icpsrweb/NAHDAP/studies/34792/version/1

How can you track citations to your own data?
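As a starting point for the activity, the Python sketch below assembles a dataset citation from its parts in the order creator, year, title, version, publisher, identifier, which follows the general pattern DataCite recommends. The `DatasetRecord` class, the function name, and every value in the example are hypothetical placeholders, not details of the ICPSR study linked above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetRecord:
    """The elements a reader needs to find and credit a dataset."""
    creators: List[str]
    year: int
    title: str
    publisher: str
    identifier: str  # ideally a persistent identifier such as a DOI link
    version: Optional[str] = None


def format_citation(record: DatasetRecord) -> str:
    """Join the elements into a single citation string."""
    parts = [f"{'; '.join(record.creators)} ({record.year}).", f"{record.title}."]
    if record.version:
        parts.append(f"Version {record.version}.")
    parts.append(f"{record.publisher}.")
    parts.append(record.identifier)
    return " ".join(parts)


# All values below are made-up placeholders for illustration.
example = DatasetRecord(
    creators=["Doe, J.", "Roe, R."],
    year=2014,
    title="Example Survey Data",
    publisher="Example Repository",
    identifier="https://doi.org/10.1234/example",
    version="1",
)
print(format_citation(example))
```

Many repositories display a suggested citation on the dataset's landing page; when one is provided, using it verbatim is usually the safer choice.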

Page 42: Data Management Lab: Session 4 Slides

SYNTHESIS & WRAP UP MODULE 4

Page 43: Data Management Lab: Session 4 Slides

LEARNING OUTCOMES • ALL OF THEM!

Page 44: Data Management Lab: Session 4 Slides

Review of Strategies Covered: S1

• Research Data Management
  – Remember the stakes for poor data management
• Data management plans & planning
  – Funding agency, publisher, & legal requirements
  – Plan to make your research more efficient
  – Map data outcomes
• Ethical & legal obligations
  – Identify them before you begin a project
  – Ask for help
• Storage & Backup
  – Choose reliable, secure tools available at IUPUI

Page 45: Data Management Lab: Session 4 Slides

Review of Strategies Covered: S2

• Organizing data & files
  – Create a good file organization plan & write it down
  – Create a good file naming scheme & write it down
  – Be consistent
  – Create master/locked copies of files
• Project & data documentation
  – Identify key documentation & create it
  – Update documentation throughout the project
  – Use standards in your field or community of practice
  – Document for yourself in 5 years
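One way to keep a file naming scheme consistent is to generate names from a single function rather than typing them by hand. The Python sketch below assumes a hypothetical convention of project, content, collection date, and version; it is one possible scheme, not a standard taught in this workshop.

```python
import re
from datetime import date


def slug(text: str) -> str:
    """Lowercase the text and replace runs of other characters with '-'."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_file_name(project: str, content: str, collected: date,
                    version: int, extension: str) -> str:
    """Build a name like project_content_YYYYMMDD_v01.ext (hypothetical scheme)."""
    clean_extension = extension.lstrip(".").lower()
    return (f"{slug(project)}_{slug(content)}_"
            f"{collected:%Y%m%d}_v{version:02d}.{clean_extension}")


print(build_file_name("Sleep Study", "Raw Survey", date(2014, 3, 5), 2, ".CSV"))
```

Putting the date in year-month-day order and zero-padding the version number means that an alphabetical sort of the folder is also a chronological one.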

Page 46: Data Management Lab: Session 4 Slides

Review of Strategies Covered: S3

• Quality assurance & control
  – Identify standards & write them down
  – Develop procedures to ensure data quality & write them down
• Data collection
• Data coding & entry
  – Use best practices
• Data screening & cleaning
  – Develop a protocol or checklist based on the data outcomes map & identified quality standards
• Automation
  – When possible, choose tools with automation features and detailed event logs

Page 47: Data Management Lab: Session 4 Slides

Review of Strategies Covered: S4

• Ethical & legal obligations
  – Know your obligations for data sharing, retention, & preservation
  – Be aware of the options (always changing)
• Data protection, rights, & access
  – Know what resources are available
• Data sharing & re-use
  – Data sharing exists on a spectrum; it is NOT all-or-none
  – Know the benefits of sharing your data
  – Know your options for sharing your data
• Data attribution & citation
  – Know how to cite a dataset
  – Make it easy for others to cite yours

Page 48: Data Management Lab: Session 4 Slides

Activity

Choose 2-3 strategies from all four sessions to improve your own data management (upload a Word document to Box: "Upload HERE: Session 4").

Share & Discuss: Which strategies did you choose? Why?

Page 49: Data Management Lab: Session 4 Slides

NACP Best Data Management Practices, February 3, 2013

Fundamental Data Practices

1. Define the contents of your data files
2. Use consistent data organization
3. Use stable file formats
4. Assign descriptive file names
5. Preserve information
6. Perform basic quality assurance
7. Provide documentation
8. Protect your data

Page 50: Data Management Lab: Session 4 Slides

Final Evaluation

Please complete the evaluation before you leave. Your feedback will be used to improve the workshop for the next group.

Page 51: Data Management Lab: Session 4 Slides