Introduction to RDM for trainee physicians

Stuart Macdonald, Associate Data Librarian, EDINA & Data Library, University of Edinburgh, [email protected]. Introduction to Research Data Management for trainee physicians. Research - an introduction for trainee physicians, Royal College of Physicians of Edinburgh, 28 October 2015.

Uploaded by: edina

Posted on: 15-Apr-2017


TRANSCRIPT

Page 1: Introduction to RDM for trainee physicians

Stuart Macdonald
Associate Data Librarian
EDINA & Data Library
University of Edinburgh
[email protected]

Introduction to Research Data Management for trainee physicians

Research - an introduction for trainee physicians
Royal College of Physicians of Edinburgh
28 October 2015

Background

EDINA and University Data Library (EDL) together are a division within Information Services (IS) of the University of Edinburgh.

EDINA is a Jisc centre for digital expertise providing national online resources for education and research.

The Data Library assists Edinburgh University users in the discovery, access, use and management of research datasets.

Data Library Services: http://www.ed.ac.uk/is/data-library

EDINA: http://edina.ac.uk/

Running order
• Defining research data & data types
• Research Data Management (RDM)
• Funder requirements
• Data management planning
• Organising data
• File formatting
• Documentation & metadata
• Storage & security
• Data protection, rights & access
• Preservation, sharing & licensing

Defining research data

Research data are collected, observed or created for the purposes of analysis to produce and validate original research results.

Data can also be created by researchers for one purpose and used by another set of researchers at a later date for a completely different research agenda.

Digital data can be:
o created in a digital form ('born digital')
o converted to a digital form (digitised)

Types of research data

Research Data Management (RDM)

• RDM is a general term covering how you organise, structure, store, and care for the data used or generated during the lifetime of a research project.

• It includes:
– how you deal with data on a day-to-day basis over the lifetime of a project;
– what happens to data after the project concludes.

• RDM is considered an essential part of good research practice. Good research needs good data!

Activities involved in RDM

• Data management planning
• Creating data
• Documenting data
• Storage and backup
• Sharing data
• Preserving data

Why manage your data?

• So you can find and understand it when needed.
• To avoid unnecessary duplication.
• To validate results if required.
• So your research is visible and has impact.
• To get credit when others cite your work.

Drivers of RDM

“Publicly funded research data are a public good, produced in the public interest, which should be made openly available with as few restrictions as possible in a timely and responsible manner that does not harm intellectual property.” RCUK Common Principles on Data Policy

http://www.rcuk.ac.uk/research/datapolicy/

Funding bodies’ requirements

Funders are increasingly requiring researchers to meet certain data management criteria.

When applying for funding, you need to submit a technical or data management plan.

You are expected to make your data publicly available where appropriate at the end of your project.

What do funders want?

http://www.dcc.ac.uk/resources/policy-and-legal/overview-funders-data-policies

University’s RDM Policy

The University of Edinburgh was one of the first UK universities to adopt a policy for managing research data: http://www.ed.ac.uk/is/research-data-policy

The policy was approved by the University Court on 16 May 2011.

It is acknowledged that this is an aspirational policy and that implementation will take some years.

What is a Data Management Plan (DMP)?

DMPs are written at the start of a project to define:
o What data will be collected or created?
o How will the data be documented and described?
o Where will the data be stored?
o Who will be responsible for data security and backup?
o Which data will be shared and/or preserved?
o How will the data be shared, and with whom?

DMPs are often submitted as part of grant applications, but are useful in their own right whenever you are creating data.

DMPonline

Free and open web-based tool to help researchers write plans: https://dmponline.dcc.ac.uk/

It features:

o Templates based on different funder requirements

o Tailored guidance (disciplinary, funder etc.)

o Customised exports to a variety of formats

o Ability to share DMPs with others

DMPonline screencast: http://www.screenr.com/PJHN

Tips to share

• Keep it simple, short and specific.
• Avoid jargon.
• Seek advice: consult and collaborate.
• Base plans on available skills and support.
• Make sure implementation is feasible.
• Justify any resources or restrictions needed.

Also see: http://www.youtube.com/watch?v=7OJtiA53-Fk

Organising data

Why? To ensure your research data files are identifiable by you and others in the future. Organising and labelling your research data files and folders will help to:
o prevent file loss through overwriting, deleting or misplacing;
o facilitate location and future retrieval;
o save you time (mostly in the future).

How? With a consistent and disciplined approach:
o set conventions at the start of your project;
o adopt an appropriate file naming and versioning convention.
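As one illustration of a naming and versioning convention, the Python sketch below assumes a hypothetical scheme, `YYYYMMDD_project_description_vNN.ext`: the date comes first so files sort chronologically, words are lower-case and joined by hyphens, and the version number is zero-padded. The scheme and the helper names `make_filename` and `next_version` are assumptions for illustration, not a University of Edinburgh standard.

```python
import re
from datetime import date


def _slug(text: str) -> str:
    """Lower-case text and collapse anything other than letters/digits into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def make_filename(project: str, description: str, version: int,
                  ext: str, when: date) -> str:
    """Build a name following the hypothetical YYYYMMDD_project_description_vNN.ext scheme."""
    extension = ext.lstrip(".").lower()
    return f"{when:%Y%m%d}_{_slug(project)}_{_slug(description)}_v{version:02d}.{extension}"


def next_version(filename: str) -> str:
    """Return the same name with its _vNN version number incremented by one."""
    match = re.search(r"_v(\d+)\.", filename)
    if match is None:
        raise ValueError(f"no _vNN version field in {filename!r}")
    width = len(match.group(1))
    bumped = str(int(match.group(1)) + 1).zfill(width)
    return filename[:match.start(1)] + bumped + filename[match.end(1):]


name = make_filename("Asthma audit", "Baseline survey", 1, ".CSV", date(2015, 10, 28))
print(name)                # 20151028_asthma-audit_baseline-survey_v01.csv
print(next_version(name))  # 20151028_asthma-audit_baseline-survey_v02.csv
```

Generating names from one function, rather than typing them by hand, is one way to keep a convention consistent once it has been set at the start of a project.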

File formats

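Recommended formats, and formats to avoid for sharing, by type of data:
• Tabular data. Recommended: CSV, TSV, SPSS portable. Avoid: Excel.
• Text. Recommended: plain text, HTML, RTF; PDF/A only if layout matters. Avoid: Word.
• Media. Recommended: MP4 or Ogg containers; Theora, Dirac or FLAC codecs. Avoid: QuickTime, H264.
• Images. Recommended: TIFF, JPEG2000, PNG. Avoid: GIF, JPG.
• Structured data. Recommended: XML, RDF. Avoid: RDBMS.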

Files are encoded as either text or binary:
• Text encoding: machine- and human-readable, and less likely to become obsolete, e.g. .txt, .csv, .html, .xml, .tex.
• Binary encoding: only readable with the appropriate software, e.g. .fcp, .xlsx, .docx, .psd, .nc.
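Where a project folder already holds many files, a short script can flag extensions from the "avoid" list above before data are deposited or shared. This is a sketch: the helper name `flag_for_sharing` and the suggested alternatives are illustrative assumptions, and an extension check cannot tell which codec (e.g. H264) a media file actually uses.

```python
from pathlib import PurePath

# Extensions from the "Avoid" entries above, mapped to suggested alternatives.
# The mapping is an illustrative assumption; extend it for your own data types.
AVOID_FOR_SHARING = {
    ".xls": "CSV, TSV or SPSS portable",
    ".xlsx": "CSV, TSV or SPSS portable",
    ".doc": "plain text, HTML or RTF (PDF/A only if layout matters)",
    ".docx": "plain text, HTML or RTF (PDF/A only if layout matters)",
    ".mov": "an MP4 or Ogg container",
    ".gif": "TIFF, JPEG2000 or PNG",
    ".jpg": "TIFF, JPEG2000 or PNG",
    ".jpeg": "TIFF, JPEG2000 or PNG",
}


def flag_for_sharing(filenames):
    """Return (filename, suggested alternative) for each file on the avoid list.

    Matching is on the extension only (case-insensitive), so it cannot see
    which codec a media file uses or what a file actually contains.
    """
    flagged = []
    for name in filenames:
        suggestion = AVOID_FOR_SHARING.get(PurePath(name).suffix.lower())
        if suggestion is not None:
            flagged.append((name, suggestion))
    return flagged


for name, suggestion in flag_for_sharing(
        ["results.XLSX", "notes.txt", "scans/ward3.gif", "protocol.docx"]):
    print(f"{name}: consider {suggestion}")
```

To check a real folder, the list of names could come from `[str(p) for p in Path("project_data").rglob("*") if p.is_file()]`, with `Path` imported from `pathlib`; the folder name is hypothetical.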

File formatting

If you need to convert or migrate your data files to another format, be aware of the potential risk of loss or corruption of your data. Always test the files you convert or migrate. You may also use data normalisation, i.e. converting data from one format (e.g. proprietary) into another for use or preservation (e.g. into raw ASCII).
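"Always test the files you convert" can be made concrete by reading the converted file back and comparing it with the source values. The sketch below assumes the data have already been loaded into Python as a list of dicts (for example, exported from a spreadsheet); the function names and the field names in the sample records are hypothetical.

```python
import csv
import io


def to_csv(rows, fieldnames):
    """Serialise a list of dicts as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def check_round_trip(rows, fieldnames, csv_text):
    """Re-read converted CSV and list every value that did not survive.

    CSV stores everything as text, so each original value is compared via
    str(); mismatches are returned as (row number, field, original, re-read).
    """
    reread = list(csv.DictReader(io.StringIO(csv_text)))
    if len(reread) != len(rows):
        return [("row count", None, len(rows), len(reread))]
    problems = []
    for number, (before, after) in enumerate(zip(rows, reread), start=1):
        for field in fieldnames:
            if str(before[field]) != after[field]:
                problems.append((number, field, before[field], after[field]))
    return problems


# Hypothetical records, e.g. as loaded from a spreadsheet before conversion.
records = [
    {"id": 1, "hb_g_per_l": 135, "note": "fasting, am"},
    {"id": 2, "hb_g_per_l": None, "note": ""},
]
fields = ["id", "hb_g_per_l", "note"]
print(check_round_trip(records, fields, to_csv(records, fields)))
# [(2, 'hb_g_per_l', None, '')]
```

Here the check exposes that CSV cannot distinguish a missing value (`None`) from an empty string: exactly the kind of silent change worth catching and then documenting.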

When compressing your data files (for storage, sending or sharing), you encode the information using fewer bits than the original representation. Tools such as Zip, gzip and bzip2 (the latter two usually combined with tar, which bundles files without compressing them) produce files such as .zip, .tar.gz and .tar.bz2.
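The compression used for research data should be lossless: decompressing must give back exactly the original bytes. A minimal Python sketch, using only standard-library compressors, compares sizes and checks that round trip; the sample data are synthetic and the function name `compressed_sizes` is hypothetical.

```python
import bz2
import gzip
import zlib

# Three lossless compressors from the Python standard library.
CODECS = {
    "gzip (.gz)": (gzip.compress, gzip.decompress),
    "bzip2 (.bz2)": (bz2.compress, bz2.decompress),
    "zlib (DEFLATE, the method most .zip files use)": (zlib.compress, zlib.decompress),
}


def compressed_sizes(data: bytes) -> dict:
    """Compress data with each codec, confirm it decompresses to identical
    bytes (i.e. nothing was lost), and return the compressed sizes."""
    sizes = {}
    for name, (compress, decompress) in CODECS.items():
        packed = compress(data)
        if decompress(packed) != data:
            raise ValueError(f"{name} did not round-trip")
        sizes[name] = len(packed)
    return sizes


# Synthetic, highly repetitive CSV-like text, which compresses very well.
sample = "".join(f"{i:04d},120,80\n" for i in range(1000)).encode("ascii")
print(len(sample), "bytes before compression")
for name, size in compressed_sizes(sample).items():
    print(f"{name}: {size} bytes")
```

To bundle a whole folder first, `tarfile.open("archive.tar.gz", "w:gz")` (or mode `"w:bz2"`) followed by `.add(folder)` produces the .tar.gz or .tar.bz2 files mentioned above; the archive name is a placeholder.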

Documentation and metadata

Documentation (intended for reading by humans)
Contextual information
o aims & objectives of the originating project
Explanatory material
o data source
o collection methodology & process
o questionnaire, codebook
o dataset structure
o technical information

Metadata (intended for reading by machines)
‘Data about data’: descriptors to facilitate cataloguing and discoverability.
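To make "intended for reading by machines" concrete, the sketch below builds a small metadata record whose keys are restricted to the 15 elements of the Dublin Core Metadata Element Set. The JSON layout, the `dc:` prefix and all values are assumptions for illustration; a repository's own deposit form or schema takes precedence.

```python
import json

# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher", "contributor",
    "date", "type", "format", "identifier", "source", "language",
    "relation", "coverage", "rights",
}


def make_record(**fields: str) -> str:
    """Return a JSON metadata record, rejecting keys that are not DC elements."""
    unknown = set(fields) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    record = {f"dc:{key}": value for key, value in sorted(fields.items())}
    return json.dumps(record, indent=2, ensure_ascii=False)


print(make_record(
    title="Baseline survey of asthma clinic attendance",  # placeholder values
    creator="Example, A.",
    date="2015-10-28",
    format="text/csv",
    rights="CC BY 4.0",
))
```

Restricting keys to a published element set is what lets catalogues and search services interpret the record without reading any accompanying prose.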

What it does

Documentation facilitates understanding and interpretation of your data.
o At project level, it explains the background to the research that produced the data and its methodologies.
o At file or database level, it describes the files' formats and their relationships with each other.
o At variable or item level, it supplies the background to the variables and their descriptions.

Metadata provides context for your data, particularly for those outside your research environment, discipline and institution.

Tracks its provenance.

Makes your data discoverable.

Makes your data easier to use.

Helps support the archiving and preservation of your data.
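Variable-level documentation is often kept as a codebook. The sketch below drafts one from a CSV extract: it lists each variable with a human-written description, a crude type guess and a count of empty values. The function names, the "TODO: describe" placeholder and the sample data are hypothetical; the type guess only distinguishes integer, decimal and text.

```python
import csv
import io


def infer_type(values):
    """Crude guess at a variable's type from its non-empty text values."""
    present = [v for v in values if v != ""]
    if not present:
        return "unknown"
    for label, convert in (("integer", int), ("decimal", float)):
        try:
            for value in present:
                convert(value)
            return label
        except ValueError:
            continue
    return "text"


def build_codebook(csv_text, descriptions):
    """One entry per variable: name, human-written description, guessed type,
    and the number of empty (missing) values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    codebook = []
    for name in reader.fieldnames:
        values = [row[name] or "" for row in rows]  # short rows give None; treat as missing
        codebook.append({
            "variable": name,
            "description": descriptions.get(name, "TODO: describe"),
            "type": infer_type(values),
            "missing": values.count(""),
        })
    return codebook


data = "id,age,smoker\n1,54,yes\n2,,no\n3,61,\n"  # hypothetical extract
for entry in build_codebook(data, {"id": "Participant study number",
                                   "age": "Age in years at enrolment"}):
    print(entry)
```

The script only drafts the mechanical parts; descriptions, units and coding of values still have to be written by someone who knows the study.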

Why it is necessary

To help you …
o remember the details of your data
o archive your data for future access & re-use

To help others …
o discover your data
o understand the aims and conduct of the originating research
o verify your findings
o replicate your results

Data Storage - basic principles

Use managed network services whenever possible to ensure:
o regular back-up
o data security
o accessibility

Avoid using portable hard drives, USB memory sticks, CDs or DVDs, to avoid:
o data loss due to damage or failure
o quality control issues due to version confusion
o unnecessary security risks, e.g. theft

Digital Preservation Coalition’s new promotional USB stick: https://twitter.com/digitalfay/status/411444578122600450/photo/1

Secure storage & backup

Make at least 3 copies of the data:
o on at least 2 different media;
o keep storage devices in separate locations, with at least 1 offsite;
o check they work regularly;
o ensure you know the back-up procedure and follow it.

Ensure you can keep track of different versions of data, especially when backing up to multiple devices.
o Use version control software, e.g. Subversion (SVN) or its Windows client TortoiseSVN.

One copy = risk of data loss

• CC image by Sharyn Morrow on Flickr
• CC image by momboleum on Flickr
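The advice to check that copies "work regularly" can be partly automated: comparing a cryptographic checksum (here SHA-256, via Python's `hashlib`) of each copy against the original detects missing or silently corrupted copies. This is a minimal sketch; the function names are hypothetical, and it confirms only that the bytes match, not that the back-up procedure as a whole is sound.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copies(original: Path, copies) -> dict:
    """Compare each back-up copy against the original's checksum.

    Returns {copy path: status}, where status is 'ok', 'MISMATCH' or 'MISSING'.
    """
    expected = sha256_of(original)
    report = {}
    for copy in copies:
        if not copy.is_file():
            report[str(copy)] = "MISSING"
        elif sha256_of(copy) != expected:
            report[str(copy)] = "MISMATCH"
        else:
            report[str(copy)] = "ok"
    return report
```

For example, `verify_copies(Path("data/survey_v01.csv"), [Path("E:/backup/survey_v01.csv"), Path("//server/share/survey_v01.csv")])` would report on one local and one network copy; all three paths are hypothetical.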

Keeping sensitive data secure

Ensure PCs, laptops and portable data storage devices are stored securely and encrypted if necessary, e.g. with BitLocker (Windows) or FileVault (Mac).

Be aware that encrypted data will be lost if the password/encryption key is lost or if the hard disk fails.

Give access to data to authorised people only

System lock: Image by Yuri Yu. Samoilov - Flickr (CC-BY) https://www.flickr.com/photos/110751683@N02/

Data disposal

Dispose of confidential data securely:
o Hard drives: use secure-erasing software such as BCWipe, Wipe File, DeleteOnClick or Eraser on Windows; ‘Secure Empty Trash’ on Mac.
o USB drives: physical destruction is the only way.
o Paper and CDs/optical discs: shredding.

UoE has a comprehensive guide on the disposal of confidential and/or sensitive waste held on paper, CDs, DVDs, tapes, discs, hard drives, etc.:

http://www.ed.ac.uk/schools-departments/estates-buildings/waste-recycling/how/confidential-waste

Things to think about …

• Ethics: requirements for data relating to human subjects
• Privacy, confidentiality & disclosure
• Data protection
• Intellectual Property Rights (IPR)
• Copyright

Ethics

Ethics committees 

Review research applications and advise on whether they are ethical.

Safeguard the rights of research participants.

Participants  

Must be fully informed as to the purpose and intended uses of the research, and advised of what their involvement will entail.

Participation must be voluntary, fully informed and free of any coercion.

Confidentiality of information collected and anonymity of subjects must be respected at all times.

 

Privacy, confidentiality & disclosure

Privacy: an entitlement of an individual subject. Handling, storage and sharing of data must be managed to preserve the privacy of the subject.

Confidentiality: refers to the behaviour of the researcher, whereby the privacy of the subject is maintained at all times.

Disclosure: must be guarded against! There are various techniques to avoid it, whether for ethical, legal or commercial reasons, e.g.
o removing identifiers from personal information (e.g. D.o.B., Nat. Ins. No.)
o aggregating geographical data to reduce precision
o anonymising data, but without overdoing it!
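Two of the techniques above, removing identifiers and aggregating geographical data, can be sketched in a few lines. The field names are hypothetical, and these steps reduce rather than remove re-identification risk: how far to go is a judgement that balances disclosure risk against the usefulness of the data.

```python
from datetime import date

# Direct identifiers to drop entirely; these field names are hypothetical.
DIRECT_IDENTIFIERS = {"name", "nhs_number", "ni_number", "address"}


def deidentify(record: dict) -> dict:
    """Apply three basic disclosure-control steps to one record.

    1. Drop direct identifiers.
    2. Reduce date of birth to year of birth.
    3. Aggregate a full UK postcode to its outward code (district),
       e.g. 'EH8 9YL' -> 'EH8'.
    """
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "dob" in out:
        out["birth_year"] = out.pop("dob").year
    if "postcode" in out:
        compact = "".join(out.pop("postcode").split()).upper()
        # The inward code of a UK postcode is always the last three characters.
        out["postcode_district"] = compact[:-3]
    return out


patient = {"name": "A. Patient", "ni_number": "QQ123456C",
           "dob": date(1961, 3, 14), "postcode": "EH8 9YL", "fev1_litres": 2.9}
print(deidentify(patient))
# {'fev1_litres': 2.9, 'birth_year': 1961, 'postcode_district': 'EH8'}
```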

Data protection

The Data Protection Act 1998 is a Parliamentary Act defining UK law on the processing of data on identifiable living people.

It is the main piece of legislation that governs the protection of personal data in the UK.

Research data fall within the scope of this Act.

Failure to observe it can result in:
o monetary penalty notices
o prosecutions
o enforcement notices
o audits without consent

Intellectual Property Rights (IPR)

Legally recognised exclusive rights and protection given to persons for ‘creations of the mind’.

IPR grants creators exclusive rights to:
o publish a work
o license its distribution to others
o sue if unlawful copies are made or unlawful use is made of it

Copyright

Can be contentious & complex! When data are archived or shared, the creator retains copyright.

A database whose structure results from intellectual investment carries an additional ‘database right’, which can sit alongside the copyright attached to the data contents.

Freedom of information

The Freedom of Information Act 2000 …
… gives a right of access to information held by ‘public authorities’, which include most universities;
… covers all records and information held by them, whether digital or print, current or archived.

Some research data are exempt (e.g. data about human subjects, data involving commercial partners, data relating to national security).

Data preservation

Preservation is key to the long-term existence and future accessibility of research data …
… by the original creator (yourself)
… by future researchers
… by any other person

Storage and access media (formats, hardware, software) …
… are superseded
… fail (software/hardware)
… deteriorate

It is worth thinking about preservation at the planning stage.

Mapping the preservation process: workflow devised by S. Higgins, DCC (Digital Curation Centre)

Data preservation …

… requires a trusted repository.

Research funders
o ESRC data store: http://store.data-archive.ac.uk/store/
o Zenodo (EU): https://zenodo.org/

Institutional (UoE)
o Edinburgh DataShare: http://datashare.is.ed.ac.uk/

Discipline-specific
o Archaeology Data Service: http://archaeologydataservice.ac.uk/

Discipline-agnostic
o Figshare: http://figshare.com/

Data sharing …

… is making your research available for others to reuse and build upon.

Benefits for the researcher:
o comply with funder requirements
o research can be validated
o increase impact through citation (reputation)
o increase visibility of research
o long-term data storage (preservation)
o enables future re-use (you & others)

Benefits for research & society:
o avoid duplication of effort & resources
o publicly funded research is available
o academic & scientific integrity: increases transparency & accountability, facilitates scrutiny of research findings, prevents fraud
o extend reach of original research
o fosters collaboration

Barriers to sharing

“Scientists would rather share their toothbrush than their data!” Carol Goble, keynote address, EGEE (Enabling Grids for E-sciencE) ’06 Conference

Valid reasons not to share:
o research conducted in clinical settings (e.g. clinical trials)
o research that includes confidential data pertaining to human subjects
o research for national security (e.g. with the MoD)
o research with commercial partners to develop patents (e.g. for drug development)

Future ‘share-ability’ of the data, issues to consider:
o format, software, documentation
o ethics, consent & confidentiality, anonymisation
o timescale for release (embargo)
o infrastructure for sharing
o rights & licensing

http://openclipart.org/detail/172856/toothbrush-by-bpcomp-172856

Data licensing

Why?

A licence explicitly states how your data may be used.

Makes them available to others (where appropriate)

Ensures your data are open!

How?

Repository rights statement

Creative Commons (CC): http://wiki.creativecommons.org

Open Data Commons (ODC): http://opendatacommons.org/

Thank You!

Questions?

Email: [email protected]