data management booklet...the rural economy and land use programme, drawing on best practice in data...

MANAGING AND SHARING DATA a best practice guide for researchers


Page 1: Data Management Booklet...The Rural Economy and Land Use programme, drawing on best practice in data management and sharing across three research councils (ESRC, NERC and BBSRC), requires

MANAGING AND SHARING DATA

a best practice guide for researchers


FOREWORD

Good data management is the foundation for good research. If data are properly organised and preserved, and their accuracy and integrity are controlled at all times, the result is high quality data, efficient research and the saving of time and resources. Researchers themselves benefit greatly from properly managing their research data. Data management should be planned from the start of research. If it becomes part of standard research practice, then it need not incur additional time or costs.

When it comes to sharing research data, good management is essential to ensure that data can be preserved and remain accessible in the long-term, so they can be re-used and understood by other researchers. When managed and preserved properly, research data can be successfully re-used for future scientific and educational purposes, thus maximising the investment made in generating the data and increasing the visibility of the research.

Data management primarily occurs within the lifecycle of a research project and is ideally carried out by all members of the research team. Digital preservation, which enables long-term data sharing, is often carried out by a specialised data archive or centre. The value of the data to be preserved depends on the quality and efficiency of the data management during research. The data management information provided in this booklet is designed to help researchers and data managers across all research disciplines and research environments make sure that research data are of the highest quality and have the greatest potential for long-term re-use.

It is recognised that different types of data created and managed across the research discipline spectrum may require certain discipline-specific approaches to data managing and sharing; and that data centres may differ in their approach to specific data management and preservation issues. This guidance has been written for researchers spanning the natural and social sciences and humanities. Expertise has been taken from the Data Support Service of the interdisciplinary Rural Economy and Land Use programme, funded by the Economic and Social Research Council (ESRC), Natural Environment Research Council (NERC) and Biotechnology and Biological Sciences Research Council (BBSRC). The guidance has been reviewed and commented upon by data management experts from the NERC, the NERC Environmental Bioinformatics Centre (NEBC), the British Library (BL), the Research Information Network (RIN), the Archaeology Data Service (ADS), the History Data Service (HDS), the London School of Economics Research Laboratory and the Wellcome Trust. The UKDA thanks them for their valuable comments.

Key areas of advice are:

• data documentation and metadata

• data formats and software

• data storage, back-up and security

• research ethics, consent and data confidentiality

• copyright

This printed guide is complemented by detailed and practical online information, available from the UK Data Archive web site.1

The UKDA also provides training and workshops on data management and sharing, including bespoke advice: [email protected] or +44 (0)1206 872572/872974

Libby Bishop, Matthew Woollard, Veerle Van den Eynden, Louise Corti


INDEX

Foreword

Data lifecycle

Sharing data - why and how
    Why share research data?
    How to share data

Data documentation and metadata
    Data documentation
    Metadata

Data formats and software
    Choice of formats and conversions
    Transcription
    Data quality control
    Version control
    Authenticity

Data storage, back-up and security
    Making back-ups
    Data storage
    Data security
    Encrypting data for transmission

Research ethics, consent and data confidentiality
    Personal, confidential and sensitive personal data
    Informed consent and data sharing
    Written or verbal consent?
    One-off or process consent?
    Anonymising data
    Access control

Copyright

Links and references


Research Laboratory

Endorsed by:

© 2009 University of Essex

First published 2009

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form orby any means, electronic, mechanical, photocopying, recording or otherwise, without written permission from the publisher.

Published by:
UK Data Archive
University of Essex
Wivenhoe Park
Colchester
Essex
CO4 3SQ

ISBN: 1-904059-66-X

Printed by University of Essex Printing Services


DATA LIFECYCLE

Start of research project/funding

Data creation

• planning

• data collection (survey, experiment, measurement etc.)

• data entry or digitisation

• data checking and cleaning

Data analysis

• analysis

• derived data creation

• creation of data documentation

End of project

• research outputs

• preparing data for preservation

End of research project/funding

Preservation of data

• storage of data

• migration to suitable format/medium

• metadata creation

Distribution/publication of data

Re-use of data

• by same researcher

• by other researchers


Data management plans

The Rural Economy and Land Use programme, drawing on best practice in data management and sharing across three research councils (ESRC, NERC and BBSRC), requires all funded projects to develop and implement a data management plan2 to ensure that data are well managed throughout the duration of a research project. In a data management plan researchers describe:

• the need for access to existing data sources

• data planned to be produced by the research project

• planned quality assurance and back-up procedures

• plans for management and archiving of collected data

• expected difficulties in making data available for secondary research and measures to overcome such difficulties

• who holds copyright and Intellectual Property Rights of the data

• who has data management responsibility roles within the research team

As part of its policy on data management and sharing3, the Wellcome Trust expects applicants to supply a data management and sharing plan if the goal of the proposal is to create or develop a research resource for the benefit of the research community; or if the proposal involves the generation of a significant quantity of data that could potentially be shared for added benefit. The plan should outline:

• how the data will be made available to the wider community

• the proposed timeframe of data sharing

• data quality and standards

• the use of public data repositories (which the Trust expects)

• intellectual property of the data

• protection of research participants and possible limitations on data sharing

• long-term preservation and sustainability strategy

Why share research data?

Research data are a valuable resource, usually requiring much time and money to be produced. Many datasets have a significant value beyond the original research. Sharing research data:

• encourages scientific enquiry and debate

• enables scrutiny of research outcomes

• facilitates research beyond the scope of the original research

• leads to new collaborations between data users and data creators

• reduces the cost of duplicating data collection

• provides important resources for education and training

• encourages the improvement and validation of research methods

• promotes the research that created the data and its outcomes

• can provide a direct credit to the researcher as a research output in its own right

The ease with which digital data can be stored, disseminated and made accessible to secondary users via the internet means that many institutions embrace the sharing of research data to increase the impact and visibility of their research.

Many research funders increasingly follow guidance from the Organisation for Economic Co-operation and Development that publicly funded research data should be openly available to the scientific community to the maximum extent possible. They have adopted data sharing policies and encourage or oblige researchers to share research datasets and outputs. Data sharing policies allow researchers exclusive data use for a reasonable time period in which to publish the results of the data.

In the UK, the Economic and Social Research Council (ESRC), the Natural Environment Research Council (NERC) and the British Academy contractually require researchers to offer all research data resulting from their grants to designated data centres - UK Data Archive and NERC data centres. In addition, the Biotechnology and Biological Sciences Research Council (BBSRC), the Medical Research Council (MRC) and the Wellcome Trust now all have data policies which encourage researchers to share their research data in a timely manner, with as few restrictions as possible.

SHARING DATA - WHY AND HOW


How to share data

Research data can be shared:

• by depositing data with a specialist data centre or archive

• by depositing data in an institutional repository

• online via a project or institutional web site

• informally between researchers on a peer-to-peer basis

Sharing data by any of these means is valuable. Exact approaches to data sharing may vary according to different research environments and disciplines, due to the varying nature of data types and their characteristics. Ensuring the sustainability of data resources is an important aspect to consider.

Depositing data with a specialist data centre has additional advantages:

• safe-keeping of research data in a secure environment

• helping to ensure that data meet set quality thresholds

• ability to regulate and control access to data when needed (e.g. limiting data access to researchers only or restricting access to confidential data)

• licensing arrangements to acknowledge data rights

• long-term preservation in a standardised data format that remains accessible

• regular data back-ups

• conversion of data formats when needed due to software upgrades or changes

• online resource discovery of data through data catalogues

• standardised citation mechanism to acknowledge data ownership

• ensure efficiency of resource use, avoiding duplication of efforts

• promotion of data to users

• easy data dissemination to many users

• monitoring of the secondary usage of data

• management of access and user queries on behalf of the data owner

Data centres may apply certain criteria to evaluate and select datasets for preservation.

HOW TO SHARE DATA

Data centres where researchers can deposit data for preservation and distribution

Antarctic Environmental Data Centre

Archaeology Data Service

Biomedical Informatics Research Network Data Repository

British Atmospheric Data Centre

British Library

British Oceanographic Data Centre

Cambridge Crystallographic Data Centre

Environmental Information Data Centre

European Bioinformatics Institute

Geospatial Repository for Academic Deposit and Extraction

History Data Service

Infrared Space Observatory

National Biodiversity Network

National Geoscience Data Centre

NERC Earth Observation Data Centre

NERC Environmental Bioinformatics Centre

Publishing Network for Geoscientific & Environmental Data

Scran

The Oxford Text Archive

UK Data Archive

UK PubMed Central

UK Solar System Data Centre

Visual Arts Data Service


A crucial part of making data user-friendly, shareable and with long-lasting usability is ensuring that they can be understood, interpreted and used at all times. This requires clear and detailed data description and documentation.

Data documentation

Comprehensive data documentation is easiest when begun at the onset of a project and continued throughout the research process. It should be considered as part of best practice in terms of organising and managing data.

Data documentation explains how data were created or digitised, what data mean, what their content and structure is, and any manipulations that may have taken place. It ensures that data can be understood within research teams and that researchers will continue to understand their own data in the long term. Good documentation is also vital for any successful data preservation.

Good data documentation includes information on:

• the context of data collection: project history, aims, objectives, hypotheses, etc.

• data collection methods: data collection process, sampling design, instruments, hardware and software used, questionnaire used, scale and resolution, temporal coverage and geographic coverage

• dataset structure: data files, cases, relationships between files, etc.

• data sources used

• data validation, checking, proofing, cleaning and other quality assurance procedures carried out

• modifications made to data over time since their original creation and identification of different versions of datasets

• where applicable information on data confidentiality and consent agreements made

Numerical, tabular data should also be documented with:

• names, labels and descriptions for variables, fields, records and their values

• explanation of codes and classifications schemes used, with reference to published classifications where appropriate

• codes of, and reasons for, missing values

• derived data created after collection, with well documented code, algorithm or command file used to create them

• weighting and grossing variables created

Qualitative data require as documentation:

• data listing of biographical characteristics of interviewees

• listing and descriptions of image or sound files

Variable-level descriptions may be embedded within a dataset itself as metadata. Other documentation may be contained in user guides, reports, publications, working papers and laboratory books.
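As an illustration only, variable-level documentation of the kind listed above could be held alongside the data as a small machine-readable codebook. The sketch below is not taken from any real dataset: the variable names, labels, codes and the missing-value code -9 are all hypothetical.

```python
# Hypothetical codebook: every variable name, label, code and the
# missing-value code (-9) below is an illustrative assumption.
CODEBOOK = {
    "age": {
        "label": "Age of respondent in years",
        "missing": {-9: "not answered"},
    },
    "marstat": {
        "label": "Marital status",
        "values": {1: "single", 2: "married", 3: "widowed"},
        "missing": {-9: "not answered"},
    },
}


def describe(variable, value):
    """Return a human-readable description of one coded value."""
    entry = CODEBOOK[variable]
    # Missing-value codes are reported with the reason they were assigned.
    if value in entry.get("missing", {}):
        return f"{entry['label']}: missing ({entry['missing'][value]})"
    # Coded values are translated to their label; uncoded values pass through.
    text = entry.get("values", {}).get(value, value)
    return f"{entry['label']}: {text}"


print(describe("marstat", 3))
print(describe("age", -9))
```

Keeping labels, code lists and missing-value reasons in one structure means the same information can be exported with the data file rather than living only in a separate report.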


DATA DOCUMENTATION AND METADATA

Case study

The Stockholm Environmental Institute has created an integrated spatial dataset4, Social and Environmental Conditions in Rural Areas (SECRA), containing socio-economic and environmental characteristics of all rural Census 2001 Super Output Areas. This dataset is available online as an MS Access database and can be downloaded together with accompanying metadata and documentation files that clearly describe the data.

The dataset is organised in four data themes - natural and constructed features, qualities of people and place, living and working, and political and economic context. The dataset is documented in detail by:

• four data list files providing descriptions of all variables within each data theme

• four metadata files describing how each variable was constructed and calculated, with references for the data sources used

• a detailed report describing the rationale for the dataset, the relevance of variables to rural conditions, the methodology used, an overview of variables included per data theme and examples of how the dataset may be used

All data files and metadata files are clearly labelled according to the four data themes.


Example of online study-level documentation available for a dataset in the UKDA catalogue

Format  Name                              Size (KB)  Description
PDF     guide.pdf                         3557       User guide
PDF     method.pdf                        509        Methodology
PDF     source.pdf                        41         Example of sources
HTML    UKDA Study 4177 Information.htm   18         Study information and citation

Metadata

In the context of data management, metadata are a subset of core data documentation, which provides standardised structured information explaining the purpose, origin, time references, geographic location, creator, access conditions and terms of use of a dataset. Metadata are typically used:

• for resource discovery, providing searchable information that helps users to find existing data

• as a bibliographic record for citation

Metadata for online data catalogues or discovery portals are often structured to international standards or schemes such as Dublin Core, ISO 19115 for geographic information, Data Documentation Initiative (DDI), Metadata Encoding and Transmission Standard (METS) and General International Standard Archival Description (ISAD(G)).

The use of standardised records in eXtensible Mark-up Language (XML) brings key data documentation together into a single document, creating rich and structured content about the data. Metadata can be viewed with web browsers, can be used for extract and analysis engines and can enable field-specific searching. Disparate catalogues can be shared and interactive browsing tools can be applied. In addition, metadata can be harvested for data sharing through the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).
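To make the idea of a standardised XML record concrete, the following minimal sketch builds a record using five elements from the Dublin Core element set. It is an illustration under stated assumptions, not a data centre's deposit format: the wrapper element name `metadata` and all field values are hypothetical, and a real catalogue record would follow the schema required by the receiving archive.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"


def dublin_core_record(fields):
    """Build a simple XML metadata record from a dict of Dublin Core elements."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name, value in fields.items():
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# All values below are hypothetical and describe no real dataset.
record = dublin_core_record({
    "title": "Example survey of rural households",
    "creator": "A. Person",
    "date": "2009",
    "coverage": "North-East England",
    "rights": "Available to registered researchers",
})
print(record)
```

Because each piece of information sits in its own named element, a catalogue can search on a single field (for example `dc:coverage`) rather than on free text.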

Researchers typically create metadata records for their data by completing a data centre’s data deposit form or by using a metadata creation tool, such as Go Geo GeoDoc or MetaGenie. Providing detailed and meaningful dataset titles, descriptions and keywords, etc. enables data centres to create rich metadata for archived datasets. This should enable more comprehensible resource discovery and data that are easier to use.

Data centres accompany each dataset with a bibliographic citation that data users are required to state in research outputs to reference and acknowledge accurately the data source used. A citation gives credit to the data source and distributor and identifies data sources for validation.


Choice of formats and conversions

The selection of software used to collect, create and digitise data typically depends on how researchers wish to analyse data. When considering the long-term usability of data, attention needs to be given to the most appropriate software and data format to use.

All digital information is designed to be interpreted by computer programs to make it understandable and is by nature software dependent. All digital data are thus endangered by the loss and obsolescence of the hardware and software environment on which access to data depends.

Despite the backward compatibility of many software packages to import data created in previous software versions and the interoperability between competing popular software programs, the safest option to guarantee long-term data access is to convert data to standard formats that most software are capable of interpreting, and that are suitable for data interchange and transformation. This typically means using open formats - such as Rich Text Format (RTF) or Open Document Format (ODF) - or as open as possible, as opposed to proprietary ones.

Thus, whilst researchers use the most suitable data formats and software according to their planned analyses, once data analysis is completed and data are prepared to be stored, researchers should consider converting their research data to standard, interchangeable formats, in order to avoid being unable to use the data in the future. Equally for back-ups of data, standard formats should be considered.

For long-term digital preservation, data archives also hold data in such standard formats. At the same time, data are offered to users by conversion to current common and user-friendly data formats.

When researchers offer data to data archives for preservation, it is preferable for the researchers themselves to convert data to a preferred data preservation format, as the person knowing the data is in the best position to ensure data integrity during conversions. Advice should be sought on up-to-date formats, as software changes occur so quickly.

When data are converted from one format to another - through export or by using data translation software - certain changes may occur to the data. For data held in spreadsheets or databases, some data or internal metadata may be lost during conversions to another format, e.g. missing value definitions, decimal numbers or variable labels. For textual data, editing such as highlighting or bold text may be lost. After software conversions, data should therefore be checked for errors or changes that may be caused by the export process. The researchers knowing the data are in the best position to carry out such conversions.
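Part of the post-conversion check described above can be automated. The sketch below is one possible approach, assuming a hypothetical three-field dataset held as Python dictionaries: it exports the records to comma-delimited text, reads them back, and reports every value that no longer matches the original. The field names and values are invented for illustration.

```python
import csv
import io


def export_csv(rows, fieldnames):
    """Export records to comma-delimited text, as a conversion step would."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def conversion_differences(original, csv_text):
    """List (row, field, original value, converted value) for each mismatch."""
    converted = list(csv.DictReader(io.StringIO(csv_text)))
    diffs = []
    for i, (before, after) in enumerate(zip(original, converted)):
        for field, value in before.items():
            if str(value) != after[field]:
                diffs.append((i, field, value, after[field]))
    if len(original) != len(converted):
        diffs.append((None, "row count", len(original), len(converted)))
    return diffs


# Hypothetical record: 'status' is undefined (None) in the source data.
rows = [{"id": 1, "height_m": 1.75, "status": None}]
text = export_csv(rows, ["id", "height_m", "status"])
diffs = conversion_differences(rows, text)
print(diffs)
```

In this example the undefined `status` value is written as an empty field, so the comparison flags it: the kind of silent loss of a missing-value definition that the text warns about.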

DATA FORMATS AND SOFTWARE

Case study

The Wessex Archaeology Metric Archive Project5 has brought together metric animal bone data from a range of archaeological sites in England into a single database format. The dataset contains a selection of measurements commonly taken during Wessex Archaeology zooarchaeological analysis of animal bone fragments found during field investigations. The dataset was created by the researchers in MS Excel and MS Access formats and deposited with the Archaeology Data Service (ADS) in the same formats. ADS has preserved the dataset in Oracle and in CSV format and disseminates the data both via an Oracle/Cold Fusion live interface and as downloadable CSV files.


Data formats currently recommended by UKDA for long-term preservation of research data

Note that other data centres or digital archives may recommend different formats

Type of data: Quantitative tabular data with extensive metadata, i.e. a dataset with variable labels, code labels, and defined missing values, in addition to the matrix of data
Preferred format for management back-ups and data preservation: SPSS portable (.por) format, or delimited text and command (‘setup’) file (SPSS, Stata, SAS, etc.) containing metadata information, other structured text/mark-up file containing metadata information e.g. DDI XML file
Other acceptable formats for data preservation: proprietary formats of statistical packages, e.g. SPSS (.sav), Stata (.dta)

Type of data: Quantitative tabular data with minimal metadata, i.e. a matrix of data with or without column headings/variable names, but no other metadata or labelling
Preferred format for management back-ups and data preservation: comma-delimited (.csv) or tab-delimited (.tab) files, including delimited text of given character set with SQL data definition statements where appropriate - these are most widely used, and most widely recognised by import ‘wizards’
Other acceptable formats for data preservation: delimited text of given character set - only characters not present in the data should be used as delimiters, widely-used formats e.g. MS Excel (.xls/.xlsx), MS Access (.mdb/.accdb), dBase (.dbf) and OpenDocument Spreadsheet (.ods)

Type of data: GIS and CAD data, i.e. vector and raster
Preferred format for management back-ups and data preservation: ESRI Shapefile (.shp, .shx and .dbf), geo-referenced TIFF (.tif and .tfw), CAD data (.dwg), GIS attribute data - ‘tabular data with minimal metadata’
Other acceptable formats for data preservation: MapInfo Interchange Format (.mif) for vector data, Keyhole Markup Language (.KML) as used for Google Earth, Google Maps, Adobe Illustrator, CAD data (.dxf or .svg), binary formats of GIS and CAD packages may be acceptable

Type of data: Qualitative data, textual
Preferred format for management back-ups and data preservation: eXtensible Markup Language (XML) marked-up text according to an appropriate Document Type Definition (DTD) or schema, Rich Text Format (.rtf), plain text data, ASCII (.txt)
Other acceptable formats for data preservation: Hypertext Markup Language (HTML), widely-used proprietary formats e.g. MS Word (.doc/.docx), proprietary/software-specific formats such as NUD*IST, NVivo and ATLAS.ti

Type of data: Digital image data
Preferred format for management back-ups and data preservation: TIFF (version 6) uncompressed
Other acceptable formats for data preservation: JPEG (.jpeg, .jpg), TIFF (other versions), Adobe Portable Document Format (PDF/A or PDF), raw image format (.RAW), software-specific formats, e.g. Photoshop files (.psd)

Type of data: Digital audio data
Preferred format for management back-ups and data preservation: Free Lossless Audio Codec (FLAC) (.flac), WAV file (.wav)
Other acceptable formats for data preservation: MPEG-1 Audio Layer 3 (.mp3), Audio Interchange File Format (AIFF) (.aif)

Type of data: Digital video data
Preferred format for management back-ups and data preservation: JPEG 2000

Type of data: Documentation
Preferred format for management back-ups and data preservation: Rich Text Format (.rtf), PDF/A or PDF, HTML (.htm), Open Document Text (.odt)
Other acceptable formats for data preservation: plain text (.txt), widely-used proprietary formats e.g. MS Word (.doc/.docx) or Excel (.xls/.xlsx), are acceptable but offer less long-term security, XML marked-up text according to an appropriate DTD or schema, e.g. XHTML 1.0


Transcription

Where qualitative data are collected as audio or video recordings, such as for interviews or focus groups, they ideally should also be transcribed as textual files for archiving and sharing. Transcripts should:

• have a unique identifier

• have a document header giving brief details of the data collection event, including date, place, interviewer name, interviewee details, etc.

• have a uniform layout throughout the research project

• make use of speaker tags indicating the question/answer sequence

• use pseudonyms to anonymise personal identifying information

• have line breaks

• be page numbered

Data quality control

Quality control of data is an integral part of all research and takes place at various stages, during data collection, data preparation and data verification.

During data collection, researchers must ensure that the data recorded reflect the actual facts, responses or events, for example:

• Computer-Aided Interview (CAI) software can be used to verify response consistency, routing questions so that only appropriate questions are asked and confirming responses against previous answers where appropriate

• for audio-visual data or interview recordings, the quality of data depends on the quality of the audio-visual equipment used

• if data are collected with instruments, calibration of instruments is essential to check the precision, bias and/or scale of measurement; data are validated by checking for equipment as well as transcription errors; data may be verified by checking the truth of the record with an expert or by taking multiple measurements, observations or samples

The quality of data collection methods used has a significant bearing on data quality.

During data preparation, when data are transcribed, entered in a database or spreadsheet, coded, etc., quality is ensured by adhering to standardised and consistent procedures for data entry and transcription. This may include setting up validation rules in data entry software or detailed labelling of variable and record names to avoid confusion.

During data verification, data are edited, cleaned, verified, double-checked and cross-checked. Checking typically involves both automated and manual procedures. This may include double-checking the coding of responses and removing out-of-range codes; verifying random samples of the digital data against the original data; verifying the entire dataset; or double entry of data.
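Two of the automated checks mentioned above, detecting out-of-range codes and comparing double-entered data, can be sketched in a few lines. This is an illustration only: the variables, their permitted codes and the sample records are hypothetical.

```python
# Hypothetical code lists: the permitted codes for each variable are assumptions.
VALID_CODES = {"gender": {1, 2}, "marstat": {1, 2, 3, 4}}


def out_of_range(records):
    """Return (row, variable, value) for every code outside its permitted set."""
    return [
        (i, var, rec[var])
        for i, rec in enumerate(records)
        for var, valid in VALID_CODES.items()
        if rec[var] not in valid
    ]


def double_entry_mismatches(first, second):
    """Compare two independent entries of the same records, field by field."""
    return [
        (i, var, a[var], b[var])
        for i, (a, b) in enumerate(zip(first, second))
        for var in a
        if a[var] != b[var]
    ]


# The same two questionnaires keyed in twice by different people (invented data).
entry_1 = [{"gender": 1, "marstat": 3}, {"gender": 2, "marstat": 7}]
entry_2 = [{"gender": 1, "marstat": 3}, {"gender": 2, "marstat": 1}]

print(out_of_range(entry_1))
print(double_entry_mismatches(entry_1, entry_2))
```

Both checks only flag candidates; deciding which entry is correct still requires going back to the original questionnaire or recording.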

Study Title: ‘Healthy diets across generations’

Depositor: A. Person

Interviewer: A. Person

Interview number: 12

Interview ID: Chris Smith

Date of interview: 3 May 2007

Information about interviewee

Date of birth: 6 June 1949

Gender: male

Marital status: widowed

Occupation: bricklayer

Geographic region: North-East England

I: Just one or two factual details first of all
before we go on to your health and that....
how old are you?

FL: I'm 58 in June.

I: What schools did you go to? Can you remember
that far back!

FL: Oh... the last school was at Longside.. aye,
ken Longside?

Sample transcript


Version control

It is important to ensure that different copies or versions of files, materials held in different formats or locations and information that is cross-referenced between files are all subject to version control. Checks and procedures should be put in place to make sure that if the information in one file is altered, the related information in other files is also updated. It is important to keep track of which version of a file is the most current, especially where files are shared between people or held in different locations.

Best practice is to:

• uniquely identify files, preferably using a systematic naming convention

• clearly record version and status of a file, e.g. draft, interim, final, internal

• record what changes are made to a file when a new version is created

• record relationships between items as in many cases the information contained in a single file is supported by information held in other files, e.g. relationship between the code and the data file it is run against, or between the data file and the documentation or metadata that relate to it, or between multiple tables

• track the location of all files if stored in a variety of locations

• regularly synchronise files in different locations, e.g. using MS SyncToy software

• maintain single master files in a suitable format to remove version control problems associated with multiple working versions being developed in parallel

Version control can be maintained through:

• file naming conventions, using number sequences or dates in file names - although avoid very long file names or using spaces and special characters in file names

• including a file history or version control table at the start of each file, in which versions, dates, authors and details of changes to the file are recorded

• versioning software
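A file naming convention of the kind described above can be enforced with a small helper. The sketch below assumes one possible convention, `stem_vNN_status_YYYY-MM-DD.ext`; the pattern, the function names and the example file names are hypothetical, and any systematic scheme that avoids spaces and special characters would serve equally well.

```python
import re
from datetime import date


def versioned_name(stem, version, status, when, ext):
    """Build a name such as 'interviews_v02_draft_2009-05-03.csv'."""
    # Reject spaces and special characters, as the guidance recommends.
    if not re.fullmatch(r"[A-Za-z0-9_-]+", stem):
        raise ValueError("avoid spaces and special characters in file names")
    return f"{stem}_v{version:02d}_{status}_{when.isoformat()}.{ext}"


def next_version(existing_names, stem):
    """Return the next version number, given the names already in use."""
    pattern = re.compile(rf"{re.escape(stem)}_v(\d+)_")
    versions = [int(m.group(1)) for n in existing_names if (m := pattern.match(n))]
    return max(versions, default=0) + 1


existing = [
    "interviews_v01_draft_2009-05-01.csv",
    "interviews_v02_draft_2009-05-03.csv",
]
new_name = versioned_name(
    "interviews", next_version(existing, "interviews"), "final", date(2009, 5, 10), "csv"
)
print(new_name)
```

Zero-padded version numbers and ISO-style dates sort correctly in an ordinary file listing, which makes the most current version easy to spot.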

Authenticity

Digital information can be copied, altered or deleted very easily. It is therefore important to be able to demonstrate the authenticity of data and to prevent unauthorised access to data that may potentially lead to unauthorised changes.

Best practice to ensure authenticity and control access is to:

• keep a master file (a formalised and checked master copy of the data and other materials)

• assign responsibility for master files where possible to an individual member of the project team

• restrict write access to master versions to specific members of the project team

• create a formal procedure for the destruction of master files

• record all changes to master files

• maintain old master files (in case later ones contain errors)

• archive copies of master files at certain stages of development
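The guidance above does not name a technique for demonstrating that a master file is unchanged. One common approach, assumed here purely for illustration, is to record a cryptographic checksum when the master copy is finalised and to recompute it later. The sketch works on bytes held in memory; in practice the bytes would be read from the master file, and the sample content is invented.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 checksum of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def unchanged(data: bytes, recorded_checksum: str) -> bool:
    """True if the data still match the checksum recorded for the master copy."""
    return sha256_of(data) == recorded_checksum


# Hypothetical master file content and its checksum at the time of sign-off.
master = b"id,age\n1,58\n"
recorded = sha256_of(master)

print(unchanged(master, recorded))
print(unchanged(master + b"2,40\n", recorded))
```

A checksum shows that a file has changed, not who changed it or why, so it complements rather than replaces the change log and access restrictions listed above.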


Making back-ups

Making back-ups of files is an essential element of data management. Regular back-ups protect against accidental or malicious data loss due to:

• hardware failure

• software or media faults

• virus infection or malicious hacking

• power failure

• fire, flood or other severe catastrophe

• human errors

Backing up involves making copies of files which can be used to restore originals if there is loss of data. Choosing a precise back-up procedure to adopt depends on local circumstances, the perceived value of the data and the levels of risk considered appropriate for the circumstances. Where data contain personal information, care should be taken to only create the minimal number of copies needed, e.g. a master file and one back-up copy.

When deciding upon the best back-up procedure for data files, consider:

• whether to back up particular data files or back up the entire system

• the frequency of back-up

    - back up after each change to data or at regular intervals

    - frequently used and critical data files may be backed up daily using an automated back-up process

    - if data are held on an institutional network space, they may be automatically backed up at regular intervals thanks to an institutional back-up policy

• carrying out incremental or differential back-ups

    - incremental back-up consists of first making a copy of all relevant files - often the complete contents of a personal computer - and then making incremental back-ups of the files which have altered since the last back-up; removable media (CD/DVD) are recommended

    - for differential back-ups a complete back-up is made first, then back-ups are made of files changed or created since the first full back-up and not just since the last partial back-up; using ‘fixed’ media such as hard drives is recommended

• choice of media

    - depends on the quantity of files, type of data, and the preferred method of backing up

    - examples include recordable CD/DVD, networked hard drive, removable hard drive or magnetic tape

• location of back-up files

    - online back-up files or offline storage on removable media or transportable hard drives which can be physically removed to another location for safe-keeping

• organising and clearly labelling all back-up files

• formats

    - back-ups of master copies should ideally be in formats that are suitable for long-term digital preservation

    - this typically means open as opposed to proprietary formats

• verifying and validating back-up files regularly

    - fully restoring them to another location and comparing them with the originals

    - comparing back-up copies for completeness, for example by checking the MD5 sum values, file size and date to ensure the integrity of the files

• not overwriting old back-ups with new ones
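Verifying a back-up by comparing file sizes and MD5 sum values, as suggested above, could look roughly like the following Python sketch. The function names and the layout (a back-up folder that mirrors the original folder) are assumptions made for illustration; real back-up software may organise files differently.

```python
import hashlib
from pathlib import Path


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 sum of a file, read in chunks to cope with large files."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(original_dir: Path, backup_dir: Path) -> dict[str, str]:
    """Return {relative path: problem} for files missing or altered in the back-up."""
    problems = {}
    for original in sorted(original_dir.rglob("*")):
        if not original.is_file():
            continue
        relative = original.relative_to(original_dir)
        copy = backup_dir / relative
        if not copy.is_file():
            problems[str(relative)] = "missing from back-up"
        elif original.stat().st_size != copy.stat().st_size:
            problems[str(relative)] = "size differs"
        elif md5sum(original) != md5sum(copy):
            problems[str(relative)] = "MD5 sum differs"
    return problems
```

An empty result means every original file has a copy of the same size and MD5 sum. The sketch does not check file dates, and it does not detect extra files that exist only in the back-up.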

DATA STORAGE, BACK-UP AND SECURITY


Data storage

The storage of any digital research data should be based on two principles:

• digital storage media are inherently unreliable unless they are stored appropriately

• all file formats and physical storage media will ultimately become obsolete

Best practice is to:

• store data in formats which meet long-term readability requirements

• in general this means non-proprietary formats or open standard formats

• some proprietary formats, such as MS Rich Text Format and Excel, are widely used and likely to be accessible for a reasonable, but not unlimited, time after becoming obsolete

• obtain up-to-date guidance since changes to formats and media may occur quickly

• make digital versions of paper documentation in PDF/A format for long-term security

• copy/migrate data files to new media between two and five years after they were first created, since optical media (CDs and DVDs) and magnetic media (hard drives, tapes) are subject to physical degradation

• check the data integrity of all stored data files at regular intervals

• ensure that any storage strategy, even for a short-term project, involves at least two different forms of storage

• organise stored data well, ensuring they are easily located and physically accessible

• ensure that areas and rooms designated for storage of digital or non-digital data are suitable for the purpose, are structurally sound and free from the risk of flood and fire

Also note that optical and magnetic media are vulnerable to poor handling, changes in temperature, changes in relative humidity, air quality and lighting conditions. The National Preservation Office has published guidelines6 on caring for CDs and DVDs, which are available on the British Library web site.

Non-digital printed materials and photographs are subject to degradation from sunlight and acid, e.g. from sweat on skin and in some kinds of paper.

Case Study

A project at a university carries out coral reef research. Field data are collected using handheld Personal Digital Assistants (PDAs). Digital data are transmitted daily to the university network drive, where they are held in password protected files. All data files are identified by an individual version number and creation date. Version information (version numbers and notes detailing differences between versions) is stored in a spreadsheet, also on the network drive.

The university’s network drive is fully backed up onto Ultrium LTO2 data tapes. Incremental back-ups are made daily Monday to Thursday; full server back-ups are made over Friday/Saturday/Sunday. Tapes are securely stored in a separate building. Upon completion of the research the datasets are deposited in the university’s digital repository.
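The difference between the incremental and differential back-ups described in this chapter comes down to which reference point is used to decide that a file has changed. The Python sketch below illustrates that selection rule only; the function name, the `mode` strings and the use of modification times are assumptions for the example, and it does not copy any files.

```python
from datetime import datetime


def files_to_copy(
    modified: dict[str, datetime],
    last_full: datetime,
    last_backup: datetime,
    mode: str,
) -> list[str]:
    """Select file names for the next back-up run.

    modified    -- file name mapped to the time it was last changed
    last_full   -- time of the most recent full back-up
    last_backup -- time of the most recent back-up of any kind
    mode        -- "full", "incremental" or "differential"
    """
    if mode == "full":
        return sorted(modified)
    if mode == "incremental":
        cutoff = last_backup  # changes since the last back-up of any kind
    elif mode == "differential":
        cutoff = last_full  # all changes since the last full back-up
    else:
        raise ValueError(f"unknown back-up mode: {mode}")
    return sorted(name for name, changed in modified.items() if changed > cutoff)
```

With a full back-up on Sunday and an incremental back-up on Monday, a file changed on Monday morning would be skipped by Tuesday's incremental run but included in a differential run. Restoring from incremental back-ups therefore needs the full back-up plus every later increment, whereas a differential restore needs only the full back-up and the latest differential.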


Data security

Data security is the protection of any data from unauthorised access, use, change, disclosure and destruction, as well as the prevention of unwanted changes that can affect the integrity of data.

Some elements of data security, where the safe-guarding of personal data is involved, are based on national legislation (the Data Protection Act) and therefore cannot be ignored. Personal data should only be accessible to authorised persons. Personal data may also exist in non-digital format. Consider for example the storage of completed consent forms, interview cover sheets or paper copies of a questionnaire survey. These should be protected in the same way as digital files. In addition, personal and identifying data (e.g. people’s names and/or addresses) should be stored separately from the related anonymised data.

Data which contain personal information should be treated with higher levels of security than data which do not. Security arrangements therefore need to be proportionate to the nature of the data and the risks involved. Ensuring data security means paying attention to physical security, network security and security of computer systems and files.

Physical security includes:

• restricting access to rooms and buildings where digital data, computers or media are held

• logging the removal of, and access to, media or hardcopy material in store rooms

• transporting personal data only under certain exceptional circumstances, even for repair purposes, e.g. giving a failed hard drive containing sensitive data to a computer manufacturer may cause a breach of security

• carrying out the destruction of data in a consistent manner – paper should be shredded and computer files permanently deleted from all systems

• ensuring the secure destruction of files on a computer at the end of its productive life

• deleting all files and reformatting a hard drive will not prevent the possible recovery of data that have previously been on that hard drive

• specialist advice should be sought where needed, CD/DVD shredders should be used and hard drives should be removed from their casings and disposed of securely


Case Study

In February 2008 the British Library (BL) received the recorded output of the Survey of Anglo-Welsh Dialects (SAWD), carried out by University College, Swansea between 1969 and 1995. This survey recorded the English spoken in Wales by interviewing and tape-recording elderly speakers on topics including the farm and farming, the house and housekeeping, nature, animals, social activities and the weather.

The collection was deposited in the form of 503 digital audio files, which were accessioned as .wav files in the BL’s Digital Library. Digital clones of all files are held at the Archive of Welsh English, alongside the original master recordings on 151 audio cassettes, from which the digital copies were created. The BL’s Digital Library is mirrored on four sites – at Boston Spa, St Pancras, Aberystwyth and a ‘dark’ archive which is provided by a third party. Each of these servers has inbuilt integrity checks. The British Library makes available access copies for users, in the form of .mp3 audio files, in the British Library Reading Rooms via the Soundserver system. A small set of audio extracts from the SAWD recordings are also available online on the BL’s Accents and Dialects web site, Sounds Familiar.


Network security means:

• not storing confidential data such as those containing names or addresses on servers or computers connected to an external network, particularly servers that host internet services, either web or email

Security of computer systems and files includes:

• locking computer systems with a password and installing a firewall system

• protecting from viruses and malicious code through regularly updated virus detection software

• implementing password protection of, and access permissions to, data files (no access, read only, read and write, administrator permission)

• imposing confidentiality agreements for data users of confidential data

• never sending personal or confidential data via email or using File Transfer Protocol (FTP), but rather as encrypted data

The risk of security breaches and disclosure of confidential data can also be removed or reduced by:

• anonymising digital data by removing identifiers or aggregating data (see section on Anonymising Data)

• separating disclosable from non-disclosable data by obscuring, removing or hiding individual fields, records, columns or tables

Encrypting data for transmission

After testing a number of software applications for encrypting data to allow secure data transmission from government departments to the data archive, the UKDA recommends the use of Pretty Good Privacy (PGP), an industry-standard encryption technology. Available supporting encryption software can be open source, e.g. GnuPG, or commercial, e.g. PGP.

Encryption requires the creation of a Public and Private Key pair and passphrase. The Private PGP Key and passphrase are used to digitally sign each encrypted file, and thus allow the recipient to validate the sender’s identity. The recipient’s Public PGP Key is installed by the sender in order to encrypt files so that only the authorised recipient can decrypt them.
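As an illustration of the sign-and-encrypt step, the Python sketch below assembles a GnuPG command line and optionally runs it. It assumes that the `gpg` program is installed, that the sender's key pair already exists and that the recipient's public key has been imported; the function names and the e-mail style key identifiers are hypothetical and used only for the example.

```python
import subprocess
from pathlib import Path


def gpg_encrypt_command(source: Path, recipient: str, signer: str) -> list[str]:
    """Build a gpg command that signs a file and encrypts it for one recipient."""
    encrypted = source.with_name(source.name + ".gpg")
    return [
        "gpg",
        "--output", str(encrypted),  # where the encrypted copy is written
        "--local-user", signer,      # sender's private key, used for signing
        "--recipient", recipient,    # recipient's public key, used for encrypting
        "--sign",
        "--encrypt",
        str(source),
    ]


def encrypt_for_transfer(source: Path, recipient: str, signer: str) -> None:
    """Run gpg; it will normally prompt for the sender's passphrase."""
    subprocess.run(gpg_encrypt_command(source, recipient, signer), check=True)
```

Calling `gpg_encrypt_command(Path("survey.csv"), "archive@example.org", "researcher@example.org")` returns the argument list without running anything, which makes it easy to inspect the command before any data are processed. Only the resulting `.gpg` file should then be transmitted.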


When research involves obtaining data from people, e.g. in social, anthropological or medical research, researchers are expected to maintain high ethical standards. Ethical guidelines are typically issued by professional bodies, institutions and funding organisations.

Research will usually require obtaining informed consent for people to participate in research and for use of the information collected. It is essential that consent also takes into account long-term use of data, such as preserving and sharing data. Without consent for data sharing, opportunities for sharing research data with other researchers can be jeopardised.

Personal, confidential and sensitive personal data

At times data obtained from people may hold sensitive or confidential information. This does not mean that all data obtained by research with participants are confidential.

Personal data are defined in the Data Protection Act 1998 as data which relate to a living individual who can be identified from those data, or from those data and other information which is in the possession of, or is likely to come into the possession of, the data controller (e.g. researcher). This includes any expression of opinion about the individual.

Confidential data are data that:

• can be connected to the person providing them or that could lead to the identification of a person referred to (names, addresses, occupation, photographs)

• are given in confidence, or data agreed to be kept confidential (secret) between two parties, that are not in the public domain

• are conditioned by factors such as ethical guidelines, legal requirements or research-specific consent agreements

Sensitive personal data are defined in the Data Protection Act 1998 as data that may incriminate a participant or third party, such as a person's race, ethnic origin, political opinion, religious beliefs, trade union membership, physical or mental health, sexual orientation, criminal proceedings or convictions.

Strategies for dealing with confidentiality depend upon the nature of the research, but are essentially informed by a researcher’s ethical obligations towards participants and society and by legislation such as the Data Protection Act 1998.

Legislation7 that may impact on the sharing of confidential data:

• duty of confidentiality

• Data Protection Act 1998

• Freedom of Information Act 2000

• Human Rights Act 1998

• Statistics and Registration Services Act 2007

• Environmental Information Regulations 2004

Sensitive and confidential data can be shared ethically if researchers pay attention, from the planning stages of research, to three important aspects:

• when gaining informed consent, include consent for data sharing

• where needed, protect people’s identities by anonymising data

• consider access restrictions to data

These measures should be considered jointly and never in isolation. The same measures form part of good research practice and data management, even if data sharing is not envisioned.

RESEARCH ETHICS, CONSENT AND DATA CONFIDENTIALITY


Sample consent statement for quantitative surveys

Thank you very much for agreeing to participate in this survey.

The information provided by you in this questionnaire will be used for research purposes. It will not be used in a manner which would allow identification of your individual responses.

Anonymised research data will be archived at ………. in order to make them available to other researchers in line with current data sharing practices.

Sample extensive consent form for interviews

I have read and understood the project information sheet dated DD/MM/YYYY. ☐

I have been given the opportunity to ask questions about the project. ☐

I agree to take part in the project. Taking part in the project will include being interviewed and audio recorded [other forms of participation can be listed]. ☐

I understand that my taking part is voluntary; I can withdraw from the study at any time and I will not be asked any questions about why I no longer want to take part. ☐

Select only one of the next two options:

I would like my name used where what I have said or written as part of this study will be used in reports, publications and other research outputs so that anything I have contributed to this project can be recognised. ☐

I do not want my name used in this project. ☐

I understand my personal details such as phone number and address will not be revealed to people outside the project. ☐

I understand that my words may be quoted in publications, reports, web pages, and other research outputs but my name will not be used unless I requested it above. ☐

I agree for the data I provided to be archived at ………. [More detail can be provided here so that decisions can be made separately about audio, transcripts, etc.] ☐

I understand that other researchers will have access to this data only if they agree to preserve the confidentiality of that data and if they agree to the terms I have specified in this form. ☐

I understand that other researchers may use my words in publications, reports, web pages, and other research outputs according to the terms I have specified in this form. ☐

I agree to assign the copyright I hold in any materials related to this project to [name of researcher]. ☐

________________________ ________________ ________

Name of Participant Signature Date

________________________ ________________ ________

Researcher Signature Date

Contact details for further information: Names, phone, email addresses, etc.


Informed consent and data sharing

It is essential that gaining consent takes into account any future uses of data, such as the sharing, preservation and long-term use of research data. Researchers should:

• inform participants how research data will be stored, preserved and used in the long-term

• inform participants how confidentiality will be maintained, e.g. by anonymising data

• obtain informed consent (written or verbal) for data sharing

To ensure that consent is informed, consent must be freely given with sufficient information provided on all aspects of participation and data use. There must be active communication between the parties. Consent must never be inferred from a non-response to a communication such as a letter.

Written or verbal consent?

Whether informed consent is obtained in writing through a detailed consent form, by means of an informative statement, or verbally, depends on the nature of the research, the kind of data gathered, the data format and how the data will be used.

• For detailed interviews or research where personal, sensitive or confidential data are gathered, the use of written consent forms is recommended to assure compliance with the Data Protection Act and with ethical requirements. Written consent documentation typically includes an information sheet and consent form signed by the participant.

• For surveys or informal interviews, where no personal data are gathered or personal identifiers are removed from the data, obtaining written consent may not be required. At a minimum an information sheet should be provided to participants detailing the nature and scope of the study, the identity of the researcher(s) and what will happen to the data collected (including any data sharing).

• If data are collected verbally through audio or video recordings, verbal consent agreements can be recorded together with the data.

• For audio-visual data where the identity of people may be disclosed from the data, it may be important that informed consent is obtained to use the data unaltered for research purposes, sharing and preservation. Voice alteration or image blurring are usually labour and cost intensive and decrease the research potential of data.

Sample consent forms8 are available from the UK Data Archive.

Research Ethics Committees and data sharing

There is a potential tension between data sharing and data protection. Data archives work to increase availability of, and access to, research data, while the primary purpose of Research Ethics Committees (RECs) is to ensure ethical conduct in research and to protect the safety, rights and well being of research participants.

The need to protect personal data and preserve confidentiality - where explicitly required - cannot be overstated. This does not mean, however, that all research data obtained from research with people should be kept confidential, cannot be shared or, worse even, are destroyed. It is important to distinguish between personal or sensitive data collected in research, and research data in general.

Personal data should not be disclosed, unless consent has been given for disclosure. Identifiable information may be excluded from data sharing. A REC should, however, not object to the sharing of research data in general. If research data contain sensitive or confidential information, then the sharing of such data must be considered carefully, but should not be dismissed as being impossible.


One-off or process consent?

Discussing and obtaining consent for:

• participation in research

• the use of the information gathered for analyses, publications and outputs

• data sharing beyond the research

can be a one-off occurrence or an ongoing process.

One-off consent is simple, practical, avoids repeated requests to participants, and meets the formal requirements of most Research Ethics Committees. However, it may place too much emphasis on ‘ticking boxes’.

If consent is considered throughout the research process, it assures active informed consent from participants. Thus, consent for participation in research, for data use and for data sharing can be considered at different stages of the research, giving participants a clearer view of what participating in the research involves and what the data to be shared consist of. It may, however, be too repetitive and annoying for some participants.

Special consent9 considerations are needed for:

• medical research

• research with children and young adults

• research with people with learning difficulties

• research within organisations or the workplace

• research into crime

• internet research

Case Study

The Biological Records Centre (BRC)10 is the national custodian of data on the distribution of wildlife in the British Isles. Data are provided by volunteers, researchers and organisations. BRC disseminates data for environmental decision-making, education and research.

Data whose publication could present a significant threat to a species or habitat (e.g. nesting location of birds of prey) will be treated as confidential. The BRC provides access to the data it holds via the National Biodiversity Network Gateway. Standard access controls are as follows:

• public access to view and download all records at a minimum 10 km² level of resolution, and at higher resolution if the data provider agrees

• registered users have access to view and download all except confidential records at the 1 km² level of resolution

• conservation organisations have access to view and download all except confidential records at full resolution with attributes

• conservation officers in statutory conservation agencies have access to view and download all records, including confidential records at full resolution with attributes

• records that have been signified as confidential by a data provider will not be made available to the conservation agencies without the consent of the data provider


Anonymising data

Before data obtained from research with people can be disseminated, published or shared with other researchers, they may need to be anonymised11 so that individuals, organisations or businesses cannot be identified from the data. Anonymisation may be needed for ethical reasons to protect people’s identities, for legal reasons to not disclose personal data, or for commercial reasons. Personal data should not be disclosed from research information, unless a respondent has given specific consent to do so. In some forms of research, for example where oral histories are recorded or in anthropological research, it is customary to publish and share the names of people studied, for which they have given their consent.

Quantitative datasets may be anonymised by:

• removing direct identifiers, e.g. name or address

• aggregating or reducing the precision of a variable, e.g. replacing the date of birth by age groups or replacing full postcodes with postcode sectors

• generalising the meaning of a detailed text variable, e.g. replacing a doctor’s detailed area of medical expertise by ‘an area of medical speciality’

• restricting the upper or lower ranges of a variable to hide outliers, e.g. top-coding salaries
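Three of the techniques listed above (removing direct identifiers, reducing precision and top-coding) can be sketched in Python as follows. The field names, the ten-year age bands and the salary ceiling are assumptions chosen for the example; suitable bands and thresholds depend on the dataset and on the disclosure risk.

```python
from datetime import date


def age_group(birth: date, on: date, width: int = 10) -> str:
    """Replace a date of birth by an age band such as '30-39'."""
    # Subtract one year if the birthday has not yet occurred in the year of 'on'.
    age = on.year - birth.year - ((on.month, on.day) < (birth.month, birth.day))
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def postcode_sector(postcode: str) -> str:
    """Reduce a full UK postcode to its sector, e.g. 'CO4 3SQ' becomes 'CO4 3'."""
    compact = postcode.replace(" ", "").upper()
    # The last three characters are the inward code; keep only its first digit.
    return f"{compact[:-3]} {compact[-3]}"


def top_code(value: float, ceiling: float) -> float:
    """Cap a value at a ceiling so that high outliers are hidden."""
    return min(value, ceiling)


def anonymise(record: dict, on: date, salary_ceiling: float) -> dict:
    """Return a new record; direct identifiers such as name are not carried over."""
    return {
        "age_group": age_group(record["date_of_birth"], on),
        "postcode_sector": postcode_sector(record["postcode"]),
        "salary": top_code(record["salary"], salary_ceiling),
    }
```

Because `anonymise` builds a new record from a fixed set of derived fields, any variable not explicitly listed (name, address, telephone number) is dropped by default, which is usually safer than deleting named fields from a copy of the original.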

Special attention may be needed for:

• relational data, where relations between variables in related datasets can disclose identities

• geo-referenced data, where identifying spatial references such as point co-ordinates also have a geographical value

Simply removing spatial references prevents disclosure, but it also means that all geographical, locational and related information is lost. A better option may be to keep spatial references intact and to impose access restrictions on the data instead. As an alternative, point co-ordinates may be replaced by larger, non-disclosing geographical areas or by meaningful alternative variables that typify the geographical position.
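Replacing point co-ordinates by larger areas can be as simple as snapping each point to the corner of the grid square that contains it. The sketch below assumes non-negative co-ordinates in metres on a projected grid (for example national grid eastings and northings) and a hypothetical default cell size of 10 km; whether a given cell size is actually non-disclosing has to be judged for each dataset.

```python
def coarsen(easting: float, northing: float, cell_m: int = 10_000) -> tuple[int, int]:
    """Return the south-west corner of the grid cell that contains the point."""
    return (
        int(easting // cell_m) * cell_m,
        int(northing // cell_m) * cell_m,
    )
```

For instance, `coarsen(601234.5, 225678.9)` places the point in the 10 km cell whose corner is at (600000, 220000), so some locational information is retained while the exact position is no longer stored.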

When anonymising qualitative material such as transcriptions of textual data, identifiers should not be crudely removed or aggregated, as this can distort the data or even make them unusable. Rather pseudonyms, replacement terms or vaguer descriptors should be used. The objective should be to achieve a reasonable level of anonymisation, avoiding unrealistic or overly harsh editing, whilst maintaining maximum content. Procedures to anonymise data should always be considered alongside obtaining informed consent for data sharing.

Best practice for qualitative data is to:

• plan anonymisation at the time of transcription or initial write up

• use pseudonyms or replacements

• retain unedited versions of data for use within the research team and for preservation

• create an anonymisation log of all replacements, aggregations or removals made; care should be taken to store such a log separately from the anonymised data files

• identify replacements in a meaningful way, e.g. with [brackets]
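The practice of bracketed replacements combined with an anonymisation log can be illustrated with a short Python sketch. The function name, the structure of the log entries and the example names are assumptions; in real transcripts, misspellings, nicknames and indirect references still need to be checked by a person.

```python
import re


def pseudonymise(text: str, replacements: dict[str, str]) -> tuple[str, list[dict]]:
    """Replace each listed identifier by a [bracketed] term and log every change."""
    if not replacements:
        return text, []
    log: list[dict] = []

    def substitute(match: re.Match) -> str:
        original = match.group(0)
        replacement = f"[{replacements[original]}]"
        log.append(
            {"position": match.start(), "original": original, "replacement": replacement}
        )
        return replacement

    # Try longer identifiers first so that 'Mary Jones' is not matched as 'Mary'.
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(key) for key in keys) + r")\b")
    return pattern.sub(substitute, text), log
```

The function returns the edited text and the log as separate values so that they can be written to different locations; the log links pseudonyms back to real identities and should therefore be stored apart from the anonymised files.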

Digital manipulation of audio and image files can be used to remove personal identifiers. However, techniques such as voice alteration and image blurring are labour-intensive and expensive to apply to large quantities of data and are likely to damage the research potential of the data. If confidentiality of audio-visual data is an issue, it is better to obtain the participant’s consent to use and share the data unaltered.

A person’s identity can be disclosed from:

• direct identifiers, e.g. name, address, postcode information or telephone number

• indirect identifiers that, when linked with other publicly available information sources, could identify someone, e.g. information on workplace, occupation or exceptional values of characteristics like salary or age


Access control

Under certain circumstances, sensitive and confidential data can also be safeguarded by regulating or restricting access to and use of such data, while at the same time enabling data sharing for research and educational purposes.

Data held at data centres and archives are not generally in the public domain. Their use is restricted to specific purposes after user registration. Users sign an End User Licence in which they agree to certain conditions, e.g. not to use data for commercial purposes or identify any potentially identifiable individuals.

Data centres may impose stricter access regulations for confidential data, such as:

• providing access to approved researchers only

• requiring data access authorisation from the data owner prior to release

• placing confidential data under embargo for a given period of time until confidentiality is no longer pertinent

• providing secure access to data, which enables analysis of confidential data but excludes access to the data or the ability to download the data

Mixed levels of access regulations may be put in place for some datasets, combining regulated access to confidential data with user access to non-confidential data.

Data centres typically liaise with the researchers who own the data in selecting the most suitable type of access for data. Access regulations should always be proportionate to the kind of data and confidentiality involved.


COPYRIGHT

Researchers ‘creating’ data hold copyright over such data. Copyright is an intellectual property right that protects the owner of a work from its unauthorised copying. Most research materials including spreadsheets, publications, reports and computer programmes, fall under literary work and are therefore protected by copyright. Facts, however, cannot be copyrighted.

If information is structured in a database, the structure acquires a database right, alongside the copyright in the content of the database. A database may be protected by both copyright and database right. For database right to apply, the database must be the result of substantial intellectual investment in obtaining, verifying or presenting the content.

In the case of interviews, the interviewee holds the copyright in the spoken word. If a transcription is a substantial reproduction of the words spoken, the speaker will own copyright in the words and the transcriber will have separate copyright of the transcription.

In the case of collaborative research or derived data, copyright may be held jointly by various researchers or institutions. Copyright should be assigned correctly, especially if datasets have been created from a variety of sources; for example ones which have been bought or ‘lent’ by other researchers.

In academia, in theory the employer is the first owner of the copyright in a work made during the course of the employee’s employment. Many academic institutions, however, waive copyright in research materials, data and publications and give ownership to the researchers. Researchers should check with their institutions if their institution retains copyright or waives it.

When data are archived or shared, the researcher or data creator keeps the copyright over data. A data archive cannot effectively archive data unless all the rights holders are identified and give their permission for the data to be archived.

Secondary users of data must obtain copyright clearance from the rights holder before data can be reproduced. Data can be copied for non-commercial teaching or research purposes without infringing copyright, under the fair dealing concept, providing that the ownership of the data is acknowledged to the copyright holder. An acknowledgement should give credit to the data source used, the data distributor and the copyright holder. Data centres typically specify how data use should be acknowledged and cited either within the metadata record for a dataset or in a data use licence.

Case study - Media sources

Scenario: A researcher has collated articles about the Prime Minister from The Guardian over the past ten years, using the Lexis Nexis database to source articles. They are then transcribed/copied by the researcher into a database so that content analysis can be applied. The researcher offers a copy of the database with the original transcribed text to a data centre.

Rights Issue: Researchers cannot share either of these data sources as they do not have copyright in the original material. A data centre cannot accept these data as to do so would be a breach of copyright. The rights holders, in this case The Guardian and Lexis Nexis, would need to provide consent for archiving.


Case study - Data obtained from UKDA

Scenario: A researcher has used the National Diet and Nutrition Survey (NDNS) data, obtained via the UKDA. NDNS data are Crown Copyright. The researcher has processed the NDNS data (filtered, integrated and aggregated data across variables, while maintaining individual records) and used the processed data to model food chain risks. The researcher would like to archive the processed data that were used as input data for the modelling, as well as the modelling code, at the UKDA.

Rights Issue: There is joint copyright over the processed data, shared between the researcher and the Crown (holding copyright over the NDNS data). The researcher should declare this joint copyright for the modelling input data and requires no further permission from the Crown. The UKDA End User Licence, which the researcher signed when obtaining the NDNS data from the UKDA, specifically states "offer for deposit any new data collections derived from the data supplied or created by the combination of the data supplied with other data." Thus the UKDA can archive the processed data with a joint copyright declaration.

Case study - Data collected using in-depth interviews with 'elites'

Scenario: A researcher has interviewed five retired cabinet ministers about their careers, producing audio recordings and full transcripts. The researcher then analyses the data and offers them to a data centre for preserving. However, the researcher did not get signed copyright transfers for the interviewees’ words.

Rights Issue: In this case it would be problematic for a data centre to accept the data. Large extracts of the data cannot be quoted by secondary users. To do so would breach the interviewees’ copyright over their words. This is equally a problem for the primary researcher. The researcher should have asked for transfer of copyright or a licence to use the data obtained through interviews, as the possibility exists that the interviewee may at some point wish to assert the right over their words, e.g. when publishing memoirs.

Case study - Data purchased under licence

Scenario: A researcher subscribes to access spatial AgCensus data from EDINA. These data are then integrated with data collated by the researcher. As part of the ESRC award contract the data have to be offered for archiving at the UKDA. Can such integrated data be offered?

Rights Issue: The subscription agreement on accessing AgCensus data states that data may not be transferred to any other person or body without prior written permission from EDINA. Therefore, UKDA cannot accept the integrated data, unless the researcher obtains permission from EDINA. The researcher’s partial data, with the AgCensus data removed, can be archived. Secondary users could then re-combine these data with the AgCensus data, if they were to obtain their own AgCensus subscription.
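For the licensed-data case (AgCensus), the separation of the researcher's own data from the licensed data could be sketched as follows. This is only an illustration under stated assumptions: the column names (`agcensus_cattle`, `agcensus_crop_area`, `farm_income`), the linking key `grid_square` and both function names are hypothetical and do not come from the AgCensus data or from any UKDA tool. It also assumes that the linking key itself is not covered by the licence, which would need to be checked against the subscription agreement.

```python
# Hypothetical column names; substitute those used in your own dataset.
LICENSED_COLUMNS = {"agcensus_cattle", "agcensus_crop_area"}
KEY_COLUMN = "grid_square"  # assumed not to be licensed material itself


def strip_licensed(rows, licensed=LICENSED_COLUMNS):
    """Return copies of the rows with the licensed columns removed."""
    return [{k: v for k, v in row.items() if k not in licensed} for row in rows]


def recombine(own_rows, licensed_rows, key=KEY_COLUMN):
    """Re-attach licensed columns to archived rows by matching on the key."""
    lookup = {row[key]: row for row in licensed_rows}
    return [{**lookup.get(row[key], {}), **row} for row in own_rows]


# Integrated working dataset, as held by the original researcher.
integrated = [
    {"grid_square": "A1", "farm_income": 100,
     "agcensus_cattle": 12, "agcensus_crop_area": 3.5},
    {"grid_square": "A2", "farm_income": 80,
     "agcensus_cattle": 7, "agcensus_crop_area": 1.2},
]

# Version that could be offered for archiving: licensed columns removed.
archivable = strip_licensed(integrated)
```

A secondary user holding their own subscription could then pass the archived rows and their own copy of the licensed rows to `recombine` to rebuild the integrated dataset, provided both share the same key values.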


LINKS AND REFERENCES

1 UK Data Archive: www.data-archive.ac.uk

2 RELU Data Management Plans: www.data-archive.ac.uk/relu/plan.asp

3 Wellcome Trust Policy on data management and sharing: www.wellcome.ac.uk/About-us/Policy/Policy-and-position-statements/WTX035043.htm

4 Social and Environmental Conditions in Rural Areas (SECRA) dataset: www.sei.se/relu/secra/

5 Wessex Archaeology Metric Archive Project: ads.ahds.ac.uk/catalogue/resources.html?abmap_grimm_na_2008

6 National Preservation Office - Caring for CDs and DVDs: www.bl.uk/npo/pdf/cd.pdf

7 UKDA - Research ethics and legislation relevant to data sharing: www.data-archive.ac.uk/sharing/legal.asp

8 UKDA - Consent forms: www.data-archive.ac.uk/sharing/consentforms.asp

9 UKDA - Special cases of consent: www.data-archive.ac.uk/sharing/consentspecial.asp

10 Biological Records Centre: www.brc.ac.uk

11 UKDA - Anonymising research data: www.data-archive.ac.uk/sharing/anonymise.asp


UK Data Archive
University of Essex
Wivenhoe Park
Colchester
CO4 3SQ
Email: [email protected]
+44 (0)1206 872572/872974