data management booklet...the rural economy and land use programme, drawing on best practice in data...
MANAGING AND SHARING DATA
a best practice guide for researchers
FOREWORD
Good data management is the foundation for good research. If data are properly organised and preserved, and their accuracy and integrity is controlled at all times, the result is high quality data, efficient research and the saving of time and resources. Researchers themselves benefit greatly from properly managing their research data. Data management should be planned from the start of research. If it becomes part of standard research practice, then it need not incur additional time or costs.
When it comes to sharing research data, good management is essential to ensure that data can be preserved and remain accessible in the long-term, so they can be re-used and understood by other researchers. When managed and preserved properly, research data can be successfully re-used for future scientific and educational purposes, thus maximising the investment made in generating the data and increasing the visibility of the research.
Data management primarily occurs within the lifecycle of a research project and is ideally carried out by all members of the research team. Digital preservation, which enables long-term data sharing, is often carried out by a specialised data archive or centre. The value of the data to be preserved depends on the quality and efficiency of the data management during research.
The data management information provided in this booklet is designed to help researchers and data managers across all research disciplines and research environments make sure that research data are of the highest quality and have the greatest potential for long-term re-use.
It is recognised that different types of data created and managed across the research discipline spectrum may require certain discipline-specific approaches to data managing and sharing; and that data centres may differ in their approach to specific data management and preservation issues. This guidance has been written for researchers spanning the natural and social sciences and humanities. Expertise has been taken from the Data Support Service of the interdisciplinary Rural Economy and Land Use programme, funded by the Economic and Social Research Council (ESRC), Natural Environment Research Council (NERC) and Biotechnology and Biological Sciences Research Council (BBSRC). The guidance has been reviewed and commented upon by data management experts from the NERC, the NERC Environmental Bioinformatics Centre (NEBC), the British Library (BL), the Research Information Network (RIN), the Archaeology Data Service (ADS), the History Data Service (HDS), the London School of Economics Research Laboratory and the Wellcome Trust. The UKDA thanks them for their valuable comments.
Key areas of advice are:
• data documentation and metadata
• data formats and software
• data storage, back-up and security
• research ethics, consent and data confidentiality
• copyright
This printed guide is complemented by detailed and practical online information, available from the UK Data Archive web site.1
The UKDA also provides training and workshops on data management and sharing, including bespoke advice [email protected] or +44 (0)1206 872572/872974
Libby Bishop, Matthew Woollard, Veerle Van den Eynden, Louise Corti
INDEX
Foreword
Data lifecycle
Sharing data - why and how
• Why share research data?
• How to share data
Data documentation and metadata
• Data documentation
• Metadata
Data formats and software
• Choice of formats and conversions
• Transcription
• Data quality control
• Version control
• Authenticity
Data storage, back-up and security
• Making back-ups
• Data storage
• Data security
• Encrypting data for transmission
Research ethics, consent and data confidentiality
• Personal, confidential and sensitive personal data
• Informed consent and data sharing
• Written or verbal consent?
• One-off or process consent?
• Anonymising data
• Access control
Copyright
Links and references
Research Laboratory
Endorsed by:
© 2009 University of Essex
First published 2009
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without written permission from the publisher.
Published by:
UK Data Archive
University of Essex
Wivenhoe Park
Colchester
Essex
CO4 3SQ
ISBN: 1-904059-66-X
Printed by University of Essex Printing Services
DATA LIFECYCLE
Data creation
• planning
• data collection (survey, experiment, measurement etc.)
• data entry or digitisation
• data checking and cleaning
Data analysis
• analysis
• derived data creation
• creation of data documentation
Re-use of data
• by same researcher
• by other researchers
Distribution/publication of data
End of project
• research outputs
• preparing data for preservation
Preservation of data
• storage of data
• migration to suitable format/medium
• metadata creation
End of research project/funding
Start of research project/funding
Data management plans
The Rural Economy and Land Use programme, drawing on best practice
in data management and sharing across three research councils (ESRC,
NERC and BBSRC), requires all funded projects to develop and
implement a data management plan2 to ensure that data are well
managed throughout the duration of a research project. In a data
management plan researchers describe:
• the need for access to existing data sources
• data planned to be produced by the research project
• planned quality assurance and back-up procedures
• plans for management and archiving of collected data
• expected difficulties in making data available for secondary research
and measures to overcome such difficulties
• who holds copyright and Intellectual Property Rights of the data
• who has data management responsibility roles within the
research team
As part of its policy on data management and sharing3, the Wellcome
Trust expects applicants to supply a data management and sharing plan
if the goal of the proposal is to create or develop a research resource for
the benefit of the research community; or if the proposal involves the
generation of a significant quantity of data that could potentially be
shared for added benefit. The plan should outline:
• how the data will be made available to the wider community
• the proposed timeframe of data sharing
• data quality and standards
• the use of public data repositories (which the Trust expects)
• intellectual property of the data
• protection of research participants and possible limitations on
data sharing
• long-term preservation and sustainability strategy
Why share research data?
Research data are a valuable resource, usually requiring much time and money to be produced. Many datasets have a significant value beyond the original research. Sharing research data:
• encourages scientific enquiry and debate
• enables scrutiny of research outcomes
• facilitates research beyond the scope of the original research
• leads to new collaborations between data users and data creators
• reduces the cost of duplicating data collection
• provides important resources for education and training
• encourages the improvement and validation of research methods
• promotes the research that created the data and its outcomes
• can provide a direct credit to the researcher as a research output in its own right
The ease with which digital data can be stored, disseminated and made accessible to secondary users via the internet, means that many institutions embrace the sharing of research data to increase the impact and visibility of their research.
Many research funders increasingly follow guidance from the Organisation for Economic Co-operation and Development that publicly funded research data should be openly available to the scientific community to the maximum extent possible. They have adopted data sharing policies and encourage or oblige researchers to share research datasets and outputs. Data sharing policies allow researchers exclusive data use for a reasonable time period in which to publish the results of the data.
In the UK, the Economic and Social Research Council (ESRC), the Natural Environment Research Council (NERC) and the British Academy contractually require researchers to offer all research data resulting from their grants to designated data centres - UK Data Archive and NERC data centres. In addition, the Biotechnology and Biological Sciences Research Council (BBSRC), the Medical Research Council (MRC) and the Wellcome Trust now all have data policies which encourage researchers to share their research data in a timely manner, with as few restrictions as possible.
SHARING DATA - WHY AND HOW
How to share data
Research data can be shared:
• by depositing data with a specialist data centre or archive
• by depositing data in an institutional repository
• online via a project or institutional web site
• informally between researchers on a peer-to-peer basis
Sharing data by any of these means is valuable. Exact approaches to data sharing may vary according to different research environments and disciplines, due to the varying nature of data types and their characteristics. Ensuring the sustainability of data resources is an important aspect to consider.
Depositing data with a specialist data centre has additional advantages:
• safe-keeping of research data in a secure environment
• helping to ensure that data meet set quality thresholds
• ability to regulate and control access to data when needed (e.g. limiting data access to researchers only or restricting access to confidential data)
• licensing arrangements to acknowledge data rights
• long-term preservation in a standardised data format that remains accessible
• regular data back-ups
• conversion of data formats when needed due to software upgrades or changes
• online resource discovery of data through data catalogues
• standardised citation mechanism to acknowledge data ownership
• ensure efficiency of resource use, avoiding duplication of efforts
• promotion of data to users
• easy data dissemination to many users
• monitoring of the secondary usage of data
• management of access and user queries on behalf of the data owner
Data centres may apply certain criteria to evaluate and select datasets for preservation.
HOW TO SHARE DATA
Data centres where researchers can deposit
data for preservation and distribution
Antarctic Environmental Data Centre
Archaeology Data Service
Biomedical Informatics Research Network Data Repository
British Atmospheric Data Centre
British Library
British Oceanographic Data Centre
Cambridge Crystallographic Data Centre
Environmental Information Data Centre
European Bioinformatics Institute
Geospatial Repository for Academic Deposit and Extraction
History Data Service
Infrared Space Observatory
National Biodiversity Network
National Geoscience Data Centre
NERC Earth Observation Data Centre
NERC Environmental Bioinformatics Centre
Publishing Network for Geoscientific & Environmental Data
Scran
The Oxford Text Archive
UK Data Archive
UK PubMed Central
UK Solar System Data Centre
Visual Arts Data Service
A crucial part of making data user-friendly, shareable and with long-lasting usability is ensuring that they can be understood, interpreted and used at all times. This requires clear and detailed data description and documentation.
Data documentation
Comprehensive data documentation is easiest when begun at the onset of a project and continued throughout the research process. It should be considered as part of best practice in terms of organising and managing data.
Data documentation explains how data were created or digitised, what data mean, what their content and structure is, and any manipulations that may have taken place. It ensures that data can be understood within research teams and that researchers will continue to understand their own data in the long term. Good documentation is also vital for any successful data preservation.
Good data documentation includes information on:
• the context of data collection: project history, aims, objectives, hypotheses, etc.
• data collection methods: data collection process, sampling design, instruments, hardware and software used, questionnaire used, scale and resolution, temporal coverage and geographic coverage
• dataset structure: data files, cases, relationships between files, etc.
• data sources used
• data validation, checking, proofing, cleaning and other quality assurance procedures carried out
• modifications made to data over time since their original creation and identification of different versions of datasets
• where applicable, information on data confidentiality and consent agreements made
Numerical, tabular data should also be documented with:
• names, labels and descriptions for variables, fields, records and their values
• explanation of codes and classification schemes used, with reference to published classifications where appropriate
• codes of, and reasons for, missing values
• derived data created after collection, with well documented code, algorithm or command file used to create them
• weighting and grossing variables created
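The items listed above for numerical, tabular data can be kept in a small machine-readable codebook stored alongside the data file. The following is a minimal Python sketch of that idea, not a prescribed UKDA format; the variable names, value codes and the missing-value code -9 are hypothetical and invented for illustration.

```python
import json

# Hypothetical codebook for a two-variable dataset. Every name, label and
# code below is an assumption made for illustration only.
CODEBOOK = {
    "marstat": {
        "label": "Marital status of respondent",
        "values": {1: "single", 2: "married", 3: "widowed"},
        "missing": {-9: "not answered"},
    },
    "age": {
        "label": "Age in years at interview",
        "values": {},  # continuous variable, so no value codes
        "missing": {-9: "not answered"},
    },
}


def describe(variable, code):
    """Return the documented meaning of a code for a variable."""
    entry = CODEBOOK[variable]
    if code in entry["missing"]:
        return "missing: " + entry["missing"][code]
    return entry["values"].get(code, "undocumented code")


def codebook_as_json():
    """Serialise the codebook so it can be stored next to the data file."""
    return json.dumps(CODEBOOK, indent=2, sort_keys=True)
```

Keeping missing-value codes separate from substantive codes means the reason for a missing value stays documented even if the data are later exported to a format that has no concept of missing values.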
Qualitative data require as documentation:
• data listing of biographical characteristics of interviewees
• listing and descriptions of image or sound files
Variable-level descriptions may be embedded within a dataset itself as metadata. Other documentation may be contained in user guides, reports, publications, working papers and laboratory books.
DATA DOCUMENTATION AND METADATA
Case study
The Stockholm Environmental Institute has created an integrated spatial dataset4, Social and Environmental Conditions in Rural Areas (SECRA), containing socio-economic and environmental characteristics of all rural Census 2001 Super Output Areas. This dataset is available online as an MS Access database and can be downloaded together with accompanying metadata and documentation files that clearly describe the data.
The dataset is organised in four data themes - natural and constructed features, qualities of people and place, living and working, and political and economic context. The dataset is documented in detail by:
• four data list files providing descriptions of all variables within each data theme
• four metadata files describing how each variable was constructed and calculated, with references for the data sources used
• a detailed report describing the rationale for the dataset, the relevance of variables to rural conditions, the methodology used, an overview of variables included per data theme and examples of how the dataset may be used
All data files and metadata files are clearly labelled according to the four data themes.
Example of online study-level documentation available for a dataset in the UKDA catalogue

| Format | Name | Size (KB) | Description |
|---|---|---|---|
| PDF | guide.pdf | 3557 | User guide |
| PDF | method.pdf | 509 | Methodology |
| PDF | source.pdf | 41 | Example of sources |
| HTML | UKDA Study 4177 Information.htm | 18 | Study information and citation |
Metadata
In the context of data management, metadata are a subset of core data documentation, which provides standardised structured information explaining the purpose, origin, time references, geographic location, creator, access conditions and terms of use of a dataset. Metadata are typically used:
• for resource discovery, providing searchable information that helps users to find existing data
• as a bibliographic record for citation
Metadata for online data catalogues or discovery portals are often structured to international standards or schemes such as Dublin Core, ISO 19115 for geographic information, Data Documentation Initiative (DDI), Metadata Encoding and Transmission Standard (METS) and General International Standard Archival Description (ISAD(G)).
The use of standardised records in eXtensible Mark-up Language (XML) brings key data documentation together into a single document, creating rich and structured content about the data. Metadata can be viewed with web browsers, can be used for extract and analysis engines and can enable field-specific searching. Disparate catalogues can be shared and interactive browsing tools can be applied. In addition, metadata can be harvested for data sharing through the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).
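As an illustration of a structured XML metadata record, the sketch below builds a flat record from a handful of Dublin Core elements with Python's standard library. It is a simplified example under stated assumptions: the wrapper element name `metadata` and all field values are hypothetical, and a real data centre would normally generate a fuller record (for example in DDI) from its deposit form.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace


def build_record(fields):
    """Build a flat XML record with one Dublin Core element per field."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name, value in fields.items():
        child = ET.SubElement(root, "{%s}%s" % (DC_NS, name))
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical dataset description, invented for illustration.
record = build_record({
    "title": "Example survey of rural households",
    "creator": "A. Person",
    "date": "2009",
    "coverage": "England",
    "rights": "Available to registered users",
})
```

Because each element is named and namespaced, a catalogue can search on a specific field such as `dc:creator` rather than on free text.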
Researchers typically create metadata records for their data by completing a data centre’s data deposit form or by using a metadata creation tool, such as Go Geo GeoDoc or MetaGenie. Providing detailed and meaningful dataset titles, descriptions and keywords, etc. enables data centres to create rich metadata for archived datasets. This should enable more comprehensible resource discovery and data that are easier to use.
Data centres accompany each dataset with a bibliographic citation that data users are required to state in research outputs to reference and acknowledge accurately the data source used. A citation gives credit to the data source and distributor and identifies data sources for validation.
Choice of formats and conversions
The selection of software used to collect, create and digitise data typically depends on how researchers wish to analyse data. When considering the long-term usability of data, attention needs to be given to the most appropriate software and data format to use.
All digital information is designed to be interpreted by computer programs to make it understandable and is by nature software dependent. All digital data are thus endangered by the loss and obsolescence of the hardware and software environment on which access to data depends.
Despite the backward compatibility of many software packages to import data created in previous software versions and the interoperability between competing popular software programs, the safest option to guarantee long-term data access is to convert data to standard formats that most software is capable of interpreting, and that are suitable for data interchange and transformation. This typically means using open formats - such as Rich Text Format (RTF) or Open Document Format (ODF) - or as open as possible, as opposed to proprietary ones.
Thus, whilst researchers use the most suitable data formats and software according to their planned analyses, once data analysis is completed and data are prepared to be stored, researchers should consider converting their research data to standard, interchangeable formats, in order to avoid being unable to use the data in the future. Equally for back-ups of data, standard formats should be considered.
For long-term digital preservation, data archives also hold data in such standard formats. At the same time, data are offered to users by conversion to current common and user-friendly data formats.
When researchers offer data to data archives for preservation, it is preferable for the researchers themselves to convert data to a preferred data preservation format, as the person knowing the data is in the best position to ensure data integrity during conversions. Advice should be sought on up-to-date formats, as software changes occur so quickly.
When data are converted from one format to another - through export or by using data translation software - certain changes may occur to the data. For data held in spreadsheets or databases, some data or internal metadata may be lost during conversions to another format, e.g. missing value definitions, decimal numbers or variable labels. For textual data, editing such as highlighting or bold text may be lost. After software conversions, data should therefore be checked for errors or changes that may be caused by the export process. The researchers knowing the data are in the best position to carry out such conversions.
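One way to check for changes after a conversion is to read the converted file back and compare it value by value with the original. The sketch below does this for an export to CSV using only the Python standard library; the function names and the sample rows in the usage note are hypothetical, and a real check would be adapted to the software and formats actually used.

```python
import csv
import io


def export_to_csv(rows, fieldnames):
    """Write rows (a list of dicts) to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def find_conversion_changes(original_rows, csv_text):
    """List (row number, field, before, after) for values that changed."""
    converted = list(csv.DictReader(io.StringIO(csv_text)))
    changes = []
    if len(converted) != len(original_rows):
        changes.append(("row count", None, len(original_rows), len(converted)))
    for number, (before, after) in enumerate(zip(original_rows, converted), 1):
        for field, value in before.items():
            # CSV stores everything as text, so compare text representations.
            if str(value) != after.get(field):
                changes.append((number, field, value, after.get(field)))
    return changes
```

For example, a row such as `{"id": 1, "weight": 12.5, "status": None}` is reported as changed in the `status` field, because the CSV writer stores the undefined value as an empty string. This is the kind of lost missing-value definition described above.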
DATA FORMATS AND SOFTWARE
Case study
The Wessex Archaeology Metric Archive Project5 has brought together metric animal bone data from a range of archaeological sites in England into a single database format. The dataset contains a selection of measurements commonly taken during Wessex Archaeology zooarchaeological analysis of animal bone fragments found during field investigations. The dataset was created by the researchers in MS Excel and MS Access formats and deposited with the Archaeology Data Service (ADS) in the same formats. ADS has preserved the dataset in Oracle and in CSV format and disseminates the data both via an Oracle/Cold Fusion live interface and as downloadable CSV files.
Data formats currently recommended by UKDA for long-term preservation of research data
Note that other data centres or digital archives may recommend different formats

| Type of data | Preferred format for management back-ups and data preservation | Other acceptable formats for data preservation |
|---|---|---|
| Quantitative tabular data with extensive metadata, i.e. a dataset with variable labels, code labels, and defined missing values, in addition to the matrix of data | SPSS portable (.por) format, or delimited text and command (‘setup’) file (SPSS, Stata, SAS, etc.) containing metadata information, other structured text/mark-up file containing metadata information e.g. DDI XML file | proprietary formats of statistical packages, e.g. SPSS (.sav), Stata (.dta) |
| Quantitative tabular data with minimal metadata, i.e. a matrix of data with or without column headings/variable names, but no other metadata or labelling | comma-delimited (.csv) or tab-delimited (.tab) files, including delimited text of given character set with SQL data definition statements where appropriate - these are most widely used, and most widely recognised by import ‘wizards’ | delimited text of given character set - only characters not present in the data should be used as delimiters, widely-used formats e.g. MS Excel (.xls/.xlsx), MS Access (.mdb/.accdb), dBase (.dbf) and OpenDocument Spreadsheet (.ods) |
| GIS and CAD data, i.e. vector and raster | ESRI Shapefile (.shp, .shx and .dbf), geo-referenced TIFF (.tif and .tfw), CAD data (.dwg), GIS attribute data - ‘tabular data with minimal metadata’ | MapInfo Interchange Format (.mif) for vector data, Keyhole Markup Language (.KML) as used for Google Earth, Google Maps, Adobe Illustrator, CAD data (.dxf or .svg), binary formats of GIS and CAD packages may be acceptable |
| Qualitative data, textual | eXtensible Markup Language (XML) marked-up text according to an appropriate Document Type Definition (DTD) or schema, Rich Text Format (.rtf), plain text data, ASCII (.txt) | Hypertext Markup Language (HTML), widely-used proprietary formats e.g. MS Word (.doc/.docx), proprietary/software-specific formats such as NUD*IST, NVivo and ATLAS.ti |
| Digital image data | TIFF (version 6) uncompressed | JPEG (.jpeg, .jpg), TIFF (other versions), Adobe Portable Document Format (PDF/A or PDF), raw image format (.RAW), software-specific formats, e.g. Photoshop files (.psd) |
| Digital audio data | Free Lossless Audio Codec (FLAC) (.flac), WAV file (.wav) | MPEG-1 Audio Layer 3 (.mp3), Audio Interchange File Format (AIFF) (.aif) |
| Digital video data | JPEG 2000 | |
| Documentation | Rich Text Format (.rtf), PDF/A or PDF, HTML (.htm), Open Document Text (.odt) | plain text (.txt), widely-used proprietary formats e.g. MS Word (.doc/.docx) or Excel (.xls/.xlsx), are acceptable but offer less long-term security, XML marked-up text according to an appropriate DTD or schema, e.g. XHTML 1.0 |
Transcription
Where qualitative data are collected as audio or video recordings, such as for interviews or focus groups, they ideally should also be transcribed as textual files for archiving and sharing. Transcripts should:
• have a unique identifier
• have a document header giving brief details of the data collection event, including date, place, interviewer name, interviewee details, etc.
• have a uniform layout throughout the research project
• make use of speaker tags indicating the question/answer sequence
• use pseudonyms to anonymise personal identifying information
• have line breaks
• be page numbered
Data quality control
Quality control of data is an integral part of all research and takes place at various stages, during data collection, data preparation and data verification.
During data collection, researchers must ensure that thedata recorded reflect the actual facts, responses orevents, for example:
• Computer-Aided Interview (CAI) software can be used to verify response consistency, routing questions so that only appropriate questions are asked and confirming responses against previous answers where appropriate
• for audio-visual data or interview recordings, the quality of data depends on the quality of the audio-visual equipment used
• if data are collected with instruments, calibration of instruments is essential to check the precision, bias and/or scale of measurement; data are validated by checking for equipment as well as transcription errors; data may be verified by checking the truth of the record with an expert or by taking multiple measurements, observations or samples
The quality of data collection methods used has a significant bearing on data quality.
During data preparation, when data are transcribed, entered in a database or spreadsheet, coded, etc., quality is ensured by adhering to standardised and consistent procedures for data entry and transcription. This may include setting up validation rules in data entry software or detailed labelling of variable and record names to avoid confusion.
During data verification, data are edited, cleaned, verified, double-checked and cross-checked. Checking typically involves both automated and manual procedures. This may include double-checking the coding of responses and removing out-of-range codes; verifying random samples of the digital data against the original data; verifying the entire dataset; or double entry of data.
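Two of the automated checks mentioned above, finding out-of-range codes and comparing a double entry of the same records, can be sketched in a few lines of Python. This is an illustrative sketch only; the function names, field names and codes in the usage note are hypothetical.

```python
def out_of_range(records, field, valid_codes):
    """Return the positions of records whose code is not a documented one."""
    return [i for i, record in enumerate(records)
            if record[field] not in valid_codes]


def double_entry_mismatches(first_pass, second_pass):
    """Compare two independent keyings of the same records.

    Returns (position, field, first value, second value) for each
    disagreement, so that the original source can be consulted.
    """
    mismatches = []
    for i, (a, b) in enumerate(zip(first_pass, second_pass)):
        for field in a:
            if a[field] != b.get(field):
                mismatches.append((i, field, a[field], b.get(field)))
    return mismatches
```

For instance, `out_of_range(records, "marstat", {1, 2, 3, -9})` lists the records to check against the paper questionnaire. Neither function corrects anything itself, because deciding which value is right remains a manual step.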
Sample transcript
Study Title: ‘Healthy diets across generations’
Depositor: A. Person
Interviewer: A. Person
Interview number: 12
Interview ID: Chris Smith
Date of interview: 3 May 2007
Information about interviewee
Date of birth: 6 June 1949
Gender: male
Marital status: widowed
Occupation: bricklayer
Geographic region: North-East England
I: Just one or two factual details first of all
before we go on to your health and that....
how old are you?
FL: I'm 58 in June.
I: What schools did you go to? Can you remember
that far back!
FL: Oh... the last school was at Longside.. aye,
ken Longside?
Version control
It is important to ensure that different copies or versions of files, materials held in different formats or locations and information that is cross-referenced between files are all subject to version control. Checks and procedures should be put in place to make sure that if the information in one file is altered, the related information in other files is also updated. It is important to keep track of which version of a file is the most current, especially where files are shared between people or held in different locations.
Best practice is to:
• uniquely identify files, preferably using a systematic naming convention
• clearly record version and status of a file, e.g. draft, interim, final, internal
• record what changes are made to a file when a new version is created
• record relationships between items as in many cases the information contained in a single file is supported by information held in other files, e.g. relationship between the code and the data file it is run against, or between the data file and the documentation or metadata that relate to it, or between multiple tables
• track the location of all files if stored in a variety of locations
• regularly synchronise files in different locations, e.g. using MS SyncToy software
• maintain single master files in a suitable format to remove version control problems associated with multiple working versions being developed in parallel
Version control can be maintained through:
• file naming conventions, using number sequences or dates in file names - although avoid very long file names or using spaces and special characters in file names
• including a file history or version control table at the start of each file, in which versions, dates, authors and details of changes to the file are recorded
• versioning software
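As an illustration of a systematic naming convention, the following minimal Python sketch builds file names from a project code, a short description, a date, a version number and a status. The field order, the separators and the example values (`ReefSurvey`, `site counts`) are assumptions made for this sketch, not a convention prescribed by this guide.

```python
import re
from datetime import date


def clean(text: str) -> str:
    """Replace spaces and special characters with single hyphens."""
    return re.sub(r"[^A-Za-z0-9]+", "-", text).strip("-")


def versioned_name(project: str, description: str, version: int,
                   status: str, when: date, ext: str) -> str:
    """Build a file name that records project, content, date, version and status."""
    return (f"{clean(project)}_{clean(description)}_{when:%Y%m%d}"
            f"_v{version:02d}_{clean(status)}.{ext}")


# Hypothetical example values
name = versioned_name("ReefSurvey", "site counts", 3, "draft",
                      date(2009, 5, 14), "csv")
print(name)  # ReefSurvey_site-counts_20090514_v03_draft.csv
```

Zero-padded version numbers and year-month-day dates sort correctly in an ordinary directory listing, which makes the most current version easier to spot.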
Authenticity
Digital information can be copied, altered or deleted very easily. It is therefore important to be able to demonstrate the authenticity of data and to prevent unauthorised access to data that may potentially lead to unauthorised changes.
Best practice to ensure authenticity and control access is to:
• keep a master file (a formalised and checked master copy of the data and other materials)
• assign responsibility for master files where possible to an individual member of the project team
• restrict write access to master versions to specific members of the project team
• create a formal procedure for the destruction of master files
• record all changes to master files
• maintain old master files (in case later ones contain errors)
• archive copies of master files at certain stages of development
Making back-ups
Making back-ups of files is an essential element of data management. Regular back-ups protect against accidental or malicious data loss due to:
• hardware failure
• software or media faults
• virus infection or malicious hacking
• power failure
• fire, flood or other severe catastrophe
• human errors
Backing up involves making copies of files which can be used to restore originals if there is loss of data. Choosing a precise back-up procedure to adopt depends on local circumstances, the perceived value of the data and the levels of risk considered appropriate for the circumstances. Where data contain personal information, care should be taken to only create the minimal number of copies needed, e.g. a master file and one back-up copy.
When deciding upon the best back-up procedure for data files, consider:
• whether to back up particular data files or back up the entire system
• the frequency of back-up
  • back up after each change to data or at regular intervals
  • frequently used and critical data files may be backed up daily using an automated back-up process
  • if data are held on an institutional network space, they may be automatically backed up at regular intervals thanks to an institutional back-up policy
• carrying out incremental or differential back-ups
  • incremental back-up consists of first making a copy of all relevant files, often the complete contents of a personal computer, and then making incremental back-ups of the files which have altered since the last back-up; removable media (CD/DVD) is recommended
  • for differential back-ups a complete back-up is made first, then back-ups are made of files changed or created since the first full back-up and not just since the last partial back-up; using ‘fixed’ media such as hard drives is recommended
• choice of media
  • depends on the quantity of files, type of data, and the preferred method of backing up
  • examples include recordable CD/DVD, networked hard drive, removable hard drive or magnetic tape
• location of back-up files
  • online back-up files or offline storage on removable media or transportable hard drives which can be physically removed to another location for safe-keeping
• organising and labelling well all back-up files
• formats
  • back-ups of master copies should ideally be in formats that are suitable for long-term digital preservation
  • this typically means open as opposed to proprietary formats
• verifying and validating back-up files regularly
  • fully restoring them to another location and comparing them with the originals
  • comparing back-up copies for completeness, for example by checking the MD5 sum values, file size and date to ensure the integrity of the files
• not overwriting old back-ups with new ones
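The verification step described above can be sketched in Python using the standard `hashlib` module. This is only a minimal illustration: the function names, the directory layout and the demonstration files are assumptions, and a production procedure would also record dates and keep a log of each check.

```python
import hashlib
import tempfile
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Return the MD5 checksum of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_backup(original_dir, backup_dir):
    """Report files that are missing from, or differ in, the back-up."""
    original_dir, backup_dir = Path(original_dir), Path(backup_dir)
    problems = {}
    for source in sorted(original_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(original_dir)
        copy = backup_dir / relative
        if not copy.is_file():
            problems[relative.as_posix()] = "missing"
        # Compare file sizes first; checksums are only computed if sizes match
        elif (source.stat().st_size != copy.stat().st_size
              or md5sum(source) != md5sum(copy)):
            problems[relative.as_posix()] = "differs"
    return problems


# Demonstration with temporary, hypothetical files
with tempfile.TemporaryDirectory() as tmp:
    original, backup = Path(tmp, "original"), Path(tmp, "backup")
    original.mkdir()
    backup.mkdir()
    (original / "survey.csv").write_text("id,count\n1,12\n")
    (backup / "survey.csv").write_text("id,count\n1,12\n")
    (original / "notes.txt").write_text("field notes")
    report = compare_backup(original, backup)

print(report)  # {'notes.txt': 'missing'}
```

An empty report means every original file has an identical copy in the back-up location; it does not show that the back-up can be restored, which still needs to be tested separately.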
DATA STORAGE, BACK-UP AND SECURITY
Data storage
The storage of any digital research data should be based on two principles:
• digital storage media are inherently unreliable unless they are stored appropriately
• all file formats and physical storage media will ultimately become obsolete
Best practice is to:
• store data in formats which meet long-term readability requirements
• in general this means non-proprietary formats or open standard formats
• some proprietary formats, such as MS Rich Text Format and Excel, are widely used and likely to be accessible for a reasonable, but not unlimited, time after becoming obsolete
• obtain up-to-date guidance since changes to formats and media may occur quickly
• make digital versions of paper documentation in PDF/A format for long-term security
• copy/migrate data files to new media between two and five years after they were first created, since optical media (CDs and DVDs) and magnetic media (hard drives, tapes) are subject to physical degradation
• check the data integrity of all stored data files at regular intervals
• ensure that any storage strategy, even for a short-term project, involves at least two different forms of storage
• organise stored data well, ensuring they are easily located and physically accessible
• ensure that areas and rooms designated for storage of digital or non-digital data are suitable for the purpose, are structurally sound and free from the risk of flood and fire
Also note that optical and magnetic media are vulnerable to poor handling, changes in temperature, changes in relative humidity, air quality and lighting conditions. The National Preservation Office has published guidelines⁶ on caring for CDs and DVDs, which are available on the British Library web site.
Non-digital printed materials and photographs are subject to degradation from sunlight and acid, e.g. from sweat on skin and in some kinds of paper.
Case Study
A project at a university carries out coral reef research. Field data are collected using handheld Personal Digital Assistants (PDAs). Digital data are transmitted daily to the university network drive, where they are held in password protected files. All data files are identified by an individual version number and creation date. Version information (version numbers and notes detailing differences between versions) is stored in a spreadsheet, also on the network drive.
The university’s network drive is fully backed up onto Ultrium LTO2 data tapes. Incremental back-ups are made daily Monday to Thursday; full server back-ups are made over Friday/Saturday/Sunday. Tapes are securely stored in a separate building. Upon completion of the research the datasets are deposited in the university’s digital repository.
Data security
Data security is the protection of any data from unauthorised access, use, change, disclosure and destruction, as well as the prevention of unwanted changes that can affect the integrity of data.
Some elements of data security, where the safe-guarding of personal data is involved, are based on national legislation (the Data Protection Act) and therefore cannot be ignored. Personal data should only be accessible to authorised persons. Personal data may also exist in non-digital format. Consider for example the storage of completed consent forms, interview cover sheets or paper copies of a questionnaire survey. These should be protected in the same way as digital files. In addition, personal and identifying data (e.g. people’s names and/or addresses) should be stored separately from the related anonymised data.
Data which contain personal information should be treated with higher levels of security than data which do not. Security arrangements therefore need to be proportionate to the nature of the data and the risks involved. Ensuring data security means paying attention to physical security, network security and security of computer systems and files.
Physical security includes:
• restricting access to rooms and buildings where digital data, computers or media are held
• logging the removal of, and access to, media or hardcopy material in store rooms
• transporting personal data only under certain exceptional circumstances, even for repair purposes, e.g. giving a failed hard drive containing sensitive data to a computer manufacturer may cause a breach of security
• carrying out the destruction of data in a consistent manner – paper should be shredded and computer files permanently deleted from all systems
• ensuring the secure destruction of files on a computer at the end of its productive life
  • deleting all files and reformatting a hard drive will not prevent the possible recovery of data that have previously been on that hard drive
  • specialist advice should be sought where needed; CD/DVD shredders should be used and hard drives should be removed from their casings and disposed of securely
Case Study
In February 2008 the British Library (BL) received the recorded output of the Survey of Anglo-Welsh Dialects (SAWD), carried out by University College, Swansea between 1969 and 1995. This survey recorded the English spoken in Wales by interviewing and tape-recording elderly speakers on topics including the farm and farming, the house and housekeeping, nature, animals, social activities and the weather.
The collection was deposited in the form of 503 digital audio files, which were accessioned as .wav files in the BL’s Digital Library. Digital clones of all files are held at the Archive of Welsh English, alongside the original master recordings on 151 audio cassettes, from which the digital copies were created. The BL’s Digital Library is mirrored on four sites: Boston Spa, St Pancras, Aberystwyth and a ‘dark’ archive which is provided by a third party. Each of these servers has inbuilt integrity checks. The British Library makes available access copies for users, in the form of .mp3 audio files, in the British Library Reading Rooms via the Soundserver system. A small set of audio extracts from the SAWD recordings are also available online on the BL’s Accents and Dialects web site, Sounds Familiar.
Network security means:
• not storing confidential data such as those containing names or addresses on servers or computers connected to an external network, particularly servers that host internet services, either web or email
Security of computer systems and files includes:
• locking computer systems with a password and installing a firewall system
• protecting from viruses and malicious code through regularly updated virus detection software
• implementing password protection of, and access permissions to, data files (no access, read only, read and write, administrator permission)
• imposing confidentiality agreements for data users of confidential data
• never sending personal or confidential data unencrypted via email or File Transfer Protocol (FTP), but transmitting them only as encrypted data
The risk of security breaches and disclosure of confidential data can also be removed or reduced by:
• anonymising digital data by removing identifiers or aggregating data (see section on Anonymising Data)
• separating disclosable from non-disclosable data by obscuring, removing or hiding individual fields, records, columns or tables
Encrypting data for transmission
After testing a number of software applications for encrypting data to allow secure data transmission from government departments to the data archive, the UKDA recommends the use of Pretty Good Privacy (PGP), an industry-standard encryption technology. Available supporting encryption software can be open source, e.g. GnuPG, or commercial, e.g. PGP.
Encryption requires the creation of a Public and Private Key pair and passphrase. The Private PGP Key and passphrase are used to digitally sign each encrypted file, and thus allow the recipient to validate the sender’s identity. The recipient’s Public PGP Key is installed by the sender in order to encrypt files so that only the authorised recipient can decrypt them.
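For readers using GnuPG, the sign-and-encrypt step might be scripted along the following lines. This Python sketch only builds the `gpg` command; it does not run it. The file name and both email addresses are hypothetical, and the sketch assumes that the sender's key pair already exists and that the recipient's public key has been imported into the sender's keyring.

```python
import shlex


def gpg_encrypt_command(source, recipient, signer, output=None):
    """Build a GnuPG command that signs a file and encrypts it for one recipient."""
    output = output or f"{source}.gpg"
    return [
        "gpg",
        "--output", output,        # where the encrypted file is written
        "--local-user", signer,    # sender's private key, used for signing
        "--recipient", recipient,  # recipient's public key, used for encrypting
        "--sign",
        "--encrypt",
        source,
    ]


# Hypothetical file name and key identities
command = gpg_encrypt_command("survey_v03.csv",
                              "archive@example.org",
                              "researcher@example.org")
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)`; GnuPG then asks for the passphrase that protects the sender's private key.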
When research involves obtaining data from people, e.g. in social, anthropological or medical research, researchers are expected to maintain high ethical standards. Ethical guidelines are typically issued by professional bodies, institutions and funding organisations.
Research will usually require obtaining informed consent for people to participate in research and for use of the information collected. It is essential that consent also takes into account long-term use of data, such as preserving and sharing data. Without consent for data sharing, opportunities for sharing research data with other researchers can be jeopardised.
Personal, confidential and sensitive personal data
At times data obtained from people may hold sensitive or confidential information. This does not mean that all data obtained by research with participants are confidential.
Personal data are defined in the Data Protection Act 1998 as data which relate to a living individual who can be identified from those data, or from those data and other information which is in the possession of, or is likely to come into the possession of, the data controller (e.g. researcher). This includes any expression of opinion about the individual. Confidential data are data that:
• can be connected to the person providing them or that could lead to the identification of a person referred to (names, addresses, occupation, photographs)
• are given in confidence, or data agreed to be kept confidential (secret) between two parties, that are not in the public domain
• are conditioned by factors such as ethical guidelines, legal requirements or research-specific consent agreements
Sensitive personal data are defined in the Data Protection Act 1998 as data that may incriminate a participant or third party, such as a person's race, ethnic origin, political opinion, religious beliefs, trade union membership, physical or mental health, sexual orientation, criminal proceedings or convictions.
Strategies for dealing with confidentiality depend upon the nature of the research, but are essentially informed by a researcher’s ethical obligations towards participants and society and by legislation such as the Data Protection Act 1998.
Legislation⁷ that may impact on the sharing of confidential data:
• duty of confidentiality
• Data Protection Act 1998
• Freedom of Information Act 2000
• Human Rights Act 1998
• Statistics and Registration Service Act 2007
• Environmental Information Regulations 2004
Sensitive and confidential data can be shared ethically if researchers pay attention, from the planning stages of research, to three important aspects:
• when gaining informed consent, include consent for data sharing
• where needed, protect people’s identities by anonymising data
• consider access restrictions to data
These measures should be considered jointly and never in isolation. The same measures form part of good research practice and data management, even if data sharing is not envisioned.
RESEARCH ETHICS, CONSENT AND DATA CONFIDENTIALITY
Sample consent statement for quantitative surveys
Thank you very much for agreeing to participate in this survey.
The information provided by you in this questionnaire will be used for research purposes. It will not be used in a manner which would allow identification of your individual responses.
Anonymised research data will be archived at ………. in order to make them available to other researchers in line with current data sharing practices.
Sample extensive consent form for interviews
I have read and understood the project information sheet dated DD/MM/YYYY. ☐
I have been given the opportunity to ask questions about the project. ☐
I agree to take part in the project. Taking part in the project will include being interviewed and audio recorded [other forms of participation can be listed]. ☐
I understand that my taking part is voluntary; I can withdraw from the study at any time and I will not be asked any questions about why I no longer want to take part. ☐
Select only one of the next two options:
I would like my name used where what I have said or written as part of this study will be used in reports, publications and other research outputs so that anything I have contributed to this project can be recognised. ☐
I do not want my name used in this project. ☐
I understand my personal details such as phone number and address will not be revealed to people outside the project. ☐
I understand that my words may be quoted in publications, reports, web pages, and other research outputs but my name will not be used unless I requested it above. ☐
I agree for the data I provided to be archived at ………. [More detail can be provided here so that decisions can be made separately about audio, transcripts, etc.] ☐
I understand that other researchers will have access to this data only if they agree to preserve the confidentiality of that data and if they agree to the terms I have specified in this form. ☐
I understand that other researchers may use my words in publications, reports, web pages, and other research outputs according to the terms I have specified in this form. ☐
I agree to assign the copyright I hold in any materials related to this project to [name of researcher]. ☐
________________________ ________________ ________
Name of Participant Signature Date
________________________ ________________ ________
Researcher Signature Date
Contact details for further information: Names, phone, email addresses, etc.
Informed consent and data sharing
It is essential that gaining consent takes into account any future uses of data, such as the sharing, preservation and long-term use of research data. Researchers should:
• inform participants how research data will be stored, preserved and used in the long-term
• inform participants how confidentiality will be maintained e.g. by anonymising data
• obtain informed consent (written or verbal) for data sharing
To ensure that consent is informed, consent must be freely given with sufficient information provided on all aspects of participation and data use. There must be active communication between the parties. Consent must never be inferred from a non-response to a communication such as a letter.
Written or verbal consent?
Whether informed consent is obtained in writing through a detailed consent form, by means of an informative statement, or verbally, depends on the nature of the research, the kind of data gathered, the data format and how the data will be used.
• For detailed interviews or research where personal, sensitive or confidential data are gathered, the use of written consent forms is recommended to assure compliance with the Data Protection Act and with ethical requirements. Written consent documentation typically includes an information sheet and consent form signed by the participant.
• For surveys or informal interviews, where no personal data are gathered or personal identifiers are removed from the data, obtaining written consent may not be required. At a minimum an information sheet should be provided to participants detailing the nature and scope of the study, the identity of the researcher(s) and what will happen to the data collected (including any data sharing).
• If data are collected verbally through audio or video recordings, verbal consent agreements can be recorded together with the data.
• For audio-visual data where the identity of people may be disclosed from the data, it may be important that informed consent is obtained to use the data unaltered for research purposes, sharing and preservation. Voice alteration or image blurring are usually labour and cost intensive and decrease the research potential of data.
Sample consent forms⁸ are available from the UK Data Archive.
Research Ethics Committees and data sharing
There is a potential tension between data sharing and data protection. Data archives work to increase availability of, and access to, research data, while the primary purpose of Research Ethics Committees (RECs) is to ensure ethical conduct in research and to protect the safety, rights and well-being of research participants.
The need to protect personal data and preserve confidentiality, where explicitly required, cannot be overstated. This does not mean, however, that all research data obtained from research with people should be kept confidential, cannot be shared or, worse still, must be destroyed. It is important to distinguish between personal or sensitive data collected in research, and research data in general.
Personal data should not be disclosed, unless consent has been given for disclosure. Identifiable information may be excluded from data sharing. A REC should, however, not object to the sharing of research data in general. If research data contain sensitive or confidential information, then the sharing of such data must be considered carefully, but should not be dismissed as being impossible.
One-off or process consent?
Discussing and obtaining consent for:
• participation in research
• the use of the information gathered for analyses, publications and outputs
• data sharing beyond the research
can be a one-off occurrence or an ongoing process.
One-off consent is simple, practical, avoids repeated requests to participants, and meets the formal requirements of most Research Ethics Committees. However, it may place too much emphasis on ‘ticking boxes’.
If consent is considered throughout the research process, it assures active informed consent from participants. Thus, consent for participation in research, for data use and for data sharing can be considered at different stages of the research, giving participants a clearer view of what participating in the research involves and what the data to be shared consist of. It may, however, be too repetitive and annoying for some participants.
Special consent⁹ considerations are needed for:
• medical research
• research with children and young adults
• research with people with learning difficulties
• research within organisations or the workplace
• research into crime
• internet research
Case Study
The Biological Records Centre (BRC)¹⁰ is the national custodian of data on the distribution of wildlife in the British Isles. Data are provided by volunteers, researchers and organisations. BRC disseminates data for environmental decision-making, education and research.
Data whose publication could present a significant threat to a species or habitat (e.g. nesting location of birds of prey) will be treated as confidential. The BRC provides access to the data it holds via the National Biodiversity Network Gateway. Standard access controls are as follows:
• public access to view and download all records at a minimum 10 km² level of resolution, and at higher resolution if the data provider agrees
• registered users have access to view and download all except confidential records at the 1 km² level of resolution
• conservation organisations have access to view and download all except confidential records at full resolution with attributes
• conservation officers in statutory conservation agencies have access to view and download all records, including confidential records at full resolution with attributes
• records that have been signified as confidential by a data provider will not be made available to the conservation agencies without the consent of the data provider
Anonymising data
Before data obtained from research with people can be disseminated, published or shared with other researchers, they may need to be anonymised¹¹ so that individuals, organisations or businesses cannot be identified from the data. Anonymisation may be needed for ethical reasons to protect people’s identities, for legal reasons to not disclose personal data, or for commercial reasons. Personal data should not be disclosed from research information, unless a respondent has given specific consent to do so. In some forms of research, for example where oral histories are recorded or in anthropological research, it is customary to publish and share the names of people studied, for which they have given their consent.
Quantitative datasets may be anonymised by:
• removing direct identifiers, e.g. name or address
• aggregating or reducing the precision of a variable, e.g. replacing the date of birth by age groups or replacing full postcodes with postcode sectors
• generalising the meaning of a detailed text variable, e.g. replacing a doctor’s detailed area of medical expertise by ‘an area of medical speciality’
• restricting the upper or lower ranges of a variable to hide outliers, e.g. top-coding salaries
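The techniques listed above can be sketched for a single survey record as follows. This is a hedged illustration only: the field names, the ten-year age bands and the salary ceiling are assumptions chosen for the example, the record is invented, and age is used as a stand-in for a value derived from a date of birth.

```python
def age_band(age, width=10):
    """Replace an exact age with a band such as '40-49'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def postcode_sector(postcode):
    """Reduce a full UK postcode to its sector, e.g. 'CO4 3SQ' -> 'CO4 3'."""
    compact = postcode.replace(" ", "").upper()
    # The inward code is the last three characters; the sector keeps its first digit
    return f"{compact[:-3]} {compact[-3]}"


def top_code(value, ceiling):
    """Cap values above the ceiling so that outliers are hidden."""
    return min(value, ceiling)


def anonymise(record, salary_ceiling=100_000):
    """Return a copy of the record without direct identifiers."""
    return {
        "age_group": age_band(record["age"]),
        "postcode_sector": postcode_sector(record["postcode"]),
        "salary": top_code(record["salary"], salary_ceiling),
    }


# Invented example record
record = {"name": "A. Example", "age": 47,
          "postcode": "CO4 3SQ", "salary": 250_000}
print(anonymise(record))
# {'age_group': '40-49', 'postcode_sector': 'CO4 3', 'salary': 100000}
```

The name is dropped simply by not copying it into the output, so the original record is left unchanged for use within the research team.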
Special attention may be needed for:
• relational data, where relations between variables in related datasets can disclose identities
• geo-referenced data, where identifying spatial references such as point co-ordinates also have a geographical value
Simply removing spatial references prevents disclosure, but it also means that all geographical, locational and related information is lost. A better option may be to keep spatial references intact and to impose access restrictions on the data instead. As an alternative, point co-ordinates may be replaced by larger, non-disclosing geographical areas or by meaningful alternative variables that typify the geographical position.
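One way to replace point co-ordinates by larger areas is to snap each point to the corner of the grid cell that contains it. The short Python sketch below assumes non-negative projected co-ordinates in metres (for example eastings and northings on a national grid) and a 10 km cell size; both are assumptions for illustration, and the co-ordinates are invented.

```python
def grid_square(easting, northing, cell_size=10_000):
    """Return the south-west corner of the grid cell containing a point (metres)."""
    return ((easting // cell_size) * cell_size,
            (northing // cell_size) * cell_size)


# Invented point co-ordinates
print(grid_square(601_234, 225_678))         # (600000, 220000)
print(grid_square(601_234, 225_678, 1_000))  # (601000, 225000)
```

A larger cell size gives stronger protection but less geographical detail, so the choice depends on how sparsely the area is populated and on what the data will be used for.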
When anonymising qualitative material such as transcriptions of textual data, identifiers should not be crudely removed or aggregated, as this can distort the data or even make them unusable. Rather pseudonyms, replacement terms or vaguer descriptors should be used. The objective should be to achieve a reasonable level of anonymisation, avoiding unrealistic or overly harsh editing, whilst maintaining maximum content. Procedures to anonymise data should always be considered alongside obtaining informed consent for data sharing.
Best practice for qualitative data is to:
• plan anonymisation at the time of transcription or initial write up
• use pseudonyms or replacements
• retain unedited versions of data for use within the research team and for preservation
• create an anonymisation log of all replacements, aggregations or removals made; care should be taken to store such a log separately from the anonymised data files
• identify replacements in a meaningful way, e.g. with [brackets]
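A minimal Python sketch of pseudonymising a transcript while building an anonymisation log is given below. The function name, the structure of the log and the sample sentence are assumptions for illustration; real transcripts usually need manual review as well, because names can appear in variant spellings.

```python
import re


def pseudonymise(text, replacements):
    """Replace each identifier with a bracketed substitute and log the changes."""
    log = []
    for original, substitute in replacements.items():
        # Match whole words only, so 'Mary' does not match inside 'Maryland'
        pattern = re.compile(r"\b" + re.escape(original) + r"\b")
        text, count = pattern.subn(lambda _match, s=substitute: f"[{s}]", text)
        if count:
            log.append({"original": original,
                        "replacement": f"[{substitute}]",
                        "occurrences": count})
    return text, log


# Invented transcript extract
transcript = "Mary said the farm near Llanelli flooded. Mary moved her sheep."
anonymised, log = pseudonymise(
    transcript, {"Mary": "Anna", "Llanelli": "town in South Wales"})

print(anonymised)
# [Anna] said the farm near [town in South Wales] flooded. [Anna] moved her sheep.
for entry in log:
    print(entry)
```

The log links pseudonyms back to real identities, so it should be written to a separate, access-restricted location and never stored with the anonymised files.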
Digital manipulation of audio and image files can be used to remove personal identifiers. However, techniques such as voice alteration and image blurring are labour-intensive and expensive to apply to large quantities of data and are likely to damage the research potential of the data. If confidentiality of audio-visual data is an issue, it is better to obtain the participant’s consent to use and share the data unaltered.
A person’s identity can be disclosed from:
• direct identifiers, e.g. name, address, postcode information or telephone number
• indirect identifiers that, when linked with other publicly available information sources, could identify someone, e.g. information on workplace, occupation or exceptional values of characteristics like salary or age
Access control
Under certain circumstances, sensitive and confidential data can also be safeguarded by regulating or restricting access to and use of such data, while at the same time enabling data sharing for research and educational purposes.
Data held at data centres and archives are not generally in the public domain. Their use is restricted to specific purposes after user registration. Users sign an End User Licence in which they agree to certain conditions, e.g. not to use data for commercial purposes or to identify any potentially identifiable individuals.
Data centres may impose stricter access regulations for confidential data, such as:
• providing access to approved researchers only
• requiring data access authorisation from the data owner prior to release
• placing confidential data under embargo for a given period of time until confidentiality is no longer pertinent
• providing secure access, which enables analysis of confidential data without direct access to, or the ability to download, the data
Mixed levels of access regulations may be put in place for some datasets, combining regulated access to confidential data with user access to non-confidential data.
Data centres typically liaise with the researchers who own the data in selecting the most suitable type of access for data. Access regulations should always be proportionate to the kind of data and confidentiality involved.
COPYRIGHT
Researchers ‘creating’ data hold copyright oversuch data. Copyright is an intellectual property right that protects the owner of a work from its unauthorisedcopying. Most research materials including spreadsheets,publications, reports and computer programmes, fallunder literary work and are therefore protected bycopyright. Facts, however, cannot be copyrighted.
If information is structured in a database, the structureacquires a database right, alongside the copyright in thecontent of the database. A database may be protected byboth copyright and database right. For database right toapply, the database must be the result of substantialintellectual investment in obtaining, verifying orpresenting the content.
In the case of interviews, the interviewee holds the copyright in the spoken word. If a transcription is a substantial reproduction of the words spoken, the speaker will own copyright in the words and the transcriber will have separate copyright in the transcription.
In the case of collaborative research or derived data, copyright may be held jointly by various researchers or institutions. Copyright should be assigned correctly, especially if datasets have been created from a variety of sources, for example sources that have been purchased or ‘lent’ by other researchers.
In academia, the employer is in theory the first owner of the copyright in a work made during the course of the employee’s employment. Many academic institutions, however, waive copyright in research materials, data and publications and give ownership to the researchers. Researchers should check whether their institution retains copyright or waives it.
When data are archived or shared, the researcher or data creator keeps the copyright over data. A data archive cannot effectively archive data unless all the rights holders are identified and give their permission for the data to be archived.
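The condition above, that every rights holder must be identified and must give permission, can be expressed as a short pre-deposit check. This is an illustrative sketch under stated assumptions: `RightsHolder` and `blockers_to_archiving` are hypothetical names, and a real deposit would rest on signed licences rather than a boolean flag.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RightsHolder:
    name: str
    permission_given: bool  # True once permission to archive is on record


def blockers_to_archiving(holders: List[RightsHolder]) -> List[str]:
    """List the reasons a deposit cannot go ahead; an empty list means none."""
    if not holders:
        # Nothing can be archived until the rights holders are identified.
        return ["no rights holders identified"]
    return [
        f"no permission from {h.name}"
        for h in holders
        if not h.permission_given
    ]


# Assumed example: the researcher's own data combined with licensed data.
holders = [
    RightsHolder("Researcher", permission_given=True),
    RightsHolder("Licensed data supplier", permission_given=False),
]
print(blockers_to_archiving(holders))
```

Returning the list of outstanding permissions, rather than a simple yes or no, shows the researcher which rights holders still need to be approached.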
Secondary users of data must obtain copyright clearance from the rights holder before data can be reproduced. However, data can be copied for non-commercial teaching or research purposes without infringing copyright, under the fair dealing concept, provided that ownership of the data is acknowledged. An acknowledgement should give credit to the data source used, the data distributor and the copyright holder. Data centres typically specify how data use should be acknowledged and cited, either within the metadata record for a dataset or in a data use licence.
Case study - Media sources
Scenario: A researcher has collated articles about the Prime Minister from The Guardian over the past ten years, using the Lexis Nexis database to source articles. The articles are then transcribed/copied by the researcher into a database so that content analysis can be applied. The researcher offers a copy of the database with the original transcribed text to a data centre.
Rights Issue: The researcher cannot share either of these data sources as they do not have copyright in the original material. A data centre cannot accept these data, as to do so would be a breach of copyright. The rights holders, in this case The Guardian and Lexis Nexis, would need to provide consent for archiving.
Case study - Data obtained from UKDA
Scenario: A researcher has used the National Diet and Nutrition Survey (NDNS) data, obtained via the UKDA. NDNS data are Crown Copyright. The researcher has processed the NDNS data (filtered, integrated and aggregated data across variables, while maintaining individual records) and used the processed data to model food chain risks. The researcher would like to archive the processed data that were used as input data for the modelling, as well as the modelling code, at the UKDA.
Rights Issue: There is joint copyright over the processed data, shared between the researcher and the Crown (holding copyright over the NDNS data). The researcher should declare this joint copyright for the modelling input data and requires no further permission from the Crown. The UKDA End User Licence, which the researcher signed when obtaining the NDNS data from the UKDA, specifically states: "offer for deposit any new data collections derived from the data supplied or created by the combination of the data supplied with other data." Thus the UKDA can archive the processed data with a joint copyright declaration.
Case study - Data collected using in-depth interviews with 'elites'
Scenario: A researcher has interviewed five retired cabinet ministers about their careers, producing audio recordings and full transcripts. The researcher then analyses the data and offers them to a data centre for preservation. However, the researcher did not get signed copyright transfers for the interviewees’ words.
Rights Issue: In this case it would be problematic for a data centre to accept the data. Large extracts of the data cannot be quoted by secondary users. To do so would breach the interviewees’ copyright over their words. This is equally a problem for the primary researcher. The researcher should have asked for transfer of copyright or a licence to use the data obtained through interviews, as the possibility exists that the interviewee may at some point wish to assert the right over their words, e.g. when publishing memoirs.
Case study - Data purchased under licence
Scenario: A researcher subscribes to access spatial AgCensus data from EDINA. These data are then integrated with data collated by the researcher. As part of the ESRC award contract the data have to be offered for archiving at the UKDA. Can such integrated data be offered?
Rights Issue: The subscription agreement on accessing AgCensus data states that data may not be transferred to any other person or body without prior written permission from EDINA. Therefore, the UKDA cannot accept the integrated data unless the researcher obtains permission from EDINA. The researcher’s partial data, with the AgCensus data removed, can be archived. Secondary users could then re-combine these data with the AgCensus data, if they were to obtain their own AgCensus subscription.
LINKS AND REFERENCES
1 UK Data Archive: www.data-archive.ac.uk
2 RELU Data Management Plans: www.data-archive.ac.uk/relu/plan.asp
3 Wellcome Trust Policy on data management and sharing: www.wellcome.ac.uk/About-us/Policy/Policy-and-position-statements/WTX035043.htm
4 Social and Environmental Conditions in Rural Areas (SECRA) dataset: www.sei.se/relu/secra/
5 Wessex Archaeology Metric Archive Project: ads.ahds.ac.uk/catalogue/resources.html?abmap_grimm_na_2008
6 National Preservation Office - Caring for CDs and DVDs: www.bl.uk/npo/pdf/cd.pdf
7 UKDA - Research ethics and legislation relevant to data sharing: www.data-archive.ac.uk/sharing/legal.asp
8 UKDA - Consent forms: www.data-archive.ac.uk/sharing/consentforms.asp
9 UKDA - Special cases of consent: www.data-archive.ac.uk/sharing/consentspecial.asp
10 Biological Records Centre: www.brc.ac.uk
11 UKDA - Anonymising research data: www.data-archive.ac.uk/sharing/anonymise.asp
UK Data Archive, University of Essex, Wivenhoe Park, Colchester CO4 3SQ
Email: [email protected]
Tel: +44 (0)1206 872572/872974