RELU Conference , 20 January 2005

RELU Data Support Service RELU-DSS

Data Management Workshop

Louise Corti and Isabella Tindall


Workshop overview

• Guidance for creating and sharing high quality data

• Will cover the key practical, technical, legal and ethical issues including:

• An overview of the RELU themes and projects

• Data Management Policy and the RELU Data Support Service

• ESRC’s and NERC’s existing Datasets Policies

• Accessing ESRC and NERC archived data holdings

• Data held by third parties

• QA and data management plans

• Data formats, metadata and standards that allow for longer term sharing and archiving

• Ethical and legal issues

• Questions/Discussion


RELU Programme

• Rural Economy and Land Use Programme

• Harnessing the sciences for sustainable rural development:

Rural areas in the UK are experiencing a period of considerable change. The rural economy and land use programme aims to advance understanding of the challenges caused by this change today and in the future. Interdisciplinary research is being funded between 2004 and 2009 in order to inform policy and practice with choices on how to manage the countryside and rural economies.

The rural economy and land use programme enables researchers to work together to investigate the social, economic, environmental and technological challenges faced by rural areas. The programme will encourage social and economic vitality of rural areas and promote the protection and conservation of the rural environment.


Themes and data

• RELU themes:
– A The Integration of Land and Water Use
– B The Environmental Basis of Rural Development
– C Sustainable Food Chains (Call 1)
– D Economic and Social Interactions with the Rural Environment

• Call 1: 27 projects funded; smaller pilots/scoping/capacity building and 8 major data research projects

• Programme is both using and creating a variety of data sources

• Disparate types of data – social and environmental and biological data


Call 1: Research projects

• Eating Biodiversity: An Investigation of the Links between Quality Food Production and Biodiversity Protection

• Comparative Assessment of Environmental, Community & Nutritional Impacts of Consuming Fruit & Vegetables Produced Locally and Overseas

• Biological Alternatives to Chemical Pesticide Inputs in the Food Chain: An Assessment of Environmental and Regulatory Sustainability

• Warmwater Fish Production as a Niche Production and Market Diversification Strategy for Organic Arable Farmers with Implications for Sustainability and Public Health

• Implications of a Nutrition Driven Food Policy for Land Use and the Rural Environment

• Sustainable and Holistic Food Chains for Recycling Livestock Waste to Land

• Integration of Social and Natural Sciences to Develop Improved Tools for Assessing and Managing Food Chain Risks Affecting the Rural Economy

• Re-Bugging the System: Promoting Adoption of Alternative Pest Management Strategies in Field Crop Systems


RELU Data Management Policy

The data management policy enhances the capabilities for interdisciplinarity and therefore improves the ability of the research community to:

• apply learning from one field to another

• combine different methodological approaches and sources of information

• cross-fertilise ideas and concepts

• understand scientific, technological and environmental problems in their social and economic contexts


Policy principles

• Publicly funded research data are a valuable, long term resource

• To ensure maximum research exploitation, data must be managed effectively from day one

• Researchers must collect data in such a way as to ensure longer term sharing, and manage their data effectively during the life of a project

• RELU funds will support data management through the life of the project

• Data must be made available by researchers for archiving: ESRC and NERC supported data centres provide long-term, post-project data management


RELU Data Support Service

• Set up to provide a support service for RELU researchers and staff to gain information and guidance on issues surrounding longer-term data sharing and preservation

• Joint support service run by:
– ESRC/JISC supported UK Data Archive at Essex
– The NERC-supported Centre for Ecology & Hydrology

• Funded for one year supporting one FTE and outreach activities: 1 Jan 05 – 31 Dec 05


RELU-DSS

• a data management advisory and support service for Call 1 award holders and Call 2 applicants and successful award holders

• a web-based information portal that will provide

– expert guidance on data management issues

– a searchable meta-data catalogue, detailing the data that RELU award-holders are intending to produce

• a programme of outreach and training aimed at RELU award holders

• the facilitation of access to key external data sources for RELU projects, where required

• guidance to the PMG and data sub-group on data management issues and longer-term costing for supporting RELU projects’ data management


Research Council Data Policies

RELU Data Management Policy builds on:

• NERC data policy, found in the Data Policy Handbook available from the NERC web site: www.nerc.ac.uk/data/documents/datahandbook.pdf

• ESRC Datasets Policy, found at www.esrc.ac.uk/esrccontent/researchfunding/sec17.asp


ESRC Datasets Policy – what is expected of award holders?

• to preserve and share data from ESRC funded research

• funding allowed to prepare data for archiving

• all award-holders must offer data for deposit to the ESDS within 3 months of the end of the award

• any potential problems should be notified to the ESDS at the earliest opportunity

• final payment will be withheld if dataset has not been deposited within 3 months of the end of the award, except where a waiver has been agreed in advance


NERC Data Policy – Thematic Programmes

• all managers of NERC programmes are expected to be familiar with the Policy

• scientists are expected to consider all the scientific data management implications of their projects at the planning stage (and before submitting grant applications), consulting the Designated Data Centres (DDCs) responsible for scientific data in their subject area.

• The appropriate DDC should be consulted as soon as it is clear what datasets will be emerging from the project. At the end of their projects grant holders are required to offer to deposit with NERC a copy of datasets resulting from their research


Longer-term data sharing

• data centres/archives make (selected) data available to other bona fide researchers

• safeguards protect the interests of the original collector, who may retain Intellectual Property Rights

• preserve data using up-to-date curation systems and keep pace with technology and data trends


RELU Theme C data types

• Social data – people based
– Micro (survey)
• Household or individual level attributes
• Behaviour, attitudes and options
• Business/company
– Farm level data
– Aggregated
• UK Census (e.g. small area statistics)
• Retail statistics
• Health indicators

• GIS/Spatial data: geographically referenced environmental databases
– Ordnance Survey
– Road networks
– Settlement


RELU Theme C data types continued

• Water quality, land fill, air quality, emission levels

• Soil data, eg mineral composition

• Ecological data, animal and bird distributions

• Agricultural census

• Climate and meteorological data

• River flow data

• Biochemical data relating to foods/habitats


Existing 3rd party datasets

• Research Council data centres
– Rothamsted (BBSRC experimental samples of crops and soils)
– Economic and Social Data Service (e.g. ESRC Health and Lifestyle survey)
– EDINA/UK Borders (boundary data for admin areas)

• Public/Private Research institutes
– Macaulay
• soils and derived; climate; land cover; land capability data

• Department for Environment, Food and Rural Affairs (DEFRA), e.g. Farm Business Survey

• Scottish Executive Environment and Rural Affairs Department (SEERAD)

• Environment Agency (EA)

• National Soil Research Institute

• Met Office


Use of 3rd party datasets

• 3rd parties likely to require RELU to:

– Identify one point of contact for discussing data issues
• E.g. a NERC Data Centre for EA datasets
– All partners in a project to sign licences for use of data
– The Data Centre to be responsible for issuing licences to other projects wishing to use the same data
– The Data Centre to distribute the data, once licences have been signed


Access to ESRC/NERC data resources


ESRC/JISC Data Centre

• national data archiving and dissemination service, running from 1 Jan. 2003

www.esds.ac.uk

• jointly supported by:
– Economic and Social Research Council
– Joint Information Systems Committee

• partners:
– UK Data Archive (UKDA), Essex
– Manchester Information and Associated Services (MIMAS), Manchester
– Cathie Marsh Centre for Census and Survey Research (CCSR), Manchester
– Institute of Social and Economic Research (ISER), Essex


ESDS overview

• ESDS Management
– central help desk service; coherent and flexible collections development policy; central registration service; universal data portal

• ESDS Access and Preservation
– collections development strategy; ingest activities, including data and documentation processing; metadata creation; data dissemination services; long-term preservation

• Specialist data services
– ESDS Government
– ESDS International
– ESDS Longitudinal
– ESDS Qualidata
– dedicated web sites; data and documentation enhancements; tailored user support; outreach and training


ESDS Holdings

Data for research and teaching purposes and used in all sectors and for many different disciplines

• official agencies - mainly central government

• individual academics - research grants

• market research agencies

• public records/historical sources

• links to UK census data

• qualitative and quantitative

• international statistical time series

• access to international data via links with other data archives worldwide

• history data service in-house (AHDS)

• 4,000+ datasets in the collection

• 200+ new datasets are added each year

• 6,500+ orders for data per year

• 18,000+ datasets distributed worldwide pa


The large-scale government surveys

• General Household Survey

• Labour Force Survey

• Health Survey for England/Wales/Scotland

• Family Expenditure Survey

• British Crime Survey

• Family Resources Survey

• National Food Survey/Expenditure and Food Survey

• ONS Omnibus Survey

• Survey of English Housing

• British Social Attitudes

• National Travel Survey

• Time Use Survey


Benefits of the large-scale government datasets

• good quality data
– produced by experienced research organisations
– usually nationally representative with large samples
– good response rates
– very well documented

• continuous data
– allows comparison over time
– data is largely cross-sectional

• hierarchical data
– individual and household
– intra-household differences
– household effects on individuals

[Chart: Percentage of women aged 18-49 cohabiting, 1979 to 2000. Source: General Household Survey]


Search on ‘Environmental’

200+ datasets found


Types of qualitative data

• diverse data types: in-depth interviews; semi-structured interviews; focus groups; oral histories; mixed methods data; open-ended survey questions; case notes/records of meetings; diaries/research diaries

• multimedia: audio, video, photos and text (most common is interview transcriptions)

• formats: digital, paper, analogue audio-visual

• data structures - differ across different ‘document types’


International data providers

• International Monetary Fund

• OECD

• United Nations

• World Bank

• Eurostat

• International Labour Organisation

• UK Office for National Statistics

• freely available to UK HE/FE – data licensing costs are paid by ESRC

• datasets delivered over the web via Beyond 20/20

Databanks cover:

• economic performance and development

• trade, industry and markets

• employment

• demography, migration and health

• governance

• human development

• social expenditure

• education

• science and technology

• land use and the environment


ESDS: Online access to data and user guides

• web pages
– easy to navigate format
– web catalogue with variable level searching
– subject browsing and major series
– free web access to online doc - pdf user guides and forms

• registration
– one-off registration with userid/password
– online account management and “Shopping Basket” ordering
– data are freely available for the majority of users
– One-stop Athens authentication

• data download and online browsing
– web download in various software formats - SPSS, STATA, tab-delimited, Word
– Nesstar – online data analysis and visualisation
– ESDS International online system
– ESDS Qualidata online browsing system


NERC Data Centres

• NERC’s data holdings – core asset

• Network of 7 Designated Data Centres, which are responsible for managing NERC funded data and implementing the NERC Data Policy

• Central directory – the NERC metadata gateway

• E-Science funded NERC Data Grid under development


NERC Designated Data Centres

• Antarctic Environmental Data Centre: Responsible for all NERC's data from the Antarctic, regardless of discipline.

• British Atmospheric Data Centre: Responsible for atmospheric sciences data.

• British Oceanographic Data Centre: Responsible for marine data.

• National Geosciences Information Service: Responsible for geosciences data.

• National Water Archive: Responsible for NERC's hydrological data and for the Government's National River Flow Archive.

• Environmental Information Centre: Responsible for all other NERC terrestrial and freshwater data.

• NERC Earth Observation Data Centre: Responsible for NERC’s non-discipline-related remotely sensed data of the surface of the Earth acquired by satellite and airborne sensors.


NERC Data Centre Holdings

• The NERC MetaData Gateway simultaneously searches the catalogues of data held at several of the NERC designated data centres.


QA and Data Management Plans


Data Management Plan

• proforma to complete (Section 3 of the Project Communication and Data Management Plan)

• highlighting data management and custody issues at an early stage

• providing a basis for quality assurance within the Programme

• providing a basis from which award holders and the Programme Director can report and monitor project and overall RELU Programme progress


Data management

• Award holders will be required to provide full metadata together with a description of the datasets which their project generates
– metadata is the information necessary to interpret, understand and use a given dataset without reference to the original data collector

• Agree the technical arrangements for data management and archiving (including decisions concerning final archiving destination for project data sets)
– formats for supply of data
– licence agreements; IPR etc.


Information required from plan

• requirements for access to existing datasets

• details of new and derived datasets to be produced

• quality assurance of data

• formats and standards

• data description and documentation

• ethical, legal issues and IPR resolution

• data back-up procedures, security

• archiving data (for Research Council data archives)

• data management representative

RELU-DSS helps support these areas


Quality control and data management issues

• Survey data

• Qualitative data

• Environmental data


Characteristics of a “good” archived research collection

• Life cycle approach taken

• accurate data, well organised and labelled files

• appropriate measurement of key concepts

• supporting data/documentation should be deposited to a standard that would enable them to be used by a third party
– major stages of research recorded
– research/measurement instruments documented

• data that can be stored in user-friendly “dissemination” formats, but can also be archived in a future-proof “preservation” format

• consent, confidentiality & copyright resolved


ESDS: Supporting documentation

• To produce catalogue record and user guide
– funding application
– questionnaire/interview schedules
– description of methodology (details of sample design, response rate, etc)
– “codebook” (variable names, variable descriptions, code names and variable formatting information)
– technical report describing the research project
– communication with informants on confidentiality
– coding schemes/themes
– End of award report
– software description/versions used
– bibliographies, resulting publications
– code used to create derived variables or check data (e.g. SPSS, STATA or SAS “command files”)

• Anything that adds insight or aids understanding and secondary usage


Standardised description (metadata) fields taken from DDI specification for social science datasets


Survey data - variables


Labelling of survey data

• all variables should be named. Variable names should not exceed 8 characters where possible, as the most common format for disseminating data is SPSS

• all variables should be labelled. Labels should be brief (preferably < 80 characters), but precise and always make explicit the unit of measurement for continuous (interval) variables. Where possible, all variable labels should reference the question number (and if necessary questionnaire). For example, the variable q11bhexc might have the label “q11b: hours spent taking physical exercise in a typical week”. This gives the unit of measurement and a reference to the question number (q11b), so the user can quickly and easily cross-reference to it


Labelling of survey data II

• for categorical variables, all codes (values) should be given a brief label (preferably < 60 characters). For example, p1sex (gender of person 1) might have these value labels: 1 = male, 2 = female, -8 = don’t know, -9 = not answered

• where possible, all such labelling should be created and supplied to the UKDA as part of the data file itself. This is the expectation with data supplied in one of the three major statistical packages - SPSS, STATA or SAS.
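
The labelling guidance on these two slides can be checked mechanically before deposit. The following is a minimal sketch, assuming the codebook is held as a plain Python dictionary; that structure, the function name `check_labelling` and the third example variable are hypothetical, while the length limits and the other two example variables come from the slides.

```python
# Limits taken from the guidance above.
MAX_NAME_LEN = 8          # variable names should not exceed 8 characters
MAX_VAR_LABEL_LEN = 80    # variable labels preferably under 80 characters
MAX_VALUE_LABEL_LEN = 60  # value labels preferably under 60 characters


def check_labelling(variables):
    """Return a list of human-readable labelling problems.

    `variables` maps a variable name to a dict with a "label" string and an
    optional "values" dict (code -> value label). This layout is an
    assumption made for the sketch, not a UKDA format.
    """
    problems = []
    for name, spec in variables.items():
        if len(name) > MAX_NAME_LEN:
            problems.append(f"{name}: name longer than {MAX_NAME_LEN} characters")
        label = spec.get("label", "")
        if not label:
            problems.append(f"{name}: missing variable label")
        elif len(label) >= MAX_VAR_LABEL_LEN:
            problems.append(f"{name}: variable label is {len(label)} characters")
        for code, value_label in spec.get("values", {}).items():
            if not value_label:
                problems.append(f"{name}: code {code} has no value label")
            elif len(value_label) >= MAX_VALUE_LABEL_LEN:
                problems.append(f"{name}: label for code {code} is too long")
    return problems


codebook = {
    "q11bhexc": {
        "label": "q11b: hours spent taking physical exercise in a typical week",
    },
    "p1sex": {
        "label": "gender of person 1",
        "values": {1: "male", 2: "female", -8: "don't know", -9: "not answered"},
    },
    # Hypothetical badly labelled variable, included to show the checks firing.
    "hoursofexercise": {"label": ""},
}

problems = check_labelling(codebook)
for problem in problems:
    print(problem)
```

A check of this kind only catches missing or over-long labels; whether a label states the unit of measurement or the question number still needs a human reader.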


QA survey data: validation checks

Computer aided surveys (CAPI, CATI or CAWI)

• these are the most accurate way of gathering survey data, but the software (e.g. Blaise) and hardware (e.g. a laptop for every interviewer) may be beyond project resources 

• computer aided surveys allow one to build in as many logical checks - on question routing and responses - as is possible at the point of data creation

Non computer aided surveys

• less control over initial responses, but checks can be performed:
– at the point of data entry/transcription if “data entry” software is used. However, there are few cheap data entry packages around
– the only feasible option may be to enter data without checks directly into a spreadsheet style interface (e.g. Excel worksheet, SPSS data view), and perform validation checks afterwards - via command files in statistical packages or Visual Basic code in Excel or Access
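
As an illustration of validation checks performed after data entry, here is a minimal sketch in Python rather than the statistical command files or Visual Basic mentioned above. The question names (`q1`, `q2`), their codes and the routing rule are hypothetical; only the idea of response checks and routing checks comes from the slide.

```python
MISSING_CODES = (-8, -9)  # don't know / not answered


def validate_record(record):
    """Return a list of validation failures for one entered questionnaire.

    Hypothetical rules: q1 asks whether the respondent takes exercise
    (1 = yes, 2 = no); q2 records weekly hours and is asked only if q1 == 1.
    A value of None means the question was left blank.
    """
    errors = []
    q1 = record.get("q1")
    q2 = record.get("q2")

    # Response check: q1 must hold a valid code.
    if q1 not in (1, 2) + MISSING_CODES:
        errors.append("q1: invalid code")

    # Range check: a week has 168 hours.
    if q2 is not None and q2 not in MISSING_CODES and not 0 <= q2 <= 168:
        errors.append("q2: hours out of range")

    # Routing checks: q2 must follow the route set by q1.
    if q1 == 2 and q2 is not None:
        errors.append("q2: answered although q1 routed past it")
    if q1 == 1 and q2 is None:
        errors.append("q2: missing although q1 routed to it")
    return errors


records = [
    {"id": 1, "q1": 1, "q2": 5},
    {"id": 2, "q1": 2, "q2": 3},
    {"id": 3, "q1": 1, "q2": 200},
    {"id": 4, "q1": 7, "q2": None},
]

report = {r["id"]: validate_record(r) for r in records}
for record_id, errors in report.items():
    print(record_id, errors or "ok")
```

Running the checks over the whole file and keeping the report alongside the data gives a record of which cases were queried and why.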


An example of data seemingly untouched by the human eye:

Originating error in text variables:

Occupation: ‘sole trader’
Description of Occupation: ‘purveyor of seafood’

Propagated error in derived numeric variables:

• Respondent was coded under the standard occupational (SIC) code relating to food retailers: 52.2 Retail sale of food, beverages and tobacco in specialised stores


Identifiers

‘Direct' and 'indirect' identifiers may threaten confidentiality

• Direct identifiers may have been collected as part of the survey administration process and include names, addresses including postcode information, telephone number etc.

• Indirect identifiers are variables which include information that when linked with other publicly available sources, could result in a breach of confidentiality. This could include geographical information, workplace/organisation, education institution or occupation


Quantitative data

• Remove the identifier from the dataset

• Aggregate/reduce the precision of a variable
– record the year of birth rather than the day, month and year; record postcode sectors (first 3 or 4 characters) rather than full postcode

• Bracket a coded (categorical) variable
– aggregate SOC up to 'minor group' codes by removing the terminal digit

• Generalise the meaning of a nominal (string) variable

• Restrict the upper or lower ranges of a continuous variable
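
The techniques listed on this slide can be sketched in a few lines of Python. This is an illustration only: the field names, the example record and the income cap are hypothetical, and the postcode handling assumes a full postcode written with a space between its two parts.

```python
def reduce_disclosure_risk(record, income_cap=100000):
    """Return a copy of `record` with identifiers removed or coarsened."""
    safe = dict(record)

    # Remove direct identifiers from the dataset.
    for field in ("name", "address", "telephone"):
        safe.pop(field, None)

    # Reduce precision: keep only the year from a YYYY-MM-DD date of birth.
    safe["birth_year"] = int(safe.pop("date_of_birth")[:4])

    # Reduce precision: keep the postcode sector, taken here as the first
    # part of the postcode plus the first character of the second part.
    first_part, second_part = safe.pop("postcode").split()
    safe["postcode_sector"] = f"{first_part} {second_part[0]}"

    # Bracket a coded variable: drop the terminal digit of the SOC code.
    safe["soc_minor_group"] = safe.pop("soc_code")[:-1]

    # Restrict the upper range of a continuous variable (top-coding).
    safe["income"] = min(safe["income"], income_cap)
    return safe


respondent = {
    "name": "A. Respondent",
    "address": "1 Example Street",
    "telephone": "00000 000000",
    "date_of_birth": "1970-05-17",
    "postcode": "AB1 2CD",
    "soc_code": "2311",
    "income": 250000,
}

anonymised = reduce_disclosure_risk(respondent)
print(anonymised)
```

Working on a copy keeps the original record intact, so the coarsening rules can be revised later if they prove too weak or too strong.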


Online access to data

NESSTAR:

• browse detailed information (metadata) about these data sources, including links to other sources

• do simple data analysis and visualisation on microdata

• bookmark analyses

• download the appropriate subset of data in one of a number of formats (e.g. SPSS, Excel)

• Data must be ‘perfect’ - 100% labelled


Derived and aggregated products

• Permission to share and IPR is main issue

• Range of potential parties with interest:
– Owners, funders, data gatherers, employers, other stakeholders, etc.

• All original source information must be recorded


Transcribing qualitative data

• integrated into the ongoing research – budget accordingly

• full transcriptions or summaries

• costs and benefits:
– self transcription
– internal team transcription
– external transcription

• full transcriptions:
– consistent layout
– speaker tags
– line breaks
– header with identifier / other details
– checked for errors


Qualitative data: identifiers removed

• Scheme devised – different for each dataset

• Ideally should reflect any pseudonyms used in publications

• Confidentiality respected

• Anonymisation?

• Problems of anonymisation
– Applied too weakly
– Applied too strongly
– Timing
– Potential for distortion
– Examples

• User undertakings

• Appropriate and sympathetic


Qualitative Research

• e.g. set of in-depth interviews

• Data list: list of contents of research collection

• acts as a point of entry for secondary user

• qualitative data: Excel template for interviewee/case study characteristics
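
A data list of this kind can be kept as a simple table that opens in Excel. The sketch below writes one as CSV; the column names and the two example interviews are hypothetical and do not reproduce the actual ESDS Qualidata template.

```python
import csv
import io

# Hypothetical columns: one row per interview in the collection.
FIELDS = ["interview_id", "pseudonym", "interview_date", "transcript_file"]

interviews = [
    {"interview_id": "int001", "pseudonym": "Farmer A",
     "interview_date": "2005-03-01", "transcript_file": "int001.rtf"},
    {"interview_id": "int002", "pseudonym": "Retailer B",
     "interview_date": "2005-03-08", "transcript_file": "int002.rtf"},
]


def write_data_list(rows, handle):
    """Write the data list as CSV with a header row."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
write_data_list(interviews, buffer)
data_list = buffer.getvalue()
print(data_list)
```

Using the same identifiers in the data list, the transcript headers and the file names lets a secondary user move between them without guesswork.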


Online access to qualitative data

• new emphasis on providing direct access to collection content

– supports more powerful resource discovery

– greater scope for searching and browsing content of data (supplementary to higher level study-related metadata)

– since users can search and explore content directly… can retrieve data immediately

• providing access to qualitative data via common interface (ESDS Qualidata Online)

• supporting tools for searching, retrieval, and analysis across different datasets

Means that data must be accurate and standardised


Back up and security

• Digital, paper and audio media are fragile. Digital media are even easier to change/copy/delete!

• a good backup procedure will protect against a range of mishaps such as:

– accidental changes to data

– accidental deletion of data

– loss of data due to media or software faults

– virus infections & hackers

– catastrophic events (such as fire or flood)

• Back up frequently, retain off site copies

• Consider storage conditions, fireproofing etc.
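
One way to guard against silent changes or media faults is to verify each backup copy against a checksum of the original. The following is a minimal sketch under stated assumptions: the helper names `sha256_of` and `backup_file` and the file names are hypothetical, and the "offsite" folder merely stands in for a genuinely separate location.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def backup_file(source, backup_dir):
    """Copy `source` into `backup_dir` and confirm the copy matches the original."""
    source = Path(source)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)  # copy2 also preserves timestamps
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"Backup of {source} does not match the original")
    return target

# Demonstration in a temporary folder, with a hypothetical transcript file.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "interview01.txt"
    original.write_text("transcript text")
    copy = backup_file(original, Path(tmp) / "offsite")
    print(copy.name, sha256_of(copy) == sha256_of(original))
```

Storing the checksums alongside the backups would also allow later copies to be re-verified after the original has changed or been lost.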

Page 54: RELU Conference, 20 January 2005 RELU Data Support Service RELU-DSS Data Management Workshop Louise Corti and Isabella Tindall

ESDS in-house processing

• in-house data processing

– ‘cleaning up’ research data

– Collating documentation received from depositor

– repairing minor errors

– meeting users’ expectations

– cannot engage in major processing tasks unless destined for publishing into online systems

Page 55: RELU Conference, 20 January 2005 RELU Data Support Service RELU-DSS Data Management Workshop Louise Corti and Isabella Tindall

Environmental Data

Page 56: RELU Conference, 20 January 2005 RELU Data Support Service RELU-DSS Data Management Workshop Louise Corti and Isabella Tindall

Example: LOCAR Programme

• To better understand the hydrological, physical, chemical and biological processes operating in lowland catchments

• To improve modelling to support the integrated management of lowland catchment systems

• To create a database

– £7.75 Million

– Three catchments

– 12 Research projects

– Field Programme

Page 57: RELU Conference, 20 January 2005 RELU Data Support Service RELU-DSS Data Management Workshop Louise Corti and Isabella Tindall

Flow of data through LOCAR

[Figure: diagram of the flow of data through LOCAR. Boxes: The LOCAR Field Programme (water level, flow, quality, ecology, biology, rainfall, recharge, evaporation, groundwater); NERC and 3rd parties (e.g. OS, EA, MO); LOCAR Data Centre (lab, processing & QC); NERC metadata gateway; LOCAR PIs and users. Arrow labels: supplying data, finding data, requesting data, receiving data. Other labels: CST, User, Data Centre.]

Page 58: RELU Conference, 20 January 2005 RELU Data Support Service RELU-DSS Data Management Workshop Louise Corti and Isabella Tindall

Objectives of the LOCAR Data Centre

• Acquire major datasets

• Provide data to LOCAR Scientists

• Establish standards for data definition and exchange

• Receive data and model output from scientists

• Publish appropriate data at the end of the Programme

• Ensure long term security and availability of LOCAR data

Page 59: RELU Conference, 20 January 2005 RELU Data Support Service RELU-DSS Data Management Workshop Louise Corti and Isabella Tindall

Pang and Lambourn Catchments

Site            Pang/Lambourn   Tern   Frome/Piddle

Recharge        7               4      3

Borehole        5               13     9

Water Quality   6               6      14

Flow            8               6      11

Page 60: RELU Conference, 20 January 2005 RELU Data Support Service RELU-DSS Data Management Workshop Louise Corti and Isabella Tindall

Datasets from NERC

• River Network

• DTM

• Land Cover

• HOST

• Daily Mean Flows

• Rainfall

• Ground Water Level

• Keyworth Borehole Archive Records

• Wellmaster Borehole data

• Geological maps

Page 61: RELU Conference, 20 January 2005 RELU Data Support Service RELU-DSS Data Management Workshop Louise Corti and Isabella Tindall

Raingauges

• Automatic Raingauges

– 0.2 mm tipping bucket - hourly

• Manual Raingauges

– Checking Automatic gauge

• Rainwater collector

– Rainwater chemistry samples

Level and Flow

• Water levels

– Deep boreholes

• Flow

– EA gauging stations

– Ultrasonic doppler flow meter
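
As an illustration of how readings from a 0.2 mm tipping bucket gauge become hourly rainfall totals, the sketch below counts tips per clock hour and multiplies by the bucket size. It assumes, hypothetically, that the logger records one timestamp per tip; the actual LOCAR logging format is not given here, and the function name `hourly_rainfall` is invented for the example.

```python
from collections import Counter
from datetime import datetime

MM_PER_TIP = 0.2  # bucket size stated for the automatic raingauges

def hourly_rainfall(tip_times):
    """Return {start of hour: rainfall in mm} from a list of tip timestamps."""
    counts = Counter(
        t.replace(minute=0, second=0, microsecond=0) for t in tip_times
    )
    # Round to one decimal place to avoid floating-point noise such as 0.6000000000000001.
    return {hour: round(n * MM_PER_TIP, 1) for hour, n in sorted(counts.items())}

# Hypothetical tip times: two tips between 09:00 and 10:00, one after 10:00.
tips = [
    datetime(2005, 1, 20, 9, 5),
    datetime(2005, 1, 20, 9, 40),
    datetime(2005, 1, 20, 10, 15),
]
for hour, mm in hourly_rainfall(tips).items():
    print(hour.isoformat(), mm)
```

Hours with no tips are simply absent from the result; a fuller version would fill them with zero so that dry hours can be told apart from missing data.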

Page 62: RELU Conference, 20 January 2005 RELU Data Support Service RELU-DSS Data Management Workshop Louise Corti and Isabella Tindall

Automatic Weather Station

• Solar & net radiation

• Wind speed & direction

• Air temperature

• Relative humidity

• Atmospheric pressure

• Rainfall

• Soil temperature & heat flux

Hydra (Mk 3)

• Carbon Dioxide and Water Vapour Fluxes

Page 63: RELU Conference, 20 January 2005 RELU Data Support Service RELU-DSS Data Management Workshop Louise Corti and Isabella Tindall

Water Quality

• Temperature

• Conductivity

• Dissolved oxygen

• pH

• Turbidity

• River level

• Automatic water sampler

Ecology

• Salmon counts

• Smolt counts

• Redd counts

• Fish surveys

• River Habitat Surveys

• Plant surveys (Mean Trophic Rank)

• Diatom surveys

• Chironomid Exuviae

• Macro invertebrate surveys

Page 64: RELU Conference, 20 January 2005 RELU Data Support Service RELU-DSS Data Management Workshop Louise Corti and Isabella Tindall

Soil Moisture

• Neutron Probe

– Soil water content

– Radioactive source

– Manual

• Profile Probe

– 6 shallow depths

– Dielectric constant

– Automatic

Soil Water Potential

• Tensiometers

– Puncture Tensiometers (Shallow, Manual)

– Purgeable Tensiometers (Shallow, Automatic)

– Equitensiometers (Deeper, Automatic)

– Deep jacking tensiometers (depths up to 60m)

• Soil Water Chemistry

– Suction Samplers

Page 65: RELU Conference, 20 January 2005 RELU Data Support Service RELU-DSS Data Management Workshop Louise Corti and Isabella Tindall

Set up Tasks

• Hardware and Software requirements

• Create dictionaries

• Load site and instrument data

• Format conversion facilities

• Methods

• QC

• Meet with 3rd party suppliers

• Load 3rd party & NERC data

• Liaise with CSTs and PIs

• Website

Page 66: RELU Conference, 20 January 2005 RELU Data Support Service RELU-DSS Data Management Workshop Louise Corti and Isabella Tindall

Operational Tasks:

• Receive and load:

– Field data

– Data from researchers

• Maintenance

• Data dissemination

• Develop software

• Meetings with:

– researchers

– CSTs

– data managers

• Attend workshops, seminars and annual science meeting

• Report to steering committee

Page 67: RELU Conference, 20 January 2005 RELU Data Support Service RELU-DSS Data Management Workshop Louise Corti and Isabella Tindall

What can the Data Centre offer Scientists?

• Data

– Access to the field programme data

– Access to NERC data

– Access to third party data

• Data Management

– Data Centre – acquire, store, disseminate, long term storage, standards

– Web site

Page 68: RELU Conference, 20 January 2005 RELU Data Support Service RELU-DSS Data Management Workshop Louise Corti and Isabella Tindall

What does the Data Centre ask of Scientists?

• Appoint

– Quality and Data Managers

• Write and Maintain

– Quality and Data Management Plans

• Supply

– Data sets and metadata

• Observe the Data Policy

• Meet with the Data Centre

Page 69: RELU Conference, 20 January 2005 RELU Data Support Service RELU-DSS Data Management Workshop Louise Corti and Isabella Tindall

Access to datasets

• Build a metadata database

• Build a thesaurus of terms

• Provide a web based search tool

• Later provide web access to the datasets

Page 70: RELU Conference, 20 January 2005 RELU Data Support Service RELU-DSS Data Management Workshop Louise Corti and Isabella Tindall

Searching for metadata on the web

• Search:

– by keyword

– by project

– detailed search

– by theme

• Description of selected dataset:

– Title

– Abstract

– Contact

– Extent
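
The search options and record description listed above can be sketched as a small in-memory catalogue. This is a hypothetical illustration, not the actual web search tool: the class `DatasetRecord`, the two search functions and the example records are all invented, with only the field names Title, Abstract, Contact and Extent taken from the list.

```python
from dataclasses import dataclass

@dataclass
class DatasetRecord:
    title: str
    abstract: str
    contact: str
    extent: str
    project: str = ""
    theme: str = ""
    keywords: tuple = ()

def search_by_keyword(records, term):
    """Match a term against keywords exactly, or anywhere in the title or abstract."""
    term = term.lower()
    return [
        r for r in records
        if term in r.title.lower()
        or term in r.abstract.lower()
        or any(term == k.lower() for k in r.keywords)
    ]

def search_by_project(records, project):
    """Return records belonging to the named project (case-insensitive)."""
    return [r for r in records if r.project.lower() == project.lower()]

# Hypothetical catalogue entries.
catalogue = [
    DatasetRecord("Hourly rainfall", "Tipping bucket raingauge totals.",
                  "data manager", "Pang/Lambourn", project="Project A",
                  theme="hydrology", keywords=("rainfall", "raingauge")),
    DatasetRecord("Fish surveys", "Salmon and smolt counts.",
                  "data manager", "Frome/Piddle", project="Project B",
                  theme="ecology", keywords=("salmon", "ecology")),
]

for record in search_by_keyword(catalogue, "rainfall"):
    print(record.title, "|", record.extent)
```

A thesaurus of terms would typically sit in front of `search_by_keyword`, mapping a user's word to the preferred keyword before the search is run.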

Page 71: RELU Conference, 20 January 2005 RELU Data Support Service RELU-DSS Data Management Workshop Louise Corti and Isabella Tindall

Ethical and legal issues

Page 72: RELU Conference, 20 January 2005 RELU Data Support Service RELU-DSS Data Management Workshop Louise Corti and Isabella Tindall

Up front

• issues of consent and confidentiality allowing archiving should be included in the project management plan & addressed before data collection starts

• longer-term rights management in place and IPR issues considered

• unless a waiver on deposition has been agreed, researchers should not make commitments to informants which preclude archiving their data

Page 73: RELU Conference, 20 January 2005 RELU Data Support Service RELU-DSS Data Management Workshop Louise Corti and Isabella Tindall

Consent for archiving

• anonymity and privacy of research participants should be respected

• explicit ‘informed’ consent gained

• information for research participants should be clear and coherent and include:

– purpose of research

– what is involved in participation

– benefits and risks

– storage and access to data

– usage of data (current and future uses)

– withdrawal of consent at any time

– Data Protection & Copyright Acts

• N.B. Additional measures are needed when participants are unable to consent through incapacity or age

• reflect needs and views of all

• works in practice

Page 74: RELU Conference, 20 January 2005 RELU Data Support Service RELU-DSS Data Management Workshop Louise Corti and Isabella Tindall

Legal issues in data preparation

• ‘Duty of confidentiality’

• Law of Defamation

• Data Protection Act 1998 and EU Directive

• Copyright Act 1988

• Freedom of Information

Page 75: RELU Conference, 20 January 2005 RELU Data Support Service RELU-DSS Data Management Workshop Louise Corti and Isabella Tindall

Duty of Confidentiality

• disclosure of information may constitute a breach of confidentiality and possibly a breach of contract

• not governed by an Act of Parliament

• not necessarily in writing

• can be a legal contractual

• exemptions are:

– relevant police investigations or proceedings

– disclosure by court order

– ‘public interest’ - defined by the courts

– ethical obligations in cases of disclosure of child abuse

Page 76: RELU Conference, 20 January 2005 RELU Data Support Service RELU-DSS Data Management Workshop Louise Corti and Isabella Tindall

Law of Defamation

• a defamatory statement is one which may injure the reputation of another person, company or business

Page 77: RELU Conference, 20 January 2005 RELU Data Support Service RELU-DSS Data Management Workshop Louise Corti and Isabella Tindall

Data Protection Act 1998

• eight principles:

– Fairly and lawfully processed

– Processed for limited purposes

– Adequate, relevant and not excessive

– Accurate

– Not kept longer than necessary

– Processed in accordance with the data subject's rights

– Secure

– Not transferred to countries without adequate protection

• allows for secondary use of data for research purposes under certain conditions

Page 78: RELU Conference, 20 January 2005 RELU Data Support Service RELU-DSS Data Management Workshop Louise Corti and Isabella Tindall

Options for preserving confidentiality

• anonymisation

• consent to archive at the time of field work

• researcher contacts informants retrospectively

• user undertakings

• in exceptional circumstances - permission to use or closure of material

Page 79: RELU Conference, 20 January 2005 RELU Data Support Service RELU-DSS Data Management Workshop Louise Corti and Isabella Tindall

Copyright Act 1988

• developed for the broadcasting industry not research!

• protection of author’s rights

• multiple copyrights apply:

– automatically assigned to the speaker

– researcher holds the copyright in the sound recording of an interview (obtain written assignment of copyright from interviewee, or oral agreement (licence) to use)

– employer holds the copyright in research data (obtain copyright clearance from employer)

• copyright lasts for 70 years after the end of the year in which the author dies

• copying work is an infringement unless it is for the purposes of research, private study, criticism or review or reporting current events, and if the use can be regarded as being in the context of 'fair dealing'

• seek legal advice on problem issues

Page 80: RELU Conference, 20 January 2005 RELU Data Support Service RELU-DSS Data Management Workshop Louise Corti and Isabella Tindall

Freedom of Information

• Freedom of Information Act 2000

A statutory right for individuals and organisations to request information held by public authorities. FOI specifically excludes environmental information which is covered by …

• Environmental Information Regulations 2004

• Enables individuals and organisations to obtain environmental information held by public authorities….

Many RELU data sets will fall under the EIRs

Page 81: RELU Conference, 20 January 2005 RELU Data Support Service RELU-DSS Data Management Workshop Louise Corti and Isabella Tindall

What is the legislation?

• Statutory rights of access to information

• Apply to public authorities – BBSRC, ESRC, NERC and the universities are public authorities

• Anyone, anywhere can request a copy of any information you hold – includes data sets

• Not all information has to be released

• Must respond to most requests within 20 working days

Page 82: RELU Conference, 20 January 2005 RELU Data Support Service RELU-DSS Data Management Workshop Louise Corti and Isabella Tindall

Exemptions – information protected by law

• Don’t Panic - not all information has to be made available under FoI & EIRs

• FOI & EIRs provide a number of exemptions that can be applied to the release of information

• The presumption is that information will be made available unless for good reason (a public interest test).

• Exemptions protect scientific output, commercial business and personal information (through the Data Protection Act)

• Exemptions can be complex and difficult to apply. If in doubt, ask….

Page 83: RELU Conference, 20 January 2005 RELU Data Support Service RELU-DSS Data Management Workshop Louise Corti and Isabella Tindall

RELU-DSS

• The DSS will provide support to RELU award holders (Call 1 and 2) and round 2 applicants for Call 2, through a telephone and email help desk, a web portal and a series of training events.

• http://www.esds.ac.uk:8080/aandp/create/reludss.asp

• Email: [email protected]

• Tel: 01206 872572 or 01206 872974