© UKRI All rights reserved

Jaana Pinnick

British Geological Survey

Research Data and Digital Preservation Manager

29 October 2018

Building a Digital Preservation Programme

Enhancing the digital continuity of research data

What IS “digital preservation”?

Different interpretations of what digital preservation means, depending on role and experience

• Essential to develop and promote a common understanding of the digital preservation concept if the NGDC is to develop its digital preservation culture

• Essential to better integrate the research activities of scientists with corporate data management procedures, if progress is to be made in the long-term availability and usability (i.e. the digital continuity) of BGS’ data

Digital preservation includes…

• Persistent unique identifiers

• Significant properties

• Descriptive, discovery and T&Cs metadata

• Characterisation using technical metadata

• Preservation metadata

• Authenticity

• Data integrity – complete and unaltered data

• Fixity – unchanged digital files (using checksums)

• Maintaining access

• Renderability – continued ability to access a digital object

• Appraisal – what to preserve, what to dispose of

• Physical media obsolescence

• File format obsolescence

• Sustainability – maintenance and interoperability

PEOPLE!!!

But it is also about…

• Institutional policies and strategies

• Collaboration

• Advocacy

• Procurement and third party services

• Audit and certification

• Legal compliance

• Risk and change management

• Staff training and development

• Standards and best practice

http://handbook.dpconline.org/contents

TNA definitions

Digital preservation

The long-term archival management of digital information assets selected for their historical value, once they have passed out of business ownership

VS.

Digital continuity

The ability to use your information in the way that you need, for as long as you need. If you do not actively work to ensure digital continuity, your information can easily become unusable

Outline

• Initial review: Organisational background, what to preserve and for whom

• Defining the purpose: Objectives, benefits and challenges

• Taking the first steps: Assessing risk, using tools, and raising staff awareness

• Looking at the Big Picture: Writing a preservation policy and a business case

• Getting into it: Developing a preservation strategy and a digital asset register

British Geological Survey

Organisational background

• Approved Place of Deposit under the Public Records Act

• Making most of the data freely available under the Open Government Licence (OGL)

• Under legal obligation to manage some types of data

• UKRI best practice: data that by their nature cannot be re-measured or re-created […] may often warrant ‘indefinite storage and preservation’

National Geoscience Data Centre (NGDC)

• Robust and diverse data management skills

• Statutory, commercial and voluntary data donations

• Geoscience data from NERC grant-funded projects

• Heterogeneous data types and volumes

• The long validity of geoscience data means permanent retention is often required

One of the NERC Environmental Data Centres

What do we need to preserve?

Know your data!

Difficulty in appraising the value of geoscience research data

What is geoscience data?

• Borehole

• Bedrock

• Hydrogeology

• Geochemistry

• Seismic

• Marine geoscience

• Oil and gas

• Airborne geophysical

• Climate change

• Earth characteristics

• Rocks

• Sediments and soils

• Seismology

• Marine geology

• Land contamination

• Geological processes including erosion and volcanic activity

• Natural resources

• And many more

MSc stakeholder survey

Who uses our data? What for? How long?

• Heterogeneous stakeholder groups (academia, industry, services, government, general public…)

• Heterogeneous purposes of data use (business/personal decision making, consultancy work, public sector policy making, trading onwards, innovation, education, personal interest…)

• Length of data use (40% for ten years or longer, 40% for 3–9 years)

Defining the purpose

Objectives

• Maximise the long-term accessibility of digital data – by creating robust and fit-for-purpose contextual & preservation metadata

• Support innovation and economic growth using geoscience data – by working smarter, facilitating data reuse and increasing collaboration with scientific disciplines and partners

• Culture change – by raising awareness of and building up skills in digital preservation and research data management and by adopting and implementing best practice across the user community

Benefits

• Preservation planning increases financial and operational efficiencies

• Increases the value of unrepeatable and unique geoscience datasets and time-series data

• Enhances the potential for income generation and new service models

• Deduplication of data reduces storage costs and facilitates data retrieval

• Historical research data available for reuse and analysis when new tools and techniques become available

Data volumes vs. available resources

• Data deluge: sensor, real time, and monitoring data on the increase

• Funding: Building services using various sources

• Position: Amalgamating contradictory stakeholder requirements

• Staff: Securing a permanent digital skills base

A key challenge

Taking the first steps

NGDC online data deposit portal

Tools: Using DROID

Checksum values for fixity checks

File format profiling
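The two uses of DROID named on these slides, checksum values for fixity checks and file format profiling, can be illustrated with a small Python sketch. This is not DROID and it does not use the PRONOM signatures that DROID relies on: the function names (`checksum`, `guess_format`, `profile_directory`) and the three-entry magic-number table are assumptions made purely for illustration.

```python
import hashlib
from pathlib import Path
from typing import Dict, List

# A tiny, illustrative table of leading "magic" bytes. Real tools such as
# DROID use a far richer signature registry; this table is an assumption.
MAGIC_NUMBERS = {
    b"%PDF-": "PDF",
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"PK\x03\x04": "ZIP container",
}


def checksum(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def guess_format(path: Path) -> str:
    """Guess a format from the first bytes of the file, else 'unknown'."""
    with path.open("rb") as handle:
        header = handle.read(16)
    for magic, name in MAGIC_NUMBERS.items():
        if header.startswith(magic):
            return name
    return "unknown"


def profile_directory(root: Path) -> List[Dict[str, object]]:
    """Build one profile row per file under root, sorted by path."""
    rows = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rows.append({
            "path": path.relative_to(root).as_posix(),
            "size": path.stat().st_size,
            "format": guess_format(path),
            "sha256": checksum(path),
        })
    return rows
```

Storing the rows at deposit time gives a baseline: re-running the profile later and comparing the `sha256` values is one simple form of fixity check.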

Page 15: Building a Digital Preservation Programme · 10/29/2018  · TNA definitions Digital preservation The long-term archival management of digital information assets selected for their

15

© UKRI All rights reserved

Raising awareness

DPC International Digital Preservation Day 30 Nov 2017

DPC World Digital Preservation Day 29 Nov 2018

• The Simple Property-Oriented Threat (SPOT) Model for Risk Assessment defines six essential properties of successful digital preservation: availability, identity, persistence, renderability, understandability, and authenticity

• For each of these properties, a set of threats is identified which would seriously diminish the ability of the repository to achieve the property in question

• The threats are described at a high level, and focus on outcome

• An outcome-based typology of threats that individual custodial institutions can use in evaluating their own situational risk and risk mitigation strategies

SPOT Model: Risk Matrix

http://mirror.dlib.org/dlib/september12/vermaaten/09vermaaten.html

Risk priority 1

• Risk description: Bit errors, bit rot, deterioration of digital objects

• Consequences: Access to data may be lost; data objects become unavailable for preservation activities

• Management or mitigation methods: Fixity information, checksums, multiple copies of data

• Tools or technologies available: DROID, Fixity, Autopsy

Risk priority 2

• Risk description: Links between objects and associated metadata not captured or maintained

• Consequences: Long-term usability of data affected

• Management or mitigation methods: Use unique identifiers for data objects and link these to descriptive and preservation metadata IDs

• Tools or technologies available: BagIt; use a relational database to maintain links

Risk priority 3

• Risk description: Changing technologies

• Consequences: Hardware obsolescence; media obsolescence; authenticity of data lost if unable to fully render the original content

• Management or mitigation methods: Creation of a technology watch; use of open data formats; migration

Risk priority 4

• Risk description: File format changes

• Consequences: Access to data may be lost; authenticity of data objects may suffer; format obsolescence

• Management or mitigation methods: Migration, emulation, technical metadata, use of open formats

• Tools or technologies available: DROID, JHOVE, Apache Tika, Python Magic Library, SIARD

Risk priority 5

• Risk description: Sufficient preservation metadata not captured or created

• Consequences: Provenance and authenticity of data unverifiable; unable to make appropriate preservation decisions in future

• Management or mitigation methods: File identification tools; data rescue and forensics (expensive); maintain a full audit trail

• Tools or technologies available: Apache Tika, DSpace; may need manual intervention
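For risk priority 2, the matrix suggests unique identifiers plus a relational database to maintain the links between data objects and their metadata. A minimal sketch of that idea, using Python's built-in `sqlite3` module, is given here. The table names, column names and identifiers are hypothetical and do not describe the NGDC's actual systems.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical schema: every metadata record must point at an existing
# data object, so a link cannot silently dangle.
SCHEMA = """
CREATE TABLE data_object (
    object_id TEXT PRIMARY KEY,
    file_path TEXT NOT NULL
);
CREATE TABLE metadata_record (
    metadata_id TEXT PRIMARY KEY,
    object_id   TEXT NOT NULL REFERENCES data_object(object_id),
    kind        TEXT NOT NULL CHECK (kind IN ('descriptive', 'preservation')),
    content     TEXT NOT NULL
);
"""


def open_register() -> sqlite3.Connection:
    """Create an in-memory database with foreign-key checks switched on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def metadata_for(conn: sqlite3.Connection, object_id: str) -> List[Tuple[str, str]]:
    """List (metadata_id, kind) pairs linked to one data object."""
    cursor = conn.execute(
        "SELECT metadata_id, kind FROM metadata_record "
        "WHERE object_id = ? ORDER BY metadata_id",
        (object_id,),
    )
    return cursor.fetchall()


def unlinked_objects(conn: sqlite3.Connection) -> List[str]:
    """Return identifiers of data objects that have no metadata at all."""
    cursor = conn.execute(
        "SELECT o.object_id FROM data_object AS o "
        "LEFT JOIN metadata_record AS m ON m.object_id = o.object_id "
        "WHERE m.metadata_id IS NULL ORDER BY o.object_id"
    )
    return [row[0] for row in cursor]
```

A periodic run of something like `unlinked_objects` would surface objects whose metadata links were never captured, which is the risk this row of the matrix describes.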

Looking at the Big Picture

Preservation policy development

How to preserve?

• Dissertation / stakeholder survey findings

• Review of publicly available digital preservation policies and strategies

• Digital Preservation Coalition Handbook

• TNA ‘Parsimonious Preservation’

• Includes: Scope, objectives, benefits

• Outlines preservation framework, requirements and drivers, and roles and resources

• Describes key concepts for functional preservation

Who pays for preservation?

Writing a business case

• Identified challenges and set objectives going forward

• Identified key benefits and opportunities to the organisation

• Outlined a modular preservation programme based on OAIS

• Estimated staff costs and proposed measures of success

• Future vision?

Repository certification

• Builds stakeholder confidence in the repository (funders, users, publishers etc.)

• Benchmarking of NGDC processes, procedures and services against recognised standards

• Recognition as a trusted repository for the designated community

• Differentiates NGDC from other repositories

Accreditation Process

• May – June 2016: Create Working Group; identify Project Leads; reviewed requirements

• Oct 2016 – May 2017: Regular monthly meetings to review progress; Project Leads gathering responses to requirements

• June 2017: Submit application

• Sept 2017: Feedback received

• Oct 2017: Edited application resubmitted

• Feb 2018: Seal granted

Let’s get going!

Preservation strategy

• An internal action plan for preservation activities

• Dynamic in nature, modified as more information becomes available

[Figure: “ACTION PLAN” word graphic: collaboration, objective, improvement, check, strategy, implementation, schedule, act]

Know your data: digital asset register

1. What data assets currently exist?

2. Where are these assets located?

3. How have the assets been managed to date?

4. Which of these assets need to be maintained in the long term?

5. Do current data management practices place these assets at risk?
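The five questions above map naturally onto the fields of a register entry. The sketch below is one hypothetical way to model that in Python. The class name `AssetRecord`, its field names and the helper `preservation_priorities` are illustrative assumptions, not the structure of the BGS register.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AssetRecord:
    """One entry in a digital asset register (hypothetical layout)."""
    name: str                # 1. what data asset exists
    location: str            # 2. where the asset is located
    management_history: str  # 3. how it has been managed to date
    retain_long_term: bool   # 4. must it be maintained in the long term?
    at_risk: bool            # 5. do current practices place it at risk?


def preservation_priorities(register: List[AssetRecord]) -> List[AssetRecord]:
    """Return assets that must be kept long term and are currently at risk."""
    return [a for a in register if a.retain_long_term and a.at_risk]
```

Filtering on questions 4 and 5 together gives a first, rough work list: assets that need long-term retention but whose current management puts them at risk.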

PREMIS metadata: Ensuring Data Integrity

• Digital Object

• Rights

• Preservation Event

• Agent

Sample of a PREMIS scheme

OBJECT

• objectIdentifier

• preservationLevel

• significantProperties

• objectCharacteristics: fixity, size, format, creatingApplication

• storage

EVENT

• eventIdentifier

• eventType

• eventDateTime

• eventDetailInformation
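To show how the object and event elements listed above relate, the following sketch builds a simplified object record and a fixity-check event as plain Python dictionaries. It is a sketch under stated assumptions rather than a PREMIS implementation: the flat dictionary layout is a simplification, `messageDigestAlgorithm` and `messageDigest` are PREMIS sub-units of `fixity` that do not appear on the slide, and the function names and identifiers are hypothetical.

```python
import hashlib
from datetime import datetime, timezone
from typing import Dict


def describe_object(identifier: str, payload: bytes, format_name: str) -> Dict:
    """Build a simplified object record with fixity, size and format."""
    return {
        "objectIdentifier": identifier,
        "objectCharacteristics": {
            "fixity": {
                "messageDigestAlgorithm": "SHA-256",
                "messageDigest": hashlib.sha256(payload).hexdigest(),
            },
            "size": len(payload),
            "format": format_name,
        },
    }


def fixity_check_event(event_id: str, obj: Dict, payload: bytes) -> Dict:
    """Compare the payload with the recorded digest and log the outcome."""
    expected = obj["objectCharacteristics"]["fixity"]["messageDigest"]
    actual = hashlib.sha256(payload).hexdigest()
    outcome = "pass" if actual == expected else "fail"
    return {
        "eventIdentifier": event_id,
        "eventType": "fixity check",
        "eventDateTime": datetime.now(timezone.utc).isoformat(),
        "eventDetailInformation": (
            "SHA-256 comparison for " + obj["objectIdentifier"] + ": " + outcome
        ),
    }
```

Keeping each event as its own record, rather than overwriting a status field on the object, is what builds up the audit trail over time.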

Lessons learned

• Consider your preservation strategies early on

• Create Readme notes with additional information to describe the data as early as possible

• Recommend a list of preferred/open file formats and comply with it

• Back up the data using the 3-2-1 method

• Run fixity checks at ingestion and repeat at regular intervals, replacing corrupt data from other copies

• Include metadata fields to capture preservation events and populate them from the outset

• Monitor your data regularly!
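Two of the lessons above, keeping multiple copies (the 3-2-1 method is commonly described as three copies, on two types of media, with one held off-site) and replacing corrupt data from other copies after a fixity check, can be combined in a short sketch. The function names and the directory layout in the usage example are hypothetical, and the sketch reads each file fully into memory, which would need changing for large datasets.

```python
import hashlib
import shutil
from pathlib import Path
from typing import List


def file_digest(path: Path) -> str:
    """SHA-256 of a whole file (read in one go; fine for small files only)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def verify_and_repair(copies: List[Path], expected_sha256: str) -> List[Path]:
    """Check every copy against the checksum recorded at ingestion.

    Missing or corrupt copies are overwritten from the first intact copy.
    Returns the list of paths that had to be repaired.
    """
    intact = [
        p for p in copies
        if p.is_file() and file_digest(p) == expected_sha256
    ]
    if not intact:
        raise RuntimeError("no intact copy left; manual recovery is needed")
    repaired = []
    for path in copies:
        if path not in intact:
            path.parent.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(intact[0], path)
            repaired.append(path)
    return repaired
```

A usage example, with made-up paths:

```python
copies = [Path("primary/d.bin"), Path("tape/d.bin"), Path("offsite/d.bin")]
# expected = checksum stored when the file was ingested
# repaired = verify_and_repair(copies, expected)
```

Logging each repair as a preservation event keeps the metadata in step with what actually happened to the files.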

“Digital preservation is not a project to be completed over the next few years and then forgotten about. It is rather a new way of approaching the whole digital data life cycle and the new digital information world we live in.”