13.30 – 15.15 : 17 May 2016

Oxford Room

Preserving Transactional Data: a DPC Study for the Big Data Network

Sara Day Thomson

Project Officer, Digital Preservation Coalition

Digital Preservation & Web Archiving Café

Breakout F1-1

@sdaythomson | sara.thomson@dpconline.org

www.dpconline.org

Preserving Transactional Data

IRMS Conference 2016

Sara Day Thomson | @sdaythomson

17 May 2016

Our digital memory accessible tomorrow…

UK Data Service

Outline

Background

Defining Transactional Data

Databases: SQL v NoSQL

Preservation Challenges

Legal Review & Anonymization

Conclusions

Tweet

Share your thoughts on Preserving Transactional Data

Me @sdaythomson

DPC @dpc_chat

UKDS @UKDSBigData

Administrative Data Research Network: N Ireland, England, Scotland, Wales

Big Data Research Centres: Urban Big Data Centre; ESRC Business and Local Gov’t Data; Consumer Data Research Centre

UK Data Service: Big Data Network Support

DPC Technology Watch: Preserving Social Media; Preserving Transactional Data

Preserving Transactional Data

DPC Technology Watch Report

http://dpconline.org/publications/technology-watch-reports

Available summer 2016

Defining Transactional Data

• Individual interactions with a database

• Raw data: numbers, symbols

• Information: conclusions based on combinations of raw data

• Computationally combined to form richer data

• Too large or volatile for traditional processing applications to handle

• Greater number of re-uses than imagined for archived data in the past

Computationally combined to form richer data

The value of ‘Big’ Data is the ability to combine different data sources
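A minimal sketch of what combining data sources can look like in practice. The two sources, their field names, and the `combine` helper are all hypothetical and chosen only for illustration; they do not come from the study.

```python
# Hypothetical source 1: individual purchase transactions.
purchases = [
    {"customer_id": 1, "item": "umbrella", "date": "2016-05-01"},
    {"customer_id": 2, "item": "sunscreen", "date": "2016-05-02"},
]

# Hypothetical source 2: daily weather observations, keyed by date.
weather = {"2016-05-01": "rain", "2016-05-02": "sun"}


def combine(purchases, weather):
    """Return new records with the weather on the purchase date attached."""
    return [dict(p, weather=weather.get(p["date"])) for p in purchases]


combined = combine(purchases, weather)
```

Neither source says anything about buying behaviour in different weather on its own; the combined records do, which is the sense in which the result is richer than its parts.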

Greater number of re-uses than imagined for archived databases in the past

• Consumer Analysis

• Spread of Disease

• Network Analysis

• Business Analysis

• Geo-spatial Analysis

• Human Behaviour

• Cyber Crime

• Economics

• Energy Usage

• Urban Planning

• Regulatory Compliance

• Public Services

• Heritage Collections

• Healthcare

• Environmental Science

• Education

• Governance

• Transport

• Computer Science

Compare archived data with current data

Compare data from different sources

Motivations for Long-term Preservation

• Reproducibility of analysis

• Availability of historical data

• Compliance and records management

Databases

SQL v NoSQL

Preserving Databases

SQL (Relational)

• SQL somewhat standardised

• Established and supported

• Tested against ACID properties

• Does not always scale up

• Does not always support data arranged in a hierarchy

• Not necessary if reading is higher priority than writing

• Often interdependent on other databases

NoSQL (Other than relational)

• Scales up (usually multiple nodes)

• Allows relationships between objects that are not the same

• Relies more heavily on application layer for storing information

• Not standardised

• Prioritises availability over ACID properties

• Fewer tested methods for archiving

Key (colour-coded on the slide): preservation friendly, less preservation friendly, preservation neutral
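To make the ACID bullet concrete, here is a small sketch of atomicity using Python's standard `sqlite3` module. The `accounts` table, the names in it, and the `transfer` helper are assumptions made for this example only.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction; undo both steps on failure."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # both updates are applied
failed = transfer(conn, "alice", "bob", 500)  # debit violates CHECK, credit is undone
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

In the failed transfer the credit to `bob` has already run when the debit is rejected, and the rollback removes it again. A store that prioritises availability over these guarantees may leave such consistency checks to the application layer.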

NoSQL: the Future?

Image source: DB_Engines, ‘RDBMS dominate the database market, but NoSQL systems are catching up’, by Paul Andlinger, 21 Nov 2013, http://db-engines.com/de/blog_post/23

Image source: DB_Engines, ‘DB-Engines Ranking - Trend Popularity’, May 2016, http://db-engines.com/en/ranking_trend

NoSQL?

Key-Value Database

Document Database

Column Family Store

Graph Database

Image source: ThoughtWorks, ‘NoSQL Databases: An Overview’ by Pramod Sadalage, 1 Oct 2014 https://www.thoughtworks.com/insights/blog/nosql-databases-overview
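As a rough illustration of how the four NoSQL families differ in shape, the same hypothetical order can be written out as plain Python structures. All keys, identifiers, and the `neighbours` helper are invented for this sketch; real products differ in detail.

```python
# Key-value: an opaque value fetched by a single key.
key_value = {"basket:42": '{"sku": "X1", "qty": 2}'}

# Document: a nested, self-describing record.
document = {
    "_id": "order-1",
    "customer": {"name": "A. Example"},
    "items": [{"sku": "X1", "qty": 2}],
}

# Column family: row key -> column family -> column -> value.
column_family = {
    "order-1": {
        "customer": {"name": "A. Example"},
        "items": {"X1": 2},
    }
}

# Graph: nodes plus typed edges between them.
graph = {
    "nodes": {"cust-1": {"name": "A. Example"}, "order-1": {}, "sku-X1": {}},
    "edges": [
        ("cust-1", "PLACED", "order-1"),
        ("order-1", "CONTAINS", "sku-X1"),
    ],
}


def neighbours(graph, node, relation):
    """Return the nodes reached from `node` by edges of type `relation`."""
    return [dst for src, rel, dst in graph["edges"] if src == node and rel == relation]
```

None of these shapes maps directly onto rows and columns, which is one reason archiving methods developed for relational databases do not carry over unchanged.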

Challenges

Long-term Preservation

Summary of Challenges

Volatility

Volume and capacity

Multiple entry points

Context

Data purpose

Legalities

Interdependence

Information stored in Application Layer

[Diagram: People Directory, Customer Queries, and Inventory, each shown with an application]
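A sketch of why information held in the application layer matters for preservation. Here the stored rows carry only numeric status codes and the labels live in application code, so an export of the rows alone would lose their meaning. The code list, the rows, and `export_with_codebook` are hypothetical.

```python
import json

# Hypothetical application-layer knowledge: the database stores only the codes.
STATUS_LABELS = {1: "open", 2: "resolved", 3: "escalated"}

# Hypothetical rows as they might sit in a customer-queries table.
rows = [
    {"query_id": 101, "status": 2},
    {"query_id": 102, "status": 1},
]


def export_with_codebook(rows, labels):
    """Serialise the rows together with the code list needed to interpret them."""
    package = {
        "records": rows,
        "codebook": {"status": {str(code): label for code, label in labels.items()}},
    }
    return json.dumps(package, indent=2, sort_keys=True)


archived = export_with_codebook(rows, STATUS_LABELS)
```

Packaging the code list with the records is one possible way to keep the export interpretable once the original application is gone; it is an illustration, not a method prescribed by the study.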

Legal Review

• EU Directive 96/9/EC (Database Directive) (1996)

• UK Human Rights Act (1998)

• UK Data Protection Act (1998)

• European General Data Protection Regulation (GDPR) (2018)

EU Directive 96/9/EC ‘Database Directive’

• Exclusive rights holder

• Copyright protection

• Sui generis

UK Data Protection Act (1998)

• Protects personal data

• Data collected for one purpose cannot necessarily be re-used for another

There is no anonymization

in big data
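The claim about anonymization can be illustrated with a toy linkage example: a release with names removed is joined to a second, public source on shared attributes. Every record, field name, and the `reidentify` helper below is fabricated for illustration and assumes the two sources share the listed fields.

```python
# Hypothetical "anonymised" release: names removed, other attributes kept.
released = [
    {"postcode": "OX1", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "G12", "birth_year": 1975, "sex": "M", "diagnosis": "diabetes"},
]

# Hypothetical public source that does carry names.
public = [
    {"name": "Jane Example", "postcode": "OX1", "birth_year": 1980, "sex": "F"},
    {"name": "John Sample", "postcode": "G12", "birth_year": 1975, "sex": "M"},
    {"name": "Jim Sample", "postcode": "G12", "birth_year": 1975, "sex": "M"},
]


def reidentify(released, public, keys):
    """Link released records to names where the shared attributes match exactly one person."""
    matches = []
    for record in released:
        candidates = [p for p in public if all(p[k] == record[k] for k in keys)]
        if len(candidates) == 1:
            matches.append((candidates[0]["name"], record["diagnosis"]))
    return matches


linked = reidentify(released, public, ["postcode", "birth_year", "sex"])
```

Only the first record is linked here, because two people share the second record's attributes. The more sources that can be combined, the more often a combination of attributes points to a single person.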

Thomas and Walport, ‘Data Sharing Review Report’, 11 July 2008

‘…in the vast majority of cases…the complexity of the law, amplified by a plethora of guidance, leaves those who may wish to share data in a fog of confusion’.

Laurie and Stevens, ‘The Administrative Data Research Centre Scotland: A scoping report on the legal & ethical issues arising from access & linkage of administrative data’, Research Paper Series No 2014/35

Conclusions

• New approaches to preservation

• Selection, compatibility, metadata, and documentation

• Preserving more than data

• Planning for broader uses

Actual Experience?

Nook, GoogleCode, GeoCities, GoogleWave, knol, Yahoo 360, del.icio.us, MyBlogLog, BeBo…

IS CORPORATE ABANDONMENT AS BIG A THREAT TO THE DIGITAL ESTATE AS OBSOLESCENCE?

Friends ReUnited, Yahoo Mail Classic, Blipfoto, MySpaceBlogs…

@WilliamKilbride

Looking Forward

• NEW! Digital Preservation Handbook: handbook.dpconline.org

• Getting Started & Making Progress in Digital Preservation: dpconline.org/events

• New Suite of Events, incl. Digital Preservation for Records Management: info@dpconline.org

• E-Ark: www.eark-project.com

Preserving Social Media

DPC Technology Watch Report

dx.doi.org/10.7207/twr16-01

Thanks!

Illustration by Jørgen Stamp digitalbevaring.dk CC BY 2.5 Denmark

Questions? Comments?
