Who Decides? Reinterpreting archival processes for the management of digital research Gareth Knight Centre for e-Research, King’s College London

Upload: garethknight

Post on 15-Jan-2015

DESCRIPTION

Management of digital records can benefit from the contribution of digital curators and archivists. The presentation outlines the efforts of the PEKin project at King's College London to develop a management strategy that combines these disparate skillsets.

TRANSCRIPT

Page 1

Who Decides? Reinterpreting archival processes for the management of digital research

Gareth Knight
Centre for e-Research, King’s College London

Page 2

Presentation Themes

1. Need to re-appraise definition & criteria for a Record

2. Challenges posed when attempting to archive digital records

3. Technical architecture and processes required to manage digital records

Page 3

What is a record?

“information created, received, and maintained as evidence and information by an organization or person, in pursuance of legal obligations or in the transaction of business”

ISO 15489-1:2001. Records management

“a specific piece of recorded information generated, collected or received in the initiation, conduct or completion of an activity and that comprises sufficient content, context and structure to provide proof or evidence of that activity”

International Council on Archives (ICA) committee

Page 5

What is a Record today? What is a Record tomorrow?

Criteria for defining records that have archival value are subject to a range of factors and may change over time:

•Legal obligation:
  • Varies between institutions of different types
  • Geographic location
  • E.g. Public Records Act 1958 (government), Data Protection Act, Freedom of Information (FoI) Act 2000, Environmental Information Regulations (2004), etc.

•Proof of business:
  • What type of business? Commercial, non-commercial
  • Research as a business activity

•Proof of activity:
  • May be broad. Influenced by function, evidential value, uniqueness

•Memory of institution:
  • Same as above – requires consideration of function, value, uniqueness and other criteria

Page 6

Re-evaluating Records management at King’s College London

College Archives acquires, preserves and makes available all material of long-term, evidential and research interest that forms part of the College’s heritage

•Acquisition policy:
  • Paper and electronic records of day-to-day business operation, private papers of academics and researchers and material related to College heritage

•Preservation policy:
  • Preserve archives in their original physical format (storage, packaging).
  • Maintain in appropriate environmental conditions – in compliance with BS 5454:2000

http://www.kcl.ac.uk/iss/archivespec/spcm.html

Page 7

New archiving challenges

•New types of information:
  • Published research papers, research data – funder requirements for data management

•Creation of increasingly diverse content types:
  • Hybrid (paper+digital), digital content
  • CAD designs, interactive resources, datasets. Publication of dynamic content

•Update frequency:
  • Web site, blogs, twitter, object revisions. Versioning

•Access lifecycle:
  • Technology dependencies – software & hardware

•Uncertain value:
  • What is the business value? What period of time?

Page 8

Preservation Exemplars at King’s (PEKin)

Project objectives:

1. Develop a management system capable of handling digital business records AND research material

2. Adopt a management strategy that brings together archival AND data curation approaches

3. Embed preservation practices within the institution through a multi-layered strategy:
  • work with central services to develop a preservation strategy and service for digital records
  • work with academic units and professional services to ensure local data producers and systems managers are provided with targeted advice, guidance and tools to support decision-making

Project Partners:

Centre for e-Research (CeRch) & Archives & Information Management (AIM) at King’s College London

Funder:

JISC Information Environment 09-11 preservation exemplars strand

Page 9

Audit framework

PEKin audit framework combines sections of DAF, DRAMBORA, DIRKS & other audit work

Page 10

Audit management practices

Purpose of audit was to identify:
  • Functions within the organisation that create records
  • Who used records and for what purpose
  • Location and responsibility for storage
  • Time period they are currently / should be retained for
  • Future stakeholders that need/may wish to use records

Survey academic and business units
  • Core business units: College Estates, Student Records, College Committees
  • Research groups: Twins Early Development Study, Regional Information Collection Centre, Environment Research Group, Randall Division of Cell and Molecular Biophysics

Page 11

Types of digital information

•Business records:
  • Estates records – Property records, contract for building maintenance, Computer Aided Design (CAD)
  • Student Records – Students, courses, grades
  • Committee Records – Structure, operation

•Research records:
  • Commercial – research data created for commercial purposes, e.g. pollution monitoring, patents
  • Funded research – Contracts awarded by funding bodies
  • Unfunded research – Academic researcher who has interest in topic

Each record has a different value and retention period.

Page 12

Many types of lifecycle

Record lifecycle (variants: Information, data lifecycle)

Access lifecycle (e.g. digital lifecycle)

Page 13

Analysis of lifecycle risks

•Identify & evaluate risks that occur in the lifecycle

•Applied a ‘light touch’ DRAMBORA (http://www.repositoryaudit.eu/) methodology to case studies. Influenced by DIRKS

•Risk categories:
  • Organisation Management, Staff, Tech Infrastructure, Acquisition & Ingest, Preservation & Storage, Access & Dissemination

•Risk Description:
  • Definition, manifestations, consequences, severity (risk impact x probability), mitigation strategies
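The severity calculation named above (risk impact x probability) can be sketched as follows. This is a minimal illustration, not the PEKin or DRAMBORA tooling: the `Risk` class, the `rank_by_severity` helper, the example risks and all numeric scores are hypothetical, and the scoring scale is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    """A single lifecycle risk (hypothetical structure for illustration)."""
    category: str      # one of the risk categories, e.g. "Preservation & Storage"
    description: str
    impact: int        # assumed ordinal score, higher = worse
    probability: int   # assumed ordinal score, higher = more likely

    @property
    def severity(self) -> int:
        # Severity as described on the slide: risk impact x probability.
        return self.impact * self.probability


def rank_by_severity(risks):
    """Return the risks ordered from most to least severe."""
    return sorted(risks, key=lambda r: r.severity, reverse=True)


# Invented example scores, not findings from the case studies.
risks = [
    Risk("Preservation & Storage", "Insufficient storage capacity", impact=4, probability=3),
    Risk("Access & Dissemination", "Secondary use not recognised", impact=2, probability=5),
    Risk("Acquisition & Ingest", "Unidentified change to a record", impact=5, probability=2),
]

for risk in rank_by_severity(risks):
    print(f"{risk.severity:>3}  {risk.category}: {risk.description}")
```

Ranking by a single product keeps the comparison simple, although two risks with very different profiles can receive the same score, so the description fields still matter when choosing mitigation strategies.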

Page 14

Recognised risks

•Storage:
  • Insufficient capacity: local drives, network drives, 3rd party server

•Authenticity & integrity:
  • Unidentified/unknown change. Some staff rely upon print-outs of digital original

•Archival value and retention period:
  • Different criteria & quality thresholds
  • Business records – Recognised legal value & retention period
  • Research data – Archival value of research papers understood. Retention period of data has, until recently, not been recognised

•Access and usage:
  • Business records have well-defined period of primary use, but unrecognised secondary use
  • Use of research papers is understood, but datasets & other outputs are not always considered

Page 15

Risk management Strategy

•Storage and management infrastructure:
  • Technical infrastructure to store and manage their data

•Education:
  • Develop staff understanding of data management and archival principles
  • Topics include: Authenticity and integrity, assessment of archival value
  • Methods: Practical documentation on data creation/management, training events

•Policies:
  • Type of record collected, time period for collection, appraisal criteria for long-term retention

•Procedures:
  • Data capture, curation, preservation

Developed with consideration of cost implications

Page 16

KCL Archives Preservation Repository

A preservation repository for college data of short/long-term value:

•Standards compliance:
  • OAIS Reference Model, TRAC, ISO 15489

•Bitstream preservation:
  • Fixity creation/verification, online + offline storage

•Information Content Preservation:
  • Format conversion, event logging – audit trail

•Access:
  • Limited to archive reading room, catalogue descriptive MD to common standard

•Interoperable:
  • Interact with other college & public systems, e.g. student records

Page 17

OAIS Reference Model

Page 18

Technical Infrastructure

Page 19

Alfresco Actions

•Actions – a parameterized unit of work that can be applied to a node

•Parameters – rules for action execution and type to be performed

•jBPM synchronous or asynchronous workflows

•Actions can be performed at different stages of workflow

Page 20

Ingest Actions

•Content model compliance
  • Conforms to defined structure & object types

•Fixity generation
  • All: MD5, SHA-1, CRC

•Format identification
  • All: file(1), DROID

•Technical metadata extraction
  • Format specific: JHOVE, MP3Info, others

•Specification conformance
  • Branch workflow according to threshold

•Conversion to preservation & dissemination derivative
  • Parameters for each format & licence
  • OpenOffice, ImageMagick, SoX

•Data Packaging
  • Generate METS package, record action results as PREMIS Event
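The fixity generation step listed above can be illustrated with Python's standard library. This is a sketch only, not the Alfresco action used in the project: the function names are hypothetical, and "CRC" is assumed here to mean CRC-32.

```python
import hashlib
import zlib


def generate_fixity(path, chunk_size=1024 * 1024):
    """Compute MD5, SHA-1 and CRC-32 values for one file in a single pass."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    crc = 0
    with open(path, "rb") as handle:
        # Read in chunks so that large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
            crc = zlib.crc32(chunk, crc)
    return {
        "MD5": md5.hexdigest(),
        "SHA-1": sha1.hexdigest(),
        "CRC32": format(crc & 0xFFFFFFFF, "08x"),  # assumption: CRC means CRC-32
    }


def verify_fixity(path, recorded):
    """Return True if freshly computed values match the recorded ones."""
    return generate_fixity(path) == recorded
```

Calling `generate_fixity` at ingest and storing the result alongside the object allows a later `verify_fixity` call to detect any change to the bitstream.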

Page 21

Archiving actions

•Transfer into Fedora archive:
  • When collection closed (e.g. all papers submitted for meeting collection)
  • After specified time period, e.g. 3 months

•Fixity verification:
  • Confirm that fixity is unchanged

•Manual activity for future date:
  • Anonymisation in 2 years
  • Re-appraisal in xx years – Retain or remove

•Obsolescence monitoring?
  • Possible future implementation
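The transfer triggers and the scheduling of future manual activity described above can be sketched as a small rule. The function names are hypothetical, "3 months" is approximated as 90 days, and the dates are invented; the project's actual workflow ran inside Alfresco and is not reproduced here.

```python
from datetime import date, timedelta


def ready_for_transfer(closed, created, today, hold_period=timedelta(days=90)):
    """Decide whether a collection should be transferred to the archive.

    A collection qualifies when it has been closed, or when the hold
    period (assumed here to be roughly 3 months) has elapsed.
    """
    return closed or (today - created) >= hold_period


def due_date(start, years):
    """Return the date a number of years after start."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; fall back to the 28th.
        return start.replace(year=start.year + years, day=28)


# Invented example: schedule the anonymisation task named on the slide.
archived_on = date(2011, 3, 1)
follow_up = {"Anonymisation": due_date(archived_on, 2)}
print(follow_up)
```

The re-appraisal interval is left out because the slide does not state it.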

Page 22

Content Models

•Content models define rules that govern collection structure, data type & behaviour

•Preservation Archive Content Models designed with consideration of resource type and Alfresco & Fedora capabilities

•Content model for each resource type composed of 3 layers

•Each layer is a Fedora Object that contains different metadata

Layers: Filing Cabinet, Drawer, Folder

Page 23

Content Model examples

Each item represents a Fedora Object

Each FO holds user provided metadata

Different MD required at each layer

Committee Records:
  • Filing Cabinet: Committee Records
  • Drawer: Meeting
  • Folder: Agenda, Previous minutes, Papers, Current minutes

Research Project:
  • Filing Cabinet: Research Project
  • Drawer: Year
  • Folder: Data, Papers, Meeting notes, Documents
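The three-layer content model can be sketched as a small data structure. This is an illustration of the layering rule only and does not use the Fedora API; the `ArchiveObject` class, its fields and the sample metadata key are hypothetical.

```python
from dataclasses import dataclass, field

# Layer names as given on the slides, ordered from outermost to innermost.
LAYERS = ("Filing Cabinet", "Drawer", "Folder")


@dataclass
class ArchiveObject:
    """One object in the three-layer model (illustrative, not a Fedora object)."""
    layer: str
    title: str
    metadata: dict = field(default_factory=dict)   # different metadata per layer
    children: list = field(default_factory=list)

    def add(self, child):
        # A child must sit exactly one layer below its parent.
        expected = LAYERS.index(self.layer) + 1
        if expected >= len(LAYERS) or child.layer != LAYERS[expected]:
            raise ValueError(f"{child.layer!r} cannot be placed under {self.layer!r}")
        self.children.append(child)
        return child


# The Committee Records example from the slide.
committee = ArchiveObject("Filing Cabinet", "Committee Records")
meeting = committee.add(ArchiveObject("Drawer", "Meeting", {"date": "unspecified"}))
for name in ("Agenda", "Previous minutes", "Papers", "Current minutes"):
    meeting.add(ArchiveObject("Folder", name))
```

Enforcing the layer order in `add` mirrors the idea that a content model governs collection structure, so a Folder cannot be attached directly to a Filing Cabinet.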

Page 24

Findings (so far)

• Data audit & risk analysis provide useful frameworks for analysing data management practices & justifying data archiving

• No single definition of archival value – different criteria & quality thresholds

• Application of archival principles to data assets provides demonstrable benefits

• Duration requirements of business & research data are broadly similar

• Digital repository architecture provides sufficient flexibility to manage data assets created for different purposes

Page 25

Contact

Gareth Knight

Centre for e-Research, King’s College London

[email protected]

020 7848 1979

http://www.kcl.ac.uk/iss/cerch/projects/portfolio/pekin.html

Centre for e-Research :

www.kcl.ac.uk/iss/cerch

Archives and Information Management (AIM) :

http://www.kcl.ac.uk/iss/explore/team/aim