Systems, processes & how we stop the wheels falling off
Digitisation Open Day, September 2013
Dave Thompson, Digital Curator, Wellcome Library
Digitisation – process overview
Plan project
Catalogue
Identify material
Identify resources
Plan process
Review as you go
Digitise/process
Deliver
Refine processes
Document/share
Document/share
Document/share
Funding, staff, equipment, IT, storage, data management
planning
Open source player
Meanwhile, at the coal face…
Administrative metadata
Descriptive metadata
Digitised images
Ingestion into repository
Creation of METS
Access
Thinking conceptually… OAIS
In OAIS speak, what we submit for ingest is a SIP: an aggregation of an object & its metadata in a form that is acceptable to the repository, e.g. JPEG2000 images and MARC XML.
The Reference Model for an Open Archival Information System (OAIS) is an ISO standard (ISO 14721) that describes a conceptual model of an archive. It sets out the activities of an archive & the processes involved in submission, storage & access. It was developed by the Consultative Committee for Space Data Systems (CCSDS), of which NASA is a member, after space data was ‘lost’ through obsolescence.
Thinking conceptually… OAIS
In OAIS speak, what the repository holds is an AIP: the object & its metadata as stored.
OAIS talks of 3 information packages:
1. Submission Information Package (SIP) = what is ingested
2. Archival Information Package (AIP) = what is stored
3. Dissemination Information Package (DIP) = what is made available
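The relationship between the three packages can be sketched in a few lines of Python. This is only an illustration of the concept, not part of OAIS or of any repository software: the class name, the field names and the two functions are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class InformationPackage:
    """An object plus its metadata (hypothetical structure for illustration)."""
    object_id: str
    files: list[str]
    descriptive_metadata: dict[str, str]
    administrative_metadata: dict[str, str] = field(default_factory=dict)


def ingest(sip: InformationPackage) -> InformationPackage:
    """SIP -> AIP: the repository adds administrative metadata on ingest."""
    admin = dict(sip.administrative_metadata)
    admin["file_count"] = str(len(sip.files))
    return InformationPackage(
        sip.object_id, list(sip.files), dict(sip.descriptive_metadata), admin
    )


def disseminate(aip: InformationPackage, public_fields: set[str]) -> InformationPackage:
    """AIP -> DIP: keep only the metadata we are able to make available."""
    visible = {
        key: value
        for key, value in aip.descriptive_metadata.items()
        if key in public_fields
    }
    return InformationPackage(aip.object_id, list(aip.files), visible)


sip = InformationPackage(
    "example-object", ["0001.jp2"], {"title": "Example", "donor_note": "private"}
)
aip = ingest(sip)
dip = disseminate(aip, public_fields={"title"})
```

The point of the sketch is that all three packages describe the same object; what changes is how much metadata travels with it at each stage.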
Thinking conceptually… OAIS
In OAIS speak, what we deliver is a DIP: the parts of the object & its metadata that we are able to make available.
As defined in the Digital Preservation Coalition (DPC) handbook, access is assumed to mean continued, ongoing usability of a digital resource, retaining all qualities of authenticity, accuracy and functionality deemed to be essential for the purposes the digital material was created and/or acquired for.
Let's tackle the basics… processing
Administrative metadata (AMD): a technical description of the files. Created automatically by Safety Deposit Box (SDB) on ingest into our repository. Used by the player for display purposes.
Administrative metadata is typically created automatically. It could be:
• File size
• Image height x width
• File format
• Checksum
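As a rough illustration of how this kind of metadata can be gathered without human input, here is a minimal Python sketch using only the standard library. It is not how SDB does it; the function name and the dictionary keys are assumptions, and image height and width are left out because reading them needs an image library.

```python
import hashlib
import mimetypes
from pathlib import Path


def describe_file(path: Path) -> dict[str, str]:
    """Collect basic administrative metadata for one file."""
    data = path.read_bytes()
    # Format is guessed from the file extension only; a real repository
    # would identify the format from the file's contents.
    mime_type, _ = mimetypes.guess_type(path.name)
    return {
        "file_name": path.name,
        "file_size": str(len(data)),
        "file_format": mime_type or "application/octet-stream",
        "checksum_type": "SHA-256",
        "checksum": hashlib.sha256(data).hexdigest(),
    }
```

Calling `describe_file(Path("0001.jp2"))` on an image file would return its size in bytes, a guessed format and a SHA-256 checksum that can later be recomputed to confirm the file has not changed.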
Let's tackle the basics… processing
Descriptive metadata (DMD): MARC records, converted to MARC XML, which becomes MODS in the METS file. Material must be catalogued before we can store it & make it available.
DMD is typically human generated, AKA cataloguing metadata: ISAD(G) for archival material, MARC for bibliographic material. MODS is the Metadata Object Description Schema.
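To make the MARC XML to MODS step concrete, here is a deliberately tiny Python sketch that copies just the title (MARC field 245, subfield a) into a MODS record. It is an illustration only: the function name and sample record are assumptions, and a production conversion would map many more fields, typically with an existing stylesheet rather than hand-written code.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
MODS_NS = "http://www.loc.gov/mods/v3"

# A made-up MARC XML record, used only for this example.
SAMPLE_MARC_XML = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An example title</subfield>
  </datafield>
</record>"""


def marc_title_to_mods(marc_xml: str) -> ET.Element:
    """Build a MODS record holding only the title from a MARC XML record."""
    record = ET.fromstring(marc_xml)
    subfield = record.find(
        f".//{{{MARC_NS}}}datafield[@tag='245']/{{{MARC_NS}}}subfield[@code='a']"
    )
    mods = ET.Element(f"{{{MODS_NS}}}mods")
    if subfield is not None and subfield.text:
        title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
        title = ET.SubElement(title_info, f"{{{MODS_NS}}}title")
        title.text = subfield.text.strip()
    return mods


mods_record = marc_title_to_mods(SAMPLE_MARC_XML)
```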
Let's tackle the basics… processing
Safety Deposit Box (SDB), the place where we store digital stuff. Ingest is automatically initiated by Goobi. Database that associates objects with DMD & AMD. Source for dissemination.
Digital Repositories offer a convenient infrastructure through which to store, manage, re-use and curate digital materials. They are used by a variety of communities, may carry out many different functions, and can take many forms.
Let's tackle the basics… processing
METS records metadata about structure & pagination, which is created by humans; the METS file itself is built automatically.
A Metadata Encoding & Transmission Standard (METS) file is an aggregated collection of DMD & AMD (a file list with structure) that provides a mechanism for managed access. A METS file allows metadata from different systems to be combined into a portable format.
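A minimal Python sketch of what "a file list with structure" looks like when assembled is given here. It is not the code Goobi uses: the function name, the `pages` input format and the ID scheme are assumptions, and a real METS file carries far more (for example a separate administrative metadata section). The element and attribute names are those of the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
MODS_NS = "http://www.loc.gov/mods/v3"
XLINK_NS = "http://www.w3.org/1999/xlink"
for prefix, uri in (("mets", METS_NS), ("mods", MODS_NS), ("xlink", XLINK_NS)):
    ET.register_namespace(prefix, uri)


def mets(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(title: str, pages: list[dict[str, str]]) -> ET.Element:
    """Combine DMD, per-file AMD and page structure into one METS tree."""
    root = ET.Element(mets("mets"))

    # DMD: a MODS title wrapped in a descriptive metadata section.
    dmd = ET.SubElement(root, mets("dmdSec"), ID="DMD1")
    wrap = ET.SubElement(dmd, mets("mdWrap"), MDTYPE="MODS")
    xml_data = ET.SubElement(wrap, mets("xmlData"))
    mods = ET.SubElement(xml_data, f"{{{MODS_NS}}}mods")
    title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = title

    # File list: size and checksum (AMD) recorded on each file entry.
    file_grp = ET.SubElement(ET.SubElement(root, mets("fileSec")), mets("fileGrp"))
    # Structure: one div per page, in order, with the original pagination.
    struct = ET.SubElement(root, mets("structMap"), TYPE="PHYSICAL")
    book = ET.SubElement(struct, mets("div"), TYPE="book", DMDID="DMD1")

    for order, page in enumerate(pages, start=1):
        file_id = f"FILE{order:04d}"
        file_el = ET.SubElement(
            file_grp, mets("file"), ID=file_id, SIZE=page["size"],
            CHECKSUM=page["checksum"], CHECKSUMTYPE="SHA-256",
        )
        ET.SubElement(
            file_el, mets("FLocat"),
            {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": page["href"]},
        )
        div = ET.SubElement(
            book, mets("div"), TYPE="page",
            ORDER=str(order), ORDERLABEL=page["label"],
        )
        ET.SubElement(div, mets("fptr"), FILEID=file_id)
    return root
```

`ET.tostring(build_mets(...), encoding="unicode")` then gives a single XML document in which each page's label, position and file are tied together, which is what makes the result portable between systems.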
The formats
• JPEG2000 is our master image format.
• We create dissemination images (JPEG) on the fly.
• We also use PDF, MPEG-2 and MP3.
The systems
• Goobi. Manages & tracks the production of digitised content.
• SDB. Repository that stores digitised content along with its DMD & AMD.
• Player. User interface to view digitised material.
How Goobi works – the basics
• Project based.
• Workflow driven.
• Users accept ‘tasks’.
• A user's role determines what projects they belong to & what roles they have.
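The idea of project-based, workflow-driven tasks that users accept can be sketched as below. This is a toy model, not Goobi's data model or API: every class, field and role name here is hypothetical, and roles and project membership are modelled as two separate sets purely to keep the example simple.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    name: str
    required_role: str
    accepted_by: Optional[str] = None
    done: bool = False


@dataclass
class User:
    name: str
    roles: set[str]
    projects: set[str]


@dataclass
class Process:
    """One item moving through a project's workflow, task by task."""
    project: str
    tasks: list[Task] = field(default_factory=list)

    def open_task(self) -> Optional[Task]:
        """Return the first unfinished task; tasks run in workflow order."""
        return next((task for task in self.tasks if not task.done), None)

    def accept(self, user: User) -> Task:
        """Let a user accept the open task if project and role allow it."""
        task = self.open_task()
        if task is None:
            raise ValueError("workflow already complete")
        if self.project not in user.projects:
            raise PermissionError(f"{user.name} is not in project {self.project}")
        if task.required_role not in user.roles:
            raise PermissionError(f"{user.name} lacks role {task.required_role}")
        task.accepted_by = user.name
        return task


process = Process("example-project", [Task("Scan", "photographer"), Task("Edit METS", "editor")])
photographer = User("Sam", {"photographer"}, {"example-project"})
editor = User("Alex", {"editor"}, {"example-project"})
```

Here `process.accept(photographer)` succeeds for the first task, while `process.accept(editor)` raises `PermissionError` until the scanning task has been marked done.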
How Goobi works – a workflow
How Goobi works – METS editing
Pagination as per original
Descriptive metadata
Structure
Lessons from Goobi
• Design your workflows in advance. But be flexible.
• Automate as much as possible; it saves time & is more efficient.
• Document processes & procedures.
• Share what you learn.
How SDB works – the basics
• Workflow based; easily ‘talks’ to other systems.
• Content agnostic.
• Creates administrative metadata on ingest.
• Preservation orientated.
How SDB works
How SDB works – behind the scenes
• No public access to SDB.
• Little direct staff access to SDB content.
• High levels of automation of ingest, driven by Goobi.
• Platform for dissemination mediated by the player.
Lessons from SDB
• Plan your systems integration, which system talks to which, and how.
• Plan workflows & processes.
• Have a data management plan. All your eggs are in one basket.
• Plan what you’ll do when it all turns to custard.
How the player works – the basics
How the player works
• Makes HTTP request to SDB for content.
• Draws access conditions from METS file.
• Permitted actions drawn from METS.
• Draws DMD from live catalogue.
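How a player might turn an access condition in a METS file into a set of permitted actions can be sketched as follows. This is not the player's actual logic: it assumes the condition is held in a MODS `accessCondition` element inside the METS file, and the condition values and the mapping to actions are invented for the example.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"

# Hypothetical mapping from access condition to what a viewer may do.
PERMITTED_ACTIONS = {
    "open": {"view", "download"},
    "restricted": {"view"},
    "closed": set(),
}

# A made-up, heavily trimmed METS file, used only for this example.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
                          xmlns:mods="http://www.loc.gov/mods/v3">
  <mets:dmdSec ID="DMD1">
    <mets:mdWrap MDTYPE="MODS">
      <mets:xmlData>
        <mods:mods>
          <mods:accessCondition>Restricted</mods:accessCondition>
        </mods:mods>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
</mets:mets>"""


def permitted_actions(mets_xml: str) -> set[str]:
    """Read the access condition from METS and return the allowed actions."""
    root = ET.fromstring(mets_xml)
    condition = root.find(f".//{{{MODS_NS}}}accessCondition")
    status = ""
    if condition is not None and condition.text:
        status = condition.text.strip().lower()
    # Missing or unrecognised conditions fall back to allowing nothing.
    return set(PERMITTED_ACTIONS.get(status, set()))
```

Defaulting to no permitted actions when the condition is missing or unknown is a design choice in this sketch: it fails closed rather than exposing material by accident.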
Summary
• Digitisation is an end-to-end process that brings together objects & metadata.
• You have to think about the whole system to deliver results. The process is one of combining metadata from different systems.
• Document plans & document process.
• Be prepared to be flexible & to change as necessary. But try to stick to the plan!
Further reading
• Wellcome Library – http://wellcomelibrary.org
• Metadata Encoding & Transmission Standard at the Library of Congress - http://www.loc.gov/standards/mets/
• Reference Model for an Open Archival Information System (OAIS). Magenta Book. Issue 2. June 2012 - http://public.ccsds.org/publications/RefModel.aspx
• Tessella, Safety Deposit Box - http://www.tessella.com/tag/safety-deposit-box/
• Data management planning - http://www.dcc.ac.uk/resources/data-management-plans
• Repository Software Comparison: Building Digital Library Infrastructure at LSE - http://www.ariadne.ac.uk/issue64/fay
Thank you
Questions now, questions later…?
Dave Thompson, Digital Curator, Wellcome Library
[email protected] - #welldigi
http://wellcomelibrary.org/