

Transport Neutral Digital Object Replication

Mark Phillips, Kurt Nordstrom; University of North Texas; Denton, Texas, USA

[Diagram labels: "What objects do we want to replicate?" (All Identifiers); "Where can we get the contents of an object?" (URL Listing); "How do we verify the object once we have it?" (Content Manifest)]

1. The source Digital Object Archive provides a listing of identifiers for the objects it contains.

2. For each identifier within the Digital Object Archive, we can request a listing of URLs for the object's individual files.

3. Objects are packaged according to the BagIt specification, allowing for validity checking via the manifest files each object contains.
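The validity check in step 3 can be sketched as follows. This is a minimal illustration, not the UNT implementation: it assumes a bag directory containing a `manifest-md5.txt` payload manifest whose lines have the form `checksum filepath`, and the function names `parse_manifest` and `find_invalid_files` are hypothetical.

```python
import hashlib
from pathlib import Path


def parse_manifest(text):
    """Parse BagIt manifest text into {relative_path: lowercase_checksum}."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each manifest line is "<checksum> <filepath>"; the path may contain spaces.
        checksum, path = line.split(None, 1)
        entries[path.strip()] = checksum.lower()
    return entries


def find_invalid_files(bag_dir, algorithm="md5"):
    """Return the sorted list of manifest entries that are missing or corrupt.

    Assumption: the bag carries a manifest named manifest-<algorithm>.txt.
    An empty list means every listed file matched its recorded checksum.
    """
    bag_dir = Path(bag_dir)
    manifest = bag_dir / f"manifest-{algorithm}.txt"
    expected = parse_manifest(manifest.read_text(encoding="utf-8"))

    invalid = []
    for rel_path, checksum in expected.items():
        target = bag_dir / rel_path
        if not target.is_file():
            invalid.append(rel_path)
            continue
        digest = hashlib.new(algorithm, target.read_bytes()).hexdigest()
        if digest != checksum:
            invalid.append(rel_path)
    return sorted(invalid)
```

A replica would be accepted only when `find_invalid_files` returns an empty list. Reading whole files into memory keeps the sketch short; large payload files would call for chunked hashing.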

[Diagram components: Master Archive, Backup Archive, Replication Queue, Populator, Harvester]

This diagram illustrates the actual steps involved in our replication process.

The Populator queries the Master and Backup archives for content listings and builds a queue of objects to replicate in the Replication Queue.

The Harvester reads items from the Replication Queue, downloads them from the Master Archive, and stores them in the Backup Archive. It checks each object for validity before moving it to its final archival destination and removing the item from the queue.
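The Populator and Harvester roles described above might be sketched like this. The poster does not say how the two listings are compared or what happens to an object that fails validation, so the set difference in `populate_queue` and the "report and dequeue" handling in `harvest` are assumptions; all function and parameter names are hypothetical.

```python
from collections import deque


def populate_queue(master_ids, backup_ids):
    """Populator: queue every identifier the Master lists and the Backup lacks.

    Assumption: "objects to replicate" means the set difference of the
    two content listings.
    """
    missing = set(master_ids) - set(backup_ids)
    return deque(sorted(missing))


def harvest(queue, fetch, validate, store):
    """Harvester: download, validate, store, then remove each queued item.

    fetch(identifier)      -> object downloaded from the Master Archive
    validate(obj)          -> True if the downloaded object is valid
    store(identifier, obj) -> move the object to its final destination

    Assumption: invalid objects are not stored; their identifiers are
    returned to the caller and dropped from the queue.
    """
    failed = []
    while queue:
        identifier = queue[0]
        obj = fetch(identifier)
        if validate(obj):
            store(identifier, obj)
        else:
            failed.append(identifier)
        # The item leaves the queue only after it has been handled.
        queue.popleft()
    return failed
```

Passing `fetch`, `validate`, and `store` as callables keeps the loop independent of how objects are actually transported or stored.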

[Poster section labels: THEORY, PRACTICE, IDEA, GOAL]

The University of North Texas (UNT) Libraries has implemented a simple, transport-neutral digital object replication strategy in its production digital repository infrastructure. The strategy follows the same ideals as other Curation Micro-Services: lightweight, software-independent specifications that are coupled to provide a set of services for digital repositories. This approach to replication has given the UNT Libraries the flexibility to use multiple storage infrastructures and the reassurance that objects are fully validated as they are replicated throughout the repository. Building on standard Web technologies and methodologies such as the Atom Publishing Protocol and REST, together with digital library technologies such as Checkm and BagIt, a transport-neutral replication strategy lets institutions meet the increasing demands on their services while keeping overall costs low, because it permits the use of a variety of storage platforms.

The goal is to provide seamless replication across a variety of storage and transport media, as long as each system can supply the necessary services for the objects it contains.

[Diagram labels: Master Archive, Secondary Archive, Tertiary Archive; Replication Services (three instances); HTTP Transport; iRODS* Transport]

*Just one possibility among many
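One way to picture "transport neutral" is an interface that names only the services an archive must supply, so the copying code never needs to know whether HTTP, iRODS, or something else sits underneath. The sketch below is illustrative only: the `ArchiveServices` protocol, its method names, and the `InMemoryArchive` stand-in are hypothetical and do not come from the poster.

```python
from typing import Dict, Iterable, Optional, Protocol


class ArchiveServices(Protocol):
    """Hypothetical minimal service set an archive exposes for replication."""

    def list_identifiers(self) -> Iterable[str]: ...

    def list_files(self, identifier: str) -> Iterable[str]: ...

    def read_file(self, identifier: str, name: str) -> bytes: ...

    def write_file(self, identifier: str, name: str, data: bytes) -> None: ...


class InMemoryArchive:
    """Stand-in archive for illustration; a real one would wrap a transport."""

    def __init__(self, objects: Optional[Dict[str, Dict[str, bytes]]] = None):
        self.objects: Dict[str, Dict[str, bytes]] = {} if objects is None else objects

    def list_identifiers(self):
        return sorted(self.objects)

    def list_files(self, identifier):
        return sorted(self.objects[identifier])

    def read_file(self, identifier, name):
        return self.objects[identifier][name]

    def write_file(self, identifier, name, data):
        self.objects.setdefault(identifier, {})[name] = data


def replicate_object(source: ArchiveServices, target: ArchiveServices,
                     identifier: str) -> int:
    """Copy one object file by file; return the number of files copied."""
    copied = 0
    for name in source.list_files(identifier):
        target.write_file(identifier, name, source.read_file(identifier, name))
        copied += 1
    return copied
```

Because `replicate_object` depends only on the protocol, a second archive class backed by a different transport could be swapped in without changing the replication logic.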