IAC Digital Preservation Committee

________________________________________________

10 April 2007

Yale University Library

IAC Digital Preservation Committee ________________________________________________

• Outline
  – Charge & members
  – Accomplishments
    • Policy
    • Best practices
  – What’s next

IAC Digital Preservation Committee ________________________________________________

The DPC is an Integrated Access Council committee charged to:

– Develop a digital preservation program by evaluating, compiling, documenting and articulating policies, procedures, best practices and systems in order to establish a digital preservation infrastructure at Yale University Library.

– Work from a base of clearly articulated policies, then focus on preservation program planning and, finally, make recommendations for program implementation through digital preservation projects, initiatives, and system development.

IAC Digital Preservation Committee ________________________________________________

• Members:
  – Rebekah Irwin, BRBL
  – David Gewirtz, ILTS/AM&T
  – Kevin Glick, MSS/A
  – Audrey Novak, ILTS (Co-Chair)
  – Bobbie Pilette, Preservation (Co-Chair)
  – E.C. Schroeder, BRBL
  – Former members:
    • Ann Green, ILTS/ITS, Co-Chair
    • Nicole Bouche, Beinecke Library
    • Gretchen Gano, Social Science Library

IAC Digital Preservation Committee ________________________________________________

Accomplishments:

• Published a Digital Preservation policy that establishes a mission statement and promulgates preservation policies for institutional standards governing the quality, type and source of digital assets to be archived in the repository (revised Feb 2007).

• Published best practices addressing: Local practice for implementing PREMIS; Preservation Strategies; Persistent Identifiers; Fixity (checksums, message digests and digital signatures); Format Registries; Encoding & Transmission of Structured Metadata; and Care and Handling of Originals.

• Modeled an organizational structure for the ongoing coordination and management of digital preservation. This structure recognizes that the responsibility for the creation and administration of digital preservation services at Yale is shared by three services: Metadata, Repository and Preservation.

Digital Preservation Best Practices ________________________________________________

Digital preservation does not have established and vetted standards. Issues and problems associated with preserving digital resources are numerous, complex and dynamic. DPC best practices are an effort to parse the larger digital preservation problem space into discrete issues and to identify processes, activities and/or methodologies that are emerging as standards. This work by the DPC is by no means finished. More work is required to establish additional best practices for the myriad of related topics and to keep these recommendations current with the latest thinking and research in this field. Note, too, that although informed by research, most of these best practices are untested in production preservation archives.

Best Practice – Care & Handling of Physical Collections

________________________________________________

“White paper” to advise Library staff on how to protect originals during digital conversion. Available on the web site for easy access.

– Sections include:
  • Assessment of Physical Collections
  • Criteria for Selecting Proper Scanning Equipment
  • Preparing the Scanning Surface
  • Specifications for Scanning
  • Handling Procedures for Library Materials

Care & Handling of Physical Collections, continued ________________________________________________

• Assessment of Physical Collections
  – Important to include Preservation Department; contact Tara Kennedy, Field Service Librarian
  – List of questions to ask before scanning an object
• Criteria for Selecting Proper Scanning Equipment
  – Describes available equipment and appropriate use
  – Indicates which materials can be scanned safely on each type of equipment
• Preparing the Scanning Surface
  – How to clean the scanning surface (flatbed)

Care & Handling of Physical Collections, continued __________________________________________

• Specifications for Scanning
  – Illumination levels and types
  – Proper supports for bound materials
  – Environmental considerations (dust, temperature, relative humidity)
• Handling Procedures for Library Materials
  – Mostly “common sense” reminders, but also specific suggestions, e.g., for oversized materials
  – Covers paper-based materials, multimedia (sound, film, historical, optical) and objects

Best Practice - Fixity ________________________________________________

• Fixity, in preservation terms, means that the digital object has not been changed between two points in time or events.

• Fixity checks such as checksums, message digests and digital signatures are used to verify a digital object’s fixity.

• Information created by these fixity checks provides evidence for the integrity and authenticity of digital objects and is essential to enabling trust.

Fixity, continued ________________________________________________

• Fixity checks are all used in the same basic way. A value is initially generated and saved. Then, in response to an event (e.g., ingest) or over time, it is recomputed and compared to the original to ensure the object (file or bitstream) has not changed.

• Not all fixity checks are the same.
  – Checksums are the simplest and least reliable method. They are typically used in error detection to find accidental problems in transmission and storage. Simple checksums do not account for such changes as the re-ordering of bytes or changes that cancel one another out.

Fixity, continued ________________________________________________

– Message digests are more secure. They are computed by applying a more complex algorithm to a file of any length to produce a short, fixed-length character string that is, for practical purposes, unique to that content. Change one pixel or one note in the file and the message digest will be completely different. (Ex: 93326bff6636655dcd6abff18ed2de997).

– Digital signatures combine message digests with encryption. The message digest is created and then encrypted with the private key of a private/public key pair, so that anyone holding the public key can verify it.
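
The difference between a checksum and a message digest can be sketched as follows. This is a minimal illustration, not part of the DPC best practice: `additive_checksum` is a hypothetical toy checksum (sum of byte values), chosen only because it shows the byte re-ordering weakness described above, and the sample data is invented.

```python
import hashlib


def additive_checksum(data: bytes) -> int:
    """Toy checksum (an assumption for illustration): sum of byte values mod 2**16."""
    return sum(data) % 65536


original = b"digital preservation"
reordered = bytes(reversed(original))  # same bytes, different order

# The toy checksum ignores byte order, so it cannot see this change.
same_checksum = additive_checksum(original) == additive_checksum(reordered)

# A message digest (SHA-1 here) depends on the order of the bytes as well as their values.
same_digest = hashlib.sha1(original).hexdigest() == hashlib.sha1(reordered).hexdigest()

print(same_checksum, same_digest)  # prints: True False
```

The digest is also of uniform length regardless of input size: an MD5 digest is 32 hexadecimal characters and a SHA-1 digest is 40.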

Fixity, continued ________________________________________________

Current best practice for digital preservation repositories:

• The creation of message digests using two algorithms, MD5 and SHA-1.
  – These are implemented in the widely used JHOVE format identification, validation and characterization application (e.g., in the Rescue Repository before and after ingest).
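
The generate, save, recompute and compare cycle with both algorithms might look like the sketch below. It uses Python's standard `hashlib` rather than JHOVE; the function names `compute_digests` and `verify_fixity` and the temporary demo file are assumptions made for illustration only.

```python
import hashlib
import os
import tempfile


def compute_digests(path, chunk_size=65536):
    """Return MD5 and SHA-1 hex digests of a file, reading it in chunks."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return {"md5": md5.hexdigest(), "sha1": sha1.hexdigest()}


def verify_fixity(path, recorded):
    """Recompute both digests and compare them with the saved values."""
    return compute_digests(path) == recorded


# Demo on a throwaway file (hypothetical content).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example object")
    demo_path = tmp.name

recorded = compute_digests(demo_path)           # e.g. saved before ingest
unchanged = verify_fixity(demo_path, recorded)  # e.g. rechecked after ingest

with open(demo_path, "ab") as f:                # simulate a one-byte change
    f.write(b"!")
changed_detected = not verify_fixity(demo_path, recorded)

os.remove(demo_path)
```

Recording two independent digests means a later check fails if either value differs from the saved one.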

Best Practice – Format Registries and Tools ________________________________________________

What is a Format?

• A technical specification describing a standard encoding or representation of digital content stored in a file.
  – A file format extension such as “.jpg” indicates the encoded content is a digital image.
• File encoding standards are used by programs to read the encoded information and present the useable content of the file on a user’s monitor or another output device.

Format Registries ________________________________________________

What is a Format Registry?

• A database that stores information about the technical specifications of an electronic file’s format.
• Format registries record file format changes over time so that files remain readable in the face of the technological obsolescence of a format standard.

How does a format registry work?

• Global Digital Format Registry

File Format Tools ________________________________________________

File format identification & validation tools answer two questions:

• How can we tell a file's type?
• If we know its type, how can we be sure that it conforms to its format specification so that we know it is still useable?

File Format Tools __________________________________________

• JHOVE: A widely used file type identification, validation and characterization tool developed by Harvard Univ. Library & JSTOR.
  – Handles many format types (e.g., AIFF, ASCII, BYTESTREAM, GIF, HTML, JPEG, JPEG2000, PDF, TIFF, UTF8, WAV, XML).
  – Is configurable in many respects, including the options to: select full validation or “short” mode, in which only the header’s signature is analyzed; include or exclude message digests in the output; and choose from various output formats, including plain text and XML.

• Because JHOVE does both file type identification and validation, it is currently Yale University Library’s format-related tool of choice.
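
Identification from a header signature, the first of the two questions these tools answer, can be sketched in a few lines. This is an illustrative toy and not JHOVE's actual logic: the `SIGNATURES` table and `identify` function are assumptions, the table covers only a handful of well-known signatures, and falling back to "BYTESTREAM" for unrecognized content is a choice made here. Note that it performs no validation: a matching signature does not show that the rest of the file conforms to its format specification.

```python
# Well-known leading byte sequences ("magic numbers") for a few formats.
SIGNATURES = [
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"%PDF-", "PDF"),
    (b"II*\x00", "TIFF"),  # little-endian TIFF
    (b"MM\x00*", "TIFF"),  # big-endian TIFF
]


def identify(header: bytes) -> str:
    """Guess a format name from the first bytes of a file."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    # Container formats: a 4-byte tag, a 4-byte size, then a 4-byte form type.
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "WAV"
    if header[:4] == b"FORM" and header[8:12] == b"AIFF":
        return "AIFF"
    return "BYTESTREAM"  # fallback chosen for this sketch


print(identify(b"%PDF-1.4\n"))  # prints: PDF
```

In practice the header would come from something like `open(path, "rb").read(12)`.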

 

File Format Tools _______________________________________________

Other tools:

• DROID (Digital Record Object Identification): A file type identification tool developed by the Digital Preservation Department of the National Archives of the United Kingdom to perform automated batch file format identification using the PRONOM registry.

• National Library of New Zealand Preservation Metadata Extract Tool: A tool that extracts metadata from file headers. This Java tool uses “adapters” to extract metadata from file types including MS Word, Word Perfect, Open Office, MS Works, MS Excel, MS PowerPoint, TIFF, JPEG, WAV, MP3, HTML, PDF, GIF, and BMP. This data is output in a standard XML format.

Best Practice – Persistent Identifiers __________________________________________

• A persistent identifier (PI) is a unique name (identifier) associated with an internet resource that provides a link to the content and persists over changes of server location, ownership, and other state conditions.
  – A location (e.g., a given URL) is not a persistent identifier if the content moves to another location.

• The principal problem addressed by PIs is broken links to internet resources, i.e., “the HTTP 404 Error – Document not found.”

• Persistent identification is not possible without an associated service. It is the service that supports persistence. The identifier takes you to the service, the service resolves to the object.

• Optimally a PI should be created and assigned when the digital object is created.

Best Practice – Persistent Identifiers __________________________________________

• Several technologies are available to create persistent identifiers, such as:
  – CNRI Handle System – A generic system for assigning names to objects and resolving them. Key is the Global Handle Registry, which manages the namespace of all handle prefixes.
  – DOI (Digital Object Identifier) – An application of the CNRI Handle System that associates intellectual property with structured metadata. A typical use of a DOI is to give a scientific paper or article a unique identifying number that can be resolved through the DOI resolver or the CNRI global handle resolver.
  – PURL – A Persistent Uniform Resource Locator is a URL that describes an intermediate (and more persistent) location which, when retrieved, results in a standard HTTP redirect to the current location of the resource.
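
What these technologies share is a resolution service: the identifier stays fixed and the service maps it to the current location. The sketch below is a toy in-memory resolver written only to illustrate that idea; it is not the Handle System, DOI or PURL software, and the `ToyResolver` class, the identifier and the example.org URLs are all hypothetical.

```python
class ToyResolver:
    """Maps persistent identifiers to the current location of each resource."""

    def __init__(self):
        self._locations = {}

    def register(self, identifier, url):
        """Record (or update) where the identified resource currently lives."""
        self._locations[identifier] = url

    def resolve(self, identifier):
        """Return (status, location), imitating an HTTP redirect response."""
        url = self._locations.get(identifier)
        if url is None:
            return (404, None)
        return (302, url)


resolver = ToyResolver()
pid = "hdl:12345/example-1"  # hypothetical identifier

resolver.register(pid, "http://server-a.example.org/objects/1")
before_move = resolver.resolve(pid)

# The resource moves; only the resolver's record changes, not the identifier.
resolver.register(pid, "http://server-b.example.org/archive/1")
after_move = resolver.resolve(pid)

print(before_move[1], after_move[1])
```

Links that cite the identifier keep working after the move only because someone updated the service's record.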

Persistent Identifiers - Handle Server ________________________________________________

• The implementation of a CNRI handle server at YUL is tightly coupled to the implementation of the VITAL/Fedora Digital Repository Service.

• Digital objects within the Digital Repository Service will have handles such as: http://moonpie:8085/fedora/get/hdl:10079.2F-2103288706 (opaque), or http://hdl.rutgers.edu/1782.1/SPCOLSMAPS.Map.b1849 (semantic)

• A handle server, like a web server, requires ongoing system administration, e.g., when resources are moved.

• Continuing research in the assignment of handles to resources in other YUL repositories such as the Rescue Repository, Image Commons (DL/Insight), etc.

Best Practice - Maintenance Strategies ________________________________________________

A1. Clear Allocation of Responsibilities

A2. Provision of the appropriate technical infrastructure

A3. Establishment & implementation of a plan for system maintenance, support and replacement

A4. Establishment & implementation of plan for regular transfer of records to new storage media

A5. Adherence to appropriate storage and handling conditions for storage media

A6. Ensuring redundancy and regular backup

A7. Establishment of system security

A8. Disaster planning

Best Practice - Preservation Strategies ________________________________________________

B1. Use of standards

B2. Data extraction and structuring

B3. Encapsulation

B4. Restricting the range of formats to be managed

B5. Technology preservation

B6. Reliance on backward compatibility

B7. Migration

B8. Software re-engineering

B9. Viewers and migration at the point of access

B10. Emulation

B11. Non-digital approaches

B12. Data restoration

Best Practice - PREMIS __________________________________________

PREservation Metadata: Implementation Strategies

Yale Working Group
Matthew Beacom, Metadata Librarian, Catalog and Metadata Services (Co-chair)
Rebekah Irwin, Catalog Librarian for Digital Projects, Beinecke Library (Co-chair)
Youn Noh, Digital Resources Catalog Librarian, Catalog and Metadata Services
George Ouellette, Senior Programmer Analyst, Library ILTS
David Walls, Preservation Librarian, Library Preservation Dept

Yale Advisory Group
Reed Beaman, Associate Director for Biodiversity Informatics, Peabody Museum
Lee Faulkner, Media Director, Digital Media Center for the Arts
David Gewirtz, Project Manager, Library Projects, ITS
Kevin Glick, Electronic Records Archivist, Manuscripts and Archives
Edward Kairiss, Director, Instructional Computing Instructional Technology, ITS
Daniel Lee, E-Publishing/Internet Marketing Manager, Yale University Press
Thomas Raich, Associate Director, Information Technology, Art Gallery

Best Practice - PREMIS _______________________________________________

Outcome:

Develop PREMIS profiles that match specific digital collection and administrative needs

Base profile (up to 6 elements): This base profile of elements would support digital preservation of a wide range of digital assets

Full profile (over 200): This full profile would provide guidance to administrators of digital information assets acting as trusted custodians of material deemed to be of long-term value

Best Practices - Summary ________________________________________________

• Most of these best practices are the outcome of current research projects.
• Few are tested in production preservation repositories.
• At Yale the Rescue Repository is becoming a local testbed.
  – Fixity: MD5 and SHA-1 message digests
  – JHOVE file format identification and validation
  – Maintenance strategies
  – PREMIS base profile element set
• VITAL/Fedora Digital Repository Service implementation
  – Persistent identifiers through the CNRI Handle System

What’s Next ________________________________________________

Goals:

• Creation of a Transition Team to continue the work of the DPC and, most importantly, to create within a 6-month timeframe the roadmap for implementing the permanent management model for an ongoing digital preservation program.
  – The recommended structure consists of a core team representing 2 FTE, composed of staff with expertise in metadata, repository and preservation services. It is modeled as a virtual Digital Curation Center (DCC). The DCC will put into practice the identified best practices and the Digital Repository Service (DRS) Preservation Archive.

• The Transition Team will prepare a business plan for the Digital Curation Center. The business plan will identify the DCC’s vision, mission, goals and first-year deliverables; staffing models; budget; and timeline for creation.

IAC Digital Preservation Committee ________________________________________________

Website:

http://www.library.yale.edu/iac/dpc.html
