IAC Digital Preservation Committee
________________________________________________
10 April 2007
Yale University Library
• Outline
– Charge & members
– Accomplishments
  • Policy
  • Best practices
– What’s next
The DPC is an Integrated Access Council committee charged to:
– Develop a digital preservation program by evaluating, compiling, documenting and articulating policies, procedures, best practices and systems in order to establish a digital preservation infrastructure at Yale University Library.
– Work from a base of clearly articulated policies, then focus on preservation program planning and, finally, make recommendations for program implementation through digital preservation projects, initiatives, and system development.
• Members:
– Rebekah Irwin, BRBL
– David Gewirtz, ILTS/AM&T
– Kevin Glick, MSS/A
– Audrey Novak, ILTS (Co-Chair)
– Bobbie Pilette, Preservation (Co-Chair)
– E.C. Schroeder, BRBL
– Former members:
  • Ann Green, ILTS/ITS, Co-Chair
  • Nicole Bouche, Beinecke Library
  • Gretchen Gano, Social Science Library
Accomplishments:
• Published a Digital Preservation policy that establishes a mission statement and promulgates preservation policies setting institutional standards for the quality, type and source of digital assets to be archived in the repository (revised Feb 2007).
• Published best practices addressing: Local practice for implementing PREMIS; Preservation Strategies; Persistent Identifiers; Fixity (checksums, message digests and digital signatures); Format Registries; Encoding & Transmission of Structured Metadata; and Care and Handling of Originals.
• Modeled an organizational structure for the ongoing coordination and management of digital preservation. This structure recognizes that responsibility for creating and administering digital preservation services at Yale is shared by three services: Metadata, Repository and Preservation.
Digital Preservation Best Practices ________________________________________________
Digital preservation does not yet have established and vetted standards. The issues and problems associated with preserving digital resources are numerous, complex and dynamic. DPC best practices are an effort to parse the larger digital preservation problem space into discrete issues and to identify processes, activities and/or methodologies that are emerging as standards. This work by the DPC is by no means finished. More work is required to establish additional best practices for the myriad related topics and to keep these recommendations current with the latest thinking and research in the field. Note, too, that although informed by research, most of these best practices are untested in production preservation archives.
Best Practice – Care & Handling of Physical Collections
________________________________________________
A “white paper” advising Library staff on how to protect originals during digital conversion, available on the web site for easy access.
– Sections include:
  • Assessment of Physical Collections
  • Criteria for Selecting Proper Scanning Equipment
  • Preparing the Scanning Surface
  • Specifications for Scanning
  • Handling Procedures for Library Materials
Care & Handling of Physical Collections, continued ________________________________________________
• Assessment of Physical Collections
– Important to include the Preservation Department; contact Tara Kennedy, Field Service Librarian
– List of questions to ask before scanning an object
• Criteria for Selecting Proper Scanning Equipment
– Describes available equipment and appropriate use
– Indicates which materials can be scanned safely on each type of equipment
• Preparing the Scanning Surface
– How to clean the scanning surface (flatbed)
Care & Handling of Physical Collections, continued __________________________________________
• Specifications for Scanning
– Illumination levels and types
– Proper supports for bound materials
– Environmental considerations (dust, temperature, relative humidity)
• Handling Procedures for Library Materials
– Mostly “common sense” reminders, but also specific suggestions, e.g., for oversized materials
– Covers paper-based materials, multimedia (sound, film, historical, optical) and objects
Best Practice - Fixity ________________________________________________
• Fixity, in preservation terms, means that the digital object has not been changed between two points in time or events.
• Fixity checks such as checksums, message digests and digital signatures are used to verify a digital object’s fixity.
• The information created by these fixity checks provides evidence of the integrity and authenticity of the digital objects and is essential to enabling trust.
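In its simplest form, a fixity check means generating a value when an object arrives, saving it, and later recomputing and comparing it. The Python sketch below illustrates that cycle under stated assumptions: an in-memory dictionary stands in for wherever a repository would really keep the saved values, the function names `record_fixity` and `check_fixity` are hypothetical, and SHA-1 is used only because it is one of the algorithms this deck names.

```python
import hashlib
from typing import Dict

# Hypothetical store of saved fixity values, keyed by an object identifier.
# A real repository would keep these in its metadata, not in a dict.
fixity_store: Dict[str, str] = {}


def compute_digest(data: bytes, algorithm: str = "sha1") -> str:
    """Return the hex digest of `data` for any hashlib algorithm name."""
    return hashlib.new(algorithm, data).hexdigest()


def record_fixity(object_id: str, data: bytes) -> None:
    """Generate a value when the object first arrives and save it."""
    fixity_store[object_id] = compute_digest(data)


def check_fixity(object_id: str, data: bytes) -> bool:
    """Recompute the value (after an event or on a schedule) and compare."""
    return compute_digest(data) == fixity_store[object_id]


original = b"Scanned page 1 of a manuscript"
record_fixity("obj-001", original)
print(check_fixity("obj-001", original))                 # True: unchanged
print(check_fixity("obj-001", original + b" (edited)"))  # False: changed
```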
Fixity, continued ________________________________________________
• Fixity checks are all used in the same basic way. A value is initially generated and saved. Then, in response to an event (e.g., ingest) or over time, it is recomputed and compared to the original to ensure the object (file or bitstream) has not changed.
• Not all fixity checks are the same.
– Checksums are the simplest and least reliable method. They are typically used in error detection to find accidental problems in transmission and storage. Simple checksums may not detect changes such as the reordering of bytes or changes that cancel one another out.
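The weakness of simple checksums is easy to demonstrate. The sketch below uses a deliberately naive additive checksum (the sum of all byte values modulo 65536), chosen only for illustration and not the algorithm of any particular tool. Reordering bytes, or making two changes that cancel out, leaves the sum unchanged, while an MD5 message digest (one of the algorithms discussed in this section) changes.

```python
import hashlib


def additive_checksum(data: bytes) -> int:
    """Naive illustrative checksum: sum of byte values, truncated to 16 bits."""
    return sum(data) % 65536


original = b"ABCD"
reordered = b"DCBA"      # same bytes, different order
compensating = b"BBBD"   # 'A' raised by one, 'C' lowered by one: changes cancel out

print(additive_checksum(original) == additive_checksum(reordered))     # True: change missed
print(additive_checksum(original) == additive_checksum(compensating))  # True: change missed
print(hashlib.md5(original).hexdigest() == hashlib.md5(reordered).hexdigest())  # False: caught
```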
Fixity, continued ________________________________________________
– Message digests are more secure. They are computed by applying a more complex algorithm to a file of any length to produce a short, fixed-length character string that is, for practical purposes, unique to that content. Change one pixel or one note in the file and the message digest will be completely different (e.g., 93326bff6636655dcd6abff18ed2de997).
– Digital signatures combine message digests with public-key encryption. The message digest is created and then encrypted with the signer’s private key; anyone holding the matching public key can decrypt it and compare it with a freshly computed digest.
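The sign-then-verify idea can be illustrated with “textbook” RSA using plain Python integers. This is a teaching toy only: it assumes two hard-coded primes and no padding scheme, so it is neither secure nor how a production signature library works. The SHA-1 digest is transformed with the private exponent to form the signature, and anyone holding the public key can check it.

```python
import hashlib

# Toy key pair from two known Mersenne primes (2**61 - 1 and 2**89 - 1).
# Real keys are generated randomly and are much larger; illustration only.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
PHI = (P - 1) * (Q - 1)
E = 65537                # public exponent
D = pow(E, -1, PHI)      # private exponent: modular inverse (Python 3.8+)


def digest_as_int(data: bytes) -> int:
    """SHA-1 message digest, reduced to an integer smaller than N."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """Create the digest, then transform it with the private key."""
    return pow(digest_as_int(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """Recover the digest with the public key and compare to a fresh one."""
    return pow(signature, E, N) == digest_as_int(data)


document = b"Finding aid, version 1"
sig = sign(document)
print(verify(document, sig))                   # True
print(verify(b"Finding aid, version 2", sig))  # False
```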
Fixity, continued ________________________________________________
Current best practice for digital preservation repositories:
• The creation of message digests using two algorithms, MD5 and SHA-1.
– These are implemented in the widely used JHOVE format identification, validation and characterization application (used, e.g., in the Rescue Repository before and after ingest).
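Producing both of the digests recommended above does not require reading a file twice: both hash objects can be updated from the same stream of chunks. A minimal Python sketch, assuming local file access; the function name `md5_and_sha1` and the 64 KiB chunk size are arbitrary choices, and this is not how JHOVE itself computes its digests.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict


def md5_and_sha1(path: Path, chunk_size: int = 64 * 1024) -> Dict[str, str]:
    """Compute MD5 and SHA-1 digests of a file in a single pass."""
    hashers = {"md5": hashlib.md5(), "sha1": hashlib.sha1()}
    with path.open("rb") as f:
        # Read fixed-size chunks until read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


# Usage: digests taken before and after a (simulated) ingest copy should match.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp, "page001.tif")
    source.write_bytes(b"II*\x00" + b"\x00" * 200_000)
    copy = Path(tmp, "ingested_page001.tif")
    copy.write_bytes(source.read_bytes())
    before, after = md5_and_sha1(source), md5_and_sha1(copy)
    print(before == after)  # True
```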
Best Practice – Format Registries and Tools ________________________________________________
What is a Format?
• A technical specification describing a standard encoding or representation of digital content stored in a file.
– A file format extension such as “.jpg” conventionally indicates that the encoded content is a digital image.
• Programs use file encoding standards to read the encoded information and present the file’s content in usable form on a monitor or another output device.
Format Registries ________________________________________________
What is a Format Registry?
• A database that stores information about the technical specifications of electronic file formats.
• Format registries record file format changes over time so that files remain readable despite the technological obsolescence of a format standard.
How does a format registry work?
• Example: the Global Digital Format Registry (GDFR)
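To make the registry idea concrete, here is a toy in-memory registry in Python. The record fields (`name`, `version`, `signature`, `spec_url`) are hypothetical and far simpler than what a real registry such as GDFR records; the point is only that each version of a format gets its own entry, so a file can be matched to the specific version it was written against. The GIF signatures are the real ones (“GIF87a” and “GIF89a”); the URLs are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FormatRecord:
    name: str
    version: str
    signature: bytes  # leading bytes that identify this version
    spec_url: str     # where the specification is archived (placeholder)


class FormatRegistry:
    """Hypothetical minimal registry: one record per format version."""

    def __init__(self) -> None:
        self._records: list[FormatRecord] = []

    def register(self, record: FormatRecord) -> None:
        self._records.append(record)

    def versions(self, name: str) -> list[str]:
        """All registered versions of a format, in registration order."""
        return [r.version for r in self._records if r.name == name]

    def match(self, header: bytes) -> FormatRecord | None:
        """Find the record whose signature begins the given header bytes."""
        for r in self._records:
            if header.startswith(r.signature):
                return r
        return None


registry = FormatRegistry()
registry.register(FormatRecord("GIF", "87a", b"GIF87a", "https://example.org/specs/gif87a"))
registry.register(FormatRecord("GIF", "89a", b"GIF89a", "https://example.org/specs/gif89a"))

print(registry.versions("GIF"))                           # ['87a', '89a']
print(registry.match(b"GIF89a\x01\x00\x01\x00").version)  # 89a
```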
File Format Tools ________________________________________________
File format identification & validation tools answer two questions:
• How can we tell a file's type?
• If we know its type, how can we be sure that it conforms to its format specification, so that we know it is still usable?
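The first question is usually answered by looking at a file’s internal signature (its “magic number”) rather than trusting its extension. A minimal Python sketch, assuming a small hand-written signature table; real tools such as DROID draw on much larger signature registries. The byte sequences listed are the standard leading bytes of each format, and the helper names are hypothetical.

```python
from pathlib import Path

# Standard leading-byte signatures for a few common formats.
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"%PDF-", "PDF"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
]


def identify(header: bytes) -> str:
    """Return a format name based on the file's leading bytes."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def extension_mismatch(filename: str, header: bytes) -> bool:
    """Flag files whose .jpg/.jpeg extension disagrees with their signature."""
    claims_jpeg = Path(filename).suffix.lower() in {".jpg", ".jpeg"}
    return claims_jpeg != (identify(header) == "JPEG")


print(identify(b"%PDF-1.4\n"))                       # PDF
print(extension_mismatch("scan.jpg", b"GIF89a..."))  # True: named .jpg, actually GIF
```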
File Format Tools
__________________________________________
• JHOVE: A widely used file type identification, validation and characterization tool developed by Harvard University Library & JSTOR.
– Handles many format types (e.g., AIFF, ASCII, BYTESTREAM, GIF, HTML, JPEG, JPEG2000, PDF, TIFF, UTF8, WAV, XML).
– Is configurable in many respects, including the options to select full validation or “short” mode, in which only the header’s signature is analyzed; to include or exclude message digests in the output; and to choose from various output formats, including plain text and XML.
• Because JHOVE does both file type identification and validation, it is currently Yale University Library’s format-related tool of choice.
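The difference between a “short” signature check and fuller validation can be sketched for WAV, one of the formats JHOVE handles. This is a simplified, hypothetical checker, not JHOVE’s logic: short mode looks only at the RIFF/WAVE signature, while full mode also checks that the declared RIFF size matches the data and that the required `fmt ` and `data` chunks are present. The test file is generated with Python’s standard `wave` module.

```python
import io
import struct
import wave
from typing import List


def check_wav(data: bytes, short: bool = False) -> List[str]:
    """Return problems found (hypothetical checker); empty list means none."""
    if len(data) < 12 or data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        return ["missing RIFF/WAVE signature"]
    problems: List[str] = []
    if short:
        return problems  # signature only, like a "short" mode

    declared = struct.unpack("<I", data[4:8])[0]
    if declared != len(data) - 8:
        problems.append(f"RIFF size {declared} != actual {len(data) - 8}")

    # Walk the chunk list: 4-byte ID, 4-byte little-endian size, payload.
    seen = set()
    pos = 12
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        size = struct.unpack("<I", data[pos + 4:pos + 8])[0]
        seen.add(chunk_id)
        pos += 8 + size + (size % 2)  # chunks are padded to an even length
    for required in (b"fmt ", b"data"):
        if required not in seen:
            problems.append(f"missing {required.decode()!r} chunk")
    return problems


# Build a small, well-formed WAV in memory, then a truncated copy.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(8000)
    w.writeframes(b"\x00\x00" * 100)
good = buf.getvalue()
truncated = good[:-50]

print(check_wav(good))                   # []
print(check_wav(truncated, short=True))  # []  (signature alone looks fine)
print(check_wav(truncated))              # ['RIFF size 236 != actual 186']
```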
File Format Tools _______________________________________________
Other tools:
• DROID (Digital Record Object Identification): A file type identification tool developed by the Digital Preservation Department of the National Archives of the United Kingdom to perform automated batch file format identification using the PRONOM registry.
• National Library of New Zealand Preservation Metadata Extract Tool: A tool that extracts metadata from file headers. This Java tool uses “adapters” to extract metadata from file types including MS Word, Word Perfect, Open Office, MS Works, MS Excel, MS PowerPoint, TIFF, JPEG, WAV, MP3, HTML, PDF, GIF, and BMP. The extracted data is output in a standard XML format.
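Header-based metadata extraction of this kind can be illustrated for GIF, whose header stores the version and the logical screen width and height at fixed offsets. The sketch below is hypothetical: it is not the NLNZ tool’s code, and the XML element names are invented for the example rather than taken from that tool’s output schema.

```python
import struct
import xml.etree.ElementTree as ET


def gif_header_metadata(data: bytes) -> ET.Element:
    """Read a GIF header and return basic properties as (hypothetical) XML."""
    if len(data) < 10 or data[:3] != b"GIF":
        raise ValueError("not a GIF file")
    version = data[3:6].decode("ascii")               # '87a' or '89a'
    width, height = struct.unpack("<HH", data[6:10])  # little-endian uint16s
    root = ET.Element("metadata", format="GIF")
    ET.SubElement(root, "version").text = version
    ET.SubElement(root, "width").text = str(width)
    ET.SubElement(root, "height").text = str(height)
    return root


header = b"GIF89a" + struct.pack("<HH", 640, 480) + b"\x00" * 3
print(ET.tostring(gif_header_metadata(header), encoding="unicode"))
# <metadata format="GIF"><version>89a</version><width>640</width><height>480</height></metadata>
```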
Best Practice – Persistent Identifiers __________________________________________
• A persistent identifier (PI) is a unique name (identifier) associated with an internet resource that provides a link to the content and persists over changes of server location, ownership, and other state conditions.
– A location (e.g., a given URL) is not a persistent identifier if the content moves to another location.
The principal problem addressed by PIs is broken links to internet resources, i.e., “the HTTP 404 Error – Document not found.”
• Persistent identification is not possible without an associated service. It is the service that supports persistence: the identifier takes you to the service, and the service resolves it to the object.
• Optimally a PI should be created and assigned when the digital object is created.
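The “identifier → service → object” chain can be sketched as a tiny resolver written as a standard Python WSGI application: it looks up an identifier in a table and answers with an HTTP redirect to the object’s current location, or a 404 for unknown identifiers. The identifier paths and URLs are hypothetical placeholders. Moving an object means updating one table entry; the published identifier never changes.

```python
from typing import Callable, Dict, Iterable


def make_resolver(table: Dict[str, str]) -> Callable:
    """Return a WSGI app that redirects identifier paths to current URLs."""
    def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
        target = table.get(environ.get("PATH_INFO", ""))
        if target is None:
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"Unknown identifier\n"]
        start_response("302 Found", [("Location", target)])
        return [b""]
    return app


# Hypothetical identifier table; the app reads this same dict on every request.
locations = {"/pi/ms-0001": "http://server-a.example.org/objects/ms-0001.tif"}
resolver = make_resolver(locations)


def request(path: str) -> dict:
    """Call the app directly with a minimal environ and capture the response."""
    captured = {}

    def start_response(status, headers):
        captured["status"], captured["headers"] = status, dict(headers)

    resolver({"PATH_INFO": path}, start_response)
    return captured


print(request("/pi/ms-0001")["headers"]["Location"])
# The object moves to a new server: only the resolver table changes.
locations["/pi/ms-0001"] = "http://server-b.example.org/archive/ms-0001.tif"
print(request("/pi/ms-0001")["headers"]["Location"])
print(request("/pi/unknown")["status"])  # 404 Not Found
```

To try it over HTTP, the standard library’s `wsgiref.simple_server.make_server("", 8000, resolver).serve_forever()` would serve the same app locally.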
Best Practice – Persistent Identifiers
__________________________________________
• Several technologies are available to create persistent identifiers, such as:
– CNRI Handle System – A generic system for assigning names to objects and resolving them. Its key component is the Global Handle Registry, which manages the namespace of all handle prefixes.
– DOI (Digital Object Identifier) – An application of the CNRI Handle System that associates intellectual property with structured metadata. A typical use of a DOI is to give a scientific paper or article a unique identifying number that can be resolved through the DOI resolver or the CNRI global handle resolver.
– PURL – A Persistent Uniform Resource Locator is a URL that points to an intermediate (and more persistent) location which, when retrieved, results in a standard HTTP redirect to the current location of the resource.
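Handle resolution works in two steps: a handle has the form `prefix/suffix`, the Global Handle Registry says which local handle service is responsible for the prefix, and that service maps the full handle to the object’s current location. The Python sketch below models both steps with dictionaries; the prefix `12345`, the service name and the URLs are hypothetical, and no real Handle System protocol messages are involved.

```python
# Global level: which local handle service owns each prefix (hypothetical).
GLOBAL_HANDLE_REGISTRY = {"12345": "library-handle-service"}

# Local level: each service maps full handles to current URLs (hypothetical).
LOCAL_SERVICES = {
    "library-handle-service": {
        "12345/map.b1849": "http://repository.example.edu/objects/map-b1849",
    },
}


def resolve_handle(handle: str) -> str:
    """Resolve 'prefix/suffix' to a URL via the global and local levels."""
    prefix, sep, suffix = handle.partition("/")
    if not sep or not suffix:
        raise ValueError(f"not a handle: {handle!r}")
    service = GLOBAL_HANDLE_REGISTRY.get(prefix)
    if service is None:
        raise KeyError(f"prefix {prefix!r} is not registered")
    url = LOCAL_SERVICES[service].get(handle)
    if url is None:
        raise KeyError(f"handle {handle!r} is unknown to {service}")
    return url


print(resolve_handle("12345/map.b1849"))
```

A DOI fits the same model, since it is a handle whose prefix begins with “10.”; a PURL server corresponds roughly to the second step alone, answered with an HTTP redirect.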
Persistent Identifiers - Handle Server ________________________________________________
• The implementation of a CNRI handle server at YUL is tightly coupled to the implementation of the VITAL/Fedora Digital Repository Service.
• Digital objects within the Digital Repository Service will have handles such as: http://moonpie:8085/fedora/get/hdl:10079.2F-2103288706 (opaque), or http://hdl.rutgers.edu/1782.1/SPCOLSMAPS.Map.b1849 (semantic)
• A handle server, like a web server, requires ongoing system administration, e.g., when resources are moved.
• Research continues on assigning handles to resources in other YUL repositories, such as the Rescue Repository, Image Commons (DL/Insight), etc.
Best Practice - Maintenance Strategies ________________________________________________
A1. Clear Allocation of Responsibilities
A2. Provision of the appropriate technical infrastructure
A3. Establishment & implementation of a plan for system maintenance, support and replacement
A4. Establishment & implementation of a plan for regular transfer of records to new storage media
A5. Adherence to appropriate storage and handling conditions for storage media
A6. Ensuring redundancy and regular backup
A7. Establishment of system security
A8. Disaster planning
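Redundancy (A6) only protects content if the copies are checked against one another. A minimal Python sketch, assuming each replica’s bytes can be read and that a simple majority of matching SHA-1 digests is treated as the trusted value; the replica names are hypothetical, and a real audit would also compare against the digest recorded at ingest.

```python
import hashlib
from collections import Counter
from typing import Dict, List


def audit_replicas(replicas: Dict[str, bytes]) -> List[str]:
    """Return names of replicas whose digest disagrees with the majority."""
    digests = {name: hashlib.sha1(data).hexdigest() for name, data in replicas.items()}
    majority_digest, count = Counter(digests.values()).most_common(1)[0]
    if count <= len(replicas) // 2:
        raise RuntimeError("no majority: cannot tell which copy is correct")
    return sorted(name for name, d in digests.items() if d != majority_digest)


# Hypothetical replicas of one object, one of them silently corrupted.
copies = {
    "primary-disk": b"master image bytes",
    "tape-backup": b"master image bytes",
    "offsite-copy": b"master image bytes\x00",
}
print(audit_replicas(copies))  # ['offsite-copy']
```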
Best Practice - Preservation Strategies ________________________________________________
B1. Use of standards
B2. Data extraction and structuring
B3. Encapsulation
B4. Restricting the range of formats to be managed
B5. Technology preservation
B6. Reliance on backward compatibility
B7. Migration
B8. Software re-engineering
B9. Viewers and migration at the point of access
B10. Emulation
B11. Non-digital approaches
B12. Data restoration
Best Practice - PREMIS __________________________________________
PREservation Metadata: Implementation Strategies
Yale Working Group
– Matthew Beacom, Metadata Librarian, Catalog and Metadata Services (Co-chair)
– Rebekah Irwin, Catalog Librarian for Digital Projects, Beinecke Library (Co-chair)
– Youn Noh, Digital Resources Catalog Librarian, Catalog and Metadata Services
– George Ouellette, Senior Programmer Analyst, Library ILTS
– David Walls, Preservation Librarian, Library Preservation Dept
Yale Advisory Group
– Reed Beaman, Associate Director for Biodiversity Informatics, Peabody Museum
– Lee Faulkner, Media Director, Digital Media Center for the Arts
– David Gewirtz, Project Manager, Library Projects, ITS
– Kevin Glick, Electronic Records Archivist, Manuscripts and Archives
– Edward Kairiss, Director, Instructional Computing, Instructional Technology, ITS
– Daniel Lee, E-Publishing/Internet Marketing Manager, Yale University Press
– Thomas Raich, Associate Director, Information Technology, Art Gallery
Best Practice - PREMIS _______________________________________________
Outcome:
Develop PREMIS profiles that match specific digital collection and administrative needs
Base profile (up to 6 elements): This base profile of elements would support digital preservation of a wide range of digital assets
Full profile (over 200 elements): This full profile would provide guidance to administrators of digital information assets acting as trusted custodians of material deemed to be of long-term value
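As an illustration of how small a base profile can be, the sketch below builds a PREMIS-style object record in XML with Python’s standard library. The element names loosely follow PREMIS semantic units (`objectIdentifier`, `objectCategory`, `messageDigestAlgorithm`, `messageDigest`, `size`, `formatName`), but this particular selection is an assumption made for the example, not the Yale base profile. The PREMIS namespace and other schema requirements are omitted, so the output is not schema-valid PREMIS.

```python
import hashlib
import xml.etree.ElementTree as ET


def premis_like_object(identifier: str, data: bytes, format_name: str) -> ET.Element:
    """Build a minimal PREMIS-style <object> record (no namespace, not schema-valid)."""
    obj = ET.Element("object")
    ident = ET.SubElement(obj, "objectIdentifier")
    ET.SubElement(ident, "objectIdentifierType").text = "local"  # hypothetical scheme
    ET.SubElement(ident, "objectIdentifierValue").text = identifier
    ET.SubElement(obj, "objectCategory").text = "file"

    chars = ET.SubElement(obj, "objectCharacteristics")
    fixity = ET.SubElement(chars, "fixity")
    ET.SubElement(fixity, "messageDigestAlgorithm").text = "SHA-1"
    ET.SubElement(fixity, "messageDigest").text = hashlib.sha1(data).hexdigest()
    ET.SubElement(chars, "size").text = str(len(data))
    designation = ET.SubElement(ET.SubElement(chars, "format"), "formatDesignation")
    ET.SubElement(designation, "formatName").text = format_name
    return obj


record = premis_like_object("ms-0001", b"II*\x00" + b"\x00" * 96, "TIFF")
print(ET.tostring(record, encoding="unicode"))
```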
Best Practices - Summary ________________________________________________
• Most of these best practices are the outcome of current research projects.
• Few are tested in production preservation repositories.
• At Yale the Rescue Repository is becoming a local testbed.
– Fixity: MD5 and SHA-1 message digests
– JHOVE file format identification and validation
– Maintenance strategies
– PREMIS base profile element set.
• VITAL/Fedora Digital Repository Service implementation
– Persistent identifiers through the CNRI Handle System.
What’s Next________________________________________________
Goals:
• Creation of a Transition Team to continue the work of the DPC and, most importantly, to create within a six-month timeframe the roadmap for implementing the permanent management model for an ongoing digital preservation program.
– The recommended structure consists of a core team representing 2 FTE, composed of staff with expertise in metadata, repository and preservation services. It is modeled as a virtual Digital Curation Center (DCC). The DCC will put into practice the identified best practices and the Digital Repository Service (DRS) Preservation Archive.
• The Transition Team will prepare a business plan for the Digital Curation Center. The business plan will identify the DCC’s: Vision, mission, goals and first year deliverables; Staffing models; Budget; and Timeline for creation.
IAC Digital Preservation Committee ________________________________________________
Website:
http://www.library.yale.edu/iac/dpc.html