Metadata for Preservation
Priscilla Caplan, Florida Center for Library Automation
Outline
Do it yourself: let’s invent some preservation metadata
The OAIS Information Model
Metadata standards for preservation
  general preservation metadata standards
  format-specific technical metadata
  packaging standards
Problems/issues/interesting things
first things first
what is metadata?
how do we normally classify different types of metadata?
what is preservation metadata?
metadata related to the preservation management of information resources; for example, metadata used to document, or created as a result of, preservation processes performed on information resources.
information that supports and documents the long-term preservation of information materials.
Preservation Pyramid (figure labels): Fixity, Viability, Renderability, Description, Secure storage, Media management, Availability, Identity, Capture, Selection, Understandability, Authenticity, Format strategies (migration, emulation, ...), Authentication, Documentation
preservation metadata
fixity
the quality of not being altered or deleted
threatened by insecure storage and media degradation
metadata supporting fixity
a message digest (checksum)
the algorithm used to generate it
when it was last calculated
who did the calculation
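The four items above can be captured in one small record. The following is a minimal sketch, assuming SHA-256 as the algorithm and a placeholder agent name; the function names, field names, and the agent value are hypothetical and not taken from any standard.

```python
import hashlib
from datetime import datetime, timezone


def fixity_record(path, algorithm="sha256", agent="example-repository"):
    """Compute a message digest and return it with its supporting metadata."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return {
        "message_digest": digest.hexdigest(),                     # the checksum
        "algorithm": algorithm,                                   # how it was generated
        "calculated_at": datetime.now(timezone.utc).isoformat(),  # when
        "calculated_by": agent,                                   # who (hypothetical name)
    }


def verify(path, record):
    """Recompute the digest with the recorded algorithm and compare."""
    fresh = fixity_record(path, record["algorithm"], record["calculated_by"])
    return fresh["message_digest"] == record["message_digest"]
```

Recording the algorithm next to the digest is what makes a later check possible: `verify` can only repeat the calculation if it knows how the stored value was produced.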
viability
the quality of being readable from media
threatened by media degradation and media obsolescence
metadata supporting viability
the type of medium used to store the object
the age of the specific unit
the date the object was written to the unit
performance metrics for the medium (MTTF)
usage metrics for the unit
renderability
the quality of being displayable, playable, or otherwise usable
threatened by format obsolescence
authenticity
the object is what it purports to be; both the source and the content are verifiable
threatened by unknown provenance, undocumented alterations
metadata supporting authenticity
the source of the object
a history of the custody of the object
a record of any changes to the object
a digital signature (maybe)
OAIS
OAIS Information Model
representation information
the information that is needed to make a Content Data Object understandable to a Designated Community
Structural:
  the format is biff8
  column 1 is a date yyyy-mm-dd, column 2 is a decimal
Semantic:
  this is a daily business log for XYZ Corp.
  col. 1 is the date of business, col. 2 is gross take in Euros
representation information
may be recursive
Structural:
  the format is biff8
    format specification for biff8 (in PDF)
      format specification for PDF
  rules for rendering as a spreadsheet
  column 1 is a date yyyy-mm-dd, column 2 is a decimal
Semantic:
  this is a daily business log for XYZ Corp.
  col. 1 is the date of business, col. 2 is gross take in Euros
    currency equivalence chart
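The recursion can be pictured as a tree in which each piece of representation information may itself need further representation information. The sketch below is illustrative only: the `RepInfo` class and `walk` function are hypothetical names, and the nesting of the biff8 and PDF specifications is an assumption about how the example chains together.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class RepInfo:
    kind: str          # "structural" or "semantic"
    description: str
    # Representation information needed to understand this item in turn.
    needs: List["RepInfo"] = field(default_factory=list)


def walk(info: RepInfo, depth: int = 0) -> Iterator[Tuple[int, str]]:
    """Yield (depth, description) pairs, depth-first."""
    yield depth, info.description
    for child in info.needs:
        yield from walk(child, depth + 1)


# Assumed chain: the biff8 specification is a PDF, so PDF must be understood too.
pdf_spec = RepInfo("structural", "format specification for PDF")
biff8_spec = RepInfo("structural", "format specification for biff8 (in PDF)", [pdf_spec])
structural = RepInfo("structural", "the format is biff8", [biff8_spec])

for depth, text in walk(structural):
    print("  " * depth + text)
```

In practice a repository has to decide where to stop following the chain; the sketch simply stops where the slide's example does.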
preservation descriptive information
The information necessary to preserve the Content Information
reference = identifier(s)
context = relation to other Content Information
provenance = history of creation, modification, custody
fixity = checksums and similar mechanisms
packaging information
the information which, either actually or logically, binds, identifies and relates the Content Information and Preservation Descriptive Information
Standards
metadata standards for preservation
General preservation metadata standards
  PREMIS (Preservation Metadata: Implementation Strategies)
  LMER (Long-term Preservation Metadata for Electronic Resources)
Format-specific technical metadata
  Z39.87 NISO/AIIM Technical metadata for digital still images
  AES-X098B core audio metadata
Packaging standards
  METS (Metadata Encoding and Transmission Standard)
  MPEG-21 Digital Item Declaration Language
general standards
PREMIS (Preservation Metadata: Implementation Strategies)
LMER (Long-term Preservation Metadata for Electronic Resources)
PREMIS
an implementable core set of preservation metadata
defines preservation metadata as “the information a repository uses to support the digital preservation process”
defines core as what most repositories need to know most of the time
but what is implementable?
Implementable preservation metadata ...
is precisely defined
can be automatically supplied
can be automatically processed
  e.g. prefer coded values from authority lists
is implementation independent
is based on a rigorous data model
PREMIS Data Model
Intellectual entity
Set of content that is considered a single intellectual unit for purposes of management and description (e.g., a book, a photograph, a map, a database)
May include other Intellectual Entities (e.g. a website that includes a web page)
Has one or more digital representations
Not described in PREMIS; use descriptive metadata
Examples:
  Planets Newsletter, Issue 3
  “Identical twins” by Diane Arbus (a photograph)
  Digital Curation Centre website
Object
What the repository actually preserves
Three types of object:
FILE: named and ordered sequence of bytes that is known by an operating system
REPRESENTATION: set of files, including structural metadata, that, taken together, constitute a complete rendering of an Intellectual Entity
BITSTREAM: data within a file with properties relevant for preservation purposes (but needs additional structure or reformatting to be a stand-alone file)
Example: An IE with two representations
Intellectual Entity: “My dog Ace”
Representation 1: TIFF version
Representation 2: JPEG2000 version
File 1: dog.TIFF
File 2: dog.JP2
Bitstream 1: Embedded metadata
Example 2: Another IE with 2 representations
Intellectual Entity: Da Vinci Code by Dan Brown
Representation 1: Page image version
  File 1: page1.tiff
  File 2: page2.tiff
  File N: pageN.tiff
  File N+1: METS.xml
Representation 2: ebook version
  File 1: book.lit
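The containment in these two examples (an Intellectual Entity has Representations, a Representation is a set of Files, a File may contain Bitstreams) can be sketched as nested records. This is only an illustration with hypothetical class names, not the PREMIS schema; attaching the embedded-metadata bitstream to dog.TIFF is an assumption, since the slide does not say which file it belongs to.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Bitstream:
    name: str


@dataclass
class File:
    name: str
    bitstreams: List[Bitstream] = field(default_factory=list)


@dataclass
class Representation:
    label: str
    files: List[File] = field(default_factory=list)


@dataclass
class IntellectualEntity:
    title: str
    representations: List[Representation] = field(default_factory=list)

    def all_files(self) -> List[str]:
        """List every file name across all representations."""
        return [f.name for r in self.representations for f in r.files]


ace = IntellectualEntity(
    "My dog Ace",
    [
        Representation(
            "TIFF version",
            # Assumption: the embedded metadata lives in the TIFF file.
            [File("dog.TIFF", [Bitstream("Embedded metadata")])],
        ),
        Representation("JPEG2000 version", [File("dog.JP2")]),
    ],
)
print(ace.all_files())
```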
Event
An action that involves or impacts at least one Object or Agent associated with or known by the preservation repository
Helps document digital provenance. Can track the history of an Object through the chain of Events that occur during the Object's lifecycle
Examples:
  Validation Event: verify that chapter1.pdf is a valid PDF file
  Ingest Event: transform an OAIS SIP into an AIP (one Event or multiple Events?)
  Migration Event: create a new version of a file in a more current format
Agent
Person, organization, software program associated with an Event or a Right
Not defined in detail in PREMIS
Examples:
  Seamus Ross (a person)
  British Library (an organization)
  DAITSS (a system)
  dioscuri (a software program)
Rights
Rights statement describes one or more rights or permissions granted to the repository
  What is the basis for claiming the right? Statute, copyright, or license
  What can the repository do?
Examples:
  because copyright status is public domain, repository can give unrestricted access, make copies and make derivative works
  because of license terms, repository can make up to 10 copies
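The two questions a rights statement answers (on what basis, and what may the repository do) can be sketched as a small record. The class, field names, and act names below are hypothetical and chosen only to mirror the two examples; they are not PREMIS element names.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class RightsStatement:
    basis: str  # "statute", "copyright", or "license"
    # Maps an act to the number of times it may be performed;
    # None means no limit is recorded.
    permitted_acts: Dict[str, Optional[int]] = field(default_factory=dict)

    def allows(self, act: str, count: int = 1) -> bool:
        """Return True if the act is permitted at the requested count."""
        if act not in self.permitted_acts:
            return False
        limit = self.permitted_acts[act]
        return limit is None or count <= limit


# Example 1: public domain, so access, copying and derivatives are unrestricted.
public_domain = RightsStatement(
    "copyright", {"access": None, "copy": None, "derive": None}
)
# Example 2: a license that permits up to 10 copies and nothing else.
licensed = RightsStatement("license", {"copy": 10})

print(public_domain.allows("derive"))   # permitted
print(licensed.allows("copy", 10))      # within the limit
print(licensed.allows("copy", 11))      # over the limit
```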
some things we say about Objects
object identifier
general technical characteristics, e.g.
  size, format, fixity, inhibitors, creating application
  composition level
format specific technical characteristics (use extension)
original name
storage environment
digital signature
relationships to other objects
relationships to agents, events, and rights statements
significant properties
significant properties
the characteristics of digital objects which must be preserved over time in order to ensure the continued accessibility, usability, and meaning of the objects, and their capacity to be accepted as evidence of what they purport to record. (Andrew Wilson)
the characteristics of a particular object subjectively determined to be important to maintain through preservation actions
how could you preserve this apple?
significant properties
performance model: a source file is interpreted through a process to create a performance; in other words, the object is meaningful only as it is perceived
often faceted as content, context, appearance (rendering), structure, and behavior
InSPECT (Investigating the Significant Properties of Electronic Content over Time)
can apply to all objects of a given format, or individual objects
may be in the eye of the beholder
some things we say about Events
event identifier
event type
date and time
detail
outcome information
agents and their roles
objects and their roles
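The items above map naturally onto a single event record, and a chain of such records gives the history of an object. The following is a minimal sketch with hypothetical field and function names; using a UUID as the event identifier and ISO 8601 UTC timestamps are assumptions made for the illustration.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()


@dataclass
class Event:
    event_type: str            # e.g. "validation", "ingest", "migration"
    detail: str
    outcome: str
    agents: Dict[str, str]     # agent name -> role
    objects: Dict[str, str]    # object identifier -> role
    date_time: str = field(default_factory=_now)
    identifier: str = field(default_factory=lambda: str(uuid.uuid4()))


def history(events: List[Event], object_id: str) -> List[Event]:
    """Return the events that involve an object, oldest first."""
    involved = [e for e in events if object_id in e.objects]
    return sorted(involved, key=lambda e: e.date_time)


events = [
    Event("migration", "create a new version in a more current format", "success",
          {"DAITSS": "executing program"}, {"chapter1.pdf": "source"},
          date_time="2008-06-02T10:00:00+00:00"),
    Event("validation", "verify that chapter1.pdf is a valid PDF file", "valid",
          {"DAITSS": "executing program"}, {"chapter1.pdf": "target"},
          date_time="2008-06-01T09:00:00+00:00"),
]
print([e.event_type for e in history(events, "chapter1.pdf")])
```

The timestamps and roles in the usage example are invented for the illustration.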
Sample Data Dictionary entry
Semantic unit: size
Semantic components: None
Definition: The size in bytes of the file or bitstream stored in the repository.
Rationale: Size is useful for ensuring the correct number of bytes from storage have been retrieved and that an application has enough room to move or process files. It might also be used when billing for storage.
Data constraint: Integer

Object category   Representation    File              Bitstream
Applicability     Not applicable    Applicable        Applicable
Examples                            2038927
Repeatability                       Not repeatable    Not repeatable
Obligation                          Optional          Optional

Creation/Maintenance notes: Automatically obtained by the repository.
Usage notes: Defining this semantic unit as size in bytes makes it unnecessary to record a unit of measurement. However, for the purpose of data exchange the unit of measurement should be stated or understood by both partners.
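As a small illustration of "automatically obtained by the repository" and of the retrieval check mentioned in the rationale, the sketch below reads the size from the file system and compares it with a recorded value. The function names are hypothetical.

```python
import os


def size_in_bytes(path: str) -> int:
    """Obtain the size automatically from the file system, in bytes."""
    return os.path.getsize(path)


def retrieved_completely(path: str, recorded_size: int) -> bool:
    """Check that the number of bytes retrieved matches the recorded size."""
    return size_in_bytes(path) == recorded_size
```

A size match is only a quick sanity check; it does not replace a message digest, since a file can change without changing length.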
PREMIS Maintenance Activity
LMER
Authored by Die Deutsche Bibliothek, used in kopal
Explicitly for exchange
Based on the National Library of New Zealand’s data model
a quick aside on archives
format specific technical metadata
What kinds of properties are format-specific?
  number of tracks
  character set
  height, width
  color space
  fonts
format specific metadata “standards”
NISO/AIIM Z39.87-2006, Data Dictionary - Technical metadata for digital still images
AES-X098B, Core audio metadata XML definition (draft)
textMD (now maintained by Library of Congress)
JHOVE and metadata extraction tools
Z39.87
Revised second edition
Defers to PREMIS where elements overlap
XML binding is MIX, maintained by Library of Congress
issues with format-specific metadata
how much of it is useful for preservation?
what would you use it for?
if you can extract it from a file header, do you need to extract it from the file header?
what to do when a schema for format-specific metadata also defines general technical metadata?
what is the proper role of registries?
packaging standards
METS (Metadata Encoding and Transmission Standard)
MPEG-21 Digital Item Declaration Language
IMS Global Learning Consortium Content Packaging Standards
Sharable Content Object Reference Model (SCORM)
CCSDS XML Packaging scheme
METS
structure of a METS document
amdSec can include source, provenance, rights, and technical metadata
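To make the structure concrete, here is a rough skeleton of a METS document built with Python's standard library. It is a sketch only: it has not been validated against the METS schema, the ID values are invented, the `mdWrap` elements are left empty where real metadata (for example PREMIS) would be embedded, and the helper names `q` and `build_mets_skeleton` are hypothetical.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets_skeleton() -> ET.Element:
    mets = ET.Element(q("mets"))

    # Administrative metadata: technical, rights, source and provenance.
    amd = ET.SubElement(mets, q("amdSec"), ID="AMD1")
    for tag in ("techMD", "rightsMD", "sourceMD", "digiprovMD"):
        section = ET.SubElement(amd, q(tag), ID=f"{tag}-1")
        # Placeholder wrapper; real metadata would be embedded here.
        ET.SubElement(section, q("mdWrap"), MDTYPE="OTHER")

    # File inventory, pointing back at the administrative metadata.
    file_sec = ET.SubElement(mets, q("fileSec"))
    group = ET.SubElement(file_sec, q("fileGrp"))
    ET.SubElement(group, q("file"), ID="FILE1", ADMID="AMD1")

    # Structural map tying the file into the object's structure.
    struct_map = ET.SubElement(mets, q("structMap"))
    div = ET.SubElement(struct_map, q("div"))
    ET.SubElement(div, q("fptr"), FILEID="FILE1")
    return mets


print(ET.tostring(build_mets_skeleton(), encoding="unicode"))
```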
Issues, problems and Interesting things
does preservation metadata actually work?
how best to store in working repositories?
role of centralized registries
what can be automated
best practices for interoperability
references
Priscilla Caplan, Preservation Metadata (DCC Digital Curation Manual) http://www.dcc.ac.uk/resource/curation-manual/chapters/preservation-metadata/
Brian Lavoie, Technology Watch Report: Preservation Metadata http://www.dpconline.org/docs/reports/dpctw05-01.pdf
PREMIS Maintenance Activity http://www.loc.gov/standards/premis/
METS Maintenance Activity http://www.loc.gov/standards/mets/
Creative Commons Licence
This work is licensed under the Creative Commons Attribution 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.