NISO Webinar: Metadata for Preservation: A Digital Object's Best Friend

Post on 11-Jun-2015


DESCRIPTION

Over the past decade, as the scholarly community’s reliance on e-content has increased, so too has the development of preservation-related digital repositories. The need for descriptive, administrative, and structural metadata for each digital object in a preservation repository was clearly recognized by digital archivists and curators. However, in the early 2000s, most of the published specifications for preservation-related metadata were either implementation specific or broadly theoretical. In 2003, the Online Computer Library Center (OCLC) and Research Libraries Group (RLG) established an international working group called PREMIS (Preservation Metadata: Implementation Strategies) to develop a common core set of metadata elements for digital preservation. The first version of the PREMIS Data Dictionary for Preservation Metadata and its supporting XML schema were issued in 2005. Experience using its specifications in preservation repositories has led to several revisions, with the completion of a version 2.0 in 2008. The Data Dictionary is now in version 2.2 (July 2012), and it is widely implemented in preservation repositories throughout the world in multiple domains.

TRANSCRIPT

NISO Webinar: Metadata for Preservation: A Digital Object's Best Friend

February 13, 2013

Speakers: Rebecca Guenther, Amy Kirchhoff

http://www.niso.org/news/events/2013/webinars/preservation

Metadata for Preservation: A Digital Object’s Best Friend

Introduction to Preservation Metadata

Rebecca Squire Guenther

Library of Congress, NDMSO and Consultant, meetyourdata.com

rguenther52@gmail.com

NISO Webinar, Feb. 13, 2013

Digital preservation: imperative and challenge

More and more of scholarly and cultural record exists in digital form; steps must be taken to secure its long-term future

Groups such as Digital Preservation Coalition, NDIIPP and National Digital Stewardship Alliance have made significant progress in raising awareness about digital preservation imperative

Gradual shift in focus from articulating problem to solving it …
• Not so much “Why is digital preservation important” anymore; rather, “What must be done to achieve preservation objectives?”

Many practical challenges in implementing reliable, sustainable digital preservation programs

One key challenge: preservation metadata

Metadata and preservation metadata

METADATA: “Structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource”

PRESERVATION METADATA: “Metadata that supports and documents the digital preservation process”

Preservation metadata includes:

Provenance:
• Who has had custody/ownership of the digital object?

Authenticity:
• Is the digital object what it purports to be?

Preservation Activity:
• What has been done to preserve it?

Technical Environment:
• What is needed to render and use it?

Rights Management:
• What IPR must be observed?

Makes digital objects self-documenting across time

[Figure: Content + Preservation Metadata: 10 years on, 50 years on, forever!]

Basics of preservation metadata

Digital preservation concentrates on well-designed formal systems based on digital library and trusted digital repository concepts

Information about what needs to be preserved and how is part of any preservation system

Since items aren’t on shelves, metadata is the only mechanism for actually keeping or finding anything

3 concepts are important
• Metadata about preservation of digital objects
• Preservation of metadata itself to ensure that content and metadata are preserved
• Use of metadata in a trusted digital repository

PREMIS Data Dictionary

May 2005: Data Dictionary for Preservation Metadata: Final Report of the PREMIS Working Group
• Version 2.0 (April 2008)
• Version 2.1 (January 2011)
• Version 2.2 (July 2012)
• Version 3.0 expected 2013

Includes:
• Data Dictionary
• Context/assumptions
• Data model
• Usage examples
• Conformance
• XML schema to support implementation

Data Dictionary:
• Core set of implementable, broadly applicable preservation metadata semantic units, supported by guidelines and recommendations for management and use

What does PREMIS cover?

Administrative metadata that supports the digital preservation process

Provides information to help manage a resource for preservation purposes
• Technical characteristics
• Information about actions on an object
• Relationships (structural and derivative)
  • Structural: indicates how compound objects are put together
  • Derivative: results of common preservation actions
• Rights metadata associated with preservation

In OAIS terms:
• Metadata as part of SIP, AIP or DIP
• Fits into Preservation Description Information (Reference, Context, Provenance, Fixity)

What PREMIS is and is not

What PREMIS is:
• Common data model for organizing/thinking about preservation metadata
• A checklist for core metadata in a repository
• Guidance for local implementations
• Standard for exchanging information packages between repositories

What PREMIS is not:
• Out-of-the-box solution: need to instantiate as metadata elements in repository system
• All needed metadata: excludes business rules, format-specific technical metadata, descriptive metadata for access, non-core preservation metadata
• Lifecycle management of objects outside repository
• Rights management: limited to permissions regarding actions taken within repository

PREMIS Data Model

Intellectual Entities

Objects

Rights Statements

Agents

Events

Intellectual Entities

Examples:
• Rabbit Run by John Updike (a book)
• “Maggie at the beach” (a photograph)
• The Library of Congress Website (a website)
• The Library of Congress: American Memory Home page (a web page)

Set of content that is considered a single intellectual unit for purposes of management and description (e.g., a book, a photograph, a map, a database)

May include other Intellectual Entities (e.g. a website that includes a web page)

**Has one or more digital representations**

Previously not fully described in PREMIS DD, but will be in scope in version 3.0

Objects

Examples:
• chapter1.pdf (a file)
• chapter1.pdf + chapter2.pdf + chapter3.pdf (representation of a book w/3 chapters)
• TIFF file containing header and 2 images (2 bitstreams (images), each with own set of properties (semantic units): e.g., identifiers, technical metadata, inhibitors, … )

Discrete unit of information in digital form

**Objects are what repository actually preserves**

Three types of Object:
• FILE: named and ordered sequence of bytes that is known by an operating system
• REPRESENTATION: set of files, including structural metadata, that, taken together, constitute a complete rendering of an Intellectual Entity
• BITSTREAM: data within a file with properties relevant for preservation purposes (but needs additional structure or reformatting to be stand-alone file)

Intellectual entity will become another level of object

Object Example: book in two versions

Intellectual Entity: Da Vinci Code by Dan Brown

Representation 1: Page image version
• File 1: page1.tiff
• File 2: page2.tiff
• File N: pageN.tiff
• File N+1: METS.xml

Representation 2: ebook version
• File 1: book.lit
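The book example can be sketched as a small data structure. This is only an illustration, not part of PREMIS or of any PREMIS tool: the class names and fields (`IntellectualEntity`, `Representation`, `File`, `all_files`) are assumptions chosen to mirror the slide, the page count of 3 stands in for the unspecified N, and placing METS.xml in the page image representation is inferred from its "File N+1" numbering.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class File:
    """A named, ordered sequence of bytes known to an operating system."""
    name: str


@dataclass
class Representation:
    """A set of files that together render one Intellectual Entity."""
    label: str
    files: List[File] = field(default_factory=list)


@dataclass
class IntellectualEntity:
    """A single intellectual unit, such as a book."""
    title: str
    representations: List[Representation] = field(default_factory=list)

    def all_files(self) -> List[str]:
        # Flatten every representation into one list of file names.
        return [f.name for r in self.representations for f in r.files]


def build_example(page_count: int = 3) -> IntellectualEntity:
    # page_count is a stand-in for the slide's unspecified N.
    pages = [File(f"page{i}.tiff") for i in range(1, page_count + 1)]
    page_images = Representation("Page image version", pages + [File("METS.xml")])
    ebook = Representation("ebook version", [File("book.lit")])
    return IntellectualEntity("Da Vinci Code by Dan Brown", [page_images, ebook])


book = build_example()
print(book.all_files())
```

One intellectual entity carries two representations here, and the repository would preserve the files inside them, which matches the slide's point that Objects are what the repository actually preserves.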

Events

Examples:
• Validation Event: use JHOVE tool to verify that chapter1.pdf is a valid PDF file
• Ingest Event: transform an OAIS SIP into an AIP
• Migration Event: create a new version of an Object in an up-to-date format

An action that involves or impacts at least one Object or Agent associated with or known by the preservation repository

Helps document digital provenance. Can track history of Object through the chain of Events that occur during the Object's lifecycle

Determining which Events should be recorded, and at what level of granularity, is up to the repository
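As an illustration of how the validation example might be recorded, the sketch below builds a PREMIS-style event element with Python's standard `xml.etree.ElementTree`. It is a hedged sketch, not output from any PREMIS tool: the namespace is assumed to be that of the version 2 schema, the `local` identifier type and all identifier values are made up, and the result has not been validated against the schema.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of the PREMIS version 2 XML schema; check the schema
# you actually validate against.
PREMIS_NS = "info:lc/xmlschema/premis-v2"


def build_event(event_id, event_type, date_time, outcome, object_id, agent_id):
    """Return a PREMIS-style <event> element for one preservation action."""
    def sub(parent, tag, text=None):
        el = ET.SubElement(parent, f"{{{PREMIS_NS}}}{tag}")
        if text is not None:
            el.text = text
        return el

    event = ET.Element(f"{{{PREMIS_NS}}}event")
    ident = sub(event, "eventIdentifier")
    sub(ident, "eventIdentifierType", "local")   # "local" is a made-up type
    sub(ident, "eventIdentifierValue", event_id)
    sub(event, "eventType", event_type)
    sub(event, "eventDateTime", date_time)
    outcome_info = sub(event, "eventOutcomeInformation")
    sub(outcome_info, "eventOutcome", outcome)
    agent = sub(event, "linkingAgentIdentifier")
    sub(agent, "linkingAgentIdentifierType", "local")
    sub(agent, "linkingAgentIdentifierValue", agent_id)
    obj = sub(event, "linkingObjectIdentifier")
    sub(obj, "linkingObjectIdentifierType", "local")
    sub(obj, "linkingObjectIdentifierValue", object_id)
    return event


ET.register_namespace("premis", PREMIS_NS)
event = build_event("event-001", "validation", "2013-02-13T13:00:00",
                    "valid", "chapter1.pdf", "JHOVE version 1.0")
print(ET.tostring(event, encoding="unicode"))
```

The event links to both the Object it acted on and the Agent that performed it, so the Agent reaches the Object only through the Event.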

Agents

Examples:
• Martha Anderson (a person)
• Library of Congress (an organization)
• Dark Archive in the Sunshine State implementation (a system)
• JHOVE version 1.0 (a software program)

Person, organization, or software program/system associated with an Event or a Right (permission statement)

Agents are associated only indirectly to Objects through Events or Rights

Not defined in detail in PREMIS DD; not considered core preservation metadata beyond identification

Rights Statements

Example:
• Priscilla Caplan grants FCLA digital repository permission to make three copies of metadata_fundamentals.pdf for preservation purposes.

An agreement with a rights holder that grants permission for the repository to undertake an action(s) associated with an Object(s) in the repository.

Not a full rights expression language; focuses exclusively on permissions that take the form:
• Agent X grants Permission Y to the repository in regard to Object Z.
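The "Agent X grants Permission Y in regard to Object Z" form maps naturally onto a three-field record. The sketch below is illustrative only: `PermissionStatement` and `is_permitted` are hypothetical names, not PREMIS semantic units, and the free-text permission string is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PermissionStatement:
    agent: str       # X: the rights holder granting the permission
    permission: str  # Y: the act the repository may perform
    obj: str         # Z: the Object the permission applies to

    def describe(self) -> str:
        return (f"{self.agent} grants permission to {self.permission} "
                f"to the repository in regard to {self.obj}.")


def is_permitted(statements, act, obj):
    """True if any statement grants `act` on `obj`."""
    return any(s.permission == act and s.obj == obj for s in statements)


grants = [PermissionStatement("Priscilla Caplan", "make three copies",
                              "metadata_fundamentals.pdf")]
print(grants[0].describe())
print(is_permitted(grants, "make three copies", "metadata_fundamentals.pdf"))
print(is_permitted(grants, "migrate", "metadata_fundamentals.pdf"))
```

Anything not explicitly granted comes back as not permitted, which reflects the narrow, permission-only scope described above.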

Technical metadata pertaining to objects

Object identifier
Preservation level
Significant characteristics
Object characteristics
• fixity
• format
• size
• creating application
• inhibitors
• object characteristics extension
Creating application
Original name
Storage
Environment
• software
• hardware
Digital signatures
Relationships
Linking event identifier
Linking permission statement identifier
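Fixity and size are the object characteristics most easily computed by a repository. The sketch below shows one way to record them and to recheck fixity later, using only the Python standard library. The function names, the dictionary layout, and the choice of SHA-256 are assumptions for illustration; PREMIS does not prescribe a digest algorithm.

```python
import hashlib
import os
import tempfile


def object_characteristics(path, algorithm="sha256", chunk_size=65536):
    """Compute a fixity digest and the size in bytes for one file."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return {
        "fixity": {"algorithm": algorithm, "digest": digest.hexdigest()},
        "size": os.path.getsize(path),
    }


def fixity_check(path, recorded):
    """Recompute the digest and compare it with the recorded value."""
    current = object_characteristics(path, recorded["fixity"]["algorithm"])
    return current["fixity"]["digest"] == recorded["fixity"]["digest"]


# Demonstration on a throwaway file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"chapter one")
    demo_path = tmp.name

recorded = object_characteristics(demo_path)
unchanged = fixity_check(demo_path, recorded)    # digest still matches
with open(demo_path, "ab") as handle:
    handle.write(b"!")                            # simulate corruption
changed = not fixity_check(demo_path, recorded)  # digest no longer matches
os.remove(demo_path)
print(recorded["size"], unchanged, changed)
```

A failed check of this kind is the sort of outcome a repository might record as a fixity check Event.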

Semantic units pertaining to Events: provenance and preservation activity

Event identifier
Event type (e.g. capture, creation, validation, migration, fixity check)
Event dateTime
Event detail
Event outcome
Event outcome detail
Linking agent identifier
Linking object identifier

Semantic units pertaining to Rights

• Rights Statement
  • Rights Statement Identifier
  • Rights Basis
  • Copyright Information
  • License Information
  • Statute Information
  • Other Rights Information
  • Rights Granted
    • act
    • restriction
    • termOfGrant
    • rightsGranted
  • Linking Object Identifier
  • Linking Agent Identifier
• rightsExtension

Semantic units pertaining to Agents

Agent Identifier
Agent Name
Agent Type
Agent Note
Agent Extension
Linking Event Identifier
Linking Rights Identifier

The State of PREMIS

de facto standard for preservation metadata; in some countries mandated for cultural heritage repositories

Was recognized by winning the Digital Preservation Award (2005) and was shortlisted for DPC Decennial award for outstanding contribution to digital preservation (2012)

PREMIS implementations are appearing in many places, many contexts, many forms

Experimentation has led to changes in the data dictionary and schema

PREMIS Implementation fairs: attempts to consolidate implementation experiences, issues, best practices

Key features of PREMIS

Developed through international consensus-making process
• Mobilized community to address shared need
• Shared solution to a shared need

Implementation neutral
• Makes no assumptions about technology
• Can be flexibly adapted for use across all sorts of institutions, digital preservation contexts, repository systems
• Allows for extensibility

Supported by Maintenance Activity and Editorial Committee, under auspices of US Library of Congress
• PREMIS is sustained, maintained, and evolved

Extensive outreach to implementer community
• Tutorials, guides, implementation fairs, PIG Forum
• “Support system” in place for PREMIS implementers

PREMIS Maintenance Activity

Web site:
• Permanent Web presence, hosted by Library of Congress
• Central destination for PREMIS-related info, announcements, resources
• Home of the PREMIS Implementers’ Group (PIG) discussion list

PREMIS Editorial Committee:
• Set directions/priorities for PREMIS development
• Coordinate future revisions of Data Dictionary and XML schema
• Promote implementation

http://www.loc.gov/standards/premis/

Implementation resources

Tools:
• XML schema
• PREMIS-in-METS toolbox <http://pim.fcla.edu>
• Controlled vocabularies at http://id.loc.gov
• RDF/OWL ontology for use as Linked Data

Guidelines:
• PREMIS conformance statement
• PREMIS & METS guidelines

Community Working groups on special topics

Others:
• Understanding PREMIS (available in multiple languages)
• PIG Forum
• Implementation Registry
• Tools Registry

Some implementers …

DAITSS (Florida): a preservation repository for the use of the libraries of the public universities of Florida.

Ex Libris Rosetta: a commercial digital preservation system supporting acquisition, validation, ingest, storage, management, preservation and dissemination of different types of digital objects

National Digital Newspaper Program

Archivematica: comprehensive open-source digital preservation system

National Archives of Sweden, National Archives of Scotland

Carolina Digital Repository: repository for material in electronic formats produced by members of the University of North Carolina at Chapel Hill community.

British Library electronic journal archiving project

For more information see:
• http://www.loc.gov/premis/premis-registry.html

Impact

De facto international standard for preservation metadata
• Part of permanent infrastructure supporting digital preservation
• ISO standardization being considered

Wide applicability means benefits from PREMIS extend to entire digital preservation community

Ongoing work to revise/update Data Dictionary and create new supporting resources
• PREMIS is a dynamic resource that continues to generate new sources of value to implementer community

Stood the test of time:
• Seven years after initial release, is now indispensable part of digital preservation implementations around the world
• Not surpassed or replaced by other standard or resource

URLs, etc.

PREMIS Maintenance Activity: http://www.loc.gov/standards/premis/

PREMIS Data Dictionary for Preservation Metadata: http://www.loc.gov/standards/premis/v2/premis-2-2.pdf

Understanding PREMIS: http://www.loc.gov/standards/premis/understanding-premis.pdf

PREMIS Implementation Registry: http://www.loc.gov/standards/premis/premis-registry.php

PREMIS Implementers Group list: http://listserv.loc.gov/listarch/pig.html

Metadata for Preservation: A digital object’s best friend

Implementation!

Amy Kirchhoff

Archive Service Product Manager

Standards

framework for thinking

interchange specification

[The PREMIS documentation has an] emphasis on the need to know rather than the need to record or represent in any particular way.

Content Type

Content Set(s)

Archival Unit(s)

Content Unit(s)

Functional Unit(s)

Storage Unit(s)

Intellectual Entities

Objects

Rights Statements

Agents

Events

Digital preservation is the series of management policies and activities necessary to ensure the enduring usability, authenticity, discoverability and accessibility of content over the very long term.


Dublin Core

DIDL (from MPEG-21)

METS

OAIS

Experience

1. Content model

2. Metadata elements

3. Registries

Intellectual Entities

Objects

Rights Statements

Agents

Events

Identifiers

Books
Journals
Digitized Newspapers
Digitized Documents
Supplied Files
Archive Management Documents

Content Type

Content Set(s)

Archival Unit(s)

Content Unit(s)

Functional Unit(s)

Storage Unit(s)

Descriptive Metadata

Technical Metadata

Events Metadata

PMD

a thing of beauty

[Diagram: PREMIS data model (Intellectual Entities, Objects, Rights Statements, Agents, Events)]

Semantic Units for Objects

1.1 objectIdentifier
1.2 objectCategory
1.3 preservationLevel
1.4 significantProperties
1.5 objectCharacteristics
1.6 originalName
1.7 storage
1.8 environment
1.9 signatureInformation
1.10 relationship
1.11 linkingEventIdentifier
1.12 linkingIntellectualEntityIdentifier
1.13 linkingRightsStatementIdentifier
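Using the Data Dictionary as a checklist can be as simple as comparing a record against a list of semantic unit names. The sketch below uses the thirteen top-level Object units listed above. The record layout, the `missing_units` helper, and the four-unit `local_policy` are hypothetical; which units a repository requires is a local decision and is not asserted here.

```python
OBJECT_SEMANTIC_UNITS = [
    "objectIdentifier", "objectCategory", "preservationLevel",
    "significantProperties", "objectCharacteristics", "originalName",
    "storage", "environment", "signatureInformation", "relationship",
    "linkingEventIdentifier", "linkingIntellectualEntityIdentifier",
    "linkingRightsStatementIdentifier",
]


def missing_units(record, wanted=OBJECT_SEMANTIC_UNITS):
    """List the semantic units in `wanted` that `record` does not supply."""
    return [unit for unit in wanted if not record.get(unit)]


record = {
    "objectIdentifier": "obj-0001",
    "objectCategory": "file",
    "objectCharacteristics": {"size": 1024},
    "originalName": "chapter1.pdf",
}

# Hypothetical local policy: this repository insists on these four units.
local_policy = ["objectIdentifier", "objectCategory",
                "objectCharacteristics", "storage"]
print(missing_units(record, local_policy))
```

Running the same check with the full list shows how much of the dictionary a given record leaves unpopulated, which can help when deciding what to collect at ingest.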

Registries

[Diagram: PREMIS data model (Intellectual Entities, Objects, Rights Statements, Agents, Events)]

Semantic Units for Events

2.1 eventIdentifier
2.2 eventType
2.3 eventDateTime
2.4 eventDetail
2.5 eventOutcomeInformation
2.6 linkingAgentIdentifier
2.7 linkingObjectIdentifier
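Because each Event carries a date and a link to an Object, an Object's history can be recovered by filtering and sorting. The sketch below assumes events are held as plain dictionaries keyed by the semantic unit names above; the sample values and the `history` helper are made up for illustration.

```python
from datetime import datetime

events = [
    {"eventIdentifier": "e2", "eventType": "validation",
     "eventDateTime": "2013-02-13T10:05:00",
     "linkingObjectIdentifier": "chapter1.pdf"},
    {"eventIdentifier": "e1", "eventType": "creation",
     "eventDateTime": "2013-02-13T10:00:00",
     "linkingObjectIdentifier": "chapter1.pdf"},
    {"eventIdentifier": "e3", "eventType": "migration",
     "eventDateTime": "2014-06-01T09:30:00",
     "linkingObjectIdentifier": "chapter1.pdf"},
    {"eventIdentifier": "e4", "eventType": "validation",
     "eventDateTime": "2013-03-01T12:00:00",
     "linkingObjectIdentifier": "chapter2.pdf"},
]


def history(all_events, object_id):
    """Return the event types recorded for one Object, oldest first."""
    linked = [e for e in all_events
              if e["linkingObjectIdentifier"] == object_id]
    linked.sort(key=lambda e: datetime.fromisoformat(e["eventDateTime"]))
    return [e["eventType"] for e in linked]


print(history(events, "chapter1.pdf"))
```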

Processing Records

Event Sets

Events

Some Portico Events

Edit Descriptive Metadata

Check Descriptive Metadata

Generate Descriptive Metadata

Ingest Into Archive

Create File

Generate Technical Metadata

Set Preservation Level

Generate Fixity

Portico Event Elements

Timestamp
Rationale
InputList
ArgList
Output
ToolWrapper
Tool Component List
Outcome
OutcomeDetailList


Semantic Units for Agents

3.1 agentIdentifier
3.2 agentName
3.3 agentType
3.4 agentNote
3.5 agentExtension
3.6 linkingEventIdentifier
3.7 linkingRightsStatementIdentifier


Semantic Units for Rights

4.1 rightsStatement
4.1.1 rightsStatementIdentifier
4.1.2 rightsBasis
4.1.3 copyrightInformation
4.1.4 licenseInformation
4.1.5 statuteInformation
4.1.6 otherRightsInformation
4.1.7 rightsGranted
4.1.8 linkingObjectIdentifier
4.1.9 linkingAgentIdentifier
4.2 rightsExtension

Easy

For Portico

For the moment


THANK YOU.

Amy Kirchhoff

amy.kirchhoff@ithaka.org

http://www.portico.org

NISO Webinar: Metadata for Preservation: A Digital Object's Best Friend

NISO Webinar • February 13, 2013

Questions?

All questions will be posted with presenter answers on the NISO website following the webinar:

http://www.niso.org/news/events/2013/webinars/preservation

Thank you for joining us today. Please take a moment to fill out the brief online survey.

We look forward to hearing from you!

THANK YOU
