Besser--SfS-Hague 18/10/02 1
Building a Digital Future: Sustainable, Interoperable,
Accessible Repositories
Howard Besser
NYU Archiving and Preservation Program and Library Senior Scientist
http://www.gseis.ucla.edu/~howard
Besser--SfS-Hague 18/10/02 2
Sustainable, Interoperable, Accessible Repositories
_ Models for Digital Repositories
_ Importance of Metadata Standards & Philosophies
_ Discovery Metadata: The Dublin Core
_ Administrative and Structural Metadata: MOA2/METS
_ Actors Metadata
_ Longevity Metadata
_ Identification/Provenance
_ The 4/99 NISO/DLF Image Metadata Workshop
_ Various other Metadata
Besser--SfS-Hague 18/10/02 3
From Digital Collections to Digital Libraries, Museums, and Archives
_ No longer merely experiments
_ Adhere to our fields’ traditions (access, interoperability, sustainability, privacy, …)
_ Provide services
Besser--SfS-Hague 18/10/02 4
To respond to our needs for both Service & Traditions, we face the challenges of:
_ Access (discovery)
_ Sustainability (longevity)
_ Interoperability
Besser--SfS-Hague 18/10/02 5
Serious Longevity Problems
What we know from prior widespread digital file formats
_ Images separating from their metadata
_ Inaccessibility of software needed to view an image
_ Inability to even decode the file format of an image
Besser--SfS-Hague 18/10/02 6
Traditional Digital Repository Model
[Diagram: user; four DLs, each with its own search & presentation layer]
Besser--SfS-Hague 18/10/02 7
Ideal Digital Repository Model
[Diagram: user; one search & presentation layer; four DLs]
Besser--SfS-Hague 18/10/02 8
For Interoperability, Repositories Need Standards (as well as Sustainability & Access)
_ Descriptive Metadata for consistent description
_ Discovery Metadata for finding
_ Administrative Metadata for viewing and maintaining
_ Structural Metadata for navigation
_ ...
_ Terms & Conditions Metadata for controlling access
_ ...
Besser--SfS-Hague 18/10/02 9
Why are Standards and Metadata consensus important?
_ Managing digital files over time
_ Longevity
_ Interoperability
_ Veracity
_ Recording in a consistent manner
_ Will give vendors incentive to create applications that support this
Besser--SfS-Hague 18/10/02 10
Philosophical Metadata Decisions
_ Warwick vs MARC
_ Where to put the metadata
Besser--SfS-Hague 18/10/02 11
Containers and Packages of Metadata
Warwick, not MARC
_ modular
_ overlapping
_ extensible
_ community-based
_ designed for a networked world to aid commonality btwn communities while still providing full functionality within each community
Besser--SfS-Hague 18/10/02 12
Some different schemes where Metadata is kept
_ embedded within the object (TIFF headers)
_ encapsulated with image (MOA2/METS)
_ in a separate related DB maintained by same organization (OPAC)
_ in a separate DB maintained by a separate organization (Books in Print, ratings systems)
Besser--SfS-Hague 18/10/02 13
Discovery Metadata
_ Dublin Core - NISO Z39.85 (3/95)
_ CBIR (ongoing)
Besser--SfS-Hague 18/10/02 14
Dublin Core--further work
_ Warwick Framework
– metadata packages for extensible functions
– laid groundwork for RDF
_ Canberra Qualifiers
– refining the semantics of the element set to provide more precise info
– SUBELEMENT, SCHEME, LANG
_ Granularity
– no hierarchical relationships w/i a given DC record; only one record per discrete object (collection or item-level), and relationship field plus qualifier links them
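The granularity rule can be sketched as two flat records, one per discrete object, linked through a qualified Relation element. This is a minimal illustration only: the identifiers, titles, and the dictionary layout are invented, and the `IsPartOf` qualifier is used as one plausible choice of relationship qualifier.

```python
# Flat Dublin Core-style records: one per discrete object, no nesting.
# Identifiers, titles, and the dict layout are invented for illustration.
collection = {
    "Identifier": "coll-001",
    "Title": "Example photograph collection",
    "Type": "collection",
}

item = {
    "Identifier": "item-042",
    "Title": "Example photograph",
    "Type": "image",
    # The Relation element carries a qualifier naming the kind of link.
    "Relation": [{"qualifier": "IsPartOf", "value": "coll-001"}],
}

records = {r["Identifier"]: r for r in (collection, item)}


def parents_of(record, records):
    """Follow IsPartOf relations from one record to the record(s) it belongs to."""
    return [
        records[rel["value"]]
        for rel in record.get("Relation", [])
        if rel["qualifier"] == "IsPartOf" and rel["value"] in records
    ]


print([p["Title"] for p in parents_of(item, records)])
```

The hierarchy lives entirely in the links between records, so either record can be exchanged or indexed on its own.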
The Research Process and Functional Categories of Metadata
_ Discovery
_ Retrieval
_ Collation
_ Analysis
_ Re-presentation
Besser--SfS-Hague 18/10/02 16
Structural & Administrative Metadata
_ Making of America II (MOA2)
_ Metadata Encoding & Transmission Standard (METS)
Besser--SfS-Hague 18/10/02 17
MOA II Classes of Objects
_ Continuous Tone Photos
_ Photo Albums
_ Diaries, journals, letterpress books
_ Ledgers
_ Correspondence
Besser--SfS-Hague 18/10/02 18
MOA II Metadata
_ Administrative Metadata
– for enhancing resource management
_ Structural Metadata
– for reflecting internal hierarchies and relationships btwn parts
_ Raw/Seared/Cooked
Besser--SfS-Hague 18/10/02 20
MOA II Best practices
_ Use/Users/Collection: Benchmarking
_ Masters vs. Derivatives
_ Scanning
_ Administrative Metadata
_ Structural Metadata
Besser--SfS-Hague 18/10/02 21
Scanning Best Practices
_ Think about users (and potential users), uses, and type of material/collection
_ Scan at the highest quality that does not exceed the likely potential users/uses/material
_ Do not let today’s delivery limitations influence your scanning file sizes; understand the difference between digital masters and derivative files used for delivery
_ Many documents which appear to be bitonal actually are better represented with greyscale scans
_ Include color bar and ruler in the scan
_ Use objective measurements to determine scanner settings (do NOT attempt to make the image good on your particular monitor or use image processing to color correct)
_ Don’t use lossy compression
_ Store in a common (standardized) file format
_ Capture as much metadata as is reasonably possible (including metadata about the scanning process itself)
Besser--SfS-Hague 18/10/02 23
Administrative Metadata: to uniquely identify a digital resource and manage it over time
_ Information about where the various pieces/versions of the object reside
_ Information to view the digital object
_ Information about the scanning process
Besser--SfS-Hague 18/10/02 24
Structural Metadata: that which is relevant to presentation of the digital object to the user
_ metadata defining the “object”: a book, a diary, a photo album
_ metadata defining the “sub-objects”: pages (physical) or chapters and subheads (intellectual)
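As a rough sketch of what such structural metadata records, the classes below model an object with physical sub-objects (pages) and intellectual sub-objects (sections spanning page ranges). The class names, field names, and file names are hypothetical and are not taken from the MOA2 or METS schemas.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Page:
    """A physical sub-object: one scanned page image."""
    sequence: int
    image_file: str  # hypothetical file name


@dataclass
class Section:
    """An intellectual sub-object: a chapter or subhead spanning pages."""
    label: str
    first_page: int
    last_page: int


@dataclass
class DigitalObject:
    """The object itself: a book, a diary, a photo album."""
    object_type: str
    title: str
    pages: List[Page] = field(default_factory=list)
    sections: List[Section] = field(default_factory=list)

    def pages_in(self, label: str) -> List[Page]:
        """Return the pages a viewer would show for one section."""
        for s in self.sections:
            if s.label == label:
                return [p for p in self.pages
                        if s.first_page <= p.sequence <= s.last_page]
        return []


diary = DigitalObject(
    object_type="diary",
    title="Example diary",
    pages=[Page(n, f"page{n:03d}.tif") for n in range(1, 7)],
    sections=[Section("January", 1, 2), Section("February", 3, 6)],
)
print([p.image_file for p in diary.pages_in("February")])
```

Keeping the physical and intellectual views separate lets a viewer page through the scans in order or jump straight to a section.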
Besser--SfS-Hague 18/10/02 25
Other Types of Metadata
_ Actors Metadata
_ Longevity
_ Identification/Provenance
_ Rights Management
Besser--SfS-Hague 18/10/02 26
Reference Models for Digital Libraries: Actors and Roles
DELOS/NSF Working Group
http://www.delos-nsf.actorswg.cdlib.org/
Besser--SfS-Hague 18/10/02 27
NSF/DELOS Actors/Roles Project
_ Classes of Actors, including
– Persons
– Organizations
– Automata
_ Roles & implications
– Production
– Dissemination
– Management
– Use
Besser--SfS-Hague 18/10/02 28
Multimedia & Collaborative Authorship imply
_ Not only:
– Authors
– Editors
– Publishers
_ But also creators of
– Text
– Illustrations
– Composers
– Musicians...
Besser--SfS-Hague 18/10/02 29
And goes beyond conventional authors
_ Others that are part of digital library process
– Users
– Catalogers
– Reference librarians
_ Even other groups/entities
– Software agents
– Mediators
– Special rights holders...
Besser--SfS-Hague 18/10/02 30
Borbinha’s “naive tentative sketch” of the problem...
[Diagram labels: User, Registered, Anonymous, Librarian, Agent, Creator, Editor, Distributor, Preservation, Publication, Licensing, Acquisition, Registration, Dissemination, Search, Digital Library, Access]
Besser--SfS-Hague 18/10/02 31
Benefits for
_ Linking metadata to authority records
_ Rights management
_ Privacy protection
Besser--SfS-Hague 18/10/02 32
Deliverables
_ Workshop proceedings: proceedings with invited contributions and papers selected from a call, intended to be a reference source for the current state of the art.
_ White paper:
– Definition and introduction to the problem.
– Description and analysis of the requirements.
– A proposal to the community for a reference model, focusing on definitions of key concepts, terminology, classes of agents, services, relationships, etc.
– Proposals for an international agenda for further technical and collaborative developments.
Besser--SfS-Hague 18/10/02 33
Core group
DELOS (Europe)
_ José Borbinha, National Library of Portugal (DELOS coordinator)
_ Michel Mabe, Elsevier Science, UK (Publishing industry)
_ Peter Mutschke, Social Science Information Centre, Germany (Software agents, Information Retrieval)
_ Hans-Jörg Lieder, Berlin State Library, Germany (LEAF project)
_ Gunnar Karlsen, University of Bergen, Norway (Archives)
WIPO – World Intellectual Property Organisation
_ Glenn Macstravic
NSF (USA)
_ John Kunze, University of California, USA (NSF coordinator)
_ Barbara Tillett, Library of Congress, USA (Libraries)
_ Becky Dean, OCLC, USA (Libraries services)
_ Angela Spinazze, CIMI/RLG, USA (Museums)
_ Howard Besser, University of California, USA (Multimedia and digital art production)
DCMI - Dublin Core Metadata Initiative
_ Warwick Cathro, National Library of Australia
Besser--SfS-Hague 18/10/02 34
Work plan
Phase 1: Starting (March - April 2002)
_ Tuning objectives, scope, and action plan
_ Identification of reference sources
_ Call for contributions to the workshop
Phase 2: Internal Discussion (May - June 2002)
_ Analysis of the problem
_ Draft paper
Phase 3: Public Discussion (July - October 2002)
_ Expose the draft paper. Promote open public discussion
_ Workshop in Portugal (July 3-5). Workshop report
_ Draft paper (second version)
Phase 4: Conclusions (November - December 2002)
_ Review of the work done...
_ Final report
Besser--SfS-Hague 18/10/02 36
Recent Digital Preservation Activities
_ The Problem
_ Preservation Repositories
_ Preservation Metadata
_ Other Digital Preservation Activities
_ Special concerns of Cultural Heritage community
Besser--SfS-Hague 18/10/02 37
Serious Longevity Problems
What we know from prior widespread digital file formats
Previous formats required little ongoing intervention (remote storage facilities, Iron Mtn); digital formats require intense ongoing management
The Short Life of Digital Info
Besser--SfS-Hague 18/10/02 38
The Short Life of Digital Info: Digital Longevity Problems
_ Disappearing Information
_ The Viewing Problem
_ The Scrambling Problem
_ The Inter-relation Problem
_ The Custodial Problem
_ The Translation Problem
Besser--SfS-Hague 18/10/02 39
Older Longevity Projects
http://sunsite.berkeley.edu/Longevity/
_ CPA Task Force
_ Getty “Time & Bits” Conference & Follow-ups
_ Preservation experiments in US and Europe
– NEDLIB, CURL, Michigan
_ Internet Archive
_ Long Now
Besser--SfS-Hague 18/10/02 40
Preservation Repositories: Projects based on OAIS Model
_ CEDARS
_ NEDLIB
_ Pandora
_ CDL
_ OCLC/RLG Working Group on Preservation Metadata, Attributes of a Trusted Digital Repository, August 2001
Besser--SfS-Hague 18/10/02 41
Preservation Metadata
OCLC/RLG Working Group on Preservation Metadata, Preservation Metadata for Digital Objects: A Review of the State of the Art, January 31 2001
OCLC/RLG Working Group on Preservation Metadata, A Recommendation for Content Information, October 2001
Besser--SfS-Hague 18/10/02 42
Preservation Repositories: Open Archival Info System Model
_ High-level reference model describing submission, organization and management, and continuing access
_ Conceptual framework for different organizations to share discussions with a common language
_ Producers, consumers, management, actual repository
_ SIP, DIP, AIP
_ AIP consists of data objects plus representation info (Content, Preservation Description, Packaging, Descriptive)
_ Originally developed for Space Science community
Besser--SfS-Hague 18/10/02 43
Preservation Repositories -- AIP Metadata
_ Preservation Description Info
– reference info
– context info
– provenance info
– fixity info
_ Packaging Info
_ Descriptive Info
_ Content Info
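To make the four kinds of AIP metadata concrete, here is a small sketch of a package that stores them next to the data object and uses the fixity value to detect change. It rests on several assumptions: the class and field names are invented, SHA-256 is only one possible fixity mechanism, and the OAIS model itself prescribes no particular encoding.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class PreservationDescription:
    reference: str   # identifier for the content
    context: str     # why it was created, how it relates to other objects
    provenance: str  # custody and change history
    fixity: str      # here: a SHA-256 hex digest of the data object


@dataclass
class ArchivalPackage:
    data_object: bytes
    representation_info: str  # what is needed to interpret the bits
    pdi: PreservationDescription
    packaging_info: str
    descriptive_info: str

    def fixity_ok(self) -> bool:
        """Recompute the digest and compare it with the stored fixity value."""
        return hashlib.sha256(self.data_object).hexdigest() == self.pdi.fixity


def make_package(data: bytes, identifier: str) -> ArchivalPackage:
    """Build a package whose fixity value matches the data at ingest time."""
    return ArchivalPackage(
        data_object=data,
        representation_info="image/tiff",
        pdi=PreservationDescription(
            reference=identifier,
            context="example scanning project",
            provenance="scanned from original; no later changes",
            fixity=hashlib.sha256(data).hexdigest(),
        ),
        packaging_info="single file",
        descriptive_info="Example photograph",
    )


aip = make_package(b"example image bytes", "aip-0001")
print(aip.fixity_ok())       # digest matches at ingest
aip.data_object += b"\x00"   # simulate silent corruption
print(aip.fixity_ok())       # digest no longer matches
```

A repository would rerun such a check on a schedule, since the point of fixity info is to notice change that nobody intended.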
Besser--SfS-Hague 18/10/02 44
Other Digital Preservation Activities
_ LC Natl Dig Info Infrastructure & Preservation
_ InterPARES
_ Emulation Projects
_ E-Journal Archiving
_ ERPANET
_ Persistent Naming
Besser--SfS-Hague 18/10/02 45
LC’s National Digital Information Infrastructure and Preservation Program
_ Authorized Dec 2000
_ LC, Dept of Commerce, NARA, White House Office of Sci & Tech Policy
_ with help from CLIR, NLM, NAL, OCLC, RLG
_ Ongoing collab process
_ Commissioned papers on preserving: the Web, periodicals, digital sound, E-Books, Digital TV, Digital Video
Besser--SfS-Hague 18/10/02 46
InterPARES: International Research on Permanent Authentic Records in Electronic Systems
_ Ongoing international archival world project examining how to make electronically-generated records last over time
_ Developing the theoretical and methodological knowledge needed, then will formulate model policies, strategies, and standards
_ Next year will be extended to include images and rich media
Besser--SfS-Hague 18/10/02 47
Electronic Resource Preservation and Access NETwork (ERPANET)
_ Best practices and skills development for digital preservation of cultural heritage and scientific objects
_ 3 year project launched Nov 2001; 1.2 million Euros
Besser--SfS-Hague 18/10/02 49
What’s special about Cultural Heritage Materials?
_ Images & rich media
_ Inter-relationships btwn parts
_ For Contemporary Art: What is the Work?
Besser--SfS-Hague 18/10/02 53
Complexity of Rich Media
_ Works often have artistic nature (including video games)
_ Enormous number of elements can, at times, be very important to preserve (pacing, original artifact, elements used to construct the artifact)
_ Too complex to save every one of these aspects for every type of material
_ Importance of saving documentation
Besser--SfS-Hague 18/10/02 54
What can we do specific to Electronic Art?
_ Works themselves may no longer even exist; in many cases, what we can save amounts to forensic evidence
_ Enormous number of elements can, at times, be very important to preserve (pacing, original artifact, elements used to construct the artifact)
_ Too complex to save every one of these aspects for every type of material
_ Importance of saving pieces, representations, and documentation
_ Involve the artists to capture their intentions
_ Importance of Standards
_ Familiarize ourselves with recent conservation developments (Who Knows?, TechArcheology, Tate, IMAP)
Besser--SfS-Hague 18/10/02 55
Standards for encoding artists’ intentions
(group efforts w/i Cultural Heritage community)
_ Artists Interviews Project, Netherlands Institute for Cultural Heritage 1998-1999, Modern Art: Who Cares (http://www.icn.nl/english/6.4.2.html)
_ TechArcheology: A Symposium on Installation Preservation (SFMOMA)
_ More recent SFMOMA/Tate collaborations
_ IMAP
_ Guggenheim’s Variable Media
Besser--SfS-Hague 18/10/02 56
Structural Metadata Standards for Encoding Multimedia
(no time for details)
_ SMIL
_ MPEG 4
Besser--SfS-Hague 18/10/02 57
Identification/Provenance (Images)
_ The number of variant forms of a work can be enormous
_ Image Families
_ A digital image frequently has many layers of parentage
_ Information about the parentage that can indicate the quality and veracity of the image (Dublin Core "Source" and "Relation")
_ how to deal with different versions derived from the same scan or different encoding schemes
_ Vocabulary Standards to express this
Besser--SfS-Hague 18/10/02 58
The number of variant forms of a work can be enormous
_ different views of the same object
_ different scans of the same photo
_ different resolutions
_ different compression schemes
_ different compression ratios
_ different file storage formats
_ different details of the same image
_ ...
Besser--SfS-Hague 18/10/02 60
Identification/Provenance
_ how to deal with different versions (browse, hi-res, medium res) derived from the same scan or different encoding schemes (TIFF, PICT, JFIF)
_ Vocabulary Standards to express this
– VRA Surrogate Categories
– CIMI's “Image Elements”
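One way to picture an image family is as a set of versions, each recording the version it was derived from, so that the parentage of any delivery file can be traced back to the original scan. The sketch below is illustrative only: the field names and identifiers are invented and do not come from the VRA or CIMI vocabularies.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ImageVersion:
    identifier: str
    file_format: str                    # e.g. "TIFF", "JFIF"
    role: str                           # e.g. "master", "hi-res", "browse"
    derived_from: Optional[str] = None  # identifier of the parent version


def lineage(identifier: str, family: Dict[str, ImageVersion]) -> List[str]:
    """Walk from one version back to the original scan, oldest ancestor last."""
    chain = []
    current = family.get(identifier)
    while current is not None:
        chain.append(current.identifier)
        current = family.get(current.derived_from) if current.derived_from else None
    return chain


versions = [
    ImageVersion("scan-1", "TIFF", "master"),
    ImageVersion("scan-1-hi", "JFIF", "hi-res", derived_from="scan-1"),
    ImageVersion("scan-1-browse", "JFIF", "browse", derived_from="scan-1-hi"),
]
family = {v.identifier: v for v in versions}
print(lineage("scan-1-browse", family))
```

Recording only the immediate parent keeps each record small while still letting the full chain be reconstructed when quality or veracity is in question.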
Besser--SfS-Hague 18/10/02 61
Incorporate parts of Functional Requirements for Bibliographic Records (FRBR)
_ work
_ expression
_ manifestation
_ item
_ (and push into “change history” section of Technical Image Metadata)
Besser--SfS-Hague 18/10/02 62
NISO/DLF Image Metadata Workshop--4/99
(Z39.87-2002 draft)
_ create metadata needed to manage images in digital repositories over long periods of time (full life-cycle mgmt)
_ document image provenance & history
_ ensure that the images will be rendered accurately on any output device
Besser--SfS-Hague 18/10/02 63
Technical Image Metadata
Focus on Metadata that may prove helpful for
_ management
_ use
_ preservation
_ ...
Besser--SfS-Hague 18/10/02 64
Technical Image Metadata
In Scope
_ still, bit-mapped pictorial images
_ scanned/reformatted images (+ born digital)
Besser--SfS-Hague 18/10/02 65
Technical Image Metadata
Out of Scope
_ vector images
_ moving images
_ images of OCR-able text
_ structural and hierarchical relationships between images
_ rights management, terms of use (authenticity/security)
Besser--SfS-Hague 18/10/02 66
Technical Image Metadata
Technical Image Metadata-Z39.87
_ Image parameters (MIME type, compression, colorspace & profile, …)
_ Image Creation (source, capture info, etc.)
_ Image performance assessment (sampling, colormap, whitepoint, target data, etc.)
_ Change history (source, processing, etc.)
Besser--SfS-Hague 18/10/02 67
Technical Image Metadata
Technical Image Metadata-Z39.87
additional XML implementation schema (MIX)
Besser--SfS-Hague 18/10/02 68
Other Metadata
_ Description of depiction/surrogate (What VRA calls its "Surrogate Categories")
_ Description of original object
_ Rights and Reproduction Information
_ Location Information
Besser--SfS-Hague 18/10/02 69
Data Structures: The VRA Core
28 elements specifically for visual resource collections
_ Work Description Categories
_ Visual Document Description Categories
http://www.oberlin.edu/~art/vra/dsc.html
Besser--SfS-Hague 18/10/02 70
VRA Core: Work Description Categories
_ Work type
_ Title
_ Measurements
_ Material
_ Technique
_ Creator
_ Role
_ Date
_ Repository name
_ Repository place
_ Repository number
_ Current site
_ Original site
_ Style/period/group/movement
_ Nationality/culture
_ Subject
_ Related work
_ Relationship type
_ Notes
Besser--SfS-Hague 18/10/02 71
VRA Core: Visual Document Description Categories
_ Visual document type
_ Visual document format
_ Visual document measurements
_ Visual document date
_ Visual document owner
_ Visual document owner number
_ Visual document view description
_ Visual document subject
_ Visual document source
Besser--SfS-Hague 18/10/02 74
Thesaurus for Graphic Materials
_ designed for subject indexing of pictorial materials, particularly large general collections of historical images
_ for cataloging and retrieval
_ good for general audiences and broad approaches to the material
_ TGM-I: Subject Terms & TGM-II: Genre and Physical Characteristic Terms
http://lcweb.loc.gov/rr/print/tgm/toc.html
Besser--SfS-Hague 18/10/02 75
AAT
_ 120,000 terms for describing objects, textual materials, images, architecture, and material culture from antiquity to present
_ large and complex
http://www.getty.edu/gri/vocabularies/
Besser--SfS-Hague 18/10/02 77
Thesaurus of Geographic Names
_ over 1 million records
_ hierarchical and global throughout history
_ most records include coordinates and descriptive notes
Besser--SfS-Hague 18/10/02 79
<Indecs>
_ formal structure for describing and uniquely identifying intellectual property itself, the people and businesses involved in its trading, and the agreements which they make about it (primarily for publishing, music, and visual arts)
_ will develop high-level specifications for the services that will be required to implement a global IP trading system based on this <indecs> generic data model
_ focus is on encoding rights at a high level, not on resource discovery
_ likely to involve metadata schema registration and directory to allow interoperation of personal identifiers for rightsholders and users
_ supported by EEC DG-13
_ First meeting July 1999
http://www.indecs.org/
Besser--SfS-Hague 18/10/02 81
Crosswalks
_ mapping btwn differing metadata structures
_ eliminate the need for monolithic, universally adopted standards
_ focus on flexibility and interoperability
_ RDF-based metadata registries
Besser--SfS-Hague 18/10/02 82
Crosswalk Example
Columns, left to right: CDWA; Object ID; CIMI Schema; FDA; VRA Core Categories; USMARC; Dublin Core
Each row below gives the CDWA category, then the equivalents found in the other columns, in column order.
_ OBJECT/WORK (core): Document Classification-Catalog Level (core); Document Classification-Group Type
_ Object/Work-Type (core): Type of Object; objectNAME; Document Classification-Document Type (core); Purpose-Purpose (Broad) (core); Purpose-Purpose (Narrow); W1. Work Type; 655 Genre-Form; Type
_ Object/Work-Components: quantity; Document Classification-Extent; 300a Physical Description-Extent
_ ORIENTATION/ARRANGEMENT: Description
_ TITLES OR NAMES (core): Title; objectTitle; bibliographicTitle; Group/Item Identification-Repository Title; Group/Item Identification-Descriptive Title (core); Group/Item Identification-Inscribed Title; W2. Title; 24Xa Title and Title-Related Information; Title
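In code, a crosswalk of this kind comes down to a lookup table from one scheme's field names to another's. The sketch below uses only the VRA Core, USMARC, and Dublin Core pairings visible in the Type and Title rows of the example; the function, the record values, and the handling of unmapped fields are assumptions made for illustration.

```python
# Field pairings taken from the Type and Title rows of the crosswalk example.
# Everything else here (function, sample values) is illustrative.
VRA_TO_DC = {"W1. Work Type": "Type", "W2. Title": "Title"}
USMARC_TO_DC = {"655": "Type", "24Xa": "Title"}


def crosswalk(record, mapping):
    """Rename the fields of `record` according to `mapping`.

    Fields with no mapping are returned separately rather than dropped,
    so that lossy conversions stay visible.
    """
    mapped, unmapped = {}, {}
    for name, value in record.items():
        if name in mapping:
            mapped.setdefault(mapping[name], []).append(value)
        else:
            unmapped[name] = value
    return mapped, unmapped


vra_record = {
    "W1. Work Type": "photograph",
    "W2. Title": "Example title",
    "Measurements": "20 x 25 cm",  # no Dublin Core target in this mapping
}
dc_record, leftover = crosswalk(vra_record, VRA_TO_DC)
print(dc_record)
print(leftover)
```

Returning the unmapped fields is a deliberate choice: a richer scheme rarely maps completely onto a simpler one, and it helps to see what the conversion would lose.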
Besser--SfS-Hague 18/10/02 83
Resource Description Framework (RDF, spec released 2/99)
_ W3C Metadata activity
_ designed to move the Web beyond simple links to semantically-rich relationships btwn resources
_ metadata application using XML as a common syntax for exchange and processing
_ flexible architecture for managing diverse application-specific metadata packets that can be processed by machines
_ associates resources, property types, and corresponding values
_ http://www.w3.org/RDF/
Besser--SfS-Hague 18/10/02 84
RDF
_ Resources (character strings, names, digital objects)
_ Property (“is the author of”)
_ Value
_ resources+properties=relationships
_ many different relationships can be reflected
XML-encoded RDF
<?xml:namespace ns="http://www.w3.org/RDF/RDF" prefix="RDF" ?>
<?xml:namespace ns="http://purl.oclc.org/DC/" prefix="DC" ?>
<RDF:RDF>
  <RDF:Description>
    <DC:Creator>Howard Besser</DC:Creator>
  </RDF:Description>
</RDF:RDF>
Besser--SfS-Hague 18/10/02 86
Should you start building with RDF today?
_ Tools are primitive
_ Standard still likely to evolve
Besser--SfS-Hague 18/10/02 87
Digital Repository Traditions & Services require
_ Sustainability
_ Interoperability
_ Access
And all of these require Standards and Metadata
Besser--SfS-Hague 18/10/02 88
Building a Digital Future: Sustainable, Interoperable, Accessible Repositories
Howard Besser
NYU Archiving & Preservation Program
http://www.firstmonday.dk/issues/issue7_6/besser/index.html
Baca, Murtha (ed). Introduction to Metadata, Los Angeles: Getty Information Institute, 1998
http://www.getty.edu/gri/standard/intrometadata/
http://www.gseis.ucla.edu/~howard/Metadata/UC-May00/
http://sunsite.berkeley.edu/Metadata/sp2000.html
http://sunsite.berkeley.edu/Longevity/
http://www.oclc.org/digitalpreservation/presmeta_wp.pdf
http://is.gseis.ucla.edu/us-interpares/
http://www.niso.org/commitau.html
http://www.ifla.org/II/metadata.htm
http://www.gseis.ucla.edu/~howard/image-meta.html
http://sunsite.berkeley.edu/Imaging/Databases/#standards
Besser--SfS-Hague 18/10/02 90
OCLC/RLG Digital Repository Attributes
_ Administrative responsibility
_ Organizational viability
_ Financial sustainability
_ Technological suitability
_ System security
_ Procedural accountability
Besser--SfS-Hague 18/10/02 91
OCLC/RLG Selected Recommendations
_ Policies, Certification processes, Risk management, Persistent ID, Migration/Emulation experiments
_ Stakeholders meet to decide how to describe what is in a dig repository
_ Examine special properties of particular classes of digital objects
_ Technical standards for exchange and interoperability btwn repositories
_ Develop projects and case studies
_ Copyright issues
Besser--SfS-Hague 18/10/02 93
E-Journal Archiving
_ Issues
– License, don’t own; may not even be able to obtain right to make archival copy
– Increasingly no paper back-up at all
– Usually we don’t have the important redundancy factor
_ Mellon funded projects (2001)
– Yale, Harvard, Penn working w/individual publishers
– Cornell, NYPL--specific disciplines
– MIT exploring characteristics that change (dynamic)
– Stanford--archiving software tools
Besser--SfS-Hague 18/10/02 94
NEA 2001 grant to BAVC for $150,000
_ “To support development and dissemination of a DVD that contains a curriculum for the preservation of electronic art. The DVD will feature a preservation overview; discussions with conservators, artists, curators and technicians; a curriculum to train professionals in the field and project case studies to conserve electronic art.”
Besser--SfS-Hague 18/10/02 95
A few questions the NINCH community should address
_ Special issues raised by non-library institutions
_ Special issues raised by images and rich media
_ What is the work (or salient points we need to preserve)?
_ Bring the arts communities (artist intent, BAVC) together with the preservation repository communities and the preservation metadata communities
_ Specifically get Cultural Heritage communities involved with the selected OCLC/RLG recommendations
_ Get cultural heritage groups started on working to make sure that structure standards incorporate our works
_ What organizations will take responsibility to save today’s digital “ephemeral” materials (online ‘zines, arts discussion groups, etc.)?