Besser--Planning (Brazil) 31/5/01 1

Planning to Maximize Longevity of Digital Information

Howard Besser

UCLA School of Education & Information

http://www.gseis.ucla.edu/~howard

Planning to Maximize Longevity of Digital Info

- Access and Preservation
- Why are you Managing this Information?
- Key Considerations for Imaging Projects
- Important Planning Considerations
- Models for Digital Collections
- Importance of Metadata Standards
- Digital Longevity Issues
- More Planning Issues

Access and Preservation

- Digitizing can serve both Access and Preservation
  - E.g. access to digital surrogates saves wear & tear on originals
- But Digitization for Access can be quite different from Digitization for Preservation
  - Level of detail, scanning quality, extensiveness of resources
  - And long-term retention of digital works is still an open issue

Why are you Managing this Information?

- Organizational mission & type
- Users
- Uses

Key Considerations for Imaging Projects

- Users' Needs
- Image Quality
- Intellectual Property
- Standards
- Topology
- Tools & Processes

Key Considerations for Imaging Projects (1 of 3)

- Users' Needs
  - Quality of Digital Surrogate
  - Interoperable desktop applications
- Image Quality
  - Archival
  - Current online delivery

Key Considerations for Imaging Projects (2 of 3)

- Intellectual Property
- Standards
  - Modular and Layered Architecture
  - Terminology
  - Technical imaging information
- Topology

Key Considerations for Imaging Projects (3 of 3)

- Tools & Processes
  - Scanners
  - Compression techniques
  - Linking files
  - Workflow
  - Interoperable desktop applications

Some Nuts-and-Bolts Planning Considerations

- Think about users (and potential users), uses, and type of material/collection
- Scan at the highest quality justified by the likely potential users, uses, and material
- Do not let today’s delivery limitations influence your scanning file sizes; understand the difference between digital masters and derivative files used for delivery
- Many documents which appear to be bitonal are actually better represented with greyscale scans
- Include a color bar and ruler in the scan
- Use objective measurements to determine scanner settings (do NOT attempt to make the image look good on your particular monitor, or use image processing to color correct)
- Don’t use lossy compression
- Store in a common (standardized) file format
- Capture as much metadata as is reasonably possible (including metadata about the scanning process itself)
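
To see why delivery limits should not set the scanning resolution, it helps to estimate how large an uncompressed master is. A minimal sketch, assuming an uncompressed raster whose size is (width × dpi) × (height × dpi) × bits per pixel / 8, ignoring file headers and row padding; the 8.5 × 11 inch page, the resolutions, and the helper name `raster_bytes` are illustrative assumptions, not recommendations from this talk.

```python
def raster_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed size of a scan in bytes, ignoring headers and row padding."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


# Hypothetical 8.5 x 11 inch page scanned four ways
for label, dpi, bits in [
    ("master, bitonal", 600, 1),
    ("master, greyscale", 600, 8),
    ("master, 24-bit color", 600, 24),
    ("delivery derivative, 24-bit color", 72, 24),
]:
    size_mb = raster_bytes(8.5, 11, dpi, bits) / 1_000_000
    print(f"{label:34} {dpi:4} dpi  {size_mb:7.1f} MB")
```

For these assumed values the greyscale master is exactly 8 times the size of the bitonal one, and the 600 dpi color master is roughly 70 times the 72 dpi derivative, which is why masters and delivery derivatives are best kept as separate files.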

Why Scale is important

Important Planning Considerations

- File Formats
- Choosing Interoperable Systems
- Adhere to standards
- Vendors with large installed base
- Refreshing and/or Migration

Key problems we’re facing

- Discovery
- Longevity
- Interoperability

Serious Longevity Problems

What we know from prior widespread digital file formats:

- Images separating from their metadata
- Inaccessibility of software needed to view an image
- Inability to even decode the file format of an image

…return to Longevity problem later

Traditional Digital Library Model

[Diagram: four separate digital libraries (DL), each with its own search & presentation interface, used by users]

Ideal Digital Library Model

[Diagram: four digital libraries (DL) behind a single shared search & presentation interface, used by users]
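
The difference between the two models can be sketched in code: in the ideal model each digital library exposes the same minimal search interface, and one shared search & presentation layer queries all of them and merges the results. A minimal sketch, assuming in-memory record lists and a shared `title` field; the class `DigitalLibrary`, the function `federated_search`, and the sample records are hypothetical illustrations, not an existing protocol.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalLibrary:
    """A hypothetical library exposing one shared search method."""
    name: str
    records: list[dict] = field(default_factory=list)

    def search(self, term: str) -> list[dict]:
        # Case-insensitive substring match on the (assumed) shared "title" field
        term = term.lower()
        return [r for r in self.records if term in r["title"].lower()]


def federated_search(libraries: list[DigitalLibrary], term: str) -> list[tuple[str, str]]:
    """One search & presentation layer over many libraries."""
    hits = []
    for lib in libraries:
        for record in lib.search(term):
            hits.append((lib.name, record["title"]))
    # Present a single merged list, sorted by title, to the user
    return sorted(hits, key=lambda hit: hit[1])


libraries = [
    DigitalLibrary("DL-A", [{"title": "Map of Rio"}]),
    DigitalLibrary("DL-B", [{"title": "Rio harbor photo"}, {"title": "Lisbon map"}]),
]
print(federated_search(libraries, "rio"))  # [('DL-A', 'Map of Rio'), ('DL-B', 'Rio harbor photo')]
```

The sketch only works because every library agrees on the same field names, which is the point the following slides make about standards.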

For Interoperability Digital Libraries Need Standards

- Descriptive Metadata for consistent description
- Discovery Metadata for finding
- Administrative Metadata for viewing and maintaining
- Structural Metadata for navigation
- ...
- Terms & Conditions Metadata for controlling access
- ...
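
One way to picture these categories is as separate sections of a single record for a multi-page item. A minimal sketch, assuming a three-page letter; the class `DigitalObject`, its field names, and the sample values are hypothetical and not drawn from any particular metadata standard. The structural section is what lets a viewer offer next-page navigation, and the terms section is what an access check would consult.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DigitalObject:
    descriptive: dict     # consistent description (title, creator, ...)
    discovery: list       # keywords/subject terms used for finding
    administrative: dict  # what is needed to view and maintain the files
    structural: list      # ordered page files, used for navigation
    terms: dict           # terms & conditions governing access

    def next_page(self, current: str) -> Optional[str]:
        """Use structural metadata to find the page after `current`."""
        index = self.structural.index(current)
        return self.structural[index + 1] if index + 1 < len(self.structural) else None

    def may_view(self, user_group: str) -> bool:
        """Use terms & conditions metadata to decide whether a group may view the item."""
        return user_group in self.terms.get("allowed_groups", [])


letter = DigitalObject(
    descriptive={"title": "Letter, three pages", "creator": "unknown"},
    discovery=["correspondence", "letters"],
    administrative={"format": "TIFF", "capture": "600 dpi greyscale"},
    structural=["p1.tif", "p2.tif", "p3.tif"],
    terms={"allowed_groups": ["on-site users"]},
)
print(letter.next_page("p1.tif"))       # p2.tif
print(letter.may_view("remote users"))  # False
```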

Why Is Consensus on Standards and Metadata Important?

- Managing digital files over time
- Longevity
- Interoperability
- Veracity
- Recording in a consistent manner
- Gives vendors an incentive to create applications that support these standards

Why Standards?

Why do we need standards?
- To make information universally available to users
- To facilitate sharing and interchange of information
- To preserve information (make it safe from changes in hardware and software)

Standards only work if communities widely accept them, but they’re necessary for communities to work together

Questions to Ask

- What communities is this standard designed for?
- What type of information is this standard designed to handle?
- What functions is this standard designed to serve?
- What previous standards is it built upon?
- Does the standard prescribe how to create new records (or parts of records), or how to map from existing records?
- How far does the standard go? Semantics: Does it define element sets? Rules? Syntax?

Semantics/Syntax/Structure

- Semantics: meaning, as defined by a community to meet their particular needs (e.g. Dublin Core)
- Syntax: a systematic arrangement of data elements for machine processing
  - facilitates the exchange and use of metadata among various applications (HTML, XML, RDF)
- Structure: a formal arrangement of the syntax with the goal of consistent representation of the semantics (e.g. rules defining how field contents such as the date 1/11/99 are written)
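
The three layers can be seen in one small record: Dublin Core supplies the semantics (element names such as `title` and `date`), XML supplies the syntax, and a structure rule fixes how a field's contents are written. A date like 1/11/99 is ambiguous (1 November or 11 January 1999?), so this sketch assumes a day/month/year source convention and normalizes to ISO 8601 (YYYY-MM-DD) as an example rule. The helpers `normalize_date` and `dc_record` and the record contents are hypothetical; the namespace URI is the standard Dublin Core Element Set 1.1 namespace.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def normalize_date(raw: str) -> str:
    """Structure rule (assumed): source dates are day/month/two-digit year."""
    return datetime.strptime(raw, "%d/%m/%y").date().isoformat()


def dc_record(title: str, raw_date: str) -> str:
    """Serialize a tiny Dublin Core record (semantics) as XML (syntax)."""
    root = ET.Element("record")
    ET.SubElement(root, f"{{{DC_NS}}}title").text = title
    ET.SubElement(root, f"{{{DC_NS}}}date").text = normalize_date(raw_date)
    return ET.tostring(root, encoding="unicode")


print(dc_record("Sample photograph", "1/11/99"))
```

Under a month/day/year convention the same string would normalize to 1999-01-11, which is exactly the inconsistency a shared structure rule prevents.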

The Short Life of Digital Info: Digital Longevity Problems

- Disappearing Information
- The Viewing Problem
- The Scrambling Problem
- The Inter-relation Problem
- The Custodial Problem
- The Translation Problem

The Viewing Problem

Digital Info requires a whole infrastructure to view it

Each piece of that infrastructure is changing at an incredibly rapid rate

How can we ever hope to deal with all the permutations and combinations?
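
The scale of the problem is multiplicative: every layer of the viewing infrastructure multiplies the number of configurations a file may need to work in, and each layer keeps growing as new versions appear. A back-of-the-envelope sketch, assuming purely hypothetical counts of operating systems, viewer applications, and format versions:

```python
from itertools import product

# Hypothetical options at each layer of the viewing infrastructure
operating_systems = ["OS-1", "OS-2", "OS-3"]
viewers = ["Viewer-A", "Viewer-B", "Viewer-C", "Viewer-D"]
format_versions = ["v1", "v2", "v3", "v4", "v5"]

# Every (OS, viewer, format version) triple is a configuration to keep working
configurations = list(product(operating_systems, viewers, format_versions))
print(len(configurations))  # 60, i.e. 3 * 4 * 5

# One more hypothetical layer with two options doubles the total
display_devices = ["Display-1", "Display-2"]
print(len(list(product(operating_systems, viewers, format_versions, display_devices))))  # 120
```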

The Scrambling Problem

Dangers from:
- Compression to ease storage & delivery
- Container Architecture to enhance digital commerce
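
Compression is dangerous in this sense when it discards information. A minimal sketch contrasting the two cases: `zlib`, a lossless codec in Python's standard library, decompresses to exactly the original bytes, while `lossy_quantize`, a hypothetical stand-in for lossy coding (not a real image codec), zeroes the low-order bits of each sample so they can never be recovered. Comparing checksums makes the difference visible.

```python
import hashlib
import zlib


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def lossy_quantize(data: bytes, keep_bits: int = 4) -> bytes:
    """Hypothetical stand-in for lossy coding: zero the low-order bits of each byte."""
    mask = (0xFF << (8 - keep_bits)) & 0xFF
    return bytes(b & mask for b in data)


original = bytes(range(256)) * 4  # stand-in for raw image samples

lossless_copy = zlib.decompress(zlib.compress(original, 9))
lossy_copy = lossy_quantize(original)

print(sha256(lossless_copy) == sha256(original))  # True: bit-for-bit identical
print(sha256(lossy_copy) == sha256(original))     # False: information discarded
```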

The Inter-relation Problem

- Info is increasingly inter-related to other info

- How do we make our own Info persist when it points to and integrates with Info owned by others?

- What is the boundary of a set of information (or even of a digital object)?

The Custodial Problem

- How do we decide what to save?
- Who should save it?
- How should they save it?
  - methods for later access: emulation, migration, etc.
  - issues of authenticity and evidence

The Translation Problem

Content translated into new delivery devices changes meaning
- A photo vs. a painting
- If Info is produced originally in digital form in one encoded format, will it be the same when translated into another format?
- Behaviors

Pieces of the Solution (1/2)

- We need to insist upon clearly readable, standardized ways for digital objects to self-identify their formats
- We should discourage scrambling
- We need to better understand how information inter-relates to other Info, and what constitutes the “boundaries” of Info objects
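
Many formats already self-identify to a degree through a fixed signature (a "magic number") in their first bytes, which lets software recognize a file even when its name or external metadata has been lost. A minimal sketch using a small table of well-known signatures (TIFF, JPEG, PNG, GIF); the functions `identify_format` and `identify_file` are hypothetical, and such signatures typically identify a format family rather than its exact version, which is part of why a fuller standardized mechanism is needed.

```python
from typing import Optional

# Leading-byte signatures of a few common image formats
SIGNATURES = [
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
]


def identify_format(header: bytes) -> Optional[str]:
    """Return the format whose signature starts `header`, or None if unknown."""
    for signature, name in SIGNATURES:
        if header.startswith(signature):
            return name
    return None


def identify_file(path: str) -> Optional[str]:
    """Read the first bytes of a file and identify its format from them."""
    with open(path, "rb") as f:
        return identify_format(f.read(16))
```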

Pieces of the Solution (2/2)

- People and organizations wishing to make information persist need guidelines on how to go about doing it

- We need to better understand how translating from one storage or display format to another affects the meaning of a work

- We need to save the “behaviors” of a digital object, not just its “contents”

Conceptual Approaches to Digital Preservation

- Refreshing: always necessary due to volatility of physical strata
  - Impact on evidential value
- Migration: advantages & disadvantages
- Emulation: advantages & disadvantages
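
Refreshing copies the same bits onto new media; for evidential value the question is whether the copy is truly identical. A minimal sketch, assuming a checksum (SHA-256 here, an illustrative choice) is computed before the copy and re-verified afterwards; the functions `file_sha256` and `refresh` and the directory names are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Checksum a file in chunks so large masters need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(source: Path, new_media_dir: Path) -> Path:
    """Copy a file to new media and verify the copy is bit-for-bit identical."""
    before = file_sha256(source)
    new_media_dir.mkdir(parents=True, exist_ok=True)
    target = new_media_dir / source.name
    shutil.copy2(source, target)  # copies contents plus timestamps/permissions
    if file_sha256(target) != before:
        raise OSError(f"fixity check failed for {target}")
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        master = Path(tmp) / "master.tif"
        master.write_bytes(b"II*\x00" + bytes(1024))  # dummy file contents
        copy = refresh(master, Path(tmp) / "new_media")
        print(copy.name, file_sha256(copy) == file_sha256(master))
```

Migration, by contrast, deliberately changes the bits (a new format), so a matching checksum can no longer demonstrate that nothing was altered; this is one reason migration raises harder evidential questions than refreshing.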

Metadata can be the first line of defense

Can tell you:
- where the file is (if you can’t find the file)
- where more info about the file is (if you have the file but most other metadata has become separated)
- what the file format is
- what the compression scheme is
- what application program and version is needed for the file
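
One simple way to keep this information from separating from the file is a "sidecar" metadata file stored next to it. A minimal sketch, assuming JSON sidecars named `<file>.meta.json`; that naming convention, the field names, and the helpers `write_sidecar` and `read_sidecar` are hypothetical, and a sidecar only helps if it is copied and migrated together with the file it describes.

```python
import json
import tempfile
from pathlib import Path


def sidecar_path(data_file: Path) -> Path:
    return data_file.with_name(data_file.name + ".meta.json")


def write_sidecar(data_file: Path, fmt: str, compression: str,
                  application: str, more_info_url: str) -> Path:
    record = {
        "location": str(data_file),  # where the file is
        "more_info": more_info_url,  # where fuller metadata lives
        "format": fmt,               # what the file format is
        "compression": compression,  # what the compression scheme is
        "application": application,  # program and version needed
    }
    path = sidecar_path(data_file)
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return path


def read_sidecar(data_file: Path) -> dict:
    return json.loads(sidecar_path(data_file).read_text(encoding="utf-8"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        image = Path(tmp) / "photo_0001.tif"
        image.write_bytes(b"II*\x00")  # dummy file contents
        write_sidecar(image, "TIFF 6.0", "none", "any TIFF 6.0 reader",
                      "http://example.org/records/photo_0001")
        print(read_sidecar(image)["format"])
```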

Groups Working on the Big Problem

http://sunsite.berkeley.edu/Longevity/

- CPA Task Force
- Getty “Time & Bits” Conference & Follow-ups
- Emulation experiments in US and Europe
  - NEDLIB, CURL, Michigan
- Mellon-funded E-Journal Archive experiments
- Internet Archive
- Long Now

Time & Bits

Time & Bits Participants

Stewart Brand, Howard Besser, Brian Eno, Danny Hillis, Peter Lyman, Brewster Kahle, Kevin Kelly, Jaron Lanier, Doug Carlston, John Heilemann, Ben Davis, Margaret MacLean, Bruce Sterling, Paul Saffo

Groups Working on Pieces of the Big Problem

http://sunsite.berkeley.edu/Longevity/

- Internet Archive
- Long Now
- Emulation experiments in US and Europe
  - NEDLIB, CURL, Michigan

Journal Archiving

- License, don’t own; we may not even be able to obtain the right to make an archival copy
- Increasingly no paper back-up at all
- Usually we don’t have the important redundancy factor
- Stanford’s LOCKSS Project (Lots of Copies Keeps Stuff Safe) and its problems (http://lockss.stanford.edu)
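
The idea behind "Lots of Copies Keeps Stuff Safe" is that independent copies can be compared to detect damage. The sketch below is a simplified illustration of that idea only, not the actual LOCKSS protocol: it hashes each holder's copy, treats the majority digest as authoritative, and reports the copies that disagree. The holder names and the function `audit_copies` are hypothetical.

```python
import hashlib
from collections import Counter


def audit_copies(copies: dict[str, bytes]) -> tuple[str, list[str]]:
    """Return the majority digest and the names of copies that disagree with it."""
    digests = {name: hashlib.sha256(data).hexdigest() for name, data in copies.items()}
    majority_digest, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(copies) // 2:
        raise ValueError("no majority: cannot tell which copy is authentic")
    damaged = sorted(name for name, d in digests.items() if d != majority_digest)
    return majority_digest, damaged


copies = {
    "library-a": b"journal issue 12",
    "library-b": b"journal issue 12",
    "library-c": b"journal issue 1?",  # simulated damage
}
print(audit_copies(copies)[1])  # ['library-c']
```

With only two copies a single disagreement cannot be resolved, which is one way to see why the redundancy factor matters.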

Migration/Refreshing

Impact on evidential value

More Planning Issues

- Image Families
- Behaviors
- Persistent Identification

Identification/Provenance (Images)

- The number of variant forms of a work can be enormous
- Image Families
- A digital image frequently has many layers of parentage
- Information about the parentage can indicate the quality and veracity of the image (Dublin Core "Source" and "Relation")
- How to deal with different versions derived from the same scan or different encoding schemes
- Vocabulary Standards to express this
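
Parentage can be recorded by giving each image a pointer to the image it was derived from, so that any derivative can be traced back to its master. A minimal sketch, assuming one parent per image; the class `ImageVersion`, its fields, and the sample family are hypothetical, though the relationship is of the kind Dublin Core "Source" and "Relation" are meant to express.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImageVersion:
    identifier: str
    description: str                         # e.g. resolution and encoding
    parent: Optional["ImageVersion"] = None  # the image this was derived from

    def lineage(self) -> list[str]:
        """Identifiers from this image back to the original scan."""
        chain = []
        node: Optional[ImageVersion] = self
        while node is not None:
            chain.append(node.identifier)
            node = node.parent
        return chain


master = ImageVersion("scan-001", "600 dpi 24-bit TIFF master")
medium = ImageVersion("scan-001-med", "150 dpi JPEG derivative", parent=master)
thumb = ImageVersion("scan-001-thumb", "browse thumbnail", parent=medium)

print(thumb.lineage())  # ['scan-001-thumb', 'scan-001-med', 'scan-001']
```

A thumbnail made from a compressed derivative is two generations from the master, so recording the chain is what lets a user judge its quality and veracity.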

The number of variant forms of a work can be enormous

- different views of the same object
- different scans of the same photo
- different resolutions
- different compression schemes
- different compression ratios
- different file storage formats
- different details of the same image
- ...

Image Families

Identification/Provenance

- How to deal with different versions (browse, hi-res, medium res) derived from the same scan or different encoding schemes (TIFF, PICT, JFIF)
- Vocabulary Standards to express this
  - VRA Surrogate Categories
  - CIMI's "Image Elements"

MOA II Behaviors

- Navigation
- Display/Print

MOA II Best practices

- Use/Users/Collection: Benchmarking
- Masters vs. Derivatives
- Scanning
- Administrative Metadata
- Structural Metadata

To deal with Immediately

- Persistent IDs
- Metadata

Persistent IDs--the Problem

- Need to separate work ID from work location
- URNs probably won’t be ready until 2003
- Becomes a business process issue when one organization maintains the resource and another organization references it (e.g. licensed from vendors or managed by separate administrative structures)

More Persistent IDs--the Approach for today

- PURLs
- Handles
- HTTP redirects
- And worry about costs now, and about conversion costs when URNs become feasible
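
The HTTP-redirect approach separates a work's identifier from its location: people cite a stable URL that points at a resolver, and the resolver answers with a redirect to wherever the work currently lives. PURLs work on this redirect principle. A minimal sketch using Python's standard `http.server`; the `/id/...` identifier scheme, the `LOCATIONS` table, the example URL, and the default port are hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional

# Hypothetical table: persistent identifier -> current location
LOCATIONS = {
    "/id/photo-0001": "http://images.example.org/collection/photo_0001.jpg",
}


def resolve(path: str) -> Optional[str]:
    """Look up the current location for a persistent identifier."""
    return LOCATIONS.get(path)


class ResolverHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        location = resolve(self.path)
        if location is None:
            self.send_error(404, "Unknown identifier")
            return
        self.send_response(302)  # redirect to the current location
        self.send_header("Location", location)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Run the resolver until interrupted (the port is an arbitrary choice)."""
    HTTPServer(("localhost", port), ResolverHandler).serve_forever()
```

When a work moves, only the table entry changes and every published identifier keeps working; the cost the slide warns about is maintaining this table now and converting it when URNs become feasible.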

Data Set Management: More Issues with Referencing IDs

- References for mirror sites
- References for back-up sites when the main site is down or bottlenecked
- References for off-site copies and archival copies
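
The same indirection can carry several references per identifier, ordered by preference: the main site first, then mirror, back-up, and archival copies. A minimal sketch, assuming a caller-supplied availability check so the logic can be shown without network access; the `REFERENCES` table, the URLs, and the function `choose_location` are hypothetical.

```python
from typing import Callable, Optional

# Hypothetical ordered references for one identifier
REFERENCES = {
    "photo-0001": [
        "http://main.example.org/photo_0001.tif",    # main site
        "http://mirror.example.net/photo_0001.tif",  # mirror site
        "http://backup.example.com/photo_0001.tif",  # back-up site
        "file:///archive/offsite/photo_0001.tif",    # off-site archival copy
    ],
}


def choose_location(identifier: str, is_available: Callable[[str], bool]) -> Optional[str]:
    """Return the first available reference for an identifier, or None."""
    for url in REFERENCES.get(identifier, []):
        if is_available(url):
            return url
    return None


# Example: pretend the main site is down or bottlenecked
down = {"http://main.example.org/photo_0001.tif"}
print(choose_location("photo-0001", lambda url: url not in down))
```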

One Final Question: Who will collect the digital works of today that should become the Special Collections of tomorrow?

- web sites
- zines
- electronic journals
- listserv and email discussions
- drafts of works that later become famous

Howard Besser

UCLA School of Education & Information

http://sunsite.berkeley.edu/Longevity/
http://www.gseis.ucla.edu/~howard
http://sunsite.berkeley.edu/moa2
http://lockss.stanford.edu
http://www.longnow.com/10klibrary/TimeBitsDisc/
http://www.archive.org/
