Besser--TextOneZero 5/22/01 1

The New Information Environments: Helping content persist over time

Howard Besser

UCLA School of Education & Information

http://www.gseis.ucla.edu/~howard


The New Information Environments: Helping content persist over time

- What the Movie Industry is learning
- Major Issues Facing Digital Projects
- The Short Life of Digital Info
- Who's working on these problems?
- Important Planning Considerations


What the Movie Industry is learning

- Repurposing is a key part of future business models
- The products they sell will be an integral part of a larger infrastructure and a larger set of informational products


What this implies

- Must save digital content over very long periods of time (much longer than backlists)
- Digital content must be designed to interoperate with other digital content coming from other publishers/vendors (the age of the stand-alone book is gone)
- Publishers need to seriously worry about:
  - Longevity issues
  - Standards


Major Issues Facing Digital Projects

- Changes in Intellectual Property Law
- Intellectual Access
- Storage
- Delivery
- Integration with other tools
- Interoperability


Serious Longevity Problems

- What we know from prior widespread digital file formats:
  - Images separating from their metadata
  - Inaccessibility of software needed to view a complex work
  - Inability to even decode the file format of a work


The Short Life of Digital Info: Digital Longevity Problems

- Disappearing Information
- The Viewing Problem
- The Scrambling Problem
- The Inter-relation Problem
- The Custodial Problem
- The Translation Problem


The Viewing Problem

Digital Info requires a whole infrastructure to view it

Each piece of that infrastructure is changing at an incredibly rapid rate

How can we ever hope to deal with all the permutations and combinations?


The Scrambling Problem

Dangers from:

- Compression to ease storage & delivery
- Container Architecture to enhance digital commerce


The Inter-relation Problem

- Info is increasingly inter-related to other info
- How do we make our own Info persist when it points to and integrates with Info owned by others?
- What is the boundary of a set of information (or even of a digital object)?


The Custodial Problem

In the past, much of survival was due to redundancy

How do we decide what to save?

Who should save it?

- Mellon-funded E-Journal Archives

How should they save it?


The Custodial Problem: How to save information?

Methods for later access:

- Refreshing
- Migration
- Emulation

Issues of authenticity and evidence


The Translation Problem

Content translated into new delivery devices changes meaning

- A photo vs. a painting
- If Info is produced originally in digital form in one encoded format, will it be the same when translated into another format?
- Behaviors


Still another problem: Layers of rights

- e.g., recent electronic versions of art books have been released with most of the art missing!


Pieces of the Solution (1/2)

- We need to insist upon clearly readable, standardized ways for digital objects to self-identify their formats
- We need to standardize on fewer file formats
- We should discourage scrambling
- We need to better understand how information inter-relates to other Info, and what constitutes “boundaries” of Info objects
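One way a digital object can self-identify its format is through a fixed signature in its leading bytes. The sketch below is illustrative only: the function name `sniff_format` and the short signature table are assumptions made for this example, covering a handful of well-known formats rather than any registry the talk refers to.

```python
from typing import List, Tuple

# Illustrative table of leading-byte signatures for a few common formats.
SIGNATURES: List[Tuple[bytes, str]] = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"%PDF-", "PDF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
]


def sniff_format(data: bytes) -> str:
    """Return a format label based on the leading bytes, or 'unknown'."""
    for signature, label in SIGNATURES:
        if data.startswith(signature):
            return label
    return "unknown"
```

A file whose first bytes match no known signature is exactly the case the slides warn about: the bits survive, but nothing says how to decode them.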


Pieces of the Solution (2/2)

- People and organizations wishing to make information persist need guidelines on how to go about doing it
- We need to better understand how translating from one storage or display format to another affects the meaning of a work
- We need to save the “behaviors” of a digital object, not just its “contents”
- Supporting strong copyright legislation can come back to bite us


Conceptual Approaches to Digital Preservation

- Refreshing -- always necessary due to volatility of physical strata
  - Impact on evidential value
- Migration -- advantages & disadvantages
- Emulation -- advantages & disadvantages
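Refreshing copies the same bitstream onto new storage, so a digest taken before and after the copy should match; migration, by contrast, rewrites the encoding and would change the digest. The sketch below shows one possible refresh step with a fixity check, which is one way to support a claim that the copy is still the same evidence. The function names and the choice of SHA-256 are assumptions for illustration, not a method described in the talk.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(source: Path, destination: Path) -> str:
    """Copy a file to new storage and confirm the bits are unchanged.

    Returns the digest so it can be recorded alongside the copy.
    """
    before = sha256_of(source)
    shutil.copyfile(source, destination)
    after = sha256_of(destination)
    if before != after:
        raise IOError(f"refresh altered the bitstream of {source}")
    return after
```

Recording the returned digest with each refresh gives later custodians something to check against.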


To deal with immediately

- Persistent IDs
- Metadata


Persistent IDs--the Problem

- Need to separate work ID from work location
- URNs probably won’t be ready until 2003
- Becomes a business process issue when one organization maintains the resource and another organization references it (i.e., licensed from vendors or managed by separate administrative structures)


More Persistent IDs--the Approach for today

- PURLs
- Handles
- HTTP redirects
- And worry about costs now and conversion costs when URNs become feasible
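All three interim approaches rest on the same idea: citations carry a stable ID, and a lookup table maps that ID to wherever the work currently lives. The following is a minimal sketch of that indirection expressed as an HTTP-style redirect. The table contents, the ID format, and the function names are hypothetical; it does not reproduce how PURL or Handle servers are actually implemented.

```python
from typing import Dict, Tuple

# Hypothetical table: persistent ID -> current location of the work.
LOCATIONS: Dict[str, str] = {
    "example/report-1": "http://server-a.example.org/docs/report1.html",
}


def resolve(persistent_id: str, table: Dict[str, str]) -> Tuple[int, Dict[str, str]]:
    """Return an HTTP-style (status, headers) pair for a persistent ID."""
    location = table.get(persistent_id)
    if location is None:
        return 404, {}
    return 302, {"Location": location}


def relocate(persistent_id: str, new_location: str, table: Dict[str, str]) -> None:
    """Record that a work has moved; citations using the ID need no change."""
    table[persistent_id] = new_location
```

When the work moves, only the table entry changes. Keeping that table current is the ongoing cost the slide asks planners to budget for.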


Data Set Management: More issues with referencing IDs

- References for mirror sites
- References for back-up sites when main site is down or bottle-necked
- References for off-site copies and archival copies
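Once an ID can point to several copies, the reference has to say which copy to use and in what order. A rough sketch, assuming each ID maps to an ordered list of locations and that some availability check is supplied by the caller; the URLs, the ordering, and the `is_up` callback are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical ordered copies for one ID: main site, then mirror, then back-up.
COPIES: Dict[str, List[str]] = {
    "example/image-42": [
        "http://main.example.org/img/42.tif",
        "http://mirror.example.org/img/42.tif",
        "http://backup.example.org/img/42.tif",
    ],
}


def first_available(
    persistent_id: str,
    copies: Dict[str, List[str]],
    is_up: Callable[[str], bool],
) -> Optional[str]:
    """Return the first listed copy that the caller's check reports as reachable."""
    for url in copies.get(persistent_id, []):
        if is_up(url):
            return url
    return None
```

Off-site and archival copies would typically sit at the end of such a list, or in a separate list that is not offered to ordinary users at all.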


Metadata can be the first line of defense

Can tell you:

- where the file is (if you can’t find the file)
- where more info about the file is (if you have the file but most other metadata has become separated)
- what the file format is
- what the compression scheme is
- what application program and version is needed for the file
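The five questions above translate directly into fields of a record kept with each file. The sketch below is one possible shape for such a record, with a helper that reports which questions a given record cannot yet answer. The class and field names are assumptions for illustration, not a published metadata schema.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class FileMetadata:
    """One field per question the slide says metadata should answer."""
    file_location: Optional[str] = None
    more_info_location: Optional[str] = None
    file_format: Optional[str] = None
    compression: Optional[str] = None
    application: Optional[str] = None
    application_version: Optional[str] = None


def unanswered(record: FileMetadata) -> List[str]:
    """List the fields that are still empty, in declaration order."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]
```

Running `unanswered` over a collection is a cheap way to find the files most at risk of becoming undecodable.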


Metadata Encoding

- XML Mark-up
- Structural & Administrative Metadata -- http://sunsite.berkeley.edu/moa2
- File Name management
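To make the idea of XML-encoded structural and administrative metadata concrete, here is a small sketch using Python's standard `xml.etree.ElementTree`. The element and attribute names (`object`, `administrative`, `structural`, `file`, `sequence`) are invented for this example and are not the MOA2 schema, which should be consulted directly.

```python
import xml.etree.ElementTree as ET


def build_record(object_id: str, file_name: str, file_format: str, compression: str) -> str:
    """Serialize a tiny metadata record with hypothetical element names."""
    root = ET.Element("object", {"id": object_id})

    # Administrative metadata: what is needed to manage and decode the file.
    admin = ET.SubElement(root, "administrative")
    ET.SubElement(admin, "format").text = file_format
    ET.SubElement(admin, "compression").text = compression

    # Structural metadata: how files fit together to form the object.
    structural = ET.SubElement(root, "structural")
    ET.SubElement(structural, "file", {"name": file_name, "sequence": "1"})

    return ET.tostring(root, encoding="unicode")
```

Because the record is plain text with named elements, it stays readable even if the software that wrote it disappears.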


Groups Working on the Big Problem

http://sunsite.berkeley.edu/Longevity/

- CPA Task Force
- Getty “Time & Bits” Conference & Follow-ups
- Emulation experiments in US and Europe
  - NEDLIB, CURL, Michigan
- Mellon Journal Archiving experiments
- Internet Archive
- Long Now


Time & Bits


Time & Bits Participants

- Stewart Brand
- Howard Besser
- Brian Eno
- Danny Hillis
- Peter Lyman
- Brewster Kahle
- Kevin Kelly
- Jaron Lanier
- Doug Carlston
- John Heilemann
- Ben Davis
- Margaret MacLean
- Bruce Sterling
- Paul Saffo


Groups Working on Pieces of the Big Problem

http://sunsite.berkeley.edu/Longevity/

- Internet Archive
- Long Now
- Emulation experiments in US and Europe
  - NEDLIB, CURL, Michigan
- Mellon Journal Archiving experiments


Important Planning Considerations

- File Formats
- Choosing Interoperable Systems
  - Adhere to standards
  - Vendors with large installed base
- Refreshing and/or Migration


Key Considerations for Imaging Projects

- Users' Needs
- Image Quality
- Intellectual Property
- Standards
- Topology
- Tools & Processes


Key Considerations for Imaging Projects (1 of 3)

- Users' Needs
  - Quality of Digital Surrogate
  - Interoperable desktop applications
- Image Quality
  - Archival
  - Current online delivery


Some nuts-and-bolts Planning Considerations

Think about users (and potential users), uses, and type of material/collection

Scan at the highest quality that does not exceed the needs of the likely potential users, uses, and material

Do not let today’s delivery limitations influence your scanning file sizes; understand the difference between digital masters and derivative files used for delivery

Many documents which appear to be bitonal actually are better represented with greyscale scans

Include color bar and ruler in the scan

Use objective measurements to determine scanner settings (do NOT attempt to make the image look good on your particular monitor or use image processing to color correct)

Don’t use lossy compression

Store in a common (standardized) file format

Capture as much metadata as is reasonably possible (including metadata about the scanning process itself)
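The size gap between a digital master and a delivery derivative, and between bitonal and greyscale capture, is simple arithmetic. The sketch below estimates uncompressed size from page dimensions, resolution, and bit depth. The letter-size page and 600 dpi in the usage note are assumptions chosen for illustration, not recommendations from the talk, and the estimate ignores file headers and row padding.

```python
def uncompressed_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Estimate raw image size: pixel count times bit depth, in bytes."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8
```

For an 8.5 x 11 inch page at 600 dpi, `uncompressed_bytes(8.5, 11, 600, 1)` gives 4,207,500 bytes for a bitonal scan, `uncompressed_bytes(8.5, 11, 600, 8)` gives 33,660,000 bytes for 8-bit greyscale, and `uncompressed_bytes(8.5, 11, 600, 24)` gives 100,980,000 bytes for 24-bit color. Masters at these sizes are for storage; smaller derivatives can be generated from them for delivery.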


Howard Besser

UCLA School of Education & Information

http://sunsite.berkeley.edu/Longevity/
http://www.gseis.ucla.edu/~howard
http://sunsite.berkeley.edu/moa2
http://lockss.stanford.edu
http://www.longnow.com/10klibrary/TimeBitsDisc/
http://www.archive.org/

The New Information Environments: Helping content persist over time


Architecture: Separating Longevity and Delivery Servers

[Diagram labels: Berkeley Longevity Server; Berkeley Delivery Server; Other Delivery Server (three); User (four)]


Journal Archiving

_ License, don’t own; may not be even able to obtain right to make archival copy

_ Increasingly no paper back-up at all_ Usually we don’t have the important

redundancy factor_ Stanford’s LOCKSS Project (Lots of Copies

Keeps Stuff Safe) and its problems (http://lockss.stanford.edu)
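The redundancy factor can be illustrated with a toy audit: if several institutions hold copies of the same article, comparing digests shows which copy has drifted from the rest. This is only a sketch of the general "many copies" idea. It is not the LOCKSS protocol, and the holder names and the majority rule are assumptions for the example.

```python
import hashlib
from collections import Counter
from typing import Dict, List


def digest(data: bytes) -> str:
    """SHA-256 digest of one held copy."""
    return hashlib.sha256(data).hexdigest()


def audit(copies: Dict[str, bytes]) -> List[str]:
    """Return the holders whose copy disagrees with the most common digest."""
    if not copies:
        return []
    digests = {holder: digest(data) for holder, data in copies.items()}
    # On a tie, Counter keeps the digest that was seen first.
    majority, _count = Counter(digests.values()).most_common(1)[0]
    return sorted(holder for holder, d in digests.items() if d != majority)
```

With only one copy there is nothing to compare against, which is the situation the slide describes for licensed journals with no paper back-up.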