

Besser--VALA 2/8/02 1

Moving from Isolated Digital Collections to Interoperable Digital Libraries

VALA 2002 Conference

Howard Besser

UCLA School of Education & Information

http://www.gseis.ucla.edu/~howard

Besser--VALA 2/8/02 2

Moving from Isolated Digital Collections to Interoperable Digital Libraries

- Digital Collections vs Digital Libraries -- What’s missing?
- Importance of Standards and New Metadata Models
- Best Practices & Standards for Managing Digital Projects
- Longevity
- Other issues remaining in order to create real Digital Libraries

Besser--VALA 2/8/02 3

Important parts of conventional Libraries

- Components
- Ethics & Traditions

Besser--VALA 2/8/02 4

Components

- Service to a clientele
- Stewardship over a collection
- Sustainability
- Ability to find material outside that collection

Besser--VALA 2/8/02 5

Ethics & Traditions

- Free speech
- Privacy
- Equal access to info
- Diversity of info
- Serving the underserved

Besser--VALA 2/8/02 6

Brief Digital Library Funding History

Stage  Date        Sponsor        What
I      1994        NSF/ARPA/NASA  Experiments
IIa    1998/99     NSF/ARPA/NASA  Begin to consider custodialship, sustainability, user communities
IIb    Late 1990s  CLIR           Further work on IIa issues
III    ?           ?              Real digital libraries

Besser--VALA 2/8/02 7

Moving from Digital Collections to Digital Libraries

- What’s the difference?
  – not experiments
  – real users
  – service
  – longevity

Besser--VALA 2/8/02 8

Traditional Digital Collection Model

[Diagram: four digital collections (DL), each with its own "search & presentation" layer between it and the users.]

Besser--VALA 2/8/02 9

Ideal Digital Collection Model

[Diagram: four digital collections (DL) reached by the users through a single shared "search & presentation" layer.]

Besser--VALA 2/8/02 10

Developmental Stages

- Experiment with methods
- Build real operational systems
- Build interoperable operational systems
- Make the system useful for users
  – For DL Initiatives
  – For OPACs
  – For I & A Services
  – For Image Retrieval

Besser--VALA 2/8/02 11

To move from Collections to Libraries, we need

- Standards & Metadata
- Sustainability
- Other issues involving components and ethics/traditions

Besser--VALA 2/8/02 12

For Interoperability, Digital Collections Need Standards

- Descriptive Metadata for consistent description
- Discovery Metadata for finding
- Administrative Metadata for viewing and maintaining
- Structural Metadata for navigation
- ... Terms & Conditions Metadata for controlling access...

Besser--VALA 2/8/02 13

Metadata is not just indexing terms

- CBIR attributes used for retrieval on color, shape, texture, etc.
- Structural attributes used for page-turning
- Administrative attributes used for managing a digital work over time
- IPR attributes to limit unauthorized use
- Identification attributes to determine what application software is needed to view a particular digital work
- Can be located anywhere

Besser--VALA 2/8/02 14

Why are Standards and Metadata consensus important?

- Managing digital files over time
- Longevity
- Interoperability
- Veracity
- Recording in a consistent manner
- Will give vendors incentive to create applications that support this

Besser--VALA 2/8/02 15

Moving to New Metadata Models

- Containers & Packages
- Qualifiers
- Crosswalks

Besser--VALA 2/8/02 16

Containers and Packages of Metadata

Warwick, not MARC

- modular
- overlapping
- extensible
- community-based
- designed for a networked world to aid commonality btwn communities while still providing full functionality within each community

Besser--VALA 2/8/02 17

DC Qualifiers

- allows one community to express important nuances and qualifications, while still making the basic importance available to communities with simple needs
- our community can reflect alternate title, transliterated title, and main title, yet they will all be found under a simple Web search under “title”
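
The way qualified titles can still be found under a plain "title" search can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dotted `element.qualifier` spelling, the sample record, and the helper names `base_element` and `find` are all hypothetical and are not taken from the Dublin Core specification or from the talk.

```python
def base_element(name: str) -> str:
    """Strip any qualifier: 'title.alternative' becomes 'title'."""
    return name.split(".", 1)[0]


def find(record: dict[str, str], element: str) -> list[str]:
    """Return every value whose unqualified element name matches `element`."""
    return [value for name, value in record.items()
            if base_element(name) == element]


# Hypothetical record; the dotted qualifier syntax is an assumption.
record = {
    "title": "Main title",
    "title.alternative": "Alternate title",
    "title.transliterated": "Transliterated title",
    "creator": "Some creator",
}

# A simple search on "title" sees all three title variants.
print(find(record, "title"))
```

A community with richer needs can keep reading the full qualified names, while a simple search tool only ever looks at the part before the dot.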

Besser--VALA 2/8/02 18

Crosswalks

- mapping btwn differing metadata structures
- eliminate the need for monolithic, universally adopted standards
- focus on flexibility and interoperability
- RDF-based metadata registries

Besser--VALA 2/8/02 19

Crosswalk Example

Columns: CDWA | Object ID | CIMI Schema | FDA | VRA Core Categories | USMARC | Dublin Core
(one block per CDWA category; empty cells omitted)

CDWA: OBJECT/WORK (core)
  FDA: Document Classification-Catalog Level (core); Document Classification-Group Type

CDWA: Object/Work-Type (core)
  Object ID: Type of Object
  CIMI Schema: objectName
  FDA: Document Classification-Document Type (core); Purpose-Purpose (Broad) (core); Purpose-Purpose (Narrow)
  VRA Core Categories: W1. Work Type
  USMARC: 655 Genre-Form
  Dublin Core: Type

CDWA: Object/Work-Components
  CIMI Schema: quantity
  FDA: Document Classification-Extent
  USMARC: 300a Physical Description-Extent

CDWA: ORIENTATION/ARRANGEMENT
  Dublin Core: Description

CDWA: TITLES OR NAMES (core)
  Object ID: Title
  CIMI Schema: objectTitle; bibliographicTitle
  FDA: Group/Item Identification-Repository Title; Group/Item Identification-Descriptive Title (core); Group/Item Identification-Inscribed Title
  VRA Core Categories: W2. Title
  USMARC: 24Xa Title and Title-Related Information
  Dublin Core: Title
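
A crosswalk of this kind can be treated as a lookup table. The Python sketch below uses only the Type and Title rows of the example; the data layout and the function names `translate` and `convert_record` are assumptions made for illustration, not part of any of the schemes named.

```python
from typing import Optional

# Each row describes one concept across several schemes.
# Values come from the Type and Title rows of the crosswalk example.
CROSSWALK = [
    {"CDWA": "Object/Work-Type", "VRA Core": "W1. Work Type",
     "USMARC": "655 Genre-Form", "Dublin Core": "Type"},
    {"CDWA": "Titles or Names", "VRA Core": "W2. Title",
     "USMARC": "24Xa Title and Title-Related Information",
     "Dublin Core": "Title"},
]


def translate(field: str, source: str, target: str) -> Optional[str]:
    """Map a field name from the source scheme to the target scheme."""
    for row in CROSSWALK:
        if row.get(source) == field:
            return row.get(target)
    return None


def convert_record(record: dict[str, str], source: str, target: str):
    """Rename the fields of a record; report fields with no mapping."""
    converted, unmapped = {}, []
    for field, value in record.items():
        mapped = translate(field, source, target)
        if mapped is None:
            unmapped.append(field)
        else:
            converted[mapped] = value
    return converted, unmapped


# Hypothetical VRA-style record moved into Dublin Core field names.
vra_record = {"W1. Work Type": "drawing", "W2. Title": "Untitled",
              "Local Note": "not in the crosswalk"}
print(convert_record(vra_record, "VRA Core", "Dublin Core"))
```

Reporting unmapped fields rather than dropping them silently reflects the point that a crosswalk trades completeness for interoperability.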

Besser--VALA 2/8/02 20

Best Practices & Standards for Managing Digital Projects

- Who will your users be?
- Best Practices Guidelines (CDL, MOA2)
- NISO/DLF Imaging Technical Standards
- Managing Multiple Image Files

Besser--VALA 2/8/02 21

Why are you Managing this Information?

- Organizational mission & type
- Users
- Uses

Besser--VALA 2/8/02 22

Scanning Best Practices

- Think about users (and potential users), uses, and type of material/collection
- Scan at the highest quality that does not exceed what the likely potential users, uses, and material call for
- Do not let today’s delivery limitations influence your scanning file sizes; understand the difference between digital masters and derivative files used for delivery
- Many documents which appear to be bitonal actually are better represented with greyscale scans
- Include color bar and ruler in the scan
- Use objective measurements to determine scanner settings (do NOT attempt to make the image good on your particular monitor or use image processing to color correct)
- Don’t use lossy compression
- Store in a common (standardized) file format
- Capture as much metadata as is reasonably possible (including metadata about the scanning process itself)
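
The gap between a digital master and a delivery derivative is easy to see with some arithmetic. The sketch below computes uncompressed image sizes; the page size (8 x 10 inches), the resolutions (600 and 72 dpi), and the bit depths are assumed example values, not recommendations from the talk.

```python
def uncompressed_bytes(width_in: float, height_in: float,
                       dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed size: pixel count times bits per pixel, in bytes."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


# Assumed example: an 8 x 10 inch original.
master_color = uncompressed_bytes(8, 10, 600, 24)    # 24-bit color master
master_grey = uncompressed_bytes(8, 10, 600, 8)      # 8-bit greyscale master
master_bitonal = uncompressed_bytes(8, 10, 600, 1)   # 1-bit bitonal master
derivative = uncompressed_bytes(8, 10, 72, 24)       # screen-size derivative

for label, size in [("color master", master_color),
                    ("greyscale master", master_grey),
                    ("bitonal master", master_bitonal),
                    ("72 dpi derivative", derivative)]:
    print(f"{label}: {size / 1_000_000:.1f} MB")
```

Under these assumptions the color master is roughly seventy times the size of the derivative, which is why the master is kept for the long term and smaller files are generated from it for delivery.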

Besser--VALA 2/8/02 23

Why Scale is important

Besser--VALA 2/8/02 24

Digital Object Behaviors

- Book example

Besser--VALA 2/8/02 25

Metadata Standards (from MOA2)

- Administrative Metadata
  – for enhancing resource management
- Structural Metadata
  – for reflecting internal hierarchies and relationships btwn parts
- Raw/Seared/Cooked

Besser--VALA 2/8/02 26

The number of variant forms of a work can be enormous

- different views of the same object
- different scans of the same photo
- different resolutions
- different compression schemes
- different compression ratios
- different file storage formats
- different details of the same image
- ...

Image Families

Besser--VALA 2/8/02 28

Identification/Provenance

- how to deal with different versions (browse, hi-res, medium res) derived from the same scan or different encoding schemes (TIFF, PICT, JFIF)
- Vocabulary Standards to express this
  – VRA Surrogate Categories
  – CIMI's “Image Elements”
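
One way to picture an image family is as a set of files that each record which file they were derived from. The Python sketch below is a hypothetical data structure for illustration only; the class and field names are assumptions and do not reproduce the VRA or CIMI vocabularies.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ImageFile:
    file_id: str
    role: str                           # e.g. "master", "hi-res", "browse"
    encoding: str                       # e.g. "TIFF", "JFIF"
    derived_from: Optional[str] = None  # file_id of the source file, if any


@dataclass
class ImageFamily:
    work_id: str
    files: dict[str, ImageFile] = field(default_factory=dict)

    def add(self, image: ImageFile) -> None:
        self.files[image.file_id] = image

    def provenance(self, file_id: str) -> list[str]:
        """Walk back from a file to the scan it ultimately came from."""
        chain: list[str] = []
        current = self.files.get(file_id)
        # The membership check guards against accidental cycles.
        while current is not None and current.file_id not in chain:
            chain.append(current.file_id)
            current = (self.files.get(current.derived_from)
                       if current.derived_from else None)
        return chain


# Hypothetical family: one scan with two derivatives.
family = ImageFamily(work_id="work-001")
family.add(ImageFile("scan1", "master", "TIFF"))
family.add(ImageFile("hi", "hi-res", "JFIF", derived_from="scan1"))
family.add(ImageFile("browse", "browse", "JFIF", derived_from="hi"))
print(family.provenance("browse"))
```

Keeping the derivation link on every file means a browse image can always be traced back to its master, even when the files are stored in different places.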

Besser--VALA 2/8/02 29

Digital Longevity

Serious Longevity Problems

- What we know from prior widespread digital file formats
- Images separating from their metadata
- Inaccessibility of software needed to view an image
- Inability to even decode the file format of an image

Besser--VALA 2/8/02 30

Digital Longevity

The Short Life of Digital Info: Digital Longevity Problems

- Disappearing Information
- The Viewing Problem
- The Scrambling Problem
- The Inter-relation Problem
- The Custodial Problem
- The Translation Problem

Besser--VALA 2/8/02 31

Digital Longevity:

The Viewing Problem

Digital Info requires a whole infrastructure to view it

Each piece of that infrastructure is changing at an incredibly rapid rate

How can we ever hope to deal with all the permutations and combinations?

Besser--VALA 2/8/02 32

Digital Longevity:

The Scrambling Problem

Dangers from:

- Compression to ease storage & delivery
- Container Architecture to enhance digital commerce

Besser--VALA 2/8/02 33

Digital Longevity:

The Inter-relation Problem

-Info is increasingly inter-related to other info

-How do we make our own Info persist when it points to and integrates with Info owned by others?

-What is the boundary of a set of information (or even of a digital object)?

Besser--VALA 2/8/02 34

Digital Longevity:

The Custodial Problem

- How do we decide what to save?
- Who should save it?
- How should they save it?
  – methods for later access: emulation, migration, etc.
  – issues of authenticity and evidence

Besser--VALA 2/8/02 35

Digital Longevity:

The Translation Problem

Content translated into new delivery devices changes meaning

  – A photo vs. a painting
  – If Info is produced originally in digital form in one encoded format, will it be the same when translated into another format?
  – Behaviors

Besser--VALA 2/8/02 36

Digital Longevity

Pieces of the Solution (1/2)

-We need to insist upon clearly readable standardized ways for digital objects to self-identify their formats

-We should discourage scrambling

-We need to better understand how information inter-relates to other Info, and what constitutes “boundaries” of Info objects

Besser--VALA 2/8/02 37

Digital Longevity

Pieces of the Solution (2/2)

-People and organizations wishing to make information persist need guidelines of how to go about doing it

-We need to better understand how translating from one storage or display format to another affects the meaning of a work

-We need to save the “behaviors” of a digital object, not just its “contents”

Besser--VALA 2/8/02 38

Digital Longevity

Metadata can be the first line of defense

Can tell you

  – where the file is (if you can’t find the file)
  – where more info about the file is (if you have the file but most other metadata has become separated)
  – what the file format is
  – what the compression scheme is
  – what application program and version is needed for the file
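
The list above can be read as the fields of a small record kept with each file. The Python sketch below pairs such a record with a check of the file's leading bytes against a few well-known format signatures. The record layout and the function names are assumptions for illustration; the signatures shown cover only TIFF, JPEG/JFIF, and GIF.

```python
from dataclasses import dataclass
from typing import Optional

# Leading-byte signatures for a few common image formats.
SIGNATURES = {
    b"II*\x00": "TIFF",        # little-endian TIFF
    b"MM\x00*": "TIFF",        # big-endian TIFF
    b"\xff\xd8\xff": "JPEG/JFIF",
    b"GIF87a": "GIF",
    b"GIF89a": "GIF",
}


def sniff_format(data: bytes) -> Optional[str]:
    """Guess the format from the first bytes, or None if unrecognized."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None


@dataclass
class FileMetadata:
    """Hypothetical record mirroring the bullet list above."""
    location: str        # where the file is
    more_info: str       # where more info about the file is
    file_format: str     # what the file format is
    compression: str     # what the compression scheme is
    application: str     # application program and version needed


def format_matches(meta: FileMetadata, data: bytes) -> bool:
    """True when the recorded format agrees with the file's own signature."""
    return sniff_format(data) == meta.file_format


# Hypothetical example values.
meta = FileMetadata("http://example.org/images/0001.tif",
                    "http://example.org/records/0001",
                    "TIFF", "none", "any TIFF 6.0 viewer")
print(format_matches(meta, b"II*\x00" + b"\x00" * 16))
```

When the recorded format and the file's own signature disagree, that is an early warning that the file and its metadata have drifted apart.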

Besser--VALA 2/8/02 39

Digital Longevity

Migration/Refreshing

- Impact on evidential value

Besser--VALA 2/8/02 40

Digital Longevity

Older Longevity Projects

http://sunsite.berkeley.edu/Longevity/

- CPA Task Force
- Getty “Time & Bits” Conference & Follow-ups
- Preservation experiments in US and Elsewhere
  – NEDLIB, CURL, Michigan, Pandora
- Internet Archive
- Long Now

Besser--VALA 2/8/02 41

Digital Longevity

Preservation Repositories: Open Archival Info System Model

- High-level reference model describing submission, organization and management, and continuing access
- Conceptual framework for different organizations to share discussions with a common language
- Producers, consumers, management, actual repository
- SIP, DIP, AIP
- AIP consists of data objects plus representation info (Content, Preservation Description, Packaging, Descriptive)
- Originally developed for Space Science community

Besser--VALA 2/8/02 42

Digital Longevity

Preservation Repositories: Projects based on OAIS Model

- CEDARS
- NEDLIB
- Pandora
- CDL
- OCLC/RLG Working Group on Preservation Metadata, Attributes of a Trusted Digital Repository, August 2001

Besser--VALA 2/8/02 43

Digital Longevity

OCLC/RLG Digital Repository Attributes

- Administrative responsibility
- Organizational viability
- Financial sustainability
- Technological suitability
- System security
- Procedural accountability

Besser--VALA 2/8/02 44

Digital Longevity

OCLC/RLG Selected Recommendations

- Policies, Certification processes, Risk management, Persistent ID, Migration/Emulation experiments
- Stakeholders meet to decide how to describe what is in a dig repository
- Examine special properties of particular classes of digital objects
- Technical standards for exchange and interoperability btwn repositories
- Develop projects and case studies
- Copyright issues

Besser--VALA 2/8/02 45

Digital Longevity

Preservation Metadata

OCLC/RLG Working Group on Preservation Metadata, Preservation Metadata for Digital Objects: A Review of the State of the Art, January 31 2001

OCLC/RLG Working Group on Preservation Metadata, A Recommendation for Content Information, October 2001

Besser--VALA 2/8/02 46

Digital Longevity

Other Digital Preservation Activities

- LC Natl Dig Info Infrastructure & Preservation
- InterPARES
- Emulation Projects
- E-Journal Archiving
- ERPANET
- Persistent Naming

Besser--VALA 2/8/02 47

Digital Longevity

LC’s National Digital Information Infrastructure and Preservation Program

- Authorized Dec 2000
- LC, Dept of Commerce, NARA, White House Office of Sci & Tech Policy
- with help from CLIR, NLM, NAL, OCLC, RLG
- Ongoing collab process
- Commissioned papers on preserving: the Web, periodicals, digital sound, E-Books, Digital TV, Digital Video

Besser--VALA 2/8/02 48

Digital Longevity

InterPARES: International Research on Permanent Authentic Records in Electronic Systems

- Ongoing international archival world project examining how to make electronically-generated records last over time
- Developing the theoretical and methodological knowledge needed, then will formulate model policies, strategies, and standards
- Next year will be extended to include images and rich media

Besser--VALA 2/8/02 49

Digital Longevity

Emulation Projects

- CAMiLEON (Michigan/Leeds)
- NEDLIB

Besser--VALA 2/8/02 50

Digital Longevity

E-Journal Archiving

- Issues
  – License, don’t own; may not even be able to obtain right to make archival copy
  – Increasingly no paper back-up at all
  – Usually we don’t have the important redundancy factor
- Mellon funded projects (2001)
  – Yale, Harvard, Penn working w/individual publishers
  – Cornell, NYPL--specific disciplines
  – MIT exploring characteristics that change (dynamic)
  – Stanford--archiving software tools

Besser--VALA 2/8/02 51

Digital Longevity

Electronic Resource Preservation and Access NETwork (ERPANET)

- Best practices and skills development for digital preservation of cultural heritage and scientific objects
- 3 year project launched Nov 2001; 1.2 million Euros

Besser--VALA 2/8/02 52

Digital Longevity

What’s special about Cultural Heritage Materials?

- Images & rich media
- Inter-relationships btwn parts
- For Contemporary Art: What is the Work?

Besser--VALA 2/8/02 53

Digital Longevity

One Final Longevity Question: Who will collect the digital works of today that should become the Special Collections of tomorrow?

- web sites
- zines
- electronic journals
- listserv and email discussions
- drafts of works that later become famous

Besser--VALA 2/8/02 54

Other Standards Issues

- Persistent Naming
- Making your works accessible throughout the Net
- Problems with works residing outside the library’s jurisdiction

Besser--VALA 2/8/02 55

Persistent IDs--the Problem

- Need to separate work ID from work location
- URNs probably won’t be ready until 2003
- Becomes a business process issue when one organization maintains the resource and another organization references it (i.e., licensed from vendors or managed by separate administrative structures)

Besser--VALA 2/8/02 56

Persistent Naming

- URNs
- Handles
- PURLs
- Re-directs
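
All of these approaches share one idea: references cite a stable identifier, and a resolver maps that identifier to wherever the work currently lives. The Python sketch below is a toy in-memory resolver written for illustration; the identifier syntax and the `example.org` URLs are assumptions, and it does not implement the URN, Handle, or PURL protocols.

```python
class Resolver:
    """Toy resolver: persistent ID -> current location."""

    def __init__(self) -> None:
        self._locations: dict[str, str] = {}

    def register(self, persistent_id: str, url: str) -> None:
        """Record (or update) where a work currently lives."""
        self._locations[persistent_id] = url

    def resolve(self, persistent_id: str) -> str:
        """Return the current location; raise KeyError if unknown."""
        return self._locations[persistent_id]


resolver = Resolver()

# Hypothetical identifier and locations.
resolver.register("example:work/123", "http://old.example.org/works/123")

# The work moves; only the resolver entry changes.
resolver.register("example:work/123", "http://new.example.org/objects/123")

# Anything that cited the identifier still reaches the work.
print(resolver.resolve("example:work/123"))
```

The technical part is small; the harder part, as the previous slide notes, is agreeing on who keeps the resolver entries up to date when one organization hosts the resource and another references it.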

Besser--VALA 2/8/02 57

Making your works accessible throughout the Net

- Open Archives & Metadata Harvesting
- An administrative and political issue as much as a technical one
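
In metadata harvesting, a service provider periodically asks each repository for its records over plain HTTP. The Python sketch below only builds such a request URL, using the `ListRecords` verb and `oai_dc` metadata prefix as defined in version 2.0 of the OAI Protocol for Metadata Harvesting, which postdates this talk. The base URL, the set name, and the function name are hypothetical, and nothing is fetched.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     set_spec: Optional[str] = None) -> str:
    """Build a ListRecords request URL for a repository's base URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # optional: restrict to one set
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository endpoint.
print(list_records_url("http://archive.example.org/oai"))
print(list_records_url("http://archive.example.org/oai", set_spec="images"))
```

The request itself is trivial, which supports the slide's point: the real work lies in getting institutions to expose their metadata at all.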

Besser--VALA 2/8/02 58

Problems with works residing outside the library’s jurisdiction

- Open URL
- Authentication

Besser--VALA 2/8/02 59

Digitization means New Audiences

- more access for more people
- outreach to new groups
- but new groups have different usability requirements
  – different user interfaces
  – different vocabulary
  – new methods of navigation
- we already have enough differences btwn different institution types (& even within the same type)
  – MESL results
  – Organization & indexing reflects the biases of the original intent when records were formed

Besser--VALA 2/8/02 60

Still Further Research

- Development of good tools to encourage use
- Seamless integration of Remote-source content with locally-scanned content
- Making specialized vocabulary more accessible to general audiences
- Building Adaptive delivery systems
- Understanding what really is the work

Besser--VALA 2/8/02 61

What Really is the Work?

- Artifact or informational content?
- Creator’s Intent (Gary Hill)
- With artistic works, sometimes it’s very difficult to determine what the work really is, what its boundaries are, etc. (more later if time remains)

Besser--VALA 2/8/02 62

LeWitt: Wall Drawing 340

Besser--VALA 2/8/02 63

Installing LeWitt

Besser--VALA 2/8/02 64

LeWitt Install Directions

Besser--VALA 2/8/02 65

ECI - Hole in Space (both)

Besser--VALA 2/8/02 66

ECI - 84-locations

Besser--VALA 2/8/02 67

ECI - 84-Community Memory

Besser--VALA 2/8/02 68

ECI - 84-kids

Besser--VALA 2/8/02 69

What do we need for Real Digital Libraries?

- Components
  – Service to a clientele
  – Stewardship over a collection
  – Sustainability
  – Ability to find material outside that collection
- Ethics & Traditions
  – Free speech
  – Privacy
  – Equal access to info
  – Diversity of info
  – Serving the underserved

Besser--VALA 2/8/02 70

Moving from Isolated Digital Collections to Interoperable Digital Libraries

http://www.getty.edu/gri/standard/intrometadata/

http://www.ifla.org/II/metadata.htm

http://sunsite.Berkeley.EDU/Imaging/Databases/

http://www.ucop.edu/irc/cdl/tasw/Current/current.html

http://sunsite.Berkeley.EDU/moa2/

http://sunsite.Berkeley.EDU/Longevity/

http://www.gseis.ucla.edu/~howard/image-meta.html

http://sunsite.berkeley.edu/Metadata/sp2000.html

http://www.gseis.ucla.edu/~howard/

http://is.gseis.ucla.edu/impact/f95/special-collectns.html

http://is.gseis.ucla.edu/us-interpares/

http://www.diglib.org/preserve/ejp.htm

http://www.longnow.com/10klibrary/TimeBitsDisc/

http://www.archive.org/