
Besser--UC Librarians Metadata 5/19/00 1

Introduction to Metadata for Digital Libraries

Howard Besser

UCLA School of Education & Information

http://www.gseis.ucla.edu/~howard

Metadata for Digital Libraries

- Models for Digital Libraries
- Importance of Metadata Standards
- Types and Uses of Metadata
- Discovery Metadata: The Dublin Core
- Administrative and Structural Metadata: The Making of America II Project
- Longevity Metadata
- Identification/Provenance
- The 4/99 NISO/DLF Image Metadata Workshop
- Various other Metadata

Key problems we’re facing

- Discovery
- Longevity
- Interoperability

Serious Longevity Problems

What we know from prior widespread digital file formats:

- Images separating from their metadata
- Inaccessibility of software needed to view an image
- Inability to even decode the file format of an image

Traditional Digital Library Model

[Diagram: users reach four separate digital libraries (DL), each through its own search & presentation layer]

Ideal Digital Library Model

[Diagram: users reach four digital libraries (DL) through a single shared search & presentation layer]

For Interoperability Digital Libraries Need Standards

- Descriptive Metadata for consistent description
- Discovery Metadata for finding
- Administrative Metadata for viewing and maintaining
- Structural Metadata for navigation
- ...
- Terms & Conditions Metadata for controlling access
- ...

Why are Standards and Metadata consensus important?

- Managing digital files over time
- Longevity
- Interoperability
- Veracity
- Recording in a consistent manner
- Will give vendors incentive to create applications that support this

Why Standards?

- Why do we need standards?
  – To make information universally available to users
  – To facilitate sharing and interchange of information
  – To preserve information (make it safe from changes in hardware and software)
- Standards only work if communities widely accept them, but they’re necessary for communities to work together

Why are you Managing this Information?

- Organizational mission & type
- Users
- Uses

Questions to Ask

- What communities is this standard designed for?
- What type of information is this standard designed to handle?
- What functions is this standard designed to serve?
- What previous standards is it built upon?
- Does the standard prescribe how to create new records (or parts of records), or how to map from existing records?
- How far does the standard go? Semantics: Does it define element sets? Rules? Syntax?

What is Metadata

- Structured data describing other data used to find or help manage information resources
- Aids in interoperability
- Titles, dates, captions, cataloging and indexing data, file headers, rights info, provenance, code books, transaction logs, ...
- One person’s metadata is another’s data

Sorting through the Standards Morass

- Data Structures (DC, CDWA, MARC, VRA Core, TEI, EAD, MESL data dictionary)
- Data Interchange (Z39.50)
- Data Values/vocabularies (LCSH, AAT, ULAN, TGN)
- Data Content/syntax (AACR2)

Semantics/Syntax/Structure

- Semantics
  – meaning, as defined by a community to meet their particular needs (DC)
- Syntax
  – a systematic arrangement of data elements for machine processing
  – facilitates the exchange and use of metadata among various applications (HTML, XML, RDF)
- Structure
  – a formal arrangement of the syntax with the goal of consistent representation of the semantics (rules defining field contents like 1/11/99)
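A minimal sketch of why rules for field contents matter, using the slide's own value 1/11/99. The two format strings are assumptions standing in for two communities' conventions (month-first and day-first); without an agreed rule the same string yields two different dates.

```python
from datetime import datetime

raw = "1/11/99"  # the field value quoted on the slide

# Assumed convention A: month/day/year
month_first = datetime.strptime(raw, "%m/%d/%y").date()

# Assumed convention B: day/month/year
day_first = datetime.strptime(raw, "%d/%m/%y").date()

# Writing the value in ISO 8601 form removes the ambiguity.
print(month_first.isoformat())
print(day_first.isoformat())
```

A structure rule such as "dates are always written year-month-day" is one way a community could make the field contents consistent.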

What is Metadata: Types & Uses

lots of different ways of dividing the clusters

Uses of Metadata

- Discovery & Retrieval
- Identification/Provenance
- Rights Management
- Viewing
- Integrity
- Longevity
- Content rating

Containers and Packages of Metadata

Warwick, not MARC

- modular
- overlapping
- extensible
- community-based
- designed for a networked world to aid commonality between communities while still providing full functionality within each community

Some different schemes where Metadata is kept

- embedded within the object (HTML tags)
- in a separate related DB maintained by same organization (OPAC, MOA II)
- in a separate DB maintained by a separate organization (Books in Print, ratings systems)
- derived on-the-fly from a different scheme (MARC-to-DC)

Collaborative Metadata Projects

- Dublin Core
- NSF/ERCIM Digital Collaboratory
- OCLC CORC Project
- Visual Resources Association (VRA) Core
- Encoded Archival Description (EAD)
- Computerized Interchange of Museum Information (CIMI)
- Records Export for Art and Cultural Heritage (REACH)

CORC--Cooperative Online Resource Catalog

- both bib records & webliographies (pathfinders)
- supports both AACR2/MARC and DC
- began 1/99, scheduled availability 7/00
- 100-200 participants
  – Academic libraries
  – OCLC networks, special libraries, public libraries, state & national libraries, consortia

Dublin Core (3/95)

- improve resource discovery
- anticipate precision problems of Web Crawler-based searching tools
- existing metadata could be “dumbed down”
- elements should be simple to understand and use, so that any individual should be able to assign terms him/herself
- software might eventually automatically generate very base-level metadata

Dublin Core

Title, Creator, Subject, Description, Publisher, Contributors, Date, Type,

Format, Identifier, Source, Language, Relation, Coverage, Rights

Dublin Core

- every element is both optional and repeatable
- elements are cross-disciplinary
- elements are extensible by organized communities
- can employ a syntax such as HTML’s <META> tagset for use by Spiders and Harvesters
  – May 2000 DLF Metadata Harvesting Project
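A rough sketch of embedding Dublin Core in HTML <META> tags. It assumes the common "DC."-prefixed name convention; the `record` contents and the `to_meta_tags` helper are hypothetical illustrations, not part of the Dublin Core specification. Each element maps to a list so that it can be omitted (optional) or carry several values (repeatable).

```python
from html import escape

# Hypothetical record: element name -> list of values.
# Missing keys model "optional"; multi-item lists model "repeatable".
record = {
    "Title": ["Introduction to Metadata for Digital Libraries"],
    "Creator": ["Howard Besser"],
    "Subject": ["metadata", "digital libraries"],
}

def to_meta_tags(record):
    """Return one <meta> tag per element value, escaping attribute text."""
    tags = []
    for element, values in record.items():
        for value in values:
            tags.append(
                '<meta name="DC.%s" content="%s">'
                % (escape(element, quote=True), escape(value, quote=True))
            )
    return tags

for tag in to_meta_tags(record):
    print(tag)
```

A spider or harvester that understands the convention can then pick the values out of the page head without any separate database.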

DC Qualifiers

- allows one community to express important nuances and qualifications, while still making the basic importance available to communities with simple needs
- our community can reflect alternate title, transliterated title, and main title, yet they will all be found under a simple Web search under “title”
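The "dumbing down" of qualified elements can be sketched as follows. The dotted names such as `Title.Alternative` and the `dumb_down` helper are assumptions made for illustration; the point is only that dropping the qualifier leaves every value findable under the plain element.

```python
def dumb_down(qualified_record):
    """Collapse qualified element names (e.g. 'Title.Alternative') to the base element."""
    simple = {}
    for name, values in qualified_record.items():
        element = name.split(".", 1)[0]  # keep only the part before the qualifier
        simple.setdefault(element, []).extend(values)
    return simple

# Hypothetical qualified record with three kinds of title.
qualified = {
    "Title.Main": ["Main title"],
    "Title.Alternative": ["Alternate title"],
    "Title.Transliterated": ["Transliterated title"],
    "Creator": ["Some creator"],
}

print(dumb_down(qualified))
```

A community with richer needs keeps the qualified record; a simple search service indexes only the collapsed one.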

Discovery Metadata: Recent History

- Dublin Core (3/95)
- Warwick Framework (4/96)
- Image Metadata Workshop (9/96)
- Canberra, Helsinki, ... DC (98)
- Digital Library Collaboratory (97-)
- DC-8, Frankfurt 10/99

Dublin Core--further work

- Warwick Framework
  – metadata packages for extensible functions
  – laid groundwork for RDF
- Canberra Qualifiers
  – refining the semantics of the element set to provide more precise info
  – SUBELEMENT, SCHEME, LANG
- Granularity
  – no hierarchical relationships within a given DC record; only one record per discrete object (collection or item-level), and relationship field plus qualifier links them

The Research Process and Functional Categories of Metadata

- Discovery
- Retrieval
- Collation
- Analysis
- Re-presentation

Making of America II

- Background of the DLF Project
- Administrative Metadata
- Structural Metadata

MOA2 Goal is Interoperability

Book example

DLF Metadata for Interoperability Testbed: the MOA II Project

- R & D
- Distributed Repositories
- Transportation, 1869-1900
- Testbed Project
- Best Practices
- Structural and administrative metadata

Previous Projects/Background

- Library Standards Background
- UC Berkeley Background
- Finding Aids
- EAD
- SGML EAD
- “Digital Archives”

MOA II Classes of Objects

- Continuous Tone Photos
- Photo Albums
- Diaries, journals, letterpress books
- Ledgers
- Correspondence

MOA II Metadata

- Administrative Metadata
  – for enhancing resource management
- Structural Metadata
  – for reflecting internal hierarchies and relationships between parts
- Raw/Seared/Cooked

MOA II Behaviors

- Navigation
- Display/Print

MOA II Best practices

- Use/Users/Collection: Benchmarking
- Masters vs. Derivatives
- Scanning
- Administrative Metadata
- Structural Metadata

Scanning Best Practices

- Think about users (and potential users), uses, and type of material/collection
- Scan at the highest quality that does not exceed the likely potential users/uses/material
- Do not let today’s delivery limitations influence your scanning file sizes; understand the difference between digital masters and derivative files used for delivery
- Many documents which appear to be bitonal actually are better represented with greyscale scans
- Include color bar and ruler in the scan
- Use objective measurements to determine scanner settings (do NOT attempt to make the image good on your particular monitor or use image processing to color correct)
- Don’t use lossy compression
- Store in a common (standardized) file format
- Capture as much metadata as is reasonably possible (including metadata about the scanning process itself)


Why Scale is important

Administrative Metadata: to uniquely identify a digital resource and manage it over time

- Information about where the various pieces/versions of the object reside
- Information to view the digital object
- Information about the scanning process

Structural Metadata: that which is relevant to presentation of the digital object to the user

- metadata defining the “object”: a book, a diary, a photo album
- metadata defining the “sub-objects”: pages (physical) or chapters and subheads (intellectual)
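One possible way to picture structural metadata is a small nested record: the object, its physical sub-objects (page images), and its intellectual sub-objects (chapters pointing at page ranges). All field names, file names, and the `pages_for_chapter` helper below are hypothetical; this is not the MOA II encoding.

```python
# Hypothetical structural metadata for one digitized book.
book = {
    "type": "book",
    # Physical structure: page images in reading order.
    "pages": ["page-001.tif", "page-002.tif", "page-003.tif", "page-004.tif"],
    # Intellectual structure: chapters as inclusive index ranges into "pages".
    "chapters": [
        {"title": "Chapter 1", "first_page": 0, "last_page": 1},
        {"title": "Chapter 2", "first_page": 2, "last_page": 3},
    ],
}

def pages_for_chapter(obj, title):
    """Return the page images belonging to the named chapter, or [] if unknown."""
    for chapter in obj["chapters"]:
        if chapter["title"] == title:
            return obj["pages"][chapter["first_page"]:chapter["last_page"] + 1]
    return []

print(pages_for_chapter(book, "Chapter 2"))
```

A viewer can use the same record both to turn pages in order and to jump straight to a chapter.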

SGML, XML, HTML

- TEI for structured humanities text
- EAD for Finding Aids

Other Types of Metadata

- Longevity
- Identification/Provenance
- Rights Management

The Short Life of Digital Info: Digital Longevity Problems

- Disappearing Information
- The Viewing Problem
- The Scrambling Problem
- The Inter-relation Problem
- The Custodial Problem
- The Translation Problem

The Viewing Problem

- Digital Info requires a whole infrastructure to view it
- Each piece of that infrastructure is changing at an incredibly rapid rate
- How can we ever hope to deal with all the permutations and combinations?

The Scrambling Problem

Dangers from:

- Compression to ease storage & delivery
- Container Architecture to enhance digital commerce

The Inter-relation Problem

- Info is increasingly inter-related to other info
- How do we make our own Info persist when it points to and integrates with Info owned by others?
- What is the boundary of a set of information (or even of a digital object)?

The Custodial Problem

- How do we decide what to save?
- Who should save it?
- How should they save it?
  – methods for later access: emulation, migration, etc.
  – issues of authenticity and evidence

The Translation Problem

- Content translated into new delivery devices changes meaning
  – A photo vs. a painting
  – If Info is produced originally in digital form in one encoded format, will it be the same when translated into another format?
  – Behaviors

Pieces of the Solution (1/2)

- We need to insist upon clearly readable standardized ways for digital objects to self-identify their formats
- We should discourage scrambling
- We need to better understand how information inter-relates to other Info, and what constitutes “boundaries” of Info objects
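Many existing file formats already self-identify through a signature in their first bytes. The sketch below checks a header against a few well-known signatures; the `MAGIC_NUMBERS` table and `identify_format` helper are illustrative names, and the table covers only a handful of formats.

```python
# Leading-byte signatures for a few common image formats.
MAGIC_NUMBERS = [
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
]

def identify_format(header):
    """Return a format name if the header bytes start with a known signature, else None."""
    for magic, name in MAGIC_NUMBERS:
        if header.startswith(magic):
            return name
    return None

print(identify_format(b"II*\x00\x08\x00\x00\x00"))
print(identify_format(b"no signature here"))
```

A file whose format cannot be recognized this way depends entirely on external metadata to say what it is.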

Pieces of the Solution (2/2)

- People and organizations wishing to make information persist need guidelines of how to go about doing it
- We need to better understand how translating from one storage or display format to another affects the meaning of a work
- We need to save the “behaviors” of a digital object, not just its “contents”

Metadata can be the first line of defense

Can tell you:

- where the file is (if you can’t find the file)
- where more info about the file is (if you have the file but most other metadata has become separated)
- what the file format is
- what the compression scheme is
- what application program and version is needed for the file
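The items above could be gathered into a single record so that gaps are easy to spot. This is only a sketch: the `LongevityMetadata` class, its field names, and the `missing_fields` helper are hypothetical and do not follow any particular standard.

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class LongevityMetadata:
    """Hypothetical record of what is needed to find and open a file later."""
    file_location: Optional[str] = None        # where the file is
    more_info_location: Optional[str] = None   # where more info about the file is
    file_format: Optional[str] = None          # what the file format is
    compression_scheme: Optional[str] = None   # what the compression scheme is
    application: Optional[str] = None          # application program needed
    application_version: Optional[str] = None  # version of that application

def missing_fields(record):
    """List the names of fields that have not been recorded."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]

record = LongevityMetadata(file_format="TIFF", compression_scheme="none")
print(missing_fields(record))
```

Running such a check when a file is ingested shows which questions would be unanswerable if the file were found on its own years later.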

Groups Working on the Big Longevity Problem

http://sunsite.Berkeley.EDU/Imaging/Databases/Longevity/

- CPA Task Force
- CPA Study Group
- Getty “Time & Bits” Conference
- Internet Archive
- Long Now


Migration/Refreshing

Impact on evidential value

Identification/Provenance (Images)

- The number of variant forms of a work can be enormous
- Image Families
- A digital image frequently has many layers of parentage
- Information about the parentage that can indicate the quality and veracity of the image (Dublin Core "Source" and "Relation")
- how to deal with different versions derived from the same scan or different encoding schemes
- Vocabulary Standards to express this

The number of variant forms of a work can be enormous

- different views of the same object
- different scans of the same photo
- different resolutions
- different compression schemes
- different compression ratios
- different file storage formats
- different details of the same image
- ...

Image Families

Identification/Provenance

- how to deal with different versions (browse, hi-res, medium res) derived from the same scan or different encoding schemes (TIFF, PICT, JFIF)
- Vocabulary Standards to express this
  – VRA Surrogate Categories
  – CIMI's “Image Elements”

NISO/DLF Image Metadata Workshop: Possible Goals

- Metadata fields
- Rules for Field Contents (authority control)
- Core set of necessary fields
- Syntax for expressing fields and contents (headers)

Image Metadata

Focus on Metadata that may prove helpful for:

- management
- use
- preservation
- ...

Image Metadata

Break-out Groups: Work Done

- Characteristics and Features of Images
- Image Production and Reformatting Features
- Image Identification and Integrity

Other Metadata

- Description of depiction/surrogate (What VRA calls its "Surrogate Categories")
- Description of original object
- Rights and Reproduction Information
- Location Information

Data Structures: The VRA Core

- 28 elements specifically for visual resource collections
- Work Description Categories
- Visual Document Description Categories
- http://www.oberlin.edu/~art/vra/dsc.html

VRA Core: Work Description Categories

- Work type
- Title
- Measurements
- Material
- Technique
- Creator
- Role
- Date
- Repository name
- Repository place
- Repository number
- Current site
- Original site
- Style/period/group/movement
- Nationality/culture
- Subject
- Related work
- Relationship type
- Notes

VRA Core: Visual Document Description Categories

- Visual document type
- Visual document format
- Visual document measurements
- Visual document date
- Visual document owner
- Visual document owner number
- Visual document view description
- Visual document subject
- Visual document source

Data Value Metadata (vocabularies)

- LCSH
- TGM
- AAT
- ULAN
- TGN
- VRA Core


LCSH

very general

Thesaurus for Graphic Materials

- designed for subject indexing of pictorial materials, particularly large general collections of historical images
- for cataloging and retrieval
- good for general audiences and broad approaches to the material
- TGM-I: Subject Terms & TGM-II: Genre and Physical Characteristic Terms
- http://lcweb.loc.gov/rr/print/tgm/toc.html

AAT

- 120,000 terms for describing objects, textual materials, images, architecture, and material culture from antiquity to present
- large and complex
- http://www.getty.edu/gri/vocabularies/

ULAN

- name authority
- http://www.getty.edu/gri/vocabularies/

Thesaurus of Geographic Names

- over 1 million records
- hierarchical and global throughout history
- most records include coordinates and descriptive notes

Metadata for Digital Commerce

- DOI
- <indecs>

<Indecs>

- formal structure for describing and uniquely identifying intellectual property itself, the people and businesses involved in its trading, and the agreements which they make about it (primarily for publishing, music, and visual arts)
- will develop high-level specifications for the services that will be required to implement a global IP trading system based on this <indecs> generic data model
- focus is on encoding rights at a high level, not on resource discovery
- likely to involve metadata schema registration and directory to allow interoperation of personal identifiers for rightsholders and users
- supported by EEC DG-13
- First meeting July 1999
- http://www.indecs.org/

Metadata Mapping

- Crosswalks
- Resource Description Framework (RDF)

Crosswalks

- mapping between differing metadata structures
- eliminate the need for monolithic, universally adopted standards
- focus on flexibility and interoperability
- RDF-based metadata registries

Crosswalk Example

Columns: CDWA; Object ID; CIMI Schema; FDA; VRA Core Categories; USMARC; Dublin Core

- CDWA: OBJECT/WORK (core)
  – FDA: Document Classification-Catalog Level (core); Document Classification-Group Type
- CDWA: Object/Work-Type (core)
  – Object ID: Type of Object
  – CIMI Schema: objectNAME
  – FDA: Document Classification-Document Type (core); Purpose-Purpose (Broad) (core); Purpose-Purpose (Narrow)
  – VRA Core Categories: W1. Work Type
  – USMARC: 655 Genre-Form
  – Dublin Core: Type
- CDWA: Object/Work-Components
  – CIMI Schema: quantity
  – FDA: Document Classification-Extent
  – USMARC: 300a Physical Description-Extent
- CDWA: ORIENTATION/ARRANGEMENT
  – Dublin Core: Description
- CDWA: TITLES OR NAMES (core)
  – Object ID: Title
  – CIMI Schema: objectTitle; bibliographicTitle
  – FDA: Group/Item Identification-Repository Title; Group/Item Identification-Descriptive Title (core); Group/Item Identification-Inscribed Title
  – VRA Core Categories: W2. Title
  – USMARC: 24Xa Title and Title-Related Information
  – Dublin Core: Title
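A crosswalk can be applied mechanically once it is written down as a lookup table. The sketch below uses two pairings from the USMARC and Dublin Core columns of the crosswalk example (24X $a to Title, 655 to Type); treating 245 as the representative 24X field, the dictionary-shaped record, and the sample values are all assumptions for illustration.

```python
# Hypothetical crosswalk table: USMARC field -> Dublin Core element.
# "245a" stands in for the 24X $a range named in the crosswalk example.
MARC_TO_DC = {
    "245a": "Title",
    "655": "Type",
}

def marc_to_dc(marc_record):
    """Derive a simple DC record from a MARC-like dict of field -> list of values."""
    dc = {}
    for field, values in marc_record.items():
        element = MARC_TO_DC.get(field)
        if element is None:
            continue  # no Dublin Core equivalent in this crosswalk
        dc.setdefault(element, []).extend(values)
    return dc

# Invented sample record; 300a has no DC target here, so it is dropped.
marc = {
    "245a": ["Sample photograph album"],
    "655": ["Photographs"],
    "300a": ["1 album"],
}

print(marc_to_dc(marc))
```

The dropped 300a value shows the usual cost of a crosswalk: detail that the target scheme has no place for is lost in the mapping.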

Resource Description Framework (RDF, spec released 2/99)

- W3C Metadata activity
- designed to move the Web beyond simple links to semantically-rich relationships between resources
- metadata application using XML as a common syntax for exchange and processing
- flexible architecture for managing diverse application-specific metadata packets that can be processed by machines
- associates resources, property types, and corresponding values
- http://www.w3.org/RDF/

RDF

- Resources (character strings, names, digital objects)
- Property (“is the author of”)
- Value
- resources+properties=relationships
- many different relationships can be reflected
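The resource/property/value model can be sketched with plain tuples, without any RDF library. The `Statement` type, the `urn:example:` identifiers, and the `values_of` helper are hypothetical names chosen for this illustration.

```python
from collections import namedtuple

# One statement = a resource, a property type, and a value.
Statement = namedtuple("Statement", ["resource", "property_type", "value"])

# Hypothetical statements; a value may itself name another resource.
statements = [
    Statement("urn:example:slides", "Creator", "Howard Besser"),
    Statement("urn:example:slides", "Title", "Metadata for Digital Libraries"),
    Statement("urn:example:slides", "Relation", "urn:example:handout"),
]

def values_of(statements, resource, property_type):
    """Return every value asserted for the given resource and property type."""
    return [
        s.value
        for s in statements
        if s.resource == resource and s.property_type == property_type
    ]

print(values_of(statements, "urn:example:slides", "Creator"))
```

Because the third statement's value is another resource identifier, the same flat list can express relationships between objects as well as simple descriptions.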

XML-encoded RDF

<?xml:namespace ns="http://www.w3.org/RDF/RDF" prefix="RDF" ?>
<?xml:namespace ns="http://purl.oclc.org/DC/" prefix="DC" ?>
<RDF:RDF>
  <RDF:Description>
    <DC:Creator>Howard Besser</DC:Creator>
  </RDF:Description>
</RDF:RDF>

Should you start building with RDF today?

- Tools are primitive
- Standard still likely to evolve

Metadata for Digital Libraries

Howard Besser

UCLA School of Education & Information

Baca, Murtha (ed). Introduction to Metadata, Los Angeles: Getty Information Institute, 1998

http://www.getty.edu/gri/standard/intrometadata/

http://sunsite.Berkeley.EDU/Imaging/Databases/#standards

http://sunsite.Berkeley.EDU/moa2/

http://sunsite.Berkeley.EDU/Longevity/

http://www.ifla.org/II/metadata.htm

http://purl.oclc.org/metadata/dublin_core/

http://purl.oclc.org/corc/

http://lcweb.loc.gov/ead/

http://www.gseis.ucla.edu/~howard/image-meta.html

http://www.gseis.ucla.edu/~howard/Metadata/UC-May00/

http://sunsite.berkeley.edu/Metadata/sp2000.html