Cornell CS 502

Metadata for the Web: Issues and Simple Answers

CS 502 – 20030219
Carl Lagoze – Cornell University

“Metadata is data about data”

Metadata is semi-structured data conforming to commonly agreed upon models, providing operational interoperability in a heterogeneous environment

Some untested hypotheses

• Metadata is useful for…
– People
– Machines

• More metadata is better

• (Semi) automated digital libraries and simple metadata

Some known facts

• Number and variety of metadata vocabularies will continue to increase

• The Tower of Babel is a franchise
– There is not one common view of reality

• “The one thing I know about metadata is that it is expensive” (Bill Arms)

• “I hate metadata projects because they make every other digital library project more expensive” (Michael Lesk)

Are metadata and data distinguishable?

• Objectivity?
• Intellectual property?
• Structure?
• Aboutness?

The fiction of classification

“…there is no classification of the universe that is not fictional and conjectural.”

Jorge Luis Borges

Lenses and Views

• All classification does and should provide a biased lens or view of reality

• Each view emphasizes certain characteristics and hides others

[Figure labels: Geospatial, Rights, Museum]

Reality is Complex

Created by: George Castaldo
Created on: 1994

Created by: Leonardo da Vinci
Created on: 1506

Relationship?

Objects are Related

IFLA Entity Model

Entities, Events, and Agents

[Figure labels: Photographer, Camera type, Software, Computer artist]

Haven’t we done metadata already?

What’s wrong with this model?

• Expensive
– Complex (even for its original goal?)
– Professional intervention (assumes single community of expertise)

• Monolithic
– One size fits all approach
– Reflects its centralized system origins

• Bias towards physical artifacts
– Fixed resources
– Incomplete handling of resource evolution and other resource relationships

• Anglo-centric

Web Challenge to Traditional Cataloging

• Scale

• Permanence

• Authenticity

• Organizational Context

• Custodial Control

• Variety

Internet Commons includes Multiple Communities

[Figure labels: Internet Commons; Scientific Data, Home Pages, Geo, Library, Museums, Commerce, Whatever...]

Metadata Takes Many Forms

• resource discovery
• document administration
• rights management
• content rating
• security and authentication
• archival status
• products and services
• database schemas
• process control or description

Metadata Challenges

• Accommodate multiple varieties of metadata
– community-specific functionality, creation, administration, access

• Tensions
– functionality and simplicity
– extensibility and interoperability
– human and machine creation and use

Interoperability has many facets

• Semantics
– Meaning/classification/ontology

• Models/Structure
– Entities and relationships

• Syntax
– Grammars to convey semantics and structure
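
The three facets can be illustrated with a minimal sketch. It assumes one shared meaning for the properties (`creator` and `date`, names borrowed from Dublin Core) and one shared structure (flat property/value pairs about a single resource), and varies only the syntax. The `record` wrapper element and the helper function names are hypothetical, chosen for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Semantics: the agreed meaning of "creator" and "date".
# Structure: a flat set of property/value pairs about one resource.
record = {"creator": "Leonardo da Vinci", "date": "1506"}


def to_json(rec):
    """Serialize the record using JSON syntax."""
    return json.dumps(rec, sort_keys=True)


def from_json(text):
    """Recover the property/value pairs from JSON syntax."""
    return json.loads(text)


def to_xml(rec):
    """Serialize the same record using XML syntax (hypothetical <record> wrapper)."""
    root = ET.Element("record")
    for name, value in sorted(rec.items()):
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Recover the property/value pairs from XML syntax."""
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}
```

Both serializations round-trip to the same pairs, so two systems that agree on semantics and structure can interoperate across syntaxes with a simple converter. Disagreement on semantics or structure is not fixed by a change of syntax.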

Warwick Framework: Containing Chaos

• Conceptual Architecture for metadata from the Warwick Metadata Workshop (DC-2)

• Conceptual architecture to support the specification, collection, encoding, and exchange of modular metadata

• Provide context for metadata efforts (including Dublin Core)
– avoids the “black-hole” of comprehensive element sets
– focuses interoperability issues at package level

Metadata Container

[Figure: a Container holding Packages labeled Dublin Core, MARC record, Indirect Reference, and Terms and Conditions; a URI label also appears]
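
The container/package idea can be sketched as a small data structure. This is an illustration under stated assumptions, not the Warwick Framework's specified encoding. The class names, the payload dictionaries, and the `http://example.org/terms` URI are hypothetical. Treating Terms and Conditions as the indirectly referenced package is also an assumption about the figure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Package:
    """A typed metadata package held directly inside a container."""
    kind: str                # e.g. "Dublin Core" or "MARC record"
    content: Dict[str, str]  # illustrative payload; each community defines its own format


@dataclass
class IndirectReference:
    """A package that lives elsewhere and is referenced by URI."""
    kind: str
    uri: str


@dataclass
class Container:
    """Aggregates packages without interpreting their contents."""
    packages: List[Union[Package, IndirectReference]] = field(default_factory=list)

    def add(self, package):
        self.packages.append(package)

    def kinds(self):
        """List the package types, in insertion order."""
        return [p.kind for p in self.packages]

    def find(self, kind):
        """Return every package of the requested type."""
        return [p for p in self.packages if p.kind == kind]


def build_example():
    container = Container()
    container.add(Package("Dublin Core", {"creator": "Leonardo da Vinci", "date": "1506"}))
    container.add(Package("MARC record", {"raw": "(placeholder MARC data)"}))
    # Hypothetical URI: the package content is managed by another party.
    container.add(IndirectReference("Terms and Conditions", "http://example.org/terms"))
    return container
```

A client that only understands Dublin Core can call `find("Dublin Core")` and ignore the rest, which is the sense in which interoperability is handled at the package level.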

Modularization Allows Distributed Management

• Communities of expertise (not software vendors) are responsible for:
– Semantics
– Registration
– Administration
– Access management
– Authority of data
– Sharing and Distribution

Realities of Web search and discovery

• Search systems are motivated by advertising
• Index coverage is unpredictable and limited
• Too much recall, too little precision
• Index spam abounds
• Resources (and their names) are volatile

Metadata: Part of a Solution

• Structured data about data
– helps to impose order on chaos
– enables automated discovery/manipulation

• Variety across various dimensions:
– specialization
– decentralization
– democratization

Web Metadata Models: Drill-Down Searching Paradigm

• Moving along a specificity spectrum
• Inter-domain vs. intra-domain terms, models, query mechanisms
• One size doesn't fit all
– Cognitive models of searching and browsing

Drill-down search paradigm

[Figure labels: Domain-independent view, Domain-specific view]

Metadata: Part of the problem

[Figure: cost vs. functionality, with labels AACR2/MARC, Google, Dublin Core]

Why hasn’t metadata worked on the Web?

• It's all about trust
• People are lazy
• Metadata is hard
• No perceived benefit
– “Reverse tragedy of the commons”

• No agreement on one way to describe things

• “Metacrap” - http://www.well.com/~doctorow/metacrap.htm
