metadata for the web a necessary evil?

35
Metadata for the Web A Necessary Evil? CS 431 – March 2, 2005 Carl Lagoze – Cornell University

Upload: kaipo

Post on 13-Jan-2016

29 views

Category:

Documents


2 download

DESCRIPTION

Metadata for the Web A Necessary Evil?. CS 431 – March 2, 2005 Carl Lagoze – Cornell University. “Metadata is data about data”. Metadata is semi-structured data conforming to commonly agreed upon models , providing operational interoperability in a heterogeneous environment. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Metadata for the Web A Necessary Evil?

Metadata for the WebA Necessary Evil?

CS 431 – March 2, 2005Carl Lagoze – Cornell University

Page 2: Metadata for the Web A Necessary Evil?

“Metadata is data about data”

Page 3: Metadata for the Web A Necessary Evil?

Metadata is semi-structured data conforming to commonlyagreed upon models, providing operational interoperability

in a heterogeneous environment

Page 4: Metadata for the Web A Necessary Evil?

Are metadata and data distinguishable?

• Objectivity?• Intellectual property?• Structure?• Aboutness?

Page 5: Metadata for the Web A Necessary Evil?

Some untested hypotheses

• Metadata is useful for…– People– Machines

• More metadata is better• Simple metadata created by non-experts (Joe

Sixpack and his smarter friends) is useful

Page 6: Metadata for the Web A Necessary Evil?

Metadata Quality as function of Creator Expertise

Creator Expertise (High to Low)

Me

tad

ata

Qu

alit

y (

Lo

w t

o H

igh

)

Hoped

Actual?

Page 7: Metadata for the Web A Necessary Evil?

Some known facts

• Number and variety of metadata vocabularies will continue to increase

• The Tower of Babel is a franchise– There is not one common view of reality

• “The one thing I know about metadata is that it is expensive” (Bill Arms)

• “I hate metadata projects because they make every other digital library project more expensive” (Michael Lesk)

Page 8: Metadata for the Web A Necessary Evil?

Metadata Triage

Page 9: Metadata for the Web A Necessary Evil?

Metadata Takes Many Forms

resourcediscovery

documentadministration

rightsmanagement

contentrating

security andauthentication

archivalstatus

products andservices

databaseschemas

process controlor description

Page 10: Metadata for the Web A Necessary Evil?

Metadata Challenges

• Accommodate multiple varieties of metadata– community-specific functionality, creation,

administration, access– Lowest common denominator

• Tensions– functionality and simplicity – extensibility and interoperability– human and machine creation and use

Page 11: Metadata for the Web A Necessary Evil?

The fiction of classification

…there is no classification of the universe that is not fictional and

conjectural.

Jorge Luis Borges

Page 12: Metadata for the Web A Necessary Evil?

Lenses and Views

• All classification does and should provide a biased lens or view of reality

• Each view emphasizes certain characteristics and hides others

GeospatialRights

Museum

Page 13: Metadata for the Web A Necessary Evil?

Reality is Complex

Created by:George Castaldo

Created on:1994

Created by:Leonardo da Vinci

Created on:1506

Relationship?

Page 14: Metadata for the Web A Necessary Evil?

Objects are Related

IFLA Entity Model

Page 15: Metadata for the Web A Necessary Evil?

Haven’t we done metadata already?

Page 16: Metadata for the Web A Necessary Evil?

What’s wrong with this model?

• Expensive– Complex (even for its original goal?) – Professional intervention (assumes single community

of expertise)

• Monolithic– One size fits all approach– Reflects its centralized system origins

• Bias towards physical artifacts– Fixed resources– Incomplete handling of resource evolution and other

resource relationships

• Anglo-centric

Page 17: Metadata for the Web A Necessary Evil?

Why hasn’t metadata worked on the Web?

• Its all about trust• People are lazy• Metadata is hard• No perceived benefit

– “Reverse tragedy of the commons”

• No agreement on one way to describe things

• “Metacrap” - http://www.well.com/~doctorow/metacrap.htm

Page 18: Metadata for the Web A Necessary Evil?

The fifteen Dublin Core Elements

Creator Title Subject

Contributor Date Description

Publisher Type Format

Coverage Rights Relation

Source Language I dentifi er

http://dublincore.org/usage/terms/dc/current-elements/http://dublincore.org

Page 19: Metadata for the Web A Necessary Evil?

A Pidgin for Digital Tourists

• Metadata is language• Dublin Core is a small and simple language -- a

pidgin -- for finding resources across domains.• Speakers of different languages naturally

"pidginize" to communicate– E.g., tourists using simple phrases to order beer

("zwei Bier bitte" "dva pivo prosim" "biru o san bai kudasi"...)

• We are all "tourists" on the global Internet.

Page 20: Metadata for the Web A Necessary Evil?

What is the Dublin Core (1)

• A simple set of properties to support resource discovery on the web (fuzzy search buckets)?

DomainIndependent

view

Page 21: Metadata for the Web A Necessary Evil?

What is Dublin Core (2)?

• An extensible ontology for resource desciption?

Gre

ate

r Fun

ction

ality

&

Cost

Page 22: Metadata for the Web A Necessary Evil?

Progressive Metadata Models:Drill-Down Searching Paradigm

• Moving along a specificity spectrum• Inter-domain vs. intra-domain terms, models,

query mechanisms

Page 23: Metadata for the Web A Necessary Evil?

Drill-down search paradigm

DomainIndependent

view

DomainSpecific

View

Page 24: Metadata for the Web A Necessary Evil?

What is the Dublin Core (3)?

• A cross-domain switchboard for interoperable metadata?

Switchboard

DublinCore

MARC

INDECSIMS

Page 25: Metadata for the Web A Necessary Evil?

Dublin Core Qualifiers

• From fuzzy buckets to more specific description

• Model of “graceful degradation”– Support both simplicity and specificity– Intra-domain and inter-domain semantics

Page 26: Metadata for the Web A Necessary Evil?

Varieties of qualifiers: Element Refinements

• Make the meaning of an element narrower or more specific.

• Narrowing implies an is a relationship – a "date created“ is a "date“– an "is part of relation“ is a "relation“

• If your software does not understand the qualifier, you can safely ignore it.

Page 27: Metadata for the Web A Necessary Evil?

Varieties of Qualifiers: Value Encoding Schemes

• Says that the value is– a term from a controlled vocabulary (e.g., Library of

Congress Subject Headings)– a string formatted in a standard way (e.g., "2001-05-

02" means May 3, not February 5)

• Even if a scheme is not known by software, the value should be "appropriate" and usable for resource discovery.

Page 28: Metadata for the Web A Necessary Evil?

A Grammar of Dublin Core

• http://www.dlib.org/dlib/october00/baker/10baker.html

• By design not as subtle as mother tongues, but easy to learn and extremely useful in practice

• Pidgins: small vocabularies (Dublin Core: fifteen special nouns and lots of optional adjectives)

• Simple grammars: sentences (statements) follow a simple fixed pattern...

Page 29: Metadata for the Web A Necessary Evil?

Example Dublin Core statements

• Resource has Title 'Grammar of Dublin Core'.• Resource has Creator 'Tom Baker'.• Resource has Subject 'Metadata'.• Resource has Relation http://foo.org/file.htm.

Page 30: Metadata for the Web A Necessary Evil?

Resource has property

DC:CreatorDC:TitleDC:SubjectDC:Date...

X

implied subject

impliedverb

one of 15properties

property value(an appropriateliteral)

Page 31: Metadata for the Web A Necessary Evil?

Resource has property

DC:CreatorDC:TitleDC:SubjectDC:Date...

X

implied subject

impliedverb

one of 15properties

property value(an appropriateliteral)

[optional qualifier]

[optional qualifier]

qualifiers(adjectives)

Page 32: Metadata for the Web A Necessary Evil?

Resource has Date "2000-06-13"Revised

ISO8601

Resource has Subject "Languages -- Grammar"LCSH

Page 33: Metadata for the Web A Necessary Evil?

Dumb-Down Principle for Qualifiers

• The fifteen elements should be usable and understandable with or without the qualifiers

• Qualifiers refine meaning (but may be harder to understand)

• Nouns can stand on their own without adjectives

• If your software encounters an unfamiliar qualifier, look it up -- or just ignore it!

• "has a“ relations break the model– E.g., a creator has a hair color

Page 34: Metadata for the Web A Necessary Evil?

Resource has Date "2000-06-13"Revised

ISO8601

Resource has Subject "Languages -- Grammar"LCSH

Test for “good““ qualifiers:cover and ask: -- Does the statement still make sense? -- Is it still correct?

Page 35: Metadata for the Web A Necessary Evil?

Resource has subjectaudience

Resource has creatoraffiliation

“Incorrect” Qualification

“Cornell University”

“pre-schoolers”