William Y. Arms
Corporation for National Research Initiatives
March 22, 1999
Object models, overlay journals, and virtual collections
William Y. Arms
Department of Computer Science, Cornell University
March 22, 1999
Object models, overlay journals, and virtual collections
Physical and Logical Views of Information
Physical view:
Data structures, files, directories, servers
Publishers, libraries, web sites
Logical view:
Works, expressions, manifestations, items
Object models (document models)
Overlay journals
Virtual collections
Work
The underlying abstraction.
Examples
• Homer's The Iliad.
• Beethoven's Fifth Symphony.
• The Unix operating system.
Expression
A work is realized through an expression.
Examples
• The Iliad was first expressed orally, then it was written down as a fixed sequence of words.
• Beethoven's Fifth Symphony can be expressed as a printed score or by any one of many performances.
• The Unix operating system has separate expressions as source code and machine code.
Works and Expressions
Many works are realized through a single expression.
Examples
• The poem, The Road Not Taken by Robert Frost.
• The picture:
In such examples, there is no practical distinction between expression and work.
Manifestations
An expression is given form in one or more manifestations.
Examples
• The text of The Iliad has been manifest in numerous manuscripts and printed books.
• A musical performance can be distributed on CD, or broadcast on television.
• Software is manifest as files, which may be stored or transmitted in any digital medium.
Items
When many copies are made of a manifestation, each is a separate item.
Examples
• A specific copy of a book.
• A copy of a computer file.
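The four levels above nest naturally, which a small data structure can make concrete. The following is a minimal sketch, not a definitive model: the class names mirror the terms on the slides, but the field names (`title`, `description`, `form`, `location`) and the sample copies are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    """One concrete copy of a manifestation."""
    location: str  # hypothetical field: where this copy lives


@dataclass
class Manifestation:
    """A form given to an expression (printed book, CD, file)."""
    form: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    """A realization of a work (a fixed text, a performance)."""
    description: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    """The underlying abstraction."""
    title: str
    expressions: List[Expression] = field(default_factory=list)

    def count_items(self) -> int:
        """Count every copy reachable from this work."""
        return sum(
            len(m.items)
            for e in self.expressions
            for m in e.manifestations
        )


# Illustrative data only; the copies listed here are invented.
iliad = Work(
    title="The Iliad",
    expressions=[
        Expression(
            description="fixed sequence of words",
            manifestations=[
                Manifestation("printed book", [Item("copy 1"), Item("copy 2")]),
                Manifestation("manuscript", [Item("archive copy")]),
            ],
        ),
        Expression(description="oral recitation"),
    ],
)

print(iliad.count_items())
```

A work with a single expression, such as the Frost poem, would simply have a one-element `expressions` list, which is why the distinction disappears in practice for such cases.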
Beyond Simple Documents
Many digital objects are more than static files of data.
Dynamic objects: What is presented to the user depends upon the execution of computer programs or other external activities.
Complex objects: Objects are made up from many inter-related elements.
Alternate disseminations: Digital objects may offer the user a choice of access methods.
Databases: A database comprises many alternative records, with different records selected each time the database is accessed.
Object Models and Structural Types
Web object
Digitized materials
• Digitized image
• Set of digitized page images
• Marked-up text with page images
• Digitized audio recording
Sets
• Set of digital objects
• Searchable set of digital objects
Object Model: Digitized Image
Data
Several manifestations:
• Thumbnail image
• Reference image
• Archival image
Metadata
Each manifestation may have its own metadata
Object Model: Digitized Image
Identifier: hdl:loc.ndlp/amrlp.1234567
Data:
• thumbnail (gif)
• reference (jpg)
• archive (jpg)
Metadata: object metadata
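The digitized image model can be sketched as one identified object holding several manifestations plus object-level metadata. This is an illustrative sketch only: the identifier and the three roles with their formats come from the slide, while the class names, the `disseminate` method, and the metadata keys and values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ImageManifestation:
    role: str          # "thumbnail", "reference", or "archive"
    media_format: str  # "gif" or "jpg" on the slide
    metadata: Dict[str, str] = field(default_factory=dict)  # per-manifestation


@dataclass
class DigitizedImage:
    identifier: str
    object_metadata: Dict[str, str] = field(default_factory=dict)
    manifestations: Dict[str, ImageManifestation] = field(default_factory=dict)

    def add(self, manifestation: ImageManifestation) -> None:
        """Register a manifestation under its role."""
        self.manifestations[manifestation.role] = manifestation

    def disseminate(self, role: str) -> ImageManifestation:
        """Return the manifestation for the requested role."""
        return self.manifestations[role]


image = DigitizedImage(
    identifier="hdl:loc.ndlp/amrlp.1234567",
    object_metadata={"note": "placeholder object metadata"},  # invented value
)
image.add(ImageManifestation("thumbnail", "gif"))
image.add(ImageManifestation("reference", "jpg"))
image.add(ImageManifestation("archive", "jpg", {"note": "placeholder"}))

print(image.disseminate("thumbnail").media_format)
```

Keeping metadata at both levels reflects the point that each manifestation may carry its own metadata in addition to the metadata for the object as a whole.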
Object Model: Set of Digitized Page Images
Data
Each page:
separate image
Metadata
Structure of work:
• Page sequence
• Page numbers
• Special pages
Object Model: Set of Digitized Page Images
Identifier: hdl:loc.ndlp/amrlp.13579
Data:
• page 1 (gif)
• page 2 (gif)
• page 3 (gif)
Metadata: page map
Page Map
A page map relates the page images to the structure of the information, e.g.:
• List of pages
• Numbers printed on pages
• Blocking of information on pages (columns, figures)
• Sequences of information across pages
A page map is metadata for a specific manifestation.
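One way to picture a page map is as an ordered list of entries, each tying a page image to the number printed on it. The sketch below covers only the page list, printed numbers, and special pages; blocking and cross-page sequences are omitted. All names (`PageEntry`, `PageMap`, the file names, and the sample numbering) are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PageEntry:
    image_file: str                # hypothetical file name of the page image
    printed_number: Optional[str]  # number printed on the page, if any
    special: Optional[str] = None  # e.g. "title page"


@dataclass
class PageMap:
    pages: List[PageEntry]  # stored in page sequence

    def image_for_printed_number(self, number: str) -> Optional[str]:
        """Find the image that shows the page with this printed number."""
        for entry in self.pages:
            if entry.printed_number == number:
                return entry.image_file
        return None

    def position(self, image_file: str) -> int:
        """Return the 1-based position of an image in the page sequence."""
        for index, entry in enumerate(self.pages, start=1):
            if entry.image_file == image_file:
                return index
        raise KeyError(image_file)


# Invented example: an unnumbered title page, a roman-numbered page,
# then the first arabic-numbered page.
page_map = PageMap(
    pages=[
        PageEntry("page1.gif", None, special="title page"),
        PageEntry("page2.gif", "i"),
        PageEntry("page3.gif", "1"),
    ]
)

print(page_map.image_for_printed_number("1"))
print(page_map.position("page3.gif"))
```

The example shows why the map is needed: the page printed as "1" is the third image in the sequence, so printed numbers and image order cannot be assumed to coincide.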
The NSF SMETE Library
Soon, all scientific and engineering information will be available on-line:
• Journals, reports, papers, standards, patents
• Data sets, instruments, sensors
• Computer programs, simulations, designs
• Maps, images, films
• ... etc., etc., etc.
The Instructor's Wish List
To discover materials and services:
• Good science
• Comprehensible to students -- effective for teaching
• Stable -- will not change or disappear
Access to collections and services that are provided by many independent organizations:
• No uniform catalog or index to everything
• Mixture of for-profit and open access information
Overlay Journals
Contents of Journal I
Articles in Repository A
Articles in Repository B
Contents of Journal II
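The overlay idea in this diagram can be sketched as journals that hold only references, while the articles themselves stay in the repositories. This is a minimal sketch under assumed names: `ArticleRef`, `table_of_contents`, the article identifiers, and the titles are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ArticleRef:
    """A pointer to an article held elsewhere; the journal stores no content."""
    repository: str
    article_id: str


# Hypothetical repositories mapping article identifiers to titles.
repositories: Dict[str, Dict[str, str]] = {
    "Repository A": {"a1": "Article one", "a2": "Article two"},
    "Repository B": {"b1": "Article three"},
}

# Each journal's contents is just an ordered list of references.
journal_i: List[ArticleRef] = [
    ArticleRef("Repository A", "a1"),
    ArticleRef("Repository B", "b1"),
]
journal_ii: List[ArticleRef] = [
    ArticleRef("Repository A", "a2"),
    ArticleRef("Repository B", "b1"),
]


def table_of_contents(
    journal: List[ArticleRef],
    repos: Dict[str, Dict[str, str]],
) -> List[str]:
    """Resolve each reference against its repository to list the titles."""
    return [repos[ref.repository][ref.article_id] for ref in journal]


print(table_of_contents(journal_i, repositories))
print(table_of_contents(journal_ii, repositories))
```

Because the journals hold references rather than copies, the same article (here `b1`) can appear in both journals without being duplicated.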
Overlay Journals with Preprint Servers
Contents of Journal I
Research Web site
Preprint server
Contents of Journal II
CoRR
Cornell CS Reports
Metadata for Virtual Collections
Reference linking
• Identifiers (URLs, URNs, ...)
• Citations and reverse citations
Information discovery
Cataloguing and indexing
Object models
• Structural types
• Disseminators
Indexing and Cataloguing
Conventional cataloguing and indexing: Skilled professionals, following quality guidelines.
Web spiders and gatherers: Programs that gather information and build indexes (e.g., Infoseek, Harvest).
Metadata in publishing: Addition of metadata by the creator to aid automatic indexing (e.g., Dublin Core).
Content extraction: Indexing using structured text, speech recognition, or image content.
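As a rough illustration of automatic indexing from creator-supplied metadata, the sketch below builds a small inverted index over records with Dublin Core style fields (`title`, `creator`, `subject`). The records, identifiers, and function names are invented, and the word-splitting rule is a simplifying assumption rather than how any particular indexing service works.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# Invented sample records keyed by a hypothetical identifier.
records: Dict[str, Dict[str, str]] = {
    "doc-1": {
        "title": "Overlay journals and preprint servers",
        "creator": "Example Author",
        "subject": "digital libraries",
    },
    "doc-2": {
        "title": "Page maps for digitized books",
        "creator": "Another Author",
        "subject": "digital libraries; metadata",
    },
}


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(recs: Dict[str, Dict[str, str]]) -> Dict[str, Set[str]]:
    """Map each term found in any metadata field to the records containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for identifier, fields in recs.items():
        for value in fields.values():
            for term in tokenize(value):
                index[term].add(identifier)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the records that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


index = build_index(records)
print(sorted(search(index, "digital libraries")))
print(sorted(search(index, "page maps")))
```

The same index structure could be fed by any of the four approaches listed above; only the source of the terms changes.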