Self-Preserving Digital Objects Michael L. Nelson mln@cs.odu.edu http://www.cs.odu.edu/~mln/ Several Slides from Terry L. Harrison University of Southern California 6/15/04


Self-Preserving Digital Objects

Michael L. Nelson

mln@cs.odu.edu

http://www.cs.odu.edu/~mln/

Several Slides from Terry L. Harrison

University of Southern California

6/15/04


Outline

• History

• Preservation

• Archives vs. Objects

• Smart Objects & Dumb Archives

• Self-Preserving Objects

My DL History

• 1992 - work begins on the first-generation Langley Technical Report Server (LTRS)

• 1993 - WWW version of LTRS
  • http://techreports.larc.nasa.gov/ltrs/
  • work w/ ODU on WATERS

• 1994 - NASA Technical Report Server (NTRS)
  • distributed searching of many “LTRS-like” servers (20 separate nodes, all NASA centers)
  • http://techreports.larc.nasa.gov/cgi-bin/NTRS

• 1996 - NACA Technical Report Server (NACATRS)
  • http://naca.larc.nasa.gov/

• 1996 - Joint research in DLs with ODU begins

• 1997 - NCSTRL+ (clustering, buckets)

• 1999 - OAI-PMH development begins

• 2001 - Arc, DP9, Archon, Kepler, etc.

• 2002 - OAI-PMH version of the NTRS
  • http://ntrs.nasa.gov/

History

• ca. 1994 - 1995: a LaRC researcher, upon seeing LTRS, remarked:

  “all of these reports are nice, but what we really want is the data...”

• ca. 1995 - present: many reports in LTRS start to include data files, appendices, software and other information types

• NACATRS: the scanned nature of the reports implies that 1 report = N files

  N >= (pages * 3) + 2
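As a quick worked example of the lower bound above, the following sketch evaluates it for a given page count. The function name `min_files` is hypothetical, and the slide does not say what the three files per page or the two extra files are, so the code only restates the arithmetic.

```python
def min_files(pages: int) -> int:
    """Lower bound on files per scanned report, per the slide: N >= (pages * 3) + 2."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages * 3 + 2


if __name__ == "__main__":
    # A 20-page scanned report needs at least this many files.
    print(min_files(20))
```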

NASA STI

• Formal publications cover a decreasing percentage of NASA’s STI output
  – most DLs focus only on formal publications

• Informal STI is maintained only by a network of collegial distribution
  – aging and shrinking workforce weakens this network

• Customers want much more than formal publications
  – rather than stretch the meaning of “report” or “document”, define a new object for DL transactions

STI Observations

• Media formats are instantiations of a more general class of information

• Most DLs are uni-format, following the obsolete media boundaries of their non-digital predecessors

• “Separate but equal” DLs considered harmful
  – customer should not have to re-integrate what should never have been de-integrated...
  – institutional knowledge being lost because we don’t have a publishing vector established

Pyramid of Scientific and Technical Information (STI)

[Figure labels: Journal Articles; Conference Papers; Technical Reports; time; software; raw data; notes; video / images]

Information is created in a variety of formats. Formal publications, the focus of most DL projects, are supported by a pyramid of informal information.

Information Lost Over Time

[Figure labels: Project; manuscript; software; raw data; images; library; ftp site; thrown away; filing cabinet; User; New Project]

Figure 7: STI Lost in Project / Archival / Reuse Process


Content is King

The information content is more important than the systems used for its storage, management and retrieval

Objects should not be “locked” in specific DLs or archives

Prelude to OAI…

• I met Herbert Van de Sompel in April 1999...
  – we spoke of a demonstration project he had in mind, for which he had received sponsorship from Paul Ginsparg and Rick Luce
  – We wanted to demonstrate a multi-disciplinary DL that leveraged the large number of high quality, yet often isolated, tech report servers, e-print servers, etc.
    • most digital libraries (DLs) had grown up along single disciplines or institutions
      – little to no interoperability; isolated DL “gardens”

Universal Preprint Service

• A cross-archive DL that provides services on a collection of metadata harvested from multiple archives
  – Nelson: NCSTRL+; a modified version of Dienst
    • support for “clustering”
    • support for “buckets”
  – Krichel: ReDIF metadata format
  – Van de Sompel: SFX Linking

• Demonstrated at Santa Fe NM, October 21-22, 1999
  – http://web.archive.org/web/*/http://ups.cs.odu.edu/
  – D-Lib Magazine, 6(2) 2000 (2 articles)
    • http://www.dlib.org/dlib/february00/02contents.html
  – UPS was soon renamed the Open Archives Initiative (OAI) http://www.openarchives.org/

UPS Participants

Archive / DL                         Records in DL   Buckets in UPS   Buckets Linked to Full Content
arXiv (www.arxiv.org)                       128943            85204                            85204
CogPrints (cogprints.soton.ac.uk)              743              742                              659
NACA (naca.larc.nasa.gov)                     3036             3036                             3036
NCSTRL (www.ncstrl.org)                      29680            25184                             9084
NDLTD (www.ndltd.org)                         1590             1590                              951
RePEc (netec.mcc.ac.uk)                      71359            71359                            13582
Totals:                                     235361           187115                           112516

totals ca. July 1999


Buckets: Information Surrogates in UPS

• Limitations on intellectual property, file size, transmission time, system load, etc. caused us to focus on metadata only

• Metadata was collected into “buckets”, with pointers back to the data files (still at the original sites)

Value Added Services Attached to the Buckets

SFX Reference Linking Service, developed at Univ of Ghent, Belgium. - provides a layer of indirection between reference services available at a local site and the object itself

SFX “buttons” are attached to the buckets themselves - communication occurs between SFX server and the bucket

Adding other services to the buckets is easy...

Data and Service Providers

• Data Providers
  – publishing into an archive
  – Self-describing archives
    • Much of the learning about the constituent UPS archives occurred out of band…
  – providing methods for metadata “harvesting”
    • provide non-technical context for sharing information also

• Service Providers
  – harvest metadata from providers
  – implement user interface to data

Even if these are done by the same DL, these are distinct roles

Metadata Harvesting

• Move away from distributed searching

• Extract metadata from various sources

• Build services on local copies of metadata
  – data remains at remote repositories

[Figure: a user searches for “cfd applications” against a local copy of metadata; metadata is harvested offline from each node. Annotations: each node independently maintained; all searching, browsing, etc. performed on the metadata here; individual nodes can still support direct user interaction.]

Result… OAI

• The OAI was the result of the demonstration and discussion during the Santa Fe meeting
  – OAI = a bunch of people, a religion, a cult, etc.
  – OAI Protocol For Metadata Harvesting (OAI-PMH) = the protocol created and maintained by the OAI

• Initial focus was on federating collections of scholarly e-print materials…

• …however, interest grew and the scope and application of OAI-PMH expanded to become a generic bulk metadata transport protocol

• Note:
  – OAI-PMH is only about metadata -- not full text!
    • but what is metadata vs. full-text?
  – OAI is neutral with respect to the nature of the metadata or the resources the metadata describes
    • read: commercial publishers have an interest in OAI-PMH too...
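To make the harvesting model concrete, here is a minimal sketch of the service-provider side: it builds an OAI-PMH `ListRecords` request URL and pulls identifiers and Dublin Core titles out of a response. The `ListRecords` verb, the `oai_dc` metadata prefix, the `resumptionToken` parameter and the two XML namespaces come from the OAI-PMH 2.0 specification; the base URL, the helper names and the sample response are made up for illustration, and no network request is made.

```python
import urllib.parse
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request; a resumption token replaces the other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_records(xml_text):
    """Return ([(identifier, [titles])], resumption_token_or_None) from a response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(f"{{{OAI_NS}}}record"):
        ident = rec.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")
        titles = [t.text for t in rec.iter(f"{{{DC_NS}}}title")]
        records.append((ident, titles))
    token = root.findtext(f".//{{{OAI_NS}}}resumptionToken")
    return records, (token or None)


# Hypothetical response fragment, trimmed to the elements the parser reads.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample report</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>token-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("http://example.org/oai"))
    print(parse_records(SAMPLE))
```

A harvester would call `list_records_url` again with the returned token until no token comes back, storing the metadata locally so that searching never touches the remote repositories.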

A Look Back at UPS

• Primary outcome of the meeting was the OAI & OAI-PMH

• Krichel: ReDIF metadata:
  – still in use & being developed

• Van de Sompel: SFX
  – OpenURL (NISO Standard)
  – SFX is a commercial OpenURL resolver marketed by Ex Libris

• Nelson:
  – NCSTRL+ begat Arc (arc.cs.odu.edu) and others
  – Buckets?

Componentized Digital Libraries

[Figure labels: . . . ; RSS; SRW; !?]

Preservation

• RLG Report: Preserving Digital Information: Final Report and Recommendations
  – http://www.rlg.org/ArchTF/
  – refreshing - moving to new media
    • considered (comparatively) easy
  – migrating - transitioning to new systems, formats, idioms
    • considered hard

Really Long Term Preservation

• Migration is very hard, to be sure
  – but given sufficient demand, this can be accomplished
  – cf. early 1980s game emulation:
    • http://www.intellivisionlives.com/
    • http://stella.atari.org/

• Refreshing may actually be harder…
  – or at least intrinsically bound to the migration problem
    • http://web.archive.org/web/19980128071544/http://www.usc.edu/
    • http://web.archive.org/web/*/http://library.usc.edu/
    • http://web.archive.org/web/19971210220634/http://lib-www.lanl.gov/

Preservation Metrics So Far

– Nelson & Allen
  • 3% decay of objects in DLs
    – http://www.dlib.org/dlib/january02/nelson/01nelson.html

– Lawrence, et al.
  • 3% decay of URLs included in technical papers
    – http://www.neci.nec.com/~lawrence/papers/persistence-computer01/bib.html

– Koehler
  • ~ 33% of URLs “unstable” or “partially unstable”
    – http://InformationR.net/ir/4-4/paper60.html

– Kahle
  • average URL lasts 44 days
    – http://www.hackvan.com/pub/stig/articles/trusted-systems/0397kahle.html

– Spinellis
  • 28% loss of 5-8 year old URLs from CACM / IEEE Computer
    – http://citeseer.ist.psu.edu/spinellis03decay.html

Case Study: ICASE

• Institute for Computer Applications in Science and Engineering
  – independent research institute affiliated with NASA Langley Research Center
    • www.icase.edu
  – years of operation: 1972-2002
  – combined with other LaRC institutes, rolled into the National Institute of Aerospace (NIA)

• ICASE Report Series
  – pre-prints/e-prints of all ICASE affiliated authors
    • also issued as NASA Contractor Reports
  – Dienst was used for report management & workflow
    • Harrison, Zubair & Nelson, JCDL 03, Dienst <-> OAI-PMH gateway


NIA Transition

• At first, all files at www.icase.edu were lost

• then, the site was brought back online

• but how well do DLs survive bulk-transfer?


Whither the ICASE Digital Library?

it appears to be reinstated… but not completely…

How Long is Forever?

• Average human life span (from: http://www.che.uc.edu/acs/archives/cintacs/vol39no5/vol39no5.html)
  – female: 78
  – male: 77

• Average Fortune 500 company lifespan (from: http://www.businessweek.com/chapter/degeus.htm)
  – 40 - 50 years

• Universities?

• U.S. Government agency or institution?
  – what about individual labs?
    • NASA Zero Base Review
    • U.S. Military BRAC

Self-Preservation

• Objects should be prepared to outlive the people & institutions that are charged with their well-being

• Many areas of risk:
  – company, agency, university, etc. ceases to exist
  – funding cut
  – person dies
  – disaster (hurricane, earthquake, etc.)
  – malicious attack

P2P Model

• Applicable for scientific and technical information?
  – Napster, Gnutella, etc. rely on the repetitive nature of popular culture media (songs, movies, etc.) to ensure the availability of items
  – a “bubble” of recent and popular interest

• this assumption is probably not valid in STI DLs
  – cf. popularity(HBO) >> popularity(AMC)

Smart Objects, Dumb Archives

[Figure labels: DA; Guildford Protocol; OAI-PMH; Buckets; ???; Fedora? METS?]

“Key Concepts in the Architecture of the Digital Library”

• next 9 slides taken from Bill Arms’s seminal article in the inaugural issue of D-Lib Magazine:
  – http://www.dlib.org/dlib/July95/07arms.html

The technical framework exists within a legal and social framework

• DLs no longer represent systems specific to academics or information specialists
  – content influences how the DL is used

• architecture must allow the implementation of various policies

Understanding of digital library concepts is hampered by terminology

• “common English” != “professional English”
  – multiple professional jargons too

• What do these words mean to you?
  – copy
  – publish
  – content
  – document
  – work

The underlying architecture should be separate from the content stored in the library

• general purpose functions and content-specific functions should be separated

• TL analogy:
  – the more specific the bookshelf is to holding actual books, the harder it is to repurpose the bookshelf in the future


Names and identifiers are the basic building block for the digital library

• names != addresses

• in any DL architecture diagram, (almost) anything that can be drawn can be named

• consider the impact that handles/DOIs have had on the publishing/DL community

Digital library objects are more than collections of bits

• objects = metadata + data
  – “but what is metadata?”
    • don’t ask hard questions

figure 2 in http://www.dlib.org/dlib/July95/07arms.html

The digital library object that is used is different from the stored object

• what you store is not necessarily what you get
  – storage and dissemination are separate events, and can represent separate formats
    • also, potentially separate from the application-specific format

Users want intellectual works, not digital objects

• The DL architect’s needs should not inconvenience the users’ needs

• recombination of objects
  – what is an object in your world view?

figure 4 in http://www.dlib.org/dlib/July95/07arms.html

Repositories must look after the information they hold

• “Repository Access Protocol”
  – Kahn Wilensky Framework
    • http://www.cnri.reston.va.us/home/cstr/arch/k-w.html

figure 3 in http://www.dlib.org/dlib/July95/07arms.html


Objects vs. Archives

• This is the tenet that I question…

• Most DL objects still bound to the applications that generate or render the objects

Design Goals

• Aggregation
  – DLs should be shielded from the transient nature of file formats
  – Prevent information hemorrhaging by archiving all data types

• Intelligence
  – Aggregation (above) implies code, why stop at passive objects? Make objects smart...
  – Bucket-bucket & bucket-tool intelligence

Design Goals

• Self-Sufficiency
  – Maximum autonomy & survivability: fully self-sufficient buckets
  – Option to internally store all needed materials

• Mobility
  – Why should an information object be stuck in one place?
  – Mobility for replication, workflow, data collection

Design Goals

• Heterogeneity
  – One size does not fit all...
  – Different buckets for different applications, sites, disciplines, etc.

• Archive Independence
  – Focus is on information, not yet another DL “system”
    • does not require an archive to function
  – “Work with everything; break nothing”

Smart Objects

• aggregate:
  – metadata
  – data
  – methods to operate on the metadata/data
    • http://www.cs.odu.edu/~mln/teaching/cs595-f03/?method=getMetadata&type=all
    • http://www.cs.odu.edu/~mln/teaching/cs595-f03/?method=listMethods
    • http://www.cs.odu.edu/~mln/teaching/cs595-f03/?method=listPreference
    • (cheat) http://www.cs.odu.edu/~mln/teaching/cs595-f03/bucket/bucket.xml

• assumptions
  – Perl
  – http server
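The URLs above suggest that a bucket is addressed by a `method` query argument that selects one of its own methods. The sketch below illustrates that dispatch idea in Python; it is not the original implementation (which, per the slide, assumes Perl and an http server). The class, the handler bodies, the meaning given to `type=all` and the choice of `display` as the default method are all assumptions made for illustration.

```python
from urllib.parse import parse_qs


class Bucket:
    """Toy smart object: metadata plus the methods that operate on it."""

    def __init__(self, metadata):
        self.metadata = dict(metadata)
        # Method names mirror a few of those visible in the slide URLs.
        self.methods = {
            "display": self.display,
            "getMetadata": self.get_metadata,
            "listMethods": self.list_methods,
        }

    def display(self, params):
        return f"bucket with {len(self.metadata)} metadata field(s)"

    def get_metadata(self, params):
        wanted = params.get("type", ["all"])[0]
        if wanted == "all":  # assumed meaning of type=all
            return dict(self.metadata)
        return {wanted: self.metadata.get(wanted)}

    def list_methods(self, params):
        return sorted(self.methods)

    def handle(self, query_string):
        """Dispatch a raw query string such as 'method=getMetadata&type=all'."""
        params = parse_qs(query_string)
        name = params.get("method", ["display"])[0]  # assumed default
        handler = self.methods.get(name)
        if handler is None:
            raise ValueError(f"unknown method: {name}")
        return handler(params)


if __name__ == "__main__":
    bucket = Bucket({"title": "CS 595 course bucket"})
    print(bucket.handle("method=listMethods"))
    print(bucket.handle("method=getMetadata&type=all"))
```

Keeping the method table inside the object is what lets the bucket answer `listMethods` about itself without any help from an archive.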

Internal Structure

jaga.cs.odu.edu:/home/mln/public_html/teaching/cs695-f03 % ls
bucket/  CVS/  index.cgi*
jaga.cs.odu.edu:/home/mln/public_html/teaching/cs695-f03 % ls bucket/
bucket.xml*  content/  CVS/  lib/  logs/  methods/
jaga.cs.odu.edu:/home/mln/public_html/teaching/cs695-f03 % ls bucket/content/
~syllabus.txt           ~week1~readings.html    ~week5~readings.html
~week10~readings.html   ~week1~week-01.ppt      ~week6~readings.html
~week11~readings.html   ~week2~readings.html    ~week7~readings.html
~week12~readings.html   ~week2~week-02.ppt      ~week8~readings.html
~week13~readings.html   ~week3~assignment1.ppt  ~week9~readings.html
~week14~readings.html   ~week3~readings.html
~week15~readings.html   ~week3~week-03.ppt
jaga.cs.odu.edu:/home/mln/public_html/teaching/cs695-f03 % ls bucket/lib
CVS/  EZXML.pm  mime.e  style.css
jaga.cs.odu.edu:/home/mln/public_html/teaching/cs695-f03 % ls bucket/logs/
access.log  CVS/  mylog.log
jaga.cs.odu.edu:/home/mln/public_html/teaching/cs695-f03 % ls bucket/methods/
addElement.pl*     getElement.pl*   listMethods.pl*     setPreference.pl*
CVS/               get_log.pl*      listPreference.pl*
deleteElement.pl*  getlog.pl*       log.pl*
display.pl*        getMetadata.pl*  setMetadata.pl*
jaga.cs.odu.edu:/home/mln/public_html/teaching/cs695-f03 %

Examples

• 1.6.X bucket
  – http://ntrs.nasa.gov/
  – http://www.cs.odu.edu/~mln/phd/

• 2.0 buckets
  – http://www.cs.odu.edu/~mln/teaching/cs595-f03/
  – http://www.cs.odu.edu/~lutken/smalltest/1120/

• 3.0 buckets (under development)
  – http://www.cs.odu.edu/~jallen/buckets/
  – uses MPEG-21 DIDLs
    • cf. http://www.dlib.org/dlib/november03/bekaert/11bekaert.html


Self-Preservation

• Objectives:– knowledge of the system state not required

• i.e. -- you don’t need to keep track of where everything is…

– the knowledge required for each object should be minimal

• actually, the required number of “friends” should be finite, even in very large systems

Friends and Family

• Friends
  – connections to “other” buckets

• Family
  – connections to replications of you

Scenario: 3 buckets / 2 pals each

A  Pals: b,c
B  Pals: a,c
C  Pals: b,a

We want to add new_guy (D)

A  Pals: b,c
B  Pals: a,c
C  Pals: b,a
D  Pals: (none)

Tool calls: C.insert(D, “start”)

A  Pals: b,c
B  Pals: a,c
C  Pals: b,a
D  Pals:

D is added to C’s pal list

A  Pals: b,c
B  Pals: a,c
C  Pals: b,a,(d)
D  Pals:

C pal_list is overstuffed

Return handshake: D.insert(C, “finish”)

A  Pals: b,c
B  Pals: a,c
C  Pals: b,a,(d)
D  Pals: c

C pal_list is overstuffed

C “refits” pal list

A  Pals: b,c
B  Pals: a,c
C  Pals: b,a,(d)
D  Pals: c

C pal_list is overstuffed

Refit step 1: C.pop_1st_pal not known by (D)

A  Pals: b,c
B  Pals: a,c
C  Pals: b,a,d
D  Pals: c

Now C pal_list is overstuffed

Refit step 2: B.pop_pal( C )

A  Pals: b,c
B  Pals: a,c
C  Pals: a,d
D  Pals: c

Refit step 2: B.insert( D, “start” )

A  Pals: b,c
B  Pals: a,d
C  Pals: a,d
D  Pals: c

Refit step 3: D.insert( B, “finish” )

A  Pals: b,c
B  Pals: a,d
C  Pals: a,d
D  Pals: c,b

10 Buckets, 4 Friends: Steps 2 through 10

[Figures: one network diagram per step]

20 Buckets, 4 Friends

[Figure: network diagram]

100 Buckets, 10 Friends

[Figure: network diagram]

Building the Network

Bucket:
    this_node_name;
    max_friend_size;
    list_of_pals;

insert ( new_guy, string handshake )
// Adds new_guy to this bucket's pal list
// handshake = "start" or "finish"
{
    if ( I know(new_guy) )
        { return; }
    else {
        put new_guy at end of my pal list;
        if ( handshake == "start" )
            { new_guy.insert(this_node_name, "finish"); }
        if ( my pal list is now overstuffed )
            { refit(); }
    }
    return list_of_pals;
}

refit ()
// To keep pal_list from being overstuffed
{
    read in new_guy's pal list;
    pop_1st_pal_list();
        // I remove 1st pal "Y" from my list that's
        // not present in "new_guy's" pal list
    Y.pop_from_list(Me);
        // Have "Y" pop "Me"
    Y.insert(new_guy, "start");
        // Y adds new_guy to his list
        // this will call new_guy to add "Y" as well
}
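The insert/refit pseudocode above can be sketched as runnable Python as follows. This is an illustrative reading of the slide, not the original bucket code: the class and method names are hypothetical, pals are held as object references rather than node names, and the behavior when every pal already knows the new bucket (do nothing) is an assumption the slide does not cover. `build_example` replays the A, B, C plus D walk-through from the earlier slides.

```python
class Bucket:
    """One node in the friend network (illustrative sketch)."""

    def __init__(self, name, max_friends):
        self.name = name
        self.max_friends = max_friends
        self.pals = []  # ordered, as in the slides

    def insert(self, new_guy, handshake):
        """Add new_guy to this bucket's pal list; handshake is "start" or "finish"."""
        if new_guy is self or new_guy in self.pals:
            return self.pals
        self.pals.append(new_guy)
        if handshake == "start":
            new_guy.insert(self, "finish")  # return handshake
        if len(self.pals) > self.max_friends:
            self.refit(new_guy)
        return self.pals

    def refit(self, new_guy):
        """Drop the first pal Y that new_guy does not know, then introduce them."""
        for y in self.pals:
            if y is not new_guy and y not in new_guy.pals:
                break
        else:
            return  # assumption: nothing to do if every pal knows new_guy
        self.pals.remove(y)
        if self in y.pals:
            y.pals.remove(self)     # have Y pop me
        y.insert(new_guy, "start")  # Y and new_guy add each other

    def pal_names(self):
        return [p.name for p in self.pals]


def build_example():
    """Three buckets with two pals each, then add D through C."""
    a, b, c, d = (Bucket(n, 2) for n in "ABCD")
    a.pals, b.pals, c.pals = [b, c], [a, c], [b, a]
    c.insert(d, "start")
    return {x.name: x.pal_names() for x in (a, b, c, d)}


if __name__ == "__main__":
    for name, pals in build_example().items():
        print(name, "Pals:", ",".join(pals))
```

Note that no bucket needs a global view: C only reads D's pal list, and B only learns about D because C hands it over.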

Communications Cost: Building the Network

• Total communications cost to build the network

  $b^2 - f - (b-f)^2$

• b = # of buckets

• f = # of friends
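Reading the flattened exponents in the cost expression as squares (an assumption, since the superscripts were lost in extraction), the build cost can be expanded as below. The worked value uses b = 10 and f = 4, the sizes of the 10-bucket example.

```latex
\begin{align*}
b^2 - f - (b-f)^2 &= b^2 - f - \left(b^2 - 2bf + f^2\right) \\
                  &= 2bf - f^2 - f \\
                  &= f\,(2b - f - 1) \\
b = 10,\ f = 4:\quad f\,(2b - f - 1) &= 4\,(20 - 4 - 1) = 60
\end{align*}
```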


Communications Cost: Building the Network

Communications Cost: Traversing the Network

• Flood algorithm:

  $b(f-1) - f + 2$

• Spanning Tree:

  $b - 1$

• Upper bound on the diameter of the network:

  $(b-f)/2 + 1$
  – (typically much less)
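The three traversal formulas can be evaluated directly. The sketch below restates them as functions; the function names are hypothetical, and the formulas are taken from the slide as given, without re-deriving them.

```python
def flood_cost(b: int, f: int) -> int:
    """Messages for a flood traversal: b(f-1) - f + 2."""
    return b * (f - 1) - f + 2


def spanning_tree_cost(b: int) -> int:
    """Messages for a spanning-tree traversal: b - 1."""
    return b - 1


def diameter_upper_bound(b: int, f: int) -> float:
    """Upper bound on the network diameter: (b-f)/2 + 1."""
    return (b - f) / 2 + 1


if __name__ == "__main__":
    b, f = 10, 4  # the 10-bucket, 4-friend example
    print(flood_cost(b, f), spanning_tree_cost(b), diameter_upper_bound(b, f))
```

For the four-bucket, two-friend network built in the earlier walk-through, the diameter bound evaluates to 2, which matches the ring that walk-through ends with.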


Network Resiliency

• The network can survive at least f-1 node (bucket) or edge (communications) failures and still remain fully connected
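A small check of the resiliency claim, under stated assumptions: the sketch uses the final four-bucket network from the earlier walk-through (f = 2), treats pal links as two-way, and tests whether the surviving buckets can still reach each other after removing a set of nodes. The function name and the breadth-first search are illustrative choices, not part of the original slides.

```python
from collections import deque

# Final state of the A/B/C/D walk-through: each bucket has f = 2 pals.
RING = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["C", "B"]}


def is_connected(pals, removed=()):
    """True if all buckets not in `removed` can still reach one another."""
    alive = [n for n in pals if n not in removed]
    if not alive:
        return True
    seen = {alive[0]}
    queue = deque([alive[0]])
    while queue:
        node = queue.popleft()
        for pal in pals[node]:
            if pal not in removed and pal not in seen:
                seen.add(pal)
                queue.append(pal)
    return len(seen) == len(alive)


if __name__ == "__main__":
    # f - 1 = 1 failure: every single-node removal leaves the rest connected.
    print(all(is_connected(RING, {n}) for n in RING))
    # Two failures can partition this small network.
    print(is_connected(RING, {"A", "D"}))
```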

Cf. Other P2P Projects

• Gnutella
  – also $O(N^2)$ to build the network
    • currently don’t know the exact message cost

• Chord, Tapestry, etc.
  – content addressable networking
    • hash function to map keys to locations
  – orthogonal to buckets

Chatting

• the stored objects are inactive until invoked
  – if no one communicates with the object, it never wakes up, can never perform self-tests, etc.

• solution:
  – circulate a number of tokens through the network to ensure that everyone is woken up
  – buckets can perform a number of administrative tasks at these times

• Core to solving the migration issue
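One way to picture token circulation is a single token that is handed from pal to pal until every bucket has been woken once. The sketch below does this with a depth-first walk over the pal lists of the four-bucket example; the slides do not specify the circulation strategy, so the traversal order, the function name and the single-token setup are all assumptions.

```python
def circulate_token(pals, start):
    """Pass one token through the network; return buckets in wake-up order."""
    woken = []
    stack = [start]
    while stack:
        name = stack.pop()
        if name in woken:
            continue
        woken.append(name)  # the bucket wakes up; self-tests would run here
        stack.extend(reversed(pals[name]))  # visit pals in listed order
    return woken


if __name__ == "__main__":
    network = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["C", "B"]}
    print(circulate_token(network, "A"))
```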


Communications Tokens

Flocking…

• Craig Reynolds, “Flocks, Herds, and Schools: A Distributed Behavioral Model”, SIGGRAPH 87

• Observations:
  – flocks, schools, herds, etc. exhibit many desirable properties:
    • scale-free
      – neighbors matter, not total size of flock
    • no upper bound
      – flocks are never “full”
  – flocks, etc. can be modeled with simple rules:
    • Collision Avoidance: avoid collisions with nearby flockmates
    • Velocity Matching: attempt to match velocity with nearby flockmates
    • Flock Centering: attempt to stay close to nearby flockmates

Flocking for DLs

Rule: Collision Avoidance
  Flocking Boids: avoid collisions with nearby flockmates
  Flocking Buckets: not overwriting one's own copies nor the copies of other buckets (i.e., namespace collision avoidance)

Rule: Velocity Matching
  Flocking Boids: attempt to match velocity with nearby flockmates
  Flocking Buckets: deleting copies of oneself to provide “space” for late arrivals in a storage location

Rule: Flock Centering
  Flocking Boids: attempt to stay close to nearby flockmates
  Flocking Buckets: following others to available storage locations

Page 81: Self-Preserving Digital Objects Michael L. Nelson mln@cs.odu.edu mln/ Several Slides from Terry L. Harrison University of Southern

Flocking (9,4)

“new repository available”


Flocking (10,4)


Future Work

• Friends
  – optimizing the connections while sending the communication token
    • convert to small world graph over time
  – repair faults in the network

• Family
  – types
    • active
    • passive
  – provenance / authenticity


Other Applications for Smart Objects

• communication pulses will share the location of new services
  – format conversion (migration)
  – new repository locations (refreshing)
  – submit logs, alerts, other messages to people, services, etc.

• self-arranging displays


Self-Arranging Displays For Buckets

• premise: to have the links in the object reflect the community’s preferences
  – real-time computation; no log file processing
  – Bollen & Nelson, “Adaptive Networks of Smart Objects”
  – http://www.cs.odu.edu/~mln/pubs/bollenj_adaptive.pdf


Hebbian Learning

http://b2?method=display&referer=b2&redirect=http://b1?method\=display\%26redirect=http://b3?method=display\%26referer=http://b2

http://b1?method=display&referer=b1&redirect=http://b2?method\=display\%26referer=http://b1
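The URLs above route each click back through the bucket being left, so the bucket can learn which of its links are used without any log file processing. The sketch below shows one way such reinforcement could work. It is an assumption-laden illustration, not the method from the cited papers: the `LinkWeights` class, the fixed `direct` and `transitive` increments, and the use of the `referer` value to credit a two-hop link are all hypothetical.

```python
from collections import defaultdict


class LinkWeights:
    """Hypothetical per-network link weights, reinforced as users navigate."""

    def __init__(self, direct=1.0, transitive=0.5):
        self.direct = direct          # assumed reward for a traversed link
        self.transitive = transitive  # assumed reward for the implied two-hop link
        self.w = defaultdict(float)

    def traverse(self, source, target, referer=None):
        # Links that are used together grow stronger (Hebbian-style reinforcement).
        self.w[(source, target)] += self.direct
        # If the user arrived at `source` from `referer`, also credit referer -> target.
        if referer is not None and referer != target:
            self.w[(referer, target)] += self.transitive

    def top_links(self, source, k=3):
        # The links a bucket would display: its k strongest outgoing links.
        out = [(t, wt) for (s, t), wt in self.w.items() if s == source]
        out.sort(key=lambda pair: (-pair[1], pair[0]))
        return [t for t, _ in out[:k]]
```

Because the weights are updated at click time, the displayed links can be recomputed on every request.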


Initial Experiment

• Elango, Bollen & Nelson, "Dynamic Linking of Smart Digital Objects Based on User Navigation Patterns"
  – http://www.arxiv.org/abs/cs.DL/0401029
  – http://www.acm.org/technews/articles/20046/0607m.html#item8
  – Take top 50 all-time pop music bands
    • from Spin Magazine’s top 50 bands of all time
  – From each band, take 2 “related” bands
    • according to allmusic.com
  – Create network of 150 buckets with band info (metadata from allmusic.com)
  – Randomize the network
    • each band points to 3 other randomly selected bands
  – Get people to traverse the network…
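The starting network for such an experiment can be sketched as follows. The node names are placeholders (no real band data is used), the function name and the fixed random seed are assumptions, and the sketch assumes all related bands are distinct so that 50 seeds plus 2 related bands each gives 150 nodes.

```python
import random


def build_random_network(n_seeds=50, related_per_seed=2, out_links=3, seed=0):
    """Build placeholder nodes and give each one `out_links` random outgoing links."""
    rng = random.Random(seed)
    nodes = []
    for i in range(n_seeds):
        nodes.append(f"band{i}")
        nodes.extend(f"band{i}-related{j}" for j in range(related_per_seed))
    links = {}
    for node in nodes:
        # Sample without replacement from every other node, so no self-links or repeats.
        others = [n for n in nodes if n != node]
        links[node] = rng.sample(others, out_links)
    return links
```

Starting from random links means any structure that later emerges comes from user traversals rather than from the initial layout.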


Sample Screenshot


Sample Results


From the Initial Node: Public Enemy


Reviews and Summaries of Related Work

• Fedora, Warwick Framework, Kahn-Wilensky Framework, VERS, Multivalent Documents, Cryptolopes, etc.
  – NASA TM 211426
  – http://techreports.larc.nasa.gov/ltrs/PDF/2001/tm/NASA-2001-tm211426.pdf

• Journal of Digital Libraries, forthcoming special issue on Complex Digital Objects:
  – CFP http://www.dljournal.org/


Risks

• Why have these projects met with limited success or been used only in niche applications?
  – it is one thing to add a layer to your DL, but changing the structure of your first-class objects incurs a level of short-term risk
  – however, even the most well-thought-out componentized DL is subject to long-term risks
    • cf. ICASE DL


Conclusions

• Smart objects are an idea whose time has come
  – natural progression of DL R&D

• Smart objects will play a fundamental role in digital preservation

• More info on preservation:
  – http://www.cs.odu.edu/~mln/teaching/cs791-s04/