the UPS protoproto project
herbert van de sompel, michael nelson, thomas krichel
UPS 1 Meeting, Santa Fe - October 21st 1999

• project: description
• demo: the UPS protoproto
• dex: the data exchange framework

project why a protoproto?

• UPS: enable cross-archive end-user services
• protoproto:
  – facilitate discussions
  – identify issues involved in creating cross-archive services
  – experiment with digital object concepts for archive material
  – does not claim to be a solution
• protoproto is multi-disciplinary
  – a special instance of cross-archive
  – there is a market
  – promotional value

project who?

• coordination: herbert van de sompel, michael nelson, thomas krichel
• involvement of:
  – Old Dominion U & NASA Langley
  – U of Surrey
  – U of Ghent
  – Los Alamos National Laboratory - Library
  – Russian Academy of Science - Siberian branch

project sponsors

• Los Alamos National Laboratory - Research Library
• JISC eLib WoPEc project

project datasets

– metadata only
– full text remains at archives
– static dumps obtained ca. July 99

archive       objects   full-text
arXiv          85,223      85,223
CogPrints         742         659
NACA            3,036       3,036
NCSTRL         29,184       9,084
NDLTD           1,590         951
RePEc          73,367      13,582
Total         193,142     112,535

!organization 17,983 14100931 2,453
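As a quick arithmetic check on the figures above (as read from the slide), the per-archive counts can be summed and compared with the stated totals, and the share of records with full text follows directly. This is only a sketch: the dictionary layout and function names are illustrative assumptions, not part of the UPS software.

```python
# Per-archive counts as read from the datasets slide: (objects, full-text).
DATASETS = {
    "arXiv": (85_223, 85_223),
    "CogPrints": (742, 659),
    "NACA": (3_036, 3_036),
    "NCSTRL": (29_184, 9_084),
    "NDLTD": (1_590, 951),
    "RePEc": (73_367, 13_582),
}
# Totals as stated on the slide.
STATED_TOTALS = (193_142, 112_535)


def totals(datasets):
    """Sum the object and full-text counts over all archives."""
    objects = sum(o for o, _ in datasets.values())
    full_text = sum(f for _, f in datasets.values())
    return objects, full_text


def full_text_share(datasets):
    """Fraction of all metadata records that have full text available."""
    objects, full_text = totals(datasets)
    return full_text / objects


if __name__ == "__main__":
    print("totals match slide:", totals(DATASETS) == STATED_TOTALS)
    print(f"full-text share: {full_text_share(DATASETS):.1%}")
```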

project metadata formats

archive     format
arXiv       internal
CogPrints   internal
NACA        Refer
NCSTRL      RFC1807
NDLTD       MARC
RePEc       ReDIF

project metadata extraction

• Getting metadata out of archives
  – not all archives support metadata extraction
    • some archives have undocumented metadata extraction procedures
  – not all archives support rich criteria for extraction
    • single dump concept only
• Intellectual property and use rights not always clear

project metadata quality

• Metadata has problems with:
  – record duplication
  – crucial missing fields
  – internal errors
  – ambiguous references to people and places, publications

project metadata conversion

• data enhancements:
  – creation of unique identifier
  – addition of raw subject-classification
  – normalization of publication types
• all datasets converted to ReDIF:
  – essential to have a single format for the creation of services
  – supply by archives in a single format was not realistic
  – no downgrading of data
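The conversion step described above can be sketched roughly as follows. This is a minimal illustration, not the project's converter: the `ups:` identifier scheme, the publication-type vocabulary, the input field names (`id`, `type`, `subject`) and the simplified "Attribute: value" rendering are all assumptions made for the example, and the output is not claimed to be valid ReDIF.

```python
# Hypothetical mapping from archive-specific type labels to one vocabulary.
PUBLICATION_TYPES = {
    "preprint": "preprint",
    "e-print": "preprint",
    "tech report": "technical-report",
    "technical report": "technical-report",
    "thesis": "thesis",
    "dissertation": "thesis",
}


def make_identifier(archive, local_id):
    """Build a unique identifier (the 'ups:' scheme is an assumption)."""
    return f"ups:{archive.lower()}:{local_id}"


def normalize_type(raw_type):
    """Map a raw publication type onto the shared vocabulary."""
    return PUBLICATION_TYPES.get(raw_type.strip().lower(), "unknown")


def convert(archive, record):
    """Convert one source record (a dict) to the common representation.

    Fields the converter does not interpret are carried over unchanged,
    so the conversion does not discard data.
    """
    interpreted = {"id", "type", "subject"}
    converted = {k: v for k, v in record.items() if k not in interpreted}
    converted["identifier"] = make_identifier(archive, record["id"])
    converted["publication-type"] = normalize_type(record.get("type", ""))
    # The subject classification is copied as-is ("raw"), not mapped.
    converted["raw-subject"] = record.get("subject", "")
    return converted


def render(converted):
    """Render as simple 'Attribute: value' lines, one per field."""
    return "\n".join(f"{k}: {v}" for k, v in sorted(converted.items()))


if __name__ == "__main__":
    sample = {"id": "9901001", "type": "E-Print",
              "subject": "hep-th", "title": "An example record"}
    print(render(convert("arXiv", sample)))
```

Keeping the raw subject code next to the normalized publication type mirrors the slide's split between enhancements that add information and normalization that maps onto a shared vocabulary.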

project re-creation of archives

• creation of archives for ReDIF-ed metadata
• using intelligent digital objects: “buckets”

[diagram labels: arXiv, RePEc, NCSTRL]

project buckets

• Buckets were chosen to study the implications of using rich, intelligent objects in UPS
• Buckets are:
  – DL protocol / system independent
  – self-contained and mobile
  – handle their own display, enforcement of terms and conditions, and dissemination of their contents
  – designed for bundling multiple data representations and data instance types
• The aggregative nature of buckets is well suited for adding value-added services at the object level

project creation of end-user service

• NCSTRL+ digital library service
• indexing buckets in archives by requesting their metadata
• enhanced user-interface
• NCSTRL+ search results point at buckets
• buckets auto-display
• buckets provide link to full-text in native archive

project scaling problems

• UPS contains 193K objects
  – using buckets consumed inodes (~60 inodes per bucket)
    • filesystem reformatted with more generous amount of inodes
  – Solaris and Dienst conflict
    • Dienst wants each object in a publishing authority to be in a single directory
    • Solaris has a hard limit of 32K objects in a directory
    • resolution: use many (100+) authorities for UPS
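The scaling figures above can be worked through in a few lines. The sketch below assumes that "32K" means 32,000 entries and that objects are spread over authorities with a stable hash; the slides do not say how objects were actually assigned to authorities, and the authority names (`ups000`, ...) are invented for illustration.

```python
import math
import zlib
from collections import Counter

OBJECTS = 193_142        # object count from the datasets slide
INODES_PER_BUCKET = 60   # approximate figure from the slide
DIR_LIMIT = 32_000       # "32K" read as 32,000; the exact limit is an assumption


def inodes_needed(objects=OBJECTS, per_bucket=INODES_PER_BUCKET):
    """Approximate number of inodes consumed by all buckets."""
    return objects * per_bucket


def min_authorities(objects=OBJECTS, limit=DIR_LIMIT):
    """Fewest authorities (directories) that keep each under the limit."""
    return math.ceil(objects / limit)


def authority_for(object_id, n_authorities=100):
    """Assign an object to an authority with a stable hash (an assumption)."""
    digest = zlib.crc32(object_id.encode("utf-8"))
    return f"ups{digest % n_authorities:03d}"


def largest_directory(object_ids, n_authorities=100):
    """Size of the fullest authority directory after assignment."""
    counts = Counter(authority_for(oid, n_authorities) for oid in object_ids)
    return max(counts.values())


if __name__ == "__main__":
    print("inodes needed:", inodes_needed())
    print("minimum authorities:", min_authorities())
    ids = (f"object-{i}" for i in range(OBJECTS))
    print("largest directory with 100 authorities:", largest_directory(ids))
```

With 100 or more authorities the average directory holds roughly two thousand objects, which leaves a wide margin below the directory limit; a split at the bare minimum would leave almost none.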

project addition of linking service

• integrate the archives with the traditional communication mechanism
• context-sensitive linking to deliver extended services via SFX technology

project SFX linking service

[diagram labels: system A, system B, metadata, evaluate metadata, extended services]
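The flow named in the diagram (a system hands over metadata, the metadata is evaluated, extended services are offered) can be illustrated with a small sketch. This is not the SFX implementation: the server URL, the metadata keys and the evaluation rules are all hypothetical and only show the idea of choosing services from the metadata that is present.

```python
from urllib.parse import urlencode

# Hypothetical base URL of a linking server; not a real endpoint.
LINK_SERVER = "http://linker.example.org/sfx"


def link_to_server(metadata):
    """Build the URL a source system could place next to a record."""
    return f"{LINK_SERVER}?{urlencode(sorted(metadata.items()))}"


def extended_services(metadata):
    """Evaluate the metadata and list the services that make sense for it.

    The rules are illustrative: a service is offered only when the
    metadata element it depends on is present.
    """
    services = []
    if metadata.get("author"):
        services.append("other works by this author")
    if metadata.get("journal"):
        services.append("published version")
    if metadata.get("title"):
        services.append("title search in other archives")
    return services


if __name__ == "__main__":
    record = {"author": "A. Author", "title": "An example preprint"}
    print(link_to_server(record))
    for service in extended_services(record):
        print("-", service)
```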

project SFX linking database

project addition of linking service

• buckets for arXiv, NCSTRL and RePEc are SFX-aware
• CogPrints, NACA, NDLTD not SFX-aware
• SLAC/SPIRES is SFX-aware
• linking services for preprint metadata + for published version

demo the UPS protoproto

http://ups.cs.odu.edu:8000/

• will be available starting beginning of November
• UPS list will be notified
• disclaimer: “not a production system”

http://ups.cs.odu.edu

dex some issues (I)

• data exchange framework
• data provision vs. data implementation
• central searching, distributed archives
• need for a framework by which archives can describe themselves:
  – content
  – terms and conditions
  – protocols, criteria supported to extract (meta)data
  – metadata scheme, subject classification scheme, material-type scheme, ...

dex some issues (II)

• need for an identifier scheme for archives and archive objects
  – (cf. ISSN, ISBN, DOI)
• metadata quality obstructs the creation of services
• desirable to extend metadata with citation information
• smart objects
  – archived objects that are active, not passive

dex providing vs. implementing data

• Providing data:
  – publishing into an archive
  – providing methods for metadata “harvesting”
    • provide non-technical context for sharing information also
• Implementing data:
  – harvest metadata from providers
  – implement user interface to data
• Even if provided by the same DL, these are distinct functions

dex providing vs. implementing data

[diagram: Provider with input interface and native end-user interface]
No machine-based way to extract metadata…

[diagram: Provider with input interface, native end-user interface and native harvesting interface]
Machine and user interfaces for extracting metadata…

dex providing vs. implementing data

[diagram: Provider with input interface and native harvesting interface]
[diagram: Provider with input interface, native end-user interface and native harvesting interface]
[diagram: Implementor with native end-user interface]

• Input and harvesting interfaces optional
• Native end-user interface optional (e.g., RePEc)

dex self-describing archives

• Much of the learning about the constituent UPS archives occurred out of band…
• Given an unknown archive, we should be able to algorithmically determine the archive’s metadata...

[diagram: Provider with input interface, native end-user interface and native harvesting interface]
Where possible, the harvesting interface should provide the same criteria as the end-user interface

dex self-describing archives

• Recommended criteria for metadata extraction:
  – subject classification
  – accession date
  – publication date
• Criteria for archive description
  – metadata formats employed
  – contact information for archive
  – publication type scheme
  – identifier scheme
  – subject classification scheme
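The two lists above (what an archive says about itself, and the criteria by which metadata can be extracted) can be pictured as a small data structure plus a selective harvest function. This is a sketch under stated assumptions: the class and field names are invented for illustration and do not correspond to any interface defined by the UPS project.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ArchiveDescription:
    """What an archive could publish about itself (hypothetical layout)."""
    name: str
    contact: str
    metadata_formats: list
    publication_type_scheme: str
    identifier_scheme: str
    subject_classification_scheme: str


@dataclass
class Record:
    """One metadata record, reduced to the fields used for selection."""
    identifier: str
    subject: str
    accession_date: date
    publication_date: date


def harvest(records, subject=None, accessioned_since=None, published_since=None):
    """Select records by the three recommended extraction criteria.

    A criterion left as None is not applied, so calling harvest(records)
    with no criteria corresponds to a single full dump.
    """
    selected = []
    for record in records:
        if subject is not None and record.subject != subject:
            continue
        if accessioned_since is not None and record.accession_date < accessioned_since:
            continue
        if published_since is not None and record.publication_date < published_since:
            continue
        selected.append(record)
    return selected


if __name__ == "__main__":
    records = [
        Record("a", "physics", date(1999, 7, 1), date(1998, 1, 1)),
        Record("b", "economics", date(1999, 6, 1), date(1999, 5, 1)),
    ]
    recent = harvest(records, accessioned_since=date(1999, 6, 15))
    print([r.identifier for r in recent])
```

Selecting by accession date is what makes incremental harvesting possible: an implementor only needs to ask for records added since its previous visit instead of fetching a complete dump each time.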

dex identifiers

• Useful in:
  – reference linking
  – citations
  – resolving duplications
    • UPS duplications were removed by hand
  – tracking publication lifecycle
• Need the ability for an object to have multiple unique identifiers
  – organization, discipline, etc.
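To illustrate how identifiers could help resolve duplications automatically (UPS removed them by hand), the sketch below groups records that share at least one identifier. It assumes each record already carries a set of identifiers; the identifier strings in the example are made up.

```python
def merge_duplicates(records):
    """Group records that share at least one identifier.

    Each record is given as a set of identifier strings. The result is a
    list of disjoint sets, each holding every identifier believed to name
    the same object.
    """
    groups = []
    for ids in records:
        merged = set(ids)
        # Find existing groups that overlap with this record.
        overlapping = [g for g in groups if g & merged]
        for group in overlapping:
            merged |= group
            groups.remove(group)
        groups.append(merged)
    return groups


if __name__ == "__main__":
    records = [
        {"arXiv:1", "doi:a"},     # preprint, also known by a publisher identifier
        {"RePEc:x"},              # unrelated record
        {"doi:a", "ncstrl:9"},    # same work deposited in another archive
    ]
    for group in merge_duplicates(records):
        print(sorted(group))
```

Letting one object carry several identifiers (one per organization or discipline, as the slide suggests) is what makes this grouping possible: a single shared identifier is enough to link two records.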

dex smart objects

• Premise: Objects are more important than the archives that hold them
• SODA: Smart Objects, Dumb Archives
• Objects should be the canonical authority for
  – metadata
  – contents
  – use
• Objects should be able to grow and change
  – correct metadata
  – add new formats
  – add new services
  – reflect the lifecycle of the object

dex smart objects

• It would be beneficial if the archived objects could be heterogeneous:
  – with their own “look-and-feel”
  – unique functionality / services
    • e.g., the data archiving needs of an atmospheric scientist can be different from those of a computer scientist, engineer or medical researcher
• yet maintain a standard API for:
  – extracting metadata
  – content retrieval
  – resource discovery on the object
  – terms and conditions
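The idea of heterogeneous objects behind one standard API can be sketched as a base class with a fixed set of methods and a subclass that changes only its presentation. This is an illustration, not the bucket implementation used in UPS: the class names, method names and the simple terms check are assumptions.

```python
class ArchivedObject:
    """A minimal 'smart object' that holds its own metadata, contents and terms."""

    def __init__(self, metadata, contents, terms="unrestricted"):
        self._metadata = dict(metadata)
        self._contents = dict(contents)  # format name -> payload
        self._terms = terms

    # Standard API shared by every object.
    def get_metadata(self):
        """Extract metadata."""
        return dict(self._metadata)

    def list_formats(self):
        """Resource discovery on the object: which representations exist?"""
        return sorted(self._contents)

    def get_terms(self):
        """Terms and conditions attached to the object."""
        return self._terms

    def get_content(self, fmt, accepted_terms=False):
        """Content retrieval; the object enforces its own terms."""
        if self._terms != "unrestricted" and not accepted_terms:
            raise PermissionError(f"terms must be accepted: {self._terms}")
        return self._contents[fmt]

    # Objects may grow and change over their lifecycle.
    def add_format(self, fmt, payload):
        self._contents[fmt] = payload

    # Own look-and-feel; subclasses may override it.
    def display(self):
        title = self._metadata.get("title", "(untitled)")
        return f"{title} [{', '.join(self.list_formats())}]"


class AtmosphericDataset(ArchivedObject):
    """Same API, different presentation for a different discipline."""

    def display(self):
        return f"DATASET {self._metadata.get('title', '(untitled)')}"


if __name__ == "__main__":
    report = ArchivedObject({"title": "A report"}, {"pdf": b"..."})
    report.add_format("ps", b"...")
    data = AtmosphericDataset({"title": "Wind profiles"}, {"csv": b"..."})
    for obj in (report, data):
        print(obj.display(), obj.list_formats())
```

A service that indexes these objects only needs the shared methods, so a new discipline-specific object type can be added without changing the archive or the service.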

dex lessons learned

• A strong distinction between the provision of data, and the implementation of data
  – also, a socio-legal context for sharing metadata
• Open, “self-describing” archives
• A universal, unique identifier name space
• Archived objects with more intelligence and flexibility