New approaches to the catalog
T. Hickey, http://errol.oclc.org/laf/n82-54463.html
Svensk Biblioteksförening, 2005 October 28
![Page 1: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/1.jpg)
New approaches to the catalog
T. Hickey
http://errol.oclc.org/laf/n82-54463.html
Svensk Biblioteksförening 2005 October 28
![Page 2: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/2.jpg)
OCLC
Founded 1967
Nonprofit membership organization
More than 53,000 libraries
96 countries
~1,000 employees
Cataloging
Interlibrary Loan
Preservation
Dewey Decimal Classification
netLibrary
FirstSearch
![Page 3: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/3.jpg)
OCLC Research
Research for both
• OCLC services
• Membership
Metadata management
Knowledge organization
Content management
Interoperability
Systems & interaction design
~30 employees
![Page 4: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/4.jpg)
What do users want?
The right information, with minimum effort
![Page 5: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/5.jpg)
How to give them what they want
Catch them where they are
Increase our data
Improve our data
Make the data work harder
Interconnect with other systems
Do all this efficiently
![Page 6: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/6.jpg)
What has changed
Computers and telecommunications
• User expectations
• Digital materials
• Remoteness of our users
• Huge amounts of bandwidth, storage
![Page 7: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/7.jpg)
The competition
Online booksellers
• Reviews
• Tables of contents
• Excerpts
• Inside-the-book searching
Web search engines
• Speed
• Full-text searching
• Global coverage (of web resources)
• Good enough
Ourselves
• Electronic journals
![Page 8: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/8.jpg)
Current projects (my group)
Live search
Registries, PURLs
Dewey browser
Harvesting, electronic theses
VIAF, LAF
SRU/W, OpenURLs, OAI
FRBR, xISBN
Beowulf cluster
Map-reduce
Text searching
Batch loading
Open WorldCat
WorldCat Wiki
Publisher Names
MXG
![Page 9: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/9.jpg)
Other Research Projects
FictionFinder, Curiouser
Schema Transformation
Terminology Services
Digital Preservation
Collection Analysis
Dublin Core
FAST
User Studies
Data mining
Also: http://www.oclc.org/research/researchworks/
![Page 10: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/10.jpg)
Catch them where they are
Google, Yahoo, etc.
• Open WorldCat
• OpenURL
• OAI-PMH
Creation too
• WorldCat Wiki
• Tags?
![Page 11: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/11.jpg)
Open WorldCat
![Page 12: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/12.jpg)
Editions
![Page 13: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/13.jpg)
OpenURL
OpenURL registry
• Supports version 1.0
• Also registry of OpenURL servers
• Used for WikiD
![Page 14: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/14.jpg)
WorldCat ‘Wiki’
Opening up WorldCat to user annotations
• Reviews
• Notes
• Tables of contents
• Cover art?
• Book lists?
Based on WikiD software
• Full Wiki
• Many features turned off for WorldCat
• Uses OpenURL 1.0 protocol internally
• Allows collections of pages of arbitrary XML schemas
• Tools for the creation of simple collections
Doesn’t look like a Wiki
![Page 15: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/15.jpg)
Reviews
![Page 16: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/16.jpg)
Tags?
Folksonomies?
User-generated key words
We’ve been here before
• Is it different?
• Is there another direction?
![Page 17: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/17.jpg)
Opening Dewey
![Page 18: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/18.jpg)
![Page 19: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/19.jpg)
More data
Harvesting
• OAI-PMH
• ETDs
Batch load
• 60 million records
• 3 million new manifestations
Other
• Cover art
• Reviews
• WC
![Page 20: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/20.jpg)
Better data and organization
VIAF
FRBR
Authority files in general
• LAF
• Publisher names
• Genre
• FAST
Registries
• PURLs
• Generalized solution?
Get them nearer to creation
![Page 21: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/21.jpg)
FRBR
Work-set algorithm
• Keys based on author/title
• Authority files
• Auxiliary authority files
• xISBN
Used for
• xISBN
• Open WorldCat
• FirstSearch (coming)
• Collection analysis (coming)
• Research
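The "keys based on author/title" idea can be sketched as follows. This is a minimal illustration, not OCLC's actual work-set algorithm: the normalization rules, the function names (`normalize`, `work_key`, `group_into_work_sets`) and the sample records are all assumptions made up for the example, and the authority-file lookups mentioned on the slide are left out.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace (assumed rules)."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", no_accents.lower())
    return " ".join(cleaned.split())


def work_key(author, title):
    """Build a hypothetical work-set key from normalized author and title."""
    return normalize(author) + "/" + normalize(title)


def group_into_work_sets(records):
    """Group record ids that share the same author/title key."""
    work_sets = defaultdict(list)
    for rec in records:
        work_sets[work_key(rec["author"], rec["title"])].append(rec["id"])
    return dict(work_sets)


# Made-up sample records: two cataloging variants of one work, plus another work.
records = [
    {"id": "r1", "author": "Lagerlöf, Selma", "title": "Gösta Berlings saga"},
    {"id": "r2", "author": "Lagerlof, Selma.", "title": "Gosta Berlings Saga"},
    {"id": "r3", "author": "Strindberg, August", "title": "Röda rummet"},
]
work_sets = group_into_work_sets(records)
```

Here `r1` and `r2` land in the same work set because their keys agree after normalization. A real system would presumably use authority files to reconcile name forms that simple normalization cannot.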
![Page 22: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/22.jpg)
![Page 23: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/23.jpg)
Authority Files
LAF
• http://errol.oclc.org/laf/n82-54463.html
Publisher names
• Not normally controlled
• Looking for variations with ISBN prefixes
• Also worked with dissertations
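One way to look for publisher-name variations with ISBN prefixes is to group the names seen under each prefix. The sketch below assumes the ISBNs arrive already hyphenated (so the group and publisher elements can be read off directly); the function names, ISBNs and publisher names are invented for illustration, and check digits are not validated.

```python
from collections import defaultdict


def isbn_prefix(hyphenated_isbn):
    """Return the group and publisher elements of a hyphenated ISBN-10."""
    parts = hyphenated_isbn.split("-")
    if len(parts) != 4:
        return None  # not in the assumed four-part form; this sketch skips it
    return parts[0] + "-" + parts[1]


def name_variants_by_prefix(records):
    """Collect the distinct publisher strings found under each ISBN prefix."""
    variants = defaultdict(set)
    for isbn, publisher in records:
        prefix = isbn_prefix(isbn)
        if prefix is not None:
            variants[prefix].add(publisher)
    return {prefix: sorted(names) for prefix, names in variants.items()}


# Invented sample data; the ISBNs are placeholders, not real books.
records = [
    ("0-123-45678-9", "Example Press"),
    ("0-123-11111-1", "Example Pr."),
    ("0-999-00000-2", "Sample Books"),
]
variants = name_variants_by_prefix(records)
```

Prefixes that collect more than one name are candidates for being variants of a single publisher, which a person or a further rule would still need to confirm.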
![Page 24: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/24.jpg)
VIAF
Merge national-level files
Library of Congress (NACO) and Die Deutsche Bibliothek
• Bibliographic records analyzed
• 15% would be erroneous based just on names
Basic matching now completed
• 435,000 matching names
• < 1% mismatched
Working on
• Public interface
• OAI harvesting
• Persistent identifiers
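The point that matching on names alone would be wrong about 15% of the time suggests requiring some bibliographic evidence as well. The sketch below is only an illustration of that idea, not the VIAF matching procedure: it assumes each authority record carries a set of associated titles and accepts a pair only when the names agree and at least one title is shared. Record ids, names and titles are made up.

```python
def match_authorities(lc_records, ddb_records):
    """Pair records whose names agree and whose title sets overlap (assumed rule)."""
    matches = []
    for lc in lc_records:
        for ddb in ddb_records:
            same_name = lc["name"].lower() == ddb["name"].lower()
            shared_titles = set(lc["titles"]) & set(ddb["titles"])
            if same_name and shared_titles:
                matches.append((lc["id"], ddb["id"]))
    return matches


# Made-up records: two people share a name, but only one pair shares a title.
lc_records = [
    {"id": "lc-1", "name": "Andersson, Anna", "titles": {"Title A", "Title B"}},
    {"id": "lc-2", "name": "Andersson, Anna", "titles": {"Title C"}},
]
ddb_records = [
    {"id": "ddb-1", "name": "Andersson, Anna", "titles": {"Title B"}},
]
matches = match_authorities(lc_records, ddb_records)
```

Name-only matching would have paired `ddb-1` with both `lc-1` and `lc-2`; the title check keeps only the first.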
![Page 25: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/25.jpg)
![Page 26: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/26.jpg)
Maj
![Page 27: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/27.jpg)
Registries
Show relationships between metadata
Often associated with an identifier
General solution?
Examples
• Authority files
• WorldCat
• PURLs
![Page 28: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/28.jpg)
PURLs
Persistent URLs
• Map one URL to another
• http://purl.org/hickey/outgoing -> http://outgoing.typepad.com/
• 500,000+ PURLs
• 111 million resolutions
Port to WikiD platform?
• http://www.oclc.org/research/projects/wikid/
String of PURL servers?
• Use OAI-PMH for synchronization
• Spread responsibility
Generalized solution?
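Mapping one URL to another amounts to a lookup table plus an HTTP redirect. The sketch below uses the mapping from the slide; the table layout, the `resolve` function and the choice of status code 302 are assumptions for illustration, not a description of the PURL server software.

```python
# Hypothetical in-memory PURL table: path on the PURL server -> current target URL.
purl_table = {
    "/hickey/outgoing": "http://outgoing.typepad.com/",
}


def resolve(purl_path):
    """Return (status, target) the way an HTTP redirect response would."""
    target = purl_table.get(purl_path)
    if target is None:
        return 404, None  # unknown PURL
    return 302, target  # redirect the client to the current location


status, location = resolve("/hickey/outgoing")
```

When the resource moves, only the table entry changes; links that cite the PURL keep working.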
![Page 29: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/29.jpg)
More connectivity
OpenURL
RSS feeds
OpenSearch, SRU/W
OAI-PMH
![Page 30: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/30.jpg)
OpenURL
Developed to address the ‘appropriate copy’ problem
Transitioning to OpenURL 1.0
OpenURL resolver
• Accepts requests specifying
• Resource
• Services
Generalized syntax
• Specifying a resource
• Services to be performed
Metadata elements specified in registry
• http://purl.org/openurl/
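A request to an OpenURL 1.0 resolver can be sketched as a URL with key/encoded-value (KEV) parameters describing the resource. The resolver address, ISBN and title below are placeholders, and `build_openurl` is a hypothetical helper; only the resource description is shown, with the service keys left out.

```python
from urllib.parse import urlencode


def build_openurl(resolver_base, isbn, title):
    """Build an OpenURL 1.0 KEV query describing a book (the referent)."""
    params = [
        ("url_ver", "Z39.88-2004"),                    # OpenURL 1.0 version tag
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:book"),  # registered book metadata format
        ("rft.isbn", isbn),
        ("rft.btitle", title),
    ]
    return resolver_base + "?" + urlencode(params)


# Placeholder resolver and metadata, for illustration only.
url = build_openurl("http://resolver.example.org/openurl", "0000000000", "Example title")
```

The same description can be sent to any library's resolver, which is how the appropriate copy for that user gets chosen.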
![Page 31: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/31.jpg)
![Page 32: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/32.jpg)
SRU
Simplified version of Z39.50
• Web based
• SRW – SOAP
• SRU – URL
Even simpler?
• OpenSearch
• No search syntax
• Looking for common ground
MXG
• Metasearch XML Gateway
• Simplifies metasearchers’ lives
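Because SRU puts the whole search in the URL, a request can be built with ordinary URL encoding. The sketch below assumes SRU version 1.1 and a placeholder server address; `build_sru_url` is a hypothetical helper, and the query is written in CQL.

```python
from urllib.parse import urlencode


def build_sru_url(base_url, cql_query, max_records=10):
    """Build an SRU searchRetrieve request URL."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,                  # CQL search expression
        "maximumRecords": str(max_records),  # cap on records returned
    }
    return base_url + "?" + urlencode(params)


# Placeholder endpoint; the index name dc.title is an example CQL index.
url = build_sru_url("http://sru.example.org/catalog", 'dc.title = "catalog"', 5)
```

The response is XML, so the same URL works from a browser, a script or a stylesheet.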
![Page 33: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/33.jpg)
OAI-PMH
Method of harvesting metadata
• More generally, a way of synchronizing databases
No real restriction to metadata
Becomes a repository protocol
• Identifiers
• Timestamps
Layered implementation
• OAI
• SRU
• Pears
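The synchronization view of OAI-PMH rests on identifiers and timestamps: ask for everything changed since the last harvest, then update the local copy record by record. The sketch below is a simplified illustration with a placeholder repository address and made-up records; `list_records_url` and `apply_harvest` are hypothetical helpers, and parsing of the XML response is omitted.

```python
from urllib.parse import urlencode


def list_records_url(base_url, since, metadata_prefix="oai_dc"):
    """Build a ListRecords request for records changed since a UTC datestamp."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix, "from": since}
    return base_url + "?" + urlencode(params)


def apply_harvest(local_store, harvested):
    """Update local_store (identifier -> record) from already-parsed harvested records."""
    for rec in harvested:
        ident = rec["identifier"]
        current = local_store.get(ident)
        # UTC datestamps of the same granularity compare correctly as strings.
        if current is not None and current["datestamp"] >= rec["datestamp"]:
            continue  # local copy is already at least as new
        if rec.get("deleted"):
            local_store.pop(ident, None)
        else:
            local_store[ident] = {"datestamp": rec["datestamp"], "payload": rec["payload"]}
    return local_store


url = list_records_url("http://repository.example.org/oai", "2005-10-01T00:00:00Z")
store = {"oai:example:1": {"datestamp": "2005-10-01T00:00:00Z", "payload": "old"}}
harvested = [
    {"identifier": "oai:example:1", "datestamp": "2005-10-20T00:00:00Z", "payload": "new"},
    {"identifier": "oai:example:2", "datestamp": "2005-10-21T00:00:00Z", "deleted": True, "payload": None},
    {"identifier": "oai:example:3", "datestamp": "2005-10-22T00:00:00Z", "payload": "added"},
]
apply_harvest(store, harvested)
```

Nothing in this loop depends on the payload being metadata, which is the sense in which the protocol generalizes to a repository protocol.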
![Page 34: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/34.jpg)
Efficient processing
Beowulf cluster
Map reduce
Text searching
![Page 35: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/35.jpg)
Beowulf Cluster
24 nodes
• 2 processors, 4 gigabytes of RAM, 120 gigabytes disk
• Gigabit network
Use it for
• FRBR processing
• Text indexing
• Text searching
~30-fold speedup on many tasks
• 1 year ⇒ 2 weeks
• 1 week ⇒ 1 day
• 1 day ⇒ 1 hour
• 1 hour ⇒ 2 minutes
Extremely cheap processing
![Page 36: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/36.jpg)
Map reduce
Pioneered by Google
• Petabytes of data on thousands of nodes
Adapted to our cluster
• Tens of gigabytes of data on dozens of nodes
Simple functional programming paradigm
Allows batch processing across cluster
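The functional paradigm can be shown in a few lines: a mapper emits key/value pairs for each record, and a reducer combines all values that share a key. The sketch below runs in a single process on made-up records; the function names are hypothetical, and a cluster version would distribute the map and reduce steps across nodes.

```python
from collections import defaultdict
from functools import reduce


def map_reduce(records, mapper, reducer):
    """Run mapper over every record, group values by key, then fold each group."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            grouped[key].append(value)
    return {key: reduce(reducer, values) for key, values in grouped.items()}


def language_mapper(record):
    """Emit one (language, 1) pair per record."""
    yield record["language"], 1


def add(a, b):
    return a + b


# Made-up records; counts how many records exist per language code.
records = [{"language": "swe"}, {"language": "eng"}, {"language": "swe"}]
counts = map_reduce(records, language_mapper, add)
```

Because the mapper looks at one record at a time and the reducer only sees values for one key, both steps can be split across machines without changing the functions.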
![Page 37: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/37.jpg)
Text Searching
Spread database across cluster
Two levels of aggregation
• 3 servers/node
• 24-way aggregation
• Aggregators run across cluster
SRU used
• HTTP based
• SRW (SOAP) slowed it down
Open source software
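Two-level aggregation can be sketched as two rounds of merging ranked hit lists: first within each node across its servers, then across nodes. The sketch below works on precomputed, made-up (score, record id) lists; in the system described on the slide the hits would come back over SRU, and the function names here are hypothetical.

```python
import heapq


def top_hits(hit_lists, limit):
    """Merge several (score, record_id) lists and keep the best `limit` hits."""
    all_hits = (hit for hits in hit_lists for hit in hits)
    return heapq.nlargest(limit, all_hits)


def search_cluster(nodes, limit=3):
    """nodes: one entry per node, each a list of per-server hit lists."""
    # Level 1: each node merges the results from its own servers.
    per_node = [top_hits(servers, limit) for servers in nodes]
    # Level 2: an aggregator merges the per-node results.
    return top_hits(per_node, limit)


# Two made-up nodes with three servers each.
nodes = [
    [[(0.9, "a"), (0.2, "b")], [(0.5, "c")], [(0.7, "d")]],
    [[(0.8, "e")], [(0.1, "f")], [(0.6, "g")]],
]
best = search_cluster(nodes)
```

Each level only passes its top hits upward, so the final aggregator handles a small amount of data no matter how large the database partitions are.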
![Page 38: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/38.jpg)
Better interfaces
More interactive
• Live search
• Dewey Browser
Better connected
![Page 39: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/39.jpg)
![Page 40: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/40.jpg)
![Page 41: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/41.jpg)
![Page 42: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/42.jpg)
![Page 43: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/43.jpg)
![Page 44: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/44.jpg)
Post-coordination of Services
Systems that expose low level services
Higher level coordination of those services
Loosely coupled services
Examples from OCLC
• Validation service
• RSS feeds
• SRU
• OpenURL, OAI-PMH
• xISBN
• DDC Browser built this way
• Very different interfaces have been built
![Page 45: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/45.jpg)
DDC Browser XML
<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="/ddcbrowser/xsl/wcat.xsl" ?>
<cells>
  <language>swe</language>
  <cell ddc="330" count="23" />
  <cell ddc="331" count="28" />
  <cell ddc="332" count="5" />
  <cell ddc="333" count="7" />
  <cell ddc="334" count="2" />
  <cell ddc="335" count="1" />
  <cell ddc="336" count="3" />
  <cell ddc="337" count="2" />
  <cell ddc="338" count="26" />
  <cell ddc="339" count="5" />
</cells>
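Since the DDC Browser data is plain XML, any client can consume it, not only the XSL stylesheet it names. As a minimal sketch, the response from the slide can be parsed with Python's standard library; the XML declaration and stylesheet instruction are left out of the string for brevity, and the variable names are this example's own.

```python
import xml.etree.ElementTree as ET

# The <cells> document from the slide, without its XML declaration
# and stylesheet processing instruction.
DDC_XML = """<cells>
  <language>swe</language>
  <cell ddc="330" count="23" />
  <cell ddc="331" count="28" />
  <cell ddc="332" count="5" />
  <cell ddc="333" count="7" />
  <cell ddc="334" count="2" />
  <cell ddc="335" count="1" />
  <cell ddc="336" count="3" />
  <cell ddc="337" count="2" />
  <cell ddc="338" count="26" />
  <cell ddc="339" count="5" />
</cells>"""

root = ET.fromstring(DDC_XML)
language = root.findtext("language")

# Map each Dewey number to its record count.
counts = {cell.get("ddc"): int(cell.get("count")) for cell in root.findall("cell")}
total = sum(counts.values())
busiest = max(counts, key=counts.get)
```

A different interface could take the same counts and draw them as a chart or a table, which is the kind of post-coordination the preceding slide describes.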
![Page 46: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/46.jpg)
Do We Need It?
Just have Google harvest everything
• Our experience with Google
• Fielded searching
• Reliable searching
Possibility of user-supplied metadata
Cost of good metadata
Cost of non-existent metadata
![Page 47: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/47.jpg)
Conclusions
Shift to remote users
Online availability: trend towards centralization
More flexibility in implementations
Patrons are better served
Less emphasis on physical collections
![Page 48: New approaches to the catalog](https://reader033.vdocument.in/reader033/viewer/2022042822/56812d6c550346895d927fdf/html5/thumbnails/48.jpg)
Thank you
T. Hickey
http://errol.oclc.org/laf/n82-54463.html
Swedish Library Association
2005 October 28