New approaches to the catalog T. Hickey http://errol.oclc.org/laf/n82-54463.html Svensk Biblioteksförening 2005 October 28

TRANSCRIPT

Page 1: New approaches to the catalog

New approaches to the catalog

T. Hickey http://errol.oclc.org/laf/n82-54463.html

Svensk Biblioteksförening 2005 October 28

Page 2: New approaches to the catalog

OCLC

Founded 1967
Nonprofit membership organization
> 53,000 libraries
96 countries
~1,000 employees

Cataloging
Interlibrary Loan
Preservation
Dewey Decimal Classification
netLibrary
FirstSearch

Page 3: New approaches to the catalog

OCLC Research

Research for both
• OCLC services
• Membership

Metadata management
Knowledge organization
Content management
Interoperability
Systems & interaction design
~30 employees

Page 4: New approaches to the catalog

What do users want?

The right information, with minimum effort

Page 5: New approaches to the catalog

How to give them what they want

Catch them where they are
Increase our data
Improve our data
Make the data work harder
Interconnect with other systems
Do all this efficiently

Page 6: New approaches to the catalog

What has changed

Computers and telecommunications
• User expectations
• Digital materials
• Remoteness of our users
• Huge amounts of bandwidth, storage

Page 7: New approaches to the catalog

The competition

Online booksellers
• Reviews
• Tables of contents
• Excerpts
• Inside-the-book searching

Web search engines
• Speed
• Full-text searching
• Global coverage (of web resources)
• Good enough

Ourselves
• Electronic journals

Page 8: New approaches to the catalog

Current projects (my group)

Live search
Registries, PURLs
Dewey browser
Harvesting, electronic theses
VIAF, LAF
SRU/W, OpenURLs, OAI
FRBR, xISBN
Beowulf cluster
Map-reduce
Text searching
Batch loading

Open WorldCat
WorldCat Wiki
Publisher Names
MXG

Page 9: New approaches to the catalog

Other Research Projects

FictionFinder, Curiouser
Schema Transformation
Terminology Services
Digital Preservation
Collection Analysis
Dublin Core
FAST
User Studies
Data mining

Also: http://www.oclc.org/research/researchworks/

Page 10: New approaches to the catalog

Catch them where they are

Google, Yahoo, etc.
• Open WorldCat
• OpenURL
• OAI-PMH

Creation too
• WorldCat Wiki
• Tags?

Page 11: New approaches to the catalog

Open WorldCat

Page 12: New approaches to the catalog

Editions

Page 13: New approaches to the catalog

OpenURL

OpenURL registry
• Supports version 1.0
• Also registry of OpenURL servers
• Used for WikiD

Page 14: New approaches to the catalog

WorldCat ‘Wiki’

Opening up WorldCat to user annotations
• Reviews
• Notes
• Tables of contents
• Cover art?
• Book lists?

Based on WikiD software
• Full Wiki
• Many features off for WorldCat
• Uses OpenURL 1.0 protocol internally
• Allows collections of pages of arbitrary XML schemas
• Tools for the creation of simple collections

Doesn’t look like a Wiki

Page 15: New approaches to the catalog

Reviews

Page 16: New approaches to the catalog

Tags?

Folksonomies?
User-generated keywords
We’ve been here before
• Is it different?
• Is there another direction?

Page 17: New approaches to the catalog

Opening Dewey

Page 18: New approaches to the catalog
Page 19: New approaches to the catalog

More data

Harvesting
• OAI-PMH
• ETDs

Batch load
• 60 million records
• 3 million new manifestations

Other
• Cover art
• Reviews
• WC

Page 20: New approaches to the catalog

Better data and organization

VIAF
FRBR
Authority files in general
• LAF
• Publisher names
• Genre
• FAST

Registries
• PURLs
• Generalized solution?

Get them nearer to creation

Page 21: New approaches to the catalog

FRBR

Work-set algorithm
• Keys based on author/title
• Authority files
• Auxiliary authority files
• xISBN

Used for
• xISBN
• Open WorldCat
• FirstSearch (coming)
• Collection analysis (coming)
• Research
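The slide says only that work-sets are built from keys based on author/title. A minimal sketch of that idea follows, assuming a simple normalization (lowercasing, stripping accents and punctuation); the function names, the normalization rules, and the three sample records are illustrative assumptions, not OCLC's actual algorithm, which also draws on authority files.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", no_marks.lower())
    return " ".join(cleaned.split())


def work_key(author, title):
    """Build a hypothetical author/title key for grouping records."""
    return normalize(author) + "/" + normalize(title)


def group_into_work_sets(records):
    """Group record ids under their shared author/title key."""
    work_sets = defaultdict(list)
    for rec in records:
        work_sets[work_key(rec["author"], rec["title"])].append(rec["id"])
    return dict(work_sets)


# Made-up records: r1 and r2 differ only in case, punctuation and accents.
records = [
    {"id": "r1", "author": "Lindgren, Astrid", "title": "Pippi Långstrump"},
    {"id": "r2", "author": "LINDGREN, ASTRID.", "title": "Pippi Langstrump"},
    {"id": "r3", "author": "Lindgren, Astrid", "title": "Emil i Lönneberga"},
]
work_sets = group_into_work_sets(records)
```

Records whose keys collide land in the same work-set; records with variant name forms would need the authority files the slide mentions to be brought together.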

Page 22: New approaches to the catalog
Page 23: New approaches to the catalog

Authority Files

LAF
• http://errol.oclc.org/laf/n82-54463.html

Publisher names
• Not normally controlled
• Looking for variations with ISBN prefixes
• Also worked with dissertations

Page 24: New approaches to the catalog

VIAF

Merge national-level files
Library of Congress (NACO) and Die Deutsche Bibliothek
• Bibliographic records analyzed
• 15% would be erroneous based just on names

Basic matching now completed
• 435,000 matching names
• < 1% mismatched

Working on
• Public interface
• OAI harvesting
• Persistent identifiers
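The slide notes that matching on names alone would be wrong about 15% of the time, which is why bibliographic records were analyzed. One way to picture that, as a sketch only: accept a name match between two authority files only when the two records also share at least one title. The record layout, the shared-title rule, and the sample data below are all assumptions for illustration; the slides do not describe the actual VIAF matching rules.

```python
def normalize_name(name):
    """Lowercase a heading and drop commas, periods and extra spaces."""
    return " ".join(name.lower().replace(",", " ").replace(".", " ").split())


def match_authorities(file_a, file_b):
    """Pair records with the same normalized name and a shared title."""
    by_name = {}
    for rec in file_b:
        by_name.setdefault(normalize_name(rec["name"]), []).append(rec)

    matches = []
    for rec_a in file_a:
        for rec_b in by_name.get(normalize_name(rec_a["name"]), []):
            # The name alone is not enough: require bibliographic evidence.
            if set(rec_a["titles"]) & set(rec_b["titles"]):
                matches.append((rec_a["id"], rec_b["id"]))
    return matches


# Made-up authority records: two people share a name, only one shares a title.
file_a = [{"id": "a1", "name": "Andersson, Anna", "titles": ["title one"]}]
file_b = [
    {"id": "b1", "name": "Andersson, Anna.", "titles": ["title one", "title two"]},
    {"id": "b2", "name": "Andersson, Anna", "titles": ["title three"]},
]
matches = match_authorities(file_a, file_b)
```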

Page 25: New approaches to the catalog
Page 26: New approaches to the catalog

Maj

Page 27: New approaches to the catalog

Registries

Show relationships between metadata
Often associated with an identifier
General solution?
Examples
• Authority files
• WorldCat
• PURLs

Page 28: New approaches to the catalog

PURLs

Persistent URLs
• Map one URL to another
• http://purl.org/hickey/outgoing -> http://outgoing.typepad.com/
• 500,000+ PURLs
• 111 million resolutions

Port to WikiD platform?
• http://www.oclc.org/research/projects/wikid/

String of PURL servers?
• Use OAI-PMH for synchronization
• Spread responsibility

Generalized solution?
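The core PURL idea on this slide, mapping one URL to another, can be sketched as a small lookup table that answers with a redirect. The class below is a hypothetical in-memory model, not the PURL server software; the use of HTTP status 302 for the redirect is an assumption of the sketch.

```python
class PurlResolver:
    """Hypothetical in-memory table mapping persistent paths to target URLs."""

    def __init__(self):
        self._targets = {}
        self.resolutions = 0  # running count of successful lookups

    def register(self, purl_path, target_url):
        """Create or update a PURL; the persistent path itself never changes."""
        self._targets[purl_path] = target_url

    def resolve(self, purl_path):
        """Return (status, location) as an HTTP redirect would."""
        target = self._targets.get(purl_path)
        if target is None:
            return 404, None
        self.resolutions += 1
        return 302, target


resolver = PurlResolver()
# The mapping shown on the slide.
resolver.register("/hickey/outgoing", "http://outgoing.typepad.com/")
status, location = resolver.resolve("/hickey/outgoing")
```

If the target moves, only the table entry changes; links that cite the PURL keep working.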

Page 29: New approaches to the catalog

More connectivity

OpenURL
RSS feeds
OpenSearch, SRU/W
OAI-PMH

Page 30: New approaches to the catalog

OpenURL

Developed to address the ‘appropriate copy’ problem
Transitioning to OpenURL 1.0
OpenURL resolver
• Accepts requests specifying
• Resource
• Services

Generalized syntax
• Specifying a resource
• Services to be performed

Metadata elements specified in registry
• http://purl.org/openurl/
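To make the resource part of an OpenURL 1.0 request concrete, here is a sketch that builds a key/encoded-value query string for a book. The keys `url_ver`, `rft_val_fmt`, `rft.isbn` and `rft.btitle` come from the OpenURL 1.0 (Z39.88-2004) book format; the resolver address, the ISBN and the title are placeholders invented for the example, and the service part of the request is left out.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical resolver address; a real library would supply its own.
RESOLVER_BASE = "http://resolver.example.org/openurl"


def build_openurl(isbn, title):
    """Describe a book (the referent) as an OpenURL 1.0 KEV query."""
    params = [
        ("url_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:book"),
        ("rft.isbn", isbn),
        ("rft.btitle", title),
    ]
    return RESOLVER_BASE + "?" + urlencode(params)


# Placeholder ISBN and title, for illustration only.
url = build_openurl("0000000000", "Example title")
fields = parse_qs(urlsplit(url).query)
```

The same description can be sent to any resolver, which is how each user is pointed at the copy appropriate for them.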

Page 31: New approaches to the catalog
Page 32: New approaches to the catalog

SRU

Simplified version of Z39.50
• Web based
• SRW – SOAP
• SRU – URL

Even simpler?
• OpenSearch
• No search syntax
• Looking for common ground

MXG
• Metasearch XML Gateway
• Simplifies metasearchers’ lives
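Because SRU puts the whole search in a URL, a request can be sketched with nothing more than query-string encoding. The parameters below (`operation`, `version`, `query`, `startRecord`, `maximumRecords`) are standard SRU 1.1 searchRetrieve parameters; the server address and the CQL query are assumptions made up for the example.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical SRU endpoint.
SRU_BASE = "http://sru.example.org/search"


def build_sru_url(cql_query, start=1, maximum=10):
    """Build an SRU 1.1 searchRetrieve URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "startRecord": start,
        "maximumRecords": maximum,
    }
    return SRU_BASE + "?" + urlencode(params)


sru_url = build_sru_url('dc.title = "catalog"', maximum=5)
sru_fields = parse_qs(urlsplit(sru_url).query)
```

The response would be XML; SRW carries the same request and response inside a SOAP envelope instead.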

Page 33: New approaches to the catalog

OAI-PMH

Method of harvesting metadata
• More generally, a way of synchronizing databases

No real restriction to metadata
Becomes a repository protocol
• Identifiers
• Timestamps

Layered implementation
• OAI
• SRU
• Pears
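The synchronization reading of OAI-PMH rests on the two things the slide lists: identifiers and timestamps. A sketch under stated assumptions: the harvester asks for everything changed since its last run (`verb`, `metadataPrefix` and `from` are standard OAI-PMH request parameters) and keeps the newest version of each identifier. The repository address, the record layout and the sample records are invented, and fetching and XML parsing are left out.

```python
from urllib.parse import urlencode, parse_qs, urlsplit


def list_records_url(base_url, last_harvest=None, metadata_prefix="oai_dc"):
    """Build a ListRecords request, optionally limited to recent changes."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if last_harvest:
        params["from"] = last_harvest
    return base_url + "?" + urlencode(params)


def apply_harvest(local, harvested):
    """Merge harvested records into a local store keyed by identifier.

    Datestamps are assumed to be UTC ISO 8601 strings of the same
    granularity, so plain string comparison orders them correctly.
    Returns the newest datestamp seen, to use as the next 'from' value.
    """
    for rec in harvested:
        current = local.get(rec["identifier"])
        if current is None or rec["datestamp"] > current["datestamp"]:
            local[rec["identifier"]] = rec
    return max((r["datestamp"] for r in local.values()), default=None)


# Hypothetical repository address and records.
oai_url = list_records_url("http://repository.example.org/oai", "2005-10-01T00:00:00Z")
oai_fields = parse_qs(urlsplit(oai_url).query)

store = {"oai:example:1": {"identifier": "oai:example:1", "datestamp": "2005-09-01T00:00:00Z"}}
newest = apply_harvest(store, [
    {"identifier": "oai:example:1", "datestamp": "2005-10-02T00:00:00Z"},
    {"identifier": "oai:example:2", "datestamp": "2005-10-03T00:00:00Z"},
])
```

Nothing in the merge step cares what the records contain, which is the sense in which the protocol is not restricted to metadata.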

Page 34: New approaches to the catalog

Efficient processing

Beowulf cluster
Map reduce
Text searching

Page 35: New approaches to the catalog

Beowulf Cluster

24 nodes
• 2 processors, 4 gigabytes of RAM, 120 gigabytes disk
• Gigabit network

Use it for
• FRBR processing
• Text indexing
• Text searching

~30-fold speed up on many tasks
• 1 year ⇒ 2 weeks
• 1 week ⇒ 1 day
• 1 day ⇒ 1 hour
• 1 hour ⇒ 2 minutes

Extremely cheap processing
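The before-and-after times on the slide read as rounded figures. Dividing each starting duration by exactly 30 gives a sense of how they compare; the exact factor of 30 and the 365-day year are assumptions of this sketch.

```python
from datetime import timedelta

SPEEDUP = 30  # the slide says "~30-fold", so this is approximate

tasks = {
    "1 year": timedelta(days=365),
    "1 week": timedelta(weeks=1),
    "1 day": timedelta(days=1),
    "1 hour": timedelta(hours=1),
}


def sped_up(duration, factor=SPEEDUP):
    """Return the running time after an ideal speed-up by the given factor."""
    return duration / factor


for label, duration in tasks.items():
    print(label, "=>", sped_up(duration))
```

An exact 30-fold speed-up turns a year into a little over 12 days, a week into 5.6 hours, a day into 48 minutes, and an hour into 2 minutes, so the slide's figures are on the cautious side for the middle two rows.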

Page 36: New approaches to the catalog

Map reduce

Pioneered by Google
• Petabytes of data on thousands of nodes

Adapted to our cluster
• Tens of gigabytes of data on dozens of nodes

Simple functional programming paradigm
Allows batch processing across cluster
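The programming model itself is small enough to show in a single process. In this sketch the caller supplies a `mapper` that emits key/value pairs and a `reducer` that combines the values for one key; everything here runs on one machine, so it illustrates the paradigm only, not the cluster implementation, and the sample records are made up.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run mapper over every record, group values by key, then reduce."""
    groups = defaultdict(list)
    for rec in records:
        for key, value in mapper(rec):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def language_mapper(rec):
    """Emit one (language, 1) pair per record."""
    yield rec["language"], 1


def count_reducer(key, values):
    """Add up the counts emitted for one language."""
    return sum(values)


# Made-up bibliographic records.
sample = [{"language": "swe"}, {"language": "eng"}, {"language": "swe"}]
counts_by_language = map_reduce(sample, language_mapper, count_reducer)
```

On a cluster the map calls run on the nodes holding the data and the grouped values are shipped to the reducers, but the two user-written functions stay the same.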

Page 37: New approaches to the catalog

Text Searching

Spread database across cluster
Two levels of aggregation
• 3 servers/node
• 24-way aggregation
• Aggregators run across cluster

SRU used
• HTTP based
• SRW (SOAP) slowed it down

Open source software
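One reading of the two aggregation levels is that each node first merges the results of its 3 search servers, and a second aggregator then merges the 24 node-level lists. The sketch below follows that reading, which is an assumption; the scoring, the fake shard contents, and the function names are invented, and no network calls are made.

```python
import heapq
from itertools import islice

NODES = 24
SERVERS_PER_NODE = 3


def merge_top(hit_lists, limit):
    """Merge hit lists already sorted by descending score; keep the best."""
    merged = heapq.merge(*hit_lists, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, limit))


def shard_hits(node, server):
    """Stand-in for one search server: made-up (score, doc id) pairs."""
    shard = node * SERVERS_PER_NODE + server
    return [(1000 - shard - 72 * k, f"doc-{shard}-{k}") for k in range(5)]


def search(limit=10):
    """First merge the servers on each node, then merge across nodes."""
    node_results = [
        merge_top([shard_hits(n, s) for s in range(SERVERS_PER_NODE)], limit)
        for n in range(NODES)
    ]
    return merge_top(node_results, limit)


top_hits = search()
```

Each level only ever handles `limit` hits per input, so the top-level aggregator merges 24 short lists rather than 72 full result sets.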

Page 38: New approaches to the catalog

Better interfaces

More interactive
• Live search
• Dewey Browser

Better connected

Page 39: New approaches to the catalog
Page 40: New approaches to the catalog
Page 41: New approaches to the catalog
Page 42: New approaches to the catalog
Page 43: New approaches to the catalog
Page 44: New approaches to the catalog

Post-coordination of Services

Systems that expose low level services
Higher level coordination of those services
Loosely coupled services
Examples from OCLC
• Validation service
• RSS feeds
• SRU
• OpenURL, OAI-PMH
• xISBN
• DDC Browser built this way
• Very different interfaces have been built

Page 45: New approaches to the catalog

DDC Browser XML

<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="/ddcbrowser/xsl/wcat.xsl" ?>
<cells>
  <language>swe</language>
  <cell ddc="330" count="23" />
  <cell ddc="331" count="28" />
  <cell ddc="332" count="5" />
  <cell ddc="333" count="7" />
  <cell ddc="334" count="2" />
  <cell ddc="335" count="1" />
  <cell ddc="336" count="3" />
  <cell ddc="337" count="2" />
  <cell ddc="338" count="26" />
  <cell ddc="339" count="5" />
</cells>
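A response like the one on this slide is easy for another program to consume, which is the point of exposing it as a low-level service. As a sketch, the snippet below parses the same XML with Python's standard library and tallies the counts per Dewey class; the variable names are illustrative, and the XML is pasted in rather than fetched from the service.

```python
import xml.etree.ElementTree as ET

# The response shown on the slide, as bytes so the encoding declaration applies.
DDC_XML = b"""<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="/ddcbrowser/xsl/wcat.xsl" ?>
<cells>
  <language>swe</language>
  <cell ddc="330" count="23" />
  <cell ddc="331" count="28" />
  <cell ddc="332" count="5" />
  <cell ddc="333" count="7" />
  <cell ddc="334" count="2" />
  <cell ddc="335" count="1" />
  <cell ddc="336" count="3" />
  <cell ddc="337" count="2" />
  <cell ddc="338" count="26" />
  <cell ddc="339" count="5" />
</cells>
"""

root = ET.fromstring(DDC_XML)
language = root.findtext("language")
# Map each Dewey class number to its record count.
ddc_counts = {cell.get("ddc"): int(cell.get("count")) for cell in root.iter("cell")}
total = sum(ddc_counts.values())
```

A browser applies the referenced XSL stylesheet to render the same data for people, so one response serves both uses.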

Page 46: New approaches to the catalog

Do We Need It?

Just have Google harvest everything
• Our experience with Google
• Fielded searching
• Reliable searching

Possibility of user-supplied metadata
Cost of good metadata
Cost of non-existent metadata

Page 47: New approaches to the catalog

Conclusions

Shift to remote users
Online availability – trend towards centralization
More flexibility in implementations

Patrons are better served
Less emphasis on physical collections

Page 48: New approaches to the catalog

Thank you

T. Hickey http://errol.oclc.org/laf/n82-54463.html

Swedish Library Association 2005 October 28