Delivering MARC/XML records from the Library of Congress catalogue using the open protocols SRW/U and Z39.50

Mike Taylor, Index Data <mike@indexdata.com>


Overview

Where we're headed in the next half-hour:

Existing standards for library catalogues

The new XML equivalents of these standards

Providing XML access to existing catalogues

Two services running from two databases

Two services running from a single database

New gateway running over the existing service

The Library of Congress's solution


Existing standards for catalogues

The value of existing standards is well understood:

MARC (MAchine Readable Catalogue) records

ISO 2709 (interchange format for MARC)

ANSI/NISO Z39.50 (search and retrieve on the Internet)

These standards allow interoperability and co-operation

between libraries that other fields can only dream about.

(Librarians don't know how lucky they are!)

Z39.50 for searching catalogues

[Diagram: a Z39.50 client fetches MARC records over Z39.50 from the Library of Congress Z39.50 server. Successive builds add the British Library and a local catalogue as further Z39.50 servers.]

Z39.50 for searching multiple catalogues

[Diagram: a metasearching Z39.50 client queries all three Z39.50 servers (Library of Congress, British Library, local catalogue) over Z39.50.]


Trouble in paradise

Then the serpent saith unto Adam, “Lo, why doth thy

catalogue service not use XML?” And Adam saith, “Verily,

Z39.50 worketh just fine.” But the serpent, who was subtle

of tongue, saith unto him, “But XML is more fashionable.”

And, behold, Adam was deceived, and did fall.

-- The Book of Standards, ch. 3, v. 4-6.

Welcome to the 21st Century

[Diagram: the metasearching Z39.50 client and its three Z39.50 servers, overlaid in turn with the captions "Everything must be XML" and "Resistance is useless!"]


Catalogue standards in an XML world

The binary USMARC format is superseded by MARCXML.

“As many of the original developers of Dublin Core were

Americans, various parochial national standards were

referenced. This will hopefully get fixed with the belated

discovery of the rest of the planet.” (Unattributed, sadly.)

Enter MarcXchange, a MARCXML superset that can

represent all the national MARC formats (DANMARC, etc.)

(Though repairing MARCXML might have been better.)


Catalogue standards in an XML world

The binary Z39.50 protocol is superseded by SRU

(Search/Retrieve via URL). This is a NISO-registered

standard for expressing queries using rich URLs, to obtain

XML responses that contain records matching the query.

http://sru.miketaylor.org.uk/sru.pl?version=1.1&operation=searchRetrieve&query=dinosaur&startRecord=1&maximumRecords=1&recordSchema=dc
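As a rough illustration of how such a URL is put together, here is a minimal Python sketch. The function name `sru_search_url` and its defaults are assumptions made for this example; only the parameter names and the base URL come from the request shown above.

```python
from urllib.parse import urlencode


def sru_search_url(base_url, query, start_record=1, maximum_records=1,
                   record_schema="dc", version="1.1"):
    """Build an SRU searchRetrieve URL carrying a CQL query."""
    params = {
        "version": version,
        "operation": "searchRetrieve",
        "query": query,                    # CQL, percent-encoded by urlencode
        "startRecord": start_record,
        "maximumRecords": maximum_records,
        "recordSchema": record_schema,
    }
    return base_url + "?" + urlencode(params)


url = sru_search_url("http://sru.miketaylor.org.uk/sru.pl", "dinosaur")
print(url)
```

Richer CQL queries pass through the same function; `urlencode` takes care of escaping characters such as `=` and spaces.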

An SRU response (single DC record)

<?xml version="1.0"?>
<zs:searchRetrieveResponse xmlns:zs='http://www.loc.gov/zing/srw/'>
  <zs:version>1.1</zs:version>
  <zs:numberOfRecords>29</zs:numberOfRecords>
  <zs:records>
    <zs:record>
      <zs:recordSchema>info:srw/schema/1/dc-v1.1</zs:recordSchema>
      <zs:recordPacking>xml</zs:recordPacking>
      <zs:recordPosition>1</zs:recordPosition>
      <zs:recordData>
        <srw_dc:dc xmlns:srw_dc="info:srw/schema/1/dc-schema" xmlns="http://purl.org/dc/elements/1.1/">
          <title>Fossils</title>
          <creator>Lappi, Megan.</creator>
          <type>text</type>
          <publisher>New York, NY: Weigl Publishers</publisher>
          <date>2005</date>
          <language>en</language>
          <description>Studying fossils -- Fossil facts -- Gone forever -- A fossil is born -- From bone to stone -- Insects in amber -- Dinosaur footprints</description>
          <identifier>http://www.loc.gov/catdir/toc/ecip0415/2004004136.html</identifier>
          <identifier>URN:ISBN:1590362136</identifier>
        </srw_dc:dc>
      </zs:recordData>
    </zs:record>
  </zs:records>
</zs:searchRetrieveResponse>
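Because the response is plain namespaced XML, any XML parser can pick it apart. The following Python sketch uses only the standard library; the helper name `summarise` and the trimmed `SAMPLE` document are assumptions for illustration, with element names and values taken from the response above.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as they appear in the response above.
NS = {
    "zs": "http://www.loc.gov/zing/srw/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# A trimmed copy of the response, kept small for the example.
SAMPLE = """<?xml version="1.0"?>
<zs:searchRetrieveResponse xmlns:zs='http://www.loc.gov/zing/srw/'>
  <zs:version>1.1</zs:version>
  <zs:numberOfRecords>29</zs:numberOfRecords>
  <zs:records>
    <zs:record>
      <zs:recordSchema>info:srw/schema/1/dc-v1.1</zs:recordSchema>
      <zs:recordPacking>xml</zs:recordPacking>
      <zs:recordPosition>1</zs:recordPosition>
      <zs:recordData>
        <srw_dc:dc xmlns:srw_dc="info:srw/schema/1/dc-schema"
                   xmlns="http://purl.org/dc/elements/1.1/">
          <title>Fossils</title>
          <creator>Lappi, Megan.</creator>
          <identifier>URN:ISBN:1590362136</identifier>
        </srw_dc:dc>
      </zs:recordData>
    </zs:record>
  </zs:records>
</zs:searchRetrieveResponse>"""


def summarise(response_xml):
    """Return (total hit count, list of per-record dicts) from an SRU response."""
    root = ET.fromstring(response_xml)
    total = int(root.findtext("zs:numberOfRecords", namespaces=NS))
    records = []
    for record in root.findall("zs:records/zs:record", NS):
        data = record.find("zs:recordData", NS)
        records.append({
            "position": int(record.findtext("zs:recordPosition", namespaces=NS)),
            "title": data.findtext(".//dc:title", namespaces=NS),
            "creator": data.findtext(".//dc:creator", namespaces=NS),
            "identifiers": [e.text for e in data.iterfind(".//dc:identifier", NS)],
        })
    return total, records


total, records = summarise(SAMPLE)
print(total, records[0]["title"])
```

Note that the server found 29 records but returned only one, as requested by maximumRecords=1.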


SRU's big brother: SRW

SRU works by fetching rich URLs.

SRW (Search/Retrieve Web Service) works over SOAP.

In theory, SRW is more powerful and flexible than SRU.

In practice, it is hard to implement and runs more slowly.

It is still important because many Big Players (Microsoft,

IBM, etc.) have a big investment in SOAP.

However, most implementations have used SRU. With

HTTP/1.1 persistent connections, performance is fine.


SRU's query language: CQL

CQL (Common Query Language) is used by SRU and SRW.

It may also be used in other contexts (including Z39.50).

Its syntax is easy to learn, but very expressive.

dinosaur
title=dinosaur
title=(dinosaur or pterosaur) and author=martill
dc.title=*saur and dc.author=martill
title exact "the complete dinosaur" and date < 2000
name=/phonetic "smith"
fish prox/distance<3/unit=sentence frog


Now what?

We have:

A mature, functional infrastructure based on MARC and Z39.50

A world out there that is comfortable with XML-based technology

An XML-based equivalent of MARC (MARCXML/MarcXchange)

An XML-based equivalent of Z39.50 (SRU)

But we don't have:

Actual running SRU servers that deliver MARCXML records.

Can we get there from here?

Server providers don't want to switch

[Diagram: a Z39.50 client sends Z39.50 to a Library of Congress SRU server. Uh-oh!]

Client applications don't want to switch

[Diagram: an SRU client sends SRU to the Library of Congress Z39.50 server. Uh-oh!]

Transition period: run both services

[Diagram: a Z39.50 client talks Z39.50 to the Library of Congress Z39.50 server, while an SRU client talks SRU to a Library of Congress SRU server.]

This approach gives client applications a choice:

Existing client applications continue to work

New applications can be built using new technology

This flexibility comes at a cost to the service providers,

who have to provide not one but two services.

How can they do this? There are three approaches.

The two-database approach

[Diagram: the Library of Congress Z39.50 server reads a MARC database and the Library of Congress SRU server reads a separate MARCXML database, each through a proprietary API.]


Why the two-database approach sucks

The two-database approach has the advantage of conceptual and

operational simplicity. The two separate systems can be

maintained by separate teams.

However: THE TWO DATABASES HAVE TO BE KEPT

SYNCHRONISED.

At best this entails duplication of effort.

At worst, it fails completely, and a record fetched from one

database may be different from the same record fetched

from the other database. (If it exists at all.)

The one-database-two-services approach

[Diagram: the Library of Congress Z39.50 server and the Library of Congress SRU server both read a single MARC database, each through a proprietary API.]


Advantages of the 1D2S approach

When both services use data from the same database,

only one copy of the database has to be maintained.

This approach has several advantages:

Eliminates duplication

Reduces redundancy

Reduces redundancy

Eliminates duplication

The horrible truth

[Diagram: the Library of Congress Z39.50 server is built into a proprietary database, an opaque black box. The Library of Congress SRU server has no way in: No API!]

When the database (and Z39.50 server) are part of an integrated

proprietary system, the SRU server runs into a brick wall.

The solution

[Diagram: the Library of Congress SRU server reaches the proprietary database through the Library of Congress Z39.50 server. Z39.50 IS the API! The black box has a little hole.]


Why this is so cute

When the SRU server uses Z39.50 as its API to the database,

it is an SRU-to-Z39.50 gateway. Its front-end is an SRU

server and its back-end is a Z39.50 client.

This rocks because:

No duplication of data is necessary

No co-operation is necessary from the existing software

Use of the standard Z39.50 protocol as the API to the

database means that THE SAME GATEWAY can be

used to provide SRU access to ANY CATALOGUE

that is already available via Z39.50.


A novel application of Z39.50

Z39.50 is most often used to allow a client to query a

remote server.

Here we are using it as a tightly integrated part of

a locally provided service -- the gateway will typically run

on the same machine as the Z39.50 server, or on a

“nearby” machine on the same LAN.

HOWEVER, because Z39.50 is a network API rather than

a link-time API, other interesting arrangements are possible.

Typical architecture: “integrated” SRU

[Diagram: an SRU client talks SRU to the Library of Congress SRU server, which sits alongside the Library of Congress Z39.50 server and its proprietary database (an opaque black box).]

Alternative architecture: “3rd party” SRU

[Diagram: an SRU client talks SRU to a 3rd party service's SRU server, which in turn queries the Library of Congress Z39.50 server and its proprietary database (an opaque black box). Location labels on the diagram: "Running in England", "Running in USA", "Denmark".]


“What's it like?”

SRU client software neither knows nor cares that the

server it is connected to is really a gateway.

Application user knows nothing about the Z39.50 database.

You might expect that performance would degrade due

to the additional step.

In practice, with a high-quality gateway, performance of

the SRU server greatly exceeds that of the underlying

Z39.50 server. (This is done using magic.)


The Library of Congress's solution

The Library of Congress contracted Index Data (that's us)

to build an SRU-to-Z39.50 gateway for them.

Having built it, we released it under an Open Source licence

(the GNU General Public Licence).

The LC SRU server is available to anyone at: http://z3950.loc.gov:7090/Voyager

The gateway is freely available to download at: http://indexdata.com/yazproxy/


(Digression: why is it called YAZ Proxy?)

YAZ is our battle-tested and widely deployed Z39.50 toolkit.

(It powers 2/3 of all Z39.50 clients and servers worldwide.)

YAZ Proxy is so called because it acts as a Z39.50-to-Z39.50

gateway as well as SRU-to-Z39.50 (and SRW-to-Z39.50).

Why would you want a Z39.50 proxy? For the same reasons

you want a Web proxy such as Squid:

Reduce load on the underlying server

Improve client performance through caching

Protect fragile back-end by sanitising client requests

Balance load over multiple back-end servers


What YAZ Proxy does

For each SRU Search Request that it receives, YAZ Proxy:

Translates the CQL query into a Z39.50 Type-1 query

Embeds the translated query in a Z39.50 Search Request

Sends the request to the back-end server

(Asynchronously) awaits the Z39.50 Search Response

Extracts the MARC records from the response

Converts them into MARCXML

Embeds the converted records in an SRU Search Response

Returns the response to the client

All this is transparent to the SRU client and the Z39.50 server.


The sauropod dinosaur Brachiosaurus

(It's been a while since we had a picture.)


YAZ Proxy in detail: performance features

Access to the LC catalogue -- whether by Z39.50 or SRU --

is much faster through YAZ Proxy than directly.

YAZ Proxy re-uses a pool of initialised back-end sessions

It can pre-cache a set of ready-to-use back-end sessions

Query-caching avoids repeated identical searches

Record-caching allows repeated requests for the same

record to be instantaneous

The total effect is that access via YAZ Proxy is typically 10-100

times faster. (Source: Larry Dixson of the Library of Congress.)
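To make the query-caching and record-caching ideas concrete, here is a minimal Python sketch. It is not YAZ Proxy's implementation: the class `CachingGateway` and the stand-in back-end functions `slow_search` and `slow_fetch` are hypothetical, and a real cache would also need expiry and size limits.

```python
class CachingGateway:
    """Hypothetical wrapper that caches searches and records from a slow back-end."""

    def __init__(self, backend_search, backend_fetch):
        self._backend_search = backend_search
        self._backend_fetch = backend_fetch
        self._hit_counts = {}   # query -> number of matching records
        self._records = {}      # (query, position) -> record

    def search(self, query):
        # Query cache: an identical search never reaches the back-end twice.
        if query not in self._hit_counts:
            self._hit_counts[query] = self._backend_search(query)
        return self._hit_counts[query]

    def record(self, query, position):
        # Record cache: repeated requests for the same record are served locally.
        key = (query, position)
        if key not in self._records:
            self._records[key] = self._backend_fetch(query, position)
        return self._records[key]


calls = {"search": 0, "fetch": 0}


def slow_search(query):
    """Stand-in for a back-end search; counts how often it is called."""
    calls["search"] += 1
    return 29


def slow_fetch(query, position):
    """Stand-in for a back-end record fetch; counts how often it is called."""
    calls["fetch"] += 1
    return "record %d for %s" % (position, query)


gateway = CachingGateway(slow_search, slow_fetch)
for _ in range(3):
    gateway.search("dinosaur")
    gateway.record("dinosaur", 1)
print(calls)
```

Three identical requests from clients cost the back-end one search and one fetch.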


YAZ Proxy in detail: load balancing

YAZ Proxy can be configured to balance load across

multiple back-end Z39.50 servers. Queries are generally

sent to the least heavily loaded back-end.

This allows a heavily-used service to be scaled across multiple

servers, distributed and made robust against system failure.

(Arrangements must be made to keep the multiple copies

up to date and synchronised.)
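One simple way to pick "the least heavily loaded back-end" is to count the requests currently in flight on each server. The Python sketch below assumes that policy; it is an illustration only, the class `LeastLoadedBalancer` and the host names are hypothetical, and YAZ Proxy's own measure of load may differ.

```python
class LeastLoadedBalancer:
    """Hypothetical balancer that tracks in-flight requests per back-end."""

    def __init__(self, backends):
        self.active = {name: 0 for name in backends}

    def acquire(self):
        # Choose the back-end with the fewest requests in flight;
        # ties go to the back-end listed first.
        name = min(self.active, key=self.active.get)
        self.active[name] += 1
        return name

    def release(self, name):
        # Call when the back-end has answered.
        self.active[name] -= 1


balancer = LeastLoadedBalancer(["z3950-a.example.org", "z3950-b.example.org"])
first = balancer.acquire()
second = balancer.acquire()
balancer.release(first)
third = balancer.acquire()
print(first, second, third)
```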


YAZ Proxy in detail: query translation

Both CQL and the Z39.50 Type-1 query allow application-specific

extensions (e.g. geospatial searching, thesaurus navigation).

Translation from CQL to Type-1 is therefore driven by a simple

configuration file which maps CQL index-names, relations, etc.

into Z39.50 Type-1 query attributes.

index.cql.serverChoice = 1=1016
index.rec.id = 1=12
index.dc.title = 1=4
index.dc.subject = 1=21
relation.< = 2=1
relation.le = 2=2
relationModifier.relevant = 2=102
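To show how a mapping like this drives translation, here is a deliberately tiny Python sketch. It handles only a single clause of the form `term`, `index=term` or `index<term`, and writes the result in PQF, the textual notation for Type-1 queries used by the YAZ toolkit. The function names `load_mapping` and `cql_clause_to_pqf` are assumptions for this example; a real translator parses full CQL, including booleans, modifiers and quoting.

```python
CONFIG = """\
index.cql.serverChoice = 1=1016
index.rec.id = 1=12
index.dc.title = 1=4
index.dc.subject = 1=21
relation.< = 2=1
relation.le = 2=2
relationModifier.relevant = 2=102
"""


def load_mapping(text):
    """Parse 'key = value' lines into a dict, splitting on the first ' = '."""
    mapping = {}
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.partition(" = ")
            mapping[key.strip()] = value.strip()
    return mapping


def cql_clause_to_pqf(clause, mapping):
    """Translate one simple CQL clause into PQF using the mapping."""
    for relation in ("<", "="):
        index, found, term = clause.partition(relation)
        if found:
            break
    else:
        # A bare term searches the server's default index.
        index, relation, term = "cql.serverChoice", "=", clause
    attributes = [mapping["index." + index.strip()]]  # KeyError if unmapped
    relation_key = "relation." + relation
    if relation_key in mapping:
        attributes.append(mapping[relation_key])
    prefix = " ".join("@attr " + attribute for attribute in attributes)
    return prefix + ' "' + term.strip() + '"'


MAPPING = load_mapping(CONFIG)
for clause in ("dinosaur", "dc.title=dinosaur", "rec.id < 1000"):
    print(clause, "->", cql_clause_to_pqf(clause, MAPPING))
```

An index that is missing from the configuration raises KeyError here; a gateway would instead report an SRU diagnostic to the client.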


YAZ Proxy in detail: record translation

Translating MARC (ISO2709) records into MARCXML is a core

function of YAZ Proxy.

It can also be configured to further transform the translated

MARCXML records using arbitrary XSLT stylesheets.

Standard stylesheets support translation to

Dublin Core

MODS

METS

Other formats, such as OAI_DC, are easy to support.
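For readers who have not looked inside an ISO 2709 record, the Python sketch below shows roughly what the MARC-to-MARCXML step involves: read the 24-character leader, walk the directory of 12-character entries, and emit one element per field. It is not YAZ Proxy's code. The function names are hypothetical, the record is assumed to be UTF-8 (MARC-8 is not handled), and `build_iso2709` exists only to make a small test record from the title and ISBN seen earlier.

```python
import xml.etree.ElementTree as ET

FT, RT, SF = b"\x1e", b"\x1d", b"\x1f"  # field, record, subfield delimiters
MARCXML_NS = "http://www.loc.gov/MARC21/slim"


def build_iso2709(fields):
    """Assemble a tiny ISO 2709 record for the demo; fields is [(tag, bytes)]."""
    directory, data = b"", b""
    for tag, body in fields:
        body += FT
        directory += b"%s%04d%05d" % (tag.encode("ascii"), len(body), len(data))
        data += body
    base = 24 + len(directory) + 1           # leader + directory + terminator
    total = base + len(data) + 1             # plus the record terminator
    leader = b"%05dnam a22%05d a 4500" % (total, base)
    return leader + directory + FT + data + RT


def iso2709_to_marcxml(raw, encoding="utf-8"):
    """Convert one ISO 2709 record (bytes) into a MARCXML string."""
    leader = raw[:24].decode("ascii")
    base = int(leader[12:17])                # base address of the field data
    directory = raw[24:base - 1]             # skip the directory's terminator
    ET.register_namespace("", MARCXML_NS)
    record = ET.Element("{%s}record" % MARCXML_NS)
    ET.SubElement(record, "{%s}leader" % MARCXML_NS).text = leader
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        field = raw[base + start:base + start + length - 1]  # drop terminator
        if tag < "010":                      # 00X tags are control fields
            element = ET.SubElement(record, "{%s}controlfield" % MARCXML_NS,
                                    {"tag": tag})
            element.text = field.decode(encoding)
            continue
        chunks = field.split(SF)             # indicators, then subfields
        indicators = chunks[0].decode("ascii")
        datafield = ET.SubElement(
            record, "{%s}datafield" % MARCXML_NS,
            {"tag": tag, "ind1": indicators[0], "ind2": indicators[1]})
        for chunk in chunks[1:]:
            subfield = ET.SubElement(datafield, "{%s}subfield" % MARCXML_NS,
                                     {"code": chr(chunk[0])})
            subfield.text = chunk[1:].decode(encoding)
    return ET.tostring(record, encoding="unicode")


raw_record = build_iso2709([
    ("001", b"12345"),
    ("020", b"  " + SF + b"a1590362136"),
    ("245", b"10" + SF + b"aFossils"),
])
marcxml = iso2709_to_marcxml(raw_record)
print(marcxml)
```

The resulting MARCXML is what an XSLT stylesheet would then transform into Dublin Core, MODS and so on.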


But, Mike! This is too good to be true!

Yes.


But how do you people make a living?

Apart from living on good karma, we make money from:

Bespoke development (e.g. building YAZ Proxy)

Customisation (e.g. adding support for new XML formats)

Integration (e.g. making the proxy use local authentication)

Support contracts (but these are strictly optional)

Consultancy

We also provide services such as hosted SRU-to-Z39.50

gateways, so YOUR ORGANISATION could support SRU

(and SRW) access, and accelerate its Z39.50 service,

without requiring you to install any software.


Thanks for listening!

You know where to find us.

http://indexdata.com/

Tel. +45 3341 0100

Fax. +45 3341 0101
