
A REST-ful Web Services Approach to Library Federated Search using SRU

Kevin Reiss

Rutgers-Newark Law Library

CALI 2005 – June 11th

Wouldn’t it Be Nice?

If you could search all your library’s electronic and print resources from one location?

If interesting distributed resources could be easily integrated with local materials for your users?

If you could get at the “Deep Web” in a structured and simple manner?
  - Institutional Repositories
  - Database Content
  - Open Access Journals

Project Genesis

Desire to increase visibility of all library resources

Provide a common search interface to a growing group of resources
  - Library Catalog
  - In House Digital Library Collections
  - The Web & Academic Internet (Future)

Need to use an open-source solution
  - Cost
  - Compatibility with emerging digital library standards
  - Make our digitized resources programmer-friendly

Solution: Federated Search

The SRU Protocol
  - The Search/Retrieve URL Service
  - Implemented using open source Perl and Python modules

Search different collections and resources at the same time

Provides a complementary search interface to monolithic, single-application search interfaces
  - Not a replacement, but a complement

Implementation of standard search and retrieval protocols can benefit:
  - Authors
  - Publishers
  - Content Providers
  - Users

Why Should you Care?

Directors
  - Open standards & protocols = More return on software investment
  - Bring the Academic Internet to your users

Web Developers/Masters
  - Open protocols and standards bring down implementation barriers such as time and cost
  - Develop customized interfaces using XML/XSLT for search and retrieval

Library Technologists
  - Improve deep web accessibility
  - Learn about a descendant of a familiar tool (Z39.50)

What is a Web Service?

Facilitates communication to a networked application
  - Client requests something
  - Server carries out the request and reports success or failure
  - Responses and requests (sometimes) are encoded in XML

Programmers embed calls to a web service as a part of a useful local application
  - Query an online database
  - Receive news updates
  - Receive stock quotations

What is REST?

REpresentational State Transfer
  - Roy Fielding
  - A design philosophy, not a protocol
  - The fundamental concept behind the web
  - Each URL/URI is a unique state transferred from server to client

Characteristics of REST-ful web services:
  - Always over HTTP
  - Request: Form a URL + query string
    http://myrest.com/?query=cat&operation=search
  - Response: Comes back in XML

<?xml version="1.0" encoding="UTF-8"?>
<searchResponse>
  <requestparams>
    <query>cat</query>
  </requestparams>
  <results>
    <result>
      <title>Cat in the Hat</title>
    </result>
    <!-- more results follow -->
  </results>
</searchResponse>
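The request/response cycle on this slide can be sketched in a few lines of Python. This is only an illustration: myrest.com is the slide's placeholder service, the function names are made up for the example, and the response is an inline copy of the sample XML rather than the result of a live call.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

BASE_URL = "http://myrest.com/"  # placeholder service from the slide, not a real endpoint


def build_request_url(query, operation="search"):
    """Form the REST request: base URL plus an encoded query string."""
    return BASE_URL + "?" + urlencode({"query": query, "operation": operation})


def extract_titles(response_xml):
    """Collect the text of every <title> inside <results>/<result>."""
    root = ET.fromstring(response_xml)
    return [title.text for title in root.findall("./results/result/title")]


# Inline copy of the sample response, so the sketch runs without a network.
SAMPLE_RESPONSE = """<searchResponse>
  <requestparams>
    <query>cat</query>
  </requestparams>
  <results>
    <result>
      <title>Cat in the Hat</title>
    </result>
  </results>
</searchResponse>"""

print(build_request_url("cat"))
print(extract_titles(SAMPLE_RESPONSE))
```

The client needs nothing beyond a URL builder and an XML parser, which is the practical appeal of the REST-ful style.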

REST v. SOAP / XML-RPC

Eric Lease Morgan classifies web services:

SOAP-ful Web Services
  - More complicated, but potentially more robust than REST
  - Can use any sort of transport mechanism: email, SSH, telnet
  - Encoded using the SOAP XML wrapper – W3C standard for web services
  - Example – The Google API (incorporate Google’s search results into your own program)

REST-ful Web Services
  - Serve up arbitrary application-defined XML only
  - Transported via HTTP requests only

What are SRW and SRU?

The Search/Retrieve Web Service and the Search/Retrieve URL Service

Standard way to search any Internet information resource:
  - Set up an SRW or SRU server (relatively painless)
  - Accept queries & return search results

For librarians – SRW/U comes out of the ZING (Z39.50 International: the Next Generation) Group

Hasn’t hit critical mass, unlike OAI or RSS

SRW is the SOAP-ful flavor

SRU is the REST-ful flavor

SOAP-ful Example - SRW

<SOAP:Envelope xmlns:SOAP="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP:Body>
    <SRW:searchRetrieveResponse xmlns:SRW="http://www.loc.gov/zing/srw/" xmlns:DIAG="http://www.loc.gov/zing/srw/diagnostics/" xmlns:dc="http://purl.org/dc/elements/1.1/">
      <SRW:version>1.1</SRW:version>
      <SRW:numberOfRecords>1</SRW:numberOfRecords>
      <SRW:records>
        <SRW:record>
          <dc:title>Law and Technology Journal</dc:title>
          <dc:identifier>http://www.example.com/journal/</dc:identifier>
          <!-- record content -->
        </SRW:record>
      </SRW:records>
    </SRW:searchRetrieveResponse>
  </SOAP:Body>
</SOAP:Envelope>

REST-ful Example - SRU

<SRW:searchRetrieveResponse xmlns:SRW="http://www.loc.gov/zing/srw/" xmlns:DIAG="http://www.loc.gov/zing/srw/diagnostics/" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <SRW:version>1.1</SRW:version>
  <SRW:numberOfRecords>1</SRW:numberOfRecords>
  <SRW:records>
    <SRW:record>
      <dc:title>Law and Technology Journal</dc:title>
      <dc:identifier>http://www.example.com/journal/</dc:identifier>
      <!-- record content -->
    </SRW:record>
  </SRW:records>
</SRW:searchRetrieveResponse>

Basic SRU Server

Understands queries written in CQL (Common Query Language)

Queries sent to an SRU server as a URL parameter

Receive a structured XML response with search results

Take this Result and…
  - Format it for your users using XSLT
  - More processing: do something else with it
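As an illustration of the "do something else with it" option, here is a minimal Python sketch that reads a searchRetrieveResponse like the simplified one on the previous slide. The function name and the returned dictionary layout are assumptions made for this example. The sample adds the Dublin Core namespace declaration that the parser requires.

```python
import xml.etree.ElementTree as ET

# Namespace URIs: SRW is taken from the slide; dc is the standard Dublin Core URI.
NS = {
    "srw": "http://www.loc.gov/zing/srw/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

SAMPLE = """<SRW:searchRetrieveResponse xmlns:SRW="http://www.loc.gov/zing/srw/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <SRW:version>1.1</SRW:version>
  <SRW:numberOfRecords>1</SRW:numberOfRecords>
  <SRW:records>
    <SRW:record>
      <dc:title>Law and Technology Journal</dc:title>
      <dc:identifier>http://www.example.com/journal/</dc:identifier>
    </SRW:record>
  </SRW:records>
</SRW:searchRetrieveResponse>"""


def parse_sru_response(xml_text):
    """Return (total hit count, list of {'title', 'identifier'} dicts)."""
    root = ET.fromstring(xml_text)
    total = int(root.findtext("srw:numberOfRecords", default="0", namespaces=NS))
    records = []
    for rec in root.iter("{http://www.loc.gov/zing/srw/}record"):
        records.append({
            # './/' finds the element at any depth below the record
            "title": rec.findtext(".//dc:title", namespaces=NS),
            "identifier": rec.findtext(".//dc:identifier", namespaces=NS),
        })
    return total, records


total, records = parse_sru_response(SAMPLE)
print(total, records)
```

Searching with `.//` rather than a fixed path is deliberate: a real server may wrap the Dublin Core elements in additional container elements, and the lookup still finds them.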

CQL - Basics

Combines two search engine traditions
  - Simple Google-like queries
    - Don’t worry, you don’t have to write a query parser
  - Expressive and powerful but “non-intuitive” query languages
    - Z39.50’s query language
    - SQL & XML Query languages

Examples
  - "new jersey statutes"
  - description = "governor of new jersey" and description = "ethics"
  - cat prox/distance=3/unit=word/ordered hat
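Queries like the first two examples can be assembled programmatically. The helpers below are hypothetical names invented for this sketch, not part of any CQL library. They assume the CQL convention that a term containing whitespace or characters such as `=`, `<`, `>`, `/`, or parentheses must be wrapped in double quotes; proximity queries are not covered.

```python
# Characters that force a CQL term to be double-quoted (whitespace is checked separately).
SPECIAL = set('()=<>/"')


def quote_term(term):
    """Quote a search term when CQL requires it; embedded quotes are backslash-escaped."""
    term = term.strip()
    needs_quotes = (
        term == ""
        or any(ch.isspace() for ch in term)
        or any(ch in SPECIAL for ch in term)
    )
    if needs_quotes:
        return '"' + term.replace('"', '\\"') + '"'
    return term


def clause(index, term, relation="="):
    """Build one 'index relation term' search clause."""
    return f"{index} {relation} {quote_term(term)}"


def all_of(*clauses):
    """Join clauses with the boolean operator 'and'."""
    return " and ".join(clauses)


print(quote_term("new jersey statutes"))
print(all_of(clause("description", "governor of new jersey"),
             clause("description", "ethics")))
```

The second call prints `ethics` without quotes. That is equivalent to the quoted form on the slide, since quoting is optional for a single word.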

URL + Parameters

Test server [http://law-library2.rutgers.edu/SRU/sru.pl] inspired by an implementation by Mike Taylor

Basic SRU parameters:
  - operation – tells the SRU server what it is supposed to do (only 3 of them: searchRetrieve, explain, scan)
  - version – currently 1.1
  - startRecord – the first record that you want back
  - maximumRecords – number of records you want back at any one time
  - recordSchema – metadata format you want back

The full SRU request:

http://law-library2.rutgers.edu/SRU/srucql.pl?query="new+jersey+statutes"&startRecord=1&maximumRecords=10&collection=lawlib&version=1.1&operation=searchRetrieve&recordSchema=dc

Application-defined parameters: you must let users know these are there in the documentation for your SRU server
  - ex: the collection parameter above
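A request like the one above can be built with Python's standard library. This is a sketch under assumptions: the function name and its defaults are invented for the example, the base URL is the one quoted on the slide (which may no longer be reachable), and no request is actually sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

SRU_BASE = "http://law-library2.rutgers.edu/SRU/srucql.pl"  # base URL quoted on the slide


def sru_search_url(base, cql, start=1, maximum=10, schema="dc", **extra):
    """Build a searchRetrieve URL; 'extra' carries application-defined parameters."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql,
        "startRecord": start,
        "maximumRecords": maximum,
        "recordSchema": schema,
    }
    params.update(extra)  # e.g. collection="lawlib" on the server described here
    return base + "?" + urlencode(params)


url = sru_search_url(SRU_BASE, '"new jersey statutes"', collection="lawlib")
print(url)

# Decoding the query string shows the server receives the original CQL text.
print(parse_qs(urlsplit(url).query)["query"])
```

`urlencode` percent-encodes the double quotes and turns spaces into `+`, so the CQL phrase survives the trip through the URL intact.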

XSLT for Formatting

Use the stylesheet parameter
  - This allows you to specify an XSLT stylesheet to format your search results
  - You can have different stylesheets for different users

Client side v. Server Side XSLT
  - Browser support is unreliable
  - Large XML documents can tax a server

Current Collections

Electronic Journal & Databases (Titles + Descriptions Only)

Law Library Website

Digital Library Collections [8 Collections]

Collections indexed using swish-e

Library Catalog
  - Done using the British Library’s python wrapper class for Z39.50 servers
  - Uses the python ZOOM and CQL

Application Diagram

[Diagram. Labeled components: User; URL Request w/ SRU Params; SRU Server; Swish-e; Python Z39.50; Digital Library; Library Website; Library Catalog; Harvested OAI Data; XML Response; XSL Stylesheet; HTML]

Other SRU Operations

Explain
  - Tells a programmer about your SRU Server
  - Explain Response structure defined by the ZeeRex XML DTD
    - Z39.50 explain, explained and re-engineered in XML
  - Fields to search
  - Metadata sets
  - Default Parameters for an SRU or SRW server
  - What portion of CQL is supported
  - Provides documentation for your SRU implementation

Scan
  - Index function
  - Could display a controlled vocabulary for a given collection
  - Not implemented on any SRU apps

SRU could Fight Search Engine Babble

Consider the query “urban planning law”:

http://law-new.rutgers.edu/search/t?SEARCH=urban+planning+law

http://www.google.com/search?hl=en&q=urban+planning+law&btnG=Google+Search

http://search.yahoo.com/search?p=urban+planning+law&sm=Yahoo%21+Search&fr=FP-tab-web-t&toggle=1

http://law.bepress.com/cgi/query.cgi?field_1=full_text&field_2=author&value_1=urban+planning+law&value_2=&connector_3=and&field_3=ancestor.link&op_3=in&value_3=http%3A%2F%2Flaw.bepress.com%2Frepository&hidden_3=1&x_force_carryover=&format=cover_page&query=Processing...

To combine These Search Results…

Get the query syntax and all URL parameters correct

Then scrape the HTML output in order to get the information on the query response
  - Not a very reliable proposition

What if you could send a query to all of them using the same syntax and receive your responses back in the same format?

You can, using SRU
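A rough Python sketch of that idea: one CQL query goes to several SRU targets and the titles come back merged into a single list. The target URLs, the function names, and the canned response are all hypothetical. The fetch function is passed in as an argument, so the example runs offline.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

SRW_RECORD = "{http://www.loc.gov/zing/srw/}record"
DC_TITLE = "{http://purl.org/dc/elements/1.1/}title"


def http_fetch(url):
    """Default fetcher: plain HTTP GET, returns the response body as bytes."""
    with urlopen(url, timeout=10) as resp:
        return resp.read()


def federated_search(targets, cql, fetch=http_fetch, maximum=10):
    """Send the same query to every target; return a list of (label, title) pairs."""
    params = urlencode({
        "operation": "searchRetrieve", "version": "1.1", "query": cql,
        "maximumRecords": maximum, "recordSchema": "dc",
    })
    merged = []
    for label, base in targets.items():
        try:
            root = ET.fromstring(fetch(base + "?" + params))
        except (OSError, ET.ParseError):
            continue  # skip a target that is unreachable or returns malformed XML
        for rec in root.iter(SRW_RECORD):
            title = rec.find(".//" + DC_TITLE)
            if title is not None:
                merged.append((label, title.text))
    return merged


# Offline demonstration: every target answers with the same canned response.
CANNED = b"""<SRW:searchRetrieveResponse xmlns:SRW="http://www.loc.gov/zing/srw/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <SRW:records>
    <SRW:record><dc:title>Urban Planning Law Review</dc:title></SRW:record>
  </SRW:records>
</SRW:searchRetrieveResponse>"""


def canned_fetch(url):
    return CANNED


targets = {  # hypothetical SRU endpoints
    "catalog": "http://catalog.example.edu/sru",
    "repository": "http://repository.example.edu/sru",
}
for label, title in federated_search(targets, '"urban planning law"', fetch=canned_fetch):
    print(label, "-", title)
```

Because every target speaks the same request and response format, the loop body is identical for each source and no per-site HTML scraping is needed. Ranking the merged results is a separate problem.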

Imagine if…

Many useful remote resources supported SRU

It would be easy to bring users search results from targeted SRU-compliant resources

Possible SRU Applications

Incorporation of content from popular indexes and resources into federated searches

Subject-specific search interfaces
  - Institutional repository content harvested via OAI or something like it
  - Blogs
  - Self-published works
  - Grey literature
  - BePress, PLOS, or SSRN

OAI and SRU/W

OAI is a harvesting protocol (Open Archives Initiative)
  - Full-text search isn’t an option in OAI

SRU/W is a searching protocol
  - Full-text search is an option in SRU if you want

They complement each other
  - You could easily search harvested OAI data via SRU
  - Imagine if one could easily harvest court decisions via OAI… (Tom Bruce)
  - Search an OAI data providers registry via SRU
    - See the University of Illinois OAI registry

Potential OAI – SRU Synergy

[Diagram. Labeled components and roles:]
  - OAI Data Providers: Make Data Available
  - OAI Harvesters: Harvest and Maintain Updated Indexes of Data
  - SRU Server: Search and Present Data to Users

Adding the Nellco Repository

Harvest the records via OAI

Create a Swish config file for the harvested records

Index with Swish

Create an explain record & id value in our federated application for the index

Make the new index a search target

This is what Google Scholar is doing with IRs like DSpace (full-text possible)
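The first step, harvesting via OAI, might look roughly like the Python sketch below. It relies on the OAI-PMH `ListRecords` verb and the `oai_dc` metadata prefix. The repository URL, the function names, and the sample record are hypothetical, and the Swish-e configuration and indexing steps are not shown.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base, token=None):
    """Build a ListRecords request; a resumptionToken replaces metadataPrefix."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, token); token is None when the list is complete."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(OAI + "record"):
        records.append({
            "identifier": rec.findtext(OAI + "header/" + OAI + "identifier"),
            "title": rec.findtext(".//" + DC + "title"),
        })
    # An empty or missing resumptionToken means there is nothing more to fetch.
    token = root.findtext(".//" + OAI + "resumptionToken") or None
    return records, token


# Hand-written sample response, so the sketch runs without a live repository.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample Working Paper</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken/>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("http://repository.example.org/oai"))
print(parse_list_records(SAMPLE))
```

A harvester would call `list_records_url` again with each returned token until the token comes back as `None`, then write the collected records out for indexing.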

Harvested Resource Results

What Can Libraries Gain?

Improved discovery of institutional and remote resources

Subject specific aggregation
  - Especially for the “Academic Internet”

Different entry points to collections
  - Provide user-sensitive display of resources using XSL stylesheets

“We must be able to accept and deliver multiple forms of metadata in order to build scalable digital libraries” - Roy Tennant

Federated Search Issues

Ranking of search results

Effective display of results

To build a fully featured search interface you need:
  - Metadata and more Metadata

Simple Dublin Core’s focus is discovery
  - Doesn’t represent technical information
  - The identifier element isn’t adequate for resources with multiple formats and manifestations

For the Future

Your search engine is only as good as your data
  - XML that parses

Metadata – early and often
  - A robust and extensible format such as METS
  - METS provides facilities for encoding structural and technical information along with descriptive metadata
  - Crosswalks can be used to extract simpler metadata formats from METS for use by apps like SRU and OAI

Open protocols layered on top of each other
  - SRU interface over OAI harvested metadata
  - Institutional OpenURL resolver information applied to identifiers returned via SRU and OAI to grant user access to subscriptions

REST-ful Applications You’ve Heard of

RSS/Atom – The technology behind blogs

OAI – The Open Archives Initiative

Some Library OPACs are quasi REST-ful services
  - See the library lookup tool by Jon Udell
  - Sessions or cookies deployed for anonymous OPAC searches can kill the possibility for writing applications like library lookup
  - Consider this URL: http://law-new.rutgers.edu/search/t?SEARCH=urban+planning

With SRU any OPAC could be queried in this fashion

http://myopac.edu/?operation=searchRetrieve&version=1.1&query=dc.title="urban planning"&recordSchema=dc

What can you do next?

Demand Vendor Support for simple REST-like open interfaces

Check out http://law-library2.rutgers.edu/SRU/examples/
  - SRU/OAI examples
  - SRU/OAI practical research

Develop an SRU server for your own collection(s)

Get involved, ZING meets in Chicago next week
