A REST-ful Web Services Approach to Library Federated
Search using SRU
Kevin Reiss
Rutgers-Newark Law Library
CALI 2005 – June 11th
Wouldn’t it Be Nice?
If you could search all your library’s electronic and print resources from one location?
If interesting distributed resources could be easily integrated with local materials for your users?
If you could get at the “Deep Web” in a structured and simple manner?
    Institutional Repositories
    Database Content
    Open Access Journals
Project Genesis
Desire to increase visibility of all library resources
Provide a common search interface to a growing group of resources
    Library Catalog
    In-House Digital Library Collections
    The Web & Academic Internet (Future)
Need to use an open-source solution
    Cost
    Compatibility with emerging digital library standards
    Make our digitized resources programmer-friendly
Solution: Federated Search
The SRU Protocol
    The Search/Retrieve URL Service
    Implemented using open-source Perl and Python modules
Search different collections and resources at the same time
Provides a search interface complementary to monolithic single-application search interfaces
    Not a replacement, but a complement
Implementation of standard search and retrieval protocols can benefit:
    Authors
    Publishers
    Content Providers
    Users
Why Should you Care?
Directors
    Open standards & protocols = more return on software investment
    Bring the Academic Internet to your users
Web Developers/Masters
    Open protocols and standards bring down implementation barriers such as time and cost
    Develop customized interfaces using XML/XSLT for search and retrieval
Library Technologists
    Improve deep web accessibility
    Learn about a descendant of a familiar tool (Z39.50)
What is a Web Service?
Facilitates communication with a networked application
    Client requests something
    Server carries out the request and reports success or failure
    Responses and requests (sometimes) are encoded in XML
Programmers embed calls to a web service as a part of a useful local application
    Query an online database
    Receive news updates
    Receive stock quotations
What is REST?
REpresentational State Transfer
    Roy Fielding
    A design philosophy, not a protocol
    The fundamental concept behind the web
    Each URL/URI is a unique state transferred from server to client
Characteristics of REST-ful web services:
    Always over HTTP
    Request: Form a URL + query string
        http://myrest.com/?query=cat&operation=search
    Response: Comes back in XML
<?xml version="1.0" encoding="UTF-8"?>
<searchResponse>
  <requestparams>
    <query>cat</query>
  </requestparams>
  <results>
    <result>
      <title>Cat in the Hat</title>
    </result>
    <!-- more results follow -->
  </results>
</searchResponse>
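A minimal Python sketch of the request/response cycle above, using only the standard library. The host myrest.com and the element names come from the slide. The response here is a canned string rather than a live HTTP call, since the server is illustrative.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Request: a base URL plus a query string (the host is the slide's illustrative example).
base_url = "http://myrest.com/"
request_url = base_url + "?" + urlencode({"query": "cat", "operation": "search"})
print(request_url)

# Response: canned XML standing in for what such a server would send back.
response_xml = """<?xml version="1.0" encoding="UTF-8"?>
<searchResponse>
  <requestparams><query>cat</query></requestparams>
  <results>
    <result><title>Cat in the Hat</title></result>
  </results>
</searchResponse>"""

# Parse the response and pull out every result title.
root = ET.fromstring(response_xml.encode("utf-8"))
titles = [t.text for t in root.findall("./results/result/title")]
print(titles)
```

The client needs nothing beyond a URL builder and an XML parser, which is the main practical appeal of the REST-ful style.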
REST v. SOAP / XML-RPC
Eric Lease Morgan classifies web services:
SOAP-ful Web Services
    More complicated, but potentially more robust than REST
    Can use any sort of transport mechanism: email, SSH, telnet
    Encoded using the SOAP XML wrapper – W3C standard for web services
    Example – The Google API (incorporate Google’s search results into your own program)
REST-ful Web Services
    Serve up arbitrary application-defined XML only
    Transported via HTTP requests only
What are SRW and SRU?
The Search/Retrieve Web Service and the Search/Retrieve URL Service
Standard way to search any Internet information resource:
    Set up an SRW or SRU server (relatively painless)
    Accept queries & return search results
For librarians – SRW/U comes out of the ZING (Z39.50 International: the Next Generation) Group
Hasn’t hit critical mass, unlike OAI or RSS
SRW is the SOAP-ful flavor
SRU is the REST-ful flavor
SOAP-ful Example - SRW
<SOAP:Envelope xmlns:SOAP="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP:Body>
    <SRW:searchRetrieveResponse xmlns:SRW="http://www.loc.gov/zing/srw/"
        xmlns:DIAG="http://www.loc.gov/zing/srw/diagnostics/"
        xmlns:dc="http://purl.org/dc/elements/1.1/">
      <SRW:version>1.1</SRW:version>
      <SRW:numberOfRecords>1</SRW:numberOfRecords>
      <SRW:records>
        <SRW:record>
          <dc:title>Law and Technology Journal</dc:title>
          <dc:identifier>http://www.example.com/journal/</dc:identifier>
          <!-- record content -->
        </SRW:record>
      </SRW:records>
    </SRW:searchRetrieveResponse>
  </SOAP:Body>
</SOAP:Envelope>
REST-ful Example - SRU
<SRW:searchRetrieveResponse xmlns:SRW="http://www.loc.gov/zing/srw/"
    xmlns:DIAG="http://www.loc.gov/zing/srw/diagnostics/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <SRW:version>1.1</SRW:version>
  <SRW:numberOfRecords>1</SRW:numberOfRecords>
  <SRW:records>
    <SRW:record>
      <dc:title>Law and Technology Journal</dc:title>
      <dc:identifier>http://www.example.com/journal/</dc:identifier>
      <!-- record content -->
    </SRW:record>
  </SRW:records>
</SRW:searchRetrieveResponse>
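A response like the SRU example above can be read with a namespace-aware XML parser. The following is a sketch under stated assumptions: it uses Python's standard ElementTree, mirrors the simplified record structure shown on the slide, and assumes the dc prefix is bound to the Dublin Core namespace. The function name parse_sru_response is illustrative, not part of any SRU library.

```python
import xml.etree.ElementTree as ET

# Namespace URIs: SRW is taken from the slide; dc is the standard Dublin Core namespace.
NS = {
    "srw": "http://www.loc.gov/zing/srw/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Canned response mirroring the slide's simplified example.
response_xml = """<SRW:searchRetrieveResponse
    xmlns:SRW="http://www.loc.gov/zing/srw/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <SRW:version>1.1</SRW:version>
  <SRW:numberOfRecords>1</SRW:numberOfRecords>
  <SRW:records>
    <SRW:record>
      <dc:title>Law and Technology Journal</dc:title>
      <dc:identifier>http://www.example.com/journal/</dc:identifier>
    </SRW:record>
  </SRW:records>
</SRW:searchRetrieveResponse>"""

def parse_sru_response(xml_text):
    """Return (total hit count, list of {title, identifier} dicts)."""
    root = ET.fromstring(xml_text)
    total = int(root.findtext("srw:numberOfRecords", default="0", namespaces=NS))
    records = []
    for rec in root.iter("{http://www.loc.gov/zing/srw/}record"):
        # ".//" finds the Dublin Core fields at any depth inside the record.
        records.append({
            "title": rec.findtext(".//dc:title", namespaces=NS),
            "identifier": rec.findtext(".//dc:identifier", namespaces=NS),
        })
    return total, records

total, records = parse_sru_response(response_xml)
print(total, records)
```

Because the SRU flavor has no SOAP envelope, the same parsing code works directly on the HTTP response body.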
Basic SRU Server
Understands queries written in CQL (Common Query Language)
Queries sent to an SRU server as a URL parameter
Receive a structured XML response with search results
Take this result and…
    Format it for your users using XSLT
    More processing: do something else with it
CQL - Basics
Combines two search engine traditions
    Simple Google-like queries
        Don’t worry, you don’t have to write a query parser
    Expressive and powerful but “non-intuitive” query languages
        Z39.50’s query language
        SQL & XML query languages
Examples
    "new jersey statutes"
    description = "governor of new jersey" and description = "ethics"
    cat prox/distance=3/unit=word/ordered hat
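CQL queries such as the examples above are plain strings, so a client can assemble them with ordinary string handling. The sketch below is one possible approach: the helper names cql_clause and cql_and are hypothetical, and only double quotes inside a term are escaped.

```python
from urllib.parse import quote_plus

def cql_clause(index, term, relation="="):
    """Build one `index relation "term"` clause, escaping embedded double quotes."""
    escaped = term.replace('"', '\\"')
    return f'{index} {relation} "{escaped}"'

def cql_and(*clauses):
    """Join clauses with the CQL boolean operator `and`."""
    return " and ".join(clauses)

query = cql_and(
    cql_clause("description", "governor of new jersey"),
    cql_clause("description", "ethics"),
)
print(query)

# Percent-encode the query so it can travel as a URL parameter value.
encoded = quote_plus(query)
print(encoded)
```

The encoded form is what would be placed in the query parameter of an SRU request URL.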
URL + Parameters
Test server [http://law-library2.rutgers.edu/SRU/sru.pl], inspired by an implementation by Mike Taylor
Basic SRU parameters:
    operation – tells the SRU server what it is supposed to do (only three of them: searchRetrieve, explain, scan)
    version – currently 1.1
    startRecord – the first record that you want back
    maximumRecords – number of records you want back at any one time
    recordSchema – metadata format you want back
The full SRU request:
    http://law-library2.rutgers.edu/SRU/srucql.pl?query="new+jersey+statutes"&startRecord=1&maximumRecords=10&collection=lawlib&version=1.1&operation=searchRetrieve&recordSchema=dc
Application-defined parameters: you must let users know these are there in the documentation for your SRU server
    ex: the collection parameter above
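The request URL above can be assembled programmatically rather than by hand. This is a minimal sketch using Python's standard library. The function name build_sru_url is an assumption for illustration. The parameter names are the SRU 1.1 ones listed on the slide, and collection is the application-defined parameter from the example.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_sru_url(base_url, query, start_record=1, maximum_records=10,
                  record_schema="dc", version="1.1", **extra):
    """Build a searchRetrieve URL; `extra` carries application-defined parameters."""
    params = {
        "operation": "searchRetrieve",
        "version": version,
        "query": query,
        "startRecord": start_record,
        "maximumRecords": maximum_records,
        "recordSchema": record_schema,
    }
    params.update(extra)
    # urlencode percent-encodes the quotes and turns spaces into '+'.
    return base_url + "?" + urlencode(params)

url = build_sru_url(
    "http://law-library2.rutgers.edu/SRU/srucql.pl",
    '"new jersey statutes"',
    collection="lawlib",  # application-defined parameter from the slide
)
print(url)

# Round-trip check: decode the query string back into its parameters.
decoded = parse_qs(urlsplit(url).query)
print(decoded["query"])
```

Letting the library encode the query avoids hand-escaping quotes and spaces, which is where most malformed SRU URLs come from.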
XSLT for Formatting
Use the stylesheet parameter
    This allows you to specify an XSLT stylesheet to format your search results
    You can have different stylesheets for different users
Client-side v. server-side XSLT
    Browser support is unreliable
    Large XML documents can tax a server
Current Collections
Electronic Journals & Databases (Titles + Descriptions Only)
Law Library Website
Digital Library Collections [8 Collections]
    Collections indexed using swish-e
Library Catalog
    Done using the British Library’s Python wrapper class for Z39.50 servers
    Uses the Python ZOOM and CQL modules
Application Diagram
[Diagram. Labeled components: Harvested OAI Data, Python Z39.50, Swish-e, Digital Library, Library Website, Library Catalog, SRU Server, User, XSL Stylesheet. Labeled flows: URL Request w/ SRU Params, XML Response, HTML.]
Other SRU Operations
Explain
    Tells a programmer about your SRU server
    Explain response structure defined by the ZeeRex XML DTD
        Z39.50 explain, explained and re-engineered in XML
        Fields to search
        Metadata sets
        Default parameters for an SRU or SRW server
        What portion of CQL is supported
    Provides documentation for your SRU implementation
Scan
    Index function
    Could display a controlled vocabulary for a given collection
    Not implemented on any SRU apps
SRU could Fight Search Engine Babble
Consider the query “urban planning law”:
    http://law-new.rutgers.edu/search/t?SEARCH=urban+planning+law
    http://www.google.com/search?hl=en&q=urban+planning+law&btnG=Google+Search
    http://search.yahoo.com/search?p=urban+planning+law&sm=Yahoo%21+Search&fr=FP-tab-web-t&toggle=1
    http://law.bepress.com/cgi/query.cgi?field_1=full_text&field_2=author&value_1=urban+planning+law&value_2=&connector_3=and&field_3=ancestor.link&op_3=in&value_3=http%3A%2F%2Flaw.bepress.com%2Frepository&hidden_3=1&x_force_carryover=&format=cover_page&query=Processing...
To Combine These Search Results…
Get the query syntax and all URL parameters correct
Then scrape the HTML output in order to get the information on the query response
    Not a very reliable proposition
What if you could send a query to all of them using the same syntax and receive your responses back in the same format?
    You can, using SRU
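The idea of sending one query to every target and reading every response the same way can be sketched in a few lines of Python. Everything named here is an assumption for illustration: the target URLs under example.org are hypothetical, federated_search is not an existing library function, and the demonstration swaps in a fake fetch function so it runs without a network.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

DC_TITLE = "{http://purl.org/dc/elements/1.1/}title"

def http_fetch(url):
    """Default fetcher: retrieve the raw XML bytes over HTTP."""
    with urlopen(url, timeout=10) as resp:
        return resp.read()

def federated_search(targets, cql_query, fetch=http_fetch):
    """Send the same SRU request to each target; return (target name, title) pairs."""
    params = urlencode({
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "maximumRecords": 10,
        "recordSchema": "dc",
    })
    merged = []
    for name, base_url in targets.items():
        try:
            root = ET.fromstring(fetch(base_url + "?" + params))
        except (OSError, ET.ParseError):
            continue  # skip targets that are unreachable or return malformed XML
        for title in root.iter(DC_TITLE):
            merged.append((name, title.text))
    return merged

# Demonstration with a stand-in fetcher and hypothetical SRU targets.
def fake_fetch(url):
    return ('<r xmlns:dc="http://purl.org/dc/elements/1.1/">'
            '<dc:title>Urban Planning Law</dc:title></r>')

targets = {
    "catalog": "http://catalog.example.org/sru",
    "repository": "http://repo.example.org/sru",
}
results = federated_search(targets, '"urban planning law"', fetch=fake_fetch)
print(results)
```

No per-target URL syntax or HTML scraping is needed: the only thing that varies between targets is the base URL.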
Imagine if…
Many useful remote resources supported SRU
It would be easy to bring users search results from targeted SRU-compliant resources
Possible SRU Applications
Incorporation of content from popular indexes and resources into federated searches
Subject-specific search interfaces
Institutional repository content harvested via OAI or something like it
Blogs
Self-published works
Grey literature
BePress, PLOS, or SSRN
OAI and SRU/W
OAI (Open Archives Initiative) is a harvesting protocol
    Full-text search isn’t an option in OAI
SRU/W is a searching protocol
    Full-text search is an option in SRU if you want
They complement each other
    You could easily search harvested OAI data via SRU
    Imagine if one could easily harvest court decisions via OAI… (Tom Bruce)
    Search an OAI data provider registry via SRU – see the University of Illinois OAI registry
Potential OAI – SRU Synergy
[Diagram. OAI Data Providers: make data available. OAI Harvesters: harvest and maintain updated indexes of data. SRU Server: search and present data to users.]
Adding the Nellco Repository
Harvest the records via OAI
Create a Swish config file for the harvested records
Index with Swish
Create an explain record & id value in our federated application for the index
Make the new index a search target
This is what Google Scholar is doing with IRs like DSpace (full-text possible)
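The first step above, harvesting records via OAI, might look like the following Python sketch. It relies on the OAI-PMH ListRecords verb with the oai_dc metadata prefix. The repository base URL and the sample record are hypothetical, the function names are illustrative, and resumption tokens for large result sets are not handled.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL."""
    return base_url + "?" + urlencode(
        {"verb": "ListRecords", "metadataPrefix": metadata_prefix})

def extract_records(oai_xml):
    """Pull the identifier and title out of each harvested record."""
    root = ET.fromstring(oai_xml)
    out = []
    for rec in root.iterfind(".//oai:record", NS):
        out.append({
            "id": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": rec.findtext(".//dc:title", namespaces=NS),
        })
    return out

# Hypothetical repository URL and a canned response for demonstration.
harvest_url = list_records_url("http://repository.example.org/oai")
sample = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample Working Paper</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""
harvested = extract_records(sample)
print(harvest_url)
print(harvested)
```

The extracted records would then be written out in whatever form the indexer expects before the Swish configuration and indexing steps.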
Harvested Resource Results
What Can Libraries Gain?
Improved discovery of institutional and remote resources
Subject-specific aggregation
    Especially for the “Academic Internet”
Different entry points to collections
Provide user-sensitive display of resources using XSL stylesheets
“We must be able to accept and deliver multiple forms of metadata in order to build scalable digital libraries” - Roy Tennant
Federated Search Issues
Ranking of search results
Effective display of results
To build a fully featured search interface you need:
    Metadata and more metadata
    Simple Dublin Core’s focus is discovery
        Doesn’t represent technical information
        The identifier element isn’t adequate for resources with multiple formats and manifestations
For the Future
Your search engine is only as good as your data
    XML that parses
Metadata – early and often
    A robust and extensible format such as METS
    METS provides facilities for encoding structural and technical information along with descriptive metadata
    Crosswalks can be used to extract simpler metadata formats from METS for use by apps like SRU and OAI
Open protocols layered on top of each other
    SRU interface over OAI-harvested metadata
    Institutional OpenURL resolver information applied to identifiers returned via SRU and OAI to grant user access to subscriptions
REST-ful Applications You’ve Heard Of
RSS/Atom – The technology behind blogs
OAI – The Open Archives Initiative
Some library OPACs are quasi-REST-ful services
    See the library lookup tool by Jon Udell
    Sessions or cookies deployed for anonymous OPAC searches can kill the possibility of writing applications like library lookup
    Consider this URL: http://law-new.rutgers.edu/search/t?SEARCH=urban+planning
With SRU any OPAC could be queried in this fashion
    http://myopac.edu/?operation=searchRetrieve&version=1.1&query="dc.title=urban planning"&recordSchema=dc
What Can You Do Next?
Demand vendor support for simple REST-like open interfaces
Check out http://law-library2.rutgers.edu/SRU/examples/
    SRU/OAI examples
    SRU/OAI practical research
Develop an SRU server for your own collection(s)
Get involved: ZING meets in Chicago next week