Post on 29-Dec-2015

Metasearch

NISO Metasearch Initiative Overview

Local Uses of Metasearch

Andrew K. Pace

Head, Systems

NCSU Libraries

Co-chair, NISO-MI

andrew_pace@ncsu.edu

John Little

Senior Analyst, IT

Duke Libraries

Member, NISO-MI TG3

John_R_Little@notes.duke.edu

Tim Shearer

Library Systems

UNC Libraries

Member, NISO-MI TG1

sheat@ils.unc.edu

Rumsfeld’s Law of Metasearch

You metasearch with the standard you

have, not the standard you wish

you had.

Credits & Thanks

• NISO Metasearch Initiative Team

– Jenny Walker, VP Marketing, Ex Libris, Co-chair of the initiative

– Mike Teets, OCLC, Task Group Chair

– Juha Hakala, Nat’l Lib Finland, Task Group Chair

– Sara Randall, Endeavor and Katherine Kott, DLF, Task Group chairs

– All the active participants of the 3 task groups

• TRLN (for co-hosting 2 critical meetings)

Why I’m Here

• What is metasearch?

• Talk about the history, work, and present status of the NISO Metasearch Initiative Committee

• Convey the complexity of improving the standing of metasearch

• Talk about the work left to be done

I wish I had time to do more of….

• Convincing even the unconvinced that metasearch is a worthwhile endeavor (I will try to do this anyway)

• Talk more about Google (I will do this anyway)

• I do want to leave plenty of time for discussion

What’s in a name?

• Federated search

• Channel (RSS) search

• Metasearch

[Diagram: diverse information resources, each with its own query form]

Federated search

[Diagram: one query form over diverse information resources; just-in-case]

Federated search examples

• ENCompass for Journals OnSite (EJOS)

• SCIRUS

• Google Scholar

Channel (RSS) Search

[Diagram: one query form; diverse information resources turned on or off; just-in-time, on request]

Example of Channel Search

Metasearch

[Diagram: one query form over diverse information resources; just-in-time]

integrated searching = metasearching = cross database searching = parallel searching = broadcast searching = …

Metasearch

[Diagram: query forms and diverse information resources connected through metasearch]

MetaSearch Technology

[Diagram labels: translators/connectors; metasearch agent]
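The translator/connector idea can be sketched in a few lines of Python. This is only an illustration, not any vendor's design: the connector functions, their names, and the canned results are all hypothetical stand-ins for code that would translate the query and talk to a real target.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical connector type: takes the user's query, returns result titles.
Connector = Callable[[str], List[str]]


def catalog_connector(query: str) -> List[str]:
    # Stand-in for a translator that would rewrite the query for a catalog.
    return [f"catalog record about {query}"]


def journal_connector(query: str) -> List[str]:
    # Stand-in for a translator that would rewrite the query for a journal database.
    return [f"journal article about {query}"]


def metasearch(query: str, connectors: Dict[str, Connector]) -> Dict[str, List[str]]:
    """Broadcast one query to every connector in parallel; collect results by source."""
    with ThreadPoolExecutor(max_workers=max(1, len(connectors))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in connectors.items()}
        return {name: future.result() for name, future in futures.items()}


CONNECTORS = {"catalog": catalog_connector, "journals": journal_connector}
results = metasearch("metasearch", CONNECTORS)
```

The agent itself knows nothing about any target; everything target-specific lives in a connector, which is where the per-database knowledge is hidden from the user.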

Metasearch… Why bother?

• Because most patrons do not care where information is or who packaged it

• Present systems require users to know

– How to select / access a database

– How to get to them

– How to use unique search options

• Because Google cannot do it all

• Challenge is creating a system that helps users find what they need while minimizing what they need to know

Tennant’s Tenets

• Only librarians like to search, everyone else likes to find

• All things being equal, one place to search is better than two or more.

• “Good enough” is often just that

• Users are not lazy, they’re human

• Our ability to create effective one-stop searching is dependent on our ability to appropriately target user needs

• The size of the result set doesn’t matter as much as how the results are presented. (‘the Google lesson’)

• Services should be placed as close to the user as possible

http://www.cdlib.org/inside/projects/metasearch/nsdl/

NISO-MI History

• ALA (Philadelphia) Midwinter 2003

• NISO-MI Planning (Denver), Spring 2003

• NISO-MI Proselytizing (Washington, D.C.), Fall 2003

• Task Groups, 2004 - present

The NISO Metasearch Initiative

• Any standards identified must help all the stakeholders:

– libraries to deliver services that distinguish their offerings from other free web services

– metasearch service providers to offer more effective and responsive services

– content providers to deliver enhanced content and protect their intellectual property

• Win – Win – Win

NISO-MI History

• ALA Midwinter 2003

– Meeting called by 3 providers: Ebsco, Gale, Proquest

– Concerned about impact on services

– NISO offered to take leadership role and formed a planning committee

– Identified key issues

• Access Management (a.k.a. authentication/authorization)

• Resource Identification

• Metasearch Identification

• The Search Itself

• Results Management

• Statistics

– Planned another meeting

NISO-MI History

• Denver Spring 2003

– Access Management

• Understand metasearch needs; find best solutions available; develop best practices

– Resource Identification

• Work with Dublin Core RSLP and ISO Directories group; exchange format for collection and service descriptions

– Search, Retrieve, Results Management

• Current environment analysis (Z39.50, SRW/SRU, proprietary APIs, XML gateways); develop best practice for APIs; continue Z39.50 profiling

– Metasearch Identification

• Solution: Register a practice that metasearch engines can use to identify themselves

– Statistics

• Work with Z39.7 and COUNTER; explain metasearch environment; adapt existing standards; publicize importance of statistics

NISO-MI History

• D.C., Fall 2003

– Combined with OpenURL for 2-day workshop; briefed a larger audience on the broad issues discussed in Denver; agreed that a focused initiative was needed

– Approved recommendations

– Appointed leadership

NISO-MI Leadership

• Overall Co-chairs

– Jenny Walker, ExLibris

– Andrew Pace, NCSU

• Access Management (TG1 / NISO BA)

– Mike Teets, OCLC

• Collection Description (TG2 / NISO BB)

– Juha Hakala, National Library of Finland

» Pete Johnston, UKOLN, Collection Description

» Larry Dixson, LC, Service Description

• Search and Retrieve (TG3 / NISO BC)

– Sara Randall, Endeavor

– Matt Goldner, OCLC (formerly of Fretwell-Downing)

– Katherine Kott, Digital Library Federation

TG 1: Access Management

Active Participants

• Katie Anstock – Talis Information Ltd.

• Susan Campbell – CCLA

• Frank Cervone – Northwestern University

• Paul Cope – Auto-Graphics, Inc.

• David Fiander – University of West. Ontario

• Ted Koppel – The Library Corporation

• Mark Needleman – SIRSI Corporation

• Ed Riding – Dynix

• RL Scott – US DOE, OSTI

• Tim Shearer – University of North Carolina

• Mike Teets – OCLC, Inc. (Chair)

TG1 – Access management

• Authentication

– The process where a network user establishes a right to an identity: in essence, the right to use a name (Lynch 1998)

– Are you who you say you are?

• Authorization

– The process whereby a network user, based on their attributes, receives entitlements or authority to use a resource

– So, can you use this?
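The two definitions can be kept apart in code. The sketch below is a minimal illustration, assuming an invented credential store, attribute table, and entitlement rules; none of these names come from the task group's documents, and a real system would never compare plaintext passwords.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class User:
    name: str
    attributes: Set[str] = field(default_factory=set)


# Hypothetical data, for illustration only (plaintext passwords are not acceptable in practice).
PASSWORDS = {"student1": "s3cret"}
ATTRIBUTES = {"student1": {"enrolled"}}
ENTITLEMENTS = {"licensed_db": {"enrolled", "faculty"}}


def authenticate(name: str, password: str) -> Optional[User]:
    """Authentication: are you who you say you are?"""
    if PASSWORDS.get(name) == password:
        return User(name, set(ATTRIBUTES.get(name, ())))
    return None


def authorize(user: User, resource: str) -> bool:
    """Authorization: based on your attributes, can you use this resource?"""
    return bool(user.attributes & ENTITLEMENTS.get(resource, set()))


student = authenticate("student1", "s3cret")
can_search = student is not None and authorize(student, "licensed_db")
```

Authentication yields an identity with attributes; authorization is a separate decision made from those attributes, so either step can be swapped out without touching the other.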

Access Management Charter

• Gather requirements for Metasearch authentication and access needs, inventory existing processes now in place, and develop a series of formal use cases describing the needs.

• Deliver

– Definitions document of Access Management and Metasearch terms.

– Inventory of methods and techniques in use today

– Use cases describing authentication and access needs.

TG1’s Plan of Attack

Inventorying Current Approaches and Technologies

Breaking apart the problem

Identifying (defining) all the actors

Enumerating functions

Developing Use Cases

Analyzing Use Cases

• Ranking appropriateness of solutions to use cases

• Recommend standard or best practice

Situations Can Be Complex

[Diagram labels: Student, Citizen, Library Menu, Campus Authentication, Library Authentication, State Authentication, Metasearch, Databases]

Current authentication technologies. Potential solutions?

• Proprietary APIs?

• NCIP? SIP2?

• LDAP?

• Shibboleth?

• Kerberos?

• Athens (UK)?

• PAPI?

• Tequila?

• Non-authenticated identification?

• IP recognition?

• Proxy Servers?

• Referring URL?

• Embedded data in URL?

• Vendor provided Javascript?

• Cookies?

• Shouting?

Status

• Completed survey of authentication methods in use.

• Developed comprehensive use cases, then simplified them to three metasearch-specific cases.

• Ranked authentication methods in use by their ability to deliver on use case needs.

• Introduced an environmental ranking to cover factors such as ease of use, adoption, complexity, cost, etc.

• Developed a charting model to identify best solutions.

Access Management Process

Objects: Credentials, Attributes, Entitlements, Certificate

Processes: Authentication, Authorization, Certification

The AMP

A Mike Teets Invention

[Diagram: access management between a User, MetaSearch, and two Resources. Legend: AMPS = Access Management Process Symbol. Points 1, 2, and 3 mark the instances of authentication that take place in a simple metasearch transaction.]

Relative Rankings of Authentication Methods

[Chart: Use Case Ranking (horizontal axis, 3.0 to 9.0) against Environmental Ranking (vertical axis, 4.0 to 8.0) for IP Filtering, Proxy Server, Referring URL, Username/Password, SIP/SIP2, NCIP, Shibboleth, Kerberos, Athens, X.509 Digital Certificates, and Cookies]

Decisions to be Reached

• Are any current approaches universally applicable?

• Can/Should we develop our own authentication standard that addresses all situations? <not desirable>

• Is authentication conducive to a standard at all? Possible result: a series of “best practices”?

TG1 Recommendations

• Now

– IP authentication

– Username / Password

• Potential for the future

– Shibboleth
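IP authentication, the first "now" recommendation, amounts to checking the requesting address against a list of licensed networks. A minimal sketch using Python's standard `ipaddress` module follows; the network ranges are reserved documentation addresses, not any institution's real ranges, and `LICENSED_NETWORKS` and `ip_is_authorized` are names invented for this example.

```python
import ipaddress

# Hypothetical licensed ranges (reserved documentation networks, not real campus ranges).
LICENSED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def ip_is_authorized(client_ip: str) -> bool:
    """Return True when the client address falls inside any licensed network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in LICENSED_NETWORKS)


on_campus = ip_is_authorized("192.0.2.17")
off_campus = ip_is_authorized("203.0.113.5")
```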

What’s next…

RANKINGS AND RECOMMENDATIONS

• Text document with comprehensive analysis of methods in use.

• Recommend best practices where available.

• Recommend development necessary for models with the most promise for metasearch.

• Liaison with Shibboleth community started

TG2: Collection Description

The Meta-Problem (from a Discovery Standpoint)

• Many database (content) providers, each with their own web presence and means of interaction

• User wants to use data from many providers at the same time

User Needs

• Find/discover collections that match a certain list of criteria

• Obtain enough descriptive information to be able to identify a desired collection

• Discover the services that provide access to the collection(s)

• Interpret items retrieved from the collection in the context of the collection

TG2 Mission

• Understand how portals use collection and/or service descriptions

• Analyze options; recommend schemas and syntax for implementation of collection (S1) and service (S2) descriptions

TG2 Work Plan

• Create data models for collections and services

• Design metadata semantics for models

• Design syntax for representation and data exchange

• Build on existing work where possible

• Ensure linkages between Collections (S1) and Services (S2)

• Don’t build a whole new service

• Don’t specify the architecture for a given service

• Don’t specify protocols for exchange of collection and service metadata

Goals (Solutions)

• Create two element sets to be used by metasearch (and other) applications

– Collection descriptions: human readable text to describe contents of database

• Building on significant previous work, notably

– Research Support Libraries Programme, UK, 1999-2002

– Dublin Core Collection Description Working Group, 2003+

– Service descriptions: to be used by applications to access remote database services

Relations between collections and services

• A collection may have a parent, and may have multiple sub-collections (children)

• Each collection description has 0-to-many service descriptions

• A service may make multiple resources available

• Each service description has 1 (only) collection description
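These cardinalities can be written down as a small data model. The sketch below is an assumption-laden illustration, not the TG2 schema: the class and field names are invented (only "protocol" and "access point" echo the service description goals), and the example hosts are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class CollectionDescription:
    title: str
    # A collection may have one parent and many sub-collections (children).
    parent: Optional["CollectionDescription"] = field(default=None, repr=False)
    children: List["CollectionDescription"] = field(default_factory=list)
    # Each collection description has 0-to-many service descriptions.
    services: List["ServiceDescription"] = field(default_factory=list, repr=False)

    def add_child(self, child: "CollectionDescription") -> None:
        child.parent = self
        self.children.append(child)


@dataclass(eq=False)
class ServiceDescription:
    protocol: str
    access_point: str
    # Each service description has exactly one collection description.
    collection: CollectionDescription

    def __post_init__(self) -> None:
        # Register this service with its collection to keep both sides linked.
        self.collection.services.append(self)


library = CollectionDescription("Library e-resources")
journals = CollectionDescription("E-journals")
library.add_child(journals)
z3950 = ServiceDescription("Z39.50", "z3950.example.org:210", journals)
sru = ServiceDescription("SRU", "http://example.org/sru", journals)
```

Making the collection a required constructor argument enforces the "1 (only)" rule, while the empty default list allows a collection with no services at all.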

DC Collection Description Application Profile (DC CD AP)

• A "core" set of collection description properties

– For simple collection-level descriptions

– Suitable for a broad range of collections

– Primarily to support discovery of collections

• Includes: Collection title, Description, Size, Subject(s), Language, Type, Intellectual Rights, Access Rights, Data Range, Collection method, Logo, Collection history, etc.

TG2-S1 progress to date

• Working with/around DC CD AP issues (some joint membership) with data model

• Metasearch Initiative introduced some library-specific requirements out of scope for DC CD AP.

• TG2-S1 ends up with super-set of DC CD AP

Service Description Goals

• Ultimately, a mechanism to describe (and access) informational services that, in turn, provide access to collections

• How?

– Indicate protocol used

– Provide access point(s) for service

– Provide authentication/authorization guidelines

– List operations/queries supported

• TG2-S2 using Zeerex as vehicle

Zeerex: A Starting Point

• Originally a Z39.50 based specification

• Based on Z39.50 “Explain” service, which was never fully or particularly well implemented

• Flexible enough to deliver collection descriptions, relatively easy to implement

• “Z39.50 Explain, Explained and Re-Engineered in XML”

Under discussion:

• Maintaining and exchanging collection description and service access information

– Auto-generate descriptions?

– Harvest descriptions?

• Collection Identifiers

– Metasearch needs globally unique and persistent identifiers for collections (and services)

– Also needed by ONIX community, e-resource management systems and more

Future

• Publish/promote standardized Collection and Service Description schemas

• Write guidelines, best practices for implementation

• Promote creation of, and facilitate sharing of, collection and service descriptions among metasearch providers

• Ensure interoperability (or at least consistency) with TG1 (Authentication) and TG3 (Search and Retrieve)

TG 3: Search/Retrieve

Goals

– Describe current practice in Metasearching search and retrieval

– Define a standard vocabulary and terms

– Define a template for exchange of search and retrieval functionality

– Inventory proprietary XML interfaces and best practices for Metasearch search and retrieval

– Recommend the data elements to describe a Result Set and a record within a Result Set

– Review SRW/SRU and recommend modifications for use as the basis of a Metasearch search and retrieval standard.

Initial steps

Four main areas of activity in 2004-2005

– Current practices

– Metadata returned about result sets

– Citation level data elements

– Search / Retrieval standard investigation

Survey of Current Practices

• What protocols are commonly used

• Capture other common information

• Sent in June to over 100 organizations with a (disappointing) 25% response rate

• Responses analyzed (Stanford) and will be posted to the NISO-MI Wiki

Result Set Metadata

• Result set metadata: information that is valid only in the context of the current result set

As opposed to…

• Record metadata

– Administrative/control metadata

– Descriptive

• Intended to inform possible standard protocol or to make sure proprietary protocols have sufficient information

• Reviewing and tightening the elements

Results Set Management

• How to allow for extension to core metadata so that Information Providers can transmit “extra value” information

• If cross database searching allowed in single search, how will variations in the results set metadata be handled on a database level?

• Can result set metadata be overridden at the single record level?

• Tension between the need for a simple to implement protocol and the need for rich metadata to provide advanced features?

Citation level data elements

• How to map them to commonly supported metadata formats

– DC, MARC, MODS

• The goal?

– Provide recommendations to improve citation information for reuse in standards like OpenURL or document delivery

Citation Level Data Elements

• Need to be able to parse volume, issue, … information to reuse for other actions (OpenURL, document delivery)

• Reviewing the work of several other groups

– MARBI 773 work

– IMS Resource List standard

– OpenURL 1.0 (the winner)
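Once volume, issue, and similar elements are parsed out of a citation, reusing them is mostly a matter of serialization. The sketch below builds an OpenURL 1.0 (Z39.88-2004) key/encoded-value link for a journal article; the resolver address and the sample citation are invented for illustration, and `build_openurl` is a hypothetical helper, not part of any TG3 deliverable.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical link resolver address.
RESOLVER_BASE = "http://resolver.example.org/openurl"

# Citation-level elements that map onto OpenURL journal-format keys.
JOURNAL_KEYS = ("genre", "atitle", "jtitle", "volume", "issue", "spage", "date", "issn")


def build_openurl(citation: dict) -> str:
    """Serialize parsed citation elements as an OpenURL 1.0 KEV query string."""
    params = [
        ("url_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
    ]
    for key in JOURNAL_KEYS:
        if citation.get(key):  # skip elements the source did not supply
            params.append((f"rft.{key}", str(citation[key])))
    return f"{RESOLVER_BASE}?{urlencode(params)}"


# Invented sample citation; the issue number is deliberately missing.
citation = {
    "genre": "article",
    "atitle": "An example article",
    "jtitle": "Journal of Examples",
    "volume": 12,
    "spage": 45,
    "date": "2004",
}
openurl = build_openurl(citation)
parsed = parse_qs(urlsplit(openurl).query)  # round-trip check of the generated link
```

If a source returns the citation only as one unparsed string, none of these keys can be filled, which is the practical argument for parsed citation elements.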

Search and retrieve

• Review current practices and make recommendations for best practice and further standards work.

• Specifically review SRW/SRU and recommend modifications for use as the basis of a Metasearch search and retrieval standard.

Search and retrieve

• Structured search and retrieve

– Z39.50

– SRW/SRU (http://www.loc.gov/srw/)

– XML gateways (proprietary)

• Unstructured

– html/http parsing (“screen-scraping”)

March 2005 meeting adopted a new approach:

• What is the lowest barrier of entry to encourage adoption of a standard search and retrieve protocol that would enable consistent processing of search results – display, sort, merge, de-dupe and ability to generate OpenURLs for onward linking?
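The "consistent processing" named in the question (merge, de-dupe, sort) can be sketched as below. This is a deliberately naive illustration: matching on a normalized title plus year is an assumption made for the example, the record fields are invented, and real de-duplication would need far more careful matching.

```python
from typing import Dict, List


def normalize(title: str) -> str:
    """Lowercase and collapse whitespace so near-identical titles compare equal."""
    return " ".join(title.lower().split())


def merge_results(result_sets: Dict[str, List[dict]]) -> List[dict]:
    """Merge per-source result lists, de-dupe on (title, year), sort newest first."""
    seen: Dict[tuple, dict] = {}
    for source, records in result_sets.items():
        for record in records:
            key = (normalize(record["title"]), record.get("year"))
            if key in seen:
                seen[key]["sources"].append(source)  # duplicate: remember where it was found
            else:
                seen[key] = {**record, "sources": [source]}
    return sorted(
        seen.values(),
        key=lambda r: (-(r.get("year") or 0), normalize(r["title"])),
    )


# Invented sample result sets from two hypothetical databases.
result_sets = {
    "db_a": [
        {"title": "Metasearch Basics", "year": 2004},
        {"title": "Link Resolvers", "year": 2003},
    ],
    "db_b": [
        {"title": "metasearch  basics", "year": 2004},
        {"title": "Federated Search", "year": 2005},
    ],
}
merged = merge_results(result_sets)
```

None of this works unless every source returns the same fields in a predictable structure, which is the case for a common result format.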

NISO Metasearch XML Gateway (MXG)

Yes, a new standard !!

To be specific….

Relationship of this standard with NISO SRW/U

The NISO Metasearch XML Gateway is a non-conformant subset of the NISO SRW/U standard (http://www.loc.gov/z3950/agency/zing/srw/). The features missing from MXG that are necessary for SRU conformance are support for an Explain record and rich CQL support.

MXG has been designed to provide a low implementation barrier to content providers that want to make their databases available to metasearch engines. Interoperability across content providers was explicitly not a goal of MXG. The features of SRU that are missing from MXG are necessary for interoperability.
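Because MXG is described as a subset of SRU, a request to such a gateway would presumably look like an SRU searchRetrieve URL with a simple query. The sketch below uses standard SRU 1.1 parameter names; the gateway address is hypothetical, and which parameters MXG itself would require is not stated in these slides, so treat the set shown as an assumption.

```python
from urllib.parse import urlencode


def build_search_url(base_url: str, query: str,
                     start_record: int = 1, maximum_records: int = 10) -> str:
    """Build an SRU-style searchRetrieve URL with a plain keyword query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": query,  # simple terms only; no rich CQL is assumed
        "startRecord": start_record,
        "maximumRecords": maximum_records,
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical gateway address.
url = build_search_url("http://gateway.example.org/sru", "metasearch standards")
```

A content provider only has to answer this one URL pattern with XML, which is the low implementation barrier the text describes.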

NISO MI XML Gateway (MXG)

• SRU/SRW as possible starting point

• Amazon A9 OpenSearch as low barrier

• Recommended schemas for result set metadata

• Recommended schemas for citation data based on subset of OpenURL 1.0 data elements

Challenges and Opportunities

• WebFeat Metasearch patent (????)

• Googlezon-mania

• Building the business case

• Finishing

Sept. 19-21

NISO Metasearch and OpenURL Workshop

Washington, DC

How to make your e-resources earn their keep, Ruth Stubbings, presentation at The radical library: taking up the challenge seminar, November 2003

Loughborough University

Metasearch here to stay!

• Key initiative for NISO

• Needs support of all the stakeholders

– Metasearch vendors

– Information Providers

– Libraries

“If your not fer us, yer agin us”-John Little

Thank You.

Andrew K. Pace

Head, Systems

NCSU Libraries

andrew_pace@ncsu.edu

http://www.lib.ncsu.edu/niso-mi

http://www.lib.ncsu.edu/staff/pace
