Interoperability of Collections
Brandon Muramatsu
On behalf of MERLOT and SMETE
March 29, 2003
Federated Search at MERLOT and SMETE
Underlying Question
How could widespread distribution of a collection’s item-level metadata enhance rather than dilute the value of the collection?
Andy Dong, SMETE
Director of Technology
Enhancing Value
• (Focusing on educational digital libraries, not Z39.50 providers such as traditional digital libraries)
• Providing additional related resources to end-users of a collection
– Example: ENC mathematics resources available to users of MathDL
• Providing additional value-added services on top of multiple collections for end-users
– Example: Providing MERLOT “peer review” to resources in Math Tools
• Wider-scale discovery and use of resources
– Ideally leading to increased use of the host collection (especially if it provides value-added services)
Outline
• Approach
– Federated Search
– Metadata Harvesting
• Policy Issues
– Integrity
– Identity
• Technical Issues
– SOAP and WSDL
– Prototype Implementation
• Discussion
Approaches: Harvesting
• Metadata from distributed collections is “harvested” or “gathered”
– Typically using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
• Item-level metadata is stored “permanently” by a third party that creates a composite index of all item-level metadata it has stored
• Searches are conducted against the “permanent” index of harvested item-level metadata
• Pros:
– Common approach used by commercial indexing and abstracting services and Web search engines
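The harvesting approach can be sketched roughly as follows, assuming each collection returns OAI-PMH ListRecords responses in the oai_dc format. The sample XML, the collection name, and the helper names (parse_list_records, build_index, search) are illustrative assumptions, not code from MERLOT or SMETE. A real harvester would fetch the XML over HTTP and follow resumption tokens.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Standard OAI-PMH and Dublin Core namespaces.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical ListRecords response from one collection (illustration only).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Introduction to Fractions</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:example.org:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Fractions and Decimals</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_list_records(xml_text, collection):
    """Extract id, title, and source collection from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        # Keep the source so the providing collection can be attributed later.
        records.append({"id": identifier, "title": title, "source": collection})
    return records


def build_index(records):
    """Composite inverted index: lowercase title word -> set of record ids."""
    index = defaultdict(set)
    for rec in records:
        for word in rec["title"].lower().split():
            index[word].add(rec["id"])
    return index


def search(index, term):
    """Look up one term in the stored index; no remote call is made."""
    return sorted(index.get(term.lower(), set()))


records = parse_list_records(SAMPLE_RESPONSE, "example-collection")
index = build_index(records)
print(search(index, "fractions"))
```

The point of the sketch is that the search runs entirely against the locally stored index, which is what distinguishes harvesting from federated search.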
Approaches: Federated Search
• Metadata is searched synchronously from multiple, distributed collections
• Item-level metadata is held “temporarily” during a user’s session
• Search returns results of distributed search
– May be with or without integrated lists of results
• Pros:
– Common approach used by Libraries when they query multiple databases
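A minimal sketch of the federated approach, assuming each collection can be reached through a search function. The two functions below are hypothetical stand-ins for remote services, and the collection names and catalogs are invented for illustration. Results live only in the returned dictionary, mirroring the “temporary” holding of metadata.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for remote collection search services.
def search_collection_a(query):
    catalog = ["Algebra Tutor", "Geometry Applet"]
    return [title for title in catalog if query.lower() in title.lower()]


def search_collection_b(query):
    catalog = ["Linear Algebra Notes", "Physics Lab"]
    return [title for title in catalog if query.lower() in title.lower()]


COLLECTIONS = {
    "collection-a": search_collection_a,
    "collection-b": search_collection_b,
}


def federated_search(query, collections=COLLECTIONS, timeout=5.0):
    """Query every collection concurrently; return results grouped by source."""
    results = {}
    with ThreadPoolExecutor(max_workers=len(collections)) as pool:
        # One task per collection, so a slow service does not delay dispatch.
        futures = {
            name: pool.submit(fn, query) for name, fn in collections.items()
        }
        for name, future in futures.items():
            try:
                results[name] = future.result(timeout=timeout)
            except Exception:
                # A failed or slow collection contributes no results.
                results[name] = []
    return results


print(federated_search("algebra"))
```

Grouping by source keeps the lists separate; an integrated list would additionally require a shared ranking scale.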
Concerns
Applies to Either Approach
Protect the Value and Integrity of the Providing Collection, Both of Which Affect Its Sustainability
– Protect integrity of the providing collection
– Protect identity of the providing collection
Policy Principles
• Protect Integrity of the Providing Collection
– Ensures the providing collection maintains “control” of its metadata to ensure quality
• Enables use of the “name” of the providing collection to indicate quality
– Allow federated searches after “formal” agreement
– Prevent unauthorized access (either harvesting or federated search) and redistribution
• May use “keys” for authentication coupled with log analysis
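One way the “keys” plus log analysis idea might look, as a sketch only. The key registry, partner names, and function names are hypothetical, and the log is kept in memory for brevity. A deployed service would store keys securely and write to persistent logs.

```python
import hmac
from collections import Counter

# Hypothetical registry: access key -> partner with a formal agreement.
AUTHORIZED_KEYS = {
    "key-partner-1": "partner-one",
    "key-partner-2": "partner-two",
}

ACCESS_LOG = []  # in-memory log of (partner or None, query) tuples


def authorize(key):
    """Return the partner name for a valid key, else None."""
    for known_key, partner in AUTHORIZED_KEYS.items():
        # compare_digest avoids leaking key contents through timing.
        if hmac.compare_digest(key.encode(), known_key.encode()):
            return partner
    return None


def handle_search(key, query):
    """Log every request, then reject callers without a known key."""
    partner = authorize(key)
    ACCESS_LOG.append((partner, query))
    if partner is None:
        raise PermissionError("unknown access key")
    return f"results for {query!r}"  # placeholder for the real search


def requests_per_partner(log):
    """Simple log analysis: request counts per partner (None = rejected)."""
    return Counter(partner for partner, _ in log)


handle_search("key-partner-1", "fractions")
try:
    handle_search("bad-key", "fractions")
except PermissionError:
    pass
print(requests_per_partner(ACCESS_LOG))
```

Counting rejected requests alongside accepted ones is what lets log analysis surface unauthorized harvesting attempts.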
Policy Issues
• Protect Identity of the Providing Collection
– Attribute providing collection as the source of the metadata
• Potentially need to acknowledge provider of the metadata and the original cataloger of the metadata
– Ensure “branding” of the providing collection
• Typically through using a logo
– Enables use of the “name” of the providing collection to indicate quality
Technology Issues
• Using SOAP and WSDL to Simplify Process
– Gives service providers a mechanism to publish available services, including the semantics and syntax for accessing and consuming the service (WSDL).
– Gives service consumers the ability to discover services and configure software clients to access remote services.
Technologies: SOAP
• Simple Object Access Protocol
– W3C Spec: www.w3.org/TR/SOAP
• XML-based Protocol for Exchanging Information in a Distributed Environment
– Envelope describing what is in the message
– Set of encoding rules for expressing instances of application-defined datatypes
– Convention for representing remote procedure calls and responses
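To make the envelope and RPC convention concrete, here is a sketch that builds and reads back a SOAP 1.1 request. The operation name doSmeteSearch and the parameter names come from the prototype slides, but the service namespace and the message layout are assumptions, and the encoding rules (type annotations) are omitted.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
SERVICE_NS = "urn:example:federated-search"  # hypothetical service namespace


def build_search_request(operation, **params):
    """Build a SOAP 1.1 RPC-style request envelope as a string."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # The remote procedure call is one element named after the operation.
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_search_request(xml_text):
    """Recover the operation name and parameters from a request envelope."""
    root = ET.fromstring(xml_text)
    call = root.find(f"{{{SOAP_ENV}}}Body")[0]
    operation = call.tag.split("}")[1]
    return operation, {child.tag: child.text for child in call}


request = build_search_request(
    "doSmeteSearch",
    key="demo-key",
    query="fractions",
    start=0,
    maxResults=10,
    language="en",
)
print(request)
print(parse_search_request(request))
```

All parameter values come back as strings after parsing, which is why real SOAP stacks rely on the encoding rules or a schema to restore types.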
Technologies: WSDL
• Web Services Description Language
– W3C Spec: www.w3.org/TR/WSDL.html
• XML-based grammar for describing network services as collections of communication endpoints capable of exchanging messages
– (XML-formatted description of network-based services as a set of endpoints operating on messages containing either document-oriented or procedure-oriented information.)
– Abstract definition tied to concrete network protocol and message format at each endpoint
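The sketch below reads the abstract part of a WSDL 1.1 description and lists its operations with their input and output messages. The WSDL fragment is hypothetical and heavily trimmed (no types, binding, or service elements), and list_operations is an illustrative helper rather than part of any toolkit.

```python
import xml.etree.ElementTree as ET

WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}  # WSDL 1.1 namespace

# Hypothetical, trimmed WSDL: abstract definitions only.
SAMPLE_WSDL = """
<definitions name="SearchService"
             targetNamespace="urn:example:federated-search"
             xmlns:tns="urn:example:federated-search"
             xmlns="http://schemas.xmlsoap.org/wsdl/">
  <message name="SearchRequest"/>
  <message name="SearchResponse"/>
  <portType name="SearchPortType">
    <operation name="doSmeteSearch">
      <input message="tns:SearchRequest"/>
      <output message="tns:SearchResponse"/>
    </operation>
  </portType>
</definitions>
"""


def list_operations(wsdl_text):
    """Map operation name -> (input message, output message)."""
    root = ET.fromstring(wsdl_text)
    operations = {}
    for port_type in root.iterfind("wsdl:portType", WSDL_NS):
        for op in port_type.iterfind("wsdl:operation", WSDL_NS):
            inp = op.find("wsdl:input", WSDL_NS)
            out = op.find("wsdl:output", WSDL_NS)
            operations[op.get("name")] = (
                inp.get("message") if inp is not None else None,
                out.get("message") if out is not None else None,
            )
    return operations


print(list_operations(SAMPLE_WSDL))
```

A complete description would add binding and service elements to tie these abstract operations to a concrete protocol and endpoint address.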
Prototype Implementation Architecture
[Architecture diagram; recoverable labels:]
• MERLOT Service (GLUE toolkit): doMerlotSearch(···), with WSDL from the MERLOT Service deployment
• SMETE Service (Axis toolkit): doSmeteSearch(···), with WSDL from the SMETE Service deployment
• Interface Discovery
• Federated Search Client (GLUE toolkit): MERLOT client stub and SMETE client stub, each with a Service Access Thread
• Client stages: Search Input, Search Input Transformation, Search Dispatch, Results Processing
Prototype Implementation
• Separate implementations (server-side code)
– Different WSDL files
• Similar input parameters
– Key, query, start, maxResults, language
• Different search syntaxes supported
– Google API, Lucene, full XML IEEE LOM
• Different ranking methodologies but common agreement on scale
– Convert to 1-100 scale to integrate results
• Still need to do better documentation of implementations
www.smete.org/?path=/public/about_smete/activities/technology/federated_search/smetesearchapiv2.jhtml
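Integrating results on a common 1-100 scale might look like the sketch below. The slides do not say how scores are converted, so the linear min-max mapping is an assumption, as are the native score ranges, titles, and function names.

```python
def to_common_scale(score, low, high):
    """Linearly map a native score in [low, high] onto the 1-100 scale."""
    if high == low:
        return 100.0  # degenerate range: treat every hit as top-ranked
    return 1.0 + 99.0 * (score - low) / (high - low)


def merge_results(result_sets):
    """Merge per-source hits into one list sorted by common-scale score.

    result_sets maps source -> (low, high, [(title, native_score), ...]).
    """
    merged = []
    for source, (low, high, hits) in result_sets.items():
        for title, score in hits:
            merged.append((to_common_scale(score, low, high), source, title))
    merged.sort(key=lambda hit: hit[0], reverse=True)
    return merged


# Hypothetical native ranges: one source scores 0-5, the other 0.0-1.0.
merged = merge_results({
    "merlot": (0, 5, [("Algebra Tutor", 5), ("Geometry Applet", 2.5)]),
    "smete": (0.0, 1.0, [("Linear Algebra Notes", 0.8)]),
})
for score, source, title in merged:
    print(f"{score:6.1f}  {source:7s}  {title}")
```

Each hit keeps its source label after merging, so the integrated list can still attribute every result to its providing collection.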
Relation to Standards and Specifications Activities
• No item-level metadata element set is prescribed
– Though both MERLOT and SMETE use variants of IMS/IEEE LOM
– Presumably most educational digital libraries will be familiar with Dublin Core and IEEE LOM
• Not using IMS Digital Repositories Spec
• No “standard” for query languages
– Z39.50 query and query-type is most widely adopted
– Not using XML-based query languages XQuery or XPath because of adoption issues and evolving specifications
• No widely adopted “standard” for federated search
Context of Use?
• Talking Primarily about Item-Level Metadata
– Interpret as generally the non-pedagogical/context items of IEEE 1484.12.1 Learning Object Metadata
• What about Context?
– “Assignments”
– “Comments”
– “Reviews”