Outline of the course

Introduction to Digital Libraries (15%)
Description of Information (30%)
Access to Information (30%)
User Services (10%)
Additional topics (15%)

Building of a (small) digital library

Reference material:
  – Ian Witten, David Bainbridge, David Nichols, How to Build a Digital Library, Morgan Kaufmann, 2010, ISBN 978-0-12-374857-7 (Second edition)
  – The Web

FUB 2012-2013 Vittore Casarosa – Digital Libraries Part 6 -1

Where are we

Digital Libraries
  – Description of information
    • Metadata
      – MARC
      – Dublin Core
      – MODS
      – METS
      – TEI
      – EAD
      – ......
    • Knowledge Representation
      – FRBR
      – RDF
    • Interoperability

Interoperability

Interoperability is the ability of systems, services and organisations to work together seamlessly toward common or diverse goals. In the technical arena it is supported by open standards for communication between systems and for description of resources and collections, among others. Interoperability is of paramount relevance in the context of resource discovery and access.

Interoperability in digital libraries (1/2)

Interoperability between vendors
  – Different databases and user interfaces
Interoperability between different organisations
  – E.g. using different library formats
Interoperability between groups of users
  – E.g. public libraries/academic libraries
  – E.g. libraries in different countries
Interoperability between communities
  – E.g. libraries, publishers, archives, museums
Interoperability across time
  – Preservation

Interoperability in digital libraries (2/2)

Traditional libraries interoperability
  – Union catalogs (OPACs)
  – Interlibrary loan
Digital libraries interoperability
  – Documents (digital objects, resources)
  – Metadata
  – Conceptual models
  – Protocols
    • Z39.50
    • OAI-PMH
  – Queries
    • Z39.50 queries (Type 1, Z39.58, CQL)
    • SRU, SRW
    • SPARQL

Z39.50

"Information Retrieval (Z39.50); Application Service Definition and Protocol Specification, ANSI/NISO Z39.50-1995"

Developed by NISO (National Information Standards Organization), the standards development organization serving libraries, publishing and information services
NISO was (is) the Z39 Committee of ANSI (American National Standards Institute), and Z39.50 was the 50th standard defined by NISO
Current version (Version 3) was adopted in 1995, superseding earlier versions adopted in 1992 and 1988 (1984 version was rejected)
  – Another revision, initiated in 2001, is still “work in progress”
Z39.50 was heavily influenced by OSI, and was an “application layer” protocol that needed a full-duplex reliable OSI connection
  – In Version 3 it runs over TCP/IP
It is a wide-ranging protocol for information retrieval between a client and a database server, which attempts to standardize shared semantic knowledge

Z39.50 architectural model (1/2)

A server houses one or more databases containing records.
Associated with each database is a set of access points (indexes) that can be used for searching
  – how to segment logical data into relations and how to name the columns in the relations are hidden (server-specific)
Z39.50 includes a set of “registries” that provide each (application) domain with an agreed-upon structure and attributes (query syntax, attribute fields, content retrieval format, etc.)
A search (sent from the client, or origin, to the server, or target) produces a set of records, called a "result set", that is maintained on the server
The client also has functions for search management (e.g. request progress reports for an active search, authorize the server to continue a resource-intensive search, abort an active search)

Z39.50 architectural model (2/2)

Records from the result set can be retrieved by the client, which has many options for controlling the contents and format of the records that are returned (e.g. sorting a result set, selecting a subset of the result set, using the result set for a new search)
The client also has available a general mechanism called "extended services" to invoke services on the server, which can survive past the end of the session (e.g. saving result sets across sessions, queuing result sets for print or electronic mail processing at the server, registering queries that would be executed periodically on the server)

Z39.50 functionality

Initialization facility

Init service: establishes Z-association

Init request (Origin to Target)
  – Version, (id/password), option flags, message sizes, implementation information
Init response (Target to Origin)
  – Result, version, option flags, message sizes, implementation information

Search facility

Search service

Search request (Origin to Target)
  – Search type, query, databases, result set, limits for small, medium, large
Search response (Target to Origin)
  – Number of records found, number of records attached, status information, (records)

Retrieval facility

Present service

Present request (Origin to Target)
  – Number of records, starting point, result set
Present response (Target to Origin)
  – Number of returned records, status, (records)

Segment service
  – Allows a “Present response” that is larger than max size to be split in segments
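
The interplay between the Search and Present services can be sketched with a toy, in-memory model. This is only an illustration of the idea that the result set stays on the target and the origin fetches slices of it. The class `ToyTarget`, its method names and the sample records are hypothetical and are not part of any Z39.50 toolkit.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyTarget:
    """Hypothetical stand-in for a Z39.50 target (server); not a real API."""
    records: List[str]
    result_sets: Dict[str, List[str]] = field(default_factory=dict)

    def search(self, term: str, result_set: str = "default") -> int:
        # The matching records are kept on the target under a result-set
        # name; only the number of records found is returned to the origin.
        hits = [r for r in self.records if term.lower() in r.lower()]
        self.result_sets[result_set] = hits
        return len(hits)

    def present(self, result_set: str, start: int, count: int) -> List[str]:
        # Present request: result set, starting point (1-based), number of records.
        hits = self.result_sets[result_set]
        return hits[start - 1:start - 1 + count]


target = ToyTarget(["Dinosaur footprints", "Fossil facts",
                    "Dinosaur bones", "Insects in amber"])
found = target.search("dinosaur")            # Search: count only
first = target.present("default", 1, 1)      # Present: first record of the set
```

A real target would also apply the small/medium/large limits of the Search request to decide whether records are attached to the Search response; the sketch leaves that out.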

Sort facility

Sort service

Sort request (Origin to Target)
  – result set to sort, sorted result set, sort directives
Sort response (Target to Origin)
  – status

Browse facility

Scan service

Scan request (Origin to Target)
  – database, term list, starting point, number of terms, (step size)
Scan response (Target to Origin)
  – status, number of elements, (elements)

Result-set-delete facility

Delete service

Delete request (Origin to Target)
  – list of result sets to delete
Delete response (Target to Origin)
  – status

Access control facility

Access-control service

Request (Origin to Target)
Access control request (Target to Origin)
  – Security-challenge
Access control response (Origin to Target)
  – Security-challenge-response
Response (Target to Origin)

Accounting/Resource control facility

Resource-control service
Trigger-resource-control service
Resource-report service
  – Complex functionality to control and report resource usage
  – Mostly used for fee-based operation

Termination facility

Close service
  – Terminates a Z-association

Explain facility

Explain service
  – Gives access to information about the Z39.50 target
    • Databases
    • Access points
    • Query languages
    • Element sets
    • ...
This information is maintained by the server in a specific database, and therefore can be accessed using the Search and Retrieve facilities of Z39.50
The idea is that a (smart) client, when accessing an (unknown) database, could find its access points, its element sets and other information by querying the ”Explain” database

Extended Service facility

Extended Services service
  – Persistent Result Set Extended Service
  – Persistent Query Extended Service
  – Periodic Query Schedule Extended Service
  – Item Order Extended Service
  – Database Update Extended Service
  – Export Specification Extended Service
Task package
  – Used to create, modify or delete an Extended Service Request

Queries

Query types
  – Type-0: proprietary between 2 parties
  – Type-1: RPN (Reverse Polish Notation)
  – Type-2: ISO 8777
  – Type-100: Z39.58
  – Type-101: Extended RPN (v 2)
  – Type-102: Ranked List query

Type-1 Query
Reverse Polish Notation

Consists of
  – One or more operands linked (RPN style) with Boolean operators (AND, OR, AND_NOT)
  – Every operand is a search expression consisting of 7 parts
Example of query
  – (operand)(operand)operator
  – (“Mark Twain”, 1:1003, 2:3, 3:1, 4:1, 5:100, 6:1)(“Clemens, Samuel”, 1:1003, 2:3, 3:3, 4:101, 5:100, 6:2)AND_NOT

RPN
  – (3 + 5) * (7 – 2) is written 3 5 + 7 2 – *
  – Evaluated on a stack: 3 5 + leaves 8, 7 2 – leaves 5, and * leaves 40
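
The stack evaluation of an RPN expression can be sketched in a few lines of Python. The same loop handles both the arithmetic example from the slide and a Boolean query in (operand)(operand)operator form. The helper `eval_rpn`, the operator tables and the record-id sets in `HITS` are assumptions made for this illustration; a real target would resolve each operand against its indexes.

```python
import operator
from typing import Any, Callable, Dict, Iterable, List


def eval_rpn(tokens: Iterable[str],
             operators: Dict[str, Callable[[Any, Any], Any]],
             to_operand: Callable[[str], Any]) -> Any:
    """Evaluate a postfix (RPN) token sequence with binary operators."""
    stack: List[Any] = []
    for token in tokens:
        if token in operators:
            # An operator consumes the two most recent operands.
            right = stack.pop()
            left = stack.pop()
            stack.append(operators[token](left, right))
        else:
            stack.append(to_operand(token))
    if len(stack) != 1:
        raise ValueError("malformed RPN expression")
    return stack[0]


# Arithmetic example from the slide (ASCII "-" instead of the slide's dash).
ARITHMETIC = {"+": operator.add, "-": operator.sub, "*": operator.mul}
arithmetic_result = eval_rpn("3 5 + 7 2 - *".split(), ARITHMETIC, int)

# Boolean example: operands stand for hypothetical sets of matching record ids.
HITS = {"twain": {1, 2, 3}, "clemens": {2, 5}}
BOOLEAN = {"AND": operator.and_, "OR": operator.or_, "AND_NOT": operator.sub}
boolean_result = eval_rpn(["twain", "clemens", "AND_NOT"], BOOLEAN, HITS.__getitem__)
```

Because operators always follow their operands, no parentheses or precedence rules are needed, which is what makes RPN convenient as a wire format for queries.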

Operands in Type-1 queries

0. Term
  – What you are looking for
1. Use Attributes
  – Which abstract access point to use (e.g. title, author)
2. Relation Attributes
  – Relation between the term and the data in the access point (e.g. less than, equals, phonetic equals)
3. Position Attributes
  – Where in the access point should the term be? (e.g. first in field, first in subfield)
4. Structure Attributes
  – How is the query term to be treated? (e.g. as phrase, as words, as date, as normalised name)
5. Truncation Attributes
  – Should truncation be applied on the match? (e.g. left truncation, right and left truncation, no truncation)
6. Completeness Attributes
  – What is the term to be matched against? (e.g. part of subfield, whole subfield, whole field)
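
The seven parts of an operand map naturally onto a small record type. The sketch below is only an illustration: the class `Operand` and its `render` method are hypothetical names, and the output format simply mirrors the type:value notation used on the Type-1 query slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operand:
    """One Type-1 operand: a term plus six attribute values (types 1 to 6)."""
    term: str
    use: int           # 1: which access point
    relation: int      # 2: relation between term and data
    position: int      # 3: where in the access point
    structure: int     # 4: how the term is treated
    truncation: int    # 5: truncation on the match
    completeness: int  # 6: what the term is matched against

    def render(self) -> str:
        values = [self.use, self.relation, self.position,
                  self.structure, self.truncation, self.completeness]
        pairs = ", ".join(f"{kind}:{value}"
                          for kind, value in enumerate(values, start=1))
        return f'("{self.term}", {pairs})'


# Attribute values copied from the "Mark Twain" operand on the earlier slide.
twain = Operand("Mark Twain", use=1003, relation=3, position=1,
                structure=1, truncation=100, completeness=1)
rendered = twain.render()
```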

From Z39.50 to SRW/U

Need for a generic Information Retrieval capability more suited to the Web Architecture
Motivation to create an easy-to-implement protocol with (more or less) the power of Z39.50
Use existing off-the-shelf solutions where possible
Re-evaluate Z39.50, “a good idea at the time”
Avoid library-centric perspective
Solution:
  – SRU – Search/Retrieve via URL
  – SRW – Search/Retrieve via Web Service (SRW is now called SRU over SOAP)

Simple SRU query

http://sru.miketaylor.org.uk/sru.pl?version=1.1&operation=searchRetrieve&query=dinosaur&startRecord=1&maximumRecords=1&recordSchema=dc
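
A URL like this one can be assembled with the Python standard library. The function `sru_search_url` and its defaults are assumptions made for this sketch; the parameter names and the base URL are the ones shown on the slide, and whether that server is still reachable is not checked here.

```python
from urllib.parse import urlencode

BASE_URL = "http://sru.miketaylor.org.uk/sru.pl"  # base URL from the slide


def sru_search_url(query: str, start_record: int = 1, maximum_records: int = 1,
                   record_schema: str = "dc", base_url: str = BASE_URL) -> str:
    """Build an SRU 1.1 searchRetrieve URL; the query string is URL-encoded."""
    params = [
        ("version", "1.1"),
        ("operation", "searchRetrieve"),
        ("query", query),
        ("startRecord", start_record),
        ("maximumRecords", maximum_records),
        ("recordSchema", record_schema),
    ]
    return base_url + "?" + urlencode(params)


url = sru_search_url("dinosaur")
```

Using `urlencode` matters as soon as the query contains spaces, slashes or quotes, as CQL queries usually do.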

SRU response in XML (1/2)

<?xml version="1.0"?>
<zs:searchRetrieveResponse xmlns:zs='http://www.loc.gov/zing/srw/'>
  <zs:version>1.1</zs:version>
  <zs:numberOfRecords>29</zs:numberOfRecords>
  <zs:records>

    .... details in a moment ....

  </zs:records>
</zs:searchRetrieveResponse>

SRU response in XML (2/2)

<zs:record>
  <zs:recordSchema>info:srw/schema/1/dc-v1.1</zs:recordSchema>
  <zs:recordPacking>xml</zs:recordPacking>
  <zs:recordPosition>1</zs:recordPosition>
  <zs:recordData>
    <srw_dc:dc xmlns:srw_dc="info:srw/schema/1/dc-schema"
               xmlns="http://purl.org/dc/elements/1.1/">
      <title>Fossils</title>
      <creator>Lappi, Megan.</creator>
      <type>text</type>
      <publisher>New York, NY: Weigl Publishers</publisher>
      <date>2005</date>
      <language>en</language>
      <description>Studying fossils -- Fossil facts -- Gone forever -- A fossil is born -- From bone to stone -- Insects in amber -- Dinosaur footprints</description>
      <identifier>http://www.loc.gov/catdir/toc/ecip0415/2004004136.html</identifier>
      <identifier>URN:ISBN:1590362136</identifier>
    </srw_dc:dc>
  </zs:recordData>
</zs:record>
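
A client would typically read such a response with a namespace-aware XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` on a shortened copy of the response shown on the slides; the function `parse_sru` and the shape of the returned dictionaries are choices made for this illustration.

```python
import xml.etree.ElementTree as ET

NS = {
    "zs": "http://www.loc.gov/zing/srw/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Shortened copy of the response from the slides.
SAMPLE = """<?xml version="1.0"?>
<zs:searchRetrieveResponse xmlns:zs="http://www.loc.gov/zing/srw/">
  <zs:version>1.1</zs:version>
  <zs:numberOfRecords>29</zs:numberOfRecords>
  <zs:records>
    <zs:record>
      <zs:recordSchema>info:srw/schema/1/dc-v1.1</zs:recordSchema>
      <zs:recordPacking>xml</zs:recordPacking>
      <zs:recordPosition>1</zs:recordPosition>
      <zs:recordData>
        <srw_dc:dc xmlns:srw_dc="info:srw/schema/1/dc-schema"
                   xmlns="http://purl.org/dc/elements/1.1/">
          <title>Fossils</title>
          <creator>Lappi, Megan.</creator>
          <date>2005</date>
          <identifier>URN:ISBN:1590362136</identifier>
        </srw_dc:dc>
      </zs:recordData>
    </zs:record>
  </zs:records>
</zs:searchRetrieveResponse>
"""


def parse_sru(xml_text):
    """Return (total number of records, list of simplified Dublin Core dicts)."""
    root = ET.fromstring(xml_text)
    total = int(root.findtext("zs:numberOfRecords", namespaces=NS))
    records = []
    for data in root.iterfind("zs:records/zs:record/zs:recordData", NS):
        records.append({
            "title": data.findtext(".//dc:title", namespaces=NS),
            "creator": data.findtext(".//dc:creator", namespaces=NS),
            "identifiers": [e.text.strip()
                            for e in data.iterfind(".//dc:identifier", NS)],
        })
    return total, records


total, records = parse_sru(SAMPLE)
```

Note that `numberOfRecords` (29) is the size of the whole result, while the response carries only the one record requested with `maximumRecords=1`.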

Contextual Query Language

CQL (formerly known as Common Query Language) is the query language used in SRU
The conceptual model of CQL is the same as Z39.50
  – The server has one or more databases, containing records
  – The databases can be searched through access points, or indexes
The language defines a number of defaults to make simple queries really simple
At the same time it defines a number of Indexes, Relations, Relation Modifiers, Booleans and Boolean Modifiers to increase the expressive power of the language

CQL search clause

subject any/relevant "fish frog"

  – index: subject
  – relation: any
  – relation modifier: relevant
  – search term: "fish frog"

Subject to context qualification
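
The structure of a single search clause can be made concrete with a small parser. This is a deliberately simplified sketch and not a CQL implementation: it assumes the three parts are separated by whitespace, borrows shell-style quoting from `shlex`, and ignores Booleans, parentheses and symbolic relations written without spaces. The names `SearchClause` and `parse_search_clause` are hypothetical.

```python
import shlex
from typing import NamedTuple, Optional, Tuple


class SearchClause(NamedTuple):
    index: Optional[str]
    relation: Optional[str]
    modifiers: Tuple[str, ...]
    term: str


def parse_search_clause(clause: str) -> SearchClause:
    """Split 'index relation/modifier "term"' into its parts (simplified)."""
    tokens = shlex.split(clause)  # keeps a quoted term together
    if len(tokens) == 1:
        # A bare term relies on the server defaults for index and relation.
        return SearchClause(None, None, (), tokens[0])
    if len(tokens) != 3:
        raise ValueError("expected: index relation term")
    index, relation_part, term = tokens
    relation, *modifiers = relation_part.split("/")
    return SearchClause(index, relation, tuple(modifiers), term)


clause = parse_search_clause('subject any/relevant "fish frog"')
simple = parse_search_clause("dinosaur")
```

The bare-term case corresponds to the "defaults to make simple queries really simple" mentioned on the CQL slide.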

Learning curves for query languages

[Figure: effort to learn plotted against expressive power, with curves for Google, CQL and SQL]

Z39.50 and OAI-PMH

It is interesting to see another protocol for “resource discovery”, namely the Open Archives Initiative (OAI-PMH) protocol
Historical separation from Z39.50
  – OAI-PMH appears about 15 years after Z39.50
Cultural separation from Z39.50
  – Z39.50 originated in the traditional library community
  – OAI-PMH originated in the “Web Community”
Conceptual separation from Z39.50
  – Z39.50 based on solid (but heavy and bulky) foundations
  – OAI-PMH based on simple and pragmatic ideas

OAI – Open Archives Initiative

The roots of OAI lie in the development of eprint archives (i.e. Institutional Repositories) such as arXiv, CogPrints, NACA (NASA), RePEc, NDLTD, NCSTRL, etc.
Each repository offered a web interface for deposit of articles and for end-user searches
It was difficult for end-users to work across archives without having to learn multiple different interfaces
Initial experiments for single search interface to all archives
Universal Pre-print Service (UPS) renamed OAI at the Santa Fe Convention (1999)

Searching versus Harvesting

Two possible approaches for single search interface to all archives
  – cross searching multiple archives based on protocol like Z39.50 (possibly lighter)
  – harvesting metadata into one or more ‘central’ services
Problems with cross searching
  – Not scalable (overall performance determined by slowest server)
  – Problems of deciding which servers to target (collection descriptions not consistent)
  – Differences in interfaces and query languages
  – Problems in the ranked merging of results (different types and size of targets can skew results)
  – Browse interface very difficult to build
Decision was to go with harvesting

OAI - PMH

OAI Protocol for Metadata Harvesting
Data Providers make metadata available for harvesting
Service Providers harvest metadata
Metadata can be centrally collected or “aggregated”
Data Providers
  – Are creators and keepers of the metadata for objects (repositories) and (possibly but not necessarily) archives of resources
  – Handle deposit and publishing
Service Providers
  – Are harvesters of metadata for the purpose of providing a service such as a search interface, peer-review system, etc.

OAI – PMH overview

[Figure labels: harvesting based on OAI-PMH; aggregator; searching based on Z39.50 or SRW; service providers]

OAI disclaimer

The OAI use of the term ‘archive’ in fact implies very little of what we normally associate with archives
No preservation aspect is implied whatsoever (not what the protocol is about at all)
No appraisal or provenance of eprints or digital objects is implied by this descriptive term
The term simply refers to a collection of digital objects (full text, learning objects, etc.) which might (only might) also have been harvested along with the metadata
‘Archive’ is a term within the OAI-PMH which serves to distinguish a collection of digital objects (the ‘archive’) from the collected metadata associated with these objects, described as ‘repositories’
In OAI, repositories expose metadata about ePrints (strictly there are no metadata archives)
In OAI, archives hold ePrints (strictly there are no eprint repositories)

Conceptual model of OAI data

[Figure labels: resource; item = identifier (all available metadata about David); records: Dublin Core metadata, MARC metadata, SPECTRUM metadata]

OAI – PMH records

A record contains the metadata of a resource in a specific format
It has three parts
  – header (mandatory)
    • identifier
    • datestamp
  – metadata (mandatory)
    • XML encoded metadata with root tag, namespace
    • repositories must support Dublin Core
    • may support other formats
  – about (optional)
    • rights statements
    • provenance statements

OAI-PMH Protocol Overview

Protocol based on HTTP
Request arguments as GET or POST parameters
Six request types (verbs)
Responses are encoded in XML syntax
Supports any metadata format (Dublin Core mandatory for each data provider)
Supports selective harvesting
  – logical set hierarchy (data providers)
  – date stamps (last change of metadata set)
Flow control (token to retrieve subsequent records)
Error messages
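
The flow-control idea can be sketched as a loop that keeps asking for the next chunk until the repository stops returning a token. In OAI-PMH the token travels in the `resumptionToken` argument; everything else here is an assumption for illustration: `harvest` is a hypothetical helper, and `fake_fetch` with its `FAKE_PAGES` table stands in for the HTTP request and XML parsing a real harvester would perform.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# One response chunk: (identifiers, resumption token or None when complete).
Page = Tuple[List[str], Optional[str]]


def harvest(fetch: Callable[[Dict[str, str]], Page],
            metadata_prefix: str = "oai_dc") -> Iterator[str]:
    """Yield identifiers from all chunks of a ListIdentifiers harvest."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    while True:
        identifiers, token = fetch(params)
        yield from identifiers
        if not token:
            break  # no token: the list is complete
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListIdentifiers", "resumptionToken": token}


# Hypothetical two-chunk repository used instead of a network call.
FAKE_PAGES = {
    None: (["oai:example:1", "oai:example:2"], "page2"),
    "page2": (["oai:example:3"], None),
}


def fake_fetch(params: Dict[str, str]) -> Page:
    return FAKE_PAGES[params.get("resumptionToken")]


harvested = list(harvest(fake_fetch))
```

Passing the fetch function in as a parameter keeps the loop testable without a live repository.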

OAI – PMH verbs

Identify
  – description of an archive
ListMetadataFormats
  – retrieve available metadata formats from archive
ListSets
  – retrieve set structure of a repository
ListIdentifiers
  – abbreviated form of ListRecords, retrieving only headers
ListRecords
  – harvest records from a repository
GetRecord
  – retrieve individual metadata record from a repository

Overview of OAI - PMH

OAI – PMH request

Requests must be submitted using the GET or POST methods of HTTP

Repositories must support both methods
At least one key=value pair: verb=[RequestType]
Additional key=value pairs depend on request type
Example for GET request

– http://archive.org/oai?verb=ListRecords&metadataPrefix=oai_dc
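
Building such a GET request is a matter of URL-encoding the key=value pairs after the base URL. The helper `oai_request_url` below is a hypothetical name for this sketch; the six verbs are the ones listed on the verbs slide, and the base URL is the one from the example above.

```python
from urllib.parse import urlencode

VERBS = {"Identify", "ListMetadataFormats", "ListSets",
         "ListIdentifiers", "ListRecords", "GetRecord"}


def oai_request_url(base_url: str, verb: str, **arguments: str) -> str:
    """Return the GET URL for an OAI-PMH request; verb is always the first pair."""
    if verb not in VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    return base_url + "?" + urlencode([("verb", verb), *arguments.items()])


url = oai_request_url("http://archive.org/oai", "ListRecords",
                      metadataPrefix="oai_dc")
```

Since `from` is a reserved word in Python, a date-range argument has to be passed as `**{"from": "2002-01-06"}` rather than as a plain keyword argument.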

OAI – PMH response

Formatted as HTTP responses
Content type must be text/xml
HTTP compression optional in OAI-PMH
XML declaration (<?xml version="1.0" encoding="UTF-8" ?>)
Root element named OAI-PMH with three attributes (xmlns, xmlns:xsi, xsi:schemaLocation)
Three child elements
  – responseDate (UTC datetime)
  – request (copy of the request that generated the response)
  – a) error (in case of an error or exception condition)
  – b) element with the name of the OAI-PMH request

OAI – PMH example

http://edoc.hu-berlin.de/OAI-2.0?verb=ListIdentifiers&from=2002-01-06&until=2002-01-08&metadataPrefix=oai_dc&set=doctypes:dissertations

ListIdentifiers returns the record headers

Response to ListIdentifiers (1/2)

<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/
                             http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2002-10-22T17:49:49+01:00</responseDate>
  <request verb="ListIdentifiers" from="2002-01-03" until="2002-01-08"
           metadataPrefix="oai_dc" set="doctypes:dissertations">http://edoc.hu-berlin.de/OAI-2.0</request>
  <ListIdentifiers>

    ...... details in a moment

  </ListIdentifiers>
</OAI-PMH>

Response to ListIdentifiers (2/2)

<ListIdentifiers>
  <header>
    <identifier>oai:HUBerlin.de:3000819</identifier>
    <datestamp>2002-01-08</datestamp>
    <setSpec>doctypes</setSpec>
    <setSpec>doctypes:dissertations</setSpec>
    <setSpec>dnb</setSpec>
    <setSpec>dnb:dnb33</setSpec>
  </header>
  <header>
    <identifier>oai:HUBerlin.de:3000831</identifier>
    <datestamp>2002-01-07</datestamp>
    <setSpec>doctypes</setSpec>
    <setSpec>doctypes:dissertations</setSpec>
    <setSpec>dnb</setSpec>
    <setSpec>dnb:dnb27</setSpec>
  </header>
</ListIdentifiers>
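
A harvester would turn these headers into plain data before deciding which records to fetch. The sketch below parses a shortened copy of the response with Python's standard `xml.etree.ElementTree`; the function `parse_headers` and the dictionary layout are choices made for this illustration only.

```python
import xml.etree.ElementTree as ET

OAI = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Shortened copy of the ListIdentifiers response from the slides.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-10-22T17:49:49+01:00</responseDate>
  <request verb="ListIdentifiers">http://edoc.hu-berlin.de/OAI-2.0</request>
  <ListIdentifiers>
    <header>
      <identifier>oai:HUBerlin.de:3000819</identifier>
      <datestamp>2002-01-08</datestamp>
      <setSpec>doctypes:dissertations</setSpec>
      <setSpec>dnb:dnb33</setSpec>
    </header>
    <header>
      <identifier>oai:HUBerlin.de:3000831</identifier>
      <datestamp>2002-01-07</datestamp>
      <setSpec>doctypes:dissertations</setSpec>
      <setSpec>dnb:dnb27</setSpec>
    </header>
  </ListIdentifiers>
</OAI-PMH>
"""


def parse_headers(xml_bytes):
    """Return one dict per record header in a ListIdentifiers response."""
    root = ET.fromstring(xml_bytes)
    headers = []
    for header in root.iterfind("oai:ListIdentifiers/oai:header", OAI):
        headers.append({
            "identifier": header.findtext("oai:identifier", namespaces=OAI),
            "datestamp": header.findtext("oai:datestamp", namespaces=OAI),
            "sets": [s.text for s in header.iterfind("oai:setSpec", OAI)],
        })
    return headers


# Encode to bytes so the parser honours the declared UTF-8 encoding.
headers = parse_headers(SAMPLE.encode("utf-8"))
```

Each identifier obtained this way could then be passed to a GetRecord request to retrieve the full metadata record.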

Where are we

Digital Libraries
  – Discovery of information
    • Describing Information
      – Metadata
        • MARC
        • Dublin Core
        • MODS
        • METS
        • TEI
        • EAD
        • ......
      – Knowledge Representation
        • FRBR
        • RDF
    • Interoperability
      – Queries
        • Z39.50 queries
        • Common Command Language (CCL – ISO 8777 or Z39.58)
      – Protocols
        • Z39.50
        • SRU/SRW
        • OAI-PMH