metadata april 8 2013

Page 1: Metadata april 8 2013

Open Archives Initiative - Protocol for Metadata Harvesting

April 8, 2013

Richard Sapon-White

Page 2: Metadata april 8 2013

Overview

- Definitions
- History
- The OAI Model
- Protocol for Metadata Harvesting

Page 3: Metadata april 8 2013

Definitions

Harvester - client application issuing OAI-PMH requests

Harvesting - the gathering together of metadata from a number of distributed repositories into a combined data store

Archives - synonym for a repository of scholarly papers

Protocol - a set of rules defining communication between systems (such as ftp or http)

Page 4: Metadata april 8 2013

History of the OAI

- E-print servers = archives or repositories
- E-print servers provide access to scientific and technical papers and scholarly journal articles
- Authors deposit pre-prints or published articles in these repositories
- Concept: public, free access to scholarly information without paid subscription to journals

Page 5: Metadata april 8 2013

History of the OAI (cont.)

- Why?
  - Scholarly research belongs to people
  - Speeds the sharing of research
  - Better for authors and readers
- Known as the “open archives movement”
- Has nothing to do with physical archives (repositories of institutional history or collections of unpublished materials)

Page 6: Metadata april 8 2013

History of the OAI (cont.)

- Many e-print servers grew
  - Overlapping disciplinary coverage
  - Overlapping geographic coverage
- Developing need to:
  - search multiple repositories simultaneously (= federated searching)
  - automatically identify and copy papers from other repositories (= repository synchronization)

Page 7: Metadata april 8 2013

History of the OAI (cont.)

Meeting of experts, 1999, Santa Fe, New Mexico, USA

Defined an interface so that repositories could expose metadata for papers they held

Metadata could then be discovered by federated search services and other repositories and copied

Known as the Santa Fe Convention (later developed into PMH, the Protocol for Metadata Harvesting)

Page 8: Metadata april 8 2013

The Open Archives Model

- Similar concept to a union catalog
- Metadata “harvested” and stored in a central repository
- “Pull” rather than “push” model
- Collecting is similar to an Internet spider collecting HTML content

Page 9: Metadata april 8 2013

PMH and Z39.50

- Differs from Z39.50 (specifically rejected at Santa Fe)
- Z39.50:
  - Allows a client to search a remote information server across a network
  - Difficult to perform high-quality federated searches across many servers: would need to deal with each server individually
  - Complex protocol

Page 10: Metadata april 8 2013

PMH and Z39.50 (cont.)

- PMH is a simple protocol
- User interacts with a database of harvested metadata, not with individual repositories
- Database is constructed by the federated search service using PMH
- Therefore, performance depends only on the federated search service, not the individual repositories

Page 11: Metadata april 8 2013

Metadata Harvesting Protocol

- Queries and responses carried over HTTP
- Harvester application can request a single metadata record or a group of records to be exported
- Application can restrict records by date to gather only new records (since the previous harvest)
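
As a rough illustration of the requests described on this slide, the sketch below builds two OAI-PMH request URLs: one for a single record and one for records changed since a given date. The verbs (GetRecord, ListRecords) and arguments (identifier, metadataPrefix, from) come from the OAI-PMH specification; the base URL, the record identifier, the date, and the helper name build_request are assumptions made up for this example.

```python
from urllib.parse import urlencode

# Hypothetical repository endpoint; substitute a real OAI-PMH base URL.
BASE_URL = "https://repository.example.org/oai"


def build_request(verb, **params):
    """Return an OAI-PMH request URL for the given verb and arguments."""
    # "from" is a Python keyword, so callers pass it as from_.
    if "from_" in params:
        params["from"] = params.pop("from_")
    query = {"verb": verb}
    query.update(params)
    return BASE_URL + "?" + urlencode(query)


# Request a single record by its (made-up) identifier.
single = build_request(
    "GetRecord",
    identifier="oai:repository.example.org:1234",
    metadataPrefix="oai_dc",
)

# Request only records added or changed since the previous harvest.
incremental = build_request(
    "ListRecords",
    metadataPrefix="oai_dc",
    from_="2013-04-01",
)

print(single)
print(incremental)
```

A harvester would fetch these URLs over HTTP and store the date of each run, so that the next run can pass that date as the from argument.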

Page 12: Metadata april 8 2013

Metadata Harvesting Protocol (cont.)

- OAI-compliant data providers are capable of responding to such requests
- Data provider must be able to export metadata in at least DC (unqualified) using XML communication syntax
- Data provider includes URI with metadata
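
To make the XML export concrete, here is a minimal sketch of reading one unqualified DC record as a harvester might receive it. The XML namespaces are the ones used by OAI-PMH and Dublin Core; the record itself (identifier, title, creator, URI) is invented for illustration, and parse_record is a hypothetical helper, not part of any library.

```python
import xml.etree.ElementTree as ET

# Made-up record for illustration; not taken from a real repository.
SAMPLE = """\
<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:repository.example.org:1234</identifier>
    <datestamp>2013-04-08</datestamp>
  </header>
  <metadata>
    <oai_dc:dc
        xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
        xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>An Example Pre-print</dc:title>
      <dc:creator>Doe, Jane</dc:creator>
      <dc:identifier>https://repository.example.org/papers/1234</dc:identifier>
    </oai_dc:dc>
  </metadata>
</record>
"""

# Prefixes used in the search paths below.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def parse_record(xml_text):
    """Extract a few fields from one OAI-PMH record in oai_dc format."""
    record = ET.fromstring(xml_text)
    return {
        "oai_identifier": record.findtext("oai:header/oai:identifier", namespaces=NS),
        "title": record.findtext(".//dc:title", namespaces=NS),
        "creator": record.findtext(".//dc:creator", namespaces=NS),
        # The URI that leads back to the paper in the source repository.
        "uri": record.findtext(".//dc:identifier", namespaces=NS),
    }


parsed = parse_record(SAMPLE)
print(parsed)
```

The harvester keeps only the metadata; the URI in dc:identifier is what sends a user back to the paper held by the data provider.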

Page 13: Metadata april 8 2013

Metadata Harvesting Protocol (cont.)

- Servers can also provide metadata in schemes besides DC
- Harvester applications can request metadata in schemes besides DC
- Harvester applications can also query a metadata repository for:
  - List of metadata formats supported by the repository
  - List of record sets supported by the repository
  - List of the identifiers of all records within the repository

Page 14: Metadata april 8 2013

Why the OAI-PMH is important

- Provides for a minimal level of interoperability
- Drives development of community-specific metadata schemes
- Potential for new modes of scholarly communication
- Dependent on widespread implementation by research organizations, publishers, and “memory organizations” (i.e., libraries, museums, archives)

Page 15: Metadata april 8 2013

QUIZ!!!

http://www.oaforum.org/tutorial/english/page1.htm#section5

Page 16: Metadata april 8 2013

Problems with Metadata Harvesting

- Loss of data when mapping to unqualified DC
- Incorrect data from improper mapping
- Inconsistent punctuation and formatting because of diverse sources of metadata
- High variance in data between institutions
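
The first problem can be shown with a small sketch. Qualified Dublin Core distinguishes, for example, the date a paper was created from the date it was issued; when both are mapped down to the unqualified date element, that distinction is gone. The mapping table below covers only a few terms, and the record values and the function name to_unqualified are invented for illustration.

```python
# A few qualified Dublin Core terms and the unqualified element each falls back to.
DUMB_DOWN = {
    "dcterms:created": "dc:date",
    "dcterms:issued": "dc:date",
    "dcterms:abstract": "dc:description",
    "dcterms:alternative": "dc:title",
}


def to_unqualified(record):
    """Map (term, value) pairs to unqualified DC, grouping values by element."""
    simple = {}
    for term, value in record:
        # Terms that are already unqualified pass through unchanged.
        element = DUMB_DOWN.get(term, term)
        simple.setdefault(element, []).append(value)
    return simple


# Made-up record with two different kinds of date and two kinds of title.
qualified = [
    ("dc:title", "An Example Pre-print"),
    ("dcterms:alternative", "Example"),
    ("dcterms:created", "2012-11-02"),
    ("dcterms:issued", "2013-04-08"),
]

simple = to_unqualified(qualified)
print(simple["dc:date"])   # ['2012-11-02', '2013-04-08']
print(simple["dc:title"])  # ['An Example Pre-print', 'Example']
```

After the mapping, a harvester sees two dates and two titles but can no longer tell which date is the creation date or which title is the main one.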

Page 17: Metadata april 8 2013

Metasearching

- Many systems = many metadata standards
- Convert to single system (harvesting)?
- Maintain individual element sets BUT create an interface to search simultaneously across heterogeneous databases
- Voila: Metasearching!
- Not a single method

Page 18: Metadata april 8 2013

Definition

From NISO MetaSearch Initiative: “search and retrieval to span multiple databases, sources, platforms, protocols, and vendors at one time.”

Best known: Z39.50 protocol. Used to search remote library catalogs.

Page 19: Metadata april 8 2013

Z39.50

Allows computers to communicate to retrieve information – between client and server

Searches and results are restricted to Z39.50 databases

Page 20: Metadata april 8 2013

Z39.50 results

- Server may interpret the query incorrectly
- Some automatically add Boolean “and” while others add Boolean “or”
- Vocabulary issues: different vocabulary in different databases
- Display results in order retrieved, by database found, by date, by relevance
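
The effect of the default Boolean operator can be seen in a toy example. This is not Z39.50 code; it is a plain keyword match over three invented titles, meant only to show how the same two-word query returns different result sets depending on whether a server joins the words with “and” or “or”.

```python
# Toy collection; titles are invented for illustration.
RECORDS = {
    1: "metadata harvesting protocol",
    2: "metadata standards for museums",
    3: "harvesting wheat in Oregon",
}


def search(query, default_operator):
    """Return ids of records matching the query under the given default operator."""
    terms = query.lower().split()
    # "and" requires every term to match; "or" requires at least one.
    combine = all if default_operator == "and" else any
    return [
        record_id
        for record_id, title in RECORDS.items()
        if combine(term in title.lower().split() for term in terms)
    ]


and_hits = search("metadata harvesting", "and")
or_hits = search("metadata harvesting", "or")
print(and_hits)  # [1]
print(or_hits)   # [1, 2, 3]
```

With “or”, the record about harvesting wheat is returned for a metadata query, which is the kind of irrelevant hit a searcher sees when servers interpret the same query differently.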

Page 21: Metadata april 8 2013

Problems with Z39.50

- High recall, little precision
- Also present in Google Search: few studies on user satisfaction
- Results may display in an irrelevant order for the searcher

Page 22: Metadata april 8 2013

Metasearching: pros and cons

- Single database searching allows users to use specialized indexing or controlled vocabulary
- Single portal:
  - No need for searcher to select a particular database from a list of databases

Page 23: Metadata april 8 2013

Case Studies

- Divide into 3-4 groups
- Read the case study
- Discuss and report:
  - Describe the case briefly (2 min.)
  - What can we learn from this case study? (3 min.)
