A Digital Library Repository Utilizing the Open Archives Initiative Developed to meet the needs of UTK Library Special Collections

Upload: audrey-fox

Post on 23-Dec-2015

Page 1

A Digital Library Repository Utilizing the Open Archives Initiative

Developed to meet the needs of UTK Library Special Collections

Page 2

The Problem:

Tremendous quantities of valuable information exist in museums, libraries, and research centers but are not available in a standardized format via centralized search engines:

Photos and videos
Scientific records
Mathematical findings
Musical scores and sound tracks
Historical documents
Theses and dissertations

How can the connection be made?

Page 3

The Open Archives Solution:

Translation of records into a common format and language: XML and Unqualified Dublin Core

Storage of these translations

Response to a standardized set of queries

Gather document descriptions from repositories into large databases, using OAI harvesters.

Set up search engines to offer up the information in these databases.

Page 4

Required For Translation:

Understanding of XML and XML schemas

Determining the correct mapping of information to Unqualified Dublin Core elements, in order to translate legacy files into a metadata format supported by the Open Archives Initiative

Scripts to reduce the labor of translation

(Graphic: photos and videos, scientific records, mathematical findings, musical scores and sound tracks, historical documents, theses and dissertations)

Page 5

A Common Language: Dublin Core

The 15 elements of Unqualified Dublin Core:

Content: Title, Description, Coverage, Relation, Source, Subject, Type

Intellectual Property: Contributor, Creator, Publisher, Rights

Instantiation: Date, Format, Identifier, Language

Page 6

A Common Framework: XML schemas

The XML schema constrains each element of the document, providing rules and a framework for parsing. An excerpt:

…
<complexType name="dublincoreType">
  <choice minOccurs="0" maxOccurs="unbounded">
    <element name="subject" minOccurs="0" maxOccurs="unbounded" type="string"/>
    …
  </choice>
</complexType>
</schema>
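As a rough illustration of what the schema's unbounded choice permits, the sketch below checks that every child of a record is one of the 15 Unqualified Dublin Core elements, in any order and any number of times. It is not a real schema validator; the wrapper element name `dc`, the function name `unknown_elements`, and the sample record are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# The 15 Unqualified Dublin Core element names.
DC_ELEMENTS = {
    "title", "description", "coverage", "relation", "source", "subject", "type",
    "contributor", "creator", "publisher", "rights",
    "date", "format", "identifier", "language",
}


def unknown_elements(record_xml: str) -> list[str]:
    """Return child element names that are not Unqualified Dublin Core.

    Mirrors the schema's unbounded <choice>: any of the 15 elements may
    appear any number of times, in any order, and all are optional.
    """
    root = ET.fromstring(record_xml)
    # Strip a namespace prefix such as "{http://...}subject" if present.
    names = [child.tag.rsplit("}", 1)[-1] for child in root]
    return [name for name in names if name not in DC_ELEMENTS]


# Hypothetical record with one element that is not Dublin Core.
sample = (
    "<dc><subject>Letters</subject><subject>Tennessee</subject>"
    "<keyword>x</keyword></dc>"
)
print(unknown_elements(sample))
```

A full validator would also check namespaces and content types against the schema; this sketch only checks element names.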

Page 7

A Common Format: XML

From a TEI Lite SGML file segment:

<PROFILEDESC>
  <TEXTCLASS>
    <KEYWORDS SCHEME="LCSH">
      <LIST>
        <ITEM>Letters</ITEM>
        <ITEM>Cherokee Indians--Claims against</ITEM>
        <ITEM>Tennessee</ITEM>
      </LIST>
    </KEYWORDS>
  </TEXTCLASS>
</PROFILEDESC>
</TEIHEADER>

To an Unqualified Dublin Core XML file segment:

<subject>Letters</subject>
<subject>Cherokee Indians--Claims against</subject>
<subject>Tennessee</subject>

Page 8

Selected Portions of a TEI-Lite SGML record

<TEIHEADER>
<FILEDESC>
<TITLESTMT><TITLE>[Letter] July 8, 1839, Washington City DC, [to] HP King, Qualla Town / William Holland Thomas: a machine-readable transcription of an image</TITLE>
…
<AUTHOR>Thomas, William Holland</AUTHOR>
…
<PUBLISHER>The University of Tennessee Libraries</PUBLISHER>
<IDNO>wt025</IDNO>
…
<AVAILABILITY><P>This work is the property of the Special Collections Library, University of Tennessee, Knoxville, TN. It may be used freely by individuals for research, teaching, and personal use as long as this statement of availability is included in the text.</P></AVAILABILITY></PUBLICATIONSTMT>
<SOURCEDESC><BIBL>…
<DATE VALUE="1839-07-08">July 8, 1839</DATE>
…
<NOTE TYPE="summary">This document is a letter dated July 8, 1839 to H.P. King from William Holland Thomas with instructions for running the Indian Store.</NOTE>
…
<PROFILEDESC> <TEXTCLASS> <KEYWORDS SCHEME="LCSH"><LIST> <ITEM>Cherokee Indians</ITEM> <ITEM>Government relations</ITEM> </LIST> </KEYWORDS></TEXTCLASS></PROFILEDESC>
…
<TEXT><BODY><DIV1 TYPE="letter">…

Page 9

… Translated to XML Unqualified Dublin Core

<title>[Letter] July 8, 1839, Washington City DC, [to] HP King, Qualla Town</title>
<contributor>The University of Tennessee Libraries, Knoxville</contributor>
<contributor>Southeastern Native American Documents Collection (GALILEO (Georgia statewide project)) GAGAL</contributor>
<creator>Thomas, William Holland</creator>
<publisher>The University of Tennessee Libraries</publisher>
<date>July 8, 1839</date>
<description>This document is a letter dated July 8, 1839 to H.P. King from William Holland Thomas with instructions for running the Indian Store.</description>
<identifier>Document ID: wt025</identifier>
<identifier>http://www.helios.dii.utk.edu/oai/sgm/00178.html</identifier>
<subject>Cherokee Indians</subject>
<subject>Government relations</subject>
<rights>This work is the property of the Special Collections Library, University of Tennessee, Knoxville, TN. It may be used freely by individuals for research, teaching, and personal use as long as this statement of availability is included in the text.</rights>
<type>letter</type>
<type>computer file</type>
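The translation scripts themselves are not shown in the slides. The sketch below is one hypothetical way such a script might map a few simple TEI Lite header tags to Dublin Core elements. The crosswalk table `TEI_TO_DC` is inferred from the two sample records, covers only tags that copy across unchanged, and the function name `tei_to_dc` is an assumption.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical crosswalk inferred from the two sample records; the
# author's actual scripts are not shown in the slides.
TEI_TO_DC = {
    "AUTHOR": "creator",
    "PUBLISHER": "publisher",
    "DATE": "date",
    "ITEM": "subject",
}


def tei_to_dc(tei_header: str) -> list[str]:
    """Translate simple TEI Lite header tags into Dublin Core lines."""
    lines = []
    for tei_tag, dc_tag in TEI_TO_DC.items():
        # Match <TAG ...>text</TAG>, ignoring any attributes on the open tag.
        pattern = rf"<{tei_tag}(?:\s[^>]*)?>(.*?)</{tei_tag}>"
        for text in re.findall(pattern, tei_header, flags=re.DOTALL):
            # Escape &, < and > so the output line is well-formed XML.
            lines.append(f"<{dc_tag}>{escape(text.strip())}</{dc_tag}>")
    return lines


sample = (
    "<AUTHOR>Thomas, William Holland</AUTHOR>"
    '<DATE VALUE="1839-07-08">July 8, 1839</DATE>'
    "<ITEM>Cherokee Indians</ITEM><ITEM>Government relations</ITEM>"
)
for line in tei_to_dc(sample):
    print(line)
```

Fields that are reworded during translation, such as the shortened title or the "Document ID:" prefix on the identifier, would need their own rules and are left out here.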

Page 10

Translation Tools:

Crosswalks available:

MARC to DC: http://www.loc.gov/marc/dccross.html

Shown in action at: http://alcme.oclc.org/marc2dc/index.html

Others:

http://www.sinica.edu.tw/~metadata/tool/mapping-foreign.html

http://www.lub.lu.se/tk/metadata/MDin9612.html

http://www.getty.edu/research/institute/standards/intrometadata/3_crosswalks/index.html

Page 12

Storage of OAI Records

MySQL: small, fast, and free: http://www.mysql.com

Use scripts to load the database and retrieve information:

mysql> create table gsm(
    ->   id char(10) not null,
    ->   primary key (id),
    ->   date char(10),
    ->   path char(80),
    ->   listit text);

$sth = $dbh->prepare("select listit from $set where date <= '$until' and date >= '$from' order by id");

Store entire records, already marked up in Unqualified Dublin Core, for quick response, or

store fields untagged, with multiple values for a field separated by tags, and retag upon request, for flexibility. This structure allows a record to be entered once and retrieved in various formats upon request.

For local search engines, also store hardcoded XML files in a directory.
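The first storage option, whole pre-tagged records retrieved by date range, can be sketched as follows. This is a minimal illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL, binds the dates as query parameters, and checks the set name against a hypothetical whitelist `KNOWN_SETS` because a table name cannot be bound as a parameter. The sample rows are invented for the example.

```python
import sqlite3

# Hypothetical whitelist: set names double as table names, so they are
# checked against a fixed list instead of being interpolated blindly.
KNOWN_SETS = {"gsm", "che", "etd"}


def create_set_table(conn: sqlite3.Connection, set_name: str) -> None:
    """Create one table per set, with the columns shown on the slide."""
    if set_name not in KNOWN_SETS:
        raise ValueError(f"unknown set: {set_name}")
    conn.execute(
        f"CREATE TABLE {set_name} ("
        "id CHAR(10) NOT NULL PRIMARY KEY, "
        "date CHAR(10), path CHAR(80), listit TEXT)"
    )


def records_between(conn, set_name, from_date, until_date):
    """Return stored records whose date lies in [from_date, until_date]."""
    if set_name not in KNOWN_SETS:
        raise ValueError(f"unknown set: {set_name}")
    rows = conn.execute(
        f"SELECT listit FROM {set_name} "
        "WHERE date >= ? AND date <= ? ORDER BY id",
        (from_date, until_date),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
create_set_table(conn, "gsm")
# Invented sample rows; dates are stored as YYYY-MM-DD strings so that
# string comparison matches chronological order.
conn.executemany(
    "INSERT INTO gsm VALUES (?, ?, ?, ?)",
    [
        ("gsm0001", "1999-05-01", "gsm/0001.xml", "<title>First</title>"),
        ("gsm0002", "2003-01-15", "gsm/0002.xml", "<title>Second</title>"),
    ],
)
print(records_between(conn, "gsm", "1999-02-02", "2002-01-01"))
```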

Page 14

Response:

Offer up document descriptions via a standardized set of queries and responses: the Open Archives Initiative Protocol

1) 6 verbs, with 5 required and/or optional arguments

2) Unique identifiers, optional sets, and metadata prefixes

3) Flow control and resumption tokens

4) Error codes

Page 15

Verbs and arguments: The Open Archives Protocol

1) Identify

2) ListSets

3) ListMetadataFormats: optional: identifier

4) ListIdentifiers: required: metadata prefix (oai_dc); optional: from, until, set, resumption token

5) ListRecords: required: metadata prefix (oai_dc); optional: from, until, set, resumption token

6) GetRecord: required: identifier and metadata prefix
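The verb and argument table above can be expressed as data and used to check a request before it is sent or answered. The sketch below is illustrative only: it uses the protocol's argument spellings (`metadataPrefix`, `resumptionToken`), a hypothetical base URL, and an assumed function name `build_request`. It does not model the rule that a resumption token replaces the other arguments.

```python
from urllib.parse import urlencode

# verb -> (required arguments, optional arguments), following the slide.
LIST_OPTIONAL = {"from", "until", "set", "resumptionToken"}
VERBS = {
    "Identify": (set(), set()),
    "ListSets": (set(), set()),
    "ListMetadataFormats": (set(), {"identifier"}),
    "ListIdentifiers": ({"metadataPrefix"}, LIST_OPTIONAL),
    "ListRecords": ({"metadataPrefix"}, LIST_OPTIONAL),
    "GetRecord": ({"identifier", "metadataPrefix"}, set()),
}


def build_request(base_url: str, verb: str, args: dict[str, str]) -> str:
    """Validate the arguments for a verb and return the request URL."""
    if verb not in VERBS:
        raise ValueError("badVerb")
    required, optional = VERBS[verb]
    given = set(args)
    missing = required - given
    illegal = given - required - optional
    if missing or illegal:
        raise ValueError("badArgument")
    return base_url + "?" + urlencode({"verb": verb, **args})


url = build_request(
    "http://example.org/oai",  # hypothetical base URL
    "ListRecords",
    {"metadataPrefix": "oai_dc", "from": "1999-02-02", "set": "gsm"},
)
print(url)
```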

Page 16

Identifiers, Sets, and Metadata Prefixes

Current Sets:

Input as "Set"   Collection                           Sample Identifier
har              Bessie Harvey Collection             oai:tkn:har/har0001
che              Cherokee                             oai:tkn:che/che0003
civ              Civil War Collection                 oai:tkn:civ/civ0001
etd              Electronic Theses and Dissertations  oai:tkn:etd/etd0002
emn              Emancipator                          oai:tkn:emn/emn0001
ead              Encoded Archival Description         oai:tkn:ead/ead0003
gsm              Great Smoky Mountains                oai:tkn:gsm/gsm0045
ldr              Library Development Review           oai:tkn:ldr/ldr0002
rth              Roth Photography Collection          oai:tkn:rth/rth0034
tdh              Tennessee Documentary History        oai:tkn:tdh/tdh0005
vid              Videos                               oai:tkn:vid/vid0001

Supported Metadata prefix: oai_dc
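The sample identifiers appear to follow the pattern oai:<repository>:<set>/<local id>. Assuming that pattern holds for every record, a repository script could recover the set (and so the database table) from an identifier roughly as sketched below; the names `OaiIdentifier` and `parse_identifier` are hypothetical.

```python
from typing import NamedTuple


class OaiIdentifier(NamedTuple):
    repository: str  # "tkn" in the samples
    set_name: str    # e.g. "gsm"
    local_id: str    # e.g. "gsm0045"


def parse_identifier(identifier: str) -> OaiIdentifier:
    """Split an identifier of the form oai:<repository>:<set>/<local id>."""
    # Unpacking raises ValueError if a colon or the slash is missing.
    scheme, repository, rest = identifier.split(":", 2)
    if scheme != "oai":
        raise ValueError(f"not an OAI identifier: {identifier}")
    set_name, local_id = rest.split("/", 1)
    return OaiIdentifier(repository, set_name, local_id)


print(parse_identifier("oai:tkn:gsm/gsm0045"))
```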

Page 17

Flow Control and Resumption Tokens

For ListIdentifiers, ListSets, and ListRecords

<resumptionToken>LRrtdc20f19990202u20020101</resumptionToken>

LR or LI: ListRecords or ListIdentifiers

rt: number or letter combination indicating which set comes next

dc: metadata format

20: which record number to start with this time

f19990202: from date 1999-02-02

u20020101: until date 2002-01-01

The token specifies the call to make to the database when this resumption token is returned.
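Because the token encodes the whole follow-up query, the repository can decode it without keeping any session state. The sketch below decodes the sample token under assumptions drawn from that single example: a two-character set code, the literal format code "dc", and eight-digit dates. The real token grammar may differ, and the function name `parse_resumption_token` is hypothetical.

```python
import re

# Layout inferred from the single sample token; the field widths
# (two-character set code, eight-digit dates) are assumptions.
TOKEN_RE = re.compile(
    r"^(?P<verb>LR|LI)(?P<set_code>[A-Za-z0-9]{2})(?P<fmt>dc)"
    r"(?P<start>\d+)f(?P<from_>\d{8})u(?P<until>\d{8})$"
)
VERB_NAMES = {"LR": "ListRecords", "LI": "ListIdentifiers"}


def _iso(yyyymmdd: str) -> str:
    """Turn 19990202 into 1999-02-02."""
    return f"{yyyymmdd[:4]}-{yyyymmdd[4:6]}-{yyyymmdd[6:]}"


def parse_resumption_token(token: str) -> dict:
    """Decode a token into the parameters of the next database call."""
    match = TOKEN_RE.match(token)
    if match is None:
        raise ValueError("badResumptionToken")
    return {
        "verb": VERB_NAMES[match["verb"]],
        "set_code": match["set_code"],
        "metadata_format": match["fmt"],
        "start": int(match["start"]),
        "from": _iso(match["from_"]),
        "until": _iso(match["until"]),
    }


print(parse_resumption_token("LRrtdc20f19990202u20020101"))
```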

Page 18

Error Codes: version 2.0

badArgument

badResumptionToken

badVerb

cannotDisseminateFormat

idDoesNotExist

noMetadataFormats

noRecordsMatch

noSetHierarchy
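In a version 2.0 response, an error is reported as an error element whose code attribute carries one of these eight values. The sketch below builds only that element, not the full response envelope; the function name `error_element` and the message text are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# The eight error codes listed on the slide.
ERROR_CODES = {
    "badArgument", "badResumptionToken", "badVerb", "cannotDisseminateFormat",
    "idDoesNotExist", "noRecordsMatch", "noMetadataFormats", "noSetHierarchy",
}


def error_element(code: str, message: str) -> str:
    """Serialize one <error> element for an OAI-PMH 2.0 response."""
    if code not in ERROR_CODES:
        raise ValueError(f"unknown error code: {code}")
    element = ET.Element("error", {"code": code})
    element.text = message
    return ET.tostring(element, encoding="unicode")


print(error_element("badVerb", "Illegal or missing verb"))
```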

Page 19

OAI 1.1 Test Interface and Local Search Engine: http://oai.sunsite.utk.edu/1.1.html

Search by:

Word or phrase

Searching by all or any field and set

Sorting by date or set

Returning:

Lists of identifiers or short file descriptions, each with links to the full file in HTML, XML, and the online document

(Graphic: scientific records, mathematical findings, musical scores and sound tracks, historical documents, theses and dissertations, videos and photos)

Page 21

Crosswalks:

http://www.sinica.edu.tw/~metadata/tool/mapping-foreign.html

http://www.lub.lu.se/tk/metadata/MDin9612.html

http://www.getty.edu/research/institute/standards/intrometadata/3_crosswalks/index.html

More information: www.openarchives.org

Pre-developed repositories, harvesters, search engines, and more: http://www.openarchives.org/tools/tools.html

Current service providers, who can offer searches of your records from your repository responses: http://www.openarchives.org/service/listproviders.html