A Library Science Perspective on Digitization
![Page 1: A Library Science Perspective on Digitization](https://reader035.vdocument.in/reader035/viewer/2022062218/568160fd550346895dd03a56/html5/thumbnails/1.jpg)
A Library Science Perspective on Digitization
Bryan Heidorn, University of Arizona
![Page 2: A Library Science Perspective on Digitization](https://reader035.vdocument.in/reader035/viewer/2022062218/568160fd550346895dd03a56/html5/thumbnails/2.jpg)
Library-Museum Parallels
• Intellectual Property Rights
• Physical/Digital Objects Sharing
• Descriptive Metadata Formats
• Preservation Metadata
• Transport Metadata Formats
• Communication Protocols (not so much)
• Similar Digitization Workflow
• OCR Challenges
![Page 3: A Library Science Perspective on Digitization](https://reader035.vdocument.in/reader035/viewer/2022062218/568160fd550346895dd03a56/html5/thumbnails/3.jpg)
Intellectual Property Rights
• Expanded to 75 yrs in US from 25
• Academic Publishing anomalies
• Attribution required (data not so much)
• Decoupling of Data from Text
![Page 4: A Library Science Perspective on Digitization](https://reader035.vdocument.in/reader035/viewer/2022062218/568160fd550346895dd03a56/html5/thumbnails/4.jpg)
Online Computer Library Center (OCLC)
• Collaborative Automation of libraries, including copy cataloging
• Started 1967
• Catalog 271 million items/year
• 72,000 libraries in 170 countries and territories use OCLC services to locate, acquire, catalog, lend and preserve library materials.
![Page 5: A Library Science Perspective on Digitization](https://reader035.vdocument.in/reader035/viewer/2022062218/568160fd550346895dd03a56/html5/thumbnails/5.jpg)
Descriptive Metadata Formats
• MARC(XML) 21 Standard
• METS
• Dublin Core (Interchange Format only)
![Page 6: A Library Science Perspective on Digitization](https://reader035.vdocument.in/reader035/viewer/2022062218/568160fd550346895dd03a56/html5/thumbnails/6.jpg)
Biodiversity Heritage Library Workflow
Courtesy: Martin Kalfatovic, Program Director, Biodiversity Heritage Library, Smithsonian Institution Libraries
![Page 7: A Library Science Perspective on Digitization](https://reader035.vdocument.in/reader035/viewer/2022062218/568160fd550346895dd03a56/html5/thumbnails/7.jpg)
![Page 8: A Library Science Perspective on Digitization](https://reader035.vdocument.in/reader035/viewer/2022062218/568160fd550346895dd03a56/html5/thumbnails/8.jpg)
MARC 21 Standard
• Formats: Bibliographic, Authority, Holdings, Classification, Community Information
• Bibliographic Material Types:
  – Books (BK)
  – Continuing resources (CR)
  – Computer files (CF)
  – Maps (MP)
  – Music (MU)
  – Visual materials (VM)
  – Mixed materials (MX)
http://www.loc.gov/marc/
![Page 9: A Library Science Perspective on Digitization](https://reader035.vdocument.in/reader035/viewer/2022062218/568160fd550346895dd03a56/html5/thumbnails/9.jpg)
MARC Fields
• 00X: Control Fields
• 01X-09X: Numbers and Code Fields
• Heading Fields - General Information
• 1XX: Main Entry Fields
• 20X-24X: Title and Title-Related Fields
• 25X-28X: Edition, Imprint, Etc. Fields
• 3XX: Physical Description, Etc. Fields
• 4XX: Series Statement Fields
• 5XX: Note Fields
• 6XX: Subject Access Fields
• 70X-75X: Added Entry Fields
• 76X-78X: Linking Entry Fields
• 80X-83X: Series Added Entry Fields
• 841-88X: Holdings, Location, Alternate Graphics, Etc. Fields
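The tag ranges above can be turned into a simple lookup. The following is a minimal sketch, not part of the original slides: the function name `marc_field_group` and the table `MARC_FIELD_GROUPS` are hypothetical, and the numeric boundaries are an interpretation of the "X" wildcards in the list (for example, 20X-24X is read as tags 200 through 249).

```python
from typing import Optional

# (low, high, label) ranges transcribed from the slide's field list.
# Reading "X" as "any digit" is an assumption of this sketch.
MARC_FIELD_GROUPS = [
    (1, 9, "Control Fields"),
    (10, 99, "Numbers and Code Fields"),
    (100, 199, "Main Entry Fields"),
    (200, 249, "Title and Title-Related Fields"),
    (250, 289, "Edition, Imprint, Etc. Fields"),
    (300, 399, "Physical Description, Etc. Fields"),
    (400, 499, "Series Statement Fields"),
    (500, 599, "Note Fields"),
    (600, 699, "Subject Access Fields"),
    (700, 759, "Added Entry Fields"),
    (760, 789, "Linking Entry Fields"),
    (800, 839, "Series Added Entry Fields"),
    (841, 889, "Holdings, Location, Alternate Graphics, Etc. Fields"),
]


def marc_field_group(tag: str) -> Optional[str]:
    """Return the field-group label for a three-digit MARC tag, or None."""
    if len(tag) != 3 or not (tag.isascii() and tag.isdigit()):
        raise ValueError(f"expected a three-digit tag, got {tag!r}")
    number = int(tag)
    for low, high, label in MARC_FIELD_GROUPS:
        if low <= number <= high:
            return label
    return None  # tag falls outside every range listed on the slide


if __name__ == "__main__":
    for tag in ("005", "245", "650"):
        print(tag, marc_field_group(tag))
```

A table of ranges keeps the mapping readable next to the slide; tags that fall in the gaps between ranges (such as 840) simply return `None`.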
![Page 10: A Library Science Perspective on Digitization](https://reader035.vdocument.in/reader035/viewer/2022062218/568160fd550346895dd03a56/html5/thumbnails/10.jpg)
MARC Book Example
Leader/00-23 *****nam##22*****#a#4500
001 <control number>
003 <control number identifier>
005 19920331092212.7
007/00-01 ta
008/00-39 820305s1991####nyu###########001#0#eng##
020 ##$a0845348116 :$c$29.95 (£19.50 U.K.)
020 ##$a0845348205 (pbk.)
040 ##$a[organization code]$c[organization code]
050 14$aPN1992.8.S4$bT47 1991
082 04$a791.45/75/0973$219
100 1#$aTerrace, Vincent,$d1948-
245 10$aFifty years of television :$ba guide to series and pilots, 1937-1988 /$cVincent Terrace.
246 1#$a50 years of television
260 ##$aNew York :$bCornwall Books,$cc1991.
300 ##$a864 p. ;$c24 cm.
500 ##$aIncludes index.
650 #0$aTelevision pilot programs$zUnited States$vCatalogs.
650 #0$aTelevision serials$zUnited States$vCatalogs.
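To make the display notation in the example concrete, here is a rough sketch that splits one data-field line into its tag, two indicators, and subfields. It assumes the layout used in the example (tag, a space, two indicator characters with `#` for blank, then `$`-prefixed subfields). The names `DataField` and `parse_data_field` are hypothetical, and this works on the human-readable display form only, not on binary MARC records.

```python
from typing import List, NamedTuple, Tuple


class DataField(NamedTuple):
    tag: str
    indicators: str                  # two characters; "#" stands for blank
    subfields: List[Tuple[str, str]]  # (subfield code, value) pairs


def parse_data_field(line: str) -> DataField:
    """Parse one display-form data field such as '245 10$aTitle /$cAuthor.'

    Limitation: a literal dollar sign inside a value (as in the 020 price
    '$c$29.95') cannot be told apart from a subfield delimiter here.
    """
    tag, rest = line[:3], line[4:]
    if len(tag) != 3 or not tag.isdigit() or tag < "010":
        # Control fields (00X) carry no indicators or subfields.
        raise ValueError(f"not a data field: {line!r}")
    indicators, body = rest[:2], rest[2:]
    # Text before the first "$" is empty, so skip the first chunk.
    subfields = [(chunk[0], chunk[1:]) for chunk in body.split("$")[1:] if chunk]
    return DataField(tag, indicators, subfields)


if __name__ == "__main__":
    field = parse_data_field(
        "245 10$aFifty years of television :"
        "$ba guide to series and pilots, 1937-1988 /$cVincent Terrace."
    )
    print(field.tag, field.indicators)
    for code, value in field.subfields:
        print(f"  ${code} {value}")
```

The sketch deliberately rejects control fields rather than guessing at their layout, since in the example they hold a single fixed-format value.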
![Page 11: A Library Science Perspective on Digitization](https://reader035.vdocument.in/reader035/viewer/2022062218/568160fd550346895dd03a56/html5/thumbnails/11.jpg)
Difference between Museum and Library
• Full Darwin Core has parallels in MARC
• Many more commercial and custom products
• Larger installed base
• Library Entries somewhat more detailed
• There is a MARC(XML) and MARC Lite
• MARC differentiates among material types
![Page 12: A Library Science Perspective on Digitization](https://reader035.vdocument.in/reader035/viewer/2022062218/568160fd550346895dd03a56/html5/thumbnails/12.jpg)
Digital Content Transport
• METS – Metadata Encoding and Transmission Standard
• The METS schema is a standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library, expressed using the XML schema language.
![Page 13: A Library Science Perspective on Digitization](https://reader035.vdocument.in/reader035/viewer/2022062218/568160fd550346895dd03a56/html5/thumbnails/13.jpg)
Courtesy: Martin Kalfatovic, Program Director, Biodiversity Heritage Library, Smithsonian Institution Libraries
![Page 14: A Library Science Perspective on Digitization](https://reader035.vdocument.in/reader035/viewer/2022062218/568160fd550346895dd03a56/html5/thumbnails/14.jpg)
METS Components
• METS Header
• Descriptive Metadata
• Administrative Metadata
• File Section - The file section lists all files containing content which comprise the electronic versions of the digital object. <file> elements may be grouped within <fileGrp> elements, to provide for subdividing the files by object version.
• Structural Map
• Structural Links
• Behavior
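The seven components can be pictured as the top-level children of a METS document. The sketch below builds an empty skeleton with Python's standard `xml.etree.ElementTree`. It is illustrative only and is not validated against the METS schema: the `ID` and `USE` attribute values ("dmd1", "master", "ocr") and the helper names are made up for the example, and real documents need further required attributes and content.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def q(name: str) -> str:
    """Qualify a local element name with the METS namespace."""
    return f"{{{METS_NS}}}{name}"


def build_mets_skeleton() -> ET.Element:
    root = ET.Element(q("mets"))
    ET.SubElement(root, q("metsHdr"))                # METS Header
    ET.SubElement(root, q("dmdSec"), ID="dmd1")      # Descriptive Metadata
    ET.SubElement(root, q("amdSec"))                 # Administrative Metadata
    file_sec = ET.SubElement(root, q("fileSec"))     # File Section
    # <file> elements grouped in <fileGrp> elements, one group per version.
    for use in ("master", "ocr"):                    # hypothetical versions
        group = ET.SubElement(file_sec, q("fileGrp"), USE=use)
        ET.SubElement(group, q("file"), ID=f"{use}_0001")
    ET.SubElement(root, q("structMap"))              # Structural Map
    ET.SubElement(root, q("structLink"))             # Structural Links
    ET.SubElement(root, q("behaviorSec"))            # Behavior
    return root


if __name__ == "__main__":
    print(ET.tostring(build_mets_skeleton(), encoding="unicode"))
```

Grouping files by version in `fileGrp` is what lets a single object carry, say, page images and OCR text side by side.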
![Page 15: A Library Science Perspective on Digitization](https://reader035.vdocument.in/reader035/viewer/2022062218/568160fd550346895dd03a56/html5/thumbnails/15.jpg)
I/O
• Submission Information Package (SIP), which is sent from the information producer to the archive;
• the Archival Information Package (AIP), which is the information package actually stored by the archive; and
• the Dissemination Information Package (DIP), which is the information package transferred from the archive in response to a request by a consumer.
![Page 16: A Library Science Perspective on Digitization](https://reader035.vdocument.in/reader035/viewer/2022062218/568160fd550346895dd03a56/html5/thumbnails/16.jpg)
Courtesy: Martin Kalfatovic, Program Director, Biodiversity Heritage Library, Smithsonian Institution Libraries
![Page 17: A Library Science Perspective on Digitization](https://reader035.vdocument.in/reader035/viewer/2022062218/568160fd550346895dd03a56/html5/thumbnails/17.jpg)
Open Archives Initiative Protocol for Metadata Harvesting
• The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a low-barrier mechanism for repository interoperability. Data Providers are repositories that expose structured metadata via OAI-PMH. Service Providers then make OAI-PMH service requests to harvest that metadata. OAI-PMH is a set of six verbs or services that are invoked within HTTP.
![Page 18: A Library Science Perspective on Digitization](https://reader035.vdocument.in/reader035/viewer/2022062218/568160fd550346895dd03a56/html5/thumbnails/18.jpg)
OAI Verbs
• GetRecord
• Identify
• ListIdentifiers
• ListMetadataFormats
• ListRecords
• ListSets
![Page 19: A Library Science Perspective on Digitization](https://reader035.vdocument.in/reader035/viewer/2022062218/568160fd550346895dd03a56/html5/thumbnails/19.jpg)
GetRecord
• http://arXiv.org/oai2?verb=GetRecord&identifier=oai:arXiv.org:cs/0112017&metadataPrefix=oai_dc
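Because OAI-PMH requests are plain HTTP query strings, a request URL like the one above can be assembled with the standard library. This is a small sketch under stated assumptions: the helper name `oai_request_url` is hypothetical, it only builds the URL and performs no network access, and `:` and `/` are left unescaped so the output matches the form shown on the slide.

```python
from urllib.parse import urlencode

# The six OAI-PMH verbs.
OAI_VERBS = {
    "GetRecord",
    "Identify",
    "ListIdentifiers",
    "ListMetadataFormats",
    "ListRecords",
    "ListSets",
}


def oai_request_url(base_url: str, verb: str, **arguments: str) -> str:
    """Build an OAI-PMH request URL; argument names are passed through as-is."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb!r}")
    query = {"verb": verb, **arguments}
    # safe=':/' keeps the identifier readable, as in the slide's example.
    return f"{base_url}?{urlencode(query, safe=':/')}"


if __name__ == "__main__":
    print(
        oai_request_url(
            "http://arXiv.org/oai2",
            "GetRecord",
            identifier="oai:arXiv.org:cs/0112017",
            metadataPrefix="oai_dc",
        )
    )
```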
![Page 20: A Library Science Perspective on Digitization](https://reader035.vdocument.in/reader035/viewer/2022062218/568160fd550346895dd03a56/html5/thumbnails/20.jpg)
<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request verb="GetRecord" identifier="oai:arXiv.org:cs/0112017" metadataPrefix="oai_dc">http://arXiv.org/oai2</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:arXiv.org:cs/0112017</identifier>
        <datestamp>2001-12-14</datestamp>
        <setSpec>cs</setSpec>
        <setSpec>math</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/"
                   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                   xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:title>Using Structural Metadata to Localize Experience of Digital Content</dc:title>
          <dc:creator>Dushay, Naomi</dc:creator>
          <dc:subject>Digital Libraries</dc:subject>
          <dc:description>With the increasing technical sophistication of both information consumers and providers, there is increasing demand for more meaningful experiences of digital information. We present a framework that separates digital object experience, or rendering, from digital object storage and manipulation, so the rendering can be tailored to particular communities of users.</dc:description>
          <dc:description>Comment: 23 pages including 2 appendices, 8 figures</dc:description>
          <dc:date>2001-12-14</dc:date>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>
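A harvester would typically pull a few Dublin Core fields out of a GetRecord response like this one. The sketch below parses an abbreviated copy of the response with the standard library; the function name `summarize_record` and the choice of fields are assumptions made for illustration, and error responses are not handled.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as they appear in the response.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Abbreviated copy of the GetRecord response, kept inline so the
# example runs without network access.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord><record>
    <header>
      <identifier>oai:arXiv.org:cs/0112017</identifier>
      <datestamp>2001-12-14</datestamp>
    </header>
    <metadata>
      <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                 xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:title>Using Structural Metadata to Localize Experience of Digital Content</dc:title>
        <dc:creator>Dushay, Naomi</dc:creator>
        <dc:subject>Digital Libraries</dc:subject>
      </oai_dc:dc>
    </metadata>
  </record></GetRecord>
</OAI-PMH>
""".strip()


def summarize_record(xml_text: str) -> dict:
    """Extract the identifier, titles, and creators from a GetRecord response."""
    root = ET.fromstring(xml_text)
    record = root.find("oai:GetRecord/oai:record", NS)
    if record is None:
        raise ValueError("no <record> found in response")
    return {
        "identifier": record.findtext("oai:header/oai:identifier", namespaces=NS),
        "titles": [e.text for e in record.iterfind(".//dc:title", NS)],
        "creators": [e.text for e in record.iterfind(".//dc:creator", NS)],
    }


if __name__ == "__main__":
    print(summarize_record(SAMPLE_RESPONSE))
```

Titles and creators are returned as lists because Dublin Core elements are repeatable, as the two `dc:description` elements in the full response show.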
![Page 21: A Library Science Perspective on Digitization](https://reader035.vdocument.in/reader035/viewer/2022062218/568160fd550346895dd03a56/html5/thumbnails/21.jpg)
Metadata Collection and Workflow (Macaw)
![Page 22: A Library Science Perspective on Digitization](https://reader035.vdocument.in/reader035/viewer/2022062218/568160fd550346895dd03a56/html5/thumbnails/22.jpg)
Physical/Digital Objects Sharing
• Books both part of an Edition and Unique
• 20th century books have standard front matter
• LMS contained Metadata Only
• Journals indexed by article
• Most digital content is commercially owned and born digital
• 2011 author-publishing exceeded commercial
• Born analog digitization (Google Books and BHL)
![Page 23: A Library Science Perspective on Digitization](https://reader035.vdocument.in/reader035/viewer/2022062218/568160fd550346895dd03a56/html5/thumbnails/23.jpg)
Governance
• Libraries pay for OCLC
• OCLC is Participatory
• Close Collaboration with Library of Congress on Standards
• School System exists to train librarians
• Libraries are being cut in academic, public and school sectors