Optimized Metadata Repurposing in a Library Using MarcEdit
Sai Deng, Wichita State University Libraries
ALCTS Catalog Form and Function Interest Group Meeting

Post on 20-Dec-2015

Outline

Goldbarth poems repurposing from an in-house inventory to OCLC/Voyager
Challenges in metadata mapping and transformation
ETDs harvesting and repurposing from DSpace to OCLC/Voyager
Discussion on metadata repurposing and management

Goldbarth Poems in A Special Collection Inventory

Albert Goldbarth’s contributions to journals, including over 1800 poems.

Goldbarth Collection:
  A: Books authored by Goldbarth
  B: Blurbs by Goldbarth (reviews)
  C: Contributions to Journals (poems)
  L: Library (Goldbarth’s library collection)
  R: Research Books
  T: Textbooks and Anthologies
  X: Miscellaneous

(This in-house inventory structure was developed by Special Collections’ staff and is based on B. C. Bloomfield and Edward Mendelson’s schema for a bibliography of the poet W. H. Auden.)

Goldbarth Poems Metadata

Excel file, free style
  Fields: SC number; SC subnumber; vol/number/date; Journal title; Title of Goldbarth appearance (page number); Notes
  Data example: 5575 C 50; Volume 1 Number 1 Spring 1971; Ark River Review; Blocking a Street for Peace (4); published in Wichita

Created by Special Collections to track materials.
A local decision was made to add the poem entries to OCLC and Voyager.
How can these Excel data be reused instead of re-cataloging every field in OCLC?

MarcEdit Delimited Text Translator

Manipulating and Optimizing Metadata in Excel

Enriching data and batch editing
  Publication place (260$a), publisher name (260$b), publication year (260$c), dimensions (300$c), journal OCLC no. (773$w), additional notes (if needed)

Combine several fields into one to conform to the MARC field definition
  Example: =CONCATENATE(G2, " (", I2, ")")
  Result: Texas Observer (Austin, Tex.)

(Connect “journal title” in column G and “publication place” in column I for 773$t, host item title)

Manipulating and Optimizing Metadata in Excel

Extract data to form a new field
  Example: =RIGHT(F2, 4)
  Result: Vol. 76, No. 24, 1984 -> 1984

(Extract publication year “1984” from the original value “Vol. 76, No. 24, 1984” in column F, row 2, for 260$c, publication year)

Format data to conform to MARC punctuation
  Example: =CONCATENATE(I2, " :")
  Result: Austin, Tex. -> Austin, Tex. :

(Add “ :” after the publication place for MARC field 260$a, publication place)

Other batch editing
  Change title cases
  Make vol., no., places and other terms consistent…
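For readers who prepare the spreadsheet outside Excel, the three formulas above can be sketched in Python. This is only an illustration: the function names are hypothetical, and the column letters in the comments refer to the Excel examples on these slides.

```python
def host_title(journal: str, place: str) -> str:
    """Counterpart of =CONCATENATE(G2, " (", I2, ")") for 773$t."""
    return f"{journal} ({place})"


def pub_year(volume_info: str) -> str:
    """Counterpart of =RIGHT(F2, 4): the last four characters, after trimming."""
    return volume_info.strip()[-4:]


def pub_place(place: str) -> str:
    """Counterpart of =CONCATENATE(I2, " :"): add the punctuation for 260$a."""
    return f"{place} :"


print(host_title("Texas Observer", "Austin, Tex."))
print(pub_year("Vol. 76, No. 24, 1984"))
print(pub_place("Austin, Tex."))
```

Like the Excel formula, `pub_year` assumes the year is always the last four characters of the cell; rows that end differently would still need manual review.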

Metadata Mapping and Transformation from Excel to Marc

Mapping and transformation

Excel field    MARC field
Callno         099$a
Title          245$a
Pubplace       260$a
Pubname        260$b
Size           300$c
Page           300$a
Notes          500$a (or 590$a)
Vol            773$a
Journal        773$t

Transformation from Excel file to MARC text file: .xls -> .mrk
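The column-to-field mapping can be illustrated with a small Python sketch that turns one spreadsheet row into MarcEdit-style mnemonic (.mrk) lines. It is not how the Delimited Text Translator works internally; the `MAPPING` list, the indicator values and the `row_to_mrk` helper are assumptions for illustration, with Size placed in 300$c as in the sample record, and the row values are taken from that same sample record.

```python
# (column, MARC tag, indicators, subfield code); blank indicators are written
# as "\" in mnemonic format. Indicator choices here are assumptions.
MAPPING = [
    ("Callno", "099", "\\\\", "a"),
    ("Title", "245", "10", "a"),
    ("Pubplace", "260", "\\\\", "a"),
    ("Pubname", "260", "\\\\", "b"),
    ("Page", "300", "\\\\", "a"),
    ("Size", "300", "\\\\", "c"),
    ("Notes", "500", "\\\\", "a"),
    ("Journal", "773", "0\\", "t"),
    ("Vol", "773", "0\\", "a"),
]


def row_to_mrk(row: dict) -> list:
    """Build mnemonic lines, merging consecutive columns that share a tag."""
    fields = []  # each entry: [tag, indicators, subfield data]
    for column, tag, indicators, code in MAPPING:
        value = row.get(column, "").strip()
        if not value:
            continue  # empty cell: no subfield is written
        if fields and fields[-1][0] == tag:
            fields[-1][2] += f"${code}{value}"
        else:
            fields.append([tag, indicators, f"${code}{value}"])
    return [f"={tag}  {ind}{data}" for tag, ind, data in fields]


sample_row = {
    "Title": "Lullabye for skyler /",
    "Pubplace": "Clarksville, Tenn :",
    "Pubname": "Austin Peay State University, Center for the Creative Arts,",
    "Page": "P. 15 ;",
    "Size": "23 cm.",
    "Notes": "Albert Goldbarth Collection.",
    "Journal": "Zone 3 (Clarksville, Tenn)",
    "Vol": "Vol. 3, No. 2 (Spring 1988)",
}
lines = row_to_mrk(sample_row)
for line in lines:
    print(line)
```

The sketch shows why the spreadsheet is punctuated before translation: each cell is copied into its subfield verbatim, so the trailing " :", "," and " ;" must already be present in the data.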

Post-transformation Editing in MarcEditor

Add/delete field (if needed) Example: add general notes, 500 $a Albert Goldbarth Collection.

Edit indicator data and subfield data (if needed)

Edit 008 field (adjust year and place code)

A Transformed Marc Text Sample Record

=LDR  00738caa a2200229Ia 4500
=008  081124s1988\\\\tnu\\\\\\\\\\\000\0\eng\d
=099  \\$aSPEC$aCOLL$a5575$aC 1550
=100  1\$aGoldbarth, Albert.
=245  10$aLullabye for skyler /$cAlbert Goldbarth.
=260  \\$aClarksville, Tenn :$bAustin Peay State University, Center for the Creative Arts,$c1988.
=300  \\$aP. 15 ;$c23 cm.
=500  \\$aAlbert Goldbarth Collection.
=500  \\$aGoldbarth's poems printed on blue paper.
=730  0\$aAlbert Goldbarth Collection.
=773  0\$7nnas$tZone 3 (Clarksville, Tenn)$aVol. 3, No. 2 (Spring 1988)$w(OCoLC)13451008

Batch Export Marc to OCLC

The Special Collections call number needs to be deleted in OCLC after the records are exported to Voyager.

Case 1: Some Reflections

Greatly improves cataloging efficiency.

Some limitations:
  The 008 field cannot deal with data varieties and needs editing in MarcEditor.
  The data mapping interface doesn’t show the field name (only a field number such as field 1…), which makes mapping less intuitive.

Challenges in Metadata Mapping and Transfer

The following apply to both Goldbarth and ETD projects.

One-to-many and many-to-one mapping
  One field mapped to several subfields
  Several fields combined to form a new field

No data to be mapped to a particular field
  Enriching field data (before/after mapping)
  Adding indicators in mapping

Metadata integrity: data loss is not obvious, except for undefined punctuation…

Case-by-case analysis of homegrown metadata

Data maintenance (different departments, systems)

More Challenges in Metadata Mapping and Transfer

The following apply more to the ETDs DC-MARC mapping.

Specificity and granularity
  Keywords and controlled vocabulary have to be mapped to a more structured system
  Subfields and indicators need to be added
  DSpace 1.4 does not offer qualified DC harvesting

Left-out metadata
  Decision to discard some DC data

ETDs at WSU

Chart created by Institutional Repository Librarian Susan Matveyeva

ETDs in DSpace/SOAR, OCLC and Voyager

One digital copy deposited in SOAR (Shocker Open Access Repository, DSpace based)
  Metadata in Dublin Core format

Re-entered in OCLC and downloaded to Voyager
  Metadata in MARC format

ETD workflow dilemma
  Double keying in SOAR and OCLC

Metadata management: repurposing needed

ETD Workflow in Other Institutions

ETD workflow
  University of Virginia (1999), Texas A & M (2004)
    Home-grown scripts, site-specific harvesters
  Kent State University (2007)
    Harvest from OhioLINK ETD Center, ETD-MS to MARC…

XSLT transformation
  LC MARC 21 XML schema with MarcXML toolkit
    Dublin Core to MARCXML stylesheet
  OAI community developed tools, mostly for IT staff
  MarcEdit (Terry Reese)
    Metadata Harvester, MARC Editor
    Low-barrier harvester, can be used by catalogers

Sample Record in SOAR (Dublin Core)

DC Field                  Value
dc.contributor.author     Niles, Rae
dc.date.accessioned       2006-12-24T14:56:10Z
dc.date.available         2006-12-24T14:56:10Z
dc.date.copyright         2006
dc.date.issued            2006-05
dc.identifier.other       d06005
dc.identifier.uri         http://hdl.handle.net/10057/373
dc.description            Thesis (Ed.D.)--Wichita State University, College of Education.
dc.description            "May 2006."
dc.description            Includes bibliographic references (leaves 129-145).
dc.description.abstract   The purpose of this study was to describe and identify Sedgwick High School’s teacher and student perceptions of the impact of one-to-one laptop computer access using an appreciative inquiry theoretical research perspective and the theoretical frameworks of change and paradigm shift…
dc.format.extent          xiv, 167 leaves : digital, PDF file.
dc.format.extent          1174852 bytes
dc.format.mimetype        application/pdf
dc.language.iso           en_US
dc.rights                 Copyright Rae Niles, 2006. All rights reserved.
dc.subject.lcsh           Educational technology
dc.subject.lcsh           Education--Data processing
dc.subject.lcsh           Electronic dissertations
dc.title                  A study of the application of emerging technology: teacher and student perceptions of the impact of one-to-one laptop computer access
dc.type                   Dissertation
dc.thesis.adviser         Calabrese, Raymond L.
dc.identifier.oclc        71805797

Appears in Collections: EL Theses and Dissertations; COE Theses and Dissertations; Dissertations

Dublin Core to MARC Mapping

Fields in DSpace          Transformed MARC fields in OCLC
dc.contributor.author     100 1 _ Author.
dc.date.accessioned
dc.date.available
dc.date.copyright
dc.date.issued            260 ǂc year.
dc.identifier.other       099 ……
dc.identifier.uri         856 4 0 …
dc.description            502 Thesis (Ed.D.)--Wichita State University, College of …
dc.description            500 "Month year."
dc.description            504 Includes bibliographic references…
dc.description.abstract   520 3 _ …
dc.format.extent          300
dc.format.extent
dc.format.mimetype
dc.language.iso           546 en_US
dc.rights                 540 Access restricted to WSU students, faculty and staff (delete)
dc.subject                690 (keywords, not controlled vocabulary, delete)
dc.subject.lcsh           650 _ 0
dc.title                  245 1 _ …
dc.type                   655 _ 7 Dissertation ǂ2 local
dc.thesis.adviser         700 1 2 … ǂe advisor
dc.identifier.oclc        856 4 1 …
Appears in Collections:

Using MarcEdit

MarcEdit Interface

Metadata transformation in MarcEdit

The wheel and spoke design for metadata transformation (by Reese)

[Figure: wheel and spoke diagram. Labels: EAD, TEI, MODS, MarcXML, Dublin Core]

Data Flow Diagram

[Figure: data flow among DSpace, MarcEdit (Metadata Harvester, MarcEditor), OCLC and Voyager. Labels: OAI request; OAI response; raw XML (DC); XSLT (DC to MarcXML) customization; MARC; export. Customization types shown: authorized data processing (title, author, subject…); resolving data ambiguity (many-to-one mapping with element positioning…); string processing (data normalization…)]

Selective Harvesting

Define in MarcEdit
  by identifier (e.g. oai:soar.wichita.edu:10057/255)
  by set (e.g. hdl_10057_351)
  by date (e.g. from=2007-01-01&until=2008-01-01)

Or, http://soar.wichita.edu/dspace-oai/request?verb=ListRecords&metadataPrefix=oai_dc&from=2007-01-01&until=2008-01-01
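The request URL above follows the OAI-PMH pattern of a base URL plus a `verb` and selection arguments. As a rough sketch, it can be assembled in Python with the standard library; the `list_records_url` helper is hypothetical, and the base URL is the one shown on this slide.

```python
from urllib.parse import urlencode

# Base URL of the SOAR OAI-PMH interface as given on the slide.
BASE_URL = "http://soar.wichita.edu/dspace-oai/request"


def list_records_url(metadata_prefix="oai_dc", set_spec=None,
                     from_date=None, until_date=None):
    """Build a ListRecords request limited by set and/or date range."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    if from_date:
        params["from"] = from_date
    if until_date:
        params["until"] = until_date
    return f"{BASE_URL}?{urlencode(params)}"


# Harvest by date range, then by set.
print(list_records_url(from_date="2007-01-01", until_date="2008-01-01"))
print(list_records_url(set_spec="hdl_10057_351"))
```

Combining `set` with `from` and `until` in one request narrows a harvest to, for example, one collection's records from a single year.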

How do we define harvesting theses only?

Define by set (http://soar.wichita.edu/dspace-oai/request?verb=ListSets)

Sets by schools and departments
  AE Theses and Dissertations (hdl_10057_313)
  ANTH Theses (hdl_10057_233)
  BIO Theses (hdl_10057_389)
  CE Theses and Dissertations …

Or sets in two categories
  Master’s Theses (hdl_10057_351)
  Dissertations (hdl_10057_352)

Mapping Problems

Harvested test records exported to OCLC

Error reports in OCLC
  100 occurrence 1, indicator 2 - invalid code
  520 occurrence 4, $a occurrence 1, position 76 - invalid character - data must be ALA characters
  655 occurrence 1, $2 - invalid relationship - when element is present, then 655 indicator 2 must equal 7
  …

Mapping Problems
  Four “description” fields of DC all mapped to 520 (summary)
  dc.date (e.g. “2006-12-24T14:56:10Z”) mapped to 260 (publication, distribution)
  dc.identifier (e.g. “d06005”) mapped to 500 (general notes)
  All keywords and subjects mapped to 690 (local subject)

Customization is needed to meet our needs.

Customization categories (Reese)
  Resolving data ambiguity
  Authorized data processing
  String processing

How I originally categorized the customization types
  Selective data transformation; metadata element positioning; field relationship definition; field indicator correction and validation; partial data extraction…
  These issues were then arranged under Reese’s categories.

Customized Mapping in XSLT

Resolving data ambiguity

Same DC field to different MARC fields:
  description -> 502 (Dissertation), 500 (General Note), 504 (Bibliography)

Qualified DC element:
  description.abstract -> 520 (abstract)

Solution: element positioning

  <xsl:for-each select="dc:description[1]">
    <datafield tag="502" ind1=" " ind2=" ">
      <subfield code="a">
        <xsl:value-of select="normalize-space(.)" />
      </subfield>
    </datafield>
  </xsl:for-each>
  <xsl:for-each select="dc:description[2]">
    <datafield tag="500" ind1=" " ind2=" ">
      <subfield code="a">
        <xsl:value-of select="normalize-space(.)" />
      </subfield>
    </datafield>
  </xsl:for-each>
  …
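The idea of element positioning can also be sketched outside XSLT. The Python below maps repeated `dc:description` elements to 502, 500 and 504 purely by their order, as the stylesheet does. The `<record>` wrapper and the helper name are hypothetical, and the approach only works when every record lists its descriptions in the same order.

```python
import xml.etree.ElementTree as ET

DC_NS = {"dc": "http://purl.org/dc/elements/1.1/"}

# Position of each dc:description decides its MARC tag (1st, 2nd, 3rd).
POSITION_TO_TAG = ["502", "500", "504"]

# Hypothetical wrapper element around values from the sample SOAR record.
SAMPLE = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:description>Thesis (Ed.D.)--Wichita State University, College of Education.</dc:description>
  <dc:description>"May 2006."</dc:description>
  <dc:description>Includes bibliographic references (leaves 129-145).</dc:description>
</record>"""


def map_descriptions(xml_text):
    """Return mnemonic MARC lines for the first three dc:description values."""
    root = ET.fromstring(xml_text)
    descriptions = root.findall("dc:description", DC_NS)
    fields = []
    for tag, element in zip(POSITION_TO_TAG, descriptions):
        # Collapse runs of whitespace, similar to XPath normalize-space().
        text = " ".join((element.text or "").split())
        fields.append(f"={tag}  \\\\$a{text}")
    return fields


fields = map_descriptions(SAMPLE)
for field in fields:
    print(field)
```

Because the mapping depends on order alone, a record with a missing or extra description would shift the later values into the wrong fields, which is one reason qualified DC is preferable when it is available.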

Customized Mapping in XSLT

Authorized data processing

Primary entries vs. added entries: title and personal names processing

Template to deal with personal names (in MarcEdit)
  E.g.
    <dc:creator>Webb, Kyle M.</dc:creator>
    <dc:creator>Webb, Kyle M., 1977 -</dc:creator>
  transformed to
    =100  1\$aWebb, Kyle M.
    =100  1\$aWebb, Kyle M., $d1977-
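A minimal sketch of the same kind of name handling, written in Python rather than as a MarcEdit template: it splits a trailing date from a creator string into $d. The regular expression and the `creator_to_100` helper are assumptions for illustration and are not taken from MarcEdit's own template.

```python
import re

# A name followed by ", YYYY-" or ", YYYY-YYYY" (spaces around "-" allowed).
DATE_PATTERN = re.compile(
    r"^(?P<name>.+?),\s*(?P<dates>\d{4}\s*-\s*(?:\d{4})?)\s*$"
)


def creator_to_100(creator):
    """Return a mnemonic 100 field, moving trailing dates into $d."""
    creator = " ".join(creator.split())
    match = DATE_PATTERN.match(creator)
    if match:
        dates = re.sub(r"\s+", "", match.group("dates"))
        name = match.group("name")
        return f"=100  1\\$a{name},$d{dates}"
    return f"=100  1\\$a{creator}"


print(creator_to_100("Webb, Kyle M."))
print(creator_to_100("Webb, Kyle M., 1977 -"))
```

Names with titles, fuller forms or other qualifiers would need additional patterns; this sketch covers only the two shapes shown on the slide.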

Identify field relationships and correct indicators
  100, 245 (author, title) relationship: if 100 exists, 245 1 _; otherwise, 245 0 _

Local element: dc.thesis.adviser transformed to 700 1 _
  (If more than one dc.thesis element exists, positioning is needed.)

Customized Mapping in XSLT

Processing of non-filing characters in title
  245 (title) 2nd indicator: …a, an, the… (0, 2, 3, 4)

  <xsl:for-each select="dc:title[1]">
    <xsl:choose>
      <xsl:when test="$exist100!=''">
        <xsl:choose>
          <xsl:when test="substring(., 1, 2)='A '">
            <datafield tag="245" ind1="1" ind2="2">
              <xsl:choose>
                <xsl:when test="contains(.,':')">
                  <subfield code="a">
                    <xsl:value-of select="concat(substring-before(.,':'),' : ')" />
                  </subfield>
                  <subfield code="b">
                    <xsl:value-of select="concat(substring-after(.,':'),' / ')" />
                  </subfield>
                </xsl:when>
                …
          <xsl:otherwise>
            <datafield tag="245" ind1="1" ind2="0">

Alternatively, it can be defined in the title template.
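The two indicator rules (first indicator from the presence of a 100 field, second indicator from a leading article) and the split at the colon can be summarized in a short Python sketch. The helper names are hypothetical, and only the English articles "A", "An" and "The" are handled, matching the values 2, 3 and 4 listed above.

```python
# Leading article -> number of non-filing characters (article plus space).
NONFILING = (("The ", 4), ("An ", 3), ("A ", 2))


def title_indicators(title, has_100):
    """First indicator: 1 if a 100 field exists, else 0. Second: non-filing count."""
    ind1 = "1" if has_100 else "0"
    ind2 = "0"
    for article, count in NONFILING:
        if title.startswith(article):
            ind2 = str(count)
            break
    return ind1 + ind2


def title_to_245(title, has_100):
    """Build a mnemonic 245, splitting title and subtitle at the first colon."""
    title = " ".join(title.split())
    indicators = title_indicators(title, has_100)
    if ":" in title:
        main, rest = title.split(":", 1)
        data = f"$a{main.strip()} :$b{rest.strip()}"
    else:
        data = f"$a{title}"
    return f"=245  {indicators}{data}"


print(title_to_245(
    "A study of the application of emerging technology: teacher and student "
    "perceptions of the impact of one-to-one laptop computer access",
    has_100=True,
))
```

Like the stylesheet, the check is case-sensitive and purely string-based, so a title that begins with "A" as an initial or a letter grade would be miscounted and needs review in MarcEditor.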

Customized Mapping in XSLT

Subjects vs. Keywords

Only the common subject was kept in the test (when keywords and subjects are mixed inconsistently)

  <xsl:for-each select="dc:subject">
    <xsl:if test=".='Electronic dissertations'">
      <datafield tag="650" ind1=" " ind2="0">
        <subfield code="a">
          <xsl:value-of select="." />
        </subfield>
      </datafield>
    </xsl:if>
  </xsl:for-each>

Subject template (OSU solution)

  <dc:subject>ocean wave energy</dc:subject>
  <dc:subject>direct-drive</dc:subject>
  <dc:subject>fluid-structure interaction</dc:subject>
  <dc:subject>Ocean wave power</dc:subject>
  <dc:subject>Fluid-structure interaction</dc:subject>

Transformed to

  =650  \0$aOcean wave power.
  =650  \0$aFluid-structure interaction.
  =690  \\$aocean wave energy.
  =690  \\$adirect-drive.
  =690  \\$afluid-structure interaction.
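The slide does not say which rule the OSU subject template uses to separate subject headings from keywords. Purely as a guess that is consistent with the sample above, the Python sketch below treats terms starting with a capital letter as controlled headings (650) and the rest as local keywords (690); the function name and the rule itself are assumptions, not the template's documented behavior.

```python
def subjects_to_fields(subjects):
    """Split dc:subject values into 650 and 690 lines by initial capital.

    Assumption: capitalised terms are controlled headings. This rule is
    inferred from the sample data only.
    """
    controlled, keywords = [], []
    for term in subjects:
        term = " ".join(term.split())
        if not term:
            continue
        if term[0].isupper():
            controlled.append(term)
        else:
            keywords.append(term)
    lines = [f"=650  \\0$a{term}." for term in controlled]
    lines += [f"=690  \\\\$a{term}." for term in keywords]
    return lines


sample_subjects = [
    "ocean wave energy",
    "direct-drive",
    "fluid-structure interaction",
    "Ocean wave power",
    "Fluid-structure interaction",
]
for line in subjects_to_fields(sample_subjects):
    print(line)
```

A capitalization rule is fragile, since authors may capitalize free keywords too; checking terms against an authority file would be a more dependable way to decide which terms belong in 650.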

Customized Mapping in XSLT

String processing

Functions
  normalize-space()
  translate()
  substring()…

Example: extract a partial value from a DC element
  260 (Date): only extract the year from the issuing date in DC

  <xsl:for-each select="dc:date[4]">
    <xsl:if test=".!=''">
      <datafield tag="260" ind1=" " ind2=" ">
        <subfield code="c">
          <xsl:value-of select="substring(.,1,4)" /><xsl:text>.</xsl:text>
        </subfield>
      </datafield>
    </xsl:if>
  </xsl:for-each>
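For comparison, the year extraction can be written in a few lines of Python: `" ".join(value.split())` plays the role of `normalize-space()` and slicing plays the role of `substring(., 1, 4)`. The helper names are hypothetical.

```python
def normalize_space(value):
    """Trim and collapse whitespace, similar to XPath normalize-space()."""
    return " ".join(value.split())


def issued_year_field(date_issued):
    """Return a mnemonic 260$c holding only the year, or None if empty."""
    value = normalize_space(date_issued)
    if not value:
        return None
    year = value[:4]  # same characters as substring(., 1, 4)
    return f"=260  \\\\$c{year}."


print(issued_year_field("2006-05"))
```

The sketch assumes the date starts with a four-digit year, as in the `dc.date.issued` value "2006-05" of the sample record.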

Customized Mapping in XSLT

Leader: fixed field that comprises the first 24 character positions (00-23) of each MARC record. It provides information for the processing of the record.

008 field (Fixed-Length Data Elements)
  Type (t, manuscript language material)
  BLvl (m, bibliographic level is monograph)
  Desc (a)
  ELvl (I, encoding level is full level)
  Form (s, form of item is electronic)
  Cont (b, m; content is theses with bibliographies)
  Ills (a, illustrations included)
  Srce (d, cataloging source)
  Conf (0, not a conference publication)
  Fest (0, not a festschrift)
  LitF (0, not fiction)
  DtSt (s, single date)
  Indx (0, no index)
  Lang (eng, language is English)
  Ctry (xx)

Ways to handle:
  Scripting and adding all fixed fields (leader and 008 fields) in OAIDCtoMARCXML.xsl; or,
  Adding 008 in MarcEditor after record export; or,
  Applying a fixed field template after the records are exported to OCLC.
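To show what scripting the 008 involves, here is a sketch that assembles the 40-character 008 for an electronic thesis from the values listed above, using the MARC 21 book positions. Type, BLvl, Desc and ELvl are leader positions, so they do not appear in the 008 string. The function name and defaults are hypothetical, and blanks are written as "\" as in MarcEdit's mnemonic format.

```python
BLANK = "\\"  # a blank position in mnemonic format


def build_008(entered, year, country="xx", lang="eng"):
    """Assemble an 008 (books) string; entered is YYMMDD, year is YYYY."""
    parts = [
        entered,                   # 00-05 date entered on file
        "s",                       # 06    DtSt: single date
        year,                      # 07-10 Date 1
        BLANK * 4,                 # 11-14 Date 2 (unused)
        country.ljust(3, BLANK),   # 15-17 Ctry
        "a".ljust(4, BLANK),       # 18-21 Ills: illustrations
        BLANK,                     # 22    Audn
        "s",                       # 23    Form: electronic
        "bm".ljust(4, BLANK),      # 24-27 Cont: bibliography, thesis
        BLANK,                     # 28    GPub
        "0",                       # 29    Conf
        "0",                       # 30    Fest
        "0",                       # 31    Indx
        BLANK,                     # 32    undefined
        "0",                       # 33    LitF
        BLANK,                     # 34    Biog
        lang,                      # 35-37 Lang
        BLANK,                     # 38    MRec
        "d",                       # 39    Srce
    ]
    field = "".join(parts)
    if len(field) != 40:
        raise ValueError(f"008 must be 40 characters, got {len(field)}")
    return field


field_008 = build_008("081124", "2006")
print("=008  " + field_008)
```

The length check matters because a date, country or language value of the wrong width shifts every later position and corrupts the fixed field without any visible error.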

Harvesting with the Customized XSLT: Records Are Loaded into MarcEdit's MarcEditor

MarcEditor

Edit harvested theses in MarcEditor
  Batch edit fields, subfields, indicators (if needed)
  E.g.: add 008 field for all records

.mrk (MARC text file): compile to .mrc (MARC)

Or save as .mrk8 (MARC UTF8 text file): compile to .mrc (MARC)

Batch Import Records to OCLC

Click “File-Import Records…”
Select “Import to Local Save File”
Review and edit as needed, attach holdings and apply the ETD fixed field template (if needed).

Case 2: Some Reflections

The customized mapping and metadata transfer can eliminate the need for double entry in DSpace and OCLC/Voyager and significantly improve our ETD workflow.

Data mapping, manipulation and transformation
  Using qualified DC instead of element positioning in XSLT
    DSpace 1.5 enables a qualified DC crosswalk for OAI-PMH
  Handling of MARC fixed fields and the 008 field

Other technical issues
  Using other tools for harvesting besides MarcEdit
  Using DSpace Item Importer and Exporter instead of Metadata Harvester

Discussion on Metadata Repurposing and Management

Metadata repurposing

The tool
  MarcEdit: a low-barrier metadata harvesting, mapping and transfer tool.

The crosswalk
  One single crosswalk and style sheet will not meet all needs
  Needs to be based on standard practice but add local variations
  Application-specific mapping is needed for special projects

Dealing with common mapping challenges
  One-to-many/many-to-one mapping, specificity and granularity, no data to be mapped to a field, left-out data, data integrity, data loss, home-grown data, case analysis…

Metadata management
  Coordination in metadata repurposing is important
  Synchronizing metadata in different systems?
    Updating data in both systems (e.g. DSpace, OCLC/Voyager)
    Re-harvesting, re-transfer and data overlay?

Project Team and Acknowledgements

Goldbarth Cataloging
  Sai Deng, Nancy Deyoe, Laurie Allen, Technical Services
  Mary Nelson, Josh Yearout, Lorraine Madway, Special Collections, Wichita State University

ETDs Project
  Susan Matveyeva, Sai Deng, Tse-Min Wang, Sandy Oswald, Manoj Gogoi, Technical Services, Wichita State University
  Terry Reese, Consultant, Oregon State University

Thank you!