authority work automating mike monaco coordinator, cataloging … · 2019. 5. 10. · sierra treats...

Automating Authority Work Mike Monaco Coordinator, Cataloging Services May 14, 2018


Page 1: Authority Work Automating Mike Monaco Coordinator, Cataloging … · 2019. 5. 10. · Sierra treats subdivided subject headings as single headings, inflating report (3.4 should correct

Automating Authority Work

Mike Monaco

Coordinator, Cataloging Services

May 14, 2018


Automating authority work, or,

Be your own authority control vendor

Mike Monaco

Coordinator, Cataloging Services

The University of Akron

[email protected]

Ohio Valley Group of Technical Services Librarians

2018 Conference, May 13-15, 2018

Hesburgh Libraries

The University of Notre Dame

South Bend, Indiana


Who are you?

John Carroll University (2001-2004): Part-time AV cataloger

Akron-Summit County Public Library (2001-2004): Substitute librarian

Cleveland Public Library (2004-2016): Catalog librarian

The University of Akron (2016- ): Coordinator, Cataloging Services


The University of Akron Libraries

University Libraries

Bierce Library

Science & Technology Library

Archival Services

(Separate Units)

Wayne College Library

Akron Law Library

Center for the History of Psychology


Authority control at UA

● 1995: migration, with one-time authority processing supplied by a vendor (BNA)

● Local authority work put on hold in expectation of contracting with a vendor…which never happened

● Authority work resumed early 2000s

○ Full authority control for tangible items only

○ Shift to batches of e-resources over time made authority work for batches overwhelming

○ 2013: Budget 80:20 electronic:tangible

○ 2018: ratio is about 95:5


What this is NOT about

Automated authority control within the ILS

Working with an authority control vendor


What this IS

Grabbing the “low-hanging fruit” for batches of records

When traditional authority work is not practical (the item is not in hand, or headings reports are too vast to address individually)


Wouldn’t it be nice if...

The “Headings used for the first time” report could export a list of the headings, and we could batch search OCLC for records?


The tool box

● MarcEdit

● OCLC Connexion Client

● Excel (or other program for sorting textual lists)

● pgAdmin (or similar for a SQL query, III/Sierra only)

● A rudimentary grasp of Regular Expressions

● EditPad (or similar RegEx-compatible text editor: Google Sheets, EmEditor)


The process

1. Before loading, correct variant headings (with MarcEdit)

2. After loading, extract headings from report (with SQL query or ILS’s output)

3. Separate names and subjects (in a spreadsheet or text editor)

4. Remove extraneous data (with RegEx-capable editor)

5. Batch search for authority records (in Connexion Client)

6. Load authority records


Validate Headings

MarcEdit can check name and subject fields against LC authorities in the Linked Data Service, and automatically correct headings that match a variant (“Use for”) heading*.

*NB: The process is imperfect!


Because this is an extra step, we’ve been comparing record sets from various vendors to determine which ones really benefit.


Selected Vendor Loads (March 2017-March 2018)

| Record source | Records per load | Invalid headings per load | Variants changed per load | Invalid heading:record ratio | Variant:record ratio |
|---|---|---|---|---|---|
| Alexander Street Press | 293 | 117 | 18 | 0.399279 | 0.060566 |
| EBSCO | 76992 | 75668 | 181 | 0.982807 | 0.002348 |
| Films on Demand | 2509 | 1019 | 175 | 0.406314 | 0.069911 |
| Kanopy | 9960 | 4946 | 397 | 0.496628 | 0.039815 |
| Proquest EBC | 13086 | 2309 | 101 | 0.17647 | 0.0077 |
| World Share | 31 | 7 | 0.7 | 0.232114 | 0.023772 |


III Sierra


SQL query of Headings used for the first time report

https://mmonaco-uakron.tinytake.com/sf/MjUwMDQxMF83NTIyNTY0


Headings used for the first time

Hundreds or even thousands of entries after batch loads...


SQL query*


Results...


In Excel...

Be sure to import as Unicode (UTF-8) if your ILS is encoding characters as Unicode rather than MARC8!
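The same encoding concern applies if the exported list is read by a script instead of Excel. The following is a minimal sketch, not part of the original workflow: the function name `read_headings` and the tab-delimited layout are assumptions made for illustration.

```python
import csv


def read_headings(path, encoding="utf-8-sig", delimiter="\t"):
    """Read an exported headings list, decoding it explicitly as UTF-8.

    Assumptions (not from the slides): the export is a delimited text file
    with one heading per row. "utf-8-sig" also tolerates a leading byte
    order mark, which some exports include.
    """
    with open(path, newline="", encoding=encoding) as handle:
        # Skip completely empty rows so they do not reach later steps.
        return [row for row in csv.reader(handle, delimiter=delimiter) if row]
```

Opening the file with an explicit encoding avoids depending on the operating system's default, which is where diacritics usually get mangled.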


Sort the terms

Sorting A-Z arranges the headings by field group tag and MARC tag (a=names, b=other names, d=subject), so:

a100-b730 : names used as names

d600-d630 : names used as subjects

d650- : subjects
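For anyone who would rather script this split than sort in a spreadsheet, here is a rough sketch of the same three-way grouping. It assumes each exported line begins with the field group tag immediately followed by the three-digit MARC tag (for example `a100` or `d650`); that line shape and the name `classify` are illustrative assumptions, so check them against your own output.

```python
import re

# Assumed line shape: field group tag + MARC tag at the start, e.g. "d650 ...".
TAG_PREFIX = re.compile(r"^([a-z])(\d{3})")


def classify(line):
    """Return which batch a heading line belongs to."""
    match = TAG_PREFIX.match(line.strip())
    if not match:
        return "unknown"
    group, tag = match.group(1), int(match.group(2))
    if group in ("a", "b"):
        return "names"               # a100-b730: names used as names
    if group == "d" and 600 <= tag <= 630:
        return "names_as_subjects"   # d600-d630: names used as subjects
    if group == "d" and tag >= 650:
        return "subjects"            # d650 and up: subjects
    return "unknown"
```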


Notice

You can’t feed this raw data into a batch search in Connexion Client


In EditPad (or other RegEx-enabled editor)

Strip out MARC tags, delimiters, punctuation, etc.


Find/replace using RegEx

(.*\|a)

(\|db\. ca\. |\|db\. |\|d\. ca\.|\|dd\. |\|dca\. |-ca\. |\|dfl\. ca\. |\|dfl\.)

(\|e.*|\|4.*|\|0.*)

(\|x.*|\|v.*|\|z.*)

(\|.|\|$)

(;|:|\(|\)|\?| and | or |&c\.|&| in | an |,| the | for | on | so | with | to | by |”|’ | ‘| be | that |\.{3}| near )


Names

(.*\|a)

Everything before |a


(\|db\. ca\. |\|db\. |\|d\. ca\.|\|dd\. |\|dca\. |-ca\. |\|dfl\. ca\. |\|dfl\.)

AACR2 abbreviations

b. d. fl. ca.


(\|e.*|\|4.*|\|0.*)

Relator terms, URIs


(\|.|\|$)

Any remaining delimiters and subfield codes


(;|:|\(|\)|\?| and | or |&c\.|&| in | an |,| the | for | on | so | with | to | by |”|’ | ‘| be | that |\.{3}| near )

Punctuation, operators, and stopwords that foil OCLC searches


Names as subjects

(\|x.*|\|v.*|\|z.*)

Subdivisions
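The find/replace passes described above could also be chained in a short script rather than run one at a time in an editor. This is only a sketch under stated assumptions: the slides do not say what each match is replaced with, so the code substitutes a single space and collapses runs of whitespace at the end; `clean_heading` is a made-up name, and the sample lines are invented for illustration.

```python
import re

# Find patterns copied from the slides, applied in the order listed there.
PATTERNS = [
    r"(.*\|a)",                                   # everything before |a
    r"(\|db\. ca\. |\|db\. |\|d\. ca\.|\|dd\. |\|dca\. |-ca\. |\|dfl\. ca\. |\|dfl\.)",  # AACR2 abbreviations
    r"(\|e.*|\|4.*|\|0.*)",                       # relator terms, URIs
    r"(\|x.*|\|v.*|\|z.*)",                       # subdivisions
    r"(\|.|\|$)",                                 # remaining delimiters and subfield codes
    r"(;|:|\(|\)|\?| and | or |&c\.|&| in | an |,| the | for | on | so | with | to | by |”|’| be | that |\.{3}| near )",
]
COMPILED = [re.compile(pattern) for pattern in PATTERNS]


def clean_heading(line):
    """Reduce one exported heading line to a plain search string."""
    text = line.rstrip("\n")
    for pattern in COMPILED:
        # Assumption: replace each match with one space, not an empty string.
        text = pattern.sub(" ", text)
    # Collapse the leftover runs of spaces and trim the ends.
    return " ".join(text.split())


print(clean_heading("a100 1 |aTwain, Mark,|d1835-1910|eauthor"))  # → Twain Mark 1835-1910
```

Order matters here: the abbreviation and relator passes rely on the delimiters still being present, so the pass that strips remaining delimiters has to come after them.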


Converting SQL output to batch searchable text file with RegEx

https://mmonaco-uakron.tinytake.com/sf/MjU4ODk4OF83Nzg3NTMy


Name headings


(.*\|a)


(\|db\. ca\. |\|db\. |\|d\. ca\.|\|dd\. |\|dca\. |-ca\. |\|dfl\. ca\. |\|dfl\.)


(\|e.*|\|4.*|\|0.*)


(\|.|\|$)


(;|:|\(|\)|\?| and | or |&c\.|&| in | an |,| the | for | on | so | with | to | by |”|’| be | that |\.{3}| near )


Names (can be skipped)

(.*\|a)

(\|db\. ca\. |\|db\. |\|d\. ca\.|\|dd\. |\|dca\. |-ca\. |\|dfl\. ca\. |\|dfl\.)

(\|e.*|\|4.*|\|0.*)

(\|x.*|\|v.*|\|z.*)

(\|.|\|$)

(;|:|\(|\)|\?| and | or |&c\.|&| in | an |,| the | for | on | so | with | to | by |”|’| be | that |\.{3}| near )


Names as subjects

(.*\|a)

(\|db\. ca\. |\|db\. |\|d\. ca\.|\|dd\. |\|dca\. |-ca\. |\|dfl\. ca\. |\|dfl\.)

(\|e.*|\|4.*|\|0.*)

(\|x.*|\|v.*|\|z.*)

(\|.|\|$)

(;|:|\(|\)|\?| and | or |&c\.|&| in | an |,| the | for | on | so | with | to | by |”|’| be | that |\.{3}| near )


Topical subjects (can be skipped)

(.*\|a)

(\|db\. ca\. |\|db\. |\|d\. ca\.|\|dd\. |\|dca\. |-ca\. |\|dfl\. ca\. |\|dfl\.)

(\|e.*|\|4.*|\|0.*)

(\|x.*|\|v.*|\|z.*)

(\|.|\|$)

(;|:|\(|\)|\?| and | or |&c\.|&| in | an |,| the | for | on | so | with | to | by |”|’| be | that |\.{3}| near )


Sirsi/Dynix Symphony


List unauthorized tags report


Slightly different procedure to clean these up

1. Open .txt file in editor

2. Delete header of report

3. Find/Replace to delete page headers (“Tags With UNAUTHORIZED Headings / Produced on Sat Jul 1 17:00:11 2017”)

4. Separate name and topical headings

5. RegEx to remove other data
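Step 3 could be scripted along these lines. This is a hedged sketch: it assumes the page header appears on lines that start with "Tags With UNAUTHORIZED Headings" or with "Produced on" followed by a timestamp shaped like the one quoted above. The exact layout of the report is not shown in the slides, and `drop_page_headers` is a hypothetical helper name.

```python
import re

# Assumed header shapes, based only on the example quoted in the slide.
PAGE_HEADER = re.compile(
    r"^\s*(Tags With UNAUTHORIZED Headings"
    r"|Produced on \w{3} \w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4})"
)


def drop_page_headers(lines):
    """Return the report lines with repeated page headers removed."""
    return [line for line in lines if not PAGE_HEADER.match(line)]
```

Matching on the timestamp shape rather than one literal date means the same pattern can be reused on reports produced on other days.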


So far so good...

(.*\|a)


Uh oh...

(\|e.*|\|4.*|\|0.*)

Misses “|?UNAUTHORIZED” by itself. Only captures it if preceded by |e, |4, or |0.


(\|e.*|\|4.*|\|0.*|\|\?.*)

\|\?.* captures “|?” followed by anything
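A quick way to see the difference between the two patterns is to run both against a heading that carries only the “|?UNAUTHORIZED” flag. The sample heading below is invented for illustration, and the matches are replaced with an empty string purely to make the comparison easy to read.

```python
import re

ORIGINAL = re.compile(r"(\|e.*|\|4.*|\|0.*)")
EXTENDED = re.compile(r"(\|e.*|\|4.*|\|0.*|\|\?.*)")

sample = "Smith, John|?UNAUTHORIZED"  # made-up heading with no |e, |4, or |0

print(ORIGINAL.sub("", sample))  # → Smith, John|?UNAUTHORIZED
print(EXTENDED.sub("", sample))  # → Smith, John
```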


Problems


Spaces


Scraps

Portions of “|?UNAUTHORIZED” that wrapped to new line


Asterisks

The output changes any diacritics to asterisks


Line breaks

Name/Title headings are especially likely to get broken up. Here, the delimiter was even separated from the subfield code “t”.


Workaround

1. FIND: \* REPLACE: [nothing] to delete asterisks

2. Use EditPad’s “Extras” to delete blank lines, duplicate lines, etc.

3. Depending on the number of items, you might close up split lines by hand.
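The first two workaround steps can be approximated in a few lines of code if an editor with these features is not at hand. This is an illustrative sketch, not EditPad's behavior: `tidy` is a hypothetical name, and it keeps the first occurrence of each line in its original order.

```python
def tidy(lines):
    """Delete asterisks, blank lines, and duplicate lines, preserving order."""
    seen = set()
    result = []
    for line in lines:
        text = line.replace("*", "").strip()
        if not text or text in seen:
            continue  # skip blank lines and anything already kept
        seen.add(text)
        result.append(text)
    return result
```

Split lines are deliberately left alone here; as noted above, closing them up is a judgment call that may be easier to make by hand.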


Searching in batches


Searching a batch of terms in Connexion Client

https://mmonaco-uakron.tinytake.com/sf/MjU4OTAyMF83Nzg3NjE2


Batch searching

“Use default index” settings

nw: for names/titles

su: for topics/geographic terms

Maximum number of matches to download: 1 (Tools>Options>Batch)


Batch searching

NOTE: Your local save file has a maximum capacity of 10,000 records, so don’t search more than that many strings!
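With the maximum number of matches set to 1, each search string can add at most one record to the save file, so a long list can be split into pieces of no more than 10,000 strings before searching. A minimal sketch follows; the function name `chunk` is an assumption for illustration.

```python
def chunk(strings, size=10_000):
    """Split a list of search strings into batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [strings[i:i + size] for i in range(0, len(strings), size)]


# Example: 25,000 search strings become batches of 10,000, 10,000, and 5,000.
batches = chunk([f"term {n}" for n in range(25_000)])
print([len(batch) for batch in batches])  # → [10000, 10000, 5000]
```

Each batch can then be written to its own text file and searched separately, with the save file emptied or swapped between runs.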


Successful name searches (of 1941 entries)


Names, names as subjects, and subjects

III requires name headings that are to be used as subjects to be loaded separately from name headings to be used as names!

SirsiDynix does not have this issue.


A four month test

I’m very pleased with hit rate on names!

| | Total headings extracted | Hits in batch search | Success rate (ARs found for heading) |
|---|---|---|---|
| Names | 36,244 | 21,760 | 60 % |
| Names as Subjects | 3,795 | 896 | 23.6 % |
| Subjects | 29,147 | 1,516 | 5.2 % |
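The success rates are simply hits divided by headings extracted. As a small worked check using the counts reported above (the names `results` and `success_rate` are illustrative):

```python
# Counts from the four month test: (total headings extracted, hits in batch search).
results = {
    "Names": (36_244, 21_760),
    "Names as Subjects": (3_795, 896),
    "Subjects": (29_147, 1_516),
}


def success_rate(extracted, hits):
    """Percentage of extracted headings for which an authority record was found."""
    return 100 * hits / extracted


for label, (extracted, hits) in results.items():
    print(f"{label}: {success_rate(extracted, hits):.1f} %")
# → Names: 60.0 %
# → Names as Subjects: 23.6 %
# → Subjects: 5.2 %
```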


A four month test

Main issue:

Name/Title headings often not established



A four month test

Main issues:

Sierra treats subdivided subject headings as single headings, inflating report (3.4 should correct this issue!)

Music headings (instruments, Arranged) often valid but not established in an AR

| | Total headings extracted | Unique hits in batch search | Success rate (ARs found for heading) |
|---|---|---|---|
| Names | 36,244 | 21,760 | 60 % |
| Names as Subjects | 3,795 | 896 | 23.6 % |
| Subjects | 29,147 | 1,516 | 5.2 % |


Is it worth it?

Typically this process (excluding Validate Headings in MarcEdit) takes less than an hour and has resolved 36% of unauthorized headings, averaging over a thousand ARs a week. However, several “known issues” in Sierra made me place this project on hold until they are fixed.

So consider

1. the number of new headings you normally see in a report,

2. the quality of your incoming bib records (can the headings be authorized?), and

3. the capability of your ILS to use authority records effectively


Questions?

If you decide to test this process at your institution or discover any refinements, please let me know!

[email protected]


Useful tools

Help with regular expressions: Regular Expressions 101 http://regex101.com

RegEx enabled text editors: Editpad Lite https://www.editpadlite.com

EmEditor https://www.emeditor.com/

pgAdmin free software: https://www.pgadmin.org/download/

This presentation: https://events.library.nd.edu/ovgtsl2018/talk/monaco.shtml