Automating Controlled Vocabulary Reconciliation
![Page 1: Automating Controlled Vocabulary Reconciliation](https://reader034.vdocument.in/reader034/viewer/2022042619/5873c2201a28abbc788b6a25/html5/thumbnails/1.jpg)
Automating Controlled Vocabulary Reconciliation
Anna Neatrour, Metadata Librarian
Jeremy Myntti, Interim Head, Digital Library Services
![Page 2: Automating Controlled Vocabulary Reconciliation](https://reader034.vdocument.in/reader034/viewer/2022042619/5873c2201a28abbc788b6a25/html5/thumbnails/2.jpg)
Summary
◦ Metadata inconsistency
◦ Overview of vendor authority process
◦ Further work with OpenRefine
◦ Next steps
http://www.utahindians.org
![Page 3: Automating Controlled Vocabulary Reconciliation](https://reader034.vdocument.in/reader034/viewer/2022042619/5873c2201a28abbc788b6a25/html5/thumbnails/3.jpg)
Inconsistency
◦ Gosiute Indians / Goshute Indians
◦ Navajo Indians / Navaho Indians
◦ Salt Lake / Salt Lake City / Salt Lake City (Utah)
◦ Bishop, Dail Stapley / Bishop, Dale Stapely / Bishop, Dale Stapley
◦ Beckwith, Frank A. / Beckwith, Frank A. (1876-1951) / Beckwith, Frank Asahel (1876-1951) / Beckwith, Frank Asahel, 1876-1951
Woven basket or jug; http://content.lib.utah.edu/cdm/ref/collection/UU_Photo_Archives/id/13887
![Page 4: Automating Controlled Vocabulary Reconciliation](https://reader034.vdocument.in/reader034/viewer/2022042619/5873c2201a28abbc788b6a25/html5/thumbnails/4.jpg)
Project Timeline
◦ June-Sept. 2012: Define project
◦ Oct. 2012-May 2013: Testing
◦ June 2013: Contracted with Backstage Library Works
◦ June 2013-Feb. 2014: Continued testing
◦ Feb.-May 2014: 17 collections processed
◦ June-Aug. 2014: Manual review (intern)
◦ April 2015-today: Explore OpenRefine
![Page 5: Automating Controlled Vocabulary Reconciliation](https://reader034.vdocument.in/reader034/viewer/2022042619/5873c2201a28abbc788b6a25/html5/thumbnails/5.jpg)
Methodology
<title>A group of St. George (Sibwit) Paiutes and Wickiups (cedar)</title>
<subjec>Paiute Indians; Ute Indians--History; Wickiups; Indians of North America--Dwellings;</subjec>
<covspa>Utah;</covspa>
<descri>A group of people sitting and standing in front of a brush shelter;</descri>
<publis>Digitized by: J. Willard Marriott Library, University of Utah;</publis>
<type>Image;StillImage;</type>
<format>image/jpeg;</format>
http://content.lib.utah.edu/cdm/ref/collection/uaida/id/14697
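Before headings in a record like this can be checked against an authority file, each multi-valued field has to be split into individual values. The following is a minimal Python sketch of that step, not the vendor's or the library's actual code. It assumes the exported fields are wrapped in a hypothetical `<record>` root element and that values are semicolon-delimited as in the sample; `split_values` and `parse_record` are illustrative names.

```python
import xml.etree.ElementTree as ET

# Fields from the sample record, wrapped in a hypothetical <record> root
# so the fragment parses as a single XML document.
SAMPLE = """<record>
<title>A group of St. George (Sibwit) Paiutes and Wickiups (cedar)</title>
<subjec>Paiute Indians; Ute Indians--History; Wickiups; Indians of North America--Dwellings;</subjec>
<covspa>Utah;</covspa>
<type>Image;StillImage;</type>
</record>"""


def split_values(text):
    """Split a semicolon-delimited field into trimmed, non-empty values."""
    return [part.strip() for part in (text or "").split(";") if part.strip()]


def parse_record(xml_text):
    """Map each field nickname (element tag) to its list of values."""
    root = ET.fromstring(xml_text)
    return {child.tag: split_values(child.text) for child in root}


record = parse_record(SAMPLE)
subjects = record["subjec"]  # one entry per subject heading
```

Each entry in `subjects` is then a single heading that can be compared against an authority file on its own.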
![Page 6: Automating Controlled Vocabulary Reconciliation](https://reader034.vdocument.in/reader034/viewer/2022042619/5873c2201a28abbc788b6a25/html5/thumbnails/6.jpg)
Backstage: statistics and reports
◦ Unmatched report
◦ Change report
![Page 7: Automating Controlled Vocabulary Reconciliation](https://reader034.vdocument.in/reader034/viewer/2022042619/5873c2201a28abbc788b6a25/html5/thumbnails/7.jpg)
Backstage: standardization
Capitalization, Punctuation, and Updated Authorized Access Points
◦ Forests and Forestry – Utah
◦ forests and forestry -- Utah
◦ Forest lands - Utah
◦ Forests and forestry--Utah
A group of Navajos at Navajo Mountain government school; http://content.lib.utah.edu/cdm/ref/collection/uaida/id/43551
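The three kinds of fixes named on this slide can be sketched in a few lines of Python. This is an illustration under assumptions, not Backstage's actual rules. It assumes that "Forests and forestry--Utah" is the standardized form of the other three variants, and `AUTHORIZED` is a hypothetical two-entry lookup standing in for a real authority file. Separators are normalized mechanically. Capitalization and superseded terms are fixed only when the main heading is found in the lookup, so proper nouns in unknown headings are left alone.

```python
import re

# Hypothetical lookup: case-folded heading -> authorized form.
# Both entries are assumed from the slide's example.
AUTHORIZED = {
    "forests and forestry": "Forests and forestry",
    "forest lands": "Forests and forestry",
}

# A subdivision separator: a hyphen or en dash run with spaces around it,
# or a double hyphen with or without spaces.
SEPARATOR = re.compile(r"\s+[-\u2013]+\s+|\s*--\s*")


def normalize_heading(heading):
    """Join subdivisions with '--' and swap in the authorized main heading."""
    parts = [p.strip() for p in SEPARATOR.split(heading) if p.strip()]
    if not parts:
        return ""
    main, subdivisions = parts[0], parts[1:]
    # Unknown headings keep their original wording and capitalization.
    main = AUTHORIZED.get(main.casefold(), main)
    return "--".join([main] + subdivisions)


variants = [
    "Forests and Forestry \u2013 Utah",
    "forests and forestry -- Utah",
    "Forest lands - Utah",
    "Forests and forestry--Utah",
]
normalized = {normalize_heading(v) for v in variants}
```

Because the separator pattern requires either surrounding spaces or a double hyphen, a single hyphen inside a value such as a date range is not treated as a subdivision break.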
![Page 8: Automating Controlled Vocabulary Reconciliation](https://reader034.vdocument.in/reader034/viewer/2022042619/5873c2201a28abbc788b6a25/html5/thumbnails/8.jpg)
Backstage: problems encountered
◦ Missing MARC tags
  ◦ Names treated as topical headings and vice versa
  ◦ Provo => Provisional IRA
◦ Data in wrong fields
  ◦ Date: Price Hiram, 1814-1901
◦ Incorrect match
  ◦ Local names matching wrong records
  ◦ Johnson, Abe is not Johnson, F. T.
Walker War Map 1853-1854; http://content.lib.utah.edu/cdm/ref/collection/uaida/id/15474
![Page 9: Automating Controlled Vocabulary Reconciliation](https://reader034.vdocument.in/reader034/viewer/2022042619/5873c2201a28abbc788b6a25/html5/thumbnails/9.jpg)
Intern review and clean-up
![Page 10: Automating Controlled Vocabulary Reconciliation](https://reader034.vdocument.in/reader034/viewer/2022042619/5873c2201a28abbc788b6a25/html5/thumbnails/10.jpg)
OpenRefine project
◦ Used UAIDA as a pilot, since it had the greatest number of unmatched names due to the size of the collection (over 8,000 items)
◦ 529 unmatched names after Backstage process
Navajo woman weaving, http://content.lib.utah.edu/cdm/ref/collection/uaida/id/45379
![Page 11: Automating Controlled Vocabulary Reconciliation](https://reader034.vdocument.in/reader034/viewer/2022042619/5873c2201a28abbc788b6a25/html5/thumbnails/11.jpg)
OpenRefine: two approaches
◦ Reconciliation process developed by Jenn Wright and Matt Carruthers, University of Michigan Library, https://github.com/mcarruthers/LCNAF-Named-Entity-Reconciliation
◦ Reconciliation process developed by Roderic Page, http://iphylo.blogspot.com/2013/04/reconciling-author-names-using-open.html
A group of Navajo children and teenagers, http://content.lib.utah.edu/cdm/ref/collection/uaida/id/43285
![Page 12: Automating Controlled Vocabulary Reconciliation](https://reader034.vdocument.in/reader034/viewer/2022042619/5873c2201a28abbc788b6a25/html5/thumbnails/12.jpg)
OpenRefine: differences in results
Both processes found name matches through searching VIAF.
◦ Wright and Carruthers' process looked for a matching LC authority record in the VIAF cluster
  ◦ 81 records were matched, 132 were false matches, and 312 had no match
◦ Page's process matched names to authors in a more general fashion
  ◦ 70 records were matched, 37 were false matches, and 449 had no match
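The two sets of tallies are easier to compare as rates. The short Python sketch below uses only the counts reported on this slide. "Precision" here is an assumed label for the share of proposed matches that turned out to be correct; the slides do not use that term.

```python
# Tallies as reported on the slide: (matched, false matches, no match).
RESULTS = {
    "Wright and Carruthers": (81, 132, 312),
    "Page": (70, 37, 449),
}


def summarize(matched, false_matches, no_match):
    """Return totals and the share of proposed matches that were correct."""
    proposed = matched + false_matches
    return {
        "total": matched + false_matches + no_match,
        "proposed": proposed,
        "precision": matched / proposed,
    }


summary = {name: summarize(*counts) for name, counts in RESULTS.items()}
for name, stats in summary.items():
    print(f"{name}: {stats['proposed']} proposed, "
          f"{stats['precision']:.0%} correct, {stats['total']} tallied")
```

On these figures, Wright and Carruthers' process proposed 213 matches of which about 38% were correct, while Page's process proposed 107 of which about 65% were correct. The tallies sum to 525 and 556 respectively, neither of which equals the 529 unmatched names reported for the pilot; the slides do not explain the difference.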
![Page 13: Automating Controlled Vocabulary Reconciliation](https://reader034.vdocument.in/reader034/viewer/2022042619/5873c2201a28abbc788b6a25/html5/thumbnails/13.jpg)
OpenRefine: manual work
◦ Check matches against collection and discard false matches
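One possible way to organize this manual check is to review the least similar pairs first. The Python sketch below is an assumption about how a reviewer might triage, not a step described in the slides. It uses the standard library's `difflib.SequenceMatcher`, and the example pairs are built from names that appear elsewhere in this deck. A low score only suggests where to look first: a correct match can score low (a full name with dates against an abbreviated form), so the decision still rests on checking the item in the collection.

```python
from difflib import SequenceMatcher


def similarity(local_name, matched_name):
    """Return a 0..1 similarity ratio between two names, ignoring case."""
    return SequenceMatcher(None, local_name.lower(), matched_name.lower()).ratio()


def review_order(pairs):
    """Sort (local name, proposed match) pairs so the least similar come first."""
    return sorted(pairs, key=lambda pair: similarity(*pair))


# Illustrative pairs built from names shown in the slides.
pairs = [
    ("Bishop, Dail Stapley", "Bishop, Dale Stapley"),
    ("Johnson, Abe", "Johnson, F. T."),
]
ordered = review_order(pairs)  # the Johnson pair sorts ahead of the Bishop pair
```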
![Page 14: Automating Controlled Vocabulary Reconciliation](https://reader034.vdocument.in/reader034/viewer/2022042619/5873c2201a28abbc788b6a25/html5/thumbnails/14.jpg)
OpenRefine: updating UAIDA
◦ We updated the names in an additional 455 records.
◦ 405 matches were from both processes, 38 were unique to Wright and Carruthers, and 5 were matched by the Page process.
Eight Hopi Baskets, http://content.lib.utah.edu/cdm/ref/collection/uaida/id/45009
![Page 15: Automating Controlled Vocabulary Reconciliation](https://reader034.vdocument.in/reader034/viewer/2022042619/5873c2201a28abbc788b6a25/html5/thumbnails/15.jpg)
OpenRefine: student work
◦ Fall 2015: student ran additional unmatched items from other collections through OpenRefine with the Wright and Carruthers process
◦ Metadata librarian currently reviewing student work and updating collections
![Page 16: Automating Controlled Vocabulary Reconciliation](https://reader034.vdocument.in/reader034/viewer/2022042619/5873c2201a28abbc788b6a25/html5/thumbnails/16.jpg)
Next Steps
◦ Create local and regional controlled vocabularies
![Page 17: Automating Controlled Vocabulary Reconciliation](https://reader034.vdocument.in/reader034/viewer/2022042619/5873c2201a28abbc788b6a25/html5/thumbnails/17.jpg)
Next Steps: Reconcile across more collections
◦ CONTENTdm metadata exported to Solr
◦ Easier to get list of personal names across all collections
◦ Explore other reconciliation methods
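Once the metadata is in Solr, a facet query is one way to list every distinct personal name across collections in a single request. The Python sketch below only builds the request URL and parses a response, using Solr's standard faceting parameters. The core URL `SOLR_SELECT_URL` and the field name `creator_s` are hypothetical placeholders, and the sample response is hand-written with made-up counts.

```python
from urllib.parse import urlencode

# Hypothetical values: replace with the real Solr core URL and name field.
SOLR_SELECT_URL = "http://localhost:8983/solr/collections/select"
NAME_FIELD = "creator_s"


def facet_query_url(base_url, field):
    """Build a Solr query that returns only facet counts for one field."""
    params = {
        "q": "*:*",           # match every record in the index
        "rows": 0,            # no documents, only the facet counts
        "facet": "true",
        "facet.field": field,
        "facet.limit": -1,    # no cap on the number of distinct values
        "facet.mincount": 1,  # skip values that occur in no record
        "wt": "json",
    }
    return f"{base_url}?{urlencode(params)}"


def facet_counts(response, field):
    """Convert Solr's flat [value, count, value, count, ...] list to a dict."""
    flat = response["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))


url = facet_query_url(SOLR_SELECT_URL, NAME_FIELD)

# Hand-written response in Solr's default JSON facet layout; counts are made up.
sample_response = {
    "facet_counts": {
        "facet_fields": {NAME_FIELD: ["Beckwith, Frank A.", 3, "Johnson, Abe", 1]}
    }
}
names = facet_counts(sample_response, NAME_FIELD)
```

The resulting dictionary of names and record counts could then be loaded into OpenRefine as a single project rather than one export per collection.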
![Page 18: Automating Controlled Vocabulary Reconciliation](https://reader034.vdocument.in/reader034/viewer/2022042619/5873c2201a28abbc788b6a25/html5/thumbnails/18.jpg)
Next Steps
◦ URIs in Digital Collections Metadata, MWDL (Primo), and DPLA
http://content.lib.utah.edu/cdm/ref/collection/uaida/id/43183
![Page 19: Automating Controlled Vocabulary Reconciliation](https://reader034.vdocument.in/reader034/viewer/2022042619/5873c2201a28abbc788b6a25/html5/thumbnails/19.jpg)
Questions?
Anna Neatrour | [email protected]
Metadata Librarian
Jeremy Myntti | [email protected]
Interim Head, Digital Library Services
Forthcoming article: Use Existing Data First: Reconcile Metadata Before Creating New Controlled Vocabularies. Journal of Library Metadata. http://dx.doi.org/10.1080/19386389.2015.1099989