#mashcat: evolving marcedit: leveraging semantic data in marcedit


Upload: terry-reese

Post on 22-Feb-2017


TRANSCRIPT

Page 1: #mashcat: Evolving MarcEdit: Leveraging Semantic Data in MarcEdit

#mashcat: Evolving MarcEdit
Leveraging Semantic Data in MarcEdit

Page 2: #mashcat: Evolving MarcEdit: Leveraging Semantic Data in MarcEdit

Little History

MarcEdit development started around 1999ish (as parts)

◦ Originally coded in 3 programming languages: Assembler (libraries), Visual Basic (UI) and Delphi (COM).
◦ I started writing it as an undergraduate to better understand MARC & circumvent OCLC’s Passport for Windows program
◦ First “MarcEdit” was released Sept. 11, 2000 (thank you Wayback Machine: http://web.archive.org/web/20001017105529/http://ucs.orst.edu/~reeset/marcedit/indexb.html)

Today:

◦ Written in C# (Windows/Linux) & Objective-C/C# (OSX)
◦ Active user community is ~20,000ish (based on update logs)
◦ Used in ~190ish countries/political regions
◦ Roughly 1/3 of the users reside outside of Canada/United States*

* Based on loose analysis of server logs by my server-side stats software

Page 3: #mashcat: Evolving MarcEdit: Leveraging Semantic Data in MarcEdit

MarcEdit Evolution

[Screenshots: MarcEdit 1.0-2.0 Main Window; MarcEdit MARC Tools 1.0-2.0; MarcEdit 1.0-2.0 MarcEditor]

Page 4: #mashcat: Evolving MarcEdit: Leveraging Semantic Data in MarcEdit

MarcEdit Evolution

Early application was developed to (again, thank you Internet Archive):

1. Be user-friendly (whether I’ve accomplished that is debatable – I’m not a UI designer)
2. Support LC’s MARCBreakr/Maker diacritics (largely yes)
3. Be fast (which I think that it is)
4. Simplify editing records in batch
5. Provide a set of programming tools to solve my own local needs

Page 5: #mashcat: Evolving MarcEdit: Leveraging Semantic Data in MarcEdit

MarcEdit Today

Page 6: #mashcat: Evolving MarcEdit: Leveraging Semantic Data in MarcEdit

Three development rules I follow

MarcEdit is a real-world metadata tool
◦ Tool is designed to provide workflows for the data problems facing libraries right now

MarcEdit is MARC agnostic
◦ Too many metadata tools are anglo-centric; MarcEdit has been designed to work within the very heterogeneous metadata environment that we find ourselves in today, which includes:
  ◦ Support for MARC (not a particular flavor*)
  ◦ Near universal character set support (because the world is bigger than MARC8 and UTF8)
  ◦ Support for a wide range of library metadata standards beyond MARC

MarcEdit is one part of the larger library metadata tooling environment
◦ So integrations with OCLC, ILSs (when possible), OpenRefine are important

* And if something assumes MARC21 – call it out

Page 7: #mashcat: Evolving MarcEdit: Leveraging Semantic Data in MarcEdit

So how does any of this relate to semantic data in Libraries?

http://musictheorysite.com/img/dwight_question.jpg

Page 8: #mashcat: Evolving MarcEdit: Leveraging Semantic Data in MarcEdit

A lot of metadata people I talk to fall into two camps

Page 9: #mashcat: Evolving MarcEdit: Leveraging Semantic Data in MarcEdit

BibFrame and Linked Data as RDA 2.0

BibFrame

http://www.wired.com/wp-content/uploads/archive/news/images/full/duke_nukem_frever_f.16807.jpg
http://astronomy.nmsu.edu/cwc/Group/magiicat/images/magiicat-logo.gif

Linked Data

Page 10: #mashcat: Evolving MarcEdit: Leveraging Semantic Data in MarcEdit

BibFrame and linked data as datacorns

https://whatsthebigdata.files.wordpress.com/2015/10/datascience_unicorn.png?w=640

Page 11: #mashcat: Evolving MarcEdit: Leveraging Semantic Data in MarcEdit

I prefer a more practical outlook…

https://www.etsy.com/search?q=unicorn+cat+hat

Page 12: #mashcat: Evolving MarcEdit: Leveraging Semantic Data in MarcEdit

MarcEdit’s MARCNext

MarcEdit’s MARCNext is a first attempt to start having this discussion by:

1. Integrating a linked data framework into MarcEdit, including tooling for:
   a. JSON-LD
   b. SPARQL
   c. RDF
2. Providing catalogers with proof of concept tools to begin experimenting with their own data
3. Providing a method to integrate semantic concepts into legacy data
4. Providing a toolset that MarcEdit can use to build new tools

Page 13: #mashcat: Evolving MarcEdit: Leveraging Semantic Data in MarcEdit

Let’s take a closer look at two tools

Link Identifiers Tool
◦ This tool embeds URIs into MARC data
◦ Is rules driven (i.e., not MARC21 centric)
◦ Supports ~24 different in-use data sources

Validate Headings Tool
◦ First tool in MarcEdit to make use of the tool’s linked data platform and available data services to provide a real-world application.

Page 14: #mashcat: Evolving MarcEdit: Leveraging Semantic Data in MarcEdit

Link Identifiers Tool

Page 15: #mashcat: Evolving MarcEdit: Leveraging Semantic Data in MarcEdit

Link Identifiers Tool

Initially released in Aug. 2014[1] as a proof of concept for testing the linked data framework being developed in MarcEdit
◦ Initially only processed LCSH and NAF

Currently, I’ve profiled ~24 data sources, and the tool can be integrated in MarcEdit’s Task Workflow.
◦ Translation profiles are currently in flux, as I work with a PCC group developing recommendations for embedding URIs in MARC records.
◦ Working on a process that would allow users to self-profile identifier services, so long as they supported JSON-LD or SPARQL.

[1] MarcEdit’s Research Toolkit: MARCNext: http://blog.reeset.net/archives/1359

Page 16: #mashcat: Evolving MarcEdit: Leveraging Semantic Data in MarcEdit

Link Identifiers Tool

Tool has evolved over the last year to utilize a rules based configuration (example):

<field type="bibliographic">
  <tag>630</tag>
  <ind2 value="0" vocab="naf_lcsh" />
  <ind2 value="1" vocab="lcshac" />
  <ind2 value="2" vocab="mesh" />
  <subfields>adfkqnp</subfields>
  <uri>0</uri>
  <special_instructions>mixed</special_instructions>
</field>
<field type="authority|bibliographic">
  <tag>336</tag>
  <subfields>a</subfields>
  <index>2</index>
  <uri>0</uri>
</field>
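To make the rules-driven idea concrete, here is a minimal Python sketch of how a configuration like the one above could be read and queried. It is not MarcEdit's own code. The <rules> wrapper element, the function names, and the reading of each element (ind2 maps a second-indicator value to a vocabulary, subfields lists the subfields that form the heading, uri names the subfield that receives the URI, index presumably names the subfield that identifies the vocabulary) are assumptions inferred from the example.

```python
import xml.etree.ElementTree as ET

# The two <field> rules from the slide, wrapped in a hypothetical <rules>
# root so the fragment parses as a single XML document.
RULES_XML = """
<rules>
  <field type="bibliographic">
    <tag>630</tag>
    <ind2 value="0" vocab="naf_lcsh" />
    <ind2 value="1" vocab="lcshac" />
    <ind2 value="2" vocab="mesh" />
    <subfields>adfkqnp</subfields>
    <uri>0</uri>
    <special_instructions>mixed</special_instructions>
  </field>
  <field type="authority|bibliographic">
    <tag>336</tag>
    <subfields>a</subfields>
    <index>2</index>
    <uri>0</uri>
  </field>
</rules>
"""


def load_rules(xml_text):
    """Return a dict keyed by MARC tag holding the settings of each rule."""
    rules = {}
    for field in ET.fromstring(xml_text).iter("field"):
        rules[field.findtext("tag")] = {
            "record_types": field.get("type", "").split("|"),
            # Assumed meaning: second indicator value -> vocabulary to query.
            "vocab_by_ind2": {
                e.get("value"): e.get("vocab") for e in field.findall("ind2")
            },
            "subfields": list(field.findtext("subfields", default="")),
            "index_subfield": field.findtext("index"),  # None when absent
            "uri_subfield": field.findtext("uri"),
        }
    return rules


def vocab_for(rules, tag, ind2):
    """Look up the vocabulary a rule assigns to a tag/second-indicator pair."""
    rule = rules.get(tag)
    return None if rule is None else rule["vocab_by_ind2"].get(ind2)


rules = load_rules(RULES_XML)
print(vocab_for(rules, "630", "2"))
```

Because the mapping lives in data rather than code, supporting another field or another MARC flavor means adding a rule, not changing the program.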

Page 17: #mashcat: Evolving MarcEdit: Leveraging Semantic Data in MarcEdit

Linked Identifiers: Turning strings

=336 \\$atext$btxt$2rdacontent

=337 \\$aunmediated$bn$2rdamedia

=338 \\$avolume$bnc$2rdacarrier

=600 10$6880-06$aHu, Zongnan,$d1896-1962$vDiaries.

=650 \0$aGenerals$zChina$vBiography.

=650 \0$aGenerals$zTaiwan$vBiography.

=600 17$aHu, Zongnan,$d1896-1962.$2fast$0(OCoLC)fst00131171

=650 \7$aGenerals.$2fast$0(OCoLC)fst00939841

=651 \7$aChina.$2fast$0(OCoLC)fst01206073

=651 \7$aTaiwan.$2fast$0(OCoLC)fst01207854

=655 \7$aDiaries.$2lcgft

=655 \7$aAutobiographies.$2lcgft

Page 18: #mashcat: Evolving MarcEdit: Leveraging Semantic Data in MarcEdit

Linked Identifiers: into strings+

=336 \\$atext$btxt$2rdacontent$0http://id.loc.gov/vocabulary/contentTypes/txt

=337 \\$aunmediated$bn$2rdamedia$0http://id.loc.gov/vocabulary/mediaTypes/n

=338 \\$avolume$bnc$2rdacarrier$0http://id.loc.gov/vocabulary/carriers/nc

=600 10$6880-06$aHu, Zongnan,$d1896-1962$vDiaries.$0http://id.loc.gov/authorities/names/n84029846

=650 \0$aGenerals$zChina$vBiography.$0http://id.loc.gov/authorities/subjects/sh2008105087

=650 \0$aGenerals$zTaiwan$vBiography.$0http://id.loc.gov/authorities/subjects/sh2008105117

=600 17$aHu, Zongnan,$d1896-1962.$2fast$0http://id.worldcat.org/fast/00131171

=650 \7$aGenerals.$2fast$0http://id.worldcat.org/fast/00939841

=651 \7$aChina.$2fast$0http://id.worldcat.org/fast/01206073

=651 \7$aTaiwan.$2fast$0http://id.worldcat.org/fast/01207854

=655 \7$aDiaries.$2lcgft$0http://id.loc.gov/authorities/genreForms/gf2014026085

=655 \7$aAutobiographies.$2lcgft$0http://id.loc.gov/authorities/genreForms/gf2014026047
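Some of the rewrites in this before/after example follow directly from data already in the field, and those can be sketched without calling any service. The Python sketch below is an illustration, not MarcEdit's implementation. It covers only the two patterns visible above: RDA 33x fields, where the $b code is appended to a prefix chosen by $2, and FAST headings, where the (OCoLC)fst control number in $0 becomes an id.worldcat.org URI. The URI prefixes are copied from the example records; the function names are hypothetical. Headings that need a label lookup, such as the LCSH and lcgft fields, are returned unchanged.

```python
import re

# URI prefixes as they appear in the example records, keyed by $2 code.
RDA_PREFIXES = {
    "rdacontent": "http://id.loc.gov/vocabulary/contentTypes/",
    "rdamedia": "http://id.loc.gov/vocabulary/mediaTypes/",
    "rdacarrier": "http://id.loc.gov/vocabulary/carriers/",
}
FAST_PREFIX = "http://id.worldcat.org/fast/"


def split_line(line):
    """Split a mnemonic MARC line into its head (tag and indicators)
    and a list of (subfield code, value) pairs."""
    head, _, data = line.partition("$")
    subfields = [(chunk[0], chunk[1:]) for chunk in data.split("$") if chunk]
    return head, subfields


def link(line):
    """Return the line with a URI in $0 when it can be derived locally."""
    head, subfields = split_line(line)
    values = dict(subfields)
    source = values.get("2")
    uri = None
    if source in RDA_PREFIXES and "b" in values:
        uri = RDA_PREFIXES[source] + values["b"]
    elif source == "fast":
        match = re.fullmatch(r"\(OCoLC\)fst(\d+)", values.get("0", ""))
        if match:
            uri = FAST_PREFIX + match.group(1)
            # Drop the old control number so $0 holds only the URI.
            subfields = [(c, v) for c, v in subfields if c != "0"]
    if uri is None:
        return line  # needs a service lookup; outside this sketch
    subfields.append(("0", uri))
    return head + "".join("$" + code + value for code, value in subfields)


print(link(r"=338 \\$avolume$bnc$2rdacarrier"))
print(link(r"=651 \7$aChina.$2fast$0(OCoLC)fst01206073"))
```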

Page 19: #mashcat: Evolving MarcEdit: Leveraging Semantic Data in MarcEdit

Example

Page 20: #mashcat: Evolving MarcEdit: Leveraging Semantic Data in MarcEdit

Linked Data tools

Things that are still hard:

◦ Most identifier services use their own rules for data escaping, and they aren’t documented
◦ Many services are still not well suited for this work
  ◦ Anything that doesn’t provide an option to do an exact lookup (like ULAN, AAT, or VIAF) requires additional processing to ensure that results match the queried term.
◦ Many services are little “p” production in that lots of look-ups can (and do) cause problems.
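The "additional processing" needed when a service only offers keyword search can be sketched as a filter that keeps a candidate only when its label equals the queried term after normalization. The normalization rules here (case folding, stripping diacritics and trailing punctuation, collapsing whitespace) are an assumption made for illustration, not the rules MarcEdit applies, and the candidate list uses made-up example.org URIs in place of a real service response.

```python
import unicodedata


def normalize(label):
    """Fold case, strip diacritics and trailing punctuation, and collapse
    whitespace so that equivalent headings compare equal."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_marks.casefold().rstrip(" .,;:").split())


def exact_matches(query, candidates):
    """Keep only the (label, uri) pairs whose label matches the query."""
    wanted = normalize(query)
    return [(label, uri) for label, uri in candidates if normalize(label) == wanted]


# Hypothetical keyword-search results; the URIs are placeholders.
candidates = [
    ("Generals' spouses", "http://example.org/id/1"),
    ("Generals", "http://example.org/id/2"),
    ("Generals in art", "http://example.org/id/3"),
]
print(exact_matches("Generals.", candidates))
```

When the filter returns zero or several matches, the safer choice is to leave the field alone rather than embed a URI that may be wrong.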

Page 21: #mashcat: Evolving MarcEdit: Leveraging Semantic Data in MarcEdit

Validate Headings

Automated authority control processing

◦ Utilizes id.loc.gov
◦ Provides reports of data that isn’t currently “authorized”
◦ Provides options for generating brief authorities
◦ Extracts for further data processing
◦ Ability to embed URIs during validation
  ◦ If URIs are present – they are used rather than a direct look up
◦ Automatic heading correction when variants are encountered
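The decision order described in these bullets (prefer an embedded URI, otherwise look up the heading, correct variants, report anything unmatched) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not how MarcEdit is implemented: a small in-memory index stands in for id.loc.gov, the Result type and status names are hypothetical, and the variant form is invented for the example. Only the authorized heading and its URI come from the sample record shown earlier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    status: str          # "authorized", "corrected", or "unmatched"
    heading: str         # heading to keep in the record
    uri: Optional[str]   # URI to embed, if one was found


def validate(heading, uri, by_uri, by_label, variants):
    """Validate one heading against a local stand-in for an authority service."""
    # 1. An embedded URI is used instead of a label lookup.
    if uri and uri in by_uri:
        preferred = by_uri[uri]
        status = "authorized" if preferred == heading else "corrected"
        return Result(status, preferred, uri)
    # 2. Otherwise look the heading up directly.
    if heading in by_label:
        return Result("authorized", heading, by_label[heading])
    # 3. A known variant is corrected to the authorized form.
    if heading in variants:
        preferred = variants[heading]
        return Result("corrected", preferred, by_label[preferred])
    # 4. Anything else goes into the "not authorized" report.
    return Result("unmatched", heading, None)


# Authorized heading and URI from the sample record.
by_label = {
    "Hu, Zongnan, 1896-1962": "http://id.loc.gov/authorities/names/n84029846",
}
by_uri = {uri: label for label, uri in by_label.items()}
# Invented variant form, for illustration only.
variants = {"Hu, Tsung-nan, 1896-1962": "Hu, Zongnan, 1896-1962"}

print(validate("Hu, Tsung-nan, 1896-1962", None, by_uri, by_label, variants))
```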

Page 22: #mashcat: Evolving MarcEdit: Leveraging Semantic Data in MarcEdit

Validate Headings

Validate Headings can be run from inside the MarcEditor, or outside as a standalone tool

Page 23: #mashcat: Evolving MarcEdit: Leveraging Semantic Data in MarcEdit

Example

Page 24: #mashcat: Evolving MarcEdit: Leveraging Semantic Data in MarcEdit

Continued work…

Would like to continue to add additional vocabularies

Expand headings validation to more than just LCSH/NAF

Include Linking Profiles for UNIMARC

Using Linked Data sources for sameas subject generation

Page 25: #mashcat: Evolving MarcEdit: Leveraging Semantic Data in MarcEdit

Questions

Contact Information:

Terry Reese
Email: [email protected] or [email protected]
Website: http://marcedit.reeset.net
Help: http://marcedit.reeset.net/help