Text to data
DESCRIPTION
Talk / rant about Marc21-derived metadata
TRANSCRIPT
![Page 1: Text to data](https://reader033.vdocument.in/reader033/viewer/2022042816/558dfd1b1a28abb50d8b45d7/html5/thumbnails/1.jpg)
+
Text to data
MashCat 2012
Ed Chamberlain
![Page 2: Text to data](https://reader033.vdocument.in/reader033/viewer/2022042816/558dfd1b1a28abb50d8b45d7/html5/thumbnails/2.jpg)
+Me
Librarian (systems)
Data ‘munger’
Data consumer?
![Page 3: Text to data](https://reader033.vdocument.in/reader033/viewer/2022042816/558dfd1b1a28abb50d8b45d7/html5/thumbnails/3.jpg)
+The way it used to be …
Control over record consumption
Control over record environment
Control over technology
![Page 4: Text to data](https://reader033.vdocument.in/reader033/viewer/2022042816/558dfd1b1a28abb50d8b45d7/html5/thumbnails/4.jpg)
+
![Page 5: Text to data](https://reader033.vdocument.in/reader033/viewer/2022042816/558dfd1b1a28abb50d8b45d7/html5/thumbnails/5.jpg)
+Competition …
No longer the single authority for content and description
Commercial, social and academic discovery mechanisms
Explosion of digital content
Illusion of ‘all on the web’
![Page 6: Text to data](https://reader033.vdocument.in/reader033/viewer/2022042816/558dfd1b1a28abb50d8b45d7/html5/thumbnails/6.jpg)
+Fit for purpose?
Studies into the Google Generation / ‘Generation Y’ (1)
Cambridge Arcadia IRIS report 2009 (2)
Preference for search engine over catalogue
Online over in-building
Trust tutors and peers over Librarian
Still respect the library ‘brand’
1) “The Google generation: the information behaviour of the researcher of the future”, Aslib Proceedings, vol. 60, issue 4, doi:10.1108/00012530810887953
2) Arcadia IRIS Project report - http://arcadiaproject.lib.cam.ac.uk/docs/Report_IRIS_final.pdf
![Page 7: Text to data](https://reader033.vdocument.in/reader033/viewer/2022042816/558dfd1b1a28abb50d8b45d7/html5/thumbnails/7.jpg)
+ Keyword based discovery services
New ways to exploit old data
Relevancy ranking
Rich faceting
Greater linking
Search is the new browse
Repositories and archives
Is the OPAC dead?
Improve catalogues
![Page 8: Text to data](https://reader033.vdocument.in/reader033/viewer/2022042816/558dfd1b1a28abb50d8b45d7/html5/thumbnails/8.jpg)
+Different but the same?
Catalogue data is now:
Consumed as keywords (not left anchored access points)
Faceted (not browsed)
Supplemented
Transformed
Merged
Amalgamated
![Page 9: Text to data](https://reader033.vdocument.in/reader033/viewer/2022042816/558dfd1b1a28abb50d8b45d7/html5/thumbnails/9.jpg)
+Prepare for the future …
‘Use case you’ve not yet thought of’
‘Consumer as producer’
‘Pro-Am’
‘Free from silo’
Developers as well as readers
Preference for data over text
![Page 10: Text to data](https://reader033.vdocument.in/reader033/viewer/2022042816/558dfd1b1a28abb50d8b45d7/html5/thumbnails/10.jpg)
+
Library data
Our local catalogues
National / international aggregations
Joe Public
Teenage software developer / hacker
Booksellers
Web start-ups
Search engines
Wikipedia
Other libraries
Research group website
![Page 11: Text to data](https://reader033.vdocument.in/reader033/viewer/2022042816/558dfd1b1a28abb50d8b45d7/html5/thumbnails/11.jpg)
+Libraries have a lot to offer
Bibliographic data linked to many aspects of successful teaching and research
Citation lists – measure output
Shared bibliography – core of research group work
Reading lists – backbone of undergraduate teaching
High quality data needed for re-use
Not all possible whilst data resides in the library ‘silo’
![Page 12: Text to data](https://reader033.vdocument.in/reader033/viewer/2022042816/558dfd1b1a28abb50d8b45d7/html5/thumbnails/12.jpg)
+
'Open metadata creates the opportunity for enhancing impact through the release of descriptive data about library, archival and museum resources. It allows such data to be made freely available and innovatively reused to serve researchers, teachers, students, service providers and the wider community in the UK and internationally.'
http://discovery.ac.uk
![Page 13: Text to data](https://reader033.vdocument.in/reader033/viewer/2022042816/558dfd1b1a28abb50d8b45d7/html5/thumbnails/13.jpg)
+Open data releases …
![Page 14: Text to data](https://reader033.vdocument.in/reader033/viewer/2022042816/558dfd1b1a28abb50d8b45d7/html5/thumbnails/14.jpg)
+But …
Is Marc21 the right format for developers (or libraries)?
Is it easy to convert into something more palatable?
![Page 15: Text to data](https://reader033.vdocument.in/reader033/viewer/2022042816/558dfd1b1a28abb50d8b45d7/html5/thumbnails/15.jpg)
+What can we do with an ISBN?
Build Union catalogues
Find existing or alternative records (copy catalogue)
Find related works (XISBN, ISBNThing)
Match and mash with resources on the web:
Images
Reviews
Citations and references
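Matching records on ISBN usually needs one normalised key, because the same book may carry a 10-character or a 13-digit ISBN. A minimal sketch in Perl, assuming the input is already a clean ISBN-10 string; `isbn10_to_13` is a hypothetical helper name, and the ISBN-10 check digit is not verified here:

```perl
use strict;
use warnings;

# Hypothetical helper: convert a clean ISBN-10 into its 978-prefixed
# ISBN-13 form so both forms can be matched on a single key.
# Assumption: the incoming ISBN-10 check digit is not validated.
sub isbn10_to_13 {
    my ($isbn10) = @_;
    return unless defined $isbn10 && $isbn10 =~ /^[0-9]{9}[0-9X]$/;

    # Drop the old check digit and add the 978 prefix.
    my @digits = split //, '978' . substr($isbn10, 0, 9);

    # ISBN-13 check digit: weights alternate 1, 3, 1, 3, ...
    my $sum = 0;
    for my $i (0 .. $#digits) {
        $sum += $digits[$i] * ($i % 2 ? 3 : 1);
    }
    my $check = (10 - $sum % 10) % 10;

    return join('', @digits) . $check;
}

print isbn10_to_13('0306406152'), "\n";   # 9780306406157
```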
![Page 16: Text to data](https://reader033.vdocument.in/reader033/viewer/2022042816/558dfd1b1a28abb50d8b45d7/html5/thumbnails/16.jpg)
+020 - ISBN
What catalogue record users want:
Accuracy
Contextualization
Access point
Something legible to read
What data consumers want:
– Accuracy
– Contextualization
– Access point
– Reusability
– Granularity
![Page 17: Text to data](https://reader033.vdocument.in/reader033/viewer/2022042816/558dfd1b1a28abb50d8b45d7/html5/thumbnails/17.jpg)
+So …
Take ISBN from an 020$a
my $isbn = $record->field('020')->as_string("a");
0123456789(pbk)
(pbk) ?
Is it the same as (.pbk) I noticed earlier?
I’m a developer – I can solve this …
Regex /^([0-9]+)/ - just gets the leading numbers …
Oh hang on, don’t some ISBNs end in X?
And all that information on hardback / paperback is lost …
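One way round this might look like the sketch below: split the 020$a into the ISBN itself (allowing a final X) and whatever qualifier follows it. `parse_020a` and the `%BINDING` table are hypothetical names for illustration, the table covers only two abbreviations, and no check-digit validation is attempted:

```perl
use strict;
use warnings;

# Hypothetical lookup table; real catalogue data has many more variants.
my %BINDING = (pbk => 'paperback', hbk => 'hardback');

# Hypothetical helper: split a raw 020$a into ISBN and binding qualifier.
sub parse_020a {
    my ($raw) = @_;
    return unless defined $raw;

    # Leading run of digits (hyphens and spaces tolerated), optionally
    # ending in X, followed by whatever text the cataloguer added.
    my ($isbn, $rest) = $raw =~ /^\s*([0-9][0-9 -]*[0-9Xx])\s*(.*)$/
        or return;

    $isbn =~ s/[ -]//g;
    $isbn = uc $isbn;
    return unless length($isbn) == 10 || length($isbn) == 13;

    my %id = (id => $isbn, type => 'isbn');

    # Take the first word of the qualifier, so "(pbk)", "(.pbk)" and
    # "(pbk.)" all come out as "pbk".
    my ($qualifier) = lc($rest) =~ /([a-z]+)/;
    $id{rel} = $BINDING{$qualifier} // $qualifier if defined $qualifier;

    return \%id;
}

my $parsed = parse_020a('0123456789(pbk)');
print "$parsed->{id} $parsed->{rel}\n";   # 0123456789 paperback
```

The qualifier is still free text underneath, so anything outside the lookup table passes through unchanged.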
![Page 18: Text to data](https://reader033.vdocument.in/reader033/viewer/2022042816/558dfd1b1a28abb50d8b45d7/html5/thumbnails/18.jpg)
+Non Marc …
<identifier type="isbn" relation="hardback">0123456789x</identifier>
"identifier": {"id": "0123456789", "type": "isbn", "rel": "hardback"}
<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_100045> <http://purl.org/dc/terms/identifier> "urn:isbn:2853990060" .
<http://data.lib.cam.ac.uk/id/type/46657eb180382684090fda2b5670335d> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/ontology/bibo/Book> .
![Page 19: Text to data](https://reader033.vdocument.in/reader033/viewer/2022042816/558dfd1b1a28abb50d8b45d7/html5/thumbnails/19.jpg)
+Advantages
Self describing (if you read English)
Granular
Data NOT text for display (although this can be easily generated)
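Going from data back to display text is the easy direction. A minimal sketch, assuming an identifier held as a hash with `id`, `type` and an optional `rel` key; `display_identifier` is a hypothetical helper, not part of any library:

```perl
use strict;
use warnings;

# Hypothetical helper: build a display string from a structured identifier.
sub display_identifier {
    my ($id) = @_;
    my $text = uc($id->{type}) . ' ' . $id->{id};
    # The binding is optional, so only append it when present.
    $text .= " ($id->{rel})" if defined $id->{rel};
    return $text;
}

my %identifier = (id => '0123456789', type => 'isbn', rel => 'hardback');
print display_identifier(\%identifier), "\n";   # ISBN 0123456789 (hardback)
```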
![Page 20: Text to data](https://reader033.vdocument.in/reader033/viewer/2022042816/558dfd1b1a28abb50d8b45d7/html5/thumbnails/20.jpg)
+$100 …
"author" : [
{
"birthDate" : "1832",
"firstname" : " James",
"deathDate" : "1929",
"name" : "Greenwood, James",
"lastname" : "Greenwood"
}
]
• 1001_ |a Greenwood, James, |d 1832-1929.
• Greenwood, James, 1832-1929.
![Page 21: Text to data](https://reader033.vdocument.in/reader033/viewer/2022042816/558dfd1b1a28abb50d8b45d7/html5/thumbnails/21.jpg)
+
# Assumes $record (a MARC::Record) and %exportRecord are defined elsewhere.
sub trim { my $s = shift; $s =~ s/^\s+|\s+$//g; return $s; }

my @exportAuthors = ();
if ($record->field('100')) {
    my @authors = $record->field('100');
    foreach my $eachAuthor (@authors) {
        my %exportAuthor = ();
        my $authorFull = trim($eachAuthor->subfield('a'));
        $exportAuthor{'name'} = $authorFull;
        my @parsed_author = split(/,/, $authorFull);
        $exportAuthor{'lastname'}  = $parsed_author[0];
        $exportAuthor{'firstname'} = $parsed_author[1];
        my $dates = $eachAuthor->subfield('d');
        # The glorious 100$d disassembled ...
        if ($dates) {
            # First of all, skip ca. and fl., which aren't real birth or death dates
            if ($dates =~ /fl\.|ca\./) {
                # do nothing
            }
            # Otherwise, if the date contains a hyphen, assume a range
            # (this also works for unterminated dates such as "1832-")
            elsif ($dates =~ /-/) {
                my @range = split(/-/, $dates);
                $exportAuthor{'birthDate'} = trim($range[0]);
                if ($range[1]) {
                    $exportAuthor{'deathDate'} = trim($range[1]);
                }
            }
            # No hyphen: assume a single date. Look for a definitive birth event with a 'b.' ...
            elsif ($dates =~ /\bb\./) {
                $exportAuthor{'birthDate'} = trim($dates);
            }
            # ... or a definitive death event with a 'd.'
            elsif ($dates =~ /\bd\./) {
                $exportAuthor{'deathDate'} = trim($dates);
            }
            # Final assumption for a single recorded date with no hyphen: assume it's a birth date?
            else {
                $exportAuthor{'birthDate'} = trim($dates);
            }
        }
        # Assemble author object
        push(@exportAuthors, \%exportAuthor);
    }
    # Add list of authors to export object
    $exportRecord{'author'} = \@exportAuthors;
}
![Page 22: Text to data](https://reader033.vdocument.in/reader033/viewer/2022042816/558dfd1b1a28abb50d8b45d7/html5/thumbnails/22.jpg)
+How is this being solved?
Fix it at the source:
RDA
Marc transition initiative
Other initiatives – BL, OCLC linked data releases
ONIX
MODS
![Page 23: Text to data](https://reader033.vdocument.in/reader033/viewer/2022042816/558dfd1b1a28abb50d8b45d7/html5/thumbnails/23.jpg)
+Pragmatism: the end of big standards
Adoption of one new standard (or several) for its own sake is pointless
Fit in around changing needs of libraries and systems
Data needs to be flexible and re-purposable
No standard to ‘rule them all’ in the post Marc21 world
![Page 24: Text to data](https://reader033.vdocument.in/reader033/viewer/2022042816/558dfd1b1a28abb50d8b45d7/html5/thumbnails/24.jpg)
+If we do nothing?