OR2016 - Managing Crowd Sourced Cultural Heritage Datasets
Managing Crowd sourced Cultural Heritage Datasets
National Library of Wales
Glen Robson – Head of Systems
twitter: @glenrobson
Plan
• Background to the National Library of Wales
• Crowd Sourcing projects
– Cymru – 1900 – Wales
– Cynefin
– Shipping Records
– WW1 Book of Remembrance
• Providing storage and access
Data
• Fields:
– Owner
– Tenant
– Use – arable/forest etc.
– Size (acres, roods, perches)
– Tithe value (pounds, shillings, pence)
– Geo-coordinates
• Storing in Fedora
– ALTO
– Open Annotations
• JSON-LD
• RDF/XML
– Indexing in SOLR
– Website in the summer
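The field list above mixes pre-decimal units, so a small sketch of how one transcribed record might be normalised may help. The class and field names are hypothetical and are not taken from the project; only the unit ratios are fixed facts (4 roods to the acre, 40 perches to the rood, 20 shillings to the pound, 12 pence to the shilling).

```python
from dataclasses import dataclass
from typing import Optional

ROODS_PER_ACRE = 4
PERCHES_PER_ROOD = 40
SHILLINGS_PER_POUND = 20
PENCE_PER_SHILLING = 12


@dataclass
class TitheRecord:
    """One transcribed row; a hypothetical model, not the project's schema."""
    owner: str
    tenant: str
    land_use: str
    acres: int
    roods: int
    perches: int
    pounds: int
    shillings: int
    pence: int
    latitude: Optional[float] = None
    longitude: Optional[float] = None

    def area_in_acres(self) -> float:
        # Convert everything to perches first, then back to decimal acres.
        total_perches = (
            (self.acres * ROODS_PER_ACRE + self.roods) * PERCHES_PER_ROOD
            + self.perches
        )
        return total_perches / (ROODS_PER_ACRE * PERCHES_PER_ROOD)

    def value_in_pence(self) -> int:
        # Pre-decimal currency: pounds -> shillings -> pence.
        return (
            (self.pounds * SHILLINGS_PER_POUND + self.shillings)
            * PENCE_PER_SHILLING
            + self.pence
        )
```

Keeping the original acre/rood/perch and pound/shilling/pence values alongside the derived totals means the transcription stays faithful to the source while still being sortable and comparable.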
Shipping Registers
• 544 merchant vessels registered at the port of Aberystwyth
• 1856-1914
• Crew lists – name, position, birth date, reason for leaving, location
• Transcribed by volunteers
• https://www.llgc.org.uk/blog/?p=5716
Data Preservation
• Where do we store this data?
– Catalogue – MARC
– Fedora 3 Repository
• Excel files / RDF
• Data being enhanced
– Currently:
• Triple store (sesame) – preservation?
• https://github.com/LlGC-NLW/shippingrecords
– Fedora 4?
Enhancements
• Linking out
– Places: birth and ship arrival
• Volunteer using OpenRefine to group places
• Will try and match with GeoNames
– Ships:
• Added to Wikidata by NLW Wikipedian in Residence:
– https://tools.wmflabs.org/reasonator/?&q=23927955
– https://tools.wmflabs.org/reasonator/?&q=24027483
– Adding images, size, weight, creation, destruction, link to newspapers
– Dutch Shipping to Newspaper linking: http://bit.ly/1Talish/
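Grouping variant spellings of places, as the volunteer does in OpenRefine, can be illustrated with a simplified key-collision ("fingerprint") approach. This is a sketch in the spirit of that clustering method, not OpenRefine's actual implementation, and the function names are made up for illustration.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(name: str) -> str:
    """Reduce a place name to a normalised key (simplified fingerprint)."""
    # Fold accented characters to plain ASCII where possible.
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Lowercase, trim, and drop punctuation.
    cleaned = re.sub(r"[^\w\s]", "", ascii_name.strip().lower())
    # Sort and de-duplicate tokens so word order does not matter.
    return " ".join(sorted(set(cleaned.split())))


def group_places(names):
    """Group spellings that share the same fingerprint key."""
    groups = defaultdict(list)
    for name in names:
        groups[fingerprint(name)].append(name)
    return dict(groups)
```

Each resulting group is a candidate for a single match against GeoNames; a person still needs to confirm the grouping, since distinct places can share a key.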
Research Potential
• Publishing these datasets as Linked Open Data allows research that wasn’t possible when the items were physical, or even when they were standalone digital objects.
• Physical:
– Travel to Aberystwyth – x hours/days
– Transcribe data in the reading room – x months/years
– Process back home
• Standalone Digital Object
– Transcribe data at home – x months/years
– Process at home
• Linked Open Data Annotations
– Process at home – results in minutes
• Have to take transcriptions on trust
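To give a feel for "results in minutes", here is a sketch of how a researcher might ask a triple store for crew counts by birthplace. The endpoint address, the `ex:` vocabulary, and the `birthPlace` property are all assumptions for illustration; the real dataset's terms may differ. The only fixed part is that the SPARQL Protocol accepts the query text in a `query` parameter.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; replace with the real repository address.
ENDPOINT = "http://localhost:8080/openrdf-sesame/repositories/shipping"

# Hypothetical vocabulary: ex:birthPlace is an assumed property name.
QUERY = """
PREFIX ex: <http://example.org/shipping#>
SELECT ?place (COUNT(?crew) AS ?members)
WHERE { ?crew ex:birthPlace ?place . }
GROUP BY ?place
ORDER BY DESC(?members)
"""


def build_query_url(endpoint: str, sparql: str) -> str:
    """Build a SPARQL Protocol GET URL carrying the query text."""
    return endpoint + "?" + urlencode({"query": sparql.strip()})
```

Fetching that URL with an `Accept` header such as `application/sparql-results+json` would typically return the grouped counts, which is the step that once needed months of transcription in the reading room.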
Simple Annotation Server
• https://github.com/glenrobson/SimpleAnnotationServer
• Stores IIIF Annotations as Linked Open Data
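As an illustration of the kind of annotation involved, the sketch below builds a transcription annotation in the IIIF Presentation 2 / Open Annotation style as a plain Python dictionary. The canvas URI and text are invented, the function name is hypothetical, and the exact shape the Simple Annotation Server stores may differ.

```python
import json


def make_transcription_annotation(canvas_uri, x, y, w, h, text):
    """Return an Open Annotation style dict targeting a canvas region."""
    return {
        "@context": "http://iiif.io/api/presentation/2/context.json",
        "@type": "oa:Annotation",
        # Assumption: transcribed text is treated as painted on the canvas.
        "motivation": "sc:painting",
        "resource": {
            "@type": "cnt:ContentAsText",
            "format": "text/plain",
            "chars": text,
        },
        # The xywh fragment selects the rectangle that was transcribed.
        "on": f"{canvas_uri}#xywh={x},{y},{w},{h}",
    }


if __name__ == "__main__":
    annotation = make_transcription_annotation(
        "https://example.org/iiif/canvas/1", 10, 20, 300, 40, "Example text"
    )
    print(json.dumps(annotation, indent=2))
```

Because the annotation is JSON-LD, the same document can be read as RDF and loaded into a triple store alongside the other datasets.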
Providing Access
• Volunteers want to see results
• Cynefin – funded project
• Shipping records – independent website
• Cymru1900Wales – dataset (CSV + Linked Data)
• Mirador and IIIF options:
– IIIF Search API
– IIIF Ranges – table of contents
– Datasets for download
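For the IIIF Search API option, a client mostly needs to build a `q=` request and read the matching annotations from the response. The sketch below assumes the Content Search API 1.0 response shape (an annotation list with a `resources` array); the service URI is invented and the helper names are hypothetical.

```python
from urllib.parse import urlencode


def search_url(service_uri, term):
    """Build a search request URL using the `q` parameter."""
    return f"{service_uri}?{urlencode({'q': term})}"


def extract_hits(annotation_list):
    """Return (text, target) pairs from a search response dict."""
    return [
        (anno["resource"]["chars"], anno["on"])
        for anno in annotation_list.get("resources", [])
    ]
```

Each target is a canvas URI with an `xywh` fragment, which is what lets a viewer such as Mirador highlight the matching region on the page image.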
Can we do this at scale?
• Cynefin Maps: 1838 to 1947
• Newspapers: 1804 to 1919
• Cymru 1914: 1914 to 1918
• General Digitisation
• Shipping Records: 1856 to 1914
• Crime and Punishment Database: 1730 to 1830
• Welsh Bibliography: 0 to 1970
Summary
• Different methods of crowd sourcing:
– Excel
– Outsourcing – Cynefin and wales1900
– IIIF – Mirador & Simple Annotation Server
– WikiData
• Ideally crowd sourcing platform directly connected to access solution (there will be corrections)
• Transcribing to linked data gives:
– Connection to external data sources (GeoNames, Wikipedia)
– Connection to other resources (newspapers)
– Allows researchers to query the data
• IIIF gives:
– Easy to set up transcription platform
– Work with other people’s content