linked data for digital humanities - big data summerschool
Linked (Open) Data for Digital Humanities Big Data in Society 2016 Amsterdam Summer school
Victor de Boer With input from Christophe Guéret, Serge ter Braake,
Niels Ockeloen, Antske Fokkens, Dirk Roorda, Lora Aroyo, Johan Oomen, Oana Inel, Jan Wielemaker, Jeroen Entjes
Victor de Boer Web & Media Group, CS, Vrije Universiteit Amsterdam Netherlands Institute for Sound and Vision
Linked Data for Cultural Heritage Linked Data for Digital History Linked Data for Development
Digital Humanities: Part of researchers' effort has moved from physical archives to digital ones
Img: www.doaks.org, www.dkrz.de
Cross-researcher, -institution, -project, -domain collaborations
“Digital History”
http://armstrongdigitalhistory.org/, http://www.vcdh.virginia.edu/courses/fall07/hius401-f/, http://digitalhistory.unl.edu/essays/thomasessay.php, http://www.philipvickersfithian.com/2013/05/gender-in-stacks-on-managing-small.html
“That is great. I would love that… …but my research questions are slightly different.”
Img:Monty Python
Aging
Data Tool
C. Guéret based on http://redmonk.com/jgovernor/2007/04/05/why-applciations-are-like-fish-and-data-is-like-wine/
Data as end-product: Do not bake the data into the tool. Build tools on top of the data. Make sure others can do so as well.
Fig: C. Guéret
Machine-readable Web
Humans are very good at reading (web) documents and distilling information.
Computers are good at calculating, combining and filtering information, but they are very bad at reading documents.
We need to write down (web) data, information and knowledge in a way that machines can understand.
http://info.cern.ch/Proposal.html
Tim Berners-Lee (the inventor of the Web) and the Semantic Web
Four V’s of Big Data http://www.ey.com/GL/en/Services/Advisory/EY-big-data-big-opportunities-big-challenges
How does all this work?
• Data, not documents
• Structured data
• Graph (networked) data!
• W3C Web standards stack: URIs, HTTP, RDF, RDFa, RDFS, OWL, SPARQL, etc.
Four rules of Linked Data
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF).
4. Include links to other URIs, so that they can discover more things.
http://www.w3.org/DesignIssues/LinkedData.html
Use HTTP URIs for Things
A Uniform Resource Identifier (URI) identifies a resource by name, e.g. http://rijksmuseum.nl/data/painting1
I can go there using HTTP (dereference) and then I get information about it:
• HTML page for humans
• RDF data for machines
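As a rough sketch of what dereferencing can look like in practice, the Python below builds two HTTP requests for the same URI, one asking for the HTML page and one asking for RDF. The URI is the example from the slide and may not resolve. The helper name `build_request` and the choice of `text/turtle` as the RDF media type are assumptions for illustration, and no request is actually sent.

```python
from urllib.request import Request

# Example URI from the slide; it is illustrative and may not resolve.
PAINTING_URI = "http://rijksmuseum.nl/data/painting1"


def build_request(uri: str, media_type: str) -> Request:
    """Build (but do not send) a GET request asking for one representation."""
    return Request(uri, headers={"Accept": media_type})


# Same URI, two representations: the Accept header tells the server which one we want.
html_request = build_request(PAINTING_URI, "text/html")    # page for humans
rdf_request = build_request(PAINTING_URI, "text/turtle")   # data for machines

print(html_request.full_url, html_request.get_header("Accept"))
print(rdf_request.full_url, rdf_request.get_header("Accept"))
```

Passing `rdf_request` to `urllib.request.urlopen` would send it; whether RDF comes back depends on the server supporting content negotiation.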
Resource Description Framework (RDF)
Semantic Web standard for writing down data and information as triples (Subject, Relation, Object):
<Painting001, has_location, Amsterdam>
As a graph: Painting001 --has_location--> Amsterdam
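To make the triple idea concrete, here is a minimal sketch that stores the slide's example as a (subject, predicate, object) tuple and writes it out as one N-Triples line. The `http://example.org/` namespace, the `Triple` class and the `to_ntriples` helper are all assumptions made for this example; a real project would more likely use an RDF library.

```python
from typing import NamedTuple

# Hypothetical namespace for the example; not a real vocabulary.
EX = "http://example.org/"


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def to_ntriples(triple: Triple) -> str:
    """Serialize a triple whose three terms are all URIs as one N-Triples line."""
    return f"<{triple.subject}> <{triple.predicate}> <{triple.obj}> ."


# <Painting001, has_location, Amsterdam> from the slide, with full URIs.
painting_location = Triple(EX + "Painting001", EX + "has_location", EX + "Amsterdam")
print(to_ntriples(painting_location))
```

Because every term is a URI, any other dataset can point at the same painting or the same city simply by reusing those URIs.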
KB NEWSPAPERS
Dutch-Asiatic Shipping “VOC Opvarenden”
Jur Leinenga Matthias van Rossum
Elbing voyages Archangel voyages
DIFFERENT but LINKED DATAMODELS BASED ON COMPETENCY QUESTIONS
dss:Record gzmvoc:Telling
gzmvoc:telling-1046-De_Berkel
__bnode_1
gzmvoc:aziatischeBemanning
dss:Ship gzmvoc:Schip
gzmvoc:schip-1046-De_Berkel
dss:has_ship gzmvoc:schip
"1046"
“Schip”
“De Berkel”
rdfs:label dss:scheepsnaam
gzmvoc:scheepsnaam
dss:ShipType gzmvoc:Scheepstype gzmvoc:type-Ship
dss:has_shiptype gzmvoc:has_shiptype
gzmvoc:scheepstype
“21”
“Moorse mattroosen”
dss:azRegistratieKop
gzmvoc:azAantalMatrozen
gzmvoc:telling
gzmvoc:heeft DAS heenreis
dss:Record das:Voyage
das:voyage-1918_61
mdb:Schip1 mdb:Kof
mdb:scheepsType
das:ShipX das:Kofship
das:typeOfShip
Aat:Kof
Aat:Platbodems
skos:exactMatch
skos:exactMatch
skos:exactMatch
Link to other datasets
Identifying ships
Rather than irreversible normalization, we can add (sameAs) links
– Robin Ponstein
mdb:Alberdina1 mdb:Alberdina2
owl:sameAs
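The point about reversible identification can be sketched in a few lines: the original records stay untouched, the sameAs links are kept as separate data, and groups of identical ships are computed on demand. The two identifiers come from the slide. The function names and the union-find approach are assumptions for illustration, not the project's actual method.

```python
from collections import defaultdict

# owl:sameAs links kept as data next to the original records (pair from the slide).
same_as_links = [("mdb:Alberdina1", "mdb:Alberdina2")]


def find(parent: dict, item: str) -> str:
    """Return the representative of the group containing item."""
    parent.setdefault(item, item)
    while parent[item] != item:
        parent[item] = parent[parent[item]]  # path halving keeps chains short
        item = parent[item]
    return item


def identity_groups(links):
    """Group identifiers that are connected by sameAs links."""
    parent = {}
    for left, right in links:
        parent[find(parent, left)] = find(parent, right)
    groups = defaultdict(set)
    for item in list(parent):
        groups[find(parent, item)].add(item)
    return list(groups.values())


print(identity_groups(same_as_links))
```

If a link later turns out to be wrong, removing it from `same_as_links` is enough to undo the identification; nothing in the source records was overwritten.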
Provenance (1)
Individual named graphs have provenance information:
• Who made it (people/software?)
• Based on what source
• Content confidence
• PROV-O vocabulary
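One way to picture provenance on named graphs is to store statements as quads (triple plus graph name) and then describe the graph itself. The sketch below does that with plain tuples. `prov:wasAttributedTo` and `prov:wasDerivedFrom` are PROV-O properties; `ex:confidence`, every `ex:` identifier, the confidence value and the function name are hypothetical and chosen only for illustration.

```python
# Quads: (subject, predicate, object, graph). The graph name lets us attach
# provenance to a whole set of statements at once.
quads = [
    ("mdb:Alberdina1", "owl:sameAs", "mdb:Alberdina2", "ex:links-graph-1"),
]

# Statements about the named graph itself.
# prov:* properties are from PROV-O; ex:confidence and all ex: names are hypothetical.
provenance = [
    ("ex:links-graph-1", "prov:wasAttributedTo", "ex:some-researcher"),
    ("ex:links-graph-1", "prov:wasDerivedFrom", "ex:some-archival-source"),
    ("ex:links-graph-1", "ex:confidence", 0.8),
]


def provenance_of(triple, quads, provenance):
    """Return {graph: {predicate: value}} for every graph containing the triple."""
    graphs = {g for (s, p, o, g) in quads if (s, p, o) == triple}
    return {
        g: {p: o for (s, p, o) in provenance if s == g}
        for g in graphs
    }


link = ("mdb:Alberdina1", "owl:sameAs", "mdb:Alberdina2")
print(provenance_of(link, quads, provenance))
```

A researcher who distrusts a particular source or tool can then filter out whole graphs instead of inspecting statements one by one.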
MEDIA HISTORIANS AND RESEARCHERS
Media researcher Lars Arve Røssland of the University of Bergen. (Photo: Andreas R. Graven)
EXPLORATORY SEARCH
Digital Hermeneutics: The combination of digital (Web) technology and theory of interpretation
Four data sources
OPENIMAGES.EU 300 News videos (1920s-1970s)
DELPHER Radio News Bulletins 2210 Scripts (1945-1985)
AMSTERDAM MUSEUM 3541 Objects (1950-1989)
TROPENMUSEUM ~3000 objects (20th C)
ENTITY EXTRACTION
CROWDTRUTH.ORG
ENTITY EXTRACTION
EVENTS CROWDSOURCING AND LINKING TO CONCEPTS THROUGH CROWDTRUTH.ORG
SEGMENTATION & KEYFRAMES
LINKING EVENTS AND CONCEPTS TO KEYFRAMES
LINKED DATA KNOWLEDGE GRAPH
DIVE:MEDIA OBJECT SEM:EVENT
SEM:PLACE
SEM:TIME
SEM:ACTOR
SKOS:CONCEPT
OA:ANNOTATION
PLACE
PLACE
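The knowledge graph above can be explored by following links from one media object to an event and on to other objects. The sketch below shows that idea on a toy graph. `sem:hasPlace` and `sem:hasActor` are Simple Event Model properties; `ex:depicts`, all `ex:` identifiers and both function names are hypothetical and not taken from the actual DIVE data.

```python
# Tiny illustrative graph; every ex: identifier is made up.
# ex:depicts is a hypothetical link from a media object to an event.
graph = [
    ("ex:video1", "rdf:type", "dive:MediaObject"),
    ("ex:video1", "ex:depicts", "ex:event1"),
    ("ex:event1", "rdf:type", "sem:Event"),
    ("ex:event1", "sem:hasPlace", "ex:place1"),
    ("ex:event1", "sem:hasActor", "ex:actor1"),
    ("ex:photo1", "ex:depicts", "ex:event1"),
]


def match(graph, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def related_media(graph, media):
    """Media objects that depict an event also depicted by `media`."""
    events = [o for (_, _, o) in match(graph, s=media, p="ex:depicts")]
    found = set()
    for event in events:
        for (s, _, _) in match(graph, p="ex:depicts", o=event):
            if s != media:
                found.add(s)
    return sorted(found)


print(related_media(graph, "ex:video1"))
print(match(graph, s="ex:event1", p="sem:hasPlace"))
```

Each hop (object to event, event to place or actor, and back to other objects) is one step of exploratory search; an interface only needs to offer those hops as clickable links.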
DIGITAL SUBMARINE UI
https://www.flickr.com/photos/benjcarson/245171885 https://www.flickr.com/photos/mibuchat/2774251415
INFINITY OF EXPLORATION
Linked Data allows for new types of (Humanities) research
• Graphs, not tables
• Distributed, heterogeneous data
• Integrate datasets in a flexible way
• Cross-collection, -institution, -domain
• Re-use background knowledge
• Provenance fits very well
• Linked Data is the (technically) best way to publish and share your research data