Linked Data in the Digital Humanities: Skills Workshop for Realising the Opportunities of the...
DESCRIPTION
This workshop will introduce participants to Linked Data, a key semantic web technology, and its uses in the digital humanities. Through examples of Linked Data websites and applications, we will explore how Linked Data is being used by individual digital humanities scholars, by organisations such as the BBC and the Central Statistics Office, and by cultural heritage institutions worldwide. We will make comparisons to other approaches to structuring data (including markup and metadata approaches such as TEI and XML) and discuss best practices for creating and reusing Linked Data (such as the importance of identifiers and standard vocabularies). Participants will also be introduced to tools for creating and exploring Linked Data. The workshop will also include a hands-on exercise in creating Linked Data.
Linked Data in the Digital Humanities was a Skills Workshop (http://dri.ie/skills-workshops), part of Realising the Opportunities of Digital Humanities (http://dri.ie/realising-opportunities-digital-humanities).
Presenters: Jodi Schneider and Michael Hausenblas, with support from Stefan Decker, Nuno Lopes, and Bahareh Heravi, all of the Digital Enterprise Research Institute, National University of Ireland Galway.
TRANSCRIPT
Copyright 2011 Digital Enterprise Research Institute. All rights reserved.
Digital Enterprise Research Institute www.deri.ie
Enabling Networked Knowledge
Linked Data in the Digital Humanities
Jodi Schneider & Michael Hausenblas with Stefan Decker & Nuno Lopes
Thursday 25th October 2012
1
Realising the Opportunities of Digital Humanities, National University of Ireland, Maynooth
(2) 2
Introduction
(3) 3
What is Linked Data? Why use it? What are some examples?
How do Linked Data applications differ from conventional ones?
How is Linked Data different from other structured data used in digital humanities? (e.g. TEI, XML)
What are the best practices for creating Linked Data?
Objectives
(4) 4
Using identifiers to enable access, to add structure, and to link to other stuff
What is Linked Data?
(5) 5
Why use Linked Data?
We do not want data silos!
Photo credit “nepatterson”, Flickr
(7) 7
A “Web” where documents are available for download on the Internet but there would be no hyperlinks among them
Imagine…
Slide credit: Ivan Herman
(10) 10
We need a proper infrastructure for a real Web of Data
  data is available on the Web
    • accessible via standard Web technologies
  data are interlinked over the Web
  i.e., data can be integrated over the Web
We need Linked Data
Data on the Web is not enough…
Slide credit: Ivan Herman
This is what we want!
Photo credit “kxlly”, Flickr
(12) 12
Where Linked Data is used
Mass Media: BBC, New York Times, Guardian
Scholarly Publishers: Nature, CrossRef
Data Publishers: US Data.gov, Data.gov.uk, Central Statistics Office
Libraries
(13) 13
Libraries where Linked Data is used
(14) 14
http://www.ted.com/talks/tim_berners_lee_the_year_open_data_went_worldwide.html
Data, data everywhere
(15) 15
Cost and benefits of Open Data – 5 ★ Open Data
(17) 17
How do Linked Data applications differ from conventional ones?
[Diagram comparing a conventional back-end with a Linked Data back-end. Labels: app, records, app; core record; DBpedia record; data.gov.ie record; British Library record; Europeana record]
up to 80% re-use possible
time & cost reduction!
(20) 20
Examples of Linked Data applications with Irish data …
http://county-rank.data-gov.ie
http://school-explorer.data-gov.ie
(24) 24
Anatomy of the Authors N' Books application
http://srvgal85.deri.ie/ab-app/
[Architecture diagram. Labels: user; client-side; DERI server-side; other server; Dydra; DBpedia; Europeana]
ab-proxy.py (120 LOC)
ab-app.js (181 LOC)
(27) 27
What Would You Use Linked Data for?
(28) 28
How is Linked Data different from XML & TEI
(29) 29
Using identifiers to enable access, to add structure, and to link to other stuff
What is Linked Data?
(30) 30
XML as a tree
Document
  Paragraph
    Sentence  Sentence  Sentence
  Paragraph
    Sentence  Sentence
Remember, everything must nest properly!
We use family tree terms: parent, child, sibling, ancestor, and descendant.
Slide credit: Susan Schreibman
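The tree on this slide can be sketched in Python with the standard library's `xml.etree.ElementTree`. This is only an illustration: the element names and sentence texts are made up to mirror the Document / Paragraph / Sentence picture, and they are not TEI markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup mirroring the slide: one document, two paragraphs,
# three sentences in the first and two in the second.
xml_text = """
<document>
  <paragraph>
    <sentence>One.</sentence>
    <sentence>Two.</sentence>
    <sentence>Three.</sentence>
  </paragraph>
  <paragraph>
    <sentence>Four.</sentence>
    <sentence>Five.</sentence>
  </paragraph>
</document>
"""

root = ET.fromstring(xml_text)

# ElementTree only stores parent -> child links, so build the reverse map.
parent_of = {child: parent for parent in root.iter() for child in parent}

paragraphs = list(root)                 # children of the document
first_sentence = paragraphs[0][0]       # first child of the first paragraph

# Siblings are the other children of the same parent.
siblings = [s.text for s in parent_of[first_sentence] if s is not first_sentence]
sentences_per_paragraph = [len(p) for p in paragraphs]

print(root.tag, sentences_per_paragraph, siblings)
```

Every element here has exactly one parent, which is what "everything must nest properly" amounts to.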
(31) 31
RDF: More than trees
(32) 32
What does TEI make explicit?
structural divisions within a text: title-page, chapter, scene, stanza, line, etc.
typographical elements: changes in typeface, special characters, etc.
other textual features: grammatical structures, location of illustrations, variant forms, etc.
Slide credit: Susan Schreibman
(34) 34
Best Practice 1: Identifiers
(35) 35
Using identifiers to enable access, to add structure, and to link to other stuff
What is Linked Data?
(36) 36
http://www.youtube.com/watch?v=TJfrNo3Z-DU
Why do we need identifiers?
(37) 37
(38) 38
(39) 39
Language ambiguity
dog =
Slide Credit: Karen Coyle
(40) 40
Language ambiguity
dog (lang=en)
hund (lang=de)
perro (lang=es)
chien (lang=fr)
IDa87nn3
Slide Credit: Karen Coyle
(41) 41
UPC code
Slide Credit: Karen Coyle
(42) 42
ISBN
Slide Credit: Karen Coyle
(43) 43
URIs as Identifiers
Slide Credit: Michael Hausenblas
A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource. [RFC3986]
Syntax URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
Example
  foo://example.com:8042/over/there?name=ferret#nose
  \_/   \______________/\_________/ \_________/ \__/
   |           |            |            |        |
scheme     authority       path        query   fragment
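As a quick check of the five components, the RFC 3986 example can be split with Python's standard `urllib.parse.urlsplit`. Note that Python calls the authority component `netloc`; the `components` dictionary is just a name chosen for this sketch.

```python
from urllib.parse import urlsplit

# The example URI from RFC 3986, as shown on the slide.
parts = urlsplit("foo://example.com:8042/over/there?name=ferret#nose")

# Map Python's attribute names onto the RFC's component names.
components = {
    "scheme": parts.scheme,
    "authority": parts.netloc,   # Python's name for the authority
    "path": parts.path,
    "query": parts.query,
    "fragment": parts.fragment,
}

for name, value in components.items():
    print(f"{name:10} {value}")
```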
(44) 44
Example: DBpedia
http://dbpedia.org/page/Maynooth
(46) 46
Using Linked Data 1: Querying & Filtering
(47) 47
Using identifiers to enable access, to add structure, and to link to other stuff
What is Linked Data?
(48) 48
http://www.youtube.com/watch?v=TJfrNo3Z-DU#t=1m20.5s
Querying with identifiers
(49) 49
Linked Data Application Revisited
(50) 50
One Query
(51) 51
Best Practice 2: Vocabularies
(52) 52
Dublin Core
Title, Creator, Date, Subject, Contributor, Coverage, Description, Format, Identifier, Publisher, Relation, Rights, Source, Type
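To illustrate how a shared vocabulary is used, here is a minimal Python sketch that describes one book with four Dublin Core elements. The namespace `http://purl.org/dc/elements/1.1/` is the Dublin Core element set's own; the book identifier and the `describe` helper are hypothetical and exist only for this example.

```python
# Namespace of the Dublin Core element set.
DC = "http://purl.org/dc/elements/1.1/"


def describe(subject, **fields):
    """Turn keyword fields into (subject, predicate, value) statements,
    using the Dublin Core namespace for every predicate."""
    return [(subject, DC + name, value) for name, value in fields.items()]


# Hypothetical identifier for the book; any stable URI would do.
book = "http://example.org/book/the-glass-palace"

triples = describe(
    book,
    title="The Glass Palace",
    creator="Ghosh, Amitav",
    date="2000",
    publisher="Harper Collins",
)

for subject, predicate, value in triples:
    print(subject, predicate, repr(value))
```

Because the predicates are well-known URIs rather than local column names, another dataset that also uses Dublin Core can be read with the same code.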
(53) 53
Friend of a Friend (FOAF)
Slide credit: Dan Brickley
(60) 60
Which vocabularies might you use? What might you need vocabularies to represent?
There are *many* vocabularies!
(61) 61
Quick Recap
(62) 62
What is Linked Data? Why use it? What are some examples?
Objectives
(63) 63
Using identifiers to enable access, to add structure, and to link to other stuff
What is Linked Data?
We do not want data silos!
Photo credit “nepatterson”, Flickr
This is what we want!
Photo credit “kxlly”, Flickr
(67) 67
What is Linked Data? Why use it? What are some examples?
How do Linked Data applications differ from conventional ones?
Objectives
[Diagram comparing a conventional back-end with a Linked Data back-end. Labels: app, records, app; core record; DBpedia record; data.gov.ie record; British Library record; Europeana record]
up to 80% re-use possible
time & cost reduction!
(69) 69
What is Linked Data? Why use it? What are some examples?
How do Linked Data applications differ from conventional ones?
How is Linked Data different from other structured data used in digital humanities? (e.g. TEI, XML)
Objectives
(70) 70
RDF: More than trees
(71) 71
What is Linked Data? Why use it? What are some examples?
How do Linked Data applications differ from conventional ones?
How is Linked Data different from other structured data used in digital humanities? (e.g. TEI, XML)
What are the best practices for creating Linked Data?
Objectives
(72) 72
What is Linked Data? Why use it? What are some examples?
How do Linked Data applications differ from conventional ones?
How is Linked Data different from other structured data used in digital humanities? (e.g. TEI, XML)
What are the best practices for creating Linked Data?
Objectives
(73) 73
Best Practice 1: Identifiers
(74) 74
Need Identifiers!
dog =
Slide Credit: Karen Coyle
(75) 75
Example: DBpedia
http://dbpedia.org/page/Maynooth
(76) 76
Best Practice 2: Vocabularies
(77) 77
Dublin Core
Title, Creator, Date, Subject, Contributor, Coverage, Description, Format, Identifier, Publisher, Relation, Rights, Source, Type
(78) 78
Best Practice 3: Connect to the community
(79) 79
(80) 80
(81) 81
(82) 82
(83) 83
(84) 84
(85) 85
(86) 86
Question Time!
(87) 87
Acknowledgements
Thanks to our funders!
(89) 89
Thanks for slides!
Dan Brickley, Karen Coyle, Siegfried Handschuh, Michael Hausenblas, Ivan Herman, Jodi Schneider, Susan Schreibman
(90) 90
Appendix & Backup Slides
(91) 91
Additional Examples
(94) 94
(95) 95
Advanced Topics
(96) 96
Demo on Queries
(97) 97
Making Linked Data 1: Convert Existing Data
(98) 98
Demo Revisited: Behind the Scenes, Spreadsheet to Linked Data
http://www.youtube.com/watch?v=1irwjiUOh_4
(99) 99
Using Linked Data 2: Data Integration
AKA “An introduction to the Semantic Web (Through an Example)” by Ivan Herman
(100) 100
Map the various data onto an abstract data representation
  make the data independent of its internal representation…
Merge the resulting representations
Start making queries on the whole!
  queries not possible on the individual data sets
The rough structure of data integration
(102) 102
A simplified bookstore data (dataset “A”)
ISBN        Author  Title             Publisher  Year
000651409X  id_xyz  The Glass Palace  id_qpr     2000

ID      Name           Homepage
id_xyz  Ghosh, Amitav  http://www.amitavghosh.com

ID      Publisher’s name  City
id_qpr  Harper Collins    London
(103) 103
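1st: export your data as a set of relations
[Graph diagram, written here as one relation per line; “(author)” and “(publisher)” are unnamed nodes]
http://…isbn/000651409X  a:title      The Glass Palace
http://…isbn/000651409X  a:year       2000
http://…isbn/000651409X  a:author     (author)
(author)                 a:name       Ghosh, Amitav
(author)                 a:homepage   http://www.amitavghosh.com
http://…isbn/000651409X  a:publisher  (publisher)
(publisher)              a:p_name     Harper Collins
(publisher)              a:city       London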
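A rough Python sketch of this export step, assuming the three tables of dataset “A” are already loaded as dictionaries. The slide elides the URI prefix, so `ISBN_BASE` below is a hypothetical stand-in, and `export_a` and the `_:` node labels are likewise invented for illustration.

```python
# Hypothetical stand-in for the URI prefix that the slide elides.
ISBN_BASE = "http://example.org/isbn/"

# The three tables of dataset "A".
books = [{"isbn": "000651409X", "author": "id_xyz", "title": "The Glass Palace",
          "publisher": "id_qpr", "year": "2000"}]
authors = {"id_xyz": {"name": "Ghosh, Amitav",
                      "homepage": "http://www.amitavghosh.com"}}
publishers = {"id_qpr": {"p_name": "Harper Collins", "city": "London"}}


def export_a(books, authors, publishers):
    """Export the tables as a set of (subject, relation, object) triples."""
    triples = set()
    for book in books:
        subject = ISBN_BASE + book["isbn"]
        triples.add((subject, "a:title", book["title"]))
        triples.add((subject, "a:year", book["year"]))

        # Internal row ids become unnamed nodes in the graph.
        author_node = "_:" + book["author"]
        triples.add((subject, "a:author", author_node))
        for relation, value in authors[book["author"]].items():
            triples.add((author_node, "a:" + relation, value))

        publisher_node = "_:" + book["publisher"]
        triples.add((subject, "a:publisher", publisher_node))
        for relation, value in publishers[book["publisher"]].items():
            triples.add((publisher_node, "a:" + relation, value))
    return triples


graph_a = export_a(books, authors, publishers)
for triple in sorted(graph_a):
    print(triple)
```

The foreign keys `id_xyz` and `id_qpr` disappear as data and survive only as edges, which is the sense in which the export is independent of the internal representation.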
(104) 104
Relations form a graph
  the nodes refer to the “real” data or contain some literal
  how the graph is represented in machine is immaterial for now
Some notes on exporting the data
(106) 106
Another bookstore data (dataset “F”)
     A                   B                      C           D
1    ID                  Titre                  Traducteur  Original
2    ISBN 2020386682     Le Palais des Miroirs  $A12$       ISBN 0-00-651409-X
3
4
5
6    ID                  Auteur
7    ISBN 0-00-651409-X  $A11$
8
9
10   Nom
11   Ghosh, Amitav
12   Besse, Christianne
(107) 107
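2nd: export your second set of data
[Graph diagram, written here as one relation per line; “(auteur)” and “(traducteur)” are unnamed nodes]
http://…isbn/2020386682  f:titre       Le palais des miroirs
http://…isbn/2020386682  f:original    http://…isbn/000651409X
http://…isbn/2020386682  f:traducteur  (traducteur)
(traducteur)             f:nom         Besse, Christianne
http://…isbn/000651409X  f:auteur      (auteur)
(auteur)                 f:nom         Ghosh, Amitav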
(108) 108
3rd: start merging your data
[Graph diagram: the exported graphs of dataset “F” and dataset “A” shown side by side. Both contain the node http://…isbn/000651409X.]
(109) 109
3rd: start merging your data (cont)
[Same graph diagram, with the node http://…isbn/000651409X marked in both graphs.]
Same URI!
(110) 110
3rd: start merging your data
[Graph diagram: the two graphs joined at the single node http://…isbn/000651409X. That node keeps a:title, a:year, a:author and a:publisher from dataset “A”, gains f:auteur from dataset “F”, and is the target of f:original from http://…isbn/2020386682.]
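If each graph is held as a set of triples, the merge is a plain set union; a minimal Python sketch under that assumption follows. `BASE` is a hypothetical stand-in for the URI prefix the slides elide, and the `_:` labels for the unnamed nodes are invented for this example.

```python
BASE = "http://example.org/isbn/"      # hypothetical stand-in for the elided prefix
ORIGINAL = BASE + "000651409X"
TRANSLATION = BASE + "2020386682"

dataset_a = {
    (ORIGINAL, "a:title", "The Glass Palace"),
    (ORIGINAL, "a:year", "2000"),
    (ORIGINAL, "a:author", "_:a_author"),
    ("_:a_author", "a:name", "Ghosh, Amitav"),
    ("_:a_author", "a:homepage", "http://www.amitavghosh.com"),
    (ORIGINAL, "a:publisher", "_:a_publisher"),
    ("_:a_publisher", "a:p_name", "Harper Collins"),
    ("_:a_publisher", "a:city", "London"),
}

dataset_f = {
    (TRANSLATION, "f:titre", "Le palais des miroirs"),
    (TRANSLATION, "f:original", ORIGINAL),
    (TRANSLATION, "f:traducteur", "_:f_traducteur"),
    ("_:f_traducteur", "f:nom", "Besse, Christianne"),
    (ORIGINAL, "f:auteur", "_:f_auteur"),
    ("_:f_auteur", "f:nom", "Ghosh, Amitav"),
}

# Merging is a set union: identical URIs denote the same node automatically.
merged = dataset_a | dataset_f

# Subjects described in both datasets are the points where the graphs join.
shared_subjects = {s for s, _, _ in dataset_a} & {s for s, _, _ in dataset_f}

# After the merge, the shared node carries relations from both vocabularies.
relations_of_original = sorted(p for s, p, _ in merged if s == ORIGINAL)

print(shared_subjects)
print(relations_of_original)
```

No matching logic is written anywhere: the two graphs connect only because both datasets used the same URI for the original book.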
(111) 111
User of data “F” can now ask queries like:
  “give me the title of the original”
    • well, … « donne-moi le titre de l’original »
This information is not in the dataset “F”…
…but can be retrieved by merging with dataset “A”!
Start making queries…
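The query on this slide can be sketched as two pattern matches over a set of triples. This is a toy matcher in plain Python, not SPARQL; `BASE`, `match` and `title_of_original` are hypothetical names, and only the triples the query needs are included.

```python
BASE = "http://example.org/isbn/"      # hypothetical stand-in for the elided prefix
ORIGINAL = BASE + "000651409X"
TRANSLATION = BASE + "2020386682"

# Only the triples needed for this query; the full exports have more.
dataset_f = {
    (TRANSLATION, "f:titre", "Le palais des miroirs"),
    (TRANSLATION, "f:original", ORIGINAL),
}
dataset_a = {
    (ORIGINAL, "a:title", "The Glass Palace"),
    (ORIGINAL, "a:year", "2000"),
}


def match(graph, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [t for t in graph
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def title_of_original(graph, translation):
    """Follow f:original from the translation, then read a:title."""
    titles = []
    for _, _, original in match(graph, s=translation, p="f:original"):
        titles += [o for _, _, o in match(graph, s=original, p="a:title")]
    return titles


only_f = title_of_original(dataset_f, TRANSLATION)
merged_answer = title_of_original(dataset_a | dataset_f, TRANSLATION)

print(only_f)          # dataset "F" alone cannot answer
print(merged_answer)   # the merged graph can
```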
(112) 112
We “feel” that a:author and f:auteur should be the same
But an automatic merge does not know that!
Let us add some extra information to the merged data:
  a:author same as f:auteur
  both identify a “Person”
  a term that a community may have already defined:
    • a “Person” is uniquely identified by his/her name and, say, homepage
    • it can be used as a “category” for certain type of resources
However, more can be achieved…
(113) 113
3rd revisited: use the extra knowledge
[Graph diagram: the merged graph, in which a:author and f:auteur now lead to a single node for Ghosh, Amitav (a:name, a:homepage, f:nom). Two r:type links point to http://…foaf/Person.]
(114) 114
User of dataset “F” can now query:
  “donne-moi la page d’accueil de l’auteur de l’original”
    • well… “give me the home page of the original’s ‘auteur’”
The information is not in datasets “F” or “A”…
…but was made available by:
  merging datasets “A” and “F”
  adding three simple extra statements as an extra “glue”
Start making richer queries!
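A simplified Python sketch of how such “glue” changes what a query can find. It applies only one of the extra statements, namely that f:auteur is the same relation as a:author, and does not model the “Person” typing. `BASE`, `SAME_AS`, `objects` and `homepage_of_original_author` are hypothetical names, and the graph holds only the triples this query touches.

```python
BASE = "http://example.org/isbn/"      # hypothetical stand-in for the elided prefix
ORIGINAL = BASE + "000651409X"
TRANSLATION = BASE + "2020386682"

# The relevant part of the merged graph of datasets "A" and "F".
merged = {
    (TRANSLATION, "f:original", ORIGINAL),
    (ORIGINAL, "f:auteur", "_:f_auteur"),
    ("_:f_auteur", "f:nom", "Ghosh, Amitav"),
    (ORIGINAL, "a:author", "_:a_author"),
    ("_:a_author", "a:name", "Ghosh, Amitav"),
    ("_:a_author", "a:homepage", "http://www.amitavghosh.com"),
}

# The "glue": f:auteur means the same as a:author.
SAME_AS = {"f:auteur": {"a:author"}}


def objects(graph, subject, predicate, same_as=None):
    """Objects reachable from subject via predicate or an equivalent relation."""
    predicates = {predicate} | (same_as or {}).get(predicate, set())
    return {o for s, p, o in graph if s == subject and p in predicates}


def homepage_of_original_author(graph, translation, same_as=None):
    """'Give me the home page of the original's auteur.'"""
    homepages = set()
    for original in objects(graph, translation, "f:original"):
        for person in objects(graph, original, "f:auteur", same_as):
            homepages |= objects(graph, person, "a:homepage")
    return homepages


without_glue = homepage_of_original_author(merged, TRANSLATION)
with_glue = homepage_of_original_author(merged, TRANSLATION, SAME_AS)

print(without_glue)   # empty: the f:auteur node has no homepage
print(with_glue)      # found via the equivalent a:author relation
```

A full implementation would hand the equivalence to a reasoner instead of threading it through each query; the point here is only that one extra statement makes the answer reachable.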
(115) 115
Using, e.g., the “Person”, the dataset can be combined with other sources
For example, data in Wikipedia can be extracted using dedicated tools
  e.g., the “dbpedia” project can extract the “infobox” information from Wikipedia already…
Combine with different datasets
(116) 116
Merge with Wikipedia data
[Graph diagram: the merged graph extended with the node http://dbpedia.org/../Amitav_Ghosh, which has r:type http://…foaf/Person and is connected by foaf:name and w:reference.]
(117) 117
Merge with Wikipedia data
[Graph diagram: as before, with three w:author_of links from http://dbpedia.org/../Amitav_Ghosh to http://dbpedia.org/../The_Hungry_Tide, http://dbpedia.org/../The_Calcutta_Chromosome and http://dbpedia.org/../The_Glass_Palace, plus a w:isbn link.]
(118) 118
Merge with Wikipedia data
[Graph diagram: as before, with a w:born_in link to http://dbpedia.org/../Kolkata, which carries w:long and w:lat.]
(119) 119
It may look like it but, in fact, it should not be…
What happened via automatic means is done every day by Web users!
The difference: a bit of extra rigour so that machines could do this, too
Is that surprising?
(120) 120
We could add extra knowledge to the merged datasets
  e.g., a full classification of various types of library data
  geographical information
  etc.
This is where ontologies, extra rules, etc, come in
  ontologies/rule sets can be relatively simple and small, or huge, or anything in between…
Even more powerful queries can be asked as a result
It could become even more powerful
(121) 121
What did we do?
Data in various formats
Data represented in abstract format
Applications
Map, Expose, …
Manipulate, Query, …
(122) 122
Thanks!