Linked Open Data for Digital Humanities
DESCRIPTION
This presentation was given to Digital Humanities students on March 7. The goal is to introduce Linked Open Data (LOD) and showcase what can be done with it.
TRANSCRIPT
Linked Open Data for Digital Humanities
What is Linked Open Data and why is it relevant for you?
Christophe Guéret (@cgueret)
Open Data
“A piece of data or content is open if anyone is free to use, reuse, and redistribute it — subject only, at most, to the requirement to attribute and/or share-alike.”
http://opendefinition.org/
Linked Data
"a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF."
http://linkeddata.org/
Linked Open Data
● Linked Open Data = Open Data + Linked Data
● Interconnected data sets that are on the Web and free to use
● 5-star scheme http://5stardata.info/
Why does it matter for DH?
● Digital Humanities use a lot of data and study relations between things
● Data acquisition and curation represent a LOT of effort for data consumers
● Linked Open Data is a good way to
  ○ Facilitate your own work (as a data consumer)
  ○ Facilitate others' work (as a data publisher)
Data found on the Web
● You get the following table as a CSV file
● And that Excel table from somewhere else

Kennis      Stad
Christophe  Amsterdam
David       Parijs

Ville      Pays
Paris      France
Amsterdam  Pays-Bas
And you want to integrate it
● Data integration issues
  ○ Kennis, Stad, Ville, Pays?
  ○ Parijs = Paris?
  ○ Amsterdam = Amsterdam?
● A lot of work for the (uninformed) consumer!
(Kennis, Stad) table + (Ville, Pays) table = ?
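The integration problem can be sketched in a few lines of Python. This is only an illustration: the rows are copied from the two tables, and the helper name `naive_join` is made up for the example. It joins on the raw city string, the way a spreadsheet lookup would.

```python
# Rows copied from the two tables; the variable names are illustrative.
people = [("Christophe", "Amsterdam"), ("David", "Parijs")]   # Kennis, Stad
cities = [("Paris", "France"), ("Amsterdam", "Pays-Bas")]     # Ville, Pays


def naive_join(people, cities):
    """Join the two tables on the raw city string (hypothetical helper)."""
    country_of = dict(cities)
    # dict.get returns None when the city string has no exact match.
    return [(person, city, country_of.get(city)) for person, city in people]


joined = naive_join(people, cities)
for row in joined:
    print(row)
```

"Amsterdam" happens to be spelled the same way in both tables, so that row joins. "Parijs" and "Paris" are different strings, so the row for David gets no country unless the consumer adds a mapping by hand.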
Linked Data approach
● Assign unique identifiers (URIs) to concepts and things
● Create a "triple": connect the identifiers with labelled, directed edges
dbpedia:Amsterdam --dbo:country--> dbpedia:Netherlands
Why does it solve the issue?
● Shift some of the data integration load to the provider side
  ○ Clarify the semantics of the data
  ○ Refer to identifiers rather than names
● There is only one "dbpedia:Amsterdam" at http://dbpedia.org/resource/Amsterdam
● Labels used for the edges are published by an external authority
Some vocabulary publishers
From triples to the Web of Data
● Every triple is a bit of factual information
● Because nodes are re-used across triples, the union of all the triples is a graph
● The "Web of Data" is a pre-integrated, semantically clear, data set ready to be used!
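A rough sketch of how shared identifiers turn separate sets of triples into one graph, using plain Python tuples. The DBpedia URIs follow the pattern shown on the slides; the `http://example.org/` namespace, the `livesIn` property, and both helper functions are assumptions made for this example.

```python
DBR = "http://dbpedia.org/resource/"
DBO = "http://dbpedia.org/ontology/"
EX = "http://example.org/"  # hypothetical namespace for people and livesIn

# Two providers publish triples independently but reuse the same city URIs.
source_a = {
    (EX + "Christophe", EX + "livesIn", DBR + "Amsterdam"),
    (EX + "David", EX + "livesIn", DBR + "Paris"),
}
source_b = {
    (DBR + "Paris", DBO + "country", DBR + "France"),
    (DBR + "Amsterdam", DBO + "country", DBR + "Netherlands"),
}

# Integration is a set union: no column matching, no name reconciliation.
graph = source_a | source_b


def objects(graph, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def countries_of(graph, person):
    """Follow person -> city -> country across the merged graph."""
    found = set()
    for city in objects(graph, person, EX + "livesIn"):
        found |= objects(graph, city, DBO + "country")
    return found


print(countries_of(graph, EX + "David"))
```

Because both sources point at the same node for Paris, the path from David to France exists as soon as the triples are put together.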
Exploring relations in the graph
Let's make a social network!
● The network
  ○ A node per European country
  ○ An edge means a shared official language
  ○ Label the edges with the languages
  ○ Label the nodes with the country names
● Data source
  ○ DBpedia SPARQL http://dbpedia.org/sparql
● Visualisation tool
  ○ Gephi https://gephi.org/
SPARQL?
● Query language for Linked Open Data
● Describe part of the graph and use variables
dbpedia:Amsterdam --dbo:country--> ?Country
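To give a feel for what "describe part of the graph and use variables" means, here is a toy matcher in Python. It is not how a SPARQL engine is implemented; it handles a single triple pattern over a small in-memory list, and the function name `match` is made up for the example.

```python
# A tiny graph written with the prefixed names used on the slides.
graph = [
    ("dbpedia:Amsterdam", "dbo:country", "dbpedia:Netherlands"),
    ("dbpedia:Paris", "dbo:country", "dbpedia:France"),
]


def match(graph, pattern):
    """Return one binding dict per triple that fits the pattern.

    Terms starting with '?' are variables; any other term must match exactly.
    """
    results = []
    for triple in graph:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must bind to the same value both times.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


answers = match(graph, ("dbpedia:Amsterdam", "dbo:country", "?Country"))
print(answers)
```

The fixed terms select which edges qualify, and each variable collects whatever sits in its position on the matching edges.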
Suggested book to read
The query in SPARQL
SELECT DISTINCT ?Source ?Target ?Label WHERE {
  ?country1 a <http://dbpedia.org/class/yago/EuropeanCountries>.
  ?country1 <http://dbpedia.org/ontology/officialLanguage> ?language.
  ?country2 a <http://dbpedia.org/class/yago/EuropeanCountries>.
  ?country2 <http://dbpedia.org/ontology/officialLanguage> ?language.
  FILTER (?country1 != ?country2)
  ?country1 <http://www.w3.org/2000/01/rdf-schema#label> ?Source.
  ?country2 <http://www.w3.org/2000/01/rdf-schema#label> ?Target.
  ?language <http://www.w3.org/2000/01/rdf-schema#label> ?Label.
  FILTER ((LANG(?Source) = "en") && (LANG(?Target) = "en") && (LANG(?Label) = "en"))
}
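A query like this can also be sent from a script instead of the web form. The sketch below uses only the Python standard library and assumes the endpoint follows the standard SPARQL protocol (a `query` parameter on a GET request) and honours an `Accept: text/csv` header; the function names and the short sample query are made up for the example.

```python
import urllib.parse
import urllib.request

ENDPOINT = "http://dbpedia.org/sparql"

# A one-pattern query, matching the dbpedia:Amsterdam example on the slides.
SAMPLE_QUERY = (
    "SELECT ?Country WHERE { "
    "<http://dbpedia.org/resource/Amsterdam> "
    "<http://dbpedia.org/ontology/country> ?Country }"
)


def build_request(query, endpoint=ENDPOINT):
    """Prepare a GET request that asks for CSV results (assumed supported)."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "text/csv"})


def run_query(query, endpoint=ENDPOINT, timeout=30):
    """Send the query and return the response body as text (needs network)."""
    request = build_request(query, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


request = build_request(SAMPLE_QUERY)
print(request.full_url)
# To actually contact the endpoint: print(run_query(SAMPLE_QUERY))
```

Building the request is kept separate from sending it so the URL can be inspected without network access.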
Making the network
● Get the query from
  ○ https://gist.github.com/cgueret/5098706
● Copy and paste it into
  ○ http://dbpedia.org/sparql
● Change the result format to "CSV"
● Press "Run Query" and save the result
● Open Gephi
● Start a new project
● Import the CSV file in the "Data Laboratory"
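Since the query treats both countries symmetrically, each shared language is likely to appear twice in the result, once in each direction. If that is unwanted, the CSV could be cleaned before importing it. The sketch below is one possible way to do it in Python; the function names are made up, and the sample rows are invented placeholders rather than real query output.

```python
import csv
import io

# Invented placeholder rows using the Source, Target, Label columns of the query.
sample_csv = (
    "Source,Target,Label\n"
    "Country A,Country B,Language X\n"
    "Country B,Country A,Language X\n"
)


def undirected_edges(csv_text):
    """Keep the first row of every mirrored (Source, Target, Label) pair."""
    seen = set()
    edges = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # frozenset ignores direction, so A-B and B-A share one key.
        key = (frozenset((row["Source"], row["Target"])), row["Label"])
        if key not in seen:
            seen.add(key)
            edges.append((row["Source"], row["Target"], row["Label"]))
    return edges


def to_csv(edges):
    """Serialise the edges back to CSV with the same three columns."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Label"])
    writer.writerows(edges)
    return out.getvalue()


edges = undirected_edges(sample_csv)
print(to_csv(edges))
```

The cleaned text can be saved to a file and imported in place of the raw result.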
There is not only DBpedia ...
Last words
● Look for data sources published as Linked Open Data (RDF); this can save you time
● Consider publishing your own data as Linked Open Data
● There is much more to say...
  ○ Using SPARQL within R (very easily)
    ■ http://linkedscience.org/tools/sparql-package-for-r/
  ○ Reasoning capabilities of triple stores
  ○ Creating and extending vocabularies