linked open data for digital humanities

Linked Open Data for Digital Humanities What is Linked Open Data and why is it relevant for you ? Christophe Guéret (@cgueret)

Upload: christophe-gueret

Post on 20-Jan-2015

DESCRIPTION

This presentation was given to Digital Humanities students on March 7. The goal is to introduce LOD and showcase what can be done with it.

TRANSCRIPT

Page 1: Linked Open Data for Digital Humanities

Linked Open Data for Digital Humanities

What is Linked Open Data and why is it relevant for you?

Christophe Guéret (@cgueret)

Page 2: Linked Open Data for Digital Humanities

Open Data

“A piece of data or content is open if anyone is free to use, reuse, and redistribute it — subject only, at most, to the requirement to attribute and/or share-alike.”

http://opendefinition.org/

Page 3: Linked Open Data for Digital Humanities

Linked Data

"a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF."

http://linkeddata.org/

Page 4: Linked Open Data for Digital Humanities

Linked Open Data

● Linked Open Data = Open Data + Linked Data

● Interconnected data sets that are on the Web and free to use

● 5-star scheme http://5stardata.info/

Page 5: Linked Open Data for Digital Humanities

Why does it matter for DH?

● Digital Humanities use a lot of data and study relations between things

● Data acquisition & curation represent a LOT of effort for data consumers

● Linked Open Data is a good way to
  ○ Facilitate your own work (as a data consumer)
  ○ Facilitate others' work (as a data publisher)

Page 6: Linked Open Data for Digital Humanities

Data found on the Web

● You get the following table as a CSV file

    Kennis        Stad
    Christophe    Amsterdam
    David         Parijs

● And that Excel table from somewhere else

    Ville         Pays
    Paris         France
    Amsterdam     Pays-Bas

Page 7: Linked Open Data for Digital Humanities

And you want to integrate it

● Data integration issues
  ○ Kennis, Stad, Ville, Pays?
  ○ Parijs = Paris?
  ○ Amsterdam = Amsterdam?

● A lot of work for the (uninformed) consumer!

    Kennis        Stad
    Christophe    Amsterdam
    David         Parijs

    +

    Ville         Pays
    Paris         France
    Amsterdam     Pays-Bas

    = ?
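The integration problem can be made concrete with a small sketch. It assumes the two tables have already been loaded as lists of dictionaries, and that the consumer has guessed that "Stad" and "Ville" both mean "city"; the function name `naive_join` is hypothetical.

```python
# The two tables from the slide, as they might look after loading the files.
people = [
    {"Kennis": "Christophe", "Stad": "Amsterdam"},
    {"Kennis": "David", "Stad": "Parijs"},
]
cities = [
    {"Ville": "Paris", "Pays": "France"},
    {"Ville": "Amsterdam", "Pays": "Pays-Bas"},
]


def naive_join(people, cities):
    """Join on the literal city string (hypothetical helper)."""
    country_by_city = {row["Ville"]: row["Pays"] for row in cities}
    return [
        (row["Kennis"], row["Stad"], country_by_city.get(row["Stad"]))
        for row in people
    ]


joined = naive_join(people, cities)
for row in joined:
    print(row)
```

The row for David ends up without a country because the string "Parijs" never equals "Paris", even though both name the same city. The "Amsterdam" row does match, but only because the two files happen to spell the name the same way.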

Page 8: Linked Open Data for Digital Humanities

Linked Data approach

● Assign unique identifiers (URIs) to concepts and things

● Create a "triple": connect the identifiers with labelled, directed edges

dbpedia:Amsterdam --dbo:country--> dbpedia:Netherlands
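As a rough illustration, a triple can be modelled as three URIs. The sketch below assumes the slide's `dbpedia:` and `dbo:` prefixes stand for the DBpedia resource and ontology namespaces; the `Triple` class and the `expand` helper are hypothetical names, not part of any library.

```python
from typing import NamedTuple

# Assumed meaning of the prefixes used on the slide.
PREFIXES = {
    "dbpedia": "http://dbpedia.org/resource/",
    "dbo": "http://dbpedia.org/ontology/",
}


class Triple(NamedTuple):
    subject: str    # node the edge starts from
    predicate: str  # label of the directed edge
    obj: str        # node the edge points to


def expand(name: str) -> str:
    """Turn a prefixed name such as 'dbpedia:Amsterdam' into a full URI."""
    prefix, local = name.split(":", 1)
    return PREFIXES[prefix] + local


triple = Triple(
    expand("dbpedia:Amsterdam"),
    expand("dbo:country"),
    expand("dbpedia:Netherlands"),
)
print(triple)
```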

Page 9: Linked Open Data for Digital Humanities

Why does it solve the issue?

● Shift some of the data integration load to the provider side
  ○ Clarify the semantics of the data
  ○ Refer to identifiers rather than names

● There is only one "dbpedia:Amsterdam" at http://dbpedia.org/resource/Amsterdam

● Labels used for the edges are published by an external authority

Page 10: Linked Open Data for Digital Humanities

Some vocabulary publishers

Page 11: Linked Open Data for Digital Humanities
Page 12: Linked Open Data for Digital Humanities

From triples to the Web of Data

● Every triple is a bit of factual information

● Because nodes are re-used across triples, the union of all the triples is a graph

● The "Web of Data" is a pre-integrated, semantically clear, data set ready to be used!
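A minimal sketch of how shared nodes turn separate triples into one graph. Only the Amsterdam triple comes from the slides; the other triples and the `ex:` prefix are made up for illustration, and `build_graph` and `follow` are hypothetical helpers.

```python
from collections import defaultdict

# Illustrative triples; the "ex:" names are invented for this example.
triples = [
    ("dbpedia:Amsterdam", "dbo:country", "dbpedia:Netherlands"),
    ("dbpedia:Paris", "dbo:country", "dbpedia:France"),
    ("ex:Christophe", "ex:livesIn", "dbpedia:Amsterdam"),
    ("ex:David", "ex:livesIn", "dbpedia:Paris"),
]


def build_graph(triples):
    """Index the outgoing edges of every subject node."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


def follow(graph, start, path):
    """Follow a sequence of edge labels from `start`; return the nodes reached."""
    nodes = {start}
    for label in path:
        nodes = {o for n in nodes for (p, o) in graph.get(n, []) if p == label}
    return nodes


graph = build_graph(triples)
print(follow(graph, "ex:Christophe", ["ex:livesIn", "dbo:country"]))
```

Because `dbpedia:Amsterdam` is the object of one triple and the subject of another, the two facts connect without any matching of names.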

Page 13: Linked Open Data for Digital Humanities

Exploring relations in the graph

Page 14: Linked Open Data for Digital Humanities

Let's make a social network!

● The network
  ○ A node per European country
  ○ An edge means a shared official language
  ○ Label the edges with the languages
  ○ Label the nodes with the country names

● Data source
  ○ DBpedia SPARQL http://dbpedia.org/sparql

● Visualisation tool
  ○ Gephi https://gephi.org/

Page 15: Linked Open Data for Digital Humanities

SPARQL?

● Query language for Linked Open Data

● Describe part of the graph and use variables

dbpedia:Amsterdam --dbo:country--> ?Country
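The idea of describing part of the graph with variables can be sketched in a few lines. This is a toy matcher for a single triple pattern, not how a real SPARQL engine is implemented; the `match` function and the sample data are assumptions made for illustration.

```python
# Illustrative data; only the Amsterdam triple comes from the slides.
triples = [
    ("dbpedia:Amsterdam", "dbo:country", "dbpedia:Netherlands"),
    ("dbpedia:Paris", "dbo:country", "dbpedia:France"),
]


def match(pattern, triples):
    """Return one binding per triple that fits the pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must bind to the same value both times.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


print(match(("dbpedia:Amsterdam", "dbo:country", "?Country"), triples))
```

A pattern with more variables, such as `("?City", "dbo:country", "?Country")`, returns one binding per triple in the sample data.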

Suggested book to read

Page 16: Linked Open Data for Digital Humanities

The query in SPARQL

SELECT DISTINCT ?Source ?Target ?Label WHERE {
  ?country1 a <http://dbpedia.org/class/yago/EuropeanCountries> .
  ?country1 <http://dbpedia.org/ontology/officialLanguage> ?language .
  ?country2 a <http://dbpedia.org/class/yago/EuropeanCountries> .
  ?country2 <http://dbpedia.org/ontology/officialLanguage> ?language .
  FILTER (?country1 != ?country2)

  ?country1 <http://www.w3.org/2000/01/rdf-schema#label> ?Source .
  ?country2 <http://www.w3.org/2000/01/rdf-schema#label> ?Target .
  ?language <http://www.w3.org/2000/01/rdf-schema#label> ?Label .
  FILTER ((LANG(?Source) = "en") && (LANG(?Target) = "en") && (LANG(?Label) = "en"))
}
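A query like this can also be sent from a script instead of the web form. The sketch below uses only the Python standard library and assumes the endpoint follows the SPARQL Protocol convention of a `query` parameter and returns CSV when asked for `text/csv`; `build_request` and `run_query` are hypothetical helper names, and the short sample query is only a placeholder.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "http://dbpedia.org/sparql"


def build_request(query: str) -> Request:
    """Prepare a GET request carrying the query; nothing is sent yet."""
    url = ENDPOINT + "?" + urlencode({"query": query})
    # Assumption: the endpoint honours the Accept header for CSV results.
    return Request(url, headers={"Accept": "text/csv"})


def run_query(query: str) -> str:
    """Send the query and return the raw response text (needs network access)."""
    with urlopen(build_request(query)) as response:
        return response.read().decode("utf-8")


# Placeholder query; replace it with the full query shown on the slide.
sample_query = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 1"
request = build_request(sample_query)
print(request.full_url)
```

Calling `run_query` with the slide's query text should return the same rows the web form produces, provided the endpoint is reachable and still serves the same data.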

Page 17: Linked Open Data for Digital Humanities

Making the network

● Get the query from
  ○ https://gist.github.com/cgueret/5098706

● Copy & paste it into
  ○ http://dbpedia.org/sparql

● Change the result format to "CSV"

● Press "Run Query" and save the result

● Open Gephi

● Start a new project

● Import the CSV file in the "Data Laboratory"
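One detail worth knowing before importing: the query only filters out pairs where both countries are identical, so each pair of countries appears twice, once in each direction. If an undirected network is wanted, the saved CSV can be tidied first. The sketch below is one possible way to do that; `undirected_edges` is a hypothetical helper, and the sample rows are hand-written for illustration rather than actual query output.

```python
import csv
import io


def undirected_edges(csv_text):
    """Keep one row per unordered (Source, Target) pair and label."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = set()
    edges = []
    for row in reader:
        key = (frozenset((row["Source"], row["Target"])), row["Label"])
        if key not in seen:
            seen.add(key)
            edges.append((row["Source"], row["Target"], row["Label"]))
    return edges


# Hand-written sample using the query's column names (Source, Target, Label).
sample = """Source,Target,Label
Belgium,Netherlands,Dutch language
Netherlands,Belgium,Dutch language
Belgium,France,French language
"""

edges = undirected_edges(sample)
for edge in edges:
    print(edge)
```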

Page 18: Linked Open Data for Digital Humanities
Page 19: Linked Open Data for Digital Humanities

There is more than just DBpedia...

Page 20: Linked Open Data for Digital Humanities

Last words

● Look for data sources published as Linked Open Data (RDF); this can save you time

● Consider publishing your own data as Linked Open Data

● There is much more to say...
  ○ Using SPARQL within R (very easily)
    ■ http://linkedscience.org/tools/sparql-package-for-r/
  ○ Reasoning capabilities of triple stores
  ○ Creating and extending vocabularies