linked open data for digital humanities

Post on 20-Jan-2015


DESCRIPTION

This presentation was given to Digital Humanities students on March 7. The goal is to introduce Linked Open Data (LOD) and showcase what can be done with it.

TRANSCRIPT

Linked Open Data for Digital Humanities

What is Linked Open Data and why is it relevant for you?

Christophe Guéret (@cgueret)

Open Data

“A piece of data or content is open if anyone is free to use, reuse, and redistribute it — subject only, at most, to the requirement to attribute and/or share-alike.”

http://opendefinition.org/

Linked Data

"a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF."

http://linkeddata.org/

Linked Open Data

● Linked Open Data = Open Data + Linked Data

● Interconnected data sets that are on the Web and free to use

● 5-star scheme http://5stardata.info/

Why does it matter for DH?

● Digital Humanities research uses a lot of data and studies relations between things

● Data acquisition & curation represent a LOT of effort for data consumers

● Linked Open Data is a good way to
  ○ Facilitate your own work (as a data consumer)
  ○ Facilitate others' work (as a data publisher)

Data found on the Web

● You get the following table as a CSV file

Kennis      Stad
Christophe  Amsterdam
David       Parijs

● And that Excel table from somewhere else

Ville       Pays
Paris       France
Amsterdam   Pays-Bas

And you want to integrate them

● Data integration issues
  ○ Kennis, Stad, Ville, Pays?
  ○ Parijs = Paris?
  ○ Amsterdam = Amsterdam?

● A lot of work for the (uninformed) consumer!


CSV table + Excel table = ?

Linked Data approach

● Assign unique identifiers (URIs) to concepts and things

● Create a "triple": connect the identifiers with labelled, directed edges

dbpedia:Amsterdam --dbo:country--> dbpedia:Netherlands
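Written out in full, the triple above is just three URIs. A minimal Python sketch, assuming the prefixes expand as dbpedia: to http://dbpedia.org/resource/ and dbo: to http://dbpedia.org/ontology/ (the namespaces DBpedia uses); `expand` and `to_ntriples` are hypothetical helper names chosen for illustration, and the output line follows the N-Triples syntax.

```python
# Prefix table: an assumption matching the namespaces DBpedia uses.
PREFIXES = {
    "dbpedia": "http://dbpedia.org/resource/",
    "dbo": "http://dbpedia.org/ontology/",
}

def expand(curie: str) -> str:
    """Turn a prefixed name such as 'dbpedia:Amsterdam' into a full URI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local

def to_ntriples(subject: str, predicate: str, obj: str) -> str:
    """Serialise one triple whose three parts are all URIs as an N-Triples line."""
    return " ".join(f"<{expand(term)}>" for term in (subject, predicate, obj)) + " ."

# Subject, predicate (the labelled, directed edge), object.
triple = ("dbpedia:Amsterdam", "dbo:country", "dbpedia:Netherlands")
print(to_ntriples(*triple))
```

The prefixed names are only shorthand: two publishers who write dbpedia:Amsterdam end up with exactly the same full URI.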

Why does it solve the issue?

● Shift some of the data integration load onto the provider side
  ○ Clarify the semantics of the data
  ○ Refer to identifiers rather than names

● There is only one "dbpedia:Amsterdam" at http://dbpedia.org/resource/Amsterdam

● Labels used for the edges are published by an external authority
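To see why identifiers help with the two tables above, here is a minimal Python sketch in which each publisher has mapped its own labels to DBpedia URIs. The mapping dictionaries are hypothetical and written by hand for illustration; the point is that once "Parijs" and "Paris" resolve to the same URI, the join becomes a plain lookup instead of guesswork on names.

```python
DBPEDIA = "http://dbpedia.org/resource/"

# Table 1 (Kennis, Stad), plus the publisher's hypothetical label-to-URI mapping.
table1 = [("Christophe", "Amsterdam"), ("David", "Parijs")]
city_uri_nl = {"Amsterdam": DBPEDIA + "Amsterdam", "Parijs": DBPEDIA + "Paris"}

# Table 2 (Ville, Pays), plus its publisher's hypothetical mappings.
table2 = [("Paris", "France"), ("Amsterdam", "Pays-Bas")]
city_uri_fr = {"Paris": DBPEDIA + "Paris", "Amsterdam": DBPEDIA + "Amsterdam"}
country_uri_fr = {"France": DBPEDIA + "France", "Pays-Bas": DBPEDIA + "Netherlands"}

# Index table 2 by city URI, then join on identifiers rather than on names.
country_of = {city_uri_fr[ville]: country_uri_fr[pays] for ville, pays in table2}
joined = {person: country_of[city_uri_nl[stad]] for person, stad in table1}
print(joined)
```

The consumer no longer needs to know Dutch or French: the work of resolving names was done once, on the provider side.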

Some vocabulary publishers

From triples to the Web of Data

● Every triple is a bit of factual information

● Because nodes are re-used across triples, the union of all the triples is a graph

● The "Web of Data" is a pre-integrated, semantically clear data set, ready to be used!
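The idea that reused nodes turn a union of triples into one graph can be sketched in a few lines of Python. The two triple sets below are illustrative examples, not an extract of DBpedia, and `to_graph` and `two_hops` are hypothetical helpers.

```python
from collections import defaultdict

# Two independently published sets of triples (illustrative, prefixed names).
set_a = {("dbpedia:Amsterdam", "dbo:country", "dbpedia:Netherlands")}
set_b = {
    ("dbpedia:Netherlands", "dbo:officialLanguage", "dbpedia:Dutch_language"),
    ("dbpedia:Paris", "dbo:country", "dbpedia:France"),
}

def to_graph(triples):
    """Adjacency list: subject -> list of (edge label, object)."""
    graph = defaultdict(list)
    for s, p, o in sorted(triples):
        graph[s].append((p, o))
    return dict(graph)

def two_hops(graph, start):
    """Every path start -p1-> middle -p2-> end, returned as (middle, p2, end)."""
    return [
        (middle, p2, end)
        for _, middle in graph.get(start, [])
        for p2, end in graph.get(middle, [])
    ]

# dbpedia:Netherlands appears in both sets, so their union forms one connected graph.
graph = to_graph(set_a | set_b)
print(two_hops(graph, "dbpedia:Amsterdam"))
```

Neither set alone links Amsterdam to a language; the path only exists once both are merged on the shared identifier.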

Exploring relations in the graph

Let's make a social network!

● The network
  ○ A node per European country
  ○ An edge means a shared official language
  ○ Label the edges with the languages
  ○ Label the nodes with the country names

● Data source
  ○ DBpedia SPARQL endpoint: http://dbpedia.org/sparql

● Visualisation tool
  ○ Gephi: https://gephi.org/

SPARQL?

● Query language for Linked Open Data

● Describe part of the graph and use variables

dbpedia:Amsterdam --dbo:country--> ?Country
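The pattern above, with ?Country as a variable, can be mimicked by a toy matcher over a list of triples. This Python sketch only illustrates the idea of matching one triple pattern and collecting variable bindings; it is not a SPARQL engine, and `match` and `TRIPLES` are hypothetical names.

```python
# Terms starting with "?" are variables; everything else must match exactly.
TRIPLES = [
    ("dbpedia:Amsterdam", "dbo:country", "dbpedia:Netherlands"),
    ("dbpedia:Paris", "dbo:country", "dbpedia:France"),
]

def match(pattern, triples):
    """Yield one variable binding (a dict) per triple that fits the pattern."""
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice in the pattern must bind to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding

print(list(match(("dbpedia:Amsterdam", "dbo:country", "?Country"), TRIPLES)))
# prints [{'?Country': 'dbpedia:Netherlands'}]
```

A real SPARQL query combines several such patterns and joins their bindings on shared variables, as the query below does with ?language.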

Suggested book to read

The query in SPARQL

SELECT DISTINCT ?Source ?Target ?Label WHERE {
  ?country1 a <http://dbpedia.org/class/yago/EuropeanCountries> .
  ?country1 <http://dbpedia.org/ontology/officialLanguage> ?language .
  ?country2 a <http://dbpedia.org/class/yago/EuropeanCountries> .
  ?country2 <http://dbpedia.org/ontology/officialLanguage> ?language .
  FILTER (?country1 != ?country2)

  ?country1 <http://www.w3.org/2000/01/rdf-schema#label> ?Source .
  ?country2 <http://www.w3.org/2000/01/rdf-schema#label> ?Target .
  ?language <http://www.w3.org/2000/01/rdf-schema#label> ?Label .
  FILTER ((LANG(?Source) = "en") && (LANG(?Target) = "en") && (LANG(?Label) = "en"))
}
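The same query can also be sent from a script instead of the web form. A minimal sketch using only the Python standard library, assuming the endpoint follows the SPARQL 1.1 Protocol (query text in the `query` parameter, CSV results requested with the `Accept: text/csv` header); `build_request` and `parse_edges` are hypothetical helper names. The column names match the ?Source, ?Target and ?Label variables of the query.

```python
import csv
import io
import urllib.parse
import urllib.request

ENDPOINT = "http://dbpedia.org/sparql"  # endpoint named in the slides

def build_request(query: str) -> urllib.request.Request:
    """Prepare a SPARQL protocol GET request asking for CSV results."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    # text/csv is the media type of the SPARQL 1.1 CSV results format.
    return urllib.request.Request(url, headers={"Accept": "text/csv"})

def parse_edges(csv_text: str):
    """Turn a CSV result with Source, Target, Label columns into edge tuples."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["Source"], row["Target"], row["Label"]) for row in reader]

# Usage (needs network access; QUERY is assumed to hold the query text above):
# with urllib.request.urlopen(build_request(QUERY)) as resp:
#     edges = parse_edges(resp.read().decode("utf-8"))
```

Keeping request building and parsing separate means the parsing step can be checked offline on a saved CSV file.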

Making the network

● Get the query from
  ○ https://gist.github.com/cgueret/5098706

● Copy & paste it into
  ○ http://dbpedia.org/sparql

● Change the result format to "CSV"

● Press "Run Query" and save the result

● Open Gephi

● Start a new project

● Import the CSV file into the "Data Laboratory"
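Because the query only requires ?country1 != ?country2, each pair of countries sharing a language comes back twice (A to B and B to A). If an undirected network is wanted, the duplicates could be collapsed before the import. A minimal optional sketch, assuming the saved CSV has the Source, Target and Label columns produced by the query; `undirected_edges` and `write_edges` are hypothetical helpers, not part of Gephi.

```python
import csv
import io

def undirected_edges(csv_text: str):
    """Collapse (A, B, L) and (B, A, L) into one edge with sorted endpoints."""
    seen = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        source, target = sorted((row["Source"], row["Target"]))
        seen.add((source, target, row["Label"]))
    return sorted(seen)

def write_edges(edges) -> str:
    """Write edges back out as CSV with the same Source, Target, Label header."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Label"])
    writer.writerows(edges)
    return out.getvalue()
```

The header is kept unchanged so the cleaned file can be imported exactly like the original result.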

There is more than just DBpedia...

Last words

● Look for data sources published as Linked Open Data (RDF); this can save you time

● Consider publishing your own data as Linked Open Data

● There is much more to say...
  ○ Using SPARQL within R (very easily)
    ■ http://linkedscience.org/tools/sparql-package-for-r/
  ○ Reasoning capabilities of triple stores
  ○ Creating and extending vocabularies
