summer school lex 2014 - rdf + sparql querying the web of (lex)data

Post on 28-Jun-2015

DESCRIPTION

Lecture for Summer School LEX 2014, 4 Sept. 2014, Ravenna, Italy http://summerschoollex.cirsfid.unibo.it/

TRANSCRIPT

RDF + SPARQL querying the web of (lex)data

Diego Valerio Camarda regesta.exe

www.regesta.com

diego.camarda@regesta.com dvcama @ github&twitter

DiegoValerioCamarda @ slideshare

a (really) short introduction to linked open data

what about IRIs and RDF a new way to publish data on the web

ids are ambiguous and suck!

› Use URIs as names for things
› Use HTTP URIs so that people can look up those names
› Use the standards (RDF, SPARQL) providing useful information
› Include links to other URIs so that they can discover more things

linked data principles Tim Berners-Lee July 27, 2006

The Children and Families Act 2014

http://www.legislation.gov.uk/id/uksi/2014/2270

what about IRIs and RDF turning documents into data

ids are ambiguous and suck!

A new way to design databases RDF

(aka ’define knowledge’)

Go Triples, go! the standard (old) approach

ID_P  COGNOME  NOME   REF_ID_SOCIETA  GENERE
1     Camarda  Diego  1               maschio
2     …        …      …               …

ID_SOCIETA  DENOMINAZIONE    SITO
1           Regesta.exe srl  www.regesta.com

Go Triples, go! the new (cool) approach

<http://www.regesta.com/diego>

Subject

Go Triples, go! the new (cool) approach

<http://www.regesta.com/diego> <http://xmlns.com/foaf/0.1/familyName>

Subject Predicate

Go Triples, go! the new (cool) approach

<http://www.regesta.com/diego> <http://xmlns.com/foaf/0.1/familyName> "Camarda" .

Subject Predicate Object

Go Triples, go! the new (cool) approach

<http://www.regesta.com/diego> <http://xmlns.com/foaf/0.1/familyName> "Camarda" .
<http://www.regesta.com/diego> <http://xmlns.com/foaf/0.1/firstName> "Diego" .
<http://www.regesta.com/diego> <http://xmlns.com/foaf/0.1/gender> "male" .

Go Triples, go! the new (cool) approach

<http://www.regesta.com/diego> <http://xmlns.com/foaf/0.1/familyName> "Camarda" ;
    <http://xmlns.com/foaf/0.1/firstName> "Diego" ;
    <http://xmlns.com/foaf/0.1/gender> "male" .
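The step from three full sentences to the `;` shorthand can be sketched in Python. This is only a sketch, not a real Turtle serializer: the `to_turtle` and `term` helpers are made up for this example, literals are assumed to be plain strings with nothing to escape, and IRIs are recognised simply by their leading `<`.

```python
from collections import OrderedDict

FOAF = "http://xmlns.com/foaf/0.1/"
DIEGO = "<http://www.regesta.com/diego>"

# Each triple is (subject, predicate, object); IRIs are written in <...>,
# anything else is treated as a plain string literal.
triples = [
    (DIEGO, f"<{FOAF}familyName>", "Camarda"),
    (DIEGO, f"<{FOAF}firstName>", "Diego"),
    (DIEGO, f"<{FOAF}gender>", "male"),
]

def term(value):
    """Write an IRI unchanged and quote everything else as a literal."""
    return value if value.startswith("<") else '"%s"' % value

def to_turtle(triples):
    """Group triples by subject and join predicate/object pairs with ';'."""
    by_subject = OrderedDict()
    for s, p, o in triples:
        by_subject.setdefault(s, []).append(f"{p} {term(o)}")
    blocks = []
    for s, pairs in by_subject.items():
        blocks.append(s + " " + " ;\n    ".join(pairs) + " .")
    return "\n".join(blocks)

print(to_turtle(triples))
```

The subject is written once and every following line only adds a predicate and an object, which is all the `;` shorthand does.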

Go Triples, go! ok, but what is a “diego”?

Go Triples, go! it’s a person!

<http://www.regesta.com/diego> a <http://xmlns.com/foaf/0.1/Person>

Go Triples, go! adding a Class

<http://www.regesta.com/diego> <http://xmlns.com/foaf/0.1/familyName> "Camarda" ;
    <http://xmlns.com/foaf/0.1/firstName> "Diego" ;
    <http://xmlns.com/foaf/0.1/gender> "male" .

<http://www.regesta.com/diego> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .

Go Triples, go! building a graph

<http://www.regesta.com/diego> <http://xmlns.com/foaf/0.1/familyName> "Camarda" ;
    <http://xmlns.com/foaf/0.1/firstName> "Diego" ;
    <http://xmlns.com/foaf/0.1/gender> "male" ;
    <http://www.w3.org/1999/...#type> <http://xmlns.com/foaf/0.1/Person> .

<http://www.regesta.com/diego> <http://www.w3.org/ns/org#memberOf> <http://www.regesta.com/about> .

Go Triples, go! building a graph

<http://www.regesta.com/diego> <http://xmlns.com/foaf/0.1/familyName> "Camarda" ;
    <http://xmlns.com/foaf/0.1/firstName> "Diego" ;
    <http://xmlns.com/foaf/0.1/gender> "male" ;
    <http://www.w3.org/1999/...#type> <http://xmlns.com/foaf/0.1/Person> ;
    <http://www.w3.org/ns/org#memberOf> <http://www.regesta.com/about> .

<http://www.regesta.com/about> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/org#Organization> .

Go Triples, go! building a graph

<http://www.regesta.com/diego> <http://xmlns.com/foaf/0.1/familyName> "Camarda" ;
    <http://xmlns.com/foaf/0.1/firstName> "Diego" ;
    <http://xmlns.com/foaf/0.1/gender> "male" ;
    <http://www.w3.org/1999/...#type> <http://xmlns.com/foaf/0.1/Person> ;
    <http://www.w3.org/ns/org#memberOf> <http://www.regesta.com/about> .
<http://www.regesta.com/about> <http://www.w3.org/1999/...#type> <http://www.w3.org/ns/org#Organization> .

Go Triples, go! building a graph

<http://www.regesta.com/diego> <http://xmlns.com/foaf/0.1/familyName> "Camarda" ;
    <http://xmlns.com/foaf/0.1/firstName> "Diego" ;
    <http://xmlns.com/foaf/0.1/gender> "male" ;
    <http://www.w3.org/1999/...#type> <http://xmlns.com/foaf/0.1/Person> ;
    <http://www.w3.org/ns/org#memberOf> <http://www.regesta.com/about> .
<http://www.regesta.com/about> <http://www.w3.org/1999/...#type> <http://www.w3.org/ns/org#Organization> ;
    <http://www.w3.org/2004/02/skos/core#prefLabel> "Regesta.exe srl" ;
    <http://xmlns.com/foaf/0.1/homepage> <http://www.regesta.com> .
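To connect the old approach with the new one, here is a rough Python sketch that turns the person row of the relational table into the triples built on these slides. The column-to-predicate mapping, the `PERSON_IRI` / `COMPANY_IRI` lookups and the `GENDER` translation are assumptions made for this example; the slides mint the IRIs by hand and do not prescribe any mapping.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
ORG = "http://www.w3.org/ns/org#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Row copied from the person table of the "old" approach.
people = [{"ID_P": 1, "COGNOME": "Camarda", "NOME": "Diego",
           "REF_ID_SOCIETA": 1, "GENERE": "maschio"}]

# Assumption: hand-written lookups from primary keys to the IRIs on the slides.
PERSON_IRI = {1: "http://www.regesta.com/diego"}
COMPANY_IRI = {1: "http://www.regesta.com/about"}
GENDER = {"maschio": "male", "femmina": "female"}

def person_to_triples(row):
    """Map one relational row to (subject, predicate, object) triples."""
    s = PERSON_IRI[row["ID_P"]]
    return [
        (s, RDF_TYPE, FOAF + "Person"),
        (s, FOAF + "familyName", row["COGNOME"]),
        (s, FOAF + "firstName", row["NOME"]),
        (s, FOAF + "gender", GENDER[row["GENERE"]]),
        # The foreign key becomes a link to another resource, not a number.
        (s, ORG + "memberOf", COMPANY_IRI[row["REF_ID_SOCIETA"]]),
    ]

for triple in person_to_triples(people[0]):
    print(triple)
```

The local numeric ids disappear: what used to be `REF_ID_SOCIETA = 1` is now a link anyone can follow.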

Go Triples, go! Objects could be Subjects

Go Triples, go! considering diego and regesta

Go Triples, go! <diego> <memberOf> <regesta>

Go Triples, go! but, <regesta> <locatedIn> <rome>

Go Triples, go! <diego> <placeOfBirth> <rome>

Go Triples, go! <rome> <parentADM> <italy>

Go Triples, go! <silvia> <placeOfBirth> <italy>

Go Triples, go! <silvia> <…> <…>

Go Triples, go! <…> <…> <…> = a knowledge graph!

[Diagram, built up one slide at a time: a graph with the nodes diego, regesta, silvia, rome, italy]
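Because objects can be subjects, the triples chain into paths. A small Python sketch of that idea, using the short names from the slide titles in place of full IRIs; the `reachable` helper is made up for this example.

```python
# The toy graph from the slides, with short names standing in for full IRIs.
GRAPH = [
    ("diego", "memberOf", "regesta"),
    ("regesta", "locatedIn", "rome"),
    ("diego", "placeOfBirth", "rome"),
    ("rome", "parentADM", "italy"),
    ("silvia", "placeOfBirth", "italy"),
]

def reachable(graph, start):
    """Return every node reachable from start by following triples forward."""
    seen, todo = set(), [start]
    while todo:
        node = todo.pop()
        for s, _p, o in graph:
            if s == node and o not in seen:
                seen.add(o)
                todo.append(o)
    return seen

print(sorted(reachable(GRAPH, "diego")))   # ['italy', 'regesta', 'rome']
print(sorted(reachable(GRAPH, "silvia")))  # ['italy']
```

Nobody stated that diego has anything to do with italy, yet the graph connects them through rome.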

A lot of sentences to achieve (descriptive) freedom

<http://www.regesta.com/diego> <http://xmlns.com/foaf/0.1/familyName> "Camarda" .
<http://www.regesta.com/diego> <http://xmlns.com/foaf/0.1/firstName> "Diego" .
<http://www.regesta.com/diego> <http://xmlns.com/foaf/0.1/gender> "male" .
<http://www.regesta.com/diego> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://www.regesta.com/diego> <http://www.w3.org/ns/org#memberOf> <http://www.regesta.com> .
<http://www.regesta.com/silvia> <http://xmlns.com/foaf/0.1/familyName> "Mazzini" .
<http://www.regesta.com/silvia> <http://xmlns.com/foaf/0.1/firstName> "Silvia" .
<http://www.regesta.com/silvia> <http://xmlns.com/foaf/0.1/gender> "female" .
<http://www.regesta.com/silvia> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://www.regesta.com/silvia> <http://www.w3.org/ns/org#memberOf> <http://www.regesta.com> .
<http://www.regesta.com> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/org#Organization> .
<http://www.regesta.com> <http://www.w3.org/2004/02/skos/core#prefLabel> "Regesta.exe srl" .
<http://www.regesta.com/silvia> <http://xmlns.com/foaf/0.1/knows> <http://www.regesta.com/diego> .

<…> <…> <…>.

<noBeer> <makeGoCreazy> <homer>. <noTv> <makeGoCreazy> <homer>. <noBeer> <makeGoCreazy> <homer>. <noTv> <makeGoCreazy> <homer>. <noBeer> …

Standards for semantic web

RDF http://www.w3.org/standards/techs/rdf
SPARQL http://www.w3.org/standards/techs/sparql
ONTOLOGIES http://www.w3.org/standards/semanticweb/ontology

Did you study HTML? Good! It's time for a new standard

The Resource Description Framework is a general-purpose language for representing information in the Web.

It's time for a new standard RDF

The SPARQL Protocol and RDF Query Language is a query language and protocol for RDF.

It's time for a new standard SPARQL

On the Semantic Web, vocabularies define the concepts and relationships (also referred to as “terms”) used to describe and represent an area of concern.

It's time for a new standard Ontologies

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

foaf:firstName
dc:title
rdfs:label

Pre:fixes (ontologies) just a few words
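A prefix is nothing more than string substitution: `foaf:firstName` is the namespace IRI with the local name glued on. A minimal Python sketch of that expansion, where the `expand` helper is made up for this example and the namespaces are the three declared on the slide:

```python
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}

def expand(prefixed_name, prefixes=PREFIXES):
    """Turn 'foaf:firstName' into the full IRI it abbreviates."""
    prefix, _, local = prefixed_name.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"unknown prefix: {prefix}")
    return prefixes[prefix] + local

for name in ("foaf:firstName", "dc:title", "rdfs:label"):
    print(name, "->", expand(name))
```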

Browsing the web of data

Resource Description Framework

› SPARQL endpoint
› dereferenceable URIs
› content negotiation
› standard ports, like 80 (HTTP)
› JSONP support

MUST!

Resource Description Framework

› SPARQL endpoint
› dereferenceable URIs
› content negotiation
› standard ports, like 80 (HTTP)
› JSONP support
› up-to-date
› the endpoint URL is easy to deduce from resources
› the resources are described by dc:title or rdfs:label
› the endpoint hosts a page for humans
› the resources and the endpoint are on the same domain

SHOULD! (please do it, for me)

SELECT * {?minnesota ?banana ?sun}

SPARQL a must know query language

SPARQL group graph pattern

[Diagram: the knowledge graph with the nodes diego, regesta, silvia, rome, italy]

SPARQL group graph pattern

[Diagram: the same graph with the nodes diego, regesta, rome, silvia, italy]

SELECT ?person {
  ?person <placeOfBirth> ?place .
  ?person <memberOf> ?company .
  ?company <locatedIn> ?place .
}

SPARQL group graph pattern

<diego>
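How a group graph pattern picks `<diego>` and nothing else can be imitated in a few lines of Python. This is a toy matcher written for this example (the `match` function and the tuple encoding are assumptions, and no real SPARQL engine works this naively): every term starting with `?` is a variable, and all patterns must agree on the value bound to each variable.

```python
# The toy graph from the slides, with short names standing in for full IRIs.
GRAPH = [
    ("diego", "memberOf", "regesta"),
    ("regesta", "locatedIn", "rome"),
    ("diego", "placeOfBirth", "rome"),
    ("rome", "parentADM", "italy"),
    ("silvia", "placeOfBirth", "italy"),
]

def match(graph, patterns, binding=None):
    """Yield every variable binding that satisfies all triple patterns.

    Terms starting with '?' are variables; anything else must match exactly.
    """
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        new = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or reject if it is already bound elsewhere.
                if new.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from match(graph, rest, new)

QUERY = [
    ("?person", "placeOfBirth", "?place"),
    ("?person", "memberOf", "?company"),
    ("?company", "locatedIn", "?place"),
]

people = sorted({b["?person"] for b in match(GRAPH, QUERY)})
print(people)  # ['diego']
```

silvia is born in italy but is a member of nothing in this toy graph, so the second pattern already fails for her.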

SELECT ?person ?prop ?obj {
  ?person <placeOfBirth> ?place .
  ?person <memberOf> ?company .
  ?person ?prop ?obj .
  ?company <locatedIn> ?place .
}

SPARQL group graph pattern

(turn the page)

person   prop             obj
<diego>  rdf:type         foaf:Person
<diego>  foaf:firstName   "Diego"
<diego>  foaf:familyName  "Camarda"
<diego>  foaf:gender      "male"
<diego>  org:memberOf     <regesta>

SPARQL group graph pattern

DESCRIBE <diego>

SPARQL describe

(turn the page)

<diego> rdf:type foaf:Person .
<diego> foaf:firstName "Diego" .
<diego> foaf:familyName "Camarda" .
<diego> foaf:gender "male" .
<diego> org:memberOf <regesta> .
<silvia> foaf:knows <diego> .

SPARQL describe
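What exactly DESCRIBE returns is left to the query service, so the following is only a sketch of the behaviour shown on the slide: every triple that mentions the resource, as subject or as object. The `describe` helper and the extra silvia triple are made up for this example.

```python
# Short names stand in for full IRIs; the last triple is there only to
# show that unrelated statements are left out.
GRAPH = [
    ("diego", "rdf:type", "foaf:Person"),
    ("diego", "foaf:firstName", "Diego"),
    ("diego", "foaf:familyName", "Camarda"),
    ("diego", "foaf:gender", "male"),
    ("diego", "org:memberOf", "regesta"),
    ("silvia", "foaf:knows", "diego"),
    ("silvia", "foaf:firstName", "Silvia"),
]

def describe(graph, resource):
    """Return the triples that mention resource as subject or as object."""
    return [t for t in graph if t[0] == resource or t[2] == resource]

for triple in describe(GRAPH, "diego"):
    print(triple)
```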

DISTINCT, COUNT
GRAPH, PREFIX
isBlank, isIRI, isLiteral, isNumeric
FILTER, REGEX, STR
FILTER NOT EXISTS, MINUS
ORDER BY, OFFSET, LIMIT

for other stuff http://www.w3.org/TR/sparql11-query/

SPARQL minimum requirements

Please start negotiating content right now!

Hi dude, I accept: text/html,application/xhtml+xml
Great! I’ll serve you a web page (HTML page)

Hi dude, I accept: application/rdf+xml
Great… 302, redirect! (RDF data)

Hi dude, I accept: pizza/margherita
mmm… sorry (406 error)
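The three dialogues above boil down to a lookup on the Accept header. A deliberately naive Python sketch: the `AVAILABLE` table and the `negotiate` function are invented for this example, and q-values and wildcards such as `*/*` are ignored, which a real server should not do.

```python
# Assumption: the formats this imaginary server can produce, and the status
# it answers with (the slide answers RDF requests with a 302 redirect).
AVAILABLE = {
    "text/html": 200,
    "application/xhtml+xml": 200,
    "application/rdf+xml": 302,
}

def negotiate(accept_header):
    """Return (status, media_type) for the first acceptable media type."""
    for item in accept_header.split(","):
        media_type = item.split(";")[0].strip()  # drop parameters such as q=0.9
        if media_type in AVAILABLE:
            return AVAILABLE[media_type], media_type
    return 406, None

print(negotiate("text/html,application/xhtml+xml"))  # (200, 'text/html')
print(negotiate("application/rdf+xml"))              # (302, 'application/rdf+xml')
print(negotiate("pizza/margherita"))                 # (406, None)
```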

Please start negotiating content right now!

application/rdf+xml
application/xml
text/plain
text/turtle
application/x-turtle
application/trix
application/x-trig
text/n3
text/rdf+n3
application/x-binary-rdf
text/x-nquads
application/ld+json
application/rdf+json
application/xhtml+xml
text/xml
application/json
application/rdf+n3
application/sparql-results+xml
application/sparql-results+json

curl -L -H "Accept: application/rdf+xml" http://dati.camera.it/ocd/governo.rdf/g102
curl -L -H "Accept: text/n3" http://dati.camera.it/ocd/governo.rdf/g102

Please start negotiating content using CURL…
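The same request can be prepared from Python with the standard library only. A sketch under assumptions: `build_request` and `fetch` are names invented here, and whether the server still answers at this URL is not something this example can promise, so only the request is built and nothing is sent until `fetch` is called.

```python
import urllib.request

URL = "http://dati.camera.it/ocd/governo.rdf/g102"

def build_request(url, media_type):
    """Prepare a GET request that asks for one specific representation."""
    return urllib.request.Request(url, headers={"Accept": media_type})

def fetch(url, media_type):
    """Send the request; urlopen follows redirects, like curl -L."""
    with urllib.request.urlopen(build_request(url, media_type)) as response:
        return response.headers.get("Content-Type"), response.read()

request = build_request(URL, "text/n3")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Calling `fetch(URL, "application/rdf+xml")` would then perform the first of the two curl calls.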

Java : Sesame / Jena
Python : RDFLib
Ruby : RDF.rb
nodeJs : sparql-client

or, as I do, simple HTTP GET + parsing results as json or xml

Please start negotiating content …or a framework!
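The "simple HTTP GET + parsing" route needs very little code. A sketch in Python, standard library only: the endpoint is the Chamber of Deputies one listed among the case studies, `query_url` and `rows` are helper names invented here, and `SAMPLE` is a hand-written response (not real data) in the SPARQL JSON results layout, so the parsing can be tried offline.

```python
import json
from urllib.parse import urlencode

ENDPOINT = "http://dati.camera.it/sparql"

def query_url(endpoint, query):
    """Build the GET URL defined by the SPARQL protocol (?query=...)."""
    return endpoint + "?" + urlencode({"query": query})

def rows(results_json):
    """Flatten a SPARQL JSON result document into a list of plain dicts."""
    data = json.loads(results_json)
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in data["results"]["bindings"]
    ]

# A hand-written sample response, so the parsing can be tried offline.
SAMPLE = """{
  "head": {"vars": ["s", "title"]},
  "results": {"bindings": [
    {"s": {"type": "uri", "value": "http://example.org/bill/1"},
     "title": {"type": "literal", "value": "A sample bill"}}
  ]}
}"""

print(query_url(ENDPOINT, "select distinct ?o where {?s a ?o}"))
print(rows(SAMPLE))
```

To get JSON back from a real endpoint, send the GET with `Accept: application/sparql-results+json`, one of the media types listed earlier.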

RDF data storing and deploying

RDF will probably transform data into big data

It’s slow, so keep calm

usually: 1 record = 15 triples
e.g. Chamber of Deputies: 2,949,771 votes = 64,948,856 triples
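A quick back-of-the-envelope check of those figures in Python, using only the numbers on the slide: the Chamber of Deputies votes come out at roughly 22 triples each, a bit above the usual 15 per record.

```python
votes = 2_949_771
triples = 64_948_856

# Average number of triples needed to describe one vote.
triples_per_vote = triples / votes
print(round(triples_per_vote, 1))

# Rule of thumb from the slide: about 15 triples per record.
print(f"{1_000_000 * 15:,} triples for a million ordinary records")
```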

Virtuoso
Sesame
Fuseki (Jena)
Owlim / Bigdata (Sesame)
AllegroGraph
D2R server
ARC2
…

Triplestores I just need a SPARQL endpoint

I just really need http://yourdomain/sparql

Case studies

select distinct ?o where {?s a ?o}

select ?o (count(distinct ?s) as ?tot) where {?s a ?o} group by ?o

select (count(?s) as ?tot) where {?s ?p ?o}

select (count(?s) as ?tot) ?class where {?s ?p ?o; a ?class} group by ?class

select distinct ?p where {?s a <http://classe>; ?p ?o}

select ?p (count(?p) as ?tot) where {?s a <http://classe>; ?p ?o} group by ?p

select ?s where {?s a <http://classe>}

select ?p ?o where {<http://URI> ?p ?o}

select distinct ?s ?title where {?s a <http://classe>; dc:title ?title. FILTER(REGEX(?title,'parola','i'))} LIMIT 100

SPARQL magic a query for all seasons

Case studies: Chamber of Deputies, Senate of the Republic

http://dati.camera.it/sparql

http://dati.senato.it/sparql

Useful links

All Bills filtered by year

SELECT DISTINCT * {
  ?bill a ocd:atto;
    dc:title ?title;
    dc:date ?date .
  FILTER(regex(?date,'^2014'))
}
ORDER BY ?date

Last voted Bills

SELECT distinct * WHERE {
  ?bill a ocd:atto;
    dc:title ?title.
  ?votazione a ocd:votazione;
    ocd:rif_attoCamera ?bill;
    dc:date ?data;
    dc:title ?denominazione;
    dc:description ?descrizione;
    ocd:votanti ?votanti;
    ocd:votazioneFinale 1;
    ocd:favorevoli ?favorevoli;
    ocd:contrari ?contrari;
    ocd:astenuti ?astenuti;
    ocd:rif_leg <http://dati.camera.it/ocd/legislatura.rdf/repubblica_17>
}
ORDER BY DESC(?data)

Example queries Chamber of deputies

All Bills filtered by year

PREFIX osr: <http://dati.senato.it/osr/>
SELECT DISTINCT * {
  ?bill a osr:Ddl;
    osr:titolo ?title;
    osr:dataPresentazione ?date .
  FILTER(regex(STR(?date),'^2014'))
}
ORDER BY ASC(?date)

Last approved Bills

PREFIX osr: <http://dati.senato.it/osr/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT ?ddl ?titolo ?titoloBreve ?natura ?stato ?dataApprovato
WHERE {
  ?ddl a osr:Ddl.
  ?ddl osr:statoDdl ?stato.
  ?ddl osr:ramo "S"^^<http://www.w3.org/2001/XMLSchema#string>.
  ?ddl osr:dataPresentazione ?dataPresentazione.
  ?ddl osr:titolo ?titolo.
  OPTIONAL { ?ddl osr:titoloBreve ?titoloBreve }.
  ?ddl osr:natura ?natura.
  ?ddl osr:dataStatoDdl ?dataApprovato.
  ?ddl osr:testoApprovato ?testoApprovato
  FILTER(xsd:date(str(?dataApprovato)) <= xsd:date(str("2014-12-31")))
  FILTER(xsd:date(str(?dataApprovato)) >= xsd:date(str("2014-01-01")))
}
ORDER BY ?dataApprovato

Example queries Senate of the Republic

Case studies UK Legislation

http://gov.tso.co.uk/legislation/sparql

http://openuplabs.tso.co.uk/sparql/gov-legislation

http://www.opsi.gov.uk/legislation-api/developer/formats/rdf

Useful links

All ‘Works’ filtered by year

SELECT ?work ?date ?title {
  ?work a frbr:Work .
  ?work dct:title ?title .
  ?work dct:created ?date .
  FILTER (REGEX(STR(?date),'^2014'))
}
ORDER BY desc(?date)

Top subjects by year

SELECT (count(?sub) as ?tot) ?sub {
  ?work a frbr:Work .
  ?work dct:subject ?sub .
  ?work dct:created ?date .
  FILTER (REGEX(STR(?date),'^2014'))
}
GROUP BY ?sub
ORDER BY desc(?tot)
LIMIT 100

Example queries

Even more Useful links

W3C standards http://www.w3.org/standards/semanticweb/
OKFN endpoints status (and list) http://sparqles.okfn.org
LodLive (a SPARQL navigator) http://en.lodlive.it
a very good intro to RDF https://github.com/JoshData/rdfabout/blob/gh-pages/intro-to-rdf.md
Tim Berners-Lee’s “Linked Data – 5 stars ranking” http://www.w3.org/DesignIssues/LinkedData.html
My github page http://github.com/dvcama
My email mailto:diego.camarda@regesta.com