paths towards the sustainable consumption of semantic data on the web aidan hogan & claudio...

Paths towards the Sustainable Consumptionof Semantic Data on the Web

Aidan Hogan & Claudio Gutierrez

Growth of the Linked Data Cloud

Oct. 2007

Oct. 2007 Nov. 2007

Oct. 2007 Nov. 2007Feb. 2008

Oct. 2007 Nov. 2007Feb. 2008Sep. 2008

Oct. 2007 Nov. 2007Feb. 2008Sep. 2008Mar. 2009

Oct. 2007 Nov. 2007Feb. 2008Sep. 2008Mar. 2009July 2009

Oct 2007Oct. 2007Nov. 2007Feb. 2008Sep. 2008Mar. 2009July 2009Sept. 2010

Oct. 2007 Nov. 2007Feb. 2008Sep. 2008Mar. 2009July 2009Sept. 2010Sept. 2011

Oct. 2007 Nov. 2007Feb. 2008Sep. 2008Mar. 2009July 2009Sept. 2010Sept. 2011Sept. 2012

Oct. 2007 Nov. 2007Feb. 2008Sep. 2008Mar. 2009July 2009Sept. 2010Sept. 2011Sept. 2012Sept. 2013

Oct. 2007 Nov. 2007Feb. 2008Sep. 2008Mar. 2009July 2009Sept. 2010Sept. 2011Sept. 2012Sept. 2013June 2014

One new dataset in the past year

In loving memory of

Linked Data

2007–2012

Survived by its research

community

Access Methods

• Client has a request/query Q• Server has a dataset D• Client issues Q to server• Server computes and returns response Q(D)

Access Methods

• Multiple clients / multiple servers (blurred)• Remote, decentralised, uncoordinated• Web scale Anything else like this?

Access Methods (over the web)

“The Web is not a database”-- Alberto Mendelzon, slides from a presentation at the Workshop on Internet Data Management, Washington,

November 1998.

Linked Data Access Methods

1. Dereferencing:– Look up a URI, get an RDF document

2. Dumps:– Get all data in an archive

3. SPARQL Queries:– Send a query, get the answers

DEREFERENCING

Dereferencing (what is it?)

Q = “http://dbpedia.org/resource/Columbia”Q(D) =

Dereferencing (what’s wrong with it?)

• Responses vary from server to server– local triples where URI is subject (83%) vs.– local triples where URI is subject or object (55%)

[Hogan et al. 2012]

WELL-DEFINED: For a given Q and D, clients and servers agree on what Q(D) should be.

• Very coarse:– Give me all capitals of South American

countries.• Dereference documents for all country URIs• See which ones are in South America, throw away rest• Throw away triples other than capitals

GRANULAR: The language for Q allows the client to be specific so as to avoid wasting bandwidth

• No pagination:– Give me some information about Italy.• Load document with 100,000 triples• Throw away 99,900 triples the user won’t read

PAGINATION: A large response Q(D) can be split into chunks

Dumps (what are they?)

Dumps (what’s wrong with them?)

• 15× compression for RDF achievable• But same weaknesses as for deref. still apply

[Fernández et al. 2012]

Dumps (what’s wrong with them?)

• 15× compression for RDF achievable• But same weaknesses as for deref. still apply• Also, no standard access methods:– Various compressions and formats– Linked through generic HTML

[Fernández et al. 2012]

ACCESSIBLE: The protocol and formats are defined for automatic access by software agents

SPARQL

SPARQL (what is it?)

“What’s the name of the query language again … what’s that? Oh yes, SPARQL.”

-- C. Mohan, June 2014, Cartagena, AMW Summer School

SPARQL (what is it?)Q =

Q(D) =

PREFIX dbo: <http://dbpedia.org/ontology/>...

SELECT ?capitalWHERE { ?s a dbo:Country ; dbp:capital ?c ; dcterms:subject category:Countries_in_South_America . ?c rdfs:label ?capital . FILTER (lang(?capital)="en")}

SPARQL (to the rescue?)

SPARQL (what’s wrong with it?)[Buil Aranda et al. 2012]

SPARQL (what’s wrong with it?)

• Simple Protocol and RDF Query Language• SPARQL evaluation: PSPACE-complete• SPARQL 1.1 even more complex

[Perez et al. 2009]

[Arenas et al. 2013]

COSTABLE: The cost of processing a query can be anticipated before actual processing.

CACHEABLE: Common requests can be cached and re-used. Queries can be anticipated.

• Simple Protocol And RDF Query Language

• Protocol always expects a perfect answer– No support for partial results, timeouts,

exception handling, pagination …

ROBUST: Access method can gracefully handle error cases and provide Quality of Service

• D is a black-box for the user

Q(D) =

PREFIX dbo: <http://dbpedia.org/ontology/>SELECT (COUNT(?c) as ?count)WHERE { ?c a dbo:Country .}

TRANSPARENT: The client can determine if a dataset D is relevant and the service sufficient.

WRAP-UP

Problem Categories

1. Standardised2. Bandwidth-efficient3. Server-processing-efficient4. Usable by client

Conclusion

• Data access methods crucial– Access = Protocol + Query Language

• But current ones don’t work→ need something else (or multiple things?)– Open question:

paths towards the sustainable consumption of semantic data on the web aidan hogan & claudio...

linked data cloudoct

internet data management

given q

q daccess methods

dumps whats wrong

standard access methods

automatic access

dataset dclient issues

Documents

aidan hogan, antoine zimmermann, jürgen umbrich , axel...

finance presentation aidan o’byrne obk accountants/ the...

hogan personality inventory manual - … · hogan...

rdfs & owl reasoning for linked data - aidan...

cc5212-1 p rocesamiento m asivo de d atos o toÑo 2014 aidan...

aidan book

welcome to st. edward catholic church -bienvenidos a … ·...

· orenday, jose ricardo ramos, paula lizbeth salas, jorge...

aidan hogan, antoine zimmermann, jrgen umbrich, axel...

big data: a small introduction - universidad de...

cc5212-1 p rocesamiento m asivo de d atos o toÑo 2015...

aidan mcdonald

cc6202-1 l a w eb de d atos p rimavera 2015 lecture 5: web...

exploiting rdfs and owl for integrating...

answering visuo-semantic queries with imgpedia · answering...

eswc ss 2012 - monday tutorial 1 aidan hogan: semantic web...

cc5212-1 p rocesamiento m asivo de d atos o toÑo 2015...

linked data and the semantic web...

aidan dillon

cc5212-1 p rocesamiento m asivo de d atos o toÑo 2015...