Tutorial on RDF Stream Processing M. Balduini, J-P Calbimonte, O. Corcho, D. Dell'Aglio, E. Della Valle http://streamreasoning.org/rsp2014
C-SPARQL: A Continuous Extension of SPARQL Marco Balduini [email protected]
Share, Remix, Reuse — Legally
§ This work is licensed under the Creative Commons Attribution 3.0 Unported License.
§ You are free:
• to Share — to copy, distribute and transmit the work
• to Remix — to adapt the work
§ Under the following conditions
• Attribution: You must attribute the work by inserting
– “[source http://streamreasoning.org/rsp2014]” at the end of each reused slide
– a credits slide stating
- These slides are partially based on “Streaming Reasoning for Linked Data 2013” by M. Balduini, J-P Calbimonte, O. Corcho, D. Dell'Aglio, E. Della Valle, and J.Z. Pan http://streamreasoning.org/sr4ld2013
§ To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/
Agenda
§ Introduction
§ Running example
§ C-SPARQL language
• Query and Stream Registration
• FROM STREAM Clause
• TimeStamp Function
• Query Chaining
• Accessing background Information
• Q/A under RDFS entailment regime
• Q/A under RDFS++ entailment regime
§ C-SPARQL Engine
§ Resources
A Reminder of SPARQL
Where C-SPARQL Extends SPARQL
C-SPARQL language
§ C-SPARQL is an extension of SPARQL 1.1
§ C-SPARQL
• Changes the semantics of SPARQL 1.1 query forms from one-time to continuous (i.e., results are instantaneous)
• Adds the STREAM form to the SPARQL 1.1 query forms
• Adds the FROM STREAM clause to the SPARQL 1.1 dataset clauses
• Adds a built-in function to access the timestamp of a triple
Running Example – Data Model
[Figure: data model. Classes: Observation, Sensor, Person, Post, Room. Properties: observes, posts, who, where, discusses, isWith, isIn, isConnectedTo. The figure also shows two subClassOf links and one subPropOf link, and marks which parts are streaming information and which are background information.]
Running Example Background Information
§ The Ontology • http://www.streamreasoning.org/ontologies/rsp2014-onto.rdf
§ The Instances
• :Alice a :Person .
• :Bob a :Person .
• :Carl a :Person .
• :David a :Person .
• :Elena a :Person .
• :RedRoom a :Room .
• :BlueRoom a :Room .
• :RedRoom :isConnectedTo :BlueRoom .
• :RedSensor a :Sensor .
• :BlueSensor a :Sensor .
Running Example
[Figure: BlueRoom and RedRoom with BlueSensor and RedSensor; the people shown are Alice, David, Bob, Carl, and Elena. Legend: R = RFID, 4 = Foursquare, f = Facebook; "is with" links connect people.]
Running Example Example data in the streams
§ Four ways to learn who is where
Sensor     Room      Person  Time-stamp
RedSensor  RedRoom   Alice   T1
…          …         …       …

Person  ChecksIn  Time-stamp
Bob     BlueRoom  T2
…       …         …

Person  IsIn     With   Time-stamp
Carl    null     Bob    T2
David   RedRoom  Elena  T3
…       …        …      …
Running Example Streaming Information
§ RDF Stream Data Type
• Ordered sequence of pairs, where each pair is made of an RDF triple and its timestamp
– Timestamps are not required to be unique, but they must be non-decreasing
§ E.g.,
RDF graph                                                    Time-stamp  Stream
:RedSensor :observes [ :who :Alice ; :where :RedRoom ] .     T1          rfid
:Bob :posts [ :who :Bob ; :where :BlueRoom ] .               T2          fs
:Carl :posts [ :who :Carl , :Bob ] .                         T2          fb
:David :posts [ :who :David , :Elena ; :where :RedRoom ] .   T3          fb
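The RDF stream data type can be sketched in Python as a list of timestamped triples, with a check for the non-decreasing constraint. This is only an illustration: the `TimestampedTriple` type, the `is_valid_rdf_stream` helper, the flat triples (no blank nodes), and the integer timestamps are assumptions made for the sketch, not part of C-SPARQL or its engine.

```python
from typing import NamedTuple


class TimestampedTriple(NamedTuple):
    """One stream element: an RDF triple plus its timestamp (hypothetical type)."""
    subject: str
    predicate: str
    obj: str
    timestamp: int


def is_valid_rdf_stream(items):
    """Return True if timestamps never decrease along the sequence.

    Equal timestamps are allowed, since they are not required to be unique.
    """
    return all(a.timestamp <= b.timestamp for a, b in zip(items, items[1:]))


# Simplified, flattened data inspired by the running example (assumed values).
stream = [
    TimestampedTriple(":Bob", ":isIn", ":BlueRoom", 2),
    TimestampedTriple(":Carl", ":isWith", ":Bob", 2),
    TimestampedTriple(":David", ":isIn", ":RedRoom", 3),
]
```

Two elements share timestamp 2 here, which the check accepts; reversing the list makes the timestamps decrease and the check fails.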
C-SPARQL language
§ Features illustrated in the rest of this session
• register continuous queries
– QUERY form
– STREAM form
• identify relevant information in one or more RDF streams
• derive and aggregate information from one or more RDF streams
• join or merge RDF streams
• feed results of one C-SPARQL query to a subsequent C-SPARQL query
• access background RDF graphs
• answer under RDFS entailment regime
• answer under RDFS++ entailment regime
C-SPARQL Language Query and Stream Registration
§ All C-SPARQL queries over RDF streams are continuous
• Registered through the REGISTER statement
§ The output of queries is in the form of
• Instantaneous tables of variable bindings
• Instantaneous RDF graphs
• RDF stream
§ Only queries in the CONSTRUCT form can be registered as generators of RDF streams
§ Composability:
• Query results registered as streams can feed other registered queries just like every other RDF stream
C-SPARQL Language Query registration - Example
§ Using the social stream fb, Who is where?
REGISTER QUERY QWhoIsWhereOnFb AS
PREFIX : <http://…/rsp2014-onto#>
SELECT ?room ?person
FROM STREAM <http://…/fb> [RANGE 1m STEP 10s]
WHERE {
  ?person1 :posts [ :who ?person ; :where ?room ] .
}
§ The resulting variable bindings have to be interpreted as instantaneous: they expire as soon as the query is recomputed
C-SPARQL Language Stream registration - Example
§ Results of a C-SPARQL query can be streamed out for downstream queries
REGISTER STREAM SWhoIsWhereOnFb AS
PREFIX : <http://…/rsp2014-onto#>
CONSTRUCT { ?person :isIn ?room }
FROM STREAM <http://…/fb> [RANGE 1m STEP 10s]
WHERE {
  ?person1 :posts [ :who ?person ; :where ?room ] .
}
§ The resulting RDF triples are streamed out on an RDF stream
• More details in the C-SPARQL Engine hands-on session
C-SPARQL Language Stream Registration - Notes
§ The output is constructed in the format of an RDF stream.
§ Every query execution may produce from a minimum of zero triples to a maximum of an entire graph.
§ The timestamp is always dependent on the query execution time only, and is not taken from the triples that match the patterns in the WHERE clause.
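A minimal Python sketch of these notes, assuming triples are plain tuples and timestamps are integers: each execution stamps whatever it constructs with the execution time. The `stream_out` helper is hypothetical and is not the engine's API.

```python
def stream_out(constructed_triples, execution_time):
    """Pair every constructed triple with the query execution time.

    The timestamps of the input triples that matched the WHERE clause
    are deliberately not used.
    """
    return [(triple, execution_time) for triple in constructed_triples]


# One execution at time 20 that constructed two triples (assumed values).
results = stream_out(
    [(":Bob", ":isIn", ":BlueRoom"), (":David", ":isIn", ":RedRoom")],
    execution_time=20,
)

# An execution may also produce zero triples.
empty = stream_out([], execution_time=30)
```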
C-SPARQL Language FROM STREAM Clause
§ FROM STREAM clauses are similar to SPARQL datasets
• They identify RDF stream data sources
• They represent windows over an RDF stream
§ They define the RDF triples available for querying and filtering.
C-SPARQL Language FROM STREAM Clause - windows
§ physical: a given number of triples
§ logical: a variable number of triples that occur during a given time interval (e.g., 1 hour)
• Sliding: they advance progressively by a given STEP (e.g., 5 minutes)
• Tumbling: they advance by exactly their time interval
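The difference between sliding and tumbling logical windows can be sketched in Python by listing window bounds. The `logical_windows` helper is hypothetical, times are plain seconds, and the choice to open the first window at `start` is an assumption of the sketch; the engine may align windows differently.

```python
def logical_windows(start, end, range_s, step_s):
    """Yield (open, close) bounds of logical windows of width range_s.

    The first window opens at `start`; each following window is advanced
    by step_s. Only windows that close at or before `end` are produced.
    """
    close = start + range_s
    while close <= end:
        yield (close - range_s, close)
        close += step_s


# Sliding: the step is smaller than the range, so windows overlap.
sliding = list(logical_windows(0, 80, range_s=40, step_s=10))

# Tumbling: the step equals the range, so windows do not overlap.
tumbling = list(logical_windows(0, 80, range_s=40, step_s=40))
```

A tumbling window is therefore just the special case of a sliding window whose step equals its range.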
C-SPARQL Language FROM STREAM Clause - Windows
§ Grammar of the FROM STREAM clause
C-SPARQL Language FROM STREAM Clause - Example
§ Using the social stream fb, how many people are in the same room? Count on a window of 1 minute that slides every 10 seconds
REGISTER QUERY HowManyPeopleAreInTheSameRoom AS
PREFIX : <http://…/rsp2014-onto#>
SELECT ?room (COUNT(DISTINCT ?person) AS ?n)
FROM STREAM <http://…/fb> [RANGE 1m STEP 10s]
WHERE {
  ?person1 :posts [ :who ?person ; :where ?room ] .
  FILTER (?person1 != ?person)
}
GROUP BY ?room
C-SPARQL Language C-SPARQL reports only snapshots
[Figure: incoming timestamped RDF triples d1, d2, d3 on a time axis from t to t+80, and the content of a time window [RANGE 40s STEP 10s] reported at each evaluation.]
Evaluation time  Window content
t+40             d1
t+50             d1, d2
t+60             d1, d2
t+70             d1, d2, d3
t+80             d2, d3
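The snapshot behaviour can be reproduced with a small Python sketch. The arrival times of d1, d2, and d3 (35, 45, and 65 seconds after t) are assumptions chosen to be consistent with the window contents reported on the slide, and so is the convention that a window evaluated at time e covers the half-open interval (e - 40, e]. The `window_content` helper is hypothetical.

```python
def window_content(arrivals, eval_time, range_s):
    """Return the items whose timestamp lies in (eval_time - range_s, eval_time]."""
    return [name for name, ts in arrivals if eval_time - range_s < ts <= eval_time]


# Assumed arrival times, in seconds after t.
arrivals = [("d1", 35), ("d2", 45), ("d3", 65)]

# [RANGE 40s STEP 10s]: one snapshot every 10 seconds, from t+40 to t+80.
snapshots = {t: window_content(arrivals, t, 40) for t in range(40, 81, 10)}
```

Each entry of `snapshots` is a complete picture of the window at that evaluation time; nothing is reported between evaluations.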
C-SPARQL Language Multiple FROM STREAM Clause - Example
§ Using the social streams fb and fs, how many people are in the same room? Count on a window of 1 minute that slides every 10 seconds
REGISTER QUERY HowManyPeopleAreInTheSameRoom AS
PREFIX : <http://…/rsp2014-onto#>
SELECT ?room (COUNT(DISTINCT ?person) AS ?n)
FROM STREAM <http://…/fb> [RANGE 1m STEP 10s]
FROM STREAM <http://…/fs> [RANGE 1m STEP 10s]
WHERE {
  ?person1 :posts [ :who ?person ; :where ?room ] .
  FILTER (?person1 != ?person)
}
GROUP BY ?room
C-SPARQL Language TimeStamp Function
C-SPARQL Language TimeStamp Function – Syntax and Semantics
§ The timestamp of a triple can be bound to a variable using a timestamp() function
§ Syntax • timestamp(variable|IRI|bn, variable|IRI, variable|IRI|bn|literal)
§ Semantics
Triple                                   Result of evaluation
It is not in the window                  Type Error
It appears once in the window            Timestamp of the triple
It appears multiple times in the window  The timestamp of the most recent triple
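The three cases of the semantics can be sketched in Python. This is an illustration only: the `timestamp` helper is hypothetical, the window is modelled as a list of (triple, timestamp) pairs, only fully specified triples are handled (no variables), and Python's `TypeError` stands in for the SPARQL type error.

```python
def timestamp(window, triple):
    """Return the timestamp of `triple` within `window`.

    - not in the window: raise TypeError
    - appears once: its timestamp
    - appears several times: the timestamp of the most recent occurrence
    """
    matches = [ts for t, ts in window if t == triple]
    if not matches:
        raise TypeError("triple is not in the window")
    return max(matches)


# Assumed window content: one triple occurs twice, another once.
window = [
    ((":p1", ":where", ":RedRoom"), 1),
    ((":p2", ":where", ":RedRoom"), 3),
    ((":p1", ":where", ":RedRoom"), 5),
]
```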
C-SPARQL Language TimeStamp Function - Example
§ Who are the opinion makers? i.e., the users who are likely to influence the behavior of other users
REGISTER QUERY FindOpinionMakers AS
PREFIX f: <http://larkc.eu/csparql/sparql/jena/ext#>
PREFIX : <http://…/rsp2014-onto#>
SELECT ?someOne ?room (COUNT(?someOneElse) AS ?n)
FROM STREAM <http://…/fb> [RANGE 1m STEP 10s]
WHERE {
  ?someOne :posts ?p1 .
  ?p1 :where ?room .
  ?someOneElse :posts ?p2 .
  ?p2 :where ?room .
  FILTER (?someOne != ?someOneElse)
  FILTER (f:timestamp(?p1, :where, ?room) < f:timestamp(?p2, :where, ?room))
}
GROUP BY ?someOne ?room
HAVING (?n > 3)
C-SPARQL Language Query Chaining
§ A C-SPARQL query Q1 registered using the STREAM clause streams results on an RDF stream
§ A downstream C-SPARQL query Q2 can open a window on the RDF stream of Q1 using the FROM STREAM clause
§ E.g.,
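[Figure: query chaining. An "Is in on 4" query reads the 4 (Foursquare) stream and an "Is with on f" query reads the f (Facebook) stream; each streams out its results, and an "Is In across f and 4" query consumes both result streams.]
4 stream: :Bob :posts [ :who :Bob ; :where :BlueRoom ] .
f stream: :Carl :posts [ :who :Carl , :Bob ] .
Output of "Is in on 4": :Bob :isIn :BlueRoom .
Output of "Is with on f": :Carl :isWith :Bob .
Output of "Is In across f and 4": :Carl :isIn :BlueRoom .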
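The chaining idea can be sketched in Python as three functions, where the outputs of the first two feed the third. Everything here is a simplification for illustration: the function names are hypothetical, posts are modelled as plain tuples instead of RDF graphs, and windows and timestamps are left out.

```python
def is_in_on_fs(fs_posts):
    """First query (sketch): turn check-ins (person, room) into isIn triples."""
    return [(who, ":isIn", room) for who, room in fs_posts]


def is_with_on_fb(fb_posts):
    """Second query (sketch): turn posts (poster, tagged people) into isWith triples."""
    return [
        (poster, ":isWith", other)
        for poster, tagged in fb_posts
        for other in tagged
        if other != poster
    ]


def is_in_across(is_in, is_with):
    """Downstream query (sketch): if A isWith B and B isIn R, then A isIn R."""
    room_of = {person: room for person, _, room in is_in}
    return [(a, ":isIn", room_of[b]) for a, _, b in is_with if b in room_of]


fs_posts = [(":Bob", ":BlueRoom")]         # :Bob checks in at :BlueRoom
fb_posts = [(":Carl", [":Carl", ":Bob"])]  # :Carl posts that he is with :Bob

derived = is_in_across(is_in_on_fs(fs_posts), is_with_on_fb(fb_posts))
```

The downstream function never sees the original posts, only the triples produced upstream, which is the point of registering a query as a stream.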
C-SPARQL Language Accessing background Information
§ C-SPARQL lets a query also be evaluated against RDF graphs, which are named using FROM clauses.
§ E.g., Where else can Alice go?
REGISTER QUERY WhereElseCanAliceGo AS
PREFIX : <http://…/rsp-onto#>
SELECT ?room
FROM STREAM <http://…/isIn> [RANGE 10m STEP 10m]
FROM <http://…/bgInfo.rdf>
WHERE {
  :Alice :isIn ?someRoom .
  ?someRoom :isConnectedTo ?room .
}
• <http://…/bgInfo.rdf> is the IRI identifying the dataset containing the background information
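A Python sketch of the join between window content and background information, under stated assumptions: the `where_else_can_go` helper is hypothetical, triples are plain tuples, and the window content (Alice in the RedRoom) is assumed for the example. The background triple is the one given in the running example.

```python
# Background information (static): taken from the running example.
background = {(":RedRoom", ":isConnectedTo", ":BlueRoom")}

# Current content of the window over the isIn stream (assumed for the sketch).
window = [(":Alice", ":isIn", ":RedRoom")]


def where_else_can_go(person, window, background):
    """Join the streaming isIn triples with the static isConnectedTo triples."""
    rooms_now = {o for s, p, o in window if s == person and p == ":isIn"}
    return sorted(
        o for s, p, o in background
        if p == ":isConnectedTo" and s in rooms_now
    )


rooms = where_else_can_go(":Alice", window, background)
```

Only the window side changes between evaluations; the background set is read as-is each time, which is one reason this kind of access can be costly in practice.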
C-SPARQL Language Alert in accessing background Information
§ Accessing background information in DSMSs and CEPs is problematic and can degrade performance
§ The C-SPARQL Engine 0.9.3 allows for managing background information using a SPARQL update endpoint
§ For a better solution see:
• Le-Phuoc, D., Dao-Tran, M.: A native and adaptive approach for unified processing of linked streams and linked data. In: International Semantic Web Conference (ISWC 2011). Volume 1380., Bonn, Germany, Springer (2011) 370–388
C-SPARQL Language C-SPARQL queries and reasoning
§ SPARQL is orthogonal to reasoning, and so is C-SPARQL
§ The C-SPARQL Engine 0.9.3
• supports data-driven naïve stream reasoning
– This is intended as a baseline for comparison in stream reasoning research
• can answer queries under
– RDFS entailment regime
– RDFS++ entailment regime
C-SPARQL Language C-SPARQL queries and reasoning - example
§ Memo
• posts is a sub property of observes
§ Data
RDF graph                                                  Time-stamp  Stream
:RedSensor :observes [ :who :Alice ; :where :RedRoom ] .   T1          rfid
:Bob :posts [ :who :Bob ; :where :BlueRoom ] .             T2          fs
§ Query
REGISTER QUERY QueryUnderRDFSEntailmentRegime AS
PREFIX : <http://…/sr4ld2013-onto#>
SELECT ?x ?room ?person
FROM STREAM <http://…/fs> [RANGE 1m STEP 10s]
FROM STREAM <http://…/rfid> [RANGE 1m STEP 10s]
WHERE {
  ?x :observes [ :who ?person ; :where ?room ] .
}
§ Results
?x          ?room      ?person
:RedSensor  :RedRoom   :Alice
:Bob        :BlueRoom  :Bob
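Why the query also returns Bob can be sketched in Python with the single RDFS rule involved here, sub-property inference. This is a simplified illustration, not the engine's reasoner: the `with_sub_property_inferences` helper is hypothetical, blank nodes are given made-up labels `_:b1` and `_:b2`, and the sub-property map is assumed to contain no cycles.

```python
# Schema: posts is a sub property of observes (from the memo above).
SUB_PROPERTY_OF = {":posts": ":observes"}


def with_sub_property_inferences(triples):
    """Add (s, q, o) for every (s, p, o) where p is a sub property of q.

    Follows chains of sub properties; assumes the map has no cycles.
    """
    inferred = set(triples)
    for s, p, o in triples:
        while p in SUB_PROPERTY_OF:
            p = SUB_PROPERTY_OF[p]
            inferred.add((s, p, o))
    return inferred


# Window content, with blank nodes flattened to hypothetical labels.
window = [
    (":RedSensor", ":observes", "_:b1"),
    ("_:b1", ":who", ":Alice"),
    ("_:b1", ":where", ":RedRoom"),
    (":Bob", ":posts", "_:b2"),
    ("_:b2", ":who", ":Bob"),
    ("_:b2", ":where", ":BlueRoom"),
]

entailed = with_sub_property_inferences(window)
observers = sorted(s for s, p, o in entailed if p == ":observes")
```

Without the inference step only :RedSensor matches the :observes pattern; with it, the :posts triple from the fs stream matches as well.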
Introduction C-SPARQL Engine Architecture
§ Simple, modular architecture
§ It relies entirely on existing technologies
§ Integration of
• DSMSs (Esper) and
• SPARQL engines (Jena-ARQ)
Introduction C-SPARQL Engine Features at a glance 1/3
§ Efficient RDF stream processing
• Continuous queries, filtering, aggregations, joins, sub-queries via C-SPARQL (a SPARQL 1.1 extension)
• Push based
• High throughput, low latency
§ Minimal support for
• pattern detection via timestamp function
• Background RDF graph access
§ Naïve support for
• RDFS reasoning
• RDFS++ reasoning
§ Fully compatible with SPARQL 1.1 (tested with http://www.w3.org/wiki/SRBench)
Alert! Using the timestamp function, "static" RDF graphs, and reasoning degrades performance
Introduction C-SPARQL Engine Features at a glance 2/3
§ Extensible Middleware
• Runtime management of
– RDF streams
– C-SPARQL queries
– Result listeners
• API driven
• RESTful service driven
§ Quick start available
• C-SPARQL Engine
– http://streamreasoning.org/download/csparqlreadytogopack
• RDF Stream Processing RESTful Interface (RSP-service) for C-SPARQL Engine
– http://streamreasoning.org/download/rsp-service4csparql
Introduction C-SPARQL Engine Features at a glance 3/3
§ Released open source under Apache 2.0
• C-SPARQL Engine
– https://github.com/streamreasoning/CSPARQL-engine
– https://github.com/streamreasoning/CSPARQL-ReadyToGoPack
• RSP-services
– https://github.com/streamreasoning/rsp-services-csparql
– https://github.com/streamreasoning/rsp-services-api
– https://github.com/streamreasoning/rsp-services-client-example
Resources
§ Read more
• C-SPARQL semantics
– Davide Francesco Barbieri, Daniele Braga, Stefano Ceri, Emanuele Della Valle, Michael Grossniklaus: C-SPARQL: a Continuous Query Language for RDF Data Streams. Int. J. Semantic Computing 4(1): 3-25 (2010)
• Most recent syntax
– D. F. Barbieri, D. Braga, S. Ceri, E. Della Valle, M. Grossniklaus, Querying RDF streams with C-SPARQL, SIGMOD Record 39 (1) (2010) 20–26.
§ Downloads
• http://streamreasoning.org/download/csparqlreadytogopack
• http://streamreasoning.org/download/rsp-service4csparql
§ See demos
• http://streamreasoning.org/demos
§ Contact point
• [email protected]
• [email protected]