RDF Stream Processing: Let’s React! Jean-Paul Calbimonte, LSIR EPFL. OrdRing 2014, International Semantic Web Conference (ISWC 2014), Riva del Garda, 20.10.2014. @jpcik

Upload: jean-paul-calbimonte

Post on 18-Dec-2014


DESCRIPTION

RDF Stream processing and reactive systems

TRANSCRIPT

Page 1: RDF Stream Processing: Let's React

RDF Stream Processing: Let’s React!

Jean-Paul Calbimonte, LSIR EPFL

OrdRing 2014. International Semantic Web Conference ISWC 2014

Riva del Garda, 20.10.2014

@jpcik

Page 2: RDF Stream Processing: Let's React

The RSP Community

Research work (tons of work): many papers, PhD theses, datasets, prototypes, benchmarks

Many topics: RDF streams, stream reasoning, complex event processing, stream query processing, stream compression, Semantic Sensor Web

W3C RSP Community Group: http://www.w3.org/community/rsp

Effort to discuss, standardize, combine, formalize and evangelize our work on RDF stream processing

Page 3: RDF Stream Processing: Let's React

The RSP Community


Page 4: RDF Stream Processing: Let's React

Why Streams?

Internet of Things, sensor networks, mobile networks, smart devices, participatory sensing, transportation, financial data, social media, urban planning, health monitoring, marketing

“It’s a streaming world!”[1]

[1] Della Valle et al. It's a Streaming World! Reasoning upon Rapidly Changing Information. IEEE Intelligent Systems.

Page 5: RDF Stream Processing: Let's React

Why Streams? RDF

Go Web: Web standards, data discovery, data sharing, Web queries

Integration: semantics, vocabularies, data harvesting, data linking, matching

Reasoning: ontologies, expressivity, inference, rule processing, knowledge bases

Processing: query languages, query answering, efficient processing, query federation

Work in progress

Is this what we require?

Page 6: RDF Stream Processing: Let's React

Looking back 10 years ago…

“8 requirements of real-time stream processing”[2]

8 Requirements:
  1. Keep data moving
  2. Query with stream SQL
  3. Handle imperfections
  4. Predictable outcomes
  5. Integrate stored data
  6. Data safety & availability
  7. Partition & scale
  8. Respond instantaneously

Do we address them?
Do we have more requirements?
Do we need to do more?

[2] Stonebraker et al. The 8 Requirements of Real-Time Stream Processing. SIGMOD Record. 2005.

Page 7: RDF Stream Processing: Let's React

Reactive Systems

8 Requirements: keep data moving; query with stream SQL; handle imperfections; predictable outcomes; integrate stored data; data safety & availability; partition & scale; respond instantaneously

Reactive systems react to:
  Events: Event-Driven
  Load: Scalable
  Failure: Resilient
  Users: Responsive

Do we address them?
Do we have more requirements?
Do we need to do more?

Jonas Bonér. Go Reactive: Event-Driven, Scalable, Resilient & Responsive Systems. 2013.

Page 8: RDF Stream Processing: Let's React

Warning: You may see code


Page 9: RDF Stream Processing: Let's React

① Keep the data moving

Process data in-stream
Not required to store
Active processing model

RDF Streams: input streams → RSP queries/rules → output streams/events

Page 10: RDF Stream Processing: Let's React

RDF Stream

… Gi, Gi+1, Gi+2, …, Gi+n, … (unbounded sequence)

Gi = {(s1,p1,o1), (s2,p2,o2), …} [ti]

1+ triples, implicit/explicit timestamp/interval

How do I code this? (C-SPARQL)

    public class SensorsStreamer extends RdfStream implements Runnable {
      public void run() {
        ..
        while(true){
          ...
          RdfQuadruple q=new RdfQuadruple(subject,predicate,object,
                                          System.currentTimeMillis());
          this.put(q);
        }
      }
    }

Callouts on the code: something to run on a thread; timestamped triple; the stream is “observable”

Data structure, execution and callbacks are mixed
Observer pattern
Tightly coupled listener
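One way to read the model above (each Gi is a set of triples with a timestamp ti, and the stream is an unbounded sequence of them) is as a plain immutable data structure, kept separate from threads and callbacks. The following is only a sketch under assumptions: the class and field names are hypothetical, strings stand in for IRIs, and it is not the C-SPARQL API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

final class RdfStreamSketch {
    // A triple (s, p, o); plain strings stand in for IRIs and literals (assumption).
    static final class Triple {
        final String subject, predicate, object;
        Triple(String s, String p, String o) { subject = s; predicate = p; object = o; }
    }

    // One stream element G_i: one or more triples plus an explicit timestamp t_i.
    static final class TimestampedGraph {
        final List<Triple> triples;
        final long timestamp;
        TimestampedGraph(List<Triple> triples, long timestamp) {
            // Defensive copy so the element cannot be changed after creation.
            this.triples = Collections.unmodifiableList(new ArrayList<>(triples));
            this.timestamp = timestamp;
        }
    }

    // Lazy, unbounded sequence G_0, G_1, ...; nothing is computed until a consumer pulls.
    static Stream<TimestampedGraph> stream(long startMillis, long stepMillis) {
        return Stream.iterate(0L, i -> i + 1).map(i -> new TimestampedGraph(
            Collections.singletonList(
                new Triple("ex:sensor" + i, "ex:detectedAt", "ex:room" + (i % 3))),
            startMillis + i * stepMillis));
    }

    public static void main(String[] args) {
        // Take a finite prefix of the unbounded stream.
        stream(0L, 1000L).limit(3).forEach(g ->
            System.out.println(g.timestamp + " " + g.triples.get(0).subject));
    }
}
```

Here the data structure carries no execution logic: how the elements are consumed (a thread, a future, an actor) is decided by the caller.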

Page 11: RDF Stream Processing: Let's React

Actor Model

Actor1 → m → Actor2: actors communicate through messages

Each actor has: mailbox, state, behavior
send: fire-forget
non-blocking response

No shared mutable state
Avoid blocking operators
Lightweight objects
Loose coupling

More goodies later…

Implementations: e.g. Akka for Java/Scala
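The parts named on this slide (mailbox, behavior, fire-and-forget send) can be sketched with the Java standard library alone. This is a toy illustration, not Akka: `MiniActor`, `tell`, `stop` and `demo` are hypothetical names, and a single-threaded executor is assumed to be an acceptable stand-in for a mailbox.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

final class MiniActor<M> {
    // The executor's queue plays the role of the mailbox; one worker thread
    // means messages are processed one at a time, in arrival order.
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();
    private final Consumer<M> behavior;

    MiniActor(Consumer<M> behavior) { this.behavior = behavior; }

    // Fire-and-forget send: enqueue the message and return immediately.
    void tell(M message) { mailbox.execute(() -> behavior.accept(message)); }

    // Stop accepting messages and wait for the mailbox to drain.
    boolean stop(long timeoutMillis) {
        mailbox.shutdown();
        try {
            return mailbox.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Sends n messages to a counting actor and returns how many it processed.
    static int demo(int n) {
        AtomicInteger seen = new AtomicInteger();
        MiniActor<String> sink = new MiniActor<>(triple -> seen.incrementAndGet());
        for (int i = 0; i < n; i++) {
            sink.tell("ex:s" + i + " ex:p ex:o");
        }
        sink.stop(1000);
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println("processed " + demo(5) + " messages");
    }
}
```

The sender never blocks on the receiver and never touches its state directly; everything the receiver learns arrives as a message.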

Page 12: RDF Stream Processing: Let's React

RDF Stream

Ideas using Akka actors (disclaimer: ideas in progress)

Data structure: infinite triple iterator

    object DemoStreams {
      ...
      def streamTriples={
        Iterator.from(1) map{i=>
          ...
          new Triple(subject,predicate,object)
        }
      }
    }

Execution: asynchronous iteration

    val f=Future(DemoStreams.streamTriples)
    f.map{a=>a.foreach{triple=>
      //do something
    }}

Message passing: send triple to actor

    f.map{a=>a.foreach{triple=>
      someSink ! triple
    }}

Immutable RDF stream: avoid shared mutable state; avoid concurrent writes; unbounded sequence

Futures: non-blocking composition; concurrent computations; work with not-yet-computed results

Actors: message-based; share-nothing; async; distributable

Page 13: RDF Stream Processing: Let's React

RDF Stream

Event-driven:
  Loose coupling
  Immutable data streams
  Asynchronous message passing
  Well-defined input/output

… other issues: Graph implementation? Timestamps: application vs system? Serialization?

Page 14: RDF Stream Processing: Let's React

② Query using SQL (here: SPARQL) on Streams

Powerful languages for continuous query processing

Feature columns: (1) Continuous execution; (2) Union, Join, Optional, Filter; (3) Aggregates; (4) Time window; (5) Triple window; (6) R2S operator; (7) Sequence, Co-occurrence; (8) Time function

Language         | Model                | (1) | (2) | (3)     | (4) | (5) | (6) | (7) | (8)
TA-SPARQL        | TA-RDF               | ✗   | ✔   | Limited | ✗   | ✗   | ✗   | ✗   | ✗
tSPARQL          | tRDF                 | ✗   | ✔   | ✗       | ✗   | ✗   | ✗   | ✗   | ✗
Streaming SPARQL | RDF Stream           | ✔   | ✔   | ✗       | ✔   | ✔   | ✗   | ✗   | ✗
C-SPARQL         | RDF Stream           | ✔   | ✔   | ✔       | ✔   | ✔   | ✗   | ✗   | ✔
CQELS            | RDF Stream           | ✔   | ✔   | ✔       | ✔   | ✔   | ✗   | ✗   | ✗
SPARQLStream     | (Virtual) RDF Stream | ✔   | ✔   | ✔       | ✔   | ✗   | ✔   | ✗   | ✗
EP-SPARQL        | RDF Stream           | ✔   | ✔   | ✔       | ✗   | ✗   | ✗   | ✔   | ✗
Instans          | RDF                  | ✔   | ✔   | ✔       | ✗   | ✗   | ✗   | ✗   | ✗

Not exhaustive!

W3C RSP: review features in existing systems; agree on fundamental operators; discuss possible semantics
https://www.w3.org/community/rsp/wiki/RSP_Query_Features

RSP is not always/only SPARQL-like querying
SPARQL protocol is not enough
RSP RESTful interfaces?

Page 15: RDF Stream Processing: Let's React

RSP Querying

Example with CQELS (code.google.com/p/cqels)

CQELS continuous query: “Queries evaluated continuously against the changing dataset”[4]

    SELECT ?person ?loc
    WHERE {
      STREAM <http://deri.org/streams/rfid> [RANGE 3s]
      {?person :detectedAt ?loc}
    }

    ExecContext context=new ExecContext(HOME, false);
    String queryString =" SELECT ?person ?loc …
    // register query
    ContinuousSelect selQuery=context.registerSelect(queryString);
    // adding listener
    selQuery.register(new ContinuousListener(){
      // get result updates
      public void update(Mapping mapping){
        String result="";
        for(Iterator<Var> vars=mapping.vars();vars.hasNext();)
          result+=" "+ context.engine().decode(mapping.get(vars.next()));
        System.out.println(result);
      }
    });

Tightly coupled listeners
Results delivery: push & pull?

[4] Le Phuoc et al. A Native and Adaptive Approach for Unified Processing of Linked Streams and Linked Data. ISWC 2011.
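The `[RANGE 3s]` clause in the query above restricts matching to recent stream elements. As a rough illustration of what a sliding time window does, here is a minimal sketch; it assumes non-decreasing timestamps and a window of the form (t - range, t], and it makes no claim about how CQELS actually implements or defines its windows. All names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class TimeWindowSketch {
    private static final class Entry {
        final long timestamp;
        final String triple;
        Entry(long timestamp, String triple) { this.timestamp = timestamp; this.triple = triple; }
    }

    private final long rangeMillis;                       // window width, assumed > 0
    private final Deque<Entry> buffer = new ArrayDeque<>();

    TimeWindowSketch(long rangeMillis) { this.rangeMillis = rangeMillis; }

    // Adds a triple (timestamps assumed non-decreasing) and returns the triples
    // whose timestamps lie in (timestamp - range, timestamp].
    List<String> add(long timestamp, String triple) {
        buffer.addLast(new Entry(timestamp, triple));
        // Drop everything that has fallen out of the window; the entry just
        // added always stays, so the buffer is never empty here.
        while (buffer.peekFirst().timestamp <= timestamp - rangeMillis) {
            buffer.removeFirst();
        }
        List<String> window = new ArrayList<>();
        for (Entry e : buffer) {
            window.add(e.triple);
        }
        return window;
    }

    public static void main(String[] args) {
        TimeWindowSketch w = new TimeWindowSketch(3000);
        System.out.println(w.add(0, ":alice :detectedAt :room1"));
        System.out.println(w.add(2000, ":bob :detectedAt :room2"));
        System.out.println(w.add(3000, ":carol :detectedAt :room1"));
    }
}
```

With a 3-second range, the element stamped 0 is dropped once an element stamped 3000 arrives, so a continuous query would be evaluated over the remaining two.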

Page 16: RDF Stream Processing: Let's React

Dynamic Push-Pull

Producer → Consumer: data flow (messages m)
Consumer → Producer: demand flow

Push when consumer is faster
Pull when producer is faster
Dynamically switch modes

Communication is dynamic depending on demand vs supply

Event-driven, Responsive
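The data-flow/demand-flow split described here is the idea behind Reactive Streams, whose interfaces ship with the JDK (Java 9+) as `java.util.concurrent.Flow`. The slide does not name an API, so the following is a sketch under that assumption: the consumer signals demand with `request(1)`, and the producer only delivers what has been requested. `DemandDrivenSketch` and `OneAtATime` are hypothetical names.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.concurrent.TimeUnit;

final class DemandDrivenSketch {
    // Consumer that asks for exactly one element at a time.
    static final class OneAtATime implements Flow.Subscriber<String> {
        final List<String> received = new CopyOnWriteArrayList<>();
        final CountDownLatch done = new CountDownLatch(1);
        private Flow.Subscription subscription;

        @Override public void onSubscribe(Flow.Subscription s) {
            subscription = s;
            s.request(1);               // demand flow: ask for the first element
        }
        @Override public void onNext(String triple) {
            received.add(triple);       // data flow: one element arrives
            subscription.request(1);    // ready for the next one
        }
        @Override public void onError(Throwable t) { done.countDown(); }
        @Override public void onComplete() { done.countDown(); }
    }

    // Publishes n triples and returns what the consumer received.
    static List<String> run(int n) {
        OneAtATime consumer = new OneAtATime();
        try (SubmissionPublisher<String> producer = new SubmissionPublisher<>()) {
            producer.subscribe(consumer);
            for (int i = 0; i < n; i++) {
                // submit() blocks when the consumer's buffer is full.
                producer.submit("ex:person" + i + " ex:detectedAt ex:room1");
            }
        } // close() signals completion to the consumer
        try {
            consumer.done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return consumer.received;
    }

    public static void main(String[] args) {
        System.out.println(run(4));
    }
}
```

When the consumer keeps up, elements are effectively pushed as soon as they are submitted; when it falls behind, its outstanding demand bounds how much the producer can hand over.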

Page 17: RDF Stream Processing: Let's React

③ Handle stream imperfections

Delayed, missing, out-of-order …

“order matters!”[1]

Setting timeouts in message passing

    // timeout after 5 sec
    val future = myActor.ask("hello")(5 seconds)

    // timeout receiving messages
    context.setReceiveTimeout(100 milliseconds)

    def receive = {
      case "Hello" => //do something
      // in case of timeout
      case ReceiveTimeout => throw new RuntimeException("Timed out")
    }

Async message passing helps
Still to be studied at protocol level
Different guarantee requirements
In W3C RSP: usually ‘out-of-scope’

Responsive, Resilient
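The same "ask with a deadline" idea can be expressed without Akka using `CompletableFuture.orTimeout` from the JDK (Java 9+). This is a sketch only; `TimeoutSketch`, `askWithTimeout` and the fallback string are hypothetical, and a real system would likely propagate the failure rather than substitute a value.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

final class TimeoutSketch {
    // Returns the reply, or a fallback value if none arrives within the deadline.
    static String askWithTimeout(CompletableFuture<String> reply, long millis) {
        return reply
            .orTimeout(millis, TimeUnit.MILLISECONDS) // fails with TimeoutException after the deadline
            .exceptionally(error -> "timed out")      // turn the failure into a fallback value
            .join();
    }

    public static void main(String[] args) {
        // A reply that is already available.
        System.out.println(askWithTimeout(CompletableFuture.completedFuture("hello"), 100));
        // A reply that never arrives (delayed or missing stream element).
        System.out.println(askWithTimeout(new CompletableFuture<>(), 100));
    }
}
```

The caller is never stuck waiting on a missing element: after the deadline it gets an explicit outcome it can react to.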

Page 18: RDF Stream Processing: Let's React

④ Generate predictable outcomes

Correctness in RSP

“RSP engines produce different but correct results”[1]

RSP queries: operational semantics
Common query execution model
Comparable / predictable results

Work in progress

[1] Dell’Aglio, Calbimonte, Balduini, Corcho, Della Valle. On Correctness in RDF Stream Processor Benchmarking. ISWC 2013.

Page 19: RDF Stream Processing: Let's React

⑤ Integrate stored and streaming data

Integration in RSP languages

CQELS: stored RDF graph + RDF stream

    SELECT  ?person1 ?person2
    FROM NAMED <http://deri.org/floorplan/>
    WHERE {
      GRAPH <http://deri.org/floorplan/> {?loc1 lv:connected ?loc2}
      STREAM <http://deri.org/streams/rfid> [NOW] {?person1 lv:detectedAt ?loc1}

Loading background knowledge (CQELS):

    context.loadDataset("{URI OF GRAPH}", "{DIRECTORY TO LOAD}");

EP-SPARQL: RDF stream + stored

    SELECT  ?road ?speed
    WHERE { { ?road :slowTrafficDue ?observ }
            { ?observ rdfs:subClassOf :SlowTraffic
            { ?observ :speed ?speed }
            FILTER (getDURATION() < "P1H"^^xsd:duration)}

“EP-SPARQL uses background ontologies to enable stream reasoning”[1]

More than just joins with stored data
Stream reasoning / inferences

Clean query model for stream+stored data
Web-ready inter-dataset queries
Shared vocabularies/ontologies

Responsive? Scalable?

[1] Anicic et al. EP-SPARQL: a unified language for event processing and stream reasoning. WWW 2011.

Page 20: RDF Stream Processing: Let's React

⑥ Guarantee data safety and availability

Supervision hierarchy (diagram): Parent, Actor1, Actor2, Actor3, Actor4 (failed, marked X)

Supervision: Restart, Suspend, Stop, Escalate, etc.

Automatic supervision
Isolate failures
Manage local failures
Supervision strategies: All-for-One, One-for-One
Death watch handling

Resilient
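To make the restart and escalate directives concrete, here is a deliberately simplified, single-threaded sketch of a one-for-one style strategy: when the child fails on a message, only that child is replaced by a fresh instance, and after too many restarts the failure is escalated. It is not Akka's supervision API; `SupervisorSketch`, `handle` and the restart budget are assumptions made for illustration.

```java
import java.util.function.Function;
import java.util.function.Supplier;

final class SupervisorSketch<M, R> {
    private final Supplier<Function<M, R>> childFactory; // builds a fresh child
    private final int maxRestarts;                       // restart budget before escalating
    private Function<M, R> child;
    private int restarts = 0;

    SupervisorSketch(Supplier<Function<M, R>> childFactory, int maxRestarts) {
        this.childFactory = childFactory;
        this.maxRestarts = maxRestarts;
        this.child = childFactory.get();
    }

    // Returns the child's result, or null if this message made the child fail
    // and the child was restarted.
    R handle(M message) {
        try {
            return child.apply(message);
        } catch (RuntimeException failure) {
            if (restarts >= maxRestarts) {
                throw failure;              // escalate: let our own parent decide
            }
            restarts++;
            child = childFactory.get();     // restart: replace the failed child
            return null;
        }
    }

    int restarts() { return restarts; }

    public static void main(String[] args) {
        SupervisorSketch<String, Integer> supervisor =
            new SupervisorSketch<String, Integer>(() -> Integer::parseInt, 1);
        System.out.println(supervisor.handle("42"));    // child succeeds
        System.out.println(supervisor.handle("oops"));  // child fails and is restarted
        System.out.println(supervisor.handle("7"));     // fresh child keeps working
    }
}
```

The failure stays local: the sender of the bad message is not affected, and the supervisor alone decides whether the problem can be handled at this level.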

Page 21: RDF Stream Processing: Let's React

Data availability

High load: input streams → RSP queries/rules under stress

“eviction: delete from cache of operators”[1]

Join operator (diagram): two caches of bindings (binding1, binding2, …, bindingX and binding3, binding5, …, bindingY), with evicted items marked X

Evict cache items based on:
  Recency (LRU)
  Likeliness of matching

Responsive, Resilient

[1] Gao et al. The CLOCK Data-Aware Eviction Approach: Towards Processing Linked Data Streams with Limited Resources. ESWC 2014.
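Of the two eviction criteria listed, recency is the easier one to sketch. The following is a plain LRU cache built on `LinkedHashMap` in access order; it illustrates recency-based eviction only and is not the CLOCK approach from the cited paper. The class name and the use of strings for bindings are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LruBindingCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruBindingCache(int capacity) {
        // accessOrder = true: entries are ordered from least to most recently used.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    // Called by LinkedHashMap after each insertion; returning true evicts the
    // least recently used entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruBindingCache<String, String> cache = new LruBindingCache<>(2);
        cache.put("binding1", "?person=:alice");
        cache.put("binding2", "?person=:bob");
        cache.get("binding1");                   // binding1 is now the most recently used
        cache.put("binding3", "?person=:carol"); // over capacity: binding2 is evicted
        System.out.println(cache.keySet());      // prints [binding1, binding3]
    }
}
```

A data-aware policy would replace the recency ordering with an estimate of how likely each cached binding is to find a join partner.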

Page 22: RDF Stream Processing: Let's React

⑦ Partition and scale automatically

Dynamite: Parallel Materialization
Maintaining a dynamic ontology
Parallel threads

Scalable

Page 23: RDF Stream Processing: Let's React

Actors everywhere

Actor1, Actor2, Actor3, Actor4 exchanging messages (m)

No difference in: one core, many cores, many servers

Transparent remoting
Locality optimization
Define routing policies
Define actor clusters

Existing ‘map reduce’ for streams: Storm, S4, Spark, Akka Streams
Create workflows of stream processors?

Page 24: RDF Stream Processing: Let's React

⑧ Process and respond instantaneously

SPARQLStream with Morph-streams: “Query virtual RDF streams”[1]

Architecture (diagram): users, applications → query processing (Morph-streams: SPARQLStream over a virtual RDF stream) → rewritten queries → data layer (DSMS, CEP, sensor middleware)

Responsive?
  Blocking queries
  Sync communication
  Bottlenecks
  No end-to-end reactivity

Push: using WebSockets!
  Two-way communication
  Responsive answers

Page 25: RDF Stream Processing: Let's React

RSP Stream Reasoning

Example with TrOWL (trowl.eu)

Ontologies evolve over time!
Adding and removing axioms over time: ONTO (+ axiom1, + axiom2) → ONTO’ (- axiom3) → ONTO’’

“reasoning with changing knowledge”[3]

    val onto=mgr.createOntology
    val reasoner = new RELReasonerFactory().createReasoner(onto)
    (1 to 10).foreach{i=>
      // add 10 axioms
      reasoner += (Indiv(rspEx+"sys"+i) ofClass system)
      // reclassify
      reasoner.reclassify
    }
    // get instances of ‘System’ class
    println (reasoner.getInstances(SystemClass, true))

Powerful
Dynamic ontology maintenance
Streaming query results?
Notify new inferences?

Responsive

[3] Ren, Pan, Zhao. Towards Scalable Reasoning on Ontology Streams via Syntactic Approximation. IWOD 2010.

Page 26: RDF Stream Processing: Let's React

A lot to do…


Page 27: RDF Stream Processing: Let's React

Reactive RSPs

8 Requirements: keep data moving; query with stream SQL; handle imperfections; predictable outcomes; integrate stored data; data safety & availability; partition & scale; respond instantaneously

We go beyond only these: data heterogeneity; data modeling; stream reasoning; data discovery; stream data linking; query optimization; … more

Reactive principles: needed if we want to build relevant systems

Page 28: RDF Stream Processing: Let's React

Reactive RSP workflows

Workflow (diagram): MorphStreams, CSPARQL, Etalis, TrOWL, CQELS and Dynamite connected by streams (s)

Reactive RSPs:
  Event-driven message passing
  Async communication
  Immutable streams
  Transparent remoting
  Parallel and distributed
  Supervised failure handling
  Responsive processing

Minimal agreements: standards, serialization, interfaces
Formal models for RSPs and reasoning
Working prototypes/systems!

Page 29: RDF Stream Processing: Let's React

We like to do things


Page 30: RDF Stream Processing: Let's React

Muchas gracias!

Jean-Paul Calbimonte, LSIR EPFL

@jpcik