Stream Reasoning - where we got so far 2011.1.18 Oxford Key Note


Upload: emanuele-della-valle

Post on 27-Jan-2015


Page 1: Stream Reasoning - where we got so far 2011.1.18 Oxford Key Note

• For more information visit http://wiki.larkc.eu/UrbanComputing

Stream Reasoning: Where We Got So Far

Oxford - 2011.1.18

http://streamreasoning.org

Emanuele Della Valle, DEI - Politecnico di Milano

http://emanueledellavalle.org

Joint work with:

Davide Francesco Barbieri, Daniele Braga, Stefano Ceri, and Michael Grossniklaus

Page 2: Stream Reasoning - where we got so far 2011.1.18 Oxford Key Note

Emanuele Della Valle - visit http://streamreasoning.org

Agenda

•  Motivation
•  Running Example
•  Background
•  Concept
•  Achievements
•  Retrospective and Conclusions

2 Oxford, 2011-1-18

Page 3: Stream Reasoning - where we got so far 2011.1.18 Oxford Key Note

Motivation: It's a Streaming World! [IEEE-IS2009]

•  Sensor networks, …
•  traffic engineering, …
•  social networking, …
•  financial markets, …

… all generate streams!

Page 4: Stream Reasoning - where we got so far 2011.1.18 Oxford Key Note

Running Example: Real-Time Streams on the Web

•  Streams appear more and more often on the Web, on sites that distribute and present information in real time.
•  Check out http://activitystrea.ms/ for a standard API.
•  E.g. [the examples shown on the slide are not preserved in this transcript]

Page 5: Stream Reasoning - where we got so far 2011.1.18 Oxford Key Note

Running Example: Examples of Questions Users Are Asking

•  Which topics have my close friends discussed in the last hour?
•  Which book is my friend likely to read next?
•  What impact have I been creating with my tweets in the last day?
•  …
•  <query> … <time dimension> ?

Page 6: Stream Reasoning - where we got so far 2011.1.18 Oxford Key Note

Motivation: Problem Statement

•  Making sense
   –  in real time
   –  of gigantic and inevitably noisy data streams
   –  in order to support the decision process of extremely large numbers of concurrent users

Page 7: Stream Reasoning - where we got so far 2011.1.18 Oxford Key Note

Background: What Are Data Streams Anyway?

•  Formally:
   –  Data streams are unbounded sequences of time-varying data elements
•  Less formally:
   –  an (almost) “continuous” flow of information
   –  with the recent information being more relevant, as it describes the current state of a dynamic system

[Figure: stream elements along a "time" axis]

Page 8: Stream Reasoning - where we got so far 2011.1.18 Oxford Key Note

Background: Continuous Semantics

•  Processing data streams under one-time semantics is difficult because of the very nature of the underlying data
•  Innovative* assumption: continuous semantics!
   –  streams can be consumed on the fly rather than being stored forever, and
   –  queries are registered and continuously produce answers

* This innovation arose in the DB community in the ’90s

Page 9: Stream Reasoning - where we got so far 2011.1.18 Oxford Key Note

Background: Stream Processing

•  Continuous queries are registered over streams that are observed through windows

[Figure: input stream, window, registered continuous query, stream of answers]
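To make the window idea concrete, here is a minimal sketch in Python, assuming a time-based window and a query that is simply re-evaluated on every arrival. The names `continuous_query`, `Item` and the `(timestamp, payload)` layout are hypothetical and not taken from any DSMS.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, List, Tuple

Item = Tuple[float, str]  # (timestamp in seconds, payload); hypothetical layout


def continuous_query(
    stream: Iterable[Item],
    width: float,
    query: Callable[[List[str]], object],
) -> Iterator[Tuple[float, object]]:
    """Re-evaluate `query` over a time-based window every time an item arrives."""
    window: Deque[Item] = deque()
    for ts, payload in stream:
        window.append((ts, payload))
        # Evict items that fell out of the window, i.e. outside (ts - width, ts].
        while window and window[0][0] <= ts - width:
            window.popleft()
        # The registered query produces one answer per arrival.
        yield ts, query([p for _, p in window])


stream = [(0, "a"), (4, "b"), (9, "c"), (12, "d")]
answers = list(continuous_query(stream, width=10, query=len))
print(answers)  # [(0, 1), (4, 2), (9, 3), (12, 3)]
```

The stream is consumed on the fly: only the items inside the window are kept in memory, and the answers themselves form a stream.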

Page 10: Stream Reasoning - where we got so far 2011.1.18 Oxford Key Note

Background: Data Stream Management Systems (DSMS)

•  Research prototypes
   –  Amazon/Cougar (Cornell) – sensors
   –  Aurora (Brown/MIT) – sensor monitoring, dataflow
   –  Gigascope (AT&T Labs) – network monitoring
   –  Hancock (AT&T) – telecom streams
   –  Niagara (OGI/Wisconsin) – Internet DBs & XML
   –  OpenCQ (Georgia) – triggers, view maintenance
   –  Stream (Stanford) – general-purpose DSMS
   –  Stream Mill (UCLA) – power & extensibility
   –  Tapestry (Xerox) – publish/subscribe filtering
   –  Telegraph (Berkeley) – adaptive engine for sensors
   –  Tribeca (Bellcore) – network monitoring
•  High-tech startups
   –  Streambase, Coral8, Apama, Truviso
•  Major DBMS vendors are all adding stream extensions as well
   –  Oracle http://www.oracle.com/technology/products/dataint/htdocs/streams_fo.html
   –  DB2 http://www.eweek.com/c/a/Database/IBM-DB2-Turns-25-and-Prepares-for-New-Life/

Page 11: Stream Reasoning - where we got so far 2011.1.18 Oxford Key Note

Background: Can the Semantic Web Process Data Streams?

•  The Semantic Web, the Web of Data, is doing fine
   –  RDF, RDF Schema, SPARQL, OWL, RIF
   –  well understood theory
   –  rapid increase in scalability
•  BUT it pretends that the world is static, or at best changing slowly in both change volume and change frequency
   –  ontology versioning
   –  belief revision
   –  time stamps on named graphs
•  It sticks to the traditional one-time semantics

Page 12: Stream Reasoning - where we got so far 2011.1.18 Oxford Key Note

Concept: Stream Reasoning [IEEE-IS2010]

•  Idea origination
   –  Can continuous semantics be ported to reasoning?
   –  This is an unexplored yet high-impact research area!
•  Stream Reasoning
   –  Logical reasoning in real time on gigantic and inevitably noisy data streams in order to support the decision process of extremely large numbers of concurrent users. -- S. Ceri, E. Della Valle, F. van Harmelen and H. Stuckenschmidt, 2010
•  Note: making sense of streams necessarily requires processing them against rich background knowledge

Page 13: Stream Reasoning - where we got so far 2011.1.18 Oxford Key Note

Concept: Research Challenges

•  Relation with data-stream systems
   –  Just as RDF relates to data-base systems?
•  Query languages for semantic streams
   –  Just as SPARQL for RDF but with continuous semantics?
•  Reasoning on Streams
   –  Formal representations for stream reasoning
   –  Notions of soundness and completeness
   –  Efficiency
   –  Scalability
•  Dealing with incomplete & noisy data
   –  Even more so than on the current Web of Data
•  Distributed and parallel processing
   –  Streams are parallel in nature

Page 14: Stream Reasoning - where we got so far 2011.1.18 Oxford Key Note

Achievements: Explored Continuous Semantics for SeWeb

•  We investigated
   –  Architecture of a Stream Reasoner
   –  RDF streams
      •  the natural extension of the RDF data model to the new continuous scenario, and
   –  Continuous SPARQL (or simply C-SPARQL)
      •  the extension of SPARQL for querying RDF streams
   –  Efficient incremental updates of deductive closures
      •  specifically considering the nature of data streams
   –  Effective inductive stream reasoning (joint work with Siemens - Munich)
      •  See paper in IEEE IS special issue on Social Media Analytics

Page 15: Stream Reasoning - where we got so far 2011.1.18 Oxford Key Note

Achievements: Architecture [IEEE-IS2010]

•  Based on the LarKC conceptual framework http://www.larkc.eu

[Figure: architecture for Social Media Analytics. Component labels: Selector DSMS, Abstracter DSMS, Deductive Reasoner Window, Abstracter Long-Term Matrix, Abstracter Hype Matrix, Inductive Reasoner (twice). Legend: data stream, RDF stream, RDF graph, C-SPARQL query (C), SPARQL with Probability (P)]

Page 16: Stream Reasoning - where we got so far 2011.1.18 Oxford Key Note

Achievements: RDF Stream [WWW2009,EDBT2010,IJSC2010]

•  RDF Stream Data Type
   –  Ordered sequence of pairs, where each pair is made of an RDF triple and its timestamp t
         (<triple>, t)
•  E.g.,
      (<:Giulia :likes :Twilight>, 2010-02-12T13:34:41)
      (<:John :likes :TheLordOfTheRings>, 2010-02-12T13:36:28)
      (<:Alice :dislikes :Twilight>, 2010-02-12T13:36:28)
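The data type can be sketched in Python as follows. This is only an illustration: the class name `TimestampedTriple` and the helper `is_valid_stream` are hypothetical, and the check assumes that "ordered" means non-decreasing timestamps, which is consistent with the example (two triples share the same timestamp).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass(frozen=True)
class TimestampedTriple:
    """One element of an RDF stream: the pair (<triple>, t)."""
    triple: Triple
    timestamp: datetime


def is_valid_stream(items: List[TimestampedTriple]) -> bool:
    """Assumption: a stream is ordered by non-decreasing timestamp."""
    return all(a.timestamp <= b.timestamp for a, b in zip(items, items[1:]))


stream = [
    TimestampedTriple((":Giulia", ":likes", ":Twilight"),
                      datetime.fromisoformat("2010-02-12T13:34:41")),
    TimestampedTriple((":John", ":likes", ":TheLordOfTheRings"),
                      datetime.fromisoformat("2010-02-12T13:36:28")),
    TimestampedTriple((":Alice", ":dislikes", ":Twilight"),
                      datetime.fromisoformat("2010-02-12T13:36:28")),
]
print(is_valid_stream(stream))  # True
```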

Page 17: Stream Reasoning - where we got so far 2011.1.18 Oxford Key Note

Achievements: C-SPARQL [WWW2009,EDBT2010,IJSC2010]

•  We specified the C-SPARQL syntax
   –  Incrementally, from existing specifications
      •  Including windows, grouping, aggregates, timestamping
•  We gave the formal semantics of C-SPARQL
   –  Query registration, handling overloads
   –  Order of evaluation, pattern matching over time, …
•  We investigated efficiency of evaluation
   –  Defining a suitable algebra
   –  Applying optimizations
   –  Efficient materialization of inferred data from streams

Page 18: Stream Reasoning - where we got so far 2011.1.18 Oxford Key Note

Achievements: An Example of C-SPARQL Query

Who are the opinion makers? I.e., the users who are likely to influence the behavior of other users who follow them.

   REGISTER STREAM OpinionMakers COMPUTED EVERY 5m AS
   CONSTRUCT { ?opinionMaker sd:about ?resource }
   FROM STREAM <http://streamingsocialdata.org/interactions> [RANGE 30m STEP 5m]
   WHERE {
     ?opinionMaker ?opinion ?resource .
     ?follower sioc:follows ?opinionMaker .
     ?follower ?opinion ?resource .
     FILTER ( cs:timestamp(?follower) > cs:timestamp(?opinionMaker)
              && ?opinion != sd:accesses )
   }
   HAVING ( COUNT(DISTINCT ?follower) > 3 )

Page 19: Stream Reasoning - where we got so far 2011.1.18 Oxford Key Note

Achievements: An Example of C-SPARQL Query (annotated)

The same opinion-makers query as on the previous slide, with callouts on its C-SPARQL features:

•  REGISTER STREAM … COMPUTED EVERY 5m: query registration (for continuous execution)
•  REGISTER STREAM … AS CONSTRUCT: RDF stream added as new output format
•  FROM STREAM: FROM STREAM clause
•  [RANGE 30m STEP 5m]: window
•  cs:timestamp(): built-in to access timestamps
•  HAVING ( COUNT(DISTINCT ?follower) > 3 ): aggregates as in SPARQL 1.1
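To make the logic of the opinion-makers query concrete, here is a rough Python rendering of what it computes over one window. It is a sketch under stated assumptions, not the C-SPARQL engine: the function name `opinion_makers` is hypothetical, the `sioc:follows` facts are kept in a separate set for simplicity, timestamps are plain integers, and results are grouped by (opinion maker, resource).

```python
from collections import defaultdict


def opinion_makers(window, follows, min_followers=3):
    """window: iterable of (subject, predicate, object, timestamp) in the current window.
    follows: set of (follower, followed) pairs (assumed to be given separately).
    Returns the (opinionMaker, resource) pairs imitated by more than
    `min_followers` distinct followers."""
    imitators = defaultdict(set)
    for maker, opinion, resource, t_maker in window:
        if opinion == "sd:accesses":  # FILTER: ?opinion != sd:accesses
            continue
        for follower, opinion2, resource2, t_follower in window:
            if (opinion2 == opinion and resource2 == resource
                    and (follower, maker) in follows
                    and t_follower > t_maker):  # the follower acted after the maker
                imitators[(maker, resource)].add(follower)
    # HAVING ( COUNT(DISTINCT ?follower) > 3 )
    return {key for key, fs in imitators.items() if len(fs) > min_followers}


fans = ["bob", "carol", "dave", "erin"]
window = [("alice", "sd:likes", "book", 1)]
window += [(f, "sd:likes", "book", i + 2) for i, f in enumerate(fans)]
follows = {(f, "alice") for f in fans}

makers = opinion_makers(window, follows)
print(makers)  # {('alice', 'book')}
```

In a continuous setting this function would be re-run every 5 minutes over the last 30 minutes of the stream, as the window specification of the query prescribes.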

Page 20: Stream Reasoning - where we got so far 2011.1.18 Oxford Key Note

Achievements: Efficiency of Evaluation 1/3 [IEEE-IS2010]

•  Evaluation of window-based selection

Page 21: Stream Reasoning - where we got so far 2011.1.18 Oxford Key Note

Achievements: Efficiency of Evaluation 2/3 [EDBT2010]

•  Several transformations can be applied to the algebraic representation of C-SPARQL
•  some recall well-known results from classical relational optimization
   –  push of FILTERs and projections
•  some are more specific to the domain of streams
   –  push of aggregates
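The push of FILTERs can be illustrated with a small Python sketch. It assumes a simple nested-loop join between rows from the window and rows of static data; the function names and sample rows are hypothetical and do not reflect the C-SPARQL algebra itself. The point is only that filtering before the join gives the same answer while joining fewer rows.

```python
def join(left, right):
    """Nested-loop equi-join on the first field of each row."""
    return [(l, r) for l in left for r in right if l[0] == r[0]]


def filter_after_join(window_rows, static_rows, pred):
    """Unoptimized plan: join everything, then filter."""
    return [(l, r) for (l, r) in join(window_rows, static_rows) if pred(l)]


def filter_pushed_down(window_rows, static_rows, pred):
    """Optimized plan: filter the streaming side first, then join."""
    return join([l for l in window_rows if pred(l)], static_rows)


window_rows = [("alice", "likes"), ("bob", "accesses"), ("carol", "likes")]
static_rows = [("alice", "Milano"), ("bob", "Oxford"), ("carol", "Oxford")]


def not_access(row):
    return row[1] != "accesses"


plan_a = filter_after_join(window_rows, static_rows, not_access)
plan_b = filter_pushed_down(window_rows, static_rows, not_access)
print(plan_a == plan_b)  # True
```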

Page 22: Stream Reasoning - where we got so far 2011.1.18 Oxford Key Note

Achievements: Efficiency of Evaluation 3/3 [EDBT2010]

•  Push of filters and projections

[Figure: evaluation time (ms, 0 to 125) versus window size (10 to 100000) for four configurations: None, Static Only, Streaming Only, Both]

Page 23: Stream Reasoning - where we got so far 2011.1.18 Oxford Key Note

Achievements: Example of C-SPARQL and Reasoning 1/2

What impact have I been creating with my tweets in the last hour? Is it positive or negative? Let’s count them …

   REGISTER QUERY CountPositiveAndNegativeReactions AS
   PREFIX : <http://ex.org/twitterImpactMining#>
   SELECT ?t count(?pos) count(?neg)
   FROM STREAM <http://ex.org/discussions.trdf> [RANGE 30m STEP 30s]
   WHERE {
     ?t a :MonitoredTweet .
     { ?pos :discuss ?t ;
            :ProduceReaction [ a :PositiveReaction ] .
     } UNION {
       ?neg :discuss ?t ;
            :ProduceReaction [ a :NegativeReaction ] .
     }
   }
   GROUP BY ?t

Axioms shown alongside the query:

   :discuss a owl:TransitiveProperty .
   :reply rdfs:subPropertyOf :discuss .
   :retweet rdfs:subPropertyOf :discuss .

Page 24: Stream Reasoning - where we got so far 2011.1.18 Oxford Key Note

Achievements: Example of C-SPARQL and Reasoning 2/2

[Figure: tweets t1, t1-1, t1-2 and t1-3 connected by retweet, reply and retweet edges, plus six discuss edges among them. Legend: Monitored, Positive, Negative]
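The entailments behind this example can be reproduced with a small fixpoint computation in Python. This is a naive sketch, not the reasoner used in the talk. The direction of the edges (t1-1 retweets t1, t1-2 replies to t1-1, t1-3 retweets t1-2) is an assumption read off the figure labels, and `materialize` is a hypothetical name.

```python
SUBPROPERTY_OF = {":reply": ":discuss", ":retweet": ":discuss"}
TRANSITIVE = {":discuss"}


def materialize(triples):
    """Naive fixpoint over two rules: rdfs:subPropertyOf and owl:TransitiveProperty."""
    closure = set(triples)
    while True:
        new = set()
        for s, p, o in closure:
            if p in SUBPROPERTY_OF:  # :reply / :retweet imply :discuss
                new.add((s, SUBPROPERTY_OF[p], o))
            if p in TRANSITIVE:  # chain two :discuss edges
                for s2, p2, o2 in closure:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if new <= closure:  # nothing new: fixpoint reached
            return closure
        closure |= new


asserted = {
    ("t1-1", ":retweet", "t1"),    # assumed direction
    ("t1-2", ":reply", "t1-1"),    # assumed direction
    ("t1-3", ":retweet", "t1-2"),  # assumed direction
}
discuss = {(s, o) for s, p, o in materialize(asserted) if p == ":discuss"}
print(len(discuss))  # 6
```

Under these assumptions the closure contains six :discuss triples, so every tweet in the thread is found to discuss the monitored tweet t1 and can be counted by the query.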

Page 25: Stream Reasoning - where we got so far 2011.1.18 Oxford Key Note

Achievements: State-of-the-Art Approach [Ceri1994,Volz2005]

1.  Overestimation of deletion: overestimates deletions by computing all direct consequences of a deletion.
2.  Rederivation: prunes those estimated deletions for which alternative derivations (via some other facts in the program) exist.
3.  Insertion: adds the new derivations that are consequences of insertions to extensional predicates.
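The three steps can be sketched for a toy case: a single transitive relation whose facts are (source, target) pairs. This is a simplified illustration written for this transcript, not the general algorithm of [Ceri1994,Volz2005]; all function names are hypothetical.

```python
from itertools import product


def one_step(facts):
    """Facts derivable by one application of the transitivity rule."""
    return {(a, d) for (a, b), (c, d) in product(facts, facts) if b == c}


def saturate(facts):
    """Naive fixpoint: apply the rule until nothing new is derived."""
    facts = set(facts)
    while True:
        new = one_step(facts) - facts
        if not new:
            return facts
        facts |= new


def dred(explicit, materialized, deletions, insertions):
    # 1. Overestimation: every fact with a derivation that uses a deleted fact.
    over = set(deletions)
    while True:
        extra = {(a, d)
                 for (a, b), (c, d) in product(materialized, materialized)
                 if b == c and ((a, b) in over or (c, d) in over)} - over
        if not extra:
            break
        over |= extra
    kept = set(materialized) - over
    remaining_explicit = set(explicit) - set(deletions)
    # 2. Rederivation: put back overestimated facts that still have support.
    while True:
        derivable = one_step(kept)
        back = {f for f in over - kept
                if f in remaining_explicit or f in derivable}
        if not back:
            break
        kept |= back
    # 3. Insertion: add the new facts and their consequences.
    return saturate(kept | set(insertions))


explicit = {("a", "b"), ("b", "c"), ("a", "c")}
materialized = saturate(explicit)
result = dred(explicit, materialized, {("a", "b")}, {("c", "d")})
from_scratch = saturate((explicit - {("a", "b")}) | {("c", "d")})
print(result == from_scratch)  # True
```

In the usage example, ("a", "c") is first overestimated as deleted because it can be derived through ("a", "b"), and then rederived because it is also asserted explicitly.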

Page 26: Stream Reasoning - where we got so far 2011.1.18 Oxford Key Note

Achievements: Our Approach [ESWC2010] 1/2

•  Assumption
   –  Insertions and deletions are triples respectively entering and exiting the window
   –  The window size is known
•  Therefore
   –  The time when each triple will expire is known and determined by the window size
      •  E.g., if the window is 10s long, a triple entering at time t will exit at time t+10s
   –  Note: all knowledge can be annotated with an expiration time
      •  i.e., background knowledge is annotated with +∞

Page 27: Stream Reasoning - where we got so far 2011.1.18 Oxford Key Note

Achievements: Our Approach [ESWC2010] 2/2

•  The algorithm
   1.  deletes all triples (asserted or inferred) that have just expired,
   2.  computes the entailments derived by the inserts,
   3.  annotates each entailed triple with an expiration time, and
   4.  eliminates from the current state all copies of derived triples except the one with the highest timestamp.
•  Learn more
   –  http://www.slideshare.net/emanueledellavalle/incremental-reasoning-on-streams-andrich-background-knowledge
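The four steps can be sketched in Python for a toy case with one transitive relation, where facts are (source, target) pairs and the state maps each fact to its expiration time. This is an illustration, not the implementation from [ESWC2010]. In particular, the slide does not say how the expiration time of an entailed triple is chosen; the sketch assumes it is the earliest expiration time among its premises. The name `slide_window` is hypothetical.

```python
import math


def slide_window(state, inserts, now, window_size):
    """state: dict mapping fact -> expiration time. Returns the updated state."""
    # 1. Delete all facts (asserted or inferred) that have just expired.
    state = {f: exp for f, exp in state.items() if exp > now}
    # Facts entering the window at `now` expire at `now + window_size`.
    for f in inserts:
        state[f] = max(state.get(f, -math.inf), now + window_size)
    # 2-4. Derive entailments, annotate them, keep only the latest expiration.
    changed = True
    while changed:
        changed = False
        for (a, b), e1 in list(state.items()):
            for (c, d), e2 in list(state.items()):
                if b == c:
                    exp = min(e1, e2)  # assumption: earliest premise decides
                    if exp > state.get((a, d), -math.inf):
                        state[(a, d)] = exp
                        changed = True
    return state


# Background knowledge never expires: it is annotated with +infinity.
s0 = {("b", "c"): math.inf}
s1 = slide_window(s0, [("a", "b")], now=0, window_size=10)
s2 = slide_window(s1, [("a", "b")], now=5, window_size=10)
s3 = slide_window(s2, [], now=15, window_size=10)
print(s1[("a", "c")], s2[("a", "c")], s3)  # 10 15 {('b', 'c'): inf}
```

No rederivation step is needed: when a premise is re-asserted at time 5, the entailed fact simply gets a later expiration time, and at time 15 both disappear by checking their annotations.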

Page 28: Stream Reasoning - where we got so far 2011.1.18 Oxford Key Note

Achievements: Comparative Evaluation 1/2 [ESWC2010]

•  Hypothesis
   –  Background knowledge does not change and is fully materialized
   –  Changes only take place in the window
•  An experiment comparing the time required to compute a new materialization using
   –  Re-computing from scratch (i.e., 1250 ms in our setting)
   –  State-of-the-art incremental approach [Volz2005]
   –  Our approach
•  Results at increasing % of the materialization changed when the window slides

[Figure: time (ms, axis ticks 10, 100, 1000, 10000) versus % of the materialization changed when the window slides (0% to 20%), for two series: incremental-volz and incremental-stream]

Page 29: Stream Reasoning - where we got so far 2011.1.18 Oxford Key Note

Achievements: Comparative Evaluation 2/2

•  Comparison of the average time needed to answer a C-SPARQL query using
   –  a forward reasoner,
   –  the naive approach of re-computing the materialization,
   –  our approach

[Figure: bar chart, time in ms (axis 0 to 20), with the data table below]

                          forward reasoning   naive approach   incremental-stream
   query (ms)                   5.82               1.61               1.61
   materialization (ms)         0                 15.91               0.28
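As a quick worked check of the chart data, the snippet below adds up the two components for each approach. It assumes the numbers map to the three labels in the order they appear in the transcript and that the total time per answer is the sum of query time and materialization time; the variable names are hypothetical.

```python
# Average times in ms, as read from the chart's data table (decimal commas converted).
timings_ms = {
    "forward reasoning": {"query": 5.82, "materialization": 0.0},
    "naive approach": {"query": 1.61, "materialization": 15.91},
    "incremental-stream": {"query": 1.61, "materialization": 0.28},
}

# Assumed total: query time plus materialization time.
totals = {
    name: round(t["query"] + t["materialization"], 2)
    for name, t in timings_ms.items()
}
for name, total in totals.items():
    print(f"{name}: {total} ms")
```

Under these assumptions the totals are 5.82 ms, 17.52 ms and 1.89 ms respectively, so the incremental approach keeps the low query time of a materialized store without paying the full re-materialization cost.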

Page 30: Stream Reasoning - where we got so far 2011.1.18 Oxford Key Note

Retrospective and Conclusions: Wrap Up

•  RDF Streams
   –  Notion defined
•  C-SPARQL
   –  Syntax and semantics defined as a SPARQL extension
   –  Engine designed
   –  Engine implemented based on the decision to keep stream management and query evaluation separated
•  Experiments with C-SPARQL under simple RDF entailment regimes
   –  window-based selection of C-SPARQL outperforms the standard FILTER-based selection
   –  having formally defined C-SPARQL semantics, algebraic optimizations are possible
•  Experiment with C-SPARQL under OWL-RL entailment regimes
   –  efficient incremental updates of deductive closures investigated
   –  our approach outperforms the state of the art when updates come as a stream

Page 31: Stream Reasoning - where we got so far 2011.1.18 Oxford Key Note

Retrospective and Conclusions: Achievements vs. Research Challenges

•  Relation with data-stream systems
   –  Notion of RDF stream :-|
•  Query languages for semantic streams
   –  C-SPARQL :-D
•  Reasoning on Streams
   –  Formal representations for stream reasoning
      •  :-P
   –  Notions of soundness and completeness
      •  :-P
   –  Efficient incremental updates of deductive closures
      •  ESWC 2010 paper :-) ... but much more work is needed!
   –  How to combine streams and background knowledge
      •  ESWC 2010 paper :-| ... but a lot needs to be studied ...
•  Dealing with incomplete & noisy data
   –  :-P
•  Distributed and parallel processing
   –  :-P

Page 32: Stream Reasoning - where we got so far 2011.1.18 Oxford Key Note

References

•  Vision
   [IEEE-IS2009] Emanuele Della Valle, Stefano Ceri, Frank van Harmelen, Dieter Fensel: It's a Streaming World! Reasoning upon Rapidly Changing Information. IEEE Intelligent Systems 24(6): 83-89 (2009)
•  Continuous SPARQL (C-SPARQL)
   [EDBT2010] Davide Francesco Barbieri, Daniele Braga, Stefano Ceri, Michael Grossniklaus: An Execution Environment for C-SPARQL Queries. EDBT 2010
   [WWW2009] Davide Francesco Barbieri, Daniele Braga, Stefano Ceri, Emanuele Della Valle, Michael Grossniklaus: C-SPARQL: SPARQL for continuous querying. WWW 2009: 1061-1062
   [IJSC2010] Davide Francesco Barbieri, Daniele Braga, Stefano Ceri, Emanuele Della Valle, Michael Grossniklaus: C-SPARQL: a Continuous Query Language for RDF Data Streams. Int. J. Semantic Computing 4(1): 3-25 (2010)
   [IEEE-IS2010] Davide Barbieri, Daniele Braga, Stefano Ceri, Emanuele Della Valle, Yi Huang, Volker Tresp, Achim Rettinger, Hendrik Wermser: Deductive and Inductive Stream Reasoning for Semantic Social Media Analytics. IEEE Intelligent Systems, 30 Aug. 2010
•  Stream Reasoning
   [ESWC2010] Davide Francesco Barbieri, Daniele Braga, Stefano Ceri, Emanuele Della Valle, Michael Grossniklaus: Incremental Reasoning on Streams and Rich Background Knowledge. In: 7th Extended Semantic Web Conference (ESWC 2010)
•  Background work
   [Ceri1994] Stefano Ceri, Jennifer Widom: Deriving Incremental Production Rules for Deductive Data. Inf. Syst. 19(6): 467-490 (1994)
   [Volz2005] Raphael Volz, Steffen Staab, Boris Motik: Incrementally Maintaining Materializations of Ontologies Stored in Logic Databases. J. Data Semantics 2: 1-34 (2005)

Page 33: Stream Reasoning - where we got so far 2011.1.18 Oxford Key Note

For more information visit http://www.larkc.eu/

Thank You! Questions?

Much More to Come! Keep an eye on http://www.streamreasoning.org