Stream Reasoning: State of the Art and Beyond

Emanuele Della Valle - visit http://streamreasoning.org
DEI - Politecnico di Milano
[email protected]
http://emanueledellavalle.org

Upload: emanuele-della-valle

Post on 27-Jan-2015


DESCRIPTION

The talk about "Stream Reasoning" given at INQUEST -- INnovative QUErying of STreams 2012 -- (http://games.cs.ox.ac.uk/inquest12/), held in Oxford, United Kingdom, September 25-27, 2012. The talk presents a comprehensive view of "Stream Reasoning", i.e., reasoning on rapidly flowing information. It illustrates the challenges, presents the achievements of the database group of Politecnico di Milano on the topic, reviews the challenges while pointing to results and ongoing work in the Semantic Web community, and proposes how to go beyond the current Stream Reasoning concept. In particular, it points out that "order matters" when processing massive data, and it proposes to investigate streaming algorithms for automated reasoning that can be applied not only to data streams that are "naturally" ordered (by recency) but to any sortable data source.

TRANSCRIPT

Page 1: Stream Reasoning: State of the Art and Beyond

Stream Reasoning: State of the Art and Beyond

http://streamreasoning.org

Emanuele Della Valle
DEI - Politecnico di Milano

[email protected]
http://emanueledellavalle.org

Page 2: Stream Reasoning: State of the Art and Beyond

Agenda

• Motivation

• Concept

• Running Example

• Achievements

• Challenges vs. Achievements

• Beyond Stream Reasoning

• Conclusions

Oxford, 2012-9-25 2

Page 3: Stream Reasoning: State of the Art and Beyond

Motivation

It's a Streaming World! [IEEE-IS2009] 1/3

• Oil operations
• Traffic
• Financial markets
• Social networks

all generate data streams!

Page 4: Stream Reasoning: State of the Art and Beyond

Motivation

It's a Streaming World! [IEEE-IS2009] 2/4

• … and we want to analyse data streams in real time

• Given a well that is in the process of drowning, how much time do I have, based on its historical behavior?

• Is public transportation where the people are?

• Can we detect any intra-day correlation clusters among stock exchanges?

• Who is driving the discussion about the top 10 emerging topics?

Page 5: Stream Reasoning: State of the Art and Beyond

Motivation

What are data streams anyway?

• Formally:
  – Data streams are unbounded sequences of time-varying data elements

• Less formally:
  – an (almost) “continuous” flow of information
  – with the recent information being more relevant, as it describes the current state of a dynamic system

[Figure: data elements laid out along a time axis]

Page 6: Stream Reasoning: State of the Art and Beyond

Motivation

The continuous nature of streams

• The nature of streams requires a paradigmatic change*
  – from persistent data
    • to be stored and queried on demand
    • a.k.a. one-time semantics
  – to transient data
    • to be consumed on the fly by continuous queries
    • a.k.a. continuous semantics

* This paradigmatic change first arose in the DB community [Henzinger98]

Page 7: Stream Reasoning: State of the Art and Beyond

Motivation – The continuous nature of streams

Continuous Semantics

• Continuous queries are registered over streams that, in most cases, are observed through windows

[Figure: input streams, observed through a window, feed a registered continuous query, which produces streams of answers]
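The contrast between one-time and continuous semantics can be sketched in a few lines of Python. This is only an illustration: the function names and the toy readings are assumptions made here, not part of any DSMS or of the talk.

```python
from itertools import count, islice
from typing import Iterable, Iterator, List


def one_time_query(stored: List[int], threshold: int) -> List[int]:
    """One-time semantics: evaluate once, on demand, over persistent data."""
    return [x for x in stored if x > threshold]


def continuous_query(stream: Iterable[int], threshold: int) -> Iterator[int]:
    """Continuous semantics: the query stays registered and emits an
    answer as soon as a matching element flows by."""
    for x in stream:
        if x > threshold:
            yield x


readings = [3, 9, 4, 12, 7]
print(one_time_query(readings, 5))  # [9, 12, 7]

# The continuous query also works on an unbounded stream; here we stop
# listening after three answers.
print(list(islice(continuous_query(count(), 5), 3)))  # [6, 7, 8]
```

The one-time query needs the whole data set up front, while the continuous one never sees the end of its input and consumes elements on the fly.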

Page 8: Stream Reasoning: State of the Art and Beyond

Motivation – The continuous nature of streams

Tools exist [Cugola2011]

• Types
  – Data Stream Management Systems
  – Complex Event Processors

• Research Prototypes
  – Amazon/Cougar (Cornell) – sensors
  – Aurora (Brown/MIT) – sensor monitoring, dataflow
  – Gigascope (AT&T Labs) – network monitoring
  – Hancock (AT&T) – telecom streams
  – Niagara (OGI/Wisconsin) – Internet DBs & XML
  – OpenCQ (Georgia) – triggers, view maintenance
  – Stream (Stanford) – general-purpose DSMS
  – Stream Mill (UCLA) – power & extensibility
  – Tapestry (Xerox) – publish/subscribe filtering
  – Telegraph (Berkeley) – adaptive engine for sensors
  – Tribeca (Bellcore) – network monitoring

• High-tech startups
  – Streambase, Coral8, Apama, Truviso

• Major DBMS vendors are all adding stream extensions as well
  – IBM InfoSphere Streams
  – Microsoft StreamInsight
  – Oracle CEP

Page 9: Stream Reasoning: State of the Art and Beyond

Motivation

New Requirements, New Challenges

Typical requirement → resulting challenge

• Processing streams → Continuous semantics
• Large datasets → Scalable processing
• Reactivity → Real-time systems
• Fine-grained information access → Powerful query languages
• Modeling complex application domains → Rich ontology languages

Page 10: Stream Reasoning: State of the Art and Beyond

Motivation

Are DSMS/CEP ready to address them?

[Figure: the same typical requirements and challenges as on Page 9, annotated with the label "DSMS/CEP"]

Page 11: Stream Reasoning: State of the Art and Beyond

Motivation

Is Semantic Web ready to address them?

• The Semantic Web, the Web of Data, is doing fine
  – RDF, RDF Schema, SPARQL, OWL, RIF
  – well understood theory
  – rapid increase in scalability

• BUT it pretends that the world is static, or at best has a low change rate, both in change-volume and change-frequency
  – ontology versioning
  – belief revision
  – time stamps on named graphs

• It sticks to the traditional one-time semantics

Page 12: Stream Reasoning: State of the Art and Beyond

Motivation

New Requirements, New Challenges

[Figure: the same typical requirements and challenges as on Page 9, annotated with the label "Semantic Web"]

Page 13: Stream Reasoning: State of the Art and Beyond

Motivation

New Requirements call for Stream Reasoning

[Figure: the same typical requirements and challenges as on Page 9, annotated with the label "Stream Reasoning"]

Page 14: Stream Reasoning: State of the Art and Beyond

Concept

Stream Reasoning Definition [IEEE-IS2010]

• Making sense
  – in real time
  – of multiple, heterogeneous, gigantic and inevitably noisy data streams
  – in order to support the decision process of extremely large numbers of concurrent users

• Note: making sense of streams necessarily requires processing them against rich background knowledge, an unsolved problem in databases

Page 15: Stream Reasoning: State of the Art and Beyond

Concept

Research Challenges

• Relation with DSMSs and CEPs
  – Just as RDF relates to database systems?
• Data types and query languages for semantic streams
  – Just RDF and SPARQL but with continuous semantics?
• Reasoning on Streams
  – Theory
  – Efficiency
  – Scalability
• Dealing with incomplete & noisy data
  – Even more than on the current Web of Data
• Distributed and parallel processing
  – Streams are parallel in nature, …
• Engineering Stream Reasoning Applications
  – Development Environment
  – Integration with other technologies
  – Benchmarks

Page 16: Stream Reasoning: State of the Art and Beyond

Running Example

Social Media Analytics in BOTTARI [JWS2012]

http://streamreasoning.org/demos/bottari

Winner of Semantic Web Challenge 2012

Page 17: Stream Reasoning: State of the Art and Beyond

Running Example

Data Model Used in BOTTARI

[Figure: data model diagram. Labels recoverable from the diagram: geo:SpatialThing, geo:lat (xsd:float), geo:long (xsd:float), sr:NamedPlace, sioc:creator_of, sioc:has_creator, sr:follower, sr:following, sr:reply, sr:retweet, sr:talksAbout, sr:talksAboutPositively, sr:talksAboutNeutrally, sr:talksAboutNegatively, twd:post, twd:discuss]

Page 18: Stream Reasoning: State of the Art and Beyond

Running Example

Streaming vs. Background Information

data stream

User related background knowledge

Point of Interest related background knowledge

Page 19: Stream Reasoning: State of the Art and Beyond

Achievements

• RDF Streams
  – Notion defined
• C-SPARQL
  – Syntax and semantics defined as a SPARQL extension
  – Engine designed and implemented
• Experiments with C-SPARQL under simple RDF entailment regimes
  – window-based selection of C-SPARQL outperforms the standard FILTER-based selection
  – algebraic optimizations of C-SPARQL queries are possible
  – high throughputs
• Experiments with C-SPARQL under RDFS++ entailment regimes
  – efficient incremental updates of deductive closures investigated
  – our approach outperforms the state of the art when updates come as a stream

Page 21: Stream Reasoning: State of the Art and Beyond

Achievements

RDF Stream [WWW2009, EDBT2010, IJSC2010]

• RDF Stream Data Type
  – Ordered sequence of pairs, where each pair is made of an RDF triple and its timestamp
  – Timestamps are not required to be unique, but they must be non-decreasing

• E.g.,
  (<:Alice :posts :post1>, 2010-02-12T13:34:41)
  (<:post1 :talksAboutPositively :LaScala>, 2010-02-12T13:34:41)
  (<:Bob :posts :post2>, 2010-02-12T13:36:28)
  (<:post2 :talksAboutNegatively :Duomo>, 2010-02-12T13:36:28)
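The RDF stream data type can be illustrated with a minimal Python sketch. The names `TimestampedTriple` and `rdf_stream` are assumptions introduced for this illustration; the sketch only checks the one constraint stated on the slide, namely that timestamps may repeat but must never decrease.

```python
from datetime import datetime
from typing import Iterable, Iterator, NamedTuple, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class TimestampedTriple(NamedTuple):
    triple: Triple
    timestamp: datetime


def rdf_stream(items: Iterable[TimestampedTriple]) -> Iterator[TimestampedTriple]:
    """Yield the items in order, rejecting any timestamp that goes backwards."""
    last = None
    for item in items:
        if last is not None and item.timestamp < last:
            raise ValueError(f"timestamp decreased at {item.triple}")
        last = item.timestamp
        yield item


def ts(text: str) -> datetime:
    return datetime.fromisoformat(text)


# The four pairs from the slide; equal timestamps are allowed.
example = [
    TimestampedTriple((":Alice", ":posts", ":post1"), ts("2010-02-12T13:34:41")),
    TimestampedTriple((":post1", ":talksAboutPositively", ":LaScala"), ts("2010-02-12T13:34:41")),
    TimestampedTriple((":Bob", ":posts", ":post2"), ts("2010-02-12T13:36:28")),
    TimestampedTriple((":post2", ":talksAboutNegatively", ":Duomo"), ts("2010-02-12T13:36:28")),
]

print(len(list(rdf_stream(example))))  # 4
```

Feeding the same pairs in reverse order raises `ValueError`, because the third pair would carry an earlier timestamp than the second.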

Page 23: Stream Reasoning: State of the Art and Beyond

MEMO: SPARQL

Page 24: Stream Reasoning: State of the Art and Beyond

Achievements

Where C-SPARQL Extends SPARQL

Page 25: Stream Reasoning: State of the Art and Beyond

Achievements

An Example of C-SPARQL Query

Who are the opinion makers? i.e., the users who are likely to influence the behavior of other users who follow them

  REGISTER STREAM OpinionMakers COMPUTED EVERY 5m AS
  CONSTRUCT { ?opinionMaker sd:about ?resource }
  FROM STREAM <http://streamingsocialdata.org/interactions> [RANGE 30m STEP 5m]
  WHERE {
    ?opinionMaker ?opinion ?resource .
    ?follower sioc:follows ?opinionMaker .
    ?follower ?opinion ?resource .
    FILTER ( cs:timestamp(?follower) > cs:timestamp(?opinionMaker)
             && ?opinion != sd:accesses )
  }
  HAVING ( COUNT(DISTINCT ?follower) > 3 )

Page 26: Stream Reasoning: State of the Art and Beyond

Achievements

An Example of C-SPARQL Query

Who are the opinion makers? i.e., the users who are likely to influence the behavior of other users who follow them

Where the query extends SPARQL:

• REGISTER STREAM OpinionMakers COMPUTED EVERY 5m AS: query registration (for continuous execution)
• REGISTER STREAM combined with CONSTRUCT: RDF stream added as new output format
• FROM STREAM <http://streamingsocialdata.org/interactions>: FROM STREAM clause
• [RANGE 30m STEP 5m]: window
• cs:timestamp(): built-in to access timestamps
• HAVING ( COUNT(DISTINCT ?follower) > 3 ): aggregates as in SPARQL 1.1
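To make the intent of the opinion-maker query easier to follow, here is a rough Python emulation of what it computes over the content of a single window. It is a sketch under several assumptions: the window is a plain list of timestamped triples, `cs:timestamp` is read as the timestamp of the triple in which the variable appears as subject, `sioc:follows` triples are treated as a relation rather than as opinions, and the predicate `sd:likes` in the example is hypothetical. It does not use or reproduce the C-SPARQL Engine.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Quad = Tuple[str, str, str, int]  # subject, predicate, object, timestamp (seconds)


def opinion_makers(window: List[Quad], min_followers: int = 3) -> Set[Tuple[str, str]]:
    """Return (opinionMaker, resource) pairs echoed later by more than
    `min_followers` distinct followers, within one window."""
    follows = {(s, o) for s, p, o, _ in window if p == "sioc:follows"}
    # ?opinion != sd:accesses; follows triples are not opinions (assumption)
    actions = [q for q in window if q[1] not in ("sioc:follows", "sd:accesses")]

    supporters: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for maker, opinion, resource, t_maker in actions:
        for follower, opinion2, resource2, t_follower in actions:
            if (opinion2 == opinion and resource2 == resource
                    and (follower, maker) in follows
                    and t_follower > t_maker):
                supporters[(maker, resource)].add(follower)

    # HAVING ( COUNT(DISTINCT ?follower) > 3 )
    return {key for key, fans in supporters.items() if len(fans) > min_followers}


# Alice expresses an opinion first; four of her followers repeat it later.
window: List[Quad] = [(":Alice", "sd:likes", ":LaScala", 0)]
for i, name in enumerate([":Bob", ":Carl", ":Dave", ":Eve"], start=1):
    window.append((name, "sioc:follows", ":Alice", 0))
    window.append((name, "sd:likes", ":LaScala", 10 * i))

print(opinion_makers(window))  # {(':Alice', ':LaScala')}
```

In a continuous setting this function would be re-evaluated every 5 minutes on the triples of the last 30 minutes, as stated by `COMPUTED EVERY 5m` and `[RANGE 30m STEP 5m]`.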

Page 28: Stream Reasoning: State of the Art and Beyond

Achievements

FROM STREAM Clause - Types of Window

• physical: a given number of triples
• logical: a variable number of triples which occur during a given time interval (e.g., 1 hour)
  – Sliding: they are progressively advanced by a given STEP (e.g., 5 minutes)
  – Tumbling: they are advanced by exactly their time interval
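The window types can be sketched in Python as follows. This is a simplified illustration over a finite list standing in for a stream; the function names and the integer timestamps are assumptions, and the half-open interval convention (start included, end excluded) is a choice made here, not something the slide specifies.

```python
from typing import Iterator, List, Sequence, Tuple

Item = Tuple[int, str]  # (timestamp in seconds, payload)


def physical_window(stream: Sequence[Item], size: int) -> List[Item]:
    """Physical window: the last `size` elements, whatever their timestamps."""
    return list(stream[-size:]) if size > 0 else []


def logical_windows(stream: Sequence[Item], width: int, step: int) -> Iterator[List[Item]]:
    """Logical windows: all elements with start <= t < start + width.
    The window start advances by `step`: step < width gives sliding
    windows, step == width gives tumbling windows."""
    if not stream:
        return
    start, end = stream[0][0], stream[-1][0]
    while start <= end:
        yield [item for item in stream if start <= item[0] < start + width]
        start += step


stream = [(0, "a"), (2, "b"), (5, "c"), (7, "d"), (11, "e")]

print(physical_window(stream, 2))
# [(7, 'd'), (11, 'e')]
print(list(logical_windows(stream, width=5, step=5)))   # tumbling
# [[(0, 'a'), (2, 'b')], [(5, 'c'), (7, 'd')], [(11, 'e')]]
print(list(logical_windows(stream, width=10, step=5)))  # sliding
# [[(0, 'a'), (2, 'b'), (5, 'c'), (7, 'd')], [(5, 'c'), (7, 'd'), (11, 'e')], [(11, 'e')]]
```

Note how a logical window holds a variable number of elements, and how consecutive sliding windows overlap while tumbling windows do not.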

Page 29: Stream Reasoning: State of the Art and Beyond

Achievements

Efficiency of Evaluation [IEEE-IS2010]

• window-based selection of C-SPARQL outperforms the standard FILTER-based selection

Page 31: Stream Reasoning: State of the Art and Beyond

Achievements

Algebraic optimizations of C-SPARQL [EDBT2010]

• Several transformations can be applied to the algebraic representation of C-SPARQL
  – some recall well-known results from classical relational optimization
    • push of FILTERs and projections
  – some are more specific to the domain of streams
    • push of aggregates
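Why pushing a FILTER down pays off can be shown with a small Python sketch. It joins the posts in a window with static background knowledge in two ways: filtering after the join, and filtering the static side before the join. The data, the function names and the `city_of` mapping are hypothetical; this illustrates the classical relational idea, not the optimizer described in [EDBT2010].

```python
from typing import Dict, List, Tuple

Post = Tuple[str, str]      # (user, place) observed in the current window
Row = Tuple[str, str, str]  # (user, place, city)


def join_then_filter(window: List[Post], city_of: Dict[str, str], city: str) -> Tuple[List[Row], int]:
    """Join everything first, filter afterwards. Also returns the number
    of intermediate rows the join produced."""
    joined = [(user, place, city_of[place]) for user, place in window if place in city_of]
    result = [row for row in joined if row[2] == city]
    return result, len(joined)


def filter_then_join(window: List[Post], city_of: Dict[str, str], city: str) -> Tuple[List[Row], int]:
    """FILTER pushed to the static side: only matching places enter the join."""
    places = {place: c for place, c in city_of.items() if c == city}
    joined = [(user, place, places[place]) for user, place in window if place in places]
    return joined, len(joined)


window = [("Alice", "LaScala"), ("Bob", "Duomo"), ("Carl", "Colosseum")]
city_of = {"LaScala": "Milan", "Duomo": "Milan", "Colosseum": "Rome"}

late, late_rows = join_then_filter(window, city_of, "Milan")
early, early_rows = filter_then_join(window, city_of, "Milan")
print(late == early)          # True: same answers
print(late_rows, early_rows)  # 3 2: fewer intermediate rows with the pushed filter
```

Because the static side does not change between window evaluations, the filtered `places` mapping could also be computed once and reused, which is part of what makes the transformation attractive for continuous queries.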

Page 32: Stream Reasoning: State of the Art and Beyond

Achievements

Algebraic optimizations of C-SPARQL [EDBT2010]

• Push of filters and projections

[Figure: execution time in ms (axis from 0 to 125) versus window size (10 to 100000) for four configurations: None, Static Only, Streaming Only, Both]

Page 34: Stream Reasoning: State of the Art and Beyond

Achievements

High Throughputs [JWS2012]

Page 35: Stream Reasoning: State of the Art and Beyond

Achievements

Outline

• RDF Streams
  – Notion defined
• C-SPARQL
  – Syntax and semantics defined as a SPARQL extension
  – Engine designed and implemented
• Experiments with C-SPARQL under simple RDF entailment regimes
  – window-based selection of C-SPARQL outperforms the standard FILTER-based selection
  – algebraic optimizations of C-SPARQL queries are possible
  – complex events can be detected using a network of C-SPARQL queries at high throughputs
• Experiments with C-SPARQL under RDFS++ entailment regimes
  – efficient incremental updates of deductive closures investigated
  – our approach outperforms the state of the art when updates come as a stream

Page 36: Stream Reasoning: State of the Art and Beyond

Achievements

Where’s the Reasoning?

Example: can we measure the impact of a tweet? Twitter allows two traceable ways of discussing a tweet:

• reply: a user replies to a tweet of another user (it always retweets the original tweet)
• retweet: a user propagates an interesting tweet to his/her followers

For example:

[Figure: eight tweets, t1 to t8, connected by reply and retweet edges and placed on a timeline running from 50 min ago to now]

Page 37: Stream Reasoning: State of the Art and Beyond

Achievements

Example of C-SPARQL and Reasoning 1/2

What impact have I been creating with my tweets in the last hour? Let’s count them …

  REGISTER STREAM OpinionSpreading COMPUTED EVERY 30s AS
  SELECT ?tweet (count(?tweet) AS ?impact)
  FROM STREAM <http://ex.org> [RANGE 60m STEP 10m]
  WHERE {
    :t1 sr:discuss ?tweet
  }

  :reply rdfs:subPropertyOf :discuss .
  :retweet rdfs:subPropertyOf :discuss .
  :discuss a owl:TransitiveProperty .

[Figure: the tweets t1 to t8 with their reply and retweet edges, each of which is also a discuss edge]

Impact: 7!
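The two kinds of axioms used in this example, rdfs:subPropertyOf and owl:TransitiveProperty, can be emulated with a short Python sketch. The edge list below is hypothetical: the slide's figure is not recoverable from the transcript, so the edges were chosen only to match the number of reply and retweet labels and the stated result of 7. The function is a naive fixpoint computation, not the reasoner used in the talk.

```python
from typing import Set, Tuple

Edge = Tuple[str, str, str]  # (subject, property, object)

# :reply rdfs:subPropertyOf :discuss .  :retweet rdfs:subPropertyOf :discuss .
SUBPROPERTY_OF = {":reply": ":discuss", ":retweet": ":discuss"}


def discuss_closure(edges: Set[Edge]) -> Set[Tuple[str, str]]:
    """All (subject, object) pairs related by :discuss after applying the
    sub-property axioms and the transitivity of :discuss."""
    discuss = {(s, o) for s, p, o in edges if SUBPROPERTY_OF.get(p, p) == ":discuss"}
    changed = True
    while changed:  # repeat until no new pair can be derived
        changed = False
        for a, b in list(discuss):
            for c, d in list(discuss):
                if b == c and (a, d) not in discuss:
                    discuss.add((a, d))
                    changed = True
    return discuss


# Hypothetical edges (assumption): 2 retweets and 5 replies rooted at t1.
edges: Set[Edge] = {
    ("t1", ":retweet", "t3"), ("t3", ":reply", "t5"), ("t5", ":reply", "t8"),
    ("t1", ":reply", "t2"), ("t2", ":reply", "t4"),
    ("t4", ":retweet", "t6"), ("t4", ":reply", "t7"),
}

impact = sum(1 for s, _ in discuss_closure(edges) if s == "t1")
print(impact)  # 7
```

Without reasoning, the pattern `:t1 sr:discuss ?tweet` would match nothing, since the stream contains only reply and retweet triples; the count of 7 exists only in the entailed data.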

Page 38: Stream Reasoning: State of the Art and Beyond

Achievements

Our approach [ESWC2010]

• The algorithm
  1. deletes all triples (asserted or inferred) that have just expired,
  2. computes the entailments derived by the inserts,
  3. annotates each entailed triple with an expiration time, and
  4. eliminates from the current state all copies of derived triples except the one with the highest timestamp.
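The four steps can be sketched in Python for the simple case of a single transitive property. This is an illustration written for this transcript, not the [ESWC2010] implementation: the name `slide_window`, the integer clock, and the rule that an entailed pair expires together with its earliest-expiring premise are assumptions of the sketch.

```python
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]  # (subject, object) of one transitive property, e.g. :discuss


def slide_window(state: Dict[Pair, int], inserts: Iterable[Pair],
                 now: int, window_width: int) -> Dict[Pair, int]:
    """Maintain a materialization in which every pair carries an expiration time."""
    # Step 1: delete all pairs (asserted or inferred) that have just expired.
    state = {pair: exp for pair, exp in state.items() if exp > now}

    # Steps 2-4: add the inserts, derive their entailments, annotate each
    # entailed pair with an expiration time, and keep only the copy that
    # expires last.
    frontier = {pair: now + window_width for pair in inserts}
    while frontier:
        for pair, exp in frontier.items():
            state[pair] = max(exp, state.get(pair, 0))
        derived: Dict[Pair, int] = {}
        for (a, b), exp1 in frontier.items():
            for (c, d), exp2 in list(state.items()):
                exp = min(exp1, exp2)  # expires with its earliest-expiring premise
                candidates = ([(a, d)] if b == c else []) + ([(c, b)] if d == a else [])
                for new in candidates:
                    if exp > max(state.get(new, 0), derived.get(new, 0)):
                        derived[new] = exp
        frontier = derived  # only newly derived (or extended) pairs need another round
    return state


state = slide_window({}, [("t1", "t2")], now=0, window_width=60)
state = slide_window(state, [("t2", "t3")], now=30, window_width=60)
print(state)  # {('t1', 't2'): 60, ('t2', 't3'): 90, ('t1', 't3'): 60}

state = slide_window(state, [], now=60, window_width=60)
print(state)  # {('t2', 't3'): 90}
```

The point of the expiration annotations shows in the last call: when the window slides, expired pairs, including the inferred ('t1', 't3'), are dropped by a simple scan, with no need to recompute which entailments are still supported.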

Page 39: Stream Reasoning: State of the Art and Beyond

Implementation

Comparative Evaluation on Materialization

• base-line: re-computing the materialization from scratch
• state-of-the-art [Ceri1994, Volz2005]
• our approach [ESWC2010]

[Figure: time in ms (axis ticks at 10, 100, 1000, 10000) versus % of the materialization changed when the window slides (0% to 20%); series: incremental-volz, incremental-stream]

Page 40: Stream Reasoning: State of the Art and Beyond

Achievements

Comparative Evaluation on Query Answering

• comparison of the average time needed to answer a C-SPARQL query using
  – a backward reasoner
  – the naive approach of re-computing the materialization
  – our approach

Average time (ms)     backward reasoning   naive approach   incremental-stream
query                 5.82                 1.61             1.61
materialization       0                    15.91            0.28

[Figure: bar chart of the values above, axis from 0 to 20 ms; the first series is printed as "forward reasoning" in the chart and relabelled "Backward reasoning" on the slide]

Page 41: Stream Reasoning: State of the Art and Beyond

Achievements

Location Based Social Media Analytics [JWS2012]

Live demonstration at http://streamreasoning.org/demos/bottari

http://youtu.be/XGOKe_lhSks

Page 42: Stream Reasoning: State of the Art and Beyond

Achievements

Sensor Networks - Weather phenomenon detection

Live demonstration at http://streamreasoning.org/demos/

Page 43: Stream Reasoning: State of the Art and Beyond

Research Challenges vs. Achievements

• Relation with DSMSs and CEPs
  – Notion of RDF stream :-| alternative solutions can be investigated
• Data types and query languages for semantic streams
  – C-SPARQL :-D work in progress in FZI&AIFB [1,2], DERI [3], UPM [4]
• Reasoning on Streams
  – Theory :-) work in progress in Potsdam&DERI [9,10]
  – Efficiency :-) work in progress in STI-Innsbruck [5]
  – Scalability :-| work in progress in IBM&VUA [6]
• Dealing with incomplete & noisy data
  – Even more than on the current Web of Data :-( some initial joint work with SIEMENS only [IEEE-IS2010]
• Distributed and parallel processing
  – Streams are parallel in nature, … :-| work in progress in IBM&VUA [6]
• Engineering Stream Reasoning Applications
  – Demonstrative applications in Social Media and Sensor Networks :-)
  – Development Environment :-) work in progress in UPM [7]
  – Benchmarks :-P work in progress in CWI&UPM [8]

Page 44: Stream Reasoning: State of the Art and Beyond

Beyond Stream Reasoning [SWJ2012]

Positioning Stream Reasoning

[Figure: a grid positioning existing work. Types of reasoning: no reasoning, data-driven, query-driven, combinations. Types of orders: no ordering, natural. Regions: DSMS/CEP, scalable reasoning, stream reasoning.]

Page 45: Stream Reasoning: State of the Art and Beyond

Beyond Stream Reasoning [SWJ2012]

Architecture of Stream Reasoner

• Continuous reasoning tasks are registered over streams that, in most cases, are observed through windows

[Figure: input streams, observed through a window, feed the registered continuous reasoning tasks, which produce streams of answers]

Page 46: Stream Reasoning: State of the Art and Beyond

Beyond Stream Reasoning [SWJ2012]

Order-aware Data Management

[Figure: the positioning grid, extended. Types of reasoning: no reasoning, data-driven, query-driven, combinations. Types of orders: no ordering, natural, cheap to enforce, expensive to enforce, combinations. Regions: scalable reasoning, stream reasoning, order-aware data management.]

Page 47: Stream Reasoning: State of the Art and Beyond

Beyond Stream Reasoning [SWJ2012]

Order-aware Data Management

• When treating massive data, order matters!

• If N is the size of the input, a problem is considered to be “well-solved” if a streaming algorithm exists which requires at most O(poly(log N)) space and time [Henzinger98, Babcock2002]

• Data as a sortable entity: streaming algorithms apply where we can enforce orderings easily and logically, e.g., order by
  – sortable literals
  – popularity
  – uncertainty
  – trust

• Most relevant answers first
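As a concrete instance of a streaming algorithm in the sense above, here is the classical Boyer-Moore majority vote in Python. It is swapped in purely as an illustration and is not taken from the talk: it reads the input once and keeps a single candidate and a single counter, so its memory use is logarithmic in N (the bits of the counter).

```python
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def majority_candidate(stream: Iterable[T]) -> Optional[T]:
    """One pass, constant number of variables. If some element makes up
    more than half of the stream, it is returned; otherwise the result
    is an arbitrary element and needs a second, verifying pass."""
    candidate: Optional[T] = None
    count = 0
    for item in stream:
        if count == 0:
            candidate, count = item, 1
        elif item == candidate:
            count += 1
        else:
            count -= 1
    return candidate


print(majority_candidate(["a", "b", "a", "c", "a"]))  # a
```

The algorithm never stores the stream and never looks back, which is exactly the access pattern that makes a problem tractable on massive or unbounded inputs.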

Page 48: Stream Reasoning: State of the Art and Beyond

Beyond Stream Reasoning [SWJ2012]

A first attempt: SPARQL-Rank [ISWC2012]

An extended SPARQL algebra where ORDER is a first-class citizen

• FROM: materialize then sort
• TO: split and interleave

Page 49: Stream Reasoning: State of the Art and Beyond

Beyond Stream Reasoning [SWJ2012]

A first attempt: SPARQL-Rank [ISWC2012]

Performance increase of up to 2 orders of magnitude for K<100
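The intuition behind the gain for small K can be sketched in Python by contrasting "materialize then sort" with an order-aware evaluation that stops as soon as K answers are out. This is not the SPARQL-Rank algebra: it only assumes that each source already delivers its answers sorted by score, and the helper `counted` and the sample data are hypothetical.

```python
import heapq
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Scored = Tuple[float, str]  # (score, answer); a higher score is more relevant


def top_k_materialize(sources: List[List[Scored]], k: int) -> List[Scored]:
    """Materialize every answer, sort them all, keep the first k."""
    everything = [item for source in sources for item in source]
    return sorted(everything, reverse=True)[:k]


def top_k_order_aware(sources: List[Iterable[Scored]], k: int) -> List[Scored]:
    """Lazily merge sources that are already sorted by descending score
    and stop after k answers."""
    return list(islice(heapq.merge(*sources, reverse=True), k))


def counted(source: Iterable[Scored], counter: List[int]) -> Iterator[Scored]:
    """Wrap a source to count how many of its items are actually read."""
    for item in source:
        counter[0] += 1
        yield item


a = [(0.9, "x"), (0.5, "y"), (0.1, "z")]
b = [(0.8, "p"), (0.7, "q"), (0.2, "r")]

reads = [0]
lazy = top_k_order_aware([counted(a, reads), counted(b, reads)], k=2)
print(lazy)                                   # [(0.9, 'x'), (0.8, 'p')]
print(lazy == top_k_materialize([a, b], 2))   # True
print(reads[0] < len(a) + len(b))             # True: not every item was read
```

The smaller K is relative to the total number of answers, the larger the share of work the order-aware evaluation can skip.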

Page 50: Stream Reasoning: State of the Art and Beyond

Beyond Stream Reasoning [SWJ2012]

A new space of investigation

[Figure: the positioning grid with two new regions. Types of reasoning: no reasoning, data-driven, query-driven, combinations. Types of orders: no ordering, natural, cheap to enforce, expensive to enforce, combinations. Regions: order-aware data management, scalable reasoning, stream reasoning, top-k reasoning, order-aware reasoning.]

Page 51: Stream Reasoning: State of the Art and Beyond

Conclusions

• The Semantic Web community responded positively to the call to investigate stream reasoning

• A number of works in progress are rapidly developing this field, but we are only at the very beginning

• It may be the right time to move from naturally ordered data to other types of ordering by deepening the study of streaming algorithms for automated reasoning

Page 52: Stream Reasoning: State of the Art and Beyond

Credits

• Politecnico di Milano’s colleagues
  – Prof. Stefano Ceri, who had the initial intuition about the value of introducing data streams to the Semantic Web community
  – Marco Balduini, Davide Barbieri, Daniele Braga, Stefano Ceri and Michael Grossniklaus, who helped conceive the C-SPARQL Engine and the Streaming Linked Data Framework
  – Sara Magliacane and Alessandro Bozzon, who started the exploration of order-aware reasoning working on SPARQL-RANK

• People I directly worked with on the topic
  – CEFRIEL: Irene Celino and Daniele Dell’Aglio
  – Saltlux: Seonho Kim and Tony Lee
  – SIEMENS: Yi Huang and Volker Tresp
  – STI-Innsbruck: Prof. Dieter Fensel and Srdjan Komazec
  – UO: Prof. Ian Horrocks and Markus Krötzsch
  – VUA: Prof. Frank van Harmelen and Stefan Schlobach

• The broader research community that showed interest in stream reasoning

Page 53: Stream Reasoning: State of the Art and Beyond

Downloads

• C-SPARQL Engine (no reasoning support)
  – A ready-to-go pack for Eclipse
    • http://streamreasoning.org/download
  – Source code available on request

• SPARQL-Rank Engine (ARQ-Rank)
  – Source code and experimental data
    • http://sparqlrank.search-computing.org/

Page 54: Stream Reasoning: State of the Art and Beyond

Emanuele Della Valle - visit http://streamreasoning.org Emanuele Della Valle - visit http://streamreasoning.org

References

My papers• [IEEE-IS2009] E. Della Valle, S. Ceri, F. van Harmelen, D. Fensel It's a Streaming World!

Reasoning upon Rapidly Changing Information. IEEE Intelligent Systems 24(6): 83-89 (2009)

• [EDBT2010] D.F. Barbieri, D.Braga, S. Ceri and M. Grossniklaus. An Execution Environment for C-SPARQL Queries. EDBT 2010

• [WWW2009] D.F. Barbieri, D. Braga, S. Ceri, E. Della Valle, M. Grossniklaus: C-SPARQL: SPARQL for continuous querying. WWW 2009: 1061-1062

• [SIGMODRec2010] D.F. Barbieri, D.Braga, S. Ceri and M. Grossniklaus. : Querying RDF streams with C-SPARQL. SIGMOD Record 39(1): 20-26 (2010)

• [IEEE-IS2010] D. Barbieri, D. Braga, S. Ceri, E. Della Valle, Y. Huang, V. Tresp, A.Rettinger, H. Wermser: Deductive and Inductive Stream Reasoning for Semantic Social Media Analytics IEEE Intelligent Systems, 30 Aug. 2010.

• [JWS2012] M. Balduini; I.Celino; E. Della Valle; D.Dell'Aglio; Y. Huang; T. Lee; S. Kim; V. Tresp: BOTTARI: an Augmented Reality Mobile Application to deliver Personalized and Location-based Recommendations by Continuous Analysis of Social Media Streams. JWS. 2012. IN PRESS.

• [ESWC2010] D.F. Barbieri, D. Braga, S. Ceri, E. Della Valle, M. Grossniklaus: Incremental Reasoning on Streams and Rich Background Knowledge. ESWC 2010

• [SWJ2012] E. Della Valle, S. Schlobach, M. Krötzsch, A. Bozzon, S. Ceri, I. Horrocks: Order Matters! Harnessing a World of Orderings for Reasoning over Massive Data. Accepted with minor revision to SWJ

• [ISWC2012] S. Magliacane, A. Bozzon, E. Della Valle: Efficient Execution of Top-k SPARQL Queries. ISWC 2012. IN PRESS


Page 55: Stream Reasoning: State of the Art and Beyond


References

Other groups’ papers

[1] Darko Anicic, Paul Fodor, Sebastian Rudolph, Nenad Stojanovic: EP-SPARQL: a unified language for event processing and stream reasoning. WWW 2011: 635-644

[2] Danh Le Phuoc, Minh Dao-Tran, Josiane Xavier Parreira, Manfred Hauswirth: A Native and Adaptive Approach for Unified Processing of Linked Streams and Linked Data. International Semantic Web Conference (1) 2011: 370-388

[3] D. Anicic, S. Rudolph, P. Fodor, N. Stojanovic: Real-Time Complex Event Recognition and Reasoning-a Logic Programming Approach. Applied Artificial Intelligence 26(1-2): 6-57 (2012)

[4] Jean-Paul Calbimonte, Óscar Corcho, Alasdair J. G. Gray: Enabling Ontology-Based Access to Streaming Data Sources. ISWC (1) 2010: 96-111

[5] S. Komazec and D. Cerri: Towards Efficient Schema-Enhanced Pattern Matching over RDF Data Streams. First International Workshop on Ordering and Reasoning (OrdRing2011)

[6] Jesper Hoeksema, Spyros Kotoulas: High-performance Distributed Stream Reasoning using S4. First International Workshop on Ordering and Reasoning (OrdRing2011)

[7] A.J.G. Gray, R. Garcia-Castro, K. Kyzirakos, M. Karpathiotakis, J. Calbimonte, K.R. Page, J. Sadler, A. Frazer, I. Galpin, A.A.A. Fernandes, N.W. Paton, O. Corcho, M. Koubarakis, D. De Roure, K. Martinez, A. Gómez-Pérez: A Semantically Enabled Service Architecture for Mashups over Streaming and Stored Data. ESWC (2) 2011: 300-314

[8] Ying Zhang, Minh-Duc Pham, Oscar Corcho and Jean Paul Calbimonte: SRBench: A Streaming RDF/SPARQL Benchmark. ISWC 2012. IN PRESS

[9] Gebser, M., Sabuncu, O., & Schaub, T. (2011). An incremental answer set programming based system for finite model computation. AI Commun., 24(2), 195–212.

[10] Gebser, M., Grote, T., Kaminski, R., Obermeier, P., Sabuncu, O., & Schaub, T. (2012). Stream Reasoning with Answer Set Programming: Preliminary Report. In KR’12, pages 613–617. AAAI Press.


Page 56: Stream Reasoning: State of the Art and Beyond


References

Background papers

• [Babcock2002] Brian Babcock, Shivnath Babu, Mayur Datar, Rajeev Motwani, Jennifer Widom: Models and Issues in Data Stream Systems. PODS 2002: 1-16

• [Ceri1994] Stefano Ceri, Jennifer Widom: Deriving Incremental Production Rules for Deductive Data. Inf. Syst. 19(6): 467-490 (1994)

• [Cugola2011] Alessandro Margara, Gianpaolo Cugola: Processing flows of information: from data stream to complex event processing. DEBS 2011: 359-360

• [Henzinger98] Henzinger, M. R. & Raghavan, P. (1998). Computing on data streams. Systems Research.

• [Volz2005] Raphael Volz, Steffen Staab, Boris Motik: Incrementally Maintaining Materializations of Ontologies Stored in Logic Databases. J. Data Semantics 2: 1-34 (2005)


Page 57: Stream Reasoning: State of the Art and Beyond


Thank You! Questions?

Much More to Come!

Keep an eye on http://www.streamreasoning.org


Page 58: Stream Reasoning: State of the Art and Beyond


Call for Papers

The International Journal on Semantic Web and Information Systems

IJSWIS seeks contributions to a

Special Issue on Stream Reasoning

Editors:

Emanuele Della Valle, Stefano Ceri, Frank van Harmelen

Important Dates:

Abstract Submission: 30th September 2012

Submission Deadline: 31st October 2012
