ACQUA: Approximate Continuous Query Answering over Streams and Dynamic Linked Data Sets



Post on 23-Jan-2018


ACQUA: APPROXIMATE CONTINUOUS QUERY ANSWERING OVER STREAMS AND DYNAMIC LINKED DATA SETS

Emanuele Della Valle
DEIB - Politecnico of Milano
http://emanueledellavalle.org
@manudellavalle

Schloss Dagstuhl, Germany - 26 June 2017

Stream Processing in a Nutshell

[Diagram labels: Stream data, Windows, Stream Processing Engine, Results]

Register the query once and execute it continuously
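The register-once, execute-continuously model can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the API of any particular engine: the class `ContinuousQuery`, its `push` method, and the integer time units are all hypothetical.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


class ContinuousQuery:
    """Hypothetical sketch: a query registered once and re-evaluated
    over a time-based sliding window as the stream advances."""

    def __init__(self, window_range: int, step: int,
                 query: Callable[[List[str]], object]) -> None:
        self.window_range = window_range  # window width, in time units
        self.step = step                  # evaluation period, in time units
        self.query = query                # the registered query
        self.buffer: Deque[Tuple[int, str]] = deque()
        self.next_eval = step             # time of the next evaluation
        self.results: List[Tuple[int, object]] = []

    def push(self, timestamp: int, item: str) -> None:
        """Feed one stream element; timestamps must be non-decreasing."""
        # Run every evaluation that is due before this element is added.
        while timestamp >= self.next_eval:
            self._evaluate(self.next_eval)
            self.next_eval += self.step
        self.buffer.append((timestamp, item))

    def _evaluate(self, now: int) -> None:
        # Evict elements older than the window [now - range, now).
        while self.buffer and self.buffer[0][0] < now - self.window_range:
            self.buffer.popleft()
        content = [item for _, item in self.buffer]
        self.results.append((now, self.query(content)))


# Usage: count the elements in a 10-unit window, every 5 units.
q = ContinuousQuery(window_range=10, step=5, query=len)
for ts, item in [(1, "a"), (3, "b"), (6, "c"), (12, "d"), (21, "e")]:
    q.push(ts, item)
```

The query is supplied once at construction time; every later evaluation is triggered by the arrival of stream data rather than by a new request.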


Web Stream Processing

[Diagram labels: Web Streams, Windows, Join, Linked Data, Web, Stream Processing Engine, Results]

• High Latency
• Rate Limits
• Losing Reactiveness


RDF Stream Processing (RSP) Engine

[Diagram labels: RDF Streams, Windows, Join, SPARQL endpoint, Web, RSP engine, Results]


An example

The clothing brand ACME wants to persuade influential social network users to post commercial endorsements.

Every minute, give me the IDs of the users who were mentioned on the social network in the last 10 minutes and whose number of followers is greater than 100,000.

REGISTER STREAM <:InfluencersToContact> AS
CONSTRUCT { ?user a :influentialUser }
FROM NAMED WINDOW W ON S [RANGE 10m STEP 1m]
WHERE {
  # users mentioned in the last 10 minutes of stream S
  WINDOW W { ?user :hasMentions ?mentionsNumber }
  # follower counts come from the remote background data
  SERVICE BKG { ?user :hasFollowers ?followerCount }
  FILTER (?followerCount > 100000)
}
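To make the intended semantics of this query concrete, here is a minimal Python sketch of a single evaluation. It is not how an RSP engine executes the query. The function name `influencers_to_contact`, the in-memory `followers` dictionary standing in for the BKG service, and the half-open window boundaries are assumptions made for illustration.

```python
from typing import Dict, Iterable, Set, Tuple

RANGE_SECONDS = 10 * 60        # RANGE 10m
STEP_SECONDS = 60              # STEP 1m: one evaluation per minute
FOLLOWER_THRESHOLD = 100_000   # FILTER (?followerCount > 100000)


def influencers_to_contact(mentions: Iterable[Tuple[int, str]],
                           followers: Dict[str, int],
                           now: int) -> Set[str]:
    """One evaluation of the query at time `now` (in seconds).

    `mentions` holds (timestamp, user) pairs from the stream;
    `followers` stands in for the remote background data.
    """
    window_start = now - RANGE_SECONDS
    # WINDOW clause: users mentioned in (now - 10 min, now] (assumed bounds).
    mentioned = {user for ts, user in mentions if window_start < ts <= now}
    # SERVICE + FILTER: join with follower counts and keep those above the
    # threshold; users without a known follower count do not join.
    return {u for u in mentioned
            if u in followers and followers[u] > FOLLOWER_THRESHOLD}


# Usage with made-up data.
mentions = [(0, "alice"), (100, "bob"), (650, "carol"), (700, "dave")]
followers = {"alice": 500_000, "bob": 200_000, "carol": 50, "dave": 100_001}
at_600 = influencers_to_contact(mentions, followers, now=600)
at_700 = influencers_to_contact(mentions, followers, now=700)
```

A continuous evaluation would call this function every `STEP_SECONDS` with the current time, which is where the cost of reaching the remote service on every step becomes visible.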


Problem Definition

[Diagram labels: RDF Streams, Windows, Join, Local Replica, SPARQL endpoint, Web, RSP engine, Results]

• Define a Refresh Budget to control reactiveness
• Data become stale if not refreshed
• Correct vs approximate answer
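The trade-off between a refresh budget and stale data can be illustrated with a small Python sketch. It assumes a hypothetical `LocalReplica` class and a `fetch` callable that stands in for a request to the SPARQL endpoint; neither name comes from the slides.

```python
from typing import Callable, Dict, Iterable, List


class LocalReplica:
    """Hypothetical local copy of remote data, refreshed under a budget."""

    def __init__(self, fetch: Callable[[str], int], budget: int) -> None:
        self.fetch = fetch          # stand-in for a remote endpoint request
        self.budget = budget        # max remote requests per evaluation
        self.values: Dict[str, int] = {}
        self.remote_calls = 0

    def load(self, keys: Iterable[str]) -> None:
        """Initial full copy; not counted against the refresh budget."""
        for key in keys:
            self.values[key] = self.fetch(key)

    def refresh(self, keys: Iterable[str]) -> List[str]:
        """Refresh at most `budget` entries; the rest may stay stale."""
        refreshed: List[str] = []
        for key in keys:
            if len(refreshed) == self.budget:
                break
            self.values[key] = self.fetch(key)
            self.remote_calls += 1
            refreshed.append(key)
        return refreshed


# Usage: the remote data changes, but only two entries can be refreshed.
remote = {"a": 1, "b": 2, "c": 3}
replica = LocalReplica(remote.__getitem__, budget=2)
replica.load(remote)
remote.update(a=10, b=20, c=30)
refreshed = replica.refresh(["a", "b", "c"])
```

After the refresh, the entry for `c` still holds its old value, so any answer computed from the replica is approximate. Which entries to spend the budget on is the maintenance question.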


Problem Definition

[Diagram labels: RDF Streams, Windows, Join, Local Replica, Maintenance Policy, SPARQL endpoint, Web, RSP engine, Results]


ACQUA approach

[Diagram labels: Stream data, WINDOW clause, JOIN, SERVICE clause, Local Replica, C (Candidate set), E (Elected set)]

1. Proposer: WSJ filters out mappings that are not involved in the current evaluation, yielding the Candidate set
2. Ranker: orders the Candidate set according to a policy
3. Maintainer: refreshes the Elected set (the top γ mappings of the Candidate set) in the Local Replica

ACQUA: without FILTER
ACQUA.F: with FILTER clause

Policies:
• ACQUA [2]: RND, LRU, WBM
• ACQUA.F [3]: Filter Update Policy, RND.F, LRU.F, WBM.F
• ACQUA.F+/* [5]: LRU.F+, WBM.F+, WBM.F*
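The Proposer, Ranker, and Maintainer steps can be sketched as a small Python pipeline. This is a simplified illustration, not the authors' implementation. All function names are hypothetical, LRU is assumed here to mean "least recently refreshed first", and WBM is omitted because the slides do not describe how it scores mappings.

```python
import random
from typing import Callable, Dict, List, Set


def propose(window_keys: Set[str], replica_keys: Set[str]) -> List[str]:
    """Proposer (WSJ): keep only replica mappings that join with the
    current window; the result is the Candidate set."""
    return sorted(window_keys & replica_keys)


def rank_lru(candidates: List[str], last_refresh: Dict[str, int]) -> List[str]:
    """Ranker, LRU (assumed meaning): least recently refreshed first."""
    return sorted(candidates, key=lambda k: last_refresh.get(k, -1))


def rank_rnd(candidates: List[str], last_refresh: Dict[str, int],
             rng=random) -> List[str]:
    """Ranker, RND: random order; `last_refresh` is ignored."""
    shuffled = list(candidates)
    rng.shuffle(shuffled)
    return shuffled


def maintenance_step(window_keys: Set[str], replica: Dict[str, int],
                     last_refresh: Dict[str, int],
                     fetch: Callable[[str], int], gamma: int, now: int,
                     rank=rank_lru) -> List[str]:
    """Run the proposer, the ranker, then the maintainer once."""
    candidates = propose(window_keys, set(replica))
    elected = rank(candidates, last_refresh)[:gamma]  # Elected set: top gamma
    for key in elected:                               # Maintainer
        replica[key] = fetch(key)
        last_refresh[key] = now
    return elected


# Usage with made-up data: budget gamma = 2, evaluation at time 7.
remote = {"u1": 10, "u2": 20, "u3": 30, "u4": 40}
replica = dict(remote)
last_refresh = {"u1": 5, "u2": 1, "u3": 3, "u4": 0}
remote["u2"] = 99
elected = maintenance_step({"u1", "u2", "u3", "u9"}, replica, last_refresh,
                           remote.__getitem__, gamma=2, now=7)
```

Note that `u4` is the stalest entry but is never refreshed, because it does not appear in the current window. That is the point of proposing candidates before ranking them.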


Where to read about ACQUA

1. Soheila Dehghanzadeh, Alessandra Mileo, Daniele Dell'Aglio, Emanuele Della Valle, Shen Gao, Abraham Bernstein: Online View Maintenance for Continuous Query Evaluation. WWW (Companion Volume) 2015: 25-26

2. Soheila Dehghanzadeh, Daniele Dell'Aglio, Shen Gao, Emanuele Della Valle, Alessandra Mileo, Abraham Bernstein: Approximate Continuous Query Answering over Streams and Dynamic Linked Data Sets. ICWE 2015: 307-325

3. Shima Zahmatkesh, Emanuele Della Valle, Daniele Dell'Aglio: When a FILTER Makes the Difference in Continuously Answering SPARQL Queries on Streaming and Quasi-Static Linked Data. ICWE 2016: 299-316

4. Shen Gao, Daniele Dell'Aglio, Soheila Dehghanzadeh, Abraham Bernstein, Emanuele Della Valle, Alessandra Mileo: Planning Ahead: Stream-Driven Linked-Data Access Under Update-Budget Constraints. International Semantic Web Conference (1) 2016: 252-270

5. Shima Zahmatkesh, Emanuele Della Valle, Daniele Dell'Aglio: Using Rank Aggregation in Continuously Answering SPARQL Queries on Streaming and Quasi-Static Linked Data. DEBS 2017: 170-179


Experimental Results

[Chart: Performance (Worst to Best) against Experiment Dimension]

• For high selectivity, Filter Update Policy is better than WBM
• For low selectivity, WBM is better than Filter Update Policy
• Comparable to Best Result
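The slides do not say which metric the performance axis uses. As an assumption for illustration only, one common way to quantify how far an approximate answer is from the correct one is the Jaccard distance between the two result sets. The function name below is hypothetical.

```python
from typing import Set


def jaccard_distance(correct: Set[str], approximate: Set[str]) -> float:
    """Return 0.0 for identical answers and 1.0 for disjoint ones."""
    union = correct | approximate
    if not union:
        # Both answers are empty, so they agree.
        return 0.0
    return 1.0 - len(correct & approximate) / len(union)


# Usage with made-up answers: one shared user out of three distinct ones.
error = jaccard_distance({"alice", "bob"}, {"bob", "carol"})
```

Under this assumed metric, a maintenance policy performs better when the distance between its answers and the correct answers stays lower across evaluations.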


Future work

• Broaden the class of queries

• Multiple filtering

• Filtering condition formulated as a ranking clause

• Pushing the FILTER clause into the SERVICE clause and considering caching instead of a local replica

• Study the effect of different trends in the data


Annex: ACQUA IN THE STREAM REASONING CONTEXT


Stream Reasoning in a nutshell


Tame data Variety and Velocity simultaneously

[Diagram labels: Traditional, Stream Reasoning]


Stream Reasoning in a nutshell


Tame data Variety and Velocity simultaneously without forgetting Volume

[Diagram labels: Traditional, Stream Reasoning]

• What if the analysis also includes data "at rest"?
• What if the data "at rest" are massive and slowly evolving?


Stream Reasoning in a nutshell

ACQUA