an abstract semantics and concrete language for continuous queries over streams and relations

27
An Abstract Semantics and Concrete Language for Continuous Queries over Streams and Relations Presenter: Liyan Zhang Presentation of ICS 224 1

Upload: meg

Post on 04-Feb-2016

27 views

Category:

Documents


0 download

DESCRIPTION

An Abstract Semantics and Concrete Language for Continuous Queries over Streams and Relations. Presenter: Liyan Zhang Presentation of ICS 224. outline. Introduction Related Work Running Example Streams and Relations Modeling the Running Example Mapping Operators - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: An Abstract Semantics and Concrete Language for Continuous Queries over Streams and Relations

An Abstract Semantics and Concrete Language for Continuous

Queries over Streams and Relations

Presenter: Liyan ZhangPresentation of ICS 224

1

Page 2: An Abstract Semantics and Concrete Language for Continuous Queries over Streams and Relations

outline• Introduction• Related Work• Running Example• Streams and Relations

– Modeling the Running Example– Mapping Operators

• Abstract Semantics– Relation-to-Stream Operators– Example

• Concrete Query Language– Window Specification Language– Syntactic Shortcuts and Defaults– Example Queries

• Discussion• Conclusion

2

Page 3: An Abstract Semantics and Concrete Language for Continuous Queries over Streams and Relations

What is CQL? SQL -- Structured Query Language CQL -- Continuous Query Language

• Interest in query processing over data streams– E.g., computer network traffic, phone conversations, ATM transactions, web

searches, and sensor data• simple queries----easy to handle using SQL

– take a relational query language– replace references to relations with references to streams– register the query with the stream processor– wait for answers to arrive

• Complex queries----difficulties– aggregation, subqueries, windowing constructs, relations mixedwith streams,

S is a streamR is a relation[Rows 5] specifies a sliding window

one-time queries over stored data sets Continuous query over continuously arriving data

3

Page 4: An Abstract Semantics and Concrete Language for Continuous Queries over Streams and Relations

How to define CQL?• Define abstract semantics based on components

– any relational query language– any window specification language– a set of relation-to-stream operators

• Define Concrete language that instantiates the abstract semantics• several goals in mind:

– exploit well-understood relational semantics– wanted queries performing simple tasks to be easy and compact to write– wanted to enable new transformations specific to streams

• contributions of this paper:– formalize streams, updateable relations, and their Interrelationship– define an abstract semantics for continuous queries– propose a concrete language, CQL (Continuous Query Language)– consider two issues:

• exploiting CQL equivalences for query-rewrite optimization,• Dealing with time-related issues

4

Page 5: An Abstract Semantics and Concrete Language for Continuous Queries over Streams and Relations

outline• Introduction• Related Work• Running Example• Streams and Relations

– Modeling the Running Example– Mapping Operators

• Abstract Semantics– Relation-to-Stream Operators– Example

• Concrete Query Language– Window Specification Language– Syntactic Shortcuts and Defaults– Example Queries

• Discussion• Conclusion

5

Page 6: An Abstract Semantics and Concrete Language for Continuous Queries over Streams and Relations

Related work• focus on languages and semantics for continuous queries• Continuous queries were introduced for the first time in Tapestry

with a SQL-based language called TQL– TQL query is executed once every time instant as a one-time SQL

query– the results of all the one-time queries are merged using set union– Semantics based on periodic execution of one-time queries

• Several systems support procedural continuous queries– Aurora system

• based on users directly creating a network of stream operators• A large number of operator types, from simple stream filters to complex

windowing and aggregation operators.– Tribeca stream-processing system for network traffic analysis

• supports windows, a set of operators adapted from relational algebra, and a simple language for composing query plans from them

• Tribeca does not support joins across streams

6

Page 7: An Abstract Semantics and Concrete Language for Continuous Queries over Streams and Relations

outline• Introduction• Related Work• Running Example• Streams and Relations

– Modeling the Running Example– Mapping Operators

• Abstract Semantics– Relation-to-Stream Operators– Example

• Concrete Query Language– Window Specification Language– Syntactic Shortcuts and Defaults– Example Queries

• Discussion• Conclusion

7

Page 8: An Abstract Semantics and Concrete Language for Continuous Queries over Streams and Relations

Running Example• online auction application

• Users:– Registers: providing a name and current state of residence– Deregister

• 3 transactions:– place an item for auction and specify a starting price– close an auction they previously started – bid for currently active auctions by specifying a bid price

• Continuous queries:– Users can register various monitoring queries in the system

• For example, a user might request to be notified about any auction placed by a user from California within a specified price range.

– The auction system can run continuous queries for administrative purposes• Whenever an auction is closed, generate an entry with the closing price of the

auction based on bid history• Maintain the current set of active auctions and currently highest bid for them • Maintain the current top 100 “hot items,” i.e., 100 items with the most number of

bids in the last hour. 8

Page 9: An Abstract Semantics and Concrete Language for Continuous Queries over Streams and Relations

outline• Introduction• Related Work• Running Example• Streams and Relations

– Modeling the Running Example– Mapping Operators

• Abstract Semantics– Relation-to-Stream Operators– Example

• Concrete Query Language– Window Specification Language– Syntactic Shortcuts and Defaults– Example Queries

• Discussion• Conclusion

9

Page 10: An Abstract Semantics and Concrete Language for Continuous Queries over Streams and Relations

Streams and Relations example

tuple s arrives on stream S at time t

Given t, there could be 0, 1 or multiple elements with timestamp t in stream S

Base stream: source streams

Derive stream: streams resulting from queries or subqueries.

Base relations: stored relations

Derive relations : relation s resulting from queries or subqueries.

denotes an unordered bag of tuples at anytime instant

Timestamp t means logical time, NOT physical time

10

Mapping

Page 11: An Abstract Semantics and Concrete Language for Continuous Queries over Streams and Relations

Modeling the Running Example back• The input to the online auction system consists of the following five streams:• Register

• Deregister

• Open

• Close

• Bid

11

Page 12: An Abstract Semantics and Concrete Language for Continuous Queries over Streams and Relations

Mapping Operators

stream-to-relation

relation-to-relation

relation-to-stream

take a “sliding window” over the stream thatcontains the bids over the last ten minutes

stream the average price resulting from operator

every time the average price changes 12

Page 13: An Abstract Semantics and Concrete Language for Continuous Queries over Streams and Relations

outline• Introduction• Related Work• Running Example• Streams and Relations

– Modeling the Running Example– Mapping Operators

• Abstract Semantics– Relation-to-Stream Operators– Example

• Concrete Query Language– Window Specification Language– Syntactic Shortcuts and Defaults– Example Queries

• Discussion• Conclusion

13

Page 14: An Abstract Semantics and Concrete Language for Continuous Queries over Streams and Relations

Abstract Semantics example

• relation-to-relation operators– Any relational query

language• stream-to-relation

operators– window specification

language: extract tuples from streams

• relation-to-stream operators– Istream, Dstream, and

Rstream

Applying the window semantics on the elements of S up to t if R is the output of a window operator over a stream S

Applying the semantics of the relational query on the input relations at time t if R is the output of a relational query

computed by

14

Page 15: An Abstract Semantics and Concrete Language for Continuous Queries over Streams and Relations

Relation-to-Stream Operators back

• Istream

• Dstream

• Rstream

counterpart

Rstream subsums combination of Istream and Dstream

15

Page 16: An Abstract Semantics and Concrete Language for Continuous Queries over Streams and Relations

ExamplePrevious example:

Using relational algebra, written as:

At any time instant t, S[5] is an instantaneous relation containing the last five tuples in S up to t , and then joined with R(t)

Relation may change whenever a new tuple arrives in S or R is updated

Adding an outermost Istream to this query:• convert the relational result into a stream• With Istream semantics, a new element <u,t> is streamed whenever tuple

u is inserted into S[5] R at time t, as the result of a stream arrival or relation update.

S is a streamR is a relation[Rows 5] specifies a sliding window

16

Page 17: An Abstract Semantics and Concrete Language for Continuous Queries over Streams and Relations

outline• Introduction• Related Work• Running Example• Streams and Relations

– Modeling the Running Example– Mapping Operators

• Abstract Semantics– Relation-to-Stream Operators– Example

• Concrete Query Language– Window Specification Language– Syntactic Shortcuts and Defaults– Example Queries

• Discussion• Conclusion

17

Page 18: An Abstract Semantics and Concrete Language for Continuous Queries over Streams and Relations

Concrete Query Language example

• CQL contains 3syntactic extensions to SQL:– Anywhere a relation may be referenced in SQL, a stream may be

referenced in CQL– In CQL every reference to a stream(base or derived) must be

followed immediately by a window specification.– In CQL any reference to a relation(base or derived)may be

converted into a stream by applying any of the operators Istream, Dstream, or Rstream

• Defaults:– Default windows

• When a stream is referenced in a CQL query and is not followed by a window specification, an Unbounded window is applied by default.

– Default Relation-to-Stream Operators• On the outermost query, even when streamed results rather than stored

results are desired• On an inner subquery, even though a window is specified on the subquery

result• Add an Istream when the query produce a monotonic relation

18

Page 19: An Abstract Semantics and Concrete Language for Continuous Queries over Streams and Relations

Window Specification Language back

• CQL supports only sliding windows, it supports three types:• Time-Based Windows• Parameters: a time interval T• Specified by “S[Range T]”, sliding an interval of size T time over S• Special cases:

– T=0, tuples from elements of S with timestamp t “S[Now]”– T= , tuples obtained from all elements of S up to t, “S[Range Unbounded]”

• Tuple-Based Windows• Parameters: a positive integer N• Specified by “S [Rows N]”, N elements with largest timestamp <= t • Special cases:

– N= , “S[Rows Unbounded]”

• Partitioned Windows• Parameters: a positive integer N, and a subset of S’s attributes• Specified by S “. • partitions S into different substreams based on the attributes (similar to SQL Group By),

computes a tuple-based sliding window of size N independently on each substream cases, then takes the union of these windows to produce the output relation.

19

Page 20: An Abstract Semantics and Concrete Language for Continuous Queries over Streams and Relations

Example Queries

• Window specification default– Open stream is referenced without

window

• Istream default– output relation is Monotonic– Converting the output relation into a

stream

• The query rewritten as

• explicit window specification

• Nonmonotonic result , so no default Istream– If add Istream: result will

stream new value when count changes

– If add Rstream: count will be streamed at each time instant.

20

Page 21: An Abstract Semantics and Concrete Language for Continuous Queries over Streams and Relations

Example Queries

• Unbounded windows are applied by default on both Open and Close

• Default Istream is not applied– Subquery return a monotonic

relation, but no window specification following the query.

– The result of the entire query is not monotonic—auction tuples are deleted from the result when the auction is closed—and therefore an outermost Istream operator is not applied.

• partitioned window on the Register stream obtains the latest registration for each user

• Where clause filters out users who have already deregistered.

21

Page 22: An Abstract Semantics and Concrete Language for Continuous Queries over Streams and Relations

Example Queries

• join the Open stream with the User relation• If use an Unbounded window on Open

– then whenever a user moved into California , all previous auctions started by that user would be generated in the result stream.

• if a stream is joined with a relation ( in order to add attributes to or filter the stream) – then a Now window on the stream coupled with

an Istream or Rstream operator usually provides the desired behavior

• stream any item_id from Close whose corresponding Open tuple arrived within the last 5 hours

•Unbounded windows are applied by default on the Bid and Open streams• An Istream operator is applied to the Union result by default

since the relational output of the Union subquery is monotonic followed by a window specification.

22

Page 23: An Abstract Semantics and Concrete Language for Continuous Queries over Streams and Relations

outline• Introduction• Related Work• Running Example• Streams and Relations

– Modeling the Running Example– Mapping Operators

• Abstract Semantics– Relation-to-Stream Operators– Example

• Concrete Query Language– Window Specification Language– Syntactic Shortcuts and Defaults– Example Queries

• Discussion• Conclusion

23

Page 24: An Abstract Semantics and Concrete Language for Continuous Queries over Streams and Relations

Discussion• Stream-Only Query Language

– CQL distinguish two fundamental data types, relations and streams– derive a stream-only language from CQL

• Equivalences and Query Transformations– Window Reduction

• Unbounded windows require buffering the entire history of a stream, • while Now windows allow a stream tuple to be discarded as soon as it is

processed– Filter-Window Commutativity

• Timestamps and Physical Time– no direct relationship between T and physical clock-time at the Data Stream

Management System

Unbounded window and an Istream operator

Now window and an Rstream operator

24

Page 25: An Abstract Semantics and Concrete Language for Continuous Queries over Streams and Relations

outline• Introduction• Related Work• Running Example• Streams and Relations

– Modeling the Running Example– Mapping Operators

• Abstract Semantics– Relation-to-Stream Operators– Example

• Concrete Query Language– Window Specification Language– Syntactic Shortcuts and Defaults– Example Queries

• Discussion• Conclusion

25

Page 26: An Abstract Semantics and Concrete Language for Continuous Queries over Streams and Relations

Conclusion• This paper firstly presented an abstract semantics based on – any relational query language– any window specification language to map from streams to

relations– and a set of operators to map from relations to streams

• Proposed CQL, a concrete language– using SQL as the relational query language– window specifications derived from SQL-99

• Identified several practical issues arising from CQL– syntactic shortcuts and defaults– intuitive query formulation– equivalences for query optimization

26

Page 27: An Abstract Semantics and Concrete Language for Continuous Queries over Streams and Relations

Q&A

Thanks!

27