DEIM Forum 2015 E4-4
Efficient Execution of Designated Event-driven Stream Processing
Yan WANG†, Hiroyuki KITAGAWA††, Salman AHMED SHAIKH†††, and Yousuke WATANABE††††
† Graduate School of Systems and Information Engineering, University of Tsukuba
†† Faculty of Engineering, Information and Systems, University of Tsukuba
††† Center for Computational Sciences, University of Tsukuba
1-1-1 Tennodai, Tsukuba, Ibaraki, 305-8573, Japan
†††† Institute of Innovation for Future Society, Nagoya University
Furo-cho, Chikusa-ku, Nagoya, 464-8603, Japan
E-mail: †{wangyan,salman}@kde.cs.tsukuba.ac.jp, ††[email protected], †††[email protected]
Abstract Stream processing has become an important research issue with the increase in stream data sources. To date, several stream processing engines have been developed, and one thing common to most of them is that they process the stream data and generate query results as soon as any new data from any stream arrives. However, sometimes users are not interested in all the results but would like to get the continuous query result for a short duration after the arrival of data from a particular stream. We call this processing scheme the designated event-driven stream processing scheme. We propose a smart approach for the designated event-driven stream processing scheme. We performed extensive experiments to show the advantage of the smart approach.
Key words Stream processing, Event-driven processing, Smart approach
1. Introduction
With the advancement of technology, the amount of data
is increasing. Many devices around us generate data
continuously, for example cell-phone GPS receivers, car
trackers, and weather and traffic sensors. Such data are
called data streams in the database community. In order
to process and query such continuously evolving data,
many stream processing engines (SPEs) have been developed
in the past. STREAM [1], S4 [2], Discretized Streams [3],
Borealis [8], Aurora [6], TelegraphCQ [7] and Storm [4] are a
few examples of well-known and commonly used SPEs.
One thing common to most available SPEs is that, whenever
new data arrives from any stream source, they process it
and generate query results. In this work we call this data
processing scheme the traditional stream processing scheme.
However, sometimes users are not interested in all the
results but would like to get the continuous query result for
a short duration after the arrival of data from a particular
stream or after some particular event. For example, an
administrator of a data center may want to know the network
condition when some failure occurs, and may also be
interested in monitoring the network condition for the 30
minutes following the failure. Suppose there are two streams
being read by an SPE, a network connection stream and a
failure stream. Then the SPE is supposed to generate query
results for 30 minutes only when data arrives from the failure
stream. We call this processing scheme the designated
event-driven stream processing scheme. Here the designated
stream is the one whose data arrival triggers the query and
causes the results to be generated. The designated
event-driven scheme differs from the traditional stream
processing scheme, where continuous queries are triggered
by data from all streams. In this work, we use the terms
master stream and activation duration for the designated
stream and the triggering duration, respectively.
Specifically, in this work we introduce the idea of
designated event-driven stream processing and propose an
efficient execution approach (the smart approach) for it,
extending the incremental computation scheme employed in some
SPEs. We performed extensive experiments to show that
the proposed smart approach can improve the system’s
throughput significantly when the input rates of the master
streams are lower than those of the non-master streams.
The rest of the paper is organized as follows. Section 2
discusses the related work. Section 3 briefly presents the basics
of the continuous query language and the traditional stream
processing scheme. In section 4 we introduce the designated
event-driven stream processing scheme. Section 5 presents
and compares the naive approach with the proposed smart
approach. In section 6, an extensive experimental study is
presented, while section 7 concludes this paper and discusses
some of the future directions.
2. Related Work
Many stream processing engines have been developed re-
cently due to the increase in demand to efficiently process
fast evolving streams. In this section, we will review some of
the famous and commonly used SPEs.
STREAM, the Stanford Data Stream Management System [1],
is a data management and query processing engine for
applications requiring long-running or continuous
queries over continuous unbounded streams of data.
STREAM supports declarative continuous queries written in
CQL [5] over continuous streams and traditional stored data
sets. The STREAM prototype targets environments where
streams may be rapid, stream characteristics and query loads
may vary over time, and system resources may be limited.
Discretized Streams [3] is a stream programming model for
computer clusters that provides consistency, fault recovery,
and integration with batch systems. The key idea in their
work is to treat data streams as a series of short batch jobs to
bring down the latency of these jobs. They also developed an
SPE to let users intermix streaming, batch and interactive
queries.
Storm [4] is a distributed real-time computation system
that processes unbounded streams of data. Storm is quite
flexible and can be used with any programming language.
A Storm cluster consists of a master node, which accepts
queries called topologies and assigns tasks to workers;
a group of worker nodes, which process the topologies;
and a ZooKeeper cluster, which coordinates the master and
worker nodes and keeps their state.
Besides these, there are several other stream processing
engines. One thing common to most of the SPEs discussed
above is that they process the stream data as soon as new
data arrives. Some of the SPEs also provide event-based
stream processing capabilities. However, none of the works
discussed above provides a designated event-driven stream
processing capability, which is the main contribution of
this work. Moreover, we propose an efficient query execution
approach (the smart approach) for our designated event-driven
stream processing scheme, extending the incremental
computation scheme employed in some SPEs.
3. Traditional Stream Processing
In the traditional stream processing engines, when a user
registers a continuous query, it is executed continuously on
the incoming stream tuples and generates continuous output.
An example of such a query is shown in Query 1, which
performs a continuous join operation with respect to attribute
A of Stream1 and Stream2.
SELECT *
FROM Stream1 [Rows 2], Stream2 [Rows 2]
WHERE Stream1.A = Stream2.A
Query 1: An example of CQL query
Query 1 is written in CQL [5], which is an SQL-
based declarative language for registering continuous queries
against streams and updatable relations. The authors in [5]
proposed abstract semantics and a concrete language, which
is more general than many other continuous query languages
and is therefore adopted by many SPEs. CQL makes use of
the window specifications and constructs for mixing streams
and relations, and the power of any relational query lan-
guage. Since our stream engine also makes use of a slightly
modified form of CQL for querying data streams, we sum-
marize its abstract semantics and briefly discuss the three
classes of operators in CQL [5].
3.1 Abstract Semantics of CQL
The abstract semantics for continuous queries is based on
two data types and three classes of operators, which are de-
fined using a discrete, ordered time domain Γ.
a ) Stream S :
A Stream is an unbounded bag (multiset) of pairs < s, t >,
where s is a tuple and t ∈ Γ is the timestamp that denotes
the logical arrival time of tuple s on stream S.
b ) Relation R:
A Relation is a time-varying bag of tuples. The bag of
tuples at time t ∈ Γ is denoted by R(t), where R(t) is an
instantaneous relation. Note that the definition of a relation
differs from the traditional one which has no built-in notion
of time.
There are three classes of operators over streams and re-
lations, which are stream-to-relation operators, relation-to-
relation operators and relation-to-stream operators.
Whenever a new stream tuple arrives at t, a CQL query
composes a new input relation ri(t) using a window operator,
and generates an output relation ro(t) evaluating relational
operators involved in the CQL query. Finally, a relation-
to-stream operator is applied to the output relation ro(t) to
convert it into a stream. For example, Istream outputs tu-
ples in ro(t) − ro(t′) downstream, where t′ is the previous
query evaluation timestamp.
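The Istream semantics described above can be illustrated with a minimal Python sketch. It assumes relations are represented as lists of tuples (bags), and the function name `istream` is a hypothetical helper rather than part of any SPE's API. The bag difference ro(t) − ro(t′) is computed with `collections.Counter`, which drops non-positive counts on subtraction.

```python
from collections import Counter


def istream(ro_now, ro_prev):
    """Return the tuples in ro(t) - ro(t') as a bag (multiset) difference.

    ro_now and ro_prev are lists of hashable tuples; duplicates are kept,
    because CQL relations are bags rather than sets.
    """
    diff = Counter(ro_now) - Counter(ro_prev)  # keeps only positive counts
    return sorted(diff.elements())


# Output relation at the previous evaluation and at the current one.
prev_relation = [(5, 2)]
curr_relation = [(5, 2), (5, 3)]
print(istream(curr_relation, prev_relation))  # [(5, 3)]
```

Only the tuple that is new in the current output relation is sent downstream; tuples that were already present at the previous evaluation are not repeated.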
Figure 1: A simple query plan tree (input streams S1 and S2, two window operators, a join operator and an Istream operator, connected by queues q1 to q6 and equipped with synopses Syn1 to Syn6)
3.2 CQL Query Plan
Each CQL [5] query is compiled into a query plan tree. The
query plan tree for Query 1 is shown in Fig. 1. Each CQL
query plan tree runs continuously and is composed of three
different types of components: operators, queues and syn-
opses. The details of these components can be found in [9].
The query plan tree in Fig. 1 consists of three different
types of operators. The window operators are responsible for
generating finite relations from infinite input streams. The
tuples in these finite relations are then processed by
algebraic operators such as join and aggregate. In Fig. 1, the
binary join operator receives the tuples of the two windows
through queues q3 and q4. The last operator in the query plan
is a relation-to-stream operator, Istream, which calculates
the increments of consecutive join results.
3.3 Incremental Computation
In this subsection, we discuss the incremental computation
scheme employed in SPEs like STREAM [1]. As mentioned
above, a CQL query logically outputs tuples based on ro(t)
and ro(t′), but the computations required for ro(t) and ro(t′)
often have a lot of overlap. To eliminate redundant
computation, the incremental computation scheme is used. Assume
that ri(t) and ri(t′) are the input relations used to generate
ro(t) and ro(t′), respectively. Intuitively, the incremental
computation scheme calculates ro(t) incrementally from
ri(t) − ri(t′), ri(t′) − ri(t), and ro(t′). For this purpose,
operators are provided with synopses to maintain their current
outputs, and tuples flowing through the query plan tree are
associated with plus/minus tags.
Whenever a new tuple arrives at the window operator, it is
saved in the window synopsis. The window converts
a stream into a relation using a sliding window mechanism.
In order to implement the incremental computation scheme,
newly arriving tuples are given plus tags. (We
call them p-tuples.) The p-tuples are sent to the downstream
operators, which calculate the increments caused by the
p-tuples. At the same time, the oldest tuples become obsolete
(fall outside the window), and minus tags are attached to them.
(We call them m-tuples.) The m-tuples are also sent to all
downstream operators to convey the expiration of the
tuples so that the operators can remove them from their synopses.
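The p-tuple and m-tuple mechanism can be sketched as follows. This is a minimal illustration in Python, assuming a row-based window such as [Rows 2]; the class name `RowWindow` and the `("+", tuple)` / `("-", tuple)` encoding of the tags are hypothetical choices for the sketch, not the data layout of STREAM or of the prototype described later.

```python
from collections import deque


class RowWindow:
    """Row-based sliding window that emits plus/minus tagged tuples."""

    def __init__(self, size):
        self.size = size
        self.synopsis = deque()  # current window contents, oldest first

    def insert(self, tup):
        """Store tup and return the tagged tuples sent downstream."""
        emitted = [("+", tup)]  # p-tuple for the newly arrived tuple
        self.synopsis.append(tup)
        if len(self.synopsis) > self.size:
            expired = self.synopsis.popleft()
            emitted.append(("-", expired))  # m-tuple for the obsolete tuple
        return emitted


window = RowWindow(size=2)
print(window.insert((5, 2)))  # [('+', (5, 2))]
print(window.insert((5, 3)))  # [('+', (5, 3))]
print(window.insert((5, 9)))  # [('+', (5, 9)), ('-', (5, 2))]
```

A downstream operator would add the tuples tagged "+" to its own synopsis and remove those tagged "-", so that its output is updated without recomputing the whole relation.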
4. Designated Event-driven Stream Processing
In this section we introduce the proposed designated
event-driven stream processing scheme (designated scheme
for short). In this scheme, the continuous query is executed
for a fixed time duration τ after the arrival of a tuple from
the designated streams. The duration τ must be specified by
the user when the query is registered. More than one data
stream may be designated. In contrast, in the traditional
stream processing scheme, the query is executed continuously
on all incoming data streams. The traditional stream pro-
cessing scheme is a special case of the proposed designated
scheme. If we designate all the source streams as master
streams and set the activation duration to 0, then the desig-
nated scheme behaves like the traditional scheme.
We now formally define the designated scheme. Let T be
a time domain consisting of discrete, ordered timestamps t.
The SPE assigns a timestamp t from the domain T to each
incoming stream tuple. Tuples may arrive at the SPE from the
master or non-master streams. Let T′ be the time domain
of the master streams, such that T′ ⊆ T and t′ ∈ T′.
Suppose a query Q is registered on the SPE by a user.
When a tuple arrives from a master stream at t′, it triggers Q,
and we call this state of Q active. The active state of Q lasts
for a time duration τ . This time duration is referred to as the
activation duration. If a new tuple arrives from any incoming
data stream while the query is still in the active state, the
query is triggered again. The active time duration of a query
which was activated at t′ is given by [t′, t′ + τ]. Note that if
a new tuple arrives from a master stream at t′′ ∈ [t′, t′ + τ],
the activation duration is updated to [t′′, t′′ + τ]. Consider
a join query, Query 2, using the proposed designated scheme.
In contrast to Query 1 for the traditional scheme, Query 2
contains two additional clauses: a MASTER clause and
an ACTIVATE DURATION clause. This query performs
a simple binary join operation with respect to attribute
X of Stream1 and Stream2. Stream1 is designated as the
master stream, while Stream2 is by default a non-master
stream. The activation duration τ is set to 1 second, which
means that the query will remain active for 1 second after
the arrival of each tuple from Stream1.
MASTER Stream1
SELECT *
FROM Stream1 [Rows 2], Stream2 [Rows 2]
WHERE Stream1.X = Stream2.X
ACTIVATE DURATION 1 second
Query 2: A designated event-driven query
Figure 2: Input data streams (timeline from Time:1 to Time:8; Stream2 tuples carry attributes x and y, tuples of the master stream Stream1 carry attribute x)
Figure 3: Output stream (join results shown at Time:3 and Time:7)
The proposed designated scheme has many applications.
For example, in the case of network streams mentioned in
Section 1, the failure stream can be the master stream.
Another example is a scenario where a user has two
streams, a news stream and a Twitter tweet stream, and the
user wants to analyse the related tweets when a piece of news
of interest arrives. In this case, the user can designate the
news stream as the master to avoid the processing of all the
tuples in the tweet stream. It saves a lot of computation
resources and increases the query throughput. For queries
with only one input stream, S, an external trigger or event
stream can be specified as the master stream to activate the
query or we can designate S as the master stream.
Once again consider Query 2 and the data streams shown
in Fig. 2, where the streams’ tuples arrive in time order. For
the sake of simplicity we assume that each stream is capable
of generating one tuple per second. As stated earlier, arrival
of a tuple from the master stream activates the query and
produces the result. In Fig. 2, the master stream activates
the query at t = 2 and t = 7. Since the activation duration
is set to 1 second, the query is activated by the arrival of each
master stream tuple and remains active for 1 second, so
the query is active at t = 2, 3, 7 and 8. The results generated
by this execution are shown in Fig. 3.
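The activation rule used in this example can be written down as a small Python sketch. It assumes integer timestamps with one time unit per second, and the names `ActivationState` and `active_timestamps` are hypothetical helpers introduced only for illustration. A master tuple arriving at t′ sets the active interval to [t′, t′ + τ], and a later master tuple replaces it.

```python
class ActivationState:
    """Tracks whether a query is active under the designated scheme."""

    def __init__(self, tau):
        self.tau = tau
        self.activated_at = None  # arrival time of the latest master tuple

    def on_master_tuple(self, t):
        # A master tuple (re)starts the active interval [t, t + tau].
        self.activated_at = t

    def is_active(self, t):
        if self.activated_at is None:
            return False
        return self.activated_at <= t <= self.activated_at + self.tau


def active_timestamps(master_arrivals, tau, horizon):
    """List the timestamps in 1..horizon at which the query is active."""
    state = ActivationState(tau)
    arrivals = set(master_arrivals)
    active = []
    for t in range(1, horizon + 1):
        if t in arrivals:
            state.on_master_tuple(t)
        if state.is_active(t):
            active.append(t)
    return active


# Master tuples at t = 2 and t = 7, activation duration of 1 second.
print(active_timestamps([2, 7], tau=1, horizon=8))  # [2, 3, 7, 8]
```

With the master arrivals of Fig. 2, the sketch reproduces the active timestamps 2, 3, 7 and 8 stated in the text.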
5. Query Execution Approaches for the Designated Event-driven Stream Processing Scheme
We present two query execution approaches to the des-
ignated event-driven stream processing scheme: the naive
approach and the proposed smart approach, which can sig-
nificantly increase the query throughput.
Namely, this section discusses how incoming stream tu-
ples, including master and non-master stream tuples, are
processed in the designated event-driven stream processing
engine under the naive and smart approaches.
5.1 Naive Approach
The simplest way to handle incoming stream tuples is to
process the tuples only when the query is active. However
this does not work, as we must maintain the incoming tuples
from non-master streams even when the query is inactive to
guarantee the correctness of the window-based query result.
In the naive approach, the system works almost like the
traditional incremental stream processing scheme as ex-
plained in Section 3. The only difference is that we set the
master flag of all the incoming tuples which arrive during
the query activation duration. Master flags are conveyed to
intermediate tuples constructed by operators involved in the
query. More precisely, if an input tuple which triggers the
operator has the master flag, the master flags of the corre-
sponding output tuples generated in the incremental com-
putation scheme are set. Using the master flag, the outer-
most operator, which is a relation-to-stream operator such
as Istream operator for generating the output, identifies the
tuples which should be output as the query results.
When a leaf operator (an operator responsible for accept-
ing a stream’s input) receives a tuple, it checks whether the
tuple is from a master stream or from a non-master stream.
If the tuple is from a master stream, then the master flag of
the tuple is set, the query is activated, and the query activa-
tion time is updated. On the other hand, if the tuple is from
a non-master stream, then we check the query state. If the
query is active, the tuple master flag is set, which means that
the tuple arrived while the query was activated and must be
considered for the output.
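The leaf operator logic of the naive approach can be sketched in Python as follows. This is only an illustration under simplifying assumptions: a single object, here called `NaiveLeaf`, holds the shared query state for both streams, timestamps are integers, and the `StreamTuple` record is a hypothetical stand-in for the tuple layout used by the prototype.

```python
from dataclasses import dataclass


@dataclass
class StreamTuple:
    ts: int                    # arrival timestamp
    data: tuple                # attribute values
    master_flag: bool = False  # True if the tuple arrived while the query was active


class NaiveLeaf:
    """Hypothetical leaf operator for the naive approach."""

    def __init__(self, tau):
        self.tau = tau
        self.activated_at = None  # latest query activation time

    def query_active(self, ts):
        return (self.activated_at is not None
                and self.activated_at <= ts <= self.activated_at + self.tau)

    def accept(self, ts, data, from_master):
        if from_master:
            # A master tuple activates the query and updates the activation time.
            self.activated_at = ts
        # The tuple is always forwarded downstream; only the flag differs.
        return StreamTuple(ts, data, master_flag=self.query_active(ts))


leaf = NaiveLeaf(tau=1)
print(leaf.accept(2, (5,), from_master=True).master_flag)     # True
print(leaf.accept(3, (5, 3), from_master=False).master_flag)  # True
print(leaf.accept(4, (5, 9), from_master=False).master_flag)  # False
```

Note that the tuple arriving at t = 4 is still passed on with its flag off, which is the source of the overhead discussed next.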
The problem with the naive approach is that, while the
query is inactive, many tuples arriving from non-master
streams are processed by all the operators and generate
intermediate query results without master flags, which
accumulate in the synopsis of the outermost operator and are
later deleted by m-tuples. This wastes the computational and
storage resources of the SPE.
Example: Assume that Query 2 is provided with the in-
put data streams shown in Fig. 2. The master flag is set by
the leaf operator for all tuples which arrive while the query is
active. Here, we assume that a tuple has the timestamp, the
unique identifier, a plus or minus tag, and the data contents.
In the stream shown in Fig. 2, due to the arrival of a
tuple from Stream1 at t = 2, the query is active at t = 2
and t = 3. Thus the master flag of the tuple arriving at
t = 3 from Stream2 is set. The resultant tuple looks like this:
< 3, 3,+, T, 5, 3 >, which consists of the timestamp, the tu-
ple identifier, the plus/minus tag, the master flag (T:master
flag on, F: master flag off), and the data contents.
Since the tuple from Stream1 at t = 2 and the tuple from
Stream2 at t = 3 satisfy the join condition, the join operator
generates a new p-tuple < 3, 4,+, T, 5, 3 >. When this
p-tuple arrives at the Istream operator, it is output as the
query result as shown in Fig. 3, because the arriving tuple’s
master flag is on.
Figure 4: Smart Window Operator (a smart window [Rows 2] whose window synopsis, with columns ts, x, y, id, T and M, is divided into an output part and a suspended part)
At t = 4, a p-tuple < 4, 6,+, F, 5, 9 > is generated as a
result of the join of the tuple of t = 2 from Stream1 and the
tuple of t = 4 from Stream2. Since the query is inactive at
t = 4, its master flag is set to F. When this tuple arrives
at the Istream operator, it is saved in the synopsis. Since
the window size for Stream2 is 2, the corresponding window
synopsis maintains the latest 2 tuples from Stream2. At t =
5, a new tuple arrives from Stream2, and it is processed in a
similar way. At t = 6, a new tuple arrives from Stream2, and
it pushes the tuple of t = 4 out of the window synopsis. An
m-tuple < 6, 6,−, F, 5, 9 > is generated by the window operator
and sent downstream. Then the Istream operator
deletes the tuple < 4, 6,+, F, 5, 9 > from the Istream synopsis.
Finally, at t = 7, another tuple arrives from Stream1,
which activates the query and generates another result
as shown in Fig. 3. In this example, the join result of t
= 4 does not contribute to any query results.
5.2 Smart Approach
The naive approach may generate many useless intermediate
query results. The smart execution scheme addresses this
problem by changing the behaviour of the window operator.
We introduce the smart window operator for non-master
streams.
In the smart approach, when a tuple arrives from a non-
master stream, the system checks whether the query is active
or not. If the query is inactive, the smart window operator
buffers this tuple inside the window operator and does not
output any p-tuple. If the query is active, the smart window
operator outputs p-tuples corresponding to this tuple as well
as all the buffered tuples. While buffering tuples in the inac-
tive state, some buffered tuples reach beyond the window size
because of arrivals of succeeding tuples and become obsolete.
These obsolete tuples can be deleted directly. In the naive
approach, p-tuples are also output for such tuples and
processed by the downstream operators. When they expire,
their results are canceled by the corresponding m-tuples.
Figure 5: Query plan tree using smart window operator (the plan of Fig. 1 with the window operator of the non-master stream replaced by a smart window operator)
The synopsis of the smart window operator is divided into
two parts: output and suspended parts as shown in Fig. 4.
Both the output and suspended parts keep recent incoming
tuples within the window. A new tuple is first put into the
suspended part. When the query is activated, p-tuples corre-
sponding to the tuples inside the suspended part are output
to the downstream, and they are moved to the output part.
If tuples in the output part expire, the corresponding
m-tuples are output downstream. For tuples in the
suspended part, however, no p-tuples or m-tuples are sent
downstream, which reduces the computation cost.
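The behaviour of the smart window operator can be sketched in Python as follows. This is one possible reading of the description above rather than the prototype's implementation: the class `SmartWindow`, its method names and the `("+", tuple)` / `("-", tuple)` tag encoding are assumptions made for illustration, and the window is assumed to be row-based.

```python
from collections import deque


class SmartWindow:
    """Row-based window whose synopsis has an output and a suspended part."""

    def __init__(self, size):
        self.size = size
        self.output = deque()     # tuples already announced downstream
        self.suspended = deque()  # buffered tuples, never announced

    def _evict(self):
        """Drop the oldest tuples until the window size is respected."""
        emitted = []
        while len(self.output) + len(self.suspended) > self.size:
            if self.output:
                # Output-part tuples are always older than suspended ones;
                # downstream knows them, so an m-tuple is required.
                emitted.append(("-", self.output.popleft()))
            else:
                # Suspended tuples were never sent downstream: delete silently.
                self.suspended.popleft()
        return emitted

    def flush(self):
        """Called when the query is active: announce all buffered tuples."""
        emitted = [("+", tup) for tup in self.suspended]
        self.output.extend(self.suspended)
        self.suspended.clear()
        return emitted

    def insert(self, tup, query_active):
        self.suspended.append(tup)  # a new tuple first enters the suspended part
        emitted = self._evict()
        if query_active:
            emitted += self.flush()
        return emitted


w = SmartWindow(size=2)
print(w.insert((5, 3), query_active=True))   # [('+', (5, 3))]
print(w.insert((5, 9), query_active=False))  # []
print(w.insert((6, 1), query_active=False))  # [('-', (5, 3))]
print(w.insert((7, 8), query_active=False))  # []
print(w.flush())                             # [('+', (6, 1)), ('+', (7, 8))]
```

In the usage above, the tuple (5, 9) is buffered while the query is inactive and later falls out of the window without any p-tuple or m-tuple being emitted for it, whereas the tuple (5, 3), which had been announced, still requires an m-tuple when it expires.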
Example: Consider Query 2, whose query plan tree is
shown in Fig. 5. We assume the input data streams shown
in Fig. 2.
At t = 4, the query is inactive and the tuple is therefore
buffered in the suspended part of the smart window’s synop-
sis instead of sending a p-tuple like the naive approach. A
tuple of t = 5 from Stream2 is also buffered in the suspended
part. The snapshot of the smart window operator synopsis
is shown in Fig. 4. The window size of the smart window op-
erator is 2, so it always maintains the latest 2 tuples. When
the tuple < 6, 7, F, 7, 8 > arrives at t = 6, the oldest tuple
in the suspended part, i.e., < 4, 5,+, F, 5, 9 >, expires.
Since this tuple stayed in the suspended part of the synopsis,
it never contributed to generating intermediate results
downstream. Thus, in contrast to the naive approach, the tuple
can be deleted directly from the window synopsis without
generating any m-tuple. As a result, the generation of
useless intermediate tuples is avoided. The final result
of Query 2 is exactly the same as the one shown in Fig. 3.
6. Experiments
In this section we present a detailed experimental study to
evaluate the effectiveness of the proposed designated
event-driven stream processing scheme and the smart approach.
For the experiments, we developed a prototype SPE
which enables users to register CQL-style queries as
algebraic expressions. The source program consists of about
13,000 lines of C++ code. The query is translated into the
query plan tree consisting of operators, queues and synopses.
Our prototype SPE also supports multiple queries. The pro-
totype SPE supports the designated event-driven stream pro-
cessing and both the smart and naive execution approaches.
We performed the experiments on a Lenovo ThinkPad L412
with an Intel i3 2.26 GHz processor and 6 GB RAM running
Ubuntu 14.10.
We used two synthetic data streams, each consisting of two
attributes, an integer and a string. Both the streams have
an integer attribute A and a string attribute named B and
C for Stream1 and Stream2, respectively.
For the purpose of performance measurements, different
queries are executed on the prototype SPE. Unless stated
otherwise, each query is executed for the duration of 60 sec-
onds. Each experiment is performed 5 times and the average
values are taken. In the following, the query activation du-
ration is expressed by τ and the join selectivity by δ.
a ) Join Query:
First, a join query Query 3 is evaluated. Stream1 is the
master stream with input rate Rm and window size Wm,
while Stream2 is the non-master stream with input rate Rs
and window size Ws.
MASTER Stream1
SELECT *
FROM Stream1 [Rows Wm], Stream2 [Rows Ws]
WHERE Stream1.a = Stream2.a
ACTIVATE DURATION = τ
Query 3: Join query
As discussed in Section 5, the smart approach generates fewer
intermediate tuples than the naive approach when tuples in
the smart window synopsis can be deleted directly. Tuples
stay in the suspended part of the smart window synopsis
only while the query is inactive. Since the interval between
the arrivals of two master stream tuples is 1/Rm, if τ is larger
than 1/Rm, the query remains active all the time. Therefore,
the smart approach can be advantageous when τ < 1/Rm.
The query is inactive for the duration 1/Rm − τ. During
this interval, (1/Rm − τ)Rs tuples arrive from the non-master
stream. These tuples are buffered in the suspended part of
the smart window synopsis. If the number of these tuples
grows larger than the window size Ws, the oldest tuples are
deleted directly from the smart window. This suggests that
the smart approach outperforms the naive approach if the
following inequality holds.
$$\left(\frac{1}{R_m} - \tau\right) R_s > W_s \tag{1}$$
Fig. 6 compares the throughput of the naive and the
proposed approaches. For this experiment we set Wm = 1,
Ws = 100, Rm = Rs/1000, τ = 0 and δ = 1. Rs was varied
from 100,000 tuples/s to 1,000,000 tuples/s. Query 3 was
executed for 5 minutes, and we observed the processing rate
as well as the average processing time of an incoming tuple
in the naive and smart approaches.
Fig. 6b shows that the average processing time of the
smart approach is far less than that of the naive approach.
Fig. 6a shows that the smart approach can handle more input
tuples than the naive approach when the input rate exceeds
400,000 tuples/s. The smart approach curve in Fig. 6a
becomes flat after the input rate reaches 800,000 tuples/s
because the prototype system can process at most 800,000
tuples/s.
Figure 6: Smart approach throughput. Panels: (a) Processing Rate, plotting processing rate (tuples/s) against input rate (tuples/s); (b) Processing Time, plotting average processing time (us) against input rate (tuples/s). Each panel shows the naive and smart approaches.
Next we performed experiments to verify Eq. 1. We used
the average processing times to compare the naive and smart
approaches. The experiments were performed with join se-
lectivities δ = 0.1 and δ = 1. For these experiments, the
following set of default values are used: Ws = 1, Wm = 1,
Rm = 100, Rs = 1000 and τ = 0s.
All the graphs in Figs. 7 and 8 show the average processing
time of the naive and smart query processing approaches.
Fig. 7a shows that the case of Ws = 1 satisfies Eq. 1, and
therefore the average processing time of the smart approach
is far less than that of the naive approach. However, with
Ws = 10, Eq. 1 does not hold, and therefore there is no
significant difference between the processing times of the
two approaches.
Fig. 7b shows the measurements for different Rm values.
The cases of Rm = 10 and 100 satisfy Eq. 1. Similarly in
Fig. 7c, the cases of Rs = 1,000, 10,000 and 100,000 satisfy
Eq. 1. In Fig. 7d, the cases of τ = 0 and 0.001 also satisfy
Eq. 1.
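The cases listed above can be checked mechanically against Eq. 1. The following Python sketch evaluates (1/Rm − τ)Rs > Ws for each varied parameter while the others keep the default values Ws = 1, Rm = 100, Rs = 1000 and τ = 0. The function name `eq1_holds` is a hypothetical helper, and exact rational arithmetic is used so that the boundary cases, where both sides are equal, are not disturbed by floating-point rounding.

```python
from fractions import Fraction


def eq1_holds(rm, rs, ws, tau):
    """Evaluate Eq. 1: (1/Rm - tau) * Rs > Ws, using exact arithmetic."""
    inactive_interval = Fraction(1, rm) - Fraction(str(tau))
    return inactive_interval * rs > ws


# Default parameter values used in the experiments of Figs. 7 and 8.
defaults = dict(rm=100, rs=1000, ws=1, tau=0)

# Values of each parameter as they appear on the axes of Fig. 7.
sweeps = [
    ("ws", [1, 10]),
    ("rm", [10, 100, 1000, 10000]),
    ("rs", [100, 1000, 10000, 100000]),
    ("tau", [0, 0.001, 0.01, 0.1]),
]

for name, values in sweeps:
    for value in values:
        params = {**defaults, name: value}
        print(name, value, eq1_holds(**params))
```

The sketch reports that Eq. 1 holds for Ws = 1, for Rm = 10 and 100, for Rs = 1,000, 10,000 and 100,000, and for τ = 0 and 0.001, and fails for the remaining values, in agreement with the cases named in the text.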
Figure 7: Average query processing time with join selectivity of 1. Panels: (a) Varying Ws, (b) Varying Rm, (c) Varying Rs, (d) Varying τ. Each panel plots the average processing time (s) of the naive and smart approaches.
Fig. 8 shows the measurements for the join processing with
the join selectivity δ = 0.1. The graphs in Fig. 8 show trends
very similar to those of Fig. 7. However, the overall processing
costs of both approaches in Fig. 8 are smaller than
those of Fig. 7 due to the reduction in the join selectivity.
Figure 8: Average query processing time with join selectivity of 0.1. Panels: (a) Varying Ws, (b) Varying Rm, (c) Varying Rs, (d) Varying τ. Each panel plots the average processing time (s) of the naive and smart approaches.
b ) Single Stream Query:
The smart approach can be applied to queries with only
one input stream. Query 4 is an example of a single stream
query. We used this query, and the results are shown in Fig.
9a.
MASTER Stream1
SELECT a
FROM Stream1 [Row 1]
WHERE b = 1
ACTIVATE DURATION τ = 0
Query 4: Single stream query
From the figure we can observe that the average processing
time of the smart approach is almost the same as that of the
naive approach. This query deals with only one stream, and the
same stream is designated as the master and is responsible
for generating query results. There is no smart window
operator in the query plan tree, and therefore no tuples
are buffered. This is why the smart approach offers no
advantage here.
Figure 9: Single Stream Analysis. Panels: (a) Single Stream Query, (b) Single Stream Query with External Event Stream. Each panel plots the average processing time (s) of the naive and smart approaches.
c ) Single Stream Query with Event Stream:
We evaluated a query with one stream and an additional
event stream to trigger the query. We performed a join be-
tween an event stream and a normal stream using Query 5.
The average processing times of both the query processing
approaches are shown in Fig. 9b. Fig. 9b shows that the
smart approach is much better than the naive approach.
MASTER Event
SELECT Stream2.*
FROM Event [Row 1], Stream2 [Row 1]
ACTIVATE DURATION τ = 0
Query 5: Single stream query with event stream
7. Conclusion and Future Work
In this work, we have proposed the designated event-driven
stream processing scheme to enable complex event-driven
querying of data streams. We have also proposed an effi-
cient query execution approach extending the incremental
computation scheme employed in some SPEs. We have de-
veloped a prototype stream processing engine implementing
the proposed event-driven stream processing scheme and the
smart query execution approach. Detailed experimental eval-
uations using the prototype have been presented to show the
advantage of the proposed scheme. The experiments show
that the proposed smart approach is less expensive than
the naive approach when the input rates of the master
streams are lower than those of the non-master streams.
Future work includes sophisticated query optimization
techniques incorporating the proposed smart approach
and complex event-driven parallel data processing.
Acknowledgement
This research was partly supported by the program "Re-
search and Development on Real World Big Data Integration
and Analysis" of the Ministry of Education, Culture, Sports,
Science and Technology, Japan.
References
[1] R. Motwani et al. Query Processing, Resource Management, and Approximation in a Data Stream Management System. In Proc. of CIDR, January 2003.
[2] L. Neumeyer et al. S4: distributed stream computing platform. In Proc. of KDCloud, December 2010.
[3] M. Zaharia et al. Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters. In Proc. of HotCloud, June 2012.
[4] http://storm-project.net/
[5] A. Arasu et al. CQL: A Language for Continuous Queries over Streams and Relations. In Proc. of the Intl. Conf. on Database Programming Languages, September 2003.
[6] D. Abadi et al. Aurora: A New Model and Architecture for Data Stream Management. The VLDB Journal 12(2), pp. 120-139, August 2003.
[7] S. Chandrasekaran et al. TelegraphCQ: Continuous Dataflow Processing for an Uncertain World. In Proc. of CIDR, January 2003.
[8] D. J. Abadi et al. The design of the borealis stream processing engine. In Proc. of CIDR, January 2005.
[9] A. Arasu, S. Babu, and Jennifer Widom. The CQL continuous query language: semantic foundations and query execution. The VLDB Journal 15(2), pp. 121-142, June 2006.