iqdep: integrated query, data and event processing...ow graph is modelled using several operators...

30
Rochester Institute Of Technology iQdep: Integrated Query, Data and Event Processing by Suman Subash Advised By Dr. Peizhao Hu Rochester Institute Of Technology August 2016

Upload: others

Post on 27-Sep-2020

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: iQdep: Integrated Query, Data and Event Processing...ow graph is modelled using several operators for data processing. Some operators include pointwise, cross product and fork. Using

Rochester Institute Of Technology

iQdep: Integrated Query, Data and

Event Processing

by

Suman Subash

Advised By

Dr. Peizhao Hu

Rochester Institute Of Technology

August 2016

Page 2: iQdep: Integrated Query, Data and Event Processing...ow graph is modelled using several operators for data processing. Some operators include pointwise, cross product and fork. Using

Contents

List of Figures ii

List of Tables iii

1 Introduction 1

2 Related Work 2

2.1 Dryad . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

2.2 Spark . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

2.3 Naiad . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

2.4 Query Processing in Wireless Sensor networks: . . . . . . . . . . . . . . . 5

2.4.1 TinyDB: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2.5 Complex Event Processing: . . . . . . . . . . . . . . . . . . . . . . . . . . 5

3 The iQdep Architecture 7

3.1 Data Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

3.2 Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

3.3 Process Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

3.4 Distributed Architecture Model . . . . . . . . . . . . . . . . . . . . . . . . 10

3.5 Query Processor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

3.5.1 Query Parser . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

3.5.2 Query Optimizer . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

3.5.3 Query Planner and Deployment . . . . . . . . . . . . . . . . . . . . 15

3.6 Programming Abstraction . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

3.6.1 Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

3.6.2 Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

3.7 Continuous Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

3.7.1 Window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

3.8 Event Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

3.8.1 Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

3.9 Integrated Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

3.10 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

3.10.1 Fire Alarm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

4 Future Work And Conclusion 22

i

Page 3: iQdep: Integrated Query, Data and Event Processing...ow graph is modelled using several operators for data processing. Some operators include pointwise, cross product and fork. Using

List of Figures

2.1 Dryad Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

2.2 Spark Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

2.3 Naiad Timely Dataflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

3.1 High Level View . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

3.2 iQdep Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

3.3 Process Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

3.4 Shared Nothing Architecture . . . . . . . . . . . . . . . . . . . . . . . . . 11

3.5 Query Processor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

3.6 Query Lexer Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

3.7 Lexer Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

3.8 Parser Grammar Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

3.9 Parse Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

3.10 Breakdown Predicates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

3.11 Pushdown Predicates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

3.12 Pushdown Projection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

3.13 Query Deployment Example . . . . . . . . . . . . . . . . . . . . . . . . . . 16

3.14 Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

3.15 Operators Contd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

3.16 Integrated Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

ii

Page 4: iQdep: Integrated Query, Data and Event Processing...ow graph is modelled using several operators for data processing. Some operators include pointwise, cross product and fork. Using

List of Tables

2.1 CEP Literature Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

iii

Page 5: iQdep: Integrated Query, Data and Event Processing...ow graph is modelled using several operators for data processing. Some operators include pointwise, cross product and fork. Using

Chapter 1

Introduction

Cloud Platforms such as Google Dataflow, Dryad, and Apache Spark is designed primar-

ily for Data centers. They provide acyclic data flow model where data passed through

deterministic data flow operators. Furthermore, Cloud Platforms also provide Complex

Event processing (CEP) engines which perform complex event pattern analysis to trigger

an action.

Although cloud platforms provide powerful Dataflow operators, they aren’t designed

for IoT on-device execution. Currently, data from IoT devices are transferred to the

cloud platforms where, using the powerful data flow operators, meaningful insights can

be obtained. Also, current systems lack tight integration between CEP and Streaming

Analytics. Streaming and CEP frameworks are studied individually in literature.

In this paper we propose an on-device execution framework called iQdep for IoT which

has the following goals. First, provide a stream processing engine optimized for IoT

devices. Second, integrate stream processing engines with complex event processing

engines for IoT. Finally, we provide a bridge which provides seamless integration between

on-device execution and cloud.

The paper is organized as follows: In section 2, we discuss about related work. Section

3 discusses about the architecture of iQdep. Finally, we’ll end the paper with Future

work and conclusion.

1

Page 6: iQdep: Integrated Query, Data and Event Processing...ow graph is modelled using several operators for data processing. Some operators include pointwise, cross product and fork. Using

Chapter 2

Related Work

2.1 Dryad

Dataflow model was popularized by Dryad [1] in 2006. Using Dryad, users could define

a Dataflow graph which is represented as G(V,E), where V is Vertices and E is Edges.

Vertices represent sequential programs and Edges represent the communication mech-

anism. Dataflow graph is modelled using several operators for data processing. Some

operators include pointwise, cross product and fork. Using these operators, complex di-

rected acyclic graph (DAG) can be constructed. Although, Dryad model was a generic

representation of map reduce, it still had problems solving Iterative Machine Learning

Algorithms.

Dryad system overview can be seen in Figure 2.1. Name server (NS) is the resource

manager for the cluster. It collects information such as available cores and memory

across the cluster. Job manager (JM) uses the collected information to make scheduling

decisions for the user submitted job. Each worker node has a Daemon (D) which executes

the vertices sent by Job Manager.

2

Page 7: iQdep: Integrated Query, Data and Event Processing...ow graph is modelled using several operators for data processing. Some operators include pointwise, cross product and fork. Using

3

Figure 2.1: Dryad Architecture

2.2 Spark

Apache Spark[2][3][4] [5] solved the problems mentioned in Dryad by introducing RDD (

Resilient Distributed Datasets ). RDDs are immutable partitioned collection of records

on which a deterministic set of operators can be applied. Since the operators are deter-

ministic and operate on all the records in the partition, fault tolerance of RDDs can be

obtained using Lineage of RDDs. Operators are classified as Transformations and Ac-

tions. Transformations are lazy and doesn’t cause execution across the cluster. Actions

trigger the Dataflow graph to execute across the cluster. Furthermore, RDDs can be

persisted in memory which helps for iterative machine learning algorithms.

Figure 2.2 describes the architecture of Apache Spark in Cluster mode. Apache Spark

has a driver program which allows users to interact with the cluster. When the driver

program submits the task to the Cluster Manager, it determines which machine the

task should be executed on. Worker nodes report the resources to the cluster manager.

Additionally, the worker nodes manage the cache which could be used to persist the

RDDs.

Page 8: iQdep: Integrated Query, Data and Event Processing...ow graph is modelled using several operators for data processing. Some operators include pointwise, cross product and fork. Using

4

Figure 2.2: Spark Architecture

2.3 Naiad

Naiad[6][7][8][9][10] works on Directed graph with cycles. Since Naiad allows cycles

in the graph, iterative and incremental computations are easy to compute. Although

iterative and incremental computations are possible in Apache Spark, it has 1 second

latency to perform computations due to its micro-batch architecture. Naiad avoids this

by providing cycles to the Dataflow graph.

Naiad provides loops in Dataflow graph with the following constraint- Only within a

looping context the cycle can exist. And each looping context should have an ingress,

egress, and a feedback vertex. See Figure 2.3

Figure 2.3: Naiad Timely Dataflow

Page 9: iQdep: Integrated Query, Data and Event Processing...ow graph is modelled using several operators for data processing. Some operators include pointwise, cross product and fork. Using

5

2.4 Query Processing in Wireless Sensor networks:

Broadly, there are two main approaches for data acquisition [11]: pull-based and push-

based.

Pull-based: In this approach, data is acquired only at fixed user-defined intervals.

Some examples of pull-based approaches are TinyDB[12] and Cougar[13].

Push-based: In push-based approaches, the sensors values are communicated only

when there is a deviation in the expected value. Central Server and Sensors agree on

the expected behavior. This approach enables users to identify interesting patterns in

sensor values. Some examples of push-based approaches are PRESTO[14] and Ken.

Since iQdep uses Pull based approach, we’ll be discussing about tinydb in more detail.

2.4.1 TinyDB:

TinyDb uses a pull-based approach for query processing. It defines a table called sensors

which is partitioned across the sensor devices. Every sensor device can produce one or

more sensed value such as Light Intensity and Temperature. TinyDB schema looks as

shown below:

Timestamp, A1, A2, A3...., An

where A1 is first attribute which could be temperature for example. Some devices might

not produce all attribute values. In that case TinyDB allows sensors to produce NULL

values for certain columns.

For Query processing, TinyDb constructs a Semantic Routing Tree.[12]

2.5 Complex Event Processing:

Complex Event Processing has been widely used in Financial Industry for high per-

formance trading. Complex Event Processing has been well studied in literature but

mostly from Temporal Operators perspective. Spatial Operators and integration with

Streaming Dataflow still requires further research. For IoT, both Streaming Dataflow

Processing and Complex Event Processing(Temporal and Spatial) are necessary. Below

Table 2.1 provides a overview of the CEP research:

As we can see from the Table 2.1, none of the papers discuss about Spatial Operators.

Temporal Operators have been well discussed in all the papers. Cayuga and Snoop IB

Page 10: iQdep: Integrated Query, Data and Event Processing...ow graph is modelled using several operators for data processing. Some operators include pointwise, cross product and fork. Using

6

CEP Cayuga[15] SASE [16] Snoop IB[17] High Perfor-mance EventProcessing[18]

Spatial Operators - - - -

Temporal Operators DUR WITHIN RANGE WITHIN

Nested Queries YES Limited YES Limited

Condition Operators YES YES YES YES

Integration With Dataflow - - - -

Table 2.1: CEP Literature Review

has well defined support for nested queries. Lack of support for integration of streaming

Dataflow and complex event processing.

Page 11: iQdep: Integrated Query, Data and Event Processing...ow graph is modelled using several operators for data processing. Some operators include pointwise, cross product and fork. Using

Chapter 3

The iQdep Architecture

In this section, we’ll deep dive into the iQdep Architecture. Figure 3.1 depicts the

high-level systems view of iQdep.

Figure 3.1: High Level View

A collection of networked devices (IoT) can send data either directly to the cloud sys-

tems such as Apache Spark and Google Cloud Dataflow or it could perform on-device

execution. The controller is central component which performs tasks such as scheduling

and maintaining topologies. The Controller interacts with the Forwarder to build a flow

table for Streaming Dataflow. The inputs and output flows are defined in the flow table.

Forwarder receives data from other Forwarders,applies a user defined function on the

7

Page 12: iQdep: Integrated Query, Data and Event Processing...ow graph is modelled using several operators for data processing. Some operators include pointwise, cross product and fork. Using

8

data, and sends the output to other Forwarders. The bridge enables some part of the

iQdep Dataflow graph to be executed on the cloud platforms.

3.1 Data Model

The iQdep framework defines the following data model:

A1, A2, A3, ..., An, T ime, Location

A1 to An are sensor attributes such as Temperature and Light Intensity. The senor

devices expose additional information such as Time and Location. Providing Time and

Location enables us to perform spatial and temporal queries.

3.2 Components

The iQdep Framework consists of four main components as illustrated in Figure 3.2.

Client Query Interfaces: Client submits user requests to the iQdep framework us-

ing one of the protocols mentioned in the Client Communications Figure 3.2. Direct

Communication is used when the client interacts with the iQdep framework directly. JD-

BC/ODBC is useful while interfacing with external applications such as reporting/visu-

alizations tools. The iQdep framework can also be extended to provide additional client

communication protocols. Currently, iQdep only supports a Direct connection.

Query Processor: When the client submits the SQL query, query processor receives

the query and performs the following tasks. First, query parser performs lexical analysis

and parsing to build a query Parse Tree (PT). From the PT, an Abstract Syntax Tree

(AST) is constructed. Next, query optimizer accepts an AST and produces an opti-

mized Query Plan. The optimized Query Plan is mapped to a Physical Plan using the

Topology manager. Finally, the Plan Executor deploys the Physical Plan with the least

cost across the IoT devices.

Storage: Several operators in iQdep Query Plan request data from the storage de-

vices. Upon receiving a request, the storage component acquires a lock on requested

data to ensure consistency. Finally, the transactions performed by the buffer manager

will be logged to an external device for fault tolerance. The iQdep does not have a

well-defined storage manager implemented which will be addressed in future work.

Page 13: iQdep: Integrated Query, Data and Event Processing...ow graph is modelled using several operators for data processing. Some operators include pointwise, cross product and fork. Using

9

Figure 3.2: iQdep Components

Shared Services: During query execution, most of the services mentioned in the

Shared Services will be utilized. Memory Manager provides memory allocation and

de-allocation APIs during the lifetime of the query.

3.3 Process Model

In order to handle concurrent requests from many users, a good process model needs

to be defined. The defined process model will have a significant impact on the overall

software architecture. Several process models have been studied in literature and some

of them are

• Process per Worker

• Process Pool or Thread Pool

• Thread per Worker

Page 14: iQdep: Integrated Query, Data and Event Processing...ow graph is modelled using several operators for data processing. Some operators include pointwise, cross product and fork. Using

10

Figure 3.3: Process Model

In iQdep framework, we use a variant of the Thread per Worker process model, called

CPU Pinned Thread per Worker. This model is also incorporated in MySQL[19] and

IBM DB2 [20]. We use Akka Actors[21] to pin a Thread to a CPU core. The Actor is

backed by a thread pool executor with a single thread within it. Furthermore, it has a

priority mailbox to handle different prioritized messages. The communication between

threads happen via asynchronous message passing.

Figure 3.3 describes the CPU core Pinned thread per worker process model. The reason

why this process model has been selected is because its easy to implement since each

thread is mapped directly to an OS thread. Additionally, its easier to debug when we

have single thread per CPU core. Scaling the CPU pinned thread per worker might have

challenges especially on IoT devices due to the memory constraints, since each thread

consumes around 1MB of memory. We will be addressing this as part of the future work.

3.4 Distributed Architecture Model

There are several distributed architectural models studied in literature. Some of them

include

• Shared Memory

• Shared Nothing

• Shared Disk

• NUMA: Non uniform Memory access

Page 15: iQdep: Integrated Query, Data and Event Processing...ow graph is modelled using several operators for data processing. Some operators include pointwise, cross product and fork. Using

11

• Hybrid

Figure 3.4: Shared Nothing Architecture

The iQdep framework uses Shared Nothing architecture because it provides easy fault

tolerance management. The device failures are isolated to a single machine in shared

nothing architecture thus providing high availability. Finally, with proper replication

strategy, we can ensure the system is available at all times.

Figure 3.4 describes the shared nothing architecture, in which the hardware resources

such as RAM and Disk, are not shared across devices. Every device is responsible

for accessing its local data such as temperature and ambient light. The IoT devices

communicate using commodity networking components using asynchronous messages.

The CPU pinned threads accepts tasks and performs the required calculations and re-

turns the results back to the client.

Replication strategy is not available in the current version of the framework.

3.5 Query Processor

A query processor takes a iQdep SQL statement parses it, performs optimizations and

converts it into a Dataflow graph. The components of query processor was discussed

in section 3.1. Figure 3.5 represents the flow once the query is submitted to when the

query is deployed.

3.5.1 Query Parser

Given a SQL query, the query parser performs two tasks lexical analysis and parse tree

construction.

Figure 3.6 illustrates few lexer rules defined as part of the iQdep framework.

Page 16: iQdep: Integrated Query, Data and Event Processing...ow graph is modelled using several operators for data processing. Some operators include pointwise, cross product and fork. Using

12

Figure 3.5: Query Processor

Figure 3.6: Query Lexer Rules

The output of lexical analyzer, given a SQL query, is as shown Figure 3.7. The output

[@0, 0 : 5 =′ select′, < 6 >, 1 : 0] denotes that this is the first token which starts from

character position 0 and ends at position 5. The token text is ”select”. The token has

6 characters and is present on line 1.

After the lexical analysis stage the parse tree construction begins. Parser grammar is

written using CFG ( context free grammar ). Figure 3.8 illustrates few CFG rules.

The outcome of the parse tree construction for the below SQL query can be seen in

Figure 3.9

select col1, col2, col3 from (Event e1, Event e2, Event e3)

Page 17: iQdep: Integrated Query, Data and Event Processing...ow graph is modelled using several operators for data processing. Some operators include pointwise, cross product and fork. Using

13

Figure 3.7: Lexer Output

Figure 3.8: Parser Grammar Rules

Figure 3.9: Parse Tree

Currently iQdep supports only limited grammar rules and lexer rules. More rules need

to be added to support additional operators which will be addressed as part of future

work.

Page 18: iQdep: Integrated Query, Data and Event Processing...ow graph is modelled using several operators for data processing. Some operators include pointwise, cross product and fork. Using

14

3.5.2 Query Optimizer

Query Optimizer is responsible for generating the most efficient Logical Plan. There are

three ways to optimize the SQL query studied in the literature: Rule Based Optimizer,

Cost Based Optimizer and Adaptive Query Optimizer.

Rule Based Optimizer[22] [23]: applies several rules repeatedly on the Logical Plan (

Tree/DAG ) to generate new Optimized Logical Plan ( Tree/DAG ).

Cost Based Optimizer[24]: estimates the cost associated with accessing disk, CPU and

network for several Logical Plans and outputs the best Logical plan amongst them.

Adaptive Query Optimizer[25][26]: When execution plan changes mid-query the need

for adaptive query optimizer becomes evident. This will become significant especially

when we are handling continuous queries.

In the iQdep framework, we chose to incorporate a Rule Based Optimizer. It is easy to

implement since each rule will not take more than few lines of code. It provides easy

extension since users can extend the available rules set to add their own.

Now, we’ll be discussing the different rules available in the iQdep query optimizer and

how the rules affect the Logical plan using examples.

(1) Break up predicates: Composition of predicates is split into simple predicates as

shown Figure 3.10.

Figure 3.10: Breakdown Predicates

(2) Pushdown predicates: is mostly used to minimize the network traffic by filtering

records at the source. Predicate push down can be seen in action in the below Figure

3.11.

Page 19: iQdep: Integrated Query, Data and Event Processing...ow graph is modelled using several operators for data processing. Some operators include pointwise, cross product and fork. Using

15

Figure 3.11: Pushdown Predicates

(3) Pushdown Projection: is also used to minimize the network traffic by filtering records

at the source. Projection push down can be seen in action in the below Figure 3.12

Figure 3.12: Pushdown Projection

3.5.3 Query Planner and Deployment

In Query Planning and Deployment phase, an overlay network is formed. The forma-

tion of the overlay network occurs with the help of the Controller. Within the overlay

network, a leader is elected. The leader will forward the query to each of its children.

Forwarding of the query will happen based on the entry in the Flow table provided by

the controller. If a child has multiple parents, it’ll choose the parent with least cost,

and returns its value to that parent. Parent nodes will append/partially aggregate the

values obtained from children along with its own value, and return it to the leader.

Leader performs the final aggregation and returns the result to the client. Note, partial

aggregations are applicable only when the query is commutative and associative.

Page 20: iQdep: Integrated Query, Data and Event Processing...ow graph is modelled using several operators for data processing. Some operators include pointwise, cross product and fork. Using

16

Figure 3.13 demonstrates an example of a Deployed query on IoT devices. An overlay

network is formed with a leader elected as the root node. In this case, node ’A’ receives

the query, forwards the query to its children. Each child performs either forwarding or

returns the results based on whether its a leaf node.

Figure 3.13: Query Deployment Example

The parent nodes perform partial aggregation when the query is commutative and as-

sociative. Final aggregation is performed at the leader node. Average Operator can

perform partial aggregation by reducing the query to return (sum, count). Leader

performs the division of (Total Sum / Total Count )

3.6 Programming Abstraction

The iQdep framework allows the users to write a Dataflow graph using API abstractions.

iQdep provides APIs similar to Dryad and Apache Spark.

Figure 3.14: Operators

Page 21: iQdep: Integrated Query, Data and Event Processing...ow graph is modelled using several operators for data processing. Some operators include pointwise, cross product and fork. Using

17

Figure 3.15: Operators Contd

iQdep APIs can be classified into two categories: Transformations and Actions. Trans-

formations and Actions are listed in Figure 3.14 and 3.15. The dotted circles are

Actions.

3.6.1 Map

Map is a transformation which accepts a Function as an argument. Function takes

an argument of type T and returns a type R. Map allows us to convert, for example,

temperature from one unit to another.

3.6.2 Filter

Filter is a transformation which accepts a Predicate as an argument. Predicate Lambda

expression takes an argument of type T and returns a boolean. The output type of the

Filter transform will remain the same since it doesn’t modify the input type. Filter

allows us to remove unwanted records. For example, if we want to remove temperatures

> 30F, filter can be applied.

There are several other operators available as part of iQdep framework. Please look at

the github repository for additional details.

3.7 Continuous Queries

Continuous queries (CQs) are long repeatedly running queries against new data to pro-

duce new results. The data might be coming from sensors or persistent storage. Since the

Page 22: iQdep: Integrated Query, Data and Event Processing...ow graph is modelled using several operators for data processing. Some operators include pointwise, cross product and fork. Using

18

data might be bounded or unbounded, the programming abstractions discussed above

will not suffice. Operators such as Joins and Sort will not complete. The operators

which require entire dataset are called ”Blocking” operators. In order to provide ”Non

Blocking” operators, the concept of window is required.

3.7.1 Window

A window represents a snapshot of a finite portion of a stream at any time point.

Operators can compute their functions over the window of data. There are two types of

windows: Time based and Tuple based. Time based windows as defined by

[ Window Size M time units, Window Slide N time units ]

. Similarly, the Tuple based windows are defined as follows:

[ Window Size M rows, Window Slide N rows ]

3.8 Event Processing

Event processing is fundamentally different than Stream processing. Traditionally, Event

processing and stream processing is studied separately but iQdep framework allows users

to combine Event Processing with Stream processing to model applications.

Event processing in general has three components: Event, Condition and Actions.

3.8.1 Events

Events can be either primitive or composite. Primitive events are simple events which the

framework can detect natively. Composite Events are composition of primitive events

using event operators.

iQdep framework allows users to compose events using the following operators:

• And ”∧”: is a binary operator which allows composition of events e1 and e2 as

follows (e1 ∧ e2).

• Or ”∨”: is a binary operator which allows composition of events e1 or e2 as follows

(e1 ∨ e2).

• Not ”¬”: is a unary operator which allows composition of a event as follows (¬e1).

Page 23: iQdep: Integrated Query, Data and Event Processing...ow graph is modelled using several operators for data processing. Some operators include pointwise, cross product and fork. Using

19

• Followed ”,”: defines the order in which events should occur. Composite event E

= (e1, e2 ∧ e3, e4) represents the order where e1 happens before e2 and e3 which

happens before e4.

Using iQdep programming APIs, composition of events can be achieved using the sytax

shown below:

select ∗ from (Event1 e1, Event2 e2 ∧ Event3 e3) where e1.temp = e2.temp

Page 24: iQdep: Integrated Query, Data and Event Processing...ow graph is modelled using several operators for data processing. Some operators include pointwise, cross product and fork. Using

20

3.9 Integrated Architecture

Stream Dataflow processing with continuous operators are useful when we need to per-

form data transformations. But when the usecase demands Temporal correlation of

events it is difficult to model using Stream Dataflow processing. Temporal Correla-

tion of events is really important for IoT applications. Hence we have Complex Event

Processors. Figure 3.4 shows an integrated architecture for Streaming Dataflow and

Triggers.

In Figure 3.16, there are three stages: Streaming Dataflow, Event Generation, and

Triggers and Rules Manager.

First Stage involves data transformations/ aggregations of streaming data coming from

either external or internal sources. We use programming abstractions defined in Section

3.2, 3.3 to perform data transformations/aggregations. The result of the streaming

dataflow are sent to the Event Generation Stage ( Stage 2 ).

In Stage 2, the output results are converted to Events. A single Event Generation source

could output multiple events. Each event generated is sent to the event buffer in Stage

3.

In Stage 3, each event stored in the event buffer is sent to multiple event graphs. The

Event graphs are modelled using the Event operators defines in Section 3.4. Each event

graph has interval conditions and also has associated rules. If each rule and event graph

conditions are met then certain actions are taken.

Figure 3.16: Integrated Architecture

Page 25: iQdep: Integrated Query, Data and Event Processing...ow graph is modelled using several operators for data processing. Some operators include pointwise, cross product and fork. Using

21

3.10 Examples

3.10.1 Fire Alarm

Given:

• Temperature Sensor collects information for every 5 seconds.

• Smoke Sensor collects information for every 5 seconds.

Requirement: If temperature increases by 20% in 30 seconds window and smoke in-

creases by 10% in 30 seconds window and both occur at the same location and within 1

minute duration then trigger fire alarm and notify user and authority.

Modelling:

E1 = temperature increases by 20% in 30 seconds window

E2 = smoke increases by 10% in 30 seconds window

E = E1 ∧ E2 where one event occurred at time t1 and other occurs at time t2. And

t2 − t1 <= 1 minute.

Page 26: iQdep: Integrated Query, Data and Event Processing...ow graph is modelled using several operators for data processing. Some operators include pointwise, cross product and fork. Using

Chapter 4

Future Work And Conclusion

Optimizations for complex event processing needs further research. Currently, iQdep

discusses about spatial operators in terms of k-nearest neighbors and radius for IoT

devices; but higher level abstractions for spatial operators are required. Spatial Meta-

data management will also be part of future work. Further investigation is required on

cyclic Dataflow operators which enables low latency querying.

22

Page 27: iQdep: Integrated Query, Data and Event Processing...ow graph is modelled using several operators for data processing. Some operators include pointwise, cross product and fork. Using

Bibliography

[1] Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly. Dryad:

Distributed data-parallel programs from sequential building blocks. In Proceedings

of the 2Nd ACM SIGOPS/EuroSys European Conference on Computer Systems

2007, EuroSys ’07, pages 59–72, New York, NY, USA, 2007. ACM. ISBN 978-1-

59593-636-3. doi: 10.1145/1272996.1273005. URL http://doi.acm.org/10.1145/

1272996.1273005.

[2] Matei Zaharia, Mosharaf Chowdhury, Michael J Franklin, Scott Shenker, and Ion

Stoica. Spark - cluster computing with working sets. In Proceedings of the 2nd

USENIX conference on Hot topics in cloud computing, volume 10, page 10, 2010.

[3] Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma,

Murphy McCauley, Michael J. Franklin, Scott Shenker, and Ion Stoica. Resilient

Distributed Datasets: A Fault-tolerant Abstraction for In-memory Cluster Com-

puting. In Proceedings of the 9th USENIX Conference on Networked Systems De-

sign and Implementation, NSDI’12, pages 2–2, Berkeley, CA, USA, 2012. USENIX

Association. URL http://dl.acm.org/citation.cfm?id=2228298.2228301.

[4] Michael Armbrust, Reynold S. Xin, Cheng Lian, Yin Huai, Davies Liu, Joseph K.

Bradley, Xiangrui Meng, Tomer Kaftan, Michael J. Franklin, Ali Ghodsi, and Matei

Zaharia. Spark SQL - Relational Data Processing in Spark. In Proceedings of the

2015 ACM SIGMOD International Conference on Management of Data, SIGMOD

’15, pages 1383–1394, New York, NY, USA, 2015. ACM. ISBN 978-1-4503-2758-

9. doi: 10.1145/2723372.2742797. URL http://doi.acm.org/10.1145/2723372.

2742797.

[5] SparkWebsite. Spark - Cluster Overview, September 2015. URL http://spark.

apache.org/docs/latest/cluster-overview.html.

[6] Rebecca Isaacs Michael Isard Paul Barham Martin Abadi Derek Murray,

Frank McSherry. Naiad: A timely dataflow system. In Proceedings of the

23

Page 28: iQdep: Integrated Query, Data and Event Processing...ow graph is modelled using several operators for data processing. Some operators include pointwise, cross product and fork. Using

Bibliography 24

24th ACM Symposium on Operating Systems Principles (SOSP). ACM, Novem-

ber 2013. URL https://www.microsoft.com/en-us/research/publication/

naiad-a-timely-dataflow-system-2/.

[7] Derek Murray Tom Rodeheffer Martin Abadi, Frank McSherry. Formal

analysis of a distributed algorithm for tracking progress. In FMOODS-

FORTE’13: 15th Formal Methods for Open Object-Based Distributed Systems

and 33nd Formal Techniques for Networked and Distributed Systems. Springer,

June 2013. URL https://www.microsoft.com/en-us/research/publication/

formal-analysis-of-a-distributed-algorithm-for-tracking-progress/.

[8] Tom Rodeheffer. The naiad clock protocol: Specification, model

checking, and correctness proof. Technical report, February 2013.

URL https://www.microsoft.com/en-us/research/publication/

the-naiad-clock-protocol-specification-model-checking-and-correctness-proof/.

[9] Rebecca Isaacs Michael Isard Frank McSherry, Derek Murray. Differential dataflow.

In Proceedings of CIDR 2013, January 2013. URL https://www.microsoft.com/

en-us/research/publication/differential-dataflow/.

[10] Michael Isard Derek Murray Frank McSherry, Rebecca Isaacs. Composable incre-

mental and iterative data-parallel computation with naiad. Technical report, Oc-

tober 2012. URL https://www.microsoft.com/en-us/research/publication/

composable-incremental-and-iterative-data-parallel-computation-with-naiad/.

[11] Karen Henricksen and Ricky Robinson. A survey of middleware for sensor networks:

State-of-the-art and future directions. In Proceedings of the International Workshop

on Middleware for Sensor Networks, MidSens ’06, pages 60–65, New York, NY,

USA, 2006. ACM. ISBN 1-59593-424-3. doi: 10.1145/1176866.1176877. URL http:

//doi.acm.org/10.1145/1176866.1176877.

[12] Samuel R. Madden, Michael J. Franklin, Joseph M. Hellerstein, and Wei Hong.

Tinydb: an acquisitional query processing system for sensor networks. ACM Trans-

actions on Database Systems, 30(1):122–173, March 2005.

[13] Philippe Bonnet, Johannes Gehrke, and Praveen Seshadri. Towards sensor database

systems. In Proceedings of the Second International Conference on Mobile Data

Management, pages 3–14, 2001.

[14] Ming Li, Deepak Ganesan, and Prashant Shenoy. Presto: Feedback-driven data

management in sensor networks. IEEE/ACM Trans. Netw., 17(4):1256–1269, Au-

gust 2009. ISSN 1063-6692. doi: 10.1109/TNET.2008.2006818. URL http:

//dx.doi.org/10.1109/TNET.2008.2006818.

Page 29: iQdep: Integrated Query, Data and Event Processing...ow graph is modelled using several operators for data processing. Some operators include pointwise, cross product and fork. Using

Bibliography 25

[15] Lars Brenna, Alan Demers, Johannes Gehrke, Mingsheng Hong, Joel Ossher,

Biswanath Panda, Mirek Riedewald, Mohit Thatte, and Walker White. Cayuga:

A high-performance event processing engine. In Proceedings of the 2007 ACM

SIGMOD International Conference on Management of Data, SIGMOD ’07, pages

1100–1102, New York, NY, USA, 2007. ACM. ISBN 978-1-59593-686-8. doi: 10.

1145/1247480.1247620. URL http://doi.acm.org/10.1145/1247480.1247620.

[16] Daniel Gyllstrom, Eugene Wu, Hee-Jin Chae, Yanlei Diao, Patrick Stahlberg,

and Gordon Anderson. Sase: Complex event processing over streams. CoRR,

abs/cs/0612128, 2006. URL http://dblp.uni-trier.de/db/journals/corr/

corr0612.html#abs-cs-0612128.

[17] Sharma Chakravarthy and Deepak Mishra. Snoop: An expressive event specification

language for active databases, 1993.

[18] Eugene Wu, Yanlei Diao, and Shariq Rizvi. High-performance complex event pro-

cessing over streams. In Proceedings of the 2006 ACM SIGMOD International

Conference on Management of Data, SIGMOD ’06, pages 407–418, New York,

NY, USA, 2006. ACM. ISBN 1-59593-434-0. doi: 10.1145/1142473.1142520. URL

http://doi.acm.org/10.1145/1142473.1142520.

[19] Michael Widenius and Davis Axmark. Mysql Reference Manual. O’Reilly & Asso-

ciates, Inc., Sebastopol, CA, USA, 1st edition, 2002. ISBN 0596002653.

[20] Chaitanya Baru, Gilles Fecteau, Ambuj Goyal, Hui-I Hsiao, Anant Jhingran, Sriram

Padmanabhan, Walter Wilson, and Ambuj Goyal Hui i Hsiao. Db2 parallel edition,

1995.

[21] Derek Wyatt. Akka Concurrency. Artima Incorporation, USA, 2013. ISBN

0981531660, 9780981531663.

[22] Goetz Graefe and William J. McKenna. The volcano optimizer generator: Exten-

sibility and efficient search. In Proceedings of the Ninth International Conference

on Data Engineering, pages 209–218, Washington, DC, USA, 1993. IEEE Com-

puter Society. ISBN 0-8186-3570-3. URL http://dl.acm.org/citation.cfm?id=

645478.757691.

[23] Goetz Graefe. The cascades framework for query optimization. Data Engineering

Bulletin, 18, 1995.

[24] P. Griffiths Selinger, M. M. Astrahan, D. D. Chamberlin, R. A. Lorie, and T. G.

Price. Access path selection in a relational database management system. In Pro-

ceedings of the 1979 ACM SIGMOD International Conference on Management of

Page 30: iQdep: Integrated Query, Data and Event Processing...ow graph is modelled using several operators for data processing. Some operators include pointwise, cross product and fork. Using

Bibliography 26

Data, SIGMOD ’79, pages 23–34, New York, NY, USA, 1979. ACM. ISBN 0-89791-

001-X. doi: 10.1145/582095.582099. URL http://doi.acm.org/10.1145/582095.

582099.

[25] Ron Avnur and Joseph M. Hellerstein. Eddies: Continuously adaptive query

processing. In Proceedings of the 2000 ACM SIGMOD International Confer-

ence on Management of Data, SIGMOD ’00, pages 261–272, New York, NY,

USA, 2000. ACM. ISBN 1-58113-217-4. doi: 10.1145/342009.335420. URL

http://doi.acm.org/10.1145/342009.335420.

[26] Volker Markl, Vijayshankar Raman, David Simmen, Guy Lohman, Hamid Pirahesh,

and Miso Cilimdzic. Robust query processing through progressive optimization. In

Proceedings of the 2004 ACM SIGMOD International Conference on Management

of Data, SIGMOD ’04, pages 659–670, New York, NY, USA, 2004. ACM. ISBN 1-

58113-859-8. doi: 10.1145/1007568.1007642. URL http://doi.acm.org/10.1145/

1007568.1007642.