Monitoring Streams- A New Class of Data Management Applications Presented by Qing Cao at CS@UVA

Upload: gerald-goodwin

Post on 27-Dec-2015


Monitoring Streams- A New Class of Data Management Applications

Presented by Qing Cao at CS@UVA


Table of contents

Introduction
Aurora System Model
Aurora Optimization
Real-Time Operation Details
Critique
Conclusion
Discussion throughout the talk


Introduction Scenario

RFID-tagged components

Equipped with various sensors: RPM, temperature, pressure, oil status, …

Pressure Sensor

Brightness Sensor

User ID and Status


Auto Service Database

4G Wireless Network

Service center

GPS

Repair center

Home-visit service

Notify instead of query


Scenario Summary

Data streams rather than static data
Paradigm shift from HADP (Human-Active, DBMS-Passive) to DAHP (DBMS-Active, Human-Passive)
Can a traditional database handle this kind of scenario? According to the authors, NO!
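The shift from querying to notifying can be illustrated with a minimal sketch. This is not Aurora code: the `StreamMonitor` class, its method names, and the temperature threshold are all hypothetical, chosen only to show the system evaluating registered conditions on each arriving reading and pushing results to a passive consumer.

```python
from typing import Callable, Dict, List, Tuple

Reading = Dict[str, float]


class StreamMonitor:
    """Hypothetical DBMS-active component: checks every arriving reading
    against registered conditions and notifies the subscriber on a match."""

    def __init__(self) -> None:
        self._subscriptions: List[
            Tuple[Callable[[Reading], bool], Callable[[Reading], None]]
        ] = []

    def register(
        self,
        condition: Callable[[Reading], bool],
        notify: Callable[[Reading], None],
    ) -> None:
        # The human (or application) states its interest once, up front.
        self._subscriptions.append((condition, notify))

    def push(self, reading: Reading) -> None:
        # Data arrival, not a user query, drives the evaluation.
        for condition, notify in self._subscriptions:
            if condition(reading):
                notify(reading)


alerts: List[Reading] = []
monitor = StreamMonitor()
# Assumed example condition: engine temperature above 110 degrees.
monitor.register(lambda r: r["temperature"] > 110.0, alerts.append)

for reading in [{"temperature": 90.0}, {"temperature": 120.0}, {"temperature": 95.0}]:
    monitor.push(reading)

print(alerts)
```

Only the second reading satisfies the condition, so it is the only one delivered to the subscriber.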


Comparison

                            Monitoring Application        Traditional DBMS
Typical model               Data active, human passive    Data passive, human active
Managing history of values  Required                      Very hard or inefficient
Approximate query result    Required                      Not supported
Real-time requirement       Required                      Not supported


So…

Quote: The primary goal of the Aurora project is to build a single infrastructure that can efficiently and seamlessly meet the requirements of such demanding applications. To this end, we are currently critically rethinking many existing data management and processing issues, as well as developing new proactive data processing concepts and techniques.


Implementation - trigger

[Diagram labels: Data Stream, Data Submitter, Messaging Systems, DBMS, Query register, Output???]

Challenges:
Trigger: triggers are not scalable
Data stream: not processed in real time
Update query: millions of updates arrive in short bursts
Query management: third parties often request new or updated triggers and queries
History of values: no scalable way to support the latest location of the car
Optimization: is massive optimization helpful during high load?
QoS: cannot ensure service for premium customers


Implementation - middleware

[Diagram labels: Data Stream, Data Submitter, Messaging Systems, DBMS, Query register, query, Query Processor, Output???]

Challenges:
QoS: cannot ensure service for premium customers
Query management: has to use a new query language
Data stream: sometimes lost or delivered late
History of values: no scalable way to find the latest location of the car
Optimization: cannot benefit from query optimization
Update query: millions of updates arrive in short bursts
Resource usage: are we using the system efficiently?


Implementation - Aurora

[Diagram labels: Data Stream, Data Submitter, Messaging Systems, DBMS, Query register, query, Query Processor, Output]

Challenges and Aurora's answers:
Data stream (sometimes lost or delivered late): new stream processing architecture
Update queries (millions of updates in short bursts): new stream processing architecture
History of values (no scalable way to find the latest location of the car): new stream processing architecture
Optimization (cannot benefit from query optimization): run-time optimization
Query management (has to use a new query language): intuitive stream algebra and GUI
QoS (cannot ensure service for premium customers): specified by the application administrator, plus load shedding
Resource usage (are we using the system efficiently?): train scheduling and feedback from/to QoS


System model of Aurora

External data source

User application

Operator boxes

Data flow

Continuous & ad hoc queries

Historical storage

Aurora system

QoS spec

Query spec

Application administrator


Implementation - Aurora

[Architecture diagram labels: inputs, outputs, Data Stream, Output, Router, Buffer manager, Storage Manager, Persistent Store, queues Q1 … Qm and Q1 … Qn, Scheduler, Box Processors (σ, μ), Load Shedder, QoS Monitor, Catalog]


Aurora Query Semantics

Traditional: Structured Query Language
  Declarative queries on static data

Aurora: data flow model for data streams
  The application manager constructs queries using a GUI
  Stream Query Algebra (SQuAl): queries are processed by SQuAl operators on the data stream
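The contrast with a declarative query can be made concrete with a small data-flow sketch. It is an illustration only, not Aurora's implementation or the SQuAl syntax: `filter_box`, `map_box`, `connect`, and the RPM threshold are hypothetical names and values, and Python generators stand in for the arcs between boxes.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Tup = Dict[str, object]
Box = Callable[[Iterable[Tup]], Iterator[Tup]]


def filter_box(predicate: Callable[[Tup], bool]) -> Box:
    """Build a box that passes on only the tuples satisfying the predicate."""
    def box(stream: Iterable[Tup]) -> Iterator[Tup]:
        return (t for t in stream if predicate(t))
    return box


def map_box(fn: Callable[[Tup], Tup]) -> Box:
    """Build a box that transforms every tuple with fn."""
    def box(stream: Iterable[Tup]) -> Iterator[Tup]:
        return (fn(t) for t in stream)
    return box


def connect(*boxes: Box) -> Box:
    """Chain boxes so that the output of each one feeds the next."""
    def network(stream: Iterable[Tup]) -> Iterator[Tup]:
        current: Iterable[Tup] = stream
        for box in boxes:
            current = box(current)
        return iter(current)
    return network


readings: List[Tup] = [
    {"car": 1, "rpm": 2500},
    {"car": 2, "rpm": 7200},
    {"car": 1, "rpm": 6900},
]

# Assumed example query: flag any reading above 6500 RPM.
network = connect(
    filter_box(lambda t: t["rpm"] > 6500),
    map_box(lambda t: {"car": t["car"], "alert": "high rpm"}),
)
result = list(network(readings))
print(result)
```

The query is specified by wiring boxes together rather than by writing a statement over stored tables, which is the style the GUI exposes.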


Operators Discussion

Slide
Tumble
Latch
Resample
Filter
Drop
Map
GroupBy
Map + GroupBy = Case
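To give a feel for the windowed operators, here is a simplified sketch of the difference between tumbling and sliding windows. It assumes fixed-size, count-based windows over a finite list, which is a deliberate simplification and not the operator definitions from the Aurora papers; the function names and sample values are hypothetical.

```python
from typing import Iterator, List, Sequence


def tumble(values: Sequence[float], size: int) -> Iterator[List[float]]:
    """Yield disjoint, consecutive windows of `size` values.
    A trailing partial window is dropped in this sketch."""
    for start in range(0, len(values) - size + 1, size):
        yield list(values[start:start + size])


def slide(values: Sequence[float], size: int, step: int = 1) -> Iterator[List[float]]:
    """Yield overlapping windows of `size` values, advancing by `step`."""
    for start in range(0, len(values) - size + 1, step):
        yield list(values[start:start + size])


temps = [70, 72, 75, 90, 91, 73]

tumble_windows = list(tumble(temps, 3))
slide_max = [max(window) for window in slide(temps, 3)]

print(tumble_windows)
print(slide_max)
```

With tumbling windows every value is aggregated exactly once, while with sliding windows each value contributes to several overlapping aggregates.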


Query model

[Query network diagram labels: boxes b1 to b9; two applications (app); three QoS specs; paths labeled continuous query, view, and ad-hoc query; a connection point; storage]


Optimization

Dynamic continuous query optimization
  Inserting projections
  Combining boxes
  Reordering boxes

Ad hoc query optimization
  1st stage: replace implementation (Filter/Join)
  2nd stage: same as continuous query
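Reordering boxes can be sketched with a common cost-based heuristic. The sketch assumes that the boxes commute (for example, independent filters) and that each has a known per-tuple cost and selectivity; the `Box` class, the sample numbers, and the ranking by (1 - selectivity) / cost are illustrative assumptions, not a transcription of Aurora's optimizer.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Box:
    name: str
    cost: float         # assumed processing cost per input tuple
    selectivity: float  # assumed fraction of input tuples passed downstream


def expected_cost(order: Sequence[Box]) -> float:
    """Expected processing cost per source tuple for a chain of boxes."""
    total, surviving = 0.0, 1.0
    for box in order:
        total += surviving * box.cost
        surviving *= box.selectivity
    return total


def reorder(boxes: Sequence[Box]) -> List[Box]:
    """Run cheap, highly selective boxes first (only valid if boxes commute)."""
    return sorted(boxes, key=lambda b: (1.0 - b.selectivity) / b.cost, reverse=True)


boxes = [
    Box("expensive_loose", cost=4.0, selectivity=0.9),
    Box("cheap_tight", cost=1.0, selectivity=0.1),
]
best = reorder(boxes)

print([b.name for b in best])
print(expected_cost(boxes), expected_cost(best))
```

Placing the cheap, selective box first means the expensive box sees only a tenth of the tuples, which lowers the expected cost per source tuple.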


Run-Time Operation

QoS Data Structure
Storage Management
Real-Time Scheduling
Load Shedding
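Load shedding, the last item above, can be sketched as follows. This is a simplified assumption-laden illustration: it summarizes each output's QoS as a single hypothetical "utility loss per dropped tuple" number and sheds from the least sensitive outputs first. The function name, the output names, and all numbers are invented for the example and do not reproduce Aurora's load shedder.

```python
from typing import Dict


def shed_load(
    load: Dict[str, int],
    loss_per_tuple: Dict[str, float],
    capacity: int,
) -> Dict[str, int]:
    """Return how many tuples to drop per output so total load fits capacity.
    Outputs with the lowest assumed QoS loss per tuple are shed first."""
    excess = sum(load.values()) - capacity
    drops = {name: 0 for name in load}
    for name in sorted(load, key=lambda n: loss_per_tuple[n]):
        if excess <= 0:
            break
        drops[name] = min(load[name], excess)
        excess -= drops[name]
    return drops


# Hypothetical pending tuples per output and QoS sensitivity of each output.
load = {"premium": 60, "standard": 50, "free": 40}
loss = {"premium": 1.0, "standard": 0.4, "free": 0.1}

drops = shed_load(load, loss, capacity=100)
print(drops)
```

In this example the 50 excess tuples are taken from the free and standard outputs, so the premium output keeps all of its tuples.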


Whole Structure Revisited

[Architecture diagram labels: inputs, outputs, Data Stream, Output, Router, Buffer manager, Storage Manager, Persistent Store, queues Q1 … Qm and Q1 … Qn, Scheduler, Box Processors (σ, μ), Load Shedder, QoS Monitor, Catalog]


Aurora from Above

[Diagram: three outputs, each labeled App with an attached QoS specification]


Run-Time Operation
Scheduling: Minimizing Per-Tuple Processing Overhead

Train scheduling, for boxes A → B → … and input tuples x, y, z (each listed item is one scheduler action):

No trains: A(x), A(y), A(z), B(A(x)), B(A(y)), B(A(z))
Box trains: B(A(x)), B(A(y)), B(A(z))
Tuple trains: A(z, y, x), B(A(z), A(y), A(x))
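The effect of trains on scheduling overhead can be sketched by counting scheduler actions for the three strategies on this slide. The functions below are a toy model, not Aurora's scheduler: boxes are plain Python functions on integers, and the function names and sample boxes are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Box = Callable[[int], int]


def no_trains(boxes: Sequence[Box], tuples: Sequence[int]) -> Tuple[List[int], int]:
    """One scheduler action per (box, tuple) pair."""
    actions, current = 0, list(tuples)
    for box in boxes:
        nxt = []
        for t in current:
            nxt.append(box(t))
            actions += 1
        current = nxt
    return current, actions


def box_trains(boxes: Sequence[Box], tuples: Sequence[int]) -> Tuple[List[int], int]:
    """One scheduler action pushes a single tuple through the whole chain of boxes."""
    actions, out = 0, []
    for t in tuples:
        for box in boxes:
            t = box(t)
        out.append(t)
        actions += 1
    return out, actions


def tuple_trains(boxes: Sequence[Box], tuples: Sequence[int]) -> Tuple[List[int], int]:
    """One scheduler action runs a single box over the whole batch of tuples."""
    actions, current = 0, list(tuples)
    for box in boxes:
        current = [box(t) for t in current]
        actions += 1
    return current, actions


chain: List[Box] = [lambda v: v + 1, lambda v: v * 2]  # stand-ins for boxes A and B
inputs = [1, 2, 3]                                      # stand-ins for x, y, z

for strategy in (no_trains, box_trains, tuple_trains):
    print(strategy.__name__, strategy(chain, inputs))
```

All three strategies produce the same outputs; they differ only in how many times the scheduler has to be invoked (6, 3, and 2 actions here).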


Performance

[Plot: latency in seconds (y-axis, 0 to 70) versus capacity (x-axis, 0.1 to 1) for two series: Box-at-a-time with Tuple-Trains, and Superbox]


Discussion

Solution approach
  Rethink everything in light of the requirements

Query model
  Data-flow-style query specification and QoS

Optimization
  Dynamic run-time optimization
  Train scheduling
  QoS-specification-based resource management


Discussion

Can it work in a distributed manner?
  Aurora project

What is the final result?
  After an intensive search through the tens of papers published on this subject, I finally found what was implemented:


The Final Result

The Aurora stream-processing engine. Aurora is currently operational. It consists of some 100K lines of C++ and Java and runs on both Unix- and Linux-based platforms.


Graphical Interface


GUI for an Example


Critique

The overall approach lacks novelty; for example, the stream operators are ad hoc.

The overall result is not impressive. The project output is no more than a toy Java program. The published papers lack originality and depth, and they overlap too much.


Conclusion

Aurora is a large project that aims to design a stream-query-based engine.
Various new approaches are presented.
No comparison results were found in any paper.
What do you think?


Extra on Aurora

Aurora is the Latin word for "dawn".
A polar light (caused by solar wind and seen near the poles).
The collective noun for a group of polar bears.
Several aircraft.
Several vessels.
Several companies.
In space:
  An asteroid, discovered by J. C. Watson on September 6, 1867.
  The Aurora Programme, a strategy of the European Space Agency.
In fiction:
  A superhero in the Marvel Universe.
  One of the Spacer worlds in Isaac Asimov's fiction.
One of at least four distinct music groups: a UK house group, also known as Aurora UK; a California-based ambient group; a contemporary Christian R&B group; a Mexican Latin music band.
The name of the game engine that runs Neverwinter Nights; the toolset is called the Aurora toolset because of this.
AND the Aurora system as presented today.


THANK YOU!