Monitoring Streams - A New Class of Data Management Applications
Presented by Qing Cao at CS@UVA
2/30
Table of contents
Introduction
Aurora System Model
Aurora Optimization
Real-Time Operation Details
Critique
Conclusion
Discussion throughout the talk
3/30
Introduction Scenario
RFID-tagged components
Armed with various sensors: RPM, temperature, pressure, oil status, …
Pressure sensor
Brightness sensor
User ID and status
4/30
Auto Service Database
4G wireless network
Service center
GPS
Repair center
Home-visit service
Notify instead of query
5/30
Scenario Summary
Data streams rather than static data
Paradigm shift from HADP (Human-Active, DBMS-Passive) to DAHP (DBMS-Active, Human-Passive)
Can a traditional database handle this kind of scenario? According to the authors, NO!
6/30
Comparison
Aspect                        Monitoring application        Traditional DBMS
Typical model                 Data active, human passive    Data passive, human active
Managing history of values    Required                      Very hard or inefficient
Approximate query results     Required                      Not supported
Real-time requirement         Required                      Not supported
7/30
So…
Quote: The primary goal of the Aurora project is to build a single infrastructure that can efficiently and seamlessly meet the requirements of such demanding applications. To this end, we are currently critically rethinking many existing data management and processing issues, as well as developing new proactive data processing concepts and techniques.
8/30
Implementation - trigger
[Diagram labels: Data stream, Data submitter, Messaging systems, Query register, DBMS, several IBM AS/400 servers, Output???]
Challenges:
Trigger: triggers are not scalable
Data stream: not handled in real time
Update query: millions of updates in a short burst
Query management: new triggers or queries, requested by third parties, are added often
History of values: no scalable way to support the latest location of the car
Optimization: is massive optimization helpful during high load?
QoS: cannot ensure service for premium customers
9/30
Implementation - middleware
[Diagram labels: Data stream, Data submitter, Messaging systems, Query register, query, Query processor, DBMS, several IBM AS/400 servers, Output???]
Challenges:
QoS: cannot ensure service for premium customers
Query management: has to use a new query language
Data stream: sometimes lost or delivered late
History of values: no scalable way to find the latest location of the car
Optimization: cannot benefit from query optimization
Update query: millions of updates in a short burst
Resource usage: are we using the system efficiently?
10/30
Implementation - Aurora
[Diagram labels: Data stream, Data submitter, Messaging systems, Query register, query, Query processor, DBMS, several IBM AS/400 servers, Output]
Challenges and Aurora's answers:
Data stream: sometimes lost or delivered late. Aurora: new stream processing architecture
Update queries: millions of updates in a short burst. Aurora: new stream processing architecture
History of values: no scalable way to find the latest location of the car. Aurora: new stream processing architecture
Optimization: cannot benefit from query optimization. Aurora: run-time optimization
Query management: has to use a new query language. Aurora: intuitive stream algebra and GUI
QoS: cannot ensure service for premium customers. Aurora: QoS specified by the application administrator, plus load shedding
Resource usage: are we using the system efficiently? Aurora: train scheduling and feedback from/to QoS
11/30
System model of Aurora
External data source
User application
Operator boxes
Data flow
Continuous & ad hoc queries
Historical storage
Aurora system
QoS spec
Query spec
Application administrator
12/30
Implementation - Aurora
[Architecture diagram labels: Data stream (inputs), Router, Storage Manager, Buffer manager, Persistent Store, queues Q1, Q2, …, Qm and Q1, Q2, …, Qn, Scheduler, Box Processors (σ, μ), Catalog, QoS Monitor, Load Shedder, Output (outputs)]
13/30
Aurora Query Semantics
Traditional: Structured Query Language, declarative queries on static data
Aurora: data flow model for data streams
The application manager constructs queries using a GUI
Stream Query Algebra (SQuAl): queries are processed by SQuAl operators on the data stream
14/30
Operators Discussion
Slide
Tumble
Latch
Resample
Filter
Drop
Map
GroupBy
MAP + GROUPBY = CASE
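To make a few of the operator names above concrete, here is a minimal Python sketch of Filter, Map, Tumble and Slide written as generators over a stream. It is an illustration only and not Aurora's SQuAl implementation: the function names, the count-based windows, and the sample RPM readings are all assumptions made for this example. Aurora's own window semantics are richer than this.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, List


def filter_box(stream: Iterable, pred: Callable) -> Iterator:
    """Filter: pass on only the tuples that satisfy the predicate."""
    return (t for t in stream if pred(t))


def map_box(stream: Iterable, fn: Callable) -> Iterator:
    """Map: apply a function to every tuple."""
    return (fn(t) for t in stream)


def tumble(stream: Iterable, size: int, agg: Callable) -> Iterator:
    """Tumble (simplified): aggregate consecutive, non-overlapping windows."""
    window: List = []
    for t in stream:
        window.append(t)
        if len(window) == size:
            yield agg(window)
            window = []  # the next window shares no tuples with this one


def slide(stream: Iterable, size: int, agg: Callable) -> Iterator:
    """Slide (simplified): aggregate the last `size` tuples, advancing by one."""
    window: deque = deque(maxlen=size)  # the oldest tuple falls out automatically
    for t in stream:
        window.append(t)
        if len(window) == size:
            yield agg(list(window))


# Hypothetical engine RPM readings arriving as a stream.
rpm = [900, 1100, 3000, 3200, 1000, 950]

tumble_avg = list(tumble(rpm, 2, lambda w: sum(w) / len(w)))
slide_max = list(slide(rpm, 3, max))
alerts = list(filter_box(map_box(rpm, lambda r: {"rpm": r}),
                         lambda t: t["rpm"] > 2500))

print(tumble_avg)  # [1000.0, 3100.0, 975.0]
print(slide_max)   # [3000, 3200, 3200, 3200]
print(alerts)      # [{'rpm': 3000}, {'rpm': 3200}]
```

Because each box consumes one stream and produces another, boxes compose by nesting, which is the data flow style the slides contrast with declarative SQL.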
15/30
Query model
[Diagram labels: boxes b1 to b9, two applications (app), three QoS specs, paths labeled continuous query, view, and ad-hoc query, Connection point, Storage]
16/30
Optimization
Dynamic continuous query optimization
  Inserting projections
  Combining boxes
  Reordering boxes
Ad hoc query optimization
  1st stage: replace implementation (Filter/Join)
  2nd stage: same as continuous query
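As an illustration of what "reordering boxes" can buy, the sketch below orders a chain of boxes by a simple per-tuple cost model. The slides do not give a cost model, so everything here is an assumption: each box has a per-tuple cost and a selectivity (the fraction of tuples it passes on), the boxes commute, and their selectivities are independent. The `Box` class and the example numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Box:
    name: str
    cost: float         # assumed per-tuple processing cost
    selectivity: float  # assumed fraction of input tuples passed on (0..1)


def chain_cost(boxes: Sequence[Box]) -> float:
    """Expected cost per input tuple of running the boxes in the given order."""
    total, surviving = 0.0, 1.0
    for b in boxes:
        total += surviving * b.cost   # only surviving tuples reach this box
        surviving *= b.selectivity
    return total


def reorder(boxes: Sequence[Box]) -> List[Box]:
    """Put boxes that discard the most tuples per unit of cost first."""
    return sorted(boxes, key=lambda b: (1.0 - b.selectivity) / b.cost,
                  reverse=True)


boxes = [Box("expensive_filter", cost=10.0, selectivity=0.5),
         Box("cheap_filter", cost=1.0, selectivity=0.1)]
best = reorder(boxes)

print(chain_cost(boxes))       # 10.5
print([b.name for b in best])  # ['cheap_filter', 'expensive_filter']
print(chain_cost(best))        # 2.0
```

For two boxes the ordering rule follows from comparing c1 + s1*c2 with c2 + s2*c1. In a running system the costs and selectivities would have to be measured, which is one reason to do this optimization dynamically.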
17/30
Run-Time Operation
QoS data structure
Storage management
Real-time scheduling
Load shedding
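The slides name load shedding without describing a policy. As a rough illustration of the idea, the sketch below drops tuples uniformly at random whenever the offered load exceeds capacity. This is an assumed, simplified policy chosen for the example and not Aurora's load shedder, which is driven by the QoS specifications. The names `drop_probability` and `shed` are hypothetical.

```python
import random
from typing import List, Sequence


def drop_probability(load: float, capacity: float) -> float:
    """Fraction of tuples to drop so that the remaining load fits the capacity."""
    if load <= capacity:
        return 0.0
    return 1.0 - capacity / load


def shed(tuples: Sequence, load: float, capacity: float,
         rng: random.Random) -> List:
    """Keep each tuple with probability 1 - p (uniform random shedding)."""
    p = drop_probability(load, capacity)
    return [t for t in tuples if rng.random() >= p]


rng = random.Random(0)  # fixed seed so that runs are repeatable
incoming = list(range(1000))

kept_normal = shed(incoming, load=50.0, capacity=100.0, rng=rng)
kept_overload = shed(incoming, load=200.0, capacity=100.0, rng=rng)

print(len(kept_normal))    # 1000: nothing is dropped below capacity
print(len(kept_overload))  # roughly half of the tuples survive
```

A QoS-aware shedder would choose which tuples to drop, and where in the network, so that the applications' utility suffers least. Uniform dropping ignores that choice entirely.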
18/30
Whole Structure Revisited
[Architecture diagram labels: Data stream (inputs), Router, Storage Manager, Buffer manager, Persistent Store, queues Q1, Q2, …, Qm and Q1, Q2, …, Qn, Scheduler, Box Processors (σ, μ), Catalog, QoS Monitor, Load Shedder, Output (outputs)]
19/30
Aurora from Above
[Diagram: several query networks, each ending in an application (App) with its own QoS]
20/30
Runtime Operation
Scheduling: Minimizing Per-Tuple Processing Overhead
Train scheduling (boxes A and B, input tuples x, y, z; the figure marks each scheduler action):
Without trains: A(x), A(y), A(z), B(A(x)), B(A(y)), B(A(z))
Box trains: B(A(x)), B(A(y)), B(A(z))
Tuple trains: A(z, y, x), B(A(z), A(y), A(x))
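The three schedules above can be mimicked with a toy scheduler that counts how often it is invoked. This is a sketch under stated assumptions and not Aurora's scheduler: the `Scheduler` class, the mapping of one `run` call to one scheduler action, and the two example boxes are all hypothetical and reflect one reading of the figure.

```python
from typing import Callable, List, Sequence


class Scheduler:
    """Toy scheduler: every call to run() counts as one scheduler action."""

    def __init__(self) -> None:
        self.actions = 0

    def run(self, box: Callable, tuples: Sequence) -> List:
        self.actions += 1
        return [box(t) for t in tuples]


def without_trains(s: Scheduler, boxes: Sequence[Callable], tuples: Sequence) -> List:
    """One action per box per tuple: A(x), A(y), A(z), B(A(x)), ..."""
    current = list(tuples)
    for box in boxes:
        nxt: List = []
        for t in current:
            nxt.extend(s.run(box, [t]))
        current = nxt
    return current


def box_trains(s: Scheduler, boxes: Sequence[Callable], tuples: Sequence) -> List:
    """One action pushes a single tuple through the whole chain of boxes."""
    def chain(t):
        for box in boxes:
            t = box(t)
        return t
    out: List = []
    for t in tuples:
        out.extend(s.run(chain, [t]))
    return out


def tuple_trains(s: Scheduler, boxes: Sequence[Callable], tuples: Sequence) -> List:
    """One action runs a box over a whole batch (train) of tuples."""
    current = list(tuples)
    for box in boxes:
        current = s.run(box, current)
    return current


A = lambda v: v + 1  # hypothetical box A
B = lambda v: v * 2  # hypothetical box B
xyz = [1, 2, 3]

counts = {}
results = {}
for name, strategy in [("without", without_trains),
                       ("box", box_trains),
                       ("tuple", tuple_trains)]:
    s = Scheduler()
    results[name] = strategy(s, [A, B], xyz)
    counts[name] = s.actions

print(results)  # every strategy yields [4, 6, 8]
print(counts)   # {'without': 6, 'box': 3, 'tuple': 2}
```

All three strategies compute the same outputs. What changes is the number of scheduling decisions, which is the per-tuple overhead the slide title refers to.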
21/30
Performance
[Plot: latency in seconds (0 to 70) versus capacity (0.1 to 1); series: Box-at-a-time with Tuple-Trains, Superbox]
22/30
Discussion
Solution approach: rethink everything in light of the requirements
Query model: data-flow-style query specification and QoS
Optimization
  Dynamic run-time optimization
  Train scheduling
  QoS-specification-based resource management
23/30
Discussion
Can it work in a distributed manner? Aurora project
What is the final result? After an intensive search through the tens of papers published on this subject, I finally found what was implemented:
24/30
The final Result
The Aurora stream-processing engine. Aurora is currently operational. It consists of some 100K lines of C++ and Java and runs on both Unix- and Linux-based platforms.
25/30
Graphical Interface
26/30
GUI for an Example
27/30
Critique
The overall approach lacks novelty, e.g. the stream operators are ad hoc.
The overall result is not impressive. The project output is no more than a toy Java program. The papers published lack originality and depth, and overlap too much.
28/30
Conclusion
Aurora is a large project that aims at designing a stream-query-based engine.
Various new approaches are presented.
No comparison results were found in any paper.
What do you think?
29/30
Extra on Aurora
Aurora is the Latin word for "dawn".
A polar light (caused by solar wind and seen near the poles).
The collective noun for a group of polar bears.
Several aircraft.
Several vessels.
Several companies.
In space:
  An asteroid, discovered by J. C. Watson on September 6, 1867.
  The Aurora Programme, a strategy of the European Space Agency.
In fiction:
  A superhero in the Marvel Universe.
  One of the Spacer worlds in Isaac Asimov's fiction.
One of at least four distinct music groups: a UK house group, also known as Aurora UK; a California-based ambient group; a contemporary Christian R&B group; a Mexican Latin music band.
The name of the game engine that runs Neverwinter Nights; the toolset is called the Aurora toolset because of this.
AND the Aurora system as presented today.
30/30
THANK YOU!