spChains: A Declarative Framework for Data Stream Processing in Pervasive Applications
Dario Bonino, Fulvio Corno
Politecnico di Torino Dip. Automatica e Informatica
Torino, Italy
http://elite.polito.it
The 3rd International Conference on Ambient Systems, Networks and Technologies
August 27-29, 2012, Niagara Falls, Ontario, Canada
Goals
Enable real-time ambient & sensor data processing
Allow AmI designers to easily specify required computations
Provide an extensible open source processing library
spChains ANT’2012, Niagara Falls, Canada 2
Outline
Motivation and Background
Stream processing
spChains Framework
Use cases
Conclusions
Motivation
Ambient Intelligence Systems
100’s or 1,000’s of sensors
Different physical quantities (°C, %H2O, kW, kWh, …)
Sampling frequencies from seconds to minutes
Huge stream of data being generated
Storage and retrieval
On-line processing
Off-line processing
Analytics
On-line processing: Applications
Data Decimation (from kHz to mHz)
Aggregation (over time, over space, over sensor types)
Averaging
Feeding User Displays and Dashboards
Computing up-to-date and user-meaningful information
Monitoring and Alerting
Checking Thresholds
Generating Alert messages
Virtual Sensors
Computing derivative quantities
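As an illustration of data decimation by averaging, here is a minimal Python sketch (not part of spChains). The function name `decimate`, the `(timestamp, value)` sample layout, and the alignment of windows to multiples of `period` are assumptions made for this example only.

```python
from statistics import mean
from typing import Iterable, Iterator, List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, value); assumed layout


def decimate(samples: Iterable[Sample], period: float) -> Iterator[Sample]:
    """Emit one averaged sample per `period` seconds of input.

    Samples must arrive in timestamp order. Each output carries the start
    time of its window and the mean of the values that fell inside it.
    Windows that receive no samples produce no output.
    """
    window_start: Optional[float] = None
    values: List[float] = []
    for t, v in samples:
        start = (t // period) * period  # window this sample belongs to
        if window_start is None:
            window_start = start
        if start != window_start:
            # The sample opens a new window: flush the previous one.
            yield (window_start, mean(values))
            window_start, values = start, []
        values.append(v)
    if values:  # flush the last, possibly partial, window
        yield (window_start, mean(values))


# Ten samples at 1 Hz reduced to one sample every 5 seconds.
raw = [(float(t), float(t)) for t in range(10)]
decimated = list(decimate(raw, period=5.0))
```

Here the ten input samples collapse into two outputs, one per 5-second window. The same pattern applies to aggregation over longer periods.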
Requirements
Input: up to 10,000-100,000 events/second
Data: real-valued quantities, explicit units of measure
Output: real-valued or Boolean, often at much lower frequency
Computation: custom-defined depending on the application requirements
Operators: reusable standard temporal operations applicable to data streams
Usability: should not require a database expert to define computations; domain experts must be autonomous
Technology scouting
Standard Relational DBMS
Good for storage
Not efficient for computations
Rely on central servers
NoSQL approaches
Great for storage
May do computations, require custom programming and expertise
Rely on central (or cloud) servers
Custom programming
Perfect fit with application requirements
Very expensive to customize
Stream Processing
No storage
Excellent for computations
Requires custom expertise
Stream Processing
(or Complex Event Processing, CEP)
Event processing: tracking and analyzing streams of data «events», and deriving a conclusion from them
Defines a set of (fixed) queries
Event streams are analyzed in real time (often with in-memory processing) according to the programmed queries
Guarantees fast and scalable processing
Increasingly adopted in different domains: Business Process Management, Recommender Systems, Financial Services, Time Series, …
Several tools available (commercial and open source)
Specific skills needed to write efficient queries, in tool-dependent languages
insert into RealEvent(src, streamName, value, unitOfMeasure)
  select "Average", "Average-out", avg(value) as value, unitOfMeasure
  from realEvent(streamName="M1").win:time_batch("1h")
  group by src, streamName, unitOfMeasure;

insert into BooleanEvent(src, streamName, booleanValue)
  select "Threshold", "Threshold-out" as streamName, true as value
  from pattern [every (
    oldSample=RealEvent(streamName="Average-out",
      MeasureEventComparator.compareToMeasure(oldSample, "1kW", EventComparisonEnum.LESS_THAN_OR_EQUAL))
    -> newSample=RealEvent(streamName=oldSample.streamName,
      MeasureEventComparator.compareToMeasure(newSample, "1kW", EventComparisonEnum.GREATER_THAN))
  )].win:length(2);
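To make the intent of the query above easier to follow, the sketch below reproduces roughly the same computation in plain Python: an hourly batch average followed by detection of the average rising above 1 kW. It is a simplification, not the Esper semantics. Windows are aligned to multiples of the window length and are flushed when the next sample arrives, whereas a stream engine schedules batches on its own clock. The names `batch_average` and `rising_edges` and the sample values are made up for illustration.

```python
from typing import Iterable, Iterator, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, power in kW); assumed layout


def batch_average(samples: Iterable[Sample], window_s: float) -> Iterator[Sample]:
    """Average samples over consecutive, non-overlapping windows."""
    current: Optional[int] = None
    total, count = 0.0, 0
    for t, v in samples:
        index = int(t // window_s)
        if current is not None and index != current:
            # A sample from a later window arrived: emit the finished batch.
            yield (current * window_s, total / count)
            total, count = 0.0, 0
        current = index
        total += v
        count += 1
    if count:
        yield (current * window_s, total / count)


def rising_edges(averages: Iterable[Sample], threshold: float) -> Iterator[float]:
    """Yield the timestamp of each value above `threshold` whose
    predecessor was at or below it."""
    previous: Optional[float] = None
    for t, v in averages:
        if previous is not None and previous <= threshold < v:
            yield t
        previous = v


HOUR = 3600.0
# Two readings per hour for four hours, in kW (made-up values).
readings = [
    (0.0, 0.4), (1800.0, 0.6),       # hour 0: average about 0.5
    (3600.0, 1.2), (5400.0, 1.6),    # hour 1: average about 1.4
    (7200.0, 1.1), (9000.0, 1.3),    # hour 2: average about 1.2
    (10800.0, 0.2), (12600.0, 0.4),  # hour 3: average about 0.3
]
alerts = list(rising_edges(batch_average(readings, HOUR), threshold=1.0))
```

Only the transition from hour 0 to hour 1 crosses the threshold upwards, so a single alert is produced for the window starting at 3600 s.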
Proposed approach (1)
Stream Processing for event data processing in real time
(Extensible) Library of predefined operators (spBlocks)
Declarative framework (spChains) to express the required computations
Each Computation = Stream Processing Chain
Chain = Sequence of Stream Processing Blocks
Block = predefined operator, configured with parameters
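The chain/block structure above can be sketched in a few lines of Python. This is a conceptual illustration under stated assumptions, not the spChains Java API: the `Block` and `Chain` classes, the `moving_average` and `above` operators, and their parameter names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, Iterator, List


def moving_average(stream: Iterable[float], size: int) -> Iterator[float]:
    """Hypothetical operator: mean of the last `size` values seen so far."""
    window: List[float] = []
    for v in stream:
        window.append(v)
        if len(window) > size:
            window.pop(0)
        yield sum(window) / len(window)


def above(stream: Iterable[float], threshold: float) -> Iterator[bool]:
    """Hypothetical operator: True whenever a value exceeds `threshold`."""
    for v in stream:
        yield v > threshold


@dataclass
class Block:
    """A predefined operator bound to its configuration parameters."""
    block_id: str
    operator: Callable[..., Iterator[Any]]
    params: Dict[str, Any]

    def process(self, stream: Iterable[Any]) -> Iterator[Any]:
        return self.operator(stream, **self.params)


@dataclass
class Chain:
    """A sequence of blocks; each block consumes the previous block's output."""
    blocks: List[Block]

    def process(self, stream: Iterable[Any]) -> Iterable[Any]:
        for block in self.blocks:
            stream = block.process(stream)
        return stream


chain = Chain([
    Block("Avg1", moving_average, {"size": 2}),
    Block("Th1", above, {"threshold": 1.0}),
])
result = list(chain.process([0.0, 1.0, 2.0, 3.0]))
```

Because each block only sees an input stream and its own parameters, new operators can be added to the library without touching existing chains.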
Proposed approach (2)
The set of spChains is described as a simple XML file
All chains are automatically mapped to Stream Processing queries
insert into RealEvent(src, streamName, value, unitOfMeasure)
  select "Average", "Average-out", avg(value) as value, unitOfMeasure
  from realEvent(streamName="M1").win:time_batch("1h")
  group by src, streamName, unitOfMeasure;

insert into BooleanEvent(src, streamName, booleanValue)
  select "Threshold", "Threshold-out" as streamName, true as value
  from pattern [every (
    oldSample=RealEvent(streamName="Average-out",
      MeasureEventComparator.compareToMeasure(oldSample, "1kW", EventComparisonEnum.LESS_THAN_OR_EQUAL))
    -> newSample=RealEvent(streamName=oldSample.streamName,
      MeasureEventComparator.compareToMeasure(newSample, "1kW", EventComparisonEnum.GREATER_THAN))
  )].win:length(2);
<spXML:blocks>
  <spXML:block id="Avg1" function="AVERAGE">
    <spXML:param name="window" value="1" unitOfMeasure="h"/>
    <spXML:param name="mode" value="batch"/>
  </spXML:block>
  <spXML:block id="Th1" function="THRESHOLD">
    <spXML:param name="threshold" value="1" unitOfMeasure="kW"/>
  </spXML:block>
</spXML:blocks>
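For readers who want to inspect such a definition programmatically, the following Python sketch reads the block list with the standard library's `xml.etree.ElementTree`. It is not the spChains loader. The slide does not show the `xmlns:spXML` declaration, so the namespace URI used here is a placeholder assumption, as are the function name `read_blocks` and the dictionary layout it returns.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI: the real one is not shown in the slides.
SPXML_NS = "http://example.org/spXML"

DOCUMENT = f"""
<spXML:blocks xmlns:spXML="{SPXML_NS}">
  <spXML:block id="Avg1" function="AVERAGE">
    <spXML:param name="window" value="1" unitOfMeasure="h"/>
    <spXML:param name="mode" value="batch"/>
  </spXML:block>
  <spXML:block id="Th1" function="THRESHOLD">
    <spXML:param name="threshold" value="1" unitOfMeasure="kW"/>
  </spXML:block>
</spXML:blocks>
""".strip()


def read_blocks(xml_text: str) -> list:
    """Return one dict per block with its id, function and parameters.

    Each parameter maps its name to a (value, unitOfMeasure) pair; the
    unit is None when the attribute is absent.
    """
    ns = {"sp": SPXML_NS}
    root = ET.fromstring(xml_text)
    blocks = []
    for block in root.findall("sp:block", ns):
        params = {
            p.get("name"): (p.get("value"), p.get("unitOfMeasure"))
            for p in block.findall("sp:param", ns)
        }
        blocks.append({
            "id": block.get("id"),
            "function": block.get("function"),
            "params": params,
        })
    return blocks


blocks = read_blocks(DOCUMENT)
```

Keeping the unit next to each value mirrors the requirement that measures carry explicit units, so a 1 h window and a 1 kW threshold are never confused with bare numbers.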
spChains Framework
[Figure: spChains architecture. Labels: Event Sources, Stream Processing Chains, Stream Processing Block (spBlocks), Event Drains, Chain Definition, Basic spBlock Library, Pervasive/Ubiquitous Communication Infrastructure, Environmental Data, Aggregate / Computed Measures, Pattern Match / Alerts, Pervasive application(s), Final Users]
Examples of spChains
<spXML:block id="Avg1" function="AVERAGE">
  <spXML:param name="window" value="1" unitOfMeasure="h"/>
  <spXML:param name="mode" value="batch"/>
</spXML:block>
Implementation
Java spChains library (Apache v2.0 license)
Core library
Esper bindings
Basic spBlock library
Scales up to 200 k events/sec
Already in use
3 different data centers, running on embedded PCs
Monitoring environment, electrical power consumption, thermal flows (heating and cooling), polled by means of the Dog2.x multiprotocol gateway
Computed quantities are “pushed” to Web Service collectors
Over 3 months of uptime, no issues found
http://elite.polito.it/spchains
Conclusions
Complex computations in the field and in real time
Efficient and easy to integrate
Lowered the barrier to adoption of Stream Processing
Future work
User interface
Large-scale installations
http://elite.polito.it
http://elite.polito.it/spchains
fulvio.corno@polito.it
dario.bonino@polito.it