The Case for a Signal-Oriented Data Stream Management System

M. REZA RAHIMI, ATHENA AHMADI

ADVANCES IN DATABASE MANAGEMENT SYSTEM TECHNOLOGY, SPRING 2010

Post on 19-Dec-2015



Outline

• Introduction

• Typical Application

• Data and Programming Model

• System Architecture

• Optimizations

• Conclusion

Introduction

• There is a need for a data management system that integrates high-data-rate sensor data and signal processing operations into a single system.

• The WaveScope project aims to design an optimal event-stream signal processing system.

• The project aims to provide:

– A programming language (WaveScript), which falls in the category of domain-specific languages.

– A high-performance execution engine.

– Distributed execution: a WaveScript program can be distributed over PCs and sensors.

[Figure: Sensor Data; Signal Processing; WaveScript (queries + user-defined functions (UDFs)); Execution Engine (scheduler and optimization)]

Typical Application

• To understand the system better, consider the following application.

• Biologists use a sensor network of audio sensors to study the behavior of marmots.

• They want to gather information to answer the following queries:

• Query 1: Is there current activity (energy) in the frequency band corresponding to the marmot alarm call?

• Query 2: If so, which direction is the call coming from? (Use beamforming to enhance the signal quality.)

• Query 3: Is the call that of a male or a female?

• Query 4: Where is the individual marmot located over time?

• …..

• The following workflow answers the first three queries:

[Figure: workflow with stages labeled Query 1, Query 2, and Query 3]

Data and Programming Model

• Data types: integer, float, character, string, array, set, and SigSeg (signal segment).

• SigSeg: represents a window into a signal whose samples are regularly spaced in time.

• It also contains information about the sampling rate.

• It provides efficient indexing for retrieving historical data.

• SigSeg could easily be extended to support multidimensional signals such as images and video.
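As a rough illustration of the SigSeg idea described above, the following Python sketch models a signal segment as a starting sequence number, a sampling rate, and a block of samples. Only the name `SigSeg` comes from the slides; the field names, the methods, and the assumption that sample 0 sits at time 0 are hypothetical and are not WaveScope's actual API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SigSeg:
    """A window into a regularly sampled signal (illustrative only)."""
    start: int            # sequence number of the first sample
    rate_hz: float        # sampling rate, in samples per second
    samples: List[float]  # the sample values in this window

    @property
    def end(self) -> int:
        """Sequence number one past the last sample."""
        return self.start + len(self.samples)

    def time_of(self, seq: int) -> float:
        """Time in seconds of sample `seq`, assuming sample 0 is at t = 0."""
        return seq / self.rate_hz

    def subseg(self, start: int, length: int) -> "SigSeg":
        """Return the sub-window [start, start + length) by sequence number."""
        if start < self.start or start + length > self.end:
            raise IndexError("requested range lies outside this segment")
        offset = start - self.start
        return SigSeg(start, self.rate_hz, self.samples[offset:offset + length])


seg = SigSeg(start=48000, rate_hz=48000.0, samples=[0.0, 0.5, 1.0, 0.5])
sub = seg.subseg(48001, 2)   # index by sequence number, not list position
```

Because the segment carries its own start and rate, a sub-window keeps its place on the time axis without any extra bookkeeping by the caller.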

• Programming elements in a query workflow:

Class                            Examples
POD (Plain Old Data) functions   Arithmetic, SigSeg operations, timebase operations, FFT/IFFT
Subquery constructors            profileDetect, classify, beamForm, sync, zip
Fundamental stream operators     iterate, union

• In the following, we look at the programming language through a sample application.

Query 1: Filtering

    fun profileDetect(S, scorefun, <winsize, step>, threshsettings) {
        // Window the input stream, ensuring that we will hit each event
        // according to the event sample rate.
        wins = rewindow(S, winsize, step);

        // Take a hanning window and convert to the frequency domain
        // (frequency decomposition using FFT), then score each
        // frequency-domain window.
        scores : Stream<float>
        scores = iterate(w in hanning(wins)) {
            freq = fft(w);
            emit(scorefun(freq));
        };

        // Associate each original window with its score, and merge them together.
        withscores : Stream<float, SigSeg<int16>>
        withscores = zip2(scores, wins);

        // Find time ranges where scores are above the threshold.
        // threshFilter returns <bool, starttime, endtime> tuples.
        return threshFilter(withscores, threshsettings);
    }
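To make the data flow of this detector concrete, here is a minimal Python sketch of the same steps: window the signal, apply a Hann taper, move to the frequency domain, score each window, and keep the time ranges above a threshold. It is not WaveScript and not the authors' implementation: the function names mirror the slide, but the single scalar threshold (in place of `threshsettings`), the naive DFT, and the example score function are assumptions made for illustration.

```python
import cmath
import math
from typing import Callable, List, Tuple


def rewindow(samples: List[float], winsize: int, step: int) -> List[Tuple[int, List[float]]]:
    """Cut the signal into windows of `winsize` samples, advancing by `step`."""
    return [(i, samples[i:i + winsize])
            for i in range(0, len(samples) - winsize + 1, step)]


def hanning(win: List[float]) -> List[float]:
    """Apply a Hann taper to one window."""
    n = len(win)
    return [x * 0.5 * (1 - math.cos(2 * math.pi * k / (n - 1)))
            for k, x in enumerate(win)]


def dft_mag(win: List[float]) -> List[float]:
    """Magnitudes of a naive O(n^2) DFT; a real system would use an FFT."""
    n = len(win)
    return [abs(sum(x * cmath.exp(-2j * math.pi * f * k / n)
                    for k, x in enumerate(win)))
            for f in range(n // 2 + 1)]


def profile_detect(samples: List[float],
                   scorefun: Callable[[List[float]], float],
                   winsize: int, step: int,
                   threshold: float) -> List[Tuple[int, int]]:
    """Return (start, end) sample ranges of windows scoring above threshold."""
    hits = []
    for start, win in rewindow(samples, winsize, step):
        score = scorefun(dft_mag(hanning(win)))
        if score > threshold:
            hits.append((start, start + winsize))
    return hits


# 32 samples of silence followed by 32 samples of a tone at DFT bin 4.
tone = [math.sin(2 * math.pi * 4 * k / 16) for k in range(32)]
signal = [0.0] * 32 + tone
hits = profile_detect(signal, lambda mags: mags[4],
                      winsize=16, step=16, threshold=1.0)
```

Only the two windows that contain the tone exceed the threshold, so `hits` holds their sample ranges; the silent windows score zero and are dropped.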

Example of iterate: a running database aggregate

Three functions: init(), aggr(), out(). For an average:

1. init(A, val) { A.sum = val; A.count = 1; }

2. aggr(A1, A2) { A1.sum += A2.sum; A1.count += A2.count; }

3. out(A) { return A.sum / A.count; }

    subquery running_agg(S, init, aggr, out) {
        s2 = iterate(x in S) {
            state { acc = init(); }
            acc = aggr(acc, x);   // fold the new tuple into the state
            emit out(acc);        // emit the aggregate so far
        };
        return s2;
    }
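The running aggregate can be sketched in Python as a generator that keeps private state between tuples, which is what the `state { ... }` block of iterate provides. The names `running_agg`, `init`, `aggr`, and `out` follow the slide; representing the accumulator as a `(sum, count)` pair and having `init` take no arguments are assumptions of this sketch.

```python
from typing import Callable, Iterable, Iterator, TypeVar

A = TypeVar("A")
T = TypeVar("T")
R = TypeVar("R")


def running_agg(stream: Iterable[T],
                init: Callable[[], A],
                aggr: Callable[[A, T], A],
                out: Callable[[A], R]) -> Iterator[R]:
    """Emit out(acc) after folding each input tuple into the state."""
    acc = init()              # private operator state, like state { ... }
    for x in stream:
        acc = aggr(acc, x)    # update the state with the new tuple
        yield out(acc)        # emit one result per input tuple


# Running average: the accumulator is a (sum, count) pair.
averages = list(running_agg(
    [2.0, 4.0, 9.0],
    init=lambda: (0.0, 0),
    aggr=lambda acc, x: (acc[0] + x, acc[1] + 1),
    out=lambda acc: acc[0] / acc[1],
))
# averages == [2.0, 3.0, 5.0]
```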

Query 2

    // A snapshot of each detected call: <bool, time1, time2>.
    control = profileDetect(Ch0, marmotScore, <64,192>, <16.0, 0.999, 40, 2400, 48000>);

    // Use the control stream to extract the actual data windows.
    datawindows = sync4(control, Ch0, Ch1, Ch2, Ch4);

    // Beamforming.
    beam<doa,enhanced> = beamform(datawindows, arrayGeometry);

    // Classify the marmot.
    marmots = classify(beam.enhanced, marmotClassifier);
    return zip2(beam, marmots);

System Architecture

• Preprocessor: syntax check.

• Expander: inlines the whole query plan (expands subqueries, POD functions, …).

• Optimizer: stream and signal processing optimizer.

• Compiler: produces the query plan in a low-level language such as C.

• Runtime: run-time library.

• Query plan: the final query plan is an imperative program corresponding to an Aurora-style directed graph with iterate, union, and source as basic operators.

• Scheduler: chooses which operator in the query to run next.

• Memory manager: because memory is limited in embedded applications, the memory manager manages memory resources, caching, garbage collection, …

• But what does the timebase conversion graph mean?

• Scheduler

– Decides which operator in the query to run next.

– Tuple-passing mechanism.

– Assigning threads.

– Compact memory footprint, cache locality, fairness, scalability, high-throughput tuple passing.

• Memory management

– To scale to high data rates, data is passed by reference with copy-on-write instead of by value.

– Garbage collection: reference counting.
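The combination of pass-by-reference, copy-on-write, and reference counting can be illustrated with a small Python sketch. The classes `SharedBuffer` and `Handle` are hypothetical names invented here; they show the general technique, not how the WaveScope memory manager is actually implemented.

```python
from typing import List


class SharedBuffer:
    """Sample storage shared by several handles, with a reference count."""
    def __init__(self, data: List[float]) -> None:
        self.data = data
        self.refs = 0


class Handle:
    """A reference to a buffer; copies the data only on a shared write."""
    def __init__(self, buf: SharedBuffer) -> None:
        self.buf = buf
        buf.refs += 1

    def share(self) -> "Handle":
        """Pass the data on by reference: no samples are copied."""
        return Handle(self.buf)

    def write(self, index: int, value: float) -> None:
        """Copy-on-write: take a private copy first if others share the data."""
        if self.buf.refs > 1:
            self.buf.refs -= 1
            self.buf = SharedBuffer(list(self.buf.data))
            self.buf.refs = 1
        self.buf.data[index] = value

    def release(self) -> None:
        """Drop this reference; free the storage when the count hits zero."""
        self.buf.refs -= 1
        if self.buf.refs == 0:
            self.buf.data = []   # stand-in for returning memory to a pool


a = Handle(SharedBuffer([1.0, 2.0, 3.0]))
b = a.share()          # both handles see the same storage
b.write(0, 9.0)        # b gets a private copy; a is unaffected
```

Readers never pay for a copy; only an operator that modifies shared data does, and storage is reclaimed as soon as its last reference is released.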

• Managing timing information corresponding to signal data is a common problem in signal processing applications.

• Signal processing operators typically process vectors of samples with sequence numbers, leaving the application developer to determine how to interpret those samples temporally.

• WaveScope introduces the concept of a timebase, a dynamic data structure that represents and maintains a mapping between sample sequence numbers and time units.

• Based on input from signal source drivers and other WaveScope components, the timebase manager maintains a conversion graph that denotes which conversions are possible.

• In this graph, every node is a timebase, and an edge indicates the capability to convert from one timebase to another.

• The graph may contain cycles as well as redundant paths.

• Conversions may be composed along any path through the graph; when redundant paths exist, a weighted average of the results from each path may give higher accuracy.

• Node to node time conversion
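A minimal Python sketch of such a conversion graph follows. It assumes that each edge is a linear map (`dst = scale * src + offset`) with a confidence weight, and that the weight of a path is the weight of its weakest edge. The slides do not specify either of these, so both are assumptions of this sketch, as are the timebase names in the example.

```python
from typing import Dict, List, Tuple

# Edge (scale, offset, weight): dst_time = scale * src_time + offset.
Edge = Tuple[float, float, float]


def convert(graph: Dict[str, Dict[str, Edge]], src: str, dst: str, t: float) -> float:
    """Convert t from timebase src to dst, averaging over all simple paths."""
    if src == dst:
        return t
    results: List[Tuple[float, float]] = []   # (converted value, path weight)

    def walk(node: str, value: float, weight: float, seen: frozenset) -> None:
        if node == dst:
            results.append((value, weight))
            return
        for nxt, (scale, offset, w) in graph.get(node, {}).items():
            if nxt not in seen:               # do not go around cycles
                walk(nxt, scale * value + offset, min(weight, w), seen | {nxt})

    walk(src, t, float("inf"), frozenset({src}))
    if not results:
        raise ValueError(f"no conversion path from {src} to {dst}")
    total = sum(w for _, w in results)
    return sum(v * w for v, w in results) / total


# Sample numbers map to a node clock, which reaches GPS time either
# directly (low weight) or through a server clock (higher weight).
graph = {
    "samples": {"node_clock": (1 / 48000, 10.0, 10.0)},
    "node_clock": {"gps": (1.0, 100.0, 1.0), "server": (1.0, 50.0, 3.0)},
    "server": {"gps": (1.0, 50.4, 3.0)},
}
gps_time = convert(graph, "samples", "gps", 48000)   # about 111.3
```

The two redundant paths give 111.0 and 111.4; the weighted average leans toward the path through the server because its edges carry more weight.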

Distributed Query Execution

• The query plan can be executed in a distributed fashion.

[Figure: query plan distributed across a sensor node and PCs]

Query Stored Data

• In addition to handling streaming data, many WaveScope applications will need to query a pre-existing stored database, or historical data archived on secondary storage (e.g., disk or flash memory).

• Two special WaveScope library functions will support archiving and querying stored data declaratively:

– DiskArchive: consumes tuples from its input stream and writes them to a named relational table on disk.

– DiskSource: reads tuples from a named relational table on disk and feeds them upstream.
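The archive and replay roles of these two functions can be sketched in Python with the standard `sqlite3` module. The function names echo DiskArchive and DiskSource, but the signatures, the `(timestamp, score)` tuple layout, and the use of SQLite as the on-disk store are assumptions made for illustration; the slides do not describe the storage backend.

```python
import os
import sqlite3
import tempfile
from typing import Iterable, Iterator, Tuple

Row = Tuple[float, float]   # (timestamp, score): a hypothetical tuple layout


def disk_archive(stream: Iterable[Row], db_path: str, table: str) -> None:
    """Consume tuples from a stream and append them to a named table."""
    if not table.isidentifier():
        raise ValueError("table name must be a plain identifier")
    conn = sqlite3.connect(db_path)
    try:
        with conn:   # commits on success, rolls back on error
            conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (ts REAL, score REAL)")
            conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", stream)
    finally:
        conn.close()


def disk_source(db_path: str, table: str) -> Iterator[Row]:
    """Read tuples back from the named table, in time order, as a stream."""
    if not table.isidentifier():
        raise ValueError("table name must be a plain identifier")
    conn = sqlite3.connect(db_path)
    try:
        yield from conn.execute(f"SELECT ts, score FROM {table} ORDER BY ts")
    finally:
        conn.close()


db_path = os.path.join(tempfile.mkdtemp(), "archive.db")
disk_archive([(0.0, 0.1), (1.0, 0.9)], db_path, "detections")
replayed = list(disk_source(db_path, "detections"))
```

Because `disk_source` is a generator, stored tuples can be fed into the same operators that consume live streams.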

Optimizations

• Two categories of optimization are possible: data stream optimization and signal processing optimization.

• Database optimization techniques are used on the stream side, for example merging adjacent iterate operators.

• For signal processing, optimizations exploit the relations between operators.
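Merging adjacent iterate operators, mentioned above, can be illustrated with a short Python sketch. It assumes stateless operator bodies that map one input tuple to a list of zero or more outputs; the helper names `iterate`, `merge`, `scale`, and `keep_large` are hypothetical, and a real optimizer would also have to handle operator state.

```python
from typing import Callable, Iterable, Iterator, List

# A stateless iterate body maps one input tuple to zero or more outputs.
Body = Callable[[float], List[float]]


def iterate(body: Body, stream: Iterable[float]) -> Iterator[float]:
    """Run one operator: apply the body to each tuple and emit its outputs."""
    for x in stream:
        yield from body(x)


def merge(first: Body, second: Body) -> Body:
    """Fuse two adjacent bodies into one, removing the stream between them."""
    def fused(x: float) -> List[float]:
        return [z for y in first(x) for z in second(y)]
    return fused


def scale(x: float) -> List[float]:
    return [2 * x]                      # emits exactly one tuple


def keep_large(x: float) -> List[float]:
    return [x] if x > 4 else []         # emits zero or one tuple


data = [1.0, 2.0, 3.0, 4.0]
two_ops = list(iterate(keep_large, iterate(scale, data)))
one_op = list(iterate(merge(scale, keep_large), data))
```

Both plans produce the same output, but the fused plan runs a single operator, so the scheduler has one fewer operator to run and no intermediate tuples need to be passed.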

Conclusion

• The paper discusses how best to define a query language that merges signal processing and stream processing concepts.

• We think several gaps should be filled:

– The paper considers stream and signal processing optimization, but for the application it targets (sensor networks) it should also define a power-aware query optimizer.

– Storing data is an issue in these applications: these large amounts of data must be handled and retrieved efficiently, for example through indexing.