Sensor Data Management: Challenges and (some) Solutions

Amol Deshpande, University of Maryland

Motivation

Unprecedented, and rapidly increasing, instrumentation of our everyday world

Wireless sensor networks

RFID

Distributed measurement networks (e.g. GPS)

Industrial Monitoring

Sensor Data Processing: Now

Diagram labels: Sensor Network, Database, User

Table raw-data

time   id   temp
10am   1    20
10am   2    21
..     ..   …
10am   7    29

Steps carried out by the user (repeat):

1. Extract all readings into a file

2. Run MATLAB/R/other data processing tools

3. Write output to a file/back to the database

4. Write data processing tools to process/aggregate the output (maybe using DB)

5. Decide new data to acquire

Sensor Data Processing: What we want

Diagram labels: Sensor Network, Database, User, Tasks, Data

Table raw-data

time   id   temp
10am   1    20
10am   2    21
..     ..   …
10am   7    29

Models to be applied to data in real-time (at least simple ones)

Table processed-data

time   id   temp
10am   1    20
10am   2    21
..     ..   …
10am   7    29

Continuous (standing) queries, e.g. alert monitoring

Results to continuous queries

Ad hoc queries (possibly against processed, modeled data)
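A continuous (standing) query for alert monitoring can be sketched roughly as below. This is only an illustration in Python, not how any of the systems named in this talk implement it: the `Reading` type, the `standing_query` helper, and the 25-degree threshold are all assumptions made for the example, and the sample readings mirror the rows of the raw-data table.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class Reading:
    """One row of the raw-data table (hypothetical type for this sketch)."""
    time: str
    sensor_id: int
    temp: float


def standing_query(
    stream: Iterable[Reading],
    predicate: Callable[[Reading], bool],
) -> Iterator[Reading]:
    """Yield each reading that satisfies the predicate as it arrives.

    The query is registered once and evaluated against every new reading,
    instead of being re-run over a stored table.
    """
    for reading in stream:
        if predicate(reading):
            yield reading


# Sample readings taken from the raw-data table on the slide.
readings = [
    Reading("10am", 1, 20.0),
    Reading("10am", 2, 21.0),
    Reading("10am", 7, 29.0),
]

# Assumed alert condition: temperature above 25 (threshold is illustrative).
alerts = list(standing_query(readings, lambda r: r.temp > 25.0))
```

Because `standing_query` is a generator, it would work the same way over an unbounded stream; the list here only stands in for one.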

Data Management Challenges

Very, very large scale

Spatio-temporal querying essential

Need new indexing techniques, data description formats, and techniques for “data ingest” (cleaning the data etc.)

Much work in scientific data management, e.g. SkyServer

Data is typically imprecise, unreliable, or incomplete (data quality)

Measurement noise, failures in sensor/GPS data

High message loss rate in wireless/RFID

Balazinska et al; Data Management in the Worldwide Sensor Web; IEEE Pervasive, 2007.

Data Management Challenges

Data is generated continuously and must be processed in real-time (distributed data streams)

Need different query processing paradigms

Typically very high data rates

Must be able to handle a large number of continuous queries efficiently

Much recent work on “Data Streams”

Research systems: TelegraphCQ [Berkeley], STREAM [Stanford], Aurora [Brown/MIT/Brandeis] etc.

Commercial systems: Streambase, TruViso, …

Data Management Challenges

Need for real-time statistical modeling of data

Eliminate spatial/temporal biases, handle missing data through extrapolation (e.g. regression, interpolation models)

Filter measurement noise (e.g. Kalman Filters)

Infer hidden variables, pattern recognition (e.g. HMMs)

Fault or anomaly detection

Forecasting/prediction (e.g. ARIMA)

Figures: regression/interpolation models (temperature monitoring); Kalman Filters (GPS data)
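To make the noise-filtering and missing-data points concrete, here is a minimal scalar Kalman filter sketch. It assumes a random-walk state model (the true value drifts slowly) and treats a missing measurement as `None`, in which case only the prediction step runs. The function name `kalman_1d`, its parameters, and the variance values in the usage lines are assumptions for illustration, not taken from the talk.

```python
from typing import List, Optional, Sequence


def kalman_1d(
    measurements: Sequence[Optional[float]],
    process_var: float,
    meas_var: float,
    x0: float,
    p0: float,
) -> List[float]:
    """Filter a scalar series with a random-walk Kalman filter.

    x is the state estimate and p its variance. A measurement of None is
    treated as missing: the estimate is carried forward and only its
    uncertainty grows.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state is assumed unchanged, uncertainty increases.
        p = p + process_var
        if z is not None:
            # Update: blend prediction and measurement using the Kalman gain.
            k = p / (p + meas_var)
            x = x + k * (z - x)
            p = (1.0 - k) * p
        estimates.append(x)
    return estimates


# Illustrative noisy temperature series with one missing reading.
noisy = [20.3, 19.8, None, 20.6, 20.1]
smoothed = kalman_1d(noisy, process_var=0.01, meas_var=0.25, x0=20.0, p0=1.0)
```

A higher `meas_var` relative to `process_var` makes the filter trust its own prediction more and smooths more aggressively; the opposite setting makes it track the raw readings more closely.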

Data Management Challenges

The applications have strong acquisitional aspects

Data has to be actively acquired as needed

Typically high data acquisition costs (e.g. energy consumption in battery-powered devices)

Data provenance: being able to trace something back to its origins

Data exploration and visualization

Data interoperability

Data security and privacy

My Research Interests

Managing imprecise and incomplete data

Support statistical modeling and querying of sensor data in relational databases

Clean, declarative abstractions

Real-time processing of streaming data

Probabilistic databases

Store and query data annotated with probabilities

Energy-efficient algorithms for wireless sensor networks

Data acquisition, target monitoring, data compression ..

In-network query processing

MauveDB

Written using Apache Derby, an open-source Java DBMS

Supports an abstraction called model-based views

Declarative specification of models to be applied

Can query the output of the models using SQL

Models kept updated as new data/measurements arrive
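The idea of querying a model's output with SQL can be imitated in a few lines, with the caveat that this is not MauveDB's implementation or syntax. The sketch below uses Python's built-in `sqlite3` instead of Derby, fits a per-sensor least-squares line, and materializes the predictions into an ordinary table. The names `raw_data`, `reg_view`, `fit_line`, and `refresh_view`, and the sample rows, are all hypothetical.

```python
import sqlite3


def fit_line(points):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b).

    Assumes at least two points with distinct x values.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def refresh_view(conn, grid):
    """Rebuild reg_view: model-predicted temp for every sensor and grid time."""
    conn.execute("DROP TABLE IF EXISTS reg_view")
    conn.execute("CREATE TABLE reg_view (time REAL, id INTEGER, temp REAL)")
    ids = [row[0] for row in conn.execute("SELECT DISTINCT id FROM raw_data")]
    for sensor_id in ids:
        points = conn.execute(
            "SELECT time, temp FROM raw_data WHERE id = ?", (sensor_id,)
        ).fetchall()
        a, b = fit_line(points)
        conn.executemany(
            "INSERT INTO reg_view VALUES (?, ?, ?)",
            [(t, sensor_id, a + b * t) for t in grid],
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_data (time REAL, id INTEGER, temp REAL)")
# Illustrative readings: sensor 1 sampled at times 0, 2 and 4.
conn.executemany(
    "INSERT INTO raw_data VALUES (?, ?, ?)",
    [(0.0, 1, 20.0), (2.0, 1, 22.0), (4.0, 1, 24.0)],
)

refresh_view(conn, [0.0, 1.0, 2.0, 3.0, 4.0])
# Time 3.0 was never measured, but the view still answers the query.
predicted = conn.execute(
    "SELECT temp FROM reg_view WHERE id = 1 AND time = 3.0"
).fetchone()[0]
```

Calling `refresh_view` again after inserting new rows into `raw_data` is the crude stand-in here for keeping the model updated as measurements arrive; a real system would presumably update incrementally rather than refit from scratch.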

A. Deshpande, S. Madden; MauveDB: Supporting Model-based User Views in Database Systems; SIGMOD 2006

B. Kanagal, A. Deshpande; Online Filtering, Smoothing and Probabilistic Modeling of Streaming data; ICDE 2008

Status: Support for regression- and interpolation-based views

Currently building support for views based on Dynamic Bayesian networks (Kalman Filters, HMMs etc.)

Ongoing work: Query processing and optimization, continuous queries

APIs for arbitrary models …

Probabilistic Databases

Motivation: Increasing amounts of uncertain data

From sensor networks

Imprecise data, data with confidence/accuracy bounds

Human-observed data

Statistical modeling/machine learning

Many models provide a distribution over a set of labels (e.g. HMMs)

Information extraction from text

Social networks

How to manage and query such data in relational databases?

Different types of uncertainties

Complex correlation patterns

Much work in database community over last few years

P. Sen, A. Deshpande; Representing and Querying Correlated Tuples in Probabilistic Databases; ICDE 2007
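As a rough illustration of storing and querying data annotated with probabilities, the sketch below attaches an existence probability to each tuple and answers one simple query. It assumes the tuples are independent, which is exactly the simplification that the correlated-tuples work cited above moves beyond. The rows, probabilities, and the helper `prob_exists` are made up for the example.

```python
from math import prod

# Each row: (sensor id, reported temp, probability that the tuple is correct).
# Values are illustrative only.
rows = [
    (1, 20.0, 0.9),
    (2, 31.0, 0.5),
    (7, 29.0, 0.8),
]


def prob_exists(rows, predicate):
    """Probability that at least one tuple satisfying the predicate exists.

    Under the tuple-independence assumption this is
    1 - product over matching tuples of (1 - p).
    """
    return 1.0 - prod(1.0 - p for (sid, temp, p) in rows if predicate(sid, temp))


def expected_count(rows, predicate):
    """Expected number of matching tuples (holds with or without independence)."""
    return sum(p for (sid, temp, p) in rows if predicate(sid, temp))


# "Is any sensor reporting a temperature above 25?"
hot = lambda sid, temp: temp > 25.0
p_any_hot = prob_exists(rows, hot)
n_hot = expected_count(rows, hot)
```

With correlated tuples the product formula no longer holds, which is why the representation of correlations matters for query answers.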

Thanks!

Questions?