
Probabilistic Data Management for the Digital Home:

The HeisenData Project

Minos Garofalakis (IR Berkeley)

UC Berkeley: Prof. Joe Hellerstein, Prof. Mike Franklin, Daisy Wang, Eirinaios Michelakis

September 22, 2006


Stanford UPDB Meeting, September 22, 2006

The Home, Year 2020

1000s of sensors (light, temp, sound, motion, location, …)

100s of actuators (locks, switches, heating, water, …)

Masses of data (“Traditional” data plus sensor streams)

Does a lot for you: security, HVAC, energy management & demand-response, entertainment, …

– Use Alice’s motion patterns to activate electrical devices (e.g., water heating)

– Correlate user motions with existing patterns to detect “suspicious” behavior


Not Just “Pie-in-the-Sky” Fantasy

Many prototypes: GaTech, MIT, U Colorado, UT Arlington, Orange, Philips, MSR

Advances in statistical machine learning: activity recognition, DARPA Grand Challenge, image recognition

Advances in sensing/actuation: wireless, nano tech, sensor data fusion, real-world applications now

Statistical learning techniques enabling rapid advance

But most efforts are stand-alone, “point” solutions


Thesis: A “Smart” Home Must

1. Handle uncertainty and correlation (probabilistic reasoning)

P( sensor 2455 fired accurately ) > .8

P( someone in den | behavior of sensors ) > .95

P( Bob in den | Bob’s recent history ) < .05

P( Bob eating dinner | house global state ) > .75

P( Bob is happy | years observing Bob ) > .8

A hierarchy of inferences, from minute to abstract

Recognize, manage, and exploit correlations (spatial, temporal)
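The lowest levels of this hierarchy can be illustrated with a plain Bayes-rule update. The sketch below is only an illustration: the prior, the sensor firing rates, and the function name `posterior_occupied` are hypothetical, and it treats the two sensors as conditionally independent given occupancy, which is exactly the simplification the thesis argues a real system must move beyond.

```python
def posterior_occupied(prior, p_fire_if_occupied, p_fire_if_empty, fired):
    """Bayes' rule for P(room occupied | one sensor reading)."""
    like_occ = p_fire_if_occupied if fired else 1.0 - p_fire_if_occupied
    like_emp = p_fire_if_empty if fired else 1.0 - p_fire_if_empty
    numerator = like_occ * prior
    return numerator / (numerator + like_emp * (1.0 - prior))


# Hypothetical numbers, not taken from the slides.
belief = 0.10  # prior P(someone in den)
for fired in (True, True):  # two motion sensors in the den both fire
    # Assumption: readings are independent given occupancy.
    belief = posterior_occupied(belief, 0.90, 0.05, fired)

print(f"P(someone in den | both sensors fired) = {belief:.3f}")
```

With these assumed rates, two agreeing sensors push a low prior above the 0.95 level used in the slide's example statement. Correlated sensors would need a joint model rather than this repeated independent update.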


Example App: People Tracking

[Figure: correlated sensors (Motion M1, Motion M2, RFID, Door) tracking Alice and Bob]

Sample event: “Bob is at the front door” with high confidence


Thesis: A “Smart” Home Must

2. Share its knowledge across applications

Security, HVAC, entertainment, etc., apps need to

Share all objects (sensors, floor plan, people, …)

Share common models of the world, e.g., When does Bob usually come home on Tue? What’s a typical Sunday like? Who’s in the kitchen now?


Thesis: A “Smart” Home Must

3. Support both real-time & retrospective reasoning

Example real-time reasoning

– Fire alarm

– Intruder alert

Example retrospective reasoning

– Turn on hot water heater just-in-time

– Automatically detect and enter “vacation mode”


Existing Approaches

Uncertainty in DB management systems

– Simple uncertainty models

– Independent tuples, only limited correlation modeling

– Attaching probabilities at the wrong granularity

ML and “intelligent environment” app areas

– No sharing of data or models among apps

– Hard-wired world models, difficult to code/update


Existing Approaches (contd.)

All interesting data processing done outside the database!

– Lose all key benefits of a DBMS (declarative querying, persistence, optimization, …)

– No sharing of data/knowledge/abstractions, duplication of effort

[Figure: Sensor/RFID streams (+ metadata, floor plans, …) → Relational DBMS (Raw Data Tables) → SELECT * FROM RAWDATA → INPUT FILE → … → OUTPUT FILE]

Raw Data Tables:

time  id  temp
10am  1   20
10am  2   21
..    ..  …
10am  7   29


The HeisenData Project

Integrated data-management & probabilistic-reasoning platform

Push stat learning functions inside the DBMS

– Model learning, inference, querying, …

– Uncertainty, correlations and probabilistic reasoning as “first-class citizens”

Provide high-level declarative interface, persistence, optimization, … for

– Probabilistic models of the world & inference queries

– Object/event hierarchies (w/ basic “out of the box” objects)

HeisenDB Engine: basis for ML app development


HeisenData Model

(Evidence + Model) define a probability distribution over “possible worlds”

Complete data model

Evidence Table(s):

time  id  temp  volt
10am  1   20    2.5
10am  2   21    XXX
..    ..  …
10am  7         2.8

+

[Figure: Hierarchical FO Graphical Model over nodes T1, T2, T3, T4, V1, V3]

“Possible Worlds”, each weighted by Prob (World | Evidence) under the Model; the probabilities shown are Prob=0.4, Prob=0.3, Prob=0.3:

time  id  temp  volt
10am  1   20    2.5
10am  2   21    2.7
..    ..  …
10am  7   26    2.8

time  id  temp  volt
10am  1   20    2.5
10am  2   21    2.7
..    ..  …
10am  7   28    2.8

time  id  temp  volt
10am  1   20    2.5
10am  2   21    2.7
..    ..  …
10am  7   26    2.8
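To make the possible-worlds semantics concrete, the following sketch answers two simple queries by summing over an explicit list of worlds. It is an illustration only: it keeps just the two cells that are missing from the evidence table (sensor 2's voltage and sensor 7's temperature), and it assumes the three probabilities on the slide pair with the three worlds in the order shown. The names `worlds`, `prob`, and `expectation` are hypothetical.

```python
# Each world fills in the cells missing from the evidence table.
# Worlds 1 and 3 agree on the visible cells; on the slide they presumably
# differ in the elided rows (assumption).
worlds = [
    ({"volt_2": 2.7, "temp_7": 26}, 0.4),
    ({"volt_2": 2.7, "temp_7": 28}, 0.3),
    ({"volt_2": 2.7, "temp_7": 26}, 0.3),
]


def prob(event):
    """Probability that a predicate holds, summed over possible worlds."""
    return sum(p for world, p in worlds if event(world))


def expectation(f):
    """Expected value of a per-world quantity."""
    return sum(p * f(world) for world, p in worlds)


p_temp_is_26 = prob(lambda w: w["temp_7"] == 26)
expected_temp = expectation(lambda w: w["temp_7"])
print(p_temp_is_26, expected_temp)
```

Every query here touches every world, which is fine for three worlds and motivates working on the compact evidence-plus-model representation instead.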


Probabilistic Graphical Models 101

Nodes = Random Variables (RVs); Edges capture direct correlations

Parameterization = factor table for each clique

– “Marginal probability distribution” (in general, “correlation strengths”)

– Concise representation of multidimensional joint pdf

Probabilistic inference: Conditioning, marginalization, MAP estimation, …

[Figure: graphical model over nodes T1, T2, T3, T4, with factor tables for (T1, T2), (T1, T3), …]

T1  T2  P
21  22  0.5
22  23  0.2
..  ..  …
24  27  0.1

T1  T3  P
21  23  0.2
22  21  0.3
..  ..  …
24  29  0.2
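A small sketch of how factor tables define a joint distribution and support inference is given below. Because the slide's tables are elided, the factor values and the two-value domain are hypothetical; only the structure (pairwise factors on T1–T2 and T1–T3) follows the slide. Inference is done by brute-force enumeration, which is workable only for toy models.

```python
from itertools import product

domain = [21, 22]  # hypothetical temperature values

# Hypothetical pairwise factors ("correlation strengths"), unnormalized.
phi_12 = {(21, 21): 4.0, (21, 22): 1.0, (22, 21): 1.0, (22, 22): 4.0}
phi_13 = {(21, 21): 3.0, (21, 22): 1.0, (22, 21): 1.0, (22, 22): 3.0}


def joint():
    """Joint over (T1, T2, T3): product of factors, then normalize."""
    weights = {
        (t1, t2, t3): phi_12[(t1, t2)] * phi_13[(t1, t3)]
        for t1, t2, t3 in product(domain, repeat=3)
    }
    z = sum(weights.values())
    return {assignment: w / z for assignment, w in weights.items()}


def marginal(dist, index):
    """Marginalization: sum out every variable except the one at `index`."""
    out = {}
    for assignment, p in dist.items():
        out[assignment[index]] = out.get(assignment[index], 0.0) + p
    return out


def condition(dist, index, value):
    """Conditioning: keep assignments consistent with the evidence, renormalize."""
    kept = {a: p for a, p in dist.items() if a[index] == value}
    z = sum(kept.values())
    return {a: p / z for a, p in kept.items()}


full = joint()
t1_prior = marginal(full, 0)
t1_given_t2 = marginal(condition(full, 1, 22), 0)  # P(T1 | T2 = 22)
print(t1_prior, t1_given_t2)
```

Observing T2 = 22 shifts belief about T1 toward 22 because the assumed factor rewards agreement between the two sensors.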


Hierarchical FO Graphical Models

Goal: Capture correlations at the right abstraction level

Semantic hierarchy of RV entities (GROUP-BYs)

– In general, RVs can be defined as “slices” over the table schema

– Probabilistic correlations expressed at a level are quantified over all descendant RVs

– Can also have exceptions/overrides at finer resolutions

Cleaner, more intuitive probabilistic models

More opportunities for optimizing probabilistic inference

[Figure: RV hierarchy with root Temperature(T), children LivingRoom(TL) and Bathroom(TB), and leaves T1 T2 T3 T4 T5; alongside, a graphical model over T and V and one over T1, T2, T3, T4, T5]
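One way to picture "quantified over all descendant RVs, with overrides at finer resolutions" is a lookup that walks up the hierarchy until it finds a parameter. The sketch below is an assumption-laden illustration: the assignment of T1 to T3 to the living room and T4 to T5 to the bathroom, the single scalar "strength" per node, and the name `resolve` are all hypothetical and not taken from the slides.

```python
# Hypothetical hierarchy: child -> parent.
parent = {
    "TL": "T", "TB": "T",
    "T1": "TL", "T2": "TL", "T3": "TL",
    "T4": "TB", "T5": "TB",
}

# Hypothetical correlation strength with the voltage group V,
# stated once at the coarsest level where it holds.
strength = {
    "T": 0.5,   # default for every temperature RV
    "TL": 0.8,  # living-room sensors correlate more strongly
    "T3": 0.1,  # exception/override for one sensor
}


def resolve(rv):
    """Return the most specific parameter that applies to `rv`."""
    node = rv
    while node is not None:
        if node in strength:
            return strength[node]
        node = parent.get(node)  # None once we step above the root
    raise KeyError(f"no parameter defined for {rv} or its ancestors")


for rv in ("T1", "T3", "T4"):
    print(rv, resolve(rv))
```

Three entries cover all five leaf sensors here, which is the kind of conciseness the hierarchy is meant to provide.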


HeisenData Query Processing

“Possible worlds”: Clean semantics but impractical!

Perform all query processing (relational & inference operators) over evidence + model

[Figure: two routes from (Evidence Tables + Probabilistic Model) to the query result. Direct route: Relational & Inference Queries yield (Evidence Tables + Result Model), labeled “FAST!!”. Possible-worlds route: Expand into a Distribution of Possible Worlds, run Relational Queries (for each world) to obtain the Resulting Possible Worlds Distribution, labeled “INFEASIBLE – Exponential explosion!!”. An “Infer” step is also shown.]
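The exponential explosion is easy to quantify: with n uncertain cells that each take one of d values, there are d^n possible worlds. The sketch below enumerates a toy case and only counts a larger one; the variable names and domain values are hypothetical.

```python
from itertools import product
from math import prod


def count_worlds(domain_sizes):
    """Number of possible worlds = product of the uncertain cells' domain sizes."""
    return prod(domain_sizes)


def enumerate_worlds(domains):
    """Explicitly expand every possible world (feasible only for tiny inputs)."""
    names = list(domains)
    for values in product(*(domains[name] for name in names)):
        yield dict(zip(names, values))


# Toy case: two uncertain cells.
small = {"temp_7": [26, 27, 28], "volt_2": [2.6, 2.7]}
small_worlds = list(enumerate_worlds(small))
print(len(small_worlds))

# Home-scale case: 1000 sensors with 10 candidate readings each.
big = count_worlds([10] * 1000)
print(len(str(big)) - 1, "decimal orders of magnitude")
```

Running a relational query once per world is therefore out of reach at home scale, while the evidence-plus-model representation grows with the size of the model rather than with the number of worlds.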


Query Processing over HFO Models

Query processing algebra = both traditional relational operators and probabilistic inference

– Simple example query: Find most probable sensor readings for tomorrow, and join them with last week’s averages

– Cost, optimize, process such queries?

– Operate over both HFO factor tables and evidence

– “Open” inference primitives for optimizer (cost, ordering, etc.), access structures, …

Relational operators over HFO models

– Output: Model for the possible worlds in the relational result

– Non-trivial – the different granularities of RVs in the model can complicate things even for simple operations
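The simple example query mixes an inference step with a relational join. The sketch below mimics that shape in plain Python under strong assumptions: the per-sensor predictive distributions and last week's averages are made-up data, and taking the most probable value per sensor matches a joint MAP estimate only if the sensors are treated as independent. It says nothing about how HeisenData itself would plan or execute such a query.

```python
# Hypothetical predictive distributions for tomorrow: sensor id -> {reading: prob}.
predicted = {
    1: {20: 0.2, 21: 0.7, 22: 0.1},
    2: {21: 0.3, 22: 0.6, 23: 0.1},
    7: {26: 0.7, 28: 0.3},
}

# Hypothetical relational input: last week's average reading per sensor.
last_week_avg = {1: 20.5, 2: 21.0}


def most_probable(dist):
    """Inference operator: pick the reading with the highest probability."""
    return max(dist, key=dist.get)


# Inference step followed by an inner join on sensor id.
result = [
    (sensor_id, most_probable(dist), last_week_avg[sensor_id])
    for sensor_id, dist in sorted(predicted.items())
    if sensor_id in last_week_avg
]
for row in result:
    print(row)
```

Even in this toy form there is an ordering choice, since joining first would avoid running inference for sensor 7, which is the sort of decision an optimizer with access to inference costs could make.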


Challenges: Theoretical & Practical

What is the right language/algebra/interface?

– Completeness, soundness

– Expressiveness & ease of use

Query Processing & Optimization

– Inference is expensive!

– How to optimize & process probabilistic queries with relational and inference operators?

– How to index/summarize/sketch probabilistic data?

– Physical DB design (indexes, access structs, views, …)?

– CPU Intensive: Exploiting parallelism and many-core

Efficient hierarchical model learning & maintenance


Summary

HeisenData Engine: A base for “intelligent environment” application development

Handles real-world uncertainty and correlations

Pushes statistical learning tools into a DBMS

Sits between home storage management functions and “intelligent” applications


Thank you!

[email protected]

http://berkeley.intel-research.net/minos/