Lecture 15 15-829A/18-849B/95-811A/19-729A Internet-Scale Sensor Systems: Design and Policy Lecture 15 Sensor Databases & Data Stream Systems Phil Gibbons March 4, 2003


Page 1: Lecture 15 15-829A/18-849B/95-811A/19-729A Internet-Scale Sensor Systems: Design and Policy Lecture…

Lecture 15

15-829A/18-849B/95-811A/19-729A

Internet-Scale Sensor Systems: Design and Policy

Lecture 15
Sensor Databases & Data Stream Systems

Phil Gibbons
March 4, 2003


03-04-03 2 Lecture 15

Outline
• Sensor Databases
  • Madden et al., “The Design of an Acquisitional Query Processor for Sensor Networks”, to appear in SIGMOD’03
• Data Stream Systems
  • Babcock et al., “Models and Issues in Data Stream Systems”, PODS’02 survey talk


“The Design of an Acquisitional Query Processor for Sensor Networks”
• Latest paper on the TinyDB work out of U.C. Berkeley & Intel Research Berkeley
• Goal was to have you study the very latest research on sensor databases
  • Preliminary version. A more polished, camera-ready version will be available March 11 – I will post it.
• Thanks to Sam Madden for providing slides that I have adapted for use in part of this lecture.
  • Note: He is interviewing at CMU in April.

What did you think of this paper?


Acquisitional Query Processing
• What’s really new & different about (mote-based) sensor networks?
• This paper’s answer:
  • Long running queries on physically embedded devices that control when and where and with what frequency data is collected
  • Versus traditional systems where data is provided a priori
• For a distributed, embedded sensing environment, ACQP provides a framework for addressing issues of
  • When, where, and how often data is sensed/sampled
  • Which data is delivered


Context: Mica Motes
• Tiny Memory
  • 4KB RAM
  • 128KB program memory
• Limited Communication
  • Broadcast to any that hear it. Form ad-hoc routing tree
  • ~Ten 48-byte messages delivered per second
• Power consumption
  • Every bit of data transmitted by radio = 1000 CPU instructions
  • Deep sleeping is 4-10 times less power than when active
  • Can synchronize clocks with neighboring motes to within +/- 1 millisec: Ensure all awake at roughly the same time
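To make the radio-versus-CPU trade-off concrete, here is a back-of-the-envelope sketch that converts message traffic into equivalent CPU instructions using the two figures on this slide. The function name is illustrative, and the conversion assumes the per-bit ratio applies uniformly to every bit of a 48-byte message.

```python
INSTRUCTIONS_PER_BIT = 1000  # from the slide: one transmitted bit ~ 1000 CPU instructions
MESSAGE_BYTES = 48           # from the slide: message size on the Mica radio


def message_cost_in_instructions(n_messages=1, message_bytes=MESSAGE_BYTES):
    """Energy of sending n_messages, expressed as equivalent CPU instructions."""
    bits = n_messages * message_bytes * 8
    return bits * INSTRUCTIONS_PER_BIT


if __name__ == "__main__":
    print(message_cost_in_instructions(1))   # one message
    print(message_cost_in_instructions(10))  # one second of traffic at ~10 messages/s
```

Under these assumptions a single message costs as much as several hundred thousand instructions, which is why the rest of the lecture focuses on deciding what not to sample or send.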


Acquisitional Query Processing
• How does the user control acquisition?
  • Rates or lifetimes
  • Event-based triggers
• How should the query be processed?
  • Sampling as an operator, power-optimal ordering
  • Frequent events as joins
• Which nodes have relevant data?
  • Semantic Routing Tree for effective pruning
  • Nodes that are queried together route together
• Which samples should be transmitted?
  • Pick most “valuable”?
  • Adaptive transmission & sampling rates

Adapted from slides ©Sam Madden


Rate & Lifetime Queries
• Rate query
    SELECT nodeid, light, temp
    FROM sensors
    SAMPLE INTERVAL 1s FOR 10s
• Lifetime query
    SELECT …
    LIFETIME 30 days
  Estimate sampling rate that achieves this

    SELECT …
    LIFETIME 10 days
    MIN SAMPLE INTERVAL 1s
  May not be able to transmit all the data

Adapted from slides ©Sam Madden


Processing Lifetimes: Issues
• Provide formulas for estimating power consumption: set maximum per-node sampling rates
• What makes this difficult?
  • multiple sensing types (temp, accel) with different drain
  • estimating the selectivity of predicates
  • amount transmitted by a node varies widely
  • root is a bottleneck: all nodes' rates must correspond to it
  • aggregation vs. sending individual values
  • conditions change: multiple queries, burstiness, message losses
• What to do when a node can’t transmit all the data?

Adapted from slides ©Sam Madden
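As a rough illustration of the kind of formula this slide refers to, the sketch below estimates the shortest sampling interval a node can sustain for a target lifetime. The energy model (a fixed battery budget, a constant sleep draw, and a fixed cost per sample and per transmitted message) and every number in the usage lines are assumptions for illustration, not figures from the paper.

```python
def min_sample_interval_s(battery_j, lifetime_days, sleep_w,
                          sample_j, transmit_j, min_interval_s=0.0):
    """Shortest sampling interval (seconds) that still meets the lifetime.

    Assumed model: total energy = sleep_w * T + (sample_j + transmit_j) * (T / interval),
    where T is the lifetime in seconds. min_interval_s plays the role of
    MIN SAMPLE INTERVAL: the node never samples faster than that.
    """
    lifetime_s = lifetime_days * 24 * 3600
    # Energy left for sampling after paying the baseline sleep draw.
    budget_j = battery_j - sleep_w * lifetime_s
    if budget_j <= 0:
        raise ValueError("lifetime unreachable even without sampling")
    max_samples = budget_j / (sample_j + transmit_j)
    return max(lifetime_s / max_samples, min_interval_s)


if __name__ == "__main__":
    # Hypothetical node: 10 kJ battery, 1 mW asleep, 1 mJ per sample, 10 mJ per message.
    print(min_sample_interval_s(10_000, 30, 0.001, 0.001, 0.010))
    print(min_sample_interval_s(10_000, 30, 0.001, 0.001, 0.010, min_interval_s=5.0))
```

The difficulties listed on the slide (predicate selectivity, uneven forwarding load, the root bottleneck) are exactly the terms this simple model leaves out.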


Lifetime Based Queries

Is this experiment convincing?

Adapted from slides ©Sam Madden


Event Based Processing
• ACQP – want to initiate queries in response to events
    ON EVENT bird-detect(loc):
      SELECT AVG(s.light), AVG(s.temp), event.loc
      FROM sensors AS s
      WHERE dist(s.loc, event.loc) < 10m
      SAMPLE PERIOD 2s FOR 30s

Reports the average light and temperature level at sensors near a bird nest where a bird has been detected

What are the issues here? E.g., a new query instance is generated for as long as the bird is there

Adapted from slides ©Sam Madden


Event Based Processing

Single external interrupt

Adapted from slides ©Sam Madden



Power-Optimal Operator Ordering: Interleave Sampling + Selection
    SELECT light, mag
    FROM sensors
    WHERE pred1(mag) AND pred2(light)
    SAMPLE INTERVAL 1s
• Energy cost of sampling mag >> cost of sampling light
  • 1500 uJ vs. 90 uJ
• Candidate orderings:
  1. Sample light, sample mag, apply pred1, apply pred2
  2. Sample light, apply pred2, sample mag, apply pred1
  3. Sample mag, apply pred1, sample light, apply pred2
• Correct ordering (unless pred1 is very selective): 2

Adapted from slides ©Sam Madden
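The comparison between the three orderings can be checked with a small expected-cost calculation. This is a sketch under stated assumptions: the per-sample energies are the slide's 1500 uJ and 90 uJ, predicate evaluation is treated as free, and the selectivities (the probability that a tuple passes each predicate) are made-up inputs.

```python
SAMPLE_COST_UJ = {"light": 90.0, "mag": 1500.0}  # per-sample energy from the slide


def expected_cost_uj(order, selectivity):
    """Expected energy per tuple when each attribute's predicate is applied
    right after sampling it, so later samples are skipped for rejected tuples."""
    cost, p_alive = 0.0, 1.0
    for attr in order:
        cost += p_alive * SAMPLE_COST_UJ[attr]  # only sample if the tuple is still alive
        p_alive *= selectivity[attr]
    return cost


def plan_costs(selectivity):
    """Expected cost of the three orderings listed on the slide."""
    return {
        1: SAMPLE_COST_UJ["light"] + SAMPLE_COST_UJ["mag"],  # sample both, then filter
        2: expected_cost_uj(["light", "mag"], selectivity),  # cheap sensor first
        3: expected_cost_uj(["mag", "light"], selectivity),  # expensive sensor first
    }


if __name__ == "__main__":
    print(plan_costs({"light": 0.5, "mag": 0.5}))
    print(plan_costs({"light": 1.0, "mag": 0.01}))  # pred1(mag) very selective
```

With equal selectivities, ordering 2 is cheapest by a wide margin. Ordering 3 only wins when pred1 on mag rejects nearly everything while pred2 on light rejects almost nothing, which matches the slide's caveat.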


Event Query Batching
    ON EVENT E(nodeid)
    SELECT a
    FROM sensors AS s
    WHERE s.nodeid = e.nodeid
    SAMPLE INTERVAL d FOR k

Problem: Multiple outstanding queries (lots of samples)

Solution: Rewrite as a sliding window join between sensors and the last k seconds of detected events:
    SELECT s.a
    FROM sensors AS s, events AS e
    WHERE s.nodeid = e.nodeid
    AND e.type = E AND s.time - e.time <= k AND s.time > e.time
    SAMPLE INTERVAL d

If events are frequent, use join approach. Issues? Assumes regular occurrences: would like to handle burstiness

Adapted from slides ©Sam Madden
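The join condition in the rewritten query can be sketched as a streaming operator that buffers only the events from the last k seconds. This is an illustrative Python version, not TinyDB code: the tuple layouts are assumptions, both inputs are assumed sorted by time, and the `e.type = E` filter is assumed to have been applied already.

```python
from collections import deque


def window_join(samples, events, k):
    """Yield (sample_time, nodeid, a, event_time) for every sample/event pair
    with matching nodeid and 0 < sample_time - event_time <= k.

    samples: iterable of (time, nodeid, a), sorted by time (assumed layout)
    events:  iterable of (time, nodeid), sorted by time (assumed layout)
    """
    recent = deque()          # events that may still match future samples
    pending = iter(events)
    nxt = next(pending, None)
    for s_time, s_node, a in samples:
        # Admit events that happened strictly before this sample.
        while nxt is not None and nxt[0] < s_time:
            recent.append(nxt)
            nxt = next(pending, None)
        # Evict events that fell out of the k-second window.
        while recent and s_time - recent[0][0] > k:
            recent.popleft()
        for e_time, e_node in recent:
            if e_node == s_node:
                yield (s_time, s_node, a, e_time)


if __name__ == "__main__":
    samples = [(1, 7, 10), (2, 7, 11), (3, 7, 12), (6, 7, 13)]
    events = [(1, 7)]
    print(list(window_join(samples, events, k=2)))
```

A single always-running sampling loop serves every event in the window, instead of one query instance per event.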



Attribute Driven Topology Selection
• Observation: internal queries often over local area
  • Or some other subset of the network
  • E.g. regions with light value in [10,20]
• Idea: build topology for those queries based on values of range-selected attributes
  • For range queries
  • Relatively static trees
  • Maintenance cost

Adapted from slides ©Sam Madden


Attribute Driven Query Propagation

[Figure: nodes 1, 2, and 3 annotated with the intervals [1,10], [7,15], and [20,40]; node 4]

SELECT … WHERE a > 5 AND a < 12

Precomputed intervals = Semantic Routing Tree (SRT)

Early pruning

Adapted from slides ©Sam Madden


Attribute Driven Parent Selection

[Figure: nodes 1, 2, and 3 annotated with the intervals [1,10], [7,15], and [20,40]; node 4 with interval [3,6]]

[3,6] ∩ [1,10] = [3,6]

[3,6] ∩ [7,15] = ∅

[3,6] ∩ [20,40] = ∅

Even without intervals, expect that sending to parent with closest value will help

Adapted from slides ©Sam Madden
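Both uses of the intervals, pruning during query propagation and choosing a parent, reduce to interval intersection. The sketch below uses the intervals from the two slides. Two things are assumptions: attribute values are treated as integers, so `a > 5 AND a < 12` becomes the closed range [6, 11], and "pick the parent with the largest overlap" is one plausible selection rule rather than the rule stated in the paper.

```python
def intersect(a, b):
    """Intersection of two closed intervals (lo, hi), or None if they are disjoint."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def children_to_query(child_intervals, query):
    """Forward the query only to children whose interval overlaps the query range."""
    return [c for c, iv in child_intervals.items() if intersect(iv, query) is not None]


def choose_parent(parent_intervals, own):
    """Pick the candidate parent whose interval overlaps `own` the most
    (assumed rule); returns None when no candidate overlaps."""
    best, best_len = None, -1
    for parent, iv in parent_intervals.items():
        overlap = intersect(iv, own)
        if overlap is not None and overlap[1] - overlap[0] > best_len:
            best, best_len = parent, overlap[1] - overlap[0]
    return best


if __name__ == "__main__":
    intervals = {1: (1, 10), 2: (7, 15), 3: (20, 40)}
    print(children_to_query(intervals, (6, 11)))  # node 3 is pruned
    print(choose_parent(intervals, (3, 6)))       # only node 1 overlaps [3,6]
```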


Simulation Result

[Figure: Nodes Visited vs. Query Range. X-axis: Query Size as % of Value Range (0.001 to 1). Y-axis: # of Nodes Visited (400 = Max). Series: Best Case (Expected), Closest Parent, Nearest Value, Snooping, Random Parent]

(Random value distribution, 20x20 grid, ideal connectivity to (8) neighbors)

Adapted from slides ©Sam Madden



Adaptive Transmission Rates

[Figure: Sample Rate vs. Delivery Rate. X-axis: Samples Per Second (Per Mote), 0 to 16. Y-axis: Aggregate Delivery Rate (Packets/Second), 0 to 8. Series: 1 mote; 4 motes; 4 motes, adaptive]

Adaptive = 2x % successful transmissions

TinyDB monitors channel contention & backs off as needed

Adapted from slides ©Sam Madden


Prioritizing Data Delivery
• Score each item
• Send largest score
  • Out of order -> Priority Queue
• Discard or aggregate when buffer is full

[Figure: sample [1,2]]

Adapted from slides ©Sam Madden
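A bounded priority buffer of this kind can be sketched in a few lines. The class and method names are hypothetical, the overflow policy shown is "discard the lowest score" (the slide also mentions aggregating, which is not shown), and linear scans are used on the assumption that the buffer holds only a handful of items.

```python
class DeliveryQueue:
    """Bounded buffer that sends the highest-scoring item first and
    discards the lowest-scoring item when it overflows."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []  # list of (score, item) pairs

    def push(self, score, item):
        self.items.append((score, item))
        if len(self.items) > self.capacity:
            # Buffer full: evict the least valuable entry.
            self.items.remove(min(self.items, key=lambda pair: pair[0]))

    def pop_best(self):
        """Remove and return the item with the largest score (may be out of time order)."""
        best = max(self.items, key=lambda pair: pair[0])
        self.items.remove(best)
        return best[1]


if __name__ == "__main__":
    q = DeliveryQueue(capacity=2)
    q.push(4, "a")
    q.push(13, "b")
    q.push(2, "c")       # overflow: "c" has the lowest score and is dropped
    print(q.pop_best())  # highest score first
    print(q.pop_best())
```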


Choosing Data To Send

Delta encoding

[Figure: Time vs. Value plot (x-axis: Time, y-axis: Value); points are labeled (time, value); first sample [1,2]]

Adapted from slides ©Sam Madden


Choosing Data To Send

Delta encoding

[Figure: Time vs. Value plot with samples [1,2], [2,6], [3,15], [4,1]]

|2-6| = 4

|2-15| = 13

|2-4| = 2

Select which of the 3 to send

Adapted from slides ©Sam Madden


Choosing Data To Send

Delta encoding

[Figure: Time vs. Value plot with samples [1,2], [2,6], [3,15], [4,1]]

|2-6| = 4

|15-4| = 11

Keep selecting until hit max delivery rate

Adapted from slides ©Sam Madden


Choosing Data To Send

Delta encoding

[Figure: Time vs. Value plot with samples [1,2], [2,6], [3,15], [4,1]]

Adapted from slides ©Sam Madden


Choosing Data To Send

Delta encoding

[Figure: Time vs. Value plot with samples [1,2], [2,6], [3,15], [4,1]]

If manage to send all

Adapted from slides ©Sam Madden
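The selection sequence on the preceding slides can be replayed with a short sketch. The scoring rule here is an assumption read off the slides' arithmetic: a pending sample is scored by the absolute difference between its value and the value of the latest already-sent sample that precedes it in time. The last sample is given value 4, again an assumption, chosen so that the scores match the slides (|2-4| = 2 and |15-4| = 11).

```python
def delta_score(sample, sent):
    """Score = |value - value of the latest sent sample earlier in time|."""
    t, v = sample
    earlier = [sv for st, sv in sorted(sent) if st < t]
    # With nothing sent before it, a sample is maximally informative.
    return abs(v - earlier[-1]) if earlier else float("inf")


def delta_send_order(sent, pending):
    """Repeatedly send the pending sample with the largest delta score."""
    sent, pending, order = list(sent), list(pending), []
    while pending:
        best = max(pending, key=lambda s: delta_score(s, sent))
        pending.remove(best)
        sent.append(best)
        order.append(best)
    return order


if __name__ == "__main__":
    already_sent = [(1, 2)]
    waiting = [(2, 6), (3, 15), (4, 4)]  # (time, value); value 4 is assumed
    print(delta_send_order(already_sent, waiting))
```

In a real node the loop would stop once the maximum delivery rate is reached, so the lowest-scoring samples are the ones that never get sent.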


Delta + Adaptivity
• 8 element queue
• 4 motes transmitting different signals
• 8 samples / sec / mote

Adapted from slides ©Sam Madden


ACQP Summary
• Lifetime & event based queries
  • User preferences for when data is acquired
• Optimizations for
  • Order of sampling
  • Events vs. joins
• Semantic Routing Tree
  • Query dissemination
• Runtime prioritization
  • Adaptive rate control
  • Which samples to send

Adapted from slides ©Sam Madden



Models & Issues in Data Stream Systems
• Invited survey paper to PODS 2002
• Good overview of the basics & the issues
  • But with a definite Stanford bias
• Data arrives in multiple, continuous, rapid, time-varying data streams
  • Can have continuous queries
  • Data Stream Management Systems

What did you think of this paper?

This part of the lecture does not follow the paper


Data Stream Systems
• Introduction
• Research in Synopses for Data Streams (models, algorithms, lower bounds)
• Research in Data Stream Management Systems


Processing Data Streams: Motivation
• Many applications generate streams of data
  • Performance measurements in network monitoring and traffic management
  • Call detail records in telecommunications
  • Transactions in retail chains, ATM operations in banks
  • Log records generated by Web Servers
  • Sensor network data
• Application characteristics
  • Massive volumes of data (several terabytes)
  • Records arrive at a rapid rate
• Goal: Mine patterns, process queries and compute statistics on data streams in real-time

Adapted from slides ©Rajeev Motwani


Example: Network Management

[Figure: Network Operations Center; Network; Measurements; Alarms]

Massive amounts of rapidly-arriving data at each node

Adapted from slides ©Rajeev Motwani



Data Stream Model
• A data stream is a sequence of elements: $e_1, \ldots, e_n$
• Stream processing goals
  • Limited memory for storing synopsis, e.g., O(n), O(log n)
  • Fast synopsis update time (per element), e.g., O(1)
  • Fast query time, e.g., O(n), O(log n)

[Figure: Data Stream; Stream Processing Engine with Synopsis in Memory; (Approximate) Answer]

Adapted from slides ©Rajeev Motwani
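As a minimal illustration of the model, the sketch below keeps a constant-size synopsis (count, sum, minimum, maximum) that is updated in O(1) per element and queried in O(1) without storing the stream. The class name and the choice of statistics are illustrative assumptions. Real synopses such as histograms, samples, or sketches follow the same update/query shape.

```python
class RunningStats:
    """Constant-size synopsis: count, sum, min, and max of the stream so far."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.minimum = None
        self.maximum = None

    def update(self, x):
        """Process one stream element in O(1) time; the element is not stored."""
        self.count += 1
        self.total += x
        self.minimum = x if self.minimum is None else min(self.minimum, x)
        self.maximum = x if self.maximum is None else max(self.maximum, x)

    def mean(self):
        """Answer a query from the synopsis alone."""
        return self.total / self.count if self.count else None


if __name__ == "__main__":
    stats = RunningStats()
    for element in [3, 1, 4, 1, 5]:
        stats.update(element)
    print(stats.count, stats.mean(), stats.minimum, stats.maximum)
```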


Merged Data Streams Model

[Figure: Data Streams; Stream Processing Engine with Synopsis in Memory; (Approximate) Answer]

Multiple data streams to a single party/agent
• Arbitrary interleaving of streams
• Same goals as before (per stream)

Adapted from slides ©Rajeev Motwani


Distributed Data Streams Model

[Figure: several parties, each with its own Data Stream, Stream Processing Engine, and Synopsis in Memory; when a query is requested, the Analysis Front End produces the (Approximate) Answer]

+ Avoids sending streams to Analysis Front End

[G, Tirthapura, SPAA’01]
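The point of the distributed model is that only synopses, never the raw streams, travel to the front end. A minimal sketch follows, assuming a deliberately simple mergeable synopsis (count, sum, max). The function names and the dictionary layout are hypothetical, and the synopsis is much simpler than those studied in the cited paper.

```python
def local_synopsis(stream):
    """Summarize one party's stream; in practice this is maintained incrementally."""
    count, total, largest = 0, 0.0, None
    for x in stream:
        count += 1
        total += x
        largest = x if largest is None else max(largest, x)
    return {"count": count, "sum": total, "max": largest}


def answer_query(synopses):
    """Front end: combine the parties' synopses when a query is requested."""
    count = sum(s["count"] for s in synopses)
    total = sum(s["sum"] for s in synopses)
    maxima = [s["max"] for s in synopses if s["max"] is not None]
    return {
        "count": count,
        "mean": total / count if count else None,
        "max": max(maxima) if maxima else None,
    }


if __name__ == "__main__":
    site_a = local_synopsis([1, 2, 3])
    site_b = local_synopsis([10, 20])
    print(answer_query([site_a, site_b]))
```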


Adversarial Stream Inputs
• Adversary controls input values and order
  • No distributional assumptions on the inputs
  • Past may not be representative of the future
  • Typically, do know the input domain
• Randomized algorithms
  • Have oracle for uniformly random numbers
  • Would like to minimize the number of oracle calls
  • Adversary does not adapt to these random numbers


Coping With Memory Limitations
• Many queries cannot be answered over streams, due to the memory limitations
  • e.g., see proofs in [Arasu et al, PODS’02]
• However, often a detailed, exact answer over streams is not interesting:
  • Prefer summarized data (aggregates)
  • Prefer to focus only on recent data
  • Suffices to get the leading digits of aggregates correct

=> Keys to staying within the memory limitations


Sliding Window
• Maintain the aggregate / statistic over a sliding window of the N most recent stream elements
  • Motivation: Only the most recent data is important

Position: 1 2 … 20 21 22 23 24 25 26 27 28 29 30
Stream:   0 1 …  1  0  1  0  0  1  1  0  1  0  0

N = 10, window over positions 20 to 29: Number of 1’s = 5

N = 10, window over positions 21 to 30: Number of 1’s = 4

[Datar, Gionis, Indyk, Motwani, SODA’02]
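The two counts on the slide can be reproduced with an exact sliding-window counter. This is only a baseline sketch: it stores all N bits, so it uses O(N) memory, whereas the cited paper is about approximating the count in far less space. It is not that algorithm, and the class name is illustrative.

```python
from collections import deque


class SlidingWindowCount:
    """Exact count of 1's among the N most recent bits (O(N) memory baseline)."""

    def __init__(self, n):
        self.window = deque(maxlen=n)
        self.ones = 0

    def add(self, bit):
        # When the window is full, the oldest bit is about to be evicted.
        if len(self.window) == self.window.maxlen:
            self.ones -= self.window[0]
        self.window.append(bit)
        self.ones += bit

    def count(self):
        return self.ones


if __name__ == "__main__":
    counter = SlidingWindowCount(10)
    for bit in [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]:  # positions 20..29 from the slide
        counter.add(bit)
    print(counter.count())
    counter.add(0)                               # position 30 arrives
    print(counter.count())
```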


New Stream Algorithms for
• Histograms
  • Equi-Width Histograms (Quantiles)
  • Most popular items, V-Opt Histograms
  • Wavelets
• Data Mining
  • Stream Clustering (e.g. k-medians)
  • Decision Trees
• Frequency moments, Lp Norms of two streams
• Relational DB operators
  • Join size estimation

Papers in STOC, FOCS, SIGMOD, VLDB, etc.

Also: Lower Bounds



Traditional DB Management System

[Figure: User/Application; Query; Result; Database Management System (DBMS) with Query Optimizer, Query Processor, and Loader]

Adapted from slides ©Rajeev Motwani


Data Stream Management System

[Figure: User/Application; Register Query; Results; Data Stream Management System (DSMS) with Stream Query Processor and Scratch Space (Memory and/or Disk)]

Centralized Processing: I.e., Merged Streams Model

Adapted from slides ©Rajeev Motwani


Related Database Technology
• Triggers on Conventional Databases / Active Databases: handling stream ordering/rate, scaling/generality for triggers
• Main-Memory Databases: handling ordering/rate, better for read-only/query-intensive
• Publish/Subscribe Systems: handling stream ordering, event-filtering only, dissemination focus
• Materialized Views: handling stream ordering, no streaming output
• Sequence/Temporal/Timeseries Databases: represents time/ordering in stored relations
• Realtime Databases: transactions with deadlines

Adapted from slides ©Rajeev Motwani


STREAM Architecture (Stanford)

[Figure: Input streams; Query Plans with operators marked Waiting Op, Ready Op, and Running Op; Synopses; Historical Storage; Output streams]

• Users issue continuous and ad-hoc queries
• Administrator monitors query execution and adjusts run-time parameters
• Applications register continuous queries

Adapted from slides ©Rajeev Motwani


Stream DB Projects
• Amazon/Cougar (Cornell) – sensors
• Aurora (Brown/MIT) – sensor monitoring, dataflow
• Hancock (AT&T) – telecom streams
• Niagara (OGI/Wisconsin) – Internet XML databases
• OpenCQ (Georgia) – triggers, incr. view maintenance
• Stream (Stanford) – general-purpose DSMS
• Tapestry (Xerox) – pub/sub content-based filtering
• Telegraph (Berkeley) – adaptive engine for sensors
• Tribeca (Bellcore) – network monitoring

Adapted from slides ©Rajeev Motwani


Summary: DBMS versus DSMS

DBMS                                                       DSMS
• Persistent relations                                     • Transient streams
• One-time queries                                         • Continuous queries
• Random access                                            • Sequential access
• “Unbounded” disk store                                   • Bounded main memory
• Only current state matters                               • History/arrival-order is critical
• Passive repository                                       • Active stores
• Relatively low update rate                               • Possibly multi-GB arrival rate
• No real-time services                                    • Real-time requirements
• Assume precise data                                      • Data stale/imprecise
• Access plan determined by optimizer, physical DB design  • Unpredictable/variable data arrival and characteristics

Adapted from slides ©Rajeev Motwani


The Bigger Picture: When & Why
• Data streams?


Important Scenarios
1. Static / offline
   • preprocessing time
   • query time
   • synopsis size
2. Dynamic / online
   • update time for new data
   • query time
   • synopsis size
3. Data Stream
   • update time, query time
   • synopsis size
   See all the data thus far, maintain synopsis, answer queries

[Figure: three panels. Static: Full Data Set on Disks, memory. Dynamic: Full Data Set on Disks, memory, New data. Data Stream: memory, New data. Label: Synopses inside]

[G, Matias ‘98]


The Bigger Picture
• Data streams?
• Single, Merged, or Distributed data streams?
• Continuous queries over distributed data streams?
• Adversarial inputs?
• Sliding windows?
• Applicability to IrisNet?


Next Lecture

Tuesday March 11

Adrian Perrig on Key distribution & Trust bootstrapping