lecture 15 15-829a/18-849b/95-811a/19-729a internet-scale sensor systems: design and policy...
Lecture 15
15-829A/18-849B/95-811A/19-729A
Internet-Scale Sensor Systems: Design and Policy
Sensor Databases & Data Stream Systems
Phil Gibbons
March 4, 2003
03-04-03 2 Lecture 15
Outline
• Sensor Databases
    • Madden et al., “The Design of an Acquisitional Query Processor for Sensor Networks”, to appear in SIGMOD’03
• Data Stream Systems
    • Babcock et al., “Models and Issues in Data Stream Systems”, PODS’02 survey talk

“The Design of an Acquisitional Query Processor for Sensor Networks”
• Latest paper on the TinyDB work out of U.C. Berkeley & Intel Research Berkeley
• Goal was to have you study the very latest research on sensor databases
    • Preliminary version. A more polished, camera-ready version will be available March 11; I will post it.
• Thanks to Sam Madden for providing slides that I have adapted for use in part of this lecture.
    • Note: He is interviewing at CMU in April.
What did you think of this paper?

Acquisitional Query Processing
• What’s really new & different about (mote-based) sensor networks?
• This paper’s answer:
    • Long-running queries on physically embedded devices that control when, where, and with what frequency data is collected
    • Versus traditional systems, where data is provided a priori
• For a distributed, embedded sensing environment, ACQP provides a framework for addressing issues of
    • When, where, and how often data is sensed/sampled
    • Which data is delivered

Context: Mica Motes
• Tiny memory
    • 4KB RAM
    • 128KB program memory
• Limited communication
    • Broadcast to any that hear it. Form ad-hoc routing tree
    • ~Ten 48-byte messages delivered per second
• Power consumption
    • Every bit of data transmitted by radio costs as much as 1000 CPU instructions
    • Deep sleep uses 4-10 times less power than active mode
    • Can synchronize clocks with neighboring motes to within +/- 1 millisec: ensures all are awake at roughly the same time

Acquisitional Query Processing
• How does the user control acquisition?
    • Rates or lifetimes
    • Event-based triggers
• How should the query be processed?
    • Sampling as an operator, power-optimal ordering
    • Frequent events as joins
• Which nodes have relevant data?
    • Semantic Routing Tree for effective pruning
    • Nodes that are queried together route together
• Which samples should be transmitted?
    • Pick most “valuable”?
    • Adaptive transmission & sampling rates
Adapted from slides ©Sam Madden

Rate & Lifetime Queries
• Rate query
    SELECT nodeid, light, temp
    FROM sensors
    SAMPLE INTERVAL 1s FOR 10s
• Lifetime query
    SELECT …
    LIFETIME 30 days
    • May not be able to transmit all the data
    • Estimate sampling rate that achieves this
    SELECT …
    LIFETIME 10 days
    MIN SAMPLE INTERVAL 1s
Adapted from slides ©Sam Madden

Processing Lifetimes: Issues
• Provide formulas for estimating power consumption: set maximum per-node sampling rates
• What makes this difficult?
    • multiple sensing types (temp, accel) with different drain
    • estimating the selectivity of predicates
    • amount transmitted by a node varies widely
    • root is a bottleneck: all nodes’ rates must correspond to it
    • aggregation vs. sending individual values
    • conditions change: multiple queries, burstiness, message losses
• What to do when a node can’t transmit all the data
Adapted from slides ©Sam Madden
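
The per-node rate estimate behind a LIFETIME clause can be pictured as a simple energy budget. The following is a minimal sketch, not TinyDB's actual formula: the function names, parameters, and numbers are hypothetical, and it ignores the complications listed above (predicate selectivity, routing load, message losses).

```python
def max_sample_rate(battery_joules, lifetime_days, joules_per_sample, idle_watts):
    """Highest samples/second a node can sustain for the requested lifetime.

    All parameters are illustrative assumptions:
      battery_joules    - usable energy in the battery
      lifetime_days     - value of the LIFETIME clause
      joules_per_sample - energy to acquire and transmit one sample
      idle_watts        - baseline drain while sleeping
    """
    lifetime_seconds = lifetime_days * 24 * 3600
    budget_watts = battery_joules / lifetime_seconds  # average power we may spend
    spare_watts = budget_watts - idle_watts           # what is left for sampling
    if spare_watts <= 0:
        return 0.0                                    # lifetime is unreachable
    return spare_watts / joules_per_sample


def sample_interval(rate_hz, min_interval_s=None):
    """Convert a rate to an interval, honoring an optional MIN SAMPLE INTERVAL."""
    if rate_hz <= 0:
        return float("inf")
    interval = 1.0 / rate_hz
    if min_interval_s is not None:
        interval = max(interval, min_interval_s)
    return interval
```

A real estimator would also have to account for the root bottleneck: a node's rate is limited by what its ancestors in the routing tree can forward, not only by its own battery.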

Lifetime Based Queries
Is this experiment convincing?
Adapted from slides ©Sam Madden

Event Based Processing
• ACQP: want to initiate queries in response to events
    ON EVENT bird-detect(loc):
    SELECT AVG(s.light), AVG(s.temp), event.loc
    FROM sensors AS s
    WHERE dist(s.loc, event.loc) < 10m
    SAMPLE PERIOD 2s FOR 30s
• Reports the average light and temperature level at sensors near a bird nest where a bird has been detected
What are the issues here?
E.g., New query instance generated for as long as bird is there
Adapted from slides ©Sam Madden

Event Based Processing
Single external interrupt
Adapted from slides ©Sam Madden

Power-Optimal Operator Ordering: Interleave Sampling + Selection
    SELECT light, mag
    FROM sensors
    WHERE pred1(mag) AND pred2(light)
    SAMPLE INTERVAL 1s
• Energy cost of sampling mag >> cost of sampling light: 1500 uJ vs. 90 uJ
• Candidate orderings:
    1. Sample light, sample mag, apply pred1, apply pred2
    2. Sample light, apply pred2, sample mag, apply pred1
    3. Sample mag, apply pred1, sample light, apply pred2
• Correct ordering (unless pred1 is very selective): 2
Adapted from slides ©Sam Madden
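
To see why interleaving sampling with selection pays off, one can compare the expected energy of the three orderings. This is a rough sketch under stated assumptions: the selectivities (fraction of tuples passing each predicate) are hypothetical inputs, the predicates are treated as independent, and the cost of evaluating a predicate is taken to be zero. Only the 1500 uJ and 90 uJ sampling costs come from the slide.

```python
LIGHT_UJ = 90.0    # energy to sample light (from the slide)
MAG_UJ = 1500.0    # energy to sample mag (from the slide)


def expected_energy_uj(plan, sel_pred1, sel_pred2):
    """Expected energy per epoch for each ordering on the slide.

    sel_pred1: assumed fraction of tuples passing pred1(mag)
    sel_pred2: assumed fraction of tuples passing pred2(light)
    """
    if plan == 1:   # sample both sensors, then filter
        return LIGHT_UJ + MAG_UJ
    if plan == 2:   # sample light, filter, sample mag only for survivors
        return LIGHT_UJ + sel_pred2 * MAG_UJ
    if plan == 3:   # sample mag, filter, sample light only for survivors
        return MAG_UJ + sel_pred1 * LIGHT_UJ
    raise ValueError("plan must be 1, 2, or 3")


def cheapest_plan(sel_pred1, sel_pred2):
    """Return the ordering with the lowest expected energy."""
    return min((1, 2, 3), key=lambda p: expected_energy_uj(p, sel_pred1, sel_pred2))
```

With both selectivities at 0.5, ordering 2 wins easily. Ordering 3 only becomes competitive when pred1 rejects nearly everything while pred2 passes nearly everything, which matches the slide's caveat.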

Event Query Batching
    ON EVENT E(nodeid)
    SELECT a
    FROM sensors AS s
    WHERE s.nodeid = e.nodeid
    SAMPLE INTERVAL d FOR k
• Problem: Multiple outstanding queries (lots of samples)
• Solution: Rewrite as a sliding window join between sensors and the last k seconds of detected events:
    SELECT s.a
    FROM sensors AS s, events AS e
    WHERE s.nodeid = e.nodeid
    AND e.type = E
    AND s.time - e.time <= k AND s.time > e.time
    SAMPLE INTERVAL d
• If events are frequent, use join approach. Issues?
    • Assumes regular occurrences: would like to handle burstiness
Adapted from slides ©Sam Madden
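
The join rewrite can be illustrated with a small in-memory sketch. It is not how TinyDB implements the join on a mote; the tuple layouts and the function name are assumptions made for illustration. It keeps only events from the last k seconds and emits a sample once per matching event, mirroring the WHERE clause above.

```python
from collections import deque


def window_join(samples, events, k):
    """Join samples against events seen in the previous k seconds.

    samples: list of (time, nodeid, a), sorted by time (assumed layout)
    events:  list of (time, nodeid), sorted by time (assumed layout)
    Emits one row per (sample, event) pair with the same nodeid and
    0 < s.time - e.time <= k.
    """
    window = deque()   # events still within k seconds
    out = []
    next_event = 0
    for s_time, s_node, a in samples:
        # Admit events that happened strictly before this sample.
        while next_event < len(events) and events[next_event][0] < s_time:
            window.append(events[next_event])
            next_event += 1
        # Evict events that are now older than k seconds.
        while window and s_time - window[0][0] > k:
            window.popleft()
        for e_time, e_node in window:
            if e_node == s_node:
                out.append((s_time, s_node, a))
    return out
```

The point of the rewrite is that a single continuous query with bounded state (the event window) replaces many simultaneously running per-event queries.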

Attribute Driven Topology Selection
• Observation: internal queries often over local area
    • Or some other subset of the network
    • E.g. regions with light value in [10,20]
• Idea: build topology for those queries based on values of range-selected attributes
    • For range queries
    • Relatively static trees
        • Maintenance cost
Adapted from slides ©Sam Madden

Attribute Driven Query Propagation
    SELECT … WHERE a > 5 AND a < 12
[Figure: nodes 1, 2, 3 labeled with intervals [1,10], [7,15], [20,40]; node 4]
• Precomputed intervals = Semantic Routing Tree (SRT)
• Early pruning
Adapted from slides ©Sam Madden

Attribute Driven Parent Selection
[Figure: nodes 1, 2, 3 labeled with intervals [1,10], [7,15], [20,40]; node 4 labeled with interval [3,6]]
    [3,6] ∩ [1,10] = [3,6]
    [3,6] ∩ [7,15] = ∅
    [3,6] ∩ [20,40] = ∅
• Even without intervals, expect that sending to parent with closest value will help
Adapted from slides ©Sam Madden
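
Both uses of the intervals, pruning a query and picking a parent, reduce to interval intersection. The sketch below uses the intervals from the slides; the function names and the tie-breaking rule (prefer the widest overlap) are assumptions, not the paper's exact policy.

```python
def intersect(a, b):
    """Intersection of two closed intervals (lo, hi), or None if empty."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def overlaps_query(interval, q_lo, q_hi):
    """True if some value in the interval can satisfy q_lo < a < q_hi."""
    return interval[1] > q_lo and interval[0] < q_hi


def nodes_to_forward(intervals, q_lo, q_hi):
    """Early pruning: forward the query only to subtrees that may match."""
    return [node for node, iv in intervals.items() if overlaps_query(iv, q_lo, q_hi)]


def choose_parent(child_interval, candidates):
    """Pick the candidate parent whose interval overlaps the child's the most.

    Returns None when no candidate overlaps (an assumed fallback; a real
    system would then fall back to some other rule such as link quality).
    """
    best, best_width = None, -1
    for node, iv in candidates.items():
        common = intersect(child_interval, iv)
        if common is not None and common[1] - common[0] > best_width:
            best, best_width = node, common[1] - common[0]
    return best
```

For the query `a > 5 AND a < 12`, only the subtrees with intervals [1,10] and [7,15] need to receive the query; the [20,40] subtree is pruned without any of its nodes being contacted.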

Simulation Result
[Figure: Nodes Visited vs. Query Range. x-axis: Query Size as % of Value Range (0.001 to 1); y-axis: # of Nodes Visited (400 = Max). Series: Best Case (Expected), Closest Parent, Nearest Value, Snooping, Random Parent]
(Random value distribution, 20x20 grid, ideal connectivity to (8) neighbors)
Adapted from slides ©Sam Madden

Adaptive Transmission Rates
[Figure: Sample Rate vs. Delivery Rate. x-axis: Samples Per Second (Per Mote), 0 to 16; y-axis: Aggregate Delivery Rate (Packets/Second), 0 to 8. Series: 1 mote; 4 motes; 4 motes, adaptive]
• Adaptive = 2x % successful transmissions
• TinyDB monitors channel contention & backs off as needed
Adapted from slides ©Sam Madden

Prioritizing Data Delivery
• Score each item
• Send largest score
    • Out of order -> Priority Queue
• Discard or aggregate when buffer is full
[1,2]
Adapted from slides ©Sam Madden

Choosing Data To Send
• Delta encoding
[Figure: Time vs. Value plot of (time, value) samples; first sample [1,2]]
Adapted from slides ©Sam Madden

Choosing Data To Send
• Delta encoding
[Figure: Time vs. Value plot; samples [1,2], [2,6], [3,15], [4,1]]
    |2-6| = 4
    |2-15| = 13
    |2-4| = 2
• Select which of the 3 to send
Adapted from slides ©Sam Madden

Choosing Data To Send
• Delta encoding
[Figure: Time vs. Value plot; samples [1,2], [2,6], [3,15], [4,1]]
    |2-6| = 4
    |15-4| = 11
• Keep selecting until hit max delivery rate
Adapted from slides ©Sam Madden

Choosing Data To Send
• Delta encoding
[Figure: Time vs. Value plot; samples [1,2], [2,6], [3,15], [4,1]]
Adapted from slides ©Sam Madden

Choosing Data To Send
• Delta encoding
[Figure: Time vs. Value plot; samples [1,2], [2,6], [3,15], [4,1]]
• If manage to send all
Adapted from slides ©Sam Madden
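
The selection sequence in these slides can be sketched as a greedy loop. This is an illustrative reading, not TinyDB's code: it assumes each pending (time, value) sample is scored by how far its value is from the most recent already-sent sample that precedes it in time, and that scores are recomputed after every send. The function name and the fallback score for a sample with no predecessor are hypothetical.

```python
def delta_order(sent, candidates, budget):
    """Greedily choose up to `budget` samples to transmit.

    sent:       (time, value) samples already delivered
    candidates: (time, value) samples waiting in the buffer
    Score (assumed): |value - value of the latest sent sample before it|.
    """
    sent = sorted(sent)
    pending = sorted(candidates)
    chosen = []

    def score(sample):
        t, v = sample
        earlier = [sv for st, sv in sent if st < t]
        # With no earlier sent sample, fall back to |v| (an arbitrary choice).
        return abs(v - earlier[-1]) if earlier else abs(v)

    while pending and len(chosen) < budget:
        best = max(pending, key=score)   # largest delta goes first
        pending.remove(best)
        chosen.append(best)
        sent.append(best)
        sent.sort()                      # keep sent samples in time order
    return chosen
```

With the sample [1,2] already sent and [2,6], [3,15], [4,1] pending, the spike at time 3 is chosen first, so a receiver that gets only one more sample still sees the largest change in the signal.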

Delta + Adaptivity
• 8 element queue
• 4 motes transmitting different signals
• 8 samples / sec / mote
Adapted from slides ©Sam Madden

ACQP Summary
• Lifetime & event based queries
    • User preferences for when data is acquired
• Optimizations for
    • Order of sampling
    • Events vs. joins
• Semantic Routing Tree
    • Query dissemination
• Runtime prioritization
    • Adaptive rate control
    • Which samples to send
Adapted from slides ©Sam Madden

Models & Issues in Data Stream Systems
• Invited survey paper to PODS 2002
• Good overview of the basics & the issues
    • But with a definite Stanford bias
• Data arrives in multiple, continuous, rapid, time-varying data streams
    • Can have continuous queries
    • Data Stream Management Systems
What did you think of this paper?
This part of the lecture does not follow the paper

Data Stream Systems
• Introduction
• Research in Synopses for Data Streams (models, algorithms, lower bounds)
• Research in Data Stream Management Systems

Processing Data Streams: Motivation
• Many applications generate streams of data
    • Performance measurements in network monitoring and traffic management
    • Call detail records in telecommunications
    • Transactions in retail chains, ATM operations in banks
    • Log records generated by Web servers
    • Sensor network data
• Application characteristics
    • Massive volumes of data (several terabytes)
    • Records arrive at a rapid rate
• Goal: Mine patterns, process queries and compute statistics on data streams in real-time
Adapted from slides ©Rajeev Motwani

Example: Network Management
[Figure labels: Network Operations Center; Network; Measurements; Alarms]
• Massive amounts of rapidly-arriving data at each node
Adapted from slides ©Rajeev Motwani

Data Stream Model
• A data stream is a sequence of elements: e_1, ..., e_n
• Stream processing goals
    • Limited memory for storing synopsis, e.g., O(n), O(log n)
    • Fast synopsis update time (per element), e.g., O(1)
    • Fast query time, e.g., O(n), O(log n)
[Figure: Data Stream -> Stream Processing Engine (Synopsis in Memory) -> (Approximate) Answer]
Adapted from slides ©Rajeev Motwani
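
As a deliberately simple illustration of the model, the sketch below keeps a constant-size synopsis with O(1) update and O(1) query time. The class and its choice of statistics are assumptions for illustration; the synopses studied in this line of research (histograms, samples, sketches) answer much richer queries approximately.

```python
class RunningSynopsis:
    """Constant-memory summary of a stream: count, sum, min, max."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.lo = None
        self.hi = None

    def update(self, x):
        """Fold one stream element into the synopsis in O(1) time."""
        self.count += 1
        self.total += x
        self.lo = x if self.lo is None else min(self.lo, x)
        self.hi = x if self.hi is None else max(self.hi, x)

    def mean(self):
        """Answer a query from the synopsis alone; the stream is not stored."""
        if self.count == 0:
            raise ValueError("no elements seen yet")
        return self.total / self.count
```

The engine sees each element once, updates the synopsis, and discards the element; every later query is answered from the synopsis.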

Merged Data Streams Model
[Figure: Data Streams -> Stream Processing Engine (Synopsis in Memory) -> (Approximate) Answer]
• Multiple data streams to a single party/agent
    • Arbitrary interleaving of streams
    • Same goals as before (per stream)
Adapted from slides ©Rajeev Motwani

Distributed Data Streams Model
[Figure: several Data Streams, each with its own Stream Processing Engine (Synopsis in Memory); when a query is requested, the Analysis Front End produces the (Approximate) Answer]
• + Avoids sending streams to Analysis Front End
[G, Tirthapura, SPAA’01]

Adversarial Stream Inputs
• Adversary controls input values and order
    • No distributional assumptions on the inputs
    • Past may not be representative of the future
    • Typically, do know the input domain
• Randomized algorithms
    • Have oracle for uniformly random numbers
    • Would like to minimize the number of oracle calls
    • Adversary does not adapt to these random numbers

Coping With Memory Limitations
• Many queries cannot be answered over streams, due to the memory limitations
    • e.g., see proofs in [Arasu et al, PODS’02]
• However, often a detailed, exact answer over streams is not interesting:
    • Prefer summarized data (aggregates)
    • Prefer to focus only on recent data
    • Suffices to get the leading digits of aggregates correct
=> Keys to staying within the memory limitations

Sliding Window
• Maintain the aggregate / statistic over a sliding window of the N most recent stream elements
    • Motivation: Only the most recent data is important
    Position: 1 2 … 20 21 22 23 24 25 26 27 28 29 | 30
    Stream:   0 1 …  1  0  1  0  0  1  1  0  1  0 |  0
• N = 10, window over positions 20-29: Number of 1’s = 5
• N = 10, window over positions 21-30 (after element 30 arrives): Number of 1’s = 4
[Datar, Gionis, Indyk, Motwani, SODA’02]
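
The window example can be reproduced with an exact counter that stores the last N bits. Note that this is the straightforward O(N)-memory baseline, written here only to make the window semantics concrete; the cited work is about approximating the count with far less memory, which this sketch does not attempt. The class name is hypothetical.

```python
from collections import deque


class WindowOnesCounter:
    """Exact count of 1's among the N most recent stream bits."""

    def __init__(self, n):
        self.n = n
        self.window = deque()
        self.ones = 0

    def add(self, bit):
        """Slide the window forward by one element and return the new count."""
        if len(self.window) == self.n:
            self.ones -= self.window.popleft()   # oldest bit leaves the window
        self.window.append(bit)
        self.ones += bit
        return self.ones


# Positions 20..29 from the slide, followed by position 30.
counter = WindowOnesCounter(10)
for bit in [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]:
    counter.add(bit)
count_before = counter.ones       # window covers positions 20-29
count_after = counter.add(0)      # window covers positions 21-30
```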

New Stream Algorithms for
• Histograms
    • Equi-Depth Histograms (Quantiles)
    • Most popular items, V-Opt Histograms
    • Wavelets
• Data Mining
    • Stream Clustering (e.g. k-medians)
    • Decision Trees
• Frequency moments, Lp Norms of two streams
• Relational DB operators
    • Join size estimation
Papers in STOC, FOCS, SIGMOD, VLDB, etc.
Also Lower Bounds

Traditional DB Management System
[Figure labels: User/Application; Query; Result; Database Management System (DBMS) with Query Optimizer and Query Processor; Loader]
Adapted from slides ©Rajeev Motwani

Data Stream Management System
[Figure labels: User/Application; Register Query; Results; Data Stream Management System (DSMS) with Stream Query Processor and Scratch Space (Memory and/or Disk)]
• Centralized processing, i.e., Merged Streams Model
Adapted from slides ©Rajeev Motwani

Related Database Technology
• Triggers on Conventional Databases / Active Databases: handling stream ordering/rate, scaling/generality for triggers
• Main-Memory Databases: handling ordering/rate, better for read-only/query-intensive
• Publish/Subscribe Systems: handling stream ordering, event-filtering only, dissemination focus
• Materialized Views: handling stream ordering, no streaming output
• Sequence/Temporal/Timeseries Databases: represents time/ordering in stored relations
• Realtime Databases: transactions with deadlines
Adapted from slides ©Rajeev Motwani

STREAM Architecture (Stanford)
[Figure labels: Input streams; Output streams; Query Plans with Waiting Op, Ready Op, Running Op; Synopses; Historical Storage]
• Users issue continuous and ad-hoc queries
• Administrator monitors query execution and adjusts run-time parameters
• Applications register continuous queries
Adapted from slides ©Rajeev Motwani

Stream DB Projects
• Amazon/Cougar (Cornell) – sensors
• Aurora (Brown/MIT) – sensor monitoring, dataflow
• Hancock (AT&T) – telecom streams
• Niagara (OGI/Wisconsin) – Internet XML databases
• OpenCQ (Georgia) – triggers, incr. view maintenance
• Stream (Stanford) – general-purpose DSMS
• Tapestry (Xerox) – pub/sub content-based filtering
• Telegraph (Berkeley) – adaptive engine for sensors
• Tribeca (Bellcore) – network monitoring
Adapted from slides ©Rajeev Motwani

Summary: DBMS versus DSMS
DBMS | DSMS
Persistent relations | Transient streams
One-time queries | Continuous queries
Random access | Sequential access
“Unbounded” disk store | Bounded main memory
Only current state matters | History/arrival-order is critical
Passive repository | Active stores
Relatively low update rate | Possibly multi-GB arrival rate
No real-time services | Real-time requirements
Assume precise data | Data stale/imprecise
Access plan determined by optimizer, physical DB design | Unpredictable/variable data arrival and characteristics
Adapted from slides ©Rajeev Motwani

The Bigger Picture: When & Why
• Data streams?

Important Scenarios
1. Static / offline (full data set on disks; synopses inside memory)
    • preprocessing time
    • query time
    • synopsis size
2. Dynamic / online (full data set on disks; new data arrives; synopses inside memory)
    • update time for new data
    • query time
    • synopsis size
3. Data Stream (new data arrives; synopses inside memory)
    • update time, query time
    • synopsis size
    • See all the data thus far, maintain synopsis, answer queries
[G, Matias ‘98]

The Bigger Picture
• Data streams?
• Single, Merged, or Distributed data streams?
• Continuous queries over distributed data streams?
• Adversarial inputs?
• Sliding windows?
• Applicability to IrisNet?

Next Lecture
Tuesday March 11
Adrian Perrig on Key distribution & Trust bootstrapping