Towards Adaptive Dataflow Infrastructure Joe Hellerstein, UC Berkeley

Post on 21-Dec-2015


Page 1: Towards Adaptive Dataflow Infrastructure Joe Hellerstein, UC Berkeley

Towards Adaptive Dataflow Infrastructure

Joe Hellerstein, UC Berkeley

Page 2

Online Query Processing: The CONTROL Project (’96-’01)

Data analysis on massive datasets takes forever
  - No feedback, 100% accuracy

Challenge: make queries more like image delivery
  - But images are pre-encoded in progressive format
  - Query is ad hoc

Solution: Online Aggregation
  - Continuous sampling without replacement
  - New pipelining query processing algorithms with good statistical properties (e.g. Ripple Joins) and user control (Online Reordering – “Juggle”)
  - Estimators and confidence intervals for aggregates

Streaming samples, streaming answers
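The online aggregation idea above can be illustrated with a small sketch: scan the data in random order (which amounts to sampling without replacement) and report a running AVG together with a confidence interval that shrinks as more rows arrive. This is only an illustration under stated assumptions, not the CONTROL implementation: the function name `online_average`, the normal-approximation interval (z = 1.96), and the finite-population correction are choices made here.

```python
import math
import random


def online_average(values, z=1.96, report_every=1000, seed=0):
    """Yield (rows_seen, running_mean, half_width) while scanning in random order.

    Hypothetical helper for illustration. The interval is a normal
    approximation with a finite-population correction, so it shrinks to
    zero once every row has been read.
    """
    rng = random.Random(seed)
    order = list(values)
    rng.shuffle(order)  # a random permutation == sampling without replacement
    n_total = len(order)

    count, mean, m2 = 0, 0.0, 0.0  # Welford's running mean / sum of squares
    for x in order:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)

        if count % report_every == 0 or count == n_total:
            if count > 1:
                sample_var = m2 / (count - 1)
                fpc = (n_total - count) / n_total  # finite-population correction
                half_width = z * math.sqrt(sample_var / count * fpc)
            else:
                half_width = float("inf")
            yield count, mean, half_width


if __name__ == "__main__":
    for seen, estimate, half in online_average(range(1, 100001), report_every=20000):
        print(f"{seen:>7} rows: AVG ~ {estimate:.1f} +/- {half:.1f}")
```

The caller can stop consuming the generator as soon as the interval is tight enough, which is the "streaming samples, streaming answers" behavior in miniature.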

Page 3

Page 4

Images Are Aggregates

Page 5

Can do Online “Enumeration” Too

“Potter’s wheel”

Page 6

Volatility in Streaming Queries: Analogies for Sensors

Query engines map queries to dataflows
  - Flow graph laid out by a query optimizer (typically on cluster)
  - Query executor runs the flow

User priorities change during CONTROL queries
  - Breaks “compile-then-run” query optimization paradigm
  - Dynamic reordering of commutative tasks: f(g(x)) or g(f(x))?
  - Dynamic reordering of data objects: x1, x2, x3, …
  - Requires dynamic competition among choices: f(x) or f’(x)?

Volatile networks are similar
  - Hard to predict rates of consumption/production a priori
  - Volatile over time, and queries may run “forever”
  - Imagine interactive user “cockpit” on the sensor net!
  - Added metrics of power and data quality
  - And different kinds of volatility, no doubt
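To make "dynamic reordering of commutative tasks" concrete, here is a minimal sketch of a router that decides, per item, which of several commutative filters to apply first, based on pass rates observed so far. It is a deliberate simplification and an assumption of this example: it greedily sorts by an exponentially weighted pass rate, which is not how any particular system necessarily schedules operators. The class name `AdaptiveFilterRouter` and the `alpha` parameter are hypothetical.

```python
class AdaptiveFilterRouter:
    """Apply commutative predicates in an order that adapts to observed data.

    Hypothetical sketch: predicates with the lowest recent pass rate run
    first, so most items are rejected after a single call.
    """

    def __init__(self, predicates, alpha=0.1):
        self.predicates = dict(predicates)  # name -> callable(item) -> bool
        self.alpha = alpha                  # weight given to the newest observation
        self.pass_rate = {name: 0.5 for name in self.predicates}  # neutral prior
        self.calls = 0                      # total predicate evaluations

    def route(self, item):
        # Re-decide the order for every item; ties keep insertion order.
        order = sorted(self.predicates, key=lambda name: self.pass_rate[name])
        for name in order:
            ok = self.predicates[name](item)
            self.calls += 1
            # Exponential moving average lets the estimate track drift.
            self.pass_rate[name] += self.alpha * ((1.0 if ok else 0.0) - self.pass_rate[name])
            if not ok:
                return False  # rejected; later predicates are skipped
        return True


if __name__ == "__main__":
    router = AdaptiveFilterRouter({
        "f": lambda x: x % 2 == 0,   # passes about half the items
        "g": lambda x: x % 10 == 0,  # passes about a tenth of the items
    })
    kept = [x for x in range(1000) if router.route(x)]
    print(len(kept), "items kept using", router.calls, "predicate calls")
```

Because the filters commute, the output is the same for any order; only the amount of work changes. Note that a filter placed later only sees items that survived earlier ones, so its estimate is conditional, which is one reason real systems add some exploration.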

Page 7

Adaptive Dataflow: Convergence of DBs/Nets

The idea from two angles
  - Queries are flows, query optimization is routing
      - Sensor queries need nets-style adaptivity
  - New networking SW looks like a query engine
      - Click, Scout. Also CANs.
      - Sensor Qs need DB-style semantic optimization (up to app)

Telegraph: An Adaptive Dataflow System
  - Boxes & Arrows dataflow programming
  - Adaptive reoptimization of the flow graph (Eddies)
  - Adaptive prioritization of the delivery (Juggle)
  - Adaptive load-balancing/FT across nodes (FLuX)
  - Mix Push/Pull to blend streams and pools (Fjords)
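As a rough illustration of "adaptive prioritization of the delivery", the sketch below buffers items by group and always hands the consumer the next item from whichever non-empty group the user currently prefers; preferences may change while the stream is running. This is a much-simplified, in-memory stand-in written for this transcript, not the Juggle algorithm itself, and the names `PrioritizedBuffer`, `put`, `set_preference`, and `get` are hypothetical.

```python
from collections import defaultdict, deque


class PrioritizedBuffer:
    """Reorder buffered items according to user preferences that can change.

    Hypothetical sketch: items keep arrival order within a group, and the
    group with the highest current preference is served first.
    """

    def __init__(self):
        self.queues = defaultdict(deque)      # group -> pending items (FIFO)
        self.preference = defaultdict(float)  # group -> user weight, default 0.0

    def put(self, group, item):
        self.queues[group].append(item)

    def set_preference(self, group, weight):
        self.preference[group] = weight

    def get(self):
        """Return (group, item) for the most preferred non-empty group, or None."""
        candidates = [g for g, q in self.queues.items() if q]
        if not candidates:
            return None
        best = max(candidates, key=lambda g: self.preference[g])
        return best, self.queues[best].popleft()


if __name__ == "__main__":
    buf = PrioritizedBuffer()
    for group, item in [("a", 1), ("a", 2), ("b", 10), ("b", 20)]:
        buf.put(group, item)
    buf.set_preference("b", 1.0)   # user is currently interested in group "b"
    print(buf.get())
    buf.set_preference("a", 2.0)   # interest shifts to group "a" mid-stream
    print(buf.get())
```

The design choice here is to pick the group at delivery time rather than at insertion time, so a preference change takes effect on the very next item.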

Page 8

Extra Slides on Telegraph

Page 9

Telegraph Apps to Date

Web Queries: Election 2000
  - http://fff.cs.berkeley.edu

Enhanced P2P functionality
  - Query by album or artist, via joins with web data
  - Working on pure P2P query processing

Initial sensor app
  - Join I-80 traffic movement with webcams and incidents
  - Smart Dust Mote simulations

Page 10

Telenap: Amazon Meets Napster

Page 11

Movie Stars Who Donated to Bush

Page 12

Query >> Search: http://fff.cs.berkeley.edu

“Federated Facts and Figures”
  - Yahoo join FECInfo

Page 13

Query >> Search: http://fff.cs.berkeley.edu

“Federated Facts and Figures”
  - APBNews join FECInfo