telegraph continuously adaptive dataflow joe hellerstein
TRANSCRIPT
![Page 1: Telegraph Continuously Adaptive Dataflow Joe Hellerstein](https://reader036.vdocument.in/reader036/viewer/2022062518/56649e4d5503460f94b4393b/html5/thumbnails/1.jpg)
Telegraph: Continuously Adaptive Dataflow
Joe Hellerstein
![Page 2: Telegraph Continuously Adaptive Dataflow Joe Hellerstein](https://reader036.vdocument.in/reader036/viewer/2022062518/56649e4d5503460f94b4393b/html5/thumbnails/2.jpg)
Scenarios
• Ubiquitous computing: more than clients
  – sensors and their data feeds are key
    • smart dust, biomedical (MEMS sensors)
    • each consumer good records (mis)use
      – disposable computing
    • video from surveillance cameras, broadcasts, etc.
• Global Data Federation
  – all the data is online: what are we waiting for?
  – The plumbing is coming
    • XML/HTTP, etc. give LCD (lowest-common-denominator) communication
    • but how do you flow, summarize, query and analyze data robustly over many sources in the wide area?
![Page 3: Telegraph Continuously Adaptive Dataflow Joe Hellerstein](https://reader036.vdocument.in/reader036/viewer/2022062518/56649e4d5503460f94b4393b/html5/thumbnails/3.jpg)
Dataflow in Volatile Environments
• Federated query processors a reality
  – Cohera, IBM DataJoiner
  – No control over stats, performance, administration
• Large Cluster Systems “Scaling Out”
  – No control over “system balance”
• User “CONTROL” of running dataflows
  – Long-running dataflow apps are interactive
  – No control over user interaction
• Sensor Nets: the next killer app
  – E.g. “Smart Dust”
  – No control over anything!
• Telegraph
  – Dataflow Engine for these environments
![Page 4: Telegraph Continuously Adaptive Dataflow Joe Hellerstein](https://reader036.vdocument.in/reader036/viewer/2022062518/56649e4d5503460f94b4393b/html5/thumbnails/4.jpg)
Data Flood: Main Features
• What does it look like?
  – Never ends: interactivity required
    • Online, controllable algorithms for all tasks!
  – Big: data reduction/aggregation is key
  – Volatile: this scale of devices and nets will not behave nicely
![Page 5: Telegraph Continuously Adaptive Dataflow Joe Hellerstein](https://reader036.vdocument.in/reader036/viewer/2022062518/56649e4d5503460f94b4393b/html5/thumbnails/5.jpg)
The Telegraph Dataflow Engine
• Key technologies
  – Interactive Control
    • interactivity with early answers and examples
    • online aggregation for data reduction
  – Dataflow programming via paths/iterators
    • Elevate query processing frameworks out of DBMSs
    • Long tradition of static optimization here
      – Suggestive, but not sufficient for volatile environments
  – Continuously adaptive flow optimization
    • massively parallel, adaptive dataflow via Rivers and Eddies
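The "paths/iterators" style of dataflow programming mentioned above can be illustrated with a minimal sketch. It uses Python generators as pull-based iterators; the operator names (`scan`, `select`, `project`) and the sample sensor readings are hypothetical and are not taken from Telegraph itself.

```python
from typing import Callable, Iterable, Iterator, List

Row = dict

def scan(rows: Iterable[Row]) -> Iterator[Row]:
    """Source operator: yields rows one at a time."""
    for row in rows:
        yield row

def select(child: Iterator[Row], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Filter operator: pulls from its child and passes on matching rows."""
    for row in child:
        if predicate(row):
            yield row

def project(child: Iterator[Row], columns: List[str]) -> Iterator[Row]:
    """Projection operator: keeps only the named columns."""
    for row in child:
        yield {c: row[c] for c in columns}

# Hypothetical sensor readings, used only for illustration.
readings = [
    {"sensor": "a", "temp": 21.5},
    {"sensor": "b", "temp": 35.0},
    {"sensor": "c", "temp": 38.2},
]

# Operators compose into a path; nothing runs until the consumer pulls.
plan = project(select(scan(readings), lambda r: r["temp"] > 30.0), ["sensor"])
hot_sensors = list(plan)
```

Because each operator only pulls rows on demand, the first answers can reach the consumer before the source is exhausted, which is what makes this style a natural base for interactive control.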
![Page 6: Telegraph Continuously Adaptive Dataflow Joe Hellerstein](https://reader036.vdocument.in/reader036/viewer/2022062518/56649e4d5503460f94b4393b/html5/thumbnails/6.jpg)
CONTROL: Continuous Output and Navigation Technology with Refinement On Line
• Data-intensive jobs are long-running. How to give early answers and interactivity?
  – online interactivity over feeds
    • pipelining “online” operators, data “juggle”
  – online data correlation algs: ripple joins, online mining and aggregation
  – statistical estimators, and their performance implications
    • Deliver data to satisfy statistical goals
• Appreciate interplay of massive data processing, stats, and HCI
“Of all men's miseries, the bitterest is this: to know so much and have control over nothing”
–Herodotus
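As a rough illustration of "early answers" backed by statistical estimators, the sketch below maintains a running average with a shrinking confidence interval. It assumes rows arrive in random order and uses a normal-approximation interval; the function name `online_average` and the synthetic data are hypothetical, not part of CONTROL.

```python
import math
import random

def online_average(stream, z=1.96, report_every=1000):
    """Yield (rows_seen, running_mean, half_width) as rows arrive.

    Mean and variance are updated incrementally (Welford's method).
    The half-width is a normal-approximation interval, which is only
    meaningful if rows arrive in random order (an assumption here).
    """
    n, mean, m2 = 0, 0.0, 0.0
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n > 1 and n % report_every == 0:
            variance = m2 / (n - 1)
            yield n, mean, z * math.sqrt(variance / n)

# Synthetic data, seeded so the run is repeatable.
rng = random.Random(0)
data = [rng.uniform(0.0, 100.0) for _ in range(10_000)]
estimates = list(online_average(data))
```

Each yielded triple is an answer the user could act on immediately; the interval narrows as more data flows through, so the user can stop the job once the estimate is good enough.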
![Page 7: Telegraph Continuously Adaptive Dataflow Joe Hellerstein](https://reader036.vdocument.in/reader036/viewer/2022062518/56649e4d5503460f94b4393b/html5/thumbnails/7.jpg)
Performance Regime for CONTROL
• New “Greedy” Performance Regime
  – Maximize 1st derivative of the user-happiness function
[Figure: plot against Time, vertical axis up to 100%, with curves labeled CONTROL and Traditional]
![Page 8: Telegraph Continuously Adaptive Dataflow Joe Hellerstein](https://reader036.vdocument.in/reader036/viewer/2022062518/56649e4d5503460f94b4393b/html5/thumbnails/8.jpg)
CONTROL: Continuous Output and Navigation Technology with Refinement On Line
![Page 9: Telegraph Continuously Adaptive Dataflow Joe Hellerstein](https://reader036.vdocument.in/reader036/viewer/2022062518/56649e4d5503460f94b4393b/html5/thumbnails/9.jpg)
CONTROL: Continuous Output and Navigation Technology with Refinement On Line
![Page 10: Telegraph Continuously Adaptive Dataflow Joe Hellerstein](https://reader036.vdocument.in/reader036/viewer/2022062518/56649e4d5503460f94b4393b/html5/thumbnails/10.jpg)
![Page 11: Telegraph Continuously Adaptive Dataflow Joe Hellerstein](https://reader036.vdocument.in/reader036/viewer/2022062518/56649e4d5503460f94b4393b/html5/thumbnails/11.jpg)
Potter’s Wheel Anomaly Detection
![Page 12: Telegraph Continuously Adaptive Dataflow Joe Hellerstein](https://reader036.vdocument.in/reader036/viewer/2022062518/56649e4d5503460f94b4393b/html5/thumbnails/12.jpg)
River
• We built the world’s fastest sorting machine
  – On the “NOW”: 100 Sun workstations + SAN
  – But it only beat the record under ideal conditions!
• River: performance adaptivity for data flows on clusters
  – simplifies management and programming
  – perfect for sensor-based streams
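The point about "ideal conditions" can be made concrete with a small simulation. This is a sketch in the spirit of performance adaptivity, not River's actual mechanism or API: it compares a static, equal-share partitioning with a pull-based scheme in which whichever consumer is free takes the next record. The per-record costs are hypothetical, with one node deliberately slowed down.

```python
import heapq

def static_partition_time(num_records, costs):
    """Equal shares: every consumer gets the same count regardless of speed."""
    n = len(costs)
    shares = [num_records // n + (1 if i < num_records % n else 0) for i in range(n)]
    return max(share * cost for share, cost in zip(shares, costs))

def adaptive_time(num_records, costs):
    """Pull-based: the consumer that becomes free first takes the next record."""
    free_at = [(0.0, i) for i in range(len(costs))]
    heapq.heapify(free_at)
    counts = [0] * len(costs)
    finish = 0.0
    for _ in range(num_records):
        t, i = heapq.heappop(free_at)   # earliest-available consumer
        t += costs[i]                   # it is busy for one record's cost
        counts[i] += 1
        finish = max(finish, t)
        heapq.heappush(free_at, (t, i))
    return finish, counts

# Hypothetical per-record costs; the last node is "perturbed" (4x slower).
costs = [1.0, 1.0, 1.0, 4.0]
static_finish = static_partition_time(1200, costs)
adaptive_finish, counts = adaptive_time(1200, costs)
```

Under the static split the whole job waits for the slow node, while the pull-based split lets the slow node take fewer records so all nodes finish at about the same time.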
![Page 13: Telegraph Continuously Adaptive Dataflow Joe Hellerstein](https://reader036.vdocument.in/reader036/viewer/2022062518/56649e4d5503460f94b4393b/html5/thumbnails/13.jpg)
Declarative Dataflow: NOT new
• Database Systems have been doing this for years
  – Translate declarative queries into an efficient dataflow plan
  – “query optimization” considers:
    • Alternate data sources (“access methods”)
    • Alternate implementations of operators
    • Multiple orders of operators
    • A space of alternatives defined by transformation rules
    • Estimate costs and “data rates”, then search space
• But in a very static way!
  – Gather statistics once a week
  – Optimize query at submission time
  – Run a fixed plan for the life of the query
• And these ideas are ripe to elevate out of DBMSs
  – And outside of DBMSs, the world is very volatile
  – There are surely going to be lessons “outside the box”
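The "estimate costs, then search the space" step, and why a fixed plan can go stale, can be sketched as follows. This is a toy model limited to ordering selection filters; the filter names, per-tuple costs, and selectivities are hypothetical, and real optimizers search a far larger space.

```python
from itertools import permutations

def plan_cost(order, stats):
    """Expected per-tuple cost of applying filters in the given order.

    stats maps a filter name to (per_tuple_cost, selectivity), where
    selectivity is the fraction of tuples that pass the filter.
    """
    cost, surviving = 0.0, 1.0
    for name in order:
        per_tuple_cost, selectivity = stats[name]
        cost += surviving * per_tuple_cost   # only surviving tuples reach this filter
        surviving *= selectivity
    return cost

def optimize(stats):
    """Exhaustively search all filter orders and return the cheapest."""
    return min(permutations(stats), key=lambda order: plan_cost(order, stats))

# Hypothetical statistics gathered before the query starts.
stats = {
    "cheap_unselective": (1.0, 0.9),
    "pricey_selective": (5.0, 0.1),
    "mid": (2.0, 0.5),
}
best_order = optimize(stats)

# If the data drifts while the query runs, the fixed plan is no longer the best one.
drifted = dict(stats, mid=(2.0, 0.99))
stale_cost = plan_cost(best_order, drifted)
fresh_cost = plan_cost(optimize(drifted), drifted)
```

Here the plan chosen at submission time costs more under the drifted statistics than a freshly optimized plan would, which is the gap that continuous adaptivity aims to close.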
![Page 14: Telegraph Continuously Adaptive Dataflow Joe Hellerstein](https://reader036.vdocument.in/reader036/viewer/2022062518/56649e4d5503460f94b4393b/html5/thumbnails/14.jpg)
Static Query Plans
• Volatile environments like sensors need to adapt at a much finer grain
![Page 15: Telegraph Continuously Adaptive Dataflow Joe Hellerstein](https://reader036.vdocument.in/reader036/viewer/2022062518/56649e4d5503460f94b4393b/html5/thumbnails/15.jpg)
Continuous Adaptivity: Eddies
• How to order and reorder operators over time
  – based on performance, economic/admin feedback
• Vs. River:
  – River optimizes each operator “horizontally”
  – Eddies optimize a pipeline “vertically”
[Figure: Eddy]
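One way to picture reordering operators over time is a router that picks, per tuple, which operator to visit next. The sketch below assumes a lottery-style rule in which an operator gains a ticket when it receives a tuple and gives one back when the tuple passes, so operators that drop many tuples tend to be tried first. The class, the two filters, and the ticket rule as written here are illustrative assumptions; operator cost is ignored.

```python
import random

class Eddy:
    """Toy tuple router over selection filters (illustrative only)."""

    def __init__(self, operators, seed=0):
        self.operators = operators                      # name -> predicate
        self.tickets = {name: 1 for name in operators}
        self.routed = {name: 0 for name in operators}   # tuples each operator saw
        self.rng = random.Random(seed)

    def process(self, row):
        """Route one row through every operator; return True if it survives."""
        pending = list(self.operators)
        while pending:
            weights = [self.tickets[name] for name in pending]
            name = self.rng.choices(pending, weights=weights)[0]
            pending.remove(name)
            self.routed[name] += 1
            self.tickets[name] += 1                     # credit on receiving a tuple
            if self.operators[name](row):
                # Debit when the tuple comes back; never drop below one ticket.
                self.tickets[name] = max(1, self.tickets[name] - 1)
            else:
                return False                            # tuple dropped, stop routing
        return True

# Hypothetical filters: one passes 10% of rows, the other passes 90%.
eddy = Eddy({
    "selective": lambda x: x % 10 == 0,
    "unselective": lambda x: x % 10 != 9,
})
survivors = [x for x in range(10_000) if eddy.process(x)]
```

The result set is the same for any routing order; what changes is how much work each operator does. Inspecting `eddy.routed` after the run shows how many tuples each filter actually had to examine.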
![Page 16: Telegraph Continuously Adaptive Dataflow Joe Hellerstein](https://reader036.vdocument.in/reader036/viewer/2022062518/56649e4d5503460f94b4393b/html5/thumbnails/16.jpg)
Competitive Eddies
[Figure: an Eddy connected to inputs R1, R2, R3 and S1, S2, S3 and to modules labeled hash, block, index1, index2]
![Page 17: Telegraph Continuously Adaptive Dataflow Joe Hellerstein](https://reader036.vdocument.in/reader036/viewer/2022062518/56649e4d5503460f94b4393b/html5/thumbnails/17.jpg)
Telegraph: Putting it Together
• Scalable, adaptive dataflow infrastructure. Apps include…
  – sensor nets
  – massively parallel and wide-area query engines
  – net appliances: chaining transformation/aggregation/compression, etc. in proxies
  – any volatile dataflow scenario
• Technology: a marriage of…
  – CONTROL, Rivers & Eddies
    • Many research questions here
    • E.g. how to combine River and Eddy adaptivity
    • E.g. how to tune Eddies for statistical performance goals
  – Combinations of browse/query/mine at UI
  – Storage management to handle new hardware realities
• Look for a live service this summer!
![Page 18: Telegraph Continuously Adaptive Dataflow Joe Hellerstein](https://reader036.vdocument.in/reader036/viewer/2022062518/56649e4d5503460f94b4393b/html5/thumbnails/18.jpg)
Integration with Endeavour
• Give
  – Be data-intensive backbone to diverse clients
  – Be replication/delivery dataflow engine for OceanStore
  – Telegraph Storage Manager provides storage (transactional/otherwise) for OceanStore
  – Provide platform for data-intensive “tacit info mining”
• Take
  – Leverage OceanStore to manage distributed metadata, security
  – Leverage protocols out of TinyOS for sensors
![Page 19: Telegraph Continuously Adaptive Dataflow Joe Hellerstein](https://reader036.vdocument.in/reader036/viewer/2022062518/56649e4d5503460f94b4393b/html5/thumbnails/19.jpg)
Connectivity & Heterogeneity
• Lots of folks working on data format translation, parsing
  – we will borrow, not build
  – currently using JDBC & Cohera Net Query
    • commercial tool, donated by Cohera Corp.
    • gateways XML/HTML (via http) to ODBC/JDBC
  – we may write “Teletalk” gateways from sensors
• Heterogeneity
  – never a simple problem
  – Control project developed interactive, online data transformation tool: ABC
![Page 20: Telegraph Continuously Adaptive Dataflow Joe Hellerstein](https://reader036.vdocument.in/reader036/viewer/2022062518/56649e4d5503460f94b4393b/html5/thumbnails/20.jpg)
More Info
• Collaborators:
  – Mike Franklin, Eric Brewer, Christos Papadimitriou
  – Sirish Chandrasekaran, Amol Deshpande, Kris Hildrum, Sam Madden, Vijayshankar Raman, Mehul Shah
• Me: [email protected]
• Web:
  – http://db.cs.berkeley.edu/telegraph
  – http://control.cs.berkeley.edu
![Page 21: Telegraph Continuously Adaptive Dataflow Joe Hellerstein](https://reader036.vdocument.in/reader036/viewer/2022062518/56649e4d5503460f94b4393b/html5/thumbnails/21.jpg)
Extra slides for backup