telegraph: an adaptive global- scale query engine joe hellerstein

Post on 21-Dec-2015

222 Views

Category:

Documents

5 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Telegraph: An Adaptive Global-Scale Query Engine

Joe Hellerstein

Scenarios

• Ubiquitous computing: more than clients!– sensors and their data feeds are key

• smart dust, biomedical (MEMS sensors)• each consumer good records (mis)use

– disposable computing

• video from surveillance cameras, broadcasts, etc.

• Global Data Federation– all the data is online – what are we waiting for?– The plumbing is coming

• XML/HTTP, etc. give LCD communication• but how do you query robustly over many sites in the

wide area?

There’s a Data Flood Coming

There’s a Data Flood Coming

• What does it look like?– Never ends: interactivity required– Big: data reduction/aggregation is key– Unpredictable: this scale of devices and

nets will not behave nicely

The Telegraph Query Engine

• Key technologies– Interactive Control

• interactivity with early answers• online aggregation for data reduction

– Continuously adaptive flow optimization• massively parallel, adaptive dataflow via

Rivers and Eddies

CONTROLContinuous Output, Navigation & Transformation with Refinement On Line

• Data-intensive jobs are long-running. How to give early answers and interactivity?– online interactivity over feeds: data “juggle”– online query processing algs: ripple joins– statistical estimators, and their performance

implications

• Appreciate interplay of massive data processing, stats, and UIs

CONTROLContinuous Output and Navigation Technology with Refinement On Line

CONTROLContinuous Output and Navigation Technology with Refinement On Line

River

• We built the world’s fastest sorting machine– On the “NOW”: 100 Sun workstations + SAN– But it only beat the record under ideal

conditions!• River: performance adaptivity for data

flows on clusters– simplifies management and programming– perfect for sensor-based streams

Eddy

• How to order and reorder operators over time– based on performance, economic/admin feedback

• Vs.River:– River optimizes each operator “horizontally”– Eddies optimize a pipeline “vertically”

Eddy

Telegraph: Putting it Together• Scalable, adaptive dataflow infrastructure. Apps

include…– sensor nets– massively parallel and wide-area query engines– net appliances: chaining xform8n/aggreg8n/etc. proxies– any unpredictable dataflow scenario

• Technology: a marriage of…– CONTROL, River & Eddy

• Many research questions here• E.g. how to combine River and Eddy adaptivity• E.g. how to tune Eddies for statistical performance goals

– Combinations of browse/query/mine at UI– Storage management to handle new hardware realities

Integration with Endeavour

• Give– Be data-intensive backbone to diverse clients– Be replication dataflow engine for OceanStore– Telegraph Storage Manager provides storage

(xactional/otherwise) for OceanStore– Provide platform for data-intensive “tacit info

mining”

• Take– Leverage OceanStore to manager distributed

metadata, security– Leverage protocols out of TinyOS for sensors

Additional Slides

• For use in questions, etc.

Connectivity & Heterogeneity

• Lots of folks working on data format translation, parsing– we will borrow, not build– currently using JDBC & Cohera Net Query

• commercial tool, donated by Cohera Corp. • gateways XML/HTML (via http) to ODBC/JDBC

– we may write “Teletalk” gateways from sensors• Heterogeneity

– never a simple problem– Control project developed interactive, online data

transformation tool: Potter’s Wheel

Potter’s Wheel Anomaly Detection

top related