Telegraph: An Adaptive Global-Scale Query Engine

Joe Hellerstein

Scenarios

• Ubiquitous computing: more than clients!
  – sensors and their data feeds are key
    • smart dust, biomedical (MEMS sensors)
    • each consumer good records (mis)use
  – disposable computing
    • video from surveillance cameras, broadcasts, etc.
• Global Data Federation
  – all the data is online: what are we waiting for?
  – The plumbing is coming
    • XML/HTTP, etc. give lowest-common-denominator (LCD) communication
    • but how do you query robustly over many sites in the wide area?

There’s a Data Flood Coming

• What does it look like?
  – Never ends: interactivity required
  – Big: data reduction/aggregation is key
  – Unpredictable: this scale of devices and nets will not behave nicely

The Telegraph Query Engine

• Key technologies
  – Interactive Control
    • interactivity with early answers
    • online aggregation for data reduction
  – Continuously adaptive flow optimization
    • massively parallel, adaptive dataflow via Rivers and Eddies
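To make "early answers" and "online aggregation" concrete, here is a minimal sketch of a running average that reports an estimate and an approximate 95% confidence half-width after every input value. The function name `online_mean` and the parameter `z` are hypothetical, and the interval is only meaningful under the assumption that values arrive in random order; this is an illustration of the idea, not Telegraph's implementation.

```python
import math


def online_mean(stream, z=1.96):
    """Yield (n, running_mean, half_width) after each value in the stream.

    Assumption: values arrive in random order, so a normal-approximation
    interval around the running mean is reasonable.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's method)
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n > 1:
            variance = m2 / (n - 1)  # sample variance
            half_width = z * math.sqrt(variance / n)
        else:
            half_width = float("inf")  # no spread estimate from one value
        yield n, mean, half_width
```

A user interface could stop consuming the generator as soon as the half-width is small enough for the user's purpose, which is what lets a long-running aggregate return a usable answer early.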

CONTROL: Continuous Output, Navigation & Transformation with Refinement On Line

• Data-intensive jobs are long-running. How to give early answers and interactivity?
  – online interactivity over feeds: data “juggle”
  – online query processing algs: ripple joins
  – statistical estimators, and their performance implications
• Appreciate interplay of massive data processing, stats, and UIs
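The ripple join mentioned above can be sketched as follows: draw one tuple at a time from each input, and join every new tuple against everything already seen from the other side, so results appear incrementally instead of after a full scan. This is a simplified "square" variant written for illustration; the name `ripple_join` and the `predicate` callback are assumptions, and the sketch omits the statistical estimators that accompany the real algorithm.

```python
def ripple_join(r_rows, s_rows, predicate):
    """Incrementally yield (r, s) pairs that satisfy predicate(r, s).

    Alternates between the two inputs; each newly drawn tuple is joined
    against all tuples seen so far from the other input, so every
    matching pair is produced exactly once.
    """
    seen_r, seen_s = [], []
    r_iter, s_iter = iter(r_rows), iter(s_rows)
    r_done = s_done = False
    while not (r_done and s_done):
        if not r_done:
            try:
                r = next(r_iter)
            except StopIteration:
                r_done = True
            else:
                for s_old in seen_s:
                    if predicate(r, s_old):
                        yield r, s_old
                seen_r.append(r)
        if not s_done:
            try:
                s = next(s_iter)
            except StopIteration:
                s_done = True
            else:
                for r_old in seen_r:
                    if predicate(r_old, s):
                        yield r_old, s
                seen_s.append(s)
```

Because the function is a generator, a caller can consume a few matches, update a running estimate, and stop whenever the answer is good enough.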

CONTROL: Continuous Output and Navigation Technology with Refinement On Line

• We built the world’s fastest sorting machine
  – On the “NOW”: 100 Sun workstations + SAN
  – But it only beat the record under ideal conditions!
• River: performance adaptivity for data flows on clusters
  – simplifies management and programming
  – perfect for sensor-based streams
• Eddies: how to order and reorder operators over time
  – based on performance, economic/admin feedback
• Vs. River:
  – River optimizes each operator “horizontally”
  – Eddies optimize a pipeline “vertically”
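The idea of reordering operators over time can be illustrated with a small routing sketch: each tuple is sent through a set of commutative filters, and the order is chosen per tuple from the pass rates observed so far. The class `Eddy`, its methods, and the greedy "most selective first" policy are all assumptions made for this example; this is a simplified stand-in, not the routing policy used in Telegraph.

```python
class Eddy:
    """Route each tuple through filters in an order chosen at run time."""

    def __init__(self, operators):
        # operators: dict mapping a name to a boolean filter function
        self.operators = operators
        self.seen = {name: 0 for name in operators}
        self.passed = {name: 0 for name in operators}

    def pass_rate(self, name):
        # Optimistic default of 1.0 until the operator has seen a tuple
        if self.seen[name] == 0:
            return 1.0
        return self.passed[name] / self.seen[name]

    def route(self, row):
        pending = set(self.operators)
        while pending:
            # Greedy choice: try the filter that has dropped the most so far
            # (ties broken by name so the behavior is deterministic)
            name = min(sorted(pending), key=self.pass_rate)
            pending.remove(name)
            self.seen[name] += 1
            if not self.operators[name](row):
                return False  # tuple dropped; remaining filters are skipped
            self.passed[name] += 1
        return True

    def run(self, rows):
        return [row for row in rows if self.route(row)]
```

With a cheap but unselective filter and a highly selective one, the sketch quickly learns to apply the selective filter first, so the other filter runs on only a few tuples. A purely greedy policy like this one can stop exploring once it has a favorite, which is one reason a production design would want a randomized or feedback-driven policy instead.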

Telegraph: Putting it Together

• Scalable, adaptive dataflow infrastructure. Apps include…
  – sensor nets
  – massively parallel and wide-area query engines
  – net appliances: chaining xform8n/aggreg8n/etc. proxies
  – any unpredictable dataflow scenario
• Technology: a marriage of…
  – CONTROL, River & Eddy
    • Many research questions here
    • E.g. how to combine River and Eddy adaptivity
    • E.g. how to tune Eddies for statistical performance goals
  – Combinations of browse/query/mine at UI
  – Storage management to handle new hardware realities

Integration with Endeavour

• Give
  – Be data-intensive backbone to diverse clients
  – Be replication dataflow engine for OceanStore
  – Telegraph Storage Manager provides storage (xactional/otherwise) for OceanStore
  – Provide platform for data-intensive “tacit info mining”
• Take
  – Leverage OceanStore to manage distributed metadata, security
  – Leverage protocols out of TinyOS for sensors

Additional Slides

• For use in questions, etc.

Connectivity & Heterogeneity

• Lots of folks working on data format translation, parsing
  – we will borrow, not build
  – currently using JDBC & Cohera Net Query
    • commercial tool, donated by Cohera Corp.
    • gateways XML/HTML (via HTTP) to ODBC/JDBC
  – we may write “Teletalk” gateways from sensors
• Heterogeneity
  – never a simple problem
  – CONTROL project developed interactive, online data transformation tool: Potter’s Wheel

Potter’s Wheel Anomaly Detection
