Freddies: DHT-Based Adaptive Query Processing via Federated Eddies Ryan Huebsch Shawn Jeffery CS 294-4 Peer-to-Peer Systems 12/9/03


Page 1

Freddies: DHT-Based Adaptive Query Processing via Federated Eddies

Ryan Huebsch, Shawn Jeffery

CS 294-4 Peer-to-Peer Systems

12/9/03

Page 2

Outline

- Background: PIER
- Motivation: Adaptive Query Processing (Eddies)
- Federated Eddies (Freddies)
  - System Model
  - Routing Policies
  - Implementation
- Experimental Results
- Conclusions and Continuing Work

Page 3

PIER

- Fully decentralized relational query processing engine
- Principles:
  - Relaxed consistency
  - Organic Scaling
  - Data in its Natural Habitat
  - Standard Schemas via Grassroots software
- Relational queries can be executed in a number of logically equivalent ways
  - An optimization step chooses the plan that performs best
  - Currently, PIER has no means to optimize queries

Page 4

Adaptive Query Processing

- Traditional query optimization occurs at query time and is based on statistics. This is hard because:
  - Catalog (statistics) must be accurate and maintained
  - Cannot recover from poor choices
- The story gets worse!
  - Long-running queries:
    - Changing selectivity/costs of operators
    - Assumptions made at query time may no longer hold
  - Federated/autonomous data sources:
    - No control/knowledge of statistics
  - Heterogeneous data sources:
    - Different arrival rates
- Thus, Adaptive Query Processing systems attempt to change execution order during the query
  - Query Scrambling, Tukwila, Wisconsin, Eddies

Page 5

Eddies

- Eddy: A tuple router that dynamically chooses the order of operators in a query plan
  - Optimizes the query at runtime on a per-tuple basis
  - Monitors selectivities and costs of operators to determine where to send a tuple next
- Currently centralized in design and implementation
  - Some other efforts for distributed Eddies from Wisconsin & Singapore (neither uses a DHT)

Page 6

Why use Eddies in P2P? (The easy answers)

- Much of the promise of P2P lies in its fully distributed nature
  - No central point of synchronization, so no central catalog
  - Distributed catalog with statistics helps, but does not solve all problems
    - Possibly stale, hard to maintain
    - Need CAP to do the best optimization
  - No knowledge of available resources or the current state of the system (load, etc.)
  - This is the PIER Philosophy!
- Eddies were designed for a federated query processor
  - Changing operator selectivities and costs
  - Federated/heterogeneous data sources

Page 7

Why Eddies in P2P? (The not so obvious answers)

- Available compute resources in a P2P network are heterogeneous and dynamically changing
  - Where should the query be processed?
- In a large P2P system, local data distributions, arrival rates, etc. may be different from the global ones

Page 8

Freddies: Federated Eddies

- A Freddy is an adaptive query processing operator within the PIER framework
- Goals:
  - Show feasibility of adaptive query processing in PIER
  - Build foundation and infrastructure for smarter adaptive query processing
  - Establish baseline for Freddy performance to improve upon with smarter routing policies

Page 9

An Example Freddy

[Diagram: a Freddy over tables R, S, and T. From the DHT: Get(R), Get(S), Get(T). Local operators: R join S, S join T. To the DHT: Put (Join Value RS), Put (Join Value ST). The Freddy also feeds an Output operator.]

Page 10

System Model

- Same functionality as centralized Eddy
  - Allows easy concept reuse
- Freddy uses its Routing Policy to determine the next operator for a tuple
- Tuples in a Freddy are tagged with DoneBits indicating which operators have processed them
- Freddy does all state management, thus existing operators require no modifications
- Local processing comes first (in most cases)
  - Conserve network bandwidth
- Not as simple as it seems
  - Freddy: decide how to rehash a tuple
    - This determines join order
  - Challenge: Decoupling of routing decision and operator
    - Most Eddy techniques no longer valid
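The point that rehashing a tuple determines join order can be made concrete with a small sketch. It is illustrative only: `node_for`, `rehash_options`, and the field names `rs_key` and `st_key` are hypothetical, and simple modulo placement stands in for a real DHT lookup. The namespaces `RS` and `ST` follow the example Freddy.

```python
import hashlib

NUM_NODES = 256  # assumed network size; matches the simulator size given later in the talk


def node_for(namespace, join_value, num_nodes=NUM_NODES):
    """Map a (namespace, join value) pair to a node id by hashing.

    Assumption: modulo placement, standing in for a real DHT lookup.
    """
    key = f"{namespace}:{join_value}".encode("utf-8")
    digest = hashlib.sha1(key).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def rehash_options(s_tuple):
    """List the two places an S tuple could be PUT in the R-S-T example."""
    return {
        "RS": node_for("RS", s_tuple["rs_key"]),  # would meet R tuples first
        "ST": node_for("ST", s_tuple["st_key"]),  # would meet T tuples first
    }
```

Choosing the `RS` entry sends the tuple to the node that holds matching R tuples, so R join S runs first. Choosing `ST` runs S join T first. The routing decision is made on one node and the join happens on another, which is the decoupling noted above.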

Page 11

Query Processing in Freddies

- Query origin creates a query plan with a Freddy
  - Possible routings determined at this time, but not the order
- Freddy operators on all participating nodes initiate data flow
- As tuples arrive, the Freddy determines the next operator for each tuple based on its DoneBits and the routing policy
  - Source tuples are tagged with clean DoneBits and routed appropriately
  - When all DoneBits are set, the tuple is sent to the output operator (returned to the query origin)
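The DoneBits loop described on this slide can be sketched as follows. This is a minimal illustration, not PIER code: `RoutedTuple`, `route`, `mark_done`, and the operator names are hypothetical, and it assumes for simplicity that every unprocessed operator is eligible for every tuple.

```python
from dataclasses import dataclass, field

# Hypothetical operator names for the R-S-T example; not PIER identifiers.
OPERATORS = ("join_RS", "join_ST")


@dataclass
class RoutedTuple:
    values: dict
    done: set = field(default_factory=set)  # the DoneBits, clean at the source


def eligible(t, operators=OPERATORS):
    """Operators that have not yet processed this tuple."""
    return [op for op in operators if op not in t.done]


def route(t, policy, operators=OPERATORS):
    """Return the next destination, or "output" once all DoneBits are set."""
    remaining = eligible(t, operators)
    if not remaining:
        return "output"
    return policy(t, remaining)


def mark_done(t, op):
    """Set the DoneBit for an operator after it has processed the tuple."""
    t.done.add(op)
    return t


def first_eligible(t, remaining):
    """A trivial static policy: always take the first eligible operator."""
    return remaining[0]
```

A source tuple starts with an empty `done` set, visits each operator the policy picks, and goes to the output once nothing remains. Only the `policy` argument changes between routing policies.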

Page 12

Tuple Routing Policy

- Determines to which operator to send a tuple
- Local information
  - Messages are expensive
  - Monitor local usage and adjust locally
- “Processing Buddy” information
  - During processing, discover general trends in input/output nodes’ processing capabilities, output rates, etc.
  - For instance, want to alert the previous Freddy of poor PUT decisions
- Design space is huge, so this is a large research area

Page 13

Freddy Routing Policies

- Simple (KISS):
  - Static
  - Random: Not as bad as you may think
  - Local Stat Monitoring (sampling)
- More complex:
  - Queue lengths
    - Somewhat analogous to the “back-pressure” effect
    - Monitors DHT PUT ACKs
    - Load balancing through “learning” of global join key distribution
  - Piggyback stats on other messages
    - Don’t need global information, only stats about processing buddies (nodes with which we communicate)
    - Different sample than local; may or may not be better
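The three simple policies might look like the sketch below. The slide only names them, so the details are assumptions: in particular, `LocalStatsPolicy` prefers the operator with the lowest locally observed output-to-input ratio, which is one plausible reading of "local stat monitoring" and not necessarily the rule used in the talk. All names are hypothetical.

```python
import random


def static_policy(remaining):
    """Static: always take the first eligible operator (a fixed order)."""
    return remaining[0]


def random_policy(remaining, rng=random):
    """Random: pick any eligible operator with equal probability."""
    return rng.choice(remaining)


class LocalStatsPolicy:
    """Prefer the operator with the lowest locally observed selectivity."""

    def __init__(self):
        self.seen = {}     # tuples sent to each operator
        self.emitted = {}  # tuples each operator produced

    def record(self, op, tuples_in, tuples_out):
        self.seen[op] = self.seen.get(op, 0) + tuples_in
        self.emitted[op] = self.emitted.get(op, 0) + tuples_out

    def selectivity(self, op):
        seen = self.seen.get(op, 0)
        if seen == 0:
            return 1.0  # no observations yet: assume a neutral ratio
        return self.emitted.get(op, 0) / seen

    def choose(self, remaining):
        return min(remaining, key=self.selectivity)
```

For example, after observing that one join keeps 10 of 100 tuples while another turns 100 tuples into 2500, `choose` sends the next tuple to the first join.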

Page 14

Implementation & Experimental Setup

- Design Decisions:
  - Simplicity is key
    - Roughly 300 NCSS (PIER is about 5300)
  - Single query processing operator
    - Separate routing policy module loaded at query time
    - Possible routing orders determined by a simple optimizer
  - Required generalizations to the PIER execution engine to deal with generic operators
    - Allow PIER to run any dataflow operator
- Simulator with 256 nodes, 100 tuples/table/node
  - Feasibility, not scalability
- Without global knowledge (or with only stale knowledge), a static optimizer could choose any join ordering, so we compare Freddy performance to all possible static plans

Page 15

3-way join

- R join S join T
  - R join S is highly selective (drops 90%)
  - S join T is expensive (multiplies tuple count by 25)
- Possible static join orderings:
  [Diagram: two join trees, (R join S) join T and (S join T) join R, labeled RST and STR in the results]
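A rough worked example shows why the join order matters here. It rests on assumptions not stated on the slide: each factor is applied multiplicatively and independently, and each table holds 256 nodes x 100 tuples = 25,600 tuples, as in the simulator setup. The function name `plan_sizes` is hypothetical.

```python
from fractions import Fraction

NODES = 256
TUPLES_PER_TABLE_PER_NODE = 100
BASE = NODES * TUPLES_PER_TABLE_PER_NODE  # 25,600 tuples per table

RS_FACTOR = Fraction(1, 10)  # R join S drops 90%, so 10% survive
ST_FACTOR = Fraction(25)     # S join T multiplies tuple count by 25


def plan_sizes(first, second, base=BASE):
    """Return (intermediate, final) tuple counts for a two-join plan.

    Assumption: the two factors are independent and simply multiply.
    """
    intermediate = base * first
    final = intermediate * second
    return int(intermediate), int(final)


rst = plan_sizes(RS_FACTOR, ST_FACTOR)  # (R join S) join T
str_ = plan_sizes(ST_FACTOR, RS_FACTOR)  # (S join T) join R
```

Under these assumptions both plans end with 64,000 tuples, but RST carries 2,560 intermediate tuples while STR carries 640,000. Every intermediate tuple has to be rehashed through the DHT, so the second ordering moves far more data.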

Page 16

3 Way Join Results

[Figure: completion time (s), axis 0 to 1000, versus bandwidth per node (25, 50, 100, 150 KB/s) for RST, STR, and Eddy]

Page 17

4-way join

- R join S join T join U
  - S join T is still expensive
- Possible static join orderings:
  [Diagram: four linear join trees (labeled RSTU, STRU, STUR, and TUSR in the results) and one bushy tree over R, S, T, U. The bushy tree carries the note: A traditional optimizer can’t make this plan]

Page 18

4-Way Join

[Figure: completion time (s), axis 0 to 350, versus bandwidth per node (50, 75, 100, 125, 150 KB/s) for RSTU, STRU, STUR, TUSR, Bushy, and Eddy]

Page 19

The Promise of Routing Policy

- Illustrative example of how routing policy can improve performance
- This is not meant to be an exhaustive comparison of policies, but rather to show the possibilities
- EddyQL considers the number of outstanding PUTs (queue length) to decide where to send a tuple

[Figure: aggregate bandwidth (MB/s), axis 0 to 120, for RST, STR, Eddy, and EddyQL]
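A queue-length policy in the spirit of EddyQL can be sketched as below. The slide says only that EddyQL considers the number of outstanding PUTs, so the class `QueueLengthPolicy`, its methods, and the fewest-outstanding rule are assumptions made for illustration.

```python
class QueueLengthPolicy:
    """Send each tuple to the destination with the fewest unacknowledged PUTs."""

    def __init__(self, destinations):
        # Outstanding PUT count per destination namespace, e.g. "RS" and "ST".
        self.outstanding = {d: 0 for d in destinations}

    def choose(self, remaining):
        """Pick the eligible destination with the shortest queue."""
        return min(remaining, key=lambda d: self.outstanding[d])

    def on_put(self, dest):
        """Call when a PUT is issued to a destination."""
        self.outstanding[dest] += 1

    def on_ack(self, dest):
        """Call when the DHT acknowledges a PUT."""
        if self.outstanding[dest] > 0:
            self.outstanding[dest] -= 1
```

A destination that is slow to acknowledge accumulates outstanding PUTs and so receives fewer new tuples, which gives a back-pressure effect using only information the local node already has.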

Page 20

Conclusions and Continuing Work

- Freddies provide adaptable query processing in a P2P system
  - Require no global knowledge
  - Baseline performance shows promise for smarter policies
- In the future…
  - Explore Freddy performance in a dynamic environment
  - Explore more complex routing policies

Page 21

Questions? Comments?

Snide remarks for Ryan? Glorious praise for Shawn?

Thanks!