eddies: continuously adaptive query processing ross rosemark

Post on 17-Jan-2016

215 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Eddies: Continuously Eddies: Continuously Adaptive Query ProcessingAdaptive Query Processing

Ross RosemarkRoss Rosemark

Goals of PresentationGoals of Presentation

What are traditional query optimizers?What are traditional query optimizers? What are the problems with these What are the problems with these

optimizers?optimizers?

What is an eddy?What is an eddy? How does an eddy work?How does an eddy work? Discuss how an eddy self tunes query Discuss how an eddy self tunes query

plans?plans?

Traditional Query OptimizerTraditional Query Optimizer

Metadata (statistics) about the distribution Metadata (statistics) about the distribution of data is collected by the query optimizerof data is collected by the query optimizer

Based on the metadata the query Based on the metadata the query optimizer determines the most energy optimizer determines the most energy efficient query planefficient query plan

Query plan is executed.Query plan is executed.

This is good for snapshot queries (short This is good for snapshot queries (short lived queries)lived queries)

ProblemsProblems The problem with this approach is that data became streamingThe problem with this approach is that data became streaming

InternetInternet Sensor nodesSensor nodes Etc.Etc.

The query plan chosen by the query optimizer could eventually become The query plan chosen by the query optimizer could eventually become inefficient if the metadata statistics changeinefficient if the metadata statistics change Cost of operatorsCost of operators Operator selectivies could changeOperator selectivies could change Rate tuples arrive from inputsRate tuples arrive from inputs

Also it’s difficult to just re-optimize a queryAlso it’s difficult to just re-optimize a query Determining when to re-optimize the query is a difficult research issue that has Determining when to re-optimize the query is a difficult research issue that has

not been addressed.not been addressed. Must determine not only when you should re-optimize but that re-optimizing still leaves Must determine not only when you should re-optimize but that re-optimizing still leaves

produces the same stateproduces the same state

Example Example

Assume a query that asks for Employees Assume a query that asks for Employees age and salary > 100000 is issued into the age and salary > 100000 is issued into the sensor network.sensor network. Initially the predicate salary > 100000 will be Initially the predicate salary > 100000 will be

very selective… hence eliminating a lot of very selective… hence eliminating a lot of tuplestuples As older employees in the database start to be As older employees in the database start to be

read the predicate salary > 100000 will be less read the predicate salary > 100000 will be less selectiveselective

Lets show some things Eddie must Lets show some things Eddie must consider.consider.

Eddie should:Eddie should: Increase synchronizationIncrease synchronization Increase the moments of SymmetryIncrease the moments of Symmetry

Synchronization BarriersSynchronization Barriers Synchronization barriersSynchronization barriers

Where one operation hinders the speed of another operationsWhere one operation hinders the speed of another operations

Extreme exampleExtreme example Assume you have a merge join on two duplicate-free inputs. (slowlo and Assume you have a merge join on two duplicate-free inputs. (slowlo and

fasthi)fasthi) Assume that during processing the next tuple is always consumed from the Assume that during processing the next tuple is always consumed from the

relation that had the lowest valuesrelation that had the lowest values Assume that slowlo is a slowly delivered external relation with many low Assume that slowlo is a slowly delivered external relation with many low

columns in it’s bandwithcolumns in it’s bandwith Assume that fasthi is high bandwith (i.e. delivers tuples fast) and has only Assume that fasthi is high bandwith (i.e. delivers tuples fast) and has only

high values in it’s columnhigh values in it’s column In this example fasthi is delayed why slowlo delivers tuplesIn this example fasthi is delayed why slowlo delivers tuples

Known as a synchronization barrierKnown as a synchronization barrier

Desirable to minimize the number of Synchronization barriersDesirable to minimize the number of Synchronization barriers

Moments of SymmetryMoments of Symmetry You can only re-optimize at a moment You can only re-optimize at a moment

of symmetryof symmetry A moment of symmetry is when the A moment of symmetry is when the

query is executed to a point that the query is executed to a point that the optimizer can change the query plan optimizer can change the query plan without affecting the way the query without affecting the way the query plans predicates are performedplans predicates are performed

Typically happens in joinsTypically happens in joins

ExampleExample Assume you have a nested loop join Assume you have a nested loop join

with inner relation R and outer relation with inner relation R and outer relation S.S.

In this example you can only re-optimize In this example you can only re-optimize this join when R is completely scanned.this join when R is completely scanned.

It is possible to re-optimize in the middle It is possible to re-optimize in the middle of scanning S but the join algorithm of scanning S but the join algorithm would then have to be changedwould then have to be changed

EddyEddy An Eddy was designed to dynamically re-An Eddy was designed to dynamically re-

optimize queries.optimize queries. The authors implemented Eddies in a RiverThe authors implemented Eddies in a River

A river is a shared nothing parallel query A river is a shared nothing parallel query processing framework that dynamically processing framework that dynamically adapts to fluctuations and workloadsadapts to fluctuations and workloads

An eddy is implemented via a module in a river containing an arbitrary number of input relations, a number of participating unary and binary modules, and a single output relation

An Eddy also maintains a fixed size buffer An Eddy also maintains a fixed size buffer of tuples that need to be processed of tuples that need to be processed

Each operator takes two tuples, processes Each operator takes two tuples, processes them and delivers them back to the eddythem and delivers them back to the eddy

Eddy (Cont)Eddy (Cont)

A tuple in a eddy is associated with a tuple descriptorA tuple in a eddy is associated with a tuple descriptor Contains a vector of Ready bits and Done bitsContains a vector of Ready bits and Done bits

The Eddy ships the tuple to only the operators that have The Eddy ships the tuple to only the operators that have the Ready bits turned onthe Ready bits turned on

After an operators is processed it’s Done bits are setAfter an operators is processed it’s Done bits are set If all done bits are set the tuple is sent to the Eddy’s outputIf all done bits are set the tuple is sent to the Eddy’s output Else it’s sent to another operatorsElse it’s sent to another operators

Eddy (Cont)Eddy (Cont)

So how do you route tuples to the different So how do you route tuples to the different Eddy operatorsEddy operators In this paper they look at multiple different In this paper they look at multiple different

waysways

Naïve SchemeNaïve Scheme Eddies buffer is implemented as a priority queueEddies buffer is implemented as a priority queue

Recall the buffer is used to store tuples that should be processed by the Recall the buffer is used to store tuples that should be processed by the EddiesEddies

When a tuple enters a buffer it’s priority is set to lowWhen a tuple enters a buffer it’s priority is set to low After it’s processed by an operator in the Eddy and returned to the After it’s processed by an operator in the Eddy and returned to the

buffer it’s priority is set to highbuffer it’s priority is set to high

This ensures that tuples do not get stuck in the Eddy.This ensures that tuples do not get stuck in the Eddy.I.e. starvationI.e. starvation

This scheme dynamically adjusts to work required of operatorsThis scheme dynamically adjusts to work required of operators Operators that are slower (i.e. take 4 seconds vs. 1 second will receive Operators that are slower (i.e. take 4 seconds vs. 1 second will receive

less tuples)less tuples) Note each operator has a fixed size queueNote each operator has a fixed size queue

Once queue is filled up no more tuples can be inserted into queueOnce queue is filled up no more tuples can be inserted into queue

Lottery SchemeLottery Scheme Each time a tuple is routed to a operator the operator is credited Each time a tuple is routed to a operator the operator is credited

with a ticketwith a ticket When the operator returns a tuple the eddy one ticket is debited When the operator returns a tuple the eddy one ticket is debited

from the eddies running count for that operatorfrom the eddies running count for that operator Tracks how efficiently a operator drains tuples from the systemTracks how efficiently a operator drains tuples from the system

When a tuple is to be routed to an operator the Eddy holds a lotteryWhen a tuple is to be routed to an operator the Eddy holds a lottery Only the operators that have their Ready bit sets can participate in the Only the operators that have their Ready bit sets can participate in the

lotterylottery An operator’s chance of “winning the lottery” and receiving

the tuple corresponds to the count of tickets for that operator.

Dynamically adjusts to selectivity of operators

Window SchemeWindow Scheme Problem with lottery scheme is that it uses to much past infoProblem with lottery scheme is that it uses to much past info

Problem: An operator that gained a lot of ticket initially but then became Problem: An operator that gained a lot of ticket initially but then became slow slow

In this scheme the lottery scheme is modified such that the lottery In this scheme the lottery scheme is modified such that the lottery only looks at tickets gained by an operator in a fixed window.only looks at tickets gained by an operator in a fixed window. Keeps track of two types of ticketsKeeps track of two types of tickets

Banked ticketsBanked tickets Used when running the lotteryUsed when running the lottery

Escrow ticketsEscrow tickets Used to measure efficiency during the windowUsed to measure efficiency during the window

At the end of a window At the end of a window Banked Tickets = Escrow TicketsBanked Tickets = Escrow Tickets Escrow Tickets = 0Escrow Tickets = 0

Ensure operators re-approve themselves each windowEnsure operators re-approve themselves each window

ComparisonComparison

Shows that due to Shows that due to “Fluid Dynamics” (i.e. “Fluid Dynamics” (i.e. the varying rates of the varying rates of operators) the Naïve operators) the Naïve approach naturally approach naturally adjusts based on the adjusts based on the cost of operatorscost of operators

Shows that Lottery Shows that Lottery also adjusts based on also adjusts based on workloadworkload

ComparisonComparison Naïve eddies does not adjust based on selectivityNaïve eddies does not adjust based on selectivity

Naïve performs between the best and worseNaïve performs between the best and worse Lottery does adjust based on selectivityLottery does adjust based on selectivity

JoinsJoins

Shows that Lottery Shows that Lottery performs nearly performs nearly optimallyoptimally

Naïve performs Naïve performs between the best and between the best and worstworst

Joins (Cont)Joins (Cont)

All joins are ripple All joins are ripple joinsjoins Change selectivity of Change selectivity of

join predicatejoin predicate

In all cases the Eddy In all cases the Eddy with the Lottery is with the Lottery is close to optimalclose to optimal

Window SchemeWindow Scheme

Both cases are Both cases are related to different related to different query plansquery plans Shows eddy is in Shows eddy is in

between the best and between the best and worst caseworst case

AdaptabilityAdaptability

Toggle the cost of the Toggle the cost of the operator 3 times operator 3 times throughout throughout experimentexperiment Notice that Eddy Notice that Eddy

switches how many switches how many tuples are first tuples are first processed by that processed by that operatoroperator

ProblemsProblems Does not work well when Does not work well when

there is initially a long there is initially a long delay at an operatordelay at an operator

Eddy gives all tuples to Eddy gives all tuples to operator because operator because operator is not returning operator is not returning tuples that satisfy the join tuples that satisfy the join predicates criteriapredicates criteria Not until after the delayNot until after the delay

Notice eventually this Notice eventually this problem is figured out by problem is figured out by EddyEddy Just make take some timeJust make take some time

top related