
Adaptive Execution of Variable-Accuracy Functions

VLDB Conference, Seoul

September 2006

Matt Denny - UC Berkeley/Fred Alger, Inc.
Michael Franklin - UC Berkeley

Matt Denny, Mike Franklin
UC Berkeley EECS

Introduction

• Many applications apply expensive functions to streams of data:
  • Finance: real-time market monitoring with securities models
  • Power Management: overload prediction using current weather conditions
  • Supply Chain Management: inventory models using RFID data to find shortages in real time


Continuous Queries w/ UDFs

Example: Bond Pricing
• BondData: table of bond data (maturity, coupon, etc.)
• IntRate: stream of interest rate data
• model(): C++/Java routine that takes bond data and an interest rate, and returns a price

Filtering:
SELECT BD.BondID
FROM BondData BD, IntRate IR [Rows 1]
WHERE BD.numHeld > 0 AND model(BD, IR.rate) > $100

Aggregation:
SELECT MAX(model(BD, IR.rate))
FROM BondData BD, IntRate IR [Rows 1]
WHERE BD.numHeld > 0


The Problem

• Analytical functions can be expensive!
  • Minutes or hours per data point.
• The query processor has no control over the execution of individual function calls.
  • The UDF API is a black box.
• Earlier work aims to avoid UDF calls:
  • predicate reordering ([HS93], [KMPS94], [CS96])
  • memoization and caching ([HN96], [DF05])
• The remaining calls can still be a showstopper.


The Intuition

1. Many functions have accuracy/cost tradeoffs, e.g., iterative solvers.

2. UDFs often appear in predicates and aggregates where exact answers are not required:

SELECT BD.*
FROM BondData BD, IntRate IR [Rows 1]
WHERE BD.numHeld > 0 AND model(BD, IR.rate) > $100


Our Solution

VAOs (Variable-Accuracy Operators)

New query operators that:
• Expose function cost/accuracy tradeoffs using a new UDF API.
• Exploit this tradeoff to avoid excess work while still answering the query correctly.


VAOs - Basic Idea

• Initially run the function to obtain a coarse answer.
  • This needs to be cheaper than running to a more accurate answer.
• If more accuracy is needed, iterate!


Traditional Execution - Select

SELECT BD.bondID
FROM BondData BD, IntRate IR [Rows 1]
WHERE model(BD, IR.rate) > $100;

[Figure: bond BD 1 and the current interest rate (10.1%) are passed to "execute model(IR.Rate, BD)", which returns $105.01; the Select operator tests > 100 and outputs BD 1 as a result.]


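VAO Execution: Select

[Figure: for the same query (WHERE model(BD, IR.rate) > $100), the VAO returns a result object for bond BD 1 with bounds L = $98 and H = $110. Because the bounds straddle $100, the Select operator cannot yet decide the predicate.]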


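VAO Execution: Select

[Figure: the Select operator calls Iterate() on the result object for BD 1, which tightens its bounds to L = $101 and H = $108. Since L > $100, the predicate is now known to hold for BD 1, and no further iteration is needed.]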
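The selection behavior in these two frames can be sketched in Python as follows. This is a minimal illustration, not the authors' C++ implementation: it assumes a result object exposing conservative bounds `low`/`high` and an `iterate()` method, and `vao_select`, `ScriptedResult`, and `max_iters` are hypothetical names. The toy object simply replays the bounds shown in the frames ($98 to $110, then $101 to $108).

```python
from typing import Optional


def vao_select(result, threshold: float, max_iters: int = 50) -> Optional[bool]:
    """Decide `value > threshold`, refining a result object only as needed.

    `result` is assumed to expose conservative bounds `.low` / `.high`
    and an `.iterate()` method that tightens them (hypothetical API).
    Returns True or False once the predicate is decided, or None if it is
    still undecided after `max_iters` refinements.
    """
    iterations = 0
    while True:
        if result.low > threshold:
            return True   # every value inside the bounds passes
        if result.high <= threshold:
            return False  # every value inside the bounds fails
        if iterations == max_iters:
            return None   # give up: bounds still straddle the threshold
        result.iterate()  # do more work to tighten the bounds
        iterations += 1


class ScriptedResult:
    """Toy result object that replays a fixed sequence of (low, high) bounds."""

    def __init__(self, steps):
        self._steps = list(steps)
        self.low, self.high = self._steps.pop(0)
        self.iterations = 0

    def iterate(self):
        # Raises IndexError if asked for more refinements than were scripted.
        self.low, self.high = self._steps.pop(0)
        self.iterations += 1
```

Calling `vao_select(ScriptedResult([(98.0, 110.0), (101.0, 108.0)]), 100.0)` decides the predicate after a single `iterate()`, matching the second frame, whereas traditional execution runs the model to full accuracy for every bond.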


VAO API

• Use an iterative interface:
  • Traditional: <number> = f(<args>)
  • VAO: <result object> = f(<args>)
• The result object provides:
  1. fields for (conservative) error bounds
  2. an iterate() method that refines the bounds with more work
  3. for some VAOs, estimates of the CPU cost and error reduction of the next iteration
• Useful for:
  • any sort of iterative function (e.g., root finders, numerical integration)
  • any technique with iterative step refinement (e.g., PDEs)
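One way to picture the three parts of the result object is an abstract class plus a concrete iterative function. The sketch below is hypothetical (the names `VAOResult`, `est_cost`, `est_error_reduction`, and `BisectionRoot` are not from the source) and uses bisection root finding, one of the iterative functions listed above, because its bracket is a genuinely conservative bound and its error reduction per step is known exactly.

```python
from abc import ABC, abstractmethod
from typing import Callable


class VAOResult(ABC):
    """Hypothetical Python rendering of a VAO result object.

    low/high: conservative error bounds on the true answer.
    iterate(): does more work and tightens low/high.
    est_cost()/est_error_reduction(): estimates for the *next* iterate(),
    only needed by operators that must choose among several objects.
    """

    low: float
    high: float

    @abstractmethod
    def iterate(self) -> None: ...

    @abstractmethod
    def est_cost(self) -> float: ...

    @abstractmethod
    def est_error_reduction(self) -> float: ...

    def width(self) -> float:
        return self.high - self.low


class BisectionRoot(VAOResult):
    """Bounds on a root of a continuous g that changes sign on [a, b]."""

    def __init__(self, g: Callable[[float], float], a: float, b: float):
        g_a, g_b = g(a), g(b)
        if g_a * g_b > 0:
            raise ValueError("g(a) and g(b) must have opposite signs")
        self.g = g
        self.low, self.high = a, b
        self._g_low = g_a  # cached g(low), so each step costs one evaluation

    def iterate(self) -> None:
        mid = (self.low + self.high) / 2
        g_mid = self.g(mid)
        if self._g_low * g_mid <= 0:
            self.high = mid                     # sign change in [low, mid]
        else:
            self.low, self._g_low = mid, g_mid  # sign change in [mid, high]

    def est_cost(self) -> float:
        return 1.0  # one evaluation of g (arbitrary cost unit)

    def est_error_reduction(self) -> float:
        return self.width() / 2  # bisection halves the bracket exactly
```

For example, wrapping `x*x - 2` on [1, 2] and calling `iterate()` 20 times leaves a bracket of width 2^-20 that still contains the square root of 2.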


Iteration Strategy

• Selection iterates over an object until the predicate value is known.
• Aggregate operators are more difficult:
  • The answer depends on sets of result objects.
  • Need to decide how to iterate over multiple result objects.


Example: MAX(f(x1), f(x2))

[Figure: error bounds on f(x) at x1 and x2, shown for the initial bounds and after iterating over f(x1), over f(x2), and over both.]

Need an iteration strategy that attempts to minimize cost


Solution: Greedy Strategy

• Iterate over the object that has the best ratio of benefit to CPU cost among the current choices.
• Good strategy if functions converge:
  • Later iterations are likely to have less benefit per unit cost.

• Operator-dependent
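The greedy rule reduces to an argmax over benefit/cost ratios. A minimal sketch, assuming each candidate already carries an operator-specific benefit estimate and a CPU cost estimate for its next iteration; `Candidate` and `greedy_pick` are illustrative names, and returning `None` when no candidate promises any benefit is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Candidate:
    name: str
    est_benefit: float  # operator-specific benefit of the next iteration
    est_cost: float     # estimated CPU cost of the next iteration (e.g. seconds)


def greedy_pick(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Return the candidate with the best benefit/cost ratio.

    Candidates with no expected benefit (or a non-positive cost) are skipped;
    if none remain, return None. Ties go to the earliest candidate.
    """
    useful = [c for c in candidates if c.est_benefit > 0 and c.est_cost > 0]
    if not useful:
        return None
    return max(useful, key=lambda c: c.est_benefit / c.est_cost)
```

With, say, one object estimated at $.01 of benefit for 8 sec. and another at $.02 for 4 sec., the ratios are 0.00125 and 0.005 per second, so the second object is iterated next.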


Example Revisited

MAX(f(x1), f(x2))

• Greedy strategy: choose the iteration with the best overlap reduction per CPU cost.
• Use error reduction estimates to estimate overlap reduction.
• Cost estimation depends on the function.

Goal state: no overlap between the error bounds of f(x1) and f(x2).
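The goal state for MAX can be checked directly from the bounds. A minimal sketch, assuming conservative `(low, high)` pairs: the object with the highest lower bound is certainly the maximum once no other object's upper bound exceeds that lower bound. `resolved_max` is a hypothetical helper, not part of the source's API.

```python
from typing import Optional, Sequence, Tuple

Bounds = Tuple[float, float]  # (low, high), assumed conservative


def resolved_max(bounds: Sequence[Bounds]) -> Optional[int]:
    """Index of the object that is certainly the maximum, or None while the
    leader's bounds still overlap some other object's bounds."""
    if not bounds:
        return None
    # Only the object with the highest lower bound can be certainly maximal.
    leader = max(range(len(bounds)), key=lambda i: bounds[i][0])
    lead_low = bounds[leader][0]
    for i, (_, high) in enumerate(bounds):
        if i != leader and high > lead_low:
            return None  # object i might still exceed the leader
    return leader
```

A tie at the boundary (another object's upper bound equal to the leader's lower bound) counts as resolved, since the leader's value is then at least as large as every other value.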


Example Revisited

• Determine if f(x1) > f(x2)

[Figure: error bounds on f(x) at x1 and x2, updated after each step.]

Step 1
Function   Overlap Red. Est.   CPU Cost Est.
f(x1)      $.04                4 sec.
f(x2)      $.04                4 sec.

Step 2
Function   Overlap Red. Est.   CPU Cost Est.
f(x1)      $.01                8 sec.
f(x2)      $.02                4 sec.

Step 3
Function   Overlap Red. Est.   CPU Cost Est.
f(x1)      $0                  8 sec.
f(x2)      $0                  8 sec.


Aggregates

• min/max (general)
  • Goal state: no overlap between the error bounds of the minimum (maximum) value and the error bounds of the other functions.
  • Greedy heuristic: make an educated guess for the max, then choose the iteration that removes the most overlap between the guess and the other error bounds per cycle.
• avg/sum
  • Goal state: the avg/sum of the error bounds has a width less than a user-defined tolerance.
  • Greedy heuristic: choose the iteration that reduces the width of the avg/sum bounds the most per cycle.
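For SUM, the bounds of the aggregate are the sums of the individual bounds, so its width is the sum of the individual widths, and shrinking one object's bounds by d shrinks the SUM's bounds by d. The sketch below encodes that goal state and greedy rule under this assumption; `Item`, `sum_goal_reached`, and `next_for_sum` are hypothetical names, and the per-object width-reduction and cost estimates are assumed to come from the result objects.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Item:
    low: float                  # conservative lower bound on this function value
    high: float                 # conservative upper bound
    est_width_reduction: float  # estimated shrink of (high - low) from one iterate()
    est_cost: float             # estimated CPU cost of that iterate()


def sum_bounds(items: List[Item]) -> Tuple[float, float]:
    """Bounds on the SUM: add the lower bounds and add the upper bounds."""
    return sum(i.low for i in items), sum(i.high for i in items)


def sum_goal_reached(items: List[Item], tolerance: float) -> bool:
    """Goal state: the SUM's bounds are narrower than a user-defined tolerance."""
    low, high = sum_bounds(items)
    return high - low < tolerance


def next_for_sum(items: List[Item]) -> Optional[int]:
    """Greedy choice: the item whose next iterate() is expected to shrink the
    SUM's width the most per unit of CPU cost (None if nothing helps)."""
    best, best_ratio = None, 0.0
    for idx, item in enumerate(items):
        if item.est_cost <= 0:
            continue
        ratio = item.est_width_reduction / item.est_cost
        if ratio > best_ratio:
            best, best_ratio = idx, ratio
    return best
```

AVG follows the same pattern: its width is the SUM's width divided by the fixed number of inputs, so the greedy ranking is unchanged and only the tolerance is rescaled.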


Performance Setup

• Standalone implementation of the VAO framework in C++

• Used numeric bond model and bond data from [DF05]

• Real Bond Data - 500 Mortgage-backed Securities.

• Synthetic Bond Data - to stress test VAOs

• Single Interest Rate.


VAO Implementation

• Numeric bond model [S95] implemented with both the traditional and the VAO interface:
  • Based on a PDE solver.
  • VAO iterate(): doubles the size of the PDE grid.
  • Bounds and error reduction estimates are derived from the current and previous iteration results using Richardson's extrapolation [BF01].
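Richardson's extrapolation turns two successive refinements into an error estimate: if a method's error behaves like C·h^p, the error of the finer result A(h/2) is roughly |A(h/2) - A(h)| / (2^p - 1), and the next halving should remove about (1 - 2^-p) of it. The sketch below is a stand-in, not the bond model: it uses the composite trapezoid rule (p = 2) on a simple integral, assumes each `iterate()` halves the step size h, and widens the estimate by a hypothetical safety factor because the Richardson estimate is asymptotic rather than a guaranteed bound. The names `trapezoid`, `richardson_step`, and `error_bounds` are illustrative.

```python
import math
from typing import Callable, Tuple


def trapezoid(f: Callable[[float], float], a: float, b: float, n: int) -> float:
    """Composite trapezoid rule with n panels (error roughly C * h**2)."""
    h = (b - a) / n
    inner = sum(f(a + k * h) for k in range(1, n))
    return h * (0.5 * f(a) + inner + 0.5 * f(b))


def richardson_step(coarse: float, fine: float, p: int) -> Tuple[float, float, float]:
    """Given results at step h (coarse) and h/2 (fine) of an order-p method,
    return (extrapolated value, error estimate for `fine`,
            estimated error reduction from halving h once more)."""
    diff = (fine - coarse) / (2 ** p - 1)
    extrapolated = fine + diff
    err_fine = abs(diff)
    next_reduction = err_fine * (1 - 2.0 ** -p)
    return extrapolated, err_fine, next_reduction


def error_bounds(value: float, err: float, safety: float = 2.0) -> Tuple[float, float]:
    """Heuristic (not guaranteed) bounds: widen the estimate by a safety factor."""
    return value - safety * err, value + safety * err


# Stand-in "model": the integral of e^x over [0, 1], refined from 8 to 16 panels.
coarse = trapezoid(math.exp, 0.0, 1.0, 8)
fine = trapezoid(math.exp, 0.0, 1.0, 16)
extrapolated, err, next_reduction = richardson_step(coarse, fine, p=2)
low, high = error_bounds(fine, err)
```

Here the extrapolated value is far closer to e - 1 than either trapezoid result, and the widened interval around the finer result contains the exact value.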


Selection Performance
500 bonds, 1 interest rate

[Figure: Selection Performance. Runtime (sec., log scale from 1 to 10000) vs. selectivity (0 to 1) for Trad and VAO.]

Runtime depends on the number of bonds whose values are close to the predicate value.


Stress Test

• Generate bonds with accurate values near the predicate:
  • Gaussian, mean = predicate value, vary the std. dev.
• Std. dev. of real bonds: $7.78
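A synthetic workload like the one described can be generated with the standard library. In this sketch, `synthetic_bond_values` draws Gaussian values centered on the predicate value, and `near_predicate` counts values within a hypothetical band around it as a rough proxy for how many bonds need many refinements; both names, the seed, and the band width are assumptions for illustration, not the paper's generator.

```python
import random
from typing import List, Sequence


def synthetic_bond_values(n: int, predicate: float, std_dev: float,
                          seed: int = 0) -> List[float]:
    """Draw n bond values from a Gaussian centered on the predicate value."""
    rng = random.Random(seed)  # private generator for reproducible draws
    return [rng.gauss(predicate, std_dev) for _ in range(n)]


def near_predicate(values: Sequence[float], predicate: float, band: float) -> int:
    """Count values within `band` of the predicate (the hard cases for a VAO)."""
    return sum(1 for v in values if abs(v - predicate) < band)
```

Because both draws below would use the same seed, shrinking the standard deviation from $7.78 (the real bonds) toward a few cents pulls the same standard-normal draws closer to the predicate, so the count of hard cases can only grow.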


In the Paper

• Other results:
  • Max
    • Real bonds: 111 sec. vs. 6953 sec.
    • Synthetic bonds: VAOs better than traditional above $.05 std. dev.
  • Average
    • Up to 5x improvement if a small number of bonds are weighted heavily in the average.
• Details on error and cost estimates for the PDE-based bond model.
• Other types of models covered in Matt’s thesis.


Conclusion

• Many emerging CQ applications require the repeated execution of expensive functions.
• VAOs are new operators that change how these functions execute:
  • They use a new iterative API that exposes the work-accuracy tradeoff in functions.
  • They do only enough work to answer the query, using a greedy strategy to choose iterations.
• With real bond data and models, VAOs show 1-2 orders of magnitude improvement.
• For more detailed information: [email protected]


The Advisor’s Dodge

[Figure: Relative Contribution to Research. Percent contribution (0 to 100) vs. time in program (0 to 5 years) for Student and Advisor, with "This Work" marked.]

…

Courtesy of Jennifer Widom


Bibliography

• [HS93] J. M. Hellerstein and M. Stonebraker, “Predicate Migration: Optimizing Queries with Expensive Predicates”, SIGMOD 1993.

• [HN96] J. M. Hellerstein and J. Naughton, “Query Execution Techniques for Caching Expensive Predicates”, SIGMOD 1996.

• [DF05] M. Denny and M. J. Franklin, “Predicate Result Range Caching for Continuous Queries”, SIGMOD 2005.



• [S95] R. Stanton, “Rational Prepayment and the Valuation of Mortgage-Backed Securities”, The Review of Financial Studies, Vol. 8, No. 3, 1995, pp. 677-708.

• [BF01] R. L. Burden and J. D. Faires, Numerical Analysis, Brooks/Cole, 2001.
