Adaptive Execution of Variable-Accuracy Functions

Matt Denny - UC Berkeley/Fred Alger, Inc.
Michael Franklin - UC Berkeley
VLDB Conference, Seoul, September 2006

Page 1: Adaptive Execution of           Variable-Accuracy Functions

Adaptive Execution of Variable-Accuracy Functions

VLDB Conference, Seoul
September 2006

Matt Denny - UC Berkeley/Fred Alger, Inc.
Michael Franklin - UC Berkeley

Page 2: Adaptive Execution of           Variable-Accuracy Functions

Matt Denny, Mike Franklin, UC Berkeley EECS

Introduction

• Many applications apply expensive functions to streams of data
  • Finance: real-time market monitoring with securities models
  • Power Management: overload prediction using current weather conditions
  • Supply Chain Management: inventory models using RFID data to find shortages in real-time

Page 3: Adaptive Execution of           Variable-Accuracy Functions

Continuous Queries w/ UDFs

Example: Bond Pricing
BondData: table of bond data (maturity, coupon, etc.)
IntRate: stream of interest rate data
model(): C++/Java routine that takes bond data and an interest rate, and returns a price

Filtering:

    SELECT BD.BondID
    FROM BondData BD, IntRate IR [Rows 1]
    WHERE BD.numHeld > 0 AND model(BD, IR.rate) > $100

Aggregation:

    SELECT MAX(model(BD, IR.rate))
    FROM BondData BD, IntRate IR [Rows 1]
    WHERE BD.numHeld > 0

Page 4: Adaptive Execution of           Variable-Accuracy Functions

The Problem

• Analytical functions can be expensive!
  • Minutes or hours per data point.
• Query processor has no control over execution of individual function calls.
  • UDF API is a black box.
• Earlier work aims to avoid UDF calls:
  • predicate reordering ([HS93], [KMPS94], [CS96])
  • memoization and caching ([HN96], [DF05])
• Remaining calls can still be a showstopper.

Page 5: Adaptive Execution of           Variable-Accuracy Functions

The Intuition

1. Many functions have accuracy/cost tradeoffs, e.g., iterative solvers.

2. UDFs often appear in predicates and aggregates where exact answers are not required. In the query below, the predicate only needs to know whether the price is above $100, not the price itself.

    SELECT BD.*
    FROM BondData BD, IntRate IR [Rows 1]
    WHERE BD.numHeld > 0 AND model(BD, IR.rate) > $100

Page 6: Adaptive Execution of           Variable-Accuracy Functions

Our Solution

VAOs (Variable Accuracy Operators)

New query operators that:
• Expose function cost/accuracy tradeoffs using a new UDF API.
• Exploit this tradeoff to avoid excess work while correctly answering the query.

Page 7: Adaptive Execution of           Variable-Accuracy Functions

VAOs - Basic Idea

• Initially run function to obtain a coarse answer.
  • This needs to be cheaper than running to a more accurate answer.
• If more accuracy is needed, iterate!

Page 8: Adaptive Execution of           Variable-Accuracy Functions

Traditional Execution - Select

    SELECT BD.bondID
    FROM BondData BD, IntRate IR [Rows 1]
    WHERE model(BD, IR.rate) > $100;

[Diagram: an InterestRate tuple (10.1%) and a BondData tuple (BD 1) feed "execute model(IR.Rate, BD)", which returns $105.01 for BD 1. The Select operator tests "> 100?" and emits BD 1 as the result.]

Page 9: Adaptive Execution of           Variable-Accuracy Functions

VAO Execution: Select

    SELECT BD.bondID
    FROM BondData BD, IntRate IR [Rows 1]
    WHERE model(BD, IR.rate) > $100;

[Diagram: an InterestRate tuple (10.1%) and a BondData tuple feed "execute model(IR.Rate, BD)", which returns a result object for BD 1 with bounds L = $98 and H = $110. The Select-VAO operator tests "> 100?". Because $100 lies between L and H, the predicate is not yet decided.]

Page 10: Adaptive Execution of           Variable-Accuracy Functions

VAO Execution: Select

    SELECT BD.bondID
    FROM BondData BD, IntRate IR [Rows 1]
    WHERE model(BD, IR.rate) > $100;

[Diagram: the Select-VAO operator calls Iterate() on the result object for BD 1, which tightens its bounds to L = $101 and H = $108. Both bounds are now above $100, so BD 1 is emitted as a result.]
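The selection walk-through on these slides can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `Bounds` class, its `refine` hook, and the `scripted` helper are hypothetical names, and the bound values simply replay the numbers from the two diagrams ($98/$110, then $101/$108).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Bounds:
    """Hypothetical result object: conservative lower/upper bounds on f(x)."""
    low: float
    high: float
    refine: Callable[["Bounds"], None]  # assumed hook that tightens the bounds

    def iterate(self) -> None:
        self.refine(self)


def select_greater_than(result: Bounds, threshold: float, max_iters: int = 50) -> bool:
    """Iterate only until the predicate f(x) > threshold is decided."""
    for _ in range(max_iters):
        if result.low > threshold:
            return True   # the whole interval is above the threshold
        if result.high <= threshold:
            return False  # the whole interval is at or below the threshold
        result.iterate()  # bounds still straddle the threshold: refine
    raise RuntimeError("predicate undecided after max_iters iterations")


def scripted(steps):
    """Toy refinement that replays a fixed list of (low, high) bounds."""
    it = iter(steps)

    def refine(b: Bounds) -> None:
        b.low, b.high = next(it)

    return refine


# Replay the slide example: $98..$110 is undecided, $101..$108 passes.
bd1 = Bounds(98.0, 110.0, scripted([(101.0, 108.0)]))
passes = select_greater_than(bd1, 100.0)
```

In this replay the operator stops after a single `iterate()` call, since the refined lower bound already clears the threshold.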

Page 11: Adaptive Execution of           Variable-Accuracy Functions

VAO API

• Use iterative interface
  • Traditional: <number> = f(<args>)
  • VAO: <result object> = f(<args>)
    1. fields for (conservative) error bounds
    2. iterate() method: refines bounds with more work
    3. for some VAOs: also need estimates for CPU cost and error reduction of next iteration
• Useful for:
  • Any sort of iterative function (e.g. root finders, numerical integration)
  • Any technique with iterative step refinement (e.g. PDEs)
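As one possible reading of the three API elements listed above, here is a sketch of a result object for a bisection root finder (one of the "iterative function" cases the slide mentions). The class name, method names, and the unit cost model are assumptions made for illustration; the slides do not give the actual interface.

```python
from typing import Callable


class BisectionResult:
    """Hypothetical VAO result object for a root of g bracketed by [lo, hi]."""

    def __init__(self, g: Callable[[float], float], lo: float, hi: float):
        if g(lo) * g(hi) > 0:
            raise ValueError("g must change sign on [lo, hi]")
        # 1. Fields for conservative error bounds: the root lies in [low, high].
        self.g, self.low, self.high = g, lo, hi

    def iterate(self) -> None:
        """2. One bisection step: halves the interval that brackets the root."""
        mid = (self.low + self.high) / 2
        if self.g(self.low) * self.g(mid) <= 0:
            self.high = mid
        else:
            self.low = mid

    def est_cost(self) -> float:
        """3a. Assumed cost model: every step costs one unit of work."""
        return 1.0

    def est_error_reduction(self) -> float:
        """3b. The next step removes half of the current bound width."""
        return (self.high - self.low) / 2


# Bound sqrt(2) as the root of x*x - 2 on [1, 2], refining ten times.
root = BisectionResult(lambda x: x * x - 2, 1.0, 2.0)
for _ in range(10):
    root.iterate()
```

The bounds stay conservative here because bisection never discards the half-interval that contains the sign change.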

Page 12: Adaptive Execution of           Variable-Accuracy Functions

Iteration Strategy

• Selection iterates over an object until predicate value is known.
• Aggregate operators are more difficult
  • Answer depends on sets of result objects
  • Need to decide how to iterate over multiple result objects

Page 13: Adaptive Execution of           Variable-Accuracy Functions

Example: MAX(f(x1), f(x2))

[Figure: four panels plotting f(x) error bounds at x1 and x2: the initial bounds, and the bounds after iterating over f(x1), over f(x2), and over both.]

Need an iteration strategy that attempts to minimize cost

Page 14: Adaptive Execution of           Variable-Accuracy Functions

Solution: Greedy Strategy

• Iterate over the object that has the best ratio of benefit to CPU cost among the current choices.
• Good strategy if functions converge
  • Later iterations likely to have less benefit/unit cost
• Operator-dependent

Page 15: Adaptive Execution of           Variable-Accuracy Functions

Example Revisited

MAX(f(x1), f(x2))

Greedy Strategy: choose best overlap reduction per CPU cost
• Use error reduction estimates to estimate overlap reduction.
• Cost estimation depends on function.

Goal State: no overlap between f(x1) and f(x2)
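The greedy rule for MAX can be sketched as follows. This is an illustrative toy under stated assumptions, not the paper's operator: `ToyResult` is a hypothetical result object whose bounds are centered on a known value, whose width halves on each iteration, and whose per-iteration cost doubles; the leader is guessed as the object with the highest lower bound.

```python
class ToyResult:
    """Hypothetical result object with shrinking bounds and growing cost."""

    def __init__(self, value, width, cost=1.0):
        self.value, self.width, self.cost = value, width, cost
        self.spent = 0.0  # total CPU cost consumed so far

    @property
    def low(self):
        return self.value - self.width / 2

    @property
    def high(self):
        return self.value + self.width / 2

    def est_shrink(self):
        """Estimated amount each bound moves inward on the next iterate()."""
        return self.width / 4

    def iterate(self):
        self.spent += self.cost
        self.width /= 2
        self.cost *= 2


def overlap(leader, other):
    """How far other's upper bound reaches above the leader's lower bound."""
    return max(0.0, other.high - leader.low)


def greedy_max(results, max_steps=100):
    for _ in range(max_steps):
        leader = max(results, key=lambda r: r.low)  # educated guess for the max
        others = [r for r in results if r is not leader]
        if all(overlap(leader, o) == 0.0 for o in others):
            return leader  # goal state: nothing overlaps the leader

        def benefit(r):
            s = r.est_shrink()
            if r is leader:  # raising the leader's low helps against every rival
                return sum(min(overlap(leader, o), s) for o in others)
            return min(overlap(leader, r), s)

        # Best estimated overlap reduction per unit of CPU cost.
        max(results, key=lambda r: benefit(r) / r.cost).iterate()
    raise RuntimeError("no winner within max_steps")


a, b, c = ToyResult(105.0, 8.0), ToyResult(100.0, 8.0), ToyResult(80.0, 8.0)
winner = greedy_max([a, b, c])
```

In this run `c` is never iterated: its bounds never overlap the leader, so refining it has zero benefit and the greedy rule spends work only on `a` and `b`.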

Page 16: Adaptive Execution of           Variable-Accuracy Functions

Example Revisited

• Determine if f(x1) > f(x2)

Function   Overlap Red. Est.   CPU Cost Est.
f(x1)      $.04                4 sec.
f(x2)      $.04                4 sec.

[Figure: f(x) error bounds at x1 and x2.]

Page 17: Adaptive Execution of           Variable-Accuracy Functions

Example Revisited

• Determine if f(x1) > f(x2)

Function   Overlap Red. Est.   CPU Cost Est.
f(x1)      $.01                8 sec.
f(x2)      $.02                4 sec.

(Previous estimates: $.04, 4 sec. for both functions.)

[Figure: f(x) error bounds at x1 and x2.]

Page 18: Adaptive Execution of           Variable-Accuracy Functions

Example Revisited

• Determine if f(x1) > f(x2)

Function   Overlap Red. Est.   CPU Cost Est.
f(x1)      $0                  8 sec.
f(x2)      $0                  8 sec.

(Previous estimates: $.01, 8 sec. and $.02, 4 sec.)

[Figure: f(x) error bounds at x1 and x2.]

Page 19: Adaptive Execution of           Variable-Accuracy Functions

Aggregates

Operator: min/max (general)
  Goal State: No overlap between minimum (maximum) value and other function error bounds
  Greedy Heuristic: Make educated guess for max. Choose iteration that reduces most overlap between guess and other error bounds per cycle

Operator: avg/sum
  Goal State: avg/sum of error bounds has width less than user-defined tolerance
  Greedy Heuristic: Choose iteration which reduces avg/sum of bounds the most per cycle
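For the avg/sum row, a minimal sketch of the goal state and heuristic might look like the following. The `ToyResult` class, its halving width, and its doubling cost are assumptions for illustration only; the sketch handles SUM, where the width of the aggregate bound is the sum of the individual widths.

```python
from dataclasses import dataclass


@dataclass
class ToyResult:
    """Hypothetical result object: each iterate() halves the bound width."""
    low: float
    high: float
    cost: float = 1.0  # assumed CPU cost of the next iteration
    steps: int = 0

    def est_reduction(self) -> float:
        return (self.high - self.low) / 2

    def iterate(self) -> None:
        quarter = (self.high - self.low) / 4
        self.low += quarter
        self.high -= quarter
        self.cost *= 2
        self.steps += 1


def greedy_sum(results, tolerance, max_steps=1000):
    """Refine until the bounds on SUM are narrower than the tolerance."""
    for _ in range(max_steps):
        lo = sum(r.low for r in results)
        hi = sum(r.high for r in results)
        if hi - lo < tolerance:
            return lo, hi  # goal state reached
        # Pick the iteration that shrinks the summed width most per unit cost.
        max(results, key=lambda r: r.est_reduction() / r.cost).iterate()
    raise RuntimeError("tolerance not reached within max_steps")


bonds = [ToyResult(96.0, 104.0), ToyResult(49.5, 50.5), ToyResult(19.5, 20.5)]
sum_low, sum_high = greedy_sum(bonds, tolerance=4.0)
```

With these inputs the wide first object is refined twice, the second once, and the third not at all, which is the kind of uneven allocation the heuristic is meant to produce.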

Page 20: Adaptive Execution of           Variable-Accuracy Functions

Performance Setup

• Standalone implementation of VAO framework in C++
• Used numeric bond model and bond data from [DF05]
  • Real Bond Data - 500 Mortgage-backed Securities.
  • Synthetic Bond Data - to stress test VAOs
• Single Interest Rate.

Page 21: Adaptive Execution of           Variable-Accuracy Functions

VAO Implementation

• Numeric bond model [S95] implemented with traditional and VAO interfaces
  • Based on PDE solver
  • VAO iterate(): double size of PDE grid
  • Bounds and error reduction estimates derived by using current and previous iteration results and Richardson’s Extrapolation [BF01]
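To illustrate how two successive grid sizes can yield error bounds via Richardson extrapolation, here is a sketch that swaps in a trapezoid-rule integrator for the PDE solver (the bond model itself is not reproduced here). For a method whose error shrinks like h^p, doubling the grid gives an error estimate of |current - previous| / (2^p - 1) for the finer result. The class name, the safety factor of 2, and the starting grid size are assumptions.

```python
import math


def trapezoid(f, a, b, n):
    """Composite trapezoid rule on n equal intervals (error is O(h**2))."""
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))


class RichardsonResult:
    ORDER = 2      # trapezoid error shrinks like h**2
    SAFETY = 2.0   # assumed safety factor to keep the bounds conservative

    def __init__(self, f, a, b, n=4):
        self.f, self.a, self.b, self.n = f, a, b, n
        self.prev = trapezoid(f, a, b, n)
        self.n *= 2
        self.cur = trapezoid(f, a, b, self.n)

    def error_estimate(self):
        """Richardson estimate of the error in the current (finer) result."""
        return abs(self.cur - self.prev) / (2 ** self.ORDER - 1)

    @property
    def low(self):
        return self.cur - self.SAFETY * self.error_estimate()

    @property
    def high(self):
        return self.cur + self.SAFETY * self.error_estimate()

    def est_error_reduction(self):
        """The next doubling should divide the error by 2**ORDER."""
        return self.error_estimate() * (1 - 2 ** -self.ORDER)

    def iterate(self):
        """Double the grid size and keep the previous result for the estimate."""
        self.n *= 2
        self.prev, self.cur = self.cur, trapezoid(self.f, self.a, self.b, self.n)


# Integrate exp(x) on [0, 1]; the exact answer is e - 1.
r = RichardsonResult(math.exp, 0.0, 1.0)
exact = math.e - 1
first_width = r.high - r.low
r.iterate()
```

The estimate is only asymptotically valid, which is why the sketch widens the interval by a safety factor rather than trusting the extrapolated error directly.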

Page 22: Adaptive Execution of           Variable-Accuracy Functions

Selection Performance

500 bonds, 1 interest rate

[Chart: "Selection Performance". Runtime (sec.), log scale from 1 to 10000, versus Selectivity (0 to 1), with series Trad and VAO.]

Runtime depends on number of bonds close to predicate.

Page 23: Adaptive Execution of           Variable-Accuracy Functions

Stress Test

• Generate bonds with accurate values near the predicate
  • Gaussian, mean = predicate value, vary std. dev.

Std. dev. of real bonds: $7.78
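The synthetic workload described above can be approximated with a short generator. This is a sketch only: it draws the accurate values directly rather than constructing bonds that price to them, and the function names, seed, and $1 margin are hypothetical choices. The $7.78 figure is the real-bond standard deviation quoted on the slide.

```python
import random


def synthetic_values(n, predicate_value, std_dev, seed=0):
    """Draw n accurate function values from a Gaussian centered on the predicate."""
    rng = random.Random(seed)
    return [rng.gauss(predicate_value, std_dev) for _ in range(n)]


def fraction_near(values, predicate_value, margin):
    """Share of values within +/- margin of the predicate value."""
    return sum(abs(v - predicate_value) <= margin for v in values) / len(values)


tight = synthetic_values(500, 100.0, 0.05)   # nearly every value hugs $100
wide = synthetic_values(500, 100.0, 7.78)    # spread comparable to the real bonds
tight_share = fraction_near(tight, 100.0, 1.0)
wide_share = fraction_near(wide, 100.0, 1.0)
```

A smaller standard deviation puts more values close to the predicate, which is the regime where a VAO has to iterate the most before it can decide.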

Page 24: Adaptive Execution of           Variable-Accuracy Functions

In the Paper

• Other Results
  • Max
    • Real bonds: 111 sec. vs. 6953 sec.
    • Synthetic bonds: VAOs better than traditional above $.05 std. dev.
  • Average
    • Up to 5x improvement if a small number of bonds are weighted heavily in average.
• Details on Error and Cost estimates for PDE-based bond model.
  • Other types of models covered in Matt’s thesis.

Page 25: Adaptive Execution of           Variable-Accuracy Functions

Conclusion

• Many emerging CQ applications require the repeated execution of expensive functions.
• VAOs are new operators that change how these functions execute
  • Use new iterative API that exposes work-accuracy tradeoff in functions
  • Do only enough work to answer the query using greedy strategy to choose iterations
• With real bond data and models, VAOs show 1-2 orders of magnitude improvement.
• For more detailed information: [email protected]

Page 26: Adaptive Execution of           Variable-Accuracy Functions

The Advisor’s Dodge

[Chart: "Relative Contribution to Research". Percent Contribution (0 to 100) versus Time in Program (0 to 5 years), with curves for Student and Advisor and a marker labeled "This Work".]

Courtesy of Jennifer Widom

Page 27: Adaptive Execution of           Variable-Accuracy Functions

Bibliography

• [HS93] J. M. Hellerstein and M. Stonebraker, “Predicate Migration: Optimizing Queries with Expensive Predicates”, SIGMOD 1993.

• [HN96] J. M. Hellerstein and J. Naughton, “Query Execution Techniques for Caching Expensive Predicates”, SIGMOD 1996.

• [DF05] M. Denny and M. J. Franklin, “Predicate Result Range Caching for Continuous Queries”, SIGMOD 2005.

Page 28: Adaptive Execution of           Variable-Accuracy Functions

Bibliography

• [S95] R. Stanton, “Rational Prepayment and the Valuation of Mortgage-Backed Securities”, The Review of Financial Studies, Vol. 8, No. 3, 677-708, 1995.

• [BF01] R. L. Burden and J. D. Faires, Numerical Analysis. Brooks/Cole, 2001.