Source: adobra/approxqp/wav0401.pdf (51 slides; uploaded by duongnguyet, posted 25-Apr-2018)

Approximate Query Processing Using Wavelets

Kaushik Chakrabarti

Minos Garofalakis

Rajeev Rastogi

Kyuseok Shim

Presented by Guanghua Yan

Outline

■ Approximate query processing:
  – Problem and prior solutions
  – Another solution: wavelets
■ Using wavelets to construct synopsis:
  – 1-D Haar wavelets
  – Multi-D Haar wavelets
  – Construction of synopsis
■ Query processing in wavelets domain:
  – Select
  – Project
  – Join
■ Rendering the result
■ Experimental evaluation
■ Conclusions

Why do we need Approximate Query Processing?

■ Characteristics of DSS (decision support system) applications
  – Huge amounts of data (GB/TB)
  – High query complexity
  – Stringent response-time requirements
■ An EXACT answer is NOT always required
  – Exploratory nature of DSS applications
  – Aggregate queries: precision to the penny? NO
  – A fast, approximate answer is preferable
■ Approximate query processing
  – Approximate answers
  – Quick response

[Figure: an SQL query runs against the data warehouse (GB/TB) and returns exact answers. Problem: long response time.]

How does Approximate Query Processing work?

[Figure: compact relations (MB) are constructed from the data warehouse (GB/TB) in advance. A transformation algebra turns the SQL query into a transformed SQL query, which runs on the compact relations and returns approximate answers with fast response times.]

Previous Work

■ Construct compact relations using:
  – Random sampling (AQUA system)
    • accurate for aggregate queries (COUNT, SUM, AVG)
    • not suitable when joins are involved (too few tuples)
    • not suitable for non-aggregate queries
  – Histograms (Ioannidis and Poosala)
    • effectiveness at high dimensions is unclear
    • construction is costly (and so is storage: the curse of dimensionality)
    • need to be expanded for joins (a join makes the dimensionality even higher)
  – Wavelets (Vitter and Wang)
    • effective for aggregate queries even at high dimensions
    • limited in query processing scope (only range-sum queries)

Overview of the work in this paper

■ Construct a compact synopsis of interesting tables using multi-resolution wavelet decomposition (done in advance)
  – fast: takes just a single pass over the relation in the best case, otherwise a logarithmic number of passes
■ SQL queries are answered by working just on the compact relations, i.e. entirely in the wavelet (compressed) domain
  – fast response times
  – results converted back to the relational domain (rendering) at the end
  – all types of queries supported: aggregate and non-aggregate
■ Fast, accurate, general

Overview of the work – the big picture

[Figure: the pipeline in three steps. Step 1: construct compact relations (MB) from the data warehouse (GB/TB) in advance. Step 2: a transformation algebra maps the SQL query to a transformed SQL query over the compact relations. Step 3: query result rendering (if needed) produces the result relation. The outcome is approximate answers with fast response times.]

Step 1: Construct synopsis with wavelet decomposition

■ 1-D Haar wavelets
■ Multi-D Haar wavelets
■ Construction of synopsis

What’s decomposition?

■ Vector decomposition
  – V = (1, 2, 3, 4)
  – V = 1 * (1, 0, 0, 0) + 2 * (0, 1, 0, 0) + 3 * (0, 0, 1, 0) + 4 * (0, 0, 0, 1)
  – 1, 2, 3, 4 are called coefficients
  – b1 = (1, 0, 0, 0) is called a basis vector
  – Each coefficient is the product of V with one basis vector: 3 = (1, 2, 3, 4) * (0, 0, 1, 0)
  – Orthogonal:
    • Given two basis vectors bi & bj: Pdot = bi * bj = 1 if i = j, 0 otherwise
    • No redundancy, regular, easy to reconstruct
  – Looks useless (from (1, 2, 3, 4) to (1, 2, 3, 4)), except for the idea of decomposition

What’s decomposition?

■ Idea of decomposition
  – Fix a set of basis vectors
  – Compute a set of coefficients
    • Multiplying the original data by one basis vector gives one coefficient
    • Dot product vs. inner product
  – # of basis vectors = # of coefficients = # of elements (original data)
  – Represent the original data (or function) by a set of coefficients in terms of a set of basis vectors
  – Motivation
    • Find new features of the data (Fourier)
    • Compress the data (wavelets in this paper)
  – The original data can be reconstructed (easy for an orthogonal basis)
    • Multiply each coefficient by the corresponding basis vector
    • Sum up all the products
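The decompose-then-reconstruct idea above can be sketched in a few lines of Python. This is only an illustration using the standard basis from the vector example; the names `dot`, `basis`, `coeffs` and `rebuilt` are made up for this sketch.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

# Standard basis of length 4, as in the vector decomposition example.
basis = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]
V = (1, 2, 3, 4)

# Decomposition: one coefficient per basis vector.
coeffs = [dot(V, b) for b in basis]

# Reconstruction: scale each basis vector by its coefficient and sum the products.
rebuilt = [sum(c * b[i] for c, b in zip(coeffs, basis)) for i in range(len(V))]

print(coeffs)
print(rebuilt)
```

With an orthogonal basis the same two steps work for any basis choice, which is what the Haar slides rely on.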

What’s decomposition?

■ Function decomposition
  – Fourier transformation and inverse transformation
  – Basis functions: cosine and sine functions
  – Widely used in engineering
  – Problems:
    1. Loses time resolution; good for periodic signals
    2. Basis functions are fixed

What’s decomposition?

■ Wavelet decomposition
  – Shares the idea of the Fourier transformation
  – Time resolution added
  – Basis functions: scaled & shifted versions of the mother wavelet
  – Orthogonal
  – Vanishing moments, compact support, regularity
  – Wavelet decomposition generates compact representations that exploit the local structure of the function
  – Wavelet decomposition uses a scaling function & a wavelet function (the mother wavelet)
  – Problem: which wavelet decomposition to use? (Haar, CDF(2, X), CDF(3, X), Daubechies series)

Background on Wavelets: 1-d Haar Wavelets

■ Why Haar wavelets?
  – Simplest wavelet function
  – Fast to compute (averaging & differencing)
  – Perform well in practice (image compression)
■ What do Haar wavelets look like?
  – First example (read from the bottom row up):

    35  -3  16  10   8  -8   0  12    wavelet coefficients
    35  -3  16  10   8  -8   0  12
    32  38  16  10   8  -8   0  12
    32  16  38  10   8  -8   0  12
    48  16  48  28   8  -8   0  12
    48   8  16  -8  48   0  28  12
    56  40   8  24  48  48  40  16    original data

Color legend on the slide: blue = original value or average coefficient; red = detail coefficient.

Haar Wavelets functions

■ Scaling function (Father Wavelet)
    h0(t) = 1 for t in [0, 1], 0 otherwise
■ Wavelet function (Mother Wavelet)
    h1(t) = 1 for t in [0, ½), -1 for t in [½, 1], 0 otherwise

[Figure: plots of the scaling function and the wavelet function, each with values in {1, 0, -1}, together with their scaled and scaled & shifted versions.]

1-d Haar basis functions (Daughter Wavelets)

[Figure: plots of the eight basis functions, each with values in {1, 0, -1}.]

Scaling function:
  h  : (1, 1, 1, 1, 1, 1, 1, 1)
Wavelet functions (scaled and shifted versions of the mother wavelet):
  h1 : (1, 1, 1, 1, -1, -1, -1, -1)
  h2 : (1, 1, -1, -1, 0, 0, 0, 0)
  h3 : (0, 0, 0, 0, 1, 1, -1, -1)
  h4 : (1, -1, 0, 0, 0, 0, 0, 0)
  h5 : (0, 0, 1, -1, 0, 0, 0, 0)
  h6 : (0, 0, 0, 0, 1, -1, 0, 0)
  h7 : (0, 0, 0, 0, 0, 0, 1, -1)

■ Set of basis functions (complete decomposition) for a signal S of length 8
■ The vector below each basis function is a sampling of that basis function
■ Multiplying S by each basis vector gives one coefficient, up to normalization (result: 8 coefficients)
■ Connection with the first example:
    coefficients:   35  -3  16  10   8  -8   0  12
    original data:  56  40   8  24  48  48  40  16

Compute 1-d Haar wavelets decomp. by linear algebra

■ Decomposition matrix Ma (collect the 8 basis vectors and put each one in as a column)
  – The dot product of any two distinct columns is ZERO
  – Normalizing each column is easy
■ Decomposition (complete)
  – Given any signal S of length 8
  – Multiplying S by Ma gives the wavelet decomposition
  – Y = S * Ma
■ Reconstruction
  – Make Ma orthogonal (Ma^-1 = Ma^T)
  – S = Y * Ma^-1 = Y * Ma^T

Decomposition matrix:

        1   1   1   0   1   0   0   0
        1   1   1   0  -1   0   0   0
        1   1  -1   0   0   1   0   0
  Ma =  1   1  -1   0   0  -1   0   0
        1  -1   0   1   0   0   1   0
        1  -1   0   1   0   0  -1   0
        1  -1   0  -1   0   0   0   1
        1  -1   0  -1   0   0   0  -1
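As a rough check of the linear-algebra view, the sketch below takes the eight basis vectors (the columns of Ma) and computes each coefficient as a dot product. One assumption is labeled here: each dot product is divided by the squared length of its basis vector, which reproduces the averaging convention of the first example rather than the orthonormal scaling mentioned on the slide. The names `BASIS`, `decompose` and `reconstruct` are illustrative only.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

# The eight Haar basis vectors for signals of length 8 (columns of Ma).
BASIS = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, 1, -1, -1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, -1, -1],
    [1, -1, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, -1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, -1, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, -1],
]

def decompose(signal):
    # Assumption: dividing by dot(h, h) gives averages and half-differences.
    return [dot(signal, h) / dot(h, h) for h in BASIS]

def reconstruct(coeffs):
    # Sum of coefficient * basis vector, evaluated position by position.
    return [sum(c * h[i] for c, h in zip(coeffs, BASIS)) for i in range(len(BASIS[0]))]

S = [56, 40, 8, 24, 48, 48, 40, 16]
Y = decompose(S)
print(Y)
print(reconstruct(Y))
```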

Compute 1-d Haar wavelets decomp. scale by scale

■ Decomposition
  – Pairwise averaging and differencing [one-scale decomposition]
  – Distribution: put the averages (approximation coefficients) together and put the differences (detail coefficients) together
  – Repeat the above on the averages until only one average is left [recursive, complete decomposition]
  – Result: last average + all detail coefficients
■ Reconstruction
  – Exactly the inverse of decomposition

How does 1-d Haar Wavelet work? Example

Read from the bottom row (original data) up to the top row (coefficients):

  35  -3  16  10   8  -8   0  12    wavelet coefficients
  35  -3  16  10   8  -8   0  12    step 3: average and difference of (32, 38)
  32  38  16  10   8  -8   0  12    averages put together, details put together
  32  16  38  10   8  -8   0  12    step 2: averages and differences of (48, 16) and (48, 28)
  48  16  48  28   8  -8   0  12    averages put together, details put together
  48   8  16  -8  48   0  28  12    step 1: pairwise averages and differences
  56  40   8  24  48  48  40  16    original data

■ Decomposition (log N steps needed): 3 steps are used to do the complete decomposition
■ Reconstruction: exact inverse of the above process

Color legend on the slide: blue = original value or average coefficient; red = detail coefficient.
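The scale-by-scale procedure can be sketched as follows. This is a minimal Python illustration of pairwise averaging and differencing, not code from the presentation; `haar_decompose` and `haar_reconstruct` are names chosen for this sketch, and the input length is assumed to be a power of two.

```python
def haar_decompose(data):
    """Return [last average] + detail coefficients, coarsest details first."""
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    averages = list(data)
    details = []
    while len(averages) > 1:
        pairs = list(zip(averages[0::2], averages[1::2]))
        # Details of this scale go in front of the finer ones found earlier.
        details = [(a - b) / 2 for a, b in pairs] + details
        averages = [(a + b) / 2 for a, b in pairs]
    return averages + details

def haar_reconstruct(coeffs):
    """Exact inverse: expand each average into (avg + detail, avg - detail)."""
    values = list(coeffs[:1])
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(values)]
        pos += len(values)
        values = [x for avg, d in zip(values, details) for x in (avg + d, avg - d)]
    return values

data = [56, 40, 8, 24, 48, 48, 40, 16]
coeffs = haar_decompose(data)
print(coeffs)
print(haar_reconstruct(coeffs))
```

The loop in `haar_decompose` runs log N times (3 times for the 8-element example), matching the step count stated above.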

Where’s the compression and approximation?

■ Thresholding
  – Set a threshold value C
  – Replace the wavelet coefficients whose absolute value is less than C with ZERO
■ More zeros among the wavelet coefficients → compression: store ONLY the non-zero coefficients
■ The more similar the data values are, the more compression we get
■ How much does this influence the original data?

In each group, row 1 is the original data, row 2 the coefficients, row 3 the reconstructed data.

No thresholding:
  56  40   8  24  48  48  40  16
  35  -3  16  10   8  -8   0  12

Threshold C = 4:
  56  40   8  24  48  48  40  16
  35   0  16  10   8  -8   0  12
  59  43  11  27  45  45  37  13

Threshold C = 9:
  56  40   8  24  48  48  40  16
  35   0  16  10   0   0   0  12
  51  51  19  19  45  45  37  13
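To see how thresholding affects the reconstructed data, the sketch below zeroes small coefficients of the running example and rebuilds the signal. It is an illustration only; `threshold` and `haar_reconstruct` are hypothetical helper names, and the coefficient list is copied from the example.

```python
def threshold(coeffs, c):
    """Replace coefficients whose absolute value is below c with zero."""
    return [w if abs(w) >= c else 0 for w in coeffs]

def haar_reconstruct(coeffs):
    """Inverse of averaging and differencing, coarsest scale first."""
    values = list(coeffs[:1])
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(values)]
        pos += len(values)
        values = [x for avg, d in zip(values, details) for x in (avg + d, avg - d)]
    return values

COEFFS = [35, -3, 16, 10, 8, -8, 0, 12]

for c in (4, 9):
    kept = threshold(COEFFS, c)
    print(c, kept, haar_reconstruct(kept))
```

With C = 4 only the coefficient -3 is dropped; with C = 9 the coefficients 8 and -8 are dropped as well, so only four non-zero values need to be stored.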

Haar wavelets compression and approximation

[Figure: two plots, one per threshold. Blue line: original signal. Red line: reconstructed signal.]

Threshold C = 4:
  original data:       56  40   8  24  48  48  40  16
  coefficients:        35   0  16  10   8  -8   0  12
  reconstructed data:  59  43  11  27  45  45  37  13

Threshold C = 9:
  original data:       56  40   8  24  48  48  40  16
  coefficients:        35   0  16  10   0   0   0  12
  reconstructed data:  51  51  19  19  45  45  37  13

Background on Wavelets: Multi-d Haar Wavelets

■ The data cube has multiple dimensions (of equal length)
  – Standard decomposition
  – Non-standard decomposition
■ Standard decomposition
  – Fix an ordering for the data dimensions, say 1, 2, …, d
  – For each dimension k, fix the other (d-1) dimensions to get a 1-D “row” vector
  – Perform a complete 1-D Haar wavelet decomposition on that 1-D vector
  – Repeat the last two steps in the order fixed in the first step
■ Non-standard decomposition
  – Fix an ordering for the data dimensions, say 1, 2, …, d
  – In this order, for each dimension, perform one scale of 1-D Haar decomposition
  – Collect the averages together and repeat the last step on the averages
  – Conceptually: uses a hyper-box of size 2 × 2 × … × 2 (= 2^d cells)

Multi-d Haar Wavelets (non-standard)

Data (2 × 2 block):      Wavelet coefficients:
  a   b                    S    d1
  c   d                    d2   d3

One step along Dim 1 (x axis):
  (a + b) / 2    (a - b) / 2
  (c + d) / 2    (c - d) / 2

One step along Dim 2 (y axis):
  S = ((a + b) / 2 + (c + d) / 2) / 2 = (a + b + c + d) / 4
  d1, d2, d3 are obtained in the same way

Decomposition:
  S  = (a + b + c + d) / 4
  d1 = (a + c - b - d) / 4
  d2 = (a + b - c - d) / 4
  d3 = (a + d - b - c) / 4

Rebuilding:
  a = S + d1 + d2 + d3
  b = S + d2 - d1 - d3
  c = S + d1 - d2 - d3
  d = S + d3 - d1 - d2
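The 2 × 2 formulas can be checked with a short Python sketch. The function names are invented for illustration, and d3 is taken as (a + d - b - c) / 4, the form that is consistent with the four rebuilding equations.

```python
def decompose_2x2(a, b, c, d):
    """One non-standard Haar step on the block [[a, b], [c, d]]."""
    s = (a + b + c + d) / 4
    d1 = (a + c - b - d) / 4
    d2 = (a + b - c - d) / 4
    d3 = (a + d - b - c) / 4
    return s, d1, d2, d3

def reconstruct_2x2(s, d1, d2, d3):
    """Rebuild (a, b, c, d) from the average and the three details."""
    return (
        s + d1 + d2 + d3,
        s + d2 - d1 - d3,
        s + d1 - d2 - d3,
        s + d3 - d1 - d2,
    )

block = (6, 2, 4, 8)
coeffs = decompose_2x2(*block)
print(coeffs)
print(reconstruct_2x2(*coeffs))
```

In a full non-standard decomposition this step would be applied to every 2 × 2 block, and then repeated on the grid of averages S.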


Multi-d Haar Wavelets Example

Bad Position

Multi-d Haar Coefficients: Semantics and Representation

■ Questions: what is the contribution of each coefficient (W) in rebuilding the data array? How should a coefficient be stored?
■ Answer: W = <R, S, v>
  – R: d-dimensional support hyper-rectangle of W
  – S: sign information for all d-dimensional cells of W.R
  – v: magnitude of the coefficient W
  – R & S depend only on the Haar basis function
  – v depends on the original data

Multi-d Haar Coefficients: Semantics and Representation

[Figure: a 2-D data array A and its wavelet coefficients Wa, both indexed 0 to 3 along each axis, with the +/- sign pattern of each coefficient drawn over its support.]

Example: W = Wa[1, 2]
  W.v = -2
  W.R.bound[1].lo = 2     W.R.bound[1].hi = 3
  W.R.bound[2].lo = 0     W.R.bound[2].hi = 1
  W.S.sign[1].lo = ‘+’    W.S.sign[1].hi = ‘+’
  W.S.sign[2].lo = ‘+’    W.S.sign[2].hi = ‘-’
  W.S.schg[1] = 2         W.S.schg[2] = 1

Rebuilding one cell:
  A[0,1] = + Wa[0,0] + Wa[0,1] + Wa[1,0] + Wa[1,1] - Wa[0,2] + Wa[2,0] - Wa[2,2] = 2.5 - (-1) + (-.5) = 3
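One possible in-memory form of W = <R, S, v> is sketched below in Python. The class and field names are hypothetical. The sketch assumes that, along each dimension, coordinates below `schg` take the `lo` sign and coordinates at or above it take the `hi` sign, which is how the selection and join mappings later in the deck treat the sign change.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Coefficient:
    bounds: List[Tuple[int, int]]  # W.R.bound[i] as an inclusive (lo, hi) pair
    signs: List[Tuple[int, int]]   # W.S.sign[i] as (lo sign, hi sign), each +1 or -1
    schg: List[int]                # W.S.schg[i]: assumed first coordinate with the hi sign
    v: float                       # W.v

    def contribution(self, cell):
        """Signed contribution of this coefficient to one cell (0 outside W.R)."""
        total = self.v
        for x, (lo, hi), (s_lo, s_hi), chg in zip(cell, self.bounds, self.signs, self.schg):
            if not lo <= x <= hi:
                return 0
            total *= s_lo if x < chg else s_hi
        return total

# The example W = Wa[1, 2], listing dimension 1 first and dimension 2 second.
w = Coefficient(bounds=[(2, 3), (0, 1)], signs=[(1, 1), (1, -1)], schg=[2, 1], v=-2)
print(w.contribution((2, 0)), w.contribution((3, 1)), w.contribution((0, 0)))
```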


Notation used in the paper

Construction of Compact Relations: Wavelet decomposition of JFD Matrix

[Figure: a relation with numeric attributes is converted into its joint frequency distribution (JFD) matrix, which is then wavelet-decomposed.]
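A joint frequency distribution matrix simply counts how often each combination of attribute values occurs. The Python sketch below builds one for a small two-attribute relation; the tuples and the function name `jfd_matrix` are made up for illustration and do not come from the slides.

```python
from collections import Counter

def jfd_matrix(tuples, shape):
    """Count occurrences of each (attr1, attr2) pair into a dense 2-D matrix."""
    counts = Counter(tuples)
    rows, cols = shape
    # Counter returns 0 for pairs that never occur.
    return [[counts[(i, j)] for j in range(cols)] for i in range(rows)]

# Hypothetical relation with two numeric attributes, each in the range 0..3.
relation = [(0, 1), (0, 1), (2, 3), (1, 0)]
matrix = jfd_matrix(relation, (4, 4))
for row in matrix:
    print(row)
```

For sparse, high-dimensional data a dense matrix like this would be impractical; the sketch only shows what the cells mean.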

Thresholding

■ Retain the k coefficients with the largest absolute value after normalization
■ This minimizes the overall mean squared error
■ The set of coefficients retained after thresholding is the wavelet-coefficient synopsis
■ All SQL queries will run on the synopsis
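Top-k thresholding after normalization can be sketched for the 1-D running example as follows. The normalization used here is an assumption: each coefficient is weighted by the square root of the size of its support, which corresponds to an orthonormal Haar basis. The function names are illustrative.

```python
import math

def support_size(index, n):
    """Number of data cells covered by the coefficient at this position."""
    if index == 0:
        return n
    level = index.bit_length() - 1  # index 1 -> 0, 2..3 -> 1, 4..7 -> 2
    return n >> level

def top_k(coeffs, k):
    """Keep the k coefficients with the largest normalized magnitude."""
    n = len(coeffs)
    ranked = sorted(
        range(n),
        key=lambda i: abs(coeffs[i]) * math.sqrt(support_size(i, n)),
        reverse=True,
    )
    return {i: coeffs[i] for i in sorted(ranked[:k])}

coeffs = [35, -3, 16, 10, 8, -8, 0, 12]
synopsis = top_k(coeffs, 4)
print(synopsis)
```

The returned dictionary maps coefficient positions to values, so only the retained coefficients are stored.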

Summary of Step 1

■ Wavelet decomposition & construction of synopsis
  – 1-D Haar wavelet decomposition
    • Simple & fast to compute
    • Pairwise averaging & differencing
    • Recursive fashion
  – Multi-D Haar wavelet decomposition
    • Non-standard extension
    • Alternate between dimensions
  – Thresholding
    • Threshold away the smallest coefficients
    • Lossy data compression
    • Approximation
  – How to store coefficients
    • Semantics of the notation W = <R, S, v>
    • SQL will run on the coefficients

Query Processing (Step 2)

■ Entire processing in the compressed (wavelet) domain

[Figure: two paths from the wavelet synopses to the final approximate results. Compressed domain (FAST): query in the wavelet domain to get query results in the wavelet domain, then render. Relation domain (SLOW): render the synopses into approximate relations, then query in the relation domain.]

Query Processing

■ Each operator (e.g., select, project, join, aggregates, etc.)
  – input: set of coefficients
  – output: set of coefficients
■ Finally, the rendering step
  – input: set of coefficients
  – output: (multi)set of tuples
■ Questions
  – How to map the query algebra?
  – Can we maintain the semantics of the coefficients?

[Figure: an operator tree in which two selects feed a join, followed by a project; every edge carries a set of coefficients, and a final render step outputs a set of tuples.]

Query algebra mapping - Selection : Definition

■ Select_pred(WT)
  – T is a d-dimensional relation
  – WT is T’s wavelet synopsis
■ pred = (l_i1 ≤ D_i1 ≤ h_i1) ∧ … ∧ (l_ik ≤ D_ik ≤ h_ik)
■ k-dimensional range selection
  – Range defined for k dimensions, D’ = {D_i1, D_i2, … , D_ik}
  – Range unspecified for the remaining (d - k) dimensions: 0 ≤ X ≤ |Dx|
■ Example

Query algebra mapping - Selection : example

Relation (non-zero cells of the JFD matrix):

  Dim D1 (Attr1)   Dim D2 (Attr2)   Count
  0                6                6
  1                2                3
  1                3                4
  1                5                6
  1                6                8
  2                6                7
  3                0                1
  4                2                3
  5                2                2
  6                1                3
  6                2                2
  6                5                1
  6                6                3

[Figure: the JFD matrix over Dim. D1 and Dim. D2 with the query range marked.]

■ D1: (0, 7), D2: (0, 7)
■ pred = (1 ≤ D1 ≤ 4) ∧ (2 ≤ D2 ≤ 6), D’ = {D1, D2}
■ In the relation domain, we are interested only in the cells inside the query range
■ In the wavelet domain, we are interested only in the coefficients that contribute to those cells

Query algebra mapping - Selection : Mapping

[Figure: coefficients W1, W2, W3 and W4 drawn against the query range in the (D1, D2) plane; after selection the figure shows W2’, W3 and W4’.]

■ 1. For each W in WT do
■ 2. If for every D_ij in D’ /* check overlapping */
       l_ij ≤ W.R.bound[i_j].lo ≤ h_ij or W.R.bound[i_j].lo ≤ l_ij ≤ W.R.bound[i_j].hi
     then goto 3, else goto 5
■ 3. For all D_ij in D’ do /* the overlapping area is the new hyper-rectangle */
       set W.R.bound[i_j].lo := max{l_ij, W.R.bound[i_j].lo}
           W.R.bound[i_j].hi := min{h_ij, W.R.bound[i_j].hi}
       if W.R.bound[i_j].hi < W.S.schg[i_j] then /* no sign change any more */
         set W.S.schg[i_j] := W.R.bound[i_j].lo
             W.S.sign[i_j] := [W.S.sign[i_j].lo, W.S.sign[i_j].lo]
       elseif W.R.bound[i_j].lo ≥ W.S.schg[i_j] then /* no sign change any more */
         set W.S.schg[i_j] := W.R.bound[i_j].lo
             W.S.sign[i_j] := [W.S.sign[i_j].hi, W.S.sign[i_j].hi]
■ 4. Output the updated W, Ws = Ws ∪ {W}
■ 5. Goto 1, select the next W
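The selection mapping can be sketched in Python as below. This is an illustrative reading of the steps, not the authors' code: `Coefficient`, `select` and the `ranges` dictionary are hypothetical names, dimensions are 0-based, and the overlap test is written as "clipped lo ≤ clipped hi", which is equivalent to the two-sided check in step 2.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Coefficient:
    bounds: List[Tuple[int, int]]  # inclusive (lo, hi) per dimension
    signs: List[Tuple[int, int]]   # (lo sign, hi sign) per dimension
    schg: List[int]                # sign-change position per dimension
    v: float

def select(synopsis, ranges):
    """ranges maps a dimension index to an inclusive (lo, hi) query range."""
    result = []
    for w in synopsis:
        bounds, signs, schg = list(w.bounds), list(w.signs), list(w.schg)
        overlaps = True
        for i, (q_lo, q_hi) in ranges.items():
            lo, hi = max(q_lo, bounds[i][0]), min(q_hi, bounds[i][1])
            if lo > hi:                 # no overlap on this dimension: drop W
                overlaps = False
                break
            bounds[i] = (lo, hi)        # overlapping area is the new range
            if hi < schg[i]:            # only the lo-sign part survives
                signs[i], schg[i] = (signs[i][0], signs[i][0]), lo
            elif lo >= schg[i]:         # only the hi-sign part survives
                signs[i], schg[i] = (signs[i][1], signs[i][1]), lo
        if overlaps:
            result.append(Coefficient(bounds, signs, schg, w.v))
    return result

w = Coefficient([(0, 3), (0, 3)], [(1, -1), (1, 1)], [2, 0], 5.0)
print(select([w], {0: (0, 1)}))
print(select([w], {1: (4, 5)}))
```

The coefficient value `v` is never touched by selection; only the support and sign information are clipped.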

Query algebra mapping - Projection : Definition

■ Project_{Xi1, Xi2, …, Xik}(WT)
  – T is a d-dimensional relation
  – WT is T’s wavelet synopsis
■ Xi1, Xi2, … , Xik is the set of attributes we are interested in
  – The remaining (d-k) dimensions will be projected out
■ Project out the (d-k) dimensions one by one
■ Example

Query algebra mapping - Projection : example

[Figure: the JFD matrix over D1 and D2. D1 is the dimension to retain; D2 is the dimension to eliminate.]

■ D1 is to be retained, D2 will be projected out
■ In the relation domain, sum the elements in each row along the eliminated dimension
■ In the wavelet domain, sum the contribution of each coefficient along the eliminated dimension

Rows with D1 = 6 before projection:

  Dim D1 (Attr1)   Dim D2 (Attr2)   Count
  6                1                3
  6                2                2
  6                5                1
  6                6                3

After projection:

  Dim D1 (Attr1)   Count
  6                9

Result of projection (counts for D1 = 6 down to D1 = 0): 9, 2, 3, 1, 7, 21, 6

Query algebra mapping - Projection : Mapping

[Figure: projection on D1 of two coefficients. Labels: W1.v = X * W1.v for a coefficient with no sign change along D2 (extent X); W2.v = (X2 – X1) * W2.v for a coefficient with a sign change along D2 (extents X1 and X2).]

Here D’ is the set of retained dimensions and D – D’ the set of dimensions to be projected out.

■ 1. For each Dj in D – D’ (to be projected out)
■ 2. For every W in WT do
     2.1 Set W.v := W.v * Pj, where Pj equals
           (W.R.bound[j].hi - W.S.schg[j] + 1) * W.S.sign[j].hi
           + (W.S.schg[j] - W.R.bound[j].lo) * W.S.sign[j].lo
     2.2 Discard dimension Dj (hyper-rectangle and sign) from W
■ 3. Goto 1, select the next Dj
■ In step 2, by summing up the contributions of W along Dj, we are projecting out Dj
■ In short, for each W we can simply do:
  – W.v := W.v * PROD_{Dj in D – D’} Pj
  – Discard the dimensions D – D’
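The projection mapping can be sketched as follows, taking the factor Pj to be (number of hi-sign cells) × (hi sign) + (number of lo-sign cells) × (lo sign) along the dimension being removed. `Coefficient` and `project` are hypothetical names, and dimensions are 0-based.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Coefficient:
    bounds: List[Tuple[int, int]]  # inclusive (lo, hi) per dimension
    signs: List[Tuple[int, int]]   # (lo sign, hi sign) per dimension
    schg: List[int]                # sign-change position per dimension
    v: float

def project(w, retained):
    """Project w onto the dimensions listed in `retained`, summing out the rest."""
    value = w.v
    for j, ((lo, hi), (s_lo, s_hi)) in enumerate(zip(w.bounds, w.signs)):
        if j in retained:
            continue
        # P_j: signed count of cells along the eliminated dimension j.
        value *= (hi - w.schg[j] + 1) * s_hi + (w.schg[j] - lo) * s_lo
    return Coefficient(
        [w.bounds[i] for i in retained],
        [w.signs[i] for i in retained],
        [w.schg[i] for i in retained],
        value,
    )

average = Coefficient([(0, 3), (0, 7)], [(1, 1), (1, 1)], [0, 0], 2.0)
detail = Coefficient([(0, 3), (0, 7)], [(1, 1), (1, -1)], [0, 4], 2.0)
clipped = Coefficient([(0, 3), (2, 7)], [(1, 1), (1, -1)], [0, 4], 2.0)
print(project(average, [0]).v, project(detail, [0]).v, project(clipped, [0]).v)
```

A detail coefficient whose positive and negative halves are equally long along the eliminated dimension projects to zero, while one whose support was clipped by an earlier selection generally does not.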

Query algebra mapping - Equi-Join : Definition

■ Join_pred(WT1, WT2)
  – Dim(T1) = d1, Dim(T2) = d2
  – Wavelet synopsis(T1) = WT1, wavelet synopsis(T2) = WT2
■ pred = (X^1_1 = X^2_1) ∧ … ∧ (X^1_k = X^2_k), where X^1_i and X^2_i are the i-th join attributes of T1 and T2
  – pred is k-dimensional, with k ≤ d1 and k ≤ d2
  – WLOG, assume the join attributes are the first k dimensions of both T1 and T2
  – Let D’ = (D1, D2, … , Dk)
■ The dimension of the result is (d1 + d2 - k)
■ Example

Query algebra mapping - Equi-Join : example

Relation1:

  Dim D1 (Attr1)   Dim D2 (Attr2)   Count
  6                2                7
  4                3                6

Relation2:

  Dim D1 (Attr1)   Dim D3 (Attr3)   Count
  6                3                3

Join along D1:

  Dim D1 (Attr1)   Dim D2 (Attr2)   Dim D3 (Attr3)   Count
  6                2                3                21

[Figure: the JFD matrix of Relation1 (over D1 and D2) and the JFD matrix of Relation2 (over D1 and D3). The cells with counts 7 and 3 have the same value on the join dimension D1.]

■ In the relation domain, join count = 7 * 3 = 21
■ In the wavelet domain, consider all pairs of coefficients, check joinability, and compute new coefficients

Query algebra mapping - Equi-Join : example

[Figure: pairs of coefficients from the two relations joined along dimension D1. A pair without overlap on D1 produces NOTHING; an overlapping pair produces a new coefficient over (D1, D2, D3) with W.v = W11.v * W21.v. Cells A(X1, X2) and B(X1, X3) share the value X1 on D1.]

■ Case 1: no overlapping
  – Output nothing
■ Case 2: overlapping
  – Cell A(X1, X2) and cell B(X1, X3)
  – W11 and W12 cover A (W12 not shown)
  – W21 and W22 cover B (W22 not shown)
  – Calculate the join result for (X1, X2, X3):
      (W11.v + W12.v) * (W21.v + W22.v) = W11.v * W21.v + W11.v * W22.v + W12.v * W21.v + W12.v * W22.v
■ Consider each coefficient pair
■ The join range along any dimension can contain at most one true sign change, due to the complete containment property of the Haar wavelet decomposition

Equi-Join : Mapping

■ 1. For each pair (W1, W2), W1 in WT1 and W2 in WT2, do
■ 2. If for every Di in D’ /* check overlapping in the k join dimensions */
       (W1.R.bound[i].lo ≤ W2.R.bound[i].lo ≤ W1.R.bound[i].hi) OR
       (W2.R.bound[i].lo ≤ W1.R.bound[i].lo ≤ W2.R.bound[i].hi)
     then goto 3, else goto 7
■ 3. For each join dimension Di in D’ do /* steps 3 to 6 build a new coefficient W on the join range */
     3.1 Set W.R.bound[i].lo := max{W1.R.bound[i].lo, W2.R.bound[i].lo} /* set join boundary */
             W.R.bound[i].hi := min{W1.R.bound[i].hi, W2.R.bound[i].hi}
     3.2 For j = 1, 2 /* compute sign info; let Sj be a temporary sign-vector variable */
           if W.R.bound[i].hi < Wj.S.schg[i] then Sj := [Wj.S.sign[i].lo, Wj.S.sign[i].lo]
           elseif W.R.bound[i].lo ≥ Wj.S.schg[i] then Sj := [Wj.S.sign[i].hi, Wj.S.sign[i].hi]
           else set Sj := Wj.S.sign[i]
     3.3 Set W.S.sign[i] := [S1.lo * S2.lo, S1.hi * S2.hi]
     3.4 If W.S.sign[i].lo == W.S.sign[i].hi then set W.S.schg[i] := W.R.bound[i].lo
     3.5 else set W.S.schg[i] := max over j = 1, 2 of {Wj.S.schg[i] : Wj.S.schg[i] in [W.R.bound[i].lo, W.R.bound[i].hi]}
■ 4. For each non-join dimension Di, i = k + 1, … , d1 do /* steps 4 and 5 inherit the non-join dimensions */
       set W.R.bound[i] := W1.R.bound[i], W.S.sign[i] := W1.S.sign[i], W.S.schg[i] := W1.S.schg[i]
■ 5. For each non-join dimension Di, i = d1 + 1, … , d1 + d2 – k do
       set W.R.bound[i] := W2.R.bound[i – d1 + k], W.S.sign[i] := W2.S.sign[i – d1 + k], W.S.schg[i] := W2.S.schg[i – d1 + k]
■ 6. Set W.v := W1.v * W2.v and output W, Ws = Ws ∪ {W}
■ 7. Goto 1, select another pair
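A compact Python sketch of the pairwise join step is given below. It follows the listed steps under the stated assumption that the join attributes are the first k dimensions of both inputs; `Coefficient`, `restrict_sign`, `join_pair` and `join` are hypothetical names, and dimensions are 0-based.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Coefficient:
    bounds: List[Tuple[int, int]]  # inclusive (lo, hi) per dimension
    signs: List[Tuple[int, int]]   # (lo sign, hi sign) per dimension
    schg: List[int]                # sign-change position per dimension
    v: float

def restrict_sign(w, i, lo, hi):
    """Sign pair of w on dimension i once its range is cut down to [lo, hi]."""
    s_lo, s_hi = w.signs[i]
    if hi < w.schg[i]:
        return (s_lo, s_lo)
    if lo >= w.schg[i]:
        return (s_hi, s_hi)
    return (s_lo, s_hi)

def join_pair(w1, w2, k):
    """Join two coefficients on their first k dimensions; None if no overlap."""
    bounds, signs, schg = [], [], []
    for i in range(k):
        lo = max(w1.bounds[i][0], w2.bounds[i][0])
        hi = min(w1.bounds[i][1], w2.bounds[i][1])
        if lo > hi:
            return None
        s1, s2 = restrict_sign(w1, i, lo, hi), restrict_sign(w2, i, lo, hi)
        sign = (s1[0] * s2[0], s1[1] * s2[1])
        if sign[0] == sign[1]:
            chg = lo                    # no sign change left on the join range
        else:
            chg = max(c for c in (w1.schg[i], w2.schg[i]) if lo <= c <= hi)
        bounds.append((lo, hi)); signs.append(sign); schg.append(chg)
    # Non-join dimensions are inherited: first those of w1, then those of w2.
    bounds += w1.bounds[k:] + w2.bounds[k:]
    signs += w1.signs[k:] + w2.signs[k:]
    schg += w1.schg[k:] + w2.schg[k:]
    return Coefficient(bounds, signs, schg, w1.v * w2.v)

def join(syn1, syn2, k):
    pairs = (join_pair(a, b, k) for a in syn1 for b in syn2)
    return [w for w in pairs if w is not None]

w1 = Coefficient([(0, 7), (0, 3)], [(1, -1), (1, 1)], [4, 0], 2.0)   # over (D1, D2)
w2 = Coefficient([(4, 5), (0, 1)], [(1, -1), (1, 1)], [5, 0], 3.0)   # over (D1, D3)
w3 = Coefficient([(0, 3), (0, 1)], [(1, 1), (1, 1)], [0, 0], 1.0)    # over (D1, D3)
print(join([w1], [w2, w3], 1))
```

Every pair of coefficients is examined, so the cost of this sketch is proportional to the product of the two synopsis sizes.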

Query algebra mapping - Equi-Join : example

[Figure: further coefficient pairs joined along dimension D1. A pair that does not overlap on D1 yields NOTHING; an overlapping pair yields a coefficient over (D1, D2, D3) with val = val1 * val2.]

Query algebra mapping - Equi-Join : example

[Figure: two more overlapping coefficient pairs joined along D1, each yielding a coefficient over (D1, D2, D3) with val = val1 * val2 and a combined sign pattern.]

Summary of Step 2

■ Query algebra mapping (only non-aggregate)
  – Selection
    • Update the wavelet coefficients whose hyper-rectangle overlaps the selection range
  – Projection
    • Sum up all wavelet coefficients along all dimensions to be projected out
  – Join
    • Create new wavelet coefficients
    • The hyper-rectangle equals the join range plus the non-join dimensions
    • Compute sign information
  – Results need to be rendered
    • The outputs of the above queries are wavelet coefficients
    • They need to be converted to a database relation

Rendering (Step 3)

■ Go back from the wavelet domain to database relations
■ The semantics of the wavelet coefficients are unchanged
  – Range, sign, sign change, magnitude
■ Inverse wavelet decomposition is easy
  – Sum up the contributions of all coefficients to each cell
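Rendering by summing contributions can be sketched naively as below. This version visits every cell of the result array, which is only meant to show the semantics and is not an efficient rendering algorithm; the names `Coefficient`, `contribution` and `render` and the 4-cell example are illustrative.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Tuple

@dataclass
class Coefficient:
    bounds: List[Tuple[int, int]]  # inclusive (lo, hi) per dimension
    signs: List[Tuple[int, int]]   # (lo sign, hi sign) per dimension
    schg: List[int]                # sign-change position per dimension
    v: float

def contribution(w, cell):
    """Signed contribution of w to one cell (0 outside its hyper-rectangle)."""
    total = w.v
    for x, (lo, hi), (s_lo, s_hi), chg in zip(cell, w.bounds, w.signs, w.schg):
        if not lo <= x <= hi:
            return 0
        total *= s_lo if x < chg else s_hi
    return total

def render(synopsis, shape):
    """Return {cell: value} by summing all coefficient contributions per cell."""
    return {
        cell: sum(contribution(w, cell) for w in synopsis)
        for cell in product(*(range(n) for n in shape))
    }

# 1-D example: coefficients of the data [6, 2, 5, 3] (the coarse detail is 0 and omitted).
synopsis = [
    Coefficient([(0, 3)], [(1, 1)], [0], 4),   # overall average
    Coefficient([(0, 1)], [(1, -1)], [1], 2),  # detail on cells 0..1
    Coefficient([(2, 3)], [(1, -1)], [3], 1),  # detail on cells 2..3
]
print(render(synopsis, (4,)))
```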

Experimental Results

■ Compare the wavelet-based technique
  – With sampling and histograms
  – In terms of efficiency and accuracy
■ Measuring accuracy (error metrics)
  – Aggregate: absolute relative error
  – Non-aggregate: EMD error
■ Query types
  – SELECT, SELECT-SUM, SELECT-JOIN, SELECT-JOIN-SUM
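For aggregate queries, the absolute relative error compares the approximate answer with the exact one. The sketch below assumes the usual definition |approx - exact| / |exact| and applies it to a range sum over the first four cells of the earlier 1-D example with threshold C = 4; the function name and the choice of query are illustrative.

```python
def absolute_relative_error(exact, approx):
    """|approx - exact| / |exact|; undefined when the exact answer is 0."""
    if exact == 0:
        raise ValueError("exact answer must be non-zero")
    return abs(approx - exact) / abs(exact)

original = [56, 40, 8, 24, 48, 48, 40, 16]
reconstructed = [59, 43, 11, 27, 45, 45, 37, 13]  # after thresholding with C = 4

exact_sum = sum(original[:4])
approx_sum = sum(reconstructed[:4])
print(exact_sum, approx_sum, absolute_relative_error(exact_sum, approx_sum))
```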

Datasets and Queries

■ Synthetic data set
■ Real data set:
  – CENSUS Population Survey (www.census.gov), 1992 & 1994
  – 4-d data: age (0-17), education level (0-46), income (0-41), hrs/week (0-13)
  – JFD matrix size: 2 million cells (≈ 32 * 64 * 64 * 16)
  – Relation sizes (2 relations) ~ 16,000
  – Density ~ 0.001
■ Queries:
  – Selects: 5 ≤ age < 10 ∧ 10 ≤ income < 15, selectivity ~ 6%
  – Joins: join on age between the 1992 and 1994 data
  – Sum: sum on age

Query Execution Time

■ Two-dimensional synthetic data set used
■ Running time on the base relation is 3.6 seconds (with enough memory)
■ Sampling is not counted here
  – It yields too few tuples for joins
■ Wavelets run faster than histograms
  – By more than two orders of magnitude
  – Histograms are expanded to generate the tuple-value distribution
  – Wavelets are expanded only at the very end


Query Execution Accuracy


Query Execution Accuracy

Conclusion

■ Wavelets are an effective tool for general-purpose approximate query answering
  – fast query processing (entirely in the wavelet (compressed) domain)
  – low synopsis construction cost
  – high accuracy even at high dimensions
  – can handle all types of queries