Wavelet Synopses with Error Guarantees
Minos Garofalakis, Phillip B. Gibbons
Information Sciences Research Center, Bell Labs, Lucent Technologies, Murray Hill, NJ 07974
ACM SIGMOD 2002

TRANSCRIPT

1

Wavelet Synopses with Error Guarantees

Minos Garofalakis, Phillip B. Gibbons
Information Sciences Research Center, Bell Labs, Lucent Technologies, Murray Hill, NJ 07974

ACM SIGMOD 2002

2

Outline

Introduction
Wavelet basics
Probabilistic wavelet synopses
Experimental study
Conclusions

3

Introduction

The wavelet decomposition has proven effective in reducing large amounts of data to compact sets of wavelet coefficients (termed "wavelet synopses") that can be used to provide fast and reasonably accurate approximate answers to queries.

Due to the exploratory nature of many Decision Support System applications, there are a number of scenarios in which the user may prefer a fast approximate answer.

4

Introduction

A major criticism of wavelet-based techniques is that conventional wavelet synopses cannot provide guarantees on the error of individual approximate query answers.

5

Introduction

The problems with approximate query processing over wavelet synopses stem from their deterministic approach to selecting coefficients and from their lack of error guarantees.

We propose an approach to building wavelet synopses that enables unbiased approximate query answers with error guarantees on the accuracy of individual answers.

6

Introduction

The technique is based on a probabilistic thresholding scheme that assigns each coefficient a probability of being retained, based on its importance to the reconstruction of individual data values, and then flips coins to select the synopsis.

7

Wavelet basics

Given the data vector A, the wavelet transform of A can be computed by recursive pairwise averaging and differencing (the Haar decomposition).

To equalize the importance of all wavelet coefficients, each coefficient is normalized according to its level in the decomposition.
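The averaging-and-differencing computation can be sketched in a few lines of Python (an illustration of the standard unnormalized Haar transform, not code from the slides):

```python
def haar_transform(data):
    """Unnormalized Haar wavelet decomposition of a vector whose length
    is a power of two.  Returns the coefficient vector [c0, c1, c2, ...]:
    the overall average, then details from coarsest to finest level."""
    coeffs = []
    a = list(data)
    while len(a) > 1:
        averages = [(a[i] + a[i + 1]) / 2 for i in range(0, len(a), 2)]
        details = [(a[i] - a[i + 1]) / 2 for i in range(0, len(a), 2)]
        coeffs = details + coeffs  # finer-level details stay to the right
        a = averages
    return a + coeffs
```

For a hypothetical vector such as [2, 2, 0, 2, 3, 5, 4, 4], this yields [2.75, -1.25, 0.5, 0, 0, -1, -1, 0].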

8

Wavelet basics

A helpful tool for exploring and understanding the key properties of the wavelet decomposition is the error-tree structure.

9

Wavelet basics

The important reconstruction properties:

(P1) The reconstruction of any data value di depends only on the values of the nodes in path(di).

(P2) The range sum d(l:h) = Σ_{i=l..h} di can be computed using only the coefficients in path(dl) ∪ path(dh).

10

Wavelet basics

d5 = c0 − c2 + c5 − c10 = 65 − 14 + (−20) − 28 = 3
d(3:5) = 3c0 + (1 − 2)c2 − c4 + 2c5 − c9 + (1 − 1)c10 = 93
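Property (P1) can be checked mechanically. The sketch below walks the error-tree path for a hypothetical 8-value Haar-transformed vector (the 16-value data behind the slide's figure is not recoverable from this transcript):

```python
import math

def reconstruct(coeffs, i):
    """Rebuild data value d_i from only the log2(N)+1 coefficients on
    path(d_i) in the error tree: a coefficient contributes with sign +1
    when d_i lies in its left subtree and -1 otherwise (property P1)."""
    n = len(coeffs)
    levels = int(math.log2(n))
    value = coeffs[0]   # the root average c0 always contributes +c0
    j = 1               # c1 is the root's single detail child
    for l in range(levels):
        bit = (i >> (levels - 1 - l)) & 1   # 0 = left subtree, 1 = right
        value += coeffs[j] if bit == 0 else -coeffs[j]
        j = 2 * j + bit                     # descend toward d_i
    return value

# Hypothetical example: Haar transform of [2, 2, 0, 2, 3, 5, 4, 4].
coeffs = [2.75, -1.25, 0.5, 0, 0, -1, -1, 0]
```

Here reconstruct(coeffs, 4) recovers 3.0 and reconstruct(coeffs, 5) recovers 5.0, matching the original data.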

11

Probabilistic wavelet synopses: A. The problem with conventional wavelets

Conventional coefficient thresholding is a completely deterministic process that typically retains the B wavelet coefficients with the largest absolute value after normalization; this deterministic process minimizes the overall L2 error.
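A minimal sketch of this deterministic thresholding, assuming the unnormalized Haar convention in which a coefficient's L2 contribution scales with the square root of its support (`largest_b_synopsis` and `weight` are illustrative names, not from the paper):

```python
import math

def largest_b_synopsis(coeffs, B):
    """Deterministic thresholding: keep the B coefficients whose
    normalized absolute value is largest, zero out the rest.  For the
    unnormalized Haar transform, coefficient c_j contributes to
    N / 2^level(j) data values, so its L2 weight is |c_j| * sqrt(support)."""
    n = len(coeffs)

    def weight(j):
        support = n if j == 0 else n // (2 ** int(math.log2(j)))
        return abs(coeffs[j]) * math.sqrt(support)

    keep = set(sorted(range(n), key=weight, reverse=True)[:B])
    return [c if j in keep else 0 for j, c in enumerate(coeffs)]
```

For example, largest_b_synopsis([2.75, -1.25, 0.5, 0, 0, -1, -1, 0], 3) keeps c0, c1, and one of the tied finest-level details.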

12

Probabilistic wavelet synopses: A. The problem with conventional wavelets

Retaining only c0 = 65 gives d5 = 65 − 0 + 0 − 0 = 65 and d(3:5) = 3·65 − 0 − 0 + 0 − 0 = 195.

13

Probabilistic wavelet synopses: A. The problem with conventional wavelets

Root causes: (1) strict deterministic thresholding; (2) independent thresholding; (3) the bias resulting from dropping coefficients without compensating for their loss.

14

Probabilistic wavelet synopses: B. General Approach

Our scheme deterministically retains the most important coefficients while randomly rounding the other coefficients either up to a larger value (the rounding value) or down to zero.

By carefully selecting the rounding values we ensure that: (1) we expect a total of B coefficients to be retained; and (2) we minimize a desired error metric in the reconstruction of the data.

15

Probabilistic wavelet synopses: B. General Approach

The key idea in the thresholding scheme is to associate with each coefficient ci a random variable Ci such that (1) Ci = 0 with some probability, and (2) E[Ci] = ci, where we select a rounding value λi for each non-zero ci such that 0 < ci/λi ≤ 1.

16

Probabilistic wavelet synopses: B. General Approach

Our thresholding scheme essentially "rounds" each non-zero wavelet coefficient ci independently to either λi or zero, by flipping a biased coin with success probability ci/λi.

Its variance is simply Var[Ci] = ci(λi − ci).
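The biased coin flip can be sketched as follows (`probabilistic_round` is an illustrative name; the success probability ci/λi follows from the unbiasedness requirement E[Ci] = ci):

```python
import random

def probabilistic_round(coeffs, lambdas, rng):
    """Round each non-zero c_i independently to lambda_i with success
    probability y_i = c_i / lambda_i, and to zero otherwise.  Then
    E[C_i] = c_i (unbiased) and Var[C_i] = c_i * (lambda_i - c_i)."""
    out = []
    for c, lam in zip(coeffs, lambdas):
        if c != 0 and rng.random() < c / lam:
            out.append(lam)
        else:
            out.append(0.0)
    return out
```

Averaging many independent rounds of a coefficient c = 2 with λ = 4 gives an empirical mean close to 2, illustrating unbiasedness.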

17

Probabilistic wavelet synopses: B. General Approach

For example: λ0 = c0, λ10 = 2c10, λi = 3ci/2 for the other coefficients.

18

Probabilistic wavelet synopses: B. General Approach

The impact of the λi's:

Choosing λi closer to ci reduces the variance.

Choosing λi further from ci reduces the expected number of retained coefficients.

19

Probabilistic wavelet synopses: C. Rounding to minimize the expected mean-square error

A reasonable approach is to select the λi values in a way that minimizes some overall error metric (e.g., the L2 error).


20

Probabilistic wavelet synopses: C. Rounding to minimize the expected mean-square error

Letting yi = ci/λi denote the retention probability of ci (stated here in terms of the normalized coefficients), the expected L2-error minimization problem is equivalent to minimizing Σi ci²(1/yi − 1) subject to Σi yi = B.

Based on the Cauchy–Schwarz inequality, the minimum value of the objective is reached when yi = B·|ci| / Σj |cj|.
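Spelled out, the Cauchy–Schwarz step (reconstructed here in terms of the retention probabilities yi = ci/λi; the slide's own formulas were lost in transcription) runs:

```latex
% Minimize \sum_i c_i^2 (1/y_i - 1) subject to \sum_i y_i = B,\ 0 < y_i \le 1.
% By Cauchy--Schwarz,
\Big(\sum_i \frac{c_i^2}{y_i}\Big)\Big(\sum_i y_i\Big)
  \;\ge\; \Big(\sum_i |c_i|\Big)^2,
\qquad\text{so}\qquad
\sum_i \frac{c_i^2}{y_i} \;\ge\; \frac{\big(\sum_i |c_i|\big)^2}{B},
% with equality exactly when |c_i|/\sqrt{y_i} \propto \sqrt{y_i},
% i.e. when  y_i = B\,|c_i| \Big/ \sum_j |c_j|.
```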

21

Probabilistic wavelet synopses: C. Rounding to minimize the expected mean-square error

Let

22

Probabilistic wavelet synopses: D. Rounding to minimize the maximum relative error

We focus on minimizing the maximum reconstruction error for individual data values (the relative error).

The goal is to produce an estimate for each value di such that the relative error of the reconstructed value, |d̂i − di| / max{|di|, S} (where S is a sanity bound), is small.

23

Probabilistic wavelet synopses: D. Rounding to minimize the maximum relative error

The expected value of the reconstructed d̂i already equals di, so we would like to minimize the variance of d̂i.

More precisely, we seek to minimize the normalized standard error NSE(d̂i) = √Var(d̂i) / max{|di|, S} for a reconstructed data value.

24

Probabilistic wavelet synopses: D. Rounding to minimize the maximum relative error

Note that by applying Chebyshev's Inequality we obtain, for all α > 1: Pr( |d̂i − di| / max{|di|, S} ≥ α · NSE(d̂i) ) ≤ 1/α².

So minimizing the NSE will indeed minimize the probabilistic bounds on the relative-error metric.

25

Probabilistic wavelet synopses: D. Rounding to minimize the maximum relative error

26

Probabilistic wavelet synopses: D. Rounding to minimize the maximum relative error

We would like to formulate a dynamic-programming recurrence for this problem.

Let PATHSj denote the set of all root-to-leaf paths in Tj, and let M[j, B] denote the optimal value of the maximum NSE among all data values dk in Tj, assuming a space budget of B.

27

Probabilistic wavelet synopses: D. Rounding to minimize the maximum relative error

The recurrence for M[j, B] is depicted in Equation (11) of the paper.

28

Probabilistic wavelet synopses: D. Rounding to minimize the maximum relative error

29

Probabilistic wavelet synopses: D. Rounding to minimize the maximum relative error

The problem with (11) is that the yi and bL each range over a continuous interval, making the recurrence infeasible to use directly.

The key technical idea is to quantize the solution space.

We modify the constraint so that each yi is restricted to multiples of 1/q, where q is an input integer.
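The quantization itself is simple to state in code (a sketch; `quantized_probabilities` is an illustrative name):

```python
def quantized_probabilities(q):
    """Quantize the solution space: restrict each retention probability
    y_i to the multiples {1/q, 2/q, ..., q/q}, so the dynamic program
    searches a finite grid instead of a continuous interval."""
    return [k / q for k in range(1, q + 1)]
```

With q = 10 (the value used in the experiments), each yi is drawn from {0.1, 0.2, ..., 1.0}.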

30

Probabilistic wavelet synopses: E. Low-bias probabilistic wavelet synopses

Each coefficient is either retained in its entirety or discarded, according to the probabilities yi, where as before the yi's are selected to minimize a desired error metric.

31

Probabilistic wavelet synopses: F. Summary of the approach

32

Experimental study

A Zipfian data generator was used to produce Zipfian frequencies for various levels of skew (z parameter between 0.3 and 2.0).

We also use a real-world data set downloaded from the National Forest Service.

We set q = 10, the sanity bound S to the 10-percentile value in the data, and the perturbation Δ = min{0.01, S/100}.
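A Zipfian frequency generator of the kind described can be sketched as follows (an illustration under assumed conventions, not the authors' generator; `zipf_frequencies` is a hypothetical name):

```python
def zipf_frequencies(n, z, total=10000):
    """Generate n Zipfian frequencies with skew parameter z: value i
    gets frequency proportional to 1 / i^z, scaled so the counts sum to
    roughly `total`.  z = 0 is uniform; larger z means more skew."""
    weights = [1.0 / (i ** z) for i in range(1, n + 1)]
    s = sum(weights)
    return [round(total * w / s) for w in weights]
```

Sweeping z from 0.3 to 2.0, as in the study, moves the distribution from nearly uniform to highly skewed.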

33

Experimental study

34

Experimental study

35

Experimental study

36

Conclusions

We have introduced probabilistic wavelet synopses, the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers.

We have described a number of novel techniques for tuning our scheme to minimize desired error metrics

Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach

Page 2: 1 Wavelet synopses with Error Guarantees Minos Garofalakis Phillip B. Gibbons Information Sciences Research Center Bell Labs, Lucent Technologies Murray

2

Outline

Introduction Wavelet basics Probabilistic wavelet synopses Experimental study Conclusions

3

Introduction The wavelet decomposition has demonstrated

the effectiveness in reducing large amounts of data to compact sets of wavelet coefficients (termed ldquowavelet synopsesrdquo) that can be used to provide fast and reasonably accurate approximate answers to queries

Due to exploratory nature of many Decision Support Systems applications there are a number of scenarios in which the user may prefer a fast approximate answer

4

Introduction A major criticism of wavelet-based

techniques is the fact that conventional wavelet synopses can not provide guarantees on the error of individual approximate query answers

5

Introduction The problem for approximate query

processing with wavelet synopses due to their deterministic approach to selecting coefficients and their lack of error guarantees

We propose a approach to building wavelet synopses that enables unbiased approximate query answers with error guarantees on the accuracy of individual answers

6

Introduction The technique is based on probabilistic thre

sholding scheme that assigns each coefficient a probability of being retained based on its importance to the reconstruction of individual data values and then flips coins to select the synopsis

7

Wavelet basics Given the data vector A the wavelet

transform of A can be computed as follow

In order equalize the importance of all wavelet coefficients we normalize the coefficient is

8

Wavelet basics A helpful tool for exploring and

understanding the key properties of the wavelet decomposition is error tree structure

9

Wavelet basics The important reconstruction properties

(P1)The reconstruction of any data value di depends on the values of the nodes in path(di)

(P2)The range sum d(lh)=

10

Wavelet basics

d5=c0-c2+c5-c10=65-14+(-20)-28=3 d(35)=3c0+(1-2)c2-c4+2c5-c9+(1-1)c10=93

11

Probabilistic wavelet synopsesAThe problem with conventional wavelets

Conventional coefficient thresholding is a completely deterministic process that typically retain the B wavelet coefficients with the largest absolute value after normalization this deterministic process minimizes the overall L2 error

12

Probabilistic wavelet synopsesAThe problem with conventional wavelets

d5=65-0+0-0=65 d(35)=365-0-0+0-0=195

13

Probabilistic wavelet synopsesAThe problem with conventional wavelets

Root causes (1)strict deterministic thresholding (2)independent thresholding (3)the bias resulting from dropping coeffi

cients without compensating for their loss

14

Probabilistic wavelet synopses BGeneral Approach

Our scheme deterministically retains the most important coefficients while randomly rounding the other coefficients either up to a larger value( rounding value) or down to zero

By carefully selecting the rounding values we ensure that (1)We expect a total of B coefficients to be

retained (2)We minimize a desired error metric in the

reconstruction of the data

15

Probabilistic wavelet synopses BGeneral Approach

The key idea in thresholding scheme is to associate a random variable Ci such that (1)Ci=0 with some probability (2)E[Ci] = ci

where we select a rounding value λi for each non-zero ci such that

16

Probabilistic wavelet synopses BGeneral Approach

Our thresholding scheme essentially ldquoroundsrdquo each non-zero wavelet coefficient ci independently to either λi or zero by flipping a biased coin with success probability

It variance is simply

17

Probabilistic wavelet synopses BGeneral Approach 1

For example λ0=c0 λ10= 2c10 λi=3ci2

18

Probabilistic wavelet synopses BGeneral Approach The impact of the λirsquo s

λi closer ci reduce the variance

λi further from ci reduces the expected number of retained coefficients

19

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

A reasonable approach is to select the λi values in a way that minimize the some overall error metric (egL2)

1

20

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

Letting and The expected L2 error minimization problem is

equivalent to

Based on the Cauchy-Schwarz inequality the minimum value of the objective is reached when

21

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

Let

22

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

We focus on minimizing the maximum reconstruction error for individual (related error)

The goal is to produce estimate for each value di such that

23

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The expected value of we would like to minimize the variance

More precisely we seek to minimize the normalized standard error for a reconstructed data value

24

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

Note that by applying Chebyshevrsquos Inequality we obtain( for all αgt1)

So that minimizing NSE will indeed minimize the probabilistic bounds on relative error metric

25

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

26

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

We would like to formulate a dynamic programming recurrence for this problem

Let PATHSj denote the set of all root-to-leaf pahts in Tj M[jB] denote the optimal value of the maximum among all data dk in Tj assuming a space budget of B

27

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

M[jB] depicted in (11)

28

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

29

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The problem in (11) is that the yi and bL each range over a continuous interval making it infeasible to use

The key technical idea is to quantize the solution space

We modify the constraint

where q is a input integer

30

Probabilistic wavelet synopses ELow-bias probabilistic wavelet synopses

Each coefficient is either retained or discarded according to the probabilities yi where as before the yirsquos are selected to minimize a desired error metric

31

Probabilistic wavelet synopsesF Summary of the approach

32

Experimental study A Zipfian data generator was used to produ

ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)

We use real world data set download from the National Forest Service

Let q=10 sanity bound S as the 10-percentile in the da

ta perturbation Δ= min001 S100

33

Experimental study

34

Experimental study

35

Experimental study

36

Conclusions We has introduced probabilistic wavelet synopses

the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers

We have described a number of novel techniques for tuning our scheme to minimize desired error metrics

Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach

Page 3: 1 Wavelet synopses with Error Guarantees Minos Garofalakis Phillip B. Gibbons Information Sciences Research Center Bell Labs, Lucent Technologies Murray

3

Introduction The wavelet decomposition has demonstrated

the effectiveness in reducing large amounts of data to compact sets of wavelet coefficients (termed ldquowavelet synopsesrdquo) that can be used to provide fast and reasonably accurate approximate answers to queries

Due to exploratory nature of many Decision Support Systems applications there are a number of scenarios in which the user may prefer a fast approximate answer

4

Introduction A major criticism of wavelet-based

techniques is the fact that conventional wavelet synopses can not provide guarantees on the error of individual approximate query answers

5

Introduction The problem for approximate query

processing with wavelet synopses due to their deterministic approach to selecting coefficients and their lack of error guarantees

We propose a approach to building wavelet synopses that enables unbiased approximate query answers with error guarantees on the accuracy of individual answers

6

Introduction The technique is based on probabilistic thre

sholding scheme that assigns each coefficient a probability of being retained based on its importance to the reconstruction of individual data values and then flips coins to select the synopsis

7

Wavelet basics Given the data vector A the wavelet

transform of A can be computed as follow

In order equalize the importance of all wavelet coefficients we normalize the coefficient is

8

Wavelet basics A helpful tool for exploring and

understanding the key properties of the wavelet decomposition is error tree structure

9

Wavelet basics The important reconstruction properties

(P1)The reconstruction of any data value di depends on the values of the nodes in path(di)

(P2)The range sum d(lh)=

10

Wavelet basics

d5=c0-c2+c5-c10=65-14+(-20)-28=3 d(35)=3c0+(1-2)c2-c4+2c5-c9+(1-1)c10=93

11

Probabilistic wavelet synopsesAThe problem with conventional wavelets

Conventional coefficient thresholding is a completely deterministic process that typically retain the B wavelet coefficients with the largest absolute value after normalization this deterministic process minimizes the overall L2 error

12

Probabilistic wavelet synopsesAThe problem with conventional wavelets

d5=65-0+0-0=65 d(35)=365-0-0+0-0=195

13

Probabilistic wavelet synopsesAThe problem with conventional wavelets

Root causes (1)strict deterministic thresholding (2)independent thresholding (3)the bias resulting from dropping coeffi

cients without compensating for their loss

14

Probabilistic wavelet synopses BGeneral Approach

Our scheme deterministically retains the most important coefficients while randomly rounding the other coefficients either up to a larger value( rounding value) or down to zero

By carefully selecting the rounding values we ensure that (1)We expect a total of B coefficients to be

retained (2)We minimize a desired error metric in the

reconstruction of the data

15

Probabilistic wavelet synopses BGeneral Approach

The key idea in thresholding scheme is to associate a random variable Ci such that (1)Ci=0 with some probability (2)E[Ci] = ci

where we select a rounding value λi for each non-zero ci such that

16

Probabilistic wavelet synopses BGeneral Approach

Our thresholding scheme essentially ldquoroundsrdquo each non-zero wavelet coefficient ci independently to either λi or zero by flipping a biased coin with success probability

It variance is simply

17

Probabilistic wavelet synopses BGeneral Approach 1

For example λ0=c0 λ10= 2c10 λi=3ci2

18

Probabilistic wavelet synopses BGeneral Approach The impact of the λirsquo s

λi closer ci reduce the variance

λi further from ci reduces the expected number of retained coefficients

19

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

A reasonable approach is to select the λi values in a way that minimize the some overall error metric (egL2)

1

20

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

Letting and The expected L2 error minimization problem is

equivalent to

Based on the Cauchy-Schwarz inequality the minimum value of the objective is reached when

21

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

Let

22

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

We focus on minimizing the maximum reconstruction error for individual (related error)

The goal is to produce estimate for each value di such that

23

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The expected value of we would like to minimize the variance

More precisely we seek to minimize the normalized standard error for a reconstructed data value

24

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

Note that by applying Chebyshevrsquos Inequality we obtain( for all αgt1)

So that minimizing NSE will indeed minimize the probabilistic bounds on relative error metric

25

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

26

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

We would like to formulate a dynamic programming recurrence for this problem

Let PATHSj denote the set of all root-to-leaf pahts in Tj M[jB] denote the optimal value of the maximum among all data dk in Tj assuming a space budget of B

27

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

M[jB] depicted in (11)

28

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

29

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The problem in (11) is that the yi and bL each range over a continuous interval making it infeasible to use

The key technical idea is to quantize the solution space

We modify the constraint

where q is a input integer

30

Probabilistic wavelet synopses ELow-bias probabilistic wavelet synopses

Each coefficient is either retained or discarded according to the probabilities yi where as before the yirsquos are selected to minimize a desired error metric

31

Probabilistic wavelet synopsesF Summary of the approach

32

Experimental study A Zipfian data generator was used to produ

ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)

We use real world data set download from the National Forest Service

Let q=10 sanity bound S as the 10-percentile in the da

ta perturbation Δ= min001 S100

33

Experimental study

34

Experimental study

35

Experimental study

36

Conclusions We has introduced probabilistic wavelet synopses

the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers

We have described a number of novel techniques for tuning our scheme to minimize desired error metrics

Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach

Page 4: 1 Wavelet synopses with Error Guarantees Minos Garofalakis Phillip B. Gibbons Information Sciences Research Center Bell Labs, Lucent Technologies Murray

4

Introduction A major criticism of wavelet-based

techniques is the fact that conventional wavelet synopses can not provide guarantees on the error of individual approximate query answers

5

Introduction The problem for approximate query

processing with wavelet synopses due to their deterministic approach to selecting coefficients and their lack of error guarantees

We propose a approach to building wavelet synopses that enables unbiased approximate query answers with error guarantees on the accuracy of individual answers

6

Introduction The technique is based on probabilistic thre

sholding scheme that assigns each coefficient a probability of being retained based on its importance to the reconstruction of individual data values and then flips coins to select the synopsis

7

Wavelet basics Given the data vector A the wavelet

transform of A can be computed as follow

In order equalize the importance of all wavelet coefficients we normalize the coefficient is

8

Wavelet basics A helpful tool for exploring and

understanding the key properties of the wavelet decomposition is error tree structure

9

Wavelet basics The important reconstruction properties

(P1)The reconstruction of any data value di depends on the values of the nodes in path(di)

(P2)The range sum d(lh)=

10

Wavelet basics

d5=c0-c2+c5-c10=65-14+(-20)-28=3 d(35)=3c0+(1-2)c2-c4+2c5-c9+(1-1)c10=93

11

Probabilistic wavelet synopsesAThe problem with conventional wavelets

Conventional coefficient thresholding is a completely deterministic process that typically retain the B wavelet coefficients with the largest absolute value after normalization this deterministic process minimizes the overall L2 error

12

Probabilistic wavelet synopsesAThe problem with conventional wavelets

d5=65-0+0-0=65 d(35)=365-0-0+0-0=195

13

Probabilistic wavelet synopsesAThe problem with conventional wavelets

Root causes (1)strict deterministic thresholding (2)independent thresholding (3)the bias resulting from dropping coeffi

cients without compensating for their loss

14

Probabilistic wavelet synopses BGeneral Approach

Our scheme deterministically retains the most important coefficients while randomly rounding the other coefficients either up to a larger value( rounding value) or down to zero

By carefully selecting the rounding values we ensure that (1)We expect a total of B coefficients to be

retained (2)We minimize a desired error metric in the

reconstruction of the data

15

Probabilistic wavelet synopses BGeneral Approach

The key idea in thresholding scheme is to associate a random variable Ci such that (1)Ci=0 with some probability (2)E[Ci] = ci

where we select a rounding value λi for each non-zero ci such that

16

Probabilistic wavelet synopses BGeneral Approach

Our thresholding scheme essentially ldquoroundsrdquo each non-zero wavelet coefficient ci independently to either λi or zero by flipping a biased coin with success probability

It variance is simply

17

Probabilistic wavelet synopses BGeneral Approach 1

For example λ0=c0 λ10= 2c10 λi=3ci2

18

Probabilistic wavelet synopses BGeneral Approach The impact of the λirsquo s

λi closer ci reduce the variance

λi further from ci reduces the expected number of retained coefficients

19

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

A reasonable approach is to select the λi values in a way that minimize the some overall error metric (egL2)

1

20

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

Letting and The expected L2 error minimization problem is

equivalent to

Based on the Cauchy-Schwarz inequality the minimum value of the objective is reached when

21

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

Let

22

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

We focus on minimizing the maximum reconstruction error for individual (related error)

The goal is to produce estimate for each value di such that

23

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The expected value of we would like to minimize the variance

More precisely we seek to minimize the normalized standard error for a reconstructed data value

24

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

Note that by applying Chebyshevrsquos Inequality we obtain( for all αgt1)

So that minimizing NSE will indeed minimize the probabilistic bounds on relative error metric

25

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

26

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

We would like to formulate a dynamic programming recurrence for this problem

Let PATHSj denote the set of all root-to-leaf pahts in Tj M[jB] denote the optimal value of the maximum among all data dk in Tj assuming a space budget of B

27

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

M[jB] depicted in (11)

28

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

29

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The problem in (11) is that the yi and bL each range over a continuous interval making it infeasible to use

The key technical idea is to quantize the solution space

We modify the constraint

where q is a input integer

30

Probabilistic wavelet synopses ELow-bias probabilistic wavelet synopses

Each coefficient is either retained or discarded according to the probabilities yi where as before the yirsquos are selected to minimize a desired error metric

31

Probabilistic wavelet synopsesF Summary of the approach

32

Experimental study A Zipfian data generator was used to produ

ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)

We use real world data set download from the National Forest Service

Let q=10 sanity bound S as the 10-percentile in the da

ta perturbation Δ= min001 S100

33

Experimental study

34

Experimental study

35

Experimental study

36

Conclusions We has introduced probabilistic wavelet synopses

the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers

We have described a number of novel techniques for tuning our scheme to minimize desired error metrics

Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach

Page 5: 1 Wavelet synopses with Error Guarantees Minos Garofalakis Phillip B. Gibbons Information Sciences Research Center Bell Labs, Lucent Technologies Murray

5

Introduction The problem for approximate query

processing with wavelet synopses due to their deterministic approach to selecting coefficients and their lack of error guarantees

We propose a approach to building wavelet synopses that enables unbiased approximate query answers with error guarantees on the accuracy of individual answers

6

Introduction The technique is based on probabilistic thre

sholding scheme that assigns each coefficient a probability of being retained based on its importance to the reconstruction of individual data values and then flips coins to select the synopsis

7

Wavelet basics Given the data vector A the wavelet

transform of A can be computed as follow

In order equalize the importance of all wavelet coefficients we normalize the coefficient is

8

Wavelet basics A helpful tool for exploring and

understanding the key properties of the wavelet decomposition is error tree structure

9

Wavelet basics The important reconstruction properties

(P1)The reconstruction of any data value di depends on the values of the nodes in path(di)

(P2)The range sum d(lh)=

10

Wavelet basics

d5=c0-c2+c5-c10=65-14+(-20)-28=3 d(35)=3c0+(1-2)c2-c4+2c5-c9+(1-1)c10=93

11

Probabilistic wavelet synopsesAThe problem with conventional wavelets

Conventional coefficient thresholding is a completely deterministic process that typically retain the B wavelet coefficients with the largest absolute value after normalization this deterministic process minimizes the overall L2 error

12

Probabilistic wavelet synopsesAThe problem with conventional wavelets

d5=65-0+0-0=65 d(35)=365-0-0+0-0=195

13

Probabilistic wavelet synopsesAThe problem with conventional wavelets

Root causes (1)strict deterministic thresholding (2)independent thresholding (3)the bias resulting from dropping coeffi

cients without compensating for their loss

14

Probabilistic wavelet synopses BGeneral Approach

Our scheme deterministically retains the most important coefficients while randomly rounding the other coefficients either up to a larger value( rounding value) or down to zero

By carefully selecting the rounding values we ensure that (1)We expect a total of B coefficients to be

retained (2)We minimize a desired error metric in the

reconstruction of the data

15

Probabilistic wavelet synopses BGeneral Approach

The key idea in thresholding scheme is to associate a random variable Ci such that (1)Ci=0 with some probability (2)E[Ci] = ci

where we select a rounding value λi for each non-zero ci such that

16

Probabilistic wavelet synopses: B. General approach

Our thresholding scheme essentially "rounds" each non-zero wavelet coefficient ci independently to either λi or zero, by flipping a biased coin with success probability yi = ci/λi.

Its variance is simply Var(Ci) = ci * (λi - ci).
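Since Ci equals λi with probability ci/λi and 0 otherwise, E[Ci] = ci and Var(Ci) = ci(λi - ci). A quick simulation (with arbitrary illustrative values for ci and λi) confirms the unbiasedness empirically:

```python
import random

def round_coefficient(c, lam, rng):
    """Biased coin flip: retain c as lam with probability c/lam, else 0.
    E[C] = lam * (c / lam) = c, so the estimate is unbiased;
    Var[C] = lam**2 * (c / lam) - c**2 = c * (lam - c)."""
    return lam if rng.random() < c / lam else 0.0

rng = random.Random(0)
c, lam = 28.0, 42.0                      # retention probability 2/3
draws = [round_coefficient(c, lam, rng) for _ in range(100_000)]
mean = sum(draws) / len(draws)
print(mean)   # close to c = 28 (standard error of the mean is about 0.06)
```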

17

Probabilistic wavelet synopses: B. General approach

For example: λ0 = c0, λ10 = 2c10, λi = 3ci/2.

18

Probabilistic wavelet synopses: B. General approach - the impact of the λi's

Choosing λi closer to ci reduces the variance of Ci.

Choosing λi further from ci reduces the expected number of retained coefficients.

19

Probabilistic wavelet synopses: C. Rounding to minimize the expected mean-square error

A reasonable approach is to select the λi values in a way that minimizes some overall error metric (e.g., the L2 error).

20

Probabilistic wavelet synopses: C. Rounding to minimize the expected mean-square error

Letting yi = ci/λi (the retention probability of ci), the expected L2 error minimization problem is equivalent to minimizing the sum of ci^2 * (1/yi - 1) subject to the sum of the yi being at most B.

Based on the Cauchy-Schwarz inequality, the minimum value of the objective is reached when the yi are proportional to |ci|, i.e., yi = B * |ci| / sum_j |cj| (capped at 1).
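The resulting allocation can be computed directly. A sketch (function name and coefficient values are illustrative; it caps probabilities at 1 but omits the redistribution of any leftover budget that a complete solution would perform):

```python
def min_l2_retention_probs(coeffs, B):
    """Retention probabilities y_i minimizing the expected L2 error
    sum_i c_i^2 * (1 / y_i - 1)  subject to  sum_i y_i <= B.
    Cauchy-Schwarz => y_i proportional to |c_i|, capped at 1."""
    total = sum(abs(c) for c in coeffs)
    return [min(1.0, B * abs(c) / total) if c else 0.0 for c in coeffs]

# Illustrative coefficients, with a budget of B = 2 expected retentions:
ys = min_l2_retention_probs([65, -14, 20, 28], 2)
print(ys)   # the largest coefficient is retained with probability 1
```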

21

Probabilistic wavelet synopses: C. Rounding to minimize the expected mean-square error

22

Probabilistic wavelet synopses: D. Rounding to minimize the maximum relative error

We focus on minimizing the maximum reconstruction error for individual data values (the relative error).

The goal is to produce an estimate d̂i for each value di such that the relative error |d̂i - di| / max{|di|, S} is small, where S is a sanity bound.

23

Probabilistic wavelet synopses: D. Rounding to minimize the maximum relative error

The expected value of the estimate d̂i is di; we would like to minimize its variance.

More precisely, we seek to minimize the normalized standard error for a reconstructed data value: NSE(d̂i) = sqrt(Var(d̂i)) / max{|di|, S}.

24

Probabilistic wavelet synopses: D. Rounding to minimize the maximum relative error

Note that by applying Chebyshev's inequality we obtain, for all α > 1: P[ |d̂i - di| >= α * NSE(d̂i) * max{|di|, S} ] <= 1/α².

So minimizing the NSE will indeed minimize the probabilistic bounds on the relative error metric.
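The Chebyshev bound can be sanity-checked by simulation for a single rounded coefficient (the values below are arbitrary illustrations): the deviation exceeds α standard deviations with frequency at most 1/α².

```python
import random

rng = random.Random(1)
c, lam = 20.0, 50.0                    # retention probability c/lam = 0.4
sigma = (c * (lam - c)) ** 0.5         # sqrt(Var) = sqrt(c * (lam - c))
alpha = 1.2                            # any alpha > 1
n = 200_000

# Count how often |C - c| >= alpha * sigma over many coin flips.
hits = sum(
    1 for _ in range(n)
    if abs((lam if rng.random() < c / lam else 0.0) - c) >= alpha * sigma
)
print(hits / n, "<=", 1 / alpha ** 2)  # roughly 0.4 <= ~0.694
```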

25

Probabilistic wavelet synopses: D. Rounding to minimize the maximum relative error

26

Probabilistic wavelet synopses: D. Rounding to minimize the maximum relative error

We would like to formulate a dynamic programming recurrence for this problem.

Let PATHSj denote the set of all root-to-leaf paths in Tj, and let M[j, B] denote the optimal (minimum possible) value of the maximum NSE among all data values dk in Tj, assuming a space budget of B.

27

Probabilistic wavelet synopses: D. Rounding to minimize the maximum relative error

M[j, B] is defined by the dynamic-programming recurrence depicted in (11).

28

Probabilistic wavelet synopses: D. Rounding to minimize the maximum relative error

29

Probabilistic wavelet synopses: D. Rounding to minimize the maximum relative error

The problem in (11) is that yi and bL each range over a continuous interval, making the recurrence infeasible to use directly.

The key technical idea is to quantize the solution space.

We modify the constraint so that each yi is restricted to the discrete set {1/q, 2/q, ..., 1}, where q is an input integer.
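To make the quantized dynamic program concrete, here is a much-simplified hypothetical sketch: it ignores the sanity bound and relative-error normalization, and simply minimizes, over allocations in multiples of 1/q, the maximum variance ci^2 * (1/yi - 1) accumulated along any root-to-leaf path of a tiny error tree (coefficient values and tree are illustrative, not from the talk).

```python
from functools import lru_cache

# Tiny error tree in heap order: node j has children 2j and 2j + 1;
# indices absent from `coeffs` are data leaves. Values are illustrative.
coeffs = {1: 65.0, 2: 14.0, 3: 30.0}
q = 4                 # quantization: y_i in {1/q, 2/q, ..., 1}
budget = 8            # total quanta available (i.e., B = budget / q)

def var(c, k):
    """Variance c^2 * (1/y - 1) when y = k/q is spent on coefficient c."""
    y = k / q
    return c * c * (1.0 / y - 1.0)

@lru_cache(maxsize=None)
def M(j, b):
    """Minimum, over allocations of b quanta within the subtree rooted at
    node j, of the maximum variance accumulated on any root-to-leaf path
    of that subtree (infinite if some coefficient would get 0 quanta)."""
    if j not in coeffs:                    # data leaf: nothing to spend
        return 0.0
    best = float("inf")
    for k in range(1, min(b, q) + 1):      # quanta for coefficient j
        for bl in range(0, b - k + 1):     # split the rest among children
            br = b - k - bl
            cost = var(coeffs[j], k) + max(M(2 * j, bl), M(2 * j + 1, br))
            best = min(best, cost)
    return best

print(M(1, budget))   # 588.0 for this tiny example
```

The real recurrence additionally normalizes each path's variance by max{|dk|, S} and works with NSE rather than raw variance, but the quantized search over k and the budget split (bL, bR) have the same shape.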

30

Probabilistic wavelet synopses: E. Low-bias probabilistic wavelet synopses

Each coefficient ci is either retained (as ci) or discarded according to the probability yi, where, as before, the yi's are selected to minimize a desired error metric.

31

Probabilistic wavelet synopses: F. Summary of the approach

32

Experimental study: A Zipfian data generator was used to produce Zipfian frequencies for various levels of skew (z parameter between 0.3 and 2.0).

We also use a real-world data set downloaded from the National Forest Service.

We set q = 10, take the sanity bound S to be the 10th percentile of the data, and use perturbation Δ = min{0.01, S/100}.

33

Experimental study

34

Experimental study

35

Experimental study

36

Conclusions: We have introduced probabilistic wavelet synopses, the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers.

We have described a number of novel techniques for tuning our scheme to minimize desired error metrics.

Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach.

Page 6: 1 Wavelet synopses with Error Guarantees Minos Garofalakis Phillip B. Gibbons Information Sciences Research Center Bell Labs, Lucent Technologies Murray

6

Introduction The technique is based on probabilistic thre

sholding scheme that assigns each coefficient a probability of being retained based on its importance to the reconstruction of individual data values and then flips coins to select the synopsis

7

Wavelet basics Given the data vector A the wavelet

transform of A can be computed as follow

In order equalize the importance of all wavelet coefficients we normalize the coefficient is

8

Wavelet basics A helpful tool for exploring and

understanding the key properties of the wavelet decomposition is error tree structure

9

Wavelet basics The important reconstruction properties

(P1)The reconstruction of any data value di depends on the values of the nodes in path(di)

(P2)The range sum d(lh)=

10

Wavelet basics

d5=c0-c2+c5-c10=65-14+(-20)-28=3 d(35)=3c0+(1-2)c2-c4+2c5-c9+(1-1)c10=93

11

Probabilistic wavelet synopsesAThe problem with conventional wavelets

Conventional coefficient thresholding is a completely deterministic process that typically retain the B wavelet coefficients with the largest absolute value after normalization this deterministic process minimizes the overall L2 error

12

Probabilistic wavelet synopsesAThe problem with conventional wavelets

d5=65-0+0-0=65 d(35)=365-0-0+0-0=195

13

Probabilistic wavelet synopsesAThe problem with conventional wavelets

Root causes (1)strict deterministic thresholding (2)independent thresholding (3)the bias resulting from dropping coeffi

cients without compensating for their loss

14

Probabilistic wavelet synopses BGeneral Approach

Our scheme deterministically retains the most important coefficients while randomly rounding the other coefficients either up to a larger value( rounding value) or down to zero

By carefully selecting the rounding values we ensure that (1)We expect a total of B coefficients to be

retained (2)We minimize a desired error metric in the

reconstruction of the data

15

Probabilistic wavelet synopses BGeneral Approach

The key idea in thresholding scheme is to associate a random variable Ci such that (1)Ci=0 with some probability (2)E[Ci] = ci

where we select a rounding value λi for each non-zero ci such that

16

Probabilistic wavelet synopses BGeneral Approach

Our thresholding scheme essentially ldquoroundsrdquo each non-zero wavelet coefficient ci independently to either λi or zero by flipping a biased coin with success probability

It variance is simply

17

Probabilistic wavelet synopses BGeneral Approach 1

For example λ0=c0 λ10= 2c10 λi=3ci2

18

Probabilistic wavelet synopses BGeneral Approach The impact of the λirsquo s

λi closer ci reduce the variance

λi further from ci reduces the expected number of retained coefficients

19

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

A reasonable approach is to select the λi values in a way that minimize the some overall error metric (egL2)

1

20

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

Letting and The expected L2 error minimization problem is

equivalent to

Based on the Cauchy-Schwarz inequality the minimum value of the objective is reached when

21

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

Let

22

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

We focus on minimizing the maximum reconstruction error for individual (related error)

The goal is to produce estimate for each value di such that

23

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The expected value of we would like to minimize the variance

More precisely we seek to minimize the normalized standard error for a reconstructed data value

24

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

Note that by applying Chebyshevrsquos Inequality we obtain( for all αgt1)

So that minimizing NSE will indeed minimize the probabilistic bounds on relative error metric

25

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

26

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

We would like to formulate a dynamic programming recurrence for this problem

Let PATHSj denote the set of all root-to-leaf pahts in Tj M[jB] denote the optimal value of the maximum among all data dk in Tj assuming a space budget of B

27

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

M[jB] depicted in (11)

28

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

29

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The problem in (11) is that the yi and bL each range over a continuous interval making it infeasible to use

The key technical idea is to quantize the solution space

We modify the constraint

where q is a input integer

30

Probabilistic wavelet synopses ELow-bias probabilistic wavelet synopses

Each coefficient is either retained or discarded according to the probabilities yi where as before the yirsquos are selected to minimize a desired error metric

31

Probabilistic wavelet synopsesF Summary of the approach

32

Experimental study A Zipfian data generator was used to produ

ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)

We use real world data set download from the National Forest Service

Let q=10 sanity bound S as the 10-percentile in the da

ta perturbation Δ= min001 S100

33

Experimental study

34

Experimental study

35

Experimental study

36

Conclusions We has introduced probabilistic wavelet synopses

the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers

We have described a number of novel techniques for tuning our scheme to minimize desired error metrics

Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach

Page 7: 1 Wavelet synopses with Error Guarantees Minos Garofalakis Phillip B. Gibbons Information Sciences Research Center Bell Labs, Lucent Technologies Murray

7

Wavelet basics Given the data vector A the wavelet

transform of A can be computed as follow

In order equalize the importance of all wavelet coefficients we normalize the coefficient is

8

Wavelet basics A helpful tool for exploring and

understanding the key properties of the wavelet decomposition is error tree structure

9

Wavelet basics The important reconstruction properties

(P1)The reconstruction of any data value di depends on the values of the nodes in path(di)

(P2)The range sum d(lh)=

10

Wavelet basics

d5=c0-c2+c5-c10=65-14+(-20)-28=3 d(35)=3c0+(1-2)c2-c4+2c5-c9+(1-1)c10=93

11

Probabilistic wavelet synopsesAThe problem with conventional wavelets

Conventional coefficient thresholding is a completely deterministic process that typically retain the B wavelet coefficients with the largest absolute value after normalization this deterministic process minimizes the overall L2 error

12

Probabilistic wavelet synopsesAThe problem with conventional wavelets

d5=65-0+0-0=65 d(35)=365-0-0+0-0=195

13

Probabilistic wavelet synopsesAThe problem with conventional wavelets

Root causes (1)strict deterministic thresholding (2)independent thresholding (3)the bias resulting from dropping coeffi

cients without compensating for their loss

14

Probabilistic wavelet synopses BGeneral Approach

Our scheme deterministically retains the most important coefficients while randomly rounding the other coefficients either up to a larger value( rounding value) or down to zero

By carefully selecting the rounding values we ensure that (1)We expect a total of B coefficients to be

retained (2)We minimize a desired error metric in the

reconstruction of the data

15

Probabilistic wavelet synopses BGeneral Approach

The key idea in thresholding scheme is to associate a random variable Ci such that (1)Ci=0 with some probability (2)E[Ci] = ci

where we select a rounding value λi for each non-zero ci such that

16

Probabilistic wavelet synopses BGeneral Approach

Our thresholding scheme essentially ldquoroundsrdquo each non-zero wavelet coefficient ci independently to either λi or zero by flipping a biased coin with success probability

It variance is simply

17

Probabilistic wavelet synopses BGeneral Approach 1

For example λ0=c0 λ10= 2c10 λi=3ci2

18

Probabilistic wavelet synopses BGeneral Approach The impact of the λirsquo s

λi closer ci reduce the variance

λi further from ci reduces the expected number of retained coefficients

19

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

A reasonable approach is to select the λi values in a way that minimize the some overall error metric (egL2)

1

20

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

Letting and The expected L2 error minimization problem is

equivalent to

Based on the Cauchy-Schwarz inequality the minimum value of the objective is reached when

21

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

Let

22

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

We focus on minimizing the maximum reconstruction error for individual (related error)

The goal is to produce estimate for each value di such that

23

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The expected value of we would like to minimize the variance

More precisely we seek to minimize the normalized standard error for a reconstructed data value

24

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

Note that by applying Chebyshevrsquos Inequality we obtain( for all αgt1)

So that minimizing NSE will indeed minimize the probabilistic bounds on relative error metric

25

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

26

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

We would like to formulate a dynamic programming recurrence for this problem

Let PATHSj denote the set of all root-to-leaf pahts in Tj M[jB] denote the optimal value of the maximum among all data dk in Tj assuming a space budget of B

27

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

M[jB] depicted in (11)

28

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

29

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The problem in (11) is that the yi and bL each range over a continuous interval making it infeasible to use

The key technical idea is to quantize the solution space

We modify the constraint

where q is a input integer

30

Probabilistic wavelet synopses ELow-bias probabilistic wavelet synopses

Each coefficient is either retained or discarded according to the probabilities yi where as before the yirsquos are selected to minimize a desired error metric

31

Probabilistic wavelet synopsesF Summary of the approach

32

Experimental study A Zipfian data generator was used to produ

ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)

We use real world data set download from the National Forest Service

Let q=10 sanity bound S as the 10-percentile in the da

ta perturbation Δ= min001 S100

33

Experimental study

34

Experimental study

35

Experimental study

36

Conclusions We has introduced probabilistic wavelet synopses

the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers

We have described a number of novel techniques for tuning our scheme to minimize desired error metrics

Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach

Page 8: 1 Wavelet synopses with Error Guarantees Minos Garofalakis Phillip B. Gibbons Information Sciences Research Center Bell Labs, Lucent Technologies Murray

8

Wavelet basics A helpful tool for exploring and

understanding the key properties of the wavelet decomposition is error tree structure

9

Wavelet basics The important reconstruction properties

(P1)The reconstruction of any data value di depends on the values of the nodes in path(di)

(P2)The range sum d(lh)=

10

Wavelet basics

d5=c0-c2+c5-c10=65-14+(-20)-28=3 d(35)=3c0+(1-2)c2-c4+2c5-c9+(1-1)c10=93

11

Probabilistic wavelet synopsesAThe problem with conventional wavelets

Conventional coefficient thresholding is a completely deterministic process that typically retain the B wavelet coefficients with the largest absolute value after normalization this deterministic process minimizes the overall L2 error

12

Probabilistic wavelet synopsesAThe problem with conventional wavelets

d5=65-0+0-0=65 d(35)=365-0-0+0-0=195

13

Probabilistic wavelet synopsesAThe problem with conventional wavelets

Root causes (1)strict deterministic thresholding (2)independent thresholding (3)the bias resulting from dropping coeffi

cients without compensating for their loss

14

Probabilistic wavelet synopses BGeneral Approach

Our scheme deterministically retains the most important coefficients while randomly rounding the other coefficients either up to a larger value( rounding value) or down to zero

By carefully selecting the rounding values we ensure that (1)We expect a total of B coefficients to be

retained (2)We minimize a desired error metric in the

reconstruction of the data

15

Probabilistic wavelet synopses BGeneral Approach

The key idea in thresholding scheme is to associate a random variable Ci such that (1)Ci=0 with some probability (2)E[Ci] = ci

where we select a rounding value λi for each non-zero ci such that

16

Probabilistic wavelet synopses BGeneral Approach

Our thresholding scheme essentially ldquoroundsrdquo each non-zero wavelet coefficient ci independently to either λi or zero by flipping a biased coin with success probability

It variance is simply

17

Probabilistic wavelet synopses BGeneral Approach 1

For example λ0=c0 λ10= 2c10 λi=3ci2

18

Probabilistic wavelet synopses BGeneral Approach The impact of the λirsquo s

λi closer ci reduce the variance

λi further from ci reduces the expected number of retained coefficients

19

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

A reasonable approach is to select the λi values in a way that minimize the some overall error metric (egL2)

1

20

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

Letting and The expected L2 error minimization problem is

equivalent to

Based on the Cauchy-Schwarz inequality the minimum value of the objective is reached when

21

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

Let

22

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

We focus on minimizing the maximum reconstruction error for individual (related error)

The goal is to produce estimate for each value di such that

23

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The expected value of we would like to minimize the variance

More precisely we seek to minimize the normalized standard error for a reconstructed data value

24

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

Note that by applying Chebyshevrsquos Inequality we obtain( for all αgt1)

So that minimizing NSE will indeed minimize the probabilistic bounds on relative error metric

25

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

26

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

We would like to formulate a dynamic programming recurrence for this problem

Let PATHSj denote the set of all root-to-leaf pahts in Tj M[jB] denote the optimal value of the maximum among all data dk in Tj assuming a space budget of B

27

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

M[jB] depicted in (11)

28

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

29

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The problem in (11) is that the yi and bL each range over a continuous interval making it infeasible to use

The key technical idea is to quantize the solution space

We modify the constraint

where q is a input integer

30

Probabilistic wavelet synopses ELow-bias probabilistic wavelet synopses

Each coefficient is either retained or discarded according to the probabilities yi where as before the yirsquos are selected to minimize a desired error metric

31

Probabilistic wavelet synopsesF Summary of the approach

32

Experimental study A Zipfian data generator was used to produ

ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)

We use real world data set download from the National Forest Service

Let q=10 sanity bound S as the 10-percentile in the da

ta perturbation Δ= min001 S100

33

Experimental study

34

Experimental study

35

Experimental study

36

Conclusions We has introduced probabilistic wavelet synopses

the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers

We have described a number of novel techniques for tuning our scheme to minimize desired error metrics

Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach

Page 9: 1 Wavelet synopses with Error Guarantees Minos Garofalakis Phillip B. Gibbons Information Sciences Research Center Bell Labs, Lucent Technologies Murray

9

Wavelet basics The important reconstruction properties

(P1)The reconstruction of any data value di depends on the values of the nodes in path(di)

(P2)The range sum d(lh)=

10

Wavelet basics

d5=c0-c2+c5-c10=65-14+(-20)-28=3 d(35)=3c0+(1-2)c2-c4+2c5-c9+(1-1)c10=93

11

Probabilistic wavelet synopsesAThe problem with conventional wavelets

Conventional coefficient thresholding is a completely deterministic process that typically retain the B wavelet coefficients with the largest absolute value after normalization this deterministic process minimizes the overall L2 error

12

Probabilistic wavelet synopsesAThe problem with conventional wavelets

d5=65-0+0-0=65 d(35)=365-0-0+0-0=195

13

Probabilistic wavelet synopsesAThe problem with conventional wavelets

Root causes (1)strict deterministic thresholding (2)independent thresholding (3)the bias resulting from dropping coeffi

cients without compensating for their loss

14

Probabilistic wavelet synopses BGeneral Approach

Our scheme deterministically retains the most important coefficients while randomly rounding the other coefficients either up to a larger value( rounding value) or down to zero

By carefully selecting the rounding values we ensure that (1)We expect a total of B coefficients to be

retained (2)We minimize a desired error metric in the

reconstruction of the data

15

Probabilistic wavelet synopses BGeneral Approach

The key idea in thresholding scheme is to associate a random variable Ci such that (1)Ci=0 with some probability (2)E[Ci] = ci

where we select a rounding value λi for each non-zero ci such that

16

Probabilistic wavelet synopses BGeneral Approach

Our thresholding scheme essentially ldquoroundsrdquo each non-zero wavelet coefficient ci independently to either λi or zero by flipping a biased coin with success probability

It variance is simply

17

Probabilistic wavelet synopses BGeneral Approach 1

For example λ0=c0 λ10= 2c10 λi=3ci2

18

Probabilistic wavelet synopses BGeneral Approach The impact of the λirsquo s

λi closer ci reduce the variance

λi further from ci reduces the expected number of retained coefficients

19

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

A reasonable approach is to select the λi values in a way that minimize the some overall error metric (egL2)

1

20

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

Letting and The expected L2 error minimization problem is

equivalent to

Based on the Cauchy-Schwarz inequality the minimum value of the objective is reached when

21

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

Let

22

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

We focus on minimizing the maximum reconstruction error for individual (related error)

The goal is to produce estimate for each value di such that

23

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The expected value of we would like to minimize the variance

More precisely we seek to minimize the normalized standard error for a reconstructed data value

24

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

Note that by applying Chebyshevrsquos Inequality we obtain( for all αgt1)

So that minimizing NSE will indeed minimize the probabilistic bounds on relative error metric

25

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

26

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

We would like to formulate a dynamic programming recurrence for this problem

Let PATHSj denote the set of all root-to-leaf pahts in Tj M[jB] denote the optimal value of the maximum among all data dk in Tj assuming a space budget of B

27

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

M[jB] depicted in (11)

28

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

29

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The problem in (11) is that the yi and bL each range over a continuous interval making it infeasible to use

The key technical idea is to quantize the solution space

We modify the constraint

where q is a input integer

30

Probabilistic wavelet synopses ELow-bias probabilistic wavelet synopses

Each coefficient is either retained or discarded according to the probabilities yi where as before the yirsquos are selected to minimize a desired error metric

31

Probabilistic wavelet synopsesF Summary of the approach

32

Experimental study A Zipfian data generator was used to produ

ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)

We use real world data set download from the National Forest Service

Let q=10 sanity bound S as the 10-percentile in the da

ta perturbation Δ= min001 S100

33

Experimental study

34

Experimental study

35

Experimental study

36

Conclusions We has introduced probabilistic wavelet synopses

the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers

We have described a number of novel techniques for tuning our scheme to minimize desired error metrics

Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach

Page 10: 1 Wavelet synopses with Error Guarantees Minos Garofalakis Phillip B. Gibbons Information Sciences Research Center Bell Labs, Lucent Technologies Murray

10

Wavelet basics

d5=c0-c2+c5-c10=65-14+(-20)-28=3 d(35)=3c0+(1-2)c2-c4+2c5-c9+(1-1)c10=93

11

Probabilistic wavelet synopsesAThe problem with conventional wavelets

Conventional coefficient thresholding is a completely deterministic process that typically retain the B wavelet coefficients with the largest absolute value after normalization this deterministic process minimizes the overall L2 error

12

Probabilistic wavelet synopsesAThe problem with conventional wavelets

d5=65-0+0-0=65 d(35)=365-0-0+0-0=195

13

Probabilistic wavelet synopsesAThe problem with conventional wavelets

Root causes (1)strict deterministic thresholding (2)independent thresholding (3)the bias resulting from dropping coeffi

cients without compensating for their loss

14

Probabilistic wavelet synopses BGeneral Approach

Our scheme deterministically retains the most important coefficients while randomly rounding the other coefficients either up to a larger value( rounding value) or down to zero

By carefully selecting the rounding values we ensure that (1)We expect a total of B coefficients to be

retained (2)We minimize a desired error metric in the

reconstruction of the data

15

Probabilistic wavelet synopses BGeneral Approach

The key idea in thresholding scheme is to associate a random variable Ci such that (1)Ci=0 with some probability (2)E[Ci] = ci

where we select a rounding value λi for each non-zero ci such that

16

Probabilistic wavelet synopses BGeneral Approach

Our thresholding scheme essentially ldquoroundsrdquo each non-zero wavelet coefficient ci independently to either λi or zero by flipping a biased coin with success probability

It variance is simply

17

Probabilistic wavelet synopses BGeneral Approach 1

For example λ0=c0 λ10= 2c10 λi=3ci2

18

Probabilistic wavelet synopses BGeneral Approach The impact of the λirsquo s

λi closer ci reduce the variance

λi further from ci reduces the expected number of retained coefficients

19

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

A reasonable approach is to select the λi values in a way that minimize the some overall error metric (egL2)

1

20

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

Letting and The expected L2 error minimization problem is

equivalent to

Based on the Cauchy-Schwarz inequality the minimum value of the objective is reached when

21

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

Let

22

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

We focus on minimizing the maximum reconstruction error for individual (related error)

The goal is to produce estimate for each value di such that

23

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The expected value of we would like to minimize the variance

More precisely we seek to minimize the normalized standard error for a reconstructed data value

24

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

Note that by applying Chebyshevrsquos Inequality we obtain( for all αgt1)

So that minimizing NSE will indeed minimize the probabilistic bounds on relative error metric

25

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

26

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

We would like to formulate a dynamic programming recurrence for this problem

Let PATHSj denote the set of all root-to-leaf pahts in Tj M[jB] denote the optimal value of the maximum among all data dk in Tj assuming a space budget of B

27

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

M[jB] depicted in (11)

28

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

29

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The problem in (11) is that the yi and bL each range over a continuous interval making it infeasible to use

The key technical idea is to quantize the solution space

We modify the constraint

where q is a input integer

30

Probabilistic wavelet synopses E. Low-bias probabilistic wavelet synopses

Each coefficient is either retained or discarded according to its probability y_i, where, as before, the y_i's are selected to minimize a desired error metric

31

Probabilistic wavelet synopses F. Summary of the approach

32

Experimental study A Zipfian data generator was used to produce Zipfian frequencies for various levels of skew (z parameter between 0.3 and 2.0)

We also use a real-world data set downloaded from the National Forest Service

We set q = 10, the sanity bound S to the 10th percentile of the data values, and the perturbation Δ = min{0.01, S/100}
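Zipfian frequencies of this kind can be generated along the following lines (a generic sketch, not the authors' generator; the function name and scaling are assumptions):

```python
def zipf_frequencies(n, z, total=10000.0):
    """Frequencies for n values following a Zipfian distribution with
    skew parameter z (z = 0 is uniform; larger z is more skewed),
    scaled so the frequencies sum to `total`."""
    weights = [1.0 / rank ** z for rank in range(1, n + 1)]
    s = sum(weights)
    return [total * w / s for w in weights]
```

At z = 0 every value gets the same frequency; as z grows toward 2.0 the mass concentrates heavily on the lowest ranks, which is what makes the skewed settings in the experiments harder to summarize.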

33

Experimental study

34

Experimental study

35

Experimental study

36

Conclusions We have introduced probabilistic wavelet synopses,

the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers

We have described a number of novel techniques for tuning our scheme to minimize desired error metrics

Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach

Page 11: 1 Wavelet synopses with Error Guarantees Minos Garofalakis Phillip B. Gibbons Information Sciences Research Center Bell Labs, Lucent Technologies Murray

11

Probabilistic wavelet synopsesAThe problem with conventional wavelets

Conventional coefficient thresholding is a completely deterministic process that typically retain the B wavelet coefficients with the largest absolute value after normalization this deterministic process minimizes the overall L2 error

12

Probabilistic wavelet synopsesAThe problem with conventional wavelets

d5=65-0+0-0=65 d(35)=365-0-0+0-0=195

13

Probabilistic wavelet synopsesAThe problem with conventional wavelets

Root causes (1)strict deterministic thresholding (2)independent thresholding (3)the bias resulting from dropping coeffi

cients without compensating for their loss

14

Probabilistic wavelet synopses BGeneral Approach

Our scheme deterministically retains the most important coefficients while randomly rounding the other coefficients either up to a larger value( rounding value) or down to zero

By carefully selecting the rounding values we ensure that (1)We expect a total of B coefficients to be

retained (2)We minimize a desired error metric in the

reconstruction of the data

15

Probabilistic wavelet synopses BGeneral Approach

The key idea in thresholding scheme is to associate a random variable Ci such that (1)Ci=0 with some probability (2)E[Ci] = ci

where we select a rounding value λi for each non-zero ci such that

16

Probabilistic wavelet synopses BGeneral Approach

Our thresholding scheme essentially ldquoroundsrdquo each non-zero wavelet coefficient ci independently to either λi or zero by flipping a biased coin with success probability

It variance is simply

17

Probabilistic wavelet synopses BGeneral Approach 1

For example λ0=c0 λ10= 2c10 λi=3ci2

18

Probabilistic wavelet synopses BGeneral Approach The impact of the λirsquo s

λi closer ci reduce the variance

λi further from ci reduces the expected number of retained coefficients

19

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

A reasonable approach is to select the λi values in a way that minimize the some overall error metric (egL2)

1

20

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

Letting and The expected L2 error minimization problem is

equivalent to

Based on the Cauchy-Schwarz inequality the minimum value of the objective is reached when

21

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

Let

22

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

We focus on minimizing the maximum reconstruction error for individual (related error)

The goal is to produce estimate for each value di such that

23

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The expected value of we would like to minimize the variance

More precisely we seek to minimize the normalized standard error for a reconstructed data value

24

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

Note that by applying Chebyshevrsquos Inequality we obtain( for all αgt1)

So that minimizing NSE will indeed minimize the probabilistic bounds on relative error metric

25

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

26

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

We would like to formulate a dynamic programming recurrence for this problem

Let PATHSj denote the set of all root-to-leaf pahts in Tj M[jB] denote the optimal value of the maximum among all data dk in Tj assuming a space budget of B

27

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

M[jB] depicted in (11)

28

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

29

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The problem in (11) is that the yi and bL each range over a continuous interval making it infeasible to use

The key technical idea is to quantize the solution space

We modify the constraint

where q is a input integer

30

Probabilistic wavelet synopses ELow-bias probabilistic wavelet synopses

Each coefficient is either retained or discarded according to the probabilities yi where as before the yirsquos are selected to minimize a desired error metric

31

Probabilistic wavelet synopsesF Summary of the approach

32

Experimental study A Zipfian data generator was used to produ

ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)

We use real world data set download from the National Forest Service

Let q=10 sanity bound S as the 10-percentile in the da

ta perturbation Δ= min001 S100

33

Experimental study

34

Experimental study

35

Experimental study

36

Conclusions We has introduced probabilistic wavelet synopses

the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers

We have described a number of novel techniques for tuning our scheme to minimize desired error metrics

Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach

Page 12: 1 Wavelet synopses with Error Guarantees Minos Garofalakis Phillip B. Gibbons Information Sciences Research Center Bell Labs, Lucent Technologies Murray

12

Probabilistic wavelet synopsesAThe problem with conventional wavelets

d5=65-0+0-0=65 d(35)=365-0-0+0-0=195

13

Probabilistic wavelet synopsesAThe problem with conventional wavelets

Root causes (1)strict deterministic thresholding (2)independent thresholding (3)the bias resulting from dropping coeffi

cients without compensating for their loss

14

Probabilistic wavelet synopses BGeneral Approach

Our scheme deterministically retains the most important coefficients while randomly rounding the other coefficients either up to a larger value( rounding value) or down to zero

By carefully selecting the rounding values we ensure that (1)We expect a total of B coefficients to be

retained (2)We minimize a desired error metric in the

reconstruction of the data

15

Probabilistic wavelet synopses BGeneral Approach

The key idea in thresholding scheme is to associate a random variable Ci such that (1)Ci=0 with some probability (2)E[Ci] = ci

where we select a rounding value λi for each non-zero ci such that

16

Probabilistic wavelet synopses BGeneral Approach

Our thresholding scheme essentially ldquoroundsrdquo each non-zero wavelet coefficient ci independently to either λi or zero by flipping a biased coin with success probability

It variance is simply

17

Probabilistic wavelet synopses BGeneral Approach 1

For example λ0=c0 λ10= 2c10 λi=3ci2

18

Probabilistic wavelet synopses BGeneral Approach The impact of the λirsquo s

λi closer ci reduce the variance

λi further from ci reduces the expected number of retained coefficients

19

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

A reasonable approach is to select the λi values in a way that minimize the some overall error metric (egL2)

1

20

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

Letting and The expected L2 error minimization problem is

equivalent to

Based on the Cauchy-Schwarz inequality the minimum value of the objective is reached when

21

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

Let

22

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

We focus on minimizing the maximum reconstruction error for individual (related error)

The goal is to produce estimate for each value di such that

23

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The expected value of we would like to minimize the variance

More precisely we seek to minimize the normalized standard error for a reconstructed data value

24

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

Note that by applying Chebyshevrsquos Inequality we obtain( for all αgt1)

So that minimizing NSE will indeed minimize the probabilistic bounds on relative error metric

25

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

26

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

We would like to formulate a dynamic programming recurrence for this problem

Let PATHSj denote the set of all root-to-leaf pahts in Tj M[jB] denote the optimal value of the maximum among all data dk in Tj assuming a space budget of B

27

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

M[jB] depicted in (11)

28

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

29

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The problem in (11) is that the yi and bL each range over a continuous interval making it infeasible to use

The key technical idea is to quantize the solution space

We modify the constraint

where q is a input integer

30

Probabilistic wavelet synopses ELow-bias probabilistic wavelet synopses

Each coefficient is either retained or discarded according to the probabilities yi where as before the yirsquos are selected to minimize a desired error metric

31

Probabilistic wavelet synopsesF Summary of the approach

32

Experimental study A Zipfian data generator was used to produ

ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)

We use real world data set download from the National Forest Service

Let q=10 sanity bound S as the 10-percentile in the da

ta perturbation Δ= min001 S100

33

Experimental study

34

Experimental study

35

Experimental study

36

Conclusions We has introduced probabilistic wavelet synopses

the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers

We have described a number of novel techniques for tuning our scheme to minimize desired error metrics

Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach

Page 13: 1 Wavelet synopses with Error Guarantees Minos Garofalakis Phillip B. Gibbons Information Sciences Research Center Bell Labs, Lucent Technologies Murray

13

Probabilistic wavelet synopsesAThe problem with conventional wavelets

Root causes (1)strict deterministic thresholding (2)independent thresholding (3)the bias resulting from dropping coeffi

cients without compensating for their loss

14

Probabilistic wavelet synopses BGeneral Approach

Our scheme deterministically retains the most important coefficients while randomly rounding the other coefficients either up to a larger value( rounding value) or down to zero

By carefully selecting the rounding values we ensure that (1)We expect a total of B coefficients to be

retained (2)We minimize a desired error metric in the

reconstruction of the data

15

Probabilistic wavelet synopses BGeneral Approach

The key idea in thresholding scheme is to associate a random variable Ci such that (1)Ci=0 with some probability (2)E[Ci] = ci

where we select a rounding value λi for each non-zero ci such that

16

Probabilistic wavelet synopses BGeneral Approach

Our thresholding scheme essentially ldquoroundsrdquo each non-zero wavelet coefficient ci independently to either λi or zero by flipping a biased coin with success probability

It variance is simply

17

Probabilistic wavelet synopses BGeneral Approach 1

For example λ0=c0 λ10= 2c10 λi=3ci2

18

Probabilistic wavelet synopses BGeneral Approach The impact of the λirsquo s

λi closer ci reduce the variance

λi further from ci reduces the expected number of retained coefficients

19

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

A reasonable approach is to select the λi values in a way that minimize the some overall error metric (egL2)

1

20

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

Letting and The expected L2 error minimization problem is

equivalent to

Based on the Cauchy-Schwarz inequality the minimum value of the objective is reached when

21

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

Let

22

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

We focus on minimizing the maximum reconstruction error for individual (related error)

The goal is to produce estimate for each value di such that

23

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The expected value of we would like to minimize the variance

More precisely we seek to minimize the normalized standard error for a reconstructed data value

24

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

Note that by applying Chebyshevrsquos Inequality we obtain( for all αgt1)

So that minimizing NSE will indeed minimize the probabilistic bounds on relative error metric

25

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

26

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

We would like to formulate a dynamic programming recurrence for this problem

Let PATHSj denote the set of all root-to-leaf pahts in Tj M[jB] denote the optimal value of the maximum among all data dk in Tj assuming a space budget of B

27

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

M[jB] depicted in (11)

28

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

29

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The problem in (11) is that the yi and bL each range over a continuous interval making it infeasible to use

The key technical idea is to quantize the solution space

We modify the constraint

where q is a input integer

30

Probabilistic wavelet synopses ELow-bias probabilistic wavelet synopses

Each coefficient is either retained or discarded according to the probabilities yi where as before the yirsquos are selected to minimize a desired error metric

31

Probabilistic wavelet synopsesF Summary of the approach

32

Experimental study A Zipfian data generator was used to produ

ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)

We use real world data set download from the National Forest Service

Let q=10 sanity bound S as the 10-percentile in the da

ta perturbation Δ= min001 S100

33

Experimental study

34

Experimental study

35

Experimental study

36

Conclusions We has introduced probabilistic wavelet synopses

the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers

We have described a number of novel techniques for tuning our scheme to minimize desired error metrics

Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach

Page 14: 1 Wavelet synopses with Error Guarantees Minos Garofalakis Phillip B. Gibbons Information Sciences Research Center Bell Labs, Lucent Technologies Murray

14

Probabilistic wavelet synopses BGeneral Approach

Our scheme deterministically retains the most important coefficients while randomly rounding the other coefficients either up to a larger value( rounding value) or down to zero

By carefully selecting the rounding values we ensure that (1)We expect a total of B coefficients to be

retained (2)We minimize a desired error metric in the

reconstruction of the data

15

Probabilistic wavelet synopses BGeneral Approach

The key idea in thresholding scheme is to associate a random variable Ci such that (1)Ci=0 with some probability (2)E[Ci] = ci

where we select a rounding value λi for each non-zero ci such that

16

Probabilistic wavelet synopses BGeneral Approach

Our thresholding scheme essentially ldquoroundsrdquo each non-zero wavelet coefficient ci independently to either λi or zero by flipping a biased coin with success probability

It variance is simply

17

Probabilistic wavelet synopses BGeneral Approach 1

For example λ0=c0 λ10= 2c10 λi=3ci2

18

Probabilistic wavelet synopses BGeneral Approach The impact of the λirsquo s

λi closer ci reduce the variance

λi further from ci reduces the expected number of retained coefficients

19

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

A reasonable approach is to select the λi values in a way that minimize the some overall error metric (egL2)

1

20

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

Letting and The expected L2 error minimization problem is

equivalent to

Based on the Cauchy-Schwarz inequality the minimum value of the objective is reached when

21

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

Let

22

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

We focus on minimizing the maximum reconstruction error for individual (related error)

The goal is to produce estimate for each value di such that

23

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The expected value of we would like to minimize the variance

More precisely we seek to minimize the normalized standard error for a reconstructed data value

24

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

Note that by applying Chebyshevrsquos Inequality we obtain( for all αgt1)

So that minimizing NSE will indeed minimize the probabilistic bounds on relative error metric

25

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

26

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

We would like to formulate a dynamic programming recurrence for this problem

Let PATHSj denote the set of all root-to-leaf pahts in Tj M[jB] denote the optimal value of the maximum among all data dk in Tj assuming a space budget of B

27

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

M[jB] depicted in (11)

28

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

29

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The problem in (11) is that the yi and bL each range over a continuous interval making it infeasible to use

The key technical idea is to quantize the solution space

We modify the constraint

where q is a input integer

30

Probabilistic wavelet synopses ELow-bias probabilistic wavelet synopses

Each coefficient is either retained or discarded according to the probabilities yi where as before the yirsquos are selected to minimize a desired error metric

31

Probabilistic wavelet synopsesF Summary of the approach

32

Experimental study A Zipfian data generator was used to produ

ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)

We use real world data set download from the National Forest Service

Let q=10 sanity bound S as the 10-percentile in the da

ta perturbation Δ= min001 S100

33

Experimental study

34

Experimental study

35

Experimental study

36

Conclusions We has introduced probabilistic wavelet synopses

the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers

We have described a number of novel techniques for tuning our scheme to minimize desired error metrics

Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach

Page 15: 1 Wavelet synopses with Error Guarantees Minos Garofalakis Phillip B. Gibbons Information Sciences Research Center Bell Labs, Lucent Technologies Murray

15

Probabilistic wavelet synopses BGeneral Approach

The key idea in thresholding scheme is to associate a random variable Ci such that (1)Ci=0 with some probability (2)E[Ci] = ci

where we select a rounding value λi for each non-zero ci such that

16

Probabilistic wavelet synopses BGeneral Approach

Our thresholding scheme essentially ldquoroundsrdquo each non-zero wavelet coefficient ci independently to either λi or zero by flipping a biased coin with success probability

It variance is simply

17

Probabilistic wavelet synopses BGeneral Approach 1

For example λ0=c0 λ10= 2c10 λi=3ci2

18

Probabilistic wavelet synopses BGeneral Approach The impact of the λirsquo s

λi closer ci reduce the variance

λi further from ci reduces the expected number of retained coefficients

19

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

A reasonable approach is to select the λi values in a way that minimize the some overall error metric (egL2)

1

20

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

Letting and The expected L2 error minimization problem is

equivalent to

Based on the Cauchy-Schwarz inequality the minimum value of the objective is reached when

21

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

Let

22

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

We focus on minimizing the maximum reconstruction error for individual (related error)

The goal is to produce estimate for each value di such that

23

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The expected value of we would like to minimize the variance

More precisely we seek to minimize the normalized standard error for a reconstructed data value

24

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

Note that by applying Chebyshevrsquos Inequality we obtain( for all αgt1)

So that minimizing NSE will indeed minimize the probabilistic bounds on relative error metric

25

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

26

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

We would like to formulate a dynamic programming recurrence for this problem

Let PATHSj denote the set of all root-to-leaf pahts in Tj M[jB] denote the optimal value of the maximum among all data dk in Tj assuming a space budget of B

27

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

M[jB] depicted in (11)

28

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

29

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The problem in (11) is that the yi and bL each range over a continuous interval making it infeasible to use

The key technical idea is to quantize the solution space

We modify the constraint

where q is a input integer

30

Probabilistic wavelet synopses ELow-bias probabilistic wavelet synopses

Each coefficient is either retained or discarded according to the probabilities yi where as before the yirsquos are selected to minimize a desired error metric

31

Probabilistic wavelet synopsesF Summary of the approach

32

Experimental study A Zipfian data generator was used to produ

ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)

We use real world data set download from the National Forest Service

Let q=10 sanity bound S as the 10-percentile in the da

ta perturbation Δ= min001 S100

33

Experimental study

34

Experimental study

35

Experimental study

36

Conclusions We has introduced probabilistic wavelet synopses

the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers

We have described a number of novel techniques for tuning our scheme to minimize desired error metrics

Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach

Page 16: 1 Wavelet synopses with Error Guarantees Minos Garofalakis Phillip B. Gibbons Information Sciences Research Center Bell Labs, Lucent Technologies Murray

16

Probabilistic wavelet synopses BGeneral Approach

Our thresholding scheme essentially ldquoroundsrdquo each non-zero wavelet coefficient ci independently to either λi or zero by flipping a biased coin with success probability

It variance is simply

17

Probabilistic wavelet synopses BGeneral Approach 1

For example λ0=c0 λ10= 2c10 λi=3ci2

18

Probabilistic wavelet synopses BGeneral Approach The impact of the λirsquo s

λi closer ci reduce the variance

λi further from ci reduces the expected number of retained coefficients

19

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

A reasonable approach is to select the λi values in a way that minimize the some overall error metric (egL2)

1

20

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

Letting and The expected L2 error minimization problem is

equivalent to

Based on the Cauchy-Schwarz inequality the minimum value of the objective is reached when

21

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

Let

22

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

We focus on minimizing the maximum reconstruction error for individual (related error)

The goal is to produce estimate for each value di such that

23

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The expected value of we would like to minimize the variance

More precisely we seek to minimize the normalized standard error for a reconstructed data value

24

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

Note that by applying Chebyshevrsquos Inequality we obtain( for all αgt1)

So that minimizing NSE will indeed minimize the probabilistic bounds on relative error metric

25

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

26

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

We would like to formulate a dynamic programming recurrence for this problem

Let PATHSj denote the set of all root-to-leaf pahts in Tj M[jB] denote the optimal value of the maximum among all data dk in Tj assuming a space budget of B

27

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

M[jB] depicted in (11)

28

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

29

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The problem in (11) is that the yi and bL each range over a continuous interval making it infeasible to use

The key technical idea is to quantize the solution space

We modify the constraint

where q is a input integer

30

Probabilistic wavelet synopses ELow-bias probabilistic wavelet synopses

Each coefficient is either retained or discarded according to the probabilities yi where as before the yirsquos are selected to minimize a desired error metric

31

Probabilistic wavelet synopsesF Summary of the approach

32

Experimental study A Zipfian data generator was used to produ

ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)

We use real world data set download from the National Forest Service

Let q=10 sanity bound S as the 10-percentile in the da

ta perturbation Δ= min001 S100

33

Experimental study

34

Experimental study

35

Experimental study

36

Conclusions We has introduced probabilistic wavelet synopses

the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers

We have described a number of novel techniques for tuning our scheme to minimize desired error metrics

Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach

Page 17: 1 Wavelet synopses with Error Guarantees Minos Garofalakis Phillip B. Gibbons Information Sciences Research Center Bell Labs, Lucent Technologies Murray

17

Probabilistic wavelet synopses BGeneral Approach 1

For example λ0=c0 λ10= 2c10 λi=3ci2

18

Probabilistic wavelet synopses BGeneral Approach The impact of the λirsquo s

λi closer ci reduce the variance

λi further from ci reduces the expected number of retained coefficients

19

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

A reasonable approach is to select the λi values in a way that minimize the some overall error metric (egL2)

1

20

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

Letting and The expected L2 error minimization problem is

equivalent to

Based on the Cauchy-Schwarz inequality the minimum value of the objective is reached when

21

Probabilistic wavelet synopses CRounding to minimize the expected mean-square error

Let

22

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

We focus on minimizing the maximum reconstruction error for individual (related error)

The goal is to produce estimate for each value di such that

23

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

The expected value of we would like to minimize the variance

More precisely we seek to minimize the normalized standard error for a reconstructed data value

24

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

Note that by applying Chebyshevrsquos Inequality we obtain( for all αgt1)

So that minimizing NSE will indeed minimize the probabilistic bounds on relative error metric

25

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

26

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

We would like to formulate a dynamic programming recurrence for this problem

Let PATHSj denote the set of all root-to-leaf pahts in Tj M[jB] denote the optimal value of the maximum among all data dk in Tj assuming a space budget of B

27

Probabilistic wavelet synopses DRounding to minimize the maximum relative error

M[jB] depicted in (11)


The problem with (11) is that the yi and the bL each range over a continuous interval, making the recurrence infeasible to use directly.

The key technical idea is to quantize the solution space.

We modify the constraint so that each yi is restricted to integer multiples of 1/q, where q is an input integer.
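A simplified sketch of the quantized dynamic program, assuming a perfect binary error tree stored in heap order and ignoring the sanity-bound normalization; node j's variance contribution to every root-to-leaf path below it under retention probability y is taken as c_j²·(1-y)/y, consistent with the unbiased scheme. Names are illustrative, not the paper's.

```python
from functools import lru_cache

def max_path_variance(coeffs, budget_units, q):
    """Quantized DP sketch: coeffs in heap order, budget in units of 1/q.

    M(j, b) = minimum, over retention probabilities y_j in {1/q, ..., q/q}
    and integer splits of the remaining units between the two children, of
        var_j(y_j) + max(M(left, bl), M(right, br)),
    where var_j(y) = c_j**2 * (1 - y) / y.
    """
    n = len(coeffs)

    @lru_cache(maxsize=None)
    def M(j, b):
        if j >= n:                              # past a leaf: no cost, no budget needed
            return 0.0
        best = float("inf")
        for units in range(1, min(b, q) + 1):   # y_j = units / q > 0 (unbiasedness)
            y = units / q
            var = coeffs[j] ** 2 * (1 - y) / y
            rest = b - units
            left, right = 2 * j + 1, 2 * j + 2
            for bl in range(rest + 1):          # split remaining units
                cand = var + max(M(left, bl), M(right, rest - bl))
                best = min(best, cand)
        return best

    return M(0, budget_units)
```

Quantizing y to multiples of 1/q is exactly what turns the continuous recurrence into this finite table over integer budget units.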


Probabilistic wavelet synopses E. Low-bias probabilistic wavelet synopses

Each coefficient is either retained or discarded according to its probability yi, where as before the yi's are selected to minimize a desired error metric.


Probabilistic wavelet synopses F. Summary of the approach


Experimental study

A Zipfian data generator was used to produce Zipfian frequencies for various levels of skew (z parameter between 0.3 and 2.0).

We also use a real-world data set downloaded from the National Forest Service.

We let q = 10, the sanity bound S be the 10-percentile value in the data, and the perturbation Δ = min{0.01, S/100}.
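The Zipfian frequencies used in the synthetic experiments can be generated in a few lines; a sketch, with `zipf_frequencies` a hypothetical name:

```python
def zipf_frequencies(n, z):
    """Frequencies f_i proportional to 1 / i**z, normalized to sum to 1.

    Larger z means more skew: probability mass concentrates on early ranks.
    """
    weights = [1.0 / (i ** z) for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]
```

Sweeping z from 0.3 to 2.0, as in the experiments, moves the data from mildly to heavily skewed.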


Conclusions

We have introduced probabilistic wavelet synopses, the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers.

We have described a number of novel techniques for tuning our scheme to minimize desired error metrics

Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach


34

Experimental study

35

Experimental study

36

Conclusions We has introduced probabilistic wavelet synopses

the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers

We have described a number of novel techniques for tuning our scheme to minimize desired error metrics

Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach

Page 31: 1 Wavelet synopses with Error Guarantees Minos Garofalakis Phillip B. Gibbons Information Sciences Research Center Bell Labs, Lucent Technologies Murray

31

Probabilistic wavelet synopsesF Summary of the approach

32

Experimental study A Zipfian data generator was used to produ

ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)

We use real world data set download from the National Forest Service

Let q=10 sanity bound S as the 10-percentile in the da

ta perturbation Δ= min001 S100

33

Experimental study

34

Experimental study

35

Experimental study

36

Conclusions We has introduced probabilistic wavelet synopses

the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers

We have described a number of novel techniques for tuning our scheme to minimize desired error metrics

Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach

Page 32: 1 Wavelet synopses with Error Guarantees Minos Garofalakis Phillip B. Gibbons Information Sciences Research Center Bell Labs, Lucent Technologies Murray

32

Experimental study A Zipfian data generator was used to produ

ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)

We use real world data set download from the National Forest Service

Let q=10 sanity bound S as the 10-percentile in the da

ta perturbation Δ= min001 S100

33

Experimental study

34

Experimental study

35

Experimental study

36

Conclusions We has introduced probabilistic wavelet synopses

the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers

We have described a number of novel techniques for tuning our scheme to minimize desired error metrics

Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach

Page 33: 1 Wavelet synopses with Error Guarantees Minos Garofalakis Phillip B. Gibbons Information Sciences Research Center Bell Labs, Lucent Technologies Murray

33

Experimental study

34

Experimental study

35

Experimental study

36

Conclusions We has introduced probabilistic wavelet synopses

the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers

We have described a number of novel techniques for tuning our scheme to minimize desired error metrics

Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach

Page 34: 1 Wavelet synopses with Error Guarantees Minos Garofalakis Phillip B. Gibbons Information Sciences Research Center Bell Labs, Lucent Technologies Murray

34

Experimental study

35

Experimental study

36

Conclusions We has introduced probabilistic wavelet synopses

the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers

We have described a number of novel techniques for tuning our scheme to minimize desired error metrics

Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach

Page 35: 1 Wavelet synopses with Error Guarantees Minos Garofalakis Phillip B. Gibbons Information Sciences Research Center Bell Labs, Lucent Technologies Murray

35

Experimental study

36

Conclusions We has introduced probabilistic wavelet synopses

the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers

We have described a number of novel techniques for tuning our scheme to minimize desired error metrics

Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach

Page 36: 1 Wavelet synopses with Error Guarantees Minos Garofalakis Phillip B. Gibbons Information Sciences Research Center Bell Labs, Lucent Technologies Murray

36

Conclusions We has introduced probabilistic wavelet synopses

the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers

We have described a number of novel techniques for tuning our scheme to minimize desired error metrics

Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach