Approximate Query Processing Using Wavelets
Kaushik Chakrabarti
Minos Garofalakis
Rajeev Rastogi
Kyuseok Shim
Presented by Guanghua Yan
2
Outline
n Approximate query processing:
– Problem and prior solutions
– Another solution: wavelets
n Using wavelets to construct synopsis:
– 1-D Haar Wavelets
– Multi-D Haar Wavelets
– Construction of Synopsis
n Query processing in wavelets domain:
– Select
– Project
– Join
n Rendering the result
n Experimental Evaluation
n Conclusions
3
Why do we need Approximate Query Processing?
n Characteristics of DSS applications
– Huge amount of data (GB/TB)
– High query complexity
– Stringent response-time requirements
n EXACT answer NOT always required
– Exploratory nature of DSS applications
– Aggregate query: precision to the penny? NO
– Fast, approximate answer is preferable
n Approximate Query Processing
– Approximate answers
– Quick response
[Figure: an SQL query runs against the Data Warehouse (GB/TB) and returns exact answers. Problem: long response time]
4
How does Approximate Query Processing work?
[Figure: compact relations (MB) are constructed in advance from the Data Warehouse (GB/TB). A transformation algebra turns the SQL query into a transformed SQL query over the compact relations, which returns approximate answers with fast response times]
5
Previous Work
n Construct compact relations using:
– Random Sampling (AQUA system)
• accurate for aggregate queries (COUNT, SUM, AVG)
• not suitable when joins are involved (too few tuples)
• not suitable for non-aggregate queries
– Histograms (Ioannidis and Poosala)
• effectiveness at high dimensions is unclear
• construction is costly (so is storage: curse of dimensionality)
• need to be expanded for joins (a join makes the dimensionality even higher)
– Wavelets (Vitter and Wang)
• effective for aggregate queries even at high dimensions
• limited in query processing scope (only range-sum queries)
6
Overview of the work in this paper
n Construct compact synopsis of interesting tables using multi-resolution wavelet decomposition (done in advance)
– fast: takes just a single pass over the relation in the best case, otherwise logarithmic passes
n SQL queries are answered by working just on the compact relations, i.e. entirely in the wavelet (compressed) domain
– fast response times
– results converted back to relational domain (rendering) at the end
– all types of queries supported: aggregate, non-aggregate
n Fast, accurate, general
7
Overview of the work – the big picture
[Figure: the same pipeline in three steps. Step 1: construct compact relations (MB) from the Data Warehouse (GB/TB) in advance. Step 2: a transformation algebra turns the SQL query into a transformed SQL query over the compact relations, producing a result relation. Step 3: query result rendering (if needed) yields approximate answers with fast response times]
8
Step1 : Construct synopsis with wavelets decomposition
n 1-D Haar Wavelets
n Multi-D Haar Wavelets
n Construction of Synopsis
9
What’s decomposition?
n Vector Decomposition
– V = (1, 2, 3, 4)
– V = 1 * (1, 0, 0, 0) + 2 * (0, 1, 0, 0) + 3 * (0, 0, 1, 0) + 4 * (0, 0, 0, 1)
– 1, 2, 3, 4 are called coefficients; b1 = (1, 0, 0, 0) is called a basis vector
– Each coefficient is the dot product of V with one basis vector, e.g. 3 = (1, 2, 3, 4) * (0, 0, 1, 0)
– Orthogonal:
• Given two basis vectors bi & bj: bi * bj = 1 if i = j, 0 otherwise
• No redundancy, regular, easy to reconstruct
– Looks useless (it maps (1, 2, 3, 4) to (1, 2, 3, 4)) except for the idea of decomposition
10
What’s decomposition?
n Idea of Decomposition
– Fix a set of basis vectors
– Compute a set of coefficients
• Multiplying the original data by one basis vector gives one coefficient
• Dot product vs. inner product
– # of basis vectors = # of coefficients = # of elements (original data)
– Represent the original data (or function) by a set of coefficients in terms of a set of basis vectors
– Motivation
• Find new features of data (Fourier)
• Compress data (wavelets in this paper)
– The original data can be reconstructed (easy for an orthogonal basis)
• Multiply each coefficient by the corresponding basis vector
• Sum up all the products
11
What’s decomposition?
n Function Decomposition
– Fourier transformation and inverse transformation
– Basis functions: cosine and sine functions
– Widely used in engineering
– Problems:
1. Loses time resolution; good for periodic signals
2. Basis functions are fixed
[Figure: basis functions]
12
What’s decomposition?
n Wavelets Decomposition
– Shares the idea with Fourier transformation
– Time resolution added
– Basis functions: scaled & shifted versions of mother wavelets
– Orthogonal
– Vanishing moments, compact support, regularity
– Wavelet decomposition generates compact representations that exploit the local structure of the function
– Wavelets decomposition: scaling function & wavelets function
– Problem: which wavelets decomposition to use? (Haar, CDF(2, X), CDF(3, X), Daubechies series)
[Figure: basis functions; wavelets function (mother wavelets)]
13
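Background on Wavelets: 1-d Haar Wavelets
n Why Haar Wavelets?
– Simplest wavelets function
– Fast to compute (averaging & differencing)
– Performs well in practice (image compression)
n What do Haar Wavelets look like?
– First Example (read bottom-up: the last row is the original data, the first row is the complete decomposition)
35 -3 16 10  8 -8  0 12    (final coefficients)
35 -3 16 10  8 -8  0 12    (step 3: average and difference of 32, 38)
32 38 16 10  8 -8  0 12    (step 2: averages put together)
32 16 38 10  8 -8  0 12    (step 2: averages and differences of 48, 16 and 48, 28)
48 16 48 28  8 -8  0 12    (step 1: averages put together, then differences)
48  8 16 -8 48  0 28 12    (step 1: pairwise averages and differences)
56 40  8 24 48 48 40 16    (original data)
Legend on the slide: blue = original or average coefficient, red = detail coefficient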
14
Haar Wavelets functions
n Scaling function (Father Wavelets)
h0(t) = 1 for t in [0, 1), 0 otherwise
n Wavelets function (Mother Wavelets)
h1(t) = 1 for t in [0, ½), -1 for t in [½, 1), 0 otherwise
[Figure: plots of the scaling function and the wavelets function, each shown with scaled and scaled & shifted versions; vertical axis ticks at 1, 0, -1]
15
1-d Haar basis functions (Daughter Wavelets)
[Figure: plots of the eight basis functions, the scaling function plus scaled and shifted versions of the mother wavelet; vertical axis ticks at 1, 0, -1. The sampled vector of each basis function is listed below]
h0 : (1, 1, 1, 1, 1, 1, 1, 1)    (scaling function)
h1 : (1, 1, 1, 1, -1, -1, -1, -1)
h2 : (1, 1, -1, -1, 0, 0, 0, 0)
h3 : (0, 0, 0, 0, 1, 1, -1, -1)
h4 : (1, -1, 0, 0, 0, 0, 0, 0)
h5 : (0, 0, 1, -1, 0, 0, 0, 0)
h6 : (0, 0, 0, 0, 1, -1, 0, 0)
h7 : (0, 0, 0, 0, 0, 0, 1, -1)
n Set of basis functions (complete decomp.) for a signal S of length 8
n The vector next to each basis function is a sampling of that basis function
n Multiplying S by each basis vector, and dividing by the number of non-zero entries of that vector, gives one coefficient (result: 8 coefficients)
n Connection with the First Example: the original data 56 40 8 24 48 48 40 16 gives the coefficients 35 -3 16 10 8 -8 0 12, e.g. (56 + 40 + 8 + 24 + 48 + 48 + 40 + 16) / 8 = 35
16
Compute 1-d Haar wavelets decomp. by linear algebra
n Decomp. Matrix Ma (collect the 8 basis vectors, putting each one as a column)
– Dot product of any two distinct columns is ZERO
– Normalizing each column is easy
n Decomp. (Complete)
– Given any signal S of length 8
– Multiplying S by Ma gives the wavelets decomp.
– Y = S * Ma
n Reconstruction
– Make Ma orthogonal by normalizing its columns (then Ma^-1 = Ma^T)
– S = Y * Ma^-1 = Y * Ma^T
Decomp. Matrix:
       1   1   1   0   1   0   0   0
       1   1   1   0  -1   0   0   0
       1   1  -1   0   0   1   0   0
Ma =   1   1  -1   0   0  -1   0   0
       1  -1   0   1   0   0   1   0
       1  -1   0   1   0   0  -1   0
       1  -1   0  -1   0   0   0   1
       1  -1   0  -1   0   0   0  -1
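The linear-algebra view can be checked on the running example. The following is a minimal sketch, not the presenter's code: the names `BASIS`, `dot`, `decompose` and `reconstruct` are made up for illustration, and instead of normalizing the columns of Ma it divides each dot product by the squared length of the basis vector, which has the same effect.

```python
# Sampled Haar basis vectors for a signal of length 8 (the columns of Ma).
BASIS = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, 1, -1, -1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, -1, -1],
    [1, -1, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, -1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, -1, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, -1],
]

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def decompose(signal):
    """Coefficient i is <S, h_i> / <h_i, h_i> (column normalization folded in)."""
    return [dot(signal, h) / dot(h, h) for h in BASIS]

def reconstruct(coeffs):
    """Sum of coefficient * basis vector, evaluated cell by cell."""
    size = len(BASIS[0])
    return [sum(c * h[i] for c, h in zip(coeffs, BASIS)) for i in range(size)]

signal = [56, 40, 8, 24, 48, 48, 40, 16]
coeffs = decompose(signal)
print(coeffs)               # [35.0, -3.0, 16.0, 10.0, 8.0, -8.0, 0.0, 12.0]
print(reconstruct(coeffs))  # [56.0, 40.0, 8.0, 24.0, 48.0, 48.0, 40.0, 16.0]
```

The dot products of distinct basis vectors are all zero, which is why each coefficient can be computed independently of the others.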
17
Compute 1-d Haar wavelets decomp. scale by scale
n Decomposition
– Pairwise averaging and differencing [one-scale decomposition]
– Distribution: put the averages (approximate coefficients) together and put the differences (detail coefficients) together
– Repeat the above on the averages until only one average is left [recursive, complete decomposition]
– Result: last average + all detail coefficients
n Reconstruction
– Exactly the inverse of decomposition
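The scale-by-scale procedure can be sketched in a few lines. This is an illustrative sketch under two assumptions: the input length is a power of two, and the output layout is [last average, coarsest detail, ..., finest details]. The function names `haar_decompose` and `haar_reconstruct` are hypothetical.

```python
def haar_decompose(data):
    """Complete 1-D Haar decomposition by pairwise averaging and differencing."""
    n = len(data)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    averages = list(data)
    details = []
    while len(averages) > 1:
        pairs = list(zip(averages[0::2], averages[1::2]))
        level_details = [(a - b) / 2 for a, b in pairs]
        averages = [(a + b) / 2 for a, b in pairs]
        details = level_details + details  # coarser details go in front
    return averages + details

def haar_reconstruct(coeffs):
    """Inverse of haar_decompose: expand one scale at a time."""
    values = list(coeffs[:1])
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(values)]
        pos += len(values)
        # Each (average, detail) pair expands to (avg + d, avg - d).
        values = [x for avg, d in zip(values, details) for x in (avg + d, avg - d)]
    return values

data = [56, 40, 8, 24, 48, 48, 40, 16]
coeffs = haar_decompose(data)
print(coeffs)                    # [35.0, -3.0, 16.0, 10.0, 8.0, -8.0, 0.0, 12.0]
print(haar_reconstruct(coeffs))  # [56.0, 40.0, 8.0, 24.0, 48.0, 48.0, 40.0, 16.0]
```

The loop runs log2(N) times, 3 times for the 8 values of the example.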
18
How does 1-d Haar Wavelet work? Example
35 -3 16 10  8 -8  0 12
35 -3 16 10  8 -8  0 12
32 38 16 10  8 -8  0 12
32 16 38 10  8 -8  0 12
48 16 48 28  8 -8  0 12
48  8 16 -8 48  0 28 12
56 40  8 24 48 48 40 16
Decomposition (log N steps needed): reading from the last row (original data) upwards, 3 steps are used to do the complete decomposition
Reconstruction: exact inverse of the above process
Legend on the slide: blue = original or average coefficient, red = detail coefficient
19
Where’s the compression and Approximate?
n Thresholding
n Set a threshold value C
n Replace those wavelet coefficients whose absolute value is less than C with ZERO
n More zeros in the wavelet coefficients means more compression: store ONLY the non-zero coefficients
n The more similar the data values are, the more compression we get
n How much does this influence the original data?
Original data:       56 40  8 24 48 48 40 16
Coefficients:        35 -3 16 10  8 -8  0 12
Threshold C = 4
Coefficients:        35  0 16 10  8 -8  0 12
Reconstructed data:  59 43 11 27 45 45 37 13
Threshold C = 9
Coefficients:        35  0 16 10  0  0  0 12
Reconstructed data:  51 51 19 19 45 45 37 13
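The effect of the two thresholds can be reproduced with a short sketch. The helper names `threshold` and `haar_reconstruct` are illustrative, and the coefficient layout [average, coarsest detail, ..., finest details] is an assumption that matches the example rows.

```python
def haar_reconstruct(coeffs):
    """Rebuild the data from [average, coarsest detail, ..., finest details]."""
    values = list(coeffs[:1])
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(values)]
        pos += len(values)
        values = [x for avg, d in zip(values, details) for x in (avg + d, avg - d)]
    return values

def threshold(coeffs, c):
    """Zero out every coefficient whose absolute value is below c."""
    return [w if abs(w) >= c else 0 for w in coeffs]

coeffs = [35, -3, 16, 10, 8, -8, 0, 12]
for c in (4, 9):
    kept = threshold(coeffs, c)
    print(c, kept, haar_reconstruct(kept))
# 4 [35, 0, 16, 10, 8, -8, 0, 12] [59, 43, 11, 27, 45, 45, 37, 13]
# 9 [35, 0, 16, 10, 0, 0, 0, 12] [51, 51, 19, 19, 45, 45, 37, 13]
```

With C = 4 only one coefficient is dropped and every value is off by 3; with C = 9 the four zeroed details flatten neighbouring pairs.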
20
Haar wavelets compression and approximate
[Figure: original signal (blue line) against reconstructed signal (red line) for Threshold C = 4 and Threshold C = 9, using the same data, coefficients and reconstructed values as on the previous slide]
21
Background on Wavelets: Multi-d Haar Wavelets
n Data cube has multiple dimensions (of equal length)
– Standard decomposition
– Non-standard decomposition
n Standard decomposition
– Fix an ordering for the data dimensions, say 1, 2, …, d
– For each dimension k, fixing the other (d-1) dimensions gives a 1-D “row” vector
– Perform a complete 1-D Haar wavelet decomposition on that 1-D vector
– Repeat the last two steps for every dimension, in the order fixed in step 1
n Non-standard decomposition
– Fix an ordering for the data dimensions, say 1, 2, …, d
– In this order, for each dimension, perform one scale of 1-D Haar decomp.
– Collect the averages together, and repeat the last step on the averages
– Conceptually: uses a hyper-box of size 2 x 2 x … x 2 (= 2^d cells)
22
Multi-d Haar Wavelets (non-standard)
[Figure: the 2 x 2 block (a b / c d) is transformed into the block (S d1 / d2 d3)]
One step along Dim 1 (x axis):
( a + b ) / 2    ( a - b ) / 2
( c + d ) / 2    ( c - d ) / 2
One step along Dim 2 (y axis), e.g. rebuilding S = ( ( a + b ) / 2 + ( c + d ) / 2 ) / 2 = ( a + b + c + d ) / 4
Wavelets coefficients:
S  = (a + b + c + d) / 4
d1 = (a + c - b - d) / 4
d2 = (a + b - c - d) / 4
d3 = (a + d - b - c) / 4
Reconstruction from S, d1, d2, d3:
a = S + d1 + d2 + d3
b = S + d2 - d1 - d3
c = S + d1 - d2 - d3
d = S + d3 - d1 - d2
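The 2 x 2 block transform and one scale of the non-standard decomposition can be sketched as follows. This is a sketch under assumptions: the input is a square matrix whose side is a power of two, d3 is taken as the difference of the two row differences, (a + d - b - c) / 4, and the function names are made up for illustration.

```python
def block_forward(a, b, c, d):
    """Average and three details of the 2 x 2 block (a b / c d)."""
    s = (a + b + c + d) / 4
    d1 = (a + c - b - d) / 4   # average of the two row differences
    d2 = (a + b - c - d) / 4   # difference of the two row averages
    d3 = (a + d - b - c) / 4   # difference of the two row differences
    return s, d1, d2, d3

def block_inverse(s, d1, d2, d3):
    """Rebuild (a, b, c, d) from the average and the three details."""
    return (s + d1 + d2 + d3, s + d2 - d1 - d3,
            s + d1 - d2 - d3, s + d3 - d1 - d2)

def nonstandard_step(matrix):
    """One scale: transform every 2 x 2 block, return (averages, details)."""
    half = len(matrix) // 2
    averages = [[0.0] * half for _ in range(half)]
    details = {}
    for i in range(half):
        for j in range(half):
            a, b = matrix[2 * i][2 * j], matrix[2 * i][2 * j + 1]
            c, d = matrix[2 * i + 1][2 * j], matrix[2 * i + 1][2 * j + 1]
            s, d1, d2, d3 = block_forward(a, b, c, d)
            averages[i][j] = s
            details[(i, j)] = (d1, d2, d3)
    return averages, details

print(block_forward(4, 2, 6, 8))       # (5.0, 0.0, -2.0, 1.0)
print(block_inverse(5.0, 0.0, -2.0, 1.0))  # (4.0, 2.0, 6.0, 8.0)
```

Repeating `nonstandard_step` on the returned averages until a single value is left gives the complete non-standard decomposition.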
23
Multi-d Haar Wavelets Example
Bad Position
24
Multi-d Haar Coefficients: Semantics and Representation
n Question: What is the contribution of each coefficient (W) in rebuilding the data array? How should a coefficient be stored?
n Answer: W = <R, S, v>
– R: d-dimensional support hyper-rectangle of W
– S: sign information for all d-dimensional cells of W.R
– v: magnitude of the coefficient W
– R & S depend only on the Haar basis function
– v depends on the original data
25
Multi-d Haar Coefficients: Semantics and Representation
[Figure: A, a 2-D data array, and Wa, its wavelet coefficients, each drawn with the sign pattern (+/-) of the coefficient over its support region; both axes are indexed 0 to 3]
W = Wa[1, 2], W.v = -2
W.R.bound[1].lo = 2, W.R.bound[1].hi = 3
W.R.bound[2].lo = 0, W.R.bound[2].hi = 1
W.S.sign[1].lo = ‘+’, W.S.sign[1].hi = ‘+’
W.S.sign[2].lo = ‘+’, W.S.sign[2].hi = ‘-’
W.S.schg[1] = 2, W.S.schg[2] = 1
A[0,1] = +Wa[0,0]+Wa[0,1]+Wa[1,0]+Wa[1,1]-Wa[0,2]+Wa[2,0]-Wa[2,2] = 2.5-(-1)+(-.5) = 3
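One possible in-memory form of W = <R, S, v> is sketched below. The class and field names are hypothetical, signs are stored as +1/-1 instead of ‘+’/‘-’, and dimensions are numbered from 0 rather than from 1 as on the slide. The sketch also assumes that along each dimension the low sign applies to coordinates below `schg` and the high sign from `schg` upwards, which is consistent with the example values above.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Coeff:
    lo: List[int]       # W.R.bound[i].lo per dimension
    hi: List[int]       # W.R.bound[i].hi per dimension
    sign_lo: List[int]  # W.S.sign[i].lo as +1 / -1
    sign_hi: List[int]  # W.S.sign[i].hi as +1 / -1
    schg: List[int]     # W.S.schg[i]: first coordinate that uses sign_hi
    v: float            # magnitude

def contribution(w, cell):
    """Signed contribution of coefficient w to one cell (0 outside w's support)."""
    sign = 1
    for d, x in enumerate(cell):
        if not (w.lo[d] <= x <= w.hi[d]):
            return 0.0
        sign *= w.sign_lo[d] if x < w.schg[d] else w.sign_hi[d]
    return sign * w.v

# The coefficient W = Wa[1, 2] from the slide, with dimensions renumbered from 0.
w = Coeff(lo=[2, 0], hi=[3, 1], sign_lo=[1, 1], sign_hi=[1, -1], schg=[2, 1], v=-2.0)
print(contribution(w, (2, 0)))  # -2.0
print(contribution(w, (3, 1)))  # 2.0
print(contribution(w, (0, 1)))  # 0.0
```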
26
Notation used in the paper
27
Construction of Compact Relations: Wavelet decomposition of JFD Matrix
[Figure: a relation with numeric attributes and its Joint Frequency Distribution (JFD) matrix]
28
Thresholding
n Retain the k coefficients with largest absolute value after normalization
n Minimizes overall mean squared error
n The set of coefficients retained after thresholding is the wavelet-coefficient synopsis
n All SQL queries will be on the synopsis
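Retaining the k largest normalized coefficients can be sketched for the 1-D case. The slide does not spell out the normalization, so this sketch assumes the usual one for coefficients produced by averaging and differencing: each coefficient is weighted by the square root of the size of its support. The names `support_sizes` and `top_k_synopsis` are illustrative.

```python
import math

def support_sizes(n):
    """Support length of each coefficient in the layout
    [average, coarsest detail, ..., finest details] for a signal of length n."""
    sizes = [n]  # the overall average covers the whole signal
    width, count = n, 1
    while width > 1:
        sizes += [width] * count
        width //= 2
        count *= 2
    return sizes

def top_k_synopsis(coeffs, k):
    """Keep the k coefficients with the largest normalized absolute value.
    Returns {position: original coefficient value}."""
    sizes = support_sizes(len(coeffs))
    def weight(i):
        return abs(coeffs[i]) * math.sqrt(sizes[i])
    keep = sorted(range(len(coeffs)), key=weight, reverse=True)[:k]
    return {i: coeffs[i] for i in sorted(keep)}

coeffs = [35, -3, 16, 10, 8, -8, 0, 12]
print(top_k_synopsis(coeffs, 4))  # {0: 35, 2: 16, 3: 10, 7: 12}
```

Note that 12 (support 2) is kept ahead of the two coefficients of magnitude 8 with the same support, and ahead of -3 even though -3 has the larger support, because 12 * sqrt(2) exceeds 3 * sqrt(8).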
29
Summary of Step 1
n Wavelets Decomp. & Construction of synopsis
– 1-D Haar wavelets Decomp.
• Simple & fast to compute
• Pairwise averaging & differencing
• Recursive fashion
– M-D Haar wavelets Decomp.
• Non-standard extension
• Alternate between dimensions
– Thresholding
• Discard the smallest coefficients
• Lossy data compression
• Approximation
– How to store coefficients
• Semantics of the notation W = <R, S, v>
• SQL queries run on the coefficients
30
Query Processing(Step 2)
[Figure: two routes from the wavelet synopses to the final approximate results. Compressed domain (FAST): query in the wavelet domain to get query results in the wavelet domain, then render. Relation domain (SLOW): render the synopses into approximate relations, then query in the relation domain]
• Entire processing in compressed (wavelet) domain
31
Query Processing
[Figure: operator tree with two select operators, a join, a project and a final render; sets of coeffs flow between the operators and render outputs a set of tuples]
n Each operator (e.g., select, project, join, aggregates etc.)
– input: set of coefficients
– output: set of coefficients
n Finally, rendering step
– input: set of coefficients
– output: (multi)set of tuples
n Questions
– How to map the query algebra?
– Can we maintain the semantics of the coefficients?
32
Query algebra mapping - Selection : Definition
n Select_pred(WT)
– T is a d-dimensional relation
– WT is T’s wavelets synopsis
n Pred = ( li1 ≤ Di1 ≤ hi1 ) ^ … ^ ( lik ≤ Dik ≤ hik )
n k-dimensional range selection
– Range defined for k dimensions, D’ = {Di1, Di2, … , Dik}
– Range unspecified for the remaining (d - k) dimensions: 0 ≤ X ≤ |Dx|
n Example
33
Query algebra mapping - Selection : example
n D1 : (0, 7) D2 : (0, 7)
n Pred = (1 ≤ D1 ≤ 4 ) ^ ( 2 ≤ D2 ≤ 6 ), D’ = { D1, D2 }
n In relation domain, interested in only those cells inside the query range
n In wavelet domain, interested in only the coefficients that contribute to those cells
Relation:
Dim D1 (Attr1)  Dim D2 (Attr2)  Count
0               6               6
1               2               3
1               3               4
1               5               6
1               6               8
2               6               7
3               0               1
4               2               3
5               2               2
6               1               3
6               2               2
6               5               1
6               6               3
[Figure: JFD matrix of the relation (Dim. D1 by Dim. D2) with the query range marked]
34
Query algebra mapping - Selection : Mapping
[Figure: coefficients W1, W2, W3, W4 drawn with their sign patterns in the D1 x D2 plane together with the query range; the coefficients clipped to the range are labelled W2’, W3’, W4’]
n 1. For each W in WT do
n 2. If for every Dij in D’ /* check overlapping */
      lij ≤ W.R.bound[ij].lo ≤ hij or W.R.bound[ij].lo ≤ lij ≤ W.R.bound[ij].hi
   then goto 3 else goto 5
n 3. For all Dij in D’ do /* the overlapping area is the new hyper-rectangle */
      W.R.bound[ij].lo := max{lij, W.R.bound[ij].lo}
      W.R.bound[ij].hi := min{hij, W.R.bound[ij].hi}
      if W.R.bound[ij].hi < W.S.schg[ij] then /* no sign change any more */
         W.S.schg[ij] := W.R.bound[ij].lo
         W.S.sign[ij] := [W.S.sign[ij].lo, W.S.sign[ij].lo]
      else if W.R.bound[ij].lo ≥ W.S.schg[ij] then /* no sign change any more */
         W.S.schg[ij] := W.R.bound[ij].lo
         W.S.sign[ij] := [W.S.sign[ij].hi, W.S.sign[ij].hi]
n 4. Output updated W, Ws = Ws ∪ W
n 5. Goto 1, select next W
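The selection mapping can be sketched as a function over a list of coefficients. This is an illustrative sketch, not the paper's implementation: the `Coeff` class and its field names are hypothetical, signs are stored as +1/-1, dimensions are numbered from 0, and the predicate is passed as a dictionary from dimension index to an inclusive (low, high) range.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Coeff:
    lo: List[int]       # W.R.bound[i].lo
    hi: List[int]       # W.R.bound[i].hi
    sign_lo: List[int]  # W.S.sign[i].lo as +1 / -1
    sign_hi: List[int]  # W.S.sign[i].hi as +1 / -1
    schg: List[int]     # W.S.schg[i]
    v: float

def select(coeffs: List[Coeff], ranges: Dict[int, Tuple[int, int]]) -> List[Coeff]:
    """Clip every coefficient to the query range; drop non-overlapping ones."""
    out = []
    for w in coeffs:
        # Step 2: the support must overlap the range in every selected dimension.
        if any(w.hi[d] < l or w.lo[d] > h for d, (l, h) in ranges.items()):
            continue
        lo, hi = list(w.lo), list(w.hi)
        s_lo, s_hi, schg = list(w.sign_lo), list(w.sign_hi), list(w.schg)
        for d, (l, h) in ranges.items():
            # Step 3: the overlapping area is the new hyper-rectangle.
            lo[d], hi[d] = max(l, w.lo[d]), min(h, w.hi[d])
            if hi[d] < schg[d]:      # only the low-sign part survives
                schg[d], s_hi[d] = lo[d], s_lo[d]
            elif lo[d] >= schg[d]:   # only the high-sign part survives
                schg[d], s_lo[d] = lo[d], s_hi[d]
        out.append(Coeff(lo, hi, s_lo, s_hi, schg, w.v))
    return out

w = Coeff(lo=[0, 0], hi=[3, 3], sign_lo=[1, 1], sign_hi=[-1, 1], schg=[2, 0], v=5.0)
print(select([w], {0: (0, 1)}))  # clipped to [0, 1] on dimension 0, sign now constant
print(select([w], {0: (5, 6)}))  # []
```

The magnitude v is never touched; selection only shrinks the support and updates the sign information.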
35
Query algebra mapping - Projection : Definition
n Project_Xi1, Xi2, …, Xik(WT)
– T is a d-dimensional relation
– WT is T’s wavelets synopsis
n Xi1, Xi2, …, Xik are the attributes we are interested in
– The remaining (d-k) dimensions will be projected out
n Project out the (d-k) dimensions one by one
n Example
36
Query algebra mapping - Projection : example
n D1 is to be retained, D2 will be projected out
n In relation domain, sum the elements in each row along the eliminated dimension
n In wavelet domain, sum the contribution of each coefficient along the eliminated dimension
[Figure: JFD matrix with D1 marked "retain this dim." and D2 marked "eliminate this dimension". Result of projection, for D1 = 6 down to 0: 9, 2, 3, 1, 7, 21, 6]
Rows of the relation with D1 = 6:
Dim D1 (Attr1)  Dim D2 (Attr2)  Count
6               1               3
6               2               2
6               5               1
6               6               3
After Project:
Dim D1 (Attr1)  Count
6               9
37
Query algebra mapping - Projection : Mapping
[Figure: projection on D1. Coefficient W1 keeps one sign along D2 over an extent of X cells; coefficient W2 changes sign along D2, with extents X1 and X2 on the two sides of the sign change]
W1.v := X * W1.v    W2.v := ( X2 – X1 ) * W2.v
n 1. For each Dj in D – D’ (to be projected out)
n 2. For every W in WT do
   2.1 Set W.v := W.v * Pj, where Pj equals
       (W.R.bound[j].hi - W.S.schg[j] + 1) * W.S.sign[j].hi + (W.S.schg[j] - W.R.bound[j].lo) * W.S.sign[j].lo
   2.2 Discard dimension Dj (hyper-rectangle and sign) from W
n 3. Goto 1, select next Dj
n In Step 2, by summing up the contributions of W along Dj, we are projecting out Dj
n In short, for each W we can simply
– set W.v := W.v * PROD over Dj in D – D’ of Pj
– discard the dimensions D – D’
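Projecting out one dimension can be sketched as follows. The sketch assumes the same hypothetical `Coeff` layout as a plain dataclass (signs as +1/-1, dimensions numbered from 0) and weights the cells below the sign change by the low sign and the cells from the sign change upwards by the high sign.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Coeff:
    lo: List[int]       # W.R.bound[i].lo
    hi: List[int]       # W.R.bound[i].hi
    sign_lo: List[int]  # W.S.sign[i].lo as +1 / -1
    sign_hi: List[int]  # W.S.sign[i].hi as +1 / -1
    schg: List[int]     # W.S.schg[i]
    v: float

def project_out(w: Coeff, j: int) -> Coeff:
    """Sum the contributions of w along dimension j, then drop that dimension."""
    high_cells = w.hi[j] - w.schg[j] + 1  # cells carrying sign_hi
    low_cells = w.schg[j] - w.lo[j]       # cells carrying sign_lo
    p_j = high_cells * w.sign_hi[j] + low_cells * w.sign_lo[j]
    keep = [d for d in range(len(w.lo)) if d != j]
    def pick(xs):
        return [xs[d] for d in keep]
    return Coeff(pick(w.lo), pick(w.hi), pick(w.sign_lo), pick(w.sign_hi),
                 pick(w.schg), w.v * p_j)

w = Coeff(lo=[0, 0], hi=[3, 3], sign_lo=[1, 1], sign_hi=[1, -1], schg=[0, 2], v=2.0)
print(project_out(w, 0).v)  # 8.0: constant sign over 4 cells
print(project_out(w, 1).v)  # 0.0: two '+' cells cancel two '-' cells
```

A detail coefficient whose sign change sits in the middle of its support projects to zero; after a selection has clipped the support, the two sides are generally unequal and the result is non-zero.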
38
Query algebra mapping - Equi-Join : Definition
n Join_pred(WT1, WT2)
– Dim(T1) = d1, Dim(T2) = d2
– wavelets synopsis(T1) = WT1, wavelets synopsis(T2) = WT2
n Pred = ( X^1_1 = X^2_1 ) ^ … ^ ( X^1_k = X^2_k ), where the superscript is the relation and the subscript is the attribute
– Pred is k-dimensional, k ≤ d1 and k ≤ d2
– WLOG, assume they are the first k dimensions of both T1 and T2
– Let D’ = (D1, D2, … , Dk)
n Dimension of the result is ( d1 + d2 - k )
n Example
39
Query algebra mapping - Equi-Join : example
n In relation domain, join count = 7 * 3 = 21
n In wavelet domain, consider all pairs of coefficients and check joinability (and compute new coefficients)
[Figure: JFD matrix of Relation1 (Dim. D1 by Dim. D2) and JFD matrix of Relation2 (Dim. D1 by Dim. D3), with join dimension D1; the two highlighted cells, with counts 7 and 3, have the same value on D1]
Relation1:
Dim D1 (Attr1)  Dim D2 (Attr2)  Count
6               2               7
4               3               6
Relation2:
Dim D1 (Attr1)  Dim D3 (Attr3)  Count
6               3               3
Join along D1:
Dim D1 (Attr1)  Dim D2 (Attr2)  Dim D3 (Attr3)  Count
6               2               3               21
40
Query algebra mapping - Equi-Join : example
[Figure: coefficient W11 over (D1, D2) and coefficient W21 over (D1, D3), with join dimension D1. When their ranges on D1 do not overlap the output is NOTHING; when they overlap the output coefficient has W.v = W11.v * W21.v. Cells A(X1, X2) and B(X1, X3) share the value X1 on D1]
n Case 1: no overlapping
– Output nothing
n Case 2: overlapping
– Cell A(X1, X2) and Cell B(X1, X3)
– W11 and W12 cover A (W12 not shown)
– W21 and W22 cover B (W22 not shown)
– Calculate the join result for (X1, X2, X3):
(W11.v + W12.v) * (W21.v + W22.v) = W11.v * W21.v + W11.v * W22.v + W12.v * W21.v + W12.v * W22.v
n Consider each coefficient pair
n The join range along any dimension can contain at most one true sign change, due to the complete containment property of the Haar wavelets decomposition
41
Equi-Join : Mapping
n 1. For each pair (W1, W2), W1 in WT1 and W2 in WT2, do
n 2. If for every Di in D’ /* check overlapping in the k join dimensions */
      ( W1.R.bound[i].lo ≤ W2.R.bound[i].lo ≤ W1.R.bound[i].hi ) OR ( W2.R.bound[i].lo ≤ W1.R.bound[i].lo ≤ W2.R.bound[i].hi )
   then goto 3 else goto 7
n 3. For each join dimension Di in D’ do /* steps 3, 4, 5, 6 build a new coefficient on the join range */
   3.1 Set W.R.bound[i].lo := max{W1.R.bound[i].lo, W2.R.bound[i].lo} /* set join boundary */
       W.R.bound[i].hi := min{W1.R.bound[i].hi, W2.R.bound[i].hi}
   3.2 For j = 1, 2 /* compute sign info; let Sj be a temporary sign-vector variable */
       if W.R.bound[i].hi < Wj.S.schg[i] then Sj := [Wj.S.sign[i].lo, Wj.S.sign[i].lo]
       else if W.R.bound[i].lo ≥ Wj.S.schg[i] then Sj := [Wj.S.sign[i].hi, Wj.S.sign[i].hi]
       else Sj := Wj.S.sign[i]
   3.3 Set W.S.sign[i] := [S1.lo * S2.lo, S1.hi * S2.hi]
   3.4 If W.S.sign[i].lo == W.S.sign[i].hi then set W.S.schg[i] := W.R.bound[i].lo
   3.5 else set W.S.schg[i] := max over j = 1, 2 of { Wj.S.schg[i] : Wj.S.schg[i] in [W.R.bound[i].lo, W.R.bound[i].hi] }
n 4. For each non-join dimension Di, i = k + 1, … , d1 do /* steps 4, 5 inherit the non-join dimensions */
      set W.R.bound[i] := W1.R.bound[i], W.S.sign[i] := W1.S.sign[i], W.S.schg[i] := W1.S.schg[i]
n 5. For each non-join dimension Di, i = d1 + 1, … , d1 + d2 – k do
      set W.R.bound[i] := W2.R.bound[i – d1 + k], W.S.sign[i] := W2.S.sign[i – d1 + k], W.S.schg[i] := W2.S.schg[i – d1 + k]
n 6. Set W.v := W1.v * W2.v and output W, Ws = Ws ∪ W
n 7. Goto 1, select another pair
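Joining one pair of coefficients can be sketched as below. This is a hedged illustration that follows the listed steps rather than a verified implementation: the `Coeff` class and the function name `join_pair` are hypothetical, signs are stored as +1/-1, dimensions are numbered from 0, and the join dimensions are assumed to be the first k dimensions of both coefficients.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Coeff:
    lo: List[int]       # W.R.bound[i].lo
    hi: List[int]       # W.R.bound[i].hi
    sign_lo: List[int]  # W.S.sign[i].lo as +1 / -1
    sign_hi: List[int]  # W.S.sign[i].hi as +1 / -1
    schg: List[int]     # W.S.schg[i]
    v: float

def join_pair(w1: Coeff, w2: Coeff, k: int) -> Optional[Coeff]:
    """Join two coefficients on their first k dimensions (None if no overlap)."""
    lo, hi, s_lo, s_hi, schg = [], [], [], [], []
    for i in range(k):
        new_lo, new_hi = max(w1.lo[i], w2.lo[i]), min(w1.hi[i], w2.hi[i])
        if new_lo > new_hi:          # supports do not overlap on this dimension
            return None
        signs = []
        for w in (w1, w2):           # restrict each sign vector to the join range
            if new_hi < w.schg[i]:
                signs.append((w.sign_lo[i], w.sign_lo[i]))
            elif new_lo >= w.schg[i]:
                signs.append((w.sign_hi[i], w.sign_hi[i]))
            else:
                signs.append((w.sign_lo[i], w.sign_hi[i]))
        low, high = signs[0][0] * signs[1][0], signs[0][1] * signs[1][1]
        if low == high:
            change = new_lo          # no sign change inside the join range
        else:
            change = max(w.schg[i] for w in (w1, w2) if new_lo <= w.schg[i] <= new_hi)
        lo.append(new_lo); hi.append(new_hi)
        s_lo.append(low); s_hi.append(high); schg.append(change)
    for w in (w1, w2):               # inherit the non-join dimensions
        lo += w.lo[k:]; hi += w.hi[k:]
        s_lo += w.sign_lo[k:]; s_hi += w.sign_hi[k:]; schg += w.schg[k:]
    return Coeff(lo, hi, s_lo, s_hi, schg, w1.v * w2.v)

w1 = Coeff(lo=[4, 0], hi=[7, 3], sign_lo=[1, 1], sign_hi=[-1, 1], schg=[6, 0], v=2.0)
w2 = Coeff(lo=[6, 2], hi=[7, 3], sign_lo=[1, 1], sign_hi=[-1, 1], schg=[7, 2], v=3.0)
print(join_pair(w1, w2, 1))
```

In this made-up pair, W1 is entirely negative on the join range [6, 7] while W2 changes from '+' to '-' at 7, so the result is negative at 6 and positive at 7, with magnitude 2.0 * 3.0 = 6.0 over dimensions (D1, D2, D3).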
42
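Query algebra mapping - Equi-Join : example
[Figure: two pairs of coefficients, one over (D1, D2) and one over (D1, D3), joined on dimension D1. The pair without overlap on D1 outputs NOTHING; the overlapping pair outputs a coefficient with val = val1 * val2]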
43
Query algebra mapping - Equi-Join : example
[Figure: two further pairs of overlapping coefficients over (D1, D2) and (D1, D3); each pair outputs a coefficient with val = val1 * val2 and the combined sign pattern]
44
Summary of Step2
n Query algebra mapping (only non-aggregate)
– Selection
• Update those wavelets coefficients whose hyper-rectangle overlaps the selection range
– Projection
• Sum up all wavelets coefficients along all dimensions to be projected out
– Join
• Create new wavelets coefficients
• Hyper-rectangle equals the join range plus the non-join dimensions
• Compute sign information
– Results need to be rendered
• The output of the above queries is a set of wavelets coefficients
• It needs to be converted to a database relation
45
Rendering(Step 3)
n Go back from wavelets domain to database relations
n Semantics of wavelets coefficients unchanged
– Range, Sign, Sign-change, Magnitude
n Inverse wavelets decomposition is easy
– Sum up the contributions of all coefficients to each cell
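Summing up the contributions of all coefficients to each cell can be sketched directly. This is a naive illustration, not the rendering algorithm of the paper: it visits every cell of every coefficient's support, returns a dictionary from cell to approximate count, and reuses a hypothetical `Coeff` layout with +1/-1 signs and dimensions numbered from 0.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Tuple

@dataclass
class Coeff:
    lo: List[int]       # W.R.bound[i].lo
    hi: List[int]       # W.R.bound[i].hi
    sign_lo: List[int]  # W.S.sign[i].lo as +1 / -1
    sign_hi: List[int]  # W.S.sign[i].hi as +1 / -1
    schg: List[int]     # W.S.schg[i]
    v: float

def render(coeffs: List[Coeff]) -> Dict[Tuple[int, ...], float]:
    """Add the signed magnitude of each coefficient to every cell it covers."""
    cells: Dict[Tuple[int, ...], float] = {}
    for w in coeffs:
        ranges = [range(lo, hi + 1) for lo, hi in zip(w.lo, w.hi)]
        for cell in product(*ranges):
            sign = 1
            for d, x in enumerate(cell):
                sign *= w.sign_lo[d] if x < w.schg[d] else w.sign_hi[d]
            cells[cell] = cells.get(cell, 0.0) + sign * w.v
    return cells

# 1-D Haar coefficients of the made-up data [2, 4, 6, 8]:
# average 5, coarse detail -2, fine details -1 and -1.
synopsis = [
    Coeff([0], [3], [1], [1], [0], 5.0),
    Coeff([0], [3], [1], [-1], [2], -2.0),
    Coeff([0], [1], [1], [-1], [1], -1.0),
    Coeff([2], [3], [1], [-1], [3], -1.0),
]
print(render(synopsis))  # {(0,): 2.0, (1,): 4.0, (2,): 6.0, (3,): 8.0}
```

Turning the rendered counts into a (multi)set of tuples is then a matter of emitting each cell's attribute values as many times as its approximate count.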
46
Experimental Results
n Compare wavelets-based technique
– With sampling and histograms
– In terms of efficiency and accuracy
n Measuring accuracy (error metrics)
– Aggregate: absolute relative error
– Non-aggregate: EMD error
n Query types
– SELECT, SELECT-SUM, SELECT-JOIN, SELECT-JOIN-SUM
47
Datasets and Queries
n Synthetic data set
n Real data set:
– CENSUS Population Survey (www.census.gov), 1992 & 1994
– 4-d data: age (0-17), education level (0-46), income (0-41), hrs/week (0-13)
– JFD Matrix size: 2 million cells (≈ 32 * 64 * 64 * 16)
– Relation sizes (2 relations) ~ 16,000
– Density ~ 0.001
n Queries:
– Selects: 5 ≤ age < 10 ^ 10 ≤ income < 15, selectivity ~ 6%
– Joins: join the 1992 and 1994 data on age
– Sum: sum on age
48
Query Execution Time
n Two-D synthetic data set used
n Running time on the base relation is 3.6 seconds (with enough memory)
n Sampling is not counted here
– It yields too few tuples for joins
n Wavelets run faster than histograms
– By more than two orders of magnitude
– Histograms are expanded to generate the tuple-value distribution
– Wavelets are expanded only at the very end
49
Query Execution Accuracy
50
Query Execution Accuracy
51
Conclusion
n Wavelets are an effective tool for general purpose approximate query answering
– fast query processing (entirely in the wavelet (compressed) domain)
– low synopsis construction cost
– high accuracy even at high dimensions
– can handle all types of queries