Quotient Cube: How to Summarize the Semantics of a Data Cube
Laks V.S. Lakshmanan (Univ. of British Columbia)*
Jian Pei (State Univ. of New York at Buffalo)*
Jiawei Han (Univ. of Illinois at Urbana-Champaign)+
* The work is partially supported by NSERC and NCE/IRIS
+ The work is partially supported by NSF, UI, and Microsoft Research
Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube 2
Outline
• Introduction and motivation
• Cube lattice partitions
• Semantics preserving partitions
• Algorithms
• Experimental results
• Discussion and summary
Data Cube
Base table (dimensions: Store, Product, Season; measure: Sales)
Store  Product  Season  Sales
S1     P1       Spring  6
S1     P2       Spring  12
S2     P1       Fall    9
Aggregation of the base table yields the data cube (dimensions: Store, Product, Season; measure: AVG(Sales))
Store  Product  Season  AVG(Sales)
S1     P1       Spring  6
S1     P2       Spring  12
S2     P1       Fall    9
S1     *        Spring  9
…      …        …       …
*      *        *       9
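The aggregation from base table to data cube can be sketched in Python. This is only a brute-force illustration on the three example tuples, not the authors' implementation; the names `BASE`, `covers`, and `avg_cube` are hypothetical.

```python
from itertools import product
from statistics import mean

# Example base table from the slide: (Store, Product, Season, Sales).
BASE = [
    ("S1", "P1", "Spring", 6),
    ("S1", "P2", "Spring", 12),
    ("S2", "P1", "Fall", 9),
]

def covers(cell, row):
    """True if the base tuple `row` can be rolled up to `cell`."""
    return all(c == "*" or c == v for c, v in zip(cell, row))

def avg_cube(base):
    """Brute-force AVG cube: every generalization of every base tuple."""
    n_dims = len(base[0]) - 1
    cells = set()
    for row in base:
        # Each mask chooses which dimensions are replaced by "*".
        for mask in product([False, True], repeat=n_dims):
            cells.add(tuple("*" if m else v for m, v in zip(mask, row)))
    return {c: mean(r[-1] for r in base if covers(c, r)) for c in cells}

cube = avg_cube(BASE)
print(cube[("S1", "*", "Spring")])  # AVG of 6 and 12
print(cube[("*", "P1", "*")])       # AVG of 6 and 9
```

Enumerating all generalizations is exponential in the number of dimensions, which is why the cube computation work cited on the next slide matters for real data.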
Previous Work: Efficient Cube Computation
• Compute a cube from a base table: e.g. (Agarwal et al. 96), (Zhao et al. 97)
• View materialization with space constraint: e.g. (Harinarayan et al. 96)
• Handling sparsity: (Ross & Srivastava 97)
• Cube compression: e.g. (Sismanis et al. 02), (Shanmugasundaram et al. 99), (Wang et al. 02)
• Approximation: e.g. (Barbara & Sullivan 97), (Barbara & Wu 00), (Vitter et al. 98)
• Constrained cube construction: e.g. (Beyer & Ramakrishnan 99)
Previous Work: Extracting Semantics From Cubes
• General contexts of patterns (Sathe & Sarawagi 01)
• Generalize association rules (Imielinski et al. 00)
• Cube gradient analysis (Dong et al. 01)
Cube (Cell) Lattice
• Many cells have same aggregate values
• Can we summarize the semantics of the cube by grouping cells by aggregate values?
(S1,P1,s):6 (S1,P2,s):12 (S2,P1,f):9
(S1,*,s):9 (S1,P1,*):6 (*,P1,s):6 (S1,P2,*):12 (*,P2,s):12 (S2,*,f):9 (S2,P1,*):9 (*,P1,f):9
(S1,*,*):9 (*,*,s):9 (*,P1,*):7.5 (*,P2,*):12 (*,*,f):9 (S2,*,*):9
(*,*,*):9
A Naïve Attempt
• Put all cells having same aggregate value in a class
(S1,P1,s):6 (S1,P2,s):12 (S2,P1,f):9
(S1,*,s):9 (S1,P1,*):6 (*,P1,s):6 (S1,P2,*):12 (*,P2,s):12 (S2,*,f):9 (S2,P1,*):9 (*,P1,f):9
(S1,*,*):9 (*,*,s):9 (*,P1,*):7.5 (*,P2,*):12 (*,*,f):9 (S2,*,*):9
(*,*,*):9
Classes in the figure: C1, C2, C3, C4
Problems w/ the Naïve Attempt
• The result is not a lattice anymore!
  – Anomaly
  – The rollup/drilldown semantics is lost
$C_3 \xrightarrow{\text{rollup}} C_4 \xrightarrow{\text{rollup}} C_3$
(S1,P1,s):6 (S1,P2,s):12 (S2,P1,f):9
(S1,*,s):9 (S1,P1,*):6 (*,P1,s):6 (S1,P2,*):12 (*,P2,s):12 (S2,*,f):9 (S2,P1,*):9 (*,P1,f):9
(S1,*,*):9 (*,*,s):9 (*,P1,*):7.5 (*,P2,*):12 (*,*,f):9 (S2,*,*):9
(*,*,*):9
Classes in the figure: C1, C2, C3, C4
A Better Partitioning
• Quotient cube: a partitioning that preserves the rollup/drilldown semantics
(S1,P1,s):6 (S1,P2,s):12 (S2,P1,f):9
(S1,*,s):9 (S1,P1,*):6 (*,P1,s):6 (S1,P2,*):12 (*,P2,s):12 (S2,*,f):9 (S2,P1,*):9 (*,P1,f):9
(S1,*,*):9 (*,*,s):9 (*,P1,*):7.5 (*,P2,*):12 (*,*,f):9 (S2,*,*):9
(*,*,*):9
Classes in the figure: C1, C2, C3, C4, C5
Problem Statement
• Given a cube, characterize a good way (quotient cube) of partitioning its cells into classes such that
  – The partition generates a reduced lattice preserving the rollup/drilldown semantics
  – The partition is optimal: # classes as small as possible
• Compute quotient cubes efficiently
Why Is a Quotient Cube Useful?
• Semantic compression
• Semantic OLAP browsing
(S1,P1,s):6 (S1,P2,s):12 (S2,P1,f):9
(S1,*,s):9 (S1,P1,*):6 (*,P1,s):6 (S1,P2,*):12 (*,P2,s):12 (S2,*,f):9 (S2,P1,*):9 (*,P1,f):9
(S1,*,*):9 (*,*,s):9 (*,P1,*):7.5 (*,P2,*):12 (*,*,f):9 (S2,*,*):9
(*,*,*):9
Classes in the figure: C1, C2, C3, C4, C5
Browsing into class C3 shows its cells:
(S2,P1,f):9
(S2,*,f):9 (S2,P1,*):9 (*,P1,f):9
(*,*,f):9 (S2,*,*):9
Convex Partitions
• A convex partition retains semantics
$c_1 \xrightarrow{\text{rollup}} c_2 \xrightarrow{\text{rollup}} c_3,\ c_1, c_3 \in CLS \Rightarrow c_2 \in CLS$
(S1,P1,s):6 (S1,P2,s):12 (S2,P1,f):9
(S1,*,s):9 (S1,P1,*):6 (*,P1,s):6 (S1,P2,*):12 (*,P2,s):12 (S2,*,f):9 (S2,P1,*):9 (*,P1,f):9
(S1,*,*):9 (*,*,s):9 (*,P1,*):7.5 (*,P2,*):12 (*,*,f):9 (S2,*,*):9
(*,*,*):9
Classes in the figure: C1, C2, C3, C4, C5
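The convexity condition can be checked mechanically. The sketch below is an illustration under assumptions, not the authors' code: the helper names (`rolls_up_to`, `avg_cube`, `is_convex`) are hypothetical, and the partition tested is the naive one that groups cells purely by AVG value.

```python
from collections import defaultdict
from itertools import product
from statistics import mean

BASE = [
    ("S1", "P1", "s", 6),
    ("S1", "P2", "s", 12),
    ("S2", "P1", "f", 9),
]

def rolls_up_to(c, d):
    """True if cell c can be rolled up to cell d (d equals or generalizes c)."""
    return all(y == "*" or x == y for x, y in zip(c, d))

def avg_cube(base):
    """Brute-force AVG cube over a 3-dimensional base table."""
    cells = {tuple("*" if m else v for m, v in zip(mask, row))
             for row in base for mask in product([False, True], repeat=3)}
    return {c: mean(r[3] for r in base if rolls_up_to(r[:3], c)) for c in cells}

def is_convex(cls, all_cells):
    """c1, c3 in cls and c1 -> c2 -> c3 by rollup must imply c2 in cls."""
    for c1, c3 in product(cls, repeat=2):
        if not rolls_up_to(c1, c3):
            continue
        for c2 in all_cells:
            if c2 not in cls and rolls_up_to(c1, c2) and rolls_up_to(c2, c3):
                return False  # c2 is a "hole" between c1 and c3
    return True

cube = avg_cube(BASE)
naive = defaultdict(set)  # naive partition: one class per AVG value
for cell, value in cube.items():
    naive[value].add(cell)

for value in sorted(naive):
    print(value, is_convex(naive[value], cube))
```

On the example data the class of value 9 fails the check, because (*,P1,*) with value 7.5 lies between (S2,P1,f) and (*,*,*).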
A Non-convex Partition
• Anomaly
• The rollup/drilldown semantics is lost
$C_3 \xrightarrow{\text{rollup}} C_4 \xrightarrow{\text{rollup}} C_3$
(S1,P1,s):6 (S1,P2,s):12 (S2,P1,f):9
(S1,*,s):9 (S1,P1,*):6 (*,P1,s):6 (S1,P2,*):12 (*,P2,s):12 (S2,*,f):9 (S2,P1,*):9 (*,P1,f):9
(S1,*,*):9 (*,*,s):9 (*,P1,*):7.5 (*,P2,*):12 (*,*,f):9 (S2,*,*):9
(*,*,*):9
Classes in the figure: C1, C2, C3, C4
Connected Partitions
• Cells c1 and c2 are connected if a series of rollup/drilldown operations starting from c1 can reach c2
• Intuitively, (each class of) a partition should be connected
Cover Partition
• For a cell c, a tuple t in base table is in c’s cover if t can be rolled up to c
  – E.g., Cov(S1,*,spring) = {(S1,P1,spring), (S1,P2,spring)}
Base table (dimensions: Store, Product, Season; measure: Sales)
Store  Product  Season  Sales
S1     P1       Spring  6
S1     P2       Spring  12
S2     P1       Fall    9
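The definition of a cover translates directly into a filter over the base table. A minimal sketch, with `BASE` and `cover` as hypothetical names:

```python
# Example base table: (Store, Product, Season, Sales).
BASE = [
    ("S1", "P1", "Spring", 6),
    ("S1", "P2", "Spring", 12),
    ("S2", "P1", "Fall", 9),
]

def cover(cell, base):
    """Dimension values of the base tuples that can be rolled up to `cell`."""
    return {row[:3] for row in base
            if all(c == "*" or c == v for c, v in zip(cell, row))}

print(cover(("S1", "*", "Spring"), BASE))
```

For the cell (S1,*,Spring) this returns the two Spring tuples of store S1, matching the example on the slide.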
Cover Partitions Are Convex
• All cells having the same cover are in a class
• (S1,P2,s) and (*,P2,*) cover the same tuples in the base table $\Rightarrow$ (S1,P2,*) and (*,P2,s), which lie between them, are in the same class
(S1,P1,s):6 (S1,P2,s):12 (S2,P1,f):9
(S1,*,s):9 (S1,P1,*):6 (*,P1,s):6 (S1,P2,*):12 (*,P2,s):12 (S2,*,f):9 (S2,P1,*):9 (*,P1,f):9
(S1,*,*):9 (*,*,s):9 (*,P1,*):7.5 (*,P2,*):12 (*,*,f):9 (S2,*,*):9
(*,*,*):9
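Grouping cells by cover can be sketched as follows. This is an illustrative brute-force version, assuming the three-tuple example table; `cover` and `cover_partition` are hypothetical names, and covers are represented by row indices into `BASE`.

```python
from collections import defaultdict
from itertools import product

BASE = [
    ("S1", "P1", "s", 6),
    ("S1", "P2", "s", 12),
    ("S2", "P1", "f", 9),
]

def cover(cell, base):
    """Indices of the base tuples that can be rolled up to `cell`."""
    return frozenset(i for i, row in enumerate(base)
                     if all(c == "*" or c == v for c, v in zip(cell, row)))

def cover_partition(base):
    """Map each distinct cover to the set of cells that share it."""
    cells = {tuple("*" if m else v for m, v in zip(mask, row))
             for row in base for mask in product([False, True], repeat=3)}
    classes = defaultdict(set)
    for cell in cells:
        classes[cover(cell, base)].add(cell)
    return dict(classes)

classes = cover_partition(BASE)
for cov, cells in sorted(classes.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(cov), sorted(cells))
```

On this table the 18 cells fall into 6 cover classes; the class whose cover is the single tuple (S1,P2,s) contains (S1,P2,s), (S1,P2,*), (*,P2,s), and (*,P2,*).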
Cover Partitions Are Connected
• Cells c1 and c2 have the same cover $\Rightarrow$ there must be some common ancestor c3 of c1 and c2 such that c3 has the same cover
  – Cells c1 and c2 are in the same class and connected
(S1,P1,s):6 (S1,P2,s):12 (S2,P1,f):9
(S1,*,s):9 (S1,P1,*):6 (*,P1,s):6 (S1,P2,*):12 (*,P2,s):12 (S2,*,f):9 (S2,P1,*):9 (*,P1,f):9
(S1,*,*):9 (*,*,s):9 (*,P1,*):7.5 (*,P2,*):12 (*,*,f):9 (S2,*,*):9
(*,*,*):9
Cover Partitions & Aggregates
• All cells in a class of a cover partition carry the same aggregate value w.r.t. any aggregate function
  – But cells in a class of MIN() may have different covers
• For COUNT() and SUM() (over positive values), cover equivalence coincides with aggregate equivalence
Weak Congruence
• Weak congruence preserves semantics
[Figure: cells c and c' belong to Class 1, cells d and d' belong to Class 2. If c rolls up to d and d' rolls up to c', this implies Class 1 = Class 2]
Weak Congruence = Convex
• Convex $\Leftrightarrow$ no “hole” in the class $\Leftrightarrow$ weak congruence
• They preserve the rollup/drilldown semantics
• Quotient cube lattice is the lattice of convex classes
• How to derive the coarsest quotient cube?
Monotone Aggregate Functions
• Monotone functions
  – $S \subseteq T \Rightarrow f(S) \le f(T)$, or
  – $S \subseteq T \Rightarrow f(S) \ge f(T)$
  – MIN(), MAX(), COUNT(), PSUM(), …
• The aggregate function f is monotone $\Rightarrow$ $\equiv_f$ is the unique coarsest partition
  – MIN(): put all cells having the same MIN() value into a class
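The MIN() case can be tried on the running example. The sketch below is an assumption-laden illustration (hypothetical helper names, brute-force enumeration): it groups cells by MIN(Sales) and checks each group for convexity. In this small example every same-value group also happens to be connected, which a general implementation would need to verify separately.

```python
from collections import defaultdict
from itertools import product

BASE = [
    ("S1", "P1", "s", 6),
    ("S1", "P2", "s", 12),
    ("S2", "P1", "f", 9),
]

def rolls_up_to(c, d):
    """True if cell c can be rolled up to cell d."""
    return all(y == "*" or x == y for x, y in zip(c, d))

CELLS = {tuple("*" if m else v for m, v in zip(mask, row))
         for row in BASE for mask in product([False, True], repeat=3)}

def min_sales(cell):
    """MIN(Sales) over the cover of `cell`."""
    return min(r[3] for r in BASE if rolls_up_to(r[:3], cell))

def is_convex(cls):
    """No cell outside cls lies between two cells of cls."""
    return not any(
        rolls_up_to(c1, c2) and rolls_up_to(c2, c3)
        for c1, c3 in product(cls, repeat=2) if rolls_up_to(c1, c3)
        for c2 in CELLS if c2 not in cls
    )

min_classes = defaultdict(set)
for cell in CELLS:
    min_classes[min_sales(cell)].add(cell)

for value in sorted(min_classes):
    print(value, len(min_classes[value]), is_convex(min_classes[value]))
```

Here MIN() yields 3 classes for 18 cells, and all of them pass the convexity check, in contrast to grouping by AVG().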
Non-monotone Functions
• Bad news: $\equiv_f$ may or may not be convex (a weak congruence).
• Good news: the cover partition is convex (i.e., a weak congruence) and always yields a quotient cube w.r.t. any aggregate function!
How to Compute A QC
• Aggregate functions
  – Monotone functions
  – Non-monotone functions
• Settings
  – The cube is available
  – Only the base table is available
Monotone Functions
• The cube is available $\Rightarrow$ grab all cells with the same aggregate value and put them into a class
• Only the base table is available $\Rightarrow$ bottom-up, depth-first search
  – For a cell, compute its cover, find the upper bound having the same aggregate value
  – Group lower bounds by upper bounds
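One way to read "find the upper bound" is sketched below for the cover-based case. It assumes that the upper bound of a cell is the most specific cell with the same cover, obtained by keeping every dimension value that all covered tuples share; this reading, the name `upper_bound`, and the omission of the depth-first search itself are assumptions made for illustration.

```python
from collections import defaultdict

BASE = [
    ("S1", "P1", "Spring", 6),
    ("S1", "P2", "Spring", 12),
    ("S2", "P1", "Fall", 9),
]

def upper_bound(cell, base):
    """Most specific cell sharing the cover of `cell` (assumed reading)."""
    cov = [row[:3] for row in base
           if all(c == "*" or c == v for c, v in zip(cell, row))]
    if not cov:
        raise ValueError("cell has an empty cover")
    # Keep a dimension value only if every covered tuple agrees on it.
    return tuple(vals[0] if len(set(vals)) == 1 else "*" for vals in zip(*cov))

# Group a few cells by their upper bound.
groups = defaultdict(list)
for cell in [("*", "P2", "*"), ("S1", "P2", "*"), ("S1", "*", "*"), ("*", "*", "Spring")]:
    groups[upper_bound(cell, BASE)].append(cell)

for ub, members in groups.items():
    print(ub, members)
```

Cells that map to the same upper bound share a cover, so grouping by upper bound reproduces the cover classes without materializing the whole cube first.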
Example: Cover QC
(S1,P1,s):6 (S1,P2,s):12 (S2,P1,f):9
(S1,*,s):9 (S1,P1,*):6 (*,P1,s):6 (S1,P2,*):12 (*,P2,s):12 (S2,*,f):9 (S2,P1,*):9 (*,P1,f):9
(S1,*,*):9 (*,*,s):9 (*,P1,*):7.5 (*,P2,*):12 (*,*,f):9 (S2,*,*):9
(*,*,*):9
Non-monotone Functions
• Class merging
• Find cover partition classes
• Merge classes as long as convexity is retained
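The class-merging idea can be sketched as a greedy loop for AVG() on the running example. This is not the authors' algorithm: the helper names are hypothetical, the search is brute force, and the `linked` test (some cell of one class rolls up to a cell of the other) is an added assumption so that merged classes stay connected as well as convex.

```python
from itertools import product
from statistics import mean

BASE = [("S1", "P1", "s", 6), ("S1", "P2", "s", 12), ("S2", "P1", "f", 9)]

def rolls_up_to(c, d):
    """True if cell c can be rolled up to cell d."""
    return all(y == "*" or x == y for x, y in zip(c, d))

def cover(cell):
    return frozenset(i for i, r in enumerate(BASE) if rolls_up_to(r[:3], cell))

CELLS = sorted({tuple("*" if m else v for m, v in zip(mask, r))
                for r in BASE for mask in product([False, True], repeat=3)})

def avg(cell):
    return mean(BASE[i][3] for i in cover(cell))

def is_convex(cls):
    """No cell outside cls lies between two cells of cls."""
    return not any(
        rolls_up_to(c1, c2) and rolls_up_to(c2, c3)
        for c1, c3 in product(cls, repeat=2) if rolls_up_to(c1, c3)
        for c2 in CELLS if c2 not in cls
    )

def linked(a, b):
    """Assumed connectivity test: a rollup relation exists across the classes."""
    return any(rolls_up_to(x, y) or rolls_up_to(y, x) for x in a for y in b)

def merge_classes():
    # Step 1: start from the cover partition classes.
    by_cover = {}
    for cell in CELLS:
        by_cover.setdefault(cover(cell), set()).add(cell)
    classes = [frozenset(s) for s in by_cover.values()]
    # Step 2: merge equal-AVG classes while the union stays convex.
    merged = True
    while merged:
        merged = False
        for a, b in product(classes, repeat=2):
            same_avg = avg(next(iter(a))) == avg(next(iter(b)))
            if a != b and same_avg and linked(a, b) and is_convex(a | b):
                classes = [c for c in classes if c not in (a, b)] + [a | b]
                merged = True
                break
    return classes

result = merge_classes()
for cls in result:
    print(sorted(cls))
```

On the example, the cover class of (*,*,*) merges with the class of (S1,*,s), (S1,*,*), and (*,*,s), giving 5 classes; the class containing (S2,P1,f) cannot join them because (*,P1,*):7.5 would become a hole.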
Example: AVG QC
(S1,P1,s):6 (S1,P2,s):12 (S2,P1,f):9
(S1,*,s):9 (S1,P1,*):6 (*,P1,s):6 (S1,P2,*):12 (*,P2,s):12 (S2,*,f):9 (S2,P1,*):9 (*,P1,f):9
(S1,*,*):9 (*,*,s):9 (*,P1,*):7.5 (*,P2,*):12 (*,*,f):9 (S2,*,*):9
(*,*,*):9
Reduction Ratio vs. Dimensionality
[Figure: reduction ratio (%) vs. dimensionality (2 to 10) for MinCube, QC_Cov, and QC_MIN; # base tuples = 200k, Zipf factor = 2.0]
Reduction Ratio vs. Zipf Factor
[Figure: reduction ratio (%) vs. Zipf factor (0 to 3) for MinCube, QC_Cov, and QC_MIN; # base tuples = 200k, # dimensions = 6]
Reduction Ratio vs. Base Table Size
[Figure: reduction ratio (%) vs. number of tuples (k) for MinCube, QC_Cov, and QC_MIN; Zipf factor = 2.0, # dimensions = 6]
Runtime
[Figure: runtime (seconds) vs. number of tuples (k) for MinCube, QC_Cov, QC_MIN, and BUC; Zipf factor = 2.0, # dimensions = 6]
Compression Ratio on Weather Data Set
[Figure: reduction ratio (%) vs. number of dimensions (2 to 7) for QC_Cov and QC_AVG]
[Figure: reduction ratio (%) vs. number of dimensions (2 to 9) for MinCube and QC_Cov]
Semantic Cube Exploration
• Theoretical foundation for semantic summarization in data cube
  – Concept and properties of quotient cubes
• Efficient algorithms for quotient cube construction
  – Quotient cubes can be computed directly from base tables
Ongoing Research
• Efficient implementation of quotient cube-based OLAP system
  – Data warehouse built using quotient cubes
• Hierarchies and constraints
• Incremental maintenance
• Semantics based OLAP and mining
• Efficient query answering
References (1)
• R. Agrawal and R. Srikant. Fast Algorithms for Mining Association Rules in Large Databases. VLDB 1994
• S. Agarwal, R. Agrawal, P.M. Deshpande, A. Gupta, J.F. Naughton, R. Ramakrishnan, and S. Sarawagi. On the computation of multidimensional aggregates. VLDB, 1996.
• D. Barbara and M. Sullivan. Quasi-cubes: Exploiting approximation in multidimensional databases. SIGMOD Record, 26:12--17, 1997.
• D. Barbara and X. Wu. Using loglinear models to compress datacube. In WAIM'2000, pages 311--322, 2000.
• K. Beyer and R. Ramakrishnan. Bottom-up computation of sparse and iceberg cubes. In SIGMOD'99.
References (2)
• G. Birkhoff. Lattice Theory, 2nd edition. American Mathematical Society (Colloquium Publications, vol. 25), New York, 1948.
• S. Geffner, D. Agrawal, A. El Abbadi, and T. R. Smith. Relative prefix sums: An efficient approach for querying dynamic OLAP data cubes. In ICDE'99.
• Jim Gray, Adam Bosworth, Andrew Layman, Hamid Pirahesh. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total. ICDE'96.
• C.-T. Ho, J. Bruck, and R. Agrawal. Partial-sum queries in data cubes using covering codes. In PODS'97.
• J. Han, J. Pei, G. Dong, and K. Wang. Efficient Computation of Iceberg Cubes with Complex Measures. In SIGMOD'01.
References (3)
• V. Harinarayan, A. Rajaraman, and J. D. Ullman. Implementing data cubes efficiently. In SIGMOD'96.
• T. Imielinski, L. Khachiyan, and A. Abdulghani. Cubegrades: Generalizing Association Rules. Technical Report, Rutgers University, August 2000.
• H. V. Jagadish, J. Madar, R.T. Ng. Semantic Compression and Pattern Extraction with Fascicles. VLDB'99.
• K. Ross and D. Srivastava. Fast computation of sparse datacubes. In VLDB'97.
• G. Sathe and S. Sarawagi. Intelligent Rollups in Multidimensional OLAP Data. VLDB'01.
References (4)
• J. Shanmugasundaram, U.M. Fayyad, and P. S. Bradley. Compressed Data Cubes for OLAP Aggregate Query Approximation on Continuous Dimensions. SIGKDD’99.
• J. S. Vitter, M. Wang, and B. R. Iyer. Data cube approximation and histograms via wavelets. In CIKM'98.
• W. Wang, H. Lu, J. Feng, and J. X. Yu. Condensed cube: An effective approach to reducing data cube size. In ICDE'02.
• Y. Zhao, P. M. Deshpande, and J. F. Naughton. An array-based algorithm for simultaneous multidimensional aggregates. In SIGMOD'97.
• G.K. Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley, 1949.