Chapter 4: Dimensions, Hierarchies, Operations, Modeling
Prof. Bayer, DWH, Ch.4, SS 2002 1
Chapter 4.1 Hierarchical Dimensions
Def: Hierarchical Dimensions are composite keys with an order on the key attributes. Prefixes are allowed as keys.
Ex: dimension Time = (Year, Month, Day); legal keys are:
(Year) or (Year, Month) or (Year, Month, Day)
Def: Basic facts are values in cells with full foreign keys.
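As a small illustration of the definition, the following Python sketch represents a hierarchical dimension as a tuple of level names and lists its legal keys, i.e. its non-empty prefixes. The function name `legal_keys` is hypothetical.

```python
def legal_keys(levels):
    """Return all non-empty prefixes of a hierarchical dimension's levels.

    Each prefix is a legal key; the full tuple addresses the basic facts.
    """
    return [tuple(levels[:n]) for n in range(1, len(levels) + 1)]


time_dim = ("Year", "Month", "Day")
for key in legal_keys(time_dim):
    print(key)
```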
Aggregations, Summaries
Def: Aggregations are facts in cells with partial keys. These facts are derived by aggregation functions. In a cube with derived facts the aggregation function must be specified.
Ex: Sales on a monthly basis
$$\mathrm{Sales}(Year, Month) = \sum_{Day} \mathrm{Sales}(Year, Month, Day)$$
Aggregation Functions: count, sum, avg, min, max, ...
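A minimal Python sketch of this aggregation, assuming base facts held in a plain dict keyed by (Year, Month, Day); the sample values and the helper `aggregate_monthly` are hypothetical. The aggregation function is passed explicitly, mirroring the requirement that a cube with derived facts specify it.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical base facts: (Year, Month, Day) -> sales
base = {
    (2002, 3, 1): 120.0,
    (2002, 3, 2): 80.0,
    (2002, 4, 1): 50.0,
}

AGG_FUNCS = {"count": len, "sum": sum, "avg": mean, "min": min, "max": max}


def aggregate_monthly(facts, func):
    """Group daily facts by the key prefix (Year, Month) and apply func to each group."""
    groups = defaultdict(list)
    for (year, month, _day), value in facts.items():
        groups[(year, month)].append(value)
    return {key: func(values) for key, values in groups.items()}


monthly_sum = aggregate_monthly(base, AGG_FUNCS["sum"])
```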
Note on Aggregations
• Aggregations may be stored explicitly in the cube, but then they should be secured by integrity constraints that keep them consistent with the base facts
• Aggregations may be virtual, i.e. computed on demand when needed
• This is the classical tradeoff between storage space, performance, and flexibility
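The tradeoff can be sketched in Python as follows; the dict layout, the sample values, and the helper names are hypothetical. `monthly_sums` is the virtual variant (recompute on demand), `stored` is a materialized copy, and `check_stored` plays the role of the integrity constraint that detects a stale stored aggregate after the base cube changes.

```python
from collections import defaultdict

# Hypothetical base facts: (Year, Month, Day) -> sales
base = {(2002, 3, 1): 120.0, (2002, 3, 2): 80.0, (2002, 4, 1): 50.0}


def monthly_sums(facts):
    """Virtual aggregation: recompute monthly sums from the base facts on demand."""
    sums = defaultdict(float)
    for (year, month, _day), value in facts.items():
        sums[(year, month)] += value
    return dict(sums)


# Stored (materialized) aggregation: computed once and kept alongside the base cube.
stored = monthly_sums(base)


def check_stored(facts, stored_aggs):
    """Integrity constraint: stored aggregates must equal a fresh recomputation."""
    return stored_aggs == monthly_sums(facts)


# After an update of the base cube, the stored copy no longer satisfies the constraint.
base[(2002, 4, 2)] = 30.0
```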
Relational Modeling
Expand each partial key to a complete key with the special symbol ALL, e.g.
(Year, Month, ALL)
(ALL, Month, ALL)
(ALL, ALL, ALL)
This yields simple and complete relational keys.
Question: which SQL computes the complete cube with all aggregations from the base cube?
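One possible answer, sketched in Python with the standard `sqlite3` module: generate one GROUP BY query per subset of the key attributes, replace every attribute that is not grouped on by the literal 'ALL', and combine the queries with UNION ALL. The table and column names (`sales`, `amount`) and the helper `complete_cube_sql` are hypothetical. SQL:1999 later standardized GROUP BY CUBE for this purpose, but it marks aggregated columns with NULL rather than ALL, and SQLite does not provide it, hence the explicit union.

```python
import itertools
import sqlite3

DIMS = ("Year", "Month", "Day")


def complete_cube_sql(table="sales", dims=DIMS, measure="amount"):
    """One SELECT per subset of dims; dimensions not grouped on become 'ALL'."""
    parts = []
    for mask in itertools.product((True, False), repeat=len(dims)):
        cols = [d if keep else f"'ALL' AS {d}" for d, keep in zip(dims, mask)]
        group = [d for d, keep in zip(dims, mask) if keep]
        sql = f"SELECT {', '.join(cols)}, SUM({measure}) AS total FROM {table}"
        if group:
            sql += f" GROUP BY {', '.join(group)}"
        parts.append(sql)
    return "\nUNION ALL\n".join(parts)


# Hypothetical base cube
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (Year INT, Month INT, Day INT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [(2001, 1, 1, 10.0), (2001, 1, 2, 5.0), (2001, 2, 1, 7.0), (2002, 1, 1, 3.0)],
)
rows = conn.execute(complete_cube_sql()).fetchall()
```

With 3 key attributes the query unions 2^3 = 8 groupings: the base cube itself plus the 7 aggregations.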
Hierarchy Example
Chapter 4.2 OLAP Operations
Def: Roll-up computes higher aggregations from lower aggregations or base facts according to the hierarchies.
Ex: for base facts (Year, Month, Day) there are 3 hierarchical roll-up functions:
Roll-up (Year, Month, ALL)
Roll-up (Year, ALL, ALL)
Roll-up (ALL, ALL, ALL)
which are generally supported (canonical roll-ups).
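A sketch of the three canonical roll-ups in Python, assuming cells stored in a dict keyed by (Year, Month, Day) tuples and SUM as the aggregation function; `roll_up` and the sample values are hypothetical. Each higher roll-up is computed from the next lower aggregation rather than from the base facts.

```python
from collections import defaultdict

ALL = "ALL"


def roll_up(cells, keep):
    """Keep the first `keep` key attributes, set the rest to ALL, and sum merged cells."""
    result = defaultdict(float)
    for key, value in cells.items():
        new_key = key[:keep] + (ALL,) * (len(key) - keep)
        result[new_key] += value
    return dict(result)


# Hypothetical base facts: (Year, Month, Day) -> sales
base = {(2002, 3, 1): 120.0, (2002, 3, 2): 80.0, (2002, 4, 1): 50.0, (2003, 1, 5): 10.0}

ym = roll_up(base, 2)     # Roll-up (Year, Month, ALL)
y = roll_up(ym, 1)        # Roll-up (Year, ALL, ALL), computed from the monthly level
total = roll_up(y, 0)     # Roll-up (ALL, ALL, ALL)
```

Computing (Year, ALL, ALL) from the monthly level touches fewer cells than recomputing it from the base facts; for distributive functions such as sum, count, min, and max the result is the same.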
Additional Roll-ups:
(ALL, Month, ALL) etc.
therefore 2^3 - 1 = 7 aggregations, or in general
2^m - 1 aggregations
for m hierarchy levels
Note: see later chapters for the support of arbitrary aggregations
Note: for m dimensions with h_1, h_2, ..., h_m hierarchy levels there are
$$\prod_{i=1}^{m} (h_i + 1) - 1$$
different hierarchical aggregations for a given aggregation function.
Size of base cube
2-dim example
Dim1: (4, 5) = cardinalities of the dimension levels
Dim2: (6, 7, 2)
(4 * 5) * (6 * 7 * 2) = 20 * 84 = 1680 = size of base cube
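Size of hierarchically aggregated cube
Kept levels per dimension ("-" = level aggregated to ALL) and resulting number of cells:
Dim1 (4, 5) | Dim2 (6, 7, 2) | Cells
4 -         | 6 7 2          | 336
4 5         | 6 7 -          | 840
- -         | 6 7 2          | 84
4 -         | 6 7 -          | 168
4 5         | 6 - -          | 120
- -         | 6 7 -          | 42
4 -         | 6 - -          | 24
4 5         | - - -          | 20
- -         | 6 - -          | 6
4 -         | - - -          | 4
- -         | - - -          | 1
Number of cells per aggregation function: 1645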
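The table can be reproduced with a short Python sketch (helper names hypothetical): per dimension, a hierarchical aggregation keeps a prefix of the levels, so a dimension with h_i levels offers h_i + 1 choices, and the number of cells is the product of the kept cardinalities.

```python
from itertools import product
from math import prod

dims = [(4, 5), (6, 7, 2)]  # cardinalities of the hierarchy levels, as in the example


def hierarchical_aggregations(dims):
    """Yield (kept prefix lengths, cell count) for every combination of prefixes."""
    for lengths in product(*(range(len(d) + 1) for d in dims)):
        # prod(()) == 1, so a fully aggregated dimension contributes the factor 1 (ALL)
        cells = prod(prod(d[:n]) for d, n in zip(dims, lengths))
        yield lengths, cells


full = tuple(len(d) for d in dims)  # keeping all levels = the base cube itself
aggs = {lengths: cells for lengths, cells in hierarchical_aggregations(dims) if lengths != full}
print(len(aggs), sum(aggs.values()))
```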
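Size of completely aggregated cube
Every level of every dimension can independently be kept or aggregated to ALL. Taking the levels 4, 5, 6, 7, 2 from right to left, each further level with c elements contributes c times the previous count (level kept) plus the previous count (level aggregated):
levels 7, 2: 1 + 2 + 7 + 14 = 24
24 x 6 = 144, 144 + 24 = 168
5 x 168 = 840, 840 + 168 = 1008
4 x 1008 = 4032, 4032 + 1008 = 5040
i.e. (4 + 1)(5 + 1)(6 + 1)(7 + 1)(2 + 1) = 5040 cells, including the 1680 cells of the base cube.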
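Computation with binary Tree
[Figure: binary tree over the levels 4, 5, 6, 7, 2; at each node a level is either kept (factor c) or aggregated to ALL (factor 1). Each leaf is the size of one aggregation, e.g. 1680 for the base cube or 840 = 4 * 5 * 6 * 7; the leaves add up to the size of the completely aggregated cube.]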
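The binary tree can be sketched as a recursion in Python (function name hypothetical): at every level the tree branches into "keep" (multiply by c) and "aggregate to ALL" (multiply by 1), and each of the 2^5 = 32 leaves is the size of one aggregation.

```python
from math import prod

levels = [4, 5, 6, 7, 2]  # cardinalities of all hierarchy levels in the example


def leaves(levels, acc=1):
    """Walk the binary tree; return the cell counts found at the leaves."""
    if not levels:
        return [acc]
    c, rest = levels[0], levels[1:]
    # left branch: keep this level (factor c); right branch: aggregate to ALL (factor 1)
    return leaves(rest, acc * c) + leaves(rest, acc)


sizes = leaves(levels)
print(sum(sizes), prod(c + 1 for c in levels))
```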
Size of the Cube
Lemma: Given a data cube with m dimensions with h_1, ..., h_m hierarchy levels resp. Let the hierarchy levels of dimension i have c_{i1}, c_{i2}, ..., c_{ih_i} elements resp.
Then the base cube has
$$\prod_{i=1}^{m} \prod_{j=1}^{h_i} c_{ij}$$
cells, and the cube with all aggregations has
$$\prod_{i=1}^{m} \prod_{j=1}^{h_i} (c_{ij} + 1)$$
cells.
Size of the Cube (2)
The aggregated cube is larger than the base cube by the factor
$$\prod_{i=1}^{m} \prod_{j=1}^{h_i} \frac{c_{ij} + 1}{c_{ij}}$$
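A minimal Python sketch of the base-cube and all-aggregation formulas, assuming each dimension is given as a tuple of level cardinalities c_{i1}, ..., c_{ih_i} (helper names hypothetical). For the 2-dim example it gives 1680 and 5040, i.e. a factor of 3, which `Fraction` confirms term by term.

```python
from fractions import Fraction
from math import prod


def base_size(dims):
    """Product over all dimensions and levels of c_ij."""
    return prod(prod(levels) for levels in dims)


def full_size(dims):
    """Product over all dimensions and levels of (c_ij + 1)."""
    return prod(prod(c + 1 for c in levels) for levels in dims)


dims = [(4, 5), (6, 7, 2)]
factor = prod(Fraction(c + 1, c) for levels in dims for c in levels)
print(base_size(dims), full_size(dims), factor)
```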
Size of the hierarchically aggregated Cube
For a hierarchy i with h_i levels and c_{i1}, c_{i2}, ..., c_{ih_i} elements per level, there are
$$1 + c_{i1} + c_{i1} c_{i2} + \dots + c_{i1} c_{i2} \cdots c_{ih_i} = 1 + \sum_{j=1}^{h_i} \prod_{k=1}^{j} c_{ik}$$
hierarchical aggregation possibilities.
Lemma: A hierarchically completely aggregated data cube has
$$\prod_{i=1}^{m} \Bigl(1 + \sum_{j=1}^{h_i} \prod_{k=1}^{j} c_{ik}\Bigr)$$
cells.
Ex: (4 5) (6 7 2)
size of the hierarchically aggregated cube plus base cube
= (1 + 4 + 20) * (1 + 6 + 42 + 84)
= 25 * 133 = 3325
Ex: (4 5) (6 7 2) (8 3)
size of base cube: 40,320
hierarchically aggregated cube plus base:
= (1 + 4 + 20) * (1 + 6 + 42 + 84) * (1 + 8 + 24)
= 3325 * 33 = 109,725
Ex: (4 5) (6 7 2) (8 3) (5 9)
size of base cube: 1,814,400
hierarchically aggregated cube plus base:
= 109,725 * (1 + 5 + 45) = 5,595,975
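The three examples can be checked with a short Python sketch of the hierarchical formula (the helper name is hypothetical); `accumulate` with `operator.mul` produces the prefix products c_{i1}, c_{i1} c_{i2}, ... of each dimension.

```python
from itertools import accumulate
from math import prod
from operator import mul


def hierarchical_size(dims):
    """Cells of the hierarchically aggregated cube plus base cube."""
    return prod(1 + sum(accumulate(levels, mul)) for levels in dims)


examples = [
    [(4, 5), (6, 7, 2)],
    [(4, 5), (6, 7, 2), (8, 3)],
    [(4, 5), (6, 7, 2), (8, 3), (5, 9)],
]
for dims in examples:
    print(hierarchical_size(dims))
```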
Additional comments on aggregations
1. In addition to the size of the complete cube, there is a factor for the various aggregation functions, e.g. a factor of 5 for
sum, avg, min, max, count, ...
2. So far we have not considered general restrictions, e.g. "all Saturdays in March" or "vacation months July
and August", which cut across the boundaries of hierarchy levels.
Interactive query formulation can therefore produce an unlimited number of aggregations.
Optimization: restrictions corresponding to hierarchy levels should be pushed down, since they lead to query boxes.
Note: See later chapters for multidimensional indexes, MHC techniques, and optimization of the ROLAP algebra, which support canonical hierarchical aggregations like
Roll-up (Year, Month, ALL)
Roll-up (Year, ALL, ALL)
Roll-up (ALL, ALL, ALL)
but not
Roll-up (ALL, Month, ALL)
Optimization Problem
Non-hierarchical aggregation, e.g. March for all years:
decompose it into a union of several restrictions, e.g.
Sales (Year, Month, Day)
where Month = March and
(Year = 1996 or Year = 1997 or Year = 1998)
See later chapters for the translation into ROLAP expressions and for the transformations used to optimize them.
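A simplified sketch in Python, assuming the base facts are kept in a list sorted lexicographically on (Year, Month, Day) and using the standard `bisect` module; this is only a one-dimensional stand-in for the multidimensional indexes of later chapters, and the data and helper names are hypothetical. A restriction on a hierarchy prefix such as (1997, 3) is a single contiguous key range (a query box), and "March for all years" is answered as the union of one such box per year.

```python
from bisect import bisect_left

# Hypothetical sorted base facts: ((Year, Month, Day), sales)
facts = sorted(((y, m, d), 1.0) for y in (1996, 1997, 1998) for m in (2, 3, 4) for d in (1, 15))
keys = [k for k, _ in facts]


def box(keys, prefix):
    """Index range [lo, hi) of all keys that start with `prefix` (integer components)."""
    lo = bisect_left(keys, prefix)
    hi = bisect_left(keys, prefix[:-1] + (prefix[-1] + 1,))
    return lo, hi


def sum_box(prefix):
    lo, hi = box(keys, prefix)
    return sum(v for _, v in facts[lo:hi])


# Non-hierarchical restriction decomposed into a union of per-year query boxes
march_all_years = sum(sum_box((year, 3)) for year in (1996, 1997, 1998))
```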
Multiple Hierarchies
e.g. the time hierarchy, in which weeks do not nest into months
Aggregation for a month, e.g. by a query box (QB) covering the weeks that overlap the month, followed by post-filtering
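A minimal Python sketch of the week-based approach, assuming ISO calendar weeks as the week hierarchy (via the standard `date.isocalendar()`) and a hypothetical dict of daily sales; the helper names are hypothetical as well. The covering query box selects every day of every week that overlaps the month, and post-filtering then drops the days outside the month.

```python
from datetime import date, timedelta


def days(start, end):
    """All dates in the half-open interval [start, end)."""
    return [start + timedelta(days=i) for i in range((end - start).days)]


def month_total_via_weeks(daily_sales, year, month):
    first = date(year, month, 1)
    next_first = date(year + (month == 12), month % 12 + 1, 1)
    # Covering query box: every (ISO year, ISO week) that overlaps the month
    weeks = {d.isocalendar()[:2] for d in days(first, next_first)}
    scanned = [d for d in daily_sales if d.isocalendar()[:2] in weeks]
    # Post-filtering: drop days of the covering weeks that lie outside the month
    total = sum(daily_sales[d] for d in scanned if (d.year, d.month) == (year, month))
    return total, len(scanned)


# Hypothetical daily sales: one unit per day from February to April 2002
sales = {d: 1.0 for d in days(date(2002, 2, 1), date(2002, 5, 1))}
total, scanned = month_total_via_weeks(sales, 2002, 3)
```

For March 2002 the covering weeks span 35 days, of which 4 are removed by the post-filter.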
Navigation Operations
Drill Down: first show a single result for an aggregated value, e.g. sales per day, then show
hourly values for days with very high or very low sales,
in order to plan the working hours of salespeople better.
Other Examples:
daily sales during the Christmas season
vacation bookings for skiing during Fasching (carnival)
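A minimal drill-down sketch in Python, assuming hourly sales in a dict keyed by (day, hour); the data, the thresholds, and the helper name `drill_down` are hypothetical. It first computes the daily aggregates and then returns the hourly detail only for days outside the given range.

```python
from collections import defaultdict


def drill_down(hourly, low, high):
    """Return daily totals, plus hourly detail for days with total < low or > high."""
    daily = defaultdict(float)
    for (day, _hour), amount in hourly.items():
        daily[day] += amount
    outliers = {day for day, total in daily.items() if total < low or total > high}
    detail = {key: amount for key, amount in hourly.items() if key[0] in outliers}
    return dict(daily), detail


# Hypothetical hourly sales: (day, hour) -> amount
hourly = {
    ("Mon", 9): 10.0, ("Mon", 17): 15.0,
    ("Tue", 9): 2.0, ("Tue", 17): 1.0,
    ("Sat", 9): 40.0, ("Sat", 17): 60.0,
}
daily, detail = drill_down(hourly, low=5.0, high=50.0)
```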
Roll-up: Compute Aggregations
Slicing
Selection of a smaller data cube, or even reduction of a multidimensional data cube to fewer dimensions, by a point restriction in some dimension (the restricted element becomes the pivot element)
[Figure: sales cube with the dimensions Region, Product, and Time (May, June, July); the slice for Time = May is:]
Region        Racer Future   Tria-Racer   Racer-Junior
Haidhausen    47             11           8
Schwabing     53             9            14
Zentrum       77             26           15
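A minimal slicing sketch in Python, assuming the cube is a dict keyed by (Region, Product, Time); only the May values from the slide are filled in, since the other months are not given, and the helper name `slice_cube` is hypothetical. The point restriction removes the Time dimension from the key, leaving a 2-dimensional result.

```python
DIMS = ("Region", "Product", "Time")
products = ("Racer Future", "Tria-Racer", "Racer-Junior")
may_rows = {"Haidhausen": (47, 11, 8), "Schwabing": (53, 9, 14), "Zentrum": (77, 26, 15)}

# Cube cells: (Region, Product, Time) -> sales (May only)
cube = {
    (region, product, "May"): value
    for region, values in may_rows.items()
    for product, value in zip(products, values)
}


def slice_cube(cube, dims, dim, value):
    """Point restriction dim = value; the restricted dimension is dropped from the key."""
    i = dims.index(dim)
    return {key[:i] + key[i + 1:]: v for key, v in cube.items() if key[i] == value}


may = slice_cube(cube, DIMS, "Time", "May")
```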
Dicing (German: würfeln)
Rotate the result to show another view, e.g. by exchanging rows and columns:
Product        Haidhausen   Schwabing   Zentrum
Racer Future   47           53          77
Tria-Racer     11           9           26
Racer-Junior   8            14          15
Slice management
Precomputing and caching of several slices for later or special use, e.g. for a particular salesperson
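The rotation described under Dicing can be sketched in Python as a transpose of the May result table (the helper name `rotate` is hypothetical); rows and columns are exchanged so that products become rows and regions become columns.

```python
products = ("Racer Future", "Tria-Racer", "Racer-Junior")
regions = ("Haidhausen", "Schwabing", "Zentrum")
# May slice: rows = regions, columns = products
may = [[47, 11, 8], [53, 9, 14], [77, 26, 15]]


def rotate(matrix):
    """Exchange rows and columns of a 2-D result table."""
    return [list(col) for col in zip(*matrix)]


rotated = rotate(may)  # rows = products, columns = regions
print(f"{'':<14}", *regions)
for product, row in zip(products, rotated):
    print(f"{product:<14}", *row)
```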
Chapter 4.3 Modeling Methodology
Purpose: analysis of business processes and of characteristic facts (key figures, German: Kennzahlen) that support managers' decisions (DSS)
Steps of Decision Process:
1. Which business processes are to be modeled and analyzed?
2. What are the measures, and where do they come from?
3. Which level of detail is needed, e.g. minutes as in SAP? Which precision is required for OLAP?
4. Which common properties of the measures determine the dimensions, e.g. brand, time, geographic region, product group? Which dependencies exist between the levels of the hierarchies?
5. Attributes of dimensions, e.g. of products:
• screen size of TVs and computers
• cc and hp (German: PS) for cars
• focal length for cameras
Problem: how common are properties across a dimension? Properties that are not common cannot be modeled as levels of a dimension. At GfK they are called features (up to 50); they are numbered, and their meaning depends on the specific dimension element, e.g.
TV: screen size, color, audio system
Car: transmission, cc, hp, #cyl, ...
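A sketch of such numbered features in Python, assuming a hypothetical catalogue that maps each product category to the meaning of its feature numbers 1, 2, 3, ...; the catalogue layout, the helper `describe`, and the sample values are hypothetical. The same feature number means different things for a TV and for a car.

```python
# Hypothetical feature catalogue: category -> names of features 1, 2, 3, ...
FEATURE_NAMES = {
    "TV": ["screen size", "color", "audio system"],
    "Car": ["transmission", "cc", "hp", "#cyl"],
}


def describe(category, features):
    """Map numbered feature values (1-based) to their category-specific meaning."""
    names = FEATURE_NAMES[category]
    return {names[n - 1]: value for n, value in features.items()}


tv = describe("TV", {1: "32 inch", 3: "stereo"})
car = describe("Car", {2: 1600, 4: 4})
```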
6. Constant or changing attributes of dimensions? E.g.
• new models of car makers
• new power sources: electric, hydrogen, solar
Attributes are rather stable, but changes should still be planned ahead (e.g. mergers like Daimler-Chrysler)!
7. Sparsity: one hypercube or several, i.e. a multicube model? This influences storage requirements, query formulation, and performance, and cannot easily be hidden from the user (perhaps by views?).
8. Caching and management of aggregates?
[Figure: costs over the number of aggregates (0% to 100%): maintenance costs grow and average response time shrinks as more aggregates are stored; total costs are minimal at the optimal number of aggregates.]
Chapter 4.4 Comparison of OLAP Architectures
1. MOLAP: Multidimensional OLAP
2. ROLAP: Relational OLAP
3. HOLAP: Hybrid OLAP
MOLAP Architecture
[Figure: users access multidimensional data marts (MDDBMS), which are fed from a relational data warehouse database.]
MDDBMS in ANSI/X3/SPARC
External level: individual submodels
Conceptual level: dimensions with dimension elements, hierarchies
Internal level: data and storage structures
Logical components of an MDDBMS
ROLAP Architecture
[Figure: users access ROLAP products, which work on relational data marts fed from a relational data warehouse database.]
HOLAP Architecture
[Figure: users access a HOLAP product, which combines a relational and a multidimensional data warehouse database.]
Reasons for MOLAP
• performance
• write access
• Data Marts
• functional power
Reasons for ROLAP
• scalability
• flexible precomputations, partial aggregates
• parallelism
• DB management and ACID