
Prof. Bayer, DWH, Ch.4, SS 2002 1

Chapter 4: Dimensions, Hierarchies, Operations, Modeling


Chapter 4.1 Hierarchical Dimensions

Def: Hierarchical dimensions are composite keys with an order on the key attributes. Prefixes are allowed as keys.

Ex: dimension Time = (Year, Month, Day); legal keys are:

(Year) or (Year, Month) or (Year, Month, Day)

Def: Basic facts are values in cells with full foreign keys
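
The prefix rule for hierarchical keys can be sketched in a few lines of Python. The helper name `legal_keys` is an assumption made for illustration and does not come from the slides; it simply lists every non-empty prefix of the ordered key attributes.

```python
def legal_keys(levels):
    """Return all non-empty prefixes of the ordered key attributes."""
    return [tuple(levels[:n]) for n in range(1, len(levels) + 1)]

time_dimension = ("Year", "Month", "Day")
for key in legal_keys(time_dimension):
    print(key)
```

A cell addressed by the longest prefix carries a basic fact; the shorter prefixes address aggregated cells.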


Aggregations, Summaries

Def: Aggregations are facts in cells with partial keys. These facts are derived by aggregation functions. In a cube with derived facts the aggregation function must be specified.

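Ex: Sales on a monthly basis, here with sum as the aggregation function

$\text{Sales}(\text{Year}, \text{Month}) = \sum_{\text{Day}} \text{Sales}(\text{Year}, \text{Month}, \text{Day})$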

Aggregation Functions: count, sum, avg, min, max, ...
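
As a rough illustration of deriving facts for partial keys, the following Python sketch groups base facts by the key prefix (Year, Month) and applies an aggregation function. The sales figures and the function name `roll_up_to_month` are made up for this example.

```python
from collections import defaultdict

# Made-up base facts keyed by the full key (Year, Month, Day).
sales = {
    (2002, 3, 1): 10.0,
    (2002, 3, 2): 5.0,
    (2002, 4, 1): 7.5,
}

def roll_up_to_month(base_facts, aggregate=sum):
    """Group base facts by the key prefix (Year, Month) and aggregate them."""
    groups = defaultdict(list)
    for (year, month, _day), value in base_facts.items():
        groups[(year, month)].append(value)
    return {key: aggregate(values) for key, values in groups.items()}

monthly_sales = roll_up_to_month(sales)
print(monthly_sales)
```

Passing `max`, `min`, or `len` as `aggregate` yields the corresponding other aggregation functions over the same grouping.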


Note on Aggregations

• Aggregations may be stored explicitly in the cube, but then they should be secured by integrity constraints

• Aggregations may be virtual; then they must be computed on demand

• i.e., the classical tradeoff between storage space, performance, and flexibility


Relational Modeling

Expand and complete partial keys with the special symbol ALL, e.g.

(Year, Month, ALL)

(ALL, Month, ALL)

(ALL, ALL, ALL)

to obtain simple and complete relational keys.

Question: Which SQL statement computes the complete cube with all aggregations from the base cube?
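
The slide leaves the SQL open. As a language-neutral sketch of what such a statement has to produce, the Python below adds every base fact into each key obtained by replacing any subset of its attributes with ALL. The data, the function name `complete_cube`, and the restriction to the sum function are assumptions for illustration. In SQL dialects that implement the SQL:1999 grouping extensions, `GROUP BY CUBE (...)` generates the same set of groupings, typically with NULL in the role of ALL.

```python
from collections import defaultdict
from itertools import product

ALL = "ALL"  # special symbol that completes a partial key

def complete_cube(base_facts):
    """Sum every base fact into each of the 2**n keys obtained by
    replacing any subset of its n key attributes with ALL."""
    cube = defaultdict(float)
    for key, value in base_facts.items():
        for mask in product((False, True), repeat=len(key)):
            generalized = tuple(ALL if hide else part
                                for part, hide in zip(key, mask))
            cube[generalized] += value
    return dict(cube)

# Made-up base facts keyed by (Year, Month, Day).
base = {(2002, 3, 1): 10.0, (2002, 3, 2): 5.0, (2002, 4, 1): 7.5}
cube = complete_cube(base)
print(cube[(2002, 3, ALL)])
print(cube[(ALL, ALL, ALL)])
```

The mask with no attribute hidden reproduces the base facts, so the result contains the base cube together with all aggregations.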


Hierarchy Example


Chapter 4.2: OLAP Operations

Def: Roll-up computes higher aggregations from lower aggregations or base facts according to hierarchies

Ex: for base facts (Year, Month, Day) there are 3 hierarchical roll-up functions:

Roll-up (Year, Month, ALL)

Roll-up (Year, ALL, ALL)

Roll-up (ALL, ALL, ALL)

which are supported in general (canonical roll-ups)


Additional roll-ups:

(ALL, Month, ALL) etc.

therefore $2^3 - 1$ aggregations, or in general

$2^m - 1$ aggregations

for m hierarchy levels

Note: see later chapters for the support of arbitrary aggregations

Note: for m dimensions with $h_1, h_2, \ldots, h_m$ hierarchy levels there are

$$\prod_{i=1}^{m} (h_i + 1) - 1$$

different aggregations for a given aggregation function.
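
A small Python check of the counting formula for m dimensions. The function name `count_aggregations` is an assumption. The product can be read as choosing, for every dimension, a key prefix of length 0 to $h_i$, and then discarding the one combination in which all keys are complete, since that is the base cube.

```python
from math import prod

def count_aggregations(hierarchy_depths):
    """prod(h_i + 1) - 1: every dimension keeps a key prefix of length
    0..h_i; the single combination of all full keys is the base cube."""
    return prod(h + 1 for h in hierarchy_depths) - 1

print(count_aggregations([3]))     # one dimension (Year, Month, Day)
print(count_aggregations([2, 3]))  # two dimensions with 2 and 3 levels
```

For the single Time dimension this gives the 3 canonical roll-ups, fewer than the $2^3 - 1 = 7$ aggregations obtained when arbitrary levels may be set to ALL.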


Size of base cube

2-dim example

Dim1: (4, 5) = cardinality of the dimension levels

Dim2: (6, 7, 2)

(4 * 5) * (6 * 7 * 2) = 20 * 84 = 1680 = size of base cube

Size of hierarchically aggregated cube

Dim1     Dim2       Cells
4 -      6 7 2        336
4 5      6 7 -        840
- -      6 7 2         84
4 -      6 7 -        168
4 5      6 - -        120
- -      6 7 -         42
4 -      6 - -         24
4 5      - - -         20
- -      6 - -          6
4 -      - - -          4
- -      - - -          1

Number of cells per aggregation function: 1645
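
The cell counts of the hierarchical aggregates can be recomputed with a short Python sketch. The function name `hierarchical_aggregates` is an assumption; the level cardinalities (4, 5) and (6, 7, 2) are taken from the slide. Each aggregate is identified by the length of the key prefix kept in each dimension.

```python
from itertools import product
from math import prod

dimensions = [(4, 5), (6, 7, 2)]  # level cardinalities from the slide

def hierarchical_aggregates(dimensions):
    """Map each combination of key-prefix lengths (one per dimension)
    to its cell count, skipping the base cube (all keys complete)."""
    sizes = {}
    for lengths in product(*(range(len(d) + 1) for d in dimensions)):
        if all(n == len(d) for n, d in zip(lengths, dimensions)):
            continue  # full keys in every dimension: this is the base cube
        sizes[lengths] = prod(prod(d[:n]) for n, d in zip(lengths, dimensions))
    return sizes

sizes = hierarchical_aggregates(dimensions)
print(len(sizes), sum(sizes.values()))
```

The sketch yields 11 aggregates with 1645 cells in total, matching the count stated on the slide.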

Size of completely aggregated cube

Levels: 4 5 6 7 2

[Figure: tree diagram over the five levels; only the running cell counts are recoverable.]

Cell counts, built up level by level from the right:

2: 1 + 2 = 3
7: 7 + 14 + 3 = 24
6: 24 x 6 = 144, 144 + 24 = 168
5: 5 x 168 = 840, 840 + 168 = 1008
4: 4 x 1008 = 4032, 4032 + 1008 = 5040

Computation with binary tree

[Figure: binary tree over the levels 4, 5, 6, 7, 2; at each level a node branches into the level's cardinality or into 1. The partial products at the inner nodes are not recoverable from the transcript; the base cube size shown is 1680.]

Size of the Cube

Lemma: Given a data cube with m dimensions with $h_1, \ldots, h_m$ hierarchy levels, respectively. Let the hierarchy levels of dimension i have

$c_{1i}, c_{2i}, \ldots, c_{h_i i}$ elements, respectively.

Then the base cube has

$$\prod_{i=1}^{m} \prod_{j=1}^{h_i} c_{ji}$$

cells, and the cube with all aggregations has

$$\prod_{i=1}^{m} \prod_{j=1}^{h_i} (c_{ji} + 1)$$

cells.

Size of the Cube (2)

The aggregated cube is larger than the base cube by the factor

$$\prod_{i=1}^{m} \prod_{j=1}^{h_i} \frac{c_{ji} + 1}{c_{ji}}$$

where $c_{ji}$ is the number of elements on level j of dimension i.
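
To make the size formulas concrete, here is a minimal Python sketch that evaluates them for the running example with level cardinalities (4, 5) and (6, 7, 2). The function names are assumptions introduced for illustration.

```python
from math import prod

def base_cube_size(dimensions):
    """Product of all level cardinalities c_ji."""
    return prod(c for levels in dimensions for c in levels)

def full_cube_size(dimensions):
    """Product of (c_ji + 1): each level holds a value or is aggregated."""
    return prod(c + 1 for levels in dimensions for c in levels)

dims = [(4, 5), (6, 7, 2)]
base = base_cube_size(dims)
full = full_cube_size(dims)
print(base, full, full / base)
```

For this example the base cube has 1680 cells and the cube with all aggregations has 5040 cells, so the factor is 3.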

Size of the hierarchically aggregated Cube

For a hierarchy i with $h_i$ levels and

$c_{1i}, c_{2i}, \ldots, c_{h_i i}$ elements per level,

there are

$$1 + c_{1i} + c_{1i} \cdot c_{2i} + \ldots + c_{1i} \cdot c_{2i} \cdot \ldots \cdot c_{h_i i}$$

hierarchical aggregation possibilities, i.e.

$$1 + \sum_{j=1}^{h_i} \prod_{k=1}^{j} c_{ki}$$

possibilities.

Lemma: A hierarchically completely aggregated data cube has

$$\prod_{i=1}^{m} \Bigl( 1 + \sum_{j=1}^{h_i} \prod_{k=1}^{j} c_{ki} \Bigr)$$

cells.


Ex: (4 5) (6 7 2)

size of the hierarchically aggregated cube plus base cube

= (1 + 4 + 20) * (1 + 6 + 42 + 84)

= 25 * 133 = 3325

Ex: (4 5) (6 7 2) ( 8 3)

size of base cube: 40,320

hierarchically aggregated cube plus base:

= (1 + 4 + 20) * (1 + 6 + 42 + 84) * (1 + 8 + 24)

= 3325 * 33 = 109,725


Ex: (4 5) (6 7 2) (8 3) (5 9)

size of base cube: 1,814,400

hierarchically aggregated cube plus base:

= 109,725 * (1 + 5 + 45) = 5,595,975
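
The three worked examples can be reproduced with a short Python sketch of the hierarchical-aggregation lemma. The function name `hierarchical_cube_size` is an assumption; the level cardinalities are the ones used on the slides.

```python
def hierarchical_cube_size(dimensions):
    """Product over dimensions of (1 + c1 + c1*c2 + ... + c1*...*ch)."""
    total = 1
    for levels in dimensions:
        prefix_cells = 1
        options = 1  # the fully aggregated key of this dimension
        for c in levels:
            prefix_cells *= c
            options += prefix_cells
        total *= options
    return total

examples = [
    [(4, 5), (6, 7, 2)],
    [(4, 5), (6, 7, 2), (8, 3)],
    [(4, 5), (6, 7, 2), (8, 3), (5, 9)],
]
for dims in examples:
    print(hierarchical_cube_size(dims))
```

The printed sizes include the base cube, as in the examples above.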


Additional comments on aggregations

1. In addition to the size of the complete cube there is a factor of 5 for the various aggregation functions, e.g.

sum, avg, min, max, count, ...

2. So far we did not consider general restrictions, e.g. "all Saturdays in March" or "vacation months July and August", which cross the bounds of hierarchy levels

Interactive query formulation results in an unlimited number of aggregations

Optimization: restrictions corresponding to hierarchy levels should be pushed down, since they lead to query boxes


Note: See later chapters for multidimensional indexes and MHC techniques and optimization of ROLAP-algebra to support hierarchical canonical aggregations like

Roll-up (Year, Month, ALL)

Roll-up (Year, ALL, ALL)

Roll-up (ALL, ALL, ALL)

but not

Roll-up ( ALL, Month, ALL)


Optimization Problem

Non-hierarchical aggregation, e.g.

March for all years

decompose into union of several restrictions, e.g.

Sales (Year, Month, Day)

where Month = March and

(Year = 1996 or Year = 1997 or Year = 1998)

see later for translation into ROLAP expression and transformations for optimization
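
The decomposition into a union of restrictions can be pictured with a small Python sketch. It does not reproduce the ROLAP translation announced for later chapters; the helper names `march_box` and `select_union` and the sales data are made up. Each box fixes Year and Month and leaves Day unrestricted.

```python
def march_box(year):
    """Query box for one year: Year and Month are fixed, Day is free."""
    return lambda key: key[0] == year and key[1] == 3

def select_union(facts, boxes):
    """Keep the facts that fall into at least one of the boxes."""
    return {key: value for key, value in facts.items()
            if any(box(key) for box in boxes)}

# Made-up base facts keyed by (Year, Month, Day).
sales = {
    (1996, 3, 1): 4.0,
    (1996, 4, 1): 1.0,
    (1997, 3, 15): 6.0,
    (1999, 3, 2): 9.0,
}
boxes = [march_box(year) for year in (1996, 1997, 1998)]
result = select_union(sales, boxes)
print(sum(result.values()))
```

Only the March facts of the three listed years survive; the 1999 fact is outside every box.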


Multiple Hierarchies

e.g. the time hierarchy

Aggregation for a month, e.g. by covering query boxes (QB) of weeks and postfiltering
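
One possible reading of this remark, sketched in Python under the assumption that the data is organized by ISO calendar weeks: fetch all weeks that overlap the month (a covering superset), then filter out the days that lie outside the month. The function names and the daily sales values are made up.

```python
from datetime import date, timedelta

def covering_weeks(year, month):
    """ISO (year, week) pairs of all weeks that overlap the given month."""
    day = date(year, month, 1)
    weeks = set()
    while day.month == month:
        iso = day.isocalendar()
        weeks.add((iso[0], iso[1]))
        day += timedelta(days=1)
    return weeks

def month_total(daily_sales, year, month):
    weeks = covering_weeks(year, month)
    # Step 1: fetch everything inside the covering week boxes (a superset).
    covered = {d: v for d, v in daily_sales.items()
               if (d.isocalendar()[0], d.isocalendar()[1]) in weeks}
    # Step 2: postfilter the days that fall outside the month.
    return sum(v for d, v in covered.items()
               if d.year == year and d.month == month)

daily_sales = {
    date(2002, 2, 28): 3.0,  # same ISO week as 1 March, removed by the postfilter
    date(2002, 3, 1): 5.0,
    date(2002, 3, 31): 2.0,
    date(2002, 4, 1): 8.0,   # outside the covering weeks
}
print(month_total(daily_sales, 2002, 3))
```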


Navigation Operations

Drill Down: first show a single result for an aggregated value, e.g. sales per day, then show:

hourly values for days with very high or very low sales

in order to plan the working hours of sales people better

Other Examples:

daily sales during the Christmas season

vacation bookings for skiing during Fasching (carnival season)


Roll-up: Compute Aggregations


Slicing

Selection of a smaller data cube or even reduction of a multidimensional datacube to fewer dimensions by a point restriction in some dimension (becomes pivot element)

Ex: cube with the dimensions Region, Product (Produkt), and Time (Zeit: May, June, July); slice for Time = May:

              Racer Future   Tria-Racer   Racer-Junior
Haidhausen              47           11              8
Schwabing               53            9             14
Zentrum                 77           26             15
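
Slicing by a point restriction can be sketched in Python as follows. The May values are the ones from the slide; the single June cell is a made-up placeholder, added only so that the cube contains more than one slice. The function name `slice_cube` is an assumption.

```python
# Cells keyed by (region, product, month). May values come from the slide;
# the June cell is a made-up placeholder, not data from the source.
cube = {
    ("Haidhausen", "Racer Future", "May"): 47,
    ("Haidhausen", "Tria-Racer", "May"): 11,
    ("Haidhausen", "Racer-Junior", "May"): 8,
    ("Schwabing", "Racer Future", "May"): 53,
    ("Schwabing", "Tria-Racer", "May"): 9,
    ("Schwabing", "Racer-Junior", "May"): 14,
    ("Zentrum", "Racer Future", "May"): 77,
    ("Zentrum", "Tria-Racer", "May"): 26,
    ("Zentrum", "Racer-Junior", "May"): 15,
    ("Zentrum", "Racer Future", "June"): 0,  # placeholder value
}

def slice_cube(cube, axis, value):
    """Fix one dimension to a single value and drop it from the key."""
    return {key[:axis] + key[axis + 1:]: cell
            for key, cell in cube.items() if key[axis] == value}

may = slice_cube(cube, 2, "May")
print(may[("Zentrum", "Tria-Racer")])
```

The result has one dimension fewer than the input, which is the reduction described in the definition.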


Dicing (German: würfeln)

rotate result, to show another view, e.g. exchanging rows and columns

Slice management

precomputing and caching of several slices for later or special use, e.g. for a special sales person

Rotated view of the May slice (rows and columns exchanged):

                Haidhausen   Schwabing   Zentrum
Racer Future            47          53        77
Tria-Racer              11           9        26
Racer-Junior             8          14        15
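
The rotation described under dicing amounts to exchanging the row and column keys of a two-dimensional slice. A minimal Python sketch, using the May values from the slides and an assumed helper name `rotate`:

```python
# May slice as rows (regions) of columns (products), values from the slides.
may_slice = {
    "Haidhausen": {"Racer Future": 47, "Tria-Racer": 11, "Racer-Junior": 8},
    "Schwabing": {"Racer Future": 53, "Tria-Racer": 9, "Racer-Junior": 14},
    "Zentrum": {"Racer Future": 77, "Tria-Racer": 26, "Racer-Junior": 15},
}

def rotate(table):
    """Exchange rows and columns of a nested-dict table."""
    rotated = {}
    for row, cells in table.items():
        for column, value in cells.items():
            rotated.setdefault(column, {})[row] = value
    return rotated

by_product = rotate(may_slice)
print(by_product["Tria-Racer"])
```

Rotating twice returns the original table, so no information is lost by the change of view.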


Chapter 4.3 Modeling Methodology

Purpose: analysis of business processes, key figures (German: Kennzahlen) for managers to support decisions (DSS)

Steps of Decision Process:

1. Which business processes to model and analyze?

2. What are the measures, where do they come from?

3. Which level of detail, e.g. minutes like in SAP? Which precision is required for OLAP?

4. Common properties of measures to determine dimensions? Brand, time, geographic region, product group? Dependencies between levels of hierarchies?


5. Attributes of dimensions, e.g. of products

• screen size of TVs & computers

• engine displacement (cc) and horsepower (PS) for cars

• focal length for cameras

Problem: how common are properties to dimensions? Properties that are not common cannot be modeled by levels of dimensions. At GfK they are called features (up to 50); they are numbered, and their meaning depends on the specific dimension element, e.g.

TV:  screen size, color, audio system
Car: transmission, cc, PS, #cyl, ...
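
One way to picture numbered features whose meaning depends on the dimension element is a lookup table per product group. This is only a sketch under that assumption and not a description of the GfK model; the mapping `FEATURE_MEANING`, the function `describe`, and the sample values are hypothetical.

```python
# Hypothetical lookup: the meaning of feature number n depends on the product group.
FEATURE_MEANING = {
    "TV": {1: "screen size", 2: "color", 3: "audio system"},
    "Car": {1: "transmission", 2: "cc", 3: "PS", 4: "#cyl"},
}

def describe(product_group, features):
    """Translate numbered feature values into named attributes."""
    meaning = FEATURE_MEANING[product_group]
    return {meaning[number]: value for number, value in features.items()}

# Made-up feature values for one car and one TV.
print(describe("Car", {2: 1600, 4: 4}))
print(describe("TV", {1: 28}))
```

The same feature number (here 1 or 2) resolves to different attributes for TVs and cars, which is why such properties do not fit into shared hierarchy levels.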


6. Constant or changing attributes of dimensions? E.g.

• new models of car makers

• new power sources: electrical, hydrogen, solar

Attributes are rather stable, but changes should still be planned ahead (e.g. mergers like Daimler-Chrysler)!

7. Sparsity: one hypercube or several, i.e. a multicube model? This influences storage requirements, query formulation, and performance, and cannot be hidden easily from the user, maybe by views?


8. Caching and management of aggregates?

[Figure: maintenance costs, average response time, and total costs (time axis) plotted against the number of aggregates, from 0% to 100%; the figure marks the optimal number of aggregates.]


Chapter 4.4 Comparison of OLAP Architectures

1. MOLAP: Multidimensional OLAP

2. ROLAP: Relational OLAP

3. HOLAP: Hybrid OLAP


MOLAP Architecture

[Figure: components shown are a relational data warehouse database, data marts in an MDDBMS, and three users.]


MDDBMS in ANSI/X3/SPARC

External level (Externe Ebene): individual submodels

Conceptual level (Konzeptuelle Ebene): dimensions with dimension elements, hierarchies

Internal level (Interne Ebene): data and storage structures


Logical components of a MDDBMS


ROLAP Architecture

[Figure: components shown are a relational data warehouse database, relational data marts, three ROLAP products, and three users.]


HOLAP Architecture

[Figure: components shown are a user, a HOLAP product, a relational data warehouse database, and a multidimensional data warehouse database.]


Reasons for MOLAP

• performance

• write access

• Data Marts

• functional power

Reasons for ROLAP

• scalability

• flexible precomputations, partial aggregates

• parallelism

• DB management and ACID