olap basics and fundamentals by bharat kalia

68
ON-LINE ANALYTICAL PROCESSING By : Bharat Kalia

Upload: bharat-kalia

Post on 22-Nov-2014

100 views

Category:

Engineering


0 download

DESCRIPTION

OLAP is a category of software technology that enables analysts, managers, and executives to gain insight into the data through fast, consistent, interactive, access in a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user.

TRANSCRIPT

Page 1: OLAP Basics and Fundamentals by Bharat Kalia

ON-LINE ANALYTICAL PROCESSING

By :Bharat Kalia

Page 2: OLAP Basics and Fundamentals by Bharat Kalia

2

OBJECTIVES

What is OLAP Need for OLAP Features & functions of

OLAP Different OLAP models OLAP implementations

Page 3: OLAP Basics and Fundamentals by Bharat Kalia

3

DEMAND FOR OLAP To develop DM, three approaches In all approaches, Data Marts

rest on Dimensional Model Data Marts are sufficient for

basic data analysis Users need to go beyond such

basic analysis

Page 4: OLAP Basics and Fundamentals by Bharat Kalia

4

DEMAND FOR OLAP Need for Multidimensional

Analysis Fast Access & Powerful

Calculations Limitations of other analysis

methods like: SQL Spreadsheets Report Writers

Page 5: OLAP Basics and Fundamentals by Bharat Kalia

5

DEMAND FOR OLAP Traditional tools of report writers,

query products, spreadsheets, & language interfaces do not match the user expectations as far as performing multidimensional analysis with complex calculations is concerned.

Tools used with OLTP and basic DW environments do not match up to the task

Page 6: OLAP Basics and Fundamentals by Bharat Kalia

6

OLAP IS THE ANSWER!OLAP is a category of software technology that enables analysts, managers, and executives to gain insight into the data through fast, consistent, interactive, access in a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user.

Page 7: OLAP Basics and Fundamentals by Bharat Kalia

7

Why is OLAP useful? Facilitates multidimensional data

analysis by pre-computing aggregates across many sets of dimensions

Provides for: Greater speed and responsiveness Improved user interactivity

Page 8: OLAP Basics and Fundamentals by Bharat Kalia

8

DATA WAREHOUSES A data warehouse is based on a

multidimensional data model which views data in the form of a data cube

A data cube allows data to be modeled and viewed in multiple dimensions

In data warehousing literature, an n-D base cube is called a base cuboid. The top most 0-D cuboid, which holds the highest-level of summarization, is called the apex cuboid. The lattice of cuboids forms a data cube.

Page 9: OLAP Basics and Fundamentals by Bharat Kalia

9

LATTICE OF CUBOIDS

all

time item location supplier

time,item time,location

time,supplier

item,location

item,supplier

location,supplier

time,item,location

time,item,supplier

time,location,supplier

item,location,supplier

time, item, location, supplier

0-D(apex) cuboid

1-D cuboids

2-D cuboids

3-D cuboids

4-D(base) cuboid

Page 10: OLAP Basics and Fundamentals by Bharat Kalia

10

CUBE

sale prodId storeId date amtp1 c1 1 12p2 c1 1 11p1 c3 1 50p2 c2 1 8p1 c1 2 44p1 c2 2 4

day 2 c1 c2 c3p1 44 4p2 c1 c2 c3

p1 12 50p2 11 8

day 1

dimensions = 3

Multi-dimensional cube:Fact table view:

Page 11: OLAP Basics and Fundamentals by Bharat Kalia

11

AGGREGATES

sale prodId storeId date amtp1 c1 1 12p2 c1 1 11p1 c3 1 50p2 c2 1 8p1 c1 2 44p1 c2 2 4

• Add up amounts for day 1• In SQL: SELECT sum(amt) FROM SALE WHERE date = 1

81

Page 12: OLAP Basics and Fundamentals by Bharat Kalia

12

AGGREGATES

sale prodId storeId date amtp1 c1 1 12p2 c1 1 11p1 c3 1 50p2 c2 1 8p1 c1 2 44p1 c2 2 4

• Add up amounts by day• In SQL: SELECT date, sum(amt) FROM SALE GROUP BY date

ans date sum1 812 48

Page 13: OLAP Basics and Fundamentals by Bharat Kalia

13

Operators: sum, count, max, min, median, avg

“Having” clause Using dimension hierarchy

average by region (within store) maximum by month (within date)

Aggregates

Page 14: OLAP Basics and Fundamentals by Bharat Kalia

14

CUBE AGGREGATION

day 2 c1 c2 c3p1 44 4p2 c1 c2 c3

p1 12 50p2 11 8

day 1

c1 c2 c3p1 56 4 50p2 11 8

c1 c2 c3sum 67 12 50

sump1 110p2 19

129

. . .

drill-down

rollup

Example: computing sums

Page 15: OLAP Basics and Fundamentals by Bharat Kalia

15

CUBE OPERATORS

day 1

day 2 c1 c2 c3p1 44 4p2 c1 c2 c3

p1 12 50p2 11 8

c1 c2 c3p1 56 4 50p2 11 8

c1 c2 c3sum 67 12 50

sump1 110p2 19

129

. . .

sale(c1,*,*)

sale(*,*,*)sale(c2,p2,*)

sale(*,p1,*)

Page 16: OLAP Basics and Fundamentals by Bharat Kalia

16

EXTENDED CUBE

c1 c2 c3 *p1 56 4 50 110p2 11 8 19* 67 12 50 129day 2 c1 c2 c3 *

p1 44 4 48p2* 44 4 48

c1 c2 c3 *p1 12 50 62p2 11 8 19* 23 8 50 81

day 1

*

sale(*,p2,*)

Page 17: OLAP Basics and Fundamentals by Bharat Kalia

17

AGGREGATION USING HIERARCHIES

day 2 c1 c2 c3p1 44 4p2 c1 c2 c3

p1 12 50p2 11 8

day 1

region A region Bp1 56 54p2 11 8

customer

region

country

(customer c1 in Region A;customers c2, c3 in Region B)

Page 18: OLAP Basics and Fundamentals by Bharat Kalia

18

PIVOTING

sale prodId storeId date amtp1 c1 1 12p2 c1 1 11p1 c3 1 50p2 c2 1 8p1 c1 2 44p1 c2 2 4

day 2 c1 c2 c3p1 44 4p2 c1 c2 c3

p1 12 50p2 11 8

day 1

Multi-dimensional cube:Fact table view:

c1 c2 c3p1 56 4 50p2 11 8

Page 19: OLAP Basics and Fundamentals by Bharat Kalia

19

CUBE AGGREGATES LATTICE

city, product, date

city, product city, date product, date

city product date

all

day 2 c1 c2 c3p1 44 4p2 c1 c2 c3

p1 12 50p2 11 8

day 1

c1 c2 c3p1 56 4 50p2 11 8

c1 c2 c3p1 67 12 50

129

use greedyalgorithm todecide whatto materialize

Page 20: OLAP Basics and Fundamentals by Bharat Kalia

20

DIMENSION HIERARCHIES

all

state

city

cities city statec1 CAc2 NY

Page 21: OLAP Basics and Fundamentals by Bharat Kalia

21

DIMENSION HIERARCHIES

city, product

city, product, date

city, date product, date

city product date

all

state, product, date

state, date

state, product

state

not all arcs shown...

Page 22: OLAP Basics and Fundamentals by Bharat Kalia

22

INTERESTING HIERARCHY

all

years

quarters

months

days

weeks

time day week month quarter year1 1 1 1 20002 1 1 1 20003 1 1 1 20004 1 1 1 20005 1 1 1 20006 1 1 1 20007 1 1 1 20008 2 1 1 2000

conceptualdimension table

Page 23: OLAP Basics and Fundamentals by Bharat Kalia

23

SAMPLE CUBETotal annual salesof TV in U.S.A.Date

Produ

ct

Cou

ntrysum

sum TV

VCRPC

1Qtr 2Qtr 3Qtr 4Qtr

U.S.A

Canada

Mexico

sum

Total annual salesof PC in U.S.A.

Total annual salesof VCR in U.S.A.Total Q1 sales

In U.S.ATotal Q1 salesIn Canada

Total Q1 salesIn Mexico

Total Q1 salesIn all countries

Total Q2 salesIn all countries

Total salesIn U.S.A

Total salesIn CanadaTotal sales

In Mexico

TOTAL SALES

Page 24: OLAP Basics and Fundamentals by Bharat Kalia

24

Roll-Up Drill-Down Slice & Dice Pivot Drill-Across Drill-Through

OLAP Operations

Page 25: OLAP Basics and Fundamentals by Bharat Kalia

25

OLAP Operations

Page 26: OLAP Basics and Fundamentals by Bharat Kalia

26

Slicing

Page 27: OLAP Basics and Fundamentals by Bharat Kalia

27

Dicing (Sub-cube)

Page 28: OLAP Basics and Fundamentals by Bharat Kalia

28

ROLL-UP

Page 29: OLAP Basics and Fundamentals by Bharat Kalia

29

DRILL-DOWN

Page 30: OLAP Basics and Fundamentals by Bharat Kalia

30

OTHER OLAP OPERATIONS

o Drill-Across: Queries involving more than one fact table

o Drill-Through: Makes use of SQL to drill through the bottom level of a data cube down to its back-end relational tables

o Pivot (rotate): Pivot (also called "rotate") is avisualization operation which rotates the data axes inview in order to provide an alternative presentation ofthe data. Other examples include rotating the axes in a3-D cube, or transforming a 3-D cube into a series of 2-D planes.

Page 31: OLAP Basics and Fundamentals by Bharat Kalia

31

OTHER OLAP OPERATIONS

o Moving Averageso Growth Rateso Depreciationo Currency Conversiono Statistical Functionso Top N or Bottom N queries

Page 32: OLAP Basics and Fundamentals by Bharat Kalia

32

Conceptual vs. Actual The “cube” is a logical way of

visualizing the data in an OLAP setting

Not how the data is actually represented on disk

Two ways of storing data: ROLAP: Relational OLAP MOLAP: Multidimensional OLAP

Page 33: OLAP Basics and Fundamentals by Bharat Kalia

33

Construction of the data cube is key to the operation of OLAP

The computation process creates a set of aggregates on the various dimensions of the data

The CUBE operator

OLAP & CUBE

Page 34: OLAP Basics and Fundamentals by Bharat Kalia

34

AN EXAMPLE OF THE CUBE OPERATOR

Page 35: OLAP Basics and Fundamentals by Bharat Kalia

35

The CUBE Operator Proposed by Gray et al* Effectively involves a series of

GROUP-BY operations to aggregate data

Creates power set on all attributes according to: A measure An aggregator function

*J. Gray, S. Chaudhuri, A. Bosworth, A. Layman,D. Reichart, M. Venkatrao, F. Pellow and H. Pirahesh. Data cube: A relational aggregation operator generalizing group-by, cross-tab and sub-totals. Data Mining and Knowledge Discovery, 1:29-54, 1997.

Page 36: OLAP Basics and Fundamentals by Bharat Kalia

36

Problem: this generates a lot of data and work (2n sets in total, where n is the number of dimensions)

Solution: optimized algorithms to run faster, consume less memory, and perform fewer I/Os.

CUBING Problem

Page 37: OLAP Basics and Fundamentals by Bharat Kalia

37

o ROLAP-based cubing algorithms (Agarwal et al’96)

o Array-based cubing algorithm (Zhao et al’97)

Efficient Computation of Data Cubes

S. Agarwal, R. Agrawal, P. M. Deshpande, A.Gupta, J. F. Naughton, R. Ramakrishnan and S.Sarawagi.On the computation of multidimensional aggregates. In VLDB'96.Y. Zhao, P. M. Deshpande, and J. F. Naughton.An array-based algorithm for simultaneous multidimensional aggregates. In SIGMOD'97.

Page 38: OLAP Basics and Fundamentals by Bharat Kalia

38

o How many cuboids in a cube with 3 dimensions?o Answer: o As many group by operations?o No hierarchies involved!!o π (Li +1), where Li is the number of

levels associated with dimension Io 10 dimensions & 4 levels for each

dimensiono Total Cuboids = 510

Efficient Computation of Data Cubes

Page 39: OLAP Basics and Fundamentals by Bharat Kalia

39

It is all about which DBMS you choose to store your data warehouse data

RDBMS – ROLAP MDDB – MOLAP BOTH - HOLAP

Approaches to OLAP Servers

Page 40: OLAP Basics and Fundamentals by Bharat Kalia

40

Three possibilities for OLAP servers(1) Relational OLAP (ROLAP)

Relational and specialized relational DBMS to store and manage warehouse data

OLAP middleware to support missing pieces(2) Multidimensional OLAP (MOLAP)

Array-based storage structures Direct access to array data structures

(3) Hybrid OLAP (HOLAP) Storing detailed data in RDBMS Storing aggregated data in MDBMS User access via MOLAP tools

Approaches to OLAP Servers

Page 41: OLAP Basics and Fundamentals by Bharat Kalia

41

Special schema design: star, snowflake

Special indexes: bitmap, multi-table join Proven technology (relational model,

DBMS), tend to outperform specialized MDDB especially on large data sets

Products IBM DB2, Oracle, Sybase IQ,

RedBrick, Informix

ROLAP

Page 42: OLAP Basics and Fundamentals by Bharat Kalia

42

Defines complex, multi-dimensional data with simple model

Reduces the number of joins a query has to process

Allows the data warehouse to evolve with relatively low maintenance

Can contain both detailed and summarized data.

ROLAP is based on familiar, proven, and already selected technologies.

BUT!!! SQL for multi-dimensional manipulation of

calculations.

ROLAP

Page 43: OLAP Basics and Fundamentals by Bharat Kalia

43

MDDB: a special-purpose data model

Facts stored in multi-dimensional arrays

Dimensions used to index array Sometimes on top of relational DB Products

Pilot, Arbor Essbase, Gentia

MOLAP

Page 44: OLAP Basics and Fundamentals by Bharat Kalia

44

Pre-calculating or pre-consolidating transactional data improves speed.

BUTFully pre-consolidating incoming data, MDDs require an enormous amount of overhead both in processing time and in storage. An input file of 200MB can easily expand to 5GB

MDDBs are great candidates for the < 100GB department data marts.

With MDDs, application design is essentially the definition of dimensions and calculation rules, while the RDBMS requires that the database schema be a star or snowflake.

MOLAP

Page 45: OLAP Basics and Fundamentals by Bharat Kalia

45

User Needs Multidimensional view Excellent Performance Analytical Flexibility Real-Time Data Access High Data Capacity

MIS Needs Leverages Data Warehouse Easy Development Low Structure Maintenance Low Aggregate Maintenance

OLAP Needs

Page 46: OLAP Basics and Fundamentals by Bharat Kalia

46

Multidimensional View All true OLAP tools, whether they work

with a MDDB or an RDBMS, provide a multidimensional view of data.

For example, decision makers may view sales by office, quarter, representative, product, etc. This perspective on data, which mirrors the way business professional think, allows for more intuitive and more powerful analysis.

OLAP Needs: User Needs

Page 47: OLAP Basics and Fundamentals by Bharat Kalia

47

Excellent Performance The performance of your decision

support tool directly depends on the way it manages aggregates.

RDBMS Calculate aggregates on fly (response

time suffers) DBA creates summary tables to store

aggregates (enormous amount of disk space)

OLAP Needs: User Needs

Page 48: OLAP Basics and Fundamentals by Bharat Kalia

48

Excellent Performance For example, suppose you have a Sales indicator

with six dimensions—Representatives, Products, Customers, Regions, Months, and Years.

MOLAP tools will store a given aggregate, such as the November 1997 government sales of product A504 by representative 1040 in New York, in 1 cell of the MDDB.

In contrast, ROLAP tools consume 600% more space, because they require a record of seven values—six foreign keys and the actual aggregate—in a relational summary table.

OLAP Needs: User Needs

Page 49: OLAP Basics and Fundamentals by Bharat Kalia

49

Excellent Performance

OLAP Needs: User Needs

Page 50: OLAP Basics and Fundamentals by Bharat Kalia

50

Excellent Performance

OLAP Needs: User Needs

RDBMSs must use several summary tables to store the aggregatesthat a MOLAP could store in just one cube. For example, consider a Sales indicator with three dimensions: Months, Regions, and Products. The indicator cube will contain seven sets of aggregates:• Sales by month• Sales by product• Sales by region• Sales by month and product• Sales by month and region• Sales by product and region• Sales by product, month, and regionTo store these aggregates in an RDBMS, you’d have to create seven summary tables, one for each aggregate set. HOW MANY SUMMARY TABLES FOR 6 DIMENSIONS? (Separate fact table and shrunken dimension table approach for storing aggregates)

Page 51: OLAP Basics and Fundamentals by Bharat Kalia

51

Excellent Performance

OLAP Needs: User Needs

• Huge amounts of extra storage space is required (even if there is no sparsity failure)

• Maintenance costs are high

• Lot of statistical analysis needs to be done to decide which aggregates are to be precomputed

• DBA must keep the cost/performance ratio in check

Page 52: OLAP Basics and Fundamentals by Bharat Kalia

52

Excellent Performance

OLAP Needs: User Needs

• In contrast, we’ve seen that multidimensional databases store aggregates in a very compact structure that consumes very little disk space and requires very little maintenance

• All levels of consolidation can therefore be precomputed and stored in MDDB

• As a result, fast response time is not limited to the most frequently accessed queries; all aggregates can be accessed with lightning speed.

Page 53: OLAP Basics and Fundamentals by Bharat Kalia

53

Analytical Flexibility

Both ROLAP & MOLAP tools offer comparative performance for Comparative Analysis Roll-up and Drill-down Slicing & Dicing

Only MOLAP tools offer ‘what-if’ analysis

OLAP Needs: User Needs

Page 54: OLAP Basics and Fundamentals by Bharat Kalia

54

Real-Time Data Access MOLAP tools load data into the multidimensional cubes.

Consequently, the data being accessed is only as recent as the last load.

Some applications require real-time data access Process of continually refreshing the data attaches higher

costs to operating a MOLAP system Some MOLAP tools offer reach-through functionality to

access volatile data stored outside the MDDB Unfortunately, users must be aware of the underlying

database structure Relational data access is too complex for the typical user

OLAP Needs: User Needs

Page 55: OLAP Basics and Fundamentals by Bharat Kalia

55

Real-Time Data Access ROLAP tools maintain a constant link to the

operational RDBMS, which provides users with up-to-the-minute, accurate data(Real-Time Data Warehousing)

Industries & organizations with highly volatile data particularly benefit from this access to live, operational data.

OLAP Needs: User Needs

Page 56: OLAP Basics and Fundamentals by Bharat Kalia

56

High Capacity Data MOLAP products are limited by the size of the

cube defined by the multidimensional view. When dimension elements are predefined, the scope of available data is limited at the onset.

ROLAP tools circumvent this barrier. Dynamic dimensions are not stored in the predefined multidimensional model, but fetched at run time from the RDBMS.

OLAP Needs: User Needs

Page 57: OLAP Basics and Fundamentals by Bharat Kalia

57

High Capacity Data

OLAP Needs: User Needs

o In MOLAP, only aggregates are stored in the cube. Atomic, operational data are forced out of the user’s analytical realm.

o ROLAP systems can access extremely detailed

operational data, as well as aggregated data stored in summary tables.

Page 58: OLAP Basics and Fundamentals by Bharat Kalia

58

MIS Needs Administrators should be able to

leverage their existing relational databases without devoting large amounts of time and effort to intricate development, fine tuning, or intensive maintenance.

OLAP Needs

Page 59: OLAP Basics and Fundamentals by Bharat Kalia

59

Leveraging Data Warehouse Both the finance and the MIS departments of

your organization will appreciate a decision support tool that leverages existing investments in data warehousing.

MIS staff that opts for a MOLAP tool must duplicate data in its own proprietary MDDB.

MIS staff that chooses a ROLAP tool will be able to access the data warehouse directly.

OLAP Needs: MIS Needs

Page 60: OLAP Basics and Fundamentals by Bharat Kalia

60

Easy Development MOLAP development is straightforward, it requires no

fine tuning and creates its own aggregates. ROLAP tools, on the other hand, require a specific

schema for the relational database. Skilled DBAs must provide the appropriate schema

(star or snowflake schema), tune the database, and create the appropriate summary tables.

However, many ROLAP tools are metadata-driven, which means the multidimensional view is generated and maintained more easily.

OLAP Needs: MIS Needs

Page 61: OLAP Basics and Fundamentals by Bharat Kalia

61

Low Structure Maintenance The structure of a MOLAP tool’s underlying MDDB

greatly depends on each of its dimensions. When one dimension changes, the entire MDDB must be re-structured.

Multi-matrix MDDBs reduce the maintenance burden ROLAP systems do not store data in a proprietary

structure. They build and maintain a constant link between the

multidimensional view and the underlying RDBMS using the metadata.

No database restructuring is required.

OLAP Needs: MIS Needs

Page 62: OLAP Basics and Fundamentals by Bharat Kalia

62

Low Aggregate Maintenance MOLAP tools automatically create high-level

aggregates based on your lower-level MDDB data and aggregate definitions.

When data is updated, the aggregates are automatically updated and stored in the MDDB.

With ROLAP tools, MIS staff must continually monitor the use of summary tables to keep their cost/performance ratio in check.

DBAs inevitably use sophisticated statistics to isolate only the most frequently accessed aggregates, and store them in summary tables.

These tables leave ROLAP administrators with a heavy maintenance burden.

OLAP Needs: MIS Needs

Page 63: OLAP Basics and Fundamentals by Bharat Kalia

63

ROLAP vs. MOLAP

Page 64: OLAP Basics and Fundamentals by Bharat Kalia

64

ROLAP vs. MOLAP1) Performance:

• How fast will the system appear to the end-user?

• MDD server vendors believe this is a key point in their favor.

2) Data volume and scalability:

• While MDD servers can handle up to 100GB of storage, RDBMS servers can handle hundreds of gigabytes and terabytes.

Page 65: OLAP Basics and Fundamentals by Bharat Kalia

65

o Best of both worlds

o Storing detailed data in RDBMS

o Storing aggregated data in MDBMS

o User access via MOLAP tools

Hybrid OLAP - HOLAP

Page 66: OLAP Basics and Fundamentals by Bharat Kalia

66

HOLAP

Multi-dimensional

accessMultidimensional

Viewer

RelationalViewer

ClientMDBMS Server

Multi-dimensional

data

SQL-Read

RDBMS Server

Userdata Meta data

Deriveddata

SQL-Reach Through

SQL-Read

Page 67: OLAP Basics and Fundamentals by Bharat Kalia

67

IFA. You require write access B. Your data is under 50 GBC. Your timetable to implement is 60-90 daysD. Lowest level already aggregatedE. Data access on aggregated levelF. You’re developing a general-purpose application for inventory movement or assets management

THENConsider an MDD /MOLAP solution for your data mart

 IF

A. Your data is over 100 GBB. You have a "read-only" requirementC. Historical data at the lowest level of granularityD. Detailed access, long-running queriesE. Data assigned to lowest level elements

THENConsider an RDBMS/ROLAP solution for your data mart.

IFA. OLAP on aggregated and detailed dataB. Different user groupsC. Ease of use and detailed data

THENConsider an HOLAP for your data mart

ROLAP, MOLAP, or HOLAP

Page 68: OLAP Basics and Fundamentals by Bharat Kalia

68

ROLAP: RDBMS -> star/snowflake schema MOLAP: MDDB -> Cube structures ROLAP or MOLAP: Data models used play major role in

performance differences MOLAP: for summarized and relatively lesser volumes

of data (100GB) ROLAP: for detailed and larger volumes of data Both storage methods have strengths and weaknesses The choice is requirement specific, though currently

data warehouses are predominantly built using RDBMSs/ROLAP.

HOLAP is emerging as the OLPA server of choice

Conclusions