relational on-line analytical processing (rolap) 1

34
Relational On-Line Analytical Processing (ROLAP) 1

Upload: ally-barstow

Post on 31-Mar-2015

250 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Relational On-Line Analytical Processing (ROLAP) 1

1

Relational On-Line Analytical Processing (ROLAP)

Page 2: Relational On-Line Analytical Processing (ROLAP) 1

Data Warehouse Schema

Star Schema Snowflake Schema Fact constellation

Page 3: Relational On-Line Analytical Processing (ROLAP) 1

Star Schema

A single, large and central fact table and one table for each dimension.

The star schema separates business data into facts, which hold the measurable, quantitative data about a business, and dimensions which are descriptive attributes related to fact data. Examples of fact data include sales price, sale quantity, and time, distance, speed, and weight measurements. Related dimension attribute examples include product models, product colors, product sizes, geographic locations, and salesperson names.

Page 4: Relational On-Line Analytical Processing (ROLAP) 1

Star Schema (contd..)

Store Key

Product Key

Period Key

Units

Price

Store Dimension

Time Dimension

Product Dimension

Fact Table

Benefits: Easy to understand, easy to define hierarchies, reduces no. of physical joins.

Store Key

Store Name

City

State

Region

Period Key

Year

Quarter

Month

Product Key

Product Desc

Page 5: Relational On-Line Analytical Processing (ROLAP) 1

SnowFlake Schema

Variant of star schema model. A single, large and central fact table and one

or more tables for each dimension. Dimension tables are normalized i.e. split

dimension table data into additional tables

Page 6: Relational On-Line Analytical Processing (ROLAP) 1

SnowFlake Schema (contd..)

Store Key

Product Key

Period Key

Units

Price

Time Dimension

Product Dimension

Fact Table

Store Key

Store Name

City Key

Period Key

Year

Quarter

Month

Product Key

Product Desc

City Key

City

State

Region

City Dimension

Store Dimension

Drawbacks: Time consuming joins,report generation slow

Page 7: Relational On-Line Analytical Processing (ROLAP) 1

7

Fact Constellation

Fact Constellation Multiple fact tables that share many dimension

tables Booking and Checkout may share many

dimension tables in the hotel industryHotels

Travel Agents

Promotion

Room Type

Customer

Booking

Checkout

Page 8: Relational On-Line Analytical Processing (ROLAP) 1

8

ComparisonSnowflake Schema Star Schema

Ease of maintenance/change:

No redundancy and hence more easy to maintain and change

Has redundant data and hence less easy to maintain/change

Ease of Use: More complex queries and hence less easy to understand

Less complex queries and easy to understand

Query Performance:

More foreign keys-and hence more query execution time

Less no. of foreign keys and hence lesser query execution time

Normalization: Has normalized tables Has De-normalized tables

Type of Data warehouse:

Good to use for datawarehouse core to simplify complex relationships (many:many)

Good for datamarts with simple relationships (1:1 or 1:many)

Joins: Higher number of Joins Fewer Joins

Dimension table:It may have more than one dimension table for each dimension

Contains only single dimension table for each dimension

When to use:When dimension table is relatively big in size, snowflaking is better as it reduces space.

When dimension table contains less number of rows, we can go for Star schema.

Page 9: Relational On-Line Analytical Processing (ROLAP) 1

9

To understand a Multidimensional view of data and how it can be implemented in a relational database. ROLAP – Relational On-Line Analytical Processing.

To be aware of extensions have been added to SQL to make OLAP queries easier to write; in particular: Rollup Operator Cube Operator

ROLAP and SQL

Page 10: Relational On-Line Analytical Processing (ROLAP) 1

10

Star schema (dimensional model) for property sales of DreamHome

Page 11: Relational On-Line Analytical Processing (ROLAP) 1

A subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management’s decision-making process (Inmon, 1993).

“The technique of taking large amounts of operational data and putting them into a large database, with the aim of analysing them every which way to yield useful information” Scoring Points – How Tesco Continues to Win Customer Loyalty. Terry

Hunt Pub. Kogan Page

11

Data Warehouse Definitions

Page 12: Relational On-Line Analytical Processing (ROLAP) 1

12

Data Warehousing: Consolidate data from many sources in one large repository. Loading, periodic synchronization of replicas. Semantic integration.

OLAP: Complex SQL queries and views. Queries based on spreadsheet-style operations and

“multidimensional” view of data. Interactive and “online” queries.

Data Mining: Exploratory search for interesting trends and anomalies. (Another module/topic!)

Three Complementary Trends

Page 13: Relational On-Line Analytical Processing (ROLAP) 1

13

Fact table in BCNF; dimension tables un-normalized. Dimension tables are small; updates/inserts/deletes are

rare. So, anomalies less important than query performance. This kind of schema is very common in OLAP

applications, and is called a star schema; computing the join of all these relations is called a star join.

Design Issues – Further Example

price

category

pname

pid country

statecitylocid

sales

locidtimeid

pid

holiday_flag

weekdate

timeid month

quarter

year

(Fact table)SALES

TIMES

PRODUCTS LOCATIONS

Page 14: Relational On-Line Analytical Processing (ROLAP) 1

14

Collection of numeric measures, which depend on a set of dimensions. E.g., measure Sales, dimensions Product

(key: pid), Location (locid), and Time (timeid).

Multidimensional Data Model

8 10 10

30 20 50

25 8 15

1 2 3 timeid

p

id11 12

13

11 1 1 25

11 2 1 8

11 3 1 15

12 1 1 30

12 2 1 20

12 3 1 50

13 1 1 8

13 2 1 10

13 3 1 10

11 1 2 35

p

id tim

eid

locid

sale

s

locid

Slice locid=1is shown:

Page 15: Relational On-Line Analytical Processing (ROLAP) 1

15

Multidimensional data can be stored physically in a (disk-resident, persistent) array; called MOLAP systems. Alternatively, can store as a relation; called ROLAP systems.

The main relation, which relates dimensions to a measure, is called the fact table. Each dimension can have additional attributes and an associated dimension table. E.g., Products(pid, pname, category, price) Fact tables are much larger than dimensional

tables.

MOLAP vs ROLAP

Page 16: Relational On-Line Analytical Processing (ROLAP) 1

16

For each dimension, the set of values can be organized in a hierarchy:

Dimension Hierarchies

PRODUCT TIME LOCATION

category week month state

pname date city

year

quarter country

Page 17: Relational On-Line Analytical Processing (ROLAP) 1

17

Influenced by SQL and by spreadsheets. A common operation is to aggregate a measure

over one or more dimensions. Find total sales. Find total sales for each city, or for each state.

Roll-up: Aggregating at different levels of a dimension hierarchy. E.g., Given total sales by city, we can roll-up to

get sales by state.

OLAP Queries

Page 18: Relational On-Line Analytical Processing (ROLAP) 1

18

Drill-down: The inverse of roll-up. E.g., Given total sales by state, can drill-down to

get total sales by city. E.g., Can also drill-down on different dimensions to

get total sales by product for each state. Pivoting: Aggregation on selected dimensions.

E.g., Pivoting on Location and Time yields this cross-tabulation:

OLAP Queries

63 81 144

38 107 145

75 35 110

WI CA Total

1995

1996

1997

176 223 399Total

Page 19: Relational On-Line Analytical Processing (ROLAP) 1

19

The cross-tabulation obtained by pivoting can also be computed using a collection of SQLqueries:

Comparison with SQL Queries

SELECT T.Year, L.state, SUM(S.sales)FROM Sales S, Times T, Locations LWHERE S.timeid=T.timeid AND S.timeid=L.timeidGROUP BY T.year, L.state

This query generates the entries in the body of the pivot chart on slide 12.

Page 20: Relational On-Line Analytical Processing (ROLAP) 1

20

SELECT L.state, SUM(S.sales)FROM Sales S, Location LWHERE S.timeid=L.timeidGROUP BY L.state

This query produces the summary row at the bottom of the pivot chart on slide 12.

Page 21: Relational On-Line Analytical Processing (ROLAP) 1

21

SELECT T.Year, SUM(S.sales)FROM Sales S, Times TWHERE S.timeid=T.timeidGROUP BY T.Year;

This query produces the summary column on the right of the pivot chart on slide 12.

Page 22: Relational On-Line Analytical Processing (ROLAP) 1

22

The cumulative sum in the bottom-right corner of the chart is produced by this query:

SELECT SUM (S.sales)FROM Sales S, Locations LWHERE S.locid=L.locid;

Page 23: Relational On-Line Analytical Processing (ROLAP) 1

23

The Group by clause with the CUBE keyword is equivalent to a collection of GROUP BY statements:

The result of the previous four queries can be produced with one query using the CUBE keyword.

SELECT T.year, L.state, SUM (S.sales)FROM Sales S, Times T, Locations LWHERE S.timid=T.timeid AND S.locid.L.locidGROUP BY CUBE (T.year, L.state);

The results of this query (on the following slide) is a tabular representation of the cross tabulation on slide 18.

The Cube Operator

Page 24: Relational On-Line Analytical Processing (ROLAP) 1

24

T.Year L.State SUM(S.sales)

1995 WI 63

1995 CA 81

1995 NULL 144

1996 WI 38

1996 CA 107

1996 NULL 145

1997 WI 75

1997 CA 35

1997 NULL 110

NULL WI 176

NULL CA 223

NULL NULL 399

Page 25: Relational On-Line Analytical Processing (ROLAP) 1

25

S# P# QTY

S1 P1 300

S1 P2 200

S2 P1 300

S2 P2 400

S3 P2 200

S4 P2 200

OLAP: Example of Queries using Rollup keyword

The above table (called SP) shows Suppliers who supply Parts in a certain quantity. S# = Supplier No and P# = Part number.

Page 26: Relational On-Line Analytical Processing (ROLAP) 1

26

1. Get the total shipment quantity.2. Get total shipment quantities by supplier3. Get total shipment quantities by part4. Get total shipment quantities by supplier

and part.

Queries

Page 27: Relational On-Line Analytical Processing (ROLAP) 1

27

1 Get the total shipment quantity. Select SUM (QTY) From SP;

2 Get total shipment quantities by supplier Select S# , SUM(QTY) From SP

Group by (S#);

Page 28: Relational On-Line Analytical Processing (ROLAP) 1

28

3 Get total shipment quantities by part. Select P#, SUM(QTY) From SP Group by (P#);

4 Get total shipment quantities by supplier and part. Select S#, P#, SUM(QTY) From SP

Group by (S#, P#);

Page 29: Relational On-Line Analytical Processing (ROLAP) 1

29

Consider the following querySelect S#, P#, Sum (QTY) As TOTQTYFrom SPGroup by Rollup (S#, P#);

The query is a bundled SQL formulation of Queries 1, 2 and 4. Result is on the next slide.

Called Rollup because the quantities have been “rolled up” along the Supplier dimension.

Rollup

Page 30: Relational On-Line Analytical Processing (ROLAP) 1

30

S# P# TOTQTY

S1 P1 300

S1 P2 200

S2 P1 300

S2 P2 400

S3 P2 200

S4 P2 200

S1 NULL 500

S2 NULL 700

S3 NULL 200

S4 NULL 200

NULL NULL 1600

Result of Rollup

Page 31: Relational On-Line Analytical Processing (ROLAP) 1

31

Consider the following querySelect S#, P#, Sum (QTY) As TOTQTYFrom SPGroup by CUBE (S#, P#);

The result of this query, on the next slide, is a bundle of all four of the original queries.

Note C. J. Date describes this as “… a table (an SQL-style table, at any rate,) but it is hardly a relation….In fact the result table in this example can be regarded as an “outer union”….outer union is not a respectable relational operation.“An introduction to Database Systems” eighth edition. C.J.Date Addison Wesley 2004

CUBE

Page 32: Relational On-Line Analytical Processing (ROLAP) 1

32

Result of CubeS# P# TOTQTY

S1 P1 300

S1 P2 200

S2 P1 300

S2 P2 400

S3 P2 200

S4 P2 200

S1 NULL 500

S2 NULL 700

S3 NULL 200

S4 NULL 200

NULL P1 600

NULL P2 1000

NULL NULL 1600

Page 33: Relational On-Line Analytical Processing (ROLAP) 1

33

The drawbacks to the previous queries without CUBE and Rollup is obvious.

Formulating several similar but distinct queries is tedious

Using these additions to SQL we can attempt to try and represent several levels of aggregation in a single query.

This is the motivation behind “Rollup” and “Cube” which are extra options on the GROUP BY clause in SQL

Several Levels of Aggregation in same query – Rollup

Page 34: Relational On-Line Analytical Processing (ROLAP) 1

34

Multi-Dimensional views of data can be stored in a relational database.

The SQL Group by clause can be used to aggregate data but queries can be tedious.

The new keywords of Rollup and Cube have been added to SQL for use in Relational OnLine Analytical Processing (ROLAP)

Practical exercises in the lab next week.

Summary