ISO/IEC JTC1 SC32: SQL/OLAP

Sang-Won Lee, Let’s e-Wha!

Email: swlee@ewha.ac.kr URL: http://home.ewha.ac.kr/~swlee

Jul. 12th, 2001



Page 2

Contents

Introduction to OLAP and SQL Issues

Current OLAP Solutions

SQL/OLAP

Future OLAP Trends

Page 3

OLAP

On-Line Analytical Processing

– E.F. Codd coined the term “OLAP” ([1])

– Multi-dimensional data model

– vs. On-Line Transaction Processing

– vs. Data warehouse

Page 4

Data Warehouse Architecture

Page 5

Multi-dimensional Data Model

Sales(prod-id, store-id, time-id, qty, amt)

[Figure: SALES cube with MARKET and TIME axes, showing the Regional Mgr. View, Product Mgr. View, Financial Mgr. View and Ad Hoc View]

Dimension: Product, Store, Time

Hierarchy:

– Product -> Category -> Industry

– Store -> City -> State -> Country

– Date -> Month -> Quarter -> Year

Page 6

Multi-dimensional Data Model (2)

Operations

– roll-up/drill-down

– slice/dice

– pivot

– ranking

– comparisons

– drill-across

– etc.

Example

– for each state, show me the top 10 products based on total sales

– what is the percentage growth of Jan-99 total sales over total Jan-98?

– for each product, show me the quantity shipped and sold

Page 7

Database Back in the OLAP Game: History of SQL Evolutions in the 1990s (OLAP Area)

Requirements from industries (‘95 ~ ‘96)

– R. Kimball, “Why Decision Support Fails and How to Fix it?” ([2]); see also [3], [4]

Reactions from researchers (‘96)

– Jim Gray et al., “Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab and Sub Totals,” ([7,8])

– D. Chatziantoniou, K. Ross, “Querying Multiple Features in Relational Databases,” ([9])

Commercial DBMSs and SQL standards (‘98 ~ )

– commercial products: e.g. Oracle, “Analytical Functions for Oracle8i”, Oct., 1999

– SQL standards

    ANSI X3H2-96-205(R3): Super Sets (The Cube and Beyond)

    ANSI NCITS H2-99-154: Introduction to OLAP

    see also [6]

Page 8

OLAP Operations

Many business operations were hard or impossible to express in SQL

– multiple aggregations

– comparisons (with aggregation)

– reporting features

Be prepared for a serious performance penalty

Client and middleware tools provide the necessary functionality

– OLAP server: ROLAP vs. MOLAP

Page 9

Multiple Aggregations

Create a 2-dimensional spreadsheet that shows the sum of sales by maker as well as model of car

Each subtotal requires a separate aggregate query

[Figure: cross tab of sales by color (RED, WHITE, BLUE) and make (Chevy, Ford), with "By Make", "By Color" and "Sum" totals]

    -- NULL pads the missing columns so that every branch returns three columns
    SELECT color, make, sum(amt)
    FROM sales
    GROUP BY color, make
    UNION
    SELECT color, NULL, sum(amt)
    FROM sales
    GROUP BY color
    UNION
    SELECT NULL, make, sum(amt)
    FROM sales
    GROUP BY make
    UNION
    SELECT NULL, NULL, sum(amt)
    FROM sales
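The four-query approach can be tried out with a minimal sketch like the one below, which runs the UNION through Python's built-in sqlite3 module. The sample rows and the helper name cross_tab are assumptions made for illustration, and the narrower branches are padded with NULL so that every branch returns three columns.

```python
import sqlite3

# Hypothetical sample rows (color, make, amt); not taken from the slides.
ROWS = [
    ("RED", "Chevy", 5), ("RED", "Ford", 3),
    ("WHITE", "Chevy", 4), ("WHITE", "Ford", 2),
    ("BLUE", "Chevy", 6), ("BLUE", "Ford", 1),
]

# One aggregate query per level of the cross tab, glued together with UNION.
CROSS_TAB_SQL = """
SELECT color, make, SUM(amt) FROM sales GROUP BY color, make
UNION
SELECT color, NULL, SUM(amt) FROM sales GROUP BY color
UNION
SELECT NULL, make, SUM(amt) FROM sales GROUP BY make
UNION
SELECT NULL, NULL, SUM(amt) FROM sales
"""


def cross_tab(rows):
    """Run the four aggregate queries and return {(color, make): total}."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (color TEXT, make TEXT, amt INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    result = {(color, make): total
              for color, make, total in conn.execute(CROSS_TAB_SQL)}
    conn.close()
    return result


totals = cross_tab(ROWS)
print(totals[("RED", None)])   # subtotal for one color: 8
print(totals[(None, "Ford")])  # subtotal for one make: 6
print(totals[(None, None)])    # grand total: 21
```

Each added dimension doubles the number of SELECT branches, which is the scaling problem the CUBE operator later in the deck addresses.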

Page 10

Comparisons

Examples:

– last year’s sales vs. this year’s sales for each product

requires a self-join

VIEW:

    create or replace view v_sales as
    select prod_id, year, sum(qty) as sale_sum
    from sales
    group by prod_id, year;

QUERY:

    select curr.prod_id, curr.year cur_year, curr.sale_sum cur_sales, last.sale_sum last_sales
    from v_sales curr, v_sales last
    where curr.prod_id = last.prod_id
      and curr.year = (last.year + 1)
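As a rough, runnable illustration of the self-join, the sketch below builds the view in an in-memory SQLite database. The sample rows are hypothetical, the identifiers use underscores (prod_id), the second alias is named prev, and the join also matches on prod_id so that each product is compared only with itself; all of these are assumptions rather than the slide's exact text.

```python
import sqlite3

# Hypothetical rows (prod_id, year, qty); not taken from the slides.
ROWS = [(1, 1999, 10), (1, 1999, 5), (1, 2000, 20), (2, 1999, 7), (2, 2000, 4)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (prod_id INTEGER, year INTEGER, qty INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", ROWS)

# Yearly totals per product, as in the VIEW on the slide.
conn.execute("""
    CREATE VIEW v_sales AS
    SELECT prod_id, year, SUM(qty) AS sale_sum
    FROM sales
    GROUP BY prod_id, year
""")

# Join the view with itself: each row meets the same product's previous year.
comparison = conn.execute("""
    SELECT curr.prod_id, curr.year, curr.sale_sum, prev.sale_sum
    FROM v_sales curr, v_sales prev
    WHERE curr.prod_id = prev.prod_id
      AND curr.year = prev.year + 1
    ORDER BY curr.prod_id
""").fetchall()
conn.close()

print(comparison)  # [(1, 2000, 20, 15), (2, 2000, 4, 7)]
```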

Page 11

Reporting Features

It was too complex to express

– rank (top 10) and N_tile (“top 30%” of all products)

– median, mode, …

– running total, moving average, cumulative totals

Page 12

Reporting Features (2)

Examples:

– a moving average (over a 3-day window) of total sales for each product for 2000

VIEW:

    create or replace view v_sales as
    select prod_id, time_id, sum(qty) as sale_sum
    from sales
    group by prod_id, time_id;

QUERY:

    -- aliases s (window start) and e (window end) avoid the reserved words START and END
    select e.prod_id, e.time_id, avg(s.sale_sum)
    from v_sales s, v_sales e
    where e.prod_id = s.prod_id
      and e.time_id >= s.time_id and e.time_id <= s.time_id + 2
    group by e.prod_id, e.time_id

Page 13

OLAP Servers

Processing MD queries efficiently

Page 14

ROLAP

[Figure: ROLAP architecture, top to bottom: OLAP clients, OLAP engine, relational database (star or snowflake schema)]

meta-data: to map the warehouse schema into an MD model

Page 15

ROLAP (2)

Example: Oracle Discoverer 4i leverages Oracle 8i

– 8i - biggest SQL improvements in a decade!

– more powerful analysis using new analytic functions

– sharing query redirection(rewrite) using MVs

– 100% automated summary management

Page 16

MOLAP

A multidimensional database(MDDB) stores data in a series of array structures, indexed to provide optimal access time to any element in the array.

Example: Oracle Express stores arrays of data

[Figure: a 3 x 3 x 3 array with PRODUCT, MONTH and CITY axes (each indexed 0 to 2), whose 27 cells are numbered 0 to 26 and laid out as one linear array]
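The array addressing behind such a store can be sketched as follows. The 3 x 3 x 3 shape and the cell numbers 0 to 26 come from the figure; which axis varies fastest cannot be recovered from the transcript, so the ordering used here (product fastest, then month, then city) is an assumption, as are the function names.

```python
# Assumed layout: 3 products x 3 months x 3 cities, product varying fastest.
N_PRODUCT, N_MONTH, N_CITY = 3, 3, 3


def cell_offset(product, month, city):
    """Map a (product, month, city) coordinate to a position in the flat array."""
    return product + N_PRODUCT * (month + N_MONTH * city)


def cell_coordinate(offset):
    """Inverse of cell_offset: recover (product, month, city) from a position."""
    product = offset % N_PRODUCT
    month = (offset // N_PRODUCT) % N_MONTH
    city = offset // (N_PRODUCT * N_MONTH)
    return product, month, city


# One flat list stands in for the multidimensional store.
cube = [0] * (N_PRODUCT * N_MONTH * N_CITY)
cube[cell_offset(product=2, month=1, city=0)] = 42

print(cell_offset(2, 2, 2))  # last cell: 26
print(cell_coordinate(14))   # (2, 1, 1)
```

Because the offset is pure arithmetic, any cell is reached without a search, which is the access-time argument made on the slide.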

Page 17

Proposed SQL Constructs

Multiple aggregations

– Gray et al., “Cube and Roll-Up” [7,8]

Comparison

– Chatziantoniou and Ross, “Group By Column Variable” [9]

    SELECT subscriber, r.login_time
    FROM log
    GROUP BY subscriber: r
    SUCH THAT r.spent_time = max(spent_time)

Reporting

– Redbrick provides SQL extensions in RISQL: rank, tertile, ratio-to-report, etc.

Page 18

The Data CUBE Relational Operator Generalizes Group By and Aggregates

[Figure: "The Data Cube and The Sub-Space Aggregates". Four panels of increasing dimensionality: Aggregate (Sum); Group By (with total) by color (RED, WHITE, BLUE) with Sum; Cross Tab of color by make (Chevy, Ford) with By Make, By Color and Sum; and the data cube over make (CHEVY, FORD), year (1990 to 1993) and color, with the aggregates By Make, By Year, By Color, By Make & Color, By Make & Year, By Color & Year and Sum]

source: [7]

Page 19

Getting Sub-totals: ROLLUP Operation

    SELECT year, brand, SUM(qty)
    FROM sales
    GROUP BY ROLLUP (year, brand);

    YEAR  BRAND   SUM(qty)
    1996  Ford         250
    1996  Honda        300
    1996  Toyota       450
    1996              1000
    1997  Ford         300
    …
    1997              1200
                      2200
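What ROLLUP computes can be sketched in plain Python: one aggregation per prefix of the grouping columns, from (year, brand) down to the empty prefix. The 1996 rows and 1997 Ford come from this slide, and the remaining 1997 rows are the ones listed on Page 21; the function name rollup and the use of None for rolled-up columns are illustrative choices, not a description of how any DBMS implements the operator.

```python
from collections import defaultdict

# (year, brand, qty) rows as listed on the ROLLUP and GROUPING SETS slides.
SALES = [
    (1996, "Ford", 250), (1996, "Honda", 300), (1996, "Toyota", 450),
    (1997, "Ford", 300), (1997, "Honda", 350), (1997, "Toyota", 550),
]


def rollup(rows, n_dims):
    """Aggregate over every prefix of the grouping columns, ROLLUP-style.

    Rolled-up columns are reported as None, mirroring SQL's NULL.
    """
    totals = defaultdict(int)
    for *dims, measure in rows:
        for keep in range(n_dims, -1, -1):
            key = tuple(dims[:keep]) + (None,) * (n_dims - keep)
            totals[key] += measure
    return dict(totals)


result = rollup(SALES, n_dims=2)
print(result[(1996, None)])  # subtotal for 1996: 1000
print(result[(1997, None)])  # subtotal for 1997: 1200
print(result[(None, None)])  # grand total: 2200
```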

Page 20

Getting Cross-tabs: CUBE Operation

    SELECT year, brand, SUM(amount)
    FROM sales
    GROUP BY CUBE (year, brand);

    YEAR  BRAND   SUM(AMOUNT)
    1996  Ford            250
    ...
    1996  Toyota          450
    1997  Ford            300
    ...
    1997                 1200
                         2200
          Ford            550
          Honda           650
          Toyota         1000
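CUBE differs from ROLLUP in that it aggregates over every subset of the grouping columns, not only the prefixes, which is where the per-brand totals come from. A minimal sketch of that idea, using the year and brand figures shown on the slides (the function name cube and the None convention are assumptions):

```python
from collections import defaultdict
from itertools import combinations

# (year, brand, amount) rows as listed on the slides.
SALES = [
    (1996, "Ford", 250), (1996, "Honda", 300), (1996, "Toyota", 450),
    (1997, "Ford", 300), (1997, "Honda", 350), (1997, "Toyota", 550),
]


def cube(rows, n_dims):
    """Aggregate over every subset of the grouping columns, CUBE-style."""
    subsets = [set(combo)
               for size in range(n_dims + 1)
               for combo in combinations(range(n_dims), size)]
    totals = defaultdict(int)
    for *dims, measure in rows:
        for kept in subsets:
            # Columns outside the current subset collapse to None (SQL NULL).
            key = tuple(value if i in kept else None
                        for i, value in enumerate(dims))
            totals[key] += measure
    return dict(totals)


result = cube(SALES, n_dims=2)
print(result[(None, "Ford")])    # 550
print(result[(None, "Honda")])   # 650
print(result[(None, "Toyota")])  # 1000
print(len(result))               # 6 detail + 2 year + 3 brand + 1 total = 12
```

With n grouping columns there are 2^n subsets, so the result grows quickly as dimensions are added.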

Page 21

Flexible Grouping: GROUPING SETS Operator

    SELECT year, brand, color, SUM(qty)
    FROM sales
    GROUP BY GROUPING SETS ((year, brand), (brand, color), ());

    YEAR  BRAND   COLOR  SUM(QTY)  GROUPING SET
    1996  Ford                250  Year, Brand
    1996  Honda               300
    1996  Toyota              450
    1997  Ford                300
    1997  Honda               350
    1997  Toyota              550
          Ford    Blue        400  Brand, Color
          Ford    Red         150
          Honda   Blue        650
          Toyota  Red         700
          Toyota  White       300
                             2200  Grand total
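The same result can be reproduced with a small Python sketch that aggregates once per requested grouping set. The slide shows only totals, so the detail rows below are hypothetical, chosen solely so that their sums agree with the table; the function name grouping_sets is likewise an assumption.

```python
from collections import defaultdict

COLUMNS = ("year", "brand", "color")

# Hypothetical detail rows (year, brand, color, qty), chosen only so that
# their totals agree with the result table on the slide.
SALES = [
    (1996, "Ford", "Blue", 250), (1997, "Ford", "Blue", 150),
    (1997, "Ford", "Red", 150),
    (1996, "Honda", "Blue", 300), (1997, "Honda", "Blue", 350),
    (1996, "Toyota", "Red", 450), (1997, "Toyota", "Red", 250),
    (1997, "Toyota", "White", 300),
]


def grouping_sets(rows, sets):
    """Aggregate qty once per requested grouping set.

    Columns outside the current grouping set are reported as None (SQL NULL).
    """
    totals = defaultdict(int)
    for *dims, qty in rows:
        for kept in sets:
            key = tuple(value if name in kept else None
                        for name, value in zip(COLUMNS, dims))
            totals[key] += qty
    return dict(totals)


result = grouping_sets(SALES, sets=[("year", "brand"), ("brand", "color"), ()])
print(result[(1997, "Toyota", None)])  # 550
print(result[(None, "Ford", "Blue")])  # 400
print(result[(None, None, None)])      # 2200
print(len(result))                     # 12 rows, as in the table
```

In these terms, ROLLUP (year, brand) is shorthand for the sets (year, brand), (year), () and CUBE (year, brand) adds (brand) to that list.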

Page 22

LAG Operator

    TIMEKEY  SALES  SALES_LAST_YEAR  SALES_CHANGE
    98-1      1100                -             -
    …..          …                …           ...
    99-1      1200             1100           100
    99-2      1500             1450            50
    99-3      1700             1350           350
    99-4      1600             1700          -100
    99-5      1800             1600           200
    99-6      1500             1450            50
    99-7      1300             1250            50
    99-8      1400             1200           200

    SELECT timekey, sales,
           LAG(sales, 12) OVER (ORDER BY timekey) AS sales_last_year,
           sales - LAG(sales, 12) OVER (ORDER BY timekey) AS sales_change
    FROM sales;
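The effect of LAG can be mimicked in a few lines of Python: shift the ordered column down by a fixed number of rows and subtract. The quarterly figures below are hypothetical and use an offset of 4 to keep the sample short, whereas the slide uses 12 on monthly rows; lag and year_over_year are illustrative names.

```python
# Hypothetical quarterly sales, already ordered by timekey.
SALES = [
    ("98-Q1", 1100), ("98-Q2", 1450), ("98-Q3", 1350), ("98-Q4", 1700),
    ("99-Q1", 1200), ("99-Q2", 1500), ("99-Q3", 1700), ("99-Q4", 1600),
]


def lag(values, offset=1):
    """Return values shifted down by `offset` rows; missing rows become None."""
    return [values[i - offset] if i >= offset else None
            for i in range(len(values))]


def year_over_year(rows, periods_per_year):
    """Pair each period with the same period one year earlier and the change."""
    sales = [amount for _, amount in rows]
    previous = lag(sales, periods_per_year)
    return [(key, cur, prev, None if prev is None else cur - prev)
            for (key, cur), prev in zip(rows, previous)]


report = year_over_year(SALES, periods_per_year=4)
print(report[0])  # ('98-Q1', 1100, None, None)
print(report[4])  # ('99-Q1', 1200, 1100, 100)
```

Unlike the self-join formulation, this needs a single ordered pass over the data.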

Page 23

MOVING Average

    SELECT time_id,
           avg(sum(qty)) OVER (ORDER BY time_id
                               RANGE INTERVAL '2' DAY PRECEDING) AS mvg_avg_sales
    FROM sales
    GROUP BY time_id;
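A RANGE window selects rows by distance in the ordering value rather than by row count, so days with no sales simply drop out of the average. The sketch below imitates that behaviour for a two-day-preceding window; the daily totals and the function name moving_average are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical daily totals (time_id, sum of qty); note the gap after Jan 3.
DAILY = [
    (date(2000, 1, 1), 10), (date(2000, 1, 2), 20),
    (date(2000, 1, 3), 30), (date(2000, 1, 6), 40),
]


def moving_average(rows, days_preceding=2):
    """Average each day's total with the totals of the preceding days.

    Mirrors a RANGE window: membership is decided by date distance, so
    missing days contribute nothing.
    """
    window = timedelta(days=days_preceding)
    result = []
    for current_day, _ in rows:
        in_window = [qty for day, qty in rows
                     if current_day - window <= day <= current_day]
        result.append((current_day, sum(in_window) / len(in_window)))
    return result


averages = moving_average(DAILY)
for day, value in averages:
    print(day.isoformat(), value)
```

For Jan 6 the window covers Jan 4 to Jan 6, which holds a single row, so the average equals that day's total. A ROWS-based window would instead pull in Jan 2 and Jan 3.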

Page 24

SQL/OLAP

Why enhance the RDBMS for OLAP calculations?

– Performance

– Scalability

– Simpler SQL development

– Productivity

                       Before    After   Improvement
    Rollup               8.32     1.42          486%
    Functional Index     8.62     0.91          847%
    Top 10               4.26     1.02          318%
    Moving window       43.62     4.97          778%
    Cumulative window   45.55     3.36         1256%
    Lead and lag       175.01     4.96         3428%

[Figure: bar chart of % improvement (axis 0% to 4000%) for Rollup, Functional Index, Top 10, Moving window, Cumulative window, and Lead and lag]
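The slide does not state how the Improvement figures were derived or what unit the Before and After values are in. They are consistent with (Before / After - 1) x 100, which the short check below recomputes; treat the formula as an inference from the numbers, not as the author's stated method.

```python
# (operation, before, after, improvement % as printed on the slide)
TIMINGS = [
    ("Rollup", 8.32, 1.42, 486),
    ("Functional Index", 8.62, 0.91, 847),
    ("Top 10", 4.26, 1.02, 318),
    ("Moving window", 43.62, 4.97, 778),
    ("Cumulative window", 45.55, 3.36, 1256),
    ("Lead and lag", 175.01, 4.96, 3428),
]


def improvement_pct(before, after):
    """Relative improvement in percent, assuming (before / after - 1) * 100."""
    return (before / after - 1) * 100


for name, before, after, stated in TIMINGS:
    print(f"{name}: {improvement_pct(before, after):.0f}% (slide: {stated}%)")
```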

Page 25

Database Back in the OLAP Game

Materialized views

Index techniques: e.g. bitmap (join) index

Partitioning: e.g. range/hash/list

Query optimization: e.g. star query optimization

......

Page 26

Future OLAP Trends

To be or not to be?

OLAP API:

– OLE DB for OLAP

– JOLAP

Page 27

References

[1] E.F. Codd et al., “Providing OLAP (On-line Analytical Processing) to User-Analysts: An IT Mandate,” available from Arborsoft’s Web site (http://www.arborsoft.com)

[2] R. Kimball, “Why Decision Support Fails and How to Fix it?” SIGMOD Record, Sep., 1995

[3] R. Kimball, “The Problem with Comparisons,” DBMS Magazine, Jan., 1996 (also available from http://www.rkimball.com/html/articles.html)

[4] R. Kimball, “SQL Roadblocks and Pitfalls,” DBMS Magazine, Feb., 1996 (also available from http://www.rkimball.com/html/articles.html)

[5] R. Winter, “Database Back in the OLAP Game,” Intelligent Enterprise Magazine, Dec., 1998 (available from http://www.intelligententerprise.com)

[6] R. Winter, “SQL-99’s New OLAP Functions,” Intelligent Enterprise Magazine, Jan., 2000 (available from http://www.intelligententerprise.com)

[7] Jim Gray et al., “Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab and Sub Totals,” Proceedings of the International Conference on Data Engineering, pp. 152-159, 1996

[8] Jim Gray et al., “Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab and Sub Totals,” Data Mining and Knowledge Discovery Journal, Vol. 1, No. 1, 1997

[9] D. Chatziantoniou, K. Ross, “Querying Multiple Features in Relational Databases,” Proc. of VLDB Conf., 1996