data warehouse and olap - lear-fabini

28
Data Warehouses, OLAP, and the CUBE operator Jake Lear, Scott Fabini CS586 - Winter 2015

Upload: scott-fabini

Post on 07-Aug-2015

122 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Data Warehouse and OLAP - Lear-Fabini

Data Warehouses,

OLAP, and the

CUBE operator

Jake Lear, Scott Fabini

CS586 - Winter 2015

Page 2: Data Warehouse and OLAP - Lear-Fabini

OLTP/OLAP ComparisonOLAP: On-Line

Analytic Processing

OLTP: On-Line

Transaction Processing

OLTP Query for A Single

Sales Order in an SAP DB

OLAP Query for revenue

by Geography / Product-

Line in an IBM Cognos DB

Page 3: Data Warehouse and OLAP - Lear-Fabini

OLTP/OLAP Comparison

OLTP

• Simple, frequent queries

• Up-to-the-minute information

• Fast response time needed

OLAP

• Few queries, but very complex

• Lots of indexes

• May run for hours

• Doesn’t need up-to-the-minute info

Page 4: Data Warehouse and OLAP - Lear-Fabini

OLTP QueriesOLAP Query

Animation

OLAP Queries on a DB

Page 5: Data Warehouse and OLAP - Lear-Fabini

OLAP Queries

• We have a long-running transaction

analyzing all sales data.

• What about the new sales data

coming into the database while our

transaction is running?

• Talk to your neighbor. Think in

terms of serializable transactions

and locking mechanisms…

• Recall, Reads get an S-Lock.

Writers forbidden from

getting their X-Lock

Stalled Transactions!

Animation

Page 6: Data Warehouse and OLAP - Lear-Fabini

OLAP

Data Warehouse

OLTP

Database

Copy

Over

night

Support fast transactionsDedicated for Analytics of

Aggregate Data

Animation

Page 7: Data Warehouse and OLAP - Lear-Fabini

The Data Warehouse Congregates

Multiple Data Sources

CRM

SCP

ERP

Data Warehouse

Supply Chain

(Vendor) Data

Factory

Planning Data

Sales/

Marketing Data

OLAP Queries

What is our worst-selling vehicle?

Do we need another factory to meet demand?

Should we focus sales on Europe or Japan?

Page 8: Data Warehouse and OLAP - Lear-Fabini

Summarizing the

Motivation for OLAP• We want to add lots of Indexes for faster search, and perform

long-running transactions, but this slows down the DB

• We only need to look at the long term trends, not up-to-the-

second data

• We may want to aggregate data from multiple DBs

• Answer: Move the data to a Data Warehouse for targeted

analysis. Use OLAP techniques which we learn today to slice

& dice the massive aggregate into discrete actionable data.

Page 9: Data Warehouse and OLAP - Lear-Fabini

Auto Sales Example

• Say we want to record in a data warehouse all auto sales

• We would create a Table with key info for the sale

• Serial Number, Dealer Name, Date of Sale, Price of Sale

• The relation becomes our Fact Table:

• sales(auto, dealer, date, price)

Page 10: Data Warehouse and OLAP - Lear-Fabini

Visualizing Multidimensional

Data

• Fact Table: a central

relation or collection

of data

• E.g. all information

about a sale of an

automobile

• Dimensions:

Represent the

properties of the sale Dealer

Days

Auto

Page 11: Data Warehouse and OLAP - Lear-Fabini

Dealer

Auto

Dicing the Data

• The Fact Table is ‘diced’

(divided) by its dimensions

- the Dimension Tables

Date

Boring Barney’s

Witty Wilma’s

Friendly Fred’s

Go

bi

Aa

rdva

rk

200120022003200420052006

Page 12: Data Warehouse and OLAP - Lear-Fabini

Price

Dependent

AttributesDimension attributes

AutoDealerDate

ColorModelSerial#

Phone#CityNameYearMonthDay

Dealer Dimension

Auto Dimension

Days Dimension

Price

Sales Fact Table

Implementation: Star Schema

Date

Dimension

allows us to

aggregate by

year, month,

or week

Dealer

Dimension

allows us to

aggregate by

dealer

Auto

Dimension

allows us to

aggregate by

Model or Color

Animation

Dependent

Attributes help

group Aggregate

data, (like sum of

prices of autos

sold)

-

Page 13: Data Warehouse and OLAP - Lear-Fabini

Working within a Star Schema

• A Star Schema example can be found in my personal

‘sfabini’ CAT database

• Connect by issuing the following command:

Please do not alter the database

Page 14: Data Warehouse and OLAP - Lear-Fabini

ColorModelSerial#

Phone#CityName

YearMonthDay

Dealer Dimension

Auto Dimension

Days Dimension

Price

Implementation: Star Schema

Dimensions

-

Page 15: Data Warehouse and OLAP - Lear-Fabini

Price

Dependent

AttributesDimension attributes

Auto DealerDate

ColorModelSerial# Phone#CityName

YearMonthDay

Dealer DimensionAuto Dimension

Date Dimension

Price

Sales Fact Table

Implementation: Star Schema

Fact Table

-

Page 16: Data Warehouse and OLAP - Lear-Fabini

Dealer

Auto

Dicing the Data

• Recapping, does everyone see

the ‘dimensions’ and how

they divide the Fact Table?

• Date divided along x axis

• Different dealers on y axis

• Different autos on z axis

• Within the cube (fact table) are

the dots -- individual auto sales,

including how much it sold for (price)

Date

Boring Barney’s

Witty Wilma’s

Friendly Fred’s

Go

bi

Aa

rdva

rk

200120022003200420052006

Page 17: Data Warehouse and OLAP - Lear-Fabini

Dealer

Auto

Exercise: Slicing the Data

• Find Average Sales Price

by State

SELECT state, AVG(price)

FROM sales, dealers

WHERE sales.dealer =

dealer.name

GROUP BY state;

Date

Boring Barney’sWitty Wilma’s

ArizonaN

ew M

exico

Friendly Fred’s

Try it yourself!

Which state’s dealers maximize ASP

(average selling price)?

Page 18: Data Warehouse and OLAP - Lear-Fabini

Dealer

Auto

Exercise: Drilling Down

• Find Total Sales Price for Gobis at

Friendly Fred’s

SELECT color, SUM(price)

FROM sales,

NATURAL JOIN autos

JOIN dealer ON name = sales.dealer

WHERE sales.dealer = ‘Friendly Fred’

AND

Model = ‘Gobi’

GROUP BY color;

Date

Boring Barney’sWitty Wilma’s

ArizonaN

ew M

exico

Friendly Fred’s

Try it yourself!

Joining tables in your query, and then

Grouping ‘drills down’ the table

Page 19: Data Warehouse and OLAP - Lear-Fabini

Rolling Up and Drilling Down

• Drill-Down: Focusing on specific values in certain

dimensions

• looking at the sales of red Gobis at Friendly Fred’s

• Roll-Up: Aggregate along one or more dimensions

• Looking at all colors as groups to find the best selling color overall

Page 20: Data Warehouse and OLAP - Lear-Fabini

Foundations of the CUBE

Operator

Dealer

Date

AutoSum of price of

all autos, by a

specific dealer,

on all dates

Sum of price of a

specific auto, by a

specific dealer, on

all datesPrice of a specific

auto, by a specific

dealer, on a

specific date

The Cube

operator adds a

border of

aggregations to

the data cube

across all

combinations of

dimensions.

Sum of price of all

autos, by all

dealers, on all

dates

Page 21: Data Warehouse and OLAP - Lear-Fabini

CUBE vs. ROLLUP Operators

• WITH CUBE: If we add the term WITH CUBE to a group-by

clause, then we get not only the tuples for the group , but

also the tuples that represent aggregation along one or

more dimensions along which we have grouped

• WITH ROLLUP: A variant of the CUBE operator that

produces the additional aggregated tuples over the tail of

the sequence of grouping attributes

Page 22: Data Warehouse and OLAP - Lear-Fabini

Working with a Cube and

phpMyAdmin

• We need to use MySQL to demonstrate Cube/Rollup

• A Cube example can be found in my personal ‘sfabini’ CAT

database

• Connect via website: cat.pdx.edu/phpMyAdmin/

Page 23: Data Warehouse and OLAP - Lear-Fabini

Working with a Cube

• We need to use MySQL to demonstrate Cube/Rollup

• A Cube example can be found in my personal ‘sfabini’ CAT

database

• Connect by issuing the following command:

Page 25: Data Warehouse and OLAP - Lear-Fabini

Results: Auto Sales WITH ROLLUP

in MySQL

Aggregate of blue Gobis sold

Aggregate of red Gobis sold

Aggregate of all Gobis sold

Aggregate of all cars sold

Aggregate of red Gobis sold by any dealer on this date

Page 27: Data Warehouse and OLAP - Lear-Fabini

• CUBE Query Result (Partial):

• Rollup Query Result:

Comparison: WITH ROLLUP vs.

WITH CUBE results

All colors of Gobis sold at Friendly Fred’s on 5/21

Aggregate of all Gobis sold

Aggregate of all autos sold

Aggregate of all Gobi sales by any dealer on 5/21

… all such combinations are listed WITH CUBE

Red Gobis sold at Friendly Fred’s on 5/21

Red Gobis sold at any dealer on 5/21

Red Gobis sold at any dealer on any dateAny Gobi sold at any dealer on any date

Aggregate of all autos sold

Page 28: Data Warehouse and OLAP - Lear-Fabini

Homework

• 1a) Use the ‘sfabini’ mysql db to answer Exercise 10.7.2 (a)

and (b) from the book

• Notes:

• Execute your query FROM the SalesCube table

• Use the WITH ROLLUP operator, instead of WITH CUBE

• Smilin Sally is spelled without the apostrophe in the DB

• See Example 10.33 for further hints on using WITH ROLLUP