On-Line Analytic Processing
Warehousing
Data Cubes
Overview
• Traditional database systems are tuned to many, small, simple queries.
• Some new applications use fewer, more time-consuming, complex analytic queries.
• New architectures have been developed to handle analytic queries efficiently.
OLTP
• Most database operations involve On-Line Transaction Processing (OLTP).
– Short, simple, frequent queries and/or modifications, each involving a small number of tuples.
– Examples: answering queries from a Web interface, sales at cash registers, selling airline tickets.
OLAP
• On-Line Analytic Processing (OLAP; the A is sometimes read as “application”) queries:
– Few, but complex queries.
– May query a large amount of data and run for hours.
– Do not depend on having an absolutely up-to-date database.
Example: OLAP Application
• Analysts at Wal-Mart look for items whose sales have recently increased in some region.
Sales(saledate, item, store, qty)
Items(item, size, color)
Stores(store, city, province)
SELECT item, city, SUM(qty)
FROM Sales NATURAL JOIN Stores
WHERE saledate >= '2009-01-01'
GROUP BY item, city;
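The query above can be tried on a toy data set. The following is a minimal sketch using Python's built-in sqlite3 module; the sample rows, the store-to-city assignments, and the choice to store dates as ISO text (so that string comparison orders them correctly) are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sales(saledate TEXT, item TEXT, store TEXT, qty INTEGER);
    CREATE TABLE Stores(store TEXT, city TEXT, province TEXT);
""")

# Hypothetical sample data, not taken from the slides.
conn.executemany("INSERT INTO Stores VALUES (?, ?, ?)", [
    ("Shop-1", "Nanjing", "Jiangsu"),
    ("Shop-2", "Nanjing", "Jiangsu"),
    ("Shop-3", "Suzhou", "Jiangsu"),
])
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?)", [
    ("2008-12-30", "TV", "Shop-1", 5),  # before the cutoff, filtered out
    ("2009-01-02", "TV", "Shop-1", 2),
    ("2009-01-03", "TV", "Shop-2", 3),
    ("2009-02-10", "TV", "Shop-3", 4),
    ("2009-02-11", "PC", "Shop-3", 1),
])

# Same query as on the slide; ORDER BY is added only to make the output stable.
rows = conn.execute("""
    SELECT item, city, SUM(qty)
    FROM Sales NATURAL JOIN Stores
    WHERE saledate >= '2009-01-01'
    GROUP BY item, city
    ORDER BY item, city
""").fetchall()

for row in rows:
    print(row)
```

The two Nanjing shops are merged into a single (TV, Nanjing) group, which is the kind of aggregation over many tuples that distinguishes OLAP queries from OLTP ones.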
Data Warehouse
• It’s better for OLAP applications to take place in a separate copy of the master database.
• Analysis may involve data from various sources across the enterprise.
• A data warehouse is the most common form of data integration.
– Copy sources into a single DB (warehouse) and try to keep it up-to-date.
– Usual method: periodic reconstruction of the warehouse, perhaps overnight.
Common Architecture
• Databases at store branches handle OLTP.
• Local store databases are copied to a central warehouse overnight.
• Analysts use the warehouse for OLAP.
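The overnight copy can be pictured as a full rebuild of the warehouse from the branch databases. The sketch below is only an illustration under stated assumptions: each branch is an in-memory SQLite database with a hypothetical Sales(saledate, item, qty) table, and the helper names make_branch and rebuild_warehouse are invented here.

```python
import sqlite3

def make_branch(rows):
    """Create a hypothetical branch (OLTP) database holding local sales."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Sales(saledate TEXT, item TEXT, qty INTEGER)")
    db.executemany("INSERT INTO Sales VALUES (?, ?, ?)", rows)
    db.commit()
    return db

def rebuild_warehouse(warehouse, branches):
    """Periodic reconstruction: drop the old copy and reload every branch."""
    warehouse.execute("DROP TABLE IF EXISTS Sales")
    warehouse.execute(
        "CREATE TABLE Sales(saledate TEXT, item TEXT, store TEXT, qty INTEGER)")
    for store, db in branches.items():
        local = db.execute("SELECT saledate, item, qty FROM Sales").fetchall()
        # Tag each row with the store it came from while copying.
        warehouse.executemany(
            "INSERT INTO Sales VALUES (?, ?, ?, ?)",
            [(d, i, store, q) for d, i, q in local])
    warehouse.commit()

branches = {
    "Shop-1": make_branch([("2009-01-02", "TV", 2), ("2009-01-02", "PC", 1)]),
    "Shop-2": make_branch([("2009-01-03", "TV", 3)]),
}
warehouse = sqlite3.connect(":memory:")

rebuild_warehouse(warehouse, branches)
rebuild_warehouse(warehouse, branches)  # a second nightly run replaces, not appends

count = warehouse.execute("SELECT COUNT(*) FROM Sales").fetchone()[0]
total_qty = warehouse.execute("SELECT SUM(qty) FROM Sales").fetchone()[0]
print(count, total_qty)
```

Rebuilding from scratch is simple and keeps the warehouse consistent with the sources as of the last run, at the cost of the data being up to a day old, which OLAP queries tolerate.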
Star Schemas
• A star schema is a common organization for data at a warehouse. It consists of:
– Fact table: a very large accumulation of facts such as sales. Often “insert-only.”
– Dimension tables: smaller, generally static information about the entities involved in the facts.
Example
• Suppose we want to record in a warehouse information about sales of products:
– the store where the item is sold
– the item sold
– the customer who bought the item
– the time when the item is sold
– the price
• The fact table is a relation:
Sales(store, item, customer, timeID, price)
Example (cont.)
• The dimension tables include information about stores, items, customers and time “dimensions”:
Stores(store,city,province)
Items(item, size, color, manf)
Customers(customer, addr, phone)
Time(timeID, day, week, month, year)
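As a rough sketch, the star schema above could be declared as follows in SQLite through Python's sqlite3 module. The column types, the primary keys, and the sample rows are assumptions; the slides give only the attribute names. Each dimension attribute of Sales is declared as a foreign key into its dimension table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

conn.executescript("""
    -- Dimension tables: small, mostly static.
    CREATE TABLE Stores(store TEXT PRIMARY KEY, city TEXT, province TEXT);
    CREATE TABLE Items(item TEXT PRIMARY KEY, size TEXT, color TEXT, manf TEXT);
    CREATE TABLE Customers(customer TEXT PRIMARY KEY, addr TEXT, phone TEXT);
    CREATE TABLE Time(timeID INTEGER PRIMARY KEY, day INTEGER, week INTEGER,
                      month INTEGER, year INTEGER);

    -- Fact table: one row per sale, pointing at each dimension.
    CREATE TABLE Sales(
        store    TEXT    REFERENCES Stores(store),
        item     TEXT    REFERENCES Items(item),
        customer TEXT    REFERENCES Customers(customer),
        timeID   INTEGER REFERENCES Time(timeID),
        price    REAL
    );
""")

# Hypothetical sample rows.
conn.execute("INSERT INTO Stores VALUES ('Shop-1', 'Nanjing', 'Jiangsu')")
conn.execute("INSERT INTO Items VALUES ('TV', '42in', 'black', 'Acme')")
conn.execute("INSERT INTO Customers VALUES ('Ann', '1 Main St', '555-0100')")
conn.execute("INSERT INTO Time VALUES (1, 5, 1, 1, 2009)")
conn.execute("INSERT INTO Sales VALUES ('Shop-1', 'TV', 'Ann', 1, 299.0)")

# A fact that refers to an unknown store is rejected.
try:
    conn.execute("INSERT INTO Sales VALUES ('Shop-9', 'TV', 'Ann', 1, 299.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

fact_count = conn.execute("SELECT COUNT(*) FROM Sales").fetchone()[0]
print(rejected, fact_count)
```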
Visualization: Star Schema
[Figure: the fact table Sales at the center, linked to the dimension tables Stores, Items, Customers and Time. The columns of the fact table are divided into dimension attributes and a dependent attribute.]
Dimension/Dependent Attributes
• Two classes of fact-table attributes:
– Dimension attributes: the key of a dimension table. Foreign key for the fact table.
– Dependent attributes: a value determined by the dimension attributes of the tuple. More often called “measure” attributes.
Example: Dependent Attribute
• price is the dependent attribute of our example Sales relation.
– Other dependent attributes can also be present, e.g., quantity.
• price is determined by the combination of the dimension attributes: store, item, customer and timeID.
Approaches to Implementation
• ROLAP = relational OLAP: Use relational DBMS to support star schemas.
• MOLAP = multidimensional OLAP: Use a specialized multidimensional data structure.
– e.g., data cube
• HOLAP = hybrid OLAP: Use both of the above.
Data Cube
• OLAP data can be modelled as points in a multidimensional space.
• Keys of dimension tables are the dimensions of a hypercube.
– Example: for the Sales data, the four dimensions are store, item, customer and time.
• Dependent attributes (e.g., price) appear as points within the multidimensional space.
Visualization: Data Cube
[Figure: a cube whose axes are store, item and customer; each cell holds a price.]
Data Cube w/ Aggregations
• Raw-data cube: original data in the fact table.
• Formal data cube: also includes points that represent aggregations (typically SUM) of the raw data, grouped by every subset of the dimensions.
– Precomputed aggregations.
– Critical for fast response to an analytic query.
Visualization: Formal Data Cube
[Figure: the cube with axes store, item and customer, each cell holding a price, extended with an extra face labeled “SUM over all customers”.]
Tuple w/ Aggregate Components
• Think of each dimension as having an additional value *.
– Stands for “all”.
• A point with one or more *’s in its coordinates aggregates over the dimensions with the *’s.
• Example: Sales(‘Shop-1’, ‘TV’, *, *) holds the sum of prices, over all customers and all time, of the TV sets sold at Shop-1.
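One way to read the * notation is as a wildcard when summing over the raw facts. The sketch below illustrates this; the sample facts and the function name cube_point are hypothetical, and a real cube would precompute these values rather than scan the facts on demand.

```python
ALL = "*"  # the extra "all" value of every dimension

# Hypothetical raw facts: (store, item, customer, timeID, price)
facts = [
    ("Shop-1", "TV", "Ann", 1, 300),
    ("Shop-1", "TV", "Bob", 2, 320),
    ("Shop-1", "PC", "Ann", 2, 500),
    ("Shop-2", "TV", "Ann", 1, 310),
]

def cube_point(facts, *coords):
    """Value at a cube point: sum of price over the dimensions marked '*'."""
    total = 0
    for *dims, price in facts:
        # A fact contributes if it agrees with every non-* coordinate.
        if all(c == ALL or c == d for c, d in zip(coords, dims)):
            total += price
    return total

tv_at_shop1 = cube_point(facts, "Shop-1", "TV", ALL, ALL)  # all customers, all time
grand_total = cube_point(facts, ALL, ALL, ALL, ALL)        # every fact
single_fact = cube_point(facts, "Shop-1", "TV", "Ann", 1)  # a raw-data point

print(tv_at_shop1, grand_total, single_fact)
```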
Building Data Cube in SQL
• In SQL:1999:
SELECT store, item, customer, SUM(price)
FROM Sales
GROUP BY CUBE(store, item, customer);
– Groups by all 2^3 = 8 subsets of the three dimensions.
– Uses NULL for the “*”.
– In SQL Server: GROUP BY … WITH CUBE
• To store the cube:
CREATE MATERIALIZED VIEW myCube AS
… cube-generating statement here …
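To see what GROUP BY CUBE produces, the following sketch emulates it in plain Python: it enumerates every subset of the dimensions and sums price within each group, using None where SQL would return NULL. The sample rows and the function names are assumptions made for illustration; a DBMS would compute this far more efficiently.

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("store", "item", "customer")

# Hypothetical sample facts.
sales = [
    {"store": "Shop-1", "item": "TV", "customer": "Ann", "price": 300},
    {"store": "Shop-1", "item": "TV", "customer": "Bob", "price": 320},
    {"store": "Shop-2", "item": "PC", "customer": "Ann", "price": 500},
]

def cube_grouping_sets(dims):
    """All subsets of the dimensions: 2^n grouping sets."""
    return [kept for r in range(len(dims) + 1) for kept in combinations(dims, r)]

def group_by_cube(rows, dims, measure):
    """Emulate SELECT dims, SUM(measure) ... GROUP BY CUBE(dims)."""
    result = defaultdict(int)
    for kept in cube_grouping_sets(dims):
        for row in rows:
            # Dimensions outside the grouping set are aggregated away (None = "*").
            key = tuple(row[d] if d in kept else None for d in dims)
            result[key] += row[measure]
    return dict(result)

cube = group_by_cube(sales, DIMS, "price")

print(len(cube_grouping_sets(DIMS)))   # number of grouping sets
print(cube[("Shop-1", "TV", None)])    # TV sales at Shop-1 over all customers
print(cube[(None, None, None)])        # grand total
```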
Lu Chaojun, SJTU
Variant of CUBE
• ROLLUP operator:
SELECT store, item, customer, SUM(price)
FROM Sales
GROUP BY ROLLUP(store, item, customer);
– Group by 4 subsets of the three dimensions: {store, item, customer}, {store, item}, {store}, {}.
– In SQL Server: GROUP BY … WITH ROLLUP
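ROLLUP differs from CUBE only in which grouping sets it uses: the prefixes of the dimension list rather than all subsets. A minimal Python sketch of that difference follows; the sample rows and function names are hypothetical, and None again plays the role of NULL.

```python
from collections import defaultdict

DIMS = ("store", "item", "customer")

# Hypothetical sample facts.
sales = [
    {"store": "Shop-1", "item": "TV", "customer": "Ann", "price": 300},
    {"store": "Shop-1", "item": "TV", "customer": "Bob", "price": 320},
    {"store": "Shop-2", "item": "PC", "customer": "Ann", "price": 500},
]

def rollup_grouping_sets(dims):
    """Prefixes of the dimension list, longest first: n + 1 grouping sets."""
    return [tuple(dims[:n]) for n in range(len(dims), -1, -1)]

def group_by_rollup(rows, dims, measure):
    """Emulate SELECT dims, SUM(measure) ... GROUP BY ROLLUP(dims)."""
    result = defaultdict(int)
    for kept in rollup_grouping_sets(dims):
        for row in rows:
            key = tuple(row[d] if d in kept else None for d in dims)
            result[key] += row[measure]
    return dict(result)

sets = rollup_grouping_sets(DIMS)
rollup = group_by_rollup(sales, DIMS, "price")

print(sets)
print(rollup[("Shop-1", None, None)])  # subtotal for Shop-1
print((None, "TV", None) in rollup)    # {item} alone is not a ROLLUP grouping set
```

Because only prefixes are used, the order of the columns in ROLLUP matters, whereas it does not for CUBE.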
Operations on Cube: Dicing
• Dicing and Slicing
– Each dimension is partitioned at some level of granularity. E.g., the “store” dimension may be partitioned by store, by city, or by province; the “time” dimension may be partitioned by day, by week, by month, or by year.
– A choice of partition for each dimension “dices” the cube.
– A choice of partition for one dimension generates “slices” of the cube.
Example: Slicing/Dicing
SELECT city, color, SUM(price)
FROM (((Sales NATURAL JOIN Stores)
NATURAL JOIN Items)
NATURAL JOIN Time)
WHERE year = 2009
GROUP BY city, color;
– Slice in the time dimension, and dice in the store and item dimensions.
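The slicing and dicing query can be run against a small star schema to see its effect. This sketch uses Python's sqlite3 module with hypothetical sample data; the time dimension table is named Time, following the schema Time(timeID, day, week, month, year), and the joins are written without nested parentheses, which is equivalent because NATURAL JOIN chains associate left to right.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Stores(store TEXT, city TEXT, province TEXT);
    CREATE TABLE Items(item TEXT, size TEXT, color TEXT, manf TEXT);
    CREATE TABLE Time(timeID INTEGER, day INTEGER, week INTEGER,
                      month INTEGER, year INTEGER);
    CREATE TABLE Sales(store TEXT, item TEXT, customer TEXT,
                       timeID INTEGER, price INTEGER);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO Stores VALUES (?, ?, ?)", [
    ("Shop-1", "Nanjing", "Jiangsu"),
    ("Shop-2", "Suzhou", "Jiangsu"),
])
conn.executemany("INSERT INTO Items VALUES (?, ?, ?, ?)", [
    ("TV-a", "42in", "black", "Acme"),
    ("TV-b", "50in", "silver", "Acme"),
])
conn.executemany("INSERT INTO Time VALUES (?, ?, ?, ?, ?)", [
    (1, 30, 52, 12, 2008),
    (2, 2, 1, 1, 2009),
    (3, 10, 6, 2, 2009),
])
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?, ?)", [
    ("Shop-1", "TV-a", "Ann", 1, 300),  # 2008: outside the slice
    ("Shop-1", "TV-a", "Bob", 2, 310),
    ("Shop-1", "TV-b", "Ann", 3, 400),
    ("Shop-2", "TV-a", "Ann", 2, 305),
    ("Shop-2", "TV-a", "Bob", 3, 295),
])

# WHERE picks the 2009 slice; GROUP BY dices stores by city and items by color.
rows = conn.execute("""
    SELECT city, color, SUM(price)
    FROM Sales NATURAL JOIN Stores NATURAL JOIN Items NATURAL JOIN Time
    WHERE year = 2009
    GROUP BY city, color
    ORDER BY city, color
""").fetchall()

for row in rows:
    print(row)
```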
Operations on Cube: Roll-up
• Roll-up
– Aggregate along one or more dimensions.
– From finer granularity to coarser granularity: going up the dimension hierarchy, or reducing dimensions.
• Example
– Given sales data of each store, roll it up into aggregated data by cities.
– Or simply omit the store dimension.
Operations on Cube: Drill-down
• Drill-down
– “De-aggregate”: break an aggregate into its constituents.
– From coarser granularity to finer granularity: going down the dimension hierarchy, or adding dimensions.
• Example: having found that Shop-1 does not sell TVs well, break down its TV sales by particular size, or by the Time dimension.
Example: Roll-up/Drill-down
Qty by store/item:
          TV    PC    Refrige
Shop-1    45    33    30
Shop-2    50    36    42
Shop-3    38    31    40
Roll up store by province. Qty by province/item:
          TV    PC    Refrige
Jiangsu   133   100   112
Drill down store by city. Qty by city/item:
          TV    PC    Refrige
Nanjing   60    36    80
Suzhou    73    74    32
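The roll-up from stores to province can be sketched in a few lines of Python. The quantities are copied from the store/item table; the mapping of all three shops to Jiangsu is read off from the province row, and the function name roll_up is hypothetical.

```python
from collections import defaultdict

# Qty by store/item, as in the table above.
qty_by_store = {
    "Shop-1": {"TV": 45, "PC": 33, "Refrige": 30},
    "Shop-2": {"TV": 50, "PC": 36, "Refrige": 42},
    "Shop-3": {"TV": 38, "PC": 31, "Refrige": 40},
}

# One step up the store hierarchy (store -> province).
province_of = {"Shop-1": "Jiangsu", "Shop-2": "Jiangsu", "Shop-3": "Jiangsu"}

def roll_up(qty, parent_of):
    """Aggregate each member's quantities into its parent in the hierarchy."""
    out = defaultdict(lambda: defaultdict(int))
    for member, per_item in qty.items():
        for item, q in per_item.items():
            out[parent_of[member]][item] += q
    return {parent: dict(items) for parent, items in out.items()}

qty_by_province = roll_up(qty_by_store, province_of)
print(qty_by_province)
```

Drilling down is the reverse direction: starting from the province totals, one returns to a finer table such as quantities by city or by store.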
End