![Page 1: DIMENSIONAL MODELING MIS2502 Data Analytics. So we know… Relational databases are good for storing transactional data But bad for analytical data What](https://reader034.vdocument.in/reader034/viewer/2022052702/56649f265503460f94c3e121/html5/thumbnails/1.jpg)
Dimensional modeling
MIS2502 Data Analytics
![Page 2: DIMENSIONAL MODELING MIS2502 Data Analytics. So we know… Relational databases are good for storing transactional data But bad for analytical data What](https://reader034.vdocument.in/reader034/viewer/2022052702/56649f265503460f94c3e121/html5/thumbnails/2.jpg)
So we know…
• Relational databases are good for storing transactional data
• But bad for analytical data
• What we can do is design an analytical data store based on the operational data store
• That architecture gives us the advantages of both
  • Relational database for operational use
  • Analytical database for analysis (Online Analytical Processing)
![Page 3: DIMENSIONAL MODELING MIS2502 Data Analytics. So we know… Relational databases are good for storing transactional data But bad for analytical data What](https://reader034.vdocument.in/reader034/viewer/2022052702/56649f265503460f94c3e121/html5/thumbnails/3.jpg)
Why have a separate ADS?
• Issue 1: Performance
  • The structure is built to handle analysis
  • You keep the load off the operational data store
• Issue 2: Usability
  • We can structure the data in an intuitive way
  • You keep the load off of your IT department
![Page 4: DIMENSIONAL MODELING MIS2502 Data Analytics. So we know… Relational databases are good for storing transactional data But bad for analytical data What](https://reader034.vdocument.in/reader034/viewer/2022052702/56649f265503460f94c3e121/html5/thumbnails/4.jpg)
Some terminology
![Page 5: DIMENSIONAL MODELING MIS2502 Data Analytics. So we know… Relational databases are good for storing transactional data But bad for analytical data What](https://reader034.vdocument.in/reader034/viewer/2022052702/56649f265503460f94c3e121/html5/thumbnails/5.jpg)
How they all relate
We’ll start here.
![Page 6: DIMENSIONAL MODELING MIS2502 Data Analytics. So we know… Relational databases are good for storing transactional data But bad for analytical data What](https://reader034.vdocument.in/reader034/viewer/2022052702/56649f265503460f94c3e121/html5/thumbnails/6.jpg)
The Data Cube
• Core component of Online Analytical Processing and Multidimensional Data Analysis
• Made up of “facts” and “dimensions”
[Figure: data cube in which every cell holds quantity and total price. Product axis: M&Ms, Diet Coke, Doritos, Famous Amos. Store axis: Ardmore, PA; Temple Main; Cherry Hill, NJ; King of Prussia, PA. Time axis: Jan. 2011, Feb. 2011, Mar. 2011.]
Quantity sold and total price are measured facts.
Why isn’t product price a measured fact?
![Page 7: DIMENSIONAL MODELING MIS2502 Data Analytics. So we know… Relational databases are good for storing transactional data But bad for analytical data What](https://reader034.vdocument.in/reader034/viewer/2022052702/56649f265503460f94c3e121/html5/thumbnails/7.jpg)
The Data Cube
[Figure: data cube of quantity and total price by Product, Store, and Time, with one cell highlighted.]
The highlighted element represents all the M&Ms sold in Ardmore, PA in January 2011.
A single summary record representing a business event (monthly sales).
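As a minimal sketch of the idea, a cube can be pictured as a lookup from a (product, store, month) coordinate to its summary record. The dictionary layout, the `cell` helper, and all of the numbers are assumptions made for illustration; only the dimension values come from the slide.

```python
# Hypothetical figures; only the product, store, and month names come from the slides.
cube = {
    ("M&Ms", "Ardmore, PA", "Jan. 2011"): {"quantity": 120, "total_price": 118.80},
    ("M&Ms", "Ardmore, PA", "Feb. 2011"): {"quantity": 95, "total_price": 94.05},
    ("Doritos", "Temple Main", "Jan. 2011"): {"quantity": 60, "total_price": 89.40},
}


def cell(product, store, month):
    """Return the single summary record stored at one cube coordinate."""
    return cube[(product, store, month)]


# The highlighted cell: all M&Ms sold in Ardmore, PA in January 2011.
jan_mms = cell("M&Ms", "Ardmore, PA", "Jan. 2011")
print(jan_mms["quantity"], jan_mms["total_price"])
```

Each key is one combination of dimension values, and each value holds the measured facts for that combination.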
![Page 8: DIMENSIONAL MODELING MIS2502 Data Analytics. So we know… Relational databases are good for storing transactional data But bad for analytical data What](https://reader034.vdocument.in/reader034/viewer/2022052702/56649f265503460f94c3e121/html5/thumbnails/8.jpg)
The Data Cube
[Figure: data cube of quantity and total price by Product, Store, and Time, with a row of cells highlighted.]
The highlighted elements represent Famous Amos cookies sold on Temple’s Main campus from January to March 2011.
This is called “slicing the data.”
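One way to picture slicing in code is to fix some dimension values and keep every cell that matches them. This is only a sketch: the `select` helper and the quantities are hypothetical, and the cube is filled with made-up numbers so that every cell exists.

```python
from itertools import product as cartesian

products = ["M&Ms", "Diet Coke", "Doritos", "Famous Amos"]
stores = ["Ardmore, PA", "Temple Main", "Cherry Hill, NJ", "King of Prussia, PA"]
months = ["Jan. 2011", "Feb. 2011", "Mar. 2011"]

# Fill every cell with made-up facts so the example has something to select.
cube = {}
for i, (p, s, m) in enumerate(cartesian(products, stores, months)):
    quantity = 10 + i
    cube[(p, s, m)] = {"quantity": quantity, "total_price": round(quantity * 1.5, 2)}


def select(cube, product=None, store=None, month=None):
    """Keep the cells that match every dimension value that was supplied."""
    wanted = (product, store, month)
    return {
        key: facts
        for key, facts in cube.items()
        if all(w is None or w == k for w, k in zip(wanted, key))
    }


# Famous Amos cookies sold on Temple's Main campus, across all three months.
famous_amos_temple = select(cube, product="Famous Amos", store="Temple Main")
total_quantity = sum(facts["quantity"] for facts in famous_amos_temple.values())
print(len(famous_amos_temple), total_quantity)
```

Leaving a dimension as `None` means "all values", which is why the time dimension still spans January to March in the result.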
![Page 9: DIMENSIONAL MODELING MIS2502 Data Analytics. So we know… Relational databases are good for storing transactional data But bad for analytical data What](https://reader034.vdocument.in/reader034/viewer/2022052702/56649f265503460f94c3e121/html5/thumbnails/9.jpg)
The Data Cube
[Figure: data cube of quantity and total price by Product, Store, and Time, with one group of cells highlighted in orange and another in blue.]
What do the orange highlighted elements represent?
What do the blue highlighted elements represent?
![Page 10: DIMENSIONAL MODELING MIS2502 Data Analytics. So we know… Relational databases are good for storing transactional data But bad for analytical data What](https://reader034.vdocument.in/reader034/viewer/2022052702/56649f265503460f94c3e121/html5/thumbnails/10.jpg)
The n-dimensional cube
• Could you have a data mart with five dimensions?
  • If so, give an example
• Then why does our cube example (and most others you will see) only have three?
![Page 11: DIMENSIONAL MODELING MIS2502 Data Analytics. So we know… Relational databases are good for storing transactional data But bad for analytical data What](https://reader034.vdocument.in/reader034/viewer/2022052702/56649f265503460f94c3e121/html5/thumbnails/11.jpg)
Designing the Cube: The Star Schema
• We can’t reasonably store the original data as a single table
  • Summarization would be too slow
  • A lot of redundancy
• So it is stored as a star schema
[Figure: star schema with one fact table surrounded by three dimension tables]
Sales (fact): Sales_ID, Product_ID, Store_ID, Time_ID, Quantity Sold, Total Price
Product (dimension): Product_ID, Product_Name, Product_Price, Product_Weight
Store (dimension): Store_ID, Store_Address, Store_City, Store_State, Store_Type
Time (dimension): Time_ID, Day, Month, Year
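The star schema above could be sketched in SQLite roughly as follows. The table and column names follow the slide (with `Quantity_Sold` and `Total_Price` written with underscores); the column types, the sample rows, and the summary query are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (Product_ID INTEGER PRIMARY KEY, Product_Name TEXT,
                      Product_Price REAL, Product_Weight REAL);
CREATE TABLE Store (Store_ID INTEGER PRIMARY KEY, Store_Address TEXT,
                    Store_City TEXT, Store_State TEXT, Store_Type TEXT);
CREATE TABLE Time (Time_ID INTEGER PRIMARY KEY, Day INTEGER, Month INTEGER, Year INTEGER);
-- The fact table holds one foreign key per dimension plus the measured facts.
CREATE TABLE Sales (Sales_ID INTEGER PRIMARY KEY,
                    Product_ID INTEGER REFERENCES Product(Product_ID),
                    Store_ID INTEGER REFERENCES Store(Store_ID),
                    Time_ID INTEGER REFERENCES Time(Time_ID),
                    Quantity_Sold INTEGER, Total_Price REAL);
""")

# Hypothetical sample rows.
conn.executemany("INSERT INTO Product VALUES (?, ?, ?, ?)",
                 [(1, "M&Ms", 0.99, 1.69), (2, "Doritos", 1.49, 1.75)])
conn.executemany("INSERT INTO Store VALUES (?, ?, ?, ?, ?)",
                 [(1, "1 Main St", "Ardmore", "PA", "Retail")])
conn.executemany("INSERT INTO Time VALUES (?, ?, ?, ?)",
                 [(1, 5, 1, 2011), (2, 9, 1, 2011)])
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?, ?, ?)",
                 [(1000, 1, 1, 1, 3, 2.97), (1001, 1, 1, 2, 2, 1.98), (1002, 2, 1, 1, 1, 1.49)])

# Summarize the facts by product, city, and month: one row per cube cell.
rows = conn.execute("""
SELECT p.Product_Name, s.Store_City, t.Month, t.Year,
       SUM(f.Quantity_Sold), SUM(f.Total_Price)
FROM Sales f
JOIN Product p ON p.Product_ID = f.Product_ID
JOIN Store s ON s.Store_ID = f.Store_ID
JOIN Time t ON t.Time_ID = f.Time_ID
GROUP BY p.Product_Name, s.Store_City, t.Month, t.Year
ORDER BY p.Product_Name
""").fetchall()

for row in rows:
    print(row)
```

Descriptive attributes such as the product name are stored once in their dimension table, so the fact table only repeats small integer keys.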
![Page 12: DIMENSIONAL MODELING MIS2502 Data Analytics. So we know… Relational databases are good for storing transactional data But bad for analytical data What](https://reader034.vdocument.in/reader034/viewer/2022052702/56649f265503460f94c3e121/html5/thumbnails/12.jpg)
A join to make the cube?
[Table: the star schema joined into one wide table, with rows for Sales IDs 1000, 1001, 1002]
Sales Fact: Sales ID, Qty. Sold, Total Price
Product Dimension: Prod. ID, Prod. Name, Prod. Price, Prod. Weight
Store Dimension: Store ID, Store Address, Store City, Store State, Store Type
Time Dimension: Time ID, Day, Month, Year
Storing the entire join would generate many, many rows!
![Page 13: DIMENSIONAL MODELING MIS2502 Data Analytics. So we know… Relational databases are good for storing transactional data But bad for analytical data What](https://reader034.vdocument.in/reader034/viewer/2022052702/56649f265503460f94c3e121/html5/thumbnails/13.jpg)
It adds up fast
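A rough way to see the growth is to multiply the number of values in each dimension. The counts below are hypothetical and are not the figures from this slide; they only illustrate the arithmetic.

```python
from math import prod

# Hypothetical dimension sizes, chosen only to illustrate the arithmetic.
dimension_sizes = {"product": 50_000, "store": 1_000, "day": 365}

# Upper bound on the number of product/store/day combinations in the cube.
possible_cells = prod(dimension_sizes.values())
print(f"{possible_cells:,} possible product/store/day combinations")
```

Adding a fourth dimension multiplies the total again, which is why the fully joined table grows so quickly.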
![Page 14: DIMENSIONAL MODELING MIS2502 Data Analytics. So we know… Relational databases are good for storing transactional data But bad for analytical data What](https://reader034.vdocument.in/reader034/viewer/2022052702/56649f265503460f94c3e121/html5/thumbnails/14.jpg)
From Star Schema to Cubes
[Figure: the star schema’s dimensions become the axes of the data cube, and each cell holds quantity and total price. Product: M&Ms, Diet Coke, Doritos, Famous Amos. Store: Ardmore, PA; Temple Main; Cherry Hill, NJ; King of Prussia, PA. Months: Jan. 2011, Feb. 2011, Mar. 2011.]
![Page 15: DIMENSIONAL MODELING MIS2502 Data Analytics. So we know… Relational databases are good for storing transactional data But bad for analytical data What](https://reader034.vdocument.in/reader034/viewer/2022052702/56649f265503460f94c3e121/html5/thumbnails/15.jpg)
Demo – Foodmart
• A pre-created data cube that can be read in Excel
• Summaries are already created
![Page 16: DIMENSIONAL MODELING MIS2502 Data Analytics. So we know… Relational databases are good for storing transactional data But bad for analytical data What](https://reader034.vdocument.in/reader034/viewer/2022052702/56649f265503460f94c3e121/html5/thumbnails/16.jpg)
Updating the cube
• Data marts are non-volatile (i.e., they can’t be changed)
  • Logically: It’s a record of what has happened
  • Practically: Would require constant re-computation of the cube
• The cube is refreshed periodically from the transactional database
  • Overnight
  • Daily
  • Weekly
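A periodic refresh can be sketched as rebuilding the summaries from the transactional records rather than editing cells in place. The `build_cube` function, the record layout, and the figures are hypothetical.

```python
from collections import defaultdict


def build_cube(transactions):
    """Summarize individual sales into (product, store, month) cells."""
    cube = defaultdict(lambda: {"quantity": 0, "total_price": 0.0})
    for sale in transactions:
        month = sale["date"][:7]  # "2011-01-15" -> "2011-01"
        cell = cube[(sale["product"], sale["store"], month)]
        cell["quantity"] += sale["quantity"]
        cell["total_price"] += sale["total_price"]
    return dict(cube)


# Hypothetical operational data.
transactions = [
    {"product": "M&Ms", "store": "Ardmore, PA", "date": "2011-01-15", "quantity": 3, "total_price": 2.97},
    {"product": "M&Ms", "store": "Ardmore, PA", "date": "2011-01-20", "quantity": 2, "total_price": 1.98},
]
key = ("M&Ms", "Ardmore, PA", "2011-01")

cube = build_cube(transactions)  # e.g. the overnight refresh

# A new sale reaches the transactional database; the cube is not touched.
transactions.append(
    {"product": "M&Ms", "store": "Ardmore, PA", "date": "2011-01-28", "quantity": 4, "total_price": 3.96}
)
quantity_before_refresh = cube[key]["quantity"]

cube = build_cube(transactions)  # the next scheduled refresh picks it up
quantity_after_refresh = cube[key]["quantity"]
print(quantity_before_refresh, quantity_after_refresh)
```

Between refreshes the cube is a stable record of what had happened as of the last load, which is the trade-off behind choosing an overnight, daily, or weekly schedule.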
![Page 17: DIMENSIONAL MODELING MIS2502 Data Analytics. So we know… Relational databases are good for storing transactional data But bad for analytical data What](https://reader034.vdocument.in/reader034/viewer/2022052702/56649f265503460f94c3e121/html5/thumbnails/17.jpg)
Designing the Star Schema
• Kimball’s Four Step Process for Data Cube Design (Kimball et al., 2008)
  • Choose the business process
  • Decide on the level of granularity
  • Identify the dimensions
  • Identify the fact
• From where do you get the data?
![Page 18: DIMENSIONAL MODELING MIS2502 Data Analytics. So we know… Relational databases are good for storing transactional data But bad for analytical data What](https://reader034.vdocument.in/reader034/viewer/2022052702/56649f265503460f94c3e121/html5/thumbnails/18.jpg)
Choose the business process
• What your data cube is “about”
• Determined by the questions you want to answer about your organization

| Question | Business Process |
| --- | --- |
| Who is my best customer? | Sales |
| What are my highest selling products? | Sales |
| Which teachers have the best student performance? | Standardized testing |
| Which supplier is offering us the best deals? | Purchasing |

Note that a “business process” is not always about business.
![Page 19: DIMENSIONAL MODELING MIS2502 Data Analytics. So we know… Relational databases are good for storing transactional data But bad for analytical data What](https://reader034.vdocument.in/reader034/viewer/2022052702/56649f265503460f94c3e121/html5/thumbnails/19.jpg)
Decide on the level of granularity
• Level of detail for each event (row in the table)
• Will determine the data in the dimensions
• Example: Who is my best customer?
  • The “event” is a sales transaction
  • Choices for time: yearly, quarterly, monthly, daily
  • Choices for store: store, city, state
How would you select the right granularity?
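One consideration when picking a granularity is that fine-grained data can always be rolled up to a coarser level, but not the other way around. The sketch below assumes hypothetical daily quantities for one store and a hypothetical `roll_up_to_month` helper.

```python
from collections import defaultdict

# Hypothetical quantities sold, stored at daily granularity.
daily = {
    ("Ardmore, PA", "2011-01-03"): 4,
    ("Ardmore, PA", "2011-01-17"): 6,
    ("Ardmore, PA", "2011-02-02"): 5,
}


def roll_up_to_month(daily_quantities):
    """Aggregate (store, day) quantities into (store, month) quantities."""
    monthly = defaultdict(int)
    for (store, day), quantity in daily_quantities.items():
        monthly[(store, day[:7])] += quantity  # "2011-01-03" -> "2011-01"
    return dict(monthly)


monthly = roll_up_to_month(daily)
print(monthly)
```

The monthly totals no longer say which days the sales happened on, so a cube stored at monthly granularity cannot answer daily questions.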
![Page 20: DIMENSIONAL MODELING MIS2502 Data Analytics. So we know… Relational databases are good for storing transactional data But bad for analytical data What](https://reader034.vdocument.in/reader034/viewer/2022052702/56649f265503460f94c3e121/html5/thumbnails/20.jpg)
Identify the dimensions
• Determined by the business process
• Refined by the level of granularity
• The key elements of the process needed to answer the question
• Example: Sales transaction
  • Our example schema defines a “sale” as taking place for a particular product, in a particular store, at a particular time
• Could this data mart tell you
  • The best selling product?
  • The best customer?
Try it for the “student performance” example.
![Page 21: DIMENSIONAL MODELING MIS2502 Data Analytics. So we know… Relational databases are good for storing transactional data But bad for analytical data What](https://reader034.vdocument.in/reader034/viewer/2022052702/56649f265503460f94c3e121/html5/thumbnails/21.jpg)
Identify the fact
• The data associated with the business event
Try it for the “student performance” example.
![Page 22: DIMENSIONAL MODELING MIS2502 Data Analytics. So we know… Relational databases are good for storing transactional data But bad for analytical data What](https://reader034.vdocument.in/reader034/viewer/2022052702/56649f265503460f94c3e121/html5/thumbnails/22.jpg)
Data cube caveats
• You have to choose your aggregations in advance
  • So choose wisely!
• Consider a sales data cube with product, store, time, salesperson
  • If quantity_sold and total_price are the facts, you can’t figure out the average number of people working in a store
  • All people might not have sold all products and therefore wouldn’t be in the joined table
• Granularity is also an issue
  • Can’t track daily sales if “date” is monthly (pre-aggregated)
  • So why not include every single sale and do no aggregation beforehand?
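The salesperson caveat can be illustrated with a small sketch: counting the distinct salespeople who appear in the fact rows only finds people who sold something. The staff roster, the names, and the fact rows below are all hypothetical.

```python
# Hypothetical roster from an HR system; this information is not in the sales cube.
staff = {"Temple Main": {"Ana", "Ben", "Cruz"}}

# Hypothetical fact rows: a salesperson appears only if they made a sale.
sales_facts = [
    {"store": "Temple Main", "salesperson": "Ana", "product": "M&Ms",
     "quantity_sold": 3, "total_price": 2.97},
    {"store": "Temple Main", "salesperson": "Ben", "product": "Doritos",
     "quantity_sold": 1, "total_price": 1.49},
    {"store": "Temple Main", "salesperson": "Ana", "product": "Doritos",
     "quantity_sold": 2, "total_price": 2.98},
]

# Salespeople visible through the cube versus the people who actually work there.
seen_in_cube = {row["salesperson"] for row in sales_facts if row["store"] == "Temple Main"}
missing_from_cube = staff["Temple Main"] - seen_in_cube
print(len(seen_in_cube), sorted(missing_from_cube))
```

Anyone without a recorded sale is invisible to the cube, so a headcount derived from it can undercount the people working in a store.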