Introduction to Dimensional Modelling
AN INTRODUCTION TO DIMENSIONAL DATA MODELLING
Ashish Chandwani
Intern – Nationwide Insurance
School – University of Maryland, College Park
CONTENTS
Overview of Data Warehouse
Introduction to Dimensional Modelling
Elements of Dimensional Model
Designing a Dimensional Data Model
Types of Schema
Dimensional Data Model vs Relational Data Model
Data Warehouse
Central repository of integrated data from one or more diverse sources.
Stores current and historical data.
Sometimes referred to as an Enterprise Data Warehouse.
Data is often collected from multiple sources within and outside the organization, and processes are deployed to cleanse it and ensure data integrity.
DW is used for reporting and analysis.
Introduction to Dimensional Modelling
Dimensional Modeling is a technique for database design.
Important for supporting end-user queries relating to business transactions.
Intended to support analysis and reporting.
Contains business attribute tables (dimensions) and business transaction tables (facts/measures).
Used as the basis for OLAP (Online Analytical Processing) cubes.
Elements of Dimensional Data Model
Dimension Table: Collection of reference information about a business. E.g. Location, Product and Date are dimensions for certain metrics of organizations like Nationwide Insurance.
Each dimension table contains attributes which describe the details of the dimension. E.g. a Product dimension can contain product name, type and price.
Each dimension table may also contain hierarchies. E.g. a Location dimension can contain location name, location city, location state and location country.
Fact Table: Measurable events for which dimension table data is collected; used for analysis and reporting.
Fact tables could contain information like sales against a set of dimensions like Location, Product and Date.
Primary keys of the dimension tables are mapped as foreign keys in the fact table.
Usually these keys are surrogate keys.
Dimensions contain the context for the business problems and facts are the measures for those contexts.
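The elements above can be sketched as a small star schema. This is only an illustrative sketch using Python's built-in sqlite3 module: the table and column names (dim_product, dim_location, dim_date, fact_sales, sales_amount) and the location and date rows are made-up sample values, not taken from any real Nationwide system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Dimension tables hold the descriptive context; the fact table holds the
# measure plus one foreign key per dimension. All names are hypothetical.
# Note: SQLite only enforces REFERENCES when PRAGMA foreign_keys is ON.
conn.executescript("""
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,   -- surrogate key
    product_name TEXT,
    product_type TEXT
);
CREATE TABLE dim_location (
    location_key     INTEGER PRIMARY KEY,
    location_name    TEXT,
    location_city    TEXT,
    location_state   TEXT,
    location_country TEXT                -- hierarchy: name < city < state < country
);
CREATE TABLE dim_date (
    date_key      INTEGER PRIMARY KEY,
    calendar_date TEXT,
    calendar_year INTEGER
);
CREATE TABLE fact_sales (
    product_key  INTEGER REFERENCES dim_product(product_key),
    location_key INTEGER REFERENCES dim_location(location_key),
    date_key     INTEGER REFERENCES dim_date(date_key),
    sales_amount REAL                    -- the measure
);
""")

conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "Nationwide Personal", "PL"),
                  (2, "Nationwide Commercial", "CL")])
conn.executemany("INSERT INTO dim_location VALUES (?, ?, ?, ?, ?)",
                 [(1, "Sample Office", "Columbus", "Ohio", "USA")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(1, "2015-08-10", 2015), (2, "2015-08-11", 2015)])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                 [(1, 1, 1, 10.0), (1, 1, 2, 10.0), (2, 1, 1, 25.0)])

# The fact supplies the measure; the dimension supplies the context.
sales_by_product = conn.execute("""
    SELECT p.product_name, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()
print(sales_by_product)
```

The query joins the fact table to a single dimension; adding Location or Date to the report only means joining one more dimension table on its key.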
Surrogate Keys
Before moving on to how dimensional models are designed, it’s important to understand the concept of Surrogate Keys.
A surrogate key is an unintelligent/dumb key which, unlike a natural key, is not derived from application data.
A surrogate key is artificially generated so that the fact and dimension tables are shielded from regular changes in the source data.
It is usually an incremental key with values from 1 to N, one for each row in a data warehouse table.
Why Surrogate Keys
Avoid backend application data key conflicts.
Consistency among dimension keys, as different backend applications may use different columns as keys.
Covers the data warehouse for changes in the backend application data.
Implement history of slowly changing dimensions.
Usually surrogate keys are integers and not characters.
Surrogate keys are also used for recycling data as per business requirements.
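A minimal sketch of how surrogate keys might be assigned is shown below. The class name SurrogateKeyGenerator, the source system names ("policy_app", "claims_app") and the natural key values are all hypothetical; the point is only that two backend applications with differently shaped natural keys end up with consistent integer keys in the warehouse.

```python
from itertools import count


class SurrogateKeyGenerator:
    """Hands out incremental integer keys, one per distinct natural key."""

    def __init__(self):
        self._next_key = count(start=1)   # 1, 2, 3, ... N
        self._key_map = {}                # (source system, natural key) -> surrogate key

    def key_for(self, source_system, natural_key):
        lookup = (source_system, natural_key)
        if lookup not in self._key_map:
            self._key_map[lookup] = next(self._next_key)
        return self._key_map[lookup]


product_keys = SurrogateKeyGenerator()

# Two hypothetical backend applications with different natural key formats.
k1 = product_keys.key_for("policy_app", "P-100")   # string key
k2 = product_keys.key_for("claims_app", 100)       # integer key
k3 = product_keys.key_for("policy_app", "P-100")   # seen before, same key returned

print(k1, k2, k3)
```

Because the warehouse key does not depend on the source value, a backend application can change or reuse its own keys without forcing a change to the fact table.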
Designing a Dimensional Model
Understand the business problem = Most Important.
Basically, while designing a data model solution you should be able to answer: Why, How Much, When/Where/Who, What.
Designing Dimensional Models typically involves the following steps:
Choose the Business Process (Why)
Declare the Grain (How Much)
Identify the Dimension (3Ws: When/Where/Who)
Identify the Fact (What)
Choose the Business Process
The actual business processes the data warehouse should cover.
Describe the problem for which the models should be built.
This is the “why” of building a data model.
Here is a sample business process:
The Senior Executives at Nationwide want to determine the sales for certain products in different locations for a particular time period.
Declare the Grain
The Grain describes the level of detail needed for the business problem/solution.
Lowest level of information stored in any table.
This is the “How much” of building a data model.
Sample Grain:
The Senior Executives at Nationwide want to determine the sales for certain products in different locations for every week.
So the grain is “by product by location by week”.
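The sample grain can be illustrated with a short sketch that rolls individual sales up to one row per product, location and week. The transaction rows, the city name and the choice of ISO week numbers as the definition of "week" are assumptions made for the example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical source transactions: (product, location, sale date, amount).
transactions = [
    ("Nationwide Personal", "Columbus", date(2015, 8, 10), 10.0),
    ("Nationwide Personal", "Columbus", date(2015, 8, 12), 10.0),
    ("Nationwide Personal", "Columbus", date(2015, 8, 17), 10.0),
    ("Nationwide Pet", "Columbus", date(2015, 8, 11), 35.0),
]

# Grain: by product, by location, by week.
# Nothing finer than a week is kept in the resulting table.
weekly_sales = defaultdict(float)
for product, location, sale_date, amount in transactions:
    iso = sale_date.isocalendar()
    week = (iso[0], iso[1])               # (ISO year, ISO week number)
    weekly_sales[(product, location, week)] += amount

for key, total in sorted(weekly_sales.items()):
    print(key, total)
```

Declaring a weekly grain means the two sales in the same week collapse into one row; a daily report could not be produced from this table later, which is why the grain is fixed before dimensions and facts are chosen.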
Identify the Dimension
Dimensions are the reference information for the business.
Contains dimension tables with their attributes (columns) and hierarchies.
This is the “When, Where and Who” of building a data model.
Sample Dimensions:
The Senior Executives at Nationwide want to determine the sales for certain products in different locations for a particular time period.
Dimensions here are: Products, Location and Time.
Dimension Attributes: For Product: Product key (surrogate key), Product Name, Product specs, Product type.
Dimension Hierarchies: For Location: location country, location city, location street, location name.
Identify the Fact
Measurable events for Dimensions.
This is the “What” of building a data model.
Sample Facts:
The Senior Executives at Nationwide want to determine the sales for certain products in certain locations for a particular time period.
Fact here is: Sum of Sales by product by location by time.
Types of Dimensional Model Schemas
A star schema is one in which a central fact table is surrounded by denormalized dimension tables. A star schema can be simple or complex. A simple star schema consists of one fact table, whereas a complex star schema has more than one fact table.
A snowflake schema is an extension of the star schema in which dimension tables are normalized into additional related tables. Snowflake schemas are useful when there are low cardinality attributes in the dimensions.
Figures: Star Schema, Snowflake Schema
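The difference between the two schema types can be sketched with the Location dimension. In this illustrative example (table names, column names and sample rows are all hypothetical), the star version keeps city and country on every location row, while the snowflake version moves them into their own tables, so the same report needs more joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: one denormalized dimension; city and country repeat on every row.
CREATE TABLE star_dim_location (
    location_key     INTEGER PRIMARY KEY,
    location_name    TEXT,
    location_city    TEXT,
    location_country TEXT
);
-- Snowflake: the same dimension normalized into three related tables.
CREATE TABLE snow_dim_country (
    country_key  INTEGER PRIMARY KEY,
    country_name TEXT
);
CREATE TABLE snow_dim_city (
    city_key    INTEGER PRIMARY KEY,
    city_name   TEXT,
    country_key INTEGER REFERENCES snow_dim_country(country_key)
);
CREATE TABLE snow_dim_location (
    location_key  INTEGER PRIMARY KEY,
    location_name TEXT,
    city_key      INTEGER REFERENCES snow_dim_city(city_key)
);
CREATE TABLE fact_sales (location_key INTEGER, sales_amount REAL);

INSERT INTO star_dim_location VALUES (1, 'Office A', 'Columbus', 'USA'),
                                     (2, 'Office B', 'Cleveland', 'USA');
INSERT INTO snow_dim_country  VALUES (1, 'USA');
INSERT INTO snow_dim_city     VALUES (1, 'Columbus', 1), (2, 'Cleveland', 1);
INSERT INTO snow_dim_location VALUES (1, 'Office A', 1), (2, 'Office B', 2);
INSERT INTO fact_sales        VALUES (1, 10.0), (2, 50.0);
""")

# Star schema: one join from fact to dimension.
star_rows = conn.execute("""
    SELECT d.location_country, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN star_dim_location AS d ON d.location_key = f.location_key
    GROUP BY d.location_country
""").fetchall()

# Snowflake schema: three joins to reach the same attribute.
snow_rows = conn.execute("""
    SELECT co.country_name, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN snow_dim_location AS l  ON l.location_key = f.location_key
    JOIN snow_dim_city     AS ci ON ci.city_key = l.city_key
    JOIN snow_dim_country  AS co ON co.country_key = ci.country_key
    GROUP BY co.country_name
""").fetchall()

print(star_rows, snow_rows)
```

Both queries return the same totals; the snowflake version stores the country name once instead of once per location, at the cost of the extra joins.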
Differences between Star and Snowflake Schema
Property: Ease of maintenance / change
Star Schema: Difficult to maintain due to high redundancy.
Snowflake Schema: Easy to maintain due to low redundancy.
Property: Facts and Dimension Properties
Star Schema: Dimension tables and fact tables are denormalized.
Snowflake Schema: Dimension tables are normalized, fact tables are denormalized.
Property: Ease of Use
Star Schema: Easier to understand due to simple queries.
Snowflake Schema: Difficult to understand due to more complex queries.
Property: Query Performance
Star Schema: Good, less complexity in joins (fewer foreign keys).
Snowflake Schema: Poor, due to increased complexity in joins (more foreign keys).
Property: Type of Data Warehouse
Star Schema: Simple relations (one to one / one to many).
Snowflake Schema: Complex relations (many to many).
Property: When to Use
Star Schema: Smaller size of dimension tables.
Snowflake Schema: Greater size of dimension tables; the snowflake schema helps reduce space.
Slowly Changing Dimensions
Sometimes the attribute information within a dimension is altered to reflect business decisions/rules.
Such changes to dimension information have to be accounted for in the data model.
The changes in the dimension are unpredictable rather than changing over a fixed schedule.
These are Slowly Changing Dimensions.
Illustration of Slowly Changing Dimensions - I
Let's consider our example: The Senior Executives at Nationwide want to determine the sales for certain products in certain locations for a particular time period.
Consider the Product Dimension:
Product Key   Product Name            Product Type   Product Price
1             Nationwide Personal     PL             $10
2             Nationwide Commercial   CL             $25
3             Nationwide Pet          PL             $35
Suppose the company decides tomorrow that Nationwide Pet should be classified as Others instead of PL, or decides to change the price of Nationwide Personal from $10 to $15.
How will that affect the analysis and reporting, and how do we account for such changes?
Do we keep the old historical data, or do we insert the new data directly?
Methodologies for Handling Slowly Changing Dimensions
Type 1: No need to track historical data; simply overwrite the existing data with the new data. (No History)
Type 2: Historical data should be tracked. Create a new row for the natural key but with a different surrogate key. (Full History)
Type 3: Historical data should be partially tracked. Between Type 1 and Type 2. Insert additional columns to track the current and last state of the changing attribute. (Partial History)
Illustrations of SCD Handling Methodologies
Let's take a product dimension. Product Type changes for Nationwide Pet from PL to Others.
Before the change:
Product Key   Product Name     Product Type   Product Price
3             Nationwide Pet   PL             $35
Type 1:
Product Key   Product Name     Product Type   Product Price
3             Nationwide Pet   Others         $35
Type 2:
Product Key   Product Name     Product Type   Product Price   Effective Date   Expiry Date   Latest_Ind
3             Nationwide Pet   PL             $35             01-01-2000       08-10-2015    N
4             Nationwide Pet   Others         $35             08-11-2015       12-31-9999    Y
Type 3:
Product Key   Product Name     Product Type_Old   Product Price   Product Type_New
3             Nationwide Pet   PL                 $35             Others
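The three methodologies can be sketched in a few lines of Python. This is a simplified illustration on in-memory rows, not a production loading routine: the function names (apply_type1, apply_type2, apply_type3) and dictionary field names are hypothetical, and the values mirror the Nationwide Pet illustration.

```python
def apply_type1(rows, product_key, new_type):
    """Type 1: overwrite the attribute in place; the old value is lost."""
    for row in rows:
        if row["product_key"] == product_key:
            row["product_type"] = new_type


def apply_type2(rows, product_name, new_type, last_old_day, first_new_day):
    """Type 2: expire the current row and add a row with a new surrogate key."""
    current = next(r for r in rows
                   if r["product_name"] == product_name and r["latest_ind"] == "Y")
    current["expiry_date"] = last_old_day
    current["latest_ind"] = "N"
    rows.append(dict(
        current,
        product_key=max(r["product_key"] for r in rows) + 1,  # next surrogate key
        product_type=new_type,
        effective_date=first_new_day,
        expiry_date="12-31-9999",                             # open-ended row
        latest_ind="Y",
    ))


def apply_type3(rows, product_key, new_type):
    """Type 3: keep only the previous value, in an extra column."""
    for row in rows:
        if row["product_key"] == product_key:
            row["product_type_old"] = row.pop("product_type")
            row["product_type_new"] = new_type


base = {"product_key": 3, "product_name": "Nationwide Pet",
        "product_type": "PL", "product_price": 35}

type1_rows = [dict(base)]
apply_type1(type1_rows, 3, "Others")

type2_rows = [dict(base, effective_date="01-01-2000",
                   expiry_date="12-31-9999", latest_ind="Y")]
apply_type2(type2_rows, "Nationwide Pet", "Others", "08-10-2015", "08-11-2015")

type3_rows = [dict(base)]
apply_type3(type3_rows, 3, "Others")

print(type1_rows, type2_rows, type3_rows, sep="\n")
```

Only the Type 2 version lets old facts keep pointing at key 3 (classified as PL) while new facts point at key 4 (classified as Others), which is one reason surrogate keys are used in the dimensions.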
Relational vs Dimensional Models
Relational Data Models             Dimensional Data Models
Units of storage are tables.       Units of storage are cubes.
Data is normalized.                Data is denormalized.
Detailed level of transaction.     Aggregates and measures used for business.
Volatile and time variant.         Non-volatile and time invariant.
Used for OLTP.                     Used for OLAP cubes.
Normal reports.                    Interactive, user-friendly reports.
References
http://learndatamodeling.com/blog/comparison-of-relational-and-dimensional-data-modeling/
http://searchdatamanagement.techtarget.com/definition/fact-table
https://technet.microsoft.com/en-us/library/Aa905979(v=SQL.80).aspx
http://dwbi.org/data-modelling/dimensional-model/1-dimensional-modeling-guide
http://dwbi.org/data-modelling/dimensional-model/19-modeling-for-various-slowly-changing-dimension