Introduction to Dimensional Modelling

AN INTRODUCTION TO DIMENSIONAL DATA MODELLING
Ashish Chandwani
Intern – Nationwide Insurance
School – University of Maryland, College Park

Upload: ashish-chandwani

Posted on 20-Feb-2017




CONTENTS

• Overview of Data Warehouse
• Introduction to Dimensional Modelling
• Elements of Dimensional Model
• Designing a Dimensional Data Model
• Types of Schema
• Dimensional Data Model vs Relational Data Model


Data Warehouse

A central repository of integrated data from one or more diverse sources.

Stores current and historical data.

Sometimes referred to as an Enterprise Data Warehouse (EDW).

Data is often collected from multiple sources inside and outside the organization, and processes are deployed to cleanse the data and maintain its integrity.

A data warehouse (DW) is used for reporting and analysis.


Introduction to Dimensional Modelling

Dimensional modelling is a database design technique.

It is designed to support end-user queries about business transactions.

It is intended for analysis and reporting, and contains business attribute tables (dimensions) and business transaction tables (facts/measures).

It is used as the basis for OLAP (Online Analytical Processing) cubes.
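To make the “cube” idea concrete, the sketch below pre-aggregates one measure over every combination of dimensions, which is roughly what SQL's GROUP BY CUBE produces. It is a minimal in-memory illustration, not how an OLAP server is actually built; the fact rows, locations and week labels are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sales facts: three dimension values and one additive measure per row.
facts = [
    {"product": "Nationwide Pet", "location": "Ohio", "week": "2015-W33", "sales": 35},
    {"product": "Nationwide Pet", "location": "Iowa", "week": "2015-W33", "sales": 70},
    {"product": "Nationwide Personal", "location": "Ohio", "week": "2015-W34", "sales": 10},
]
DIMENSIONS = ("product", "location", "week")


def cube(rows, dims, measure):
    """Sum `measure` for every subset of `dims` (2 ** len(dims) groupings)."""
    result = {}
    for size in range(len(dims) + 1):
        for subset in combinations(dims, size):
            totals = defaultdict(int)
            for row in rows:
                # Group key: this row's values for the dimensions in the subset.
                totals[tuple(row[d] for d in subset)] += row[measure]
            result[subset] = dict(totals)
    return result


cells = cube(facts, DIMENSIONS, "sales")
print(cells[()])            # grand total: {(): 115}
print(cells[("product",)])  # {('Nationwide Pet',): 105, ('Nationwide Personal',): 10}
```

Each entry of `cells` is one “slice” of the cube; an end-user report that asks for sales by product simply reads the precomputed `("product",)` grouping instead of scanning the facts again.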


Elements of Dimensional Data Model

Dimension Table: A collection of reference information about a business. E.g., Location, Product and Date are dimensions for certain metrics of an organization like Nationwide Insurance.

Each dimension table contains attributes that describe the details of the dimension. E.g., a Product dimension can contain product name, type and price.

Each dimension table may also contain hierarchies. E.g., a Location dimension can contain location name, location city, location state and location country.

Fact Table: The measurable events for which dimension data is collected; fact tables are used for analysis and reporting.

Fact tables can contain information such as sales against a set of dimensions like Location, Product and Date.

The primary keys of the dimension tables are stored as foreign keys in the fact tables.

These keys are usually surrogate keys.

Dimensions contain the context for the business problems and facts are the measures for those contexts.
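A minimal sketch of the fact/dimension relationship, using Python dataclasses as stand-ins for tables. The product rows come from the Product dimension used in this deck's slowly changing dimension example; the Location row, the YYYYMMDD date key and the sales amounts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:          # Product dimension row
    product_key: int    # surrogate key (primary key of the dimension)
    name: str
    product_type: str
    price: int


@dataclass(frozen=True)
class Location:         # Location dimension row (hypothetical values)
    location_key: int
    city: str
    state: str


@dataclass(frozen=True)
class SalesFact:        # one row of the fact table
    product_key: int    # foreign key -> Product.product_key
    location_key: int   # foreign key -> Location.location_key
    date_key: int       # foreign key -> a Date dimension (not shown; hypothetical format)
    sales_amount: int   # the measure


products = {1: Product(1, "Nationwide Personal", "PL", 10),
            3: Product(3, "Nationwide Pet", "PL", 35)}
locations = {7: Location(7, "Columbus", "OH")}
facts = [SalesFact(3, 7, 20150810, 350), SalesFact(1, 7, 20150810, 120)]


def describe(fact):
    """Look up the fact's dimension rows: the join that gives the measure its context."""
    product = products[fact.product_key]
    location = locations[fact.location_key]
    return (product.name, location.city, fact.date_key, fact.sales_amount)


for fact in facts:
    print(describe(fact))
```

The fact row on its own is just keys and a number; only the lookups into the dimensions turn `350` into “Nationwide Pet sales in Columbus on 2015-08-10”.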


Surrogate Keys

Before moving on to how dimensional models are designed, it is important to understand the concept of surrogate keys.

A surrogate key is an “unintelligent” (meaningless) key that, unlike a natural key, is not derived from application data.

It is generated artificially, so that changes in the source data can be accommodated within the fact and dimension tables.

It is usually a sequential integer key, with values from 1 to N assigned to the rows of a data warehouse table.
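A minimal sketch of surrogate key assignment, assuming an in-memory counter; a real warehouse would normally use a database sequence or identity column instead. The natural-key codes (`NW-PERS` and so on) are hypothetical.

```python
from itertools import count


class SurrogateKeyGenerator:
    """Hands out sequential integer keys 1..N, one per distinct natural key."""

    def __init__(self):
        self._sequence = count(1)   # yields 1, 2, 3, ...
        self._keys = {}             # natural key -> surrogate key

    def key_for(self, natural_key):
        # Reuse the surrogate key if this natural key has been seen before.
        if natural_key not in self._keys:
            self._keys[natural_key] = next(self._sequence)
        return self._keys[natural_key]


# Hypothetical natural keys from a source system.
gen = SurrogateKeyGenerator()
print([gen.key_for(code) for code in ["NW-PERS", "NW-COMM", "NW-PET", "NW-PERS"]])
# [1, 2, 3, 1]
```

This version issues one key per natural key. A Type 2 slowly changing dimension deliberately breaks that rule and issues a fresh surrogate key for each new version of a row.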


Why Surrogate Keys

• Avoid key conflicts between backend applications.
• Keep dimension keys consistent, since different backend applications may use different columns as keys.
• Shield the data warehouse from changes in the backend application data.
• Make it possible to keep the history of slowly changing dimensions.
• Surrogate keys are usually integers rather than characters.
• Surrogate keys are also used for recycling data as per business requirements.
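The first three points can be sketched with a key cross-reference table. This is a hypothetical illustration: two source systems identify the same product with different keys (`prod_cd` and `item_id`, both invented here), and the product name is assumed to be the rule for matching them.

```python
from itertools import count

# Hypothetical extracts: two source systems key the same product differently.
policy_rows = [{"prod_cd": "PET", "name": "Nationwide Pet"}]
billing_rows = [{"item_id": 30, "name": "Nationwide Pet"},
                {"item_id": 10, "name": "Nationwide Personal"}]

_sequence = count(1)
surrogate_by_product = {}   # business identity (assumed: product name) -> surrogate key
key_map = {}                # (source system, source key) -> surrogate key


def surrogate_for(source, source_key, product_name):
    """Map one source-specific key onto a single warehouse-wide integer key."""
    if product_name not in surrogate_by_product:
        surrogate_by_product[product_name] = next(_sequence)
    key_map[(source, source_key)] = surrogate_by_product[product_name]
    return key_map[(source, source_key)]


for row in policy_rows:
    surrogate_for("policy", row["prod_cd"], row["name"])
for row in billing_rows:
    surrogate_for("billing", row["item_id"], row["name"])

print(key_map)
# {('policy', 'PET'): 1, ('billing', 30): 1, ('billing', 10): 2}
```

If the billing system later renumbered item 30, only its `key_map` entry would change; fact rows that store surrogate key 1 would be unaffected.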


Designing a Dimensional Model

Understanding the business problem is the most important part. While designing a data model solution, you should be able to answer: Why, How Much, When/Where/Who, and What. Designing a dimensional model typically involves the following steps:

1. Choose the Business Process (Why)
2. Declare the Grain (How Much)
3. Identify the Dimension (When/Where/Who, the 3 Ws)
4. Identify the Fact (What)


Choose the Business Process

Identify the actual business processes the data warehouse should cover.

Describe the problem for which the model should be built.

This is the “why” of building a data model.

Here is a sample business process:

The Senior Executives at Nationwide want to determine the sales for certain products in different locations for a particular time period.


Declare the Grain

The grain describes the level of detail needed for the business problem/solution.

It is the lowest level of information stored in the fact table. This is the “How much” of building a data model.

Sample grain: The Senior Executives at Nationwide want to determine the sales for certain products in different locations for every week. So the grain is “by product by location by week”.
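Declaring the grain fixes what one fact row means. As a sketch, assuming transaction-level sales rows with hypothetical keys and amounts, the function below rolls them up to the “by product by location by week” grain, using the ISO week from `date.isocalendar()` as the week.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction-level rows: (product_key, location_key, sale_date, amount).
transactions = [
    (3, 7, date(2015, 8, 10), 35),
    (3, 7, date(2015, 8, 12), 35),
    (1, 7, date(2015, 8, 11), 10),
    (3, 7, date(2015, 8, 17), 35),
]


def week_of(day):
    """(ISO year, ISO week number) of a date, used here as the weekly grain."""
    iso = day.isocalendar()
    return (iso[0], iso[1])


def to_weekly_grain(rows):
    """Aggregate to exactly one fact row per (product, location, week)."""
    facts = defaultdict(int)
    for product_key, location_key, sale_date, amount in rows:
        facts[(product_key, location_key, week_of(sale_date))] += amount
    return dict(facts)


for grain_key, sales in to_weekly_grain(transactions).items():
    print(grain_key, sales)
```

Once the grain is declared this way, a question at a finer level (for example, sales per day) can no longer be answered from this fact table, which is why the grain should match the business problem.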


Identify the Dimension

Dimensions are the reference information for the business. They are stored in dimension tables with their attributes (columns) and hierarchies. This is the “When, Where and Who” of building a data model.

Sample dimensions: The Senior Executives at Nationwide want to determine the sales for certain products in different locations for a particular time period.

Dimensions here are: Product, Location and Time.

Dimension attributes (Product): Product Key (surrogate key), Product Name, Product Specs, Product Type.

Dimension hierarchy (Location): Location Country, Location City, Location Street, Location Name.
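A hierarchy lets the same facts be rolled up to coarser levels. The sketch below assumes a hypothetical Location dimension keyed by surrogate key, with the levels named in the sample hierarchy, and sums hypothetical weekly sales up to city or country.

```python
from collections import defaultdict

# Hypothetical Location dimension: surrogate key -> levels of the location hierarchy.
location_dim = {
    7: {"name": "Branch A", "street": "Street 1", "city": "Columbus", "country": "USA"},
    8: {"name": "Branch B", "street": "Street 2", "city": "Columbus", "country": "USA"},
    9: {"name": "Branch C", "street": "Street 3", "city": "Des Moines", "country": "USA"},
}

# Hypothetical weekly sales at the lowest level of the hierarchy (one location each).
sales_by_location = {7: 100, 8: 50, 9: 80}


def roll_up(sales, level):
    """Re-aggregate location-level sales to a coarser hierarchy level."""
    totals = defaultdict(int)
    for location_key, amount in sales.items():
        totals[location_dim[location_key][level]] += amount
    return dict(totals)


print(roll_up(sales_by_location, "city"))     # {'Columbus': 150, 'Des Moines': 80}
print(roll_up(sales_by_location, "country"))  # {'USA': 230}
```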


Identify the Fact

Facts are the measurable events described by the dimensions. This is the “What” of building a data model.

Sample facts: The Senior Executives at Nationwide want to determine the sales for certain products in certain locations for a particular time period.

The fact here is: Sum of Sales by product by location by time.
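Because sales is an additive measure, facts stored at the weekly grain can be summed over any time period. A minimal sketch, assuming hypothetical weekly fact rows identified by the date (a Monday) that starts each week:

```python
from collections import defaultdict
from datetime import date

# Hypothetical facts at the weekly grain: (product_key, location_key, week_start, sales).
weekly_facts = [
    (3, 7, date(2015, 8, 3), 70),
    (3, 7, date(2015, 8, 10), 35),
    (3, 8, date(2015, 8, 10), 105),
    (1, 7, date(2015, 8, 10), 10),
]


def sales_by_product_location(facts, start, end):
    """Sum sales by (product, location) over weeks starting between start and end, inclusive."""
    totals = defaultdict(int)
    for product_key, location_key, week_start, sales in facts:
        if start <= week_start <= end:
            totals[(product_key, location_key)] += sales
    return dict(totals)


august = sales_by_product_location(weekly_facts, date(2015, 8, 1), date(2015, 8, 31))
print(august)  # {(3, 7): 105, (3, 8): 105, (1, 7): 10}
```

A non-additive measure such as a unit price could not be summed this way, which is one reason to keep the facts as additive amounts where possible.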


Types of Dimensional Model Schemas

A star schema is one in which a central fact table is surrounded by denormalized dimension tables. A star schema can be simple or complex: a simple star schema consists of one fact table, whereas a complex star schema has more than one fact table.

A snowflake schema extends the star schema by normalizing dimension tables into additional related tables. Snowflake schemas are useful when there are low-cardinality attributes in the dimensions.

[Figure: Star Schema and Snowflake Schema diagrams]
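To illustrate the difference, the sketch below snowflakes a star-style Product dimension by moving the repeated product type description into its own table. The `type_desc` values (“Personal Lines”, “Commercial Lines”) are assumptions about what PL and CL stand for; the other values come from the deck's Product dimension.

```python
# Star-style Product dimension (denormalized): the type description repeats on every row.
# The "type_desc" values are assumptions; the other values come from the deck.
star_product_dim = [
    {"product_key": 1, "name": "Nationwide Personal", "type_code": "PL",
     "type_desc": "Personal Lines", "price": 10},
    {"product_key": 2, "name": "Nationwide Commercial", "type_code": "CL",
     "type_desc": "Commercial Lines", "price": 25},
    {"product_key": 3, "name": "Nationwide Pet", "type_code": "PL",
     "type_desc": "Personal Lines", "price": 35},
]


def snowflake(rows):
    """Split the dimension into a Product table and a normalized Product Type table."""
    product_types = {}   # type_code -> one row per product type
    products = []
    for row in rows:
        product_types[row["type_code"]] = {"type_code": row["type_code"],
                                           "type_desc": row["type_desc"]}
        products.append({k: v for k, v in row.items() if k != "type_desc"})
    return products, product_types


def star_row(product, product_types):
    """The extra join a snowflake schema needs to recover the star-style row."""
    return {**product, "type_desc": product_types[product["type_code"]]["type_desc"]}


products, product_types = snowflake(star_product_dim)
print(len(products), len(product_types))  # 3 2
```

The star version answers queries with one join from the fact table; the snowflake version needs the extra join in `star_row` but stores each type description only once.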


Differences between Star and Snowflake Schema

Ease of maintenance / change
• Star Schema: Difficult to maintain due to high redundancy.
• Snowflake Schema: Easy to maintain due to low redundancy.

Facts and dimension properties
• Star Schema: Dimension tables and fact tables are denormalized.
• Snowflake Schema: Dimension tables are normalized; fact tables are denormalized.

Ease of use
• Star Schema: Easier to understand due to simpler queries.
• Snowflake Schema: Harder to understand due to more complex queries.

Query performance
• Star Schema: Good, due to less complex joins (fewer foreign keys).
• Snowflake Schema: Poorer, due to more complex joins (more foreign keys).

Type of data warehouse
• Star Schema: Simple relations (one-to-one / one-to-many).
• Snowflake Schema: Complex relations (many-to-many).

When to use
• Star Schema: Smaller dimension tables.
• Snowflake Schema: Larger dimension tables; the snowflake schema helps reduce space.


Slowly Changing Dimensions

Sometimes the attribute information within a dimension is altered to reflect business decisions or rules.

These changes to the dimension data have to be accounted for in the data model.

The changes occur unpredictably rather than on a fixed schedule.

These are Slowly Changing Dimensions.


Illustration of Slowly Changing Dimensions - I

Let's consider our example: The Senior Executives at Nationwide want to determine the sales for certain products in certain locations for a particular time period.

Consider the Product dimension:

Product Key | Product Name | Product Type | Product Price
1 | Nationwide Personal | PL | $10
2 | Nationwide Commercial | CL | $25
3 | Nationwide Pet | PL | $35

Suppose the company decides tomorrow that Nationwide Pet should be classified as Others instead of PL, or that the price of Nationwide Personal should change from $10 to $15.

How will that affect the analysis and reporting and how do we account for such changes?

Do we keep the old historical data, or do we insert the new data directly?


Methodologies for Handling Slowly Changing Dimensions

Type 1 – Historical data does not need to be tracked; simply overwrite the existing data with the new data. (No history)

Type 2 – Historical data should be tracked. Create a new row for the same natural key, but with a different surrogate key. (Full history)

Type 3 – Historical data should be partially tracked, a middle ground between Type 1 and Type 2. Add columns to track the current and previous states of the changing attribute. (Partial history)


Illustrations of SCD Handling Methodologies

Let's take the product dimension. The Product Type of Nationwide Pet changes from PL to Others.

Before the change:

Product Key | Product Name | Product Type | Product Price
3 | Nationwide Pet | PL | $35

Type 1:

Product Key | Product Name | Product Type | Product Price
3 | Nationwide Pet | Others | $35

Type 2:

Product Key | Product Name | Product Type | Product Price | Effective Date | Expiry Date | Latest_Ind
3 | Nationwide Pet | PL | $35 | 01-01-2000 | 08-10-2015 | N
4 | Nationwide Pet | Others | $35 | 08-11-2015 | 12-31-9999 | Y

Type 3:

Product Key | Product Name | Product Type_Old | Product Price | Product Type_New
3 | Nationwide Pet | PL | $35 | Others
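The three methodologies can be sketched as functions over dictionary rows. This is a minimal illustration, not a production ETL routine: it assumes the product name acts as the natural key, that the caller supplies the next surrogate key (`new_key`, a hypothetical parameter), and that 9999-12-31 marks an open expiry date, matching the Type 2 table above. The usage reproduces the Nationwide Pet example.

```python
from datetime import date, timedelta

HIGH_DATE = date(9999, 12, 31)  # open-ended expiry date for the current version


def scd_type1(row, attr, new_value):
    """Type 1: overwrite the attribute in place; no history is kept."""
    return {**row, attr: new_value}


def scd_type2(rows, product_name, attr, new_value, change_date, new_key):
    """Type 2: expire the current row and add a new row with a new surrogate key.

    Assumes product_name is the natural key; new_key is supplied by the caller.
    """
    result = []
    for row in rows:
        if row["product_name"] == product_name and row["latest_ind"] == "Y":
            # Close the old version the day before the change takes effect.
            result.append({**row, "expiry_date": change_date - timedelta(days=1),
                           "latest_ind": "N"})
            result.append({**row, "product_key": new_key, attr: new_value,
                           "effective_date": change_date, "expiry_date": HIGH_DATE,
                           "latest_ind": "Y"})
        else:
            result.append(row)
    return result


def scd_type3(row, attr, new_value):
    """Type 3: keep the previous value in <attr>_old and the current one in <attr>_new."""
    current = row.get(attr + "_new", row.get(attr))
    result = {k: v for k, v in row.items() if k != attr}
    result[attr + "_old"] = current
    result[attr + "_new"] = new_value
    return result


pet = {"product_key": 3, "product_name": "Nationwide Pet",
       "product_type": "PL", "product_price": 35}

type1 = scd_type1(pet, "product_type", "Others")

history = [{**pet, "effective_date": date(2000, 1, 1),
            "expiry_date": HIGH_DATE, "latest_ind": "Y"}]
type2 = scd_type2(history, "Nationwide Pet", "product_type", "Others",
                  change_date=date(2015, 8, 11), new_key=4)

type3 = scd_type3(pet, "product_type", "Others")

for row in [type1, *type2, type3]:
    print(row)
```

The price change from the earlier question, Nationwide Personal going from $10 to $15, would be handled the same way: `scd_type1(row, "product_price", 15)` if history does not matter, or `scd_type2` if reports must still show the old price for past sales.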


Relational vs Dimensional Models

Relational Data Models | Dimensional Data Models
Units of storage are tables. | Units of storage are cubes.
Data is normalized. | Data is denormalized.
Detailed level of transactions. | Aggregates and measures used for business.
Volatile and time variant. | Non-volatile and time invariant.
Used for OLTP. | Used for OLAP cubes.
Normal reports. | Interactive, user-friendly reports.


