Datawarehousing Concepts | 7.0 | 9/7/2015
© 2 04/19/23Datawarehousing Concepts | 7.0
Objectives
The participants will be able to:
Discuss the basic concepts of data warehousing
Explain the business need for a decision support system
Define data warehouse features such as KPI, fact and dimension
Describe the architecture of a data warehouse
Describe the terms OLTP and OLAP and explain the difference between them
Describe an Entity Relationship Diagram with the help of an example
Describe the classical star schema
Explain different variations of the classical star schema
Topics
Business need for decision support system
Datawarehouse definitions
Features of Datawarehouse
Entity Relationship diagram
Classical Star Schema
Different variations of Classical Star Schema
Business need for decision support system
A decision support system needs to meet the following demands made by decision makers:
Immediate, single-point access to all relevant information, regardless of source.
Coverage of all business processes.
High quality of information, not only in terms of data content but also in terms of the ability to evaluate data flexibly.
High-quality decision-making support: the data warehouse must be developed and structured on the basis of the requirements of operative and strategic management.
Short implementation time with fewer resources: as well as being quick to implement, a data warehouse must enable simple and quick access to relevant data.
Datawarehouse Definitions
Data warehousing is a tool dedicated to the delivery of information that advances decision making, improves business practices, and empowers business users. It does this by:
Integrating data from multiple sources, internal and external.
Providing subject-oriented views of the business through current and historical data.
Providing a platform for a consistent data repository in which different sources of information can be analyzed.
Datawarehouse Definitions
Data Extraction & Loading: gathering data from operational systems (ERP / Legacy), cleansing data, aggregating the data.
Data Warehouse: optimized for performance, storing historical data, building the schema (Star Schema).
The OLAP Cube: multi-dimensional modeling, front-end access tools.
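The extraction, cleansing and aggregation steps listed above can be sketched in a few lines of Python. This is a minimal illustration only: the record layout, the field names (`customer`, `amount`) and the cleansing rules are assumptions made for the example and are not taken from any particular ERP or legacy system.

```python
from collections import defaultdict

# Hypothetical raw records, as they might arrive from an operational system.
RAW_ROWS = [
    {"customer": " c100 ", "amount": "50000"},
    {"customer": "C100", "amount": "3000"},
    {"customer": "C200", "amount": "n/a"},  # unusable amount, dropped during cleansing
    {"customer": "c200", "amount": "100000"},
]


def extract(rows):
    """Gather data from the source (here: an in-memory list)."""
    return list(rows)


def cleanse(rows):
    """Normalize customer ids and drop rows whose amount is not a whole number."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount.isdigit():
            continue
        cleaned.append({"customer": row["customer"].strip().upper(), "amount": int(amount)})
    return cleaned


def aggregate(rows):
    """Sum the amounts per customer before loading into the warehouse."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


warehouse_rows = aggregate(cleanse(extract(RAW_ROWS)))
print(warehouse_rows)
```

Keeping each step in its own function mirrors the order on the slide; a real pipeline would read from the source systems and write to warehouse tables instead of working on lists.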
Facts and Dimensions
Fact: the information that business users want to know.
Facts are the performance measures of the business.
Facts are numbers or percentages.
Sales volume, sales quantity etc. can be considered facts.
Dimension: how the data needs to be viewed, for example by Sales Organization or Distribution Channel.
A data model is based on: business objectives, business strategy.
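To make the distinction concrete, the following sketch sums one fact (sales volume) along one dimension at a time. The rows and the names `sales_org`, `channel` and `sales_volume` are invented for illustration and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical records: two dimensions (sales_org, channel) and one fact (sales_volume).
SALES = [
    {"sales_org": "EU", "channel": "Retail", "sales_volume": 50_000},
    {"sales_org": "EU", "channel": "Online", "sales_volume": 3_000},
    {"sales_org": "US", "channel": "Retail", "sales_volume": 100_000},
    {"sales_org": "US", "channel": "Online", "sales_volume": 10_000},
]


def total_by(dimension, rows, fact="sales_volume"):
    """View a numeric fact along a single dimension by summing it per dimension value."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[dimension]] += row[fact]
    return dict(totals)


print(total_by("sales_org", SALES))  # the fact viewed by sales organization
print(total_by("channel", SALES))    # the same fact viewed by distribution channel
```

The fact stays the same in both calls; only the dimension used to group it changes.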
Key Performance Indicators (KPI)
Measure categories: Internal Process Measures, Innovation and Learning Measures, Customer Measures, Financial Measures.
Example KPIs: % Sales of New Products, Customers Acquired, Customer Satisfaction, Market Share, ROI and ROA, Revenue Growth, Product Time to Market, Unit Manufacturing Cost, Days Supply to Inventory, New Product Introduction, Mgmt Skills, Employee Turnover.
Generic Data Warehouse Architecture
[Figure: generic data warehouse architecture; the labels below are recovered from the diagram]
Source data: Invoicing Systems, Purchasing Systems, General Ledger, External Data Sources, Other Internal Systems.
Processes: Extract; Transformation; Data Extraction, Integration and Cleansing Processes (Translate, Attribute, Calculate, Derive, Synchronize, Summarize).
Stores: Operational Data Store, Data Warehouse, Data Marts (Segmented Data Subsets, Summarized Data).
Other labels: Purchasing, Marketing and Sales, Corporate Information, Product Line, Location, Summation, Functional Area.
Applications: Custom Developed Applications, Query Access Tools, Data Mining, Statistical Programs.
Distinction between the operative and informative environments
OLTP Systems compared to OLAP Systems
Criterion | OLTP Systems | OLAP Systems
Target | Efficiency through automation of business processes | Generation of knowledge (competitive advantage)
Priorities | High availability, higher data volume | Simple use, flexible data access
View of data | Detailed | Frequently aggregated
Database operations | Add, change, delete (refresh) and read | Read
Typical data structures | Relational (flat tables, high normalization) | Multi-dimensional structures
Integration of data from various modules/applications | Minimal | Comprehensive
OLTP Systems compared to OLAP Systems…contd(1)
Criterion | OLTP Systems | OLAP Systems
Dataset | Dynamic, short lived (60-90 days); application oriented | Static, historical (2+ years); subject oriented
Purpose | Day-to-day operations; highly structured repetitive processing | Planning & knowledge based functions; highly unstructured analytical processing
User base | Mostly operational community | Mostly managerial community
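The difference in database operations can be illustrated with Python's built-in `sqlite3` module. The sketch below is only an illustration of the two access patterns on a single table; the table `orders` and its columns are assumptions, and a real OLAP system would normally run on a separate, historical data store rather than on the transactional table itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount INTEGER)"
)

# OLTP-style work: many small add / change / delete operations on single rows.
conn.execute("INSERT INTO orders (order_id, region, amount) VALUES (1, 'North', 500)")
conn.execute("INSERT INTO orders (order_id, region, amount) VALUES (2, 'West', 300)")
conn.execute("INSERT INTO orders (order_id, region, amount) VALUES (3, 'North', 200)")
conn.execute("UPDATE orders SET amount = 350 WHERE order_id = 2")
conn.execute("DELETE FROM orders WHERE order_id = 3")
conn.commit()

# OLAP-style work: a read-only query that returns aggregated rather than detailed data.
totals_by_region = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(totals_by_region)
conn.close()
```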
OLAP, MOLAP, ROLAP, HOLAP
OLAP: On Line Analytical Processing.
MOLAP: Multidimensional OLAP. A multidimensional database and an analytical engine, e.g. EssBase from Arbor Software.
ROLAP: Relational OLAP. An analytical engine that front-ends a relational DB: data is stored in a relational DBMS and multidimensional views of the data are built on top of it.
HOLAP: Hybrid OLAP. A combination of relational OLAP and multidimensional OLAP.
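The ROLAP idea of building a multidimensional view on top of relational storage can be sketched as follows. This is a toy illustration, not how any particular OLAP product works: the table `sales`, its columns and the nested-dictionary "cube" are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, quarter TEXT, volume INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("North", "Q1", 53_000),
        ("North", "Q3", 25_000),
        ("West", "Q1", 110_000),
        ("West", "Q3", 300),
    ],
)

# The relational engine does the aggregation ...
rows = conn.execute(
    "SELECT region, quarter, SUM(volume) FROM sales GROUP BY region, quarter"
).fetchall()
conn.close()

# ... and a thin layer on top arranges the result as a two-dimensional view.
cube = {}
for region, quarter, volume in rows:
    cube.setdefault(region, {})[quarter] = volume

print(cube["West"]["Q1"])
```

A MOLAP engine would instead keep the data in a multidimensional store in the first place, so no relational query is needed at read time.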
Building an Entity Relationship Diagram
Developing an ERD requires an understanding of the system and its components.
Consider a hospital: patients are treated in a single ward by the doctors assigned to them. Usually each patient is assigned a single doctor, but in rare cases a patient has two.
Healthcare assistants also attend to the patients; a number of them are associated with each ward.
Initially the system will be concerned solely with drug treatment. Each patient is required to take a variety of drugs a certain number of times per day and for varying lengths of time.
The system must record details concerning patient treatment and staff payment. Some staff are paid part time, and doctors and care assistants work varying amounts of overtime at varying rates (subject to grade).
The system will also need to track which treatments are required for which patients and when, and it should be capable of calculating the cost of treatment per week for each patient (though it is currently unclear to what use this information will be put).
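One possible reading of the hospital description, written as Python dataclasses, is sketched below. The entity and attribute names are assumptions, as is the `cost_per_dose` field (the description does not say how drugs are priced); staff payment and overtime are left out to keep the sketch short.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Ward:
    name: str


@dataclass
class Doctor:
    name: str
    grade: str


@dataclass
class HealthcareAssistant:
    name: str
    ward: Ward  # a number of assistants are associated with each ward


@dataclass
class Drug:
    name: str
    cost_per_dose: float  # assumption: pricing is not described in the source


@dataclass
class Prescription:
    drug: Drug
    doses_per_day: int
    days: int  # length of the treatment


@dataclass
class Patient:
    name: str
    ward: Ward  # each patient is treated in a single ward
    doctors: List[Doctor]  # usually one, in rare cases two
    prescriptions: List[Prescription] = field(default_factory=list)

    def __post_init__(self):
        if not 1 <= len(self.doctors) <= 2:
            raise ValueError("a patient is assigned one or two doctors")

    def weekly_drug_cost(self) -> float:
        """Cost of a week in which every prescription runs on all seven days."""
        return sum(p.drug.cost_per_dose * p.doses_per_day * 7 for p in self.prescriptions)


ward = Ward("Ward A")
dr_lee = Doctor("Dr Lee", grade="Consultant")
patient = Patient(
    "Sam",
    ward,
    [dr_lee],
    [Prescription(Drug("Drug X", 0.5), 3, 10), Prescription(Drug("Drug Y", 2.0), 1, 14)],
)
print(patient.weekly_drug_cost())
```

In ERD terms, each class corresponds to an entity, each field holding another class to a relationship, and the check in `__post_init__` to the cardinality constraint on the patient-doctor relationship.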
Classical Star Schema
Fact: Customer ID, Material ID, Time ID, Sales Volume, Sales Quantity
Customer dimension: Customer ID, Customer Name, City, Region
Time dimension: Time ID, Month, Quarter, Year
Material dimension: Material ID, Material Name, Material Group
Dimension Tables
Customer Dimension Table
Customer id | Customer name | City | Region
C100 | David | London | North
C200 | Peter | Paris | West
Material Dimension Table
Material id | Material name | Material Group | …..
M1111 | Hard Disc | Hardware | …..
M2222 | Keyboard | Software | ….
Time Dimension Table
Time id | Month | Quarter | Year
07.01.2004 | 01.2004 | Q1/2004 | 2004
05.08.2004 | 08.2004 | Q3/2004 | 2004
Fact Table
Time id | Customer id | Material id | Sales Volume | Quantity
07.01.2004 | C100 | M1111 | 50,000 | 100
07.01.2004 | C100 | M2222 | 3,000 | 60
07.01.2004 | C200 | M1111 | 100,000 | 250
07.01.2004 | C200 | M2222 | 10,000 | 250
05.08.2004 | C100 | M1111 | 25,000 | 50
05.08.2004 | C200 | M2222 | 300 | 6
…. | …. | …. | …. | ….
Classical Star Schema
Customer Dimension Table
Customer id | Customer name
C100 | David
C200 | Peter
Material Dimension Table
Material id | Material name | …..
M1111 | Hard Disc | …..
M2222 | Keyboard | ….
Time Dimension Table
Time id | Month | ….
07.01.2004 | 01.2004 | ….
05.08.2004 | 08.2004 | ….
Fact Table
Time id | Customer id | Material id | Sales Volume | Quantity
07.01.2004 | C100 | M1111 | 50,000 | 100
07.01.2004 | C100 | M2222 | 3,000 | 60
…. | …. | …. | …. | ….
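The link between the fact table and its dimension tables can be tried out with SQLite from Python. The sketch below loads the sample rows of the customer dimension, the time dimension and the fact table shown in this section, then totals sales volume by region and quarter. The table and column names (`customer_dim`, `time_dim`, `sales_fact` and so on) are assumptions, and the material dimension is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer_dim (
        customer_id TEXT PRIMARY KEY, customer_name TEXT, city TEXT, region TEXT
    );
    CREATE TABLE time_dim (
        time_id TEXT PRIMARY KEY, month TEXT, quarter TEXT, year INTEGER
    );
    CREATE TABLE sales_fact (
        time_id TEXT REFERENCES time_dim (time_id),
        customer_id TEXT REFERENCES customer_dim (customer_id),
        material_id TEXT,
        sales_volume INTEGER,
        quantity INTEGER
    );
    INSERT INTO customer_dim VALUES
        ('C100', 'David', 'London', 'North'),
        ('C200', 'Peter', 'Paris', 'West');
    INSERT INTO time_dim VALUES
        ('07.01.2004', '01.2004', 'Q1/2004', 2004),
        ('05.08.2004', '08.2004', 'Q3/2004', 2004);
    INSERT INTO sales_fact VALUES
        ('07.01.2004', 'C100', 'M1111', 50000, 100),
        ('07.01.2004', 'C100', 'M2222', 3000, 60),
        ('07.01.2004', 'C200', 'M1111', 100000, 250),
        ('07.01.2004', 'C200', 'M2222', 10000, 250),
        ('05.08.2004', 'C100', 'M1111', 25000, 50),
        ('05.08.2004', 'C200', 'M2222', 300, 6);
    """
)

# Each dimension is reached from the fact table with exactly one join.
volume_by_region_quarter = conn.execute(
    """
    SELECT c.region, t.quarter, SUM(f.sales_volume)
    FROM sales_fact AS f
    JOIN customer_dim AS c ON c.customer_id = f.customer_id
    JOIN time_dim AS t ON t.time_id = f.time_id
    GROUP BY c.region, t.quarter
    ORDER BY c.region, t.quarter
    """
).fetchall()
conn.close()

for row in volume_by_region_quarter:
    print(row)
```

The fact table carries only keys and measures; descriptive attributes such as region and quarter live in the dimension tables and are pulled in by the joins.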
Snowflake Schema
Fact: Customer ID, Material ID, Time ID, Sales Volume, Sales Quantity
Material dimension, split into separate tables: Material ID + Material Name; Material ID + Material Group
Customer dimension, split into separate tables: Customer ID + Customer Name; Customer ID + City; Region
Time dimension, split into separate tables: Time ID + Month; Quarter; Year
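The effect of normalizing a dimension can be seen in a small SQLite sketch. Here the material group is moved out of the material dimension into its own table, so a query by group needs one more join than it would in a star schema. The table layout, including the surrogate key `group_id`, is an assumption chosen for illustration and is not the exact split shown in the diagram.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Assumption: the material group lives in its own table with a surrogate key.
    CREATE TABLE material_group (group_id INTEGER PRIMARY KEY, group_name TEXT);
    CREATE TABLE material_dim (
        material_id TEXT PRIMARY KEY,
        material_name TEXT,
        group_id INTEGER REFERENCES material_group (group_id)
    );
    CREATE TABLE sales_fact (
        material_id TEXT REFERENCES material_dim (material_id),
        sales_volume INTEGER
    );
    INSERT INTO material_group VALUES (1, 'Hardware'), (2, 'Software');
    INSERT INTO material_dim VALUES ('M1111', 'Hard Disc', 1), ('M2222', 'Keyboard', 2);
    INSERT INTO sales_fact VALUES ('M1111', 50000), ('M2222', 3000), ('M1111', 100000);
    """
)

# Two joins are needed to reach the group name: fact -> material -> material group.
volume_by_group = conn.execute(
    """
    SELECT g.group_name, SUM(f.sales_volume)
    FROM sales_fact AS f
    JOIN material_dim AS m ON m.material_id = f.material_id
    JOIN material_group AS g ON g.group_id = m.group_id
    GROUP BY g.group_name
    ORDER BY g.group_name
    """
).fetchall()
conn.close()
print(volume_by_group)
```

Each group name is now stored once instead of once per material, which removes redundancy at the price of the extra join.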
Summary of Datawarehousing and Modeling
A data warehouse reflects a subject-oriented view of data suitable for analysis.
A data warehouse provides high-quality information to support decision making in an organization.
KPIs are a set of measures derived from strategies, goals and objectives.
Facts are numeric measures; dimensions are the perspectives by which a fact is viewed.
The generic data warehouse architecture consists of source systems, extraction, transformation and loading, data storage and analysis.
OLTP is best suited to transactional systems (insert/update/delete), whereas OLAP is best suited to analytical purposes (executing ad hoc queries).
Summary of Datawarehousing and Modeling…contd(1)
The classical star schema consists of a single fact table surrounded by large denormalized dimension tables.
Dimension tables are linked relationally with the fact table by way of foreign key to primary key relationships.
Multi-dimensional modeling represents a dimensional view of data suitable for analysis.
The snowflake schema is a type of star schema in which dimension tables are normalized to eliminate redundancy, at the cost of an increased number of table joins.