Data Warehouse Concepts

Upload: aniludavala16

Post on 31-Dec-2015

Page 1: DWH Concepts HC

Data Warehouse Concepts

Page 2: DWH Concepts HC

Topics

• Business Intelligence

• Components of Business Intelligence

• How Does BI Help Companies?

• Data Warehouse concepts

Page 3: DWH Concepts HC

Business Intelligence

What is Business Intelligence?

• BI refers to the technologies and applications used for the collection, integration, analysis and presentation of business information.

How is it different from OLTP systems?

• OLTP systems are designed for day-to-day operations, while BI applications are for strategic decision making.

Page 4: DWH Concepts HC

Components of Business Intelligence

– Data Warehouse

– Data Marts

– OLAP (Reports & Dashboards)

– Operational Data Store

– Data Mining

Page 5: DWH Concepts HC

How Does BI Help Companies?

• Historical analysis of data (trend analysis)

• Predictive analysis and planning

• Churn management

• Strategic decision making

• Single view of the customer

• Cross-sell / up-sell

• Finding relations between products and fraud management (using data mining)

• Etc.

Page 6: DWH Concepts HC

Data Warehouse

A data warehouse is a collection of integrated, subject-oriented databases designed to support the decision support system (DSS) function, where each unit of data is relevant to some moment in time.

Goals of Data Warehousing:

• Easy accessibility of information

• Present the organization's information consistently

• Adaptive and resilient to change

• Secure bastion that protects our information assets

• Serve as a foundation for improved decision making

Page 7: DWH Concepts HC

Characteristics of DW

• Subject Oriented

• Integrated

• Non Volatile

• Time Variant (explicit dependence on time)

Page 8: DWH Concepts HC

OLTP and OLAP

OLTP:

• Online Transaction Processing (OLTP)

• OLTP systems were built to automate business transactions.

• A focus on bookkeeping functions.

• Applications were built along functional lines.

• Historical data was typically not needed or retained.

Page 9: DWH Concepts HC

OLTP and OLAP

OLAP:

• Standard reporting

• Ad-hoc query and reporting

• Multidimensional analytical reporting

• Predictive analysis and planning

Page 10: DWH Concepts HC

Dimensional Modeling

Dimensional modeling is a logical design technique for data warehousing aimed at easier access to data and a business-oriented representation of data.

Features of the Dimensional Model:

• De-normalized structures

• Database structured for faster and easier querying

• Stress is on easier interpretation of data

• Dimensional databases occupy extra space compared to the equivalent ER model because of redundancy

• Consists of fact and dimension tables

Page 11: DWH Concepts HC

Dimension tables

• Dimension tables contain the details about the business entities such as customer, product, etc. This enables the business users to better understand the data and their reports.

• Since the data in a dimension table is denormalized, it typically has a large number of columns.

• The attributes in a dimension table are typically used as row and column headings in a report or query results display.

• Dimension members are arranged into hierarchies or levels.

Page 12: DWH Concepts HC

Dimension tables

Page 13: DWH Concepts HC

Fact table

A fact table consists of the measurements, metrics or facts of a business process. It is often located at the centre of a star schema or a snowflake schema, surrounded by dimension tables.

• A typical fact table contains numeric facts and foreign keys that reference dimension tables.

Page 14: DWH Concepts HC

Star Schema

• A star schema is a combination of a fact table and several dimension tables.

• Each dimension has a foreign key relationship with the fact table.

• Such an arrangement in the dimensional model looks like a star formation, with the fact table at the core of the star and the dimension tables along the spikes of the star.
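The arrangement described above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table and column names (fact_sales, dim_product, dim_store, dim_date) and the sample rows are hypothetical and do not come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: one row per business entity, each with its own primary key.
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
    CREATE TABLE dim_store   (store_key   INTEGER PRIMARY KEY, store_name TEXT, country TEXT);
    CREATE TABLE dim_date    (date_key    INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);

    -- Fact table at the centre: numeric measures plus one foreign key per dimension.
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        store_key   INTEGER REFERENCES dim_store(store_key),
        date_key    INTEGER REFERENCES dim_date(date_key),
        quantity    INTEGER,
        amount      REAL
    );

    INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Lamp', 'Home');
    INSERT INTO dim_store   VALUES (1, 'Store A', 'India'), (2, 'Store B', 'USA');
    INSERT INTO dim_date    VALUES (20150101, 2015, 'Q1'), (20150401, 2015, 'Q2');
    INSERT INTO fact_sales  VALUES (1, 1, 20150101, 10, 50.0),
                                   (2, 1, 20150401, 1, 30.0),
                                   (1, 2, 20150401, 4, 20.0);
""")

# A typical star-schema query: join the fact table to one dimension and aggregate.
sales_by_category = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(sales_by_category)
```

Each dimension is reached from the fact table with a single join, which is what keeps star-schema queries simple.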

Page 15: DWH Concepts HC

Star Schema

Page 16: DWH Concepts HC

Snowflake Schema Design

• Dimension table hierarchies are broken into simpler tables.

• Some organizations normalize the dimension tables to save space.

• Both fact and dimension tables are normalized.

• Increases the number of joins, which leads to poor performance in retrieval of data.

• May become large and unmanageable.

• Degrades query performance.
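The extra join cost can be seen in a small sketch. Here a hypothetical product dimension is snowflaked by moving its category level into a parent table; all table names and rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The category level of the product hierarchy moves to its own parent table.
    CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE dim_product (
        product_key  INTEGER PRIMARY KEY,
        product_name TEXT,
        category_key INTEGER REFERENCES dim_category(category_key)
    );
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        amount      REAL
    );

    INSERT INTO dim_category VALUES (1, 'Stationery'), (2, 'Home');
    INSERT INTO dim_product  VALUES (1, 'Pen', 1), (2, 'Pencil', 1), (3, 'Lamp', 2);
    INSERT INTO fact_sales   VALUES (1, 50.0), (2, 15.0), (3, 30.0);
""")

# Reaching the category name now takes two joins instead of one.
sales_by_category = conn.execute("""
    SELECT c.category_name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product  p ON p.product_key  = f.product_key
    JOIN dim_category c ON c.category_key = p.category_key
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
print(sales_by_category)
```

The category name is stored once per category rather than once per product, which saves space at the price of the additional join.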

Page 17: DWH Concepts HC

Snowflake Schema Design

Page 18: DWH Concepts HC

Data Acquisition (DA)

• It is the process of extracting the relevant business information from the different source systems, transforming the data from one format into another, integrating the data into a homogeneous format and loading the data into a warehouse database.

• Data Extraction (E)

• Data Transformation (T)

• Data Loading (L)

Page 19: DWH Concepts HC

Sample ETL Process Flow

Step 1: Select the Business Process

Step 2: Declare the Grain.

Step 3: Identify the Facts

Step 4: Choose the Dimensions

Page 20: DWH Concepts HC

ETL Process

The ETL process has the following basic steps:

• Mapping the data between the source systems and the target database

• Cleansing the source data in the staging area

• Transforming the cleansed source data and then loading it into the target system

Page 21: DWH Concepts HC

ETL Process

• Source System: A database, application, file, or other storage facility from which the data in a data warehouse is derived.

• Mapping: The definition of the relationship and data flow between source and target objects.

• Staging Area: A place where data is processed before entering the warehouse.

• Cleansing: The process of resolving inconsistencies and fixing the anomalies in source data, typically as part of the ETL process.

Page 22: DWH Concepts HC

ETL Process

• Transformation: The process of manipulating data. Any manipulation beyond copying is a transformation. Examples include cleansing, aggregating, and integrating data from multiple sources.

• Transportation: The process of moving copied or transformed data from a source to a data warehouse.

• Target System: A database, application, file, or other storage facility to which the "transformed source data" is loaded in a data warehouse.
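The terms above can be tied together in a minimal ETL sketch. Everything here is assumed for illustration: the source is an in-memory list standing in for a real source system, the target is an in-memory SQLite table, and the field names and cleansing rules are hypothetical.

```python
import sqlite3

# Source system: an in-memory list standing in for an extract from a real source.
source_rows = [
    {"cust_id": "1", "name": " alice ", "country": "in"},
    {"cust_id": "2", "name": "BOB", "country": "IN"},
    {"cust_id": "2", "name": "BOB", "country": "IN"},   # duplicate record
    {"cust_id": "", "name": "no id", "country": "US"},  # missing key
]

# Mapping: source field -> target column.
mapping = {"cust_id": "customer_id", "name": "customer_name", "country": "country_code"}

def extract():
    """Extraction: pull the rows out of the source system."""
    return list(source_rows)

def cleanse(rows):
    """Staging-area step: drop rows without a key and remove duplicates."""
    seen, clean = set(), []
    for row in rows:
        if row["cust_id"] and row["cust_id"] not in seen:
            seen.add(row["cust_id"])
            clean.append(row)
    return clean

def transform(rows):
    """Transformation: rename fields per the mapping and standardise formats."""
    out = []
    for row in rows:
        target = {mapping[field]: value for field, value in row.items()}
        target["customer_id"] = int(target["customer_id"])
        target["customer_name"] = target["customer_name"].strip().title()
        target["country_code"] = target["country_code"].upper()
        out.append(target)
    return out

def load(conn, rows):
    """Loading: write the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE dim_customer "
        "(customer_id INTEGER PRIMARY KEY, customer_name TEXT, country_code TEXT)"
    )
    conn.executemany(
        "INSERT INTO dim_customer VALUES (:customer_id, :customer_name, :country_code)",
        rows,
    )

conn = sqlite3.connect(":memory:")
load(conn, transform(cleanse(extract())))
loaded = conn.execute("SELECT * FROM dim_customer ORDER BY customer_id").fetchall()
print(loaded)
```

Keeping extract, cleanse, transform and load as separate functions mirrors the separation between source system, staging area and target system.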

Page 23: DWH Concepts HC

Important aspects of Star Schema & Snowflake Schema

• In a star schema, every dimension will have a primary key.

• In a star schema, a dimension table will not have any parent table.

• Whereas in a snowflake schema, a dimension table will have one or more parent tables.

• Hierarchies for the dimensions are stored in the dimension table itself in a star schema.

• Whereas hierarchies are broken into separate tables in a snowflake schema. These hierarchies help to drill down the data from the topmost level of the hierarchy to the lowermost level.

Page 24: DWH Concepts HC

Designing a Dimension Model

Step 1: Select the Business Process

Step 2: Declare the Grain.

Step 3: Identify the Facts

Step 4: Choose the Dimensions

Page 25: DWH Concepts HC

Slowly-changing Dimensions

When the DW receives notification that some record in a dimension has changed, there are three basic responses:

• Type 1 slowly changing dimension (Correction of Errors)

• Type 2 slowly changing dimension (Preservation of History)

• Type 3 slowly changing dimension (Alternate Realities)

Page 26: DWH Concepts HC

Slowly-changing Dimensions

Type 1 Slowly Changing Dimension (Overwrite)

• Overwrite one or more values of the dimension with the new value

• Use when

  – the data are corrected

  – there is no interest in keeping history

  – there is no need to run previous reports or the changed value is immaterial to the report

• Type 1 Overwrite results in an UPDATE SQL statement when the value changes
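A minimal sketch of a Type 1 overwrite, assuming a hypothetical dim_customer table in which a misspelled city name is corrected:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_id TEXT, city TEXT)"
)
conn.execute("INSERT INTO dim_customer VALUES (1, 'C100', 'Hydrabad')")  # misspelled value

# Type 1: overwrite in place with an UPDATE; the old value is not kept anywhere.
conn.execute(
    "UPDATE dim_customer SET city = ? WHERE customer_id = ?", ("Hyderabad", "C100")
)

rows = conn.execute("SELECT * FROM dim_customer").fetchall()
print(rows)
```

The row count and the key stay the same, so reports run after the change show only the corrected value.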

Page 27: DWH Concepts HC

Slowly-changing Dimensions

Type 2 Slowly Changing Dimension (Preservation of History)

Standard: When a record changes, instead of overwriting

• create a new dimension record

• with a new surrogate key

• add the new record into the dimension table

• use this record going forward in all fact tables

• no fact tables need to change

• no aggregates need to be re-computed

Types of Type 2

• Flag based (Active/Inactive)

• Version based (1, 2, 3, ...)

• Start Date & End Date based
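A minimal sketch of a Type 2 change, assuming a hypothetical dim_customer table. For illustration it carries all three tracking styles listed above at once (flag, version, start and end date); a real design would usually pick one or two. The helper name apply_type2_change is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        customer_id  TEXT,                               -- natural (business) key
        city         TEXT,
        version      INTEGER,
        start_date   TEXT,
        end_date     TEXT,                               -- NULL while the row is current
        is_active    INTEGER
    )
""")
conn.execute(
    "INSERT INTO dim_customer (customer_id, city, version, start_date, end_date, is_active) "
    "VALUES ('C100', 'Chennai', 1, '2014-01-01', NULL, 1)"
)

def apply_type2_change(conn, customer_id, new_city, change_date):
    """Expire the current row and add a new row with a new surrogate key."""
    (version,) = conn.execute(
        "SELECT version FROM dim_customer WHERE customer_id = ? AND is_active = 1",
        (customer_id,),
    ).fetchone()
    conn.execute(
        "UPDATE dim_customer SET end_date = ?, is_active = 0 "
        "WHERE customer_id = ? AND is_active = 1",
        (change_date, customer_id),
    )
    conn.execute(
        "INSERT INTO dim_customer (customer_id, city, version, start_date, end_date, is_active) "
        "VALUES (?, ?, ?, ?, NULL, 1)",
        (customer_id, new_city, version + 1, change_date),
    )

apply_type2_change(conn, "C100", "Mumbai", "2015-06-01")
history = conn.execute(
    "SELECT customer_key, city, version, start_date, end_date, is_active "
    "FROM dim_customer ORDER BY customer_key"
).fetchall()
print(history)
```

Fact rows loaded before the change keep pointing at surrogate key 1, so history is preserved without touching any fact table.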

Page 28: DWH Concepts HC

Slowly-changing Dimensions

Type 3 Slowly Changing Dimension (Alternate Realities/Soft Changes)

• Applicable when a change happens to a dimension record but the old record remains valid as a second choice

  – Product category designations

  – Sales-territory assignments

Instead of creating a new row, a new column is inserted (if it does not already exist)

• The old value is copied to the secondary column before the new value overrides the primary column

• Example: old category, new category
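A minimal sketch of a Type 3 change, following the old category / new category example. The dim_product table, the old_category column name and the helper apply_type3_change are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT)"
)
conn.execute("INSERT INTO dim_product VALUES (1, 'Desk Lamp', 'Home')")

def apply_type3_change(conn, product_key, new_category):
    """Keep the previous value in a secondary column, then overwrite the primary one."""
    columns = [row[1] for row in conn.execute("PRAGMA table_info(dim_product)")]
    if "old_category" not in columns:
        # The secondary column is added only if it does not already exist.
        conn.execute("ALTER TABLE dim_product ADD COLUMN old_category TEXT")
    # Right-hand sides are evaluated against the row as it was before the update,
    # so old_category receives the previous category.
    conn.execute(
        "UPDATE dim_product SET old_category = category, category = ? WHERE product_key = ?",
        (new_category, product_key),
    )

apply_type3_change(conn, 1, "Lighting")
rows = conn.execute(
    "SELECT product_key, product_name, category, old_category FROM dim_product"
).fetchall()
print(rows)
```

Only one previous value is kept per column, so this approach suits changes where both the old and the new grouping remain useful for reporting, not full history.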

Page 29: DWH Concepts HC

DW Tools

Vendor        ETL                             OLAP
SAP           SAP BW / BODI                   Business Objects
Oracle        Oracle Warehouse Builder (OWB)  Hyperion / Siebel Analytics
Microsoft     SQL Server SSIS                 SQL Server SSRS
Informatica   Informatica PowerCenter         Power Analyzer
IBM           DataStage                       Cognos 8

Page 30: DWH Concepts HC

DW Project Lifecycle

• Project Planning

• Business Requirement Definition

• Technical Architecture Design

• Dimensional Modeling

• Physical Design

• Data Staging Design and Development (ETL)

• Analytic Application Specification Design and Development (OLAP)

• Testing and Production Deployment

• Maintenance

Page 31: DWH Concepts HC

Dimensions

Types of Dimensions:

• Junk Dimension

• Conformed Dimension

• Degenerate Dimension

• Slowly Changing Dimensions

Page 32: DWH Concepts HC

Junk Dimension

A junk dimension is a convenient grouping of typically low-cardinality flags and indicators. By creating an abstract dimension, these flags and indicators are removed from the fact table while placing them into a useful dimensional framework. A Junk Dimension is a dimension table consisting of attributes that do not belong in the fact table or in any of the existing dimension tables. The nature of these attributes is usually text or various flags, e.g. non-generic comments or just simple yes/no or true/false indicators.
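A minimal sketch of a junk dimension, assuming three hypothetical order flags (is_gift, is_express, payment_type). The dimension is pre-populated with every combination of the flags, and a fact row would store only the resulting key.

```python
import itertools
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_order_flags (
        flags_key    INTEGER PRIMARY KEY,
        is_gift      TEXT,
        is_express   TEXT,
        payment_type TEXT
    )
""")

# Pre-populate every combination of the low-cardinality flags (2 x 2 x 3 = 12 rows).
combos = itertools.product(["Y", "N"], ["Y", "N"], ["Cash", "Card", "Voucher"])
conn.executemany(
    "INSERT INTO dim_order_flags (is_gift, is_express, payment_type) VALUES (?, ?, ?)",
    combos,
)

def lookup_flags_key(conn, is_gift, is_express, payment_type):
    """Return the single key a fact row stores instead of three flag columns."""
    (key,) = conn.execute(
        "SELECT flags_key FROM dim_order_flags "
        "WHERE is_gift = ? AND is_express = ? AND payment_type = ?",
        (is_gift, is_express, payment_type),
    ).fetchone()
    return key

row_count = conn.execute("SELECT COUNT(*) FROM dim_order_flags").fetchone()[0]
key = lookup_flags_key(conn, "N", "Y", "Card")
print(row_count, key)
```

Because the flags are low-cardinality, the full cross product stays small; with many more flags it is more common to insert only the combinations that actually occur.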

Page 33: DWH Concepts HC

Degenerate Dimensions

A degenerate dimension is a dimension key, such as a transaction number, invoice number, ticket number, or bill-of-lading number, that has no attributes and hence does not join to an actual dimension table. Degenerate dimensions are very common when the grain of a fact table represents a single transaction item or line item because the degenerate dimension represents the unique identifier of the parent. Degenerate dimensions often play an integral role in the fact table's primary key.

Page 34: DWH Concepts HC

Degenerate Dimensions

• "A degenerate dimension is data that is dimensional in nature but stored in a fact table."

• "Any values in the fact table that don’t join to dimensions are either considered degenerate dimensions or measures."

• "A degenerate dimension is when the dimension attribute is stored as part of fact table, and not in a separate dimension table."

• "A degenerate dimension acts as a dimension key in the fact table but does not join a corresponding dimension table because all its interesting attributes have already been placed in other analytic dimensions."
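A minimal sketch of a degenerate dimension, assuming a hypothetical line-item fact table. The invoice number lives in the fact table, forms part of its primary key, and has no dimension table of its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);

    CREATE TABLE fact_invoice_line (
        invoice_number TEXT,     -- degenerate dimension: there is no dim_invoice table
        line_number    INTEGER,
        product_key    INTEGER REFERENCES dim_product(product_key),
        amount         REAL,
        PRIMARY KEY (invoice_number, line_number)
    );

    INSERT INTO dim_product VALUES (1, 'Pen'), (2, 'Lamp');
    INSERT INTO fact_invoice_line VALUES
        ('INV-001', 1, 1, 5.0),
        ('INV-001', 2, 2, 30.0),
        ('INV-002', 1, 1, 10.0);
""")

# The degenerate dimension still groups line items back into their parent invoice.
invoice_totals = conn.execute("""
    SELECT invoice_number, COUNT(*) AS line_count, SUM(amount) AS total
    FROM fact_invoice_line
    GROUP BY invoice_number
    ORDER BY invoice_number
""").fetchall()
print(invoice_totals)
```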

Page 35: DWH Concepts HC

Types of Fact Tables and Facts

Facts:

– Additive Facts

– Semi-Additive Facts

– Non-Additive Facts
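Under the usual definitions, an additive fact can be summed across every dimension, a semi-additive fact across some dimensions but not others (typically not time), and a non-additive fact across none. A minimal sketch, assuming a hypothetical daily account table with one column of each kind:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_account_daily (
        date_key      TEXT,
        account_id    TEXT,
        deposits      REAL,  -- additive: can be summed across accounts and dates
        balance       REAL,  -- semi-additive: can be summed across accounts, not dates
        interest_rate REAL   -- non-additive: a rate, so a sum is meaningless
    );
    INSERT INTO fact_account_daily VALUES
        ('2015-01-01', 'A', 100.0, 1100.0, 0.04),
        ('2015-01-02', 'A',  50.0, 1150.0, 0.04),
        ('2015-01-01', 'B', 200.0, 2200.0, 0.06),
        ('2015-01-02', 'B',  25.0, 2225.0, 0.06);
""")

# Additive: a plain SUM over every row is valid.
total_deposits = conn.execute(
    "SELECT SUM(deposits) FROM fact_account_daily"
).fetchone()[0]

# Semi-additive: sum across accounts, but only for a single date (here the latest).
closing_balance = conn.execute("""
    SELECT SUM(balance) FROM fact_account_daily
    WHERE date_key = (SELECT MAX(date_key) FROM fact_account_daily)
""").fetchone()[0]

# Non-additive: use an average (or recompute the ratio) instead of a sum.
average_rate = conn.execute(
    "SELECT AVG(interest_rate) FROM fact_account_daily WHERE date_key = '2015-01-02'"
).fetchone()[0]

print(total_deposits, closing_balance, average_rate)
```

Summing the balance column over both dates would count each account's money twice, which is the mistake the semi-additive label warns against.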

Page 37: DWH Concepts HC

Fact tables

– Transaction Fact Table

– Factless Fact Table

– Snapshot Fact Table
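Of the three, the factless fact table is the least intuitive: it records that an event happened but carries no numeric measure. A minimal sketch, assuming a hypothetical class-attendance example (the tables and rows are invented for the illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_student (student_key INTEGER PRIMARY KEY, student_name TEXT);
    CREATE TABLE dim_class   (class_key   INTEGER PRIMARY KEY, class_name TEXT);

    -- Factless fact table: only keys, no numeric measures.
    CREATE TABLE fact_attendance (
        date_key    TEXT,
        student_key INTEGER REFERENCES dim_student(student_key),
        class_key   INTEGER REFERENCES dim_class(class_key)
    );

    INSERT INTO dim_student VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO dim_class   VALUES (1, 'Maths'), (2, 'Physics');
    INSERT INTO fact_attendance VALUES
        ('2015-01-05', 1, 1),
        ('2015-01-05', 2, 1),
        ('2015-01-06', 1, 2);
""")

# With no measure to sum, analysis is done by counting rows.
attendance_by_class = conn.execute("""
    SELECT c.class_name, COUNT(*)
    FROM fact_attendance f
    JOIN dim_class c ON c.class_key = f.class_key
    GROUP BY c.class_name
    ORDER BY c.class_name
""").fetchall()
print(attendance_by_class)
```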

Page 38: DWH Concepts HC

E/R Modeling

• E/R modeling is a design technique in which we store the data in highly normalized form inside a relational database.

• Features of the ER model:

  – ER model is highly normalized

  – Stress is on optimization of OLTP transactions

Page 39: DWH Concepts HC

ER Model for Retail Sales

[Figure: ER model for retail sales; visible entity label: Promotion]

Page 40: DWH Concepts HC

Answer the following Queries

• Total Sales by Product

• Total Sales by Store

• Total Sales by Country

• Total Sales by Year

• Total Sales by Quarter
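One way these queries could be answered is against a star schema. The sketch below assumes hypothetical tables (fact_sales, dim_product, dim_store, dim_date) and sample rows; it is not the retail model from the slides, whose diagram is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, store_name TEXT, country TEXT);
    CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
    CREATE TABLE fact_sales  (product_key INTEGER, store_key INTEGER, date_key INTEGER, amount REAL);

    INSERT INTO dim_product VALUES (1, 'Pen'), (2, 'Lamp');
    INSERT INTO dim_store   VALUES (1, 'Store A', 'India'), (2, 'Store B', 'USA');
    INSERT INTO dim_date    VALUES (20141001, 2014, 'Q4'), (20150101, 2015, 'Q1');
    INSERT INTO fact_sales  VALUES (1, 1, 20141001, 10.0),
                                   (2, 1, 20150101, 30.0),
                                   (1, 2, 20150101, 20.0);
""")

# Fixed whitelist of grouping expressions, so nothing external is formatted into SQL.
GROUP_COLUMNS = {
    "product": "p.product_name",
    "store": "s.store_name",
    "country": "s.country",
    "year": "d.year",
    "quarter": "d.year || '-' || d.quarter",
}

def total_sales_by(level):
    """Return (label, total amount) pairs for one of the levels in GROUP_COLUMNS."""
    column = GROUP_COLUMNS[level]
    sql = f"""
        SELECT {column} AS label, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_product p ON p.product_key = f.product_key
        JOIN dim_store   s ON s.store_key   = f.store_key
        JOIN dim_date    d ON d.date_key    = f.date_key
        GROUP BY label
        ORDER BY label
    """
    return conn.execute(sql).fetchall()

for level in GROUP_COLUMNS:
    print(level, total_sales_by(level))
```

All five questions use the same joins and differ only in the grouping column, which is the main practical benefit of the dimensional layout.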