Introduction to Data Warehousing December 20, 2012 Tameem Ahmad M.Tech. (F) ZHCET, AMU, Aligarh

Page 1: Introduction to Data Warehousing

Introduction to Data Warehousing

December 20, 2012

Tameem Ahmad, M.Tech. (F), ZHCET, AMU, Aligarh

Page 2: Introduction to Data Warehousing

04/10/2023 Tameem Ahmad 2

References:

• W. H. Inmon, “Building the Data Warehouse”, Third Edition, New York: John Wiley & Sons, 2002.

• J. Han and M. Kamber, “Data Mining: Concepts and Techniques”, 2000.

• http://www.data-warehouse-online.com/ [Accessed: November 4, 2012]

• Mary Breslin, “Data Warehousing Battle of the Giants: Comparing the Basics of the Kimball and Inmon Models”, http://www.bibestpractices.com/view-articles/4768

Page 3: Introduction to Data Warehousing

Plan for the Presentation

• Necessity of Data Warehousing (Why is it needed?)
• What is Data Warehousing?
• Architecture
• Schema
• How to build a Data Warehouse (components)
• Data Warehousing Tools

Page 4: Introduction to Data Warehousing

Necessity is the mother of invention…

Why Data Warehouse?

Page 5: Introduction to Data Warehousing

Scenario

• ABC Pvt. Ltd. is a company with branches at Mumbai, Delhi, Chennai and Bangalore. The Sales Manager wants a quarterly sales report. Each branch has a separate operational system.

Page 6: Introduction to Data Warehousing

Scenario: ABC Pvt. Ltd.

[Diagram: the Mumbai, Delhi, Chennai and Bangalore branch systems, and the Sales Manager, who needs sales per item type per branch for the first quarter.]

Page 7: Introduction to Data Warehousing

Solution: ABC Pvt. Ltd.

Extract sales information from each database.

Store the information in a common repository at a single site.
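
The two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each branch system is stood in for by an in-memory SQLite database, and the table names (`sales`, `all_sales`), column names and sample figures are hypothetical, not taken from the presentation.

```python
import sqlite3

# Hypothetical sample rows per branch: (item_type, sale_date, amount).
BRANCH_SALES = {
    "Mumbai": [("Laptop", "2012-01-15", 900.0), ("Phone", "2012-02-10", 300.0)],
    "Delhi": [("Laptop", "2012-03-05", 950.0), ("Phone", "2012-05-20", 280.0)],
    "Chennai": [("Phone", "2012-01-30", 310.0)],
    "Bangalore": [("Laptop", "2012-02-25", 920.0), ("Laptop", "2012-03-12", 880.0)],
}


def make_branch_db(rows):
    """Stand-in for one branch's separate operational system."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (item_type TEXT, sale_date TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return conn


def build_repository(branch_dbs):
    """Extract sales from every branch and store them at a single site."""
    repo = sqlite3.connect(":memory:")
    repo.execute(
        "CREATE TABLE all_sales "
        "(branch TEXT, item_type TEXT, sale_date TEXT, amount REAL)"
    )
    for branch, conn in branch_dbs.items():
        rows = conn.execute("SELECT item_type, sale_date, amount FROM sales").fetchall()
        repo.executemany(
            "INSERT INTO all_sales VALUES (?, ?, ?, ?)",
            [(branch, *row) for row in rows],
        )
    return repo


def first_quarter_report(repo):
    """Sales per item type per branch for January to March 2012."""
    return repo.execute(
        "SELECT branch, item_type, SUM(amount) FROM all_sales "
        "WHERE sale_date BETWEEN '2012-01-01' AND '2012-03-31' "
        "GROUP BY branch, item_type ORDER BY branch, item_type"
    ).fetchall()


branch_dbs = {name: make_branch_db(rows) for name, rows in BRANCH_SALES.items()}
repo = build_repository(branch_dbs)
report = first_quarter_report(repo)
for row in report:
    print(row)
```

Once the rows sit in one repository, the quarterly report is a single query instead of four separate extracts that have to be merged by hand.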

Page 8: Introduction to Data Warehousing

Solution: ABC Pvt. Ltd.

[Diagram: the Mumbai, Delhi, Chennai and Bangalore systems feed the Data Warehouse; query and analysis tools run against it to produce the report for the Sales Manager.]

Page 9: Introduction to Data Warehousing

Data Warehousing…

• Definition: A data warehouse is a
  – subject-oriented,
  – integrated,
  – time-variant,
  – nonvolatile

collection of data in support of management’s decision-making process.

Page 10: Introduction to Data Warehousing

Subject-oriented

• A data warehouse is organized around subjects such as sales, product and customer.

• It focuses on modeling and analysis of data for decision makers.

• It excludes data that is not useful in the decision support process.

Page 11: Introduction to Data Warehousing

Integration

• A data warehouse is constructed by integrating multiple heterogeneous sources.

• Data preprocessing is applied to ensure consistency.

[Diagram: RDBMS, legacy system and flat file sources pass through data processing and data transformation into the data warehouse.]
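
As a rough illustration of what integrating heterogeneous sources involves, the sketch below converts records from three differently formatted sources into one consistent layout. The record layouts (an RDBMS tuple, a fixed-width legacy record with the amount in paise, and a CSV flat file with a US-style date) are assumptions made up for this example, as are all function and field names.

```python
import csv
import io
from datetime import datetime

# Hypothetical inputs describing the same kind of fact in three formats.
rdbms_rows = [("Mumbai", "2012-01-15", 900.0)]            # ISO date, float amount
legacy_records = [f"{'DELHI':<10}05032012{95000:09d}"]    # city(10) ddmmyyyy(8) paise(9)
flat_file = "city,date,amount\nchennai,01/30/2012,310.00\n"  # CSV, mm/dd/yyyy


def from_rdbms(row):
    """Map an RDBMS tuple onto the common record layout."""
    city, date, amount = row
    return {
        "city": city.strip().title(),
        "date": datetime.strptime(date, "%Y-%m-%d").date(),
        "amount": float(amount),
    }


def from_legacy(record):
    """Slice a fixed-width legacy record and convert paise to rupees."""
    city, date, paise = record[:10], record[10:18], record[18:27]
    return {
        "city": city.strip().title(),
        "date": datetime.strptime(date, "%d%m%Y").date(),
        "amount": int(paise) / 100,
    }


def from_flat_file(text):
    """Parse CSV text and normalize city spelling and date format."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {
            "city": row["city"].strip().title(),
            "date": datetime.strptime(row["date"], "%m/%d/%Y").date(),
            "amount": float(row["amount"]),
        }


# After transformation every record has the same keys, types and conventions.
integrated = (
    [from_rdbms(r) for r in rdbms_rows]
    + [from_legacy(r) for r in legacy_records]
    + list(from_flat_file(flat_file))
)
for record in integrated:
    print(record)
```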

Page 12: Introduction to Data Warehousing

Time-variant

• Provides information from a historical perspective, e.g. the past 5-10 years.

Page 13: Introduction to Data Warehousing

Nonvolatile

• Data once recorded cannot be updated.

• A data warehouse requires two operations in data accessing:
  – Initial loading of data
  – Access of data
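
One way to picture the load/access pair is a table that accepts inserts and reads but rejects in-place changes. The sketch below enforces this with SQLite triggers; that enforcement mechanism is an assumption chosen for the example, not something the presentation prescribes, and the `sales_history` table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_history (snapshot_date TEXT, branch TEXT, amount REAL);

-- Assumed enforcement: abort any attempt to change recorded rows.
CREATE TRIGGER no_update BEFORE UPDATE ON sales_history
BEGIN SELECT RAISE(ABORT, 'warehouse rows are not updated in place'); END;

CREATE TRIGGER no_delete BEFORE DELETE ON sales_history
BEGIN SELECT RAISE(ABORT, 'warehouse rows are not deleted'); END;
""")


def load(rows):
    """Operation 1: load new snapshot rows into the warehouse."""
    conn.executemany("INSERT INTO sales_history VALUES (?, ?, ?)", rows)


def access(branch):
    """Operation 2: read the recorded history for one branch."""
    return conn.execute(
        "SELECT snapshot_date, amount FROM sales_history "
        "WHERE branch = ? ORDER BY snapshot_date",
        (branch,),
    ).fetchall()


# Each load adds a dated snapshot, so older values stay available.
load([("2011-12-31", "Delhi", 870.0)])
load([("2012-03-31", "Delhi", 950.0)])

try:
    conn.execute("UPDATE sales_history SET amount = 0 WHERE branch = 'Delhi'")
    update_rejected = False
except sqlite3.DatabaseError:
    update_rejected = True

print(access("Delhi"), update_rejected)
```

Because new data arrives as additional dated rows rather than overwrites, the same table also keeps the historical perspective that makes the warehouse time-variant.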

Page 14: Introduction to Data Warehousing

Data Warehousing Architecture

Page 15: Introduction to Data Warehousing

Data Warehousing Architecture (Contd.)

• Data Warehouse server
  – almost always a relational DBMS, rarely flat files

• OLAP servers
  – to support and operate on multi-dimensional data structures

• Clients
  – Query and reporting tools
  – Analysis tools
  – Data mining tools

Page 16: Introduction to Data Warehousing

Data Warehousing Schema

• Star Schema
• Snowflake Schema

Page 17: Introduction to Data Warehousing

Measures & Dimensions

• Measures – Units sold, Amount

• Dimensions – Product, Time, Region
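
To make the distinction concrete: measures are the numbers being summed, and dimensions are the attributes they are summed by. The following minimal sketch uses made-up facts, and the `total` helper is a hypothetical name introduced only for this example.

```python
from collections import defaultdict

# Hypothetical facts: dimension values (product, time, region) plus measures.
facts = [
    {"product": "Laptop", "time": "2012-Q1", "region": "West", "units_sold": 3, "amount": 2700.0},
    {"product": "Phone", "time": "2012-Q1", "region": "West", "units_sold": 5, "amount": 1500.0},
    {"product": "Laptop", "time": "2012-Q2", "region": "South", "units_sold": 2, "amount": 1840.0},
]


def total(facts, measure, by):
    """Sum one measure, grouped by the values of one dimension."""
    result = defaultdict(float)
    for fact in facts:
        result[fact[by]] += fact[measure]
    return dict(result)


units_by_region = total(facts, "units_sold", by="region")
amount_by_product = total(facts, "amount", by="product")
print(units_by_region)
print(amount_by_product)
```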

Page 18: Introduction to Data Warehousing

Star Schema

• A single, large and central fact table and one table for each dimension.

• Every fact points to one tuple in each of the dimensions and has additional attributes.

• Does not capture hierarchies directly.

Page 19: Introduction to Data Warehousing

Star Schema (Contd.)

Fact Table: Store Key, Product Key, Period Key, Units, Price

Store Dimension: Store Key, Store Name, City, State, Region

Time Dimension: Period Key, Year, Quarter, Month

Product Dimension: Product Key, Product Desc

Benefits: Easy to understand, easy to define hierarchies, reduces the number of physical joins.
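
The star schema on this slide can be sketched with SQLite from Python. The column names follow the slide; the table names (`sales_fact`, `store_dim`, `time_dim`, `product_dim`) and all sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, store_name TEXT,
                          city TEXT, state TEXT, region TEXT);
CREATE TABLE time_dim    (period_key INTEGER PRIMARY KEY, year INTEGER,
                          quarter TEXT, month TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_desc TEXT);

-- Each fact row points to exactly one row in every dimension table.
CREATE TABLE sales_fact (
    store_key   INTEGER REFERENCES store_dim(store_key),
    product_key INTEGER REFERENCES product_dim(product_key),
    period_key  INTEGER REFERENCES time_dim(period_key),
    units       INTEGER,
    price       REAL
);

INSERT INTO store_dim VALUES
    (1, 'Store A', 'Mumbai', 'Maharashtra', 'West'),
    (2, 'Store B', 'Chennai', 'Tamil Nadu', 'South');
INSERT INTO time_dim VALUES (1, 2012, 'Q1', 'Jan'), (2, 2012, 'Q2', 'Apr');
INSERT INTO product_dim VALUES (1, 'Laptop'), (2, 'Phone');
INSERT INTO sales_fact VALUES
    (1, 1, 1, 3, 900.0), (1, 2, 1, 5, 300.0),
    (2, 1, 1, 2, 920.0), (2, 2, 2, 4, 310.0);
""")

# Region lives directly in store_dim, so one join per dimension is enough.
units_by_region = conn.execute("""
    SELECT s.region, t.quarter, SUM(f.units)
    FROM sales_fact f
    JOIN store_dim s ON s.store_key = f.store_key
    JOIN time_dim  t ON t.period_key = f.period_key
    GROUP BY s.region, t.quarter
    ORDER BY s.region, t.quarter
""").fetchall()
print(units_by_region)
```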

Page 20: Introduction to Data Warehousing

Snowflake Schema

• Variant of star schema model.

• A single, large and central fact table and one or more tables for each dimension.

• Dimension tables are normalized, i.e. dimension table data is split into additional tables.

Page 21: Introduction to Data Warehousing

Snowflake Schema (Contd.)

Fact Table: Store Key, Product Key, Period Key, Units, Price

Store Dimension: Store Key, Store Name, City Key

City Dimension: City Key, City, State, Region

Time Dimension: Period Key, Year, Quarter, Month

Product Dimension: Product Key, Product Desc

Drawbacks: Time-consuming joins, slow report generation.
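
The extra join that the snowflake layout introduces can be seen in a small SQLite sketch. Only the store and city dimensions are modeled here; the table names and sample rows are assumptions for illustration, while the columns follow the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The store dimension is normalized: city, state and region move to city_dim.
CREATE TABLE city_dim  (city_key INTEGER PRIMARY KEY, city TEXT,
                        state TEXT, region TEXT);
CREATE TABLE store_dim (store_key INTEGER PRIMARY KEY, store_name TEXT,
                        city_key INTEGER REFERENCES city_dim(city_key));
CREATE TABLE sales_fact (
    store_key   INTEGER REFERENCES store_dim(store_key),
    product_key INTEGER,
    period_key  INTEGER,
    units       INTEGER,
    price       REAL
);

INSERT INTO city_dim VALUES
    (10, 'Mumbai', 'Maharashtra', 'West'),
    (20, 'Chennai', 'Tamil Nadu', 'South');
INSERT INTO store_dim VALUES (1, 'Store A', 10), (2, 'Store B', 10), (3, 'Store C', 20);
INSERT INTO sales_fact VALUES
    (1, 1, 1, 3, 900.0), (2, 1, 1, 5, 900.0), (3, 2, 1, 4, 310.0);
""")

# Reaching "region" now takes two joins: fact -> store -> city.
units_by_region = conn.execute("""
    SELECT c.region, SUM(f.units)
    FROM sales_fact f
    JOIN store_dim s ON s.store_key = f.store_key
    JOIN city_dim  c ON c.city_key = s.city_key
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()
print(units_by_region)
```

The trade-off is visible in the data: the two Mumbai stores share a single `city_dim` row, so state and region are stored once per city, at the cost of one more join per query.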

Page 22: Introduction to Data Warehousing

Building the Data Warehouse

• Data Selection

• Data Pre-processing

– Fill missing values

– Remove inconsistency

• Data Transformation & Integration

• Data Loading

Data in the warehouse is stored in the form of fact tables and dimension tables.
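
The four building steps can be sketched as a small pipeline in plain Python. Everything specific here is an assumption for illustration: the sample rows, the function names, the choice to fill a missing unit count with the mean of the known values, and the use of a list as a stand-in for the warehouse fact table.

```python
from statistics import mean

# Hypothetical raw rows from an operational system; None marks a missing value.
raw = [
    {"branch": "mumbai ", "item": "Laptop", "units": 3, "price": 900.0, "clerk": "A"},
    {"branch": "Mumbai", "item": "laptop", "units": None, "price": 900.0, "clerk": "B"},
    {"branch": "DELHI", "item": "Phone", "units": 5, "price": 300.0, "clerk": "C"},
]


def select(rows):
    """Data selection: keep only fields useful for decision support."""
    keep = ("branch", "item", "units", "price")
    return [{key: row[key] for key in keep} for row in rows]


def preprocess(rows):
    """Fill missing unit counts (assumed: mean of known values) and unify spelling."""
    fill = round(mean(r["units"] for r in rows if r["units"] is not None))
    return [
        {
            **row,
            "units": fill if row["units"] is None else row["units"],
            "branch": row["branch"].strip().title(),
            "item": row["item"].strip().title(),
        }
        for row in rows
    ]


def transform(rows):
    """Transformation and integration: total units and amount per (branch, item)."""
    totals = {}
    for row in rows:
        key = (row["branch"], row["item"])
        units, amount = totals.get(key, (0, 0.0))
        totals[key] = (units + row["units"], amount + row["units"] * row["price"])
    return totals


def load(totals, warehouse):
    """Data loading: append the prepared facts to the warehouse table."""
    warehouse.extend(
        (branch, item, units, amount)
        for (branch, item), (units, amount) in sorted(totals.items())
    )


warehouse = []
load(transform(preprocess(select(raw))), warehouse)
for fact in warehouse:
    print(fact)
```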

Page 23: Introduction to Data Warehousing

Data Warehousing Tools

• Data Warehouse
  – SQL Server 2000 DTS
  – Oracle 8i Warehouse Builder

• ETL tools
  – Ab Initio
  – Informatica

• OLAP tools
  – SQL Server Analysis Services
  – Oracle Express Server

• Reporting tools
  – MS Excel Pivot Chart
  – VB Applications
  – Cognos
  – MicroStrategy
  – Hyperion

Page 24: Introduction to Data Warehousing

Thank You