seminar datawarehousing

Data Warehousing Kavisha Uniyal And Gunjan Bhandari

Upload: kavisha-uniyal

Post on 05-Dec-2014


DESCRIPTION

A presentation on data warehousing. It covers topics such as data marts and OLAP, with examples.

TRANSCRIPT

Page 1: Seminar datawarehousing

Data Warehousing

Kavisha Uniyal

And

Gunjan Bhandari

Page 2: Seminar datawarehousing

DEFINITION

• "A DATA WAREHOUSE is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management decisions."

Page 3: Seminar datawarehousing

Data Warehouse Features

• Subject-oriented: the warehouse is organized around the major subjects of the enterprise rather than its major application areas. This reflects the need to store decision-support data rather than application-oriented data.

• Integrated: the source data comes together from different enterprise-wide application systems. The source data is often inconsistent, so the integrated data must be made consistent to present a unified view of the data to the users.

• Time-variant: the data in the warehouse is accurate and valid only at some point in time or over some time interval. Time-variance also shows in the extended period for which the data is held, in the implicit or explicit association of time with all data, and in the fact that the data represents a series of snapshots.

• Non-volatile: data is not updated in real time but is refreshed from the operational systems (OS) on a regular basis. New data is always added as a supplement to the database (DB) rather than as a replacement. The database continually absorbs this new data, incrementally integrating it with the previous data.
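
The time-variant and non-volatile features can be pictured as an append-only load of dated snapshots. The following is only a minimal sketch in Python: the `warehouse` list, the `load_snapshot` and `history` helpers, and the sample rows are hypothetical and do not come from the slides.

```python
from datetime import date

# Hypothetical warehouse table: a list of rows that is only ever appended to.
warehouse = []

def load_snapshot(snapshot_date, source_rows):
    """Append one periodic refresh; earlier snapshots are never overwritten."""
    for row in source_rows:
        # Every stored row carries the date of the snapshot it belongs to.
        warehouse.append({"snapshot_date": snapshot_date, **row})

def history(item):
    """Return (snapshot_date, stock) pairs for one item across all snapshots."""
    return [(r["snapshot_date"], r["stock"]) for r in warehouse if r["item"] == item]

# Two refreshes from an operational system (made-up values).
load_snapshot(date(2014, 1, 31), [{"item": "cola", "stock": 120}])
load_snapshot(date(2014, 2, 28), [{"item": "cola", "stock": 95}])

print(history("cola"))
```

Because each refresh is added next to the earlier ones, the January value is still available after the February load, which is what makes analysis over time possible.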

Page 4: Seminar datawarehousing

Why Data Warehousing??

“Necessity is the mother of invention”

Page 5: Seminar datawarehousing

Scenario 1

Cola Pvt Ltd is a company with branches in Mumbai, Delhi, Chennai and Bangalore. The Sales Manager wants an annual sales report in order to plan the future production quantity for each branch. Each branch has a separate operational system.

Page 6: Seminar datawarehousing

Scenario 1 : Cola Pvt Ltd.

[Diagram: the Delhi, Mumbai, Chennai and Bangalore branches; the Sales Manager needs sales per item type per branch for a year.]

Page 7: Seminar datawarehousing

Solution 1: Cola Pvt Ltd.

[Diagram: Mumbai, Delhi, Chennai and Bangalore feed the Data Warehouse; Query & Analysis tools produce the Report for the Sales Manager.]
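
Once the branch data sits in one place, the report the Sales Manager asks for is a simple aggregation. The sketch below is illustrative only: the branch names come from the scenario, while the item types, dates, quantities and the `annual_report` function are assumptions made for the example.

```python
from collections import defaultdict

# Rows as they might arrive from each branch's operational system:
# (sale date as YYYY-MM-DD, item type, quantity). All values are made up.
branch_sales = {
    "Mumbai": [("2014-01-15", "cola", 100), ("2014-07-01", "cola", 50),
               ("2013-12-30", "cola", 999)],
    "Delhi": [("2014-03-10", "cola", 70), ("2014-03-11", "soda", 20)],
    "Chennai": [("2014-05-05", "soda", 30)],
    "Bangalore": [("2014-09-09", "cola", 60)],
}

def annual_report(sales_by_branch, year):
    """Total quantity per (branch, item type) for the given year."""
    totals = defaultdict(int)
    for branch, rows in sales_by_branch.items():
        for sale_date, item_type, quantity in rows:
            # Keep only rows whose date falls in the requested year.
            if sale_date.startswith(f"{year}-"):
                totals[(branch, item_type)] += quantity
    return dict(totals)

for (branch, item_type), quantity in sorted(annual_report(branch_sales, 2014).items()):
    print(branch, item_type, quantity)
```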

Page 8: Seminar datawarehousing

Need Of Data Warehouse

• Business users: require a data warehouse to view summarized data from the past. The data is presented in a simple form so that the facts and figures are easy to understand.

• Store historic data: a data warehouse is required to store time-variant data from the past.

• Selective data: the data stored in the DWH may not be the full data, because the DWH contains summarized data.

• Differentiate analytical and operational processing: an operational DB is meant for online transactions and day-to-day operations, whereas an analytical DB is used for analyzing the summarized data.

• Make strategic decisions: some strategies may depend on the data in the DWH.

Page 9: Seminar datawarehousing

3 Tier Architecture

Page 10: Seminar datawarehousing

extraction, cleaning, and transformation (e.g., to merge similar data from different

3 Tier Architecture

1. Extraction and Transformation Tier: Data is collected from various sources and refined: data that is not useful is eliminated, the rest is transformed into a standard format, and the result is loaded into the data warehouse.

2. Connective Tier: The data mart server serves as the connective, or middle, tier. The data extracted and transformed from the source DBs is stored in the DWH (central storage).

3. Data Access and Retrieval Tier: The end user enters a query through an OLAP tool. The query is processed by the warehouse, and the results, including graphical and complex ones, are displayed.
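
The flow through the three tiers can be sketched as a tiny pipeline. This is an assumption-laden illustration rather than the architecture of any particular product: the source systems, field names and all five functions below are hypothetical.

```python
def extract(sources):
    """Tier 1: collect raw rows from every source system."""
    return [row for rows in sources.values() for row in rows]

def clean(rows):
    """Tier 1: eliminate rows that are not useful (here, a missing amount)."""
    return [r for r in rows if r.get("amount") is not None]

def transform(rows):
    """Tier 1: bring rows into one standard format."""
    return [
        {"branch": r["branch"].strip().title(), "amount": float(r["amount"])}
        for r in rows
    ]

def load(warehouse, rows):
    """Tier 2: store the prepared rows in the central warehouse."""
    warehouse.extend(rows)

def query_total(warehouse, branch):
    """Tier 3: answer an end-user query against the warehouse."""
    return sum(r["amount"] for r in warehouse if r["branch"] == branch)

# Made-up source data with inconsistent formats and one unusable row.
sources = {
    "delhi_system": [{"branch": " delhi ", "amount": "120.5"},
                     {"branch": "Delhi", "amount": None}],
    "mumbai_system": [{"branch": "MUMBAI", "amount": 80}],
}

warehouse = []
load(warehouse, transform(clean(extract(sources))))
print(query_total(warehouse, "Delhi"), query_total(warehouse, "Mumbai"))
```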


Page 11: Seminar datawarehousing

DATA MART

• Definition of Data Mart: a subset of a data warehouse that stores only relevant data.

Data marts are of two types: dependent and independent.

• Dependent data mart: a subset that is created directly from a data warehouse.

• Independent data mart: a small data warehouse designed for a strategic business unit or a department.
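
A dependent data mart can be thought of as a filtered copy of warehouse rows. The sketch below assumes a hypothetical `department` field and a made-up `build_dependent_mart` helper; it only illustrates the "subset created directly from a data warehouse" idea.

```python
# Hypothetical warehouse rows covering several departments (made-up values).
warehouse = [
    {"department": "sales", "branch": "Delhi", "amount": 120},
    {"department": "hr", "branch": "Delhi", "amount": 15},
    {"department": "sales", "branch": "Mumbai", "amount": 80},
]

def build_dependent_mart(warehouse_rows, department):
    """Copy only the rows relevant to one department out of the warehouse."""
    return [dict(row) for row in warehouse_rows if row["department"] == department]

sales_mart = build_dependent_mart(warehouse, "sales")
print(sales_mart)
```

An independent data mart would instead be loaded straight from the source systems of one business unit, without a central warehouse in between.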

Page 12: Seminar datawarehousing

Scenario 2

• There are three companies, X, Y and Z, arranged along the x-axis; three countries, India, China and Japan, arranged along the y-axis; and two years, 2010 and 2011, shown along the z-axis. The intersection of one element from each of the x-, y- and z-axes gives the sales, in lakhs, of a particular company in a particular country in a particular year. The 'All' entry on each axis gives the total sales summed over every member of that dimension, for the chosen members of the other two dimensions.
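
The cube described in this scenario, including its 'All' entries, can be sketched in Python as a dictionary keyed by (company, country, year). The sales figures below are placeholders (1, 2, 3, ...) chosen for the example, not the values from the slide, and `add_all_totals` is a hypothetical helper.

```python
from itertools import product

COMPANIES = ["X", "Y", "Z"]
COUNTRIES = ["India", "China", "Japan"]
YEARS = [2010, 2011]

# Placeholder sales figures (1, 2, 3, ...), NOT the values from the slide.
base_cells = {
    key: i + 1
    for i, key in enumerate(product(COMPANIES, COUNTRIES, YEARS))
}

def add_all_totals(cells):
    """Return a cube that also holds an 'All' member on every axis."""
    cube = {}
    for (company, country, year), sales in cells.items():
        # Each base cell contributes to itself and to every total that
        # replaces one or more of its coordinates with 'All'.
        for key in product((company, "All"), (country, "All"), (year, "All")):
            cube[key] = cube.get(key, 0) + sales
    return cube

cube = add_all_totals(base_cells)
print(cube[("X", "India", 2010)])    # one company, one country, one year
print(cube[("X", "All", "All")])     # company X over all countries and years
print(cube[("All", "All", "All")])   # grand total
```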

Page 13: Seminar datawarehousing

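[Figure: sales data cube with axes company (X, Y, Z, All), country (INDIA, CHINA, JAPAN, All) and year (2010, 2011, All). Values shown on the visible face:]

X | Y | Z | All
31 | 46 | 24 | 101
4 | 9 | 3 | 16
7 | 12 | 5 | 24
20 | 25 | 16 | 61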

Page 14: Seminar datawarehousing

Online Analytical Processing (OLAP)

• It enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user.


Page 15: Seminar datawarehousing

OLAP Operations

1. ROLL UP: Roll-up uses a concept hierarchy, which maps lower-level details to higher-level, more general ones.

For example: STREET → AREA → CITY → STATE
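
Roll-up along such a hierarchy amounts to re-grouping detailed figures under their higher-level parents. The following sketch uses an invented hierarchy and invented sales values; `HIERARCHY`, `sales_by_street` and `roll_up` are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical concept hierarchy: each street maps to its area, city and state.
HIERARCHY = {
    "Street 1": {"area": "Area A", "city": "City P", "state": "State S"},
    "Street 2": {"area": "Area A", "city": "City P", "state": "State S"},
    "Street 3": {"area": "Area B", "city": "City P", "state": "State S"},
    "Street 4": {"area": "Area C", "city": "City Q", "state": "State S"},
}

# Made-up sales recorded at the most detailed level (street).
sales_by_street = {"Street 1": 10, "Street 2": 5, "Street 3": 7, "Street 4": 3}

def roll_up(street_sales, level):
    """Aggregate street-level sales up to 'area', 'city' or 'state'."""
    totals = defaultdict(int)
    for street, sales in street_sales.items():
        totals[HIERARCHY[street][level]] += sales
    return dict(totals)

print(roll_up(sales_by_street, "area"))
print(roll_up(sales_by_street, "city"))
print(roll_up(sales_by_street, "state"))
```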

Page 16: Seminar datawarehousing

2. DRILL DOWN: It is the opposite of roll-up. It goes from higher-level details to lower-level details.

For example: CONTINENT → COUNTRY → STATE
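
Drill-down can be sketched as showing the next lower level beneath a chosen member. The place names, sales values, `LEVELS` tuple and `drill_down` function below are hypothetical and serve only to illustrate the CONTINENT, COUNTRY, STATE example.

```python
LEVELS = ("continent", "country", "state")

# Made-up facts stored at the most detailed level (state).
facts = [
    {"continent": "Continent 1", "country": "Country A", "state": "State A1", "sales": 4},
    {"continent": "Continent 1", "country": "Country A", "state": "State A2", "sales": 6},
    {"continent": "Continent 1", "country": "Country B", "state": "State B1", "sales": 5},
]

def drill_down(rows, selected):
    """Given the members chosen so far, from the top level down
    (e.g. ("Continent 1",)), return sales totals at the next lower level."""
    depth = len(selected)
    if depth >= len(LEVELS):
        raise ValueError("already at the most detailed level")
    next_level = LEVELS[depth]
    totals = {}
    for row in rows:
        # Keep only rows that lie under every member selected so far.
        if all(row[LEVELS[i]] == selected[i] for i in range(depth)):
            totals[row[next_level]] = totals.get(row[next_level], 0) + row["sales"]
    return totals

print(drill_down(facts, ()))                            # continent level
print(drill_down(facts, ("Continent 1",)))              # countries in it
print(drill_down(facts, ("Continent 1", "Country A")))  # states in Country A
```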

Page 17: Seminar datawarehousing

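3. SLICE AND DICE: Slice selects a single value along one dimension, and dice selects a sub-cube by restricting several dimensions. Both are used for searching and accessing data in the cube.

[Figure: a cube with axes YEAR, COUNTRY and COMPANY, and a view with axes COUNTRY and COMPANY.]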
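
As a rough illustration, slicing and dicing can be written as filters over a cube stored as a dictionary keyed by (company, country, year). The cell values and the `slice_cube` and `dice_cube` helpers are made up for this sketch.

```python
# Made-up cube cells keyed by (company, country, year).
cube = {
    ("X", "India", 2010): 4,
    ("Y", "India", 2010): 9,
    ("X", "China", 2010): 7,
    ("Y", "China", 2011): 12,
    ("X", "India", 2011): 5,
}

def slice_cube(cells, axis, value):
    """Fix one dimension to a single value and drop it from the key."""
    return {
        key[:axis] + key[axis + 1:]: sales
        for key, sales in cells.items()
        if key[axis] == value
    }

def dice_cube(cells, companies, countries, years):
    """Keep the sub-cube whose coordinates fall inside the given sets."""
    return {
        key: sales
        for key, sales in cells.items()
        if key[0] in companies and key[1] in countries and key[2] in years
    }

# Slice: year = 2010 leaves a COMPANY by COUNTRY view.
print(slice_cube(cube, axis=2, value=2010))
# Dice: company X only, two countries, both years.
print(dice_cube(cube, {"X"}, {"India", "China"}, {2010, 2011}))
```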

Page 18: Seminar datawarehousing

4. PIVOT OR ROTATE: This operation is used when a user wants to change the orientation of the view of the cube. In this operation, the positions of some rows or columns may be swapped.

[Figure: the cube before and after rotation, with the year, country and company axes in swapped positions.]
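
For a two-dimensional view, a pivot is simply an exchange of rows and columns. The sketch below assumes a nested-dictionary view with made-up values; the `pivot` function is hypothetical.

```python
# A 2-D view of the cube: rows are countries, columns are companies.
# The values are made up for illustration.
view = {
    "India": {"X": 4, "Y": 9},
    "China": {"X": 7, "Y": 12},
}

def pivot(table):
    """Rotate the view so that rows become columns and columns become rows."""
    rotated = {}
    for row_label, row in table.items():
        for col_label, value in row.items():
            rotated.setdefault(col_label, {})[row_label] = value
    return rotated

print(pivot(view))
```

The data itself is unchanged; rotating the view twice returns the original layout.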

Page 19: Seminar datawarehousing

Difference Between Data Warehouse and Database

Feature | Database | Data Warehouse
Orientation | Application oriented | Subject oriented
Amount of detail | Detailed data | Summarized data
Time dependence | Gives data at the moment of access | Gives data over time
Community served | Clerical community | Managerial community
Volatility | Volatile | Non-volatile
Availability | Highly available | Relaxed availability
Redundancy | Non-redundant | Some redundancies

Page 20: Seminar datawarehousing

REFERENCE

BOOK:

• Data Mining and Warehousing by KANIKA LAKHANI AND GAURAV GIRDHAR.

SEARCH ENGINE:

• Google