Seminar: Data Warehousing
A presentation on data warehousing covering topics such as data marts and OLAP, with examples.
Data Warehousing
Kavisha Uniyal and Gunjan Bhandari
DEFINITION
• “A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management's decision-making process.”
Data Warehouse Features
• Subject-oriented: the warehouse is organized around the major subjects of the enterprise (such as customers, products and sales) rather than the major application areas. This reflects the need to store decision-support data rather than application-oriented data.
• Integrated: the source data comes together from different enterprise-wide application systems and is often inconsistent. The integrated data source must be made consistent to present a unified view of the data to the users.
• Time-variant: data in the warehouse is accurate and valid only at some point in time or over some time interval. Time-variance is also shown by the extended period for which the data is held, the implicit or explicit association of time with all data, and the fact that the data represents a series of snapshots.
• Non-volatile: data is not updated in real time but is refreshed from the operational systems on a regular basis. New data is always added as a supplement to the database rather than as a replacement; the database continually absorbs this new data, incrementally integrating it with the previous data.
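The time-variant and non-volatile properties can be sketched in a few lines of Python. The class, branch names and figures here are hypothetical and only illustrate the idea: every refresh copies the operational values as a new, dated snapshot, and earlier snapshots are never overwritten.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SalesSnapshot:
    snapshot_date: date  # explicit time stamp on every row (time-variant)
    branch: str
    units_sold: int


class Warehouse:
    """Append-only store: each refresh adds a snapshot, nothing is overwritten."""

    def __init__(self) -> None:
        self._rows: list[SalesSnapshot] = []

    def refresh(self, snapshot_date: date, source: dict[str, int]) -> None:
        # Copy the operational system's current values as a new, dated snapshot.
        for branch, units in source.items():
            self._rows.append(SalesSnapshot(snapshot_date, branch, units))

    def history(self, branch: str) -> list[tuple[date, int]]:
        # All snapshots for one branch, oldest first (rows are only appended).
        return [(r.snapshot_date, r.units_sold) for r in self._rows if r.branch == branch]


# Hypothetical operational data (made-up figures); this dict IS updated in place.
operational = {"Delhi": 120, "Mumbai": 90}
wh = Warehouse()
wh.refresh(date(2011, 1, 31), operational)
operational["Delhi"] = 150  # the operational system overwrites the old value
wh.refresh(date(2011, 2, 28), operational)
print(wh.history("Delhi"))
```

The operational dictionary keeps only Delhi's latest value, while the warehouse still holds both the January and the February figures.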
Why Data Warehousing?
“Necessity is the mother of invention”
Scenario 1
Cola Pvt Ltd is a company with branches at Mumbai, Delhi, Chennai and Bangalore. The Sales Manager wants an annual sales report to plan the future production quantity in each branch, but each branch has a separate operational system.
Scenario 1: Cola Pvt Ltd.
[Figure: the Sales Manager needs sales per item type per branch for a year, but the data sits in the separate systems at Delhi, Mumbai, Chennai and Bangalore.]
Solution 1: Cola Pvt Ltd.
[Figure: data from Mumbai, Delhi, Chennai and Bangalore is loaded into a data warehouse; the Sales Manager uses query and analysis tools on the warehouse to obtain the report.]
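The Solution 1 flow can be sketched in Python. The branch names come from the scenario; the item types, years and unit figures are made up, and an in-memory list stands in for the warehouse table (an assumption for illustration, not the presenters' design).

```python
from collections import defaultdict

# Hypothetical extracts from each branch's separate operational system:
# (year, item_type, units_sold). All figures are made up.
branch_systems = {
    "Mumbai": [(2011, "cola", 500), (2011, "lemon", 200), (2010, "cola", 450)],
    "Delhi": [(2011, "cola", 650), (2011, "lemon", 150)],
    "Chennai": [(2011, "cola", 300)],
    "Bangalore": [(2011, "lemon", 250), (2011, "cola", 100)],
}


def load_warehouse(systems):
    """Merge all branch extracts into one table, tagging each row with its branch."""
    return [
        {"branch": branch, "year": year, "item_type": item, "units": units}
        for branch, rows in systems.items()
        for year, item, units in rows
    ]


def annual_sales_report(warehouse, year):
    """Sales per item type per branch for one year (the Sales Manager's query)."""
    report = defaultdict(int)
    for row in warehouse:
        if row["year"] == year:
            report[(row["branch"], row["item_type"])] += row["units"]
    return dict(report)


warehouse = load_warehouse(branch_systems)
report_2011 = annual_sales_report(warehouse, 2011)
for (branch, item), units in sorted(report_2011.items()):
    print(f"{branch:10} {item:6} {units}")
```

Once the branches feed one warehouse, the annual report is a single query over one table instead of four separate lookups.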
Need of a Data Warehouse
• Business users: need a data warehouse to view summarized historical data, presented in a simple form so that the facts and figures are easy to understand.
• Store historical data: a data warehouse is needed to store time-variant data from the past.
• Selective data: the data stored in a warehouse need not be the full operational data, because a warehouse holds summarized data.
• Separate analytical and operational processing: the operational database handles online transactions and day-to-day operations, while the analytical database is used to analyse summarized data.
• Make strategic decisions: some strategic decisions depend on the data in the warehouse.
3 Tier Architecture
[Figure: 3 tier architecture, showing the extraction, cleaning and transformation of source data into the warehouse]
1. Extraction and Transformation Tier: data is collected from various sources and refined; data that is not useful is eliminated, and the rest is transformed into a standard format and loaded into the data warehouse.
2. Connective Tier: the data mart server serves as the connective, or middle, tier. The extracted and transformed data from the source databases is stored in the data warehouse (central storage).
3. Data Access and Retrieval Tier: the end user enters a query through an OLAP tool. The warehouse processes the query, and the results, which may be complex, are displayed, often graphically.
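A minimal sketch of the first tier's extract, clean, transform and load steps follows. The two source layouts, field names and values are hypothetical; in practice an ETL tool does the same work against real databases rather than Python lists.

```python
from datetime import datetime

# Hypothetical raw records from two source systems with inconsistent formats.
source_a = [{"cust": "  Asha ", "amount": "1,200", "date": "2011-03-05"}]
source_b = [
    {"customer_name": "RAVI", "amt": 800.0, "dt": "05/03/2011"},
    {"customer_name": "", "amt": None, "dt": "06/03/2011"},  # unusable row
]


def extract():
    # Extract: map each source's own field names onto one common layout,
    # remembering which date format that source uses.
    for r in source_a:
        yield r["cust"], r["amount"], r["date"], "%Y-%m-%d"
    for r in source_b:
        yield r["customer_name"], r["amt"], r["dt"], "%d/%m/%Y"


def transform(records):
    for name, amount, raw_date, date_fmt in records:
        if not name.strip() or amount is None:
            continue  # cleaning: drop records that are not useful
        yield {
            "customer": name.strip().title(),                     # one name style
            "amount": float(str(amount).replace(",", "")),        # one numeric type
            "date": datetime.strptime(raw_date, date_fmt).date(),  # one date type
        }


warehouse_table = []


def load(rows):
    # Load: append the standardized rows to the (stand-in) warehouse table.
    warehouse_table.extend(rows)


load(transform(extract()))
print(warehouse_table)
```

Both sources record the same day in different formats; after transformation the two rows carry identical date values and comparable amounts.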
DATA MART
• Definition: a data mart is a subset of a data warehouse that stores only the data relevant to a particular group of users.
• Data marts are of two types, dependent and independent:
  • Dependent data mart: a subset created directly from a data warehouse.
  • Independent data mart: a small data warehouse designed for a strategic business unit or a department.
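A dependent data mart can be sketched as a filtered, trimmed copy of warehouse rows. The department names, columns and figures below are made up, and the function name is hypothetical. An independent data mart would instead be loaded directly from the department's own sources, without passing through the central warehouse.

```python
# Hypothetical central warehouse rows (made-up figures).
warehouse = [
    {"dept": "sales", "region": "North", "year": 2011, "revenue": 120},
    {"dept": "sales", "region": "South", "year": 2011, "revenue": 95},
    {"dept": "marketing", "region": "North", "year": 2011, "spend": 40},
    {"dept": "finance", "region": "South", "year": 2010, "budget": 300},
]


def build_dependent_mart(warehouse_rows, dept, columns):
    """Dependent data mart: a subset drawn directly from the warehouse,
    keeping only one department's rows and only the columns it needs."""
    return [
        {col: row[col] for col in columns}
        for row in warehouse_rows
        if row["dept"] == dept
    ]


sales_mart = build_dependent_mart(warehouse, "sales", ["region", "year", "revenue"])
print(sales_mart)
```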
Scenario 2
• Three companies X, Y and Z are arranged along the x-axis, three countries India, China and Japan along the y-axis, and the two years 2010 and 2011 along the z-axis. Each intersection of an element from the x, y and z axes gives the sales, in lakh, of a particular company in a particular country in a particular year. ‘All’ on an axis gives the total sales over every value of that dimension, for the intersecting values of the other dimensions.
[Figure: sales cube with company (X, Y, Z, All), country (India, China, Japan, All) and year (2010, 2011, All) as its three axes]
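The cube and its ‘All’ entries can be sketched as a dictionary keyed by (company, country, year). The figures are made up and are not the values from the slide; missing cells count as zero sales, which is an assumption of this sketch.

```python
from itertools import product

# Made-up sales figures (in lakh), NOT the values from the slide.
sales = {
    ("X", "India", 2010): 5, ("X", "India", 2011): 6,
    ("Y", "China", 2010): 3, ("Z", "Japan", 2011): 4,
}
companies = ["X", "Y", "Z"]
countries = ["India", "China", "Japan"]
years = [2010, 2011]
ALL = "All"


def cell(company, country, year):
    """Value at one cube coordinate; 'All' on an axis sums over that dimension."""
    return sum(
        sales.get((c, k, y), 0)  # absent combinations are treated as zero
        for c, k, y in product(
            companies if company == ALL else [company],
            countries if country == ALL else [country],
            years if year == ALL else [year],
        )
    )


print(cell("X", "India", ALL))  # X's sales in India over both years
print(cell(ALL, ALL, ALL))      # grand total over the whole cube
```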
Online Analytical Processing (OLAP)
• It enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user.
OLAP Operations
1. ROLL UP: roll-up uses a concept hierarchy, which maps lower-level, detailed values to higher-level, more general ones, and aggregates the data as it climbs the hierarchy.
For example: STREET → AREA → CITY → STATE
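Roll-up along the STREET → AREA → CITY → STATE hierarchy can be sketched as repeated aggregation through a parent mapping. The place names and sales figures are hypothetical.

```python
from collections import defaultdict

# Hypothetical concept hierarchy: each value maps to its parent one level up.
parent = {
    "street": {"Street 1": "Area 1", "Street 2": "Area 1", "Street 3": "Area 2"},
    "area": {"Area 1": "City A", "Area 2": "City B"},
    "city": {"City A": "State P", "City B": "State P"},
}
LEVELS = ["street", "area", "city", "state"]


def roll_up(totals, level):
    """Aggregate totals keyed at `level` to the next, more general level."""
    rolled = defaultdict(int)
    for key, value in totals.items():
        rolled[parent[level][key]] += value
    return dict(rolled), LEVELS[LEVELS.index(level) + 1]


street_sales = {"Street 1": 10, "Street 2": 5, "Street 3": 7}
area_sales, lvl = roll_up(street_sales, "street")  # {'Area 1': 15, 'Area 2': 7}
city_sales, lvl = roll_up(area_sales, lvl)         # {'City A': 15, 'City B': 7}
state_sales, lvl = roll_up(city_sales, lvl)        # {'State P': 22}
```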
2. DRILL DOWN: the opposite of roll-up. It goes from higher-level, summarized data to lower-level, more detailed data.
For example: CONTINENT → COUNTRY → STATE
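Drill-down is only possible if the detailed data is kept. The sketch below, with hypothetical names and made-up figures, stores facts at the state level and re-aggregates them at a finer level each time the user drills into one value.

```python
from collections import defaultdict

# Hypothetical facts stored at the lowest level (state), with the higher
# levels of the location hierarchy carried on each row. Made-up figures.
facts = [
    {"continent": "Asia", "country": "Country 1", "state": "State A", "sales": 4},
    {"continent": "Asia", "country": "Country 1", "state": "State B", "sales": 6},
    {"continent": "Asia", "country": "Country 2", "state": "State C", "sales": 3},
]


def totals_by(level, **filters):
    """Aggregate sales at `level`, restricted to rows matching `filters`."""
    out = defaultdict(int)
    for row in facts:
        if all(row[k] == v for k, v in filters.items()):
            out[row[level]] += row["sales"]
    return dict(out)


print(totals_by("continent"))                     # summary view
print(totals_by("country", continent="Asia"))     # drill down into Asia
print(totals_by("state", country="Country 1"))    # drill down into Country 1
```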
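3. SLICE AND DICE: these operations select part of the cube. A slice fixes one dimension to a single value (for example, year = 2010), leaving a sub-cube with one dimension fewer; a dice selects a subset of values on two or more dimensions, leaving a smaller sub-cube.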
[Figure: slice and dice on the year, country and company dimensions of the cube]
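Slice and dice can be sketched on a cube stored as a dictionary keyed by (company, country, year). The cell values are made up and the function names are hypothetical.

```python
# Hypothetical cube cells keyed by (company, country, year); made-up values.
cube = {
    ("X", "India", 2010): 5, ("X", "India", 2011): 6,
    ("Y", "China", 2010): 3, ("Y", "Japan", 2011): 2,
    ("Z", "Japan", 2010): 4, ("Z", "India", 2011): 1,
}


def slice_cube(cube, year):
    """Slice: fix one dimension (year) to a single value; a 2-D sub-cube remains."""
    return {(co, ct): v for (co, ct, y), v in cube.items() if y == year}


def dice_cube(cube, companies, countries, years):
    """Dice: keep a chosen subset of values on several dimensions at once."""
    return {
        key: v for key, v in cube.items()
        if key[0] in companies and key[1] in countries and key[2] in years
    }


print(slice_cube(cube, 2010))
# {('X', 'India'): 5, ('Y', 'China'): 3, ('Z', 'Japan'): 4}
print(dice_cube(cube, {"X", "Z"}, {"India"}, {2010, 2011}))
# {('X', 'India', 2010): 5, ('X', 'India', 2011): 6, ('Z', 'India', 2011): 1}
```

The slice drops the year key entirely, while the dice keeps all three dimensions but fewer values on each.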
4. PIVOT OR ROTATE: this operation is used when a user wants to change the orientation of the view of the cube, for example by swapping which dimensions appear as rows and which as columns.
[Figure: pivoting the cube, rotating the year, country and company axes]
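Pivoting a two-dimensional view can be sketched as swapping row and column keys. The view (countries as rows, years as columns) and its values are made up.

```python
# Hypothetical 2-D view: rows = country, columns = year (made-up values).
view = {
    "India": {2010: 11, 2011: 7},
    "China": {2010: 3, 2011: 0},
}


def pivot(table):
    """Rotate the view: former columns become rows and vice versa."""
    rotated = {}
    for row_key, cols in table.items():
        for col_key, value in cols.items():
            rotated.setdefault(col_key, {})[row_key] = value
    return rotated


print(pivot(view))
# {2010: {'India': 11, 'China': 3}, 2011: {'India': 7, 'China': 0}}
```

No values change; only the orientation does, so pivoting twice gives back the original view.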
Difference Between Data Warehouse and Database
| Aspect | Database | Data Warehouse |
|---|---|---|
| Orientation | Application-oriented | Subject-oriented |
| Amount of detail | Detailed data | Summarized data |
| Time dependence | Gives data as at the moment of access | Gives data over time |
| Community served | Clerical community | Managerial community |
| Volatility | Volatile | Non-volatile |
| Availability | Highly available | Relaxed availability |
| Redundancy | Non-redundant | Some redundancy |