why data warehouse? scenario 1 abc pvt. ltd is a company with branches at mumbai, delhi, chennai and...
TRANSCRIPT
Why Data Warehouse?
ABC Pvt. Ltd is a company with branches at Mumbai, Delhi, Chennai and Bangalore.
The Sales Manager wants quarterly sales report.
Each branch has a separate operational system.
Mumbai
Delhi
Chennai
Banglore
SalesManager
Sales per item type per branchfor first quarter.
Solution 1:ABC Pvt Ltd.
Extract sales information from each database.
Store the information in a common repository at a single site.
Mumbai
Delhi
Chennai
Banglore
DataWarehouse
SalesManager
Query &Analysis tools
Report
One Stop Shopping Super Market has huge operational database.Whenever Executives wants some report the OLTP system becomes slow and data entry operators have to wait for some time.
OperationalDatabase
Data Entry Operator
Data Entry Operator
ManagementWait
Report
Solution 2
Extract data needed for analysis from operational database.
Store it in another system, the data warehouse.
Refresh warehouse at regular intervals so that it contains up to date information for analysis.
Warehouse will contain data with historical perspective.
Operationaldatabase
DataWarehouse
Extractdata
Data EntryOperator
Data EntryOperator
ManagerTransaction
Report
Cakes & Cookies is a small, new company. The chairman of this company wants his company to grow. He needs information so that he can make correct decisions.
Solution 3Improve the quality of data before loading it
into the warehouse.Perform data cleaning and transformation
before loading the data.Use query analysis tools to support adhoc
queries.
Query & Analysis
tool
Chairman
Expansion
Improvement
sales
time
DataWarehouse
Summing up?Why do you need a warehouse?
Operational systems could not provide strategic information
Executive and managers need such strategic information for Making proper decision Formulating business strategies Establishing goals Setting objectives Monitoring results
Why operational data is not capable of producing valuable information?Data is spread across incompatible structures
and systemsNot only that, improvements in technology
had made computing faster, cheaper and available
FAILURES OF PAST DECISION-SUPPORT SYSTEMS
OLTP systems
Decision support systems
Operational and informational
Functional definition of a DWThe data warehouse is an informational
environment that Provides an integrated and total view of the
enterprise Makes the enterprise’s current and historical
information easily available for decision making Makes decision-support transactions possible
without hindering operational systems Renders the organization’s information consistent Presents a flexible and interactive source of
strategic information
Questions????Describe five differences between operational
systems and informational systemsA data warehouse in an environment, not a
product. Discuss.
A data warehouse is- subject-oriented,- integrated,- time-variant,- nonvolatile
collection of data in support of management’sdecision making process.
Subject-oriented
Data warehouse is organized around subjects such as sales, product, customer.
It focuses on modeling and analysis of data for decision makers.
Excludes data not useful in decision support process.
Integration
Data Warehouse is constructed by integrating multiple heterogeneous sources.
Data Preprocessing are applied to ensure consistency.RDBMS
LegacySystem
DataWarehouse
Flat File Data ProcessingData Transformation
IntegrationIn terms of data.
encoding structures.
Measurement ofattributes.
physical attribute. of data
naming conventions.
Data type format
remarks
Time-variant
Provides information from historical perspective, e.g. past 5-10 years
Every key structure contains either implicitly or explicitly an element of time, i.e., every record has a timestamp.
The time-variant nature in a DWAllows for analysis of the pastRelates information to the presentEnables forecasts for the future
Non-volatile
Data once recorded cannot be updated.Data warehouse requires two
operations in data accessingInitial loading of dataIncremental loading of data
load
access
Data GranularityIn an operational system, data is usually kept at the
lowest level of detail.In a DW, data is summarized at different levels.
Three data levels in a banking data warehouse
Daily Detail Monthly Summary Quaterly Summary
Account Account Account
Activity Date Month Month
Amount No. of transactions No. of transactions
Deposit/ Withdraw Withdrawals Withdrawals
Deposits Deposits
Beginning Balance Beginning Balance
Ending Balance Ending Balance
Operational v/s Information SystemFeatures Operational Information
Characteristics Operational processing Informational processing
Orientation Transaction Analysis
User Clerk,DBA,database professional
Knowledge workers
Function Day to day operation Decision support
Data Content Current Historical, archived, derived
View Detailed, flat relational Summarized, multidimensional
DB design Application oriented Subject oriented
Unit of work Short ,simple transaction Complex query
Access Read/write Read only
Operational v/s Information System
Features Operational Information
Focus Data in Information out
No. of records accessed
tens/ hundreds millions
Number of users thousands hundreds
DB size 100MB to GB 100 GB to TB
Usage Predictable, repetitive Ad hoc, random, heuristic
Response Time Sub-seconds Several seconds to minutes
Priority High performance,high availability
High flexibility,end-user autonomy
Metric Transaction throughput Query throughput
Two approaches in designing a DWTop-down approach Bottom-up approach
Enterprise view of data Narrow view of data
Inherently architected Inherently incremental
Single, central storage of data Faster implementation of manageable parts
Centralized rules and control Each datamart is developed independently
Takes longer time to build Comparatively less time than a DW
Higher risk to failure Less risk of failure
Needs higher level of cross-functional skills
Unmanageable interfaces
Bottom Up Approach
Top Down Approach
A Practical Approach-Kimball1. Plan and Define requirements2. Create a surrounding architecture3. Conform and Standardize the data Content4. Implement Data Warehouse as series of
super-mart one at a time.
Individual Architected Data Marts
Common Logical Subject Area ERD
Common Business Dimensions
Common Business Rules
Common Business Metrics
Glossary
SalesDistribution
Product
Marketing Customer Accounts
Finance Inventory Vendors
An Incremental Approach
ArchitectedArchitectedEnterpriseEnterpriseFoundationFoundation
SalesDistribution
Product
Marketing Customer Accounts
Finance Inventory Vendors
Enterprise Data Warehouse
The Eventual Result
Data Warehouse:
Holds multiple subject areasHolds very detailed informationWorks to integrate all data sourcesDoes not necessarily use a dimensional model but feeds dimensional models.Data Mart
Often holds only one subject area- for example, Finance, or SalesMay hold more summarised data (although many hold full detail)Concentrates on integrating information from a given subject area or set of source systemsIs built focused on a dimensional model using a star schema.
Data Warehouse verses data marts