Introduction to DW
Presentation on Data Warehousing
From DBMS to Decision Support
• DBMSs widely used to maintain transactional data
• Attempts to use these data for analysis, exploration, identification of trends, etc. have led to Decision Support Systems
• Rapid growth since the mid-1970s
• DBMS vendors have answered this trend by adding new features to existing products
• Rarely enough
DBs for Decision Support
• Trend towards Data Warehousing
• Data Warehousing: consolidation of data from several databases, each maintained by an individual business unit, together with historical and summary information
Characteristics of TPSs
Characteristic                     OLTP
Typical operation                  Update
Level of analytical requirements   Low
Screens                            Unchanging
Amount of data per transaction     Small
Data level                         Detailed
Age of data                        Current
Orientation                        Records
TPS vs Decision Support
Complex Analysis
• Historical information to analyze
• Data needs to be integrated
• Database design: denormalized, star schema
OLTP
• Information to support day-to-day service
• Data stored at transaction level
• Database design: normalized
MIS and Decision Support
• MIS systems provided business data
• Reports were developed on request
• Reports provided little analysis capability
• No personal ad hoc access to data
Diagram labels: Production platforms; Operational reports; Decision makers; Ad hoc access
Analyzing Data from Operational Systems
• Data structures are complex
• Systems are designed for high performance and throughput
• Data is not meaningfully represented
• Data is dispersed
• TPS systems unsuitable for intensive queries
Diagram labels: Production platforms; ERP; Operational reports
Data Extract Processing
• End-user computing offloaded from the operational environment
• User's own data
Diagram labels: Operational systems; Extracts; Decision makers
Management Issues
• Extract explosion
• Duplicated effort
• Multiple technologies
• Obsolete reports
• No metadata
Diagram labels: Operational systems; Extracts; Decision makers
Data Quality Issues
• No common time basis
• Different calculation algorithms
• Different levels of extraction
• Different levels of granularity
• Different data field names
• Different data field meanings
• Missing information
• No data correction rules
• No drill-down capability
From Extract to Warehouse DSS
• Controlled
• Reliable
• Quality information
• Single source of data
Diagram labels: Internal and external systems; Data warehouse; Decision makers
Data Warehousing Architecture
Diagram labels:
• External Data Sources, Operational Databases
• Extract, Clean, Transform, Load, Refresh
• Data Warehouse, Metadata repository
• Serves: OLAP
• Visualisation, Data Mining
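The extract, clean, transform and load steps named in the architecture can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the presentation's implementation: the source rows, the field names (`cust`, `customer_name`, `amt`, `amount`) and the in-memory SQLite "warehouse" are all hypothetical.

```python
import sqlite3

# Hypothetical extracts from two sources that name the same fields differently.
operational_rows = [{"cust": " Alice ", "amt": "120.50"}, {"cust": "Bob", "amt": None}]
external_rows = [{"customer_name": "carol", "amount": "80"}]


def extract():
    """Gather raw rows from every source, tagging each with its origin."""
    for row in operational_rows:
        yield "operational", row
    for row in external_rows:
        yield "external", row


def clean_and_transform(source, row):
    """Map source-specific field names to one schema; reject incomplete rows."""
    name = row.get("cust") or row.get("customer_name")
    amount = row.get("amt") or row.get("amount")
    if name is None or amount is None:
        return None  # missing information: skip rather than guess
    return (name.strip().title(), float(amount), source)


def load(conn, records):
    """Insert cleaned records into the (hypothetical) warehouse table."""
    conn.executemany(
        "INSERT INTO sales (customer, amount, source) VALUES (?, ?, ?)", records
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, amount REAL, source TEXT)")
cleaned = [
    rec
    for rec in (clean_and_transform(src, row) for src, row in extract())
    if rec is not None
]
load(conn, cleaned)
loaded = conn.execute(
    "SELECT customer, amount, source FROM sales ORDER BY customer"
).fetchall()
```

The row with a missing amount is dropped during cleaning, so only two records reach the warehouse table. A real refresh step would rerun this pipeline periodically.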
Business Motivators
• Provide superior services and products
• Know the business
• New products
• Invest in customers
• Retain customers
• Invest in technology
• Reinvent to face new challenges
Centralised data warehouse
Diagram labels: Mainframe; Corporate data warehouse (Corporate, Financial, Marketing, Manufacturing, Distribution); Server; Analyst (three)
Federated data warehouse
Diagram labels: Mainframe; Corporate data warehouse; Financial; Marketing; Manufacturing; Distribution; Analyst (four)
Tiered data warehouse
Diagram labels: Mainframe; Corporate data warehouse; Local data mart; Workstation; Analyst
• Tier 1 (highly summarized data)
• Tier 2 (summarized data)
• Tier 3 (detailed data)
Data Warehouses Vs Data Marts
Property              Data Mart        Data Warehouse
Scope                 Department       Enterprise
Subjects              Single-subject   Multiple
Data Source           Few              Many
Size (typical)        < 100 GB         100 GB to > 1 TB
Implementation time   Months           Months to years
End-user Access Tools
• High performance is achieved by pre-planning the requirements for joins, summations, and periodic reports by end-users.
• There are five main groups of access tools:
– Data reporting and query tools
– Application development tools
– Executive information system (EIS) tools
– Online analytical processing (OLAP) tools
– Data mining tools
Data Usage - $1000 questions
Verification
• What is the average sale for in-store and catalog customers?
• What is the average high school GPA of students who graduate from college compared to those who do not?
Discovery
• What is the best predictor of sales?
• What are the best predictors of college graduation?
Need to complement RDBMS technology with a flexible, multidimensional view of data
The Functionality of OLAP
• Rotate and drill down
• Create and examine calculated data
• Determine comparative or relative differences.
• Perform exception and trend analysis.
• Perform advanced analytical functions
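Rotating, drilling down and creating calculated data, as listed above, can be illustrated with plain Python aggregation. This is a minimal sketch and not a real OLAP engine: the fact rows, the dimension names and the `aggregate` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sales facts: (year, quarter, region, revenue, expenses).
facts = [
    (2023, "Q1", "North", 100.0, 60.0),
    (2023, "Q2", "North", 120.0, 70.0),
    (2023, "Q1", "South", 80.0, 50.0),
    (2024, "Q1", "North", 150.0, 90.0),
]
FIELDS = {"year": 0, "quarter": 1, "region": 2}


def aggregate(rows, dims):
    """Sum revenue and expenses grouped by the given dimension names."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for row in rows:
        key = tuple(row[FIELDS[d]] for d in dims)
        totals[key][0] += row[3]
        totals[key][1] += row[4]
    return {key: tuple(sums) for key, sums in totals.items()}


by_year = aggregate(facts, ["year"])                     # summary level
by_year_quarter = aggregate(facts, ["year", "quarter"])  # drill down one level
by_region_year = aggregate(facts, ["region", "year"])    # rotate: regroup by another axis first

# Calculated data: margin derived from the stored measures.
margin_by_year = {key: rev - exp for key, (rev, exp) in by_year.items()}
```

Comparing entries of `margin_by_year` across years is a simple form of the comparative and trend analysis mentioned in the list.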
The star structure
Diagram (star schema):
• Facts: Week, Product, Region, Channel; Revenue, Expenses, Units
• Product: Model, Type, Color
• Region: Nation, District, Dealer
• Time: Year
• Channel
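A star structure like the one in this slide can be sketched with SQLite from Python's standard library. The table layout below is an assumption that follows the slide's labels (a fact table with Revenue, Expenses and Units, plus Product and Region dimensions); the key columns, sample rows and the choice to keep week and channel directly in the fact table are hypothetical simplifications.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables (attributes taken from the slide's labels).
CREATE TABLE product (product_id INTEGER PRIMARY KEY, model TEXT, type TEXT, color TEXT);
CREATE TABLE region  (region_id INTEGER PRIMARY KEY, nation TEXT, district TEXT, dealer TEXT);

-- Central fact table: one row per week, product, region and channel.
CREATE TABLE facts (
    week INTEGER,
    product_id INTEGER REFERENCES product(product_id),
    region_id INTEGER REFERENCES region(region_id),
    channel TEXT,
    revenue REAL,
    expenses REAL,
    units INTEGER
);

-- Hypothetical sample data.
INSERT INTO product VALUES (1, 'M100', 'Sedan', 'Red'), (2, 'M200', 'Truck', 'Blue');
INSERT INTO region VALUES (1, 'US', 'West', 'Dealer A'), (2, 'US', 'East', 'Dealer B');
INSERT INTO facts VALUES
    (1, 1, 1, 'Retail', 1000.0, 600.0, 2),
    (1, 2, 1, 'Retail', 3000.0, 2000.0, 1),
    (2, 1, 2, 'Online', 500.0, 300.0, 1);
""")

# A typical star join: facts joined to each dimension, then aggregated.
revenue_by_type = conn.execute("""
    SELECT p.type, r.district, SUM(f.revenue), SUM(f.units)
    FROM facts f
    JOIN product p ON p.product_id = f.product_id
    JOIN region r ON r.region_id = f.region_id
    GROUP BY p.type, r.district
    ORDER BY p.type, r.district
""").fetchall()
```

Every query follows the same shape: join the fact table to the dimensions of interest, filter or group on dimension attributes, and aggregate the measures.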
Multidimensional Database Model
The data is found at the intersection of dimensions.
Diagram labels: FINANCE cube (Store, Time); SALES cube (Store, Product, Time, Customer)
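The idea that data is found at the intersection of dimensions can be shown with a small dictionary keyed by coordinates. This is a minimal sketch: the cube contents, the three chosen dimensions and the helper names `cell` and `slice_total` are hypothetical.

```python
# Hypothetical SALES cube: each cell sits at the intersection of
# (store, product, time) coordinates and holds units sold.
sales_cube = {
    ("Store A", "Widget", "2024-01"): 10,
    ("Store A", "Gadget", "2024-01"): 4,
    ("Store B", "Widget", "2024-01"): 7,
    ("Store A", "Widget", "2024-02"): 12,
}
DIMENSIONS = ("store", "product", "time")


def cell(cube, store, product, time):
    """Look up a single cell; empty intersections count as zero."""
    return cube.get((store, product, time), 0)


def slice_total(cube, **fixed):
    """Sum all cells whose coordinates match the fixed dimension values."""
    total = 0
    for coords, value in cube.items():
        point = dict(zip(DIMENSIONS, coords))
        if all(point[dim] == wanted for dim, wanted in fixed.items()):
            total += value
    return total


widget_jan_store_a = cell(sales_cube, "Store A", "Widget", "2024-01")
store_a_total = slice_total(sales_cube, store="Store A")
```

Fixing one dimension (for example `store="Store A"`) gives a slice of the cube; fixing all of them gives a single cell.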
Data mining functions
• Associations
– 85 percent of customers who buy a certain brand of wine also buy a certain type of pasta
• Sequential patterns
– 32 percent of female customers who order a red jacket buy a gray skirt within six months
• Classifying
– Frequent customers are those with incomes about $50,000 and having two or more children
• Clustering
– Market segmentation
• Predicting
– Predict the revenue value of a new customer based on that person's demographic variables
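An association such as "customers who buy wine also buy pasta" is usually quantified by support and confidence. The sketch below computes both over a handful of hypothetical baskets; the data and item names are illustrative only and do not reproduce the 85 percent figure from the slide.

```python
# Hypothetical market baskets; item names are illustrative only.
baskets = [
    {"wine", "pasta", "bread"},
    {"wine", "pasta"},
    {"wine", "cheese"},
    {"pasta", "bread"},
    {"wine", "pasta", "cheese"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) from the transactions."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0  # rule is undefined without any matching baskets
    return support(antecedent | consequent, transactions) / base


wine_to_pasta = confidence({"wine"}, {"pasta"}, baskets)
```

Here three of the four baskets containing wine also contain pasta, so the confidence of the rule wine → pasta is about 0.75.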