Introduction to Data Warehousing

Upload: akeem-andrews

Post on 31-Dec-2015



TRANSCRIPT

Page 1: Introduction to  Data Warehousing

Introduction to Data Warehousing

Page 2: Introduction to  Data Warehousing

From DBMS to Decision Support

• DBMSs are widely used to maintain transactional data.

• Attempts to use these data for analysis, exploration, identification of trends, etc. have led to Decision Support Systems.

• Rapid growth since the mid-1970s.

• DBMS vendors have answered this trend by adding new features to existing products.

• This is rarely enough.

Page 3: Introduction to  Data Warehousing

DBs for Decision Support

• Trend towards Data Warehousing

• Data Warehousing: the consolidation of data from several databases, each maintained by an individual business unit, together with historical and summary information

Page 4: Introduction to  Data Warehousing

Characteristics of TPSs

Characteristic                     OLTP
Typical operation                  Update
Level of analytical requirements   Low
Screens                            Unchanging
Amount of data per transaction     Small
Data level                         Detailed
Age of data                        Current
Orientation                        Records

Page 5: Introduction to  Data Warehousing

TPS vs Decision Support

OLTP
• Information to support day-to-day service
• Data stored at transaction level
• Database design: Normalized

Complex Analysis
• Historical information to analyze
• Data needs to be integrated
• Database design: Denormalized, star schema

Page 6: Introduction to  Data Warehousing

MIS and Decision Support

• MIS systems provided business data
• Reports were developed on request
• Reports provided little analysis capability
• No personal ad hoc access to data

[Diagram labels: Production platforms, Operational reports, Decision makers, Ad hoc access]

Page 7: Introduction to  Data Warehousing

Analyzing Data from Operational Systems

• Data structures are complex
• Systems are designed for high performance and throughput
• Data is not meaningfully represented
• Data is dispersed
• TPSs are unsuitable for intensive queries

[Diagram labels: Production platforms, ERP, Operational reports]

Page 8: Introduction to  Data Warehousing

Data Extract Processing

• End-user computing offloaded from the operational environment
• User’s own data

[Diagram labels: Operational systems, Extracts, Decision makers]

Page 9: Introduction to  Data Warehousing

Management Issues

• Extract explosion
• Duplicated effort
• Multiple technologies
• Obsolete reports
• No metadata

[Diagram labels: Operational systems, Extracts, Decision makers]

Page 10: Introduction to  Data Warehousing

Data Quality Issues

• No common time basis
• Different calculation algorithms
• Different levels of extraction
• Different levels of granularity
• Different data field names
• Different data field meanings
• Missing information
• No data correction rules
• No drill-down capability
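A minimal sketch of how two of these issues (different field names and different levels of granularity) might be reconciled when consolidating extracts. The extracts, field names and figures are hypothetical, chosen only for illustration:

```python
from collections import defaultdict

# Hypothetical extracts: the two departments use different field names
# and different granularity (daily rows vs monthly rows).
sales_extract = [
    {"cust_no": "C1", "sale_dt": "2015-01-03", "amt": 120.0},
    {"cust_no": "C1", "sale_dt": "2015-01-17", "amt": 80.0},
    {"cust_no": "C2", "sale_dt": "2015-02-02", "amt": 50.0},
]
finance_extract = [
    {"customer_id": "C3", "month": "2015-01", "revenue": 300.0},
]


def from_sales(row):
    """Map a daily sales row to the common (customer, month, revenue) form."""
    return row["cust_no"], row["sale_dt"][:7], row["amt"]


def from_finance(row):
    """Map a monthly finance row to the common (customer, month, revenue) form."""
    return row["customer_id"], row["month"], row["revenue"]


def consolidate(sales_rows, finance_rows):
    """Sum revenue per (customer, month) across both extracts."""
    totals = defaultdict(float)
    for customer, month, revenue in map(from_sales, sales_rows):
        totals[(customer, month)] += revenue
    for customer, month, revenue in map(from_finance, finance_rows):
        totals[(customer, month)] += revenue
    return dict(totals)


monthly = consolidate(sales_extract, finance_extract)
```

Here the month is taken as the common time basis, so the daily rows are rolled up before they are combined with the monthly ones.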

Page 11: Introduction to  Data Warehousing

From Extract to Warehouse DSS

• Controlled
• Reliable
• Quality information
• Single source of data

[Diagram labels: Internal and external systems, Data warehouse, Decision makers]

Page 12: Introduction to  Data Warehousing

Data Warehousing Architecture

[Diagram: External Data Sources and Operational Databases feed the Extract, Clean, Transform, Load and Refresh steps into the Data Warehouse, which has a Metadata repository and serves OLAP, Visualisation and Data Mining.]
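The extract, clean, transform, load and refresh steps named in the architecture can be sketched roughly as follows. This is only an illustration using Python's built-in sqlite3 module with in-memory databases; the table names (`orders`, `fact_orders`), columns and sample rows are assumptions, not part of the slides.

```python
import sqlite3


def extract(source):
    """Pull raw rows from the (hypothetical) operational database."""
    return source.execute("SELECT id, customer, amount FROM orders").fetchall()


def clean(rows):
    """Drop rows with a missing customer or amount."""
    return [r for r in rows if r[1] is not None and r[2] is not None]


def transform(rows):
    """Normalise customer names to a single representation."""
    return [(oid, cust.strip().upper(), amount) for oid, cust, amount in rows]


def load(warehouse, rows):
    """Load rows into the warehouse; re-running replaces rows by key (a simple refresh)."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    warehouse.executemany("INSERT OR REPLACE INTO fact_orders VALUES (?, ?, ?)", rows)
    warehouse.commit()


# Hypothetical operational database with one dirty row.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, " acme ", 10.5), (2, None, 5.0), (3, "Globex", 7.0)],
)

warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(clean(extract(source))))
loaded = warehouse.execute(
    "SELECT order_id, customer, amount FROM fact_orders ORDER BY order_id"
).fetchall()
```

A real pipeline would also record what was loaded and when in the metadata repository; that part is left out of this sketch.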

Page 13: Introduction to  Data Warehousing

Business Motivators

• Provide superior services and products

• Know the business

• New products

• Invest in customers

• Retain customers

• Invest in technology

• Reinvent to face new challenges

Page 14: Introduction to  Data Warehousing

Centralised data warehouse

[Diagram labels: Mainframe; Corporate data warehouse (Corporate, Financial, Marketing, Manufacturing, Distribution); Server; Analysts]

Federated data warehouse

[Diagram labels: Mainframe; Corporate data warehouse; Financial, Marketing, Manufacturing, Distribution; Analysts]

Page 15: Introduction to  Data Warehousing

Tiered data warehouse

• Tier 1 (highly summarized data)
• Tier 2 (summarized data)
• Tier 3 (detailed data)

[Diagram labels: Workstation, Local data mart, Corporate data warehouse, Mainframe, Analyst]

Page 16: Introduction to  Data Warehousing

Data Warehouses vs Data Marts

Property              Data Warehouse      Data Mart
Scope                 Enterprise          Department
Subjects              Multiple            Single-subject
Data Source           Many                Few
Size (typical)        100 GB to > 1 TB    < 100 GB
Implementation time   Months to years     Months

Page 17: Introduction to  Data Warehousing

End-user Access Tools

• High performance is achieved by pre-planning the requirements for joins, summations, and periodic reports by end-users.

• There are five main groups of access tools:
  – Data reporting and query tools
  – Application development tools
  – Executive information system (EIS) tools
  – Online analytical processing (OLAP) tools
  – Data mining tools

Page 18: Introduction to  Data Warehousing

Data Usage - $1000 questions

Verification
• What is the average sale for in-store and catalog customers?
• What is the average high school GPA of students who graduate from college compared to those who do not?

Discovery
• What is the best predictor of sales?
• What are the best predictors of college graduation?

Need to complement RDBMS technology with a flexible, multidimensional view of data

Page 19: Introduction to  Data Warehousing
Page 20: Introduction to  Data Warehousing

The Functionality of OLAP

• Rotate and drill down

• Create and examine calculated data

• Determine comparative or relative differences.

• Perform exception and trend analysis.

• Perform advanced analytical functions
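A rough sketch of a few of these operations on a tiny in-memory fact list: summarising, drilling down, regrouping as an analogue of rotating, deriving calculated data, and flagging exceptions. The fact rows, the `aggregate` helper and the choice of margin as the calculated measure are all assumptions made for illustration, not a description of any particular OLAP product.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, region, revenue, expenses)
facts = [
    (2014, "Q1", "North", 100.0, 70.0),
    (2014, "Q2", "North", 120.0, 80.0),
    (2014, "Q1", "South", 90.0, 95.0),
    (2015, "Q1", "North", 150.0, 90.0),
]


def aggregate(rows, key):
    """Sum revenue and expenses per group; `key` picks the grouping level."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for row in rows:
        group = key(row)
        totals[group][0] += row[3]
        totals[group][1] += row[4]
    return {group: tuple(sums) for group, sums in totals.items()}


# Summary level: totals per year.
by_year = aggregate(facts, key=lambda r: r[0])

# Drill down: the same totals broken out by quarter within each year.
by_year_quarter = aggregate(facts, key=lambda r: (r[0], r[1]))

# Rotate (loosely): view the data region-first instead of time-first.
by_region_year = aggregate(facts, key=lambda r: (r[2], r[0]))

# Calculated data: margin = revenue - expenses for each group.
margin_by_region_year = {g: rev - exp for g, (rev, exp) in by_region_year.items()}

# Exception analysis: groups running at a loss.
loss_makers = [g for g, margin in margin_by_region_year.items() if margin < 0]
```

An OLAP server would typically precompute or index such aggregates rather than rescanning the facts on every request.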

Page 21: Introduction to  Data Warehousing

The star structure

[Star schema diagram: a central Facts table linked to the Time, Product, Region and Channel dimensions. Labels shown: Week, Year; Product, Model, Type, Color; Region, Nation, District, Dealer; Channel; Revenue, Expenses, Units.]
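One way a star structure like this might be laid out in SQL, sketched with Python's built-in sqlite3 module. The table names, key columns and sample rows are assumptions made for illustration (the slide gives only the labels), and the Channel dimension is omitted to keep the example short.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
-- Dimension tables: one row per product, region and week (hypothetical layout).
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, model TEXT, type TEXT, color TEXT);
CREATE TABLE dim_region  (region_id INTEGER PRIMARY KEY, nation TEXT, district TEXT, dealer TEXT);
CREATE TABLE dim_time    (week_id INTEGER PRIMARY KEY, year INTEGER);

-- Fact table: foreign keys to each dimension plus the numeric measures.
CREATE TABLE facts (
    week_id    INTEGER REFERENCES dim_time(week_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    region_id  INTEGER REFERENCES dim_region(region_id),
    revenue    REAL,
    expenses   REAL,
    units      INTEGER
);

INSERT INTO dim_product VALUES (1, 'M100', 'Sedan', 'Red'), (2, 'M200', 'Truck', 'Blue');
INSERT INTO dim_region  VALUES (1, 'US', 'West', 'Dealer A'), (2, 'US', 'East', 'Dealer B');
INSERT INTO dim_time    VALUES (201501, 2015), (201502, 2015);
INSERT INTO facts VALUES
    (201501, 1, 1, 1000.0, 600.0, 2),
    (201502, 1, 2, 500.0, 300.0, 1),
    (201502, 2, 1, 2000.0, 1500.0, 1);
""")

# A typical star join: measures from the fact table, grouped by dimension attributes.
revenue_by_type = db.execute("""
    SELECT p.type, t.year, SUM(f.revenue), SUM(f.units)
    FROM facts AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_time AS t ON t.week_id = f.week_id
    GROUP BY p.type, t.year
    ORDER BY p.type
""").fetchall()
```

Each query joins the fact table to only the dimensions it needs, which is what keeps analytical queries against a star schema comparatively simple.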

Page 22: Introduction to  Data Warehousing

Multidimensional Database Model

The data is found at the intersection of dimensions.

[Diagram labels: FINANCE (Store, Time); SALES (Store, Product, Time, Customer)]
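The idea that a value sits at the intersection of dimensions can be illustrated with a dictionary keyed by one member from each dimension. This is only a sketch: the cube contents, the `cell` and `slice_cube` helpers and the three-dimension layout are hypothetical.

```python
# Hypothetical SALES cube: one cell (units sold) per (store, product, month) intersection.
sales = {
    ("Store A", "Widget", "2015-01"): 10,
    ("Store A", "Gadget", "2015-01"): 4,
    ("Store B", "Widget", "2015-01"): 7,
    ("Store A", "Widget", "2015-02"): 12,
}

# Order of the dimensions inside each key tuple.
DIMENSIONS = ("store", "product", "time")


def cell(cube, store, product, time):
    """Return the value at one intersection, or None if the cell is empty."""
    return cube.get((store, product, time))


def slice_cube(cube, **fixed):
    """Keep only the cells whose coordinates match the fixed dimension values."""
    positions = {name: DIMENSIONS.index(name) for name in fixed}
    return {
        key: value
        for key, value in cube.items()
        if all(key[positions[name]] == wanted for name, wanted in fixed.items())
    }


widget_jan = cell(sales, "Store A", "Widget", "2015-01")
january = slice_cube(sales, time="2015-01")
january_total = sum(january.values())
```

Fixing one dimension, as `slice_cube(sales, time="2015-01")` does, leaves a smaller cube over the remaining dimensions.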

Page 23: Introduction to  Data Warehousing

Data Mining

Page 24: Introduction to  Data Warehousing

Data mining functions

• Associations
  – 85 percent of customers who buy a certain brand of wine also buy a certain type of pasta
• Sequential patterns
  – 32 percent of female customers who order a red jacket buy a gray skirt within six months
• Classifying
  – Frequent customers are those with incomes of about $50,000 and two or more children
• Clustering
  – Market segmentation
• Predicting
  – Predict the revenue value of a new customer based on that person’s demographic variables
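A statement such as "85 percent of customers who buy wine also buy pasta" is a confidence figure for an association rule. The sketch below shows how support and confidence could be computed from a list of baskets; the baskets and function names are made up for illustration, and real tools use far more efficient algorithms over much larger data.

```python
def support(baskets, items):
    """Fraction of all baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Of the baskets containing the antecedent, the fraction that also contain the consequent."""
    antecedent = set(antecedent)
    both = antecedent | set(consequent)
    with_antecedent = [b for b in baskets if antecedent <= b]
    if not with_antecedent:
        return 0.0
    return sum(both <= b for b in with_antecedent) / len(with_antecedent)


# Hypothetical shopping baskets.
baskets = [
    {"wine", "pasta"},
    {"wine", "pasta", "bread"},
    {"wine", "cheese"},
    {"pasta"},
    {"wine", "pasta"},
]

wine_pasta_support = support(baskets, {"wine", "pasta"})   # 3 of 5 baskets
wine_to_pasta = confidence(baskets, {"wine"}, {"pasta"})   # 3 of the 4 wine baskets
```

In this toy data the rule "wine implies pasta" holds for three of the four baskets that contain wine.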