90300_633579030311875000
TRANSCRIPT
-
8/2/2019 90300_633579030311875000
1/18
Data Warehousing/Mining 1
Data Warehousing/Mining
Introduction
-
Outline of Lecture
Brief History of Data Warehousing
What is a Data Warehouse?
Need for Strategic Information
Information Crisis
Operational and Decision Support Systems
Difference Between a Standard DB and a Data Warehouse
-
Data Warehouse Evolution
[Figure: timeline of data warehouse evolution. Time axis: 1960, 1975, 1980, 1985, 1990, 1995, 2000. Eras: Prehistoric Times, Middle Ages, Data Revolution, Information-Based Management. Milestones shown: Relational Databases; PCs and Spreadsheets; End-user Interfaces; 1st DW Article; Data Replication Tools; "Building the DW", Inmon (1992); DW Confs.; Vendor DW Frameworks; Company DWs.]
-
Escalating Need For Strategic Information
Organizations need information to formulate business strategies, establish goals, and set objectives, e.g.:
Increase the customer base by 10% over the next 5 years
Gain market share by 15% in the next 2 years
Increase product quality levels in the top five product groups
-
The Information Crisis
Information is said to double every 18 months
Organizations have tons of data available
Then why is there an information crisis?
Why can't organizations convert the data into useful information for strategic decision making?
-
Problem: Heterogeneous Information Sources
Heterogeneities are everywhere
Different interfaces
Different data representations
Diverse structure of databases
Duplicate and inconsistent information
[Figure: example sources: Personal Databases, Digital Libraries, Scientific Databases, World Wide Web]
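The heterogeneities listed on this slide can be made concrete with a small sketch. It assumes two hypothetical sources, `crm_rows` and `billing_rows`, that describe the same customers with different field names, key casing, name layout, and date formats. All names and data here are invented for illustration and do not come from the lecture.

```python
from datetime import datetime

# Two hypothetical sources holding overlapping customer data in different shapes.
crm_rows = [
    {"cust_id": "C001", "name": "Ada Lovelace", "joined": "2019-08-02"},
    {"cust_id": "C002", "name": "Alan Turing", "joined": "2018-01-15"},
]
billing_rows = [
    {"CustomerID": "c001", "FullName": "LOVELACE, ADA", "Since": "02/08/2019"},
    {"CustomerID": "c003", "FullName": "HOPPER, GRACE", "Since": "09/12/2017"},
]

def from_crm(row):
    """Map a CRM row onto the common schema (ISO dates, 'First Last' names)."""
    return {
        "customer_id": row["cust_id"].upper(),
        "name": row["name"],
        "joined": datetime.strptime(row["joined"], "%Y-%m-%d").date(),
    }

def from_billing(row):
    """Map a billing row onto the common schema (day/month/year, 'LAST, FIRST')."""
    last, first = [part.strip().title() for part in row["FullName"].split(",")]
    return {
        "customer_id": row["CustomerID"].upper(),
        "name": f"{first} {last}",
        "joined": datetime.strptime(row["Since"], "%d/%m/%Y").date(),
    }

def integrate(crm, billing):
    """Normalize both sources and keep one record per customer (first source wins)."""
    merged = {}
    for record in [from_crm(r) for r in crm] + [from_billing(r) for r in billing]:
        merged.setdefault(record["customer_id"], record)
    return sorted(merged.values(), key=lambda r: r["customer_id"])

customers = integrate(crm_rows, billing_rows)
for customer in customers:
    print(customer["customer_id"], customer["name"], customer["joined"])
```

The "first source wins" rule is only one possible way to resolve duplicates. A real integration step would need an explicit policy for conflicting values.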
-
About Some Definitions
What is data?
What is information?
What is a warehouse?
-
What is a Data Warehouse? A Practitioner's Viewpoint
A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use it in a business context.
-- Barry Devlin, IBM Consultant
-
A Data Warehouse is...
Stored collection of diverse data
  A solution to the data integration problem
  Single repository of information
Subject-oriented
  Organized by subject, not by application
  Used for analysis, data mining, etc.
Large volume of data (GB, TB)
Non-volatile
  Historical
  Time attributes are important
-
A Data Warehouse is... (continued)
Updates infrequent
Examples
  All transactions EVER at WalMart
  Complete client histories at an insurance firm
  Stockbroker financial information and portfolios
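The properties named on these two slides (subject-oriented, non-volatile, historical, time attributes) can be illustrated with a minimal sketch. `SaleFact` and `SalesSubjectArea` are hypothetical names chosen for this example. The store only supports loading and reading, and every fact carries a date.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)  # frozen: a loaded fact cannot be modified afterwards
class SaleFact:
    sale_date: date      # time attribute carried by every fact
    product: str
    store: str
    amount: float

class SalesSubjectArea:
    """Facts about one subject (sales), whichever application recorded them."""

    def __init__(self):
        self._facts = []

    def load(self, facts):
        # Non-volatile: history is only appended, never updated or deleted.
        self._facts.extend(facts)

    def total_by_year(self):
        # Historical analysis relies on the time attribute of each fact.
        totals = {}
        for fact in self._facts:
            year = fact.sale_date.year
            totals[year] = totals.get(year, 0.0) + fact.amount
        return totals

    def __len__(self):
        return len(self._facts)

sales = SalesSubjectArea()
sales.load([
    SaleFact(date(2018, 3, 1), "widget", "store-1", 10.0),
    SaleFact(date(2018, 7, 9), "gadget", "store-2", 5.0),
])
sales.load([SaleFact(date(2019, 1, 20), "widget", "store-1", 20.0)])
yearly = sales.total_by_year()
print(yearly)
```

This is a toy in-memory structure. It only shows the access pattern (append, then read), not how a warehouse DBMS stores data.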
-
Summary
[Figure: Operational Systems -> Data Warehouse Population -> Data Warehouse -> Business Information Interface]
-
What Are Operational and Decision Support Systems?
Operational Systems
Making the wheels of Business Turn
Take an order
Process a claim
Make shipment
Generate an invoice
Receive cash
Reserve an airline seat
-
What Are Operational and Decision Support Systems? (Contd)
Decision Support System
Watching the wheels of business turn
Show the top selling products
Show the problem regions
Tell me why (drill down)
Let me see other data (drill across)
Alert me when a district sells below target
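Several of the decision support requests on this slide can be sketched as queries. The example uses Python's built-in `sqlite3` module. The `sales` and `targets` tables, their columns, and all figures are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, district TEXT, product TEXT, amount INTEGER);
CREATE TABLE targets (district TEXT PRIMARY KEY, target INTEGER);
INSERT INTO sales VALUES
  ('North', 'N1', 'widget', 120),
  ('North', 'N2', 'widget', 30),
  ('North', 'N2', 'gadget', 20),
  ('South', 'S1', 'gadget', 200),
  ('South', 'S1', 'widget', 50);
INSERT INTO targets VALUES ('N1', 100), ('N2', 80), ('S1', 150);
""")

# "Show the top selling products"
top_products = conn.execute(
    "SELECT product, SUM(amount) AS total FROM sales "
    "GROUP BY product ORDER BY total DESC"
).fetchall()

# "Show the problem regions": weakest region first
regions = conn.execute(
    "SELECT region, SUM(amount) AS total FROM sales "
    "GROUP BY region ORDER BY total ASC"
).fetchall()

# "Tell me why (drill down)": break the weakest region down by district
weakest_region = regions[0][0]
drill_down = conn.execute(
    "SELECT district, SUM(amount) AS total FROM sales "
    "WHERE region = ? GROUP BY district ORDER BY district",
    (weakest_region,),
).fetchall()

# "Alert me when a district sells below target"
below_target = conn.execute(
    "SELECT s.district, SUM(s.amount) AS total, t.target "
    "FROM sales s JOIN targets t ON t.district = s.district "
    "GROUP BY s.district, t.target "
    "HAVING SUM(s.amount) < t.target"
).fetchall()

print(top_products, regions, drill_down, below_target, sep="\n")
```

All of these are read-only aggregate queries over accumulated data, in contrast to the single-record operations listed for operational systems.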
-
Difference
                   Operational                  Informational
Data Content       Current values               Archived, derived, optimized
Data Structure     Optimized for transactions   Optimized for complex queries
Access Frequency   High                         Medium to low
Access Type        Read, update, delete         Read
Usage              Predictable, repetitive      Ad hoc, random, heuristic
Response Time      Sub-seconds                  Several seconds to minutes
Users              Large number                 Relatively small number
-
Warehouse is a Specialized DB
Standard DB
  Mostly updates
  Many small transactions
  MB - GB of data
  Current snapshot
  Index/hash on p.k.
  Raw data
  Thousands of users (e.g., clerical users)
Warehouse
  Mostly reads
  Queries are long and complex
  GB - TB of data
  History
  Lots of scans
  Summarized, reconciled data
  Hundreds of users (e.g., decision-makers, analysts)
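The contrast between "index/hash on p.k." and "lots of scans" can be observed directly in a small database. This sketch uses Python's built-in `sqlite3` module and its `EXPLAIN QUERY PLAN` statement. The `orders` table and its contents are hypothetical, and the exact wording of the plan text depends on the SQLite version, so the example prints it rather than assuming it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, year INTEGER, status TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO orders (order_id, year, status, amount) VALUES (?, ?, ?, ?)",
    [(i, 2015 + i % 4, "open", 10 * i) for i in range(1, 101)],
)

def plan(sql, params=()):
    """Return SQLite's query plan for a statement as one line of text."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the plan detail

# Standard-DB style: touch one row, located through the primary key.
lookup_sql = "SELECT status FROM orders WHERE order_id = ?"
update_sql = "UPDATE orders SET status = 'shipped' WHERE order_id = ?"

# Warehouse style: read every row to produce a summary.
summary_sql = "SELECT year, SUM(amount) FROM orders GROUP BY year ORDER BY year"

lookup_plan = plan(lookup_sql, (42,))
summary_plan = plan(summary_sql)

conn.execute(update_sql, (42,))                        # small transaction
yearly_totals = conn.execute(summary_sql).fetchall()   # long, scanning query

print("lookup :", lookup_plan)
print("summary:", summary_plan)
print(yearly_totals)
```

On a table of 100 rows both statements are instant. The difference in access path is what matters at the GB to TB sizes the slide mentions.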
-
Warehousing and Industry
Warehousing is big business
  $2 billion in 1995
  $3.5 billion in early 1997
  About $8 billion in 1998 [Metagroup]
WalMart has largest warehouse
  900-CPU, 2,700 disk, 23 TB Teradata system
  ~7 TB in warehouse
  40-50 GB per day
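As a rough reading of these figures, the short calculation below derives the annual growth rate implied by the 1995 and 1998 market sizes, and how many days of loading at the quoted daily rate would add up to the quoted warehouse size. It assumes decimal units (1 TB = 1000 GB) and treats the slide's approximate numbers as exact, so the results are only indicative.

```python
# Market size figures from the slide, in billions of US dollars.
market_1995 = 2.0
market_1998 = 8.0
years = 1998 - 1995

# Compound annual growth rate implied by going from $2B to $8B in 3 years.
cagr = (market_1998 / market_1995) ** (1 / years) - 1

# WalMart figures from the slide. Assumption: 1 TB = 1000 GB (decimal units).
warehouse_gb = 7 * 1000
days_at_low_rate = warehouse_gb / 40   # loading 40 GB per day
days_at_high_rate = warehouse_gb / 50  # loading 50 GB per day

print(f"Implied market growth: {cagr:.1%} per year")
print(f"Days of loading equal to 7 TB: "
      f"{days_at_high_rate:.0f} to {days_at_low_rate:.0f}")
```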
-
Data Warehousing: Two Distinct Issues
(1) How to get information into warehouse
  "Data warehousing"
(2) What to do with data once it's in warehouse
  "Warehouse DBMS"
Both rich research areas
Industry has focused on (2)
-
Thank You Very Much