data warehouse-final
TRANSCRIPT
![Page 1: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/1.jpg)
1
Data Warehousing
![Page 2: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/2.jpg)
2
OverviewPart 1: Data Warehouses Part 2: OLAPPart 3: Data MiningPart 4: Query Processing and
Optimization
![Page 3: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/3.jpg)
3
Part 1: Data Warehouses
![Page 4: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/4.jpg)
4
Data, Data everywhereyet ...
I can’t find the data I need data is scattered over the network many versions, subtle differences
I can’t get the data I need need an expert to get the data
I can’t understand the data I found available data poorly
documented I can’t use the data I found results are unexpected data needs to be transformed
from one form to other
![Page 5: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/5.jpg)
5
What is a Data Warehouse? A single, complete and
consistent store of data obtained from a variety of different sources made available to end users in a what they can understand and use in a business context.
Data Characteristics: Subject Oriented Integrated Non-Volatile Time-Referenced
![Page 6: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/6.jpg)
6
Which are our lowest/highest margin
customers ?Who are my customers
and what products are they buying?
Which customers are most likely to go to the competition ?
What impact will new products/services
have on revenue and margins?
What product prom--otions have the biggest
impact on revenue?
What is the most effective distribution
channel?
Why Data Warehousing?
![Page 7: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/7.jpg)
WHY BI AND WHAT IS IT
Information is the life-blood of modern organization
BI transform data stored in core business systems into knowledge,which in turn enables you to,
•Know your customers and let your customer know you
•Streamline your business processes by aligning technology with business goals
•Know how your business runs as a whole by getting 360 deg view of your organization
![Page 8: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/8.jpg)
BI Architecture
![Page 9: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/9.jpg)
9
Decision SupportUsed to manage and control businessData is historical or point-in-timeOptimized for inquiry rather than updateUse of the system is loosely defined and
can be ad-hocUsed by managers and end-users to
understand the business and make judgements
![Page 10: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/10.jpg)
10
What are the users saying...Data should be integrated
across the enterpriseSummary data had a real
value to the organizationHistorical data held the
key to understanding data over time
What-if capabilities are required
![Page 11: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/11.jpg)
11
Data Warehousing -- It is a process
Technique for assembling and managing data from various sources for the purpose of answering business questions. Thus making decisions that were not previous possible
A decision support database maintained separately from the organization’s operational database
![Page 12: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/12.jpg)
12
Traditional RDBMS used for OLTP Database Systems have been used
traditionally for OLTP clerical data processing tasks detailed, up to date data structured repetitive tasks read/update a few records isolation, recovery and integrity are critical
Will call these operational systems
![Page 13: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/13.jpg)
13
OLTP vs Data Warehouse
OLTP Application Oriented Used to run business Clerical User Detailed data Current up to date Isolated Data Repetitive access by
small transactions Read/Update access
Warehouse (DSS) Subject Oriented Used to analyze business Manager/Analyst Summarized and refined Snapshot data Integrated Data Ad-hoc access using large
queries Mostly read access (batch
update)
![Page 14: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/14.jpg)
14
Data Warehouse Architecture
RelationalDatabases
LegacyData
Purchased Data
Data Warehouse Engine
Optimized Loader
ExtractionCleansing
AnalyzeQuery
Metadata Repository
![Page 15: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/15.jpg)
15
From the Data Warehouse to Data Marts
DepartmentallyStructured
IndividuallyStructured
Data WarehouseOrganizationallyStructured
Less
More
HistoryNormalizedDetailed
Data
Information
![Page 16: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/16.jpg)
16
Users have different views of Data
Organizationallystructured
OLAP
Explorers: Seek out the unknown and previously unsuspected rewards hiding in the detailed data
Farmers: Harvest informationfrom known access paths
Tourists: Browse information harvestedby farmers
![Page 17: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/17.jpg)
17
Wal*Mart Case StudyFounded by Sam WaltonOne the largest Super Market Chains
in the US
Wal*Mart: 2000+ Retail Stores SAM's Clubs 100+Wholesalers Stores
![Page 18: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/18.jpg)
18
Old Retail ParadigmWal*Mart
Inventory Management Merchandise Accounts
Payable Purchasing Supplier Promotions:
National, Region, Store Level
Suppliers Accept Orders Promote Products Provide special
Incentives Monitor and Track The
Incentives Bill and Collect
Receivables Estimate Retailer
Demands
![Page 19: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/19.jpg)
19
New (Just-In-Time) Retail Paradigm No more deals Shelf-Pass Through (POS Application)
One Unit Price Suppliers paid once a week on ACTUAL items sold
Wal*Mart Manager Daily Inventory Restock Suppliers (sometimes SameDay) ship to Wal*Mart
Warehouse-Pass Through Stock some Large Items
Delivery may come from supplier Distribution Center
Supplier’s merchandise unloaded directly onto Wal*Mart Trucks
![Page 20: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/20.jpg)
20
Information as a Strategic WeaponDaily Summary of all Sales InformationRegional Analysis of all Stores in a logical
areaSpecific Product SalesSpecific Supplies SalesTrend Analysis, etc.Wal*Mart uses information when negotiating
with Suppliers Advertisers etc.
![Page 21: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/21.jpg)
21
Schema Design
Database organization must look like business must be recognizable by business user approachable by business user Must be simple
Schema Types Star Schema Fact Constellation Schema Snowflake schema
![Page 22: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/22.jpg)
22
Star SchemaA single fact table and for each
dimension one dimension table
T ime
prod
cust
city
fact
date, custno, prodno, cityname, sales
![Page 23: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/23.jpg)
23
Dimension Tables
Dimension tables Define business in terms already familiar to
users Wide rows with lots of descriptive text Small tables (about a million rows) Joined to fact table by a foreign key heavily indexed typical dimensions
time periods, geographic region (markets, cities), products, customers, salesperson, etc.
![Page 24: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/24.jpg)
24
Fact Table
Central table Typical example: individual sales
records mostly raw numeric items narrow rows, a few columns at most large number of rows (millions to a
billion) Access via dimensions
![Page 25: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/25.jpg)
25
Snowflake schemaRepresent dimensional hierarchy directly
by normalizing tables. Easy to maintain and saves storage
T ime
prod
cust
city
fact
date, custno, prodno, cityname, ...
region
![Page 26: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/26.jpg)
26
Fact Constellation
Fact Constellation Multiple fact tables that share many
dimension tables Booking and Checkout may share many
dimension tables in the hotel industryHotels
Travel Agents
Promotion
Room Type
Customer
Booking
Checkout
![Page 27: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/27.jpg)
27
Data Granularity in WarehouseSummarized data stored
reduce storage costs reduce cpu usage increases performance since smaller
number of records to be processed design around traditional high level
reporting needs tradeoff with volume of data to be stored
and detailed usage of data
![Page 28: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/28.jpg)
28
Granularity in WarehouseSolution is to have dual level of
granularity Store summary data on disks
95% of DSS processing done against this data
Store detail on tapes5% of DSS processing against this data
![Page 29: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/29.jpg)
29
Levels of Granularity
Operational
60 days ofactivity
account activity date amount teller location account bal
accountmonth # trans withdrawals deposits average bal
amountactivity date amount account bal
monthly accountregister -- up to 10 years
Not all fieldsneed be archived
Banking Example
![Page 30: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/30.jpg)
30
Data Integration Across Sources
Trust Credit cardSavings Loans
Same data different name
Different data Same name
Data found here nowhere else
Different keyssame data
![Page 31: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/31.jpg)
31
Data Transformation
Data transformation is the foundation for achieving single version of the truth
Major concern for ITData warehouse can fail if appropriate
data transformation strategy is not developed
Sequential Legacy Relational ExternalOperational/Source Data
Data Transformation
Accessing Capturing Extracting Householding FilteringReconciling Conditioning Loading Validating Scoring
![Page 32: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/32.jpg)
32
Data Transformation Exampleen
cod i
ngun
itfie
ld
appl A - balanceappl B - balappl C - currbalappl D - balcurr
appl A - pipeline - cmappl B - pipeline - inappl C - pipeline - feetappl D - pipeline - yds
appl A - m,fappl B - 1,0appl C - x,yappl D - male, female
Data Warehouse
![Page 33: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/33.jpg)
33
Data Integrity Problems Same person, different spellings
Agarwal, Agrawal, Aggarwal etc... Multiple ways to denote company name
Persistent Systems, PSPL, Persistent Pvt. LTD. Use of different names
mumbai, bombay Different account numbers generated by different
applications for the same customer Required fields left blank Invalid product codes collected at point of sale
manual entry leads to mistakes “in case of a problem use 9999999”
![Page 34: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/34.jpg)
34
When to Refresh?periodically (e.g., every night, every week)
or after significant eventson every update: not warranted unless
warehouse data require current data (up to the minute stock quotes)
refresh policy set by administrator based on user needs and traffic
possibly different policies for different sources
![Page 35: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/35.jpg)
35
Refresh techniquesIncremental techniques
detect changes on base tables: replication servers (e.g., Sybase, Oracle, IBM Data Propagator)snapshots (Oracle)transaction shipping (Sybase)
compute changes to derived and summary tables
maintain transactional correctness for incremental load
![Page 36: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/36.jpg)
36
How To Detect Changes
Create a snapshot log table to record ids of updated rows of source data and timestamp
Detect changes by: Defining after row triggers to update
snapshot log when source table changes Using regular transaction log to detect
changes to source data
![Page 37: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/37.jpg)
37
Querying Data WarehousesSQL ExtensionsMultidimensional modeling of data
OLAP More on OLAP later …
![Page 38: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/38.jpg)
38
Reporting ToolsAndyne Computing -- GQL Brio -- BrioQuery Business Objects -- Business Objects Cognos -- Impromptu Information Builders Inc. -- Focus for Windows Oracle -- Discoverer2000 Platinum Technology -- SQL*Assist, ProReports PowerSoft -- InfoMaker SAS Institute -- SAS/Assist Software AG -- Esperant Sterling Software -- VISION:Data
![Page 39: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/39.jpg)
39
Deploying Data WarehousesWhat business information
keeps you in business today? What business information can put you out of business tomorrow?
What business information should be a mouse click away?
What business conditions are the driving the need for business information?
![Page 40: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/40.jpg)
40
Cultural ConsiderationsNot just a technology projectNew way of using information
to support daily activities and decision making
Care must be taken to prepare organization for change
Must have organizational backing and support
![Page 41: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/41.jpg)
41
User TrainingUsers must have a higher level of IT
proficiency than for operational systems
Training to help users analyze data in the warehouse effectively
![Page 42: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/42.jpg)
42
Warehouse ProductsComputer Associates -- CA-Ingres Hewlett-Packard -- Allbase/SQL Informix -- Informix, Informix XPSMicrosoft -- SQL Server Oracle -- Oracle7, Oracle Parallel ServerRed Brick -- Red Brick Warehouse SAS Institute -- SAS Software AG -- ADABAS Sybase -- SQL Server, IQ, MPP
![Page 43: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/43.jpg)
43
Strengths of OLAPIt is a powerful visualization
toolIt provides fast, interactive
response timesIt is good for analyzing time
seriesIt can be useful to find
some clusters and outlinersMany vendors offer OLAP
tools
![Page 44: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/44.jpg)
44
Part 3: Data Mining
![Page 45: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/45.jpg)
45
Why Data Mining Credit ratings/targeted marketing:
Given a database of 100,000 names, which persons are the least likely to default on their credit cards?
Identify likely responders to sales promotions Fraud detection
Which types of transactions are likely to be fraudulent, given the demographics and transactional history of a particular customer?
Customer relationship management: Which of my customers are likely to be the most loyal, and
which are most likely to leave for a competitor? :Data Mining helps extract such information
![Page 46: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/46.jpg)
46
Data mining
Process of semi-automatically analyzing large databases to find interesting and useful patterns
Overlaps with machine learning, statistics, artificial intelligence and databases but more scalable in number of features and
instances more automated to handle heterogeneous
data
![Page 47: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/47.jpg)
47
Application Areas
Industry ApplicationFinance Credit Card AnalysisInsurance Claims, Fraud Analysis
Telecommunication Call record analysisTransport Logistics managementConsumer goods promotion analysisData Service providersValue added dataUtilities Power usage analysis
![Page 48: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/48.jpg)
48
Data Mining in UseThe US Government uses Data Mining to track
fraudA Supermarket becomes an information brokerBasketball teams use it to track game strategyCross SellingTarget MarketingHolding on to Good CustomersWeeding out Bad Customers
![Page 49: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/49.jpg)
49
Why Now?Data is being producedData is being warehousedThe computing power is availableThe computing power is affordableThe competitive pressures are strongCommercial products are available
![Page 50: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/50.jpg)
50
Data Mining works with Warehouse Data
Data Warehousing provides the Enterprise with a memory
Data Mining provides the Enterprise with intelligence
![Page 51: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/51.jpg)
51
Mining marketAround 20 to 30 mining tool vendorsMajor players:
Clementine, IBM’s Intelligent Miner, SGI’s MineSet, SAS’s Enterprise Miner.
All pretty much the same set of toolsMany embedded products: fraud detection,
electronic commerce applications
![Page 52: Data Warehouse-Final](https://reader033.vdocument.in/reader033/viewer/2022052606/5880296e1a28ab9f0f8b5027/html5/thumbnails/52.jpg)
52
OLAP Mining integrationOLAP (On Line Analytical Processing)
Fast interactive exploration of multidim. aggregates.
Heavy reliance on manual operations for analysis:
Tedious and error-prone on large multidimensional data
Ideal platform for vertical integration of mining but needs to be interactive instead of batch.