TRANSCRIPT
The Data Warehouse
This is where the data lives
2
Agenda
What is a Warehouse?
Data Warehouse Architecture
Data Storage
Data Transformation
DBMS in Data Warehousing
Building a Successful Data Warehouse
WHAT IS A
DATA WAREHOUSE?
4
Peter Drucker believes ...
The two most important people in the 21st Century will be the CFO (managing the Cash Flow) and the CIO (managing the Information Flow)
5
Data, data everywhere, yet ...
I can't find the data I need: data is scattered over the network; many versions, subtle differences
I can't get the data I need: need an expert to get the data
I can't understand the data I found: available data poorly documented
I can't use the data I found: results are unexpected; data needs to be transformed from one form to another
6
Which are our lowest/highest margin customers?
Who are my customers and what products are they buying?
Which customers are most likely to go to the competition?
What impact will new products/services have on revenue and margins?
What product promotions have the biggest impact on revenue?
What is the most effective distribution channel?
Why Data Warehousing?
7
What are the users saying...
Data should be integrated across the enterprise
Summary data had a real value to the organization
Historical data held the key to understanding data over time
What-if capabilities are required
8
What is Data Warehousing?
A process of transforming data into information and making it available to users in a timely enough manner to make a difference
[Forrester Research, April 1996]
Data
Information
9
What is a Data Warehouse?
A single, complete and consistent store of data, obtained from a variety of different sources, made available to end users in a way they can understand and use in a business context.
[Barry Devlin]
10
Data Warehousing -- It is a process
A technique for assembling and managing data from various sources for the purpose of answering business questions, thus making decisions that were not previously possible
A decision support database maintained separately from the organization's operational database
11
Data Warehouse
A data warehouse is a subject-oriented, integrated, time-varying, non-volatile collection of data that is used primarily in organizational decision making.
-- Bill Inmon, Building the Data Warehouse, 1996
12
Traditional RDBMS used for OLTP
Database systems have been used traditionally for OLTP: clerical data processing tasks; detailed, up-to-date data; structured repetitive tasks; read/update a few records; isolation, recovery and integrity are critical
Will call these operational systems
13
Operational Systems
Run the business in real time
Based on up-to-the-second data
Optimized to handle large numbers of simple read/write transactions
Optimized for fast response to predefined transactions
Used by people who deal with customers, products -- clerks, salespeople etc.
They are increasingly used by customers
14
Examples of Operational Data
| Data | Industry | Usage | Technology | Volumes |
| --- | --- | --- | --- | --- |
| Customer File | All | Track customer details | Legacy application, flat files, mainframes | Small-medium |
| Account Balance | Finance | Control account activities | Legacy applications, hierarchical databases, mainframe | Large |
| Point-of-Sale data | Retail | Generate bills, manage stock | Client/server, relational databases | Very large |
| Call Record | Telecommunications | Billing | Legacy application, hierarchical database, mainframe | Very large |
| Production Record | Manufacturing | Control production | New applications, relational databases, AS/400 | Medium |
15
Application-Orientation vs. Subject-Orientation
Application-Orientation: the operational database is organized around applications -- loans, credit card, trust, savings
Subject-Orientation: the data warehouse is organized around subjects -- customer, vendor, product, activity
16
OLTP vs. Data Warehouse
OLTP systems are tuned for known transactions and workloads while workload is not known a priori in a data warehouse
Special data organization, access methods and implementation methods are needed to support data warehouse queries (typically multidimensional queries)
e.g., average amount spent on phone calls between 9AM-5PM in California during the month of December
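The example query above can be sketched in SQL. A minimal, self-contained sketch using Python's sqlite3 with an invented `calls` fact table (all names and data are illustrative assumptions, not from the slides):

```python
import sqlite3

# Hypothetical calls fact table; schema and data are illustrative only.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE calls (state TEXT, month TEXT, hour INTEGER, amount REAL)")
con.executemany("INSERT INTO calls VALUES (?, ?, ?, ?)", [
    ("CA", "Dec", 10, 5.0),
    ("CA", "Dec", 14, 7.0),
    ("CA", "Dec", 20, 9.0),   # outside 9AM-5PM, excluded
    ("NY", "Dec", 11, 4.0),   # wrong state, excluded
])
# Average amount spent on calls between 9AM and 5PM in California in December
avg = con.execute("""
    SELECT AVG(amount) FROM calls
    WHERE state = 'CA' AND month = 'Dec' AND hour BETWEEN 9 AND 17
""").fetchone()[0]
print(avg)  # 6.0
```

Such a query slices the fact data along several dimensions (geography, time of day, month) at once, which is exactly the multidimensional access pattern the slide describes.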
17
OLTP vs. Data Warehouse
Complex Data Warehouse queries would degrade performance of operational DBMS
Data Warehouse requires historical data; not typically maintained by operational databases
Decision support requires consolidation (aggregation, summarization) of data from heterogeneous sources: operational DBMS, external sources, legacy systems
Different sources typically use different representations, code and format which have to be reconciled
18
OLTP vs Data Warehouse
| OLTP | Data Warehouse (DSS) |
| --- | --- |
| Application oriented | Subject oriented |
| Used to run the business | Used to analyze the business |
| Detailed data | Summarized and refined |
| Current, up to date | Snapshot data |
| Isolated data | Integrated data |
| Repetitive access | Ad-hoc access |
| Clerical user | Knowledge user (manager) |
19
OLTP vs Data Warehouse
| OLTP | Data Warehouse |
| --- | --- |
| Performance sensitive | Performance relaxed |
| Few records accessed at a time (tens) | Large volumes accessed at a time (millions) |
| Read/update access | Mostly read (batch update) |
| No data redundancy | Redundancy present |
| Database size 100 MB - 100 GB | Database size 100 GB - a few terabytes |
20
OLTP vs Data Warehouse
| OLTP | Data Warehouse |
| --- | --- |
| Transaction throughput is the performance metric | Query throughput is the performance metric |
| Thousands of users | Hundreds of users |
| Managed in entirety | Managed by subsets |
21
Criteria for Selecting a Warehouse
load/index time
query response time
database size requirements/limitations
quality
ratio of raw data size to full database size (including indices, temp space, etc.)
parallel capabilities
price
company DBMS standardization policy
DATA WAREHOUSE
ARCHITECTURE
23
Data Warehouse Architecture - Process View
[Diagram: relational databases, legacy data, and purchased data feed extraction/cleansing and an optimized loader into the data warehouse engine; analyze/query tools read from it; a metadata repository describes the flow.]
24
Data Warehouse Architecture - User View
[Diagram: getting data in (IT users) -- operational data stores, data transformation, enterprise warehouse management, replication & propagation; the heart of the data warehouse -- data marts / departmental warehouses; getting information out (business users) -- knowledge discovery/data mining and information access tools.]
25
Heart of the Data Warehouse
Heart of the data warehouse is the data itself!
Single version of the truth
Corporate memory
Data is organized in a way that represents the business -- subject orientation
26
Data Warehouse Structure
Subject Orientation -- customer, product, policy, account etc... A subject may be implemented as a set of related tables. E.g., customer may be five tables
27
Data Warehouse Structure
base customer (1985-87): custid, from date, to date, name, phone, dob
base customer (1988-90): custid, from date, to date, name, credit rating, employer
customer activity (1986-89) -- monthly summary
customer activity detail (1987-89): custid, activity date, amount, clerk id, order no
customer activity detail (1990-91): custid, activity date, amount, line item no, order no
Time is part of the key of each table
28
Data Warehouse Structure
[Diagram: Base Customer (1985-87), Base Customer (1988-90), Cust Activity (1986-89), Cust Activity Detail (1987-89), Cust Activity Detail (1990-91) stored across different media.]
Subject data may contain different data on different media
Can also use optical disks
29
Schema Design
Database organization must look like the business: recognizable and approachable by the business user, and simple
Schema types: Star Schema, Fact Constellation Schema, Snowflake Schema
30
Dimension Tables
Dimension tables:
Define the business in terms already familiar to users
Wide rows with lots of descriptive text
Small tables (about a million rows)
Joined to the fact table by a foreign key
Heavily indexed
Typical dimensions: time periods, geographic region (markets, cities), products, customers, salesperson, etc.
31
Fact Table
Central table
Mostly raw numeric items
Narrow rows, a few columns at most
Large number of rows (millions to a billion)
Access via dimensions
32
Star Schema
A single fact table and for each dimension one dimension table
Does not capture hierarchies directly
[Diagram: star schema -- time, product, customer, and city dimension tables joined to a central fact table keyed by date, custno, prodno, cityname, ...]
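A minimal sketch of a star schema and a typical query against it, using Python's sqlite3; the table and column names here are invented for illustration:

```python
import sqlite3

# Star-schema sketch: one fact table joined to dimension tables by
# foreign keys. Table and column names are illustrative assumptions.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE time_dim (date_key INTEGER PRIMARY KEY, month TEXT);
    CREATE TABLE prod_dim (prod_key INTEGER PRIMARY KEY, prod_name TEXT);
    CREATE TABLE fact (date_key INTEGER, prod_key INTEGER, amount REAL);
    INSERT INTO time_dim VALUES (1, 'Jan'), (2, 'Feb');
    INSERT INTO prod_dim VALUES (10, 'widget'), (11, 'gadget');
    INSERT INTO fact VALUES (1, 10, 100.0), (1, 11, 50.0), (2, 10, 75.0);
""")
# Typical star query: constrain the dimensions, aggregate the fact
rows = con.execute("""
    SELECT p.prod_name, SUM(f.amount)
    FROM fact f
    JOIN time_dim t ON f.date_key = t.date_key
    JOIN prod_dim p ON f.prod_key = p.prod_key
    WHERE t.month = 'Jan'
    GROUP BY p.prod_name
    ORDER BY p.prod_name
""").fetchall()
print(rows)  # [('gadget', 50.0), ('widget', 100.0)]
```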
33
Snowflake schema
Represent dimensional hierarchy directly by normalizing tables.
Easy to maintain and saves storage
[Diagram: snowflake schema -- the same star with the city dimension normalized into a separate region table.]
34
Fact Constellation
Fact Constellation: multiple fact tables that share many dimension tables
Booking and Checkout may share many dimension tables in the hotel industry
[Diagram: Booking and Checkout fact tables sharing the Hotels, Travel Agents, Promotion, Room Type, and Customer dimensions.]
DATA STORAGE
IN DATA WAREHOUSE
36
Data Granularity in Warehouse
Storing summarized data:
reduces storage costs
reduces CPU usage
increases performance, since a smaller number of records has to be processed
design around traditional high-level reporting needs
tradeoff with the volume of data to be stored and detailed usage of the data
37
Granularity in Warehouse
Cannot answer some questions with summarized data
Did Ashish call Vivek last month? Not possible to answer if only the total duration of calls by Ashish over a month is maintained and individual call details are not.
Detailed data is too voluminous
38
Granularity in Warehouse
The tradeoff is to have dual levels of granularity
Store summary data on disks: 95% of DSS processing is done against this data
Store detail on tapes: 5% of DSS processing is against this data
39
Estimates of Data Volume
To determine the appropriate level (dual or single) of granularity, we need to estimate the disk space requirements
For each known table:
Get upper and lower bounds on the size of a row
Estimate the max and min number of rows in the table for the 1-year horizon and the 5-year horizon
Calculate space for indexes for the max and min number of rows
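The estimation procedure above can be sketched as a small calculation. The row sizes, row counts, and 30% index overhead below are illustrative assumptions, not figures from the slides:

```python
# Rough sizing sketch following the procedure above: bound row sizes,
# bound row counts over a horizon, add index overhead.
def estimate_space(row_min, row_max, rows_min, rows_max, index_factor=0.3):
    """Return (lower, upper) byte estimates including index space."""
    low = row_min * rows_min * (1 + index_factor)
    high = row_max * rows_max * (1 + index_factor)
    return low, high

# Hypothetical table: 100-200 byte rows, 1M-10M rows on the horizon
low, high = estimate_space(row_min=100, row_max=200,
                           rows_min=1_000_000, rows_max=10_000_000)
print(low, high)
```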
40
Dual level of granularity
| # Rows (1-year horizon) | Design implication |
| --- | --- |
| 10,000 | Any design will do |
| 100,000 | Careful design |
| 1,000,000 | Dual levels of granularity |
| 10,000,000 | Dual levels of granularity and careful design |

| # Rows (5-year horizon) | Design implication |
| --- | --- |
| 100,000 | Any design will do |
| 1,000,000 | Careful design |
| 10,000,000 | Dual levels of granularity |
| 20,000,000 | Dual levels of granularity and careful design |
41
Dual Level of Granularity
On the five-year horizon, the thresholds shift by an order of magnitude. Reasons:
More expertise will be available in managing the data warehouse volumes of data
Hardware costs will have dropped
More powerful software tools will be available
End users will be more sophisticated
The actual record size is not that important, since the size of the indexes determines the above
42
What should be granularity level?
The starting point for deciding the level of granularity is based on the previous estimates and some educated guesses
The initial guess is refined through an iterative analysis
[Diagram: the data warehouse developer iterates through design and populate steps; DSS analysts produce reports and analysis that feed back into the design.]
43
How to control granularity?
Summarize data from the source as it goes into the target
Average data as it goes into the target
Push highest/lowest set values into the target
Push only data that is needed at the target
Push only a subset of rows, based on some conditions
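The controls listed above can be sketched over a toy list of call records (field names and data are assumptions for illustration):

```python
# Toy call records; in practice these would stream from the source system.
calls = [
    {"cust": "A", "minutes": 10},
    {"cust": "A", "minutes": 30},
    {"cust": "B", "minutes": 5},
]

# 1. Summarize data as it goes into the target
total = {}
for c in calls:
    total[c["cust"]] = total.get(c["cust"], 0) + c["minutes"]

# 2. Average data as it goes into the target
counts = {}
for c in calls:
    counts[c["cust"]] = counts.get(c["cust"], 0) + 1
average = {k: total[k] / counts[k] for k in total}

# 3. Push only the highest value per customer into the target
highest = {}
for c in calls:
    highest[c["cust"]] = max(highest.get(c["cust"], 0), c["minutes"])

# 4. Push only a subset of rows based on a condition
long_calls = [c for c in calls if c["minutes"] >= 10]

print(total, average, highest, len(long_calls))
```

Each strategy reduces the volume stored in the target at the cost of some detail, which is the granularity tradeoff discussed above.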
44
Levels of Granularity
Banking example:
Operational (60 days of activity): account, activity date, amount, teller, location, account balance
Lightly summarized: account, month, # trans, withdrawals, deposits, average balance
Archival (monthly account register -- up to 10 years): activity date, amount, account balance; not all fields need be archived
45
Partitioning
Breaking data into several physical units that can be handled separately
Not a question of whether to do it in data warehouses but how to do it
Granularity and partitioning are key to effective implementation of a warehouse
46
Why Partitioning?
Flexibility in managing data
Smaller physical units allow:
easy restructuring
free indexing
sequential scans if needed
easy reorganization
easy recovery
easy monitoring
47
Criterion for Partitioning
Typically partitioned by date line of business geography organizational unit any combination of above
48
Partitioning Example
An insurance company may partition its data as follows: 1995 medical claims 1995 life claims 1996 medical claims 1996 life claims
49
Where to Partition?
Application level or DBMS level?
It makes sense to partition at the application level:
Allows a different definition for each year -- important since the warehouse spans many years and the definition changes as the business evolves
Allows data to be moved between processing complexes easily
50
Denormalization
Normalization in a data warehouse may lead to lots of small tables
Can lead to excessive I/O’s since many tables have to be accessed
Denormalization is the answer especially since updates are rare
51
Denormalization
Create Arrays
Selective Redundancy
Derived Data
52
Creating Arrays
Many times, each occurrence of a sequence of data is in a different physical location
It is beneficial to collect all occurrences together and store them as an array in a single row
Makes sense only if there is a stable number of occurrences which are accessed together
In a data warehouse, such situations arise naturally due to the time-based orientation: can create an array by month
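The "array by month" idea can be sketched as follows; the account and balance fields are invented for illustration:

```python
# Twelve monthly balance rows per account, collapsed into one row
# holding a 12-element array. Field names are assumptions.
monthly_rows = [
    {"acct": 1, "month": m, "balance": 100 + m} for m in range(1, 13)
]

collapsed = {}
for r in monthly_rows:
    collapsed.setdefault(r["acct"], [None] * 12)
    collapsed[r["acct"]][r["month"] - 1] = r["balance"]

print(collapsed[1])  # twelve balances in month order, one row per account
```

One row read now retrieves all twelve occurrences that would otherwise need twelve physical accesses, which is the point of the technique.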
53
Selective Redundancy
Description of an item can be stored redundantly with order table -- most often item description is also accessed with order table
Updates have to be handled carefully
54
Vertical Partitioning
[Diagram: a wide account table (acctno, balance, address, date opened, ...) split vertically into a frequently accessed table (acctno, balance) and a rarely accessed table (acctno, address, date opened, ...). The smaller table means less I/O.]
55
Derived Data
Introduction of derived (calculated data) may often help
Have seen this in the context of dual levels of granularity
Can keep auxiliary views and indexes to speed up query processing
DATA TRANSFORMATION
57
Loading the Warehouse
Load is a crucial component for the success of a warehouse project
Issues:
Sources of data for the warehouse
Data quality at the sources
Data transformation
How to propagate updates (on the sources) to the warehouse
Terabytes of data to be loaded
58
Source Data
Typically host-based, legacy applications: customized applications, COBOL, 3GL, 4GL
Point of contact devices: POS, ATM, call switches
External sources: vendors, delivery partners, agents
[Diagram: operational/source data -- sequential, legacy, relational, external.]
59
Data Quality - The Reality
It is tempting to think that all there is to creating a data warehouse is extracting operational data and entering it into the warehouse
Nothing could be farther from the truth
Warehouse data comes from disparate, questionable sources
60
Data Quality - The Reality
Legacy systems no longer documented
Outside sources with questionable quality procedures
Production systems with no built-in integrity checks and no integration
Operational systems are usually designed to solve a specific business problem and are rarely developed to a corporate plan
"... And get it done quickly, we do not have time to worry about corporate standards..."
61
Data Transformation
Data transformation is the foundation for achieving single version of the truth
Major concern for IT
The data warehouse can fail if an appropriate data transformation strategy is not developed
[Diagram: operational/source data (sequential, legacy, relational, external) passes through data transformation: accessing, capturing, extracting, householding, filtering, reconciling, conditioning, loading, validating, scoring.]
62
Data Integration Across Sources
[Diagram: Trust, Credit Card, Savings, and Loans systems feeding the warehouse, illustrating: same data, different name; different data, same name; data found in one source and nowhere else; different keys for the same data.]
63
Data Transformation Example
Field: appl A -- balance; appl B -- bal; appl C -- currbal; appl D -- balcurr
Unit: appl A -- pipeline -- cm; appl B -- pipeline -- in; appl C -- pipeline -- feet; appl D -- pipeline -- yds
Encoding: appl A -- m, f; appl B -- 1, 0; appl C -- x, y; appl D -- male, female
[Diagram: all four applications feed the Data Warehouse with one reconciled representation.]
64
Data Integrity Problems
Same person, different spellings: Rajiv, Rajeev, Raju...
Multiple ways to denote a company name: Novatek Systems, NISL, Novatek Pvt. Ltd.
Use of different names: Mumbai/Bombay, Bill/William
Different account numbers generated by different applications for the same customer
Required fields left blank
Invalid product codes collected at point of sale: manual entry leads to mistakes ("in case of a problem use 9999999")
65
Data Transformation Terms
Extracting
Conditioning
Scrubbing
Merging
Householding
Enrichment
Scoring
Loading
Validating
Delta Updating
66
Data Transformation Terms
Extracting: capture of data from the operational source in "as is" status. Sources are generally legacy mainframes in VSAM, IMS, IDMS, DB2; more data today is in relational databases on Unix
Conditioning: the conversion of data types from the source to the target data store (warehouse) -- always a relational database
67
Data Transformation Terms
Scrubbing: ensuring all data meets the input validation rules which should have been in place when the data was captured by the operational system, e.g., null values for data declared not null, numerics in non-numeric fields, proper zip codes, etc.
Merging: bringing together data from operational sources; choosing information from each functional system to populate the single occurrence of the data item in the warehouse
68
Data Transformation Terms
Householding: identifying all members of a household (living at the same address)
Ensures only one mailing is sent to a household
Can result in substantial savings: 1 million catalogues at Rs. 50 each cost Rs. 50 million; a 2% saving would save Rs. 1 million
69
Data Transformation Terms
Enrichment: bringing data from external sources to augment/enrich operational data (e.g., foreign currency fluctuations over a period). Data sources include newspaper groups, consulting firms, etc.
Scoring: computation of the probability of an event, e.g., the chance that a customer will defect to AT&T from MCI, or that a customer is likely to buy a new product
70
Data Transformation Terms
Loading: placing data into the warehouse -- accomplished using a load utility provided by database vendors
Validating: the process of ensuring that the captured data is accurate and the transformation process is correct
71
Data Transformation Terms
Delta Updating: propagation of changes to the source since the last extraction; loads smaller subsets into the data warehouse
Metadata: the data dictionary for the warehouse
72
Loads
After extracting, scrubbing, cleaning, validating etc., the data needs to be loaded into the warehouse
Issues:
huge volumes of data to be loaded
small time window available when the warehouse can be taken offline (usually nights)
when to build indexes and summary tables
allow system administrators to monitor, cancel, resume, and change load rates
recover gracefully -- restart after failure from where you were, without loss of data integrity
73
Load Techniques
Use SQL to append or insert new data: the record-at-a-time interface will lead to random disk I/Os
Use batch load utility
74
Batch Load Utility
Sort input records on the clustering key: sequential I/O is significantly faster than random I/O
Single-pass load: perform all transformations, scrub, clean, validate, aggregate etc., and build indexes and create summary tables at the same time
Sequential loads may take long (loading a TB warehouse may take ~100 days)
Exploit I/O parallelism to load data at acceptable rates
Leverage knowledge of data warehouse schemas
75
Load Taxonomy
Incremental versus full loads
Online versus offline loads
76
Incremental Load
A full load is too disruptive and not required if the updates since the last load can be identified easily
Can use an incremental load to reduce the data actually loaded: insert only the updated tuples
77
Online Load
Online full loads can load a new table while queries on the old table continue, if there is enough disk space
Online incremental loads conflict with queries:
break into shorter transactions (every 1000 records or every so many seconds)
coordinate this sequence of transactions: must ensure consistency between base and derived tables and indices
78
Refresh
Propagate updates on source data to the warehouse
Issues: when to refresh how to refresh -- refresh techniques
79
When to Refresh?
periodically (e.g., every night, every week) or after significant events
on every update: not warranted unless warehouse data require current data (up to the minute stock quotes)
refresh policy set by administrator based on user needs and traffic
possibly different policies for different sources
80
Refresh Techniques
Full extract from base tables: read the entire source table -- too expensive, but maybe the only choice for legacy systems
81
Refresh techniques
Incremental techniques:
detect changes on base tables: replication servers (e.g., Sybase, Oracle, IBM Data Propagator), snapshots (Oracle), transaction shipping (Sybase)
compute changes to derived and summary tables
maintain transactional correctness for incremental load
82
How To Detect Changes
Create a snapshot log table to record the ids of updated rows of source data and a timestamp
Detect changes by:
defining AFTER ROW triggers to update the snapshot log when the source table changes
using the regular transaction log to detect changes to the source data
DBMS
FOR DATA WAREHOUSING
84
Relational DBMS
Features that support DSS:
specialized indexing techniques
specialized join and scan methods
data partitioning and use of parallelism
complex query processing
intelligent aggregate processing
extensions to SQL and their processing
85
Indexing Techniques
Bitmap index: a collection of bitmaps -- one for each distinct value of the column
Each bitmap has N bits, where N is the number of rows in the table
The bit corresponding to value v for row r is set if and only if r has the value v for the indexed attribute
86
Bitmap Index
Example query: select * from customer where gender = 'F' and vote = 'Y'
[Figure: a customer table with a gender column (M, F, F, F, F, M) and a vote column (Y, Y, Y, N, N, N); one bitmap per distinct value, and the query is answered by ANDing the gender = 'F' and vote = 'Y' bitmaps.]
87
Bitmap Indexing
Bit arithmetic (AND/OR) is fast
Space occupied depends on the cardinality of the domains: not good if the indexed attribute has too many distinct values
Can be compressed effectively (e.g., using run-length encoding)
Products that support bitmaps: Model 204, TargetIndex (Red Brick), IQ (Sybase), Oracle 7.3
88
Join Indexes
Pre-computed joins
A join index between a fact table and a dimension table correlates a dimension tuple with the fact tuples that have the same value on the common dimensional attribute
e.g., a join index on the city dimension of the calls fact table correlates, for each city, the calls (in the calls table) that originated from that city
89
Join Indexes
Join indexes can also span multiple dimension tables, e.g., a join index on the city and time dimensions of the calls fact table
90
Star Join Processing
Use join indexes to join dimension and fact table
[Diagram: the Calls fact table joined successively with the Time, Location, and Plan dimensions: Calls, C+T, C+T+L, C+T+L+P.]
91
Optimized Star Join Processing
[Diagram: apply selections to the Time, Location, and Plan dimensions first, form their virtual cross product, then join it with the Calls fact table.]
92
Bitmapped Join Processing
[Diagram: selections on Time, Location, and Plan each yield a bitmap over Calls (e.g., 101, 001, 110); the bitmaps are ANDed to find the qualifying fact rows.]
93
Intelligent Scan
Piggyback multiple scans of a relation (Red Brick); piggybacking is also done if a second scan starts a little while after the first scan
94
Parallel Query Processing
Three forms of parallelism: independent, pipelined, partitioned (and "partition and replicate")
Deterrents to parallelism: startup and communication costs
95
Parallel Query Processing
Partitioned data: parallel scans yield I/O parallelism
Parallel algorithms for relational operators: joins, aggregates, sort
Parallel utilities: load, archive, update, parse, checkpoint, recovery
Parallel query optimization
96
Pre-computed Aggregates
Keep aggregated data for efficiency (pre-computed queries)
Questions:
Which aggregates to compute?
How to update aggregates?
How to use pre-computed aggregates in queries?
97
Pre-computed Aggregates
The aggregated table can be maintained by the warehouse server, the middle tier, or client applications
Pre-computed aggregates are a special case of materialized views -- the same questions and issues remain
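Incremental maintenance of a pre-computed aggregate can be sketched as follows (the structure and names are illustrative, not any product's API):

```python
# Keep a pre-computed aggregate in step with the fact data: on each
# insert, update the summary incrementally instead of rescanning.
fact = []
monthly_total = {}  # the materialized aggregate

def insert_sale(month, amount):
    fact.append((month, amount))
    monthly_total[month] = monthly_total.get(month, 0) + amount

insert_sale("Jan", 100)
insert_sale("Jan", 50)
insert_sale("Feb", 70)

# Queries read the small aggregate, not the large fact table
print(monthly_total["Jan"])  # 150
```

This works because SUM is distributive; aggregates like MEDIAN cannot be maintained this cheaply, which is part of the "which aggregates to compute" question above.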
98
SQL Extensions
Extended family of aggregate functions:
rank (top 10 customers)
percentile (top 30% of customers)
median, mode
Object-relational systems allow the addition of new aggregate functions
99
SQL Extensions
Reporting features: running totals, cumulative totals
Cube operator: group by on all subsets of a set of attributes (month, city); redundant scanning and sorting of the data can be avoided
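Running totals as described above can be expressed with a window function; this sketch uses Python's sqlite3 and assumes SQLite 3.25 or later (where window functions were added):

```python
import sqlite3

# Running-total sketch using a SQL window function over a toy sales table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (day INTEGER, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [(1, 10.0), (2, 20.0), (3, 5.0)])
rows = con.execute("""
    SELECT day, SUM(amount) OVER (ORDER BY day) AS running_total
    FROM sales ORDER BY day
""").fetchall()
print(rows)  # [(1, 10.0), (2, 30.0), (3, 35.0)]
```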
100
Technological Requirements
Managing large amounts of data
Managing multiple media -- storage hierarchy: cache (L1 and L2), main memory, disks, optical disks, tapes, fiche
101
Technological Requirements
Ability to index data at will: temporary indices, sparse indices
Ability to monitor data freely and easily:
to determine whether reorganization is required
to determine if an index is poorly structured
to determine the statistical composition of the data
Need to interface with many technologies, for both receiving and passing data
102
Technological Requirements
Programmer/designer control of data
Parallel storage/management of data
Good metadata management
Load the warehouse efficiently
Use indexes efficiently
Compaction of data
103
Technological Requirements
Compound keys
Variable-length data
Lock management: need to be able to turn the lock manager on and off
Index-only processing
104
Warehouse Server Products
Oracle 8
Informix: Online Dynamic Server for SMP, Extended Parallel Server for MPP, Universal Server for object-relational applications
Sybase: Adaptive Server 11.5, Sybase MPP, Sybase IQ
105
Warehouse Server Products
MS SQL Server, OLAP Server
Red Brick Warehouse
Tandem NonStop
IBM: DB2 MVS, Universal Server, DB2/400
Teradata
106
Server Scalability
Scalability is the #1 IT requirement for data warehousing
Hardware platform options: SMP, clusters (shared disk), MPP -- loosely coupled (shared nothing), hybrid
107
SMP Characteristics
SMP -- Symmetric multi processing -- shared everything
Multiple CPUs share same memory
Workload is balanced across CPUs by OS
Scalability is limited to bandwidth of internal bus and OS architecture
Not tolerant to failure in processing node
Architecture is mostly invisible to applications
108
SMP Benefits
Lower entry point -- can start with SMP
Mature technology
109
MPP Characteristics
Each node owns a portion of the database
Nodes are connected via an interconnection network
Each node can be a single CPU or an SMP
Load balancing is done by the application
High scalability due to local processing isolation
110
MPP benefits
High availability
High scalability
111
Sizing your system
Estimate the total volume of data and total disk throughput required
Determine the number of controllers and disks required
Determine CPU and memory based on the workload
112
Other Warehouse Related Products
Connectivity to sources: Apertus, Information Builders EDA/SQL, Platinum Infohub, SAS Connect, IBM Data Joiner, Oracle Open Connect, Informix Express Gateway
113
Other Warehouse Related Products
Data extract, clean, transform, refresh: CA-Ingres Replicator, Carleton Passport, Prism Warehouse Manager, SAS Access, Sybase Replication Server, Platinum InfoRefiner, InfoPump
114
Other warehouse related products
Multidimensional database engines: Arbor Essbase, Oracle IRI Express, SAS System
ROLAP servers: HP Intelligent Warehouse, Informix MetaCube, MicroStrategy DSS Server
115
Other Warehouse Related Products
Query/reporting environments: Brio/Query, Cognos Impromptu, Informix ViewPoint, CA Visual Express, Business Objects, Platinum Forest and Trees
116
Other Warehouse Related Products
Multidimensional analysis: Andyne Pablo, Arbor Essbase Analysis Server, Cognos PowerPlay, Holistic Systems (HOLOS), MicroStrategy DSS, SAS OLAP++
BUILDING A SUCCESSFUL
DATA WAREHOUSE
118
From The Standish Group
TSG: How many Data Warehouses do you have?
Data Warehouser: We have had eight.
TSG: To what do you attribute so many warehouses?
Data Warehouser: Seven mistakes ...
119
For a Successful Warehouse
From day one establish that warehousing is a joint user/builder project
Establish that maintaining data quality will be an ONGOING joint user/builder responsibility
Train the users one step at a time
Consider doing a high-level corporate data model in no more than three weeks
From Larry Greenfield, http://pwp.starnetinc.com/larryg/index.html
120
For a Successful Warehouse
Look closely at the data extracting, cleaning, and loading tools
Implement a user accessible automated directory to information stored in the warehouse
Determine a plan to test the integrity of the data in the warehouse
From the start get warehouse users in the habit of 'testing' complex queries
121
For a Successful Warehouse
Coordinate system roll-out with network administration personnel
When in a bind, ask others who have done the same thing for advice
Be on the lookout for small, but strategic, projects
Market and sell your data warehousing systems
122
Data Warehouse Pitfalls
You are going to spend much time extracting, cleaning, and loading data
Despite best efforts at project management, data warehousing project scope will increase
You are going to find problems with systems feeding the data warehouse
You will find the need to store data not being captured by any existing system
You will need to validate data not being validated by transaction processing systems
123
Data Warehouse Pitfalls
Some transaction processing systems feeding the warehousing system will not contain detail
Many warehouse end users will be trained and never or seldom apply their training
After end users receive query and report tools, requests for IS written reports may increase
Your warehouse users will develop conflicting business rules
Large scale data warehousing can become an exercise in data homogenizing
124
Data Warehouse Pitfalls
'Overhead' can eat up great amounts of disk space
The time it takes to load the warehouse will expand to the amount of time in the available window... and then some
Assigning security cannot be done with a transaction-processing-system mindset
You are building a HIGH maintenance system
You will fail if you concentrate on resource optimization to the neglect of project, data, and customer management issues, and of an understanding of what adds value to the customer
THANK YOU