What is a Data Warehouse, by W. H. Inmon


Post on 21-Dec-2015


Page 1: What is a Data Warehouse by W. H. Inmon

What is a Data Warehouse

by W. H. Inmon

http://www.cait.wustl.edu/cait/papers/prism/vol1_no1/

Page 2: What is a Data Warehouse by W. H. Inmon

What is a Data Warehouse?

• A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management's decision-making process
• The data comes from the operational environment
• The data warehouse is always a physically separate store

Page 3: What is a Data Warehouse by W. H. Inmon

Difference Between Operational Systems and the Data Warehouse (DW)

• The DW is oriented around the major subjects of the enterprise
• The data-driven, subject orientation is in contrast to the more classical process/functional orientation of applications
• The DW world focuses on data modeling and database design exclusively
• DW data excludes data that will not be used for DSS processing
• DW data spans a spectrum of time, and the relationships found in the data warehouse are many

Page 4: What is a Data Warehouse by W. H. Inmon

The data warehouse has a strong subject orientation

• Operational (an application orientation): Loans, Savings, Bank card, Trust
• Data warehouse (a subject orientation): Customer, Vendor, Product, Activity

Page 5: What is a Data Warehouse by W. H. Inmon

Integration

• Data found within the DW is integrated
– ALWAYS
– WITH NO EXCEPTIONS
– consistent naming conventions
– consistent measurement of variables
– consistent encoding structures
– consistent physical attributes of data
• Data needs to be stored in the DW in a singular, globally acceptable fashion

Page 6: What is a Data Warehouse by W. H. Inmon

When data is moved to the DW from the application-oriented operational environment, the data is integrated before entering the DW

Operational → Data warehouse

• Encoding: appl A - m, f; appl B - 1, 0; appl C - x, y; appl D - male, female → m, f
• Measurement: appl A - pipeline cm; appl B - pipeline inches; appl C - pipeline mcf; appl D - pipeline yds → pipeline cm
• Physical attributes: appl A - balance dec fixed (13,2); appl B - balance pic 9(9)v99; appl C - balance dec fixed (11,0); appl D - balance pic s9(7)v99 comp 3 → balance dec fixed (13,2)
• Multiple sources: appl A - description; appl B - description; appl C - description; appl D - description → ? description
• Naming: appl A - bal-on-hand; appl B - current-balance; appl C - cash-in-house; appl D - balance → balance
• Date format: appl A - date (Julian); appl B - date (yymmdd); appl C - date (mmddyy); appl D - date (absolute) → date (Julian)
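The integration step in the figure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the slide does not say which of "1, 0" or "x, y" stands for which value (the mapping below is a guess), and application C's "mcf" is left out because the slide gives no conversion for it.

```python
# Hypothetical per-application gender encodings, following the slide's examples.
GENDER_MAPS = {
    "A": {"m": "m", "f": "f"},
    "B": {"1": "m", "0": "f"},  # assumption: 1 means male, 0 means female
    "C": {"x": "m", "y": "f"},  # assumption: x means male, y means female
    "D": {"male": "m", "female": "f"},
}

# Centimetres per source unit; "mcf" is omitted because no conversion is given.
CM_PER_UNIT = {"cm": 1.0, "inches": 2.54, "yds": 91.44}


def integrate_gender(app, value):
    """Translate an application-specific code into the warehouse's m/f code."""
    return GENDER_MAPS[app][value.strip().lower()]


def integrate_length(value, unit):
    """Convert a pipeline length to the warehouse's single unit, centimetres."""
    return value * CM_PER_UNIT[unit]
```

Whatever the source application, every row reaches the warehouse with the same encoding and the same unit, which is the point of the slide.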

Page 7: What is a Data Warehouse by W. H. Inmon

Integration

• The collective ability of many application designers to create inconsistent applications is legendary
• The integration affects almost every aspect of design: the physical characteristics of data, the dilemma of having more than one source of data, the issue of inconsistent naming standards, inconsistent date formats, and so forth

Page 8: What is a Data Warehouse by W. H. Inmon

Time Variancy

• All data in the data warehouse is accurate as of some moment in time (i.e., not "right now")
• In the operational environment data is accurate as of the moment of access
• Data found in the warehouse is said to be "time variant"

Page 9: What is a Data Warehouse by W. H. Inmon

Time Variancy

Operational: current value data
• time horizon: 60 - 90 days
• key may or may not have an element of time
• data can be updated

Data warehouse: snapshot data
• time horizon: 5 - 10 years
• key contains an element of time
• once snapshot is made, record cannot be updated
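A snapshot record of this kind might look as follows in Python. The class and field names are hypothetical; the sketch only illustrates the two properties on the slide: the key carries an element of time, and the record cannot be changed once it is made.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import date


@dataclass(frozen=True)  # frozen: attributes cannot be reassigned after creation
class BalanceSnapshot:
    account_id: str
    as_of: date  # the element of time in the key
    balance: float

    @property
    def key(self):
        """The warehouse key combines the business key with the snapshot date."""
        return (self.account_id, self.as_of)


snap = BalanceSnapshot("acct-1", date(1996, 1, 31), 250.0)

try:
    snap.balance = 0.0  # an operational-style update is rejected
    updated = True
except FrozenInstanceError:
    updated = False
```

A later balance for the same account becomes a new snapshot with a different `as_of` date rather than an update to this one.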

Page 10: What is a Data Warehouse by W. H. Inmon

Nonvolatile

• Operational (Insert, Change, Replace): data is updated on a record-by-record basis regularly
• Data warehouse (Load, Access): data is loaded into the warehouse and is accessed there, but once the snapshot of data is made, the data in the warehouse does not change

Page 11: What is a Data Warehouse by W. H. Inmon

Nonvolatile

• The basic manipulation of data that occurs in the data warehouse is simple
• There are only two kinds of operations
– the initial loading of data
– the access of data
• There is no update of data
• The need to be cautious of the update anomaly is no factor
• Liberties can be taken to optimize the access of data
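The two operations named above can be sketched as a store that only loads and accesses. `SnapshotStore` and its method names are hypothetical, chosen for illustration; a real warehouse would enforce this through its load procedures rather than a Python class.

```python
class SnapshotStore:
    """A store with only two operations: load and access. There is no update."""

    def __init__(self):
        self._rows = {}

    def load(self, key, row):
        """Add a new snapshot; an existing key is never overwritten."""
        if key in self._rows:
            raise ValueError(f"snapshot {key!r} already loaded; it cannot change")
        self._rows[key] = row

    def access(self, key):
        """Read a snapshot back exactly as it was loaded."""
        return self._rows[key]


store = SnapshotStore()
store.load(("acct-1", "1996-01-31"), {"balance": 250.0})
```

Because rows never change after loading, the sketch needs none of the locking or update-anomaly handling an operational system would require.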

Page 12: What is a Data Warehouse by W. H. Inmon

Nonvolatile

• Another consequence is in the technology
• Technologies to support:
– record-by-record update in an on-line mode
– backup and recovery
– transaction and data integrity
– detection and remedy of deadlock
are quite complex and unnecessary for data warehouse processing
• The DW environment is VERY, VERY different from the classical operational environment

Page 13: What is a Data Warehouse by W. H. Inmon

Nonvolatile

• The source of nearly all data warehouse data is the operational environment
• It is a temptation to think that there is massive redundancy of data between the two environments
• In fact there is a MINIMUM of data redundancy
– data is filtered; much data never passes out of the operational environment
– the time horizon of data is very different
– the data warehouse contains summary data
– data undergoes a fundamental transformation as it passes into the data warehouse

Page 14: What is a Data Warehouse by W. H. Inmon

The Structure of the Warehouse

• Data warehouses have a distinct structure
• Different components of the data warehouse are:
– meta data
– current detail data
– older detail data
– lightly summarized data
– highly summarized data
• The major concern is the current detail data
– the most recent happenings are always of great interest
– voluminous, stored at the lowest level of granularity
– disk storage is fast to access but expensive and complex to manage

Page 15: What is a Data Warehouse by W. H. Inmon

There are different levels of summarization and detail that demark the data warehouse

• Highly summarized
• Lightly summarized
• Current data
• Older detail data
• Meta data

Page 16: What is a Data Warehouse by W. H. Inmon

The Structure of the Warehouse

• Older detail data is stored on some form of mass storage
– it is infrequently accessed
– it is stored at a level of detail consistent with current detailed data
• Lightly summarized data is distilled from the low level of detail found at the current detailed level
– it is almost always stored on disk storage
– the design issues are:
  • what unit of time is the summarization done over
  • what attributes will the lightly summarized data contain

Page 17: What is a Data Warehouse by W. H. Inmon

The Structure of the Warehouse

• Highly summarized data is compact and easily accessible
• Meta data plays a special and very important role in the data warehouse
• It is used as:
– a directory to help locate the contents
– a guide to the mapping of data as the data is transformed from the operational to the DW environment
– a guide to the algorithms used for summarization

Page 18: What is a Data Warehouse by W. H. Inmon

An example of the levels of summarization that might be found in the data warehouse

• national sales by month, 1988-1996
• monthly sales by product line, 1993-1996
• national sales by week, 1986-1996
• weekly sales by subproduct, 1988-1996
• sales detail, 1995-1996
• sales detail, 1985-1994
• meta data

Page 19: What is a Data Warehouse by W. H. Inmon

An Example of the Data Warehouse

• Old sales detail is that detail about sales that is older than 1995
• The current value detail contains data from 1995 to 1996
• The sales detail is summarized weekly by subproduct line and by region to produce the lightly summarized stores of data
• The weekly sales detail is further summarized monthly along even broader lines to produce the highly summarized data
• Meta data contains (at the least!):
– the structure of the data
– the algorithms used for summarization
– the mapping from the operational environment to the data warehouse
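The two summarization steps described above might be sketched as follows. The row layout, the sample values, and the function names are all made up for illustration. One simplification is labeled in the code: the monthly totals are computed from the detail rows rather than from the weekly totals, because ISO weeks can straddle a month boundary.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sales detail rows: (sale date, product line, subproduct, region, amount)
DETAIL = [
    (date(1996, 1, 2), "tools", "hammers", "east", 100.0),
    (date(1996, 1, 3), "tools", "hammers", "east", 50.0),
    (date(1996, 1, 9), "tools", "saws", "west", 75.0),
    (date(1996, 2, 6), "tools", "hammers", "east", 20.0),
]


def summarize_weekly(detail):
    """Lightly summarized: totals per ISO week, subproduct, and region."""
    totals = defaultdict(float)
    for day, _line, subproduct, region, amount in detail:
        iso_year, iso_week, _weekday = day.isocalendar()
        totals[(iso_year, iso_week, subproduct, region)] += amount
    return dict(totals)


def summarize_monthly(detail):
    """Highly summarized: totals per month and product line.

    Simplification: computed from detail, not from the weekly totals,
    since a week can span two months.
    """
    totals = defaultdict(float)
    for day, line, _subproduct, _region, amount in detail:
        totals[(day.year, day.month, line)] += amount
    return dict(totals)


weekly = summarize_weekly(DETAIL)
monthly = summarize_monthly(DETAIL)
```

The choice of grouping keys here corresponds to the two design issues raised earlier for lightly summarized data: the unit of time and the attributes retained.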

Page 20: What is a Data Warehouse by W. H. Inmon

Old Detail Storage Medium

• A wide variety of storage media should be considered for storing older detail data
– photo optical storage
– CD-ROM
– microfiche
– magnetic tape
– mass storage
• It is entirely likely that other storage media will serve the needs

Page 21: What is a Data Warehouse by W. H. Inmon

The Flow of Data Inside the Data Warehouse

[Figure labels: operational environment, summarization process, aging process]

Page 22: What is a Data Warehouse by W. H. Inmon

Flow of Data

• As data enters the data warehouse from the operational environment, it is transformed
• Upon entering the data warehouse, data goes into the current level of detail
• It resides there and is used there until one of three events occurs:
– it is purged
– it is summarized, and/or
– it is archived
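The archiving event (the aging process) can be illustrated with a short sketch. The function name, the row layout, and the cutoff date are hypothetical; the sketch simply moves rows older than a cutoff from current detail to older detail.

```python
from datetime import date


def age_detail(current, older, cutoff):
    """Move rows dated before `cutoff` from current detail into older detail.

    Returns the rows that remain in current detail; `older` is extended in place.
    """
    still_current = []
    for row in current:
        if row["date"] < cutoff:
            older.append(row)  # archived, not changed
        else:
            still_current.append(row)
    return still_current


current_detail = [
    {"date": date(1994, 6, 1), "amount": 10.0},
    {"date": date(1995, 3, 1), "amount": 20.0},
]
older_detail = []
current_detail = age_detail(current_detail, older_detail, date(1995, 1, 1))
```

Rows are moved unchanged, so older detail stays at the same level of detail as current detail.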

Page 23: What is a Data Warehouse by W. H. Inmon

The higher the levels of summarization, the more the usage of the data

Page 24: What is a Data Warehouse by W. H. Inmon

Summarized Data

• The more summarized the data, the quicker and more efficient it is to get to the data
• The DSS analyst in a pre-data warehouse environment has used data at the detailed level
• One of the tasks of the data architect is to wean the DSS user from constantly using data at the lowest level of detail, for example by:
– installing a chargeback system
– pointing out very good response time when dealing with data at a high level of summarization

Page 25: What is a Data Warehouse by W. H. Inmon

Other Considerations

• Data at the higher levels of summarization can be freely indexed
• Data at the lower levels of detail is so voluminous that it can be indexed only sparingly
• The data model and formal design apply almost exclusively to the current level of detail
• The data modeling activities do not apply to the levels of summarization

Page 26: What is a Data Warehouse by W. H. Inmon

Indexes and Data Model

[Figure label: data model]

Page 27: What is a Data Warehouse by W. H. Inmon

Partitioning of DW Data

• Partitioning can be done in two ways
– at the DBMS level
  • the DBMS is aware of the partitions and manages them accordingly
  • the automatic management of the partitions is inflexible
– at the application level
  • the responsibility for the management of the partitions is left up to the programmer
  • provides flexibility in the management of data in the data warehouse
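Application-level partitioning means the program itself decides where each row goes. A minimal sketch, assuming partitions are named by quarter and two-digit year; the helper names and the table name are hypothetical:

```python
from collections import defaultdict
from datetime import date


def partition_name(table, day):
    """Build a partition name such as 'parts_order_history_q295' (quarter 2 of 1995)."""
    quarter = (day.month - 1) // 3 + 1
    return f"{table}_q{quarter}{day.year % 100:02d}"


def route(table, rows):
    """Group rows into partitions chosen by the application, not by the DBMS."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_name(table, row["date"])].append(row)
    return dict(partitions)


rows = [
    {"date": date(1995, 5, 1), "part": "p1"},
    {"date": date(1995, 6, 30), "part": "p2"},
    {"date": date(1996, 1, 15), "part": "p3"},
]
partitions = route("parts_order_history", rows)
```

Since the naming rule lives in the program, the programmer can change the unit of time per table, which is the flexibility the slide attributes to this approach.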

Page 28: What is a Data Warehouse by W. H. Inmon

Current detail data is almost always partitioned

Page 29: What is a Data Warehouse by W. H. Inmon

The internal structuring of data in a sample data warehouse

[Figure: current detailed data. Tables shown: customer history (88 - present), vendor/supplier history (91 - present), order/customer, parts/order history, parts manufacture history, parts bill of material, part/assembly, assembly history (87 - present), parts shipments. Keys shown: part, part/order. Some tables are divided by year (91 through 95), others by quarter (q1 93 through q1 96).]

Page 30: What is a Data Warehouse by W. H. Inmon

An Example of a Data Warehouse

• The levels of summarization are not shown, nor is the old detail archive shown
• There are tables of the same type divided over time
• For different types of tables there are different units of time physically dividing the units of data
• Different tables are linked by means of a common identifier

Page 31: What is a Data Warehouse by W. H. Inmon

Other Anomalies

• Public summary data is summary data that has been calculated outside the boundaries of the data warehouse but is used throughout the corporation
• Another anomaly is that of external data
• Another exceptional type of data sometimes found in a data warehouse is that of permanent detail data stored for ethical or legal reasons
– the medium the data is stored on must be as safe as possible
– the data must be able to be restored
– the data needs special treatment in the indexing to be accessible

Page 32: What is a Data Warehouse by W. H. Inmon

Summary

A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management's decision needs

There are four levels of data warehouse data:
– old detail
– current detail
– lightly summarized data
– highly summarized data

Meta data is also an important part of the data warehouse environment
