OLAP AND DATA WAREHOUSE BY W. H. Inmon


©Copyright 2000 by William H. Inmon, all rights reserved Page 1

The goal of informational processing is to turn data into information. Online analytical processing (OLAP) is an important method by which this goal can be accomplished in the data warehouse architecture. As data warehouse users’ understanding of DSS processing capabilities increases, and as the volume of data grows, the sophistication of data warehouse use increases. Figure 1 depicts the data warehouse architecture’s role in turning the data into information, and some of the general differences between the levels of the data warehouse architecture.

The different levels of data warehouse architecture may be described as:

• Organizationally Structured – Also known as “atomic”, “corporate” or “current detail” data, this level – the heart of the data warehouse – is structured to meet the informational requirements of the entire organization. Data which has been archived (“archived detail”) is also considered to belong to this level.

• Departmentally Structured – Data at this level of warehouse architecture is structured to meet the focused informational requirements of a distinct group identified by a specific business function. The data at this level has also been referred to as “lightly summarized” or “departmental” data.

• Individually Structured – Data at this level is structured to meet an even more focused set of informational requirements, as defined by a specific management function. The data at this level has also been referred to as “highly summarized” or “individual” data.


OTHER NAMES FOR OLAP

The increasing sophistication of the end users in meeting their informational – or “decision support” (DSS) – processing requirements leads to OLAP processing. OLAP is a natural extension of the data warehouse. Indeed, the departmentally structured level of the data warehouse, from the earliest descriptions of the data warehouse architecture, is ideal for addressing OLAP processing. This level of data is also called the OLAP or “data mart” level of DSS processing. Figure 2 shows the different names for the OLAP level of data processing.

THE SOURCE OF OLAP DATA

The OLAP level of data originates from the organizationally structured level of data in the data warehouse. This detailed, historical data is the heart of the data warehouse and forms a perfect foundation for the OLAP level of data. The organizationally structured level of data is fed by the operational environment and, in turn, feeds the OLAP level. Figure 3 shows the relationship between the organizationally structured level of data in the data warehouse and the OLAP level of data.


Some of the characteristics of the OLAP level of data are:

• Smallness: Compared to the organizationally structured level of data, there is far less data that resides in OLAP. As a rule there are two to three orders of magnitude less data.

• Flexibility: OLAP processing is much more flexible than the processing that occurs at the organizationally structured level of data warehouse processing. OLAP is flexible because there is much less data to contend with and because the software found at the OLAP level is designed for flexibility, in contrast to the software managing the organizationally structured level, which is designed to manage large amounts of data.

• Limited History: The OLAP environment rarely contains more than six months to a year's worth of history. The organizationally structured level of data contains from five to ten years' worth of data.

• Customized: The OLAP environment is customized by department to suit the particular needs of the organization that owns and manages it. The organizationally structured data in the data warehouse is truly corporate data.

• Pre-Categorized: Departmentally structured data in the OLAP environment is usually organized into pre-defined categories to facilitate the informational requirements of a specific department, while the data in the organizationally structured level of the data warehouse maintains all of the categories required for the entire corporate structure.

• Source: The source of OLAP data is the detailed data found in the organizationally structured level of the data warehouse. The source of data for the organizationally structured level is the operational environment.

DIFFERENCES BETWEEN OLAP AND ORGANIZATIONALLY STRUCTURED DW DATA

There are many significant differences between the departmentally structured (OLAP) and the organizationally structured levels of the data warehouse. One of the most important aspects of the OLAP environment is that it is customized by department. Figure 4 shows that different instances of the OLAP environment can exist for different departments.


In Figure 4 there is an OLAP instance for finance, a separate OLAP instance for accounting, and yet another separate OLAP instance for marketing, all originating from the organizationally structured level of the data warehouse. Figure 5 shows the methods by which data from the organizationally structured level of the data warehouse can be customized for a department at the OLAP level.

Departmental customization can take many forms, such as:

• Subsets – Finance will select some detailed data, while marketing will select other detailed data.

• Aggregations – Accounting will summarize their data one way, while finance will summarize theirs another way. These different approaches may apply to different data being summarized, to different ways in which the aggregated results are calculated, or to different sets of categories by which the aggregated data is organized.

• Supersets – One department will denormalize their OLAP data by joining data from tables A and B, while another department will join data from tables B and C.

• Indexing – One department will index their data on keys ABC and BCD, while another department will index the same data on keys CDE and DEF, and so forth, to provide more optimal search paths that meet their different departmental requirements for informational processing.

• Derivations – A department may want a particular metric precomputed and the results stored in their OLAP environment. A similar metric may be stored at the organizationally structured level, but the department wants to compare their department-specific calculation to the organization-standard one.

• Arrays – In order to make the data in their OLAP environment more useful, a department may opt to create an array of data to assist them in their informational goals. For example, data that is stored one record per month in the organizationally structured detail may be required as an array of 13 months to represent a contiguous year and facilitate current-year-previous-month analysis.
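
A few of these customizations can be sketched in code. The following is only an illustration, not a description of any particular product: the record layout (`dept`, `month`, `amount`), the sample values, and the function names are all hypothetical. It shows a subset, an aggregation, and a 13-month array, each derived from the same detailed records.

```python
from collections import defaultdict

# Hypothetical detailed records, one per department per month,
# standing in for the organizationally structured level.
DETAIL = [
    {"dept": "finance", "month": "1999-12", "amount": 100},
    {"dept": "finance", "month": "2000-01", "amount": 120},
    {"dept": "marketing", "month": "2000-01", "amount": 80},
    {"dept": "marketing", "month": "2000-02", "amount": 90},
]

def subset(rows, dept):
    """Subset: keep only the detail a department selects."""
    return [r for r in rows if r["dept"] == dept]

def aggregate(rows, key):
    """Aggregation: total the amounts by a chosen category."""
    totals = defaultdict(int)
    for r in rows:
        totals[r[key]] += r["amount"]
    return dict(totals)

def last_13_months(end):
    """List 13 contiguous 'YYYY-MM' labels ending at `end`."""
    year, month = map(int, end.split("-"))
    labels = []
    for _ in range(13):
        labels.append(f"{year:04d}-{month:02d}")
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return labels[::-1]

def month_array(rows, months):
    """Array: one slot per month, 0 where no record exists."""
    by_month = aggregate(rows, "month")
    return [by_month.get(m, 0) for m in months]
```

For example, `month_array(subset(DETAIL, "finance"), last_13_months("2000-02"))` yields a 13-slot list for finance covering February 1999 through February 2000, with zeros for months that have no record.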


There are as many ways to customize the data for a department as there are departments. Indeed, a department is not limited to a single OLAP instance, but may require several OLAP environments to meet all of the department’s business requirements for information. The data that feeds the OLAP environment is the detailed data of the organizationally structured level of the data warehouse. Because the data residing in the detailed portion of the data warehouse is corporate, it is not optimized to suit the needs of any given department.

One of the issues that naturally arises in the customization of the OLAP environments is that of reconcilability. With each department taking its own perspective of the corporate data found in the data warehouse, isn't there a problem with the loss of reconcilability of data? The answer is “no”. The reason why there is no problem with reconcilability is illustrated by the diagram in Figure 6.

Figure 6 shows that the same detailed data is looked at in many different ways by different departments. Because all departments are operating from the same foundation of detailed data, there is always reconcilability of data, however the detailed data is customized. In this way detailed data provides a very satisfactory foundation for OLAP processing.

Another way in which organizationally structured data provides a very good basis for departmental OLAP processing is that the price of creating the detailed foundation need be paid only once. In other words, suppose an OLAP environment is to be created for the finance planning department. It is no small task to create the proper detailed foundation. But once the detailed foundation is created for the finance planning OLAP effort, then the very same foundation can be used for the sales OLAP effort, for the accounting OLAP effort, and so forth. Once the organizationally structured environment is created, there is no further incremental cost to the usage of the detailed data found therein. As many OLAP environments as desired can take advantage of the organizationally structured data once built.

INDEXING THE TWO ENVIRONMENTS

One of the substantive differences between the OLAP and the detailed data warehouse environment is that of indexing. Figure 7 shows that the OLAP environment can be highly and generously indexed, while the detailed environment should be sparsely indexed.

There may be as many as thirty or forty indexes in the OLAP environment while there may be as few as two or three indexes in the detailed environment. There are several reasons for this disparity in indexing. The first is in the volume of data found in the two environments. Where there is a modest amount of data in an environment, such as the OLAP environment, the luxury of having many indexes can be enjoyed. Where there is an immodest amount of data, such as the organizationally structured environment, there can only be a few indexes.

But volume of data is not the only consideration in the difference in indexing. There is much direct end user access that occurs in the OLAP environment, while, relatively speaking, there is little direct end user access that occurs on detailed data (once the organizationally structured environment is mature). Because of the disparity in direct end user access, there is a very real difference in the need for indexing in the different environments.

One of the important distinctions made between the two environments is the end user interface found at each level. Figure 8 shows the difference in interfaces.


Figure 8 shows that the OLAP interface is optimal for DSS access and analysis of information. There is a direct end user interface to organizationally structured data but it is a much cruder and much simpler interface. Much of the activity that occurs in the organizationally structured data environment is the selection and gathering of data. Very few analysts conduct detailed, heuristic analysis of data there. Those who do are the corporate explorers, and they typically require more powerful and, correspondingly, more difficult to use access tools. These analysts are typically very knowledgeable about the organization’s data, and are not only comfortable using procedural tools to interface with the organizationally structured data, but also discover things in this vast amount of detailed data that were previously unknown or unsuspected.

EXPLORERS AND FARMERS

Because of the differences in the volume of data found in the two environments and the difference in direct end user interfaces, there is a difference in the communities of usage. As a rule, the organizationally structured data serves the explorer community while the OLAP environment serves the farmer community. Figure 9 depicts these differences.


The detailed data serves the explorer community because it is organizationally oriented (“corporate”), supports random access, and because it is complete and historical. The OLAP environment supports the farmer community because the data is customized before the data is sent to the OLAP environment. In order to customize the data it is necessary to know how the data is to be used, and it is the farmers of the world that are able to foretell how data will be used.

There are some exceptions to this rule of the different communities of users. Because of the limited amount of data found there, the large number of indexes, and the elegance of the interface, some exploration can be done in the OLAP environment. But the OLAP level exploration is cursory, looking at the broad picture, not the detailed one. For the most part, the OLAP environment exists for and is optimal for the farmer community, not the explorer community.

DRILL DOWN PROCESSING

One of the features of the OLAP environment is that it supports drill down processing. Figure 10 shows the OLAP support of drill down processing.
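
In general terms, drill down means starting from a summarized figure and moving to the more detailed rows that produced it. The sketch below illustrates that idea only. The sales record layout (`region`, `store`, `amount`) and the function names are hypothetical and do not come from the figure.

```python
from collections import defaultdict

# Hypothetical detail rows; the field names are illustrative only.
SALES = [
    {"region": "east", "store": "E1", "amount": 40},
    {"region": "east", "store": "E2", "amount": 60},
    {"region": "west", "store": "W1", "amount": 70},
]

def summarize(rows, level):
    """Roll the rows up to one total per value of `level`."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[level]] += row["amount"]
    return dict(totals)

def drill_down(rows, level, value, next_level):
    """Explain one summary figure by re-summarizing only the rows
    behind it at the next, more detailed level."""
    selected = [row for row in rows if row[level] == value]
    return summarize(selected, next_level)

by_region = summarize(SALES, "region")
east_by_store = drill_down(SALES, "region", "east", "store")
```

The store-level figures returned for a region always add up to that region's summary figure, because both are computed from the same underlying rows.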


There are two types of drill down processing that are relevant: inter-OLAP drill down processing and OLAP-to-organizationally structured data drill down processing. Inter-OLAP drill down processing is used to show the relationships of summarization between the different instances of data within the OLAP environment. A lower level of detail exists at the organizationally structured level of data, and that level serves as a further level of drill down for the entire OLAP environment.

OLAP WITH NO ORGANIZATIONALLY STRUCTURED DATA

One of the temptations the designer has in building the OLAP environment is to not build the organizationally structured level of the data warehouse. It is a temptation to just build OLAP immediately on top of the operational environment. After all, the organizationally structured level of the data warehouse is:

• expensive,
• complex,
• not easy to build.

Figure 11 depicts the problems associated with skipping the organizationally structured level of the data warehouse in support of OLAP environments.


Building OLAP directly on top of the operational environment is a grave mistake for a variety of reasons:

• The operational environment is not designed to support integrated processing, but the OLAP environment assumes data integration has been done somewhere prior to it.

• The operational environment contains only a limited amount of historical data. The OLAP environment requires historical data.

• Each OLAP environment must build its own customized interface to the operational environment. The development effort to do this is not trivial.

• Each OLAP environment puts a drag on the performance of the operational environment. The collective drag of many OLAP environments is very significant.

For these reasons (and many more!), building the OLAP environment directly from the operational environment is a very poor idea indeed.

METADATA FOR THE OLAP ENVIRONMENT

One of the more important aspects of the OLAP environment is that of metadata. OLAP metadata is important because it is metadata that keeps track of what is in the OLAP environment and where it came from. Before starting an analysis or a new report, the end user in the OLAP environment first turns to metadata in order to determine what data is available as a basis for the analysis. Figure 12 shows metadata in the OLAP environment.

The components of OLAP metadata are very similar to those found in the organizationally structured level of the data warehouse. The OLAP components of metadata include:

• descriptive information about what is in the OLAP environment:

o content,
o structure,
o definition, etc.;

• the source of the data (the organizationally structured data or external data);

• the business and technical name of the data;

• a description of the summarization, subset, superset and/or denormalization processes that describe the data’s journey from the organizationally structured level into the OLAP environment;

• metrics that describe how much data of what type is found in the OLAP environment;

• refreshment scheduling information, describing when data has been populated; and

• modeling information, describing how the data in the OLAP environment relates to the corporate data model and to the OLAP data model (if one exists).

Metadata in the OLAP environment is somewhat more complex than metadata found elsewhere in the data warehouse environment because there is a need for a specialized kind of metadata in the OLAP environment. Figure 13 illustrates the need for a unique kind of metadata in the OLAP environment.
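
The components of OLAP metadata listed above could be held as one record per OLAP data set. The dataclass below is only a sketch of that idea: every field name and sample value is hypothetical, and a real metadata repository would be considerably richer.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class OlapMetadata:
    """One hypothetical metadata entry for a data set in an OLAP instance."""
    business_name: str                     # name the end user recognizes
    technical_name: str                    # physical table or file name
    description: str                       # content, structure, definition
    source: str                            # organizationally structured or external
    derivation: str                        # summarization, subset, superset, etc.
    row_count: int = 0                     # metric: how much data is present
    last_refreshed: Optional[str] = None   # when the data was populated
    model_entities: List[str] = field(default_factory=list)  # data model links

def find_available(catalog, keyword):
    """Return entries whose business name or description mentions the
    keyword, as an end user might do before starting an analysis."""
    keyword = keyword.lower()
    return [m for m in catalog
            if keyword in m.business_name.lower()
            or keyword in m.description.lower()]

catalog = [
    OlapMetadata(
        business_name="Monthly sales by region",
        technical_name="mkt_sales_region_m",
        description="Sales totals summarized by month and region",
        source="organizationally structured",
        derivation="summarization of detailed sales records",
        row_count=1200,
        last_refreshed="2000-02-01",
        model_entities=["Sale", "Region"],
    ),
]
```

A lookup such as `find_available(catalog, "sales")` stands in for the end user consulting metadata to see what data is available before building a query.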

There is a need for both local and global metadata in the OLAP environment. Local OLAP metadata relates immediately to the department that the OLAP instance serves. There might be financial planning OLAP metadata, sales OLAP metadata, and marketing OLAP metadata. At the same time there is a need for OLAP metadata that is global to the OLAP environment. Global OLAP metadata might include descriptions of how different departments relate to each other, how different sources of data differ, how data might flow from one OLAP environment to another, and so forth.

As with metadata for any and all aspects of the entire data warehouse architecture, OLAP metadata needs to be supported as an interactive part of the process of the OLAP environment; an integral, important aspect of the OLAP environment, not an afterthought.

One of the important uses of metadata in the OLAP environment is that it can (indeed, should) be used interactively in the query process. Once the end user has examined OLAP metadata to determine what the possibilities are, then the user is able to use the metadata interactively in the query process.

MOVING DATA INTO OLAP

Moving data into the OLAP environment from the detailed data warehouse environment is a nontrivial task. Figure 14 shows the positioning of the program required for the transport of data into the OLAP environment.

There are several functions that are accomplished in this movement of data. These functions are not necessarily mutually exclusive, and include:

• selection of a subset of detailed data,
• summarization of detailed data,
• customization of the detailed data into a departmental format,
• precategorization of detailed data to meet departmental requirements,
• creation of supersets by merging and joining of detailed data,
• creation or update of arrays of detailed data, and so forth.

Some of the issues that must be resolved in the creation and execution of the program that feeds the OLAP environment from the detailed data warehouse environment are:

• frequency of refreshment,
• efficiency with which the detailed data is read (acquired),
• amount of detailed data to be acquired,
• platform on which sorting, joining, merging, etc., take place,
• ability to know what data already resides in the OLAP environment so that the same record is not (unintentionally) created twice,
• assurance that unnecessary records will not be created,
• whether data once processed is to be appended or updated into the OLAP environment, and so forth.

One of the interesting aspects of the program that loads the OLAP environment from the organizationally structured data warehouse environment is its mutability. The organizationally structured/OLAP interface is an unstable interface because the OLAP environment supports informational processing, and informational processing inherently implies instability because of the exploratory nature of how it is utilized. For this reason, the interface needs to be as flexible as possible, because maintenance of the interface will be an everyday occurrence.

Another issue of the program that loads the OLAP environment from the organizationally structured data warehouse environment is the efficiency of operation. The first issue of efficiency is that of simple access of organizationally structured data. Assuming indexes are used wisely, the next issue is that of combining acquisition programs. Instead of every OLAP instance having its own data acquisition program, if the same detailed data is going to be accessed by more than one OLAP environment, then there needs to be a single program acquiring organizationally structured data that feeds all the OLAP instances that must be supported. By having a single pass done against the organizationally structured environment, very efficient OLAP data acquisition processing can be accomplished.

DATA MODELING FOR OLAP

The OLAP environment may or may not have a data model built for it, as shown in Figure 15.

The usage of a data model in the OLAP environment is questionable because the OLAP environment is subject to change at a moment's notice. The high degree of flexibility of the OLAP environment is such that some types of data and results are created and destroyed faster than they can be modeled. On the other hand, some of the data in the OLAP environment is very stable and in fact should be modeled. Whether a model is applicable or not depends on the kind of data that is being considered. There are several important kinds of data found in the OLAP environment:

• permanent detailed data,
• nonpermanent detailed data,
• static summary data, and
• dynamic summary data.


Figure 16 shows the different types of data found in the OLAP environment.

Permanent detailed data is data that comes from the organizationally structured level and is regularly and normally needed in OLAP processing. Permanent detailed data will be detailed from the standpoint of the department that owns the OLAP platform. In actuality, the OLAP permanent detailed data may well be summarized as it passes from the organizationally structured level of the data warehouse into the OLAP environment. In that respect, what is detailed in any one instance of the OLAP environment may be summarized from the perspective of the corporate DSS analyst. Referring back to Figure 1, the data warehouse architecture supports maintaining the appropriate level of detail and summarization to support the informational requirements of the entire organization, as well as the different functional requirements of different departments within the organization.

The second kind of data found in the OLAP environment is nonpermanent detailed data. Nonpermanent detailed data is data that is brought into the OLAP environment on a one-time-only or temporary basis. Nonpermanent data is used for special reports and analyses. The data model for the OLAP environment applies to permanent detailed data and does not apply to nonpermanent detailed data.

Static summary data is data that can be recalculated repeatedly with the same result, regardless of when the calculation is made. Nearly all of the data that is summarized in the OLAP environment is static. As such, a data model can and should be created that identifies the static summary data that belongs in the OLAP environment. The farmers that constitute the OLAP community will tell the data modeler what summarized data is needed. The database administrator (DBA), or whoever is responsible for monitoring the activity against the data warehouse, will also be able to provide input to the data modeler as to what detailed data should be summarized and how, based on patterns of utilization.


There normally is some small amount of dynamic summary data that is found in the OLAP environment. Three of the most common occurrences that affect whether or not dynamic data will be found in the OLAP environment are:

1. changes in how the department wishes to manipulate OLAP data,
2. corrections to detailed data in the organizationally structured environment, and
3. changes in how the different levels of aggregation of a complex categorization are organized.

One of the characteristics of the OLAP environment is that it is flexible. Some departments, especially those engaged in “what if” types of analyses such as marketing, have extremely dynamic requirements for information. The OLAP environment, representing the departmentally structured level of the data warehouse architecture, can react and respond to these frequently and often radically changing requirements without requiring a change to the underlying organizationally structured level of the data warehouse.

While the data in the data warehouse is defined as being “nonvolatile”, there are circumstances where, for whatever reason, organizationally structured data must be corrected. The most common cause of these corrections is business processing rules that fall outside of the business rules used to trigger data acquisition for the data warehouse. While these situations usually do not have a significant impact on the data customized in the OLAP environment, there may be exceptions.

The other, more common reason for summary data to be considered dynamic is changes to the structure of complex categories. A department (or even the entire organization) may have a business requirement to analyze historical data based on a new method of organizing a category, such as sales or product hierarchies. If the data stored in the organizationally structured level of the data warehouse is at the appropriate level of detail, then resummarizing this data will not present a problem. The actual processing of the resummarization may be considerable, but the ability to meet this requirement of turning data into information will exist.

PHYSICAL DESIGN OF THE OLAP ENVIRONMENT

The data model that is created for the OLAP database design leads to a physical design. The basis of physical design is a combination of properly normalized data and the star schema. For those entities of data that occur infrequently, the data model and normalization serve as the basis for physical design. For those entities that occur frequently, the star schema serves as a basis for physical design. Figure 17 shows a star schema.
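
A star schema can be illustrated with a small in-memory SQLite database. This is a minimal sketch: the table and column names (`sales_fact`, `product_dim`, `store_dim`) and the sample rows are hypothetical and are not taken from Figure 17. A central fact table carries additive measures, and the tables around it carry descriptive attributes reached through foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive, non-additive attributes.
    CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE store_dim   (store_id   INTEGER PRIMARY KEY, city TEXT);

    -- Fact table: the most populous entity, holding additive measures
    -- plus a foreign key to each dimension.
    CREATE TABLE sales_fact (
        product_id INTEGER REFERENCES product_dim(product_id),
        store_id   INTEGER REFERENCES store_dim(store_id),
        month      TEXT,
        units      INTEGER,
        revenue    INTEGER
    );

    INSERT INTO product_dim VALUES (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware'),
                                   (3, 'Manual', 'Books');
    INSERT INTO store_dim VALUES (10, 'Denver'), (20, 'Boston');
    INSERT INTO sales_fact VALUES
        (1, 10, '2000-01', 5, 50),
        (2, 10, '2000-01', 2, 40),
        (3, 20, '2000-01', 1, 15),
        (1, 20, '2000-02', 3, 30);
""")

def revenue_by(column, table, key):
    """Sum the additive revenue measure grouped by one dimension attribute."""
    sql = (f"SELECT d.{column}, SUM(f.revenue) FROM sales_fact f "
           f"JOIN {table} d ON f.{key} = d.{key} "
           f"GROUP BY d.{column} ORDER BY d.{column}")
    return conn.execute(sql).fetchall()

by_category = revenue_by("category", "product_dim", "product_id")
by_city = revenue_by("city", "store_dim", "store_id")
```

Because revenue is additive, the same fact rows can be summed along either dimension and the grand totals agree.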


The star schema shown in Figure 17 has several components. At the center of the star schema is the fact table. The fact table represents the entity that is most populous in terms of data occurrences. The data in the fact table is made up of data elements from the organizationally structured level that are additive in nature; that is, the values in these data elements can be summed in a variety of ways without jeopardizing the integrity of the data.

Surrounding the fact table are the dimension tables. The dimension tables are where constant related data are stored. The data in dimension tables are descriptive in nature, not additive. There is a prejoined foreign key relationship relating the fact table to the dimension tables. The purpose of the fact table is to streamline the information processing that must make use of the numerous occurrences of data found in the fact table. The remainder of the data (i.e., the non-populous entities) makes use of the classical data model as a basis for physical design.

THE ORDER OF BUILDING THE COMPONENTS

There is a predictable order in which the various components of the architecture are built. Figure 18 shows that order.


The organizationally structured level of the data warehouse is the first component of the architecture that is built and populated. The organizationally structured data begins its journey in the operational environment. From the operational environment, the data is transformed and integrated. The organizationally structured portion of the data warehouse is then populated. After a serious amount of detailed data has been accumulated, the OLAP environment (departmentally structured) is begun. Only after a significant amount of organizationally structured data has been gathered does it make sense to start to build the departmentally structured environment. The OLAP environment is populated by the internal data of the organization, as well as external data sources, as seen in Figure 19.

EXTERNAL DATA AND OLAP

External data can come from any number of sources. It may be fed directly into the OLAP environment or may be fed into the organizationally structured environment, where it can then be passed along to the OLAP environment. When external data is fed directly into the OLAP environment, the implication is that there is no other corporate use of it outside of the department that controls that OLAP instance (Figure 19.1).


When there is a corporate need for that external data, then the data is fed into the organizationally structured portion of the data warehouse where it is then available to any instance in the OLAP environment (Figure 19.2). This design may be the result of an up-front decision, or may evolve over time from the previous example of just populating an OLAP instance with external data.

External data may undergo some amount of refinement before being placed in the OLAP environment. Some of the typical refinements include:

• editing fields,
• removing selected records,
• joining records to other data,
• summarizing external data, etc.

PLATFORMS AND OLAP

One of the important aspects of the organizationally structured/OLAP relationship in the data warehouse is that there are many significant differences between the two environments. If there are very stark differences, the two environments are best placed on different platforms, as seen in Figure 20.


Figure 20 shows that there are more differences than just platform between the two environments: there could be differences in which DBMS best supports each, in budget, and in the type and number of end users. As a rule, the expenditures for the creation and management of organizationally structured data warehouse data are placed in the corporate IS budget, while the OLAP expenditures should be placed in the budget of each department that needs a departmentally structured environment. Organizationally structured data users, for the most part, are the corporate explorers who need to get at raw corporate data. The users of the OLAP environment are the departmental analysts who have a parochial interest in, and perspective on, the data found in their OLAP instances. There are mostly farmers at the OLAP level.

When the organizationally structured level of the data warehouse is small, it is probably most cost-effective to combine the organizationally structured data of the data warehouse and the OLAP environment together onto a single platform. Once the organizationally structured data blossoms into significant volumes, there is no real possibility that the two environments can be subsumed into the same physical environment.

In most cases, separating the organizationally structured and OLAP environments onto separate platforms is an evolutionary process. An organization will begin with a single physical implementation of its data warehouse environment, including the organizationally structured and OLAP levels. Then, over time, various factors, each present to a varying degree, will force the two environments onto different physical platforms. Sometimes these factors can be predicted during the design of the data warehouse, and consequently the overall design takes into consideration a certain degree of physical separation between organizationally structured and OLAP data and processing.

Whether or not the initial implementation of the data warehouse includes OLAP processing, and whether the two levels are initially separated or share a single platform, eventually most data warehouses evolve to the point of requiring that the organizationally structured and departmentally structured levels have their own dedicated platforms. Some of the factors that affect the decision to separate or not to separate include:

• Size: The size of the organizationally structured level of the data warehouse, whether initially or through the natural growth process of a data warehouse (addition of and changes to subject areas, mushrooming end user requirements for OLAP capabilities, etc.), may require all of the resources of a given hardware platform. Likewise, there may be real physical limitations in the RDBMS chosen to maintain the organizationally structured level, requiring that any OLAP instances be housed elsewhere.

• Performance: The primary responsibility of the organizationally structured level of the data warehouse is to maintain integrated data from a variety of sources in a manner that facilitates informational processing for the entire organization. The performance of fulfilling this requirement may be threatened by end user access to the OLAP environment, or vice versa.

• Number of Departments: The sheer number of departments requiring OLAP capabilities may be mathematically impossible to support on a single platform.


• Volatility: The kinds of changes to the underlying data structure of an organizationally structured level necessary to support the information requirements of the organization may be too disruptive to the OLAP environments, affecting their stability and performance.

• Geographic Location: For a distributed or decentralized organization, it may be more efficient to physically locate an OLAP environment at geographically diverse locations. These could be across the street, across the state, nation or world from the organizationally structured level, or any combination thereof.

• User Autonomy: It is not uncommon for end users to complain about sharing resources with other departments, claiming that it negatively impacts their performance. True or not, perception is reality, and in order for the users to feel that they can turn their data into information they may require a physically separate environment that they can call their own. There may also be security or other confidentiality issues requiring that some OLAP data be physically separated from the rest of the data warehouse environment. Whatever the reason, budgetary independence plays a significant role in whether a separate OLAP environment can be made available to these departments.

• Platform Considerations: Some applications of OLAP processing capabilities may need to take advantage of unique or physically diverse platforms. A multi-dimensional database and/or server may be the appropriate configuration for a given group, rather than a multi-dimensional tool accessing a relational database, requiring separation.

For these reasons and because of the differences in budget and control of the different environments, there usually is no problem with having separate platforms for the different environments. Figure 20.1 represents a platform shared by both an organizationally structured environment and an OLAP environment, while Figure 20.2 shows them separated. It is important to note that not all instances of an OLAP environment within an organization’s data warehouse will warrant a separate physical environment. In some cases one or more OLAP instances must or should be physically separate, while one or more others can continue to function on the same physical platform as the organizationally structured level of the data warehouse.


OLAP AND THE WORKLOAD

One of the interesting benefits of building an OLAP environment is the redistribution of the workload. Before the OLAP environment exists, the workload consists only of data access (queries) against the organizationally structured data. Afterward, it is a combination of data acquisition from the organizationally structured environment into the OLAP environment and data access against both. Figure 21 depicts these differences before and after the OLAP environment is built.
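The split between data acquisition and data access can be sketched in a few lines. This is only an illustration under assumed data: the `detail_sales` rows, the `channel` field used to pick a department's subset, and the three function names are hypothetical and stand in for whatever DBMS and load tooling an organization actually uses.

```python
from collections import defaultdict

# Hypothetical detail rows held at the organizationally structured level.
detail_sales = [
    {"day": "2000-01-03", "channel": "web", "product": "A100", "amount": 120},
    {"day": "2000-01-03", "channel": "web", "product": "A100", "amount": 80},
    {"day": "2000-01-04", "channel": "web", "product": "B200", "amount": 50},
    {"day": "2000-01-04", "channel": "store", "product": "A100", "amount": 200},
]


def acquire_for_department(detail, channel):
    """Data acquisition: scan the detail once per refresh and preaggregate
    the department's subset by product."""
    totals = defaultdict(int)
    for row in detail:
        if row["channel"] == channel:
            totals[row["product"]] += row["amount"]
    return dict(totals)


def query_olap(instance, product):
    """Data access: a departmental query answered from the small summary."""
    return instance.get(product, 0)


def query_detail(detail, day):
    """Data access that still needs the detail, because the summary
    no longer carries the day."""
    return [row for row in detail if row["day"] == day]


web_olap = acquire_for_department(detail_sales, "web")
print(web_olap)
print(query_olap(web_olap, "A100"))
print(len(query_detail(detail_sales, "2000-01-04")))
```

The acquisition scan runs once per refresh, while the departmental queries run many times against the much smaller summary. The last call is a reminder that some questions can only be answered at the organizationally structured level.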


When there is no OLAP environment, all queries MUST be run in the organizationally structured environment. There simply is no other choice. Running all queries in the organizationally structured environment may be no problem as long as there is not much data there, or as long as not much processing is occurring there. But the instant that there is much data in the organizationally structured environment, or considerable processing against that environment to facilitate data access, the need for the OLAP environment becomes apparent. Some of the factors that accelerate the need for an OLAP environment to facilitate more efficient data access in the quest to turn data into information include:

• preaggregating data for better performance,
• precategorizing data to enhance understanding and usability by end users,
• standardizing calculations, metrics and other derived data to ensure accuracy throughout the organization,
• one access point (organizationally structured data) vs. many (OLAP data), and so forth.

Once the OLAP environment is created, the bulk of the queries are executed away from the organizationally structured environment. Note that not all queries are shifted to the OLAP environment. Even in the most mature DSS environments, there are always a number of queries that simply cannot be done outside the organizationally structured environment of the data warehouse. Corporate data explorers must certainly have access to this detailed data. The level of detail or type of data that is needed is such that ONLY queries at the organizationally structured level will suffice.

The shift to the OLAP environment from the organizationally structured environment has much to be said for it:

• it is economical,
• it is highly flexible,
• it allows customization of data to occur for a given department,
• it takes advantage of different software to meet different requirements, residing on the OLAP platform,
• it allows significant portions of data to be isolated,
• it allows subsets of data to be isolated, and so forth.

SUMMARY

The OLAP environment is sometimes called the data mart, or the departmental, lightly summarized or departmentally structured level of the data warehouse. The OLAP environment is customized for the department that it serves. A subset of the detailed data maintained in the organizationally structured level of the data warehouse is placed in the OLAP environment, usually undergoing some form of pre-processing (summarization, denormalization, etc.) as it moves from the organizationally structured level. There can be many OLAP environments, at least one for each department needing to do OLAP processing in meeting its objective of turning data into information. The organizationally structured level of the data warehouse serves as a basis of reconcilability for the many departments that do OLAP processing.

The OLAP environment is highly indexed, in contrast to the organizationally structured environment, which is sparsely indexed. The OLAP environment entails an elegant interface, as opposed to the crude (or virtually nonexistent) interface to the detailed data in the organizationally structured level. Drill down processing is an integral part of the OLAP environment. Building the OLAP environment directly from the operational environment, without first building the organizationally structured level of the data warehouse, is patently a mistake.

An important aspect of the OLAP environment is metadata. There are two types: local and global OLAP metadata. The data model is used for the design of some of the data found in the OLAP environment, while observation of how the organizationally structured data is utilized provides other important insights into the OLAP design. Physical database design centers on classical normalization and the star schema. External data as well as internal data can be included in the OLAP environment.