
Online analytical processing

From Wikipedia, the free encyclopedia


Online analytical processing, or OLAP (IPA: /ˈoʊlæp/), is an approach to quickly answer multi-dimensional analytical queries.[1] OLAP is part of the broader category of business intelligence, which also encompasses relational reporting and data mining.[2] The typical applications of OLAP are in business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting, financial reporting and similar areas. The term OLAP was created as a slight modification of the traditional database term OLTP (Online Transaction Processing).[3]

Databases configured for OLAP use a multidimensional data model, allowing complex analytical and ad hoc queries with rapid execution times. They borrow aspects of navigational and hierarchical databases, which can answer such queries faster than relational databases.[4]

Nigel Pendse has suggested that an alternative and perhaps more descriptive term to describe the concept of OLAP is Fast Analysis of Shared Multidimensional Information (FASMI).[5]

The output of an OLAP query is typically displayed in a matrix (or pivot) format. The dimensions form the rows and columns of the matrix; the measures form the values.
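The pivot layout described above can be sketched in a few lines of Python; all of the data and the dimension names here (region, quarter) are assumed purely for illustration:

```python
# Minimal sketch of a pivot: one dimension across the rows, one across
# the columns, and the measure in the cells. Data is invented.
sales = [  # (region, quarter, amount)
    ("east", "Q1", 100), ("east", "Q2", 150),
    ("west", "Q1", 80),  ("west", "Q2", 120),
]

rows = sorted({r for r, _, _ in sales})
cols = sorted({q for _, q, _ in sales})
cell = {(r, q): a for r, q, a in sales}

# Print the matrix: dimensions label rows and columns, measures fill it.
print("      " + "   ".join(cols))
for r in rows:
    print(r.ljust(6) + "  ".join(str(cell[(r, q)]).rjust(3) for q in cols))
```

Real OLAP clients build the same matrix from a query result rather than from an in-memory list, but the row/column/cell roles are identical.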

Contents


· 1 Concept

· 1.1 Multidimensional databases

· 2 Aggregations

· 3 Types

· 3.1 Multidimensional

· 3.2 Relational

· 3.3 Hybrid

· 3.4 Comparison

· 3.5 Other types

· 4 APIs and query languages

· 5 Products

· 5.1 History

· 5.2 Market structure

· 6 See also

· 7 Bibliography

· 8 References

Concept

At the core of any OLAP system is the concept of an OLAP cube (also called a multidimensional cube or a hypercube). It consists of numeric facts called measures, which are categorized by dimensions. The cube metadata is typically created from a star schema or snowflake schema of tables in a relational database. Measures are derived from the records in the fact table, and dimensions are derived from the dimension tables.

Each measure can be thought of as having a set of labels, or metadata, associated with it. A dimension is what describes these labels; it provides information about the measure.

A simple example would be a cube that contains a store's sales as a measure, and Date/Time as a dimension. Each Sale has a Date/Time label that describes more about that sale.

Any number of dimensions can be added to the structure such as Store, Cashier, or Customer by adding a column to the fact table. This allows an analyst to view the measures along any combination of the dimensions.

For Example:

Sales Fact Table
+-----------------------+
| sale_amount | time_id |
+-----------------------+         Time Dimension
|     2008.08 |    1234 |---+     +------------------------------+
+-----------------------+   |     | time_id | timestamp          |
                            |     +------------------------------+
                            +---->| 1234    | 20080902 12:35:43  |
                                  +------------------------------+
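Conceptually, answering a query against this schema means resolving each fact row's foreign key against the dimension table and then aggregating the measure. A minimal Python sketch, using the (assumed) row values from the diagram above:

```python
# Toy star schema: a fact table of sales and a Time dimension table,
# using the example values from the diagram above.
fact_rows = [
    {"sale_amount": 2008.08, "time_id": 1234},
]
time_dim = {
    1234: {"timestamp": "20080902 12:35:43"},
}

# Resolve each fact row's time_id against the dimension table, then
# aggregate (sum) the measure along the date part of the timestamp.
sales_by_date = {}
for row in fact_rows:
    date = time_dim[row["time_id"]]["timestamp"].split()[0]  # "20080902"
    sales_by_date[date] = sales_by_date.get(date, 0.0) + row["sale_amount"]

print(sales_by_date)  # {'20080902': 2008.08}
```

Adding a Store or Customer dimension just means adding another foreign-key column to the fact rows and another lookup table, exactly as the text describes.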

Multidimensional databases

A multidimensional structure is defined as “a variation of the relational model that uses multidimensional structures to organize data and express the relationships between data” (O'Brien & Marakas, 2009, pg. 177). The structure is broken into cubes, and each cube can store and access data within its own confines. “Each cell within a multidimensional structure contains aggregated data related to elements along each of its dimensions” (pg. 178). Even when the data is manipulated it remains easy to access, and the database stays compact; the data remains interrelated. The multidimensional structure is quite popular for analytical databases that use online analytical processing (OLAP) applications (O’Brien & Marakas, 2009). Analytical databases use this structure because of its ability to deliver answers to complex business queries quickly. Data can be viewed from different perspectives, which gives a broader picture of a problem than other models do (Williams, Garza, Tucker & Marcus, 1994).

Aggregations

It has been claimed that for complex queries OLAP cubes can produce an answer in around 0.1% of the time required for the same query on OLTP relational data.[6][7] The most important mechanism that allows OLAP to achieve such performance is the use of aggregations. Aggregations are built from the fact table by changing the granularity on specific dimensions and aggregating the data along them. The number of possible aggregations is determined by every possible combination of dimension granularities.
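In the simplest case, where each dimension has a single level, every aggregation corresponds to a subset of the dimensions, so a cube with n dimensions admits 2**n aggregations. A small sketch (the dimension names are arbitrary):

```python
from itertools import combinations

# With one level per dimension, each aggregation corresponds to choosing
# a subset of the dimensions to keep, so n dimensions give 2**n possible
# aggregations, from the full base data down to the single grand total.
dimensions = ["store", "cashier", "customer"]

aggregations = [combo for r in range(len(dimensions) + 1)
                for combo in combinations(dimensions, r)]

print(len(aggregations))  # 2**3 = 8
print(aggregations[0])    # (), the grand-total aggregation
```

With hierarchies, each dimension contributes one choice per level (plus "all"), so the count becomes the product of those per-dimension choices rather than a plain power of two.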

The combination of all possible aggregations and the base data contains the answers to every query which can be answered from the data.[8]

Because usually there are many aggregations that can be calculated, often only a predetermined number are fully calculated; the remainder are solved on demand. The problem of deciding which aggregations (views) to calculate is known as the view selection problem. View selection can be constrained by the total size of the selected set of aggregations, the time to update them from changes in the base data, or both. The objective of view selection is typically to minimize the average time to answer OLAP queries, although some studies also minimize the update time. View selection is NP-Complete. Many approaches to the problem have been explored, including greedy algorithms, randomized search, genetic algorithms and A* search algorithm.
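In the spirit of the greedy algorithms mentioned above, view selection can be sketched as: repeatedly materialize the aggregation that most reduces total query cost, where each query is answered from the smallest materialized view that covers it. The lattice, view sizes and cost model below are all assumed for illustration, not taken from any particular published algorithm:

```python
from itertools import combinations

# Toy aggregation lattice: a view is the set of dimensions it groups by,
# and a view v can answer a query q iff q's dimensions are a subset of v's.
dims = ("store", "product", "time")
views = [frozenset(c) for r in range(len(dims) + 1)
         for c in combinations(dims, r)]

# Hypothetical row counts per view (assumed purely for illustration).
size = {v: 10 ** len(v) for v in views}  # base cube: 1000 rows

def query_cost(q, materialized):
    # A query is answered by scanning the smallest materialized view
    # that covers it.
    return min(size[v] for v in materialized if q <= v)

def greedy_select(k):
    # Greedily materialize k extra views; the base cube is always kept.
    chosen = {frozenset(dims)}
    for _ in range(k):
        best = min((v for v in views if v not in chosen),
                   key=lambda v: sum(query_cost(q, chosen | {v})
                                     for q in views))
        chosen.add(best)
    return chosen

picked = greedy_select(1) - {frozenset(dims)}
```

With these sizes the first view picked is a two-dimension aggregation: it is cheap enough to scan yet covers four of the eight possible queries, which is exactly the benefit-versus-size trade-off that view selection formalizes.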

A very effective way to support aggregation and other common OLAP operations is the use of bitmap indexes.
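A bitmap index maps each distinct value of a low-cardinality column to a bit-vector with one bit per row, so predicates become bitwise operations and aggregation only scans the set bits. A toy sketch (column values are assumed):

```python
# Sketch of a bitmap index over an invented low-cardinality column.
regions = ["east", "west", "east", "east", "west"]
amounts = [10, 20, 30, 40, 50]

# Build the index: value -> integer used as a bit-vector (bit i = row i).
bitmap = {}
for i, r in enumerate(regions):
    bitmap[r] = bitmap.get(r, 0) | (1 << i)

# Sum amounts where region == "east" by walking only the set bits.
east = bitmap["east"]
total = sum(a for i, a in enumerate(amounts) if east >> i & 1)
print(total)  # 10 + 30 + 40 = 80
```

Combining predicates (e.g. region == "east" AND quarter == "Q1") is then a single bitwise AND of two bit-vectors, which is why bitmap indexes suit the multi-dimension filters typical of OLAP.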

Types

OLAP systems have been traditionally categorized using the following taxonomy.[9]

Multidimensional

Main article: MOLAP

MOLAP is the 'classic' form of OLAP and is sometimes referred to as just OLAP. MOLAP stores data in optimized multi-dimensional array storage, rather than in a relational database. It therefore requires the pre-computation and storage of information in the cube, an operation known as processing.

Relational

Main article: ROLAP

ROLAP works directly with relational databases. The base data and the dimension tables are stored as relational tables, and new tables are created to hold the aggregated information. ROLAP depends on a specialized schema design.

Hybrid

Main article: HOLAP

There is no clear agreement across the industry as to what constitutes "Hybrid OLAP", except that a database will divide data between relational and specialized storage. For example, for some vendors, a HOLAP database will use relational tables to hold the larger quantities of detailed data, and use specialized storage for at least some aspects of the smaller quantities of more-aggregate or less-detailed data.

Comparison

Each type has certain benefits, although there is disagreement about the specifics of the benefits between providers.

· Some MOLAP implementations are prone to database explosion, a phenomenon causing vast amounts of storage space to be used by MOLAP databases when certain common conditions are met: a high number of dimensions, pre-calculated results and sparse multidimensional data. The typical mitigation technique for database explosion is not to materialize all the possible aggregations, but only an optimal subset of aggregations based on the desired performance vs. storage trade-off.

· MOLAP generally delivers better performance due to specialized indexing and storage optimizations. MOLAP also needs less storage space compared to ROLAP because the specialized storage typically includes compression techniques.[10]

· ROLAP is generally more scalable.[10] However, large-volume pre-processing is difficult to implement efficiently, so it is frequently skipped. ROLAP query performance can therefore suffer tremendously.

· Since ROLAP relies more on the database to perform calculations, it has more limitations in the specialized functions it can use.

· HOLAP encompasses a range of solutions that attempt to mix the best of ROLAP and MOLAP. It can generally pre-process quickly, scale well, and offer good function support.

Other types

The following acronyms are also sometimes used, although they are not as widespread as the ones above:

· WOLAP - Web-based OLAP

· DOLAP - Desktop OLAP

· RTOLAP - Real-Time OLAP

APIs and query languages

Unlike relational databases, which had SQL as the standard query language and widespread APIs such as ODBC, JDBC and OLE DB, the OLAP world lacked such unification for a long time. The first real standard API was Microsoft's OLE DB for OLAP specification, which appeared in 1997 and introduced the MDX query language. Several OLAP vendors, both server and client, adopted it. In 2001 Microsoft and Hyperion announced the XML for Analysis specification, which was endorsed by most of the OLAP vendors. Since this also used MDX as a query language, MDX became the de facto standard.[11]

Products

History

The first product that performed OLAP queries was Express, which was released in 1970 (and acquired by Oracle in 1995 from Information Resources).[12] However, the term did not appear until 1993, when it was coined by Ted Codd, who has been described as "the father of the relational database". Codd's paper[1] resulted from a short consulting assignment that Codd undertook for the former Arbor Software (later Hyperion Solutions, acquired in 2007 by Oracle), as a sort of marketing coup. The company had released its own OLAP product, Essbase, a year earlier. As a result, Codd's "twelve laws of online analytical processing" were explicit in their reference to Essbase. There was some ensuing controversy, and when Computerworld learned that Codd was paid by Arbor, it retracted the article. The OLAP market experienced strong growth in the late 1990s, with dozens of commercial products coming to market. In 1998, Microsoft released its first OLAP server, Microsoft Analysis Services, which drove wide adoption of OLAP technology and moved it into the mainstream.

Market structure

Below is a list of top OLAP vendors in 2006, with figures in millions of United States Dollars.[13]

Vendor                            Global Revenue
Microsoft Corporation                      1,801
Hyperion Solutions Corporation             1,077
Cognos                                       735
Business Objects                             416
MicroStrategy                                416
SAP AG                                       330
Cartesis SA                                  210
Applix                                       205
Infor                                        199
Oracle Corporation                           159
Others                                       152
Total                                      5,700

Microsoft was the only vendor that continuously exceeded the industrial average growth during 2000-2006. Since the above data was collected, Hyperion has been acquired by Oracle, Cartesis by Business Objects, Business Objects by SAP, Applix by Cognos, and Cognos by IBM.[14]

See also


· Business intelligence

· Data warehousing

· Data mining

· Predictive analytics

· Business analytics

· OLTP

Bibliography

· Daniel Lemire (December 2007). "Data Warehousing and OLAP: A Research-Oriented Bibliography". http://www.daniel-lemire.com/OLAP/.

· Erik Thomsen (1997). OLAP Solutions: Building Multidimensional Information Systems, 2nd Edition. John Wiley & Sons. ISBN 978-0471149316.

· O’Brien, J. A., & Marakas, G. M. (2009). Management information systems (9th ed.). Boston, MA: McGraw-Hill/Irwin.

· Williams, C., Garza, V. R., Tucker, S., & Marcus, A. M. (1994, January 24). Multidimensional models boost viewing options. InfoWorld, 16(4).

References

1. ^ a b Codd E.F., Codd S.B., and Salley C.T. (1993). "Providing OLAP (On-line Analytical Processing) to User-Analysts: An IT Mandate". Codd & Date, Inc. http://www.fpm.com/refer/codd.html. Retrieved on 2008-03-05. 

2. ^ Deepak Pareek (2007). Business Intelligence for Telecommunications. CRC Press. 294 pp. ISBN 0849387922. http://books.google.com/books?id=M-UOE1Cp9OEC. Retrieved on 2008-03-18. 

3. ^ "OLAP Council White Paper" (PDF). OLAP Council. 1997. http://www.symcorp.com/downloads/OLAP_CouncilWhitePaper.pdf. Retrieved on 2008-03-18. 

4. ^ Hari Mailvaganam (2007). "Introduction to OLAP - Slice, Dice and Drill!". Data Warehousing Review. http://www.dwreview.com/OLAP/Introduction_OLAP.html. Retrieved on 2008-03-18. 

5. ^ Nigel Pendse (2008-03-03). "What is OLAP? An analysis of what the often misused OLAP term is supposed to mean". OLAP Report. http://www.olapreport.com/fasmi.htm. Retrieved on 2008-03-18. 

6. ^ MicroStrategy, Incorporated (1995). "The Case for Relational OLAP" (PDF). http://www.cs.bgu.ac.il/~dbm031/dw042/Papers/microstrategy_211.pdf. Retrieved on 2008-03-20. 

7. ^ Surajit Chaudhuri and Umeshwar Dayal (1997). "An overview of data warehousing and OLAP technology". SIGMOD Rec. (ACM) 26: 65. doi:10.1145/248603.248616. http://doi.acm.org/10.1145/248603.248616. Retrieved on 2008-03-20. 

8. ^ Gray, Jim; Chaudhuri, Surajit; Bosworth, Adam; Layman, Andrew; Reichart, Don; Venkatrao, Murali; Pellow, Frank; Pirahesh, Hamid (1997). "Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals". J. Data Mining and Knowledge Discovery 1 (1): 29–53. http://citeseer.ist.psu.edu/gray97data.html. Retrieved on 2008-03-20. 

9. ^ Nigel Pendse (2006-06-27). "OLAP architectures". OLAP Report. http://www.olapreport.com/Architectures.htm. Retrieved on 2008-03-17. 

10. ^ a b Bach Pedersen, Torben; S. Jensen (December 2001). "Multidimensional Database Technology" (PDF). Distributed Systems Online (IEEE): 40–46. ISSN 0018-9162. http://ieeexplore.ieee.org/iel5/2/20936/00970558.pdf. 

11. ^ Nigel Pendse (2007-08-23). "Commentary: OLAP API wars". OLAP Report. http://www.olapreport.com/Comment_APIs.htm. Retrieved on 2008-03-18. 

12. ^ Nigel Pendse (2007-08-23). "The origins of today’s OLAP products". OLAP Report. http://olapreport.com/origins.htm. Retrieved on November 27. 

13. ^ Nigel Pendse (2006). "OLAP Market". OLAP Report. http://www.olapreport.com/market.htm. Retrieved on 2008-03-17. 

14. ^ Nigel Pendse (2008-03-07). "Consolidations in the BI industry". http://www.olapreport.com/consolidations.htm. Retrieved on 2008-03-18.

OLAP

From the Portuguese Wikipédia, the free encyclopedia

OLAP, or On-Line Analytical Processing, is the capability to manipulate and analyze a large volume of data from multiple perspectives.

OLAP applications are used by managers at any level of an organization to enable the comparative analyses that support their day-to-day decision making.

It is classified into DOLAP, ROLAP, MOLAP and HOLAP.

External links

OLAP - On Line Analytical Processing

What is OLAP

OLAP - On-Line Analytical Processing is the technology that gives users (generally directors, presidents and managers) fast access to view and analyze data with high flexibility and performance. This high performance is due to the multidimensional model, which simplifies the querying process. It is classified into DOLAP, ROLAP, MOLAP and HOLAP.

DOLAP - Desktop On-Line Analytical Processing

These are tools that fire a query from the workstation to the server, which in turn responds by sending a micro-cube back to be analyzed on the client workstation.

Advantage: little network traffic, since processing happens on the client workstation, and greater agility in analyzing the data.

Disadvantage: the micro-cube cannot be large; otherwise the analysis becomes slow, and the client machine may not cope, depending on its configuration.

ROLAP - Relational On-Line Analytical Processing

These are tools that send SQL queries to the relational database server, where they are processed; the processing therefore takes place entirely on the server.

Advantage: allows the analysis of large volumes of data, since processing happens on the server side rather than on the client workstation.

Disadvantage: if many requests are made to the server simultaneously, it may become slow or even unavailable, depending on its configuration, precisely because it has to process every request from every client.

MOLAP - Multidimensional On-Line Analytical Processing

These are tools that make their requests directly to the multidimensional database server. The user manipulates the data directly on the server.

Advantage: better performance, and it allows querying large volumes of data, since processing is done directly on the server.

Disadvantage: the tools are expensive, and there is also a scalability problem.

HOLAP - Hybrid On-Line Analytical Processing

These are hybrid tools, i.e. a combination of ROLAP and MOLAP. Advantage: mixing the two technologies obtains the best of each: ROLAP (scalability) + MOLAP (high performance).

Disadvantage: the tools are expensive.

· Business Objects
· Cognos
· Hyperion
· MicroStrategy
· MV Business Analytics Suite
· Oracle BI Enterprise Edition
· Pentaho

What is OLAP?

An analysis of what the often misused OLAP term is supposed to mean

You can contact Nigel Pendse, the author of this section, by e-mail on [email protected] if you have any comments or observations. Last updated on March 3, 2008.

The term, of course, stands for ‘On-Line Analytical Processing’. Unfortunately, this is neither a meaningful definition nor a description of what OLAP means. It certainly gives no indication of why you would want to use an OLAP tool, or even what an OLAP tool actually does. And it gives you no help in deciding if a product is an OLAP tool or not. It was simply chosen as a term to contrast with OLTP, on-line transaction processing, which is much more meaningful.

We hit this problem as soon as we started researching The OLAP Report in late 1994, as we needed to decide which products fell into the category. Deciding what is an OLAP tool has not become any easier since then, as more and more vendors claim to have ‘OLAP compliant’ products, whatever that may mean (often they don’t even know). It is not possible to rely on the vendors’ own descriptions, and membership of the long-defunct OLAP Council was not a reliable indicator of whether or not a company produces OLAP products. For example, several significant OLAP vendors were never members or resigned, and several members were not OLAP vendors. Membership of the instantly moribund replacement Analytical Solutions Forum was even less of a guide, as it was intended to include non-OLAP vendors.

The Codd rules also turned out to be an unsuitable way of detecting ‘OLAP compliance’, so we were forced to create our own definition. It had to be simple, memorable and product-independent, and the resulting definition is the ‘FASMI’ test. The key thing that all OLAP products have in common is multidimensionality, but that is not the only requirement for an OLAP product.

This is copyright material. You can make brief references to it freely, with attribution, but not reproduce large sections or the entire article without permission from the publisher. You are free to link to this page without permission.

 

 


The FASMI test

We wanted to define the characteristics of an OLAP application in a specific way, without dictating how it should be implemented. As our research has shown, there are many ways of implementing OLAP compliant applications, and no single piece of technology should be officially required, or even recommended. Of course, we have studied the technologies used in commercial OLAP products and this report provides many such details. We have suggested in which circumstances one approach or another might be preferred, and have also identified areas where we feel that all the products currently fall short of what we regard as a technology ideal.

Our definition is designed to be short and easy to remember — 12 rules or 18 features are far too many for most people to carry in their heads; we are pleased that we were able to summarize the OLAP definition in just five key words: Fast Analysis of Shared Multidimensional Information — or, FASMI for short.

This definition was first used by us in early 1995, and we are very pleased that it has not needed revision in the years since. This definition has now been widely adopted and is cited in over 120 Web sites in about 30 countries.

FAST means that the system is targeted to deliver most responses to users in less than five seconds, with the simplest analyses taking no more than one second and very few taking more than 20 seconds. Even if users have been warned that it will take more than a few seconds, they are soon likely to get distracted and lose their chain of thought, so the quality of analysis suffers. This speed is not easy to achieve with large amounts of data, particularly if on-the-fly and ad hoc calculations are required. Vendors resort to a wide variety of techniques to achieve this goal, including specialized forms of data storage, extensive pre-calculations and specific hardware requirements, but we do not think any products are yet fully optimized, so we expect this to be an area of developing technology. In particular, the full pre-calculation approach fails with very large, sparse applications as the databases simply get too large (the database explosion problem), whereas doing everything on-the-fly is much too slow with large databases, even if exotic hardware is used. Even though it may seem miraculous at first if reports that previously took days now take only minutes, users soon get bored of waiting, and the project will be much less successful than if it had delivered a near instantaneous response, even at the cost of less detailed analysis. The BI and OLAP Surveys have found that slow query response is consistently the most often-cited technical problem with OLAP products, so too many deployments are clearly still failing to pass this test. Indeed, there are strong indications that users are becoming ever more demanding, so query responses that would have been considered adequate just a few years ago are now regarded as painfully slow. After all, if Google can search a large proportion of all the on-line information in the world in a quarter of a second, why should relatively tiny amounts of management information take orders of magnitude longer to query?

ANALYSIS means that the system can cope with any business logic and statistical analysis that is relevant for the application and the user, and keep it easy enough for the target user. Although some pre-programming may be needed, we do not think it acceptable if all application definitions have to be done using a professional 4GL. It is certainly necessary to allow the user to define new ad hoc calculations as part of the analysis and to report on the data in any desired way, without having to program, so we exclude products (like Oracle Discoverer) that do not allow adequate end-user oriented calculation flexibility. We do not mind whether this analysis is done in the vendor's own tools or in a linked external product such as a spreadsheet, simply that all the required analysis functionality be provided in an intuitive manner for the target users. This could include specific features like time series analysis, cost allocations, currency translation, goal seeking, ad hoc multidimensional structural changes, non-procedural modeling, exception alerting, data mining and other application dependent features. These capabilities differ widely between products, depending on their target markets.

SHARED means that the system implements all the security requirements for confidentiality (possibly down to cell level) and, if multiple write access is needed, concurrent update locking at an appropriate level. Not all applications need users to write data back, but for the growing number that do, the system should be able to handle multiple updates in a timely, secure manner. This is a major area of weakness in many OLAP products, which tend to assume that all OLAP applications will be read-only, with simplistic security controls. Even products with multi-user read-write often have crude security models; an example is Microsoft OLAP Services.

MULTIDIMENSIONAL is our key requirement. If we had to pick a one-word definition of OLAP, this is it. The system must provide a multidimensional conceptual view of the data, including full support for hierarchies and multiple hierarchies, as this is certainly the most logical way to analyze businesses and organizations. We are not setting up a specific minimum number of dimensions that must be handled as it is too application dependent and most products seem to have enough for their target markets. Again, we do not specify what underlying database technology should be used providing that the user gets a truly multidimensional conceptual view.

INFORMATION is all of the data and derived information needed, wherever it is and however much is relevant for the application. We are measuring the capacity of various products in terms of how much input data they can handle, not how many Gigabytes they take to store it. The capacities of the products differ greatly — the largest OLAP products can hold at least a thousand times as much data as the smallest. There are many considerations here, including data duplication, RAM required, disk space utilization, performance, integration with data warehouses and the like.

We think that the FASMI test is a reasonable and understandable definition of the goals OLAP is meant to achieve. We encourage users and vendors to adopt this definition, which we hope will avoid the controversies of previous attempts.

The techniques used to achieve it include many flavors of client/server architecture, time series analysis, object-orientation, optimized proprietary data storage, multithreading and various patented ideas that vendors are so proud of. We have views on these as well, but we would not want any such technologies to become part of the definition of OLAP. Vendors who are covered in this report had every chance to tell us about their technologies, but it is their ability to achieve OLAP goals for their chosen application areas that impressed us most.

Dr Edgar “Ted” Codd (1923-2003)

It is with sadness that I learned of the death last week of Dr Ted Codd, the inventor of the relational database model. I was fortunate enough to meet Dr Codd in October 1994, shortly after he, in a white paper commissioned by Arbor Software (now part of Hyperion Solutions), first coined the term OLAP. I was chairing a conference in London (the same conference at which I first met Nigel Pendse) and Dr Codd gave the keynote address. He explained how analytical databases were a necessary companion to databases built on the relational model which he invented in 1969. It is easy to forget today, when the relational database is ubiquitous, that there was a time when it was far from the dominant standard and, in fact, competed with network, hierarchical and other types of databases. Dr Codd defended his invention strongly. Even when Honeywell MRDS, the first commercial relational data base, was released in 1976, there were still many detractors. By the time Oracle released its relational database in 1979 and started to gain traction with the market, Dr Codd had spent ten long years defending his invention. It was not until the early 80’s that the relational database emerged as a clear standard.

Subsequently I was fortunate enough to share the podium with Dr Codd and his knowledgeable wife, Sharon, as we gave many presentations on the subject of OLAP at conferences around North America. This gave me a chance to get to know both Ted and Sharon on a more personal level. To hear Ted explain how he landed flying boats on lakes in Africa during the second World War made me realize that there was much more to Ted than the public face of this man who revolutionized computing in his lifetime.

The invention of the relational model is well understood to be a major factor in making modern computing what it is today. ERP systems could not have evolved to where they are without a strong database standard such as the relational model. Modern e-commerce Web sites are dependent on relational technology. But relational technology is equally crucial to those of us in the OLAP world. The source data for our OLAP system comes almost exclusively from relational sources, and it is reassuring to know that the man who invented the relational model, also recognized that it could not provide, without help, the rich analytics that business needs. In the 1994 white paper Dr Codd wrote, “Attempting to force one technology or tool to satisfy a particular need for which another tool is more effective and efficient is like attempting to drive a screw into a wall with a hammer when a screwdriver is at hand: the screw may eventually enter the wall but at what cost?”

Thank you, Ted Codd.

Richard Creeth, April 22, 2003

The Codd rules and features

In 1993, E.F. Codd & Associates published a white paper, commissioned by Arbor Software (now Hyperion Solutions), entitled ‘Providing OLAP (On-line Analytical Processing) to User-Analysts: An IT Mandate’. The late Dr Codd was very well known as a respected database researcher from the 1960s through to the late 1980s and is credited with being the inventor of the relational database model in 1969. Unfortunately, his OLAP rules proved to be controversial due to being vendor-sponsored, rather than mathematically based.

It is also unclear how much involvement Dr Codd himself had with the OLAP work, but it seems likely that his role was very limited, with more of the work being done by his wife and a temporary researcher than by Dr Codd himself. Several of the rules seem to have been invented by the sponsoring vendor, not Dr Codd. The white paper should therefore be regarded as a vendor-published brochure (which it was) rather than as a serious research paper (which it was not). Note that this paper was not published by Codd & Date, and Chris Date has never endorsed Codd’s OLAP work.

The OLAP white paper included 12 rules, which are now well known (and available for download from vendors’ Web sites). They were followed by another six (much less well known) rules in 1995 and Dr Codd also restructured the rules into four groups, calling them ‘features’. The features are briefly described and evaluated here, but they are now rarely quoted and little used.

Basic Features (B)

F1: Multidimensional Conceptual View (Original Rule 1). Few would argue with this feature; like Dr Codd, we believe this to be the central core of OLAP. Dr Codd included ‘slice and dice’ as part of this requirement.

F2: Intuitive Data Manipulation (Original Rule 10). Dr Codd preferred data manipulation to be done through direct actions on cells in the view, without recourse to menus or multiple actions. One assumes that this is by using a mouse (or equivalent), but Dr Codd did not actually say so. Many products fail on this, because they do not necessarily support double clicking or drag and drop. The vendors, of course, all claim otherwise. In our view, this feature adds little value to the evaluation process. We think that products should offer a choice of modes (at all times), because not all users like the same approach.

F3: Accessibility: OLAP as a Mediator (Original Rule 3). In this rule, Dr Codd essentially described OLAP engines as middleware, sitting between heterogeneous data sources and an OLAP front-end. Most products can achieve this, but often with more data staging and batching than vendors like to admit.

F4: Batch Extraction vs Interpretive (New). This rule effectively required that products offer both their own staging database for OLAP data as well as offering live access to external data. We agree with Dr Codd on this feature and are disappointed that only a minority of OLAP products properly comply with it, and even those products do not often make it easy or automatic. In effect, Dr Codd was endorsing multidimensional data staging plus partial pre-calculation of large multidimensional databases, with transparent reach-through to underlying detail. Today, this would be regarded as the definition of a hybrid OLAP, which is indeed becoming a popular architecture, so Dr Codd has proved to be very perceptive in this area.

F5: OLAP Analysis Models (New). Dr Codd required that OLAP products should support all four analysis models that he described in his white paper (Categorical, Exegetical, Contemplative and Formulaic). We hesitate to simplify Dr Codd’s erudite phraseology, but we would describe these as parameterized static reporting, slicing and dicing with drill down, ‘what if?’ analysis and goal seeking models, respectively. All OLAP tools in this Report support the first two (but some other claimants do not fully support the second), most support the third to some degree (but probably less than Dr Codd would have liked) and few support the fourth to any usable extent. Perhaps Dr Codd was anticipating data mining in this rule?

F6: Client/Server Architecture (Original Rule 5). Dr Codd required not only that the product should be client/server but that the server component of an OLAP product should be sufficiently intelligent that various clients could be attached with minimum effort and programming for integration. This is a much tougher test than simple client/server, and relatively few products qualify. We would argue that this test is probably tougher than it needs to be, and we prefer not to dictate architectures. However, if you do agree with the feature, then you should be aware that most vendors who claim compliance do so wrongly. In effect, this is also an indirect requirement for openness on the desktop. Perhaps Dr Codd, without ever using the term, was thinking of what the Web would one day deliver? Or perhaps he was anticipating a widely accepted API standard, which still does not really exist. Perhaps, one day, XML for Analysis will fill this gap.

F7: Transparency (Original Rule 2). This test was also a tough but valid one. Full compliance means that a user of, say, a spreadsheet should be able to get full value from an OLAP engine and not even be aware of where the data ultimately comes from. To do this, products must allow live access to heterogeneous data sources from a full function spreadsheet add-in, with the OLAP server engine in between. Although all vendors claimed compliance, many did so by outrageously rewriting Dr Codd’s words. Even Dr Codd’s own vendor-sponsored analyses of Essbase and (then) TM/1 ignore part of the test. In fact, there are a few products that do fully comply with the test, including Analysis Services, Express, and Holos, but neither Essbase nor iTM1 (because they do not support live, transparent access to external data), in spite of Dr Codd’s apparent endorsement. Most products fail to give either full spreadsheet access or live access to heterogeneous data sources. Like the previous feature, this is a tough test for openness.

F8: Multi-User Support (Original Rule 8). Dr Codd recognized that OLAP applications were not all read-only and said that, to be regarded as strategic, OLAP tools must provide concurrent access (retrieval and update), integrity and security. We agree with Dr Codd, but also note that many OLAP applications are still read-only. Again, all the vendors claim compliance but, on a strict interpretation of Dr Codd’s words, few are justified in so doing.

Special Features (S)

F9: Treatment of Non-Normalized Data (New). This refers to the integration between an OLAP engine and denormalized source data. Dr Codd pointed out that any data updates performed in the OLAP environment should not be allowed to alter stored denormalized data in feeder systems. He could also be interpreted as saying that data changes should not be allowed in what are normally regarded as calculated cells within the OLAP database. For example, Essbase allows this, and Dr Codd would perhaps have disapproved.

F10: Storing OLAP Results: Keeping Them Separate from Source Data (New). This is really an implementation rather than a product issue, but few would disagree with it. In effect, Dr Codd was endorsing the widely-held view that read-write OLAP applications should not be implemented directly on live transaction data, and OLAP data changes should be kept distinct from transaction data. The method of data write-back used in Microsoft Analysis Services is the best implementation of this, as it allows the effects of data changes even within the OLAP environment to be kept segregated from the base data.

F11: Extraction of Missing Values (New). All missing values are cast in the uniform representation defined by the Relational Model Version 2. We interpret this to mean that missing values are to be distinguished from zero values. In fact, in the interests of storing sparse data more compactly, a few OLAP tools such as TM1 do break this rule, without great loss of function.

F12: Treatment of Missing Values (New). All missing values to be ignored by the OLAP analyzer regardless of their source. This relates to Feature 11, and is probably an almost inevitable consequence of how multidimensional engines treat all data.

Reporting Features (R)

F13: Flexible Reporting (Original Rule 11). Dr Codd required that the dimensions can be laid out in any way that the user requires in reports. We would agree, and most products are capable of this in their formal report writers. Dr Codd did not explicitly state whether he expected the same flexibility in the interactive viewers, perhaps because he was not aware of the distinction between the two. We prefer that it is available, but note that relatively fewer viewers are capable of it. This is one of the reasons that we prefer that analysis and reporting facilities be combined in one module.

F14: Uniform Reporting Performance (Original Rule 4). Dr Codd required that reporting performance be not significantly degraded by increasing the number of dimensions or database size. Curiously, nowhere did he mention that the performance must be fast, merely that it be consistent. In fact, our experience suggests that merely increasing the number of dimensions or database size does not affect performance significantly in fully pre-calculated databases, so Dr Codd could be interpreted as endorsing this approach — which may not be a surprise given that Arbor Software sponsored the paper. However, reports with more content or more on-the-fly calculations usually take longer (in the good products, performance is almost linearly dependent on the number of cells used to produce the report, which may be more than appear in the finished report) and some dimensional layouts will be slower than others, because more disk blocks will have to be read. There are differences between products, but the principal factor that affects performance is the degree to which the calculations are performed in advance and where live calculations are done (client, multidimensional server engine or RDBMS). This is far more important than database size, number of dimensions or report complexity.

F15: Automatic Adjustment of Physical Level (Supersedes Original Rule 7). Dr Codd required that the OLAP system adjust its physical schema automatically to adapt to the type of model, data volumes and sparsity. We agree with him, but are disappointed that most vendors fall far short of this noble ideal. We would like to see more progress in this area and also in the related area of determining the degree to which models should be pre-calculated (a major issue that Dr Codd ignores). The Panorama technology, acquired by Microsoft in October 1996, broke new ground here, and users can now benefit from it in Microsoft Analysis Services.

Dimension Control (D)

F16: Generic Dimensionality (Original Rule 6). Dr Codd took the purist view that each dimension must be equivalent in both its structure and operational capabilities. This may not be unconnected with the fact that this is an Essbase characteristic. However, he did allow additional operational capabilities to be granted to selected dimensions (presumably including time), but he insisted that such additional functions should be grantable to any dimension. He did not want the basic data structures, formulae or reporting formats to be biased towards any one dimension. This has proven to be one of the most controversial of all the original 12 rules. Technology focused products tend to largely comply with it, so the vendors of such products support it. Application focused products usually make no effort to comply, and their vendors bitterly attack the rule. With a strictly purist interpretation, few products fully comply. We would suggest that if you are purchasing a tool for general purpose, multiple application use, then you want to consider this rule, but even then with a lower priority. If you are buying a product for a specific application, you may safely ignore the rule.

F17: Unlimited Dimensions & Aggregation Levels (Original Rule 12). Technically, no product can possibly comply with this feature, because there is no such thing as an unlimited entity on a limited computer. In any case, few applications need more than about eight or ten dimensions, and few hierarchies have more than about six consolidation levels. Dr Codd suggested that if a maximum must be accepted, it should be at least 15 and preferably 20; we believe that this is too arbitrary and takes no account of usage. You should ensure that any product you buy has limits that are greater than you need, but there are many other limiting factors in OLAP products that are liable to trouble you more than this one. In practice, therefore, you can probably ignore this requirement.

F18: Unrestricted Cross-dimensional Operations (Original Rule 9). Dr Codd asserted, and we agree, that all forms of calculation must be allowed across all dimensions, not just the ‘measures’ dimension. In fact, many products which use only relational storage are weak in this area. Most products, such as Essbase, with a multidimensional database are strong. These types of calculations are important if you are doing complex calculations, not just cross tabulations, and are particularly relevant in applications that analyze profitability.


Category:OLAP History


Contents


· 1 The History of OLAP

· 2 Birth of the Multidimensional Analysis through the APL

· 3 Express, an Enduring Example

· 4 System W for Financial Applications

· 5 Metaphor, the Beginning of the Client/Server

· 6 The New MIS Using GUI

· 7 PowerOLAP, Real-time Data and Excel Integration

· 8 The Spread of Spreadsheets

The History of OLAP

OLAP is not a new concept; it has persisted through the decades, and its origins can be traced back to 1962. It was not until 1993, however, that the term OLAP was coined, in a white paper authored by the highly esteemed database researcher Ted Codd, who also established the 12 rules for an OLAP product. Like many other technologies, OLAP has undergone several stages of evolution whose patterns of progress are intricate to follow.

Birth of the Multidimensional Analysis through the APL

It was Kenneth Iverson who first introduced the foundation of OLAP in his book “A Programming Language”, which defined a mathematical language with processing operators and multidimensional variables. APL is regarded as the first multidimensional language, and IBM implemented it as a computer programming language in the late 1960s.

Iverson created a terse notation by employing Greek symbols as operators. High-resolution GUIs had not yet appeared, and because APL used Greek symbols it required special hardware support: special keyboards, screens and printers. On top of this, since early APL programs were interpreted rather than compiled, they used machine resources inefficiently and were known for consuming large amounts of RAM, to name only a few drawbacks. Maintaining APL-based mainframe products was very costly, and most programmers found it difficult to program multidimensional applications using arrays in other languages.

Eventually APL declined in market significance, though it still survives to a limited degree. Although it was never deemed a modern OLAP tool, several of its ideas live on in modern multidimensional applications.

Express, an Enduring Example

A new multidimensional product, Express, emerged in 1970 and became a popular OLAP offering. It was the first multidimensional tool aimed at marketing applications. It later evolved into a hybrid OLAP after its acquisition by Oracle and has thrived for more than three decades, remaining one of the best-marketed multidimensional products even today. One of Express’ more famous successors is Oracle9i OLAP. Although several enhanced versions have been released over the years, the underlying concepts and data models remain unchanged.

The 1980s played a significant role in the advancement of the OLAP industry, triggering the rise of many multidimensional products.

System W for Financial Applications

In 1981 Comshare developed a new decision support system, System W, in an attempt to expand its market and the services it offered. System W was the first OLAP tool to cater to financial applications and the first to apply the hypercube approach to multidimensional modeling. Though it proved a profitable venture for Comshare for some time, it never achieved great market success and was little favored by technical people, as it was more difficult to program than comparable software. It also consumed a great deal of machine resources and often suffered from database explosion.

APL was also released for UNIX but never promoted as an OLAP tool. System W is no longer marketed, though it still operates, to a limited extent, on a few IBM mainframes. Other products that replicated System W concepts followed, such as Comshare’s DOS-based One-Up and the Windows-based Commander Prism, but made little mark on the industry. In 1992 Essbase was launched by Arbor Software (later Hyperion Solutions) and by 1997 had become a major OLAP server product in the market. Like the original product, however, this descendant application also suffered from database explosion, a problem Hyperion finally resolved with the release of its Essbase 7X version.

Metaphor, the Beginning of the Client/Server

A couple of years after the release of System W, Metaphor, generally considered the first ROLAP product, entered the OLAP market. This multidimensional product established new concepts such as client/server computing, multidimensional processing on relational data, workgroup processing and object-oriented development, and was designed principally for consumer-goods companies. Metaphor’s vendor was compelled to build proprietary PCs and networks, since the hardware of the day could barely support the product’s requirements.

In 1991, IBM acquired Metaphor and relaunched the product under the new name IDS. The product remains operational, supporting its remaining loyal users.

The New MIS Using GUI

A new type of management information system emerged during the mid-1980s: the Executive Information System, more commonly known as EIS, which emphasized the use of graphical user interfaces (GUIs). In 1985, Pilot released Command Center, branded as the first ever client/server EIS.

Other client/server products that followed included Strategy, Holos and Information Advantage. Pilot eventually phased out Command Center but implemented some of its concepts in its Lightship Server product. Command Center concepts such as automatic time-series handling, multidimensional client/server processing and simplified human factors can still be seen living on in some modern OLAP products.

PowerOLAP, Real-time Data and Excel Integration

Founded in 1997, PARIS Technologies published PowerOLAP™, which represents a milestone in the evolution of OLAP (on-line analytical processing) technology. Like any important evolutionary event, PowerOLAP combines the most advanced features of what came before it with new capabilities. Most significantly, PowerOLAP enables users to reach through seamlessly to access transactional data in a relational database for dynamic OLAP manipulations in a true multidimensional environment. In addition, PowerOLAP employs Excel and the Web as a front end, connecting users throughout an organization with underlying data sources via the tools they know best, direct to their desktops.

The Spread of Spreadsheets

A new end-user analysis tool became a favorite in the late 1980s. The spreadsheet market was growing fast, which compelled some vendors to create multidimensional applications that could reside in a spreadsheet environment.

Compete was the first product to open the market for a multidimensional spreadsheet. Computer Associates later acquired it from its original vendor, along with other spreadsheet products such as SuperCalc and 20/20, then heavily advertised it and offered it at a lower price, but even so it achieved little market significance. CA later released version 5 of SuperCalc, which was clearly influenced by the nearly defunct Compete product.

Improv from Lotus followed Compete. Lotus began developing Improv for the NeXT machine under the code name ‘BackBay’, and Improv was duly launched on NeXT machines. It was a phenomenal success and considerably augmented Lotus’ sales, until the efforts to port Improv to Windows and the Macintosh faltered. The rise of Microsoft’s competing Excel product marked the beginning of Lotus’ decline. Lotus attempted to move Improv down-market in the hope of increasing its marketability, but this did not work out. Excel steadily gained on 1-2-3 and ultimately proved to be the superior product, dominating the market. Microsoft’s integration of the PivotTable feature into Excel was probably one of the most important enhancements to the product, as the PivotTable became the most popular and widely used tool for multidimensional analysis. Over the years, Microsoft continued to produce enhanced versions of Excel, such as Excel 2000 and Excel 2003, showcasing ever more sophisticated PivotTable features that function both as a desktop OLAP (small cubes, generated from large databases, downloaded to PCs for processing, though in Web implementations the cubes usually reside on the server) and as a client to Microsoft Analysis Services.

Sinper Corporation entered the OLAP market in the late 1980s with its multidimensional analysis software product for DOS and Windows, then known as TM/1. Sinper turned TM/1 into a multidimensional back-end server for Excel and 1-2-3, and Arbor’s Essbase followed suit. The market for multidimensional spreadsheets was booming fast, and more and more vendors were attracted to this growing business. Traditional vendors of host-oriented products such as Acumate, Express, Gentia, Holos, Hyperion, Mineshare, MetaCube, PowerPlay and WhiteLight all offered products providing highly integrated spreadsheet access to their OLAP servers.

Soon after came the release of the OLAP@Work Excel add-in, with features that let users make full use of OLAP Services. In 2004 the Excel add-in went mainstream: vendors such as Business Objects, Cognos, Microsoft, MicroStrategy and Oracle launched their own versions of the product. Around the same time, IntelligentApps, a leading vendor of an Analysis Services Excel add-in, was acquired by Sage.

In 2007, Microsoft released PerformancePoint, which delivered more functionality for performance management, having announced the product the previous year.

Pages in category "OLAP History"

The following 4 pages are in this category, out of 4 total.

B

· Business Applications

C

· Codd's Paper

M

· Multidimensional Basics

T

· Types of OLAP Systems

OLAP AND OLAP SERVER DEFINITIONS

OLAP: ON-LINE ANALYTICAL PROCESSING

Defined terms

On-Line Analytical Processing (OLAP) is a category of software technology that enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user.

OLAP functionality is characterized by dynamic multi-dimensional analysis of consolidated enterprise data supporting end user analytical and navigational activities including:

· calculations and modeling applied across dimensions, through hierarchies and/or across members

· trend analysis over sequential time periods

· slicing subsets for on-screen viewing

· drill-down to deeper levels of consolidation

· reach-through to underlying detail data

· rotation to new dimensional comparisons in the viewing area

OLAP is implemented in a multi-user client/server mode and offers consistently rapid response to queries, regardless of database size and complexity. OLAP helps the user synthesize enterprise information through comparative, personalized viewing, as well as through analysis of historical and projected data in various "what-if" data model scenarios. This is achieved through use of an OLAP Server.

OLAP SERVER

An OLAP server is a high-capacity, multi-user data manipulation engine specifically designed to support and operate on multi-dimensional data structures. A multi-dimensional structure is arranged so that every data item is located and accessed based on the intersection of the dimension members which define that item. The design of the server and the structure of the data are optimized for rapid ad-hoc information retrieval in any orientation, as well as for fast, flexible calculation and transformation of raw data based on formulaic relationships. The OLAP Server may either physically stage the processed multi-dimensional information to deliver consistent and rapid response times to end users, or it may populate its data structures in real-time from relational or other databases, or offer a choice of both. Given the current state of technology and the end user requirement for consistent and rapid response times, staging the multi-dimensional data in the OLAP Server is often the preferred method.

OLAP GLOSSARY

Defined terms:

· AGGREGATE

· ANALYSIS, MULTI-DIMENSIONAL

· ARRAY, MULTI-DIMENSIONAL

· CALCULATED MEMBER

· CELL

· CHILDREN

· COLUMN DIMENSION

· CONSOLIDATE

· CUBE

· DENSE

· DERIVED DATA

· DERIVED MEMBERS

· DETAIL MEMBER

· DIMENSION

· DRILL DOWN/UP

· FORMULA

· FORMULA, CROSS-DIMENSIONAL

· GENERATION, HIERARCHICAL

· HIERARCHICAL RELATIONSHIPS

· HORIZONTAL DIMENSION

· HYPERCUBE

· INPUT MEMBERS

· LEVEL, HIERARCHICAL

· MEMBER, DIMENSION

· MEMBER COMBINATION

· MISSING DATA, MISSING VALUE

· MULTI-DIMENSIONAL DATA STRUCTURE

· MULTI-DIMENSIONAL QUERY LANGUAGE

· NAVIGATION

· NESTING (OF MULTI-DIMENSIONAL COLUMNS AND ROWS)

· NON-MISSING DATA

· OLAP CLIENT

· PAGE DIMENSION

· PAGE DISPLAY

· PARENT

· PIVOT

· PRE-CALCULATED/PRE-CONSOLIDATED DATA

· REACH THROUGH

· ROLL-UP

· ROTATE

· ROW DIMENSION

· SCOPING

· SELECTION

· SLICE

· SLICE AND DICE

· SPARSE

· VERTICAL DIMENSION

Definitions:

AGGREGATE

See: Consolidate

ANALYSIS, MULTI-DIMENSIONAL

The objective of multi-dimensional analysis is for end users to gain insight into the meaning contained in databases. The multi-dimensional approach to analysis aligns the data content with the analyst's mental model, hence reducing confusion and lowering the incidence of erroneous interpretations. It also eases navigating the database, screening for a particular subset of data, asking for the data in a particular orientation and defining analytical calculations. Furthermore, because the data is physically stored in a multi-dimensional structure, the speed of these operations is many times faster and more consistent than is possible in other database structures. This combination of simplicity and speed is one of the key benefits of multi-dimensional analysis.

ARRAY, MULTI-DIMENSIONAL

A group of data cells arranged by the dimensions of the data. For example, a spreadsheet exemplifies a two-dimensional array with the data cells arranged in rows and columns, each being a dimension. A three-dimensional array can be visualized as a cube with each dimension forming a side of the cube, including any slice parallel with that side. Higher dimensional arrays have no physical metaphor, but they organize the data in the way users think of their enterprise. Typical enterprise dimensions are time, measures, products, geographical regions, sales channels, etc. Synonyms: Multi-dimensional Structure, Cube, Hypercube
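As a sketch of the idea, a three-dimensional array can be modelled in plain Python as a mapping keyed by one member from each dimension. The dimension names and figures below are invented for illustration; the point is that fixing one member yields a two-dimensional slice parallel to a side of the cube.

```python
# A minimal cube: each cell is addressed by (time, product, region).
# All members and values are hypothetical.
cube = {
    ("Jan", "Widgets", "East"): 100,
    ("Jan", "Widgets", "West"): 150,
    ("Feb", "Widgets", "East"): 120,
    ("Feb", "Widgets", "West"): 130,
}

# A two-dimensional slice parallel to one side of the cube is obtained
# by fixing a single member of the region dimension:
east_slice = {(t, p): v for (t, p, r), v in cube.items() if r == "East"}
print(east_slice)  # {('Jan', 'Widgets'): 100, ('Feb', 'Widgets'): 120}
```

Real OLAP servers store such structures far more compactly and handle sparsity, but the addressing scheme is the same: one member per dimension locates one cell.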

CALCULATED MEMBER

A calculated member is a member of a dimension whose value is determined from other members' values (e.g., by application of a mathematical or logical operation). Calculated members may be part of the OLAP server database or may have been specified by the user during an interactive session. A calculated member is any member that is not an input member.
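The distinction between input and calculated members can be sketched in a few lines of Python; the member names and figures are invented for illustration:

```python
# Input members hold raw values; calculated members are derived from them.
cell = {"Sales": 500.0, "Costs": 350.0}          # input members (hypothetical)
cell["Margin"] = cell["Sales"] - cell["Costs"]   # calculated member
cell["Margin %"] = 100 * cell["Margin"] / cell["Sales"]
print(cell["Margin"], cell["Margin %"])  # 150.0 30.0
```

In an OLAP server the formula would be stored once in the measures dimension and evaluated for every combination of the other dimensions' members, rather than per cell as here.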

CELL

A single datapoint that occurs at the intersection defined by selecting one member from each dimension in a multi-dimensional array. For example, if the dimensions are measures, time, product and geography, then the dimension members: Sales, Janu


OLAP Functionality


In the core of any OLAP system is a concept of an OLAP cube (also called a multidimensional cube or a hypercube). It consists of numeric facts called measures which are categorized by dimensions. The cube metadata is typically created from a star schema or snowflake schema of tables in a relational database. Measures are derived from the records in the fact table and dimensions are derived from the dimension tables.
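The derivation of measures from a fact table and dimensions from dimension tables can be sketched as follows; the tables, keys and amounts are all hypothetical:

```python
from collections import defaultdict

# Star schema, flattened into Python structures for illustration:
facts = [  # fact table rows: (product_id, month_id, amount)
    (1, 1, 100), (1, 2, 120), (2, 1, 80),
]
product_dim = {1: "Widgets", 2: "Gadgets"}   # dimension table
time_dim = {1: "Jan", 2: "Feb"}              # dimension table

# Build cube cells: measures come from the fact rows, cell addresses
# from the dimension tables that the foreign keys point at.
cube = defaultdict(float)
for product_id, month_id, amount in facts:
    cube[(product_dim[product_id], time_dim[month_id])] += amount
print(cube[("Widgets", "Jan")])  # 100.0
```

This mirrors, in miniature, what cube-building does against a relational source: join fact rows to dimension rows and aggregate the measure into each (member, member, …) address.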

OLAP Cube


An OLAP cube is a data structure that allows fast analysis of data. The arrangement of data into cubes overcomes a limitation of relational databases. Relational databases are not well suited for near instantaneous analysis and display of large amounts of data. Instead, they are better suited for creating records from a series of transactions known as OLTP or On-Line Transaction Processing. Although many report-writing tools exist for relational databases, these are slow when the whole database must be summarized.

Contents


· 1 Background

· 1.1 Functionality

· 1.2 Pivot

· 1.3 Hierarchy

· 1.4 OLAP operations

· 1.5 Linking cubes and sparsity

· 1.6 Variance in products

· 2 Technical definition

Background

OLAP cubes can be thought of as extensions to the two-dimensional array of a spreadsheet. For example, a company might wish to analyze some financial data by product, by time period, by city, by type of revenue and cost, and by comparing actual data with a budget. These additional methods of analyzing the data are known as dimensions. Because there can be more than three dimensions in an OLAP system, the term hypercube is sometimes used.

Functionality

The OLAP cube consists of numeric facts called measures which are categorized by dimensions. The cube metadata is typically created from a star schema or snowflake schema of tables in a relational database. Measures are derived from the records in the fact table and dimensions are derived from the dimension tables.

Pivot

A financial analyst might want to view or "pivot" the data in various ways, such as displaying all the cities down the page and all the products across a page. This could be for a specified period, version and type of expenditure. Having seen the data in this particular way, the analyst might then immediately wish to view it in another way. The cube could effectively be re-oriented so that the data displayed now has periods across the page and type of cost down the page. Because this re-orientation involves re-summarizing very large amounts of data, the new view must be generated efficiently to avoid wasting the analyst's time, i.e. within seconds, rather than the hours a relational database and conventional report-writer might take.
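A pivot of this kind can be sketched with pandas (the cities, products, and amounts below are hypothetical):

```python
import pandas as pd

# Hypothetical fact records: one row per (city, product) expenditure.
facts = pd.DataFrame({
    "city":    ["Boston", "Boston", "Denver", "Denver"],
    "product": ["Widgets", "Gadgets", "Widgets", "Gadgets"],
    "amount":  [100, 150, 80, 120],
})

# View 1: cities down the page, products across.
view1 = facts.pivot_table(index="city", columns="product",
                          values="amount", aggfunc="sum")

# "Pivoting" re-orients the same data: products down, cities across.
view2 = view1.T
print(view1)
print(view2)
```

The same summarized numbers appear in both views; only the orientation changes, which is exactly what the analyst's "re-orientation" request amounts to.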

Hierarchy

Each of the elements of a dimension could be summarized using a hierarchy. The hierarchy is a series of parent-child relationships, typically where a parent member represents the consolidation of the members which are its children. Parent members can be further aggregated as the children of another parent.

For example, May 2005 could be summarized into Second Quarter 2005, which in turn would be summarized into the Year 2005. Similarly, the cities could be summarized into regions, countries and then global regions; products could be summarized into larger categories; and cost headings could be grouped into types of expenditure. Conversely, the analyst could start at a highly summarized level, such as the total difference between the actual results and the budget, and drill down into the cube to discover which locations, products and periods had produced this difference.
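The parent-child consolidation described above can be sketched in a few lines of Python (the members and values are invented):

```python
# Hypothetical parent-child hierarchy for the time dimension.
parent = {
    "May 2005": "Q2 2005",
    "Jun 2005": "Q2 2005",
    "Q2 2005":  "2005",
}

# Leaf-level measure values.
leaf_values = {"May 2005": 40, "Jun 2005": 60}

def roll_up(member, values, parent):
    """Sum a member's own value plus the values of all its descendants."""
    total = values.get(member, 0)
    for child, p in parent.items():
        if p == member:
            total += roll_up(child, values, parent)
    return total

print(roll_up("2005", leaf_values, parent))  # 40 + 60 = 100
```

Drilling down is the reverse walk: starting from "2005", follow the parent-child links to its quarters and months to find which children account for a consolidated total.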

OLAP operations

The analyst can understand the meaning contained in the databases using multi-dimensional analysis. By aligning the data content with the analyst's mental model, the chances of confusion and erroneous interpretations are reduced. The analyst can navigate through the database and screen for a particular subset of the data, changing the data's orientations and defining analytical calculations. The user-initiated process of navigating by calling for page displays interactively, through the specification of slices via rotations and drill down/up is sometimes called "slice and dice". Common operations include slice and dice, drill down, roll up, and pivot.

Slice: A slice is a subset of a multi-dimensional array corresponding to a single value for one or more members of the dimensions not in the subset.

Dice: The dice operation is a slice on more than two dimensions of a data cube (or more than two consecutive slices).

Drill Down/Up: Drilling down or up is a specific analytical technique whereby the user navigates among levels of data ranging from the most summarized (up) to the most detailed (down).

Roll-up: A roll-up involves computing all of the data relationships for one or more dimensions. To do this, a computational relationship or formula might be defined.

Pivot: To change the dimensional orientation of a report or page display.
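These operations map naturally onto array indexing and aggregation. A sketch with NumPy (the dimension names and sizes are illustrative):

```python
import numpy as np

# Illustrative 3-D cube: 4 products x 3 regions x 2 years.
cube = np.arange(24).reshape(4, 3, 2)

# Slice: fix one dimension at a single member (here, year index 0).
slice_ = cube[:, :, 0]          # shape (4, 3)

# Dice: restrict several dimensions to member ranges.
dice = cube[0:2, 1:3, :]        # shape (2, 2, 2)

# Roll-up: aggregate a dimension away (sum over regions).
rollup = cube.sum(axis=1)       # shape (4, 2)

# Pivot: re-orient the report by swapping axes.
pivot = slice_.T                # shape (3, 4)
```

Drill down is the inverse of roll-up: moving from the aggregated `rollup` view back to the per-region detail in `cube`.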

Linking cubes and sparsity

The commercial OLAP products have different methods of creating the cubes and hypercubes and of linking cubes and hypercubes (see Types of OLAP in the article on OLAP.)

Linking cubes is a method of overcoming sparsity. Sparsity arises when not every cell in the cube is filled with data, so valuable processing time is taken by effectively adding up zeros. For example, revenues may be available for each customer and product, but cost data may not be available at this level of analysis. Instead of creating a sparse cube, it is sometimes better to create another separate, but linked, cube in which a sub-set of the data can be analyzed in great detail. The linking ensures that the data in the cubes remain consistent.

Variance in products

The data in cubes may be updated at times, perhaps by different people. Techniques are therefore often needed to lock parts of the cube while one of the users is writing to it and to recalculate the cube's totals. Other facilities may allow an alert that shows previously calculated totals are no longer valid after the new data has been added, but some products only calculate the totals when they are needed.

Technical definition

In database theory, an OLAP cube is an abstract representation of a projection of an RDBMS relation. Given a relation of order N, consider a projection that subtends X, Y, and Z as the key and W as the residual attribute. Characterizing this as a function,

W : (X,Y,Z) → W

the attributes X, Y, and Z correspond to the axes of the cube, while the W value into which each ( X, Y, Z ) triple maps corresponds to the data element that populates each cell of the cube.

Insofar as two-dimensional output devices cannot readily characterize four dimensions, it is more practical to project "slices" of the data cube (we say project in the classic vector analytic sense of dimensional reduction, not in the SQL sense, although the two are clearly conceptually homologous), perhaps

W : (X,Y) → W

which may suppress a primary key, but still have some semantic significance, perhaps a slice of the triadic functional representation for a given Z value of interest.

The motivation behind OLAP displays harks back to the cross-tabbed report paradigm of 1980s DBMS. One may wish for a spreadsheet-style display, where—to appropriate the Microsoft Excel paradigm—values of X populate row $1; values of Y populate column $A; and values of W : ( X, Y ) → W populate the individual cells "southeast of" $B2, so to speak, $B2 itself included. While one can certainly use the DML (Data Manipulation Language) of traditional SQL to display ( X, Y, W ) triples, this output format is not nearly as convenient as the cross-tabbed alternative: certainly, the former requires one to hunt linearly for a given ( X, Y ) pair in order to determine the corresponding W value, while the latter enables one to more conveniently scan for the intersection of the proper X column with the proper Y row.
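The convenience argument can be made concrete in a few lines of Python (the triples are invented): retrieving a W value from a list of ( X, Y, W ) triples takes a linear hunt, while a cross-tab indexed by ( X, Y ) gives a direct lookup at each row-column intersection.

```python
# Triples vs. cross-tab lookup (the data below is invented).
triples = [("x1", "y1", 10), ("x1", "y2", 20), ("x2", "y1", 30)]

# SQL-style triple output forces a linear hunt for a given (X, Y) pair:
def find_linear(x, y):
    for tx, ty, w in triples:
        if (tx, ty) == (x, y):
            return w

# A cross-tab indexes by (X, Y) once; each intersection is a direct lookup.
crosstab = {(x, y): w for x, y, w in triples}

print(find_linear("x2", "y1"), crosstab[("x2", "y1")])
```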

See also: Cube

OLAP


OLAP, or On-Line Analytical Processing, is a software technology that enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by a user. OLAP functionality is characterized by dynamic multi-dimensional analysis of consolidated enterprise data supporting end-user analytical and navigational activities. OLAP tools do not store individual transaction records in two-dimensional, row-by-column formats, like a worksheet, but instead use multi-dimensional database structures (known as cubes in OLAP terminology) to store arrays of consolidated information. The data and formulas are stored in an optimized multidimensional database, while views of the data are created on demand. Analysts can take any view, or slice, of a cube to produce a worksheet-like view of points of interest.

OLAP and Excel



The Power of Excel-Friendly OLAP

Should Excel be a key component of your company’s BPM system?

There’s no doubt how most IT managers would answer this question. Name IT’s top ten requirements for a successful BPM system, and they’ll quickly explain how Excel violates most of them. Even the user community is concerned. Companies are larger and more complex now than in the past; they seem too complex for Excel. Managers need information more quickly now; they can’t wait for another Excel report.

Excel spreadsheets don’t scale well. They can’t be used by many different users. Excel reports have many errors. Excel security is a joke. Excel output is ugly. Excel consolidation occupies a large corner of Spreadsheet Hell. And Sarbanes Oxley has changed everything.

Or so we’re told.

For these reasons, and many more, a growing number of companies of all sizes have concluded that it’s time to replace Excel.

But before your company takes that leap of hope or faith, perhaps you should take another look at Excel…particularly when Excel can be enhanced by an Excel-friendly OLAP database.

Excel-friendly OLAP technology helps to eliminate many of the classic objections to using Excel for business performance management.

Introducing OLAP

Excel-friendly OLAP products cure many of the problems that both users and IT managers have with Excel. But before I explain why this is so, I should explain what OLAP is, and how it can be Excel-friendly.

Although OLAP technology has been available for years, it’s still quite obscure. One reason is that “OLAP” is an acronym for four words that are remarkably devoid of meaning: On-Line Analytical Processing.

OLAP databases are more easily understood when they’re compared with relational databases. Both “OLAP” and “relational” are names for a type of database technology. Oversimplified, relational databases contain lists of stuff; OLAP databases contain cubes of stuff.

For example, you could keep your accounting general ledger data in a simple cube with three dimensions: Account, Division, and Month. At the intersection of any particular account, division, and month you would find one number. By convention, a positive number would be a debit and a negative number would be a credit.
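A minimal sketch of such a cube as a Python mapping (the accounts, divisions, and amounts are invented; positive = debit, negative = credit, per the convention above):

```python
# Hypothetical general-ledger cube keyed by (account, division, month).
gl = {
    ("Cash",    "East", "2006-08"):  5000,   # debit
    ("Revenue", "East", "2006-08"): -5000,   # credit
}

# At the intersection of an account, division, and month: one number.
balance = gl[("Cash", "East", "2006-08")]
side = "debit" if balance > 0 else "credit"
print(balance, side)
```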

Most cubes have more than three dimensions. And they typically contain a wide variety of business data, not merely General Ledger data. OLAP cubes also could contain monthly headcounts, currency exchange rates, daily sales detail, budgets, forecasts, hourly production data, the quarterly financials of your publicly traded competitors, and so on.

You can define any consolidation hierarchy for any of a cube’s dimensions. For example, in the Month dimension every month could roll up into quarters, which could roll up into years. Months also could roll up into year-to-date categories. Users treat both the “leaf” members and the consolidated members as equivalent sources of data. To illustrate, users could choose data from a leaf member like Aug-2006 just as easily as they could choose from a consolidated member like Aug-2006-YTD.
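The point that leaf and consolidated members are equivalent sources of data can be sketched as follows (the values and roll-up paths are invented; the YTD span assumes a hypothetical fiscal year starting in June):

```python
# Illustrative monthly sales values (leaf members).
sales = {"Jun-2006": 100, "Jul-2006": 120, "Aug-2006": 90}

# Two roll-up paths over the same leaves: a quarter and a YTD member.
quarters = {"Q3-2006": ["Jul-2006", "Aug-2006"]}               # Sep not yet loaded
ytd = {"Aug-2006-YTD": ["Jun-2006", "Jul-2006", "Aug-2006"]}   # hypothetical FY

def consolidated(member, rollups, leaves):
    """A consolidated member's value is the sum of its children."""
    return sum(leaves[m] for m in rollups[member])

# A consolidated member is queried just as easily as a leaf.
print(consolidated("Aug-2006-YTD", ytd, sales))   # 310
print(sales["Aug-2006"])                          # 90
```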

Other dimensions typically have their own roll-up structures. An Account dimension could roll up accounts into traditional financial statement hierarchies. A Division dimension could roll up divisions into the corporate reporting hierarchy. And a Product dimension could roll up products into one or more product structures.

Excel-Friendly OLAP

You probably could find at least 50 OLAP products on the market. But most of them lack a key characteristic: spreadsheet functions.

Excel-friendly OLAP products offer a wide variety of spreadsheet functions that read data from cubes into Excel. Most such products also offer spreadsheet functions that can write to the OLAP database from Excel…with full security, of course.

Read-write security typically can be defined down to the cell level by user. Therefore, only certain analysts can write to a forecast cube. A department manager can read only the salaries of people who report to him. And the OLAP administrator must use a special password to update the General Ledger cube.

Other OLAP products push data into Excel; Excel-friendly OLAPs pull data into Excel. To an Excel user, the difference between push and pull is significant.

Using the push technology, users typically must interact with their OLAP product’s user interface to choose data and then write it as a block of numbers to Excel. If a report relies on five different views of data, users must do this five times. Worse, the data typically isn’t written where it’s needed within the body of the report. Instead, the data merely is parked in the spreadsheet for use somewhere else.

Using the pull technology, spreadsheet users can write formulas that pull the data from any number of cells in any number of cubes in the database. Even a single spreadsheet cell can contain a formula that pulls data from several cubes.

To illustrate, suppose that an Excel dashboard presents information for a particular division and month. Excel users typically would designate a Month and a Division cell, which all the formulas would reference. With this design, you could change the Month cell from “Jun-2006” to “Jul-2006”, and the Division cell from “Northeast” to “Southwest”. Then, by simply recalculating your workbook, you would update the report to reflect the new settings. Under automation, you could print a report for every division for a given month.
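The pull pattern can be sketched outside Excel as well. In the sketch below (divisions, months, and figures are invented), every "formula" is a function of the two parameter cells, so changing those parameters and recalculating refreshes the whole report:

```python
# Server-side cube keyed by (division, month); all figures are invented.
cube = {
    ("Northeast", "Jun-2006"): 1000,
    ("Northeast", "Jul-2006"): 1100,
    ("Southwest", "Jul-2006"):  900,
}

def report(division, month):
    # Every "formula" references the Month and Division parameter cells,
    # pulling its value from the cube on demand.
    return f"{division} {month}: sales={cube[(division, month)]}"

# Change the parameter cells and "recalculate":
print(report("Northeast", "Jun-2006"))

# Automation: a report for every division for a given month.
for division in ["Northeast", "Southwest"]:
    print(report(division, "Jul-2006"))
```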

At first reading, it’s easy to overlook the significant difference between this method of serving data to Excel and most others. Spreadsheets linked to Excel-friendly OLAP databases don’t contain data; they contain only formulas linked to data on the server. In contrast, most other technologies write blocks of data to Excel. It really doesn’t matter whether the data is imported as a text file, copied and pasted, generated by a PivotTable, or pushed to a spreadsheet by some other OLAP. The other technologies turn Excel into a data store. But Excel-friendly OLAP avoids that problem.

How Much Truth?

It’s common these days for database vendors to talk about having “one version of the truth.” (Recently, for example, Google listed 48,000 hits for that expression.) What’s less common is for anyone to ask these vendors how much relevant truth their systems can provide. This is a critical question for managers looking for BPM information, and for their staff—usually Excel users—who must provide the information. As most Excel users are sadly aware, the IT Department’s data warehouse never will provide all the data needed for business performance management.

It’s true that corporate data warehouses typically contain massive numbers of transactions. But this exhaustive detail largely is irrelevant to BPM, which typically relies on detailed summaries of data. At the extreme, data warehouses are a yard wide and a mile deep. But BPM requires data that is a mile wide and a yard deep.

Data Warehouse vs. OLAP Database

Here are some examples of data that OLAP databases can contain, but which data warehouses typically don’t:

Data Silos

Many information systems—both old and new—rely on databases that never will be added to the data warehouse. But these systems contain data that managers often need for managing business performance.

Most of those systems provide some way to export their data. Often, they support ODBC. Most can export their data as text files. Some companies even print reports from their legacy systems to files, and then use Monarch software to convert that text into rows and columns of data that can be imported into their OLAP database.

Mergers and Acquisitions

When two companies merge, the combined company now has two data warehouses, not one. Each organization has one version of its own truth, but neither has one version of the whole truth. This is not an easy problem for IT to solve.

I know of one company, for example, that has five ERPs on four continents. For nearly ten years, IT’s goal has been to create a single data warehouse within two years.

Unfortunately, users and their managers need summary data to be fully available immediately, certainly by the end of the month in which a merger or acquisition closes.

One company closed the purchase of a billion-dollar subsidiary on the 26th of the month. By the Board meeting two weeks later, the Finance staff had printed more than 200 spreadsheets that reported both consolidated and consolidating reports for the new company, down to low-level summaries. All financial data was expressed in terms of the parent company’s Chart of Accounts. The staff could integrate the disparate systems so quickly because the parent already was using an Excel-friendly OLAP. They mapped the subsidiary’s metadata (general ledger codes, department codes, and so on) to the parent’s metadata. They imported the subsidiary’s financials to a new “slice” in the parent’s General Ledger cube, translating the metadata on the fly. Then they printed their standard spreadsheet analyses, all 200 pages of them, while adding a few new Excel analyses specific to the new subsidiary.

System Conversions

When a company purchases a new Enterprise Resource Planning (ERP) system, it creates at least two problems for BPM reporting.

First, the company typically converts the fewest months of historical data it can. For financial systems, companies often convert only one year of history prior to the current fiscal year. But for many BPM purposes, data about past performance is very useful, even critical:

• Monthly time-series forecasting requires at least 30 months of historical data, preferably more.
• New products and sales offices often follow a consistent pattern for both revenue growth and startup expenses; but those patterns only can be discovered by analyzing data for startups during the past several years.
• The analysis of trends in cost-volume-profit relationships during past downturns can serve as a guide to cost-reduction efforts during current downturns.

With an Excel-friendly OLAP in place before the conversion, all historical data continues to be available. Better yet, managers continue to receive their standard Excel reports, which can display data from both ERPs.

Second, transactions can be classified differently between the old and new systems, and this problem can be very difficult to solve.

To illustrate, I know of two large companies whose system conversions were significantly over budget. The accountants for both companies had specified that all account-department combinations that were not explicitly allowed were to be rejected by the new systems. But to reduce expenses, both systems were set to allow all such combinations that weren’t specifically prohibited. As a consequence, many transactions each month were automatically booked to incorrect account-department combinations.

Under normal circumstances the accountants in each company would have had to manually inspect more than one hundred million account balances to find GL accounts whose transaction patterns had changed when the accounting systems changed. This would have been an impossible task, of course. However, both companies had been using Excel-friendly OLAP systems before their conversions began. Therefore, each created a simple spreadsheet that returned the monthly transactions for any specific account, department, and division, for the twelve months prior to the conversion and for all months after. Then, using standard Excel statistics functions and simple spreadsheet automation, the spreadsheets looped through every combination of account, department, and division, and listed all questionable combinations. The staff quickly corrected the obvious mistakes and researched the others.
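The statistical screen described above can be sketched with a simple z-score test (the balances and the threshold are invented; the article does not specify which statistics the companies actually used):

```python
from statistics import mean, stdev

# Hypothetical monthly balances for one (account, department) combination:
# twelve months before the conversion, then the months after.
before = [100, 105, 98, 102, 99, 101, 103, 100, 97, 104, 102, 100]
after = [0, 0, 0]  # activity vanished after the conversion: suspicious

def questionable(before, after, z=3.0):
    """Flag the combination if any post-conversion balance drifts far
    from the pre-conversion mean (the z-score threshold is illustrative)."""
    m, s = mean(before), stdev(before)
    return any(abs(x - m) > z * s for x in after)

print(questionable(before, after))  # True: the pattern changed
```

Looping this test over every combination of account, department, and division produces the list of questionable combinations for the staff to research.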

External Data

Managers often need to see their performance reported within the context of their business environment. That environment can be described by the financial data of publicly held customers and competitors, by local and regional economic data, by population trends, and by other measures. IT doesn’t control such data. Nor do IT managers typically understand it. That’s not their job; it’s the user’s job. In most companies, if users don’t create and maintain cubes of external data, no one ever will. It’s not unusual for a knowledgeable user to create an OLAP cube on her local computer, populate it with public data, and then test its use with various spreadsheet reports. Once the cube is tested, she can work with the database administrator to move the cube to the OLAP server.

Forecasts

Most data warehouses provide empty buckets for budget data. But they typically don’t capture the wide variety of forecasts that companies generate. Nor do they help to generate those forecasts. But Excel-friendly OLAP offers both solutions.

To illustrate, Excel users easily can generate both top-down and bottom-up sales forecasts, compare the forecasts to find large conflicts, and then revise the forecasts after researching the differences.

To prepare the top-down forecasts, users can send a forecasting spreadsheet to the sales people. Unlike most spreadsheets, this one would include formulas that write the new forecast data to the appropriate area of an OLAP cube on the server. Full security would be maintained, of course. To prepare the bottom-up forecasts, users first create a spreadsheet that uses statistical methods to extend past sales performance for any product and region into the future. This spreadsheet also writes the forecast to an area of the OLAP cube, again, with full security. Then, using automation, they apply this spreadsheet forecast to every product and every region.

To compare the forecasts, they set up a spreadsheet to compare the top-down and the bottom-up forecasts for any product and region. Again, using automation, they calculate the workbook for every combination and automatically note where the two versions vary by an unusual degree.
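The comparison step can be sketched as follows (the forecast figures and the 20% threshold are invented):

```python
# Hypothetical forecasts by (product, region), as read back from the cube.
top_down  = {("Widgets", "East"): 1000, ("Widgets", "West"): 800}
bottom_up = {("Widgets", "East"): 1050, ("Widgets", "West"): 500}

# Flag combinations where the two versions vary by an unusual degree
# (the 20% threshold is illustrative, not from the article).
conflicts = {
    key: (top_down[key], bottom_up[key])
    for key in top_down
    if abs(top_down[key] - bottom_up[key]) / top_down[key] > 0.20
}
print(conflicts)
```

Only the flagged combinations need manual research, which is what makes the automated pass worthwhile.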

Statistical Corrections

Forecasts, analyses, and management reporting all can be seriously flawed if analysts rely on historical data that reflects errors and oversights. But for a variety of reasons, managers, investors, and auditors all take a dim view of prior-period adjustments to the General Ledger. One way to handle this problem, particularly for forecasting and analysis, is for users to maintain an error-correction entity that can be consolidated or ignored, depending on the circumstance. Of course, these corrections must be managed carefully. It would be very easy, after all, for indications of real problems to be “corrected” out of existence. But when statistical corrections are tightly controlled, they provide the only practical way that past performance can be analyzed as it actually happened, not as it was mistakenly booked at the time.

Excel Dashboard Reporting

Excel has not been an obvious choice for BPM reporting. One reason for this is obvious: Typical Excel reports are ugly and difficult to read. But they don’t need to be.

Figure 1 illustrates an Excel dashboard report of public data for Starbucks Corporation. I created this report completely in Excel, with no assistance from third-party tools. This particular report uses data from two public web sites, downloaded into Excel. The report could display equivalent data for any public company whose financial information is covered by the two web sites.

Figure 1

In a business environment, a report like this could report performance for a department, division, product line, or for an entire company. The data would come from an Excel-friendly OLAP, not from the Web.

One significant advantage to using Excel for this type of reporting is that Excel users can change the report quickly and easily, without involving the IT Department. In fact, assuming that the necessary data already resides in the OLAP database, an Excel user typically could replace one measure with another in less than ten minutes.

Another significant advantage is that the report (even a single figure in the report) can display data from many original sources. To illustrate, a figure could show the trend in labor costs (from the General Ledger cube) per full-time-equivalent employee (from the Headcount cube). Another figure could show the ratio of total company sales (from the General Ledger cube) to the sales of its largest publicly traded competitor (from a Competitor cube).

There is virtually no limit to the appearance that an Excel dashboard can take. Figure 2 illustrates a mockup dashboard based on a standard display that Business Week used about ten years ago. In fact, I often “steal” ideas for dashboard designs from the pages of business magazines. Excel dashboards also can compare the same measures for many different products, divisions, departments, and other entities.

Figure 2

Limitations of Excel BPM Reporting

As a general rule,