business analysis-data warehousing

Upload: dhilsath-fathima

Post on 16-Apr-2017

UNIT II

BUSINESS ANALYSIS

Reporting and Query tools and Applications – Tool Categories – Cognos Impromptu – Online Analytical Processing (OLAP) – Need – Multidimensional Data Model – OLAP Guidelines – Multidimensional versus Multirelational OLAP – Categories of Tools.

DECISION SUPPORT TOOLS/Access Tools

Tool Categories

There are five categories of decision support tools, although the lines that separate them are quickly blurring:

Reporting

Managed query

Executive information systems

On-line analytical processing

Data mining

Reporting tools

Reporting tools can be divided into production reporting tools and desktop report writers. Production reporting tools let companies generate regular operational reports or support high-volume batch jobs, such as calculating and printing paychecks. Production reporting tools include third-generation languages such as COBOL; specialized fourth-generation languages, such as Information Builders, Inc.’s Focus; and high-end client/server tools, such as MITI’s SQR.

Figure: Data warehouse architecture

Report writers, on the other hand, are inexpensive desktop tools designed for end users. Products such as Seagate Software’s Crystal Reports let users design and run reports without having to rely on the IS department. In general, report writers have graphical interfaces and built-in charting functions. They can pull groups of data from a variety of data sources and integrate them in a single report. Leading report writers include Crystal Reports, Actuate Software Corp.’s Actuate Reporting System, IQ Software Corp.’s IQ Objects, and Platinum Technology, Inc.’s InfoReports. Vendors are trying to increase the scalability of report writers by supporting three-tiered architectures in which report processing is done on a Windows NT or UNIX server. Report writers are also beginning to offer object-oriented interfaces for designing and manipulating reports, and modules for performing ad hoc queries and OLAP analysis.

Users and Related Activities

User               Activity           Tools
Clerk              Simple retrieval   4GL*
Executive          Exception reports  EIS
Manager            Simple retrieval   4GL
Business analysts  Complex analysis   Spreadsheets, OLAP, data mining

* 4GL: fourth-generation language

Managed query tools

Managed query tools shield end users from the complexities of SQL and database structure by inserting a metalayer between users and the database. The metalayer is the software that provides subject-oriented views of a database and supports point-and-click creation of SQL. Some vendors, such as Business Objects, Inc., call this layer a “universe.” Other vendors, such as Cognos Corp., call it a “catalog.” Managed query tools have been extremely popular because they make it possible for knowledge workers to access corporate data without IS intervention.

Most managed query tools have embraced three-tiered architecture to improve scalability. They support asynchronous query execution and integrate with Web servers. Managed query tool vendors are racing to embed support for OLAP and data mining features. Some tool makers, such as Business Objects, take an all-in-one approach; Business Objects embeds OLAP functionality in its core 4.0 product. Other vendors, such as Cognos, Platinum Technology, and Information Builders, take a best-of-breed approach, offering Microsoft Office-like suites composed of managed query, OLAP, and data mining tools. Other leading managed query tools are IQ Software’s IQ Objects, Andyne Computing Ltd.’s GQL, IBM’s Decision Server, Speedware Corp.’s Esperant (formerly sold by Software AG), and Oracle Corp.’s Discoverer/2000.
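The metalayer idea described above can be illustrated with a minimal sketch. It assumes a tiny hand-written catalog; the table names, column names, and the build_sql helper are all hypothetical and do not reflect any vendor's actual "universe" or "catalog" format. The point is only that the user picks business terms and the layer produces the SQL.

```python
# Hypothetical catalog: business terms mapped to physical tables, columns, and joins.
CATALOG = {
    "tables": {"Sales": "t_sls_fact", "Customer": "t_cust_dim"},  # made-up names
    "items": {
        "Customer Name": ("Customer", "cust_nm"),
        "Region": ("Customer", "rgn_cd"),
        "Sale Amount": ("Sales", "sls_amt"),
    },
    "joins": {("Sales", "Customer"): "t_sls_fact.cust_id = t_cust_dim.cust_id"},
}


def build_sql(selected_items, catalog=CATALOG):
    """Translate a list of business item names into a SELECT statement."""
    columns, folders = [], []
    for name in selected_items:
        folder, column = catalog["items"][name]
        columns.append(f'{catalog["tables"][folder]}.{column} AS "{name}"')
        if folder not in folders:
            folders.append(folder)
    tables = [catalog["tables"][f] for f in folders]
    sql = f"SELECT {', '.join(columns)} FROM {', '.join(tables)}"
    # Add any join condition the catalog defines between the folders in use.
    conditions = [
        cond for (a, b), cond in catalog["joins"].items()
        if a in folders and b in folders
    ]
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql


print(build_sql(["Customer Name", "Sale Amount"]))
```

The user never sees the cryptic physical names or the join condition; both live in the catalog, which an administrator maintains once for all users.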

Executive information system tools

Executive information system (EIS) tools predate report writers and managed query tools; they were first deployed on mainframes. EIS tools allow developers to build customized, graphical decision support applications or “briefing books” that give managers and executives a high-level view of the business and access to external sources, such as custom, on-line news feeds. EIS applications highlight exceptions to normal business activity or rules by using color-coded graphics.

Popular EIS tools include Pilot Software, Inc.’s Lightship, Platinum Technology’s Forest and Trees, Comshare, Inc.’s Commander Decision, Oracle’s Express Analyzer, and SAS Institute, Inc.’s SAS/EIS. EIS vendors are moving in two directions. Many are adding managed query functions to compete head-on with other decision support tools. Others are building packaged applications that address horizontal functions, such as sales, budgeting, and marketing, or vertical industries, such as financial services. For example, Platinum Technology offers RiskAdvisor, a decision support application for the insurance industry that was built with Forest and Trees. Comshare provides the Arthur family of supply-chain applications for the retail industry.

OLAP tools

OLAP tools provide an intuitive way to view corporate data. These tools aggregate data

along common business subjects or dimensions and then let users navigate through the

hierarchies and dimensions with the click of a mouse button. Users can drill across or up levels

in each dimension or pivot and swap out dimensions to change their view of the data.

Some tools, such as Arbor Software Corp.’s Essbase and Oracle’s Express, preaggregate data in special multidimensional databases. Other tools, such as MicroStrategy, Inc.’s DSS Agent or Information Advantage, Inc.’s Decision Suite, work directly against relational data and aggregate data on the fly. Some tools process OLAP data on the desktop instead of a server. Desktop OLAP tools include Cognos’ PowerPlay, Brio Technology, Inc.’s BrioQuery, Planning Sciences, Inc.’s Gentium, and Andyne’s Pablo. Many of the differences between OLAP tools are fading. Vendors are rearchitecting their products to give users greater control over the tradeoff between flexibility and performance that is inherent in OLAP tools. Many vendors are rewriting pieces of their products in Java.

Data mining tools

Data mining tools are becoming hot commodities because they provide insights into

corporate data that aren’t easily discerned with managed query or OLAP tools. Data mining

tools use a variety of statistical and artificial-intelligence (AI) algorithms to analyze the

correlation of variables in the data and ferret out interesting patterns and relationships to

investigate.

Some data mining tools, such as IBM’s Intelligent Miner, are expensive and require statisticians to implement and manage. But a new breed of tools is emerging that promises to take the mystery out of data mining. These tools include DataMind Corp.’s DataMind, Pilot’s Discovery Server, and tools from Business Objects and SAS Institute. These tools offer simple user interfaces that plug in directly to existing OLAP tools or databases and can be run directly against data warehouses.

End-user tools span a number of data warehouse components. For example, all end-user tools use metadata definitions to obtain access to data stored in the warehouse, and some of these tools (e.g., OLAP tools) may employ additional or intermediary data stores (e.g., data marts, multidimensional databases).

COGNOS IMPROMPTU

Overview:

Impromptu from Cognos Corporation is positioned as an enterprise solution for

interactive database reporting that delivers 1- to 1000+ seat scalability. Impromptu’s object-

oriented architecture ensures control and administrative consistency across all users and reports.

Users access Impromptu through its easy-to-use graphical user interface. Impromptu has been

well received by users because querying and reporting are unified in one interface, and the users

can get meaningful views of corporate data quickly and easily.

Impromptu offers a fast and robust implementation at the enterprise level, and features full administrative control, ease of deployment, and low cost of ownership. Impromptu is a database reporting tool that exploits the power of the database, while offering complete control over all reporting within the enterprise. In terms of scalability, Impromptu can support a single user reporting on personal data, or thousands of users reporting on data from large data warehouses.

User acceptance of Impromptu is very high because its user interface looks and feels just like the Windows products these users already use. With Impromptu, users can leverage the skills they’ve acquired from using today’s popular spreadsheets and word processors. In addition, Impromptu insulates users from the underlying database technology, which also reduces the time necessary to learn the tool.

The Impromptu Information Catalog:

Impromptu reporting begins with the Information Catalog, a LAN-based repository of business knowledge and data access rules. The Catalog insulates users from such technical aspects of the database as SQL syntax, table joins, and cryptic table and field names. The Catalog also protects the database from repeated queries and unnecessary processing.

Creating a catalog is a relatively simple task, so an Impromptu administrator can be anyone who’s familiar with basic database query functions.

The Catalog presents the database in a way that reflects how the business is organized, and uses the terminology of the business. Impromptu administrators are free to organize database items such as tables and fields into Impromptu’s subject-oriented folders, subfolders, and columns. Structuring the data in this way makes it easy for users to navigate within a database and assemble reports. In addition, users are not restricted to fixed combinations or predetermined selections; they can select on the finest detail within a database.

Impromptu enables business-relevant reporting through business rules, which can consist of shared calculations, filters, and ranges for critical success factors. For example, users can create a report that includes only high-margin sales from the last fiscal year for the eastern region, instead of having to use complex filter statements.

Reporting

Impromptu is designed to make it easy for users to build and run their own reports. With ReportWise templates and HeadStarts, users simply apply data to Impromptu to produce reports rapidly.

Impromptu’s predefined ReportWise templates include templates for mailing labels, invoices, sales reports, and directories. These templates are complete with formatting, logic, calculations, and custom automation. Organizations can create templates for standard company reports, and then deploy them to every user who needs them. The templates are database-independent; therefore users simply map their data onto the existing placeholders to quickly create sophisticated reports. Additionally, Impromptu provides users with a variety of page and screen formats, known as HeadStarts, to create new reports that are visually appealing.

Impromptu offers special reporting options that increase the value of distributed standard

reports.

Picklists and prompts. Organizations can create standard Impromptu reports for which

users can select from lists of values called picklists. For example, a user can select a

picklist of all sales representatives with a single click of the mouse. For reports

containing too many values for a single variable, Impromptu offers prompts. For

example, a prompt asks the user at run time to supply a value or range for the report data.

Picklists and prompts make a single report flexible enough to serve many users.

Custom templates. Standard report templates with global calculations and business rules

can be created once and then distributed to users of different databases. Users then can

apply their data to the placeholders contained in the template. A template’s standard

logic, calculations, and layout complete the report automatically in the user’s choice of

format.

Exception reporting. Exception reporting is the ability to have reports highlight values that lie outside accepted ranges. Impromptu offers three types of exception reporting that help managers and business users immediately grasp the status of their business.

Conditional filters. Retrieve only those values that are outside defined

thresholds, or define ranges to organize data for quick evaluation. For example,

a user can set a condition to show only those sales under $10,000.

Conditional highlighting. Create rules for formatting data on the basis of data

values. For example, a user can set a condition that all sales over $10,000

always appear in blue.

Conditional display. Display report objects under certain conditions. For

example, a report will display a regional sales history graph only if the sales are

below a predefined value.

Interactive reporting. Impromptu unifies querying and reporting in a single interface. Users can perform both these tasks by interacting with live data in one integrated module.

Frames. Impromptu offers an interesting frame-based reporting style. Frames are building blocks that may be used to produce reports that are formatted with fonts, borders, colors, shading, etc. Frames know about their contents and how to display them. Frames, or combinations of frames, simplify building even complex reports. Once a multiframe report is designed, it can be saved as a template and rerun at any time with other data. The data formats itself according to the type of frame selected by the user.

List frames are used to display detailed information. List frames can contain calculated columns, data filters, headers and footers, etc.

Form frames offer layout and design flexibility. Form reports can contain multiple or repeating forms such as mailing labels.

Cross-tab frames are used to show the totals of summarized data at selected intersections, for example, sales of product by outlet.

Chart frames make it easy for users to see their business data in 2-D and 3-D displays using line, bar, ribbon, area, and pie charts. Charts can be stand-alone or attached to other frames in the same report.

Text frames allow users to add descriptive text to reports and display binary large objects (BLOBs) such as product descriptions or contracts.

Picture frames incorporate bitmaps into reports or specific records, perfect for visually enhancing reports.

OLE frames make it possible for users to insert any OLE object into a report.

Impromptu’s design is tightly integrated with the Microsoft Windows environment and standards, including OLE 2 support. Users can quickly learn Impromptu using its Microsoft Office-compatible user interface, which is complete with tabbed dialog boxes, bubble Help, and customizable toolbars. With OLE support, users can produce enhanced reports by simply placing data or objects in a document, regardless of the application in which it resides. For example, Impromptu reports can be embedded in spreadsheet files, or placed in a Word document.
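The three exception-reporting options described earlier (conditional filters, conditional highlighting, and conditional display) amount to simple rules applied to report rows. The following is a minimal sketch of that logic, not Impromptu's own mechanism; the sample rows, the function names, and the colour labels are assumptions made for illustration.

```python
# Made-up sample rows for illustration.
sales = [
    {"rep": "Avery", "region": "East", "amount": 8500},
    {"rep": "Blake", "region": "West", "amount": 12400},
    {"rep": "Casey", "region": "East", "amount": 15000},
]

THRESHOLD = 10_000  # the $10,000 figure used in the examples in the text


def conditional_filter(rows, limit=THRESHOLD):
    """Conditional filter: retrieve only sales under the limit."""
    return [r for r in rows if r["amount"] < limit]


def conditional_highlight(rows, limit=THRESHOLD):
    """Conditional highlighting: sales over the limit are marked blue."""
    return [dict(r, color="blue" if r["amount"] > limit else "default") for r in rows]


def conditional_display(rows, floor):
    """Conditional display: show the sales history graph only if total sales are below floor."""
    return sum(r["amount"] for r in rows) < floor


print(conditional_filter(sales))
print([r["color"] for r in conditional_highlight(sales)])
print(conditional_display(sales, floor=50_000))
```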

OLAP OPERATIONS ON MULTIDIMENSIONAL DATA.

1. Roll-up: The roll-up operation performs aggregation on a data cube, either by climbing up a concept hierarchy for a dimension or by dimension reduction. The figure shows the result of a roll-up operation performed on the central cube by climbing up the concept hierarchy for location. This hierarchy was defined as the total order street < city < province or state < country.

2. Drill-down: Drill-down is the reverse of roll-up. It navigates from less detailed data to more detailed data. Drill-down can be realized by either stepping down a concept hierarchy for a dimension or introducing additional dimensions. The figure shows the result of a drill-down operation performed on the central cube by stepping down a concept hierarchy for time defined as day < month < quarter < year. Drill-down occurs by descending the time hierarchy from the level of quarter to the more detailed level of month.

3. Slice and dice: The slice operation performs a selection on one dimension of the given cube, resulting in a subcube. The figure shows a slice operation where the sales data are selected from the central cube for the dimension time using the criterion time = “Q2”. The dice operation defines a subcube by performing a selection on two or more dimensions.

4. Pivot (rotate): Pivot is a visualization operation which rotates the data axes in view in order to provide an alternative presentation of the data. The figure shows a pivot operation where the item and location axes in a 2-D slice are rotated.

Figure: Examples of typical OLAP operations on multidimensional data.
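The four operations above can be sketched on a toy cube stored as a Python dictionary. This is an illustrative sketch only: the cell values, the month-to-quarter and city-to-country mappings, and the function names are assumptions, and a real OLAP engine would use far more efficient storage.

```python
from collections import defaultdict

# Made-up base cells: (month, item, city) -> units sold.
CUBE = {
    ("Jan", "phone", "Toronto"): 10,
    ("Feb", "phone", "Toronto"): 12,
    ("Apr", "phone", "Chicago"): 7,
    ("Apr", "laptop", "Toronto"): 5,
    ("May", "laptop", "Chicago"): 9,
}

# One level of each concept hierarchy: month -> quarter, city -> country.
QUARTER = {"Jan": "Q1", "Feb": "Q1", "Apr": "Q2", "May": "Q2"}
COUNTRY = {"Toronto": "Canada", "Chicago": "USA"}


def roll_up(cube, axis, mapping):
    """Aggregate by climbing one level of the hierarchy on one axis."""
    out = defaultdict(int)
    for key, value in cube.items():
        new_key = list(key)
        new_key[axis] = mapping[key[axis]]
        out[tuple(new_key)] += value
    return dict(out)


def slice_(cube, axis, value):
    """Slice: select on a single dimension."""
    return {k: v for k, v in cube.items() if k[axis] == value}


def dice(cube, criteria):
    """Dice: select on two or more dimensions; criteria maps axis -> allowed values."""
    return {
        k: v for k, v in cube.items()
        if all(k[axis] in allowed for axis, allowed in criteria.items())
    }


def pivot(cube, order):
    """Pivot: rotate the axes by reordering each cell's coordinates."""
    return {tuple(k[i] for i in order): v for k, v in cube.items()}


by_quarter = roll_up(CUBE, 0, QUARTER)   # time: month -> quarter
q2 = slice_(by_quarter, 0, "Q2")         # time = "Q2"
print(by_quarter)
print(q2)
```

Drill-down has no function of its own here because it is the reverse of roll-up: moving from by_quarter back to CUBE is a drill-down from quarter to month, which is only possible because the more detailed cells were kept.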

OLAP GUIDELINES

Multidimensionality is at the core of a number of OLAP systems (databases and front-end tools) available today. However, the availability of these systems does not eliminate the need to define a methodology of how to select and use the products. Dr. E. F. Codd, the “father” of the relational model, has formulated a list of 12 guidelines and requirements as the basis for selecting OLAP systems. Users should prioritize this suggested list to reflect their business requirements and consider products that best match those needs.

1. Multidimensional conceptual view. A tool should provide users with a multidimensional model that corresponds to the business problems and is intuitively analytical and easy to use.

2. Transparency. The OLAP system’s technology, the underlying database and computing architecture (client/server, mainframe gateways, etc.), and the heterogeneity of input data sources should be transparent to users to preserve their productivity and proficiency with familiar front-end environments and tools (e.g., MS Windows, MS Excel).

3. Accessibility. The OLAP system should access only the data actually required to perform the analysis. Additionally, the system should be able to access data from all heterogeneous enterprise data sources required for the analysis.

4. Consistent reporting performance. As the number of dimensions and the size of the database increase, users should not perceive any significant degradation in performance.

5. Client/server architecture. The OLAP system has to conform to client/server architectural principles for maximum price and performance, flexibility, adaptivity, and interoperability.

6. Generic dimensionality. Every data dimension must be equivalent in both structure and operational capabilities.

7. Dynamic sparse matrix handling. As previously mentioned, the OLAP system has to be able to adapt its physical schema to the specific analytical model that optimizes sparse matrix handling to achieve and maintain the required level of performance.

8. Multiuser support. The OLAP system must be able to support a work group of users working concurrently on a specific model.

9. Unrestricted cross-dimensional operations. The OLAP system must be able to recognize dimensional hierarchies and automatically perform associated roll-up calculations within and across dimensions.

10. Intuitive data manipulation. Consolidation path reorientation (pivoting), drill-down and roll-up, and other manipulations should be accomplished via direct point-and-click, drag-and-drop actions on the cells of the cube.

11. Flexible reporting. The ability to arrange rows, columns, and cells in a fashion that facilitates analysis by intuitive visual presentation of analytical reports must exist.

12. Unlimited dimensions and aggregation levels. Depending on business requirements, an analytical model may have a dozen or more dimensions, each having multiple hierarchies. The OLAP system should not impose any artificial restrictions on the number of dimensions or aggregation levels.

In addition to these 12 guidelines, a robust production-quality OLAP system should also support:

Comprehensive database management tools. These tools should function as an integrated centralized tool and allow for database management for the distributed enterprise.

The ability to drill down to detail (source record) level. This means that the tools should allow for a smooth transition from the multidimensional (preaggregated) database to the detail record level of the source relational databases.

Incremental database refresh. Many OLAP databases support only full refresh, and this presents an operations and usability problem as the size of the database increases.

Structured Query Language (SQL) interface. An important requirement for the OLAP system to be seamlessly integrated into the existing enterprise environment.

MULTIDIMENSIONAL DATA MODEL

The multidimensional nature of business questions is reflected in the fact that, for example, marketing managers are no longer satisfied by asking simple one-dimensional questions such as “How much revenue did the new product generate?” Instead, they ask questions such as “How much revenue did the new product generate by month, in the northeastern division, broken down by user demographic, by sales office, relative to the previous version of the product, compared with plan?”, a six-dimensional question. One way to look at the multidimensional data model is to view it as a cube. The table on the left contains detailed sales data by product, market, and time. The cube on the right associates sales numbers (units sold) with dimensions (product type, market, and time), with the UNIT variables organized as cells in an array. This cube can be expanded to include another array, price, which can be associated with all or only some dimensions (for example, the unit price of a product may or may not change with time, or from city to city). The cube supports matrix arithmetic that allows the cube to present the dollar sales array simply by performing a single matrix operation on all cells of the array (dollar sales = units * price).
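The "single matrix operation" here is a cell-by-cell product rather than a linear-algebra matrix multiplication. A minimal sketch with plain Python lists follows; the array sizes and values are made up, and the price array is assumed to vary by product only, which illustrates an array associated with just one of the cube's dimensions.

```python
# Made-up units array indexed as units[product][market][quarter].
units = [
    [[10, 12], [7, 9]],   # product 0: two markets, two quarters each
    [[5, 6], [3, 4]],     # product 1
]

# Assumption: price depends on the product dimension only.
price = [2.5, 4.0]

# dollar sales = units * price, applied to every cell of the cube.
dollar_sales = [
    [[u * price[p] for u in per_quarter] for per_quarter in markets]
    for p, markets in enumerate(units)
]

print(dollar_sales)
```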

The response time of the multidimensional query still depends on how many cells have to be added on the fly. The caveat here is that, as the number of dimensions increases, the number of the cube’s cells increases exponentially. On the other hand, the majority of multidimensional queries deal with summarized high-level data. Therefore, the solution to building an efficient multidimensional database is to preaggregate (consolidate) all logical subtotals and totals along all dimensions. Dimensions are typically hierarchical in nature. For example, the TIME dimension may contain hierarchies for years, quarters, months, weeks, and days; GEOGRAPHY may contain country, state, city, etc. Having the predefined hierarchy within dimensions allows for logical preaggregation and, conversely, allows for a logical drill-down: from the product group to individual products, from annual sales to weekly sales, and so on.
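Preaggregation can be sketched as computing, once and ahead of time, every subtotal and total that a query might ask for. The example below is a simplified illustration with flat (single-level) dimensions; the ALL marker, the sample cells, and the preaggregate function are assumptions for illustration. Each detail cell contributes to 2^n aggregates for n dimensions, which also shows why cube size grows so quickly as dimensions are added.

```python
from collections import defaultdict
from itertools import product

ALL = "*"  # marker meaning "total over this dimension" (an illustrative convention)

# Made-up detail cells: (product, market, quarter) -> units.
detail = {
    ("phone", "East", "Q1"): 10,
    ("phone", "West", "Q1"): 4,
    ("laptop", "East", "Q2"): 6,
}


def preaggregate(cells):
    """Store every subtotal and total: each coordinate is either kept or replaced by ALL."""
    totals = defaultdict(int)
    for key, value in cells.items():
        for mask in product((False, True), repeat=len(key)):
            agg_key = tuple(ALL if m else k for k, m in zip(key, mask))
            totals[agg_key] += value
    return dict(totals)


totals = preaggregate(detail)
print(totals[("phone", ALL, ALL)])   # all phone sales
print(totals[(ALL, ALL, ALL)])       # grand total
```

After this step a summary query becomes a single dictionary lookup instead of a scan over detail cells, at the cost of storing the extra aggregate cells.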

Another way to reduce the size of the cube is to properly handle sparse data. Often, not every cell has a meaning across all dimensions (many marketing databases may have more than 95 percent of all cells empty or containing 0). Another kind of sparse data is created when many cells contain duplicate data (e.g., if the cube contains a PRICE dimension, the same price may apply to all markets and all quarters for the year). The ability of a multidimensional database to skip empty or repetitive cells can greatly reduce the size of the cube and the amount of processing.
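One simple way to skip empty cells is to store only the populated coordinates and treat everything else as zero. The class below is a sketch of that idea under stated assumptions (the SparseCube name, the cube shape, and the values are hypothetical); commercial multidimensional databases use their own, more elaborate compression schemes.

```python
class SparseCube:
    """Store only non-empty cells; missing coordinates read as 0."""

    def __init__(self, shape):
        self.shape = shape   # number of members per dimension
        self.cells = {}      # coordinates -> value

    def set(self, coords, value):
        if value:
            self.cells[coords] = value
        else:
            self.cells.pop(coords, None)   # never store zeros

    def get(self, coords):
        return self.cells.get(coords, 0)

    def density(self):
        """Fraction of the logical cube that is actually populated."""
        total = 1
        for n in self.shape:
            total *= n
        return len(self.cells) / total


cube = SparseCube((100, 50, 12))   # made-up: products x markets x months
cube.set((3, 7, 0), 120)
cube.set((3, 7, 1), 95)
cube.set((8, 2, 5), 0)             # empty cell, not stored
print(len(cube.cells), cube.density())
```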

Dimensional hierarchy, sparse data management, and preaggregation are the keys, since they can significantly reduce the size of the database and the need to calculate values. Such a design obviates the need for multitable joins and provides quick and direct access to the arrays of answers, thus significantly speeding up execution of multidimensional queries.

Figure: Relational table and multidimensional cubes.

Multidimensional Data Model. The most popular data model for data warehouses is a multidimensional model. This model can exist in the form of a star schema, a snowflake schema, or a fact constellation schema. Let's have a look at each of these schema types.

Star schema: The star schema is a modeling paradigm in which the data warehouse contains (1) a large central table (fact table), and (2) a set of smaller attendant tables (dimension tables), one for each dimension. The schema graph resembles a starburst, with the dimension tables displayed in a radial pattern around the central fact table.

Figure Star schema of a data warehouse for sales.
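A star schema can be tried out with Python's built-in sqlite3 module. The sketch below is a reduced, illustrative version of the sales schema: the table names with the _dim and _fact suffixes, the subset of columns, and the sample rows are all assumptions, and the branch dimension is omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT, type TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, city TEXT, country TEXT);
-- Central fact table: one foreign key per dimension plus the measures.
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    item_key INTEGER REFERENCES item_dim(item_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    dollars_sold REAL,
    units_sold INTEGER
);
INSERT INTO time_dim VALUES (1, 'Q1', 1997), (2, 'Q2', 1997);
INSERT INTO item_dim VALUES (1, 'TV', 'home entertainment'), (2, 'PC', 'computer');
INSERT INTO location_dim VALUES (1, 'Vancouver', 'Canada'), (2, 'Chicago', 'USA');
INSERT INTO sales_fact VALUES (1, 1, 1, 600.0, 2), (2, 1, 1, 300.0, 1), (2, 2, 2, 1500.0, 1);
""")

# A typical star join: the fact table joined to the dimensions it is grouped by.
rows = conn.execute("""
    SELECT t.quarter, l.country, SUM(f.dollars_sold)
    FROM sales_fact f
    JOIN time_dim t ON f.time_key = t.time_key
    JOIN location_dim l ON f.location_key = l.location_key
    GROUP BY t.quarter, l.country
    ORDER BY t.quarter, l.country
""").fetchall()
print(rows)
```

Every query follows the same shape: start at the fact table and join outward along one "ray" of the star per dimension used.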

Snowflake schema: The snowflake schema is a variant of the star schema model, where some dimension tables are normalized, thereby further splitting the data into additional tables. The resulting schema graph forms a shape similar to a snowflake. The major difference between the snowflake and star schema models is that the dimension tables of the snowflake model may be kept in normalized form. Such a table is easy to maintain and also saves storage space, because a dimension table can become extremely large when the dimensional structure is included as columns.

Figure Snowflake schema of a data warehouse for sales.

Fact constellation: Sophisticated applications may require multiple fact tables to share dimension tables. This kind of schema can be viewed as a collection of stars, and hence is called a galaxy schema or a fact constellation.

Figure Fact constellation schema of a data warehouse for sales and shipping.

A Data Mining Query Language, DMQL: Language Primitives

Cube Definition (Fact Table)

    define cube <cube_name> [<dimension_list>]: <measure_list>

Dimension Definition (Dimension Table)

    define dimension <dimension_name> as (<attribute_or_subdimension_list>)

Special Case (Shared Dimension Tables)

First time as “cube definition”

    define dimension <dimension_name> as <dimension_name_first_time> in cube <cube_name_first_time>

Defining a Star Schema in DMQL

    define cube sales_star [time, item, branch, location]:
        dollars_sold = sum(sales_in_dollars), avg_sales = avg(sales_in_dollars), units_sold = count(*)
    define dimension time as (time_key, day, day_of_week, month, quarter, year)
    define dimension item as (item_key, item_name, brand, type, supplier_type)
    define dimension branch as (branch_key, branch_name, branch_type)
    define dimension location as (location_key, street, city, province_or_state, country)

Defining a Snowflake Schema in DMQL

    define cube sales_snowflake [time, item, branch, location]:
        dollars_sold = sum(sales_in_dollars), avg_sales = avg(sales_in_dollars), units_sold = count(*)
    define dimension time as (time_key, day, day_of_week, month, quarter, year)
    define dimension item as (item_key, item_name, brand, type, supplier(supplier_key, supplier_type))
    define dimension branch as (branch_key, branch_name, branch_type)
    define dimension location as (location_key, street, city(city_key, province_or_state, country))

Defining a Fact Constellation in DMQL

    define cube sales [time, item, branch, location]:
        dollars_sold = sum(sales_in_dollars), avg_sales = avg(sales_in_dollars), units_sold = count(*)
    define dimension time as (time_key, day, day_of_week, month, quarter, year)
    define dimension item as (item_key, item_name, brand, type, supplier_type)
    define dimension branch as (branch_key, branch_name, branch_type)
    define dimension location as (location_key, street, city, province_or_state, country)
    define cube shipping [time, item, shipper, from_location, to_location]:
        dollar_cost = sum(cost_in_dollars), unit_shipped = count(*)
    define dimension time as time in cube sales
    define dimension item as item in cube sales
    define dimension shipper as (shipper_key, shipper_name, location as location in cube sales, shipper_type)
    define dimension from_location as location in cube sales
    define dimension to_location as location in cube sales

A Concept Hierarchy

Concept hierarchies allow data to be handled at varying levels of abstraction.

CATEGORIZATION OF OLAP TOOLS

On-line analytical processing (OLAP) tools are based on the concepts of multidimensional databases and allow a sophisticated user to analyze the data using elaborate, multidimensional, complex views. Typical business applications for these tools include product performance and profitability, effectiveness of a sales program or a marketing campaign, sales forecasting, and capacity planning. These tools assume that the data is organized in a multidimensional model which is supported by a special multidimensional database (MDDB) or by a relational database designed to enable multidimensional properties (e.g., star schema). A chart comparing capabilities of these two classes of OLAP tools is shown in the figure.

1. MOLAP

Traditionally, these products utilized specialized data structures [i.e., multidimensional database management systems (MDDBMSs)] to organize, navigate, and analyze data, typically in an aggregated form, and traditionally required a tight coupling with the application layer and presentation layer. There recently has been a quick movement by MOLAP vendors to segregate the OLAP through the use of published application programming interfaces (APIs). Still, there remains the need to store the data in a way similar to the way in which it will be utilized, to enhance the performance and provide a degree of predictability for complex analysis queries. Data structures use array technology and, in most cases, provide improved storage techniques to minimize the disk space requirements through sparse data management. This architecture enables excellent performance when the data is utilized as designed, and predictable application response times for applications addressing a narrow breadth of data for a specific DSS requirement. In addition, some products treat time as a special dimension (e.g., Pilot Software’s Analysis Server), enhancing their ability to perform time series analysis. Other products provide strong analytical capabilities (e.g., Oracle’s Express Server) built into the database.
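The idea of sparse data management can be illustrated with a small Python sketch. It assumes a cube in which only non-empty cells are stored, keyed by their coordinates; the class name SparseCube and its methods are hypothetical, and real MDDBMS products use their own (typically proprietary) array and compression schemes.

```python
class SparseCube:
    """Toy multidimensional store that keeps only non-empty cells."""

    def __init__(self, dimensions):
        self.dimensions = tuple(dimensions)
        self.cells = {}  # coordinate tuple -> measure value

    def set(self, coords, value):
        if len(coords) != len(self.dimensions):
            raise ValueError("one coordinate is required per dimension")
        self.cells[tuple(coords)] = value

    def get(self, coords):
        # Cells that were never stored are treated as empty (zero)
        return self.cells.get(tuple(coords), 0)

    def density(self, sizes):
        """Fraction of the full array that is occupied, given the number
        of members in each dimension."""
        total = 1
        for n in sizes:
            total *= n
        return len(self.cells) / total
```

With 4 quarters, 3 items, and 2 locations, a dense array would reserve 24 cells; a cube holding a single stored value occupies only one entry here, which is the disk-space saving the text refers to.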

The area of the circles indicates the data size

Figure: OLAP style comparison.

Applications requiring iterative and comprehensive time series analysis of trends are well suited for MOLAP technology (e.g., financial analysis and budgeting). Examples include Arbor Software’s Essbase, Oracle’s Express Server, Pilot Software’s Lightship Server, Sinper’s TM/1, Planning Sciences’ Gentium, and Kenan Technology’s Multiway.

Several challenges face users considering the implementation of applications with MOLAP products. First, there are limitations in the ability of data structures to support multiple subject areas of data (a common trait of many strategic DSS applications) and the detail data required by many analysis applications. This has begun to be addressed in some products, utilizing rudimentary “reach-through” mechanisms that enable the MOLAP tools to access detail data maintained in an RDBMS (as shown in the figure). There are also limitations in the way data can be navigated and analyzed, because the data is structured around the navigation and analysis requirements known at the time the data structures are built. When the navigation or dimension requirements change, the data structures may need to be physically reorganized to optimally support the new requirements. This problem is similar in nature to the older hierarchical and network DBMSs (e.g., IMS, IDMS), where different sets of data had to be created for each application that used the data in a manner different from the way the data was originally maintained. Finally, MOLAP products require a different set of skills and tools for the database administrator to build and maintain the database, thus increasing the cost and complexity of support.

To address this particular issue, some vendors significantly enhanced their reach-through capabilities. These hybrid solutions have as their primary characteristic the integration of specialized multidimensional data storage with RDBMS technology, providing users with a facility that tightly couples the multidimensional data structures (MDDSs) with data maintained in an RDBMS (see figure, left). This allows the MDDSs to dynamically obtain detail data maintained in an RDBMS when the application reaches the bottom of the multidimensional cells during drill-down analysis. This may deliver the best of both worlds, MOLAP and ROLAP.
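A rough sketch of reach-through in Python follows. It assumes an in-memory SQLite database standing in for the RDBMS and a plain dictionary standing in for the multidimensional structure; the table sales_detail, its columns, and the function names are all hypothetical and do not correspond to any vendor’s product.

```python
import sqlite3


def build_detail_store():
    """Create a stand-in RDBMS holding the detail rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales_detail "
        "(product TEXT, region TEXT, order_id INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO sales_detail VALUES (?, ?, ?, ?)",
        [
            ("widget", "east", 1, 10.0),
            ("widget", "east", 2, 15.0),
            ("widget", "west", 3, 20.0),
            ("gadget", "east", 4, 5.0),
        ],
    )
    return conn


def build_cube(conn):
    """Pre-aggregate the detail rows into cells keyed by (product, region)."""
    rows = conn.execute(
        "SELECT product, region, SUM(amount) FROM sales_detail "
        "GROUP BY product, region"
    )
    return {(product, region): total for product, region, total in rows}


def drill_down(cube, conn, product, region):
    """Return the cell total from the cube, then reach through to the
    RDBMS for the detail rows that lie below the lowest cube level."""
    total = cube.get((product, region), 0.0)
    details = conn.execute(
        "SELECT order_id, amount FROM sales_detail "
        "WHERE product = ? AND region = ? ORDER BY order_id",
        (product, region),
    ).fetchall()
    return total, details
```

The aggregate is answered from the cube alone; the SQL query runs only when the user drills below the bottom of the multidimensional cells.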

This approach can be very useful for organizations with performance-sensitive multidimensional analysis requirements that have built, or are in the process of building, a data warehouse architecture that contains multiple subject areas. An example would be the creation of sales data measured by several dimensions (e.g., product and sales region) to be stored and maintained in a persistent structure. This structure would be provided to reduce the application overhead of performing calculations and building aggregations during application initialization. These structures can be automatically refreshed at predetermined intervals established by an administrator.

Figure: MOLAP architecture

2. ROLAP

This segment constitutes the fastest-growing style of OLAP technology, with new vendors (e.g., Sagent Technology) entering the market at an accelerating pace. Products in this group have been engineered from the beginning to support RDBMS products directly through a dictionary layer of metadata, bypassing any requirement for creating a static multidimensional data structure (see figure). This enables multiple multidimensional views of the two-dimensional relational tables to be created without the need to structure the data around the desired view. Finally, some of the products in this segment have developed strong SQL-generating engines to support the complexity of multidimensional analysis. This includes the creation of multiple SQL statements to handle user requests, being “RDBMS-aware,” and providing the capability to generate the SQL based on the optimizer of the DBMS engine. While flexibility is an attractive feature of ROLAP products, there are products in this segment that recommend, or require, the use of highly denormalized database designs (e.g., star schema). The design and performance issues associated with the star schema have been discussed.
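The role of the metadata dictionary layer and the SQL-generating engine can be sketched in Python as follows. This is only an illustration under stated assumptions: the METADATA dictionary, the star schema tables (sales_fact, product_dim, region_dim), and the function names are hypothetical, and SQLite stands in for the RDBMS. Real ROLAP engines generate far more elaborate, optimizer-aware SQL.

```python
import sqlite3

# Hypothetical metadata layer mapping business names to a star schema
METADATA = {
    "fact_table": "sales_fact",
    "measures": {"dollars_sold": "SUM(f.amount)"},
    "dimensions": {
        "product": {"table": "product_dim", "key": "product_key", "column": "product_name"},
        "region": {"table": "region_dim", "key": "region_key", "column": "region_name"},
    },
}


def generate_sql(dimensions, measure):
    """Build a GROUP BY query for the requested multidimensional view."""
    if not dimensions:
        raise ValueError("at least one dimension is required")
    select_cols, joins = [], []
    for name in dimensions:
        d = METADATA["dimensions"][name]  # unknown names raise KeyError
        select_cols.append(f"{d['table']}.{d['column']}")
        joins.append(f"JOIN {d['table']} ON f.{d['key']} = {d['table']}.{d['key']}")
    cols = ", ".join(select_cols)
    return (
        f"SELECT {cols}, {METADATA['measures'][measure]} AS {measure} "
        f"FROM {METADATA['fact_table']} f {' '.join(joins)} "
        f"GROUP BY {cols} ORDER BY {cols}"
    )


def build_star_schema():
    """Create and populate a tiny in-memory star schema for the sketch."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT);
        CREATE TABLE region_dim (region_key INTEGER PRIMARY KEY, region_name TEXT);
        CREATE TABLE sales_fact (product_key INTEGER, region_key INTEGER, amount REAL);
        INSERT INTO product_dim VALUES (1, 'gadget'), (2, 'widget');
        INSERT INTO region_dim VALUES (1, 'east'), (2, 'west');
        INSERT INTO sales_fact VALUES (1, 1, 5.0), (2, 1, 10.0), (2, 1, 15.0), (2, 2, 20.0);
        """
    )
    return conn
```

The same relational tables serve different views: generate_sql(["product"], "dollars_sold") and generate_sql(["region"], "dollars_sold") produce two different queries without any static multidimensional structure being built.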

Figure: ROLAP architecture

The ROLAP tools are undergoing some technology realignment. This shift in technology emphasis is coming in two forms. First is the movement toward pure middleware technology that provides facilities to simplify development of multidimensional applications. Second, there continues to be further blurring of the lines that delineate ROLAP and hybrid-OLAP products. Vendors of ROLAP tools and RDBMS products look to provide an option to create multidimensional, persistent structures, with facilities to assist in the administration of these structures. Examples include Information Advantage (Axsys), MicroStrategy (DSS Agent/DSS Server), Platinum/Prodea Software (Beacon), Informix/Stanford Technology Group (Metacube), and Sybase (HighGate Project).

3. Managed query environment (MQE)

This style of OLAP, which is beginning to see increased activity, provides users with the ability to perform limited analysis, either directly against RDBMS products or by leveraging an intermediate MOLAP server (see figure). Some products (e.g., Andyne’s Pablo) that have a heritage in ad hoc query have developed features to provide “datacube” and “slice and dice” analysis capabilities. This is achieved by first developing a query to select data from the DBMS, which then delivers the requested data to the desktop, where it is placed into a datacube. This datacube can be stored and maintained locally, to reduce the overhead required to create the structure each time the query is executed. Once the data is in the datacube, users can perform multidimensional analysis (i.e., slice, dice, and pivot operations) against it. Alternatively, these tools can work with MOLAP servers, and the data from the relational DBMS can be delivered to the MOLAP server, and from there to the desktop.
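The slice, dice, and pivot operations on a locally held datacube can be sketched in Python as below. The result set ROWS, the dimension names in DIMS, and the function names are assumptions for illustration; they stand in for whatever data a desktop tool would receive from its query.

```python
from collections import defaultdict

# Hypothetical query result delivered to the desktop:
# (product, region, quarter, amount)
ROWS = [
    ("widget", "east", "Q1", 10),
    ("widget", "west", "Q1", 20),
    ("gadget", "east", "Q1", 5),
    ("widget", "east", "Q2", 15),
]
DIMS = ("product", "region", "quarter")


def build_datacube(rows):
    """Place the query result into a local cube keyed by coordinates."""
    cube = defaultdict(int)
    for *coords, amount in rows:
        cube[tuple(coords)] += amount
    return dict(cube)


def slice_cube(cube, dim, value):
    """Slice: fix one dimension to a single value (keys keep all coordinates)."""
    i = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[i] == value}


def dice_cube(cube, **allowed):
    """Dice: keep cells whose coordinates fall inside the allowed sets."""
    return {
        k: v
        for k, v in cube.items()
        if all(k[DIMS.index(d)] in values for d, values in allowed.items())
    }


def pivot(cube, row_dim, col_dim):
    """Pivot: total the measure onto a chosen pair of axes."""
    r, c = DIMS.index(row_dim), DIMS.index(col_dim)
    table = defaultdict(lambda: defaultdict(int))
    for k, v in cube.items():
        table[k[r]][k[c]] += v
    return {row: dict(cols) for row, cols in table.items()}
```

Because the cube is built once from the query result and then held in memory, repeated slice, dice, and pivot requests do not go back to the DBMS, which is the overhead saving described above.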

The simplicity of the installation and administration of such products makes them particularly attractive to organizations looking to provide seasoned users with more sophisticated analysis capabilities, without the significant cost and maintenance of more complex products. Despite the ease of installation and administration that accompanies the desktop OLAP products, most of these tools require the datacube to be built and maintained on the desktop or a separate server. Even with metadata definitions that assist users in retrieving the correct set of data that makes up the datacube, this method causes a plethora of data redundancy and strains most network infrastructures that support many users. Although this mechanism allows each user the flexibility to build a custom datacube, the lack of data consistency among users and the relatively small amount of data that can be efficiently maintained are significant challenges facing tool administrators.

Examples include Cognos Software’s PowerPlay, Andyne Software’s Pablo, Business Objects’ Mercury Project, Dimensional Insight’s CrossTarget, and Speedware’s Media.

Figure: Hybrid /MQE architecture

By

M.Dhilsath Fathima