can your business intelligence environment handle...

14
A White Paper Can Your Business Intelligence Environment Handle Data Growth? WebFOCUS Hyperstage: A Query-Optimized Data Store for Enhanced Reporting and Analytical Performance WebFOCUS iWay Software Omni

Upload: duongkiet

Post on 06-Sep-2018

217 views

Category:

Documents


0 download

TRANSCRIPT

A White Paper

Can Your Business Intelligence Environment Handle Data Growth?WebFOCUS Hyperstage: A Query-Optimized Data Store for Enhanced Reporting and Analytical Performance

WebFOCUS iWay Software Omni

1 Executive Summary

2 So Much Data, So Little Time

2 The Response from IT

2 The BI Response

3 In-Memory Computing

4 An In-Memory Hybrid Approach

5 Row-Oriented vs. Column-Oriented Databases: Which Are Better?

6 The WebFOCUS Hyperstage Advantage: Superior Performance

7 A Formidable Architecture

9 WebFOCUS Hyperstage: The Ideal Solution for Improved Query Performance

9 Reduced Administration

9 Ease of Use

9 Scalability

10 Reduced TCO

10 Simplified Administration

10 Small Data Footprint

10 Accelerated Data Loading

10 Highly Efficient Query Processing

10 Rapid Implementation

11 Conclusion

Table of Contents

Information Builders1

Executive Summary

As companies experience rapid growth in the volume of data they maintain, many are finding that their business intelligence (BI) environments are struggling to keep pace. Regardless of the amount of data being processed, end users expect superior performance from the reporting and analytics applications they use. Slow response times will undoubtedly lead to frustration, or even abandonment of the environment altogether.

IT departments have tried many different approaches to address this issue – from indexes and cubes to reporting-specific data marts and warehouses – none of which have been able to truly solve the problem. Only embedded data stores can effectively combat the countless challenges associated with slow query response times.

This white paper will highlight WebFOCUS Hyperstage, an effective and economical way for users of Information Builders’ WebFOCUS BI platform to achieve outstanding reporting and analytic performance – even in the face of expanding data volumes. Hyperstage is a query-optimized data store, embedded directly into the WebFOCUS environment. It significantly accelerates the speed and efficiency of WebFOCUS applications, particularly those that handle high numbers of ad hoc or unplanned queries, or those that consume large volumes of data. We’ll discuss why its columnar architecture is better suited to reporting and analytics than row-oriented databases, as well as its dynamic Knowledge Grid and built-in facilities for automated data access and migration, which allow companies to take advantage of a truly self-managing environment that is fully optimized for business intelligence.

Can Your Business Intelligence Environment Handle Data Growth?2

While most web-based BI applications are successful, some fail because users are dissatisfied with the slow environment. Users can become impatient if query response times exceed certain thresholds. These delays are sometimes caused by flaws in the design of the underlying database or data warehouse, which render them unable to support large numbers of users. But more often, BI environments underperform because the supporting database is unable to process user queries quickly, due to the amount of data stored online.

As the volume of data contained in warehouses and other typical information sources used for reporting and analysis continues to grow, query processing times will continue to slow to levels that are considered unacceptable by the user base. That’s why TDWI believes that “advanced analytic techniques that operate on big datasets” are considered “one of the most profound trends in business intelligence today”.1

Today’s users need to efficiently access and analyze large data volumes for a variety of critical purposes, ranging from strategic planning to operational reporting. Yet, few BI architectures are structured to effectively support such requirements. In a recent article on SearchStorage.com, contributing author John Webster, senior partner at Evaluator Group, Inc. writes, “Traditional data warehousing is a large but relatively slow producer of information to business analytics.”2

The Response from ITMany IT departments have sought to combat this problem by creating and maintaining indexes, building cubes or projections, or partitioning data. Others have purchased faster, more powerful hardware and massively parallel processing (MPP) configurations. And some rely on extract, transform, and load (ETL) tools to create data marts and warehouses specifically for reporting and analysis. “When organizations have long running queries that limit the business, the response is often to spend much more time and money to resolve the problem.”3

These techniques, however, frequently lead to additional problems. For example, because they rely heavily on the efforts of database administrators, they can hinder agility and divert critical resources from other important projects. Because ETL processes are often performed just once a day, they often result in lack of synchronization between operational systems and those repositories used for reporting. Most importantly, they do not eliminate poor response times of those unanticipated or ad hoc queries for which the database has not been optimized.

The BI ResponseThe business intelligence software community has attempted to address the need for increased performance. One popular approach of some BI vendors has been to create physical OLAP cubes that pre-calculate every possible aggregate combination across multiple dimensional attributes. Although this does increase performance, physical cubes have trouble supporting large volumes and are inflexible in the face of business changes.

So Much Data, So Little Time

1 “Big Data Analytics,” TDWI, September 2011.2 Webster, John. “Understanding Big Data Analytics,” SearchStorage.com, August 2011.3 McKendrick, Joseph. “Keeping Up With Ever-Expanding Enterprise Data,” Unisphere Research, October 2010.

Information Builders3

Another common technique used by embedded data stores is to put all of the data into server RAM to take advantage of the inherent I/O rate improvements over disk. This technique has been very successful and spawned a trend of using in-memory analytics for increased BI performance. But with the need to respond to today’s evolving business drivers, in-memory analytic solutions struggle to maintain performance as the size of the data goes beyond the fixed amount of server RAM.

WebFOCUS Hyperstage uses a hybrid approach that combines the I/O advantage of in-memory analytics with an intelligent architecture that allows data to be stored on disk without sacrificing performance. Hyperstage can scale up to many terabytes, while many other in-memory solutions are restricted to a couple of hundred gigabytes.

In-Memory ComputingThe high-performance of random access memory can also be leveraged to speed up the I/O needed to execute data queries. This can mean either using an in-memory database, or caching aggregated query results, which is also known as in-memory analytics.

Database solutions can load all data in-memory, for rapid access and retrieval. The data is typically persisted on disk as well, to keep it protected in the event of a server failure. The underlying structures remain the same, but I/O speed and performance is substantially increased over disk. In operational scenarios, there may be some time lag between RAM and disk, since indexes are still commonly used and there can be significant hardware requirements. WebFOCUS has been used to successfully exploit in-memory databases at many customer sites.

In-memory analytic solutions typically use SQL to cache data results in RAM, to support multiple users and reports. To reduce the amount of hardware required, the cached results are usually limited in scope or are maintained at a very high level of aggregation. If detail data is changed, or if the server goes down, the in-memory software must send a SQL request back to the originating database. Applications that require users to drill from aggregate to detail data must also issue SQL requests back to the originating database.

One commonality among in-memory solutions is the need for substantial computing power. In-memory databases need RAM for compressed data images, RAM for uncompressed data to respond to queries, and RAM for processing aggregates, sorts, and table joins. A technical deployment document from one common database vendor recommends 35 core to house a one terabyte database in memory. More CPU would likely be needed, depending on concurrent usage levels.

Can Your Business Intelligence Environment Handle Data Growth?4

A BI vendor recently talked about loading 16 terabytes of detailed information into their in-memory solution. The hardware needed to support the data loads and test queries included 24 8-core server nodes.

An In-Memory Hybrid Approach WebFOCUS Hyperstage uses an intelligent approach to data storage and in-memory computing, to provide optimum performance with minimal hardware, while greatly reducing the overall cost of deploying our technology.

WebFOCUS extracts data, using any of our existing adapters, then populates Hyperstage. Data is stored on disk using a high level of compression. Data statistics are also collected for every piece of data loaded into Hyperstage. These statistics, called the Knowledge Grid, can be used to answer analytical queries directly, or to navigate to a specific location on disk containing the data needed to complete the query. The entire set of statistics is loaded into RAM when the Hyperstage service starts. It is also persisted on disk, in case the service is taken down.

Hyperstage also uses intelligent caching techniques to store recently-accessed data into memory for other users and reports. This data remains in memory until Hyperstage determines that it needs to reuse the RAM for additional in-memory processing.

A typical deployment of Hyperstage that supports approximately 20 terabytes of raw data would use a single 16-core Intel server with 64 gigabytes of RAM.

Information Builders5

Most organizations derive the data used for reporting and analysis from traditional relational – or row-oriented – databases, such as Oracle or SQL Server. While these are ideal for transaction processing, they are often lacking when it comes to supporting high-performance BI environments with large volumes of data.

A paper written by Yale and MIT students working with AvantGarde Consulting, states, “It is not possible for a row-store to achieve some of the performance advantages of a column-store”.4 In a study published by Bloor Research, Research Director Philip Howard agrees, stating, “conventional approaches to data warehousing use traditional relational databases. However, these were originally designed to support transaction processing (OLTP) and do not have an architecture specifically designed for supporting queries”.5

The primary difference between column and row-orientation lies with how the data is stored on disk. In row-oriented databases, all the values of a single data record are stored as a single entity, which means that a request for a single field would also need to include all the fields of that table in each record. If there are 100 fields of data, every record request will need to account for 100 values.

In other words, applications that retrieve data from row-oriented databases must read each field within each record in order to access just one value. This means that a lot of I/O is wasted processing fields that are not needed to complete a simple BI request. If the data is structured column by column, then only the fields that are required are actually processed by queries, resulting in I/O reduction that can have a major impact on reporting performance.

Often, a query issued against a row-based database will require a full table scan. Each field in each row is lifted from disk and processed in memory. As a column-oriented database, Hyperstage avoids full table scans and commonly uses automatically generated statistics to find data, without the need to define additional indexes.

Row-Oriented vs. Column-Oriented Databases: Which Are Better?

4 Abadi, Daniel J.; Madden, Samuel R.; Hachem, Nabil. “Column-Stores vs. Row-Stores: How Different Are They Really?,” AvantGarde Consulting, June 2008.

5 Howard, Philip. “What’s Cool About Columns (and How to Extend Their Benefits),” Bloor Research, October 2010.

Can Your Business Intelligence Environment Handle Data Growth?6

WebFOCUS Hyperstage, an embedded column-oriented data store, reduces the I/O generated by standard row-based relational databases. Its unique architecture also tracks statistics about every data item loaded into Hyperstage, and eliminates the need to define indexes on specific data columns.

The kind of performance gains that can be achieved with Hyperstage have been proven in real-world scenarios. For example, one customer was experiencing long-running queries with a common row-based relational database. Queries running against that database required a full table scan or index scan. Hyperstage, however, used the automatically generated data statistics to complete each query in less than a second.

When combined with the power of WebFOCUS metadata, Hyperstage is able to automatically navigate dimensional hierarchies, similar to the way common OLAP solutions work. Although, unlike OLAP solutions, Hyperstage can support thousands of dimensional attributes and scale to hundreds of terabytes of raw data.

The outstanding compression offered by Hyperstage (typically 10:1), combined with minimal maintenance and the ability of its server to run on common hardware, creates a much lower total cost of ownership than other similar technology bundles that require proprietary hardware or a large multi-server platform.

The WebFOCUS Hyperstage Advantage: Superior Performance

“Using columns instead of rows means that you get greatly reduced

I/O because you only read the columns referenced by the query. This

means that you get dramatically improved performance. The use of

compression improves I/O rates (and performance) still further.” 6

WebFOCUS Hyperstage

SQL Server

(Insurance Company) (Financial Services Company)

Res

po

nse

Tim

es (s

eco

nd

s)

6 Howard, Philip. “What’s Cool About Columns (and How to Extend Their Benefits),” Bloor Research, October 2010.

Information Builders7

The combination of WebFOCUS and Hyperstage creates a formidable architecture that can accommodate a wide range of BI requirements. WebFOCUS metadata has the ability to automatically detect dimensional attributes and navigate dimensional hierarchies. Application drill-through actions can add additional power. WebFOCUS can also maintain complex join definitions and eliminate the need for users to understand the data model. Metadata can also provide row-level security for large BI deployments.

As a column-oriented data store, Hyperstage provides an immediate performance benefit for WebFOCUS applications via reduced I/O. The Hyperstage engine also includes additional architectural features for consistent query performance, without the need for continual analysis and database tuning.

First, when loading data into the engine, Hyperstage separates each field into columns, so that each column is treated as it own table. On disk, each column is stored with its own set of files, which directly contributes to reductions in I/O.

Before the data is stored on disk, the values are assembled into data packs of 65,536 elements. The information contained in each data pack is then inspected, and the resulting statistics are stored in a layer called the Knowledge Grid. As a final step, the data pack is compressed to its optimal size in memory, before it is stored on disk. Since the data, by definition, is column-oriented and the compressed image is applied at a granular level, Hyperstage will commonly compress data at a 10:1 ratio – and sometimes as high as 40:1.

Reporting queries generated by the WebFOCUS Hyperstage adapter will first inspect the knowledge grid before trying to access data on disk. For typical BI queries that aggregate data, the information needed may be contained in the automatically generated statistics – which will typically result in an immediate response. For more complex queries, the statistics are used to quickly identify the data packs that contain the needed information. This intelligent approach

A Formidable Architecture

Can Your Business Intelligence Environment Handle Data Growth?8

combines in-memory with reduced disk I/O to maintain consistent performance, while providing the scalability to support large volumes of data.

The ability of the Hyperstage intelligent architecture to provide increased performance with large data sizes was proven recently at a customer site, where existing WebFOCUS reports were commonly taking 20 to 25 minutes to complete. The reports analyzed web traffic using a popular row-based relational database with 1.6 billion rows. During a three-day proof of concept (POC), the customer was able to migrate the data to Hyperstage, point the existing application at the new target, and immediately begin completing the same reports in just a few seconds.

Information Builders9

WebFOCUS Hyperstage delivers substantial advantages to companies operating reporting environments with:

■n A large number of unplanned or ad hoc queries

■n Large databases (200 GB or more)

■n Bulk loads

■n High volumes of machine-generated data

These organizations can realize dramatic benefits, such as:

Reduced AdministrationThe benefits of WebFOCUS Hyperstage can be easily attained without the need to analyze queries, or develop indexing strategies or partition plans. Data is automatically partitioned into columns, and assembled into data packs. Statistics are automatically generated for every piece of data loaded, so there is no need to continually tune the data structure to support unplanned and ad hoc queries.

Ease of UseWebFOCUS includes an adapter that automatically translates queries generated by an existing application into supporting SQL understood by Hyperstage. For more complex tasks that may require the need to send pass-through requests, the SQL only needs to be formatted into syntax understood by MySQL. The WebFOCUS Hyperstage engine is shipped within a MySQL framework to provide extended functionality, including security, stored procedures, and user defined functions.

ScalabilityWith the intelligent hybrid architecture of Hyperstage, WebFOCUS customers can maintain consistent query performance as data volumes grow, using minimal hardware. Superior compression allows large quantities of data to be stored on a smaller disk footprint. For example, using a 10:1 ratio (the average compression rate for Hyperstage), one terabyte of raw data will consume 100 GB of disk space. If loaded into an RDBMS, two to three terabytes would typically be needed to accommodate the data, as well as the indexes needed to support query and reporting.

In one example, a customer took data from a common row-based database with a disk footprint of 400 GB, and loaded it into Hyperstage. The disk footprint required by Hyperstage to store the same data was slightly over 10 GB.

The WebFOCUS Hyperstage server can be also be installed on simple hardware, and provides performance levels that compare to other technologies that require large server farms.

WebFOCUS Hyperstage: The Ideal Solution for Improved Query Performance

Can Your Business Intelligence Environment Handle Data Growth?10

Reduced TCOHyperstage minimizes total cost of ownership by employing industry-standard servers and leveraging unparalleled compression. This reduces storage requirements, decreases software costs, and minimizes ongoing operational expenses.

Simplified AdministrationBecause it eliminates the need for new schemas, indexes, and data partitioning, WebFOCUS Hyperstage is easy and economical to maintain, requiring little or no effort from database administrators. Additionally, all Knowledge Grid structures are automatically maintained across all columns in the database, so tuning, index creation, or modifications are never needed.

Small Data FootprintWebFOCUS Hyperstage compresses data at an average ratio of 10:1, with greater compression possible depending upon the type of data being loaded. Some clients have experienced compression levels of 30:1 or even 40:1.

Accelerated Data LoadingHyperstage excels with large data volumes, and can scale to up to 50 TB – compressed to less than 5 TB of disk space – on a single server. Load speeds can top 150 GB per hour.

Highly Efficient Query ProcessingQuery response times will be dramatically improved with WebFOCUS Hyperstage, particularly for ad hoc analytics, with no indexes or manual tuning required.

Rapid ImplementationA built-in migration tool makes it fast and easy to improve the performance of existing WebFOCUS applications. Simply load-and-go using your existing schemas – no need to create materialized views or to partition data.

“Columns provide better performance at a lower cost with a smaller

footprint: it is difficult to understand why any company seriously interested

in query performance would not consider a column-based solution.” 7

7 Howard, Philip. “What’s Cool About Columns (and How to Extend Their Benefits),” Bloor Research, October 2010.

Information Builders11

As more and more companies recognize the need to support large data volumes, they are seeking new ways to improve the speed of their business intelligence applications. Column-oriented databases deliver faster response times than their row-oriented counterparts when it comes to reporting against large data sets. In fact, companies that move their enterprise data into a column-oriented store before allowing users to access and analyze it often achieve a tenfold or greater performance improvement.

WebFOCUS Hyperstage is a robust embedded data store that is ideal for companies experiencing lagging query response times due to large data volumes or a high volume of ad hoc queries. With Hyperstage, customers can dramatically improve the performance of their WebFOCUS applications. Its column-oriented database and Knowledge Grid architecture, combined with automatic data migration facilities, deliver a self-managing environment optimized for reporting and analytics that not only improves performance, but also eliminates cumbersome database administration and reduces TCO.

Conclusion

Worldwide Offices

Corporate HeadquartersTwo Penn Plaza New York, NY 10121-2898 (212) 736-4433 (800) 969-4636

United StatesAtlanta, GA* (770) 395-9913

Boston, MA* (781) 224-7660

Channels (770) 677-9923

Chicago, IL* (630) 971-6700

Cincinnati, OH* (513) 891-2338

Dallas, TX* (972) 398-4100

Denver, CO* (303) 770-4440

Detroit, MI* (248) 641-8820

Federal Systems, D.C.* (703) 276-9006

Florham Park, NJ (973) 593-0022

Houston, TX* (713) 952-4800

Los Angeles, CA* (310) 615-0735

Minneapolis, MN* (651) 602-9100

New York, NY* (212) 736-4433

Philadelphia, PA* (610) 940-0790

Pittsburgh, PA (412) 494-9699

San Jose, CA* (408) 453-7600

Seattle, WA (206) 624-9055

St. Louis, MO* (636) 519-1411, ext. 321

Tampa, FL (813) 639-4251

Washington, D.C.* (703) 276-9006

InternationalAustralia* Melbourne 61-3-9631-7900 Sydney 61-2-8223-0600

Austria Raffeisen Informatik Consulting GmbH Wien 43-1-211-36-3344

Brazil São Paulo 55-11-3285-2716

Canada Calgary (403) 718-9828 Montreal* (514) 421-1555 Ottawa (416) 364-2760 Toronto* (416) 364-2760 Vancouver (604) 688-2499

China Information Builders Beijing 86-10-5128-9680 Peacom, Inc. Fuzhou 86-15-8800-93995 SolventoSOFT Technology (HK) Limited Hong Kong 852-9802-4757

Estonia InfoBuild Estonia ÖÜ Tallinn 372-618-1585

Finland InfoBuild Oy Espoo 358-207-580-840

France* Suresnes +33 (0)1-49-00-66-00

Germany Eschborn* 49-6196-775-76-0

Greece Applied Science Ltd. Athens 30-210-699-8225

Guatemala IDS de Centroamerica Guatemala City (502) 2412-4212

India* InfoBuild India Chennai 91-44-42177082

Israel SRL Software Products Ltd. Petah-Tikva 972-3-9787273

Italy Agrate Brianza 39-039-59-66-200

Japan KK Ashisuto Tokyo 81-3-5276-5863

Latvia InfoBuild Lithuania, UAB Vilnius 370-5-268-3327

Lithuania InfoBuild Lithuania, UAB Vilnius 370-5-268-3327

Mexico Mexico City 52-55-5062-0660

Middle East Barmajiat Information Technology, LLC Dubai 971-4-420-9100 n Bahrain n Kuwait n Oman n Qatar n Saudi Arabia n United Arab Emirates (UAE)

Innovative Corner Est. Riyadh 966-1-2939007 n Iraq n Lebanon n Oman n Saudi Arabia n UAE

Netherlands* Amstelveen 31 (0)20-4563333 n Belgium n Luxembourg

Nigeria InfoBuild Nigeria Garki-Abuja 234-9-290-2621

Norway InfoBuild Norge AS c/o Okonor Tynset 358-0-207-580-840

Portugal Lisboa 351-217-217-400

Singapore Automatic Identification Technology Ltd. Singapore 65-69080191/92

South Africa InfoBuild (Pty) Ltd. Johannesburg 27-11-510-0070

South Korea UVANSYS, Inc. Seoul 82-2-832-0705

Southeast Asia Singapore 60-172980912 n Bangladesh n Brunei n Burma n Cambodia n Indonesia n Malaysia n Papua New Guinea n Thailand n The Philippines n Vietnam

Spain Barcelona 34-93-452-63-85 Bilbao 34-94-400-88-05 Madrid* 34-91-710-22-75

Sweden InfoBuild AB Stockholm 46-8-76-46-000

Switzerland Wallisellen 41-44-839-49-49

Taiwan Azion Corporation Taipei 886-2-2356-3996 Galaxy Software Services, Inc. Taipei 886-2-2586-7890, ext. 114

Turkey E-Kalite Yazilim Ankara 90-312-210-10-39

United Kingdom* Uxbridge Middlesex 44-20-7107-4000

Venezuela InfoServices Consulting Caracas 58212-763-1653

* Training facilities are located at these offices.

Corporate Headquarters Two Penn Plaza, New York, NY 10121-2898 (212) 736-4433 Fax (212) 967-6406 DN7506992.0415Connect With Us informationbuilders.com [email protected]

Copyright © 2015 by Information Builders. All rights reserved. [127] All products and product names mentioned in this publication are trademarks or registered trademarks of their respective companies.