
Apr, 2012

Maximize WebFOCUS Performance with Hyperstage

Agenda

Introduction to Hyperstage
How does it work
Recent results
Demonstration
Wrap Up and Q&A

Copyright 2007, Information Builders.

Introducing Hyperstage

WebFOCUS Hyperstage: Why?

Why Do BI Applications Fail? Typically 3 Reasons:
1. Too Complicated: Self-Service, Guided Ad hoc
2. Bad Data: Data Quality
3. Too Slow: Hyperstage

Hyperstage will improve database performance for WebFOCUS applications with less hardware, no database tuning and easy migration.

What is WebFOCUS Hyperstage?

Embedded, columnar data store that can dramatically increase the performance of WebFOCUS applications

Columnar = reduced I/O (vs. relational)
Easily implemented without the need for database administration
Disk footprint is reduced with a powerful compression algorithm
Includes embedded ETL for seamless migration of existing analytical databases
No change in query or application required
Data migrations are seamless and easy
WF 7.7.03M and higher includes optimized Hyperstage Adapter
Runs on commodity hardware (Intel based)
Windows 64
Linux (Redhat, Centos, Suse, Debian)

Hyperstage is an integrated, column-oriented data store that helps WebFOCUS applications achieve outstanding query performance.

Introducing WebFOCUS Hyperstage ….

Smarter Architecture

No maintenance
No query planning
No partition schemes
No DBA

Data Packs – data stored in manageably sized, highly compressed data packs

Knowledge Grid – statistics and metadata “describing” the super-compressed data

Column Orientation

WebFOCUS Hyperstage Engine

Data compressed using algorithms tailored to data type

How does it work?


Data Organization and the Knowledge Grid …

Pivoting Your Perspective: Columnar Technology

Employee Id | Name | Location | Sales
1 | Smith | New York | 50,000
2 | Jones | New York | 65,000
3 | Fraser | Boston | 40,000

Data stored in rows
1 Smith New York 50,000
2 Jones New York 65,000
3 Fraser Boston 40,000

Data stored in columns
1, 2, 3
Smith, Jones, Fraser
New York, New York, Boston
50,000, 65,000, 40,000

New row: 4 Fraser Boston 70,000
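The difference between the two layouts can be sketched in a few lines of Python. This is only an illustration using the sample employee table from the slide; the names `rows`, `columns` and the two helper functions are made up for the example and say nothing about how Hyperstage stores data internally.

```python
# Illustrative only: the same three-row table held row-wise and column-wise.
FIELD_NAMES = ("employee_id", "name", "location", "sales")

# Row layout: each record is kept together.
rows = [
    (1, "Smith", "New York", 50_000),
    (2, "Jones", "New York", 65_000),
    (3, "Fraser", "Boston", 40_000),
]

# Column layout: each field is kept together.
columns = {
    field: [record[i] for record in rows]
    for i, field in enumerate(FIELD_NAMES)
}


def total_sales_row_layout(table):
    """Every full record is visited, although only one field is needed."""
    return sum(record[3] for record in table)


def total_sales_column_layout(cols):
    """Only the 'sales' column is read; the other columns are never touched."""
    return sum(cols["sales"])


if __name__ == "__main__":
    print(total_sales_row_layout(rows))
    print(total_sales_column_layout(columns))
```

Both functions return the same total; the point is that the column layout lets an analytical query read only the columns it names, which is where the reduced I/O comes from.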

Data Packs - The data within each column is stored in groupings of 65,536 values called Data Packs

Data Packs improve data compression because the optimal compression algorithm is applied based on the data contents

An average compression ratio of 10:1 is achieved after loading data into Hyperstage. For example, 1TB of raw data can be stored in about 100GB of space.

Data Organization and the Knowledge Grid ….

Data Packs
Each data pack contains 65,536 (64K) data values
Compression is applied to each individual data pack
The compression algorithm varies depending on data type and data distribution

Compression
Results vary depending on the distribution of data among data packs
A typical overall compression ratio seen in the field is 10:1
Some customers have seen results as high as 40:1
Patent-pending compression algorithms

Data Packs and Compression
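A rough sketch of the idea, assuming a single integer column: the values are cut into packs of 65,536 and each pack is compressed on its own. The Hyperstage compression algorithms are proprietary and are not described in this deck, so `zlib` is used here purely as a stand-in; the function names and the sample column are hypothetical.

```python
import struct
import zlib

PACK_SIZE = 65_536  # values per Data Pack, as stated on the slide


def split_into_packs(values, pack_size=PACK_SIZE):
    """Cut one column into consecutive packs of at most pack_size values."""
    return [values[i:i + pack_size] for i in range(0, len(values), pack_size)]


def compress_pack(pack):
    """Serialize a pack of integers as 64-bit values and compress it.

    zlib is an assumption standing in for the real, type-specific algorithms.
    """
    raw = struct.pack(f"<{len(pack)}q", *pack)
    return raw, zlib.compress(raw)


# Hypothetical column: 200,000 low-cardinality integer values.
column = [i % 100 for i in range(200_000)]
packs = split_into_packs(column)

raw_bytes = 0
compressed_bytes = 0
for pack in packs:
    raw, packed = compress_pack(pack)
    assert zlib.decompress(packed) == raw  # each pack decompresses independently
    raw_bytes += len(raw)
    compressed_bytes += len(packed)

ratio = raw_bytes / compressed_bytes

if __name__ == "__main__":
    print(f"{len(packs)} packs, compression ratio about {ratio:.0f}:1")
```

Because each pack is compressed independently, a query only has to decompress the packs it actually needs. The ratio printed here depends entirely on the made-up sample data and should not be read as a Hyperstage result.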


The Knowledge Grid: Knowledge Nodes

[Diagram: Column A and Column B, each divided into Pack Rows 1 to 6, with their Knowledge Nodes. Labels: Global Knowledge; String and character data; Numeric data; Distributions; Dynamic Knowledge (built per query, e.g. for aggregates, joins); Built during LOAD]



This metadata layer = 1% of the compressed volume

Data Pack Nodes (DPN): A separate DPN is created for every data pack in the database to store basic statistical information

Character Maps (CMAPs): Every Data Pack that contains text gets a matrix that records the occurrence of every possible ASCII character

Histograms: A histogram with 1024 MIN-MAX intervals is created for every Data Pack that contains numeric data

Pack-to-Pack Nodes (PPN): PPNs track relationships between Data Packs when tables are joined. Query performance gets better as the database is used.
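To make the three per-pack structures more concrete, here is a minimal Python sketch. It is an assumption-laden simplification, not the Hyperstage format: the DPN keeps only min, max, sum and count; the histogram is one presence flag per interval; and the character map is reduced to the set of characters that occur anywhere in the pack. All class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataPackNode:
    """Basic statistics for one numeric Data Pack (simplified)."""
    minimum: float
    maximum: float
    total: float
    count: int


def build_dpn(pack):
    return DataPackNode(min(pack), max(pack), sum(pack), len(pack))


def build_histogram(pack, intervals=1024):
    """One flag per MIN-MAX interval: True if any value falls inside it."""
    low, high = min(pack), max(pack)
    width = (high - low) / intervals
    if width == 0:  # all values equal: everything lands in the first interval
        width = 1
    flags = [False] * intervals
    for value in pack:
        index = min(int((value - low) / width), intervals - 1)
        flags[index] = True
    return flags


def build_cmap(pack):
    """Simplified character map: the set of characters seen in the pack."""
    return {ch for text in pack for ch in text}


def may_contain(cmap, literal):
    """False means no string in the pack can equal the literal."""
    return all(ch in cmap for ch in literal)


if __name__ == "__main__":
    salaries = [10, 20, 1000]
    cities = ["Boston", "New York"]
    print(build_dpn(salaries))
    print(sum(build_histogram(salaries)), "occupied intervals")
    print(may_contain(build_cmap(cities), "Toronto"))
```

The point of all three structures is the same: they are small enough to consult without touching the compressed data, and they can often rule a pack in or out before anything is decompressed.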


How does it work …

salary age job city

Completely Irrelevant

Suspect

All values match

SELECT count(*) FROM employees WHERE salary > 50000 AND age < 65 AND job = 'Shipping' AND city = 'Toronto';

WebFOCUS Hyperstage Example: Query and Knowledge Grid
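The three colour codes used in this example (completely irrelevant, suspect, all values match) can be derived from nothing more than a pack's minimum and maximum. A minimal sketch for a `column > threshold` condition such as `salary > 50000`, assuming the pack's min and max are available from its Data Pack Node; the enum and function names are hypothetical.

```python
from enum import Enum


class PackStatus(Enum):
    IRRELEVANT = "completely irrelevant"
    SUSPECT = "suspect"
    ALL_MATCH = "all values match"


def classify_greater_than(pack_min, pack_max, threshold):
    """Classify one Data Pack for the condition value > threshold."""
    if pack_max <= threshold:
        # Even the largest value fails the condition.
        return PackStatus.IRRELEVANT
    if pack_min > threshold:
        # Even the smallest value passes the condition.
        return PackStatus.ALL_MATCH
    # The threshold falls inside the pack's range: only decompression can tell.
    return PackStatus.SUSPECT


if __name__ == "__main__":
    for low, high in [(20_000, 45_000), (60_000, 90_000), (30_000, 70_000)]:
        print(low, high, classify_greater_than(low, high, 50_000).value)
```

Only packs classified as suspect need a closer look; the other two categories are resolved from metadata alone.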

salary age job city

1. Find the Data Packs with salary > 50000


WebFOCUS Hyperstage Example: salary > 50000


All values match

salary age job city

1. Find the Data Packs with salary > 50000
2. Find the Data Packs that contain age < 65

WebFOCUS Hyperstage Example: age < 65


Suspect

All values match

salary age job city

1. Find the Data Packs with salary > 50000
2. Find the Data Packs that contain age < 65
3. Find the Data Packs that have job = 'Shipping'

WebFOCUS Hyperstage Example: job = 'Shipping'


Suspect

All values match

salary age job city

1. Find the Data Packs with salary > 50000
2. Find the Data Packs that contain age < 65
3. Find the Data Packs that have job = 'Shipping'
4. Find the Data Packs that have city = 'Toronto'

WebFOCUS Hyperstage Example: city = 'Toronto'


Suspect

All values match

salary age job city

All packs ignored
All packs ignored
All packs ignored

1. Find the Data Packs with salary > 50000
2. Find the Data Packs that contain age < 65
3. Find the Data Packs that have job = 'Shipping'
4. Find the Data Packs that have city = 'Toronto'
5. Eliminate all pack rows that have been flagged as irrelevant

WebFOCUS Hyperstage Example: Eliminate Pack Rows

Suspect

All values match

salary age job city

All packs ignored
Only this pack will be decompressed
All packs ignored
All packs ignored

1. Find the Data Packs with salary > 50000
2. Find the Data Packs that contain age < 65
3. Find the Data Packs that have job = 'Shipping'
4. Find the Data Packs that have city = 'Toronto'
5. Eliminate all pack rows that have been flagged as irrelevant
6. Finally, identify the pack that needs to be decompressed

WebFOCUS Hyperstage Example: Decompress and scan

Suspect

All values match
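Steps 5 and 6 amount to combining the per-column results for each pack row. A minimal sketch of that combination rule for conditions joined by AND, using made-up statuses for four pack rows (the actual colours on the slide are not recoverable from this text, so the grid here is hypothetical, as are all names):

```python
IRRELEVANT = "irrelevant"
SUSPECT = "suspect"
ALL_MATCH = "all values match"


def combine(statuses):
    """Combine per-column statuses for one pack row under AND."""
    statuses = list(statuses)
    if IRRELEVANT in statuses:
        # One failing column is enough to ignore the whole pack row.
        return IRRELEVANT
    if all(status == ALL_MATCH for status in statuses):
        # Every row qualifies; count(*) can be answered from metadata.
        return ALL_MATCH
    return SUSPECT


# Hypothetical Knowledge Grid results for the four conditions of the query.
pack_rows = [
    {"salary": ALL_MATCH, "age": SUSPECT, "job": IRRELEVANT, "city": SUSPECT},
    {"salary": IRRELEVANT, "age": ALL_MATCH, "job": SUSPECT, "city": SUSPECT},
    {"salary": ALL_MATCH, "age": SUSPECT, "job": SUSPECT, "city": ALL_MATCH},
    {"salary": SUSPECT, "age": ALL_MATCH, "job": ALL_MATCH, "city": IRRELEVANT},
]

results = [combine(row.values()) for row in pack_rows]
to_decompress = [i for i, status in enumerate(results) if status == SUSPECT]

if __name__ == "__main__":
    for index, status in enumerate(results):
        print(f"pack row {index}: {status}")
    print("pack rows to decompress:", to_decompress)
```

With this sample grid only one pack row survives as suspect, so only its packs are decompressed and scanned; the rest are ignored without any I/O on the compressed data.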

POC Results (Internal Use Only)

Insurance Company
Query performance issues with SQL Server - Insurance claims analysis
3 day POC - Compression achieved 40:1
Most queries running 3X faster in Hyperstage

Large Bank
Query performance issues with SQL Server - Web traffic analysis
3 day POC - Compression achieved 10:1
Queries that ran for 10 to 15 mins in SQL Server ran sub-second in Hyperstage

Government Application
Query performance issues with Oracle - Federal Loan/Grant Tracking
3 day POC - Compression achieved 15:1
Queries that ran for 10 to 15 mins in Oracle ran in 30 secs in Hyperstage

POCs can typically be completed within 3 days

Beyond WebFOCUS

[Architecture diagram. Components: Java, .Net, WF Connector, WF Service, WebFOCUS Client, WebFOCUS Reporting Server, WF Hyperstage Adapter, WebFOCUS Hyperstage Server]

Hyperstage is integrated in the WebFOCUS BI Architecture through the reporting server and is administered using the WebFOCUS console

WebFOCUS client applications communicate directly through the reporting server

Custom applications developed via Java or .Net can access the reporting server via WebFOCUS services and a supplied WebFOCUS connector

Hyperstage also supports connections from any application via industry standard JDBC or ODBC connections. There are also native drivers for .NET, C, or PHP applications to connect directly to the Hyperstage engine.

Data can be loaded and maintained in Hyperstage using iWay Data Integration or using any commercial ETL tool.

[Diagram labels for direct connections: Generic App, Java, C, .Net, PHP, Perl]

Hyperstage vs. OLAP

Many companies are looking to migrate from legacy OLAP solutions
Hyperstage can offer excellent query performance with a commonly understood star schema database
WebFOCUS can provide the navigation and drill paths
Hyperstage can support large numbers of dimensional attributes and can be easily updated

OLAP | WebFOCUS Hyperstage
Limited number of dimensions | Supports up to 4096 columns on a single table
Difficult to add new dimensions | Dimension tables can be updated
Rebuilding cubes can be slow | Bulk loads of up to 500GB per hour
Disk consumed can be up to 10X the raw data size | Typically 10:1 compression

Hyperstage vs. In-Memory

WebFOCUS Hyperstage is a viable alternative to BI tools that utilize an in-memory architecture like QlikView, Tableau, Cognos TM1 and Tibco/Spotfire

In-memory is limited to the amount of data you can store in RAM.
Hyperstage is a hybrid approach that efficiently uses disk I/O without sacrificing the performance achieved by in-memory
Tableau for example has approximately a 100GB limit on its in-memory cache.

In-Memory Solutions | WebFOCUS Hyperstage
Storage: RAM | Storage: RAM/Disk
Expensive | Cheap
Short term | Long term
Requires additional hardware | Leverages existing hardware


Demonstration …

NYSE Daily Stock Price History

Daily history from 1970 to 2006 for 7000 stocks, downloaded from the internet

14 million rows
1.4GB of raw data
Compressed to 70MB

Test query summarizes stock information for top tech companies in March 2000 and compares the information for the same period in March 2002 (dot com collapse)

Note: Hyperstage running on a Dell laptop with one dual-core processor and 4GB of RAM

NYSE Daily Stock Price History (exploded)

Simulated additional stock prices up to 2043
2 billion rows
200GB of raw data
Compressed to 17GB

Test query summarizes stock information for top tech companies in March 2000 and compares the information for the same period in March 2002 (dot com collapse)
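For comparison with the 10:1 field average quoted earlier, the compression ratios implied by the two demonstration data sets can be worked out directly. A small sketch, assuming decimal units (1GB = 1000MB); the function name is made up for the example.

```python
def compression_ratio(raw_mb, compressed_mb):
    """Raw size divided by compressed size, both in megabytes."""
    return raw_mb / compressed_mb


# Figures from the two demonstration slides, converted to MB (1GB = 1000MB assumed).
original = compression_ratio(raw_mb=1_400, compressed_mb=70)
exploded = compression_ratio(raw_mb=200_000, compressed_mb=17_000)

if __name__ == "__main__":
    print(f"original data set: {original:.1f}:1")
    print(f"exploded data set: {exploded:.1f}:1")
```

The original data set works out to 20:1 and the exploded one to roughly 12:1, both at or above the typical field figure.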

WebFOCUS Hyperstage: The Big Deal…

No indexes
No partitions
No views
No materialized aggregates

Value proposition
Low IT overhead
Allows for autonomy from IT
Ease of implementation
Fast time to market
Less Hardware
Lower TCO

No DBA Required!

Q&A

