Apr, 2012
Maximize WebFOCUS Performance with Hyperstage
Agenda
Introduction to Hyperstage
How does it work
Recent results
Demonstration
Wrap Up and Q&A
Copyright 2007, Information Builders. Slide 3
Introducing Hyperstage
WebFOCUS Hyperstage: Why?
Why Do BI Applications Fail? Typically 3 Reasons…
1. Too Complicated: Self-Service, Guided Ad hoc
2. Bad Data: Data Quality
3. Too Slow: Hyperstage
Hyperstage will improve database performance for WebFOCUS applications with less hardware, no database tuning and easy migration.
What is WebFOCUS Hyperstage?
Embedded, columnar data store that can dramatically increase the performance of WebFOCUS applications
  Columnar = reduced I/O (vs relational)
  Easily implemented without the need for database administration
  Disk footprint is reduced with a powerful compression algorithm
  Includes embedded ETL for seamless migration of existing analytical databases
  No change in query or application required
  Data migrations are seamless and easy
WF 7.7.03M and higher includes optimized Hyperstage Adapter
Runs on commodity hardware (Intel based)
  Windows 64
  Linux (Redhat, Centos, Suse, Debian)
Hyperstage is an integrated, column-oriented data store that helps WebFOCUS applications achieve outstanding query performance.
Introducing WebFOCUS Hyperstage ….
Smarter Architecture
No maintenance
No query planning
No partition schemes
No DBA
Data Packs – data stored in manageably sized, highly compressed data packs
Knowledge Grid – statistics and metadata “describing” the super-compressed data
Column Orientation
WebFOCUS Hyperstage Engine
Data compressed using algorithms tailored to data type
How does it work?
Data Organization and the Knowledge Grid …
Pivoting Your Perspective: Columnar Technology

Employee Id   Name     Location   Sales
1             Smith    New York   50,000
2             Jones    New York   65,000
3             Fraser   Boston     40,000

Data stored in rows:
1 | Smith | New York | 50,000
2 | Jones | New York | 65,000
3 | Fraser | Boston | 40,000

Data stored in columns:
1 | 2 | 3
Smith | Jones | Fraser
New York | New York | Boston
50,000 | 65,000 | 40,000

New row added in the figure: 4 | Fraser | Boston | 70,000
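The row-versus-column figure can be sketched in a few lines of Python. This is only an illustration built from the sample employee data on the slide; the variable names are invented for the example, and it does not describe how Hyperstage lays out data internally.

```python
# Sample rows from the slide: (employee_id, name, location, sales)
rows = [
    (1, "Smith", "New York", 50000),
    (2, "Jones", "New York", 65000),
    (3, "Fraser", "Boston", 40000),
]
column_names = ["employee_id", "name", "location", "sales"]

# Pivot the row-oriented records into one list per column.
columns = {
    name: list(values) for name, values in zip(column_names, zip(*rows))
}

# A query such as SUM(sales) reads only the one column it needs,
# which is the "reduced I/O" argument made for columnar storage.
total_sales = sum(columns["sales"])

print(columns["location"])  # ['New York', 'New York', 'Boston']
print(total_sales)          # 155000
```

In the row layout the same sum would have to read every field of every record, including names and locations it never uses.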
Data Packs - The data within each column is stored in groupings of 65,536 values called Data Packs
Data Packs improve data compression because the compression algorithm applied to each pack is chosen based on the data it contains
An average compression ratio of 10:1 is achieved after loading data into Hyperstage. For example, 1TB of raw data can be stored in about 100GB of space.
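A minimal sketch of the Data Pack idea, assuming a column of integers: the column is cut into groups of 65,536 values and each group is compressed on its own. The slides do not describe the actual compression algorithms, so `zlib` is used here purely as a stand-in, and the sample column and function names are hypothetical.

```python
import struct
import zlib

PACK_SIZE = 65_536  # values per Data Pack, as stated on the slide


def split_into_packs(values, pack_size=PACK_SIZE):
    """Split one column's values into consecutive fixed-size packs."""
    return [values[i:i + pack_size] for i in range(0, len(values), pack_size)]


def compress_pack(pack):
    """Compress one pack of integers independently (zlib is a stand-in)."""
    raw = struct.pack(f"<{len(pack)}q", *pack)  # 8 bytes per value
    return raw, zlib.compress(raw)


# Hypothetical column: 200,000 sales figures with few distinct values.
column = [40_000 + (i % 50) * 500 for i in range(200_000)]
packs = split_into_packs(column)

raw_bytes = 0
compressed_bytes = 0
for pack in packs:
    raw, packed = compress_pack(pack)
    raw_bytes += len(raw)
    compressed_bytes += len(packed)

print(len(packs))                    # 4 (three full packs and one partial)
print(raw_bytes / compressed_bytes)  # ratio depends entirely on the data
```

The ratio printed for this synthetic column says nothing about real workloads; the 10:1 figure on the slide is the vendor's field average.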
Data Packs and Compression

Data Packs
  Each data pack contains 65,536 (64K) data values
  Compression is applied to each individual data pack
  The compression algorithm varies depending on data type and data distribution

Compression
  Results vary depending on the distribution of data among data packs
  A typical overall compression ratio seen in the field is 10:1
  Some customers have seen results as high as 40:1

Patent-pending compression algorithms
The Knowledge Grid: Knowledge Nodes

Figure: Column A and Column B, each divided into Pack Rows 1 to 6.

Global Knowledge (built during LOAD)
  String and character data
  Numeric data
  Distributions

Dynamic Knowledge
  Built per query, e.g. for aggregates, joins
This metadata layer = 1% of the compressed volume
Data Pack Nodes (DPN): A separate DPN is created for every data pack in the database to store basic statistical information.
Character Maps (CMAPs): Every Data Pack that contains text gets a matrix that records the occurrence of every possible ASCII character.
Histograms: A histogram with 1024 MIN-MAX intervals is created for every Data Pack that contains numeric data.
Pack-to-Pack Nodes (PPN): PPNs track relationships between Data Packs when tables are joined. Query performance gets better as the database is used.
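The node types above can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the field names, the equal-width reading of "1024 MIN-MAX intervals", and the position-by-character reading of the CMAP matrix are not taken from Hyperstage documentation.

```python
from dataclasses import dataclass


@dataclass
class DataPackNode:
    """Basic statistics for one numeric pack (field names are illustrative)."""
    minimum: int
    maximum: int
    total: int
    count: int


def build_dpn(pack):
    return DataPackNode(min(pack), max(pack), sum(pack), len(pack))


def build_histogram(pack, intervals=1024):
    """Mark which of `intervals` equal-width MIN-MAX slices contain a value."""
    lo, hi = min(pack), max(pack)
    width = (hi - lo) / intervals or 1  # avoid zero width when lo == hi
    present = [False] * intervals
    for value in pack:
        index = min(int((value - lo) / width), intervals - 1)
        present[index] = True
    return present


def build_cmap(pack, max_len=8):
    """For each character position, record which ASCII codes occur there."""
    cmap = [set() for _ in range(max_len)]
    for text in pack:
        for position, char in enumerate(text[:max_len]):
            cmap[position].add(ord(char))
    return cmap


salaries = [40_000, 50_000, 65_000, 70_000]
dpn = build_dpn(salaries)
hist = build_histogram(salaries)
cmap = build_cmap(["Shipping", "Sales", "Support"])

print(dpn.minimum, dpn.maximum)  # 40000 70000
print(sum(hist))                 # 4 occupied intervals
print(ord("T") in cmap[0])       # False: no value here starts with "T"
```

The last line shows why such metadata is useful: a predicate like `job = 'Toronto'` could be ruled out for this pack without reading any of its values.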
How does it work …
WebFOCUS Hyperstage Example: Query and Knowledge Grid

SELECT count(*) FROM employees WHERE salary > 50000 AND age < 65 AND job = 'Shipping' AND city = 'Toronto';

Figure: the Data Packs of the salary, age, job and city columns, each marked with one of three states:
  Completely Irrelevant
  Suspect
  All values match

1. Find the Data Packs with salary > 50000
2. Find the Data Packs that contain age < 65
3. Find the Data Packs that have job = 'Shipping'
4. Find the Data Packs that have city = 'Toronto'
5. Eliminate all pack rows that have been flagged as irrelevant
6. Finally, identify the pack that needs to be decompressed

Result in the figure: all packs are ignored in three of the pack rows, and only one pack is decompressed and scanned.
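The walkthrough above can be sketched as follows. This is a simplified illustration under stated assumptions, not the Hyperstage engine: numeric packs are summarized by their minimum and maximum, text packs by their set of distinct values (a stand-in for the Character Map), and the sample rows, pack sizes and function names are all hypothetical.

```python
IRRELEVANT, SUSPECT, ALL_MATCH = "irrelevant", "suspect", "all match"


def row(salary, age, job, city):
    return {"salary": salary, "age": age, "job": job, "city": city}


def build_metadata(rows):
    """Per-pack-row statistics, built once at load time in this sketch."""
    return {
        "salary": (min(r["salary"] for r in rows), max(r["salary"] for r in rows)),
        "age": (min(r["age"] for r in rows), max(r["age"] for r in rows)),
        "job": {r["job"] for r in rows},
        "city": {r["city"] for r in rows},
    }


def classify_greater(min_max, bound):
    lo, hi = min_max
    if hi <= bound:
        return IRRELEVANT
    return ALL_MATCH if lo > bound else SUSPECT


def classify_less(min_max, bound):
    lo, hi = min_max
    if lo >= bound:
        return IRRELEVANT
    return ALL_MATCH if hi < bound else SUSPECT


def classify_equal(distinct, wanted):
    if wanted not in distinct:
        return IRRELEVANT
    return ALL_MATCH if distinct == {wanted} else SUSPECT


def count_matches(pack_rows, grid):
    """Return (count(*), number of pack rows that had to be scanned)."""
    total, scanned = 0, 0
    for rows, meta in zip(pack_rows, grid):
        verdicts = [
            classify_greater(meta["salary"], 50000),
            classify_less(meta["age"], 65),
            classify_equal(meta["job"], "Shipping"),
            classify_equal(meta["city"], "Toronto"),
        ]
        if IRRELEVANT in verdicts:
            continue                # step 5: whole pack row eliminated
        if all(v == ALL_MATCH for v in verdicts):
            total += len(rows)      # answered from metadata alone
            continue
        scanned += 1                # step 6: only now read the actual values
        total += sum(
            r["salary"] > 50000 and r["age"] < 65
            and r["job"] == "Shipping" and r["city"] == "Toronto"
            for r in rows
        )
    return total, scanned


pack_rows = [
    [row(30000, 40, "Shipping", "Toronto"), row(45000, 50, "Sales", "Boston")],
    [row(60000, 30, "Shipping", "Toronto"), row(40000, 70, "Sales", "Toronto")],
    [row(80000, 45, "Shipping", "Boston"), row(90000, 35, "Shipping", "New York")],
    [row(70000, 50, "Shipping", "Toronto"), row(55000, 60, "Shipping", "Toronto")],
]
grid = [build_metadata(rows) for rows in pack_rows]

print(count_matches(pack_rows, grid))  # (3, 1)
```

In this toy data two pack rows are eliminated, one is counted from its metadata, and only one has to be scanned, which mirrors the point of the slides: most of the data is never decompressed.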
POC Results (Internal Use Only)
Insurance Company
  Query performance issues with SQL Server - Insurance claims analysis
  3 day POC - Compression achieved 40:1
  Most queries running 3X faster in Hyperstage

Large Bank
  Query performance issues with SQL Server - Web traffic analysis
  3 day POC - Compression achieved 10:1
  Queries that ran for 10 to 15 mins in SQL Server ran sub-second in Hyperstage

Government Application
  Query performance issues with Oracle - Federal Loan/Grant Tracking
  3 day POC - Compression achieved 15:1
  Queries that ran for 10 to 15 mins in Oracle ran in 30 secs in Hyperstage

POCs can typically be completed within 3 days
Beyond WebFOCUS
Figure (architecture diagram) components: Java, .Net, WF Connector, WF Service, WebFOCUS Client, WebFOCUS Reporting Server, WF Hyperstage Adapter, WebFOCUS Hyperstage Server, Generic App (Java, C, .Net, PHP, Perl)

Hyperstage is integrated in the WebFOCUS BI Architecture through the reporting server and is administered using the WebFOCUS console
WebFOCUS client applications communicate directly through the reporting server
Custom applications developed via Java or .Net can access the reporting server via WebFOCUS services and a supplied WebFOCUS connector
Hyperstage also supports connections from any application via industry standard JDBC or ODBC connections. There are also native drivers for .NET, C, or PHP applications to connect directly to the Hyperstage engine.
Data can be loaded and maintained in Hyperstage using iWay Data Integration or using any commercial ETL tool.
Hyperstage vs. OLAP
Many companies are looking to migrate from legacy OLAP solutions
Hyperstage can offer excellent query performance with a commonly understood star schema database
WebFOCUS can offer navigation and drill-path navigation
Hyperstage can support large numbers of dimensional attributes and can be easily updated

OLAP                                                 | WebFOCUS Hyperstage
Limited number of dimensions                         | Supports up to 4096 columns on a single table
Difficult to add new dimensions                      | Dimension tables can be updated
Rebuilding cubes can be slow                         | Bulk loads of up to 500GB per hour
Up to 10X raw data size to amount of disk consumed   | Typically 10:1 compression
Hyperstage vs. In-Memory
WebFOCUS Hyperstage is a viable alternative to BI tools that utilize an in-memory architecture like QlikView, Tableau, Cognos TM1 and Tibco/Spotfire
In-memory is limited to the amount of data you can store in RAM.
Hyperstage is a hybrid approach that efficiently uses disk I/O without sacrificing the performance achieved by in-memory
Tableau for example has approximately a 100GB limit on its in-memory cache.

In Memory Solutions            | WebFOCUS Hyperstage
Storage: RAM                   | Storage: RAM/Disk
Expensive                      | Cheap
Short term                     | Long Term
Requires additional hardware   | Leverage existing hardware
Demonstration …
NYSE Daily Stock Price History
Daily history from 1970 to 2006 for 7000 stocks, downloaded from the internet
  14 million rows
  1.4GB of raw data
  Compressed to 70MB
Test query summarizes stock information for top tech companies in March 2000 and compares the information for the same period in March 2002 (dot com collapse)
Note: Hyperstage running on a Dell laptop with one dual-core processor and 4GB of RAM
NYSE Daily Stock Price History (exploded)
Simulated additional stock prices up to 2043
  2 billion rows
  200GB of raw data
  Compressed to 17GB
Test query summarizes stock information for top tech companies in March 2000 and compares the information for the same period in March 2002 (dot com collapse)
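The compression ratios implied by the two demonstration datasets can be worked out directly from the sizes quoted on the slides. The short calculation below assumes decimal units (1 GB = 1000 MB); the helper name is invented for the example.

```python
def compression_ratio(raw_mb, compressed_mb):
    """Return raw size divided by compressed size."""
    return raw_mb / compressed_mb


# Sizes quoted on the demonstration slides, converted to MB.
datasets = {
    "NYSE history (14 million rows)": (1_400, 70),
    "NYSE exploded (2 billion rows)": (200_000, 17_000),
}

for name, (raw_mb, compressed_mb) in datasets.items():
    print(f"{name}: {compression_ratio(raw_mb, compressed_mb):.1f}:1")
# NYSE history (14 million rows): 20.0:1
# NYSE exploded (2 billion rows): 11.8:1
```

Both figures sit at or above the 10:1 field average quoted earlier in the deck.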
WebFOCUS Hyperstage: The Big Deal…
No indexes
No partitions
No views
No materialized aggregates

Value proposition
  Low IT overhead
  Allows for autonomy from IT
  Ease of implementation
  Fast time to market
  Less Hardware
  Lower TCO
No DBA Required!
Q&A