
Peter Azzarello

April 11, 2012

IB Toronto User Forum

WebFOCUS Hyperstage Overview

Summit 2012

WebFOCUS: Higher Adoption & Reuse with Lower TCO

Reporting

Query & Analysis

Dashboards

Information Delivery

Performance Management

Enterprise Search

Visualization & Mapping

Data Updating

Predictive Analytics

MS Office & e-Publishing

Extended BI

Core BI

Extensions to the WebFOCUS platform allow you to build more application types at a lower cost.

Business to Business

Data Warehouse & ETL

Master Data Management

Data Profiling & Data Quality

Business Activity Monitoring

High Performance Data Store

Mobile Applications


The Business Challenge

Big Data

Big Data and Machine-Generated Data

[Chart: data storage over time, machine-generated data versus human-generated data]

Today’s Top Data-Management Challenge

Source: "Keeping Up with Ever-Expanding Enterprise Data," Joseph McKendrick, Unisphere Research, October 2010

How Performance Issues Are Typically Addressed, by Pace of Data Growth

[Bar chart, 0% to 100%, comparing High Growth and Low Growth organizations across six responses: Don't know / unsure; Upgrade networking infrastructure; Archive older data on other systems; Upgrade/expand storage systems; Upgrade server hardware/processors; Tune or upgrade existing databases. Data labels as extracted, first series: 7%, 21%, 30%, 33%, 54%, 66%; second series: 4%, 32%, 44%, 60%, 70%, 75%.]

When organizations have long-running queries that limit the business, the response is often to spend much more time and money to resolve the problem.

IT managers try to mitigate these response times ...


Traditional Data Warehousing

Labor intensive, heavy indexing, aggregations and partitioning

Hardware intensive: massive storage; big servers

Expensive and complex

More Data, More Data Sources

More Kinds of Output Needed by More Users, More Quickly

Limited Resources and Budget


Real time data

Multiple databases

External Sources

Data Warehousing Challenges

New Demands:
• Larger transaction volumes driven by the internet
• Impact of cloud computing
• More -> Faster -> Cheaper

Data Warehousing Matures:
• Near real-time updates
• Integration with master data management
• Data mining using discrete business transactions
• Provision of data for business-critical applications

Early Data Warehouse Characteristics:
• Integration of internal systems
• Monthly and weekly loads
• Heavy use of aggregates

Data Warehousing Challenges

CUBES/OLAP

Classic Approaches to deal with Large Data

INDEXES

Limitations of Indexes

• Increased space requirements: the sum of index space requirements can exceed the source DB
• Index management: increases load times (building the index)
• Predefines a fixed access path

Limitations of OLAP

• Cube technology has limited scalability
  o Number of dimensions is limited
  o Amount of data is limited
• Cube technology is difficult to update (for example, to add a dimension)
  o Usually requires a complete rebuild
  o Cube builds are typically slow
  o New design results in a new cube

Easy Migration to Hyperstage

• Most cubes will be fed from a relational source
• Commonly, that relational source is a star schema
• The source star schema can be migrated directly to Hyperstage
• WebFOCUS metadata can be used to define hierarchies and drill paths to navigate the star schema


Pivoting Your Perspective: Columnar Technology ...

1. Impediments to business agility: Organizations often must wait for DBAs to create indexes or other tuning structures, thereby delaying access to data. In addition, indexes significantly slow data-loading operations and increase the size of the database, sometimes by a factor of two.

2. Loss of data and time fidelity: IT generally performs ETL operations in batch mode during non-business hours. Such transformations delay access to data and often result in mismatches between operational and analytic databases.

3. Limited ad hoc capability: Response times for ad hoc queries increase as the volume of data grows. Unanticipated queries (where DBAs have not tuned the database in advance) can result in unacceptable response times, and may even fail to complete.

4. Unnecessary expenditures: Attempts to improve performance using hardware acceleration and database tuning schemes raise the capital costs of equipment and the operational costs of database administration. Further, the added complexity of managing a large database diverts operational budgets away from more urgent IT projects.

These Solutions Contribute to Operational Limitations

The Limitation of Rows

Row-based databases are ubiquitous because so many of our most important business systems are transactional.

Row-oriented databases are well suited for transactional environments, such as a call center where a customer's entire record is required when their profile is retrieved and/or when fields are frequently updated.

The Ubiquity of Rows …

But disk I/O becomes a substantial limiting factor, since a row-oriented design forces the database to retrieve all column data for any query.

30 columns, 50 million rows

The Limitation of Rows

Row Oriented: (1, Smith, New York, 50000; 2, Jones, New York, 65000; 3, Fraser, Boston, 40000; 4, Fraser, Boston, 70000)
• Works well if all the columns are needed for every query
• Efficient for transactional processing if all the data for the row is available

Column Oriented: (1, 2, 3, 4; Smith, Jones, Fraser, Fraser; New York, New York, Boston, Boston; 50000, 65000, 40000, 70000)
• Works well with aggregate results (sum, count, avg.)
• Only columns that are relevant need to be touched
• Consistent performance with any database design
• Allows for very efficient compression
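The difference between the two layouts can be sketched in a few lines of Python. This is only an illustration using the four employee records from the slide: plain in-memory lists stand in for disk storage, and the function names and the "values touched" counter are assumptions made for the example, not part of Hyperstage.

```python
# Illustrative only: Python lists stand in for on-disk storage.
# Row orientation: each record is stored together.
rows = [
    (1, "Smith", "New York", 50000),
    (2, "Jones", "New York", 65000),
    (3, "Fraser", "Boston", 40000),
    (4, "Fraser", "Boston", 70000),
]

# Column orientation: the same data pivoted, one list per column.
columns = {
    "employee_id": [r[0] for r in rows],
    "name": [r[1] for r in rows],
    "location": [r[2] for r in rows],
    "sales": [r[3] for r in rows],
}


def total_sales_row_store(table):
    """Sum sales from row storage; returns (total, values_touched)."""
    total = 0
    touched = 0
    for row in table:
        touched += len(row)  # the whole row is read to reach one field
        total += row[3]
    return total, touched


def total_sales_column_store(cols):
    """Sum sales from column storage; only the sales column is read."""
    sales = cols["sales"]
    return sum(sales), len(sales)


row_total, row_touched = total_sales_row_store(rows)
col_total, col_touched = total_sales_column_store(columns)
print(row_total, row_touched)
print(col_total, col_touched)
```

Both layouts give the same total, but the row layout reads every field of every record while the column layout reads only the one column the aggregate needs. With 30 columns and 50 million rows, that gap is what drives the disk I/O difference described above.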

Pivoting Your Perspective: Columnar Technology

Employee Id   Name     Location   Sales
1             Smith    New York   50,000
2             Jones    New York   65,000
3             Fraser   Boston     40,000
4             Fraser   Boston     70,000

Data stored in rows: 1 Smith New York 50,000 | 2 Jones New York 65,000 | 3 Fraser Boston 40,000 | 4 Fraser Boston 70,000

Data stored in columns: 1 2 3 4 | Smith Jones Fraser Fraser | New York New York Boston Boston | 50,000 65,000 40,000 70,000


Introducing WebFOCUS Hyperstage

The Hyperstage Mission

Improve database performance for WebFOCUS applications with less hardware, no database tuning, and easy migration.

The WebFOCUS Hyperstage high-performance analytic data store is designed to handle business-driven queries on large volumes of data without IT intervention. Easy to implement and manage, Hyperstage provides the answers your business users need at a price you can afford.

Introducing WebFOCUS Hyperstage ….

What is it?

Hyperstage combines a columnar database with intelligence we call the Knowledge Grid to deliver fast query responses.



How is it architected?

Hyperstage Engine

Knowledge Grid

Compressor

BulkLoader

Unmatched Administrative Simplicity
• No indexes
• No data partitioning
• No manual tuning

• Self-managing: 90% less administrative effort
• Low-cost: More than 50% less than alternative solutions
• Scalable, high-performance: Up to 50 TB using a single industry-standard server
• Fast queries: Ad hoc queries are as fast as anticipated queries, so users have total flexibility
• Compression: Data compression of 10:1 to 40:1 means far less storage is needed, and it might mean you can fit the entire database in memory


What does this mean for Customers?

Creates information (metadata) about the data and, upon load, automatically:
o Stores it in the Knowledge Grid (KG)
o KG is loaded into memory
o Less than 1% of compressed data size

Uses the metadata when processing a query to eliminate or reduce the need to access data:
o The less data that needs to be accessed, the faster the response
o Sub-second responses when answered by the KG

Architecture Benefits:
o No need to partition data, create/maintain indexes or projections, or tune for performance
o Ad hoc queries are as fast as static queries, so users have total flexibility
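To make the idea of answering a query from metadata more concrete, here is a minimal Python sketch. It assumes the stored metadata is simply the minimum, maximum, and row count of each fixed-size block of a column; the slides do not say which statistics the Knowledge Grid actually keeps, so `PackStats`, `PACK_SIZE`, and both functions are hypothetical names for illustration only.

```python
from dataclasses import dataclass

PACK_SIZE = 4  # hypothetical block size, kept tiny so the example is easy to trace


@dataclass
class PackStats:
    """Assumed per-block metadata: min, max, and number of values."""
    lo: int
    hi: int
    count: int


def build_packs(values, pack_size=PACK_SIZE):
    """Split a column into blocks and record statistics for each block at load time."""
    packs = [values[i:i + pack_size] for i in range(0, len(values), pack_size)]
    stats = [PackStats(min(p), max(p), len(p)) for p in packs]
    return packs, stats


def count_greater_than(packs, stats, threshold):
    """Count values > threshold, reading a block only when its stats cannot decide."""
    matches = 0
    opened = 0
    for pack, s in zip(packs, stats):
        if s.hi <= threshold:
            continue  # no value in this block can match: skip it entirely
        if s.lo > threshold:
            matches += s.count  # every value matches: answered from metadata alone
            continue
        opened += 1  # stats are inconclusive: the block's data must be read
        matches += sum(1 for v in pack if v > threshold)
    return matches, opened


sales = [10, 12, 15, 18, 40, 45, 52, 58, 90, 95, 97, 99]
packs, stats = build_packs(sales)
matches, opened = count_greater_than(packs, stats, 50)
print(matches, opened)
```

In this toy run only one of the three blocks has to be read. A query whose threshold falls between blocks (for example 20) is answered without reading any data at all, which is the kind of case the slide describes as being answered by the KG.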


How does it work?

WebFOCUS Hyperstage Runtime Architecture

[Diagram components: Hypercopy; Hyperstage Server; Hyperstage Engine; MySQL; WebFOCUS Server; WebFOCUS Pro Server; Hyperstage Adapter; Knowledge Grid; Compressor; Bulk Loader]

Smarter Architecture

• No maintenance
• No query planning
• No partition schemes
• No DBA

Data Packs – data stored in manageably sized, highly compressed data packs

Knowledge Grid – statistics and metadata “describing” the super-compressed data

Column Orientation

WebFOCUS Hyperstage Engine

Data compressed using algorithms tailored to data type
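As a rough illustration of why storing one column at a time lets compression be matched to the data type, the Python sketch below applies run-length encoding to a repetitive text column and delta encoding to an increasing integer column. These two techniques are chosen only as familiar examples; the slides do not state which algorithms Hyperstage uses, and all function names here are hypothetical.

```python
from itertools import groupby


def rle_encode(values):
    """Run-length encode a column: [(value, run_length), ...]."""
    return [(v, len(list(g))) for v, g in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [v for v, n in runs for _ in range(n)]


def delta_encode(numbers):
    """Keep the first number, then store differences between neighbors."""
    if not numbers:
        return []
    return [numbers[0]] + [b - a for a, b in zip(numbers, numbers[1:])]


def delta_decode(deltas):
    """Rebuild the original numbers with a running sum."""
    out = []
    total = 0
    for d in deltas:
        total += d
        out.append(total)
    return out


# Columns from the employee example used earlier in the deck.
location = ["New York", "New York", "Boston", "Boston"]
employee_id = [1, 2, 3, 4]

location_runs = rle_encode(location)   # repeated text collapses into runs
id_deltas = delta_encode(employee_id)  # a sequence becomes small, uniform values
print(location_runs)
print(id_deltas)
```

Because all values in a column share a type and often resemble their neighbors, each column can be given the encoding that suits it; a row layout interleaves types and loses that opportunity.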

How does it work?


Summary


Business Intelligence – Meeting Requirements

WebFOCUS Hyperstage: The Big Deal ...

• No indexes
• No partitions
• No views
• No materialized aggregates

Value proposition
• Low IT overhead
• Allows for autonomy from IT
• Ease of implementation
• Fast time to market
• Less hardware
• Lower TCO

No DBA Required!

What’s it look like?

Pay no attention to that man behind the curtain.

CREATE FILE baseapp/pa_inventory_ind_t DROP
-RUN

BULKLOAD baseapp/pa_inventory_ind_t FOR SQLINLD INV_CODE; TYPE; CATEGORY; NAME; MODEL; MEASURE1_INV; MEASURE2_INV; MEASURE3_INV;

JOIN SYMBOLS.SYMBOLS.SYMBOL IN SYMBOLS TO MULTIPLE QUOTES_2B.QUOTES_2B.SYMBOL IN QUOTES_2B TAG J0 AS J0
END
TABLE FILE SYMBOLS
PRINT SYMBOL CLOSE_DATE CLOSE_PRICE VOLUME OPEN_PRICE
WHERE ( SYMBOL EQ '&SYMBOL.(<MSFT,MSFT>).SYMBOL.' ) AND ( CLOSE_DATE GT '&START_DATE.(<2000-03-01,2000-03-01>).yyyy-mm-dd.' ) AND ( CLOSE_DATE LT '&END_DATE.(<2000-03-31,2000-03-31>).yyyy-mm-dd.' );
ON TABLE SET PAGE-NUM NOLEAD
ON TABLE NOTOTAL
ON TABLE PCHOLD FORMAT HTML
ON TABLE SET HTMLCSS ON
ON TABLE SET STYLE *
INCLUDE = endeflt, $
ENDSTYLE
END

Example: FOCUS to Hyperstage Compression (243,639 Rows)

Q&A


STAR SCHEMA CONSIDERATIONS

Leverage the Knowledge Grid

• Do constrain the fact table directly

• Do use sub-selects instead of joins

• Do use date based constraints as much as possible

• Do add additional columns to create useful knowledge nodes

Everyone wants to be a Star

Adding as many WHERE conditions as you can to your SQL increases the chance that knowledge grid statistics can be used to increase the performance of your queries.
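The advice above (constrain the fact table directly, prefer sub-selects to joins, use date constraints) can be illustrated with a small star schema. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in so the SQL can be run anywhere; it shows only that the two query shapes return the same answer and says nothing about Hyperstage performance. The table and column names (`fact_sales`, `dim_store`, and so on) are hypothetical.

```python
import sqlite3

# In-memory SQLite database used only to make the SQL runnable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE fact_sales (store_id INTEGER, sale_date TEXT, amount INTEGER);
    INSERT INTO dim_store VALUES (1, 'New York'), (2, 'Boston');
    INSERT INTO fact_sales VALUES
        (1, '2012-03-05', 50000),
        (1, '2012-04-02', 65000),
        (2, '2012-03-20', 40000),
        (2, '2012-03-28', 70000);
""")

# Join form: the city constraint sits on the dimension table.
join_sql = """
    SELECT SUM(f.amount)
    FROM fact_sales f
    JOIN dim_store d ON d.store_id = f.store_id
    WHERE d.city = 'Boston'
      AND f.sale_date >= '2012-03-01' AND f.sale_date < '2012-04-01'
"""

# Sub-select form: every constraint is expressed directly on fact table columns,
# including a date range on the fact table itself.
subselect_sql = """
    SELECT SUM(amount)
    FROM fact_sales
    WHERE store_id IN (SELECT store_id FROM dim_store WHERE city = 'Boston')
      AND sale_date >= '2012-03-01' AND sale_date < '2012-04-01'
"""

join_total = conn.execute(join_sql).fetchone()[0]
subselect_total = conn.execute(subselect_sql).fetchone()[0]
print(join_total, subselect_total)
```

In the sub-select form, the conditions on `store_id` and `sale_date` apply to fact table columns, which is the shape the slide suggests gives column statistics the best chance of ruling out data before it is read.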
