Maximize WebFOCUS Performance with Hyperstage

Louisville User Group Meeting April 25, 2012 Lori Pieper Maximize WebFOCUS Performance with Hyperstage


Page 1: Maximize  WebFOCUS  Performance with  Hyperstage

Louisville User Group Meeting, April 25, 2012

Lori Pieper

Maximize WebFOCUS Performance with Hyperstage

Page 2: Maximize  WebFOCUS  Performance with  Hyperstage

Agenda

• The “Big Data” Business Challenge
• Pivoting Your Perspective
• Introducing WebFOCUS Hyperstage
• How does it work?
• So what’s the big deal?
• Demonstration
• Wrap Up and Q&A

Page 3: Maximize  WebFOCUS  Performance with  Hyperstage

Copyright 2007, Information Builders. Slide 3

The “Big Data” Business Challenge

Page 4: Maximize  WebFOCUS  Performance with  Hyperstage


Traditional Data Warehousing

Labor intensive, heavy indexing, aggregations and partitioning

Hardware intensive: massive storage; big servers

Expensive and complex

More Data, More Data Sources

More Kinds of Output Needed by More Users, More Quickly

Limited Resources and Budget


Real time data

Multiple databases

External Sources

Data Warehousing Challenges

Page 5: Maximize  WebFOCUS  Performance with  Hyperstage

Source: KEEPING UP WITH EVER-EXPANDING ENTERPRISE DATA ( Joseph McKendrick Unisphere Research October 2010)

How Performance Issues are Typically Addressed – by Pace of Data Growth

Bar chart (x-axis 0% to 100%; series: High Growth, Low Growth)

Responses, in chart order: Don't Know / Unsure; Upgrade networking infrastructure; Archive older data on other systems; Upgrade/expand storage systems; Upgrade server hardware/processors; Tune or upgrade existing databases

Bar values, first series: 7%, 21%, 30%, 33%, 54%, 66%

Bar values, second series: 4%, 32%, 44%, 60%, 70%, 75%

When long-running queries limit the business, the usual response is to spend much more time and money to resolve the problem.

IT managers try to mitigate these response times …

Page 6: Maximize  WebFOCUS  Performance with  Hyperstage

Limitations of “Traditional” Solutions

Adding indexes:
• Increases disk space requirements
  • Sum of index space requirements can even exceed the source DB
• Index management
  • Increases load times to build the index
  • Predefines a fixed access path

Reports run slow if you haven’t “anticipated” the reporting needs correctly

Page 7: Maximize  WebFOCUS  Performance with  Hyperstage

Limitations of “Traditional” Solutions

Building OLAP cubes:
• Cube technology has limited scalability
  • Number of dimensions is limited
  • Amount of data is limited
• Cube technology is difficult to update (add dimension)
  • Usually requires a complete rebuild
  • Cube builds are typically slow
  • New design results in a new cube

Reports run slow if you haven’t “anticipated” the reporting needs correctly

Page 8: Maximize  WebFOCUS  Performance with  Hyperstage


Pivoting Your Perspective: Turn Row-based into Column-based

Page 9: Maximize  WebFOCUS  Performance with  Hyperstage

Row-based databases are ubiquitous because so many of our most important business systems are transactional.

Row-oriented databases are well suited for transactional environments, such as a call center where a customer’s entire record is required when their profile is retrieved and/or when fields are frequently updated.

The Ubiquity of Rows …

But disk I/O becomes a substantial limiting factor, since a row-oriented design forces the database to retrieve all column data for any query.

Example table: 30 columns, 50 million rows

Why is Row-based Limiting for Analytics?
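
A back-of-the-envelope sketch of the I/O argument on this slide, in Python. The 30 columns and 50 million rows come from the slide; the 8-byte value width and the 3-column query are assumptions chosen only to make the arithmetic concrete, and compression is ignored.

```python
# Rough scan-size estimate; figures marked "assumed" are not from the slides.
ROWS = 50_000_000        # from the slide
COLUMNS = 30             # from the slide
BYTES_PER_VALUE = 8      # assumed fixed width, for illustration only
COLUMNS_IN_QUERY = 3     # assumed: an analytic query that touches 3 columns


def bytes_scanned(rows: int, columns_read: int, width: int = BYTES_PER_VALUE) -> int:
    """Bytes read when `columns_read` columns are fetched for every row."""
    return rows * columns_read * width


# A row layout has to read every column of every row.
row_store = bytes_scanned(ROWS, COLUMNS)
# A column layout reads only the columns the query names.
column_store = bytes_scanned(ROWS, COLUMNS_IN_QUERY)

print(f"row-oriented scan:    {row_store / 1e9:.1f} GB")
print(f"column-oriented scan: {column_store / 1e9:.1f} GB")
print(f"reduction:            {row_store // column_store}x")
```

Under these assumptions the saving is simply the ratio of columns stored to columns queried.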

Page 10: Maximize  WebFOCUS  Performance with  Hyperstage

Why is Column-based Perfect for Analytics?

Employee Id   Name     Location   Sales
1             Smith    New York   50,000
2             Jones    New York   65,000
3             Fraser   Boston     40,000
4             Fraser   Boston     70,000

Row Oriented (1, Smith, New York, 50000; 2, Jones, New York, 65000; 3, Fraser, Boston, 40000; 4, Fraser, Boston, 70000)
• Works well if all the columns are needed for every query
• Efficient for transactional processing if all the data for the row is available

Column Oriented (1, 2, 3, 4; Smith, Jones, Fraser, Fraser; New York, New York, Boston, Boston; 50000, 65000, 40000, 70000)
• Works well with aggregate results (sum, count, avg.)
• Only columns that are relevant need to be touched
• Consistent performance with any database design
• Allows for very efficient compression
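
The two layouts on this slide can be sketched in a few lines of Python. The data is the four-employee table from the slide; the field names and the use of plain lists are assumptions for illustration, not how Hyperstage stores data on disk.

```python
# The employee table from the slide, stored row by row.
rows = [
    (1, "Smith", "New York", 50000),
    (2, "Jones", "New York", 65000),
    (3, "Fraser", "Boston", 40000),
    (4, "Fraser", "Boston", 70000),
]

# The same table stored column by column: one list per column.
# The column names are illustrative, not taken from a real schema.
names = ("employee_id", "name", "location", "sales")
columns = {name: [row[i] for row in rows] for i, name in enumerate(names)}

# Row layout: summing sales walks every full record.
total_from_rows = sum(row[3] for row in rows)

# Column layout: only the "sales" column is touched.
total_from_columns = sum(columns["sales"])

print(total_from_rows, total_from_columns)
# Repeated neighbouring values in a column are what make compression effective.
print(columns["location"])
```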

Page 11: Maximize  WebFOCUS  Performance with  Hyperstage

Why is Column-based Perfect for Analytics?

Data stored in rows
1 Smith New York 50,000
2 Jones New York 65,000
3 Fraser Boston 40,000
4 Fraser Boston 70,000

Data stored in columns
Employee Id: 1, 2, 3, 4
Name: Smith, Jones, Fraser, Fraser
Location: New York, New York, Boston, Boston
Sales: 50,000, 65,000, 40,000, 70,000

Page 12: Maximize  WebFOCUS  Performance with  Hyperstage


Introducing Hyperstage

Page 13: Maximize  WebFOCUS  Performance with  Hyperstage

Hyperstage is a high-performance analytic data store designed to handle business-driven queries on large volumes of data with minimal IT intervention, achieving outstanding query performance with less hardware, no database tuning and easy migration.

Introducing WebFOCUS Hyperstage ….

Page 14: Maximize  WebFOCUS  Performance with  Hyperstage

Easy to implement and manage, Hyperstage provides the answers to your business users’ needs at a price you can afford.

Introducing WebFOCUS Hyperstage ….

But really…What is it?

Page 15: Maximize  WebFOCUS  Performance with  Hyperstage

Hyperstage combines a columnar database with intelligence we call the Knowledge Grid to deliver fast query responses.

Introducing WebFOCUS Hyperstage ….

How is it architected?

Hyperstage Engine
• Knowledge Grid
• Compressor
• BulkLoader

Unmatched Administrative Simplicity:
• No indexes
• No data partitioning
• No materialized views

Page 16: Maximize  WebFOCUS  Performance with  Hyperstage

Hyperstage adds data compression of 10:1 to 40:1 so you can manage large amounts of data using a much smaller disk footprint.

Introducing WebFOCUS Hyperstage ….

How is it architected?

Hyperstage Engine
• Knowledge Grid
• Compressor
• BulkLoader

Powerful Data Compression:
• Store terabytes of data with only gigabytes of disk space
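
As a quick worked example of the "terabytes in gigabytes" claim, the Python sketch below applies the 10:1 and 40:1 ratios quoted on this slide to an assumed 1 TB of raw data. The 1 TB figure is an illustration only, not a number from the presentation.

```python
def compressed_size_gb(raw_gb: float, ratio: float) -> float:
    """Disk footprint in GB for `raw_gb` of data at a `ratio`:1 compression ratio."""
    return raw_gb / ratio


RAW_GB = 1000.0  # assumed: 1 TB of raw data, for illustration only

for ratio in (10, 40):  # the ratio range quoted on the slide
    print(f"{ratio}:1 -> {compressed_size_gb(RAW_GB, ratio):.0f} GB on disk")
```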

Page 17: Maximize  WebFOCUS  Performance with  Hyperstage

Hyperstage adds a bulk loader plus an easy-to-use extraction and load tool, called HyperCopy, making data loading a breeze.

Introducing WebFOCUS Hyperstage ….

How is it architected?

Hyperstage Engine
• Knowledge Grid
• Compressor
• BulkLoader

Includes embedded ETL:
• Easy and seamless migration of existing analytical databases
• No change in query or application required

Page 18: Maximize  WebFOCUS  Performance with  Hyperstage


How Does it Work?

Page 19: Maximize  WebFOCUS  Performance with  Hyperstage

Smarter Architecture

• No maintenance
• No query planning
• No partition schemes
• Easy “load and go”

Data Packs – data stored in manageably sized, highly compressed data packs

Knowledge Grid – statistics and metadata “describing” the super-compressed data

Column Orientation

WebFOCUS Hyperstage Engine

Data compressed using algorithms tailored to data type

How does it work?

Page 20: Maximize  WebFOCUS  Performance with  Hyperstage

Data Packs and Compression

Data Packs
• Each data pack contains 65,536 (64K) data values
• Compression is applied to each individual data pack
• The compression algorithm varies depending on data type and data distribution

Compression
• Results vary depending on the distribution of data among data packs
• A typical overall compression ratio seen in the field is 10:1
• Some customers have seen results as high as 40:1

Patent-pending compression algorithms

Data Organization and the Knowledge Grid ….
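
A minimal Python sketch of the data-pack idea: one column is cut into packs of 65,536 values and each pack is compressed on its own. The pack size is from the slide. Everything else is an assumption: `zlib` stands in for Hyperstage's patent-pending algorithms, which are not described in the presentation, and the salary column is made-up sample data.

```python
import struct
import zlib

PACK_SIZE = 65_536  # values per data pack, from the slide


def split_into_packs(values, pack_size=PACK_SIZE):
    """Yield consecutive slices of at most `pack_size` values."""
    for start in range(0, len(values), pack_size):
        yield values[start:start + pack_size]


def compress_pack(pack):
    """Return (raw bytes, compressed bytes) for one pack of integers.

    zlib is only a stand-in here; it is not the algorithm Hyperstage uses.
    """
    raw = struct.pack(f"<{len(pack)}q", *pack)  # 8 bytes per value
    return raw, zlib.compress(raw)


# Hypothetical column: 200,000 salaries drawn from 50 distinct values.
salaries = [40_000 + (i % 50) * 1_000 for i in range(200_000)]

packs = list(split_into_packs(salaries))
for number, pack in enumerate(packs):
    raw, packed = compress_pack(pack)
    print(f"pack {number}: {len(pack)} values, {len(raw)} -> {len(packed)} bytes")
```

Because each pack is compressed independently, a query that needs one pack does not have to decompress its neighbours.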

Page 21: Maximize  WebFOCUS  Performance with  Hyperstage

Data Organization and the Knowledge Grid ….

This knowledge grid layer = 1% of the compressed volume

Data Pack Nodes (DPN): A separate DPN is created for every data pack in the database to store basic statistical information.

Character Maps (CMAPs): For every Data Pack that contains text, a matrix is created that records the occurrence of every possible ASCII character.

Histograms: A histogram is created for every Data Pack that contains numeric data, dividing the pack's MIN-MAX range into 1024 intervals.

Pack-to-Pack Nodes (PPN): PPNs track relationships between Data Packs when tables are joined. Query performance gets better as the database is used.
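
The Knowledge Grid structures above can be sketched roughly as follows in Python. The 1024 intervals come from the slide; the specific DPN fields, the presence-flag form of the histogram, and the per-position form of the character map are assumptions made for illustration, since the slides do not give the actual layouts. Pack-to-Pack Nodes are left out.

```python
from dataclasses import dataclass

HISTOGRAM_INTERVALS = 1024  # from the slide


@dataclass
class DataPackNode:
    """Basic statistics for one pack; the exact fields are an assumption."""
    minimum: int
    maximum: int
    total: int
    count: int


def build_dpn(pack):
    """Summarise a numeric pack without keeping its values."""
    return DataPackNode(min(pack), max(pack), sum(pack), len(pack))


def build_histogram(pack, intervals=HISTOGRAM_INTERVALS):
    """Flag which of `intervals` equal slices of [min, max] hold at least one value."""
    low, high = min(pack), max(pack)
    span = max(high - low, 1)  # avoid dividing by zero for constant packs
    present = [False] * intervals
    for value in pack:
        slot = min((value - low) * intervals // span, intervals - 1)
        present[slot] = True
    return present


def build_cmap(pack):
    """Record which characters occur at each position of the strings in a text pack."""
    cmap = {}
    for text in pack:
        for position, char in enumerate(text):
            cmap.setdefault(position, set()).add(char)
    return cmap


# Tiny hypothetical packs, far smaller than a real 65,536-value pack.
salary_pack = [40000, 50000, 65000, 70000]
job_pack = ["Shipping", "Sales", "Support"]

dpn = build_dpn(salary_pack)
histogram = build_histogram(salary_pack)
cmap = build_cmap(job_pack)

print(dpn)
print(sum(histogram), "of", len(histogram), "intervals occupied")
print(sorted(cmap[0]))
```

With metadata like this, a predicate such as job = 'Clerk' could be ruled out for the pack without decompressing it, because no string in the pack starts with "C".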

Page 22: Maximize  WebFOCUS  Performance with  Hyperstage

salary age job city

Completely Irrelevant

Suspect

All values match

SELECT count(*) FROM employees WHERE salary > 50000 AND age < 65 AND job = 'Shipping' AND city = 'Louisville';

WebFOCUS Hyperstage Example: Query and Knowledge Grid

Page 23: Maximize  WebFOCUS  Performance with  Hyperstage

salary age job city

1. Find the Data Packs with salary > 50000

SELECT count(*) FROM employees WHERE salary > 50000 AND age < 65 AND job = 'Shipping' AND city = 'Louisville';

WebFOCUS Hyperstage Example: salary > 50000

Completely Irrelevant

All values match

Suspect

Page 24: Maximize  WebFOCUS  Performance with  Hyperstage

salary age job city

1. Find the Data Packs with salary > 50000
2. Find the Data Packs that contain age < 65

SELECT count(*) FROM employees WHERE salary > 50000 AND age < 65 AND job = 'Shipping' AND city = 'Louisville';

WebFOCUS Hyperstage Example: age < 65

Completely Irrelevant

Suspect

All values match

Page 25: Maximize  WebFOCUS  Performance with  Hyperstage

salary age job city

1. Find the Data Packs with salary > 50000
2. Find the Data Packs that contain age < 65
3. Find the Data Packs that have job = 'Shipping'

SELECT count(*) FROM employees WHERE salary > 50000 AND age < 65 AND job = 'Shipping' AND city = 'Louisville';

WebFOCUS Hyperstage Example: job = 'Shipping'

Completely Irrelevant

Suspect

All values match

Page 26: Maximize  WebFOCUS  Performance with  Hyperstage

salary age job city

1. Find the Data Packs with salary > 50000
2. Find the Data Packs that contain age < 65
3. Find the Data Packs that have job = 'Shipping'
4. Find the Data Packs that have city = 'Louisville'

SELECT count(*) FROM employees WHERE salary > 50000 AND age < 65 AND job = 'Shipping' AND city = 'Louisville';

WebFOCUS Hyperstage Example: city = 'Louisville'

Completely Irrelevant

Suspect

All values match

Page 27: Maximize  WebFOCUS  Performance with  Hyperstage

salary age job city

All packs ignored
All packs ignored
All packs ignored

1. Find the Data Packs with salary > 50000
2. Find the Data Packs that contain age < 65
3. Find the Data Packs that have job = 'Shipping'
4. Find the Data Packs that have city = 'Louisville'
5. Eliminate all pack rows that have been flagged as irrelevant

SELECT count(*) FROM employees WHERE salary > 50000 AND age < 65 AND job = 'Shipping' AND city = 'Louisville';

WebFOCUS Hyperstage Example: Eliminate Pack Rows

Completely Irrelevant

Suspect

All values match

Page 28: Maximize  WebFOCUS  Performance with  Hyperstage

salary age job city

All packs ignored
Only this pack will be decompressed
All packs ignored
All packs ignored

1. Find the Data Packs with salary > 50000
2. Find the Data Packs that contain age < 65
3. Find the Data Packs that have job = 'Shipping'
4. Find the Data Packs that have city = 'Louisville'
5. Eliminate all pack rows that have been flagged as irrelevant
6. Finally, identify the pack that needs to be decompressed

SELECT count(*) FROM employees WHERE salary > 50000 AND age < 65 AND job = 'Shipping' AND city = 'Louisville';

WebFOCUS Hyperstage Example: Decompress and scan

Completely Irrelevant

Suspect

All values match
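
The six steps of this walkthrough can be sketched in Python as follows. This is an illustration of the pruning idea only, not Hyperstage's implementation: the pack metadata is made up, numeric predicates are checked against an assumed per-pack (min, max), and text predicates are checked against an assumed set of values present in the pack, standing in for the character-map test.

```python
from enum import Enum


class Verdict(Enum):
    IRRELEVANT = "Completely Irrelevant"
    SUSPECT = "Suspect"
    ALL_MATCH = "All values match"


def greater_than(low, high, threshold):
    """Verdict for `column > threshold`, using only the pack's min and max."""
    if high <= threshold:
        return Verdict.IRRELEVANT
    if low > threshold:
        return Verdict.ALL_MATCH
    return Verdict.SUSPECT


def less_than(low, high, threshold):
    """Verdict for `column < threshold`, using only the pack's min and max."""
    if low >= threshold:
        return Verdict.IRRELEVANT
    if high < threshold:
        return Verdict.ALL_MATCH
    return Verdict.SUSPECT


def equals(values_present, wanted):
    """Verdict for `column = wanted`, given the set of values known to be in the pack."""
    if wanted not in values_present:
        return Verdict.IRRELEVANT
    if values_present == {wanted}:
        return Verdict.ALL_MATCH
    return Verdict.SUSPECT


def combine(verdicts):
    """AND the per-column verdicts for one pack row (step 5)."""
    if Verdict.IRRELEVANT in verdicts:
        return Verdict.IRRELEVANT
    if all(v is Verdict.ALL_MATCH for v in verdicts):
        return Verdict.ALL_MATCH
    return Verdict.SUSPECT


# Hypothetical metadata for four pack rows of the employees table.
pack_rows = [
    {"salary": (20000, 45000), "age": (22, 60), "job": {"Shipping"}, "city": {"Louisville"}},
    {"salary": (55000, 90000), "age": (25, 70), "job": {"Shipping", "Sales"},
     "city": {"Louisville", "Boston"}},
    {"salary": (60000, 80000), "age": (30, 50), "job": {"Sales"}, "city": {"Louisville"}},
    {"salary": (52000, 75000), "age": (28, 64), "job": {"Shipping"}, "city": {"Louisville"}},
]


def evaluate(pack_row):
    """Steps 1 to 4: one verdict per predicate, then combined per pack row."""
    return combine([
        greater_than(*pack_row["salary"], 50000),
        less_than(*pack_row["age"], 65),
        equals(pack_row["job"], "Shipping"),
        equals(pack_row["city"], "Louisville"),
    ])


verdicts = [evaluate(pack_row) for pack_row in pack_rows]
# Step 6: only suspect pack rows need to be decompressed and scanned.
to_decompress = [i for i, v in enumerate(verdicts) if v is Verdict.SUSPECT]

for i, verdict in enumerate(verdicts):
    print(i, verdict.value)
print("decompress pack rows:", to_decompress)
```

In this sketch, a pack row where all values match would not need decompression either: for a count(*) its row count could be added to the result straight from the metadata.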

Page 29: Maximize  WebFOCUS  Performance with  Hyperstage


Hyperstage – So what’s the big deal?

Page 30: Maximize  WebFOCUS  Performance with  Hyperstage

WebFOCUS Hyperstage: The Big Deal…

• No indexes
• No partitions
• No views
• No materialized aggregates

Value proposition
• Low IT overhead
• Reduced I/O = faster response times
• Ease of implementation
• Fast time to market
• Less hardware
• Lower TCO

“Load and Go”

Page 31: Maximize  WebFOCUS  Performance with  Hyperstage

Some Real World Results

Insurance Company
• Query performance issues with SQL Server - Insurance claims analysis
• Compression achieved 40:1
• Most queries running 3X faster in Hyperstage

Large Bank
• Query performance issues with SQL Server - Web traffic analysis
• Compression achieved 10:1
• Queries that ran for 10 to 15 minutes in SQL Server ran in sub-seconds in Hyperstage

Government Application
• Query performance issues with Oracle - Federal Loan/Grant Tracking
• Compression achieved 15:1
• Queries that ran for 10 to 15 minutes in Oracle ran in 30 seconds in Hyperstage

Page 32: Maximize  WebFOCUS  Performance with  Hyperstage


Demonstration …

Page 33: Maximize  WebFOCUS  Performance with  Hyperstage

Q&A
