Peter Azzarello
April 11, 2012
IB Toronto User Forum
WebFOCUS Hyperstage Overview
Summit 2012
WebFOCUS: Higher Adoption & Reuse with Lower TCO
Reporting
Query & Analysis
Dashboards
Information Delivery
Performance Management
Enterprise Search
Visualization & Mapping
Data Updating
Predictive Analytics
MS Office & e-Publishing
Extended BI
Core BI
Extensions to the WebFOCUS
platform allow you to build more
application types at a lower cost
Business to Business
Data Warehouse & ETL
Master Data Management
Data Profiling & Data Quality
Business Activity Monitoring
High Performance Data Store
Mobile Applications
Copyright 2007, Information Builders. Slide 3
The Business Challenge
Big Data
Big Data and Machine Generated Data
Data Storage
Time
Machine-Generated Data
Human-Generated Data
Today’s Top Data-Management Challenge
Source: KEEPING UP WITH EVER-EXPANDING ENTERPRISE DATA (Joseph McKendrick, Unisphere Research, October 2010)
How Performance Issues Are Typically Addressed, by Pace of Data Growth
[Bar chart, 0% to 100%; two series, High Growth and Low Growth. The two values for each response are listed in the order they appear on the slide.]
Don't Know / Unsure: 7%, 4%
Upgrade networking infrastructure: 21%, 32%
Archive older data on other systems: 30%, 44%
Upgrade/expand storage systems: 33%, 60%
Upgrade server hardware/processors: 54%, 70%
Tune or upgrade existing databases: 66%, 75%
When organizations have long-running queries that limit the business, the response is often to spend much more time and money to resolve the problem.
IT managers try to mitigate these response times...
Traditional Data Warehousing
Labor intensive, heavy indexing, aggregations and partitioning
Hardware intensive: massive storage; big servers
Expensive and complex
More Data, More Data Sources
More Kinds of Output Needed by More Users,
More Quickly
Limited Resources and Budget
Real time data
Multiple databases
External Sources
Data Warehousing Challenges
New Demands:
• Larger transaction volumes driven by the internet
• Impact of Cloud Computing
• More -> Faster -> Cheaper
Data Warehousing Matures:
• Near real time updates
• Integration with master data management
• Data mining using discrete business transactions
• Provision of data for business critical applications
Early Data Warehouse Characteristics:
• Integration of internal systems
• Monthly and weekly loads
• Heavy use of aggregates
Data Warehousing Challenges
Classic Approaches to Deal with Large Data
• INDEXES
• CUBES/OLAP
Limitations of Indexes
• Increased space requirements
  o Sum of index space requirements can exceed the source DB
• Index management
• Increases load times
  o Building the index
• Predefines a fixed access path
Limitations of OLAP
• Cube technology has limited scalability
  o Number of dimensions is limited
  o Amount of data is limited
• Cube technology is difficult to update (add dimension)
  o Usually requires a complete rebuild
  o Cube builds are typically slow
  o New design results in a new cube
Easy Migration to Hyperstage
• Most cubes will be fed from a relational source
• It is common for that relational source to be a star schema
• The source star schema can be migrated directly to Hyperstage
• WebFOCUS metadata can be used to define hierarchies and drill paths to navigate the star schema
Pivoting Your Perspective: Columnar Technology ...
1. Impediments to business agility: Organizations often must wait for DBAs to create indexes or other tuning structures, thereby delaying access to data. In addition, indexes significantly slow data-loading operations and increase the size of the database, sometimes by a factor of two.
2. Loss of data and time fidelity: IT generally performs ETL operations in batch mode during non-business hours. Such transformations delay access to data and often result in mismatches between operational and analytic databases.
3. Limited ad hoc capability: Response times for ad hoc queries increase as the volume of data grows. Unanticipated queries (where DBAs have not tuned the database in advance) can result in unacceptable response times, and may even fail to complete.
4. Unnecessary expenditures: Attempts to improve performance using hardware acceleration and database tuning schemes raise the capital costs of equipment and the operational costs of database administration. Further, the added complexity of managing a large database diverts operational budgets away from more urgent IT projects.
These Solutions Contribute to Operational Limitations
The Limitation of Rows
Row-based databases are ubiquitous because so many of our most important business systems are transactional.
Row-oriented databases are well suited for transactional environments, such as a call center where a customer’s entire record is required when their profile is retrieved and/or when fields are frequently updated.
The Ubiquity of Rows …
But disk I/O becomes a substantial limiting factor, since a row-oriented design forces the database to retrieve all column data for any query.
30 columns
50 million rows
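To make the I/O point concrete, here is a rough back-of-the-envelope sketch in Python. The 30 columns and 50 million rows come from the slide; the 8 bytes per value and the 3-column query are assumptions chosen only for illustration, and compression is ignored.

```python
# Back-of-the-envelope estimate of uncompressed bytes scanned by one query.
# From the slide: 30 columns, 50 million rows.
# Assumed for illustration only: 8 bytes per value, a query touching 3 columns.
ROWS = 50_000_000
COLUMNS = 30
BYTES_PER_VALUE = 8      # assumption
COLUMNS_IN_QUERY = 3     # assumption


def bytes_scanned(rows: int, columns_read: int, bytes_per_value: int) -> int:
    """Uncompressed bytes read when `columns_read` columns are scanned in full."""
    return rows * columns_read * bytes_per_value


# A row store reads whole rows, so every column is scanned.
row_store = bytes_scanned(ROWS, COLUMNS, BYTES_PER_VALUE)
# A column store reads only the columns the query references.
column_store = bytes_scanned(ROWS, COLUMNS_IN_QUERY, BYTES_PER_VALUE)

print(f"row store:    {row_store / 1e9:.1f} GB")      # 12.0 GB
print(f"column store: {column_store / 1e9:.1f} GB")   # 1.2 GB
print(f"ratio:        {row_store // column_store}x")  # 10x
```

Under these assumptions the row layout scans ten times as much data for the same answer; the ratio is simply total columns divided by columns referenced.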
The Limitation of Rows
Row Oriented (1, Smith, New York, 50000; 2, Jones, New York, 65000; 3, Fraser, Boston, 40000; 4, Fraser, Boston, 70000)
• Works well if all the columns are needed for every query
• Efficient for transactional processing if all the data for the row is available
Column Oriented (1, 2, 3, 4; Smith, Jones, Fraser, Fraser; New York, New York, Boston, Boston; 50000, 65000, 40000, 70000)
• Works well with aggregate results (sum, count, avg.)
• Only columns that are relevant need to be touched
• Consistent performance with any database design
• Allows for very efficient compression
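The two layouts can be sketched in a few lines of Python using the four employee records from the slide. The column names are hypothetical labels added for readability, and in-memory lists only stand in for on-disk storage.

```python
# The four employee records from the slide, stored row by row.
rows = [
    (1, "Smith", "New York", 50000),
    (2, "Jones", "New York", 65000),
    (3, "Fraser", "Boston", 40000),
    (4, "Fraser", "Boston", 70000),
]

# Column-oriented layout: pivot the rows into one list per column.
# The names below are hypothetical labels, not taken from the slide.
names = ("employee_id", "name", "location", "sales")
columns = {name: list(values) for name, values in zip(names, zip(*rows))}

# Transactional access: fetch one complete record.
# The row layout already holds it as a single unit.
employee_3 = rows[2]

# Analytic access: total sales.
# The row layout has to visit every record to pick out one field ...
total_from_rows = sum(record[3] for record in rows)
# ... while the column layout touches only the one column it needs.
total_from_columns = sum(columns["sales"])

print(employee_3)
print(total_from_rows, total_from_columns)
```

Both totals agree; the difference is how much unrelated data each layout has to pass over to produce them.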
Pivoting Your Perspective: Columnar Technology
Employee Id   Name     Location   Sales
1             Smith    New York   50,000
2             Jones    New York   65,000
3             Fraser   Boston     40,000
4             Fraser   Boston     70,000
[Diagram: the same four records shown first as data stored in rows, then as data stored in columns]
Introducing WebFOCUS Hyperstage
The Hyperstage Mission
Improve database performance for WebFOCUS applications with less
hardware, no database tuning and easy migration.
The WebFOCUS Hyperstage high-performance analytic data store is designed to handle business-driven queries on large volumes of data, without IT intervention. Easy to implement and manage, Hyperstage provides the answers your business users need at a price you can afford.
Introducing WebFOCUS Hyperstage ….
What is it?
Hyperstage combines a columnar database with intelligence we call the Knowledge Grid to deliver fast query responses.
Introducing WebFOCUS Hyperstage ….
How is it architected?
Hyperstage Engine
Knowledge Grid
Compressor
BulkLoader
Unmatched Administrative Simplicity
• No indexes
• No data partitioning
• No manual tuning
Self-managing: 90% less administrative effort
Low-cost: More than 50% less than alternative solutions
Scalable, high-performance: Up to 50 TB using a single industry standard server
Fast queries: Ad-hoc queries are as fast as anticipated queries, so users have total flexibility
Compression: Data compression of 10:1 to 40:1 means far less storage is needed, and it might mean you can get the entire database in memory!
Introducing WebFOCUS Hyperstage ….
What does this mean for Customers?
Creates information (metadata) about the data and, upon load, automatically:
o Stores it in the Knowledge Grid (KG)
o KG is loaded into memory
o Less than 1% of compressed data size
Uses the metadata when processing a query to eliminate or reduce the need to access data:
o The less data that needs to be accessed, the faster the response
o Sub-second responses when answered by KG
Architecture Benefits:
o No need to partition data, create/maintain indexes or projections, or tune for performance
o Ad hoc queries are as fast as static queries, so users have total flexibility
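The idea of answering a query from metadata instead of data can be sketched as follows. This is a minimal illustration, not the Hyperstage implementation: it assumes the metadata is a minimum, maximum and row count per block of values, and every name here (`PackStats`, `build_packs`, `count_greater_than`) and the block size of 4 are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PackStats:
    """Hypothetical per-block metadata; assumed to be min, max and row count."""
    minimum: int
    maximum: int
    count: int


def build_packs(values: List[int], pack_size: int) -> Tuple[List[List[int]], List[PackStats]]:
    """Split a column into fixed-size blocks and record statistics at load time."""
    packs = [values[i:i + pack_size] for i in range(0, len(values), pack_size)]
    stats = [PackStats(min(p), max(p), len(p)) for p in packs]
    return packs, stats


def count_greater_than(packs: List[List[int]], stats: List[PackStats],
                       threshold: int) -> Tuple[int, int]:
    """Count rows with value > threshold; return (matches, blocks actually read)."""
    matches = 0
    packs_read = 0
    for pack, s in zip(packs, stats):
        if s.maximum <= threshold:
            continue                  # no row can match: skip the block
        if s.minimum > threshold:
            matches += s.count        # every row matches: answered from metadata
            continue
        packs_read += 1               # undecided: the block must be opened
        matches += sum(1 for v in pack if v > threshold)
    return matches, packs_read


sales = [10, 12, 15, 11, 40, 55, 48, 60, 90, 95, 20, 99]
packs, stats = build_packs(sales, pack_size=4)
matches, opened = count_greater_than(packs, stats, threshold=30)
print(matches, opened)  # 7 1
```

Of the three blocks, one is skipped, one is counted from its statistics alone, and only one is read, which is the sense in which less data accessed means a faster response.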
Introducing WebFOCUS Hyperstage ….
How does it work?
WebFOCUS Hyperstage Runtime Architecture
Hypercopy
Hyperstage Server
Hyperstage Engine
MySQL
WebFOCUS Server
WebFOCUS Pro Server
WebFOCUS Hyperstage Adapter
Knowledge Grid
Compressor
BulkLoader
Smarter Architecture
• No maintenance
• No query planning
• No partition schemes
• No DBA
Data Packs – data stored in manageably sized, highly compressed data packs
Knowledge Grid – statistics and metadata “describing” the super-compressed data
Column Orientation
WebFOCUS Hyperstage Engine
Data compressed using algorithms tailored to data type
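As an illustration of why per-type compression suits column storage, the sketch below applies two generic encodings to columns from the earlier employee example. These are common columnar techniques chosen for illustration; the slides do not say which algorithms Hyperstage actually uses, and the function names are hypothetical.

```python
from itertools import accumulate, groupby
from typing import List, Tuple


def run_length_encode(values: List[str]) -> List[Tuple[str, int]]:
    """Collapse runs of repeated values, which suits low-cardinality text columns."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def run_length_decode(pairs: List[Tuple[str, int]]) -> List[str]:
    """Expand (value, run length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


def delta_encode(values: List[int]) -> List[int]:
    """Store the first value, then differences, which suits steadily rising integers."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas: List[int]) -> List[int]:
    """Rebuild the original integers as a running sum of the deltas."""
    return list(accumulate(deltas))


location = ["New York", "New York", "Boston", "Boston"]
employee_id = [1, 2, 3, 4]

encoded_location = run_length_encode(location)
encoded_ids = delta_encode(employee_id)

print(encoded_location)  # [('New York', 2), ('Boston', 2)]
print(encoded_ids)       # [1, 1, 1, 1]
```

Because a column holds values of one type with similar characteristics, an encoding can be picked per column; a row layout interleaves types and gives up that opportunity.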
How does it work?
Summary
Business Intelligence – Meeting Requirements
WebFOCUS Hyperstage: The Big Deal...
• No indexes
• No partitions
• No views
• No materialized aggregates
Value proposition
• Low IT overhead
• Allows for autonomy from IT
• Ease of implementation
• Fast time to market
• Less hardware
• Lower TCO
No DBA Required!
What’s it look like?
Pay no attention to that man behind the curtain.
CREATE FILE baseapp/pa_inventory_ind_t DROP
-RUN
BULKLOAD baseapp/pa_inventory_ind_t FOR SQLINLD
INV_CODE;
TYPE;
CATEGORY;
NAME;
MODEL;
MEASURE1_INV;
MEASURE2_INV;
MEASURE3_INV;
JOIN SYMBOLS.SYMBOLS.SYMBOL IN SYMBOLS TO MULTIPLE QUOTES_2B.QUOTES_2B.SYMBOL IN QUOTES_2B TAG J0 AS J0
END
TABLE FILE SYMBOLS
PRINT SYMBOL CLOSE_DATE CLOSE_PRICE VOLUME OPEN_PRICE
WHERE ( SYMBOL EQ '&SYMBOL.(<MSFT,MSFT>).SYMBOL.' ) AND ( CLOSE_DATE GT '&START_DATE.(<2000-03-01,2000-03-01>).yyyy-mm-dd.' ) AND ( CLOSE_DATE LT '&END_DATE.(<2000-03-31,2000-03-31>).yyyy-mm-dd.' );
ON TABLE SET PAGE-NUM NOLEAD
ON TABLE NOTOTAL
ON TABLE PCHOLD FORMAT HTML
ON TABLE SET HTMLCSS ON
ON TABLE SET STYLE *
INCLUDE = endeflt, $
ENDSTYLE
END
Example – FOCUS to Hyperstage Compression: 243,639 Rows
Q&A
STAR SCHEMA CONSIDERATIONS
Leverage the Knowledge Grid
• Do constrain the fact table directly
• Do use sub-selects instead of joins
• Do use date-based constraints as much as possible
• Do add additional columns to create useful knowledge nodes
Everyone wants to be a Star
Adding as many WHERE conditions as you can to your SQL increases the chance that Knowledge Grid statistics can be used to improve the performance of your queries.
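The query shapes recommended above can be sketched with a toy star schema. SQLite is used here only as a convenient stand-in to show that the two forms return the same answer; it has no Knowledge Grid, so this says nothing about relative speed. The table names, column names and data are all hypothetical.

```python
import sqlite3

# Toy star schema: one fact table (sales) and one dimension (product).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE sales (sale_date TEXT, product_id INTEGER, amount INTEGER);
    INSERT INTO product VALUES (1, 'Audio'), (2, 'Video'), (3, 'Audio');
    INSERT INTO sales VALUES
        ('2012-03-15', 1, 100),
        ('2012-04-02', 1, 250),
        ('2012-04-10', 2, 400),
        ('2012-04-20', 3, 150),
        ('2012-05-01', 3, 300);
""")

# Join form: the fact table is filtered through the dimension.
join_query = """
    SELECT SUM(s.amount)
    FROM sales s
    JOIN product p ON p.product_id = s.product_id
    WHERE p.category = 'Audio'
      AND s.sale_date BETWEEN '2012-04-01' AND '2012-04-30'
"""

# Sub-select form: every constraint, including the date range,
# is stated directly against columns of the fact table.
subselect_query = """
    SELECT SUM(amount)
    FROM sales
    WHERE sale_date BETWEEN '2012-04-01' AND '2012-04-30'
      AND product_id IN (SELECT product_id FROM product WHERE category = 'Audio')
"""

join_total = conn.execute(join_query).fetchone()[0]
subselect_total = conn.execute(subselect_query).fetchone()[0]
print(join_total, subselect_total)  # 400 400
```

In the sub-select form the date range and the product list are both plain conditions on fact-table columns, which is the kind of constraint that column statistics can be checked against.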