
Page 1: High-Performance Data Warehousing (HiPer DW)download.101com.com/pub/tdwi/Files/ResearchReport100912.pdfHigh-Performance Data Warehousing (HiPer DW) • Primarily about achieving speed

Philip Russom Research Director for Data Management, TDWI

October 9, 2012

HIGH-PERFORMANCE DATA WAREHOUSING

Page 2

TDWI would like to thank the following companies for sponsoring the 2012 TDWI Best Practices research report: High-Performance Data Warehousing

This presentation is based on the findings of that report.

Page 3

Download a free copy of the report

• Download the report in a PDF file at: bit.ly/TDWI-BP-Rpt-List

• Feel free to distribute the PDF file of any TDWI Best Practices Report

Page 4

Today’s Agenda

• Definitions

– High-Performance Data Warehousing (HiPer DW)

– Four Dimensions of HiPer DW

– Why care about HiPer DW?

• The State of HiPer DW

– Benefits and Barriers

– Problems and Opportunities

• HiPer DW Best Practices

– Developer Productivity

– Specific Techniques

• Adoption of HiPer DW Features

• Top Ten Priorities for HiPer DW

Please Tweet: @prussom, #TDWI, #HiPerDW, #DataWarehouse, #EDW, #Analytics, #BigData

Page 5

NUTSHELL DEFINITION

High-Performance Data Warehousing (HiPer DW)

• Primarily about achieving speed and scale

– While also coping with increasing complexity and concurrency

• Speed, scale, complexity and concurrency are related

– Scaling may require speed

– Complexity and concurrency tend to inhibit speed and scale

• Not just the data warehouse (DW), but every layer of the tech stack

– Business intelligence (BI), data integration (DI), analytics, etc.

• Speed and scale from all platform components

– Hardware server, CPU, memory, operating system, storage

• Users must design for optimal high performance

– DW architecture, reports, queries, analytic models, etc.

– Lots of tactical tweaking and tuning, too

Page 6

The Four Dimensions of HiPer DW

Speed and Scale, plus Complexity and Concurrency

CONCURRENCY

• Competing Workloads

• Reporting, Real Time, OLAP, Adv. Analytics, etc.

• Intra-Day Data Loads

• Thousands of Users

• Ad hoc Queries

SCALE

• Big Data Volumes

• Detailed Source Data

• Thousands of Reports

• Scale Out Into: clouds, clusters, grids, distributed architectures

SPEED

• Streaming Big Data

• Event Processing

• Real-Time Operation

• Operational BI

• Near-Time Analytics

• Dashboard Refresh

• Fast Queries

COMPLEXITY

• Big Data Variety

• Unstructured Data

• Machine/sensor Data

• Web & Social Media

• Many Sources/Targets

• Complex Models & SQL

• High Availability

[Diagram center label: HIGH-PERFORMANCE DATA WAREHOUSING (HiPer DW)]

Page 7

Why Care About HiPer DW Now

• HiPer DW is a critical success factor for any real-time business process or BI solution.

– Operational BI, streaming analytics, just-in-time inventory, facility monitoring, fraud detection, mobile asset mgt…

• HiPer DW is key to surviving and leveraging the volume and complexity of Big Data.

– Evolve big data from a cost center to a resource for business innovation

• BI user constituencies and their collections of reports are exploding.

– HiPer DW (that’s not just the DW) is key to scaling up to massive user communities

• Advanced analytics is growing aggressively.

– It brings extreme, demanding workloads that will require HiPer DW

Page 8

Users’ Priorities for High-Performance DW

In priority order, based on survey responses

• Analytic methods are the primary beneficiaries of high performance.

– Advanced analytics (62%), big data for analytics (40%), OLAP (26%)

• Real-time BI practices are also key beneficiaries of HiPer DW.

– Operational BI (37%), dashboards & performance mgt (34%), operational analytics (30%), automated decisions for real-time (25%)

• System performance can contribute to business processes that rely on data or BI/DW/DI infrastructure.

– Business decisions and strategies (33%), customer experience and service (21%), business performance and execution (19%), and data-driven corporate objectives (14%)

• Enterprise BI needs scalable performance.

– Standard reports (15%), supporting thousands of concurrent users (15%), refreshing thousands of reports (12%)

Page 9

Challenges to High-Performance DW

In priority order, based on survey responses

• Cost is the leading challenge to achieving high performance (61%).

– New software, train users to optimize, acquire bigger/faster hardware

• A third of users (34%) feel their tools/platforms hold back performance.

– Half want to replace tools/platforms, in hopes of higher performance

• Some users think low performance is due to inadequate skills (34%).

– Optimal designs, tweaking, and tuning are special skills

• Handling data in real time (31%) can seem sluggish.

– Common perception: report refreshes should be as fast as Google

– We need to set realistic expectations with users

• Data problems are the most common type of performance challenge.

– Inadequate data mgt infrastructure (28%), poor quality of data or metadata (27%), increasingly complex data transformations (21%)

Page 10

HiPer DW is an Opportunity, not a Problem

Page 11

HiPer DW is Important

• Few users deny that HiPer DW is important.

– It’s extremely important (66%) or moderately important (28%)

– Only 6% consider HiPer DW to be a non-issue

Users are Taking Action

• The majority of users surveyed are doing something about it.

– Most achieve HiPer DW via a moderate amount of tweaking (61%)

– Others made major changes for the sake of performance (27%)

– Only 12% have done little or nothing

Page 12

Why do Developers invest time in Performance Optimization?

• Business needs optimal performance from systems for BI/DW/DI and analytics.

– Business practices demand faster and bigger BI and analytics (68%), business strategy seeks maximum value from each system (19%)

• Keeping pace with growth is a common reason for performance optimization.

– Scaling up to large data volumes (46%), scaling to greater analytic complexity (32%), scaling to larger user communities with more reports (25%)

• One way to keep pace with growth is to upgrade hardware.

– Adding more data without upgrading hardware (14%), adding users and applications without upgrading hardware (8%).

– Adding more and heftier hardware is a tried-and-true method of optimization, though – when taken to extremes – it raises costs and dulls optimization skills.

• Performance optimization occasionally (or rarely) compensates for tool deficiencies.

– BI and analytic tools are not high performance (15%), database software is not high performance (6%), BI and analytic tools do not take advantage of database software (4%), database software does not have features we need (3%)

• SUMMARY -- Users improve system performance mostly in response to new business demands and overall growth, less often due to tool deficiencies.

Page 13

Developer Productivity with HiPer DW

• GOOD NEWS -- Performance tweaking and tuning are not too time consuming.

– 3/4 of survey respondents say optimization work consumes 30% or less of their time.

– Only 9% report expending 50% or more of their time.

• POINT -- Tuning and tweaking are part of the job.

– DW/BI professionals need to hone their optimization skills and apply them to the design process, plus ongoing maintenance.

• BAD NEWS -- Performance optimization prevents developers from developing.

– In most shops, developers are under pressure to develop as much new functionality as possible, in a short time.

– Performance tuning gets in the way of that primary mandate.

• POINT -- Take care that optimization doesn’t take over your job.

– Adopt tools/platforms and developer methods/standards that deliver performance without much ex post facto tuning.

Page 14

Specific Techniques for Achieving HiPer DW

• The most common techniques involve changing the physical location of data.

– Creating summary tables (45%), creating a data mart with its own copy of data (20%), column-oriented data storage (16%).

• Some optimization techniques are more virtual than physical.

– Creating custom indices (44%) or materializing queries

• Fine-tuning SQL statements is a highly valued skill for HiPer DW.

– SQL is everywhere: reporting, analytics, DW, ETL… / hand coded, generated…

– A programmer with a knack for SQL can cure a lot of performance bottlenecks.

• Using in-memory databases (24%) avoids I/O bottlenecks.

– E.g., in-memory data caches with automated refresh and backup to disk

– For I/O-free access to tables or cubes for operational BI, performance mgt, dashboards, etc.

• Upgrading hardware (21%) can be a useful technique.

– It can also be an expensive crutch.

– Add hardware only when truly necessary and effective.

• Workload management controls (16%) are great, if available to you.

– Most vendor brands of DBMS have some kind of workload management tool built in.

• SUMMARY -- Common optimization techniques include remodeling data, indexing, revising SQL, and upgrading hardware.
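EXAMPLE (illustrative only, not from the TDWI report) -- The sketch below shows how two of the techniques listed above, a summary table and a custom index, might look in practice. It uses Python's built-in sqlite3 module with an in-memory database purely for convenience. The table names (sales_detail, sales_daily_summary), the index name, the helper functions, and the sample rows are all hypothetical.

```python
import sqlite3


def build_demo_warehouse() -> sqlite3.Connection:
    """Create a tiny detail table, a summary table, and an index (all hypothetical)."""
    conn = sqlite3.connect(":memory:")  # in-memory database, used here only for the demo
    conn.execute(
        "CREATE TABLE sales_detail (sale_date TEXT, region TEXT, amount REAL)"
    )
    rows = [
        ("2012-10-01", "East", 100.0),
        ("2012-10-01", "East", 50.0),
        ("2012-10-01", "West", 75.0),
        ("2012-10-02", "East", 20.0),
    ]
    conn.executemany("INSERT INTO sales_detail VALUES (?, ?, ?)", rows)

    # Summary table: pre-aggregate the detail rows once, so that reports
    # read a few summary rows instead of scanning every detail row.
    conn.execute(
        """
        CREATE TABLE sales_daily_summary AS
        SELECT sale_date, region,
               SUM(amount) AS total_amount,
               COUNT(*)    AS sale_count
        FROM sales_detail
        GROUP BY sale_date, region
        """
    )

    # Custom index: supports lookups that filter the summary by region.
    conn.execute(
        "CREATE INDEX idx_summary_region_date "
        "ON sales_daily_summary (region, sale_date)"
    )
    conn.commit()
    return conn


def region_total(conn: sqlite3.Connection, region: str) -> float:
    """Answer a report-style question from the summary table only."""
    row = conn.execute(
        "SELECT COALESCE(SUM(total_amount), 0.0) "
        "FROM sales_daily_summary WHERE region = ?",
        (region,),
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    warehouse = build_demo_warehouse()
    print(region_total(warehouse, "East"))
```

In a real warehouse the summary table would need a refresh step whenever new detail rows are loaded; that step is omitted here to keep the sketch short.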

Page 15

Trends in Techniques for High-Performance DW

[Chart: HiPer DW techniques plotted by GROWTH (horizontal axis) and COMMITMENT (vertical axis).]

• HOW TO READ CHART

– Techniques with growing adoption are on the right

– Techniques in decline are on the left

– Heavily used techniques are at the top

– Barely used techniques are at the bottom

• Group 1 - Moderate commitment, moderate-to-strong potential growth

• Group 2 - Weak commitment, moderate-to-strong potential growth

• Group 3 - Moderate-to-strong commitment, weak-to-declining growth

Techniques plotted: MapReduce; In-Database Analytics; Streaming Data; Hadoop Distributed File System (HDFS); Real-Time Data Feeds Into DW; In-Memory Database; Private Cloud; Real-Time Data Fetches from DW; No-SQL Database; Column-Oriented Storage Engine; Solid-State Drives; Complex Event Processing (CEP); Public Cloud; Intra-Day Micro-Batch; Service Bus; Grid Computing; Data Warehouse Appliance; Managing Large Volumes of Detailed Source Data; Mixed Workloads in Single DW; Central EDW; Multi-Core CPUs (-63%)

Source: TDWI. Based on 278 respondents to HiPer DW Best Practices Report Survey, 2012.

Page 16

ACCORDING TO 2012 TDWI BEST PRACTICES SURVEY

Trends Among HiPer DW Techniques

1. Hottest growth areas

– Real-time operation: Operational BI & Operational Analytics

– In-database analytics: bring the algorithm to the data, not the reverse

– In-memory databases: eliminate I/O for speed and scale

– Solid-state drives: faster (& more costly) than spinning drives

2. New stuff not used much today, but poised for growth

– Hadoop Distributed File System (HDFS): lots of interest, but implementations are rare; promises scalability & unstructured data

– MapReduce: provides MPP execution of hand-coded routines

– New Analytic Databases: especially columnar & NoSQL

– Clouds: Users are considering private ones over public ones

3. Traditional technologies will have slow growth due to saturation

– Enterprise Data Warehouse (EDW): most users surveyed have this in place already

– Mixed workloads on single DW: some users are pushing non-standard workloads to standalone “edge” systems next to EDW

Page 17

Top Ten Priorities for High-Performance DW

These are recommendations, requirements, or rules that can guide you.

1. Enable new business practices based on high-performance BI/DW/DI and analytics.

2. Make real-time operation your first technology priority for HiPer DW.

3. Make scalability your second priority.

4. Hardware: Use it, but don’t abuse it.

5. Select database platforms and analytic tools that are designed for high performance.

6. Rely on specialized platform and tool functionality for certain performance gains.

7. Consider the many new architectures that boost performance.

8. Keep your performance optimization skills sharp and current.

9. Design and develop with high performance in mind.

10. Develop and apply a technology strategy for HiPer DW.

Page 18

Four Components of a HiPer DW Strategy

No single approach is adequate for all situations. Tap into and balance four approaches.

1. Up-to-date hardware platform components

• Especially CPUs, memory, and storage

2. Up-to-date enterprise software platforms and tools

• Especially those designed specifically for demanding applications in data warehousing and analytics

3. Technical users’ global architectures

• Especially data and team standards for BI development

• Govern data models, SQL coding, ETL logic, and analytic algorithms to assure performance

4. Tactical tweaking and tuning on the local level

• As required by reports, data structures, analytic algorithms, or deficient tools and platforms

VENDOR PRODUCTS

USER PRACTICES

Page 19

Questions??

Page 20

Contact Information

If you have further questions or comments:

Philip Russom, TDWI

[email protected]