TRANSCRIPT
Philip Russom Research Director for Data Management, TDWI
October 9, 2012
HIGH-PERFORMANCE DATA WAREHOUSING
TDWI would like to thank the following companies
for sponsoring the 2012 TDWI Best Practices research report:
High-Performance Data Warehousing
This presentation is based on the findings of that report.
Download a Free Copy of the Report
• Download the report in a PDF file at: bit.ly/TDWI-BP-Rpt-List
• Feel free to distribute the PDF file of any TDWI Best Practices Report
Today’s Agenda
• Definitions
– High-Performance Data Warehousing (HiPer DW)
– Four Dimensions of HiPer DW
– Why care about HiPer DW?
• The State of HiPer DW
– Benefits and Barriers
– Problems and Opportunities
• HiPer DW Best Practices
– Developer Productivity
– Specific Techniques
• Adoption of HiPer DW Features
• Top Ten Priorities for HiPer DW
Please Tweet:
@prussom, #TDWI, #HiPerDW,
#DataWarehouse, #EDW,
#Analytics, #BigData
NUTSHELL DEFINITION
High-Performance Data Warehousing (HiPer DW)
• Primarily about achieving speed and scale
– While also coping with increasing complexity and concurrency
• Speed, scale, complexity and concurrency are related
– Scaling may require speed
– Complexity and concurrency tend to inhibit speed and scale
• Not just the data warehouse (DW), but every layer of the tech stack
– Business intelligence (BI), data integration (DI), analytics, etc.
• Speed and scale from all platform components
– Hardware server, CPU, memory, operating system, storage
• Users must design for high performance
– DW architecture, reports, queries, analytic models, etc.
– Lots of tactical tweaking and tuning, too
The Four Dimensions of HiPer DW: Speed and Scale, plus Complexity and Concurrency
CONCURRENCY
• Competing Workloads
– Reporting, Real Time, OLAP, Adv. Analytics, etc.
• Intra-Day Data Loads
• Thousands of Users
• Ad hoc Queries
SCALE
• Big Data Volumes
• Detailed Source Data
• Thousands of Reports
• Scale Out Into:
– Clouds, clusters, grids, distributed architectures
SPEED
• Streaming Big Data
• Event Processing
• Real-Time Operation
• Operational BI
• Near-Time Analytics
• Dashboard Refresh
• Fast Queries
COMPLEXITY
• Big Data Variety
• Unstructured Data
• Machine/sensor Data
• Web & Social Media
• Many Sources/Targets
• Complex Models & SQL
• High Availability
Why Care About HiPer DW Now?
• HiPer DW is a critical success factor for any real-time business process or BI solution.
– Operational BI, streaming analytics, just-in-time inventory, facility monitoring, fraud detection, mobile asset mgt…
• HiPer DW is key to surviving and leveraging the volume and complexity of Big Data.
– Evolve big data from a cost center into a resource for business innovation
• BI user constituencies and their collections of reports are exploding.
– HiPer DW (that’s not just the DW) is key to scaling up to massive user communities
• Advanced analytics is growing aggressively.
– It brings extreme, demanding workloads that will require HiPer DW
Users’ Priorities for High-Performance DW
In priority order, based on survey responses
• Analytic methods are the primary beneficiaries of high performance.
– Advanced analytics (62%), big data for analytics (40%), OLAP (26%)
• Real-time BI practices are also key beneficiaries of HiPer DW.
– Operational BI (37%), dashboards & performance mgt (34%),
operational analytics (30%), automated decisions for real-time (25%)
• System performance can contribute to business processes that rely on
data or BI/DW/DI infrastructure.
– Business decisions and strategies (33%), customer experience and
service (21%), business performance and execution (19%), and data-
driven corporate objectives (14%)
• Enterprise BI needs scalable performance.
– Standard reports (15%), supporting thousands of concurrent users
(15%), refreshing thousands of reports (12%)
Challenges to High-Performance DW
In priority order, based on survey responses
• Cost is the leading challenge to achieving high performance (61%).
– Buying new software, training users to optimize, acquiring bigger/faster hardware
• A third of users (34%) feel their tools/platforms hold back performance.
– Half want to replace tools/platforms, in hopes of higher performance
• Some users think low performance is due to inadequate skills (34%).
– Optimal designs, tweaking, and tuning are special skills
• Handling data in real time (31%) can seem sluggish.
– Common perception: report refreshes should be as fast as a Google search
– We need to set realistic expectations with users
• Data problems are the most common type of performance challenge.
– Inadequate data mgt infrastructure (28%), poor quality of data or metadata
(27%), increasingly complex data transformations (21%)
HiPer DW is an Opportunity, not a Problem
HiPer DW is Important
• Few users deny that HiPer DW is important.
– It’s extremely important (66%) or moderately important (28%)
– Only 6% consider HiPer DW to be a non-issue
Users are Taking Action
• The majority of users surveyed are doing something about it.
– Most achieve HiPer DW via a moderate amount of tweaking (61%)
– Others made major changes for the sake of performance (27%)
– Only 12% have done little or nothing
Why Do Developers Invest Time in Performance Optimization?
• Business needs optimal performance from systems for BI/DW/DI and analytics.
– Business practices demand faster and bigger BI and analytics (68%), business strategy seeks maximum value from each system (19%)
• Keeping pace with growth is a common reason for performance optimization.
– Scaling up to large data volumes (46%), scaling to greater analytic complexity (32%), scaling to larger user communities with more reports (25%)
• One way to keep pace with growth is to upgrade hardware.
– Respondents who grow without upgrading hardware must optimize instead: adding more data without upgrading hardware (14%), adding users and applications without upgrading hardware (8%).
– Adding more and heftier hardware is a tried-and-true method of optimization; when taken to extremes, however, it raises costs and dulls optimization skills.
• Performance optimization occasionally (or rarely) compensates for tool deficiencies.
– BI and analytic tools are not high performance (15%), database software is not high performance (6%), BI and analytic tools do not take advantage of database software (4%), database software does not have features we need (3%)
• SUMMARY –
– Users improve system performance mostly in response to new business demands and overall growth, less often due to tool deficiencies.
Developer Productivity with HiPer DW
• GOOD NEWS -- Performance tweaking and tuning are not too time-consuming.
– 3/4 of survey respondents say optimization work consumes 30% or less of their time.
– Only 9% report spending 50% or more of their time.
• POINT -- Tuning and tweaking are part of the job.
– DW/BI professionals need to hone their optimization skills and apply them to the design process, plus ongoing maintenance.
• BAD NEWS -- Performance optimization prevents developers from developing.
– In most shops, developers are under pressure to develop as much new functionality as possible, in a short time.
– Performance tuning gets in the way of that primary mandate.
• POINT -- Take care that optimization doesn’t take over your job.
– Adopt tools/platforms and developer methods/standards that deliver performance without much ex post facto tuning.
Specific Techniques for Achieving HiPer DW
• The most common techniques involve changing the physical location of data.
– Creating summary tables (45%), creating a data mart with its own copy of data (20%), column-oriented data storage (16%)
• Some optimization techniques are more virtual than physical.
– Creating custom indices (44%) or materializing queries
• Fine-tuning SQL statements is a highly valued skill for HiPer DW.
– SQL is everywhere (reporting, analytics, DW, ETL…), whether hand-coded or generated
– A programmer with a knack for SQL can cure a lot of performance bottlenecks.
• Using in-memory databases (24%) avoids I/O bottlenecks.
– E.g., in-memory data caches with automated refresh and backup to disk
– For I/O-free access to tables or cubes for operational BI, performance mgt, dashboards, etc.
• Upgrading hardware (21%) can be a useful technique.
– It can also be an expensive crutch.
– Add hardware only when truly necessary and effective.
• Workload management controls (16%) are great, if available to you.
– Most vendor brands of DBMS have some kind of workload management tool built in.
• SUMMARY -- Common optimization techniques include remodeling data, indexing, revising SQL, and upgrading hardware.
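As a rough illustration of two techniques above, summary tables and custom indices, the following sketch uses Python's standard-library sqlite3 module as a stand-in for a DW platform. The table names (sales, sales_by_region), the index name (idx_sales_region), and the generated data are all hypothetical; a production DBMS would have its own syntax for indexing and materialized summaries.

```python
import sqlite3

# Hypothetical detail table: 3 regions x 1,000 days of integer sales amounts.
REGIONS = ("east", "west", "north")

def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, day INTEGER, amount INTEGER)")
    rows = [(r, d, d % 7 + 1) for r in REGIONS for d in range(1, 1001)]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return conn

def add_custom_index(conn: sqlite3.Connection) -> None:
    # An index on the filter column lets the planner seek rows instead of scanning the table.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_sales_region ON sales (region)")

def build_summary_table(conn: sqlite3.Connection) -> None:
    # Pre-aggregate once (e.g., after each load); reports then read one row per region.
    conn.execute("DROP TABLE IF EXISTS sales_by_region")
    conn.execute(
        "CREATE TABLE sales_by_region AS "
        "SELECT region, SUM(amount) AS total, COUNT(*) AS n_rows "
        "FROM sales GROUP BY region"
    )

def total_from_detail(conn: sqlite3.Connection, region: str) -> int:
    # Aggregates the detail rows at query time.
    return conn.execute(
        "SELECT SUM(amount) FROM sales WHERE region = ?", (region,)
    ).fetchone()[0]

def total_from_summary(conn: sqlite3.Connection, region: str) -> int:
    # Reads the pre-computed total; no detail rows are touched.
    return conn.execute(
        "SELECT total FROM sales_by_region WHERE region = ?", (region,)
    ).fetchone()[0]

def plan_uses_index(conn: sqlite3.Connection, region: str) -> bool:
    # The last column of each EXPLAIN QUERY PLAN row is a human-readable detail string.
    plan = conn.execute(
        "EXPLAIN QUERY PLAN SELECT SUM(amount) FROM sales WHERE region = ?", (region,)
    ).fetchall()
    return any("idx_sales_region" in str(row[-1]) for row in plan)

if __name__ == "__main__":
    conn = build_demo_db()
    add_custom_index(conn)
    build_summary_table(conn)
    for r in REGIONS:
        print(r, total_from_detail(conn, r), total_from_summary(conn, r), plan_uses_index(conn, r))
```

The trade-off mirrors the slide: the summary table is a physical copy that must be rebuilt whenever new detail rows arrive, while the index is maintained automatically by the DBMS but still requires the aggregation to run at query time.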
Trends in Techniques for High-Performance DW
[Figure: Bubble chart plotting each technique's potential growth (horizontal axis) against users' current commitment (vertical axis, weak to strong). Techniques shown: MapReduce, in-database analytics, streaming data, Hadoop Distributed File System (HDFS), real-time data feeds into DW, in-memory database, private cloud, real-time data fetches from DW, NoSQL database, column-oriented storage engine, solid-state drives, complex event processing (CEP), public cloud, intra-day micro-batch, service bus, grid computing, data warehouse appliance, managing large volumes of detailed source data, mixed workloads in single DW, central EDW, and multi-core CPUs (-63%). Group 1: moderate commitment, moderate-to-strong potential growth. Group 2: weak commitment, moderate-to-strong potential growth. Group 3: moderate-to-strong commitment, weak-to-declining growth.]
• HOW TO READ CHART
– Techniques with growing adoption are on the right
– Techniques in decline are on the left
– Heavily used techniques are at the top
– Barely used techniques are at the bottom
Source: TDWI. Based on 278 respondents to HiPer DW Best Practices Report Survey, 2012.
Trends Among HiPer DW Techniques
ACCORDING TO 2012 TDWI BEST PRACTICES SURVEY
1. Hottest growth areas
– Real-time operation: Operational BI & Operational Analytics
– In-database analytics: bring the algorithm to the data, not the reverse
– In-memory databases: eliminate I/O for speed and scale
– Solid-state drives: faster (& more costly) than spinning drives
2. New stuff not used much today, but poised for growth
– Hadoop Distributed File System (HDFS): lots of interest, but implementations are rare; promises scalability & unstructured data
– MapReduce: provides MPP execution of hand-coded routines
– New Analytic Databases: especially columnar & NoSQL
– Clouds: Users are considering private ones over public ones
3. Traditional technologies will grow slowly due to saturation
– Enterprise Data Warehouse (EDW): most users surveyed have this in place already
– Mixed workloads on single DW: some users are pushing non-standard workloads to standalone “edge” systems next to EDW
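The in-database analytics and in-memory database trends above can be sketched, under simplifying assumptions, with Python's standard sqlite3 module. Here an in-memory SQLite database stands in for an in-memory DBMS, and the same per-sensor average is computed two ways: inside the database (the algorithm goes to the data) and in client code (the data goes to the algorithm). The readings table and sensor names are hypothetical, and Connection.backup (Python 3.7+) merely stands in for the backup-to-disk feature some in-memory products offer.

```python
import os
import sqlite3
import tempfile

def build_in_memory_db() -> sqlite3.Connection:
    # Hypothetical sensor data held entirely in memory (no disk I/O for queries).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (sensor TEXT, value INTEGER)")
    conn.executemany(
        "INSERT INTO readings VALUES (?, ?)",
        [(f"s{i % 4}", i) for i in range(10_000)],
    )
    conn.commit()
    return conn

def avg_in_database(conn: sqlite3.Connection) -> dict:
    # Algorithm goes to the data: the DBMS aggregates, and only 4 rows reach the client.
    return dict(conn.execute("SELECT sensor, AVG(value) FROM readings GROUP BY sensor"))

def avg_in_client(conn: sqlite3.Connection) -> dict:
    # The reverse: all 10,000 detail rows are fetched and aggregated in application code.
    sums: dict = {}
    counts: dict = {}
    for sensor, value in conn.execute("SELECT sensor, value FROM readings"):
        sums[sensor] = sums.get(sensor, 0) + value
        counts[sensor] = counts.get(sensor, 0) + 1
    return {s: sums[s] / counts[s] for s in sums}

def snapshot_row_count(mem_conn: sqlite3.Connection) -> int:
    # Copy the in-memory database to a file, then count rows in the on-disk copy.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "snapshot.db")
        disk = sqlite3.connect(path)
        mem_conn.backup(disk)
        count = disk.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
        disk.close()
    return count

if __name__ == "__main__":
    conn = build_in_memory_db()
    print(avg_in_database(conn))
    print(avg_in_client(conn))
    print(snapshot_row_count(conn))
```

Both functions return the same averages; the difference is how much data crosses from the database to the application. In a real DW that gap grows with data volume, since the client-side version must move every detail row while the in-database version moves only the result.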
Top Ten Priorities for High-Performance DW
These are recommendations, requirements, or rules that can guide you.
1. Enable new business practices based on high-performance BI/DW/DI and analytics.
2. Make real-time operation your first technology priority for HiPer DW.
3. Make scalability your second priority.
4. Hardware: Use it, but don’t abuse it.
5. Select database platforms and analytic tools that are designed for high performance.
6. Rely on specialized platform and tool functionality for certain performance gains.
7. Consider the many new architectures that boost performance.
8. Keep your performance optimization skills sharp and current.
9. Design and develop with high performance in mind.
10. Develop and apply a technology strategy for HiPer DW.
Four Components of a HiPer DW Strategy
No single approach is adequate for all situations.
Tap into and balance four approaches.
1. Up-to-date hardware platform components
• Especially CPUs, memory, and storage
2. Up-to-date enterprise software platforms and tools
• Especially those designed specifically for demanding applications in data warehousing and analytics
3. Technical users’ global architectures
• Especially data and team standards for BI development
• Govern data models, SQL coding, ETL logic, and analytic algorithms to ensure performance
4. Tactical tweaking and tuning on the local level
• As required by reports, data structures, analytic algorithms, or deficient tools and platforms
(Components 1 and 2 are vendor products; components 3 and 4 are user practices.)
Questions?
Contact Information
If you have further questions or comments:
Philip Russom, TDWI