mwdug-olap

Post on 27-Jan-2015

Page 1: MWDUG-OLAP

© 2007 IBM Corporation

IBM Software Group

© 2008 IBM Corporation

OnLine Analytical Processing (OLAP)

Andy Perkins, zWarehouse SWAT Team

[email protected]

Page 2: MWDUG-OLAP

IBM Software Group | Information Management Software

IBM Information Management © 2008 IBM Corporation

The right information

To the right people

At the right time

Page 3: MWDUG-OLAP


Reporting grows up...

Historical Reports: Query & Reporting to understand what happened

Operational Reports: Transaction Systems to understand what is happening in the business RIGHT NOW

Information Analysis: OLAP & Data Mining to understand why and recommend future action

Page 4: MWDUG-OLAP


… to become Business Intelligence

Simple reporting no longer sufficient

Business Intelligence: the process of gathering, consolidating, and analyzing data from multiple sources for strategic and tactical decision making.

– derives new value from transactional data

– supports strategic planning, monitoring, and efficiency

– delivers knowledge of the customer, suppliers, and channels

– unifies the enterprise with actionable information for operational Business Intelligence

Top quality BI relies on a secure, high performing, warehouse oriented infrastructure to deliver Information on Demand—based on open standards

Analysis

Page 5: MWDUG-OLAP


Examples of Business Intelligence

Financial Analytics

–Financial consolidation

–Business Performance Monitoring (BPM)

–Balanced Scorecards

–ERP reporting

CRM Analytics

–Customer segmentation

–Customer acquisition & retention

–Profitability analysis

–Campaign management

–Market basket analysis

Other Analytics

–Demand Planning

–Pricing elasticity analysis

–Risk analysis

–Inventory Forecasting

–Supply chain forecasting

–Supplier scorecards

–Workforce analysis

–Logistics trend analysis

–Procurement analysis

–Category management

–DB2 Performance and Usage

Page 6: MWDUG-OLAP


Business Intelligence requires good foundation…

Business Intelligence (BI) and Data Warehousing (DW) are sometimes used interchangeably

– Typically BI includes end user tools for query, reporting, analysis, dashboarding etc.

– Includes advanced analytics such as Online Analytic Processing (OLAP) and data mining

– Both concepts depend on each other

• BI almost always assumes a Warehouse (WH), Operational Data Store (ODS) or Data Mart (DM) exists with timely, trusted information

• A DW depends on end user tools that turn data into information.

Both terms (DW and BI) address desire for timely, accurate, available data delivered when, where and how the end users want it

NEW TERM: Operational Intelligence or Operational BI

Page 7: MWDUG-OLAP


Multidimensional Reporting/Analysis

A style of viewing information from various perspectives and aggregation levels over time

Start at a high level for seeing trends and finding outliers

– Sales vs Costs by Region by Product Category by Quarter for the last 5 quarters

Drill down for more detail

– Sales vs Costs by Stores in the South Region by Product Category by Month for the last 2 quarters

Change perspective and filter

– Sales vs Costs by Sales Person by Product Category by Month for the last 2 quarters for Store 25

Not dependent on any particular technology

Can be accomplished by iteratively requesting batch reports
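The three steps above (start high, drill down, change perspective) can be sketched as repeated calls to one small reporting function. This is only an illustration in plain Python: the rows, field names, and the `report` helper are hypothetical and not taken from the presentation.

```python
from collections import defaultdict

# Hypothetical detail rows; the field order is given by FIELDS.
FIELDS = ("region", "store", "sales_person", "category", "month", "sales", "costs")
ROWS = [
    ("South", 25, "Ann", "Shoes", "2008-01", 100.0, 60.0),
    ("South", 25, "Bob", "Shoes", "2008-02", 80.0, 50.0),
    ("South", 31, "Cy", "Hats", "2008-01", 40.0, 30.0),
    ("North", 7, "Di", "Shoes", "2008-01", 70.0, 45.0),
]


def report(rows, group_by, where=None):
    """Sum sales and costs per group_by key after applying equality filters."""
    where = where or {}
    totals = defaultdict(lambda: [0.0, 0.0])
    for row in rows:
        rec = dict(zip(FIELDS, row))
        if all(rec[k] == v for k, v in where.items()):
            key = tuple(rec[g] for g in group_by)
            totals[key][0] += rec["sales"]
            totals[key][1] += rec["costs"]
    return {k: tuple(v) for k, v in totals.items()}


# Start at a high level: by region and product category.
by_region = report(ROWS, ["region", "category"])
# Drill down: stores within the South region.
by_store = report(ROWS, ["store", "category"], where={"region": "South"})
# Change perspective and filter: sales people for store 25.
by_person = report(ROWS, ["sales_person", "category"], where={"store": 25})
```

Each call plays the role of one batch report request; an OLAP tool makes the same navigation interactive.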

Page 8: MWDUG-OLAP


OnLine Analytical Processing (OLAP)

Interactive multidimensional analysis at the “speed of thought”

Great Calculations

– Simple aggregations: sums, averages

– Time-based calculations: 3-month moving averages

– Multi-pass calculations: rank, percentage of total

Aggregation

– Express queries in terms of dimensions

– Aggregate using dimension hierarchies

– Identify key indicators using business terminology

Navigation

– Dimensions: Product, Geography, Time

– Dimensions have attributes: Products have colors, sizes, price ranges

– Dimensions have hierarchical levels: Region->State->City
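The calculation types named on this slide can be illustrated with a few lines of plain Python. This is a minimal sketch with made-up numbers; the function names are hypothetical, and an OLAP server would normally express these calculations in its own query language rather than in application code.

```python
def moving_average(values, window=3):
    """Trailing moving average; positions without a full window yield None."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out


def percent_of_total(sales):
    """Multi-pass: the first pass computes the total, the second divides by it."""
    total = sum(sales.values())
    return {k: 100.0 * v / total for k, v in sales.items()}


def rank(sales):
    """Rank 1 is the largest value; ties get no special handling in this sketch."""
    ordered = sorted(sales, key=sales.get, reverse=True)
    return {k: i + 1 for i, k in enumerate(ordered)}


monthly_sales = [100.0, 120.0, 110.0, 130.0]  # hypothetical Jan..Apr figures
sales_by_product = {"Shoes": 50.0, "Hats": 30.0, "Belts": 20.0}  # hypothetical

three_month = moving_average(monthly_sales)
shares = percent_of_total(sales_by_product)
ranks = rank(sales_by_product)
```

Rank and percentage of total are "multi-pass" because each result depends on a value (the total, or the full ordering) that is only known after all rows have been seen once.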

Page 9: MWDUG-OLAP


General OLAP Architecture

[Diagram: Data Warehouse, Multidimensional Server, Excel, Report Server, Web Server, and Modeling and Admin Tooling; connections labeled SQL and MDX]

Page 10: MWDUG-OLAP


Desktop OLAP (DOLAP)

[Diagram: Data Warehouse, Multidimensional Server, and Client Desktop/Laptop; connection labeled SQL Extract]

Page 11: MWDUG-OLAP


Multidimensional Storage OLAP (MOLAP)

[Diagram: Data Warehouse, Multidimensional Server, Excel, Report Server, Web Server, and Modeling and Admin Tooling; connections labeled Extract and MDX]

Page 12: MWDUG-OLAP


Relational OLAP (ROLAP)

[Diagram: Data Warehouse, Multidimensional Metadata, Excel, Report Server, Web Server, and Modeling and Admin Tooling; connection labeled SQL]

Page 13: MWDUG-OLAP


Hybrid OLAP (HOLAP)

[Diagram: Data Warehouse, Multidimensional Server, Excel, Report Server, Web Server, and Modeling and Admin Tooling; connections labeled SQL and MDX]

Page 14: MWDUG-OLAP


Data Modeling

The Star Schema model is the predominant modeling style for the relational database

[Diagram: star schema with Fact table(s) that store values, and Dept, Time, Account, Project, and XYZ Dimension tables]

Dimension Tables

– Define the categories that organize the analyzed metrics

– E.g., Stores, Time, Customer

– Contain everything about that category that the business analysis might need (attributes)

– Primary key identifies a single member at the lowest level of grain.

Fact Tables

– Contain all the metrics (measures) for the business analysis

– At the same grain as the dimension tables*

– Foreign keys join back to the dimension tables to enable grouping and aggregating.
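A minimal sketch of the dimension and fact table roles described above, using Python's built-in sqlite3 module as a stand-in for the relational database. The table names, columns, and data are hypothetical; the point is only that dimension tables carry a primary key plus descriptive attributes, while the fact table carries foreign keys plus measures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one row per lowest-level member, plus descriptive attributes.
CREATE TABLE store_dim (store_id INTEGER PRIMARY KEY, store_name TEXT, region TEXT);
CREATE TABLE time_dim (time_id INTEGER PRIMARY KEY, month TEXT, qtr TEXT);
-- Fact table: foreign keys to each dimension, plus the measures.
CREATE TABLE sales_fact (
    store_id INTEGER REFERENCES store_dim(store_id),
    time_id INTEGER REFERENCES time_dim(time_id),
    sales REAL
);
INSERT INTO store_dim VALUES (1, 'Store 1', 'South'), (2, 'Store 2', 'South'), (3, 'Store 3', 'West');
INSERT INTO time_dim VALUES (1, 'Jan', 'Q1'), (2, 'Apr', 'Q2');
INSERT INTO sales_fact VALUES (1, 1, 10.0), (2, 1, 20.0), (3, 1, 5.0), (1, 2, 7.0);
""")

# The foreign keys join back to the dimensions so that the measure can be
# grouped and aggregated by any dimension attribute (here: region and quarter).
rows = conn.execute("""
SELECT s.region, t.qtr, SUM(f.sales)
FROM sales_fact f
JOIN store_dim s ON f.store_id = s.store_id
JOIN time_dim t ON f.time_id = t.time_id
GROUP BY s.region, t.qtr
ORDER BY s.region, t.qtr
""").fetchall()
```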

Page 15: MWDUG-OLAP


Sample Star Schema

Page 16: MWDUG-OLAP


Snowflake Schema

Page 17: MWDUG-OLAP


Advantages of Star and Snowflake Schemas

Reflect the dimensional nature of the business and the business questions

– SQL query is (longer but) very similar to the business question:

What were sales of shoes in Q1 by region?

    SELECT SUM(Fact.Sales), Store.Region
    FROM Fact, Store, Product, Time
    WHERE Fact.time_id = Time.time_id
      AND Fact.prod_id = Product.prod_id
      AND Fact.store_id = Store.store_id
      AND Time.Qtr = 'Q1'
      AND Product.product = 'Shoes'
    GROUP BY Store.Region

For the Snowflake, we would simply see more joins, needed to reach the table that has the granularity of the business question.
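To make the snowflake remark concrete, here is a hedged sketch using Python's built-in sqlite3 module. It assumes a hypothetical snowflaked layout in which the region has been moved out of the store dimension into its own table; all table and column names and the data are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE region_dim (region_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE store_dim (store_id INTEGER PRIMARY KEY,
                        region_id INTEGER REFERENCES region_dim(region_id));
CREATE TABLE product_dim (prod_id INTEGER PRIMARY KEY, product TEXT);
CREATE TABLE time_dim (time_id INTEGER PRIMARY KEY, qtr TEXT);
CREATE TABLE sales_fact (time_id INTEGER, prod_id INTEGER, store_id INTEGER, sales REAL);

INSERT INTO region_dim VALUES (1, 'South'), (2, 'West');
INSERT INTO store_dim VALUES (1, 1), (2, 2);
INSERT INTO product_dim VALUES (1, 'Shoes'), (2, 'Hats');
INSERT INTO time_dim VALUES (1, 'Q1'), (2, 'Q2');
INSERT INTO sales_fact VALUES
    (1, 1, 1, 100.0),  -- Q1, Shoes, store 1 (South)
    (1, 1, 2, 50.0),   -- Q1, Shoes, store 2 (West)
    (1, 2, 1, 30.0),   -- Q1, Hats: filtered out
    (2, 1, 1, 70.0);   -- Q2, Shoes: filtered out
""")

# Same business question as above: sales of shoes in Q1 by region.
shoes_q1_by_region = conn.execute("""
SELECT r.region, SUM(f.sales)
FROM sales_fact f, store_dim s, region_dim r, product_dim p, time_dim t
WHERE f.time_id = t.time_id
  AND f.prod_id = p.prod_id
  AND f.store_id = s.store_id
  AND s.region_id = r.region_id      -- the extra snowflake join
  AND t.qtr = 'Q1'
  AND p.product = 'Shoes'
GROUP BY r.region
ORDER BY r.region
""").fetchall()
```

Compared with the star version, the only change to the query is one more table in the FROM list and one more join predicate.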

Page 18: MWDUG-OLAP


Advantages of Star and Snowflake Schemas

SQL is straightforward for a tool to generate

Denormalized for faster reads

Optimized for n-way joins on the fact table

– With good RI (enforced or informational) the DB2 optimizer can do a good job with star joins

Optimized for aggregations on the dimensional hierarchies

– Advisors and MQTs can help materialize aggregations

Column calculations on the same row are efficient

– E.g., Profit = sales_col - COGS_col

Page 19: MWDUG-OLAP


[Figure: OLAP cube with three dimensions: PRODUCT (members Group1, Prod11 to Prod14), MARKET (members WA, OR, LA, WEST), and Time (members Jan, Feb, Mar, Qtr1). Each cell holds the Sales measure. Callouts: Measures, Cell, Dimension, Members, Slice, Dice]

What is an OLAP Cube?
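As a rough illustration of the cube vocabulary (cell, dimension, member, slice, dice), a cube can be modeled in Python as a dictionary keyed by one member per dimension. This is a sketch, not how any OLAP server stores data: the Sales values are made up, and the aggregate members shown in the figure (Group1, WEST, Qtr1) are left out for brevity.

```python
PRODUCTS = ["Prod11", "Prod12", "Prod13", "Prod14"]
MARKETS = ["WA", "OR", "LA"]
MONTHS = ["Jan", "Feb", "Mar"]

# One cell per (product, market, month) combination, holding a made-up Sales value.
cube = {
    (p, m, t): float(10 * (i + 1) + j + k)
    for i, p in enumerate(PRODUCTS)
    for j, m in enumerate(MARKETS)
    for k, t in enumerate(MONTHS)
}


def slice_cube(cube, axis, member):
    """Slice: fix one dimension to a single member and drop it from the keys."""
    return {
        key[:axis] + key[axis + 1:]: value
        for key, value in cube.items()
        if key[axis] == member
    }


def dice(cube, products, markets, months):
    """Dice: keep cells whose members fall in the chosen subsets; all dimensions remain."""
    return {
        key: value
        for key, value in cube.items()
        if key[0] in products and key[1] in markets and key[2] in months
    }


jan_slice = slice_cube(cube, 2, "Jan")  # 2-D view: product by market for January
sub_cube = dice(cube, {"Prod11", "Prod12"}, {"WA"}, {"Jan", "Feb"})
```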

Page 20: MWDUG-OLAP


Levels

Defines the “Resolution” or Granularity of the Dimension.

Consists of

– Level Key Attribute(s)

– Default Attribute

– Ordering Attribute(s)

– Related Attribute(s)

The Level Key uniquely identifies every member of the level.

Page 21: MWDUG-OLAP


Hierarchies

Ordered Collection of Levels

Defines Navigation and Aggregation Paths

– Month -> Quarter -> Year

– Week -> Year

Various Types and Deployments

[Diagram: example Time hierarchy: 2004 > Qtr1 > Jan, Feb, Mar; 2004 > Qtr2 > Apr]
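A hierarchy as an aggregation path can be sketched in Python as a chain of child-to-parent mappings, one per level. The mappings mirror the members shown on this slide; the sales figures and the `roll_up` helper are hypothetical.

```python
from collections import defaultdict

# Child -> parent mappings for the Month -> Quarter -> Year path.
MONTH_TO_QTR = {"Jan": "Qtr1", "Feb": "Qtr1", "Mar": "Qtr1", "Apr": "Qtr2"}
QTR_TO_YEAR = {"Qtr1": "2004", "Qtr2": "2004"}


def roll_up(values, parent_of):
    """Aggregate an additive measure from one level to the next level up."""
    totals = defaultdict(float)
    for member, value in values.items():
        totals[parent_of[member]] += value
    return dict(totals)


monthly = {"Jan": 10.0, "Feb": 20.0, "Mar": 30.0, "Apr": 5.0}  # made-up sales
quarterly = roll_up(monthly, MONTH_TO_QTR)
yearly = roll_up(quarterly, QTR_TO_YEAR)
```

Navigating down the same mappings in reverse (year to quarters to months) is what a drill-down does.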

Page 22: MWDUG-OLAP


Hierarchy Types

Balanced

Unbalanced

Ragged

Network

Page 23: MWDUG-OLAP


Issues of Measures

Measure definitions

– Can be a simple mapping to a fact table column or calculated.

– Calculated measures based on fact columns can and will be represented in the MQTs

– Calculated measures defined by MDX statements will not be calculated in the MQT, which has performance implications.

Aggregation functions define how the measures will be summarized up the hierarchy

– Defined: Calculate the values then aggregate the results.

– None: Aggregate the inputs, then calculate the aggregates.

– The order of aggregation to calculation is extremely important for non-additive functions.
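Why the order matters for a non-additive measure can be seen with a two-row example. The sketch below uses a hypothetical profit margin (profit / sales) and, as an assumption, averages the per-row margins in the "calculate then aggregate" case; the numbers are made up.

```python
# Hypothetical fact rows: (sales, profit)
ROWS = [(100.0, 10.0), (300.0, 90.0)]


def calculate_then_aggregate(rows):
    """Compute the margin per row, then aggregate (here: average) the results."""
    margins = [profit / sales for sales, profit in rows]
    return sum(margins) / len(margins)


def aggregate_then_calculate(rows):
    """Sum the inputs first, then compute one margin from the totals."""
    total_sales = sum(sales for sales, _ in rows)
    total_profit = sum(profit for _, profit in rows)
    return total_profit / total_sales


margin_of_rows = calculate_then_aggregate(rows=ROWS)    # about 0.20
margin_of_totals = aggregate_then_calculate(rows=ROWS)  # 0.25
```

The two results differ because the second row carries three times the sales of the first; only aggregating the inputs first weights the margin by sales volume.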


Page 24: MWDUG-OLAP


Cube Model

[Diagram: relational tables in DB2 (fact table, dimension tables) mapped to Cube Model objects (Facts, Measure, Dimension, Hierarchy, Level, Attribute, Join), which in turn map to Cube objects (Cube Facts, Cube dimension, Cube hierarchy, Cube Level)]

Page 25: MWDUG-OLAP


InfoSphere Warehouse on System z

Development tooling IDE - Design Studio

– Data Warehouse and OLAP tooling over DB2 for System z

– Physical data modeling for relational tables

– DB2-based data movement and transformation (SQW)

– OLAP Modeling

Runtime tooling

– OLAP Cube Server

– Data movement and transformation runtime services (SQW)

– Web-based Administration Console

Page 26: MWDUG-OLAP


InfoSphere Warehouse

Universal Cube Access (MDX, ODBO, XMLA)

IBM Cognos 8 BI

IBM DataQuant, DB2 QMF, IBM Alphablox

Microsoft Excel

Portals, Web Applications, Dashboards, Interactive Reports, Ad Hoc Analysis, Common Desktop Tools

OLAP on DB2 for System z

Page 27: MWDUG-OLAP


InfoSphere Warehouse – Cubing Services

[Diagram: InfoSphere Warehouse Cube Server (Metadata and Data Cache) on a Linux LPAR, DB2 for z/OS, Excel, Cognos 8 BI Server, Web Server, and Design Studio & Admin Console; connections labeled SQL and MDX]

Page 28: MWDUG-OLAP


Cube Server in Action – Startup

[Diagram: Start Cube request to Cubing Services; Cubing Services (Dim Member Cache) and DB2 (OLAP Metadata, MQTs); connection labeled SQL]

Page 29: MWDUG-OLAP


Cube Server in Action – Query Processing

[Diagram: MDX Query to Cubing Services; Cubing Services (MDX calculation engine, Data cache, Dim Member Cache) and DB2 (OLAP Metadata, MQTs); connections labeled MDX and SQL]

Can pre-populate cache with an MDX seed query
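The general pattern suggested by this slide (answer from the data cache when possible, otherwise go back to the database, and optionally warm the cache with a seed query) can be sketched in Python. This is a generic cache-aside illustration, not the Cubing Services implementation: the class, its methods, and the dictionary standing in for the database are all hypothetical.

```python
class CachingCubeServer:
    """Toy cache-aside lookup: serve from the cache, else fetch and remember."""

    def __init__(self, fetch_from_database):
        self._fetch = fetch_from_database  # stand-in for SQL sent to the database
        self._data_cache = {}
        self.cache_hits = 0
        self.database_calls = 0

    def query(self, cell_key):
        if cell_key in self._data_cache:
            self.cache_hits += 1
            return self._data_cache[cell_key]
        self.database_calls += 1
        value = self._fetch(cell_key)
        self._data_cache[cell_key] = value
        return value

    def seed(self, cell_keys):
        """Pre-populate the cache, in the spirit of a seed query at startup."""
        for key in cell_keys:
            self.query(key)


# Hypothetical backing store standing in for query results from the database.
DATABASE = {("Shoes", "South", "Q1"): 100.0, ("Hats", "South", "Q1"): 40.0}

server = CachingCubeServer(DATABASE.__getitem__)
server.seed([("Shoes", "South", "Q1")])          # one database call
first = server.query(("Shoes", "South", "Q1"))   # served from the cache
second = server.query(("Hats", "South", "Q1"))   # cache miss, database call
```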

Page 30: MWDUG-OLAP


Develop Star Schema model and create tables

Page 31: MWDUG-OLAP


Populate Dimension and Fact Tables

Populate Group dimension table

Populate Fact table

Page 32: MWDUG-OLAP


Create Cube Model and Cube definition

[Diagram: the Cube Model (superset) contains Facts, Measures, Dimensions, Levels, and Hierarchies; each Cube (subset) contains a subset of the facts and a subset of the dimensions]

Page 33: MWDUG-OLAP


Deploy model/cubes

Deploy Cube

– Moves the “definition” of the cubes to the runtime environment, i.e., the Cube Server

– Step 1 – use Design Studio to deploy to the metadata repository

– Step 2 – use Administration Console to define and start a Cube Server

– Step 3 – assign a Cube to cube server and start

Page 34: MWDUG-OLAP


Optimize Cube

Optimizing a cube means optimizing the cube’s access to DB2

– This is accomplished by defining a performance layer of MQTs

– The Optimization Wizard creates a recommended set of MQTs based on the Cube Model and sampling of data
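The idea behind a performance layer of MQTs is that a precomputed aggregate can answer higher-level questions without rescanning the fact table. The sketch below only emulates that idea with Python's built-in sqlite3 module: SQLite has no MQTs, does not keep the summary table in sync, and does not reroute queries to it, so the query against the summary is written by hand. All names and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_fact (region TEXT, month TEXT, sales REAL);
INSERT INTO sales_fact VALUES
    ('South', 'Jan', 10.0), ('South', 'Jan', 20.0),
    ('South', 'Feb', 5.0), ('West', 'Jan', 7.0);
-- Precomputed aggregate standing in for an MQT.
CREATE TABLE sales_by_region_month AS
    SELECT region, month, SUM(sales) AS sales, COUNT(*) AS row_count
    FROM sales_fact
    GROUP BY region, month;
""")

# The same question answered from the detail rows and from the summary.
from_base = conn.execute(
    "SELECT region, SUM(sales) FROM sales_fact GROUP BY region ORDER BY region"
).fetchall()
from_summary = conn.execute(
    "SELECT region, SUM(sales) FROM sales_by_region_month GROUP BY region ORDER BY region"
).fetchall()
summary_rows = conn.execute("SELECT COUNT(*) FROM sales_by_region_month").fetchone()[0]
```

Both queries return the same totals, but the second reads three summary rows instead of four detail rows; on a fact table with millions of rows the difference is what makes the aggregate worth materializing.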

Page 35: MWDUG-OLAP


Cube Performance Statistics

From cube server performance log

Captured 54 queries from demo prep

Cube server started at 6:21am PST

Last MDX request at 8:48am PST

Queries < 1 sec

– 42 total queries

– 33 queries satisfied from cube cache

– 9 queries went back to DB2

– Probably routed to MQTs

Queries 1-10 secs

– 8 total queries

– 8 queries went back to DB2

– Probably hit MQTs

Queries 10-20 secs

– 2 total queries

– 2 queries went back to DB2

– Maybe hit MQTs??

Long Queries: opportunity for adding MQTs or database tuning

– 1 at 120.81 secs, of which 120.30 seconds in DB2

– 1 at 143.70 secs, of which 143.70 seconds in DB2

Using only the initial MQT recommendation of 20 small MQTs

Started with a cold cache

When data was in cache, time to satisfy a query was mostly < .010 sec

Fact table size in DB2 ~ 2M rows
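The figures on this slide can be cross-checked with a little arithmetic. The sketch below uses only the numbers reported above; the bucket labels and variable names are hypothetical.

```python
# Query counts per response-time bucket, as reported on the slide.
buckets = {"< 1 sec": 42, "1-10 secs": 8, "10-20 secs": 2, "long": 2}
cache_hits = 33                      # all in the "< 1 sec" bucket
db2_queries = 9 + 8 + 2 + 2          # queries that went back to DB2

total = sum(buckets.values())
cache_hit_ratio = cache_hits / total
sub_second_share = buckets["< 1 sec"] / total

# Share of elapsed time spent in DB2 for the two long-running queries.
long_queries = [(120.81, 120.30), (143.70, 143.70)]  # (elapsed secs, secs in DB2)
db2_share = [in_db2 / elapsed for elapsed, in_db2 in long_queries]
```

The buckets add up to the 54 captured queries; roughly 61% were answered from the cube cache and about 78% finished in under a second. For both long queries nearly all of the elapsed time was spent in DB2, which is consistent with the slide's suggestion to add MQTs or tune the database.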

Page 36: MWDUG-OLAP


Further Reading

Books

– Data Warehouse – from Architecture to Implementation by Barry Devlin

– Building the Data Warehouse, 4th Edition - by W. H. (Bill) Inmon

– The Data Warehouse Toolkit, by Ralph Kimball

IBM Redbooks (http://www.ibm.com/redbooks)

– Dimensional Modeling: In a Business Intelligence Environment (SG24-7138)

– Enterprise Data Warehousing with DB2 9 for z/OS (SG24-7637)

– InfoSphere Warehouse: Cubing Services and Client Access Interfaces (SG24-7582)

Websites

– International DB2 Users Group – http://www.idug.org

– The Data Warehousing Institute – http://www.tdwi.org/

– BeyeNetwork - http://www.b-eye-network.com

– IBM – http://www.ibm.com