mwdug-olap
© 2007 IBM Corporation
IBM Software Group
© 2008 IBM Corporation
OnLine Analytical Processing (OLAP)
Andy Perkins
zWarehouse SWAT Team
IBM Software Group | Information Management Software
IBM Information Management © 2008 IBM Corporation
The right information
To the right people
At the right time
Reporting grows up..
Historical Reports: Query & Reporting to understand what happened
Operational Reports: Transaction systems to understand what is happening in the business RIGHT NOW
Information Analysis: OLAP & Data Mining to understand why and recommend future action
… to become Business Intelligence
Simple reporting no longer sufficient
Business Intelligence: the process of gathering, consolidating, and analyzing data from multiple sources for strategic and tactical decision making.
– derives new value from transactional data
– supports strategic planning, monitoring, and efficiency
– delivers knowledge of the customer, suppliers, and channels
– unifies the enterprise with actionable information for operational Business Intelligence
Top quality BI relies on a secure, high performing, warehouse oriented infrastructure to deliver Information on Demand—based on open standards
Examples of Business Intelligence
Financial Analytics
– Financial consolidation
– Business Performance Monitoring (BPM)
– Balanced Scorecards
– ERP reporting
CRM Analytics
– Customer segmentation
– Customer acquisition & retention
– Profitability analysis
– Campaign management
– Market basket analysis
Other Analytics
– Demand planning
– Pricing elasticity analysis
– Risk analysis
– Inventory forecasting
– Supply chain forecasting
– Supplier scorecards
– Workforce analysis
– Logistics trend analysis
– Procurement analysis
– Category management
DB2 Performance and Usage
Business Intelligence requires good foundation…
Business Intelligence (BI) and Data Warehousing (DW) are sometimes used interchangeably
– Typically BI includes end user tools for query, reporting, analysis, dashboarding, etc.
– Includes advanced analytics such as Online Analytical Processing (OLAP) and data mining
– Both concepts depend on each other
• BI almost always assumes a Warehouse (WH), Operational Data Store (ODS) or Data Mart (DM) exists with timely, trusted information
• A DW depends on end user tools that turn data into information.
Both terms (DW and BI) address desire for timely, accurate, available data delivered when, where and how the end users want it
NEW TERM: Operational Intelligence or Operational BI
Multidimensional Reporting/Analysis
A style of viewing information from various perspectives and aggregation levels over time
Start at a high level for seeing trends and finding outliers
– Sales vs Costs by Region by Product Category by Quarter for the last 5 quarters
Drill down for more detail
– Sales vs Costs by Stores in the South Region by Product Category by Month for the last 2 quarters
Change perspective and filter
– Sales vs Costs by Sales Person by Product Category by Month for the last 2 quarters for Store 25
Not dependent on any particular technology
Can be accomplished by iteratively requesting batch reports
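The three steps above (high-level view, drill down, change perspective and filter) can be sketched in a few lines of Python. This is only an illustration under made-up assumptions: the rows, the column names, and the `summarize` helper are hypothetical and do not come from any product.

```python
from collections import defaultdict

# Hypothetical detail rows; every name and number here is invented.
rows = [
    {"region": "South", "store": 25, "category": "Shoes", "sales": 100, "costs": 70},
    {"region": "South", "store": 25, "category": "Hats", "sales": 40, "costs": 30},
    {"region": "South", "store": 31, "category": "Shoes", "sales": 80, "costs": 50},
    {"region": "North", "store": 7, "category": "Shoes", "sales": 60, "costs": 45},
]

def summarize(rows, by, where=None):
    """Total (sales, costs) grouped by the columns in `by`, after an optional filter."""
    totals = defaultdict(lambda: [0, 0])
    for row in rows:
        if where and any(row[col] != val for col, val in where.items()):
            continue  # row does not match the filter
        key = tuple(row[col] for col in by)
        totals[key][0] += row["sales"]
        totals[key][1] += row["costs"]
    return {key: tuple(pair) for key, pair in totals.items()}

# Start at a high level: Sales vs Costs by Region by Product Category
high_level = summarize(rows, by=["region", "category"])

# Drill down: stores in the South region
drill_down = summarize(rows, by=["store", "category"], where={"region": "South"})

# Change perspective and filter: one store only
store_25 = summarize(rows, by=["category"], where={"store": 25})
```

Each call is effectively a separate batch report over the same detail data, which is why the style does not depend on any particular technology.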
OnLine Analytical Processing (OLAP)
Interactive multidimensional analysis at the “speed of thought”
Great Calculations
– Simple aggregations: sums, averages
– Time based calculations: 3 month moving averages
– Multi-pass calculations: rank, percentage of total
Aggregation
– Express queries in terms of dimensions
– Aggregate using dimension hierarchies
– Identify key indicators using business terminology
Navigation
– Dimensions: Product, Geography, Time
– Dimensions have attributes: Products have colors, sizes, price ranges
– Dimensions have hierarchical levels: Region->State->City
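As a rough illustration of the calculation types named on this slide, the following Python sketch computes a 3 month moving average (a time based calculation) and a rank and percentage of total (multi-pass calculations, since they need the full set of values before any single result is known). The monthly figures are invented for the example.

```python
# Hypothetical monthly sales; the numbers are made up.
monthly_sales = {"Jan": 90, "Feb": 120, "Mar": 150, "Apr": 60}

def moving_average(values, window=3):
    """Trailing moving average; None until a full window is available."""
    result = []
    for i in range(len(values)):
        if i + 1 < window:
            result.append(None)
        else:
            result.append(sum(values[i + 1 - window:i + 1]) / window)
    return result

# Time based calculation: 3 month moving average
ma3 = moving_average(list(monthly_sales.values()))

# Multi-pass calculation 1: percentage of total (pass 1 totals, pass 2 divides)
total = sum(monthly_sales.values())
pct_of_total = {month: value / total for month, value in monthly_sales.items()}

# Multi-pass calculation 2: rank by sales, highest first
ordered = sorted(monthly_sales.items(), key=lambda item: item[1], reverse=True)
rank = {month: position for position, (month, _) in enumerate(ordered, start=1)}
```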
General OLAP Architecture
[Diagram: Data Warehouse, Multidimensional Server, and client tools (Excel, Report Server, Web Server), plus Modeling and Admin Tooling. Connections are labeled SQL and MDX.]
Desktop OLAP (DOLAP)
[Diagram: Data Warehouse; SQL Extract; Multidimensional Server on the Client Desktop/Laptop.]
Multidimensional Storage OLAP (MOLAP)
[Diagram: Data Warehouse, Extract, Multidimensional Server, and client tools (Excel, Report Server, Web Server), plus Modeling and Admin Tooling. Client connections are labeled MDX.]
Relational OLAP (ROLAP)
[Diagram: Data Warehouse, Multidimensional Metadata, and client tools (Excel, Report Server, Web Server), plus Modeling and Admin Tooling. Connections are labeled SQL.]
Hybrid OLAP (HOLAP)
[Diagram: Data Warehouse, Multidimensional Server, and client tools (Excel, Report Server, Web Server), plus Modeling and Admin Tooling. Connections are labeled SQL and MDX.]
Data Modeling
Star Schema model is the predominant modeling style for the relational database
[Diagram: Fact table(s), which store values, surrounded by the Dept, Time, Account, Project, and XYZ dimensions]
Dimension Tables
– Define the categories that organize the analyzed metrics
– E.g., Stores, Time, Customer
– Contain everything about that category that the business analysis might need (attributes)
– Primary key identifies a single member at the lowest level of grain.
Fact Tables
– Contain all the metrics (measures) for the business analysis
– At the same grain as the dimension tables*
– Foreign keys join back to the dimension tables to enable grouping and aggregating.
Sample Star Schema
Snowflake Schema
Advantages of Star and Snowflake Schemas
Reflect the dimensional nature of the business and the business questions
– SQL query is (longer but) very similar to the business question:
What were sales of shoes in Q1 by region?
    SELECT SUM(Fact.Sales), Store.Region
    FROM Fact, Store, Product, Time
    WHERE Fact.time_id = Time.time_id
      AND Fact.prod_id = Product.prod_id
      AND Fact.store_id = Store.store_id
      AND Time.Qtr = 'Q1'
      AND Product.product = 'Shoes'
    GROUP BY Store.Region
For the Snowflake, we would simply see more joins, in order to reach the table that has the granularity of the business question.
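To try the shoes-by-region query on something concrete, here is a small sketch that builds a toy star schema in an in-memory SQLite database and runs the query through Python's `sqlite3` module. SQLite stands in for DB2 purely for convenience; the table contents are invented, and an `ORDER BY` is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one row per member at the lowest grain
CREATE TABLE Store   (store_id INTEGER PRIMARY KEY, Region  TEXT);
CREATE TABLE Product (prod_id  INTEGER PRIMARY KEY, product TEXT);
CREATE TABLE Time    (time_id  INTEGER PRIMARY KEY, Qtr     TEXT);

-- Fact table: foreign keys back to each dimension, plus the measure
CREATE TABLE Fact (
    store_id INTEGER REFERENCES Store(store_id),
    prod_id  INTEGER REFERENCES Product(prod_id),
    time_id  INTEGER REFERENCES Time(time_id),
    Sales    REAL
);

-- Made-up sample data
INSERT INTO Store   VALUES (1, 'East'), (2, 'West');
INSERT INTO Product VALUES (1, 'Shoes'), (2, 'Hats');
INSERT INTO Time    VALUES (1, 'Q1'), (2, 'Q2');
INSERT INTO Fact VALUES
    (1, 1, 1, 100.0),   -- East, Shoes, Q1
    (1, 1, 1,  50.0),   -- East, Shoes, Q1
    (2, 1, 1, 250.0),   -- West, Shoes, Q1
    (1, 2, 1,  40.0),   -- East, Hats,  Q1 (filtered out)
    (2, 1, 2,  75.0);   -- West, Shoes, Q2 (filtered out)
""")

# What were sales of shoes in Q1 by region?
rows = conn.execute("""
    SELECT SUM(Fact.Sales), Store.Region
    FROM Fact, Store, Product, Time
    WHERE Fact.time_id = Time.time_id
      AND Fact.prod_id = Product.prod_id
      AND Fact.store_id = Store.store_id
      AND Time.Qtr = 'Q1'
      AND Product.product = 'Shoes'
    GROUP BY Store.Region
    ORDER BY Store.Region
""").fetchall()

for sales, region in rows:
    print(region, sales)
```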
Advantages of Star and Snowflake Schemas
SQL is straightforward for a tool to generate
Denormalized for faster reads
Optimized for n-way joins on the fact table
– With good RI (enforced or informational) the DB2 optimizer can do a good job with star joins
Optimized for aggregations on the dimensional hierarchies
– Advisors and MQTs can help materialize aggregations
Column calculations on the same row are efficient
– E.g., Profit = sales_col - COGS_col
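The idea of materializing an aggregation can be sketched as follows. This is not DB2 syntax: SQLite has no MQTs and will not reroute queries automatically, so an ordinary summary table built with `CREATE TABLE ... AS SELECT` stands in for one here. The table names, the `region` and `month` columns, and all figures are hypothetical; only the `sales_col` and `cogs_col` column names echo the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact (region TEXT, month TEXT, sales_col REAL, cogs_col REAL);
INSERT INTO fact VALUES
    ('East', 'Jan', 100.0,  60.0),
    ('East', 'Feb', 120.0,  70.0),
    ('West', 'Jan', 200.0, 150.0),
    ('West', 'Feb', 180.0, 120.0);

-- Stand-in for an MQT: the aggregation is computed once and stored.
-- Profit is a same-row column calculation, summed up the hierarchy.
CREATE TABLE sales_by_region AS
    SELECT region,
           SUM(sales_col)            AS sales,
           SUM(cogs_col)             AS cogs,
           SUM(sales_col - cogs_col) AS profit
    FROM fact
    GROUP BY region;
""")

# Reading the small summary table ...
from_summary = conn.execute(
    "SELECT region, sales, cogs, profit FROM sales_by_region ORDER BY region"
).fetchall()

# ... gives the same answer as aggregating the fact table each time.
from_fact = conn.execute("""
    SELECT region, SUM(sales_col), SUM(cogs_col), SUM(sales_col - cogs_col)
    FROM fact GROUP BY region ORDER BY region
""").fetchall()
```

In DB2 the optimizer can route a matching query to an MQT on its own; in this sketch the summary table has to be queried explicitly.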
What is an OLAP Cube?
[Diagram: a three-dimensional cube of cells. Product dimension members: PRODUCT, Group1, Prod11, Prod12, Prod13, Prod14. Market dimension members: MARKET, WEST, WA, OR, LA. Time dimension members: Time, Qtr1, Jan, Feb, Mar. Measures: Sales. Callouts: Cell, Dimension Members, Slice, Dice.]
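One way to picture the cube is as a mapping from one member of each dimension to a cell value. The Python sketch below uses that picture to show slice (fix one dimension to a single member) and dice (keep a subset of members on several dimensions). The member names are borrowed from the diagram, but the sales values and helper functions are invented for illustration.

```python
from itertools import product

products = ["Prod11", "Prod12"]
markets = ["WA", "OR", "LA"]
months = ["Jan", "Feb", "Mar"]

# One cell per (product, market, month); the Sales values are made up.
cube = {}
for i, key in enumerate(product(products, markets, months), start=1):
    cube[key] = 10 * i

def slice_cube(cube, month):
    """Slice: fix the Time dimension to one member, leaving a 2-D view."""
    return {(p, m): v for (p, m, t), v in cube.items() if t == month}

def dice_cube(cube, products, markets, months):
    """Dice: keep only the chosen members on each dimension (a sub-cube)."""
    return {
        (p, m, t): v
        for (p, m, t), v in cube.items()
        if p in products and m in markets and t in months
    }

jan_slice = slice_cube(cube, "Jan")
sub_cube = dice_cube(cube, ["Prod11"], ["WA", "OR"], ["Jan", "Feb"])

# Aggregating every cell gives the top-level total for these members.
grand_total = sum(cube.values())
```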
Levels
Defines the “Resolution” or Granularity of the Dimension.
Consists of
– Level Key Attribute(s)
– Default Attribute
– Ordering Attribute(s)
– Related Attribute(s)
The Level Key uniquely identifies every member of the level.
Hierarchies
Ordered Collection of Levels
Defines Navigation and Aggregation Paths
– Month -> Quarter -> Year
– Week -> Year
Various Types and Deployments
2004
  Qtr1
    Jan
    Feb
    Mar
  Qtr2
    Apr
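The Month -> Quarter -> Year path can be sketched as a pair of parent lookups, with aggregation simply following them upward. The member names match the example tree; the monthly values and the `roll_up` helper are hypothetical.

```python
from collections import defaultdict

# Each member of a level points to its parent in the next level up.
month_to_quarter = {"Jan": "Qtr1", "Feb": "Qtr1", "Mar": "Qtr1", "Apr": "Qtr2"}
quarter_to_year = {"Qtr1": "2004", "Qtr2": "2004"}

# Made-up sales at the lowest level of the hierarchy.
monthly_sales = {"Jan": 100, "Feb": 120, "Mar": 90, "Apr": 110}

def roll_up(values, parent_of):
    """Aggregate an additive measure from one level to its parent level."""
    totals = defaultdict(int)
    for member, value in values.items():
        totals[parent_of[member]] += value
    return dict(totals)

quarterly_sales = roll_up(monthly_sales, month_to_quarter)
yearly_sales = roll_up(quarterly_sales, quarter_to_year)
```

A second hierarchy such as Week -> Year would just be another set of parent lookups over the same dimension.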
Hierarchy Types
Balanced
Unbalanced
Ragged
Network
Issues of Measures
Measure definitions
– Can be a simple mapping to a fact table column or calculated.
– Calculated measures based on fact columns can and will be represented in the MQTs
– Calculated measures defined by MDX statements will not be calculated in the MQT, which has performance implications.
Aggregation functions define how the measures will be summarized up the hierarchy
– Defined: Calculate the values then aggregate the results.
– None: Aggregate the inputs, then calculate the aggregates.
– The order of aggregation to calculation is extremely important for non-additive functions.
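A small worked example shows why the order matters. Margin (profit divided by sales) is a non-additive measure, while profit itself is additive. The sketch below computes both in the two orders described above; the figures and function names are invented for illustration.

```python
# Hypothetical fact rows: (sales, profit)
rows = [(100.0, 50.0), (400.0, 100.0)]

def calculate_then_aggregate(rows):
    """Compute the margin per row, then sum the per-row results."""
    return sum(profit / sales for sales, profit in rows)

def aggregate_then_calculate(rows):
    """Sum the inputs first, then compute the margin on the totals."""
    total_sales = sum(sales for sales, _ in rows)
    total_profit = sum(profit for _, profit in rows)
    return total_profit / total_sales

summed_margins = calculate_then_aggregate(rows)   # 0.50 + 0.25
overall_margin = aggregate_then_calculate(rows)   # 150 / 500

# For an additive measure such as profit, the order makes no difference.
profit_either_way = sum(profit for _, profit in rows)
```

Summing the per-row margins gives 0.75, which is not a meaningful margin for the two rows together, whereas aggregating first gives 0.30.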
Cube Model
[Diagram with three layers. Relational tables in DB2: fact table and dimension tables. Cube Model: Facts, Measure, Dimension, Hierarchy, Level, Attribute, Join. Cube: Cube Facts, Cube dimension, Cube hierarchy, Cube Level.]
InfoSphere Warehouse on System z
Development tooling IDE - Design Studio
– Data Warehouse and OLAP tooling over DB2 for System z
– Physical data modeling for relational tables
– DB2-based data movement and transformation (SQW)
– OLAP Modeling
Runtime tooling
– OLAP Cube Server
– Data movement and transformation runtime services (SQW)
– Web-based Administration Console
OLAP on DB2 for System z
[Diagram: InfoSphere Warehouse with Universal Cube Access (MDX, ODBO, XMLA). Client tools: IBM Cognos 8 BI, IBM DataQuant, DB2 QMF, IBM Alphablox, Microsoft Excel. Uses: Portals, Web Applications, Dashboards, Interactive Reports, Ad Hoc Analysis, Common Desktop Tools.]
InfoSphere Warehouse – Cubing Services
[Diagram components: InfoSphere Warehouse Cube Server with Metadata and Data Cache; DB2 for z/OS; Linux LPAR; clients Excel, Cognos 8 BI Server, and Web Server; Design Studio & Admin Console. Connections are labeled SQL and MDX.]
Cube Server in Action – Startup
[Diagram labels: Start Cube; Cubing Services; OLAP Metadata; SQL; DB2; MQTs; Dim Member Cache.]
Cube Server in Action – Query Processing
[Diagram labels: MDX Query; Cubing Services; MDX calculation engine; Data cache; Dim Member Cache; OLAP Metadata; SQL; DB2; MQTs.]
Can pre-populate cache with an MDX seed query
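The cache-first pattern suggested by this slide can be sketched generically as follows. This is not the Cubing Services implementation: the `CachedCube` class, its methods, and the dictionary standing in for DB2 are all hypothetical, and real MDX parsing and SQL generation are left out entirely.

```python
class CachedCube:
    """Toy query path: answer from the data cache, else fetch and remember."""

    def __init__(self, fetch_from_db):
        self._fetch = fetch_from_db  # stand-in for issuing SQL to the database
        self._cache = {}
        self.db_calls = 0

    def query(self, key):
        if key not in self._cache:
            # Cache miss: go back to the database, then keep the result.
            self.db_calls += 1
            self._cache[key] = self._fetch(key)
        return self._cache[key]

    def seed(self, keys):
        """Pre-populate the cache, as a seed query would at startup."""
        for key in keys:
            self.query(key)

# Made-up backing data keyed by (product, quarter).
fake_db = {("Shoes", "Q1"): 400.0, ("Hats", "Q1"): 40.0}

cube = CachedCube(lambda key: fake_db[key])
cube.seed([("Shoes", "Q1")])                 # one trip to the database
first = cube.query(("Shoes", "Q1"))          # served from the cache
second = cube.query(("Hats", "Q1"))          # cache miss, second trip
calls_after_queries = cube.db_calls
```

Starting with a cold cache means the first request for each cell pays the database cost, which is what seeding is meant to avoid.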
Develop Star Schema model and create tables
Populate Dimension and Fact Tables
Populate Group dimension table
Populate Fact table
Create Cube Model and Cube definition
Measures
Facts
Dimensions
Levels
Hierarchies
Cubes
Facts Subset
Dim Subset
Cube Model = Superset; Cube = Subset
Deploy model/cubes
Deploy Cube
– Moves the “definition” of the cubes to the runtime environment, i.e., the Cube Server
– Step 1 – use Design Studio to deploy to the metadata repository
– Step 2 – use Administration Console to define and start a Cube Server
– Step 3 – assign a Cube to the Cube Server and start it
Optimize Cube
Optimization of a cube means optimizing the cube’s access to DB2
– This is accomplished by defining a performance layer of MQTs
– The Optimization Wizard creates a recommended set of MQTs based on the Cube Model and sampling of data
Cube Performance Statistics
From cube server performance log
Captured 54 queries from demo prep
Cube server started at 6:21am PST
Last MDX request at 8:48am PST
Queries < 1 sec
– 42 total queries
– 33 queries satisfied from cube cache
– 9 queries went back to DB2
– Probably routed to MQTs
Queries 1-10 secs
– 8 total queries
– 8 queries went back to DB2
– Probably hit MQTs
Queries 10-20 secs
– 2 total queries
– 2 queries went back to DB2
– Maybe hit MQTs??
Long Queries – Opportunity for adding MQTs or database tuning
– 1 at 120.81 secs – 120.30 seconds in DB2
– 1 at 143.70 secs – 143.70 seconds in DB2
Using only the initial MQT recommendation of 20 small MQTs
Started with a cold cache
When data was in cache, time to satisfy a query was mostly < .010 sec
Fact table size in DB2 ~ 2M rows
Further Reading
Books
– Data Warehouse: From Architecture to Implementation, by Barry Devlin
– Building the Data Warehouse, 4th Edition, by W. H. (Bill) Inmon
– The Data Warehouse Toolkit, by Ralph Kimball
IBM Redbooks (http://www.ibm.com/redbooks)
– Dimensional Modeling: In a Business Intelligence Environment (SG24-7138)
– Enterprise Data Warehousing with DB2 9 for z/OS (SG24-7637)
– InfoSphere Warehouse: Cubing Services and Client Access Interfaces (SG24-7582)
Websites
– International DB2 Users Group – http://www.idug.org
– The Data Warehousing Institute – http://www.tdwi.org/
– BeyeNetwork – http://www.b-eye-network.com
– IBM – http://www.ibm.com