ibm exploiting your data warehouse
TRANSCRIPT
-
7/28/2019 IBM Exploiting Your Data Warehouse
1/49
IBM Software Group
2007 IBM Corporation
Designing your BI Architecture
Exploiting your Data Warehouse
David Cope
EDW Architect Asia Pacific
-
The Analytical Evolution
(Chart axes: Business Value vs. Decision Empowerment)
Reports: Static, repetitive queries about past results.
Ad Hoc Analysis (Query and OLAP): Empowering analysts to test hypotheses for better decision making.
Insight: Discovering previously unknown and unsuspected information.
Action (IBM Differentiator): Easy Mining and Alphablox enable insights to be delivered throughout the enterprise.
-
IBM DB2 Warehouse Software
Embedded analytics: Data mining and visualization, In-line analytics
Modeling and design
Administration and control
Data movement and transformation
Database management: Performance optimization, Workload control, Data partitioning, Deep compression
-
IBM DB2 Warehouse Software
Embedded analytics: Data mining and visualization, In-line analytics
Modeling and design
Administration and control
Data movement and transformation
Database management: Performance optimization, Workload control, Data partitioning, Deep compression
-
DWE OLAP Model
(Diagram: relational tables in DB2 mapped to OLAP metadata objects)
Relational tables in DB2: a fact table joined to dimension tables
Cube Model: Facts (Measures), Dimensions (Attributes, Hierarchies, Levels), Joins
Cube: Cube Facts (Measures), Cube Dimensions, Cube Hierarchies, Cube Levels
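To make the mapping concrete, the sketch below models a cube model as plain Python data structures and answers one query at a hierarchy level the way a relational OLAP engine would: join each fact row to its dimension row, then aggregate (a GROUP BY). The class names, the `rollup` helper and the sample data are hypothetical illustrations, not the DB2 Cube Views metadata API. It assumes Python 3.9 or later.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Hierarchy:
    name: str
    levels: list[str]  # ordered from coarsest to finest, e.g. ["year", "month"]


@dataclass
class Dimension:
    name: str
    join_key: str                     # fact-table column that joins to this dimension table
    table: dict[int, dict[str, str]]  # dimension table: key -> attribute values
    hierarchies: list[Hierarchy] = field(default_factory=list)


@dataclass
class CubeModel:
    facts: list[dict]                 # fact table rows
    measures: list[str]               # numeric fact columns exposed as measures
    dimensions: dict[str, Dimension]

    def rollup(self, measure: str, dimension: str, level: str) -> dict[str, float]:
        """Sum one measure by one level of a dimension: a join followed by a GROUP BY."""
        if measure not in self.measures:
            raise KeyError(f"unknown measure: {measure}")
        dim = self.dimensions[dimension]
        totals: dict[str, float] = defaultdict(float)
        for row in self.facts:
            member = dim.table[row[dim.join_key]][level]  # join the fact row to its dimension row
            totals[member] += row[measure]
        return dict(totals)


# Hypothetical example: a tiny sales star schema with one Time dimension.
time_dim = Dimension(
    name="Time",
    join_key="time_id",
    table={1: {"year": "2006", "month": "2006-11"},
           2: {"year": "2006", "month": "2006-12"},
           3: {"year": "2007", "month": "2007-01"}},
    hierarchies=[Hierarchy("Calendar", ["year", "month"])],
)
model = CubeModel(
    facts=[{"time_id": 1, "sales": 100.0},
           {"time_id": 2, "sales": 150.0},
           {"time_id": 3, "sales": 80.0}],
    measures=["sales"],
    dimensions={"Time": time_dim},
)
print(model.rollup("sales", "Time", "year"))  # {'2006': 250.0, '2007': 80.0}
```

A cube in this picture would expose a subset of the cube model's measures, dimensions and hierarchies. In DB2 the equivalent query would be SQL against the star schema, which the optimizer can route to a matching MQT.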
-
Model-Based Optimization
Inputs to the Performance Advisor: model information (OLAP metadata), base tables, catalog tables, statistics and data samples, plus administrator input on time and space constraints and query types
Output: recommended MQTs (materialized query tables) over the base tables
Benefits: Smart aggregate selection, Smart index selection, SQL generation, DB2 exploitation
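The slide does not describe how the Performance Advisor chooses aggregates. As an illustration only, the sketch below uses a common textbook heuristic for the same problem: rank candidate aggregates by estimated benefit per unit of storage and take them greedily until the space budget is spent. The candidate names, sizes and benefit figures are made up, and this is not the advisor's actual algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str        # hypothetical aggregate (a GROUP BY over some dimension levels)
    size_mb: float   # estimated storage cost of materializing it
    benefit: float   # estimated query-time saving for the expected workload


def select_aggregates(candidates: list[Candidate], budget_mb: float) -> list[Candidate]:
    """Greedily pick aggregates with the best benefit per MB until the space budget is used."""
    chosen: list[Candidate] = []
    used = 0.0
    for cand in sorted(candidates, key=lambda c: c.benefit / c.size_mb, reverse=True):
        if used + cand.size_mb <= budget_mb:  # skip anything that would exceed the budget
            chosen.append(cand)
            used += cand.size_mb
    return chosen


candidates = [
    Candidate("sales_by_month_store", 400.0, 900.0),    # 2.25 per MB
    Candidate("sales_by_year_region", 20.0, 300.0),     # 15 per MB
    Candidate("sales_by_day_product", 2000.0, 1500.0),  # 0.75 per MB
    Candidate("sales_by_quarter_dept", 60.0, 450.0),    # 7.5 per MB
]
picked = select_aggregates(candidates, budget_mb=500.0)
print([c.name for c in picked])
```

A real advisor also has to account for overlap (an aggregate that can be answered from another one adds less benefit once the other is chosen), the time limit for the advisor run, and the indexes on each MQT; this sketch ignores all three.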
-
OLAP Metadata Interchange
DB2 Data Warehouse: RDBMS metadata and OLAP metadata alongside the data, DML and DDL
Metadata bridges exchange OLAP metadata with modeling and ETL tool metadata and with BI tool metadata
Tools shown exchanging OLAP metadata: Hyperion, Business Objects, QMF for Windows, MITI, DB2 Alphablox, QlikTech, ArcPlan
-
Alphablox
Platform for customized analytic applications and in-line analytics
Pre-built components (Blox) for analytic functionality
Allows you to create customized analytic components that are embedded into existing business processes and web applications
-
Alphablox
For end users: a web application, portal or dashboard with embedded analytics in an easy-to-use interactive interface
For application developers: a J2EE application for analysis-oriented interaction, and a set of analytics-focused extensions to the application server
Alphablox with DWE: SQL generated by DWE Design Studio can be pasted into Alphablox pages for warehouse-based embedded analytics
-
Alphablox Architecture
Client: web browser running a DHTML-based client (similar to AJAX) that communicates via XMLHttpRequest
Application server: Alphablox on WebSphere, WebLogic or Tomcat
UI model: GridBlox, ChartBlox, PresentBlox
DataBlox, with calculations, bookmarks, alerts and comments
Data sources: relational databases (through the Alphablox ROLAP cubing engine), OLAP servers (Essbase, MSAS, SAP BW), and MQ
-
Relational Cubing Engine & OLAP Optimization
Customer tier: connects through an HTTP server
Application server tier: DB2 Alphablox Application (DataBlox feeding PresentBlox, GridBlox and ChartBlox) and DB2 Alphablox Server (Relational Cubing Engine with the relational cube, cubelets and cube definition)
Database server tier: DB2 Cube Views OLAP metadata, star schema and DB2 MQTs
Flows: MDX queries from the Blox to the Relational Cubing Engine; OLAP metadata import from DB2 Cube Views; fact data retrieval and dimension data retrieval from the database
-
Versatile Architecture Support
DB2 Warehouse supports versatile analytics architectures.
BI applications and tools can direct analytics against:
External marts
Internal marts
Virtual marts
(Diagram: the EDW alongside external, internal and virtual marts)
-
IBM DB2 Warehouse Software
Embedded analytics: Data mining and visualization, In-line analytics
Modeling and design
Administration and control
Data movement and transformation
Database management: Performance optimization, Workload control, Data partitioning, Deep compression
-
DWE Easy Mining: Mining without a Statistician
Realize the benefits of mining by enabling analysts, rather than relying on statisticians, to meet your data mining needs
(Diagram: a reporting tool on top of DB2 Data Warehouse Edition)
-
Two Types of Data Mining: Discovery & Predictive
Predictive
Specific question
Probability associated with outcomes
Directed analysis
Iterative process: train, test, apply
Apply model in database at customer touch points
Discovery
Automatically find trends and patterns
Answer unasked questions
Relatively undirected analysis
Tool reports on findings
In a word: easier
Useful for non-statisticians
-
DWE Easy Mining Algorithms
Discovery Methods: finding useful patterns and relationships
Associations
Which item affinities (rules) are in my data?
[Beer] => [Diapers] (single transaction)
Sequences
Which sequential patterns are in my data?
[Love] => [Marriage] => [Baby Products] (sequential)
Clustering
Which interesting groups are in my data?
Customer profiles, store profiles
Predictive Methods: predicting values
Classification
How to predict categorical values in my data?
Will the patient be cured, harmed or unaffected by treatment?
Regression
How to predict numerical values in my data?
How likely is a customer to respond to the promotion?
How much will each customer spend this year?
Score data directly in DB2: scalable and real time
(Diagram: Select, Transform, Mine, Assimilate, taking the data warehouse through selected data and extracted information to assimilated information; actors shown: statistician and data mining workbench, business analyst, enterprise data warehouse, DWE, partner)
-
How to Recognize a Data Mining Need
What do my customers look like?
Which customers should I target in a promotion?
Which products should I use for the promotion?
How should I lay out my new stores?
Which products should I replenish in anticipation of a promotion?
Which of my customers are most likely to churn?
How can I improve customer loyalty?
What is the most likely item that a customer will purchase next?
Who is most likely to have another heart attack?
What is the likelihood of a part failure?
When one part fails, what other part(s) are most likely to fail soon?
How can I identify high-potential prospects (lead generation)?
How can I detect potential fraud?
-
High-Level View of the Data Mining Process
(Diagram: Business Problem, Data Warehouse, Extract & Transform Data, Build Model, Validate/Refine, Deploy, Insight; captioned "A minor miracle occurs")
-
The Data Mining Process
(Diagram; highlighted stage: MINING)
ETL (data preparation): starting from the business problem, select data from the data warehouse and transform it
MINING: mine the data to discover and interpret information, e.g. a model Y = f(X, Z); visualize, understand and analyze the results
DEPLOY: apply the results: report, score data, embed in an application
Revise data & refine model: this is an iterative process!
-
Associations
Discovery technique to find associations or affinities among items (or conditions, outcomes, etc.) in a single transaction.
Constructs statements (rules) that quantify the relationships among items that tend to occur together in transactions.
Example:
In a supermarket, Cola is bought in 20% of all purchases.
Cola is bought in 60% of the purchases involving Orange juice.
3.7% of all purchases involve both Cola and Orange juice.
The rule [Orange juice] => [Cola] has the following properties:
Support = 3.7%: Cola and OJ are present together in 3.7% of all baskets.
Confidence = 60%: Cola is present in 60% of the baskets containing OJ.
Lift = 60% / 20% = 3: Cola is 3 times as likely to be in the basket when OJ is also there.
Scoring
Given the item(s) purchased (rule body), what item (rule head) is most likely to be purchased as well?
Common uses
Promotional or cross-sell offers, disease management, part failure
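The arithmetic on this slide generalizes directly: support is the share of baskets containing all of a rule's items, confidence is support(body and head) / support(body), and lift is confidence / support(head), which is how 60% / 20% gives a lift of 3. The sketch below computes all three over a list of baskets and adds a naive form of the scoring step (pick the unpurchased item with the highest-confidence rule). The baskets and the `best_next_item` helper are hypothetical; this is not the DWE mining API, and real association mining would first enumerate frequent itemsets (for example with Apriori) rather than test every pair.

```python
def support(baskets: list[set[str]], items: set[str]) -> float:
    """Fraction of all baskets that contain every item in `items`."""
    return sum(items <= basket for basket in baskets) / len(baskets)


def confidence(baskets: list[set[str]], body: set[str], head: set[str]) -> float:
    """Fraction of the baskets containing the rule body that also contain the rule head."""
    body_support = support(baskets, body)
    return support(baskets, body | head) / body_support if body_support else 0.0


def lift(baskets: list[set[str]], body: set[str], head: set[str]) -> float:
    """Confidence relative to the head's overall support; above 1 means positive affinity."""
    return confidence(baskets, body, head) / support(baskets, head)


def best_next_item(baskets: list[set[str]], purchased: set[str], min_support: float = 0.1):
    """Score a basket: the unpurchased item with the highest single-item-body rule confidence."""
    best_item, best_conf = None, 0.0
    for head in sorted(set().union(*baskets) - purchased):
        for item in purchased:
            if support(baskets, {item, head}) < min_support:
                continue  # ignore rules that are too rare to trust
            conf = confidence(baskets, {item}, {head})
            if conf > best_conf:
                best_item, best_conf = head, conf
    return best_item, best_conf


# Made-up baskets: 10 purchases.
baskets = ([{"OJ", "Cola"}] * 3 + [{"OJ", "Bread"}] * 2 + [{"Cola", "Chips"}]
           + [{"Bread"}] * 2 + [{"Milk"}] * 2)
body, head = {"OJ"}, {"Cola"}
print(support(baskets, body | head))        # 0.3
print(confidence(baskets, body, head))      # 0.6
print(round(lift(baskets, body, head), 2))  # 1.5
print(best_next_item(baskets, {"OJ"}))      # ('Cola', 0.6)
```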
-
Sequences
Discovery technique to find affinities among items (or conditions, outcomes, etc.) across multiple transactions over time.
Quantifies relationships (sequences) to identify the most likely item in the next transaction.
Scoring
Given the item(s) purchased previously (rule body), what item (rule head) is most likely to be purchased in a subsequent transaction within a certain time frame?
Common uses
Fraud detection, promotional offers, disease management, part failure
Example purchase sequences, one customer per line:
G, B -> C -> X
B -> A -> Y
Y -> D -> C -> B -> X
100% of the customers who get C will get X at a later time.
67% of the customers who get B will get X at a later time.
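The two percentages in the example can be reproduced from the three sequences. The sketch below counts, for a body item, how many customers bought it and how many of those bought the head item in a later visit. The `later_confidence` helper is hypothetical and ignores the time-frame limit that real sequence scoring would apply.

```python
def later_confidence(sequences: dict[str, list[set[str]]], body: str, head: str) -> float:
    """Share of customers who bought `body` and then bought `head` in a later visit."""
    with_body = 0
    followed = 0
    for visits in sequences.values():
        # First visit in which the body item appears, if any.
        first = next((i for i, visit in enumerate(visits) if body in visit), None)
        if first is None:
            continue
        with_body += 1
        if any(head in visit for visit in visits[first + 1:]):
            followed += 1
    return followed / with_body if with_body else 0.0


# The three customers from the slide, one list of visits each.
sequences = {
    "customer_1": [{"G", "B"}, {"C"}, {"X"}],
    "customer_2": [{"B"}, {"A"}, {"Y"}],
    "customer_3": [{"Y"}, {"D"}, {"C"}, {"B"}, {"X"}],
}
print(later_confidence(sequences, "C", "X"))            # 1.0
print(round(later_confidence(sequences, "B", "X"), 2))  # 0.67
```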
-
Clustering
Discovery technique to find clusters having distinct behaviors and characteristics
Gain insights into customers, stores, insurance claims, etc.
Generate distinct behavioral/demographic profiles
Understand the most important attributes of each cluster
Create a model to assign individuals to best-fit clusters
Apply model to assign new individuals or re-assign existing individuals
Design business actions tailored to different characteristic profiles
Scoring
Apply model to assign each record to its best-fit cluster
Apply appropriate business action for each record based on its assigned cluster
Common uses
Customer segmentation, store profiling, deviation detection
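The slide does not name the clustering algorithm DWE uses. As a stand-in, the sketch below uses k-means, a different but widely known clustering technique, to show the two steps the slide describes: build a model (cluster centroids) from existing customers, then score by assigning a new customer to the best-fit (nearest) cluster. The two features and all values are made up; in practice the attributes would be scaled to comparable ranges before computing distances.

```python
import math


def nearest(point, centroids) -> int:
    """Index of the centroid closest to `point` (Euclidean distance): the best-fit cluster."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k: int, iterations: int = 20):
    """Plain k-means; seeds the centroids with the first k points to stay deterministic."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties out
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


# Made-up customers: (store visits per year, annual spend in $100s).
customers = [(2, 1.5), (3, 2.0), (30, 25.0), (28, 22.0), (4, 1.0), (32, 27.0)]
centroids = kmeans(customers, k=2)      # modeling
print(centroids)
print(nearest((25, 20.0), centroids))  # scoring a new customer: 1
```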
-
Classification
Prediction technique to classify individuals by outcome
Classify by a categorical class variable (e.g., YES-NO-MAYBE response)
Understand the most important factors (predictors) leading to each outcome
Modeling
Create a model to classify individuals according to expected outcome
Design business action based on most important predictors
Scoring
Apply model to predict the outcome for each individual
New prospects (expected behavior)
Existing individuals (changes in behavior)
Identify target individuals for business action
Common uses
Customer attrition (churn), Part failure
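A minimal sketch of the train, test, apply cycle for classification, using a one-level decision tree (a decision stump) chosen only to keep the example short; it is not the algorithm DWE uses. Training picks the single predictor and threshold that best separate the outcomes, which also identifies the most important predictor; testing measures accuracy on held-out rows; applying scores a new prospect. The churn data and helper names are made up.

```python
from collections import Counter


def majority(labels: list[str]) -> str:
    """Most common label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def train_stump(rows: list[dict], labels: list[str], features: list[str]):
    """Train a one-level decision tree: the feature/threshold split with fewest training errors."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            l_label, r_label = majority(left), majority(right)
            errors = sum(y != l_label for y in left) + sum(y != r_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_label, r_label)
    if best is None:
        raise ValueError("no usable split in the training data")
    _, f, t, l_label, r_label = best

    def predict(row: dict) -> str:
        return l_label if row[f] <= t else r_label

    return predict, f  # the model and its most important predictor


def accuracy(model, rows: list[dict], labels: list[str]) -> float:
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)


# Made-up customer data: tenure in months and number of complaints.
train_rows = [{"tenure": 36, "complaints": 0}, {"tenure": 24, "complaints": 1},
              {"tenure": 6, "complaints": 4}, {"tenure": 12, "complaints": 3},
              {"tenure": 48, "complaints": 0}, {"tenure": 3, "complaints": 5}]
train_labels = ["stay", "stay", "churn", "churn", "stay", "churn"]
test_rows = [{"tenure": 9, "complaints": 2}, {"tenure": 30, "complaints": 1},
             {"tenure": 10, "complaints": 0}]
test_labels = ["churn", "stay", "stay"]

model, top_predictor = train_stump(train_rows, train_labels, ["tenure", "complaints"])  # train
print(top_predictor, round(accuracy(model, test_rows, test_labels), 2))  # test: tenure 0.67
print(model({"tenure": 4, "complaints": 3}))  # apply to a new prospect: churn
```

The held-out customer with a tenure of 10 months and no complaints is misclassified, which is the kind of result the test step exists to surface before the model is applied.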
-
Regression
Set of predictive techniques to predict a dependent variable
Predict a continuous value or a binary numeric value
Continuous: e.g., revenue (prediction represents amount of revenue)
Binary: e.g., 0=No, 1=Yes (prediction represents probability of Yes)
Understand the most important predictors of the dependent variable
Transform regression, linear regression, polynomial regression
Modeling
Create a model to predict the dependent variable
Design business action (e.g., predict likelihood of default for a loan application, in real time)
Scoring
Apply model to generate a prediction for each individual (e.g., probability of part failure)
Identify target individuals for business action
Common uses
Predict revenue/cost/profitability, predict risk of loan default
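Linear regression with one predictor is the simplest member of the family listed above. The sketch fits y = a + b*x by ordinary least squares (b is the sum of (x - mean x)(y - mean y) divided by the sum of (x - mean x) squared, and a = mean y - b * mean x), then scores a new value. The data and the `fit_line` name are made up for illustration; transform regression and polynomial regression are not shown.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = a + b*x with a single predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


# Made-up data: store visits per year (predictor) vs. annual spend in $000 (dependent variable).
visits = [1, 2, 3, 4, 5]
spend = [2.1, 3.9, 6.2, 7.8, 10.0]
a, b = fit_line(visits, spend)
print(f"spend = {a:.2f} + {b:.2f} * visits")  # spend = 0.09 + 1.97 * visits
print(round(a + b * 6, 2))                    # score a customer with 6 visits: 11.91
```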
-
The Data Mining Process
(Same process diagram as above; highlighted stage: ETL, i.e. selecting and transforming data for mining)
-
Data exploration
DWE enables you to explore the data.
Check data quality (prior to performing ETL for data preparation) and gain a general understanding of the data.
Design Studio provides four tools to inspect data:
Table sampling
Univariate distributions
Bivariate distributions
Multivariate distributions
All these tools are accessible by right-clicking on a table, view, alias or nickname in the database explorer:
-> Data: table sampling/editing
-> Value Distributions: univariate, bivariate and multivariate distributions
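The distributions listed above amount to value counts over one, two or more columns. The sketch below computes them, plus a reproducible random sample, over a small in-memory table; it illustrates what the tools report and is not how Design Studio implements them. The helper names and the rows, including the deliberately missing values, are made up.

```python
import random
from collections import Counter


def sample_rows(rows: list[dict], k: int, seed: int = 0) -> list[dict]:
    """Table sampling: a reproducible random sample of k rows."""
    return random.Random(seed).sample(rows, k)


def distribution(rows: list[dict], *columns: str) -> Counter:
    """Counts of each combination of values: 1 column = univariate, 2 = bivariate, more = multivariate."""
    return Counter(tuple(row[c] for c in columns) for row in rows)


# Made-up customer rows; the None values are the kind of quality problem exploration should surface.
customers = [
    {"age_band": "25-34", "gender": "F"},
    {"age_band": "25-34", "gender": "F"},
    {"age_band": "35-44", "gender": "M"},
    {"age_band": None, "gender": "F"},
    {"age_band": "35-44", "gender": None},
]
print(distribution(customers, "age_band"))                          # univariate
print(distribution(customers, "age_band", "gender")[("25-34", "F")])  # bivariate cell: 2
print(len(sample_rows(customers, 3)))                               # 3
```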
-
The Data Mining Process
(Same process diagram as above; highlighted stage: DEPLOY, i.e. reporting, scoring data and embedding results in an application)
-
Leveraging Mining and Alphablox: DWE Miningblox
Create web applications that provide access to DWE Data Mining
Extends the DB2 Alphablox API with mining-specific functionality.
With Miningblox, you can perform the following tasks:
Selecting input data
Processing input data
Displaying mining results graphically in a Web browser, for example, the characteristics of a customer segment
Administering or managing mining runs
Typically, a web application using Miningblox tags might be integrated into a business application or an intranet portal.
-
Why use Miningblox?
Provide access to data mining for a group of business analysts.
Create a Miningblox web application that provides access to mining functionality through the Web browser, with no need to install software on the client machines.
Analysts can execute mining runs and view results in a customized web application without extensive knowledge of mining software.
With the Miningblox Application wizard in the DWE Design Studio, you can easily create Web applications by selecting sample templates, or you can extend Alphablox applications with mining functionality.
-
Deployment through Alphablox application example
MBA application console
-
Deployment through Alphablox application example
MBA execution
-
Deployment through Alphablox application example
MBA completion
-
Deployment through Alphablox application example
MBA results report
-
Case Study: Retail Department Store
Analytics with Data Mining and Alphablox
David Cope
EDW Architect Asia Pacific
-
Retail Department Store Chain
Business requirements
Perform a data mining POC (really a pilot project) to support the original DWE decision, ensure success, and highlight DWE capabilities for further uptake
Define business problem: boost storewide sales (across other departments) based on women's shoes
Define analytical approach and ETL procedure
Extract all transactions of customers who have purchased women's shoes
Transform transactional data into one record per customer, for customer segmentation
Perform market basket analysis (MBA) for high-potential customers who have purchased women's shoes
Challenges
Engagement sponsored by IT with limited access to business users (LOB)
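The two ETL steps above (extract the transactions of women's shoe customers, then transform them into one record per customer) can be sketched as follows. The field names, the helper names, the 365-day window (taken from the 12-month focus of the business scenario), the department label and the convention that returns are stored as negative amounts are all assumptions for illustration; the resulting attributes loosely mirror the cluster profiles later in this case study (days shopped, purchases, spending, response to discounts, returns).

```python
from datetime import date, timedelta


def in_window(tx: dict, as_of: date, days: int) -> bool:
    return as_of - timedelta(days=days) <= tx["date"] <= as_of


def shoe_customers(transactions, as_of, dept="Women's Shoes", days=365):
    """Extract: customers with at least one purchase in `dept` during the window."""
    return {tx["customer"] for tx in transactions
            if tx["dept"] == dept and tx["amount"] > 0 and in_window(tx, as_of, days)}


def customer_records(transactions, customers, as_of, days=365):
    """Transform: pivot each selected customer's in-window transactions into one record."""
    acc = {}
    for tx in transactions:
        if tx["customer"] not in customers or not in_window(tx, as_of, days):
            continue
        r = acc.setdefault(tx["customer"], {"days": set(), "depts": set(), "purchases": 0,
                                            "returns": 0, "discounted": 0, "spend": 0.0})
        r["days"].add(tx["date"])
        r["depts"].add(tx["dept"])
        r["spend"] += tx["amount"]
        if tx["amount"] < 0:  # assumption: returns are stored as negative amounts
            r["returns"] += 1
        else:
            r["purchases"] += 1
            r["discounted"] += tx["discounted"]  # True counts as 1
    return {
        cust: {"days_shopped": len(r["days"]), "departments": len(r["depts"]),
               "purchases": r["purchases"], "returns": r["returns"], "net_spend": r["spend"],
               "discount_share": r["discounted"] / r["purchases"] if r["purchases"] else 0.0}
        for cust, r in acc.items()
    }


as_of = date(2007, 6, 30)
transactions = [  # made-up rows
    {"customer": "C1", "date": date(2007, 1, 10), "dept": "Women's Shoes", "amount": 120.0, "discounted": True},
    {"customer": "C1", "date": date(2007, 1, 10), "dept": "Handbags", "amount": 80.0, "discounted": False},
    {"customer": "C1", "date": date(2007, 3, 2), "dept": "Women's Shoes", "amount": -120.0, "discounted": False},
    {"customer": "C2", "date": date(2007, 2, 14), "dept": "Menswear", "amount": 60.0, "discounted": False},
    {"customer": "C3", "date": date(2005, 5, 1), "dept": "Women's Shoes", "amount": 90.0, "discounted": False},
]
shoppers = shoe_customers(transactions, as_of)
print(shoppers)  # {'C1'}
print(customer_records(transactions, shoppers, as_of))
```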
-
Solution Overview
(Diagram: DB2 Data Warehouse with the Alphablox cubing engine and the data mining API; mining models and services for clustering, associations and sequences, and scoring; results delivered to an Alphablox analytical dashboard and to the Data Mining Visualizer/Alphablox, heat maps and other visualization)
Prepare data for mining by:
Pulling transactions for women's shoe customers
Creating data for customer segmentation
Use DB2 Mining to perform:
Clustering: identify high-potential customer segments
Market basket analysis for high-potential segments: identify associated items and next-most-likely purchases
Deploy mining results in Alphablox:
Integrate data mining information into the dashboard and as part of the guided analysis
Build a dashboard in Alphablox:
Provide critical information and metrics in an Alphablox dashboard to merchandising and marketing
Integrate powerful visualization to make it easier to identify problem areas
-
Business Scenario for Mining
Business requirements for POC
Focus on customers who have purchased women's shoes in the past 12 months
Boost storewide sales (across other departments) based on women's shoes
Increase wallet share from high-potential customers
Business questions to be answered
What do my women's shoes customers look like?
Which of these customers should I target in a promotion?
Which products should I use for the promotion?
Which products should I replenish in anticipation of a promotion?
How can I improve customer loyalty?
What is the most likely item that a women's shoes customer will purchase next?
-
Step 1: Identify High-Potential Shoe Customers
-
Result: 16 Distinct Clusters Created
-
Cluster 1: Those who Act Like VIPs
Profile: VIPs, frequent shoppers, big spenders, active shoppers; respond to discounts; high returns
High Potential Customers!
-
Cluster 6: Frequent Good Shoppers
Profile: shop here 30 days/yr; above-average purchases; above-average spending; respond to discounts; average returns
High Potential Customers!
-
Step 2: Identify Associated Items for Clusters 1 & 6
Extracted transactions for those clusters of customers
Performed market basket analysis and interpreted results
Associations (items purchased together in one visit)
-
Identify Items Purchased Together for Clusters 1 & 6
-
Results: Associations for Clusters 1 & 6
-
Step 3: Identify Next Likely Purchase for Clusters 1 & 6
Extracted transactions for those clusters of customers
Performed market basket analysis and interpreted results
Sequences (next most likely purchase in a future visit)
-
Identify Next Likely Purchases for Clusters 1 & 6
-
Results: Sequences for Customers in Clusters 1 & 6
-
Results and Future Ideas
Deployment of customer segmentation and MBA
End-user application with Alphablox
Create & refresh mining models
Identify high-potential customer segments
Refresh assignment of each customer to best-fit cluster
Target selected customer segments for promotions
Batch scoring to identify best offer(s) for each customer/segment
Merchandising now has a view of their customers, not just products
Future ideas
Score a customer at checkout register in real time
MBA scoring (associations, sequences)
Focused MBA scoring for known customers, based on best-fit cluster
Make an offer to induce customers to visit other departments before leaving the store