IBM Exploiting Your Data Warehouse

Upload: vincenzo-presutto

Post on 03-Apr-2018


  • 7/28/2019 IBM Exploiting Your Data Warehouse

    1/49

    IBM Software Group

    2007 IBM Corporation

    Designing your BI Architecture

    Exploiting your Data Warehouse

    David Cope

    EDW Architect Asia Pacific

  • 7/28/2019 IBM Exploiting Your Data Warehouse

    2/49

    The Analytical Evolution

    (Figure axes: Business Value, Decision Empowerment.)

    Reports

    Static, repetitive queries about past results.

    Ad hoc analysis

    Empowering analysts to test hypotheses for better decision making. Query and OLAP.

    Insight

    Discovering previously unknown and unsuspected information.

    Action (IBM differentiator)

    Easy Mining and Alphablox enable insights to be delivered throughout the enterprise.

  • 7/28/2019 IBM Exploiting Your Data Warehouse

    3/49

    IBM DB2 Warehouse Software

    (Component diagram of IBM DB2 Warehouse.)

    Embedded analytics: data mining and visualization, in-line analytics

    Modeling and design

    Administration and control

    Data movement and transformation

    Database management

    Performance optimization: workload control, data partitioning, deep compression

  • 7/28/2019 IBM Exploiting Your Data Warehouse

    4/49

    IBM DB2 Warehouse Software

    (The IBM DB2 Warehouse component diagram is repeated here to introduce the in-line analytics component.)

  • 7/28/2019 IBM Exploiting Your Data Warehouse

    5/49

    DWE OLAP Model

    (Diagram: relational tables in DB2, a fact table joined to dimension tables, are described by OLAP metadata objects.)

    Cube model objects: facts, measure, dimension, hierarchy, level, attribute, join

    Cube objects: cube facts, cube dimension, cube hierarchy, cube level

  • 7/28/2019 IBM Exploiting Your Data Warehouse

    6/49

    Model-Based Optimization

    (Diagram labels: administrator, time & space constraints, query types; OLAP metadata, model information; catalog tables, statistics; base tables, data samples; Performance Advisor; MQTs.)

    Benefits

    Smart aggregate selection

    Smart index selection

    SQL generation

    DB2 exploitation

  • 7/28/2019 IBM Exploiting Your Data Warehouse

    7/49

    OLAP Metadata Interchange

    (Diagram labels: DB2 data warehouse holding RDBMS metadata, OLAP metadata, and data, with DDL and DML; metadata bridges; model & ETL tool metadata; BI tool metadata.)

    Tools and vendors shown: Hyperion, Business Objects, QMF for Windows, MITI, DB2 Alphablox, QlikTech, ArcPlan

  • 7/28/2019 IBM Exploiting Your Data Warehouse

    8/49

    Alphablox

    Platform for customized analytic applications and inline analytics

    Pre-built components (Blox) for analytic functionality

    Allows you to create customized analytic components that are embedded into existing business processes and web applications

  • 7/28/2019 IBM Exploiting Your Data Warehouse

    9/49

    Alphablox

    For end users:

    A web application, portal, or dashboard with embedded analytics in an easy-to-use interactive interface

    For application developers:

    A J2EE application for analysis-oriented interaction

    A set of analytic-focused extensions to the application server

    Alphablox with DWE:

    SQL generated by DWE Design Studio can be pasted into Alphablox pages for warehouse-based embedded analytics

  • 7/28/2019 IBM Exploiting Your Data Warehouse

    10/49

    Alphablox Architecture

    (Diagram labels, grouped.)

    Client: web browser, DHTML-based client similar to AJAX, XMLHttpRequest

    Alphablox: UI Model; GridBlox, ChartBlox, PresentBlox; DataBlox; calculations, bookmarks, alerts, comments

    Data sources: relational databases through the Cubing Engine (ROLAP); OLAP (Essbase / MSAS / SAP BW); MQ

    Application servers: WebLogic, WebSphere, Tomcat

  • 7/28/2019 IBM Exploiting Your Data Warehouse

    11/49

    Relational Cubing Engine & OLAP Optimization

    (Diagram labels.)

    Tiers: customer tier, application server tier, database server tier

    DB2 Alphablox application: Data Blox, Present Blox, Grid Blox, Chart Blox

    DB2 Alphablox Server: Relational Cubing Engine, relational cube, cube definition, cubelets

    Database: OLAP metadata, DB2 Cube Views, star schema, DB2 MQTs

    Connections: HTTP server, MDX, metadata import, dimension data retrieval, fact data retrieval

  • 7/28/2019 IBM Exploiting Your Data Warehouse

    12/49

    Versatile Architecture Support

    DB2 Warehouse supports versatile analytics architectures

    Analytics directed against:

    External mart

    Internal mart

    Virtual mart

    (Diagram labels: BI applications and tools; EDW; external marts; internal marts; virtual marts.)

  • 7/28/2019 IBM Exploiting Your Data Warehouse

    13/49

    IBM DB2 Warehouse Software

    (The IBM DB2 Warehouse component diagram is repeated here to introduce the data mining and visualization component.)

  • 7/28/2019 IBM Exploiting Your Data Warehouse

    14/49

    DWE Easy Mining: Mining without a Statistician

    Realize the benefits of mining by enabling analysts, rather than relying on statisticians, for your data mining needs

    (Figure labels: reporting tool; DB2 Data Warehouse Edition.)

  • 7/28/2019 IBM Exploiting Your Data Warehouse

    15/49

    Two Types of Data Mining: Discovery & Predictive

    Predictive

    Specific question

    Probability associated with outcomes

    Directed analysis

    Iterative process

    Train

    Test

    Apply

    Apply model in database at customer touch points

    Discovery

    Automatically find trends and patterns

    Answer unasked questions

    Relatively undirected analysis

    Tool reports on findings

    In a word: easier

    Useful for non-statisticians

  • 7/28/2019 IBM Exploiting Your Data Warehouse

    16/49

    DWE Easy Mining Algorithms

    Discovery methods: finding useful patterns and relationships

    Associations

    Which item affinities (rules) are in my data?

    [Beer => Diapers] single transaction

    Sequences

    Which sequential patterns are in my data?

    [Love] => [Marriage] => [Baby Products] sequential

    Clustering

    Which interesting groups are in my data?

    Customer profiles, store profiles

    Predictive methods: predicting values

    Classification

    How to predict categorical values in my data?

    Will the patient be cured, harmed, or unaffected by treatment?

    Regression

    How to predict numerical values in my data?

    How likely is a customer to respond to the promotion?

    How much will each customer spend this year?

    Score data directly in DB2, scalable and real time

    (Diagram labels: select, transform, mine, assimilate; data warehouse, selected data, extracted information, assimilated information; statistician & data mining workbench; business analyst with DWE; partner; enterprise data warehouse.)

  • 7/28/2019 IBM Exploiting Your Data Warehouse

    17/49


    How to Recognize a Data Mining Need

    What do my customers look like?

    Which customers should I target in a promotion?

    Which products should I use for the promotion?

    How should I lay out my new stores?

    Which products should I replenish in anticipation of a promotion?

    Which of my customers are most likely to churn?

    How can I improve customer loyalty?

    What is the most likely item that a customer will purchase next?

    Who is most likely to have another heart attack?

    What is the likelihood of a part failure?

    When one part fails, what other part(s) are most likely to fail soon?

    How can I identify high-potential prospects (lead generation)?

    How can I detect potential fraud?

  • 7/28/2019 IBM Exploiting Your Data Warehouse

    18/49

    High-Level View of the Data Mining Process

    (Diagram labels: business problem; data warehouse; extract & transform data; build model; validate, refine; deploy; insight. Callout: "A minor miracle occurs.")

  • 7/28/2019 IBM Exploiting Your Data Warehouse

    19/49

    The Data Mining Process

    (Process diagram with three phases; the MINING phase is highlighted.)

    ETL (data preparation): business problem, data warehouse, select data, select, transform

    MINING (data mining): mine, visualize, understand, analyze, discover & interpret information, revise data & refine model; a model is sketched as Y = f(X, Z)

    DEPLOY: apply results, report, score data, embed in application

    This is an iterative process!

  • 7/28/2019 IBM Exploiting Your Data Warehouse

    20/49


    Associations

    Discovery technique to find associations or affinities among items (or conditions, outcomes, etc.) in a single transaction.

    Constructs statements (rules) that quantify the relationships among items that tend to occur together in transactions

    Example:

    In a supermarket, Cola is bought in 20% of all purchases.

    Cola is bought in 60% of the purchases involving Orange juice.

    3.7% of all purchases involve both Cola and Orange juice.

    The rule [Orange juice] => [Cola] has the following properties:

    Support = 3.7%: Cola and OJ are present together in 3.7% of all baskets.

    Confidence = 60%: Cola is present in 60% of the baskets containing OJ.

    Lift = 60% / 20% = 3: Cola is 3 times as likely to be in the basket when OJ is also.

    Scoring

    Given the item(s) purchased (rule body), what item (rule head) is most likely to be purchased as well?

    Common uses

    Promotional or cross-sell offers, Disease management, Part failure
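    The support, confidence, and lift figures in the example above can be reproduced with a few counts. The following is a minimal sketch, not DWE's implementation: the function name `rule_metrics` and the ten baskets are hypothetical and invented for illustration, and only the final line uses the slide's own percentages.

```python
def rule_metrics(baskets, body, head):
    """Return (support, confidence, lift) for the rule [body] => [head].

    Assumes body and head each occur in at least one basket.
    """
    total = len(baskets)
    with_body = sum(1 for basket in baskets if body in basket)
    with_head = sum(1 for basket in baskets if head in basket)
    with_both = sum(1 for basket in baskets if body in basket and head in basket)
    support = with_both / total              # share of all baskets holding both items
    confidence = with_both / with_body       # share of body baskets that also hold the head
    lift = confidence / (with_head / total)  # confidence relative to the head's base rate
    return support, confidence, lift


# Hypothetical baskets, invented for illustration (not the slide's data).
BASKETS = [
    {"orange juice", "cola", "bread"},
    {"orange juice", "cola"},
    {"orange juice", "cola", "milk"},
    {"orange juice", "milk"},
    {"orange juice", "bread"},
    {"cola", "chips"},
    {"milk", "bread"},
    {"bread"},
    {"chips"},
    {"milk"},
]

support, confidence, lift = rule_metrics(BASKETS, "orange juice", "cola")

# The slide's own figures: confidence 60%, Cola base rate 20%.
slide_lift = 0.60 / 0.20
```

    A lift above 1 means the rule head shows up more often alongside the rule body than it does in baskets overall, which is what makes the rule interesting for cross-sell offers.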

  • 7/28/2019 IBM Exploiting Your Data Warehouse

    21/49


    Sequences

    Discovery technique to find affinities among items (or conditions, outcomes, etc.) across multiple transactions over time.

    Quantifies relationships (sequences) to identify the most likely item in the next transaction

    Scoring

    Given the item(s) purchased previously (rule body), what item (rule head) is most likely to be purchased in a subsequent transaction within a certain time frame?

    Common uses

    Fraud detection, Promotional offers, Disease management, Part failure

    Example (purchase sequences of three customers over time):

    G, B ---- C ---- X

    B ---- A ---- Y

    Y ---- D ---- C ---- B ---- X

    100% of the customers who get C will get X at a later time

    67% of the customers who get B will get X at a later time
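    The 100% and 67% figures can be checked by counting, for each customer who bought the rule body, whether the rule head appears in any later transaction. This is a minimal sketch under that reading of the slide; the function name `sequence_confidence` is hypothetical, and the data is the three sequences from the slide, with each customer's history written as a list of transactions in time order.

```python
def sequence_confidence(histories, body, head):
    """Share of customers who bought `body` and bought `head` in a later transaction."""
    with_body = 0
    followed = 0
    for visits in histories:
        # Index of the first transaction containing the rule body, if any.
        first = next((i for i, items in enumerate(visits) if body in items), None)
        if first is None:
            continue
        with_body += 1
        # Look only at transactions strictly after that one.
        if any(head in items for items in visits[first + 1:]):
            followed += 1
    return followed / with_body if with_body else 0.0


# The three customer sequences shown on the slide.
HISTORIES = [
    [{"G", "B"}, {"C"}, {"X"}],
    [{"B"}, {"A"}, {"Y"}],
    [{"Y"}, {"D"}, {"C"}, {"B"}, {"X"}],
]

c_then_x = sequence_confidence(HISTORIES, "C", "X")  # 2 of 2 customers
b_then_x = sequence_confidence(HISTORIES, "B", "X")  # 2 of 3 customers
```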

  • 7/28/2019 IBM Exploiting Your Data Warehouse

    22/49


    Clustering

    Discovery technique to find clusters having distinct behaviors and characteristics

    Gain insights to customers, stores, insurance claims, etc.

    Generate distinct behavioral/demographic profiles

    Understand the most important attributes of each cluster

    Create a model to assign individuals to best-fit clusters

    Apply model to assign new individuals or re-assign existing individuals

    Design business actions tailored to different characteristic profiles

    Scoring

    Apply model to assign each record to its best-fit cluster

    Apply appropriate business action for each record based on its assigned cluster

    Common uses

    Customer segmentation, store profiling, deviation detection
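    Scoring a record against a clustering model amounts to finding its best-fit cluster. The sketch below uses nearest-centroid assignment as a stand-in, which is an assumption and not necessarily how DWE's clustering algorithms score records. The cluster names, the two features (shopping days per year, annual spend in hundreds), and the centroid values are all hypothetical.

```python
import math


def assign_cluster(record, centroids):
    """Return the name of the cluster whose centroid is closest to the record."""
    return min(centroids, key=lambda name: math.dist(record, centroids[name]))


# Hypothetical centroids: (shopping days per year, annual spend in hundreds).
CENTROIDS = {
    "frequent_big_spenders": (40.0, 50.0),
    "occasional_shoppers": (5.0, 4.0),
}

# Hypothetical actions, one per cluster, applied after assignment.
ACTIONS = {
    "frequent_big_spenders": "invite to loyalty event",
    "occasional_shoppers": "send discount coupon",
}

new_customer = (30.0, 35.0)
cluster = assign_cluster(new_customer, CENTROIDS)
action = ACTIONS[cluster]
```

    In practice the features would usually be put on a comparable scale before measuring distance, so that no single attribute dominates the assignment.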

  • 7/28/2019 IBM Exploiting Your Data Warehouse

    23/49


    Classification

    Prediction technique to classify individuals by outcome

    Classify by a categorical class variable (e.g., YES-NO-MAYBE response)

    Understand the most important factors (predictors) leading to each outcome

    Modeling

    Create a model to classify individuals according to expected outcome

    Design business action based on most important predictors

    Scoring

    Apply model to predict the outcome for each individual

    New prospects (expected behavior)

    Existing individuals (changes in behavior)

    Identify target individuals for business action

    Common uses

    Customer attrition (churn), Part failure
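    The modeling and scoring steps can be illustrated with a deliberately simple one-rule classifier: for each value of a single predictor, predict the outcome seen most often in training. This is a stand-in chosen for brevity, not the classification algorithm DWE uses. The column names `contract` and `churn` and the training rows are hypothetical.

```python
from collections import Counter, defaultdict


def train_one_rule(rows, predictor, target):
    """Map each predictor value to the most frequent target class seen with it."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[predictor]][row[target]] += 1
    return {value: seen.most_common(1)[0][0] for value, seen in counts.items()}


def classify(model, row, predictor, default=None):
    """Score one record; fall back to `default` for predictor values never seen."""
    return model.get(row[predictor], default)


# Hypothetical training data for a churn model.
TRAINING = [
    {"contract": "monthly", "churn": "YES"},
    {"contract": "monthly", "churn": "YES"},
    {"contract": "monthly", "churn": "NO"},
    {"contract": "annual", "churn": "NO"},
    {"contract": "annual", "churn": "NO"},
    {"contract": "annual", "churn": "YES"},
]

model = train_one_rule(TRAINING, "contract", "churn")
prediction = classify(model, {"contract": "monthly"}, "contract")
```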

  • 7/28/2019 IBM Exploiting Your Data Warehouse

    24/49

    Regression

    Set of predictive techniques to predict a dependent variable

    Predict continuous value or binary numeric value

    Continuous: e.g., revenue (prediction represents amount of revenue)

    Binary: e.g., 0=No, 1=Yes (prediction represents probability of Yes)

    Understand the most important predictors of the dependent variable

    Transform regression, linear regression, polynomial regression

    Modeling

    Create a model to predict the dependent variable

    Design business action (e.g., predict likelihood of default for a loan application, in real time)

    Scoring

    Apply model to generate a prediction for each individual (e.g., probability of part failure)

    Identify target individuals for business action

    Common uses

    Predict revenue/cost/profitability, Predict risk of loan default
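    For the continuous case, the simplest of the techniques listed above is linear regression with one predictor. The sketch below fits it by ordinary least squares in plain Python. It only illustrates the model-then-score pattern and is not DWE's implementation; the function names and the visits and revenue figures are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Score one individual with a fitted (slope, intercept) model."""
    slope, intercept = model
    return intercept + slope * x


# Hypothetical training data: store visits per year and yearly revenue per customer.
visits = [1, 2, 3, 4, 5]
revenue = [150, 250, 350, 450, 550]

model = fit_line(visits, revenue)
expected_revenue = predict(model, 6)
```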

  • 7/28/2019 IBM Exploiting Your Data Warehouse

    25/49

    The Data Mining Process

    (The data mining process diagram is repeated, with the ETL (data preparation) phase highlighted.)

  • 7/28/2019 IBM Exploiting Your Data Warehouse

    26/49


    Data exploration

    DWE enables you to explore the data.

    Check data quality (prior to performing ETL for data preparation) and gain a general understanding of the data

    Design Studio provides four tools to inspect data:

    Table sampling

    Univariate distributions

    Bivariate distributions

    Multivariate distributions

    All these tools are accessible by right-clicking on a table/view/alias/nickname in the database explorer:

    -> Data for table sampling/editing

    -> Value Distributions for multivariate/univariate/bivariate distributions
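    Outside Design Studio, the same kinds of checks can be approximated with simple counts. The sketch below shows what a univariate distribution, a bivariate distribution, and a basic data quality check compute. It is only an illustration of the idea, not a Design Studio API; the function names, column names, and sample rows are hypothetical.

```python
from collections import Counter


def univariate(rows, column):
    """Count how often each value of one column occurs."""
    return Counter(row[column] for row in rows)


def bivariate(rows, col_a, col_b):
    """Count how often each pair of values of two columns occurs together."""
    return Counter((row[col_a], row[col_b]) for row in rows)


def missing_share(rows, column):
    """Fraction of rows where the column is missing (None), a basic quality check."""
    return sum(1 for row in rows if row.get(column) is None) / len(rows)


# Hypothetical sample of a customer table.
ROWS = [
    {"gender": "F", "segment": "VIP", "age": 41},
    {"gender": "F", "segment": "regular", "age": None},
    {"gender": "M", "segment": "regular", "age": 35},
    {"gender": "F", "segment": "VIP", "age": 52},
]

by_gender = univariate(ROWS, "gender")
by_gender_segment = bivariate(ROWS, "gender", "segment")
age_missing = missing_share(ROWS, "age")
```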

  • 7/28/2019 IBM Exploiting Your Data Warehouse

    27/49

    The Data Mining Process

    (The data mining process diagram is repeated, with the DEPLOY phase highlighted.)

  • 7/28/2019 IBM Exploiting Your Data Warehouse

    28/49

    Leveraging Mining and Alphablox: DWE Miningblox

    Create web applications that provide access to DWE Data Mining

    Extends the DB2 Alphablox API with mining-specific functionality.

    With Miningblox, you can perform the following tasks:

    Selecting input data

    Processing input data

    Displaying mining results graphically in a Web browser, for example, the characteristics of a customer segment

    Administering or managing mining runs

    Typically, a web application using Miningblox tags might be integrated in a business application or an intranet portal.

  • 7/28/2019 IBM Exploiting Your Data Warehouse

    29/49

    Why use Miningblox?

    Provide access to Data Mining for a group of business analysts.

    Create a Miningblox web application that provides access to mining functionality through the Web browser, with no need to install software on the client machines

    Analysts can execute mining runs and view results in a customized web application without extensive knowledge about mining software.

    With the Miningblox Application wizard in the DWE Design Studio, you can easily create Web applications by selecting sample templates, or you can extend Alphablox applications with mining functionality.

  • 7/28/2019 IBM Exploiting Your Data Warehouse

    30/49


    Deployment through Alphablox application example

    MBA application console

  • 7/28/2019 IBM Exploiting Your Data Warehouse

    31/49


    Deployment through Alphablox application example

    MBA execution


  • 7/28/2019 IBM Exploiting Your Data Warehouse

    32/49


    Deployment through Alphablox application example

    MBA completion


  • 7/28/2019 IBM Exploiting Your Data Warehouse

    33/49


    Deployment through Alphablox application example

    MBA results report

  • 7/28/2019 IBM Exploiting Your Data Warehouse

    34/49


    Case Study: Retail Department Store

    Analytics with Data Mining and Alphablox

    David Cope

    EDW Architect Asia Pacific


  • 7/28/2019 IBM Exploiting Your Data Warehouse

    35/49


    Retail Department Store Chain

    Business requirements

    Perform a data mining POC (really a pilot project) to support the original DWE decision, ensure success, and highlight DWE capabilities for further uptake

    Define business problem

    Boost storewide sales (across other departments) based on women's shoes

    Define analytical approach and ETL procedure

    Extract all transactions of customers who have purchased women's shoes

    Transform transactional data into one record per customer, for customer segmentation

    Perform market basket analysis (MBA) for high-potential customers who have purchased women's shoes
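    The transformation step, from transactional data to one record per customer, is an aggregation over each customer's transactions. The sketch below shows one way to do it in plain Python. The field names (`customer_id`, `date`, `amount`, `is_return`) and the derived attributes are hypothetical, since the deck does not list the columns used in the engagement.

```python
from collections import defaultdict


def summarize_customers(transactions):
    """Collapse transaction rows into one profile record per customer."""
    days = defaultdict(set)
    profiles = defaultdict(lambda: {"purchases": 0, "total_spend": 0.0, "returns": 0})
    for txn in transactions:
        customer = txn["customer_id"]
        days[customer].add(txn["date"])
        if txn["is_return"]:
            profiles[customer]["returns"] += 1
        else:
            profiles[customer]["purchases"] += 1
            profiles[customer]["total_spend"] += txn["amount"]
    # Add the number of distinct shopping days and return plain dicts.
    return {
        customer: {"visit_days": len(days[customer]), **profile}
        for customer, profile in profiles.items()
    }


# Hypothetical transactions for two customers.
TRANSACTIONS = [
    {"customer_id": 1, "date": "2007-01-05", "amount": 80.0, "is_return": False},
    {"customer_id": 1, "date": "2007-01-05", "amount": 20.0, "is_return": False},
    {"customer_id": 1, "date": "2007-02-10", "amount": 80.0, "is_return": True},
    {"customer_id": 2, "date": "2007-03-01", "amount": 45.0, "is_return": False},
]

customer_records = summarize_customers(TRANSACTIONS)
```

    Each resulting record can then serve as one input row for customer segmentation.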

    Challenges

    Engagement sponsored by IT with limited access to business users (LOB)


  • 7/28/2019 IBM Exploiting Your Data Warehouse

    36/49

    Solution Overview

    (Architecture diagram labels: analytical dashboard; Alphablox with Cubing Engine and Data Mining API; data mining visualizer / Alphablox; heat maps / other visualization; DB2 Data Warehouse; mining models & services: clustering, associations & sequences, scoring services.)

    Prepare data for mining by:

    Pulling transactions for women's shoe customers

    Creating data for customer segmentation

    Use DB2 Mining to perform:

    Clustering: identify high-potential customer segments

    Market basket analysis for high-potential segments: identify associated items, identify next-most-likely purchases

    Deploy mining results in Alphablox:

    Integrate data mining information into the dashboard and as part of the guided analysis

    Build a dashboard in Alphablox:

    Provide critical information and metrics in an Alphablox dashboard to merchandising and marketing

    Integrate powerful visualization to make it easier to identify problem areas

  • 7/28/2019 IBM Exploiting Your Data Warehouse

    37/49

    Business Scenario for Mining

    Business requirements for POC

    Focus on customers who have purchased women's shoes in the past 12 months

    Boost storewide sales (across other departments) based on women's shoes

    Increase wallet share from high-potential customers

    Business questions to be answered

    What do my women's shoes customers look like?

    Which of these customers should I target in a promotion?

    Which products should I use for the promotion?

    Which products should I replenish in anticipation of a promotion?

    How can I improve customer loyalty?

    What is the most likely item that a women's shoes customer will purchase next?

  • 7/28/2019 IBM Exploiting Your Data Warehouse

    38/49

    Step 1: Identify High-Potential Shoe Customers

  • 7/28/2019 IBM Exploiting Your Data Warehouse

    39/49

    Result: 16 Distinct Clusters Created

  • 7/28/2019 IBM Exploiting Your Data Warehouse

    40/49

    Cluster 1: Those who Act Like VIPs

    (Cluster profile callouts: VIPs; frequent shoppers; big spenders; active shoppers; respond to discounts; high returns.)

    High Potential Customers!

  • 7/28/2019 IBM Exploiting Your Data Warehouse

    41/49

    Cluster 6: Frequent Good Shoppers

    (Cluster profile callouts: shop here 30 days/yr; above-average purchases; above-average spending; respond to discounts; average returns.)

    High Potential Customers!

  • 7/28/2019 IBM Exploiting Your Data Warehouse

    42/49

    Step 2: Identify Associated Items for Clusters 1 & 6

    Extracted transactions for those clusters of customers

    Performed market basket analysis and interpreted results

    Associations (items purchased together in one visit)

  • 7/28/2019 IBM Exploiting Your Data Warehouse

    43/49

    Identify Items Purchased Together for Clusters 1 & 6

  • 7/28/2019 IBM Exploiting Your Data Warehouse

    44/49

    Results: Associations for Clusters 1 & 6

  • 7/28/2019 IBM Exploiting Your Data Warehouse

    45/49

    Step 3: Identify Next Likely Purchase for Clusters 1 & 6

    Extracted transactions for those clusters of customers

    Performed market basket analysis and interpreted results

    Sequences (next most likely purchase in a future visit)

  • 7/28/2019 IBM Exploiting Your Data Warehouse

    46/49

    Identify Next Likely Purchases for Clusters 1 & 6

  • 7/28/2019 IBM Exploiting Your Data Warehouse

    47/49

    Results: Sequences for Customers in Clusters 1 & 6

  • 7/28/2019 IBM Exploiting Your Data Warehouse

    48/49


    Results and Future Ideas

    Deployment of customer segmentation and MBA

    End-user application with Alphablox

    Create & refresh mining models

    Identify high-potential customer segments

    Refresh assignment of each customer to best-fit cluster

    Target selected customer segments for promotions

    Batch scoring to identify best offer(s) for each customer/segment

    Merchandising now has a view of their customers, not just products

    Future ideas

    Score a customer at checkout register in real time

    MBA scoring (associations, sequences)

    Focused MBA scoring for known customers, based on best-fit cluster

    Make an offer to induce customers to visit other departments before leaving the store


  • 7/28/2019 IBM Exploiting Your Data Warehouse

    49/49
