© Copyright 2000-2016 TIBCO Software Inc.

Mike Alperin

September, 2016

Machine Learning 101

Agenda

• What is Machine Learning?

• Decision Tree Models

• Customer Analytics Examples

• Manufacturing Examples

• Fraud Examples

• Machine Learning on the TIBCO Community

What is Machine Learning?

Machine Learning

Machine learning is a method of data analysis that automates analytical model building. Using algorithms that iteratively learn from data, machine learning allows computers to find hidden insights without being explicitly programmed where to look.

Enabled by exponentially increasing compute power – doubling every 2 years

Why use machine learning algorithms?

• Good Results
  • Machine learning algorithms + Big Data sets can produce models that accurately fit complex data patterns.
• Can make predictions for complex processes & systems
  • Can handle systems with hundreds or thousands of variables
• Easy to use / Simple user interface
  • Computer algorithm does the heavy lifting
  • Results presented with easy-to-understand visualizations

Types of Machine Learning

• Supervised – Solve known problems
  • Build a model that predicts something
  • What factors are driving fraud or customer behavior or manufacturing defects?
  • Decision Trees, Random Forest, Gradient Boosting Machine
• Unsupervised – Identify new patterns, detect anomalies
  • Are there new fraud clusters or buying patterns or failure modes emerging?
  • Clustering, Principal Components, Neural Networks (e.g. auto-encoders), single-class Support Vector Machines
• Optimization – Support decision-making
  • Find the best solution even when there are complex constraints
  • What is the optimum route to take or allocation of resources or equipment maintenance schedule?
  • Genetic Algorithm

Use Cases that leverage Machine Learning

• Customer Analytics – Prediction of customer behavior: customer segmentation, customer churn, cross-sell/up-sell, propensity
• Fraud & Financial crime – Money laundering, credit card fraud, medical fraud, insurance fraud
• Manufacturing – Optimization of manufacturing equipment, processes and product yield
• Energy – Completions optimization, blend optimization, predictive maintenance
• Transportation & Logistics – Routing optimization, fuel efficiency, predictive maintenance and warehouse distribution / space optimization

Advanced Analytics and Big Data Tools

Many more ….

Decision Trees

Decision Tree – Titanic Survival Rate

[Figure: decision tree of Titanic passenger survival, including a split on family size. Source: Wikipedia]

Classical Statistics – Fit parameters to a well-defined model

Decision Tree – Product Pass / Fail by Process & Equipment

Automobile Paint Process

[Figure: decision tree classifying painted parts as Good Product or Bad Product. Recoverable splits: Clearcoat Bake Temperature (>= 132 C vs. < 132 C), Sanding Station (1, 2, 4 vs. 3), Basecoat Thickness. Labeled defect: Peeling Clearcoat.]

Decision Tree – Training and Test Data Sets
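
The idea of fitting on a training set and checking on a held-out test set can be sketched in a few lines. This is only an illustration under stated assumptions: the data is synthetic, the 132 C threshold is borrowed from the paint example above, and the "tree" is a single split (a stump) written by hand rather than a call into any TIBCO or third-party library.

```python
import random


def fit_stump(rows):
    """Pick the single threshold split on x with the fewest training errors.

    rows: list of (x, label) pairs, label in {0, 1}.
    Returns (threshold, label_below, label_at_or_above).
    """
    best = None
    for t in sorted({x for x, _ in rows}):
        for below, above in ((0, 1), (1, 0)):
            errors = sum((above if x >= t else below) != y for x, y in rows)
            if best is None or errors < best[0]:
                best = (errors, t, below, above)
    return best[1:]


def predict(stump, x):
    t, below, above = stump
    return above if x >= t else below


def accuracy(stump, rows):
    return sum(predict(stump, x) == y for x, y in rows) / len(rows)


# Synthetic, hypothetical data: bake temperature in C, label 1 = good product.
rng = random.Random(0)
data = []
for _ in range(200):
    temp = rng.uniform(120, 145)
    good = 1 if temp >= 132 else 0
    if rng.random() < 0.1:  # flip 10% of labels to simulate noise
        good = 1 - good
    data.append((temp, good))

# Hold out 30% of the rows; the model never sees them during fitting.
rng.shuffle(data)
split = int(0.7 * len(data))
train, test = data[:split], data[split:]

stump = fit_stump(train)
train_acc = accuracy(stump, train)
test_acc = accuracy(stump, test)
print(f"threshold={stump[0]:.1f} train={train_acc:.2f} test={test_acc:.2f}")
```

A large gap between training and test accuracy would suggest the model is fitting noise rather than a repeatable pattern.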

Ensemble Tree Algorithms

• Random Forest, Gradient Boosting Machine (GBM)

• Method – Average many simple trees

• Sample the data and fit a simple tree

• Re-sample the data, up-weighting the observations that weren’t fitted well by the previous model (this is the boosting step; Random Forest instead draws each sample independently)

• Continue adding trees until the fit is good

• Save all the trees and average them

• Better fit + prediction than single trees
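
A minimal sketch of the "keep adding simple trees" idea follows. It is not the slide's exact recipe: instead of re-sampling with up-weighting, it uses the residual-fitting form of gradient boosting for squared error, where each new one-split tree is fitted to what the current ensemble still gets wrong, and the final prediction is a shrunken sum of trees rather than a plain average. The data, the `learning_rate` value and all function names are illustrative assumptions.

```python
def fit_stump(xs, targets):
    """Least-squares regression stump: one threshold, one mean per side."""
    best = None
    for t in sorted(set(xs))[1:]:  # skip the minimum so the left side is never empty
        left = [r for x, r in zip(xs, targets) if x < t]
        right = [r for x, r in zip(xs, targets) if x >= t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    return best[1:]


def stump_predict(stump, x):
    t, left_mean, right_mean = stump
    return left_mean if x < t else right_mean


def boost(xs, ys, n_trees, learning_rate=0.3):
    """Fit n_trees stumps, each one to the residuals of the ensemble so far."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x)
                 for p, x in zip(preds, xs)]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)


def mse(model, xs, ys):
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical nonlinear target that a single split cannot fit well.
xs = [i / 10 for i in range(40)]
ys = [(x - 2.0) ** 2 for x in xs]

mse_one = mse(boost(xs, ys, n_trees=1), xs, ys)
mse_many = mse(boost(xs, ys, n_trees=50), xs, ys)
print(f"MSE with 1 tree: {mse_one:.3f}, with 50 trees: {mse_many:.3f}")
```

The errors here are measured on the training data only; in practice the number of trees would be chosen against a held-out test set.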

Customer Analytics Examples

Customer & Marketing Analytics

Consumer Analytics
• Segmentation
• Propensity
• Affinity & Association
• Social: Sentiment & Intent
• Churn
• Loyalty
• Cross-sell / Up-sell
• Test & Learn (A|B testing)
• Online Analytics (Path, Cart Abandonment, …)

Market Analytics
• Pricing
• Promotion
• Campaign Effectiveness
• Forecasting
• Market Mix
• Media Attribution

Store & Distribution Analytics
• Store Clustering; geospatial modeling
• Store Performance
• Forecasting
• Effects: Price, Promotion
• Distribution: Pick, Pack, Ship

Customer Lifecycle
• Customer Acquisition
• Customer Retention
• Relationship Growth

Data sources
• PoS, Panel
• Loyalty Data
• Market (Syndicated) Data
• Store and DC Data

Customer Segmentation

• Top Shopper: 27% of customers & 35% of revenues. Broad purchase behavior
• Budget Minded: 34% of customers & 29% of revenues. Highly focused on core building categories
• Outdoor Plus: 15% of customers & 16% of revenues. Mainly outdoor, but other spending
• Gardener: 10% of customers & 5% of revenues. Primarily garden
• Seasonal Shopper: 11% of customers & 12% of revenues. Very “event” oriented
• Pool Customer: 3% of customers & 4% of revenues. Very focused on pool and patio categories

Segmentation - Cluster Analysis
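
As a rough illustration of how cluster analysis can produce segments like these, here is a small k-means sketch. The deck does not say which clustering algorithm was used, so k-means is an assumption, as are the two spend features, the six customers and the choice of the first two points as starting centroids.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
    return dists.index(min(dists))


def kmeans(points, centroids, max_iter=20):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    centroids = list(centroids)
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        new_centroids = [
            tuple(sum(v) / len(cluster) for v in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
        if new_centroids == centroids:  # assignments have stopped changing
            break
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Hypothetical customers: (garden spend, building-materials spend) per year.
customers = [
    (90.0, 10.0), (80.0, 20.0), (85.0, 5.0),   # mostly garden
    (10.0, 95.0), (20.0, 80.0), (15.0, 90.0),  # mostly building
]

# Starting centroids are simply the first two customers (an arbitrary choice).
centroids, labels = kmeans(customers, customers[:2])
print(labels)
print(centroids)
```

Each resulting cluster can then be profiled by its share of customers and revenue, as in the segment summary above.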

Propensity to Buy – Customer Success Story

Objectives:
• Select most important “Response Products” to highlight in 2015 Holiday season direct marketing
• Identify and quantify predictive significance of “Driver Products” based on historical data from 2014 sales
• Build relevant campaigns for as many people as possible

Propensity to buy models

Results
• Same-year repeat visits are 3x higher for customers targeted in the campaign
• Average order value is much higher
• Year-over-year repeat visitors doubled

Telco Machine Learning Churn Model

Predicted Prob(attrition) = f (X, b)

• Y variable

• Attrition (Y/N over time period)

• X variables

• How long a member

• Website interactions - section

• Prior spend

• Time since last interaction

• Experian: demog, …

• f function

• Additive Model

• Random Forest, Gradient Boosting
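
To make the notation Prob(attrition) = f(X, b) concrete, the sketch below uses one possible additive form: a weighted sum of the X variables passed through a logistic function. The logistic link, the feature names and every coefficient value are assumptions for illustration only; the deck does not give the fitted model, and a Random Forest or Gradient Boosting model would replace `f` with a tree ensemble.

```python
import math

# Hypothetical coefficients b (not from the deck): negative values lower the
# predicted attrition probability, positive values raise it.
COEFFS = {
    "intercept": -1.0,
    "tenure_years": -0.4,
    "days_since_last_interaction": 0.02,
    "prior_spend_hundreds": -0.1,
}


def attrition_probability(x, b=COEFFS):
    """Additive model: logistic function of intercept + sum of b_i * x_i."""
    score = b["intercept"] + sum(b[name] * value for name, value in x.items())
    return 1.0 / (1.0 + math.exp(-score))


loyal = {"tenure_years": 5.0, "days_since_last_interaction": 3.0,
         "prior_spend_hundreds": 8.0}
at_risk = {"tenure_years": 0.5, "days_since_last_interaction": 90.0,
           "prior_spend_hundreds": 1.0}

p_loyal = attrition_probability(loyal)
p_at_risk = attrition_probability(at_risk)
print(f"loyal: {p_loyal:.3f}, at risk: {p_at_risk:.3f}")
```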

[Figure: chart with variable names redacted]

Attrition and Value Models

Call Center Real-time Alert Actions

Real-Time Customer Interactions / Offers

No Login = No Customer History => Offer based on Product Association

Sarah Login = Sarah’s Customer History => Offer based on Propensity Model scored for Sarah
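
The two routing rules above could look roughly like this in code. The lookup tables, product names, scores and the `choose_offer` function are all hypothetical stand-ins; in a real deployment the scores would come from the propensity model and the associations from a product association analysis.

```python
# Hypothetical lookup tables standing in for model outputs.
PRODUCT_ASSOCIATIONS = {"garden hose": "sprinkler", "paint": "paint brushes"}
PROPENSITY_SCORES = {
    "sarah": {"patio set": 0.62, "sprinkler": 0.18, "paint brushes": 0.07},
}
DEFAULT_OFFER = "seasonal promotion"


def choose_offer(customer_id, viewed_product):
    """Known customer: highest-propensity product. Otherwise: product association."""
    scores = PROPENSITY_SCORES.get(customer_id) if customer_id else None
    if scores:
        return max(scores, key=scores.get)
    return PRODUCT_ASSOCIATIONS.get(viewed_product, DEFAULT_OFFER)


anonymous_offer = choose_offer(None, "garden hose")   # no login, no history
sarah_offer = choose_offer("sarah", "garden hose")    # logged in as Sarah
print(anonymous_offer, "|", sarah_offer)
```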

Manufacturing Examples

Correlate Product or Equipment Results to Process & Supplier Data

• Supplier - Incoming Materials and Components
  • Measured electrical, chemical, physical characteristics
  • batch-id, lot_id
• Manufacturing Process
  • Physical, chemical or electrical measurements
  • WIP / MES: track-in / track-out date, process equipment id, recipe, operator, …
  • Process equipment sensor data
  • Equipment Maintenance logs
  • Defect Inspections
  • Cost of labor, materials, machines and facilities
• Product Quality and Reliability Test
  • Measured product functional and performance characteristics
  • Accelerated life test results
• Product Field Returns
  • Failure mode, unit / batch / lot ID
  • Failure analysis root cause results
  • Warranty / Repair claim, call center and cost – structured & unstructured

Machine Learning to Predict Equipment or Product Fails

• Problem
  • Product & Equipment problems are difficult to diagnose accurately for complex manufacturing processes
  • Big Data problem – millions of units, hundreds / thousands of predictors
  • Response: Product, Process or Equipment Fail data
  • Predictors: in-process equipment, process and product measurements or attributes
• Value
  • Being used by customers to find previously undetected problems. Reduces time-to-market and increases profit.
• Method
  • GBM analysis template to identify significant predictors, interactions and nonlinearities
  • For large datasets, hybrid data access is used to perform the variable reduction step in-DB
  • Simple interface – easy for a business analyst to run and interpret results

[Figure: GBM results for semiconductor yield as a function of in-process equipment & product measurements]

Real-time Predictive Analytics for Process Cost reduction

Goal: Scrap parts as early as possible to reduce costs in a manufacturing process.

Question: When to scrap a part in Station 1 instead of sending it to Station 2?

Cost before Station 1: 9€

Station 1: 7€ (Scrap?)

Station 2: 13€ (Scrap?)

Total cost: 29€ (or more)
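
One way to turn the scrap question into a rule is an expected-value comparison, sketched below. The station costs are taken from the slide; the sale value of a good part (`PART_VALUE`) and the decision rule itself are assumptions, and the failure probability is presumed to come from a predictive model scored after Station 1. Costs already spent are treated as sunk and ignored.

```python
# Costs from the slide, in euros.
COST_BEFORE = 9.0
COST_STATION_1 = 7.0
COST_STATION_2 = 13.0

# Hypothetical value of a finished good part; not given in the deck.
PART_VALUE = 40.0


def should_scrap_after_station_1(p_fail, station_2_cost=COST_STATION_2,
                                 part_value=PART_VALUE):
    """Scrap if continuing is expected to lose money.

    Continuing costs station_2_cost and yields part_value only if the part
    turns out good, which happens with probability (1 - p_fail).
    """
    expected_gain = (1.0 - p_fail) * part_value - station_2_cost
    return expected_gain < 0.0


total_cost = COST_BEFORE + COST_STATION_1 + COST_STATION_2
print(f"total cost per finished part: {total_cost:.0f} euros")
for p in (0.1, 0.5, 0.9):
    print(f"p_fail={p:.1f} -> scrap: {should_scrap_after_station_1(p)}")
```

Under these assumptions the break-even failure probability is 1 - 13/40; a different part value or additional downstream costs would move that threshold.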

TIBCO Spotfire with H2O Integration

Advanced Analytics (“Scrap parts as early as possible!”)

Deploy real-time model: TIBCO Live Datamart & Streambase

Operational Intelligence (“Monitor the manufacturing process and change rules in real time!”)

Live Datamart Desktop Client

Fraud Examples

Step 1 – Catching New Fraud Like Old Fraud – Supervised Learning

Model to predict credit card fraud based on customer information: Variable Importance chart

Sort existing transactions by Probability of Fraud

Step 2 - Find unusual transactions - Unsupervised learning

Fraudulent

Good

Algorithm examples:

• Principal Component Analysis

• Auto-encoder Neural Network

• Single-class Support Vector Machine

• Clustering (e.g. K-means, Hierarchical)

Sort existing transactions by Oddity

Prioritize investigators’ work
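
The "sort by oddity" step can be sketched with a deliberately simple score. This is a stand-in, not one of the algorithms listed above: it measures how many standard deviations each transaction sits from the average on each feature and combines them into one distance. The transactions, feature choices and IDs are made up.

```python
import statistics


def oddity_scores(rows):
    """Distance from the mean in standard-deviation units, combined across features."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    # Fall back to 1.0 for a constant column to avoid dividing by zero.
    stdevs = [statistics.pstdev(c) or 1.0 for c in columns]
    return [
        sum(((v - m) / s) ** 2 for v, m, s in zip(row, means, stdevs)) ** 0.5
        for row in rows
    ]


# Hypothetical transactions: id -> (amount, hour of day).
transactions = {
    "t1": (25.0, 13), "t2": (40.0, 15), "t3": (32.0, 11),
    "t4": (28.0, 14), "t5": (950.0, 3), "t6": (35.0, 12),
}

ids = list(transactions)
scores = oddity_scores([transactions[i] for i in ids])

# Highest oddity first, so investigators start with the most unusual cases.
ranked = sorted(zip(scores, ids), reverse=True)
for score, tx_id in ranked:
    print(f"{tx_id}: {score:.2f}")
```

Unlike the supervised model in Step 1, this needs no fraud labels, which is why it can surface patterns that have not been seen before.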

Step 3 – Apply models in real time with Streambase

Deploy models in real-time with a click from Spotfire

Monitor transactions in real time with LiveView

Machine Learning on the TIBCO Community

Learn & Do More: Machine Learning on the TIBCO Community

Wiki page: https://community.tibco.com/wiki/machine-learning-tibco-spotfire-and-streambase

Component Exchange: https://community.tibco.com/exchange/tags/machine-learning-12816
• Data functions
• Accelerators
• Templates

Thank You