machine learning in big data

Machine Learning in Big Data – Look Forward or Be Left Behind

V. William Porto, Hadoop Summit Dublin 2016

Overview of RedPoint Global

RedPoint Global Inc. 2016 Confidential

Launched in 2006

Founded and staffed by industry veterans

Headquarters: Wellesley, Massachusetts

Offices in US, UK, Australia, Philippines

Global customer base

Serves most major industries

Overview of RedPoint Global


• MAGIC QUADRANT – Data Quality

• MAGIC QUADRANT – Integrated Marketing Management

• MAGIC QUADRANT – Multichannel Campaign Management

• MAGIC QUADRANT – Digital Marketing Hubs

• FORRESTER WAVE™ – Cross-channel Campaign Management

• FORRESTER WAVE™ – Data Quality Solutions


With apologies to Gary Larson

Hadoop


Machine Learning – why bother?

“If you have always done it that way, it is probably wrong.” – Charles Kettering


Machine Learning – keeping ahead of the curve

• Three basic tenets for success in today’s world

• Prediction - you need to learn and use what you’ve learned

• Optimization - the world is a dynamic place

• Automation - because people don’t scale well


Machine Learning – what is it really all about?

• Learning vs. instruction

• Humans learn instinctively – computers not so much

• Intelligent Systems

• Memory

• Prediction (modeling)

• Assessment

• Feedback

• Adaptation


Data Modeling – what, why, how

• Regression – what happened in the past

• Prediction – what will happen in the future

“Prediction is very difficult – especially if it’s about the future”

– Niels Bohr


Data Modeling – what, why, how

The wide world of data modeling

• Supervised models – you have historical data and known correlated outputs (truth)

• Unsupervised models – you have historical data, but may not have (or trust) the associated outputs


Decision Trees

Major assumption: the world is discrete

• fast, easy to understand, no linearity assumptions

• ‘human time’ required; trees can become unbalanced and/or large
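The Python below is a minimal sketch of how a tree chooses a split. It fits a one-level decision tree (a "stump") by minimizing Gini impurity, which is one common split criterion and is assumed here. The features (days since last purchase, loyalty membership) and the toy rows are hypothetical. The fitted rule is a single readable threshold, and that is why trees are easy to understand. A full tree repeats this split recursively, and that repetition is where large or unbalanced trees come from.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(rows, labels):
    """Return (weighted_impurity, feature_index, threshold) of the best split."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue  # a split must put samples on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, threshold)
    return best


def majority(labels):
    return 1 if 2 * sum(labels) >= len(labels) else 0


def fit_stump(rows, labels):
    _, f, t = best_split(rows, labels)
    left = [y for row, y in zip(rows, labels) if row[f] <= t]
    right = [y for row, y in zip(rows, labels) if row[f] > t]
    return f, t, majority(left), majority(right)


def predict(stump, row):
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label


# Hypothetical rows: (days_since_last_purchase, loyalty_member); label 1 = retained
rows = [(5, 1), (10, 1), (40, 0), (60, 0)]
labels = [1, 1, 0, 0]
stump = fit_stump(rows, labels)
print(stump)  # (0, 10, 1, 0): "if days <= 10 then retained, else not"
```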


Standard Linear Models

Assumption: the world is linear

• the real world really isn’t linear

• not all errors are equal

• easy to get misleading results

Which line is best?
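This sketch makes "easy to get misleading results" concrete with ordinary least squares, using the closed form for a single input. Squared error gives a single large residual a very heavy weight, so one outlier drags the whole fitted line. The data points are made up for illustration.

```python
def ols_fit(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


xs = [0, 1, 2, 3, 4]
clean = [0, 1, 2, 3, 4]
with_outlier = [0, 1, 2, 3, 20]  # one bad measurement at x = 4

print(ols_fit(xs, clean))         # (1.0, 0.0)
print(ols_fit(xs, with_outlier))  # slope about 4.2, intercept about -3.2
```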


Generalized ‘Non-Linear’ Models

Assumptions:

• underlying functional mapping is known

• all errors are equal

• data is ‘well-conditioned’

• ‘standard’ error distribution

Common model forms:

• Polynomials

• Exponentials (e.g., Gaussian, Poisson)

• Piece-wise linear
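Suppose the functional form is assumed known, for example a quadratic. The model is then still linear in its coefficients, so least squares can fit it. The sketch below solves the normal equations with Gaussian elimination in plain Python, on synthetic data. It also hints at why "well-conditioned" data matters: the normal equations for high-degree polynomials quickly become numerically ill-conditioned.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients [c0, c1, ..., c_degree] of sum(c_k * x**k)."""
    k = degree + 1
    # Normal equations (X^T X) c = X^T y with basis functions 1, x, ..., x**degree
    xtx = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return solve(xtx, xty)


xs = [0, 1, 2, 3, 4, 5]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]  # synthetic, noise-free quadratic
print(fit_polynomial(xs, ys, 2))  # approximately [1.0, 2.0, 3.0]
```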


Non-Linear Models

Assumption: data is representative

• ‘universal’ modeling tools

• fast execution

• no linearity assumptions

• lots of parameters, many techniques

• difficult to explain

Artificial Neural Network
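Here is a tiny illustration of the "no linearity assumptions" point. A 2-2-1 sigmoid network computes XOR, which no single straight line can separate. The weights are hand-set for illustration and were not trained. They also show the "difficult to explain" point, because the numbers alone say little about what the network does.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hand-set (NOT trained) weights for a 2-2-1 network:
# hidden unit 0 behaves like OR, hidden unit 1 like NAND, the output like AND
HIDDEN = [((20.0, 20.0), -10.0), ((-20.0, -20.0), 30.0)]
OUTPUT = ((20.0, 20.0), -30.0)


def forward(x):
    """One forward pass: weighted sums followed by sigmoid activations."""
    hidden = [sigmoid(w[0] * x[0] + w[1] * x[1] + b) for w, b in HIDDEN]
    (w0, w1), b = OUTPUT
    return sigmoid(w0 * hidden[0] + w1 * hidden[1] + b)


for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, round(forward(x)))  # rounded outputs 0, 1, 1, 0: the XOR pattern
```

In practice the weights come from training, for example by backpropagation, rather than by hand.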


User Story: Predict Retention / Attrition

Historical Behavioral Data

Columns: Customer Rating, Retention, Customer Name, Loyalty Member, Days Since Last Purchase, Immediate Relatives, Household Children, Customer ID, Latest Purchase Price, Latest Purchase Item ID, Region Code, Customer Capture Method, Customer Contact Code, Domicile

1 1 Allen, Geraldine yes 29 0 2 24160 211.39 B5 MW 2 6 St Louis, MO
1 1 Anderson, Harry no 48 0 3 19952 26.55 E12 NE 3 New York, NY
1 1 Andrews, Cynthia yes 63 1 0 13502 77.95 D7 NE 10 6 Hudson, NY
1 0 Andrews, Thomas Jr no 39 0 0 112050 0 A36 SW Los Angeles, CA
1 1 Appleton, Mary yes 53 2 3 11769 51.49 C101 NE D Bayside, Queens, NY
1 0 Ashbury, Jeffrey no 47 1 0 PC 17757 29.99 C62 C64 NE 124 New York, NY
1 1 Aston, Mrs. yes 18 1 0 PC 17757 29.99 C62 C64 NE 4 New York, NY
1 1 Barber, Ellen yes 26 0 2 19877 78.85 S 6
1 1 Barkley, Henry no 80 0 0 27042 30 A23 NE B Yorktown, PA
1 0 Baumann, David no 0 0 PC 17318 25.99 NE New York, NY
1 1 Bazzeno, Alice yes 32 0 1 11813 76.95 D15 C 8 34
1 0 Beattie, Mr. Samuel no 36 0 0 13050 75.29 C6 C A 11 Winnipeg, MN
1 1 Beckworth, June yes 47 1 1 11751 52.49 D35 NE 5 New York, NY
1 1 Behr, John no 26 0 0 111369 30 C148 NE 5 New York, NY
1 1 Biden, Roseanne yes 42 0 0 PC 17757 127.99 C 4
1 1 Bird, Ellen yes 29 0 0 PC 17483 18.95 C97 S 8
1 0 Birnbaum, Jason no 25 0 0 13905 26 C 148 San Francisco, CA


User Story: Predict Customer Retention / Attrition

Machine Learning Processing Chain - Training


User Story: Predict Retention / Attrition

Machine Learning Processing Chain - Prediction

• Reward predicted ‘retainees’ with targeted product offerings

• Give customers at risk of attrition special incentives to stay with the business
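The training and prediction chains can be sketched end to end in Python. In this sketch, logistic regression stands in for the classifier. That choice is an assumption, because the slides do not name the model. The two features (loyalty membership and days since last purchase, scaled by 1/100) and the toy rows are hypothetical. The routing step maps each prediction to one of the two treatments listed above.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(rows, labels, lr=0.5, epochs=2000):
    """Training chain: fit logistic-regression weights by batch gradient descent."""
    n, d = len(rows), len(rows[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def retention_probability(model, x):
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def choose_action(p_retain, threshold=0.5):
    """Prediction chain: route each customer to one of the two treatments."""
    if p_retain >= threshold:
        return "targeted product offering"
    return "retention incentive"


# Hypothetical training data: (loyalty_member, days_since_last_purchase / 100)
rows = [(1, 0.1), (1, 0.2), (1, 0.3), (0, 0.9), (0, 0.8), (0, 0.7)]
labels = [1, 1, 1, 0, 0, 0]  # 1 = retained, 0 = attrited
model = train_logistic(rows, labels)

for customer in [(1, 0.15), (0, 0.85)]:
    print(customer, choose_action(retention_probability(model, customer)))
```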


User Story: Accurate vs. Useful Prediction

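Sparse data + Least-Squares (Linear) Classifier

• Task: predict the chance of purchasing a sundry item

• Result: the ‘best’ model always predicts “none”

• Analysis: the LS algorithm treats all errors as equal, so when purchases are rare, predicting “none” every time minimizes the total squared error

Bread | Cake & Pie | Chocolate | Coffee | Cookie | Diesel | Juice & Smoothies | Lubricants | Milk | Other Bakery | Premium | Sandwich | Snack | Tea | Total Transaction | Total Revenue
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 3000
0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 2000
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 1800
0 | 0 | 0 | 0 | 0 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 4800
0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 100
0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1828
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13 | 16460
0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1000
0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1500
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 7 | 4600
0 | 0 | 0 | 0 | 0 | 11 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 11 | 19381.50
0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1860
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 3000
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 18 | 9838.82
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 22 | 11000
0 | 0 | 0 | 0 | 0 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 19 | 18225
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 500
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 800
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 7 | 7990
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 3820
0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 55 | 43230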
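The "accurate but useless" effect can be reproduced with the simplest linear model, an intercept-only least-squares fit, on a hypothetical sparse target: one purchase in 21 transactions. The squared-error optimum is the mean, which falls far below a 0.5 decision threshold. The model therefore always predicts "none" and still scores about 95% accuracy. An asymmetric cost, with hypothetical weights of 50 per missed purchase and 1 per false alarm, shows why that model is not useful.

```python
def least_squares_constant(ys):
    """The constant c minimizing sum((y - c)**2) is the mean of ys."""
    return sum(ys) / len(ys)


def accuracy(preds, ys):
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)


def weighted_cost(preds, ys, miss_cost, false_alarm_cost):
    """Asymmetric cost: missing a real purchase can cost more than a false alarm."""
    cost = 0.0
    for p, y in zip(preds, ys):
        if y == 1 and p == 0:
            cost += miss_cost
        elif y == 0 and p == 1:
            cost += false_alarm_cost
    return cost


# Hypothetical sparse target: 1 purchase of the sundry item in 21 transactions
ys = [0] * 20 + [1]
c = least_squares_constant(ys)  # 1/21, far below a 0.5 threshold
always_none = [1 if c >= 0.5 else 0] * len(ys)
always_buy = [1] * len(ys)

print(accuracy(always_none, ys))  # 20/21, about 0.952
print(weighted_cost(always_none, ys, miss_cost=50, false_alarm_cost=1))  # 50.0
print(weighted_cost(always_buy, ys, miss_cost=50, false_alarm_cost=1))   # 20.0
```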


Clustering/Segmentation – group think

Collaborative Filtering / Relationship Matrix
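This is a sketch of item-based collaborative filtering over a user × item relationship matrix. Cosine similarity between item columns is one common choice and is assumed here. The matrix itself is hypothetical. The recommendation is driven entirely by what similar purchase groups bought, which is the "group think" the slide title points at.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def item_similarities(matrix):
    """Cosine similarity between item columns of a user x item relationship matrix."""
    items = list(zip(*matrix))
    return [[cosine(a, b) for b in items] for a in items]


def recommend(matrix, user, top_n=1):
    """Score unseen items by summed similarity to the items the user already has."""
    sims = item_similarities(matrix)
    owned = [j for j, v in enumerate(matrix[user]) if v]
    scores = {
        j: sum(sims[j][k] for k in owned)
        for j, v in enumerate(matrix[user]) if not v
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical relationship matrix: rows = users, columns = items (1 = purchased)
matrix = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
]
print(recommend(matrix, user=3))  # [1]: item 1 co-occurs most with item 0
```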


Personalization – not really



Clustering/Segmentation

Similarity?

Customer | Browser | Gender | Age Sector | Income Sector | Married | Children | Homeowner | Recent Baby Clothes Purchase
George | IE9 | M | 0 | A | N | 0 | 1 | N
Carol | Chrome | F | 1 | B | Y | 1 | 0 | Y
Mary | IE9 | F | 0 | A | N | 1 | 0 | Y

Dist(George,Carol) = 8

Dist(George,Mary) = 4

Dist(Carol,Mary) = 4

Can you afford to target (George,Mary) the same way as (Carol,Mary)?
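The distances on this slide match a simple mismatch count (Hamming distance) over the eight attributes. The Python below reproduces them. It then uses hypothetical weights, counting a recent baby-clothes purchase five times as heavily, to show how weighting a single attribute changes which pair counts as "similar".

```python
# Attribute order (from the slide): browser, gender, age sector, income sector,
# married, children, homeowner, recent baby clothes purchase
CUSTOMERS = {
    "George": ("IE9", "M", "0", "A", "N", "0", "1", "N"),
    "Carol": ("Chrome", "F", "1", "B", "Y", "1", "0", "Y"),
    "Mary": ("IE9", "F", "0", "A", "N", "1", "0", "Y"),
}


def hamming(a, b):
    """Number of attributes on which two customers differ."""
    return sum(x != y for x, y in zip(a, b))


def weighted_mismatch(a, b, weights):
    """Like hamming, but each mismatched attribute costs its own weight."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y)


g, c, m = CUSTOMERS["George"], CUSTOMERS["Carol"], CUSTOMERS["Mary"]
print(hamming(g, c), hamming(g, m), hamming(c, m))  # 8 4 4

# Hypothetical weights: a recent baby-clothes purchase matters 5x more than the rest
weights = (1, 1, 1, 1, 1, 1, 1, 5)
print(weighted_mismatch(g, m, weights), weighted_mismatch(c, m, weights))  # 8 4
```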


Clustering/Segmentation

Basic Question – which one describes the data the best?

Raw data

How many clusters are there?

Two Clusters

Four Clusters

Six Clusters


Clustering/Segmentation with Statistics

• relatively simple

• data distribution assumptions

• initialization dependencies

[Figure panels, both axes 0–100: Raw Data; Ellipsoidal Clustering; K-Means Clustering]
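The following is a minimal sketch of Lloyd's k-means in plain Python. Its two steps are nearest-centroid assignment and a mean update. The caller passes the starting centroids explicitly, which makes the "initialization dependencies" point visible: different starting points can converge to different partitions. The points are synthetic.

```python
import math


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's k-means: alternate nearest-centroid assignment and mean update."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        updated = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else c
            for c, members in zip(centroids, clusters)
        ]
        if updated == centroids:  # converged: centroids stopped moving
            break
        centroids = updated
    return centroids, clusters


points = [(10, 10), (12, 11), (11, 13), (80, 80), (82, 79), (81, 83)]
centroids, clusters = kmeans(points, [(0, 0), (100, 100)])
print(centroids)  # approximately [(11.0, 11.33), (81.0, 80.67)]
```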


Clustering/Segmentation – data driven

• let the data speak for itself

• multiple data projection ‘views’

• important boundary relationships (“swing voters”)

Customer Demographics


User Story: Clustering / Segmentation

ML Clustering – Training

ML Clustering – Processing New Data
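After training, processing new data can be as simple as assigning each new record to its nearest trained centroid. The sketch assumes Euclidean distance, and its centroid values are hypothetical stand-ins for the output of an earlier training run.

```python
import math

# Hypothetical centroids produced by an earlier training run
CENTROIDS = [(11.0, 11.3), (81.0, 80.7)]


def assign_segment(point, centroids=CENTROIDS):
    """Return the index of the nearest trained centroid (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


new_customers = [(15, 12), (70, 90), (20, 5)]
print([assign_segment(p) for p in new_customers])  # [0, 1, 0]
```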


Model Selection – how to choose?

• Basic Model Type (prediction or segmentation): inputs + correlated outputs, or inputs only?

• Basic Questions: what to use for my problem? which parameters? is this the best choice? could I do better, and how?


Optimization – Evolving better solutions

• Simulated Evolution – fast, efficient search; always have a solution; arbitrary ‘evaluation’ functions; can start with existing solution(s)

• Variation – alter model type, parameters

• Assessment – how well does the model work?

• Selection – survival of the fittest
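The variation, assessment and selection loop can be sketched as a simple (mu + lambda) evolution strategy in Python. This is an illustrative choice, not necessarily the optimizer used in RedPoint's product. The task is hypothetical: fit a line under absolute error, which is not differentiable at zero. Seeding the run with existing solutions and keeping parents in the selection pool ("always have a solution") means the best score can never get worse.

```python
import random


def evolve(evaluate, initial, generations=200, children_per_parent=4, sigma=0.3, seed=0):
    """(mu + lambda) evolution: mutate, assess, keep the fittest (lower is better)."""
    rng = random.Random(seed)
    population = [list(p) for p in initial]  # can start from existing solution(s)
    mu = len(population)
    history = [min(evaluate(p) for p in population)]
    for _ in range(generations):
        # Variation: Gaussian perturbation of every parameter
        children = [
            [gene + rng.gauss(0.0, sigma) for gene in parent]
            for parent in population
            for _ in range(children_per_parent)
        ]
        # Assessment + selection: parents compete with children, best mu survive
        population = sorted(population + children, key=evaluate)[:mu]
        history.append(evaluate(population[0]))
    return population[0], history


# Hypothetical task: fit y = a*x + b under absolute error (not differentiable at 0)
xs = list(range(10))
ys = [2 * x + 1 for x in xs]


def abs_error(params):
    a, b = params
    return sum(abs(y - (a * x + b)) for x, y in zip(xs, ys))


best, history = evolve(abs_error, initial=[[0.0, 0.0], [1.0, -1.0], [-2.0, 3.0]])
print(best, history[0], history[-1])
```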


Evolutionary Optimization – Evaluation Function

• can use any measurable data

• no continuity assumptions

• no differentiability assumptions

• no symmetry assumptions

Prediction \ Reality (Truth) | Sunshine | Hurricane
Sunshine | 20 | -1000
Hurricane | 5 | 50
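An asymmetric evaluation function like this one can score a model directly. The sketch reads the slide's numbers as rows = prediction and columns = reality. That orientation is our interpretation of the layout. The 9-sunny, 1-hurricane test sequence is hypothetical. Always predicting sunshine is 90% accurate but scores far worse than always preparing for a hurricane.

```python
# Scores from the slide, read as (prediction, reality) -> score (our interpretation)
SCORE = {
    ("sunshine", "sunshine"): 20,
    ("sunshine", "hurricane"): -1000,
    ("hurricane", "sunshine"): 5,
    ("hurricane", "hurricane"): 50,
}


def total_score(predictions, truths):
    """Sum the asymmetric score of each (prediction, reality) pair; higher is better."""
    return sum(SCORE[(p, t)] for p, t in zip(predictions, truths))


truths = ["sunshine"] * 9 + ["hurricane"]
always_sunny = ["sunshine"] * 10
always_storm = ["hurricane"] * 10

print(total_score(always_sunny, truths))  # 9*20 - 1000 = -820
print(total_score(always_storm, truths))  # 9*5 + 50 = 95
```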


User Story: Optimizing Classification Models

Task: Predict Retention/Attrition

Model Performance Optimization (performance vs. generation):

• Classification accuracy: 62 → 70.2 → 72.3 → 73.4 → 75.2

• Test set error (RMS): 34.8 → 28.8 → 24.5 → 22.1 → 20.9

17 potential input features (customer demographics)

2 outputs (retention/attrition)

1300 training samples (50/50 A/B split)

1300 test samples (naïve test data)
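This chart tracks two metrics on held-out test data. The sketch below uses common definitions of both: thresholded classification accuracy, and RMS error between raw model outputs and 0/1 targets. These are assumptions, because the slides do not say exactly how either value was computed. The scores are hypothetical.

```python
import math


def classification_accuracy(scores, targets, threshold=0.5):
    """Fraction of samples whose thresholded score matches the 0/1 target."""
    hits = sum((s >= threshold) == bool(t) for s, t in zip(scores, targets))
    return hits / len(targets)


def rms_error(scores, targets):
    """Root-mean-square difference between raw model outputs and 0/1 targets."""
    return math.sqrt(sum((s - t) ** 2 for s, t in zip(scores, targets)) / len(targets))


scores = [0.9, 0.2, 0.6, 0.4]  # hypothetical model outputs on a test set
targets = [1, 0, 0, 1]         # 1 = retained, 0 = attrited
print(classification_accuracy(scores, targets))  # 0.5
print(rms_error(scores, targets))                # about 0.439
```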


Use Case – Fully Adaptive Feedback (Next Best Offer)

• DB: historical user behavior (stimulus/response)

• Train / update model

• Non-adaptive (fixed) mode: randomized A/B/C offer selection

• Adaptive ML mode: ML prediction offer selection

• Operation (trigger)

• Ad / offer (stimulus)

• Feedback cycle
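The adaptive feedback loop can be sketched as a multi-armed bandit. Epsilon-greedy selection is a standard bandit strategy, swapped in here for illustration; the slides do not say which algorithm RedPoint uses. Responses are simulated from hypothetical response rates. Setting `epsilon=1.0` reproduces the randomized A/B/C (non-adaptive) mode.

```python
import random


def run_adaptive_offers(response_rates, steps=5000, epsilon=0.1, seed=42):
    """Epsilon-greedy offer selection with a feedback loop (simulated responses)."""
    rng = random.Random(seed)
    offers = list(response_rates)
    counts = {o: 0 for o in offers}
    estimates = {o: 0.0 for o in offers}  # running mean response per offer
    for _ in range(steps):
        # Trigger: choose an offer (explore at random, otherwise exploit best estimate)
        if rng.random() < epsilon:
            offer = rng.choice(offers)
        else:
            offer = max(offers, key=lambda o: estimates[o])
        # Stimulus -> response: simulated with hypothetical response rates
        reward = 1.0 if rng.random() < response_rates[offer] else 0.0
        # Feedback: update the model (incremental mean) with the observed response
        counts[offer] += 1
        estimates[offer] += (reward - estimates[offer]) / counts[offer]
    return counts, estimates


counts, estimates = run_adaptive_offers({"A": 0.02, "B": 0.05, "C": 0.30})
print(counts)
```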


Five Keys to Successful Machine Learning

• Let the data speak for itself – don’t force-fit your models

• Remember, not all errors are equal – use this to your advantage

• True learning requires continual adaptation!

• Automate the process with feedback – remove the “man-in-the-loop”

• Trust the optimization process – it really works!


Q&A

Contact Info

Visit: www.redpoint.net

Bill Porto, Sr. Engineering Analyst, RedPoint Global

Want More Information about this topic?

Fill out your card or go to redpoint.net/hadoopeurope
