predictive analytics - what it really is and what it really does

PREDICTIVE ANALYTICS WHAT IT REALLY IS AND WHAT IT REALLY DOES Kevin Gray Cannon Gray LLC http://www.cannongray.com/ [email protected]

Upload: kevin-gray

Post on 17-Aug-2015

TRANSCRIPT

Page 1: Predictive Analytics - What It Really Is and What It Really Does

PREDICTIVE ANALYTICS

WHAT IT REALLY IS AND WHAT IT REALLY DOES

Kevin Gray
Cannon Gray LLC

http://www.cannongray.com/
[email protected]

Page 2: Predictive Analytics - What It Really Is and What It Really Does

"There falls the words of fools about my ears...

Answers everywhere, promising solutions to my fears,

leading through halls with no doors in the walls,

and leave me in the darkness."

The Silence of a Candle (Ralph Towner)
https://www.youtube.com/watch?v=6Do_P7R9tQE

Page 3: Predictive Analytics - What It Really Is and What It Really Does

This Presentation…

…will not be a sales pitch promising that Predictive Analytics is The Answer to everything

Instead, it will be a snapshot of a complex topic

But first…what is “Predictive Analytics?”

Page 4: Predictive Analytics - What It Really Is and What It Really Does

Two Examples

A Data Scientist at a university develops a model to identify students at high risk of defaulting on their student loans

A Data Scientist at a financial services company builds a model to identify customers most likely to invest in certain new retirement funds

Page 5: Predictive Analytics - What It Really Is and What It Really Does

Page 6: Predictive Analytics - What It Really Is and What It Really Does

Confession…

In each case the Data Scientist was me…

…and this was the 1980s!

Page 7: Predictive Analytics - What It Really Is and What It Really Does

Page 8: Predictive Analytics - What It Really Is and What It Really Does

“New” Can Be Old

The terms “Data Scientist,” “Big Data,” and “CRM” were not in use at the time

“Predictive Analytics” was called Predictive Modeling

Page 9: Predictive Analytics - What It Really Is and What It Really Does

“New” Can Be Old

The origins of Predictive Modeling can be traced back centuries to epidemiology, actuarial science, astronomy and other fields

Banks had developed credit scoring systems based on statistical models at least as far back as the 1950s

Page 10: Predictive Analytics - What It Really Is and What It Really Does

Plus ça change, plus c'est la même chose?

"Sometimes things don’t change as much as all the terminology changes!"

- veteran US analytics recruiter (personal communication)

Page 11: Predictive Analytics - What It Really Is and What It Really Does

Data, Data, Everywhere…

Companies store data for many reasons:
  Legal (e.g., Sarbanes-Oxley)
  HR
  Operations
  Supply chain management
  Customer service
  Sales
  ...
  Marketing

Page 12: Predictive Analytics - What It Really Is and What It Really Does

Not Just For Marketing

Most Predictive Analytics has no connection with marketing:
  Medical and pharmaceutical research
  Fraud detection
  Finance
  Human Resource Management
  Oil and gas exploration
  Network security
  Military and National security
  Seismology

Page 13: Predictive Analytics - What It Really Is and What It Really Does

Marketing Applications

Marketing applications are most common in industries with detailed consumer data:
  Retailing
  Banking
  Insurance
  Travel and hospitality
  Medical/Pharmaceutical
  Telecommunications

Page 14: Predictive Analytics - What It Really Is and What It Really Does

Marketing Applications

A few examples are:
  Customer Relationship Management
  Customer retention
  Retail recommender systems
  Direct marketing
  Cross selling
  Targeted ads
  Analysis of website traffic

Page 15: Predictive Analytics - What It Really Is and What It Really Does

“Big” Comes In Many Sizes

There is absolutely no requirement that Predictive Analytics data must be “big”

Sometimes just a few hundred observations:
  My student loan model
  Segmentation typing tools

Page 16: Predictive Analytics - What It Really Is and What It Really Does

“Big” Comes In Many Sizes

Massive, high-velocity, streaming data also used but not required

Data can be structured or unstructured

Real-time analytics the exception, not the rule

Page 17: Predictive Analytics - What It Really Is and What It Really Does

No Escaping The Basics

Understanding the fundamentals of research and statistics is now more important than ever!!

Page 18: Predictive Analytics - What It Really Is and What It Really Does

Sampling

Sampling is an essential part of most research, not just market surveys

Predictive Analytics is typically based on a sample then deployed on new data

Page 19: Predictive Analytics - What It Really Is and What It Really Does

Sampling

Seldom need zillions of records to develop and evaluate the model

Complex sampling and weighting sometimes used
  e.g., when predicting rare events such as fraud
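
One common form of sampling and weighting for rare events can be sketched as follows. This is only an illustration under stated assumptions, not the procedure used in any project mentioned here: the function name, the `fraud` field, the 1% event rate and the 5% keep rate are all hypothetical. Every rare event is kept, non-events are randomly thinned, and each kept non-event carries a weight so that weighted totals still approximate the full data.

```python
import random

def downsample_non_events(records, is_event, keep_rate, seed=0):
    """Keep every event and a random fraction of non-events.

    Events get weight 1; kept non-events get weight 1 / keep_rate,
    so weighted counts still approximate the original data.
    """
    rng = random.Random(seed)
    sample = []
    for rec in records:
        if is_event(rec):
            sample.append((rec, 1.0))
        elif rng.random() < keep_rate:
            sample.append((rec, 1.0 / keep_rate))
    return sample

# Hypothetical data: 10,000 transactions, 1% flagged as fraud.
transactions = [{"id": i, "fraud": i % 100 == 0} for i in range(10_000)]
sample = downsample_non_events(transactions, lambda r: r["fraud"], keep_rate=0.05)

n_fraud = sum(1 for rec, _ in sample if rec["fraud"])  # all fraud cases survive
weighted_total = sum(w for _, w in sample)             # roughly the original size
```

The modeling sample is far smaller than the original file, yet no fraud case is lost.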

Page 20: Predictive Analytics - What It Really Is and What It Really Does

Design And Inference

Knowledge of experimental and quasi-experimental designs essential
  e.g., when running different campaigns among different customer groups

Sound grasp of causal inference also needed
  e.g., “Why isn’t our campaign working?”

Page 21: Predictive Analytics - What It Really Is and What It Really Does

“Trad” Stats Are Not Dead

Descriptive Statistics
Principal Components Analysis
Multiple Regression
Ridge Regression, LASSO
Partial Least Squares Regression
Logistic Regression
Discriminant Analysis
CHAID, CART
Survival Analysis
Time-series Analysis
Mixture Modeling/Latent Class

Page 22: Predictive Analytics - What It Really Is and What It Really Does

“Janitorial Work”

Big Data is often small data repeated many times and can be substantially reduced at the pre-processing stage
  e.g., may only need monthly spend on one food category in the past year, not every transaction in the past five years
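
A minimal sketch of this kind of reduction, assuming a transaction log held as a list of records. The field names (`customer`, `date`, `category`, `amount`), the category and the date window are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from datetime import date

def monthly_category_spend(transactions, category, start, end):
    """Sum spend per (customer, year, month) for one category and date window."""
    totals = defaultdict(float)
    for t in transactions:
        if t["category"] == category and start <= t["date"] <= end:
            key = (t["customer"], t["date"].year, t["date"].month)
            totals[key] += t["amount"]
    return dict(totals)

# Hypothetical transaction log; field names are illustrative only.
log = [
    {"customer": "A", "date": date(2015, 1, 5), "category": "snacks", "amount": 4.0},
    {"customer": "A", "date": date(2015, 1, 20), "category": "snacks", "amount": 6.0},
    {"customer": "A", "date": date(2015, 1, 21), "category": "dairy", "amount": 3.0},
    {"customer": "A", "date": date(2012, 3, 1), "category": "snacks", "amount": 9.0},
    {"customer": "B", "date": date(2015, 2, 2), "category": "snacks", "amount": 2.5},
]

# Keep one category and one year; older and unrelated rows drop out.
summary = monthly_category_spend(log, "snacks", date(2014, 8, 1), date(2015, 7, 31))
```

Five raw rows collapse to two summary rows here; on a real transaction file the reduction is usually far larger.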

Page 23: Predictive Analytics - What It Really Is and What It Really Does

“Janitorial Work”

Many data fields are “exhaust”
  By-products of transactional or operational processes and not useful in Predictive Analytics

Most data have little or no marketing value

Page 24: Predictive Analytics - What It Really Is and What It Really Does

Predictive Analytics Is A Process

CRISP-DM: Cross Industry Standard Process for Data Mining

Page 25: Predictive Analytics - What It Really Is and What It Really Does

Page 26: Predictive Analytics - What It Really Is and What It Really Does

Core Concepts Of Predictive Analytics

Existing data are used to develop a model that scores new data

By score I mean any of the following:
  Classifying into most probable group (will purchase/will not purchase)
  Assigning a probability score (probability of purchase)
  Predicting a quantity (how much will spend)
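
The three kinds of score can be sketched with a single logistic regression equation. Everything specific here is an assumption for illustration: the coefficients, the predictor names, the 0.5 cut-off and the average basket value are hypothetical, and treating spend as probability times an average basket is a deliberate simplification.

```python
import math

# Hypothetical coefficients from a model fitted earlier on existing data.
INTERCEPT = -2.0
COEFS = {"visits_last_month": 0.5, "is_member": 1.0}

def purchase_probability(customer):
    """Probability score from a logistic regression equation."""
    z = INTERCEPT + sum(COEFS[name] * customer[name] for name in COEFS)
    return 1.0 / (1.0 + math.exp(-z))

def classify(customer, threshold=0.5):
    """Most probable group: will purchase / will not purchase."""
    if purchase_probability(customer) >= threshold:
        return "will purchase"
    return "will not purchase"

def expected_spend(customer, average_basket=40.0):
    """A predicted quantity: probability times a hypothetical average basket."""
    return purchase_probability(customer) * average_basket

# A new customer, i.e. one not used to fit the coefficients above.
new_customer = {"visits_last_month": 4, "is_member": 1}
print(classify(new_customer), purchase_probability(new_customer), expected_spend(new_customer))
```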

Page 27: Predictive Analytics - What It Really Is and What It Really Does

Core Concepts Of Predictive Analytics

Existing data are used to develop a model that scores new data

By new I mean data not used to build the model, for example:
  Data that do not yet exist (e.g., future customers)
  Data deliberately set aside (held out) when the model was being built

Page 28: Predictive Analytics - What It Really Is and What It Really Does

Core Concepts Of Predictive Analytics

Existing data are used to develop a model that scores new data

By model I mean either of the following:
  An equation or system of equations used to represent the process that generated the data - a statistical model
  A computer algorithm designed for pattern recognition - a machine learner

These are not official definitions!

Page 29: Predictive Analytics - What It Really Is and What It Really Does

Overfitting

Any sample has its idiosyncrasies

We want to develop a Predictive Model that generalizes well to new data

Must try to avoid modelling noise in our data
  Known as overfitting

Models are typically less accurate on new data (“shrinkage”)

Page 30: Predictive Analytics - What It Really Is and What It Really Does

Overfitting

Overfitting is related to the bias-variance trade-off

Overfitting is nearly always a concern, but especially when there are few cases (e.g., B2B customers) and many independent variables (predictors): “small n, large p”

Page 31: Predictive Analytics - What It Really Is and What It Really Does

Overfitting and Bias-Variance Trade-off

Systematically wrong (biased) but by about the same amount from sample to sample

Less biased but accuracy of predictions will vary a lot from sample to sample
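
The two cases above are often summarized by the standard decomposition of expected squared prediction error. It is not part of the original slides and is given here only as a reference point. It assumes the outcome is generated as $y = f(x) + \varepsilon$, with noise $\varepsilon$ of mean zero and variance $\sigma^2$, and that $\hat{f}$ is the model fitted to a random training sample.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

The first case corresponds to a large bias term and a small variance term, the second to the reverse. An overfitted model typically sits in the second case.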

Page 32: Predictive Analytics - What It Really Is and What It Really Does

Model Validation

We use a Training Sample to develop our model

We use a Validation Sample to estimate the accuracy of our candidate models
  Which method and parameter settings will work best?
  How well will it predict new data?

Page 33: Predictive Analytics - What It Really Is and What It Really Does

Cross Validation

One of the simplest ways is Cross Validation

We randomly split data into two parts - a training sample and a validation sample

70/30 splits are common

Build the model on the training sample and observe how accurately it predicts on the validation (hold-out) sample

Page 34: Predictive Analytics - What It Really Is and What It Really Does

Cross Validation

[Figure: the data split into a 70% Training sample and a 30% Validation sample]
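
The single random split described above (often called the hold-out method) can be sketched in a few lines. The function name, the seed and the stand-in data are assumptions for illustration only.

```python
import random

def train_validation_split(records, train_fraction=0.7, seed=42):
    """Randomly split records into a training and a validation (hold-out) sample."""
    shuffled = list(records)               # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

customers = list(range(1000))  # stand-in for 1,000 customer records
training, validation = train_validation_split(customers)
```

The model is then fitted on `training` only, and its accuracy is measured on `validation`, which it has never seen.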

Page 35: Predictive Analytics - What It Really Is and What It Really Does

K-Fold Validation

K-Fold Validation is generally preferred

Many variations, but a popular way is to randomly divide the data into 5 subsamples

Build the model on 4 subsamples combined and validate on the 5th

Repeat this 4 more times so that model performance is assessed in each subsample

Run this whole process 5-10 times on different random subsamples and use the average/modal result

Page 36: Predictive Analytics - What It Really Is and What It Really Does

5-Fold Validation

[Figure: the data divided into Subsamples 1-5 (20% each) across Runs 1-5, with a different subsample used to Validate in each run]

Repeat process 5-10 times on different random subsamples
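
A minimal sketch of repeated K-fold validation, assuming a generic `fit` function that builds a model from training data and a `score` function that measures its error on validation data. Both, along with the toy "model" that simply predicts the training mean, are hypothetical stand-ins for a real modeling method.

```python
import random
from statistics import mean

def k_fold_indices(n, k, rng):
    """Shuffle range(n) and deal it into k folds of near-equal size."""
    order = list(range(n))
    rng.shuffle(order)
    return [order[i::k] for i in range(k)]

def repeated_k_fold(data, fit, score, k=5, repeats=5, seed=0):
    """Average validation score over `repeats` random k-fold partitions."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        for held_out in k_fold_indices(len(data), k, rng):
            held = set(held_out)
            train = [data[i] for i in range(len(data)) if i not in held]
            valid = [data[i] for i in held_out]
            scores.append(score(fit(train), valid))
    return mean(scores)

def fit_mean(train):
    """Toy model: always predict the training mean."""
    return mean(train)

def mean_absolute_error(model, valid):
    """Average absolute gap between the prediction and the held-out values."""
    return mean(abs(y - model) for y in valid)

spend = [float(x % 17) for x in range(100)]  # hypothetical outcome variable
cv_error = repeated_k_fold(spend, fit_mean, mean_absolute_error, k=5, repeats=5)
```

Each case is used for validation exactly once per repeat, and the final figure averages over all 25 fits.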

Page 37: Predictive Analytics - What It Really Is and What It Really Does

Model Selection

Most Machine Learners have several tuning parameters and Statistical modelling usually requires many decisions

Often many methods are tried, tuned and tested since even small differences in accuracy can translate into Big $

Jackknife and Bootstrap are two other procedures sometimes used

Page 38: Predictive Analytics - What It Really Is and What It Really Does

Final Model

When the type of predictive model and its parameters have been decided, the model is re-run on the entire sample

This will be the model that is actually deployed

Sometimes many models/versions are “stacked” and the results averaged
  Usually a Plan B option because of cost and complexity

Page 39: Predictive Analytics - What It Really Is and What It Really Does

Updating Model

The predictive accuracy of any model will decline over time

Should be periodically updated on new data

Page 40: Predictive Analytics - What It Really Is and What It Really Does

Page 41: Predictive Analytics - What It Really Is and What It Really Does

Some Machine Learners Used In Predictive Analytics

Naive Bayes
KNN
Apriori
Artificial Neural Networks
Support Vector Machines
Random Forests
Stochastic Gradient Boosting
MARS
Cubist, C5.0 (J. Ross Quinlan)
Latent Dirichlet Allocation
…many more…

Page 42: Predictive Analytics - What It Really Is and What It Really Does

K-Nearest Neighbors (KNN)

[Figure: a new (Green) point classified by its Nearest 3 Neighbors and by its Nearest 5 Neighbors; it is predicted to be Red in one panel and Blue in the other]
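
The point of the figure, that the prediction can change with the number of neighbors consulted, can be sketched with a small majority-vote classifier. The coordinates and colours below are hypothetical and were chosen so that the answer flips between k = 3 and k = 5; they are not the points from the original figure.

```python
import math
from collections import Counter

def knn_predict(train, point, k):
    """Classify `point` by majority vote among its k nearest training points."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], point))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-D training points: (coordinates, colour).
train = [
    ((1.0, 0.0), "red"),
    ((0.0, 1.2), "red"),
    ((1.5, 0.0), "blue"),
    ((0.0, 2.0), "blue"),
    ((2.2, 0.0), "blue"),
    ((5.0, 5.0), "red"),
]
new_point = (0.0, 0.0)

print(knn_predict(train, new_point, k=3))  # two red, one blue among the nearest 3
print(knn_predict(train, new_point, k=5))  # two red, three blue among the nearest 5
```

This is why k is treated as a tuning parameter and chosen by validation.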

Page 43: Predictive Analytics - What It Really Is and What It Really Does

Artificial Neural Networks (ANN)

[Figure: network diagram with Nodes arranged in an Input Layer, a Hidden Layer and an Output Layer]

Page 44: Predictive Analytics - What It Really Is and What It Really Does

Support Vector Machines (SVM)

[Figure: SVM diagram labeling the Margin and the Support Vectors]

Page 45: Predictive Analytics - What It Really Is and What It Really Does

Random Forests (RF)

Hundreds or Thousands of Trees

Variables and Cases Randomly Selected
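
The random selection of cases and variables can be sketched as below. This shows only the sampling step, not tree building, and it is a simplification: it draws variables once per tree, whereas random forest implementations usually redraw the candidate variables at every split. The variable names and sizes are hypothetical.

```python
import random

def draw_tree_inputs(n_cases, variables, n_vars_per_tree, rng):
    """Pick the cases and variables one tree gets to see.

    Cases are drawn with replacement (a bootstrap sample);
    variables are drawn without replacement.
    """
    case_ids = [rng.randrange(n_cases) for _ in range(n_cases)]
    tree_vars = rng.sample(variables, n_vars_per_tree)
    return case_ids, tree_vars

rng = random.Random(7)
variables = ["age", "income", "tenure", "visits", "spend", "region"]  # hypothetical
forest_inputs = [draw_tree_inputs(200, variables, 2, rng) for _ in range(500)]
```

Because each tree sees a different slice of the data, the trees disagree in useful ways, and their combined vote is more stable than any single tree.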

Page 46: Predictive Analytics - What It Really Is and What It Really Does

Prediction And Explanation

An ideal model will predict new cases and be informative

Compared to Statistical models, Machine Learners are often slightly better at prediction but are usually hard to interpret

Sometimes can use a Machine Learner and a Statistical model in tandem - if each is adequate, their predictions will be highly correlated
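
One simple way to check whether two models agree is to correlate their scores for the same hold-out cases. The sketch below computes a Pearson correlation by hand; the two score lists are made-up numbers for illustration, not results from any real model.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical scores for the same six hold-out customers from two models.
statistical_scores = [0.10, 0.25, 0.40, 0.55, 0.70, 0.90]
machine_learner_scores = [0.05, 0.30, 0.35, 0.60, 0.75, 0.95]

r = pearson(statistical_scores, machine_learner_scores)
```

A high correlation suggests the interpretable model can be used to explain what the harder-to-read one is doing.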

Page 47: Predictive Analytics - What It Really Is and What It Really Does

Page 48: Predictive Analytics - What It Really Is and What It Really Does

Humans Are Not Yet Obsolete

Applied Logistic Regression (Hosmer and Lemeshow):

"[statistical] methods...are not to be used as a substitute, but rather as an addition to clear and careful thought. Successful modeling of a complex data set is part science, part statistical methods, and part experience and common sense."

Page 49: Predictive Analytics - What It Really Is and What It Really Does

Humans Are Not Yet Obsolete

Leo Breiman and Adele Cutler (developers of RF algorithm):

“[random forests] is an example of a tool that is useful in doing analyses of scientific data. But the cleverest algorithms are no substitute for human intelligence and knowledge of the data in the problem. Take the output of random forests not as absolute truth, but as smart computer generated guesses that may be helpful in leading to a deeper understanding of the problem.”

Page 50: Predictive Analytics - What It Really Is and What It Really Does

Humans Are Not Yet Obsolete

The resurgence of Bayesian statistics is further evidence that human judgment cannot be purged from analytics

We must also avoid finding a great answer…to the wrong question!

Page 51: Predictive Analytics - What It Really Is and What It Really Does

Humans Are Not Yet Obsolete

More analytic options also mean higher risk and greater need for well-trained and experienced researchers

Paradoxically, technology has made it easier to be an incompetent data scientist and harder to be a good one!
  e.g., “abuser-friendly” stats software
  Beautiful data visualizations…that tell us nothing

Page 52: Predictive Analytics - What It Really Is and What It Really Does

Humans Are Not Yet Obsolete

With bigger and messier data, understanding people will become more critical, not less

Demand will rise for data scientists able to see beyond math and programming who truly understand marketing and consumers

Page 53: Predictive Analytics - What It Really Is and What It Really Does

Page 54: Predictive Analytics - What It Really Is and What It Really Does

The Why

Understanding The Why is critical

Marketing isn’t only about predicting behavior; it’s also about changing behavior

Understanding The Why can help us do this

Page 55: Predictive Analytics - What It Really Is and What It Really Does

The Why

Reverse-engineering The Why from past behavior isn’t always easy
  Two people can do the same things for the same reasons
  Two people can do the same things for different reasons
  Two people can do different things for the same reasons
  Two people can do different things for different reasons

Page 56: Predictive Analytics - What It Really Is and What It Really Does

The Why

There is also the “multiple me”…on different occasions
  I can do the same things for the same reasons
  I can do the same things for different reasons
  I can do different things for the same reasons
  I can do different things for different reasons

Also…some of our behavior is random…

Page 57: Predictive Analytics - What It Really Is and What It Really Does

“How do I know it will pay off?”

Page 58: Predictive Analytics - What It Really Is and What It Really Does

Reasons To Be Skeptical

Data infrastructure may involve considerable direct costs

The necessary skills may not be available in-house

To work well, Predictive Analytics must be a team effort, and time costs can be substantial

Page 59: Predictive Analytics - What It Really Is and What It Really Does

Reasons To Be Skeptical

Managers and staff often already overstretched
  “This is the last thing we need…”

Can be seen as threat, not opportunity

Page 60: Predictive Analytics - What It Really Is and What It Really Does

Reasons To Be Skeptical

Anonymity, confidentiality and privacy are common (and legitimate) concerns

  Potential legal issues
  Risk of data breaches
  Customers can become worried and irritated!

Page 61: Predictive Analytics - What It Really Is and What It Really Does

Reasons To Be Skeptical

Still only a handful of universities offer degree programs in Data Science - not as established as Finance or Accounting

Predictive Analytics is not guaranteed to pay back

C-Level buy-in critical!

Page 62: Predictive Analytics - What It Really Is and What It Really Does

Big Expectations

On the other hand, some clients think Predictive Analytics is easy
  “I have all this data, can you find me something?”

Managing client expectations critical - may only be a small, rusty needle in that Big Expensive Haystack!

Page 63: Predictive Analytics - What It Really Is and What It Really Does

Page 64: Predictive Analytics - What It Really Is and What It Really Does

Some Books

The Data Warehouse Toolkit (Kimball and Ross)
Data Architecture: A Primer for the Data Scientist (Inmon and Linstedt)
Hadoop: The Definitive Guide (White)
Sampling: Design and Analysis (Lohr)
Experimental and Quasi-Experimental Designs (Shadish et al.)
Categorical Data Analysis (Agresti)
Propensity Score Analysis (Guo and Fraser)
Time Series Analysis (Wei)
Data Mining Techniques (Linoff and Berry)
Regression Modeling Strategies (Harrell)
Data Mining (Witten et al.)
Applied Predictive Modeling (Kuhn and Johnson)
An Introduction to Statistical Learning (James et al.)
Elements of Statistical Learning (Hastie et al.)
Data Mining: The Textbook (Aggarwal)

Page 65: Predictive Analytics - What It Really Is and What It Really Does

Some Online Resources

KDNuggets - http://www.kdnuggets.com/
Data Science Central - http://www.datasciencecentral.com/
About Data Analysis - https://www.linkedin.com/grp/home?gid=8156839
CRISP-DM - https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining
Cannon Gray company library - http://cannongray.com/methods

Page 66: Predictive Analytics - What It Really Is and What It Really Does

Page 67: Predictive Analytics - What It Really Is and What It Really Does

Key Points To Remember

Predictive Analytics uses existing data to build a model that will accurately predict new data (e.g., customer behavior)

It does not require Big Data, Machine Learning or Real-Time Analytics

Page 68: Predictive Analytics - What It Really Is and What It Really Does

Key Points To Remember

It is not a recent development but has become much more sophisticated in the past decade

It is now much more widely accepted - not so exotic (or wacko) anymore
  Very quizzical looks used to be routine!

Page 69: Predictive Analytics - What It Really Is and What It Really Does

Key Points To Remember

Big Data has not miraculously made data clean and easy to analyze - just the opposite!

Only 10% - 20% of total analyst time is spent on modeling

Page 70: Predictive Analytics - What It Really Is and What It Really Does

Key Points To Remember

Overfitting frequently a problem - use K-fold validation when possible

Aim for parsimony - as simple as possible but not too simple

Page 71: Predictive Analytics - What It Really Is and What It Really Does

Key Points To Remember

Implementation is the graveyard of many good ideas!

After deployment, sales and customer behavior should be tracked
  The actual effects must be assessed - e.g., are we just making customers more price sensitive and eroding our brand equity?

Even successful implementations must be refined, modified or discarded with the passage of time

Page 72: Predictive Analytics - What It Really Is and What It Really Does

Key Points To Remember

Predictive Analytics is a process that requires a team - technical skill is only one aspect

Marketing researchers can’t do it all but there is often an important role for us

Vital that end users be active team participants

Page 73: Predictive Analytics - What It Really Is and What It Really Does

Key Points To Remember

Marketing researchers, marketers, statisticians and computer scientists often use different jargon or the same jargon differently!

Don’t assume - communicate!

Page 74: Predictive Analytics - What It Really Is and What It Really Does

Key Points To Remember

Predictive Analytics is not a substitute for marketing research

Synergizes well with “trad” marketing research to provide richer insights into what consumers do and why

Page 75: Predictive Analytics - What It Really Is and What It Really Does

Key Points To Remember

Marketing researchers have a natural advantage over others working in this space - we are more than technicians

Some criticisms of MR by “Data Scientists” betray a lack of understanding of marketing and research

Page 76: Predictive Analytics - What It Really Is and What It Really Does

Science Fiction Is Not Science

Page 77: Predictive Analytics - What It Really Is and What It Really Does

THANK YOU!!

Kevin Gray
Cannon Gray LLC

http://www.cannongray.com/
[email protected]