Talk given for New England Artificial Intelligence on October 10, 2012.

Predicting Customer Conversion with Random Forests

Daniel Gerlanc, Principal
Enplus Advisors, Inc.
www.enplusadvisors.com
dgerlanc@enplusadvisors.com

A Decision Tree Case Study

Topics

• Objectives: Research Question
• Data: Bank Prospect Conversion
• Methods: Decision Trees, Random Forests
• Results

Objective

• Which customers or prospects should you call today?

• To whom should you offer incentives?

Dataset

• Direct marketing campaign for bank loans

• http://archive.ics.uci.edu/ml/datasets/Bank+Marketing

• 45,211 records, 17 features (a loading sketch follows)
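A minimal loading sketch, assuming the semicolon-delimited bank-full.csv file from the UCI page above; the takes.loan column name used in the later code is assumed to be a rename of the dataset's y column:

# Read the UCI bank marketing data (bank-full.csv is semicolon-delimited)
bank <- read.csv("bank-full.csv", sep=";", stringsAsFactors=TRUE)

# Rename the response column "y" to the name used in these slides
names(bank)[names(bank) == "y"] <- "takes.loan"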


Decision Trees

[Figure: a toy decision tree. The root splits on "Sunny?"; one branch leads to a second split on "Windy?"; the leaves are "Coat" and "No Coat".]

Statistical Decision Trees

• Data contain randomness

• Relationships may not be known ahead of time

Decision Trees: Splitting

• Splitting is a deterministic process

Decision Tree Code

library(rpart)
tree.1 <- rpart(takes.loan ~ ., data=bank)

• See the 'rpart' and 'rpart.plot' R packages.
• Many parameters are available to control the fit (see the sketch below).
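As an illustration, a sketch of controlling and plotting the fit; the parameter values here are arbitrary choices, not the talk's settings:

library(rpart)
library(rpart.plot)

# Loosen the complexity and split thresholds to grow a larger tree
tree.2 <- rpart(takes.loan ~ ., data=bank, method="class",
                control=rpart.control(cp=0.001, minsplit=50))

rpart.plot(tree.2)  # draw the fitted tree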

Make Predictions

predict(tree.1, type="vector")

How’d it do?

                 Actual
Predicted        no         yes
no               38,904     1,018
yes              3,444      1,845

Naïve Accuracy: 11.7%

Decision Tree Precision: 34.8%
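One way to reproduce these numbers (a sketch; "precision" here follows the slide's usage):

# Confusion matrix of in-sample tree predictions vs. the actual labels
pred.tree <- predict(tree.1, type="class")
cm <- table(Predicted=pred.tree, Actual=bank$takes.loan)

# Share of true "yes" among all predicted "yes"
precision <- cm["yes", "yes"] / sum(cm["yes", ])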

Decision Tree Problems

• Overfitting the data (high variance)

• May not use all relevant features

Random Forests

One Decision Tree

Many Decision Trees (Ensemble)

Building RF

• Sample from the data

• At each split, sample from the available variables

• Repeat for each tree (sketched in code below)
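A didactic sketch of that recipe built from plain rpart trees (build.forest is a hypothetical helper, not the randomForest internals; note that rpart cannot resample variables at every split, so this version draws one feature subset per tree instead):

library(rpart)

build.forest <- function(data, n.trees=25, mtry=4) {
  features <- setdiff(names(data), "takes.loan")
  lapply(seq_len(n.trees), function(i) {
    rows <- sample(nrow(data), replace=TRUE)   # bootstrap sample of the rows
    vars <- sample(features, mtry)             # random subset of the variables
    rpart(reformulate(vars, response="takes.loan"),
          data=data[rows, ], method="class")   # one tree of the ensemble
  })
}

forest <- build.forest(bank)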

Motivations for RF

• Create uncorrelated trees

• Variance reduction

• Subspace exploration

Random Forests

library(randomForest)
rffit.1 <- randomForest(takes.loan ~ ., data=bank)

Most important parameters:

• ntree: number of trees (default: 500)

• mtry: number of variables to randomly select at each node (default: square root of # predictors for classification, # predictors / 3 for regression)

The defaults are spelled out in the sketch below.
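For this dataset there are 16 predictors, so the classification default is mtry = floor(sqrt(16)) = 4 (rffit.explicit is just an illustrative name):

library(randomForest)

rffit.explicit <- randomForest(takes.loan ~ ., data=bank,
                               ntree=500,  # default number of trees
                               mtry=4)     # floor(sqrt(16)) variables per split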

How’d it do?

Naïve Accuracy: 11.7%

Random Forest
• Precision: 64.5% (2,541 / 3,937)
• Recall: 48% (2,541 / 5,289)

                 Actual
Predicted        yes       no
yes              2,541     1,396
no               2,748     38,526
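Both metrics can be recovered from the forest's out-of-bag predictions (a sketch; calling predict on a randomForest fit with no new data returns the OOB predictions):

pred.rf <- predict(rffit.1)  # out-of-bag class predictions
cm <- table(Predicted=pred.rf, Actual=bank$takes.loan)

precision <- cm["yes", "yes"] / sum(cm["yes", ])  # TP / all predicted "yes"
recall    <- cm["yes", "yes"] / sum(cm[, "yes"])  # TP / all actual "yes"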

Tuning RF

# tuneRF steps mtry up and down by stepFactor, keeping steps that improve the
# OOB error estimate by at least `improve`; doBest=TRUE returns the forest
# fit at the best mtry found (rather than just the grid of OOB errors)
rffit.1 <- tuneRF(X, y, mtryStart=1, stepFactor=2, improve=0.05, doBest=TRUE)

Benefits of RF

• Good accuracy with default settings

• Relatively easy to parallelize

• Many implementations: R, Weka, RapidMiner, Mahout

References

• Liaw, A. and M. Wiener (2002). Classification and Regression by randomForest. R News 2(3), 18-22.

• Breiman, Leo. Classification and Regression Trees. Belmont, Calif.: Wadsworth International Group, 1984. Print.

• Breiman, Leo and Adele Cutler. Random Forests. http://www.stat.berkeley.edu/~breiman/RandomForests/cc_contact.htm

• Moro, S., R. Laureano and P. Cortez. Using Data Mining for Bank Direct Marketing: An Application of the CRISP-DM Methodology. In P. Novais et al. (Eds.), Proceedings of the European Simulation and Modelling Conference - ESM'2011, pp. 117-121, Guimarães, Portugal, October 2011. EUROSIS.
