predictingrepeatcust

Predicting “the” Repeat Customers

22nd March 2016

V1.0

How to identify potential repeat customers?

Type of Customer (since 2005)*              No. of Customers
Repeat Customer                             125,629
Customer who purchased 1 car with ALJF      425,745

How many are repeat customers? A “subset” of the 425,745 (about 20%).

We need to identify a time window beyond which the chance of a second purchase is very low. First-purchase customers older than this window can safely be assumed to be non-repeat customers.

Is there a pattern in the time between subsequent purchases for repeat customers?

[Figure: Distribution of time interval between purchases for all repeat customers. X-axis: years between purchase of subsequent vehicles (0.5 to 13.5, in half-year steps). Y-axis: % of subsequent purchases.]

Bar labels as listed for 0.5 to 13.5 years: 5%, 10%, 12%, 10%, 9%, 8%, 7%, 6%, 6%, 5%, 4%, 4%, 3%, 2%, 2%, 1%, 1%, 1%, 1%, 1%, 1%, 0%, 0%, 0%, 0%, 0%, 0%


Beyond 8 years, only 5% of the repeat customers bought their subsequent car. Hence, 8 years is taken as the time window for observing repeat customers.
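The 8-year window can be checked against the bar labels of the distribution chart. The sketch below is only an illustration: it assumes the labels are listed in the same left-to-right order as the axis ticks (0.5 to 13.5 years), and the function names `tail_share` and `smallest_window` are hypothetical. The labels are rounded percentages, so they sum to 99 rather than 100.

```python
# Bar labels from the distribution chart, for 0.5 to 13.5 years in half-year steps.
# Assumption: labels appear in the same order as the axis ticks.
SHARE_PCT = [5, 10, 12, 10, 9, 8, 7, 6, 6, 5, 4, 4, 3, 2, 2,
             1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
YEARS = [0.5 * (i + 1) for i in range(len(SHARE_PCT))]


def tail_share(window_years):
    """Percent of subsequent purchases made later than `window_years`."""
    return sum(p for y, p in zip(YEARS, SHARE_PCT) if y > window_years)


def smallest_window(max_tail_pct):
    """Shortest window whose tail share is at most `max_tail_pct` percent."""
    for y in YEARS:
        if tail_share(y) <= max_tail_pct:
            return y
    return YEARS[-1]
```

Under these assumptions, `tail_share(8.0)` adds up the labels beyond the 8-year tick, and `smallest_window(5)` returns the shortest window that leaves at most 5% of subsequent purchases outside it.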

Training data for predictive modeling

Subsequent car buying preference generally depends on multiple factors (additional data we have on our customers).

Independent variables which determine repeat/non-repeat behavior:
Marital Status, Nationality, Work Place, Job, Car Model, Guest Age Range, Guest Income Range, Guest Gender, Government/Non-Government, No. of Dependents

Customer profile labelling as repeat customers:
Label 1: all repeat customers from 2005 till now
Label 0: single-purchase customers (since 2005) whose purchase is 8 years old as of now

Data partitioning:
Training (to train the model): 55%
Validation (to test the model): 45%
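The labelling and partitioning rules can be sketched as follows. This is a minimal illustration, not the original pipeline: the record layout (a list of purchase dates per customer), the helper names `label_customer` and `partition`, and the random shuffle are assumptions. Single-purchase customers still inside the 8-year window get no label, since they may yet buy again.

```python
import random
from datetime import date

WINDOW_YEARS = 8        # observation window chosen from the distribution
TRAIN_FRACTION = 0.55   # 55% training, 45% validation


def label_customer(purchase_dates, today):
    """Return 1 (repeat), 0 (non-repeat) or None (too recent to label)."""
    if len(purchase_dates) >= 2:
        return 1
    years_since = (today - purchase_dates[0]).days / 365.25
    return 0 if years_since >= WINDOW_YEARS else None


def partition(rows, seed=0):
    """Shuffle the rows and split them into (training, validation)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * TRAIN_FRACTION)
    return shuffled[:cut], shuffled[cut:]
```

A fixed seed keeps the split reproducible between runs; any seed would do.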

Probabilistic Binary Classification using Decision Tree

Decision Tree Model: a non-linear prediction model for binary classification that segments the data hierarchically by recursive partitioning. A tree structure of rules over the input variables is used to classify or predict according to the target variable (Repeat).

The outcome assigns each case a probability, to which a threshold (0.5) is applied. Cases with a probability above the threshold are predicted as repeat customers; cases with a probability below it are predicted as non-repeat customers.
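To make the idea of "a tree structure of rules" plus a threshold concrete, here is a toy sketch. The tree below is entirely hypothetical: the split variables, split values and leaf probabilities are made up for illustration and are not the fitted model from this study.

```python
THRESHOLD = 0.5

# Hypothetical tree: internal nodes test one input variable, leaves hold the
# share of repeat customers in that segment (illustrative numbers only).
TREE = {
    "feature": "income_range", "equals": "high",
    "yes": {"feature": "marital_status", "equals": "married",
            "yes": {"prob": 0.82},
            "no": {"prob": 0.55}},
    "no": {"feature": "government", "equals": True,
           "yes": {"prob": 0.40},
           "no": {"prob": 0.12}},
}


def repeat_probability(customer, node=TREE):
    """Follow the rules from the root down to a leaf and return its probability."""
    while "prob" not in node:
        matches = customer.get(node["feature"]) == node["equals"]
        node = node["yes"] if matches else node["no"]
    return node["prob"]


def predict(customer, threshold=THRESHOLD):
    """Apply the threshold to the leaf probability."""
    return "repeat" if repeat_probability(customer) > threshold else "non-repeat"
```

For example, `predict({"income_range": "high", "marital_status": "married"})` lands in the 0.82 leaf and is classified as "repeat".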

Decision Tree to predict repeat/non-repeat customers

Compare Models

Each model classifies better than a naïve model (randomly selecting customers for solicitation). The decision tree outperforms the other models.

Decision Tree Performance:-

The validation leaves are populated almost the same way as the training leaves, indicating no overfitting.

The misclassification rate stabilizes beyond 50 leaves, indicating that the model converges. The depth of the tree is sufficient.

[Figure: Observing Probability Levels and Actual Repeat Customer Cases. X-axis: customer cases (0 to 1000). Y-axis: probability (0 to 1).]

The points are more “crowded” at higher probability.

Probability Zone for Maximum Prediction

Probability levels above 0.8 contain most of the repeat customer cases!

[Figure: Probability rating on actual cases. X-axis: probability level (0.1 to 1). Y-axis: number of cases (0 to 90,000). Series: Repeat Customer, Non-repeat Customer.]
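Finding the probability zone with the most repeat customers amounts to counting actual cases per probability level. The sketch below assumes scored cases are available as `(probability, is_repeat)` pairs and uses ten equal-width bins; the function name `purity_by_bin` and the bin count are assumptions, not part of the original analysis.

```python
def purity_by_bin(scored_cases, n_bins=10):
    """Count actual repeat cases per probability bin.

    scored_cases: iterable of (probability, is_repeat) pairs.
    Returns {bin_upper_edge: (repeat_count, total_count, purity)}.
    """
    counts = [[0, 0] for _ in range(n_bins)]
    for prob, is_repeat in scored_cases:
        idx = min(int(prob * n_bins), n_bins - 1)  # probability 1.0 goes to the last bin
        counts[idx][0] += int(is_repeat)
        counts[idx][1] += 1
    result = {}
    for i, (repeats, total) in enumerate(counts):
        if total:  # skip empty bins
            result[round((i + 1) / n_bins, 2)] = (repeats, total, repeats / total)
    return result
```

Bins where the purity (repeat cases divided by all cases) is highest are the natural place to start solicitation.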

But who is “ready” to buy?

Most subsequent purchases occur in this time window. Superimpose it on the probability model to get a priority rating.

[Figure: Percentage of customers making a 2nd purchase after the 1st purchase, with respect to time interval. X-axis: time interval (0 to 10, in steps of 0.5). Y-axis: 0 to 0.1.]

Strategic Solicitation (Ranking and Targeting):
1. Score all single-purchase customers within 8 years using different predictive models (logistic regression / neural network / decision tree).
2. Identify the region of maximum purity, derived from the actual case frequency at different probability levels.
3. Adjust the weight using the “Distribution of time between purchases for repeat customers” as a factor depending on the time since purchase. Assign the peak (frequency) a weight of 100% and weight the other periods by their frequency relative to the peak, then modify the probability of being a repeat customer with this factor.
4. Remove delinquent customers.
5. Sort the customers in descending order of the repeat customer probability score from the model.
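Steps 3 to 5 can be sketched as below. This is an illustration under stated assumptions: the weights reuse the bar labels of the distribution chart (peak of 12% mapped to a weight of 1.0), each label is assumed to cover the half-year ending at its tick, and the customer fields `id`, `prob`, `years_since_purchase` and `delinquent` are hypothetical names.

```python
import math

# Bar labels from the distribution chart (0.5 to 13.5 years, half-year steps).
# Assumption: bucket i covers the half-year ending at 0.5 * (i + 1) years.
SHARE_PCT = [5, 10, 12, 10, 9, 8, 7, 6, 6, 5, 4, 4, 3, 2, 2,
             1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
PEAK = max(SHARE_PCT)
WINDOW_YEARS = 8


def time_weight(years_since_purchase):
    """Frequency of the matching half-year bucket relative to the peak (peak = 1.0)."""
    idx = max(0, math.ceil(years_since_purchase / 0.5) - 1)
    idx = min(idx, len(SHARE_PCT) - 1)  # clamp to the chart range
    return SHARE_PCT[idx] / PEAK


def rank_customers(customers):
    """Drop delinquent or out-of-window customers, weight, and sort descending."""
    eligible = [c for c in customers
                if not c["delinquent"] and c["years_since_purchase"] <= WINDOW_YEARS]
    scored = [(c["prob"] * time_weight(c["years_since_purchase"]), c["id"])
              for c in eligible]
    return [cid for _, cid in sorted(scored, reverse=True)]
```

With this weighting, a customer with model probability 0.6 at 1.5 years since purchase (weight 1.0) ranks above one with probability 0.9 at 4 years (weight 0.5).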

Dashboarding process

Advantages of dashboard:
Integrated data availability for all stakeholders (Operation, Telemarketing).
Summary area provides a snapshot of campaign performance.
Automated process with minimal work to launch a campaign.
Top management can observe telemarketing feedback and actual buying behavior to generate insight to improve sales.
Telemarketing can view the progress of their previous calls and pursue customers who planned to buy but have not bought yet.

Future plans
1. Extend this campaign to customer acquisition through external data.
2. Automate telemarketing updates through web pages.
3. Create an SMS broadcast service.
4. Direct the campaign to an auto-dialer.

Thank you!
