predictingrepeatcust
TRANSCRIPT
How to identify potential repeat customers?
Type of Customer (since 2005)*              No. of Customers
Repeat Customer                             125,629
Customer who purchased 1 car with ALJF      425,745
How many are repeat customers? A “subset” of the 425,745 (about 20%).
We need to identify a time window beyond which the chance of a second purchase is very low. First-purchase customers older than this window can safely be assumed to be non-repeat customers.
Is there a pattern in the time between subsequent purchases for repeat customers?
[Figure: Distribution of time interval between purchases for all repeat customers. X-axis: years between purchase of subsequent vehicles, 0.5 to 13.5 in half-year steps. Y-axis: % of subsequent purchases. Data labels, in order: 5%, 10%, 12%, 10%, 9%, 8%, 7%, 6%, 6%, 5%, 4%, 4%, 3%, 2%, 2%, 1%, 1%, 1%, 1%, 1%, 1%, 0%, 0%, 0%, 0%, 0%, 0%.]
Beyond 8 years, only 5% of the repeat customers bought their subsequent car. Hence 8 years is used as the time window for observing repeat customers.
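The 8-year cut-off can be reproduced from the chart's data labels. The sketch below is only an illustration: it assumes the 27 percentage labels line up one-to-one with the half-year ticks from 0.5 to 13.5, that each tick marks the upper edge of its bucket, and that "very low" means a remaining share of at most 5%. The function name observation_window is hypothetical.

```python
# Share of subsequent purchases per half-year bucket, read off the chart's
# data labels (rounded percentages, so they sum to 99 rather than 100).
years = [0.5 * i for i in range(1, 28)]  # 0.5, 1.0, ..., 13.5
share = [5, 10, 12, 10, 9, 8, 7, 6, 6, 5, 4, 4, 3, 2, 2, 1,
         1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]


def observation_window(years, share, max_tail=5):
    """Return the smallest horizon whose later purchases total <= max_tail %."""
    tail = sum(share)
    for y, s in zip(years, share):
        tail -= s  # share of purchases that happen after y years
        if tail <= max_tail:
            return y, tail
    return years[-1], 0


window, tail = observation_window(years, share)
print(f"window = {window} years, share beyond window = {tail}%")
```

With these labels the loop stops at the 8-year bucket, leaving 5% of subsequent purchases beyond it, which agrees with the figure quoted on the slide.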
Training data for predictive modeling.
Subsequent car buying preference generally depends upon multiple factors (additional data we have on our customers).
Customer profile. Independent variables which determine repeat/non-repeat behavior:-
Marital Status, Nationality, Work Place, Job, Car Model, Guest Age Range, Guest Income Range, Guest Gender, Government/Non-Government, No. of Dependents
Labelling as Repeat Customers:-
All repeat customers from 2005 till now: 1
Single-purchase customers whose purchase, since 2005, is at least 8 years old: 0
Data Partitioning:-
Training (to train the model): 55%
Validation (to test the model): 45%
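The labelling rule and the 55%/45% partition could be sketched as follows. This is a minimal illustration, not the deck's actual pipeline: the record layout, the function names label and partition, the fixed seed, and the use of a simple random (unstratified) split are all assumptions. Single-purchase customers younger than 8 years are left unlabelled here because they are the ones to be scored later.

```python
import random

WINDOW_YEARS = 8  # observation window taken from the slides


def label(n_purchases, years_since_first_purchase):
    """1 = repeat, 0 = non-repeat, None = too recent to label yet."""
    if n_purchases >= 2:
        return 1
    if years_since_first_purchase >= WINDOW_YEARS:
        return 0
    return None


def partition(records, train_frac=0.55, seed=0):
    """Shuffle a copy of the records and cut it into training/validation."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical records: (customer id, number of purchases, years since first purchase)
customers = [(i, 1 + i % 2, i % 12) for i in range(100)]
labelled = [(cid, label(n, yrs)) for cid, n, yrs in customers]
labelled = [(cid, y) for cid, y in labelled if y is not None]

train, valid = partition(labelled)
print(len(labelled), len(train), len(valid))
```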
Probabilistic Binary Classification using Decision Tree
Decision Tree Model: a non-linear prediction model for binary classification that segments the data hierarchically by partitioning it recursively. A tree structure of rules over the input variables is used to classify or predict according to the target variable (Repeat).
The outcome assigns each case a probability, to which a threshold (0.5) is applied. Cases with a probability above the threshold are predicted as repeat customers, and cases with a probability below it as non-repeat customers.
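To make the recursive partitioning concrete, here is a small from-scratch sketch. The slides do not say which tool or splitting criterion was used, so the Gini impurity, the "feature == value" rule form, the toy data, and every function name below are assumptions. Each leaf stores the share of repeat customers that reached it, and that share is the probability compared with the 0.5 threshold.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows, labels, features):
    """Best 'feature == value' rule by weighted Gini, or None if nothing splits."""
    best = None
    for f in features:
        for v in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] == v]
            right = [y for r, y in zip(rows, labels) if r[f] != v]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, v)
    return best


def build(rows, labels, features, depth):
    """Partition recursively; a leaf holds the share of repeat customers."""
    prob = sum(labels) / len(labels)
    split = best_split(rows, labels, features) if depth > 0 else None
    if split is None or split[0] >= gini(labels):
        return {"prob": prob}
    _, f, v = split
    yes = [(r, y) for r, y in zip(rows, labels) if r[f] == v]
    no = [(r, y) for r, y in zip(rows, labels) if r[f] != v]
    return {"feature": f, "value": v,
            "yes": build([r for r, _ in yes], [y for _, y in yes], features, depth - 1),
            "no": build([r for r, _ in no], [y for _, y in no], features, depth - 1)}


def predict_proba(tree, row):
    """Follow the rules down to a leaf and return its probability."""
    while "prob" not in tree:
        tree = tree["yes"] if row[tree["feature"]] == tree["value"] else tree["no"]
    return tree["prob"]


def classify(tree, row, threshold=0.5):
    """1 (repeat) if the probability is above the threshold, else 0."""
    return int(predict_proba(tree, row) > threshold)


# Hypothetical toy data using two of the listed variables.
rows = [{"marital": "married", "sector": "gov"},
        {"marital": "married", "sector": "private"},
        {"marital": "single", "sector": "gov"},
        {"marital": "single", "sector": "private"},
        {"marital": "married", "sector": "gov"},
        {"marital": "single", "sector": "private"}]
labels = [1, 1, 0, 0, 1, 1]

tree = build(rows, labels, ["marital", "sector"], depth=1)
married = {"marital": "married", "sector": "private"}
single = {"marital": "single", "sector": "gov"}
print(predict_proba(tree, married), classify(tree, married))
print(predict_proba(tree, single), classify(tree, single))
```

A case whose probability equals the threshold exactly is treated as non-repeat here; the slides do not specify that edge case.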
Compare Models
Each model classifies better than a naïve model (randomly selecting customers for solicitation). The Decision Tree outperforms the other models.
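One common way to express "better than random selection" is lift: the repeat rate among the top-scored customers divided by the overall repeat rate, which is what random selection would achieve on average. The slides do not state which comparison measure was used, so this choice, the function name lift_at, and the toy scores are assumptions for illustration only.

```python
def lift_at(scores, actual, fraction):
    """Repeat rate in the top `fraction` of scores divided by the overall rate."""
    ranked = sorted(zip(scores, actual), key=lambda t: t[0], reverse=True)
    k = max(1, round(len(ranked) * fraction))
    top_rate = sum(y for _, y in ranked[:k]) / k
    base_rate = sum(actual) / len(actual)  # expected rate of random selection
    return top_rate / base_rate


# Hypothetical model scores and actual outcomes (1 = repeat customer).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05, 0.02]
actual = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

print(lift_at(scores, actual, 0.2))
```

A lift above 1 means the model's top-ranked group contains more repeat customers than a randomly chosen group of the same size.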
Decision Tree Performance:-
The validation leaves are populated almost the same way as the training leaves, indicating no overfitting.
The misclassification rate stabilizes beyond 50 leaves, indicating that the model converges. The depth of the tree is good enough.
[Figure: Observing Probability Levels and Actual Repeat Customer Cases. X-axis: customer cases (0 to 1000). Y-axis: probability (0 to 1).]
The plot is more “crowded” at higher probability.
Probability Zone for Maximum Prediction
Probability above 0.8 will have most repeat customer cases!
[Figure: Probability rating on actual cases. X-axis: probability (0.1 to 1). Y-axis: 0 to 90,000 cases. Series: Repeat Customer, Non-repeat Customer.]
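The chart above is a count of actual repeat and non-repeat cases per probability level. A rough sketch of that tabulation follows, with the share of repeat customers in each bin as its "purity". The ten equal-width bins, the function name purity_by_bin, and the toy data are assumptions and not values from the deck.

```python
def purity_by_bin(scores, actual, n_bins=10):
    """Map bin index -> (repeat count, non-repeat count, purity) for non-empty bins."""
    counts = [[0, 0] for _ in range(n_bins)]
    for s, y in zip(scores, actual):
        b = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        counts[b][0 if y == 1 else 1] += 1
    return {b: (rep, non, rep / (rep + non))
            for b, (rep, non) in enumerate(counts) if rep + non}


# Hypothetical scores and actual outcomes (1 = repeat customer).
scores = [0.95, 0.85, 0.82, 0.55, 0.52, 0.15, 0.12]
actual = [1, 1, 1, 1, 0, 0, 0]

bins = purity_by_bin(scores, actual)
for b, (rep, non, purity) in sorted(bins.items()):
    print(f"{b / 10:.1f}-{(b + 1) / 10:.1f}: repeat={rep} non-repeat={non} purity={purity:.2f}")
```

The bins with the highest purity mark the probability zone worth targeting first.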
But who is “ready” to buy?
Most subsequent purchases occur in this time window. Superimpose it on the probability model to get a priority rating.
[Figure: Percentage of customers making a 2nd purchase after the 1st purchase, with respect to time interval. X-axis: time interval, 0 to 10 in steps of 0.5. Y-axis: 0 to 0.1.]
Strategic Solicitation (Ranking and Targeting):-
1. Score all single-purchase customers within 8 years using different predictive models (logistic regression / neural network / decision tree).
2. Identify the region of maximum purity, derived from the actual case frequency at different probability levels.
3. Adjust the weight using the “Distribution of time between purchases for repeat customers” as a factor that depends on the time since purchase. Assign the peak (frequency) a weight of 100%, weight the other periods by their frequency relative to the peak, and modify the probability of being a repeat customer by this factor.
4. Remove delinquent customers.
5. Sort the remaining customers in descending order of repeat customer probability score from the model.
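Steps 3 to 5 can be sketched as below. This is an illustration under stated assumptions: the half-year shares are the data labels from the interval distribution chart (up to 8 years), each label is taken as the upper edge of its bucket, "modify" is read as multiplying the probability by the weight, and the field names and the functions time_weight and rank are hypothetical.

```python
import math

# % of subsequent purchases per half-year bucket, 0.5 .. 8.0 years,
# read off the interval distribution chart (assumed bucket alignment).
SHARE = [5, 10, 12, 10, 9, 8, 7, 6, 6, 5, 4, 4, 3, 2, 2, 1]
PEAK = max(SHARE)  # the peak bucket gets a weight of 100%


def time_weight(years_since_purchase):
    """Frequency of the customer's bucket relative to the peak bucket."""
    idx = math.ceil(years_since_purchase / 0.5) - 1
    idx = min(max(idx, 0), len(SHARE) - 1)  # clamp to the 0.5 .. 8.0 range
    return SHARE[idx] / PEAK


def rank(customers):
    """Drop delinquent customers, weight probabilities, sort descending."""
    eligible = [c for c in customers if not c["delinquent"]]
    scored = [(c["prob"] * time_weight(c["years_since_purchase"]), c["id"])
              for c in eligible]
    return [cid for _, cid in sorted(scored, key=lambda t: t[0], reverse=True)]


# Hypothetical scored customers.
customers = [
    {"id": "A", "prob": 0.90, "years_since_purchase": 7.8, "delinquent": False},
    {"id": "B", "prob": 0.60, "years_since_purchase": 1.4, "delinquent": False},
    {"id": "C", "prob": 0.95, "years_since_purchase": 1.5, "delinquent": True},
    {"id": "D", "prob": 0.85, "years_since_purchase": 3.0, "delinquent": False},
]

print(rank(customers))
```

In this toy example customer A has the highest model probability among eligible customers but falls to the bottom of the list, because very few repeat purchases happen that long after the first one.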
Advantages of dashboard
Integrated data availability for all stakeholders (Operation, Telemarketing).
Summary area provides a snapshot of campaign performance.
Automated process with minimal work to launch a campaign.
Top management will observe the telemarketing feed and actual buying behavior to generate insight to improve sales.
Telemarketing can view the progress of their previous calls, and pursue customers who planned to buy but have not bought yet.
Future plans
1. Extend this campaign to customer acquisition through external data.
2. Automate Telemarketing updates through web pages.
3. Create SMS broadcast service.
4. Direct campaign to auto-dialer.