customer activation predictive model
Customer Activation: Activity Predictive Model
Customer Activation: Focus on Equity
Objectives
1. To predict the activity level of each customer in the near future (current model: 90 days)
2. To profile customer activity over time (i.e., activity states with durations)
3. To determine the recommendations to activate customers
Problem Dimensions
• People
– Who are the people likely to be inactive in the next month?
• Activity State
– What are the different states in the customer life cycle?
– What is the customer behaviour in a particular state?
• State Duration
– How long will the customer be in a particular state?
– What will be the transition time for a particular customer?
• Recommendation
– What strategy will be effective in preventing inactivity of a particular customer?
– What strategy can bring a customer back from the inactive state to the active state?
Analysis Process
Distributions: Inactive period behaviour, life cycle of customer
Comparative views: First time inactive vs. current inactive, inactive vs. active customer life cycle
ETL Merge Filter Visualize
Storage: ACMIIL (Trades)
Data Formats: Dates, categories, numeric value ranges, etc.
File Formats: Comma, Tilde, or Tab delimited
Customer types: Individual vs. Institutions, etc.
Transaction types: Buying/Selling, First time inactive, current inactive
Identifiers: Client Code, CommonClientCode
Timeline: Daily, Monthly
Aggregates: Counts, Sums of EQ buy, Sums of EQ sell
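The merge and aggregate steps listed above can be sketched as follows. This is a minimal illustration only: the tilde-delimited layout and the column names (ClientCode, TradeDate, Side, Amount) are assumptions made for the example, not the actual trade file format.

```python
import csv
import io
from collections import defaultdict

# Hypothetical tilde-delimited trade extract; column names are assumptions.
RAW = """ClientCode~TradeDate~Side~Amount
C001~2015-01-05~BUY~1000.0
C001~2015-01-20~SELL~400.0
C001~2015-02-02~BUY~250.0
C002~2015-01-11~SELL~900.0
"""

def aggregate_client_month(text, delimiter="~"):
    """Return {(client, 'YYYY-MM'): {'tx_count', 'sum_buy', 'sum_sell'}}."""
    rows = defaultdict(lambda: {"tx_count": 0, "sum_buy": 0.0, "sum_sell": 0.0})
    for rec in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        key = (rec["ClientCode"], rec["TradeDate"][:7])  # month = YYYY-MM prefix
        agg = rows[key]
        agg["tx_count"] += 1
        if rec["Side"] == "BUY":
            agg["sum_buy"] += float(rec["Amount"])
        else:
            agg["sum_sell"] += float(rec["Amount"])
    return dict(rows)

client_month = aggregate_client_month(RAW)
```

Each key of `client_month` corresponds to one client-month row; passing `delimiter=","` or `delimiter="\t"` would cover the other two file formats.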
Activity Modelling - Outline
Trades
Data
• Summary
• Discovery
Model
• Setup
• Application
• Code
• Results
• Setup - Next Steps
• Application - Next Steps
Future Work
Data: Summary
DQ Issues
• Statistical measures (e.g., mean) errors
– Units field has negative values
– Too large or too small values
• Text data: Mismatches
• Numerical data: Unrealistic ranges
• Numerical data: Spurious values
Sizing for technology
• ~7M EQ and ~1M DER trades per year
• ~100k trading customers currently on the platform, of whom 1/3rd transacted in the last 6 months
Analysis caution
• Data distributions are highly skewed, e.g., a few high-amount Txs by one or two individuals
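The data quality issues and the skew caution can be illustrated with a small check. This is a sketch under stated assumptions: the field names, the sample records and the `max_amount` plausibility bound are all invented for the example.

```python
from statistics import mean, median

# Hypothetical trade records; field names and bounds are assumptions.
trades = [
    {"units": 10, "amount": 1_000.0},
    {"units": -5, "amount": 500.0},        # negative units: DQ issue
    {"units": 20, "amount": 2_000.0},
    {"units": 15, "amount": 1_500.0},
    {"units": 30, "amount": 5_000_000.0},  # one very large transaction
]

def dq_flags(record, max_amount=1_000_000.0):
    """Return the list of data-quality issues found in one record."""
    issues = []
    if record["units"] < 0:
        issues.append("negative_units")
    if not 0 < record["amount"] <= max_amount:
        issues.append("amount_out_of_range")
    return issues

flagged = [(i, dq_flags(t)) for i, t in enumerate(trades) if dq_flags(t)]

amounts = [t["amount"] for t in trades]
# With a skewed distribution the mean is pulled far above the median.
mean_amount, median_amount = mean(amounts), median(amounts)
```

Here a single large transaction moves the mean to about a million while the median stays at 1,500, which is why means alone can mislead on this kind of data.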
Data: Discovery - Inactivity Count
[Figure: histogram of frequency vs. inactivity count; all clients inactive at least once for greater than 91 days]
Insights
All clients have been inactive at least once
Data Particulars
Data Duration: Apr 2012 - Sep 2015
Each Row: Client-month
Client Category: Individual and HUF
# of Rows: 366421
# of Columns: 31
# of Unique Clients: 49444
Data: Discovery - Average Inactive Duration
[Figure: histogram of frequency vs. average inactivity duration (days), with the peak marked at 300 days]
Insights
The histogram of average inactivity duration has its maximum frequency at 300 days
Data: Discovery - First Time Inactive vs. Currently Inactive
[Figure: two histograms of frequency vs. vintage (yrs), one for first time inactive and one for currently inactive customers; marked values: 5 yrs, 7 yrs]
Insights
Currently inactive customers are a mix of first-time inactive customers and customers in later inactive periods, which makes it harder to study current inactivity alone. This brings about the need to study each activity level or state separately.
Data: Discovery - Random customer 1: currently inactive (Tx Amount)
[Figure: two panels, Sum Amt. (sold) and Sum Amt. (bought), each with a trend curve]
Data: Discovery - Random customer 1: currently inactive (Tx Count)
[Figure: two panels, Tx Count (sold) and Tx Count (bought), each with a trend curve]
Data: Discovery - Random customer 2: currently active (Tx Amount)
[Figure: two panels, Sum Amt. (sold) and Sum Amt. (bought), each with a trend curve]
Data: Discovery - Random customer 2: currently active (Tx Count)
[Figure: two panels, Tx Count (sold) and Tx Count (bought), each with a trend curve]
Data: Discovery Insights
• All clients have been inactive (> 91 days inactivity) at least once
• The most likely inactivity duration is ~300 days, i.e., if a customer becomes inactive, there is a high chance of a long inactivity period
• Customer behaviour is different before various inactive states
• Each inactive state (i.e., first time, second time, etc.) needs to be modelled separately
• There are different trend curves in a customer's life cycle that each customer follows
• The trend curves may be grouped together into a finite set of representative trend curves
• All the above may be modelled using a state-space approach
• A simple binary approximation is the logistic regression model
Model: Setup
Data Set Creation
Three Year Trade Data (Total Available Data)
• 60% Used for Training Model (Training Data)
• 20% Used for Validating Model (Validation Data)
• 20% Used for Testing Model (Test Data)
[Figure: customer timeline starting at the account opening date, with active periods marked 0 and inactive periods marked 1, including the first-time inactive period]
Inactivity: Defined as 0 transactions in 91 consecutive days
Hypothesis: A customer's state can be predicted using transaction data
Logistic Regression Model
• To find predictive variables
• To predict the next state of the customer
Steps: Summary after training the model, Model Validation, Model Test
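The 91-day inactivity definition can be sketched as a labelling function over a customer's trade dates. This is an illustration, not the deck's own code: it assumes that trade days themselves do not count towards a quiet spell and that the open spell after the last trade is measured up to a hypothetical `observation_end` date.

```python
from datetime import date, timedelta

INACTIVITY_DAYS = 91  # threshold taken from the inactivity definition

def inactive_periods(trade_dates, observation_end):
    """Return (start, end) date pairs of spells with no trades for >= 91 days.

    Assumption: the trade days themselves do not count towards the spell,
    and the open spell after the last trade is measured up to observation_end.
    """
    dates = sorted(set(trade_dates))
    # Sentinel one day past the observation window closes the final spell.
    dates.append(observation_end + timedelta(days=1))
    periods = []
    for prev, nxt in zip(dates, dates[1:]):
        quiet_days = (nxt - prev).days - 1
        if quiet_days >= INACTIVITY_DAYS:
            periods.append((prev + timedelta(days=1), nxt - timedelta(days=1)))
    return periods

trades = [date(2014, 1, 10), date(2014, 1, 25), date(2014, 6, 1), date(2014, 6, 20)]
periods = inactive_periods(trades, observation_end=date(2014, 12, 31))
```

For this made-up customer the function finds two inactive periods: one between the January and June trades, and one from the last trade to the end of the observation window.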
Model: Code View
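The code shown on this slide is not part of the transcript. As a stand-in, the following is a minimal sketch of a logistic regression fitted by gradient descent with a 60% / 20% / 20% row split. Everything in it is an assumption for illustration: the data is synthetic, the single feature (a scaled monthly Tx count) is hypothetical, and the original model may well have used a statistics package and many more variables.

```python
import math
import random

def sigmoid(z):
    # Clamp z to keep math.exp within floating-point range.
    z = max(-35.0, min(35.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=1000):
    """Fit weights (bias first) by batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def predict_proba(w, x):
    """Probability of state 1 (inactive) for one feature vector."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

# Synthetic client-month rows (NOT real data): one feature, the scaled
# transaction count of the month; fewer trades -> more likely inactive next.
rng = random.Random(0)
rows = []
for _ in range(500):
    tx = rng.random()  # scaled monthly Tx count in [0, 1)
    label = 1 if rng.random() < sigmoid(4 - 8 * tx) else 0
    rows.append(([tx], label))

# 60% / 20% / 20% split by row.
n_train, n_valid = int(0.6 * len(rows)), int(0.2 * len(rows))
train = rows[:n_train]
valid = rows[n_train:n_train + n_valid]
test = rows[n_train + n_valid:]

w = train_logistic([x for x, _ in train], [y for _, y in train])

def accuracy(data, threshold=0.5):
    """Share of rows whose thresholded prediction matches the label."""
    hits = sum((predict_proba(w, x) >= threshold) == bool(y) for x, y in data)
    return hits / len(data)

valid_acc, test_acc = accuracy(valid), accuracy(test)
```

The fitted coefficient `w[1]` comes out negative on this synthetic data, reflecting the built-in assumption that a higher transaction count lowers the probability of inactivity.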
Model: Application
Actual States:    0 0 1 0 0 0 1
Predicted States: 0 0 0 0 1 0 1
[Figure annotations: Inactive State miss, Active State miss]

Confusion matrix:
                   Predicted Positive   Predicted Negative
Actual Positive    a                    b
Actual Negative    c                    d

a - True Positive
b - False Negative
c - False Positive
d - True Negative

H_a = d / N_0 - Active state hit rate
M_a = b / N_0 - Active state miss rate
H_i = a / N_1 - Inactive state hit rate
M_i = c / N_1 - Inactive state miss rate
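The four rates can be computed directly from the confusion matrix cells. The slide does not define N_0 and N_1; the sketch below assumes N_0 = b + d (rows predicted active) and N_1 = a + c (rows predicted inactive), which is the reading under which each hit rate and its miss rate sum to 1. The cell counts are made up.

```python
def hit_miss_rates(a, b, c, d):
    """a=TP, b=FN, c=FP, d=TN with 'positive' = inactive (state 1).

    Assumption: N_0 = b + d (rows predicted active) and N_1 = a + c
    (rows predicted inactive), so each hit/miss pair sums to 1.
    """
    n0, n1 = b + d, a + c
    return {"H_a": d / n0, "M_a": b / n0, "H_i": a / n1, "M_i": c / n1}

# Made-up counts for illustration only.
rates = hit_miss_rates(a=80, b=10, c=20, d=90)
```

With these counts, 90 of the 100 rows predicted active are correct and 80 of the 100 rows predicted inactive are correct.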
Model: Results
[Figure: for each threshold, two bar charts of counts, Correct vs. Wrong Predicted Active State and Correct vs. Wrong Predicted Inactive State, with bars labelled a, b, c, d; annotated rates are the active state miss rate and the inactive state hit rate]
Annotated values per threshold (chart labels not recoverable):
Threshold = 0.25: 0.01%, 84.5%
Threshold = 0.35: 60.8%, 93.1%
Threshold = 0.50: 40.3%, 0.0009%
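How the counts behind such charts change with the threshold can be sketched as follows. The sketch assumes the threshold is applied to the predicted probability of state 1 (inactive), which the slides do not state, and it uses made-up probabilities rather than the model's output.

```python
def counts_at_threshold(probs, actual, threshold):
    """Count correct/wrong predictions per predicted state.

    A row is predicted inactive (1) when its probability >= threshold.
    """
    out = {"correct_active": 0, "wrong_active": 0,
           "correct_inactive": 0, "wrong_inactive": 0}
    for p, y in zip(probs, actual):
        pred = 1 if p >= threshold else 0
        state = "inactive" if pred == 1 else "active"
        out[("correct_" if pred == y else "wrong_") + state] += 1
    return out

# Made-up probabilities of inactivity and actual states (1 = inactive).
probs = [0.10, 0.20, 0.30, 0.40, 0.45, 0.60, 0.80, 0.90]
actual = [0, 0, 1, 0, 1, 1, 1, 1]

sweep = {t: counts_at_threshold(probs, actual, t) for t in (0.25, 0.35, 0.50)}
```

In this toy data, raising the threshold removes the wrong inactive predictions but increases the number of inactive customers wrongly predicted as active, which is the trade-off a threshold choice has to balance.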
Model: Application (next steps) - Multi-period
Hypothesis: Error rates can be decreased by taking multiple periods into account for predictions
Actual States:    0 0 1 0 0 0 1
Predicted States: 0 0 0 0 1 0 1
[Figure: timeline of active periods (0) and inactive periods (1)]
Flow:
1. Model predicts 1
2. Check the customer's transactions in the next 30 days
3. If Tx = 0 is True, the model output is 1; if False, the model output is 0
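The multi-period check can be sketched as a small post-processing rule. The branch directions are an assumption read from the flowchart labels: a predicted 1 (inactive) is kept only if the customer makes no transaction in the following 30 days, and is reset to 0 otherwise. The follow-up transaction counts are hypothetical.

```python
def multi_period_output(model_prediction, tx_count_next_30_days):
    """Confirm a predicted inactive state (1) against the next 30 days.

    Assumption: a prediction of 1 is kept only if the customer makes no
    transaction in the following 30 days; otherwise it is reset to 0.
    Predictions of 0 pass through unchanged.
    """
    if model_prediction == 1 and tx_count_next_30_days > 0:
        return 0
    return model_prediction

predicted = [0, 0, 0, 0, 1, 0, 1]
tx_next_30 = [3, 1, 0, 2, 4, 5, 0]  # hypothetical follow-up Tx counts
adjusted = [multi_period_output(p, tx) for p, tx in zip(predicted, tx_next_30)]
```

In this example the fifth prediction is reset to 0 because the customer traded in the follow-up window, while the last prediction stays at 1.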
Future…
State-space Model
[Figure: state diagram with the states On-boarded, active, inactive and closed]
Technical Model: State-space Model
• In the applied model we have taken only two states: 0 for active and 1 for inactive
• Between the active and inactive states, a customer can transition into many different states, as shown in the state-space model above
• By applying the state-space model, the complete life cycle of a customer will be profiled, including:
i. Previous state
ii. Next state
iii. Time spent in a particular state
iv. Behaviour of the customer in a particular state
v. Behaviour of the customer just before a transition
vi. Behaviour of the customer before going off-board, etc.
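A first step towards such a profile is to turn a customer's state sequence into spells with durations and to count the transitions between them. The sketch below is only an illustration of that bookkeeping, not a state-space model; the monthly granularity and the example sequence are assumptions.

```python
from collections import Counter
from itertools import groupby

def profile_states(monthly_states):
    """Return (spells, transitions) for one customer's monthly state sequence.

    spells: list of (state, duration_in_months), in order.
    transitions: Counter of (previous_state, next_state) pairs between spells.
    """
    spells = [(state, len(list(run))) for state, run in groupby(monthly_states)]
    transitions = Counter((s1, s2) for (s1, _), (s2, _) in zip(spells, spells[1:]))
    return spells, transitions

# Hypothetical sequence using the state names from the state-space slide.
sequence = ["on-boarded", "active", "active", "active", "inactive", "inactive",
            "active", "inactive", "inactive", "inactive", "closed"]
spells, transitions = profile_states(sequence)
```

The spells give the time spent in each state, and the transition counts give the previous-state and next-state pairs that a fuller model would need to estimate.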
Discussions and Questions
Back-up Slides
Model: Discovery - Predictive Variables
Model: Setup (next steps) - Customer Sampling
For the current model, the training, validation and testing datasets have been created by sampling rows, where each row holds a particular customer's transaction amounts aggregated on a monthly basis.
Alternatively, the training, validation and testing datasets can be created by sampling on a per-customer basis.
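Per-customer sampling can be sketched as follows: whole customers, rather than individual client-month rows, are assigned to one of the three sets, so the same customer never appears in both training and test data. The 60/20/20 fractions follow the current setup; the row layout, the client codes and the seed are hypothetical.

```python
import random

def split_by_customer(rows, fractions=(0.6, 0.2, 0.2), seed=0):
    """Assign whole customers (not rows) to train/validation/test sets."""
    clients = sorted({r["client"] for r in rows})
    random.Random(seed).shuffle(clients)
    n_train = int(fractions[0] * len(clients))
    n_valid = int(fractions[1] * len(clients))
    bucket = {}
    for i, c in enumerate(clients):
        if i < n_train:
            bucket[c] = "train"
        elif i < n_train + n_valid:
            bucket[c] = "valid"
        else:
            bucket[c] = "test"
    split = {"train": [], "valid": [], "test": []}
    for r in rows:
        split[bucket[r["client"]]].append(r)
    return split

# Hypothetical client-month rows: 10 clients with 3 months each.
rows = [{"client": f"C{c:03d}", "month": m} for c in range(10) for m in range(3)]
split = split_by_customer(rows)
```

Because all rows of a customer land in the same set, the test results measure how the model performs on customers it has not seen during training.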