Customer Churn: A Data Science Use Case in Telecom
TRANSCRIPT
Chris Chen - Data Analyst @ Shaw Communications
The Problem
Who?
Why?
How?
CRISP-DM: Cross Industry Standard Process for Data Mining
Business Understanding
Data Understanding
Data Preparation
Modeling
Evaluation
Deployment
Business Understanding
Business objectives:
• Reduce customer churn
• Minimize the cost (effort) of retention
• Derive insights
Success criteria:
• Metrics
• Non-metrics
Data Understanding
Data sources
Internal: Customer Data, Product Data, Transactions and Customer interactions
External
Data quality: missing values, duplicates, outliers, etc.
First insights: Binary Classification, Skewed (Imbalanced)
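An imbalance check like the one behind this first insight can be sketched in plain Python (the label column here is made up for illustration):

```python
from collections import Counter

def class_balance(labels):
    """Return the fraction of rows in each class."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}

# Hypothetical churn flags: 1 = churned, 0 = retained
labels = [0] * 95 + [1] * 5
print(class_balance(labels))  # {0: 0.95, 1: 0.05} -> heavily imbalanced
```

A minority class of a few percent is what motivates the metric discussion later in the talk.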
Data Preparation
ETL
Feature selection
Feature engineering
Train/ validation/ test
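The split step might look like this minimal plain-Python sketch (row list and fractions are hypothetical; for imbalanced churn data a stratified split is usually preferable):

```python
import random

def train_val_test_split(rows, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle rows and cut them into train / validation / test lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n = len(rows)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test

train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```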
Feature Selection (subtraction)
• Expert voting system
• Engage SMEs from various backgrounds: tech vs non-tech, marketing vs customer care, management vs frontline sales.
• Select the 10-15 most important features that may impact customer churn / customer retention
Wrapper based methods
Random Forest/ Boosting Tree - also good hints for feature engineering
Filter methods
Missing Values Ratio
Low Variance Filter (less informative features)
High Correlation Filter (similar features)
Highly correlated pairs can also be good candidates for interaction features, e.g. Age vs Income
PCA
Feature Selection (subtraction) - science
Feature Engineering (addition)
Business acumen, combined with domain knowledge and model understanding
Ordinal vs Nominal: label encoding, one-hot encoding
Transformation: normalization, log and so on
Imputation: missing values
Feature Interactions
Time series
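The encoding, transformation, and imputation steps above can be sketched in plain Python (category names and values are hypothetical; libraries like pandas or scikit-learn would normally do this):

```python
import math

def label_encode(values, order):
    """Ordinal -> integer codes, using an explicit category order."""
    codes = {cat: i for i, cat in enumerate(order)}
    return [codes[v] for v in values]

def one_hot(values):
    """Nominal -> one 0/1 dummy column per category."""
    cats = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in cats}

def log_transform(values):
    """log(1 + x) to compress right-skewed features such as spend."""
    return [math.log1p(v) for v in values]

def impute_mean(values):
    """Fill missing entries (None) with the column mean."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]

print(label_encode(["low", "high", "mid"], ["low", "mid", "high"]))  # [0, 2, 1]
print(one_hot(["dsl", "fiber", "dsl"]))  # {'dsl': [1, 0, 1], 'fiber': [0, 1, 0]}
```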
Modeling
Classification:
Gradient Boosting Tree (GBT) - I’m a big fan of XGBoost
Random Forest (RF)
Logistic Regression (LR) or Elastic Net (EN)
Neural Network (NN)
Support Vector Machine (SVM)
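As a dependency-free illustration of the simplest model on this list, here is logistic regression trained with stochastic gradient descent on toy, made-up churn data (in practice one would reach for scikit-learn or XGBoost rather than hand-rolling this):

```python
import math

def train_logreg(X, y, lr=0.1, epochs=500):
    """Logistic regression fit by per-sample gradient descent (sketch)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1 / (1 + math.exp(-z))       # sigmoid
            err = p - yi                     # gradient of log loss wrt z
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(w, b, xi):
    z = sum(wj * xj for wj, xj in zip(w, xi)) + b
    return 1 if 1 / (1 + math.exp(-z)) >= 0.5 else 0

# Hypothetical rows: [tenure_years, support_calls]; short tenure + many calls -> churn
X = [[0.5, 5], [1, 4], [2, 3], [5, 1], [6, 0], [7, 1]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_logreg(X, y)
print([predict(w, b, xi) for xi in X])
```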
Modeling - Metric
• Precision = True Positive/ (True Positive + False Positive)
• Recall = True Positive / (True Positive + False Negative)
• Accuracy = (TP + TN) / (TP + TN + FP + FN)
For a dataset that contains 99% non-churn customers and 1% churn customers, if we predicted all customers as non-churn, the accuracy would still be 99%
Area Under Curve (AUC): precision vs recall trade-off for skewed classifications
Confusion matrix (Predictions vs Actuals):
• True Positive: churn customers correctly predicted as churn
• False Positive: non-churn customers incorrectly predicted as churn
• False Negative: churn customers incorrectly predicted as non-churn
• True Negative: non-churn customers correctly predicted as non-churn
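The four confusion-matrix cells give all three metrics; a plain-Python sketch, using a 99:1 dataset to show why accuracy alone misleads on imbalanced churn data:

```python
def classification_metrics(y_true, y_pred):
    """Precision, recall, and accuracy from confusion-matrix counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"precision": precision, "recall": recall, "accuracy": accuracy}

# 1% churners; predicting "nobody churns" still scores 99% accuracy
y_true = [1] * 1 + [0] * 99
y_pred = [0] * 100
print(classification_metrics(y_true, y_pred))
# {'precision': 0.0, 'recall': 0.0, 'accuracy': 0.99}
```

The zero recall exposes what the headline accuracy number hides: not a single churner was caught.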
Modeling - Ensemble
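One common ensemble scheme is hard (majority) voting over several classifiers' predictions; the model outputs below are made up for illustration:

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine per-model prediction lists by hard voting, row by row."""
    combined = []
    for votes in zip(*predictions_per_model):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

# Hypothetical outputs of three classifiers on five customers
gbt = [1, 0, 1, 1, 0]
rf  = [1, 0, 0, 1, 0]
lr  = [0, 0, 1, 1, 1]
print(majority_vote([gbt, rf, lr]))  # [1, 0, 1, 1, 0]
```

With an odd number of voters, ties cannot occur; soft voting (averaging predicted probabilities) is a common alternative.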
Evaluation - Model excellence vs Business excellence
[Chart: model accuracy vs interpretability trade-off. High accuracy / low interpretability: Random Forest, Boosting, Deep Learning, Neural Network, Nearest Neighbours, SVM. High interpretability / lower accuracy: Linear/Logistic Regression, Decision Trees, Naive Bayes]
Deployment
Rule of Thumb:
Business Engagement
Thank you!