Customer Churn: A Data Science Use Case in Telecom
TRANSCRIPT
Chris Chen - Data Analyst @ Shaw Communications
The Problem
Who?
Why?
How?
CRISP-DM: Cross Industry Standard Process for Data Mining
Business Understanding
Data Understanding
Data Preparation
Modeling
Evaluation
Deployment
Business Understanding
Business objectives:
• Reduce customer churn
• Minimize the cost (effort) of retention
• Derive insights
Success criteria:
• Metrics
• Non-metrics
Data Understanding
Data sources
Internal: Customer Data, Product Data, Transactions and Customer interactions
External
Data quality: missing values, duplicates, outliers, etc.
First insights: Binary Classification, Skewed (Imbalanced)
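An imbalance check like the one behind this first insight can be sketched in plain Python (the label column here is made up for illustration):

```python
from collections import Counter

def class_balance(labels):
    """Return the fraction of rows in each class."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}

# Hypothetical churn flags: 1 = churned, 0 = retained
labels = [0] * 95 + [1] * 5
print(class_balance(labels))  # {0: 0.95, 1: 0.05} -> heavily imbalanced
```

A minority class of a few percent is what motivates the metric discussion later in the talk.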
Data Preparation
ETL
Feature selection
Feature engineering
Train/ validation/ test
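The split step might look like this minimal plain-Python sketch (row list and fractions are hypothetical; for imbalanced churn data a stratified split is usually preferable):

```python
import random

def train_val_test_split(rows, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle rows and cut them into train / validation / test lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n = len(rows)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test

train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```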
Feature Selection (subtraction)
• Expert voting system
• Engage SMEs from various backgrounds: tech vs non-tech, marketing vs customer care, management vs frontline sales.
• Select the 10-15 most important features that may impact customer churn / customer retention
Wrapper based methods
Random Forest/ Boosting Tree - also good hints for feature engineering
Filter methods
Missing Values Ratio
Low Variance Filter (less informative features)
High Correlation Filter (similar features)
Highly correlated pairs can also be good candidates for interaction features, e.g. Age vs Income
PCA
Feature Selection (subtraction) - science
Feature Engineering (addition)
Business acumen, combined with domain knowledge and model understanding
Ordinal vs Nominal: label encoding, one-hot encoding
Transformation: normalization, log and so on
Imputation: missing values
Feature Interactions
Time series
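The encoding, transformation, and imputation steps above can be sketched in plain Python (category names and values are hypothetical; libraries like pandas or scikit-learn would normally do this):

```python
import math

def label_encode(values, order):
    """Ordinal -> integer codes, using an explicit category order."""
    codes = {cat: i for i, cat in enumerate(order)}
    return [codes[v] for v in values]

def one_hot(values):
    """Nominal -> one 0/1 dummy column per category."""
    cats = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in cats}

def log_transform(values):
    """log(1 + x) to compress right-skewed features such as spend."""
    return [math.log1p(v) for v in values]

def impute_mean(values):
    """Fill missing entries (None) with the column mean."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]

print(label_encode(["low", "high", "mid"], ["low", "mid", "high"]))  # [0, 2, 1]
print(one_hot(["dsl", "fiber", "dsl"]))  # {'dsl': [1, 0, 1], 'fiber': [0, 1, 0]}
```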
Modeling
Classification:
Gradient Boosting Tree (GBT) - I’m a big fan of XGBoost
Random Forest (RF)
Logistic Regression (LR) or Elastic Net (EN)
Neural Network (NN)
Support Vector Machine (SVM)
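As a dependency-free illustration of the simplest model on this list, here is logistic regression trained with stochastic gradient descent on toy, made-up churn data (in practice one would reach for scikit-learn or XGBoost rather than hand-rolling this):

```python
import math

def train_logreg(X, y, lr=0.1, epochs=500):
    """Logistic regression fit by per-sample gradient descent (sketch)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1 / (1 + math.exp(-z))       # sigmoid
            err = p - yi                     # gradient of log loss wrt z
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(w, b, xi):
    z = sum(wj * xj for wj, xj in zip(w, xi)) + b
    return 1 if 1 / (1 + math.exp(-z)) >= 0.5 else 0

# Hypothetical rows: [tenure_years, support_calls]; short tenure + many calls -> churn
X = [[0.5, 5], [1, 4], [2, 3], [5, 1], [6, 0], [7, 1]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_logreg(X, y)
print([predict(w, b, xi) for xi in X])
```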
Modeling - Metric
• Precision = True Positive/ (True Positive + False Positive)
• Recall = True Positive / (True Positive + False Negative)
• Accuracy = (TP + TN) / (TP + TN + FP + FN)
For a dataset that contains 99% non-churn customers and 1% churn customers, if we predicted all customers as non-churn, the accuracy would still be 99%
Area Under Curve (AUC): precision vs recall trade-off for skewed classifications
Confusion matrix (Predictions vs Actuals):
• True Positive: churn customers correctly predicted as churn
• False Positive: non-churn customers incorrectly predicted as churn
• False Negative: churn customers incorrectly predicted as non-churn
• True Negative: non-churn customers correctly predicted as non-churn
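The four confusion-matrix cells give all three metrics; a plain-Python sketch, using a 99:1 dataset to show why accuracy alone misleads on imbalanced churn data:

```python
def classification_metrics(y_true, y_pred):
    """Precision, recall, and accuracy from confusion-matrix counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"precision": precision, "recall": recall, "accuracy": accuracy}

# 1% churners; predicting "nobody churns" still scores 99% accuracy
y_true = [1] * 1 + [0] * 99
y_pred = [0] * 100
print(classification_metrics(y_true, y_pred))
# {'precision': 0.0, 'recall': 0.0, 'accuracy': 0.99}
```

The zero recall exposes what the headline accuracy number hides: not a single churner was caught.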
Modeling - Ensemble
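One common ensemble scheme is hard (majority) voting over several classifiers' predictions; the model outputs below are made up for illustration:

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine per-model prediction lists by hard voting, row by row."""
    combined = []
    for votes in zip(*predictions_per_model):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

# Hypothetical outputs of three classifiers on five customers
gbt = [1, 0, 1, 1, 0]
rf  = [1, 0, 0, 1, 0]
lr  = [0, 0, 1, 1, 1]
print(majority_vote([gbt, rf, lr]))  # [1, 0, 1, 1, 0]
```

With an odd number of voters, ties cannot occur; soft voting (averaging predicted probabilities) is a common alternative.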
Evaluation - Model excellence vs Business excellence
[Chart: model accuracy vs interpretability trade-off. High accuracy / low interpretability: Random Forest, Boosting, Deep Learning, Neural Network, Nearest Neighbours, SVM. High interpretability / lower accuracy: Linear/Logistic Regression, Decision Trees, Naive Bayes]
Deployment
Rule of Thumb:
Business Engagement
Thank you!