Bagging and Boosting Classification Trees to Predict Churn: Insights from the US Telecom Industry

Forthcoming, Journal of Marketing Research

Aurélie Lemmens (joint work with Christophe Croux)


Page 1

Page 2

The 2002 Churn Tournament organised by the Teradata Center for CRM at Duke University

Churn means defecting from a company, i.e., taking one's business elsewhere

Customer database from an anonymous U.S. wireless telecom company

Challenge: predicting churn in order to design targeted retention strategies (Bolton et al. 2000; Ganesh et al. 2000; Shaffer and Zhang 2002)

Details can be found in Neslin et al. (2004)

The Context

Page 3

The US Wireless Telecom market (2004)

182.1 million subscribers

Leader in market share: Cingular Wireless

26.9% total market volume

turnover US$19.4 billion / net income US$201 million

Other major players: AT&T, Verizon, Sprint and Nextel

Mergers & Acquisitions: Cingular with AT&T Wireless, Sprint with Nextel

The Context (cont’d)

Page 4

Churn

High churn rates: 2.6% a month

Causes: increased competition, lack of differentiation,

market saturation

Cost: $300 to $700 to replace a lost customer (sales support, marketing, advertising, etc.)

Targeted retention strategies

The Context (cont’d)

Page 5

Formulation of the Churn Problem

Churn as a Classification issue:

Classify a customer i, characterized by K variables x_i = (x_i1, x_i2, …, x_iK), as:

Churner: y_i = +1

Non-churner: y_i = −1

Churn is the binary response variable to predict: y_i = f(x_i)

Which binary choice model f(·) should be chosen?


Page 7

Classification Models in Marketing

Simple binary logit choice model (e.g. Andrews et al. 2002)

Models allowing for the heterogeneity in consumers’ response:

Finite mixture model (e.g. Wedel and Kamakura 2000)

Hierarchical Bayes model (e.g. Yang and Allenby 2003)

Non-parametric choice models:

Decision trees, neural nets (e.g. Thieme et al. 2000; West et al. 1997)

Bagging (Breiman 1996), Boosting (Freund and Schapire 1996),

Stochastic gradient boosting (Friedman 2002)

Mostly ignored in the marketing literature

S.G.B. won the Tournament (Cardell, from Salford Systems)
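The simple binary logit benchmark in the list above can be sketched in a few lines; the synthetic data and predictor count here are purely illustrative.

```python
# Minimal binary logit churn benchmark on synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                            # 5 illustrative predictors
y = (X[:, 0] + rng.normal(size=1000) > 1.0).astype(int)   # 1 = churner, 0 = non-churner

logit = LogisticRegression(max_iter=1000).fit(X, y)
scores = logit.predict_proba(X)[:, 1]                     # churn propensity scores
```

The propensity scores can then be thresholded, or ranked to pick the riskiest customers for a retention campaign.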

Page 8

Decision Trees for Churn

Example: a tree splitting on the change in consumption (< 0.5 vs. ≥ 0.5), the number of customer care calls (< 3 vs. ≥ 3), age (< 26, 26 to 55, ≥ 55) and handset price (< $150 vs. ≥ $150), with each leaf classified as churner (Yes) or non-churner (No).
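A tree of this kind can be sketched with scikit-learn. The data and variable names below are illustrative stand-ins for the slide's example, not the tournament data, and the splits the tree finds will differ from those on the slide.

```python
# Fit and print a small churn-style classification tree on made-up data.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
X = np.column_stack([
    rng.normal(0, 1, 500),       # change in consumption (illustrative)
    rng.integers(0, 6, 500),     # customer care calls
    rng.integers(18, 80, 500),   # age
    rng.uniform(20, 300, 500),   # handset price, $
])
# Toy churn rule: consumption dropped AND many care calls
y = (X[:, 0] < 0.5) & (X[:, 1] >= 3)

tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(export_text(tree, feature_names=[
    "change_consumption", "care_calls", "age", "handset_price"]))
```

`export_text` prints the nested if/else structure, which is what makes trees attractive as interpretable churn models.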

Page 9

Bagging and Boosting

Machine Learning Algorithms

Principle: classifier aggregation (Breiman, 1996)

Tree-based method (e.g. Currim et al. 1988)

Bagging: Bootstrap AGGregatING

Page 10

Calibration sample Z = {(x_i, y_i)}, i = 1, …, N

Random sample Z*_1 → f̂*_1(x)

Random sample Z*_2 → f̂*_2(x)

(base classifier: e.g. a tree)

Page 11

Aggregating bootstrap samples

f̂*_1(x), f̂*_2(x), f̂*_3(x), …, f̂*_B(x)

Churn propensity score: f̂_bag(x) = (1/B) Σ_{b=1}^{B} f̂*_b(x)

Churn classification: ĉ_bag(x) = sign(f̂_bag(x))

Page 12

Let the calibration sample be Z = {(x_1,y_1), …, (x_i,y_i), …, (x_N,y_N)}

B bootstrap samples Z*_b, b = 1, 2, …, B

From each Z*_b, a base classifier (e.g. tree) is estimated, giving B score functions: f̂*_1(x), …, f̂*_b(x), …, f̂*_B(x)

The final classifier is obtained by averaging the scores: f̂_bag(x) = (1/B) Σ_{b=1}^{B} f̂*_b(x)

The classification rule is carried out via ĉ_bag(x) = sign(f̂_bag(x))

Bagging
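The bagging procedure on this slide can be sketched directly: B bootstrap samples, one tree per sample, scores averaged, class taken as the sign. Data here are synthetic and the base classifier is a fully grown scikit-learn tree.

```python
# Bagging of classification trees: average B bootstrap-trained tree scores.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_score(X, y, X_new, B=25, seed=0):
    """y takes values +1 (churner) / -1 (non-churner)."""
    rng = np.random.default_rng(seed)
    N = len(X)
    scores = np.zeros(len(X_new))
    for _ in range(B):
        idx = rng.integers(0, N, N)                      # bootstrap sample Z*_b
        tree = DecisionTreeClassifier().fit(X[idx], y[idx])
        scores += tree.predict(X_new)                    # f̂*_b(x) in {-1, +1}
    f_bag = scores / B                                   # averaged propensity score
    return f_bag, np.sign(f_bag)                         # score and classification

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 4))
y = np.where(X[:, 0] > 0, 1, -1)
f_bag, c_bag = bagging_score(X, y, X)
```

Averaging over bootstrap replicates is what stabilizes the otherwise high-variance single tree.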

Page 13

Winner of the Teradata Churn Modeling Tournament (Cardell, Golovnya and Steinberg, Salford Systems).

Stochastic Gradient Boosting

Data are adaptively resampled:

• Previously misclassified observations get higher weights

• Previously well-classified observations get lower weights
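Stochastic gradient boosting (Friedman 2002) is available off the shelf; a minimal sketch on synthetic data, where `subsample < 1` is what makes the gradient boosting "stochastic" (each tree is fit on a random fraction of the data):

```python
# Stochastic gradient boosting sketch with scikit-learn.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(600, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # toy churn labels

sgb = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1,
    subsample=0.5,        # random half-sample at each boosting iteration
    max_depth=3, random_state=0,
).fit(X, y)
scores = sgb.predict_proba(X)[:, 1]              # churn propensity scores
```

Each new tree focuses on the observations the current ensemble handles worst, which is the adaptive-resampling idea described above.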

Page 14

Data

Calibration sample (balanced): equal proportion of churners = 50%, N = 51,306

Validation hold-out sample (proportional): real-life proportion of churners = 1.8%, N = 100,462

Each customer i is described by x_i = (x_1, …, x_46) and the churn outcome y_i = +1 or −1:

Behavioral predictors, e.g. the average monthly minutes of use

Company-interaction variables, e.g. the mean unrounded minutes of customer care calls

Customer demographics, e.g. the number of adults in the household
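A balanced calibration sample like the one above is typically built by keeping every churner and undersampling the non-churners to a 50/50 mix; a sketch on synthetic labels with the slide's 1.8% churn rate:

```python
# Build a 50/50 balanced sample from a ~1.8%-churn population.
import numpy as np

rng = np.random.default_rng(4)
y = (rng.random(100_000) < 0.018).astype(int)   # ~1.8% churners

churners = np.flatnonzero(y == 1)
non_churners = np.flatnonzero(y == 0)
keep = rng.choice(non_churners, size=len(churners), replace=False)
balanced_idx = np.concatenate([churners, keep])  # equal counts of each class

print(y[balanced_idx].mean())   # → 0.5
```

The proportional hold-out sample is simply left untouched, so validated error rates and lifts reflect the real-life churn rate.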

Page 15

Research Questions

Do bagging and boosting provide better results than other benchmarks?

What are the financial gains to be expected from this improvement?

What are the most relevant churn drivers or triggers that marketers could watch for?

How should estimated scores obtained from a balanced calibration sample be corrected when predicting rare events like churn?

Page 16

Comparing Error Rates…

Model*                         Validated Error Rate**

Binary Logit Model             0.400
Bagging (tree-based)           0.374
Stochastic Gradient Boosting   0.460

* Model estimated on the balanced calibration sample
** Error rates computed on the hold-out proportional validation sample

Page 17

Bias due to Balanced Sampling

Overestimation of the number of churners

Several bias correction methods exist (see e.g. Cosslett 1993; Donkers et al. 2003; Franses and Paap 2001, pp. 73-75; Imbens and Lancaster 1996; King and Zeng 2001a,b; Scott and Wild 1997).

However, most are dedicated to traditional models (e.g. logit).

We discuss two corrections for bagging and boosting.

Page 18

The Bias Correction Methods

The weighting correction:

Based on marketers' prior beliefs about the churn rate, i.e. the proportion of churners among their customers, we attach weights to the observations of the balanced calibration sample.

The intercept correction:

Take a non-zero cut-off value τ_B such that the proportion of predicted churners in the calibration sample equals the actual a priori proportion of churners.
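Both corrections are easy to sketch numerically. The scores and the 1.8% prior below are illustrative; the intercept correction picks the cut-off as a quantile of the scores, and the weighting correction reweights a 50/50 balanced sample back to real-life class proportions.

```python
# The two bias corrections for scores estimated on a balanced sample.
import numpy as np

def intercept_correction_cutoff(scores, prior):
    """Cut-off tau_B such that the fraction classified as churners,
    via sign(score - tau_B), equals the a priori churn rate `prior`."""
    return np.quantile(scores, 1.0 - prior)

def case_weights(y, prior):
    """Weights turning a 50/50 balanced sample (y in {+1, -1}) back
    into the real-life churner / non-churner proportions."""
    return np.where(y == 1, prior / 0.5, (1 - prior) / 0.5)

scores = np.random.default_rng(5).random(10_000)
tau_B = intercept_correction_cutoff(scores, prior=0.018)
churn_hat = np.sign(scores - tau_B)
print((churn_hat == 1).mean())   # ≈ 0.018
```

The weights can be passed to any estimator that accepts `sample_weight`, while the cut-off correction leaves the fitted scores untouched and only shifts the classification threshold.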

Page 19

Let the calibration sample be Z = {(x_1,y_1), …, (x_i,y_i), …, (x_N,y_N)}

B bootstrap samples Z*_b, b = 1, 2, …, B

From each Z*_b, a base classifier (e.g. tree) is estimated, giving B score functions: f̂*_1(x), …, f̂*_b(x), …, f̂*_B(x)

The final classifier is obtained by averaging the scores: f̂_bag(x) = (1/B) Σ_{b=1}^{B} f̂*_b(x)

The classification is carried out via ĉ_bag(x) = sign(f̂_bag(x) − τ_B)

Bagging

Page 20

Assessing the Best Bias Correction…

Validated Error Rates** by Bias Correction

Model*                  No correction   Intercept   Weighting
Binary logit model      0.400           0.035       0.018
Bagging (tree-based)    0.374           0.034       0.025
S.G. boosting           0.460           0.034       0.018

* Model estimated on the balanced calibration sample
** Error rates computed on the hold-out proportional validation sample

Page 21

The Top-Decile Lift

Focuses on the most critical group of customers regarding their churn risk — the top 10% riskiest customers, the ideal segment for targeting a retention marketing campaign.

Top-decile lift = π̂_10% / π̂

with π̂_10% = the proportion of churners in this risky segment

and π̂ = the proportion of churners in the whole validation set
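The lift defined above is straightforward to compute from scores and outcomes; a sketch:

```python
# Top-decile lift: churn rate among the 10% highest-scored customers
# divided by the overall churn rate.
import numpy as np

def top_decile_lift(scores, y):
    """y is 1 for churners, 0 otherwise."""
    n_top = max(1, int(0.1 * len(scores)))
    top = np.argsort(scores)[::-1][:n_top]   # riskiest 10% of customers
    return y[top].mean() / y.mean()

# Sanity check: a perfect ranking on a 10%-churn population gives lift 10.
y = np.array([1] * 10 + [0] * 90)
assert abs(top_decile_lift(np.arange(100, 0, -1), y) - 10.0) < 1e-9
```

A lift of 1 corresponds to random targeting; on the 1.8%-churn validation sample, a lift of 2 means the top decile contains twice as many churners as a random decile would.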

Page 22

Financial Gains: Neslin et al. (2004)

N : customer base of the company

α : percentage of targeted customers (here, 10%)

ΔTop decile : increase in top-decile lift

γ : success rate of the incentive among the churners

LVC : lifetime value of a customer (Gupta, Lehmann and Stuart 2004)

δ : incentive cost per customer

ψ : success rate of the incentive among the non-churners.

Gain = N α ΔTop-decile π̂ [ γ (LVC − δ) + δ ψ ]

with π̂ the real-life proportion of churners.

Page 23

Top-Decile Lift with Intercept Correction

[Figure: top-decile lift (1.6 to 2.6) versus the number of iterations (0 to 100) for Bagging, Stochastic Gradient Boosting and the Binary Logit Model*; bagging and boosting improve the lift by +26% over the logit benchmark.]

* Model estimated on the balanced sample, and lift computed on the validation sample.

Page 24

Validated** Top-Decile Lift

Model*                         No / Intercept correction   Weighting correction
Binary logit model             1.775                       1.764
Bagging (tree-based)           2.246                       1.549
Stochastic gradient boosting   2.290                       1.632

* Model estimated on the balanced calibration sample
** Lifts computed on the hold-out proportional validation sample

Page 25

Financial Gains

If we consider

N : customer base of 5,000,000 customers

α : 10% of targeted customers

γ : 30% success rate of the incentive among the churners

LVC : $2,500 lifetime value of a customer

δ : $50 incentive cost per customer

ψ : 50% success rate of the incentive among the non-churners

Gain = N α ΔTop-decile π̂ [ γ (LVC − δ) + δ ψ ]

Page 26

Financial Gains

Additional financial gain expected from a retention marketing campaign targeted using the scores predicted by bagging instead of the logit model:

ΔTop decile : 0.471 (= 2.246 − 1.775)

Gain = + $3,214,800

Additional financial gain expected from a retention marketing campaign targeted using the scores predicted by bagging instead of a random selection:

ΔTop decile : 1.246 (= 2.246 − 1.000)

Gain = + $8,550,000
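The gain expression is garbled in the source; reading it as Gain = N · α · ΔTop-decile · π̂ · [γ(LVC − δ) + δψ], with π̂ = 1.8% from the validation sample, appears to reproduce both dollar figures above once the lift gains are rounded to 0.47 and 1.25. A quick arithmetic check under that assumption:

```python
# Financial-gain arithmetic, assuming the reconstructed formula
#   Gain = N * alpha * d_lift * pi * (gamma * (LVC - delta) + delta * psi)
# with the parameter values given on the slides above.
N = 5_000_000      # customer base
alpha = 0.10       # fraction of customers targeted
pi = 0.018         # real-life churn rate (validation sample)
gamma = 0.30       # incentive success rate among churners
LVC = 2_500        # lifetime value of a customer, $
delta = 50         # incentive cost per customer, $
psi = 0.50         # incentive take-up among non-churners

def gain(d_lift):
    return N * alpha * d_lift * pi * (gamma * (LVC - delta) + delta * psi)

print(round(gain(0.47)))   # bagging vs. logit  -> 3214800
print(round(gain(1.25)))   # bagging vs. random -> 8550000
```

The per-unit-of-lift gain, N·α·π̂·[γ(LVC − δ) + δψ] = $6,840,000, makes the business case: even a modest lift improvement scales to millions over a five-million-customer base.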

Page 27

Most Important Churn Triggers

Bagging

Page 28

Partial Dependence Plots

[Figure: two partial dependence plots for bagging — probability to churn (%) versus the change in monthly minutes of use (−1000 to 2000), and probability to churn (%) versus equipment days (0 to 1500).]

Page 29

Partial Dependence Plot

[Figure: probability to churn (%, 49 to 51) versus a further predictor.]

Page 30

Conclusions: Main Findings

1. Bagging and S.G. boosting are substantially better classifiers than the binary logit choice model:

Improvement of 26% in the top-decile lift,

Good diagnostic measures offering face validity,

Interesting insights about potential churn drivers,

Bagging is conceptually simple and easy to implement.

2. Intercept correction constitutes an appropriate bias correction for bagging when using a balanced sampling scheme.

Page 31

Thanks for your attention

Page 32

From Profit to Financial Gains

Gain = Profit_1 − Profit_2

For a targeting rule j (j = 1, 2) with top-decile lift λ_j = π̂_j,10% / π̂:

Profit_j = N α [ γ λ_j π̂ LVC − δ (γ λ_j π̂ + ψ (1 − λ_j π̂)) − c ]

γ λ_j π̂ LVC : LVC of a churner who does not churn

δ (γ λ_j π̂ + ψ (1 − λ_j π̂)) : incentive cost for the churners retained + non-churners targeted

c : contact cost

Taking the difference, the terms in c cancel, giving

Gain = N α ΔTop-decile π̂ [ γ (LVC − δ) + δ ψ ]