“how to project customer retention” revisited: the...

16
How to Project Customer RetentionRevisited: The Role of Duration Dependence Peter S. Fader a , & Bruce G.S. Hardie b & Yuzhou Liu c & Joseph Davin d & Thomas Steenburgh e a The Wharton School of the University of Pennsylvania b London Business School c Man Numeric d Two Six Capital e Darden Graduate School of Business, University of Virginia Available online 24 April 2018 Abstract Cohort-level retention rates typically increase over time, and the beta-geometric (BG) distribution has proven to be a robust model for capturing and projecting these patterns into the future. According to this model, the phenomenon of increasing cohort-level retention rates is purely due to cross-sectional heterogeneity; an individual customers propensity to churn does not change over time. In this paper we present the beta-discrete- Weibull (BdW) distribution as an extension to the BG model, one that allows individual-level churn probabilities to increase or decrease over time. In addition to capturing the phenomenon of increasing cohort-level retention rates, this new model can also accommodate situations in which there is an initial dip in retention rates before they increase (i.e., a U-shaped cohort-level retention curve). A key nding is that even when aggregate retention rates are monotonically increasing, the individual-level churn probabilities are unlikely to be declining over time, as conventional wisdom would suggest. We carefully explore these connections between heterogeneity, duration dependence, and the shape of the retention curve, and draw some managerially relevant conclusions, e.g., that accounting for cross-sectional heterogeneity is more important than accounting for any individual-level dynamics in churn propensities. © 2018 Direct Marketing Educational Foundation, Inc. dba Marketing EDGE. All rights reserved. Keywords: beta-geometric (BG) distribution; beta-discrete-Weibull (BdW) distribution; retention rate dynamics Introduction Any researcher working with data from a business that has a contractualrelationship with its customers (e.g., one with a subscription-based business model) will want models for projecting customer retention (or equivalently, tenure) as part of their toolkit. For example, estimates of the length of a customers relationship with the firm lie at the heart of any attempt to compute customer lifetime value (CLV). Similarly, such models are useful when evaluating the relative perfor- mance of different acquisition channels. Fader and Hardie (2007), hereafter FH, presented the beta- geometric (BG) distribution as a simple probability model for projecting customer retention. This model is based on an easy- to-understand storyof customer behavior, is simple to implement (e.g., can be done so in Excel), and its estimates of customer retention over a longitudinal holdout period have proven to be surprisingly accurate and robust. According to this model, the widely observed phenomenon of increasing cohort-level retention rates (Reichheld 1996) is purely due to cross-sectional heterogeneity, with individual customers having a constant propensity to churn. Cohort-level retention rates increase because those customers with high churn propensities drop out early on, leaving an ever-increasing Corresponding author at: 771 Jon M. Huntsman Hall, 3730 Walnut Street, Philadelphia, PA 19104-6340. E-mail addresses: [email protected], http://www.petefader.com (P.S. Fader), [email protected], http://www.brucehardie.com (B.G.S. Hardie). www.elsevier.com/locate/intmar https://doi.org/10.1016/j.intmar.2018.01.002 1094-9968© 2018 Direct Marketing Educational Foundation, Inc. dba Marketing EDGE. All rights reserved. Available online at www.sciencedirect.com ScienceDirect Journal of Interactive Marketing 43 (2018) 1 16

Upload: others

Post on 07-Apr-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: “How to Project Customer Retention” Revisited: The ...josephdavin.com/wp-content/uploads/2018/07/Fader-et-al-2018-How … · “How to Project Customer Retention” Revisited:

⁎ Corresponding author at: 771 Jon M. Huntsman Hall, 3730 Walnut Street,Philadelphia, PA 19104-6340.

E-mail addresses: [email protected], http://www.petefader.com(P.S. Fader), [email protected], http://www.brucehardie.com (B.G.S. Hardie).

www.elsevier

https://doi.org/10.1016/j.intmar.2018.01.0021094-9968© 2018 Direct Marketing Educational Foundation, Inc. dba Marketing EDGE. All rights reserved.

Available online at www.sciencedirect.com

ScienceDirectJournal of Interactive Marketing 43 (2018) 1–16

.com/locate/intmar

“How to Project Customer Retention” Revisited:The Role of Duration Dependence

Peter S. Fader a,⁎& Bruce G.S. Hardie b& Yuzhou Liu c& Joseph Davin d& Thomas Steenburgh e

a The Wharton School of the University of Pennsylvaniab London Business School

c Man Numericd Two Six Capital

e Darden Graduate School of Business, University of Virginia

Available online 24 April 2018

Abstract

Cohort-level retention rates typically increase over time, and the beta-geometric (BG) distribution has proven to be a robust model for capturingand projecting these patterns into the future. According to this model, the phenomenon of increasing cohort-level retention rates is purely due tocross-sectional heterogeneity; an individual customer’s propensity to churn does not change over time. In this paper we present the beta-discrete-Weibull (BdW) distribution as an extension to the BG model, one that allows individual-level churn probabilities to increase or decrease over time.In addition to capturing the phenomenon of increasing cohort-level retention rates, this new model can also accommodate situations in which thereis an initial dip in retention rates before they increase (i.e., a U-shaped cohort-level retention curve). A key finding is that even when aggregateretention rates are monotonically increasing, the individual-level churn probabilities are unlikely to be declining over time, as conventional wisdomwould suggest. We carefully explore these connections between heterogeneity, duration dependence, and the shape of the retention curve, and drawsome managerially relevant conclusions, e.g., that accounting for cross-sectional heterogeneity is more important than accounting for anyindividual-level dynamics in churn propensities.© 2018 Direct Marketing Educational Foundation, Inc. dba Marketing EDGE. All rights reserved.

Keywords: beta-geometric (BG) distribution; beta-discrete-Weibull (BdW) distribution; retention rate dynamics

Introduction

Any researcher working with data from a business that has a“contractual” relationship with its customers (e.g., one with asubscription-based business model) will want models forprojecting customer retention (or equivalently, tenure) as partof their toolkit. For example, estimates of the length of acustomer’s relationship with the firm lie at the heart of anyattempt to compute customer lifetime value (CLV). Similarly,

such models are useful when evaluating the relative perfor-mance of different acquisition channels.

Fader and Hardie (2007), hereafter FH, presented the beta-geometric (BG) distribution as a simple probability model forprojecting customer retention. This model is based on an easy-to-understand “story” of customer behavior, is simple toimplement (e.g., can be done so in Excel), and its estimates ofcustomer retention over a longitudinal holdout period haveproven to be surprisingly accurate and robust.

According to this model, the widely observed phenomenon ofincreasing cohort-level retention rates (Reichheld 1996) is purelydue to cross-sectional heterogeneity, with individual customershaving a constant propensity to churn. Cohort-level retentionrates increase because those customers with high churnpropensities drop out early on, leaving an ever-increasing

Page 2: “How to Project Customer Retention” Revisited: The ...josephdavin.com/wp-content/uploads/2018/07/Fader-et-al-2018-How … · “How to Project Customer Retention” Revisited:

1 FH called this the shifted-beta-geometric (sBG) model; the term “shifted” isused to make the distinction between two versions of the geometric distribution;one with support 0, 1, 2, …, and the other with support 1, 2, 3, …, with the term“shifted” being applied to the second version. (In our contract-duration setting,the first version would apply when T is defined as the number of contractrenewals the individual makes before they cancel their contract, rather than thelength of the individual’s relationship with the firm (measured in number ofcontract periods), as is the case above.) However, most applications of thismixture model are in settings in which the support is 1, 2, 3, …, yet the termshifted is not applied. So as to be consistent with this broader literature, we usethe label BG (rather than sBG) for this distribution.

2 P.S. Fader et al. / Journal of Interactive Marketing 43 (2018) 1–16

proportion of customers who have low propensities to churn. Thisassumption that an individual customer’s propensity to churndoes not change over time flies in the face of conventionalwisdom, which assumes that a customer’s propensity to churndecreases the longer their tenure with the firm.

While cohort-level retention rates are typically monotoni-cally increasing with tenure, we sometimes observe an initialdip before they increase (e.g., Israel 2005; Nitzan, Libai, andEin-Gar 2011, and one of the FH datasets), a phenomenon thatthe BG model cannot capture. In this paper we develop ageneralization of the BG model, one that both relaxes theassumption of time-invariant individual-level propensities tochurn and is sufficiently flexible to capture the phenomenon ofnon-monotonically increasing cohort-level retention rates.Surprisingly, we find that when the assumption of constantindividual-level propensities to churn is violated, it is morelikely that these propensities increase with tenure (rather thandecrease, as conventional wisdom would suggest).

This paper is organized as follows. In the next section we re-examine the work of FH, reviewing the BG model and itsempirical performance. We then present our generalization ofthe BG model, the beta-discrete-Weibull (BdW) distribution,and examine its performance using the two datasets presentedin FH. This is followed by an investigation of the properties ofthe cohort-level retention rates associated with the BdW model.We then investigate the robustness of our results by exploringsome alternative model specifications, and conclude with abrief discussion of the implications of this work.

A Brief Review of the BG Model

FH propose a simple probability model for characterizingand forecasting the length of a customer’s relationship with afirm in a contractual setting that is based on the following“as if” story of customer behavior:

i. At the end of each contract period, an individual decideswhether or not to renew their contract by tossing a coin:“heads” they renew their contract, “tails” they cancel it.

ii. For a given individual, the probability of a coin comingup “tails” does not change over time.

iii. The probability of a coin coming up “tails” varies acrosscustomers. (This implies that the coins are not assumed tobe “fair.”)

This is formalized in the following manner. Let the randomvariable T denote the length of an individual’s relationship withthe firm, and θ denote the probability of a given individual’scoin coming up “tails” when tossed. Assumptions (i) and (ii)are equivalent to assuming that T is distributed geometric withsurvivor function

S t jθð Þ ¼ 1−θð Þt; 0 b θ b 1; t ¼ 0; 1; 2;… ð1Þ

From the analyst’s perspective, the unobserved (andunobservable) θ is a realization of the random variable Θ.

Given its flexibility and mathematical convenience, the naturaldistribution for characterizing Θ is the beta distribution:

f θ jγ; δð Þ ¼ θγ−1 1−θð Þδ−1B γ; δð Þ ; γ; δ N 0: ð2Þ

It follows that for a randomly chosen individual,

S t γ; δjð Þ ¼Z 1

0S t θjð Þ f θ γ;j δð Þdθ ¼ B γ; δþ tð Þ

B γ; δð Þ ; t ¼ 0; 1; 2;…

ð3Þ

This beta mixture of geometrics is called the beta-geometric(BG) distribution.1 (See FH for model derivations andinformation on how to estimate the model parameters; alsosee Fader and Hardie (2014) for an alternative estimationapproach.)

To the best of our knowledge, this mixture model was firstderived by Pielou (1962), who used it to characterize the runslengths of species in plant populations. Potter and Parker(1964) were the first to use it as a model for duration-time data,using it to characterize the number of menstrual cycles awoman experiences before she conceives. Within the marketingliterature, it was used by Morrison and Perry (1970) as a modelof the number of units purchased on a given transactionoccasion, and by Buchanan and Morrison (1988) as a model ofresponse to promotional stimuli; also see Fox, Reddy, and Rao(1997). FH explored its properties as a model of the length of acustomer’s relationship with a firm in a contractual setting, withFader and Hardie (2010) taking the logical next step and usingit as the basis for calculating CLV.

At the heart of the FH paper is a customer metric of greatinterest to managers and analysts: the retention rate. Whencomputed at the level of the cohort, the period t retention rate isthe portion of period t customers (i.e., those who have“survived” to period t) who renew their contracts at the end ofthat period. This can be computed as

r t γ; δjð Þ ¼ S t γ; δjð ÞS t−1 γj ; δð Þ

¼ B γ; δþ tð ÞB γ; δþ t−1ð Þ

¼ δþ t−1γþ δþ t−1

; t ¼ 1; 2; 3;…

ð4Þ

Note that, for any values of γ and δ, this is an increasingfunction of time. It is important to note that there are no underlying

Page 3: “How to Project Customer Retention” Revisited: The ...josephdavin.com/wp-content/uploads/2018/07/Fader-et-al-2018-How … · “How to Project Customer Retention” Revisited:

Table 1Pattern of year-on-year renewals for a cohort of 1000 customers from twosegments (Regular and High End) acquired at the beginning of Year 1

# Customers

Year Regular High End

1 1000 10002 631 8693 468 7434 382 6535 326 5936 289 5517 262 5178 241 4919 223 46810 207 44511 194 42712 183 40913 173 394

2 A copy of the spreadsheet containing the analyses presented in this papercan be found at b insert URLN.

Table 2BG model estimation results

Eight-year CalibrationPeriod

Five-year CalibrationPeriod

Regular High End Regular High End

γ 0.704 0.668 0.764 1.281δ 1.182 3.806 1.296 7.790LL -1680.3 -1611.2 -1401.6 -1225.1

3P.S. Fader et al. / Journal of Interactive Marketing 43 (2018) 1–16

time dynamics at the level of the individual customer; seeassumption (ii) above. The increasing (aggregate/cohort-level)retention rate is simply due to a sorting effect in a heterogeneouspopulation.

To elaborate on this sorting effect, let ρ(t) denote theindividual-level probability that someone who has made t − 1renewals will renew at the next opportunity (i.e., P(heads)). Giventhe assumption of individual-level relationship durations charac-terized by the geometric distribution, ρ(t) = S(t | θ)/S(t − 1 | θ) =1 − θ. Recall that the unobserved (and unobservable) θ areviewed as realization of the random variableΘ. Similarly, ρ(t) is arealization of P(t).

Since ρ(t) is a function of θ, the distribution of P(t) is afunction of the posterior distribution of Θ across the period tcustomers (i.e., those who have made t − 1 contract renewals).Recalling Bayes’ theorem,

f θ γ; δ; t−1 renewalsjð Þ ¼ S t−1 θjð Þ f θ γ; δjð ÞS t−1 γ; δjð Þ

¼ θγ−1 1−θð Þδþt−2

B γ; δþ t−1ð Þ ; t ¼ 1; 2; 3;…ð5Þ

which is a beta distribution with parameters γ and δ + t − 1.Given that ρ(t) = 1 − θ, the distribution of P(t) across period tcustomers is simply the reflection of this posterior distributionabout θ = 0.5, which is a beta distribution with parametersδ + t − 1 and γ:

f ρ tð Þ jγ; δð Þ ¼ ρ tð Þδþt−2 1−ρ tð Þð Þγ−1B δþ t−1; γð Þ ; t ¼ 1; 2; 3;… ð6Þ

The mean of this distribution equals the expression for r(t | γ,δ) given in (4), i.e., r(t) = E[P(t)]. Dynamics in r(t) aresimply due to changes in the nature of the distribution of P(t),which are simply due to customers with higher churn propensitiesdropping out, leaving an ever-increasing proportion of customerswho have low propensities to churn.

Revisiting FH’s Analysis

We start by revisiting the empirical analysis presented inFH. Our objectives are two-fold. First, we wish to highlight therobustness of the BG model. Second, we wish to identify thephenomenon that motivates this work.

The data presented in Table 1, drawn from Berry and Linoff(2004), document the year-on-year renewals for two segments ofcustomers (“Regular” and “High End”) of an unspecified firm in acontractual setting. (See FH for further details.) For a nominalsample of 1000 customers acquired at the beginning of Year 1,we observe their pattern of renewals over 12 consecutive (annual)renewal opportunities. For example, 631 individuals in theRegular dataset renew their contract at the end of the first year,and are therefore customers in Year 2. Of these 631 individualswho have a contractual relationship with the firm in Year 2, 468renew their contract at the end of the year and are thereforecustomers in Year 3. And so on.

FH undertake an analysis in which the model is calibratedusing the first eight years of data (seven renewal opportunities)and its predictive performance assessed over the remaining fiveyears of data (five renewal opportunities). The estimationresults are presented in Table 2 (columns 2 and 3) and themodel-based estimates of survival and retention are comparedagainst the actual numbers for both datasets in Fig. 1.2

We note that this simple probability model does an excellentjob of predicting survival (and therefore retention) in theRegular dataset. The prediction of survival in the High Enddataset is good but not quite as impressive on retention as forthe Regular dataset. FH noted that “[d]espite the existence ofcertain unexplained “blips” as in Year 2 for the High Enddataset, the tracking/prediction plot for [r(t)] is very impressivethrough Year 12,” and made no further comment.

We now “stress test” the BG model by shortening thecalibration period to five years (four renewal opportunities),thereby lengthening the validation period to eight years. Theestimation results are presented in Table 2 (columns 4 and 5).Note that for the Regular cohort the parameter estimates arequite similar. In fact, evaluating the eight-year calibrationperiod likelihood function using the five-year calibration periodparameter estimates yields a log-likelihood of −1680.6. Thisstability in parameter estimates suggests that the BG model isan excellent characterization of the true data-generatingprocess. This conclusion is supported when we compare the

Page 4: “How to Project Customer Retention” Revisited: The ...josephdavin.com/wp-content/uploads/2018/07/Fader-et-al-2018-How … · “How to Project Customer Retention” Revisited:

Fig. 1. Comparing actual and BG-model-based estimates of survival (LHS) and retention (RHS) given an eight-year model calibration period. (The model-basednumbers to the right of the vertical dashed line are projections given the parameter values estimated using the data to the left of this line.)

4 P.S. Fader et al. / Journal of Interactive Marketing 43 (2018) 1–16

model-based estimates of survival and retention to the actualnumbers (top half of Fig. 2). While not as good as for aneight-year calibration period, the performance is still veryimpressive when we consider that we are predicting behavioracross a holdout period that is twice as long as the calibrationperiod.

The results for the High End dataset are a completelydifferent story. We note from Table 2 that the parameter

Fig. 2. Comparing actual and BG-model-based estimates of survival (LHS) and reteyear model calibration period. (The model-based numbers to the right of the vertical dthe left of this line.)

estimates are very sensitive to the length of the calibrationperiod, suggesting that the BG model is a not goodcharacterization of the true data-generating process for thisdataset. This problem is even more evident when we comparethe model-based estimates of survival and retention to theactual numbers (bottom half of Fig. 2). While the modelappears to be tracking actual survival in the calibration period,it progressively under-predicts survival with the passage of

ntion (RHS) for the Regular (top) and High End (bottom) datasets given a five-ashed line are projections given the parameter values estimated using the data to

Page 5: “How to Project Customer Retention” Revisited: The ...josephdavin.com/wp-content/uploads/2018/07/Fader-et-al-2018-How … · “How to Project Customer Retention” Revisited:

4 There are no closed-form expressions for the mean and variance of the dWdistribution. Looking at (8), we see that P(T = 1 | θ,c) = θ, which means the

5P.S. Fader et al. / Journal of Interactive Marketing 43 (2018) 1–16

time in the validation period. This is reflected in the failure ofthe model to capture the dynamics in the retention ratesobserved in this dataset.

For an eight-year calibration period, the dip in the retentionrate observed in Year 2 is effectively treated as an outlier thathas little impact on model estimation; the overall trend ofincreasing retention rates is adequately captured (as observed inFig. 1). However, it has a far greater influence on modelestimation when we shorten the model calibration period tofive years (as observed in the bottom-right plot in Fig. 2).Whereas it seemed acceptable for FH to brush aside the Year 2dip, the shorter calibration period shows that we cannotignore it.

Cohort-level retention rates are predominantly monotoni-cally increasing (as in the Regular dataset), and the BG model isa robust way to characterize such behaviour. However, it is nota robust model when faced with the type of cohort-levelretention rate pattern observed in the High End datset. We haveobserved such a dip in several other datasets (e.g., Israel 2005;Nitzan, Libai, and Ein-Gar 2011). This suggests the need for analternative, more flexible, model for characterizing andforecasting the length of a customer’s relationship with thefirm in a contractual setting.

The BdW Model

When a model doesn’t “work,” we question its underlyingassumptions. Reflecting on the “as if” story of buyer behaviorunderpinning the BG model, a number of people struggle withthe assumption that, for a given individual, the probability of acoin coming up “tails” does not change over time. They expectit to become more “headsy” over time (i.e., the individual isexpected to become more “loyal” the longer they remain acustomer).3

To accommodate this in a continuous-time environment, thenatural starting point would be to replace the exponentialdistribution (the continuous-time equivalent of the geometricdistribution) with the Weibull distribution, which allows for anindividual’s risk of canceling their contract to increase ordecrease as the length of the relationship with the firm increases(Murthy et al. 2004; Rinne 2009).Working within a discrete-timecontractual setting, the natural starting point is to use a discreteWeibull distribution.

A discrete-time equivalent of a continuous distribution canbe constructed by treating the discrete lifetime variable as theinteger part of the continuous lifetime and discretizing its cdfor, equivalently, survivor function (Lai 2013). Suppose thecontinuous lifetime random variable X is distributed Weibullwith survivor function

S x jλ; cð Þ ¼ exp −λxcð Þ ¼ exp −λð Þ½ �xc :

3 There are two standard explanations for such an expectation. The first isbased on an evolution of customer satisfaction argument (e.g., Bolton 1998),while the second is based on an increasing switching costs argument (e.g.,Burnham et al. 2003).

Letting exp (−λ) = 1 − θ, it follows that the survivorfunction associated with the discrete lifetime random variableT = ⌊X⌋ is

S t jθ; cð Þ ¼ 1−θð Þtc ; 0 b θ b 1; c N 0; t ¼ 0; 1; 2;…: ð7ÞThe associated pmf is given by

P T ¼ tjθ; cð Þ ¼ 1−θð Þ t−1ð Þc− 1−θð Þtc ; t ¼ 1; 2; 3;…: ð8Þ

This is the discrete Weibull (dW) distribution proposed byNakagawa and Osaki (1975). While there are other discreteWeibull distributions (Murthy et al. 2004, Chapter 13; Rinne2009, Section 3.3.1), this one is simple, flexible, and the bestanalogue of the continuous Weibull distribution (Bracquemondand Gaudoin 2003). We note that the dW collapses to thegeometric distribution when c = 1, just as the Weibull collapsesto the exponential distribution when c = 1.4

Under the dW distribution, the individual-level probabilitythat someone who has made t − 1 renewals will renew at thenext opportunity is given by

ρ t θ; cjð Þ ¼ S t θ; cjð ÞS t−1 θ; cjð Þ

¼ 1−θð Þtc− t−1ð Þc ; t ¼ 1; 2; 3;…

ð9Þ

When c = 1, we have a constant individual retentionprobability, i.e., P(heads) in the “story” underpinning the BGmodel. When c N 1, tc − (t − 1)c increases with time, whichmeans the proverbial coin becomes less “headsy” the longer theindividual remains a customer (i.e., the longer they remain acustomer, the less likely they are to renew their contract). Whenc b 1, tc − (t − 1)c decreases with time, which means the coinbecomes more “headsy” the longer the individual remains acustomer (i.e., we have an increasing individual-level retentionprobability). When c N 1, the discrete Weibull is said to exhibitpositive duration dependence (i.e., an increasing probability of“failure”). Similarly, when c b 1, it is said to exhibit negativeduration dependence (i.e., a decreasing probability of “failure”).

We fit this distribution to our two datasets (using the samefive-year calibration period). The estimation results are presentedin Table 3, and the associated tracking plots in Fig. 3. (SeeTable 4.)

As would be expected, our estimate of c is less than 1 (i.e.,negative duration dependence) for both datasets, which impliesthat individual-level retention probabilities increase over time.Comparing the LL values with those associated with the BGmodel (Table 2), we see that the dW does not fit the data nearlyas well as the BG. Looking at the tracking plots in Fig. 3, wesee that the dW fails to capture the retention rate dynamicsobserved in both datasets, and therefore fails to track the

pmf is reverse-J-shaped (i.e., the mode is at T = 1) when θ N 0.5. When c ≤ 1,the pmf is reverse-J-shaped for all values of θ. When θ b 0.5, the pmf has aninterior mode (i.e., the cdf is S-shaped) when c N ln [ln (1 − 2θ)/ ln (1 − θ)]/ln (2). (In contrast, the geometric distribution pmf is always reverse-J-shaped,which means its cdf is concave.)

Page 6: “How to Project Customer Retention” Revisited: The ...josephdavin.com/wp-content/uploads/2018/07/Fader-et-al-2018-How … · “How to Project Customer Retention” Revisited:

Table 3dW model estimation results

Regular High End

θ 0.374 0.138c 0.636 0.910LL -1404.0 -1226.5

Table 4BdW model estimation results

Regular High End

γ 0.523 0.259δ 0.894 1.722c 1.197 1.584LL -1401.4 -1222.7

6 P.S. Fader et al. / Journal of Interactive Marketing 43 (2018) 1–16

number of “surviving” customers. Comparing these plots withthose in Fig. 2, it is clear that, at least for these datasets, a modelthat explains increases in aggregate retention rates in terms ofheterogeneity alone (i.e., the BG, which assumes c = 1) doesbetter than one that explains it in term of duration dependencealone. We also note that the dW fails to capture the Year 2 dipin the High End retention rate curve.

So what happens when we allow for both heterogeneity andduration dependence? Assuming cross-sectional heterogeneityin θ is characterized by a beta distribution with parameters(γ,δ), it follows that for a randomly chosen individual,

S t γ; δ; cjð Þ ¼Z 1

0S t θ; cjð Þ f θ γ; δjð Þdθ

¼ B γ; δþ tcð ÞB γ; δð Þ ; t ¼ 0; 1; 2;…

ð10Þ

Fig. 3. Comparing actual and dW-model-based estimates of survival (LHS) and reteyear model calibration period. (The model-based numbers to the right of the vertical dthe left of this line.)

and

P T ¼ t γ;j δ; cð Þ ¼ S t−1 γ; δ; cjð Þ−S t γ; δ; cjð Þ¼ B γ; δþ t−1ð Þcð Þ−B γ; δþ tcð Þ

B γ; δð Þ ; t ¼ 1; 2; 3;…

ð11Þ

We call this parametric mixture model the beta-discrete-Weibull (BdW).

The associated aggregate/cohort-level retention rate is

r t γ; δ; cjð Þ ¼ S t γ;j δ; cð ÞS t−1 γ;j δ; cð Þ

¼ B γ; δþ tcð ÞB γ; δþ t−1ð Þcð Þ

¼ Γ δþ tcð ÞΓ δþ t−1ð Þcð Þ

Γ γþ δþ t−1ð Þcð ÞΓ γþ δþ tcð Þ ; t ¼ 1; 2; 3;…

ð12Þ

ntion (RHS) for the Regular (top) and High End (bottom) datasets given a five-ashed line are projections given the parameter values estimated using the data to

Page 7: “How to Project Customer Retention” Revisited: The ...josephdavin.com/wp-content/uploads/2018/07/Fader-et-al-2018-How … · “How to Project Customer Retention” Revisited:

7P.S. Fader et al. / Journal of Interactive Marketing 43 (2018) 1–16

We explore the shape of the associated retention rate curvebelow.

The distribution of P(t) (i.e., P(heads) across those customerswho have made t − 1 renewals) is

f ρ tð Þ jγ; δ; cð Þ ¼ 1

tc− t−1ð Þc1

ρ tð Þ

�1−ρ tð Þ 1

tc− t−1ð Þcn oγ−1

ρ tð Þ 1tc− t−1ð Þc

n oδþ t−1ð Þc

Bðγ; δþ t−1Þcð Þ ; t ¼ 1; 2; 3;…:

ð13Þ

The mean of this distribution is, of course, the (aggregate)retention rate r(t | γ,δ,c). (See Appendix A for the derivations.)When c = 1 (i.e., BdW → BG), (13) reduces to (6). Similarly,(12) reduces to (4).

We fit this model to both the Regular and High End datasetsusing a five-year calibration period. (See Appendix B fordetails of how to estimate the model parameters in Excel.) Theestimation results are presented in Table 2. Comparing theseresults with those for the BG model (Table 2), we see that theimprovement in model fit for the Regular dataset is negligible(LR = 0.35, p = 0.553), which means c is not significantlydifferent from 1. On the other hand, we observe a significantimprovement in fit for the High End dataset (LR = 4.77, p =0.029), which means c is significantly different from 1.Comparing these estimates of c to those obtained when fittingthe dW to these datasets (Table 3), we notice that dW estimateis downward biased, reflecting the well-know result thatunobserved heterogeneity induces spurious negative durationdependence in models for duration-time data (Kiefer 1988;Proschan 1963; Vaupel and Yashin 1985).

In Fig. 4 the model-based estimates of survival and retentionare compared against the actual numbers for both datasets. Thefigures speak for themselves; the performance of the BdWmodel is impressive.5

At first glance, it is not clear why the model captures thedownward blip in the aggregate retention rate curve we observein the High End dataset. We note that c ¼ 1:584, indicating thatat the level of the individual, customers become less likely torenew their contracts the longer they remain a customer.However, after some time, the sorting effect of heterogeneity(which causes the aggregate retention rate to increase overtime) dominates the individual-level decline and the aggregateretention rate starts rising. We explore this further in Section 5below.

5 Fitting the BdW model to the Regular dataset using an eight-year calibrationperiod yields the following parameter estimates: γ ¼ 0:456 , δ ¼ 0:779 , andc ¼ 1:284 . While these are slightly different from those associated with afive-year calibration period (Table 2), inserting these eight-year estimates in thelog-likelihood function associated with a five-year calibration period yields thesame value of −1401.4. Using an eight-year calibration period for the High Enddataset yields the following parameter estimates: γ ¼ 0:214, δ ¼ 1:427, andc ¼ 1:724. Inserting these eight-year estimates in the log-likelihood functionassociated with a five-year calibration period yields a value of −1222.9 (versus−1222.7 in Table 2). This relative insensitivity to the length of the modelcalibration period suggests that the BdW provides a good characterization of thetrue data-generating process.

How do our inferences about the underlying level of cross-sectional heterogeneity change when we allow for individual-level duration dependence? The parameters of the betadistribution can be characterized in terms of the mean E(Θ) = γ/(γ + δ) and polarization index ϕ = 1/(γ + δ + 1). Thelogic behind the polarization index is as follows: as γ, δ → 0(thus ϕ → 1), the values of θ are concentrated near θ = 0 andθ = 1 and we can think of the values of θ as being very different,or “highly polarized.” As γ, δ → ∞ (thus ϕ → 0), the betadistribution becomes a spike at its mean; there is no “polariza-tion” in the values of θ. Given the five-year calibration periodparameter estimates for the High End dataset in Table 2, ϕBG

¼ 0:099; given the parameter estimates in Table 2, ϕBdW ¼ 0:335. We observe that there is greater heterogeneity in the presence ofthe positive duration dependence to capture the dominant patternof increasing aggregate retention rates observed in the data.

Exploring the Shape of r(t)

Our analysis of the High End dataset demonstrates that theBdW model can capture an early dip in the aggregate retentionrate curve, something that is not immediately obvious given theunderlying model assumptions. The explanation given is one ofthe “battle” between individual-level duration dependence andcross-sectional heterogeneity. We now undertake a morethorough investigation of the shape of r(t) under the BdWmodel.

Let us start by considering three scenarios: Case 1 (γ = 4.75and δ = 14.25), Case 2 (γ = 0.5 and δ = 1.5), and Case 3 (γ =0.083 and δ = 0.250). While the associated distributions of Θhave the same mean (E(Θ) = 0.25), they take on quite differentshapes (Fig. 5). In Case 1, the distribution of Θ is relativelyhomogeneous (ϕ = 0.05) with an interior mode. In Case 2,there is quite a bit of heterogeneity (ϕ = 0.33) in thedistribution of Θ, with the majority of individuals havinglowish values of θ. The heterogeneity in Case 3 (ϕ = 0.75) isextreme; this U-shaped distribution indicates that some of theacquired customers have a high value of θ (which maps to alow probability of renewal), while a larger number of customershave small values of θ.

When c b 1, the cohort-level retention rates always increaseover time; see Fig. 6a, which shows the cohort-level retentionrates when c = 0.75 for the completely homogeneous case aswell as for Cases 1–3. Referring back to Fig. 5, the distributionof Θ for Case 3 is very polarized. This means we have a groupof people with high values of θ and they churn almostimmediately, leaving us with the group of customers who havevery low values of θ (and, because c b 1, their churnprobabilities only get smaller over time). Thus r(t) jumps andthen levels off very quickly. In Case 1, where the distribution ofΘ is relatively more homogeneous, the leveling-off process isslower, and is much closer to the homogeneous case. Case 2lies between these extremes.

When c N 1, an individual’s retention probability decreasesover time. But what is the countervailing effect of heterogeneity?Consider Fig. 6b which shows the cohort-level retention rateswhen c = 1.25 for differing levels of heterogeneity in the

Page 8: “How to Project Customer Retention” Revisited: The ...josephdavin.com/wp-content/uploads/2018/07/Fader-et-al-2018-How … · “How to Project Customer Retention” Revisited:

Fig. 4. Comparing actual and BdW-model-based estimates of survival (LHS) and retention (RHS) for the Regular (top) and High End (bottom) datasets given an five-year model calibration period. (The model-based numbers to the right of the vertical dashed line are projections given the parameter values estimated using the data tothe left of this line.)

8 P.S. Fader et al. / Journal of Interactive Marketing 43 (2018) 1–16

distribution of Θ across the cohort members. In the completelyhomogeneous case, we observe a monotonically decreasingretention rate. But when any heterogeneity is present, we see the“ruse of heterogeneity” (Vaupel and Yashin 1985): even though

Fig. 5. Shape of the beta distribution for Cases 1–3.

the individual-level customer retention probabilities are decreasingover time, the effect of a moderate amount of heterogeneity in thedistribution of Θ across cohort members is to cause the retentionrate to start increasing, either immediately or (if ϕ is low) after afew periods. We note that model can accommodate a one-perioddip in retention rates (as observed in Fig. 4) or a more prolongeddip (as observed in Case 1). As the level of heterogeneity increases,as in Cases 2 and 3, we find that the cohort-level retention ratemonotonically increases over time (even though the individual-level retention probability is a decreasing function of time).

To further explore the relationship between heterogeneityand dynamics, we look at the shape of the BdW retention curveas a function of c and ϕ for three different values of E(Θ) (0.10,0.25, 0.40) — see Fig. 7. The monotonically decreasing curveappears only in the degenerate case where there is noheterogeneity (ϕ → 0), so we ignore this special case tofocus on the other two general shapes that can occur. Whenc b 1, the aggregate retention curve must increase monotoni-cally. For this range of the shape parameter c, each individual’srenewal probability increases over time. The sorting effect ofheterogeneity also pushes the curve up. Since both forces workin the same direction, the resulting aggregate retention curvemust rise over time. When c N 1, either shape can arisedepending on the strength of duration dependence and the level

Page 9: “How to Project Customer Retention” Revisited: The ...josephdavin.com/wp-content/uploads/2018/07/Fader-et-al-2018-How … · “How to Project Customer Retention” Revisited:

Fig. 6. Shape of the beta-discrete-Weibull retention curve for different levels of heterogeneity in Θ and different values of c (with E(Θ) = 0.25 in all cases).

9P.S. Fader et al. / Journal of Interactive Marketing 43 (2018) 1–16

of heterogeneity in the underlying distribution of Θ. A U-shaped curve requires customers to have an increasingpropensity to churn in order to generate the initial dip. Thelatter part of the U results from the sorting effect ofheterogeneity. However, as the level of heterogeneity increases,the sorting effect of heterogeneity dominates individual-levelduration dependence and we have a monotonically increasingretention curve. Finally, as seen by the gradual “fanning out” ofthe curved line towards the top of the figure, we note that theshape of the BdW retention curve is surprisingly insensitive tovariations in E(Θ).

To gain additional insight into the U-shaped versus monoton-ically increasing shape of r(t), let us further investigate the shapeof the BdW retention curve for different levels of heterogeneity inθ, with E(Θ) = 0.25 and c = 1.25 in all cases. We see in Fig. 8

that the location of the minimum of r(t) increases as the level ofheterogeneity in θ decreases, with its location going to infinity asϕ → 0. As the level of heterogeneity increases, the location ofthe dip shifts to the left and it becomes less deep. We then get tothe point where r(1) = r(2) after which r(t) becomes S-shapedand then concave.

Clearly c N 1 when r(t) is U-shaped. Is there anything we canlearn about the sign of c from the shape of a monotonicallyincreasing r(t)? The answer is yes. Given r(1), r(2), and r(3), wecan (numerically) solve for the three BdW model parameters(γ, δ, c). We present in Fig. 9 a plot that indicates whether c isgreater than or less than 1 as a function of r(2) and r(3) given r(1) = 0.60 and r(1) = 0.75. (We limit the analysis to the region ofinterest above the dashed line where retention rates increasemonotonically (r(1) ≤ r(2) ≤ r(3)). We note that the area where

Page 10: “How to Project Customer Retention” Revisited: The ...josephdavin.com/wp-content/uploads/2018/07/Fader-et-al-2018-How … · “How to Project Customer Retention” Revisited:

Fig. 7. Shape of the beta-discrete-Weibull retention curve as a function of c andϕ for E(Θ) = {0.10,0.25,0.40}.

10 P.S. Fader et al. / Journal of Interactive Marketing 43 (2018) 1–16

the aggregate retention curve is monotonically increasing whileeach individual’s renewal probability increases over time (i.e.,c b 1) is very small. In order for c to be less than 1, r(2) must belarger than r(1) but r(3) cannot be much larger than r(2). In otherwords, the jump in the aggregate retention rate between periods 2and 3 is crucial to determining whether there is positive ornegative dependence at the level of the individual customer: the

Fig. 8. Evolution of the shape of the beta-discrete-Weibull retention curve as the lev

bigger the difference between r(2) and r(3), the more positiveduration dependence there is in the data (i.e., individualcustomers’ propensities to churn increase over time). (Weobserve this in Fig. 6 (where r(1) = 0.75). For the Case 3retention curves, r(2) = 0.952 and r(3) = 0.973 in Fig. 6a, whilethe corresponding numbers in Fig. 6b are 0.923 and 0.956. Thebigger gap is observed in Fig. 6b, which corresponds to the caseof c N 1).

Alternative Model Specifications

The conclusions drawn above about the nature of retentiondynamics are based on the assumption that the BdW is a validcharacterization of the true data-generating process. In thissection, we investigate the robustness of these results byexploring some alternative model specifications. We firstconsider a model based on an alternative distribution thatallows for retention dynamics at the level of the individualcustomer, and find that our results hold. This analysis stillsuffers from a key assumption associated with our BdW-basedanalysis, that of homogeneity in the parameter that determineswhether individual-level retention probabilities are decreasing,constant, or increasing over time. We therefore undertake someanalysis in which we allow the c parameter of the discrete-Weibull distribution to vary across customers. We do not findany strong evidence to support such an effect.

Changing the Underlying Weibull Distribution

In Section 4, we noted that the natural starting point foraccommodating individual-level retention probability dynamicsin a continuous-time environment would be to replace theexponential distribution (the continuous-time equivalent of thegeometric distribution) with the Weibull distribution (whichreduces to the exponential when c = 1). Alternatively, we could

el of heterogeneity in Θ increases (with E(Θ) = 0.25 and c = 1.25 in all cases).

Page 11: “How to Project Customer Retention” Revisited: The ...josephdavin.com/wp-content/uploads/2018/07/Fader-et-al-2018-How … · “How to Project Customer Retention” Revisited:

6 Strictly speaking, the B2 has what is known in the survival analysisliterature as an upside-down bathtub-shaped failure-rate/hazard function (Glaser1980) when s N 1; this maps to a U-shaped retention curve.7 When s = 1, this collapses to the Pareto distribution of the second kind,

which is the continuous-time equivalent of the BG model (Fader et al. 2017).8 The fact that the B2 survivor function contains the Gaussian hypergeometric

function means that it is impractical to estimate the model parameters in Excel.Our maximum likelihood estimates of the model parameters were obtainedusing MATLAB.

Fig. 9. Nature of individual-level duration dependence as a function of r(2) and r(3) given r(1) = 0.60 (LHS) and r(1) = 0.75 (RHS).

11P.S. Fader et al. / Journal of Interactive Marketing 43 (2018) 1–16

assume that the continuous lifetime random variable X isdistributed gamma, with pdf

f x jλ; sð Þ ¼ λsxs−1e−λx

Γ sð Þ :

where λ N 0 is the rate (or scale) parameter and s N 0 is the shapeparameter. The gamma distribution exhibits negative durationdependence when s b 1 and positive duration dependence whens N 1. This corresponds to increasing and decreasing retentionrates, respectively. When s = 1, the gamma distribution collapsesto the exponential distribution. While there is no closed-formexpression for the associated survivor function, it can be writtenin terms of the incomplete gamma function:

S x jλ; sð Þ ¼ 1−γ s; λxð ÞΓ sð Þ :

Each customer’s λ is unobserved (and unobservable) and istreated as a realization of the random variable Λ, which weassume to be distributed gamma with pdf

g λ j r; αð Þ ¼ αrλr−1e−αλ

Γ rð Þ :

Assuming the value of s is constant across the population, itfollows that, for a randomly chosen individual,

S x j r; α; sð Þ ¼Z ∞

0S x jλ; sð Þg λ j r; αð Þdλ

¼ 1−1

sB r; sð Þα

αþ x

� �r xαþ x

� �s

2

� F1 r þ s; 1; sþ 1;x

αþ x

� �;

where 2F1(·) is the Gaussian hypergeometric function. Thisgamma mixture of gammas is sometimes known as the “beta ofsecond kind” (B2) distribution. The implied retention curve is

monotonically increasing when s ≤ 1 and U-shaped whens N 1.6,7

The B2 is a continuous-time distribution. As noted in Section 4,the discrete-time equivalent of a continuous distribution can beconstructed by treating the discrete lifetime variable as the integerpart of the continuous lifetime and discretizing its survivorfunction. For T = ⌊X⌋,

P T ¼ t j r; α; sð Þ ¼ S t−1 j r; α; sð Þ−S t j r; α; sð Þ; t ¼ 1; 2; 3;…

Fitting this model to the High End dataset using a five-yearmodel calibration period yields the following parameter esti-mates: r ¼ 0:483, α ¼ 0:562, and s ¼ 2:721; the associated valueof the log-likelihood function is −1222.8.8 The model-basedestimates of survival and retention are compared against theactual numbers in Fig. 10.

In terms of fit and forecasting performance (for both survivaland retention), the results are almost identical to those associatedwith the BdW model. Our estimate of s is greater than 1, whichimplies that individual churn probabilities increase with tenure,even though the aggregate retention rate increases once we arepast the initial dip. This demonstrates that the conclusions drawnusing the BdW model are robust to changes in the underlyingmodel specification.

Relaxing the Assumption of Common c

The dW distribution has two parameters: θ and c. Whenderiving the BdW distribution, we allowed θ to vary across

Page 12: “How to Project Customer Retention” Revisited: The ...josephdavin.com/wp-content/uploads/2018/07/Fader-et-al-2018-How … · “How to Project Customer Retention” Revisited:

Fig. 10. Comparing actual and B2-model-based estimates of survival (LHS) and retention (RHS) for the High End dataset given an five-year model calibration period.(The model-based numbers to the right of the vertical dashed line are projections given the parameter values estimated using the data to the left of this line.)

12 P.S. Fader et al. / Journal of Interactive Marketing 43 (2018) 1–16

customers but assumed that everyone has the same c. To whatextent could our counterintuitive results regarding individual-levelretention dynamics be a consequence of this assumption? In otherwords, what happens if we allow c to vary across customers?

To the best of our knowledge, there is no continuousdistribution f(c) that can be used to characterize heterogeneityin c that results in a closed-form solution to the integral

Z ∞

01−θð Þtc f cð Þdc:

We will therefore capture heterogeneity in c using a discretemixing distribution.

Let us consider the two-component dW distribution withsurvivor function

S t jθ1; θ2; c1; c2; πð Þ ¼ πS t jθ1; c1ð Þþ 1−πð ÞS t jθ2; c2ð Þ; 0 b π b 1:

The five model parameters are not identified if we use thefive-year model calibration period (as we only observe fourrenewal opportunities). We will therefore use the whole dataset,which contains 12 renewal opportunities, in our investigations

Table 5Estimation results for the two-component dW distribution and its nestedvariants.

Model Specification:

– heterogeneity in θ ✓ ✓ ✗ ✗

– heterogeneity in c ✓ ✗ ✓ ✗

θ1 0.068 0.019θ2 0.288 0.302θ 0.135 0.160c1 0.853 0.668c2 1.386 2.230c 1.232 0.688π 0.710 0.599 0.840LL –2004.3 –2004.5 –2005.3 –2027.7AIC 4018.8 4017.0 4018.6 4059.4BIC 4043.2 4036.6 4038.2 4069.2Evidence ratio 4.5 1.9 4.4 3 × 109

of heterogeneity in c. The estimation results are reported inTable 5, along with the results for all the nested modelsthat “switch off” the heterogeneity (i.e., θ1 = θ2 = θ and/orc1 = c2 = c).9

When we allow for heterogeneity in both θ and c, we seethat one segment of the customer base exhibits negativeduration dependence while the other exhibits positive durationdependence. At first glance, this would suggest that ourassumption of homogeneity in c is not supported. However,we must first compare the fit of this specification to that of itsnested variants. On the basis of both AIC and BIC, thespecification with heterogeneity in θ and homogeneity in c isthe best model. Estimating the BdW model using the fulldataset yields the following results: γ ¼ 0:250, δ ¼ 1:654, c¼ 1:597, LL = − 2004.8. The associated values of AIC andBIC are 4015.6 and 4030.4, respectively. This means the BdWis the “best” model among those examined in this analysis.

We also report in Table 5 the evidence ratio (Anderson2008; Burnham and Anderson 2002), which tells us thestrength of the empirical support for the model with theminimum AIC relative to the other candidate models.10 Theevidence for the BdW is 4.5 times that for the full two-component dW model and 4.4 times that for the model thatallows for heterogeneity in c but not in θ. The evidence for theBdW is only 1.9 times that for the two-component model withhomogeneous c. The conclusion we draw from this is that thereis no reason to reject our modeling assumption of homogeneityin c.

9 What about the fit of a three-component model? The value of the log-likelihood function is −2004.1. This improvement of 0.2 (relative to the log-likelihood associated with the two-component model) comes at a cost of threeadditional model parameters, and so the three-component model is clearlydominated by the two-component model.10 The evidence ratio is the relative likelihood of a pair of models. Theevidence ratio for the best model (i.e., the one with the lowest AIC) versusmodel i is computed as Ei = exp ((AICi − AICmin)/2). An evidence ratio of Ei

means that the probability that the model with the lowest AIC is the K-L bestmodel is Ei times that of model i. By K-L best, we mean that the model has thesmallest estimated Kullback-Leibler distance (i.e., it has the lowest informationlost among the models used to approximate the true data-generating process).

Page 13: “How to Project Customer Retention” Revisited: The ...josephdavin.com/wp-content/uploads/2018/07/Fader-et-al-2018-How … · “How to Project Customer Retention” Revisited:

13P.S. Fader et al. / Journal of Interactive Marketing 43 (2018) 1–16

We find similar results when we bring in the betadistribution for θ and use a discrete mixture for c alone. Eventhough there is intuitive appeal for having heterogeneity in c,there is very little empirical support for it. Most of the “action”is in the heterogeneity in the baseline churn propensities acrosscustomers; once this factor is accommodated, there is virtuallynothing left over to be explained by a more elaborate modelspecification.

Discussion

Generally speaking, aggregate/cohort-level retention ratesincrease over time, and the BG model has proven to be a robusttool for projecting retention (and therefore survival) into thefuture. According to the BG model, this phenomenon is entirelydue to heterogeneity; individual-level propensities to churn areassumed to be constant. Despite the performance of the model,a number of people struggle with this assumption, contendingthat increasing cohort-level retention rates are the result ofindividual-level propensities to churn decreasing over time.

Occasionally we observe an initial dip in the cohort-levelretention rates before they increase. This phenomenon cannot becaptured by the BGmodel.We have presented the BdWmodel asan extension to the BG model, one that relaxes the assumptionof time-invariant individual-level propensities to churn. Thismodel is sufficiently flexible to capture the phenomenon of non-monotonically increasing cohort-level retention rates.11 If theaggregate retention curve is U-shaped then individual-level churnprobabilities must increase over time (i.e., c N 1). However,c N 1 does not guarantee that the aggregate retention curve is U-shaped; the effect of heterogeneity can swamp the individual-level positive duration dependence to yield a monotonicallyincreasing aggregate retention curve. Our analysis suggests thatwhen the assumption of constant individual-level propensities tochurn is violated in a setting where aggregate retention rates areincreasing, it is most likely that these individual propensitiesincrease with tenure (rather than decrease, as conventionalwisdom would suggest).

This surprising result that, if not constant, individual-levelchurn propensities are expected to increase over time is supportedby other researchers. In an analysis of health-club membershipdata, Giudicati, Riccaboni, and Romiti (2013) find a negativecorrelation between length of membership and the probability ofa member renewing their contract. Lemmens and Croux (2006)find that a customer’s churn probability is postively correlatedwith the length of time they have owned their current phone.Jamal and Bucklin (2006) use a latent-class Weibull model intheir analysis of churn among customers of a direct-to-homesatellite TV provider. In all three segments, the estimate of c isgreater than 1. Schweidel, Fader, and Bradlow (2008) use aWeibull-gammamodel (a continuous-time analogue of the BdW)to analyze churn among customers of a telecommunications

11 As previously noted, Fader and Hardie (2010) explore how to use the BGmodel as the basis for calculating CLV. We show in Appendix C how suchcalculations can be performed assuming lifetimes are characterized by the BdWmodel.

provider and find that c N 1. In a non-marketing context,Morrison and Schmittlein (1980) observe the same result insome analyses of job duration data.

What are some potential causes for such an effect? Possibleexplanations include the novelty of the new service wearing offor boredom setting-in over time, increasing competitivepressures, and changes in consumer preferences that are notmatched by changes in the firm’s offerings. In additional toexploring such possibly causes, future research should explorethe robustness of this result across other product categories.Given the increasing role of networks in society, it is importantthat we understand the role of network externalities. Of interestis how changes in network structure (and the customer’scentrality in the network) affect retention rates.

Appendix A. Deriving the Distribution of P(t)

Recall from (9) that, conditional on θ and c, the individual-levelprobability that someone who has made t − 1 renewals will renewat the next opportunity is

ρ t jθ; cð Þ ¼ 1−θð Þtc− t−1ð Þc ; t ¼ 1; 2; 3;… ðA1Þ

The distribution of Θ across those individuals who havemade t − 1 renewals is simply the posterior distribution of Θfor the BdW model:

f θ jγ; δ; c; t−1 renewalsð Þ ¼ S t−1 renewals jθ; cð Þ f θ jγ; δð ÞS t−1 renegwals jγ; δ; cð Þ

¼1−θð Þ t−1ð Þc θγ−1 1−θð Þδ−1

B γ; δð ÞBðγ; δþ t−1Þcð Þ

B γ; δð Þ¼ θγ−1 1−θð Þδþ t−1ð Þc−1

Bðγ; δþ t−1Þcð Þ ; t ¼ 1; 2; 3;…

We derive the distribution of P(t) using the basic result forderiving the distribution of the function Y = g(X ) of randomvariable X:

f Y yð Þ ¼ ddy

g−1 yð Þ����

���� f X g−1 yð Þ� �: ðA2Þ

Rewriting (A1) as ρ(t) = g(θ), we have

g−1 ρ tð Þð Þ ¼ 1−ρ tð Þ 1tc− t−1ð Þc and

ddρ tð Þ g

−1 ρ tð Þð Þ ¼ −1

tc− t−1ð Þc ρ tð Þ 1tc− t−1ð Þc−1:

It follows from (A2) that

f ρ tð Þ jγ; δ; cð Þ

¼ 1tc− t−1ð Þc ρ tð Þ 1

tc− t−1ð Þc−11−ρ tð Þ 1

tc− t−1ð Þcn oγ−1

ρ tð Þ 1tc− t−1ð Þc

n oδþ t−1ð Þc−1

Bðγ; δþ t−1Þcð Þ

¼ 1tc− t−1ð Þc

1ρ tð Þ

1−ρ tð Þ 1tc− t−1ð Þc

n oγ−1ρ tð Þ 1

tc− t−1ð Þcn oδþ t−1ð Þc

Bðγ; δþ t−1Þcð Þ ; t ¼ 1; 2; 3;…:

Page 14: “How to Project Customer Retention” Revisited: The ...josephdavin.com/wp-content/uploads/2018/07/Fader-et-al-2018-How … · “How to Project Customer Retention” Revisited:

14 P.S. Fader et al. / Journal of Interactive Marketing 43 (2018) 1–16

The mean of this distribution is

E P tð Þ jγ; δ; c½ �

¼Z 1

0

1tc− t−1ð Þc

1−ρ tð Þ 1tc− t−1ð Þc

n oγ−1ρ tð Þ 1

tc− t−1ð Þcn oδþ t−1ð Þc

Bðγ; δþ t−1Þcð Þ dρ tð Þ

which, letting z ¼ ρðtÞ 1tc−ðt−1Þc ,

¼ 1Bðγ; δþ t−1Þcð Þ

Z1

0

1−zð Þγ−1zδþtc−1 dz

¼ B γ; δþ tcð ÞBðγ; δþ t−1Þcð Þ ; t ¼ 1; 2; 3;…;

which is r(t) under the BdW model.

Appendix B. Implementing the BdW Model in Excel

We briefly describe how to estimate the BdW modelparameters using Microsoft Excel. It is assumed that the readeris familiar with the basics of estimating the parameters of theBG model, as covered in FH, Appendix B.

With reference to Fig. B1, we “code up” our expression for S(t) (cells F7:F11) and compute P(T = t) as S(t − 1) − S(t) (cellsE8:E11). So as to reduce the chances of any numerical precisionproblems, we compute S(t | γ,δ,c) as exp(ln(B(γ,δ + tc)) −ln (B(γ,δ))). The elements of the log-likelihood functionare computed in cells G8:G12, and their sum computed incell B4.

Fig. B1. Screenshot of the Excel worksheet for parameter estimation (HighEnd dataset).

The values of γ, δ, c that maximize the value of the log-likelihood function are found using the Excel add-in Solver —see Fig. B2. (In this particular case, the log-likelihood functionis quite flat near its maximum. The parameter estimatesreported in Table 2 are obtained by running Solver twice,with the solution from the first “run” serving as starting valuesfor the second “run”.) With reference to Fig. B3, we can nowproject S(t) by copying cell F11 down to F19, and compute theassociated retention rates as S(t)/S(t − 1) (cells H8:H19).

Fig. B2. Solver settings.

Fig. B3. Screenshot of the Excel worksheet for projecting survival and retention(High End dataset).

Appendix C. Computing CLV under the BdW Model

Fader and Hardie (2010) introduce the idea of discountedexpected lifetime and discounted expected residual lifetime,which they label DEL and DERL, and show how thesequantities are of use when computing (expected) customerlifetime value and residual lifetime value in contractual settings.It is more correct to think of the random variables discountedlifetime (DL) and discounted residual lifetime (DRL) and tocompute their means, giving us expected discounted lifetime (E(DL)) and expected discounted residual lifetime (E(DRL)).

For discount rate d, the expected discounted lifetime of anas-yet-to-be-acquired customer is, by definition,

E DLð Þ ¼X∞t¼0

S tð Þ1þ dð Þt ; ðC1Þ

Page 15: “How to Project Customer Retention” Revisited: The ...josephdavin.com/wp-content/uploads/2018/07/Fader-et-al-2018-How … · “How to Project Customer Retention” Revisited:

15P.S. Fader et al. / Journal of Interactive Marketing 43 (2018) 1–16

while expected discounted residual lifetime of customer at theend of period n (i.e., someone who has made n − 1 renewals) is

E DRL jactive for n periodsð Þ ¼X∞t¼n

S t Tj in−1ð Þ1þ dð Þt−n : ðC2Þ

When the duration of a customer’s relationship with the firmis characterized by the BG model, we can substitute (3) in (C1)and (C2), solve the infinite sums, and derive closed-formexpressions for E(DL) and E(DRL). This is not the case whenlifetimes are characterized by the BdW model. However, wecan simply evaluate (C1) and (C2) terminating the series at apoint where additional terms are effectively zero. Withreference to Fig. C1, we first compute S(t) for t = 0, 1, 2, …,100 using (10) in column C. Given an annual discount rate of10% (cell B4, the discount factor is computed in column E.Summing up the product of the two terms (cell E6) gives us thevalue of E(DL) for an as-yet-to-be-acquired customer.

In order to compute E(DRL), we need to evaluate theconditional survivor function S(t |T N n − 1). Standing at theend of year 5, the customer has made four renewals, and we cancompute the conditional survivor function as S(t)/S(4) — seecolumn G. Given the discount factor computed in column H,we sum up the product of the two terms (cell H6), giving us thevalue of E(DRL).

Fig. C1. Screenshot of the Excel worksheet for computing E(DL) and E(DRL)under the BdW model.

References

Anderson, David R. (2008), Model Based Inference in the Life Sciences: APrimer on Evidence. New York, NY: Springer Science+Business Media,LLC.

Berry, Michael J.A. and Gordon S. Linoff (2004), Data Mining Techniques2ndedition. Indianapolis, IN: Wiley Publishing, Inc.

Bolton, Ruth N. (1998), “A Dynamic Model of the Duration of the Customer’sRelationship with a Continuous Service Provider: The Role of Satisfaction,”Marketing Science, 17, 1, 45–65.

Bracquemond, Cyril and Olivier Gaudoin (2003), “A Survey on DiscreteLifetime Distributions,” International Journal of Reliability, Quality andSafety Engineering, 10, 1, 69–98.

Buchanan, Bruce and Donald G. Morrison (1988), “A Stochastic Model of ListFalloff with Implications for Repeat Mailings,” Journal of DirectMarketing, 2, Summer, 7–15.

Burnham, Kenneth P. and David R. Anderson (2002), Model Selection andMultimodel Inference: A Practical Information-Theoretical Approach2ndedition. New York, NY: Springer-Verlag, Inc.

Burnham, Thomas A., Judy K. Frels, and Vijay Mahajan (2003), “ConsumerSwitching Costs: A Typology, Antecedents, and Consequences,” Journal ofthe Academy of Marketing Science, 31, Spring, 109–26.

Fader, Peter S. and Bruce G.S. Hardie (2007), “How to Project CustomerRetention,” Journal of Interactive Marketing, 21, Winter, 76–90.

——— and ——— (2010), “Customer-Base Valuation in a ContractualSetting: The Perils of Ignoring Heterogeneity,” Marketing Science, 29,January–February, 85–93.

——— and ——— (2014), “A Spreadsheet-Literate Non-Statistician’s Guideto the Beta-Geometric Model,” http://brucehardie.com/notes/032/.

Fader, Peter S., Bruce G. S. Hardie, Daniel McCarthy, and RamnathVaidyanathan (2017), “Exploring the Equivalence of Two CommonMixture Models for Duration Data,” (unpublished working paper).

Fox, Richard J., Srinivas K. Reddy, and Bharat Rao (1997), “ModelingResponse to Repetitive Promotional Stimuli,” Journal of the Academy ofMarketing Science, 25, 3, 242–55.

Giudicati, Gianna, Massimo Riccaboni, and Anna Romiti (2013), “Experience,Socialization and Customer Retention: Lessons From the Dance Floor,”Marketing Letters, 24, December, 409–22.

Glaser, Ronald E. (1980), “Bathtub and Related Failure Rate Characteriza-tions,” Journal of the American Statistical Association, 75, September,667–72.

Israel, Mark (2005), “Tenure Dependence in Consumer-Firm Relationships: AnEmpirical Analysis of Consumer Departures from Automobile InsuranceFirms,” The RAND Journal of Economics, 36, Spring, 165–92.

Jamal, Zainab and Randolph E. Bucklin (2006), “Improving the Diagnosisand Prediction of Customer Churn: A Heterogeneous Hazard ModelingApproach,” Journal of Interactive Marketing, 20, Summer/Autumn, 16–29.

Kiefer, Nicholas M. (1988), “Economic Duration Data and Hazard Functions,”Journal of Economic Literature, 26, June, 646–79.

Lai, C.D. (2013), “Issues Concerning Constructions of Discrete LifetimeModels,” Quality Technology and Quantitative Management, 10, 2,251–62.

Lemmens, Aurélie and Christophe Croux (2006), “Bagging and BoostingClassification Trees to Predict Churn,” Journal of Marketing Research, 43,May, 276–86.

Morrison, Donald and Arnon Perry (1970), “Some Data Based Models forAnalyzing Sales Fluctuations,” Decision Sciences, 1, 3–4, 258–74.

Morrison, Donald G. and David C. Schmittlein (1980), “Jobs, Strikes, andWars: Probability Models for Duration,” Organizational Behavior andHuman Performance, 25, April, 224–51.

Murthy, D., N. Prabhakar, Min Xie, and Renyan Jiang (2004), Weibull Models.Hoboken, NJ: John Wiley & Sons.

Nakagawa, Toshio and Shunji Osaki (1975), “The Discrete WeibullDistribution,” IEEE Transactions on Reliability, 24, December, 300–1.

Nitzan, Irit, Barak Libai, and Danit Ein-Gar (2011), “The Time of DecreasingRetention: Payments as a Virtual Lock-in,” unpublished working paper.

Pielou, E.C. (1962), “Runs of One Species with Respect to Another in Transectsthrough Plant Populations,” Biometrics, 18, December, 579–93.

Potter, R.G. and M.P. Parker (1964), “Predicting the Time Required toConceive,” Population Studies, 18, July, 99–116.

Proschan, Frank (1963), “Theoretical Explanation of Observed DecreasingFailure Rate,” Technometrics, 5, August, 375–83.

Reichheld, Frederick F. (1996), The Loyalty Effect: The Hidden Force BehindGrowth, Profits, and Lasting Value. Boston, MA: Harvard Business SchoolPress.

Page 16: “How to Project Customer Retention” Revisited: The ...josephdavin.com/wp-content/uploads/2018/07/Fader-et-al-2018-How … · “How to Project Customer Retention” Revisited:

16 P.S. Fader et al. / Journal of Interactive Marketing 43 (2018) 1–16

Rinne, Horst (2009), The Weibull Distribution: A Handbook. Boca Raton, FL:Chapman & Hall/CRC Press.

Schweidel, David A., Peter S. Fader, and Eric Bradlow (2008), “UnderstandingService Retention Within and Across Cohorts Using Limited Information,”Journal of Marketing, 72, January, 82–94.

Vaupel, James W. and Anatoli I. Yashin (1985), “Heterogeneity’s Ruses: SomeSurprising Effects of Selection on Population Dynamics,” The AmericanStatistician, 39, August, 176–85.