dsaa 2016 churn prediction in mobile social games

Post on 21-Jan-2018


Churn Prediction in Mobile Social Games: Towards a Complete Assessment Using Survival Ensembles

África Periáñez, Alain Saas, Anna Guitart and Colin Magne
IEEE/ACM DSAA 2016
Montreal, October 19th, 2016

About us

Who are we?

● Game and technology company based in Tokyo (spin-off of Silicon Graphics)

● Research project to provide Game Data Science as a Service

● Goals: predict player behavior, scale to big data, and visualize results intuitively

Our data

● Free-to-play mobile social games

● In-app purchases and activity behavioral data

Churn prediction in Free-To-Play games

We focus on the top spenders: the whales

➔ 0.2% of the players, 50% of the revenues

➔ Their high engagement makes them more likely to respond positively to actions taken to retain them

➔ For this group, we can define churn as 10 days of inactivity

◆ The definition of churn in F2P games is not straightforward

Feature selection

◎ Game independent features:

○ player attention: time spent per day, lifetime

○ player loyalty: number of days connecting, loyalty index (number of days played over lifetime), days from registration to first purchase, days since last purchase

○ player intensity: number of actions, sessions, amount of in-app purchases, action activity distance (total average actions compared to the behaviour of the last days)

○ player level (concept common to most games)

◎ Game dependent features researched but ultimately not part of our model:

○ participation in a guild (social feature)

○ actions measured by categories
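As an illustration of one game-independent feature above, here is a minimal Python sketch of the loyalty index (number of days played over lifetime). The slides do not say how lifetime is measured; this sketch assumes it is the number of days from registration to the reference date, both included. The function name and parameters are hypothetical.

```python
from datetime import date


def loyalty_index(login_dates, registration: date, as_of: date) -> float:
    """Share of days with at least one login over the player's lifetime.

    Assumption (not stated in the slides): lifetime is counted in days from
    registration to `as_of`, both included.
    """
    lifetime_days = (as_of - registration).days + 1
    # Several logins on the same day count as a single day played.
    days_played = len({d for d in login_dates if registration <= d <= as_of})
    return days_played / lifetime_days


logins = [date(2016, 1, 1), date(2016, 1, 1), date(2016, 1, 5), date(2016, 1, 10)]
print(loyalty_index(logins, registration=date(2016, 1, 1), as_of=date(2016, 1, 10)))
```

A value close to 1 means the player connected on almost every day of their lifetime.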

The model
Survival Ensembles

Challenge: modeling churn

◎ Survival analysis focuses on predicting the time-to-event, e.g. churn

○ when will a player stop playing?

◎ Classical methods, like regressions, are appropriate only when all players have already left the game

◎ Censoring problem: dataset with incomplete churning information

◎ Censoring is inherent to churn: players who are still active have not churned yet

➔ Survival analysis is used in biology and medicine to deal with this problem

➔ Ensemble learning techniques provide high-quality prediction results
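To make the censoring problem concrete, the sketch below turns raw player dates into the (duration, event) pairs that survival methods expect. It assumes the 10-day inactivity definition of churn used for whales in this talk. The record type, function name, and the choice to end a churned lifetime at the last login are illustrative assumptions, not the authors' pipeline.

```python
from datetime import date
from typing import NamedTuple

CHURN_THRESHOLD_DAYS = 10  # churn definition for whales given in the talk


class SurvivalRecord(NamedTuple):
    duration_days: int    # observed lifetime in the game
    event_observed: bool  # True = churned, False = right-censored


def to_survival_record(registration: date, last_login: date,
                       observation_end: date) -> SurvivalRecord:
    """Build one (duration, event) pair from a player's dates."""
    inactive_days = (observation_end - last_login).days
    if inactive_days >= CHURN_THRESHOLD_DAYS:
        # Churn was observed: assume the lifetime ended at the last login.
        return SurvivalRecord((last_login - registration).days, True)
    # Player still active: we only know the lifetime is at least this long.
    return SurvivalRecord((observation_end - registration).days, False)


end = date(2016, 6, 1)
print(to_survival_record(date(2016, 1, 1), date(2016, 3, 1), end))   # churned
print(to_survival_record(date(2016, 1, 1), date(2016, 5, 30), end))  # censored
```

Dropping the censored players, or treating their observed lifetime as a final one, would bias a classical regression towards short lifetimes; keeping the event flag avoids that.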

◎ We focus on whales

◎ Cumulative survival probability (Kaplan-Meier estimates)

◎ Step function that changes every time that a player churns
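The Kaplan-Meier step function described above can be sketched in a few lines of plain Python. At each observed churn time the running survival probability is multiplied by (1 - churned / at_risk). The function name and the toy data are illustrative assumptions; in practice a survival library would normally be used.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each observed churn time.

    durations: observed lifetime of each player (e.g. in days)
    events:    True if the player churned, False if right-censored
    """
    event_times = sorted({t for t, e in zip(durations, events) if e})
    curve = []
    survival = 1.0
    for t in event_times:
        # Players whose observed lifetime reaches t are still at risk at t.
        at_risk = sum(1 for d in durations if d >= t)
        churned = sum(1 for d, e in zip(durations, events) if e and d == t)
        survival *= 1.0 - churned / at_risk
        curve.append((t, survival))
    return curve


# Toy data: five players, two of them censored (still playing).
durations = [5, 8, 8, 12, 20]
events = [True, True, False, True, False]
for t, s in kaplan_meier(durations, events):
    print(t, round(s, 3))
```

Censored players never trigger a step, but they stay in the at-risk count until their observed lifetime ends, which is how the estimator uses incomplete information.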

Output of the model

◎ Two approaches:

○ Churn as a binary classification

○ Churn as a censored data problem

◎ One model: Conditional Inference Survival Ensembles ¹

○ deals with censoring

○ high accuracy due to ensemble learning

Survival Analysis

➔ Survival analysis methods (e.g. Cox regression) do not assume any particular statistical distribution: they are fitted from data

➔ Fixed link between output and features: effort is required for model selection and evaluation

1) Hothorn et al., 2006. Unbiased recursive partitioning: A conditional inference framework

Challenge: modeling churn

Survival Tree

➔ Split the feature space recursively

➔ Based on a survival statistical criterion, the root node is divided into two daughter nodes

➔ Maximize the survival difference between nodes

➔ A single tree produces unstable predictions

Conditional Survival Ensembles

➔ Outstanding predictions

➔ Make use of hundreds of trees

➔ Conditional inference survival ensembles use a Kaplan-Meier function as splitting criterion

➔ Overfitting is not present

➔ Robust information about variable importance

➔ Unbiased approach

Conditional inference survival ensembles

Conditional inference survival tree partition, with Kaplan-Meier estimates of the survival time that characterize the players placed in each terminal node

Linear rank statistics as splitting criterion

Survival tree

◎ Two-step algorithm:

○ 1) the optimal split variable is selected: association between covariates and response

○ 2) the optimal split point is determined by comparing two-sample linear statistics for all possible partitions of the split variable
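Step 2 above can be illustrated with a rough Python sketch that scans every candidate threshold of one feature and keeps the one with the largest two-sample statistic. This sketch uses the classical log-rank chi-square statistic as the two-sample statistic, which is a simplifying assumption. The conditional inference framework of Hothorn et al. relies on standardized linear statistics within a permutation-test framework, which is not reproduced here. All function names and the toy data are hypothetical.

```python
def logrank_statistic(durations, events, in_left):
    """Two-sample log-rank chi-square statistic for a left/right partition."""
    event_times = sorted({t for t, e in zip(durations, events) if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n = n_left = d = d_left = 0
        for dur, ev, left in zip(durations, events, in_left):
            if dur >= t:                 # player still at risk at time t
                n += 1
                n_left += left
                if ev and dur == t:      # player churns exactly at time t
                    d += 1
                    d_left += left
        observed_minus_expected += d_left - d * n_left / n
        if n > 1:
            p = n_left / n
            variance += d * p * (1 - p) * (n - d) / (n - 1)
    return observed_minus_expected ** 2 / variance if variance > 0 else 0.0


def best_split_point(feature, durations, events):
    """Return (threshold, statistic) maximizing the survival difference."""
    best = (None, 0.0)
    for threshold in sorted(set(feature))[:-1]:  # keep both sides non-empty
        in_left = [x <= threshold for x in feature]
        stat = logrank_statistic(durations, events, in_left)
        if stat > best[1]:
            best = (threshold, stat)
    return best


# Toy data: low-activity players (feature <= 4) churn much earlier.
feature = [1, 2, 3, 4, 10, 11, 12, 13]
durations = [7, 5, 8, 6, 70, 50, 80, 60]
events = [True] * 8
print(best_split_point(feature, durations, events))
```

Because the log-rank statistic depends only on the ordering of churn times, the selected threshold reflects which partition separates early and late churners best, not the size of the gap between them.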

Random Survival Forest

➔ RSF is based on the original random forest algorithm ¹

➔ RSF favors variables with many possible split points over variables with fewer

1) Breiman L. 2001. Random Forests.

Conditional inference survival ensembles

The Results
With “Age of Ishtaria” Game Data

Binary classification results and comparison with other models

Predicted Kaplan-Meier survival curves as a function of time (days) for new or existing players

Censored data problem results

Validation -- Churn prediction

Validation -- Churn prediction

1000 bootstrap cross-validation error curves for the survival ensemble model and Cox regression

◎ Censoring problem is the right approach

○ the median survival time, i.e. the time at which the percentage of players surviving in the game is 50%, can be used as a time threshold to categorize a player as at risk of churning

◎ Binary problem -- static model

○ also brings relevant information

○ useful insight for short-term prediction

◎ SVM, ANN, Decision Trees, etc. are useful tools for regression or classification problems

○ in their original form they cannot handle censored data

○ they require 1) modification of the algorithm or 2) transformation of the data
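The median survival time mentioned above can be read directly off a predicted survival curve. The following Python sketch assumes the curve is given as a list of (time, survival probability) pairs sorted by time. The `horizon_days` parameter and the at-risk rule are hypothetical choices for illustration, since the slides do not specify how the threshold is applied.

```python
def median_survival_time(curve):
    """First time at which the survival probability drops to 50% or below.

    curve: list of (time, survival_probability) pairs sorted by time.
    Returns None if the curve never reaches 50%.
    """
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


def is_at_risk(curve, horizon_days):
    """Hypothetical rule: flag a player whose median survival time falls
    within the next `horizon_days` days."""
    median = median_survival_time(curve)
    return median is not None and median <= horizon_days


predicted_curve = [(5, 0.8), (8, 0.6), (12, 0.3)]
print(median_survival_time(predicted_curve))
print(is_at_risk(predicted_curve, horizon_days=14))
```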

Survival ensembles approach

◎ Application of the state-of-the-art algorithm “conditional inference survival ensembles”

○ to predict churn

○ and the survival probability of players in social games

◎ Model able to make predictions every day in an operational environment

◎ Adapts to other game data: Democratize Game Data Science

◎ Relevant information about whales' behaviour

○ discovering new playing patterns as a function of time

○ classifying gamers by risk factors of survival experience

◎ Step towards the challenging goal of the comprehensive understanding of players

Summary and conclusion

Other work of the authors related to Game Data Science

Discovering Playing Patterns: Time Series Clustering of Free-To-Play Game Data
Alain Saas, Anna Guitart and África Periáñez
IEEE CIG 2016

Special Session on Game Data Science
Chaired by Alain Saas and África Periáñez
IEEE/ACM DSAA 2016
www.gamedatascience.org
