Games and Big Data: A Scalable Multi-Dimensional Churn Prediction Model

Paul Bertens, Anna Guitart and África Periáñez (Silicon Studio)

CIG 2017, New York, 23rd August 2017


Page 1: Games and Big Data: A Scalable Multi-Dimensional Churn ...yokozunadata.com/research/CIG2017_public.pdf · A Scalable Multi-Dimensional Churn Prediction Model Paul Bertens, Anna Guitart



Who are we?

Silicon Studio: game studio and graphics middleware company based in Tokyo (a spin-off of Silicon Graphics)

YOKOZUNA data: a game data science research unit that provides individual player predictions to game studios

Goals: predict player behavior, scale to big data and provide intuitive visualization of the results


Churn prediction in Free-To-Play games

1) Rothenbuehler J. et al., 2015. Hidden Markov models for churn prediction.
2) Periáñez A. et al., 2016. Churn prediction in mobile social games: towards a complete assessment using survival ensembles.

This is all about churn: when is a player going to leave the game?

→ In terms of days (refs. 1, 2), level or hours played

We focus on the top spenders ("whales"): less than 2% of the players, but 50% of the revenue

General model that adapts to diverse games and datasets

The definition of churn in F2P games is not straightforward: we define churn as 10 days of inactivity (players who come back after that account for only 1% of revenue)

Parallelizable algorithm, applied in a production environment and scalable to big data (up to tens of millions of MAU)
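The 10-day inactivity rule can be turned into the two inputs that survival analysis needs, an observed duration and an event flag. The minimal sketch below assumes a player's history is just a list of login dates and that the observation window ends on a known date; `churn_label` and `CHURN_DAYS` are illustrative names, not part of the authors' pipeline, and duration is measured in days although level or hours played could be used instead.

```python
from datetime import date

CHURN_DAYS = 10  # inactivity window used as the churn definition (from the slide)


def churn_label(login_dates, observation_end, churn_days=CHURN_DAYS):
    """Return (duration_days, churned) for one player.

    churned is True when the player has been inactive for at least
    `churn_days` days at `observation_end` (assumption: ">= 10 days");
    otherwise the player is right-censored: still considered active,
    final lifetime unknown. Duration runs from first to last login in
    both cases, a conservative choice for censored players.
    """
    first, last = min(login_dates), max(login_dates)
    churned = (observation_end - last).days >= churn_days
    return (last - first).days, churned


logins = [date(2017, 1, 1), date(2017, 1, 3), date(2017, 1, 5)]
print(churn_label(logins, date(2017, 1, 20)))  # (4, True): 15 days inactive
print(churn_label(logins, date(2017, 1, 10)))  # (4, False): censored
```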


The model: Survival Ensembles



Challenge: modeling churn

Survival analysis focuses on predicting the time-to-event, e.g. churn

➔ Survival analysis is used in biology and medicine to deal with this problem

➔ Ensemble learning techniques provide high-quality predictions

Classical methods, like regression, are appropriate only when all players have already left the game

Censoring problem: the dataset contains incomplete churn information

Censoring is inherent to churn:
○ when will a player stop playing?
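A toy illustration of why censoring matters, with made-up numbers: when some players are still active at the end of the observation window, the usual shortcuts both underestimate how long players really stay.

```python
# Hypothetical players: (observed lifetime in days, churned?).
# churned=False means right-censored: still active at the data cut-off.
players = [(5, True), (8, True), (30, False), (40, False)]

# Shortcut 1: drop censored players and average the rest.
churned_only = [t for t, churned in players if churned]
naive_mean = sum(churned_only) / len(churned_only)

# Shortcut 2: treat every observed time as if it were a churn time.
all_as_churn = sum(t for t, _ in players) / len(players)

print(naive_mean, all_as_churn)  # 6.5 20.75
```

Both figures are biased downward: the two censored players will stay at least 30 and 40 days, so the true mean lifetime is at least 20.75 days. Survival methods keep the partial information carried by censored players instead of discarding or misreading it.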


● Two approaches:
○ Churn as a binary classification
○ Churn as a censored data problem

➔ Survival analysis methods (e.g. Cox regression, ref. 3) do not assume any particular statistical distribution: they are fitted from data

➔ But they impose a fixed link between output and features, which requires effort in model selection and evaluation
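For reference, the standard form of the Cox proportional hazards model (ref. 3) makes both points concrete: the baseline hazard $h_0(t)$ (equivalently the baseline survival $S_0(t)$) is left unspecified and estimated from the data, while the features $\mathbf{x}$ enter through a fixed log-linear link with coefficients $\boldsymbol{\beta}$.

```latex
h(t \mid \mathbf{x}) = h_0(t)\,\exp\!\left(\boldsymbol{\beta}^\top \mathbf{x}\right),
\qquad
S(t \mid \mathbf{x}) = S_0(t)^{\exp\left(\boldsymbol{\beta}^\top \mathbf{x}\right)}
```

Each feature therefore rescales the hazard by the same factor at every time point (proportional hazards), a structural assumption that tree ensembles do not impose.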

2) Hothorn T. et al., 2006. Unbiased recursive partitioning: a conditional inference framework.
3) Cox D.R., 1972. Regression models and life-tables.

● One model: Conditional Inference Survival Ensembles (ref. 2)
○ deals with censoring
○ high accuracy due to ensemble learning

Survival Analysis

Challenge: modeling churn


Survival Tree

➔ Split the feature space recursively

➔ Based on a survival statistical criterion, the root node is divided into two daughter nodes

➔ Maximize the survival difference between nodes

➔ A single tree produces unstable predictions

Conditional Survival Ensembles

➔ Make use of hundreds of trees

➔ Highly accurate predictions

➔ Conditional inference survival ensembles use the Kaplan-Meier function as splitting criterion

➔ Robust information about variable importance ✓

➔ Resistant to overfitting ✓

➔ Unbiased variable selection ✓

Conditional inference survival ensembles


● Two-step algorithm:
○ 1) the optimal split variable is selected by testing the association between each covariate and the response
○ 2) the optimal split point is determined by comparing two-sample linear statistics for all possible partitions of the split variable
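The two steps can be sketched in plain Python. This is a simplified, univariate illustration under stated assumptions, not the authors' code and not the reference implementation of ref. 2: the censored response is transformed into log-rank scores, each covariate is scored with the permutation-standardized linear statistic and an asymptotic normal p-value (step 1), and the cut-point maximizes the same statistic computed on a binary left/right indicator (step 2). Real implementations add multiplicity adjustment, stopping rules and categorical covariates; all function names and the data are hypothetical.

```python
import math


def logrank_scores(times, events):
    """Log-rank scores: event indicator minus the Nelson-Aalen cumulative
    hazard at each player's own time (null-model martingale residuals)."""
    cum_hazard, h = {}, 0.0
    for t in sorted(set(times)):
        at_risk = sum(1 for u in times if u >= t)
        churns = sum(1 for u, e in zip(times, events) if u == t and e)
        h += churns / at_risk
        cum_hazard[t] = h
    return [e - cum_hazard[t] for t, e in zip(times, events)]


def standardized_stat(g, h):
    """Linear statistic T = sum(g_i * h_i), standardized by its mean and
    variance under permutation of h (scalar case)."""
    n = len(h)
    sg, sh = sum(g), sum(h)
    t = sum(gi * hi for gi, hi in zip(g, h))
    expected = sg * sh / n
    mean_h = sh / n
    var_h = sum((hi - mean_h) ** 2 for hi in h) / n
    var_t = var_h * n / (n - 1) * (sum(gi * gi for gi in g) - sg * sg / n)
    return (t - expected) / math.sqrt(var_t) if var_t > 0 else 0.0


def p_value(c):
    """Two-sided p-value from the asymptotic standard normal approximation."""
    return math.erfc(abs(c) / math.sqrt(2))


def select_split(X, times, events):
    """X holds one row of covariates per player. Returns (column, cut, p)."""
    h = logrank_scores(times, events)
    # Step 1: covariate most strongly associated with survival (smallest p).
    pvals = [p_value(standardized_stat(col, h)) for col in zip(*X)]
    j = min(range(len(pvals)), key=pvals.__getitem__)
    column = [row[j] for row in X]
    # Step 2: cut-point of that covariate with the largest two-sample statistic.
    best_cut, best_c = None, -1.0
    for cut in sorted(set(column))[:-1]:
        left = [1.0 if x <= cut else 0.0 for x in column]
        c = abs(standardized_stat(left, h))
        if c > best_c:
            best_cut, best_c = cut, c
    return j, best_cut, pvals[j]


# Hypothetical players: columns = (unrelated covariate, purchases).
X = [[3, 0], [1, 0], [4, 1], [1, 1], [5, 5], [9, 6], [2, 7], [6, 8]]
times = [2, 3, 4, 5, 20, 25, 30, 35]  # days played
events = [1, 1, 1, 1, 1, 0, 1, 0]     # 0 = censored
j, cut, p = select_split(X, times, events)  # selects column 1 (purchases), cut 5
```

Because the variable is chosen by a p-value rather than by the best achievable split, a covariate with many distinct values gains no advantage from having more candidate cut-points, which is the selection bias attributed to RSF below.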

Random Survival Forest (ref. 4)

➔ RSF is based on the original random forest algorithm (ref. 5)

➔ RSF favors variables with many possible split points over variables with fewer

4) Ishwaran H. et al., 2008. Random survival forests.
5) Breiman L., 2001. Random forests.

Conditional inference survival ensembles


● Cumulative survival probability
● Step function that changes every time a player churns
● Output in terms of level and playtime (hours played)

Kaplan-Meier estimates (ref. 6)

6) Kaplan E.L. et al., 1958. Nonparametric estimation from incomplete observations.
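A minimal Kaplan-Meier (product-limit) estimator, written from the textbook formula rather than taken from the authors' code. The time axis can be days, level or hours played; the function name and the data are illustrative.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: returns [(t, S(t))] at each distinct churn time.

    S(t) = prod over churn times u <= t of (1 - d_u / n_u), where d_u is the
    number of churns at u and n_u the number of players still at risk (time >= u).
    """
    curve, surv = [], 1.0
    for t in sorted({u for u, e in zip(times, events) if e}):
        n_u = sum(1 for u in times if u >= t)
        d_u = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1.0 - d_u / n_u
        curve.append((t, surv))
    return curve


# Hypothetical players: level reached, and whether they churned (0 = censored).
levels = [3, 5, 5, 8, 10, 12]
churned = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(levels, churned)
# S drops to 5/6, 2/3, 4/9 and 0 at levels 3, 5, 8 and 12
```

Censored players never create a step themselves, but they stay in the risk set n_u until their last observed level, which is how the estimator uses incomplete observations.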


Results: Conditional Inference Survival Ensembles


Feature selection

Daily logins, purchases, playtime and level-ups

Player attention:
● information per day (e.g. playtime per day)

Player loyalty:
● means over several different time periods
● time elapsed since the first and the last day of each kind of information (e.g. time since last purchase)

Player intensity:
● total amounts (e.g. total in-app purchases)

Player level (a concept common to most games)
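One possible way to compute these feature groups from daily records. The record schema (`day`, `logins`, `purchases`, `playtime`, `level_ups`), the 7- and 30-day windows and the rule "level = 1 + total level-ups" are assumptions made for the example, not the authors' feature set; `player_features` is a hypothetical helper.

```python
from statistics import mean


def player_features(daily, today, windows=(7, 30)):
    """daily: one dict per active day with hypothetical keys
    'day' (int), 'logins', 'purchases', 'playtime' (hours), 'level_ups'."""
    f = {}
    # Attention: information per (active) day.
    f["playtime_per_day"] = mean(r["playtime"] for r in daily)
    # Loyalty: means over several time windows (inactive days count as 0)...
    for w in windows:
        recent = [r for r in daily if today - r["day"] < w]
        f[f"playtime_mean_{w}d"] = sum(r["playtime"] for r in recent) / w
    # ...and time elapsed since first/last occurrences.
    days = [r["day"] for r in daily]
    f["days_since_first_login"] = today - min(days)
    f["days_since_last_login"] = today - max(days)
    buys = [r["day"] for r in daily if r["purchases"] > 0]
    f["days_since_last_purchase"] = today - max(buys) if buys else None
    # Intensity: totals.
    for key in ("logins", "purchases", "playtime"):
        f[f"total_{key}"] = sum(r[key] for r in daily)
    # Level (assumed to start at 1 and increase by one per level-up).
    f["level"] = 1 + sum(r["level_ups"] for r in daily)
    return f


daily = [
    {"day": 0, "logins": 2, "purchases": 1, "playtime": 1.5, "level_ups": 2},
    {"day": 3, "logins": 1, "purchases": 0, "playtime": 0.5, "level_ups": 0},
    {"day": 9, "logins": 3, "purchases": 2, "playtime": 2.0, "level_ups": 1},
]
features = player_features(daily, today=10)
```

The same function can be re-run every day on a sliding data window, which fits the daily operational predictions mentioned in the conclusions.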


RPG free-to-play game

Action battle card game popular in Japan

Long-term loyal players


Predicted Kaplan-Meier survival curves as a function of playtime (hours) and level for new or existing players

Censored data problem results


Validation -- Churn prediction

Survival Ensembles

Cox Regression


Median survival level, i.e. the level at which 50% of players are still in the game
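Reading the median survival level (or playtime) off a survival curve: a minimal sketch that takes the curve as sorted (time, survival) pairs, such as those from a product-limit estimator, and returns the first time at which survival drops to 50% or below. The function name and the example curve are hypothetical.

```python
def median_survival(curve):
    """First time t with S(t) <= 0.5 on a step-function curve given as
    (t, S(t)) pairs sorted by t; None if the median is never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical Kaplan-Meier curve over levels.
curve = [(3, 5 / 6), (5, 2 / 3), (8, 4 / 9), (12, 0.0)]
print(median_survival(curve))  # 8
```

Returning `None` matters in practice: for loyal player groups the curve may stay above 50% for the whole observation window, in which case the median is simply not identifiable from the data.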


Survival Ensembles

Cox Regression


Median survival playtime, i.e. the number of hours played at which 50% of players are still in the game

Validation -- Churn prediction


1000 bootstrap cross-validation error curves for the survival ensemble model and Cox regression

Model               IBS (ref. 7)
Survival Ensemble   0.025
Cox Regression      0.054
Kaplan Meier        0.127

7) Graf E. et al., 1999. Assessment and comparison of prognostic classification schemes for survival data.
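The IBS values follow the prediction-error framework of Graf et al. (ref. 7): at each time t, a player who has already churned should have a predicted survival of 0, a player still active should have 1, and censoring is handled with inverse-probability-of-censoring weights from a Kaplan-Meier estimate of the censoring distribution. The sketch below is a plain-Python reading of that definition, not the authors' evaluation code; the `predict(i, t)` callback, the evaluation grid, the left-limit G(T_i-) and the trapezoidal integration are assumptions, and the 1000-run bootstrap cross-validation is left out. Lower is better, and a perfect model scores 0.

```python
def censoring_km(times, events):
    """Kaplan-Meier estimate G of the censoring distribution (censorings play
    the role of events). Returns G(t); left=True gives the left limit G(t-)."""
    steps, g = [], 1.0
    for t in sorted({u for u, e in zip(times, events) if not e}):
        n_u = sum(1 for u in times if u >= t)
        c_u = sum(1 for u, e in zip(times, events) if u == t and not e)
        g *= 1.0 - c_u / n_u
        steps.append((t, g))

    def G(t, left=False):
        val = 1.0
        for s, gs in steps:
            if s < t or (s == t and not left):
                val = gs
            else:
                break
        return val

    return G


def brier_score(t, times, events, surv_at_t, G):
    """IPCW Brier score at time t; surv_at_t[i] is the predicted S_i(t)."""
    total = 0.0
    for ti, ei, s in zip(times, events, surv_at_t):
        if ti <= t and ei:      # churned by t: ideal prediction is 0
            total += s ** 2 / G(ti, left=True)
        elif ti > t:            # still playing at t: ideal prediction is 1
            total += (1.0 - s) ** 2 / G(t)
        # censored before t: status unknown, weight 0
    return total / len(times)


def integrated_brier_score(times, events, predict, grid):
    """Trapezoidal integral of the Brier score over `grid`, divided by its span.
    predict(i, t) returns the model's survival probability for player i at t."""
    G = censoring_km(times, events)
    bs = [brier_score(t, times, events,
                      [predict(i, t) for i in range(len(times))], G)
          for t in grid]
    area = sum((grid[k + 1] - grid[k]) * (bs[k] + bs[k + 1]) / 2
               for k in range(len(grid) - 1))
    return area / (grid[-1] - grid[0])
```

As a sanity check, a model that always answers 0.5 scores exactly 0.25 on uncensored data; the Kaplan Meier row in the table plays the related role of a covariate-free baseline that any useful model should beat.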

Validation -- Churn prediction


1000 bootstrap cross-validation error curves for the survival ensemble model and Cox regression

Model               IBS (ref. 7)
Survival Ensemble   0.026
Cox Regression      0.044
Kaplan Meier        0.134

7) Graf E. et al., 1999. Assessment and comparison of prognostic classification schemes for survival data.

Validation -- Churn prediction


Summary and conclusion

● Application of the state-of-the-art algorithm “conditional inference survival ensembles”
○ to predict churn and the survival probability of players in social games
○ the median survival time, i.e. the time at which 50% of players are still in the game, can be used as a time threshold to flag players at risk of churning

The model can make predictions every day in an operational environment

It adapts to other games' data: Democratizing Game Data Science (YOKOZUNA data)

○ It does not require prior manipulation of the data
○ It deals efficiently with the temporal dimension
○ It can be parallelized
○ It outputs not only churn information but also variable importance


THANK YOU


yokozunadata.com