4Front Game Data Science


Upload: silicon-studio-corporation

Post on 21-Jan-2018


Silicon Studio: 4Front Game Data Science

Who are we?

● Game studio and graphics middleware company based in Tokyo (spin-off of Silicon Graphics)

● Research group providing Game Data Science as a Service

● Goals: predict player behavior, scale to big data, and visualize results intuitively

Convert Raw Data into Knowledge

Turning a Game Data Science team into an end-to-end product

What? understand and predict player behavior

How? using cutting-edge machine learning able to work at scale

Why? increase player activity, retention and happiness

Game Data Science

Research x Product = cutting-edge machine learning able to work at scale

Game Data Science as a Service: Democratizing Game Data Science


4Front Data Science

Turning a Data Science team into an end-to-end product

Challenges

Data Science
▪ discover playing patterns
▪ predict player behaviour
▪ adapt to different game data distributions

Big Data Engineering
▪ scalability (distributed system)
▪ availability
▪ security
▪ reliability

4Front Game Data Science features

Churn Prediction

Customer Lifetime Value

● Time
● Level

Forecast of in-app purchases & MKT simulation tool

Pattern recognition of player profiles

Achievements 2016

IEEE Computational Intelligence and Games Conference

● Publication at IEEE journal
● Selected Oral Presentation!
● Invited to the Game Data Science Panel (only 3 people invited)

Organizers of IEEE/ACM DSAA GDS 2016

● Special Session Chairs in GDS

IEEE/ACM DSAA GDS 2016

● Published at IEEE journal
● Selected Presentation! 12.9% acceptance rate

Churn Prediction in Mobile Social Games: Towards a Complete Assessment Using Survival Ensembles

Periáñez Á., Saas A., Guitart A. and Magne C.

IEEE/ACM DSAA 2016 Montreal October 19th 2016

Churn prediction in Free-To-Play games

● This is all about churn…
● When is a player going to exit the game?
● Time and Level

● We focus on the top spenders: whales
● 0.2% of the players, 50% of the revenue

● General model that adapts to diverse games and datasets

● We define churn as 10 days of inactivity
● The definition of churn in F2P games is not straightforward
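
The 10-day inactivity rule can be sketched as a labeling step. This is a minimal illustration, not the authors' pipeline: the `Observation` type, the integer day indices, the sample players and the choice to measure a still-active player's duration up to the snapshot day are all assumptions made for the example.

```python
from dataclasses import dataclass

CHURN_INACTIVITY_DAYS = 10  # churn definition taken from the slides


@dataclass
class Observation:
    duration_days: int  # observed lifetime in days (assumed convention, see below)
    churned: bool       # True if churn was observed, False if still active (censored)


def label_player(first_day: int, last_login_day: int, snapshot_day: int) -> Observation:
    """Label one player at a data snapshot; all days are integer day indices."""
    inactive_days = snapshot_day - last_login_day
    if inactive_days >= CHURN_INACTIVITY_DAYS:
        # Churn observed: lifetime ends at the last login.
        return Observation(last_login_day - first_day, True)
    # Still active: we only know the lifetime is at least this long.
    return Observation(snapshot_day - first_day, False)


# Hypothetical players: (registration day, last login day)
players = {"p1": (0, 30), "p2": (5, 58), "p3": (20, 45)}
SNAPSHOT_DAY = 60

labels = {
    pid: label_player(first, last, SNAPSHOT_DAY)
    for pid, (first, last) in players.items()
}
for pid, obs in labels.items():
    print(pid, obs)
```

Players such as `p2`, who were seen recently, are not labeled as non-churners; they are kept as incomplete observations, which is what makes the definition less straightforward than a plain yes/no label.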

The model: Survival Ensembles

Challenge: modeling churn

▪ Survival analysis focuses on predicting the time-to-event, e.g. churn

➔ Survival analysis is used in biology and medicine to deal with this problem

➔ Ensemble learning techniques provide high-quality prediction results

▪ Classical methods, like regressions, are appropriate only when all players have already left the game

▪ Censoring problem: dataset with incomplete churning information
▪ Censoring is inherent to churn
▫ when will a player stop playing?

● Churn is defined as 10 days of inactivity
● Cumulative survival probability (Kaplan-Meier estimates)
● Step function that changes every time a player churns
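
A minimal sketch of the Kaplan-Meier estimate described above, written in plain Python so that the step-function behaviour is visible. The function name `kaplan_meier` and the sample data are illustrative assumptions; a production system would more likely rely on an established survival analysis library.

```python
from collections import Counter


def kaplan_meier(durations, churned):
    """Return [(time, survival_probability)] at each time where a churn occurs."""
    events = Counter(t for t, e in zip(durations, churned) if e)
    exits = Counter(durations)  # players leaving the risk set at t (churned or censored)
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events[t]
        if d:
            # The curve only steps down at times where at least one churn happens.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical lifetimes in days; False marks a censored (still active) player.
durations = [5, 8, 8, 12, 15, 20]
churned = [True, True, False, True, False, True]

km_curve = kaplan_meier(durations, churned)
for t, s in km_curve:
    print(f"day {t}: S(t) = {s:.3f}")
```

Censored players never trigger a step, but they stay in the risk set until their last observed day, so the information they carry is not thrown away.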

Output of the model

● Figure: Kaplan-Meier estimates

▪ Two approaches:
▫ Churn as a binary classification
▫ Churn as a censored data problem

➔ Survival analysis methods (e.g. Cox regression) do not assume any particular statistical distribution: they are fitted from data

➔ Fixed link between output and features: effort goes into model selection and evaluation

1) Hothorn et al., 2006. Unbiased recursive partitioning: A conditional inference framework

▪ One model: Conditional Inference Survival Ensembles (1)
▫ deals with censoring
▫ high accuracy due to ensemble learning

Survival Analysis


Survival Tree

➔ Split the feature space recursively

➔ Based on a survival statistical criterion, the root node is divided into two daughter nodes

➔ Maximize the survival difference between nodes

➔ A single tree produces unstable predictions

Conditional Survival Ensembles

➔ Make use of hundreds of trees

➔ Outstanding predictions

➔ Conditional inference survival ensembles use a Kaplan-Meier function as splitting criterion

➔ Robust information about variable importance ✓

➔ Overfitting is not present ✓

➔ Unbiased approach ✓

Conditional inference survival ensembles

Figure: Conditional inference survival tree partition, with Kaplan-Meier estimates of the survival time characterizing the players placed in each terminal node group

Linear rank statistics as splitting criterion to distinguish K-M curves

Survival tree

Random Survival Forest

➔ RSF is based on the original random forest algorithm (2)

➔ RSF favors variables with many possible split points over variables with fewer

Conditional inference survival ensembles

● Two-step algorithm:

○ 1) the optimal split variable is selected: association between covariates and response

○ 2) the optimal split point is determined by comparing two-sample linear statistics for all possible partitions of the split variable

2) Breiman L. 2001. Random Forests.
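
The second step of the two-step algorithm (choosing a split point once the split variable is fixed) can be sketched as follows. This is a simplified stand-in, not the conditional inference framework itself: it scores each candidate partition with a plain two-sample log-rank chi-square statistic and skips the permutation-test machinery of Hothorn et al. The function names and the sample data are assumptions made for illustration.

```python
def logrank_statistic(group_a, group_b):
    """Two-sample log-rank chi-square statistic.

    Each group is a list of (duration, churned) pairs.
    """
    pooled = group_a + group_b
    event_times = sorted({t for t, e in pooled if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for d, _ in pooled if d >= t)      # at risk overall
        n_a = sum(1 for d, _ in group_a if d >= t)   # at risk in group a
        deaths = sum(1 for d, e in pooled if e and d == t)
        deaths_a = sum(1 for d, e in group_a if e and d == t)
        observed_minus_expected += deaths_a - deaths * n_a / n
        if n > 1:
            variance += deaths * (n_a / n) * (1 - n_a / n) * (n - deaths) / (n - 1)
    return observed_minus_expected ** 2 / variance if variance > 0 else 0.0


def best_split_point(values, durations, churned):
    """Try every threshold between distinct values; return (threshold, statistic)."""
    rows = list(zip(values, durations, churned))
    distinct = sorted(set(values))
    best = (None, 0.0)
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [(d, e) for v, d, e in rows if v <= threshold]
        right = [(d, e) for v, d, e in rows if v > threshold]
        stat = logrank_statistic(left, right)
        if stat > best[1]:
            best = (threshold, stat)
    return best


# Hypothetical data: player level, lifetime in days, churn observed or not.
levels = [1, 2, 3, 4, 10, 11, 12, 13]
lifetimes = [5, 3, 6, 4, 35, 30, 40, 32]
observed = [True, True, True, True, True, True, True, False]

best_threshold, best_stat = best_split_point(levels, lifetimes, observed)
print(best_threshold, round(best_stat, 3))
```

Step 1 is deliberately left out: picking the variable by its best split statistic would reintroduce the bias toward variables with many split points that the slides attribute to RSF.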

Feature selection

● player attention:
○ time spent per day

● player loyalty:
○ number of days connecting (loyalty index)
○ days from registration to first purchase
○ days since last purchase

● player intensity:
○ number of actions, sessions, etc.
○ amount of in-app purchases

● player level (concept common to most games)

The Results

With “Age of Ishtaria” Game Data

Binary classification results

Predicted Kaplan-Meier survival curves as a function of time (days) for new or existing players

Censored data problem results

Validation -- Churn prediction

Figure (panels: Survival Ensembles, Cox Regression): median survival time, i.e. the time at which 50% of the players are still surviving in the game

Figure: 1000 bootstrap cross-validation error curves for the survival ensemble model and Cox regression

Figure (panels: Survival Ensembles, Cox Regression): median survival level, i.e. the level at which 50% of the players are still surviving in the game

Figure: 1000 bootstrap cross-validation error curves for the survival ensemble model and Cox regression

0.050  0.068  0.187

▪ Censoring problem is the right approach
▫ the median survival time, i.e. the time at which 50% of the players are still surviving in the game, can be used as a time threshold to categorize a player as at risk of churning

▪ Binary problem -- static model
▫ also brings relevant information
▫ useful insight for a short-term prediction

▪ SVM, ANN, Decision Trees, etc. are useful tools for regression or classification problems
▫ in their original form they cannot handle censored data
▫ they need 1) a modification of the algorithm or 2) a transformation of the data
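
The median survival time threshold mentioned above can be read directly off a predicted survival curve. A minimal sketch, assuming each curve is a list of `(time, survival_probability)` pairs sorted by time; the helper names, the 30-day horizon and the sample curves are hypothetical.

```python
def median_survival_time(curve):
    """First time at which the survival probability drops to 0.5 or below.

    Returns None when the curve never reaches 0.5 (median not reached).
    """
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


def is_at_risk(curve, horizon_days):
    """Flag a player whose median survival time falls within the horizon."""
    median = median_survival_time(curve)
    return median is not None and median <= horizon_days


# Hypothetical predicted curves for two players.
predicted_curves = {
    "p1": [(5, 0.9), (10, 0.6), (20, 0.45), (40, 0.2)],
    "p2": [(5, 0.98), (20, 0.9), (40, 0.7)],
}

medians = {pid: median_survival_time(c) for pid, c in predicted_curves.items()}
flags = {pid: is_at_risk(c, horizon_days=30) for pid, c in predicted_curves.items()}
print(medians)
print(flags)
```

A curve that never falls to 0.5 within the observed range yields no median, so such a player is simply not flagged rather than being assigned a guessed value.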

Survival ensembles approach

▪ Application of the state-of-the-art algorithm “conditional inference survival ensembles”
▫ to predict churn
▫ and the survival probability of players in social games

▪ Model able to make predictions every day in an operational environment
▪ Adapts to other game data: Democratize Game Data Science

▫ It does not require previous manipulation of the data
▫ It is able to deal efficiently with the temporal dimension
▫ It can be parallelized
▫ It outputs not only churn information but also variable importance

Summary and conclusion

Do you want to use 4Front?

What distinguishes it from the rest?

● Automatic Predictive Analytics

● Simulating future events

● Gives Recommendations

● Goal: maximize sales and create awesome games