user payment prediction in free-to-play

USER PAYMENT PREDICTION IN F2P GAMES Master Thesis Ahmed Hassan

Upload: ahmed-hassan

Post on 29-Jan-2018


Page 1: User Payment Prediction in Free-to-Play

USER PAYMENT PREDICTION IN F2P GAMES

Master Thesis

Ahmed Hassan

Page 2: User Payment Prediction in Free-to-Play

Overview

Introduction

Methodology

Experiments

Results and Findings

Conclusion and Future Work

Page 3: User Payment Prediction in Free-to-Play

INTRODUCTION

Page 4: User Payment Prediction in Free-to-Play

Who is Bigpoint GmbH?

The company

The BI team

The project

Page 5: User Payment Prediction in Free-to-Play

What is Predictive Analytics?

Predict future behaviour based on past and current data

Process:

Page 6: User Payment Prediction in Free-to-Play

Problem Definition

Player Lifetime Value

$LTV_t = t \cdot p_t \cdot n_t \cdot c$

where:

$t$: timeframe of calculation

$p_t$: average payment within timeframe

$n_t$: number of payments within timeframe

$c$: other factors such as profit margin, discount rate, etc.
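As a quick worked example of the formula above, the sketch below simply multiplies the four terms. The function name and all numbers are made up for illustration and are not taken from the thesis data.

```python
def lifetime_value(t, avg_payment, n_payments, c=1.0):
    """LTV_t = t * p_t * n_t * c, term by term as in the formula above."""
    return t * avg_payment * n_payments * c


# Hypothetical player: timeframe t = 3, average payment 5.0,
# 2 payments in the timeframe, combined factor c = 0.5.
ltv = lifetime_value(t=3, avg_payment=5.0, n_payments=2, c=0.5)
print(ltv)
```

Every term enters as a plain multiplier, so any error in an extrapolated input carries straight through to the result.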

Page 7: User Payment Prediction in Free-to-Play

Problem Definition

Normally, LTV is predicted by a very simple extrapolation of current and past data

This ignores all the factors underlying the variables in the equation and usually yields inaccurate forecasts!

Page 8: User Payment Prediction in Free-to-Play

Problem Statement

“Through the huge amount of data collected about the players in a free-to-play game, which includes player personal information, geographical information, game experience information, temporal information, etc., can we predict if a player who registered within a certain period will pay real currency inside the game within a specified timeframe?”
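The problem statement amounts to a binary label per player. A minimal sketch of how such a label could be derived is shown below; the data layout (a registration date per player and a list of payment dates) and all names are assumptions for illustration, not the schema used in the thesis.

```python
from datetime import date, timedelta


def label_payers(registrations, payments, horizon_days):
    """Return {player_id: 1 if the player paid within horizon_days
    of registering, else 0}."""
    labels = {}
    for player_id, registered_on in registrations.items():
        deadline = registered_on + timedelta(days=horizon_days)
        paid = any(registered_on <= day <= deadline
                   for day in payments.get(player_id, []))
        labels[player_id] = int(paid)
    return labels


# Hypothetical toy data: three players and their payment dates.
registrations = {
    "a": date(2017, 1, 1),
    "b": date(2017, 1, 5),
    "c": date(2017, 2, 1),
}
payments = {"a": [date(2017, 1, 10)], "b": [date(2017, 3, 1)]}
labels = label_payers(registrations, payments, horizon_days=30)
print(labels)
```

Player "b" does pay, but only after the 30-day horizon, so under this definition the label is 0.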

Page 9: User Payment Prediction in Free-to-Play

Reviewing Literature

Sifa et al. use classification and regression to predict the purchase decision and the number of payments for an F2P mobile game. They use Decision Trees, SVM and Random Forests for classification, and Poisson Regression Trees for the payment count.

Xie et al. use a simple approach to obtain generic features that are independent of the game. They use only the frequency of different game events to predict player churn and first payment.

Kim et al. use combined classifiers to predict the user purchase decision in an e-commerce application. The combination is done via a Genetic Algorithm that models the classifiers as individuals and bases the fitness on the hit ratio of the classifiers.

Page 10: User Payment Prediction in Free-to-Play

METHODOLOGY

Page 11: User Payment Prediction in Free-to-Play

METHODOLOGY: DATA COLLECTION

Page 12: User Payment Prediction in Free-to-Play

Big Data Environment

Page 13: User Payment Prediction in Free-to-Play

Data Collection

The dataset contains around 300,000 players who registered within a 3-month period

The dataset contains dimensions covering players' personal information, character information, game activity and interaction, as well as payment information

Page 14: User Payment Prediction in Free-to-Play

METHODOLOGY: DATA ANALYSIS

Page 15: User Payment Prediction in Free-to-Play

Payuser Distribution

Page 16: User Payment Prediction in Free-to-Play

Data Analysis

Page 17: User Payment Prediction in Free-to-Play

Dataset Visualization

Page 18: User Payment Prediction in Free-to-Play

Cluster analysis

Page 19: User Payment Prediction in Free-to-Play

METHODOLOGY: DATA MODELLING

Page 20: User Payment Prediction in Free-to-Play

Feature Selection

Spearman’s Coefficient

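$r_s = \frac{\operatorname{cov}(\operatorname{rank}(x), \operatorname{rank}(y))}{\sigma_{\operatorname{rank}(x)} \cdot \sigma_{\operatorname{rank}(y)}}$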
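To make the coefficient concrete, here is a small pure-Python sketch that ranks both variables (ties get the average rank) and then computes the covariance of the ranks divided by the product of their standard deviations. The function names are illustrative. The 1/n factors of the covariance and the standard deviations cancel, so they are left out.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Correlation of the rank-transformed variables.
    Assumes neither variable is constant (otherwise the denominator is 0)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# A monotone but nonlinear relationship still gives a coefficient of about 1.
print(spearman([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))
```

Because only the ranks matter, the coefficient captures any monotone relationship between a feature and the target, not just a linear one.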

Mutual Information

$MI(X, Y) = \sum_{x, y} P_{XY}(x, y) \log \frac{P_{XY}(x, y)}{P_X(x) \cdot P_Y(y)}$
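The mutual information sum can be estimated from empirical frequencies when both variables are discrete. The sketch below does this in pure Python with the natural logarithm. The function name is illustrative, and continuous features would first need binning, which is not shown.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """MI(X, Y) in nats for two equal-length sequences of discrete values,
    using empirical frequencies as probability estimates."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * log(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


# A feature that is independent of the label carries no information ...
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))
# ... while a feature identical to a balanced binary label gives log(2).
print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))
```

Only value pairs that actually occur contribute to the sum, so zero-probability terms never reach the logarithm.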

Page 21: User Payment Prediction in Free-to-Play

Class Imbalance

Class imbalance occurs when one of the predicted classes has far fewer samples than the others

It is bad for classifiers because they learn to predict everything as the majority class, which still gives high accuracy

Solutions?

Use a different performance measure

Balance the dataset by sampling

Undersampling

Oversampling

Combined

Weighted cost functions
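The accuracy trap and the simplest of the sampling remedies can be illustrated in a few lines. This is a sketch with hypothetical numbers (5 payers among 100 players) and binary labels, where 1 marks a payer. It is not the sampling code used in the thesis.

```python
import random


def majority_baseline_accuracy(labels):
    """Accuracy of a classifier that always predicts the most common class."""
    majority = max(set(labels), key=labels.count)
    return labels.count(majority) / len(labels)


def random_undersample(samples, labels, seed=0):
    """Randomly drop samples of the larger class (labels are 0/1)
    until both classes have the same size."""
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    if len(minority) > len(majority):
        minority, majority = majority, minority
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [samples[i] for i in kept], [labels[i] for i in kept]


labels = [1] * 5 + [0] * 95      # hypothetical: 5% payers
samples = list(range(100))       # stand-in for feature vectors

# High accuracy although not a single payer is found.
print(majority_baseline_accuracy(labels))

balanced_x, balanced_y = random_undersample(samples, labels)
print(sum(balanced_y), len(balanced_y))
```

Undersampling discards majority-class data, which is one reason oversampling or weighted cost functions can be preferable.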

Page 22: User Payment Prediction in Free-to-Play

Class Imbalance

Suitable performance measures

True Positive Rate (Sensitivity, Recall)

$TPR = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}}$

True Negative Rate (Specificity)

$TNR = \frac{\text{True Negative}}{\text{True Negative} + \text{False Positive}}$

False Negative Rate

$FNR = \frac{\text{False Negative}}{\text{False Negative} + \text{True Positive}}$

False Positive Rate

$FPR = \frac{\text{False Positive}}{\text{False Positive} + \text{True Negative}}$
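The four rates follow directly from the confusion-matrix counts. The sketch below computes them for binary labels, where 1 marks a payer. The function name and the toy predictions are illustrative. By construction, TPR + FNR = 1 and TNR + FPR = 1.

```python
def confusion_rates(y_true, y_pred):
    """Return accuracy and the four rates for binary labels (1 = positive).
    Assumes both classes occur in y_true."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return {
        "ACC": (tp + tn) / len(pairs),
        "TPR": tp / (tp + fn),
        "TNR": tn / (tn + fp),
        "FNR": fn / (fn + tp),
        "FPR": fp / (fp + tn),
    }


# Hypothetical predictions: 4 payers, 6 non-payers.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
rates = confusion_rates(y_true, y_pred)
print(rates)
```

In this toy case only one of the four payers is found, which the TPR shows much more clearly than the accuracy does.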

Page 23: User Payment Prediction in Free-to-Play

Class Imbalance

Synthetic Minority Oversampling TEchnique (SMOTE)
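The core idea of SMOTE is to create synthetic minority samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The following is a simplified pure-Python sketch of that idea, not the reference algorithm or the implementation used in the thesis. The function name, the random choice of base samples and the toy points are assumptions for illustration.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each on the line segment between a
    random minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        # Sort the remaining minority samples by squared distance to base.
        others = [p for i, p in enumerate(minority) if i != base_idx]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Hypothetical minority samples in a 2-D feature space.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority, n_new=6, k=2)
print(new_points)
```

Unlike plain oversampling by duplication, the synthetic points are new locations in feature space, which is intended to reduce overfitting to repeated minority samples.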

Page 24: User Payment Prediction in Free-to-Play

The Classification Problem

Which classifiers to use? We use criteria to help:

1. Offer a good true positive rate

2. Handle nonlinear feature space

3. Generalize well and do not overfit when some of the class balancing techniques are used

4. Allow adjusting the weights of the classes or optimizing the cost function of the classifier
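Criterion 4 can be illustrated with a weighted cost function. The sketch below uses a common inverse-frequency heuristic for the class weights and applies them to a binary log loss. The heuristic, the function names and the numbers are assumptions, since the slides do not state which weighting scheme was used.

```python
from collections import Counter
from math import log


def balanced_class_weights(labels):
    """weight_c = n_samples / (n_classes * n_c): rarer classes weigh more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in counts}


def weighted_log_loss(labels, probs, weights):
    """Binary cross-entropy with each sample scaled by its class weight.
    probs[i] is the predicted probability that sample i is a payer (class 1)
    and must lie strictly between 0 and 1."""
    total = 0.0
    for y, p in zip(labels, probs):
        total += -weights[y] * (log(p) if y == 1 else log(1 - p))
    return total / len(labels)


labels = [1] * 5 + [0] * 95          # hypothetical: 5% payers
weights = balanced_class_weights(labels)
print(weights)

# A missed payer now costs far more than an equally wrong non-payer.
print(weighted_log_loss([1], [0.1], weights))
print(weighted_log_loss([0], [0.9], weights))
```

With such weights the classifier can no longer minimise its cost by predicting the majority class for everyone.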

Page 25: User Payment Prediction in Free-to-Play

Classifiers

Support Vector Machines

Weighted Random Forests

Gradient Boosting

Page 26: User Payment Prediction in Free-to-Play

EXPERIMENTS

Page 27: User Payment Prediction in Free-to-Play

Experiment 1

Goal: To test the performance of the classifiers without application of SMOTE

Settings:

Weighted Random Forests

number of trees = 500

number of random features to use = 10

SVM

kernel = RBF

gamma = 1

C = 200

Gradient Boosting

number of trees = 150

depth of tree = 3
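For readers who want to try the Experiment 1 settings, the configuration fragment below shows one way they might map onto scikit-learn estimators. The slides do not say which library was used, so the choice of scikit-learn, the parameter mapping, and `class_weight="balanced"` as a stand-in for the forest's class weighting are all assumptions.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.svm import SVC

# Assumed mapping of the slide settings to scikit-learn parameters.
classifiers = {
    # number of trees = 500, number of random features to use = 10
    # (max_features=10 requires at least 10 input features at fit time)
    "wRF": RandomForestClassifier(
        n_estimators=500, max_features=10, class_weight="balanced"
    ),
    # kernel = RBF, gamma = 1, C = 200
    "SVM": SVC(kernel="rbf", gamma=1, C=200),
    # number of trees = 150, depth of tree = 3
    "GBM": GradientBoostingClassifier(n_estimators=150, max_depth=3),
}
```

Each estimator would then be trained with `fit(X, y)` on the original, unbalanced training data, since this experiment does not apply SMOTE.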

Page 28: User Payment Prediction in Free-to-Play

Experiment 2

Goal: To test the performance of the classifiers after application of SMOTE

Settings:

Weighted Random Forests

number of trees = 100

number of random features to use = 10

SVM

kernel = RBF

gamma = 0.5

C = 300

Gradient Boosting

number of trees = 150

depth of tree = 3

Page 29: User Payment Prediction in Free-to-Play

RESULTS AND FINDINGS

Page 30: User Payment Prediction in Free-to-Play

SVM Results

AUC without SMOTE = 0.8639

AUC with SMOTE = 0.8969

Page 31: User Payment Prediction in Free-to-Play

Random Forests Results

AUC without SMOTE = 0.9537

AUC with SMOTE = 0.9607

Page 32: User Payment Prediction in Free-to-Play

Gradient Boosting Results

AUC without SMOTE = 0.8831

AUC with SMOTE = 0.8953
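The AUC values reported on these slides summarise a ROC curve in one number. It can be read as the probability that a randomly chosen payer receives a higher score than a randomly chosen non-payer. The sketch below computes it that way for binary labels. The function name and toy scores are illustrative, and the pairwise loop is only meant for small examples.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive sample has the higher score; ties count as half a win.
    Assumes both classes are present."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical scores: one payer is ranked below one non-payer.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Because the AUC depends only on the ranking of the scores, it is unaffected by the classification threshold and is less sensitive to class imbalance than accuracy.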

Page 33: User Payment Prediction in Free-to-Play

Classifiers Performance

Experiment 1

Algorithm  ACC   TPR   TNR   FPR   FNR   AUC
SVM        0.95  0.25  0.99  0.01  0.55  0.8639
wRF        0.96  0.62  0.97  0.03  0.38  0.9537
GBM        0.89  0.19  0.97  0.03  0.81  0.8831

Page 34: User Payment Prediction in Free-to-Play

Classifiers Performance

Experiment 2

Algorithm  ACC   TPR   TNR   FPR   FNR   AUC
SVM        0.95  0.39  0.99  0.01  0.61  0.8969
wRF        0.97  0.66  0.97  0.03  0.34  0.9607
GBM        0.94  0.36  0.97  0.03  0.64  0.8953

Page 35: User Payment Prediction in Free-to-Play

Findings

Using SMOTE improves the classifiers' performance: the AUC increases for all three classifiers

The TPR is still low, which could be attributed to the selected features

Gradient Boosting seems to overfit due to the large number of sequential trees

Although Random Forests grows more developed and deeper trees, it is highly parallelizable, in contrast to Gradient Boosting, which is sequential in nature; Random Forests is therefore faster and preferable in our case with a big dataset, while SVM was the worst in terms of computation time

The results confirm our suspicion that the classes overlap

Page 36: User Payment Prediction in Free-to-Play

CONCLUSION AND FUTURE WORK

Page 37: User Payment Prediction in Free-to-Play

Summing Up

The goal was to create a framework or a process that helps BI predict user payments using machine learning, so that they can optimize their output analysis and improve targeting

We have followed the predictive analytics procedure from data collection, to analysis, to modelling

We have shown that the methodology has potential and delivers acceptable performance; however, the open issues we found need to be addressed before starting the last step, deployment

Page 38: User Payment Prediction in Free-to-Play

Future Work

To make the predictions more useful, we also want to predict:

Number of payments

Value of payments

Add more features, such as in-game activities and the game's technical performance

Address the class overlapping problem, using more data from different time windows, as well as the newly introduced features

Integrate the final framework into the current running systems used by BI

Page 39: User Payment Prediction in Free-to-Play

Questions?
