BYU Computer Science Department Hierarchical Bayesian Models for Rating Individual Players from Group Competitions Joshua E. Menke

Post on 19-Dec-2015


BYU Computer Science Department

Hierarchical Bayesian Models for Rating Individual Players

from Group Competitions

Joshua E. Menke


Why Rank and Rate?

• Ranking in groups important

• Sports, executive teams between competing corporations, military training, etc.

• Computer and Video Gaming Industry

– Big business: $18 billion gross output in U.S. in 2004

• Players prefer games that help them compare themselves.

• Use for balancing teams: TrueSkill™

• Use for game / level design.


Brief Rating Background

• Elo (1978) for Chess

– Thurstone Case V: Normal distribution

– Later modified to use a logistic distribution

• Glickman (1999, 2001) for Chess

– Bradley-Terry Model (Bradley and Terry, 1952)

– Uncertainty based on number of matches played and time between matches.


Rating Players From Groups

• TrueSkill™ (Herbrich, Graepel, 2006)

– Generalized Bayesian Thurstone Case V

• Huang (2006)

– Generalized Bradley-Terry (Maximum Likelihood)

• Menke et al. (2006)

– Hierarchical Bayesian Bradley-Terry

– Extensions: improve predictions / analyze game


Bradley-Terry Model

• Two opponents, ability parameters $\lambda_1$ and $\lambda_2$, probability the first opponent wins:

$\lambda_1 / (\lambda_1 + \lambda_2)$

• Current logistic Elo uses Bradley-Terry with

$\lambda_x = \exp(\theta_x)$.

• Wider distribution:

• Allows weaker players a greater chance of winning.
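The link between logistic Elo and Bradley-Terry can be checked numerically. The sketch below assumes the usual logistic Elo expected-score formula with a 400-point scale and maps each rating R to a Bradley-Terry strength exp(R ln 10 / 400); the function names are illustrative only.

```python
import math

def bradley_terry(lam_1, lam_2):
    """Probability that opponent 1 beats opponent 2 given strengths lam_1, lam_2."""
    return lam_1 / (lam_1 + lam_2)

def elo_expected_score(r_1, r_2):
    """Logistic Elo expected score for the player rated r_1 against r_2."""
    return 1.0 / (1.0 + 10.0 ** ((r_2 - r_1) / 400.0))

r_1, r_2 = 1600.0, 1500.0
scale = math.log(10.0) / 400.0  # converts Elo points to the exp() scale
p_bt = bradley_terry(math.exp(scale * r_1), math.exp(scale * r_2))
p_elo = elo_expected_score(r_1, r_2)
print(p_bt, p_elo)  # the two probabilities agree up to rounding
```

Only the rating difference matters in either form, which is why a constant can be added to every rating without changing any prediction.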


Wolfenstein: Enemy Territory™

• Two Teams or Sides, WWII: Axis vs. Allies

• Objective-based

• Multiplayer

• Online: Players come, go, change teams

• Asymmetry: Team sizes / Map fairness

• Soccer (Football) Example

• Splash Damage, London


Map-Side in Enemy Territory

• Axis side vs. Allies side

• Matches take place on certain maps

• Different objectives for each side

• Player i on side s for map m


First Data Set

• Matches: 100 per server across 3 servers, 300 in total

• Players: 877

• Matches per Player: ~ 7


Data Example

InitGame: ...\mapname\fueldump\...
Winner: AXIS Time: 1800000
Name: |R!P|Orpheo GUID DFBB5: Axis: 0 Allies: 1450200
Name: |R!P|Crazyeskimo GUID EF071: Axis: 1549800 Allies: 0
Name: sliveR GUID 0A589: Axis: 1614950 Allies: 0
Name: DaSaNi GUID 3F6C7: Axis: 1278400 Allies: 0
Name: BlackSheep GUID 6C875: Axis: 352600 Allies: 1336200*

* Played on both teams

• Map Name, Winner, Duration

• Name, GUID, milliseconds on Axis, Allies
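A record like the one above can be pulled apart with a few regular expressions. This is a minimal sketch that assumes the field layout is exactly as in the example (the real log format is elided on the slide, so the patterns and the `parse_match` name are hypothetical).

```python
import re

# Patterns assume the layout shown in the data example.
MAP_RE = re.compile(r"\\mapname\\([^\\]+)\\")
HEADER_RE = re.compile(r"Winner: (\w+) Time: (\d+)")
PLAYER_RE = re.compile(r"Name: (.+?) GUID (\w+): Axis: (\d+) Allies: (\d+)")

def parse_match(text):
    """Return map name, winner, duration (ms), and per-player time on each side."""
    winner, duration = HEADER_RE.search(text).groups()
    map_match = MAP_RE.search(text)
    players = [
        {"name": name, "guid": guid, "axis_ms": int(axis), "allies_ms": int(allies)}
        for name, guid, axis, allies in PLAYER_RE.findall(text)
    ]
    return {
        "map": map_match.group(1) if map_match else None,
        "winner": winner,
        "duration_ms": int(duration),
        "players": players,
    }

sample = r"""InitGame: ...\mapname\fueldump\...
Winner: AXIS Time: 1800000
Name: |R!P|Orpheo GUID DFBB5: Axis: 0 Allies: 1450200
Name: BlackSheep GUID 6C875: Axis: 352600 Allies: 1336200
"""
match = parse_match(sample)
for player in match["players"]:
    # Fraction of the match each player spent on Axis
    print(player["name"], player["axis_ms"] / match["duration_ms"])
```

Keying players by GUID rather than by name is the safer choice, since names can change between matches.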


Model

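Bayes Law:

$$p(\theta \mid \text{matches}) = \frac{p(\text{matches} \mid \theta)\, p(\theta)}{\int p(\text{matches} \mid \theta)\, p(\theta)\, d\theta}$$

We need:

• Prior: $p(\theta)$, model the individual players

• Likelihood: $p(\text{matches} \mid \theta)$, model match outcomes given players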


Basic Player Model

• Let $\theta_i$ represent player i’s ability to help their side win a match

• A simple model for $\theta_i$

$\theta_i \sim N(\mu, \sigma_\theta^2)$


Basic

$\theta_i \sim N(\mu, \sigma_\theta^2)$

• Let $\mu = 0$ without loss of generality

• $\sigma_\theta^2$ is given a prior distribution

• Symmetric around 0: good players +, bad players –

• But: Assumes map-side has no effect


Accounting for Map-Side Effects

• Map fairness varied in Enemy Territory

• Sometimes harder for Axis, and vice versa

• Basic model naïve

• Map effects uniform for all players


Accounting for Map-Side Effects

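• Let $\theta_{i,m\text{-}s}$ represent player i’s ability to help side s win a match played on map m:

$\theta_{i,m\text{-}s} \equiv \theta_i + \beta_{m\text{-}s}$ with $\theta_i \sim N(0, \sigma_\theta^2)$

• $\sigma_\theta^2$ given a prior distribution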

• Player’s rating increases or decreases based on map-side


Accounting for Map-Side Effects

$\theta_{i,m\text{-}s} \equiv \theta_i + \beta_{m\text{-}s}$

• Similar to Agresti’s (1988) “homefield” parameter, except with one parameter for Axis and one for Allies: a modeling decision made for simplicity.


Map-Side Effects

• More skilled team can have equal challenge for a given map by playing on the harder side

• Judge which maps are more balanced.

• Useful for map/level designers


Server Difficulty

• Compare players across different servers

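• Determine how a given server affects a player’s rating by adding a server bias $\gamma_j$

$\theta_{i,m\text{-}s,j} \equiv \theta_i + \beta_{m\text{-}s} + \gamma_j$

$\gamma_j \sim N(0, \sigma_\gamma^2)$

• With $\sigma_\gamma^2$ given a prior distribution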


Server Difficulty

$\theta_{i,m\text{-}s,j} \equiv \theta_i + \beta_{m\text{-}s} + \gamma_j$

• Modeled as an increase instead of decrease in player ability for simplicity.

• A lower $\gamma_j$, not a higher one, means a more difficult server.

• Player performance composed of base ability, map-side offset, and server difficulty


Server Difficulty

• Can use to choose servers

• Rank players globally across servers

• Requires some server “cross-over”


Likelihood

• Choose side s’s probability of winning a match on map m proportional to:

• Exponentiated sum of player ratings

• Modified by map-side and server

$$\lambda_{s,m} = \exp\Big(\sum_{i=1,\, i \in P_s}^{|P_s|} \theta_{i,m\text{-}s,j}\Big)$$


Bradley Terry Likelihood

• Probability of $s_{Axis}$ defeating $s_{Allies}$:

$$\frac{\lambda_{s_{Axis},m}}{\lambda_{s_{Axis},m} + \lambda_{s_{Allies},m}}$$
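As a rough illustration of the team-level Bradley-Terry probability, the sketch below assumes each player's rating has already been adjusted for map-side and server, so a side's strength is just the exponentiated sum of its players' ratings. The function names and the sample ratings are hypothetical.

```python
import math

def side_strength(adjusted_ratings):
    """Exponentiated sum of a side's (already adjusted) player ratings."""
    return math.exp(sum(adjusted_ratings))

def axis_win_probability(axis_ratings, allies_ratings):
    """Bradley-Terry probability that the Axis side defeats the Allies side."""
    lam_axis = side_strength(axis_ratings)
    lam_allies = side_strength(allies_ratings)
    return lam_axis / (lam_axis + lam_allies)

# Three Axis players against two Allies players (made-up ratings).
axis = [0.5, -0.2, 0.1]
allies = [0.3, 0.0]
print(axis_win_probability(axis, allies))
```

Because ratings are summed, a larger team of average players (rating 0) gains nothing from its extra members, so any team-size effect would have to be absorbed by the map-side term or modeled separately.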


Likelihood Function

• Product of map predictions

• G: total # of matches, w(g): winning side for match g, l(g): losing side, m: map

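$$P(w \mid \lambda) = \prod_{g=1}^{G} \lambda_{w(g),m} \left(\lambda_{w(g),m} + \lambda_{l(g),m}\right)^{-1}$$

$$\lambda_{w(g),m} = \exp\Big(\sum_{i=1,\, i \in P_{w(g)}}^{|P_{w(g)}|} \theta_{i,m\text{-}s,j}\Big)$$

$$\lambda_{l(g),m} = \exp\Big(\sum_{i=1,\, i \in P_{l(g)}}^{|P_{l(g)}|} \theta_{i,m\text{-}s,j}\Big)$$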


Public Server Problem

• Players come, go, change teams at will

• Need time played per team

• Available in original data


Simple Exposure Model

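• Weighted sum: % time played per team

$\tau_{i,w(g)}$ ($\tau_{i,l(g)}$): % of the total match time player i spent on the winning (losing) team

$$\lambda_{w(g),m} = \exp\Big(\sum_{i=1,\, i \in P_{w(g)}}^{|P_{w(g)}|} \tau_{i,w(g)}\, \theta_{i,m\text{-}s,j}\Big)$$

$$\lambda_{l(g),m} = \exp\Big(\sum_{i=1,\, i \in P_{l(g)}}^{|P_{l(g)}|} \tau_{i,l(g)}\, \theta_{i,m\text{-}s,j}\Big)$$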
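The exposure weighting can be sketched for a single match as follows. Time fractions are taken from the earlier data example (a 1,800,000 ms match won by Axis); the ratings in `theta` are made up for illustration and are assumed to already include any map-side and server adjustment.

```python
import math

def side_log_strength(exposure, theta):
    """Sum of tau_i * theta_i over the players who spent time on this side."""
    return sum(tau * theta[guid] for guid, tau in exposure.items())

def match_log_likelihood(win_exposure, lose_exposure, theta):
    """log of lambda_w / (lambda_w + lambda_l) for one match."""
    s_w = side_log_strength(win_exposure, theta)
    s_l = side_log_strength(lose_exposure, theta)
    return s_w - math.log(math.exp(s_w) + math.exp(s_l))

duration = 1800000  # match length in milliseconds
axis = {"EF071": 1549800 / duration, "6C875": 352600 / duration}     # winners
allies = {"DFBB5": 1450200 / duration, "6C875": 1336200 / duration}  # losers
theta = {"EF071": 0.4, "DFBB5": -0.1, "6C875": 0.2}  # hypothetical ratings
print(match_log_likelihood(axis, allies, theta))
```

A player who switched sides (GUID 6C875 here) contributes to both sums, weighted by the share of the match spent on each team. Summing this quantity over all matches gives the log of the likelihood product.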


Prior Selection

• Instead of non-informative priors, hyperprior distributions:

$\sigma_\theta^2$, $\sigma_\beta^2$, and $\sigma_\gamma^2$ ~ Inverse Gamma

• chosen such that the means are 1 and the variances 1/3.

• Keeps player ratings between −3 and 3

• Hyperpriors to infer relative differences
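The slide does not give the inverse gamma parameters themselves. Assuming the common shape/scale parameterization, where the mean is scale / (shape − 1) and the variance is scale² / ((shape − 1)² (shape − 2)), the parameters that match a target mean and variance can be solved for directly:

```python
from fractions import Fraction

def inverse_gamma_params(mean, variance):
    """Shape and scale of an inverse gamma with the given mean and variance."""
    shape = mean ** 2 / variance + 2
    scale = mean * (shape - 1)
    return shape, scale

# Mean 1 and variance 1/3, using exact fractions to avoid rounding.
shape, scale = inverse_gamma_params(Fraction(1), Fraction(1, 3))
print(shape, scale)  # prints: 5 4
```

Under that assumption, a mean of 1 and a variance of 1/3 correspond to shape 5 and scale 4.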


Fit with MCMC: Quickly

• Markov-Chain Monte Carlo Integration

• Samples complete conditional distributions

– Thousands of samples per parameter

– Take the mean / standard deviation of samples
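To make the sampling idea concrete, here is a small sketch that updates one rating at a time with a random-walk Metropolis step (often called Metropolis-within-Gibbs). It is a stand-in, not the sampler used for the talk: the prior variance is fixed at 1 instead of receiving a hyperprior, there are no map-side or server terms, no burn-in is discarded, and all names are hypothetical.

```python
import math
import random
import statistics

def log_posterior(theta, matches, sigma2):
    """Bradley-Terry log-likelihood plus independent N(0, sigma2) log-priors."""
    total = -sum(t * t for t in theta.values()) / (2.0 * sigma2)
    for winners, losers in matches:
        s_w = sum(theta[p] for p in winners)
        s_l = sum(theta[p] for p in losers)
        total += s_w - math.log(math.exp(s_w) + math.exp(s_l))
    return total

def sample_ratings(matches, players, n_samples=2000, step=0.5, sigma2=1.0, seed=0):
    rng = random.Random(seed)
    theta = {p: 0.0 for p in players}
    current = log_posterior(theta, matches, sigma2)
    draws = {p: [] for p in players}
    for _ in range(n_samples):
        for p in players:  # update one parameter at a time
            old = theta[p]
            theta[p] = old + rng.gauss(0.0, step)
            proposed = log_posterior(theta, matches, sigma2)
            # 1 - random() lies in (0, 1], so the log is always defined
            if math.log(1.0 - rng.random()) < proposed - current:
                current = proposed   # accept the proposal
            else:
                theta[p] = old       # reject and keep the old value
            draws[p].append(theta[p])
    return draws

# Player "a" beat player "b" five times in a row.
matches = [(["a"], ["b"])] * 5
draws = sample_ratings(matches, ["a", "b"])
for player, values in draws.items():
    print(player, statistics.mean(values), statistics.stdev(values))
```

The sample mean serves as the rating and the sample standard deviation as its uncertainty, mirroring the last bullet above.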


MCMC Results Example: 3-1

Ranked 2 standard deviations below mean

3rd place 8-1 vs. 8th place 9-0


Combined Server Difficulty

• Ranked in order of difficulty

• Lower posterior mean is more difficult

• Veterans could choose to play on server 2

• Newer players on Server 1


Combined Map-Side Effects

• Oasis biased towards Allies.

• Better players should play on Axis

• Venice a balanced map

• Of interest: both popular maps.


Bayesian $\chi^2$ Goodness-of-fit

• Valen Johnson, Annals of Statistics, 2004

• Yields p-values for joint samples

• Server 2 does have a less consistent player base

• Biased accuracy near 100%


Problems with MCMC

• Average Enemy Territory match:

– 15 minutes

• Time to fit 300 matches with MCMC:

– 30 minutes

• MCMC cannot keep up with new matches


Second Data Set

• Matches: 5,000

• Players: 2,000+

• Time for MCMC: On the order of days

• Common Efficient Solutions:

– Newton-Raphson method

– Elo / Glickman Update

– Expectation Propagation


Newton-Raphson Method

• Batch Gradient Descent

• L’: vector of first derivatives

• L’’: matrix of second partial derivatives

• k: current iteration

• Note: $[-L'']^{-1}$ is the covariance matrix of the multivariate normal approximation
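A one-parameter toy version shows the mechanics. The sketch assumes two players whose ability gap d has a N(0, sigma2) prior, with log posterior L(d) = wins·log σ(d) + losses·log σ(−d) − d²/(2·sigma2), where σ is the logistic function. This setup is an illustration, not the model from the talk; with a single parameter, L' and L'' are scalars and −1/L'' plays the role of the covariance matrix.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def newton_raphson_gap(wins, losses, sigma2=1.0, iterations=20):
    """Posterior mode and approximate variance for the ability gap d."""
    d = 0.0
    n = wins + losses
    for _ in range(iterations):
        p = sigmoid(d)
        grad = wins - n * p - d / sigma2          # L': first derivative
        hess = -n * p * (1.0 - p) - 1.0 / sigma2  # L'': second derivative
        d = d - grad / hess                       # Newton-Raphson step
    p = sigmoid(d)
    hess = -n * p * (1.0 - p) - 1.0 / sigma2
    return d, -1.0 / hess  # mode and normal-approximation variance

mode, variance = newton_raphson_gap(wins=8, losses=1)
print(mode, variance)
```

In the full model, d becomes the vector of all ratings and the division by L'' becomes a linear solve against the matrix of second partial derivatives, which is where the cost grows with the number of players.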


Problems with Newton-Raphson

• Requires storing match history and re-fitting the data after every match, which becomes impractical and slow.

– Preferable to update based on last match only

• Matrix of second partial derivatives too large.

– Millions of players: impossible to store.

– Takes too long to invert.


Recursive Newton-Raphson

• Based on Bottou and LeCun (2004)

• $\Phi_t$: a “leaky” approximation to $[-L'']^{-1}$ (covariance matrix).


Recursive Newton-Raphson

• Bottou and LeCun: Empirical / Theoretical

– asymptotically outperforms Newton-Raphson

– Any batch gradient descent method.


Applied to Enemy Territory

• Derive from the log posterior

• Priors instead from MCMC

• Example: Player Rating.

– Winning Time − Prediction − Shrinkage
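The three terms in the player-rating example fall out of differentiating the exposure-weighted log posterior for one match with respect to a single rating. The sketch below applies that gradient with a plain fixed step size, which is a simplification: it does not include the recursive approximation to the inverse second-derivative matrix, and the function name, `step`, and `shrink_weight` are all hypothetical.

```python
import math

def online_update(theta, tau_win, tau_lose, step=0.1, sigma2=1.0, shrink_weight=1.0):
    """One gradient step on the log posterior after a single match.

    tau_win / tau_lose map player id -> fraction of match time on that side.
    """
    s_w = sum(tau * theta[p] for p, tau in tau_win.items())
    s_l = sum(tau * theta[p] for p, tau in tau_lose.items())
    p_win = 1.0 / (1.0 + math.exp(s_l - s_w))  # predicted chance the winners win
    updated = dict(theta)
    for p in set(tau_win) | set(tau_lose):
        t_w = tau_win.get(p, 0.0)                        # winning time
        t_l = tau_lose.get(p, 0.0)
        prediction = t_w * p_win + t_l * (1.0 - p_win)   # expected winning time
        shrinkage = shrink_weight * theta[p] / sigma2    # pull toward the prior mean 0
        updated[p] = theta[p] + step * (t_w - prediction - shrinkage)
    return updated

theta = {"a": 0.0, "b": 0.0}
theta = online_update(theta, tau_win={"a": 1.0}, tau_lose={"b": 1.0})
print(theta)
```

Only the ratings of players in the latest match are touched, so no match history needs to be stored.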


Bayesian Shrinkage Terms

• Batch: applied once on entire set of matches

• Recursive: Applied once per update

– Weight each by 1/|matches|

• |matches| unknown a priori

– Weight by infinite geometric series $2^{-t-1}$

• Sums to 1.0, like applying once

• Effect of prior diminishes given data
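The claim that the weights sum to 1.0 can be verified directly, assuming the update index t starts at 0:

```latex
\sum_{t=0}^{\infty} 2^{-t-1}
  = \tfrac{1}{2} + \tfrac{1}{4} + \tfrac{1}{8} + \cdots
  = \frac{1/2}{1 - 1/2} = 1,
\qquad
\sum_{t=0}^{T-1} 2^{-t-1} = 1 - 2^{-T}.
```

After T updates, all but a fraction $2^{-T}$ of the shrinkage term has been applied, so nearly all of the prior's pull arrives in a player's first few matches.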


Time-Varying

• Recursive algorithms track time-varying differences

• Update a weighted sum of prior performance and recent performance

• Variance approximation leaky, can track changes over time.


Results: Accuracy

• Measured before updating for each match

• For an unfair comparison:

– TrueSkill™ Reported Large Teams: ~ 0.62

– More to show 70% is good.


Uses for Ratings

• Rank Players

• Improve Map Design

• Help Choose Servers

• Level up, MMORPG

– Clear progression path

– Play on easier servers first, “graduate” to harder ones


Active Team Balancing

• Public Server dynamics mean teams need to be balanced during play

• Greedy: Move player to bring probability of both teams winning closest to 50-50

• Uncomfortable for player moved

• Increases “fun” factor overall

• Sequential optimal design
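The greedy rule can be sketched as a search over every possible single-player move. This version assumes ratings are already adjusted for map-side and server, ignores time played, and places no constraint on team sizes; the function names and ratings are hypothetical.

```python
import math

def win_probability(team_a, team_b, theta):
    """Bradley-Terry probability that team_a beats team_b."""
    s_a = sum(theta[p] for p in team_a)
    s_b = sum(theta[p] for p in team_b)
    return 1.0 / (1.0 + math.exp(s_b - s_a))

def greedy_balance_move(team_a, team_b, theta):
    """Return (player, from, to) for the best single move, or None to stay put."""
    best_gap = abs(win_probability(team_a, team_b, theta) - 0.5)
    best_move = None
    for player in team_a:
        new_a = [p for p in team_a if p != player]
        gap = abs(win_probability(new_a, team_b + [player], theta) - 0.5)
        if gap < best_gap:
            best_gap, best_move = gap, (player, "a", "b")
    for player in team_b:
        new_b = [p for p in team_b if p != player]
        gap = abs(win_probability(team_a + [player], new_b, theta) - 0.5)
        if gap < best_gap:
            best_gap, best_move = gap, (player, "b", "a")
    return best_move

theta = {"p1": 1.0, "p2": 0.5, "p3": 0.2, "p4": -0.3, "p5": -0.6}
move = greedy_balance_move(["p1", "p2", "p3"], ["p4", "p5"], theta)
print(move)  # prints: ('p1', 'a', 'b')
```

Here moving the strongest player across brings the rating totals closest together. Returning None when no move improves the balance avoids shuffling players for no gain.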


Future Directions

• Explicitly Model time-varying changes

• Number of players vs. map-side rating

• Online Bayesian Neural Network Training

• Expectation-Propagation for this model

• Direct Comparisons to TrueSkill™


Questions?

• Thanks for coming!

• Demo if time: http://stats.etpub.org