
How to Win Texas Hold’em Poker

Richard Mealing

Machine Learning and Optimisation Group, School of Computer Science

University of Manchester


How to Play Texas Hold’em Poker

1 Deal 2 private cards per player

2 1st (sequential) betting round

3 Deal 3 shared cards (“flop”)

4 2nd betting round

5 Deal 1 shared card (“turn”)

6 3rd betting round

7 Deal 1 shared card (“river”)

8 4th (final) betting round

If all players but 1 fold, the remaining player wins the pot (the total amount bet)

Otherwise, at the end of the game the remaining players' hands are compared (the “showdown”) and the player with the best hand wins the pot
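A minimal sketch of the dealing sequence above. It assumes a standard 52-card deck and a two-character card encoding (rank then suit, e.g. "As") chosen only for illustration. Betting is omitted, and each player's 2 cards are dealt together rather than one at a time around the table:

```python
import random

RANKS = "23456789TJQKA"
SUITS = "shdc"

def deal_hand(num_players: int, seed: int = 0) -> dict:
    """Deal the cards of one Texas Hold'em hand (betting rounds omitted)."""
    deck = [r + s for r in RANKS for s in SUITS]  # 52 cards, e.g. "As", "Td"
    random.Random(seed).shuffle(deck)
    # Step 1: 2 private cards per player
    private = [[deck.pop(), deck.pop()] for _ in range(num_players)]
    # Step 2: 1st betting round would happen here
    flop = [deck.pop() for _ in range(3)]   # Step 3: 3 shared cards
    # Step 4: 2nd betting round
    turn = [deck.pop()]                     # Step 5: 1 shared card
    # Step 6: 3rd betting round
    river = [deck.pop()]                    # Step 7: 1 shared card
    # Step 8: 4th (final) betting round, then a showdown if 2+ players remain
    return {"private": private, "board": flop + turn + river, "undealt": len(deck)}

hand = deal_hand(num_players=2)
print(hand["private"], hand["board"])
```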



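Ante = forced bet paid by every player

Blinds = forced bets paid by 2 players (the big blind and the small blind)

If players > 2 then the positions are (big blind player, small blind player, dealer)

If players = 2 (“heads-up”) then the positions are (big blind, small blind/dealer), i.e. the dealer also posts the small blind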

No-Limit Texas Hold’em lets you bet all your money in a round

Minimum bet = big blind

Maximum bet = all your money

Limit Texas Hold’em Poker has fixed betting limits

A $4/$8 game means that bets are $4 in betting rounds 1 & 2 and $8 in betting rounds 3 & 4

The big blind usually equals the “small” bet, e.g. $4, and the small blind is usually 50% of the big blind, e.g. $2

The total number of raises per betting round is usually capped at 4 or 5
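A sketch of the $4/$8 limit structure described above. It assumes a raise cap of 4 per betting round (the text says the cap is usually 4 or 5), and `bet_size` and `can_raise` are hypothetical helper names:

```python
SMALL_BET, BIG_BET = 4, 8      # a $4/$8 game
BIG_BLIND = SMALL_BET          # the big blind usually equals the small bet
SMALL_BLIND = BIG_BLIND // 2   # the small blind is usually 50% of the big blind
RAISE_CAP = 4                  # assumption: 4 raises per betting round (often 4 or 5)

def bet_size(betting_round: int) -> int:
    """Fixed bet/raise size for betting rounds 1-4."""
    if betting_round not in (1, 2, 3, 4):
        raise ValueError("betting_round must be 1, 2, 3 or 4")
    return SMALL_BET if betting_round <= 2 else BIG_BET

def can_raise(raises_so_far: int) -> bool:
    """A raise is only legal while the per-round cap has not been reached."""
    return raises_so_far < RAISE_CAP

print([bet_size(r) for r in (1, 2, 3, 4)])  # [4, 4, 8, 8]
print(SMALL_BLIND, BIG_BLIND)               # 2 4
```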


1-Card Poker Trees

1 Game tree - both players’ private cards are known

2 Public tree - both players’ private cards are hidden

3 P1 information set tree - P2’s private card is hidden
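To make the tree types above concrete, here is a sketch that groups the deals of a 1-card poker game by what each view reveals. It assumes, as in the example game later in these slides, that each player is dealt a J or a K. It covers only the chance part of the trees; betting is left out, and the helper names are illustrative:

```python
from collections import defaultdict
from itertools import product

CARDS = ("J", "K")
deals = list(product(CARDS, repeat=2))  # (P1 card, P2 card)

def group_by(view):
    """Group deals that look identical under `view` (a function of the deal)."""
    groups = defaultdict(list)
    for deal in deals:
        groups[view(deal)].append(deal)
    return dict(groups)

game_tree = group_by(lambda d: d)        # both private cards known: 4 separate deals
public_tree = group_by(lambda d: ())     # both private cards hidden: 1 group
p1_info_sets = group_by(lambda d: d[0])  # P1 sees only its own card

print(len(game_tree), len(public_tree), p1_info_sets)
# 4 1 {'J': [('J', 'J'), ('J', 'K')], 'K': [('K', 'J'), ('K', 'K')]}
```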

Heads-Up Limit Texas Hold’em Poker Tree Size

[Figure: betting tree for one betting round, with fold (F), check/call (C) and raise (R) actions]

Cards Dealt

P1 dealt 2 private cards = $\binom{52}{2} = 1326$

P2 dealt 2 private cards = $\binom{50}{2} = 1225$

1st betting round = 29, of which 9 continue to the next round

Flop dealt = $\binom{48}{3} = 17296$

2nd betting round = 29, of which 9 continue

Turn dealt = 45

3rd betting round = 29, of which 9 continue

River dealt = 44

4th betting round = 29

States at each level of the tree:

Player 1 Deal = 1

Player 2 Deal = 1326

1st Betting Round = 1326 * 1225 * 29

2nd Betting Round = 1326 * 1225 * 9 * 17296 * 29

3rd Betting Round = 1326 * 1225 * 9 * 17296 * 9 * 45 * 29

4th Betting Round = 1326 * 1225 * 9 * 17296 * 9 * 45 * 9 * 44 * 29

Total = $1.179 \times 10^{18}$ (1.179 quintillion)
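The total can be reproduced with a few lines of Python. The per-round figures 29 and 9 are taken from the slide rather than derived, and the variable names are illustrative:

```python
from math import comb

BETTING_ROUND = 29   # from the slide: size of one betting round
CONTINUING = 9       # from the slide: sequences that continue to the next round

p1_cards = comb(52, 2)   # 1326
p2_cards = comb(50, 2)   # 1225
flop = comb(48, 3)       # 17296
turn, river = 45, 44     # cards left for the turn and the river

# Number of ways to reach the start of each betting round.
reach_round = [
    p1_cards * p2_cards,
    p1_cards * p2_cards * CONTINUING * flop,
    p1_cards * p2_cards * CONTINUING * flop * CONTINUING * turn,
    p1_cards * p2_cards * CONTINUING * flop * CONTINUING * turn * CONTINUING * river,
]
# 1 root (P1 deal) + 1326 P2 deal nodes + every betting round
total = 1 + p1_cards + sum(n * BETTING_ROUND for n in reach_round)
print(f"{total:.3e}")  # 1.179e+18
```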

Abstraction

Lossless

Suit isomorphism: at the start (pre-flop) two hands are strategically the same if each of their cards’ ranks match and they are both “suited” or both “off-suit”, e.g. (A♠K♠, A♣K♣) or (T♣J♠, T♦J♥). The resulting 169 equivalence classes reduce the possible pairs of starting hands from 1624350 (1326 × 1225) to 28561 (169²)
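A sketch of the pre-flop suit isomorphism: it maps each of the 1326 two-card hands to a canonical key (ranks in descending order plus a suited/off-suit flag) and counts the classes. The key format, e.g. "AKs" or "JTo", is a common convention assumed here, not something defined in these slides:

```python
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "shdc"
DECK = [(r, s) for r in RANKS for s in SUITS]

def canonical(hand):
    """Map a 2-card hand to its pre-flop equivalence class, e.g. 'AKs', 'JTo' or 'AA'."""
    (r1, s1), (r2, s2) = sorted(hand, key=lambda c: RANKS.index(c[0]), reverse=True)
    if r1 == r2:
        return r1 + r2  # a pair can never be suited
    return r1 + r2 + ("s" if s1 == s2 else "o")

hands = list(combinations(DECK, 2))
classes = {canonical(h) for h in hands}
print(len(hands), len(classes), len(classes) ** 2)  # 1326 169 28561
```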

Lossy

Bucketing (binning) groups hands into equivalence classes, e.g. based on their probability of winning at showdown against a random hand

Imperfect recall eliminates past information

Betting round reduction

Betting round elimination
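A minimal sketch of bucketing by showdown win probability. It assumes the win probabilities against a random hand have already been estimated (the values below are made-up placeholders, not real equities) and uses equal-width buckets, which is only one of several possible bucketing schemes:

```python
def bucket(win_prob: float, num_buckets: int) -> int:
    """Equal-width bucketing: map a win probability in [0, 1] to 0..num_buckets-1."""
    if not 0.0 <= win_prob <= 1.0:
        raise ValueError("win_prob must be in [0, 1]")
    return min(int(win_prob * num_buckets), num_buckets - 1)

# Hypothetical win probabilities against a random hand (placeholders only).
estimated = {"hand_a": 0.85, "hand_b": 0.52, "hand_c": 0.31, "hand_d": 0.34}
buckets = {h: bucket(p, num_buckets=5) for h, p in estimated.items()}
print(buckets)  # {'hand_a': 4, 'hand_b': 2, 'hand_c': 1, 'hand_d': 1}
```

Hands that share a bucket (here `hand_c` and `hand_d`) become indistinguishable in the abstraction.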


Heads-up Limit Texas Hold’em poker has around $10^{18}$ states

Abstraction can reduce the game to e.g. $10^{7}$ states

Nesterov’s excessive gap technique can find approximate Nash equilibria in a game with $10^{10}$ states

Counterfactual regret minimization can find approximate Nash equilibria in a game with $10^{12}$ states

Nash Equilibrium

Game theoretic solution

A set of strategies, 1 per player, such that no player can do better by changing their strategy if the others keep their strategies fixed

Nash proved that every game with a finite number of players and a finite number of pure strategies has at least 1 (possibly mixed) Nash equilibrium

Annual Computer Poker Competition 2012

Heads-up Limit Texas Hold’em

Total Bankroll:
1 Slumbot (Eric Jackson, USA)
2 Little Rock (Rod Byrnes, Australia) and Zbot (Ilkka Rajala, Finland)

Bankroll Instant Run-off:
1 Slumbot (Eric Jackson, USA)
2 Hyperborean (University of Alberta, Canada)
3 Zbot (Ilkka Rajala, Finland)

Heads-up No-Limit Texas Hold’em

Total Bankroll:
1 Little Rock (Rod Byrnes, Australia)
3 Tartanian5 (Carnegie Mellon University, USA)

Bankroll Instant Run-off:
1 Hyperborean (University of Alberta, Canada)
2 Tartanian5 (Carnegie Mellon University, USA)
3 Neo Poker Bot (Alexander Lee, Spain)

3-player Limit Texas Hold’em

Total Bankroll:
3 Neo Poker Bot (Alexander Lee, Spain) and Sartre (University of Auckland, New Zealand)

Bankroll Instant Run-off:
1 Hyperborean (University of Alberta, Canada)
3 Neo Poker Bot (Alexander Lee, Spain) and Sartre (University of Auckland, New Zealand)

Source: http://www.computerpokercompetition.org/index.php/competitions/results/90-2012-results


Annual Computer Poker Competition

Total Bankroll = total money won against all agents

Bankroll Instant Run-off:
1 Set S = all agents
2 Set N = the agents in a single game, so |N| is the number of players per match
3 Play every one of the $\binom{|S|}{|N|}$ possible matches between agents in S, storing each agent’s total bankroll
4 Remove the agent(s) with the lowest total bankroll from S
5 Repeat steps 3 and 4 until S only contains |N| agents
6 Play a match between the last |N| agents and rank them according to their total bankroll in this game
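A sketch of the run-off procedure above. `play_match` is a hypothetical stand-in for actually playing a match (it returns each seated agent's winnings), and tie handling is simplified: the loop stops early rather than leave fewer than |N| agents:

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence

# Hypothetical match simulator: seated agents -> each agent's winnings in that match.
PlayMatch = Callable[[Sequence[str]], Dict[str, float]]

def total_bankrolls(agents: Sequence[str], seats: int, play_match: PlayMatch) -> Dict[str, float]:
    """Play all C(|S|, |N|) matches among `agents` and sum each agent's winnings."""
    totals = {a: 0.0 for a in agents}
    for match in combinations(agents, seats):
        for agent, won in play_match(match).items():
            totals[agent] += won
    return totals

def instant_runoff(agents: Sequence[str], seats: int, play_match: PlayMatch) -> List[str]:
    """Rank agents best-first using bankroll instant run-off."""
    remaining = list(agents)
    eliminated_rounds: List[List[str]] = []
    while len(remaining) > seats:
        totals = total_bankrolls(remaining, seats, play_match)
        lowest = min(totals.values())
        losers = [a for a in remaining if totals[a] == lowest]
        if len(remaining) - len(losers) < seats:
            break  # simplification: keep everyone rather than drop below |N| agents
        eliminated_rounds.append(losers)
        remaining = [a for a in remaining if a not in losers]
    final = total_bankrolls(remaining, seats, play_match)
    ranking = sorted(remaining, key=lambda a: final[a], reverse=True)
    for losers in reversed(eliminated_rounds):  # later eliminations rank higher
        ranking.extend(losers)
    return ranking

# Toy heads-up example: the more skilled agent always wins 1 unit.
SKILL = {"A": 3, "B": 2, "C": 1, "D": 0}

def heads_up(match: Sequence[str]) -> Dict[str, float]:
    a, b = match
    winner, loser = (a, b) if SKILL[a] > SKILL[b] else (b, a)
    return {winner: 1.0, loser: -1.0}

print(instant_runoff(list(SKILL), seats=2, play_match=heads_up))  # ['A', 'B', 'C', 'D']
```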


Extensive-Form Game

A finite set of players $N = \{1, 2, \ldots, |N|\}$, plus chance $c$

A finite set of action sequences or histories, e.g. H = {(), ..., (A♥A♠), ...}

$Z \subseteq H$ is the set of terminal histories, e.g. Z = {..., (A♥A♠, 2♦7♣, R, F), ...}

$A(h) = \{a : (h, a) \in H\}$ is the set of actions available after history $h \in H \setminus Z$

$P(h) \in N \cup \{c\}$ is the player who takes an action after history $h \in H \setminus Z$

$u_i : Z \to \mathbb{R}$ is the utility function for player $i$


$f_c$ maps every history $h$ where $P(h) = c$ to an independent probability distribution $f_c(a \mid h)$ over $a \in A(h)$

$\mathcal{I}_i$ is an information partition for player $i$: a set of nonempty subsets of $X = \{h \in H : P(h) = i\}$ such that each element of $X$ is in exactly 1 subset

$I_j \in \mathcal{I}_i$ is player $i$’s $j$th information set, containing histories that player $i$ cannot distinguish, e.g. $I_j$ = {..., (A♥A♠, 2♦7♣), ..., (A♥A♠, 6♣3♠), ...}

Player $i$’s strategy $\sigma_i$ is a function that assigns a distribution over $A(I_j)$ for all $I_j \in \mathcal{I}_i$, where $A(I_j) = A(h)$ for any $h \in I_j$

A strategy profile $\sigma$ is a strategy for each player, $\sigma = \{\sigma_1, \sigma_2, \ldots, \sigma_{|N|}\}$
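As a sketch of these definitions, the code below stores part of an information partition as a mapping from information-set names to histories and checks that a strategy assigns a probability distribution over $A(I_j)$ at every information set. The histories and probabilities are P1's first decision in the 1-card poker example on the following slides; `available_actions` and `check_strategy` are hypothetical names:

```python
import math
from typing import Dict, List, Tuple

History = Tuple[str, ...]

def available_actions(history: History) -> frozenset:
    """A(h): in this example P1's first decision always offers check/call or raise."""
    return frozenset({"C", "R"})

def check_strategy(partition: Dict[str, List[History]],
                   strategy: Dict[str, Dict[str, float]]) -> None:
    """Raise ValueError unless `strategy` assigns a distribution over A(I_j) to every I_j."""
    for name, histories in partition.items():
        action_sets = {available_actions(h) for h in histories}
        # All histories in one information set must share the same actions: A(I_j) = A(h).
        if len(action_sets) != 1:
            raise ValueError(f"{name}: histories offer different actions")
        (a_ij,) = action_sets
        dist = strategy.get(name)
        if dist is None or set(dist) != a_ij:
            raise ValueError(f"{name}: strategy must cover exactly A(I_j)")
        if any(p < 0 for p in dist.values()) or not math.isclose(sum(dist.values()), 1.0):
            raise ValueError(f"{name}: not a probability distribution")

# Histories are (P1 card, P2 card); P1 cannot see P2's card.
partition_p1 = {"I1": [("J", "J"), ("J", "K")],   # P1 holds J
                "I2": [("K", "J"), ("K", "K")]}   # P1 holds K
sigma_1 = {"I1": {"C": 0.6, "R": 0.4}, "I2": {"C": 0.3, "R": 0.7}}
check_strategy(partition_p1, sigma_1)  # passes: both are distributions over {C, R}
```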


Nash Equilibrium

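Nash Equilibrium:

$$u_1(\sigma) \ge \max_{\sigma_1' \in \Sigma_1} u_1(\sigma_1', \sigma_2)$$

$$u_2(\sigma) \ge \max_{\sigma_2' \in \Sigma_2} u_2(\sigma_1, \sigma_2')$$

ε-Nash Equilibrium:

$$u_1(\sigma) + \varepsilon \ge \max_{\sigma_1' \in \Sigma_1} u_1(\sigma_1', \sigma_2)$$

$$u_2(\sigma) + \varepsilon \ge \max_{\sigma_2' \in \Sigma_2} u_2(\sigma_1, \sigma_2')$$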
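For a one-shot two-player zero-sum matrix game (a simpler setting than the extensive-form games here, chosen to keep the example small), the smallest ε for which a profile is an ε-Nash equilibrium can be computed directly. A best response can always be found among the pure strategies. The rock-paper-scissors matrix and the function name are illustrative assumptions:

```python
from typing import List, Sequence

Matrix = List[List[float]]

# Rock-paper-scissors: payoff to P1 (rows); P2 (columns) receives the negation.
RPS: Matrix = [[0, -1, 1],
               [1, 0, -1],
               [-1, 1, 0]]

def nash_epsilon(payoff: Matrix, x: Sequence[float], y: Sequence[float]) -> float:
    """Smallest epsilon for which the mixed profile (x, y) is an epsilon-Nash equilibrium."""
    rows, cols = range(len(x)), range(len(y))
    u1 = sum(x[i] * payoff[i][j] * y[j] for i in rows for j in cols)
    u2 = -u1  # zero-sum
    # max over sigma_1' of u_1(sigma_1', sigma_2): a pure best response suffices.
    best1 = max(sum(payoff[i][j] * y[j] for j in cols) for i in rows)
    best2 = max(-sum(x[i] * payoff[i][j] for i in rows) for j in cols)
    return max(best1 - u1, best2 - u2)

uniform = [1 / 3] * 3
print(nash_epsilon(RPS, uniform, uniform))          # uniform play is a Nash equilibrium
print(nash_epsilon(RPS, [1.0, 0.0, 0.0], uniform))  # always rock: P2 gains 1 with paper
```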


Extensive-Form Game

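[Figure: game tree of a 1-card poker game. Chance deals P1 a card and then P2 a card, each J or K with probability 0.5. P1 checks/calls (C) or raises (R); after C, P2 plays C or R; after R, P2 folds (F) or calls (C); after C then R, P1 folds (F) or calls (C). Leaves show P1's payoff: a player who folds loses 1; a showdown after C, C is worth 1 and a showdown after a called raise is worth 2 to the player with the higher card (0 if the cards tie).
Strategy profile $\sigma$ shown in the tree:
P1: $I_1$ (holds J) C 0.6, R 0.4; $I_2$ (holds K) C 0.3, R 0.7; $I_7$ (holds J, after C, R) F 1.0, C 0.0; $I_8$ (holds K, after C, R) F 0.0, C 1.0
P2: $I_3$ (holds J, after C) C 0.8, R 0.2; $I_5$ (holds K, after C) C 0.1, R 0.9; $I_4$ (holds J, after R) F 1.0, C 0.0; $I_6$ (holds K, after R) F 0.0, C 1.0]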
$\mathcal{I}_1 = \{I_1, I_2, I_7, I_8\}$ and $\mathcal{I}_2 = \{I_3, I_4, I_5, I_6\}$

$A((J, J)) = \{C, R\}$ and $P((J, J)) = 1$

$f_c(J \mid (J)) = 0.5$ and $f_c(K \mid (J)) = 0.5$

$\sigma_1(I_1, C) = 0.6$ and $\sigma_1(I_1, R) = 0.4$


Counterfactual Regret Minimization

Counterfactual regret minimization minimizes the maximum counterfactual regret (over all actions) at every information set

The overall regret is bounded by the sum of the positive counterfactual regrets, so minimizing counterfactual regrets minimizes overall regret

In a two-player zero-sum game at time $T$, if both players’ average overall regret is less than $\varepsilon$, then the average strategy profile $\bar{\sigma}^T$ is a $2\varepsilon$-Nash equilibrium


Counterfactual Value

$$v_i(I_j \mid \sigma) = \sum_{n \in I_j} \pi^\sigma_{-i}(\text{root}, n)\, u_i(n)$$

$$u_i(n) = \sum_{z \in Z[n]} \pi^\sigma(n, z)\, u_i(z)$$

$v_i(I_j \mid \sigma)$ is the counterfactual value to player $i$ of information set $I_j$ given strategy profile $\sigma$

$\pi^\sigma_{-i}(\text{root}, n)$ is the probability of reaching node $n$ from the root according to strategy profile $\sigma$, ignoring player $i$’s contributions (i.e. as if player $i$ played to reach $n$)

$\pi^\sigma(n, z)$ is the probability of reaching node $z$ from node $n$ according to strategy profile $\sigma$

$u_i(n)$ is the payoff to player $i$ at node $n$ if it is a leaf node, or its expected payoff if it is a non-leaf node

$Z[n]$ is the set of terminal nodes that can be reached from node $n$
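A sketch that evaluates these two formulas on the 1-card poker example from the Extensive-Form Game slides. The encoding of betting histories as strings of actions and all helper names are assumptions; the strategy values and payoffs are read off the example tree:

```python
from itertools import product

DEAL_PROB = {"J": 0.5, "K": 0.5}       # f_c: each player is dealt J or K with prob. 0.5
RANK = {"J": 0, "K": 1}
NON_TERMINAL = {"", "C", "R", "CR"}     # betting histories where someone still acts
SHOWDOWN_STAKE = {"CC": 1, "CRC": 2, "RC": 2}
FOLD_PAYOFF_P1 = {"CRF": -1, "RF": 1}   # payoff to P1 when a player folds

# Strategy profile sigma read off the example tree, keyed by information set.
SIGMA = {
    "I1": {"C": 0.6, "R": 0.4}, "I2": {"C": 0.3, "R": 0.7},  # P1 holds J / K
    "I3": {"C": 0.8, "R": 0.2}, "I5": {"C": 0.1, "R": 0.9},  # P2 holds J / K, after C
    "I4": {"F": 1.0, "C": 0.0}, "I6": {"F": 0.0, "C": 1.0},  # P2 holds J / K, after R
    "I7": {"F": 1.0, "C": 0.0}, "I8": {"F": 0.0, "C": 1.0},  # P1 holds J / K, after C, R
}
INFO_SET = {("", "J"): "I1", ("", "K"): "I2", ("C", "J"): "I3", ("C", "K"): "I5",
            ("R", "J"): "I4", ("R", "K"): "I6", ("CR", "J"): "I7", ("CR", "K"): "I8"}

def player(history: str) -> int:
    return 1 if history in ("", "CR") else 2

def info_set(cards, history: str) -> str:
    """The acting player sees the betting history and only their own card."""
    return INFO_SET[(history, cards[player(history) - 1])]

def payoff(cards, history: str, i: int) -> float:
    """u_i(z) at a terminal history z (zero-sum, so u_2 = -u_1)."""
    if history in FOLD_PAYOFF_P1:
        u1 = FOLD_PAYOFF_P1[history]
    else:
        diff = RANK[cards[0]] - RANK[cards[1]]
        u1 = ((diff > 0) - (diff < 0)) * SHOWDOWN_STAKE[history]
    return u1 if i == 1 else -u1

def expected_payoff(cards, history: str, sigma, i: int) -> float:
    """u_i(n) = sum over z in Z[n] of pi^sigma(n, z) * u_i(z)."""
    if history not in NON_TERMINAL:
        return payoff(cards, history, i)
    dist = sigma[info_set(cards, history)]
    return sum(p * expected_payoff(cards, history + a, sigma, i) for a, p in dist.items())

def counterfactual_value(target: str, sigma, i: int) -> float:
    """v_i(I|sigma) = sum over n in I of pi^sigma_{-i}(root, n) * u_i(n)."""
    def walk(cards, history: str, reach_others: float) -> float:
        if history not in NON_TERMINAL:
            return 0.0
        name = info_set(cards, history)
        if name == target:
            return reach_others * expected_payoff(cards, history, sigma, i)
        total = 0.0
        for a, p in sigma[name].items():
            # pi_{-i} multiplies in chance and opponent probabilities, not player i's own.
            weight = 1.0 if player(history) == i else p
            total += walk(cards, history + a, reach_others * weight)
        return total

    return sum(walk((c1, c2), "", DEAL_PROB[c1] * DEAL_PROB[c2])
               for c1, c2 in product(DEAL_PROB, repeat=2))

print(round(counterfactual_value("I8", SIGMA, 1), 10))  # 0.1
```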

[Figure: the 1-card poker game tree and strategy profile $\sigma$ from the Extensive-Form Game slides]

$I_8$ contains 2 nodes, both with P1 holding K after P1 checks and P2 raises: P2 holds J (and raises with probability 0.2) or P2 holds K (and raises with probability 0.9).

$$\begin{aligned} v_1(I_8 \mid \sigma) &= \sum_{n \in I_8} \pi^\sigma_{-1}(\text{root}, n)\, u_1(n) \\ &= 0.5 \cdot 0.5 \cdot 0.2 \cdot (0.0 \cdot (-1) + 1.0 \cdot 2) + 0.5 \cdot 0.5 \cdot 0.9 \cdot (0.0 \cdot (-1) + 1.0 \cdot 0) \\ &= 0.1 \end{aligned}$$


Counterfactual Regret

$$r(I_j, a) = v_i(I_j \mid \sigma_{I_j \to a}) - v_i(I_j \mid \sigma)$$

$r(I_j, a)$ is the counterfactual regret of not playing action $a$ at information set $I_j$, where $\sigma_{I_j \to a}$ is $\sigma$ except that player $i$ always plays $a$ at $I_j$

Positive regret means the player would have preferred to play action $a$ rather than their strategy

Zero regret means the player was indifferent between their strategy and action $a$

Negative regret means the player preferred their strategy rather than playing action $a$

[Figure: the 1-card poker game tree with P1's strategy at $I_8$ replaced by always folding (F 1.0, C 0.0)]

$$v_1(I_8 \mid \sigma_{I_8 \to F}) = 0.5 \cdot 0.5 \cdot 0.2 \cdot (1.0 \cdot (-1) + 0.0 \cdot 2) + 0.5 \cdot 0.5 \cdot 0.9 \cdot (1.0 \cdot (-1) + 0.0 \cdot 0) = -0.275$$

$$r_1(I_8, F) = v_1(I_8 \mid \sigma_{I_8 \to F}) - v_1(I_8 \mid \sigma) = -0.275 - 0.1 = -0.375$$
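When every action at an information set leads straight to a terminal node, as at $I_8$, the counterfactual value only needs the opponent-and-chance reach probability of each node and the payoff of each action there. A small sketch under that assumption, using the numbers from the example above (the function and variable names are hypothetical):

```python
from typing import Dict, List, Tuple

# One entry per node n in the information set:
# (pi^sigma_{-i}(root, n), {action: payoff to player i if that action is taken at n})
Node = Tuple[float, Dict[str, float]]

def cf_value(nodes: List[Node], strategy: Dict[str, float]) -> float:
    """v_i(I | sigma) for an information set whose actions all end the game."""
    return sum(reach * sum(strategy[a] * u for a, u in payoffs.items())
               for reach, payoffs in nodes)

def cf_regret(nodes: List[Node], strategy: Dict[str, float], action: str) -> float:
    """r(I, a) = v_i(I | sigma_{I->a}) - v_i(I | sigma)."""
    always_action = {a: 1.0 if a == action else 0.0 for a in strategy}
    return cf_value(nodes, always_action) - cf_value(nodes, strategy)

i8_nodes = [(0.5 * 0.5 * 0.2, {"F": -1.0, "C": 2.0}),   # P1 holds K, P2 holds J
            (0.5 * 0.5 * 0.9, {"F": -1.0, "C": 0.0})]   # P1 holds K, P2 holds K
sigma_i8 = {"F": 0.0, "C": 1.0}
print(round(cf_value(i8_nodes, sigma_i8), 3), round(cf_regret(i8_nodes, sigma_i8, "F"), 3))
# 0.1 -0.375
```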



Cumulative Counterfactual Regret

$$R^T(I_j, a) = \sum_{t=1}^{T} r^t(I_j, a)$$

$R^T(I_j, a)$ is the cumulative counterfactual regret of not playing action $a$ at information set $I_j$ for $T$ time steps

Positive cumulative regret means the player would have preferred to play action $a$ rather than their strategy over those $T$ steps

Zero cumulative regret means the player was indifferent between their strategy and action $a$ over those $T$ steps

Negative cumulative regret means the player preferred their strategy rather than playing action $a$ over those $T$ steps


Regret Matching

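$$\sigma^{T+1}(I_j, a) = \begin{cases} \dfrac{R^{T,+}(I_j, a)}{\sum_{a' \in A(I_j)} R^{T,+}(I_j, a')} & \text{if the denominator is positive} \\[2ex] \dfrac{1}{|A(I_j)|} & \text{otherwise} \end{cases}$$

$$R^{T,+}(I_j, a) = \max(R^T(I_j, a), 0)$$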
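A sketch of cumulative regret and regret matching at a single information set, with regrets kept in a dictionary keyed by action. The regret values in the usage lines are made up for illustration, and the function names are hypothetical:

```python
from typing import Dict

def accumulate(cumulative: Dict[str, float], regrets: Dict[str, float]) -> None:
    """R^T(I, a) = R^{T-1}(I, a) + r^T(I, a), updated in place."""
    for a, r in regrets.items():
        cumulative[a] = cumulative.get(a, 0.0) + r

def regret_matching(cumulative: Dict[str, float]) -> Dict[str, float]:
    """sigma^{T+1}(I, a): proportional to positive cumulative regret, else uniform."""
    positive = {a: max(r, 0.0) for a, r in cumulative.items()}
    total = sum(positive.values())
    if total > 0:
        return {a: r / total for a, r in positive.items()}
    return {a: 1.0 / len(cumulative) for a in cumulative}

R = {"F": 0.0, "C": 0.0}
accumulate(R, {"F": -0.375, "C": 0.0})   # made-up regrets for iteration 1
accumulate(R, {"F": 0.5, "C": -0.25})    # made-up regrets for iteration 2
print(R, regret_matching(R))  # {'F': 0.125, 'C': -0.25} {'F': 1.0, 'C': 0.0}
```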



1 Initialise the strategy profile $\sigma$, e.g. for all $i \in N$, for all $I_j \in \mathcal{I}_i$ and for all $a \in A(I_j)$ set $\sigma(I_j, a) = \frac{1}{|A(I_j)|}$

2 For each player $i \in N$, for all $I_j \in \mathcal{I}_i$ and for all $a \in A(I_j)$, calculate $r(I_j, a)$ and add it to $R(I_j, a)$

3 For each player $i \in N$, for all $I_j \in \mathcal{I}_i$ and for all $a \in A(I_j)$, use regret matching to update $\sigma(I_j, a)$

4 Repeat from 2
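The loop above can be tried out on a one-shot two-player zero-sum matrix game, where each player has a single information set and counterfactual regret reduces to ordinary regret. This is a simplified stand-in for full CFR on a game tree, and the payoff matrix is a made-up example whose unique equilibrium is (0.4, 0.6) for both players. The sketch also tracks the average strategy, since the convergence guarantee applies to $\bar{\sigma}^T$ rather than to the last iterate:

```python
# Hypothetical zero-sum matrix game: payoff to P1 (rows); P2 (columns) gets the negation.
# Its unique Nash equilibrium is (0.4, 0.6) for both players.
PAYOFF = [[2.0, -1.0],
          [-1.0, 1.0]]

def regret_matching(cum_regret):
    """Play in proportion to positive cumulative regret, or uniformly if there is none."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    n = len(cum_regret)
    return [p / total for p in positive] if total > 0 else [1.0 / n] * n

def solve(iterations):
    regret1, regret2 = [0.0, 0.0], [0.0, 0.0]        # cumulative regrets R(I, a)
    strat_sum1, strat_sum2 = [0.0, 0.0], [0.0, 0.0]
    for _ in range(iterations):
        # Step 3 (regret matching); on the first pass the regrets are zero,
        # so this gives the uniform strategy of step 1.
        x, y = regret_matching(regret1), regret_matching(regret2)
        # Value of each action against the opponent's current strategy.
        u1 = [sum(PAYOFF[i][j] * y[j] for j in range(2)) for i in range(2)]
        u2 = [-sum(x[i] * PAYOFF[i][j] for i in range(2)) for j in range(2)]
        v1 = sum(x[i] * u1[i] for i in range(2))
        v2 = sum(y[j] * u2[j] for j in range(2))
        for a in range(2):
            regret1[a] += u1[a] - v1   # step 2: add r(I, a) to R(I, a)
            regret2[a] += u2[a] - v2
            # In a game tree the average would be weighted by the player's own reach
            # probability; here the single information set is always reached.
            strat_sum1[a] += x[a]
            strat_sum2[a] += y[a]
    return ([s / iterations for s in strat_sum1],
            [s / iterations for s in strat_sum2])

avg1, avg2 = solve(10_000)
print([round(p, 3) for p in avg1], [round(p, 3) for p in avg2])
```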



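Average counterfactual regret, $R^T_i(I_j) = \frac{1}{T} \max_{a \in A(I_j)} R^T(I_j, a)$, is bounded by

$$R^T_i(I_j) \le \frac{\left(\max_z u_i(z) - \min_z u_i(z)\right) \sqrt{|A(I_j)|}}{\sqrt{T}}$$

Average overall regret is bounded by

$$R^T_i \le \frac{|\mathcal{I}_i| \left(\max_z u_i(z) - \min_z u_i(z)\right) \sqrt{\max_{h : P(h) = i} |A(h)|}}{\sqrt{T}}$$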
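Rearranging the second bound gives a worst-case iteration count for a target average overall regret $\varepsilon$: $T \ge \left(\Delta\, |\mathcal{I}_i| \sqrt{\max_h |A(h)|} / \varepsilon\right)^2$, with $\Delta = \max_z u_i(z) - \min_z u_i(z)$. A small calculator follows. The usage line describes P1 in the 1-card poker example (4 information sets, 2 actions each, payoffs from -2 to 2), and the helper names are hypothetical:

```python
import math

def regret_bound(delta: float, num_info_sets: int, max_actions: int, iterations: int) -> float:
    """Bound on player i's average overall regret after `iterations` iterations."""
    return delta * num_info_sets * math.sqrt(max_actions) / math.sqrt(iterations)

def iterations_needed(delta: float, num_info_sets: int, max_actions: int, eps: float) -> int:
    """Smallest T for which regret_bound(...) <= eps."""
    t = max(1, math.ceil((delta * num_info_sets * math.sqrt(max_actions) / eps) ** 2))
    # Correct for floating-point rounding around the exact solution.
    while regret_bound(delta, num_info_sets, max_actions, t) > eps:
        t += 1
    while t > 1 and regret_bound(delta, num_info_sets, max_actions, t - 1) <= eps:
        t -= 1
    return t

# P1 in the 1-card poker example: 4 information sets, 2 actions each, payoffs in [-2, 2].
T = iterations_needed(delta=4, num_info_sets=4, max_actions=2, eps=0.01)
print(T, regret_bound(4, 4, 2, T))
```

Because this is a worst-case guarantee, it only gives an upper limit on how many iterations are needed.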



(a) Number of game states, number of iterations, computation time, and exploitability of the resulting strategy for different sized abstractions

(b) Convergence rates for three different sized abstractions; the x-axis shows iterations divided by the number of information sets in the abstraction

Source: 2008 - “Regret Minimization in Games with Incomplete Information” - Zinkevich et al

Summary

If you want to win (in expectation) at Texas Hold’em poker (against exploitable players) then...

1 Abstract the version of Texas Hold’em poker you are interested in so that it has at most $10^{12}$ game states

2 Run the counterfactual regret minimization algorithm on the abstraction for $T$ iterations and obtain the average strategy profile $\bar{\sigma}^T_{\text{abs}}$

3 Map the average strategy profile $\bar{\sigma}^T_{\text{abs}}$ for the abstracted game to a strategy profile $\bar{\sigma}^T$ for the real game

4 Play your average strategy profile $\bar{\sigma}^T$ against your (exploitable) opponents

References

1 Annual Computer Poker Competition website - http://www.computerpokercompetition.org/

2 2008 - “Regret Minimization in Games with Incomplete Information” - Zinkevich et al - http://martin.zinkevich.org/publications/regretpoker.pdf

3 2007 - “Robust Strategies and Counter-Strategies: Building a Champion Level Computer Poker Player” - Johanson - http://poker.cs.ualberta.ca/publications/johanson.msc.pdf

4 2013 - “Monte Carlo Sampling and Regret Minimization for Equilibrium Computation and Decision-Making in Large Extensive Form Games” - Lanctot - http://era.library.ualberta.ca/public/view/item/uuid:482ae86c-2045-4c12-b91c-3e7ce09bc9ae