How to Win Texas Hold’em Poker
Richard Mealing
Machine Learning and Optimisation Group
School of Computer Science
University of Manchester
How to Play Texas Hold’em Poker
1 Deal 2 private cards per player
2 1st (sequential) betting round
3 Deal 3 shared cards (“flop”)
4 2nd betting round
5 Deal 1 shared card (“turn”)
6 3rd betting round
7 Deal 1 shared card (“river”)
8 4th (final) betting round
If all but 1 player fold, that player wins the pot (the total amount bet)
Otherwise, at the end of the game, the remaining hands are compared (“showdown”) and the player with the best hand wins the pot
Ante = forced bet (everyone pays)
Blinds = forced bets (2 people pay big/small)
If players > 2 then (big blind player, small blind player, dealer)
If players = 2 (“heads-up”) then (big blind, small blind/dealer)
No-Limit Texas Hold’em lets you bet all your money in a round
Minimum bet = big blind
Maximum bet = all your money
Limit Texas Hold’em Poker has fixed betting limits
A $4/$8 game means in betting rounds 1 & 2 bets = $4 and in betting rounds 3 & 4 bets = $8
Big blind usually equals “small” bet e.g. $4 and small blind is usually 50% of big blind e.g. $2
Total number of raises per betting round is usually capped at 4 or 5
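As a rough illustration of the limit structure described above, the sketch below derives the bet size for each betting round and the usual blinds from the two limits. The function name `limit_structure` is made up for this example, and the blind sizes follow the "usually" conventions on the slide rather than any fixed rule.

```python
def limit_structure(small_bet, big_bet):
    """Return (bet size per betting round, big blind, small blind)."""
    # Rounds 1 and 2 use the small bet, rounds 3 and 4 use the big bet.
    bet_by_round = {1: small_bet, 2: small_bet, 3: big_bet, 4: big_bet}
    big_blind = small_bet        # big blind usually equals the small bet
    small_blind = big_blind / 2  # small blind is usually 50% of the big blind
    return bet_by_round, big_blind, small_blind


bets, big_blind, small_blind = limit_structure(4, 8)
print(bets, big_blind, small_blind)
```

For a $4/$8 game this gives $4 bets in rounds 1 and 2, $8 bets in rounds 3 and 4, a $4 big blind and a $2 small blind.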
1-Card Poker Trees
1 Game tree - both players’ private cards are known
2 Public tree - both players’ private cards are hidden
3 P1 information set tree - P2’s private card is hidden
4 P2 information set tree - P1’s private card is hidden
Heads-Up Limit Texas Hold’em Poker Tree Size
[Figure: betting tree for a single betting round, with edges labelled F, C and R.]
Cards Dealt
P1 dealt 2 private cards = $\binom{52}{2} = 1326$
P2 dealt 2 private cards = $\binom{50}{2} = 1225$
1st betting round = 29, 9 continuing
Flop dealt = $\binom{48}{3} = 17296$
2nd betting round = 29, 9 continuing
Turn dealt = 45
3rd betting round = 29, 9 continuing
River dealt = 44
4th betting round = 29
Player 1 Deal = 1
Player 2 Deal = 1326
1st Betting Round = 1326 * 1225 * 29
2nd Betting Round = 1326 * 1225 * 9 * 17296 * 29
3rd Betting Round = 1326 * 1225 * 9 * 17296 * 9 * 45 * 29
4th Betting Round = 1326 * 1225 * 9 * 17296 * 9 * 45 * 9 * 44 * 29
Total = $1.179 \times 10^{18}$ (about 1.179 quintillion)
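The total above can be checked with a few lines of Python. This is only a sketch of the slide's arithmetic: the constants 29 and 9 are taken from the slide as given, and the variable names are chosen for this example.

```python
from math import comb

PER_ROUND = 29   # the slide's count for one betting round
CONTINUING = 9   # of those, the ones that continue to the next round

p1_hands = comb(52, 2)   # player 1's private cards
p2_hands = comb(50, 2)   # player 2's private cards
flops = comb(48, 3)      # 3 shared cards from the remaining 48
turns, rivers = 45, 44   # 1 shared card each

deals = p1_hands * p2_hands
round1 = deals * PER_ROUND
round2 = deals * CONTINUING * flops * PER_ROUND
round3 = deals * CONTINUING * flops * CONTINUING * turns * PER_ROUND
round4 = (deals * CONTINUING * flops * CONTINUING * turns
          * CONTINUING * rivers * PER_ROUND)

# Player 1 deal (1) + player 2 deal (1326) + the four betting rounds.
total = 1 + p1_hands + round1 + round2 + round3 + round4
print(f"{total:.3e}")  # prints 1.179e+18
```

The fourth betting round dominates the sum; the earlier levels change only the later digits.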
Abstraction
Lossless
Suit isomorphism: at the start (pre-flop) two hands are strategically the same if each of their cards’ ranks match and they are both “suited” or both “off-suit” e.g. (A♠K♠, A♣K♣) or (T♣J♠, T♦J♥). The resulting 169 equivalence classes reduce the possible pairs of starting hands from 1326 × 1225 = 1624350 to 169 × 169 = 28561
Lossy
Bucketing (binning) groups hands into equivalence classes e.g. based on their probability of winning at showdown against a random hand
Imperfect recall eliminates past information
Betting round reduction
Betting round elimination
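The 169 pre-flop classes can be counted by brute force. The sketch below enumerates every 2-card hand and maps it to a key made of its two ranks plus a pair/suited/off-suit flag. The card encoding and the helper `preflop_class` are assumptions made for this example.

```python
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [(rank, suit) for rank in RANKS for suit in SUITS]


def preflop_class(hand):
    """Map a 2-card hand to its suit-isomorphism class."""
    (rank_a, suit_a), (rank_b, suit_b) = hand
    high, low = sorted((rank_a, rank_b), key=RANKS.index, reverse=True)
    if high == low:
        return (high, low, "pair")
    return (high, low, "suited" if suit_a == suit_b else "off-suit")


hands = list(combinations(DECK, 2))
classes = {preflop_class(hand) for hand in hands}
print(len(hands), len(classes), len(classes) ** 2)  # prints 1326 169 28561
```

The 169 classes split into 13 pairs, 78 suited and 78 off-suit rank combinations.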
Heads-up Limit Texas Hold’em poker has around $10^{18}$ states
Abstraction can reduce the game to e.g. $10^{7}$ states
Nesterov’s excessive gap technique can find approximate Nash equilibria in a game with $10^{10}$ states
Counterfactual regret minimization can find approximate Nash equilibria in a game with $10^{12}$ states
Nash Equilibrium
Game theoretic solution
Set of strategies, 1 per player, such that no one can do better by changing their strategy if the others keep their strategies fixed
Nash proved that in every game with a finite number of players and a finite number of pure strategies there is at least 1 (possibly mixed) Nash equilibrium
Annual Computer Poker Competition 2012
Heads-up Limit Texas Hold’em
Total Bankroll:
1 Slumbot (Eric Jackson, USA)
2 Little Rock (Rod Byrnes, Australia) and Zbot (Ilkka Rajala, Finland)
Bankroll Instant Run-off:
1 Slumbot (Eric Jackson, USA)
2 Hyperborean (University of Alberta, Canada)
3 Zbot (Ilkka Rajala, Finland)
Heads-up No-Limit Texas Hold’em
Total Bankroll:
1 Little Rock (Rod Byrnes, Australia)
2 Hyperborean (University of Alberta, Canada)
3 Tartanian5 (Carnegie Mellon University, USA)
Bankroll Instant Run-off:
1 Hyperborean (University of Alberta, Canada)
2 Tartanian5 (Carnegie Mellon University, USA)
3 Neo Poker Bot (Alexander Lee, Spain)
3-player Limit Texas Hold’em
Total Bankroll:
1 Hyperborean (University of Alberta, Canada)
2 Little Rock (Rod Byrnes, Australia)
3 Neo Poker Bot (Alexander Lee, Spain) and Sartre (University of Auckland, New Zealand)
Bankroll Instant Run-off:
1 Hyperborean (University of Alberta, Canada)
2 Little Rock (Rod Byrnes, Australia)
3 Neo Poker Bot (Alexander Lee, Spain) and Sartre (University of Auckland, New Zealand)
Source: http://www.computerpokercompetition.org/index.php/competitions/results/90-2012-results
Annual Computer Poker Competition
Total Bankroll = total money won against all agents
Bankroll Instant Run-off
1 Set S = all agents
2 Set N = agents in a game
3 Play every $\binom{|S|}{|N|}$ possible matches between agents in S, storing each agent’s total bankroll
4 Remove the agent(s) with the lowest total bankroll from S
5 Repeat steps 3 and 4 until S only contains |N| agents
6 Play a match between the last |N| agents and rank them according to their total bankroll in this game
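The instant run-off procedure can be sketched as a small loop. Everything below is illustrative: `instant_runoff`, the `play_match` callback and the toy `SKILL` table are assumptions made for this example, not the competition's code, and ties among the finalists are not handled.

```python
from itertools import combinations


def instant_runoff(agents, play_match, players_per_match=2):
    """Rank agents best-first as a list of groups (tied agents share a group)."""
    remaining = list(agents)
    eliminated = []  # groups of agents, in order of elimination
    while len(remaining) > players_per_match:
        bankroll = {agent: 0 for agent in remaining}
        # Play every possible match between the remaining agents.
        for match in combinations(remaining, players_per_match):
            for agent, winnings in play_match(match).items():
                bankroll[agent] += winnings
        # Remove the agent(s) with the lowest total bankroll.
        lowest = min(bankroll.values())
        eliminated.append([a for a in remaining if bankroll[a] == lowest])
        remaining = [a for a in remaining if bankroll[a] != lowest]
    # Final match between the last agents decides the top places.
    final = play_match(tuple(remaining)) if len(remaining) == players_per_match else {}
    finalists = sorted(remaining, key=lambda a: final.get(a, 0), reverse=True)
    return [[a] for a in finalists] + eliminated[::-1]


# Toy stand-in for a poker match: the stronger agent wins the skill difference.
SKILL = {"A": 3, "B": 1, "C": 2, "D": 0}


def toy_match(match):
    x, y = match
    diff = SKILL[x] - SKILL[y]
    return {x: diff, y: -diff}


ranking = instant_runoff(SKILL, toy_match)
print(ranking)  # prints [['A'], ['C'], ['B'], ['D']]
```

Unlike total bankroll, this ranking ignores how much an agent wins from opponents that have already been eliminated.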
Extensive-Form Game
A finite set of players N = {1, 2, ..., |N|} ∪ {c}
A finite set of action sequences or histories e.g. H = {(), ..., (A♥A♠), ...}
Z ⊆ H terminal histories e.g. Z = {..., (A♥A♠, 2♦7♣, r, F), ...}
A(h) = {a : (h, a) ∈ H} actions available after history h ∈ H\Z
P(h) ∈ N ∪ {c} player who takes an action after history h ∈ H\Z
$u_i : Z \to \mathbb{R}$ utility function for player i
$f_c$ maps every history h where P(h) = c to an independent probability distribution $f_c(a|h)$ for all a ∈ A(h)
$\mathcal{I}_i$ is an information partition for player i: a set of nonempty subsets of X = {h ∈ H : P(h) = i} where each element of X is in exactly 1 subset
$I_j \in \mathcal{I}_i$ is player i’s jth information set containing indistinguishable histories e.g. $I_j$ = {..., (A♥A♠, 2♦7♣), ..., (A♥A♠, 6♣3♠), ...}
Player i’s strategy $\sigma_i$ is a function that assigns a distribution over $A(I_j)$ for all $I_j \in \mathcal{I}_i$ where $A(I_j) = A(h)$ for any $h \in I_j$
A strategy profile σ is a strategy for each player, $\sigma = \{\sigma_1, \sigma_2, ..., \sigma_{|N|}\}$
Nash Equilibrium
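Nash Equilibrium:
$$u_1(\sigma) \ge \max_{\sigma'_1 \in \Sigma_1} u_1(\sigma'_1, \sigma_2)$$
$$u_2(\sigma) \ge \max_{\sigma'_2 \in \Sigma_2} u_2(\sigma_1, \sigma'_2)$$
ε-Nash Equilibrium:
$$u_1(\sigma) + \varepsilon \ge \max_{\sigma'_1 \in \Sigma_1} u_1(\sigma'_1, \sigma_2)$$
$$u_2(\sigma) + \varepsilon \ge \max_{\sigma'_2 \in \Sigma_2} u_2(\sigma_1, \sigma'_2)$$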
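For a small two-player zero-sum matrix game, the smallest ε satisfying these inequalities can be computed directly. The sketch below is an illustration under that assumption (a one-shot matrix game rather than poker), and the function name `nash_gap` is made up for this example. Because expected utility is linear in a player's own mixed strategy, the maximum over deviations is attained at a pure strategy, so only rows and columns need checking.

```python
def nash_gap(payoffs, s1, s2):
    """Smallest epsilon for which (s1, s2) is an epsilon-Nash equilibrium.

    payoffs[r][c] is player 1's payoff; player 2 receives the negative.
    """
    rows, cols = range(len(payoffs)), range(len(payoffs[0]))
    # Player 1's utility for each pure row against s2.
    row_values = [sum(payoffs[r][c] * s2[c] for c in cols) for r in rows]
    # Player 2's utility for each pure column against s1.
    col_values = [-sum(payoffs[r][c] * s1[r] for r in rows) for c in cols]
    u1 = sum(s1[r] * row_values[r] for r in rows)
    u2 = -u1
    return max(max(row_values) - u1, max(col_values) - u2)


MATCHING_PENNIES = [[1, -1], [-1, 1]]
print(nash_gap(MATCHING_PENNIES, [0.5, 0.5], [0.5, 0.5]))  # uniform play: gap 0
print(nash_gap(MATCHING_PENNIES, [1.0, 0.0], [0.5, 0.5]))  # player 2 can gain 1
```

A gap of 0 means the profile is an exact Nash equilibrium; a gap of ε means it is an ε-Nash equilibrium.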
Extensive-Form Game
[Figure: game tree for 1-card poker with cards J and K. Chance deals each card with probability 0.5, decision nodes are labelled with information sets I1 to I8, edges with the probability of each action (C, R or F), and leaves with payoffs from -2 to 2.]
$\mathcal{I}_1 = \{I_1, I_2, I_7, I_8\}$ and $\mathcal{I}_2 = \{I_3, I_4, I_5, I_6\}$
A((J, J)) = {C, R} and P((J, J)) = 1
$f_c(J|(J)) = 0.5$ and $f_c(K|(J)) = 0.5$
$\sigma_1(I_1, C) = 0.6$ and $\sigma_1(I_1, R) = 0.4$
Counterfactual Regret Minimization
Counterfactual regret minimization minimizes the maximum counterfactual regret (over all actions) at every information set
Minimizing counterfactual regrets minimizes overall regret
In a two-player zero-sum game at time T, if both players’ average overall regret is less than ε, then the average strategy profile $\bar{\sigma}^T$ is a 2ε-Nash equilibrium.
Counterfactual Value
$$v_i(I_j|\sigma) = \sum_{n \in I_j} \pi^{\sigma}_{-i}(\text{root}, n)\, u_i(n)$$
$$u_i(n) = \sum_{z \in Z[n]} \pi^{\sigma}(n, z)\, u_i(z)$$
$v_i(I_j|\sigma)$ is the counterfactual value to player i of information set $I_j$ given strategy profile σ
$\pi^{\sigma}_{-i}(\text{root}, n)$ is the probability of reaching node n from the root ignoring player i’s contributions according to strategy profile σ
$\pi^{\sigma}(n, z)$ is the probability of reaching node z from node n according to strategy profile σ
$u_i(n)$ is the payoff to player i at node n if it is a leaf node or its expected payoff if it is a non-leaf node
$Z[n]$ is the set of terminal nodes that can be reached from node n
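$$v_1(I_8|\sigma) = \sum_{n \in I_8} \pi^{\sigma}_{-1}(\text{root}, n)\, u_1(n)$$
$$= 0.5 \times 0.5 \times 0.2 \times (0.0 \times (-1) + 1.0 \times 2) + 0.5 \times 0.5 \times 0.9 \times (0.0 \times (-1) + 1.0 \times 0)$$
$$= 0.1$$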
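The same calculation can be written as code that follows the two definitions: a recursive expected payoff $u_i(n)$ and a reach-weighted sum over the nodes of the information set. The nested-list encoding of the tree and the function names are assumptions made for this sketch; the numbers are the ones in the worked example.

```python
def expected_payoff(node):
    """u_i(n): a leaf is a number, a non-leaf is a list of (probability, child)."""
    if isinstance(node, (int, float)):
        return node
    return sum(prob * expected_payoff(child) for prob, child in node)


def counterfactual_value(info_set):
    """v_i(I|sigma): sum of (reach probability ignoring player i) * u_i(n)."""
    return sum(reach * expected_payoff(node) for reach, node in info_set)


# The two nodes of I8: (reach ignoring player 1, [(P(F), payoff), (P(C), payoff)]).
I8 = [
    (0.5 * 0.5 * 0.2, [(0.0, -1), (1.0, 2)]),
    (0.5 * 0.5 * 0.9, [(0.0, -1), (1.0, 0)]),
]

v = counterfactual_value(I8)
print(round(v, 6))  # prints 0.1
```

Only the first node contributes, because calling at the second node pays 0.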
Counterfactual Regret
$$r(I_j, a) = v_i(I_j|\sigma_{I_j \to a}) - v_i(I_j|\sigma)$$
$r(I_j, a)$ is the counterfactual regret of not playing action a at information set $I_j$
Positive regret means the player would have preferred to play action a rather than their strategy
Zero regret means the player was indifferent between their strategy and action a
Negative regret means the player preferred their strategy rather than playing action a
[Figure: the same game tree with player 1’s strategy at I8 changed to always play F (F with probability 1.0, C with probability 0.0).]
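$$v_1(I_8|\sigma_{I_8 \to F}) = 0.5 \times 0.5 \times 0.2 \times (1.0 \times (-1) + 0.0 \times 2) + 0.5 \times 0.5 \times 0.9 \times (1.0 \times (-1) + 0.0 \times 0) = -0.275$$
$$r_1(I_8, F) = v_1(I_8|\sigma_{I_8 \to F}) - v_1(I_8|\sigma) = -0.275 - 0.1 = -0.375$$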
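A short sketch of the regret calculation for every action at I8, not just F. The flat encoding of the information set (reach probability plus a payoff per action) and the function names are assumptions made for this example.

```python
# Each node of I8: (reach probability ignoring player 1, payoff per action).
I8_NODES = [
    (0.5 * 0.5 * 0.2, {"F": -1, "C": 2}),
    (0.5 * 0.5 * 0.9, {"F": -1, "C": 0}),
]


def value(nodes, strategy):
    """Counterfactual value of the information set under `strategy`."""
    return sum(
        reach * sum(strategy[a] * payoff for a, payoff in payoffs.items())
        for reach, payoffs in nodes
    )


def counterfactual_regrets(nodes, strategy):
    """r(I, a) = v(I | strategy switched to always play a) - v(I | strategy)."""
    baseline = value(nodes, strategy)
    regrets = {}
    for action in strategy:
        always = {a: 1.0 if a == action else 0.0 for a in strategy}
        regrets[action] = value(nodes, always) - baseline
    return regrets


regrets = counterfactual_regrets(I8_NODES, {"F": 0.0, "C": 1.0})
print(regrets)
```

The regret for F comes out at about -0.375, matching the slide, and the regret for C is 0 because the current strategy already always plays C.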
Cumulative Counterfactual Regret
$$R^T(I_j, a) = \sum_{t=1}^{T} r^t(I_j, a)$$
$R^T(I_j, a)$ is the cumulative counterfactual regret of not playing action a at information set $I_j$ for T time steps
Positive cumulative regret means the player would have preferred to play action a rather than their strategy over those T steps
Zero cumulative regret means the player was indifferent between their strategy and action a over those T steps
Negative cumulative regret means the player preferred their strategy rather than playing action a over those T steps
Regret Matching
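$$\sigma^{T+1}(I_j, a) = \begin{cases} \dfrac{R^{T,+}(I_j, a)}{\sum_{a' \in A(I_j)} R^{T,+}(I_j, a')} & \text{if the denominator is positive} \\ \dfrac{1}{|A(I_j)|} & \text{otherwise} \end{cases}$$
$$R^{T,+}(I_j, a) = \max(R^T(I_j, a), 0)$$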
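Regret matching for a single information set fits in a few lines. In this sketch the dictionary-based interface (action name to cumulative regret) is an assumption made for illustration.

```python
def regret_matching(cumulative_regret):
    """Next strategy at one information set from its cumulative regrets."""
    # Keep only the positive part of each cumulative regret.
    positive = {a: max(r, 0.0) for a, r in cumulative_regret.items()}
    total = sum(positive.values())
    if total > 0:
        # Play each action in proportion to its positive regret.
        return {a: r / total for a, r in positive.items()}
    # No action has positive regret: fall back to the uniform strategy.
    return {a: 1.0 / len(positive) for a in positive}


print(regret_matching({"a": 2.0, "b": -1.0, "c": 6.0}))  # a: 0.25, b: 0.0, c: 0.75
print(regret_matching({"F": -0.375, "C": 0.0}))          # uniform: 0.5 each
```

Actions with negative cumulative regret get probability 0 as long as some other action has positive regret.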
1 Initialise the strategy profile σ e.g. for all i ∈ N, for all $I_j \in \mathcal{I}_i$ and for all $a \in A(I_j)$ set $\sigma(I_j, a) = \frac{1}{|A(I_j)|}$
2 For each player i ∈ N, for all $I_j \in \mathcal{I}_i$ and for all $a \in A(I_j)$ calculate $r(I_j, a)$ and add it to $R(I_j, a)$
3 For each player i ∈ N, for all $I_j \in \mathcal{I}_i$ and for all $a \in A(I_j)$ use regret matching to update $\sigma(I_j, a)$
4 Repeat from 2
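The loop above can be sketched on a much smaller problem than poker. The example below runs regret-matching self-play on a one-shot zero-sum matrix game, which is the special case where each player has a single information set, so it is a simplification rather than a full implementation over a game tree. The payoff matrix `GAME` is an arbitrary choice for this example; its equilibrium has both players choosing their first action with probability 0.4.

```python
def regret_matching(regret):
    """Strategy proportional to positive regrets, uniform if there are none."""
    positive = [max(r, 0.0) for r in regret]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(regret)] * len(regret)


def self_play(payoffs, iterations):
    """Return both players' average strategies after `iterations` rounds."""
    rows, cols = range(len(payoffs)), range(len(payoffs[0]))
    regret1, regret2 = [0.0] * len(rows), [0.0] * len(cols)
    sum1, sum2 = [0.0] * len(rows), [0.0] * len(cols)
    for _ in range(iterations):
        # Steps 1 and 3: zero regrets give the uniform strategy on iteration 1.
        s1, s2 = regret_matching(regret1), regret_matching(regret2)
        # Value of each pure action against the opponent's current strategy.
        value1 = [sum(payoffs[r][c] * s2[c] for c in cols) for r in rows]
        value2 = [-sum(payoffs[r][c] * s1[r] for r in rows) for c in cols]
        u1 = sum(s1[r] * value1[r] for r in rows)
        # Step 2: add each action's regret to its cumulative regret.
        for r in rows:
            regret1[r] += value1[r] - u1
            sum1[r] += s1[r]
        for c in cols:
            regret2[c] += value2[c] + u1  # player 2's utility is -u1
            sum2[c] += s2[c]
    return [x / iterations for x in sum1], [x / iterations for x in sum2]


GAME = [[2, -1], [-1, 1]]  # payoffs to player 1; player 2 gets the negative
avg1, avg2 = self_play(GAME, 20000)
print(avg1, avg2)
```

It is the average strategies, not the final ones, that should approach the equilibrium as the number of iterations grows.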
Cumulative counterfactual regret is bounded by
$$R_i^T(I_j) \le \left(\max_z u_i(z) - \min_z u_i(z)\right) \sqrt{|A(I_j)|}\, \sqrt{T}$$
Total counterfactual regret is bounded by
$$R_i^T \le |\mathcal{I}_i| \left(\max_z u_i(z) - \min_z u_i(z)\right) \sqrt{\max_{h:P(h)=i} |A(h)|}\, \sqrt{T}$$
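To get a feel for the size of the total bound, it can be evaluated for player 1 in the 1-card poker example. This sketch assumes the bound is read as cumulative (growing with √T), so dividing by T gives a bound on average regret. The function name and the choice of T = 10,000 are made up for this example; the other inputs are read off the example tree: 4 information sets (I1, I2, I7, I8), payoffs between -2 and 2, and 2 actions per information set.

```python
import math


def total_regret_bound(num_info_sets, payoff_range, max_actions, iterations):
    """|I_i| * (max u - min u) * sqrt(max |A|) * sqrt(T)."""
    return (num_info_sets * payoff_range
            * math.sqrt(max_actions) * math.sqrt(iterations))


T = 10_000
bound = total_regret_bound(num_info_sets=4, payoff_range=4, max_actions=2, iterations=T)
print(bound, bound / T)  # cumulative bound and the implied average-regret bound
```

Because the cumulative bound grows like √T, the implied average-regret bound shrinks like 1/√T, so halving it takes four times as many iterations.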
(a) Number of game states, number of iterations, computation time, andexploitability of the resulting strategy for different sized abstractions
(b) Convergence rates for three different sized abstractions, x-axis showsiterations divided by the number of information sets in the abstraction
Source: 2008 - “Regret Minimization in Games with Incomplete Information” - Zinkevich et al
Summary
If you want to win (in expectation) at Texas Hold’em poker (against exploitable players) then...
1 Abstract the version of Texas Hold’em poker you are interested in so it has at most $10^{12}$ game states
2 Run the counterfactual regret minimization algorithm on the abstraction for T iterations and obtain the average strategy profile $\bar{\sigma}^T_{abs}$
3 Map the average strategy profile $\bar{\sigma}^T_{abs}$ for the abstracted game to one, $\bar{\sigma}^T$, for the real game
4 Play your strategy from the average strategy profile $\bar{\sigma}^T$ against your (exploitable) opponents
References
1 Annual Computer Poker Competition Website - http://www.computerpokercompetition.org/
2 2008 - “Regret Minimization in Games with Incomplete Information” - Zinkevich et al - http://martin.zinkevich.org/publications/regretpoker.pdf
3 2007 - “Robust strategies and counter-strategies: Building a champion level computer poker player” - Johanson - http://poker.cs.ualberta.ca/publications/johanson.msc.pdf
4 2013 - “Monte Carlo Sampling and Regret Minimization for Equilibrium Computation and Decision-Making in Large Extensive Form Games” - Lanctot - http://era.library.ualberta.ca/public/view/item/uuid:482ae86c-2045-4c12-b91c-3e7ce09bc9ae